Robot Framework: How to chunk a binary file or read the file chunk by chunk

I have a variable containing binary data read from a file in Robot Framework:
${fileData}= Get Binary File ${CHUNK_GEOJSON_FILE_UPLOAD_PATH}
This keyword reads the entire file and has no argument to limit the number of bytes read. What I actually need is to store only 1 MB in ${fileData}, or to split the entire file into separate 1 MB chunks, because I will upload the file chunk by chunk using PATCH requests from the tus protocol.
Any help will be appreciated.

The keyword you are using reads the whole file in one go; it cannot split the data by size the way you intend.
I suggest writing a Python function and calling it as a keyword to do this.
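A minimal sketch of such a keyword library (the file name ChunkLibrary.py, the keyword names, and the 1 MB default are assumptions for illustration, not anything from the question) could look like this:

# ChunkLibrary.py - hypothetical Robot Framework keyword library;
# the names and the 1 MB default chunk size are illustrative only.
class ChunkLibrary:
    """Keywords for reading a binary file in fixed-size chunks."""

    def read_file_chunk(self, path, offset=0, chunk_size=1024 * 1024):
        """Return up to chunk_size bytes starting at offset (one chunk per PATCH request)."""
        with open(path, "rb") as handle:
            handle.seek(int(offset))           # Robot passes arguments as strings, hence int()
            return handle.read(int(chunk_size))

    def split_file_into_chunks(self, path, chunk_size=1024 * 1024):
        """Return a list of bytes objects, each at most chunk_size long."""
        chunks = []
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(int(chunk_size))
                if not chunk:
                    break
                chunks.append(chunk)
        return chunks

Imported with Library    ChunkLibrary.py, the methods become the keywords Read File Chunk and Split File Into Chunks, so ${fileData} can hold either a single 1 MB chunk or the whole list of chunks to send in successive tus PATCH requests.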

Related

PDF File Import R

I have multiple .pdf files (stored in a local folder) that contain text. I would like to import the .pdf files (i.e., the texts) into R. I applied the function read_dir (R package: textreadr):
library ("textreadr")
Data <- read_dir("<MY PATH>")
The function works well. BUT: for several files that include special characters (i.e., letters such as 'ć'; e.g., 'filenameć.pdf') in their names, the function did not work (error message: 'The following files failed to read in and were removed:' …).
What can I do?
I tried to rename the files via R, which did not work (probably for the same reason); otherwise that might be a workaround.
I would rather not rename the files manually :)
Follow-Up (only for experts):
For several files, I got one of the following error messages (and I have no idea why):
PDF error: Mismatch between font type and embedded font file
or
PDF error: Couldn't find trailer dictionary
Any suggestions or hints on how to solve this issue?
Likely the issue concerns the encoding of the file names. If you absolutely want to use R to rename the files for you, the function you want is iconv: determine the encoding of the file names and then convert them to UTF-8.
However, a much better approach would be to rename them using bash from the command line. Can you provide a more complete set of examples?
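If both R and bash turn out to be awkward, one possible workaround (purely a sketch of my own, not from the textreadr documentation) is to normalize the file names to ASCII with a few lines of Python before calling read_dir:

# rename_pdfs.py - hypothetical helper: rewrite PDF file names to
# ASCII-safe equivalents, e.g. 'filenameć.pdf' -> 'filenamec.pdf'.
import os
import unicodedata

folder = "<MY PATH>"  # the same folder that read_dir() scans

for name in os.listdir(folder):
    if not name.lower().endswith(".pdf"):
        continue
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    if ascii_name and ascii_name != name:
        os.rename(os.path.join(folder, name), os.path.join(folder, ascii_name))

This does not address the font and trailer-dictionary errors, which appear to come from the PDFs themselves rather than from the file names.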

feed treetagger in R with text in string rather than text in file

I use TreeTagger from R, through the koRpus package.
Calling the treetag function requires me to indicate a filename that contains the text to be processed. However, I would like to provide a string rather than a filename, because I have to do some preliminary text processing on this string.
I guess this has to go through a file because it is wrapping a script call.
As I am looping over 10,000 texts, I would like to avoid writing the files to disk and wasting time, and instead keep everything in memory.
Can I avoid this? Thanks.
No. Or not really. As you suspect, the external script needs a file. From the docs:
Either a connection or a character vector, valid path to a file,
containing the text to be analyzed. If file is a connection, its
contents will be written to a temporary file, since TreeTagger can't
read from R connection objects.
So it's got to write the text to a file for the external TreeTagger binary to read. If you don't do that, then the treetag function does it for you. Either way, the text ends up in a file.
If TreeTagger can read from a Unix named pipe, or fifo, then you might be able to stream text to it on the fly.
The only other option would be to see if the TreeTagger source can be linked with R in some way so that you can call one of its subroutines directly, passing an R object. I don't even know if this is written in Java or C++ or whatever, but it might be a big job anyway.
As indicated in the documentation:
format:
Either "file" or "obj", depending on whether you want to scan files or analyze the text in a given object, like a character vector. If the latter, it will be written to a temporary file (see file).
Using this knowledge, we can simply use the treetag() function in combination with a character vector:
treetag(as.vector(yourinput), format = "obj")
Internally, R writes the vector to a temporary text file, and TreeTagger refers to that temporary file and analyzes it.

run saxon xquery over batch of xml files and produce one output file for each input file

How do I run XQuery using Saxon HE 9.5 on a directory of files using the built-in command line? I want to take one file as input and produce one file as output.
This sounds very obvious, but I can't figure it out without using saxon extensions that are only available in PE and EE.
I can read in the files in a directory using fn:collection() or using input parameters. But then I can only produce one output file.
To keep things simple, let's say I have a directory "input" with my files 01.xml, 02.xml, ... 99.xml. Then I have an "output" directory where I want to produce the files with the same names -- 01.xml, 02.xml, ... 99.xml.
Any ideas?
My real data set is large enough (tens of thousands of files) that I don't want to fire up the JVM for each file, so writing a shell script that calls the Saxon command line once per file is out of the question.
If there are no built-in command-line options, I may just write my own quick Java class.
The capability to produce multiple output files from a single query is not present in the XQuery language (only in XSLT), and the capability to process a batch of input files is not present in Saxon's XQuery command line (only in the XSLT command line).
You could call a single-document query repeatedly from Ant, XProc, or xmlsh (or of course from Java), or you could write the code in XSLT instead.

Cutting A File into Chunks in Qt

Can anybody give me a hint or an initial idea of how I might cut a file into chunks in Qt? Is there, as in Java, a built-in function to split a file? Later on I want to calculate the SHA-256 hash value of each chunk. Any ideas?
There is no built-in function for that.
Open the original file.
Open a file for the first chunk.
Read bytes from the original file.
Write bytes to the chunk file.
Repeat.
See QFile documentation.
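The loop is the same in any language; here is a rough sketch of it in Python (the 1 MiB chunk size and the ".chunkNNN" naming scheme are assumptions), where each step maps onto QFile open/read/write and QCryptographicHash with Sha256 in Qt:

# split_and_hash.py - rough sketch of the open/read/write loop described
# above; the chunk size and the chunk file naming are assumptions.
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk

def split_and_hash(path):
    """Write path.chunk000, path.chunk001, ... and return their SHA-256 digests."""
    digests = []
    index = 0
    with open(path, "rb") as source:               # open the original file
        while True:
            data = source.read(CHUNK_SIZE)         # read bytes from the original file
            if not data:
                break
            chunk_name = "%s.chunk%03d" % (path, index)
            with open(chunk_name, "wb") as chunk:  # open a file for this chunk
                chunk.write(data)                  # write bytes to the chunk file
            digests.append(hashlib.sha256(data).hexdigest())
            index += 1                             # repeat until the source is exhausted
    return digests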

Merging EBCDIC converted files and pdf files into a single file and pushing to mainframes

I have two PDF files and two text files which have been converted into EBCDIC format. The two text files act as cover files for the PDF files, containing details like the PDF name, number of pages, etc. in a fixed format.
Cover1.det, Firstpdf.pdf, Cover2.det, Secondpdf.pdf
Format of the cover file could be:
Firstpdf.pdf|22|03/31/2012
that is
pdfname|page num|date generated
which is then converted into EBCDIC format.
I want to merge all these files in a single file in the order first text file, first pdf file, second text file, second pdf file.
The idea is then to push this single merged file to the mainframe using scp.
1) How do I merge the above-mentioned four files into a single file?
2) Do I need to convert the PDF files into EBCDIC format as well? If yes, how?
3) As far as I know, mainframe files also need record-length details during transit. How do I find out the record length of the file if I do succeed in merging them into a single file?
I remember reading somewhere that it could be done using put and append in FTP. However, since I have to use scp, I am not sure how to achieve this merging.
Thanks for reading.
1) Why not use something like pkzip?
2) I don't think converting the PDF files to EBCDIC is necessary, or even possible. The files need to be transferred in binary mode.
3) Using pkzip and scp you will not need the record length.
File merging can easily be achieved with the Unix cat command, using the > and >> (append) operators.
Also, if the next file should start on a new line (as was my case), a blank echo can be inserted between the files.
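If the merge has to happen from a script rather than an interactive shell, the same binary concatenation can be sketched in a few lines of Python (the output name merged.bin is an assumption):

# merge_files.py - binary-safe equivalent of:
#   cat Cover1.det Firstpdf.pdf Cover2.det Secondpdf.pdf > merged.bin
# The output name "merged.bin" is an assumption.
files = ["Cover1.det", "Firstpdf.pdf", "Cover2.det", "Secondpdf.pdf"]

with open("merged.bin", "wb") as merged:
    for name in files:
        with open(name, "rb") as part:
            merged.write(part.read())
        # Uncomment if the next file must start on a new line, as with the blank echo above:
        # merged.write(b"\n")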
