Pipe CSV stdout into read.csv

As a follow-up to my previous question, say I have a console application that will write CSV data to standard out.
How can I use this with read.csv or some equivalent?

Read the help(connections) documentation and 'just do it' :)
read.csv() can consume a URL via url(), output from a pipe via pipe(), and even plain old-fashioned files. There is an entire manual (R Data Import/Export) devoted to data input/output too.
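For example, assuming a hypothetical command-line tool mytool that prints CSV to standard out, a minimal sketch looks like this (read.csv opens and closes an unopened pipe connection for you):

con <- pipe("mytool --csv")  # mytool is a stand-in for your executable
df <- read.csv(con)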

Related

Opening Alteryx .yxdb files in R

Similar to the question below, I was wondering whether there is a way to open .yxdb files in R?
Open Alteryx .yxdb file in Python?
YXDB is Alteryx's own native format. I haven't seen or heard of anything else that can open it.
You could change your workflow to write to a CSV (or other compatible file) as well as writing to the YXDB file.
AFAIK there is no way yet for R to read yxdb files. I also export my Alteryx workflows to CSVs, or use read.Alteryx inside Alteryx's R tool together with saveRDS to save the data as a fast-loading binary file.
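A rough sketch of that R-tool approach (hedged: read.Alteryx is only available inside Alteryx's R tool, and the "#1" input name and output path here are assumptions):

# inside the Alteryx R tool:
df <- read.Alteryx("#1", mode = "data.frame")  # read workflow input #1
saveRDS(df, "data.rds")                        # fast-loading binary file

# later, in a normal R session:
df <- readRDS("data.rds")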

Can data.table's fread accept connections?

I have an executable that I can call using the system() command. This executable will print some data which I can pipe into R using:
read.csv(pipe(command))
fread has amazing performance which I would like to take advantage of when bringing the data in, but I cannot use fread(pipe(command)). The alternative is to dump the executable's output to a file first and then read it in with fread. Doing so requires writing intermediate data to disk and adds overhead from the extra step. Is there a way to wrap or use fread with my executable?
fread can't take connections for now; the feature was requested back in 2015: https://github.com/Rdatatable/data.table/issues/561
Even though Maksim's comment is valid, it would not work on a Windows machine, which in some cases can be troublesome.
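Note that more recent data.table releases (after this question was asked) added a cmd argument to fread that shells out to your command directly; a hedged sketch, with mytool standing in for your executable:

library(data.table)
dt <- fread(cmd = "mytool --csv")  # runs the command and parses its stdout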

R: Generic Function to Uncompress Files

I need to read multiple compressed files with different formats of compression. I do not wish to manually uncompress all the files. I would like R to handle the uncompression and reading independently of the compression format. This is where I'm stuck.
I could construct a function with a switch case sort of structure for zip - unzip, gz - gzfile, etc. but I would like to know if there already is some function that can uncompress files irrespective of the compression format.
Any suggestions are appreciated. Thanks a lot!
PS:
I know that read.table can read (some, if not all) compressed files. However, I've been inching towards data.table::fread (because it is much faster), and fread seems unable to read compressed files (http://r.789695.n4.nabble.com/fread-on-gzipped-files-td4663116.html - yet?). I would prefer temporarily uncompressing and using fread rather than using read.table.
Then here's an upvote :-)
Btw, I don't think there is a generic "uncompress" function that does the magic for you (as in some of the shell languages). The options may simply be too broad -- but I suspect you'd cover 80% of the cases with zip/tar/rar.
Just write a simple uncompress <- function(type = c("zip", "tgz", "tar", "arj :-)))")) {...} as was your original intention.
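A minimal sketch of that switch-based dispatcher (hedged: dispatching on the file extension is an assumption, and only a few common formats are covered):

# returns a connection that read.table() and friends can consume
uncompress <- function(path) {
  switch(tolower(tools::file_ext(path)),
    gz  = gzfile(path),
    bz2 = bzfile(path),
    xz  = xzfile(path),
    zip = unz(path, utils::unzip(path, list = TRUE)$Name[1]),
    file(path)  # fall back to a plain file connection
  )
}

read.table(uncompress("data.csv.gz"), header = TRUE, sep = ",")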

R passing data frame to another program using system()

I have a data frame that I pass to another program using system(). In the current setup, I first write the contents of the dataframe to a text file, then have the system() command look for the created text file.
df1 <- runif(20)
write(df1, file="file1.txt")
system("myprogram file1.txt")
I have 2 questions:
1) Is there a way to pass a dataframe directly without writing the text file?
2) If not, is there are way to pass the data in memory as a text formatted entity without writing the file to disk?
Thanks for any suggestions.
You can write to anything R calls a connection, and that includes network sockets.
So process A can write to the network and process B can read it, without any file on disk involved; see help(connections), which even has a working example in its "Examples" section.
Your general topic here is serialization, and R does that for you. You can also pass data that way to other programs using tools that encode metadata about your data structure -- as for example Google's Protocol Buffers (supported in R by the RProtoBuf package).
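A hedged sketch of the socket route between two R processes (the port number is arbitrary; start the reader first, since the server side blocks until a client connects):

# process B (reader):
con <- socketConnection(port = 6011, server = TRUE, blocking = TRUE)
df  <- read.table(con, header = TRUE)  # reads until the writer closes
close(con)

# process A (writer):
con <- socketConnection(port = 6011, server = FALSE, blocking = TRUE)
write.table(data.frame(x = runif(20)), con, row.names = FALSE)
close(con)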
I spent quite a while trying to understand the accepted answer, but couldn't. I did figure out a workaround, though.
df1 <- runif(20)
system("myprogram /dev/stdin", input = capture.output(write.table(df1)))  # capture.output turns the printed table into the character vector input= expects
However, according to the documentation, the input argument is actually redirected through a temp file, which I suppose will involve some I/O.

Is there a way to read and write in-memory files in R?

I am trying to use R to analyze large DNA sequence files (fastq files, several gigabytes each), but the standard R interface to these files (ShortRead) has to read the entire file at once. This doesn't fit in memory, so it causes an error. Is there any way that I can read a few (thousand) lines at a time, stuff them into an in-memory file, and then use ShortRead to read from that in-memory file?
I'm looking for something like Perl's IO::Scalar, for R.
I don’t know much about R, but have you had a look at the mmap package?
It looks like ShortRead is soon to add a "FastqStreamer" class that does what I want.
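That streaming interface, as it later shipped, looks roughly like this (hedged sketch: the chunk size of 1e5 records is arbitrary):

library(ShortRead)
strm <- FastqStreamer("reads.fastq", n = 1e5)  # yield 1e5 records at a time
while (length(fq <- yield(strm))) {
    # process each chunk of reads here
}
close(strm)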
Well, I don't know about readFastq accepting something other than a file...
But if it can, for other functions, you can use the R function pipe() to open a unix connection, then you could do this with a combination of unix commands head and tail and some pipes.
For example, to get lines 91 to 100, you use this:
head -n 100 file.txt | tail -n 10
So you can just read the file in chunks.
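In R, that chunked read might look like this (a sketch; file.txt and the line range are placeholders):

con <- pipe("head -n 100 file.txt | tail -n 10")  # lines 91-100
chunk <- readLines(con)  # readLines opens and closes the pipe for you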
If you have to, you can always use these unix utilities to create a temporary file, then read that in with ShortRead. It's a pain, but if it can only take a file, at least it works.
Incidentally, the answer to how to do an in-memory file in R generally (like Perl's IO::Scalar) is the textConnection function. Sadly, though, the ShortRead package cannot handle textConnection objects as inputs. So while the idea I expressed in the question -- reading a file in small chunks into in-memory files that are then parsed bit by bit -- is certainly possible for many applications, it does not work for my particular application, since ShortRead does not like textConnections. The solution is the FastqStreamer class described above.
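For functions that do accept arbitrary connections, the textConnection route looks like this (a minimal sketch with made-up data):

lines <- c("a,b", "1,2", "3,4")
con <- textConnection(lines)  # an in-memory "file" backed by a character vector
df <- read.csv(con)
close(con)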