R passing data frame to another program using system() - r

I have a data frame that I pass to another program using system(). In the current setup, I first write the contents of the dataframe to a text file, then have the system() command look for the created text file.
df1 <- runif(20)
write(df1, file="file1.txt")
system("myprogram file1.txt")
I have 2 questions:
1) Is there a way to pass a dataframe directly without writing the text file?
2) If not, is there are way to pass the data in memory as a text formatted entity without writing the file to disk?
Thanks for any suggestions.

You can write to anything R calls connections, and that includes network sockets.
So process A can write to the network, and process B can read it without any file-on-disk involved, see help(connections) which even has a working example in the "Examples" section.
Your general topic here is serialization, and R does that for you. You can also pass data that way to other programs using tools that encode metadata about your data structure -- as for example Google's Protocol Buffers (supported in R by the RProtoBuf package).

I spent quite a while and couldn't understand the accepted answer. But I figured out a workaround.
df1 <- runif(20)
system("myprogram /dev/stdin", input = write.table(df1))
However, according to documentation, the input argument will actually be redirected to a temp file, which I suppose will involve some i/o.

Related

"filename.rdata" file Exploring and Converting to CSV

I'm no R-programmer (because of the problem I started learning it), I'm using Python, In a forcasting task I got a dataset signalList.rdata of a pheomenen called partial discharge.
I tried some commands to load, open and view, Hardly got a glimps
my_data <- get(load('C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/signalList.Rdata'))
but, since i lack deep knowledge about R, I wanted to convert it into a csv file, or any type that I can deal with in python.
or, explore it and copy-paste manually.
so, i'm asking for any solution whether using R or Python or any tool to get what's in the .rdata file.
Have you managed to load the data successfully into your working environment?
If so, write.csv is the function you are looking for.
If not,
setwd("C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/")
signalList <- load("signalList.Rdata")
write.csv(signalList, "signalList.csv")
should do the trick.
If you would like to remove signalList from your working directory,
rm(signalList)
will accomplish this.
Note: changing your working directory isn't necessary, it just makes it easier to read in a comment I feel. You may also specify another path for saving your csv to within the second argument of write.csv.

Command to use with easy way the insert of R dataframe

I have a dataframe loaded successfully in R.
I would like to give the data of df to someone else to use them with quick and easy way without need to load again the file into a df.
Which is the command to give the whole data of df (not the str())
You can save the file into a .RData using save or save.image, depending on your needs. First one will save specific objects while the latter will dump the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option is as #user1945827 mentioned, using dput which will produce a string that is parseable into another R session. This will not work for complex (like S4) objects.

Streaming data in Julia

Currently, is there a good way to read data in Julia in a streaming fashion?
For example, let's say I have a CSV file that is too big to fit in memory. Are there currently built in functions or a library that facilitates working with this?
I'm aware of the prototype DataStream functionality in DataFrames, but that's not currently exposed via a public API.
The eachline function turns an IO source into an iterator of lines. That should allow you to read a file a line at a time. from there the readcsv and readdlm function can read each line if you turn it into an IOBuffer.
for ln in eachline(open("file.csv"))
data = readcsv(IOBuffer(ln))
# do something with this data
end
It's still pretty do it yourself but there aren't that many steps so it's not too bad.

How to write multiple tables, dataframes, regression results etc - to one excel file?

I am looking for an easy way to get objects into MS Excel.
(I am using the preinstalled "Puromycin"-dataset for the examples)
I would like to place the contents of these objects to a single excel file:
Puromycin
summary(Puromycin$rate)
summary(Purymycin$conc)
table(Puromycin$state)
lm( conc ~ rate , data=Puromycin)
By "contents" i mean what is shown in the console when i press enter. I dont know what to call it.
I tried to do this:
sink("datafilewhichexcelhopefullyunderstands.csv")
Puromycin
summary(Puromycin$rate)
summary(Purymycin$conc)
table(Puromycin$state)
lm( conc ~ rate , data=Puromycin)
sink()
This gives med a file with the CSV-extension, however when i open the file in notepad,
there is comma-separation. That means that i cant get Excel to open it properly. By properly
i mean that each number is in its own cell.
Others have suggested this for a similar problem
https://stackoverflow.com/a/13007555/1831980
But as a novice i feel that the solution is too complex, and I am hoping for a simpler method.
What I am doing now is this:
write.table(Puromycin, file="clipboard" , sep=";" , row.names=FALSE )
write.table(summary(Purymycin$conc), file="clipboard" , sep=";" , row.names=FALSE )
... etc...
But this requires i lot of copy-ing and pasting, which I hope to eliminate.
Any help would appreciated.
write.table and its friends are intended to write out columns of data separated by whatever separator is specified. Your clipboard contains several data types because you are using summary which always gives a unique output.
For writing the data values out, you can use write.csv on a data frame and then open with Excel. For example, Puromycin is already a data frame (which you can see with str(Puromycin)) so you can just write it out directly:
write.csv(file = "some file.csv", x = Puromycin)
Which will go into the current working directory (which can be determined with getwd()).
To write out/save the results of the regression model is a bit more of a challenge. You could definitely use sink as you did, but specify an extension of .txt on your file so a text editor can open it. There are fancier methods (sweave, knitr) which you might want to look into in the long run, as they can write really nice reports automatically.
In the meantime, get to know str(any R object) as it will be your friend. You can see all the objects in your workspace with ls().
This will only be helpful if you are prepared to use Excel's Data/Text to Columns functions:
capture.output( sapply( c(Puromycin,
summary(Puromycin$rate),
summary(Puromycin$conc),
table(Puromycin$state),
lm( conc ~ rate , data=Puromycin) ), FUN=print), file="datafilewhichexcelhopefullyunderstands.csv", append=TRUE)
The problem being that Excel will not read the whitespace as a cell separator unless you specifically tell it to. You can (and I have often done so) use the fixed filed input features offered by the Text-to-Columns dialog interface.
Your simplest option may be to use the RExcel tool, it transfers information between R and Excel. However it is not free software.
The XLConnect package is another option, it can be used to write information directly to an Excel file.
The tricky part is the lm call. lm does not return a simple vector, matrix, or data frame (all of which are easy to convert to csv or send directly) and there is not a clear way to convert the various parts of a list to cells in a spreadsheet. What would be better is to use extractor functions to pull the important parts from the return of lm or the summary of the lm object and send those to Excel using the other tools.
If you can tell us more about why you want the numbers in Excel and what you plan to do with them after, then we may be able to offer better help (you may be able to completely skip excel).
If the main goal is to share output with others then you should really look at the knitr package (or other related packages). This will not create Excel files, but can be used (along with the pandoc program and possibly other tools) to create a report file in a format easy to share with others not familiar with R. You could put everything into a .pdf file or a .docx file (the latter read by MS Word and would have tables wich can be edited using Word). There is not a simple way to get edits back into R, but with the track changes you can easily see what changes have been made and hand edit your R script/template accordingly.

Open large files with R

I want to process a file (1.9GB) that contains 100.000.000 datasets in R.
Actually I only want to have every 1000th dataset.
Each dataset contains 3 Columns, separated by a tab.
I tried: data <- read.delim("file.txt"), but R Was not able to manage all datasets at once.
Can I tell R directly to load only every 1000th dataset from the file?
After reading the file I want to bin the data of column 2.
Is it possible to directly bin the number written in column 2?
Is it possible the read the file line by line, without loading the whole file into the memory?
Thanks for your help.
Sven
You should pre-process the file using another tool before reading into R.
To write every 1000th line to a new file, you can use sed, like this:
sed -n '0~1000p' infile > outfile
Then read the new file into R:
datasets <- read.table("outfile", sep = "\t", header = F)
You may want to look at the manual devoted to R Data Import/Export.
Naive approaches always load all the data. You don't want that. You may want another script which reads line-by-line (written in awk, perl, python, C, ...) and emits only every N-th line. You can then read the output from that program directly in R via a pipe -- see the help on Connections.
In general, very large memory setups require some understanding of R. Be patient, you will get this to work but once again, a naive approach requires lots of RAM and a 64-bit operating system.
Maybe package colbycol could be usefull to you.

Resources