I am pulling in NetCDF data from a remote server using data <- httr::GET(my_url) in an R session. I can writeBin(content(data, "raw"), "my_file.nc") and then nc_open("my_file.nc"), but that is rather cumbersome (I am processing hundreds of NetCDF files).
Is there a way to convert the raw data straight into an ncdf4 object without going through the file system? For instance, would it be possible to pipe the raw data into nc_open()? I looked at the source code and the function prototype expects a named file, so I suppose a named pipe might work, but how do I create a named pipe from a raw blob of bytes in R?
Any other suggestions welcome.
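For reference, here is a minimal sketch of the file-based round trip described above, wrapped in a helper so it can be applied to many URLs (open_remote_nc is a hypothetical name; it assumes the httr and ncdf4 packages and that the URL returns a NetCDF body):

library(httr)
library(ncdf4)

# hypothetical helper: download a NetCDF URL and open it via a temporary file
open_remote_nc <- function(url) {
  resp <- GET(url)
  tf <- tempfile(fileext = ".nc")       # temporary stand-in for "my_file.nc"
  writeBin(content(resp, "raw"), tf)    # write the raw body to disk
  nc_open(tf)                           # ncdf4 only accepts a file name
}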
I have a large (210 038 KB) txt file which contains JSON-structured data. It contains itinerary data, and I would like to structure it on a per-journey basis, which should be easy enough once I can find where in the nesting the journeys are located. My main challenge is that I don't know the structure of the data: when I try to read it in R with, for instance, read.table('datafile.txt', header=FALSE), it either runs for a very long time and then crashes, or it produces an unsatisfactory result by splitting on the "wrong" character (and then R had to restart itself).
I've glanced at this post: Parsing JSON arrays from a .txt file in R - several large files, which is similar to mine, but there the data were separated by newlines. I instead need to read the JSON structure iteratively and find out what it is comprised of.
Any suggestions?
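One way to get a first look at the nesting (a sketch only, assuming the file holds a single JSON document and that the jsonlite package is available; datafile.txt is the file mentioned above):

library(jsonlite)

# parse the whole document and inspect the top levels of the nesting
parsed <- fromJSON("datafile.txt", simplifyVector = FALSE)
str(parsed, max.level = 2)   # shows where the per-journey records sit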
In Python we can import io and then make a file-like object with some_variable = io.BytesIO(), download any type of file into it, and interact with it as if it were a locally saved file, except that it lives in memory. Does R have something like that? To be clear, I am not asking about what any particular OS does when you save an R object to a temp file.
This is kind of a duplicate of Can I write to and access a file in memory in R?, but that question is about 9 years old, so maybe the functionality exists now, either in base R or in a package.
Yes, readBin.
readBin("/path", raw(), file.info("/path")$size)
This is a working example:
tfile <- tempfile()                                  # scratch file on disk
writeBin(serialize(iris, NULL), tfile)               # serialize iris to raw bytes and write them out
x <- readBin(tfile, raw(), file.info(tfile)$size)    # read the whole file back in as a raw vector
unserialize(x)                                       # reconstruct the original object from the bytes
…and you get back your iris data.
This is just an example; for R objects it is far more convenient to use readRDS()/saveRDS().
However, if the object is, say, an image you want to analyse, readBin gives you its raw in-memory representation.
For text files, you should then use:
rawToChar(x)
but again there are readLines(), read.table(), etc., for these tasks.
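As a quick sketch of that last point (assuming x holds the raw bytes of a plain-text, delimited file rather than the serialized iris object from the example above):

txt <- rawToChar(x)                            # raw bytes -> one long character string
df  <- read.table(text = txt, header = TRUE)   # read.table can parse a string via its `text` argument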
I am running a long-running script (it gets information from a server) which runs the whole day, and the sink() function saves the output in .txt format. I have heard that sink() sometimes stops abruptly when a huge file is created. In my case, the file size is approx. 100-200 MB. Which file format is better to use in order to save some space? Or are there any other functions to save data to my computer?
The first option that comes to mind is the feather package. It stores data frames in a binary format, which makes writing and reading them fast. The files should also be lightweight on disk compared to a plain-text log produced with sink().
An example workflow would be:
# write data (df is the data frame produced by your script)
library(feather)
path <- "my_data.feather"
write_feather(df, path)

# read data
df <- read_feather(path)
I don't have your data on hand to benchmark this myself, so try it out and let me know whether it is indeed faster.
I am working with an Excel file saved in S3 and am trying to access it using R. To get the file I am using fl <- get_object(paste(file_path, file_name), bucket = bucket). This works fine and returns the file as a raw vector. The problem I am having is that every function I have found for reading an Excel file requires an actual file (i.e. a path), not a raw vector.
Is there a way to read a raw vector (of an Excel file) into a data frame? Or to convert the raw vector back into an Excel file so I can reference that file in read_excel() or the like?
The Python code below does what I need, but for reasons far beyond my control, I must do this in R.
# boto3 S3 client (s3) and pandas (pd) are assumed to be set up already
fl = s3.get_object(Bucket=bucket, Key=file_path + file_name)
df = pd.read_excel(fl['Body'])
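For reference, one possible R workaround (a sketch only, assuming the aws.s3 and readxl packages; it still goes through a temporary file, which is exactly the step the question hopes to avoid):

library(aws.s3)
library(readxl)

# dump the raw vector returned by get_object() to a temporary .xlsx and read it from there
fl <- get_object(paste(file_path, file_name), bucket = bucket)
tf <- tempfile(fileext = ".xlsx")
writeBin(fl, tf)
df <- read_excel(tf)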
I know the as.h2o() function from the h2o library converts an R data.frame to an H2O frame. Two questions:
Does as.h2o() write data to disk during conversion? How long is this data stored?
Are there other options that avoid the temporary step of writing to disk?
The exact path of running as.h2o() on a data.frame df is:
path <- tempfile(fileext = ".csv")   # temporary file on disk
write.csv(df, path)                  # write the data.frame out as CSV
h2o.uploadFile(path)                 # upload (not import) the file into H2O
file.remove(path)                    # clean up as soon as the upload finishes
We temporarily write the data.frame to disk and then upload (rather than import) the file into H2O; as soon as the file is uploaded we delete the temporary file. There is no cleaner alternative that avoids writing to disk.