I have an API request that returns a .gz file, which R reads in as a "raw" vector. I have not been able to find any R package that can decompress it in memory. I've tried fread(), rawToChar() and unzip().
The structure of the file I need to unzip is below. Specifically, I need to decompress req$content.
Thanks for your responses. I ended up saving the responses as .gz files in a folder and decompressing them using foreach and fread() from data.table. This solution worked better because the data set is really large (>80 GB), so it would not have made sense to do it in memory. Apologies for not being able to give a fully reproducible example; the data is purchased and cannot be shared on a public forum.
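For smaller payloads, one base-R way to decompress a gzip raw vector without writing it to disk is to wrap it in a rawConnection() and read through gzcon(). The sketch below is only an illustration: the payload is made-up data standing in for req$content, which is not shown in the question.

```r
# Build a small gzip payload to stand in for req$content (hypothetical data).
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wt")
writeLines(c("id,value", "1,10", "2,20"), gz)
close(gz)
payload <- readBin(tmp, what = "raw", n = file.info(tmp)$size)

# Wrap the raw vector in a connection and let gzcon() decompress on read.
con <- gzcon(rawConnection(payload))
lines <- readLines(con)
close(con)  # closing the gzcon also closes the wrapped raw connection

# Parse the decompressed text as CSV.
df <- read.csv(text = lines)
```

This holds the whole decompressed text in memory, so it is not suitable for data on the scale described above.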
Related
I am looking for ways to process VSAM files with R and export as a csv.
I have been searching the web and have not been able to find any methods of using R to read VSAM files.
A little more information would be of use. How are you going to get the data from the VSAM files? Are you reading directly from an IBM system? What access method will you be using? What is the structure of the file you are reading? If you want it to be put in a data.frame, is it something like a CSV file already? Any other particulars would be helpful.
I'm loading an R rds file into Julia with
using RData
objs = load(rds, convert=true)
The original rds file is ~3GB. When I run the load function above, the memory spikes to ~40GB.
Any ideas what's going on?
The rds files are compressed with gzip by default. Try unzipping your file and see how big it actually is (on Windows you could use 7-zip for that). The size reduction for a data frame could easily be around 80-90%, so your numbers look plausible.
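To get a feel for the gap between on-disk and in-memory size, the same object can be saved from R with and without compression and the sizes compared. This is a minimal sketch on a made-up data frame; the actual ratio depends entirely on the data.

```r
# Hypothetical, highly repetitive data frame (repetitive data compresses well).
df <- data.frame(
  id    = rep(1:100, times = 1000),
  group = rep(c("a", "b"), times = 50000)
)

compressed   <- tempfile(fileext = ".rds")
uncompressed <- tempfile(fileext = ".rds")

saveRDS(df, compressed)                      # gzip compression is the default
saveRDS(df, uncompressed, compress = FALSE)  # same object, no compression

sizes <- c(
  compressed   = file.size(compressed),
  uncompressed = file.size(uncompressed),
  in_memory    = as.numeric(object.size(df))
)
print(sizes)
```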
The 'h2o' package is a fun ML java tool that is accessible via R. The R package for accessing 'h2o' is called "h2o".
One of the input avenues is to tell 'h2o' where a csv file is and let 'h2o' upload the raw CSV. It can be more effective to just point out the folder and tell 'h2o' to import "everything in it" using the h2o.importFolder command.
Is there a way to point out a folder of "gzip" or "bzip" csv files and get 'h2o' to import them?
According to this link, h2o can import compressed files. I just don't see a way to specify this for the importFolder approach.
Is it faster or slower to import the compressed form? If another program produces the files, does it save time in the h2o import step if they are compressed, or if they are raw text? Guidelines and performance best practices are appreciated.
As always, comments, suggestions, and feedback are solicited.
I took the advice of @screechOwl and asked on the 0xdata.atlassian.net board for h2o, and was given a clear answer by user "cliff":
Hi, yes H2O - when importing a folder - takes all the files in the folder; it unzips gzip'd or zip'd files as needed, and parses them all into one large CSV. All the files have to be compatible in the CSV sense - same number and kind of columns.
H2O does not currently handle bzip files.
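Since the answer says all files in the folder must have the same number and kind of columns, it can help to check that in plain R before pointing h2o.importFolder at the folder. The sketch below uses only base R; the folder, file names, and helper functions (write_gz_csv, schema_of) are made up for illustration, and the h2o call itself is not run here.

```r
# Hypothetical folder with two gzip-compressed CSV files sharing one schema.
folder <- file.path(tempdir(), "h2o_input")
dir.create(folder, showWarnings = FALSE)

write_gz_csv <- function(df, path) {
  con <- gzfile(path, "wt")
  on.exit(close(con))
  write.csv(df, con, row.names = FALSE)
}
write_gz_csv(data.frame(id = 1:3, score = c(0.1, 0.2, 0.3)),
             file.path(folder, "part1.csv.gz"))
write_gz_csv(data.frame(id = 4:6, score = c(0.4, 0.5, 0.6)),
             file.path(folder, "part2.csv.gz"))

# Read only the first rows of each file and record column names and classes.
schema_of <- function(path) {
  head_rows <- read.csv(gzfile(path), nrows = 100)
  vapply(head_rows, function(col) class(col)[1], character(1))
}

files   <- list.files(folder, pattern = "\\.csv\\.gz$", full.names = TRUE)
schemas <- lapply(files, schema_of)

# TRUE when every file has the same column names and types as the first one.
compatible <- all(vapply(schemas, identical, logical(1), schemas[[1]]))
```

If compatible is TRUE, the folder path is what you would then hand to h2o.importFolder on a running h2o instance.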
If I put a very small csv file in my GitHub directory so that it gets copied to /ocpu/github/username/projectname/www/ , will I be able to access the contents of the csv for use in an R function? I tried to ajax the file, but I get a 404 error even though I can see the csv file sitting in the www directory of my local server. I need to have the csv on the server as a static file rather than being uploaded by a function. Thanks
You should be able to access them like any other file. Can you post an example that shows what you are doing and what error you are getting?
That said, if you just want to use this data in your R functions, it is better to include it in the R package as an actual data file. Also see section 1.1.6 of Writing R Extensions. An example is the mapapp package, which includes a dataset called countryExData. Also see the live app.
First I should say that a lot of this is over my head, so I apologize in advance for using incorrect terminology and potentially asking an unclear question. I'm doing my best.
Also, I saw ThisPost; is RCurl the tool I want to use for this task?
Every day for 4 months I'll be analyzing new data and generating .csv files and .png's that need to be uploaded to a web site so that other team members can check them. I've (nearly) automated all of the data collecting, data downloading, analysis, and file saving. The analysis is carried out in R, and R saves the files. Currently I use Filezilla to manually upload the new files to the website. Is there a way to use R to upload the files to the web site, so that I don't have to open Filezilla and drag+drop files?
It'd be nice to run my R code and walk away, knowing that once it finishes running, the newly saved files will automatically be put on the website.
Thanks for any help!
You didn't specify which protocol you use to upload your files using FileZilla. I assume it is ftp. If so, you can use the ftpUpload function of RCurl:
library(RCurl)
ftpUpload("yourfile", "ftp://ftp.yourserver.foo/yourfile",
          userpwd = "username:passwd")
RCurl also has a function for scp, and ftpUpload should also work with sftp:// URLs, provided the underlying libcurl was built with SFTP support.
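To upload everything the analysis produced in one go, the same call can be looped over the files in the output folder. This is a rough sketch: the folder, file names, server URL, and the upload_all helper are placeholders, and the upload itself is left commented out because it needs RCurl and a real server.

```r
# Hypothetical output folder; the two files stand in for one day's results.
out_dir <- file.path(tempdir(), "daily_output")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(out_dir, "summary.csv"),
          row.names = FALSE)
file.create(file.path(out_dir, "plot.png"))

# Collect the files to send and build one remote URL per file.
local_files <- list.files(out_dir, pattern = "\\.(csv|png)$", full.names = TRUE)
remote_urls <- paste0("ftp://ftp.yourserver.foo/", basename(local_files))

# Upload each file in turn with RCurl::ftpUpload (same call as above).
upload_all <- function(local_files, remote_urls, userpwd) {
  for (i in seq_along(local_files)) {
    RCurl::ftpUpload(local_files[i], remote_urls[i], userpwd = userpwd)
  }
  invisible(remote_urls)
}

# Placeholder credentials; uncomment once the server details are filled in.
# upload_all(local_files, remote_urls, userpwd = "username:passwd")
```

Putting the upload_all() call at the end of the analysis script means the upload runs as soon as the files are saved.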