.json file is too large to be opened in R with rjson

I have a 5.1 GB json file that I would like to read in R using rjson. I want afterwards to construct a dataframe from it, however it won't load because the size is too large.
Is there any way to work around it?
Thank you for your help =)

Nina, I would recommend using the jsonlite package instead of rjson.
library(jsonlite)
# path to your large JSON file
your_json <- "your_path.json"
# read only the first 100,000 lines and stream them in, so the whole
# 5 GB file never has to be held in memory as raw text at once
unpacked_json <- jsonlite::stream_in(textConnection(readLines(your_json, n = 100000)), verbose = FALSE)
Here you limit how many lines are read at once, so that R can process the file in manageable chunks. For more information, I would also recommend doing some research on this topic:
https://community.rstudio.com/t/how-to-read-large-json-file-in-r/13486
Reading a huge json file in R, issues
I know for sure that it is sometimes really hard to cope with documentation (and, like all other human beings, we are lazy; I don't like reading documentation myself), but I highly recommend making yourself familiar with the jsonlite documentation and vignettes. Here's the CRAN link: https://cran.r-project.org/web/packages/jsonlite/index.html
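If you eventually need more than the first 100,000 lines, stream_in() can also walk the entire file in pages via a handler, so the raw 5.1 GB never has to sit in memory at once. A minimal sketch, assuming your file is NDJSON (one JSON record per line), which is the format stream_in() expects:
library(jsonlite)
pages <- list()
stream_in(file("your_path.json"), pagesize = 10000, verbose = FALSE,
          handler = function(df) {
            # filter rows or drop columns here to keep only what you need
            pages[[length(pages) + 1]] <<- df
          })
big_df <- rbind_pages(pages)  # jsonlite's rbind for pages with varying columns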

Related

Loading a .RDS file in Python 3

I have a dataset in RDS format that I managed in RStudio, but I would like to open this in Python to do the analysis. Would it be possible to open this type of format into Python?
I tried the following code already:
pip install pyreadr
import pyreadr
result = pyreadr.read_r('/path/to/file.Rds')
However, I get a
MemoryError: Unable to allocate 18.9 MiB for an array with shape
(2483385,) and data type float64.
What can I do?
Pyreadr is a wrapper around the C library librdata, and librdata has a hardcoded limit on the size an R vector can have. The limit used to be very low in old versions, but it was increased. Your vector would fail in older versions but should work in a recent one, so please check that you are using the most recent version.
If that doesn't help, then it may be a bug. If you can share the file, please submit an issue on GitHub.
Here are links to the old issues in the librdata and pyreadr GitHub repositories (theoretically now solved):
https://github.com/WizardMac/librdata/issues/19.
https://github.com/ofajardo/pyreadr/issues/3
EDIT:
The limit is now permanently removed in pyreadr 0.3.0, so this should not be an issue anymore.
To my knowledge, you could store the data in a pandas dataframe, as mentioned in this link (the second option).
How can I explicitly free memory in Python?
Suppose you wrote a Python program that acts on a large input file to create a few million objects, and it's taking tons of memory: what is the best way to tell Python that you no longer need some of the data, and that it can be freed?
The simple answer to this problem is:
Force the garbage collector to release unreferenced memory with gc.collect().
I hope this answers your query.

Parsing JSON data from the web in R

I am fairly new to R, but the more I use it, the more I see how powerful it really is over SAS or SPSS. Just one of the major benefits, as I see them, is the ability to get and analyze data from the web. I imagine this is possible (and maybe even straightforward), but I am looking to parse JSON data that is publicly available on the web. I am not a programmer by any stretch, so any help and instruction you can provide will be greatly appreciated. Even if you just point me to a basic working example, I can probably work through it.
RJSONIO from Omegahat is another package which provides facilities for reading and writing data in JSON format.
rjson does not use S4/S3 methods and so is not readily extensible, but it is still useful. Unfortunately, it does not use vectorized operations and so is too slow for non-trivial data. Similarly, for reading JSON data into R, it is somewhat slow and so does not scale to large data, should this be an issue.
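If parsing speed matters for your data, it is easy to time the two parsers yourself. A rough sketch (big.json is a stand-in for any reasonably large JSON file on disk):
txt <- paste(readLines("big.json"), collapse = "")
system.time(rjson::fromJSON(txt))     # rjson parser
system.time(jsonlite::fromJSON(txt))  # jsonlite parser, for comparison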
Update (new Package 2013-12-03):
jsonlite: This package is a fork of the RJSONIO package. It builds on the parser from RJSONIO but implements a different mapping between R objects and JSON strings. The C code in this package is mostly from the RJSONIO Package, the R code has been rewritten from scratch. In addition to drop-in replacements for fromJSON and toJSON, the package has functions to serialize objects. Furthermore, the package contains a lot of unit tests to make sure that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
The jsonlite package is easy to use and tries to convert json into data frames.
Example:
library(jsonlite)
# url returning JSON from the Stack Exchange API (top badges on Stack Overflow)
url <- 'https://api.stackexchange.com/2.2/badges?order=desc&sort=rank&site=stackoverflow'
# read url and convert to data.frame
document <- fromJSON(txt=url)
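The Stack Exchange API wraps its results in an items field, which fromJSON() simplifies to a data frame; assuming the response shape hasn't changed, you can inspect it directly:
head(document$items)  # one row per badge, columns for rank, name, etc.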
Here is the missing example for rjson:
library(rjson)
url <- 'http://someurl/data.json'
document <- fromJSON(file=url, method='C')
The fromJSON() functions in RJSONIO, rjson and jsonlite don't return a simple 2D data.frame for complex nested JSON objects.
To overcome this you can use tidyjson. It takes in JSON and always returns a data.frame. It is currently not available on CRAN; you can get it here: https://github.com/sailthru/tidyjson
Update: tidyjson is now available in cran, you can install it directly using install.packages("tidyjson")
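A minimal sketch of the tidyjson approach, using a toy JSON string (gather_array() and spread_values() are the core verbs of the package):
library(tidyjson)
library(dplyr)  # tidyjson is built around the %>% pipe

json <- '[{"name": "pilot", "hours": 90}, {"name": "copilot", "hours": 45}]'
json %>%
  gather_array() %>%        # one row per array element
  spread_values(            # pull named fields out into columns
    name  = jstring("name"),
    hours = jnumber("hours")
  )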
For the record, rjson and RJSONIO do convert the JSON into R objects, but they don't really parse it into a usable structure per se. For instance, I receive ugly MongoDB data in JSON format, convert it with rjson or RJSONIO, then use unlist and tons of manual correction to actually parse it into a usable matrix.
Try the code below, using RJSONIO, in the console:
library(RJSONIO)
library(RCurl)
# download the raw JSON text, then parse it into an R object
json_file <- getURL("https://raw.githubusercontent.com/isrini/SI_IS607/master/books.json")
json_file2 <- RJSONIO::fromJSON(json_file)
head(json_file2)

readRDS fails to read in file in R. Is there an alternative?

I'm trying to read the RDS file I downloaded from here:
https://github.com/jcheng5/googleCharts/tree/master/inst/examples/bubble
However, when I try to load it into RStudio via:
data <- readRDS('/Users/nathanf/shinyCharts/healthexp.rds')
I get the Error: unknown input format.
I've searched and found a possible solution already posted on StackOverflow, but the solutions mentioned in it do not work.
Does not work:
readRDS(file) in R
Please note, I'm trying to do this with a freshly installed copy of R (3.2.1) on a Mac running Yosemite.
I've found articles online that say the readRDS function is now defunct. https://stat.ethz.ch/R-manual/R-devel/library/base/html/base-defunct.html
Sooooooo....dearest community, what should I do? Is there another way to read RDS files using a new function?
Any help would be much appreciated.
Thank you,
Nathan
I faced exactly the same problem, and I can recommend switching from .rds objects to .RData objects. Simply:
save(random_forest1, file = "random_forest2.RData")
and then
load("random_forest2.RData")
Just for clarity: after using the load() function, you will find your object under its original name, random_forest1.
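Incidentally, load() returns the names of the objects it restores, which helps when you don't remember what a .RData file contains:
restored <- load("random_forest2.RData")
restored  # "random_forest1" -- the names of the restored objects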

Speeding up matlab file import

I am trying to load a MATLAB file with the R.matlab package. The problem is that it keeps loading indefinitely (e.g. table <- readMat("~/desktop/hg18_with_miR_20080407.mat")). I have a genome file from the Broad Institute (hg18_with_miR_20080407.mat).
You can find it at:
http://genepattern.broadinstitute.org/ftp/distribution/genepattern/dev_archive/GISTIC/broad.mit.edu:cancer.software.genepattern.module.analysis/00125/1.1/
I was wondering: has anyone tried the package and have similar issues?
(hopefully helpful, but not really an answer, though there was too much formatting for a comment)
I think you may need to get a friend with MATLAB access to save the file into a more reasonable format, or use Python for data processing. It "hangs" for me as well (OS X 10.9, R 3.1.1). The following works in Python:
import scipy.io

# reads the .mat file into a dict mapping variable names to numpy arrays
mat = scipy.io.loadmat("hg18_with_miR_20080407.mat")
(You can see and work with the rg and cyto crufty numpy arrays, but they can't be converted to JSON with json.dumps, and even jsonpickle.encode coughs up a lungful of errors, i.e. you won't be able to use rPython to get access to the object, which was the original workaround I was looking for. So there is no serialization to a file either, and I have to believe the resultant JSON would have been ugly to deal with.)
Your options are to:
get a friend to convert it (as suggested previously)
make CSV files out of the numpy arrays in python
use matlab

Is there an existing way to import EDF files into R?

I have some European Data Format (EDF) files that I would like to import into R.
There are some Python libraries for parsing EDF files, and the EDF spec is available, so I know it's possible, but I would rather avoid writing the code myself if I could.
Does there already exist a facility for importing these kinds of files?
I was looking for the same thing. I found this function written by Fabien Feschet; it works well for my data: http://feschet.fr/?p=11
I found another resource recently that works very well. You need to download both read_edf.R and utilities.R:
https://github.com/bwrc/edf/tree/master/R
I tried looking for the same thing a while ago, but I couldn't find anything for R. I ended up using the biosig Python module to convert EDFs to ASCII. There is also this edf2ascii-converter.
I guess there wasn't any package available at the time the question was asked, but now you can use edfReader:
https://cran.r-project.org/web/packages/edfReader/
https://github.com/Pisca46/edfReader
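For completeness, a minimal sketch of the edfReader workflow (recording.edf is a placeholder file name):
library(edfReader)
hdr     <- readEdfHeader("recording.edf")  # parse the EDF header first
signals <- readEdfSignals(hdr)             # then read the signals it describes
summary(hdr)                               # overview of what the file contains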
