Speeding up matlab file import - r

I am trying to load a matlab file with the R.matlab package. The problem is that it keeps loading indefinitely (e.g. table <- readMat("~/desktop/hg18_with_miR_20080407.mat"). I have a genome file from the Broad Institute (hg18_with_miR_20080407.mat).
You can find it at:
http://genepattern.broadinstitute.org/ftp/distribution/genepattern/dev_archive/GISTIC/broad.mit.edu:cancer.software.genepattern.module.analysis/00125/1.1/
I was wondering: has anyone tried the package and have similar issues?

(hopefully helpful, but not really an answer, though there was too much formatting for a comment)
I think you may need to get a friend with matlab access to save the file into a more reasonable format or use python for data processing. It "hangs" for me as well (OS X 10.9, R 3.1.1). The following works in python:
import scipy.io
mat = scipy.io.loadmat("hg18_with_miR_20080407.mat")
(you can see and work with the rg and cyto' crufty numpy arrays, but they can't be converted to JSON withjson.dumpsand evenjsonpickle.encodecoughs up a lung-full of errors (i.e. you won't be able to userPython` to get access to the object which was the original workaround I was looking for), so no serialization to a file either (and, I have to believe the resultant JSON would have been ugly to deal with).
Your options are to:
get a friend to convert it (as suggested previous)
make CSV files out of the numpy arrays in python
use matlab

Related

.json file is too large to be opened in R with rjson

I have a 5.1 GB json file that I would like to read in R using rjson. I want afterwards to construct a dataframe from it, however it won't load because the size is too large.
Is there any way to work around it?
Thank you for your help =)
Nina, I would recommend you using jsonlite package instead of rjson.
library(jsonlite)
your_json <- "your_path.json"
unpacked_json <- jsonlite::stream_in(textConnection(readLines(your_json, n=100000)),verbose=F)
Here you limit the page size to let IDE correctly read your JSON file. For more information I would also recommend you to make some research on this topic:
https://community.rstudio.com/t/how-to-read-large-json-file-in-r/13486
Reading a huge json file in R , issues
I know for sure that it is sometimes really hard to cope with documentation (and as all other human beings we are lazy); and I don't like to read doc-n myself, but I highly recommend you to make yourself familiar with jsonlite documentation and vignettes. Here's the CRAN link: https://cran.r-project.org/web/packages/jsonlite/index.html

Loading a .RDS file in Python 3

I have a dataset in RDS format that I managed in RStudio, but I would like to open this in Python to do the analysis. Would it be possible to open this type of format into Python?
I tried the following codes already:
pip install pyreadr
import pyreadr
result = pyreadr.read_r('/path/to/file.Rds')
However, I get a
MemoryError: Unable to allocate 18.9 MiB for an array with shape
(2483385,) and data type float64.
What can I do?
Pyreadr is a wrapper around the C library librdata, and librdata has a hardcoded limit on the size an R vector can have. The limit used to be very low in old versions, but it was increased. Your vector would fail in older versions but should work in a recent one, so please check that you are using the most recent version.
If that doesn't help, then it may be a bug. If you can share the file please submit an issue in github.
Here a link to the old issues in github librdata and pyreadr (theoretically now solved)
https://github.com/WizardMac/librdata/issues/19.
https://github.com/ofajardo/pyreadr/issues/3
EDIT:
The limit is now permently removed in pyreadr 0.3.0. Now this should not be an issue anymore.
From my knowledge, you could store the data to a pandas dataframe as mentioned in this link.
The second option(link)
How can I explicitly free memory in Python?
If you wrote a Python program that acts on a large input file to create a few million objects representing and it’s taking tons of memory and you need the best way to tell Python that you no longer need some of the data, and it can be freed?
The Simple answer to this problem is:
Force the garbage collector for releasing an unreferenced memory with gc.collect().
I hope this answers your query

R2PPT crashes R; are there alternatives to R2PPT?

I am attempting to automate the insertion of JPEG images into Powerpoint. I have a macro done for that already, except using R would be infinitely better for my purposes.
The package R2PPT should do this, I understand. However, I cannot use it. For example, when I try to use PPT.Open, I understand I can do it two different ways by calling method = "rcom" or method = "RDCOMClient". Using the latter, R will always crash, sending an error report to windows. Using the former, it tells me I need to install statconnDCOM , before giving the error:
Error in PPT.Open(x) : attempt to apply non-function.
I cannot install statconnDCOM freely, as I wouldn't call this work non-commercial. So if there isn't a way to get around this issue, are there at least some free alternatives to R2PPT so that I can save several hours of manual work with a simple R code? If there is a way for me to use R2PPT, that would be ideal.
Thanks!
Edit:
I'm using R version 2.15 and downloaded the most recent version of R2PPT. Powerpoint is 2007.
Do you have administrative privileges on this machine?
There is an issue with package RDCOMClient. It needs permissions to write file rdcom.err in the root of drive C:. If you don't have privileges to write to c:, there is a rather cumbersome workaround:
Close R
Create "c:\temp" folder if it doesn't exist.
Locate on your hard drive file rdcomclient.dll. It usually placed in \R\library\RDCOMClient\libs\i386\ and in \R\library\RDCOMClient\libs\x64\ (you need to patch file which corresponds your Windows version - 32 bit or 64 bit). It's recommended to make backup copy of this files before patching.
Open rdcomclient.dll in text editor (Notepad++, for example -http://notepad-plus-plus.org/)
Find in file string c:\rdcom.err - it occurs only once.
Go into overwrite mode (usually by pressing "Ins" key). It is very important that new path will have the same number of characters as original one. Type C:\temp\e.rr instead of c:\rdcom.err
Save the file.
Now all should work fine.
Arguably not an answer, but have you looked at using Sweave/knitr to render your presentations in LaTeX using something like Beamer? (As discussed on slide 17 here.)
Wouldn't help any with getting JPGs into a PowerPoint, but would certainly make putting R-output (numerical or graphical) into a presentation much easier!
Edit: if you want to use knitr (which I recommend), here's another reference.

plotting data in R from matlab

I was wondering if it is possible to work between matlab and R to plot some data. I have a script in matlab which generates a text file. From this I was wondering if it was possible to open R from within matlab and plot the data from this text file and then return to matlab.
For example, if I save a text file called test.txt, in the path 'E:\', and then define the path of R which in my case will be:
pathR = 'C:\Program Files\R\R-2.14.1\bin\R';
Is it possible to run a script already written in R saved under test1.R (saved in the same directory as test.txt) in R from matlab?
If you're working with Windows (from the path it looks like you are), you can use the MATLAB R-link from the File Exchange to pass data from Matlab to R, execute commands there, and retrieve output.
I don't use R so this is not something I have done but I see no reason why you shouldn't be able to use the system function to call R from a Matlab session. Look in the product documentation under the section Run External Commands, Scripts, and Programs for this and related approaches.
There are some platform-specific peculiarities to be aware of and you may have to wrestle a little with what is returned (though since you are planning to have R create a plot which is likely to be a side-effect rather than what is returned you may not). As ever this is covered quite well in the product's documentation
After using R(D)COM and Matlab R-link for a while, I do not recommend it. The COM interface has trouble parsing many commands and it is difficult to debug the code. I recommend using a system command from Matlab as described in the R Wiki. This also avoids having to install RAndFriends.

Is there an existing way to import EDF files into R?

I have some European Data Format (EDF) files that I would like to import into R.
There are some Python libraries for parsing EDF files and the EDF spec is available, so I know it's possible, but I would avoid writing code if I could.
Does there already exist a facility for importing these kinds of files?
Was looking for the same thing. Found this function written by Fabien Feschet - works well for my data. http://feschet.fr/?p=11
Found another resource recently. This works very well. Need to download both read_edf.R and utilities.R
https://github.com/bwrc/edf/tree/master/R
I tried look for the same thing a while ago, but I couldn't find anything for R. I ended up using biosig Python module to convert edfs to ascii. There is also this edf2ascii-converter.
I guess there wasn't any package available at the time when the question was asked but now you could use edfReader:
https://cran.r-project.org/web/packages/edfReader/
https://github.com/Pisca46/edfReader

Resources