Loading a .RDS file in Python 3 - r

I have a dataset in RDS format that I managed in RStudio, but I would like to open this in Python to do the analysis. Would it be possible to open this type of format into Python?
I tried the following codes already:
pip install pyreadr
import pyreadr
result = pyreadr.read_r('/path/to/file.Rds')
However, I get a
MemoryError: Unable to allocate 18.9 MiB for an array with shape
(2483385,) and data type float64.
What can I do?

Pyreadr is a wrapper around the C library librdata, and librdata has a hardcoded limit on the size an R vector can have. The limit used to be very low in old versions, but it was increased. Your vector would fail in older versions but should work in a recent one, so please check that you are using the most recent version.
If that doesn't help, then it may be a bug. If you can share the file please submit an issue in github.
Here a link to the old issues in github librdata and pyreadr (theoretically now solved)
https://github.com/WizardMac/librdata/issues/19.
https://github.com/ofajardo/pyreadr/issues/3
EDIT:
The limit is now permently removed in pyreadr 0.3.0. Now this should not be an issue anymore.

From my knowledge, you could store the data to a pandas dataframe as mentioned in this link.
The second option(link)
How can I explicitly free memory in Python?
If you wrote a Python program that acts on a large input file to create a few million objects representing and it’s taking tons of memory and you need the best way to tell Python that you no longer need some of the data, and it can be freed?
The Simple answer to this problem is:
Force the garbage collector for releasing an unreferenced memory with gc.collect().
I hope this answers your query

Related

R tcltk: error when trying to display a png file depending on the OS

This is an issue I am encountering for different pieces of codes I am writing in R.
Basically, I would like to generate a window that displays a picture (a .png file). Following for instance guidances from this or this, I come up with this kind of code:
library(tcltk)
tmpFile <- tempfile(fileext = ".png")
download.file("https://www.r-project.org/logo/Rlogo.png", tmpFile)
tcl("image","create","photo", "imageLogo", file=tmpFile)
win1 <- tktoplevel()
tkpack(ttklabel(win1, image="imageLogo", compound="image"))
This works fine under Mac OS, but not on Linux nor on Windows, where I am displayed such an error message:
[tcl] couldn't recognize data in image file
I can find some workarounds when I want to display graphs, using for instance packages tkrplot or igraph. Nonetheless, I would be really eager to understand why I got such errors when running my scripts on Linux or Windows, whereas it works just fine on Mac OS.
Apologies in case this issue is obvious, but I haven't found anything about potential differences with the tcltk package depending on the OS.
Tk's native support for PNG was added in 8.6. Prior to that, you need to have the tkimg extension loaded into Tk to add the image format handler required. If your installation of Tcl/Tk that R is using is set up right, you can probably make it work with:
tclRequire("Img")
once you've initialised things sufficiently. Yes, the name used internally is “Img” for historical reasons, but that's just impossible to search for! (This is the key thing in this mailing list message from way back.)
However, upgrading the versions of Tcl and Tk to 8.6 is likely to be a better move.
Finally and a bit lately, I would like to close this issue and sum up the different suggestions that were kindly made in response of my question:
R comes along with Tcl 8.5, even with the latest version 3.3.2, which means that there is no way for embedding a PNG file with the usual command into a window created thanks to Tcl/Tk. For some reasons it is working on Mac OS, but do not expect this to work easily on other OSs.
In order to display pictures, graphs, etc. in a window generated by Tcl/Tk in R, better look for either using the GIF support (when possible) or trying alternative solutions (see the question for possible alternative options).
In case one really wants to display PNG files, the solution consists of installing Tcl 8.5 (for instance ActiveTcl) along with the extension Img. In order to use the Tcl/Tk package that you've just installed on your computer, you can refer to the R FAQ for Windows for instance (as stated in the FAQ, you need to install Tcl 8.5 - I tried with Tcl 8.6, thereby hoping to solve my issue, but it didn't work). Basically, you need to set up an environment variable (MY_TCLTK) and put the path where the package Tcl/Tk is installed. Needless to be said, Tcl/Tk is commonly used in R in order to implement GUIs; if you have to go through very complex procedures to set up the system, the package definitely loses its advantages.
Finally, since Tcl 8.6 should be available soon or later with R (already implemented in the devel version), this issue will be de facto outdated.

Speeding up matlab file import

I am trying to load a matlab file with the R.matlab package. The problem is that it keeps loading indefinitely (e.g. table <- readMat("~/desktop/hg18_with_miR_20080407.mat"). I have a genome file from the Broad Institute (hg18_with_miR_20080407.mat).
You can find it at:
http://genepattern.broadinstitute.org/ftp/distribution/genepattern/dev_archive/GISTIC/broad.mit.edu:cancer.software.genepattern.module.analysis/00125/1.1/
I was wondering: has anyone tried the package and have similar issues?
(hopefully helpful, but not really an answer, though there was too much formatting for a comment)
I think you may need to get a friend with matlab access to save the file into a more reasonable format or use python for data processing. It "hangs" for me as well (OS X 10.9, R 3.1.1). The following works in python:
import scipy.io
mat = scipy.io.loadmat("hg18_with_miR_20080407.mat")
(you can see and work with the rg and cyto' crufty numpy arrays, but they can't be converted to JSON withjson.dumpsand evenjsonpickle.encodecoughs up a lung-full of errors (i.e. you won't be able to userPython` to get access to the object which was the original workaround I was looking for), so no serialization to a file either (and, I have to believe the resultant JSON would have been ugly to deal with).
Your options are to:
get a friend to convert it (as suggested previous)
make CSV files out of the numpy arrays in python
use matlab

How can I remove a lock from a linked environment in R?

I tried to run a Bioconductor package (truncateCDF) that modify an environment(hgu133plus2cdf), to remove unwanted probesets, from an affymetrix chip.
Everything went fine, until I got the following message (translated from french):
> assign(cdfname, cdf.env, env=CDF.env)
Error in assign(cdfname, cdf.env, env = CDF.env) :
impossible to change the value of a locked link for 'hgu133plus2cdf'
The assign function is the ultimate function of the code, that save the changes made to the environment dataset CDF.env to the original environment (hgu133plus2cdf), before using it in analyses of affymetrix chip results; so, it is essential.
My question: what is this locked link to the hgu133plus2cdf environment, and how could I bypass it.
The author of this package successfully run its package around 2005; so I suppose it is a feature introduced since then in R (probably not related to Bioconductor, since assign is a basic R function, reason why I ask this question on this forum instead of Biostar).
I tried to read the docs, but I am overwhelmed, when it comes to environments.
Thanks in advance for any help.
I don't think truncateCDF is from a Bioconductor package; it is a at least not current. This sounds like this post and the next two from the same thread from the Bioconductor mailing list. It is a result of a change in R -- packages now have not-easily-modified name spaces, and these are implemented by locking the environment in which name space symbols are defined. Removing probes is not an essential part of a typical microarray work flow. Please ask on the Bioconductor mailing list (no subscription required) if you'd like more help.

R2PPT crashes R; are there alternatives to R2PPT?

I am attempting to automate the insertion of JPEG images into Powerpoint. I have a macro done for that already, except using R would be infinitely better for my purposes.
The package R2PPT should do this, I understand. However, I cannot use it. For example, when I try to use PPT.Open, I understand I can do it two different ways by calling method = "rcom" or method = "RDCOMClient". Using the latter, R will always crash, sending an error report to windows. Using the former, it tells me I need to install statconnDCOM , before giving the error:
Error in PPT.Open(x) : attempt to apply non-function.
I cannot install statconnDCOM freely, as I wouldn't call this work non-commercial. So if there isn't a way to get around this issue, are there at least some free alternatives to R2PPT so that I can save several hours of manual work with a simple R code? If there is a way for me to use R2PPT, that would be ideal.
Thanks!
Edit:
I'm using R version 2.15 and downloaded the most recent version of R2PPT. Powerpoint is 2007.
Do you have administrative privileges on this machine?
There is an issue with package RDCOMClient. It needs permissions to write file rdcom.err in the root of drive C:. If you don't have privileges to write to c:, there is a rather cumbersome workaround:
Close R
Create "c:\temp" folder if it doesn't exist.
Locate on your hard drive file rdcomclient.dll. It usually placed in \R\library\RDCOMClient\libs\i386\ and in \R\library\RDCOMClient\libs\x64\ (you need to patch file which corresponds your Windows version - 32 bit or 64 bit). It's recommended to make backup copy of this files before patching.
Open rdcomclient.dll in text editor (Notepad++, for example -http://notepad-plus-plus.org/)
Find in file string c:\rdcom.err - it occurs only once.
Go into overwrite mode (usually by pressing "Ins" key). It is very important that new path will have the same number of characters as original one. Type C:\temp\e.rr instead of c:\rdcom.err
Save the file.
Now all should work fine.
Arguably not an answer, but have you looked at using Sweave/knitr to render your presentations in LaTeX using something like Beamer? (As discussed on slide 17 here.)
Wouldn't help any with getting JPGs into a PowerPoint, but would certainly make putting R-output (numerical or graphical) into a presentation much easier!
Edit: if you want to use knitr (which I recommend), here's another reference.

Warning / Error when Importing a .sav

I have two versions of SPSS at work. SPSS 11 running on Windows XP and SPSS 20 running on Linux. Both copies of SPSS work fine. Files created with either version of SPSS open without incident on the other version of SPSS. I.E. - I can create a .sav file with SPSS 20 on Linux and open it on SPSS 11 on Windows without incident.
But, if I create a .sav file with SPSS 20 and import the data into either R or PSPP (on Linux), I get a bunch of warnings. The data appears to import correctly, but I am concerned by the warnings. I do not see any warning when importing a .sav from SPSS 11 or other .sav files I have been sent. Many of the analysts at my company use SPSS so I've gotten SPSS files from different versions of SPSS and I have never before seen this warning. The warning messages are nearly identical between PSPP and R which makes sense. AFAIK, they use the same underlying libs to import the data. This is the R error:
Warning messages:
1: In read.spss("test.sav") :
test.sav: File-indicated value is different from internal value for at least one of the three system values. SYSMIS: indicated -1.79769e+308, expected -1.79769e+308; HIGHEST: 1.79769e+308, 1.79769e+308; LOWEST: -1.79769e+308, -1.79769e+308
2: In read.spss("test.sav") :
test.sav: Unrecognized record type 7, subtype 18 encountered in system file
The .sav file is really simple. It has two columns, dumb and dumber. Both are integers. The first two contains two values of 1.0. The second row contains two values of 2.0. I can provide the file on request (I don't see any way to upload it to SO). If anyone would like to see the actual file, PM me and I'll send it to you.
dumb dumber
1.0 1.0
2.0 2.0
Thoughts? Anyone know the best way to file a bug against R without getting roasted alive on the mailing list? :-)
EDIT: I used the term "Error" in the title line. I'll leave it, but I should not have used this word. The comments below are correct in pointing out that the messages I am seeing are warnings, not errors. I do however feel that this is made clear in the body of the question above. Clearly, the SPSS data format has changed over time and SPSS/IBM have failed to document these changes which is the root of the problem.
It's not an error message. It is only a warning. SPSS refuses to document their file formats so people have not been motivated to track down by reverse engineering the structure of new "subtypes". There is no way to file a bug report without getting roasted because there is no bug .... other than a closed format and that bug complaint should be filed with the owners of SPSS!
EDIT: The R-Core is a volunteer group and takes it responsibilities very seriously. It exerts major efforts to track down anything that affects the stability of systems or produces erroneous calculations. If you were willing to be a bit more respectful of the authors of R and suggest the possibility of collaboration on the R-devel mailing list to identify solutions to this problem without using the term "bug", you would arouse much less hostility. There might be someone who would be willing to see if a simple .sav file such as the one you constructed could be examined under a hexadecimal microscope to identify whatever infinite negative value is being mistaken for another infinite negative value. Most of the R-Core is not in possession of working copies of SPSS.
You could offer this link as an example of the product of others who have attempted the reverse engineering of SPSS .sav formats:
http://svn.opendatafoundation.org/ddidext/org.opendatafoundation.data/references/pspp_source/sfm-read.c
Edit: 4/2015; I have seen a recent addition to the ?read.spss help file that refers one to pkg:memisc: "A different interface also based on the PSPP codebase is available in package memisc: see its help for spss.system.file." I have used that package's function successfully (once) on files created by more recent versions of SPSS.
The SPSS file format is not publicly documented and can change, but IBM SPSS does provide free libraries that can read and write the SAV file format. These mask any changes to the format. You can get them from the SPSS Community website (along with many other free goodies including the SPSS integration with R). Go to www.ibm.com/developerworks/spssdevcentral and look around. BTW, there have been substantial additions/changes to the sav file since year 2000, although the core data can still be read by old versions.
HTH,
Jon Peck

Resources