Reading a large file into R

I am trying to read a large space-delimited file (14 GB) of 49,376 rows and 73,625 columns into R for analysis.
I have tried using fread from the data.table package, as suggested here.
I receive the error
Error: segfault from C stack overflow
Is there another approach that could be used here? Any other packages or some kind of work around for this error? My R session info is below.
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
[7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.4
loaded via a namespace (and not attached):
[1] chron_2.3-45 tools_3.0.2

The error was occurring due to insufficient memory. Once I increased the memory limit, fread worked as expected.
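A rough back-of-the-envelope estimate shows why memory is the bottleneck here. The sketch below assumes every value is stored as a double (8 bytes); that is an assumption, since the question does not say what the columns contain, and it ignores the extra working memory a reader needs while parsing.

```r
# Dimensions taken from the question.
n_rows <- 49376
n_cols <- 73625

# Assumption: every value is stored as a double (8 bytes per value).
bytes_per_value <- 8

n_cells  <- n_rows * n_cols
size_gib <- n_cells * bytes_per_value / 1024^3

cat(sprintf("cells: %.0f, approx. size as doubles: %.1f GiB\n",
            n_cells, size_gib))
```

Under that assumption the table needs roughly 27 GiB once loaded, about twice its size on disk, so a machine or memory limit sized for the 14 GB file alone would likely not be enough.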

Related

Why am I referred to the manual page for "Arithmetic" when I type `?mgcv-faq` without `library(mgcv)`?

I noticed this after making a mistake. I wanted the manual page for mgcv.FAQ, but I forgot to run library(mgcv) and mistakenly typed ?mgcv-faq.
Strangely, R showed me the same help page as if I had typed ?Arithmetic.
What is going on? After I run library(mgcv), typing ?mgcv-faq gives an error instead:
#Error in eval(argExpr, envir) : object 'mgcv' not found
#Error in .signatureFromCall(fdef, expr, envir, doEval) :
# error in trying to evaluate the expression for argument ‘e1’ (mgcv)
Can anyone explain this behavior?
sessionInfo() before library(mgcv):
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.4
sessionInfo() after library(mgcv):
## only shows what is different from the above
other attached packages:
[1] mgcv_1.8-24 nlme_3.1-137
loaded via a namespace (and not attached):
[1] compiler_3.4.4 Matrix_1.2-14 tools_3.4.4 grid_3.4.4
[5] lattice_0.20-35
I don't have RStudio, so I run R in a terminal, and the manual page is displayed there.
You can reproduce this with:
expr <- quote(mgcv-faq)
utils:::.helpForCall(expr, parent.frame())
.helpForCall gets called by ? internally with this input.
Now, if the expression does not contain :: or ::: calls, .helpForCall extracts the symbol to look up by doing
expr[[1L]]
#`-`
without checking the length of the expression.
The - operator comes first because R parses mgcv-faq as the function call `-`(mgcv, faq), so the help page for - (the Arithmetic page) is what gets looked up.
Why does this happen? Because ? should be able to handle :: and ::: in its input and nobody thought of handling a user error such as yours.
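The parsing step described above can be inspected directly in base R. This is only an illustration of how the expression is structured, not a copy of what utils does internally:

```r
# Quote the expression so it is parsed but not evaluated.
expr <- quote(mgcv-faq)

class(expr)    # a call object, not a single symbol
length(expr)   # the function plus its two arguments
as.list(expr)  # the components: `-`, mgcv, faq

# The first element is the subtraction operator itself.
first_is_minus <- identical(expr[[1L]], as.name("-"))

# Infix and prefix notation parse to the same call.
same_call <- identical(expr, quote(`-`(mgcv, faq)))

first_is_minus
same_call
```

If the goal is the FAQ page itself, help("mgcv.FAQ", package = "mgcv") should find it without attaching the package, assuming mgcv is installed.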

What can I do about frequent, unreproducible segfault errors in R 3.0.1?

After upgrading to R 3.0.X, I've started getting pretty frequent, unreproducible segfault errors like those found by this asker. I never had one of these errors before with R 2.X.X. For example, this is the session info for a long block of code that just caused a fault. However, after R crashed, I ran the entire block of code again and there was no error.
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] klaR_0.6-8 ggplot2_0.9.3.1 labdsv_1.5-0 MASS_7.3-26 mgcv_1.7-22
[6] cluster_1.14.4 sparcl_1.0.3 FD_1.0-11 vegan_2.0-7 permute_0.7-0
[11] geometry_0.3-3 magic_1.5-4 abind_1.4-0 ape_3.0-8 ade4_1.5-2
[16] plyr_1.8
loaded via a namespace (and not attached):
[1] class_7.3-7 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3
[5] e1071_1.6-1 grid_3.0.1 gtable_0.1.2 labeling_0.1
[9] lattice_0.20-15 Matrix_1.0-12 munsell_0.4 nlme_3.1-109
[13] proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3
[17] stringr_0.6.2
Sometimes R freezes completely and I have to do a force quit, but other times, it allows me to exit with options for a core dump or saving the workspace.
Someone in another post suggested setting options(CBoundsCheck=T) and that seemed to work for a while but I am still getting frequent faults.
I don't think these faults are related to any specific kind of calculation or function as I got one the other day after starting a new session and only setting my working directory and options. The code that triggered the fault and the system info would look like this:
#Set my working directory
setwd("~/Documents/School Spring 2013/Quaternary Dissasembly/Functional Diversity Basefile 3")
#Keep getting segfaults all the time. This might fix it
options(CBoundsCheck=T)
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
Any help anyone could give me to find and fix the bug or the incorrect settings I have would be greatly appreciated. Thank you.
-M

segfault R package check

I am updating a package qdap that had a single note on CRAN in its previous build, which was resolved.
In the qdap development version, under R 2.15.3, using --as-cran and --resave-data=best, I get the following note:
* checking R code for possible problems ... NOTE
Segmentation fault
Why is this segfault occurring (I'm somewhat new to Linux)? How can I find it? I googled this for some time but I couldn't figure out what the problem is. I gather there's some sort of problem in my code but...
On Windows I get no notes.
I can provide more info if needed (though the GitHub repo is available).
Session Info:
R version 2.15.3 (2013-03-01)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pacman_0.2.0 XML_3.9-4
loaded via a namespace (and not attached):
[1] tools_2.15.3

R: ABC error with 1 Ss

Edit: Solved, the error disappeared when I updated the package.
I'm getting an error when working with just one summary statistic. Is there any reason why this is happening? Is there a way to get around this problem?
Thanks
library(abc)
data(human)
target<-(stat.voight["hausa",])[,1]
sumstat<-(stat.3pops.sim)[,1]
modsel.ha <- postpr(target, models, sumstat, tol=.05, method="mnlogistic")
#Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
Additional details: no other objects were loaded (to my knowledge) as the R session had just been started. When I use two summary statistics instead of just one, postpr works fine.
Session details as per request from nograpes
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] abc_1.5 locfit_1.5-7 quantreg_4.79 SparseM_0.96 nnet_7.3-4 MASS_7.3-21
loaded via a namespace (and not attached):
[1] grid_2.15.1 lattice_0.20-10 tools_2.15.1
The problem went away when I upgraded abc. Presumably it was a bug in the older version of the package.
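For readers who hit the same message elsewhere: one common way to get "numeric 'envir' arg not of length one" is that selecting a single column with [, 1] silently drops a matrix to a plain vector, which later reaches eval() as its environment. It is not confirmed that this is what happened inside the old abc version. The sketch below uses a made-up matrix (sims is a hypothetical stand-in, not the abc data) to show the dropping behaviour and the resulting error.

```r
# Hypothetical stand-in for a table of simulated summary statistics.
sims <- matrix(rnorm(12), nrow = 4, ncol = 3,
               dimnames = list(NULL, c("s1", "s2", "s3")))

one_stat_vec <- sims[, 1]                # dimensions dropped: plain numeric vector
one_stat_mat <- sims[, 1, drop = FALSE]  # dimensions kept: 4 x 1 matrix

is.null(dim(one_stat_vec))
dim(one_stat_mat)

# eval() rejects a numeric vector of length > 1 as its environment;
# capture the error message instead of stopping.
msg <- tryCatch(eval(quote(s1), one_stat_vec),
                error = function(e) conditionMessage(e))
msg
```

Using drop = FALSE keeps the column as a one-column matrix, which is often what code written for several statistics expects.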

difficulties with installing/using Cairo / R 2.12.1

I have compiled R version 2.12.1 (2010-12-16) for my system (CentOS), and afterwards I installed Cairo_1.4-5.
I'd like to use Cairo to produce PNG (and maybe PDF) output of my graphs when I batch-invoke my scripts. Using X gives me difficulties (I am tunneling X through ssh and it often disconnects while the R scripts are running), which is why I want to use Cairo.
(I do not have root access btw).
Although it's installed according to sessionInfo
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
I can not call CairoPDF:
> CairoPDF()
Error: could not find function "CairoPDF"
Weirdly, I can invoke a function called cairo_pdf
> cairo_pdf()
>
I can not call CairoPNG, cairo_PNG or cairo_png:
> CairoPNG()
Error: could not find function "CairoPNG"
> cairo_PNG()
Error: could not find function "cairo_PNG"
> cairo_png()
Error: could not find function "cairo_png"
Is something wrong with my installation? Frankly I have no idea how to proceed from here. Why can't I even call CairoPNG()?
On Ubuntu I have:
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i486-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=af_ZA.utf8 LC_NUMERIC=C
[3] LC_TIME=af_ZA.utf8 LC_COLLATE=af_ZA.utf8
[5] LC_MONETARY=C LC_MESSAGES=af_ZA.utf8
[7] LC_PAPER=af_ZA.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=af_ZA.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
Then run library(Cairo). If the package is not installed, you will see:
library(Cairo)
Error in library(Cairo) : there is no package called 'Cairo'
This means you will have to install the Cairo R package, which interfaces with the Cairo graphics system
> install.packages('Cairo')
It will download, build and install the package - you don't need root for this
If it was successful, you can run
> library(Cairo)
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i486-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=af_ZA.utf8 LC_NUMERIC=C
[3] LC_TIME=af_ZA.utf8 LC_COLLATE=af_ZA.utf8
[5] LC_MONETARY=C LC_MESSAGES=af_ZA.utf8
[7] LC_PAPER=af_ZA.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=af_ZA.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Cairo_1.4-5
HTH
This should do:
install.packages("Cairo")
library(Cairo)
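A note on why cairo_pdf() worked in the question while CairoPDF() did not: cairo_pdf() is part of base R's grDevices, whereas CairoPNG() and CairoPDF() only exist once the Cairo package is installed and attached. For batch scripts, a minimal sketch along these lines picks whichever cairo-based PNG device is available. The helper names choose_png_backend and open_png are hypothetical, not part of any package.

```r
# Decide which cairo-based PNG device can be used without an X server.
choose_png_backend <- function() {
  if (requireNamespace("Cairo", quietly = TRUE)) {
    "Cairo"        # CRAN package: CairoPNG(), CairoPDF()
  } else if (capabilities("cairo")[[1L]]) {
    "grDevices"    # base R built with cairo: png(type = "cairo"), cairo_pdf()
  } else {
    "none"
  }
}

# Open a PNG device on the chosen backend (hypothetical helper).
open_png <- function(path, backend = choose_png_backend()) {
  switch(backend,
    Cairo     = Cairo::CairoPNG(filename = path),
    grDevices = png(filename = path, type = "cairo"),
    stop("no cairo-based PNG device available")
  )
}

backend <- choose_png_backend()
backend
```

In a script, one would call open_png("plot.png"), draw the plot, and finish with dev.off(). Falling back to png(type = "cairo") avoids the Cairo package entirely when base R was built with cairo support.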
