How can I get around memory problems in R?

I'm applying the function image(cp) to GPS data, but when I do so it throws the following error:
Error in image(as(x, "SpatialGridDataFrame"), ...) :
error in evaluating the argument 'x' in selecting a method for function 'image': Error: cannot allocate vector of size 12.3 Mb
The SpatialPointsDataFrame of my relocation GPS data has two columns: one with the coordinates, the other with the animal's ID.
I'm running this on a 32-bit system with 4 GB of RAM.
How do I get around this?

One way that might work with no thinking required:
library(raster)
r <- raster(cp)
image(r)
But you say cp is "gps data", so it's not at all clear why this would be imageable.
One thing you can do is plot it:
plot(cp)
That will work for a SpatialPointsDataFrame. If you want to create an image from this somehow you'll need to specify some details.
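If an image is what's wanted, one route is to rasterize the points first. A minimal sketch, assuming cp is the SpatialPointsDataFrame from the question and that the raster package is acceptable; the grid size of 100 x 100 is an arbitrary choice:

```r
library(raster)

# Points themselves can't be image()d; put them on a grid first,
# e.g. by counting relocations per cell
r <- raster(extent(cp), ncols = 100, nrows = 100)  # empty grid over the points' extent
counts <- rasterize(cp, r, fun = "count")          # number of relocations per cell
image(counts)
```

Counting per cell is just one option; rasterize() can also aggregate an attribute (e.g. field = "ID") if that is closer to what the image should show.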

Related

Dimension problem while reading netcdf4 files in 'stars' R package

In essence I'm trying to do a relatively simple set of operations on a collection of netcdf4 files I've downloaded. They're sourced from ESA's Lakes Climate Change Initiative database of satellite-derived limnological data, and each netcdf4 file represents one day in a time series going back to the 2000s or earlier. Each netcdf4 file contains a number of variables of interest (surface temperature, chlorophyll-a concentration, etc).
Using the stars package I was hoping to geographically subset the dataset to only the lake I'm interested in and create a monthly aggregated time series for the lake for a number of those variables.
Unfortunately it seems as though there might be something wrong with the netcdf4 files ESA provided, as I'm running into odd errors while working with them in the 'stars' package in R.
For reproducibility's sake here's a link to the file in question - it should be the first file in the directory. If you download it to a directory of your choice and setwd() to it you should be able to repeat what I've managed here:
library(stars)
setwd()  # set to the directory containing the downloaded file
lake_2015_01_01 <- read_stars('ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc')
## Compare to using the read_ncdf() function, also from stars:
lake_2015_01_01_nc <- read_ncdf('ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc')
Running the file through read_stars() produces the error:
Error in attr(x, "dimensions")[[along]] : subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)
While running it through read_ncdf() produces the following:
Warning messages:
1: In CPL_crs_from_input(x) :
GDAL Error 1: PROJ: proj_create: Error 1027 (Invalid value for an argument): longlat: Invalid value for units
2: In value[3L] : failed to create crs based on grid mapping
and coordinate variable units. Will return NULL crs.
Original error:
Error in st_crs.character(base_gm): invalid crs: +proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs +units=degrees
But it does successfully complete, just with a broken coordinate system that can be fixed by approximating the coordinate system originally set by the creators:
lake_2015_01_01_nc <- st_set_crs(lake_2015_01_01_nc, 4979)
However, the functions stars uses to manipulate data don't work on it, as it's a proxy object that points to the original netcdf4 file. The architecture of the stars package would suggest that if I want to manipulate the file I need to use read_stars().
I've tried to reverse-engineer the problem a little bit by opening the .nc file in QGIS, where it seems to perform as expected. I'm able to display the data and it seems to be georeferenced correctly, which makes me suspect the data is not being read into 'stars' correctly in the read_ncdf() function, and likely in the read_stars() function as well.
I'm not sure how to go about fixing this however, and I'd like to use both this dataset and stars in the analysis. Does anyone have any insight as to what might be going on here and if it's something I can fix?
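One possible workaround, sketched here under assumptions: read the arrays directly with ncdf4 and assemble the stars object by hand, sidestepping the CRS parsing that read_ncdf() chokes on. The variable and coordinate names ("lake_surface_water_temperature", "lon", "lat") are guesses; check names(nc$var) for the ones actually present in the file.

```r
library(ncdf4)
library(stars)

nc <- nc_open("ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20150101-fv2.0.2.nc")
print(names(nc$var))  # list the variables actually present

# Pull one variable plus its coordinate axes (names are assumptions)
lswt <- ncvar_get(nc, "lake_surface_water_temperature")
lon  <- ncvar_get(nc, "lon")
lat  <- ncvar_get(nc, "lat")
nc_close(nc)

# Build a stars object manually and set a plain lon/lat CRS
s <- st_as_stars(lswt)
s <- st_set_dimensions(s, 1, values = lon, names = "x")
s <- st_set_dimensions(s, 2, values = lat, names = "y")
s <- st_set_crs(s, 4326)
```

This loads the whole array into RAM rather than going through a proxy, so the usual stars manipulation functions should then apply.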

SuperLearner Error in R - Object 'All' not found

I am trying to fit a model with the SuperLearner package. However, I can't even get past the stage of playing with the package to get comfortable with it....
I use the following code:
superlearner <- SuperLearner::SuperLearner(Y = y, X = as.data.frame(data_train[1:30]), family = binomial(), SL.library = list("SL.glmnet"), obsWeights = weights)
y is a numeric vector with one entry per row of my dataframe "data_train", containing the correct labels with 9 different classes. The dataframe "data_train" contains 30 columns of numeric data.
When I run this, I get the error:
Error in get(library$screenAlgorithm[s], envir = env) :
Object 'All' not found
I don't really know what the problem could be, and I can't really wrap my head around the source code. Please note that obsWeights in the function call is a numeric vector of the same length as my data, with weights I calculated for the model. This shouldn't be the problem, as it doesn't work either way.
Unfortunately I can't really share my data on here, but maybe someone has had this error before...
Thanks!
This seems to happen if you do not attach SuperLearner; you can fix it via library(SuperLearner).
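In code, the fix is just to attach the package before the call; the objects below (y, data_train, weights) are the question's own:

```r
# Attaching SuperLearner puts its default screening algorithm "All" on the
# search path, which the bare SuperLearner:: call could not find
library(SuperLearner)

superlearner <- SuperLearner(
  Y = y,                                # outcome vector from the question
  X = as.data.frame(data_train[1:30]),  # predictors from the question
  family = binomial(),
  SL.library = list("SL.glmnet"),
  obsWeights = weights
)
```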

In mclapply : scheduled cores 9 encountered errors in user code, all values of the jobs will be affected

I went through the existing Stack Overflow links regarding this error, but no solution given there works (and some questions don't have solutions there either).
Here is the problem I am facing:
I run ARIMA models in parallel using mclapply from the parallel package. The sample data is split by key onto different cores and the results are combined using do.call + rbind (the server the script runs on has 20 CPU cores, which is passed to the mc.cores argument).
Below is my mclapply code:
print('Before lapply')
data_sub <- do.call(rbind, mclapply(ds,predict_function,mc.cores=num_cores))
print('After lapply')
I get multiple sets of values like the below as the output of 'predict_function'.
So basically, each core returns output as shown above, which is sent to rbind. The code works perfectly for some of the data. Then I get another set of data, just like the above with the same data type for each column, but different values in column 2.
The data type of each column is given in the column name above.
For the second case, I get below error:
simpleError in charToDate(x): character string is not in a standard unambiguous format
Warning message:
In mclapply(ds, predict, mc.cores = num_cores) :
scheduled cores 9 encountered errors in user code, all values of the jobs will be affected
I don't see the print('After lapply') output in the second case, but it is visible in the first.
I checked the date column in the above dataframe; it's in Date format. When I tried unique(df$DATE) it returned only valid values in the format given above.
What is the cause of the error here? Is it the first error that prevents mclapply from rbind-ing the values? Is the warning something we need to understand better?
Any advice would be greatly appreciated.
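One way to see the real per-key error is to wrap the worker in tryCatch, since mclapply only reports the generic "scheduled cores ... encountered errors" warning. A sketch with a hypothetical stand-in for the question's predict_function (here it fails deliberately for one key, mimicking the charToDate error):

```r
library(parallel)

# Hypothetical stand-in for the question's predict_function
predict_function <- function(d) {
  if (d$key == "bad") stop("character string is not in a standard unambiguous format")
  data.frame(key = d$key, forecast = mean(d$value))
}

ds <- list(list(key = "a",   value = 1:3),
           list(key = "bad", value = 4:6),
           list(key = "b",   value = 7:9))

# Catch errors per key so the offending key and message are surfaced,
# and return NULL instead of poisoning the whole rbind
safe_predict <- function(d) {
  tryCatch(predict_function(d),
           error = function(e) {
             message("key ", d$key, " failed: ", conditionMessage(e))
             NULL
           })
}

results  <- mclapply(ds, safe_predict, mc.cores = 2)
data_sub <- do.call(rbind, Filter(Negate(is.null), results))
```

With this pattern the two good keys still come back rbind-ed, and the failed key prints its actual error, which in this case points at a date column being parsed as character somewhere in the model code.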

Correctly setting up Shannon's Entropy Calculation in R

I was trying to run some entropy() calculations on force-platform data and I get a warning message:
> library(entropy)
> d2 <- read.csv("c:/users/SLA9DI/Documents/data2.csv")
> entropy(d2$CoPy, method="MM")
[1] 10.98084
> entropy(d2$CoPx, method="MM")
[1] 391.2395
Warning message:
In log(freqs) : NaNs produced
I am sure it is because entropy() is trying to take the log of a negative number. I also know R can handle complex numbers using complex(), but I have not been successful in getting it to work with my data. I did not get this error on my CoPy data, only the CoPx data (a force platform records Center of Pressure data in 2 dimensions). Does anyone have any suggestions on getting complex() to work on my data set, or is there another function that would work better to get a proper entropy calculation? Entropy shouldn't be that much greater in CoPx than in CoPy. I also tried it with some more data sets from other subjects and the same thing happened: the CoPx entropy measures gave me warning messages and the CoPy measurements did not. I am attaching a data set link so anyone can try it out and see if they can figure it out, as the data is a little long to post in here.
Data
Edit: Correct Answer
As suggested, I tried the table(...) function and received no warning/error message, and the entropy output was in the expected range as well. However, I had apparently overlooked a function in the package, discretize(), and that is what you are supposed to use to correctly set up the data for entropy calculation.
I think there's no point in applying the entropy function directly to your data. According to ?entropy, it
estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y
(emphasis mine). This means that you need to convert your data (which seems to be continuous) to count data first, for instance by binning it.
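A minimal sketch of that conversion, using the entropy package's own discretize() on simulated data standing in for the continuous CoPx column (the bin count of 20 is an arbitrary choice):

```r
library(entropy)

set.seed(1)
CoPx <- rnorm(1000)                       # stand-in for the real CoPx column
counts <- discretize(CoPx, numBins = 20)  # bin continuous values into observed counts
entropy(counts, method = "MM")            # entropy of the binned distribution
```

Because discretize() returns non-negative counts, log(freqs) never sees a negative value and the NaN warning disappears.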

R ffdf Error: is.null(names(x)) is not TRUE

I'm trying to understand what a particular error message means in R.
I'm using an ff data frame (ffdf) to build an rpart model. That works fine.
However, when I try to apply the predict function (or any function) using ffdfdply, I get a cryptic error message that I can't seem to crack. I'm hoping someone here can shed light on its meaning.
PredictedData <- ffdfdply(x = TrainingData, split = TrainingData$id,
                          FUN = function(x) {
                            x$Predicted <- predict(Model1, newdata = x)
                            x
                          })
If I've thought about this correctly, ffdfdply will take the TrainingData table, split it into chunks based on TrainingData$id, then apply the predict function using the model Model1. Then it will return each chunk (labelled x in the function) and combine them back together into the table PredictedData. PredictedData should be the same as TrainingData, except with an additional column called "Predicted".
However, when I run this, I get a rather unhelpful error message.
2014-07-16 21:16:17, calculating split sizes
2014-07-16 21:16:36, building up split locations
2014-07-16 21:17:02, working on split 1/30, extracting data in RAM of 32 split elements, totalling, 0.07934 GB, while max specified data specified using BATCHBYTES is 0.07999 GB
Error: is.null(names(x)) is not TRUE
In addition: Warning message:
In ffdfdply(x = TrainingData, split = TrainingData$id, FUN = function(x) { :
split needs to be an ff factor, converting using as.character.ff to an ff factor
Yes, every column has a name. Those names contain just alphanumeric characters plus periods. But the error message makes me think that the columns should not have names? I guess I'm confused about what this means exactly.
I appreciate any hints anyone can provide and I'll be happy to provide more detail.
I think I found the solution to this. It turns out I had periods in my column names. When I removed those periods, this worked perfectly.
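A minimal sketch of that rename, shown on a plain character vector of hypothetical column names; for the actual ffdf the same replacement would be assigned back via names(TrainingData) <- ..., assuming the ff package supports names<- on the object as usual:

```r
# Hypothetical column names with periods, as in the question
cols  <- c("animal.id", "x.coord", "pred.col")
clean <- gsub(".", "_", cols, fixed = TRUE)  # fixed = TRUE: "." is literal, not regex
clean

# Then, before calling ffdfdply:
# names(TrainingData) <- gsub(".", "_", names(TrainingData), fixed = TRUE)
```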
