List of data frames in R: assignment problem

I have the following code:
child_tracks <- list()
for(i in 1:106)
{
  for(j in 1:5)
  {
    child_tracks[[i]][[j]] <-
      all_samples[[i]][sample(nrow(all_samples[[i]]),length_breakups[[i]][[j]]),]
  }
}
Here all_samples is a list of data frames, while length_breakups is a list of lists.
When I assign the result into child_tracks, R throws the error "subscript out of bounds". Assigning the same expression to a single variable works, but assigning it into the list doesn't. For example:
temp <-
all_samples[[i]][sample(nrow(all_samples[[i]]),length_breakups[[i]][[j]]),]
child_tracks[[i]][[j]] <-
all_samples[[i]][sample(nrow(all_samples[[i]]),length_breakups[[i]][[j]]),]
The former works; the latter doesn't. I have checked that the classes are all as expected, and so are the ranges of the for loops.
I just can't get around it. Any ideas?

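Initialise the list to its full size first. array(list(), c(106, 5)) creates a 106 x 5 matrix of empty (NULL) list elements, which you can then fill, and later read back, with [[row, column]]:
child_tracks <- array(list(), c(106,5))
for(i in 1:106)
{
  for(j in 1:5)
  {
    child_tracks[[i, j]] <-
      all_samples[[i]][sample(nrow(all_samples[[i]]),length_breakups[[i]][[j]]),]
  }
}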
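If you would rather keep the child_tracks[[i]][[j]] indexing from the question, a plain list of lists works as long as each level exists before you assign into it. The sketch below shows two ways to do that. It uses small made-up stand-ins for all_samples (3 data frames) and length_breakups (assumed to be a list of lists of sample sizes), so the sizes are assumptions, not the question's data. The first option preallocates with vector("list", n). The second builds the same structure with nested lapply() calls, so no preallocation is needed.

```r
set.seed(42)

# Hypothetical stand-ins for the question's data: 3 data frames and,
# for each one, a list of 5 sample sizes
all_samples <- lapply(1:3, function(k) data.frame(id = 1:20, value = rnorm(20)))
length_breakups <- lapply(1:3, function(k) as.list(c(2, 4, 6, 8, 10)))

n_outer <- length(all_samples)
n_inner <- 5

# Option 1: preallocate the outer list, then each inner list, before assigning
child_tracks <- vector("list", n_outer)
for (i in seq_len(n_outer)) {
  child_tracks[[i]] <- vector("list", n_inner)
  for (j in seq_len(n_inner)) {
    rows <- sample(nrow(all_samples[[i]]), length_breakups[[i]][[j]])
    child_tracks[[i]][[j]] <- all_samples[[i]][rows, , drop = FALSE]
  }
}

# Option 2: nested lapply() returns the list of lists directly
child_tracks2 <- lapply(seq_len(n_outer), function(i) {
  lapply(seq_len(n_inner), function(j) {
    rows <- sample(nrow(all_samples[[i]]), length_breakups[[i]][[j]])
    all_samples[[i]][rows, , drop = FALSE]
  })
})

nrow(child_tracks[[2]][[3]])  # 6
```

The lapply() version avoids index bookkeeping entirely, which makes this kind of "out of bounds" mistake impossible. Both versions leave the elements accessible as child_tracks[[i]][[j]].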

Related

User defined function - issue with return values

I regularly come up against the problem of how to categorise data frames from a list of data frames according to certain values within them (e.g. numeric values, factor strings, etc.). Here I am using a simplified version with vectors.
After writing messy for loops for this task a number of times, I am trying to write a function that solves the problem reusably. The code below returns a subscripting error (shown at the bottom); however, I don't think this is a subscripting problem but something to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are cleaner or better ways to write this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
  for (j in 1:20) {
    trialvec[i*j] <- segmentvalues[i]
  }
}
#vector categorisation
vcategorise <- function(categories, data) {
  #categorises a vector into a list of vectors
  #requires plyr and dplyr
  assignment <- list()
  catlength <- length(categories)
  for (i in 1:length(catlength)){
    for (j in 1:length(data)) {
      if (any(contains(categories[i], ignore.case = TRUE,
                       as.vector(data[j])))) {
        assignment[[i]][j] <- data[j]
      }
    }
  }
  return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in `*tmp*`[[i]] : subscript out of bounds
You are indexing into assignment and then indexing into whatever you get back. Indexing past the end of a list with [[ ]] is an error ("subscript out of bounds"), and that is what happens here, because assignment starts as an empty list: you haven't allocated it to the right size.
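A small sketch that reproduces the error and the fix in isolation; the value "2_B" is just a made-up element. Extracting element 1 of an empty list with [[ ]] fails. In a list preallocated with vector("list", n), an unset element is NULL, and NULL can be extended with [ ].

```r
# An empty list has no element 1, so the nested assignment fails
assignment <- list()
msg <- tryCatch({
  assignment[[1]][2] <- "2_B"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # "subscript out of bounds"

# Preallocated: each slot starts as NULL, and NULL[2] <- value gives c(NA, value)
assignment <- vector("list", 3)
assignment[[1]][2] <- "2_B"
print(assignment[[1]])  # NA "2_B"
```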
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
  assignment <- vector("list", length = length(data) * length(categories))
  n <- length(data)
  for (i in 1:length(categories)){
    for (j in 1:length(data)) {
      assignment[(i-1)*n + j] <-
        if (any(contains(categories[i],
                         ignore.case = TRUE,
                         as.vector(data[j])))) {
          data[j]
        } else {
          NA
        }
    }
  }
  dim(assignment) <- c(length(data), length(categories))
  assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.
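On the request for a cleaner approach, here is a minimal sketch of one base-R alternative that drops dplyr's contains(). vcategorise2 is a hypothetical name. It assumes that "contains" means a case-insensitive literal substring match. It returns a named list with one vector per category, holding the elements of data that match it. The dummy vector is a simpler stand-in for the question's trialvec.

```r
# Hypothetical base-R alternative: one vector of matches per category
vcategorise2 <- function(categories, data) {
  hits <- lapply(categories, function(category) {
    # case-insensitive literal substring match; NA elements never match
    data[!is.na(data) & grepl(tolower(category), tolower(data), fixed = TRUE)]
  })
  names(hits) <- categories
  hits
}

segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- rep(segmentvalues, times = 20)  # 20 copies of each value
result <- vcategorise2(segmentvalues, trialvec)
lengths(result)  # 20 for every category
```

Because lapply() builds the result list itself, there is nothing to preallocate and no index arithmetic. Elements are then retrieved by category name, e.g. result[["3_R"]].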

R - Saving the values from a For loop in a vector or list

I'm trying to save each iteration of this for loop in a vector.
for (i in 1:177) {
  a <- geomean(er1$CW[1:i])
}
Basically, I have a list of 177 values and I'd like the script to compute the cumulative geometric mean of the list, one value at a time. Right now it only gives me the final value; it doesn't save each loop iteration as a separate value in a list or vector.
The reason your code does not work is that the object a is overwritten in each iteration. The following code, for instance, does precisely what you want:
a <- c()
for(i in 1:177){
  a[i] <- geomean(er1$CW[1:i])
}
Alternatively, this would work as well:
for(i in 1:177){
  if(i != 1){
    a <- rbind(a, geomean(er1$CW[1:i]))
  }
  if(i == 1){
    a <- geomean(er1$CW[1:i])
  }
}
I started down a similar path with rbind, as nate_edwinton did, but couldn't get it to work. I did, however, come up with something effective: fill a data frame row by row, then coerce it back to a list.
MyNums <- data.frame(x=(1:177))
a <- data.frame(x=integer())
for(i in 1:177){
  a[i,1] <- geomean(MyNums$x[1:i])
}
a<-as.list(a)
You can also define a variable to hold the results first, and append each new value to it:
b <- c()
for (i in 1:177) {
  a <- geomean(er1$CW[1:i])
  b <- c(b,a)
}
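All of the loops above grow a vector one value at a time and recompute each geometric mean from scratch. The sketch below shows two alternatives. It defines its own geo_mean(), assuming strictly positive values, because the package that provides geomean() is not named in the question. The data are random stand-ins for er1$CW. The first alternative uses vapply(), which returns a numeric vector of the right length in one call. The second is vectorised: the geometric mean of the first i values equals exp(mean(log(x[1:i]))), so a running sum of logs divided by i gives all 177 values at once.

```r
set.seed(1)
cw <- runif(177, min = 1, max = 10)  # hypothetical stand-in for er1$CW (positive values)

# Geometric mean; assumes strictly positive input
geo_mean <- function(x) exp(mean(log(x)))

# One geometric mean per prefix length i, collected by vapply()
cum_gm_loop <- vapply(seq_along(cw), function(i) geo_mean(cw[1:i]), numeric(1))

# Vectorised: running mean of the logs, back-transformed
cum_gm_vec <- exp(cumsum(log(cw)) / seq_along(cw))

all.equal(cum_gm_loop, cum_gm_vec)  # TRUE
```

The vectorised version does a single pass over the data, whereas the loop versions do work proportional to i at every step.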

Extracting 3D netcdf variable from lists within nested loop in R

Say I have 10 model configurations of n timesteps for 3 different sites, producing a total of 30 netCDF files that I want to open and manipulate. I can open the 30 files like this:
require(ncdf4)
allfiles= list()
nmod=10
nsites=3
for (i in 1:nmod) {
  allfiles[[i]] = list(nc_open(paste0('Model',i,'siteA.nc')),
                       nc_open(paste0('Model',i,'siteB.nc')),
                       nc_open(paste0('Model',i,'siteC.nc')))
}
When querying the class of what was opened, I have
class(allfiles)
[1] "list"
class(allfiles[[1]][[1]])
[1] "ncdf4"
as expected.
Now what I would like to do is extract the values from a variable in the files such that
var=list()
for (i in 1:nmod) {
  for (j in 1:nsites) {
    var[[i]][[j]] <- ncvar_get(allfiles[[i]][[j]],"var1")
    nc_close(allfiles[[i]][[j]])
  }
}
but I get this error message:
Error in `*tmp*`[[i]] : subscript out of bounds
If I try
var[[i]] <- ncvar_get(allfiles[[i]][[j]],"var1")
it (understandably) only produces a list of 10 model configurations at one site, i.e. var[[1]][[1]][1] prints the value of the variable at model configuration 1, site A, timestep 1, but var[[1]][[2]] doesn't exist.
How can I declare var in the above loop so that it contains all the values for all models, all sites and all timesteps (e.g. for var[[1]][[2]][1] to exist)?
In your original version, the first pass through the inner loop tries to do var[[1]][[1]] <- something, but var[[1]] doesn't exist yet, so R doesn't know what to do. Setting var[[i]] <- list() before you do var[[i]][[j]] <- something should work:
var=list()
for (i in 1:nmod) {
  var[[i]] <- list()
  for (j in 1:nsites) {
    var[[i]][[j]] <- ncvar_get(allfiles[[i]][[j]],"var1")
    nc_close(allfiles[[i]][[j]])
  }
}
For example, if you do:
var <- list()
for (i in 1:10) {
  for (j in 1:10) {
    var[[i]][[j]] <- 1
  }
}
Then the same error happens. But if you set var[[i]] <- list() before carrying out the inner loop like this:
var <- list()
for (i in 1:10) {
  var[[i]] <- list()
  for (j in 1:10) {
    var[[i]][[j]] <- 1
  }
}
Then the problem will be solved.
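If every file has the same number of timesteps, a 3-D numeric array indexed as [model, site, timestep] is an alternative to a list of lists. Its size is known up front, so there is nothing to create level by level. The sketch below cannot open real netCDF files, so fake_ncvar_get() is a hypothetical stand-in for ncvar_get(allfiles[[i]][[j]], "var1") that returns ntime values. ntime = 5 and the result name vals are also assumptions.

```r
nmod <- 10
nsites <- 3
ntime <- 5  # assumed: same number of timesteps in every file

# Hypothetical stand-in for ncvar_get(allfiles[[i]][[j]], "var1")
fake_ncvar_get <- function(i, j) i * 100 + j * 10 + seq_len(ntime)

# Preallocate the full array, then fill one model/site slice at a time
vals <- array(NA_real_, dim = c(nmod, nsites, ntime),
              dimnames = list(model = NULL, site = c("A", "B", "C"), time = NULL))
for (i in seq_len(nmod)) {
  for (j in seq_len(nsites)) {
    vals[i, j, ] <- fake_ncvar_get(i, j)
  }
}

vals[1, 2, 1]   # model 1, site B, timestep 1: 121
vals[1, "B", ]  # whole time series for model 1, site B
```

With real data, vals[i, j, ] <- ncvar_get(...) fails loudly if a file has a different number of timesteps, which is a useful check. If the lengths can differ, the list-of-lists approach above is the safer choice.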

Using lapply to subset rows from data frames -- incorrect number of dimensions error

I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames but am getting an error message. The function and my attempt at using it with lapply is below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?
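You have two solutions. You can either
(a) assign the result to a new list: newList = lapply(scenbase, function(x) { x[33:152, , drop = FALSE] })
(b) use the <<- operator, which assigns your trimmed data in place: lapply(1:length(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152, , drop = FALSE] })
Your trim() call cannot change scenbase itself, because i is local to the function. The assignment inside trim() only modifies a local copy, and lapply() returns the trimmed copies in a new list that is discarded unless you assign it. You can work around that with the <<- operator, which assigns to the first variable of that name it finds in successive parent environments, or by keeping the new trimmed list. The "incorrect number of dimensions" error itself, though, means that at least one element of scenbase is not two-dimensional (for example a plain vector rather than a data frame), so it is worth checking sapply(scenbase, class).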
Here is some code that reproduces solution (a):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
choppedList = lapply(listOfDfs, function(x) { x[33:152,,drop=F]} )
Here is some code that reproduces solution (b):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
lapply(1:length(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152,,drop=F]} )
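R raises "incorrect number of dimensions" when [rows, ] is applied to an object that has no dimensions, such as a plain vector. A minimal sketch follows, using a made-up list in which one element is deliberately a character vector instead of a data frame. It reproduces the error, locates the culprit, and trims only the data frames.

```r
set.seed(1)
# Hypothetical list: three 326-row data frames plus one stray character vector
scenbase <- lapply(1:3, function(k) data.frame(x = seq_len(326), y = rnorm(326)))
scenbase[[4]] <- letters

# Reproduce the error on the stray element
msg <- tryCatch(scenbase[[4]][33:152, ], error = function(e) conditionMessage(e))
print(msg)  # "incorrect number of dimensions"

# Which elements are not data frames?
which(!vapply(scenbase, is.data.frame, logical(1)))  # 4

# Trim the data frames and leave anything else untouched
trimmed <- lapply(scenbase, function(df) {
  if (is.data.frame(df)) df[33:152, , drop = FALSE] else df
})
nrow(trimmed[[1]])  # 120
```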

Keep assigned objects in workspace through a function

I am trying to keep objects assigned inside a function (I'm building a ts function as a first step towards modelling a univariate process; simple, I know!). I can't find a way to keep these objects in my workspace. It works fine with a plain for loop, but I would like to parameterise the following:
ts.builder<-function(x,y,z){
  for(i in 9:13){
    assign(paste(x,i,sep="_"),ts(yardstick[1:528,i], freq=24))
    assign(paste(y,i,sep="_"),ts(yardstick[529:552,i], freq=24))
    assign(paste(z,i,sep="_"),ts(yardstick[1:552,i], freq=24))
  }
}
ts.builder("yard.book.training","yard.book.small.valid", "yard.book.valid")
Any pointers?
I suspect it needs a return statement, but I haven't managed to make that work yet.
Untested (a reproducible example helps a lot):
ts.builder <- function() {
  xd <- list()
  yd <- list()
  zd <- list()
  for (i in 9:13) {
    xd[[i]] <- ts(yardstick[1:528,i], freq=24)
    yd[[i]] <- ts(yardstick[529:552,i], freq=24)
    zd[[i]] <- ts(yardstick[1:552,i], freq=24)
  }
  list(yard.book.training=xd, yard.book.small.valid=yd, yard.book.valid=zd)
}
l <- ts.builder()
The returned values can then be accessed like this:
l$yard.book.training[[9]]
etc.
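Below is a hedged sketch of a parameterised variant. It assumes yardstick is a numeric matrix (or data frame) with at least 552 rows and 13 columns; random stand-in data are used here. ts_builder2 and its arguments are hypothetical names. Each series is named by its column number, so the returned lists have no empty slots 1 to 8 and can be indexed by name. If you really do want separate objects in the workspace, the question's assign() calls would also work with envir = .GlobalEnv added, but returning a list is usually easier to work with.

```r
ts_builder2 <- function(data, cols = 9:13, freq = 24) {
  # Build one ts per selected column for a given block of rows
  make_ts <- function(rows) {
    out <- lapply(cols, function(k) ts(data[rows, k], frequency = freq))
    names(out) <- paste0("col_", cols)
    out
  }
  list(yard.book.training    = make_ts(1:528),
       yard.book.small.valid = make_ts(529:552),
       yard.book.valid       = make_ts(1:552))
}

set.seed(1)
yardstick <- matrix(rnorm(552 * 13), nrow = 552)  # hypothetical stand-in data
l <- ts_builder2(yardstick)
length(l$yard.book.training$col_9)  # 528
```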
