Creating nested lists in a loop in R

This bit of code does what I want it to do, but generates a warning for every iteration of the loop:
library(epiR)
cccList <- list()
for (i in 3:ncol(dfData)){
tmpvar <- paste("cccIntactVs.", i, sep = "")
assign(
tmpvar,
epi.ccc(
dfData[2:nrow(dfData),2],
dfData[2:nrow(dfData),i],
ci = "z-transform",
conf.level = 0.95,
rep.measure = FALSE
)
)
cccList[i] <- get(paste0("cccIntactVs.", i))
}
I get this warning every time the output of epi.ccc() is added to cccList:
Warning in cccList[i] <- get(paste0("cccIntactVs.", i)) :
number of items to replace is not a multiple of replacement length
Is there a more proper way of accomplishing this? The output of epi.ccc() is a list of 7 elements. Since the output is the same length each time and I'm only adding to the list, why is it complaining about mismatched lengths or replacement?

You want to use [[i]] instead of [i].
Basically, [ means you want to replace a certain part of a list with different content, and the replacement needs to have just as many items as the number of slots you are trying to replace.
On the other hand, using [[ means you want to put everything you are assigning into one slot, which seems to be what you want here.
An example of what happens:
myList <- list(1,2,3,4,5,6,7)
myList[3:5] <- c(11, 12, 13)
myList[[6]] <- c(14, 15, 16)
Here, 11-13 are distributed among slots 3 through 5: 3 replacement items in 3 slots.
And 14-16 are placed in one slot: This slot now contains a length-3 vector.
Now what happens if we try this?
myList[1] <- c(17,18,19)
We tell R to distribute 3 items over one slot. It does the best it can, which is not much: it discards everything but the first item. Luckily, it warns you about it. If you really want to assign only the first item, you can use
myList[1] <- c(17,18,19)[1]
But that's not really useful: there is no point in supplying 18 and 19.
You could make it a list of length one:
myList[1] <- list(c(17,18,19))
But generally, if you want to put it in one slot, using [[ is the way to go.
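To see the three variants side by side, here is a small sketch (the copies a, b and d are just names made up for this illustration). suppressWarnings() is only there to keep the output quiet; without it, the first assignment produces the warning from the question.

```r
myList <- list(1, 2, 3)

# [ with a length-3 replacement for one slot: only the first value is kept
# (without suppressWarnings() this emits "number of items to replace ...")
a <- myList
suppressWarnings(a[1] <- c(17, 18, 19))

# [ with the vector wrapped in a list: one replacement item for one slot
b <- myList
b[1] <- list(c(17, 18, 19))

# [[ puts the whole vector into the slot
d <- myList
d[[1]] <- c(17, 18, 19)

a[[1]]           # only the first value survived
identical(b, d)  # the list() wrapper and [[ give the same result
```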
And as a sidetrack: why was it built this way?
The reason is that [ can give you access to multiple slots, and you might not know which ones or how many beforehand. What should happen if I try this?
someVar <- readLines(somefile) # someVar happens to be c(1, 2) instead of having length 1
myList[someVar] <- 21:23
Put 21:23 in both slots? Put 21 in the first slot and the rest (22, 23) in slot 2?
Using [[ means you are sure only one slot is used, and you're not unexpectedly overwriting anything.
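Applied to the loop from the question, this could look like the sketch below. It does not need assign()/get() at all. Note the assumptions: fake_ccc() is a made-up stand-in for epi.ccc() (so the example runs without epiR), the toy dfData is invented, and all rows are used rather than 2:nrow(dfData).

```r
# Hypothetical stand-in for epi.ccc(): returns a list with several elements
fake_ccc <- function(x, y) {
  list(rho = cor(x, y), n = length(x), diff = x - y)
}

# Invented toy data: column 2 is the reference, columns 3+ are compared to it
dfData <- data.frame(id  = 1:5,
                     ref = c(1.0, 2.1, 2.9, 4.2, 5.0),
                     a   = c(1.1, 2.0, 3.1, 4.0, 5.2),
                     b   = c(0.8, 2.3, 2.7, 4.5, 4.9))

cccList <- list()
for (i in 3:ncol(dfData)) {
  # [[ stores the whole result in one slot; using a name as the index
  # also avoids leaving slots 1 and 2 empty
  cccList[[paste0("cccIntactVs.", i)]] <- fake_ccc(dfData[[2]], dfData[[i]])
}

str(cccList, max.level = 1)
```

Each result is then available as cccList[["cccIntactVs.3"]] and so on, instead of as separate variables in the workspace.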

cccList[i] <- get(paste0("cccIntactVs.", i)) will trigger this warning if get(paste0("cccIntactVs.", i)) is not of length 1.
Using get(paste0("cccIntactVs.", i))[1] should silence the warning, but it keeps only the first element. If you didn't expect get(paste0("cccIntactVs.", i)) to have a length greater than 1, you likely have a mistake somewhere else in your code, even though the result looks fine to you now.

Related

Problem deleting elements with 2 values in R list

I am trying to format a list so that there is one word per value (I imported it from a very poor quality csv, and can't do much about improving the csv). I want every element to have only one value, but the code I am currently using does not do this, although I am not getting any error messages.
Here is the code I am currently using:
Terms <- [] #9020 elements with lengths 1, 2, and 3
for (x in 1:length(Terms)){
if (Terms[[x]] %>% is.list()){
term <-Terms[[x]]
length(term) <- 1
Terms[[x]]<-term
}
} # should return a list of the same size, but only with elements of length 1
Any help figuring out what I could use to make it so that I can delete any second variables would be appreciated.
An option would be to create a logical condition with lengths and then use that for subsetting the list
lst2 <- lst1[lengths(lst1) == 1]
If the intention is to get only the first element
lst2 <- lapply(lst1, `[`, 1)
NOTE: Assuming the list elements are vectors
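A quick illustration of the two options on a made-up list (lst1 here is invented data, with character vectors of length 1, 2 and 3):

```r
lst1 <- list("apple", c("pear", "plum"), c("fig", "kiwi", "lime"))

# lengths() gives the length of every element in one call
lengths(lst1)

# Option 1: keep only the elements that already have length 1
lst_keep <- lst1[lengths(lst1) == 1]

# Option 2: keep every element, but only its first value
lst_first <- lapply(lst1, `[`, 1)

str(lst_keep)
str(lst_first)
```

The first option shortens the list; the second keeps the list the same size, which seems closer to what the question asks for.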

How to assign an edited dataset to a new variable in R?

The title might be misleading but I have the scenario here:
half_paper <- lapply(data_set[,-1], function(x) x[x==0]<-0.5)
This line is supposed to substitute 0 for 0.5 in all of the columns except the first one.
Then I want to take half_paper and pass it to this line, which should rank all of the columns except the first one:
prestige_paper <-apply(half_paper[,-1],2,rank)
But I get an error and I think that I need to somehow make half_paper into a data set like data_set.
Thanks for all of your help
Your main issue ('This line is supposed to substitute 0 for 0.5 in all of the columns except the first one') can be remedied by adding another line to your anonymous function. The assignment operator <- returns the value of whatever is on its right-hand side, so your lapply was returning a value of 0.5 for each column. To fix this, add a line to the function that returns the modified vector.
It's also worth noting that lapply returns a list, not a data frame. apply is used instead of lapply here for consistency with your second line, but plyr::ddply may suit this specific need better.
half_mtcars <- apply(mtcars[, -1], 2, function(x) {
  x[x == 0] <- 0.5  # replace the zeros in this column
  x                 # return the modified column, not the value of the assignment
})
prestige_mtcars_tail <- apply(half_mtcars, 2, rank)
prestige_mtcars <- cbind(mtcars[, 1, drop = FALSE], prestige_mtcars_tail)
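If you would rather keep everything as a data frame, one possible variation (not the approach above, just a sketch on an invented data_set) is to assign the lapply result back into the columns with [-1]:

```r
# Invented example data: first column is an id, the rest are numeric
data_set <- data.frame(id = c("a", "b", "c"),
                       x  = c(0, 2, 5),
                       y  = c(3, 0, 0))

# The original line: the assignment is the last expression in the function,
# so every column comes back as the right-hand side, 0.5
broken <- lapply(data_set[, -1], function(x) x[x == 0] <- 0.5)

# Returning x explicitly, and writing the result back into a copy of the data frame
half_paper <- data_set
half_paper[-1] <- lapply(data_set[-1], function(x) {
  x[x == 0] <- 0.5
  x
})

# Rank every column except the first, again keeping the data frame
prestige_paper <- half_paper
prestige_paper[-1] <- lapply(half_paper[-1], rank)
prestige_paper
```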

How to prevent myvector[3:(L-2)] reading backwards when L<5?

I am processing records from a large dataset with varying lengths using data.table[, somefunc(someseries), by=]. The length L of each record someseries could be anything from 1 to 50. I want to handle the following efficiently without needlessly adding an if expression:
For each group, I want the simplest way to access its middle entries someseries[3:(L-2)]
Problem: beware that when L<5, the expression someseries[3:(L-2)] actually misbehaves by inferring backwards direction. This is due to the default "helpful" behavior of [from:to] which uses
seq(from..., to..., by = ((to - from)/(length.out - 1) ...) i.e. infers backwards direction by=-1
In that case I just want somefunc to get passed an empty vector() not someseries[4:2]
But you can't explicitly do seq(... by=1) because that errors if from > to.
Here's a testcase:
set.seed(15)
ragged_arrays <- lapply(ceiling(runif(5,1,5)), function(n) (1:n) )
# indexing with unwanted auto-backwards
lapply(ragged_arrays, function(someseries) someseries[3 : (length(someseries)-2)] )
For the sake of our testcase, somefunc is a function which behaves gracefully when passed an empty vector, e.g. median()
I'm assuming you want to drop the first two and last two elements.
ragged_arrays <- lapply(1:7, seq_len)
lapply(ragged_arrays, function(x) x[seq_along(x) > 2 & rev(seq_along(x)) > 2])
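Two other ways to get the same result, as a sketch: building the index with seq_len(), which returns an empty vector for a count of 0 instead of counting down, or chaining head() and tail() with negative n. The helper names middle and middle_ht are made up for this example.

```r
# seq_len(0) is empty, so a non-positive count never turns into a backwards index
middle <- function(x, drop = 2) {
  n <- length(x) - 2 * drop        # number of middle entries, may be <= 0
  x[drop + seq_len(max(n, 0))]
}

# tail(x, -2) drops the first two entries, head(..., -2) drops the last two
middle_ht <- function(x) head(tail(x, -2), -2)

ragged_arrays <- lapply(1:7, seq_len)
lapply(ragged_arrays, middle)
lapply(ragged_arrays, middle_ht)
```

Both return an empty vector whenever the input has fewer than 5 entries.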

How to subset a list based on the length of its elements in R

In R I have a function (ip2coordinates from the package RDSTK) which looks up 11 fields of data for each IP address you supply.
I have a list of IPs called ip.addresses:
> head(ip.addresses)
[1] "128.177.90.11" "71.179.12.143" "66.31.55.111" "98.204.243.187" "67.231.207.9" "67.61.248.12"
Note: Those or any other IP's can be used to reproduce this problem.
So I apply the function to that object with sapply:
ips.info <- sapply(ip.addresses, ip2coordinates)
and get a list called ips.info as my result. This is all good and fine, but I can't do much more with a list, so I need to convert it to a dataframe. The problem is that not all IP addresses are in the databases thus some list elements only have 1 field and I get this error:
> ips.df <- as.data.frame(ips.info)
Error in data.frame(`128.177.90.10` = list(ip.address = "128.177.90.10", :
arguments imply differing number of rows: 1, 0
My question is -- "How do I remove the elements with missing/incomplete data or otherwise convert this list into a data frame with 11 columns and 1 row per IP address?"
I have tried several things.
First, I tried to write a loop that removes elements with less than a length of 11
for (i in 1:length(ips.info)){
if (length(ips.info[i]) < 11){
ips.info[i] <- NULL}}
This leaves some records with no data and makes others say "NULL", but even those with "NULL" are not detected by is.null
Next, I tried the same thing with double square brackets and get
Error in ips.info[[i]] : subscript out of bounds
I also tried complete.cases() to see if it could potentially be useful
Error in complete.cases(ips.info) : not all arguments have the same length
Finally, I tried a variation of my for loop which was conditioned on length(ips.info[[i]] == 11 and wrote complete records to another object, but somehow it results in an exact copy of ips.info
Here's one way you can accomplish this using the built-in Filter function
#input data
library(RDSTK)
ip.addresses<-c("128.177.90.10","71.179.13.143","66.31.55.111","98.204.243.188",
"67.231.207.8","67.61.248.15")
ips.info <- sapply(ip.addresses, ip2coordinates)
#data.frame creation
lengthIs <- function(n) function(x) length(x)==n
do.call(rbind, Filter(lengthIs(11), ips.info))
or if you prefer not to use a helper function
do.call(rbind, Filter(function(x) length(x)==11, ips.info))
Alternative solution based on base package.
# find non-complete elements
ids.to.remove <- sapply(ips.info, function(i) length(i) < 11)
# remove found elements
ips.info <- ips.info[!ids.to.remove]
# create data.frame
df <- do.call(rbind, ips.info)
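Since ip2coordinates needs a web service, here is an offline sketch of the same idea on invented records with 3 fields instead of 11. It also shows why the loop in the question never removed anything: ips.info[i] is a sub-list of length 1, whatever it contains. Converting each record with as.data.frame before rbind is a variation on the answers above, chosen here so that the result is a regular data frame.

```r
# Invented stand-in for the ip2coordinates() output: complete records have 3 fields
ips.info <- list(
  "10.0.0.1" = list(ip = "10.0.0.1", lat = 1.5, lon = 2.5),
  "10.0.0.2" = list(ip = "10.0.0.2"),
  "10.0.0.3" = list(ip = "10.0.0.3", lat = 3.5, lon = 4.5)
)

length(ips.info[1])    # 1: single brackets return a list holding one element
length(ips.info[[1]])  # 3: double brackets return the element itself

# Keep only complete records, then stack them as one row each
complete <- Filter(function(x) length(x) == 3, ips.info)
ips.df <- do.call(rbind, lapply(complete, as.data.frame))
ips.df
```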

What's the shortest way of creating a load of R objects with consecutive names?

This is what I've got at the moment:
weights0 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights1 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights2 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights3 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights4 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights5 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights0 <- 1 # sets initial weights to 1
Nice and clear, but not nice and short!
Would experienced R programmers write this in a different way?
EDIT:
Also, is there an established way of creating a number of weights that depends on a pre-existing variable, to make this generalisable? For example, the parameter num.cons would equal 5: the number of constraints (and hence weights) that we need. I imagine this is a common programming problem, so surely there is a solution.
Option 1
If you want to create the different elements in your environment, you can do it with a for loop and assign. Other options are sapply and the envir argument of assign
for (i in 0:5)
assign(paste0("weights", i), array(dim=c(nrow(ind),nrow(all.msim))))
Option 2
However, as @Axolotl9250 points out, depending on your application, more often than not it makes sense to have these all in a single list:
weights <- lapply(rep(NA, 6), array, dim=c(nrow(ind),nrow(all.msim)))
Then to assign to weights0 as you have above, you would use
weights[[1]][ ] <- 1
Note the empty [ ], which is important for assigning to ALL elements of weights[[1]].
Option 3
As per @flodel's suggestion, if all of your arrays are of the same dim, you can create one big array with an extra dim of length equal to the number of objects you have (i.e., 6):
weights <- array(dim=c(nrow(ind),nrow(all.msim), 6))
Note that for any of the options:
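If you want to assign to all elements of an array while keeping its dimensions, you have to use empty brackets (a plain weights0 <- 1 replaces the whole array with the single number 1). For example, in option 3, to assign to the 1st array, you would use: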
weights[,,1][] <- 1
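To make the list version depend on num.cons, as asked in the edit to the question, something along these lines might work. The sizes n_ind and n_zone are made-up stand-ins for nrow(ind) and nrow(all.msim), which are not available here.

```r
num.cons <- 5   # number of constraints, as in the question
n_ind    <- 3   # hypothetical stand-in for nrow(ind)
n_zone   <- 2   # hypothetical stand-in for nrow(all.msim)

# One NA-filled array per weight, weights0 ... weights5, kept in a named list
weights <- replicate(num.cons + 1,
                     array(dim = c(n_ind, n_zone)),
                     simplify = FALSE)
names(weights) <- paste0("weights", 0:num.cons)

# Empty brackets fill every cell and keep the dimensions
weights[["weights0"]][] <- 1

dim(weights[["weights0"]])
names(weights)
```

Changing num.cons is then the only edit needed to get more or fewer weights.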
I've just tried to have a go at achieving this but with no joy, maybe someone else is better than I (most likely!!). However I can't help but feel maybe it's easier to have all the arrays in a single object, a list; that way a single lapply line would do, and instead of referring to weights1 weights2 weights3 weights4 it would be weights[[1]] weights[[2]] weights[[3]] weights[[4]]. Future operations on those arrays would then also be achieved by the apply family of functions. Sorry I can't get it exactly as you describe.
Given what you're doing, just using a for loop is quick and intuitive:
# create a character vector containing all the variable names you want..
variable.names <- paste0( 'weights' , 0:5 )
# look at it.
variable.names
# create the value to provide _each_ of those variable names
variable.value <- array( dim=c( nrow(ind) , nrow(all.msim) ) )
# assign them all
for ( i in variable.names ) assign( i , variable.value )
# look at what's now in memory
ls()
# look at any of them
weights4
