Convert a vector in R to a nested list - r

I have a vector of length n and I want to convert it to a nested list. The outer list should be of length n / 2 and each item in this list should have two sub-lists. The first sub-list in each list will hold an odd-numbered element from the vector and the second sub-list will hold an even-numbered element (see my example code if this isn't making sense; I'm having a hard time describing it in general terms).
Question: Is there a way to convert this vector into a nested list without using for loops?
I'm doing this in the context of a simulation, so I want to make sure it is as fast as possible. The vector will vary in length between iterations of the simulation, so I'm trying to find an answer I can generalize to a vector of length n. The vector's length will always be even, though.
Example vector and list:
ex_vector <- 1:6
ex_list <- list(
list(1, 2),
list(3, 4),
list(5, 6)
)
Edit: fixed an error in my example code

I'm not sure I understand the general principle, but the code below works for the example:
> x <- apply(matrix(ex_vector, ncol=2, byrow=TRUE), 1, as.list)
> str(x)
List of 3
$ :List of 2
..$ : int 1
..$ : int 2
$ :List of 2
..$ : int 3
..$ : int 4
$ :List of 2
..$ : int 5
..$ : int 6
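The same matrix idea can be wrapped in a small reusable function for a vector of any even length. This is a sketch: the name to_pair_list is hypothetical, and it uses lapply() over the row indices, which does the same job as the apply() call above while making the row extraction explicit.

```r
# Hypothetical helper: turn an even-length vector into a list of 2-element lists
to_pair_list <- function(v) {
  stopifnot(length(v) %% 2 == 0)          # the question guarantees an even length
  m <- matrix(v, ncol = 2, byrow = TRUE)  # row i holds elements 2i-1 and 2i
  lapply(seq_len(nrow(m)), function(i) as.list(m[i, ]))
}

res <- to_pair_list(c(1, 2, 3, 4, 5, 6))
str(res)
```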

n <- length(ex_vector)
lapply(split(ex_vector, rep(1:(n/2), each = 2)), split, 1:2)
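split() names its output after the grouping levels, so both levels of this result carry names ("1", "2", ...). If a plain unnamed structure like ex_list is needed, the names can be dropped afterwards; a sketch:

```r
ex_vector <- 1:6
n <- length(ex_vector)
groups <- rep(1:(n / 2), each = 2)                   # 1 1 2 2 3 3
res <- lapply(split(ex_vector, groups), split, 1:2)
names(res)                                           # "1" "2" "3"

# drop the names at both levels to get a plain nested list
res_plain <- lapply(unname(res), unname)
str(res_plain)
```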

We create a grouping variable with gl, split the vector into a list, and convert each piece to a list with as.list:
n <- 2
out <- lapply(split(ex_vector, as.integer(gl(length(ex_vector), n,
length(ex_vector)))), as.list)
str(out)
#List of 3
# $ 1:List of 2
# ..$ : int 1
# ..$ : int 2
# $ 2:List of 2
# ..$ : int 3
# ..$ : int 4
# $ 3:List of 2
# ..$ : int 5
# ..$ : int 6
Or use %/% to split
lapply(split(ex_vector, (seq_along(ex_vector)-1) %/% n + 1), as.list)
Or compactly
split(as.list(ex_vector), cumsum(seq_along(ex_vector) %%2))
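The cumsum() trick pairs elements because seq_along(x) %% 2 is 1 at odd positions and 0 at even ones, so the running sum increases only at each odd position. For groups of some other size k, the %/% grouping shown above generalizes directly; a sketch with a hypothetical helper chunk_list():

```r
ex_vector <- 1:6
seq_along(ex_vector) %% 2           # 1 0 1 0 1 0
cumsum(seq_along(ex_vector) %% 2)   # 1 1 2 2 3 3

# Hypothetical helper: split any vector into consecutive chunks of size k,
# each chunk being a list (the last chunk is shorter if k does not divide the length)
chunk_list <- function(x, k) {
  unname(split(as.list(x), (seq_along(x) - 1) %/% k))
}

pairs   <- chunk_list(ex_vector, 2)
triples <- chunk_list(ex_vector, 3)
```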

Related

Conditionally replacing elements in a list with a list in R

I am trying to do this
a <- list(1,2,3)
a[a == 2] <- list(1,2,3)
Which gives me the warning "number of items to replace is not a multiple of replacement length". More generally, I want to iteratively replace elements in a list of integers, based on a condition, with other lists of various lengths that depend on the integer in the original list.
The question does not state the desired result, but this works without a warning or error, replacing the second element of a with the indicated list:
a <- list(1, 2, 3)
a[a == 2] <- list(list(1,2,3))
giving:
> str(a)
List of 3
$ : num 1
$ :List of 3
..$ : num 1
..$ : num 2
..$ : num 3
$ : num 3
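Because each replacement is itself a list, the right-hand side has to be a list of lists, one per selected element (which is why the extra list() wrapper above works). When the replacement depends on the original value, lapply() can build that list of lists. In this sketch the rule make_replacement() (the list 1, ..., x) and the condition x == 2 are hypothetical stand-ins for the real ones:

```r
a <- list(1, 2, 3, 2)

# Hypothetical rule: an element x is replaced by the list 1, ..., x
make_replacement <- function(x) as.list(seq_len(x))

idx <- vapply(a, function(x) x == 2, logical(1))  # which elements to replace
a[idx] <- lapply(a[idx], make_replacement)        # one replacement list per match
str(a)
```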

List assignment for list with greater than three nesting

I have not been able to find a fix for this error. I have implemented work-arounds before, but I wonder if anyone here knows why it occurs.
The following returns no error, as expected:
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4
However, when I add another level of nesting I get:
q <- list()
q[["a"]][["b"]][["c"]]<- 3
q[["a"]][["b"]][["d"]] <- 4
Error in q[["a"]][["b"]][["d"]] <- 4 : more elements supplied than there are to replace
To make this even more confusing if I add a fourth nested list I get:
q <- list()
q[["a"]][["b"]][["c"]][["d"]] <- 3
q[["a"]][["b"]][["c"]][["e"]] <- 4
Error in `*tmp*`[["c"]] : subscript out of bounds
I would have expected R to return the same error message for the triple nested list as for the quadruple nested list.
I first came across this a few months ago. I am running R 3.4.3.
If we check str(q) after these assignments, it is a list with a single element 'a', but 'a' itself has been created as a named numeric vector rather than a list:
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4
str(q)
#List of 1
# $ a: Named num [1:2] 3 4
# ..- attr(*, "names")= chr [1:2] "b" "c"
is.vector(q$a)
#[1] TRUE
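Note that is.vector() is also TRUE for lists, so is.list() and is.atomic() show more directly what happened; a short sketch reproducing the first two assignments:

```r
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4

is.list(q)      # TRUE: the outer container is still a list
is.list(q$a)    # FALSE: 'a' was created as an atomic vector
is.atomic(q$a)  # TRUE
names(q$a)      # "b" "c"
```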
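At the next level the same thing happens: q[["a"]] is created as an atomic vector, so q[["a"]][["b"]] is just the number 3, and the second assignment tries to store two values (the existing 3 and the new 'd' element) in that single slot, which gives the error. The fix is to make each leaf a list element by wrapping the value in list():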
q <- list()
q[["a"]][["b"]][["c"]]<- list(3)
q[["a"]][["b"]][["d"]] <- list(4)
This gives 'q' as a list of 1 element ('a'), which is again a list of length 1 ('b'); since we assign the two values 3 and 4 to 'c' and 'd', 'b' is a list of 2 elements:
str(q)
#List of 1
# $ a:List of 1
# ..$ b:List of 2
# .. ..$ c:List of 1
# .. .. ..$ : num 3
# .. ..$ d:List of 1
# .. .. ..$ : num 4
In the same way, we can nest any number of levels:
q <- list()
q[["a"]][["b"]][["c"]][["d"]] <- list(3)
q[["a"]][["b"]][["c"]][["e"]] <- list(4)
Note: the expected output structure is not clear from the question.
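If the leaves should be plain numbers rather than one-element lists, another option is to create the intermediate levels as lists before assigning into them. Assigning into an existing list with [[<- keeps it a list, so the values need no list() wrapper; a sketch:

```r
# Build the intermediate levels as (empty) lists first
q <- list(a = list(b = list()))

q[["a"]][["b"]][["c"]] <- 3
q[["a"]][["b"]][["d"]] <- 4

str(q)
```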

How to sort a list by the maximum of each sublist?

I have a list which contains a powerset:
> str(ps10)
List of 1023
$ : int 1
$ : int 2
$ : int [1:2] 1 2
$ : int 3
$ : int [1:2] 1 3
$ : int [1:2] 2 3
$ : int [1:3] 1 2 3
...
How can I sort the outer list by some statistic on the inner list (e.g. min, median, etc.)? The list is created sorted by the maximum inner element using HapEstXXR::powerset(). I want to keep the list structure for later use.
sort, sort.list and order don't accept lists. In SAS, I would add the statistic as yet another column to a dataset and call a PROC SORT by list.statistic, list id, list elements. I haven't figured out how to do this efficiently in R, without creating auxiliary vectors to get the ordering.
Thanks
If L is a list defined as L <- list(c(1,2), c(1,3), c(2,5), c(1,4)), then you could use:
L[order(-sapply(L, max))]
Explanation:
sapply(L, max) gets the maximum for each item in L
Putting that inside order with a minus-sign gives you the (decreasing) order of the elements starting with the one with the highest maximum.
Putting that in between square brackets reorders L in the wanted order.
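The same pattern works for any statistic, and order() accepts several keys, which covers the SAS-style sort by statistic and then by list id mentioned in the question. A sketch using the median, with the original position as the tie-breaker (vapply() is used instead of sapply() only to guarantee a numeric vector):

```r
L <- list(c(1, 2), c(1, 3), c(2, 5), c(1, 4))

med <- vapply(L, median, numeric(1))  # 1.5 2.0 3.5 2.5
id  <- seq_along(L)                   # original position, used to break ties

L_sorted <- L[order(med, id)]         # increasing median, then by id
```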
Suppose you have the following list and you want to sort it decreasingly according to the maximum element.
L = list(c(1), c(1,2), c(1,4), c(2,5))
So in this case the order would be 4, 3, 2, 1.
If I understand your question correctly, you can simply iterate through the list and then use order:
maxArray = rep(NA, length(L))
for(i in 1:length(L)) {
maxArray[i] = max(L[[i]])
}
order(maxArray, decreasing = TRUE)
which will correctly return [1] 4 3 2 1
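The loop can also be replaced by vapply(), which computes the same vector of maxima without an explicit loop; this sketch reproduces the order above and then applies it to the list itself:

```r
L <- list(c(1), c(1, 2), c(1, 4), c(2, 5))

maxArray <- vapply(L, max, numeric(1))     # 1 2 4 5
ord <- order(maxArray, decreasing = TRUE)  # 4 3 2 1
L_sorted <- L[ord]                         # list reordered by decreasing maximum
```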

Setting variable attributes via subsetting a dataframe

I want to set an attribute ("full.name") of certain variables in a data frame by subsetting the dataframe and iterating over a character vector. I tried two solutions but neither works (varsToPrint is a character vector containing the variables, questionLabels is a character vector containing the labels of questions):
Sample data:
jtiPrint <- data.frame(question1 = seq(5), question2 = seq(5), question3=seq(5))
questionLabels <- c("question1Label", "question2Label")
varsToPrint <- c("question1", "question2")
Solution 1:
attrApply <- function(var, label) {
`<-`(attr(var, "full.name"), label)
}
mapply(attrApply, jtiPrint[varsToPrint], questionLabels)
Solution 2:
i <- 1
for (var in jtiPrint[varsToPrint]) {
attr(var, "full.name") <- questionLabels[i]
i <- i + 1
}
Desired output (for e.g. variable 1):
attr(jtiPrint$question1, "full.name")
[1] "question1Label"
The problem with solution 2 seems to be that R sets the attribute on a new data frame containing only one variable (the indexed variable). However, I don't understand why solution 1 does not work. Any ideas how to fix either of these two approaches?
Solution 1 :
The function is 'attr<-', not '<-'(attr(...)). You also need to set SIMPLIFY = FALSE (otherwise mapply() returns a matrix instead of a list) and then call as.data.frame:
attrApply <- function(var, label) {
`attr<-`(var, "full.name", label)
}
df <- as.data.frame(mapply(attrApply,jtiPrint[varsToPrint],questionLabels,SIMPLIFY = FALSE))
> str(df)
'data.frame': 5 obs. of 2 variables:
$ question1: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question1Label"
$ question2: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question2Label"
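This works because 'attr<-', like other replacement functions, returns a modified copy of its argument, and mapply() collects those copies. The same fact explains why the loop in the question has no effect: var is a copy of a column, so changing it leaves jtiPrint untouched. A small illustration:

```r
x <- 1:3
y <- `attr<-`(x, "full.name", "some label")

attr(x, "full.name")  # NULL: x itself is unchanged
attr(y, "full.name")  # "some label"
```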
Solution 2 :
You need to set the attribute on the columns of the data.frame itself; in your loop you are setting it on copies of the columns:
# index by column name so the intended column is labelled wherever it sits
for (i in seq_along(varsToPrint)) {
  attr(jtiPrint[[varsToPrint[i]]], "full.name") <- questionLabels[i]
}
> str(jtiPrint)
'data.frame': 5 obs. of 3 variables:
$ question1: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question1Label"
$ question2: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question2Label"
$ question3: int 1 2 3 4 5
Note that the two approaches lead to different results: the mapply solution returns a subset of the original data.frame (so no column 3), while the second approach modifies the existing jtiPrint data.frame.
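To keep all columns of jtiPrint (as in solution 2) without an explicit loop (as in solution 1), the labelled copies can be written back into the selected columns. A sketch using Map(), assuming the sample data from the question:

```r
jtiPrint <- data.frame(question1 = seq(5), question2 = seq(5), question3 = seq(5))
questionLabels <- c("question1Label", "question2Label")
varsToPrint <- c("question1", "question2")

# Map() pairs each selected column with its label; the modified copies are
# then assigned back into the same columns of jtiPrint
jtiPrint[varsToPrint] <- Map(function(col, label) `attr<-`(col, "full.name", label),
                             jtiPrint[varsToPrint], questionLabels)

attr(jtiPrint$question1, "full.name")  # "question1Label"
```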

R get row mean of every nth element in a list

I have a nested list and I want to get the mean of one particular variable inside the list.
When my list was not nested, this was simple to do, but I am not sure how to change my code now that there are multiple elements of different sizes.
str of list:
> str(means2)
List of 1
$ :List of 2
..$ :'data.frame': 12 obs. of 2 variables:
.. ..$ means : num [1:12] 465063 355968 76570 542873 854570 ...
.. ..$ variablenames: chr [1:12] "NumberOfPassengers" "FareClass" "TripType" "JourneyTravelTime" ...
..$ :'data.frame': 12 obs. of 2 variables:
.. ..$ means : num [1:12] 449490 359997 67899 602895 967327 ...
.. ..$ variablenames: chr [1:12] "NumberOfPassengers" "FareClass" "TripType" "JourneyTravelTime" ...
I was using this code
testdf=as.data.frame(rowMeans(simplify2array(sapply(means2,"[[",1))))
I am just not sure how to change this code to reflect the fact that the means I am obtaining are from the 2nd element and not the first (only) element.
Thanks for any help
example:
edited: example had an error
Based on the str of the nested list for 'means2', this should work
unlist(lapply(means2, function(x) rowMeans(do.call(cbind, sapply(x, "[", 1)))))
As there is only a single outer list, we can extract it with [[, loop over its elements to get the first column as a vector, take the element-wise sum with Reduce, and divide by the number of data frames in the list (2 in the example):
Reduce(`+`, lapply(means2[[1]], `[[`, 1)) / length(means2[[1]])
Or after extracting the list elements, cbind it and do a rowMeans
rowMeans(do.call(cbind,lapply(means2[[1]], `[`, 1)))
data
means2 <- list(list(data.frame(means = 1:5, variablenames = letters[1:5]),
data.frame(means = 11:15, variablenames = letters[6:10])) )
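A variant of the rowMeans approach that selects the means column by name instead of by position. This sketch assumes every inner data frame has a means column with the same number of rows, in the same order:

```r
means2 <- list(list(data.frame(means = 1:5, variablenames = letters[1:5]),
                    data.frame(means = 11:15, variablenames = letters[6:10])))

inner <- means2[[1]]

# One column per inner data frame; vapply() checks that each 'means' column
# has the same length as the first one
m <- vapply(inner, function(d) as.numeric(d[["means"]]), numeric(nrow(inner[[1]])))

avg <- rowMeans(m)  # element-wise mean across the data frames
avg
```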
