R - Why does rep() seemingly change behaviour of lists

When I started preinitializing lists of lists in R, to be filled in afterwards, I was puzzled by how list objects behave when passed as the value to rep(). When I try the following...
listOfLists <- rep(list(1, 2), 4)
... listOfLists is a single list:
1 2 1 2 1 2 1 2
However, I would assume it to be a list of lists which finally contain the values 1 and 2 each:
1 2
1 2
1 2
1 2
To get the desired result, I additionally have to wrap the values in c():
listOfLists <- rep(list(c(1, 2)), 4)
I wonder why this is the case in R. Shouldn't list() create a fully functional list as it normally does, instead of doing something similar to c()? Why does grouping the values with c() actually solve the problem here?
Thank you for your thoughts!
Conclusion:
Both Ben Bolker's and Peyton's posts give the final answer. Neither list() nor c() is responsible for this behaviour. Instead, rep() repeats the entries of the list or vector it is given and combines them into a single result. Wrapping the values in another container makes rep() repeat that inner container as a unit, because it is now a single entry of the outer one.

What you got with rep(list(c(1, 2)), 4) is not a list of lists; it's a list of numeric vectors. If you really want a list of lists, try
replicate(4,list(1,2),simplify=FALSE)
or
rep(list(list(1, 2)), 4)
You can understand a little bit better why this works as it does by performing exegesis on the first line of ?rep:
‘rep’ replicates the values in ‘x’.
In other words, it promises to replicate the contents of x, but not necessarily to replicate x itself. (This is why the second suggestion, kindly contributed by @flodel, works -- it makes x into a list whose contents are a list -- and why the vector-based rep() works -- the contents of the list are a vector.)
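The three variants discussed above can be compared directly. This is only a small sketch; the variable names flat, vectors, nested and empty are made up for illustration.

```r
# rep() repeats the elements of its argument, not the argument as a whole.
flat    <- rep(list(1, 2), 4)        # the two elements 1 and 2, repeated
vectors <- rep(list(c(1, 2)), 4)     # one element (a numeric vector), repeated
nested  <- rep(list(list(1, 2)), 4)  # one element (a list), repeated

length(flat)     # 8
length(vectors)  # 4
length(nested)   # 4

is.numeric(vectors[[1]])  # TRUE: each element is a numeric vector
is.list(nested[[1]])      # TRUE: each element is itself a list

# If the goal is only to preallocate, an empty list of fixed length also works.
empty <- vector("list", 4)
length(empty)  # 4
```

If the elements are going to be overwritten anyway, vector("list", n) avoids having to think about what rep() repeats.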

It does create a fully functional list. The difference is that in your first example, you create a list with two elements, whereas in the second example, you create a list with one element--a vector.
When you combine lists (e.g., with rep), you're essentially creating a new list with all the elements of the previous lists. In the first example, then, you'll have eight elements, and in the second example, you'll have four.
Another way to see this:
> length(list(1, 2))
[1] 2
> c(list(1, 2), list(1, 2), list(1, 2))
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 1
[[4]]
[1] 2
[[5]]
[1] 1
[[6]]
[1] 2
> length(list(1:2))
[1] 1
> c(list(1:2), list(1:2), list(1:2))
[[1]]
[1] 1 2
[[2]]
[1] 1 2
[[3]]
[1] 1 2

Related

Repeating patterns in a vector in R

Suppose a vector is produced by repeating a vector of unknown length with unique elements an unknown number of times:
small_v <- c("as","d2","GI","Worm")
big_v <- rep(small_v, 3)
How can I determine how long the original vector was and how many times it was repeated?
So in this example the original length was 4 and it repeats 3 times.
Realistically in my case the vectors will be fairly small and will be repeated only a few times.
1) Assuming that there is at least one unique element in small_v (which is the case in the question since it assumes all elements in small_v are unique):
min(table(big_v))
## [1] 3
or using pipes
big_v |> table() |> min()
## [1] 3
Here is a more difficult test but it still works because small_v2[2] is unique in small_v2 even though the other elements of small_v2 are not unique.
# test data
small_v2 <- c(small_v, small_v[-2])
big_v2 <- rep(small_v2, 3)
min(table(big_v2))
## [1] 3
2) If we knew that the first element of small_v were unique (which is the case in the question since it assumes all elements in small_v are unique) then this would work:
sum(big_v[1] == big_v)
## [1] 3
1) If the elements are all repeating and no other values are there, then use
length(big_v)/length(unique(big_v))
[1] 3
2) Or use
library(data.table)
max(rowid(big_v))
[1] 3
Alternatively, we could use rle() together with with() to count the repeats:
with(rle(sort(big_v)), max(lengths))
[1] 3
Created on 2023-02-04 with reprex v2.0.2
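The answers above rely on at least one element being unique within the original vector. As a hedged alternative that does not need that assumption, one could search for the shortest prefix that reproduces the whole vector when repeated. The helper name find_repeat is made up for this sketch and does not come from any of the answers.

```r
# Find the shortest prefix that reproduces the whole vector when repeated.
find_repeat <- function(v) {
  n <- length(v)
  for (p in seq_len(n)) {
    # Only prefix lengths that divide n evenly can tile the vector
    if (n %% p == 0 && identical(v, rep(v[seq_len(p)], n / p))) {
      return(list(pattern_length = p, times = n / p))
    }
  }
  # Only reached for a zero-length input
  list(pattern_length = 0L, times = 0L)
}

small_v <- c("as", "d2", "GI", "Worm")
big_v <- rep(small_v, 3)

res <- find_repeat(big_v)
res$pattern_length  # 4
res$times           # 3
```

Because it returns the shortest repeating prefix, a vector that was itself built from a repeated pattern is reported in terms of that smaller pattern.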

Split a list of elements into two unique lists (and get all combinations) in R

I have a list of elements (my real list has 11 elements, this is just an example):
x <- c(1, 2, 3)
and want to split them into two lists (using all entries) but I want to get all possible combinations of that list to be returned e.g.:
(1,2)(3) & (1)(2,3) & (2)(1,3)
Does anyone know an efficient way to do this for a more complex list?
Thanks in advance for your help!
List with 3 elements:
vec <- 1:3
Note that for each element we have two possibilities: it is either in 1st split or in 2nd split. So we define a matrix of all possible splits (in rows) using expand.grid which produces all possible combinations:
groups <- as.matrix(expand.grid(rep(list(1:2), length(vec))))
However, this treats scenarios where the groups are flipped as different splits. It also includes the scenarios where all the observations are in the same group (there are only 2 of them).
To remove them, we need to drop the rows of the groups matrix that contain only one group (2 such rows) and all the rows that split the vector in the same way with only the group labels switched.
One-group entries are on top and bottom so removing them is easy:
groups <- groups[-c(1, nrow(groups)),]
Duplicated entries are a bit trickier, but we can get rid of them by removing all the rows where the first element is assigned to group 2. In effect, this requires that the first element is always assigned to group 1.
groups <- groups[groups[,1]==1,]
Then the job is to split the list we have using each of the rows in the groups matrix. For that we use Map to call split() function on our list vec and each row of groups matrix:
splits <- Map(split, list(vec), split(groups, row(groups)))
> splits
[[1]]
[[1]]$`1`
[1] 1 3
[[1]]$`2`
[1] 2
[[2]]
[[2]]$`1`
[1] 1 2
[[2]]$`2`
[1] 3
[[3]]
[[3]]$`1`
[1] 1
[[3]]$`2`
[1] 2 3
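The steps above can be collected into one function. This is a sketch under the same idea as the answer; the name two_way_splits is made up here, and instead of removing the first and last rows by position it filters on the row contents, which gives the same set of splits.

```r
two_way_splits <- function(vec) {
  n <- length(vec)
  stopifnot(n >= 2)
  # One row per assignment of the n elements to group 1 or group 2
  groups <- as.matrix(expand.grid(rep(list(1:2), n)))
  # Pin the first element to group 1 (drops mirrored splits and the all-2 row),
  # then drop the row in which every element is in group 1.
  keep <- groups[, 1] == 1 & rowSums(groups == 2) > 0
  groups <- groups[keep, , drop = FALSE]
  lapply(seq_len(nrow(groups)), function(i) split(vec, groups[i, ]))
}

splits <- two_way_splits(1:3)
length(splits)                # 3
length(two_way_splits(1:11))  # 1023, i.e. 2^(11 - 1) - 1
```

For the 11 elements mentioned in the question, the intermediate matrix has 2^11 = 2048 rows, which is small; the approach grows exponentially with the number of elements, though.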

Make dataframe from a list of lists in R

I would like to make a data frame from a list of n elements. Each element contains 3 different lists, and I am only interested in 1 of those 3. The one I am interested in is a data.frame with 12 obs. of 12 variables.
The input tmp to my lapply function is a list of n elements, each with 5 observations.
2 of those observations are the Latitude and Longitude. This is what my lapply function looks like:
DF_Google_Places<- lapply(tmp, function(tmp){
Latitude<-tmp$Latitude
Longitude<-tmp$Longitude
LatLon<- paste(Latitude,Longitude, sep=",")
res<-GET(paste("https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=",LatLon,"&radius=200&types=food&key=AIzaSyDS6usHdhdoGIj0ILtXJKCjuj7FBmDEzpM", sep=""))
jsonAnsw<-content(res,"text")
myDataframe<- jsonlite::fromJSON(jsonAnsw)
})
My question is: how do I get from the list of n to a data frame containing these 12 obs. of 12 variables?
Could anyone help me out? Thanks.
I'm just posting my comment as an answer so I can show output to show you the idea:
x <- list(a=list(b=1,c=2),d=list(b=3,c=4))
So x is a nested list structure, in this case with consistent naming / structure one level down.
> x
$a
$a$b
[1] 1
$a$c
[1] 2
$d
$d$b
[1] 3
$d$c
[1] 4
Now we'll use do.call to build the data.frame. We need to pass it a named list of arguments, so we'll use list(sapply to get the named list. We'll walk the higher level of the list by position, and the inner level by name since the names are consistent across sub-lists at the inner level. Note here that the key idea is essentially to reverse what would be the intuitive way of indexing; since I want to pull observations at the second level from across observations at the first level, the inner call to sapply traverses multiple values of level one for each value of the name at level two.
y <- do.call(data.frame,
             list(sapply(names(x[[1]]),
                         function(t) sapply(seq_along(x),
                                            function(j) x[[j]][[t]]))))
> y
b c
1 1 2
2 3 4
Try breaking apart the command to see what each step does. If there is any consistency in your sub-list structure, you should be able to adapt this approach to walk that structure in the right order and fold the data you need.
On a large dataset, this would not be efficient, but for 12x12 it should be fine.
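For comparison, a row-wise variant of the same idea is sketched below: turn each sub-list into a one-row data frame and stack the rows. It assumes, as above, that every sub-list has the same names. For the data in the question, one would first have to pull the data frame of interest out of each list element; the name of that element is not given in the question, so that step is left out here.

```r
x <- list(a = list(b = 1, c = 2), d = list(b = 3, c = 4))

# One single-row data frame per top-level element
rows <- lapply(x, as.data.frame)

# Stack the rows into one data frame
y2 <- do.call(rbind, rows)

y2$b  # 1 3
y2$c  # 2 4
```

The same do.call(rbind, ...) step also stacks multi-row data frames, provided they all share the same columns.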

R unlist a list to integers

[revised version]
I have a large character vector in R of size 57241 that contains gene symbols e.g
gene <- c("AL627309.1","SMIM1","DFFB") # assume this of size 57241
I have another table in which one column table$genes has some combinations of genes in each row e.g
head(table$genes)
[1] ,OR4F5,AL627309.1,OR4F29,OR4F16,AL669831.1,
[2] ,TP73,CCDC27,SMIM1,LRRC47,CEP104,DFFB
..
this table has about 1400 rows. For each gene I wanted to find the index of row in table in which it is located.
To do that I used
ind <- sapply(gene, grep, table$genes, fixed=TRUE,USE.NAMES=FALSE)
The variable "ind" returned is a large list of size 57241 which looks like this
head(ind)
[[1]]
[1] 1
[[2]]
[1] 1
[[3]]
[1] 1
[[4]]
[1] 1
[[5]]
[1] 1
[[6]]
[1] 1
I know for a fact that each gene occurs only once in that table, so the number I am interested in is the single value in each list element above, i.e. 1. How can I convert this into an integer vector? When I unlist() it, I somehow get a vector of length ~500000, whereas I should get the same length as the list. I have tried many functions and combinations but nothing seems to work. Any ideas?
Thanks
I'm not able to reproduce that behavior with either a list or a dataframe:
> gene <- c("AL627309.1","SMIM1","DFFB")
>
> table <- list(genes =c(",OR4F5,AL627309.1,OR4F29,OR4F16,AL669831.1,",
",TP73,CCDC27,SMIM1,LRRC47,CEP104,DFFB"))
> (ind <- sapply(gene, grep, table$genes, fixed=TRUE,USE.NAMES=FALSE))
[1] 1 2 2
I thought for a bit that you should be using match, but after further consideration it seemed as though there must be something different about your data structure. Try posting dput(head(table$genes)) and dput(gene) to make your problem reproducible. You should also stop using the word "list" to refer to the items in table$genes. It confuses regular users of R, who think you are talking about an R "list". You can see which of the items in your ind "list" has a vector of length greater than one with:
which(sapply(ind, length) > 1)
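One possible cause, offered only as a guess since the real data was not posted, is that grep with fixed = TRUE matches substrings, so a short gene symbol can hit several rows. The sketch below uses the toy data from this answer; genes_col stands in for table$genes, and tokens, row_of and ind_exact are names made up for illustration. It checks the element lengths and then does exact matching on the comma-separated symbols instead.

```r
gene <- c("AL627309.1", "SMIM1", "DFFB")
genes_col <- c(",OR4F5,AL627309.1,OR4F29,OR4F16,AL669831.1,",
               ",TP73,CCDC27,SMIM1,LRRC47,CEP104,DFFB")

# lapply always returns a list, so the lengths can be inspected reliably
ind <- lapply(gene, grep, genes_col, fixed = TRUE)
lengths(ind)  # 1 1 1

# unlist() keeps the list's length only if every element has length 1
ind_vec <- unlist(ind)
ind_vec  # 1 2 2

# Exact matching: split each row into symbols and remember the row of each
tokens <- strsplit(genes_col, ",", fixed = TRUE)
row_of <- rep(seq_along(tokens), lengths(tokens))
ind_exact <- row_of[match(gene, unlist(tokens))]
ind_exact  # 1 2 2
```

With match(), a gene that is missing from the table yields NA instead of silently shortening the result, so the output always has the same length as gene.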

Subsetting R array: dimension lost when its length is 1

When subsetting arrays, R behaves differently depending on whether one of the dimensions is of length 1 or not. If a dimension has length 1, that dimension is lost during subsetting:
ax <- array(1:24, c(2,3,4))
ay <- array(1:12, c(1,3,4))
dim(ax)
#[1] 2 3 4
dim(ay)
#[1] 1 3 4
dim(ax[,1:2,])
#[1] 2 2 4
dim(ay[,1:2,])
#[1] 2 4
From my point of view, ax and ay are the same, and performing the same subset operation on them should return an array with the same dimensions. I can see that the way that R is handling the two cases might be useful, but it's undesirable in the code that I'm writing. It means that when I pass a subsetted array to another function, the function will get an array that's missing a dimension, if I happened to reduce a dimension to length 1 at an earlier stage. (So in this case R's flexibility is making my code less flexible!)
How can I prevent R from losing a dimension of length 1 during subsetting? Is there another way of indexing? Some flag to set?
As you've found out, R drops dimensions of length 1 by default. Adding drop = FALSE while indexing prevents this:
> dim(ay[,1:2,])
[1] 2 4
> dim(ax[,1:2,])
[1] 2 2 4
> dim(ay[,1:2,,drop = FALSE])
[1] 1 2 4
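To illustrate why this matters when the subset is passed on, here is a small sketch with a made-up function n_slices that relies on its input being three-dimensional.

```r
ay <- array(1:12, c(1, 3, 4))

# A function that relies on its input being three-dimensional
n_slices <- function(a) {
  stopifnot(length(dim(a)) == 3)
  dim(a)[3]
}

sub <- ay[, 1:2, , drop = FALSE]
dim(sub)       # 1 2 4
n_slices(sub)  # 4

# Without drop = FALSE the first dimension is lost
length(dim(ay[, 1:2, ]))  # 2
```

Note that drop = FALSE keeps every dimension of length 1, so it has to be written at each subsetting step where the dimensionality must be preserved.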

Resources