I have a list which in turn contains multiple lists of matrices with dimensions 3834 x 1, so all values are stored in a single column. Now I want to change the dimensions of every matrix in each sublist to 54 x 71.
Here is some code to reproduce sample data:
######################### create sample data ###########################
# create empty list
list1 <- list()
# fill the list with arrays/matrices
for (i in 1:10) {
list1[[i]] <- array(sample(1:100, 54*71, replace = TRUE), dim = c(54*71, 1))
}
# create the big list
big_list <- list()
for (i in 1:8) {
big_list[[paste0("list", i)]] <- list1
}
The goal can be achieved by using a for loop:
# adjust the dimensions of the matrices by using for loop
for (i in seq_along(big_list)) {
for (j in seq_along(big_list[[i]])) {
dim(big_list[[i]][[j]]) <- c(54,71)
}
}
I am sure there is a more elegant way than these five lines, most likely using lapply/apply/tapply etc., but I could not figure out where to put dim() and c(54, 71) in the call.
Anybody with a hint?
In R, the code
f(x) <- y
is equivalent to
x <- `f<-`(x, value = y)
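A small check of this equivalence with dim<-, using a toy vector x that is only illustrative:

```r
# A toy vector of 6 values
x <- 1:6

# Replacement-function syntax ...
a <- x
dim(a) <- c(2L, 3L)

# ... gives the same object as calling `dim<-` directly and reassigning
b <- `dim<-`(x, value = c(2L, 3L))

identical(a, b)
#> [1] TRUE
```

Because `dim<-` returns the modified object, it can be passed to lapply like any other function.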
With that in mind, you can use (nested) lapply with dim<-:
big_list <- lapply(
big_list,
function (lst) lapply(lst, `dim<-`, c(54L, 71L))
)
… and in principle you can omit the anonymous function — but whether that’s readable is debatable:
big_list <- lapply(big_list, lapply, `dim<-`, c(54L, 71L))
For what it's worth, map_depth() from purrr is useful for digging into nested lists.
library(purrr)
map_depth(big_list, 2, matrix, nrow = 54, ncol = 71)
# or
map_depth(big_list, 2, `dim<-`, c(54L, 71L))
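As a further base-R alternative not used in the answers above, rapply() applies a function to every non-list leaf of a nested list, and with how = "replace" it returns a list of the same shape. A sketch, assuming the sample data from the question (rebuilt here so the block runs on its own; reshaped is an illustrative name):

```r
# Rebuild the question's sample data: 8 lists of 10 single-column matrices
list1 <- lapply(1:10, function(i) {
  array(sample(1:100, 54 * 71, replace = TRUE), dim = c(54 * 71, 1))
})
big_list <- setNames(rep(list(list1), 8), paste0("list", 1:8))

# rapply() calls f on every leaf that is not itself a list;
# how = "replace" keeps the original nesting and names
reshaped <- rapply(big_list, function(m) `dim<-`(m, c(54L, 71L)), how = "replace")

dim(reshaped$list3[[5]])
#> [1] 54 71
```

Unlike nested lapply, this does not depend on knowing the nesting depth in advance.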
Related
I have two lists, each containing the same number of vectors. The vectors in List1 all have the same length, while the vectors in List2 may have different lengths. The idea is to use the contents of each vector in List2 as indices into the corresponding vector in List1.
Here is some code for a reproducible example:
# create empty lists
List1 <- list()
List2 <- list()
# fill the lists with vectors
set.seed(50)
for (i in 1:1000) {
List1[[i]] <- sample(0:500, 360, replace= T)
}
set.seed(100)
for (i in 1:1000) {
List2[[i]] <- sample(0:200, 50, replace= T)
}
Here is an example of what I want to do, taking one vector from each list:
vec1 <- List1[[1]]
vec2 <- List2[[1]]
# now use the second vector as index for the first one
vec1[vec2]
Now the question is how to apply this to the entire lists, so that the output is again a list of the resulting vectors.
Anybody with an idea?
EDIT:
This is an edit after some very helpful answers. The structure of the lists is new, but the question remains the same.
Here is the code for a reproducible example:
# create empty lists
List1 <- list()
List2 <- list()
# fill the lists with vectors
set.seed(50)
for (i in 1:1000) {
List1[[i]] <- sample(0:500, 360, replace= T)
}
set.seed(100)
for (i in 1:1000) {
List2[[i]] <- sample(0:200, 50, replace= T)
}
# create the first level sublists
sublist1 <- list()
sublist2 <- list()
for (i in 1:8) {
sublist1[[i]] <- List1
sublist2[[i]] <- List2
}
# create the super lists
superlist1 <- list(sublist1, sublist1, sublist1, sublist1, sublist1, sublist1, sublist1, sublist1)
superlist2 <- list(sublist2, sublist2, sublist2, sublist2, sublist2, sublist2, sublist2, sublist2)
As you can see, the two superlists have the same structure; they differ only in the lengths of the vectors at the lowest level.
So now the question is how to use the values of superlist2[[1]][[1]][[1]] as indices into the corresponding element of superlist1, which is superlist1[[1]][[1]][[1]], and so on.
You need an apply-function:
lapply(seq_along(List1), function(n) List1[[n]][List2[[n]]])
returns the expected list.
With the tidyverse you could use
library(dplyr)
library(purrr)
List1 %>%
map2(.y = List2, ~.x[.y])
to get the same result.
We could use Map in base R
Map(`[`, List1, List2)
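The nested superlists from the EDIT can presumably be handled by nesting Map once per level, with the innermost call being the same Map(`[`, ...) as above. A sketch under that assumption; it rebuilds smaller stand-ins for the EDIT's data (5 vectors instead of 1000) so it runs quickly, and result is an illustrative name:

```r
# Smaller stand-ins for the EDIT's data (same nesting, fewer vectors)
set.seed(50)
List1 <- lapply(1:5, function(i) sample(0:500, 360, replace = TRUE))
set.seed(100)
List2 <- lapply(1:5, function(i) sample(0:200, 50, replace = TRUE))
superlist1 <- rep(list(rep(list(List1), 8)), 8)
superlist2 <- rep(list(rep(list(List2), 8)), 8)

# One Map per level: superlist -> sublist -> pair of vectors
result <- Map(function(s1, s2) {
  Map(function(l1, l2) Map(`[`, l1, l2), s1, s2)
}, superlist1, superlist2)

# Spot check against a direct subset
identical(result[[1]][[1]][[1]],
          superlist1[[1]][[1]][[1]][superlist2[[1]][[1]][[1]]])
#> [1] TRUE
```

Note that any index 0 in List2 selects nothing, so the result vectors can be shorter than 50.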
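This is my first loop, so it is only an attempt; please be kind:
# loop: store each subset in a preallocated list instead of printing it
result <- vector("list", length(List1))
for (i in seq_along(List1)) {
result[[i]] <- List1[[i]][List2[[i]]]
}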
I have a list of dataframes, and I would like to multiply each one by the corresponding element of a vector.
The first dataframe in the list would be multiplied by the first observation of the vector, and so on, producing another list of dataframes already multiplied.
I tried to do this with a loop, but was unsuccessful. I also tried to imagine something using map or lapply, but I couldn't.
for(i in vec){
for(j in listdf){
listdf2 <- i*listdf[[j]]
}
}
Error in listdf[[j]] : invalid subscript type 'list'
Any idea how to solve this?
*The vector and the list of dataframes have the same length.
Use Map:
listdf2 <- Map(`*`, listdf, vec)
In purrr this can be done using map2:
listdf2 <- purrr::map2(listdf, vec, `*`)
If you are interested in a for loop solution, you need just one loop:
listdf2 <- vector('list', length(listdf))
for (i in seq_along(vec)) {
listdf2[[i]] <- listdf[[i]] * vec[i]
}
data
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(df, df, df)
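With the data above, a quick check that the Map answer and the one-loop answer agree (listdf2_loop is just an illustrative name):

```r
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(df, df, df)

# Base R: Map() pairs the i-th data frame with the i-th multiplier
listdf2 <- Map(`*`, listdf, vec)

# One-loop version, with the result list preallocated
listdf2_loop <- vector("list", length(listdf))
for (i in seq_along(vec)) {
  listdf2_loop[[i]] <- listdf[[i]] * vec[i]
}

identical(listdf2, listdf2_loop)
#> [1] TRUE
listdf2[[2]]$b
#> [1]  9 12 15 18 21
```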
I have a list of multiple matrices. I can transform an item of this list into a dataframe using this code:
as.data.frame(list_of_matrices[i])
But how can I do the same in an automatic way for all indexes (i)?
I tried:
dataframes <- data.frame()
for(i in 1:length(list_of_matrices)){
dataframes[i] <- as.data.frame(list_of_matrices[i])
}
but it didn't work:
Error in `[[<-.data.frame`(`*tmp*`, i, value = list(X1 = 1:102, X2 = c(2L, :
replacement has 102 rows, data has 0
In the OP's code, we need [[ instead of [, because [ returns a list of length 1 rather than the matrix itself:
for(i in seq_along(list_of_matrices)){
list_of_matrices[[i]] <- as.data.frame(list_of_matrices[[i]])
}
If we need separate objects in the global environment (not recommended), either assign or list2env should work. After naming the list with custom names or with letters (a, b, c, ...), use list2env:
names(list_of_matrices) <- letters[seq_along(list_of_matrices)]
list2env(list_of_matrices, .GlobalEnv)
Now we can check them:
head(a)
head(b)
Another option is to use assign within the loop itself:
for(i in seq_along(list_of_matrices)) {
assign(letters[i], as.data.frame(list_of_matrices[[i]]))
}
head(a)
head(b)
NOTE: We assume that list_of_matrices has at most 26 elements; otherwise the names have to come from something other than the built-in letters.
Try this:
# Example list of matrices
mat_list <- list(
matrix(runif(20), 4, 5),
matrix(runif(20), 4, 5)
)
# Convert to list of df
df_list <- lapply(mat_list, as.data.frame)
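If the list is named, lapply() carries the names over, so each data frame can then be retrieved by name. A small sketch with a hypothetical named list:

```r
# Hypothetical named list of matrices
mat_list <- list(
  first  = matrix(1:6, nrow = 2),
  second = matrix(7:12, nrow = 3)
)

# lapply() returns a list with the same names as its input
df_list <- lapply(mat_list, as.data.frame)

names(df_list)
#> [1] "first"  "second"

# Each element can now be retrieved by name
df_list$second
```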
I have several dataframes (full data and reduced data), and now I want to do a lot of analysis with kmeans and hclust. I want to work in a loop and store the results in a list from which I can retrieve (parts of) the stored objects by name. The reason is that in R Markdown there is no good way to create new objects on the fly (and no, assign is NOT a good option for that).
So the idea is to create several kmeans objects in a for loop on several dataframes and put them in a list. But I can't find a way to store them so that the objects are named; instead, everything in my list gets cluttered. See my example.
I also have trouble addressing (parts of) the objects in the desired list once they are stored (see the last part).
set.seed(4711)
df <- data.frame(matrix(sample(0:6, 120, replace = TRUE), ncol = 15, nrow = 8))
list_of_kmeans_objects <- list()
for (i in 2:4){
list_of_kmeans_objects <- c(list_of_kmeans_objects, kmeans(df, centers = i))
}
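Now I have a cluttered, flat list, because c() splices the components of each kmeans object (cluster, centers, totss, ...) into it. What I want instead is a list whose elements are the kmeans objects themselves, each with a name. My desired list would be: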
C2_kmeans_df <- kmeans(df, centers = 2)
C3_kmeans_df <- kmeans(df, centers = 3)
C4_kmeans_df <- kmeans(df, centers = 4)
desired_list_of_kmeans <- list(C2_kmeans_df, C3_kmeans_df, C4_kmeans_df)
names(desired_list_of_kmeans)[1] <- "C2_kmeans_df"
names(desired_list_of_kmeans)[2] <- "C3_kmeans_df"
names(desired_list_of_kmeans)[3] <- "C4_kmeans_df"
Once I have this list, my last problem is how to extract, for example
C3_kmeans_df$cluster #or
C4_kmeans_df$tot.withinss
from this list, using the names of the objects in the desired list?
Here is an option using lapply and setNames.
idx <- 2:4
out <- setNames(object = lapply(idx, function(i) kmeans(df, centers = i)),
nm = paste0("C", idx, "_kmeans_df"))
Check the names
names(out)
# [1] "C2_kmeans_df" "C3_kmeans_df" "C4_kmeans_df"
Access cluster
out$C2_kmeans_df$cluster
# [1] 2 1 2 1 2 1 2 1
In your present for loop, c() appends the components of each kmeans object to list_of_kmeans_objects one by one, instead of adding the object as a single element.
The following code should do what you want:
list_of_kmeans_objects <- list()
aaa <- 0
for (i in 2:4) {
aaa <- aaa+1
list_of_kmeans_objects[[aaa]] <- kmeans(df, centers=i)
names(list_of_kmeans_objects)[aaa] <- paste0("C", i, "_kmeans_df")
}
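For the last part of the question (extracting components by name), a sketch that assumes the named list out built with setNames() above, rebuilt here so the block runs on its own: [[ with a name picks one object, and sapply() with `[[` pulls the same component from every object.

```r
set.seed(4711)
df <- data.frame(matrix(sample(0:6, 120, replace = TRUE), ncol = 15, nrow = 8))

idx <- 2:4
out <- setNames(lapply(idx, function(i) kmeans(df, centers = i)),
                paste0("C", idx, "_kmeans_df"))

# One component of one object, selected by name
out[["C3_kmeans_df"]]$cluster

# The same component from every object: a numeric vector named after the objects
wss <- sapply(out, `[[`, "tot.withinss")
wss
```

Because the names are stored in the list, the name can also come from a variable, e.g. out[[paste0("C", 4, "_kmeans_df")]].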
require(quantmod)
require(TTR)
iris2 <- iris[1:4]
b=NULL
for (i in 1:ncol(iris2)){
for (j in 1:ncol(iris2)){
a<- runCor(iris2[,i],iris2[,j],n=21)
b<-cbind(b,a)}}
I want to calculate rolling correlations between the columns of a dataframe and store the results separately for each column. The code above does store the data in variable b, but it is not very useful because it just dumps all the results side by side. What I would like is a separate dataframe for each i.
In this case, as I have 4 columns, I would ultimately want 4 dataframes, each containing 4 columns of rolling correlations (i.e. df1 = correlation of column 1 with columns 1, 2, 3, 4; df2 = correlation of column 2 with columns 1, 2, 3, 4; etc.).
I thought of using lapply or rollapply, but ran into the same problem.
d=NULL
for (i in 1:ncol(iris2))
for (j in 1:ncol(iris2))
{c<-rollapply(iris2, 21 ,function(x) cor(x[,i],x[,j]), by.column=FALSE)
d<-cbind(d,c)}
I would really appreciate any input.
If you want to keep the expanded loop, how about a list of dataframes?
e <- vector("list", ncol(iris2))
for (i in 1:ncol(iris2)) {
d <- matrix(0, nrow = length(iris2[,1]), ncol = length(iris2[1,]))
for (j in 1:ncol(iris2)) {
d[,j]<- runCor(iris2[,i],iris2[,j],n=21)
}
e[[i]] <- d
}
It's also a good idea to preallocate the space you need (a list or matrix of placeholders) and fill it in, rather than growing an object with rbind or cbind.
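A sketch of the list-of-dataframes idea without explicit loops. It assumes a hypothetical helper roll_cor() as a base-R stand-in for TTR::runCor(), so the block runs without TTR; it returns NA for the first n - 1 positions and is not claimed to match runCor() in every detail. With TTR loaded you could call runCor() in its place.

```r
# Hypothetical base-R stand-in for TTR::runCor(): correlation over a trailing window of n
roll_cor <- function(x, y, n) {
  out <- rep(NA_real_, length(x))
  for (t in seq(n, length(x))) {
    w <- (t - n + 1):t
    out[t] <- cor(x[w], y[w])
  }
  out
}

iris2 <- iris[, 1:4]

# One dataframe per column i, holding its rolling correlation with every column
cor_list <- lapply(seq_along(iris2), function(i) {
  as.data.frame(lapply(iris2, function(y) roll_cor(iris2[[i]], y, n = 21)))
})
names(cor_list) <- paste0("df", seq_along(iris2))

head(cor_list$df2[21:25, ])
```

Each element keeps the column names of iris2, and the whole set is retrieved by name (cor_list$df2) instead of through assign() and get().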
Although it is not good practice in R to create dataframes on the fly (you should prefer putting them in a list, as in the other answer), the way to do so is to use the assign and get functions.
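for (i in 1:ncol(iris2)) {
d <- matrix(NA_real_, nrow = nrow(iris2), ncol = ncol(iris2))
for (j in 1:ncol(iris2)) {
d[, j] <- runCor(iris2[, i], iris2[, j], n = 21)
}
# Assign the finished dataframe to the name df1, df2, ...
assign(paste0("df", i), as.data.frame(d))
}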
# to have access to the dataframe:
get("df1")
# or inside a loop
get(paste0("df", i))
Since you stated your computation was slow, I wanted to provide you with a parallel solution. If you have a modern computer, it probably has 2 cores, if not 4 (or more!). You can easily check this via:
require(parallel) # for parallelization
detectCores()
Now the code:
require(quantmod)
require(TTR)
iris2 <- iris[,1:4]
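Parallelization with a socket cluster runs your code in separate R worker processes, which do not see the objects in your session. The function sent to the workers must therefore carry its variables and helper functions with it. That is why a wrapper function is created: it defines data, n and the helpers, and returns a worker function whose enclosing environment contains them.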
wrapper <- function(data, n) {
# variables placed into environment
force(data)
force(n)
# functions placed into environment
# same inner loop written in earlier answer
runcor <- function(data, n, i) {
d <- matrix(0, nrow = length(data[,1]), ncol = length(data[1,]))
for (j in 1:ncol(data)) {
d[, j] <- TTR::runCor(data[, i], data[, j], n = n)
}
return(d)
}
# call function to loop over iterator i
worker <- function(i) {
runcor(data, n, i)
}
return(worker)
}
Now create a cluster on your local computer, with one worker process per core, so that the iterations can run in parallel.
parallelcluster <- makeCluster(parallel::detectCores())
models <- parallel::parLapply(parallelcluster, 1:ncol(iris2),
wrapper(data = iris2, n = 21))
stopCluster(parallelcluster)
Stop and close the cluster when finished.