How to cbind many data frames with a loop? - r

I have 105 data frames of class xts/zoo and I want to combine their 6th columns into a single data frame.
So I created a data frame that contains all the data frame names, to use it in a 'for' loop:
mydata <- AAL
for (i in 2:105) {
k <- top100[i,1] # The first column contains all the data frame names
mydata <- cbind(mydata, k)
}
It's obviously wrong, but I have no idea how to cbind so many data frames with completely different names (my data frame names are NASDAQ symbols), or how to pick the 6th column of each.
Thank you in advance

Try the foreach package. There may be a more elegant way to do this, but this approach will work.
library(foreach)
# create two simple one-column objects named 'A' and 'B'
# (note: t() of a data frame returns a matrix)
df1<-t(data.frame(1,2,3))
df2<-t(data.frame(4,5,6))
colnames(df1)<-c('A')
colnames(df2)<-c('B')
#make a list
dfs<-list(df1,df2)
#join data frames column by column, this will preserve their names
# .combine = cbind binds the results column by column; don't forget it
foreach(x = 1:2, .combine = cbind) %do% {
  dfs[[x]]
}
The result will be:
A B
X1 1 4
X2 2 5
X3 3 6
To pick column number 6:
df[,6]

First, you should store all of your data.frames in a list. You can then use a combination of lapply and do.call to extract and recombine the sixth columns of each of the data.frames:
# Create sample data
df_list <- lapply(1:105, function(x) {
as.data.frame(matrix(sample(1:1000, 100), ncol = 10))
})
# Extract the sixth column from each data.frame
extracted_cols <- lapply(df_list, function(x) x[6])
# Combine all of the columns together into a new data.frame
result <- do.call("cbind", extracted_cols)
One way to get all of your preexisting data.frames into a list would be to use lapply along with get:
df_list <- lapply(top100[[1]], get)
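The two steps above can be put together in one short sketch. The objects AAA, BBB and CCC and the name table top3 are made-up stand-ins for the real symbol-named data frames and for top100. mget() is the vectorised counterpart of get(). The two-index form x[, 6] is used on purpose: as far as I know, single-index subsetting on xts objects selects rows rather than columns, so x[6] may not do what you expect there.

```r
# Hypothetical stand-ins for the symbol-named data frames (assumption)
set.seed(1)
AAA <- as.data.frame(matrix(rnorm(30), ncol = 6))
BBB <- as.data.frame(matrix(rnorm(30), ncol = 6))
CCC <- as.data.frame(matrix(rnorm(30), ncol = 6))

# Hypothetical name table, playing the role of top100 (assumption)
top3 <- data.frame(symbol = c("AAA", "BBB", "CCC"), stringsAsFactors = FALSE)

# mget() looks up several objects by name and returns a named list
df_list <- mget(as.character(top3[[1]]))

# Keep the sixth column of each object; drop = FALSE keeps it two-dimensional
sixth_cols <- lapply(df_list, function(x) x[, 6, drop = FALSE])

# Bind the columns side by side and label each one with its symbol
result <- do.call(cbind, sixth_cols)
colnames(result) <- names(df_list)
```

Naming the columns after the symbols makes it easy to tell afterwards which column came from which data frame.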

Related

How can I create smaller data frames out of a nested list in R

My list has 12000 entries. Each entry consists of 16 columns and 8 rows.
I would like to create a data frame for every single entry. I'm interested in 3 of the 16 columns (X,Y and Z coordinates)
I already tried this:
data_frame12000 <- as.data.frame(do.call(cbind, list_small_read_laz))
This and other functions only create one big data.frame with all the 16 columns for each entry.
Can anybody help me?
Thank You in advance!
If I understand correctly, you have a list of 12000 elements, each containing a data frame with 8 rows and 16 columns. I also assume the column names are the same for all list elements.
First select X, Y, Z columns from each entry element :
library(tidyverse)
# assuming your list is named 'list_small_read_laz'
reduced_column <- map(list_small_read_laz,~ select(.,X,Y,Z))
Then combine all entries into a single dataframe:
df_reduced_column <- map_dfr(reduced_column, as.data.frame)
Hope this is what you are looking for.
If you have a list of 12000 dataframes you can generate a list of dataframes with only the desired columns using lapply. Here is an example using mtcars:
cars1 <- mtcars
cars2 <- cars1
cars3 <- cars2
list1 <- list(cars1, cars2, cars3)
df_list <- lapply(list1, function(x) x[, c(2, 4, 6)]) # column numbers are used
final_df <- Reduce(rbind, df_list) # if you want all of the dataframes combined by rows
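If the columns are better selected by name than by position, the same idea can be sketched as follows. The list entries and the helper make_entry are hypothetical stand-ins for the real 12000-element list; only the column names X, Y and Z come from the question.

```r
# Hypothetical stand-in for the list of entries (assumption):
# three data frames with 8 rows and 16 columns, three of them named X, Y, Z
set.seed(42)
make_entry <- function() {
  m <- matrix(runif(8 * 16), nrow = 8)
  colnames(m) <- c("X", "Y", "Z", paste0("attr", 1:13))
  as.data.frame(m)
}
entries <- list(make_entry(), make_entry(), make_entry())

# One small data frame per entry, keeping only the coordinate columns
coords <- lapply(entries, function(d) d[, c("X", "Y", "Z")])

# Optional: stack them, with an 'entry' column recording the source element
stacked <- do.call(rbind, Map(function(d, i) cbind(entry = i, d),
                              coords, seq_along(coords)))
```

Here coords keeps one data frame per entry, which is what the question asks for; stacked is only needed if a single long table is more convenient.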

Split a dataframe into a list of nested data frames and matrices

I'd like to split the diamonds data frame into a list of 5 data frames, grouped by cut. This documentation page got me started.
https://dplyr.tidyverse.org/reference/group_split.html
# group_split() orders the groups by factor level, so name them with levels(), not unique()
diamonds_g <- diamonds %>% group_split(cut) %>% setNames(levels(diamonds$cut))
My desired output is a list of 5 nested lists. Each nested list contains one data frame and one matrix, such that:
View(diamonds_g[[1]])
factors <- diamonds_g[[1]][2:4]
mat <- diamonds_g[[1]][6:10]
So each nested list (one per cut) contains one data frame named factors, with n rows (depending on how many diamonds are classified as that cut) and 3 columns, and one matrix named mat, with n rows and 5 columns (columns 6 to 10 of the original data). In other words, the lowest level of the list (the nested matrix and data frame) should have identical names across the 5 nested lists. How do I proceed?
Thank you.
Do you mean something like this?
result <- lapply(diamonds_g, function(x)
list(factors = x[2:4], mat = as.matrix(x[6:10])))
We can use tidyverse
library(dplyr)
library(purrr)
result <- map(diamonds_g, ~ list(factors = .x[2:4], mat = as.matrix(.x[6:10])))
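The same pattern can be tried without any packages. The sketch below uses the built-in iris data instead of diamonds (a substitution made only so the example runs in base R), with Species playing the role of cut. split() names each element after its factor level, so no separate setNames() call is needed.

```r
# split() returns one data frame per factor level, named after the level
iris_g <- split(iris, iris$Species)

# For each group: a data frame of factor columns and a numeric matrix
nested <- lapply(iris_g, function(x) {
  list(factors = x["Species"], mat = as.matrix(x[1:4]))
})

names(nested)
str(nested$setosa, max.level = 1)
```

Every element of nested has the same two component names, factors and mat, which is the structure asked for in the question.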

Creating Subset data frames in R within For loop [duplicate]

What I am trying to do is filter a larger data frame into 78 unique data frames based on the value of the first column in the larger data frame. The only way I can think of doing it properly is by applying the filter() function inside a for() loop:
for (i in 1:nrow(plantline))
{x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
The issue is I don't know how to create a new data frame, say x2, x3, x4... every time the loop runs.
Can someone tell me if that is possible or if I should be trying to do this some other way?
There must be many duplicates for this question
split(rawdta.df, rawdta.df$Plant_Line)
will create a list of data.frames.
However, depending on your use case, splitting the large data.frame into pieces might not be necessary as grouping can be used.
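As a rough illustration of grouping instead of splitting, here is a base R sketch. The data frame raw and its height column are invented for the example; only the column name Plant_Line comes from the question.

```r
# Hypothetical stand-in for rawdta.df (assumption): a grouping column
# Plant_Line and a numeric measurement called height
raw <- data.frame(
  Plant_Line = rep(c("A", "B", "C"), each = 4),
  height = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# aggregate() computes a per-group summary without creating separate objects
group_means <- aggregate(height ~ Plant_Line, data = raw, FUN = mean)
print(group_means)
```

If all you need is a per-group summary like this, the 78 separate data frames never have to exist.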
You could use split -
# creates a list of 78 data frames, one for each unique value
# of the first column in the larger data frame
lst = split(large_data_frame, large_data_frame$first_column)
# takes the dataframes out of the list into the global environment
# although it is not suggested since it is difficult to work with 78
# dataframes
list2env(lst, envir = .GlobalEnv)
The names of the dataframes will be the same as the value of the variables in the first column.
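To show why keeping the pieces in the list is usually enough, here is a small sketch. The data frame large_df and its columns line and value are hypothetical.

```r
# Hypothetical stand-in for the large data frame (assumption)
large_df <- data.frame(
  line = rep(c("A", "B", "C"), times = c(2, 3, 4)),
  value = 1:9
)

# One list element per distinct value of the grouping column
lst <- split(large_df, large_df$line)

# Look up a single piece by name
lst[["B"]]

# Apply the same operation to every piece
rows_per_line <- vapply(lst, nrow, integer(1))
```

Any piece is reachable by name, and lapply() or vapply() runs the same step on all of them, which is hard to do with 78 loose variables.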
It would be easier if we could see the dataframes....
I propose something nevertheless. You can create a list of dataframes:
dataframes <- vector("list", nrow(plantline))
for (i in 1:nrow(plantline)){
dataframes[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
You can use assign :
for (i in 1:nrow(plantline))
{assign(paste0("x", i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))}
Alternatively, you can save your results in a list:
X <- list()
for (i in 1:nrow(plantline))
{X[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
This would be easier with sample data. The by() function would be my favorite.
d <- data.frame(plantline = rep(LETTERS[1:3], 4),
x = 1:12,
stringsAsFactors = FALSE)
l <- by(d, d$plantline, data.frame)
print(l$A)
print(l$B)
Solution using plyr:
ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)
library(plyr)
dlply(ma, "z") # you split ma by the column named z

list variables to individual data.frames

Let's say I have a list called myList of 30 data.frames, each containing 2 variables (called value and rank).
I know I can use
my.DF <- do.call("cbind", myList)
to create the output my.DF containing all the variables next to each other.
Is it possible to cbind each variable individually into its own data.frame, i.e. to have a new data.frame containing only the 2nd variable?
We can extract the second column by looping over the list (lapply) and wrap with data.frame.
data.frame(lapply(myList, `[`, 2))
If we want to separate the variables,
lapply(names(myList[[1]]), function(x)
do.call(cbind,lapply(myList, `[`, x)))
data
set.seed(24)
myList <- list( data.frame(value=1:6, rank= sample(6)),
data.frame(value=7:12, rank=sample(6)))
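A variation on the above, as a sketch only: it returns a named list with one data frame per variable and numbers the columns by list position. The names vars, by_var and the column suffixes are my own choices, not part of the original answer.

```r
set.seed(24)
myList <- list(data.frame(value = 1:6, rank = sample(6)),
               data.frame(value = 7:12, rank = sample(6)))

vars <- names(myList[[1]])

# One data frame per variable; each column comes from one list element
by_var <- lapply(vars, function(v) {
  cols <- lapply(myList, function(d) d[[v]])
  names(cols) <- paste0(v, seq_along(cols))
  as.data.frame(cols)
})
names(by_var) <- vars

by_var$rank
```

With 30 data.frames, by_var$rank would have 30 columns named rank1 to rank30.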

assigning a variable to each row for a excel format on R

I have an Excel file with 5 columns and 100 rows, which I import into R.
I want to make a unique vector (or list) for each of the rows. Each vector would then contain 5 elements.
My issue is: how do I make R automatically create 100 unique variable names and assign the elements of each row to those variables? I don't want to assign variable names to each row manually.
You can use the split function for that. An example:
# creating a data.frame
df <- data.frame(x=gl(2,10, labels=c("t","c")), y=runif(20))
# splitting the dataframe df into separate dataframes
lst <- split(df, 1:nrow(df))
This will create a list of dataframes lst. You can access the separate dataframes as follows:
> lst[1]
$`1`
x y
1 t 0.971842
A slightly alternative approach:
# creating a data.frame
set.seed(1)
df <- data.frame(x=rnorm(20), y=runif(20))
# creating a unique value for each row
df$unique <- paste0("u",seq_len(20))
# splitting the dataframe df into separate dataframes
lst <- split(df, df$unique)
this gives for example:
> lst$u11
x y unique
11 1.511781 0.4776196 u11
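If plain vectors are wanted rather than one-row data frames, the split result can be flattened. This is a sketch under the assumption that all 5 columns are numeric; with mixed column types unlist() would coerce everything to one type. The data frame sheet is a made-up stand-in for the imported Excel data.

```r
# Hypothetical stand-in for the imported sheet (assumption):
# 100 rows and 5 numeric columns
set.seed(1)
sheet <- as.data.frame(matrix(round(runif(500), 2), nrow = 100))

# split() by row number, then flatten each one-row data frame to a vector
row_vectors <- lapply(split(sheet, seq_len(nrow(sheet))), unlist)

# Each element is a named vector of length 5, looked up by row number
row_vectors[["37"]]
```

The list names "1" to "100" take the place of 100 separate variable names.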
