Split into list of data frames by column index [duplicate] - r

This question already has answers here:
How do I split a data frame among columns, say at every nth column?
(1 answer)
What is the algorithm behind R core's `split` function?
(1 answer)
Closed 4 years ago.
Is there an easy way in base R to split a data frame into a list of data frames based on the levels of an index factor (taken from another data frame)?
For example,
x = data.frame(num1 = 1:26, let = letters, num2 = 10:35, LET = LETTERS)
ls = list(x[, 1:2], x[, 3:4])
But let's say we had an index indicating a factor level for each column. Can split be used?
indx = c(1,1,2,2)
split(x, indx)  # something like this?

Use the default method of split, which treats the data frame as a list of columns:
out <- split.default(x, indx)
identical(ls, setNames(out, NULL))
#[1] TRUE
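Here is a short sketch of why the default method is needed. split() on a data frame dispatches to split.data.frame(), which groups rows. split.default() treats the data frame as a list of columns and returns each group as a data frame. The lookup table col_info is hypothetical and is included only to show group labels taken from another data frame.

```r
x <- data.frame(num1 = 1:26, let = letters, num2 = 10:35, LET = LETTERS)
indx <- c(1, 1, 2, 2)

# split.default() groups the columns: indx needs one label per column
col_groups <- split.default(x, indx)
sapply(col_groups, names)

# Hypothetical lookup table (not from the question) mapping column names to groups
col_info <- data.frame(col = c("num1", "let", "num2", "LET"),
                       grp = c("a", "a", "b", "b"))

# match() aligns the labels with names(x), so row order in col_info does not matter
by_lookup <- split.default(x, col_info$grp[match(names(x), col_info$col)])
```

By contrast, split(x, indx) would split the 26 rows while recycling indx, and R should warn that the data length is not a multiple of the split variable.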

Related

subset data into dataframe iteratively - R [duplicate]

This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 2 years ago.
For this dataframe:
df = data.frame(
col = c("a","f","g","a")
)
How do I subset it by each unique letter and store each subset in its own data frame, like so?
sheet_a <- subset(df, col == "a")
sheet_f <- subset(df, col == "f")
sheet_g <- subset(df, col == "g")
I think I need to loop over the unique values of the column, obtained with the code below, but I'm not sure how:
uniq.name_col <- unique(as.vector(df$col))
Thank you for any help!
You can try this. It also includes code for exporting the data frames to the global environment:
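# Create a named list with one data frame per unique value of col
List <- split(df, df$col)
# Optional: prefix the names so the objects are called sheet_a, sheet_f, sheet_g
names(List) <- paste0("sheet_", names(List))
# Assign every data frame in the list to the global environment
list2env(List, .GlobalEnv)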
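The question mentions a for loop over the unique values, so here is a sketch of that loop next to the split() equivalent. The loop uses assign() to create one object per letter, as the question asks. Keeping the named list is usually easier to work with afterwards. The names val and sheets are illustrative.

```r
df <- data.frame(col = c("a", "f", "g", "a"))

# Loop version: one object per unique value, named sheet_<value>
for (val in unique(df$col)) {
  assign(paste0("sheet_", val), subset(df, col == val))
}

# split() version: the same subsets collected in one named list
sheets <- split(df, df$col)
names(sheets) <- paste0("sheet_", names(sheets))

# The list can be processed in one step, e.g. rows per subset
sapply(sheets, nrow)
```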

cbind dataframes in loop [duplicate]

This question already has answers here:
Using cbind on an arbitrarily long list of objects
(4 answers)
Closed 4 years ago.
I have n data frames named "s.dfx", where x = 1:n. All the data frames have 7 columns with different names, and I want to cbind all of them.
I know the command
t<-cbind.data.frame(s.df1,s.df2,...,s.dfn)
But since n is large, I want to do the cbind in a loop instead.
I have tried
for(t2 in 1:n){
t<-cbind.data.frame(s.df[t2])
}
But I get this error: "Error in `[.data.frame`(s.df, t2) : undefined columns selected"
Can anyone help?
I don't think a for loop would be any faster than do.call(cbind, dfs), but it wasn't clear to me that you actually had such a list yet, so you might need to build it from the object names. This answer assumes you don't have a list yet, but that your data frames are numbered consecutively from 1 to n, where n may have more than one digit.
t <- do.call(cbind, mget(paste0("s.df", 1:n)))
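For comparison, here is a sketch of the loop the question asked for next to the one-call version, using three small hypothetical data frames in place of s.df1 to s.dfn. One detail to watch: mget() returns a named list, and cbind() on named data frame arguments prefixes each column with the argument name, so unname() keeps the original column names.

```r
# Hypothetical stand-ins for the question's s.df1, ..., s.dfn
n <- 3
for (i in seq_len(n)) {
  assign(paste0("s.df", i),
         setNames(data.frame(i, i * 10), paste0(c("a", "b"), i)))
}

# Loop version: fetch each data frame by name with get() and bind it on
combined_loop <- get("s.df1")
for (i in seq_len(n)[-1]) {
  combined_loop <- cbind(combined_loop, get(paste0("s.df", i)))
}

# One-call version without the name prefixes
combined_call <- do.call(cbind, unname(mget(paste0("s.df", 1:n))))

# With the names kept, columns become s.df1.a1, s.df1.b1, ...
combined_named <- do.call(cbind, mget(paste0("s.df", 1:n)))
```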
Pasqui's answer uses ls() inside mget() with a pattern to capture all the numbered data frames. I would use a slightly different pattern, since you suggested that n is higher than 9 and his pattern only targets a single digit:
ls(pattern = "^s\\.df[0-9]+") # any number of digits
# ^ the double backslash makes '.' a literal period (ls() has no fixed = TRUE option)
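A caveat when the names come from ls(): it returns them sorted alphabetically, so s.df10 comes before s.df2, and the columns would be bound in that order. The following sketch, using twelve hypothetical one-column data frames, reorders the names by their numeric suffix before binding.

```r
# Hypothetical data: twelve one-column data frames s.df1, ..., s.df12
for (i in 1:12) {
  assign(paste0("s.df", i), setNames(data.frame(i), paste0("v", i)))
}

# ls() sorts alphabetically: "s.df1" "s.df10" "s.df11" "s.df12" ...
found <- ls(pattern = "^s\\.df[0-9]+$")
head(found, 4)

# Reorder by the numeric suffix before binding
found_num <- found[order(as.integer(sub("^s\\.df", "", found)))]
combined <- do.call(cbind, unname(mget(found_num)))
names(combined)
```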
library(purrr) # redundant, since reduce() is called as purrr::reduce() below
library(dplyr) # bind_cols() comes from dplyr, not purrr
# generating dummy data frames
df1 <- data.frame(x = c(1,2), y = letters[1:2])
df2 <- data.frame(x = c(10,20), y = letters[c(10, 20)])
df3 <- data.frame(x = c(100, 200), y = letters[c(11, 22)])
# DEMO [to be adapted]: capturing the EXAMPLE data frames in a list
dfs <- mget(ls(pattern = "^df[1-3]"))
# A tidyverse (purrr + dplyr) solution
t <- purrr::reduce(.x = dfs, .f = bind_cols)
#Base R
do.call(cbind,dfs)
# or
Reduce(cbind,dfs)

List of different length character vectors into data frame [duplicate]

This question already has answers here:
How to convert a list consisting of vector of different lengths to a usable data frame in R?
(6 answers)
Closed 5 years ago.
I am struggling to find an existing question for this, so I'll ask it myself at the risk of posting a duplicate.
I have extracted the folder structure of my working directory, and I want to put the folder names into a data frame in which each column represents one level of the structure.
Using strsplit, I end up with a list of character vectors in which each element is the name of one folder level, e.g.:
folders<-list(c("Main") , c("Main","Mid"), c("Main", "Mid", "Sub"))
What would be the easiest way to get a data frame out of this? In this case I would want three columns, but I have several more levels (probably down to six).
Expected result (NA could be ""):
data.frame(Level1=c("Main", "Main", "Main"), Level2=c(NA,"Mid", "Mid"),
Level3=c(NA,NA,"Sub"))
The easiest option is stri_list2matrix from the stringi package:
library(stringi)
df <- as.data.frame(stri_list2matrix(folders, byrow = TRUE), stringsAsFactors=FALSE)
names(df) <- paste0("Level", seq_along(df))
df
# Level1 Level2 Level3
#1 Main <NA> <NA>
#2 Main Mid <NA>
#3 Main Mid Sub
This can also be solved in base R by padding each vector with NA up to the maximum length:
m1 <- max(lengths(folders))
d1 <- as.data.frame(do.call(rbind, lapply(folders, `length<-`, m1)), stringsAsFactors= FALSE)
names(d1) <- paste0("Level", seq_along(d1))
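Here is an end-to-end sketch of the base R route. It assumes the folder names start out as "/"-separated path strings. The paths vector is hypothetical; list.dirs() is one way to get real ones. The last step uses "" instead of NA, which the question says is acceptable.

```r
# Hypothetical folder paths
paths <- c("Main", "Main/Mid", "Main/Mid/Sub")
folders <- strsplit(paths, "/", fixed = TRUE)

# Pad every vector to the deepest level with NA, then stack them as rows
m1 <- max(lengths(folders))
d1 <- as.data.frame(do.call(rbind, lapply(folders, `length<-`, m1)),
                    stringsAsFactors = FALSE)
names(d1) <- paste0("Level", seq_along(d1))

# Missing levels as empty strings instead of NA
d1[is.na(d1)] <- ""
d1
```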

R: Ordering rows [duplicate]

This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 6 years ago.
I am trying to order rows by a variable. I have created a sample data frame below and tried to order the rows but the ordering does not appear to work.
# Create vectors for data frame
score <- rep(seq(1:3), 2)
id <- rep(c(2014, 2015), each = 3)
var_if_1 <- rep(c(0.1, 0.8), each = 3)
var_if_2 <- rep(c(0.9, 0.7), each = 3)
var_if_3 <- rep(c(0.6, 0.2), each = 3)
# Generate and print data frame of raw data
foo <- data.frame(score, id, var_if_1, var_if_2, var_if_3)
foo
# Impose arbitrary ordering
bar <- foo[sample(1:nrow(foo)), ]
bar
# Order rows increasing on 'score'
bar[order(score), ]
What am I doing wrong? Why doesn't this order the rows by score?
You should use
bar[order(bar$score), ]
Otherwise, order() uses the global vector score, which is still in the original (unshuffled) row order of foo, instead of the score column of bar.
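Here is a small sketch of the fix, trimmed to two of the question's columns. set.seed() is added only to make the shuffle reproducible. It also shows breaking ties with a second column and using with() so that bare column names refer to the columns of bar.

```r
score <- rep(1:3, 2)
id <- rep(c(2014, 2015), each = 3)
foo <- data.frame(score, id)

set.seed(1)  # only to make the shuffle reproducible
bar <- foo[sample(nrow(foo)), ]

# order() returns row positions, so compute them from bar's own column
by_score <- bar[order(bar$score), ]

# Ties in score can be broken by a second key
by_score_id <- bar[order(bar$score, bar$id), ]

# with() looks up bare names inside bar first, so score here is bar$score
by_score_id2 <- bar[with(bar, order(score, id)), ]
```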

Alternative to FOR Loop for below [duplicate]

This question already has answers here:
Group Data in R for consecutive rows
(3 answers)
Closed 6 years ago.
I have written a for loop that takes groups of 5 rows from a data frame and passes each group to a function, which returns a single row after doing some operations on those 5 rows. Below is the code:
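start <- 1          # assumed initialisation, not shown in the question
final_data <- NULL  # assumed initialisation, not shown in the question
for (i in 1:nrow(features_data1)){
  if (i - start == 4){              # rows start..i form a complete group of 5
    group <- features_data1[start:i,]
    group <- as.data.frame(group)
    start <- i+1
    sub_data <- feature_calculation(group)
    final_data <- rbind(final_data, sub_data)
  }
}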
Can anyone suggest an alternative? The for loop takes a lot of time, and the function feature_calculation is huge.
Try this for a base R approach:
# convert features to data frame in advance so we only have to do this once
features_df <- as.data.frame(features_data1)
# assign each observation (row) to a group of 5 rows and split the data frame into a list of data frames
group_assignments <- as.factor(rep(1:ceiling(nrow(features_df) / 5), each = 5, length.out = nrow(features_df)))
groups <- split(features_df, group_assignments)
# apply your function to each group individually (i.e. to each element in the list)
sub_data <- lapply(X = groups, FUN = feature_calculation)
# bind your list of data frames into a single data frame
final_data <- do.call(rbind, sub_data)
You might also get a speed-up from the purrr and dplyr packages: dplyr's bind_rows() is generally faster than do.call(rbind, list_of_data_frames) when the list is very large.
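Here is a toy run of the split()/lapply() approach. It uses a hypothetical feature_calculation() that just returns column means; the real function is not shown in the question. There is one difference from the original loop: the loop only processes complete groups of 5, while split() keeps a trailing partial group. The sketch also shows how to drop that group if you want the loop's behavior.

```r
# Hypothetical stand-in for feature_calculation(): one row of column means
feature_calculation <- function(group) as.data.frame(t(colMeans(group)))

# Toy data: 12 rows, so the last group has only 2 rows
features_df <- data.frame(a = 1:12, b = seq(10, 120, by = 10))

# Same grouping as rep(..., each = 5, length.out = nrow(features_df))
group_id <- ceiling(seq_len(nrow(features_df)) / 5)
groups <- split(features_df, group_id)

# All groups, including the partial one
final_data <- do.call(rbind, lapply(groups, feature_calculation))

# Only complete groups of 5, matching the original loop
full <- groups[vapply(groups, nrow, integer(1)) == 5]
final_full <- do.call(rbind, lapply(full, feature_calculation))
```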
