rbind dataframes with varying names - r

I have a situation where I need to rbind multiple dataframes based on their names. The trouble I'm having is how to define the binding when the names differ.
For instance, the names of my dataframes are:
AB_0
AB_1
BCD_0
BCD_1
And I want to rbind AB_0 and BCD_0, and AB_1 and BCD_1. The common factor I'm binding on is everything from the _ onward.
I know I could use strsplit, but all I'm trying to get to is something like:
for(i in 0:1){
do.call("rbind", mget(sprintf("*_%d", i)))
}
where * is some variable string with a varying number of characters

Something like this?
AB_0 <- data.frame(a=1, b=1)
AB_1 <- data.frame(a=2, b=2)
BCD_0 <- data.frame(a=3, b=3)
BCD_1 <- data.frame(a=4, b=4)
# anchor the pattern with $ so that e.g. "_0" does not also match names ending in "_01"
XX0 <- do.call("rbind", mget(ls(pattern = "_0$")))
XX1 <- do.call("rbind", mget(ls(pattern = "_1$")))
Or automate using a list:
XX <- list()
for (i in 0:1) {
XX[[i + 1]] <- do.call("rbind", mget(ls(pattern = paste0("_", i, "$")))) # $ anchors the suffix at the end of the name
}
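If the suffixes are not known in advance, the grouping can also be derived from the object names themselves. The following is a minimal sketch, assuming every dataframe name ends in an underscore followed by digits. The environment `env` and the names `nms`, `groups` and `combined` are made up for illustration; at the top level you would work in the global environment instead.

```r
# Hypothetical setup: keep the example objects in their own environment
env <- new.env()
assign("AB_0",  data.frame(a = 1, b = 1), envir = env)
assign("AB_1",  data.frame(a = 2, b = 2), envir = env)
assign("BCD_0", data.frame(a = 3, b = 3), envir = env)
assign("BCD_1", data.frame(a = 4, b = 4), envir = env)

# Object names that end in "_<digits>"
nms <- ls(envir = env, pattern = "_[0-9]+$")

# Group the names by everything after the last underscore
groups <- split(nms, sub("^.*_", "", nms))

# rbind each group; the result is a list named by suffix ("0", "1")
combined <- lapply(groups, function(g) do.call(rbind, mget(g, envir = env)))
combined[["0"]]
```

This avoids hard-coding the range 0:1, at the cost of relying on the naming convention being consistent.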

Related

how to make a loop to fetch one variable from 1000 dataframes

I have dataframes named V1...V1000. Each dataframe contains one variable with the same name, 'var1.predict'. I'm having a hard time creating a loop to concatenate all the variables I want to fetch into one new dataframe.
This is the call I want to turn into a loop:
df <- cbind.data.frame(model_V1$var1.pred,model_V2$var1.pred,.....model_V1000$var1.pred)
I hope someone can help solve this.
Thank you.
The result should be a new dataframe formed by taking one variable from each dataframe.
I assume that you mean you have 1,000 dataframes V1... V1000 each with the column var1.predict and you want to extract the predictions column from each df. If so, there are a few methods outlined below with a little reprex:
# putting dummy data in to the global env
lapply(1:3, \(i) {
assign(paste0("V", i), data.frame(v1 = rnorm(5),
v2 = rnorm(5),
var1.predict = rnorm(5)), envir = .GlobalEnv)
})
df_list <- list(V1, V2, V3)
# using a for loop and do.call
pred_cols <- list()
for (df in df_list) {
pred_cols <- c(pred_cols, list(df[["var1.predict"]]))
}
pred_cols_df <- do.call(cbind, pred_cols)
pred_cols_df <- as.data.frame(pred_cols_df) # assign the result; as.data.frame() does not modify in place
pred_cols_df
# using a loop without do.call
for (i in seq_along(df_list)) {
if (i == 1) {
pred_cols_df <- df_list[[1]][["var1.predict"]]
} else {
pred_cols_df <- cbind(pred_cols_df, df_list[[i]][["var1.predict"]])
}
}
pred_cols_df <- as.data.frame(pred_cols_df) # assign the result; as.data.frame() does not modify in place
pred_cols_df
# using lapply
pred_cols <- lapply(df_list, `[`, "var1.predict")
pred_cols_df <- do.call(cbind, pred_cols)
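pred_cols_df <- as.data.frame(pred_cols_df) # already a data frame here, since `[` keeps each column as a one-column data frame
pred_cols_df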
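With 1,000 dataframes, typing the list out by hand is not practical, so the list can be built from the names instead. Below is a minimal sketch, assuming the dataframes are called V1, V2, ... and each has a var1.predict column. The environment `env` and the name `pred_df` are hypothetical; with objects in the global environment the `envir` argument can be dropped.

```r
# Hypothetical setup: three dummy dataframes stored in their own environment
env <- new.env()
set.seed(1)
for (i in 1:3) {
  assign(paste0("V", i),
         data.frame(v1 = rnorm(5), var1.predict = rnorm(5)),
         envir = env)
}

# Fetch all dataframes by name into a named list
df_list <- mget(paste0("V", 1:3), envir = env)

# Pull the same column out of each one; the list names become column names
pred_df <- as.data.frame(lapply(df_list, function(d) d[["var1.predict"]]))
names(pred_df)
```

Because the list is named, each column of the result carries the name of the dataframe it came from, which makes it easier to trace values back to their source.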

Replace loop with lapply, sapply

I'm very new to R, and I heard it's best to replace loops with apply functions. However, I couldn't wrap my head around how to transform my loop in this example. Any help would be appreciated.
file_path is a list of file names
file_path[1] = "/home/user/a.rds"
file_path[2] = "/home/user/b.rds"
...
vector_sum <- rep(0,50000)
for(i in 1:5){
temp_data <- readRDS(file_path[i])
temp_data <- as.matrix(temp_data[,c("loss_amount")])
vector_sum <- vector_sum + temp_data
}
My goal is to loop through all the files, keep only the loss_amount column from each file, and add it to vector_sum, so that in the end vector_sum is the element-wise sum of all loss_amount columns from all files.
Using rowSums.
rowSums(sapply(file_path, \(x) readRDS(x)[, 'loss_amount'], USE.NAMES=F))
# [1] 1.2060989 1.4248851 -0.4759345
Data:
set.seed(42)
l <- replicate(3, matrix(rnorm(6), 3, 2, dimnames=list(NULL, c('x', 'loss_amount'))), simplify=F)
dir.create('foo') ## creates `foo` in wd (warns if it already exists)!
Map(\(x, y) saveRDS(x, paste0('foo/', y, '.rds')), l, letters[seq_along(l)])
file_path <- list.files('foo', full.names=TRUE)
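Here is one possible way to solve your problem using lapply. Reduce adds the columns element-wise, so the result is a vector like vector_sum in the question:
Reduce(`+`, lapply(file_path, \(fle) readRDS(fle)[, "loss_amount"]))
# or, if a single grand total over all files is wanted instead
sum(unlist(lapply(file_path, \(fle) readRDS(fle)[, "loss_amount"])))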

Loop various df to transform factors to numeric

I have imported various datasets with the same variables for different years. I am trying to transform some of the columns from factor to numeric. To save time I have created a function, which seems not to work.
I have created a list with the names of the datasets as strings
dfs <- list("df1", "df2", "df3", "df4", "df5", "df6", "df7", "df8")
And a second list with the names of the variables (columns) also as strings
vars <- list("var1", "var2", "var3", "var4")
First I tried joining both lists with an "$" in the middle and then passing the function to transform factors to numerics:
to_int <- function(column){
if (is.factor(column)){
column <-levels(column)[column]
column<-as.numeric(column)
return(column)
}
else{
return(column)
}
}
Option 1: create a vector with strings joined by $
col_names <- vector(mode = "list", length = length(dfs))
# Add the combination of names to each vector
for (df in dfs) {
for (var in vars){
r <- paste(df, var, sep = "$") # Combine the names in the 2 lists with a $ in the middle
col_names[[match(df, dfs)]][match(var, vars)] <- r # Assign result to the pre-set vector
}
}
# Iterate through list (col_names) and apply "to_int" to each of the strings in the list
for (l in col_names){
for (col_name in l){
colnm <- eval(parse(text = col_name))
nmrc <- to_int(colnm) # from factor to numeric each column. Works!
assign(col_name, nmrc, envir = globalenv()) # Creates values (Rstudio) with the correct name but columns on dfs remain intact
}
}
Then I tried treating the strings on both lists separately and get them together inside the loop:
Option 2: Treat the lists as separate strings and join in loop
for (df in dfs) {
for (var in vars){
a <- eval(parse(text = df))
b <- to_int(a[var]) # using $ returns null. using [] no change in original df, still factor
a[var] <- b
}
}
I finally tried creating a new function that has two variables as inputs:
# with two inputs
to_int2 <- function(df, col){
eval(parse(text = df))
if (is.factor(df[col])){ # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
df[col] <-levels(df[col])[df[col]]
df[col]<-as.numeric(df[col])
return(df[col])
}
else{
return(df[col])
}
}
And passed that through a third attempt
Option 3: transform factor to numeric with two inputs
for (df in dfs) {
for (var in vars){
a <- to_int2(df, var) # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
b <- eval(parse(text = df))
b$var <- a # No effect
}
}
None of them had an effect on the desired columns of the dataframes.
Any idea on how to solve this?
Thanks
It's generally better to work with multiple similar datasets as a list of frames. The premise being that whatever you do to one, you will do to all, and that is automated easily using lapply.
As an example, try this:
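# dfs and vars are lists of strings in the question, so unlist() them to character vectors first
LOF <- mget(unlist(dfs))
LOF <- lapply(LOF, function(df) {
cols <- unlist(vars)
df[cols] <- lapply(df[cols], to_int) # to_int() from the question; as.integer() on a factor would return the level codes
df
})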
But if you must keep them separate, then try this:
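for (nm in unlist(dfs)) {
dat <- get(nm)
cols <- unlist(vars) # character vector of column names
dat[cols] <- lapply(dat[cols], to_int) # to_int() from the question; as.integer() on a factor would return the level codes
assign(nm, dat)
}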
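One thing worth checking when converting factors: as.integer() and as.numeric() applied directly to a factor return the internal level codes, not the values shown in the labels. The short sketch below illustrates the difference on made-up data (`f`, `df1` and `cols` are hypothetical names), using the same level-based conversion as the to_int function in the question.

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

as.integer(f)             # level codes: 1 2 1
as.numeric(levels(f))[f]  # values from the labels: 10 20 10

# Applying the level-based conversion to several columns of a dataframe
df1 <- data.frame(var1 = factor(c("10", "20")),
                  var2 = factor(c("1.5", "2.5")))
cols <- c("var1", "var2")
df1[cols] <- lapply(df1[cols], function(x) as.numeric(levels(x))[x])
str(df1)
```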

How can I multiply multiple dataframes of a list by each observation of a vector?

I have a list of dataframes, and I would like to multiply each one by the corresponding element of a vector.
The first dataframe in the list would be multiplied by the first element of the vector, and so on, producing another list of multiplied dataframes.
I tried to do this with a loop, but was unsuccessful. I also tried to come up with something using map or lapply, but I couldn't.
for(i in vec){
for(j in listdf){
listdf2 <- i*listdf[[j]]
}
}
Error in listdf[[j]] : invalid subscript type 'list'
Any idea how to solve this?
Note: the vector and the list of dataframes have the same length.
Use Map:
listdf2 <- Map(`*`, listdf, vec)
In purrr this can be done using map2:
listdf2 <- purrr::map2(listdf, vec, `*`)
If you are interested in a for loop solution, you just need one loop:
listdf2 <- vector('list', length(listdf))
for (i in seq_along(vec)) {
listdf2[[i]] <- listdf[[i]] * vec[i]
}
data
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(df, df, df)
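Since Map pairs elements by position and recycles shorter arguments, it can be worth checking the lengths before multiplying. Here is a minimal sketch under the assumption that the list is named; the names x, y and z are made up for illustration.

```r
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(x = df, y = df, z = df)  # hypothetical names

# Guard against silent recycling when the lengths differ
stopifnot(length(listdf) == length(vec))

# Element i of listdf is multiplied by element i of vec
listdf2 <- Map(`*`, listdf, vec)

names(listdf2)  # names of the first argument are kept
listdf2$y$a     # second dataframe, multiplied by vec[2]
```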

In R, how to read an index vector?

Example.
I have two vectors, vec_1 and vec_2
vec_1 <- c(1,2,3,4)
vec_2 <- c(6,7)
I want to do
vec = vector()
for(i in 1:2){
vec[i] <- mean(vec_i)
}
I already tested "paste" of various types. Help!
We can use mget to get the values of the objects in a list, loop over the list with lapply, and get the mean:
lapply(mget(paste0("vec_", 1:2)), mean)
If it is a data.frame
lapply(mget(paste0('vec_', 1:10)), function(x) mean(x$Pressure))
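If a plain numeric vector is preferred over a list, vapply can be used in place of lapply. This is a minimal sketch using the two vectors from the question; the environment `env` and the name `means` are hypothetical, and `envir` can be dropped when the vectors live in the global environment.

```r
# Hypothetical setup: the two vectors from the question in their own environment
env <- new.env()
assign("vec_1", c(1, 2, 3, 4), envir = env)
assign("vec_2", c(6, 7), envir = env)

# Build the names, fetch the objects, and take the mean of each one;
# numeric(1) declares that every result is a single number
means <- vapply(mget(paste0("vec_", 1:2), envir = env), mean, numeric(1))
means
```

The result is a named numeric vector, so `means[["vec_1"]]` gives the mean of the first vector.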
