how to make a loop to fetch one variable from 1000 dataframes

I have data frames named V1 ... V1000. Each one contains a variable with the same name, 'var1.predict'. I'm having a hard time writing a loop that concatenates all the variables I want to fetch into one new data frame.
This is the code I want to turn into a loop:
df <- cbind.data.frame(model_V1$var1.pred,model_V2$var1.pred,.....model_V1000$var1.pred)
I hope someone can help solve this.
Thank you.
Expected result: a new data frame formed by taking one variable from each data frame.

I assume you mean that you have 1,000 data frames V1 ... V1000, each with the column var1.predict, and you want to extract that predictions column from each one. If so, here are a few methods, with a small reprex:
# putting dummy data (V1, V2, V3) into the global environment
invisible(lapply(1:3, \(i) {
  assign(paste0("V", i), data.frame(v1 = rnorm(5),
                                    v2 = rnorm(5),
                                    var1.predict = rnorm(5)), envir = .GlobalEnv)
}))
# collect the data frames into a list by name; for the real data use paste0("V", 1:1000)
df_list <- mget(paste0("V", 1:3))
# using a for loop and do.call
pred_cols <- list()
for (df in df_list) {
  pred_cols <- c(pred_cols, list(df[["var1.predict"]]))
}
pred_cols_df <- as.data.frame(do.call(cbind, pred_cols))
pred_cols_df
# using a loop without do.call
for (i in seq_along(df_list)) {
  if (i == 1) {
    pred_cols_df <- df_list[[1]][["var1.predict"]]
  } else {
    pred_cols_df <- cbind(pred_cols_df, df_list[[i]][["var1.predict"]])
  }
}
pred_cols_df <- as.data.frame(pred_cols_df)
pred_cols_df
# using lapply ([[ returns the column as a vector)
pred_cols <- lapply(df_list, `[[`, "var1.predict")
pred_cols_df <- as.data.frame(do.call(cbind, pred_cols))
pred_cols_df
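A variant of the lapply approach, as a minimal sketch: it assumes the data frames can be looked up by name with mget() and that they all have the same number of rows (otherwise cbind() would recycle or warn). vapply() checks the length and type of every extracted column, and the list names carry over as column names. The environment `env` and the three small stand-in data frames are hypothetical.

```r
# Hypothetical stand-ins for V1 ... V1000 (three small data frames here)
set.seed(42)
env <- new.env()
for (i in 1:3) {
  assign(paste0("V", i),
         data.frame(v1 = rnorm(5), var1.predict = rnorm(5)),
         envir = env)
}

# Look the data frames up by name; for the real data use paste0("V", 1:1000)
nms <- paste0("V", 1:3)
df_list <- mget(nms, envir = env)

# Extract the column from each data frame; vapply() errors if any column
# is not numeric or has a different number of rows than the first one
n_rows <- nrow(df_list[[1]])
pred_mat <- vapply(df_list, function(d) d[["var1.predict"]], numeric(n_rows))

# Column names come from the list names: V1, V2, V3
pred_df <- as.data.frame(pred_mat)
```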

Related

Loop various df to transform factors to numeric

I have imported various datasets with the same variables for different years. I am trying to transform some of the columns from factor to numeric. To save time I have created a function, which seems not to work.
I have created a list with the names of the datasets as strings
dfs <- list("df1", "df2", "df3", "df4", "df5", "df6", "df7", "df8")
And a second list with the names of the variables (columns) also as strings
vars <- list("var1", "var2", "var3", "var4")
First I tried joining the two lists with a "$" in between and then applying a function that transforms factors to numeric:
to_int <- function(column){
  if (is.factor(column)){
    column <- levels(column)[column]
    column <- as.numeric(column)
    return(column)
  }
  else{
    return(column)
  }
}
Option 1: create a vector with strings joined by $
col_names <- vector(mode = "list", length = length(dfs))
# Add the combination of names to each vector
for (df in dfs) {
  for (var in vars){
    r <- paste(df, var, sep = "$") # Combine the names in the 2 lists with a $ in the middle
    col_names[[match(df, dfs)]][match(var, vars)] <- r # Assign result to the pre-set vector
  }
}
# Iterate through list (col_names) and apply "to_int" to each of the strings in the list
for (l in col_names){
  for (col_name in l){
    colnm <- eval(parse(text = col_name))
    nmrc <- to_int(colnm) # from factor to numeric each column. Works!
    assign(col_name, nmrc, envir = globalenv()) # Creates values (Rstudio) with the correct name but columns on dfs remain intact
  }
}
Then I tried treating the strings in both lists separately and joining them inside the loop:
Option 2: Treat the lists as separate strings and join in loop
for (df in dfs) {
  for (var in vars){
    a <- eval(parse(text = df))
    b <- to_int(a[var]) # using $ returns null. using [] no change in original df, still factor
    a[var] <- b
  }
}
Finally, I tried writing a new function that takes two variables as inputs:
# with two inputs
to_int2 <- function(df, col){
  eval(parse(text = df))
  if (is.factor(df[col])){ # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    df[col] <- levels(df[col])[df[col]]
    df[col] <- as.numeric(df[col])
    return(df[col])
  }
  else{
    return(df[col])
  }
}
And used it in a third attempt:
Option 3: transform factor to numeric with two inputs
for (df in dfs) {
  for (var in vars){
    a <- to_int2(df, var) # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    b <- eval(parse(text = df))
    b$var <- a # No effect
  }
}
None of them had an effect on the desired columns of the dataframes.
Any idea on how to solve this?
Thanks

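It's generally better to work with multiple similar datasets as a list of data frames. The premise is that whatever you do to one, you will do to all, and that is easily automated with lapply.
As an example, try this. Note that dfs and vars need to be character vectors (hence unlist()), and that as.integer() on a factor returns its level codes rather than the numbers it displays, so the conversion goes through levels():
dfs <- unlist(dfs)
vars <- unlist(vars)
to_num <- function(x) if (is.factor(x)) as.numeric(levels(x))[x] else x
LOF <- mget(dfs)
LOF <- lapply(LOF, function(df) {
  df[vars] <- lapply(df[vars], to_num)
  df
})
But if you must keep them separate, then try this:
for (nm in dfs) {
  dat <- get(nm)
  dat[vars] <- lapply(dat[vars], to_num)
  assign(nm, dat)
}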
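A self-contained sketch of the list-of-data-frames approach with made-up data (df1, df2 and the columns var1, var2 are hypothetical). It assumes the factor levels are numbers stored as text, which is why it converts through levels() instead of calling as.integer() directly: as.integer() on a factor returns the internal level codes. list2env() at the end copies the converted frames back under their original names, for when keeping them as separate objects is really needed.

```r
# Hypothetical data: numeric columns that were read in as factors
df1 <- data.frame(var1 = factor(c("10", "20", "10")),
                  var2 = factor(c("1.5", "2.5", "3.5")), id = 1:3)
df2 <- data.frame(var1 = factor(c("30", "40", "30")),
                  var2 = factor(c("0.1", "0.2", "0.3")), id = 4:6)

dfs  <- c("df1", "df2")    # object names as a character vector
vars <- c("var1", "var2")  # columns to convert

# Convert a factor to the numbers its levels represent; other types pass through
to_num <- function(x) if (is.factor(x)) as.numeric(levels(x))[x] else x

# Pitfall: as.integer() gives the level codes (1 2 1), not 10 20 10
codes <- as.integer(df1$var1)

# Convert every listed column in every data frame
LOF <- lapply(mget(dfs), function(df) {
  df[vars] <- lapply(df[vars], to_num)
  df
})

# Optionally overwrite the original objects with the converted versions
invisible(list2env(LOF, envir = environment()))
```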

Apply Function to Specific Column in R List

I have seen many questions pretty similar to mine, but none of the answers I've seen have actually solved what I'm trying to do. I have a list of data frames, and I'm trying to apply the digest() function to the same column in each data frame in my list. A couple of the answers I've seen on SO to this have been:
library(digest)
library(dplyr)
dflist <- list(data.frame(number = 1:10, name = 1:10),
               data.frame(number = 2:15, name = 1:14))
dflist <- lapply(dflist, function(x){
  x$name <- digest(x$name, algo = "sha256")
  return(x)
})
#OR this
dflist <- lapply(dflist, function(x) {
  x %>% mutate_each(funs(digest(.,algo = "sha256")), "name")
})
Both of these give the same output: every row in the name column ends up with exactly the same value. The digest() function runs, but the same single value is repeated in every row.
I've also tried:
dflist <- lapply(dflist, function(x) {
  digest(x[,"name"], algo = "sha256")
})
But this returns only a single value for each data frame in the list.
Any advice would be much appreciated!

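digest() is not vectorized: it hashes the whole column as one object and returns a single string, which is then recycled to every row. Wrap it in Vectorize() so it is applied element by element:
dflist1 <- lapply(dflist, function(x) {
  x$name <- Vectorize(digest::digest)(x$name, algo = "sha256")
  x
})
Or use it inside transform():
dflist1 <- lapply(dflist, transform, name = Vectorize(digest::digest)(name, algo = "sha256"))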
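Why the original attempts collapse to one value, as a sketch that runs without the digest package: the hypothetical one_value() stands in for digest() because, like digest(), it treats its whole input as a single object and returns one string. Assigning that one string to x$name recycles it to every row. Vectorize() (a wrapper around mapply()) or vapply() applies the function element by element instead.

```r
# Hypothetical stand-in for digest(): one string for the whole input
one_value <- function(x, prefix = "h") paste0(prefix, ":", paste(x, collapse = ","))

dflist <- list(data.frame(number = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE),
               data.frame(number = 4:5, name = c("d", "e"), stringsAsFactors = FALSE))

# Non-vectorized call: a single string, recycled to every row ("h:a,b,c" three times)
bad <- lapply(dflist, function(x) { x$name <- one_value(x$name); x })

# Element-wise with Vectorize(); USE.NAMES = FALSE keeps the column unnamed
good1 <- lapply(dflist, function(x) {
  x$name <- Vectorize(one_value, USE.NAMES = FALSE)(x$name)
  x
})

# Element-wise with vapply(), which also checks each result is one string
good2 <- lapply(dflist, function(x) {
  x$name <- vapply(x$name, one_value, character(1), USE.NAMES = FALSE)
  x
})
```

vapply() is often preferred over Vectorize() because it states the expected type and length of each result, so an unexpected return value fails immediately.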

rbind dataframes with varying names

I need to rbind multiple data frames based on their names. The trouble I'm having is defining which data frames to bind together when the names differ.
For instance, the names of my dataframes are:
AB_0
AB_1
BCD_0
BCD_1
I want to rbind AB_0 with BCD_0, and AB_1 with BCD_1; the common part I'm binding on is everything from the _ onward.
I know I could use strsplit, but all I'm trying to get to is something like:
for(i in 0:1){
  do.call("rbind", mget(sprintf("*_%d", i)))
}
where * is some variable string with a varying number of characters.

Something like this?
AB_0 <- data.frame(a=1, b=1)
AB_1 <- data.frame(a=2, b=2)
BCD_0 <- data.frame(a=3, b=3)
BCD_1 <- data.frame(a=4, b=4)
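XX0 <- do.call("rbind", mget(ls(pattern = ".+_0$")))
XX1 <- do.call("rbind", mget(ls(pattern = ".+_1$")))
Or automate it using a list (the $ anchors the pattern at the end of the name, so that, for example, a name ending in _10 is not picked up by _1):
XX <- list()
for (i in 0:1) {
  XX[[i + 1]] <- do.call("rbind", mget(ls(pattern = paste0(".+_", i, "$"))))
}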
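A sketch that groups by suffix without writing one pattern per suffix: it assumes every object to bind ends in _ followed by digits, extracts that suffix with sub(), and rbinds each group. `env` is just a local handle for the environment the data frames live in.

```r
AB_0 <- data.frame(a = 1, b = 1)
AB_1 <- data.frame(a = 2, b = 2)
BCD_0 <- data.frame(a = 3, b = 3)
BCD_1 <- data.frame(a = 4, b = 4)

env <- environment()                          # where the data frames live
nms <- ls(envir = env, pattern = "_[0-9]+$")  # names ending in _<digits>
suffix <- sub(".*_", "", nms)                 # the part after the last "_"
groups <- split(nms, suffix)                  # "0" -> AB_0, BCD_0; "1" -> AB_1, BCD_1

# One rbind per suffix; the result is a list named by suffix
XX <- lapply(groups, function(g) do.call(rbind, mget(g, envir = env)))
```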

Separating database by attribute using a loop

I am trying to separate a database by year using a loop in R; however, I'm having trouble saving the multiple results. My code is:
d <- read.csv("BD_070218.csv")
results <- NULL
for(i in 1990:2015){
  ano <- d[which(d$year==i),]
  results[[i]] <- ano
}

I think I understand your question.
Here are two potential methods.
# Set a seed
set.seed(1)
# Create example dataframe
d <- data.frame(
  a = 1:120,
  year = sample(1990:2015, 120, replace = TRUE),
  d = sample(letters, 120, replace = TRUE)
)
# Method 1: Nested dataframes in a list
results <- list()
for (i in 1990:2015) {
  ano <- d[which(d$year == i), ]
  results[[paste0("year_", i)]] <- ano
}
str(results)
results[["year_2012"]]
# Method 2: individual dataframes
for (i in 1990:2015) {
  ano <- d[which(d$year == i), ]
  assign(paste0("year_", i), ano, envir = .GlobalEnv)
}
str(year_2000)
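Base R's split() can replace the loop entirely. A minimal sketch using the same simulated d (regenerated here so it runs on its own); the year_ prefix only mirrors the naming used in Method 1.

```r
set.seed(1)
d <- data.frame(a = 1:120,
                year = sample(1990:2015, 120, replace = TRUE),
                d = sample(letters, 120, replace = TRUE))

# One data frame per year; a factor with explicit levels keeps years that
# have no rows as empty data frames instead of dropping them
results <- split(d, factor(d$year, levels = 1990:2015))
names(results) <- paste0("year_", names(results))
```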

How to pass variables into split()?

I want to run split() in a for loop, but when I pass it variable text, it just creates a new data.frame containing the text. The idea here is to split CMPD_DF_1, CMPD_DF_2, etc. based on CMPD_DF_1[5], CMPD_DF_2[5], etc. How do I pass in the data.frame and not a string?
for (i in 1:10) {
  split(paste("CMPD_DF", i, sep = "_"),
        paste(paste("CMPD_DF", i, sep = "_"), "[5]", sep=""))
}

You can put your data frames in a list and then use lapply. This assumes the column you are splitting on is the same in each data frame.
d1 <- data.frame(x = 1:10, y = rep(letters[1:2], each = 5))
d2 <- d1
l <- list(d1, d2)
myFun <- function(x){
  return(split(x, x[, 2]))
}
lapply(l, myFun)
And here's a way to do this using mapply that will allow for different splitting columns in each data frame. You just pre-specify the columns in a separate list and pass them to mapply:
l <- list(d1, d2)
splitColumns <- list("y", "y")
myFun2 <- function(x, col){
  return(split(x, x[, col]))
}
mapply(myFun2, l, splitColumns, SIMPLIFY = FALSE)

Your code doesn't work because you're not passing a data.frame to split. You're passing a character vector that contains the name of your data.frame. Something like this should work (get() looks the object up by name, and the result is stored so it isn't thrown away), but it's not very R-like. @joran's answer is preferable.
out <- list()
for (i in 1:10) {
  dfname <- paste("CMPD_DF", i, sep = "_")
  out[[dfname]] <- split(get(dfname), get(dfname)[[5]])
}
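The two answers can be combined: mget() turns a vector of names into a named list of the actual data frames (the "pass the data.frame, not a string" step), and lapply() then splits each one. A sketch with three hypothetical stand-ins for CMPD_DF_1 ... CMPD_DF_10, assuming the fifth column is the grouping column in each:

```r
# Hypothetical stand-ins; the fifth column, "group", is the one to split on
for (i in 1:3) {
  assign(paste("CMPD_DF", i, sep = "_"),
         data.frame(id = 1:4, a = i, b = 0, c = 0,
                    group = c("x", "x", "y", "y")))
}

# Look the data frames up by name (use 1:10 for the real data),
# then split each one by its fifth column
cmpd_list <- mget(paste("CMPD_DF", 1:3, sep = "_"))
split_list <- lapply(cmpd_list, function(df) split(df, df[[5]]))
```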
