How to pass variables into split()? - r

I want to run split() in a for loop, but when I pass it variable text, it just creates a new data.frame containing the text. The idea here is to split CMPD_DF_1, CMPD_DF_2, etc. based on CMPD_DF_1[5], CMPD_DF_2[5], etc. How do I pass in the data.frame and not a string?
for (i in 1:10) {
split(paste("CMPD_DF", i, sep = "_"),
paste(paste("CMPD_DF", i, sep = "_"), "[5]", sep=""))
}

Sorry for the initial confusion. You can put your data frames in a list and then use lapply. This assumes the column you are splitting on is the same in each data frame. I'll update with a more general solution...
d1 <- data.frame(x =1:10, y = rep(letters[1:2], each = 5))
d2 <- d1
l <- list(d1,d2)
myFun <- function(x){
return(split(x,x[,2]))
}
lapply(l,myFun)
And here's a way to do this using mapply that will allow for different splitting columns in each data frame. You just pre-specify the columns in a separate list and pass them to mapply:
l <- list(d1,d2)
splitColumns <- list("y","y")
myFun2 <- function(x,col){
return(split(x,x[,col]))
}
mapply(myFun2,l,splitColumns,SIMPLIFY = FALSE)
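To tie this back to the CMPD_DF_1, CMPD_DF_2, ... naming in the question, one option is to collect the existing objects into a named list with mget() and then split each one on its fifth column. This is only a sketch: the two toy data frames and their contents are made up for illustration, and it assumes the real data frames are visible in the environment where mget() is called.

```r
# Toy stand-ins for the question's data frames (hypothetical contents)
CMPD_DF_1 <- data.frame(a = 1:4, b = 1:4, c = 1:4, d = 1:4,
                        grp = c("x", "x", "y", "y"))
CMPD_DF_2 <- data.frame(a = 5:8, b = 5:8, c = 5:8, d = 5:8,
                        grp = c("p", "q", "q", "q"))

# Look the objects up by name; mget() returns a named list of data frames
df_names <- paste("CMPD_DF", 1:2, sep = "_")
df_list <- mget(df_names)

# Split every data frame on its 5th column
split_list <- lapply(df_list, function(df) split(df, df[[5]]))

names(split_list)            # "CMPD_DF_1" "CMPD_DF_2"
names(split_list$CMPD_DF_1)  # "x" "y"
```

Because the list keeps the original object names, each result can still be looked up as split_list$CMPD_DF_1 and so on.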

Your code doesn't work because you're not passing a data.frame to split(). You're passing a character vector that holds the name of your data.frame. Something like this should work, but it's not very R-like; @joran's answer is preferable.
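res <- vector("list", 10)
for (i in 1:10) {
df <- get(paste("CMPD_DF", i, sep = "_"))  # look the data.frame up by name
res[[i]] <- split(df, df[[5]])  # store the result; a bare split() inside a loop is discarded
}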

Related

how to make a loop to fetch one variable from 1000 dataframes

I have data frames named V1 ... V1000. Each one contains a variable with the same name, 'var1.predict'. I'm having a hard time writing a loop that collects that variable from every data frame into one new data frame.
This is what I want the loop to produce:
df <- cbind.data.frame(model_V1$var1.pred,model_V2$var1.pred,.....model_V1000$var1.pred)
I hope someone can help solve this.
thank you
I assume that you mean you have 1,000 dataframes V1... V1000 each with the column var1.predict and you want to extract the predictions column from each df. If so, there are a few methods outlined below with a little reprex:
# putting dummy data in to the global env
lapply(1:3, \(i) {
assign(paste0("V", i), data.frame(v1 = rnorm(5),
v2 = rnorm(5),
var1.predict = rnorm(5)), envir = .GlobalEnv)
})
df_list <- list(V1, V2, V3)
# using a for loop and do.call
pred_cols <- list()
for (df in df_list) {
pred_cols <- c(pred_cols, list(df[["var1.predict"]]))
}
pred_cols_df <- do.call(cbind, pred_cols)
pred_cols_df <- as.data.frame(pred_cols_df)
pred_cols_df
# using a loop without do.call
for (i in seq_along(df_list)) {
if (i == 1) {
pred_cols_df <- df_list[[1]][["var1.predict"]]
} else {
pred_cols_df <- cbind(pred_cols_df, df_list[[i]][["var1.predict"]])
}
}
pred_cols_df <- as.data.frame(pred_cols_df)
pred_cols_df
# using lapply
pred_cols <- lapply(df_list, `[`, "var1.predict")
pred_cols_df <- do.call(cbind, pred_cols)
pred_cols_df <- as.data.frame(pred_cols_df)
pred_cols_df
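With 1,000 data frames it is not practical to type them into list(...) by hand. A possible shortcut, sketched below, is to build the names with paste0() and fetch the objects with mget(). The dummy data and the 1:3 range are assumptions for illustration; with the real data the range would be 1:1000.

```r
# Dummy data: three data frames V1..V3, each with a var1.predict column
set.seed(42)
for (i in 1:3) {
  assign(paste0("V", i), data.frame(v1 = rnorm(5), var1.predict = rnorm(5)))
}

# Fetch the data frames by name, then pull the same column out of each
df_list <- mget(paste0("V", 1:3))
pred_cols_df <- as.data.frame(lapply(df_list, `[[`, "var1.predict"))

dim(pred_cols_df)    # 5 rows, 3 columns
names(pred_cols_df)  # "V1" "V2" "V3"
```

Since mget() returns a named list, the columns of the result are labeled with the name of the data frame they came from.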

Apply Function to Specific Column in R List

I have seen many questions pretty similar to mine, but none of the answers I've seen have actually solved what I'm trying to do. I have a list of data frames, and I'm trying to apply the digest() function to the same column in each data frame in my list. A couple of the answers I've seen on SO to this have been:
dflist <- list(data.frame(number = 1:10, name = 1:10),
data.frame(number = 2:15, name = 1:14))
dflist <- lapply(dflist, function(x){
x$name <- digest(x$name, algo = "sha256")
return(x)
})
#OR this
dflist <- lapply(dflist, function(x) {
x %>% mutate_each(funs(digest(.,algo = "sha256")), "name")
})
Both of these give the same output - which is simply every row in the name column having the same exact value. The digest() function works but only returns the value of the first row, in every row.
I've also tried:
dflist <- lapply(dflist, function(x) {
digest(x[,"name"], algo = "sha256")
})
But this just returns only the first value from each data frame in the list.
Any advice would be much appreciated!
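digest() is not vectorized: it hashes the whole object it receives and returns a single string, which is then recycled down the column. Wrap it in Vectorize() to hash each element separately: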
dflist1 <- lapply(dflist, function(x) {
x$name <- Vectorize(digest::digest)(x$name, algo = "sha256")
x
})
Or use it in transform():
dflist1 <- lapply(dflist, transform, name = Vectorize(digest::digest)(name, algo = "sha256"))
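The difference between hashing a whole vector and hashing each element can be seen on a small vector. This is a minimal sketch that assumes the digest package is installed; it uses vapply() as an alternative to Vectorize(), and the vector ids is made up for the example.

```r
library(digest)

ids <- 1:3

# One hash for the whole vector: assigning this to a column recycles it
whole <- digest(ids, algo = "sha256")
length(whole)     # 1

# One hash per element: call digest() once for each value
per_elem <- vapply(ids, digest, character(1), algo = "sha256")
length(per_elem)  # 3
```

vapply() also checks that every call returns exactly one string, which may be a useful safeguard when the column holds unexpected values.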

Assigning a variable to pasted name of column in R

I have a few data frames with the names:
Meanplots1,
Meanplots2,
Meanplots3 etc.
I am trying to write a for loop to do a series of equations on each data frame.
I am attempting to use the paste0 function.
What I want to happen is for x to be a column of each data set. So the code should work like this line:
x <- Meanplots1$PAR
However, since I want to put this in a for loop I want to format it like this:
for (i in 1:3){
x <- paste0("Meanplots",i,"$PAR")
Dmodel <- nls(y ~ ((a*x)/(b + x )) - c, data = dat, start = list(a=a,b=b,c=c))
}
What this does is set x to the character string "Meanplots1$PAR", not the actual column. Any idea on how to fix this?
We can get all the data.frame in a list with mget
lst1 <- mget(ls(pattern = '^Meanplots\\d+$'))
then loop over the list with lapply and apply the model
DmodelLst <- lapply(lst1, function(dat) nls(y ~ ((a* PAR)/(b + PAR )) - c,
data = dat, start = list(a=a,b=b,c=c)))
Replace 'x' with the column name 'PAR'.
In the OP's loop, create a NULL list to store the output ('Outlst'), get the value of the object from paste0, then apply the formula with the unquoted column name i.e. 'PAR'
Outlst <- vector("list", 3)
ndat <- data.frame(PAR = seq(0,2000,100))  # newdata must use the model's predictor name
for(i in 1:3) {
dat <- get(paste0("Meanplots", i))
modeltmp <- nls(y ~ ((a*PAR)/(b + PAR )) - c,
data = dat, start = list(a=a,b=b,c=c))
MD <- data.frame(predict(modeltmp, newdata = ndat))
MD[,2] <- ndat$PAR
names(MD) <- c("Photo","PARi")
Outlst[[i]] <- MD
}
Now, we extract the output of each list element
Outlst[[1]]
Outlst[[2]]
instead of creating multiple objects in the global environment
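The core of the problem is that paste0() only ever builds text. A short sketch of the difference, using a made-up Meanplots1 with toy values:

```r
# Toy data standing in for one of the question's data frames
Meanplots1 <- data.frame(PAR = c(0, 100, 200), y = c(1.0, 2.5, 3.1))

i <- 1
as_text <- paste0("Meanplots", i, "$PAR")  # just a character string
class(as_text)                             # "character"

# get() turns the pasted name into the object; [[ ]] then picks the column
x <- get(paste0("Meanplots", i))[["PAR"]]
identical(x, Meanplots1$PAR)               # TRUE
```

Only the object name goes through paste0() and get(); the column is selected afterwards with [[ ]] or $.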

Nested apply with multiple parameters

I would like to use the apply family instead of a for loop.
My for loop is nested and contains several vectors and a list, for which I am unsure how to input as parameters with apply.
Codes <- c("A","B","C")
Samples <- c("A","A","B","B","B","C")
Samples_Names <- c("A1","A2","B1","B2","B3","C1")
Samples_folder <- c("Alpha","Alpha","Beta","Beta","Beta","Charlie")
Df <- list(data.frame(T1 = c(1,2,3)), data.frame(T1 = c(1,2,3)), data.frame(T1 = c(1,2,3)))
for (i in 1:length(Codes)){
for (j in 1:length(Samples)) {
if(Codes[i] == Samples[j]) {
write_csv(Df[[i]], path = paste0(Working_Directory,Samples_folder[j],"/",Samples_Names[j],".csv"))
}
}
}
This writes A1 and A2 to Alpha, B1, B2, and B3 to Beta, and C1 to Charlie.
Since you are looking to just use write_csv, we can use pwalk from purrr to accomplish this over the three equal size vectors. No need to include the loop on Codes, as for each iteration in the apply we can write_csv the dataset corresponding to where Samples is found in Codes.
I shortened Working_Directory to WD.
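library(purrr)
library(readr)  # provides write_csv()
# the output path is passed positionally because its argument name differs between readr versions
pwalk(list(Samples, Samples_folder, Samples_Names),
function(x, y, z) write_csv(Df[[match(x, Codes)]], paste0(WD, y, "/", z, ".csv")))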
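For readers who prefer to stay in base R, the same idea can be sketched with match() and Map(). The output folder under tempdir(), the name out_root, and the use of write.csv() instead of write_csv() are assumptions made so the example runs on its own; the three data frames hold different toy values so the files can be told apart.

```r
Codes <- c("A", "B", "C")
Samples <- c("A", "A", "B", "B", "B", "C")
Samples_Names <- c("A1", "A2", "B1", "B2", "B3", "C1")
Samples_folder <- c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Charlie")
Df <- list(data.frame(T1 = 1:3), data.frame(T1 = 4:6), data.frame(T1 = 7:9))

# Hypothetical output location for this demo
out_root <- file.path(tempdir(), "nested_apply_demo")

# One output path per sample; create the folders up front
paths <- file.path(out_root, Samples_folder, paste0(Samples_Names, ".csv"))
for (d in unique(dirname(paths))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# match() replaces the inner loop: it gives the index in Codes for each sample
idx <- match(Samples, Codes)
invisible(Map(function(i, p) write.csv(Df[[i]], p, row.names = FALSE), idx, paths))

list.files(out_root, recursive = TRUE)
```

Map() iterates over idx and paths in parallel, so each sample is written exactly once without comparing every code against every sample.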

R loop to create data frames with 2 counters

What I want is to create 60 data frames with 500 rows in each. I tried the below code and, while I get no errors, I am not getting the data frames. However, when I do a View on the as.data.frame, I get the view, but no data frame in my environment. I've been trying for three days with various versions of this code:
getDS <- function(x){
for(i in 1:3){
for(j in 1:30000){
ID_i <- data.table(x$ID[j: (j+500)])
}
}
as.data.frame(ID_i)
}
getDS(DATASETNAME)
We can use outer (on a small example)
out1 <- c(outer(1:3, 1:3, Vectorize(function(i, j) list(x$ID[j:(j + 5)]))))
lapply(out1, as.data.table)
--
The issue in the OP's function is that ID_i is overwritten on every pass through the loop, so only the last value survives. To keep every result, initialize a list and store each piece in it:
getDS <- function(x) {
ID_i <- vector('list', 3)
for(i in 1:3) {
for(j in 1:3) {
ID_i[[i]][[j]] <- data.table(x$ID[j:(j + 5)])
}
}
ID_i
}
do.call(c, getDS(x))
data
x <- data.table(ID = 1:50)
I'm not sure the description matches the code, so I'm a little unsure what the desired result is. That said, it is usually not helpful to split a data.table because the built-in by-processing makes it unnecessary. If for some reason you do want to split into a list of data.tables you might consider something along the lines of
getDS <- function(x, n=5, size = nrow(x)/n, column = "ID", reps = 3) {
x <- x[1:(n*size), ..column]
index <- rep(1:n, each = size)
replicate(reps, split(x, index),
simplify = FALSE)
}
getDS(data.table(ID = 1:20), n = 5)
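If the goal really is the one stated in the question, 60 data frames of 500 rows each, and if that means consecutive, non-overlapping blocks of rows (an assumption, since the posted loop uses overlapping windows), a base R sketch could look like this. The 30,000-row toy data frame is made up for illustration.

```r
x <- data.frame(ID = 1:30000)  # toy stand-in for the real data

chunk_size <- 500
# Block number for every row: 1 for rows 1-500, 2 for rows 501-1000, ...
chunk_id <- ceiling(seq_len(nrow(x)) / chunk_size)
chunks <- split(x, chunk_id)

length(chunks)          # 60
nrow(chunks[[1]])       # 500
range(chunks[[60]]$ID)  # 29501 30000
```

Keeping the pieces in one list means they can be processed with lapply() rather than being tracked as 60 separate objects.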
