Replace loop with lapply, sapply - r

I'm very new to R, and I've heard it's best to replace loops with apply functions. However, I couldn't wrap my head around how to transform the loop in this example. Any help would be appreciated.
file_path is a list of file names
file_path[1] = "/home/user/a.rds"
file_path[2] = "/home/user/b.rds"
...
vector_sum <- rep(0, 50000)
for (i in 1:5) {
  temp_data <- readRDS(file_path[i])
  temp_data <- as.matrix(temp_data[, c("loss_amount")])
  vector_sum <- vector_sum + temp_data
}
My goal is to loop through all the files, keep only the loss_amount column from each file, and add it to vector_sum, so that in the end vector_sum is the element-wise sum of the loss_amount columns from all files.

Using rowSums.
rowSums(sapply(file_path, \(x) readRDS(x)[, 'loss_amount'], USE.NAMES=F))
# [1] 1.2060989 1.4248851 -0.4759345
Data:
set.seed(42)
l <- replicate(3, matrix(rnorm(6), 3, 2, dimnames=list(NULL, c('x', 'loss_amount'))), simplify=F)
dir.create('foo') ## creates/overwrites `foo` in wd!
Map(\(x, y) saveRDS(x, paste0('foo/', y, '.rds')), l, letters[seq_along(l)])
file_path <- list.files('foo', full.names=TRUE)
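To check that the loop from the question and the rowSums(sapply(...)) version agree, here is a minimal sketch. The temporary directory, the file names and the toy values are assumptions made up for illustration; only readRDS, sapply and rowSums come from the answer above.

```r
# Hypothetical demo data: three small files in a temporary directory
tmp <- file.path(tempdir(), "loss_demo")
dir.create(tmp, showWarnings = FALSE)
for (k in 1:3) {
  d <- data.frame(id = 1:4, loss_amount = k * c(1, 2, 3, 4))
  saveRDS(d, file.path(tmp, paste0(letters[k], ".rds")))
}
demo_paths <- list.files(tmp, pattern = "\\.rds$", full.names = TRUE)

# Loop version, as in the question
loop_sum <- rep(0, 4)
for (p in demo_paths) {
  loop_sum <- loop_sum + readRDS(p)[, "loss_amount"]
}

# sapply version: one column per file, then add across the columns
mat <- sapply(demo_paths, function(p) readRDS(p)[, "loss_amount"],
              USE.NAMES = FALSE)
apply_sum <- rowSums(mat)

print(loop_sum)                        # 6 12 18 24
print(all.equal(loop_sum, apply_sum))  # TRUE
```

sapply only simplifies to a matrix when every file yields a column of the same length, which the original loop also silently assumes.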

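Here is one possible way to solve your problem using lapply. Reduce adds the columns element by element, so the result has one value per row, like vector_sum in your loop:
Reduce(`+`, lapply(file_path, \(fle) readRDS(fle)[, "loss_amount"]))
If you only need a single grand total over all files and rows, collapse with sum instead:
sum(unlist(lapply(file_path, \(fle) readRDS(fle)[, "loss_amount"])))
# or
do.call(sum, lapply(file_path, \(fle) readRDS(fle)[, "loss_amount"]))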
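How the list returned by lapply is collapsed decides the shape of the result. The sketch below uses a made-up in-memory list (cols is a stand-in for the per-file loss_amount columns) to contrast an element-wise sum with a grand total.

```r
# cols is a hypothetical stand-in for lapply(file_path, ...) output
cols <- list(c(1, 2, 3), c(10, 20, 30), c(100, 200, 300))

# Element-wise: one value per row, like vector_sum in the question's loop
elementwise <- Reduce(`+`, cols)
print(elementwise)   # 111 222 333

# Grand total: every value from every column collapsed into one number
grand_total <- sum(unlist(cols))
print(grand_total)   # 666
```

do.call(sum, cols) gives the same single number as sum(unlist(cols)), so neither matches the question's loop, which keeps one entry per row.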

Related

How to change column names while in a loop in R?

for(i in 1:3){
names <- c("n1","n2","n3")
assign(paste0("mydf",i), data.frame(matrix("", nrow = 3, ncol = 3)))
}
I tried the code shown below but it didn't work.
for(i in 1:3){
names <- c("n1","n2","n3")
assign(paste0("mydf",i), names(data.frame(matrix("", nrow = 3, ncol = 3)))[1:3] <- names)
}
What's your solution? Thanks in advance.
This is the approach I would take. The following script not only changes the column names but also creates 3 data frames in the global environment, much like your original script.
for (i in 1:3) {
  noms <- c("n1", "n2", "n3")  # column names, in the order the columns appear
  df_ <- data.frame(matrix("", nrow = 3, ncol = 3))  # create the data frame
  df_nom <- paste("mydf", i, sep = "")  # build the variable name, e.g. "mydf1"
  colnames(df_) <- noms  # assign the names to the columns
  assign(df_nom, df_)  # bind the data frame to that variable name
}
1) Normally one puts the data frames in a list, but if you really want to put them into the current environment, then do the following. If you want the global environment, replace the first line with e <- .GlobalEnv; if you want to create a list instead (preferable), use e <- list().
# define 3 data frames
e <- environment() # or e <- .GlobalEnv or e <- list()
nms <- paste0("mydf", 1:3)
for(nm in nms) e[[nm]] <- data.frame(matrix("", 3, 3))
# change their column names
for(nm in nms) names(e[[nm]]) <- c("n1", "n2", "n3")
2) Even better if we want lists is:
L <- Map(function(x) data.frame(matrix("", 3, 3)), paste0("mydf", 1:3))
L[] <- lapply(L, `names<-`, c("n1", "n2", "n3")) # change col names
Converting
Note that we can convert a list L to data frames that are loose in the environment using one of these depending on which environment you want to put the list components into.
list2env(L, environment())
list2env(L, .GlobalEnv)
We can go the other way using mget, where e is environment() or .GlobalEnv depending on what we need. We can omit the e argument if the data frames are in the current environment.
L <- mget(nms, e)
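The round trip between a named list and loose data frames can be sketched as follows. The scratch environment created with new.env() is an assumption added here so the demo does not touch your workspace; the answer above uses environment() or .GlobalEnv instead.

```r
nms <- paste0("mydf", 1:3)

# Build a named list of empty 3x3 data frames, then rename their columns
L <- setNames(lapply(nms, function(nm) data.frame(matrix("", 3, 3))), nms)
L <- lapply(L, setNames, c("n1", "n2", "n3"))

# List -> loose objects in a scratch environment (hypothetical, for the demo)
e <- new.env()
list2env(L, envir = e)
print(sort(ls(e)))        # "mydf1" "mydf2" "mydf3"

# Loose objects -> list again
back <- mget(nms, envir = e)
print(names(back$mydf2))  # "n1" "n2" "n3"
```

Keeping the data frames in a list means the renaming step is a single lapply call rather than a loop over get and assign.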
You can use get to get the data.frame by name, update the names and assign it back.
nNames <- c("n1","n2","n3")
for(i in 1:3) {
D <- paste0("mydf",i)
tt <- get(D)
names(tt) <- nNames
assign(D, tt)
}
names(mydf1)
#[1] "n1" "n2" "n3"
Alternatively the names could already be set when creating the matrix by using dimnames:
nNames <- c("n1","n2","n3")
for(i in 1:3) {
assign(paste0("mydf", i),
data.frame(matrix("", 3, 3, dimnames=list(NULL, nNames))))
}
names(mydf1)
#[1] "n1" "n2" "n3"

How can I multiply multiple dataframes of a list by each observation of a vector?

I have a list of dataframes, each of which I would like to multiply by the corresponding element of a vector.
The first dataframe in the list would be multiplied by the first observation of the vector, and so on, producing another list of dataframes already multiplied.
I tried to do this with a loop, but was unsuccessful. I also tried to imagine something using map or lapply, but I couldn't.
for(i in vec){
for(j in listdf){
listdf2 <- i*listdf[[j]]
}
}
Error in listdf[[j]] : invalid subscript type 'list'
Any idea how to solve this?
*Vector and the List of Dataframes have the same length.
Use Map:
listdf2 <- Map(`*`, listdf, vec)
In purrr, this can be done using map2:
listdf2 <- purrr::map2(listdf, vec, `*`)
If you are interested in a for loop solution, you just need one loop:
listdf2 <- vector('list', length(listdf))
for (i in seq_along(vec)) {
listdf2[[i]] <- listdf[[i]] * vec[i]
}
data
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(df, df, df)
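The error in the question arises because for (j in listdf) hands each data frame itself to j, so listdf[[j]] tries to index with a data frame rather than a position. As a quick check that the Map version and the single loop give the same result, here is a small sketch using the answer's data; the names by_map and by_loop are made up for illustration.

```r
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(df, df, df)

# Pairwise multiplication: i-th data frame times i-th element of vec
by_map <- Map(`*`, listdf, vec)

# Same thing with one loop over positions
by_loop <- vector("list", length(listdf))
for (i in seq_along(vec)) {
  by_loop[[i]] <- listdf[[i]] * vec[i]
}

print(all.equal(by_map, by_loop))  # TRUE
print(by_map[[2]]$a)               # 3 6 9 12 15
```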

rbind dataframes with varying names

I have a situation where I need to rbind multiple dataframes based on a name. The trouble I'm having is how to define the binding when the names differ.
For instance, the names of my dataframes are:
AB_0
AB_1
BCD_0
BCD_1
And I want to rbind AB_0 and BCD_0, and AB_1 and BCD_1 - my common factor I'm binding on is everything from the _ and after
I know I could use strsplit, but all I'm trying to get to is something like:
for(i in 0:1){
do.call("rbind", mget(sprintf("*_%d", i)))
}
where * is some variable string with varying # of characters
Something like this?
AB_0 <- data.frame(a=1, b=1)
AB_1 <- data.frame(a=2, b=2)
BCD_0 <- data.frame(a=3, b=3)
BCD_1 <- data.frame(a=4, b=4)
XX0 <- do.call("rbind", mget(ls(pattern = ".+_0$")))
XX1 <- do.call("rbind", mget(ls(pattern = ".+_1$")))
Or automate using a list:
XX <- list()
for (i in 0:1) {
  # "$" anchors the suffix so that, e.g., "_1" does not also match "_10"
  XX[[i + 1]] <- do.call("rbind", mget(ls(pattern = paste0(".+_", i, "$"))))
}
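If the set of suffixes is not known in advance, one alternative is to derive the suffix from each name and split on it. This is a sketch of a different technique from the answer above, not the answerer's method; the names dfs, suffix and bound are made up, and the object names are listed explicitly here instead of being collected with ls.

```r
AB_0 <- data.frame(a = 1, b = 1)
AB_1 <- data.frame(a = 2, b = 2)
BCD_0 <- data.frame(a = 3, b = 3)
BCD_1 <- data.frame(a = 4, b = 4)

# Collect the data frames into one named list
dfs <- mget(c("AB_0", "AB_1", "BCD_0", "BCD_1"), envir = environment())

# Everything after the last underscore is the grouping key
suffix <- sub("^.*_", "", names(dfs))

# One rbind per suffix group
bound <- lapply(split(dfs, suffix), function(g) do.call(rbind, g))

print(names(bound))     # "0" "1"
print(bound[["0"]]$a)   # 1 3
print(bound[["1"]]$a)   # 2 4
```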

subsetting a data.frame using a for loop

I have a data.frame, and I want to subset it every 10 rows, apply a function to each subset, save the object, and remove the previous object. Here is what I have so far:
L3 <- LETTERS[1:20]
df <- data.frame(1:391, "col", sample(L3, 391, replace = TRUE))
names(df) <- c("a", "b", "c")
b <- seq(from=1, to=391, by=10)
nsamp <- 0
for(i in seq_along(b)){
a <- i+1
nsamp <- nsamp+1
df_10 <- df[b[nsamp]:b[a], ]
res <- lapply(seq_along(df_10$b), function(x){...}
saveRDS(res, file="res.rds")
rm(res)
}
My problem is that the for loop crashes when it reaches the last element of my sequence b.
When partitioning data, split is your friend. It will create a list with each data subset as an item which is then easy to iterate over.
dfs = split(df, (seq_len(nrow(df)) - 1) %/% 10)  # rows 1-10, 11-20, ...
Then your for loop can be simplified to something like this (untested... I'm not exactly sure what you're doing because example data seems to switch from df to sc2_10 and I only hope your column named b is different from your vector named b):
for (i in seq_along(dfs)) {
  res <- lapply(seq_along(dfs[[i]]$b), function(x){...})
  saveRDS(res, file = sprintf("res_%s.rds", i))
  rm(res)
}
I also modified your save file name so that you aren't overwriting the same file every time.
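To see why the original loop fails on its last pass and what split produces instead, here is a small sketch. It assumes a simplified two-column version of the question's data frame, and grp is a made-up name for the grouping vector. Subtracting 1 before the integer division keeps rows 1 to 10 together; without it the first chunk would hold only 9 rows.

```r
df <- data.frame(a = 1:391, b = "col")
b <- seq(from = 1, to = 391, by = 10)

print(length(b))         # 40
print(b[length(b) + 1])  # NA: on the last pass b[a] is NA, so b[nsamp]:b[a] errors

# Chunks of 10 consecutive rows; the last chunk holds the remainder
grp <- (seq_len(nrow(df)) - 1) %/% 10
dfs <- split(df, grp)

print(length(dfs))       # 40
print(nrow(dfs[[1]]))    # 10
print(nrow(dfs[[40]]))   # 1
```

Note that b[nsamp]:b[a] in the question also spans 11 rows (for example 1:11), so neighbouring subsets share a row; split avoids that overlap.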

Using R to loop through vector and copy some sequences to data.frame

I want to search through a vector for the sequence of strings "hello" "world". When I find this sequence, I want to copy it, including the 10 elements before and after, as a row in a data.frame to which I'll apply further analysis.
My problem: I get an error "new column would leave holes after existing columns". I'm new to coding, so I'm not sure how to manipulate data.frames. Maybe I need to create rows in the loop?
This is what I have:
df = data.frame()
i <- 1
for(n in 1:length(v))
{
if(v[n] == 'hello' & v[n+1] == 'world')
{
df[i,n-11:n+11] <- v[n-10:n+11]
i <- i+1
}
}
Thanks!
Maybe this helps:
indx <- which(v1[-length(v1)] == 'hello' & v1[-1] == 'world')
lst <- Map(function(x, y) {
  s1 <- seq(x, y)
  v1[s1[s1 > 0 & s1 <= length(v1)]]  # keep only indices inside the vector
}, indx - 10, indx + 11)
len <- max(sapply(lst, length))
d1 <- as.data.frame(do.call(rbind, lapply(lst, `length<-`, len)))
data
set.seed(496)
v1 <- sample(c(letters[1:3], 'hello', 'world'), 100, replace=TRUE)
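One likely cause of the error in the question is operator precedence: in R the colon binds more tightly than + and -, so n-11:n+11 is read as n - (11:n) + 11 rather than as a range. The sketch below uses a made-up vector v with a single match to show the difference and the intended window.

```r
n <- 13
print(n - 11:n + 11)       # 13 12 11, i.e. n - (11:n) + 11
print((n - 11):(n + 11))   # the intended range 2, 3, ..., 24

# Hypothetical vector with "hello" "world" at positions 13 and 14
v <- c(letters[1:12], "hello", "world", letters[13:24])
hit <- which(v[-length(v)] == "hello" & v[-1] == "world")
print(hit)                 # 13

# 10 elements before "hello" through 10 elements after "world"
window <- v[(hit[1] - 10):(hit[1] + 11)]
print(length(window))      # 22
print(window[11:12])       # "hello" "world"
```

When a match sits near either end of the vector, the range runs past the bounds, which is why the answer above filters the indices before subsetting.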

Resources