Usage of iteration element within name pattern - r

I would like to iterate with a for loop through a list, applying the following function to all list elements:
new_x = do.call("rbind",mget(ls(pattern = "^x.*")))
where x is a certain name pattern of a dataframe.
How do I iterate through a list where the list element i is the name pattern for my function?
The goal would be to get something like this:
for (i in filenames){
i = do.call("rbind",mget(ls(pattern = "^i.*")))
}
So my question is basically how to use i within a name pattern, so that I can use the loop to rbind together the separate parts of a dataframe: xpart1, xpart2, xpart3 to x; ypart1, ypart2, ypart3 to y; and so on.
Thank you in advance!

If we are using a for loop, one option is:
v1 <- ls(pattern = "^x.*")
lst1 <- vector('list', length(v1))
for(i in seq_along(v1)){
lst1[[i]] <- get(v1[i])
}
do.call(rbind, lst1)
Or, if we need to use i to create the pattern, we can use paste0:
lst1 <- vector('list', length(filenames))
names(lst1) <- filenames
for(i in filenames){
lst1[[i]] <- get(ls(pattern = paste0("^", i, ".*")))  # "^" anchors the match at the start of the name
}
do.call(rbind, lst1)
NOTE: get returns the value of a single object, whereas mget returns several objects in a list. Inside the for loop we assume each pattern matches exactly one object, so get is all that is needed.
Based on the OP's clarification, we can also use mget
xs <- paste0("xpart", 1:100)
ys <- paste0("ypart", 1:100)
xsdat <- do.call(rbind, mget(xs))
ysdat <- do.call(rbind, mget(ys))
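To tie this back to the loop in the question, here is a minimal sketch that builds the pattern from each prefix and collects the combined data frames in a named list instead of assigning to i. The example data frames, the prefixes vector and the "part" naming scheme are assumptions made for illustration.

```r
# Hypothetical example data: two data frames, each stored in parts
xpart1 <- data.frame(id = 1:2, val = c("a", "b"))
xpart2 <- data.frame(id = 3:4, val = c("c", "d"))
ypart1 <- data.frame(id = 1L, val = "e")
ypart2 <- data.frame(id = 2L, val = "f")

prefixes <- c("x", "y")   # plays the role of `filenames` in the question
env <- environment()      # the environment that holds the parts

# For each prefix, find the matching objects and rbind them together
combined <- lapply(setNames(prefixes, prefixes), function(p) {
  part_names <- ls(envir = env, pattern = paste0("^", p, "part[0-9]+$"))
  do.call(rbind, mget(part_names, envir = env))
})

names(combined)   # one element per prefix
combined$x        # all rows of xpart1 and xpart2
```

Keeping the results in a list avoids creating variables dynamically. Note that ls returns names in alphabetical order, so with ten or more parts xpart10 would come before xpart2.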

Related

Using lapply variable in read.csv

I'm just getting used to using lapply and I've been trying to figure out how I can use names from a vector to append within filenames I am calling, and naming new dataframes. I understand I can use paste to call the files of interest, but not sure I can create the new dataframes with the _var name appended.
site_list <- c("site_a","site_b", "site_c","site_d")
lapply(site_list,
function(var) {
all_var <- read.csv(paste("I:/Results/",var,"_all.csv"))
tbl_var <- read.csv(paste("I:/Results/",var,"_tbl.csv"))
rsid_var <- read.csv(paste("I:/Results/",var,"_rsid.csv"))
return(var)
})
Generally, with lapply it makes more sense to apply a function to each list element and return a list in which the results are stored and can be named. Example (edit: use split to process each site's files together):
files <- list.files(path= "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x){
all <- read.csv(x[grep("_all.csv", x)])
rsid <- read.csv(x[grep("_rsid.csv", x)])
tbl <- read.csv(x[grep("_tbl.csv", x)])
# do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
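As a smaller illustration of the same idea, the sketch below returns a list named after the sites rather than creating all_var, tbl_var and so on. It writes throwaway files to tempdir() as a stand-in for "I:/Results/", so the directory and file contents are assumptions. It also uses paste0 and file.path, because paste separates its arguments with a space by default and would produce paths such as "I:/Results/ site_a _all.csv".

```r
site_list <- c("site_a", "site_b")
results_dir <- tempdir()  # stand-in for the real results directory

# Create small example files so the sketch runs on its own
for (s in site_list) {
  write.csv(data.frame(site = s, value = 1:3),
            file.path(results_dir, paste0(s, "_all.csv")),
            row.names = FALSE)
}

# setNames makes lapply return a list named after the sites
all_data <- lapply(setNames(site_list, site_list), function(site) {
  read.csv(file.path(results_dir, paste0(site, "_all.csv")))
})

names(all_data)
all_data$site_a
```

Each data frame can then be reached by name, e.g. all_data[["site_b"]], without any dynamically named variables.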

Using a variable to fetch a document

Let's say there are documents I want to fetch that I named in a very clear pattern, for example EIS_Chip14_1_pre. Everything is constant except the two numbers in the name, which range over 6:18 and 1:4. I'm using a for loop:
How do I include both "i" and "n" in the name of the storage vector so I don't overwrite it?
dat <- vector(mode = "list")
i <- numeric(13)
n <- numeric(4)
for(i in 6:18){
for(n in 1:4){
path <- paste0("C:/.../downloads/EIS_Chip",i,"_", n, "_pre.dat")
dat_(i)[[n]] <- read.csv(file = path)
}
}
This isn't necessarily the best solution, but it should do the trick for you.
At the top of your outer loop, initialize a variable, tmp here, with an empty list. Inside your inner loop, store the results into elements of that list using [[. When your inner loop is finished, assign the value in tmp to a variable dynamically named using paste0.
dat <- vector(mode = "list")
i <- numeric(13)
n <- numeric(4)
for(i in 6:18){
tmp = list()
for(n in 1:4){
path <- paste0("C:/.../downloads/EIS_Chip",i,"_", n, "_pre.dat")
tmp[[n]] <- read.csv(file = path)
}
assign(paste0("dat_",i),tmp)
}
A better approach, though, could be to use expand.grid to get every combination of i and n, write a function that reads in a file using those values, and then iterate over the combinations with something like purrr::pmap. This returns a list with 52 elements, one for each file you loaded.
read_in_file = function(i,n){
path = paste0("C:/.../downloads/EIS_Chip",i,"_", n, "_pre.dat")
return(read.csv(file=path))
}
combinations = expand.grid(i = 6:18, n = 1:4)
dat = purrr::pmap(combinations,read_in_file)
This works because expand.grid returns a data.frame which is a special kind of list. Because we named the inputs appropriately, purrr::pmap processes the list of inputs and assigns them correctly to the function.
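If you would rather stay in base R and also have each element carry both numbers in its name, a sketch along these lines may help. The read_fun function is a stand-in that only records the file name, and the "dat_i_n" naming scheme is an assumption; in real use it would be replaced by a call to read.csv on the full path.

```r
combinations <- expand.grid(i = 6:18, n = 1:4)

# File names and list names built from both indices
file_names <- paste0("EIS_Chip", combinations$i, "_", combinations$n, "_pre.dat")
list_names <- paste0("dat_", combinations$i, "_", combinations$n)

# Stand-in reader so the sketch runs without the real files
read_fun <- function(f) data.frame(file = f)

dat <- lapply(setNames(file_names, list_names), read_fun)

length(dat)          # one element per combination of i and n
dat[["dat_14_1"]]    # the entry for EIS_Chip14_1_pre.dat
```

Because every combination gets its own name, nothing is overwritten and individual files can be looked up directly.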

Run a loop based on vector elements

I have to run a loop based on certain vector values. Some example code and data is shown below:
library(tibble)  # provides rownames_to_column()
list_store <- list()
vec <- c(3,2,3)
data_list <- lapply(list(head(mtcars,10), head(mtcars,15), head(mtcars,20), head(mtcars, 9),
head(mtcars,14), head(mtcars,18), head(mtcars,20), head(mtcars,10)),
function(x) rownames_to_column(x))
data_list1 <- lapply(list(head(mtcars,7), head(mtcars,8), head(mtcars,10)), function(x) rownames_to_column(x))
result <- lapply(data_list, function(i){
list_store[[length(list_store) + 1]] <- merge(i, data_list1[[1]], all.y = TRUE)
})
The goal is to merge the first three data frames of data_list with the first data frame of data_list1, the next two with the second, and the last three with the third. My code merges every element of data_list with the first element of data_list1, but I want the element of data_list1 to change according to vec.
I can have a loop keeping track of i, j and so on to do all the process, but I want to know if there is any efficient way.
We replicate the sequence of 'vec' by 'vec' (rep(seq_along(vec), vec)) and use that to split 'data_list' into three list elements, each itself a list. Then Map passes the corresponding elements of the split list and 'data_list1' to a function that loops through the nested list with lapply and merges each data frame with the matching element of 'data_list1'. Finally, do.call(c, ...) flattens the nested list back to the flat structure of 'data_list'.
do.call(c,
Map(function(x,y) lapply(x, function(dat)
merge(dat, y, all.y = TRUE)),
split(data_list, rep(seq_along(vec), vec)),
data_list1))
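The grouping step is easier to see in isolation. This small sketch uses a list of letters as a stand-in for data_list (an assumption for illustration) and shows how rep(seq_along(vec), vec) assigns each element to a group.

```r
vec <- c(3, 2, 3)

# Group index for each of the 8 elements: three 1s, two 2s, three 3s
groups <- rep(seq_along(vec), vec)
groups

# Stand-in for data_list: any list of length sum(vec)
items <- as.list(letters[1:8])
grouped <- split(items, groups)

lengths(grouped)   # group sizes match vec
```

Each element of grouped then lines up with one element of data_list1, which is what Map relies on.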

R use grep to clean column in list of lists

I have a large data set stored as a list of lists that may be simplified thus:
list1 <- list(1,"bob", "age=14;years")
list2 <- list(2,"bill", "age=24;years")
list3 <- list(3,"bert", "age=36;years")
data.list <- list(list1, list2, list3)
I wish to clean the third column such that I have only the numeric value of age.
This can be done with the following function that returns a new list:
clean <- function(x){
x <- as.numeric(gsub('.*age=(.*?);.*','\\1', x[3]))
}
data.age <- lapply(data.list, clean)
But how may I either
a) directly clean the column to return the value
or
b) replace the original column [3] with the data.age column?
You need to return the list back in your function, so modify your function as:
clean <- function(x){
x[[3]] <- as.numeric(gsub('.*age=(.*?);.*','\\1', x[[3]]))
x
}
data.age <- lapply(data.list, clean)
should do the trick.
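For completeness, here is a short sketch covering both parts of the question with the example data: (b) replaces the third element in place, and (a) then pulls the ages out as a plain numeric vector. The names data.clean and ages are introduced here for illustration.

```r
data.list <- list(
  list(1, "bob",  "age=14;years"),
  list(2, "bill", "age=24;years"),
  list(3, "bert", "age=36;years")
)

# (b) return the whole record with the third element cleaned
clean <- function(x) {
  x[[3]] <- as.numeric(gsub(".*age=(.*?);.*", "\\1", x[[3]]))
  x
}
data.clean <- lapply(data.list, clean)

# (a) extract just the ages as a numeric vector
ages <- vapply(data.clean, function(x) x[[3]], numeric(1))
ages
```

vapply is used instead of sapply so that the result is guaranteed to be a numeric vector with one value per record.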

How to save results from for loop on list into a new list under "i" vector name?

I have the following code:
final_results <- list()
myfunc <- function(v1) {
deparse(substitute(v1))
}
for (i in mylist) {
...calculations...
tmp_results <- as.data.frame(cbind(effcrs,weights))
colnames(tmp_results) <- c('efficiency',names(inputs),
names(outputs)) # header
rownames(tmp_results) <- namesDMU[,1]
#Save to list
name_in_list <- myfunc(i)
dea_results[[name_in_list]] <- tmp_results
}
The above code loops through a list of data frames. I would like each result yielded from the loop to be stored in a separate list under the same name as the original file obtained from mylist or i
I tried using deparse(substitute()). When I apply it to an individual item in mylist it looks like this:
myfunc(standard_DEA$'2010-11-11')
[1] "standard_DEA$\"2010-11-11\""
I don't know what the issue is. At the moment it saves everything under the name "i" and replaces all vectors so the end result is a list of 1.
Thank you in advance
It looks like you want dplyr's do() here.
library(dplyr)
function_which_returns_dataframe = function(i) {
...calculations...
tmp_results <- as.data.frame(cbind(effcrs,weights))
colnames(tmp_results) <- c('efficiency',names(inputs),
names(outputs)) # header
rownames(tmp_results) <- namesDMU[,1]
tmp_results
}
data_frame(mylist = mylist,
name = names(mylist)) %>%
group_by(mylist, name) %>%
do(function_which_returns_dataframe(.$mylist[[1]]))
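A base R alternative, sketched under the assumption that mylist is a named list, is to loop over the names rather than the elements. Inside the original loop, deparse(substitute(v1)) returns "i" because substitute captures the expression passed to the function, which is the symbol i, not the name the element has in mylist. The example data and the sum calculation below are stand-ins for the real "...calculations...".

```r
# Hypothetical named list of data frames
mylist <- list(
  `2010-11-11` = data.frame(a = 1:3),
  `2010-11-12` = data.frame(a = 4:6)
)

final_results <- list()
for (nm in names(mylist)) {
  df <- mylist[[nm]]
  # Stand-in for the real calculations
  tmp_results <- data.frame(total = sum(df$a))
  # Store the result under the same name as the input
  final_results[[nm]] <- tmp_results
}

names(final_results)
```

The same result can be had without a loop via lapply(mylist, f), which keeps the names of mylist on its output.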

Resources