Run a loop based on vector elements - r

I have to run a loop based on certain vector values. Some example code and data is shown below:
library(tibble)  # provides rownames_to_column()
list_store <- list()
vec <- c(3,2,3)
data_list <- lapply(list(head(mtcars,10), head(mtcars,15), head(mtcars,20), head(mtcars, 9),
head(mtcars,14), head(mtcars,18), head(mtcars,20), head(mtcars,10)),
function(x) rownames_to_column(x))
data_list1 <- lapply(list(head(mtcars,7), head(mtcars,8), head(mtcars,10)), function(x) rownames_to_column(x))
result <- lapply(data_list, function(i){
list_store[[length(list_store) + 1]] <- merge(i, data_list1[[1]], all.y = TRUE)
})
What I want is to merge the first three data frames of data_list with the first data frame of data_list1, the next two with the second data frame of data_list1, and the final three with the third. My code currently merges every element of data_list with the first element of data_list1; I want the element of data_list1 to change according to vec.
I could write a loop that keeps track of i, j and so on, but I would like to know whether there is a more efficient way.

We build a grouping vector by repeating each index of 'vec' as many times as the corresponding value (rep(seq_along(vec), vec)) and use it to split 'data_list' into 3 sublists. Map then pairs each sublist with the corresponding element of 'data_list1'; inside it, lapply loops over the sublist and merges each data frame with that element. Finally, do.call(c, ...) flattens the nested result back into the flat list structure of 'data_list'.
do.call(c,
Map(function(x,y) lapply(x, function(dat)
merge(dat, y, all.y = TRUE)),
split(data_list, rep(seq_along(vec), vec)),
data_list1))
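To make the grouping step easier to follow, here is a minimal sketch of the same split/Map/flatten pattern on toy data. The names items, partners and paired are made up for illustration, and paste stands in for merge.

```r
vec <- c(3, 2, 3)

# One group index per element: index i is repeated vec[i] times.
groups <- rep(seq_along(vec), vec)

# Toy stand-ins (hypothetical): eight "files" and three "partners".
items <- as.list(letters[1:8])
partners <- list("X", "Y", "Z")

# Split into sublists, pair each sublist with its partner,
# then flatten the nested result back into a single list.
paired <- unname(do.call(c,
  Map(function(chunk, partner) lapply(chunk, function(item) paste(item, partner)),
      split(items, groups),
      partners)))

print(groups)
print(unlist(paired))
```

The pattern relies on sum(vec) being equal to the length of the list being split, as it is in the question (3 + 2 + 3 = 8).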

Related

How to change column names of many dataframes in R?

I would like to make the same changes to the column names of many dataframes. Here's an example:
library(stringr)  # provides str_replace_all()
ChangeNames <- function(x) {
colnames(x) <- toupper(colnames(x))
colnames(x) <- str_replace_all(colnames(x), pattern = "_", replacement = ".")
return(x)
}
files <- list(mtcars, nycflights13::flights, nycflights13::airports)
lapply(files, ChangeNames)
I know that lapply only changes a copy. How do I change the underlying dataframe? I want to still use each dataframe separately.
Create a named list, apply the function and use list2env to reflect those changes in the original dataframes.
library(nycflights13)
files <- dplyr::lst(mtcars, flights, airports)
result <- lapply(files, ChangeNames)
list2env(result, .GlobalEnv)
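A small self-contained sketch of the same idea, using only base R and two made-up data frames (df_a, df_b). It writes into a scratch environment called env so the effect is easy to inspect; passing .GlobalEnv instead, as in the answer, would overwrite the objects of the same name in the workspace.

```r
# Base R version of the renaming function (gsub instead of str_replace_all).
change_names <- function(x) {
  colnames(x) <- gsub("_", ".", toupper(colnames(x)), fixed = TRUE)
  x
}

# Hypothetical example data frames.
df_a <- data.frame(first_col = 1:2)
df_b <- data.frame(other_col = 3:4)

# The list must be named: list2env uses the names as object names.
files <- list(df_a = df_a, df_b = df_b)

env <- new.env()
list2env(lapply(files, change_names), envir = env)

print(colnames(env$df_a))
print(colnames(env$df_b))
```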

Collecting a list of files that doesn't have any data to add to your frame

I'm trying to load multiple excel files all into one data frame. Some of the files don't have any data in the sheet that I'm looking for, so I'm looking to write code that collects the files that do have data, but also tells me which files weren't included because they didn't have any data. The code I have written does tell me which ones don't have data if I simply 'print(i)' inside the no part of my ifelse statement. However, as soon as I try to do anything else instead of printing, it seems to just ignore me! It's infuriating. How can I collect the names of the files that haven't contributed towards the total data frame?
this works fine:
library(readxl)
files <- list.files(path="./sfiles", pattern = "*.xls", full.names = T)
alldiasendcgmlist <- lapply(files,function(i){
ifelse(nrow(i)==NULL, NULL, i$name<-i)
x= read_excel(i,sheet=2,skip=4)
ifelse(nrow(x)>1, x$ID <- i, print(i))
x
})
but as soon as I want to collect these printings inside a vector, the vector just continues to remain empty:
library(readxl)
files <- list.files(path="./sfiles", pattern = "*.xls", full.names = T)
vectornodata <- character(0)
alldiasendcgmlist <- lapply(files,function(i){
ifelse(nrow(i)==NULL, NULL, i$name<-i)
x= read_excel(i,sheet=2,skip=4)
ifelse(nrow(x)>1, x$ID <- i, nodata <- append(vectornodata, i))
x
})
Help!
Consider building your list of data frames without any logical conditions, and afterwards run Filter to separate the empty and non-empty elements. Negate (another higher-order function) returns the opposite of a predicate, so here it selects the elements that have no length.
# BUILD NAMED LIST OF EMPTY AND NON-EMPTY DFs
alldiasendcgmlist <- lapply(files, function(i) read_excel(i,sheet=2,skip=4))
alldiasendcgmlist <- setNames(alldiasendcgmlist, gsub("\\.xlsx?$", "", basename(files)))
# EXTRACT ACTUAL DFs FROM FULL LIST
full_xlfiles <- Filter(length, alldiasendcgmlist)
# EXTRACT NULL ELEMENTS FROM FULL LIST
empty_xlfiles <- Filter(function(df) Negate(length)(df), alldiasendcgmlist)
empty_xlfiles <- names(empty_xlfiles)
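As a rough illustration of how Filter and Negate behave here, the sketch below uses a made-up named list of in-memory data frames (sheets) in place of the result of read_excel. Note that length of a data frame is its number of columns, so a sheet that has headers but no rows would not count as empty under this test; the has_rows variant shows a row-based check as an alternative.

```r
# Hypothetical stand-ins for the data frames read from each file.
sheets <- list(
  file_a = data.frame(value = 1:3),
  file_b = data.frame(),            # no columns, no rows
  file_c = data.frame(value = 4:5)
)

# Keep elements with non-zero length (at least one column).
full_sheets <- Filter(length, sheets)

# Negate(length) is TRUE exactly where length is zero.
empty_names <- names(Filter(Negate(length), sheets))

# Alternative: decide by number of rows instead of number of columns.
has_rows <- vapply(sheets, function(df) nrow(df) > 0, logical(1))

print(names(full_sheets))
print(empty_names)
print(names(sheets)[!has_rows])
```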
I will use my own example to show what one can do. Simply make your function return a list and then use one of its elements to filter out empty data frames:
input <- c(1,2,3,4,5,6)
func <- function(x){
list("square" = x^2, "even" = x %% 2 == 0)
}
res <- lapply(input, func) # returns a list of lists
even_numbers <- input[sapply(res, function(x) x[[2]])] # use 2nd element to filter
In your case, you can use a boolean vector to identify files that are empty.
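Applied to the question, that could look roughly like the sketch below. One likely reason the vector in the question stays empty is that an assignment made inside the function passed to lapply only creates a local variable, so returning the flag avoids the problem. The list sheets is a made-up stand-in for the files (each element would come from read_excel in the real code), and the test nrow(x) > 0 is an assumption about what counts as "has data".

```r
# Hypothetical in-memory "files"; names play the role of file paths.
sheets <- list(
  "one.xls"   = data.frame(value = 1:3),
  "two.xls"   = data.frame(value = integer(0)),
  "three.xls" = data.frame(value = 4:5)
)

# Return both the data and a flag saying whether it has any rows.
res <- lapply(names(sheets), function(path) {
  x <- sheets[[path]]
  has_data <- nrow(x) > 0
  if (has_data) x$ID <- path
  list(data = x, has_data = has_data)
})

# Boolean vector built from the flag element of each result.
has_data <- vapply(res, function(r) r$has_data, logical(1))

vectornodata <- names(sheets)[!has_data]
combined <- do.call(rbind, lapply(res[has_data], function(r) r$data))

print(vectornodata)
print(combined)
```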

Using lapply variable in read.csv

I'm just getting used to using lapply and I've been trying to figure out how I can use names from a vector to append within filenames I am calling, and naming new dataframes. I understand I can use paste to call the files of interest, but not sure I can create the new dataframes with the _var name appended.
site_list <- c("site_a","site_b", "site_c","site_d")
lapply(site_list,
function(var) {
all_var <- read.csv(paste0("I:/Results/",var,"_all.csv"))
tbl_var <- read.csv(paste0("I:/Results/",var,"_tbl.csv"))
rsid_var <- read.csv(paste0("I:/Results/",var,"_rsid.csv"))
return(var)
})
When using lapply, it usually makes more sense to have the function return its results, so that they are collected in a list whose elements can be named, than to try to create separate variables inside the function. Example (edit: use split to process files together):
files <- list.files(path= "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x){
all <- read.csv(x[grep("_all.csv", x)])
rsid <- read.csv(x[grep("_rsid.csv", x)])
tbl <- read.csv(x[grep("_tbl.csv", x)])
# do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
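Here is a runnable sketch of the split-then-lapply pattern, under the assumption that returning a named list of data frames per site is acceptable. It writes a few throwaway CSV files to a temporary directory so it does not depend on the I:/Results/ folder; the helper read_site and the file contents are made up for illustration.

```r
# Create small throwaway CSV files in a temporary directory (hypothetical data).
dir <- file.path(tempdir(), "results_demo")
dir.create(dir, showWarnings = FALSE)
for (site in c("a", "b")) {
  for (kind in c("all", "tbl", "rsid")) {
    write.csv(data.frame(site = site, kind = kind),
              file.path(dir, paste0("site_", site, "_", kind, ".csv")),
              row.names = FALSE)
  }
}

files <- list.files(dir, pattern = "^site_[ab]_.*\\.csv$", full.names = TRUE)

# Group the paths by site letter; split() names each group after its key.
files_by_site <- split(files, sub("^site_([ab])_.*$", "\\1", basename(files)))

# Read every file of one site and name each data frame after its kind.
read_site <- function(paths) {
  kinds <- sub("^site_[ab]_(.*)\\.csv$", "\\1", basename(paths))
  setNames(lapply(paths, read.csv), kinds)
}

res <- lapply(files_by_site, read_site)
print(names(res))
print(names(res$a))
```

Individual tables are then available as, for example, res$a$all or res$b$tbl, which replaces the need for separately named variables such as all_var.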

Usage of iteration element within name pattern

I would like to iterate with a for loop through a list, applying the following function to all list elements:
new_x = do.call("rbind",mget(ls(pattern = "^x.*")))
where x is a certain name pattern of a dataframe.
How do I iterate through a list where the list element i is the name pattern for my function?
The goal would be to get something like this:
for (i in filenames){
i = do.call("rbind",mget(ls(pattern = "^i.*")))
}
So my question is basically how to use i within a name pattern, so that I can use the loop to rbind together the separate parts of a dataframe: xpart1, xpart2, xpart3 to x; ypart1, ypart2, ypart3 to y; and so on.
Thank you in advance!
If we are using a for loop, one option is
v1 <- ls(pattern = "^x.*")
lst1 <- vector('list', length(v1))
for(i in seq_along(v1)){
lst1[[i]] <- get(v1[i])
}
do.call(rbind, lst1)
Or if we need to use i to create the pattern, we can use paste
lst1 <- vector('list', length(filenames))
names(lst1) <- filenames
for(i in filenames){
lst1[[i]] <- get(ls(pattern = paste0("^", i, ".*")))
}
do.call(rbind, lst1)
NOTE: get returns the value of a single object, whereas mget returns several objects in a list. Inside the for loop we assume that each iteration matches exactly one object, so get is all that is needed.
Based on the OP's clarification, we can also use mget
xs <- paste0("xpart", 1:100)
ys <- paste0("ypart", 1:100)
xsdat <- do.call(rbind, mget(xs))
ysdat <- do.call(rbind, mget(ys))
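The pattern and mget approaches can also be combined so that each prefix is handled in one pass. The sketch below is one possible way to do that; it keeps the example objects in a scratch environment called env, and the object names and the "part" naming scheme are assumptions taken from the question's description.

```r
# Hypothetical pieces, stored in a scratch environment instead of the workspace.
env <- new.env()
assign("xpart1", data.frame(v = 1:2), envir = env)
assign("xpart2", data.frame(v = 3:4), envir = env)
assign("ypart1", data.frame(v = 5), envir = env)
assign("ypart2", data.frame(v = 6), envir = env)

prefixes <- c("x", "y")

# For each prefix, fetch every matching object with mget and stack them.
combined <- lapply(setNames(prefixes, prefixes), function(p) {
  parts <- mget(ls(envir = env, pattern = paste0("^", p, "part")), envir = env)
  do.call(rbind, unname(parts))
})

print(combined$x)
print(combined$y)
```

Because ls returns names in alphabetical order, xpart10 would sort before xpart2; with more than nine parts it may be safer to build the names explicitly with paste0, as in the answer above.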

read.csv into nested list and set element names

I'm reading .csv files from several different directories into a nested list. Along the lines of
filenames <- list(a = list.files("/some_dir_1", pattern = "*.csv"), # not a reproducible example but for demonstration purposes
b = list.files("/some_dir_2", pattern = "*.csv"),
c = list.files("/some_dir_3", pattern = "*.csv"))
# creates a nested list of file paths
dat.list <- lapply(filenames, lapply, read.csv)
# creates a nested list of dataframes, with the same structure as filenames
I'd like to name each element with their file path.
This could be done by naming them one by one, e.g.
names(dat.list[["a"]]) <- filenames[["a"]]
or by putting this in a for-loop, but is there a more versatile method? Preferably a tidyverse friendly solution, along the lines of...
filenames %>% lapply(., lapply, read_csv) %>% #some naming call#
Or am I going about this in the wrong way?
Any help would be greatly appreciated, thanks.
Based on the description, we can either use lapply to loop over the sequence of 'filenames', or use a for loop, to set the names of each of the dat.list[[i]] elements
lapply(seq_along(filenames), function(i) setNames(dat.list[[i]], filenames[[i]]))
Or with Map
Map(setNames, dat.list, filenames)
Or
for(i in seq_along(filenames)) names(dat.list[[i]]) <- filenames[[i]]
If we want to use tidyverse, the equivalent option based on base R Map would be
library(purrr)
map2(dat.list, filenames, setNames)
NOTE: The for loop modifies the original 'dat.list' in place, whereas the result of lapply (or Map/map2) has to be assigned back to 'dat.list' to update it.
data
filenames <- list(a = c('a1.csv', 'a2.csv'), b = c('b1.csv', 'b2.csv'))
set.seed(24)
dat.list <- lapply(1:2, function(i) replicate(2, as.data.frame(matrix(sample(1:5, 5*5,
replace = TRUE), 5, 5)), simplify = FALSE))
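For a quick check of what Map(setNames, ...) produces, here is a minimal sketch with made-up one-column data frames in place of the CSV contents. It also shows an alternative that is not part of the answer above: naming the paths before reading, so that lapply carries the names through.

```r
filenames <- list(a = c("a1.csv", "a2.csv"), b = c("b1.csv", "b2.csv"))

# Hypothetical stand-in for read.csv: one tiny data frame per path.
fake_read <- function(p) data.frame(source = p)

dat.list <- lapply(filenames, lapply, fake_read)

# Pair each sublist of data frames with its vector of paths.
named <- Map(setNames, dat.list, filenames)

# Alternative: name the paths first; lapply keeps the names of its input.
named2 <- lapply(filenames, function(paths) lapply(setNames(paths, paths), fake_read))

print(names(named$a))
print(names(named2$b))
```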

Resources