How to unlist a list to a specific level - R

The following function can be used to load an entire Excel workbook (only the readxl package is needed):
library(readxl)
read_excel_allsheets <- function(filename) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
  names(x) <- sheets
  x
}
mysheets <- read_excel_allsheets(choose.files())
I am trying to find a way to unlist all worksheets into the global environment as separate data frames. Thus far I have been accessing them one at a time with mysheets$, but this is inefficient for the large workbooks I am using.
I have tried
unlist(mysheets, recursive = FALSE)
but it does not provide the desired result.

I don't see what is inefficient about using mysheets$. You could rename mysheets to a shorter name if that's what's bothering you. If you really want to move the dfs to the global environment, you could use this hack:
for (s in names(mysheets)) {
  eval(parse(text = sprintf("%s <- mysheets[['%s']]", make.names(s), s)))
}
rm(mysheets)
Note that the dfs' names will be changed by make.names. If you'd like to keep the original names, change the line inside the for loop to:
eval(parse(text = sprintf("`%s` <- mysheets[['%s']]", s, s)))
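If you prefer to avoid eval(parse()), a minimal sketch of the same move uses base R's assign(), or list2env() to copy the whole list in one call (non-syntactic names would then need backticks to access):
for (s in names(mysheets)) {
  assign(s, mysheets[[s]], envir = .GlobalEnv)
}
# or move the whole list at once
list2env(mysheets, envir = .GlobalEnv)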

Related

Adding a new column with filenames for the list of files in a for loop

I have time series data, stored in txt files under daily subfolders inside monthly folders.
setwd(".../2018/Jan")
parent.folder <- ".../2018/Jan"
sub.folders <- list.dirs(parent.folder, recursive = TRUE)[-1] # read the sub-folders under the parent folder
r.scripts <- file.path(sub.folders)
A_2018 <- list()
for (j in seq_along(r.scripts)) {
  A_2018[[j]] <- dir(r.scripts[j], "\\.txt$")
}
From these .txt files, I removed some files that I don't want to use for further analysis, using the following code.
trim_to_two <- function(x) {
  runs <- rle(gsub("^L1_\\d{4}_\\d{4}_", "", x))
  return(cumsum(runs$lengths)[which(runs$lengths > 2)] * -1)
}
A_2018_new <- list()
for (j in seq_along(A_2018)) {
  A_2018_new[[j]] <- A_2018[[j]][trim_to_two(A_2018[[j]])]
}
Then I want to row-bind all of the .txt files in a for loop. Before that, I would like to remove some lines in each txt file and add one new column containing the file name. The following is my code.
for (i in 1:length(A_2018_new)) {
  for (j in 1:length(A_2018_new[[i]])) {
    filename <- paste(str_sub(A_2018_new[[i]][j], 1, 14))
    assign(filename, read_tsv(complete_file_name, skip = 14, col_names = FALSE))
    Y <- r.scripts %>% str_sub(46, 49)
    MD <- r.scripts %>% str_sub(58, 61)
    HM <- filename %>% str_sub(9, 12)
    Turn <- filename %>% str_sub(14, 14)
    time_minute <- paste(Y, MD, HM, sep = "-")
    Map(cbind, filename, SampleID = names(filename))
  }
}
But I didn't get my desired output. I tried adapting my code from other examples. Could anyone explain what my code is missing?
Your code seems overly complex for what it is doing. Your problem is, however, not 100% clear (e.g. what is the pattern in your file names that determines what to import and what not?). Here are some pointers that would greatly simplify the code, and likely avoid the issue you are having.
Use lapply() or map() from the purrr package to iterate instead of a for loop. The benefit is that it places the different data frames in a list and you don't need to assign multiple data frames into their own objects in the environment. Since you tagged the tidyverse, we'll use the purrr functions.
library(tidyverse)
You could for instance retrieve the txt file paths, using something like
txt_files <- list.files(path = 'data/folder/', pattern = "txt$", full.names = TRUE) # remove the files you don't want with whatever logic applies
and then use map() with read_tsv() from readr like so:
mydata <- map(txt_files, read_tsv)
Then, for your manipulation, you can again use lapply() or map(). The easiest way is to create a custom function and then apply it to each data frame:
my_func <- function(df, filename) {
  df |>
    filter(...) |> # whatever logic applies here
    mutate(filename = filename)
}
and then use map2() to apply this function, iterating over the data and file names in parallel, followed by list_rbind() to bind the data frames by row:
mydata_output <- map2(mydata, txt_files, my_func) |>
  list_rbind()
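As an aside, if the per-file manipulation only records the source file, readr (version 2.0 and later) can read a vector of paths in a single call and store each row's source path via the id argument; a minimal sketch, assuming the same skip and col_names settings as in the question:
# read all files at once; the "filename" column records each row's source path
mydata_output <- read_tsv(txt_files, skip = 14, col_names = FALSE, id = "filename")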

How to change column names of many dataframes in R?

I would like to make the same changes to the column names of many dataframes. Here's an example:
library(stringr) # ChangeNames uses str_replace_all from stringr

ChangeNames <- function(x) {
  colnames(x) <- toupper(colnames(x))
  colnames(x) <- str_replace_all(colnames(x), pattern = "_", replacement = ".")
  return(x)
}
files <- list(mtcars, nycflights13::flights, nycflights13::airports)
lapply(files, ChangeNames)
I know that lapply only changes a copy. How do I change the underlying data frames? I still want to use each data frame separately.
Create a named list, apply the function and use list2env to reflect those changes in the original dataframes.
library(nycflights13)
files <- dplyr::lst(mtcars, flights, airports)
result <- lapply(files, ChangeNames)
list2env(result, .GlobalEnv)
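After this, mtcars, flights, and airports in the global environment carry the modified column names; a quick check (the commented output is indicative):
colnames(flights)[1:4]
# [1] "YEAR" "MONTH" "DAY" "DEP.TIME"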

Using lapply variable in read.csv

I'm just getting used to lapply, and I've been trying to figure out how I can use names from a vector both inside the filenames I am reading and when naming the new data frames. I understand I can use paste to build the file names of interest, but I am not sure I can create the new data frames with the _var name appended.
site_list <- c("site_a", "site_b", "site_c", "site_d")
lapply(site_list,
  function(var) {
    # paste0 avoids the spaces that paste() inserts by default
    all_var <- read.csv(paste0("I:/Results/", var, "_all.csv"))
    tbl_var <- read.csv(paste0("I:/Results/", var, "_tbl.csv"))
    rsid_var <- read.csv(paste0("I:/Results/", var, "_rsid.csv"))
    return(var)
  })
Generally, it makes more sense to have lapply apply a function to the list elements and return a list, in which your variables are stored and can be named. Example (edit: use split to process each site's files together):
files <- list.files(path = "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))

processFiles <- function(x) {
  all <- read.csv(x[grep("_all.csv", x)])
  rsid <- read.csv(x[grep("_rsid.csv", x)])
  tbl <- read.csv(x[grep("_tbl.csv", x)])
  # do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
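Each element of res then corresponds to one site letter; a small usage sketch, assuming processFiles is completed to return a data frame:
res[["a"]] # the combined data for site_a
do.call(rbind, res) # stack all sites into one data frame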

Loop for changing file endings

I had to split a gigantic csv file up, and now I'd like to process each of the pieces, then stack them.
The split-up data files are named data-000.csv, data-001.csv, etc., up through 374.
However, I don't know how to get R to substitute the [i] in the file name.
for (i in 3:3) {
  dat = read.csv("F:data-00[i].csv")
}
cannot open file 'F:data-[i].csv': No such file or directory
where dat = read.csv('F:data-003.csv') works just fine.
How do I replace the suffix and process through my text files?
Many thanks!
We can use paste to get the value stored in i instead of using it literally. For storing more than one dataset, it is better to pre-allocate a list and then assign the data into it:
lst1 <- vector('list', 3)
for (i in 1:3) {
  lst1[[i]] <- read.csv(paste0("F:data-00", i, ".csv"))
}
Also, if the numbers should be three digits with leading 0s, an option is to build the names with sprintf (note the .csv extension must be included, and the files start at data-000):
lst1 <- vector('list', 375)
files <- sprintf('F:data-%03d.csv', 0:374)
names(lst1) <- files
for (file in files) {
  lst1[[file]] <- read.csv(file)
}
It can also be easier to use lapply; as paste/sprintf are vectorized, they can be taken out of the loop:
lst1 <- lapply(files, read.csv)
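To keep the file names attached as element names, a small variation with setNames():
lst1 <- setNames(lapply(files, read.csv), files)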
With the tidyverse, we can use map (from purrr) and read_csv (from readr):
library(purrr)
library(readr)
lst1 <- map(files, read_csv)
Or using fread from data.table
library(data.table)
lst1 <- lapply(files, fread)
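Since the end goal is to stack the pieces, any of the following binds the list into one data frame (they assume the pieces share the same columns, which holds here since they came from one csv):
do.call(rbind, lst1) # base R
dplyr::bind_rows(lst1) # dplyr
data.table::rbindlist(lst1) # data.table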

Import Multiple Sheets into Multiple Data Frames in R

I have an Excel file with a lot of sheets, and I need code to import each sheet into a separate data frame, named following the same convention as the sheet name in Excel.
For example, tabs A, B, and C will be imported as data frames A, B, and C respectively.
From other threads, I saw code like:
length(excel_sheets(filename)) to get the number of sheets in the file
Then create a list that would contain each tab:
read_excel_allsheets <- function(filename) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
  names(x) <- sheets
  x
}
But I do not know how the tabs get imported into R from there.
Would greatly appreciate the help.
Thanks in advance!
Here's one way to do it:
# write test data
tf <- writexl::write_xlsx(
  list("the mtcars" = mtcars, "iris data" = iris),
  tempfile(fileext = ".xlsx")
)
# read excel sheets
sheets <- readxl::excel_sheets(tf)
lst <- lapply(sheets, function(sheet)
  readxl::read_excel(tf, sheet = sheet)
)
names(lst) <- sheets
# shove them into global environment
list2env(lst, envir = .GlobalEnv)
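Note that sheet names like these are not syntactic R names, so after list2env() they need backticks to access:
head(`the mtcars`)
head(`iris data`)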
Your function reads in all the tabs and saves them as elements of a single list (because of lapply()). You can take the elements out of the list with list2env:
your_excel_list <- read_excel_allsheets("test.xlsx")
list2env(your_excel_list, .GlobalEnv)
You'll see that the named elements of your list are now data frames (or actually tbl_df objects) in your global environment.
You could also read everything in one line; this reads one named sheet from every .xlsx file in the working directory and binds the results by row. You should load the readxl, magrittr, and dplyr packages.
library(readxl)
library(magrittr)
library(dplyr)
data <- lapply(list.files(pattern = "\\.xlsx$"), function(x) read_excel(x, sheet = "(sheetname)")) %>%
  bind_rows()
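If you also need to record which workbook each row came from, bind_rows() accepts a named list and writes the names into an .id column; a small sketch, keeping the sheet-name placeholder from above:
files <- list.files(pattern = "\\.xlsx$")
data <- lapply(files, read_excel, sheet = "(sheetname)") %>%
  setNames(files) %>%
  bind_rows(.id = "source_file")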
