R: Generate dynamic names for dataframes

I need to read several csv files from a directory and save each one in a separate dataframe.
The filenames are in a character vector:
lcl_forecast_data_files <- dir(lcl_forecast_data_path, pattern=glob2rx("*.csv"), full.names=TRUE)
For example: "fruc2021.csv", "gem2020.csv", "strb2021.csv".
So far I am reading the files step by step:
fruc2021 <- read_csv2("fruc2021.csv")
gem2020 <- read_csv2("gem2020.csv")
strb2021 <- read_csv2("strb2021.csv")
But there are many more files in the directory and subdirectories. To read them all one by one is very tedious.
Now I have already experimented a little with the map function, but I have not yet figured out how to automatically generate the names of the dataframes from the file names.
A first simple try was:
lcl_forecast_data <- lcl_forecast_data_files %>%
  map(
    function(x) {
      str_replace(basename(x), ".csv", "") <- read_csv2(x)
    }
  )
But this did not work :-(
Is it even possible to generate names for dataframes like this?
Or are there other, simpler possibilities?
Greetings
Benne

If you do not want to use a list and lapply as @Onyambu suggested, you can use assign() to generate the dataframes.
filenames <- c("fruc2021.csv", "gem2020.csv", "strb2021.csv")
for (i in filenames) {
  assign(gsub("\\.csv$", "", i), read.csv(i))
}
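For completeness, here is a list-based sketch closer to the question's original map() attempt; it assumes the tidyverse (purrr, stringr, readr) is loaded, and fruc2021 below is just one of the example files. list2env() then turns the list into separate dataframes if you really want individual objects:
library(tidyverse)

# Read every file into a named list; element names are derived from the file names
lcl_forecast_data <- lcl_forecast_data_files %>%
  set_names(function(x) str_replace(basename(x), "\\.csv$", "")) %>%
  map(read_csv2)

# Access one dataframe by name ...
lcl_forecast_data$fruc2021

# ... or, if separate objects are really needed, push them into the workspace
list2env(lcl_forecast_data, envir = .GlobalEnv)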

Related

How to use colnames() inside of a for loop to show all colnames from a list

I am trying to loop through a list of data frames to quickly check a bunch of data all at once.
source_files <- list.files(pattern = "\\.csv$")
This got a list of the csv files in my working directory.
for (i in source_files) {
  assign(substr(i, 1, nchar(i) - 4), read.csv(i))
}
This used a for loop to read all the csv files. Then I tried to inspect them:
for (i in source_files) {
  n <- substr(i, 1, nchar(i) - 4)
  glimpse(n)
}
However, this doesn't work. I tried it with a few functions like glimpse, colnames, etc., and none of them seem to work; glimpse just returns chr "2021xxxdata".
I want to check the data frames before I combine them with bind_rows. Is there a better way to do this, or a way to make this work?
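The loop fails because n is only a character string, so glimpse(n) inspects the string instead of the data frame it names. A minimal sketch of two ways around this (not from the original thread): resolve the name with get(), or skip assign() and keep a named list:
library(dplyr)

# Option 1: look the object up by its name
for (i in source_files) {
  n <- substr(i, 1, nchar(i) - 4)
  glimpse(get(n))
}

# Option 2: read everything into a named list and glimpse each element
dfs <- setNames(lapply(source_files, read.csv), source_files)
invisible(lapply(dfs, glimpse))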

Loop for reading multiple files with readOGR and output to different files in R

I have a bunch of .gpx files in a folder and I'm trying to read them all with readOGR and get one file in memory for each .gpx file. Here's what isn't working:
myfiles <- list.files(".", pattern = "*.gpx")
for (i in 1:length(myfiles)) {
  temp.gpx <- readOGR(dsn = myfiles[i], layer = "tracks")
  temp.gpx
}
What this does is read all of the files and then write them to temp.gpx. What I'd like this to do is to read them and write them to, e.g., temp1.gpx, temp2.gpx, etc.
Unfortunately, I'm pretty new to R and I've no idea how to do it. I tried looking online and found some solutions that were specific to non-spatial files and messed up these files in one way or another.
Does anyone know how to accomplish this?
Thanks!
You can use assign() to generate variable names using other variables:
myfiles <- list.files(".", pattern = "*.gpx")
for (i in 1:length(myfiles)) {
  varName <- paste0("temp", i, ".gpx")
  assign(varName, readOGR(dsn = myfiles[i], layer = "tracks"))
}
With each iteration of the loop, this creates a character variable varName holding the value "temp1.gpx", "temp2.gpx", etc.:
## i <- 1
varName <- paste0("temp", i, ".gpx")
## [1] "temp1.gpx"
The assign() then assigns the result of readOGR() to the current temp*.gpx variable.
The use of assign() is in most cases a very poor choice. Although Stuart Allen answered your question correctly, you are most likely asking the wrong question.
What you are trying to do is a typical beginner's mistake. With this approach you end up with several named objects that are difficult to manipulate, because you need to refer to them by their names, which makes it hard to use the objects in a loop, for example.
Instead you probably should make a list with all your objects:
gpx <- lapply(myfiles, function(f) {
  readOGR(dsn = f, layer = "tracks")
})
And take it from there.
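For example, naming the list elements after the files keeps them easy to address; a small sketch extending the answer above (the element name shown is hypothetical):
gpx <- lapply(myfiles, function(f) readOGR(dsn = f, layer = "tracks"))
names(gpx) <- tools::file_path_sans_ext(basename(myfiles))

# Address a single track set by name, or loop over all of them
gpx[["mytrack"]]     # hypothetical: from a file named "mytrack.gpx"
lapply(gpx, summary)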

R 3.4.1: Read data from multiple .csv files

I'm trying to build up a function that can import/read several data tables in .csv files, and then compute statistics on the selected files.
Each of the 332 .csv files contains a table with the same column names: Date, Pollutant and id. There are a lot of missing values.
This is the function I wrote so far, to compute the mean of values for a pollutant:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  library(dplyr)
  setwd(directory)
  good <- c()
  for (i in id) {
    task1 <- read.csv(sprintf("%03d.csv", i))
  }
  p <- select(task1, pollutant)
  good <- c(good, complete.cases(p))
  mean(p[good, ])
}
The problem I have is that each time it goes through the loop, a new file is read and the data already read are replaced by the data from the new file.
So I end up with a function that works perfectly fine with a single file, but not when I want to select multiple files:
e.g., if I ask for id = 10:20, the mean is calculated only on file 20.
How could I change the code so that I can select multiple files?
Thank you!
My answer offers a way of doing what you want to do (if I understood everything correctly) without using a loop. My two assumptions are: (1) you have 332 *.csv files with the same header (column names), so all files have the same structure, and (2) you can combine your tables into one big data frame.
If these two assumptions are correct, I would use a list of your files to import them as data frames (so this answer does not contain an explicit loop!).
# This creates a list with the names of your files. You have to provide the path to this folder.
file_list <- list.files(path = "[your path where your *.csv files are saved in]", full.names = TRUE)

# This will create a list of data frames.
mylist <- lapply(file_list, read.csv)

# This will 'row-bind' the data frames of the list into one big data table.
# rbindlist() comes from the data.table package.
library(data.table)
mydata <- rbindlist(mylist)

# Now you can perform your calculation on this big data frame, using your column
# information to filter or subset it (if necessary).
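As a sketch of that final step, using the column names described in the question (Pollutant, id):
# Mean of the pollutant across all files, ignoring missing values
mean(mydata$Pollutant, na.rm = TRUE)

# Or restricted to a subset of the ids
mean(mydata$Pollutant[mydata$id %in% 10:20], na.rm = TRUE)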
I hope this helps.
Maybe something like this?
library(dplyr)
pollutantmean <- function(directory, pollutant, id = 1:332) {
  od <- setwd(directory)
  on.exit(setwd(od))
  task_list <- lapply(sprintf("%03d.csv", id), read.csv)
  # Pool the pollutant values from all files, then average the non-missing ones
  values <- unlist(lapply(task_list, function(x) select(x, all_of(pollutant))[[1]]))
  mean(values, na.rm = TRUE)
}
Notes:
- Put all your library() calls at the beginning of your scripts; they will be much easier to read. Never put them inside a function.
- Setting the working directory inside a function is also a bad idea: when the function returns, that change will still be in effect and you might get lost. The better way is to set working directories outside functions, but since you had set it inside the function, I have adapted the code accordingly (on.exit() restores the original directory when the function exits).
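A hypothetical call, assuming the files sit in a folder named data and the column is named Pollutant as described in the question:
pollutantmean("data", "Pollutant", id = 10:20)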

merge multiple files with different rows in R

I know that this question has been asked previously, but answers to the previous posts cannot seem to solve my problem.
I have dozens of tab-delimited .txt files. Each file has two columns ("pos", "score"). I would like to compile all of the "score" columns into one file with multiple columns. The number of rows in each file varies and they are irrelevant for the compilation.
If someone could direct me on how to accomplish this, preferably in R, it would be very helpful.
Alternatively, my ultimate goal is to read the median and mean of the "score" column from each file. So if this could be accomplished, with or without compiling the files, it would be even more helpful.
Thanks.
UPDATE:
As appealing as the idea of personal code ninjas is, I understand this will have to remain a fantasy. Sorry for not being explicit.
I have tried lapply and Reduce, e.g.,
> files <- dir(pattern="X.*\\.txt$")
> File_list <- lapply(filesToProcess,function(score)
+ read.table(score,header=TRUE,row.names=1))
> File_list <- lapply(files,function(z) z[c("pos","score")])
> out_file <- Reduce(function(x,y) {merge(x,y,by=c("pos"))},File_list)
which I know doesn't really make sense, considering I have variable row numbers. I have also tried plyr
> files <- list.files()
> out_list <- llply(files,read.table)
As well as cbind and rbind. Usually I get an error message, because the row numbers don't match up or I just get all the "score" data compiled into one column.
The advice on similar posts (e.g., Merging multiple csv files in R, Simultaneously merge multiple data.frames in a list, and Merge multiple files in a list with different number of rows) has not been helpful.
I hope this clears things up.
This problem could be solved in two steps:
Step 1. Read the data from your files into a list of data frames, where files is a vector of file names. If you need to pass extra arguments to read.csv, add them as shown below. See ?lapply for details.
list_of_dataframes <- lapply(files, read.csv, stringsAsFactors = FALSE)
Step 2. Calculate means for each data frame:
means <- sapply(list_of_dataframes, function(df) mean(df$score))
Of course, you can always do it in one step like this:
means <- sapply(files, function(filename) mean(read.csv(filename)$score))
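Since the question also asks for the median, the same pattern covers both statistics in one pass (a sketch; it yields one column per file):
stats <- sapply(list_of_dataframes, function(df)
  c(mean = mean(df$score, na.rm = TRUE), median = median(df$score, na.rm = TRUE)))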
I think you want something like this:
all_data <- do.call(rbind, lapply(files, function(f) {
  cbind(read.csv(f), file_name = f)
}))
You can then do whatever "by" type of action you like. Also, don't forget to adjust the various read.csv options to suit your needs.
E.g. once you have the above, you can do the following (and much more):
library(data.table)
dt = data.table(all_data)
dt[, list(mean(score), median(score)), by = file_name]
A small note: you could also use data.table's fread() to read the files instead of read.table and its derivatives, which would be much faster; and while we're at it, use rbindlist() instead of do.call(rbind, ...).
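Putting those two suggestions together, a sketch of the faster variant; rbindlist()'s idcol argument picks up the list names as the file-name column:
library(data.table)

# fread() auto-detects the delimiter; naming the list lets idcol record the source file
dt <- rbindlist(setNames(lapply(files, fread), files), idcol = "file_name")
dt[, .(mean = mean(score), median = median(score)), by = file_name]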

Assigning unknown variable to new variable name

I have to load in many files and transform their data. Each file contains only one data.table; however, the tables have various names.
I would like to run a single script over all of the files. To do so, I must assign the unknown data.table to a common name ... say blob.
What is the R way of doing this? At present, my best guess (which seems like a hack, but works) is to load the data.table into a new environment, and then: assign('blob', get(objects(envir = newEnv)[1], envir = newEnv)).
In a reproducible context this is:
newEnv <- new.env()
assign('a', 1:10, envir = newEnv)
assign('blob', get(objects(envir = newEnv)[1], envir = newEnv))
Is there a better way?
The R way is to create a single object, i.e. a single list of data tables.
Here is some pseudocode that contains three steps:
Use list.files() to create a list of all files in a folder.
Use lapply() and read.csv() to read your files and create a list of data frames. Replace read.csv() with read.table() or whatever is appropriate for your data.
Use lapply() again, this time with as.data.table() to convert the data frames to data tables.
The pseudocode:
library(data.table)

filenames <- list.files("path/to/files", full.names = TRUE)
dat <- lapply(filenames, read.csv)
dat <- lapply(dat, as.data.table)
Your result should be a single list, called dat, containing a data table for each of your original files.
I assume that you saved the data.tables using save() somewhat like this:
d1 <- data.table(value=1:10)
save(d1, file="data1.rdata")
and your problem is that when you load the file you don't know the name (here: d1) that you used when saving the file. Correct?
I suggest you instead use saveRDS() and readRDS() for saving/loading single objects:
d1 <- data.table(value=1:10)
saveRDS(d1, file="data1.rds")
blob <- readRDS("data1.rds")
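With that change, running a single script over many files becomes a straightforward loop; a sketch, assuming the .rds files live in one folder (the path is hypothetical):
# Each element of this list plays the role of 'blob' from the question
rds_files <- list.files("path/to/files", pattern = "\\.rds$", full.names = TRUE)
blobs <- lapply(rds_files, readRDS)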
