Subset multiple dataframes in a loop in R

I am trying to drop columns from over 20 data frames that I have imported. However, I'm getting errors when I try to iterate through all of these files. I'm able to drop when I hard code the individual file name, but as soon as I try to loop through all of the files, I have errors. Here's the code:
path <- "C://Home/Data/"
files <- list.files(path=path, pattern="^file.*\\.csv$")
for (i in 1:length(files)) {
  perpos <- which(strsplit(files[i], "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(files[i], 1, perpos - 1)),
    read.csv(paste(path, files[i], sep = "")))
}
mycols <- c("test", "trialruns", "practice")
file01 <- file01[, !(names(file01) %in% mycols)]
So, the above will work and drop those three columns from file01. However, I can't iterate through file02 to file20 and drop the columns from all of them. Any ideas? Thank you so much!

As @zx8754 mentions, consider lapply(), maintaining all dataframes in one list instead of as multiple objects in your environment (below also shows how to output individual dfs from the list):
path <- "C://Home/Data/"
files <- list.files(path=path, pattern="^file.*\\.csv$")
mycols <- c("test", "trialruns", "practice")
# READ IN ALL FILES AND DROP THE UNWANTED COLUMNS
dfList <- lapply(files, function(f) {
  df <- read.csv(paste0(path, f))
  df[, !(names(df) %in% mycols)]
})
# SET NAMES TO EACH DF ELEMENT
dfList <- setNames(dfList, sub("\\.csv$", "", files))
# IN CASE YOU REALLY NEED INDIVIDUAL DFs
list2env(dfList, envir=.GlobalEnv)
# IN CASE YOU NEED TO APPEND ALL DFs
finaldf <- do.call(rbind, dfList)
# TO RETRIEVE FIRST DF
dfList[[1]] # OR dfList$file01
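Alternatively, if you want to keep the separate objects created with assign() in the question, a sketch iterating over the object names with get()/assign() (file01 and file02 below are small stand-in data frames, since the real files aren't available here):

```r
# Stand-ins for the imported files
file01 <- data.frame(id = 1:3, test = 1, trialruns = 2, keepme = 3)
file02 <- data.frame(id = 4:6, test = 1, practice = 2, keepme = 3)

mycols <- c("test", "trialruns", "practice")

# Loop over the object names, drop unwanted columns, and write each back
for (nm in c("file01", "file02")) {
  df <- get(nm)
  assign(nm, df[, !(names(df) %in% mycols), drop = FALSE])
}
```

This works, but it is exactly the kind of environment bookkeeping the list approach above avoids.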

Related

How to change column names of many dataframes in R?

I would like to make the same changes to the column names of many dataframes. Here's an example:
library(stringr)
ChangeNames <- function(x) {
  colnames(x) <- toupper(colnames(x))
  colnames(x) <- str_replace_all(colnames(x), pattern = "_", replacement = ".")
  return(x)
}
files <- list(mtcars, nycflights13::flights, nycflights13::airports)
lapply(files, ChangeNames)
I know that lapply only changes a copy. How do I change the underlying dataframe? I want to still use each dataframe separately.
Create a named list, apply the function and use list2env to reflect those changes in the original dataframes.
library(nycflights13)
files <- dplyr::lst(mtcars, flights, airports)
result <- lapply(files, ChangeNames)
list2env(result, .GlobalEnv)
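A self-contained sketch of the same pattern (using base gsub in place of stringr's str_replace_all so it runs without extra packages; df_one and df_two are made-up names):

```r
ChangeNames <- function(x) {
  # Same idea as above, but with base gsub instead of str_replace_all
  colnames(x) <- gsub("_", ".", toupper(colnames(x)), fixed = TRUE)
  x
}

# A named list, so list2env() knows what to call each object
files <- list(df_one = data.frame(col_a = 1), df_two = data.frame(col_b = 2))
result <- lapply(files, ChangeNames)
list2env(result, .GlobalEnv)
```

After list2env(), df_one and df_two in the global environment carry the renamed columns.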

Using lapply variable in read.csv

I'm just getting used to lapply, and I've been trying to figure out how to use names from a vector both inside the filenames I am reading and in the names of the new dataframes. I understand I can use paste to build the file paths, but I'm not sure how to create the new dataframes with the _var name appended.
site_list <- c("site_a","site_b", "site_c","site_d")
lapply(site_list,
       function(var) {
         all_var <- read.csv(paste0("I:/Results/", var, "_all.csv"))
         tbl_var <- read.csv(paste0("I:/Results/", var, "_tbl.csv"))
         rsid_var <- read.csv(paste0("I:/Results/", var, "_rsid.csv"))
         return(var)
       })
Generally, it makes more sense to apply a function to the list elements and return a list when using lapply; your variables are then stored in the list and can be named. Example (edit: use split to group each site's files so they are processed together):
files <- list.files(path= "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x){
  all <- read.csv(x[grep("_all.csv", x)])
  rsid <- read.csv(x[grep("_rsid.csv", x)])
  tbl <- read.csv(x[grep("_tbl.csv", x)])
  # do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
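To make the "return(df)" step concrete, processFiles() could hand back the three tables as a named list. A self-contained sketch (the temp-folder demo files and their contents are made up):

```r
# Hypothetical demo: write three small CSVs for one site to a temp folder
tmp <- tempdir()
write.csv(data.frame(x = 1:2), file.path(tmp, "site_a_all.csv"), row.names = FALSE)
write.csv(data.frame(y = 3:4), file.path(tmp, "site_a_rsid.csv"), row.names = FALSE)
write.csv(data.frame(z = 5:6), file.path(tmp, "site_a_tbl.csv"), row.names = FALSE)

files <- list.files(path = tmp, pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))

processFiles <- function(x){
  # Return the three tables together instead of as loose variables
  list(all  = read.csv(x[grep("_all.csv", x)]),
       rsid = read.csv(x[grep("_rsid.csv", x)]),
       tbl  = read.csv(x[grep("_tbl.csv", x)]))
}

res <- lapply(files, processFiles)
```

Each site's tables are then reachable as res$a$all, res$a$rsid, res$a$tbl, and so on.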

Import many .csv files into one dataframe and add column based on name

I have > 50 .csv files in a folder on my computer. The files all contain the same column headings/ format.
I have code to import all the .csv files and name them appropriately:
path <- "~/My folder Location/"
files <- list.files(path=path, pattern="*.csv")
for (file in files) {
  perpos <- which(strsplit(file, "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(file, 1, perpos - 1)),
    read.csv(paste(path, file, sep = "")))
}
I now have many .csv files that are named, as I prefer, in the environment. However, I now wish to create two columns within each data.frame based on parts of the data.frame name and then create one big data.frame
For example, if one of the data.frames is:
LeftArm_Beatrice
I wish to include:
LeftArm_Beatrice$BodyPart <- c("LeftArm")
LeftArm_Beatrice$Name <- c("Beatrice")
Another example, if one of the data.frames is:
RightLeg_Sally
I wish to include:
RightLeg_Sally$BodyPart <- c("RightLeg")
RightLeg_Sally$Name <- c("Sally")
I then want to merge all these 50+ data.frames into one. If these steps can be included in my importing code, that would be fantastic.
Thanks!
This might work! I actually needed more clarification on the data and the naming convention to be followed, so let me know if you have any questions.
path = "D:/pathname/"
l = list.files(path, pattern = ".csv")
# below func does importing and creation of new columns
func <- function(i){
  df <- read.csv(paste0(path, l[i]))
  # strip the .csv extension before splitting, so "Name" doesn't keep it
  names <- unlist(strsplit(sub("\\.csv$", "", l[i]), "_"))
  df["BodyPart"] <- names[1]
  df["Name"] <- names[2]
  return(df)
}
# l1 shall have each of the dataframes individually with new columns attached
l1 <- lapply(1:length(l), func)
# here we combine all dataframes together
l2 <- do.call(rbind, l1)

How to read variable number of files and then combine the data frames in R?

I would like to design a function. Say I have files file1.csv, file2.csv, file3.csv, ..., file100.csv. I only want to read some of them every time I call the function by specifying an integer vector id, e.g., id = 1:10, then I will read file1.csv,...,file10.csv.
After reading those csv files, I would like to row combine them into a single variable. All csv files have the same column structure.
My code is below:
namelist <- list.files()
for (i in id) {
  assign(paste0("file", i), read.csv(namelist[i], header = TRUE))
}
As you can see, after I read in all the data, I am stuck combining the files since they all have different variable names.
You should read in each file as an element of a list. Then you can combine them as follows:
namelist <- list.files()
df <- vector("list", length = length(id))
for (j in seq_along(id)) {
  df[[j]] <- read.csv(namelist[id[j]], header = TRUE)
}
df <- do.call("rbind", df)
Or more concisely:
df <- do.call(rbind, lapply(list.files(), read.csv))
I do this, which is more R like without the for loop:
## assuming you have a folder full of .csv's to merge
filenames <- list.files()
all_files <- Reduce(rbind, lapply(filenames, read.csv))
If I understand correctly what you want to do then this is all you need:
namelist <- list.files()
singlevar <- c()
for (i in id) {
  singlevar <- rbind(singlevar, read.csv(namelist[i], header = TRUE))
}
Since in the end you want one single object to contain all the partial information from the single files, rbind as you go.
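Wrapped up as the function the question asks for, a sketch (readFiles is a hypothetical name, and the temp-folder demo files are made up):

```r
# Hypothetical demo: write file1.csv .. file3.csv to a temp folder
tmp <- tempdir()
for (i in 1:3) {
  write.csv(data.frame(v = i), file.path(tmp, sprintf("file%d.csv", i)),
            row.names = FALSE)
}

# Read only the files selected by the integer vector id, then row-combine
readFiles <- function(id, path = tmp) {
  do.call(rbind, lapply(file.path(path, sprintf("file%d.csv", id)), read.csv))
}

combined <- readFiles(1:2)
```

Because the files share a column structure, rbind stacks them into one data frame with no renaming needed.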

applying same function on multiple files in R

I am new to R and currently working on a set of financial data. I have around 10 csv files in my working directory, and I want to analyze one of them and then apply the same commands to the rest of the csv files.
Here are all the names of these files: ("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")
For example, because the Date column in CSV files are Factor so I need to change them to Date format:
CAN <- read.csv("CAN%10y.csv", header = T, sep = ",")
CAN$Date <- as.character(CAN$Date)
CAN$Date <- as.Date(CAN$Date, format ="%m/%d/%y")
CAN_merge <- merge(all.dates.frame, CAN, all = T)
CAN_merge$Bid.Yield.To.Maturity <- NULL
all.dates.frame is a data frame of 731 consecutive days. I want to merge them so that each file will have the same number of rows which later enables me to combine 10 files together to get a 731 X 11 master data frame.
Surely I can copy and paste this code and change the file name, but is there any simple approach to use apply or for loop to do that ???
Thank you very much for your help.
This should do the trick. Leave a comment if a certain part doesn't work. Wrote this blind without testing.
Get a list of the files in your current directory whose names end in .csv:
L <- list.files(".", "\\.csv$")
Loop through each of the names, read in each file, perform the actions you want, return the data.frame DF_Merge, and store the results in a list:
O = lapply(L, function(x) {
  DF <- read.csv(x, header = T, sep = ",")
  DF$Date <- as.character(DF$Date)
  DF$Date <- as.Date(DF$Date, format = "%m/%d/%y")
  DF_Merge <- merge(all.dates.frame, DF, all = T)
  DF_Merge$Bid.Yield.To.Maturity <- NULL
  return(DF_Merge)})
Bind all the DF_Merge data.frames into one big data.frame
do.call(rbind, O)
I'm guessing you need some kind of indicator of which file each row came from, so this may be useful: create an indicator column based on the first 3 characters of your file name with rep(substring(L, 1, 3), each = 731).
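To make that concrete, a small self-contained sketch (the temp-folder files and the country column name are made up; Map is used instead of the fixed rep(..., each = 731) so it also works when files have different row counts):

```r
# Hypothetical demo files in a temp folder
tmp <- tempdir()
write.csv(data.frame(yield = c(1.1, 1.2)), file.path(tmp, "CAN%10y.csv"),
          row.names = FALSE)
write.csv(data.frame(yield = c(2.1, 2.2)), file.path(tmp, "AUS%10y.csv"),
          row.names = FALSE)

L <- list.files(tmp, "10y\\.csv$")
O <- lapply(L, function(x) read.csv(file.path(tmp, x)))

# Tag each data.frame with the first 3 characters of its file name
O <- Map(function(df, nm) { df$country <- substring(nm, 1, 3); df }, O, L)
combined <- do.call(rbind, O)
```

Each row of combined then records which source file it came from.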
A dplyr solution (though untested since no reproducible example given):
library(dplyr)
file_list <- c("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")
can_l <- lapply(
file_list
, read.csv
)
can_l <- lapply(
can_l
, function(df) {
df %>% mutate(Date = as.Date(as.character(Date), format ="%m/%d/%y"))
}
)
# Rows do need to match when column-binding
can_merge <- left_join(
all.dates.frame
, bind_cols(can_l)
)
can_merge <- can_merge %>%
select(-Bid.Yield.To.Maturity)
One possible solution would be to read all the files into R in the form of a list, and then use lapply to apply a function to all data files. For example:
# Create vector of file names in working directory
files <- list.files()
files <- files[grep("csv", files)]
# Create empty list
lst <- vector("list", length(files))
# Read files into list
for (i in 1:length(files)) {
  lst[[i]] <- read.csv(files[i])
}
#Apply a function to the list
l <- lapply(lst, function(x) {
  x$Date <- as.Date(as.character(x$Date), format = "%m/%d/%y")
  return(x)
})
Hope it's helpful.
