How to change column names of many dataframes in R?

I would like to make the same changes to the column names of many dataframes. Here's an example:
library(stringr)  # provides str_replace_all()
ChangeNames <- function(x) {
  colnames(x) <- toupper(colnames(x))
  colnames(x) <- str_replace_all(colnames(x), pattern = "_", replacement = ".")
  return(x)
}
files <- list(mtcars, nycflights13::flights, nycflights13::airports)
lapply(files, ChangeNames)
I know that lapply only changes a copy. How do I change the underlying dataframe? I want to still use each dataframe separately.

Create a named list, apply the function and use list2env to reflect those changes in the original dataframes.
library(nycflights13)
files <- dplyr::lst(mtcars, flights, airports)
result <- lapply(files, ChangeNames)
list2env(result, .GlobalEnv)
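To see the named-list plus list2env pattern without the nycflights13 data, here is a minimal base-R sketch. The data frames df_a and df_b, the helper change_names and the environment target are made up for illustration, and gsub stands in for str_replace_all so that no package is needed.

```r
# Hypothetical stand-ins for the real data frames
df_a <- data.frame(first_col = 1:2, second_col = 3:4)
df_b <- data.frame(some_name = c("x", "y"))

# Base-R variant of ChangeNames (gsub instead of stringr)
change_names <- function(x) {
  colnames(x) <- gsub("_", ".", toupper(colnames(x)), fixed = TRUE)
  x
}

# The list must be named: list2env() uses the names as object names
dfs <- list(df_a = df_a, df_b = df_b)
renamed <- lapply(dfs, change_names)

# Written into a separate environment here; pass .GlobalEnv instead
# to overwrite the original objects
target <- new.env()
invisible(list2env(renamed, envir = target))

names(target$df_a)
names(target$df_b)
```

Writing to a separate environment first is just a cautious choice for the demo, since list2env silently replaces any existing object that has the same name.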

Related

R use a character vector holding names of data.frames for bind_rows

I am facing a problem that I thought would be straightforward to solve but turned out to be far above my horizon.
I guess I have a misconception stuck in my brain.
I have some data.frames which I imported from files. All of which have the exact same columns with the same names. Since they are quite many I wanted to automate the process of combining them into one data.frame using bind_rows.
files <- list.files(path = "/home/username/Documents/", pattern = ".txt")
batch.import <- function(filename) {
name <- unlist(strsplit(filename,"\\."))[1] # get rid of .txt
df <- read_tsv(filename)
colnames(df) <- c("name1", "name2", "name3", "name4")
assign(name, df, envir = .GlobalEnv)
}
map(files,batch.import)
dataframes <- unlist(strsplit(files,"\\."))[seq(1,length(unlist(strsplit(files,"\\."))),2)] # This produces a chr vector with all the data.frames I want to merge
First thing I obviously tried was:
combinedData <- bind_rows(dataframes)
That would have been too easy, I agree. Since it is a chr vector, I understand that this doesn't refer to the data.frames themselves but just tries to do something with the text.
So I tried to use combinedData <- bind_rows(paste(dataframes)) which I thought could have done the job. But it wouldn't combine the data.frames either.
So I tried something more sophisticated, like a for loop (I also tried map() here, but unfortunately I don't remember exactly how):
for (df in dataframes) {
  if (exists("combinedData")) {
    combinedData <- bind_rows(combinedData, .data[[df]]) # Here I think is the error (if not already before) I also tried {{}}
  } else {
    combinedData <- .data[[df]]
  }
}
So from what I was reading until now I have to do something with {{}} or .data[[]] but this concept still didn't make it through to my synapses.
Any suggestions how I can use my chr-vector of data.frame names to combine the respective data.frames?
Thank you very much!
Michael
What you can use instead is foreach. Here is some pseudocode:
library(foreach)
library(dplyr)
library(readr)  # for read_tsv()
files <- list.files(path = "/home/username/Documents/", pattern = "\\.txt$", full.names = TRUE)
# foreach returns a list of data frames, which you can combine later using bind_rows
list_df <- foreach(i_file = files) %do% {
  df <- read_tsv(i_file)
  colnames(df) <- c("name1", "name2", "name3", "name4")
  df
}
combine_df <- bind_rows(list_df)
If you want a named list of the imported data, name each element after its file (without the directory and extension):
files_name_no_ext <- gsub(pattern = "\\.txt$", replacement = "", basename(files))
names(list_df) <- files_name_no_ext
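If the data frames already exist in the workspace and you only have their names as a character vector, base R's mget() can turn those names into a list of the objects. A minimal sketch follows; the data frames run1 and run2 are invented for illustration.

```r
# Hypothetical data frames standing in for the imported files
run1 <- data.frame(name1 = 1:2, name2 = c("a", "b"))
run2 <- data.frame(name1 = 3:4, name2 = c("c", "d"))

# Character vector holding the names of the data frames
dataframes <- c("run1", "run2")

# mget() looks the names up and returns a named list of the objects
df_list <- mget(dataframes)

# Base-R row binding; dplyr::bind_rows(df_list) accepts the same list
combinedData <- do.call(rbind, df_list)
rownames(combinedData) <- NULL
combinedData
```

The .data pronoun and {{ }} only work inside dplyr verbs to refer to columns of a data frame, so they cannot look up objects by name; that is why the loop in the question fails.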

Using lapply variable in read.csv

I'm just getting used to using lapply and I've been trying to figure out how I can use names from a vector to append within filenames I am calling, and naming new dataframes. I understand I can use paste to call the files of interest, but not sure I can create the new dataframes with the _var name appended.
site_list <- c("site_a","site_b", "site_c","site_d")
lapply(site_list,
function(var) {
all_var <- read.csv(paste("I:/Results/",var,"_all.csv"))
tbl_var <- read.csv(paste("I:/Results/",var,"_tbl.csv"))
rsid_var <- read.csv(paste("I:/Results/",var,"_rsid.csv"))
return(var)
})
With lapply, it usually makes more sense to apply a function to each list element and return a list, in which your results are stored and can be named. Example (edited to use split so that each site's files are processed together):
files <- list.files(path= "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x){
all <- read.csv(x[grep("_all.csv", x)])
rsid <- read.csv(x[grep("_rsid.csv", x)])
tbl <- read.csv(x[grep("_tbl.csv", x)])
# do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
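One detail worth checking in the question's code is how the file names are built: paste() separates its arguments with a space by default, so the resulting paths would not match the files on disk. A small sketch of the difference, reusing site_list and the I:/Results/ prefix from the question (all_paths is a name introduced here):

```r
site_list <- c("site_a", "site_b", "site_c", "site_d")

# paste() inserts a space between the pieces by default
paste("I:/Results/", "site_a", "_all.csv")

# paste0() concatenates without a separator
paste0("I:/Results/", "site_a", "_all.csv")

# One path per site, named after the site, so that
# lapply(all_paths, read.csv) would return a list named by site
all_paths <- setNames(paste0("I:/Results/", site_list, "_all.csv"), site_list)
all_paths
```

Because lapply keeps the names of its input, the named vector gives you res$site_a, res$site_b and so on instead of separately named dataframes.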

A list of dataframes where column variable type to be changed from character to numeric in R

I have created a nested list of dataframes through read.csv and lapply function. This nested list of data frames contains the first column as product and rest 239 columns for data on various countries.
All the numbers are in character format which I wish to convert into numeric form for each dataframe in the list.
I have used the following code. But it removes the product column[1] from each dataframe and displays only [2:240] rest of the columns. How to prevent the product column from getting removed?
files <- list.files(path = "D:\\R34\\casia3\\data_kaz\\export\\", pattern = "*.csv")
myfiles <- lapply(files, function(x) {
df <- read.csv(x, strip.white = T, stringsAsFactors = F, sep = ",")
df$ID <- as.character(x)
return(df)
})
myfiles <- lapply(myfiles, function(x) lapply(x[2:240], as.numeric))
We can use type.convert to automatically convert the class of each column:
myfiles <- lapply(myfiles, function(x) {
  x[] <- lapply(x, type.convert, as.is = TRUE)
  x
})
Try doing
myfiles <- lapply(myfiles, function(x) {
  x[2:240] <- lapply(x[2:240], as.numeric)
  x
})
Since your code applies as.numeric to columns 2:240 only and returns that result, only those columns come back. Instead, we can assign the converted columns back into x[2:240] and return the entire dataframe from the function passed to the outer lapply.
If interested you might also consider this tidyverse alternative
library(tidyverse)
myfiles <- map(myfiles,. %>% mutate_at(2:240, as.numeric))
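The difference between returning the inner lapply result and assigning it back can be seen on a tiny example. This is only a sketch: the table below, with its two country columns KAZ and UZB, is made up, and columns 2:3 play the role of 2:240.

```r
# Hypothetical two-country table with numbers stored as text
df <- data.frame(product = c("wheat", "rice"),
                 KAZ = c("1.5", "2"),
                 UZB = c("10", "20"),
                 stringsAsFactors = FALSE)
myfiles <- list(df, df)

# Returning lapply() directly gives a plain list of the converted columns only
cols_only <- lapply(df[2:3], as.numeric)

# Assigning into x[2:3] converts in place and keeps the product column
converted <- lapply(myfiles, function(x) {
  x[2:3] <- lapply(x[2:3], as.numeric)
  x
})
str(converted[[1]])
```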

Subset multiple dataframes in a loop in R

I am trying to drop columns from over 20 data frames that I have imported. However, I'm getting errors when I try to iterate through all of these files. I'm able to drop when I hard code the individual file name, but as soon as I try to loop through all of the files, I have errors. Here's the code:
path <- "C://Home/Data/"
files <- list.files(path=path, pattern="^.file*\\.csv$")
for(i in 1:length(files))
{
perpos <- which(strsplit(files[i], "")[[1]]==".")
assign(
gsub(" ","",substr(files[i], 1, perpos-1)),
read.csv(paste(path,files[i],sep="")))
}
mycols <- c("test", "trialruns", "practice")
`file01` = `file01`[,!(names(`file01`) %in% mycols)]
So, the above will work and drop those three columns from file01. However, I can't iterate through files02 to files20 and drop the columns from all of them. Any ideas? Thank you so much!
As @zx8754 mentions, consider lapply() to keep all dataframes in one list instead of as multiple objects in your environment (the code below also shows how to output individual dfs from the list):
path <- "C://Home/Data/"
files <- list.files(path=path, pattern="^.file*\\.csv$")
mycols <- c("test", "trialruns", "practice")
# READ IN ALL FILES AND DROP THE UNWANTED COLUMNS
dfList <- lapply(files, function(f) {
  df <- read.csv(paste0(path, f))
  df[, !(names(df) %in% mycols), drop = FALSE]
})
# SET NAMES TO EACH DF ELEMENT
dfList <- setNames(dfList, gsub("\\.csv$", "", files))
# IN CASE YOU REALLY NEED INDIVIDUAL DFs
list2env(dfList, envir=.GlobalEnv)
# IN CASE YOU NEED TO APPEND ALL DFs
finaldf <- do.call(rbind, dfList)
# TO RETRIEVE FIRST DF
dfList[[1]] # OR dfList$file01

applying same function on multiple files in R

I am new to R program and currently working on a set of financial data. Now I got around 10 csv files under my working directory and I want to analyze one of them and apply the same command to the rest of csv files.
Here are all the names of these files: ("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")
For example, because the Date column in CSV files are Factor so I need to change them to Date format:
CAN <- read.csv("CAN%10y.csv", header = T, sep = ",")
CAN$Date <- as.character(CAN$Date)
CAN$Date <- as.Date(CAN$Date, format ="%m/%d/%y")
CAN_merge <- merge(all.dates.frame, CAN, all = T)
CAN_merge$Bid.Yield.To.Maturity <- NULL
all.dates.frame is a data frame of 731 consecutive days. I want to merge them so that each file will have the same number of rows which later enables me to combine 10 files together to get a 731 X 11 master data frame.
Surely I can copy and paste this code and change the file name, but is there any simple approach to use apply or for loop to do that ???
Thank you very much for your help.
This should do the trick. Leave a comment if a certain part doesn't work. Wrote this blind without testing.
Get a list of the files in your current directory whose names end in .csv:
L = list.files(".", "\\.csv$")
Loop through each name, read in the file, perform the actions you want, return the data.frame DF_Merge, and store the results in a list.
O = lapply(L, function(x) {
  DF <- read.csv(x, header = T, sep = ",")
  DF$Date <- as.character(DF$Date)
  DF$Date <- as.Date(DF$Date, format ="%m/%d/%y")
  DF_Merge <- merge(all.dates.frame, DF, all = T)
  DF_Merge$Bid.Yield.To.Maturity <- NULL
  return(DF_Merge)
})
Bind all the DF_Merge data.frames into one big data.frame
do.call(rbind, O)
I'm guessing you need some kind of indicator, so this may be useful: create an indicator column based on the first 3 characters of each file name with rep(substring(L, 1, 3), each = 731)
A dplyr solution (though untested since no reproducible example given):
library(dplyr)
file_list <- c("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")
can_l <- lapply(
file_list
, read.csv
)
can_l <- lapply(
can_l
, function(df) {
df %>% mutate(Date = as.Date(as.character(Date), format ="%m/%d/%y"))
}
)
# Rows do need to match when column-binding
can_merge <- left_join(
all.dates.frame
, bind_cols(can_l)
)
can_merge <- can_merge %>%
select(-Bid.Yield.To.Maturity)
One possible solution would be to read all the files into R in the form of a list, and then use lapply to apply a function to all data files. For example:
# Create vector of csv file names in working directory
files <- list.files(pattern = "\\.csv$")
# Create empty list
lst <- vector("list", length(files))
# Read files into list
for (i in seq_along(files)) {
  lst[[i]] <- read.csv(files[i])
}
#Apply a function to the list
l <- lapply(lst, function(x) {
x$Date <- as.Date(as.character(x$Date), format = "%m/%d/%y")
return(x)
})
Hope it's helpful.
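For the final step in the question (one master data frame with a Date column plus one column per country), a possible approach is to rename each value column after its country and then merge the padded frames on Date with Reduce. The sketch below uses made-up data: the five-day all.dates.frame, the two small tables in raw and the column name Yield are assumptions, not the real files.

```r
# Hypothetical stand-ins: a frame of consecutive dates and two small yield tables
all.dates.frame <- data.frame(
  Date = seq(as.Date("2015-01-01"), by = "day", length.out = 5)
)
raw <- list(
  CAN = data.frame(Date = c("01/01/15", "01/03/15"), Yield = c(1.5, 1.6)),
  US  = data.frame(Date = c("01/02/15", "01/03/15"), Yield = c(2.1, 2.2))
)

# Convert dates, rename the value column after the country, pad to all dates
prepared <- lapply(names(raw), function(nm) {
  df <- raw[[nm]]
  df$Date <- as.Date(as.character(df$Date), format = "%m/%d/%y")
  names(df)[names(df) == "Yield"] <- nm
  merge(all.dates.frame, df, all.x = TRUE)
})

# Merge the padded frames on Date: one row per day, one column per country
master <- Reduce(function(a, b) merge(a, b, by = "Date"), prepared)
dim(master)
```

With the real data this would give the 731 x 11 layout described in the question, provided every file has exactly one value column left after dropping the unwanted ones.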
