I'm trying to rename multiple variables that show up in a few different files I'm working with. In this example I'll just show one renaming rule. Here's the code:
renaming <- function(dataset){
names(dataset)[names(dataset)=="Lookup Code...3"]<-"Recipient Code"
.
.
.
}
data <- read_excel("File.xlsx",sheet = "Sheet name")
renaming(data)
In the above example I am passing in one dataset, but the variable is not being renamed. I'm new to writing functions in R, so my syntax may be off somewhere.
Once that problem is resolved, I would like to be able to pass a list of datasets into this function, using a for loop that would look something like this:
dataset_list <- c("Data","Data_1",...)
for(i in 1:length(dataset_list)){
renaming(dataset_list[i])
}
I made an attempt at a for loop like this, but the datasets don't seem to get picked up and passed into the function.
I'd appreciate any help, and if you need clarification on this, please ask.
You can try -
renaming <- function(dataset){
names(dataset)[names(dataset)=="Lookup Code...3"]<-"Recipient Code"
#Some other code
#Some more code
#Return the changed dataset
dataset
}
#Get all the filenames in a vector
filenames <- list.files(pattern = '\\.xlsx$')
#apply the function to each file
list_data <- lapply(filenames, function(x) {
renaming(readxl::read_excel(x))
})
list_data would be a list of dataframes, where each dataframe should have the changed column name and the other code in the renaming function applied. You can access individual dataframes using list_data[[1]], list_data[[2]], etc.
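If it is useful to know which dataframe came from which file, one small addition (a sketch, reusing the filenames vector above) is to name the list elements after the files:
# name each list element after its source file (minus the extension)
names(list_data) <- tools::file_path_sans_ext(basename(filenames))
# individual dataframes can then also be accessed by name, e.g. list_data[["File"]]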
I need to read several csv files from a directory and save each one in a separate dataframe.
The filenames are in a character vector:
lcl_forecast_data_files <- dir(lcl_forecast_data_path, pattern=glob2rx("*.csv"), full.names=TRUE)
For example: "fruc2021.csv", "gem2020.csv", "strb2021.csv".
So far I am reading the files step by step:
fruc2021 <- read_csv2("fruc2021.csv")
gem2020 <- read_csv2("gem2020.csv")
strb2021 <- read_csv2("strb2021.csv")
But there are many more files in the directory and subdirectories. To read them all one by one is very tedious.
Now I have already experimented a little with the map function, but I have not yet figured out how to automatically generate the names of the dataframes from the file names.
A first simple try was:
lcl_forecast_data <- lcl_forecast_data_files %>%
map(
function(x) {
str_replace(basename(x), ".csv","") <- read_csv2(x)
}
)
But this did not work :-(
Is it even possible to generate names for dataframes like this?
Or are there other, simpler possibilities?
Greetings
Benne
If you do not want to use a list and lapply as @Onyambu suggested, you can use assign() to generate the dataframes.
filenames <- c("fruc2021.csv", "gem2020.csv", "strb2021.csv")
for (i in filenames) {
assign(gsub("\\.csv$", "", i), read.csv(i))
}
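Note that if the filenames come from dir(..., full.names = TRUE) as in your question, each entry includes the directory path, so you would want to strip that off with basename() before building the object name. A sketch along the same lines, assuming lcl_forecast_data_files from your question and read_csv2() from readr:
for (i in lcl_forecast_data_files) {
# drop the path and the .csv extension to get the object name, e.g. "fruc2021"
assign(gsub("\\.csv$", "", basename(i)), read_csv2(i))
}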
I have a question about generalizing some code into a function in R. Below is the code I want to generalize:
#file name information
years <-c("_1999.XPT","_2003.XPT","_2005.XPT","_2007.XPT","_2009.XPT","_2011.XPT","_2013.XPT","_2015.XPT")
#create initial frame
assign("diabetes", get(paste0("diabetes",years[1])))
#binding rest of frames
for(i in 2:length(years))
{
update_frame <- bind_rows(get("diabetes"),get(paste0("diabetes",years[i])))
assign("diabetes", update_frame)
}
The basic idea is that I want to do a vertical join (bind_rows) of multiple year files into a single dataframe.
My attempted solution to this looks something like this:
big_bind <- function(name)
{
#create initial frame
assign(name, get(paste0(name,years[1])))
#binding rest of frames
for(i in 2:length(years))
{
update_frame <- bind_rows(get(name),get(paste0(name,years[i])))
assign(name, update_frame)
}
}
big_bind("diabetes")
The solution above doesn't work, which leaves me stumped, because it works if I swap out the name variable for "diabetes". To be a little more specific, the code runs without errors but doesn't do anything. I think it has something to do with how R defines variables inside functions. Does anybody see what I'm missing, or have a solution?
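For what it's worth, a minimal sketch of what is likely going on (assuming the per-year frames such as diabetes_1999.XPT already exist and dplyr is loaded, as in the original code): assign() inside a function writes to the function's local environment by default, so the combined frame never reaches the global workspace, while get() still finds the per-year frames through normal scoping, which is why no error appears. Returning the result and assigning it at the call site sidesteps the issue:
big_bind <- function(name) {
# start from the first year's frame
combined <- get(paste0(name, years[1]))
# bind the remaining years onto it
for (i in 2:length(years)) {
combined <- bind_rows(combined, get(paste0(name, years[i])))
}
combined  # return the result instead of assign()-ing it locally
}
diabetes <- big_bind("diabetes")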
I am trying to write multiple dataframes to multiple .csv files dynamically. I have found online how to do the latter part, but not the former (dynamically define the dataframe).
# create separate dataframes from each 12 month interval of closed age
for (i in 1:max_age) {assign(paste("closed",i*12,sep=""),
mc_masterc[mc_masterc[,7]==i*12,])
write.csv(paste("closed",i*12,sep=""),paste("closed",i*12,".csv",sep=""),
row.names=FALSE)
}
In the code above, the problem is with the first part of the write.csv statement. It will create the .csv file dynamically, but not with the actual content from the table I am trying to specify. What should the first argument of the write.csv statement be? Thank you.
The first argument of write.csv needs to be an R object, not a string. If you don't need the objects in memory you can do it like so:
for (i in 1:max_age) {
df <- mc_masterc[mc_masterc[,7]==i*12,]
write.csv(df,paste("closed",i*12,".csv",sep=""),
row.names=FALSE)
}
and if you need them in memory, you can either do that separately or use get to return an object based on a string. Separately:
for (i in 1:max_age) {
df <- mc_masterc[mc_masterc[,7]==i*12,]
assign(paste("closed",i*12,sep=""),df)
write.csv(df,paste("closed",i*12,".csv",sep=""),
row.names=FALSE)
}
With get:
for (i in 1:max_age) {
assign(paste("closed",i*12,sep=""), mc_masterc[mc_masterc[,7]==i*12,])
write.csv(get(paste("closed",i*12,sep="")),paste("closed",i*12,".csv",sep=""),
row.names=FALSE)
}
I'm trying to build up a function that can import/read several data tables in .csv files, and then compute statistics on the selected files.
Each of the 332 .csv files contains a table with the same column names: Date, Pollutant and id. There are a lot of missing values.
This is the function I wrote so far, to compute the mean of values for a pollutant:
pollutantmean <- function(directory, pollutant, id = 1:332) {
library(dplyr)
setwd(directory)
good<-c()
for (i in (id)){
task1<-read.csv(sprintf("%03d.csv",i))
}
p<-select(task1, pollutant)
good<-c(good,complete.cases(p))
mean(p[good,])
}
The problem I have is that each time the loop runs, a new file is read and the data already read are replaced by the data from the new file.
So I end up with a function that works perfectly fine with a single file, but not when I want to select multiple files:
e.g. if I ask for id = 10:20, I end up with the mean calculated only on file 20.
How could I change the code so that I can select multiple files?
Thank you!
My answer offers a way of doing what you want to do (if I understood everything correctly) without using a loop. My two assumptions are: (1) you have 332 *.csv files with the same header (column names), so all files have the same structure, and (2) you can combine your tables into one big data frame.
If these two assumptions are correct, I would use a vector of your file names to import the files as data frames (so this answer does not contain a loop!).
# This creates a character vector with the names of your files. You have to provide the path to the folder they are saved in.
file_list <- list.files(path = [the path to the folder where your *.csv files are saved], full.names = TRUE)
# This will create a list of data frames.
mylist <- lapply(file_list, read.csv)
# This will 'row-bind' the data frames in the list into one big data frame.
# rbindlist() comes from the data.table package, so it needs to be loaded.
library(data.table)
mydata <- rbindlist(mylist)
# Now you can perform your calculation on this big data frame, using your column information to filter or subset to get information of just a subset of this table (if necessary).
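For example, the mean of the pollutant column across all files could then be computed roughly like this (a sketch; 'Pollutant' is the column name from your description):
# mean of the pollutant values across all 332 files, ignoring missing values
mean(mydata[["Pollutant"]], na.rm = TRUE)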
I hope this helps.
Maybe something like this?
library(dplyr)
pollutantmean <- function(directory, pollutant, id = 1:332) {
od <- setwd(directory)
on.exit(setwd(od))
task_list <- lapply(sprintf("%03d.csv", id), read.csv)
p_list <- lapply(task_list, function(x) select(x, all_of(pollutant))[[1]])
mean(unlist(p_list), na.rm = TRUE)
}
Notes:
- Put all your library() calls at the beginning of your scripts; they will be much easier to read. Never put them inside a function.
- Setting the working directory inside a function is also a bad idea: when the function returns, that change would normally still be in effect and you might get lost. The better way is to set working directories outside functions, but since you've set it inside the function, I've adapted the code with on.exit() so the original directory is restored.
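If you would rather avoid touching the working directory at all, building the paths with file.path() is another option. A sketch of the same idea, using base extraction so it does not depend on dplyr:
pollutantmean <- function(directory, pollutant, id = 1:332) {
files <- file.path(directory, sprintf("%03d.csv", id))
task_list <- lapply(files, read.csv)
# pull out the pollutant column from each file and pool the values
p_list <- lapply(task_list, function(x) x[[pollutant]])
mean(unlist(p_list), na.rm = TRUE)
}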
I am trying to write a program to open a large number of files and run them through a function I made called "sort". Every one of my file names starts with "sa1", but after that the characters vary by file. I was hoping to do something along the lines of this:
for(x in c("Put","Characters","which","Vary","by","File","here")){
sa1+x <- read.csv("filepath/sa1+x",header= FALSE)
sa1+x=sort(sa1+x)
return(sa1+x)
}
In this case, say that x was 88. It would open the file sa188, name that dataframe sa188, and then run it through the function sort. I don't think that writing sa1+x is the correct way to join the two values together, but I don't know another way.
You need to use a list to contain the data in each csv file, and loop over the filenames using paste0.
file_suffixes <- c("put","characters","which","vary","by","file","here")
numfiles <- length(file_suffixes)
list_data <- list()
sorted_data <- list()
filename <- "filepath/sa1"
for (x in 1:numfiles) {
list_data[[x]] <- read.csv(paste0(filename, file_suffixes[x]), header=FALSE)
sorted_data[[x]] <- sort(list_data[[x]])
}
I am not sure why you use return in that loop. If you're writing a function, you should be returning the sorted_data list which contains all your post-sorting data.
Note: you shouldn't call your function sort because there is already a base R function called sort.
Additional note: you can use dir() and regex parsing to find all the files which start with "sa1" and loop over all of them, thus freeing you from having to specify the file_suffixes.
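For instance, something along these lines could pick up every matching file automatically (a sketch, assuming the files live in "filepath/" and the sorting function has been renamed to something like sort_data, a hypothetical name):
# find every file in the folder whose name starts with "sa1"
files <- list.files(path = "filepath", pattern = "^sa1", full.names = TRUE)
sorted_data <- lapply(files, function(f) sort_data(read.csv(f, header = FALSE)))
names(sorted_data) <- basename(files)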