Writing a csv file in R with parameter in the file name - r

I am doing a small log-processing project in R. I am trying to write a function that takes a data frame and writes it to a CSV file whose name includes some parameters (the data frame's name, today's date, etc.).
I have made some progress but haven't managed to write the CSV. I hope the code is reproducible.
library(dplyr)
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- paste0('"',"./logs/",dfname, "_", Sys.Date(),'.csv"')
dfpath <- as.data.frame(dfpath)
df %>% write_excel_csv(dfpath)
}
wrt_csv(mtcars)
EDIT- this is a final version that works well. Thanks to Ronak Shah.
wd<- getwd()
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- paste0(wd,'/logs/',dfname, '_', Sys.Date(),'.csv')
df %>% write_excel_csv(dfpath)
}
However, I now have a bunch of data frames that I want to run the function on. Should I put them in a list? This didn't quite work:
l <- list(df1,df2)
lapply(l , wrt_csv)
Any thoughts?
Thanks!

Keep dfpath as a string. Try:
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- paste0('/logs/',dfname, '_', Sys.Date(),'.csv')
write.csv(df, dfpath, row.names = FALSE)
#Or same as OP
#df %>% write_excel_csv(dfpath)
}
wrt_csv(mtcars)

We can also do
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- sprintf('/logs/%s_%s.csv', dfname, Sys.Date())
write.csv(df, dfpath, row.names = FALSE)
}
wrt_csv(mtcars)
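
For the follow-up about running the function over several data frames: inside lapply, deparse(substitute(df)) no longer sees the original variable name (it sees the expression lapply uses internally), so the file name comes out wrong. One possible workaround, sketched below, is to pass the name explicitly and iterate over a named list. The function name write_named_csv and the use of a "logs" folder under tempdir() are assumptions made for this illustration, not part of the original code.

```r
# Sketch: write each data frame of a named list to its own dated CSV file.
# `write_named_csv` is a hypothetical helper; the name is passed in explicitly
# instead of being recovered with deparse(substitute()).
write_named_csv <- function(df, dfname, dir = file.path(tempdir(), "logs")) {
  # Create the target folder if it does not exist yet
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dfpath <- file.path(dir, paste0(dfname, "_", Sys.Date(), ".csv"))
  write.csv(df, dfpath, row.names = FALSE)
  dfpath  # return the path so the caller can check what was written
}

# The list must be named; the names become part of the file names
dfs <- list(mtcars = mtcars, iris = iris)
paths <- Map(write_named_csv, dfs, names(dfs))
```

Map walks over the data frames and their names in parallel, so each call receives both the data and the label to use in the file name.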

Related

Write a function to manipulate and then write a dataframe

I would like to read in multiple .csv files (dataframes) from a folder and apply a function that I create to all the files. And finally this function will write the new .csv files.
I want the function to do the following 3 things
df$Class <- gsub("null", "OTHER", df$Class)
df$Class <- gsub(': ', ',', df$Class)
df <- df %>% select(c(Image, everything(.), -Name))
I don't really know how to put these things into a function, but I've tried:
file_names <- list.files(pattern = "\\.csv$")
tidy_up_fxn <- function(file_names) {
df <- do.call(bind_rows,lapply(file_names,data.table::fread))
df$Class <- gsub("null", "OTHER", df$Class)
df$Class <- gsub(': ', ',', df$Class)
df <- df %>% select(c(Image, everything(.), -Name))
out <- function(df)
fwrite(out, file = file_names, sep = ",")
}
tidy_up_fxn(file_names)
When I run it, R gets busy for a few seconds and then nothing happens. Please help me correct my function!
The following works the way I intended:
file_names <- list.files(pattern = "\\.csv$")
tidy_up_fxn <- function(file_names) {
df <- bind_rows(lapply(file_names,data.table::fread))
df$Class <- gsub("null", "OTHER", df$Class)
df$Class <- gsub(': ', ',', df$Class)
df <- df %>% select(c(Image, everything(.), -Name))
fwrite(df, file = "new.csv", sep = ",")
}
tidy_up_fxn(file_names)
Thank you all!!
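
The working version above combines every input into a single new.csv. If one cleaned output file per input file is wanted instead (as the opening of the question suggests), a base R sketch could look like the following. The helper name tidy_one_file, the "tidy_" prefix, and the sample data are assumptions made for the example.

```r
# Sketch: clean one CSV and write the result to a separate output folder.
# `tidy_one_file` is a hypothetical helper using base R only.
tidy_one_file <- function(in_path, out_dir) {
  df <- read.csv(in_path, stringsAsFactors = FALSE)
  df$Class <- gsub("null", "OTHER", df$Class)
  df$Class <- gsub(": ", ",", df$Class)
  # Put Image first and drop Name, keeping all other columns in order
  df <- df[, c("Image", setdiff(names(df), c("Image", "Name"))), drop = FALSE]
  out_path <- file.path(out_dir, paste0("tidy_", basename(in_path)))
  write.csv(df, out_path, row.names = FALSE)
  out_path
}

# Small made-up input so the sketch can be run as is
in_dir <- file.path(tempdir(), "tidy_in")
out_dir <- file.path(tempdir(), "tidy_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(Name = "a", Class = "null: x", Image = "img1.png"),
          file.path(in_dir, "sample.csv"), row.names = FALSE)

file_names <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
out_paths <- vapply(file_names, tidy_one_file, character(1), out_dir = out_dir)
```

Writing to a separate folder avoids overwriting the inputs and keeps the cleaned files from being picked up again on the next run.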

Multiple bind_rows with R Dplyr

I need to bind_rows 27 Excel files. Although I can do it manually, I want to do it with a loop. The problem with the loop is that it binds the first file to i and then the first file to i+1, so the result of iteration i is lost. How can I fix this?
nm <- list.files(path="sample/sample/sample/")
df <- data.frame()
for(i in 1:27){
my_data <- bind_rows(df, read_excel(path = nm[i]))
}
We could loop over the files with map, read the data with read_excel, and row-bind the results with the _dfr variant:
library(purrr)
my_data <- map_dfr(nm, read_excel)
In the OP's code, the issue is that each iteration assigns the result to a temporary dataset 'my_data' while 'df' stays empty; instead, each file should be bound to the original 'df' that was already created:
for(i in 1:27){
df <- rbind(df, read_excel(path = nm[i]))
}
We could use sapply
result <- sapply(files, read_excel, simplify=FALSE) %>%
bind_rows(.id = "id")
You can still use your for loop:
my_data<-vector('list', 27)
for(i in 1:27){
my_data[[i]] <- read_excel(path = nm[i])
}
do.call(rbind, my_data)
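
The difference between overwriting the result and accumulating it can be seen without any Excel files. The sketch below uses small in-memory data frames (made up for the illustration) in place of the read_excel calls, stores each piece in a list slot with [[i]], and binds once at the end.

```r
# Stand-ins for the data read from each file (made-up values)
pieces <- list(
  data.frame(id = 1:2, v = c("a", "b")),
  data.frame(id = 3L, v = "c")
)

# Pattern from the question: `df` never changes, so only the last piece survives
df <- data.frame()
for (i in seq_along(pieces)) {
  my_data <- rbind(df, pieces[[i]])
}

# Accumulating pattern: one list slot per piece, a single bind at the end
collected <- vector("list", length(pieces))
for (i in seq_along(pieces)) {
  collected[[i]] <- pieces[[i]]
}
combined <- do.call(rbind, collected)
```

Binding once at the end also avoids copying the growing data frame on every iteration.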

R parentheses problems and an alternative way of simplifying CSV concatenation

I'm new to R and not yet used to the syntax. I got the following error:
Error: unexpected '}' in "}"
so I know there is a problem with my parentheses or braces.
The problem is that I have been looking for an hour now and couldn't find any unmatched brackets.
While going through the code, it also seemed quite complicated for a solution that should be simple.
My intention is to search a directory full of CSV files and concatenate (row-wise) those that share the same file name. Is there already a function for this in R? Or is the following approach acceptable?
concated_CSV <- data.frame()
Data1 <- data.frame(n)
Data2 <- data.frame()
for (File in Filenames) {
if (Data1$n == 1) {
Data1 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data1 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data1 <- unlist(strsplit(File, "_"))[1]
}
else if (is.na(Data1$n)) {
Data2 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data2 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data2 <- unlist(strsplit(File, "_"))[1]
}
else if (Tendril_Nr_Data1 == Tendril_Nr_Data2) {
concated_CSV <- rbind(Data1, Data2)
new_Filename <- paste0(trg_dir, "/", Tendril_Nr_Data1, ".csv")
write.csv(concated_CSV, new_Filename, row.names=FALSE)
}
}
Thank you very much and
best wishes
Thanks for your answers. As you can see, I'm also new to Stack Overflow and have only been on the reading side so far.
Here is the code, which I tried to simplify so that you can use it.
"Filenames" represents the file names I'm dealing with.
#Stackoverflow example
Filenames <- c("6.1.3.1_1.CSV","6.1.3.1_2.CSV","6.4.3.1.CSV","6.1.2.1_1.CSV","6.1.2.1_2.CSV","6.1.5.CSV")
Filename_Data1 <- "6.1.3.1_1.CSV"
Filename_Data2 <- "6.1.3.1_2.CSV"
#record File for an Output
concated_CSV<- data.frame()
n <- 1
Data1 <- data.frame(n)
Data2<- data.frame()
for(File in Filenames){
if (Data1$n==1 ){
Data1 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data1 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data1 <- unlist(strsplit(Filename_Data1, "_"))[1]
} else if (Data1$n=!1){
Data2 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data2 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data2 <- unlist(strsplit(Filename_Data1, "_"))[1]
} else if (identical(Tendril_Nr_Data1, Tendril_Nr_Data2)){
concated_CSV <- rbind(Data1, Data2)
#tis is the name and directory to which the file should be saved in
#new_Filename <- paste0(trg_dir, "/",Tendril_Nr_Data1,".csv")
n_Filename <- "hello"
write.csv(concated_CSV,n_Filename, row.names = FALSE)
}
}
The missing-parenthesis error hasn't disappeared.
My intention is to write a program that compares the names of the CSV files in a given directory and, if a file name occurs twice, for example "abc_1.csv" and "abc_2.csv", concatenates the CSV data row-wise and saves it in a file named "abc.csv" (I hope this is clearer).
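
Two hedged notes on this. First, R's inequality operator is written !=, whereas the =! in the else if line of the example is read as an assignment of a negated value, which is not allowed inside an if condition and may well be the source of the parse error. Second, the pairwise bookkeeping can probably be avoided by grouping the files by their common stem. The sketch below assumes the stem is the file name without its extension and without a trailing _<number>; the function name concat_by_stem and the sample files are made up for the illustration.

```r
# Sketch: concatenate CSV files that share a stem (abc_1.csv, abc_2.csv -> abc.csv).
# `concat_by_stem` is a hypothetical helper using base R only.
concat_by_stem <- function(src_dir, trg_dir) {
  files <- list.files(src_dir, pattern = "\\.csv$", ignore.case = TRUE,
                      full.names = TRUE)
  # Stem = file name without extension and without a trailing "_<number>"
  stems <- sub("_[0-9]+$", "", tools::file_path_sans_ext(basename(files)))
  groups <- split(files, stems)
  groups <- groups[lengths(groups) > 1]  # only stems that occur more than once
  out <- character(0)
  for (stem in names(groups)) {
    parts <- lapply(groups[[stem]], read.csv, header = TRUE, sep = ";", dec = ",")
    out[[stem]] <- file.path(trg_dir, paste0(stem, ".csv"))
    write.csv(do.call(rbind, parts), out[[stem]], row.names = FALSE)
  }
  out  # named vector: stem -> path of the combined file
}

# Made-up sample files in the same ";" / "," format as in the question
src_dir <- file.path(tempdir(), "concat_src")
trg_dir <- file.path(tempdir(), "concat_out")
dir.create(src_dir, showWarnings = FALSE)
dir.create(trg_dir, showWarnings = FALSE)
write.csv2(data.frame(x = 1.5), file.path(src_dir, "abc_1.csv"), row.names = FALSE)
write.csv2(data.frame(x = 2.5), file.path(src_dir, "abc_2.csv"), row.names = FALSE)
write.csv2(data.frame(x = 9), file.path(src_dir, "solo.csv"), row.names = FALSE)

written <- concat_by_stem(src_dir, trg_dir)
```

Files whose stem occurs only once (solo.csv here) are left alone, matching the intention described in the question.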

R efficiently bind_rows over many dataframes stored on harddrive

I have roughly 50000 .rda files. Each contains a dataframe named results with exactly one row. I would like to append them all into one dataframe.
I tried the following, which works, but is slow:
root_dir <- paste(path, "models/", sep="")
files <- paste(root_dir, list.files(root_dir), sep="")
load(files[1])
results_table = results
rm(results)
for(i in c(2:length(files))) {
print(paste("We are at step ", i,sep=""))
load(files[i])
results_table= bind_rows(list(results_table, results))
rm(results)
}
Is there a more efficient way to do this?
Using .rds is a little bit easier. But if we are limited to .rda the following might be useful. I'm not certain if this is faster than what you have done:
library(purrr)
library(dplyr)
library(tidyr)
## make and write some sample data to .rda
x <- 1:10
fake_files <- function(x){
df <- tibble(x = x)
save(df, file = here::here(paste0(as.character(x),
".rda")))
return(NULL)
}
purrr::map(x,
~fake_files(x = .x))
## map and load the .rda files into a single tibble
load_rda <- function(file) {
foo <- load(file = file) # foo just provides the name of the objects loaded
return(df) # note df is the name of the rda returned object
}
rda_files <- tibble(files = list.files(path = here::here(""),
pattern = "*.rda",
full.names = TRUE)) %>%
mutate(data = pmap(., ~load_rda(file = .x))) %>%
unnest(data)
This is untested code but should be pretty efficient:
root_dir <- paste(path, "models/", sep="")
files <- paste(root_dir, list.files(root_dir), sep="")
data_list <- lapply(files, function(f) {
message("loading file: ", f)
name <- load(f) # this should capture the name of the loaded object
return(eval(parse(text = name))) # returns the object with the name saved in `name`
})
results_table <- data.table::rbindlist(data_list)
data.table::rbindlist is very similar to dplyr::bind_rows but a little faster.
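
A base R variant of the same idea, as a minimal sketch: load each file into its own environment, pull out the object called results, and bind all rows once at the end rather than inside the loop. The helper name load_results and the three sample files are assumptions made for the example; no timing claims are made here.

```r
# Sketch: read the `results` object from one .rda file without touching
# the global environment. `load_results` is a hypothetical helper.
load_results <- function(f) {
  env <- new.env()
  load(f, envir = env)
  env$results
}

# Made-up sample files, each holding a one-row data frame named `results`
rda_dir <- file.path(tempdir(), "models")
dir.create(rda_dir, showWarnings = FALSE)
for (i in 1:3) {
  results <- data.frame(model = i, score = i / 10)
  save(results, file = file.path(rda_dir, paste0("model_", i, ".rda")))
}
rm(results)

files <- list.files(rda_dir, pattern = "\\.rda$", full.names = TRUE)
# Collect everything in a list first, then bind a single time
results_table <- do.call(rbind, lapply(files, load_results))
```

With many thousands of files, the final do.call(rbind, ...) could be swapped for data.table::rbindlist or dplyr::bind_rows as mentioned above.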

Merge specific columns from csv files and use the filenames as headers

I would like to merge specific columns from two CSV files and use the file name as the column header. In this example, I want to merge the third column from each file into a single data frame. The CSV files have the same number of rows and columns.
Sample data sets:
File1.csv
V1,V2,V3,V4
1,1986,64,61
File2.csv
V1,V2,V3,V4
1,1990,100,61
Expected Result:
"File1","File2"
64,100
Here's my script:
my.file.list <- list.files(pattern = "*.csv")
my.list <- lapply(X = my.file.list, FUN = function(x) {
read.csv(x, header=TRUE,colClasses = c("NULL", "NULL", "numeric", "NULL"), sep = ",")[,1]
})
my.df <- do.call("cbind", my.list)
How do I add the column headers based on the file names?
I tried this:
sub('.csv','',basename(my.file.list),fixed=TRUE)
but I don't know how to add the result as headers.
I'd appreciate any help.
my.file.list <- list.files(pattern = "*.csv")
my.list <- list()
for (i in 1:length(my.file.list)) {
df <- read.csv(my.file.list[[i]], header=TRUE, sep=",")["V3"]
names(df) <- paste0("FILE", i)
my.list[[i]] <- df
}
my.df <- do.call("cbind", my.list)
@Tim Biegeleisen Many thanks for the help. I get the idea now. Here's an improved version of your answer that I can use for files with different file names.
my.file.list <- list.files(pattern = "*.csv")
my.list <- list()
for (i in 1:length(my.file.list)) {
df <- read.csv(my.file.list[[i]], header=TRUE, sep=",")["V3"]
names(df) <- sub('.csv','',basename(my.file.list[i]),fixed=TRUE)
my.list[[i]] <- df
}
my.df <- do.call("cbind", my.list)
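
The same result can also be sketched without an explicit loop: read the third column of each file into a list, name the list elements after the files, and turn the list into a data frame. The helper name merge_third_column is made up for this example, and the sample files reproduce File1.csv and File2.csv from the question inside a temporary folder.

```r
# Sketch: one column per file, named after the file (without extension).
# `merge_third_column` is a hypothetical helper using base R only.
merge_third_column <- function(paths) {
  cols <- lapply(paths, function(p) read.csv(p, header = TRUE)[[3]])
  names(cols) <- tools::file_path_sans_ext(basename(paths))
  # check.names = FALSE keeps the file-derived names unchanged
  data.frame(cols, check.names = FALSE)
}

# Recreate the two sample files from the question
csv_dir <- file.path(tempdir(), "merge_cols")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("V1,V2,V3,V4", "1,1986,64,61"), file.path(csv_dir, "File1.csv"))
writeLines(c("V1,V2,V3,V4", "1,1990,100,61"), file.path(csv_dir, "File2.csv"))

my.file.list <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
my.df <- merge_third_column(my.file.list)
```

Because the names are set on the list before the data frame is built, no separate renaming step is needed afterwards.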
