I'm new to R and not yet comfortable with the syntax. I got the following error:
Error: unexpected '}' in "}"
So I know there is a problem with my braces.
The trouble is that I have been looking for an hour now and could not find any unmatched brackets.
While writing the code, it also seemed quite complicated for a task that should be simple.
My intention is to search a directory full of CSV files and concatenate (row-wise) those that share the same filename. Is there already a function for this in R? Or is the following approach acceptable?
concated_CSV <- data.frame()
Data1 <- data.frame(n)
Data2 <- data.frame()
for (File in Filenames) {
if (Data1$n == 1) {
Data1 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data1 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data1 <- unlist(strsplit(File, "_"))[1]
}
else if (is.na(Data1$n)) {
Data2 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data2 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data2 <- unlist(strsplit(File, "_"))[1]
}
else if (Tendril_Nr_Data1 == Tendril_Nr_Data2) {
concated_CSV <- rbind(Data1, Data2)
new_Filename <- paste0(trg_dir, "/", Tendril_Nr_Data1, ".csv")
write.csv(concated_CSV, new_Filename, row.names=FALSE)
}
}
Thank you very much and
best wishes
Thanks for your answers. As you can see, I'm also new to Stack Overflow and have only been on the reading side so far.
Here is the code, which I tried to simplify so that you can use it.
"Filenames" represents the filenames I'm dealing with.
#Stackoverflow example
Filenames <- c("6.1.3.1_1.CSV","6.1.3.1_2.CSV","6.4.3.1.CSV","6.1.2.1_1.CSV","6.1.2.1_2.CSV","6.1.5.CSV")
Filename_Data1 <- "6.1.3.1_1.CSV"
Filename_Data2 <- "6.1.3.1_2.CSV"
#record File for an Output
concated_CSV<- data.frame()
n <- 1
Data1 <- data.frame(n)
Data2<- data.frame()
for(File in Filenames){
if (Data1$n==1 ){
Data1 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data1 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data1 <- unlist(strsplit(Filename_Data1, "_"))[1]
} else if (Data1$n=!1){
Data2 <- read.csv(File, header=T, sep=";", dec=",")
Filename_Data2 <- unlist(strsplit(File, ".csv"))
Tendril_Nr_Data2 <- unlist(strsplit(Filename_Data1, "_"))[1]
} else if (identical(Tendril_Nr_Data1, Tendril_Nr_Data2)){
concated_CSV <- rbind(Data1, Data2)
#tis is the name and directory to which the file should be saved in
#new_Filename <- paste0(trg_dir, "/",Tendril_Nr_Data1,".csv")
n_Filename <- "hello"
write.csv(concated_CSV,n_Filename, row.names = FALSE)
}
}
The brace error has not disappeared.
My intention is to write a program that compares the names of the CSV files in a given directory. If a filename occurs twice, for example "abc_1.csv" and "abc_2.csv", the program should concatenate the CSV data row-wise and save a file named "abc.csv" (I hope this is clearer).
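One possible way to do the grouping without tracking Data1 and Data2 by hand is sketched below. It is only a sketch under assumptions: the files are semicolon-separated with comma decimals (as in the read.csv calls above), the part of the name before a trailing "_<number>" identifies the group, and the target directory differs from the source directory. The helper name concat_by_prefix and the arguments src_dir and trg_dir are made up for illustration.

```r
# Hypothetical helper: combine CSV files that share a name prefix.
concat_by_prefix <- function(src_dir, trg_dir) {
  files <- list.files(src_dir, pattern = "\\.csv$", ignore.case = TRUE)

  # "abc_1.csv" -> "abc_1" -> "abc"; names without a numeric suffix stay as-is
  stem   <- sub("\\.csv$", "", files, ignore.case = TRUE)
  prefix <- sub("_[0-9]+$", "", stem)

  # One character vector of filenames per prefix
  groups <- split(files, prefix)

  for (p in names(groups)) {
    if (length(groups[[p]]) < 2) next  # only prefixes that occur more than once

    parts <- lapply(file.path(src_dir, groups[[p]]), read.csv,
                    header = TRUE, sep = ";", dec = ",")
    combined <- do.call(rbind, parts)  # row-wise concatenation

    write.csv(combined, file.path(trg_dir, paste0(p, ".csv")),
              row.names = FALSE)
  }

  # Return the prefixes that were combined, invisibly
  invisible(names(groups)[lengths(groups) > 1])
}
```

With the example names from the question, "6.1.3.1_1.CSV" and "6.1.3.1_2.CSV" would end up in "6.1.3.1.csv", while "6.1.5.CSV" would be skipped because its prefix occurs only once. rbind() assumes the files in a group have the same columns.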
I am doing a small log processing project in R. I am trying to write a function that gets a dataframe, and writes it in a csv file with some parameters (dataframe name, today's date.. etc)
I have made some progress but didn't manage to write the csv. I hope the code is reproducible and good.
library(dplyr)
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- paste0('"',"./logs/",dfname, "_", Sys.Date(),'.csv"')
dfpath <- as.data.frame(dfpath)
df %>% write_excel_csv(dfpath)
}
wrt_csv(mtcars)
EDIT- this is a final version that works well. Thanks to Ronak Shah.
wd<- getwd()
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- paste0(wd,'/logs/',dfname, '_', Sys.Date(),'.csv')
df %>% write_excel_csv(dfpath)
}
However, I now have a bunch of data frames that I want to run the function on. Should I put them in a list? This didn't quite work:
l <- list(df1,df2)
lapply(l , wrt_csv)
Any thoughts?
Thanks!
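A note on why the list version may not behave as hoped: inside lapply(), deparse(substitute(df)) no longer sees the original variable name, so the file name cannot be derived that way. One option is to pass the name explicitly from a named list. The following is a minimal sketch, not the accepted fix from this thread: wrt_csv_named and its out_dir argument are hypothetical, base write.csv() stands in for write_excel_csv(), and the demo writes under tempdir().

```r
# Hypothetical variant of wrt_csv that takes the name explicitly.
wrt_csv_named <- function(df, dfname, out_dir = "./logs") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  dfpath <- file.path(out_dir, paste0(dfname, "_", Sys.Date(), ".csv"))
  write.csv(df, dfpath, row.names = FALSE)
  invisible(dfpath)  # return the path that was written
}

# A named list keeps each data frame's name next to its data.
l <- list(mtcars = mtcars, iris = iris)

# Map() walks over the data frames and their names in parallel.
paths <- Map(wrt_csv_named, l, names(l),
             out_dir = file.path(tempdir(), "logs"))
```

The result, paths, is a list of the written file paths, named after the list elements.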
Keep dfpath as a string. Try:
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- paste0('/logs/',dfname, '_', Sys.Date(),'.csv')
write.csv(df, dfpath, row.names = FALSE)
#Or same as OP
#df %>% write_excel_csv(dfpath)
}
wrt_csv(mtcars)
We can also do
wrt_csv <- function(df) {
dfname <- deparse(substitute(df))
dfpath <- sprintf('/logs/%s_%s.csv', dfname, Sys.Date())
write.csv(df, dfpath, row.names = FALSE)
}
wrt_csv(mtcars)
I followed some posts here:
How to combine multiple .csv files in R?
and here:
Reading Many CSV Files at the Same Time in R and Combining All into one dataframe
My purpose is basically the same: combining multiple, very large CSV files into one big matrix in R.
I have this solution, which I would like to speed up as much as possible.
Here is a fully reproducible example; I have many more, and bigger, files:
setwd("C:/") #### set an easy directory to create acceptably large files
#### this takes about 60 seconds
for(i in 1:80){
print(80-i)
write.table(matrix(rnorm(20*3891,0,1),ncol=20),col.names=F,row.names=F,sep=",",file=paste(i,"file.csv",sep=""))
}
listfiles<-list.files(path="C:/",pattern="*.csv")
#### now the problem: this takes about 30-40 seconds; as I have bigger (and much more) files I want to speed up this step
library(plyr)
mybigmatrix<-ldply(listfiles,read.csv,header=F)
Thanks in advance for any help
Maybe special packages and functions would help, such as readr and its read_csv() function (which takes col_names rather than header):
mybigmatrix<-ldply(listfiles,readr::read_csv,col_names=FALSE)
Here is a fully reproducible example that shows a problem with fread(): I cannot coerce the resulting data.table object into a numeric matrix.
setwd("C:/") #### set an easy directory to create acceptably large files
#### this takes few seconds
for(i in 1:5){
print(5-i)
write.table(matrix(rnorm(5*3891,0,1),nrow=5),col.names=F,row.names=F,sep=",",file=paste(i,"file.csv",sep=""))
}
listfiles<-list.files(path="C:/",pattern="*.csv")
myfread<-function(file){
data_frame <- fread(file,sep=",",header=FALSE,stringsAsFactors=FALSE,select=c(1:3891),colClasses=c(rep("as.numeric",3891)))
data_frame
}
###### this is a matrix 25*3891 I want an array of 1297x3x25
alld<-rbindlist(lapply(listfiles,myfread))
### why this is in characters??
as.matrix(alld)
k<-1297
m<-3
vectorr<-as.vector(t(as.matrix(alld)))
tem <- vectorr
n <- length(tem)/(k * m)
tem <- array(tem, c(m, k, n))
tem <- aperm(tem, c(2, 1, 3))
xup <- tem ####### here I have characters
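If the combined table really does come back as character, one option is to convert it explicitly before reshaping. The sketch below does not diagnose why fread() returned characters; it only shows the conversion and the same array/aperm reshape on a small made-up matrix (k = 2, m = 3, two rows) instead of the 25 x 3891 data from the question.

```r
# Made-up stand-in for as.matrix(alld): 2 rows, each holding k * m = 6 values as text
chr_mat <- matrix(as.character(1:12), nrow = 2, byrow = TRUE)

# Convert to numeric while keeping the dimensions
num_mat <- matrix(as.numeric(chr_mat), nrow = nrow(chr_mat))

k <- 2               # 1297 in the question
m <- 3               # 3 in the question
n <- nrow(num_mat)   # one slice per row (25 in the question)

# Read the values row by row, fill an m x k x n array, then swap the first two axes
tem <- array(as.vector(t(num_mat)), c(m, k, n))
xup <- aperm(tem, c(2, 1, 3))  # k x m x n, numeric
```

Each row of the input becomes one k x m slice of xup, with consecutive groups of m values forming the rows of that slice.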
I think any of these options should work well for you.
# Option 1: set the working directory, then read every file in it
setwd("C:/Users/your_path_here/test")
fnames <- list.files()
csv <- lapply(fnames, read.csv)
result <- do.call(rbind, csv)
# Option 2: keep the directory in a variable and read with full paths
# (setwd() returns the previous directory, so its result is not the new path)
filedir <- "C:/Users/your_path_here/csv_files"
file_names <- dir(filedir, full.names = TRUE)
your_data_frame <- do.call(rbind, lapply(file_names, read.csv))
# Option 3: same, but skip the header line of every file
your_data_frame <- do.call(rbind, lapply(file_names, read.csv, skip = 1, header = FALSE))
# Option 4: same, for files that have no header line
your_data_frame <- do.call(rbind, lapply(file_names, read.csv, header = FALSE))
# Option 5: keep the files as a list of data frames
# (read.delim assumes tab-separated content)
setwd("C:/Users/Excel/Desktop/test")
temp <- list.files(pattern = "\\.csv$")
myfiles <- lapply(temp, read.delim)
Finally, try this:
setwd("C:/Users/your_path_here/")
file_list <- list.files()
for (file in file_list) {
  if (!exists("dataset")) {
    # if the merged dataset doesn't exist, create it from the first file
    dataset <- read.table(file, header = TRUE, sep = "\t")
  } else {
    # otherwise append to it; the else keeps the first file from being added twice
    temp_dataset <- read.table(file, header = TRUE, sep = "\t")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}
Here is the data I am working with. https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip
I'm trying to create a function called pollutantmean that will load selected files, aggregate (rbind) the columns, and return a mean of a certain column. I have figured out everything except how to run the loop so I can turn the multiple files into one big data frame.
for (id in 1:5) {
files_full <- Sys.glob("*.csv")
fileQ <- files_full[[id]]
empty_tbl <- rbind(empty_tbl, read.csv(fileQ, header = TRUE))
}
This for loop works by itself, but when I try to use it inside my bigger function
pollutantmean <- function(directory = "specdata", pollutant, id = 1:332) {
empty_tbl <- data.frame()
for (id in 1:332) {
files_full <- Sys.glob("*.csv")
fileQ <- files_full[[i]]
empty_tbl <- rbind(empty_tbl, read.csv(fileQ, header = TRUE))
}
goodata <- na.omit(empty_tbl)
if(pollutant == "sulfate") {
mean(goodata[,2])
} else {
mean(goodata[,3])
}
}
I get the:
"Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection".
I am at a complete loss over how to fix this and have tried many, many different ways. I'm sure I'm messing something up with the naming of the file but I try the for loop by itself and it works fine...
Consider using lapply() on the CSV files, located through the function's directory argument. The code below assumes specdata is a subfolder of the current working directory:
pollutantmean <- function(directory = "specdata", pollutant) {
  files_full <- Sys.glob(paste0(directory, "/*.csv"))[1:332] # first 332 CSVs in directory
  dfList <- lapply(files_full, read.csv, header = TRUE)
  df <- do.call(rbind, dfList)
  gooddata <- na.omit(df)
  # if/else as the last expression, so the mean is returned visibly
  if (pollutant == "sulfate") mean(gooddata[, 2]) else mean(gooddata[, 3])
}
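The original signature also had an id argument. A variant that honours it could look like the sketch below. It assumes that the files, sorted by name, line up with the ids, that every requested id exists, and (as in the thread) that column 2 holds sulfate and column 3 the other pollutant.

```r
pollutantmean <- function(directory = "specdata", pollutant, id = 1:332) {
  # Sorted list of CSV paths inside the given directory
  files_full <- sort(Sys.glob(file.path(directory, "*.csv")))

  # Read only the requested files and stack them row-wise
  dfList <- lapply(files_full[id], read.csv, header = TRUE)
  df <- do.call(rbind, dfList)

  # Drop every row that contains an NA, as in the thread
  gooddata <- na.omit(df)

  # Column positions follow the thread: 2 = sulfate, otherwise 3
  if (pollutant == "sulfate") mean(gooddata[, 2]) else mean(gooddata[, 3])
}
```

Note that na.omit() removes a row when any column is missing, so the result can differ from a mean computed with na.rm = TRUE on a single column.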
I am having trouble storing the results after my code runs. The code picks up the files correctly and runs the forecast model, but it only keeps the value for the last file; all the others are lost. Is there any way to store all the results in a separate array? The difficulty is that the output is a "forecast" object, and that is where I am stuck. I have searched many websites but couldn't find anything like this.
Here is the code:
library(forecast)
library(quantmod)
library(forecast)
fileList <-as.array(length(50))
Forecast1 <- as.array(length(50))
fileList<-list.files(path ='C:\\Users\\User\\Downloads\\wOOLWORTHS\\',recursive =T, pattern = ".csv")
i<- integer()
j<-integer()
i=1
setwd("C:\\Users\\User\\Downloads\\wOOLWORTHS\\")
while (i<51)
{
a<-fileList[i]
print(a)
a <- read.csv(a)
fileSales<-a$sales
fileTransform<-log(fileSales)
plot.ts(fileTransform)
result1<-HoltWinters(fileTransform,beta = FALSE,gamma =FALSE,seasonal ="multiplicative",optim.control =TRUE)
result2<-forecast.HoltWinters(result1,h=1)
summary(result1)
accuracy(result2)
#Forecast1[i] <- result2(forecast)
#print(Forecast1[i])
i=i+1
}
It may just be how you are storing your results. Try filling an empty list instead (e.g. Forecast1):
setwd("C:\\Users\\User\\Downloads\\wOOLWORTHS\\")
library(forecast)
library(quantmod)
fileList <- list.files(path = 'C:\\Users\\User\\Downloads\\wOOLWORTHS\\', recursive = TRUE, pattern = ".csv")
# one list slot per file
Forecast1 <- vector(mode = "list", length(fileList))
for (i in seq_along(fileList)) {
  a <- fileList[[i]]
  #print(a)
  a <- read.csv(a)
  fileSales <- a$sales
  fileTransform <- log(fileSales)
  plot.ts(fileTransform)
  result1 <- HoltWinters(fileTransform, beta = FALSE, gamma = FALSE, seasonal = "multiplicative", optim.control = TRUE)
  # forecast() dispatches to the HoltWinters method
  result2 <- forecast(result1, h = 1)
  #summary(result1)
  #accuracy(result2)
  # [[i]] stores the whole forecast object in slot i
  Forecast1[[i]] <- result2
  #print(Forecast1[i])
  print(paste(i, "of", length(fileList), "completed"))
}
I have a folder with 142 tab-delimited text files. Each file has 19 variables, and then a number of rows beneath (usually no more than 30 rows, but it varies).
I want to do several things with these files in R automatically, and I can't seem to get exactly what I want with my code. I am new to loops. I got both sections of code from previous posts here on Stack Overflow, but I can't figure out how to combine them.
1. Turn the filename into a variable when reading the files into R, so that each row carries the identifying file name.
2. Concatenate all files (with the filename variable and no header) into one data frame with dimensions Y x 19, where Y is however many rows result.
I am able to create a list of the 142 dataframes using this code:
myFiles = list.files(path="~/Documents/ForR/", pattern="*.txt")
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles
for(i in myFiles)
data[[i]]$Source = i
do.call(rbind, data)
I am able to create the dataframe I want with 19 variables, but the filename is not present:
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
DF <- rbind(DF, dat)
}
How do I add the file name (without .txt if possible) as a variable to the loop?
Add this line to the loop:
dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
DF <- rbind(DF, dat)
}
Shouldn't the row.names from the do.call be in the format names(list)[n].i, where i runs over 1:number_of_rows for data frame n? If the list is named, you can just make a column from the row.names:
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles  # rbind builds the row names from these list names
combined.data <- do.call(rbind, data)
combined.data$file_origin <- row.names(combined.data)
You can use basename to get the last path element (the filename), for example:
(files = file.path("~","Documents","ForR",c("file1.txt", "file2.txt")))
[1] "~/Documents/ForR/file1.txt" "~/Documents/ForR/file2.txt"
(basename(files))
[1] "file1.txt" "file2.txt"
Then sub to remove the extension ".txt":
sub('.txt','',basename(files),fixed=TRUE)
[1] "file1" "file2"
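Putting the pieces from this thread together, reading every file with a filename column could look roughly like the sketch below. It is only a sketch under assumptions: the helper names read_with_source and read_all are made up, the files are tab-delimited without a header, and tools::file_path_sans_ext() is used in place of sub() to drop the extension.

```r
# Hypothetical helper: read one tab-delimited file and tag each row with its file name
read_with_source <- function(path) {
  dat <- read.table(path, sep = "\t", header = FALSE,
                    colClasses = "character", na.strings = "")
  # basename() drops the directory, file_path_sans_ext() drops ".txt"
  dat$Source <- tools::file_path_sans_ext(basename(path))
  dat
}

# Hypothetical helper: read all .txt files in a directory into one data frame
read_all <- function(dir_path) {
  files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  do.call(rbind, lapply(files, read_with_source))
}
```

Calling read_all("~/Documents/ForR") would then return one data frame with the original columns plus Source. Using full.names = TRUE means the working directory does not need to be the data folder.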