I have multiple CSV files stored as m1.csv, m2.csv, ..., m50.csv. What I would like to do is load each CSV into R, run an analysis on the data in the i-th file, and store the result as a variable m'i'. I am trying to use a for loop, but I'm not sure loops can quite be used in such a way. For example (pseudocode):
for (i in 1:50){
A <- as.matrix(read.csv("c:/Users/Desktop/m"i".csv"))
...
# some analysis on A
...
m"i" <- # result of analysis on A
}
V <- cbind(m1, m2, ..., m50)
Try this:
filenames <- list.files(getwd())
filenames <- filenames[grepl("\\.csv$", filenames)]  # keep only the .csv files
files <- lapply(filenames, read.csv)                 # read each one into a list element
files <- do.call(rbind, files)                       # stack them into a single data frame
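If you want the per-file analysis results side by side, as in the question, you can collect them in a list and cbind at the end instead of creating m1 ... m50 by hand. A minimal sketch, where colMeans() is only a placeholder for your actual analysis:
filenames <- list.files("c:/Users/Desktop", pattern = "\\.csv$", full.names = TRUE)
results <- lapply(filenames, function(f) {
  A <- as.matrix(read.csv(f))
  colMeans(A)  # placeholder: substitute the analysis on A here
})
V <- do.call(cbind, results)  # one column of results per input file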
Need your help, I am new to R.
The scenario: I have a list of SAS datasets in a specific location.
path <- 'C:\\XXXX\\XXX'
files <- list.files(path = path, pattern="*.sas7bdat", full.names=FALSE)
The files variable gives the list of file names available in that directory.
I am keeping the file name (with the extension removed using strsplit) as the data frame name, stored in the domain_name variable.
Iterating over each file name, which is a SAS dataset, I want to import it and create each dataset name dynamically (for instance, if there are 30 SAS datasets, 30 R data frames should be created):
library(haven)
for (i in 1:length(files)){
domain_name=strsplit(i,split='.sas7bdat', fixed=TRUE)
domain_name <- read_sas(data_file=paste(path,i,sep='/'))
}
Could you explain the concept and fix this problem?
Thanks in advance.
The following should in principle work. As there is no real example, I can only guess.
path <- 'C:/path2file/'
print(path)
files <- list.files(path = path, pattern = "\\.sas7bdat$", full.names = FALSE)
print(files)
mydf <- list()
for (i in seq_along(files)) {
  filename <- paste0(path, files[i])
  print(filename)
  # browser() # if you would like to step through the loop
  mydf[[i]] <- haven::read_sas(data_file = filename)
  print(names(mydf[[i]]))
  # additionally creates mydf_1, mydf_2, ... in the global environment (this reads each file a second time)
  eval(parse(text = paste0("mydf_", i, " <- haven::read_sas(data_file=filename)")))
}
Then you can access each data frame via, e.g., df1 <- mydf[[1]].
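If the goal is one named data frame per file in the global environment, assign() is a cleaner alternative to eval(parse(...)). A sketch reusing files and path from above, deriving each name from the file name as the question intended:
for (i in seq_along(files)) {
  domain_name <- sub("\\.sas7bdat$", "", files[i])  # file name without the extension
  assign(domain_name, haven::read_sas(data_file = paste0(path, files[i])))
}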
I would like to use an R script that I wrote on multiple folders, each of which contains a CSV file and a text file.
The function I wrote takes the CSV file and the text file and calculates a vector.
So basically, the code I need would open every folder, take the CSV file and the text file, and calculate the fitting vector for each.
I thought about using list.files to get a list of the names of all folders and then lapply to apply the function to every folder, but I don't know how to set up the read.csv and read.table calls. My attempt so far (one way to wire it up is sketched after the code):
setwd("C:\\WD")
ptf = "C:\\PathtoFiles"
temp = list.files(path = ptf)
lapply(temp, exfunction)
exfunction = function() {
csvfile = read.csv("nameofile.csv")
textfile = read.table("nameoffile.txt", header=TRUE)
calcvec = vector(mode = "numeric", length = length(textfile))
#Code that calculates the vector
return(calcvec)
}
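A minimal sketch of one way to wire this up, assuming each folder contains exactly one .csv and one .txt file (the placeholder comment stands in for the actual calculation):
ptf <- "C:\\PathtoFiles"
folders <- list.dirs(path = ptf, recursive = FALSE)  # full paths, one per subfolder

exfunction <- function(folder) {
  csvfile <- read.csv(list.files(folder, pattern = "\\.csv$", full.names = TRUE)[1])
  textfile <- read.table(list.files(folder, pattern = "\\.txt$", full.names = TRUE)[1], header = TRUE)
  calcvec <- vector(mode = "numeric", length = length(textfile))
  # ... code that calculates the vector from csvfile and textfile ...
  return(calcvec)
}

results <- lapply(folders, exfunction)  # one vector per folder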
I have a script that takes the raw CSV files in a folder, transforms the data via a function(filename) called "analyze", and prints values to the console. When I attempt to write.csv these values, it only gives the last value of the function. If there were a set number of files per folder, I would just run each specific CSV file through the program, say [1:5], and lapply/set a matrix into write.csv. However, there is a potentially unlimited number of files drawn from the directory, so this will not work (I think?). How would I export a potentially unlimited number of function outputs to a CSV file? I have listed below my final steps after the function definition. They list all the files in the folder and apply the function "analyze" to each of them.
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
for (f in filename) {
print(f)
analyze(f)
}
Best,
Evan
It's hard to tell without a reproducible example, but I think you have to assign the output of analyze to a vector or a data frame (instead of printing it to the console).
Something along these lines:
filename <- list.files(path = "VCDATA", pattern = ".csv", full.names = TRUE)
results <- vector()  # empty vector to collect one value per file
for (f in filename) {
  print(f)
  results[which(filename == f)] <- analyze(f)  # store the output at the matching position
}
write.csv(results, file = "xxx")  # fill in your file name; write the csv once the loop is finished
I hope this answers your question, but it really depends on the format of the output of the analyze function.
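If analyze() instead returns several values per file (a vector or one row of results), collecting into a list and row-binding before writing is more robust. A sketch under that assumption, with "results.csv" as an illustrative file name:
results <- lapply(filename, analyze)    # one list element per file
results_df <- do.call(rbind, results)   # assumes analyze() returns rows of equal length
write.csv(results_df, file = "results.csv", row.names = FALSE)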
I have ~45 files of 5-6 MB each, containing over 3000 JSON objects, that I want to work with in R. I've been able to import each JSON file independently with fromJSON() as a list, except one for which I had to use stream_in(), but I am having trouble coercing the result into a useful structure. I want to create a data frame merging all the files with rbind. The goal is then to merge the result with the other file using cbind.
allfiles <- list.files()
for (file in allfiles) {
  jsonFusion <- fromJSON(file)
  file1 <- do.call(rbind, jsonFusion)
}
stream_in(file("files2"))
The first step (the loop) is a little bit slow, and I don't know how to merge file1 and file2, nor, more importantly, how to end up with a data frame!
The function as.data.frame() is not working.
Assuming the data structures are consistent, the following should work:
library(jsonlite)
all_files <- list.files(path = "path/to/files", full.names = TRUE)
rbind.pages(lapply(all_files, fromJSON))  # rbind_pages() in newer versions of jsonlite
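For the one file that needed stream_in(), the result is already a data frame, so it can be attached with cbind() afterwards. A sketch, assuming the row counts and order of the two pieces actually line up:
df1 <- rbind.pages(lapply(all_files, fromJSON))  # the combined data frame from above
df2 <- jsonlite::stream_in(file("files2"))       # stream_in() returns a data frame
combined <- cbind(df1, df2)                      # valid only if both have matching rows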
I would like to run an ANOVA on multiple datasets stored in my working directory. So far I have come up with:
files <- list.files(pattern = ".csv")
for (i in seq_along(files)) {
mydataset.i <- files[i]
AnovaModel.1 <- aov(DES ~ DOSE, data=mydataset.i)
summary(AnovaModel.1)
}
As you can see, I am very new to loops and cannot make this work. I also understand that I need to add code to append all the summary outputs into one file. I would appreciate any help you can provide to guide me to a working loop that can run ANOVAs on multiple .csv files in the directory (with the same headers) and produce outputs for the record.
You might want to use list.files with full.names = TRUE in case you are not in the same path.
files <- list.files("path_to_my_dir", pattern="*.csv", full.names = T)
# use lapply to loop over all files
out <- lapply(seq_along(files), function(idx) {
# read the file
this.data <- read.csv(files[idx], header = TRUE) # choose TRUE/FALSE accordingly
aov.mod <- aov(DES ~ DOSE, data = this.data)
# if you want just the summary as object of summary.aov class
summary(aov.mod)
# if you require it as a matrix, comment the previous line and uncomment the one below
# as.matrix(summary(aov.mod)[[1]])
})
head(out)
This should give you a list, with each entry holding a summary object (or matrix, if you use the commented line) in the same order as the input file list.
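To keep those summaries for the record, capture.output() writes the printed form of the whole list to a text file. One option, with an illustrative file name, assuming the out object from above:
capture.output(out, file = "anova_summaries.txt")  # printed summaries, one per input file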
Your error is that your loop is not loading your data. Your list of file names is in "files"; you then move through that list and set mydataset.i equal to the name of the file that matches your iterator i... but then you try to run aov on the file name that is stored in mydataset.i!
The command you are looking for to redirect your output to a file is sink. Consider the following:
sink("FileOfResults.txt") #starting the redirect to the file
files <- list.files("path_to_my_dir", pattern="*.csv", full.names = T) #using the fuller code from Arun
for (i in seq_along(files)){
mydataset.i <- files[i]
mydataset.d <- read.csv(mydataset.i) #this line is new
AnovaModel.1 <- aov(DES ~ DOSE, data=mydataset.d) #this line is modified
print(summary(AnovaModel.1))
}
sink() #ending the redirect to the file
I prefer this approach to Arun's because the results are stored directly to the file without jumping through a list and then having to figure out how to store the list to a file in a readable fashion.