I would like to loop through a vector of directory names and insert the directory name into the read.table function and 'see' the tables outside the loop.
I have a vector of directory names:
dir_names <- c("SRR2537079","SRR2537080","SRR2537081","SRR2537082", "SRR2537083","SRR2537084")
I now want to loop through these and read the tables in each directory.
I have:
list.data<-list()
for(i in dir_names){
#print(i)
list.data[[i]] <- read.table('dir_names[i]/circularRNA_known.txt', header=FALSE, sep="\t",stringsAsFactors=FALSE)
}
but this doesn't recognize dir_names[i]. Do I have to use paste somehow?
You are right: you need to paste the value into the path. Also, i is the element of the vector itself, not an index, so you don't need to refer to it as dir_names[i], just i.
list.data<-list()
for(i in dir_names){
#print(i)
list.data[[i]] <- read.table(paste0(i,'/circularRNA_known.txt'), header=FALSE, sep="\t",stringsAsFactors=FALSE)
}
If you want a more elegant solution, you could also use plyr's llply instead of a loop. Everything then happens in one line, and if the files share a consistent format you could easily switch to ldply to combine them all into a single data.frame:
library(plyr)
list.data.2 <- llply(dir_names, function(x) read.table(paste0(x,"/circularRNA_known.txt"), header=FALSE, sep="\t",stringsAsFactors=FALSE))
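If you would rather avoid the plyr dependency, base R's lapply() does the same job. The sketch below is self-contained: it creates two hypothetical sample directories (named after the first two entries of dir_names) inside a temporary folder, writes a made-up tab-separated circularRNA_known.txt into each, reads them back into a named list, and then stacks them with do.call(rbind, ...), which plays the role of ldply here. Note that file.path() is vectorised, so all paths are built without a loop.

```r
# Hypothetical setup: two sample directories inside a temporary folder,
# each holding a small tab-separated circularRNA_known.txt (made-up content)
base_dir  <- file.path(tempdir(), "circ_demo")
dir_names <- c("SRR2537079", "SRR2537080")
for (d in dir_names) {
  dir.create(file.path(base_dir, d), recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(chrom = "chr1", start = 100, end = 200),
              file.path(base_dir, d, "circularRNA_known.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Build one path per directory; file.path() inserts the "/" separator
paths <- file.path(base_dir, dir_names, "circularRNA_known.txt")

# Read every file into a list and name each element after its directory
list.data <- lapply(paths, read.table, header = FALSE, sep = "\t",
                    stringsAsFactors = FALSE)
names(list.data) <- dir_names

# Stack the tables (similar to ldply), recording the source directory
combined <- do.call(rbind, lapply(dir_names, function(d)
  cbind(sample = d, list.data[[d]])))
```

With your real data you would drop the setup part and use file.path(dir_names, "circularRNA_known.txt") relative to the working directory.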
Alternatively, loop over the indices and use dir_names[i] as a variable inside paste0():
list.data<-list()
for(i in seq_along(dir_names)){
#print(i)
list.data[[i]] <- read.table(paste0(dir_names[i], '/circularRNA_known.txt'), header=FALSE, sep="\t",stringsAsFactors=FALSE)
}
names(list.data) <- dir_names
Related
I have imported an Excel file with multiple worksheets. It is stored as a list:
names(mysheets)
#[1] "test_sheet1" "test_sheet2"
test_sheet1 and test_sheet2 contain different matrices.
I need to turn each worksheet into an individual data frame.
If I do it manually, the code looks like this:
s_1 <- data.frame(mysheets[1])
s_2 <- data.frame(mysheets[2])
I tried to write a function to do this, because I have many Excel files and each file has multiple worksheets.
My function:
p_fun <- function (y) {
for (s_i in 1:2) {
for (i in 1:2) {
s_i<- data.frame(y[i])
return(s_i) }}}
It doesn't work correctly.
I'd appreciate any help.
Since mysheets is already a list, you can apply data.frame to each element with lapply (mget would only help if the sheets existed as separate objects in your workspace):
list_df <- lapply(mysheets, data.frame)
If you want them as separate dataframes, we can do
names(list_df) <- paste0('s_', seq_along(list_df))
list2env(list_df, .GlobalEnv)
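We can use assign if we are doing this in a for loop:
# mysheets[[i]] extracts the sheet itself, so column names are not prefixed with the sheet name
for(i in seq_along(mysheets)) assign(paste0("s_", i), data.frame(mysheets[[i]]))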
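Here is a self-contained sketch of the lapply plus list2env route. The two matrices standing in for mysheets are hypothetical; in practice mysheets would come from whatever you used to import the workbook.

```r
# Hypothetical stand-in for the imported workbook: a named list of matrices
mysheets <- list(test_sheet1 = matrix(1:6, nrow = 2),
                 test_sheet2 = matrix(7:15, nrow = 3))

# Convert every sheet to a data frame in one pass
list_df <- lapply(mysheets, data.frame)

# Rename to s_1, s_2, ... and copy each one into the global environment
names(list_df) <- paste0("s_", seq_along(list_df))
invisible(list2env(list_df, envir = .GlobalEnv))
```

Since you have many Excel files, keeping each file's sheets in a list (and not calling list2env at all) is often easier to work with than dozens of loose s_ objects.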
So I have a folder with a bunch of CSV files. I set the working directory to that folder and extracted the file names:
data_dir <- "~/Desktop/All Waves Data/csv"
setwd(data_dir)
vecFiles <- list.files(data_dir)
All good. The problem comes when I try to load all of the files using a loop over vecFiles:
for(fl in vecFiles) {
fl <- read.csv(vecFiles[i], header = T, fill = T)
}
The loop treats 'fl' literally as the name to assign to, so only the last file ends up saved under 'fl' (each iteration overwrites the previous one).
I was trying to figure out why this happens but failed.
Any explanation?
Edit: Here is what I am trying to achieve: assume you have a folder with data1.csv, data2.csv, ..., datan.csv. I want to load them into separate data frames named data1, data2, ..., datan.
You want to read in all the CSV files in your working directory, and you have the locations of those files saved in vecFiles.
Why your attempt doesn't work
What you are currently doing doesn't work because you overwrite the object fl with the newly loaded CSV file in every iteration. After all iterations have run, you are left with only the last file in fl.
A simpler example of the same effect: if you write fn <- "abc" on one line and fn <- "def" on the next (i.e. you overwrite fn), fn will obviously hold "def" afterwards, right?
fn <- "abc"
fn <- "def"
fn
#[1] "def"
Solutions
There are two prominent ways to solve this: 1) stick with a slightly altered for-loop. 2) Use sapply().
1) The altered for loop: Create an empty list called fn, and assign the loaded csv files to the i-th element of that list in every iteration:
fn <- list()
for(i in seq_along(vecFiles)){
fn[[i]] <- read.csv(vecFiles[i], header=T, fill=T)
}
names(fn) <- vecFiles
2) Use sapply(): sapply() is a function that R-users like to use instead of for-loops.
fn <- sapply(vecFiles, read.csv, header=TRUE, fill=TRUE, simplify=FALSE)
names(fn) <- vecFiles
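Note that you can also use lapply() instead of sapply(). The difference is that sapply() tries to simplify its output (for example, data frames that all have the same number of columns can be collapsed into a matrix of lists), whereas lapply() always returns a list.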
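Since the edit asks for separate data frames named data1, data2, ..., a named list can be turned into those objects with list2env(). The sketch below creates two hypothetical files in a temporary folder, reads them with full paths (so setwd() is not needed), and names each element after its file name without the extension via tools::file_path_sans_ext().

```r
# Hypothetical folder with data1.csv and data2.csv, created for the demo
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(data_dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(data_dir, "data2.csv"), row.names = FALSE)

# Full paths avoid the need for setwd()
vecFiles <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read into a list named "data1", "data2", ... (file name minus ".csv")
fn <- lapply(vecFiles, read.csv, header = TRUE)
names(fn) <- tools::file_path_sans_ext(basename(vecFiles))

# Only if you really need separate objects data1, data2, ... in the workspace
invisible(list2env(fn, envir = .GlobalEnv))
```

Keeping the data in the list fn is usually more convenient, because you can keep processing all files with lapply() instead of referring to each object by name.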
You're not creating anything new when you load a file. Each time, the file is loaded into fl, which is why you only see the last file in vecFiles.
A couple of potential solutions:
First lapply:
fl <- lapply(vecFiles, function(x) read.csv(x, header=TRUE, fill=TRUE) )
names(fl) <- vecFiles
This will create a list of elements within fl.
Second 'rbind':
Under the assumption your data has all the same columns:
fl <- read.csv(vecFiles[1], header=TRUE, fill=TRUE)
for(f in vecFiles[-1]){
fl <- rbind(fl, read.csv(f, header=TRUE, fill=TRUE) )
}
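Growing fl with rbind() inside the loop copies the accumulated data on every iteration. A common alternative, sketched below under the same assumption that all files share the same columns, is to read everything with lapply() and call rbind() only once through do.call(). The two sample files are hypothetical.

```r
# Hypothetical sample files with identical columns
data_dir <- file.path(tempdir(), "rbind_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(0.5, 1.5)),
          file.path(data_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(2.5, 3.5)),
          file.path(data_dir, "b.csv"), row.names = FALSE)

vecFiles <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read all files into a list, then bind them in a single call
fl <- do.call(rbind, lapply(vecFiles, read.csv, header = TRUE, fill = TRUE))
```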
Hopefully that is helpful!
So I'm working on a Coursera assignment for an R course.
I'm using a for loop to try to create a data frame that combines the data of 332 CSV files.
The for loop only returns the data frame of the last (332nd) CSV file.
What am I doing wrong?
corr <- function(directory, threshold = 0) {
files <- Sys.glob("specdata//*.csv")
## Create empty numeric vector to append the nitrate values
nitr <- numeric()
## Create empty numeric vector to append the sulfate values
sulf <- numeric()
for (j in 1:length(files)) {
read.data <- read.csv(files[j])
}
}
This is an easy one: you're overwriting read.data on each iteration of the loop. You probably want something like:
files <- Sys.glob("specdata//*.csv")
## Create empty numeric vector to append the nitrate values
nitr <- numeric()
## Create empty numeric vector to append the sulfate values
sulf <- numeric()
out <- vector("list")
for (j in 1:length(files)) {
out[[j]] <- read.csv(files[j])
}
A good way to debug for loops is to set j equal to 1, run through the body of the loop, then set it equal to 2 and do the same thing. Also, you might want to use seq_along(files) instead of 1:length(files); the latter gives bad results when files has length 0, because 1:0 evaluates to c(1, 0).
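A quick illustration of the difference, using an empty vector to stand in for a directory in which no CSV files matched:

```r
files <- character(0)   # e.g. no CSV files matched

1:length(files)         # 1:0, i.e. c(1, 0): a loop would run twice
seq_along(files)        # integer(0): a loop body is skipped

# Count how often a seq_along() loop actually runs on the empty vector
n_iter <- 0
for (j in seq_along(files)) n_iter <- n_iter + 1
n_iter                  # 0
```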
directory and threshold are defined as arguments but never used.
nitr and sulf are created but never used.
To get such a list of files, list.files("specdata", pattern="\\.csv$", full.names=TRUE) is the usual approach.
On every iteration, files[j] is read (replacing the previous one), but nothing is done with it.
Also, your function should return something.
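The pattern argument of list.files() is a regular expression, not a shell glob, so ".csv" means "any character followed by csv, anywhere in the name". The sketch below creates three hypothetical empty files in a temporary folder to show why an anchored pattern such as "\\.csv$" is safer.

```r
# Hypothetical folder containing one real CSV and two look-alikes
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("001.csv", "notes_csv.txt", "backup.csv.bak"))))

list.files(demo_dir, pattern = ".csv")      # matches all three files
list.files(demo_dir, pattern = "\\.csv$")   # matches only "001.csv"
```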
I don't think you really need a function; the code below should do the job.
```
files <- list.files("specdata", pattern="\\.csv$", full.names=TRUE)
res <- vector("list", length(files))
for (j in seq_along(files)) { # safer than 1:length(files) when no files match
res[[j]] <- read.csv(files[j])
}
res
```
Actually this:
lapply(list.files("specdata", pattern="\\.csv$", full.names=TRUE), read.csv)
would probably work just as well, is far less verbose, and has a lovely R accent. If you need more arguments for read.csv, e.g. header=TRUE, you can add them (named and comma-separated) after the function name:
lapply(list.files("specdata", pattern="\\.csv$", full.names=TRUE), read.csv, header=TRUE)
I believe this is the fastest way to do it, and it also shows a progress bar while the files are being read.
library(data.table)
library(pbapply)
# get file names
files <- list.files("c:/your_folder", pattern="\\.csv$", full.names=TRUE)
# read and pile all files
dt <- rbindlist(pblapply(files, fread))
I have multiple CSV files and I know how to read them and rbind them. But my problem is that before binding them, I want to perform some actions, and then rbind them.
So for one file I would do this:
a<-read.table(file="F:..... .csv", skip=1401, nrow=2,header=FALSE, sep=";")
head(a)
##display only some columns
G<-a[,c(11:13)]
H<-a[, c(14:16)]
names(G)<-names(H)
H_G<-as.data.frame(rbind(G, H))
##transpose to long format
H_G<-t(H_G)
Now I want to do the same for all the other files and rbind the results.
I tried it with this:
filenames <- list.files(path="F:....2",pattern="*.csv")
readlist <- lapply(filenames, read.table, skip=1401, nrow=2,header=FALSE, sep=";")
but then I do not get the result I want.
This code will do what you want
Here I initialize some test matrices:
a<-matrix(1:100,10)
b<-matrix(901:1000,10)
write.csv(file="test.csv",a)
write.csv(file="test2.csv",b)
Here I perform your loop:
filenames <- dir(pattern="\\.csv$")
results <- NULL
for (i in seq_along(filenames)){
print(filenames[i])
assign(filenames[i],read.csv(filenames[i], header=FALSE))
assign(filenames[i], get(filenames[i])[,8:10])
# accumulate every file; rbind(NULL, x) simply returns x
results<-rbind(results,get(filenames[i]))
}
output<-t(results)
Note: the column numbers in the line assign(filenames[i], get(filenames[i])[,8:10]) are arbitrary; insert your own.
Let me know if you have any questions or if this doesn't work for you.
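A more direct route than assign()/get() is to wrap the single-file steps from the question in a function, apply it to every file with lapply(), and bind the pieces once. In the sketch below, process_file is a hypothetical helper, the two 16-column semicolon-separated files are made up, and skip = 0 replaces the question's skip=1401 only because the demo files have no preamble.

```r
# Hypothetical helper: the question's single-file steps applied to one file
process_file <- function(path, skip = 1401) {
  a <- read.table(path, skip = skip, nrows = 2, header = FALSE, sep = ";")
  G <- a[, 11:13]
  H <- a[, 14:16]
  names(G) <- names(H)
  t(rbind(G, H))          # 3 x 4 matrix for a 2-row input
}

# Hypothetical demo files: 2 rows x 16 columns, no header, ";"-separated
demo_dir <- file.path(tempdir(), "process_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:2) {
  m <- matrix(seq_len(32) + 100 * k, nrow = 2)
  write.table(m, file.path(demo_dir, paste0("file", k, ".csv")),
              sep = ";", row.names = FALSE, col.names = FALSE)
}

filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Process each file, then rbind the per-file results in one call
pieces <- lapply(filenames, process_file, skip = 0)
result <- do.call(rbind, pieces)
```

Extra arguments such as skip are passed through lapply() to process_file, so the same call works for the real files with the default skip = 1401.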
Here's the zip file of the specdata directory with all the CSV files in it:
https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip
I'm trying to get all the files into a single data frame so I can use complete.cases. My code creates a list of data frames rather than a single data frame, so I currently get errors when I try to use complete.cases. I looked at using merge, but I can't wrap my head around how to use merge inside a for loop with multiple files. I have also tried rbind, and I think I'm close with that approach, but I can't figure out how to use it correctly inside a for loop. I am a beginner, trying to understand for loops before I move on to vectorized functions like lapply.
Here's the code:
complete<- function(directory, id=1:332){
data<-NULL
for (i in 1:length(id)) {
data[[i]]<- c(paste(directory, "/",formatC(id[i], width=3, flag=0),".csv",sep=""))
}
cases<-NULL
for (d in 1:length(data)) {
cases[[d]]<-c(read.csv(data[d]))
}
df<-NULL
for (c in 1:length(cases)){
df[[c]]<-(data.frame(cases[c]))
}
df
}
The first thing to do is remove the for loops. If you are a beginner, get into the apply family right off the bat: for loops in R are sometimes easier, but the apply family is the R way.
files <- list.files(pattern="\\.csv$")
data <- lapply(files,function(x) read.csv(x))
Then, depending on whether you actually want merge or rbind (they are not the same: rbind stacks the rows of data frames that share the same columns, while merge joins data frames side by side on common key columns):
data_rbind <- do.call("rbind", data)
Or
merge.df <- Reduce(function(x, y) merge(x, y, all=T,by="your_value",sort=F), data, accumulate=F)
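To connect this back to complete.cases() in the question, here is a small sketch with two hypothetical monitor tables shaped like the specdata files (columns Date, sulfate, nitrate, ID; the values are made up). rbind gives the single long data frame that complete.cases() needs, whereas merge produces a wide table instead.

```r
# Two hypothetical monitor files, already read into a list
data <- list(
  data.frame(Date = c("2003-01-01", "2003-01-02"), sulfate = c(NA, 7.2),
             nitrate = c(NA, 0.65), ID = 1),
  data.frame(Date = c("2003-01-01", "2003-01-02", "2003-01-03"),
             sulfate = c(3.1, NA, 2.4), nitrate = c(0.3, 0.1, 0.9), ID = 2)
)

# rbind stacks the rows: one data frame with 5 rows and the same 4 columns
all_obs <- do.call("rbind", data)

# complete.cases() now works on the single data frame
ok <- complete.cases(all_obs)

# Number of complete rows per monitor ID
nobs <- tapply(ok, all_obs$ID, sum)

# merge, by contrast, joins side by side on a key (one row per Date)
wide <- merge(data[[1]], data[[2]], by = "Date", all = TRUE)
```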