Is there a way of saying something like:
for (i in 1:10){
ga${i} <- read.table(file="ene.${i}.dat",header=T, sep = ",")
}
in R.
I tried using many other constructs, but none suited the requirement.
Thanks.
We can extract file names first.
ga <- lapply(list.files(path = ".", pattern = "\\.dat"), read.csv)
or with a loop:
lf <- list.files(path = ".", pattern = "\\.dat")
ga <- structure(vector("list", length(lf)),
names = gsub("\\.dat", "", lf))
for (i in seq_along(ga))
ga[[i]] <- read.csv(lf[i])  # [[i]] stores the whole data frame; ga[i] would keep only its first column
To assign data to the separate variables:
lf <- list.files(path = ".", pattern = "\\.dat")
fn <- gsub("\\.dat", "", lf)
for (i in seq_along(lf))
assign(fn[i], read.csv(lf[i]))
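If separate variables really are needed, list2env() can create them from a named list in one call. This is only a sketch: the sample files, the target environment env, and the file names alpha.dat and beta.dat are assumptions made up for illustration.

```r
# Hypothetical sample files in a temporary directory (illustration only).
dir <- tempfile("dat_demo")
dir.create(dir)
write.csv(data.frame(x = 1:2), file.path(dir, "alpha.dat"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(dir, "beta.dat"), row.names = FALSE)

lf <- list.files(path = dir, pattern = "\\.dat$", full.names = TRUE)
# Name each data frame after its file, minus the extension.
ga <- setNames(lapply(lf, read.csv), tools::file_path_sans_ext(basename(lf)))

# Copy every list element into an environment as its own variable.
env <- new.env()
list2env(ga, envir = env)
ls(env)
```

Keeping the data in the named list is usually easier to loop over later; copying into an environment is mainly useful when other code expects individual variables.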
You can use an empty list and then a paste function to do something like this:
ga <- list()
for (i in 1:10) {
ga[[i]] <- read.table(file = paste('ene.', i, '.dat', sep = ''), header = TRUE, sep = ',')
}
Then, you will have a list of data frames. You can index as ga[[1]], ga[[2]] etc. to access them.
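A variation on the same idea, sketched under the assumption that the files are named ene.1.dat to ene.10.dat as in the question: build the file names with sprintf() and keep the result as a named list. The labels ga1, ga2, ... and the generated sample files are there only to make the example self-contained.

```r
# Create hypothetical sample files in a temporary directory (illustration only).
dir <- tempfile("ene_demo")
dir.create(dir)
for (i in 1:10) {
  write.csv(data.frame(step = 1:3, energy = i * (1:3)),
            file.path(dir, sprintf("ene.%d.dat", i)), row.names = FALSE)
}

# Build all file names at once, then read each file into a list element.
files <- file.path(dir, sprintf("ene.%d.dat", 1:10))
ga <- lapply(files, read.table, header = TRUE, sep = ",")
names(ga) <- paste0("ga", 1:10)

# Elements can be reached by name or by position.
head(ga$ga3)
```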
This program works because I made the variables inside lapply global by using the <<- operator. However, it does not work with the real files in the real program. These are .tsv files with named columns. The error I get when I run the real program is: Error: (converted from warning) Error in : (converted from warning) Error in : arguments imply differing number of rows: 3455, 4319. What might be causing this?
lc <- list("test.txt", "test.txt", "test.txt", "test.txt")
lc1 <- list("test.txt", "test.txt", "test.txt")
lc2 <- list("test.txt", "test.txt")
#list of lists. The lists contain file names
lc <- list(lc, lc1, lc2)
#new names for the three lists in the list of lists
new_dataFns <- list("name1", "name2", "name3")
file_paths <- NULL
new_path <- NULL
#add the file names to the path and read and merge the contents of each list in the list of lists
lapply(
lc,
function(lc) {
filenames <- file.path(getwd(), lc)
dataList <<- lapply(filenames, function (lc) read.table(file=lc, header=TRUE))
dataList <<- lapply(dataList, function(dataList) {merge(as.data.frame(dataList),as.data.frame(dataList))})
}
)
#add each new file name to the path; in total there will be 3 paths of the form path/file_newname.tsv
lapply(new_dataFns, function(new_dataFns) {new_path <<- file.path(getwd(), new_dataFns)})
print(new_path)
print(dataList)
finalFiles <- merge(as.data.frame(dataList), as.data.frame(new_path))
print(finalFiles)
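One possible cause, offered as a guess since the real files are not shown: as.data.frame() on a list of data frames tries to bind them side by side as columns, which fails when the data frames have different numbers of rows. The sketch below reproduces that kind of failure with two made-up data frames and shows Reduce() with merge() as one way to combine them on a shared key column. The column names id, a and b are hypothetical.

```r
# Two made-up tables with different numbers of rows and a shared key "id".
df1 <- data.frame(id = 1:3, a = c(10, 20, 30))
df2 <- data.frame(id = 1:2, b = c("x", "y"))
dataList <- list(df1, df2)

# Column-binding the list fails because the row counts differ;
# capture the error message instead of stopping.
err <- tryCatch(as.data.frame(dataList), error = function(e) conditionMessage(e))
print(err)

# Merging pairwise on the shared column avoids the row-count requirement.
merged <- Reduce(function(x, y) merge(x, y, by = "id"), dataList)
print(merged)
```

If the files have no shared key and simply hold more rows of the same columns, do.call(rbind, dataList) would be the closer fit.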
I found a solution to the problem by writing the code differently; please see below. The input to the function is provided by the app's input widgets.
glyCount1 <- function(answer = NULL, fileChoice = NULL, combination = NULL, enteredValue = NULL, nameList) {
lc = nameList
new_dataFns <- gsub(" ", "", nameList)
first_path <- NULL
new_path <- NULL
old_path <- NULL
file_content <- NULL
for(i in 1:length(lc)){
for(j in 1:length(lc[[i]])){
if(!is.null(lc[[i]])){
first_path[[j]]<- paste(getwd(), "/", lc[[i]][j], sep = "")
tryCatch(file_content[[j]] <- read.csv(file = first_path[[j]], header = TRUE, sep = ","), error = function(e) NULL)  # index with j, matching the path built above
old_path[[j]] <- paste(getwd(), "/", i, ".csv", sep = "")
write.table(file_content[[j]], file = old_path[[j]], append = TRUE, col.names = FALSE)
}
}
}
}
I have a bunch of individual files to which I need to apply a test. I need a way to automatically write the results for each file into a single output file. Here is what I do:
library(ape)
stud_files <- list.files("path/dir/data",full.names = T)
for (f in stud_files) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.frame(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
write.dna(res,file = "res_testa.xls",format = "sequential")
}
This loop works well, except for the last command, which is meant to write the results of each file consecutively: it saves only the result of the last file. The results are also saved as a string rather than as the table (data.frame) I defined above. Any ideas? Thanks in advance.
Check help(write.dna).
write.dna(x, file, format = "interleaved", append = FALSE,
nbcol = 6, colsep = " ", colw = 10, indent = NULL,
blocksep = 1)
append a logical, if TRUE the data are appended to the file without
erasing the data possibly existing in the file, otherwise the file (if
it exists) is overwritten (FALSE the default).
Set append = TRUE and you should be all set.
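Since the results here are data frames rather than DNA sequences, the same append idea can also be sketched with write.table(), which has its own append argument. The output path, the fake_result() helper and its values are placeholders for illustration, not the asker's testa() output.

```r
out_file <- tempfile(fileext = ".csv")  # hypothetical output path

# Stand-in for the per-file test result (placeholder values, not real output).
fake_result <- function(i) {
  data.frame(estimate = i / 10, p.value = 0.05 * i, logi = "two.sided")
}

for (i in 1:3) {
  res <- fake_result(i)
  first <- !file.exists(out_file)
  # Write the header only on the first pass, then append the remaining rows.
  write.table(res, file = out_file, sep = ",", row.names = FALSE,
              col.names = first, append = !first)
}

out <- read.csv(out_file)
print(out)
```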
As some of the comments point out, however, you are probably better off generating your table, and then writing it all at once to a file. Unless you have billions of files, you likely won't run out of memory.
Here is how I would approach this.
library(ape)
library(data.table)
library(xts)  # provides as.xts()
stud_files <- list.files("path/dir/data", full.names = TRUE)
sumfunc <- function(f) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.table(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
return(res)
}
lres <- lapply(stud_files, sumfunc)
dat <- rbindlist(lres)
write.table(dat,
file = "res_testa.csv",
sep = ",",
quote = FALSE,
row.names = FALSE)
I have an input CSV of the form:
id,No.,V,S,D
1,0100000109,623,233,331
2,0200000109,515,413,314
3,0600000109,611,266,662
I need to read the No. column as it is (i.e., as a character). I know I can use something like this for that:
data <- read.csv("input.csv", colClasses = c("MSISDN" = "character"))
I have some code that I'm using to read the CSV file in chunks:
chunk_size <- 2
con <- file("input.csv", open = "r")
data_frame <- read.csv(con, nrows = chunk_size, colClasses = c("MSISDN" = "character"), quote = "", header = TRUE)
header <- names(data_frame)
print(header)
print(data_frame)
if(nrow(data_frame) == chunk_size) {
repeat {
data_frame <- read.csv(con,nrows = chunk_size, header = FALSE, quote="")
names(data_frame)<-c(header)
print(header)
print(data_frame)
if(nrow(data_frame) < chunk_size) {
break
}
}
}
close(con)
The issue I'm facing is that only the first chunk reads the No. column as character; the remaining chunks do not.
How can I resolve this?
PS: the original input file has about 150+ columns and about 20 Million rows.
You can read the data as string with readLines and split it:
fileName <- "input.csv"
df <- do.call(rbind.data.frame, strsplit(readLines(fileName), ",")[-1]) # skip the header line
colnames(df) <- c("id","No.","V","S","D") # add the column names
or the direct approach with read.csv:
fileName <- "input.csv"
col <- c("integer","character","integer","integer","integer")
df <- read.csv(file = fileName,
sep = ",",
colClasses=col,
header = TRUE,
stringsAsFactors = FALSE)
You need to give the column type colClasses in the read.csv() inside the repeat procedure.
You no longer have the header so you need to define an unnamed vector to specify the colClasses.
Let's say the size of colClasses is 150.
myColClasses=rep("numeric",150)
myColClasses[2] <- "character"
repeat {
data_frame <- read.csv(con,nrows = chunk_size, colClasses=myColClasses, header = FALSE, quote="")
...
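Putting the pieces together, a minimal sketch of the chunked read might look like this. It assumes the five-column sample from the question, written to a temporary file so the example is self-contained; the col_classes vector and the chunks list are names introduced here for illustration. For the real 150-column file the classes vector would be built as shown above.

```r
# Hypothetical sample file mirroring the question's layout.
path <- tempfile(fileext = ".csv")
writeLines(c("id,No.,V,S,D",
             "1,0100000109,623,233,331",
             "2,0200000109,515,413,314",
             "3,0600000109,611,266,662"), path)

chunk_size <- 2
col_classes <- c("integer", "character", "integer", "integer", "integer")

con <- file(path, open = "r")
# Read the header line once; later reads continue from the next line.
header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

chunks <- list()
repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = chunk_size, header = FALSE,
             colClasses = col_classes, quote = ""),
    error = function(e) NULL  # read.csv signals an error when no lines remain
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  names(chunk) <- header
  chunks[[length(chunks) + 1]] <- chunk
  if (nrow(chunk) < chunk_size) break
}
close(con)

result <- do.call(rbind, chunks)
str(result)
```

With 20 million rows it is probably better to process each chunk inside the loop than to collect them all in a list; the list is used here only so the outcome can be inspected.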
I need to run the same set of code for multiple CSV files, and I want to do it with a single macro-like routine. Below is the code that I am executing, but the results are not coming out properly: it reads the data in a 2-d format while I need it in a 3-d format.
lf = list.files(path = "D:/THD/data", pattern = ".csv",
full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
ds<-lapply(lf,read.table)
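If "3-d format" here means one matrix per file stacked into a three-dimensional array, a minimal sketch could look like the following. It assumes every file has the same number of rows and columns and holds only numeric data; the sample files f1.csv to f3.csv are generated just for the example.

```r
# Hypothetical sample files with identical shape (illustration only).
dir <- tempfile("csv_demo")
dir.create(dir)
for (k in 1:3) {
  write.csv(matrix(k * (1:6), nrow = 2), file.path(dir, paste0("f", k, ".csv")),
            row.names = FALSE)
}

lf <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
# Read each file and convert it to a numeric matrix.
mats <- lapply(lf, function(f) as.matrix(read.csv(f)))
# Stack the matrices along a third dimension: rows x columns x files.
arr <- simplify2array(mats)
dim(arr)
```

A single file's data is then available as arr[, , k].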
I don't know if this is going to be useful, but one way I do it is:
## Step 1: read files
mycsv <- dir(pattern = "\\.csv$")
n <- length(mycsv)
mylist <- vector("list", n)
for (i in seq_len(n)) mylist[[i]] <- read.csv(mycsv[i], header = TRUE)
Then I usually just use lapply to change things, for example:
## Change column names
mylist <- lapply(mylist, function(x) {names(x) <- c("type", "date", paste0("v", 1:24), "total") ; return(x)})
## Change the type column to weekday/weekend
mylist <- lapply(mylist, function(x) {
f = c("we", "we", "wd", "wd", "wd", "wd", "wd")
x$type = rep(f,52, length.out = 365)
return(x)
})
and so on.
Then I save the files with the following code after all the changes I made. (It is also sometimes useful to split the original file name and save each file under a name built from part of it, so that I can track the individual files later.)
## For example, some of my files had a pattern in the file name such as "201_E424220_N563500.csv", so I split it and keep the parts as new columns:
mylist <-lapply(1:length(mylist), function(i) {
mylist.i <- mylist[[i]]
s = strsplit(mycsv[i], "_" , fixed = TRUE)[[1]]
d = cbind(mylist.i[, c("type", "date")], ID = s[1], Easting = s[2], Northing = s[3], mylist.i[, 3:ncol(mylist.i)])
return(d)
})
for (i in seq_len(n))
write.csv(mylist[[i]], file = paste("file", i, ".csv", sep = ""), row.names = FALSE)
I hope this helps. When you get some time, please read about the plyr package, as I am sure it will be very useful for you; it offers many data analysis options. plyr has apply-style functions such as:
## l_ply split list, apply function and discard result
## ldply split list, apply function and return result in data frame
## laply split list, apply function and return result in an array
For example, you can use ldply to read all your CSV files and return a data frame, something like:
data = ldply(list.files(pattern = ".csv"), function(fname) {
j = read.csv(fname, header = T)
return(j)
})
Here data will be a single data frame containing the rows from all your CSV files (j is only the per-file data frame inside the function).
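For comparison, roughly the same combined data frame can be built without plyr by pairing lapply() with do.call(rbind, ...). This sketch assumes all files share the same columns; the source column and the sample files a.csv and b.csv are additions for illustration.

```r
# Hypothetical sample files (illustration only).
dir <- tempfile("bind_demo")
dir.create(dir)
write.csv(data.frame(date = c("d1", "d2"), v1 = 1:2),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(date = "d3", v1 = 3L),
          file.path(dir, "b.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
data <- do.call(rbind, lapply(files, function(fname) {
  j <- read.csv(fname, header = TRUE)
  j$source <- basename(fname)  # remember which file each row came from
  j
}))
print(data)
```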
Thanks, Ayan
I have a problem making R read a set of files in a folder and return their cross product.
I have a folder which contains one test.csv file and n train.csv files.
I need a loop to read through the folder and return a file that contains the cross product of test and each of the train files, so the rows of the file should look like this:
test*train01
test*train02
test*train03
...
I wrote a script that does this for two specific files, but I don't know how to adapt it to the whole folder and the pattern that I need.
data01 <- as.matrix(read.csv(file = "test.csv", sep = ",", header=FALSE))
data02 <- as.matrix(read.csv(file = "train.csv", sep = ",", header=FALSE))
test <- list()
test01<- list()
test02<- list()
i<- 1
while (i <= 25){
test01[[i]] <- c(data01[i, ])
test02[[i]] <- c(data02[i, ])
test[[i]]<- crossprod(test01[[i]],test02[[i]])
i <- i+1
}
write.csv(test, file="testing.csv", row.names = FALSE)
Try:
test <- function(data) {
data01 <- as.matrix(read.csv(file = "test.csv", sep = ",", header=FALSE))
data02 <- as.matrix(read.csv(file = data, sep = ",", header=FALSE))
test <- list()
test01<- list()
test02<- list()
i<- 1
while (i <= 25){
test01[[i]] <- c(data01[i, ])
test02[[i]] <- c(data02[i, ])
test[[i]]<- crossprod(test01[[i]],test02[[i]])
i <- i+1
}
return(test)
}
result <- lapply(list.files(pattern='Train.*'),test)
Then just loop result to save in CSV file.
EDIT: How to save:
files <- list.files(pattern='Train.*')
for (i in seq_along(result)) {
write.csv(result[[i]], paste0('result_',files[i]), row.names = FALSE)
}
EDIT: Saving in one file:
write.csv(do.call(rbind,result),'result.csv', row.names = FALSE) # Appending by row
or
write.csv(do.call(cbind,result),'result.csv', row.names = FALSE) # Appending by column
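As a more compact alternative, here is a sketch that relies on the fact that crossprod(x, y) for two vectors equals sum(x * y), so rowSums() over the element-wise product gives every row-wise value at once. It assumes test and train files have the same dimensions; the sample files, their names (train01.csv, train02.csv) and the output layout with one row per train file are assumptions for illustration.

```r
# Hypothetical sample files (illustration only).
dir <- tempfile("cross_demo")
dir.create(dir)
write.table(matrix(1:6, nrow = 3), file.path(dir, "test.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
for (k in 1:2) {
  write.table(matrix(k, nrow = 3, ncol = 2),
              file.path(dir, sprintf("train%02d.csv", k)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

test_mat <- as.matrix(read.csv(file.path(dir, "test.csv"), header = FALSE))
train_files <- list.files(dir, pattern = "^train.*\\.csv$", full.names = TRUE)

# One row per train file; each entry is the dot product of matching rows.
result <- t(sapply(train_files, function(f) {
  train_mat <- as.matrix(read.csv(f, header = FALSE))
  rowSums(test_mat * train_mat)
}))
rownames(result) <- basename(train_files)

write.csv(result, file.path(dir, "result.csv"))
print(result)
```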