R iterate to read csv files - r

pollutantmean <- function(id){
n <- length(id)
for (i in 1 : n){
pol <- read.csv('id[i].csv')
}
}
pollutantmean(150:160)
The CSV filenames look like 001.csv, 002.csv, 100.csv, etc.
001, 002 and 100 are the ids, and each CSV has an id column whose value matches the filename (1 in 001.csv, for example).
When I run this code, the console tells me there is no such file as id[i].csv.

First of all, you don't need a loop. And second, you need to think about how to represent ids.
ids <- sprintf("%03i", 1:999) # 0's are padded at the beginning
filenames <- paste0(ids, ".csv")
results <- lapply(filenames, read.csv) # you get a list of data frames
Alternatively you can read in all csv files in a certain folder using, say:
results <- lapply(dir(pattern="\\.csv$"), read.csv)
The "\\.csv$" pattern means that ".csv" has to be at the end of the filename (see ?regex for the technicalities).
... and a function that takes a number and gives you back a data frame would look like this:
read.this <- function(i) read.csv(sprintf("%03i.csv", i))
... And now you can lapply it to your desired range:
lapply(101:150, read.this)
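A minimal, self-contained sketch of the padding approach described above. The temporary directory, the three demo files, and the helper name read_ids are assumptions made up for illustration, so that the example runs anywhere.

```r
# Assumption: demo files are created in a temporary directory for illustration.
demo_dir <- file.path(tempdir(), "pad_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in c(1, 2, 100)) {
  write.csv(data.frame(id = k, value = k * 10),
            file.path(demo_dir, sprintf("%03d.csv", k)),
            row.names = FALSE)
}

# Hypothetical helper: read one data frame per requested id,
# padding each id to three digits to match the file names.
read_ids <- function(ids, directory) {
  paths <- file.path(directory, sprintf("%03d.csv", ids))
  lapply(paths, read.csv)
}

results <- read_ids(c(1, 100), demo_dir)
length(results)   # one data frame per requested id
```

Passing the directory as an argument keeps the working directory untouched, and results stays a plain list that can later be combined with do.call(rbind, results).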

The first problem is line 4 of your code, which should be replaced by
pol <- read.csv(paste0(id[i], ".csv"))
If id[i] is inside quotes (single or double), read.csv takes it literally: the function looks for a file named id[i].csv, which explains your error message.
But with this function, pol will be overwritten at every step anyway.
If you really want to wrap these lines up in a function, you need to return a list:
pollutantmean <- function(id){
res <- vector("list", length(id))
for (i in seq_along(id)){
res[[i]] <- read.csv(paste0(id[i], ".csv"))
}
res # return the list of data frames
}
But a loop is not very elegant here, so we can simply write:
pollutantmean <- function(id){
lapply(id, function(i) read.csv(paste0(i, ".csv")))
}
Or even (without a function), this should work:
lapply(id, function(i) read.csv(paste0(i, ".csv")))
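To make the quoting issue concrete, here is a small sketch comparing the literal string with the name built from the variable. The variable names literal_name, built_name and padded_name are made up for illustration. Note that paste0() does not zero-pad, so for ids below 100 the question's file names (001.csv and so on) would need sprintf() instead.

```r
id <- 150:160
i <- 1

# Inside quotes, id[i] is just text and is never evaluated.
literal_name <- 'id[i].csv'

# paste0() evaluates id[i] first and then appends the extension.
built_name <- paste0(id[i], ".csv")

# sprintf() with %03d pads with leading zeros to three digits.
padded_name <- sprintf("%03d.csv", 7L)

print(c(literal_name, built_name, padded_name))
```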

Related

Add multiple dataframes to one dataframe without overwriting the existing dataframe in R

I have 332 csv files and each file has the same number of variables and the same format, and I need to create a function that every time the user calls it, can specify the folder where the csv files are located and the id of the csv files they want to store in one data frame.
The name of the files follows the next format: 001.csv, 002.csv ... 332.csv.
data <- function(directory, id_default = 1:332){
setwd(paste0("/Users/", directory))
id <- id_default
for(i in length(id)){
if(i < 10){
aux <- paste0("00",i)
filename <- paste0(aux,".csv")
}else if(i < 100){
aux <- paste0("0", i)
filename <- paste0(aux, ".csv")
}else if(i >= 100){
filename <- paste0(i, ".csv")
}
my_dataframe <- do.call(rbind, lapply(filename, read.csv))
}
my_dataframe #Print dataframe
}
But the problem is that it only stores the last csv file; it seems that every time it enters the loop, it overwrites the data frame with the last csv file.
How do I fix it?
Here, for(i in length(id)) loops over a single value, the length of 'id', rather than over each element. Instead it should be
for(i in 1:length(id))
Or more correctly
for(i in seq_along(id))
In addition to the issue with looping, the if/else if is not really needed. We could use sprintf
filenames <- sprintf('%03d.csv', id)
i.e.
data <- function(directory, id_default = 1:332){
setwd(paste0("/Users/", directory))
filenames <- sprintf('%03d.csv', id_default)
do.call(rbind, lapply(filenames, read.csv))
}
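The difference between the loop ranges, and the vectorised behaviour of sprintf, can be checked with a short sketch. The id values and the visited_* variable names are assumptions chosen for illustration.

```r
id <- c(3, 7, 12)

# length(id) is a single number, so this loop body runs exactly once.
visited_wrong <- c()
for (i in length(id)) visited_wrong <- c(visited_wrong, i)

# seq_along(id) yields one index per element.
visited_right <- c()
for (i in seq_along(id)) visited_right <- c(visited_right, i)

# sprintf is vectorised: one padded file name per id, no loop or if/else needed.
filenames <- sprintf("%03d.csv", id)

# seq_along() is also safe for empty input, where 1:length(x) would give c(1, 0).
empty_index <- seq_along(integer(0))
```

This is why seq_along(id) is described as "more correct" than 1:length(id): with an empty id vector the latter would still run the loop body twice.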
A tidy solution will use purrr (better than a loop for this task): https://purrr.tidyverse.org/reference/map.html
library(tidyverse)
directory <- "directory"
id <- c(1,20,300)
# add leading 0s with stringr's str_pad
id <- str_pad(id, 3, pad = "0") # %<>% would need library(magrittr), which library(tidyverse) does not attach
It is best to avoid using setwd() like this.
Instead, add directory to the file paths.
paths <- str_c(directory, "/", id, ".csv")
# map files to that function (similar to a loop) and stack rows
map_dfr(paths, read_csv)
Even better, use here(), which builds file paths relative to the project root: https://github.com/jennybc/here_here
paths <- str_c(
here::here(directory, id),
".csv")
# map files to that function (similar to a loop) and stack rows
map_dfr(paths, read_csv)
Your example seems to want to make the default ids 1:332. If we wanted all files in the directory, we could use paths <- list.files(here::here(directory), full.names = TRUE).
read_my_data <- function(directory, id = 1:332){
paths <- str_c(
here::here(directory, str_pad(id, 3, pad = "0")),
".csv")
map_dfr(paths, read_csv)
}
read_my_data("directory")
If you need to combine files from multiple directories in parallel, you can use pmap_dfr()
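As a rough base R analogue of pairing directories with ids (this sketch uses Map() rather than purrr's pmap_dfr()), the following assumes two made-up demo directories, site_a and site_b, each holding one small file. The helper name read_one is hypothetical.

```r
# Assumption: two demo directories, each holding one small file.
base_dir <- file.path(tempdir(), "multi_demo")
dirs <- file.path(base_dir, c("site_a", "site_b"))
ids <- c(1, 20)
for (k in seq_along(dirs)) {
  dir.create(dirs[k], recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(id = ids[k], site = basename(dirs[k])),
            file.path(dirs[k], sprintf("%03d.csv", ids[k])),
            row.names = FALSE)
}

# Hypothetical helper: build the path from a directory and an id, then read it.
read_one <- function(directory, id) {
  read.csv(file.path(directory, sprintf("%03d.csv", id)))
}

# Map() walks dirs and ids in parallel; rbind stacks the resulting data frames.
combined <- do.call(rbind, Map(read_one, dirs, ids))
rownames(combined) <- NULL
```

No call to setwd() is needed, because every path is built explicitly with file.path().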

Specifying consecutive file names and assigning consecutive vectors with counter variable in for loops

I am trying to analyze 10 sets of data, for which I have to import the data, remove some values and plot histograms. I could do it individually but can naturally save a lot of time with a for loop. I know this code is not correct, but I have no idea of how to specify the name for the input files and how to name each iterated variable in R.
par(mfrow = c(10,1))
for (i in 1:10)
{
freqi <- read.delim("freqspeci.frq", sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
What I want to do is to have the counter number in every "i" in my code. Am I just approaching this the wrong way in R? I have read about the assign and paste functions, but honestly do not understand how I can apply them properly in this particular problem.
You can do it in several ways:
Use list.files() to get all files in a given directory. You can use a regular expression as well.
If the names are consecutive, then you can use
for (i in 1:10)
{
filename <- sprintf("freqspec%d.frq", i) # e.g. freqspec1.frq; adjust the template to your actual file names
freqi <- read.delim(filename, sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
You can also use paste() to create the file names.
paste("filename", 1:10, sep='_')
You could just save all your data files into an otherwise empty folder. Then get the filenames like this:
filenames <- dir()
for (i in seq_along(filenames)){
freqi <- read.delim(filenames[i], sep="\t", row.names=NULL)
# and here whatever else you want to do on these files
}
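Putting the pieces together, a possible sketch keeps the filtered values in a named list instead of creating freq1, freq2, ... variables. The file names freqspec1.frq to freqspec3.frq and their contents are assumptions created here only so the example runs.

```r
# Assumption: demo files freqspec1.frq ... freqspec3.frq are written to a temp folder.
demo_dir <- file.path(tempdir(), "frq_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(POS = 1:4, N_CHR = c(0, 0.25, 0.5, 1)),
              file.path(demo_dir, sprintf("freqspec%d.frq", i)),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.frq$", full.names = TRUE)

# Read each file and keep the N_CHR values that are neither 0 nor 1.
freq_list <- lapply(files, function(f) {
  freq <- read.delim(f, sep = "\t")
  freq$N_CHR[freq$N_CHR != 0 & freq$N_CHR != 1]
})
names(freq_list) <- basename(files)

# One histogram per file (uncomment to plot).
# par(mfrow = c(length(freq_list), 1))
# for (nm in names(freq_list)) hist(freq_list[[nm]], main = nm)
```

A list indexed by file name avoids assign() and paste() entirely: each result is reachable as freq_list[["freqspec1.frq"]].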

how to convert results of a for loop to a list?

I have several files in a directory. I can read them like this:
files <- list.files("C:\\New folder", "*.bin",full.names=TRUE)
for (i in 1:length(files)) {
conne <- file(files[i], "rb")
file <- readBin(conne, double(), size=4, n=300*700, signed=TRUE)
file2 <- matrix(data=file,ncol=700,nrow=300)
}
I wonder how can I put all the matrices (file2) as a list?
For instance:
m1<-matrix(nrow=4,ncol=2,data=runif(8))
m2<-matrix(nrow=4,ncol=2,data=runif(8))
I put them in a list as:
ml <- list(m1, m2)
In addition to akrun's answer, you could also just put them in a list to begin with by taking advantage of the lapply function. Modifying your code just slightly, it would look like this:
files <- list.files("C:\\New folder", "*.bin",full.names=TRUE)
dat <- lapply(1:length(files), function(i) {
conne <- file(files[i], "rb")
file <- readBin(conne, double(), size=4, n=300*700, signed=TRUE)
file2 <- matrix(data=file,ncol=700,nrow=300)
close(conne) # close the connection once the data has been read
return(file2)
})
dat is now a list of all of your matrices. lapply acts as a loop, much like for, and will pass each iteration of its first argument, here 1:length(files), to the function as a parameter. The returned value it gets from the function will be passed to the list called dat as its own element.
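A small runnable variant of the same idea, iterating over the file names directly instead of over indices. The two tiny binary files and their 2 x 3 dimensions are assumptions written here for the demo; on.exit() is used as one way to make sure the connection is closed.

```r
# Assumption: two small binary files of 4-byte floats are written for the demo.
demo_dir <- file.path(tempdir(), "bin_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:2) {
  writeBin(as.double(1:6 * k),
           file.path(demo_dir, sprintf("m%d.bin", k)),
           size = 4)
}

files <- list.files(demo_dir, pattern = "\\.bin$", full.names = TRUE)

# lapply returns one matrix per file, collected in a list.
mats <- lapply(files, function(f) {
  conne <- file(f, "rb")
  on.exit(close(conne))  # close the connection even if readBin fails
  values <- readBin(conne, double(), size = 4, n = 2 * 3)
  matrix(values, nrow = 2, ncol = 3)
})
names(mats) <- basename(files)
```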
Assuming that the OP created objects 'm1', 'm2' etc. in the global environment, we can use mget to get the values of the objects in a list by specifying the pattern argument in ls as 'm' followed by digits (\\d+).
mget(ls(pattern='m\\d+'))
If the question is to split up a large matrix 'm' into chunks of n rows:
n <- 4
lapply(split(seq_len(nrow(m)),
as.numeric(gl(nrow(m), n, nrow(m)))), function(i) m[i,])

How to read variable number of files and then combine the data frames in R?

I would like to design a function. Say I have files file1.csv, file2.csv, file3.csv, ..., file100.csv. I only want to read some of them every time I call the function by specifying an integer vector id, e.g., id = 1:10, then I will read file1.csv,...,file10.csv.
After reading those csv files, I would like to row combine them into a single variable. All csv files have the same column structure.
My code is below:
namelist <- list.files()
for (i in id) {
assign(paste0( "file", i ), read.csv(namelist[i], header=T))
}
As you can see, after I read in all the data, I am stuck at combining them, since they are all stored under different variable names.
You should read in each file as an element of a list. Then you can combine them as follows:
namelist <- list.files()
df <- vector("list", length = length(id))
for (k in seq_along(id)) {
df[[k]] <- read.csv(namelist[id[k]], header = TRUE) # fill slots 1..length(id), whatever the ids are
}
df <- do.call("rbind", df)
Or more concisely:
df <- do.call(rbind, lapply(list.files()[id], read.csv))
I do this, which is more R-like, without the for loop:
## assuming you have a folder full of .csv's to merge
filenames <- list.files()
all_files <- Reduce(rbind, lapply(filenames, read.csv))
If I understand correctly what you want to do then this is all you need:
namelist <- list.files()
singlevar = c()
for (i in id) {
singlevar = rbind(singlevar, read.csv(namelist[i], header=T))
}
Since in the end you want one single object to contain all the partial information from the single files, rbind as you go.
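One caveat with indexing list.files() by id: the listing is sorted as text, so file10.csv typically comes before file2.csv, and namelist[2] need not be file2.csv. A sketch that builds the names from the ids instead is shown below. The twelve demo files and the helper name read_selected are assumptions for illustration.

```r
# Assumption: file1.csv ... file12.csv are created in a temp folder for the demo.
demo_dir <- file.path(tempdir(), "select_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:12) {
  write.csv(data.frame(file_no = k, x = k^2),
            file.path(demo_dir, sprintf("file%d.csv", k)),
            row.names = FALSE)
}

# Inspect the listing order: names are sorted as text, not by number.
print(basename(list.files(demo_dir)))

# Hypothetical helper: build each file name from its id, read, and stack the rows.
read_selected <- function(id, directory) {
  paths <- file.path(directory, sprintf("file%d.csv", id))
  do.call(rbind, lapply(paths, read.csv))
}

combined <- read_selected(c(2, 10), demo_dir)
```

Collecting the pieces in a list and calling rbind once also avoids growing the data frame inside the loop, which gets slow when there are many files.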

I'm trying to skip over errors and warnings in this for loop in r but its not working?

I am trying to read csv files with their names as dates into a for loop and then print out a few columns of data from that file when it is actually there. I need to skip over the dates that I don't have any data for and the dates that don't actually exist. When I put in my code there is no output, it is just blank. Why doesn't my code work?
options(width=10000)
options(warn=2)
for(a in 3:5){
for(b in 0:1){
for(c in 0:9){
for(d in 0:3){
for(e in 0:9){
mydata=try(read.csv(paste("201",a,b,c,d,e,".csv",sep="")), silent=TRUE)
if(class(mydata)=="try-error"){next}
else{
mydata$Data <- do.call(paste, c(mydata[c("LAST_UPDATE_DT","px_last")], sep=""))
print(t(mydata[,c('X','Data')]))
}
}}}}}
That's a really terrible way to read in all your files. Try this:
f <- list.files(pattern="\\.csv$") # pattern is a regular expression, not a glob
mydata <- sapply(f, read.csv, simplify=FALSE)
This will return a list mydata of data frames, each of which is the contents of the corresponding file.
Or, if there are other csv files that you don't want to read in, you can restrict the specification:
f <- list.files(pattern="201\\d{5}\\.csv")
And to combine everything into one big data frame:
do.call(rbind, mydata)
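If the date-based names still need to be generated, a possible sketch is to build only real calendar dates with seq() and skip the ones without a file, instead of nesting five loops over digits. The two demo files, the one-week date range, and the helper name read_safely are assumptions for illustration.

```r
# Assumption: two demo files named by date are written to a temp folder.
demo_dir <- file.path(tempdir(), "date_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (d in c("20130102", "20130104")) {
  write.csv(data.frame(LAST_UPDATE_DT = d, px_last = 1.5),
            file.path(demo_dir, paste0(d, ".csv")),
            row.names = FALSE)
}

# seq() on Date objects only produces dates that actually exist.
dates <- seq(as.Date("2013-01-01"), as.Date("2013-01-07"), by = "day")
paths <- file.path(demo_dir, paste0(format(dates, "%Y%m%d"), ".csv"))

# Keep only the dates for which a file is present.
paths <- paths[file.exists(paths)]

# tryCatch turns an unreadable file into NULL instead of stopping the run.
read_safely <- function(p) tryCatch(read.csv(p), error = function(e) NULL)
mydata <- Filter(Negate(is.null), lapply(paths, read_safely))
```

Checking file.exists() up front handles missing dates, and tryCatch() covers files that exist but cannot be parsed, so options(warn=2) and try() are not needed.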

Resources