Import multiple txt files into R

I am working with MODIS 8-day data and am trying to import all the txt files of one MODIS product into R, not as one single data.frame but as individual objects, so that I can later apply the same functions to each of them. The main objective is to extract specific elements from each txt file. I was successful in extracting the desired elements from one txt file with the following commands:
# selecting the elements within the table ([row, col] positions)
idxs <- gsub("\\]",")", gsub("\\[", "c(", "[24,175], [47,977], [159,520], [163,530]
,[165,721], [168,56], [217,820],[243,397],[252,991],[284,277],[292,673]
,[322,775], [369,832], [396,872], [434,986],[521,563],[522,717],[604,554]
,[608,50],[614,69],[752,213],[780,535],[786,898],[788,1008],[853,1159],[1014,785],[1078,1070]") )
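# the same [row, col] positions as a two-column index matrix, one row per element
lst <- rbind(c(24,175), c(47,977), c(159,520), c(163,530), c(165,721),
             c(168,56), c(217,820), c(243,397), c(252,991), c(284,277),
             c(292,673), c(322,775), c(369,832), c(396,872), c(434,986),
             c(521,563), c(522,717), c(604,554), c(608,50), c(614,69),
             c(752,213), c(780,535), c(786,898), c(788,1008), c(853,1159),
             c(1014,785), c(1078,1070))
# read the grid, skipping its 6 header lines (scan() returns the values line by line)
mat <- matrix(scan("lst.txt", skip = 6), nrow = 1200)
# indexing with a two-column matrix returns one value per [row, col] pair
Clist <- as.data.frame(mat[lst])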
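Two details of the code above are easy to miss, so here is a toy sketch (made-up values, not MODIS data). Indexing a matrix with a two-column matrix returns one element per [row, col] pair, and scan() returns the values of a text grid line by line, so the grid keeps the file's orientation only if the matrix is filled with byrow = TRUE. Whether that matters depends on how the [row, col] positions were worked out, which the question does not say.

```r
# Toy 3 x 4 matrix, filled column by column: m[1, 2] is 4 and m[3, 4] is 12
m <- matrix(1:12, nrow = 3)
idx <- rbind(c(1, 2), c(3, 4))  # one [row, col] pair per row
picked <- m[idx]                # the elements m[1, 2] and m[3, 4]

# A 2 x 3 grid written to a temporary file, then read back with scan()
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), tmp)
vals <- scan(tmp, quiet = TRUE)               # 1 2 3 4 5 6 (line by line)
g_col <- matrix(vals, nrow = 2)               # column-wise fill: g_col[2, 1] is 2
g_row <- matrix(vals, nrow = 2, byrow = TRUE) # same layout as the file: g_row[2, 1] is 4
```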
But I need these elements from all of the txt files, and honestly I do not want to run this manually 871 times. So I tried to read all the txt files and then apply this code to them, but unfortunately it does not work. Here is my approach:
folder <- "C:/Users/Documents/R/MODIS/txt/"
txt_files <- list.files(path=folder, pattern=".txt")
df= c(rep(data.frame(), length(txt_files)))
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(txt_files[i]))}
and this is the error I encounter:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'rastert_a2001361.txt': No such file or directory
Additional information: each txt file has 1200 rows and 1200 columns, and 20-30 elements need to be extracted from each table.
I am very much looking forward to your answers and appreciate any help or recommendations on this matter.

The issue is that list.files returns only the file names within the folder, not the full paths to the files. If your working directory is not "C:/Users/Documents/R/MODIS/txt/", read.table cannot find the files. Change your code to
for(i in 1:length(txt_files)) {df[[i]]<- as.list(read.table(file.path(folder, txt_files[i])))}
Now it should be working.
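file.path combines your folder and your file name with the path separator, so you do not have to paste them together yourself. Alternatively, call list.files with full.names = TRUE to get the full paths directly.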
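Putting the answer together with the extraction code from the question, a minimal sketch could look like this. It assumes each file has the 6 header lines that the question skips, a square grid (1200 x 1200 by default), and that `lst` is the two-column index matrix from the question; `extract_cells()` and `extract_all()` are hypothetical helper names. The sketch fills the matrix with byrow = TRUE to keep the file's own row/column orientation; drop that argument if your positions were worked out for the layout produced by the original matrix() call.

```r
# Hypothetical helper: read one grid file (6 header lines, then the values)
# and return the values at the [row, col] positions in the two-column matrix idx.
extract_cells <- function(path, idx, nrow = 1200) {
  # scan() returns the values line by line, hence byrow = TRUE
  mat <- matrix(scan(path, skip = 6, quiet = TRUE), nrow = nrow, byrow = TRUE)
  mat[idx]
}

# Hypothetical helper: apply extract_cells() to every .txt file in a folder.
extract_all <- function(folder, idx, nrow = 1200) {
  # full.names = TRUE returns complete paths, so the working directory is irrelevant
  txt_files <- list.files(path = folder, pattern = "\\.txt$", full.names = TRUE)
  values <- lapply(txt_files, extract_cells, idx = idx, nrow = nrow)
  names(values) <- basename(txt_files)
  # one row per file, one column per extracted element
  do.call(rbind, values)
}
```

With the question's objects this would be called as `result <- extract_all("C:/Users/Documents/R/MODIS/txt/", lst)`, giving a matrix with one row per file whose row names are the file names.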

Related

How to get past the following error "Error in readLines(filestocopy) : 'con' is not a connection"?

I am new to coding and very new to this forum, so I hope my request makes sense.
I am trying to select images listed in a .csv file and copy them to a new folder. The pictures and the .csv file are both in the folder GRA04. The .csv file contains only one column with the picture names.
I used the following code:
#set working directory
setwd("E:/2019/GRA04")
#create and identify a new folder in R
targetdir <- dir.create("GRA04_age")
#find the files you want to copy
filestocopy <- read.csv("age.csv", header=FALSE) #read csv as a data frame (only one column, each row being a file name)
filestocopy_v <- readLines(filestocopy) #convert the data frame into a character vector
filestocopy_v #shows the character vector
#copy the files to the new folder
file.copy(filestocopy_v, targetdir, recursive = TRUE)
When reaching the line
filestocopy_v <- readLines(filestocopy)
I get this error message:
Error in readLines(filestocopy) : 'con' is not a connection
I looked online for solutions with no luck. I ran this code before (or else something similar... didn't back it up...) and it worked fine, so I am not sure what is happening...
Thanks!
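readLines() expects a file name or a connection, but filestocopy is already a data frame returned by read.csv. Out of interest, would the following do what you're trying to achieve?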
filestocopy_v <- filestocopy[[1]]
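For completeness, a sketch of the whole copy step with that change, assuming (as in the question) that age.csv has no header and one column of file names relative to the working directory. It also keeps the folder name in its own variable, because dir.create() returns TRUE or FALSE rather than the path of the new folder. `copy_listed_files()` is a hypothetical name.

```r
# Hypothetical helper: copy the files listed in a one-column, header-less csv
# into target_dir; returns one TRUE/FALSE per file telling whether it was copied.
copy_listed_files <- function(csv_file, target_dir) {
  # read.csv() already returns a data frame; its first column holds the names
  filestocopy_v <- read.csv(csv_file, header = FALSE, stringsAsFactors = FALSE)[[1]]
  # dir.create() returns TRUE/FALSE, not the path, so use target_dir itself below
  dir.create(target_dir, showWarnings = FALSE)
  file.copy(filestocopy_v, target_dir)
}
```

In the question's setting this would be `setwd("E:/2019/GRA04")` followed by `copy_listed_files("age.csv", "GRA04_age")`.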

How to read in file with dynamic name while avoiding hard-coding in R?

I run into issues reading in csv files with dynamic names while avoiding hard-coding the file path. I'd like short, tidy code that is not hard-coded. If I hard-code the full path (everything before the "~"), it reads in the files fine. But with the soft-coded path (if that is the opposite of hard-coded), it gives an error, despite showing what looks like the correct path in the error message. I have two variable parts of the file name that I paste into the file name before reading it in. If I avoid paste and just type a path per individual file, it also works.
#dynamic part I usually have in a loop with all the options.
part_a <- "outside" #other options here in my loop include "inside"
part_b <- "late" # other option "early" or "preterm"
#reading in the df
df <-read.csv(paste0("~/Data/FromR/clean_",part_a,part_b,"_2016.csv"),
check.names=FALSE, na.strings="null")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:/Users/myname/Documents/Data/FromR/clean_outsidelate_2016.csv': No such file or directory
If I use getwd() in the first part of the paste in place of ~, as suggested here, it works, producing the string "C:/Users/myname/Documents/MyR_Projects/Specific_R_project/" at the beginning of the path. But how can I get it to work with "~"? When using ~, the path stops at the "Documents" folder...
The desired outcome is to read in the file without error perform functions and repeat with other files. My loop works fine hardcoded, and I only wanted to make it more general or softcoded.
I just tried to read a file (testFile.txt) in my home directory from a different working directory, and it works fine with ~:
myFile <- "testFile"
mymy <- ".txt"
ciao <- read.delim(paste0("~/",myFile,mymy))
In PowerShell you can use %~% (have a look at this thread), but I am not sure how to expand $HOME in R.
#-------- edit
Have a look here and here. Basically any variable defined in your .Renviron should be accessible.
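A sketch of one way to avoid both hard-coding and "~". On Windows, R expands "~" to the user's Documents folder, which is why the path in the error stops there; `path.expand("~")` shows what it expands to on a given machine. Since the question says the getwd()-based path works, the files appear to sit under the project directory, so a path relative to the working directory is enough. `read_clean()` is a hypothetical helper, and the Data/FromR layout and file-name pattern are taken from the question.

```r
# Hypothetical helper: build the file name from its variable parts and read it.
# The default data_dir is relative to the working directory (the project folder).
read_clean <- function(part_a, part_b, year = 2016,
                       data_dir = file.path("Data", "FromR")) {
  path <- file.path(data_dir, paste0("clean_", part_a, part_b, "_", year, ".csv"))
  if (!file.exists(path)) {
    # report the absolute path that was tried, to make path problems visible
    stop("File not found: ", normalizePath(path, mustWork = FALSE))
  }
  read.csv(path, check.names = FALSE, na.strings = "null")
}
```

Inside the loop over the name parts, each file is then read with `df <- read_clean(part_a, part_b)`, and the same code runs on any machine where the project folder is the working directory.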

Function reading in a folder of files returning "Error in file(con, "r"): cannot open connection..."

I'm trying to import a folder of files into R. The following code works for one folder that contains the same type of files, but will not work for another folder. The type of data is the same (both debian files formatted in the same way, just containing different subject's data).
The following code allows me to read all the files (named subject1-subject10) in the "Data1" folder and put it into a list named Data:
files <- as.character(list.files(path="/Users/wendy/Box Sync/Data1"))
data <- list()
for (i in seq_along(files)) {
data[[i]] <- readLines(files[[i]])
}
But the following code does not work - this folder (Data2) contains subject11 - subject50:
files <- as.character(list.files(path="/Users/wendy/Box Sync/Data2"))
data <- list()
for (i in seq_along(files)) {
data[[i]] <- readLines(files[[i]])
}
This brings up the following message:
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'subject11': No such file or directory
I'm confused, because both folders, containing their respective subject data are in the same file path, except for the last folder name in the path.
The second folder (Data2) differs only in the following ways:
Number of files in the folder
contains different subjects
There is more data (more variables) recorded in "Data2" (e.g. recording age, height, race in Data 2 versus only recording age and height in Data1)
If I were to put some of Data2's files into the Data1 folder and run the top code again, it will produce the same error message as when I run the second code chunk.
You should add the full.names option.
list.files(path="/Users/wendy/Box Sync/Data2", full.names = TRUE)
Without it, it only outputs the name of the files, and thus it works only if files with that exact file name are found in the current working directory.
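With that change, the loop from the question could be written as follows. This is a sketch with a hypothetical function name; it also names each list element after its file, so that subject11, subject12 and so on can be told apart later.

```r
# Hypothetical helper: read every file in `folder` into a named list of
# character vectors (one element per file, one string per line).
read_subject_files <- function(folder) {
  # full.names = TRUE returns complete paths, so the working directory is irrelevant
  files <- list.files(path = folder, full.names = TRUE)
  data <- lapply(files, readLines)
  names(data) <- basename(files)
  data
}
```

For the second folder this would be `data <- read_subject_files("/Users/wendy/Box Sync/Data2")`.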

Looping over a set of standardized files to collect information and save it in a different files

I have several files in a folder. They all have same layout and I have extracted the information I want from them.
So now, for each file, I want to write a .csv file and name it after the original input file and add "_output" to it.
However, I don't want to repeat this process manually for each file. I want to loop over them. I looked for help online and found lots of great tips, including many in here.
Here's what I tried:
#Set directory
dir = setwd("D:/FRhData/elb") #set directory
filelist = list.files(dir) #save file names into filelist
myfile = matrix()
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = readLines(filelist[i])
*code with all calculations*
write.csv(x = finalDF, file = paste (filename[i] ,"_output. csv")
}
Unfortunately, it didn't work out. Here's the error message I get:
Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
In addition: Warning message:
In myfile[i] <- readLines(filelist[i]) :
number of items to replace is not a multiple of replacement length
And 'report2016-03.txt' is the name of the first file the code should be executed on.
Does anyone know what I should do to correct this mistake - or any other possible mistakes you can foresee?
Thanks a lot.
======================================================================
Here's some of the resources I used:
https://www.r-bloggers.com/looping-through-files/
How to iterate over file names in a R script?
Looping through files in R
Loop in R loading files
How to loop through a folder of CSV files in R
This worked for me. I used a vector instead of a matrix, took out the readLines() call and used paste0 since there was no separator.
setwd("C:/R_projects") #set directory
dir <- getwd() #setwd() returns the previous directory, so ask getwd() for the new one
filelist = list.files(dir) #save file names into filelist
myfile = vector()
finalDF <- data.frame(a=3, b=2)
#Read files into R
for ( i in 1:length(filelist)){
myfile[i] = filelist[i]
write.csv(x = finalDF, file = paste0(myfile[i] ,"_output.csv"))
}
list.files(dir)
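A fuller sketch that keeps the original goal of reading each file, processing it and writing one output per input. `process_report()` is a hypothetical placeholder for the question's "code with all calculations", and the output name drops the .txt extension (report2016-03.txt becomes report2016-03_output.csv) using tools::file_path_sans_ext(). Working with full paths avoids setwd() altogether.

```r
# Hypothetical placeholder for the real calculations on one file's lines
process_report <- function(lines) {
  data.frame(n_lines = length(lines))
}

# Read each .txt file in `folder`, process it, and write <name>_output.csv next to it
write_outputs <- function(folder) {
  filelist <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
  out_files <- character(0)
  for (f in filelist) {
    finalDF <- process_report(readLines(f))
    # "report2016-03.txt" -> "report2016-03_output.csv", in the same folder
    out <- file.path(folder, paste0(tools::file_path_sans_ext(basename(f)), "_output.csv"))
    write.csv(finalDF, file = out, row.names = FALSE)
    out_files <- c(out_files, out)
  }
  invisible(out_files)
}
```

Calling `write_outputs("D:/FRhData/elb")` would then produce one csv per report; because the pattern only matches .txt files, a second run does not pick up the outputs of the first.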

coursera air pollution assignment

Using Mac OS 10.10.3
RStudio Version 0.98.1103
My working directory contains 332 .csv files, and I set it correctly. Here's the code:
pollutantmean <- function(directory, pollutant, id = 1:332) {
all_files <- list.files(directory, full.names = T)
dat <- data.frame()
for(i in id) {
dat <- rbind(dat, read.csv(all_files[i]))
}
ds <- (dat[, pollutant], na.rm = TRUE)
mean(ds[, pollutant])
}
Part of the assignment is to get the mean of the first 10 numeric values of a pollutant. To do this, I used the following call (where "specdata" is the directory with the 332 .csv files):
pollutantmean(specdata, "Nitrate", 1:10)
The error messages I get are:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'NA': No such file or directory
Like many students that have posed questions here, I’m new to programming and to R and still distant from getting any results when calling my function. There are many questions and answers about this coursera assignment in stack overflow but my review of these exchanges hasn't addressed the bug in my code.
Anyone have a suggestion how to fix the bug?
In addition to the other answers, you can try this:
all_files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
to avoid selecting any other kind of file.
Or even this one, which builds the file names directly from the ids (then loop over all_files itself, since it already contains only the requested files):
all_files <- file.path(directory, sprintf("%03d.csv", id))
I take the time to answer since the question comes back at every Coursera session.
First, be careful with the typo: call pollutantmean("specdata", "Nitrate", 1:10)
instead of pollutantmean(specdata, "Nitrate", 1:10).
Then your working directory should be the parent directory of "specdata" (for example, if your path was /dev/specdata, your working directory should have been /dev).
You can get the current working directory with getwd() and set the new one with setwd() (careful there, the path would be relative to the current working directory).
Add a line after all_files <- list.files(directory, full.names = TRUE) (it's a bad habit to use T instead of TRUE):
print(all_files)
Then call your function again, so you will see the content of that object. Then check where you are working with getwd().
Modify your line no. 5 to dat <- rbind(dat, read.csv(all_files[i], comment.char = ""))
This will bind the data of all csv files to 'dat' dataframe.
Based upon the information provided, it can be assumed there are not 332 files in the directory you specify (if one attempts to access an index of a vector that is out of bounds, an NA is returned - hence the error "cannot open file 'NA'"). This is suggestive that the path you are using (which is not provided) points to a directory which does not contain the csv files (presuming there truly are 332 files in that directory). Some suggestions:
Check that the directory you are providing is accurate. Simply do a list.files to see what files exist in the directory you are using.
Use the pattern argument of list.files to be sure you are only going to read the csv files
Loop over the files using the length of the vector returned from list.files, rather than having to code this manually
You can add a sanity check to be sure you are reading all files by printing out each file, or returning a list containing the results and file names
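The suggestions above can be combined into a sketch like the following. It assumes the files are named 001.csv to 332.csv, so that the alphabetical order returned by list.files() matches the monitor ids, and it replaces the question's `ds <- (dat[, pollutant], na.rm = TRUE)` line, which is not valid R, with a direct mean() call. The explicit check turns the confusing "cannot open file 'NA'" into a message that names the directory.

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # only csv files, with full paths so the working directory does not matter
  all_files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # fail early instead of indexing past the end of all_files (which gives NA)
  if (length(all_files) < max(id)) {
    stop("Found ", length(all_files), " csv files in '", directory,
         "' but id goes up to ", max(id), "; check getwd() and the directory name")
  }
  # read the requested files and stack them into one data frame
  dat <- do.call(rbind, lapply(all_files[id], read.csv))
  mean(dat[[pollutant]], na.rm = TRUE)
}
```

With the working directory set to the parent of specdata, the call from the question becomes `pollutantmean("specdata", "Nitrate", 1:10)`.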

Resources