R Project: Loop calculation on files in folder - r

I have a bunch of text files in a folder. The script should read the whole list and do a calculation on each file. The results should be written to a "results.txt" file. I also want the name of each processed file in the results table next to its result, but that part is still missing and I don't know how to add it.
I am pretty far, but now I am stuck:
library(data.table)
ldf <- list() # creates a list
list_txt <- dir(pattern = "*.txt")
for (k in 1:length(list_txt)){
ldf[[k]] <- fread(list_txt[k], select = c("Count"))
br=c(0,1,3,9,15,500) #Set breaks
bins=c(0,1,2,3,4) #Set bins
freq=hist(ldf[[k]]$Count, breaks=br, plot=FALSE)
df=data.frame(bins, frequency=freq$counts)
df$pct <- df$frequency*100 / sum(df$frequency)
df$pct<-round(df$pct,digits=0)
df$hscore<-df$pct * df$bins
hscore=sum(df$hscore)
cat(df$hscore,file="results.txt",sep="\n")
}
The error code I get is:
Error in hist.default(ldf[[k]]$Count, breaks = br, plot = FALSE) :
some 'x' not counted; maybe 'breaks' do not span range of 'x'
Any suggestions?

I tried a bit more and came to this code, which works without error messages:
library(data.table)
ldf <- list() # creates a list
list_txt <- dir(pattern = "*.txt")
for (k in 1:length(list_txt)){
ldf[[k]] <- fread(list_txt[k], select = c("Count"))
br=c(-Inf,1,3,9,15,Inf) #Set breaks
bins=c(0,1,2,3,4) #Set bins
freq=hist(ldf[[k]]$Count, breaks=br, plot=FALSE)
df=data.frame(bins, frequency=freq$counts)
df$pct <- df$frequency*100 / sum(df$frequency)
df$pct<-round(df$pct,digits=0)
df$hscore<-df$pct * df$bins
hscore=sum(df$hscore)
cat(df$hscore,file="results.txt",sep="\n")
}
But it creates a results.txt file with only 5 entries, although list_txt contains 164 files.
What could be the problem?
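One likely cause, offered as a sketch rather than a confirmed fix: cat() is called with the same file name on every pass, so each file overwrites the previous output, and it writes df$hscore (the five per-bin values) rather than the summed hscore. The version below collects one score per file and writes the table once, with the file names included. It swaps hist() for table(cut()) to count values per bin. The helper name hscore_from_counts, the tab-separated output, and the exclusion of results.txt from the input list are assumptions, not part of the original script.

```r
library(data.table)

br   <- c(-Inf, 1, 3, 9, 15, Inf)  # bin edges, intervals are (lower, upper]
bins <- c(0, 1, 2, 3, 4)           # weight of each bin

# Hypothetical helper: turn a vector of counts into a single score.
hscore_from_counts <- function(counts) {
  freq <- as.vector(table(cut(counts, breaks = br)))  # values per bin
  pct  <- round(freq * 100 / sum(freq), digits = 0)
  sum(pct * bins)
}

# Skip results.txt so a rerun does not read its own output as input.
list_txt <- setdiff(dir(pattern = "\\.txt$"), "results.txt")

# One score per file; assumes every file has a "Count" column.
hscores <- vapply(list_txt, function(f) {
  hscore_from_counts(fread(f, select = "Count")$Count)
}, numeric(1), USE.NAMES = FALSE)

# Write the whole table once, after the loop, with the file names included.
results <- data.frame(file = list_txt, hscore = hscores)
write.table(results, "results.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

If appending inside the loop is preferred instead, cat() accepts append = TRUE, but building the table first keeps file name and score together in one row.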

Related

R function displays object "obser" but won't write.CSV

My code works and displays the correct values on screen for print(obser), but it will not write the .csv, and when I tried str(obser) it gave the error: object 'obser' not found. I have checked various online help pages and books, and as far as I can tell the function is written correctly.
If, instead of running the function in the RStudio console, I run it line by line in the script pane, will the csv then be created?
complete <- function(directory= "specdata", id = 1:332){
# directory <- "specdata"
# id <- 1:332
files_list <- list.files(path=directory,full.names=T)[id]
NumOfFiles <- length(files_list)
obser <- data.frame()
indivFile <-data.frame()
nobserv <- vector(mode= "integer", length = NumOfFiles)
for (i in 1:NumOfFiles){
indivFile <- read.csv(files_list[i]) # read data file into df inc NA's
indivFile <- na.omit(indivFile) # removes NA prev file
x <- nrow(indivFile[1])
nobserv[i] <- x
}
x_name <-"ID"
y_name <-"nobs"
obser <- data.frame(id, nobserv)
return(obser) # object returned
print(obser)
wd <- getwd()
setwd(wd)
write.csv(obser, file="Observations2.csv")
}
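return(obser) exits the function immediately, so the print, setwd and write.csv calls after it never run, and obser only exists inside the function. Save the object returned from the function and write that instead: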
output<-complete("specdata",id=1:332)
write.csv(output,"Observations2.csv",row.names=F)
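A minimal illustration of why the placement of return() matters, using made-up function names (with_early_return, with_late_return) and temporary files rather than the original data:

```r
# Both functions try to write a CSV; only the second one reaches write.csv().
with_early_return <- function(path) {
  obser <- data.frame(id = 1:3, nobs = c(10L, 20L, 30L))
  return(obser)                              # function exits here
  write.csv(obser, path, row.names = FALSE)  # never executed
}

with_late_return <- function(path) {
  obser <- data.frame(id = 1:3, nobs = c(10L, 20L, 30L))
  write.csv(obser, path, row.names = FALSE)  # runs before returning
  obser                                      # last value is returned
}

early_path <- tempfile(fileext = ".csv")
late_path  <- tempfile(fileext = ".csv")
early <- with_early_return(early_path)
late  <- with_late_return(late_path)

file.exists(early_path)  # FALSE: the write was skipped
file.exists(late_path)   # TRUE
```

Both calls return the same data frame; the only difference is whether the file gets written along the way.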

rbind txt files from online directory (R)

I am trying to concatenate text files from a URL, but I don't know how to do this with the HTML and the different folders.
This is the code I tried, but it only lists the text files and returns a lot of HTML code. How do I fix this so that I can combine the text files into one CSV file?
library(RCurl)
url <- "http://weather.ggy.uga.edu/data/daily/"
dir <- getURL(url, dirlistonly = T)
filenames <- unlist(strsplit(dir,"\n")) #split into filenames
#append the files one after another
for (i in 1:length(filenames)) {
file <- paste(url, filenames[i], sep = '') #concatenate for url
if (i==1){
cp <- read_delim(file, header=F, delim=',')
}
else{
temp <- read_delim(file,header=F,delim=',')
cp <- rbind(cp,temp) #append to existing file
rm(temp)# remove the temporary file
}
}
Here is a code snippet that I got to work for me. I prefer rvest over RCurl, simply because that's what I've learned. In this case, I was able to use the html_nodes function to isolate each link ending in .txt. The resulting table has the times saved as character strings, but you could fix that later. Let me know if you have any questions.
library(rvest)
library(readr)
url <- "http://weather.ggy.uga.edu/data/daily/"
doc <- xml2::read_html(url)
text <- rvest::html_text(rvest::html_nodes(doc, "tr td a:contains('.txt')"))
# define column types of fwf data ("c" = character, "n" = number)
ctypes <- paste0("c", paste0(rep("n",11), collapse = ""))
data <- data.frame()
for (i in 1:2){ # first two files only; use seq_along(text) for all of them
file <- paste0(url, text[i])
date <- as.Date(read_lines(file, n_max = 1), "%m/%d/%y")
# Read file to determine widths
columns <- fwf_empty(file, skip = 3)
# Manually expand `solar` column to be 3 spaces wider
columns$begin[8] <- columns$begin[8] - 3
data <- rbind(data, cbind(date,read_fwf(file, columns,
skip = 3, col_types = ctypes)))
}

Expected result is not printed for a for loop

I have 14 raster files in Tiff format and I want to read values of a series of pixels (same file location). However, when I ran the R code, the expected results did not show up. Could you tell me why?
#set working directory#
path <- 'E:/TSL_VCF/Tiffs'
setwd(path)
#list tiff files in the working directory#
list.files(path, pattern = 'tif')
#count the number of tiff files#
mylist <- list.files(path, pattern = 'tif')
mylength <- length(mylist)
#get values for certain "location"#
for (i in 1:mylength){
myraster <- raster(mylist[i])
mymatrix <- as.matrix(myraster)
mymatrix[1,771]
}
The results are not printed because the instruction
mymatrix[1,771]
is inside the "for" loop, where values are not printed automatically. This:
#set working directory#
path <- 'E:/TSL_VCF/Tiffs'
setwd(path)
#list tiff files in the working directory#
list.files(path, pattern = 'tif')
#count the number of tiff files#
mylist <- list.files(path, pattern = 'tif')
mylength <- length(mylist)
#get values for certain "location"#
for (i in 1:mylength){
myraster <- raster(mylist[i])
mymatrix <- as.matrix(myraster)
print(mymatrix[1,771])
}
should work.
However, this only shows the values on screen; it does not store them anywhere.
I'd suggest having a look at the extract function of the raster package for a better solution. If you build a raster stack in advance using something like:
mystack <- stack(mylist)
you can also avoid looping over the files and just do something like:
result <- extract(mystack, cellFromRowCol(mystack, 1, 771))
and you should get the results in the "result" variable.
HTH,
Lorenzo
You need to explicitly call print if you want R to print stuff in a loop. For example:
m = rnorm(10)
for (i in 1:10) m[i] # doesn't print
for (i in 1:10) print(m[i]) # print
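If the values are needed afterwards rather than only shown on screen, one option is to collect them in a vector instead of printing. A small sketch with stand-in matrices in place of the raster files (the list name mats and the pixel position [1, 2] are illustrative only):

```r
set.seed(1)
# Stand-ins for the matrices obtained from each raster file.
mats <- lapply(1:3, function(i) matrix(rnorm(6), nrow = 2, ncol = 3))

# Pre-allocate the result, then fill one element per iteration.
values <- numeric(length(mats))
for (i in seq_along(mats)) {
  values[i] <- mats[[i]][1, 2]
}

# Equivalent without an explicit loop.
values2 <- vapply(mats, function(m) m[1, 2], numeric(1))
```

After the loop, values holds one number per file and can be printed, plotted or written out in one go.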

how to convert results of a for loop to a list?

I have several files in a directory. I can read them like this:
files <- list.files("C:\\New folder", "*.bin",full.names=TRUE)
for (i in 1:length(files)) {
conne <- file(files[i], "rb")
file <- readBin(conne, double(), size=4, n=300*700, signed=TRUE)
file2 <- matrix(data=file,ncol=700,nrow=300)
}
How can I put all the matrices (file2) in a list?
For instance:
m1<-matrix(nrow=4,ncol=2,data=runif(8))
m2<-matrix(nrow=4,ncol=2,data=runif(8))
I put them in a list as:
ml <- list(m1, m2)
In addition to akrun's answer, you could also just put them in a list to begin with by taking advantage of the lapply function. Modifying your code just slightly, it would look like this:
files <- list.files("C:\\New folder", "*.bin",full.names=TRUE)
dat <- lapply(1:length(files), function(i) {
conne <- file(files[i], "rb")
file <- readBin(conne, double(), size=4, n=300*700, signed=TRUE)
file2 <- matrix(data=file,ncol=700,nrow=300)
close(conne) # as indicated in the comments below
return(file2)
})
dat is now a list of all of your matrices. lapply acts as a loop, much like for, and will pass each iteration of its first argument, here 1:length(files), to the function as a parameter. The returned value it gets from the function will be passed to the list called dat as its own element.
Assuming that the OP created objects 'm1', 'm2', etc. in the global environment, we can use mget to get the values of the objects in a list by specifying the pattern argument of ls as 'm' followed by digits (\\d+).
mget(ls(pattern='m\\d+'))
If the question is how to split up a large matrix m into chunks of n rows:
n <- 4
lapply(split(seq_len(nrow(m)),
as.numeric(gl(nrow(m), n, nrow(m)))), function(i) m[i,])

why R does not recognise columns and rows as they are supposed to be?

The code given below is meant to extract the values of one region and write them to a text file (there are 365 binary files in the directory).
The problem I am facing is that all my files are binary, each with 360 rows and 720 columns.
I specified that in this line:
file2 <- matrix(data=file,ncol=720,nrow=360)
but I got an error:
Error in mean(file2[X, Y], na.rm = TRUE) : subscript out of bounds.
and then I rewrote the above line as:
file2 <- matrix(data=file,ncol=360,nrow=720)
I put ncol=360 and nrow=720, which is not right. That ran without any error, but the results were not correct.
X <- c(364:369) ; Y <- c(82:92) ##### for sellected region
extract <- vector()
dir1 <- list.files("C:\\Users\\Climate_Rad_f_GAMMA_%d.img", full.names = TRUE)
listfile<-dir1()
for (i in c(1:365))
{
conne <- file(listfile[i], "rb")
file <- readBin(conne, double(), size=4, n=720*360, signed=T)
file2 <- matrix(data=file,ncol=720,nrow=360)
extract[i] <- mean(file2[X,Y],na.rm=TRUE)
close(conne)
write.table(extract,"C:\\Users\\sam.txt")
}
In this code:
mean(file2[X,Y],na.rm=TRUE)
You are selecting rows X and columns Y, but X runs up to 369 and the matrix has only 360 rows. You might want this instead:
mean(file2[Y,X],na.rm=TRUE)
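A quick way to check which index goes where, using a dummy matrix of the stated size (360 rows by 720 columns) in place of the real file contents:

```r
# Dummy data with the same shape as one file: 360 rows, 720 columns.
file2 <- matrix(seq_len(360 * 720), nrow = 360, ncol = 720)

X <- 364:369  # exceeds 360, so it can only index columns
Y <- 82:92    # fits within the 360 rows

# Rows come first, columns second: file2[X, Y] asks for rows 364..369 and fails.
bad <- tryCatch(file2[X, Y], error = function(e) conditionMessage(e))

# Swapping the indices selects rows 82..92 and columns 364..369.
region <- file2[Y, X]
dim(region)  # 11 rows, 6 columns
```

Note that matrix() fills column by column by default. Whether byrow = TRUE is also needed depends on how the binary files were written, which the post does not say.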
