Iteration/loop through imported data in R

I have created a working process below that performs baseline correction on the graph for a given data set.
library(baseline)
setwd("C:/Users/o/OneDrive/Desktop")
importData = read.delim("OSJH103h.txt", header=F)
matrixData = as.matrix(importData)
swappedColRow = t(matrixData)
row.names(swappedColRow) = c(1,2)
removedColumn = swappedColRow[-c(1),]
matrixRemovedCol = as.matrix(removedColumn)
swappedMatrix = t(matrixRemovedCol)
bc.irls = baseline(swappedMatrix, lambda=2, hwi=100, it=10, int=2000, method = 'fillPeaks')
mf = getCorrected(bc.irls)
mf2d=data.frame(ys=mf[1,], xs=importData$V1)
par(mfrow=c(1,1))
plot(x=mf2d$xs, y=smooth(mf2d$ys), col=2, type="l")
How would I import multiple data files that could be iterated/looped through and remove the baseline for each given dataset?
I have outlined a method for importing all the .txt files in a given directory.
temp = list.files(pattern="*.txt")
myfiles = lapply(temp, read.delim, header=FALSE)
The files are imported as [[1]], [[2]], [[3]]...
Thus replacing 'importData' with myfiles[[2]] yields the same result.
I am looking for a way to import ~10-15 data sets at a time and remove the baseline from each. Then, ideally, export the corrected data to a separate txt file.
I hope this makes sense. Any help would be appreciated.

Perhaps this:
library(baseline)
temp = list.files(pattern="*.txt")
reproc_base <- function(temp) {
importData = lapply(temp, read.delim, header=FALSE)
matrixData = lapply(importData,as.matrix)
swappedColRow = lapply(matrixData, t)
swappedColRow = lapply(swappedColRow, function(x) {row.names(x) <- c(1,2); x})
# drop the first row of each matrix, mirroring the single-file version
# (anonymous-function idiom from SO question 12664430)
removedColumn = lapply(swappedColRow, function(x) x[-1, ])
matrixRemovedCol = lapply(removedColumn, as.matrix)
swappedMatrix = lapply(matrixRemovedCol, t)
bc.irls = lapply(swappedMatrix, baseline, lambda=2, hwi=100, it=10, int=2000, method = 'fillPeaks')
mf = lapply(bc.irls, getCorrected)
return(mf)
}
# while debugging with debugonce(reproc_base), you'll probably want to start with just one file
debugonce(reproc_base)
test_mf <- reproc_base(temp[1])
Well, as you can see, there are a couple of constructs I'm uncertain about. Play with it under debugonce(reproc_base) or debug(reproc_base) and let's see where it breaks. The anonymous-function idiom comes from SO question 12664430.
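If the goal is also to export each corrected data set to its own file, the returned list can be paired with output file names. This is only an untested sketch on top of the function above; the "_corrected.txt" naming is purely illustrative.
out_names <- sub("\\.txt$", "_corrected.txt", temp)
mf_list <- reproc_base(temp)
Map(function(m, nm) write.table(t(m), nm, sep = "\t",
    row.names = FALSE, col.names = FALSE),
    mf_list, out_names)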

My solution, in case anyone is interested:
library(baseline)
library(stringr)
temp = list.files(pattern="*.txt", full.names = T)
myfiles = lapply(temp, read.delim, header=FALSE)
for (i in 1:length(temp)){
matrixData = as.matrix(myfiles[[i]])
swappedColRow = t(matrixData)
row.names(swappedColRow) = c(1,2)
removedColumn = swappedColRow[-c(1),]
matrixRemovedCol = as.matrix(removedColumn)
swappedMatrix = t(matrixRemovedCol)
bc.irls = baseline(swappedMatrix, lambda=2, hwi=100, it=10, int=2000, method = 'fillPeaks')
# plot(bc.irls)
mf = getCorrected(bc.irls)
mf2d=data.frame(xs=myfiles[[i]]$V1, ys=mf[1,])
par(mfrow=c(1,1))
plot(x=mf2d$xs, y=smooth(mf2d$ys), col=2, type="l")
teststr <- temp[i]
str_sub(teststr, 1, 2) <- ""                    # drop the leading "./"
str_sub(teststr, -4, str_length(teststr)) <- "" # drop the ".txt" extension
teststr
write.csv(mf2d,paste0(teststr," BLC.csv"), row.names = FALSE)
}
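As an aside (a sketch only, not part of the original solution): the file-name trimming above can also be done without stringr, using base R helpers, which avoids hard-coding the prefix and extension lengths.
stem <- tools::file_path_sans_ext(basename(temp[i]))  # e.g. "OSJH103h"
write.csv(mf2d, paste0(stem, " BLC.csv"), row.names = FALSE)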

Related

How to simultaneously perform same code on multiple datasets of different lengths in Rstudio?

I need to use the functions detrend() and chron() from the dplR package on >300 tree-ring width datasets (.rwl files) of differing lengths. Rather than copying and pasting the code for each object, I would like to process them all at once. After some googling, it looks like I need to write a for loop, but I have not had much luck troubleshooting it. Could someone point me in the right direction? Below is my current code.
##read data files in
or001 <- read.rwl("or001.rwl", format = "tucson")
or002 <- read.rwl("or002.rwl", format = "tucson")
or004 <- read.rwl("or004.rwl", format = "tucson")
#detrend - negex method
or001.negex <- detrend(or001, nyrs = NULL, method = "ModNegExp", f = 0.5,
pos.slope = FALSE)
or002.negex <- detrend(or002, nyrs = NULL, method = "ModNegExp", f = 0.5,
pos.slope = FALSE)
or004.negex <- detrend(or004, nyrs = NULL, method = "ModNegExp", f = 0.5,
pos.slope = FALSE)
#build final chronology
or001.negex.crn <- chron(or001.negex, prefix = 'OR')
or002.negex.crn <- chron(or002.negex, prefix = 'OR')
or004.negex.crn <- chron(or004.negex, prefix = 'OR')
#export final chronologies
write_excel_csv(or001.negex.crn, path = "or001.negex.crn.csv")
write_excel_csv(or002.negex.crn, path = "or002.negex.crn.csv")
write_excel_csv(or004.negex.crn, path = "or004.negex.crn.csv")
Consider reading the datasets into a list and applying the same steps to each by wrapping them in a function ('f1'):
library(dplR)   # read.rwl, detrend, chron
library(readr)  # write_excel_csv
f1 <- function(file, filenm) {
dat <- read.rwl(file, format = "tucson")
negex <- detrend(dat, nyrs = NULL, method = "ModNegExp", f = 0.5,
pos.slope = FALSE)
negex.crn <- chron(negex, prefix = 'OR')
write_excel_csv(negex.crn, path = filenm)
return(negex.crn)
}
# // get all the files with the `.rwl` pattern
# // from the current working directory
files <- list.files(pattern = "\\.rwl$", full.names = TRUE)
# // change the file names by replacing the .rwl suffix with .negex.crn.csv
nm1 <- sub("\\.rwl$", ".negex.crn.csv", basename(files))
# // loop over the files and apply the function
Map(f1, file = files, filenm = nm1)
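A small usage note (my addition, assuming the .rwl files are present): Map() returns a list of the chronologies, named after the input file paths, so the individual results can still be inspected afterwards.
res <- Map(f1, file = files, filenm = nm1)
str(res[[1]])  # look at the first chronology
names(res)     # the input file paths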

For-loop in R to create a new file (but gives incorrect/unexpected output)

I'm currently busy with some data and I need to check their validity.
Therefore, I would like to use a for-loop to go through all my data files.
In this for-loop, I would like to calculate some things (like mean, min,max...).
My code below runs, but produces an incorrectly written CSV file. The problem occurs after the calculations are done, when the CSV file is created. The CSV looks like this:
"c.1..1..1004.89081855716..630.174466667434..461.738905906677.." "c.1..1..950.990843858612..479.98560814955..517.955102920532.."
1 1
1 1
1004.89081855716 950.990843858612
630.174466667434 479.98560814955
461.738905906677 517.955102920532
1535.86795806885 1452.30199813843
-13.3948961645365 3.72026950120926
1259.26423788071 1159.17089223862
Approach/What I'm expecting:
So I start from some data files with eye tracking data in them.
As you can see at the beginning of the code, I try to get some values out of this eye tracking data (validity, a new data set with only validity == 1 data...). Once I have created the filtered_data data frame, I want to calculate some extra values from it (mean, sd, min/max).
My plan is to create a new CSV file (validity_loop.csv) in which I can find all my calculations (validity_left, validity_right, mean_eye_x, mean_eye_y, min_eye_x, max_eye_x, min_eye_y, max_eye_y), all in one row per data set (file_list[i]).
Can someone help me tackle and solve this issue?
Here is my code:
set <- setwd("/Users/Sarah/Documents")
file_list <- list.files(set, pattern = ".csv", all.files = TRUE)
validity_list <- data_list <- vector("list", "length" = length(file_list))
for(i in seq_along(file_list)){
filename = file_list[i]
#read files
data_frame = read.csv(filename, sep = ",", dec = ".",
header = TRUE,
stringsAsFactors = FALSE)
#what has to be done
#validity
validity_left <- mean(is.numeric(data_frame$left_gaze_point_validity))
validity_right <-mean(is.numeric(data_frame$right_gaze_point_validity))
#Clean data frame (validity == 1)
to_keep = which(data_frame$left_gaze_point_validity == 1 &
data_frame$right_gaze_point_validity==1)
filtered_data = data_frame[to_keep,]
filtered_data$left_eye_x = as.numeric(filtered_data$left_eye_x)
filtered_data$left_eye_y = as.numeric(filtered_data$left_eye_y)
filtered_data$right_eye_x = as.numeric(filtered_data$right_eye_x)
filtered_data$right_eye_y = as.numeric(filtered_data$right_eye_y)
#1 eye-data
filtered_data$eye_x <- (filtered_data$left_eye_x+filtered_data$right_eye_x)/2
filtered_data$eye_y <- (filtered_data$left_eye_y+filtered_data$right_eye_y)/2
#Pixels
filtered_data$eye_x <- (filtered_data$eye_x)*1920
filtered_data$eye_y <- (filtered_data$eye_y)*1080
#SD and Mean + min-max
mean_eye_x<- mean(filtered_data$eye_x)
mean_eye_y <- mean(filtered_data$eye_y)
sd_eye_x <- sd(filtered_data$eye_x)
sd_eye_y <- sd(filtered_data$eye_y)
min_eye_x <- min(filtered_data$eye_x)
min_eye_y <- min(filtered_data$eye_y)
max_eye_x <- max(filtered_data$eye_x)
max_eye_y <- max(filtered_data$eye_y)
#add everything to new file
validity_list[[i]] <- c(validity_left, validity_right,
mean_eye_x, mean_eye_y,
min_eye_x, min_eye_y,
max_eye_x, max_eye_y)
}
#new document
write.table(validity_list,
file = "Master T&O/Thesis /Loop/Validity/validity_loop.csv",
col.names = TRUE, row.names = FALSE)
I managed to get a new object in R which contains the values of my validity_list in matrix form.
#FOR LOOP attempt 2
set <- setwd("/Users/Sarah/Documents/Master T&O/Thesis /Loop")
file_list <- list.files(set, pattern = ".csv", all.files = TRUE)
validity_list <- vector("list", "length" = length(file_list))
for(i in seq_along(file_list)){
filename = file_list[i]
#read files
data_frame = read.csv(filename, sep = ",", dec = ".", header = TRUE, stringsAsFactors = FALSE)
#what has to be done
#validity
validity_left <- mean(is.numeric(data_frame$left_gaze_point_validity))
validity_right <-mean(is.numeric(data_frame$right_gaze_point_validity))
#Clean data frame (validity == 1)
to_keep = which(data_frame$left_gaze_point_validity == 1 & data_frame$right_gaze_point_validity==1)
filtered_data = data_frame[to_keep,]
filtered_data$left_eye_x = as.numeric(filtered_data$left_eye_x)
filtered_data$left_eye_y = as.numeric(filtered_data$left_eye_y)
filtered_data$right_eye_x = as.numeric(filtered_data$right_eye_x)
filtered_data$right_eye_y = as.numeric(filtered_data$right_eye_y)
#1 eye-data
filtered_data$eye_x <- (filtered_data$left_eye_x+filtered_data$right_eye_x)/2
filtered_data$eye_y <- (filtered_data$left_eye_y+filtered_data$right_eye_y)/2
#Pixels
filtered_data$eye_x <- (filtered_data$eye_x)*1920
filtered_data$eye_y <- (filtered_data$eye_y)*1080
#SD and Mean + min-max
mean_eye_x<- mean(filtered_data$eye_x)
mean_eye_y <- mean(filtered_data$eye_y)
sd_eye_x <- sd(filtered_data$eye_x)
sd_eye_y <- sd(filtered_data$eye_y)
min_eye_x <- min(filtered_data$eye_x)
min_eye_y <- min(filtered_data$eye_y)
max_eye_x <- max(filtered_data$eye_x)
max_eye_y <- max(filtered_data$eye_y)
#add everything to new file
validity_list[[i]] <- c(validity_left, validity_right,mean_eye_x, mean_eye_y, min_eye_x,max_eye_x,min_eye_y,max_eye_y)
validity_matrix <- matrix(unlist(validity_list), ncol = 8, byrow = TRUE)
}
#new document
write.table(validity_matrix, file = "/Users/Sarah/Documents/Master T&O/Thesis /Loop/Validity/validity_loop.csv", dec = ".")
The only problem I have now is that the values of the validity_list items are wrong, but that's another issue and I'm trying to fix it!
If I understand correctly, the following line gathers all your data together:
validity_list[[i]] <- c(validity_left, validity_right, mean_eye_x,
mean_eye_y, min_eye_x, max_eye_x, min_eye_y, max_eye_y)
If this were Python, the equivalent would be:
validity_list = (validity_left, validity_right, mean_eye_x,
mean_eye_y, min_eye_x, max_eye_x, min_eye_y, max_eye_y)
where the '=' tells the interpreter that everything after it is a tuple '(', data, ')', which makes it one single data set, and writing it out then ends up in one column. If you instead pick the values out in a for-loop, each one (e.g. validity_left) gets written to its own column. Would adding something like the following to your code be an option?
for item in validity_list:
    # process each item and write it to its own column, etc.
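For what it's worth, here is a base-R sketch of one way to get one labelled row per file (my own suggestion, not part of the answer above). Inside the loop, it replaces the validity_list[[i]] <- c(...) line; after the loop, it replaces the write.table() call. Column names come from the variable names that already exist in the question's code.
validity_list[[i]] <- data.frame(file = filename,
                                 validity_left, validity_right,
                                 mean_eye_x, mean_eye_y,
                                 min_eye_x, max_eye_x,
                                 min_eye_y, max_eye_y)
# ...after the loop:
validity_df <- do.call(rbind, validity_list)
write.csv(validity_df, "validity_loop.csv", row.names = FALSE)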

write results sequentially in a loop in r

I have a bunch of single files to which I need to apply a test. I need to find a way to automatically write the results for each file into one output file. Here is what I do:
library(ape)
library(xts)   # for as.xts()
stud_files <- list.files("path/dir/data", full.names = T)
for (f in stud_files) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.frame(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
write.dna(res,file = "res_testa.xls",format = "sequential")
}
This loop works well, except for the last command, which is meant to write the results of each file consecutively: it only saves the last result. The results are also saved as a string, not as the table (data.frame) I defined above. Any idea in this case? Thanks in advance.
Check help(write.dna).
write.dna(x, file, format = "interleaved", append = FALSE,
nbcol = 6, colsep = " ", colw = 10, indent = NULL,
blocksep = 1)
append: a logical; if TRUE the data are appended to the file without erasing the data possibly existing in the file, otherwise the file (if it exists) is overwritten (FALSE, the default).
Set append = TRUE and you should be all set.
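If you do go the append route, note that res is a plain data frame, so a write.table() call inside the loop (a sketch of an alternative, swapping out write.dna) may be more natural; the header is written only when the file does not exist yet:
first <- !file.exists("res_testa.csv")
write.table(res, file = "res_testa.csv", sep = ",",
            append = !first, col.names = first,
            row.names = FALSE, quote = FALSE)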
As some of the comments point out, however, you are probably better off generating your table, and then writing it all at once to a file. Unless you have billions of files, you likely won't run out of memory.
Here is how I would approach this.
library(ape)
library(data.table)
library(xts)   # for as.xts()
stud_files <- list.files("path/dir/data", full.names = T)
sumfunc <- function(f) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.table(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
return(res)
}
lres <- lapply(stud_files, sumfunc)
dat <- rbindlist(lres)
write.table(dat,
file = "res_testa.csv",
sep = ",",
quote = FALSE,
row.names = FALSE)
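Since data.table is already loaded, fwrite() would write the same table more concisely (a minor alternative, not part of the answer above):
fwrite(dat, "res_testa.csv")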

Generating new output filenames in for-loop

I want to write many raster files using a for loop.
path <- "D:/FolderA/FolderB/FolderC/FolderD/"
files1 <- c("FolderE1/raster.tif",
"FolderE2/raster.tif",
"FolderE3/raster.tif")
files2 <- c("FolderF1/raster.tif",
"FolderF2/raster.tif",
"FolderF3/raster.tif")
for (i in 1:length(files1)) {
raster1 <- raster(paste(path, files1[i], sep = ""), band = 1)
is.na(raster1[[0]])
raster2 <- raster(paste(path, files2[i], sep = ""), band = 1)
is.na(raster2[[0]])
mosaicraster <- mosaic(raster1, raster2, fun = mean)
NAvalue(mosaicraster) <- 0
outputfile <- paste(path, "mosaics/", files1[i], sep = "")
writeRaster(mosaicraster, outputfile, format = "GTiff", datatype = "INT1U", overwrite = TRUE)
print(c(i, "of", length(files1)))
}
How do I create, for each file, a new folder within "D:/FolderA/FolderB/FolderC/FolderD/mosaics/" that includes FolderE1/, FolderE2/, etc., plus the filename, e.g. mosaic.tif?
outputfile <- paste(path, "mosaics/", files1[i], sep = "")
Does not give a satisfying result.
Just to demonstrate one method of making folders within a loop: if you have the directory names in an object, you can simply loop over the elements of that object.
folders1 <- c("FolderE1",
"FolderE2",
"FolderE3")
for(i in folders1)
{
dir.create(i) #creates a dir named after the ith element of folders1
setwd(i) #goes into that directory
tiff('raster.tif') #plots your picture
plot(rnorm(10,rnorm(10)))
dev.off()
setwd('../') #goes out to the original folder
}
Just a warning: this is all a bit dangerous because mistakes can make a big mess.
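To address the original mosaics case more directly, here is a sketch that builds the nested output path with file.path() and creates the folder with dir.create(recursive = TRUE), avoiding setwd() entirely. It assumes the path, files1 and mosaicraster objects from the question and the raster package's writeRaster().
outputfile <- file.path(path, "mosaics", files1[i])
dir.create(dirname(outputfile), recursive = TRUE, showWarnings = FALSE)
writeRaster(mosaicraster, outputfile, format = "GTiff",
            datatype = "INT1U", overwrite = TRUE)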

Reading multiple csv of same format in a data frame

I need to run the same set of code for multiple CSV files, ideally with something like a macro. Below is the code that I am executing, but the results are not coming out properly: it reads the data in a 2-D format, while I need to work with it in a 3-D format.
lf = list.files(path = "D:/THD/data", pattern = ".csv",
full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
ds <- lapply(lf, read.table)
I don't know if this is going to be useful, but one of the ways I do this is:
##Step 1 read files
mycsv = dir(pattern=".csv")
n <- length(mycsv)
mylist <- vector("list", n)
for(i in 1:n) mylist[[i]] <- read.csv(mycsv[i],header = T)
Then I usually just use an apply function to change things. For example,
## Change column names
mylist <- lapply(mylist, function(x) {names(x) <- c("type","date","v1","v2","v3","v4","v5","v6","v7","v8","v9","v10","v11","v12","v13","v14","v15","v16","v17","v18","v19","v20","v21","v22","v23","v24","total") ; return(x)})
## change the type column for weekday/weekend
mylist <- lapply(mylist, function(x) {
f = c("we", "we", "wd", "wd", "wd", "wd", "wd")
x$type = rep(f,52, length.out = 365)
return(x)
})
and so on.
Then, after all the changes I have made, I save with the following code (it is also sometimes useful to split the original file name and rename each file with part of it, so that I can track the individual files later).
## for example, some of my files had a pattern in the file name such as "201_E424220_N563500.csv", so I split on "_" and save with a new name like this:
mylist <-lapply(1:length(mylist), function(i) {
mylist.i <- mylist[[i]]
s = strsplit(mycsv[i], "_" , fixed = TRUE)[[1]]
d = cbind(mylist.i[, c("type", "date")], ID = s[1], Easting = s[2], Northing = s[3], mylist.i[, 3:ncol(mylist.i)])
return(d)
})
for(i in 1:n)
write.csv(file = paste("file", i, ".csv", sep = ""), mylist[[i]], row.names = F)
I hope this will help. When you get some time, please read about the plyr package, as I am sure it will be very useful for you; it is a very handy package with lots of data-analysis options. plyr has apply functions such as:
## l_ply split list, apply function and discard result
## ldply split list, apply function and return result in data frame
## laply split list, apply function and return result in an array
for example, you can use ldply to read all your CSV files and return a single data frame, something like:
data = ldply(list.files(pattern = ".csv"), function(fname) {
j = read.csv(fname, header = T)
return(j)
})
So here data will be a single data frame containing the data from all your CSV files.
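A small extension of that example (my own sketch, assuming all the CSV files share the same columns): adding the file name as a column keeps each row traceable to its source file.
library(plyr)
data = ldply(list.files(pattern = ".csv"), function(fname) {
  j = read.csv(fname, header = T)
  j$source_file = fname  # extra column so rows stay traceable
  return(j)
})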
Thanks, Ayan
