Saving dataframe output to multiple folders - R

Pardon my newbieness, but I feel like I've arrived at my wit's end.
I have a (spatial polygon) dataframe (tri.extract) that houses all of my data. Every row in this dataframe corresponds to an image. Every image in the dataframe corresponds to a parcel and thus has an attribute parcel_id, which denotes which parcel does the image belong to. I wish to save all the images in sub-folders so that each image is in the folder of its respective parcel.
parcels <- data.frame(unique(tri.extract@data$parcel_id))
save.dir <- "/home/iordamo/Documents/GIS_Workload/bbox/DemoGrasslandTIMED_END_ImagesMapillary/"
# create sub-folders named after parcel_ids
for (i in 1:nrow(parcels)) {
  dir.create(paste0(save.dir, parcels[i,]))
}
# the save loop itself
for (i in 1:nrow(tri.extract@data)) {
  # generate the URL for each image in the dataframe
  img_url <- paste0('https://d1cuyjsrcm0gby.cloudfront.net/',
                    tri.extract@data$key[i],
                    '/thumb-2048.jpg')
  # create a dataframe of all the folder names within save.dir - the parcels
  dirs.to.save1 <- data.frame(list.files(save.dir, recursive = F))
  dirs.to.save1[] <- lapply(dirs.to.save1, as.character)
  for (g in 1:nrow(dirs.to.save1)) {
    if (g == 1) {
      row <- dirs.to.save1[g,]
      #print(row)
      img_path <- file.path(paste0(save.dir, row, "/"), paste0("i_", tri.extract@data$key[i], ".jpg"))
      download.file(img_url, img_path, quiet = TRUE, mode = "wb")
      #next
    }
    else if (g > 1) {
      row <- dirs.to.save1[g,]
      #print(row)
      img_path <- file.path(paste0(save.dir, row, "/"), paste0("i_", tri.extract@data$key[i], ".jpg"))
      download.file(img_url, img_path, quiet = TRUE, mode = "wb")
      #next
    }
  }
}
With the code in its current form, all of the images get saved in all of the sub-folders. Can anyone explain why? To my understanding, I am looping through each record of the dataframe (tri.extract), generating a URL, then (in the nested loop) looping through each parcel and creating a file.path from save.dir, each row of the dirs.to.save1 dataframe, and the respective image id (tri.extract@data$key[i]). I expected this to output each image into its respective folder, because I am looping through the folders in the nested loop. Can someone explain to me where my logic fails to translate into execution?

Ok, that wasn't too hard.
The solution turned out to be, as usual, simpler than what I originally conjured up:
for (i in 1:nrow(tri.extract@data)) {
  img_url <- paste0('https://d1cuyjsrcm0gby.cloudfront.net/',
                    tri.extract@data$key[i],
                    '/thumb-2048.jpg')
  for (g in 1:nrow(parcels)) {
    row <- droplevels(parcels[g,])
    if (tri.extract@data$parcel_id[i] == parcels[g,]) {
      img_path <- file.path(paste0(save.dir, row), paste0("i_", tri.extract@data$key[i], ".jpg"))
      download.file(img_url, img_path, quiet = TRUE, mode = "wb")
    }
  }
}
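For what it's worth, the inner loop over parcels isn't strictly needed: since parcel_id already names the destination folder, the path can be built directly. A minimal sketch with toy stand-in data (`dat` and its columns are hypothetical stand-ins for tri.extract@data, and the download call is left commented out since it needs a live URL):

```r
# Toy stand-in for tri.extract@data: `key` identifies the image, `parcel_id` its folder
dat <- data.frame(key = c("abc", "def"), parcel_id = c("p1", "p2"),
                  stringsAsFactors = FALSE)
save.dir <- tempdir()
for (p in unique(dat$parcel_id)) {
  dir.create(file.path(save.dir, p), showWarnings = FALSE)
}
# parcel_id names the destination folder directly, so no inner loop over parcels:
img_url  <- paste0('https://d1cuyjsrcm0gby.cloudfront.net/', dat$key, '/thumb-2048.jpg')
img_path <- file.path(save.dir, dat$parcel_id, paste0("i_", dat$key, ".jpg"))
# for (i in seq_along(img_url)) download.file(img_url[i], img_path[i], quiet = TRUE, mode = "wb")
basename(img_path)  # "i_abc.jpg" "i_def.jpg"
```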

Related

How to load .png images with image names listed in a .csv file to R

I am using the simple code below to append multiple images together with the R magick package. It works well, however, there are many images to process and their names are stored in a .csv file. Could anyone advise on how to load the image names into the image_read function from specific cells in a .csv file (see example below the code)? So far, I was not able to find anything appropriate that would solve this.
library(magick)
pic_A <- image_read('A.png')
pic_B <- image_read('B.png')
pic_C <- image_read('C.png')
combined <- c(pic_A, pic_B, pic_C)
combined <- image_scale(combined, "300x300")
image_info(combined)
final <- image_append(image_scale(combined, "x120"))
print(final)
image_write(final, "final.png") #to save
Something like this should work. If you load the csv into a dataframe, it's then straightforward to point image_read at the appropriate elements.
The index (row number) is included in the output filename so that files are not overwritten on each iteration.
library(magick)
file_list <- read.csv("your.csv", header = F)
names(file_list) <- c("A", "B", "C")
for (i in 1:nrow(file_list)) {
  pic_A <- image_read(file_list$A[i])
  pic_B <- image_read(file_list$B[i])
  pic_C <- image_read(file_list$C[i])
  combined <- c(pic_A, pic_B, pic_C)
  combined <- image_scale(combined, "300x300")
  image_info(combined)
  final <- image_append(image_scale(combined, "x120"))
  print(final)
  image_write(final, paste0("final_", i, ".png")) # to save
}

R: save each loop result into one data frame

I have written a loop in R (still learning). My purpose is to pick the max AvgConc and max Roll_TotDep from each file in the loop, and then have two data frames that each contain all the max numbers picked from the individual files. The code I wrote only saves the last iteration's results (for a single file)... Can someone point me in the right direction to revise my code, so I can append the result of each new iteration to the previous ones? Thanks!
data.folder <- "D:\\20150804"
files <- list.files(path=data.folder)
for (i in 1:length(files)) {
  sub <- read.table(file.path(data.folder, files[i]), header=T)
  max1Conc <- sub[which.max(sub$AvgConc),]
  maxETD <- sub[which.max(sub$Roll_TotDep),]
  write.csv(max1Conc, file= "max1Conc.csv", append=TRUE)
  write.csv(maxETD, file= "maxETD.csv", append=TRUE)
}
The problem is that max1Conc and maxETD are not lists, data.frames, or vectors (or other types of object capable of storing more than one value), so each iteration overwrites them.
To fix this:
maxETD <- vector()
max1Conc <- vector()
for (i in 1:length(files)) {
  sub <- read.table(file.path(data.folder, files[i]), header=T)
  max1Conc <- append(max1Conc, sub[which.max(sub$AvgConc),])
  maxETD <- append(maxETD, sub[which.max(sub$Roll_TotDep),])
  write.csv(max1Conc, file= "max1Conc.csv", append=TRUE)
  write.csv(maxETD, file= "maxETD.csv", append=TRUE)
}
The difference here is that I made the two variables you wish to write out empty vectors (max1Conc and maxETD), and then used the append command to add each successive value to the vectors.
There are more idiomatic R ways of accomplishing your goal; personally, I suggest you look into learning the apply family of functions. (http://adv-r.had.co.nz/Functionals.html)
I can't directly test the whole thing because I don't have a directory with files like yours, but I tested the parts, and I think this should work as an apply-driven alternative. It starts with a pair of functions, one to ingest a file from your directory and the other to make a row out of the two max values from each of those files:
library(dplyr)
data.folder <- "D:\\20150804"
getfile <- function(filename) {
  sub <- read.table(file.path(data.folder, filename), header=TRUE)
  return(sub)
}
getmaxes <- function(df) {
  rowi <- data.frame(AvgConc.max = max(df[,"AvgConc"]), Roll_TotDep.max = max(df[,"Roll_TotDep"]))
  return(rowi)
}
Then it uses a couple of rounds of lapply --- embedded in piping courtesy of dplyr --- to a) build a list with each data set as an item, b) build a second list of one-row data frames with the maxes from each item in the first list, c) rbind those rows into one big data frame, and d) cbind the filenames to that data frame for reference.
dfmax <- lapply(as.list(list.files(path = data.folder)), getfile) %>%
  lapply(., getmaxes) %>%
  Reduce(function(...) rbind(...), .) %>%
  data.frame(file = list.files(path = data.folder), .)
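As a quick sanity check of the rbind/Reduce step, on toy one-row frames (toy data standing in for getmaxes() output, not the real directory):

```r
# Toy one-row frames standing in for what getmaxes() would return per file
rows <- list(data.frame(AvgConc.max = 5), data.frame(AvgConc.max = 8))
out <- Reduce(function(...) rbind(...), rows)
nrow(out)        # 2
out$AvgConc.max  # 5 8
```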

file.show and user input in a loop

I have a dataframe data with information on tiffs, including one column txt describing the content of the tiff. Unfortunately, txt is not always correct and we need to correct it by hand. Therefore I want to loop over each row in data, show the tiff, and ask for feedback, which is then put into data$txt.cor.
setwd(file.choose())
Some test tiffs (with nonsense inside, but to show the idea...):
txt <- sample(100:199, 5)
for (i in 1:length(txt)) {
  tiff(paste0(i, ".tif"))
  plot(txt[i], ylim = c(100, 200))
  dev.off()
}
and the dataframe:
pix.files <- list.files(getwd(), pattern = "*.tif", full.names = TRUE)
pix.file.info <- file.info(pix.files)
data <- cbind(txt, pix.file.info)
data$file <- row.names(pix.file.info)
data$txt.cor <- ""
data$txt[5] <- 200 # wrong one
My feedback function (error handling stripped):
read.number <- function(){
  n <- readline(prompt = "Enter the value: ")
  n <- as.character(n) # Yes, character. Sometimes we have alphanumerical data or leading zeros
}
Now the loop, for which help would be very much appreciated:
for (i in nrow(data)){
  file.show(data[i, "file"]) # show the image file
  data[i, "txt.cor"] <- read.number() # ask for the feedback and put it back into the dataframe
}
In my very first attempts I was thinking of the plot.lm idea, where you go through the diagnostic plots after pressing return. I suspect that plot and tiffs are not big friends. file.show turned out to be easier. But now I am having a hard time with that loop...
Your problem is that you don't loop over the data; you only evaluate the last row. Simply write 1:nrow(data) to iterate over all rows.
To display your tiff images in R you can use the package rtiff:
library(rtiff)
for (i in 1:nrow(data)){
  tif <- readTiff(data[i, "file"]) # read in the tiff data
  plot(tif) # plot the image
  data[i, "txt.cor"] <- read.number() # ask for the feedback and put it back into the dataframe
}

R Programming: Difficulty removing NAs from frame when using lapply

Full disclosure: I am taking a Data Science course on Coursera. For this particular question, we need to calculate the mean of some pollutant data that is being read in from multiple files.
The main function I need help with also references a couple of other functions that I wrote in the script. For brevity, I'm just going to list them and their purpose:
boundIDs: I use this to bound the input so that out-of-range inputs won't be accepted (the range is 1:332, so if someone enters 1:400 this changes the range to 1:332)
pollutantToCode: converts the pollutant string entered to that pollutant's column number in the data file
fullFilePath: creates the file name and appends it to the full file path. So if someone states they need the file for ID 1 in directory "curse/your/sudden/but/inevitable/betrayal/", the function will return "curse/your/sudden/but/inevitable/betrayal/001.csv" to be added to the file list vector.
After all that, the main function I'm working with is:
pollutantmean <- function(directory = "", pollutant, id = 1:332){
  id <- boundIDs(id)
  pollutant <- pollutantToCode(pollutant)
  numberOfIds <- length(id)
  fileList <- character(numberOfIds)
  for (i in 1:numberOfIds){
    if (id[i] > 332){
      next
    }
    fileList[i] <- fullFilePath(directory, id[i])
  }
  data <- lapply(fileList, read.csv)
  print(data[[1]][[pollutant]])
}
Right now, I'm intentionally printing only the first frame of data to see what my output looks like. To remove the NAs I've tried using:
data <- lapply(fileList, read.csv)
data <- data[!is.na(data)]
But the NAs remained, so then I tried computing the mean directly and using the na.rm parameter:
print(mean(data[[1]][[pollutant]], na.rm = TRUE))
But the mean was still "NA". Then I tried na.omit:
data <- lapply(fileList, na.omit(read.csv))
...and unfortunately the problem persisted.
Can someone please help? :-/
(PS: Right now I'm just focusing on the first frame of whatever is read in, i.e. data[[1]], since I figure if I can't get it for the first frame there's no point in iterating over the rest.)
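One common pitfall here can be sketched with toy data (`df1` and the `sulfate` column are hypothetical stand-ins, not the course files): is.na() applied to a list of data frames tests the list elements, not the NA cells inside each frame, so it filters nothing; na.rm works per numeric column; and na.omit must wrap the result of read.csv rather than being passed where lapply expects a function.

```r
# Toy stand-in for one read.csv result; `sulfate` is a hypothetical column name
df1 <- data.frame(sulfate = c(1.5, NA, 3.0))
data <- list(df1)              # what lapply(fileList, read.csv) returns: a list
# is.na(data) tests the list elements (whole data frames), not the cells inside:
length(data[!is.na(data)])     # still 1: nothing was filtered out
col <- data[[1]][["sulfate"]]
mean(col)                      # NA: the NA propagates
mean(col, na.rm = TRUE)        # 2.25: NAs dropped within the column
# na.omit must wrap the *result* of read.csv, e.g.:
# data <- lapply(fileList, function(f) na.omit(read.csv(f)))
```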

Building a mean across several csv files

I have an assignment on Coursera and I am stuck - I do not necessarily need or want a complete answer (as this would be cheating) but a hint in the right direction would be highly appreciated.
I have over 300 CSV files in a folder (named 001.csv, 002.csv and so on). Each contains a data frame with a header. I am writing a function that will take three arguments: the location of the files, the name of the column you want to calculate the mean of (inside the data frames), and the files you want to use in the calculation (id).
I have tried to keep it as simple as possible:
pm <- function(directory, pollutant, id = 1:332) {
  setwd("C:/Users/cw/Documents")
  setwd(directory)
  files <<- list.files()
First of all, set the wd and get a list of all files
  x <- id[1]
  x
get the starting point of the user-specified ID.
Problem
  for (i in x:length(id)) {
    df <- rep(NA, length(id))
    df[i] <- lapply(files[i], read.csv, header=T)
    result <- do.call(rbind, df)
    return(df)
  }
}
So this is where I am hitting a wall: I would need to take the user-specified input from above (e.g. 10:25) and put the content from files "010.csv" through "025.csv" into a dataframe to actually come up with the mean of one specific column.
So my idea was to run a for-loop along the length of id (e.g. 16 for 10:25) starting with the starting point of the specified id. Within this loop I would then need to take the appropriate values of files as the input for read.csv and put the content of the .csv files in a dataframe.
I can get single .csv files and put them into a dataframe, but not several.
Does anybody have a hint how I could proceed?
Based on your example, e.g. 16 files for 10:25, i.e. 010.csv, 011.csv, 012.csv, etc.
Under the assumption that your naming convention follows the order of the files in the directory, you could try:
csvFiles <- list.files(pattern="\\.csv")[10:25] # [10:25] is hard-coded here; in production use your function parameter
file_list <- vector('list', length=length(csvFiles))
df_list <- lapply(X=csvFiles, read.csv, header=TRUE)
names(df_list) <- csvFiles #OPTIONAL: if you want to rename (later rows) to the csv list
df <- do.call("rbind", df_list)
mean(df[ ,"columnName"])
These code snippets should be possible to adapt and incorporate into your routine.
You can aggregate your csv files into one big table like this:
bigtable <- NULL # initialise so the first rbind works
for (i in 100:250) {
  infile <- paste("C:/Users/cw/Documents/", i, ".csv", sep = "")
  newtable <- read.csv(infile)
  newtable <- cbind(newtable, rep(i, dim(newtable)[1])) # if you want to be able to identify tables after they are aggregated
  bigtable <- rbind(bigtable, newtable)
}
(you will have to replace 100:250 with the user-specified input).
Then, calculating what you want shouldn't be very hard.
That won't work for files 001 to 099; you'll have to distinguish those from the others because of the leading zeros, but it's fixable with a little treatment.
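That "little treatment" can be a zero-padding step with sprintf, so that id 1 becomes "001.csv":

```r
# Zero-pad ids to three digits to match file names like 001.csv
ids <- c(1, 42, 250)
fnames <- sprintf("%03d.csv", ids)
fnames  # "001.csv" "042.csv" "250.csv"
```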
Why do you have lapply inside a for loop? Just do lapply(files[files %in% sprintf("%03d.csv", id)], read.csv, header=T). (Note the zero-padding: with file names like 010.csv, paste0(id, ".csv") would produce "10.csv" and match nothing.)
They should also teach you to never use <<-.
