Saving data frames as .Rda files and loading them using loops - r

I have three data frames: sets, themes, and parts. I want to save each as a .Rda file, and then (to prove that they saved correctly) clear my workspace and load each of them.
Without a loop, this obviously works:
save(sets, file = "sets.Rda")
save(themes, file = "themes.Rda")
save(parts, file = "parts.Rda")
rm(list=ls())
load("sets.Rda")
load("themes.Rda")
load("parts.Rda")
Looping through this SEEMS like it should be straightforward, but I can't get it to work. I have a few ideas about what the issue might be, but I can't work my way around them.
My thought is this:
DFs <- list("sets", "themes", "parts")
for(x in 1:length(DFs)){
  dx <- paste(DFs[[x]], ".Rda", sep = "")
  save(x, file = dx)
}
rm(list=ls())
DFs <- list("sets.Rda", "themes.Rda", "parts.Rda")
for(DF in DFs) {
  load(DF)
}
I know that the loading loop can work because when I save the files using the first (non-looping) bit of code, it loads them all properly. But something about saving them with the loop above means that when I run the loading loop, I don't get what I want: I get a single object named "x" with a value of 3L. I don't get it.
Please help me out. I think the problem rests in the arguments of my save() function, but I am not sure what's up.

Here's a minimal reproducible example showing how to write data.frames as RDS files in a loop, and then read them back into the R environment in a loop:
# Make 3 dataframes as an example
iris1 <- iris2 <- iris3 <- iris
df_names <- c("iris1", "iris2", "iris3")
# Saving them
for (i in 1:length(df_names)) {
  saveRDS(get(df_names[i]), paste0(df_names[i], ".RDS"))
}
# Confirm they were written
dir()
# [1] "iris1.RDS" "iris2.RDS" "iris3.RDS"
# Remove them
rm(iris1, iris2, iris3)
# Load them
for (i in 1:length(df_names)) {
  assign(df_names[i], readRDS(paste0(df_names[i], ".RDS")))
}

Related

Creating a raster stack of algorithms using nested loops to iterate through all possible band combinations

All,
I am not able to directly stack these images inside the loop. My only workaround was to write each result to disk and then stack the images after the loop completed. I would like to find an apply function that would let me do the same thing, and I need a way to stack these rasters inside the loop instead of creating intermediate data by saving the output.
Any insight into this process would be appreciated!
# Create function
Loop <- function(X, B1, B2){
  S2_ext <- extent(S2)          # Create raster template
  2BDA_Stack <- raster(S2_ext)
  for(i in B1){
    for(j in B2){
      2BDA <- X[[i]]/X[[j]]
      2BDA_Stack <- stack(2BDA_Stack, 2BDA)
      # I WANT TO REMOVE THIS STEP
      writeRaster(x = 2BDA,
                  filename = paste("...", 'Chl_2BDA_', i, j),
                  format = "GTiff",   # save as a tif
                  datatype = 'FLT4S', # save as a float
                  overwrite = T)      # Overwrites same named file
    }
  }
}
#Run Function
2BDA_Loop(X=S2,B1=1:6,B2=1:6)
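One possible approach, sketched here as a guess rather than taken from the original post: collect each band ratio in a list inside the loop and call raster::stack() once on that list, which avoids both the empty template raster and the intermediate writeRaster() step. Note that R object names cannot start with a digit, so names like 2BDA_Stack would need renaming in any case; the function and object names below are made up for illustration.
library(raster)
BDA_loop <- function(X, B1, B2){
  layers <- list()
  for(i in B1){
    for(j in B2){
      # name each layer after the band combination it came from
      layers[[paste0("Chl_2BDA_", i, "_", j)]] <- X[[i]] / X[[j]]
    }
  }
  stack(layers)  # stack the whole list in one step
}
# bda_stack <- BDA_loop(X = S2, B1 = 1:6, B2 = 1:6)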

How to load .png images with image names listed in a .csv file to R

I am using the simple code below to append multiple images together with the R magick package. It works well; however, there are many images to process and their names are stored in a .csv file. Could anyone advise on how to load the image names into the image_read function from specific cells in a .csv file (see the example below the code)? So far, I have not been able to find anything appropriate that would solve this.
library (magick)
pic_A <- image_read('A.png')
pic_B <- image_read('B.png')
pic_C <- image_read('C.png')
combined <- c(pic_A, pic_B, pic_C)
combined <- image_scale(combined, "300x300")
image_info(combined)
final <- image_append(image_scale(combined, "x120"))
print(final)
image_write(final, "final.png") #to save
Something like this should work. If you load the csv into a data frame, it's then straightforward to point image_read towards the appropriate elements.
The index (row number) is included in the output filename so that the files are not overwritten on each iteration.
library (magick)
file_list <- read.csv("your.csv",header = F)
names(file_list) <- c("A","B","C")
for (i in 1:nrow(file_list)){
  pic_A <- image_read(file_list$A[i])
  pic_B <- image_read(file_list$B[i])
  pic_C <- image_read(file_list$C[i])
  combined <- c(pic_A, pic_B, pic_C)
  combined <- image_scale(combined, "300x300")
  image_info(combined)
  final <- image_append(image_scale(combined, "x120"))
  print(final)
  image_write(final, paste0("final_", i, ".png")) # to save
}
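For reference, the loop above assumes your.csv has no header and three columns of file names, one row per combined output image, roughly like this (hypothetical contents):
A.png,B.png,C.png
D.png,E.png,F.png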

R: using foreach to read csv data and apply functions over the data and export back to csv

I have 3 csv files, namely file1.csv, file2.csv and file3.csv.
Now for each file, I would like to import the csv, perform some functions over it, and then export a transformed csv. So, 3 csv files in and 3 transformed csv files out, and these are just 3 independent tasks, so I thought I could try foreach %dopar%. Please note that I am using a Windows machine.
However, I cannot get this to work.
library(foreach)
library(doParallel)
library(xts)
library(zoo)
numCores <- detectCores()
cl <- parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)
filenames <- c("file1.csv","file2.csv","file3.csv")
foreach(i = 1:3, .packages = c("xts","zoo")) %dopar% {
  df_xts <- data_processing_IMPORT(filenames[i])
  ddates <- unique(date(df_xts))
}
If I comment out the last line, ddates <- unique(date(df_xts)), the code runs fine with no error.
However, if I include that line, I receive the following error, which I have no idea how to get around. I tried adding .export = c("df_xts").
Error in { : task 1 failed - "unused argument (df_xts)"
It still doesn't work. I want to understand what's wrong with my logic and how I should get around it. I am just trying to apply simple functions over the data; I still haven't transformed the data and exported it separately to csv, yet I am already stuck.
The funny thing is that I have written the simple code below, which works fine. Within the foreach, a is just like df_xts above: it is stored in a variable and passed into Fun2 to be processed. The code below works fine, but the code above doesn't, and I don't understand why.
numCores <- detectCores()
cl <- parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)
# Define the function
Fun1 = function(x){
  a = 2*x
  b = 3*x
  c = a + b
  return(c)
}
Fun2 = function(x){
  a = 2*x
  b = 3*x
  c = a + b
  return(c)
}
foreach(i = 1:10) %dopar% {
  x <- rnorm(5)
  a <- Fun1(x)
  tst <- Fun2(a)
  return(tst)
}
### Output: No error
parallel::stopCluster(cl)
Update: I have found out that the issue is with the date() function used to extract the dates from the csv file, but I am not sure how to get around this.
The use of foreach() is correct. You are calling date() in ddates <- unique(date(df_xts)), but that base function simply returns the current system date and time and does not take any arguments, so the "unused argument" error is about date().
So I guess you want to use as.Date() instead, or something similar:
ddates <- unique(as.Date(df_xts))
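As a further guess (data_processing_IMPORT isn't shown): if df_xts really is an xts object, the dates live in its time index rather than in the data columns, so extracting them through the index may be closer to what was intended:
ddates <- unique(as.Date(zoo::index(df_xts)))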
I've run into the same issue of reading, modifying, and writing several CSV files. I tried to find a tidyverse solution for this, and while it doesn't really deal with the date problem above, here it is: how to read, modify, and write several csv files using map from purrr.
library(tidyverse)
# There are some sample csv file in the "sample" dir.
# First get the paths of those.
datapath <- fs::dir_ls("./sample", regexp = ("csv"))
datapath
# Then read in the data, so that we get a list of data frames.
# It seems simpler to write them back to disk as separate files.
# Another way to read them would be:
# newsampledata <- vroom::vroom(datapath, ";", id = "path")
# but this will return a DF and separating it to different files
# may be more complicated.
sampledata <- map(datapath, ~ read_delim(.x, ";"))
# Do some transformation of the data.
# Here I just alter the column names.
transformeddata <- sampledata %>%
  map(rename_all, tolower)
# Then prepare to write new files
names(transformeddata) <- paste0("new-", basename(names(transformeddata)))
# Write the csv files and check if they are there
map2(transformeddata, names(transformeddata), ~ write.csv(.x, file = .y))
dir(pattern = "new-")

Iterative naming for a list created in a loop

I wrote a loop:
for(a in 1:100){
  Code
  list <- list("test1" = test1, "test2" = test2)
  save(list, file = paste(paste("test", a, sep = "_"), ".RData", sep = ""))
}
The iterative naming of the saved files works well, but I have not figured out a way to do the same for the list. The problem is that when I load the files into R, the objects are all called list, and that is a problem.
I have tried mv(from = "list", to = paste("test", a, sep = "_")) but it does not work.
Can anybody help me with this?
Indeed this is a tricky point, since save(eval(parse(text = paste0("list", a))), file = paste("test", a, ".RData", sep = "")) does not work for some reason. Your best bet IMO would be to save one file only, which might be more convenient anyway, and access the names of the objects in the list of lists:
test1 <- 1
test2 <- 2
mylist <- list()
for(a in 1:100){
  #assign(paste0("list",a), list("test1"=test1,"test2"=test2), environment())
  mylist[[a]] <- list("test1" = test1, "test2" = test2)
}
save(mylist, file = "mylist.RData")
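For completeness, a quick sketch of how the single saved file can then be used (names as above; the index 7 is arbitrary):
load("mylist.RData")   # restores one object called mylist
mylist[[7]]$test1      # "test1" from iteration 7
names(mylist[[7]])     # "test1" "test2"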

Looping Over a Set of Files

I cooked up some code that is supposed to find all my .txt files (they're outputs of ODE simulations), open each of them as a data frame with read.table, and then perform some calculations on them.
files <- list.files(path="/Users/redheadmammoth/Desktop/Ultimate_Aging_F2016",
pattern=".txt",full.names=TRUE)
ldf <- lapply(files, read.table)
tuse <- seq(from=0,to=100,by=0.1)
for(files in ldf)
findR <- function(r){
with(files,(sum(exp(-r*age)*fecund*surv*0.1)-1)^2)
}
{
R0 <- with(files,(sum(fecund*surv*age)))
GenTime <- with(files,(sum(tuse*fecund*surv*0.1))/R0)
r <- optimize(f=findR,seq(-5,5,.0001),tol=0.00000001)$minimum
RV <- with(files,(exp(r*tuse)/surv)*(exp(-r*tuse)*(fecund*surv)))
plot(log(surv) ~ age,files,type="l")
tmp.lm <- lm(log(surv) ~ age + I(age^2),files) #Fit log surv to a quadratic
lines(files$age,predict(tmp.lm),col="red")
}
However, the problem is that it seems to only be performing the calculations contained in my "for" loop on one file, rather than all of them. I'd like it to perform the calculations on all of my files, then save all the files together as one big data frame so I can access the results of any particular set of my simulations. I suspect the error is that I'm not indexing the files correctly in order to loop over all of them.
How about using plyr::ldply() for this? It takes a list (in your case, your list of files) and applies the same function to each element, then returns a data frame.
The main thing to remember is to create a column with an ID for each file you read in, so you know which data comes from which file. The simplest way to do this is to use the file name itself; you can edit it from there.
If your function has additional arguments, they go after the function name in the ldply() call.
# create file list
files <- list.files(path="/Users/redheadmammoth/Desktop/Ultimate_Aging_F2016",
pattern=".txt",full.names=TRUE)
tuse <- seq(from=0,to=100,by=0.1)
load_and_edit <- function(file, tuse){
  temp <- read.table(file)
  # here put all your calculations you want to do on each file
  temp$R0 <- sum(temp$fecund * temp$surv * temp$age)
  # make a column for each file name so you know which data comes from which file
  temp$id <- file
  return(temp)
}
new_data <- plyr::ldply(files, load_and_edit, tuse = tuse)
This is the easiest way I have found to read in and wrangle multiple files in batch.
You can then plot each one really easily.
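For example, a minimal sketch (assuming the age and surv columns from the question, plus the id column added above) that draws one survival curve per input file using base graphics, as in the question:
for (f in unique(new_data$id)) {
  d <- subset(new_data, id == f)
  plot(log(surv) ~ age, data = d, type = "l", main = basename(f))
}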
