saving and naming files in R automatically based on input filename

saving and naming files in R automatically based on input filename - r

I have generated several Utilisation Distributions (UD) with AdehabitatHR and stored them as Geotiffs. I am now using the same UDs with the Lattice package to generate some maps and saving them to a high-res tiff image with LZW compression. Problem is that I have literally hundreds of maps to make, save and name. Is there a way automatically do this once i have loaded all the necessary files from a directory? Each one of my UDs has the following structure of the filename "UD_resolution_species_area_year_season. tif" and in the final name I give to my map I would like to keep the same structure (or entire filename) but add the prefix "blablabla_" e.g. "blablabla_UD_resolution_species_area_year_season.tiff". The image also include a main name, a capital letter, which should also change.
At the moment I am using the following:
rlist = list.files(getwd(), pattern = "tif$", full.names = FALSE)
for (i in rlist) {
assign(unlist(strsplit(i, "[.]"))[1], raster(i))
}
shplist = list.files(getwd(), pattern = "shp$", full.names = FALSE)
for (i in shplist) {
assign(unlist(strsplit(i, "[.]"))[1], readOGR(i))
}
UD <- 'UD_resolution_species_area_year_season'
ext <- extent(UD) + 0.3 # set the extent for the plot
aa <-
quantile(UD,
probs = c(0.25, 0.75),
type = 8,
names = TRUE)
my.at <- c(aa[1], aa[2])
my.at <- round(my.at, 3)
maxval <- maxValue(UD)
tiff(
"C:/myworkingdirectory/maps/blablabla_UD_resolution_species_area_year_season.tiff",
res = 600,
compression = "lzw",
width = 15,
height = 15,
units = "cm"
)
levelplot(
UD,
xlab = "",
ylab = "",
xlim = c(ext[1], ext[2]),
ylim = c(ext[3], ext[4]),
margin = FALSE,
contour = FALSE,
col.regions = viridis(1000),
colorkey = list(at = seq(0, maxval)),
main = "A",
maxpixels = 2e5
) + latticeExtra::layer(sp.polygons(Land, fill = "grey50", col = NA)) + contourplot(
`UD`,
at = my.at[1],
labels = FALSE,
margin = FALSE,
lty = 2,
col = "orange",
pretty = TRUE
) + contourplot(
UD,
at = my.at[2],
labels = FALSE,
margin = FALSE,
lty = 2,
col = "red",
pretty = TRUE,
)
dev.off()

It is a common beginners mistake to use assign. Do not use it, it creates the type of trouble you are now facing. In stead, you can make lists and/or use a loop.
Also what you are asking is basic R stuff, but you are complicating the question with adding lots of irrelevant detail about setting the extent, and levelplot. It is better to learn about doing these basic things by removing the clutter and focus on a simple case first. That is also how you should write questions for this forum.
In essence you have a bunch of files you want to process. Below I show how you can loop over a vector of the names and then loop and do what you need to do in that loop.
library(raster)
rastfiles <- list.files(pattern = "tif$", full.names=TRUE)
outputfiles <- file.path("output/path", paste0("prefix_", basename(rastfiles)))
for (i in 1:length(rastfiles))
r <- raster(rastfiles[i])
png(outputfiles[i])
plot(r)
dev.off()
}
You can also first read all the files into a list
rastfiles <- list.files(pattern = "tif$", full.names=TRUE)
rlist <- lapply(rastfiles, raster)
names(rlist) <- gsub(".tif$", "", basename(rastfiles))
rastfiles <- list.files(pattern = "shp$", full.names=TRUE)
slist <- lapply(shpfiles, readOGR)
names(slist) <- gsub(".shp$", "", basename(shpfiles))
And perhaps create a vector of output filenames
outputtif <- file.path("output/dir", basename(rastfiles))
And then loop over the items in the list, or the output filenames

Related

Raster package eating up RAM while rasterizing

I'm using the raster package on an R server to process a large set (30000 files) of data files (10MB each).
For now, processing consists of parsing the data and subsequently rasterizing it via the rasterize function.
The data is very sparse (only along roads) but has a high resolution and large extent. I've seen temporary files of 30GB for a raster created from one of the input files.
Because of the amount of files I'm using a foreach() %dopar% approach to processing the files, giving one file to each thread. I've set the raster options as follows:
rasterOptions(maxmemory = 15000000000)
rasterOptions(chunksize = 14000000000)
rasterOptions(todisk = TRUE)
This should come out to 15GB/thread * 32 threads = 480GB of RAM used at maximum for the rasters. Add some overhead and I would expect somewhere between 10GB to 20GB of the 512GB RAM to remain. However, that is not the case and I can't seem to figure out why.
R gobbles up RAM until only 100MB to 2GB remain and only then seems to release previously allocated memory, only to be fed straight back into R for the next raster. I checked the RAM usage repeatedly over several hours to observe this.
I'm using SpatialPolygonDataFrames as input for rasterize, and suspected they might take up a lot of RAM as well. But when I checked their size, they were rather small, at about 100MB. Playing around with maxmemory, chunksize and only 16 threads also didn't seem to have any effect.
I also looked at the rasterize source code to see if I find an explanation there, but that didn't get me far:
setMethod('rasterize', signature(x='SpatialPoints', y='Raster'),
function(x, y, field, fun='last', background=NA, mask=FALSE, update=FALSE, updateValue='all', filename="", na.rm=TRUE, ...){
.pointsToRaster(x, y, field=field, fun=fun, background=background, mask=mask, update=update, updateValue=updateValue, filename=filename, na.rm=na.rm, ...)
}
)
I have no clue where to find .pointsToRaster
Does anyone have an explanation for this behaviour or ideas for things to check? Did I simply overlook something? I´d like to not use the entire RAM so that other users can still work on the server. From what I understand my code should regulate how much RAM is used.
Here's the code I use:
library('iterators')
library('parallel')
library('foreach')
library('doParallel')
#init parallelisation
nCores = 32
cCluster = makeCluster(nCores, type = "FORK", outFile = "parseProcess")
registerDoParallel(cCluster)
foreach(j = 1:length(fileList)) %dopar%{
#load all libraries for every thread
library('sp')
library('raster')
library('spatial')
library('gstat')
library('rgdal')
library('dismo')
library('deldir')
library('rgeos')
library('sjmisc')
#set rasteroptions per thread
rasterOptions(maxmemory = 15000000000)
rasterOptions(chunksize = 14000000000)
rasterOptions(todisk = TRUE)
tmpFolder = paste0("[PATH TO STORAGE]/rtmp",j)
dir.create(tmpFolder)
rasterOptions(tmpdir = tmpFolder)
#generate names for raster files
fileName = basename(fileList[j])
print(paste("Processing:", fileName))
rNameMax0 = sub(pattern = ".bin", replacement = "_scan0_max.tif", fileName)
#repeat this for all 11 scans
rasterStorage = "[PATH TO OTHER STORAGE]" #path to raster folder
scanList = parseFile(fileList[j]) #any memory allocated in this functions should be released on function return
#create template raster
bounds = as.vector(t(bbox(scanList$scan0)))
resolution = c(0.0000566, 0.0000359)
tmp = raster(xmn = bounds[1], xmx = bounds[2], ymn = bounds[3], ymx = bounds[4], res = resolution)
#create rasters from data
coordinates(scanList$scan0) = ~Long+Lat
proj4string(scanList$scan0) = WGS84CRS
rScanMax0 = rasterize(scanList$scan0, tmp, fun = 'max', filename = paste0(rasterStorage, rNameMax0))
rm('rScanMax0')
#repeat for scans 1 to 4
removeTmpFiles(h = 0.2)
unlink(tmpFolder, recursive = TRUE, force = TRUE)
dir.create(tmpFolder)
rasterOptions(tmpdir = tmpFolder)
coordinates(scanList$scan5) = ~Long+Lat
proj4string(scanList$scan5) = WGS84CRS
rScanMax5 = rasterize(scanList$scan5, tmp, fun = 'max', filename = paste0(rasterStorage, rNameMax5))
rm('rScanMax5')
#repeat for scans 6 to 10
removeTmpFiles(h = 0.2)
unlink(tmpFolder, recursive = TRUE, force = TRUE)
}
stopCluster(cCluster)
Here's the (gutted) code of the parseFile function:
parseFile = function(fileName){
con = file(fileName, "rb")
intSize = 4
fileEndian = "little"
#create data frames for each scan
scan0 = data.frame(matrix(ncol = n1, nrow = 0))
colnames(scan0) = c("Lat", "Long", ...)
scan1 = data.frame(matrix(ncol = n2, nrow = 0))
colnames(scan1) = c("Lat", "Long", ...)
scan2 = data.frame(matrix(ncol = n3, nrow = 0))
colnames(scan2) = c("Lat", "Long", ...)
scan3 = data.frame(matrix(ncol = n4, nrow = 0))
colnames(scan3) = c("Lat", "Long", ...)
scan4 = data.frame(matrix(ncol = n5, nrow = 0))
colnames(scan4) = c("Lat", "Long", ...)
scan5 = data.frame(matrix(ncol = n6, nrow = 0))
colnames(scan5) = c("Lat", "Long", ...)
scan6 = data.frame(matrix(ncol = n7, nrow = 0))
colnames(scan6) = c("Lat", "Long", ...)
scan7 = data.frame(matrix(ncol = n8, nrow = 0))
colnames(scan7) = c("Lat", "Long", ...)
scan8 = data.frame(matrix(ncol = n9, nrow = 0))
colnames(scan8) = c("Lat", "Long", ...)
scan9 = data.frame(matrix(ncol = n10, nrow = 0))
colnames(scan9) = c("Lat", "Long", ...)
scan10 = data.frame(matrix(ncol = n11, nrow = 0))
colnames(scan10) = c("Lat", "Long", ...)
header = readBin(con, raw(), n = 36)
i = 1
while(i){
blockHeader = readBin(con, integer(), n = 3, size = intSize, endian = fileEndian)
if(...){ #check whether file ended
break
}
i = i + 1
#sort data to correct scan, assign GPS tag
blockTrailer = readBin(con, raw(), n = 8)
}
#clean up
close(con)
#return parsed data
returnList = list("scan0" = scan0, "scan1" = scan1, "scan2" = scan2, "scan3" = scan3, "scan4" = scan4,
"scan5" = scan5, "scan6" = scan6, "scan7" = scan7, "scan8" = scan8, "scan9" = scan9, "scan10" = scan10)
return(returnList)
}
I'm also looking at the solutions posted here as another approach, but I'd still like to know why my code doesn't work as I expect it to.

plot_generic() function in plot_roc_components not found in R

I am using plot_roc_components function from rmda package. The definition of it has plot_generic() function. But, I am not able to find definition of this function. Why is it so?
The reason for it to see if there is an option for legend.size(). plot_roc_components gives me figure, however, I want to change the legend size. There is an option for legend.position, but not for its font size.
Could you please explain?
Thanks!

https://github.com/mdbrown/rmda/blob/57553a4cf5b6972176a0603b412260e367147619/R/plot_functions_sub.R
You were looking in one file but it was defined in another file.
plot_generic<- function(xx, predictors, value, plotNew,
standardize, confidence.intervals,
cost.benefit.axis = TRUE, cost.benefits, n.cost.benefits,
cost.benefit.xlab, xlab, ylab,
col, lty, lwd,
xlim, ylim, legend.position,
lty.fpr = 2, lty.tpr = 1,
tpr.fpr.legend = FALSE,
impact.legend = FALSE,
impact.legend.2 = FALSE,
population.size = 1000,
policy = policy, ...){
## xx is output from get_DecisionCurve,
## others are directly from the function call
#save old par parameters and reset them once the function exits.
old.par<- par("mar"); on.exit(par(mar = old.par))
xx.wide <- reshape::cast(xx, thresholds~model, value = value, add.missing = TRUE, fill = NA)
xx.wide$thresholds <- as.numeric(as.character(xx.wide$thresholds))
if(is.numeric(confidence.intervals)){
val_lower <- paste(value, "lower", sep = "_")
val_upper <- paste(value, "upper", sep = "_")
xx.lower <- cast(xx, thresholds~model, value = val_lower, add.missing = TRUE, fill = NA)
xx.upper <- cast(xx, thresholds~model, value = val_upper, add.missing = TRUE, fill = NA)
xx.lower$thresholds <- as.numeric(as.character(xx.lower$thresholds))
xx.upper$thresholds <- as.numeric(as.character(xx.upper$thresholds))
}
# adjust margins to add extra x-axis
if(cost.benefit.axis) par(mar = c(7.5, 4, 3, 2) + 0.1)
#set default ylim if not provided
#initial call to plot and add gridlines

Function input used as string

saving_ggplot <- function(name = 'default', plotname = last_plot()) {
image_name = paste(name, ".png", sep="")
ggsave(image_name, plot = plotname,
scale = 1,
dpi = 300, limitsize = TRUE)
}
This is my function which saves a ggplot. However, I for the life of me cannot figure out how to take the name argument as a string.
for example if someone runes saving_ggplot(FILENAME, PLOTNAME)
it will just say no object FILENAME. In python I can just capture it and use it as str(), but using as.character or toString in R still doesn't work.
Error:
saving_ggplot(weightvsageTEST, weightvsageplot)
Error in paste(name, ".png", sep = "") :
object 'weightvsageTEST' not found
Successful call using ggsave:
ggsave('weightvsage.png', plot = last_plot(),
scale = 1,
dpi = 300, limitsize = TRUE)

You can use substitute():
saving_ggplot <- function(name, plotname) {
image_name = paste0(substitute(name), ".png") # paste0 removes need for sep arg
ggsave(image_name, plot = plotname,
scale = 1,
dpi = 300, limitsize = TRUE)
}
saving_ggplot(foo, p) # saves foo.png
Alternately, if you want to stay within tidyverse quasiquotation syntax, use enexpr() instead:
enexpr(name) # instead of substitute(name)
Data:
N <- 100
df <- data.frame(x=rnorm(n=N), y=rnorm(n=N))
p <- ggplot(df, aes(x,y)) + geom_smooth()

Saving multiply pdf plots r

I have made a loop for making multiply plots, however i have no way of saving them, my code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
# Set working directory
setwd()
#### Read data & Converting factors ####
dat <- read.table("Complete RING.txt", header =TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$Fly <- as.factor(dat$Fly)
dat$Temp <- as.factor(dat$Temp)
str(dat)
datSUM <- summaryBy(X0.5_sec+X1_sec+X1.5_sec+X2_sec+X2.5_sec+X3_sec~Vial_nr+Concentration+Sex+Line+Vial+Temp,data=dat, FUN=sum)
fl<-levels(datSUM$Line)
colors = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
meltet <- melt(datSUM, id=c("Concentration","Sex","Line","Vial", "Temp", "Vial_nr"))
levels(meltet$variable) <- c('0,5 sec', '1 sec', '1,5 sec', '2 sec', '2,5 sec', '3 sec')
meltet20 <- subset(meltet, Line=="20")
meltet20$variable <- as.factor(meltet20$variable)
AllConcentrations <- levels(meltet20$Concentration)
for (i in AllConcentrations) {
meltet.i <- meltet20[meltet20$Concentration ==i,]
quartz()
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above buttom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2))) }
I have tried with the quartz.save function, but that just overwrites the files. Im using a mac if that makes any difference.

When I want to save multiple plots in a loop I tend to do something like...
for(i in AllConcentrations){
meltet.i <- meltet20[meltet20$Concentration ==i,]
pdf(paste("my_filename", i, ".pdf", sep = ""))
dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above buttom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2))
dev.off()
}
This will create a pdf file for every level in AllConcentrations and save it in your working directory. It will paste together my_filename, the number of the iteration i, and then .pdf together to make each file unique. Of course, you will want to adjust height and width in the pdf function.

Problems saving several pdf files in R

I'm trying to save several xyplots created with a "for" loop in R and I'm not able to get complete pdf files (all files have the same size and I can't open them) if I execute the following loop:
for (i in 1:length(gases.names)) {
# Set ylim;
r_y <- round(range(ratio.cal[,i][ratio.cal[,i]<999], na.rm = T), digits = 1);
r_y <- c(r_y[1]-0.1, r_y[2]+0.1);
outputfile <- paste (path, "/cal_ratio_",gases.names[i], ".pdf", sep="");
dev.new();
xyplot(ratio.cal[,i] ~ data.GC.all$data.time, groups = data.vial, panel =
panel.superpose, xlab = "Date", ylab = gases.names[i], xaxt="n", ylim = r_y);
savePlot(filename = outputfile, type = 'pdf', device = dev.cur());
dev.off();
}
(a previous version was using trellis.device() instead of dev.new() + savePlot())
Do you know why I can't get good pdf files? If I do it "manually" it works... Any idea?

use pdf directly
for (i in seq_along(gases.names)) {
# Set ylim
r_y <- round(range(ratio.cal[,i][ratio.cal[,i]<999], na.rm = T), digits = 1)
r_y <- c(r_y[1]-0.1, r_y[2]+0.1)
outputfile <- paste (path, "/cal_ratio_",gases.names[i], ".pdf", sep="")
pdf(file = outputfile, width = 7, height = 7)
print(xyplot(ratio.cal[,i] ~ data.GC.all$data.time, groups = data.vial,
panel = panel.superpose, xlab = "Date", ylab = gases.names[i],
xaxt="n", ylim = r_y))
dev.off()
}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

saving and naming files in R automatically based on input filename - r

Related

Raster package eating up RAM while rasterizing

plot_generic() function in plot_roc_components not found in R

Function input used as string

Saving multiply pdf plots r

Problems saving several pdf files in R

Categories

Resources