Average rasters based on their names in R

I would like to average several rasters by month. The files are named as follows:
africa_arc.20160101
africa_arc.20160102
africa_arc.20160103
Here is my current code; it's not grabbing the files by their names:
require(raster)
require(maptools)
require(rgdal)
iDir <- "D:/RainfallFTP_unzipped"
oDir <- "D:/RainfallFTP_unzipped_AVG"
yrLs <- c("2016", "2017")
mthLs <- c(paste0("0",1:9), 10:12)
dyLs <- seq(as.Date('2016-01-01'),as.Date('2017-12-31'),by = 1)
dLs <- format(dyLs,"%d")
for (yr in yrLs){
  for (mth in mthLs){
    r.stk <- stack(paste0(iDir, "/", "africa_arc.", yr, mth, dLs, ".tif", sep=""))
  }
}
Where am I going wrong?

I think you should always avoid (double) loops when possible.
Here's a simple solution, with only a couple of lines:
library(raster)
iDir <- "D:/RainfallFTP_unzipped/"
# list rasters in iDir
rs <- list.files(iDir,full.names = T)
# extract the unique year-months and use lapply to grep and stack
r.stk <- lapply(unique(sub('.+(\\d{6})\\d{2}$', '\\1', rs)), function(x) stack(grep(x, rs, value = TRUE)))
This will leave you with a list with each element being a stack of rasters for each year-month combination.
Please note that this assumes your filenames are as in your examples (so they end in the date without extension) and the rasters are located in iDir, not in any sub-directories of iDir.
If your rasters have a .tif extension, as your code suggests they do, you just have to substitute the sub call within lapply with
sub('.+(\\d{6}).+tif$','\\1',rs)
to account for the file ending.
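Since the end goal is monthly averages, each stack in the list can then be collapsed to its cell-wise mean. A minimal sketch building on r.stk and rs from above (the output filenames are just an example):
# year-month labels, in the same order as r.stk
ym <- unique(sub('.+(\\d{6})\\d{2}$', '\\1', rs))
monthly_mean <- lapply(r.stk, function(s) calc(s, fun = mean))
names(monthly_mean) <- ym
# optionally write each monthly average to oDir, e.g.:
# writeRaster(monthly_mean[["201601"]], file.path(oDir, "africa_arc_201601_avg.tif"))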

Related

read multiple ENVI files and combine them in one csv

I'm fairly new to working with R but am trying to get this done. I have dozens of ENVI spectral datasets stored in a directory. Each dataset is separated into two files. They all follow the same naming convention, i.e.:
ID_YYYYMMDD_350-200nm.asr
ID_YYYYMMDD_350-200nm.hdr
The task is to read each dataset, add two columns (ID and date, taken from the filename), and store the results in a *.csv file. I got this to work for a single file (hardcoded):
library(caTools)
setwd("D:/some/path/software_scripts")
### filename without extension
name <- "011a_20100509_350-2500nm"
### split filename in area-id and date
flaeche <- substr(name, 1, 4)
date <- as.Date(substr(name, 6, 13), "%Y%m%d")
### get values from ENVI file in a matrix
spectrum <- read.ENVI(paste0(name, ".esl"), headerfile = paste0(name, ".hdr"))
### add columns
spectrum <- cbind(Flaeche = flaeche, Datum = as.character(date), spectrum)
### CSV dataset with all values
write.csv(spectrum, file = paste0(name, ".csv"))
I want to combine all available files into one *.csv file. I know that I have to use list.files, but I have no idea how to incorporate the read.ENVI function and append the resulting matrices to the CSV as I go.
Update:
library(caTools)
setwd("D:/some/path/mean")
files <- list.files() # change or leave totally empty if setwd() put you in the right spot
all_names <- sub("^([^.]*).*", "\\1", files) # strip off extensions
name <- unique(all_names) # get rid of duplicates from .esl and .hdr
# wrap your existing code in a function
mungeENVI <- function(name) {
  # split filename into area-id and date
  flaeche <- substr(name, 1, 4)
  date <- as.Date(substr(name, 6, 13), "%Y%m%d")
  # get values from ENVI file in a matrix
  spectrum <- read.ENVI(paste0(name, ".esl"), headerfile = paste0(name, ".hdr"))
  # add columns
  spectrum <- cbind(Flaeche = flaeche, Datum = as.character(date), spectrum)
  return(spectrum)
}
# use lapply to 'loop' over each name
list_of_ENVIs <- lapply(name, mungeENVI) # returns a list
# use do.call(rbind, x) to turn it into a big data.frame
final_df <- do.call(rbind, list_of_ENVIs)
# now write output
write.csv(final_df, "all_results.csv")
You can find a sample dataset here: Sample dataset
I work with a lot of lab data where I can rely on the output files being in a reliable format (same column order, column names, header format, etc.), so this assumes that the ENVI files you have are similar. If your files are not like that, I'm happy to help with that too; I'd just need to see a dummy file or two.
Anyways here's the idea:
library(caTools)
library(lubridate)
library(magrittr)
setwd("~/Binfo/TST/Stack/") # adjust as needed
files <- list.files("data/", full.name = T) # adjust as needed
all_names <- gsub("\\.\\D{3}", "", files) # strip off extensions
names1 <- unique(all_names) # get rid of duplicates
# wrap your existing code in a function
mungeENVI <- function(name) {
  # split filename into area-id and date
  f <- gsub(".*\\/(\\d{3}\\D)_.*", "\\1", name)
  d <- gsub(".*_(\\d+)_.*", "\\1", name) %>% ymd()
  # get values from ENVI file in a matrix
  spectrum <- read.ENVI(paste0(name, ".esl"), headerfile = paste0(name, ".hdr"))
  # add columns
  spectrum <- cbind(Flaeche = f, Datum = as.character(d), spectrum)
  return(spectrum)
}
# use lapply to 'loop' over each name
list_of_ENVIs <- lapply(names1, mungeENVI) # returns a list
# use do.call(rbind, x) to turn it into a big data.frame
final_df <- do.call(rbind, list_of_ENVIs)
# now write output
write.csv(final_df, "data/all_results.csv")
Let me know if you have any problems and we can go from there. Cheers.
I edited my answer a bit; I think the problem you were hitting is in list.files(): it should have had the argument full.name = T. I also adjusted your parsing method to be a little more defensive, using regex capture groups. I tested the code with your two example files (four, really) and it builds out a large matrix (66743 elements). I also used lubridate; I think it's a better way to work with dates and times.
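For reference, here is what the two capture patterns pull out of a name of the expected form (the filename below is made up; the extension has already been stripped):
name <- "data/011a_20100509_350-2500nm"
gsub(".*\\/(\\d{3}\\D)_.*", "\\1", name) # "011a" -> area id
gsub(".*_(\\d+)_.*", "\\1", name)        # "20100509" -> date string for ymd()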

How to crop a raster stack using multiple shapefiles in r

I am new to R, so I am still figuring out the basics. I am trying to take a list of multiple shapefiles and use them to crop several raster stacks, then use the cropped raster stacks to create histograms.
I know I have to make a list of the shapefiles, which I have already figured out how to do. I also figured out how to crop rasters and make histograms individually. Unfortunately, the looping is throwing me for a loop. Does anyone have any suggestions?
Here's what I have so far:
library('rgdal')
library('raster')
library('landsat')
library('ggplot2')
library('rasterVis')
library('spatial.tools')
library('plyr')
library('reshape')
library('reshape2')
library('GGally')
library('lattice')
library('sp')
library('maptools')
library('rgeos')
library('gdata')
# Folder where the shapefiles are located
foldershp <- paste0("F:/2926/Hayes/crp/hnw/")
setwd("F:/2926/Hayes/crp/hnw/")
fileNames <- list.files(path = ".", pattern = ".shp$")
folderout<- paste0("F:/Indices/Histograms/")
#list for Rasters
ras.root<- paste0("F:/2926/July/")
ras.list <- c("mtvi", "ndsvi", "ndvi", "ndwi", "satvi", "swir32")
#Loop
for (i in ras.list) {
  s <- stack()
  setwd(paste(ras.root, i, sep=""))
  files <- list.files(path = getwd(), pattern = "*.tif$")
  s <- stack(files)
  for (fileName in fileNames){
    setwd(foldershp)
    data <- readOGR(dsn=".", substr(fileName, 1, nchar(fileName)-4))
    names(data) <- substr(fileName, 1, nchar(fileName)-4)
    m <- mask(s, data)
    setwd(folderout)
    png(paste(i, "_", names(data), "_hist_plot", sep=""))
    hist(m)
    dev.off()
    png(paste(i, "_clipped_", names(data), sep=""))
    plot(m)
    dev.off()
  }
}
I'll give this a shot, though I'm sure there are more elegant ways....
Set the environments, read in the shapefile names as a list, and create a character vector of the folders where the rasters are kept (in your case, "mtvi", "ndvi", etc.):
require(raster)
require(gdata)
require(rgdal)
setwd("/home/st/R/shapes")
fileNames <- list.files(path = ".", pattern = "*.shp$")
ras.fols <- c("one","two","three")
setwd("/home/st/R/rasters")
Now loop through the raster-folder vector, setting the working directory to each folder and making a stack of the rasters inside it.
The nested loop then takes your list of shapefiles and, one by one, masks the raster stack with each shapefile, saving a histogram and a plot image of the masked data.
It then returns to the outer loop, moves on to the next raster folder, and performs the same actions.
for (i in ras.fols) {
  s <- stack()
  setwd(paste("/home/st/R/rasters/", i, sep=""))
  files <- list.files(path = getwd(), pattern = "*.tif$")
  s <- stack(files)
  for (fileName in fileNames){
    setwd("/home/st/R/shapes")
    data <- readOGR(dsn=".", substr(fileName, 1, nchar(fileName)-4))
    names(data) <- substr(fileName, 1, nchar(fileName)-4)
    m <- mask(s, data)
    setwd("/home/st/R/write")
    png(paste(i, "_", names(data), "_hist_plot", sep=""))
    hist(m)
    dev.off()
    png(paste(i, "_clipped_", names(data), sep=""))
    plot(m)
    dev.off()
  }
}
Hope there's something in there that can help. You can tidy it up: put the shapefile folder and the character vector of raster folders at the top, replace the repeated setwd("...") calls with something tidier, etc.
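For example, a minimal sketch of that tidier variant, avoiding setwd() entirely by building full paths (the folder names are the same hypothetical ones as above):
library(raster)
library(rgdal)
shp_dir  <- "/home/st/R/shapes"
ras_root <- "/home/st/R/rasters"
out_dir  <- "/home/st/R/write"
ras.fols <- c("one", "two", "three")
shp_files <- list.files(shp_dir, pattern = "\\.shp$", full.names = TRUE)
for (i in ras.fols) {
  # stack all rasters in this folder via full paths
  s <- stack(list.files(file.path(ras_root, i), pattern = "\\.tif$", full.names = TRUE))
  for (f in shp_files) {
    lyr <- tools::file_path_sans_ext(basename(f))
    shp <- readOGR(dsn = dirname(f), layer = lyr)
    m <- mask(s, shp)
    png(file.path(out_dir, paste0(i, "_", lyr, "_hist_plot.png")))
    hist(m)
    dev.off()
    png(file.path(out_dir, paste0(i, "_clipped_", lyr, ".png")))
    plot(m)
    dev.off()
  }
}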

Calculating percentile across netcdf files fails

I have five netcdf files where each file contains data for a time section. I want to calculate the 98th percentile for the whole timespan for each cell individually.
The accumulated file size for the netcdf files is around 250 MB.
My approach is this:
library(raster)
fileType <- "\\.nc$"
filenameList <- list.files(path = getwd(), pattern = fileType, full.names = F, recursive = FALSE)
# rasterStack for all layers
rasterStack <- stack()
# stack all data
for (i in 1:length(filenameList)) {
  filename <- filenameList[i]
  stack.temp <- stack(filename)
  rasterStack <- stack(rasterStack, stack.temp)
}
# calculate raster containing the 98th percentiles
result <- calc(rasterStack, fun = function(x) {quantile(x, probs = .98, na.rm = TRUE)})
However, I get this error:
Error in ncdf4::nc_close(x@file@con) :
  no slot of name "con" for this object of class ".RasterFile"
The stacking section of my code works; the crash happens during the calc function.
Do you have any idea where this might come from? Is it maybe an issue of where the data is stored (memory/disk)?
Strange; I generated some dummy data and it seems to work just fine, so the problem does not seem to be your method. 250 MB is not overly huge. I would clip a small piece of each raster and test whether it works.
dat <- matrix(rnorm(16), 4, 4)
r1 <- raster(dat)
r2 <- r1 * 2
r3 <- r2 + 1
r4 <- r3 + 4
rStack <- stack(r1, r2, r3, r4)
result <- calc(rStack, fun = function(x) {quantile(x, probs = .98)})
Perhaps this is related to the odd way you create a RasterStack. You should simply do:
filenames <- list.files(pattern="\\.nc$")
rasterStack <- stack(filenames)
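Putting the two pieces together, the whole computation reduces to a few lines; a sketch under the same assumptions (the .nc files sit in the working directory):
library(raster)
# stack every netCDF file in one call, then take the cell-wise 98th percentile
rasterStack <- stack(list.files(pattern = "\\.nc$"))
result <- calc(rasterStack, fun = function(x) quantile(x, probs = 0.98, na.rm = TRUE))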

Merging multiple rasters in R

I've been trying to find a time-efficient way to merge multiple raster images in R. These are adjacent ASTER scenes from the southern Kilimanjaro region, and my target is to put them together to obtain one large image.
This is what I have so far (the object 'ast14dmo.sd' represents a list of RasterLayer objects):
# Loop through single ASTER scenes
for (i in seq(ast14dmo.sd)) {
  if (i == 1) {
    # Merge current with subsequent scene
    ast14dmo.sd.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
  } else if (i > 1 && i < length(ast14dmo.sd)) {
    tmp.mrg <- merge(ast14dmo.sd[[i]], ast14dmo.sd[[i+1]], tolerance = 1)
    ast14dmo.sd.mrg <- merge(ast14dmo.sd.mrg, tmp.mrg, tolerance = 1)
  } else {
    # Save merged image
    writeRaster(ast14dmo.sd.mrg, paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg", sep = ""), format = "GTiff", overwrite = TRUE)
  }
}
As you can surely guess, the code works. However, merging takes quite long, considering that each single raster object is some 70 MB. I also tried Reduce and do.call, but those failed since I couldn't pass the argument 'tolerance', which works around the different origins of the raster files.
Anybody got an idea of how to speed things up?
You can use do.call
ast14dmo.sd$tolerance <- 1
ast14dmo.sd$filename <- paste(path.mrg, "/AST14DMO_sd_", z, "m_mrg.tif", sep = "")
ast14dmo.sd$overwrite <- TRUE
mm <- do.call(merge, ast14dmo.sd)
Here with some data, from the example in raster::merge
r1 <- raster(xmx=-150, ymn=60, ncols=30, nrows=30)
r1[] <- 1:ncell(r1)
r2 <- raster(xmn=-100, xmx=-50, ymx=50, ymn=30)
res(r2) <- c(xres(r1), yres(r1))
r2[] <- 1:ncell(r2)
x <- list(r1, r2)
names(x) <- c("x", "y")
x$filename <- 'test.tif'
x$overwrite <- TRUE
m <- do.call(merge, x)
The merge function from the raster package is a little slow. For large projects a faster option is to work with GDAL commands in R:
library(gdalUtils)
library(rgdal)
Build list of all raster files you want to join (in your current working directory).
all_my_rasts <- c('r1.tif', 'r2.tif', 'r3.tif')
Make a template raster file to build onto. Think of this as a big blank canvas to add tiles to:
e <- extent(-131, -124, 49, 53)
template <- raster(e)
projection(template) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
writeRaster(template, file="MyBigNastyRasty.tif", format="GTiff")
Merge all raster tiles into one big raster.
mosaic_rasters(gdalfile=all_my_rasts,dst_dataset="MyBigNastyRasty.tif",of="GTiff")
gdalinfo("MyBigNastyRasty.tif")
This should work pretty well for speed (faster than merge in the raster package), but if you have thousands of tiles you might even want to look into building a vrt first.
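A minimal sketch of the VRT route, reusing the hypothetical all_my_rasts from above: build a lightweight virtual mosaic first, then translate it to a GeoTIFF in one pass.
library(gdalUtils)
# a .vrt is just an XML index of the tiles, so this step is nearly instant
gdalbuildvrt(gdalfile = all_my_rasts, output.vrt = "mosaic.vrt")
# materialize the mosaic as a single GeoTIFF
gdal_translate(src_dataset = "mosaic.vrt", dst_dataset = "MyBigMosaic.tif", of = "GTiff")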
You can use Reduce, for example:
Reduce(function(...) merge(..., tolerance = 1), ast14dmo.sd)
The SAGA GIS mosaicking tool (http://www.saga-gis.org/saga_tool_doc/7.3.0/grid_tools_3.html) gives you maximum flexibility for merging numeric layers, and it runs in parallel by default! You only have to translate all rasters/images to the SAGA .sgrd format first, then run saga_cmd from the command line.
I have tested the solution using gdalUtils, as proposed by Matthew Bayly. It works quite well and fast (I have about 1000 images to merge). However, after checking the documentation of the mosaic_rasters function, I found that it works without making a template raster before mosaicking the images. I pasted the example code from the documentation below:
outdir <- tempdir()
gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if (require(raster) && require(rgdal) && valid_install) {
  layer1 <- system.file("external/tahoe_lidar_bareearth.tif", package="gdalUtils")
  layer2 <- system.file("external/tahoe_lidar_highesthit.tif", package="gdalUtils")
  mosaic_rasters(gdalfile=c(layer1,layer2), dst_dataset=file.path(outdir,"test_mosaic.envi"),
                 separate=TRUE, of="ENVI", verbose=TRUE)
  gdalinfo("test_mosaic.envi")
}
I was faced with the same problem, and I used:
# Read desired files into R
data_name1 <- 'file_name1.tif'
r1 <- raster(data_name1)
data_name2 <- 'file_name2.tif'
r2 <- raster(data_name2)
# Merge files
new_data <- raster::merge(r1, r2)
Although this did not produce a new merged raster file on disk, the merged raster was stored in the environment and produced a merged map when plotted.
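If you do want the merged result on disk as well, one more line takes care of it (a sketch; the output filename is made up):
writeRaster(new_data, filename = "merged_output.tif", format = "GTiff", overwrite = TRUE)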
I ran into the following problem when trying to mosaic several rasters on top of each other:
In vv[is.na(vv)] <- getValues(x[[i]])[is.na(vv)] :
  number of items to replace is not a multiple of replacement length
As @Robert Hijmans pointed out, it was likely because of misaligned rasters. To work around this, I had to resample the rasters first:
library(raster)
x <- raster("Base_raster.tif")
r1 <- raster("Top1_raster.tif")
r2 <- raster("Top2_raster.tif")
# Resample
x1 <- resample(r1, crop(x, r1))
x2 <- resample(r2, crop(x, r2))
# Merge rasters. Make sure to use the right order
m <- merge(merge(x1, x2), x)
# Write output
writeRaster(m,
            filename = file.path("Mosaic_raster.tif"),
            format = "GTiff",
            overwrite = TRUE)

Loop in R to read many files

I have been wondering if anybody knows a way to create a loop that loads files/databases in R.
Say I have some files like these: data1.csv, data2.csv, ..., data100.csv.
In some programming languages one can write something like data + {x} + .csv and the system recognizes it as datax.csv, and then you can apply the loop.
Any ideas?
Sys.glob() is another possibility; its sole purpose is globbing, or wildcard expansion.
dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)
That will read all the files of the form data[x].csv into list dataFiles, where [x] is nothing or anything.
[Note this is a different pattern to that in @Joshua's answer. There, list.files() takes a regular expression, whereas Sys.glob() just uses standard wildcards; which wildcards can be used is system-dependent, and details can be found on the help page ?Sys.glob.]
See ?list.files.
myFiles <- list.files(pattern="data.*csv")
Then you can loop over myFiles.
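For instance, a minimal sketch of such a loop (what you do with each file is up to you; the print is just a placeholder):
for (f in myFiles) {
  dat <- read.csv(f)
  print(dim(dat)) # placeholder: process each file's data here
}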
I would put all the CSV files in a directory, create a list, and loop to read all the CSV files from the directory into the list:
setwd("~/Documents/")
ldf <- list() # creates an empty list
listcsv <- dir(pattern = "\\.csv$") # all the csv files in the directory
for (k in 1:length(listcsv)) {
  ldf[[k]] <- read.csv(listcsv[k])
}
str(ldf[[1]])
Read the files with their headers so that bind_rows can match columns in the merged data frame:
library(dplyr)
library(readr)
list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows()
fi <- list.files(directory_path,full.names=T)
dat <- lapply(fi,read.csv)
dat will contain the datasets in a list
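To keep track of which element came from which file, you can name the list after the files (building on fi and dat from above):
names(dat) <- basename(fi)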
Let's assume that your files have the file format that you mentioned in your question and that they are located in the working directory.
You can vectorise creation of the file names if they have a simple naming structure. Then apply a loading function to all the files (here I used the purrr package, but you could also use lapply):
library(purrr)
c(1:100) %>% paste0("data", ., ".csv") %>% map(read.csv)
Here's another solution using a for loop. I like it better than the others because of its flexibility and because all of the data frames are stored directly in the global environment.
Assuming you've already set your working directory, the loop will iteratively read all the files and store them in the global environment under the names "data1", "data2", and so on:
ids <- 1:100 # renamed from 'list' to avoid masking base::list
for (i in ids) {
  filename <- paste0("data", i)
  wd <- paste0("data", i, ".csv")
  assign(filename, read.csv(wd))
}
First, set the working directory.
Find and store all the files ending with .csv.
Bind all of them row-wise.
Following is the code sample:
setwd("C:/yourpath")
temp <- list.files(pattern = "\\.csv$")
allData <- do.call("rbind", lapply(temp, read.csv))
This may be helpful if you have datasets for participants, as in psychology/sports/medicine etc.:
library(haven) # provides read_sav for SPSS files
setwd("C:/yourpath")
temp <- list.files(pattern = "\\.sav$")
# Maybe you want to unselect/delete some IDs
DEL <- grep('ID(04|08|11|13|19)\\.sav', temp)
temp2 <- temp[-DEL]
# Make a list that contains all data
read.all <- lapply(temp2, read_sav)
# View(read.all[[1]])
# Option 1: put one under the next
df <- do.call("rbind", read.all)
Option 2: compute something within each dataset (single IDs), e.g. get the mean of certain parts for each participant:
mw_extraktion <- function(data_raw){
  data_raw <- data.frame(data_raw)
  # you may now calculate e.g. the mean of a certain variable for each ID
  ID <- data_raw$ID[1]
  data_OneID <- c(ID, Var2, Var3) # put your new variables (e.g. means) here; Var2 and Var3 are placeholders
  return(data_OneID)
} # end of function
data_combined <- t(data.frame(sapply(read.all, mw_extraktion)))
