Merge netCDF files in R

I have 2 netCDF files (each .nc file has 4 variables: Susceptible, Infected, Recovered and Inhabitable; the dimension of each variable is 64 x 88). I would like to merge these 2 files into a single netCDF file that stacks, separately, Susceptible from the 2 files, Infected from the 2 files, Recovered from the 2 files, and Inhabitable from the 2 files.
Here are the 2 files (first and second).
Could anyone help me with this please?
Thanks in advance,
Ashok

The ncdf4 package will do what you want. Have a look at the code below; the example handles one variable only.
#install.packages('ncdf4')
library(ncdf4)
file1 <- nc_open('England_aggr_GPW4_2000_0001.nc')
file2 <- nc_open('England_aggr_GPW4_2000_0002.nc')
# Just for one variable for now
dat_new <- cbind(
  ncvar_get(file1, 'Susceptible'),
  ncvar_get(file2, 'Susceptible'))
dim(dat_new)
var <- file1$var[['Susceptible']]
# ncvar_def() needs dimension objects, so define them from the stacked matrix
dims <- list(
  ncdim_def('x', '', seq_len(nrow(dat_new))),
  ncdim_def('y', '', seq_len(ncol(dat_new))))
# Create a new file
file_new <- nc_create(
  filename = 'England_aggr_GPW4_2000_new.nc',
  # We need to define the variables here
  vars = ncvar_def(
    name = 'Susceptible',
    units = var$units,
    dim = dims))
# And write to it
ncvar_put(
  nc = file_new,
  varid = 'Susceptible',
  vals = dat_new)
# Finally, close the file
nc_close(file_new)
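The same pattern can be wrapped in a loop over the four variables, so that the merged file holds all of them at once; a minimal sketch along the same lines (variable and input file names as above, output file name is just an example):
library(ncdf4)
var_names <- c('Susceptible', 'Infected', 'Recovered', 'Inhabitable')
file1 <- nc_open('England_aggr_GPW4_2000_0001.nc')
file2 <- nc_open('England_aggr_GPW4_2000_0002.nc')
# Bind each variable from the two files side by side
dat <- lapply(var_names, function(v)
  cbind(ncvar_get(file1, v), ncvar_get(file2, v)))
# Shared dimensions, taken from the stacked matrix
dims <- list(
  ncdim_def('x', '', seq_len(nrow(dat[[1]]))),
  ncdim_def('y', '', seq_len(ncol(dat[[1]]))))
# One variable definition per name, keeping the original units
defs <- lapply(var_names, function(v)
  ncvar_def(name = v, units = file1$var[[v]]$units, dim = dims))
out <- nc_create('England_aggr_GPW4_2000_merged.nc', vars = defs)
for (i in seq_along(var_names))
  ncvar_put(out, var_names[i], dat[[i]])
nc_close(out)
nc_close(file1)
nc_close(file2)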
Update:
An alternative approach is to use the raster package, as shown below. I didn't figure out how to make 4D raster stacks, so I am splitting your data into one netCDF file per variable. Would that work for you?
#install.packages('ncdf4')
library(ncdf4)
library(raster)
var_names <- c('Susceptible', 'Infected', 'Recovered', 'Inhabitable')
for (var_name in var_names) {
  # Create raster stack
  x <- stack(
    raster('England_aggr_GPW4_2000_0001.nc', varname = var_name),
    raster('England_aggr_GPW4_2000_0002.nc', varname = var_name))
  # Name each layer
  names(x) <- c('01', '02')
  writeRaster(x = x,
              filename = paste0(var_name, '_out.nc'),
              overwrite = TRUE,
              format = 'CDF')
}
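To sanity-check one of the per-variable outputs, the file can be read back as a two-layer stack; a quick sketch:
chk <- stack('Susceptible_out.nc')
nlayers(chk)  # expected: 2, one layer per input file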

Related

How to loop the following script in R instead of repeating the same steps for each file

I am new to R. I am working on CMIP6 models. I have over 200 .nc files. I can run the script for individual files and get the output I need, but I am having a hard time looping the script. Can you help me out? It would save me a lot of time. Thank you in advance.
setwd("D:\\data")
library (raster) ## load required library
library(sp) ## load library
library(ncdf4)
station.data = read.csv(file.choose(), sep = ",", header = T) ## import station data file
lon.lat = station.data[,c(2,3)] ## extract data of all stations in station file for which point values are to be extracted
lon.lat = SpatialPoints(lon.lat) ## lon.lat for further use
lon.lat
robject = brick(file.choose(), varname = "pr") ## import raster netCDF file from which point values are to be extracted
dim(robject) ## check the dimensions of project
vall = extract(robject , lon.lat, method = "simple" ) ## extract values
vall = t(vall)
write.csv(vall, file = "earthvegbil.csv", fileEncoding = "macroman") ## save output csv file containing the extracted point values
Put your code in an lapply and loop over the indices of your files. The latter is useful for getting nice numerical suffixes in write.csv.
setwd("D:\\data")
files <- list.files(pattern=".csv$")
bricks <- list.files(pattern=".pr$")
stopifnot(length(files) == length(bricks)) ## check for equal length of both vectors
mapply(seq_along(files), \(x) {
station.data <- read.csv(files[[x]], sep=", ", header=T) ## import station data.file
lon.lat <- station.data[, c(2, 3)] ## extract data of all stations in station file for which point values are to be extractd
lon.lat <- SpatialPoints(lon.lat) ## lon.lat for further use
robject <- brick(bricks[[x]], varname="pr")## import raster netcdf file from which point values are to be extractd
vall <- extract(robject , lon.lat, method="simple" ) ## extract values
vall <- t(vall)
write.csv(vall, file=sprintf('./out/earthvegbil_%03d.csv', x), fileEncoding="macroman") ## save output csv file containing extracted point values .
}
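If you would rather pass both file vectors in parallel instead of indexing, base R's Map does the same job; a minimal sketch under the same assumptions (station coordinates in columns 2 and 3, variable "pr"):
results <- Map(function(st_file, nc_file) {
  station.data <- read.csv(st_file)
  lon.lat <- SpatialPoints(station.data[, c(2, 3)])
  t(extract(brick(nc_file, varname = "pr"), lon.lat, method = "simple"))
}, files, bricks)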

Creating a list of raster bricks from a multivariate netCDF file

I've been working with the RCP (Representative Concentration Pathway) spatial data. It's a nice gridded dataset in netCDF format. How can I get a list of bricks where each element represents one variable of a multivariate netCDF file (by variable I don't mean lat, lon, time, depth, etc.)? This is what I've tried to do. I can't post an example of the data, but I've set up the script below to be reproducible if you want to look into it. Questions welcome, obviously... I might not have expressed the language associated with the code smoothly. Cheers.
A: Package requirements
library(sp)
library(maptools)
library(raster)
library(ncdf)
library(rgdal)
library(rasterVis)
library(latticeExtra)
B: Gather data and look at the netCDF file structure
td <- tempdir()
tf <- tempfile(pattern = "fileZ")
download.file("http://tntcat.iiasa.ac.at:8787/RcpDb/download/R85_NOX.zip", tf , mode = 'wb' )
nc <- unzip( tf , exdir = td )
list.files(td)
## Take a look at the netCDF file structure, beyond this I don't use the ncdf package directly
ncFile <- open.ncdf(nc)
print(ncFile)
vars <- names(ncFile$var)[1:12] # I'll try to use these variable names later to make a list of bricks
C: Create a raster brick for one variable. Levels correspond to years
r85NOXene <- brick(nc, lvar = 3, varname = "emiss_ene")
NAvalue(r85NOXene) <- 0
dim(r85NOXene) # [1] 360 720 12
D: Names to faces
data(wrld_simpl) # in maptools
worldPolys <- SpatialPolygons(wrld_simpl@polygons)
cTheme <- rasterTheme(region = rev(heat.colors(20)))
levelplot(r85NOXene, layers = 4, zscaleLog = 10, main = "2020 NOx Emissions From Power Plants",
          margin = FALSE, par.settings = cTheme) + layer(sp.polygons(worldPolys))
E: Summarize all grid cells for each year for one variable, "emiss_ene"; I want to do this for each variable of the netCDF file I'm working with.
gVals <- getValues(r85NOXene)
dim(gVals)
r85NOXeneA <- sapply(1:12, function(x) {
  mat <- matrix(gVals[, x], nrow = 360)
  matfun <- sum(mat, na.rm = TRUE) # Other conversions are needed, but not for the question
  return(matfun)
})
F: Another meet and greet. Check out how E looks
library(ggplot2) # loaded here because of masking issues with latticeExtra
years <- c(2000,2005,seq(2010,2100,by=10))
usNOxDat <- data.frame(years=years,NOx=r85NOXeneA)
ggplot(data=usNOxDat,aes(x=years,y=(NOx))) + geom_line() # names to faces again
detach(package:ggplot2, unload=TRUE)
G: Attempt to create a list of bricks. A list of objects created in part C
brickLst <- lapply(1:12, function(x) {
  tmpBrk <- brick(nc, lvar = 3, varname = vars[x])
  NAvalue(tmpBrk) <- 0
  return(tmpBrk)
  # I thought a list of bricks would be a good structure to do (E) for each netCDF variable.
  # This doesn't break, but it returns all variables in each element of the list.
  # I want one variable in each element of the list.
  # With brick() you can ask for one variable from a netCDF file, as I did in (C).
  # Why can't I loop through the variable names and return one variable for each list element?
})
H: Get rid of the junk you might have downloaded... Sorry
file.remove(dir(td, pattern = "^fileZ",full.names = TRUE))
file.remove(dir(td, pattern = "^R85",full.names = TRUE))
close(ncFile)
Your (E) step can be simplified using cellStats.
foo <- function(x) {
  b <- brick(nc, lvar = 3, varname = x)
  NAvalue(b) <- 0
  cellStats(b, 'sum')
}
sumLayers <- sapply(vars, foo)
sumLayers is the result you are looking for, if I understood your question correctly.
Moreover, you may use the zoo package, since you are dealing with time series.
library(zoo)
tt <- getZ(r85NOXene)
z <- zoo(sumLayers, tt)
xyplot(z)
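And for your part (G), the same idea gives the list of bricks directly, with exactly one variable per element; a minimal sketch, assuming nc and vars from above:
brickLst <- lapply(vars, function(v) {
  b <- brick(nc, lvar = 3, varname = v)
  NAvalue(b) <- 0
  b
})
names(brickLst) <- vars  # each element now holds a single variable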

How to insert values into a variable?

I need to take a basename from a file path and insert it into a variable so I can access a column in a dataframe. I have created some sample data to illustrate what I am trying to accomplish.
Create some sample data:
library(raster)
## Create a matrix with random data & use image()
xy = matrix(rnorm(400),20,20)
image(xy)
# Turn the matrix into a raster
rast = raster(xy)
# Give it lat/lon coords for 36-37°E, 3-2°S
extent(rast) = c(36,37,-3,-2)
# ... and assign a projection
projection(rast) = CRS("+proj=longlat +datum=WGS84")
plot(rast)
# Write to disk:
writeRaster(rast, "C:/temp/12345.tif", format = "GTiff")
Create a path to raster
path = 'C:/temp/12345.tif'
Create a raster object:
r = raster(path)
Sample random locations in raster and report values in a dataframe
df = data.frame(sampleRandom(r, size=1000, cells=TRUE, sp=TRUE))
Now I need to automate the insertion of the basename into a variable so that it looks like:
test = df$X12345
This is my unsuccessful attempt at inserting the basename into the test variable:
require(tools)
name = basename(file_path_sans_ext(path))
test2 = paste('df$', 'X', name, sep = '')
>test2
[1] "df$X12345"
This method seems to create the correct character string "df$X12345", although I cannot access the data frame by calling test2. How can I turn this constructed string into a working reference so that I can access that particular data frame column?
Maybe you are looking for parse and eval:
df <- data.frame(a = 1, b = 2)
test2 <- "df$b"
eval(parse(text = test2))
# [1] 2
Alternatively, index the data frame by the constructed column name directly:
test = df[,paste0('X', name)]
Is this what you're looking for?
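As a small follow-up sketch, double-bracket indexing does the same thing and avoids building a string altogether:
test <- df[[paste0('X', name)]]  # equivalent to df$X12345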

How can I produce wig files from .sam files using a faster R script?

I have an R script that reads the lines of a .sam file after mapping. I want to parse the lines of the sam file into strings so that they are easier to manipulate, and then create the wig files I want or calculate the cov3 and cov5 that I need.
Can you please help me make this script run faster? How can I parse the lines of a huge .sam file into a data frame more quickly? Here is my script:
gc()
rm(list=ls())
exptPath <- "/home/dimitris/INDEX3PerfectUnique31cov5.sam"
lines <- readLines(exptPath)
pos = lines
pos
chrom = lines
chrom
pos = ""
chrom = ""
nn = length(lines)
nn
# parse lines of sam file into strings(this part is very very slow)
rr = strsplit(lines,"\t", fixed = TRUE)
rr
trr = do.call(rbind.data.frame, rr)
pos = as.numeric(as.character(trr[8:nn,4]))
# for cov3
#pos = pos+25
#pos
chrom = trr[8:nn,3]
pos = as.numeric(pos)
pos
tab1 = table(chrom,pos, exclude="")
tab1
ftab1 = as.data.frame(tab1)
ftab1 = subset(ftab1, ftab1[3] != 0)
ftab1 = subset(ftab1, ftab1[1] != "<NA>")
oftab1 = ftab1[ order(ftab1[,1]), ]
final.ftab1 = oftab1[,2:3]
write.table(final.ftab1, "ind3_cov5_wig.txt", row.names=FALSE,
sep=" ", quote=FALSE)
It's hard to provide a detailed answer without access to sample inputs and outputs (e.g., subsets of your data on Dropbox). The Bioconductor solution would be to convert the SAM file to BAM
library(Rsamtools)
bam <- "/path/to/new.bam")
asBam("/path/to/old.sam", bam)
then read the data in, perhaps directly (see ?scanBam and ?ScanBamParam to import just the fields / regions of interest)
rr <- scanBam(bam)
or in the end more conveniently
library(GenomicAlignments)
aln <- readGAlignments(bam)
## maybe cvg <- coverage(bam) ?
There would be several steps to do your manipulations, ending with a GRanges object (sort of like a data.frame, but where the rows have genomic coordinates) or related object
## ...???
## gr <- GRanges(seqnames, IRanges(start, end), strand=..., score=...)
The end goal is to export to a wig / bigWig / bed file using
library(rtracklayer)
export(gr, "/path/to.wig")
There are extensive help resources, including package vignettes, man pages, and the Bioconductor mailing list.
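Putting the pieces together, an untested end-to-end sketch (paths are placeholders):
library(Rsamtools)
library(GenomicAlignments)
library(rtracklayer)
bam <- asBam("/path/to/old.sam", "/path/to/new")  # converts, sorts and indexes
aln <- readGAlignments(bam)                       # one record per aligned read
cvg <- coverage(aln)                              # per-chromosome coverage (an RleList)
export(cvg, "/path/to/coverage.wig", format = "wig")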

R apply raster function to a list of characters

I recently started working with R, so this question probably has a simple solution.
I have some .tif satellite images from different scenes. I can create a test raster brick with them, but the process needs to be automated because of the huge number of files. Therefore I have been trying to write a function that reads the list of .tif files and outputs a list of rasters.
You can find here below the code I have been using:
# Description: Prepare a raster brick with ordered acquisitions
# from all the scenes of the study area
library(raster)
library(rgdal)
library(sp)
library(rtiff)
rm(list = ls())
setwd=getwd()
# If you want to download the .tif files of the 2 scenes from dropbox:
dl_from_dropbox <- function(x, key) {
require(RCurl)
bin <- getBinaryURL(paste0("https://dl.dropboxusercontent.com/s/", key, "/", x),
ssl.verifypeer = FALSE)
con <- file(x, open = "wb")
writeBin(bin, con)
close(con)
message(noquote(paste(x, "read into", getwd())))
}
dl_from_dropbox("lndsr.LT52210611985245CUB00-vi.NDVI.tif", "qb1bap9rghwivwy")
dl_from_dropbox("lndsr.LT52210611985309CUB00-vi.NDVI.tif", "sbhcffotirwnnc6")
dl_from_dropbox("lndsr.LT52210611987283CUB00-vi.NDVI.tif", "2zrkoo00ngigfzm")
dl_from_dropbox("lndsr.LT42240631992198XXX02-vi.NDVI.tif", "gx0ctxn2mca3u5v")
dl_from_dropbox("lndsr.LT42240631992214XXX02-vi.NDVI.tif", "pqnjw2dpz9beeo5")
dl_from_dropbox("lndsr.LT52240631986157CUB02-vi.NDVI.tif", "rrka10yaktv8la8")
# 1- Create a list of .tif files with names ordered chronologically (for time series analysis later on)
pathdir= # change
# List all the images from any scene in that folder and
# make a dataframe with a column for the date
a <- list.files(path=pathdir,pattern="lndsr.LT", all.files=FALSE,full.names=FALSE)
a1 <- as.data.frame(a, row.names=NULL, optional=FALSE, stringsAsFactors=FALSE) # class(a1$a) # character
# Create date column with Julian date and order it in ascending order
a1$date <- substr(a1$a, 16, 22) # class(a1$date) = character
a1 <- a1[order(a1$date),]
# Keep only the column with the name of the scene
a1 <- subset(a1, select=1) # class(a1$a): character
# retrieve an ordered list from the dataframe
ord_dates <- as.list(as.data.frame(t(a1$a))) # length(ord_dates): 4 (correct)
# class(ord_dates) # list
# 2- Create rasters from elements of a list
for (i in 1:(length(ord_dates))){
# Point to each individual .tif file
tif_file <- ord_dates[i] # Problem: accesses only the first item of ord_dates
# Make a raster out of it
r <- raster(tif_file) # we cant use here a list as an input. Gives error:
# Error in .local(x, ...) : list has no "x"
# Give it a standardised name (r1,r2,r3, etc)
name <- paste("r", 1:length(ord_dates),sep = "")
# Write the raster to file
writeRaster (r , filename = name,format = "GTiff", overwrite =T )
}
I have also tried to use lapply() without much success.
r = lapply(ord_dates, raster)
Can you give me advice on which approach to follow? I am guessing I should be using matrices, but I don't really understand what their advantages are here or at which step they are needed.
Any help is really appreciated!
Thanks in advance
Assuming ord_dates is a list of file names (with full paths, or located in your getwd()), you can apply a (any) function to this list using lapply. I haven't tested this, unfortunately.
convertAllToRaster <- function(tif_file) {
  r <- raster(tif_file)
  # Give it a standardised name (r1, r2, r3, etc.)
  name <- paste("r", 1:length(ord_dates), sep = "")
  # Write the raster to file
  writeRaster(r, filename = name, format = "GTiff", overwrite = TRUE)
  message("Eeee, maybe it was written successfully.")
}
lapply(ord_dates, FUN = convertAllToRaster)
After solving the issues with factors and with the name, this is the code that worked for me. I also added a for loop inside the function you proposed, Roman. Thank you very much for your kind help!!
convertAllToRaster <- function(ord_dates) {
  for (i in 1:(length(ord_dates))) {
    tif_file <- ord_dates[i]
    r <- raster(tif_file)
    # Keep the original name
    name <- paste(tif_file, ".grd", sep = "")
    # Write the raster to file
    writeRaster(r, filename = name, format = "raster", overwrite = TRUE) # in .grd format
  }
}
lapply(ord_dates, FUN = convertAllToRaster)
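For what it's worth, a simpler sketch of the same idea: if the ordered file names are kept as a plain character vector (which also sidesteps the factor issue), lapply over raster() returns the list of rasters directly.
ord_files <- a1$a                         # ordered file names as a character vector
raster_list <- lapply(ord_files, raster)  # one RasterLayer per .tif file
names(raster_list) <- ord_files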
