I have one netCDF file (.nc) with 16 years (1998-2014) worth of daily precipitation (5844 layers). The 3 dimensions are time (size 5844), latitude (size 19) and longitude (size 20).
Is there a straightforward approach in R to compute for each raster cell:
Monthly & yearly average
A cumulative comparison (e.g. Jan-Mar compared to the average of all Jan-Mar periods)
So far I have:
library(ncdf4)
library(raster)
Rname <- 'F:/extracted_rain.nc'
rainfall <- nc_open(Rname)
readRainfall <- ncvar_get(rainfall, "rain") #"rain" is float name
raster_rainfall <- raster(Rname, varname = "rain") # also tried brick()
asdatadates <- as.Date(rainfall$dim$time$vals/24, origin='1998-01-01') #The time interval is per 24 hours
My first challenge will be the computation of monthly averages for each raster cell. I'm not sure how best to proceed while keeping the ultimate goal (cumulative comparison) in mind. How can I easily access only days from a certain month?
raster(readRainfall[,,500]) # doesn't seem like a straightforward approach
Hopefully I made my question clear, a first push in the right direction would be much appreciated.
Sample data here
The question asked for a solution in R, but in case anyone is looking to do this task and wants a simple command-line alternative, these kinds of statistics are the bread and butter of CDO.
Monthly averages:
cdo monmean in.nc monmean.nc
Annual averages:
cdo yearmean in.nc yearmean.nc
Make the average of all the Jan, Feb etc:
cdo ymonmean in.nc ymonmean.nc
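The monthly anomaly relative to the long-term annual cycle (ymonsub subtracts the matching calendar-month mean from each month; plain sub would need both files to have the same number of time steps):
cdo ymonsub monmean.nc ymonmean.nc monanom.nc
If you then want a specific month, just select it with selmon or seldate.
You can call these commands from R using the system() function.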
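As a rough sketch of calling CDO from R: the snippet below builds the argument vectors and hands them to system2(), a close relative of system() that takes the arguments as a character vector. The file names in.nc, monmean.nc and ymonmean.nc are just the placeholders used above, and the guard assumes nothing about your setup: the commands only run if a cdo binary is on the PATH and in.nc exists.

```r
# Argument vectors for two of the CDO calls shown above (file names are placeholders)
steps <- list(
  c("monmean",  "in.nc", "monmean.nc"),
  c("ymonmean", "in.nc", "ymonmean.nc")
)

# Only run when cdo is installed and the input file is present
if (nzchar(Sys.which("cdo")) && file.exists("in.nc")) {
  for (args in steps) {
    status <- system2("cdo", args)   # returns the exit status of the command
    if (status != 0) stop("cdo failed for: ", paste(args, collapse = " "))
  }
}
```

The output files can then be read back with raster::stack() or ncdf4 in the usual way.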
Here is one approach using the zoo-package:
### first read the data
library(ncdf4)
library(raster)
library(zoo)
### use stack() instead of raster
stack_rainfall <- stack(Rname, varname = "rain")
### i renamed your "asdatadates" object for simplicity
dates <- as.Date(rainfall$dim$time$vals/24, origin='1998-01-01')
In your example dataset you only have 18 layers, all coming from January 1998. However, the following should also work with more layers (months).
First, we will build a function that operates on one vector of values (i.e. a pixel time series). It converts the input to a zoo object using dates and then calculates the monthly mean using aggregate. The function returns a vector with a length equal to the number of months in dates.
monthly_mean_stack <- function(x) {
require(zoo)
pixel.ts <- zoo(x, dates)
out <- as.numeric(aggregate(pixel.ts, as.yearmon, mean, na.rm=TRUE))
out[is.nan(out)] <- NA
return(out)
}
Then, depending on whether you want the output to be a vector / matrix / data frame or want to stay in the raster format, you can either apply the function over the cell values after retrieving them with getValues, or use the calc function from the raster package to create a raster output (this will be a raster stack with as many layers as there are months in your data).
v <- getValues(stack_rainfall) # every row holds one pixel time series
# this should give you a matrix with ncol = number of months and nrow = number of pixels
means_matrix <- t(apply(v, 1, monthly_mean_stack))
means_stack <- calc(stack_rainfall, monthly_mean_stack)
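For readers who prefer to work on the plain array returned by ncvar_get, here is a minimal base-R sketch of the same idea, extended to the Jan-Mar comparison from the question. The array rain and its dimensions are made-up stand-ins (random values, 2 x 3 cells, two years), and the dimension order lon x lat x time is an assumption; check it against your own file.

```r
# Hypothetical stand-in for the ncvar_get() output: lon x lat x time
set.seed(42)
dates <- seq(as.Date("1998-01-01"), as.Date("1999-12-31"), by = "day")
rain  <- array(runif(2 * 3 * length(dates)), dim = c(2, 3, length(dates)))

# One label per day, e.g. "1998-01"; days sharing a label form one month
ym <- format(dates, "%Y-%m")

# Monthly mean per cell; the result has dimensions months x lon x lat
monthly <- apply(rain, c(1, 2), function(v) tapply(v, ym, mean, na.rm = TRUE))

# Jan-Mar total per year and cell ...
jfm <- format(dates, "%m") %in% c("01", "02", "03")
yr  <- format(dates[jfm], "%Y")
jfm_total <- apply(rain[, , jfm], c(1, 2),
                   function(v) tapply(v, yr, sum, na.rm = TRUE))

# ... compared with the average Jan-Mar total over all years
jfm_clim <- apply(jfm_total, c(2, 3), mean)        # lon x lat
jfm_anom <- sweep(jfm_total, c(2, 3), jfm_clim)    # years x lon x lat
```

Selecting only the days of a given month is then just a logical index such as format(dates, "%m") == "01" on the time dimension.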
When you're working with large raster datasets you can also apply your functions in parallel using the clusterR function. See ?clusterR
I think it is easiest to convert to a raster brick and then into a data.frame.
Then you can pull out stats quite easily using general code such as DF$weeklymean <- rowMeans(DF[, ])
Related
Apologies in advance for wording, English is not my native language and this is my first post. I have been able to aggregate my data to this point, but am having issues condensing it further. I am trying to get the weighted average depth by biomass of several species.
My data currently has columns (station, time, layer, depth, biomass_X, biomass_Y, biomass_Z, ...) and I want to condense it to (station, time, weighted_depth_X, weighted_depth_Y, weighted_depth_Z, ...).
I got this code to work, but is there a way to loop it so it can complete all my columns?
library(plyr)
newData<-ddply(data, ~station+time, summarize, weighted.mean(data[,6], w=depth))
There is certainly a nicer way but this should work:
# data: dataframe containing columns to be averaged
# weights: vector containing the corresponding weights
weighted_mean_all_cols<- function(data,weights){
res<-do.call(cbind,llply(colnames(data), function(col) {weighted.mean(data[,col], w=weights)}))
colnames(res) <- colnames(data)
res
}
# collect the names of the target columns to average
targetCols <- grep("^biomass",colnames(data))
# apply weighted average by group, for every target column
newData <- ddply(data, c('station','time'), function(groupDF) {
# drop = FALSE keeps a data frame even when there is only one target column
weighted_mean_all_cols(groupDF[,targetCols,drop=FALSE],groupDF$depth)
})
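Note that the code above averages each biomass column using depth as the weight. If the goal is, as the question words it, the average depth weighted by biomass, the two roles swap. The following is a minimal base-R sketch of that reading; the small data frame and the weighted_depth_ column prefix are invented for illustration.

```r
# Hypothetical data in the layout described in the question
data <- data.frame(
  station   = c("A", "A", "A", "B", "B"),
  time      = c(1, 1, 1, 1, 1),
  depth     = c(10, 20, 30, 10, 20),
  biomass_X = c(1, 1, 2, 0, 4),
  biomass_Y = c(3, 0, 0, 1, 1)
)

biomass_cols <- grep("^biomass", names(data), value = TRUE)

# Split by station/time, then weight depth by each biomass column in turn
groups <- split(data, list(data$station, data$time), drop = TRUE)
rows <- lapply(groups, function(g) {
  wd <- sapply(g[biomass_cols], function(b) weighted.mean(g$depth, w = b))
  names(wd) <- sub("^biomass", "weighted_depth", biomass_cols)
  data.frame(station = g$station[1], time = g$time[1], as.list(wd))
})
newData <- do.call(rbind, rows)
```

Each row of newData holds one station/time combination with one weighted depth per species.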
NDVI RasterStack
I have a RasterStack of 15-day NDVI data covering 1981-2015.
I need to calculate monthly NDVI from the 15-day data.
I want to know how to take the mean of the layers whose names share the same month (MM) within a year, giving a new monthly 1981-2015 RasterStack.
I would appreciate your help. Thank you very much.
The layer names have the form XYYYY.MM.DD
I have recently been working on a solution to the same problem; this should work for you as well.
You want to create a separate variable that contains the dates from your layer names.
#this removes the "X" character from the name leaving only the dates
layer_name <- sub('.', '', names(NDVI_stack))
#install.packages("lubridate")
library(lubridate)
layer_name <- ymd(layer_name)
#Create indices for stackApply, which takes the mean of all layers that fall in the same month of the same year.
indices <- format(as.Date(layer_name, format = "%Y.%m.%d"), format = "%Y.%m")
NDVI_mean <- stackApply(NDVI_stack, indices, mean)
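The grouping step can be checked without any raster data. The sketch below uses four invented layer names and base R only; it also converts the year-month labels to consecutive integers, on the assumption (from the stackApply help page) that integer group indices are the safest input.

```r
# Hypothetical layer names in the XYYYY.MM.DD form from the question
layer_names <- c("X1981.01.01", "X1981.01.16", "X1981.02.01", "X1982.01.01")

# The leading "X" is matched literally by the format string
layer_dates <- as.Date(layer_names, format = "X%Y.%m.%d")

# Same year AND same month share a label; January 1982 stays separate from January 1981
year_month <- format(layer_dates, "%Y.%m")

# Consecutive integer group ids in order of first appearance
indices <- as.integer(factor(year_month, levels = unique(year_month)))
```

Passing such an indices vector to stackApply should give one output layer per distinct year-month.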
I have a list of rasters (.tif format) for multiple years. It is a 16-day NDVI product from Landsat. I want to make a monthly NDVI (the average of two consecutive rasters) and save it in the same or a different directory as a monthly average.
I have listed the rasters and made a stack of them, and then used stackApply to calculate the mean, but it produces empty rasters. I have 23 images for a single year, which I want to average into 12 months. This is what my raster files look like:
"landsatNDVISC05SLC2000001.tif" "landsatNDVISC05SLC2000017.tif"
"landsatNDVISC05SLC2000033.tif" "landsatNDVISC05SLC2000049.tif"
"landsatNDVISC05SLC2000065.tif" "landsatNDVISC05SLC2000081.tif"
"landsatNDVISC05SLC2000097.tif" "landsatNDVISC05SLC2000113.tif"
"landsatNDVISC05SLC2000129.tif" "landsatNDVISC05SLC2000145.tif"
"landsatNDVISC05SLC2000161.tif" "landsatNDVISC05SLC2000177.tif"
"landsatNDVISC05SLC2000193.tif" "landsatNDVISC05SLC2000209.tif"
"landsatNDVISC05SLC2000225.tif" "landsatNDVISC05SLC2000241.tif"
"landsatNDVISC05SLC2000257.tif" "landsatNDVISC05SLC2000273.tif"
"landsatNDVISC05SLC2000289.tif" "landsatNDVISC05SLC2000305.tif"
"landsatNDVISC05SLC2000321.tif" "landsatNDVISC05SLC2000337.tif"
"landsatNDVISC05SLC2000353.tif"
This code runs but produces more than twelve empty rasters. I also want to save each layer of the resulting brick as a separate monthly raster.
library(raster)
lrast<-list.files("G:/LANDSAT-NDVI/testAverage")
layers<-paste("landsatNDVISC05SLC2000", seq(from=001, to=353,by=16))
stak<-stack(lrast)
raster<-stackApply(stak, layers, fun = mean)
I want to make a monthly average from landsatNDVISC05SLC2000001.tif and landsatNDVISC05SLC2000017.tif as landsatNDVISC05SLC2000M1.tif, and similarly for 33 and 49 and so on. Since I only have 23 rasters, I want to retain landsatNDVISC05SLC2000353.tif as landsatNDVISC05SLC2000M12.tif
I am not sure how stackApply works, but something like this should do what is needed.
library(raster)
# list the GeoTIFFs with full paths; sorted by name, so also by day of year
files <- list.files(path = "...", full.names = TRUE, pattern = "\\.tif$")
stk <- stack(files)
# characters 23-25 of the file name hold the three-digit day of year (001, 017, ..., 353)
jday <- as.numeric(substr(basename(files), 23, 25))
# day of year 1 is 2000-01-01, hence the "- 1"
dates <- as.Date(jday - 1, origin = as.Date("2000-01-01")) # create a Date vector
stk <- setZ(stk, dates) # assign the date vector to the raster stack
monthly <- zApply(stk, by = format(dates, "%Y-%m"), fun = mean, na.rm = TRUE) # create the monthly stack
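Grouping by calendar month does not always pair consecutive images: with a 16-day step in the leap year 2000, day 337 already falls in December. If strict pairs are wanted, as the question describes, a simple pair index is an alternative. The sketch below only builds the index and the output names in base R; the file names are reconstructed from the listing in the question, and the M1 to M12 naming follows the question's wish.

```r
# Day-of-year values 1, 17, ..., 353 and the matching file names from the question
doy   <- seq(1, 353, by = 16)
files <- sprintf("landsatNDVISC05SLC2000%03d.tif", doy)

# Consecutive pairs: 1,1,2,2,...,11,11,12 (the last image stands alone)
pair_index <- ceiling(seq_along(files) / 2)

# One output name per group, as requested in the question
out_names <- sprintf("landsatNDVISC05SLC2000M%d.tif", unique(pair_index))

# For comparison: the calendar month each image actually falls in
dates <- as.Date(doy - 1, origin = "2000-01-01")
month_of_image <- format(dates, "%m")
```

With the raster package loaded, pair_index could be used as the indices argument of stackApply(stk, pair_index, fun = mean), and the resulting layers written out one by one with writeRaster under the names in out_names.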
I am looking to loop over my R data frame, which is in year-quarter format, and run a rolling regression for every year-quarter. I then use the coefficients from this model to fit values that are one quarter ahead. I would like to use a quarterly date format in R for this.
I had a similar issue in a Stata question (Stata year-quarter for loop), but am revisiting it in R. Does R have a notion of year-quarters that can easily be used in a loop? For example, one possibly roundabout way is
months.list <- c("03","06","09","12")
years.list <- c(1992:2007)
## Loop over the month and years
for(yidx in years.list)
{
for(midx in months.list)
{
}
}
I see that the zoo package has some functions, but I am not sure which one fits my case. Something along the following lines would be ideal:
for (yqidx in 1992Q1:2007Q4){
z <- lm(y ~ x, data = mydata <= yqidx )
}
When I do the look-ahead, I need to handle it so that the predicted value is computed for the next quarter, that is yqidx + 1, so that 2000Q4 moves on to 2001Q1.
If all you need help on is how to generate quarters,
require(data.table)
require(zoo)
months.list <- c("03","06","09","12")
years.list <- c(1992:2007)
#The next line of code generates all the month-year combinations.
df<-expand.grid(year=years.list,month=months.list)
#Then, we paste together the year and month with a day so that we get dates like "2007-03-01". Pass that to as.Date, and pass the result to as.yearqtr.
df$Date=as.yearqtr(as.Date(paste0(df$year,"-",df$month,"-01")))
df<-df[order(df$Date),]
Then you can use loops if you'd like. I'd personally consider using data.table like so:
require(data.table)
require(zoo)
DT<-data.table(expand.grid(year=years.list,month=months.list))
DT<-DT[order(year,month)]
DT[,Date:=as.yearqtr(as.Date(paste0(year,"-",month,"-01")))]
#Generate fake x values.
DT[,X:=rnorm(64)]
#Generate time index.
DT[,t:=1:64]
#Generate fake Y values that depend on X and the time index.
DT[,Y:=X+rnorm(64)+t]
#Get rid of the year and month columns -unneeded.
DT[,c("year","month"):=NULL]
#Create a second data.table to hold all your models.
Models<-data.table(Date=DT$Date,Index=1:64)
#Generate your (rolling) models. I am assuming you want to use all past observations in each model.
Models[,Model:=list(list(lm(data=DT[1:Index],Y~X+t))),by=Index]
#You can access an individual model thusly:
Models[5,Model]
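The one-quarter-ahead step from the question can be sketched in base R as well. This is only an illustration under stated assumptions: the data are random, the quarter is encoded as the integer year * 4 + (quarter - 1) so that adding 1 rolls 2000Q4 over to 2001Q1, and min_obs is an arbitrary minimum sample size before the first fit.

```r
# Hypothetical data: 64 quarters, 1992Q1 to 2007Q4
set.seed(1)
n    <- 64
qidx <- seq(1992 * 4, by = 1, length.out = n)   # integer quarter index
mydata <- data.frame(
  year = qidx %/% 4,
  qtr  = qidx %% 4 + 1,
  x    = rnorm(n)
)
mydata$y     <- 2 * mydata$x + rnorm(n)
mydata$label <- paste0(mydata$year, "Q", mydata$qtr)

# Expanding window: fit on quarters 1..i, predict quarter i + 1
min_obs <- 8                      # assumed minimum number of observations
pred    <- rep(NA_real_, n)
for (i in min_obs:(n - 1)) {
  fit         <- lm(y ~ x, data = mydata[1:i, ])
  pred[i + 1] <- predict(fit, newdata = mydata[i + 1, ])
}
```

Here pred[k] is the forecast for quarter mydata$label[k] made with data up to the previous quarter. zoo's as.yearqtr gives the same roll-over behaviour, since adding 0.25 to a yearqtr value moves it one quarter ahead.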
I have a list, dataframes, of 288 data frames in R (read from ASCII files) that contain values and their coordinates. The data are the average temperature for every hour by month, so the data frames have names that range from jan_01 to dec_24. What I want to do is restrict these so that they only contain values for a specific range (region) of coordinates. I can do this successfully for each individual frame using a lower bound xl and an upper bound xu. For example, the x-coordinates for 01:00 in April would be restricted by using
apr_01 <- apr_01[apr_01$x <= xu,]
apr_01 <- apr_01[apr_01$x >= xl,]
I suspect there's some way to use lapply() or a series of loops so that this operation can be done to all data frames for the whole year, but I couldn't figure out how to implement it since my method above needs the unique data frame name. I tried writing a generalizable function for use with lapply(), but I'm new to R so haven't had much luck. This is probably a trivial problem, but any help would be appreciated!
edit: The function I tried to write, which obviously won't work, was something like
restrict <- function(f){
f <- f[f$s1 <= xu]
f <- f[f$s1 >= xl]
}
for(i in 1:24){
apr_[i] <- restrict(apr_[i])
}
This turned out to be really simple. Here is the code that solved my problem:
restrict <- function(f){
# keep only rows whose x lies within [xl, xu] and return the result explicitly
f[f$x >= xl & f$x <= xu,]
}
dataframes <- lapply(dataframes, restrict)
The restrict function restricts the x-coordinates to what I'm interested in, then I applied it to each data frame in the list with lapply().
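A small self-contained illustration of the same pattern follows. The two data frames and the bounds are made up, and the bounds are passed as arguments (lower, upper) instead of being read from global variables, which is a design choice of this sketch and not part of the original solution.

```r
# Hypothetical bounds and a two-element stand-in for the list of 288 data frames
xl <- 2
xu <- 4
dataframes <- list(
  jan_01 = data.frame(x = 1:5, value = c(10, 11, 12, 13, 14)),
  jan_02 = data.frame(x = 3:7, value = c(20, 21, 22, 23, 24))
)

# Keep rows with lower <= x <= upper; drop = FALSE always returns a data frame
restrict <- function(f, lower, upper) {
  f[f$x >= lower & f$x <= upper, , drop = FALSE]
}

# Extra arguments after the function are passed on by lapply()
restricted <- lapply(dataframes, restrict, lower = xl, upper = xu)
```

lapply keeps the list names, so restricted$jan_01 is the restricted version of dataframes$jan_01.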