I would like to compare soil moisture rasters available every 3 days to rainfall rasters available daily. I make a stack of each and resample to the appropriate resolution. Now, to compare the stacks easily, it'd be nice to be able to copy each layer in the soil moisture stack and insert it next to itself twice. This is basically the same question as "Stacking an existing RasterStack multiple times", except that I need the big stack to be sorted so that all of the rasters are in time order. Is there a way to do this?
(I know that I could copy the files before stacking them the 1st time, but this would require resampling 3x the stack. Since resampling is the slowest part of my script, there should be a better way.)
Something like this?
# example data
library(raster)
r <- raster(ncol=10, nrow=10)
r[] <- 1:ncell(r)
x <- brick(r,r,r,r,r,r)
x <- x * 1:6
# copy each layer three times, keeping the layers in order
y <- list()
for (i in 1:nlayers(x)) {
  r <- raster(x, i)
  y <- c(y, r, r, r)
}
s <- stack(y)
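A more compact variant of the same idea, as a sketch: split the stack into a list of layers and let rep(..., each = 3) do the repetition, which keeps the copies adjacent and in the original order. The example data and the factor of 3 are only for illustration.

```r
library(raster)

# example data: six layers with distinct values, standing in for the 3-daily stack
r <- raster(ncol = 10, nrow = 10)
r[] <- 1:ncell(r)
x <- stack(lapply(1:6, function(k) r * k))

# unstack() gives a list of RasterLayers; rep(..., each = 3) repeats each
# list element three times in place, so the time order is preserved
s <- stack(rep(unstack(x), each = 3))

nlayers(s)  # 18
```

Layers 1 to 3 of s are then copies of layer 1 of x, layers 4 to 6 are copies of layer 2, and so on.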
I am trying to use R to find the harmonics within a sound file. I would also like to plot them as a graph of frequency in Hz (x) against strength (y). So far I have found it hard to find a helpful, working example of FFT being used on an audio file in R, as most of the tutorials work with a premade cosine or sine wave.
I found an example on the https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/fft page under community code, but it did not work very well when I tried it on the audio file.
#my addition (I've left the wave file space empty deliberately)
voice <- readWave("",from=0, to=Inf, units=c("seconds"), header=FALSE, toWaveMC=NULL)
#the community code
x <- wavobj@left
fs <- wavobj@samp.rate
nbits <- wavobj@bit
x <- x[1:(fs*5)]
y <- fft(x)
y.tmp <- Mod(y)
y.ampspec <- y.tmp[1:(length(y)/2+1)]
y.ampspec[2:(length(y)/2)] <- y.ampspec[2:(length(y)/2)] * 2
f <- seq(from=0, to=fs/2, length=length(y)/2+1)
plot(f, y.ampspec, type="h", xlab="Frequency (Hz)", ylab="Amplitude Spectrum", xlim=c(0, 350))
Please send some help!
If you're OK to use the seewave package, it has some helpful functions including meanspec which works out the mean frequency spectrum. Here's an example.
library(tuneR)
library(seewave)
data('sheep')
ms <- meanspec(sheep)
The ms object is a two dimensional array where the first column is the frequency and the second is the amplitude.
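If you would rather stay with base R's fft(), the same one-sided amplitude spectrum can be sketched on a signal whose content is known, which makes it easy to check that the harmonics land where expected. The synthetic signal below is only a stand-in for the samples in the left channel of a Wave object. The sampling rate and the two tone frequencies are made up for the example, and dividing by n is my addition so that the bar heights read as amplitudes.

```r
fs <- 8000                        # sampling rate in Hz (made up)
n  <- fs                          # one second of samples
t  <- (0:(n - 1)) / fs
x  <- sin(2 * pi * 220 * t) + 0.5 * sin(2 * pi * 440 * t)

y   <- fft(x)
amp <- Mod(y)[1:(n / 2 + 1)] / n        # keep the non-negative frequencies, scale by n
amp[2:(n / 2)] <- amp[2:(n / 2)] * 2    # fold in the mirrored negative frequencies
f   <- (0:(n / 2)) * fs / n             # frequency of each bin in Hz

# indices of the two strongest bins, largest first
peaks <- order(amp, decreasing = TRUE)[1:2]
f[peaks]    # 220 440

plot(f, amp, type = "h", xlab = "Frequency (Hz)", ylab = "Amplitude",
     xlim = c(0, 600))
```

With a real recording the peaks will be broader and noisier, so picking them out usually takes some thresholding rather than a plain order().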
I have a raster stack made of 11 ascii files having temperature values of an area. Each file represents a different time point such as t2, t3,...,t12. I want to select one specific pixel from this area and I want to make a graph showing the changes of temperature values in time (from t2 to t12) of this pixel. I tried the following code:
myfiles <- list.files(full.names = TRUE)
temp_files <- stack(myfiles)
temp_values <- extract(temp_files, mypixel) # mypixel is defined by xyFromCell function
plot(temp_values)
I checked the values and they seemed right. But I will apply the same code to stacks with 500 layers, and I cannot check each value in each layer, so is this the right way to do it?
Here is a minimal, self-contained, reproducible example
library(raster)
r <- stack(system.file("external/rlogo.grd", package="raster"))
Now you can do things like
x <- 1:nlayers(r)
# select the cell you want
y <- r[4089]
# or
# extract(r, 4089)
And
plot(x, y)
So what you are doing appears to be correct.
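For a stack too large to inspect by eye, one option is to check the structure of the result rather than the individual values. A small sketch with made-up data, where layer k holds k times the cell number, so the expected series for any cell is known in advance:

```r
library(raster)

r <- raster(ncol = 10, nrow = 10)
r[] <- 1:ncell(r)
s <- stack(lapply(1:5, function(k) r * k))   # 5 made-up layers standing in for the time steps

cell <- 42                                   # arbitrary cell number
v <- as.numeric(extract(s, cell))            # one value per layer, in layer order

length(v) == nlayers(s)   # TRUE
plot(seq_along(v), v, type = "b", xlab = "layer", ylab = "value")
```

If the length of the extracted vector equals nlayers() and the layers were stacked in time order, the plot is the time series for that pixel.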
I am using the mosaic function in the raster package to combine a long (11,000 files) list of rasters using the approach suggested by @RobertH here.
rlist <- lapply(list_names, raster)  # assumes list_names is a vector of raster file paths
rlist$fun <- mean
rlist$na.rm <- TRUE
x <- do.call(mosaic, rlist)
As you might imagine, this eventually overruns my available memory (on several different machines and computing clusters). My question is: Is there a way to reduce the memory usage of either mosaic or do.call? I've tried altering maxmemory in rasterOptions(), but that does not seem to help. Processing the rasters in smaller batches seems problematic because the rasters may be spatially disjunct (i.e., sequential raster files may be located very far from each other). Thanks in advance for any help you can give.
Rather than loading all rasters into memory at once (in the mosaic() call), can you process them one at a time? That way, you have your mosaic that updates each time you bring one more raster into memory, but then you can get rid of the new raster and just keep the continuously updating mosaic raster.
Assuming that your rlist object is a list of rasters, I'm thinking of something like:
Pseudocode
Initialize an updating_raster object as the first raster in the list
Loop through each raster in the list in turn, starting from the 2nd raster
Read the ith raster into memory called next_raster
Update the updating_raster object by overwriting it with the mosaic of itself and the next raster using a weighted mean
R code
Testing with the code in the mosaic() help file example...
First generate some rasters and use the standard mosaic method.
library(raster)
r <- raster(ncol=100, nrow=100)
r1 <- crop(r, extent(-10, 11, -10, 11))
r2 <- crop(r, extent(0, 20, 0, 20))
r3 <- crop(r, extent(9, 30, 9, 30))
r1[] <- 1:ncell(r1)
r2[] <- 1:ncell(r2)
r3[] <- 1:ncell(r3)
m1 <- mosaic(r1, r2, r3, fun=mean)
Put the rasters in a list so they are in a similar format as I think you have.
rlist <- list(r1, r2, r3)
Because of the NA handling of the weighted.mean() function, I opted to create the same effect by breaking down the summation and the division into distinct steps...
First initialize the summation raster:
updating_sum_raster <- rlist[[1]]
Then initialize the "counter" raster. This will represent the number of rasters that went into mosaicking at each pixel. It starts as a 1 in all cells that aren't NA. It should properly handle NAs such that it only will increment for a given pixel if a non-NA value was added to the updating sum.
updating_counter_raster <- updating_sum_raster
updating_counter_raster[!is.na(updating_counter_raster)] <- 1
Here's the loop that doesn't require all rasters to be in memory at once. The counter raster for the raster being added to the mosaic has a value of 1 only in the cells that aren't NA. The counter is updated by summing the current counter raster and the updating counter raster. The total sum is updated by summing the current raster values and the updating raster values.
for (i in 2:length(rlist)) {
next_sum_raster <- rlist[[i]]
next_counter_raster <- next_sum_raster
next_counter_raster[!is.na(next_counter_raster)] <- 1
updating_sum_raster <- mosaic(x = updating_sum_raster, y = next_sum_raster, fun = sum)
updating_counter_raster <- mosaic(updating_counter_raster, next_counter_raster, fun = sum)
}
m2 <- updating_sum_raster / updating_counter_raster
The values here seem to match the use of the mosaic() function
identical(values(m1), values(m2))
> TRUE
But the rasters themselves aren't identical:
identical(m1, m2)
> FALSE
Not totally sure why, but maybe this gets you closer?
Perhaps compareRaster() is a better way to check:
compareRaster(m1, m2)
> TRUE
Hooray!
Here's a plot!
plot(m1)
text(m1, digits = 2)
plot(m2)
text(m2, digits = 2)
A bit more digging in the weeds...
From the mosaic.R file:
It looks like the mosaic() function initializes a matrix called v to populate with the values from all the cells in all the rasters in the list. The number of rows in matrix v is the number of cells in the output raster (based on the full mosaicked extent and resolution), and the number of columns is the number of rasters to be mosaicked (11,000 in your case). Maybe you're running into the limits of matrix creation in R?
With a 1000 x 1000 raster (1e6 pixels), the v matrix of NAs takes up 41 GB. How big do you expect your final mosaicked raster to be?
r <- raster(ncol=1e3, nrow=1e3)
x <- 11000
v <- matrix(NA, nrow=ncell(r), ncol=x)
format(object.size(v), units = "GB")
[1] "41 Gb"
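The same figure can be estimated without allocating anything, which is handy when the matrix would not fit in memory anyway. This is only back-of-the-envelope arithmetic: a matrix of NA is logical and uses 4 bytes per value, and a matrix that ends up holding doubles uses 8. The helper name matrix_gib is made up for this sketch.

```r
# size in GiB of a cells-by-layers matrix; object.size() reports the same
# 1024-based units under the label "Gb"
matrix_gib <- function(n_cells, n_layers, bytes_per_value) {
  n_cells * n_layers * bytes_per_value / 1024^3
}

matrix_gib(1e6, 11000, 4)   # logical NA, 4 bytes per value: about 41
matrix_gib(1e6, 11000, 8)   # doubles, 8 bytes per value: about 82
```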
In R I can easily compute the max/min value of each cell in a georeferenced raster stack using the max/min commands.
set.seed(42)
require(raster)
r1 <- raster(nrows=10, ncols=10)
r2=r3=r4=r1
r1[]= runif(ncell(r1))
r2[]= runif(ncell(r1))+0.2
r3[]= runif(ncell(r1))-0.2
r4[]= runif(ncell(r1))
rs=stack(r1,r2,r3,r4)
plot(rs)
max(rs)
min(rs)
However, I have been trying to find a way to get the second highest value across a stack. In my case, each raster in the stack denotes the performance of a particular model across space. I would like to compare the best and second-best values to determine how much better the best model is than its runner-up, without having to convert my stack to a matrix and then back into a raster. Any ideas or suggestions?
You'll probably want to use calc(), adapting the code below to your precise situation. Just to show that it works as advertised, I've separately plotted layers formed by taking the highest, second highest, third, and fourth highest values found in each cell of the 4-layer RasterStack object.
zz <- range(cellStats(rs, range))
par(mfcol=c(2,2))
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[1]]), main="1st",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[2]]), main="2nd",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[3]]), main="3rd",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[4]]), main="4th",zlim=zz)
Or, more compactly and efficiently, just construct a new raster stack holding the reordered values and then plot its layers:
zz <- range(cellStats(rs, range))
rs_ord <- calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)])
par(mfcol=c(2,2))
plot(rs_ord[[1]], main="1st", zlim=zz)
plot(rs_ord[[2]], main="2nd", zlim=zz)
plot(rs_ord[[3]], main="3rd", zlim=zz)
plot(rs_ord[[4]], main="4th", zlim=zz)
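If the end goal is the margin between the best model and its runner-up, the subtraction can happen inside the same calc() call. A sketch under the assumption that higher values mean better performance, with random example data in place of the real model rasters:

```r
library(raster)

set.seed(42)
r1 <- raster(nrows = 10, ncols = 10)
rs <- stack(lapply(1:4, function(k) setValues(r1, runif(ncell(r1)))))

# per cell: sort the layer values and subtract the runner-up from the best
gap <- calc(rs, fun = function(v) {
  s <- sort(v, decreasing = TRUE)
  s[1] - s[2]
})

plot(gap, main = "best minus second best")
```

sort() drops NA values, so a cell with fewer than two non-NA layers comes out as NA in gap.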
I'm learning R. I know how to do a moving average, but I need to do more. I am not a statistician, and unfortunately all the docs seem to be written for statisticians.
I do this in excel a lot, it's really handy for analysis of operational activities.
Here are the fields on each row to make bollinger bands:
Value could be # of calls, complaint ratio, anything
TimeStamp | Value | Moving Average | Moving STDEVP | Lower Control | Upper Control
Briefly, the moving avg and the stdevP point to the prior 8 or so values in the series. Lower control at a given point in time is = moving average - 2*moving stdevP and upper control = moving average + 2*moving stdevP
This can easily be done in excel for a single file, but if I can find a way to make R work R will be better for my needs. Hopefully faster and more reliable when automated, too.
links or tips would be appreciated.
You could use the function rollapply() from the zoo package, providing you work with a zoo series :
library(zoo)
TimeSeries <- cumsum(rnorm(1000))
ZooSeries <- as.zoo(TimeSeries)
# for each 9-value window: mean, upper band, lower band
BollLines <- rollapply(ZooSeries,9,function(x){
  M <- mean(x)
  SD <- sd(x)
  c(M,M+SD*2,M-SD*2)
})
Keep in mind that rollapply uses a centered window by default, meaning that it takes the values to the left and the right of the current day. To use only the prior values, as you describe (and as Bollinger Bands are usually defined), pass align = "right".
If you don't want to convert to zoo, you can use the vectors as well and write your own function. I added an S3 based plotting function that allows you to easily plot the calculations as well. With these functions, you could do something like :
TimeSeries <- cumsum(rnorm(1000))
X <- BollingerBands(TimeSeries,80)
plot(X,TimeSeries,type="l",main="An Example")
to get :
The function codes :
BollingerBands <- function(x,width){
Start <- width +1
Stop <- length(x)
Trail <- rep(NA,ceiling(width/2))
Tail <- rep(NA,floor(width/2))
Lines <- sapply(Start:Stop,function(i){
M <- mean(x[(i-width):i])
SD <- sd(x[(i-width):i])
c(M,M+2*SD,M-2*SD)
})
Lines <- apply(Lines,1,function(i)c(Trail,i,Tail))
Out <- data.frame(Lines)
names(Out) <- c("Mean","Upper","Lower")
class(Out) <- c("BollingerBands",class(Out))
Out
}
plot.BollingerBands <- function(x,data,lcol=c("red","blue","blue"),...){
plot(data,...)
for(i in 1:3){
lines(x[,i],col=lcol[i])
}
}
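For the exact column layout in the question, here is a base R sketch with a trailing window. It makes two assumptions that differ from the functions above: the window ends at the current row, and the standard deviation is the population one (dividing by n, like Excel's STDEVP) rather than the sample one that sd() returns. The function name bollinger_trailing and the example series are made up.

```r
bollinger_trailing <- function(x, width = 8) {
  stopifnot(width >= 1, width <= length(x))
  n   <- length(x)
  ma  <- rep(NA_real_, n)
  sdp <- rep(NA_real_, n)
  for (i in width:n) {
    w      <- x[(i - width + 1):i]        # the last `width` values up to row i
    ma[i]  <- mean(w)
    sdp[i] <- sqrt(mean((w - ma[i])^2))   # population SD (divide by n)
  }
  data.frame(Value = x,
             MovingAverage = ma,
             MovingSDP = sdp,
             LowerControl = ma - 2 * sdp,
             UpperControl = ma + 2 * sdp)
}

set.seed(1)
calls <- cumsum(rnorm(50)) + 100          # made-up daily values
bb <- bollinger_trailing(calls, width = 8)
head(bb, 10)
```

The first width - 1 rows have NA in the band columns because there are not yet enough prior values to fill a window.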
There is an illustration in the R Graph Gallery (65) giving code both for calculating the bands and for plotting share prices.
The 2005 code still seems to work six years later and returns IBM's share price up to the present, going back several months.
The most obvious bug is that the lower bandwidth and volume charts have been narrowed; there may be another in the number of days covered.