How can I use a for loop to stack new rasters? - raster

I am trying to create a new raster by calculating the difference in values between existing rasters. I want to find the difference between each of the existing rasters and one specific raster, and then stack all of the resulting rasters. I typed out the entire calculation for 60 rasters, but I want to know a faster way using a for loop.
Change<- stack(
AMTs$X1950-AMTs$X1950,
AMTs$X1951-AMTs$X1950,
AMTs$X1952-AMTs$X1950,
AMTs$X1953-AMTs$X1950,
AMTs$X1954-AMTs$X1950,
AMTs$X1955-AMTs$X1950,
AMTs$X1956-AMTs$X1950,
AMTs$X1957-AMTs$X1950,
AMTs$X1958-AMTs$X1950,
AMTs$X1959-AMTs$X1950,
AMTs$X1960-AMTs$X1950,
AMTs$X1961-AMTs$X1950,
AMTs$X1962-AMTs$X1950,
AMTs$X1963-AMTs$X1950,
AMTs$X1964-AMTs$X1950,
AMTs$X1965-AMTs$X1950,
AMTs$X1966-AMTs$X1950,
AMTs$X1967-AMTs$X1950,
AMTs$X1968-AMTs$X1950,
AMTs$X1969-AMTs$X1950,
AMTs$X1970-AMTs$X1950,
AMTs$X1971-AMTs$X1950,
AMTs$X1972-AMTs$X1950,
AMTs$X1973-AMTs$X1950,
AMTs$X1974-AMTs$X1950,
AMTs$X1975-AMTs$X1950,
AMTs$X1976-AMTs$X1950,
AMTs$X1977-AMTs$X1950,
AMTs$X1978-AMTs$X1950,
AMTs$X1979-AMTs$X1950,
AMTs$X1980-AMTs$X1950,
AMTs$X1981-AMTs$X1950,
AMTs$X1982-AMTs$X1950,
AMTs$X1983-AMTs$X1950,
AMTs$X1984-AMTs$X1950,
AMTs$X1985-AMTs$X1950,
AMTs$X1986-AMTs$X1950,
AMTs$X1987-AMTs$X1950,
AMTs$X1988-AMTs$X1950,
AMTs$X1989-AMTs$X1950,
AMTs$X1990-AMTs$X1950,
AMTs$X1991-AMTs$X1950,
AMTs$X1992-AMTs$X1950,
AMTs$X1993-AMTs$X1950,
AMTs$X1994-AMTs$X1950,
AMTs$X1995-AMTs$X1950,
AMTs$X1996-AMTs$X1950,
AMTs$X1997-AMTs$X1950,
AMTs$X1998-AMTs$X1950,
AMTs$X1999-AMTs$X1950,
AMTs$X2000-AMTs$X1950,
AMTs$X2001-AMTs$X1950,
AMTs$X2002-AMTs$X1950,
AMTs$X2003-AMTs$X1950,
AMTs$X2004-AMTs$X1950,
AMTs$X2005-AMTs$X1950,
AMTs$X2006-AMTs$X1950,
AMTs$X2007-AMTs$X1950,
AMTs$X2008-AMTs$X1950,
AMTs$X2009-AMTs$X1950
)

Is this what you're looking for?
Create a function that subtracts the 1950 layer from every layer from 1950 onward:
minus <- function(dd, cc) {
  return(dd - cc)
}
# now use overlay() from the raster package to create the new raster object
change <- overlay(AMTs[[1:60]], AMTs$X1950, fun = minus)
Breakdown:
The x argument of overlay, AMTs[[1:60]] (all 60 layers, 1950 through 2009), corresponds to dd in the minus function, and the y argument, AMTs$X1950, corresponds to cc.
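If you do want the for loop you asked about, here is a minimal alternative sketch, assuming AMTs is a RasterStack or RasterBrick with layers named X1950 through X2009 (raster layer arithmetic also lets you skip the loop entirely):
library(raster)
# vectorised: subtract the 1950 layer from every layer in one step
change <- AMTs - AMTs$X1950
# or, with an explicit for loop, building the stack layer by layer
change <- stack()
for (i in 1:nlayers(AMTs)) {
  change <- addLayer(change, AMTs[[i]] - AMTs$X1950)
}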

Related

raster: Modifications of the methods in resample() function

I would like to know whether it is possible to make some modifications to the resample() function in the raster package. First, the "bilinear" method by default assigns a weighted average of the four nearest cells; is it possible to change this to a different number of nearest cells? Second, is it also possible to create a mean method that calculates the arithmetic average of the n nearest cells?
For example, in the first case for 25 cells: resample(myraster, myresolution, window=matrix(nrow=5, ncol=5), method="bilinear"), and in the second case: resample(myraster, myresolution, window=matrix(nrow=5, ncol=5), fun=mean).
You cannot do all that, but you can use the focal function on the input data prior to using resample.
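As a rough sketch of that suggestion, assuming myraster is your input and target is a raster with the desired resolution (both placeholder names), you could smooth first and then resample:
library(raster)
# smooth each cell to the mean of its 5 x 5 neighbourhood first ...
smoothed <- focal(myraster, w = matrix(1/25, nrow = 5, ncol = 5))
# ... then resample the smoothed raster onto the target geometry
result <- resample(smoothed, target, method = "bilinear")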

Averaging different length vectors with same domain range in R

I have a dataset that looks like the one shown in the code below.
What I am guaranteed is that the "(var)x" (domain) of each variable is always between 0 and 1. The "(var)y" (co-domain) can vary and is also bounded, but within a larger range.
I am trying to get an average over the "(var)x" values but across the different variables.
I would like some kind of selective averaging, and I am not sure how to do this in R.
ax=c(0.11,0.22,0.33,0.44,0.55,0.68,0.89)
ay=c(0.2,0.4,0.5,0.42,0.5,0.43,0.6)
bx=c(0.14,0.23,0.46,0.51,0.78,0.91)
by=c(0.1,0.2,0.52,0.46,0.4,0.41)
qx=c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
qy=c(0.03,0.2,0.52,0.4,0.45,0.48,0.61,0.9)
a<-list(ax,ay)
b<-list(bx,by)
q<-list(qx,qy)
What I would like to have is something like
avgd_x = c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
and an avgd_y whose contents would be obtained by
finding the values of ay and by at 0.12 and taking the mean of ay, by and qy there,
and similarly and so forth for all the values in the vector with the largest number of elements.
How can I do this in R ?
P.S.: This is a toy dataset; my real dataset is spread over files and I am reading them with a custom function, but the raw data is available as shown in the code above.
Edit:
Some clarification:
avgd_y would have the length of the largest vector. For example, in the case above, avgd_y would be (ay' + by' + qy)/3, where ay' and by' are vectors containing ay and by interpolated at the data points of qx, i.e. c(ay(qx(i))) and c(by(qx(i))) for i from 1 to length(qx).
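A minimal sketch of that interpolate-then-average idea, using the vectors defined above and base R's approx() (rule = 2 simply carries the endpoint value outside each vector's own range; avgd_x and avgd_y are the names used in the question):
avgd_x <- qx  # the x vector with the most elements becomes the common grid
ay_i <- approx(ax, ay, xout = avgd_x, rule = 2)$y  # ay interpolated at qx
by_i <- approx(bx, by, xout = avgd_x, rule = 2)$y  # by interpolated at qx
avgd_y <- (ay_i + by_i + qy) / 3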

Bourdet Derivative in R with Smoothing Window

I am calculating pressure derivatives using algorithms from this PDF:
Derivative Algorithms
I have been able to implement the "two-points" and "three-consecutive-points" methods relatively easily using dplyr's lag/lead functions to offset the original columns forward and back one row.
The issue with those two methods is that there can be a ton of noise in the high-resolution data we use. This is why there is the third method, "three-smoothed-points", which is significantly more difficult to implement. There is a user-defined "window width", W, that is typically between 0 and 0.5. The algorithm chooses point_L and point_R as the first points whose log-time separation from the current point exceeds W, i.e. ln(deltaT/deltaT_L) > W and ln(deltaT_R/deltaT) > W. Here is what I have so far:
#If necessary install DPLYR
#install.packages("dplyr")
library(dplyr)
#Create initial Data Frame
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)
df <- data.frame(elapsedTime,deltaP)
#Shift the elapsedTime and deltaP columns forward and back one row
df$lagTime <- lag(df$elapsedTime,1)
df$leadTime <- lead(df$elapsedTime,1)
df$lagP <- lag(df$deltaP,1)
df$leadP <- lead(df$deltaP,1)
#Calculate the 2 and 3 point derivatives using nearest neighbors
df$TwoPtDer <- (df$leadP - df$lagP) / log(df$leadTime/df$lagTime)
df$ThreeConsDer <- ((df$deltaP-df$lagP)/(log(df$elapsedTime/df$lagTime)))*
((log(df$leadTime/df$elapsedTime))/(log(df$leadTime/df$lagTime))) +
((df$leadP-df$deltaP)/(log(df$leadTime/df$elapsedTime)))*
((log(df$elapsedTime/df$lagTime))/(log(df$leadTime/df$lagTime)))
#Calculate the window value for the current 1 row shift
df$lnDeltaT_left <- abs(log(df$elapsedTime/df$lagTime))
df$lnDeltaT_right <- abs(log(df$elapsedTime/df$leadTime))
Resulting Data Table
If you look at the picture linked above, you will see that, based on a W of 0.1, only row 2 meets this criterion for both the left and right point. Just FYI, this data set is an extension of the data used in example 2.5 of the referenced PDF.
So, my ultimate question is this:
How can I choose the correct point_L and point_R such that they meet the above criteria? My initial thoughts are some kind of while loop, but being an inexperienced programmer, I am having trouble writing a loop that gets anywhere close to what I am shooting for.
Thank you for any suggestions you may have!
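Not a full answer, but here is a hedged sketch of the search you describe: for each row, walk outward until the log-time separation exceeds W, then reuse the same weighting as the three-consecutive-points formula with those rows. It assumes the df built in the code above; boundary rows simply fall back to their nearest neighbours.
W <- 0.1
n <- nrow(df)
df$ThreeSmoothDer <- NA_real_
for (i in 2:(n - 1)) {
  # nearest point to the left whose log-time separation exceeds W
  L <- i - 1
  while (L > 1 && log(df$elapsedTime[i] / df$elapsedTime[L]) <= W) L <- L - 1
  # nearest point to the right whose log-time separation exceeds W
  R <- i + 1
  while (R < n && log(df$elapsedTime[R] / df$elapsedTime[i]) <= W) R <- R + 1
  dL <- log(df$elapsedTime[i] / df$elapsedTime[L])
  dR <- log(df$elapsedTime[R] / df$elapsedTime[i])
  df$ThreeSmoothDer[i] <- ((df$deltaP[i] - df$deltaP[L]) / dL) * (dR / (dL + dR)) +
    ((df$deltaP[R] - df$deltaP[i]) / dR) * (dL / (dL + dR))
}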

How to use pointDistance with a very large vector

I've got a big problem.
I've got a large raster (rows = 180, columns = 480, number of cells = 86400).
At first I binarized it (so that there are only 1s and 0s) and then I labelled the clusters (cells that are 1 and connected to each other got the same label).
Now I need to calculate all the distances between the cells that are NOT 0.
There are quite a lot of them, and that's my big problem.
I did this to get the coordinates of the cells I'm interested in (get the positions (i.e. cell numbers) of the cells, that are not 0):
V=getValues(label)
Vu=c(1:max(V))
pos=which(V %in% Vu)
XY=xyFromCell(label,pos)
This works very well. So XY is a matrix, which contains all the coordinates (of cells that are not 0). But now I'm struggling. I need to calculate the distances between ALL of these coordinates. Then I have to put each one of them in one of 43 bins of distances. It's kind of like this (just an example):
0 < x < 0.2 -> bin 1
0.2 < x < 0.4 -> bin 2
When I use this:
pD=pointDistance(XY,lonlat=FALSE)
R says it cannot allocate a vector of this size; it's getting too large.
Then I thought I could do this (create an empty data frame df or something like that and let the function pointDistance run over every single value of XY):
for (i in 1:nrow(XY))
{pD=pointDistance(XY,XY[i,],lonlat=FALSE)
pDbin=as.matrix(table(cut(pD,breaks=seq(0,8.6,by=0.2),labels=1:43)))
df=cbind(df,pDbin)
df=apply(df,1,FUN=function(x) sum(x))}
It is working when I try this with e.g. the first 50 values of XY.
But when I use it for the whole XY matrix it takes far too much time. (Sometimes this XY matrix contains 10000 xy-coordinates.)
Does anyone have an idea how to do it faster?
I don't know whether this will be fast or not, but I recommend you try this:
Let's say you have a data frame with a value of 0 or 1 in each cell. To find the coordinates, all you have to do is write the code below:
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
Now you get the coordinate matrix with row index and column index.
To find the Euclidean distances, use the dist() function. It will look like this:
dist_vector <- dist(cord_matrix)
It returns a lower-triangular distance object, which can be converted to a vector or a symmetric matrix. Now all you have to do is calculate the bins according to your requirement.
Let me know if this works within your memory limits.
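If the dist() approach still runs out of memory, here is a hedged sketch of a cheaper version of your own loop, assuming the XY matrix from the question: keep the one-point-at-a-time pointDistance() call, but just accumulate a vector of bin counts instead of growing and re-summing a data frame on every iteration.
breaks <- seq(0, 8.6, by = 0.2)  # 43 bins, as in the question
bin_counts <- rep(0, length(breaks) - 1)
for (i in 1:nrow(XY)) {
  pD <- pointDistance(XY, XY[i, ], lonlat = FALSE)  # distances from point i to all points
  b <- cut(pD, breaks = breaks, labels = FALSE)     # bin index per distance (NA if outside)
  bin_counts <- bin_counts + tabulate(b, nbins = length(bin_counts))
}
# each pair is counted twice this way (i -> j and j -> i), so halve the counts if each
# pair should only be counted once; zero self-distances fall outside the first bin
bin_counts <- bin_counts / 2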

Calculate sum of array cells within a given radius

This question comes after a calculation in GIS (ArcMap 10.1) has been running for over a month (and still hasn't finished). Now I am trying to find a faster solution in R.
I have a matrix of ~30,000 x 80,000 cells, where each cell represents a 5x5 meters square. I need to calculate the sum of values in cells that fall within a given radius (3000 meters) from each cell.
For the cells on the edge of the matrix I assume a value of 0 outside the matrix.
The question is how to define the cells that fall within the radius.
There must be a library that has this functionality, but I couldn't find any.
Any suggestions?
A quick method you can test would be to use extract, set buffer to 3000 m, and then use sum as the fun argument. You can sequentially extract each cell number in your raster. But I still think this will take an inordinate amount of time. Let's assume your raster is called r....
# in the first instance I would set y to be smallish, like say 1:100 and see how long it takes
extract( r , y = 1:ncell(r) , buffer = 3000 , fun = sum )
Now, the raster package does have some parallelism built in, which with access to a large, large, large multi-core machine could speed up your operation a bit by running...
beginCluster()
extract( r , y = 1:ncell(r) , buffer = 3000 , fun = sum )
endCluster()
Don't forget to assign the output of extract to a variable.
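A small hedged sketch of that advice, assuming the raster is called r: extract around the coordinates of a handful of cells first (xyFromCell() turns cell numbers into point coordinates, which the buffer argument applies to), time it, and only then scale up as shown above:
library(raster)
test_cells <- 1:100  # start small, as suggested above
pts <- xyFromCell(r, test_cells)  # cell numbers -> point coordinates
sums_test <- extract(r, pts, buffer = 3000, fun = sum, na.rm = TRUE)  # assign the output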
