Raster::focal function replaces edge cells with NA values - raster

I am trying to use raster::focal to find out how many neighbors of value 1 each raster cell has. However, I have noticed that in the resulting raster the edge cells have been replaced with NA values. How can I get neighbor counts for the outer edge of the raster?
Here is a reproducible example:
#load the raster package, create a raster, and add 1's and 0's
library(raster)
land <- raster(matrix(0, 8, 10), xmn=408027.5, xmx=413027.5, ymn=4370000,
               ymx=4374000)
land[4:8, 2:5] <- 1
land[2:3, 8:9] <- 1
land[1, 1:10] <- 1
land[is.na(land[])] <- 0
#plot the raster
plot(land)
#create window for focal function
w <- matrix(1,3,3)
#run raster::focal
land.foc <- focal(land, w=w, fun=sum)
#plot resulting focal raster
plot(land.foc)
#plot NA values in land.foc
plot(is.na(land.foc))
However, as you can see when you compare the two rasters, the outer-most cells in the focal raster have been replaced with NA's.

You just need to set pad=TRUE and padValue=0. This 'extends' your raster and adds virtual rows and columns with your padValue, in this case 0.
land.foc <- focal(land, w=w, fun=sum, pad=TRUE, padValue=0)
plot(land.foc)
plot(is.na(land.foc))
Edit:
Another way of looking at it is that the virtual cells don't have any values; they are NA.
So instead of assigning 0 as padValue, just add na.rm=TRUE to your call.
If you really need to do something else with the virtual cells, you can write your own function that handles the NA cells more specifically and pass that to focal.
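For instance, a minimal sketch of both options, using the land raster and the window w from above (count1 is just an illustrative name for the custom function):
#option 1: pad with NA (the default padValue) and drop the NAs from the sum
land.foc.na <- focal(land, w=w, fun=sum, pad=TRUE, na.rm=TRUE)
#option 2: pass your own function that decides how to treat the NA padding cells;
#here it counts the neighbours equal to 1 and ignores the virtual NA cells
count1 <- function(x, ...) sum(x == 1, na.rm=TRUE)
land.foc.custom <- focal(land, w=w, fun=count1, pad=TRUE)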

Related

How can I solve the error while using the for loop in a row operation in R data frame?

I have a dataframe with multiple columns and I am using a for loop to apply a mathematical operation that is being recorded in new columns. The dataframe is named "F39". The code I have written is as follows:
for (i in 2:nrow(F39)) {
  #calculating distance from distance formula (both in x and y)
  F39$distance[i] <- sqrt((F39$X..cm.[i]-F39$X..cm.[i-1])^2 + (F39$Y..cm.[i]-F39$Y..cm.[i-1])^2)
  #calculating fish speed in x and y
  F39$fishspeed[i] <- F39$distance[i]/(0.02)
  #assigning 0 as the starting fish speed
  F39$fishspeed[1] <- 0
  #assigning positive and negative signs to the velocity
  F39$fishspeed[i] <- ifelse(F39$X..cm.[i]-F39$X..cm.[i-1] < 0, F39$fishspeed[i], -F39$fishspeed[i])
}
However, it gives me the following error:
Error in `$<-.data.frame`(`*tmp*`, "distance", value = c(NA, 0.194077783375631 :
  replacement has 2 rows, data has 4837
There are 4837 rows in my dataframe. I have many other data frames where I am applying the same code and it is working but here and in some other data frames, it is not working.
I have added the .CSV file with data in google drive: Link to csv file
Your data.frame is missing the column "distance", so it cannot store any value in that column via the syntax F39$distance[i] <- ...
The solution is to create the column first and then do the iteration, e.g.
F39 <- read.csv("C:/Users/kupzig.HYDROLOGY/Downloads/Fish39.csv")
names(F39) #-> no "distance" among the column names
F39$fishspeed[1] <- 0 #assigning 0 as the starting fish speed (this also creates the fishspeed column; the single 0 is recycled)
F39$distance <- NA #create the distance column
for (i in 2:nrow(F39)) {
  #calculating distance from distance formula (both in x and y)
  F39$distance[i] <- sqrt((F39$X..cm.[i]-F39$X..cm.[i-1])^2 + (F39$Y..cm.[i]-F39$Y..cm.[i-1])^2)
  #calculating fish speed in x and y
  F39$fishspeed[i] <- F39$distance[i]/(0.02)
  #assigning positive and negative signs to the velocity
  F39$fishspeed[i] <- ifelse(F39$X..cm.[i]-F39$X..cm.[i-1] < 0, F39$fishspeed[i], -F39$fishspeed[i])
}
Note that it is good practice to move any operation that does not depend on i (or on a previous iteration) outside the loop; this will save you computation time in the future.
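For example, here is a vectorised sketch of the same calculation (no loop at all), assuming the columns X..cm. and Y..cm. and the 0.02 time step from the question:
#differences between consecutive rows
dx <- diff(F39$X..cm.)
dy <- diff(F39$Y..cm.)
step <- sqrt(dx^2 + dy^2)
#distance is undefined for the first row; speed starts at 0
F39$distance <- c(NA, step)
F39$fishspeed <- c(0, step/0.02)
#sign convention from the question: positive when the fish moves in the negative x direction
F39$fishspeed[-1] <- ifelse(dx < 0, F39$fishspeed[-1], -F39$fishspeed[-1])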

Substitute pixels values of all layers in a raster stack

I need help with a raster stack.
I have a 365-layer raster stack (corresponding to a hydrological year), and I want to replace each pixel with value 3 by the value from the most recent previous layer where that pixel has a value different from 3.
For example, for a specific pixel I have the first 10 layers that give values:
1,1,2,3,3,3,3,2,1,1.
The layers where the value is 3 should be substituted with, in this case, 2, which is the last value before the "3-window" that is different from 3.
This should be done for all layers (except, obviously, the first one) for each pixel.
Any idea?
I converted my raster to a matrix and worked with matrices, but this takes a lot of time.
Let's suppose that vel_3D_snow_P4 is my matrix (retrieved from the raster), with 365 rows (days) and more than 400,000 columns (pixels). I wrote this code:
vel_3D_snow_P4_back <- vel_3D_snow_P4
for (i in 1:ncol(vel_3D_snow_P4)) {
  y <- which(vel_3D_snow_P4[, i] == 3)
  startIndex <- y[!(y-1) %in% y]   #first day of each run of 3's
  stopIndex  <- y[!(y+1) %in% y]   #last day of each run of 3's
  runs <- cbind(startIndex, stopIndex)
  new_vector_back <- vel_3D_snow_P4[, i]
  for (j in seq_len(nrow(runs))) {
    if (runs[j, 1] == 1) next      #no previous day to copy from
    new_vector_back[runs[j, 1]:runs[j, 2]] <- new_vector_back[runs[j, 1] - 1]
  }
  vel_3D_snow_P4_back[, i] <- new_vector_back
  print(c("fine", toString(i)))
}
But, as you can imagine, with this many pixels it takes far too long! That is why I am asking for a solution/idea that keeps the raster format (maybe using the calc function?).
Thanks in advance.
Typically, the first step with problems like this in R is to write a function that operates on a vector, or to search for an existing one. My first Google query pointed me to zoo::na.locf.
library(zoo)
x <- c(1,1,2,3,3,3,3,2,1,1)
x[x==3] <- NA
na.locf(x)
# [1] 1 1 2 2 2 2 2 2 1 1
Then create example raster data
library(raster)
r <- raster(ncol=10, nrow=10)
s <- stack(lapply(c(1,2,3,3,4), function(i) setValues(r, i)))
And combine the two. You can do
A)
x <- reclassify(s, cbind(3, NA))
z <- calc(x, fun=na.locf)
or
B)
f <- function(v) {
v[v==3] <- NA
na.locf(v)
}
zz <- calc(s, f)
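In this small example every cell has the series 1, 2, 3, 3, 4 across the layers, so after the replacement it should read 1, 2, 2, 2, 4; you can check that with, for example:
z[1]   #values of the first cell across all layers: 1 2 2 2 4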

Extracting raster values, from maximum, to cumulatively sum to x

I am trying to determine the location of raster cells that add up to a given amount, starting with the maximum value and progressing down.
E.g., my raster of 150,000 cells has a total sum value of 52,000,000:
raster1 <- raster("myvalues.asc")
cellStats(raster1, sum)   # 52,000,000
I can extract the cells above the 95th percentile:
q95 <- raster1
q95[q95 < quantile(q95, 0.95)] <- NA
cellStats(q95, sum)   # 14,132,000
As you can see, the top 5% of cells (based upon quantile maths) return around 14 million of the original 52 million total of 'raster1'.
What i want to do is predetermine the overall sum as 10,000,000 (or x) and then cumulatively sum raster cells, starting with the maximum value and working down, until I have (and can plot) all cells that sum up to x.
I have attempted to convert 'raster1' to a vector, sort, cumulative sum etc but can't tie it back to the raster. Any help here much appreciated
S
Below is your own answer, but rewritten so that it is self-contained and more useful to others. I have also changed the %in% to <, which should be much more efficient.
library(raster)
r <- raster(nr=100, nc=100)
r[] <- sample(ncell(r))
rs <- sort(as.vector(r), decreasing=TRUE)
r_10m <- min(rs[cumsum(rs) < 10000000])
test <- r
test[test < r_10m] <- NA
cellStats(test, sum)
Couldn't find the edit button...
This is something like what I need, after an hour of scratching my head:
raster1v <- as.vector(raster1)
raster1vdesc <- sort(raster1v, decreasing=TRUE)
raster1_10m <- raster1vdesc[cumsum(raster1vdesc) < 10000000]
test <- raster1
test[!test %in% raster1_10m] <- NA
plot(test)
cellStats(test, sum)   # 9,968,073
Seems to work, I think. Anything more elegant would be ideal.

NA in clustering functions (kmeans, pam, clara). How to associate clusters to original data?

I need to cluster some data and I tried kmeans, pam, and clara with R.
The problem is that my data are in a column of a data frame, and contains NAs.
I used na.omit() to get my clusters. But then how can I associate them with the original data? The functions return a vector of integers without the NAs and they don't retain any information about the original position.
Is there a clever way to associate the clusters to the original observations in the data frame? (or a way to intelligently perform clustering when NAs are present?)
Thanks
The output of kmeans corresponds to the elements of the object passed as argument x. In your case, you omit the NA elements, and so $cluster indicates the cluster that each element of na.omit(x) belongs to.
Here's a simple example:
d <- data.frame(x=runif(100), cluster=NA)
d$x[sample(100, 10)] <- NA
clus <- kmeans(na.omit(d$x), 5)
d$cluster[which(!is.na(d$x))] <- clus$cluster
And in the plot below, colour indicates the cluster that each point belongs to.
plot(d$x, bg=d$cluster, pch=21)
This code works for me, starting with a matrix containing a whole row of NAs:
#a matrix with one row entirely NA
DF <- matrix(rnorm(100), ncol=10)
row.names(DF) <- paste("r", 1:10, sep="")
DF[3, ] <- NA
#cluster the complete rows only; the result is named by the remaining row names
res <- kmeans(na.omit(DF), 3)$cluster
res
#add a cluster column and fill it for the rows that were clustered
DF <- cbind(DF, clus=NA)
DF[names(res), "clus"] <- res
print(DF[, "clus"])

How to know the percentage of occurence of a value in a raster in R?

If I have a raster r:
r <- raster(nrows=10, ncols=10)
values(r) <- runif(ncell(r))
Now I would like to know how often (in percent) the value 0.5 occurs in this raster compared to the other values (in other words, how many pixels contain this value among all pixels of this raster).
thanks
I've taken the liberty of using different sample data, as your comment indicated that you are not interested in continuous data.
values(r) <- rpois(ncell(r),3)
You can convert your raster to a vector by using as.vector, and then tabulate this and find the proportions of each element. table ignores NA values. Here I select the proportion of values that are equal to 2.
prop.table(table(as.vector(r)))["2"]
#    2
# 0.19
Note that "2" is given as a string because it indexes the table by its names (labels), not by position.
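As an alternative sketch, the raster package's freq function tabulates cell values directly, so you could also get the proportions without converting to a vector:
#freq returns a matrix of values and their counts
f <- freq(r)
f[, "count"] / sum(f[, "count"])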
