Removing NA rows from a raster in R

I have a raster with rows of NA that I want to remove in R. The rows are not on the edge of the raster, so crop and trim did not work for me. Could someone give me some tips, please?
Here is the file: https://drive.google.com/drive/folders/0B6-UFgI67v99c3ZhUFp0eWpzOGM

I do not see how that would make sense conceptually. For a matrix, perhaps, but a raster does not have missing rows: it is a regular grid defined by its extent and resolution, so every row exists (even if the data were stored in a sparse form, the raster would still know about them).
Here is an example of your problem:
library(raster)
# a 10 x 10 raster with cell values 1..100
r <- raster(nrow=10, ncol=10)
values(r) <- 1:ncell(r)
# rows 3 and 4 are entirely NA
r[3:4, ] <- NA
You might think that this would do it:
# TRUE where a cell is NA
rr <- is.na(r)
# rows that are not entirely NA
i <- which(rowSums(rr) != ncol(rr))
x <- r[i, ,drop=FALSE]
But that does not do what you want, because the raster returned would not be valid: rows 2 and 5 are not adjacent, so the remaining rows cannot form one regular grid. I suppose you could loop over i and create a list of RasterLayer objects.
Finally, you can create a SpatialPointsDataFrame, which is essentially a sparse raster, like this:
x <- as(r, 'SpatialPointsDataFrame')
Whether that is useful depends on what you want to do next with x.
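For a plain matrix, dropping all-NA rows is easy, and the remaining rows can be grouped into contiguous blocks, each of which could become its own RasterLayer in the spirit of the loop suggested above. Here is a minimal base-R sketch on a toy matrix standing in for the raster values; the raster package is not used, and the names m, keep, runs and blocks are hypothetical.

```r
# Toy 10 x 10 matrix standing in for the raster values (rows 3 and 4 all NA)
m <- matrix(1:100, nrow = 10, byrow = TRUE)
m[3:4, ] <- NA

# Rows that are not entirely NA
keep <- which(rowSums(is.na(m)) != ncol(m))

# For a plain matrix, simply drop the all-NA rows
m_dropped <- m[keep, , drop = FALSE]

# For a grid, kept rows must stay contiguous: group consecutive row
# indices into runs, one block (candidate RasterLayer) per run
runs <- split(keep, cumsum(c(1, diff(keep) != 1)))
blocks <- lapply(runs, function(rows) m[rows, , drop = FALSE])
```

Here runs holds rows 1:2 and 5:10, so blocks contains a 2 x 10 and a 6 x 10 matrix.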

Related

Assign the value 1 to NA cells in TIFF files

I have two raster files (values range from 0 to 1) and I want to find the difference between them. The problem is that some values are missing, so I want to assign them the value 1 (i.e., NA = 1). How can I do this? Any help with this small question is appreciated. Thanks.
Here is my code:
library(raster)
R1 <- raster("D:/Results/1.tiff")
R2 <- raster("D:/Results/2.tiff")
Se1 <- R2 - R1
plot(Se1)
How large are your raster files, and how limited are you by memory? With raster, a memory-safe approach for large files is the reclassify function, as shown below. Let me know if it works.
# Packages
library(raster)
# Read in the files
R1 <- raster("D:/Results/1.tiff")
R2 <- raster("D:/Results/2.tiff")
# Use reclassify to map values to other values; here, NA becomes 1.
# The reclassification matrix (rcl) is applied in row order.
D1 <- reclassify(R1, cbind(NA, 1))
D2 <- reclassify(R2, cbind(NA, 1))
# Find the difference between the reclassified layers and plot it
Se1 <- D2 - D1
plot(Se1)
Here is how you can do that with "terra" (the replacement for "raster"):
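library(terra)
# read both files into one two-layer SpatRaster
R <- rast(paste0("D:/Results/", 1:2, ".tiff"))
# replace NA cells with 1
R <- subst(R, NA, 1)
# difference between consecutive layers (layer 2 minus layer 1)
Se1 <- diff(R)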
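To see what the substitution changes, here is a small base-R sketch on plain vectors standing in for the cell values of the two rasters (toy numbers, not your data). Without it, any NA propagates into the difference; with NA set to 1 first, every cell gets a value.

```r
# Toy cell values for the two rasters (NA = missing)
r1 <- c(0.2, NA, 0.5, NA)
r2 <- c(0.7, 0.4, NA, NA)

# Without substitution, NA propagates: only the first cell has a value
d_raw <- r2 - r1

# Substitute NA with 1 first, as reclassify()/subst() do cell by cell
r1[is.na(r1)] <- 1
r2[is.na(r2)] <- 1
d <- r2 - r1   # every cell now has a value
```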

Removing outliers in time series rasters per pixel in R

Basically, I have a time series of rasters in a stack. Here is my workflow:
1. Convert the stack to a data frame so that each row represents a pixel and each column represents a date. This is fairly straightforward, so no issues here.
2. For each row (pixel), identify outliers and set them to NA. I want to define the outlier threshold myself; for example, say I want to set all values larger than the 75th percentile to NA. The goal is that when I calculate the mean, the outliers don't affect it. The outliers here are several orders of magnitude higher, so they influence the mean significantly.
I got some help online and came up with this code:
my_data %>%
rowwise() %>%
mutate(across(is.numeric, ~ if (. > as.numeric(quantile(across(), .75, na.rm=TRUE))) NA else .))
The problem is that, since it is a raster, some rows contain a lot of NA values that the quantile function needs to ignore while evaluating the cells.
Using na.rm=TRUE seemed to be the solution, but now I am getting a new error:
Error: Problem with mutate() input ..1.
i ..1 = across(...).
x missing value where TRUE/FALSE needed
i The error occurred in row 1.
I understand that to get around this, I need to tell the if function to ignore the value if it is NA, but the dplyr syntax is very complicated for me, so I need some help with this.
I look forward to learning more, and to hearing if there is a better way to do what I'm trying to do. I don't think I explained it well, but hopefully the code helps.
When asking an R question, you should always include some example data. Either create data with code (see below) or use a file that ships with R (avoid dput if you can). See the help files that ship with R, or other questions on this site, for examples and inspiration.
Example data:
library(terra)
r <- rast(ncols=10, nrows=10, nlyr=10)
set.seed(1)
v <- runif(size(r))
v[sample(size(r), 100)] <- NA
values(r) <- v
Solution:
First, write a function that does what you want and works on a vector:
f <- function(x) {
# 75th percentile of this pixel's time series, ignoring NAs
q <- quantile(x, .75, na.rm=TRUE)
# set values above it to NA
x[x>q] <- NA
x
}
Now apply it to the raster data:
x <- app(r, f)
With the raster package it would look like this:
library(raster)
rr <- brick(r)
xx <- calc(rr, f)
Note that you should not create a data.frame; but if you did, you could do something like dd <- t(apply(d, 1, f)), where t() is needed because apply returns the result for each row of d as a column.
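Here is a minimal base-R sketch of that matrix route, assuming a toy matrix d with pixels in rows and dates in columns (made-up values, not your data). It redefines the same f so the block runs on its own, then takes a per-pixel mean that ignores missing and masked values.

```r
# Outlier masking for one pixel's time series (same f as above)
f <- function(x) {
  q <- quantile(x, 0.75, na.rm = TRUE)
  x[x > q] <- NA
  x
}

set.seed(1)
# Toy data: 5 pixels (rows) by 8 dates (columns)
d <- matrix(runif(5 * 8), nrow = 5)
d[2, c(1, 4)] <- NA   # some missing observations

# apply() returns one column per row of d, so transpose back
dd <- t(apply(d, 1, f))

# Per-pixel mean that ignores missing and masked values
pixel_means <- rowMeans(dd, na.rm = TRUE)
```

With 8 values per row, the two largest values in each complete row end up above the 75th percentile and are masked.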

Substitute pixels values of all layers in a raster stack

I need help with a raster stack.
I have a 365-layer raster stack (corresponding to a hydrological year), and I want to replace pixel values of 3 with the value from the most recent previous layer whose value is different.
For example, for a specific pixel, the first 10 layers have these values:
1,1,2,3,3,3,3,2,1,1.
The layers where the value is 3 should be replaced with, in this case, 2, which is the last value before the run of 3s that differs from 3.
This should be done for all layers (except, obviously, the first one) for each pixel.
Any idea?
I currently convert my raster into a matrix and work with matrices, but this takes a lot of time.
Suppose vel_3D_snow_P4 is my matrix (retrieved from the raster), with 365 rows (days) and more than 400000 columns (pixels). I wrote this code:
# Pre-allocate the output with the same dimensions as the input
vel_3D_snow_P4_back <- vel_3D_snow_P4
for (i in seq_len(ncol(vel_3D_snow_P4))) {
  v <- vel_3D_snow_P4[, i]
  y <- which(v == 3)
  # start and end positions of each run of consecutive 3s
  startIndex <- y[!(y - 1) %in% y]
  stopIndex <- y[!(y + 1) %in% y]
  for (j in seq_along(startIndex)) {
    # a run starting at the first layer has no previous value to copy
    if (startIndex[j] == 1) next
    v[startIndex[j]:stopIndex[j]] <- v[startIndex[j] - 1]
  }
  vel_3D_snow_P4_back[, i] <- v
  print(c("fine", toString(i)))
}
But, as you can imagine, with this many pixels it takes far too long! That is why I am asking for a solution or idea that keeps the raster format (maybe using the calc function?).
Thanks in advance.
Typically, the first step with problems like this in R is to write a function that operates on a vector, or to search for an existing one. My first Google query pointed me to zoo::na.locf:
library(zoo)
x <- c(1,1,2,3,3,3,3,2,1,1)
x[x==3] <- NA
na.locf(x)
# [1] 1 1 2 2 2 2 2 2 1 1
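If you would rather avoid the zoo dependency, the same vector function can be sketched in base R with cummax over the positions of non-missing values. fill_forward is a hypothetical helper, not part of any package; like na.locf(x, na.rm=FALSE), it leaves leading values as NA so the output has the same length as the input (which calc needs). Note that any NA already present in the data is filled as well. It could presumably be passed to calc in place of f below, though I have not tested that.

```r
# Hypothetical base-R helper: replace `target` values with the last earlier
# value that is not `target` (last observation carried forward)
fill_forward <- function(v, target = 3) {
  v[v == target] <- NA
  # position of each non-missing value, 0 where missing
  pos <- ifelse(is.na(v), 0L, seq_along(v))
  # running maximum = position of the most recent non-missing value
  pos <- cummax(pos)
  # copy that value forward; positions before the first value stay NA
  v[pos > 0] <- v[pos[pos > 0]]
  v
}

fill_forward(c(1, 1, 2, 3, 3, 3, 3, 2, 1, 1))
# [1] 1 1 2 2 2 2 2 2 1 1
```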
Then create some example raster data:
library(raster)
r <- raster(ncol=10, nrow=10)
s <- stack(lapply(c(1,2,3,3,4), function(i) setValues(r, i)))
Then combine the two. You can do either
A)
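x <- reclassify(s, cbind(3, NA))
# na.rm=FALSE keeps leading NAs, so every cell returns one value per layer
z <- calc(x, fun=function(v) na.locf(v, na.rm=FALSE))
or
B)
f <- function(v) {
v[v==3] <- NA
na.locf(v, na.rm=FALSE)
}
zz <- calc(s, f)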
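Since the question also works with a layers-by-pixels matrix, here is a sketch of a vectorized base-R alternative to the per-pixel loop; it is not part of the answer above. It applies the same cummax idea to the whole matrix at once and discards fills that would leak in from the previous column (pixel). fill_forward_matrix is a hypothetical name, and I have not benchmarked it on a 365 x 400000 matrix.

```r
# Hypothetical vectorized variant for a layers-by-pixels matrix m
# (rows = layers/days, columns = pixels), filling runs of `target`
fill_forward_matrix <- function(m, target = 3) {
  m[m == target] <- NA
  n <- nrow(m)
  cell <- seq_along(m)                         # column-major cell index
  # index of the most recent non-missing cell, scanning column by column
  pos <- cummax(ifelse(is.na(m), 0L, cell))
  # keep only fills that come from the same column (same pixel)
  same_col <- pos > 0 & (pos - 1L) %/% n == (cell - 1L) %/% n
  out <- m
  out[same_col] <- m[pos[same_col]]
  out
}

m <- cbind(c(1, 1, 2, 3, 3, 3, 3, 2, 1, 1),
           c(3, 3, 1, 3, 2, 3, 3, 3, 3, 3))
res <- fill_forward_matrix(m)
```

In the second column the leading 3s stay NA, because there is no earlier value for that pixel to copy.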

Extracting cell numbers from multiple counties in R

I'm new to R, so please excuse any terminology mistakes. I'm trying to extract the cell numbers for every county in the state of Oklahoma and stack them on top of each other so that I can look at different temperatures throughout the state. I have a shapefile of US counties, so I made a vector of the county ID numbers for Oklahoma. I then tried to extract the cell numbers and maximum temperature values for every county in a loop. The extract line works when I run it for one county at a time; I think the county=rbind line is the problem, but I don't know the best way to do this.
Thank you for your help! I really appreciate it.
okcounties=which(counties$STATE_NAME=="Oklahoma") #contains 58 counties
county = NULL
for (i in 1:58){
countyvalues=extract(OK.tmax[[1]], extent(counties[okcounties[i],]), cellnumbers=T)
county=rbind(county, countyvalues) #add data from each of 58 counties
}
I find your code a bit confusing and can see a few places where it goes wrong; you are overthinking things a bit. I am not sure why you are extracting cell numbers instead of simply taking advantage of extract and the stack object.
The okcounties object could be an sp-class subset of the counties object that you pass directly to extract, e.g., okcounties <- counties[counties$STATE_NAME=="Oklahoma",].
If you drop the call to extent, which returns a bounding box for each county rather than the county boundary, things get much simpler. To leverage the stack, you can let extract return the values of every layer at once. Here is a worked example on synthetic data; I approximated your object naming conventions. I believe the final object, ok.county, would be the same as the county object you are trying to create.
First, let's create some example data and plot it:
library(raster)
library(sp)
# create polygons
p <- raster(nrow=10, ncol=10)
p[] <- runif(ncell(p)) * 10
counties <- rasterToPolygons(p, fun=function(x){x > 9})
counties$county <- paste0("county",1:nrow(counties))
counties$STATE_NAME <- c(rep("CA",3),
rep("OK",nrow(counties)-3))
# Create raster stack
r <- raster(nrow=100, ncol=100)
r[] <- runif(ncell(r), 40,70)
r <- stack(r, r+5, r+10) # stack
names(r) <- c("June", "July", "Aug")
plot(r[[1]])
plot(counties, add=TRUE, lwd=4)
We can use an index to subset to the state we are interested in.
ok <- counties[counties@data$STATE_NAME == "OK",]
Now we can use extract on the entire raster stack. The result is a list with one element per polygon; each element is a matrix of the cell values, with one column per layer in the raster stack.
ok.county <- extract(r, ok)
class(ok.county)
head(ok.county[[1]])
However, if you collapse the list into a single data.frame, there is nothing to identify which polygon each row came from. Here we use the county column of the SpatialPolygonsDataFrame. Since the list is in the same order as the polygons in ok, you can take the identifiers from that object. In your case it would likely be the county names, and the method would be the same as in this example.
cnames <- ok@data$county
for(i in 1:length(ok.county)) {
ok.county[[i]] <- data.frame(county = cnames[i], ok.county[[i]])
}
head(ok.county[[1]])
Now that we have a unique identifier assigned to each data.frame in the list we can collapse it using do.call.
ok.county <- as.data.frame(do.call("rbind", ok.county))
str(ok.county)
Using tapply, we can get the maximum value of a given column (time period) for each county.
tapply(ok.county[,"June"], ok.county$county, max)
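The loop and do.call steps can also be written with Map, which pairs each matrix with its county name before a single rbind. Here is a sketch on a toy list that mimics the extract() output (two counties with made-up temperatures; the county names are hypothetical).

```r
# Toy stand-in for extract() output: one matrix of layer values per polygon
layers <- c("June", "July", "Aug")
ok.county <- list(
  matrix(c(50, 55, 60, 52, 57, 62), ncol = 3, byrow = TRUE,
         dimnames = list(NULL, layers)),
  matrix(c(41, 46, 51), ncol = 3, dimnames = list(NULL, layers))
)
cnames <- c("county4", "county5")   # hypothetical county names

# Attach the county name to each element, then row-bind once
long <- do.call(rbind, Map(function(vals, nm) data.frame(county = nm, vals),
                           ok.county, cnames))

# Maximum June value per county
june_max <- tapply(long$June, long$county, max)
```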
As for your original code, something like this should work (not tested), but there is no unique polygon ID tying the results back to each county, and it still uses each county's bounding box rather than its polygon boundary.
okcounties <- counties[counties$STATE_NAME=="Oklahoma",]
county = NULL
for (i in 1:nrow(okcounties)){
county <- rbind(county, extract(OK.tmax[[1]],
extent(okcounties[i,]), cellnumbers=T))
}

NA in clustering functions (kmeans, pam, clara). How to associate clusters to original data?

I need to cluster some data and I tried kmeans, pam, and clara with R.
The problem is that my data are in a column of a data frame and contain NAs.
I used na.omit() to get my clusters. But how can I then associate them with the original data? The functions return a vector of integers without the NAs, with no information about the original positions.
Is there a clever way to associate the clusters to the original observations in the data frame? (or a way to intelligently perform clustering when NAs are present?)
Thanks
The output of kmeans corresponds to the elements of the object passed as argument x. In your case, you omit the NA elements, and so $cluster indicates the cluster that each element of na.omit(x) belongs to.
Here's a simple example:
d <- data.frame(x=runif(100), cluster=NA)
d$x[sample(100, 10)] <- NA
clus <- kmeans(na.omit(d$x), 5)
d$cluster[which(!is.na(d$x))] <- clus$cluster
In the resulting plot, colour indicates the cluster that each point belongs to:
plot(d$x, bg=d$cluster, pch=21)
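The same idea generalizes to a data frame with several columns: cluster only the complete rows, then write the cluster labels back through the same logical index. A minimal sketch with simulated data (a and b are hypothetical columns):

```r
set.seed(42)
# Simulated data with a few missing values
d <- data.frame(a = rnorm(20), b = rnorm(20))
d$a[c(3, 7)] <- NA
d$b[12] <- NA

# Cluster only the rows with no missing values
ok <- complete.cases(d)
km <- kmeans(d[ok, ], centers = 3)

# km$cluster follows the order of d[ok, ], so write it back through ok
d$cluster <- NA_integer_
d$cluster[ok] <- km$cluster
```

Rows with any missing value keep cluster = NA, so the labels stay aligned with the original observations.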
This code works for me, starting with a matrix containing a whole row of NAs:
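DF <- matrix(rnorm(100), ncol=10)
row.names(DF) <- paste("r", 1:10, sep="")
DF[3,] <- NA
# $cluster is named with the row names of the non-missing rows
res <- kmeans(na.omit(DF), 3)$cluster
res
DF <- cbind(DF, clus=NA)
# match the clusters back to the original rows by name
DF[names(res), "clus"] <- res
print(DF[, "clus"])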
