I'm trying to create a Koppen world map using data from http://worldclim.org. To find the right Koppen climate I need precipitation and temperature data (I have one raster map for each month for each of those two variables) and the latitude.
I tried doing the following:
prast <- list.files(path = "prec25/", pattern = glob2rx('*.tif'), full.names = T)
trast <- list.files(path = "temp25/", pattern = glob2rx('*.tif'), full.names = T)
lrast <- c(prast, trast)
climrast <- stack(lrast)
koppen_map <- calc(climrast, filename = "koppen.tif", fun = function(x) koppen(x[13:24], x[1:12], yFromCell(climrast, x[1])))
climrast is a RasterStack with 24 layers (12 layers with precipitation data followed by 12 layers with temperature data). The koppen function needs a vector of 12 values for temperature (that would be x[13:24]) and 12 values for precipitation (x[1:12]).
yFromCell(climrast, x[1]) should give me the latitude but the calc operation fails because yFromCell(climrast, x[1]) returns NA in some cases.
If I replace the yFromCell(climrast, x[1]) with an arbitrary number like 10, the calc operation works fine.
Any idea what I'm doing wrong?
The memory-safe (and simple) way to get a RasterLayer with latitude values is:
x <- init(climrast, 'y')
A working example with worldclim data:
library(raster)
prast <- getData('worldclim', var='prec', res=10)
tmin <- getData('worldclim', var='tmin', res=10)
tmax <- getData('worldclim', var='tmax', res=10)
trast <- (tmin + tmax) / 2
lat <- init(trast, 'y')
lrast <- stack(prast, trast, lat)
climrast <- crop(lrast, extent(25,30,-5,0))
# example function
koppen <- function(temp, prec, lat) {
(sum(temp * prec) + lat) / 1000
}
koppen_map <- calc(climrast, filename = "koppen.tif", fun = function(x) koppen(x[13:24], x[1:12], x[25]), overwrite=TRUE)
In your calc you are passing x[1] to yFromCell. But x[1] is the value of the raster cell, whereas you need to pass the cell number to yFromCell. I can illustrate with a minimal example:
First, let's make a small dummy raster
library(raster)
set.seed(0)
clim = raster(matrix(sample(c(1:10,NA), 100, T), 10, 10))
Now let's try to get its latitudes using an analogue of what you had in your example
lat = calc(clim, function(x) yFromCell(clim, x))
plot(lat)
As you can see, that's not right at all - we got entirely the wrong latitude values because we passed the cell value rather than the cell number.
So let's make a raster layer that holds the correct latitudes
lat = clim
lat[] = yFromCell(clim, 1:ncell(clim))
plot(lat)
That's much better. Now we can add this as a layer to our climate data, so that calc can access these values on a cell-by-cell basis.
climrast = stack(list(clim, lat))
koppen = calc(climrast, function(x) x[1]*x[2])
I am trying to use for loops (or an apply function as an alternative) to extract coordinates from a data.frame, search for the closest point within the E-OBS gridded dataset, extract the temperature data for the period x1-x2, and write it to another Excel file.
While the code works to extract single data points, I seem unable to include it within a loop and to add the results, ideally next to the input coordinates.
library(sp)
library(raster)
library(ncdf4)
#Coordinates
df
ID site E N
1 1 site_place_date1 7.558758 47.81004
2 2 site_place_date2 7.582749 47.63411
3 3 site_place_date3 7.607968 48.01475
4 4 site_place_date4 7.644660 47.67139
... ... ... ...
Set coordinates of target point MANUALLY:
lon <- 7.558758 # longitude of location
lat <- 47.81004 # latitude of location
#Mean daily temperature
ncin <- nc_open("tg_0.25deg_reg_v17.0.nc")
print(ncin)
t <- ncvar_get(ncin,"time")
tunits <- ncatt_get(ncin,"time","units")
nt <- dim(t)
nt
obsoutput <- ncvar_get(ncin,
start= c(which.min(abs(ncin$dim$longitude$vals - lon)), # look for closest long
which.min(abs(ncin$dim$latitude$vals - lat)), # look for closest lat
1),
count=c(1,1,-1))
DataMeanT <- data.frame(DateN= t, MeanDailyT = obsoutput)
nc_close(ncin)
head(DataMeanT)
#check if there are NAs =999
summary(DataMeanT)
Data = DataMeanT
Data$Date = as.Date(Data$DateN,origin="2000-01-01")
Data$Year = format(Data$Date,"%Y")
Data$Month = format(Data$Date,"%m")
head(Data)
Data$YearMonth = format(Data$Date, format="%Y-%b")
Data_annual = aggregate(("T_AnnualMean" = MeanDailyT) ~ Year,data = Data, FUN = mean,na.action = na.pass)
names(Data_annual)[2] <- "AirT"
head(Data_annual)
#Export table
write.table(Data_annual, "Site_AirTemp.csv", row.names = FALSE, append = FALSE, col.names = TRUE, sep = ", ", quote = TRUE)
The aim is to run the script as part of a loop for all coordinates in df and to write the temperature data to a new data-table with information on site-ID or alternatively into the next columns of df.
Simply wrap your entire process in a function and use an apply-family function to pass in the lon/lat coordinates. One good candidate is mapply, or its wrapper Map, to iterate elementwise over the two vectors df$E and df$N. A third argument, df$site, is also passed into the function to build unique CSV names, since right now the same file is overwritten on every run.
Below, some non-assignment lines such as head or summary are removed since they do nothing inside a function. Also, within and with are used to avoid repeating Data$ and to streamline the data manipulation. The Map call writes each file AND builds a list of aggregated data frames for later use.
Function
my_function <- function(lon, lat, site) {
# Mean daily temperature
ncin <- nc_open("tg_0.25deg_reg_v17.0.nc")
print(ncin)
t <- ncvar_get(ncin,"time")
tunits <- ncatt_get(ncin,"time","units")
nt <- dim(t)
# look for closest lon and lat
obsoutput <- ncvar_get(ncin,
start = c(which.min(abs(ncin$dim$longitude$vals - lon)),
which.min(abs(ncin$dim$latitude$vals - lat)),
1),
count = c(1,1,-1))
DataMeanT <- data.frame(DateN = t, MeanDailyT = obsoutput)
nc_close(ncin)
Data <- within(DataMeanT, {
Date <- as.Date(DateN, origin="2000-01-01")
Year <- format(Date,"%Y")
Month <- format(Date,"%m")
YearMonth <- format(Date, format="%Y-%b")
})
Data_annual <- with(Data, aggregate(list("AirT" = MeanDailyT), list(Year=Year),
FUN = mean, na.action = na.pass))
# Export table
write.table(Data_annual, paste0("Site_AirTemp_", site, "_.csv"), row.names=FALSE,
append = FALSE, col.names = TRUE, sep = ", ", quote = TRUE)
# SAVE AGGREGATED DATA FRAME
return(Data_annual)
}
Call
# ITERATE THROUGH EACH LON/LAT PAIR ELEMENTWISE
df_list <- Map(my_function, df$E, df$N, df$site)
# df_list <- mapply(my_function, df$E, df$N, df$site, SIMPLIFY=FALSE) # EQUIVALENT CALL
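If you later want everything in one table, here is a sketch of combining the returned list; it assumes df_list as built above and reuses df$site for labels:
# Stack the per-site annual tables into one data frame with a site column
names(df_list) <- df$site
all_sites <- do.call(rbind, Map(cbind, site = names(df_list), df_list))
head(all_sites)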
You can probably do:
library(raster)
b <- brick("tg_0.25deg_reg_v17.0.nc")
e <- extract(b, df[, c('E', 'N')])
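Here e should be a matrix with one row per point in df and one column per time step in the NetCDF file; a sketch of tying it back to the site IDs (the output file name is just an example):
# Bind the extracted series back to the site identifiers and save once
res <- data.frame(site = df$site, e)
write.csv(res, "Sites_AirTemp.csv", row.names = FALSE)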
I'm trying to calculate the SPI from CHIRPS monthly mean precipitation data. Because the file is too large, I cut it down to my area of interest; here it is: https://www.dropbox.com/s/jpwcg8j5bdc5gq6/chirps_mensual_v1.nc?dl=0
I did this to open it:
require(utils)
require(colorRamps)
require(RNetCDF)
require(rasterVis)
require(rgdal)
library(ncdf4)
library(raster)
datos2 <- nc_open("Datos/chirps_mensual_v1.nc")
ppt_array <- ncvar_get(datos2, "precip")
#I'm only taking complete years so I took out two months from 2018
ppt_mes <- ppt_array[ , ,1:444]
I know there is an SPI library, but I don't know how I should format the data in order to use it. So I tried to do it without the function by fitting the gamma distribution, but I don't know how to do that for this dataset.
Does anyone know how to calculate SPI either with the function or by fitting the distribution?
I don't think the SPI package is doing what you (or anyone) thinks it is doing. If you use debug(spi) and step through the code, you'll see that in one step it fits an empirical cumulative distribution function (with ecdf()) to the first two and last rows of data. Why the first two and last rows? I have no clue, but whoever wrote this package also used a for loop to apply t() to a matrix. Not to mention that I think it should use a Gamma distribution or a Pearson III distribution, not ecdf() (according to Guttman, N.B. (1999) Accepting the standardized precipitation index: a calculation algorithm. JAWRA Journal of the American Water Resources Association, 35, 311–322.).
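For illustration, here is a minimal sketch of the gamma-based transform that Guttman describes, on simulated data. It is simplified (it skips the usual correction for months with zero precipitation) and is not the SPI package's own code:
# Fit a gamma distribution to one calendar month of precipitation totals
# and map the cumulative probabilities to standard normal quantiles (SPI).
library(MASS)   # for fitdistr()
set.seed(1)
prec_jan <- rgamma(40, shape = 2, rate = 0.05)   # hypothetical January totals
fit <- fitdistr(prec_jan, "gamma")               # ML estimates of shape and rate
p   <- pgamma(prec_jan,
              shape = fit$estimate["shape"],
              rate  = fit$estimate["rate"])      # cumulative probabilities
spi_jan <- qnorm(p)                              # SPI values for this month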
In the end I made it work with the SPI library. The result is a value for each month at each grid point; I also did the calculation over a specific area, and I can share that too if you want it.
Also, I made this one using CRU data, but you can adjust it:
#spei cru 1x1
rm(list=ls(all=TRUE)); dev.off()
require(utils)
require(RNetCDF)
require(rasterVis)
require(rgdal)
library(ncdf4)
require(SPEI)
########################################################################################################
prec <- open.nc("pre_mensual.nc")
lon <- length(var.get.nc(prec, "lon"))
lat <- length(var.get.nc(prec, "lat"))
lon1 <- var.get.nc(prec, "lon")
lat1 <- var.get.nc(prec, "lat")
ppt <- var.get.nc(prec, "pre")
ppt <- ppt[ , ,109:564] #31 18 456 (1980-2017)
anio = 456/12
###########################################################################################################
#Reshape data
precip <- sapply(1:dim(ppt)[3], function(x)t(ppt[,,x]))
############################################################################################################
#This is for SPI-6, you can use either of them
spi_6 <- array(list(),(lon*lat))
for (i in 1:(lon*lat)) {
spi_6[[i]] <- spi(precip[i,], scale=6, na.rm=TRUE)
}
#############################################################################################################
#Go back to an array form
sapply(spi_6, '[[',2 )->matriz_ppt
ppt_6 <- array(aperm(matriz_ppt, c(2,1)), c(37,63,456))
spi_c <- array(t(ppt_6), dim=c(37,63,456))
#############################################################################################################
#Save to netcdf
for(i in 1:456) {
nam <- paste("SPI", i, sep = "")
assign(nam,raster((spi_c[ , ,i]), xmn=min(lon1), xmx=max(lon1), ymn=min(lat1), ymx=max(lat1), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")) )
}
gpcc_spi <- stack(mget(paste0("SPI", 1:456)))
outfile <- "spi6_cru_1980_2017.nc"
crs(gpcc_spi) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
writeRaster(gpcc_spi, outfile, overwrite=TRUE, format="CDF", varname="SPEI", varunit="units",longname="SPEI CRU", xname="lon", yname="lat")
It's not the most stylish way to calculate it but it does work. :)
EDIT: If you want to calculate the SPI/SPEI over an area this is what I did:
library(SPEI)
library(ncdf4)
library(raster)
#
pre_nc <- nc_open("pre_1971_2017_Vts4.nc")
pre <- ncvar_get(pre_nc, "pre")
pre <- pre[, , 109:564] #This is for the time I'm interested in
lats <- ncvar_get(pre_nc, "lat")
lons <- ncvar_get(pre_nc, "lon")
times <- 0:467
# Read mask
#This is a mask you need to create that adjusts to your region of interest
#It consist of a matrix of 0's and 1's, the 1's are placed in the area
#you are interested in
mask1 <- nc_open("cuenca_IV_CDO_05_final.nc")
m1 <- ncvar_get(mask1, "Band1")
m1[m1 == 0] <- NA
#
# Apply mask to data
#
pre1 <- array(NA, dim=dim(pre))
#
for(lon in 1:length(lons)){
for(lat in 1:length(lats)){
pre1[lon,lat,] <- pre[lon,lat,]*m1[lon,lat]
}
}
#
# Mean over the area of interest
#
mean_pre1 <- apply(pre1,c(3),mean, na.rm=TRUE)
# Calculate SPI/SPEI
spi1 <- matrix(data= NA, nrow = 456, ncol = 48)
for (i in 1:48) {
spi1[,i] <- spi(data=ts(mean_pre1,freq=12),scale= i)$fitted
}
#This calculates SPI/SPEI-1 to SPI/SPEI-48, you can change it
# Save
#
write.table(spi1,'spi_1980_2017.csv',sep=';',row.names=FALSE)
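As a quick sanity check, something like this sketch plots one of the resulting series; it assumes, as in the comments above, that the 456 months span 1980-2017:
# Plot the SPI-12 series (column 12 of spi1) as a monthly time series
plot(ts(spi1[, 12], start = c(1980, 1), frequency = 12),
     xlab = "Year", ylab = "SPI-12")
abline(h = 0, lty = 2)   # reference line at zero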
SO gurus!
I am trying to count the densities of surviving units in different grid cells.
I have two shapefiles with points from the two survey periods in question (one before and one after the mortality event). What I intend is to see whether there is a difference in survival rates and to link the proportion of survival to a climatic variable obtained from the raster value of the corresponding grid cell. In the code snippet below I have created a random raster and some shapefiles.
packs = c('raster', 'rgdal', 'spatstat', 'sp' ,'dplyr')
sapply(packs, FUN = 'require', character.only = TRUE)
xy <- matrix(rnorm(1024),32,32) #Creating the desired raster
image(xy)
rast <- raster(xy)
extent(rast) <- c(36,37,-3,-2)
projection(rast) <- CRS("+proj=longlat +datum=WGS84")
points <- runifpoint(n =4000, c(36,37,-3,-2)) # Creating the points
x <- points$x
y <- points$y
values <- c(rep(1, 900), rep(0, 3100))
xy <- cbind(x, y)
points <- cbind(x, y, values)
points <- data.frame(points)
shp <- SpatialPointsDataFrame(coords = xy, data = data.frame(values) ) # creating shpfiles
projection(shp) <- CRS("+proj=longlat +datum=WGS84")
subs <- filter(points, values == 1)
suxy <- select(subs, x,y)
shpsub <- SpatialPointsDataFrame(coords = suxy, data = data.frame(subs$values)) # creating shpfiles
projection(shpsub) <- CRS("+proj=longlat +datum=WGS84")
When I attempt to extract the points I use the following lines of code
shp <- spTransform(shp, projection(rast)) # make sure they have same transformation
shpsub <- spTransform(shpsub, projection(rast))
XY <- xyFromCell(rast, cell = 1:ncell(rast))
v <- as.data.frame(rast) #Extract values from raster
XY <- data.frame(XY, v) # Creating a data frame containing coord., cellno and value
XY$cell <- c(1:ncell(rast))
cells <- cellFromXY(rast,shp) # find which cells the points are in
cells <- rle(cells) # returns a value and a length, fast for counting
cellsfound <- cellFromXY(rast,shpsub)
cellsfound <- rle(cellsfound)
Proportion <- data.frame(cell = cells$values, shp = cells$lengths)
test <- data.frame(cell = rep(NA,NROW(Proportion)), shpsub = rep(NA, NROW(Proportion)))
test$cell <- c(cellsfound$values, rep(NA, nrow(test) - length(cellsfound$values)))
test$shpsub <- c(cellsfound$lengths, rep(NA, NROW(test) - length(cellsfound$lengths)))
Proportion <- full_join(Proportion, test, by = "cell")
test.Proportion <- mutate(Proportion, Proportion = shpsub/shp) #Calculating Proportion
XY <- left_join(XY, test.Proportion, by = "cell") # Adding Proportion to coord and cell no.
XY.m <- summarise(XY, )
XY <- na.omit(XY) ; XY <- XY[,-4]
As I see it, using rle() returns the same cells multiple times instead of counting the number of points within each individual cell, which was my intention. Can anyone explain how to do this in a way that retrieves the number of occurrences in each individual cell?
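For what it's worth, here is a sketch of one way the per-cell counts could be obtained directly, by tabulating the cell numbers with table(); it reuses the rast, shp and shpsub objects created above:
# Count points per cell by tabulating the cell numbers
cnt_all  <- table(cellFromXY(rast, shp))      # all units
cnt_surv <- table(cellFromXY(rast, shpsub))   # surviving units
counts <- data.frame(cell  = as.integer(names(cnt_all)),
                     total = as.integer(cnt_all))
surv   <- data.frame(cell      = as.integer(names(cnt_surv)),
                     survivors = as.integer(cnt_surv))
prop <- merge(counts, surv, by = "cell", all.x = TRUE)
prop$survivors[is.na(prop$survivors)] <- 0
prop$proportion <- prop$survivors / prop$total
head(prop)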
I wrote the following script to produce the raw data for plotting the above map. The problem is, for 550,000 data points, this takes about 2 hours to run on a relatively powerful machine. I'm new to R, however, and I'm wondering if there are any optimized functions I can take advantage of?
The basic idea is that, given a set of geospatial data, you split the set into 200 rows and split each row into a bunch of squares. You then calculate the total of a value in each square in a row. The approach I've taken below is to take the "upper left" point of a square, calculate the latitude/longitude of the square's edges, exclude all points not within those bounds, and then sum what remains. Is there a better way without using a solution like PostGIS?
all.data <- read.csv("FrederictonPropertyTaxDiffCleanedv3.csv", header=TRUE,
stringsAsFactors=FALSE)
all.data$X <- as.numeric(all.data$X)
all.data$Y <- as.numeric(all.data$Y)
startEnd <- function(lats, lngs) {
# Find the "upper left" (NW) and "bottom right" (SE) coordinates of a set of data.
#
# Args:
# lats: A list of latitude coordinates
# lngs: A list of longitude coordinates
#
# Returns:
# A list of values corresponding to the northwest-most and southeast-most coordinates
# Convert to real number and remove NA values
lats <- na.omit(as.numeric(lats))
lngs <- na.omit(as.numeric(lngs))
topLat <- max(lats)
topLng <- min(lngs)
botLat <- min(lats)
botLng <- max(lngs)
return(c(topLat, topLng, botLat, botLng))
}
startEndVals <- startEnd(all.data$Y, all.data$X)
startLat <- startEndVals[1]
endLat <- startEndVals[3]
startLng <- startEndVals[2]
endLng <- startEndVals[4]
num_intervals = 200.0
interval <- (startEndVals[1] - startEndVals[3]) / num_intervals
# testLng <- -66.6462379307115
# testLat <- 45.9581234392
# Prepare the data to be sent in
data <- all.data[,c("Y", "X", "levy2014_ha")]
sumInsideSquare <- function(pointLat, pointLng, interval, data) {
# Sum all the values that fall within a square on a map given a point,
# an interval of the map, and data that contains lat, lng and the values
# of interest
colnames(data) <- c("lat", "lng", "value")
# Data east of point
data <- data[data$lng > pointLng,]
# Data west of point + interval
data <- data[data$lng < pointLng + interval,]
# Data north of point + interval (down)
data <- data[data$lat > pointLat - interval,]
# Data south of point
data <- data[data$lat < pointLat, ]
# Clean remaining data
data <- na.omit(data)
return(sum(data$value))
}
# Debugging
# squareSumTemp <- sumInsideSquare(testLat, testLng, interval, data)
# Given a start longitude and an end longitude, calculate an array of values
# corresponding to the sums for that latitude
calcSumLat <- function(startLng, endLng, lat, interval, data) {
row <- c()
lng <- startLng
while (lng < endLng) {
row <- c(row, sumInsideSquare(lat, lng, interval, data))
lng <- lng + interval
}
return(row)
}
# Debugging
# rowTemp <- calcSumLat(startLng, endLng, testLat, interval, data)
# write.csv(rowTemp, file = "Temp.csv", row.names = FALSE)
# Get each line of data to plot
lat <- startLat
rowCount <- 1
all.sums <- list()
while (lat > endLat) {
col <- calcSumLat(startLng, endLng, lat, interval, data)
all.sums[[as.character(rowCount)]] <- col
lat <- lat - interval
rowCount <- rowCount + 1
}
# Convert to data frame
all.sums.frame <- data.frame(all.sums)
# Save to disk so I don't have to run it again
write.csv(all.sums.frame, file = "Levy2014Sums200.csv", row.names = FALSE)
Ended up finding a solution to this myself. The key was using the foreach package together with the doParallel package so the job could take advantage of all the cores on my computer. There is a great guide on it here: http://www.r-bloggers.com/a-brief-foray-into-parallel-processing-with-r/
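For anyone curious, a rough sketch of what that can look like with the names from my script above (a parallelised version of the latitude loop, untested as written here):
# Run calcSumLat for each latitude band on a separate worker
library(foreach)
library(doParallel)
cl <- makeCluster(detectCores() - 1)   # leave one core free
registerDoParallel(cl)
# roughly the same latitude bands as the while loop above
lat_seq <- seq(startLat, endLat, by = -interval)
all.sums <- foreach(lat = lat_seq) %dopar% {
  calcSumLat(startLng, endLng, lat, interval, data)
}
stopCluster(cl)
all.sums.frame <- data.frame(all.sums)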
I want to calculate the distance between two points in two different datasets. I don't want to calculate the distance between all points - just to the nearest point of datasetB.
Some examples:
Dataset A - Persons
http://pastebin.com/HbaeqACi
Dataset B - Waterfeatures:
http://pastebin.com/UdDvNtHs
Dataset C - City:
http://pastebin.com/nATnkMRk
So...I want to calculate the distance of each person to the nearest waterfeature point.
I've already tried to work with the rgeos package and, after struggling with some projection errors, I got it to work. But this calculates (at least I assume so) all distances to every point, and, as already said, I'm only interested in the distance to the nearest waterfeature point.
# load csv files
persons = read.csv("persons.csv", header = TRUE)
water = read.csv("water.csv", header = TRUE)
# change dataframes to SpatialPointDataFrame and assign a projection
library(sp)
library(rgeos)
coordinates(persons) <- c("POINT_X", "POINT_Y")
proj4string(persons) <- CRS("+proj=utm +datum=WGS84")
coordinates(water) <- c("POINT_X", "POINT_Y")
proj4string(water) <- CRS("+proj=utm +datum=WGS84")
# use rgoes package to calculate the distance
distance <- gDistance(persons, water, byid=TRUE)
# works, but calculates a huge number of distances
Is there any parameter that I've missed, or do I need to use another package or function? I've also looked at spatstat, which is able to calculate the distance to the nearest neighbor, but not between two different datasets: http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/spatstat/html/nndist.html
Edit:
The complete R-Script including plotting of the datasets:
library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)
library(fossil)
#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("city.csv", header =TRUE)
# plot data
persons_ggplot2 <- persons
city_ggplot2 <- city
water_ggplot2 <- water
gc <- geocode('new york, usa')
center <- as.numeric(gc)
G <- ggmap(get_googlemap(center = center, color = 'bw', scale = 1, zoom = 11, maptype = "terrain", frame=T), extent="device")
G1 <- G + geom_point(aes(x=POINT_X, y=POINT_Y ),data=city, shape = 22, color="black", fill = "yellow", size = 4) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=persons, shape = 8, color="red", size=2.5) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=water_ggplot2, color="blue", size=1)
plot(G1)
#### calculate distance
# Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)
# Generate a function that looks for the closest waterfeature for each id coordinates
NearestW <- function(id){
tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
WaterFeatures <- rbind(tmp,water[,2:3])
tmp1 <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
tmp1 <- which.min(tmp1)
tmp1 <- water[tmp1,1]
tmp1 <- data.frame(tmp1, WaterFeature=tmp)
return(tmp1)
}
#apply to each id and the merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))
What about writing a function that looks for the nearest waterfeature for every person?
#requires function earth.dist from "fossil" package
require(fossil)
#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
#Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)
#Generate a function that looks for the closest waterfeature for each id coordinates
NearestW <- function(id){
tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
WaterFeatures <- rbind(tmp,water[,2:3])
tmp1 <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
tmp1 <- min(tmp1)
tmp1 <- data.frame(tmp1, WaterFeature=tmp)
return(tmp1)
}
#apply to each id and the merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))
NOTE: I've added a stringsAsFactors parameter to the original read.csv; it makes the merging easier at the end.
NOTE: Column tmp1 gives the distance in METERS to the nearest water feature.
Maybe I'm a little too late, but you can use spatstat to compute distances between two different datasets. The command is nncross. The arguments you have to use are two objects of type ppp, which you can create using the as.ppp() function.
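A minimal sketch of what that can look like, using the column names from the question's CSVs (the window here is just the joint bounding box, and the distances come back in the units of the coordinates):
library(spatstat)
persons <- read.csv("persons.csv", header = TRUE, stringsAsFactors = FALSE)
water   <- read.csv("water.csv",   header = TRUE, stringsAsFactors = FALSE)
# a common observation window covering both point sets
win <- owin(xrange = range(c(persons$POINT_X, water$POINT_X)),
            yrange = range(c(persons$POINT_Y, water$POINT_Y)))
pp_persons <- as.ppp(persons[, c("POINT_X", "POINT_Y")], W = win)
pp_water   <- as.ppp(water[,   c("POINT_X", "POINT_Y")], W = win)
# distance from each person to the nearest water feature,
# plus the index of that feature in pp_water
nn <- nncross(pp_persons, pp_water)
head(nn)   # columns: dist, which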