Visualize correlations on a map in R

I calculated correlations between temperatures and the date of grape harvest. I stored the results as a matrix:
32.5 37.5 42.5 47.5 52.5 57.5 62.5
-12.5 -0.05783118 -0.001655467 -0.07857098 -0.1526494 -0.0007327898 -0.02078552 0.06121682
-7.5 -0.23219824 -0.059952117 -0.06895444 -0.1674386 -0.1311612338 -0.08476390 0.09831010
-2.5 -0.11040995 -0.147325160 -0.15016740 -0.1796807 -0.1819844495 -0.14472899 -0.03550576
2.5 -0.20577359 -0.180857373 -0.15077067 -0.2293366 -0.2577666092 -0.21645676 -0.13044584
7.5 -0.44526971 -0.176224708 -0.15114994 -0.2459971 -0.2741514139 -0.19281484 -0.15683870
12.5 -0.12481683 -0.121675085 -0.16011098 -0.2288839 -0.2503969467 -0.26616721 -0.23089796
17.5 -0.15352693 -0.220012419 -0.11456690 -0.2314059 -0.2194705426 -0.20557053 -0.22529422
Now I want to visualize the results on a map. It should look like this:
[image: example of correlations visualized on a map]
Only my longitude and latitude are different: my latitude ranges from 32.5°N to 62.5°N and my longitude from -12.5°E to 17.5°E.
I have absolutely no idea how this is done! It would be nice if someone could help me.
Regards.

This is one way. Your grid is rather coarse (increments of 5°, or ~350 miles at the equator), and of course you did not provide an actual map, but this will plot a "heat map" of correlation at the coordinates you provided.
# 'df' is the correlation matrix above, read in as a data frame
# (row names hold the longitudes; read.table prepends "X" to numeric column names)
df <- cbind(lon=rownames(df),df)
library(reshape2)
library(ggplot2)
library(RColorBrewer)
gg <- melt(df,id="lon",variable.name="lat",value.name="corr")
gg$lat <- as.numeric(substring(gg$lat,2)) # remove pre-pended "X"
gg$lon <- as.numeric(as.character(gg$lon)) # convert factor to numeric
ggplot(gg) +
  geom_tile(aes(x=lon, y=lat, fill=corr)) +
  scale_fill_gradientn(colours=rev(brewer.pal(9,"Spectral"))) +
  coord_fixed()
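Since the question asks for the correlations on an actual map, here is a hedged sketch (not part of the original answer) that adds country outlines with ggplot2's borders() helper; it assumes the maps package is installed:
# same tiles as above, with world country borders drawn on top
library(ggplot2)
library(RColorBrewer)
ggplot(gg) +
  geom_tile(aes(x=lon, y=lat, fill=corr)) +
  borders("world", colour="grey20") +                  # country outlines (needs 'maps')
  scale_fill_gradientn(colours=rev(brewer.pal(9,"Spectral"))) +
  coord_fixed(xlim=c(-15,20), ylim=c(30,65))           # crop to the study region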

The corrplot() function from the corrplot R package can be used to plot a correlogram:
library(corrplot)
M<-cor(mtcars) # compute correlation matrix
corrplot(M, method="circle")
This approach is described here:
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
An online R tool is also available to compute and visualize a correlation matrix with a simple click, without any installation:
http://www.sthda.com/english/rsthda/correlation-matrix.php
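As a small, hedged variation on the snippet above, corrplot's method and type arguments change the glyphs and the layout:
corrplot(M, method="color")                    # coloured squares instead of circles
corrplot(M, method="number", type="upper")     # numeric values, upper triangle only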

Related

How to downscale a raster while keeping the same values?

I have this raster with a 40 x 40 resolution.
library(raster)
#get some sample data
data(meuse.grid)
gridded(meuse.grid) <- ~x+y
meuse.raster <- raster(meuse.grid)
res(meuse.raster)
#[1] 40 40
I would like to downscale this raster to 4 x 4: if a 40 x 40 pixel has the value 125, the same value should be used for all the 4 x 4 pixels within it. In other words, just divide each 40 x 40 pixel into 4 x 4 pixels while keeping its value.
I am open to CDO solutions as well.
We can use raster::disaggregate
library(raster)
#get some sample data
data(meuse.grid)
gridded(meuse.grid) <- ~x+y
meuse.raster <- raster(meuse.grid)
#assign a coordinate reference system (from the coordinates and the location (Meuse), this looks like the standard projection for the Netherlands: Amersfoort / RD New, EPSG:28992)
crs(meuse.raster) <- "EPSG:28992"
#disaggregate
meuse.raster.disaggregated <- disaggregate(meuse.raster, c(10,10))
I used c(10,10) to disaggregate from a 40 x 40 to a 4 x 4 resolution (10 times more detailed).
res(meuse.raster.disaggregated)
#[1] 4 4
In the comments Chris mentioned the terra package. I also recommend shifting from raster to terra. It is the newer package and will eventually replace packages like raster and stars.
terra also has a disaggregation function terra::disagg() which works in a similar way.
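A minimal, hedged sketch of the terra equivalent (using a self-contained toy raster rather than meuse.grid):
library(terra)
#toy raster: 10 x 10 cells of 40 x 40 map units
r <- rast(xmin=0, xmax=400, ymin=0, ymax=400, resolution=40)
values(r) <- 1:ncell(r)
#each 40 x 40 cell becomes 100 cells of 4 x 4, all keeping the parent value
r4 <- disagg(r, fact=10)
res(r4)
#[1] 4 4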

R - Linking two different sets of coordinates

I have two data sets - let's call them 'plot' (734 rows) and 'stations' (62 rows). A while ago I worked out that this code should let me link each 'plot' to its nearest 'station' based on their coordinates.
The data sets look a little like this (but without the Long and Lat headers):
plot:                stations:
Long   Lat           Long   Lat
13.2   60.5          14.6   55.4
15.4   62.6          15.5   62.9
15.6   62.4          16.4   58.9
16.5   58.7          19.3   64.0
16.5   58.5
#print results to "results.csv"
sink("results.csv")
#identify long + lat coords of each data set
library(sp)
p_coord<-SpatialPoints(plot[,c(1,2)])
s_coord<-SpatialPoints(stations[,c(1,2)])
#link each plot to its nearest station
require(FNN)
g = get.knnx(coordinates(s_coord), coordinates(p_coord),k=1)
str(g)
plot(s_coord, col=2)
plot(p_coord, add=TRUE)
segments(coordinates(p_coord)[,1], coordinates(p_coord)[,2],
         coordinates(s_coord[g$nn.index[,1]])[,1], coordinates(s_coord[g$nn.index[,1]])[,2])
#print result in results.csv and close the sink
print(g)
sink()
I've since realised that the results I get are slightly wrong - for example, plots #3 and #4 are linked to station #4, when it would be more applicable that plots #4 and #5 are linked to station #4.
So this leads me to think that something in the code is slightly off, but only by one row.
I would appreciate any comments on my code, and am equally interested in suggestions for simpler ways to connect two sets of coordinates.
Thanks
What is your coordinate reference system? Are these points in Scandinavia?
Anyway, you could go with the geosphere package and use distHaversine or
distVincentyEllipsoid (more precise) to get the distances:
library(sp)
library(geosphere)
plot <- data.frame(Lon = c(13.2, 15.4, 15.6, 16.5, 16.5),
                   Lat = c(60.5, 62.6, 62.4, 58.7, 58.5))
stations <- data.frame(Lon = c(14.6, 15.5, 16.4, 19.3),
                       Lat = c(55.4, 62.9, 58.9, 64))
p_coord <- SpatialPoints(plot[, c(1, 2)])
s_coord <- SpatialPoints(stations[, c(1, 2)])
# for each plot, find the index of the nearest station
apply(p_coord@coords, 1, function(x) {
  which.min(distHaversine(p1 = x, p2 = s_coord@coords))
})
The output will be
[1] 3 2 2 3 3
which means that plot 1 is closest to station 3, plot 2 is linked to station 2, and so on.
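If you want the more precise ellipsoidal distance mentioned above, only the distance function changes; a hedged sketch reusing the same objects:
#same nearest-station search with distVincentyEllipsoid (slower, more precise)
nearest <- apply(p_coord@coords, 1, function(x) {
  which.min(distVincentyEllipsoid(p1 = x, p2 = s_coord@coords))
})
nearest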

How to plot multiple semi-variograms from a single dataset efficiently in R?

I have a dataframe named seoul1to7 that contains hourly PM10 concentration data from 1 March to 7 March 2012. In this dataset, time is in yyyymmddhh format; for example, 2012030101 means 1 March 2012, 1:00 a.m.
The data looks like:
ID time PM10 LAT LON
1 111121 2012030101 42 37.56464 126.976
2 111121 2012030102 36 37.56464 126.976
3 111121 2012030103 46 37.56464 126.976
4 111121 2012030104 40 37.56464 126.976
...
My ultimate goal is to plot a semi-variogram for every hour. For example, for 1 March 2012, 1:00 a.m. (2012030101) there are 107 PM10 values, and I want to plot a semivariogram for each hour from 2012030101 to 2012030723 (7*24 semivariograms in total). I wrote some code in R:
seoul1to7<-read.csv("seoul1to7.csv", row.names=1)
rownames(seoul1to7)<-NULL
seoul311<-subset(seoul1to7, time==2012030101)
seoul312<-subset(seoul1to7, time==2012030102)
...
seoul3723<-subset(seoul1to7,time==2012030723)
At first I tried to make my desired (7*24) dataframes with the subset() function; then I wanted to plot a semivariogram for each dataframe. For example, I plotted the semivariogram for seoul311 (for 2012030101) with the following code:
library(sp)
library(gstat)
library(rgdal)
seoul311<-read.csv("seoul311.csv",row.names=1)
seoul311<-na.omit(seoul311)
coordinates(seoul311)=~LON+LAT
proj4string(seoul311) = "+proj=longlat +datum=WGS84"
seoul311<-spTransform(seoul311, CRS("+proj=utm +north +zone=52 +datum=WGS84"))
#plot Omnidirectional Variogram
seoul311.var<-variogram(PM10~1,data=seoul311,cutoff=66000, width=6000)
seoul311.var
plot(seoul311.var, col="black", pch=16,cex=1.3,
xlab="Distance",ylab="Semivariance",
main="Omnidirectional Variogram for seoul 311")
#Model fit
model.311<- fit.variogram(seoul311.var,vgm(psill=250,model="Gau",range=40000,nugget=100),
fit.method = 2)
plot(seoul311.var,model=model.311, col="black", pch=16,cex=1.3,
xlab="Distance",ylab="Semivariance",
main="Omnidirectional Variogram for seoul 3112")
#Directional Variogram
seoul311.var1<-variogram(PM10~1,data=seoul311,width=6000,cutoff=66000,
alpha=seq(0,135,45),tol.hor=15)
seoul311.var1
plot(seoul311.var1,model=model.311, cex=1.1,pch=16,col=1,
main="ANisotropic Variogram for PM10")
#anisotropy corrected variograms
model.3112.anis<- fit.variogram(seoul311.var1,vgm(250,"Gau",40000,100,anis=c(45,0.80)),
fit.method = 2)
#Final isotropic variogram for kriging
plot(seoul311.var,model=model.3112.anis, col="black", pch=16,cex=1.3,
xlab="Distance",ylab="Semivariance",
main="Final Isotropic Variogram")
But I understand that my code is very inefficient! I am writing subset(seoul1to7, time==...) (7*24) times, and then the plotting code another (7*24) times! This seems a very inappropriate way to do it.
So, how can I plot these (7*24) semi-variograms efficiently from my dataset seoul1to7 (using a loop or some other function)? If you need any further information, please let me know.
library(sp)
library(gstat)
library(rgdal)
library(automap)
seoul1to7<-read.csv("seoul1to7.csv", row.names=1)
seoul1to7 <- na.omit(seoul1to7)
seoul1to7_split<-split(seoul1to7,seoul1to7$time) #a list containing 161 data frames (one per hour)
seq(seoul1to7_split)
### now we loop (using lapply()) over each seoul1to7_split entry and calculate
### variogram using autofitVariogram and return the variogram plot
vars<-lapply(seq(seoul1to7_split), function(i)
{
  dat<-seoul1to7_split[[i]] # [[ ]] extracts a list element
  coordinates(dat)<-~LON+LAT
  proj4string(dat) <- "+proj=longlat +datum=WGS84"
  dat <- spTransform(dat, CRS("+proj=utm +north +zone=52 +datum=WGS84"))
  variogram<-autofitVariogram(log(PM10)~1,dat)
  plt <- plot(variogram, plotit=FALSE, asp=1)
  ### to fix xlim and ylim to be identical for each plot,
  ### uncomment the following line and change the values as you see fit
  #plt <- update(plt, xlim = c(-1000, 35000), ylim = c(0, 1000))
  return(plt)
})
### now we actually have 23 * 7 variogram plots which we will combine
### into 23 hourly plots using latticeCombineGrid()
library(raster)
library(devtools)
#install.packages("Rcpp")
install_github("environmentalinformatics-marburg/Rsenal")
library(Rsenal)
names(seoul1to7_split)
hours<-substr(names(seoul1to7_split),9,10) #extract the hour (characters 9-10) from each name
hours
unique(hours)
class(unique(hours)) #character
seq(unique(hours))
var7_per_plot<-lapply(seq(unique(hours)), function(j)
{
index<- hours %in% unique(hours)[j]
plot.hours <- vars[index]
return(latticeCombineGrid(plot.hours, layout =c(3,3)))
})
var7_per_plot[[1]]
var7_per_plot[[2]]
...
var7_per_plot[[23]]
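If you want to keep the 23 combined panels, here is a hedged convenience sketch that writes each one to a PNG file (the file names are made up for illustration):
for (j in seq(var7_per_plot)) {
  png(sprintf("variograms_hour_%02d.png", j), width=900, height=900) #hypothetical file names
  print(var7_per_plot[[j]]) #lattice objects must be print()ed inside a device
  dev.off()
}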
Special thanks to Tim Appelhans for teaching me this.

R: Calculating the shortest distance between two point layers

I need to calculate the shortest distance between two point matrices. I am new to R and have no clue how to do this. This is the code that I used to read in the data and convert it to points:
library(dismo)
laurus <- gbif("Laurus", "nobilis")
locs <- subset(laurus, select = c("country", "lat", "lon"))
#uk observations
locs.uk <-subset(locs, locs$country=="United Kingdom")
#ireland observations
locs.ire <- subset(locs, locs$country=="Ireland")
uk_coord <-SpatialPoints(locs.uk[,c("lon","lat")])
ire_coord <-SpatialPoints(locs.ire[,c("lon","lat")])
crs.geo<-CRS("+proj=longlat +ellps=WGS84 +datum=WGS84") # geographical, datum WGS84
proj4string(uk_coord) <-crs.geo #define projection
proj4string(ire_coord) <-crs.geo #define projection
I need to calculate the shortest (Euclidean) distance from points in Ireland to points in the UK. In other words, I need to calculate the distance from each point in Ireland to its closest point in the UK points layer.
Can someone tell me what function or package I need to use in order to do this? I looked at gdistance and could not find a function that calculates the shortest distance.
You can use the FNN package, which uses spatial trees to make the search efficient. It works with Euclidean geometry, so you should transform your points to a planar coordinate system. I'll use the rgdal package to convert to the UK grid reference system (stretching it a bit to use it over Ireland here; for data elsewhere, say New York, you should use a planar coordinate system appropriate to that area):
> require(rgdal)
> uk_coord = spTransform(uk_coord, CRS("+init=epsg:27700"))
> ire_coord = spTransform(ire_coord, CRS("+init=epsg:27700"))
Now we can use FNN:
> require(FNN)
> g = get.knnx(coordinates(uk_coord), coordinates(ire_coord),k=1)
> str(g)
List of 2
$ nn.index: int [1:69, 1] 202 488 202 488 253 253 488 253 253 253 ...
$ nn.dist : num [1:69, 1] 232352 325375 87325 251770 203863 ...
g is a list of the indexes and distances of the UK points that are nearest to the 69 Irish points. The distances are in metres, because the coordinate system is in metres.
You can illustrate this by plotting the points and then joining Irish point 1 to UK point 202, Irish 2 to UK 488, Irish 3 to UK 202, etc. In code:
> plot(uk_coord, col=2, xlim=c(-1e5,6e5))
> plot(ire_coord, add=TRUE)
> segments(coordinates(ire_coord)[,1], coordinates(ire_coord)[,2], coordinates(uk_coord[g$nn.index[,1]])[,1], coordinates(uk_coord[g$nn.index[,1]])[,2])
gDistance() from the rgeos package will give you the distance matrix
library(rgeos)
gDistance(uk_coord, ire_coord, byid = TRUE)
Another option is nncross() from the spatstat package. Pro: it gives the distance to the nearest neighbour. Con: you'll need to convert the SpatialPoints to a point pattern (ppp) object first (see ?as.ppp in spatstat).
library(spatstat)
nncross(ire.ppp, uk.ppp)
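A minimal, hedged sketch of that conversion (it reuses the projected uk_coord/ire_coord from the FNN answer above, since spatstat also expects planar coordinates):
library(spatstat)
xy_uk <- coordinates(uk_coord)
xy_ire <- coordinates(ire_coord)
#a common observation window spanning both point sets
win <- owin(xrange=range(c(xy_uk[,1], xy_ire[,1])), yrange=range(c(xy_uk[,2], xy_ire[,2])))
uk.ppp <- as.ppp(xy_uk, W=win)
ire.ppp <- as.ppp(xy_ire, W=win)
#for each Irish point: distance ("dist") and index ("which") of the nearest UK point
head(nncross(ire.ppp, uk.ppp))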
The geosphere package offers a lot of dist* functions to evaluate distances between two lon/lat points. In your example, you could try:
require(geosphere)
#get the coordinates of the UK and Ireland points
pointuk <- uk_coord@coords
pointire <- ire_coord@coords
#prepare a vector which will contain the minimum distance for each Ireland point
res <- numeric(nrow(pointire))
#get the min distance
for (i in 1:length(res)) res[i] <- min(distHaversine(pointire[i,,drop=FALSE], pointuk))
The distances you'll obtain are in metres (you can change this by setting the radius of the earth in the call to distHaversine).
The problem with gDistance and the other rgeos functions is that they evaluate the distance as if the coordinates were planar. With unprojected lon/lat coordinates, the number you obtain is not very useful.
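To see this concretely, a small hedged sketch (assuming uk_coord/ire_coord as defined in the question, i.e. still in lon/lat): the same call returns "degrees" on unprojected data and metres after projecting.
library(rgeos)
library(rgdal)
min(gDistance(uk_coord, ire_coord, byid=TRUE)) #planar distance in "degrees" - not meaningful
uk_m <- spTransform(uk_coord, CRS("+init=epsg:27700"))
ire_m <- spTransform(ire_coord, CRS("+init=epsg:27700"))
min(gDistance(uk_m, ire_m, byid=TRUE)) #planar distance in metres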

Compare variogram and variog function

I assumed (probably wrongly) that in the easiest cases the output of variog in the geoR package and variogram in the gstat package would be the same.
I have this dataset:
head(final)
lat lon elev seadist tradist samples rssi
1 60.1577 24.9111 2.392 125 15.21606 200 -58
2 60.1557 24.9214 3.195 116 15.81549 200 -55
3 60.1653 24.9221 4.604 387 15.72119 200 -70
4 60.1667 24.9165 7.355 205 15.39796 200 -62
5 60.1637 24.9166 3.648 252 15.43457 200 -73
6 60.1530 24.9258 2.733 65 16.10631 200 -57
which consists of (I guess) unprojected data, so I project it:
#data projection
#convert to sp object:
library(sp)
coordinates(final) <- ~ lon + lat #longitude first
library(rgdal)
proj4string(final) = "+proj=longlat +datum=WGS84"
UTM <- spTransform(final, CRS=CRS("+proj=utm +zone=35 +north +ellps=WGS84 +datum=WGS84"))
and produce the variogram without trend according to the gstat library
var.notrend.sp<-variogram(rssi~1, UTM)
plot(var.notrend.sp)
Trying to get the same output in geoR, I go with:
library(geoR)
UTM1<-as.data.frame(UTM)
UTM1<-cbind(UTM1[,6:7], UTM1[,1:5]) #put the projected coordinates first
UTM1
coords<-UTM1[,1:2]
coords
var.notrend.geoR <- variog(coords=coords, data=UTM1$rssi, estimator.type='classical')
plot(var.notrend.geoR)
A couple of points:
gstat can work with unprojected data, and will compute the great-circle distance.
Setting the "projection" to "+proj=longlat +datum=WGS84" does not transform the data to a cartesian, grid-based system (such as UTM).
What you are seeing in the output of variogram is the fact that gstat is (sensibly) using great-circle distances. If you look at the scale of the distance axis, you will see that the ranges are quite different, because geoR doesn't know (and can't account for) the fact that you are not using a grid-based projection.
If you want to compare apples with apples, use rgdal and spTransform to convert the coordinates to an appropriate projection, and then create variograms with similar specifications. (Note that gstat defines a default cutoff: the length of the diagonal of the box spanning the data, divided by three.)
The empirical variogram is highly dependent on the definition of distance and on the choice of binning (see the brilliant Model-based Geostatistics by Diggle and Ribeiro, especially chapter 5, which deals with this issue in detail).
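A hedged sketch of such an apples-to-apples comparison, reusing the projected UTM object from the question and matching the cutoff and binning between the two packages (the cutoff and width values are illustrative):
library(gstat)
library(geoR)
cutoff <- 20000 #hypothetical common cutoff (metres)
width <- 2000   #hypothetical common bin width (metres)
v.gstat <- variogram(rssi~1, UTM, cutoff=cutoff, width=width)
v.geoR <- variog(coords=coordinates(UTM), data=UTM$rssi,
                 estimator.type="classical", max.dist=cutoff,
                 uvec=seq(width/2, cutoff - width/2, by=width))
#overlay the two empirical variograms: they should now broadly agree
plot(v.geoR)
points(v.gstat$dist, v.gstat$gamma, col=2, pch=16)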
