How to plot a variable with geographic distance in R? - r

I would like to plot a simple xy graph with y = variable and x = geographic distance.
I have a data.frame with my data of interest in separate columns (e.g. Species$Latitude, Species$Longitude, Species$Variable). All coordinates are in decimal degrees and all variable values are numeric.
Something like the attached image.
Can someone help me? I think it's easy, but I'm having a hard time figuring it out (so not so easy after all).

When you have a point of origin, you can use the haversine formula to calculate the distance: Haversine function in R
Update, added sample code:
library(pracma)
# example data: three species with coordinates in decimal degrees
names <- c("lion", "tiger", "flamingo")
latitude <- c(0, 3, -5)
longitude <- c(0, -0.5, 2)
species <- data.frame(names, latitude, longitude)
# haversine distance (in km) of each species from the point of origin (0, 0)
for (i in 1:nrow(species)) {
  loc1 <- c(0, 0)                                       # origin (lat, lon)
  loc2 <- c(species$latitude[i], species$longitude[i])  # species location (lat, lon)
  species$distance[i] <- haversine(loc1, loc2)
}
species
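From there, the plot the question asks for is a one-liner. A minimal sketch, assuming the data frame also carries the numeric variable from the question (the values below are made up for illustration):
# made-up values standing in for Species$Variable
species$variable <- c(1.2, 3.4, 2.1)
# simple xy plot: variable against geographic distance from the origin
plot(species$distance, species$variable,
     xlab = "Distance from origin (km)", ylab = "Variable")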

R - lat/long from a start point and distance

I have lat and lon coordinates. Because I needed to rotate them, I transformed the WGS84 lat/lon coordinates into distances from a given point and rotated them around that point with a rotation matrix. For plotting, I now need to transform the rotated distance values (x and y) back into WGS84 lat/lon coordinates, but I can't find a way to do it.
I transformed the initial lat/lon values to distances from a chosen point like this:
library(geosphere)
# east-west offsets: vary lon, hold lat fixed at the reference point
g_mat_x <- cbind(lon, rep(sp_lat, length(lon)))
dist_x <- distGeo(c(sp_lon, sp_lat), g_mat_x)
# north-south offsets: vary lat, hold lon fixed (sign flipped)
g_mat_y <- cbind(rep(sp_lon, length(lat)), lat)
dist_y <- distGeo(c(sp_lon, sp_lat), g_mat_y) * (-1)
Here sp_lon and sp_lat are the coordinates of the freely chosen point, and lon and lat are the measured coordinate vectors I needed the distances to.
This works great, but I can't get my head around how to transform the distances back into the corresponding lat/lon values using the same point (sp_lon/sp_lat).
Are there functions around capable of doing that?
Sample data:
sp_lon <- 6.5
sp_lat <- 54.1
The lat and lon data can be found here:
https://pastebin.com/8ZMGkG2P for lat values
lat <- c(53.8599307418054, 53.8599299782294, 53.8599292147252, 53.8599284513955,
53.8599276909077, 53.8599269276675, 53.8599261646716, 53.8599254016427,
53.8599246387294, 53.8599238760691, 53.8599231135803)
https://pastebin.com/bz4zRDb4 for lon values
lon <- c(7.03922544225856, 7.03921652830416, 7.03920761347249, 7.03919870033677,
7.03918980775434, 7.039180893677, 7.03917198029713, 7.03916306606371,
7.03915415091448, 7.03914523638381, 7.03913632276797)
results for distances from distGeo:
https://pastebin.com/b1rK2i0H for dist_x
dist_x <- c(35275.2396149456, 35274.6564815661, 35274.0732907975, 35273.4902109736,
35272.9084757064, 35272.3253342833, 35271.7422384877, 35271.1590868543,
35270.5758753102, 35269.9927042311, 35269.4095929985)
dist_x_rot <- c(27157.4079196703, 27156.82265961, 27156.2373461817, 27155.6521449452,
27155.0683243199, 27154.4830661607, 27153.897859114, 27153.3125971812,
27152.7272807104, 27152.1420106127, 27151.556803262)
https://pastebin.com/tL7qhXwk for dist_y
dist_y <- c(-26720.819753436, -26720.9047412656, -26720.9897211144, -26721.0746815378,
-26721.1593256438, -26721.2442761088, -26721.3291993745, -26721.4141263143,
-26721.4990403831, -26721.5839262974, -26721.6687931261)
dist_y_rot <- c(-34940.2337323618, -34940.1648982768, -34940.0960416297, -34940.0271949336,
-34939.9583906952, -34939.8895184371, -34939.8206317157, -34939.7517340913,
-34939.6828085284, -34939.6138662435, -34939.5449210127)
I hope it is okay this way; I'd rather give you a small part of the real data than make up data.
EDIT: Okay, I got it using destPoint and simple trigonometry to get the distance vector and the angle.
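For reference, a minimal sketch of that approach with geosphere's destPoint: recover each point's distance and bearing from the x/y offsets, then project from the reference point. The sign convention below (x = east, y = north) is an assumption; given the dist_y * (-1) above, you may need to flip the sign of y:
library(geosphere)
# dx, dy: offsets in metres from the reference point (assumed x = east, y = north)
back_to_lonlat <- function(dx, dy, ref_lon, ref_lat) {
  d <- sqrt(dx^2 + dy^2)                   # distance in metres
  b <- (atan2(dx, dy) * 180 / pi) %% 360   # bearing in degrees, clockwise from north
  destPoint(c(ref_lon, ref_lat), b, d)     # matrix of lon/lat pairs
}
# e.g. for the rotated offsets (flip dy if your y axis points south)
lonlat_rot <- back_to_lonlat(dist_x_rot, dist_y_rot, sp_lon, sp_lat)
head(lonlat_rot)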

Assigning a covariate associated with spatial points to a bigger set of spatial points in R?

I have two data sets with spatial points (in .csv format): data1 with 220 spatial points (latitude and longitude) and data2 with 80 spatial points (latitude and longitude). For data2 I have one covariate indicating the genetic origin of each point. The spatial points in the two datasets are not exactly the same.
I would like to assign a genetic origin to the spatial points in data1. It seems I need to define a square (or some other neighbourhood) around each point in data2 to be able to associate a genetic origin with each point in data1.
I am using R and I think packages as raster or sp may be useful.
Thanks for your help.
Best,
Marie.
You need to make up your mind about how you want to assign "genetic origin". One approach you seem to be hinting at is assigning each point the origin of its nearest neighbour.
When asking a question you should always include some example data.
library(raster)
d1 <- data.frame(lon=c(1,5,55,31), lat=c(3,7,20,22))
d2 <- data.frame(lon=c(4,2,8,65,5,4), lat=c(50,-90,20,32,10,10), origin=LETTERS[1:6], stringsAsFactors=FALSE)
Here is how you can assign origin based on the nearest known origin:
# make sure your data are (x,y), i.e. (longitude,latitude), not the reverse
pd <- pointDistance(d1, d2[,1:2], lonlat=TRUE)  # one row per d1 point, one column per d2 point
nd <- apply(pd, 1, which.min)                   # column index of the nearest d2 point
d1$origin <- d2$origin[nd]
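If you also want to guard against matches whose nearest neighbour is implausibly far away, a possible extension (the 500 km cut-off is an arbitrary example):
# distance (in metres) to the matched d2 point
d1$nn_dist <- pd[cbind(1:nrow(d1), nd)]
# drop assignments beyond the cut-off
d1$origin[d1$nn_dist > 500000] <- NA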

Create a neighborhood list for a large dataset / speed it up

I want to create a weight matrix based on distance. My code currently looks as follows and works for a smaller sample of the data. However, with the large dataset (569424 individuals in 24077 locations) it does not go through. The problem arises at the nb2blocknb function. So my question is: how can I optimize my code for large datasets?
library(spdep)
# load all survey data
DHS <- read.csv("Daten/final.csv")
# define coordinates matrix
coormat <- cbind(DHS$location, DHS$lon_s, DHS$lat_s)
colnames(coormat) <- c("location", "lon_s", "lat_s")
# one row per unique location
coo <- unique(coormat)
c <- as.data.frame(coo)
coor <- cbind(c$lon_s, c$lat_s)
# get a list of neighbouring locations that are within 50 km of each other
neighbor <- dnearneigh(coor, d1 = 0, d2 = 50, row.names = c$location, longlat = TRUE, bound = c("GE", "LE"))
# expand the location-level neighbourhood list to the individual level
nb <- nb2blocknb(neighbor, as.character(DHS$location))
# weight matrix in list format
nbweights.lw <- nb2listw(nb, style = "B", zero.policy = TRUE)
Thanks a lot for your help!
You're trying to make 1.3e10 distance calculations; the results alone would run to gigabytes.
I think you'd want to limit either the maximum distance or the number of nearest neighbors you're looking for. Try nn2 from the RANN package:
library(RANN)
# for each point in coordinatesB, the 10 nearest points in coordinatesA
nearest_neighbours_w_distance <- nn2(coordinatesA, coordinatesB, 10)
Note that this operation is not symmetric (switching coordinatesA and coordinatesB gives different results).
Also, you would first have to convert your GPS coordinates to a coordinate reference system in which you can calculate Euclidean distances, for example UTM:
library(sp)
gps2utm <- function(gps_coordinates_matrix, utmzone) {
  # build a SpatialPoints object from the two-column lon/lat matrix
  pts <- SpatialPoints(gps_coordinates_matrix,
                       proj4string = CRS("+proj=longlat +datum=WGS84"))
  # project to the requested UTM zone
  spTransform(pts, CRS(paste0("+proj=utm +zone=", utmzone, " +ellps=WGS84")))
}
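A hypothetical usage combining the two steps (the zone number and k are placeholders; pick the UTM zone that covers your study area):
# project the unique location coordinates (coor from the question) to UTM
utm <- gps2utm(coor, utmzone = 37)
# for every location, its 10 nearest locations and their distances in metres
nn <- nn2(coordinates(utm), k = 10)
str(nn)  # $nn.idx: neighbour indices, $nn.dists: Euclidean distances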

Get all (multiple) values of a data.frame falling into every raster cell

I have spatial data with lat/long (x/y) and want to put a raster over it. For every raster cell, I want to get all the values of the points that fall into that cell. The points are not equally distributed, so one raster cell does not contain the same number of points as its neighbour. I know the function rasterize can average all values inside a cell into one new value, but I don't want the mean per cell; I want to extract all the values of the points inside each cell.
How can I do this in an efficient way?
Consider I have:
library(raster)
library(sp)
my data:
n <- 1000
x <- runif(n) * 360 - 180
y <- runif(n) * 180 - 90
values <- runif(n)
xy <- cbind(x, y)
my raster:
r <- raster(ncols=10, nrows=10)
Now I don't want to average the values as rasterize does, but extract all the values (e.g. into a list) that fall into each cell.
Many thanks for ideas and help! Is there any function for this?
Firstly, you have to have values in the raster to be sampled; in your example you are just trying to sample an empty raster. (I mistook this for your sample size in the original edit; the issue is with your example, not the question.)
To answer your question...
extract() is the function you are looking for:
library(raster)
library(sp)
# an example raster with 10 x 10 = 100 cells, filled with random values
r <- raster(ncols=10, nrows=10)
r[] <- runif(ncell(r))
# 1000 random points
n <- 1000
x <- runif(n) * 360 - 180
y <- runif(n) * 180 - 90
xy <- SpatialPoints(cbind(x, y))
# raster value at the location of each point
r0 <- extract(r, xy)
plot(r0)
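Note that extract(r, xy) returns the raster value at each point. If the goal is the reverse grouping the question describes (all point values per cell), one possible sketch, reusing r, x, y and a per-point values vector:
# per-point values, as in the question
values <- runif(n)
# index of the raster cell each point falls in
cells <- cellFromXY(r, cbind(x, y))
# named list: one entry per occupied cell, holding all point values inside it
vals_by_cell <- split(values, cells)
vals_by_cell[1:3]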

In R, how to average spatial points data over spatial grid squares

Managed to solve the problem now.
I have a set of around 50 thousand points, each with coordinates and one associated value. I would like to place the points into a grid, averaging the associated value of all points that fall into each grid square, so that I end up with an object that identifies each grid square and gives the average value inside it.
I have the data in a spatial points data frame and a spatial grid object, if that helps.
Improving the question: I have definitely done some searching; sorry about the initial state of the question. I had only framed it inside my own head and hadn't had to communicate it to anyone else before...
Here is example data that hopefully illustrates the problem more clearly
library(sp)
## make some data
longi <- runif(100, 0, 10)
lati <- runif(100, 0, 10)
value <- runif(100, 20, 30)
## put in a data frame, then convert to a spatial points data frame
df <- data.frame("lon" = longi, "lat" = lati, "val" = value)
coordinates(df) <- c("lon", "lat")
proj4string(df) <- CRS("+proj=longlat")
## create a grid that bounds the data
grd <- GridTopology(cellcentre.offset = bbox(df)[, 1],
                    cellsize = c(1, 1), cells.dim = c(11, 11))
sg <- SpatialGrid(grd)
Then I hope to get an object albeit a vector/data frame/list that gives me the average of value in each grid cell/square and some way of identifying which cell it is.
Solution
## convert the grid into polygons
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS("+proj=longlat")
## use over() to pick the points inside each cell and average their values
results <- rep(0, length(polys))
for (i in 1:length(polys)) {
  results[i] <- mean(df$val[which(!is.na(over(x = df, y = polys[i])))])
}
My question now is if this is the best way to do it or is there a more efficient way?
Your description is vague at best. Please try to ask more specific questions, preferably with code illustrating what you have already tried. Averaging a single value in your point data or a single raster cell makes absolutely no sense.
The best guess at an answer I can provide is to use raster's extract() to assign the raster values to an sp point object and then use tapply() to aggregate the values by a grouping variable in the points. You can use the coordinates of the points to identify the cell location or, alternatively, the cell numbers returned by extract() (per the example below).
require(raster)
require(sp)
# Create example data
r <- raster(ncol=500, nrow=500)
r[] <- runif(ncell(r))
pts <- sampleRandom(r, 100, sp=TRUE)
# Add a grouping value to points
pts@data <- data.frame(ID=rownames(pts@data), group=c(rep(1,25), rep(2,25),
                       rep(3,25), rep(4,25)))
# Extract raster values and add to the @data slot data frame. Note, the "cells"
# attribute indicates the cell index in the raster.
pts@data <- data.frame(pts@data, extract(r, pts, cellnumbers=TRUE))
head(pts@data)
# Use tapply to calculate group means
tapply(pts@data$layer, pts@data$group, FUN=mean)
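On the efficiency question above: sp's over() can also aggregate point attributes per grid cell directly, avoiding the explicit loop. A sketch, assuming the df and grd objects from the question:
# convert the grid to polygons, as in the original solution
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS("+proj=longlat")
# one averaged row per grid cell; cells containing no points give NA
cell_means <- over(polys, df, fn = mean)
head(cell_means)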
