Distance Matrix in R using geosphere

I have a dataset with information on international investments in Europe, plus the coordinates of NUTS3 regions. For each investment I have the city and its coordinates (lat1, long1). I want to compute the distance from each city to every NUTS3 region I have, e.g. Paris to Paris, Paris to Lyon, Paris to Orly, Paris to Maidenhead, and so on. I want to repeat this for all the cities I have, so that in the end I have, for each city, its distance to every NUTS3 region. I tried to use geosphere, but it only gives me the distance between the two points in each row.
summary(coordinate$NUTS_BN_ID)
summary(fdimkt$NUTS_BN_ID)
## merge datasets
df <- merge(fdimkt, coordinate, by = "nutscode", all = FALSE)
View(df)
fix(df)
#install.packages("dplyr")
library(dplyr)
# assign the result, otherwise the rename is discarded
df <- df %>% dplyr::rename(lat1 = `_destination_latitude`, long1 = `_destination_longitude`)
library(geosphere)
library(data.table)
#dt <- expand.grid.df(df,df)
# distGeo() expects (longitude, latitude) columns and returns metres
setDT(df)[, dist_km := distGeo(cbind(long1, lat1), cbind(long2, lat2)) / 1000]
summary(df$dist_km)
This didn't work because it returns the distance row by row, but I actually want the distance from each city to all the NUTS3 coordinates I have.
Can someone help me with this?
I'm not sure how to post my data; I guess that would help in getting more suggestions.
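One possible way to get every city-to-NUTS3 distance, rather than one distance per row, is geosphere::distm(), which returns a full matrix. The sketch below assumes the cities and the NUTS3 coordinates are kept in two separate tables; the `cities` and `nuts` data frames, their labels and their coordinates are made up for illustration and are not the asker's data.

```r
library(geosphere)

# Hypothetical toy data: real column names and coordinates will differ
cities <- data.frame(city  = c("Paris", "Lyon"),
                     long1 = c(2.3522, 4.8357),
                     lat1  = c(48.8566, 45.7640))
nuts <- data.frame(nutscode = c("NUTS_A", "NUTS_B", "NUTS_C"),
                   long2    = c(2.35, 4.84, -0.72),
                   lat2     = c(48.86, 45.76, 51.52))

# distm() expects (longitude, latitude) columns and returns metres;
# rows correspond to cities, columns to NUTS3 regions
dist_km <- distm(as.matrix(cities[, c("long1", "lat1")]),
                 as.matrix(nuts[, c("long2", "lat2")]),
                 fun = distGeo) / 1000
dimnames(dist_km) <- list(cities$city, nuts$nutscode)
dist_km
```

Each row of `dist_km` is then the vector of distances from one city to all regions, so no explicit loop over cities should be needed.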

Related

How can I access matrix entries without using a for loop in R?

I have a distance matrix with the distances between all points in the data set.
How can I access the individual distances in the matrix without using a for loop?
This is a working example using a for loop:
# Create a distance matrix of all possible distances
DistMatrix <- Stations %>%
  st_sf(crs = 4326) %>%
  filter(!is.na(end.station.id)) %>%
  st_distance(Stations$geometry)
# Initialise the new distance data frame Dist containing start and end station ids.
Dist <- Bike %>%
  select(start.station.id, end.station.id, dateS, tripduration) %>%
  filter(!is.na(end.station.id)) %>%
  mutate(Dist = NA)
# Iterate over all rows and look up the distance for each start/end station id pair.
# seq_len() avoids the 1:0 trap when Dist has no rows.
for (i in seq_len(nrow(Dist))) {
  Dist$Dist[i] <- DistMatrix[which(Stations$end.station.id == Dist$end.station.id[i]),
                             which(Stations$end.station.id == Dist$start.station.id[i])]
}
This is my best attempt at this problem using dplyr::mutate:
Dist2 <- Dist2 %>% mutate(Dist = DistMatrix[which(end.station.id == Stations[1]),
which(start.station.id == Stations[1])])
The expected outcome is the data frame Dist with the column Dist (distances) filled in.
Is there a working solution to this problem?
Thx!
EDIT: The example code is more detailed now.
EDIT 2: Added expected outcome.
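A loop-free lookup can be sketched in base R with match() and a two-column index matrix. The objects below are small made-up stand-ins for Stations, DistMatrix and Dist, and `station_ids` is a hypothetical vector holding the ids in the same order as the rows and columns of the matrix. If the real matrix carries units from st_distance(), it may need to be converted to a plain numeric matrix first.

```r
# Toy stand-ins for the objects in the question (values are made up)
station_ids <- c(101, 102, 103)
DistMatrix <- matrix(c(0, 5, 9,
                       5, 0, 4,
                       9, 4, 0), nrow = 3, byrow = TRUE)
Dist <- data.frame(start.station.id = c(101, 103, 102),
                   end.station.id   = c(102, 101, 102))

# match() turns station ids into row/column positions; indexing with a
# two-column matrix then picks exactly one cell per trip
rows <- match(Dist$end.station.id, station_ids)
cols <- match(Dist$start.station.id, station_ids)
Dist$Dist <- DistMatrix[cbind(rows, cols)]
Dist
```

Indexing with cbind(rows, cols) selects element [rows[i], cols[i]] for each i, which is what the for loop does one row at a time.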

Trying to determine the distances between centroids of countries R

I currently have a data frame of pairs of country codes (like US, RU, CA, etc.). Is there a function that determines the centroid of a country given the country code, so that I can find the distance between the pairs of countries? Or is there a function that can give me the coordinates of the centroid of each country (such as the longitude and latitude)?
These are the first couple of lines of my dataset, which I had filtered from a previous one, for reference.

You can scrape this Google public dataset.
My previous suggestion to use the countryref dataset in the CoordinateCleaner package doesn't work, because I found out it contains duplicates with different positions.
library(rvest)
library(dplyr)
url <- 'https://developers.google.com/public-data/docs/canonical/countries_csv'
# Read the page once and turn its table into a data frame
webpage <- read_html(url)
centroids <- webpage %>% html_nodes('table') %>% html_table() %>% as.data.frame()
data <- data.frame(V1 = c("US", "US"), V2 = c('VN', 'ZA'))
# Join the centroids twice: once for each country of the pair
data %>% inner_join(centroids, by = c("V1" = "country")) %>% inner_join(centroids, by = c("V2" = "country"))
  V1 V2 latitude.x longitude.x        name.x latitude.y longitude.y       name.y
1 US VN   37.09024   -95.71289 United States   14.05832   108.27720      Vietnam
2 US ZA   37.09024   -95.71289 United States  -30.55948    22.93751 South Africa
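To get from the joined centroids to an actual distance per pair, one option is geosphere::distGeo(). This is only a sketch: the `pairs` data frame below re-types the two rows of the output above so that it runs without scraping, and `dist_km` is a column name chosen for illustration.

```r
library(geosphere)

# Centroids copied from the join output above
pairs <- data.frame(V1 = c("US", "US"), V2 = c("VN", "ZA"),
                    latitude.x  = c(37.09024, 37.09024),
                    longitude.x = c(-95.71289, -95.71289),
                    latitude.y  = c(14.05832, -30.55948),
                    longitude.y = c(108.27720, 22.93751))

# distGeo() takes (longitude, latitude) columns and returns metres,
# one value per row of the two coordinate matrices
pairs$dist_km <- distGeo(cbind(pairs$longitude.x, pairs$latitude.x),
                         cbind(pairs$longitude.y, pairs$latitude.y)) / 1000
pairs[, c("V1", "V2", "dist_km")]
```

Here the row-by-row behaviour of distGeo() is the desired one, since each row already holds one pair of countries.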

r - ratio calculation via data set

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName <- as.character(df$countryName)
I processed the dataset.
df$countryName[df$countryName == "United States"] <- "United States of America"
I changed the name here to "United States of America" so that it matches the population data.
df8$death_pop <- df8$death / df8$PopTotal
I calculated the death/pop ratio.
How can I find the 10 countries with the highest death/pop ratio?

Using base R:
df8[order(df8$death_pop, decreasing = TRUE)[1:10],]
This orders your data.frame by death_pop and extracts the first 10 rows.
With the dplyr package you can use the function top_n, which gives you the desired result. I added arrange(desc()) to give you a sorted output. Remove this part if you don't need it.
df8 %>% top_n(10, death_pop) %>% arrange(desc(death_pop))
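In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max(), which also returns the rows already sorted. A minimal sketch, using a made-up four-row stand-in for df8 and n = 2 instead of 10 so the result is easy to check:

```r
library(dplyr)

# Made-up stand-in for df8 (the real data has more countries and columns)
df8 <- data.frame(countryName = c("A", "B", "C", "D"),
                  death       = c(10, 50, 5, 40),
                  PopTotal    = c(1000, 2000, 100, 8000))
df8$death_pop <- df8$death / df8$PopTotal

# slice_max() keeps the n rows with the largest death_pop, largest first
top <- df8 %>% slice_max(death_pop, n = 2)
top
```

With the real data, n = 10 would give the ten countries asked for; note that ties are kept by default, so more than n rows can come back.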

Adding values from a dataframe to a different dataframe

I'm new to R programming.
I have 2010 census data (link: census data).
This is my data frame (link: dataframe).
What I'd like to do is add the population column 'P001001' from the census data for each state into the dataframe. I'm not able to figure out how to map the state abbreviations in the dataframe to the full names in the census data, and add the respective population to each row for that state in the data frame. The data is for all of the states. What should be the simplest way to do this?
Thanks in advance.

Use the built-in datasets for US states, state.abb and state.name (see "State name to abbreviation in R").
Here's a simple bit of code which gives you a tidyverse approach to the problem:
1) add the state abbreviation to the census table
2) left join the census with the df by state abbreviation
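library(tibble)
library(dplyr)
# Toy census table; P001001 is the population column named in the question
census <- tibble(name = c("Colorado", "Alaska"),
                 P001001 = c(100000, 200000))
# Look up each full state name and attach its abbreviation
census <-
  census %>%
  mutate(state_abb = state.abb[match(name, state.name)])
df <- tibble(date = c("2011-01-01", "2011-02-01"),
             state = rep("CO", 2),
             avg = c(123, 1234))
# Attach the population to every row of df with a matching abbreviation
df <-
  df %>%
  left_join(census, by = c("state" = "state_abb"))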
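One caveat with this lookup: state.name and state.abb cover the 50 states only, so a census row such as "District of Columbia" would get NA as its abbreviation. A small base R sketch of one way around that, where `full_names`, `abbs` and `census_names` are names invented for this example:

```r
# state.name and state.abb ship with base R and cover the 50 states only,
# so extend both lookup vectors by hand for the District of Columbia
full_names <- c(state.name, "District of Columbia")
abbs       <- c(state.abb, "DC")

census_names <- c("Colorado", "Alaska", "District of Columbia")
res <- abbs[match(census_names, full_names)]
res
```

The same extended vectors could be used inside the mutate() call in place of state.abb and state.name.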

Distance to nearest point by group using sf

I have a dataset that looks similar to the example below. For each code I would like to calculate the distance to the next nearest code that belongs to the same area as it. So in my example, for each code belonging to area A001 I would be after an additional column in the dataset that contains the minimum distance to one of the other points that belong to area A001. I assume there should be a way of using st_distance to achieve this?
library(data.table)
library(sf)
dt1 <- data.table(
  code = c("A00111", "A00112", "A00113", "A00211", "A00212", "A00213", "A00214", "A00311", "A00312"),
  area = c("A001", "A001", "A001", "A002", "A002", "A002", "A002", "A003", "A003"),
  x = c(325147, 323095, 596020, 257409, 241206, 248371, 261076, 595218, 596678),
  y = c(286151, 284740, 335814, 79727, 84266, 78283, 62045, 333889, 337836))
# crs = 27700 is the British National Grid, so x and y are in metres
sf1 <- st_as_sf(dt1, coords = c("x", "y"), crs = 27700, na.fail = FALSE)

There might be a 'cleaner' way to get here, but this gets you the correct values.
library(tidyverse)
# intermediate function to help later in apply()
smallest_non_zero <- function(x) {
  min_val <- min(x[x != 0])
  x[match(min_val, x)]
}
closest_grp_distances <- sf1 %>%
  group_split(area) %>%
  map(~st_distance(., .) %>% # returns matrix
        apply(1, smallest_non_zero)) %>%
  unlist()
# group_split() returns the groups in sorted order of area, so this assignment
# lines up only if the rows of sf1 are already sorted by area (as they are here)
sf1$closest_grp_distances <- closest_grp_distances
I wanted to use the base R split, but it doesn't have a method for sf objects.
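As an alternative that avoids splitting the sf object, the same numbers can be sketched in base R directly from the x/y columns. This swaps st_distance() for dist(), which should be equivalent here only because the coordinates are in a projected CRS (EPSG:27700, metres); `pts`, `nearest_in_group` and `nearest` are names invented for this example.

```r
# Same coordinates as the example data, as a plain data frame
pts <- data.frame(
  code = c("A00111", "A00112", "A00113", "A00211", "A00212",
           "A00213", "A00214", "A00311", "A00312"),
  area = c("A001", "A001", "A001", "A002", "A002",
           "A002", "A002", "A003", "A003"),
  x = c(325147, 323095, 596020, 257409, 241206, 248371, 261076, 595218, 596678),
  y = c(286151, 284740, 335814, 79727, 84266, 78283, 62045, 333889, 337836))

nearest_in_group <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  diag(d) <- Inf                     # ignore each point's distance to itself
  apply(d, 1, min)                   # nearest other point, per row
}

# Split row indices (not the data) by area, so results land in the right rows
pts$nearest <- NA_real_
for (idx in split(seq_len(nrow(pts)), pts$area)) {
  pts$nearest[idx] <- nearest_in_group(pts$x[idx], pts$y[idx])
}
pts
```

Because the results are written back by row index, this version does not depend on the rows being sorted by area. An area with a single point ends up with Inf.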