Is it possible to merge datasets in R by longitude and latitude?
I am seeking to merge two geocoded datasets by AidData, containing information on development projects all around the globe. I would like to connect projects which are within a radius of 50km of a project from the other dataset. Columns in the dataframes look something like this:
Dataframe   place_name             latitude   longitude
World Bank  Dar es Salaam          -6.82349   39.26951
China       Dar es Salaam Region   -6.83522   39.19597
The main difficulty is that a 50 km radius does not correspond to a fixed change in latitude or longitude everywhere. If it did, the problem could be solved rather easily by setting an upper and lower bound for each project and merging it with every other project that falls within those bounds.
Is it possible to merge these data by latitude and longitude in R at all?
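One way to approach the 50 km question is to compute great-circle distances between every pair of projects and keep the pairs under the threshold. The following is a minimal base-R sketch, assuming a spherical Earth with a radius of 6371 km; the helper `haversine_km`, the data frame names, and the "Dodoma" row are made up for illustration (only the two Dar es Salaam rows come from the question).

```r
# Haversine great-circle distance in km (assumes a spherical Earth, r = 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

world_bank <- data.frame(
  place_name = "Dar es Salaam",
  latitude   = -6.82349,
  longitude  = 39.26951
)
china <- data.frame(
  place_name = c("Dar es Salaam Region", "Dodoma"),  # second row is hypothetical
  latitude   = c(-6.83522, -6.17),
  longitude  = c(39.19597, 35.74)
)

# Cross join: every World Bank project paired with every China project.
pairs <- merge(world_bank, china, by = NULL, suffixes = c("_wb", "_cn"))

# Distance for each pair, then keep only pairs within 50 km.
pairs$dist_km <- with(
  pairs,
  haversine_km(latitude_wb, longitude_wb, latitude_cn, longitude_cn)
)
matched <- pairs[pairs$dist_km <= 50, ]
```

A full cross join grows with the product of the two row counts, so for large datasets it may be worth pre-filtering by a generous latitude band before computing distances.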
Related
I am currently working on a dataset of different firms. I have each firm's longitude and latitude, and I want to find each firm's city using R.
For example, I found that Shanghai's longitude and latitude ranges are 120.852326~122.118227 and 30.691701~31.874634, respectively.
First, I want to create a column named "city" and check whether each firm's longitude and latitude fall within Shanghai's ranges. If yes, R should put "Shanghai" in the "city" column; if not, the value should remain NA.
In my dataframe longitude and latitude variables are displayed as "longitude" and "latitude".
I am not sure how to write the code, and I would really appreciate your help!
I am really struggling at the beginning. Your help is highly appreciated!
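The bounding-box check described above could be sketched as follows. The `firms` data frame and its two rows are made up for illustration; only the column names "longitude" and "latitude" and the Shanghai ranges come from the question. Note that a rectangular range is only an approximation of a city's actual boundary.

```r
# Hypothetical example data; replace with your own data frame.
firms <- data.frame(
  firm      = c("A", "B"),
  longitude = c(121.47, 116.40),
  latitude  = c(31.23, 39.90)
)

# TRUE where a firm falls inside the Shanghai longitude/latitude ranges.
in_shanghai <- firms$longitude >= 120.852326 & firms$longitude <= 122.118227 &
  firms$latitude >= 30.691701 & firms$latitude <= 31.874634

# "Shanghai" for matching firms, NA otherwise.
firms$city <- ifelse(in_shanghai, "Shanghai", NA_character_)
```

For more cities, the same test can be repeated per city, filling `city` only where it is still NA.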
I have an excel spreadsheet with the latitude and longitude of bike docking stations.
I have a shape file in R (cb_2018_17_bg_500k.shp) that has GEOID (12 digit FIPS code) column and a column labelled geometry. The values in this column are POLYGON((longitude,latitude))
I am trying to add a column in the excel spreadsheet titled FIPS. So, I need to somehow join the latitude and longitude to GEOID column in the shape file.
I am a novice when it comes to R.
Any advice will be much appreciated.
Rich
So far, I have only managed to load the shapefile into R.
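A possible approach with the sf package is to turn the station table into points and do a spatial join against the polygons. This is only a sketch: the `stations` table and the single square polygon below are made-up stand-ins for the spreadsheet and for the result of `st_read("cb_2018_17_bg_500k.shp")`, and it assumes the spreadsheet coordinates are WGS84 (EPSG:4326) and the shapefile is in NAD83 (EPSG:4269).

```r
library(sf)

# Hypothetical stand-in for the spreadsheet of docking stations.
stations <- data.frame(
  station   = c("A", "B"),
  latitude  = c(41.88, 45.00),
  longitude = c(-87.63, -80.00)
)

# Hypothetical stand-in for the block-group shapefile: one square polygon
# with a made-up GEOID, in NAD83 (EPSG:4269).
square <- rbind(c(-88, 41), c(-87, 41), c(-87, 42), c(-88, 42), c(-88, 41))
block_groups <- st_sf(
  GEOID    = "000000000001",
  geometry = st_sfc(st_polygon(list(square)), crs = 4269)
)

# Convert the table to points, then match the CRS of the polygons.
pts <- st_as_sf(stations, coords = c("longitude", "latitude"),
                crs = 4326, remove = FALSE)
pts <- st_transform(pts, st_crs(block_groups))

# Left join: each point gets the GEOID of the polygon it falls within (NA if none).
joined <- st_join(pts, block_groups["GEOID"], join = st_within)

# Back to a plain data frame with a FIPS column.
stations_fips <- st_drop_geometry(joined)
names(stations_fips)[names(stations_fips) == "GEOID"] <- "FIPS"
```

The resulting data frame can then be written back out to a spreadsheet with whatever package was used to read it.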
I have a dataframe that shows bus stop locations in Glasgow and another dataframe that shows Datazone polygons for Glasgow. I am using the sf package and have made both dataframes spatial. I want to do a spatial join to create a new dataframe (joined_ds) to match each bus stop location to a Datazone polygon and its associated characteristics (deprivation score). I'm using st_intersection which gives me a new dataframe with all the correct columns but 0 observations.
joined_ds <- st_intersection(st_buffer(bus_stop_data,0), st_buffer(datazones,0))
Both datasets are using the appropriate CRS (EPSG: 27700 for the British National Grid) and I know that the points and polygons overlap because I have successfully plotted them on a map using ggplot, so no idea why my dataframe is showing 0 obs. I've also tried loading in the datasets from scratch and no luck.
Any suggestions welcome, thanks!
Look at the differences between st_intersection and st_intersects here:
Why use st_intersection rather than st_intersects?
Since you are only interested if a point intersects with a polygon, you need st_intersects. If I understand you correctly you don't need to use st_buffer but simply use st_join in combination with st_intersects. Something like:
st_join(bus_stop_data, datazones, join = st_intersects)
Keep in mind issues that might arise when using spatial joins, for example when a point intersects with two polygons.
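To illustrate the two-polygon case, here is a small sketch with made-up data: two unit squares that share an edge and a stop sitting exactly on that edge. The names `zones`, `stops`, `zone_id`, and `stop_id` are hypothetical, and keeping the first match per stop is just one possible way to resolve the tie.

```r
library(sf)

# Helper (hypothetical) building a unit square whose lower-left corner is (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two adjacent polygons sharing the edge x = 1, in EPSG:27700.
zones <- st_sf(
  zone_id  = c("Z1", "Z2"),
  geometry = st_sfc(sq(0), sq(1), crs = 27700)
)

# S1 lies inside Z1; S2 lies exactly on the shared edge.
stops <- st_as_sf(
  data.frame(stop_id = c("S1", "S2"), x = c(0.5, 1), y = c(0.5, 0.5)),
  coords = c("x", "y"), crs = 27700
)

# S2 intersects both polygons, so it appears twice in the join result.
joined <- st_join(stops, zones, join = st_intersects)

# Keep only the first matching polygon per stop.
one_per_stop <- joined[!duplicated(joined$stop_id), ]
```

Checking `nrow(joined)` against `nrow(stops)` is a quick way to spot whether any point matched more than one polygon.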
I have my polygons stored in a SpatialPolygonsDataFrame and my coordinates in a data frame.
The output I want is to just have an additional column on my data frame that tags the OBJECTID (id of the polygon from the shapefile) that the coordinates fall into.
My problem is kind of the same with this
But its output is a little bit different. Also, it's kinda slow. I tried to tag just 4 coordinates and it took more than 5 minutes. I'm gonna be tagging 16k coordinates so would it be possible to do it faster?
The current methods I know about wouldn't do that exactly (i.e., produce one polygon id per coordinate) because they're generalized in case one point is contained in multiple (overlapping) polygons.
See sp::over(), which used to be called overlay().
Example:
over(sr, geometry(meuse), returnList = TRUE)
over(sr, meuse, returnList = TRUE)
Possible duplicates (it's hard to tell without seeing your example data):
Extracting points with polygon in R
Intersecting Points and Polygons in R
I'm looking for an efficient way to link each record in a large dataset to its nearest NOAA weather station. The dataset contains 9-digit zip codes, and NOAA weather stations have lat long info. Anyone have tips on the best way to do this? Thanks!
EDIT: updating with code that worked in case anyone else is looking to find nearest NOAA weather station to a set of zip codes/ if there are suggestions for better ways to do this.
code based on that provided in this question: Finding nearest neighbour (log, lat), then the next closest neighbor, and so on for all points between two datasets in R
temp_stations is downloaded from https://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/station-inventories/temp-inventory.txt (weather stations used in development of temperature dataset)
zipcode is a package that contains a dataset with lat long for each zip code in the US.
install.packages("zipcode")
require(zipcode)
data(zipcode)
#prime.zips is a subset of "zipcode" created by selecting just the zip codes contained in my original dataset. running the code below on the whole zipcode dataset crashed R on my computer.
install.packages("geosphere")
require(geosphere)
mat <- distm(prime.zips[ ,c('longitude','latitude')], temp_stations[ ,c(3,2)], fun=distGeo)
# assign the weather station id to each record in prime.zips based on shortest distance in the matrix
prime.zips$nearest.station <- temp_stations$station.id[apply(mat, 1, which.min)]
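The crash on the full zipcode dataset is plausibly a memory issue, since distm builds one distance per zip code and station pair in a single matrix. A possible workaround is to process the zip codes in chunks so only a small matrix exists at any time. The sketch below uses a base-R haversine distance (spherical Earth, radius 6371 km) rather than distGeo, so results may differ slightly; `nearest_index`, `chunk_size`, and the example coordinates and station ids are all made up for illustration.

```r
# Haversine great-circle distance in km (assumes a spherical Earth, r = 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Index of the nearest station for each record, computed chunk by chunk.
nearest_index <- function(lat, lon, st_lat, st_lon, chunk_size = 1000) {
  if (length(lat) == 0) return(integer(0))
  out <- integer(length(lat))
  for (s in seq(1, length(lat), by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, length(lat))
    # Rows = records in this chunk, columns = stations.
    d <- outer(idx, seq_along(st_lat), function(i, j) {
      haversine_km(lat[i], lon[i], st_lat[j], st_lon[j])
    })
    out[idx] <- apply(d, 1, which.min)
  }
  out
}

# Hypothetical example data.
zips <- data.frame(latitude = c(40.71, 34.05), longitude = c(-74.01, -118.24))
stations <- data.frame(
  station.id = c("ST_WEST", "ST_EAST"),
  latitude   = c(34.0, 40.8),
  longitude  = c(-118.0, -73.9)
)

zips$nearest.station <- stations$station.id[
  nearest_index(zips$latitude, zips$longitude,
                stations$latitude, stations$longitude, chunk_size = 1)
]
```

The same chunking idea should carry over to distm by calling it on a slice of prime.zips at a time.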