Finding coastal and international borders from a shape file in R - r

I want to automatically create two variables from a shape file: 1. a dummy indicator for whether a region has an international border, and 2. a dummy indicator for whether a region has a coastal border.
For example, for Guinea, variable 1 would be the regions marked with red dots below, and variable 2 the regions marked with blue dots (I marked these by eye).
library(raster)
sd0 <- getData(name = "GADM", country = "GIN", level = 2)
plot(sd0)
There does not seem to be any information in the @data slot for these types of characteristics:
head(sd0@data)
# OBJECTID ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2 HASC_2 CCN_2 CCA_2 TYPE_2 ENGTYPE_2 NL_NAME_2
# 1 1 97 GIN Guinea 1 Boké 1 Boffa GN.BF NA Préfecture Prefecture
# 2 2 97 GIN Guinea 1 Boké 2 Boké GN.BK NA Préfecture Prefecture
# 3 3 97 GIN Guinea 1 Boké 3 Fria GN.FR NA Préfecture Prefecture
# 4 4 97 GIN Guinea 1 Boké 4 Gaoual GN.GA NA Préfecture Prefecture
# 5 5 97 GIN Guinea 1 Boké 5 Koundara GN.KD NA Préfecture Prefecture
# 6 6 97 GIN Guinea 2 Conakry 6 Conakry GN.CK NA Préfecture Prefecture
Perhaps they are elsewhere (I have little to no experience with shape files)? Is there a function somewhere that would at least allow me to create a variable indicating whether a region has no outer boundaries (i.e. all those with no dots in the map above)?
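One possible approach (a sketch, not something built into raster; it uses the sf package and a hypothetical gin0 object for the level-0 country polygon): a region has an outer border, international or coastal, exactly when its own boundary shares part of the dissolved country outline, which holds as long as the GADM level-0 and level-2 boundaries line up exactly. Splitting the outer regions further into coastal vs. international would additionally need a coastline or neighbouring-country layer.
library(raster)
library(sf)
sd0  <- getData(name = "GADM", country = "GIN", level = 2)  # prefectures, as in the question
gin0 <- getData(name = "GADM", country = "GIN", level = 0)  # whole-country polygon
regions <- st_as_sf(sd0)
outline <- st_boundary(st_as_sf(gin0))  # land border + coastline as one line
# 1 if the prefecture's boundary touches the national outline, 0 if it is interior
sd0$outer_border <- as.integer(lengths(st_intersects(st_boundary(regions), outline)) > 0)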

Related

How do you match a numeric value to a categorical value in another data set

I have two data sets. One with a numeric value assigned to individual categorical variables (country name) and a second with survey responses including a person's nationality. How do I assign the numeric value to a new column in the survey dataset with matching nationality/country name?
Here is the head of data set 1 (my.data1):
EN HCI
1 South Korea 0.845
2 UK 0.781
3 USA 0.762
Here is the head of data set 2 (my.data2):
Nationality OIS IR
1 South Korea 2 2
2 South Korea 3 3
3 USA 3 4
4 UK 3 3
I would like to make it look like this:
Nationality OIS IR HCI
1 South Korea 2 2 0.845
2 South Korea 3 3 0.845
3 USA 3 4 0.762
4 UK 3 3 0.781
I have tried this but unsuccessfully:
my.data2$HCI <- NA
for (i in i:nrow(my.data2)) {
my.data2$HCI[i] <- my.data1$HCI[my.data1$EN == my.data2$Nationality[i]]
}
We can use a left_join
library(dplyr)
left_join(my.data2, my.data1, by = c("Nationality" = "EN"))
Or with merge from base R
merge(my.data2, my.data1, by.x = "Nationality", by.y = "EN", all.x = TRUE)
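Another base-R option (a sketch, not part of the original answer) is match(), assuming each Nationality appears exactly once in my.data1$EN:
my.data2$HCI <- my.data1$HCI[match(my.data2$Nationality, my.data1$EN)]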

Change order of conditions when plotting normalised counts for single gene

I have a df of 17 variables (my samples) with the condition location, which I would like to plot for a single gene, "photosystem II protein D1 1".
View(metadata)
sample location
<chr> <chr>
1 X1344 West
2 X1345 West
3 X1365 West
4 X1366 West
5 X1367 West
6 X1419 West
7 X1420 West
8 X1421 West
9 X1473 Mid
10 X1475 Mid
11 X1528 Mid
12 X1584 East
13 X1585 East
14 X1586 East
15 X1678 East
16 X1679 East
17 X1680 East
View(countdata)
func X1344 X1345 X1365 X1366 X1367 X1419 X1420 X1421 X1473 X1475 X1528 X1584 X1585 X1586 X1678 X1679 X1680
photosystem II protein D1 1 11208 6807 3483 4091 12198 7229 7404 5606 6059 7456 4007 2514 5709 2424 2346 4447 5567
countdata contains thousands of genes, but I am only showing the header and the gene of interest.
ddsMat has been created like this:
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
colData = metadata,
design = ~ location)
When plotting:
library(DESeq2)
plotCounts(ddsMat, "photosystem II protein D1 1", intgroup=c("location"))
By default, the function plots the "conditions" alphabetically, e.g. East-Mid-West. But I would like to order them West-Mid-East so I can see them in that order on the graph.
Is there a way of doing this?
Thanks,
I have found that you can manually change the order like this:
ddsMat$location <- factor(ddsMat$location, levels=c("West", "Mid", "East"))
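An equivalent sketch, using the objects from the question: set the level order in the metadata before building the DESeq2 object, so the design and every downstream plot use West-Mid-East.
metadata$location <- factor(metadata$location, levels = c("West", "Mid", "East"))
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
                                 colData = metadata,
                                 design = ~ location)
plotCounts(ddsMat, "photosystem II protein D1 1", intgroup = "location")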

r - Join data frame coordinates by shapefile regions aka Join Attributes by Location

I have a large data set, loaded in R as a data.frame. It contains observations associated with coordinate points (lat/lon).
I also have a shape file of North America.
In the empty (NA-filled) column in my data frame, labelled BCR, I want to insert the name of the region into which each coordinate falls, according to the shapefile.
I know how to do this in QGIS using Vector > Data Management Tools > Join Attributes by Location.
The shapefile can be downloaded by clicking HERE.
My data, right now, looks like this (a sample):
LATITUDE LONGITUDE Year EFF n St PJ day BCR
50.406752 -104.613 2009 1 0 SK 90 2 NA
50.40678 -104.61256 2009 2 0 SK 120 3 NA
50.40678 -104.61256 2009 2 1 SK 136 2 NA
50.40678 -104.61256 2009 3 2 SK 149 4 NA
43.0026385 -79.2900467 2009 2 0 ON 112 3 NA
43.0026385 -79.2900467 2009 2 1 ON 122 3 NA
But I want it to look like this:
LATITUDE LONGITUDE Year EFF n St PJ day BCR
50.406752 -104.613 2009 1 0 SK 90 2 Prairie Potholes
50.40678 -104.61256 2009 2 0 SK 120 3 Prairie Potholes
50.40678 -104.61256 2009 2 1 SK 136 2 Prairie Potholes
50.40678 -104.61256 2009 3 2 SK 149 4 Prairie Potholes
43.0026385 -79.2900467 2009 2 0 ON 112 3 Lower Great Lakes/St.Lawrence Plain
43.0026385 -79.2900467 2009 2 1 ON 122 3 Lower Great Lakes/St.Lawrence Plain
Notice the BCR column is now filled with the appropriate BCR region name.
My code so far is just importing and formatting the data and shapefile:
library(rgdal)
library(proj4)
library(sp)
library(raster)
# PFW data, full 2.5m observations
df = read.csv("MyData.csv")
# Cleaning out empty coordinate data
pfw = df[(df$LATITUDE != 0) & (df$LONGITUDE != 0) & (!is.na(df$LATITUDE)) & (!is.na(df$LONGITUDE)),]
# Creating a new column to be filled with associated Bird Conservation Regions
pfw["BCR"] = NA
# Making a duplicate data frame to conserve data
toSPDF = pfw
# Ensuring spatial formatting
#coordinates(toSPDF) = ~LATITUDE + LONGITUDE
SPDF <- SpatialPointsDataFrame(toSPDF[, c("LONGITUDE", "LATITUDE")],
toSPDF,
proj4string = CRS("+init=epsg:4326"))
# BCR shape file, no state borders
shp = shapefile("C:/Users/User1/Desktop/BCR/BCR_Terrestrial_master_International.shx")
spPoly = spTransform(shp, CRS("+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
# Check
isTRUE(proj4string(spPoly) == proj4string(SPDF))
# Trying to join attributes by location
#try1 = point.in.polygon(spPoly, SPDF) # Sounds good doesn't work
#a.data <- over(SPDF, spPoly[,"BCRNAME"]) # Error: cannot allocate vector of size 204.7 Mb
I think you want to do a spatial query with points and polygons, that is, assign polygon attributes to the corresponding points. You can do that like this:
Example data
library(terra)
f <- system.file("ex/lux.shp", package="terra")
polygons <- vect(f)
points <- spatSample(polygons, 10)
Solution
e <- extract(polygons, points)
e
# id.y ID_1 NAME_1 ID_2 NAME_2 AREA POP
#1 1 3 Luxembourg 9 Esch-sur-Alzette 251 176820
#2 2 3 Luxembourg 9 Esch-sur-Alzette 251 176820
#3 3 2 Grevenmacher 6 Echternach 188 18899
#4 4 1 Diekirch 2 Diekirch 218 32543
#5 5 3 Luxembourg 9 Esch-sur-Alzette 251 176820
#6 6 1 Diekirch 4 Vianden 76 5163
#7 7 3 Luxembourg 11 Mersch 233 32112
#8 8 2 Grevenmacher 7 Remich 129 22366
#9 9 1 Diekirch 3 Redange 259 18664
#10 10 3 Luxembourg 9 Esch-sur-Alzette 251 176820
With the older spatial packages you can use raster::extract or sp::over.
Example data:
library(raster)
pols <- shapefile(system.file("external/lux.shp", package="raster"))
set.seed(20180121)
pts <- data.frame(coordinates(spsample(pols, 5, 'random')), name=letters[1:5])
plot(pols); points(pts)
Solution:
e <- extract(pols, pts[, c('x', 'y')])
pts$BCR <- e$NAME_2
pts
# x y name BCR
#1 6.009390 49.98333 a Wiltz
#2 5.766407 49.85188 b Redange
#3 6.268405 49.62585 c Luxembourg
#4 6.123015 49.56486 d Luxembourg
#5 5.911638 49.53957 e Esch-sur-Alzette
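Applied to the question's own objects, the terra workflow would look roughly like this (a sketch: the file path and the BCRNAME column are taken from the question, not verified, and it assumes each point falls in at most one polygon):
library(terra)
bcr <- vect("C:/Users/User1/Desktop/BCR/BCR_Terrestrial_master_International.shx")
pts <- vect(pfw, geom = c("LONGITUDE", "LATITUDE"), crs = "EPSG:4326")
bcr <- project(bcr, "EPSG:4326")  # match the points' CRS
pfw$BCR <- extract(bcr, pts)$BCRNAME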

"for" loop in R and checking previous value from a column

I'm working on a data frame. Here's what it looks like:
shape id day hour week id footfall category area name
22496 22/3/14 3 12 634 Work cluster CBD area 1
22670 22/3/14 3 12 220 Shopping cluster Orchard Road 1
23287 22/3/14 3 12 723 Airport Changi Airport 2
16430 22/3/14 4 12 947 Work cluster CBD area 2
4697 22/3/14 3 12 220 Residential area Ang Mo Kio 2
4911 22/3/14 3 12 1001 Shopping cluster Orchard Rd 3
11126 22/3/14 3 12 220 Residential area Ang Mo Kio 2
and so on, up to 635 rows in total.
The other dataset that I want to compare it with can be found here. Here's what it looks like:
category Foreigners Locals
Work cluster 1600000 3623900
Shopping cluster 1800000 3646666.667
Airport 15095152 8902705
Residential area 527700 280000
They both share the same attribute, i.e. category
I want to look up the previous hour from the hour column in the first dataset so I can compare it with the value from the second dataset.
Here's what I ideally want to do in R:
#for n in 1: number of rows{
# check the previous hour from IDA dataset !!!!
# calculate hourSum - previousHour = newHourSum and store it as newHourSum
# calculate hour/(newHourSum-previousHour) * Foreigners and store it as footfallHour
# add to the empty dataframe }
I'm not sure how to do that; here's what I tried:
tbl1 <- secondDataset
tbl2 <- firstDataset
mergetbl <- function(tbl1, tbl2)
{
newtbl = data.frame(hour=numeric(),forgHour=numeric(),locHour=numeric())
ntbl1rows<-nrow(tbl1) # get the number of rows
for(n in 1:ntbl1rows)
{
#get the previousHour
newHourSum <- tbl1$hour - previousHour
footfallHour <- (tbl1$hour/(newHourSum-previousHour)) * tbl2$Foreigners
#add to newtbl
}
}
This is what I expected:
shape id day hour week id footfall category area name forgHour locHour
22496 22/3/14 3 12 634 Work cluster CBD area 1 1 12
22670 22/3/14 3 12 220 Shopping cluster Orchard Road 1 21 25
23287 22/3/14 3 12 723 Airport Changi Airport 2 31 34
16430 22/3/14 4 12 947 Work cluster CBD area 2 41 23
4697 22/3/14 3 12 220 Residential area Ang Mo Kio 2 51 23
4911 22/3/14 3 12 1001 Shopping cluster Orchard Rd 3 61 45
11126 22/3/14 3 12 220 Residential area Ang Mo Kio 2 72 54
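For the "check the previous hour" step in the pseudocode above, a minimal sketch (ida is a stand-in for the first dataset, the one holding the hour column) is to index the column at n - 1 inside the loop, or use dplyr::lag:
ida$previousHour <- NA_real_
for (n in 2:nrow(ida)) {
  ida$previousHour[n] <- ida$hour[n - 1]  # row 1 has no previous value, stays NA
}
# the same thing without a loop:
# ida$previousHour <- dplyr::lag(ida$hour)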

rworldmap package - Warning if the number of quantiles was reduced

I am using this R code:
library(rworldmap)
Data <- read.table("D:/Bla/Maps/Test.txt", header = TRUE, sep = "\t")
sPDF <- joinCountryData2Map(Data, joinCode = "ISO3",nameJoinColumn = "ISO3CountryCode")
mapCountryData(sPDF, nameColumnToPlot = "Data")
This produces a map but I get:
You asked for 7 quantiles, only 1 could be created in quantiles classification
I googled and it pointed me to this code
Not sure whether it is relevant.
This is the data I have used:
ISO3CountryCode Data
JPN 7
AUS 6
IND 6
CHN 5
GBR 5
CHE 4
IRN 4
DEU 3
EGY 3
ESP 3
LBY 3
TUN 3
USA 3
ARG 2
AUT 2
BRA 2
EST 2
GRC 2
ITA 2
TUR 2
URY 2
CHL 1
ETH 1
FRA 1
JOR 1
KEN 1
KOR 1
LTU 1
MEX 1
NLD 1
NZL 1
PER 1
POL 1
SAU 1
SRB 1
SVK 1
SVN 1
TZA 1
ZAF 1
It looks like by default mapCountryData() tries to fit data to quantiles for binning. You'll need to help it along a little by tweaking the catMethod parameter.
I'm not sure what your values 1 through 7 mean. If they are categories (and you want them all explicitly displayed in the legend), try:
mapCountryData(sPDF, nameColumnToPlot = "Data", catMethod="categorical")
If you want to treat all values equally on a continuous scale, try:
mapCountryData(sPDF, nameColumnToPlot = "Data", catMethod="fixedWidth")
If neither of these does what you want, you might try altering numCats and/or catMethod; see ?mapCountryData for the possible values and their meanings.
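For example (sketches of those knobs; passing explicit breaks through catMethod is my reading of ?mapCountryData, so treat it as an assumption):
mapCountryData(sPDF, nameColumnToPlot = "Data", numCats = 7, catMethod = "fixedWidth")
mapCountryData(sPDF, nameColumnToPlot = "Data", catMethod = c(0, 1, 2, 3, 4, 5, 6, 7))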
