First painful attempt to do a Spatial map - r

I am struggling to get my first map to work. I have read every document I could find but I am not able to pull it all together to view my data on a map.
This is what I have done so far.
1. I created a very basic data table with 3 observations and 5 variables as a very simple starting point.
str(Datawithlatlongnotvector)
'data.frame': 3 obs. of 5 variables:
$ Client: Factor w/ 3 levels "Jan","Piet","Susan": 2 1 3
$ Sales : int 100 1000 15000
$ Lat : num 26.2 33.9 23.9
$ Lon : num 28 18.4 29.4
$ Area : Factor w/ 3 levels "Gauteng","Limpopo",..: 1 3 2
(Area is the South African province, spelled as in the SHP file that I downloaded, see below.)
2. I downloaded a map of South Africa from http://www.mapmakerdata.co.uk.s3-website-eu-west-1.amazonaws.com/library/stacks/Africa/South%20Africa/index.htm (I selected "Simple base map") and placed all 3 files (.dbf, .shp and .shx) in the same directory. Not doing so caused an earlier error, which I solved thanks to another user's question.
3. I created a map as follows:
SAMap <- readOGR(dsn = ".", layer = "SOU-level_1")
and I can plot the map of the country showing the provinces with plot(SAMap).
4. I can also plot the data:
plot(datawithlatlong)
5. I saw the instructions on how to make a SpatialPointsDataFrame and I did that:
coordinates(Datawithlatlong) = ~Lat + Lon
I do not know how to pull it all together and do the following:
Show the data (100, 1000 and 15000) on the map with different colours, i.e. one colour for sales between 1 and 500, another for sales between 501 and 10 000, and a third for sales above 10 000.

You could try ggplot2 with something like:
map = ggplot(df, aes(long, lat, fill = Sales_cat)) + scale_fill_brewer(type = "seq", palette = "Oranges", name = "Sales") + geom_polygon()
With scale_fill_brewer you can represent scales in terms of colours on the map. You should create a factor variable that represents categories according to the range of sales ("Sales_cat"). In any case, the shape file must be transformed into a data.frame.
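A minimal sketch of building such a category variable with base R's cut(). The break points follow the ranges asked for in the question; the label strings are my own choice.

```r
# Sales values from the question
sales <- c(100, 1000, 15000)

# Three classes: up to 500, 501 to 10 000, above 10 000
Sales_cat <- cut(sales,
                 breaks = c(0, 500, 10000, Inf),
                 labels = c("1-500", "501-10000", "above 10000"))

print(table(Sales_cat))
```

The resulting factor can be stored as a column of the data frame and mapped to fill or colour.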

Try this, with 'SAMap' as the country shapefile and 'datawithlatlong' as your data converted to a SpatialPointsDataFrame:
library(maptools)
library(classInt)
library(RColorBrewer)
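# Prepare colour palette
plotclr <- brewer.pal(3,"PuRd")
# the breaks follow the ranges in the question; the last break must cover the largest Sales value
class<-classIntervals(datawithlatlong@data$Sales, n=3, style="fixed", fixedBreaks=c(0, 500, 10000, 15000)) # you can adjust the intervals here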
colcode <- findColours(class, plotclr)
# Plot country map
plot(SAMap, xlim=c(16, 38.0), ylim=c(-46,-23)) # plot your polygon shapefile with appropriate xlim and ylim (extent)
# Plot the data frame converted to an SPDF (in your step 5)
plot(datawithlatlong, col=colcode, add=TRUE, pch=19)
# Create the legend
legend(16.2, -42, legend=names(attr(colcode, "table")), fill=attr(colcode, "palette"), cex=0.6, bty="n") # adjust x and y to place the legend appropriately

I generated a bigger dataset because I think that with only 3 points it is hard to see how things work.
library(rgdal)
library(tmap)
library(ggmap)
library(randomNames)
#I downloaded the shapefile with the administrative area polygons
map <- readOGR(dsn = ".", layer = "SOU")
#the coordinate system is not part of the loaded object hence I added this information
proj4string(map) <- CRS("+init=epsg:4326")
# Some sample data with random client names and random region
ADM2 <- sample(map@data$ADM2, replace = TRUE, 50)
name <- randomNames(50)
sales <- sample(0:5000, 50)
clientData <- data.frame(id = 1:50, name, region = as.character(ADM2), sales,
stringsAsFactors = FALSE)
#In order to add the geoinformation for each client I used the awesome
#function `ggmap::geocode`, which takes a character string as input and
#provides the lon and lat for the region, city ...
geoinfo <- geocode(clientData$region, messaging = FALSE)
# Use this information to build a Point layer
clientData_point <- SpatialPointsDataFrame(geoinfo, data = clientData)
proj4string(clientData_point) <- CRS("+init=epsg:4326")
Now the part that I hope answers the question:
# Adding all sales which occurred in one region
# If there are 3 clients in one region, the sales of the three are
# summed up and returned in a new layer
sales_map <- aggregate(x = clientData_point[ ,4], by = map, FUN = sum)
# Building a map using the `tmap` package
tm_shape(sales_map) + tm_polygons(col = "sales")
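For comparison, the same "sum per region" idea can be sketched without any spatial join, assuming each client row already carries a region label. The client values below are made up for illustration.

```r
# Hypothetical client table: two clients share one region
clients <- data.frame(
  region = c("Gauteng", "Gauteng", "Limpopo", "Western Cape"),
  sales  = c(100, 250, 1000, 15000)
)

# Sum the sales of all clients within each region
sales_by_region <- aggregate(sales ~ region, data = clients, FUN = sum)
print(sales_by_region)
```

The spatial aggregate() above does the grouping by polygon membership instead of by a label column, which is what you need when the clients only have coordinates.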
Edit:
Here is a ggplot2 solution because it seems you want to stick with it.
First, for ggplot you have to transform your SpatialPolygonsDataFrame into an ordinary data.frame. Fortunately, broom::tidy() will do the job automatically.
Second, your Lat values are missing a -. I added it.
Third, I renamed your objects for less typing.
point_layer<- structure(list(Client = structure(c(2L, 1L, 3L),
.Label = c("Jan", "Piet", "Susan"),
class = "factor"),
Sales = c(100, 1000, 15000 ),
Lat = c(-26.2041, -33.9249, -23.8962),
Lon = c(28.0473, 18.4241, 29.4486),
Area = structure(c(1L, 3L, 2L),
.Label = c("Gauteng", "Limpopo", "Western Cape"),
class = "factor"),
Sale_range = structure(c(1L, 2L, 4L),
.Label = c("(1,500]", "(500,2e+03]", "(2e+03,5e+03]", "(5e+03,5e+04]"),
class = "factor")),
.Names = c("Client", "Sales", "Lat", "Lon", "Area", "Sale_range"),
row.names = c(NA, -3L), class = "data.frame")
point_layer$Sale_range <- cut(point_layer$Sales, c(1,500.0,2000.0,5000.0,50000.0 ))
library(broom)
library(ggplot2)
ggplot_map <- tidy(map)
ggplot() + geom_polygon(ggplot_map, mapping = aes(x = long, y = lat, group = group),
fill = "grey65", color = "black") +
geom_point(point_layer, mapping = aes(x = Lon, y = Lat, col = Sale_range)) +
scale_colour_brewer(type = "seq", palette = "Oranges", direction = 1)
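As a quick sanity check for the sign issue mentioned above, something like the following can flag coordinates that fall outside the country before plotting. It is only a sketch: the bounding box values are rough figures I am assuming for South Africa, not taken from the shapefile, and in_bbox() is a helper made up for this example.

```r
# Coordinates as given in the question (latitudes without the minus sign)
pts <- data.frame(Client = c("Piet", "Jan", "Susan"),
                  Lat = c(26.2, 33.9, 23.9),
                  Lon = c(28.0, 18.4, 29.4))

# Rough bounding box for South Africa (assumed values, for illustration only)
bbox <- list(lon = c(16, 33), lat = c(-35, -22))

# TRUE where a point lies inside the box
in_bbox <- function(lon, lat, bbox) {
  lon >= bbox$lon[1] & lon <= bbox$lon[2] &
    lat >= bbox$lat[1] & lat <= bbox$lat[2]
}

before <- in_bbox(pts$Lon, pts$Lat, bbox)  # positive latitudes: northern hemisphere
pts$Lat <- -abs(pts$Lat)                   # South Africa is south of the equator
after <- in_bbox(pts$Lon, pts$Lat, bbox)

print(data.frame(pts, before, after))
```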

Download, Plot map, and extract data in R

I downloaded monthly data from NASA and saved it in .txt and .asc format. I am trying to plot the ASCII file and extract data from it, but so far I have been unable to do so. I tried the following:
1.
infile <- "OMI/L3feb09.txt"
data <- as.matrix(read.table(infile, skip = 3, header = FALSE, sep = "\t"))
data[data == -9999] = NA
rr <- raster(data, crs = "+init=epsg:4326")
extent(rr) = c(179.375, 179.375+1.25*288, -59.5, -59.5+1*120)
Tried to extract for australia
adm <- getData("GADM", country="AUS", level=1)
rr = mask(rr, adm)
plot(rr)
2.
library(rgdal)
r = raster("OMI/L3feb09.txt")
plot(r)
3.
library(raster)
r = raster("OMI/L3feb09.txt")
plot(r)
4. Also tried:
df1 <- read.table("OMI/L3feb09.txt", skip = 11, header = FALSE, sep = "\t")
Tried the following from
Stackoverflow link 1
Stackoverflow link 2
The problem is that there are strings in the file between the numbers, such as "lat = -55.5".
I appreciate any kind of help. Thank you.
[2]: https://stackoverflow.com/questions/42064943/opening-an-ascii-file-using-r
So, I downloaded one file and played around with it! It is not the best solution, but I hope it can give you an idea.
library(stringr)
# read data
data<-read.csv("L3_tropo_ozone_column_oct04",header = FALSE, skip = 3,sep = "")
# sep = "" will separate "lat = -59.5" into 3 rows, which are then easier to remove.
# Also, the rows between two "lat" rows hold the data for the later "lat".
lat_index<-which(data[,1]=="lat")
# you need the last row that contains data, not the "lat" string
lat_index<-lat_index-1
#define an empty array for results.
result<-array(NA, dim = c(120,288),dimnames = list(lat=seq(-59.5,59.5,1),
lon=seq(-179.375,179.375,1.25)))
I assumed that each value takes three digits, so the number of digits on each latitude divided by 3 gives 288, which equals the number of lon grid cells. Correct me if I'm wrong.
# function to split a string into a vector in which each element has n letters/numbers
split_n_parts<-function(input_string,n){
# dissolve the input into single characters
input_string_1<-unlist(str_extract_all(input_string,boundary("character")))
output_string<-vector(length = length(input_string_1)/n)
for ( x in seq_along(output_string)){
# paste together the n characters that belong to the x-th value
# (uses n instead of a hard-coded 3)
output_string[x]<-paste0(input_string_1[((x-1)*n+1):(x*n)], collapse = "")
}
return(as.numeric(output_string))
}
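A vectorised alternative to the function above could use base R's substring(). This is only a sketch under the same assumption as before (every value occupies exactly n characters); split_fixed() is a name I made up for this example, and the input string is invented rather than taken from the file.

```r
# Split one fixed-width string into numeric values of n characters each
split_fixed <- function(s, n = 3) {
  starts <- seq(1, nchar(s), by = n)           # first character of every chunk
  as.numeric(substring(s, starts, starts + n - 1))
}

# Made-up example row: four values of three digits each
values <- split_fixed("271269268999", 3)
print(values)
```

If the rows of one latitude block are first pasted into a single string, this would replace the inner loop.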
Here, the code loops over the latitude blocks, collects the rows of each block and writes them into the result array:
# loop over the rows bounded by 2 lats, process them and assign to the array
for (i in 1:length(lat_index)){
if(i ==1){
for(j in 1:lat_index[i]){
if(j==1){
row_j<-paste0(data[j,])
}else{
row_j<-paste0(row_j,data[j,])
}
}
}else{
ii<-i-1
lower_limit<-lat_index[ii]+4
upper_limit<-lat_index[i]
for(j in lower_limit:upper_limit){
if(j==lower_limit){
row_j<-paste0(data[j,])
}else{
row_j<-paste0(row_j,data[j,])
}
}
}
result[i,]<-split_n_parts(row_j,3)
}
Here is the final array, plotted as an image:
#plot as image
image(result)
EDIT: To continue the solution and put the end-result:
# because data is IN DOBSON UNITS X 10
result<-result/10
# melt to data frame
library(plyr)
result_df<-adply(result, c(1,2))
result_df$lat<-as.numeric(as.character(result_df$lat))
result_df$lon<-as.numeric(as.character(result_df$lon))
# plotting
library(maps)
library(ggplot2)
library(tidyverse)
world_map <- map_data("world")
#colors
jet.colors <-colorRampPalette(c("white", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))
ggplot() +
geom_raster(data=result_df,aes(fill=V1,x=lon,y=lat))+
geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
fill=NA, colour = "black")+
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))+
scale_fill_gradientn(colors = jet.colors(7))
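If you would rather avoid plyr for the melting step, base R can do much the same through as.table(). A minimal sketch on a tiny made-up matrix with the same kind of dimnames as result (the numbers are placeholders, not ozone data):

```r
# Small stand-in for the 120 x 288 result array
m <- matrix(1:6, nrow = 2,
            dimnames = list(lat = c(-59.5, -58.5),
                            lon = c(-179.375, -178.125, -176.875)))

# One row per (lat, lon) cell; dimnames come back as character columns
df <- as.data.frame(as.table(m), responseName = "value",
                    stringsAsFactors = FALSE)
df$lat <- as.numeric(df$lat)
df$lon <- as.numeric(df$lon)

print(df)
```

The value column then plays the role of V1 in the geom_raster() call.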

Drawing arbitrary lines on a highchart map in R (library highcharter)

I am drawing a highcharts map using the highcharter package in R. I have already added some points (cities) and want to link them by drawing an additional beeline using the world map-coordinates.
I already managed to draw the beelines by first drawing the map, then hovering over the cities which shows me the plot-coordinates, and then redrawing the plot using the aforementioned plot-coordinates. (Watch out: I used the PLOT-coordinates and my goal is to use directly the WORLD MAP-coordinates.)
If you only have 1 or two cities, it's not a big deal. But if you have like 100 cities/points, it's annoying. I guess the answer will be something like here: Is it possible to include maplines in highcharter maps?.
Thank you!
Here is my code:
library(highcharter)
library(tidyverse)
# cities with world coordinates
ca_cities <- data.frame(
name = c("San Diego", "Los Angeles", "San Francisco"),
lat = c(32.715736, 34.052235, 37.773972), # world-map-coordinates
lon = c(-117.161087, -118.243683, -122.431297) # world-map-coordinates
)
# path which I create AFTER the first drawing of the map as I get the
# plot-coordinates when I hover over the cities.
path <- "M669.63,-4963.70,4577.18,-709.5,5664.42,791.88"
# The goal: the path variable above should be defined using the WORLD-
# coordinates in ca_cities and not using the PLOT-coordinates.
# information for drawing the beeline
ca_lines <- data.frame(
name = "line",
path = path,
lineWidth = 2
)
# construct the map
map <- hcmap("countries/us/us-ca-all", showInLegend = FALSE) %>%
hc_add_series(data = ca_cities, type = "mappoint", name = "Cities") %>%
hc_add_series(data = ca_lines, type = "mapline", name = "Beeline", color = "blue")
map
After several hours, I found an answer to my problem. There are maybe easier ways, but I'm going to post my version using the rgdal-package.
The idea is to convert first the world map-coordinates to the specific map's coordinate system (ESRI) and then back-transform all adjustments from highcharts:
library(highcharter)
library(tidyverse)
library(rgdal) # you also need rgdal
# cities with world coordinates
ca_cities <- data.frame(
name = c("San Diego", "Los Angeles", "San Francisco"),
lat = c(32.715736, 34.052235, 37.773972),
lon = c(-117.161087, -118.243683, -122.431297)
)
# pre-construct the map
map <- hcmap("countries/us/us-ca-all", showInLegend = FALSE)
# extract the transformation-info
trafo <- map$x$hc_opts$series[[1]]$mapData$`hc-transform`$default
# convert to coordinates
ca_cities2 <- ca_cities %>% select("lat", "lon")
coordinates(ca_cities2) <- c("lon", "lat")
# convert world geosystem WGS 84 into transformed crs
proj4string(ca_cities2) <- CRS("+init=epsg:4326") # WGS 84
ca_cities3 <- spTransform(ca_cities2, CRS(trafo$crs))
# re-transform coordinates according to the additional highcharts-parameters
image_coords_x <- (ca_cities3$lon - trafo$xoffset) * trafo$scale * trafo$jsonres + trafo$jsonmarginX
image_coords_y <- -((ca_cities3$lat - trafo$yoffset) * trafo$scale * trafo$jsonres + trafo$jsonmarginY)
# construct the path
path <- paste("M",
paste0(paste(image_coords_x, ",", sep = ""),
image_coords_y, collapse = ","),
sep = "")
# information for drawing the beeline
ca_lines <- data.frame(
name = "line",
path = path,
lineWidth = 2
)
# add series
map <- map %>%
hc_add_series(data = ca_cities, type = "mappoint", name = "Cities") %>%
hc_add_series(data = ca_lines, type = "mapline", name = "Beeline", color = "blue")
map
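The back-transformation step is just an offset and a scaling, followed by a sign flip on y. To make that easier to check by hand, here is the same formula as a small function with made-up transform values. The numbers in trafo are hypothetical (the real ones come from the map's hc-transform entry), and to_plot_coords() is a helper name of my own.

```r
# Apply the highcharts-style offset/scale to already projected coordinates
to_plot_coords <- function(x, y, trafo) {
  list(
    x = (x - trafo$xoffset) * trafo$scale * trafo$jsonres + trafo$jsonmarginX,
    y = -((y - trafo$yoffset) * trafo$scale * trafo$jsonres + trafo$jsonmarginY)
  )
}

# Hypothetical transform parameters, chosen so the arithmetic is easy to follow
trafo <- list(xoffset = 100, yoffset = 200, scale = 0.5, jsonres = 2,
              jsonmarginX = 10, jsonmarginY = 20)

p <- to_plot_coords(c(110, 150), c(210, 250), trafo)

# Same path layout as above: "M" followed by x,y pairs
path <- paste0("M", paste(p$x, p$y, sep = ",", collapse = ","))
print(path)
```

With these values the first point maps to (20, -30) and the second to (60, -70).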

Mapping nearest neighbours of a long-lat data set using ggmap, geom_point and a loop

My ultimate goal is to connect all nearest neighbours of a set of buildings (based on Euclidean distance) on a ggmap using geom_path from the ggplot2 package. I need help with a loop that will allow me to plot all neighbours as easily as possible.
I have created a distance matrix (called 'kmnew') in kilometres between 3 types of building in Beijing: B (x2), D (x2) and L (x1):
         B        B        D        D        L
B       NA 6.599014 5.758531 6.285787 3.770175
B       NA       NA 7.141096 3.873296 5.092667
D       NA       NA       NA 3.690725 2.563017
D       NA       NA       NA       NA 2.832083
L       NA       NA       NA       NA       NA
I try to discern the nearest neighbours of each building by row by declaring a matrix and using a loop to ascertain the nearest neighbour building:
nn <- matrix(NA,nrow=5,ncol=1)
for (i in 1:nrow(kmnew)){
nn[i,] <- which.min(kmnew[i,])
}
This returns the following error (not sure why):
Error in nn[i, ] <- which.min(kmnew[i, ]) : replacement has length zero
but seems to return the correct answer to nn:
[,1]
[1,] 5
[2,] 4
[3,] 5
[4,] 5
[5,] NA
I append this to an original dataframe called newbjdata:
colbj <- cbind(newbjdata,nn)
that returns
Name Store sqft long lat nn
1 B 1 1200 116.4579 39.93921 5
2 B 2 750 116.3811 39.93312 4
3 D 1 550 116.4417 39.88882 5
4 D 2 600 116.4022 39.90222 5
5 L 1 1000 116.4333 39.91100 NA
I then retrieve my map via ggmap:
bjgmap <- get_map(location = c(lon = 116.407395,lat = 39.904211),
zoom = 13, scale = "auto",
maptype = "roadmap",
messaging = FALSE, urlonly = FALSE,
filename = "ggmaptemp", crop = TRUE,
color = "bw",
source = "google", api_key)
My ultimate goal is to map the nearest neighbours together in a plot using geom_path from the ggplot package.
For example, the nn of the 1st building of type B (row 1) is the single building of type L (row 5). Obviously I can draw this line by subsetting those 2 rows of the dataframe:
ggmap(bjgmap) +
geom_point(data = colbj, aes(x = long,y = lat, fill = factor(Name)),
size =10, pch = 21, col = "white") +
geom_path(data = subset(colbj[c(1,5),]), aes(x = long,y = lat),col = "black")
However, I need a solution that works like a loop, and I can't figure out how one might achieve this, as I need to reference the nn column and refer that back to the long lat data n times. I can well believe that I am not using the most efficient method, so am open to alternatives. Any help much appreciated.
Here is my attempt. I used gcIntermediate() from the geosphere package to set up the lines. First, I needed to rearrange your data: gcIntermediate() needs departure and arrival long/lat, that is, four columns. To arrange the data this way I used the dplyr package. mutate_each(colbj, funs(.[nn]), vars = long:lat) picks up the arrival long/lat for each row: . stands for each of 'long' and 'lat' in turn, and [nn] indexes that column by the nearest-neighbour row number. Then I called gcIntermediate(), which creates a SpatialLines object. You need to turn that object into a SpatialLinesDataFrame and then convert it to a "normal" data.frame with fortify(). This last step is essential so that ggplot can read your data.
library(ggmap)
library(geosphere)
library(dplyr)
library(ggplot2)
### Arrange the data: set up departure and arrival long/lat
mutate_each(colbj, funs(.[nn]), vars = long:lat) %>%
rename(arr_long = vars1, arr_lat = vars2) %>%
filter(complete.cases(nn)) -> mydf
### Get line information
rts <- gcIntermediate(mydf[,c("long", "lat")],
mydf[,c("arr_long", "arr_lat")],
50,
breakAtDateLine = FALSE,
addStartEnd = TRUE,
sp = TRUE)
### Convert the routes to a data frame for ggplot use
rts <- as(rts, "SpatialLinesDataFrame")
rts.df <- fortify(rts)
### Get a map (borrowing the OP's code)
bjgmap <- get_map(location = c(lon = 116.407395,lat = 39.904211),
zoom = 13, scale = "auto",
maptype = "roadmap",
messaging = FALSE, urlonly = FALSE,
filename = "ggmaptemp", crop = TRUE,
color = "bw",
source = "google", api_key)
# Draw the map
ggmap(bjgmap) +
geom_point(data = colbj,aes(x = long, y = lat, fill = factor(Name)),
size = 10,pch = 21, col = "white") +
geom_path(data = rts.df, aes(x = long, y = lat, group = group),
col = "black")
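Since mutate_each() and funs() are deprecated in recent dplyr versions, the lookup of the arrival coordinates can also be sketched with plain base R indexing. The data below are the values from the question (only the columns needed here); segs is a name of my own.

```r
# Data from the question: nn is the row number of the nearest neighbour
colbj <- data.frame(
  Name = c("B", "B", "D", "D", "L"),
  long = c(116.4579, 116.3811, 116.4417, 116.4022, 116.4333),
  lat  = c(39.93921, 39.93312, 39.88882, 39.90222, 39.91100),
  nn   = c(5L, 4L, 5L, 5L, NA)
)

# Look up the neighbour's coordinates by row number (NA stays NA)
colbj$arr_long <- colbj$long[colbj$nn]
colbj$arr_lat  <- colbj$lat[colbj$nn]

# Keep only rows that actually have a neighbour
segs <- colbj[!is.na(colbj$nn), ]
print(segs)
```

segs could then be drawn with geom_segment(data = segs, aes(x = long, y = lat, xend = arr_long, yend = arr_lat)). That gives straight lines rather than great-circle paths, which should make little visible difference at city scale.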
EDIT
If you want to do all data manipulation in one sequence, the following is one way to go. foo is identical to rts.df above.
mutate_each(colbj, funs(.[nn]), vars = long:lat) %>%
rename(arr_long = vars1, arr_lat = vars2) %>%
filter(complete.cases(nn)) %>%
do(fortify(as(gcIntermediate(.[,c("long", "lat")],
.[,c("arr_long", "arr_lat")],
50,
breakAtDateLine = FALSE,
addStartEnd = TRUE,
sp = TRUE), "SpatialLinesDataFrame"))) -> foo
identical(rts.df, foo)
#[1] TRUE
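A side note on the "replacement has length zero" error in the question: which.min() returns integer(0) when every value in a row is NA, which is the case for the last row of the upper-triangular matrix, so the assignment fails on that row after the earlier rows have been filled. A minimal sketch of a loop-free version that returns NA instead, using the distances from the question:

```r
# Rebuild the upper-triangular distance matrix from the question
kmnew <- matrix(NA_real_, nrow = 5, ncol = 5)
kmnew[upper.tri(kmnew)] <- c(6.599014,
                             5.758531, 7.141096,
                             6.285787, 3.873296, 3.690725,
                             3.770175, 5.092667, 2.563017, 2.832083)

# which.min() yields integer(0) for an all-NA row; map that to NA
nn <- vapply(seq_len(nrow(kmnew)), function(i) {
  idx <- which.min(kmnew[i, ])
  if (length(idx) == 0) NA_integer_ else as.integer(idx)
}, integer(1))
print(nn)

# Mirroring the matrix lets every row see all other buildings
full <- kmnew
full[lower.tri(full)] <- t(kmnew)[lower.tri(kmnew)]
nn_full <- apply(full, 1, which.min)
print(nn_full)
```

With the mirrored matrix, the last building also gets a neighbour (row 3, at 2.563017 km), which the upper triangle alone cannot show.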
DATA
colbj <- structure(list(Name = structure(c(1L, 1L, 2L, 2L, 3L), .Label = c("B",
"D", "L"), class = "factor"), Store = c(1L, 2L, 1L, 2L, 1L),
sqft = c(1200L, 750L, 550L, 600L, 1000L), long = c(116.4579,
116.3811, 116.4417, 116.4022, 116.4333), lat = c(39.93921,
39.93312, 39.88882, 39.90222, 39.911), nn = c(5L, 4L, 5L,
5L, NA)), .Names = c("Name", "Store", "sqft", "long", "lat",
"nn"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))

R maps - Filter coordinates out when plotting points in map

I have a large data frame of events locations (longitude, latitude). See a sample below:
events <- structure(list(lat = c(45.464944, 40.559207, 45.956775, 44.782831, 45.810287, 35.913357, 43.423855, 45.2359, 45.526025, 41.91371, 46.340833, 40.696482, 42.367164, 41.913701, 41.89167, 46.046206, 41.108316, 45.514132, 45.688118, 37.090387, 43.446555, 41.913712, 46.614833, 45.368825, 41.892168), lon = c(9.163453, 8.321587, 12.983347, 10.886471, 9.077844, 10.560439, 6.768272, 9.23176, 9.375761, 12.466994, 6.889444, 17.316925, 13.352248, 12.466992, 12.516713, 14.497758, 16.871019, 12.056176, 9.176543, 15.293506, 6.77119, 12.466993, 15.194904, 11.110711, 12.516085)), .Names = c("lat", "lon"), row.names = c(NA, 25L), class = "data.frame")
and I would like to plot only the events that happened in a given country (for instance, Italy).
The following plots all events:
library(rworldmap)
library(maps)
par(mar=c(0,0,0,0))
plot(getMap(resolution="low"), xlim = c(14,14), ylim = c(36.8,47))
points(x=events$lon, y=events$lat, pch = 19, col ='red' , cex = 0.6)
How can I filter out the events that fall outside the boundaries of the country? Thank you in advance for your help and support.
assign polygon map to an object:
m = getMap(resolution="low")
convert events into an sp object -- first longitude, then latitude:
pts = SpatialPoints(events[c(2,1)])
assign CRS:
proj4string(pts) = proj4string(m)
plot points inside any of the polygons:
points(pts[m,], pch = 3)
select points inside Italy:
it = m[m$ISO_A2 == "IT",]
points(pts[it,], col = 'blue')
(this uses sp::over, but behind the scenes)
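To give an idea of what a point-in-polygon test involves, here is a toy ray-casting sketch in base R. It is not how sp implements it, it handles only a single simple polygon without holes, and the square and the points are made up. point_in_polygon() is a name of my own.

```r
# Ray casting: count how often a horizontal ray from the point crosses an edge
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n                                   # previous vertex, starts at the last one
  for (i in seq_len(n)) {
    # does edge (j -> i) straddle the horizontal line y = py?
    if ((vy[i] > py) != (vy[j] > py)) {
      x_cross <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      if (px < x_cross) inside <- !inside  # crossing lies to the right of the point
    }
    j <- i
  }
  inside
}

# Made-up square "country" and three made-up events
sq_x <- c(0, 4, 4, 0)
sq_y <- c(0, 0, 4, 4)
ev <- data.frame(lon = c(2, 5, 1), lat = c(2, 2, 3))

keep <- mapply(point_in_polygon, ev$lon, ev$lat,
               MoreArgs = list(vx = sq_x, vy = sq_y))
print(ev[keep, ])
```

For real country borders with islands and holes, the pts[it,] subsetting above is the safer route.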

time series plot in R

My data looks something like this:
There are 10,000 rows, each representing a city, with one column for every month from 1998-01 to 2013-09:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to do a plot for all months since 1998 for any city or more than one city.
I tried this but I get an error. I am not sure if I am even attempting this right. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plots(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
You might try
require(reshape)
require(ggplot2)
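forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# assuming the month labels look like "1998-01": as.Date() needs a day, so append "-01"
forecl$month <- as.Date(paste0(as.character(forecl$month), "-01"))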
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()
To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are too many cities and the colors are hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)
Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case, the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
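To make the wide-to-long step concrete with month columns named like those in the question, here is a base R sketch. It assumes the columns were read with their original names (for example via check.names = FALSE), and it uses only two cities and two months from the sample.

```r
# Two rows and two month columns from the question's sample
wide <- data.frame(RegionName = c("New York", "Phoenix"),
                   `1998-01` = c(1.3414, 2.7046),
                   `1998-02` = c(1.344, 2.5525),
                   check.names = FALSE)

month_cols <- setdiff(names(wide), "RegionName")

# Stack the month columns; "-01" is appended so as.Date() gets a full date
long <- data.frame(
  RegionName = rep(wide$RegionName, times = length(month_cols)),
  month      = as.Date(paste0(rep(month_cols, each = nrow(wide)), "-01")),
  value      = unlist(wide[month_cols], use.names = FALSE)
)
print(long)
```

A data frame shaped like long is what the ggplot() calls above expect, with RegionName in place of city.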
