I am using an Excel sheet for data. One column has FIPS codes for GA counties and the other is labeled Count, with values from 1 to 5. I have made a map with these values using the following code:
library(usmap)
library(ggplot2)
library(rio)
carrierdata <- import("GA Info.xlsx")
plot_usmap( data = carrierdata, values = "Count", "counties", include = c("GA"), color="black") +
labs(title="Georgia")+
scale_fill_continuous(low = "#56B1F7", high = "#132B43", name="Count", label=scales::comma)+
theme(plot.background=element_rect(), legend.position="right")
I've included the picture of the map I get and a sample of the data I am using. Can anyone help me put the actual Count numbers on each county?
Thanks!
The usmap package is a good source for county maps, but the data it contains is in the format of data frames of x, y co-ordinates of county outlines, whereas you need the numbers plotted in the center of the counties. The package doesn't seem to contain the center co-ordinates for each county.
Although it's a bit of a pain, it is worth converting the map into a formal sf data frame format to give better plotting options, including the calculation of the centroid for each county. First, we'll load the necessary packages, get the Georgia data and convert it to sf format:
library(usmap)
library(sf)
library(ggplot2)
d <- us_map("counties")
d <- d[d$abbr == "GA",]
GAc <- lapply(split(d, d$county), function(x) st_polygon(list(cbind(x$x, x$y))))
GA <- st_sfc(GAc, crs = usmap_crs()@projargs)
GA <- st_sf(data.frame(fips = unique(d$fips), county = names(GAc), geometry = GA))
Now, obviously I don't have your numeric data, so I'll have to make some up, equivalent to the data you are importing from Excel. I'll assume your own carrierdata has a column named "fips" and another called "values":
set.seed(69)
carrierdata <- data.frame(fips = GA$fips, values = sample(5, nrow(GA), TRUE))
So now we left_join our imported data to the GA county data:
GA <- dplyr::left_join(GA, carrierdata, by = "fips")
And we can calculate the center point for each county:
GA$centroids <- st_centroid(GA$geometry)
All that's left now is to plot the result:
ggplot(GA) +
geom_sf(aes(fill = values)) +
geom_sf_text(aes(label = values, geometry = centroids), colour = "white")
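To see what st_centroid() contributes on its own, here is a minimal sketch on two made-up square "counties". The shapes, the `demo` object and its `values` column are assumptions for illustration and are not part of the Georgia data. It only shows that each polygon yields exactly one label point.

```r
library(sf)

# Two hypothetical unit-square "counties" side by side (made-up coordinates)
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))

# A tiny sf data frame with one value per polygon, like the Count column
demo <- st_sf(values = c(3, 5), geometry = st_sfc(sq1, sq2))

# One centroid per polygon; these are the points the labels would be drawn at
cent <- st_coordinates(st_centroid(st_geometry(demo)))
cent
```

For strongly concave counties the centroid can fall outside the polygon; st_point_on_surface() is an alternative that guarantees a point inside the shape.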
Related
I am using usmap and ggplot to plot population on a map. My data has two columns - population and zipcodes.
Question: How can I display data on city level using the same libraries or if you know of other libraries that can do the job.
Question: I am plotting California map and I want to zoom on LA county and nearby counties.
The code below gives me a nice California map with population as the fill color.
library(usmap)
library(ggplot2)
usmap::plot_usmap("counties",
include = ("CA") )
plot_usmap(data = data, values = "pop_2015", include = c("CA"), color = "grey") +
theme(legend.position = "right")+scale_fill_gradient(trans = "log10")
The tigris package makes downloading zip code tabulation areas fairly simple. You can download as a simple features dataframe so joining your data by zip code using dplyr functions is fairly easy. Here is a quick example:
library(tigris)
library(dplyr)
library(ggplot2)
df <- zctas(cb = TRUE,
starts_with = c("778"),
class = "sf")
## generate some sample data that
## can be joined to the downloaded data
sample_data <- tibble(zips = df$ZCTA5CE10,
values = rnorm(n = nrow(df)))
## left join the sample data to the downloaded data
df <- df %>%
left_join(sample_data,
by = c("ZCTA5CE10" = "zips"))
## plot something
ggplot(df) +
geom_sf(aes(fill = values))
Rephrasing the question: I am preparing a report, and one part of it is a spatial visualization.
I have 2 datasets. The first (Scores) lists countries with their scores. The second (Locations) gives the exact longitude and latitude of specific locations inside those countries. For example:
Scores = data.frame( Country = c("Lebanon","UK","Chille"), Score =c(1,3.5,5))
Locations = data.frame(Location_Name = c("London Bridge", "US Embassy in Lebanon" , "Embassy of Peru in Santiago"),
LONG = c(-0.087749, 35.596614, -70.618236),
LAT = c(51.507911, 33.933586, -33.423285))
What I want to achieve is a filled map of the world (my dataset has every country), with each country colored inside its boundaries by its Score (Scores$Score) on a continuous scale.
On top of that I would like to add pins, bubbles or some other marker for the Locations from the Locations data frame.
So my desired outcome would be combination of this view:
and this view:
Ideally I would also like to be able to draw a 2 km radius around each of the Locations from the Locations data frame.
I know how to do them separately but can't seem to achieve it on one nice clean map.
I would really appreciate any help or tips on this; I have been stuck on it for a whole day.
As suggested by @agila you can use the tmap package.
First merge your Scores data with World so you can fill countries based on Scores data. Note that your Country column should match the name in World exactly when merging.
You will need to use st_as_sf from sf package to make your Locations an sf object to add to map.
tm_dots can show points. An alternative for bubbles is tm_bubbles.
library(tmap)
library(sf)
data(World)
Scores = data.frame(Country = factor(c("Mexico","Brazil","Chile"), levels = levels(World$name)),
Score =c(1,3.5,5))
Locations = data.frame(Location_Name = c("Rio de Janeiro", "US Embassy in Lebanon" , "Embassy of Peru in Santiago"),
LONG = c(-43.196388, 35.596614, -70.618236),
LAT = c(-22.908333, 33.933586, -33.423285))
map_data <- merge(World, Scores, by.x = "name", by.y = "Country", all = TRUE)
locations_sf <- st_as_sf(Locations, coords = c('LONG', 'LAT'), crs = 4326)
tm_shape(map_data) +
tm_polygons("Score", palette = "-Blues") +
tm_shape(locations_sf) +
tm_dots(size = .1)
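Because the merge relies on exact name matches, it can help to list the names that will not match before merging. The following is a minimal sketch using the country names from the question; `world_names` is a hypothetical stand-in for World$name, so the sketch runs without tmap.

```r
# Hypothetical stand-in for World$name; with tmap loaded, use World$name itself
world_names <- c("Lebanon", "United Kingdom", "Chile", "Peru")

# The Scores data as given in the question (note the spellings)
Scores <- data.frame(Country = c("Lebanon", "UK", "Chille"),
                     Score = c(1, 3.5, 5))

# Countries in Scores that have no exact match in the map's name column
unmatched <- setdiff(Scores$Country, world_names)
unmatched
```

Any name returned here would end up as an extra row without geometry after merge(..., all = TRUE), so it is worth correcting those spellings first.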
I want to create a map of Germany where each state is shaded according to its gross domestic product. I know how to do this in R (and put the code below). Is there a possibility to do this in Julia in an equally simple way?
library(tidyverse)
library(ggplot2)
library(sf)
shpData = st_read("./geofile.shp")
GDPData <- read.delim("./stateGDP.csv", header=FALSE)
GDPData <- rename(GDPData,StateName=V1,GDP=V2)
GDPData %>%
left_join(shpData) ->mergedData
ggplot(mergedData) + geom_sf(data = mergedData, aes(fill = GDP,geometry=geometry)) + coord_sf(crs = st_crs(mergedData))-> pBIP1
You'd load the Shapefile and use Plots to plot it.
The idiomatic code is something like
using Plots, Shapefile, CSV
shp = Shapefile.shapes(Shapefile.Table("geofile.shp"))
GDPData = CSV.read("stateGDP.csv")
plot(shp, fill_z = GDPData.V2')
Note the ', which transposes the values into a row vector. Plots treats each column of a row vector as belonging to a separate series, so this tells it to apply the colors to individual polygons.
Every day we produce maps that show a calculated temperature level for 30 distinct areas of our region. Each area is filled with a different colour depending on the level. These maps look like this:
Now I want to switch map generation to R. I've downloaded provincial and municipal boundaries (you can find boundaries for whole Spain or here the subset for my region) and managed to plot them with ggplot2 following Hadley's example.
I can also produce an ascii file that contains two columns: identifier (CODINE) and daily level. You can download here.
This is my first script attempting to plot shapefiles with R and ggplot2 so there may be mistakes and for sure it can be improved, suggestions welcome. The following code (based on Hadley's previously mentioned) works for me:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
# Reading municipal boundaries
esp = readOGR(dsn=".", layer="lineas_limite_municipales_etrs89")
muni=subset(esp, esp$PROV1 == "46" | esp$PROV1 == "12" | esp$PROV1 == "3")
muni@data$id = rownames(muni@data)
muni.points = fortify(muni, region="id")
muni.df = join(muni.points, muni@data, by="id")
# Reading province boundaries
prov = readOGR(dsn=".", layer="poligonos_provincia_etrs89")
pr=subset(prov, prov$CODINE == "46" | prov$CODINE == "12" | prov$CODINE == "03" )
pr@data$id = rownames(pr@data)
pr.points = fortify(pr, region="id")
pr.df = join(pr.points, pr@data, by="id")
ggplot(muni.df) + aes(long,lat,group=group) + geom_path(color="blue") +
coord_equal() + geom_path(data=pr.df,
aes(x=long, y=lat, group=group),color="red", size=0.5)
This code plots a nice map with all the boundaries
For polygon filling by level I tried to read and then merge as suggested in http://tormodboe.wordpress.com/2011/02/22/g%C3%B8y-med-kart-2/
level=read.csv("levels.dat",header=T,sep=" ")
munlevel=merge(muni.df,level,by="CODINE")
but it gives an error
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
I am not familiar with shapefiles, maybe I need to learn more on shp data attributes to find the right choice to merge both data sets. How can I merge data so I can plot the lines (municipal boundaries) and then fill it with levels?
[NB: This question was asked over a month ago so OP has probably found a different way to solve their problem. I stumbled upon it while working on this related question. This answer is included in hopes it will benefit someone else.]
This appears to be what OP is asking for...
... and was produced with the following code:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
require("stringr") # provides str_pad(), used below
# read temperature data
setwd("<location of your data file>")
temp.data <- read.csv(file = "levels.dat", header=TRUE, sep=" ", na.string="NA", dec=".", strip.white=TRUE)
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
# read municipality polygons
setwd("<location of your shapefile>")
esp <- readOGR(dsn=".", layer="poligonos_municipio_etrs89")
muni <- subset(esp, esp$PROVINCIA == "46" | esp$PROVINCIA == "12" | esp$PROVINCIA == "3")
# fortify and merge: muni.df is used in ggplot
muni@data$id <- rownames(muni@data)
muni.df <- fortify(muni)
muni.df <- join(muni.df, muni@data, by="id")
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
# create the map layers
ggp <- ggplot(data=muni.df, aes(x=long, y=lat, group=group))
ggp <- ggp + geom_polygon(aes(fill=LEVEL)) # draw polygons
ggp <- ggp + geom_path(color="grey", linetype=2) # draw boundaries
ggp <- ggp + coord_equal()
ggp <- ggp + scale_fill_gradient(low = "#ffffcc", high = "#ff4444",
space = "Lab", na.value = "grey50",
guide = "colourbar")
ggp <- ggp + labs(title="Temperature Levels: Comunitat Valenciana")
# render the map
print(ggp)
Explanation:
Shapefiles imported into R with readOGR(...) are Spatial*DataFrame objects (here, a SpatialPolygonsDataFrame) and have two main sections: a polygon section, which contains the coordinates of all the points on each polygon, and a data section, which contains information about each polygon (so, one row per polygon). These can be referenced using, e.g., muni@polygons and muni@data. The utility function fortify(...) converts the polygon section to a data frame organized for plotting with ggplot. So the basic workflow is:
[1] Import temperature data file (temp.data)
[2] Import polygon shapefile of municipalities (muni)
[3] Convert muni polygons to a data frame for plotting (muni.df <- fortify(...))
[4] Join columns from muni@data to muni.df
[5] Join columns from temp.data to muni.df
[6] Make the plot
The joins must be done on common fields, and this is where most of the problems come in. Each polygon in the original shapefile has a unique ID attribute. Running fortify(...) on the shapefile creates a column, id, which is based on this. But there is no ID column in the data section. Instead, the polygon IDs are stored as row names. So first we must add an id column to muni@data as follows:
muni@data$id <- rownames(muni@data)
Now we have an id field in muni@data and a corresponding id field in muni.df, so we can do the join:
muni.df <- join(muni.df, muni@data, by="id")
To create the map we will need to set fill colors based on temperature level. To do that we need to join the LEVEL column from temp.data to muni.df. In temp.data there is a field CODINE which identifies the municipality. There is also, now, a corresponding field CODIGOINE in muni.df. But there's a problem: CODIGOINE is char(5), with leading zeros, whereas CODINE is integer which means leading zeros are missing (imported from Excel, perhaps?). So just joining on these two fields produces no matches. We must first convert CODINE into char(5) with leading zeros:
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
Now we can join temp.data to muni.df based on the corresponding fields.
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
We use merge(...) instead of join(...) because the join fields have different names and join(...) requires them to have the same name. (Note, however that join(...) is faster and should be used if possible). So, finally, we have a data frame which contains all the information for plotting the polygons and the temperature LEVEL which can be used to establish the fill color for each polygon.
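The leading-zero problem and the by.x/by.y merge can be reproduced on toy data. The following is a minimal sketch in base R; the data frames `polys` and `temps` and their codes are made up for illustration, and sprintf() is used here as a base-R alternative to str_pad(), not as the method used above.

```r
# Made-up stand-ins for muni.df (character codes) and temp.data (integer codes)
polys <- data.frame(CODIGOINE = c("03001", "12002", "46003"),
                    stringsAsFactors = FALSE)
temps <- data.frame(CODINE = c(3001L, 12002L, 46003L),
                    LEVEL = c(1, 2, 3))

# Merging as-is: "03001" finds no partner because 3001 lost its leading zero
naive <- merge(polys, temps, by.x = "CODIGOINE", by.y = "CODINE", all.x = TRUE)

# Pad the integer codes to width 5 with zeros, then merge again
temps$CODINE <- sprintf("%05d", temps$CODINE)
fixed <- merge(polys, temps, by.x = "CODIGOINE", by.y = "CODINE", all.x = TRUE)

sum(is.na(naive$LEVEL))  # polygons without a LEVEL before padding
sum(is.na(fixed$LEVEL))  # polygons without a LEVEL after padding
```

Counting the NA values in the joined column, as in the last two lines, is a quick way to check whether a join key needs cleaning before plotting.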
Some notes on OP's original code:
OP's first map (the green one at the top) identifies "30 distinct areas for our region...". I could find no shapefile identifying those areas. The municipality file identifies 543 municipalities, and I could see no way to group these into 30 areas. In addition, the temperature level file has 542 rows, one for each municipality (more or less).
OP was importing line shapefiles for municipality to draw the boundaries. You don't need that because geom_polygon(...) will draw (and fill) the polygons and geom_path(...) will draw the boundaries.
I am relatively new to ggplot, so please forgive me if some of my problems are really simple or not solvable at all.
What I am trying to do is generate a "heat map" of a country where the fill of the shape is continuous. I have the shape of the country as .RData, and I used Hadley Wickham's script to transform my SpatialPolygon data into a data frame. The long and lat data of my data frame now looks like this
head(my_df)
long lat group
6.527187 51.87055 0.1
6.531768 51.87206 0.1
6.541202 51.87656 0.1
6.553331 51.88271 0.1
This long/lat data draws the outline of Germany. The rest of the data frame is omitted here since I think it is not needed. I also have a second data frame of values for certain long/lat points. This looks like this
my_fixed_points
long lat value
12.817 48.917 0.04
8.533 52.017 0.034
8.683 50.117 0.02
7.217 49.483 0.0542
What I would like to do now is colour each point of the map according to the average value over all the fixed points that lie within a certain distance of that point. That way I would get an (almost) continuous colouring of the whole map of the country.
What I have so far is the map of the country plotted with ggplot2
ggplot(my_df,aes(long,lat)) + geom_polygon(aes(group=group), fill="white") +
geom_path(color="white",aes(group=group)) + coord_equal()
My first idea was to generate points that lie within the map that has been drawn and then calculate the value for every generated point my_generated_point like so
value_vector <- subset(my_fixed_points,
spDistsN1(cbind(my_fixed_points$long, my_fixed_points$lat),
c(my_generated_point$long, my_generated_point$lat), longlat=TRUE) < 50,
select = value)
point_value <- mean(value_vector$value)
I haven't found a way to generate these points though. And as with the whole problem, I don't even know if it is possible to solve it this way. My question now is whether there is a way to generate these points and/or whether there is another way to come to a solution.
Solution
Thanks to Paul I almost got what I wanted. Here is an example with sample data for the Netherlands.
library(ggplot2)
library(sp)
library(automap)
library(rgdal)
library(scales)
#get the spatial data for the Netherlands
con <- url("http://gadm.org/data/rda/NLD_adm0.RData")
print(load(con))
close(con)
#transform them into the right format for autoKrige
gadm_t <- spTransform(gadm, CRS=CRS("+proj=merc +ellps=WGS84"))
#generate some random values that serve as fixed points
value_points <- spsample(gadm_t, type="stratified", n = 200)
values <- data.frame(value = rnorm(dim(coordinates(value_points))[1], 0 ,1))
value_df <- SpatialPointsDataFrame(value_points, values)
#generate a grid that can be estimated from the fixed points
grd = spsample(gadm_t, type = "regular", n = 4000)
kr <- autoKrige(value~1, value_df, grd)
dat = as.data.frame(kr$krige_output)
#draw the generated grid with the underlying map
ggplot(gadm_t,aes(long,lat)) + geom_polygon(aes(group=group), fill="white") + geom_path(color="white",aes(group=group)) + coord_equal() +
geom_tile(aes(x = x1, y = x2, fill = var1.pred), data = dat) + scale_fill_continuous(low = "white", high = muted("orange"), name = "value")
I think what you want is something along these lines. I predict that this homebrew is going to be terribly inefficient for large datasets, but it works on a small example dataset. I would look into kernel densities and maybe the raster package. But maybe this suits you well...
The following snippet of code calculates the mean value of cadmium concentration of a grid of points overlaying the original point dataset. Only points closer than 1000 m are considered.
library(sp)
library(ggplot2)
library(automap) # provides loadMeuse()
library(scales) # provides muted()
loadMeuse()
# Generate a grid to sample on
bb = bbox(meuse)
grd = spsample(meuse, type = "regular", n = 4000)
# Come up with mean cadmium value
# of all points < 1000m.
mn_value = sapply(1:length(grd), function(pt) {
d = spDistsN1(meuse, grd[pt,])
return(mean(meuse[d < 1000,]$cadmium))
})
# Make a new object
dat = data.frame(coordinates(grd), mn_value)
ggplot(aes(x = x1, y = x2, fill = mn_value), data = dat) +
geom_tile() +
scale_fill_continuous(low = "white", high = muted("blue")) +
coord_equal()
which leads to the following image:
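If the sapply loop above becomes slow, the same "mean of all points within a radius" can be computed from one distance matrix. The following is a minimal base-R sketch on made-up data; the `obs` and `grid` data frames, the 5000 m extent and the 500 m spacing are assumptions for illustration. It trades memory (one entry per grid point and observation pair) for speed, so it only suits moderately sized data.

```r
set.seed(1)

# Made-up observations: planar coordinates in metres and a measured value
obs <- data.frame(x = runif(50, 0, 5000),
                  y = runif(50, 0, 5000),
                  value = rnorm(50))

# Regular grid of locations at which the local mean is computed
grid <- expand.grid(x = seq(0, 5000, by = 500),
                    y = seq(0, 5000, by = 500))

# Distance matrix: rows are grid points, columns are observations
d <- sqrt(outer(grid$x, obs$x, "-")^2 + outer(grid$y, obs$y, "-")^2)

# 1 where an observation lies within 1000 m of a grid point, 0 otherwise
w <- (d < 1000) * 1

# Sum of nearby values divided by their count (NaN where no point is nearby)
grid$mn_value <- as.vector(w %*% obs$value) / rowSums(w)
```

The resulting `grid` data frame can be passed to geom_tile() in the same way as `dat` above.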
An alternative approach is to use an interpolation algorithm. One example is kriging. This is quite easy using the automap package (spot the self promotion :), I wrote the package):
library(automap)
kr = autoKrige(cadmium~1, meuse, meuse.grid)
dat = as.data.frame(kr$krige_output)
ggplot(aes(x = x, y = y, fill = var1.pred), data = dat) +
geom_tile() +
scale_fill_continuous(low = "white", high = muted("blue")) +
coord_equal()
which leads to the following image:
However, without knowledge as to what your goal is with this map, it is hard for me to see what you want exactly.
This slideshow offers another approach--see page 18 for a description of the approach and page 21 for a view of what the results looked like for the slide-maker.
Note however that the slide-maker used the sp package and the spplot function rather than ggplot2 and its plotting functions.