I'm trying to create a choropleth map that shows state population and also labels the capital cities. I initially had two data frames but was not able to add ggplot 1 to ggplot 2, so I combined the two data frames; part of the table looks like this:
Basically, I'm trying to combine these two images:
and
I've written
ggplot(spr, aes(long, lat)) + borders("state") + geom_point() +
coord_quickmap() +geom_label_repel(aes(label = city), size = 2) +
geom_polygon(aes(long, lat, group = capital, fill = pcls),color = "grey") +
coord_map("bonne", parameters=45) +ggthemes::theme_map() +
scale_fill_brewer(palette = "Reds")
but the map looks off:
I think the polygon part is what's throwing me off, but I'm not sure what to do about it.
You'll need shapefiles, or at least known borders to map the data to.
In keeping with your question from the other day, you can still use state. scale_fill_brewer is designed for use with discrete variables. For a continuous variable such as population, use scale_fill_gradientn, specifying brewer.pal (from the RColorBrewer package). Add the capitals layer in there as desired.
library(ggplot2)
library(usmap)
library(maps)
library(ggrepel)
library(ggthemes)
library(RColorBrewer) # provides brewer.pal(), used in scale_fill_gradientn() below
us <- map_data("state") # get the data to plot and map data to
data(statepop)
pops <- statepop
pops$full <- tolower(pops$full)
ggplot() + geom_map(data = us, map = us, aes(long, lat, map_id = region), fill = "#ffffff", color = "#ffffff", size = 0.15) +
geom_map(data = pops, map = us, aes(fill = pop_2015, map_id = full), size = 0.15) +
coord_map("bonne", parameters=45) +
scale_fill_gradientn(colors = brewer.pal(9, "Reds")) + #adjust the number as necessary
borders("state") +
ggthemes::theme_map()
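For the "add the capitals layer" step, here is a minimal sketch. It assumes the capitals can be taken from the us.cities data set that ships with the maps package (where capital == 2 marks state capitals), rather than from the asker's own spr data frame. The object names caps and p_caps are made up for this illustration, and coord_map() needs the mapproj package installed.

```r
library(ggplot2)
library(maps)
library(ggrepel)

us <- map_data("state")   # polygon outlines for the lower 48 states
data(us.cities)           # city table shipped with the maps package

# Assumption: capital == 2 marks state capitals in maps::us.cities.
# Alaska and Hawaii are dropped because map_data("state") does not include them.
caps <- subset(us.cities, capital == 2 & !(country.etc %in% c("AK", "HI")))

p_caps <- ggplot() +
  geom_polygon(data = us, aes(long, lat, group = group),
               fill = "white", colour = "grey50") +
  geom_point(data = caps, aes(long, lat), size = 1) +
  geom_label_repel(data = caps, aes(long, lat, label = name), size = 2) +
  coord_map("bonne", parameters = 45) +
  theme_void()
```

The same geom_point() and geom_label_repel() layers can be appended to the choropleth above in place of the plain white polygons; each layer carries its own data argument, so the two data frames do not have to be merged.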
I want to set a custom shape, size and color for points added to a map based upon a variable called 'Dataset'. I'm able to set the color of the points if I set the shape to the same type for all the points, but I'm hoping to have a map with a little more information. When I run this code, all the points are circles colored black. What am I missing?
Thanks everyone for your help & time!!
Here's a reproducible example:
# Read in libraries
library(ggplot2)
library(maps)
library(maptools)
library(ggmap)
# Create mapping objects
world <- map_data("world2")
world$long <- world$long
state_dat <- map_data("state")
canada <- world[world$region==c("Canada"),]
map_dat <- rbind(state_dat, canada)
# Create custom shapes, sizes, colors
pt_colors=c("red", "blue", "grey", "green")
shapes = c(120, 22, 24, 21)
shape_size = c(1.1, 0.8, 1, 1)
# Create lat/long dataframe
xy <- data.frame(Dataset=c("GBIF","Flower","GBIF","Leaf","DNA","GBIF","GBIF","Leaf","GBIF","GBIF","DNA","GBIF","DNA","GBIF","GBIF","Leaf","GBIF","GBIF","GBIF","DNA"),
lat=c(38.89450,34.45300,39.86556,30.38818,28.74590,33.78527,41.23439,30.37935,41.38250,40.60648,30.87580,40.56425,28.75000,41.52666,35.46451,30.73621,38.50221,33.70335,38.98000,29.61100),
long=c(-77.06292,-84.22643,-79.50248,-84.64519,-81.47860,-84.37109,-81.46374,-86.17667,-72.10861,-74.53538,-84.41520,-74.86654,-81.47750,-73.15833,-78.89952,-86.73095,-78.40308,-86.70289,-77.03917,-81.78740)
)
# Create base map
p0 <- ggplot() +
geom_polygon(data=map_dat,aes(x=long,y=lat,group=group, fill=region),fill="white",color="black", show.legend=FALSE)+
coord_map("gilbert",xlim=c(-60,-97),ylim=c(15,47.5)) +#mollweide is pretty good
labs(x=expression("Longitude"*~degree*W), y=expression("Latitude"*~degree*N)) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=1),
plot.margin=unit(c(0.25,0.25,0.25,0.25),'inches'),
legend.position='none') +
theme(rect = element_blank())
# Add points to the map
p1 <- p0 +
geom_point(data=xy,aes(x=long,y=lat,fill=Dataset)) +
scale_color_manual(values=pt_colors) +
scale_shape_manual(values=shapes) +
scale_size_manual(values=shape_size)
You need to map colour, shape, and size inside aes() in geom_point(). Your code maps fill instead, but geom_point() only uses fill for the fillable shapes 21-25; the point (or outline) colour is controlled by colour.
Simply fixing that will generate what you want.
p1 <- p0 +
geom_point(data=xy,aes(x=long,y=lat,colour = Dataset, shape = Dataset, size = Dataset)) +
scale_color_manual(values=pt_colors) +
scale_shape_manual(values=shapes) +
scale_size_manual(values=shape_size)
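One optional refinement, sketched below on a small made-up data frame (xy_demo and the coordinates in it are illustrative only): naming the vectors passed to the manual scales makes the category-to-value assignment explicit instead of relying on the alphabetical order of the Dataset levels.

```r
library(ggplot2)

# Hypothetical four-row data set with the same 'Dataset' categories
xy_demo <- data.frame(
  Dataset = c("GBIF", "Flower", "Leaf", "DNA"),
  long    = c(-77.1, -84.2, -84.6, -81.5),
  lat     = c(38.9, 34.5, 30.4, 28.7)
)

# Named vectors: each category is tied to its colour, shape and size by name
pt_cols   <- c(DNA = "red", Flower = "blue", GBIF = "grey", Leaf = "green")
pt_shapes <- c(DNA = 120, Flower = 22, GBIF = 24, Leaf = 21)
pt_sizes  <- c(DNA = 1.1, Flower = 0.8, GBIF = 1, Leaf = 1)

p_demo <- ggplot(xy_demo, aes(long, lat, colour = Dataset,
                              shape = Dataset, size = Dataset)) +
  geom_point() +
  scale_colour_manual(values = pt_cols) +
  scale_shape_manual(values = pt_shapes) +
  scale_size_manual(values = pt_sizes)
```

Because all three aesthetics are mapped to the same variable, ggplot2 merges them into a single legend.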
I am having trouble separating the legends in a ggplot2 graph with multiple layers. My plot fills each municipality according to the number of textile companies present there, and I also plot the plant locations with geom_point. My guess is that I need override.aes (in guide_legend()) somehow, but I haven't managed to make it work yet. The solutions that I have read do not deal with a different variable for the points specified in the aes() of geom_point().
If you want to test the code below, you could download the shapefile for the Brazilian municipalities here, use readOGR and fortify, then fill the municipalities with a variable of your choice and place arbitrary random points within Brazil for geom_point(), stored in separate variables such as lat_plant and long_plant below. The region column identifies Brazilian regions; in this case, "1" is the northern region of Brazil.
The Code
#setting the ggplot
library(ggplot2)
gg2 <- ggplot(data = out[out$region =="1",],
aes(x = long, y = lat, group = group, fill = as.factor(companies))) +
geom_polygon() +
ggtitle("title") +
scale_fill_discrete(name = "Number of Textile Companies") +
theme(plot.title = element_text(size = 30, face = "bold")) +
theme(legend.text = element_text(size = 12),
legend.title = element_text(colour = "blue", size = 16, face = "bold"))
#graph output
gg2 +
geom_point(data = out[out$region =="1",], aes(x = long_plant, y = lat_plant), color = "red")
What I am getting as the legend is this:
I would like to separate it, so that the legend shows the dots as plant locations and the fill colors as the number of textile companies in the region.
Here is another option for you. hmgeiger treated the number of textile companies as a factor, but I treated it as a continuous variable instead. Since there is no reproducible data, I created sample data myself. I drew random points within the longitude and latitude range of Brazil and kept only the points that fall inside Brazil; whatever2 contains those points. I also used a small trick here: I added a new column called Factory location. This is a dummy variable for adding color to the data points in the final graphic. hmgeiger created Dummy.var, which contains legend text for you. I left "" in this column instead, since you may not want to see any text in the legend.
For your legend issue, as Antonio mentioned and hmgeiger did, you need to put color inside aes() in geom_point(). This solves it. I did a bit more for you. If you do not know how many factories exist in each municipality, you need to count them. I did that using poly.counts() in the GISTools package and created another data frame that contains the number of factories in each municipality.
When I drew the map, I used three layers. One draws the polygons and another fills the polygons with colors; both are done with geom_cartogram() from the ggalt package. The key point is that you need a common key column for map_id: id in the first geom_cartogram() and ind in the second geom_cartogram() hold identical information. The third layer is geom_point(), where you need color in aes(). The legend has a continuous bar for the number of factories and a single dot for factory location, with no text next to it. I think this keeps the legend tidy.
library(raster)
library(tidyverse)
library(GISTools)
library(RColorBrewer)
library(ggalt)
library(ggthemes)
# Get polygon data for Brazil
brazil <- getData("GADM", country = "brazil", level = 1)
mymap <- fortify(brazil)
# Create dummy data staying in the polygons
# For more information: https://stackoverflow.com/questions/47696382/removing-data-outside-country-map-boundary-in-r/47699405#47699405
set.seed(123)
mydata <- data.frame(long = runif(200, min = quantile(mymap$long)[1], max = quantile(mymap$long)[4]),
lat = runif(200, min = quantile(mymap$lat)[1], max = quantile(mymap$lat)[4]),
factory = paste("factory ", 1:200, sep = ""),
stringsAsFactors = FALSE)
spdf <- SpatialPointsDataFrame(coords = mydata[, c("long", "lat")], data = mydata,
proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
whatever <- spdf[!is.na(over(spdf, as(brazil, "SpatialPolygons"))), ]
whatever2 <- as.data.frame(whatever) %>%
mutate(`Factory location` = "")
# Now I check how many data points (factories) exist in each polygon
# and create a data frame
factory.num <- poly.counts(pts = whatever, polys = brazil)
factory.num <- stack(factory.num)
ggplot() +
geom_cartogram(data = mymap, aes(x = long, y = lat, map_id = id),
map = mymap) +
geom_cartogram(data = factory.num, aes(fill = values, map_id = ind),
map = mymap) +
geom_point(data = whatever2, aes(x = long, y = lat, color = `Factory location`)) +
scale_fill_gradientn(name = "Number of factories", colours = brewer.pal(5, "Greens")) +
coord_map() +
theme_map()
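The legend trick can also be isolated from the spatial packages. The sketch below uses made-up data (two square "municipalities" in polys and three points in plants, both hypothetical) and maps a constant string to colour inside aes(), which is what gives the points their own legend entry next to the fill legend.

```r
library(ggplot2)

# Hypothetical data: two square polygons and three plant locations
polys <- data.frame(
  long      = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat       = c(0, 0, 1, 1,  0, 0, 1, 1),
  group     = rep(c("a", "b"), each = 4),
  companies = rep(c(2, 5), each = 4)
)
plants <- data.frame(long = c(0.3, 1.4, 1.7), lat = c(0.5, 0.2, 0.8))

p_leg <- ggplot() +
  geom_polygon(data = polys,
               aes(long, lat, group = group, fill = factor(companies)),
               colour = "black") +
  # A constant inside aes() creates a colour scale, and therefore a legend key
  geom_point(data = plants, aes(long, lat, colour = "Plant location")) +
  scale_colour_manual(name = NULL, values = c("Plant location" = "red")) +
  scale_fill_discrete(name = "Number of Textile Companies")
```

Setting color = "red" outside aes(), as in the question, styles the points but creates no scale, so nothing is added to the legend.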
FYI, the link you posted to download the shape file is quite slow, at least to download to a US computer.
This link has downloads that work a lot better, and also shows how to read in shape data: https://dioferrari.wordpress.com/2014/11/27/plotting-maps-using-r-example-with-brazilian-municipal-level-data/
I made an example using the regions rather than municipalities data to keep it simple.
Data I used available for download here: https://drive.google.com/file/d/0B64xLcn8DZfwakNMbHFLQWo4YzA/view?usp=sharing
#Load libraries.
library(rgeos)
library(rgdal)
library(ggplot2)
#Read in and format map data.
regions_OGR <- readOGR(dsn="/Users/hmgeiger/Downloads/regioes_2010",
layer = "regioes_2010")
map_regions <- spTransform(regions_OGR,CRS("+proj=longlat +datum=WGS84"))
map_regions_fortified <- fortify(map_regions)
#We assign 0, 1, or 3 textile companies to each region.
#map_regions_fortified is in order by ID (region).
#So, we add a column with the number of textile companies
#repeated the right number of times for how many of each region there is.
num_rows_per_region <- data.frame(table(map_regions_fortified$id))
map_regions_fortified <- data.frame(map_regions_fortified,
Num.factories = factor(rep(c(1,0,1,3,1),times=num_rows_per_region$Freq)))
#First, plot without any location dots.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black")
Now, let's add the factory locations.
#Set latitude and longitude based on the number of factories per region.
factory_locations <- data.frame(long = c(-65,-55,-51,-44,-42,-38),
lat = c(-5,-15,-27,-7,-12,-8))
#Add a dummy variable, which then allows the colour of the dots
#to be a part of the legend.
factory_locations <- data.frame(factory_locations,
Dummy.var = rep("One dot = one factory location",times=nrow(factory_locations)))
#Replot adding factory location dots.
#We will use black dots here since will be easier to see.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black") +
geom_point(data = factory_locations,aes(x = long,y = lat,colour = Dummy.var)) +
scale_colour_manual(values="black") + labs(colour="")
#Bonus: Let's change the color vector to something more color-blind friendly.
mycol <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7","#490092")
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black") +
geom_point(data = factory_locations,aes(x = long,y = lat,colour = Dummy.var)) +
scale_colour_manual(values="black") + labs(colour="") +
scale_fill_manual(values=mycol)
My challenge is to add several text labels around the same point on a map. The MWE data frame puts six sports teams around New York City.
library(ggplot2) # map_data(), ggplot(), geom_text()
library(maps)
library(mapproj)
library(maptools)
all_states <- map_data("state") # load map outline and borders for US states
ny <- subset(all_states, region == "new york") # select only New York
nyteams <- c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets") # for text labels
df <- data.frame(long = rep(-73.99, times = 6), lat = rep(40.71, times =6)) # NYC coordinates for each team
df <- cbind(nyteams, df) # combine the columns to create the data frame for ggplot2
df <- cbind(df, rownum = seq(1:nrow(df))) # variable for spreading text labels by vertical latitude
It is simple to add the text labels vertically by incrementing the latitude of each point.
df$lat2 <- df$lat + (0.1*df$rownum) # to spread the text labels up the latitude
ggplot(data = df, aes(long, lat2)) +
geom_polygon(data = ny, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
coord_map("mercator") + # did not include geom_point() since text labels are sufficient
geom_text(aes(label = nyteams), size = 3)
But I worked out manually a system for placing the first team at the NYC latitude and longitude, the 2nd team just above it, the 3rd team to the right, the 4th just below it, and the 5th to the left (I can shorten names of teams to avoid over-writing Islanders, for example), etc..
df$lat2 <- df$lat + c(0, 0.1, 0.0, -0.1, 0.0, 0.2)
df$long2 <- df$long + c(0, 0.0, 0.3, 0.0, -0.5, 0.0)
ggplot(data = df, aes(long2, lat2)) +
geom_polygon(data = ny, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
coord_map("mercator") +
geom_text(aes(label = nyteams), size = 3)
Programming Question: How might R create such multiple placements of text labels without so much manual intervention?
I tried position = "jitter" and position = "dodge" to no avail.
Several other questions on SO have asked about adding multiple text annotations to a map. Limitations afflict all of them.
Can you plot a table onto a ggmap similar to annotation_custom method for non- Cartesian coordinates
Dynamic position for ggplot2 objects (especially geom_text)?
But directlabels package does not work on individual points
https://stats.stackexchange.com/questions/16057/how-do-i-avoid-overlapping-labels-in-an-r-plot/62856#62856
FField package
Here's one "semi"-automated approach, which places the labels in a circular pattern around the point.
ny.coords <- data.frame(long=-73.99, lat=40.71)
n <- length(nyteams)
r <- 0.3
th <- seq(0,2*(n-1)/n*pi,len=n)
coords <- data.frame(long=r*sin(th)+ny.coords$long,lat=r*cos(th)+ny.coords$lat)
ggplot(data=ny,aes(x=long,y=lat)) +
geom_polygon(data = ny, aes(group = group), colour="grey70", fill="white") +
geom_text(data=coords,aes(label = nyteams), size = 3)+
geom_point(data=ny.coords,color="red",size=3)+
coord_map("mercator",xlim=c(-75,-73),ylim=c(40,41.5))
The "semi" bit is that I picked a radius (r) based on the scale of the map, but you could probably automate that as well.
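As a rough sketch of how the radius could be derived automatically, the helper below (circle_labels is a hypothetical name, not part of any package) takes the radius as a fraction of the smaller map extent. The default of 0.15 is an assumed tuning value, not something derived from the data.

```r
# Hypothetical helper: place n labels evenly on a circle around one point.
# 'frac' is the share of the smaller map extent used as the radius (assumed default).
# 'offset' rotates the whole pattern by a fixed angle in radians.
circle_labels <- function(labels, long0, lat0, xlim, ylim,
                          frac = 0.15, offset = 0) {
  n  <- length(labels)
  r  <- frac * min(diff(range(xlim)), diff(range(ylim)))
  # n evenly spaced angles in [0, 2*pi), starting at the top of the circle
  th <- seq(0, 2 * pi, length.out = n + 1)[-(n + 1)] + offset
  data.frame(label = labels,
             long  = long0 + r * sin(th),
             lat   = lat0 + r * cos(th))
}

teams <- c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets")
label_coords <- circle_labels(teams, long0 = -73.99, lat0 = 40.71,
                              xlim = c(-75, -73), ylim = c(40, 41.5))
```

The resulting data frame can be passed to geom_text(data = label_coords, aes(label = label)) in place of coords. Longitude and latitude are treated as plain planar units here, so under a map projection the circle will appear slightly squashed.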
EDIT: Response to OP's comment.
There's nothing in this approach that explicitly avoids overlaps. However, changing the line
th <- seq(0,2*(n-1)/n*pi,len=n)
to
th <- seq(0,2*(n-1)/n*pi,len=n) + pi/(2*n)
produces this:
which has the label positions rotated a bit and can (sometimes) avoid overlaps, if there are not too many labels.
Also, you should check out the directlabels package.
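Another option worth trying is the ggrepel package, which pushes overlapping text labels away from each other and from their data points. This is a minimal sketch, not a tested replacement for the circular layout; the teams_df data frame is made up for the illustration, and how well the labels spread when every label shares exactly the same coordinates may depend on the ggrepel version.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical data frame: six labels that all share the NYC coordinates
teams_df <- data.frame(
  team = c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets"),
  long = -73.99,
  lat  = 40.71
)

p_rep <- ggplot(teams_df, aes(long, lat, label = team)) +
  geom_point(colour = "red", size = 3) +
  # seed makes the iterative label placement reproducible between runs
  geom_text_repel(size = 3, seed = 42, max.overlaps = Inf)
```

The geom_polygon() and coord_map() layers from the question can be added to p_rep in the usual way.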
I have sampled 10,000 coordinates from my data in this file. I have around 130,000 points.
https://www.dropbox.com/s/40hfyx6a5hsjuv7/data.csv
I am trying to plot these points on the Americas map using ggplot2. Here is my code.
library(ggplot2)
library(maps)
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords, legend = FALSE) +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
geom_point(aes(x = lon, y = lat), shape = 19, size = 0.00001,
alpha = 0.3, colour = "red") +
theme(panel.grid.major = element_blank()) +
theme(panel.grid.minor = element_blank()) +
theme(axis.text.x = element_blank(),axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
xlab("") + ylab("")
png("my_plot.png", width = 8000, height = 7000, res = 1000)
print(p)
dev.off()
The points seem to cover the whole area in which they were plotted. I would like them to be smaller so that they better represent a location. You can see that I've set the size to 0.00001. I was just trying to see if it has any effect, but it doesn't seem to help below a certain limit. Is this the best that is possible at this resolution, or could the size be reduced further?
I had actually plotted around 400,000 points but only on the US map before and they looked much better like below. Hoping to get something like this. Thanks.
https://www.dropbox.com/s/8d0niu9g6ygz0wo/Clusters_reduced.png
Try playing with very small values of alpha, instead of the point size:
http://docs.ggplot2.org/0.9.3.1/geom_point.html
# Varying alpha is useful for large datasets
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 1/1000)
Edit:
Additional ideas are given in the documentation. Here's a summary:
Details
The scatterplot is useful for displaying the relationship between two continuous variables, although it can also be used with one continuous and one categorical variable, or two categorical variables. See geom_jitter for possibilities.
The bubblechart is a scatterplot with a third variable mapped to the size of points. There are no special names for scatterplots where another variable is mapped to point shape or colour, however.
The biggest potential problem with a scatterplot is overplotting: whenever you have more than a few points, points may be plotted on top of one another. This can severely distort the visual appearance of the plot. There is no one solution to this problem, but there are some techniques that can help. You can add additional information with stat_smooth, stat_quantile or stat_density2d. If you have few unique x values, geom_boxplot may also be useful. Alternatively, you can summarise the number of points at each location and display that in some way, using stat_sum.
Another technique is to use transparent points, geom_point(alpha = 0.05).
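Two of these ideas can be sketched on simulated coordinates (pts below is random data, not the asker's file): drawing each point with shape = ".", which is about as small as a point gets, combined with transparency, and summarising the points as counts per grid cell with geom_bin2d().

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for the real coordinates
pts <- data.frame(lon = rnorm(1e4, mean = -95, sd = 10),
                  lat = rnorm(1e4, mean = 38, sd = 5))

# Option 1: tiny, semi-transparent points
p_alpha <- ggplot(pts, aes(lon, lat)) +
  geom_point(shape = ".", alpha = 0.3, colour = "red")

# Option 2: count the points falling in each cell of a 100 x 100 grid
p_bins <- ggplot(pts, aes(lon, lat)) +
  geom_bin2d(bins = 100)
```

Which one reads better depends on how strongly the points cluster; binning keeps dense areas comparable, while tiny points preserve individual locations.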
Edit 2:
Combining the details from the manual with the hints in the question "Transparency and Alpha levels for ggplot2 stat_density2d with maps and layers in R",
a possible solution might look like this:
library(ggplot2)
library(maps)
data_coords <- read.csv("C:/Downloads/data.csv")
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot( data = data_coords, legend = FALSE) +
geom_polygon( data = map_world, aes(x = long, y = lat, group = group)) +
stat_density2d( data = data_coords, aes(x=lon, y=lat, fill = as.factor(..level..)), size=1, bins=10, geom='polygon') +
scale_fill_manual(values = c("yellow","red","green","royalblue", "black","white","orange","brown","grey"))
png("my_plot2k.png", width = 2000, height = 2000, res = 500)
print(p)
dev.off()
Resulting image (not the best colour palette used):
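As a possible way around the palette issue, the sketch below maps the density level to a continuous fill instead of a hand-picked discrete palette. It uses simulated points (sim is random data), after_stat(), which requires ggplot2 3.3.0 or later, and the "YlOrRd" ColorBrewer palette as one arbitrary choice among many.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for data_coords
sim <- data.frame(lon = rnorm(2000, mean = -95, sd = 8),
                  lat = rnorm(2000, mean = 38, sd = 4))

p_dens <- ggplot(sim, aes(lon, lat)) +
  # Filled density contours; 'level' is computed by the stat
  stat_density_2d(aes(fill = after_stat(level)),
                  geom = "polygon", alpha = 0.5, bins = 10) +
  # Sequential palette, light for low density and dark for high density
  scale_fill_distiller(palette = "YlOrRd", direction = 1)
```

The geom_polygon() map layer from the code above can be added before the density layer so the contours are drawn on top of the map.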