How can I edit the place of the legends with ggplot2 - r

I am having trouble separating the legends in a ggplot2 graph with multiple layers. My plot fills each municipality according to the number of textile companies present there, and I also plot the plant locations with geom_point. My guess is that I should use override.aes somehow, but I haven't managed to do this yet. The solutions that I have read do not deal with a separate variable for the points specified in the aes() of geom_point().
If you want to test the code below, you could download the shapefile for the Brazilian municipalities here, use readOGR and fortify, then fill the municipalities however you prefer with fill and place arbitrary random points within Brazil for geom_point(), creating separate variables such as lat_plant and long_plant below. The region column details Brazilian regions; in this case, "1" is the northern region of Brazil.
The Code
#setting the ggplot
library(ggplot2)
gg2 <- ggplot(data = out[out$region =="1",],
aes(x = long, y = lat, group = group, fill = as.factor(companies))) +
geom_polygon() +
ggtitle("title") +
scale_fill_discrete(name = "Number of Textile Companies") +
theme(plot.title = element_text(size = 30, face = "bold")) +
theme(legend.text = element_text(size = 12),
legend.title = element_text(colour = "blue", size = 16, face = "bold"))
#graph output
gg2 +
geom_point(data = out[out$region =="1",], aes(x = long_plant, y = lat_plant), color = "red")
What I am getting as legend is this:
And I would like to separate it, so that the dots are shown as plant locations and the colors as the fill for the number of textile companies in the region.

I leave another option for you. hmgeiger treated the number of textile companies as a factor, but I treated the variable as continuous instead. Since there is NO reproducible data, I created sample data myself. Here, I created random samples using the longitude and latitude of Brazil, and made sure that only data points inside Brazil are kept. whatever2 contains the data points that fall inside Brazil. I used a small trick here as well: I added a new column called Factory location. This is the dummy variable for adding color to the data points in the final graphic. hmgeiger created Dummy.var, which contains text for the legend. I left "" in this column instead, since you may not want to see any text in the legend.
For your legend issue, as Antonio mentioned and hmgeiger did, you need to add color in aes() in geom_point(). This solves it. I did one more thing for you. If you do not know how many factories exist in each municipality, you need to count them. I did the job using poly.counts() in the GISTools package and created another data frame that contains the number of factories in each polygon.
When I drew the map, I had three layers. One is for the polygons and another for filling the polygons with colors. They are done with geom_cartogram() from the ggalt package. The key thing is that you need a common key column for map_id: id in the first geom_cartogram() and ind in the second geom_cartogram() carry identical information. The third layer is geom_point(), where you need color in aes(). The legend has a continuous bar for the number of factories and a single dot for factory location, with no text next to it. So this makes the legend tidy, I think.
library(raster)
library(tidyverse)
library(GISTools)
library(RColorBrewer)
library(ggalt)
library(ggthemes)
# Get polygon data for Brazil
brazil <- getData("GADM", country = "brazil", level = 1)
mymap <- fortify(brazil)
# Create dummy data staying in the polygons
# For more information: https://stackoverflow.com/questions/47696382/removing-data-outside-country-map-boundary-in-r/47699405#47699405
set.seed(123)
mydata <- data.frame(long = runif(200, min = quantile(mymap$long)[1], max = quantile(mymap$long)[4]),
lat = runif(200, min = quantile(mymap$lat)[1], max = quantile(mymap$lat)[4]),
factory = paste("factory ", 1:200, sep = ""),
stringsAsFactors = FALSE)
spdf <- SpatialPointsDataFrame(coords = mydata[, c("long", "lat")], data = mydata,
proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
whatever <- spdf[!is.na(over(spdf, as(brazil, "SpatialPolygons"))), ]
whatever2 <- as.data.frame(whatever) %>%
mutate(`Factory location` = "")
# Now I check how many data points (factories) exist in each polygon
# and create a data frame
factory.num <- poly.counts(pts = whatever, polys = brazil)
factory.num <- stack(factory.num)
ggplot() +
geom_cartogram(data = mymap, aes(x = long, y = lat, map_id = id),
map = mymap) +
geom_cartogram(data = factory.num, aes(fill = values, map_id = ind),
map = mymap) +
geom_point(data = whatever2, aes(x = long, y = lat, color = `Factory location`)) +
scale_fill_gradientn(name = "Number of factories", colours = brewer.pal(5, "Greens")) +
coord_map() +
theme_map()

FYI, the link you posted to download the shape file is quite slow, at least to download to a US computer.
This link has downloads that work a lot better, and also shows how to read in shape data: https://dioferrari.wordpress.com/2014/11/27/plotting-maps-using-r-example-with-brazilian-municipal-level-data/
I made an example using the regions rather than municipalities data to keep it simple.
Data I used available for download here: https://drive.google.com/file/d/0B64xLcn8DZfwakNMbHFLQWo4YzA/view?usp=sharing
#Load libraries.
library(rgeos)
library(rgdal)
library(ggplot2)
#Read in and format map data.
regions_OGR <- readOGR(dsn="/Users/hmgeiger/Downloads/regioes_2010",
layer = "regioes_2010")
map_regions <- spTransform(regions_OGR,CRS("+proj=longlat +datum=WGS84"))
map_regions_fortified <- fortify(map_regions)
#We assign 0, 1, or 3 textile companies to each region.
#map_regions_fortified is in order by ID (region).
#So, we add a column with the number of textile companies
#repeated the right number of times for how many of each region there is.
num_rows_per_region <- data.frame(table(map_regions_fortified$id))
map_regions_fortified <- data.frame(map_regions_fortified,
Num.factories = factor(rep(c(1,0,1,3,1),times=num_rows_per_region$Freq)))
#First, plot without any location dots.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black")
Now, let's add the factory locations.
#Set latitude and longitude based on the number of factories per region.
factory_locations <- data.frame(long = c(-65,-55,-51,-44,-42,-38),
lat = c(-5,-15,-27,-7,-12,-8))
#Add a dummy variable, which then allows the colour of the dots
#to be a part of the legend.
factory_locations <- data.frame(factory_locations,
Dummy.var = rep("One dot = one factory location",times=nrow(factory_locations)))
#Replot adding factory location dots.
#We will use black dots here since they will be easier to see.
#Note: each "+" must end a line, not start one, or R stops parsing after the first layer.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black") +
geom_point(data = factory_locations,aes(x = long,y = lat,colour = Dummy.var)) +
scale_colour_manual(values="black") + labs(colour="")
#Bonus: Let's change the color vector to something more color-blind friendly.
mycol <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7","#490092")
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black") +
geom_point(data = factory_locations,aes(x = long,y = lat,colour = Dummy.var)) +
scale_colour_manual(values="black") + labs(colour="") +
scale_fill_manual(values=mycol)
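Both answers rely on the same idea: fill is mapped in the polygon layer, and a constant dummy column is mapped to colour in the point layer, so ggplot2 builds two separate legends. Here is a minimal sketch of that idea that runs without any shapefile. The toy squares, the plant coordinates, and the column name legend_key are all made up for illustration and are not part of the original data.

```r
library(ggplot2)

# Toy polygons standing in for municipalities (made-up coordinates)
squares <- data.frame(
  long      = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat       = c(0, 0, 1, 1,  0, 0, 1, 1),
  group     = rep(c("a", "b"), each = 4),
  companies = factor(rep(c(1, 3), each = 4))
)

# Toy plant locations; legend_key is a hypothetical constant column
# that exists only so the points get their own legend entry
plants <- data.frame(
  long_plant = c(0.5, 1.3, 1.7),
  lat_plant  = c(0.5, 0.4, 0.8),
  legend_key = "Plant location"
)

p <- ggplot() +
  # fill is mapped here, so this layer feeds the fill legend
  geom_polygon(data = squares,
               aes(x = long, y = lat, group = group, fill = companies),
               colour = "black") +
  # colour is mapped (not set) here, so this layer feeds a separate colour legend
  geom_point(data = plants,
             aes(x = long_plant, y = lat_plant, colour = legend_key)) +
  scale_fill_discrete(name = "Number of Textile Companies") +
  scale_colour_manual(name = NULL, values = c("Plant location" = "red"))

print(p)
```

Setting color = "red" outside aes(), as in the question, draws red points but gives ggplot2 nothing to build a legend from; moving the colour into aes() and fixing its value with scale_colour_manual() is what creates the second legend.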

Related

Can't make ggplot show legend correctly in R

I am computing a central warehouse problem and need help to visualize the result.
I have a dataset with coordinates for local warehouses, which I have clustered with kmeans() to obtain coordinates for the central-warehouse locations.
In total there are 9 clusters, so k=9, and there are 552 local warehouses.
I would like to show a plot with different colors based on which cluster (central warehouse) each local warehouse belongs to, and show the index of the cluster in a legend.
To set the colors I use the palette() and add color to have a total of 9 different colors.
Using the code in #1# I get the result I want but without the legend.
Using the code in #2# I get the legend but in wrong colors and not correct "format" of legend.
cc <- palette()
palette(c(cc,"purple"))
palette()
#1#
get_map("Mexico", zoom = 5) %>% ggmap()+
geom_point(data = Datafor_k_9, aes(x = `CW-lon`, y = `CW-lat`), col=Datafor_k_9$BelongToK, size = 3, shape=8)+
geom_point(data = Datafor_k_9, aes(x = `LW-lon`, y= `LW-lat`), col=Datafor_k_9$BelongToK, size=1)
#2#
p <- ggmap(get_googlemap("Mexico",zoom = 5,
maptype ='terrain',
color = 'color'))
p+geom_point(data = Datafor_k_9, aes(x = `CW-lon`, y = `CW-lat`,col=Datafor_k_9$BelongToK), size = 4,shape=8)+
geom_point(data = Datafor_k_9, aes(x = `LW-lon`, y= `LW-lat`, col=Datafor_k_9$BelongToK), size=1)
theme(legend.position = "right")
What am I doing wrong?
The data is combined in a single df called "Datafor_k_9".
The coordinates for the local warehouses are called "LW-lon"/"LW-lat" and those for the central warehouses "CW-lon"/"CW-lat". They SHOULD be tied together by their shared "BelongToK".
Link to .csv of Data:
https://drive.google.com/open?id=1JsyEVqknEfcuakSXydFAq2OQ1eHYznSX

Choropleth Map of State Population

Trying to create a choropleth map showing state population and also labeling capital cities. I initially had two data frames but was not able to add ggplot 1 to ggplot 2, so I combined the two data frames; part of the table looks like this:
basically trying to combine these two images:
and
I've written
ggplot(spr, aes(long, lat)) + borders("state") + geom_point() +
coord_quickmap() +geom_label_repel(aes(label = city), size = 2) +
geom_polygon(aes(long, lat, group = capital, fill = pcls),color = "grey") +
coord_map("bonne", parameters=45) +ggthemes::theme_map() +
scale_fill_brewer(palette = "Reds")
but map looks off:
I think it's the polygon part that is throwing me off, but I'm not sure what to do about it.
You'll need shapefiles, or at least have the borders known to map the data to.
In keeping with your question from the other day, you can still use state. scale_fill_brewer is designed for use with discrete variables. Use scale_fill_gradientn, specifying brewer.pal. Add the capitals layer in there as desired.
library(ggplot2)
library(usmap)
library(maps)
library(ggrepel)
library(ggthemes)
library(RColorBrewer) # provides brewer.pal(), used below
us <- map_data("state") # get the data to plot and map data to
data(statepop)
pops <- statepop
pops$full <- tolower(pops$full)
ggplot() + geom_map(data = us, map = us, aes(long, lat, map_id = region), fill = "#ffffff", color = "#ffffff", size = 0.15) +
geom_map(data = pops, map = us, aes(fill = pop_2015, map_id = full), size = 0.15) +
coord_map("bonne", parameters=45) +
scale_fill_gradientn(colors = brewer.pal(9, "Reds")) + #adjust the number as necessary
borders("state") +
ggthemes::theme_map()

ggplot2 facet plot of shapefile polygons produces strange lines

I'm working to produce a facet/lattice plot of choropleth maps that each show how different model runs affect one variable being mapped across a number of polygons. The problem is that the output graphic produces strange lines that run between the polygons in each plot (see the graphic below).
While I've manipulated and converted the shapefile into a data frame with appropriate attributes for ggplot2, I'm not familiar with the details of how to use the package and the online documentation is limited for such a complex package. I'm not sure what parameter is causing this issue, but I suspect it may be the aes parameter.
The script:
# library() loads one package per call
library(rgdal)
library(tidyr)
library(maptools)
library(ggplot2)
library(dplyr)
library(reshape2)
setwd('D:/path/to/wd')
waterloo <- read.table("waterloo-data.txt", header=TRUE, sep=',', stringsAsFactors=FALSE)
waterloo <- data.frame(waterloo$DAUID, waterloo$LA0km, waterloo$LA4_exp, waterloo$LA20km, waterloo$LA30km, waterloo$LA40km, waterloo$LA50km)
colnames(waterloo) <- c("DAUID", "LA0km", "LA10km","LA20km", "LA30km", "LA40km", "LA50km")
## Produces expenditure measurements by ID variable DAUID, using reshape2/melt
wtidy <- melt(waterloo, id.vars=c("DAUID"), measure.vars = c("LA0km", "LA10km", "LA20km", "LA30km", "LA40km", "LA50km"))
colnames(wtidy) <- c("DAUID", "BufferSize", "Expenditure")
wtidy$DAUID <- as.factor(wtidy$DAUID) # for subsequent join with wtrl_f
### READ SPATIAL DATA ###
#wtrl <- readOGR(".", "Waterloo_DA_2011_new")
wtrl <- readShapeSpatial("Waterloo_DA_2011_new")
wtrl$id <- row.names(wtrl)
wtrl_f <- fortify(wtrl)
wtrl_f <- left_join(wtrl_f, wtrl@data, by="id")
# Join wtrl fortified (wtrl_f) to either twaterloo or wtidy
wtrl_f <- left_join(wtrl_f, wtidy, by="DAUID")
### PLOT SPATIAL DATA ###
ggplot(data = wtrl_f, # the input data
aes(x = long.x, y = lat.x, fill = Variable/1000, group = BufferSize)) + # define variables
geom_polygon() + # plot the DAs
geom_path(colour="black", lwd=0.05) + # polygon borders
coord_equal() + # fixed x and y scales
facet_wrap(~ BufferSize, ncol = 2) + # one plot per buffer size
scale_fill_gradient2(low = "green", mid = "grey", high = "red", # colors
midpoint = 10000, name = "Variable\n(thousands)") + # legend options
theme(axis.text = element_blank(), # change the theme options
axis.title = element_blank(), # remove axis titles
axis.ticks = element_blank()) # remove axis ticks
The output graphic appears as follows:
Strange! I've made good progress but I don't know where ggplot is getting these lines. Any help on this would be appreciated!
PS; as an additional unrelated question, the polygon lines are rather jagged. How would I smooth these lines?
This answer helped me to solve my problem, but not before I made up this minimal example ready to post. I'm sharing it here in case it helps someone solve the same problem faster.
Problem:
I'm trying to make a basic map in R with ggplot2. The polygons are filling wrong, making extra lines.
library("ggplot2")
library("maps")
map <- ggplot(map_data("world", region = "UK"), aes(x = long, y = lat)) + geom_polygon()
map
wrong map image
Solution:
I have to set the aesthetic "group" parameter to put the polygon points in the right order, otherwise ggplot will try to plot a patch of Scotland coastline in the middle of the south coast (for example).
map <- ggplot(map_data("world", region = "UK"), aes(x = long, y = lat, group = group)) + geom_polygon()
map
OK, I managed to resolve this issue by changing the aesthetic group parameter found on page 11 of the ggplot2 manual:
http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
The correct parameter is "group" and not the factor that is used to group the plots. The correct ggplot code:
ggplot(data = wtrl_f, # the input data
aes(x = long.x, y = lat.x, fill = Expenditure/1000, group = group)) + # define variables
geom_polygon() + # plot the DAs
geom_path(colour="black", lwd=0.025) + # DA borders
coord_equal() + # fixed x and y scales
facet_wrap(~ BufferSize, ncol = 2) + # one plot per buffer size
scale_fill_gradient2(low = "green", mid = "grey", high = "red", # colors
midpoint = 10000, name = "Expenditures\n(thousands)") + # legend options
theme(axis.text = element_blank(), # change the theme options
axis.title = element_blank(), # remove axis titles
axis.ticks = element_blank()) # remove axis ticks

Add multiple text labels programmatically around same point on ggplot map

My challenge is to add several text labels around the same point on a map. The MWE data frame puts six sports teams around New York City.
library(ggplot2)
library(maps)
library(mapproj)
library(maptools)
all_states <- map_data("state") # load map outline and borders for US states
ny <- subset(all_states, region == "new york") # select only New York
nyteams <- c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets") # for text labels
df <- data.frame(long = rep(-73.99, times = 6), lat = rep(40.71, times =6)) # NYC coordinates for each team
df <- cbind(nyteams, df) # combine the columns to create the data frame for ggplot2
df <- cbind(df, rownum = seq(1:nrow(df))) # variable for spreading text labels by vertical latitude
It is simple to add the text labels vertically by incrementing the latitude of each point.
df$lat2 <- df$lat + (0.1*df$rownum) # to spread the text labels up the latitude
ggplot(data = df, aes(long, lat2)) +
geom_polygon(data = ny, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
coord_map("mercator") + # did not include geom_point() since text labels are sufficient
geom_text(aes(label = nyteams), size = 3)
But I worked out manually a system for placing the first team at the NYC latitude and longitude, the 2nd team just above it, the 3rd team to the right, the 4th just below it, and the 5th to the left (I can shorten names of teams to avoid over-writing Islanders, for example), etc..
df$lat2 <- df$lat + c(0, 0.1, 0.0, -0.1, 0.0, 0.2)
df$long2 <- df$long + c(0, 0.0, 0.3, 0.0, -0.5, 0.0)
ggplot(data = df, aes(long2, lat2)) +
geom_polygon(data = ny, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
coord_map("mercator") +
geom_text(aes(label = nyteams), size = 3)
Programming Question: How might R create such multiple placements of text labels without so much manual intervention?
I tried position = "jitter" and position = "dodge" to no avail.
Several other questions on SO have asked about adding multiple text annotations to a map. Limitations afflict all of them.
Can you plot a table onto a ggmap similar to annotation_custom method for non- Cartesian coordinates
Dynamic position for ggplot2 objects (especially geom_text)?
But directlabels package does not work on individual points
https://stats.stackexchange.com/questions/16057/how-do-i-avoid-overlapping-labels-in-an-r-plot/62856#62856
FField package
Here's one "semi"-automated approach, which places the labels in a circular pattern around the point.
ny.coords <- data.frame(long=-73.99, lat=40.71)
n <- length(nyteams)
r <- 0.3
th <- seq(0,2*(n-1)/n*pi,len=n)
coords <- data.frame(long=r*sin(th)+ny.coords$long,lat=r*cos(th)+ny.coords$lat)
ggplot(data=ny,aes(x=long,y=lat)) +
geom_polygon(data = ny, aes(group = group), colour="grey70", fill="white") +
geom_text(data=coords,aes(label = nyteams), size = 3)+
geom_point(data=ny.coords,color="red",size=3)+
coord_map("mercator",xlim=c(-75,-73),ylim=c(40,41.5))
The "semi" bit is that I picked a radius (r) based on the scale of the map, but you could probably automate that as well.
EDIT: Response to OP's comment.
There's nothing in this approach that explicitly avoids overlaps. However, changing the line
th <- seq(0,2*(n-1)/n*pi,len=n)
to
th <- seq(0,2*(n-1)/n*pi,len=n) + pi/(2*n)
produces this:
which has the label positions rotated a bit and can (sometimes) avoid overlaps, if there are not too many labels.
Also, you should check out the directlabels package.
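The circular placement above can be wrapped in a small helper that also picks the radius from the plot limits, which is one way to automate the "semi" part. This is only a sketch: the function name circle_positions and the frac and phase arguments are hypothetical, and the default of 15% of the smaller plot extent is an assumption you would tune for your own map.

```r
# Hypothetical helper: spread n label positions on a circle around (long0, lat0).
# frac is an assumed tuning knob: radius as a fraction of the smaller plot extent.
# phase rotates all positions, e.g. phase = pi / (2 * n) as in the edit above.
circle_positions <- function(long0, lat0, n, xlim, ylim, frac = 0.15, phase = 0) {
  r  <- frac * min(diff(xlim), diff(ylim))
  # same angles as seq(0, 2*(n-1)/n*pi, len = n), plus an optional rotation
  th <- 2 * pi * (seq_len(n) - 1) / n + phase
  data.frame(long = long0 + r * sin(th),
             lat  = lat0  + r * cos(th))
}

nyteams <- c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets")
coords <- circle_positions(-73.99, 40.71, length(nyteams),
                           xlim = c(-75, -73), ylim = c(40, 41.5))
coords$label <- nyteams
```

The resulting coords data frame can be passed to geom_text(data = coords, aes(label = label)) in place of the hand-built one. Like the original, it does nothing to detect overlaps; it only spaces the labels evenly.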

Coordinate points appropriately sized in ggplot2

I have sampled 10,000 coordinates from my data in this file. I have around 130,000 points.
https://www.dropbox.com/s/40hfyx6a5hsjuv7/data.csv
I am trying to plot these points on the Americas map using ggplot2. Here is my code.
library(ggplot2)
library(maps)
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords, legend = FALSE) +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
geom_point(aes(x = lon, y = lat), shape = 19, size = 0.00001,
alpha = 0.3, colour = "red") +
theme(panel.grid.major = element_blank()) +
theme(panel.grid.minor = element_blank()) +
theme(axis.text.x = element_blank(),axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
xlab("") + ylab("")
png("my_plot.png", width = 8000, height = 7000, res = 1000)
print(p)
dev.off()
The points seem to cover the whole area in which they were plotted. I would like them to be smaller, to better represent a location. You can see that I've set the size to 0.00001. I was just trying to see if it has any effect, but it doesn't seem to help below a certain limit. Is this the best that is possible at this resolution, or could the size be reduced further?
I had actually plotted around 400,000 points but only on the US map before and they looked much better like below. Hoping to get something like this. Thanks.
https://www.dropbox.com/s/8d0niu9g6ygz0wo/Clusters_reduced.png
Try playing with very small values of alpha, instead of the point size:
http://docs.ggplot2.org/0.9.3.1/geom_point.html
# Varying alpha is useful for large datasets
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 1/1000)
Edit:
Additional ideas are given in the documentation. Here's a summary:
Details
The scatterplot is useful for displaying the relationship between two continuous variables, although it can also be used with one continuous and one categorical variable, or two categorical variables. See geom_jitter for possibilities.
The bubblechart is a scatterplot with a third variable mapped to the size of points. There are no special names for scatterplots where another variable is mapped to point shape or colour, however.
The biggest potential problem with a scatterplot is overplotting: whenever you have more than a few points, points may be plotted on top of one another. This can severely distort the visual appearance of the plot. There is no one solution to this problem, but there are some techniques that can help. You can add additional information with stat_smooth, stat_quantile or stat_density2d. If you have few unique x values, geom_boxplot may also be useful. Alternatively, you can summarise the number of points at each location and display that in some way, using stat_sum.
Another technique is to use transparent points, geom_point(alpha = 0.05).
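As a rough illustration of the "summarise the number of points at each location" idea from the documentation, the sketch below snaps points to a coarse grid and lets geom_count() (which uses stat_sum) size each dot by how many points fall there. The data are synthetic stand-ins for data_coords, and the 5-degree cell size is an arbitrary assumption.

```r
library(ggplot2)

set.seed(1)
# Synthetic stand-in for data_coords (the real file is not reproduced here)
pts <- data.frame(lon = runif(1000, -120, -70),
                  lat = runif(1000, 25, 50))

# Snap points to a 5-degree grid (assumed cell size) so nearby points share a location
pts$lon_bin <- round(pts$lon / 5) * 5
pts$lat_bin <- round(pts$lat / 5) * 5

# geom_count() counts the points at each (x, y) and maps the count to size
p <- ggplot(pts, aes(x = lon_bin, y = lat_bin)) +
  geom_count(colour = "red", alpha = 0.5) +
  scale_size_area(max_size = 8, name = "Points per cell")

print(p)
```

A finer grid keeps more spatial detail but brings the overplotting back, so the cell size is a trade-off to adjust by eye.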
Edit 2:
Combining the details from the manual with the hints in Transparency and Alpha levels for ggplot2 stat_density2d with maps and layers in R
This might look like the solution:
library(ggplot2)
library(maps)
data_coords <- read.csv("C:/Downloads/data.csv")
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot( data = data_coords, legend = FALSE) +
geom_polygon( data = map_world, aes(x = long, y = lat, group = group)) +
stat_density2d( data = data_coords, aes(x=lon, y=lat, fill = as.factor(..level..)), size=1, bins=10, geom='polygon') +
scale_fill_manual(values = c("yellow","red","green","royalblue", "black","white","orange","brown","grey"))
png("my_plot2k.png", width = 2000, height = 2000, res = 500)
print(p)
dev.off()
Resulting image (not the best colour palette used):

Resources