I have a plot (made in R with ggplot2) that's the result of some singular value decomposition of a bunch of text data, so I basically have a data set of ~100 words used in some reviews and ~10 categories of reviews, with 2D coordinates for each of them. I'm having trouble getting the plot to look legible because of the amount of text and how close together a lot of the important points are.
The way my data is structured now, I'm plotting two different geom_text layers with different formatting, passing each one a separate data frame of coordinates. This has been easier, since it's fine if the ~10 categories overlap the ~100 terms (which are of secondary importance) and I wanted quite different formatting for the two. Still, there's no particular reason they couldn't be put together in the same data frame and geom, if that makes a solution possible.
What I'd like to do is use the ggrepel functionality so the ~10 categories are repelled from each other and use the shadowtext functionality to make them stand out from the background of colorful words, but since they're different geoms I'm not sure how to make that happen.
Minimal example with some fake data:
library(ggplot2)
library(ggrepel)
library(shadowtext)
library(tibble)  # provides tibble()
dictionary <- c("spicy", "Thanksgiving", "carborator", "mixed", "cocktail", "stubborn",
"apple", "rancid", "table", "antiseptic", "sewing", "coffee", "tragic",
"nonsense", "stufing", "words", "bottle", "distillery", "green")
tibble(Dim1 = rnorm(100),
Dim2 = rnorm(100),
Term = sample(dictionary, 100, replace = TRUE),
Color = as.factor(sample.int(10, 100, replace = TRUE))) -> words
tibble(Dim1 = c(-1,-1,0,-0.5,0.25,0.25,0.3),
Dim2 = c(-1,-0.9, 0, 0, 0.25, 0.4, 0.1),
Term = c("Scotland", "Ireland", "America", "Taiwan", "Japan", "China", "New Zealand")) -> locations
#Base graph
ggplot() +
xlab("Factor 1") +
ylab("Factor 2") +
theme(legend.position = "none") +
geom_text_repel(aes(x = Dim1, y = Dim2, label = Term, color = Color),
words,
fontface = "italic", size = 8) -> p
#Cluttered and impossible to read:
p + geom_text(aes(x = Dim1, y = Dim2, label = Term),
locations,
fontface = "bold", size = 16, color = "#747474")
#I can make it repel:
p + geom_text_repel(aes(x = Dim1, y = Dim2, label = Term),
locations,
fontface = "bold", size = 16, color = "#747474")
#Or I can make the shadowtext:
p + geom_shadowtext(aes(x = Dim1, y = Dim2, label = Term),
locations,
fontface = "bold", size = 16, color = "#747474", bg.color = "white")
The results of the second plot, nicely repelling:
The results of the last plot, with these clean-looking white buffers around the category labels:
Is there a way to do both? I tried using geom_label_repel without the borders but I didn't think it looked as clean as the shadowtext solution.
This answer comes a little late, but I recently found myself in a similar pickle and figured out a solution. I am writing it up because it may be useful for someone else.
#Repel the labels and give them an outline:
p + geom_text_repel(aes(x = Dim1, y = Dim2, label = Term),
locations,
fontface = "bold", size = 16,
color = "white",
bg.color = "black",
bg.r = .15)
The bg.color and bg.r options of geom_text_repel let you choose the color and radius of a shaded outline around the text, which greatly improves contrast in your images (see below!). This solution is borrowed from this stack link!
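For a self-contained version of the same idea, here is a minimal sketch with seeded fake data. It assumes a ggrepel release that supports bg.color and bg.r (I believe they arrived in version 0.9.0); the object names, the seed, and the colors are made up for illustration.

```r
library(ggplot2)
library(ggrepel)

set.seed(42)  # arbitrary seed so the fake data is reproducible

# Fake background terms and category labels (illustrative only)
words <- data.frame(
  Dim1 = rnorm(30),
  Dim2 = rnorm(30),
  Term = sample(c("spicy", "apple", "coffee", "bottle", "green"), 30, replace = TRUE)
)
locations <- data.frame(
  Dim1 = c(-1, -1, 0, 0.25, 0.25),
  Dim2 = c(-1, -0.9, 0, 0.25, 0.4),
  Term = c("Scotland", "Ireland", "America", "Japan", "China")
)

p_both <- ggplot(mapping = aes(x = Dim1, y = Dim2, label = Term)) +
  # Secondary terms: plain italic text in the background
  geom_text(data = words, fontface = "italic", color = "steelblue") +
  # Categories: repelled from each other, with a white outline for contrast
  geom_text_repel(data = locations, fontface = "bold", size = 8,
                  color = "#747474", bg.color = "white", bg.r = 0.15)

p_both
```

Only the category layer is repelled here, so the background words are still free to overlap the categories, as in the question.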
I am trying to make my beautiful ggplot map interactive with a tooltip using ggplotly, but the map rendered with ggplotly is not beautiful.
Here is a picture of my map with only ggplot:
Here is a picture of my map when using ggplotly. It removes the legend and makes the map ugly:
Is there another way of making my ggplot map interactive with a tooltip? Also, ggplotly takes some time to render the interactive map.
Here is my sample code for my ggplot:
ggplot(data = sdpf_f, aes( fill = n,x = long, y = lat, group = group, text = tooltip)) +
geom_polygon(color = "white") +
theme_void() +
scale_fill_continuous(low="#c3ffff", high="#0291da",
guide = guide_legend(title.position = "top", label.position = "bottom", keywidth = 2,
keyheight = 0.5,
title = "Number of agreements"),na.value="lightgrey"
) +
theme(legend.position="bottom") +
coord_map()
Thanks & kind regards,
Akshay
I don't have your data and this isn't exactly the same, but it's fairly close to what I think you're expecting.
The libraries:
I called tidyverse for the plotting and piping. I called maps for the data I used and plotly for the Plotly graph.
I used a function that is derived from one of the ways ggplot sets the aspect ratio. I know I found this function on SO, but I don't remember who wrote it.
library(tidyverse)
library(maps)
library(plotly)
map_aspect = function(x, y) {
x.center <- sum(range(x)) / 2
y.center <- sum(range(y)) / 2
x.dist <- ggplot2:::dist_central_angle(x.center + c(-0.5, 0.5), rep(y.center, 2))
y.dist <- ggplot2:::dist_central_angle(rep(x.center, 2), y.center + c(-0.5, 0.5))
y.dist / x.dist
}
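As a cross-check that avoids the unexported ggplot2:::dist_central_angle (internal functions can change between releases), a similar ratio can be approximated in base R: one degree of longitude shrinks by cos(latitude) relative to one degree of latitude. This is only a sketch of that approximation, and the helper name quick_aspect is hypothetical.

```r
# Approximate y/x aspect ratio for a lon/lat map, evaluated at the
# middle of the latitude range (latitudes given in degrees).
quick_aspect <- function(lat) {
  lat_center <- mean(range(lat))
  1 / cos(lat_center * pi / 180)
}

# Example call: roughly the latitude span of the contiguous US
quick_aspect(c(25, 49))
```

The result could be passed to coord_fixed() in the same way as map_aspect().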
I had to create data as your question is not reproducible. I figured I would include it, so that my answer was reproducible.
ms <- map_data("state") %>%
mutate(n = ifelse(str_detect(region, "^a"), 1.0,
ifelse(str_detect(region, "^o"), 1.5,
ifelse(str_detect(region, "^t"), 2.0,
ifelse(str_detect(region, "^s"), 2.5,
ifelse(str_detect(region, "^w"),
3.0,
NA))))))
I modified your ggplot call. The main change is coord_fixed instead of coord_map. I did this so that Plotly could interpret the aspect ratio correctly. Within coord_fixed, I used the function map_aspect. Because the sample data has no tooltip column, text is mapped to region here; with your data, keep text = tooltip.
gp <- ggplot(data = ms, aes(fill = n, x = long, y = lat,
group = group, text = region)) +
geom_polygon(color = "white") +
theme_void() +
scale_fill_continuous(low="#c3ffff", high="#0291da",
guide = guide_legend(title.position = "top",
label.position = "bottom",
keywidth = 2,
keyheight = 0.5,
title = "Number of agreements"),
na.value="lightgrey"
) +
theme(legend.position="bottom") +
coord_fixed(with(ms, map_aspect(long, lat)))
Then I created a Plotly object. I set some requirements for the layout, as well (horizontal legend at the bottom, with the legend title above the elements—similar to the legend in your ggplot call).
pp <- ggplotly(gp) %>%
layout(legend = list(orientation = "h", valign = "bottom",
title = list(side = "top"),
x = .02),
xaxis = list(showgrid = F), yaxis = list(showgrid = F),
showlegend = T)
Next, I needed to add the information for the legend. I chose to name the traces (which are split by color). I started by creating a vector of the names of the traces (which is what you see in the legend). I added a "" at the end, so that the NA-colored states wouldn't have a named trace (so they won't show in the legend).
This is likely something you'll need to change for your data.
# the color values and the last one, which is for NAs, no legend
nm <- seq(1, 3, by = .5) %>% sprintf(fmt = "%.1f") %>% append(., "")
Then I wanted to ensure that showlegend was true for all but the NA (light gray) states. I created a vector of T and F.
legs <- c(rep(TRUE, 5), FALSE)
Then I added these to the plot.
invisible(lapply(1:length(pp$x$data),
function(i){
pp$x$data[[i]]$name <<- nm[i]
pp$x$data[[i]]$showlegend <<- legs[i]
}))
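Since the question is about tooltips, here is a small sketch of how the text aesthetic can drive the hover label. It uses two hand-made squares instead of real map data, and the column names and label wording are invented for the example. The tooltip argument of ggplotly() selects which aesthetics appear on hover.

```r
library(ggplot2)
library(plotly)

# Two unit squares standing in for map polygons (illustrative data only)
squares <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4),
  n     = rep(c(1, 2), each = 4)
)
# Hover text, one value per polygon vertex
squares$tooltip <- paste0("Region ", squares$group, ": ", squares$n, " agreements")

gp_small <- ggplot(squares, aes(x = long, y = lat, group = group,
                                fill = n, text = tooltip)) +
  geom_polygon(color = "white") +
  theme_void() +
  coord_fixed()

# Show only the custom text on hover instead of every mapped aesthetic
pp_small <- ggplotly(gp_small, tooltip = "text")
```

Printing pp_small in an interactive session opens the widget.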
I am trying to do something similar to what is described in the blog here but using R with ggtree, ggmap, and ggplot2.
I want to combine the plot of the phylogenetic tree with a map showing the sampling locations of the tips, and link each tip to its sampling location with a segment. That would make it possible to see, for example, whether some clusters are specific to certain geographical locations (e.g. the north or south of an area), and would also allow different data to be displayed through tip colors or symbols. This would be used at first for exploratory graphs, but it could also be used later for publication.
I would like to use the gg* libraries (ggplot2, ggtree, ggmap ...) for this, because it is then easy to modify the plots to display different variables. Here is a dummy script showing how I do it so far: I create the tree and map plots separately and then combine them. I also want a common legend for the two plots. I am stuck after combining: I cannot work out how to link the points in the tree plot to the points on the map plot with segments.
Does anyone have ideas or possible solutions for doing that, or an alternative approach?
Here is a dummy dataset for creating the plots:
library(patchwork)
library(ggpubr)
library(ggtree)
library(tidyverse)
library(ggmap)
library(ggplot2)
mytree <- ggtree::rtree(100)
mymap <- ggmap::get_map(c(left = 0.903, bottom = 44.56, right = 6.72, top = 49.38),
scale = 4, maptype = "terrain",
source = "stamen",
color = "bw")
save(mymap, file = "dummy_map.Rdata")
ggmap requires an API key to create the map (sorry, I cannot share mine, but you can make one for free on Google Cloud). I saved the map object, and it is downloadable from here.
# Loading the map
load("dummy_map.Rdata")
# creating dummy metadata
mytree_data <- tidytree::as_tibble(mytree)
mymetadata <- mytree_data %>%
dplyr::filter(!is.na(label)) %>%
tibble::add_column(year = sample(seq(1990, 2020, by = 1), 100, replace = T),
lon = sample(seq(0.91, 6.7, by = 0.01), 100, replace = T),
lat = sample(seq(44.56, 49.38, by = 0.01), 100, replace = T)) %>%
dplyr::rename(id = label) %>%
dplyr::select(id, year, lat, lon)
# plotting the phylogenetic tree
# phylogenetic tree example
mytree_plot <-
ggtree::ggtree(mytree, layout = "rectangular", ladderize = T, lwd = .2) %<+%
mymetadata +
geom_tippoint(aes(color = year), size = 1, show.legend = T) +
scale_color_gradient(low='red', high="blue", space = "Lab",
limits = c(NA, NA), na.value = "black",
n.breaks = 8,
guide = "colorbar") +
geom_tiplab(aes(label = label), size = 1, offset = -1E-10) +
geom_treescale(fontsize = 2, linesize = 0.5, offset = 1) +
theme(legend.position = c(0.9,0.15),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6),
plot.title = element_text(hjust = 1))
mytree_plot
For some reason, I have to add the theme to be able to see the legend for the points; it is not created automatically. This should not happen. If anyone sees what I am doing wrong here, please let me know.
Then I add the sampling locations to the map and deactivate its legend, since it is shared with the tree legend:
mymap_plot <- ggmap(mymap, n_pix = 340, darken = c(0.6, "white"))+
geom_point(data = mymetadata,
aes(x = lon, y = lat, color = year),
size = 2, alpha = .8, na.rm = T) +
scale_color_gradient(low='red', high="blue", space = "Lab",
limits = c(NA, NA),
n.breaks = 8,
guide = "colorbar") +
guides(color = "none")
mymap_plot
Then I combine the tree plot and the map plot. I tried both the "patchwork" and "ggpubr" packages.
So far it appears easier to combine the plots and draw a single legend with ggpubr, so this is currently my first choice for combining plots.
# combining plots with patchwork
combined_plot <- mytree_plot + mymap_plot
# combining plots with ggpubr
# which I like better because it allows combining the legends, which is useful
# when more variables are used, e.g. shape for location uncertainty
other_combined <- ggarrange(mytree_plot, mymap_plot,
ncol = 2,
labels = c("A", "B"),
align = "hv",
legend = "bottom",
common.legend = T)
Here is the combined plot of the phylogenetic tree (ggtree) and the map (ggmap) obtained with ggpubr.
I am stuck at this point.
I need a way to add segments between the points at the tips of the tree and the corresponding sampling location of each tip on the map.
Any solutions/ideas on how I could do that?
I am using the R package ggpubr to make boxplots. I really like the visuals it provides, but I am having some problems. Does anyone know how to increase the font size of the numbers on the axes, the axis labels, and the class labels? Also, how do I set the mean values so that they display only 2 decimal places?
Here is the code that I'm using:
library("ggpubr")
mydata <- read.csv("C:\\temp\\ndvi.csv")
ggboxplot(mydata, x = "class", y = "NDVI",
color = "class",
order = c("Conifer", "Deciduous", "Grasslands"), ggtheme=theme_gray(),
ylab = "NDVI Value", xlab = "Land Cover Class",
add="mean",
font.label = list(size = 30, face = "bold")) +
stat_summary(fun.data = function(x) data.frame(y=1, label = paste("Mean=",mean(x))),
geom="text") +
theme(legend.position="none")
And the csv:
NDVI,class
0.25,Conifer
0.27,Conifer
0.29,Conifer
0.403,Deciduous
0.38,Deciduous
0.365,Deciduous
0.31983489,Grasslands
0.32005,Grasslands
0.328887766,Grasslands
I would prefer to achieve the desired effects above with ggpubr rather than boxplot() or plain ggplot2. Thanks.
Here is one option where we use round() to take care of the two decimal places and add another theme() to change the text size.
ggboxplot(mydata, x = "class", y = "NDVI",
color = "class",
order = c("Conifer", "Deciduous", "Grasslands"), ggtheme=theme_gray(),
ylab = "NDVI Value", xlab = "Land Cover Class",
add="mean",
font.label = list(size = 30, face = "bold")) +
# use round() and set y = .45
stat_summary(fun.data = function(x) data.frame(y = 0.45, label = paste("Mean=", round(mean(x), 2))), geom="text") +
theme(legend.position="none") +
theme(text = element_text(size = 16)) # change text size of theme components
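One caveat with round(): it returns a number, so a trailing zero is dropped when the value is pasted into a label (0.30 is shown as 0.3). If exactly two decimals should always be displayed, sprintf() is a possible alternative. A minimal sketch, with mean_label as a made-up helper name:

```r
# Format a mean with exactly two decimal places, keeping trailing zeros
mean_label <- function(x) sprintf("Mean = %.2f", mean(x))

paste("Mean=", round(0.3, 2))   # round() drops the trailing zero
mean_label(c(0.25, 0.35))       # sprintf() always keeps two decimals
```

The helper could replace the paste() call inside the function passed to stat_summary().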
My map-making code generates a map based on census data and plots important points as a tm_dots() layer. What I'd like to be able to do is differentiate between the types of dots (e.g. if the location is "Informal" or "Commercial").
tm_shape(bristol) + tm_fill("population", palette = "YlOrRd",
auto.palette.mapping = TRUE,
title = "Bristol Population",
breaks = c(0,5,10,15,20,25), colorNA = "darkgrey") + tm_borders("grey25",alpha = 0.7, lwd = 0.1) +
tm_dots("n", size=0.1,col="green", shapeNA = NA, title = "Spaces") +
tm_legend(text.size=1,title.size=1.2,position=c("left","top")) +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", title.snap.to.legend = TRUE)
What I'm looking for is essentially:
tm_dots("n", size=0.1,col=Classification, shapeNA = NA, title = "Spaces")
Adding several tm_dots() layers isn't an option. I also can't rename the dot legend, so any advice on that is appreciated too.
Thanks for your help!
Solution
For future reference, I added offices to bristol via left_join, thus adding the Classification variable to the SpatialPolygonsDataFrame. I was having issues with it displaying NA values despite the showNA = NA parameter, but colorNA = NULL worked. Final line:
tm_dots(size=0.1,col="Classification", palette = "Set1", colorNA = NULL)
So bristol is a polygon shape (SpatialPolygonsDataFrame or sf), and you want to plot dots in some polygons?
Normally, you would have a variable Offices with two levels, "Informal" and "Commercial". Then it's just tm_dots(size = 0.1, col = "Offices"). If you want to place two dots in one polygon because it contains both Informal and Commercial offices, then you can use your own approach (and use xmod and/or ymod for one group to prevent overlap), or create a SpatialPointsDataFrame or sf object with all offices and a variable Offices with two levels, as described above.
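The second option might look roughly like the sketch below. The coordinates and office names are invented, and the call uses tmap version 3 argument names (col, palette, title); newer tmap releases may print deprecation messages or expect renamed arguments.

```r
library(sf)
library(tmap)

# Hypothetical office locations (lon/lat in WGS84), one row per office
offices_df <- data.frame(
  name    = c("A", "B", "C", "D"),
  Offices = factor(c("Informal", "Commercial", "Informal", "Commercial")),
  lon     = c(-2.60, -2.58, -2.61, -2.59),
  lat     = c(51.45, 51.46, 51.47, 51.44)
)

# Convert the data frame to an sf point layer
offices <- st_as_sf(offices_df, coords = c("lon", "lat"), crs = 4326)

# One dot layer, colored by the Offices variable; title renames the legend
office_map <- tm_shape(offices) +
  tm_dots(col = "Offices", size = 0.1, palette = "Set1", title = "Spaces")
```

In the full map, this layer would be added after the tm_shape(bristol) + tm_fill(...) layers, so each office gets its own dot regardless of which polygon it falls in.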
I figured it out: you need another tm_shape() for it to work. I still haven't got the title to appear properly, but one step at a time.
tm_shape(bristol) + tm_fill("population", palette = "YlOrRd", auto.palette.mapping = TRUE,
title = "Bristol Population",
breaks = c(0,5,10,15,20,25), colorNA = "darkgrey") + tm_borders("grey25",alpha = 0.7, lwd = 0.1) +
tm_dots("Informal_Offices", size=0.1,col="green", shapeNA = NA, title = "Informal Offices") +
tm_shape(bristol) + tm_dots("Commercial_Offices", size=0.1,col="white",shapeNA=NA, title="Commercial Offices") +
tm_legend(text.size=1,title.size=1.2,position=c("left","top")) +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom", title.snap.to.legend = TRUE)
Result
I am creating a number of histograms and I want to add annotations towards the top of the graph. I am plotting these using a for loop so I need a way to place the annotations at the top even though my ylims change from graph to graph. If I could store the ylim for each graph within the loop I could cause the y coordinates for my annotation to vary based on the current graph. The y value I include in my annotation must change dynamically as the loop proceeds across iterations. Here is some sample code to demonstrate my issue (Notice how the annotation moves around. I need it to change based on the ylim for each graph):
library(ggplot2)
cuts <- levels(as.factor(diamonds$cut))
pdf(file = "Annotation Example.pdf", width = 11, height = 8,
family = "Helvetica", bg = "white")
for (i in 1:length(cuts)) {
by.cut<-subset(diamonds, diamonds$cut == cuts[[i]])
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
annotate ("text", label = "My annotation goes at the top", x = 10000 ,hjust = 0, y = 220, color = "darkred"))
}
dev.off()
ggplot uses Inf in its positions to represent the extremes of the plot range, without changing the plot range. So the y value of the annotation can be set to Inf, and the vjust parameter can also be adjusted to get a better alignment.
...
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
annotate("text", label = "My annotation goes at the top",
x = 10000, hjust = 0, y = Inf, vjust = 2, color = "darkred"))
...
For i = 2, the result looks like this:
There may be a neater way, but you can get the max count and use that to set y in the annotate call:
for (i in 1:length(cuts)) {
by.cut<-subset(diamonds, diamonds$cut == cuts[[i]])
## approximate the bins ggplot will use (it defaults to 30 bins); 29 break points give 28 intervals, so the counts are close to ggplot's but not identical
by.cut$cuts <- cut(by.cut$price, seq(min(by.cut$price), max(by.cut$price), length.out=29), include.lowest = TRUE)
## get the highest count of prices in a given cut.
y.max <- max(tapply(by.cut$price, by.cut$cuts, length))
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
## change y = 220 to y = y.max as defined above
annotate ("text", label = "My annotation goes at the top", x = 10000 ,hjust = 0, y = y.max, color = "darkred"))
}
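Another way to get the height of the tallest bar, without re-deriving the bins by hand, is to let ggplot2 compute them and read the result back with ggplot_build(). This is a sketch rather than part of the answer above: top_count is a made-up helper name, and it assumes the histogram is the first layer of the plot.

```r
library(ggplot2)

# Return the height of the tallest bar in the first layer of a histogram plot
top_count <- function(plot) {
  built <- ggplot_build(plot)
  max(built$data[[1]]$count)
}

by_cut <- subset(diamonds, cut == "Ideal")
hist_plot <- ggplot(by_cut, aes(price)) +
  geom_histogram(fill = "steelblue", alpha = .55, bins = 30)

# Tallest bar as computed by ggplot2 itself
y_max <- top_count(hist_plot)

hist_plot +
  annotate("text", label = "My annotation goes at the top",
           x = 10000, hjust = 0, y = y_max, color = "darkred")
```

Inside the loop from the question, the same two steps (build the plot, then call the helper) would replace the manual cut() and tapply() calculation.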