Plotting a Tree Map in R using treemapify - r

I'm making a treemap of some data using a pretty cool library called treemapify, the details of which can be found here, with the GitHub repository here.
Based on my reading of the documentation, it seems to be built on ggplot2, so it should be possible to modify the graph using the grammar of graphics.
My code is below, with some made-up data. The end result is pretty nice, but I want to change the color scheme to something more subtle using scale_colour_brewer. The code runs fine, but the colour scheme seems to be ignored. Has anyone had any experience with this?
library(ggplot2)
library(treemapify)
# Create random data
country <- c("Ireland","England","France","Germany","USA","Spain")
job <- c("IT","SOCIAL","Project Manager","Director","Vice-President")
mydf <- data.frame(countries = sample(country,100,replace = TRUE),
career = sample(job,100,replace=TRUE),
participent = sample(1:100, replace = TRUE)
)
# Set Up the coords
treemap_coords <- treemapify(mydf,
area="participent",
fill="countries",
label="career",
group="countries")
# Plot the results using the Greens palette
ggplotify(treemap_coords,
group.label.size.factor = 2,
group.label.colour = "white",
label.colour = "black",
label.size.factor = 1) +
labs(title="Work Breakdown") +
scale_colour_brewer(palette = "Greens")

If you want to change the fill color of the rectangles, use the fill scale instead of the colour scale:
scale_fill_brewer(palette = "Greens")
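The difference between the two scales can be seen in a minimal sketch that uses plain ggplot2 tiles as a stand-in for the treemap rectangles. The `tiles` data frame and the object names are made up for illustration; only `scale_colour_brewer()` and `scale_fill_brewer()` come from the question and answer.

```r
library(ggplot2)

# Made-up stand-in data: three rectangles, one per country
tiles <- data.frame(
  x = 1:3,
  y = 1,
  country = c("Ireland", "France", "Spain")
)

# The rectangle interiors are driven by the fill aesthetic
base <- ggplot(tiles, aes(x, y, fill = country)) +
  geom_tile(colour = "white")

# No effect on the interiors: nothing is mapped to the colour aesthetic
ignored <- base + scale_colour_brewer(palette = "Greens")

# Recolours the interiors: the scale matches the mapped fill aesthetic
greens <- base + scale_fill_brewer(palette = "Greens")
```

Printing `ignored` and `greens` side by side should show the default colours in the first plot and the green palette in the second.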

Related

How to make parallel coord plot interactive and more readable?

I am trying to make a parallel coordinate graph in R. I want the user to be able to adjust which specific factor level they are viewing data for (album name in this case). Below is my code:
ggparcoord(ts,
columns = c(3:4, 9, 11:12),
groupColumn = 20,
alphaLines = .75,
showPoints = TRUE,
scale = "globalminmax") +
facet_grid(~ album_name) +
scale_color_manual("album_name",
values = c("#50a7e0", "#ddc477", "#923c81", "#951e1a", "#876b79", "#353839",
"hotpink", "#bababa", "#994914", "#f6ed95", "#951e1a"),
labels = levels(ts$album_name)) +
theme_light() +
theme(legend.position = "none")
The output looks like this:
As you can see, it is very convoluted and the viewer can't glean much info from it. I want the user to be able to select which individual plot they would like to view, without having to use a shiny dashboard (htmlwidgets would be ideal, but I do not know how to implement this well). I am also open to a non-interactive solution that might improve readability for the viewer.

stat_density2d tile geom showing grid when exporting as PDF

I'm having an issue when exporting stat_density2d plots.
ggplot(faithful, aes(eruptions, y = waiting, alpha = ..density..)) +
stat_density2d(geom = 'tile', contour = F)
When exporting as a png it looks like so:
But when I export as a PDF a grid appears:
I'm assuming that this is because the boundaries of the tiles overlap and so have equivalent of a doubled alpha value.
How can I edit just the edges of the tiles to stop this from happening?
Secondary question:
As Tjebo mentioned, geom = 'raster' would fix this problem. However, this raises a new issue: only one group gets plotted.
df <- faithful
df$new <- as.factor(ifelse(df$eruptions>3.5, 1, 0))
ggplot(df, aes(eruptions, waiting, fill = new, alpha = ..density..)) +
stat_density2d(geom = 'tile', contour = F) +
scale_fill_manual(values = c('1' = 'blue', '0' = 'red'))
ggplot(df, aes(eruptions, waiting, fill = new, alpha = ..density..)) +
stat_density2d(geom = 'raster', contour = F) +
scale_fill_manual(values = c('1' = 'blue', '0' = 'red'))
Help with this second issue would also be much appreciated!
Now I decided to transform my comment into an answer instead. Hopefully it solves your problem.
There is an old, related google thread on this topic - It seems related to how the plots are vectorized in each pdf viewer.
A hack is suggested in this thread, but one solution might simply be to use geom = 'raster' instead.
library(ggplot2)
ggplot(faithful, aes(eruptions, y = waiting, alpha = ..density..)) +
stat_density2d(geom = 'raster', contour = F)
Created on 2019-08-02 by the reprex package (v0.3.0)
If you have a look at the geom_raster documentation - geom_raster is recommended if you want to export to pdf!
The most common use for rectangles is to draw a surface. You always want to use geom_raster here because it's so much faster, and produces smaller output when saving to PDF
edit - second part of the question
Your tile plot can't be correct: you are creating cut-offs on your x value, so the fills should not overlap. This points to the core of the problem: alpha = ..density.. probably calculates the density based on the entire sample, and not by group. I think the only way to go is to pre-calculate the density (e.g., using density()). In your example, for demonstration purposes, we luckily have this precalculated as faithfuld (this is likely not showing the results you really want, as it is the density of the entire sample!).
I'd furthermore recommend not to use numbers as your factor values - this is pretty confusing for you and R. Use characters instead. Also, ideally don't use df for a sample data frame, as this is a base R function;)
Hope this helps
mydf <- faithfuld ## that is crucial!!! faithfuld contains a precalculated density column which is needed for the plot!!
mydf$new <- as.factor(ifelse(mydf$eruptions>3.5, 'large', 'small'))
ggplot(mydf, aes(eruptions, waiting)) +
geom_raster(aes(fill = new, alpha = density), interpolate = FALSE)
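If a per-group density is what is actually wanted, the pre-calculation mentioned above could look roughly like the following sketch. It is not part of the original answer: it assumes that `MASS::kde2d()` is an acceptable 2D density estimate, and the helper `group_density()` and the column name `size` are made up for illustration. Because geom_raster draws only one value per grid cell, each group's cells are kept only on its own side of the 3.5 cut-off.

```r
library(ggplot2)

obs <- faithful
obs$size <- ifelse(obs$eruptions > 3.5, "large", "small")

# One shared, regular grid so both groups line up cell for cell
lims <- c(range(obs$eruptions), range(obs$waiting))

# Hypothetical helper: 2D kernel density for one group on the shared grid
group_density <- function(d, n = 50) {
  k <- MASS::kde2d(d$eruptions, d$waiting, n = n, lims = lims)
  out <- expand.grid(eruptions = k$x, waiting = k$y)
  out$density <- as.vector(k$z)  # rows of k$z follow k$x, columns follow k$y
  out$size <- d$size[1]
  out
}

dens <- do.call(rbind, lapply(split(obs, obs$size), group_density))

# Keep each group's cells only on its own side of the cut-off,
# so that no grid cell is claimed by two groups
on_own_side <- ifelse(dens$size == "large",
                      dens$eruptions > 3.5,
                      dens$eruptions <= 3.5)
dens <- dens[on_own_side, ]

p <- ggplot(dens, aes(eruptions, waiting)) +
  geom_raster(aes(fill = size, alpha = density))
```

The densities are estimated separately for each group, so their alpha values are comparable within a group but not necessarily between groups.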

Avoiding code repetition in ggplot: adding various geom_sf plots

Following from this question on boxplots and my own question on making a map of India, what is a good way to avoid code repetition in ggplot when dealing with various layers of a map?
Below is a reprex. I thought the easiest way would be to:
1. save a basic map with state and national borders
2. add district layer (displaying the variables).
Imagine repeating step 2 for dozens of variables.
library(ggplot2)
library(sf)
library(raster)
# Download district and state data (should be less than 10 Mb in total)
distSF <- st_as_sf(getData("GADM",country="IND",level=2))
stateSF <- st_as_sf(getData("GADM",country="IND",level=1))
# Add country border
countryborder <- st_union(stateSF)
# STEP 1: Basic plot
basicIndia <- ggplot() +
geom_sf(data = stateSF, color = "white", fill = NA) +
geom_sf(data = countryborder, color = "blue", fill = NA) +
theme_dark()
# STEP 2: Adding the data layer underneath so it doesn't cover the other borders
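# Start from the basic plot, then put the district layer first so it is drawn underneath
indiaMap <- basicIndia
indiaMap$layers <- c(geom_sf(data = distSF, fill = "red")[[1]], basicIndia$layers)
indiaMap$layers <- c(geom_sf(data = distSF, fill = "gold")[[1]], basicIndia$layers)
indiaMap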
However, in this way, one cannot make even minor modifications to that additional layer, like adding a different title. The following obviously does not work but makes my point.
basicIndia$layers <- c(
geom_sf(data = distSF, aes(fill = GINI), color = "white", size = 0.2)[[1]] +
labs(title = "Gini coefficient"),
basicIndia$layers)
Am I approaching the problem in the wrong way? Is this something that cannot be done?
Another way to approach the problem would be to use ggplot_build().
Make a ggplot_build object using:
indiaBuild <- ggplot_build(basicIndia)
Instead of your step 2 we could now use:
indiaBuild$plot$layers <- c(indiaBuild$plot$layers,
geom_sf(data=distSF, fill='gold')[[1]])
You can change various parts of the ggplot_build object then including the title:
indiaBuild$plot$labels$title <- 'Gini coefficient'
When finished you can extract just the plot using p <- indiaBuild$plot
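A different way to avoid the repetition, not taken from the answer above, is to keep the shared layers in a plain list and wrap the variable part in a small function, since ggplot2 accepts a list of layers on the right-hand side of `+`. The sketch below uses toy tiles and a path instead of the sf data so that it runs without downloads; `districts`, `outline`, `border_layers` and `make_map()` are hypothetical names, and the same pattern would be assumed to carry over to `geom_sf()` layers.

```r
library(ggplot2)

# Toy stand-ins (assumptions): a grid of "districts" and one outline path
districts <- expand.grid(x = 1:4, y = 1:3)
districts$gini <- seq(0.2, 0.5, length.out = nrow(districts))
districts$literacy <- rev(districts$gini)
outline <- data.frame(x = c(0.5, 4.5, 4.5, 0.5, 0.5),
                      y = c(0.5, 0.5, 3.5, 3.5, 0.5))

# Layers shared by every map, stored once as a plain list
border_layers <- list(
  geom_path(data = outline, aes(x, y), colour = "blue"),
  theme_dark()
)

# Hypothetical helper: the data layer goes first, the borders are drawn on top
make_map <- function(variable, title) {
  ggplot() +
    geom_tile(data = districts, aes(x, y, fill = .data[[variable]]),
              colour = "white") +
    border_layers +
    labs(title = title, fill = variable)
}

gini_map <- make_map("gini", "Gini coefficient")
literacy_map <- make_map("literacy", "Literacy rate")
```

Each call returns an ordinary ggplot object, so titles, scales, or themes can still be changed per map with `+`.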

Faulty legend in R with "color_scale_manual"

Can someone please tell me why my legend is not displaying correctly? (The point in the legend for Hypericin is filled green and not blue.)
Here is my code:
ggplot(df,aes(x=x, y=y))+
labs(list(title='MTT_darktox',x=expression('Concentration['*mu*'M]'),y='Survival[%]'))+
scale_x_continuous(breaks=seq(0,50,2.5))+
scale_y_continuous(breaks=seq(0,120,20))+
expand_limits(y=c(0,120))+
geom_point(data=df,shape = 21, size = 3, aes(colour='Hypericin'), fill='blue')+
geom_errorbar(data=df,aes(ymin=y-sd1, ymax=y+sd1),width = 0.8, colour='blue')+
geom_line(data=df,aes(colour='Hypericin'), size = 0.8)+
geom_point(data=df2,shape = 21, size = 3, aes(colour='#212'), fill='green')+
geom_errorbar(data=df2,aes(ymin=y-sd1, ymax=y+sd1),width = 0.8, colour='green')+
geom_line(data=df2,aes(colour='#212'), size = 0.8)+
scale_colour_manual(name='Batch_Nr', values=c('Hypericin'='blue','#212' ='green'))
Thank you!
R Plot
It would definitely help to see some data for reproducibility.
Guessing the structure of your data results in something like this.
# create some fake data:
df <- data.frame(x = rep(1:10, 2),
y = c(1/1:10, 1/1:10 + 1),
error = c(rnorm(10, sd = 0.05), rnorm(10, sd = 0.1)),
group = rep(c("Hypericin", "#212"), each = 10))
Which can be plotted like this:
# plot the data:
library(ggplot2)
ggplot(df, aes(x = x, y = y, color = group)) +
geom_line() +
geom_point() +
geom_errorbar(aes(ymin = y - error, ymax = y + error)) +
scale_colour_manual(name='Batch_Nr',
values = c("Hypericin" = "blue", "#212" = "green"))
Which results in a plot like this:
Explanation
First of all, you don't need to add the data = df in the ggplot-functions if you already defined that in the first ggplot-call.
Furthermore, ggplot likes tidy data best (a.k.a. the long format; read more about that here: http://vita.had.co.nz/papers/tidy-data.pdf). Thus, adding two datasets (df and df2) is possible, but merging them so that every variable lives in one dataset has the advantage that it's also easier for you to understand your data.
Your error (a green point instead of a blue one) came from this confusion. In line 6 you stated that fill = "blue", which you don't change later (i.e., you don't specify something like scale_fill_manual(...)).
Does that give you what you want?
Lastly, for future questions, please make sure that you follow the MWE-principle (Minimal-Working-Example) to make the life of everyone trying to answer the question a bit easier: How to make a great R reproducible example?
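For filled point shapes such as shape 21, one way to keep the legend consistent is to map both colour and fill to the grouping variable and give both scales the same name and values, so that ggplot2 merges them into a single legend. This is a minimal sketch on made-up data in the long format described above; `dat` and `batch_cols` are illustrative names.

```r
library(ggplot2)

# Made-up long-format data: one row per observation, one column for the group
dat <- data.frame(
  x = rep(1:10, 2),
  y = c(1 / 1:10, 1 / 1:10 + 1),
  group = rep(c("Hypericin", "#212"), each = 10)
)
batch_cols <- c("Hypericin" = "blue", "#212" = "green")

# Outline (colour) and interior (fill) are both mapped to group, and both
# scales share a name and values so they collapse into one legend
p <- ggplot(dat, aes(x, y, colour = group, fill = group)) +
  geom_line() +
  geom_point(shape = 21, size = 3) +
  scale_colour_manual(name = "Batch_Nr", values = batch_cols) +
  scale_fill_manual(name = "Batch_Nr", values = batch_cols)
```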
Thank you very much for your help! I will consider the merging for future code.
Meanwhile, I found another solution to get what I wanted without changing everything (although probably not the cleanest way). I just added another line to override the legend appearance:
guides(colour= guide_legend(override.aes=list(linetype=c(1,1)
,shape=c(16,16))))
resulting in :
R plot new

R - Add legend to ggmap when no data is available

I am trying to figure out how to display a map including the legend with ggmap/ggplot.
I have gotten so far:
library(ggmap)
library(RColorBrewer)
bbox <- c(8.437526,47.328268,8.605915,47.462160)
map.base <- get_map(maptype='toner',source = 'stamen',location = bbox)
ggmap(map.base) +
geom_blank() +
ggtitle("2015-09-21 06:00:00 CEST") +
scale_colour_manual(values = rev(brewer.pal(7,"Spectral")), drop = FALSE)+
scale_size_manual(values=c(1:7), drop = FALSE) +
guides(color=guide_legend(title='Mean Delay [s]'), size = guide_legend(title='Mean Delay [s]'))
# ggsave() saves the last plot by default; it is called on its own rather than added with +
ggsave(file=paste("map_","2015-09-21 060000",".png",sep=""),dpi = 100)
dev.off()
This generates the correct map in the correct bounding box. But even though I have specified "scale_colour_manual" and "scale_size_manual" with "drop = FALSE", no legend appears. How can I have the legend shown when there is no data to display?
The overall intention is to create a single map of a given interval in a time series. Now the problem is that some intervals have no data and so the map is displayed without a scale. If the map does not have a scale the dimensions of the map are different making it impossible to create a movie out of the different maps. That is why I need to be able to create a map WITHOUT data but WITH the legend showing.
Thank you.
Taking into account Jaap's comment that the legend variable has to be mapped inside aes, I have been able to achieve what I want with the following code:
library(ggmap)
library(RColorBrewer)
bbox <- c(8.437526,47.328268,8.605915,47.462160)
map.base <- get_map(maptype='toner',source = 'stamen',location = bbox)
ggmap(map.base) +
geom_point(aes(x=0,y=0, color=cut(0,breaks = c(-Inf,0,60,120,240,300,360,Inf),right = FALSE), size=cut(0,breaks = c(-Inf,0,60,120,240,300,360,Inf),right = FALSE))) +
ggtitle("2015-09-21 06:00:00 CEST") +
scale_colour_manual(values = rev(brewer.pal(7,"Spectral")), drop = FALSE)+
scale_size_manual(values=c(1:7), drop = FALSE) +
guides(color=guide_legend(title='Mean Delay [s]'), size = guide_legend(title='Mean Delay [s]'))
# ggsave() saves the last plot by default; it is called on its own rather than added with +
ggsave(file=paste("map_","2015-09-21 060000",".png",sep=""),dpi = 100)
dev.off()
I know this is not the most elegant way, but it works for now.
I basically make a dummy point outside of the bounding box to be displayed. I then give the point a value, which is cut according to the breaks I want and then colored and sized accordingly. Just remember to put the values of x and y in aes outside of the bounding box.
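The mechanism can be reproduced without a map tile download in the sketch below, which assumes that plain `ggplot()` with `coord_cartesian()` is a fair stand-in for `ggmap(map.base)`. The names `dummy` and `delay` are made up; the breaks, palette, and bounding box are those from the code above. The key point is that `cut()` returns a factor carrying all seven levels, and `drop = FALSE` keeps the unused ones in the legend.

```r
library(ggplot2)

bbox <- c(left = 8.437526, bottom = 47.328268, right = 8.605915, top = 47.462160)
breaks <- c(-Inf, 0, 60, 120, 240, 300, 360, Inf)

# One placeholder row at (0, 0), well outside the bounding box
dummy <- data.frame(x = 0, y = 0)
dummy$delay <- cut(0, breaks = breaks, right = FALSE)  # factor with all 7 levels

p <- ggplot(dummy, aes(x, y, colour = delay, size = delay)) +
  geom_point() +
  # Zoom to the bounding box; the dummy point falls outside the visible area
  coord_cartesian(xlim = unname(bbox[c("left", "right")]),
                  ylim = unname(bbox[c("bottom", "top")])) +
  # drop = FALSE keeps levels that have no data in the legend
  scale_colour_manual(values = rev(RColorBrewer::brewer.pal(7, "Spectral")),
                      drop = FALSE) +
  scale_size_manual(values = 1:7, drop = FALSE) +
  labs(colour = "Mean Delay [s]", size = "Mean Delay [s]")
```

Because every interval then produces a legend of the same size, the plot panel should keep the same dimensions from frame to frame.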
Better solutions are welcome.
