I'm working on a population pyramid that should be saved as a gif, much like in this FlowingData tutorial, but with ggplot instead of plotrix.
My workflow:
1) Create a population pyramid
2) Create multiple pyramid plots in a for-loop
library(dplyr)   # filter(), %>%
library(ggplot2)
for (i in unique(d$jahr)) {
d_jahr <- d %>%
filter(jahr == i)
p <- ggplot(data = d_jahr, aes(x = anzahl, y = value, fill = art)) +
geom_bar(data = filter(d_jahr, art == "w"), stat = "identity") +
geom_bar(data = filter(d_jahr, art == "m"), stat = "identity") +
coord_flip() +
labs(title = paste(i), x = NULL, y = NULL)
ggsave(p,filename=paste("img/",i,".png",sep=""))
}
3) Save the plots as gif with the animation package
My problem:
Each year has different values, so the x-axis range differs from plot to plot. This looks odd in a gif, because the center of the plot jumps to the right, to the left, to the right...
Is it possible to fix the x-axis (in this case the y-axis, because of coord_flip()) across multiple plots that are created independently?
You can fix the range of an axis by setting the limits parameter:
library(ggplot2)
lst <- list(
data.frame(x = 1:100, y=runif(100, 0, 10)),
data.frame(x = 1:100, y=runif(100, 0, 100))
)
ylim <- range(do.call(c, lapply(lst, "[[", "y")))
for (x in seq(lst)) {
print(ggplot(lst[[x]], aes(x, y)) + geom_point() + scale_y_continuous(limits=ylim))
}
or by adding + ylim(ylim) instead of + scale_y_continuous(limits = ylim) (via @DeveauP).
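Applied to the pyramid loop from the question, the same idea might look like the sketch below. The data frame is made up for illustration: the alter (age group) column and the convention of storing men as negative values are assumptions, not taken from the original post.

```r
library(ggplot2)

# Hypothetical data: two years, five age groups, two sexes.
set.seed(42)
d <- expand.grid(jahr = c(2000, 2010), alter = seq(0, 80, 20), art = c("m", "w"))
d$value <- round(runif(nrow(d), 100, 1000) * ifelse(d$jahr == 2010, 3, 1))
d$value <- ifelse(d$art == "m", -d$value, d$value)  # assumption: men on the left

# One symmetric limit computed over ALL years, so every frame shares it
# and the centre line stays in the same place.
lim <- c(-1, 1) * max(abs(d$value))

plots <- lapply(unique(d$jahr), function(i) {
  ggplot(d[d$jahr == i, ], aes(x = alter, y = value, fill = art)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(limits = lim) +
    labs(title = as.character(i), x = NULL, y = NULL)
})
```

Each element of plots can then be passed to ggsave() as in the original loop. The key point is that lim is computed once, outside the loop, from the full data set.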
Related
How can I reformat this ridgeline plot so that is a vertical ridgeline plot?
My real dataset is the actual PDF. For a minimum reproducible example, I generate distributions and extract the PDFs to use in a dummy function. The dataframe has a model name (for grouping), x values paired with PDF ordinates, and an id field that separates the different ridgeline levels (i.e., ridgeline y axis).
set.seed(123)
makedfs <- function(name, id, mu, sig) {
vals <- exp(rnorm(1000, mean=mu, sd=sig))
pdf <-density(vals)
model <- rep(name, length(pdf$x))
prox <- rep(id, length(pdf$x))
df <- data.frame(model, prox, pdf$x, pdf$y)
colnames(df) <- c("name", "id", "x", "pdf")
return(df)
}
df1 <- makedfs("model1", 0, log(1), 1)
df2 <- makedfs("model2", 0, log(0.5), 2)
df3 <- makedfs("model1", 1, log(0.2), 0.8)
df4 <- makedfs("model2", 1, log(1), 1)
df <- rbind(df1, df2, df3, df4)
From this answer, R Ridgeline plot with multiple PDFs can be overlayed at same level, I have a standard joyplot:
ggplot(df, aes(x=x, y=id, height = pdf, group = interaction(name, id), fill = name)) +
geom_ridgeline(alpha = 0.5, scale = .5) +
scale_y_continuous(limits = c(0, 5)) +
scale_x_continuous(limits = c(-6, 6))
I am trying the code below based on https://wilkelab.org/ggridges/reference/geom_vridgeline.html but it throws an error on the width parameter.
p <- ggplot(df, aes(x=id, y=x, width = ..density.., fill=id)) +
geom_vridgeline(stat="identity", trim=FALSE, alpha = 0.85, scale = 2)
Error in `f()`:
! Aesthetics must be valid computed stats. Problematic aesthetic(s): width = ..density...
Did you map your stat in the wrong layer?
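If you want the same graph, just vertically oriented, use the same parameters with geom_vridgeline, and map width to your precomputed pdf column rather than to ..density.. (with stat = "identity" no density is computed, hence the error).
I swapped the limits you originally set so you can see that it's the same.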
ggplot(df, aes(x = id, y = x, width = pdf, fill = name,
group = interaction(name, id))) +
geom_vridgeline(alpha = 0.85, scale = .5) +
scale_x_continuous(limits = c(0, 5)) + # <-- note that the x & y switched
scale_y_continuous(limits = c(-6, 6))
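To isolate the point about the width aesthetic, here is a minimal sketch with a single ridge. The data frame ridge and its column names are made up for illustration. The density is computed up front with density(), so the geom's default identity stat only needs a plain column to map to width.

```r
library(ggplot2)
library(ggridges)

set.seed(1)
dens <- density(rnorm(500))  # precompute the PDF, as in the question

# Hypothetical single-ridge data frame: one id, x values, PDF ordinates
ridge <- data.frame(id = 1, x = dens$x, pdf = dens$y)

# width takes the precomputed column; no computed stat variable is involved
p <- ggplot(ridge, aes(x = id, y = x, width = pdf)) +
  geom_vridgeline(alpha = 0.85, scale = 0.5)
```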
I'd like to plot a histogram and a density curve on the same plot. To the code below I would like to add a custom y-axis label along the lines of sprintf("[%s] %s", ..density.., ..count..), i.e. two numbers at each tick. Is this possible with scale_y_continuous, or do I need a workaround?
Below is my current progress using scales::trans_new and sec_axis. The sec_axis version is acceptable, but the most desirable output is the one shown in the image at the end.
library(ggplot2)
library(scales)  # trans_new(), percent
set.seed(1)
var <- rnorm(4000)
binwidth <- 2 * IQR(var) / length(var) ^ (1 / 3)
count_and_proportion_label <- function(x) {
sprintf("%s [%.2f%%]", x, x/sum(x) * 100)
}
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(binwidth = binwidth) +
geom_density(aes(y = ..count.. * binwidth)) +
scale_y_continuous(
# this way
trans = trans_new(name = "count_and_proportion",
format = count_and_proportion_label,
transform = function(x) x,
inverse = function(x) x),
# or this way
sec.axis = sec_axis(trans = ~./sum(.),
labels = percent,
name = "proportion (in %)")
)
I also tried to build the breaks beforehand from the output of graphics::hist, but the two histograms differ.
bins <- (max(var) - min(var))/binwidth
hdata <- hist(var, breaks = bins, right = FALSE)
# hist generates different bins than `ggplot2`
At the end I would like to get something like this:
Would it be acceptable to add percentage as a secondary axis? E.g.
your_plot + scale_y_continuous(sec.axis = sec_axis(~.*2, name = "[%]"))
Perhaps it would be possible to overlay the secondary axis on the primary one, but I'm not sure how you would go about doing that.
You can achieve your desired output by creating a custom set of labels, and adding it to the plot:
library(tidyverse)
library(ggplot2)
set.seed(1)
var <- rnorm(400)
bins <- .1
df <- data.frame(yvals = seq(0, 20, 5), labels = c("[0%]", "[10%]", "[20%]", "[30%]", "[40%]"))
df <- df %>% tidyr::unite("custom_labels", labels, yvals, sep = " ", remove = TRUE)
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(aes(y = ..count..), binwidth = bins) +
geom_density(aes(y = ..count.. * bins), color = "black", alpha = 0.7) +
ylab("[density] count") +
scale_y_continuous(breaks = seq(0, 20, 5), labels = df$custom_labels)
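As an alternative to hard-coding the label strings, the labels argument of scale_y_continuous also accepts a function of the break values. The sketch below is one possible variation, not the method used in the answer above. The helper name count_and_density is made up, and the conversion density = count / (n * binwidth) assumes equal-width bins. It uses after_stat(), which needs ggplot2 3.3.0 or later, in place of the older ..count.. notation.

```r
library(ggplot2)

set.seed(1)
var <- rnorm(4000)
n <- length(var)
binwidth <- 2 * IQR(var) / n^(1 / 3)

# Hypothetical helper: receives the break positions (counts) and returns
# one "[density] count" string per tick.
count_and_density <- function(count) {
  sprintf("[%.3f] %s", count / (n * binwidth), count)
}

p <- ggplot(data.frame(var = var), aes(x = var)) +
  geom_histogram(binwidth = binwidth) +
  geom_density(aes(y = after_stat(count) * binwidth)) +
  scale_y_continuous(labels = count_and_density) +
  ylab("[density] count")
```

Because the labels are computed from whatever breaks ggplot2 picks, they stay correct if the data or the bin width change.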
Is it possible to use the ggridges package to draw sets of bars instead of ridgelines, similar to geom_col()?
I have data such as:
dt = tibble(
hr = c(1,2,3,4,1,2,3,4),
fr = c(.1,.5,.9,.1,.4,.9,.9,.4),
gr = c('Mon','Mon','Mon','Mon','Sun','Sun','Sun','Sun')
)
The plot below gives me:
ggplot(dt, aes(x=hr, y=gr, height=fr)) +
geom_ridgeline() + ylab(NULL)
As you can see it draws a line connecting the values. What I am looking for instead are individual columns, as in this plot:
ggplot(dt, aes(x=hr, y=fr)) +
geom_col() + ylab(NULL) +
facet_wrap(~gr)
Here is a solution tracing out the individual bars.
library(tidyverse)
library(ggridges)
dt = tibble(
hr = c(1,2,3,4,1,2,3,4),
fr = c(.1,.5,.9,.1,.4,.9,.9,.4),
gr = c('Mon','Mon','Mon','Mon','Sun','Sun','Sun','Sun')
)
# function that turns an x, y pair into the shape of a bar of given width
make_bar <- function(x, y, width = 0.9) {
xoff <- width/2
data.frame(x = c(x-xoff*(1+2e-8), x-xoff*(1+1e-8), x-xoff, x+xoff, x+xoff*(1+1e-8), x+xoff*(1+2e-8)),
height = c(NA, 0, y, y, 0, NA))
}
# convert data table using make_bar function
dt %>%
mutate(bars = map2(hr, fr, ~make_bar(.x, .y))) %>%
unnest(cols = bars) -> dt_bars  # name the list-column; bare unnest() is deprecated
ggplot(dt_bars, aes(x=x, y=gr, height=height)) +
geom_ridgeline() + ylab(NULL)
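To see what make_bar produces, it can help to call it on a single point. The sketch below repeats the function so that it runs on its own. The explanation of the NA and 0 rows is my reading of the code, not something stated in the answer: the zeros appear to draw the vertical sides of the bar, and the NA heights appear to break the ridgeline so that neighbouring bars are not joined.

```r
# Same function as in the answer above, repeated for a standalone example
make_bar <- function(x, y, width = 0.9) {
  xoff <- width / 2
  data.frame(
    x = c(x - xoff * (1 + 2e-8), x - xoff * (1 + 1e-8), x - xoff,
          x + xoff, x + xoff * (1 + 1e-8), x + xoff * (1 + 2e-8)),
    height = c(NA, 0, y, y, 0, NA)
  )
}

# One observation (x = 2, height = 0.5) becomes six outline points
bar <- make_bar(2, 0.5)
print(bar)
```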
I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_point() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
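library(ggplot2)
library(dplyr)     # filter()
library(reshape2)  # melt()
library(grid)      # pushViewport(), viewport()
xseq <- 1:100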
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for the first item and "blank" (no line) for the others.
Similarly, the first item gets no shape (NA) and the others get whatever shapes you need (7 and 8 here are just examples). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ for help choosing shapes.
If you are happy with dots, you can use my_shapes = c(NA,16,16), and scale_shape_manual(...) is not needed.
my_shapes = c(NA,7,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("solid", "blank","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines
I have some charts created with ggplot2 which I would like to embed in a web application: I'd like to enhance the plots with tooltips. I've looked into several options. I'm currently experimenting with the rCharts library and, among others, dimple plots.
Here is the original ggplot:
Here is a first attempt to transpose this to a dimple plot:
I have several issues:
1) After formatting the y-axis with percentages, the data is altered.
2) After formatting the x-axis to correctly render dates, too many labels are printed.
I am not tied to dimple charts, so if there are other options that allow for an easier way to tweak axis formats I'd be happy to know. (the Morris charts look nice too, but tweaking them looks even harder, no?)
Objective: Fix the axes and add tooltips that give both the date (in the format 1984) and the value (in the format 40%).
If I can fix 1 and 2, I'd be very happy. But here is another, less important question, in case someone has suggestions:
Could I add the line labels ("Top 10%") to the tooltips when hovering over the lines?
After downloading the data from: https://gist.github.com/ptoche/872a77b5363356ff5399,
a data frame is created:
df <- read.csv("ps-income-shares.csv")
The basic dimple plot is created with:
library("rCharts")
p <- dPlot(
value ~ Year,
groups = c("Fractile"),
data = transform(df, Year = as.character(format(as.Date(Year), "%Y"))),
type = "line",
bounds = list(x = 50, y = 50, height = 300, width = 500)
)
While basic, so far so good. However, the following command, intended to convert the y-data to percentages, alters the data:
p$yAxis(type = "addMeasureAxis", showPercent = TRUE)
What am I doing wrong with showPercent?
For reference, here is the ggplot code:
library("ggplot2")
library("scales")
p <- ggplot(data = df, aes(x = Year, y = value, color = Fractile))
p <- p + geom_line()
p <- p + theme_bw()
p <- p + scale_x_date(limits = as.Date(c("1911-01-01", "2023-01-01")), labels = date_format("%Y"))
p <- p + scale_y_continuous(labels = percent)
p <- p + theme(legend.position = "none")
p <- p + geom_text(data = subset(df, Year == "2012-01-01"), aes(x = Year, label = Fractile, hjust = -0.2), size = 4)
p <- p + xlab("")
p <- p + ylab("")
p <- p + ggtitle("U.S. top income shares (%)")
p
For information, the chart above is based on the data put together by Thomas Piketty and Emmanuel Saez in their study of U.S. top incomes. The data and more may be found on their website, e.g.
http://elsa.berkeley.edu/users/saez/
http://piketty.pse.ens.fr/en/
EDIT:
Here is a screenshot of Ramnath's solution, with a title added and axis labels tweaked. Thanks Ramnath!
p$xAxis(inputFormat = '%Y-%m-%d', outputFormat = '%Y')
p$yAxis(outputFormat = "%")
p$setTemplate(afterScript = "
<script>
myChart.axes[0].timeField = 'Year'
myChart.axes[0].timePeriod = d3.time.years
myChart.axes[0].timeInterval = 10
myChart.draw()
myChart.axes[0].titleShape.remove() // remove x label
myChart.axes[1].titleShape.remove() // remove y label
myChart.svg.append('text') // chart title
.attr('x', 40)
.attr('y', 20)
.text('U.S. top income shares (%)')
.style('text-anchor','start')
.style('font-size', '100%')
.style('font-family','sans-serif')
</script>
")
p
To change (rather than remove) axis labels, for instance:
myChart.axes[1].titleShape.text('Year')
To add a legend to the plot:
p$set(width = 1000, height = 600)
p$legend(
x = 580,
y = 0,
width = 50,
height = 200,
horizontalAlign = "left"
)
To save the rchart:
p$save("ps-us-top-income-shares.html", cdn = TRUE)
An alternative based on the nvd3 library can be obtained (without any of the fancy stuff) with:
df$Year <- strftime(df$Year, format = "%Y")
n <- nPlot(data = df, value ~ Year, group = 'Fractile', type = 'lineChart')
Here is one way to solve (1) and (2). The showPercent argument does not append % to the values; it recomputes them so that they stack up to 100%, which is why you are seeing the behavior you pointed out.
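To make the effect of that recomputation concrete, here is a small sketch in plain R. The numbers are made up, and the code only mimics the described behavior (rescaling values within each year so they sum to 1); it does not call dimple or rCharts.

```r
# Toy data: two fractiles observed in two years (values are invented)
toy <- data.frame(
  Year     = c("2011", "2011", "2012", "2012"),
  Fractile = c("Top 10%", "Top 1%", "Top 10%", "Top 1%"),
  value    = c(0.48, 0.20, 0.50, 0.22)
)

# Share of each row within its year; these no longer equal the original values
toy$stacked_share <- ave(toy$value, toy$Year, FUN = function(v) v / sum(v))
print(toy)
```

The stacked_share column sums to 1 within each year, so an income share of 0.48 would be drawn as roughly 0.71. This matches the "altered data" symptom described in the question.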
At this point, you will see that we are still having to write custom javascript to tweak the x-axis to get it to display the way we want it to. In future iterations, we will strive to allow the entire dimple API to be accessible within rCharts.
df <- read.csv("ps-income-shares.csv")
p <- dPlot(
value ~ Year,
groups = c("Fractile"),
data = df,
type = "line",
bounds = list(x = 50, y = 50, height = 300, width = 500)
)
p$xAxis(inputFormat = '%Y-%m-%d', outputFormat = '%Y')
p$yAxis(outputFormat = "%")
p$setTemplate(afterScript = "
<script>
myChart.axes[0].timeField = 'Year'
myChart.axes[0].timePeriod = d3.time.years
myChart.axes[0].timeInterval = 5
myChart.draw()
//if we wanted to change our line width to match the ggplot chart
myChart.series[0].shapes.style('stroke-width',1);
</script>
")
p
rCharts is rapidly evolving. I know it is late, but in case someone else would like to see it, here is an almost complete replication of the ggplot sample shown.
#For information, the chart above is based
#on the data put together by Thomas Piketty and Emmanuel Saez
#in their study of U.S. top incomes.
#The data and more may be found on their website, e.g.
#http://elsa.berkeley.edu/users/saez/
#http://piketty.pse.ens.fr/en/
#read in the data
df <- read.csv(
"https://gist.githubusercontent.com/ptoche/872a77b5363356ff5399/raw/ac86ca43931baa7cd2e17719025c8cde1c278fc1/ps-income-shares.csv",
stringsAsFactors = FALSE
)
#get year as date
df$YearDate <- as.Date(df$Year)
library("ggplot2")
library("scales")
p <- ggplot(data = df, aes(x = YearDate, y = value, color = Fractile))
p <- p + geom_line()
p <- p + theme_bw()
p <- p + scale_x_date(limits = as.Date(c("1911-01-01", "2023-01-01")), labels = date_format("%Y"))
p <- p + scale_y_continuous(labels = percent)
p <- p + theme(legend.position = "none")
p <- p + geom_text(data = subset(df, Year == "2012-01-01"), aes(x = YearDate, label = Fractile, hjust = -0.2), size = 4)
p <- p + xlab("")
p <- p + ylab("")
p <- p + ggtitle("U.S. top income shares (%)")
gp <- p
gp
p <- dPlot(
value ~ Year,
groups = c("Fractile"),
data = df,
type = "line",
bounds = list(x = 50, y = 50, height = 300, width = 500)
)
p$xAxis(inputFormat = '%Y-%m-%d', outputFormat = '%Y')
p$yAxis(outputFormat = "%")
p$setTemplate(afterScript = "
<script>
myChart.axes[0].timeField = 'Year'
myChart.axes[0].timePeriod = d3.time.years
myChart.axes[0].timeInterval = 5
myChart.draw()
//if we wanted to change our line width to match the ggplot chart
myChart.series[0].shapes.style('stroke-width',1);
//to take even one step further
//we can add labels like in the ggplot example
myChart.svg.append('g')
.selectAll('text')
.data(
d3.nest().key(function(d){return d.cx}).map(myChart.series[0]._positionData)[myChart.axes[0]._max])
.enter()
.append('text')
.text(function(d){return d.aggField[0]})
.attr('x',function(d){return myChart.axes[0]._scale(d.cx)})
.attr('y',function(d){return myChart.axes[1]._scale(d.cy)})
.attr('dy','0.5em')
.style('font-size','80%')
.style('fill',function(d){return myChart._assignedColors[d.aggField[0]].fill})
</script>
")
p$defaultColors(ggplot_build(gp)$data[[2]]$colour)
p