R: interactive plots (tooltips): rCharts dimple plot: formatting axis

I have some charts created with ggplot2 which I would like to embed in a web application: I'd like to enhance the plots with tooltips. I've looked into several options. I'm currently experimenting with the rCharts library and, among others, dimple plots.
Here is the original ggplot:
Here is a first attempt to transpose this to a dimple plot:
I have several issues:
1. After formatting the y-axis with percentages, the data is altered.
2. After formatting the x-axis to correctly render dates, too many labels are printed.
I am not tied to dimple charts, so if there are other options that allow for an easier way to tweak axis formats I'd be happy to know. (the Morris charts look nice too, but tweaking them looks even harder, no?)
Objective: Fix the axes and add tooltips that give both the date (in the format 1984) and the value (in the format 40%).
If I can fix 1 and 2, I'd be very happy. But here is another, less important question, in case someone has suggestions:
Could I add the line labels ("Top 10%") to the tooltips when hovering over the lines?
After downloading the data from: https://gist.github.com/ptoche/872a77b5363356ff5399,
a data frame is created:
df <- read.csv("ps-income-shares.csv")
The basic dimple plot is created with:
library("rCharts")
p <- dPlot(
value ~ Year,
groups = c("Fractile"),
data = transform(df, Year = as.character(format(as.Date(Year), "%Y"))),
type = "line",
bounds = list(x = 50, y = 50, height = 300, width = 500)
)
While basic, so far so good. However, the following command, intended to convert the y-data to percentages, alters the data:
p$yAxis(type = "addMeasureAxis", showPercent = TRUE)
What am I doing wrong with showPercent?
For reference, here is the ggplot code:
library("ggplot2")
library("scales")
p <- ggplot(data = df, aes(x = Year, y = value, color = Fractile))
p <- p + geom_line()
p <- p + theme_bw()
p <- p + scale_x_date(limits = as.Date(c("1911-01-01", "2023-01-01")), labels = date_format("%Y"))
p <- p + scale_y_continuous(labels = percent)
p <- p + theme(legend.position = "none")
p <- p + geom_text(data = subset(df, Year == "2012-01-01"), aes(x = Year, label = Fractile, hjust = -0.2), size = 4)
p <- p + xlab("")
p <- p + ylab("")
p <- p + ggtitle("U.S. top income shares (%)")
p
For information, the chart above is based on the data put together by Thomas Piketty and Emmanuel Saez in their study of U.S. top incomes. The data and more may be found on their website, e.g.
http://elsa.berkeley.edu/users/saez/
http://piketty.pse.ens.fr/en/
EDIT:
Here is a screenshot of Ramnath's solution, with a title added and axis labels tweaked. Thanks Ramnath!
p$xAxis(inputFormat = '%Y-%m-%d', outputFormat = '%Y')
p$yAxis(outputFormat = "%")
p$setTemplate(afterScript = "
<script>
myChart.axes[0].timeField = 'Year'
myChart.axes[0].timePeriod = d3.time.years
myChart.axes[0].timeInterval = 10
myChart.draw()
myChart.axes[0].titleShape.remove() // remove x label
myChart.axes[1].titleShape.remove() // remove y label
myChart.svg.append('text') // chart title
.attr('x', 40)
.attr('y', 20)
.text('U.S. top income shares (%)')
.style('text-anchor','start')
.style('font-size', '100%')
.style('font-family','sans-serif')
</script>
")
p
To change (rather than remove) axis labels, for instance:
myChart.axes[0].titleShape.text('Year')
To add a legend to the plot:
p$set(width = 1000, height = 600)
p$legend(
x = 580,
y = 0,
width = 50,
height = 200,
horizontalAlign = "left"
)
To save the rchart:
p$save("ps-us-top-income-shares.html", cdn = TRUE)
An alternative based on the nvd3 library can be obtained (without any of the fancy stuff) with:
df$Year <- strftime(df$Year, format = "%Y")
n <- nPlot(data = df, value ~ Year, group = 'Fractile', type = 'lineChart')

Here is one way to solve (1) and (2). The showPercent argument does not append % to the values; it rescales them so that they stack up to 100%, which is why you are seeing the behavior you pointed out.
As you will see, we still have to write custom JavaScript to tweak the x-axis so that it displays the way we want. In future iterations, we will strive to make the entire dimple API accessible from within rCharts.
library("rCharts")
df <- read.csv("ps-income-shares.csv")
p <- dPlot(
value ~ Year,
groups = c("Fractile"),
data = df,
type = "line",
bounds = list(x = 50, y = 50, height = 300, width = 500)
)
p$xAxis(inputFormat = '%Y-%m-%d', outputFormat = '%Y')
p$yAxis(outputFormat = "%")
p$setTemplate(afterScript = "
<script>
myChart.axes[0].timeField = 'Year'
myChart.axes[0].timePeriod = d3.time.years
myChart.axes[0].timeInterval = 5
myChart.draw()
//if we wanted to change our line width to match the ggplot chart
myChart.series[0].shapes.style('stroke-width',1);
</script>
")
p
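To make the difference between rescaling and formatting concrete, here is a minimal base R sketch. The toy data frame and its values are made up for illustration, and the rescaling is only an approximation of what a 100%-stacked axis displays, based on the description above; it is not dimple's actual code.

```r
# Hypothetical toy data: two fractiles over two years (values are invented).
toy <- data.frame(
  Year     = c("1984", "1984", "1985", "1985"),
  Fractile = c("Top 10%", "Top 1%", "Top 10%", "Top 1%"),
  value    = c(0.40, 0.10, 0.30, 0.10)
)

# Rescaling: divide each value by its year's total, so each year sums to 1.
# This changes the numbers, which is the effect seen with showPercent = TRUE.
toy$stacked <- toy$value / ave(toy$value, toy$Year, FUN = sum)

# Formatting: the underlying value is untouched; only the label gains a % sign.
toy$label <- sprintf("%.0f%%", 100 * toy$value)

print(toy)
```

In the toy data, the 1984 "Top 10%" value of 0.40 becomes 0.80 after rescaling, while the formatted label still reads 40%.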

rCharts is rapidly evolving. I know it is late, but in case someone else would like to see it, here is an almost complete replication of the ggplot sample shown.
#For information, the chart above is based
#on the data put together by Thomas Piketty and Emmanuel Saez
#in their study of U.S. top incomes.
#The data and more may be found on their website, e.g.
#http://elsa.berkeley.edu/users/saez/
#http://piketty.pse.ens.fr/en/
#read in the data
df <- read.csv(
"https://gist.githubusercontent.com/ptoche/872a77b5363356ff5399/raw/ac86ca43931baa7cd2e17719025c8cde1c278fc1/ps-income-shares.csv",
stringsAsFactors = FALSE
)
#get year as date
df$YearDate <- as.Date(df$Year)
library("ggplot2")
library("scales")
p <- ggplot(data = df, aes(x = YearDate, y = value, color = Fractile))
p <- p + geom_line()
p <- p + theme_bw()
p <- p + scale_x_date(limits = as.Date(c("1911-01-01", "2023-01-01")), labels = date_format("%Y"))
p <- p + scale_y_continuous(labels = percent)
p <- p + theme(legend.position = "none")
p <- p + geom_text(data = subset(df, Year == "2012-01-01"), aes(x = YearDate, label = Fractile, hjust = -0.2), size = 4)
p <- p + xlab("")
p <- p + ylab("")
p <- p + ggtitle("U.S. top income shares (%)")
gp <- p
gp
library("rCharts")
p <- dPlot(
value ~ Year,
groups = c("Fractile"),
data = df,
type = "line",
bounds = list(x = 50, y = 50, height = 300, width = 500)
)
p$xAxis(inputFormat = '%Y-%m-%d', outputFormat = '%Y')
p$yAxis(outputFormat = "%")
p$setTemplate(afterScript = "
<script>
myChart.axes[0].timeField = 'Year'
myChart.axes[0].timePeriod = d3.time.years
myChart.axes[0].timeInterval = 5
myChart.draw()
//if we wanted to change our line width to match the ggplot chart
myChart.series[0].shapes.style('stroke-width',1);
//to take even one step further
//we can add labels like in the ggplot example
myChart.svg.append('g')
.selectAll('text')
.data(
d3.nest().key(function(d){return d.cx}).map(myChart.series[0]._positionData)[myChart.axes[0]._max])
.enter()
.append('text')
.text(function(d){return d.aggField[0]})
.attr('x',function(d){return myChart.axes[0]._scale(d.cx)})
.attr('y',function(d){return myChart.axes[1]._scale(d.cy)})
.attr('dy','0.5em')
.style('font-size','80%')
.style('fill',function(d){return myChart._assignedColors[d.aggField[0]].fill})
</script>
")
p$defaultColors(ggplot_build(gp)$data[[2]]$colour)
p

Related

Plotting geom_segment with position_dodge

I have a data set with information on where individuals work over time. More specifically, I have the interval during which each individual works at a given workplace.
library('tidyverse')
library('lubridate')
# individual A
a_id <- c(rep('A',1))
a_start <- c(201201)
a_end <- c(201212)
a_workplace <-c(1)
# individual B
b_id <- c(rep('B',2))
b_start <- c(201201, 201207)
b_end <- c(201206, 201211)
b_workplace <-c(1, 2)
# individual C
c_id <- c(rep('C',2))
c_start <- c(201201, 201202)
c_end <- c(201204, 201206)
c_workplace <-c(1, 2)
# individual D
d_id <- c(rep('D',1))
d_start <- c(201201)
d_end <- c(201201)
d_workplace <-c(1)
# final data frame
id <- c(a_id, b_id, c_id, d_id)
start <- c(a_start, b_start, c_start, d_start)
end <- c(a_end, b_end, c_end, d_end)
workplace <- as.factor(c(a_workplace, b_workplace, c_workplace, d_workplace))
mydata <- data.frame(id, start, end, workplace)
mydata_ym <- mydata %>%
mutate(ymd_start = as.Date(paste0(start, "01"), format = "%Y%m%d"),
ymd_end0 = as.Date(paste0(end, "01"), format = "%Y%m%d"),
day_end = as.numeric(format(ymd_end0 + months(1) - days(1), format = "%d")),
ymd_end = as.Date(paste0(end, day_end), format = "%Y%m%d")) %>%
select(-ymd_end0, -day_end)
I would like a plot that shows how long each individual works at each workplace and how they move between workplaces. I tried geom_segment, since I have the start and end dates of each individual's spell at each workplace. In addition, because the same individual may work in more than one place during the same month, I would like to use position_dodge so that overlapping workplaces for the same id and time remain visible. This was suggested in this post here: Ggplot (geom_line) with overlaps
ggplot(mydata_ym) +
geom_segment(aes(x = id, xend = id, y = ymd_start, yend = ymd_end),
position = position_dodge(width = 0.1), size = 2) +
scale_x_discrete(limits = rev) +
coord_flip() +
theme(panel.background = element_rect(fill = "grey97")) +
labs(y = "time", title = "Work affiliation")
The problem I am having is that: (i) the position_dodge doesn't seem to be working, (ii) I don't know why all the segments are being colored in black. I would expect each workplace to have a different color and a legend to show up.
If you include colour = workplace in the aes() mapping for geom_segment, you get colours, a legend, and some dodging, but it doesn't work quite right: it looks like position_dodge only applies to x and not to xend. This seems like a bug, or at least an "infelicity", in position_dodge.
However, replacing geom_segment with an appropriate use of geom_linerange does seem to work:
ggplot(mydata_ym) +
geom_linerange(aes(x = id, ymin = ymd_start, ymax = ymd_end, colour = workplace),
position = position_dodge(width = 0.1), size = 2) +
scale_x_discrete(limits = rev) +
coord_flip()
(some tangential components omitted).
A similar approach was previously documented here; it is a near-duplicate of your question once the colour= mapping is taken care of.

Display custom axis labels in ggplot2

I'd like to plot a histogram and a density curve on the same plot. What I would like to add to the following is a custom y-axis label, something like sprintf("[%s] %s", ..density.., ..count..), i.e. two numbers at each tick. Is it possible to obtain this with scale_y_continuous, or do I need a workaround?
Below is my current progress using scales::trans_new and sec_axis. sec_axis is acceptable, but the most desirable output is the one shown in the image below.
library(ggplot2)
library(scales)
set.seed(1)
var <- rnorm(4000)
binwidth <- 2 * IQR(var) / length(var) ^ (1 / 3)
count_and_proportion_label <- function(x) {
sprintf("%s [%.2f%%]", x, x/sum(x) * 100)
}
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(binwidth = binwidth) +
geom_density(aes(y = ..count.. * binwidth)) +
scale_y_continuous(
# this way
trans = trans_new(name = "count_and_proportion",
format = count_and_proportion_label,
transform = function(x) x,
inverse = function(x) x),
# or this way
sec.axis = sec_axis(trans = ~./sum(.),
labels = percent,
name = "proportion (in %)")
)
I've also tried to build the breaks beforehand from the graphics::hist output, but the two histograms differ.
bins <- (max(var) - min(var))/binwidth
hdata <- hist(var, breaks = bins, right = FALSE)
# hist generates different bins than `ggplot2`
At the end I would like to get something like this:
Would it be acceptable to add percentage as a secondary axis? E.g.
your_plot + scale_y_continuous(sec.axis = sec_axis(~.*2, name = "[%]"))
Perhaps it would be possible to overlay the secondary axis on the primary one, but I'm not sure how you would go about doing that.
You can achieve your desired output by creating a custom set of labels, and adding it to the plot:
library(tidyverse)
library(ggplot2)
set.seed(1)
var <- rnorm(400)
bins <- .1
df <- data.frame(yvals = seq(0, 20, 5), labels = c("[0%]", "[10%]", "[20%]", "[30%]", "[40%]"))
df <- df %>% tidyr::unite("custom_labels", labels, yvals, sep = " ", remove = TRUE)
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(aes(y = ..count..), binwidth = bins) +
geom_density(aes(y = ..count.. * bins), color = "black", alpha = 0.7) +
ylab("[density] count") +
scale_y_continuous(breaks = seq(0, 20, 5), labels = df$custom_labels)
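Instead of typing the labels by hand, they can be computed from the breaks. scale_y_continuous() accepts a function for its labels argument, so a small helper can turn each count break into a "[percent] count" string. The sketch below is only an illustration: the helper name count_pct_label and the total n are assumptions, not part of the original answer.

```r
# Hypothetical helper: builds "[pct%] count" labels from count breaks.
# `n` is the total number of observations (an assumption of this sketch).
count_pct_label <- function(breaks, n) {
  sprintf("[%.1f%%] %s", 100 * breaks / n, breaks)
}

# Example with made-up numbers: breaks at 0, 5, 10 out of 50 observations.
labels <- count_pct_label(c(0, 5, 10), n = 50)
print(labels)
```

In a plot this could be passed as scale_y_continuous(labels = function(b) count_pct_label(b, n = length(var))), so the labels follow whatever breaks ggplot2 chooses.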

Using R ggridges (R-joyplot) for bar charts

Is it possible to use the ggridges package to draw sets of bars instead of ridgelines, similar to geom_col()?
I have data such as:
dt = tibble(
hr = c(1,2,3,4,1,2,3,4),
fr = c(.1,.5,.9,.1,.4,.9,.9,.4),
gr = c('Mon','Mon','Mon','Mon','Sun','Sun','Sun','Sun')
)
The plot below gives me:
ggplot(dt, aes(x=hr, y=gr, height=fr)) +
geom_ridgeline() + ylab(NULL)
As you can see it draws a line connecting the values. What I am looking for instead are individual columns, as in this plot:
ggplot(dt, aes(x=hr, y=fr)) +
geom_col() + ylab(NULL) +
facet_wrap(~gr)
Here is a solution tracing out the individual bars.
library(tidyverse)
library(ggridges)
dt = tibble(
hr = c(1,2,3,4,1,2,3,4),
fr = c(.1,.5,.9,.1,.4,.9,.9,.4),
gr = c('Mon','Mon','Mon','Mon','Sun','Sun','Sun','Sun')
)
# function that turns an x, y pair into the shape of a bar of given width
make_bar <- function(x, y, width = 0.9) {
xoff <- width/2
data.frame(x = c(x-xoff*(1+2e-8), x-xoff*(1+1e-8), x-xoff, x+xoff, x+xoff*(1+1e-8), x+xoff*(1+2e-8)),
height = c(NA, 0, y, y, 0, NA))
}
# convert data table using make_bar function
dt %>%
mutate(bars = map2(hr, fr, ~make_bar(.x, .y))) %>%
unnest(cols = bars) -> dt_bars
ggplot(dt_bars, aes(x=x, y=gr, height=height)) +
geom_ridgeline() + ylab(NULL)

Bar/Pie Chart Label from Data Frame Column

I am making a pie chart and want to label it with the value for each slice. I have the information in a data frame but the column in which to look should be defined in the function call.
The code is (decently) long, but I think only 1 line needs to be changed. I have tried mainsym, as.symbol, as.name, quote, and anything else I could think to throw at it, but to no avail.
Thanks
library(dplyr)
library(ggplot2)
library(gridExtra)
pie_chart <- function(df, main, labels, labels_title=NULL) {
mainsym <- as.symbol(main)
labelssym <- as.symbol(labels)
# convert the data into percentages. add label position and inner label text
df <- df %>%
mutate(perc = mainsym / sum(mainsym)) %>%
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",main)) #NEED HELP HERE! Replace 'main' with something
#debug print statement
print(df)
# reorder the category factor levels to order the legend
df[[labels]] <- factor(df[[labels]], levels = unique(df[[labels]]))
p <- ggplot(data = df, aes_(x = factor(1), y = ~perc, fill = labelssym)) +
# make stacked bar chart with black border
geom_bar(stat = "identity", color = "black", width = 1) +
# add the percents and values to the interior of the chart
geom_text(aes(x = 1.25, y = label_pos, label = inner_label_text), size = 4) +
# convert to polar coordinates
coord_polar(theta = "y",direction=-1)
return(p)
}
set.seed(42)
donations <- data.frame(donation_total=sample(1:1E5,50,replace=TRUE))
donation_size_levels_same <- seq(0,2E6,10E3)
donations$bracket <- cut(donations$donation_total,breaks=donation_size_levels_same,right=FALSE,dig.lab = 50)
donations.by_bracket <- donations %>%
group_by(bracket) %>%
summarize(n=n(),total=sum(donation_total)) %>%
ungroup() %>%
arrange(bracket)
grid.arrange(
pie_chart(df=donations.by_bracket,main="n",labels="bracket",labels_title="Total Amount Donated"),
pie_chart(df=donations.by_bracket,main="total",labels="bracket",labels_title="Total Amount Donated"))
The label placement still needs some adjustment but this seems to address the labelling issue, if you just replace that one line (where you say need help here) as follows:
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",as.character(df[[main]])))
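Another way to look up a column whose name arrives as a string is dplyr's .data pronoun, which avoids building symbols by hand. The following is a minimal sketch of the label computation only; the helper name add_labels and the toy data are hypothetical, and it assumes a dplyr version that supports .data[[...]].

```r
library(dplyr)

# Hypothetical helper: `main` is the name of the value column, given as a string.
add_labels <- function(df, main) {
  df %>%
    mutate(
      perc = .data[[main]] / sum(.data[[main]]),
      label_pos = 1 - cumsum(perc) + perc / 2,
      # the label combines the rounded percentage with the raw value
      inner_label_text = paste0(round(perc * 100), "%\n", .data[[main]])
    )
}

# Toy data (made up) to show the result.
toy <- data.frame(bracket = c("a", "b"), n = c(1, 3))
out <- add_labels(toy, "n")
print(out)
```

The same pattern would let the perc line inside pie_chart refer to the column by name rather than through mainsym.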

Set axis for multiple plots in ggplot

I'm working on a population pyramid that should be saved as a gif. Kind of like in this tutorial of Flowing Data, but with ggplot instead of plotrix.
My workflow:
1) Create a population pyramid
2) Create multiple pyramid plots in a for-loop
for (i in unique(d$jahr)) {
d_jahr <- d %>%
filter(jahr == i)
p <- ggplot(data = d_jahr, aes(x = anzahl, y = value, fill = art)) +
geom_bar(data = filter(d_jahr, art == "w"), stat = "identity") +
geom_bar(data = filter(d_jahr, art == "m"), stat = "identity") +
coord_flip() +
labs(title = paste(i), x = NULL, y = NULL)
ggsave(p,filename=paste("img/",i,".png",sep=""))
}
3) Save the plots as gif with the animation package
My problem:
All years have different values, so the x-axes have different ranges. This looks odd in a gif, because the center of the plot jumps to the right, to the left, to the right...
Is it possible to fix the x-axis (in this case the y-axis, because of coord_flip()) across multiple plots that are created independently?
You can fix the range of an axis by setting the limits parameter:
library(ggplot2)
lst <- list(
data.frame(x = 1:100, y=runif(100, 0, 10)),
data.frame(x = 1:100, y=runif(100, 0, 100))
)
ylim <- range(do.call(c, lapply(lst, "[[", "y")))
for (x in seq(lst)) {
print(ggplot(lst[[x]], aes(x, y)) + geom_point() + scale_y_continuous(limits=ylim))
}
or by adding +ylim(ylim) instead of +scale_y_continuous(limits=ylim) (via @DeveauP).
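For a pyramid specifically, symmetric limits keep the center line in the same place in every frame. The sketch below uses a made-up stand-in for d and assumes the values for one group ("m") are stored as negatives so that their bars extend to the left; neither is confirmed by the question.

```r
# Hypothetical stand-in for `d`; 'm' values are negative by assumption.
d <- data.frame(
  jahr  = c(2000, 2000, 2001, 2001),
  art   = c("w", "m", "w", "m"),
  value = c(120, -110, 150, -160)
)

# Largest absolute value over ALL years, computed once before the loop.
m <- max(abs(d$value))
lims <- c(-m, m)
print(lims)
```

Adding scale_y_continuous(limits = lims) to each plot inside the loop would then give every frame the same range, centered on zero.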

Resources