cumulative plot using ggplot2 - r

I'm learning to use ggplot2 and am looking for the smallest ggplot2 code that reproduces the base::plot result below. I've tried a few things and they all ended up being horrendously long, so I'm looking for the smallest expression and ideally would like to have the dates on the x-axis (which are not there in the plot below).
df = data.frame(date = c(20121201, 20121220, 20130101, 20130115, 20130201),
val = c(10, 5, 8, 20, 4))
plot(cumsum(rowsum(df$val, df$date)), type = "l")

Try this:
ggplot(df, aes(x=1:5, y=cumsum(val))) + geom_line() + geom_point()
Just remove geom_point() if you don't want it.
Edit: Since you require to plot the data as such with x labels are dates, you can plot with x=1:5 and use scale_x_discrete to set labels a new data.frame. Taking df:
ggplot(data = df, aes(x = 1:5, y = cumsum(val))) + geom_line() +
geom_point() + theme(axis.text.x = element_text(angle=90, hjust = 1)) +
scale_x_discrete(labels = df$date) + xlab("Date")
Since you say you'll have more than 1 val for "date", you can aggregate them first using plyr, for example.
require(plyr)
dd <- ddply(df, .(date), summarise, val = sum(val))
Then you can proceed with the same command by replacing x = 1:5 with x = seq_len(nrow(dd)).

After a couple of years, I've settled on doing:
ggplot(df, aes(as.Date(as.character(date), '%Y%m%d'), cumsum(val))) + geom_line()

Jan Boyer seems to have found a more concise solution to this problem in this question, which I have shortened a bit and combined with the answers of Prradep, so as to provide a (hopefully) up-to-date-answer:
ggplot(data = df,
aes(x=date)) +
geom_col(aes(y=value)) +
geom_line(aes(x = date, y = cumsum((value))/5, group = 1), inherit.aes = FALSE) +
ylab("Value") +
theme(axis.text.x = element_text(angle=90, hjust = 1))
Note that date is not in Date-Format, but character, and that value is already grouped as suggested by Prradep in his answer above.

Related

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text"...
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(10^.),
minor_breaks = NULL)

how to make scale_y_reverse work with scale_y_time

I'm plotting HH:MM:SS on the y-axis against YYYY:MM on the x-axis, but am having issues trying to use scale_y_reverse with scale_y_time together.
sample data below:
library(hms)
library(ggplot2)
df_stack <- data.frame(yr_mn = rep(c("2020-01", "2020-02", "2020-03", "2020-04"),2),
timept_type = c(rep("A",4),rep("B",4)),
timept = as_hms(c("08:00:00", "09:10:00", "11:05:30", "10:45:00", "09:30:10", "10:00:00", "11:25:00", "08:40:00")))
df_stack %>%
ggplot(., aes(x=yr_mn, y=timept, color=timept, fill=timept)) +
geom_point() +
geom_line(aes(group=1))+
facet_wrap(vars(timept_type), scales="free", ncol=2) +
scale_y_time(limits = c(as.POSIXct(as_hms("07:00:00")), as.POSIXct(as_hms("11:30:00")))) +
scale_y_reverse() # this cause the y-axis labels to no longer be in hh:mm:ss format
If I comment out the last line scale_y_reverse() then the graph looks fine, but if I include it, the scale_y_reverse seems to convert the HH:MM:SS into seconds. I have no idea if it's something I'm doing incorrectly or if there's a built-in method within scale_y_time to resolve this. Thanks!
You can omit the scale_y_time and supply functions to scale_y_reverse to generate the labels and the breaks.
df_stack %>%
ggplot(aes(yr_mn, timept, color = timept, fill = timept)) +
geom_point() +
geom_line(aes(group = 1)) +
facet_wrap(vars(timept_type), scales = "free", ncol = 2) +
scale_y_reverse(labels = function(x) as_hms(x),
breaks = seq(as.numeric(as_hms("07:00:00")),
as.numeric(as_hms("11:30:00")),
1800))
Result:
After digging into the code for scale_x_reverse() and scale_x_time(), I've made a simplified version for my data (where x is time, and y is value).
Add a - in front of the variable to be reversed in the aesthetic, and add a reverse label function to scale_x_time(labels = function(x) -x)
ggplot(df, aes(x = -clock, y = points)) +
geom_point() +
scale_x_time(labels = function(x) -x)
xlab("Clock") + ylab("Points")
Reversing the limits may easier and less error-prone.
ggplot(df, aes(x = clock, y = points)) +
geom_point() +
scale_x_time(limits = c(12*60*60,0)) # in seconds
xlab("Clock") + ylab("Points")

How to order geom_segment ggplot with colour

I am new to ggplot library. And trying to draw the plot using the following data.frame:
library(tidyverse)
df <-tribble(~event, ~startdate,~enddate,~loc,
"A",as.POSIXct("1984/02/10"),as.POSIXct("1987/06/10"),"1",
"B",as.POSIXct("1984/02/11"),as.POSIXct("1990/02/12"),"2",
"A",as.POSIXct("1992/05/15"),as.POSIXct("1999/06/15"),"3",
"C",as.POSIXct("2003/08/29"),as.POSIXct("2015/08/29"),"4",
"B",as.POSIXct("2002/04/11"),as.POSIXct("2012/04/12"),"5",
"E",as.POSIXct("2000/02/10"),as.POSIXct("2005/02/15"),"6")
max_date = max(df$startdate,df$enddate)
Using the following code snippet:
ggplot(NULL)+
geom_segment(data = df,aes(x=loc, xend =loc,y = startdate, yend = enddate,colour=event),size = 5,alpha=0.6) +
geom_label(aes(label=df$event,x = df$loc,y=max_date), size=2) +
#geom_point(data=final_df,aes(x=newspaper,y=date),color="black") + Point from other data frame
coord_flip() + xlab("LoC") + ylab("Year")
I can able to output the following chart:
How can I order the above chart using the colour i.e. using the event field (in other word how can I perform group by operation on the event field so that first it should display first all events A then events B, C etc.)? I have tried to use scale_x_continuous and reorder from tidyverse package but it didn't work. How can I display more "Year" on the x-axis? I tried to use scale_x_date (mentioned here R: ggplot display all dates on x axis but it needs as.Date and ggplot geom_segment needs as.POSIXct format). Please feel free to correct me!
Any help would be great! Thank you!
Two options. I've also reversed your x and y so you don't have to use coord_flip() and made several other small modifications including the x-axis labels (you were looking for scale_y_datetime since you flipped the axes and the "dates" were actually in POSIXct). Also, one difference with Duck's answer is my scales = "free" in facet_grid. You might decide your labels and your "loc" variable may not make sense given these new graphs anyway.
library(tibble); library(ggplot2)
df <-tribble(~event, ~startdate,~enddate,~loc,
"A",as.POSIXct("1984/02/10"),as.POSIXct("1987/06/10"),"1",
"B",as.POSIXct("1984/02/11"),as.POSIXct("1990/02/12"),"2",
"A",as.POSIXct("1992/05/15"),as.POSIXct("1999/06/15"),"3",
"C",as.POSIXct("2003/08/29"),as.POSIXct("2015/08/29"),"4",
"B",as.POSIXct("2002/04/11"),as.POSIXct("2012/04/12"),"5",
"E",as.POSIXct("2000/02/10"),as.POSIXct("2005/02/15"),"6")
max_date = max(df$startdate,df$enddate)
ggplot(df)+
geom_segment(aes(y=event, yend = event, x = startdate, xend = enddate, colour=event),size = 5,alpha=0.6) +
geom_label(aes(label=event, y = event, x=max_date), size=2) +
xlab("Year") + ylab("LoC") +
scale_x_datetime(date_breaks = "year", date_labels = "%Y") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
ggplot(df)+
geom_segment(aes(y=loc, yend = loc, x = startdate, xend = enddate, colour=event),size = 5,alpha=0.6) +
geom_label(aes(label=event, y = loc, x=max_date), size=2) +
xlab("Year") + ylab("LoC") +
scale_x_datetime(date_breaks = "year", date_labels = "%Y") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
facet_grid(rows = vars(event), scales = "free")
Created on 2020-10-18 by the reprex package (v0.3.0)
Consider this as an option, as mentioned by #ArthurYip setting reorder could affect the sense of your plot. You could avoid the labels and using facet_grid() in next way:
library(ggplot2)
#Plot
ggplot(df)+
geom_segment(aes(x=loc, xend =loc,y = startdate, yend = enddate,colour=event),size = 5,alpha=0.6) +
coord_flip() + xlab("LoC") + ylab("Year")+
facet_grid(event~.,switch = "x")
Output:

How to show every second R ggplot2 x-axis label value?

I want to show every second of x-axis label list in the presentation.
Simplified code example in the following and its output in Fig. 1 where four Dates shown but #2 and #4 should be skipped.
# https://stackoverflow.com/a/6638722/54964
require(ggplot2)
my.dates = as.Date(c("2011-07-22","2011-07-23",
"2011-07-24","2011-07-28","2011-07-29"))
my.vals = c(5,6,8,7,3)
my.data <- data.frame(date =my.dates, vals = my.vals)
plot(my.dates, my.vals)
p <- ggplot(data = my.data, aes(date,vals))+ geom_line(size = 1.5)
Expected output: skip dates second and fourth.
Actual code
Actual code where due to rev(Vars) logic, I cannot apply as.Date to the values in each category; the variable molten has a column Dates
p <- ggplot(molten, aes(x = rev(Vars), y = value)) +
geom_bar(aes(fill=variable), stat = "identity", position="dodge") +
facet_wrap( ~ variable, scales="free") +
scale_x_discrete("Column name dates", labels = rev(Dates))
Expected output: skip #2,#4, ... values in each category.
I thought here changing scale_x_discrete to scale_x_continuous and having a break sequence breaks = seq(1,length(Dates),2)) in scale_x_continuous but it fails because of the following error.
Error: `breaks` and `labels` must have the same length
Proposal based Juan's comments
Code
ggplot(data = my.data, aes(as.numeric(date), vals)) +
geom_line(size = 1.5) +
scale_x_continuous(breaks = pretty(as.numeric(rev(my.data$date)), n = 5))
Output
Error: Discrete value supplied to continuous scale
Testing EricWatt's proposal application into Actual code
Code proposal
p <- ggplot(molten, aes(x = rev(Vars), y = value)) +
geom_bar(aes(fill=variable), stat = "identity", position="dodge") +
facet_wrap( ~ variable, scales="free") +
scale_x_discrete("My dates", breaks = Dates[seq(1, length(Dates), by = 2)], labels = rev(Dates))
Output
Error: `breaks` and `labels` must have the same length
If you have scale_x_discrete("My dates", breaks = Dates[seq(1, length(Dates), by = 2)]), you get x-axis without any labels so blank.
Fig. 1 Output of the simplified code example,
Fig. 2 Output of EricWatt's first proposal
OS: Debian 9
R: 3.4.0
This works with your simplified example. Without your molten data.frame it's hard to check it against your more complicated plot.
ggplot(data = my.data, aes(date, vals)) +
geom_line(size = 1.5) +
scale_x_date(breaks = my.data$date[seq(1, length(my.data$date), by = 2)])
Basically, use scale_x_date which will likely handle any strange date to numeric conversions for you.
My solution eventually on the actual code motivated by the other linked thread and EricWatt's answer
# Test data of actual data here # https://stackoverflow.com/q/45130082/54964
ggplot(data = molten, aes(x = as.Date(Time.data, format = "%d.%m.%Y"), y = value)) +
geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
facet_wrap( ~ variable, scales="free") +
theme_bw() + # has to be before axis text manipulations because disables their effect otherwise
theme(axis.text.x = element_text(angle = 90, hjust=1),
text = element_text(size=10)) +
scale_x_date(date_breaks = "2 days", date_labels = "%d.%m.%Y")

ggplot date scales shifts one month forward

I am sure this is an easy one but I couldn't find a solution from other posts.
If I run this:
test <- data.frame(dates = as.Date(c("2016-10-31","2016-11-30", "2016-12-31", "2017-01-31")),
values = c(1, 2, 3, 4))
ggplot(test, aes(x = dates, y = values)) +
geom_bar(position="stack", stat = "identity") +
scale_x_date(breaks = date_breaks("1 months"),labels = date_format("%b-%y"))
I get this:
As you can appreciate, all the dates on the X axis are moved forward to the following month.
I tried to use the scales package as suggested elsewhere but it didn't change a thing.
I can get away with this by tweaking the date using:
test$dates <- as.Date(format(test$dates, "%Y-%m-1"))
which delivers this (without using the scale_x_date bit):
but I am sure there is an elegant way to circumvent the problem.
I have run into this problem myself when doing monthly time series charts. My solution: add in a vector of dates into the "breaks = " section.
I.e.
scale_x_date(breaks = test$dates, labels = date_format("%b-%y"))
Note: When I tried "data_breaks" (like your code), I was not able to get it to work under a number of different permutations. The vector of dates only works with "breaks", not "data_breaks"
ggplot(test, aes(x = dates, y = values)) +
geom_bar(position="stack", stat = "identity") +
scale_x_date(breaks = test$dates, labels = date_format("%b-%y"))
P.s. This is my first attempt at answering a question here. I know that the question is old, but hopefully this will help someone!
The labels are correct when you transform dates with as.POSIXct and use scale_x_datetime instead of scale_x_date (no idea why though):
ggplot(test, aes(x = as.POSIXct(dates), y = values)) +
geom_bar(position="stack", stat = "identity") +
scale_x_datetime(breaks = date_breaks("1 months"), labels = date_format("%b-%y"))

Resources