How can I get my area plot to stack using ggplot? - r

I am trying to get my cumulative area plot to stack using the code below, which is based on http://dantalus.github.io/2015/08/16/step-plots/. I have added position="stack"; however, the areas still overlap instead of stacking.
The aim of what I am trying to achieve is to show the cumulative number of publications each year within a given period. So, as an example, in 1940 there may be one publication, the following year there may be 2 more, bringing the cumulative total to 3.
What would be the best way to get the areas to stack on top of each other?
How can the order be controlled? Would I need to use arrange() to order TERM2?
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(data = subset(working, TERM2=="A"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
stat_bin(data = subset(working, TERM2=="B"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack",alpha=0.1) +
stat_bin(data = subset(working, TERM2=="Both"),bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
What I am currently getting:
Example of what I am trying to achieve:
The following chart, created in Excel from the same data, is exactly what I am looking to achieve in R.
My Data:
Example of how my data is currently structured:
Year TERM2
1944 A
1959 B
1966 A
1968 B
1968 A
1970 A
1971 B
1971 B
1971 A
1971 A
1971 Both
1971 Both
1971 Both
1972 A
1972 Both
1972 Both
1973 B
1973 A
1974 A
1974 A
'data.frame': 803 obs. of 6 variables:
$ Year : int 1944 1959 1966 1968 1968 1970 1971 1971 1971 1971 ...
$ TERM2 : Factor w/ 3 levels "B","A","Both": 2 1 2 1 2 2 1 1 2 2 ...
Changes based on user127649's suggestions
This is the plot after user127649's suggestions, which is close to what I would expect except I am looking for it to start at 0 and end at 803 (total number of publications).
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(bins=80, aes(y=cumsum(..count..)), geom="area", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")

I think there were two issues.
When you use stat_bin() in three separate layers, each layer effectively has its own independent data set. This gives the correct counts, but (and this is really a guess) I think that because the areas sit in three separate layers, they can't be stacked on top of each other.
If you use a single stat_bin() layer for all groups, I think cumsum(..count..) is performed on the data as a whole rather than within each group.
I don't know whether this is the best approach or not, but I think it's what you're after.
Data
The data are grouped and cumsum() is used on each group separately.
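library(tidyverse)
working <- working %>%
  count(Year, TERM2) %>%                         # publications per year and group
  spread(TERM2, n, fill = 0) %>%                 # one column per group, 0 where a group has none
  mutate_at(vars('A', 'B', 'Both'), cumsum) %>%  # running total within each group
  # back to long format, keeping the column order as factor levels
  gather(TERM2, N, -Year, factor_key = TRUE) #%>%
# mutate(TERM2 = ordered(TERM2, levels = rev(levels(TERM2))))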
Plot
This code will produce the first plot below. If you prefer the look of the second plot, which uses the reversed group order, un-comment the trailing %>% and the last line of the data manipulation chunk.
ggplot(working, aes(Year, N, fill = TERM2)) +
geom_area(position = 'stack') +
ylab("Total Number")
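The same idea can be sketched end to end without the tidyverse. This is only an illustration under assumptions: it uses the 20 rows shown in the question as toy data (the real data has 803 rows), base R instead of count()/spread()/gather(), and it pads the year range with one empty year before the first publication so that the areas start at 0. The names years, counts, cum and cum_long are made up for this sketch.

```r
library(ggplot2)

# Toy data: the 20 rows shown in the question (the real data has 803 rows).
working <- data.frame(
  Year = c(1944, 1959, 1966, 1968, 1968, 1970, 1971, 1971, 1971, 1971,
           1971, 1971, 1971, 1972, 1972, 1972, 1973, 1973, 1974, 1974),
  TERM2 = c("A", "B", "A", "B", "A", "A", "B", "B", "A", "A",
            "Both", "Both", "Both", "A", "Both", "Both", "B", "A", "A", "A")
)

# Count publications per year and group, including years without any
# publication and one extra year before the first one (so the plot starts at 0).
years  <- seq(min(working$Year) - 1, max(working$Year))
counts <- table(factor(working$Year, levels = years),
                factor(working$TERM2, levels = c("A", "B", "Both")))

# Running total down each column (one column per group).
cum <- apply(counts, 2, cumsum)

# Long format for ggplot2; the factor levels control the stacking order.
cum_long <- data.frame(
  Year  = rep(years, times = ncol(cum)),
  TERM2 = factor(rep(colnames(cum), each = nrow(cum)), levels = colnames(cum)),
  N     = as.vector(cum)
)

p <- ggplot(cum_long, aes(Year, N, fill = TERM2)) +
  geom_area(position = "stack") +
  ylab("Total Number")
p
```

With the toy rows the stacked total ends at 20, the number of rows; with the full data it should end at 803. Changing the order of the levels in the factor() call changes the order in which the areas are stacked.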
Result

Related

Setting custom colors and shading when printing a melted dataframe

I have a melted dataframe that generates the plot below. The data is downloaded from the Federal Reserve, and the first few lines of the melted dataframe are as follows:
> head(df_melt)
Date Variable value
1 Jun 1967 Chauvet-Piger Recession Probability 0.183
2 Jul 1967 Chauvet-Piger Recession Probability 0.108
3 Aug 1967 Chauvet-Piger Recession Probability 0.039
4 Sep 1967 Chauvet-Piger Recession Probability 0.096
5 Oct 1967 Chauvet-Piger Recession Probability 0.048
6 Nov 1967 Chauvet-Piger Recession Probability 0.036
I plot it using the following code:
ggplot(df_melt, aes(x = Date, y= value)) +
geom_line(aes(color = Variable)) +
labs(x = "Date", y = "Unemployment Rate") +
#Some more stuff related to axes, legend etc.
I would like to:
1. Choose the colors
2. Shade the area under the UREC recession indicator with a light gray
I tried setting colors by changing aes(color = Variable) to color = line_colors, where line_colors is a vector of colors I have defined, but get an error message:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1992): colour
Run `rlang::last_error()` to see where the error occurred.
I have also tried scale_color_manual without success. What am I doing wrong, and how can I fix these two problems?
Sincerely and with many thanks in advance
Thomas Philips
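A possible approach, sketched under assumptions rather than tested against the real data: an aesthetic set outside aes() must have length 1 or one value per row, which is what the error message says. Per-series colors are usually supplied through scale_color_manual() with a vector named after the values of Variable, and the shading can be drawn as a separate geom_area() layer on the subset of rows for the indicator. The toy df_melt below is made up; it assumes that Date is of class Date and that the indicator appears in Variable under the name "UREC".

```r
library(ggplot2)

# Made-up stand-in for df_melt: two series, 24 months each.
set.seed(1)
dates <- seq(as.Date("1967-06-01"), by = "month", length.out = 24)
df_melt <- data.frame(
  Date     = rep(dates, times = 2),
  Variable = rep(c("Chauvet-Piger Recession Probability", "UREC"), each = 24),
  value    = c(runif(24), rep(c(0, 1, 0), times = c(8, 6, 10)))
)

# One color per series, named after the values of Variable.
line_colors <- c("Chauvet-Piger Recession Probability" = "steelblue",
                 "UREC" = "grey40")

p <- ggplot(df_melt, aes(x = Date, y = value)) +
  # light gray shading under the indicator only, drawn first so lines stay on top
  geom_area(data = subset(df_melt, Variable == "UREC"), fill = "grey85") +
  geom_line(aes(color = Variable)) +
  scale_color_manual(values = line_colors) +
  labs(x = "Date", y = "Unemployment Rate")
p
```

If the names of line_colors do not match the values of Variable exactly, scale_color_manual() cannot assign them, which is a common reason for it appearing not to work.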

How to add legend on a line plot?

I have data like this:
year catch group
2011 22 1
2012 45 1
2013 34 1
2011 11 2
2012 22 2
2013 32 2
I would like to have the number of the group (1 and 2) to appear above the line in the plot.
Any suggestion?
My real data has 8 groups in total with 8 lines which makes it hard to see because the lines cross one another and the colors of the legend are similar.
I tried this:
library(ggplot2)
ggplot(aes(x=as.factor(year), y=catch, group=as.factor(group),
col=as.factor(group)), data=df) +
geom_line() +
geom_point() +
xlab("year") +
labs(color="group")
Firstly, distinguishing 8 different colours is very difficult. That's why your 8 groups seem to have similar colors.
What you want in this case is not a legend (which usually is an off-chart summary), but rather "annotation".
You can directly add the groups with
ggplot(...) +
geom_text(aes(x=as.factor(year), y=catch, label=group)) +
...
and then try to tweak the position of the text with nudge_x and nudge_y. But if you wanted only 1 label per group, you would have to prepare a data frame with it:
library(dplyr)
# keep one row per group: the one with the earliest year
labels <- df %>% group_by(group) %>% top_n(1, -year)
ggplot(...) +
geom_text(data=labels, aes(x=as.factor(year), y=catch, label=group)) +
...
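Filling in the elided parts, the one-label-per-group idea might look like the sketch below. It is a variation, not the exact code above: it uses the sample data from the question, picks each group's latest year with base R (so the labels sit at the right-hand end of the lines), and the names last_year and line_labels are made up for the example.

```r
library(ggplot2)

# Sample data from the question.
df <- data.frame(
  year  = rep(2011:2013, times = 2),
  catch = c(22, 45, 34, 11, 22, 32),
  group = rep(1:2, each = 3)
)

# One label row per group: the row holding that group's latest year.
last_year   <- ave(df$year, df$group, FUN = max)
line_labels <- df[df$year == last_year, ]

p <- ggplot(df, aes(x = as.factor(year), y = catch,
                    group = as.factor(group), col = as.factor(group))) +
  geom_line() +
  geom_point() +
  # labels inherit x, y and colour; nudge them slightly to the right of the last point
  geom_text(data = line_labels, aes(label = group),
            nudge_x = 0.1, show.legend = FALSE) +
  xlab("year") +
  labs(color = "group")
p
```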

Fill geom_area (ggplot2) with a gradient

I am having some trouble applying a gradient fill to my area plot.
The data is as below:
> df
year annual
1 1960 0.0100
2 1961 -0.2700
3 1962 -0.3450
4 1963 -0.6508
5 1964 -0.9458
6 1965 -0.2458
7 1966 0.9492
8 1967 0.5383
9 1968 0.6275
10 1969 0.0000
I've set up a colorRampPalette for the gradient, and I know this works.
spi.cols <- colorRampPalette(c("darkred","red","yellow","white","green","blue","darkblue"),space="rgb")
With the plot, my aim is to have the fill colours follow the values in the annual column, so that it is easy to tell when values fall within certain boundaries. Right now, the plot seems to treat every value it is "filling" as equal to zero, and so fills everything in one colour only.
ggplot(df, aes(x = year)) +
geom_polygon(aes(y = annual, fill = annual)) +
theme_classic() +
scale_fill_gradientn(colours = spi.cols(12), limits = c(-2.5, 2.5), guide = "legend")
I have also specified the breaks I'd like in my gradient, but I'm not sure how to utilise this. I attempted to use this in values of the scale_fill_gradientn but this was unsuccessful.
spi.breaks <- c(-2.5,-2,-1.6,-1.3,-0.8,-0.5,0.5,0.8,1.3,1.6,2,2.5)
Any help would be much appreciated
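One workaround that is sometimes used for this, offered here only as a sketch: a single polygon or area takes one fill colour, so instead the curve is interpolated onto a fine grid and drawn as many thin vertical segments, each coloured by its own value. The name fine and the grid size of 1000 are arbitrary choices for the example. Passing the rescaled breaks to the values argument of scale_colour_gradientn() places each of the 12 colours at the matching break.

```r
library(ggplot2)

# Data and palette from the question.
df <- data.frame(
  year   = 1960:1969,
  annual = c(0.01, -0.27, -0.345, -0.6508, -0.9458,
             -0.2458, 0.9492, 0.5383, 0.6275, 0)
)
spi.cols <- colorRampPalette(c("darkred", "red", "yellow", "white",
                               "green", "blue", "darkblue"), space = "rgb")
spi.breaks <- c(-2.5, -2, -1.6, -1.3, -0.8, -0.5, 0.5, 0.8, 1.3, 1.6, 2, 2.5)

# Linear interpolation onto a fine grid: one thin segment per grid point.
fine <- as.data.frame(approx(df$year, df$annual, n = 1000))
names(fine) <- c("year", "annual")

p <- ggplot(fine, aes(x = year, y = annual)) +
  # each segment runs from the curve down (or up) to zero
  geom_segment(aes(xend = year, yend = 0, colour = annual)) +
  geom_line() +
  theme_classic() +
  scale_colour_gradientn(colours = spi.cols(12),
                         values  = scales::rescale(spi.breaks),
                         limits  = c(-2.5, 2.5))
p
```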

Join gap in polar line ggplot plot

When ggplot makes a line plot with polar coordinates, it leaves a gap between the highest and lowest x-values (Dec and Jan below) instead of wrapping around into a spiral. How can I continue the line and close that gap?
In particular, I want to use months as my x-axis, but plot multiple years of data in one looping line.
Reprex:
library(ggplot2)
# three years of monthly data
df <- expand.grid(month = month.abb, year = 2014:2016)
df$value <- seq_along(df$year)
head(df)
## month year value
## 1 Jan 2014 1
## 2 Feb 2014 2
## 3 Mar 2014 3
## 4 Apr 2014 4
## 5 May 2014 5
## 6 Jun 2014 6
ggplot(df, aes(month, value, group = year)) +
geom_line() +
coord_polar()
Here's a somewhat-hacky option:
# make a data.frame of the start values that each year's end values should continue to
bridges <- df[df$month == 'Jan',]
bridges$year <- bridges$year - 1 # adjust index to align with previous group
bridges$month <- NA # set x value to any new value
# combine extra points with original
ggplot(rbind(df, bridges), aes(month, value, group = year)) +
geom_line() +
# close gap by removing expansion; redefine breaks to get rid of "NA/Jan" label
scale_x_discrete(expand = c(0,0), breaks = month.abb) +
coord_polar()
Obviously adding extra data points is not ideal, though, so maybe a more elegant answer exists.

Using ggplot, how can I add a point at a time in a loop and then connect them all by a line after the loop?

I am going to be dealing with a lot of data and I was thinking of plotting one part of the data at a time using a loop.
Here's a sample of the data:
Department Period Sales
1005 1 3354.256
1005 1 5587.164
1005 2 3946.055
1005 2 5739.555
1005 3 3990.139
1005 3 6208.411
1005 4 3887.84
1005 4 6397.811
1008 1 4014.629
1008 1 5370.781
1008 2 4311.249
1008 2 5403.442
1008 3 4028.125
1008 3 6660.305
1008 4 4564.816
My initial idea was to plot one point at a time and then connect the points with a line after exiting the loop.
gp <- ggplot()
for (i in 1:4) {
dat <- qdat[qdat$Period == i,]
gp <- gp + stat_summary(data = dat , aes(x=Period , y=Sales), geom="point", fun.y="sum")
print(gp)
}
final_plot <- gp + geom_line()
However, I only get the points, but am not able to generate any lines connecting the points.
Ideally, I would also like to know if it's possible to plot different line segments at a time to make one continuous line using a loop.
Thanks a lot!!
As was pointed out in the comments, with ggplot you should not add points to the plot one by one. It is easy to plot what you want in one step. I assume (tell me if I'm wrong) that you want Period on the x-axis and the sum of all sales belonging to that period on the y-axis. This can be done as follows.
First, I use aggregate() to sum up the sales per period:
plot.data <- aggregate(Sales~Period,data=qdat,FUN=sum)
With this data set, the plot can be done in a single line:
ggplot(plot.data,aes(x=Period,y=Sales)) + geom_point() + geom_line()
Note that I use geom_point() and geom_line() in order to get points connected by a line. Using the data sample you gave, I get the following picture:
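For reference, the two steps can be run end to end on the sample rows from the question. The second plot is an alternative sketch that lets ggplot2 do the summing through stat_summary(); its fun argument assumes ggplot2 3.3.0 or later (older versions used fun.y, as in the question). The names p1 and p2 are made up for the example.

```r
library(ggplot2)

# Sample rows from the question.
qdat <- data.frame(
  Department = rep(c(1005, 1008), times = c(8, 7)),
  Period = c(1, 1, 2, 2, 3, 3, 4, 4, 1, 1, 2, 2, 3, 3, 4),
  Sales = c(3354.256, 5587.164, 3946.055, 5739.555, 3990.139, 6208.411,
            3887.84, 6397.811, 4014.629, 5370.781, 4311.249, 5403.442,
            4028.125, 6660.305, 4564.816)
)

# Step 1: total sales per period.
plot.data <- aggregate(Sales ~ Period, data = qdat, FUN = sum)

# Step 2: points and the connecting line in a single plot.
p1 <- ggplot(plot.data, aes(x = Period, y = Sales)) +
  geom_point() +
  geom_line()
p1

# Alternative: summarise inside ggplot2 instead of beforehand.
p2 <- ggplot(qdat, aes(x = Period, y = Sales)) +
  stat_summary(fun = sum, geom = "point") +
  stat_summary(fun = sum, geom = "line")
p2
```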
