Fill geom_area (ggplot2) with a gradient - r

I am having some trouble applying a gradient fill to my area plot.
The data are as below:
> df
year annual
1 1960 0.0100
2 1961 -0.2700
3 1962 -0.3450
4 1963 -0.6508
5 1964 -0.9458
6 1965 -0.2458
7 1966 0.9492
8 1967 0.5383
9 1968 0.6275
10 1969 0.0000
I've set up a colorRampPalette for the gradient, and I know this works.
spi.cols <- colorRampPalette(c("darkred","red","yellow","white","green","blue","darkblue"),space="rgb")
With the plot, my aim is to have the fill colours follow the values in the annual column, so that it is easy to tell which boundaries each value falls within. Right now, the plot seems to treat every value it fills as zero, and so fills everything with a single colour.
ggplot(df, aes(x = year)) +
geom_polygon(aes(y = annual, fill = annual)) +
theme_classic() +
scale_fill_gradientn(colours = spi.cols(12), limits = c(-2.5, 2.5), guide = "legend")
I have also specified the breaks I'd like in my gradient, but I'm not sure how to use them. I tried passing them to the values argument of scale_fill_gradientn, but this was unsuccessful.
spi.breaks <- c(-2.5,-2,-1.6,-1.3,-0.8,-0.5,0.5,0.8,1.3,1.6,2,2.5)
Any help would be much appreciated.
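A likely reason for the single colour is that a polygon (or area) is drawn as one shape, and ggplot2 appears to take a polygon's fill from the first row of its group (here 0.01, which is close to zero). A common workaround, sketched below under the assumption that thin vertical strips are an acceptable stand-in for a true gradient fill, is to interpolate the series onto a dense grid with approx() and draw one segment per grid point, coloured by its own value. The breaks can then be passed to the values argument after rescaling them to [0, 1] with scales::rescale(), since values expects one position between 0 and 1 per colour.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  year   = 1960:1969,
  annual = c(0.0100, -0.2700, -0.3450, -0.6508, -0.9458,
             -0.2458, 0.9492, 0.5383, 0.6275, 0.0000)
)

spi.cols   <- colorRampPalette(c("darkred", "red", "yellow", "white",
                                 "green", "blue", "darkblue"), space = "rgb")
spi.breaks <- c(-2.5, -2, -1.6, -1.3, -0.8, -0.5, 0.5, 0.8, 1.3, 1.6, 2, 2.5)

# Interpolate the series onto a dense grid (n = 500 is an arbitrary choice;
# a larger n gives smoother colour transitions)
dense <- as.data.frame(approx(df$year, df$annual, n = 500))
names(dense) <- c("year", "annual")

p <- ggplot(dense, aes(x = year)) +
  # One thin vertical segment per grid point, from 0 to the value and
  # coloured by that value: together they look like a gradient-filled area
  geom_segment(aes(xend = year, y = 0, yend = annual, colour = annual)) +
  # Outline of the original (uninterpolated) series
  geom_line(data = df, aes(y = annual)) +
  theme_classic() +
  # `values` gives each colour's position in [0, 1]; rescaling the breaks
  # over the limits places one colour at each break
  scale_colour_gradientn(colours = spi.cols(length(spi.breaks)),
                         values  = scales::rescale(spi.breaks, from = c(-2.5, 2.5)),
                         limits  = c(-2.5, 2.5))
print(p)
```

Because the segments use the colour aesthetic rather than fill, the scale is scale_colour_gradientn() instead of scale_fill_gradientn().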

Related

Setting custom colors and shading when printing a melted dataframe

I have a melted dataframe that generates the plot below. The data is downloaded from the Federal Reserve, and the first few lines of the melted dataframe are as follows:
> head(df_melt)
Date Variable value
1 Jun 1967 Chauvet-Piger Recession Probability 0.183
2 Jul 1967 Chauvet-Piger Recession Probability 0.108
3 Aug 1967 Chauvet-Piger Recession Probability 0.039
4 Sep 1967 Chauvet-Piger Recession Probability 0.096
5 Oct 1967 Chauvet-Piger Recession Probability 0.048
6 Nov 1967 Chauvet-Piger Recession Probability 0.036
I plot it using the following code:
ggplot(df_melt, aes(x = Date, y = value)) +
  geom_line(aes(color = Variable)) +
  labs(x = "Date", y = "Unemployment Rate")
  # Some more stuff related to axes, legend etc.
I would like to (1) choose the colors and (2) shade the area under the UREC recession indicator with a light gray.
I tried setting the colors by changing aes(color = Variable) to color = line_colors, where line_colors is a vector of colors I have defined, but I get an error message:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1992): colour
Run `rlang::last_error()` to see where the error occurred.
I have also tried scale_color_manual without success. What am I doing wrong, and how can I fix these two problems?
Sincerely and with many thanks in advance
Thomas Philips
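The error comes from passing a vector of colours where ggplot2 expects either a single value or one value per row (1992 here). A sketch of one way to handle both points, assuming the recession indicator is stored as a 0/1 series under a Variable level called "UREC" (a hypothetical label, since the real one is not shown): keep colour = Variable inside aes(), set the colours with scale_colour_manual() using a named vector whose names match the Variable levels, and draw the shading with a geom_area() layer restricted to the indicator rows. The toy data below are made up only to make the block runnable.

```r
library(ggplot2)

# Hypothetical toy data in the same long ("melted") shape as df_melt;
# the values and the "UREC" label are made up for illustration only
dates <- seq(as.Date("1967-06-01"), by = "month", length.out = 24)
df_melt <- rbind(
  data.frame(Date = dates, Variable = "Chauvet-Piger Recession Probability",
             value = seq(0.2, 0.9, length.out = 24)),
  data.frame(Date = dates, Variable = "UREC",
             value = rep(c(0, 1, 0), times = c(8, 8, 8)))
)

# Named colour vector: the names must match the levels of Variable
line_colors <- c("Chauvet-Piger Recession Probability" = "steelblue",
                 "UREC" = "black")

# Rows of the 0/1 indicator only, used for the shaded area
urec <- subset(df_melt, Variable == "UREC")

p <- ggplot(df_melt, aes(x = Date, y = value)) +
  # Shading first, so the lines are drawn on top of it
  geom_area(data = urec, fill = "grey85") +
  geom_line(aes(colour = Variable)) +
  scale_colour_manual(values = line_colors) +
  labs(x = "Date", y = "Unemployment Rate")
print(p)
```

With monthly points, geom_area() interpolates between 0 and 1, so the shaded edges are slightly slanted; geom_rect() over the recession start and end dates is an alternative if vertical edges are preferred.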

geom_ribbon() using multiple datasets in ggplot2: Error in as.POSIXct

I am using multiple datasets in ggplot2 to create a time series of event occurrences. The plan is to plot the mean lines (mean being average date of occurrence) of two datasets over time, and use geom_ribbon() to depict the range between +1 and -1 standard deviation above and below the mean (listed below in columns sdv_pos and sdv_neg representing +1 and -1 respectively).
I am able to plot the two mean lines. However, when I insert geom_ribbon I get the following error:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
I've tried converting the columns used in the geom_ribbon() line using as.POSIXct() with the origin, but it has not worked. I only get this error with geom_ribbon(), not geom_line().
Here are the two datasets:
Data1:
sdv_pos sdv_neg year data1_mean
1976-03-20 1976-03-14 1997 1976-03-17
1976-02-18 1976-01-18 1998 1976-02-03
1976-02-12 1976-01-06 1999 1976-01-24
1976-03-02 1976-01-07 2000 1976-02-04
1976-01-10 1976-01-10 2001 1976-01-10
1976-04-21 1976-02-19 2002 1976-03-21
Data2:
sdv_pos sdv_neg year data2_mean
1976-04-24 1976-03-10 1997 1976-04-02
1976-04-21 1976-01-27 1998 1976-03-10
1976-04-21 1976-01-20 1999 1976-03-07
1976-03-23 1976-01-04 2000 1976-02-12
1976-05-05 1976-02-08 2001 1976-03-23
1976-05-01 1976-01-29 2002 1976-03-16
Here is the code I'm using for this. Note that when I remove geom_ribbon() the plot works. However when I include geom_ribbon() I get the error.
graph1 <- ggplot() +
  geom_line(data = Data1, aes(x = year, y = data1_mean), color = "blue") +
  geom_ribbon(data = Data1, aes(x = data1_mean, ymax = sdv_pos, ymin = sdv_neg), fill = "pink", alpha = .5) +
  geom_line(data = Data2, aes(x = year, y = data2_mean), color = "red") +
  geom_ribbon(data = Data2, aes(x = data2_mean, ymax = sdv_pos, ymin = sdv_neg), fill = "yellow", alpha = .5)
Note that the year on the x axis and the year in the data values are not the same. I use 1976 just to keep the mean line on the same date/month; otherwise the y-axis would extend to include all the years in the study.
I found the answer by changing the command to
geom_ribbon(data = Data1, aes(x = year, ymax = sdv_pos, ymin = sdv_neg), fill = "pink", alpha = .5) +
The difference is the x value. I thought I had to use the mean as a centre line for the ribbon, but geom_ribbon() simply shades the space between the two lines (sdv_pos and sdv_neg), and it needs x to be the x-axis variable (year) to shade the area along it.
It seems obvious, but I wanted to post an answer here in case anyone runs into the same problem.
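For reference, a self-contained sketch of the working plot, built from the two tables posted above with the date columns parsed via as.Date(). As in the answer, each ribbon uses x = year with ymin/ymax set to the band edges. The ribbons are added before the lines so the mean lines are drawn on top; that ordering is a stylistic choice, not a requirement.

```r
library(ggplot2)

# Data1 and Data2 as posted in the question, with dates parsed as Date
Data1 <- data.frame(
  sdv_pos = as.Date(c("1976-03-20", "1976-02-18", "1976-02-12",
                      "1976-03-02", "1976-01-10", "1976-04-21")),
  sdv_neg = as.Date(c("1976-03-14", "1976-01-18", "1976-01-06",
                      "1976-01-07", "1976-01-10", "1976-02-19")),
  year = 1997:2002,
  data1_mean = as.Date(c("1976-03-17", "1976-02-03", "1976-01-24",
                         "1976-02-04", "1976-01-10", "1976-03-21"))
)
Data2 <- data.frame(
  sdv_pos = as.Date(c("1976-04-24", "1976-04-21", "1976-04-21",
                      "1976-03-23", "1976-05-05", "1976-05-01")),
  sdv_neg = as.Date(c("1976-03-10", "1976-01-27", "1976-01-20",
                      "1976-01-04", "1976-02-08", "1976-01-29")),
  year = 1997:2002,
  data2_mean = as.Date(c("1976-04-02", "1976-03-10", "1976-03-07",
                         "1976-02-12", "1976-03-23", "1976-03-16"))
)

graph1 <- ggplot() +
  # x is the horizontal-axis variable (year); ymin/ymax are the band edges
  geom_ribbon(data = Data1, aes(x = year, ymin = sdv_neg, ymax = sdv_pos),
              fill = "pink", alpha = 0.5) +
  geom_ribbon(data = Data2, aes(x = year, ymin = sdv_neg, ymax = sdv_pos),
              fill = "yellow", alpha = 0.5) +
  # Mean lines on top of the ribbons
  geom_line(data = Data1, aes(x = year, y = data1_mean), colour = "blue") +
  geom_line(data = Data2, aes(x = year, y = data2_mean), colour = "red")
print(graph1)
```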

Specify the colour of ggpairs plot using a variable but not plot that variable

I have a dataset from the World Bank with some continuous and categorical variables.
> head(nationsCombImputed)
iso3c iso2c country year.x life_expect population birth_rate neonat_mortal_rate region
1 ABW AW Aruba 2014 75.45 103441 10.1 2.4 Latin America & Caribbean
2 AFG AF Afghanistan 2014 60.37 31627506 34.2 36.1 South Asia
3 AGO AO Angola 2014 52.27 24227524 45.5 49.6 Sub-Saharan Africa
4 ALB AL Albania 2014 77.83 2893654 13.4 6.5 Europe & Central Asia
5 AND AD Andorra 2014 70.07 72786 20.9 1.5 Europe & Central Asia
6 ARE AE United Arab Emirates 2014 77.37 9086139 10.8 3.6 Middle East & North Africa
income gdp_percap.x log_pop
1 High income 47008.83 5.014693
2 Low income 1942.48 7.500065
3 Lower middle income 7327.38 7.384309
4 Upper middle income 11307.55 6.461447
5 High income 30482.64 4.862048
6 High income 67239.00 6.958379
I wish to use ggpairs to plot some of the continuous variables (life_expect, birth_rate, neonat_mortal_rate, gdp_percap.x) in a scatter plot but I would like to colour them using the region categorical variable from the data. I have tried a number of different ways but I cannot colour the continuous variables without including the categorical variable.
ggpairs(nationsCombImputed[,c(2,5,7,8,9,11)],
title="Scatterplot of Variables",
mapping = ggplot2::aes(color = region),
labeller = "iso2c")
But I get this error:
Error in stop_if_high_cardinality(data, columns,
cardinality_threshold) : Column 'iso2c' has more levels (211) than
the threshold (15) allowed. Please remove the column or increase the
'cardinality_threshold' parameter. Increasing the
cardinality_threshold may produce long processing times
Ultimately I would just like a 4x4 scatter plot of the continuous variables, coloured by region, with the data points labelled using the iso2c code in column 2.
Is this possible in ggpairs?
Well, yes, it is possible! Following @Robin Gertenbach's suggestion, I added the columns argument to my code and this worked great; please see below.
ggpairs(nationsCombImputed,
title="Scatterplot of Variables",
columns = c(5,7,8,11),
mapping=ggplot2::aes(colour = region))
I still wish to add data point labels to the scatter plot using the iso2c column but I am struggling with this, any pointers would be greatly appreciated.
As mentioned in the comment, you can get ggpairs to colour by a variable without plotting it by specifying the numeric indices of the columns you do want to plot, with columns = c(5,7,8,11).
To get a text scatter plot, you will need to define a function, e.g. textscatter, that you supply via lower = list(continuous = textscatter) in the ggpairs call, and specify the labels in the aesthetics.
library(GGally)

textscatter <- function(data, mapping, ...) {
  ggplot(data, mapping) + geom_text(...)
}
ggpairs(
  nationsCombImputed,
  title = "Scatterplot of Variables",
  columns = c(5, 7, 8, 11),
  mapping = ggplot2::aes(colour = region, label = iso2c),
  lower = list(continuous = textscatter)
)
Of course, you can also put the label aesthetic definition inside textscatter.
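A sketch of that variant: the hypothetical textscatter_iso() below adds the label inside the function, so the ggpairs() mapping only needs colour = region. It assumes the data passed in has an iso2c column. The small data frame uses the first rows of nationsCombImputed shown in the question, and the ggpairs() call is left as a comment because it needs GGally and the full dataset.

```r
library(ggplot2)

# Hypothetical helper: like textscatter, but the label aesthetic is set here,
# so it assumes the data has an iso2c column
textscatter_iso <- function(data, mapping, ...) {
  ggplot(data, mapping) +
    geom_text(aes(label = iso2c), ...)
}

# First rows of nationsCombImputed from the question (subset of columns)
d <- data.frame(
  iso2c       = c("AW", "AF", "AO", "AL", "AD", "AE"),
  life_expect = c(75.45, 60.37, 52.27, 77.83, 70.07, 77.37),
  birth_rate  = c(10.1, 34.2, 45.5, 13.4, 20.9, 10.8),
  region      = c("Latin America & Caribbean", "South Asia",
                  "Sub-Saharan Africa", "Europe & Central Asia",
                  "Europe & Central Asia", "Middle East & North Africa")
)

# One panel on its own, as ggpairs() would call it
p <- textscatter_iso(d, aes(x = life_expect, y = birth_rate, colour = region))
print(p)

# In ggpairs (needs library(GGally) and the full data):
# ggpairs(nationsCombImputed, columns = c(5, 7, 8, 11),
#         mapping = ggplot2::aes(colour = region),
#         lower = list(continuous = textscatter_iso))
```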

How can I get my area plot to stack using ggplot?

I am trying to get my cumulative area plot to stack using the code below, which is based on http://dantalus.github.io/2015/08/16/step-plots/. I have added position = "stack", but the areas still overlap.
My aim is to show the cumulative number of publications each year within a given period. For example, in 1940 there may be one publication and the following year two more, bringing the cumulative total to 3.
What would be the best way to get the areas to stack on top of each other?
How can the order be controlled? Would I need to use arrange() to order TERM2?
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(data = subset(working, TERM2=="A"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
stat_bin(data = subset(working, TERM2=="B"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack",alpha=0.1) +
stat_bin(data = subset(working, TERM2=="Both"),bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
What I am currently getting:
Example of what I am trying to achieve:
The following chart was created in Excel using the same data, and it is exactly what I am looking to achieve in R.
My Data:
Example of how my data is currently structured:
Year TERM2
1944 A
1959 B
1966 A
1968 B
1968 A
1970 A
1971 B
1971 B
1971 A
1971 A
1971 Both
1971 Both
1971 Both
1972 A
1972 Both
1972 Both
1973 B
1973 A
1974 A
1974 A
'data.frame': 803 obs. of 6 variables:
$ Year : int 1944 1959 1966 1968 1968 1970 1971 1971 1971 1971 ...
$ TERM2 : Factor w/ 3 levels "B","A","Both": 2 1 2 1 2 2 1 1 2 2 ...
Changes based on user127649's suggestions
This is the plot after user127649's suggestions. It is close to what I expect, except that I am looking for it to start at 0 and end at 803 (the total number of publications).
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(bins=80, aes(y=cumsum(..count..)), geom="area", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
I think there were two issues.
When you use stat_bin() in three separate layers, each layer effectively has its own independent data set. This gives the correct counts, but (and this is really a guess) I think being in three separate layers means they can't be stacked.
If you use a single stat_bin() layer for all the groups, I think cumsum(..count..) is computed over the data as a whole rather than within each group.
I don’t know whether this is the best approach or not, but I think it’s what you’re after.
Data
The data are grouped and cumsum() is used on each group separately.
library(tidyverse)
working <- working %>%
count(Year, TERM2) %>%
spread(TERM2, n, fill = 0) %>%
mutate_at(vars('A', 'B', 'Both'), cumsum) %>%
gather(TERM2, N, -Year, factor_key = T) #%>%
# mutate(TERM2 = ordered(TERM2, levels = rev(levels(TERM2))))
Plot
This code will produce the first plot below. If you prefer the look of the second plot, you can un-comment the last line of the data manipulation chunk.
ggplot(working, aes(Year, N, fill = TERM2)) +
geom_area(position = 'stack') +
ylab("Total Number")
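The same idea can be written with complete() from tidyr; this is a sketch, not the answer author's code, and it uses only the 20 sample rows listed in the question, so the totals stop at 20 rather than 803. complete() adds zero counts for years in which a group has no publications, so every group's cumulative series covers the same years and the stacked areas line up at each year. The stacking order follows the factor levels of TERM2 (here "B", "A", "Both", as in the str() output), so reordering the levels with factor(..., levels = ...) is one way to control it.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The 20 sample rows shown in the question
working <- data.frame(
  Year  = c(1944, 1959, 1966, 1968, 1968, 1970, 1971, 1971, 1971, 1971,
            1971, 1971, 1971, 1972, 1972, 1972, 1973, 1973, 1974, 1974),
  TERM2 = factor(c("A", "B", "A", "B", "A", "A", "B", "B", "A", "A",
                   "Both", "Both", "Both", "A", "Both", "Both", "B", "A",
                   "A", "A"),
                 levels = c("B", "A", "Both"))
)

cumulative <- working %>%
  count(Year, TERM2) %>%                           # publications per year and group
  complete(Year, TERM2, fill = list(n = 0L)) %>%   # zero for missing year/group pairs
  arrange(TERM2, Year) %>%
  group_by(TERM2) %>%
  mutate(N = cumsum(n)) %>%                        # running total within each group
  ungroup()

p <- ggplot(cumulative, aes(Year, N, fill = TERM2)) +
  geom_area(position = "stack") +
  ylab("Total Number")
print(p)
```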

Coloring a line plot based on a third factor in ggplot

I am having a hard time with a coloring scheme in ggplot. If someone could help me out or send me to another question that would be fantastic.
I have data that look something like this:
day=rep(1:10, 5)
year=rep(1992:1996, each=10)
state=rep(c("A","B"), each=25)
set.seed(4)
y=runif(50, 5.0, 7.5)
df=data.frame(year,day,state,y)
> head(df)
year day state y
1 1992 1 A 6.464501
2 1992 2 A 5.022364
3 1992 3 A 5.734349
4 1992 4 A 5.693437
5 1992 5 A 7.033936
6 1992 6 A 5.651069
I want to create a plot similar to the one below, which I made using this code:
library(ggplot2)
p = ggplot(df, aes(day, y))
p = p + geom_line(aes(colour = factor(year)))
print(p)
I want the coloring to be based off of the state variable. I would like the years that are in state 'A' to be one color and the years in state 'B' to be another.
Thank you
If you want the lines separated by year but colored by state, the key is to use the group argument:
ggplot(data=df, aes(x=day, y=y, group=year, colour=state)) +
geom_line() +
geom_point()
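A runnable check of that answer with the question's sample data: group = year draws one line per year, and colour = state only decides the colour. The scale_colour_manual() line is optional and shown as one way to pick the two colours (the colour names are arbitrary). Note that in this sample data the year 1994 is split across states (its first five days are in A and the last five in B), so that one line changes colour partway along.

```r
library(ggplot2)

# Sample data from the question
day   <- rep(1:10, 5)
year  <- rep(1992:1996, each = 10)
state <- rep(c("A", "B"), each = 25)
set.seed(4)
y  <- runif(50, 5.0, 7.5)
df <- data.frame(year, day, state, y)

p <- ggplot(df, aes(x = day, y = y, group = year, colour = state)) +
  geom_line() +
  geom_point() +
  # Optional: choose the colour used for each state
  scale_colour_manual(values = c(A = "darkorange", B = "steelblue"))
print(p)
```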
