Plotting secondary axis using ggplot - r

I am trying to plot three variables (SA, SA1, SA2), with two variables (SA and SA2) on the left y-axis and one variable (SA1) on the right secondary y-axis. I tried to fix the axis limits using limits = c(1e15,5e15) on the left y-axis while limiting the secondary axis to limits = c(3e17,4.2e17), but I am unable to plot the secondary axis with my customized limits.
library(ggplot2)
library(xlsx) # provides read.xlsx2()
test <- read.xlsx2("filepath/test.xlsx", 1, header=TRUE)
View(test)
test$SA=as.numeric(levels(test$SA))[test$SA]
test$SA1=as.numeric(levels(test$SA1))[test$SA1]
test$SA2=as.numeric(levels(test$SA2))[test$SA2]
g <- ggplot(test,aes(x=year, y= SA, group = 1)) + geom_line(mapping = aes(x = test$year, y = test$SA)) +
geom_line(mapping = aes(x = test$year, y = test$SA2), color = "red") + geom_line(mapping = aes(x = test$year, y = test$SA1), size = 1, color = "blue")
g+scale_y_continuous(name = "primary axis title",
sec.axis = sec_axis(~./5, name = "secondary axis title (SA1)"))
The final solution by @dc37 gives me the following result:
ggplot(subset(DF, Var != "SA1"), aes(x = year, y = val, color = Var))+
geom_line()+
scale_y_continuous(name = "Primary axis", sec.axis = sec_axis(~.*100, name = "Secondary"))
Thanks

The sec.axis argument only creates a new axis; it does not change your data and can't be used to plot data.
To be able to plot data from two groups with very different ranges, you need to scale down SA1 first.
Here, I scaled it down by dividing it by 100 (because the ratio between the max of SA1 and the max of SA and SA2 is close to 100), and I also reshaped your data frame into a longer format, which is more suitable for ggplot2:
library(lubridate)
df$year = parse_date_time(df$year, orders = "%Y") # To set year in a date format
library(dplyr)
library(tidyr)
DF <- df %>% mutate(SA1_100 = SA1/100) %>% pivot_longer(.,-year, names_to = "Var",values_to = "val")
# A tibble: 44 x 3
    year Var         val
   <int> <chr>     <dbl>
 1  2008 SA      1.41e15
 2  2008 SA1     3.63e17
 3  2008 SA2     4.07e15
 4  2008 SA1_100 3.63e15
 5  2009 SA      1.53e15
 6  2009 SA1     3.77e17
 7  2009 SA2     4.05e15
 8  2009 SA1_100 3.77e15
 9  2010 SA      1.52e15
10  2010 SA1     3.56e17
# … with 34 more rows
Then, you can plot it by using (I subset the dataframe to remove "SA1" and keep the transformed column "SA1_100"):
library(ggplot2)
ggplot(subset(DF, Var != "SA1"), aes(x = year, y = val, color = Var))+
geom_line()+
scale_y_continuous(name = "Primary axis", sec.axis = sec_axis(~.*100, name = "Secondary"))
By the way, in ggplot2 you don't need to refer to columns with $ inside aes(); simply write the column name.
Data
df <- structure(list(year = 2008:2018, SA = c(1.40916e+15, 1.5336e+15,
1.52473e+15, 1.58394e+15, 1.59702e+15, 1.54936e+15, 1.6077e+15,
1.59211e+15, 1.73533e+15, 1.7616e+15, 1.67771e+15), SA1 = c(3.63e+17,
3.77e+17, 3.56e+17, 3.68e+17, 3.68e+17, 3.6e+17, 3.6e+17, 3.68e+17,
3.55e+17, 3.58e+17, 3.43e+17), SA2 = c(4.07e+15, 4.05e+15, 3.94e+15,
3.95e+15, 3.59e+15, 3.53e+15, 3.43e+15, 3.2e+15, 3.95e+15, 3.03e+15,
3.16e+15)), row.names = c(NA, -11L), class = c("data.table",
"data.frame")) # the .internal.selfref pointer from dput() is dropped because it cannot be parsed
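The question asked for specific limits on both axes, which a plain division by 100 does not give. One possible way to get them is an affine rescaling that maps the desired secondary range onto the primary range and inverts it in sec_axis(). The following is only a sketch under that assumption; the helper names to_primary and to_secondary are hypothetical, the two ranges are the limits quoted in the question, and only the first three rows of the data are used.

```r
library(ggplot2)

# Assumed target ranges, taken from the limits mentioned in the question
prim <- c(1e15, 5e15)    # left axis (SA, SA2)
sec  <- c(3e17, 4.2e17)  # right axis (SA1)

# Hypothetical helpers: affine maps between the two ranges
k <- diff(prim) / diff(sec)
to_primary   <- function(y) prim[1] + (y - sec[1]) * k
to_secondary <- function(y) sec[1] + (y - prim[1]) / k

# First three rows of the data shown above
df <- data.frame(
  year = 2008:2010,
  SA   = c(1.40916e15, 1.5336e15, 1.52473e15),
  SA1  = c(3.63e17, 3.77e17, 3.56e17),
  SA2  = c(4.07e15, 4.05e15, 3.94e15)
)

p <- ggplot(df, aes(x = year)) +
  geom_line(aes(y = SA)) +
  geom_line(aes(y = SA2), color = "red") +
  # SA1 is drawn in primary-axis units
  geom_line(aes(y = to_primary(SA1)), color = "blue") +
  scale_y_continuous(
    name = "Primary axis", limits = prim,
    # the secondary axis labels are converted back to SA1 units
    sec.axis = sec_axis(~ to_secondary(.), name = "Secondary axis (SA1)")
  )
p
```

With this mapping the secondary axis runs from 3e17 at the bottom of the panel to 4.2e17 at the top, because those are the values that to_secondary() returns for the primary limits.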

Related

How can I change dates on an X axis into 'day 1', 'day 2' etc for a line graph plot?

I am trying to modify a line graph I have already made. On the x-axis, it shows the date on which a participant completed a task. However, I would like the x-axis to simply show each completed session of the task as day 1, day 2, etc. Is there a way to do this?
My code for the line graph is as follows:
ggplot(data = p07_points_scored, aes(x = day, y = total_score, group = 1)) +
geom_line() +
geom_point() +
theme(axis.text.x = element_text(angle = 60, vjust = 0.5)) +
labs(title=" P07s Total score on the training tool",
x = "Date of training completion",
y = "Total Score",
color = "lightblue") +
geom_smooth()
In addition, I have 4 separate line graphs, one per participant, showing their total scores on the task. Is there a way to combine the separate graphs into one?
Many thanks :)
Here is an example with fake data. The key point is to create a new column, days, with mutate() and map it to the x-axis with fct_inorder():
library(tidyverse)
library(lubridate)
# Create some fake data:
date <- dmy("6-8-2022"):dmy("5-9-2022")
y = rnorm(31, mean = 2300, sd = 100)
df <- tibble(date, y)
df %>%
mutate(days = paste0("day",row_number())) %>%
ggplot(aes(x = fct_inorder(days), y = y, group= 1)) +
geom_point()+
geom_line()
data:
df <- structure(list(date = 19210:19240, y = c(2379.71407792736, 2349.90296535465,
2388.14396999868, 2266.84629740315, 2261.95099255488, 2270.90461436351,
2438.19569234793, 2132.6468717962, 2379.46892613664, 2406.13636097426,
2176.9392984643, 2219.0521150482, 2221.22674399102, 2399.82972150781,
2396.76276645913, 2233.62763324748, 2468.98833991591, 2397.47855248058,
2486.96828322353, 2330.04116860874, 2280.66624489061, 2411.09933781266,
2281.06682518505, 2281.63162850277, 2235.66952459084, 2271.2152525563,
2481.86164459452, 2544.25592495568, 2411.90218614317, 2275.60378793237,
2297.98843827031)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-31L))
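The second part of the question (combining the per-participant graphs into one) is not covered by the answer above. One possible approach, sketched here with made-up data, is to stack the participants into a single data frame, number the sessions within each participant, and map the participant to color. The participant names, the make_scores helper, and the column names are all assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical per-participant data: one row per completed session
set.seed(1)
make_scores <- function(n) {
  data.frame(date = as.Date("2022-08-06") + seq_len(n) - 1,
             total_score = round(rnorm(n, mean = 2300, sd = 100)))
}
scores <- list(P07 = make_scores(5), P08 = make_scores(4))

combined <- bind_rows(scores, .id = "participant") %>%
  group_by(participant) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(session = row_number()) %>%  # 1, 2, ... within each participant
  ungroup()

ggplot(combined, aes(x = session, y = total_score, color = participant)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq_len(max(combined$session)),
                     labels = function(b) paste("day", b)) +
  labs(x = "Session", y = "Total score")
```

Because the x-axis is the session number rather than the calendar date, participants who trained on different dates still line up on day 1, day 2, and so on.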

How to plot Quarterly and Year-to-Date values in ggplot?

Raw data
df <- structure(list(attainment_target = c(7.5, 15), quarter_2022 = c("Q1",
"Q2"), total_attainment = c(2, 4), percent_attainment = c(0.2666,
0.2666)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
Quarter    | Target | Attainment
2022-01-01 | 7.5    | 2
2022-04-01 | 15     | 4
Scenario
I would like to make a ggplot (geom_col or geom_bar) with Quarter on the x-axis and Attainment on the y-axis, with Target shown as a horizontal dashed line that indicates how far off I am from that value.
However, I am having trouble plotting YTD (total attainment over the quarters so far) in the same plot. Here is an example of the new fields, with the calculated YTD value, that I would like to end up with:
Desired output
Quarter    | Target | Attainment | YTD | % Attainment
2022-01-01 | 7.5    | 2          | 2   | 27
2022-04-01 | 15     | 4          | 6   | 40
What is the best way to plot this with ggplot in R? Here is my current approach, but I am having trouble incorporating all of the above:
df1 <- df %>%
mutate(YTD_TOTAL = sum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = sum(total_attainment) / max(attainment_target))
ggplot(data = df1, aes(fill=quarter_2022, x=attainment_target, y=total_attainment, color = quarter_2022, palette = "Paired",
label = TRUE,
position = position_dodge(0.9)))
Not sure exactly what you have in mind but here are some of the pieces you might want to use:
df %>%
mutate(YTD_TOTAL = cumsum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = YTD_TOTAL/ attainment_target) %>%
ggplot(aes(quarter_2022, total_attainment)) +
geom_col(aes(y = YTD_TOTAL), fill = NA, color = "gray20") +
geom_text(aes(y = YTD_TOTAL, label = scales::percent(YTD_PERCENT_ATTAINMENT)),
vjust = -0.5) +
geom_col(fill = "gray70", color = "gray20") +
geom_text(aes(label = total_attainment),
position = position_stack(vjust = 0.5)) +
geom_segment(aes(x = as.numeric(as.factor(quarter_2022)) - 0.4,
xend = as.numeric(as.factor(quarter_2022)) + 0.4,
y = attainment_target, yend = attainment_target),
linetype = "dashed")
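The main change from the question's code is cumsum() instead of sum(): sum() puts the same grand total on every row, while cumsum() gives a running total per quarter. A small check on the question's data (the percent_attainment column is left out here for brevity) shows that the running total reproduces the desired YTD and % Attainment columns:

```r
library(dplyr)

df <- data.frame(
  attainment_target = c(7.5, 15),
  quarter_2022 = c("Q1", "Q2"),
  total_attainment = c(2, 4)
)

ytd <- df %>%
  mutate(YTD_TOTAL = cumsum(total_attainment),               # running total
         YTD_PERCENT_ATTAINMENT = YTD_TOTAL / attainment_target)

ytd$YTD_TOTAL                         # 2 6
round(ytd$YTD_PERCENT_ATTAINMENT, 2)  # 0.27 0.40
```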

double y-axes plot in R

I want to build a plot with two y-axes.
In the image you can see my data frame and the plot. It was made in Excel, and I need to do the same in R. I tried to use the latticeExtra library, but it doesn't show any lines or bars:
library(latticeExtra)
obj1 <- xyplot(Q_TY_PAPER ~ PU, df, type = "h")
obj2 <- xyplot(COM_USD ~ PU, df, type = "l")
doubleYScale(obj1, obj2, text = c("obj1", "obj2"))
Can you please help me?
Here is a capture of my dataset and the plot that I would like to get:
You need to split your data frame in two: one part, used for the bar chart, needs to be reshaped, and the second, used for the line, needs to be rescaled.
Basically, the line will be plotted on the same y-axis as the bar chart; however, we will add a secondary y-axis whose tick marks correspond to the "real" values of the line.
So, first, we need to rescale the values plotted as a line. Since your example shows that a value of 8 on the bar chart axis matches a value of 500 for the line, we can rescale by applying a ratio of 8/500:
df_line = df[,c("PU","COM_USD")]
df_line$COM_USD_2 = df_line$COM_USD * 8/500
> df_line
       PU COM_USD COM_USD_2
1 Client1     464     7.424
2 Client2     237     3.792
3 Client3     179     2.864
4 Client4      87     1.392
5 Client5      42     0.672
6 Client6      27     0.432
7 Client7      10     0.160
For the bar chart, we need to pivot the data into a longer format to fit the grammar of ggplot2. To do that, we can use pivot_longer from the tidyr package (loaded with tidyverse):
library(tidyverse)
df_bar <- df %>% select(-COM_USD) %>% pivot_longer(., - PU, names_to = "Variable", values_to = "Value")
# A tibble: 21 x 3
   PU      Variable    Value
   <fct>   <chr>       <dbl>
 1 Client1 Q_TY_PAPER    7.1
 2 Client1 Q_TY_ONLINE   7.1
 3 Client1 CURR          6
 4 Client2 Q_TY_PAPER    3.8
 5 Client2 Q_TY_ONLINE   3.8
 6 Client2 CURR          3.9
 7 Client3 Q_TY_PAPER    4.4
 8 Client3 Q_TY_ONLINE   4.4
 9 Client3 CURR          2.3
10 Client4 Q_TY_PAPER    2.6
# … with 11 more rows
Now, you can plot both of them by doing:
library(tidyverse)
ggplot(df_bar, aes(x = PU, y = Value))+
geom_bar(aes(fill = Variable), stat = "identity", position = position_dodge(), alpha = 0.8)+
geom_line(data = df_line, aes(x = PU, y = COM_USD_2, group = 1), size = 2, color = "blue")+
scale_y_continuous(name = "Quantity", limits = c(0,8), sec.axis = sec_axis(~(500/8)*., name = "USD"))+
theme(legend.title = element_blank(),
axis.title.x = element_blank())
As you can see, in scale_y_continuous we set a secondary axis whose tick values are multiplied by the inverse ratio (500/8), so that they match the real values of the plotted line.
Finally, you get the following plot:
DATA
PU = paste0("Client",1:7)
COM_USD = c(464,237,179,87,42,27,10)
Q_TY_PAPER = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
Q_TY_ONLINE = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
CURR = c(6.0,3.9,2.3,0.2,0.2,0.1,0)
df = data.frame(PU,COM_USD, Q_TY_PAPER, Q_TY_ONLINE, CURR)
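The two ratios must be exact inverses of each other, otherwise the secondary axis labels will not match the line. A quick check of that round trip on the COM_USD values is sketched below. The last two lines show one possible way to derive the ratio from the data instead of reading 8 and 500 off the chart; that variant (ratio_auto) is an assumption and not part of the answer above.

```r
COM_USD <- c(464, 237, 179, 87, 42, 27, 10)
Q_TY_PAPER <- c(7.1, 3.8, 4.4, 2.6, 1.2, 1.1, 0.5)

ratio  <- 8 / 500            # bar-axis units per USD, as used in the answer
scaled <- COM_USD * ratio    # what geom_line() actually draws
back   <- scaled * (500 / 8) # what sec_axis(~(500/8)*.) shows on the right axis

all.equal(back, COM_USD)     # the round trip recovers the original values

# Assumed alternative: make the tallest line point as high as the tallest bar
ratio_auto <- max(Q_TY_PAPER) / max(COM_USD)
max(COM_USD * ratio_auto)    # equals max(Q_TY_PAPER)
```

If ratio_auto were used for the line, the secondary axis would need sec_axis(~ . / ratio_auto) to stay consistent.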
EDIT: Dealing with long names as x-axis labels
If the client names in your real data are too long, you can use this solution (Two lines of X axis labels in ggplot) to write them on two lines.
So, first, modify the PU variable:
PU = c("Jon Jon", "Bob Bob", "Andrew Andrew", "Henry Henry", "Alexander Alexander","Donald Donald", "Jack Jack")
COM_USD = c(464,237,179,87,42,27,10)
Q_TY_PAPER = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
Q_TY_ONLINE = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
CURR = c(6.0,3.9,2.3,0.2,0.2,0.1,0)
df = data.frame(PU,COM_USD, Q_TY_PAPER, Q_TY_ONLINE, CURR)
Then, we apply the same code as described above:
df_line = df[,c("PU","COM_USD")]
df_line$COM_USD_2 = df_line$COM_USD * 8/500
library(tidyverse)
df_bar <- df %>% select(-COM_USD) %>% pivot_longer(., - PU, names_to = "Variable", values_to = "Value")
But for the plot, you can use scale_x_discrete and specify the labels, inserting \n to tell R to write the x-axis labels on multiple lines:
ggplot(df_bar, aes(x = PU, y = Value))+
geom_bar(aes(fill = Variable), stat = "identity", position = position_dodge(), alpha = 0.8)+
geom_line(data = df_line, aes(x = PU, y = COM_USD_2, group = 1), size = 2, color = "blue")+
scale_y_continuous(name = "Quantity", limits = c(0,8), sec.axis = sec_axis(~(500/8)*., name = "USD"))+
theme(legend.title = element_blank(),
axis.title.x = element_blank())+
scale_x_discrete(labels = gsub(" ","\n",PU), breaks = PU)
And you get this:

Scale the x-axes with quarterly date format

I created a plot in R using the ggplot library:
library(ggplot2)
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = variable), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
I got the plot that I want, but the only problem is that the values of the variable yQ have the format:
1990Q1
1990Q2
1990Q3
1990Q4
......
......
2017Q1
2017Q2
2017Q3
2017Q4
and because there are many years, the x-axis labels cannot show all the dates clearly (they overlap).
Therefore, I want the x-axis label to show only Q1 and Q3 for every 5 years.
So I want the x-axis to be something like this:
1990Q1 1990Q3 1995Q1 1995Q3 ...... 2015Q1 2015Q3
I tried to use scale_x_date but my dates are not in date format (e.g. 1990Q1) and therefore this does not work. How can I fix it?
The question does not provide reproducible input but using df from the Note below with the autoplot.zoo method of ggplot's autoplot generic we can write:
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "yQ", FUN = as.yearqtr)
autoplot(z) + scale_x_yearqtr()
Note
Test input--
df <- data.frame(yQ = c("1990Q1", "1990Q2", "1990Q3", "1990Q4"), value = 1:4)
The zoo::format.yearqtr() function is quite easy to use with ggplot2.
Try
scale_x_date(labels = function(x) zoo::format.yearqtr(x, "%YQ%q"))
Use function zoo::as.yearqtr (zoo package) to work with quarterly dates.
Generate example data:
year <- 1990:2000
quar <- paste0("Q", 1:4)
foo <- as.vector(outer(year, quar, paste0))
data <- data.frame(dateQ = foo, Y = rnorm(length(foo)))
head(data)
dateQ Y
1 1990Q1 -0.09944705
2 1991Q1 0.14493910
3 1992Q1 0.54856787
4 1993Q1 1.12966224
5 1994Q1 -0.93539302
6 1995Q1 0.24772265
Transform quarterly date to "normal" date:
data$dateNorm <- as.Date(zoo::as.yearqtr(data$dateQ))
head(data)
dateQ Y dateNorm
1 1990Q1 -0.09944705 1990-01-01
2 1991Q1 0.14493910 1991-01-01
3 1992Q1 0.54856787 1992-01-01
4 1993Q1 1.12966224 1993-01-01
5 1994Q1 -0.93539302 1994-01-01
6 1995Q1 0.24772265 1995-01-01
It sets Q1/2/3/4 as the first day of January/April/July/October.
data[grep("1991", data$dateQ), ]
dateQ Y dateNorm
2 1991Q1 0.1449391 1991-01-01
13 1991Q2 1.5878678 1991-04-01
24 1991Q3 -0.1071823 1991-07-01
35 1991Q4 2.2905729 1991-10-01
Now you can plot it or perform other calculations as it's in Date format.
library(ggplot2)
ggplot(data, aes(dateNorm, Y)) +
geom_line()
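Once the quarters are real dates, the labelling asked for in the question (Q1 and Q3 of every fifth year) can be expressed through the breaks of scale_x_date(). The following is a sketch with made-up data covering 1990Q1 to 2017Q4; the variable names and the choice of 1990 as the first labelled year are assumptions.

```r
library(ggplot2)
library(zoo)

set.seed(42)
# Made-up quarterly series from 1990 Q1 to 2017 Q4
yq <- as.yearqtr(seq(1990, 2017.75, by = 0.25))
df <- data.frame(date = as.Date(yq), value = cumsum(rnorm(length(yq))))

# Q1 (year + 0) and Q3 (year + 0.5) of every fifth year
years <- seq(1990, 2015, by = 5)
brks <- as.Date(as.yearqtr(sort(c(years, years + 0.5))))

ggplot(df, aes(date, value)) +
  geom_line() +
  scale_x_date(breaks = brks,
               labels = function(d) format(as.yearqtr(d), "%YQ%q")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

The labels function converts each break back to a yearqtr object, so the axis text keeps the original 1990Q1 style instead of showing full dates.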
You can:
- manipulate x-axis breaks and labels with scale_x_discrete(breaks = ..., labels = ...)
- change the angle of text with theme(axis.text.x = element_text(angle = ...))
I generated some data
Combs <- expand.grid(1990:2017, c("Q1", "Q2", "Q3", "Q4"))
df <- data.frame(
yQ = sort(apply(Combs, 1, paste, collapse="")),
value = runif(112)
)
In the first example, I subset the yQ values you want with a logical vector and change the angle of the text:
library(ggplot2)
pattern <- c(T, F, T, F, rep(F, 16))
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ[pattern], labels = df$yQ[pattern]) +
theme(axis.text.x = element_text(angle=90))
But notice that tick marks not specified by breaks are not shown, so the alternative is to copy the yQ values into a vector and set the labels you don't want to "":
xVec <- as.character(df$yQ)
xVec[pattern==F] <- ""
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ, labels = xVec) +
theme(axis.text.x = element_text(angle=90))

facet_grid() causing crash

I cannot figure out what I'm missing. I keep crashing R or getting very weird plots.
> head(vData)
vix.Close vstoxx vxfxi.Close Date
2011-03-16 29.40 35.2293 35.84 2011-03-16
2011-03-17 26.37 30.6133 31.77 2011-03-17
2011-03-18 24.44 28.5337 29.31 2011-03-18
2011-03-21 20.61 25.2355 25.95 2011-03-21
2011-03-22 20.21 24.3914 24.52 2011-03-22
2011-03-23 19.17 23.9226 24.03 2011-03-23
The below works:
p1.1<-ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col= "red")
p1.1
p2<-p1.1 + geom_line(data = vData[!is.na(vData$vstoxx),], aes(x = Date, y = vstoxx), col="blue")
p2
p3<-p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close),], aes(x = Date, y = vxfxi.Close), col="green")
p3
p4<-p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
But this is the part that is giving me trouble:
p5<- p4 + facet_grid(Date~., scales = Date)
p5
I echo what baptiste said: what is it you're trying to do? The code you've provided suggests that you're trying to create a separate line chart for each date in the dataset, which doesn't make much sense. For this demonstration, I'll show you how to facet the data by year to see the correlations between the different measurements of volatility over time. If you provide more detail as a comment, I'll revisit the code.
First let's take a look at what you've already done.
library(tidyverse)
library(gridExtra)
library(lubridate)
library(reshape2)
#Generate dummy data
vData <- tibble(
vix.Close = rnorm(1000, mean = 12, sd = 5),
vstoxx = rnorm(1000, mean = 12, sd = 5),
vxfxi.Close = rnorm(1000, mean = 12, sd = 5),
Date = as.Date(1:1000, origin = '2011-01-01')
)
# Generate individual plots per your question
p1.1 <-
ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col = "red")
p1.1
p2 <-
p1.1 + geom_line(data = vData[!is.na(vData$vstoxx), ], aes(x = Date, y = vstoxx), col =
"blue")
p2
p3 <-
p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close), ], aes(x = Date, y = vxfxi.Close), col =
"green")
p3
p4 <-
p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
You're creating four different plots and then layering them on top of each other. This approach works here, but it's cumbersome to make changes to each of the calls to ggplot or if you want to add/remove variables. Let's move your data to a "long" format and simplify the ggplot call.
# Melt the data into three columns and remove NAs
vData <- melt(vData, id = "Date") %>%
filter(!is.na(value)) %>%
as_tibble() # tbl_df() is deprecated in current dplyr
# Create one ggplot for all three indexes
ggplot(data = vData, aes(x = Date, y = value, color = variable)) +
geom_line() +
labs(title = "Volatility Indexes", x = "Time", y = "Index")
Now back to the big problem: you shouldn't be faceting by date because that would give you a huge number of tiny unreadable line charts. There are a number of other facets that might make sense. For example, you could look at the distribution of the three indexes by year.
ggplot(data = vData, aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
labs(title = "Volatility Indexes", x = "", y = "") +
facet_grid(year(Date) ~ .)
So put some thought into what exactly you want to show.
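If the goal was one panel per index rather than per date, another option is to facet on the variable column. Note that the scales argument of facet_grid() takes a string ("fixed", "free", "free_x" or "free_y"), not a column, which is one reason the original call with scales = Date fails. The sketch below uses made-up data and tidyr::pivot_longer() in place of melt(); both choices are assumptions for illustration.

```r
library(ggplot2)
library(tidyr)

# Made-up data with the same columns as the question
set.seed(1)
vData <- data.frame(
  Date = as.Date("2011-01-01") + 1:200,
  vix.Close = rnorm(200, mean = 20, sd = 5),
  vstoxx = rnorm(200, mean = 25, sd = 5),
  vxfxi.Close = rnorm(200, mean = 30, sd = 5)
)

long <- pivot_longer(vData, -Date, names_to = "variable", values_to = "value")

# One row of the grid per index, each with its own y range
ggplot(long, aes(x = Date, y = value, color = variable)) +
  geom_line() +
  facet_grid(variable ~ ., scales = "free_y") +
  labs(title = "Volatility Indexes", x = "Time", y = "Index")
```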
