How to separately scale geoms with ggplot?

I am attempting to combine these two plots into one, but I cannot figure out how to scale them separately once they are combined. I have scaled the second y-axis to the proper level, but that does not change how the data are drawn on the plot.
The problem is that the values in plot 1 range from 0 to 2, while those in plot 2 range from 0 to 150,000, so even though I have scaled the second y-axis, the data do not follow that scale. I have read that scale_y_continuous() only adjusts the axis and does not change the plotted data.
So how would I scale each geom separately?
SETUP:
library(quantmod)
library(tidyverse)
SPY520 <- getOptionChain("SPY", "2022-05-20")
SPY1Mputs <- SPY520[["puts"]]
Plot 1:
SPY1Mputs %>%
ggplot() +
geom_point(aes(x = Strike, y = IV)) +
geom_smooth(aes(x = Strike, y = IV))
Plot 2:
SPY1Mputs %>%
ggplot() +
geom_col(aes(x = Strike, y = OI))
Attempted combine:
SPY1Mputs %>%
ggplot(aes(x = Strike)) +
geom_col(aes(y = OI)) +
geom_point(aes(y = IV)) +
geom_smooth(aes(y = IV)) +
scale_y_continuous(sec.axis = sec_axis((~./60000), name = "IV"))
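A minimal sketch of the usual fix, assuming a small made-up data frame `opts` with the same `Strike`, `IV` and `OI` columns as `SPY1Mputs` (the live option chain is not reproduced here). The secondary axis is only a relabelling of the primary one, so the IV layers have to be multiplied by the same factor that `sec_axis()` divides by; `opts` and `ratio` are names introduced for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for SPY1Mputs: IV roughly 0-2, OI up to ~150,000
set.seed(1)
opts <- data.frame(Strike = seq(300, 500, by = 10))
opts$IV <- seq(1.8, 0.2, length.out = nrow(opts)) + runif(nrow(opts), -0.05, 0.05)
opts$OI <- round(runif(nrow(opts), 1000, 150000))

# Factor that stretches the IV range onto the OI range
ratio <- max(opts$OI) / max(opts$IV)

p <- ggplot(opts, aes(x = Strike)) +
  geom_col(aes(y = OI)) +
  # IV is drawn on the OI (primary) scale ...
  geom_point(aes(y = IV * ratio)) +
  geom_smooth(aes(y = IV * ratio)) +
  # ... and the secondary axis undoes the stretch for its labels only
  scale_y_continuous(name = "OI", sec.axis = sec_axis(~ . / ratio, name = "IV"))

p
```

With the real data, replacing `opts` by `SPY1Mputs` should work the same way; a fixed factor such as the question's 60000 also works as long as the `aes()` multiplier and the `sec_axis()` divisor are the same number.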

Related

How do I customise the values on the y-axis?

I am using ggplot2 in R to create a histogram and I would like to customise the values on the y-axis. At present the y-axis labels start at 1 and have an interval of 3. I would like every value on the y-axis to be shown, i.e. 1, 2, 3 and so on.
How do I do this?
plot_2 <-
ggplot(Tennis, aes(x=winner)) +
geom_bar(data = subset(top_wins, tournament == "French Open")) +
ggtitle("French Open")
You can use the scale_y_continuous() function. Below is an example in which the y-axis runs from 0 to 20 with a break at every integer.
library(ggplot2)
ggplot() +
geom_point(data = iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, by = 1))
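For the bar chart in the question, the breaks can instead be derived from the data, so that every integer count up to the tallest bar gets a label. This is a sketch assuming a small made-up `wins` data frame (hypothetical players "A", "B", "C") in place of the question's `Tennis`/`top_wins` data:

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: one row per tournament win
wins <- data.frame(winner = c("A", "A", "A", "A", "B", "C", "C"))

# Height of the tallest bar: geom_bar() counts rows per winner
max_count <- max(table(wins$winner))

# One break per integer from 0 up to the tallest bar
y_breaks <- seq(0, max_count, by = 1)

p <- ggplot(wins, aes(x = winner)) +
  geom_bar() +
  scale_y_continuous(breaks = y_breaks) +
  ggtitle("French Open")

p
```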

Boxplot and line with dual y-axis from two data frame using ggplot in R

I am using ggplot to put a boxplot and a line in the same plot. I have two data frames; here are snippets of them:
TMA.core variable value
1 I-5 H&E 356642.6
2 B-1 H&E 490276.9
3 B-13 H&E 460831.8
4 L-11 H&E 551614.2
5 B-6 H&E 663711.8
6 F-10 H&E 596832.8
(there are many variables.)
TMA.core Mean CoV
I-5 390829.7 0.15181577
B-1 414909.9 0.21738852
B-13 500829.8 0.39049256
L-11 537229.7 0.07387486
B-6 575698.9 0.44764127
F-10 589245.2 0.15382864
What I want to do is draw a boxplot from the first data frame, then plot the CoV for the corresponding TMA core and connect the points with geom_line.
My code is:
ggplot() +
geom_boxplot(data = Merge_stats_melt, aes(x = reorder(TMA.core, value, FUN = mean), y = value)) +
geom_line(data = Merge_stas_mean_order, aes(x = reorder(TMA.core, Mean), y = CoV, group = 1)) +
scale_y_continuous(
# Add a second axis and specify its features
sec.axis = sec_axis(~./1000000, name = 'CoV')
)
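Using this code I can draw the boxplot, but the line is always a horizontal line at y = 0.
How can I solve this issue?
Whether you use one or two data frames doesn't really matter. The secondary axis only relabels the primary one, so you also have to rescale the y aesthetic of the line by the inverse of the sec_axis() transformation, which you forgot to do. Without that, CoV values below 1 are drawn on an axis that runs into the hundreds of thousands, so the line sits at the very bottom.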
library(ggplot2)
library(scales)
# Find the ideal scaling factor for the dual axis
ratio <- max(Merge_stats_melt$value) / max(Merge_stas_mean_order$CoV)
ggplot() +
geom_boxplot(data = Merge_stats_melt, aes(x = reorder(TMA.core, value, FUN = mean), y = value)) +
geom_line(data = Merge_stas_mean_order, aes(x = reorder(TMA.core, Mean), y = CoV*ratio, group = 1)) +
scale_y_continuous(labels=comma,
sec.axis = sec_axis(~./ratio, name = 'CoV')
)
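The arithmetic behind `ratio` can be checked in base R on just the six rows printed in the question (the full data are not shown, so the real ratio will differ). Multiplying CoV by `ratio` puts its largest value level with the largest box-plot value, and the secondary-axis formula `~ . / ratio` maps those positions back to the original CoV values:

```r
# The six value and CoV entries printed in the question
value_rows <- c(356642.6, 490276.9, 460831.8, 551614.2, 663711.8, 596832.8)
cov_values <- c(0.15181577, 0.21738852, 0.39049256, 0.07387486, 0.44764127, 0.15382864)

# Same scaling factor as in the answer
ratio <- max(value_rows) / max(cov_values)

# Heights at which geom_line() draws CoV on the primary (value) axis
drawn_y <- cov_values * ratio

# What the secondary axis reads at those heights: the transform ~ . / ratio
axis_reading <- drawn_y / ratio

ratio
```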

How to stop ggplot line plot adding fill

I am producing a ggplot of a curve in a dataset. When I build the plot, ggplot seems to automatically add fill to the data on the negative side of the x-axis. My script is below.
library(ggplot2)
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
Using base R, I get the plot I expect:
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how to stop it, I would be very appreciative. I would like this to work in ggplot so that I can later animate the line, as it is a time series that can be compared with other datasets from the same source.
I can add data if necessary, although the data set is 1561 observations long. Thanks in advance.
Try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
instead. geom_line() connects the points in order of the variable on the x-axis, whereas geom_path() connects them in the order in which they appear in the data.
Take a look at this example:
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 are connected first, creating a vertical black line. Next, the points at x = -pi/2 + 0.001 are processed, and so on, in increasing x order. Because every step jumps between the sine and the cosine curve, the dense vertical segments look like a filled area.
Therefore you should use geom_path() to get the desired result:
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
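The ordering rule can also be checked without drawing anything. In this sketch the tiny series `dt2` is made up for illustration: its x values run out to 3 and back again, much like a series whose x variable reverses direction. geom_path() joins the rows in data order, while geom_line() joins them as if the rows had first been sorted by x, so it hops back and forth between the outgoing and returning halves:

```r
# Hypothetical series whose x values go out and come back (0 -> 3 -> 0)
dt2 <- data.frame(
  step = 1:7,
  x = c(0, 1, 2, 3, 2, 1, 0),
  y = c(0, 1, 2, 3, 4, 5, 6)
)

# geom_path(): rows are joined in the order they appear in the data
path_order <- dt2$step

# geom_line(): rows are joined in order of x (ties keep their data order here)
line_order <- dt2$step[order(dt2$x)]

# y values visited by each geom
dt2$y[path_order]  # 0 1 2 3 4 5 6
dt2$y[line_order]  # 0 6 1 5 2 4 3: alternates between the two halves
```

With 1561 observations instead of 7, that alternation produces the dense zigzag that reads as fill.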

R, ggplot stacked bar-chart with position = "fill" and labels

I'm working with ggplot2 on a stacked bar plot scaled to 100% (relative values), using the position = "fill" option in geom_bar().
Here is my code:
test <- data.frame (x = c('a','a','a','b','b','b','b')
,k = c('k','j','j','j','j','k','k')
,y = c(1,3,4,2,5,9,7))
plot <- ggplot(test, aes(x =x, y = y, fill = k))
plot <- plot + geom_bar(position = "fill",stat = "identity")
plot <- plot + scale_fill_manual(values = c("#99ccff", "#ff6666"))
plot <- plot + geom_hline(yintercept = 0.50)+ggtitle("test")
plot
However, I need to add labels to the bars, including the "sub-bars". To do this, I used geom_text():
plot + geom_text(aes(label=y, size=4))
But the result is not good. I tried the hjust and vjust parameters without luck, and also something like:
plot + geom_text(aes(label=y/sum(y), size=4))
But I did not get the result I need (I'm not including all my attempts so as not to overload the question with useless images; please ask if needed).
Any ideas on how to get nicely centred labels?
label specifies what to show, and y specifies where to show it. Since the y-axis shows proportions with position = "fill", you need to calculate the label positions (geom_text(aes(y = ...))) as proportions within each x, using cumulative sums. Additionally, to display only the total proportion of a given colour, you need to keep a single row (here the last one) for each x, k combination. Here, I am building a separate test_labels dataset for use in geom_text to display the custom labels:
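library(dplyr)
library(ggplot2)
test <- data.frame (x = c('a','a','a','b','b','b','b'),
k = c('k','j','j','j','j','k','k'),
y = c(1,3,4,2,5,9,7))
test_labels = test %>%
arrange(x, desc(k)) %>%
group_by(x) %>%
# label position: cumulative proportion at the top of each segment
mutate(ylabel_pos = cumsum(y)/sum(y),
ylabel = y/sum(y)) %>%
# .add = TRUE (dplyr >= 1.0.0) groups by k in addition to x
group_by(k, .add = TRUE) %>%
mutate(ylabel = sum(ylabel)) %>%
# keep the last (highest) row of each colour within a bar
slice(n())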
ggplot(test, aes(x =x, y = y, fill = k)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_manual(values = c("#99ccff", "#ff6666")) +
geom_hline(yintercept = 0.50) +
geom_text(data = test_labels,
aes(y = ylabel_pos, label=paste(round(ylabel*100,1),"%")),
vjust=1.6, color="white", size=3.5) +
ggtitle("test")
Result:
> test_labels
# A tibble: 4 x 5
# Groups: x, k [4]
x k y ylabel_pos ylabel
<fctr> <fctr> <dbl> <dbl> <dbl>
1 a j 4 1.0000000 0.8750000
2 a k 1 0.1250000 0.1250000
3 b j 5 1.0000000 0.3043478
4 b k 7 0.6956522 0.6956522
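An alternative sketch, not taken from the answer above: aggregate the data to one row per x/k combination first, then let position_fill(vjust = 0.5) in geom_text() stack the labels the same way as the bars and centre each one in its segment, so no cumulative sums are needed. `totals` and `prop` are names introduced here for illustration.

```r
library(ggplot2)

test <- data.frame(x = c('a','a','a','b','b','b','b'),
                   k = c('k','j','j','j','j','k','k'),
                   y = c(1,3,4,2,5,9,7))

# One row per bar segment: total y for each colour (k) within each bar (x)
totals <- aggregate(y ~ x + k, data = test, FUN = sum)

# Share of each segment within its bar, used only for the label text
totals$prop <- totals$y / ave(totals$y, totals$x, FUN = sum)

ggplot(totals, aes(x = x, y = y, fill = k)) +
  geom_col(position = "fill") +
  # position_fill(vjust = 0.5) stacks the labels like the bars and
  # places each one halfway up its segment
  geom_text(aes(label = paste0(round(prop * 100, 1), "%")),
            position = position_fill(vjust = 0.5),
            colour = "white", size = 3.5) +
  scale_fill_manual(values = c("#99ccff", "#ff6666")) +
  geom_hline(yintercept = 0.50) +
  ggtitle("test")
```

The label values should match the ylabel column above (87.5% and 12.5% for a, 30.4% and 69.6% for b).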

Conditional colouring of a geom_smooth

I'm analyzing a series that varies around zero. And to see where there are parts of the series with a tendency to be mostly positive or mostly negative I'm plotting a geom_smooth. I was wondering if it is possible to have the color of the smooth line be dependent on whether or not it is above or below 0. Below is some code that produces a graph much like what I am trying to create.
library(ggplot2)
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5,2)
df <- data.frame(r+t,1:22)
colnames(df) <- c("x1","x2")
ggplot(df, aes(x = x2, y = x1)) + geom_hline(yintercept = 0) + geom_line() + geom_smooth()
I considered calculating the smoothed values, adding them to the df and then using a scale_color_gradient, but I was wondering if there is a way to achieve this in ggplot directly.
You can use the n argument of geom_smooth to increase the "number of points to evaluate smoother at", which creates more smoothed y values close to zero. Then use ggplot_build to grab the smoothed values from the ggplot object and draw them with a geom_line, coloured by sign, on top of the original plot. Finally, overplot the y = 0 values with geom_hline.
# basic plot with a larger number of smoothed values
p <- ggplot(df, aes(x = x2, y = x1)) +
geom_line() +
geom_smooth(linetype = "blank", n = 10000)
# grab smoothed values: layer_data() runs ggplot_build() and returns the data of one layer (layer 2 is the smooth)
df2 <- layer_data(p, 2)[ , c("x", "y")]
# add smoothed values with conditional color
p +
geom_line(data = df2, aes(x = x, y = y, color = y > 0)) +
geom_hline(yintercept = 0)
Something like this:
# loess data
res <- loess.smooth(df$x2, df$x1)
res <- data.frame(do.call(cbind, res))
res$posY <- ifelse(res$y >= 0, res$y, NA)
res$negY <- ifelse(res$y < 0, res$y, NA)
# plot
ggplot(df, aes(x = x2, y = x1)) +
geom_hline(yintercept = 0) +
geom_line() +
geom_line(data=res, aes(x = x, y = posY, col = "green")) +
geom_line(data=res, aes(x = x, y = negY, col = "red")) +
scale_color_identity()
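A third option, sketched here and not taken from either answer: fit the loess model that geom_smooth() uses by default for fewer than 1,000 rows (loess(y ~ x) with span = 0.75), evaluate it on a dense grid and colour it by sign. Note that loess.smooth(), used in the answer above, has different defaults (span = 2/3, degree = 1, family = "symmetric"), so its curve will not exactly match the one geom_smooth() draws. `fit`, `grid` and `above` are names introduced for illustration.

```r
library(ggplot2)

# Data from the question
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5, 2)
df <- data.frame(x1 = r + t, x2 = 1:22)

# Same model geom_smooth() fits by default for small data
fit <- loess(x1 ~ x2, data = df, span = 0.75)

# Evaluate the curve on a dense grid so sign changes fall close to zero
grid <- data.frame(x2 = seq(min(df$x2), max(df$x2), length.out = 1000))
grid$x1 <- predict(fit, newdata = grid)
grid$above <- grid$x1 > 0

ggplot(df, aes(x = x2, y = x1)) +
  geom_hline(yintercept = 0) +
  geom_line() +
  # group = 1 keeps the smooth as one unbroken line; ggplot2 then draws
  # it as short segments, each coloured by its starting point
  geom_line(data = grid, aes(colour = above, group = 1))
```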
