Plotting a geom_arc() over a geom_point() plot - r

I am trying to plot a 90 degree arc from the x axis to the y axis as part of a geom_point() plot I already have with the aim of creating two parts to the graph, inside the arc and outside.
This is the data I have:
set.seed(1)
vector1 <- sample(1:500,250, replace = T)
vector2 <- replicate(250,min(sample(200,2,replace=TRUE)))
so.df <- data.frame(vector1, vector2)
so.df
ggplot(data = so.df, aes(x=vector2, y=vector1)) + geom_point()
I would like the arc to go from 100 on the y-axis to 100 on the x-axis with a radius of 10.
If I can clarify anything further please ask.
Thanks.

You can make the necessary coordinates using a little bit of geometry and then add it on in an additional geom_line:
library(ggplot2)
library(dplyr) # for %>% and mutate()
set.seed(1)
vector1 <- sample(1:500, 250, replace = TRUE)
vector2 <- replicate(250, min(sample(200, 2, replace = TRUE)))
so.df <- data.frame(vector1, vector2)
# Quarter circle of radius 100 centred on the origin: y = sqrt(r^2 - x^2)
arc_data <- data.frame(
x = 0:100
) %>%
mutate(y = sqrt(100^2 - x^2))
ggplot(data = so.df, aes(x=vector2, y=vector1)) + geom_point() +
geom_line(aes(x, y), data = arc_data, col = "red")
That being said, I'm not sure how something with radius 10 reaches 100 on both axes?
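As a variation on the same geometry, the arc can be generated from angles rather than from x values, which spaces the points evenly along the curve, and the same radius can be used to flag which points fall inside it. This is only a sketch assuming a radius of 100 centred on the origin; quarter_arc() and inside_arc() are hypothetical helper names, not part of ggplot2.

```r
# Quarter arc from the x-axis (angle 0) to the y-axis (angle pi/2).
quarter_arc <- function(radius, n = 101) {
  theta <- seq(0, pi / 2, length.out = n)
  data.frame(x = radius * cos(theta), y = radius * sin(theta))
}

# TRUE for points on or inside the arc, FALSE for points outside it.
inside_arc <- function(x, y, radius) {
  x^2 + y^2 <= radius^2
}

arc_data <- quarter_arc(100)
inside_arc(c(30, 90), c(40, 90), 100)
```

The resulting arc_data can be passed to geom_path(aes(x, y), data = arc_data) in place of geom_line(), and inside_arc(so.df$vector2, so.df$vector1, 100) could be mapped to colour to distinguish the two regions.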

Related

Add a labelling function to just first or last ggplot label

I often find myself working with data with long-tail distributions, so that a huge amount of the range in values happens in the top 1-2% of the data. When I plot the data, the upper outliers cause variation in the rest of the data to wash out, but I want to show those differences.
I know there are other ways of handling this, but I found that capping the values towards the end of the distribution and then applying a continuous color palette (i.e., in ggplot) is one way that works for me to represent the data. However, I want to ensure the legend stays accurate by adding a >= sign to the last legend label.
The picture below shows the kind of legend I'd like to achieve programmatically, with the >= sign drawn in messily in red.
I also know I can manually set breaks and labels, but I'd really like to just do something like, if(it's the last label) ~paste0(">=",label) else label) (to show with pseudo code)
Reproducible example:
(I want to alter the plot legend to prefix just the last label)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- tibble(x = x
,y = y
,z = z)
d %>%
ggplot(aes(x = x
,y = y
,fill = z
,color = z)) +
geom_point() +
scale_color_viridis_c()
One option would be to pass a function to the labels argument which replaces the last element or label with your desired label like so:
library(ggplot2)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- data.frame(
x = x,
y = y,
z = z
)
ggplot(d, aes(
x = x,
y = y,
fill = z,
color = z
)) +
geom_point() +
scale_fill_continuous(labels = function(x) {
x[length(x)] <- paste0(">=", x[length(x)])
x
}, aesthetics = c("color", "fill"))
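The labelling function can also be pulled out into a named helper. The sketch below is one possible version, assuming the scale may hand the function NA entries for breaks outside the limits, so it prefixes the last non-missing label rather than simply the last element. label_cap_last() is a hypothetical name, not a ggplot2 function.

```r
# Prefix the last non-missing break label with ">=" (or another prefix).
label_cap_last <- function(breaks, prefix = ">=") {
  labels <- as.character(breaks)
  idx <- which(!is.na(breaks))
  if (length(idx) > 0) {
    last <- idx[length(idx)]
    labels[last] <- paste0(prefix, labels[last])
  }
  labels
}

label_cap_last(c(20, 40, 60, 80))
```

It would be used as labels = label_cap_last in the scale call, in the same position as the anonymous function above.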

trouble adding SMA fit line to ggplot - geom_segment() nor geom_abline() don't match

I am trying to add the trendline from an SMA (standardized major axis) fit to my ggplot. However, when I extract the coefficients from the SMA and give them to geom_abline(), the line extends over the entire plot instead of clipping to the data. The natural solution would be to use geom_segment() instead, manually calculating the endpoints of the line. However, when I do this, the two lines don't match each other and neither matches the SMA fit. What's going on here?
I am aware that you can use the plot function directly on an sma object but I would prefer to use ggplot
Note: this is my first time asking a question so my apologies if I'm missing something!
Edit: I am using a log-log axis, which I suspect may be part of the issue.
Reproducible version below:
library(tidyverse)
library(smatr) #for the SMA
# sample data set
x <- rlnorm(100, meanlog = 10)
var <- rlnorm(100, meanlog = 10)
df <- data.frame(x=x, y=x+var)
# fit using an SMA
sm <- sma(x~y, data = df, log = "xy")
# get sma coefficients into a data.frame
bb <- data.frame(coef(sm))
bb <- bb %>%
rownames_to_column(var = "Coef") %>%
pivot_wider(names_from = "Coef", values_from = "coef.sm.")
## calculate end coordinates for segment
bb$min_x <- min(df$x, na.rm = TRUE)
bb$max_x <- max(df$x, na.rm = TRUE)
bb <- bb %>%
mutate(min_y = (slope*min_x) + elevation) %>%
mutate(max_y = (slope*max_x) + elevation)
# plot into ggplot
p1 <- ggplot(df, aes(x=x, y=y)) +
geom_point(shape=21) +
scale_y_continuous(trans = 'log10')+
scale_x_continuous(trans = 'log10') +
geom_abline(data=bb,aes(intercept=elevation,slope=slope), color = "blue")
p1 + geom_segment(data=bb, aes(x=min_x, xend=max_x, y=min_y, yend=max_y), color = "orange")
#this is the plot from the smatr package for comparison
plot(sm)
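One likely contributor, in line with the suspicion about the log-log axes: with log = "xy", smatr fits the line to log10-transformed values, so the coefficients describe log10(y) = elevation + slope * log10(x) rather than y = elevation + slope * x. The segment endpoints then need to be computed in log space and back-transformed. The sketch below assumes the coefficients come from a fit specified as y ~ x (the code above uses x ~ y, which treats x as the response); segment_from_log_fit() is a hypothetical helper, not part of smatr.

```r
# Endpoints of a line fitted on log10-log10 axes, returned on the data scale.
# slope and elevation are assumed to be on the log10 scale.
segment_from_log_fit <- function(slope, elevation, x_values) {
  x <- range(x_values, na.rm = TRUE)
  y <- 10^(elevation + slope * log10(x))
  data.frame(x = x[1], xend = x[2], y = y[1], yend = y[2])
}

seg <- segment_from_log_fit(slope = 2, elevation = 1, x_values = c(10, 50, 100))
seg
```

The resulting data frame could be passed to geom_segment(data = seg, aes(x = x, xend = xend, y = y, yend = yend)) on the log10-scaled plot.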

Use gganimate to display calculation of tweened data

I would like to use gganimate to:
Graph two separate curved lines with geom_path
Call a function that performs a calculation with the data from those lines and returns a single coordinate (x, y)
Plot that coordinate as a geom_point
Move the lines around, with the geom_point updating as the lines move
This is simple if the movement is such that the single (x, y) coordinate moves linearly (just calculate it at each stage ahead of time and then animate it, it will move linearly from each stage to the next), but if it's not I'm not sure what to do. If I call a function within aes(), which seems like the natural solution, it seems to calculate it once at the beginning and then not update it as the lines move.
Here is an example.
library(tidyverse)
library(gganimate)
# A function to find the x and y coordinate of the minimum y value of either set
min_of_both <- function(x1, y1, x2, y2) {
cm <- bind_rows(tibble(x = x1, y = y1),
tibble(x = x2, y = y2))
return(list(x = cm[which(cm$y == min(cm$y)),]$x,
y = min(cm$y)))
}
# Create two parabola paths, curve A which moves downwards from t = 1 to t = 2
curveA <- tibble(xA = -50:50/10, yA = 5+(-50:50/10)^2, t = 1) %>%
bind_rows(tibble(xA = -50:50/10, yA = -10 + (-50:50/10)^2, t = 2))
# And curve B which is static in both time 1 and 2
curveB <- tibble(xB = -50:50/10, yB = 1 + (-30:70/10)^2)
data <- curveB %>%
bind_rows(curveB) %>%
bind_cols(curveA)
# Plot Curve A
p <- ggplot(data, aes(x = xA, y = yA)) +
geom_path(color = 'red') +
# And Curve B
geom_path(aes(x=xB,y=yB), color = 'blue')+
# Then plot a single point that uses both curves as input
# Note I also get problems if trying to run the function through data= instead of mapping=
# or if I define two separate functions, one for x and one for y, so as to avoid $
geom_point(aes(
x = min_of_both(xA,yA,xB,yB)$x,
y = min_of_both(xA,yA,xB,yB)$y),
size = 3,
color = 'black') +
theme_minimal()+
transition_states(t)+
ease_aes('sine-in-out')
animate(p)
This results in (not sure if the animation will play on StackOverflow but the parabola does indeed move):
The black dot is intended to mark the lowest y-coordinate on either parabola at each moment, but instead it marks the lowest y-coordinate on either parabola at any point in the animation (at the end).
Any tips appreciated.
After a lot of head-scratching I think I've understood your point and have found one solution. The best way forward might be to manually tween the paths and calculate the min values using your function whilst grouping by .frame before plotting:
library(tweenr) # tween_along() comes from tweenr, which gganimate does not attach
# Same curve setup, but labelling points for grouping later
curveA <- tibble(xA = -50:50/10,
yA = 5+(-50:50/10)^2,
point = 1:101,
t = 1) %>%
bind_rows(tibble(xA = -50:50/10,
yA = -10 + (-50:50/10)^2,
point = 1:101,
t = 2))
curveB <- tibble(xB = -50:50/10,
yB = 1 + (-30:70/10)^2,
point = 1:101,
t = 1)
A_frames <- curveA %>%
tween_along(ease = 'sine-in-out', 100, along = t, id = point) %>%
filter(.phase == "transition") %>%
select(xA, yA, point, .frame) %>%
arrange(.frame, point) # arrange by point needed to keep in order
B_frames <- curveB %>%
bind_rows(curveB %>% mutate(t = 2)) %>%
tween_along(ease = 'sine-in-out', 100, along = t, id = point) %>%
filter(.phase == "transition") %>%
select(xB, yB, point, .frame) %>%
arrange(.frame, point)
data <- A_frames %>%
left_join(B_frames, by = c(".frame", "point")) %>%
group_by(.frame) %>%
mutate(xmin = min_of_both(xA,yA,xB,yB)$x,
ymin = min_of_both(xA,yA,xB,yB)$y)
# Plot Curve A
p <- ggplot(data, aes(x = xA, y = yA)) +
geom_path(color = 'red') +
# And Curve B
geom_path(aes(x=xB,y=yB), color = 'blue')+
# Then plot the per-frame minimum computed above (xmin, ymin)
geom_point(aes(xmin, ymin),
size = 3,
color = 'black') +
theme_minimal()+
transition_states(.frame)+
ease_aes('sine-in-out')
animate(p, fps = 24)
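One caveat with min_of_both() as written in the question: it returns every x whose y equals the minimum, so a tie between the curves yields more than one value where a single coordinate is expected. A sketch of a variant that always returns exactly one point, using which.min() (which takes the first minimum), is shown below. min_of_both_first() is a hypothetical name, and it only needs base R.

```r
# Return the (x, y) of the lowest point across both curves.
# On ties, the first occurrence (curve 1 before curve 2) is returned.
min_of_both_first <- function(x1, y1, x2, y2) {
  x <- c(x1, x2)
  y <- c(y1, y2)
  i <- which.min(y)
  list(x = x[i], y = y[i])
}

min_of_both_first(c(1, 2), c(5, 3), c(3, 4), c(3, 9))
```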

Transform angles from polar coordinates

I am in need of some help with angles.
Calculating bearings with the package fossil and the function fossil::earth.bear we obtain, according to the help,
"the bearing in degrees clockwise from True North between any two points on the globe".
I have a vector of angles obtained with this function, and I need to transform them so the origin (0º) is on the x axis (East) and the angles increment counterclockwise.
Basically, I need to rotate my angles 90º clockwise (so that 0º lies on the x axis, "facing East") and then measure the angle in the opposite direction (counterclockwise).
Intuitively, I've tried adding 90º to my bearings (to rotate clockwise) and then subtracting them from 360 (to calculate the angle in the "opposite direction").
However, it does not work and I highly suspect that there is a different thing to do for each quadrant, but I just can't figure it out.
Below is a test with dummy data and polar histograms showing that the solution does not work: the resulting vector bearing2 is not equivalent to the starting vector bearing.
# Generate vector with 100 random values between 0 and 360
set.seed(123)
bearing <- runif(100, 0,360)
# generate a histogram with values binned every 5º
breaks = seq(0, 360, by=5)
bearing.cut = cut(bearing, breaks, right=FALSE)
bearing.freq = as.data.frame(table(bearing.cut))
bearing.freq$bearing.cut <- seq(5,360, by = 5)
#plot with ggplot
library(ggplot2)
ggplot(bearing.freq, aes(x = bearing.cut, y = Freq)) +
coord_polar(theta = "x", start = 0, direction = 1) + #start 0 for north, direction 1 for clockwise
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0, 360, 5))
This is the plot that this creates
Now I perform the mentioned operations in my bearing vector
bearing2 <- 360-(bearing-90)
# repeat the process to generate freq table and plot
breaks = seq(0, 360, by=5)
bearing.cut2 = cut(bearing2, breaks, right=FALSE)
bearing.freq2 = as.data.frame(table(bearing.cut2))
bearing.freq2$bearing.cut <- seq(5,360, by = 5)
#plot with ggplot
library(ggplot2)
ggplot(bearing.freq2, aes(x = bearing.cut, y = Freq)) + # numeric bin column; bearing.cut2 is a factor
coord_polar(theta = "x", start = -pi/2, direction = -1) + # now start at E and counterclockwise
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0, 360, 5))
And this is the plot that this generates. Clearly, if my conversion was correct, these two plots should look the same... and they don't.
** I have edited as per Gregor's suggestion (and to set a seed so it is repeatable). Looks better but we lose all angles between 0º and 90º. Which reinforces my initial idea that there's a different operation to do for each quadrant, but still can't figure it out. Still, thanks for the tip!
Ok I think I figured it out but not really sure why it works. I'll just leave it here to mark the question as answered.
The solution is that, for the first quadrant (angles between 0º and 90º), we need the complementary angle, 90 - bearing. For the other quadrants, we do what Gregor suggested: 360 - (bearing - 90).
Below the complete code to a reproducible example
library(ggplot2)
set.seed(123)
# 0º at North and clockwise
bearing <- runif(100, 0,360)
#create histogram
breaks = seq(0, 360, by=5) # bin edges every 5º
bearing.cut = cut(bearing, breaks, right=FALSE)
bearing.freq = as.data.frame(table(bearing.cut))
bearing.freq$bearing.cut <- seq(5,360, by = 5)
#plot
p1 <- ggplot(bearing.freq, aes(x = bearing.cut, y = Freq)) +
coord_polar(theta = "x", start =0, direction = 1) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0, 360, 5))
# transform to 0º at E and counterclockwise
bearing2 <- ifelse(bearing <=90, (90-bearing), (360 - (bearing - 90)))
#create histogram
bearing.cut2 = cut(bearing2, breaks, right=FALSE)
bearing.freq2 = as.data.frame(table(bearing.cut2))
bearing.freq2$bearing.cut <- seq(5,360, by = 5)
# plot
p2 <- ggplot(bearing.freq2, aes(x = bearing.cut, y = Freq)) +
coord_polar(theta = "x", start = -pi/2, direction = -1) +
geom_bar(stat = "identity") +
scale_x_continuous(breaks = seq(0, 360, 5))
require(gridExtra)
grid.arrange(p1, p2, ncol=2)
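As for why the two-branch rule works: 360 - (bearing - 90) equals 450 - bearing, which lands at or above 360 whenever the bearing is 90º or less, so those angles only need wrapping back into the 0º to 360º range. Both branches therefore collapse into a single modulo expression. A minimal sketch follows; bearing_to_math_angle() is a hypothetical name.

```r
# Convert compass bearings (0 = North, clockwise) to mathematical angles
# (0 = East, counterclockwise), wrapped into [0, 360).
bearing_to_math_angle <- function(bearing) {
  (90 - bearing) %% 360
}

bearing_to_math_angle(c(0, 45, 90, 180, 270))
# 90 45 0 270 180
```

For bearings in the range 0º to 360º this gives the same values as the ifelse() version above.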

Draw circle in date axis with ggplot2

I am drawing a plot with ggplot2 and want to add a circle to it.
I searched and found this solution:
Draw a circle with ggplot2
However, I can't use it as is, because my plot's x axis is in Date format.
my_plot <- qplot(Day, value, data = target_data_melt, shape = variable, colour = variable, geom="line")
my_plot <- my_plot + scale_x_date(labels = date_format("%Y-%m"))
How can I draw a circle in my plot?
Is there any way to draw a circle in Date axis?
target_data_melt looks like this.
Day variable value
1 2010-10-01 231 0.007009346
2 2010-10-03 231 0.005204835
3 2010-10-05 231 0.006214004
You can adapt the code from the link you provided to format the x-coordinate as Date:
library(ggplot2) # as.Date() is base R, so the "date" package is not needed
circle <- function(center_Date = as.Date("2012-11-24"),
center_y = 0,
r.x = 100,
r.y = 100,
npoints = 100) {
cycle <- seq(0,2*pi,length.out = npoints)
xx <- center_Date + r.x * cos(cycle)
yy <- center_y + r.y * sin(cycle)
return(data.frame(x = xx, y = yy))
}
And a demonstration:
df <- circle()
p <- ggplot(df, aes(x, y)) + geom_path()
print(p)
Example image (with an adjusted date and y-center) here.
You'll have to set the r.x and r.y properly to get a perfect circle (rather than an oval). What these should be depends on the scales you use in your plots.
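One rough way to pick the two radii is to scale the y radius by the ratio of the y range to the date range, so that the shape spans the same fraction of each axis. The sketch below uses the three rows shown in the question and assumes a square plotting panel whose limits match the data range, so the result is only approximately circular in practice. circle_on_dates() is a hypothetical helper, with the x radius given in days.

```r
# Points on an ellipse around (center_date, center_y);
# r_days is the x radius in days, r_y the y radius in value units.
circle_on_dates <- function(center_date, center_y, r_days, r_y, npoints = 100) {
  angle <- seq(0, 2 * pi, length.out = npoints)
  data.frame(
    x = center_date + r_days * cos(angle),
    y = center_y + r_y * sin(angle)
  )
}

dates <- as.Date(c("2010-10-01", "2010-10-03", "2010-10-05"))
values <- c(0.007009346, 0.005204835, 0.006214004)

r_days <- 1
# Scale the y radius so both radii cover the same fraction of their axis range.
r_y <- r_days * diff(range(values)) / as.numeric(diff(range(dates)))
circ <- circle_on_dates(dates[2], mean(values), r_days, r_y)
```

The circ data frame could then be added to the existing plot with geom_path(aes(x, y), data = circ, inherit.aes = FALSE).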
