Exclude a particular area from geom_smooth fit automatically


I am plotting different plots in my shiny app.
By using geom_smooth(), I am fitting a smoothing curve on a scatterplot.
I am plotting these plots with ggplot() and rendering with ggplotly().
Is there any way to exclude a particular data profile from the geom_smooth() fit?
For example:
As can be seen in the plot, the fit is disturbed by that profile, which is not desirable. I have tried plotly_click(), plotly_brush() and plotly_select(), but I don't want the user to have to intervene when plotting this fit, because that makes the process much slower and less accurate.
Here is my code to plot this:
#plot
g <- ggplot(data = d_f4, aes_string(x = d_f4$x, y = d_f4$y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Unfortunately, I can not include my dataset in my question, because the dataset is quite big.

You can make an extra data.frame without the "outliers" and use this as the input for geom_smooth:
set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1
data_plot <- test_data[-c(60:65), ]
library(ggplot2)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1) +
geom_smooth(data = data_plot, formula = y ~ splines::bs(x, df = 10), method = "lm", color = "green3", level = 1, size = 1)
Created on 2020-11-27 by the reprex package (v0.3.0)
BTW: you don't need aes_string() (which is deprecated) or d_f4$x; you can just use aes(x = x, y = y).
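The question asks for the exclusion to happen automatically, without picking rows by hand. One possible sketch, assuming the unwanted profile shows up as unusually large residuals from a first-pass fit: flag points whose residual lies more than k robust standard deviations (computed with mad()) from the median residual, and pass only the remaining rows to geom_smooth(). The cutoff k = 3 and the names fit0, res and keep are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Same simulated data as in the answer above
set.seed(8)
test_data <- data.frame(x = 1:100)
test_data$y <- sin(test_data$x / 10) + rnorm(100, sd = 0.1)
test_data[60:65, "y"] <- test_data[60:65, "y"] + 1

# First-pass fit on all rows, using the same spline formula as the plot
fit0 <- lm(y ~ splines::bs(x, df = 10), data = test_data)
res  <- residuals(fit0)

# Assumed cutoff: keep points within k robust SDs of the median residual
k    <- 3
keep <- abs(res - median(res)) <= k * mad(res)
data_plot <- test_data[keep, ]

# All points are drawn, but only the kept rows feed the smoother
p <- ggplot(data = test_data, aes(x = x, y = y)) + theme_bw() +
  geom_point(colour = "blue", size = 0.1) +
  geom_smooth(data = data_plot, formula = y ~ splines::bs(x, df = 10),
              method = "lm", color = "green3", se = FALSE)
```

Because the first-pass fit is itself pulled toward the disturbed region, the cutoff may need tuning for real data, or the filter may need to be repeated on the refitted model.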

Related

Fit grouped curves by label in ggplot2

While making a nomogram of removal (R) as a function of depth and time of sedimentation, I need to fit curves (as parabolas) through the removal labels, grouped by the ten they fall under (7 goes up to 10, and 18 to 20). This is very close to what I need.
library(dplyr)    # for %>%, transmute() and mutate()
library(ggplot2)
data.frame(
depth=rep(seq(0.5, 3.5, 0.5), each=8),
time=rep(seq(0, 280, 40), times=7),
ss = c(
820,369,238,164,107,66,41,33,
820,224,369,279,213,164,115,90,
820,631,476,361,287,230,180,148,
820,672,558,426,353,287,238,187,
820,713,590,492,402,344,262,230,
820,722,615,533,460,394,320,262,
820,738,656,574,492,418,360,303)
) %>%
transmute(
depth = depth,
time = time,
R = 100*(1- ss/820)
) %>%
mutate(G=factor(round(R, digits=-1))) %>%
ggplot(aes(x=time, y=depth, colour=time))+
geom_label(aes(label=round(R)))+
scale_y_continuous(trans = "reverse")+
geom_path(aes(group=G))
But this does not produce parabolic curves. How can I smooth them while keeping the grouping by tens?

I'm not sure if this is what you're looking for. I separated the data and the plot and applied stat_smooth to each group. Unfortunately, the smoothed lines do not follow the color scheme. You will also see several warnings due to the way this creates the splines.
plt <- ggplot(df1, aes(x=time, y=depth, colour = time)) +
geom_label(aes(label=round(R))) +
scale_y_continuous(trans = "reverse") +
geom_path(aes(group=G), size = .6, alpha = .5)
lapply(1:length(unique(df1$G)),
function(i){
df2 <- df1 %>% filter(G == unique(G)[i])
plt <<- plt +
stat_smooth(data = df2, size = .5,
aes(x = time, y = depth),
se = F, method = lm, color = "darkred",
formula = y ~ splines::bs(x, knots = nrow(df2)))
})
You can extend this further with additional parameters. I'm just not sure exactly what you're expecting.
plt <- ggplot(df1, aes(x=time, y=depth, colour = time)) +
geom_label(aes(label=round(R))) +
scale_y_continuous(trans = "reverse") +
geom_path(aes(group=G), size = .6, alpha = .5)
lapply(1:length(unique(df1$G)),
function(i){
df2 <- df1 %>% filter(G == unique(G)[i])
# u <- df1 %>% {nrow(unique(.[,c(1:2)]))}
plt <<- plt +
stat_smooth(
data = df2, size = .5,
aes(x = time, y = depth),
se = F, method = lm, color = "darkred",
formula = y ~ splines::bs(x, knots = nrow(df2),
degree = ifelse(nrow(df2) <= 4,
3, nrow(df2) - 2)))
})
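The answer above refers to df1 without defining it, and adds one stat_smooth() layer per group through lapply() and <<-. As a rough alternative sketch, assuming df1 is simply the question's data with R and G added, a single stat_smooth() call with a group aesthetic and a quadratic formula fits one parabola per group. The helper name df_fit and the rule of keeping only groups with at least three distinct time values (the minimum for a quadratic) are assumptions made here.

```r
library(dplyr)
library(ggplot2)

# Assumed reconstruction of df1 from the question's data
df1 <- data.frame(
  depth = rep(seq(0.5, 3.5, 0.5), each = 8),
  time  = rep(seq(0, 280, 40), times = 7),
  ss = c(820, 369, 238, 164, 107,  66,  41,  33,
         820, 224, 369, 279, 213, 164, 115,  90,
         820, 631, 476, 361, 287, 230, 180, 148,
         820, 672, 558, 426, 353, 287, 238, 187,
         820, 713, 590, 492, 402, 344, 262, 230,
         820, 722, 615, 533, 460, 394, 320, 262,
         820, 738, 656, 574, 492, 418, 360, 303)
) %>%
  mutate(R = 100 * (1 - ss / 820),
         G = factor(round(R, digits = -1)))

# A quadratic needs at least 3 distinct x values, so drop smaller groups
df_fit <- df1 %>%
  group_by(G) %>%
  filter(n_distinct(time) >= 3) %>%
  ungroup()

plt <- ggplot(df1, aes(x = time, y = depth)) +
  geom_label(aes(label = round(R), colour = time)) +
  scale_y_continuous(trans = "reverse") +
  # One second-degree polynomial (parabola) per group G
  stat_smooth(data = df_fit, aes(group = G), method = "lm",
              formula = y ~ poly(x, 2), se = FALSE, colour = "darkred")
```

Note that this fits depth as a quadratic function of time within each group; whether that is the right orientation for the nomogram depends on what the curves are meant to represent.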

How to change font size of R^2 on scatterplot?

I have created a scatterplot and have included my R^2 value on the figure. However, I want to reduce the text size of the R^2 value but can't work out how to do it. My code is below.
ggplot(Gully, aes(x = Downstream, y = Depth))+
geom_point(size = 0.5)+
stat_smooth(method= "lm", col = "black", size = 0.5) +
theme_bw()+
theme_classic()+
stat_regline_equation(label.y = -7, aes(label = ..rr.label.., size = 4))+
labs(y = "Decline in waterhole depth (m)", x = "Downstream distance (km)")+
theme(text=element_text(size=8, family = "Arial"))
Any suggestions would be great.
Thank you
Marita

You probably don't want to map a variable with level "4" to the size aesthetic, which you do if you put size = 4 in aes().
You can simply set a size = 4 for the text size if you set the argument inside stat_regline_equation but outside of aes().
In the absence of minimal example data in your question, here is an example from the stat_regline_equation help page.
library(ggplot2)
library(ggpubr)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y, group = c("A", "B"),
y2 = y * c(0.5,2), block = c("a", "a", "b", "b"))
# Fit polynomial regression line and add labels
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(my.data, aes(x, y2, color = group)) +
geom_point() +
stat_smooth(aes(fill = group, color = group), method = "lm", formula = formula) +
stat_regline_equation(
aes(label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~~")),
formula = formula, size = 8 ## size argument outside of aes()
) +
theme_bw()
Created on 2021-09-22 by the reprex package (v2.0.1)
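Applied to the plot from the question, the same fix might look like the sketch below. The Gully data here is simulated purely as a stand-in (the real data was not posted), and the choice of 8 / .pt as the label size is an assumption: text size in geom layers is given in mm, and dividing by ggplot2's .pt constant converts a point size so the label roughly matches the 8 pt theme text.

```r
library(ggplot2)
library(ggpubr)

# Simulated stand-in for the asker's Gully data (values are made up)
set.seed(1)
Gully <- data.frame(Downstream = seq(0, 50, length.out = 60))
Gully$Depth <- -0.1 * Gully$Downstream + rnorm(60, sd = 0.8)

p <- ggplot(Gully, aes(x = Downstream, y = Depth)) +
  geom_point(size = 0.5) +
  stat_smooth(method = "lm", formula = y ~ x, col = "black") +
  theme_classic() +
  # label is mapped inside aes(); size is a fixed value set outside aes()
  stat_regline_equation(aes(label = after_stat(rr.label)), size = 8 / .pt) +
  labs(y = "Decline in waterhole depth (m)", x = "Downstream distance (km)") +
  theme(text = element_text(size = 8))
```

after_stat(rr.label) is the newer spelling of ..rr.label.. and needs ggplot2 3.3.0 or later; with older versions the dotted form from the question works the same way.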

Fitting Rayleigh in R

This code
library(ggplot2)
library(MASS)
# Generate gamma rvs
x <- rgamma(100000, shape = 2, rate = 0.2)
den <- density(x)
dat <- data.frame(x = den$x, y = den$y)
ggplot(data = dat, aes(x = x, y = y)) +
geom_point(size = 3) +
theme_classic()
# Fit parameters (to avoid errors, set lower bounds to zero)
fit.params <- fitdistr(x, "gamma", lower = c(0, 0))
# Plot using density points
ggplot(data = dat, aes(x = x,y = y)) +
geom_point(size = 3) +
geom_line(aes(x=dat$x, y=dgamma(dat$x,fit.params$estimate["shape"], fit.params$estimate["rate"])),
color="red", size = 1) +
theme_classic()
fits and plots the distribution of series x. The resulting plot is:
Packages stats and MASS seem not to support the Rayleigh distribution. How can I extend the previous code to the Rayleigh distribution?

In the code below I start by recreating the vector x, this time setting the RNG seed, in order to make the results reproducible. Then a data.frame dat with only that vector is also recreated.
The density functions of the Gamma and Rayleigh distributions are overlaid on the histogram of x by first estimating their parameters and then drawing the fitted densities with stat_function.
library(ggplot2)
library(MASS)
library(extraDistr) # for the Rayleigh distribution functions
# Generate gamma rvs
set.seed(2020)
x <- rgamma(100000, shape = 2, rate = 0.2)
dat <- data.frame(x)
# Fit parameters (to avoid errors, set lower bounds to zero)
fit.params <- fitdistr(dat$x, "gamma", lower = c(0, 0))
ggplot(data = dat, aes(x = x)) +
geom_histogram(aes(y = ..density..), bins = nclass.Sturges(x)) +
stat_function(fun = dgamma,
args = list(shape = fit.params$estimate["shape"],
rate = fit.params$estimate["rate"]),
color = "red", size = 1) +
ggtitle("Gamma density") +
theme_classic()
fit.params.2 <- fitdistrplus::fitdist(dat$x, "rayleigh", start = list(sigma = 1))
fit.params.2$estimate
ggplot(data = dat, aes(x = x)) +
geom_histogram(aes(y = ..density..), bins = nclass.Sturges(x)) +
stat_function(fun = drayleigh,
args = list(sigma = fit.params.2$estimate),
color = "blue", size = 1) +
ggtitle("Rayleigh density") +
theme_classic()
To plot points and lines like in the question, not histograms, use the code below.
den <- density(x)
orig <- data.frame(x = den$x, y = den$y)
ggplot(data = orig, aes(x = x)) +
geom_point(aes(y = y), size = 3) +
geom_line(aes(y = dgamma(x, fit.params$estimate["shape"], fit.params$estimate["rate"])),
color="red", size = 1) +
geom_line(aes(y = drayleigh(x, fit.params.2$estimate)),
color="blue", size = 1) +
theme_classic()
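If installing extra packages is not an option, the Rayleigh case can also be handled in base R, because its maximum likelihood estimate has a closed form: sigma_hat = sqrt(sum(x^2) / (2 * n)). The sketch below is only an illustration of that formula; the function names rayleigh_pdf and rayleigh_mle are made up here and are not part of extraDistr or fitdistrplus.

```r
# Rayleigh density: f(x; sigma) = x / sigma^2 * exp(-x^2 / (2 * sigma^2)) for x >= 0
rayleigh_pdf <- function(x, sigma) {
  ifelse(x >= 0, x / sigma^2 * exp(-x^2 / (2 * sigma^2)), 0)
}

# Closed-form maximum likelihood estimate of sigma
rayleigh_mle <- function(x) sqrt(sum(x^2) / (2 * length(x)))

# Sanity check on simulated Rayleigh data (inverse-CDF sampling), true sigma = 2
set.seed(2020)
sim <- 2 * sqrt(-2 * log(runif(10000)))
sigma_hat <- rayleigh_mle(sim)
```

The estimate can then be passed to stat_function(fun = rayleigh_pdf, args = list(sigma = sigma_hat)) in the same way as in the plots above, and compared with the numerical estimate returned by fitdist.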

Different color scale for geom_point and geom_smooth on ggplot

I am trying to plot observations and their grouped regression lines with ggplot as follows:
ggplot(df, aes(x = cabpol.e, y = pred.vote_share, color = coalshare)) +
geom_point() +
scale_color_gradient2(midpoint = 50, low="blue", mid="green", high="red") +
geom_smooth(aes(x = cabpol.e, y = pred.vote_share, group=coalshare1, fill = coalshare1), se = FALSE, method='lm') +
scale_fill_manual(values = c(Junior="blue", Medium="green", Senior="red"))
The problem is that the lines from geom_smooth are all the same color. I tried using scale_fill_manual so that there aren't two different color scales, manually determining which color corresponds to each group, but instead all the lines appear blue. How can I make each line a different color?
As requested, here is a set of replicable data with the same problem:
set.seed(1000)
dff <- data.frame(x=rnorm(100, 0, 1),
y=rnorm(100, 1, 2),
z=seq(1, 100, 1),
g=rep(c("A", "B"), 50))
ggplot(dff, aes(x = x, y = y, color = z, group = g, fill = g)) +
geom_point() +
scale_color_gradient2(midpoint = 50, low="blue", high="red") +
geom_smooth(se = FALSE, method='lm')

My solution to this problem would be to create multiple geom_smooth calls, and each time subset the data for the desired factor level. This way you are able to pass a different color to each call of geom_smooth. As long as you do not have many factors, this solution is not terribly inefficient.
library(dplyr)    # for filter()
library(ggplot2)
dff <- data.frame(x=rnorm(100, 0, 1),
y=rnorm(100, 1, 2),
z=seq(1, 100, 1),
g=rep(c("A", "B"), 50))
ggplot(dff, aes(x = x, y = y,
color = z,
group = g)) +
geom_point() +
scale_color_gradient2(midpoint = 50, low="blue", high="red") +
geom_smooth(
aes(x = x, y =y),
color = "red",
method = "lm",
data = filter(dff, g == "A"),
se = FALSE
) +
geom_smooth(
aes(x = x, y =y),
color = "blue",
method = "lm",
data = filter(dff, g == "B"),
se = FALSE
)

Group-trends between the x and y variables can be plotted by using different dataframes for the geom_line (with predicted values) and geom_point (with raw data) functions. Make sure to determine in the ggplot() function that color is always the same variable, and then for geom_line group by the same variable.
p2 <- ggplot(NULL, aes(x = cabpol.e, y = vote_share, color = coalshare)) +
geom_line(data = preds, aes(group = coalshare, color = coalshare), size = 1) +
geom_point(data = df, aes(x = cabpol.e, y = vote_share)) +
scale_color_gradient2(name = "Share of Seats\nin Coalition (%)",
midpoint = 50, low="blue", mid = "green", high="red") +
xlab("Ideological Differences on State/Market") +
ylab("Vote Share (%)") +
ggtitle("Vote Share Won by Coalition Parties in Next Election")
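The preds data frame used above is not shown in the answer. One way it could be built, sketched here on the reproducible dff data from the question rather than on the original df, is to fit a separate lm() per group and predict over a grid of x values. The grid size of 50 and the use of linetype to tell the groups apart are assumptions; a discrete group cannot share the continuous colour scale used for the points.

```r
library(ggplot2)

set.seed(1000)
dff <- data.frame(x = rnorm(100, 0, 1),
                  y = rnorm(100, 1, 2),
                  z = seq(1, 100, 1),
                  g = rep(c("A", "B"), 50))

# One linear fit per group, predicted over that group's x range
preds <- do.call(rbind, lapply(split(dff, dff$g), function(d) {
  fit  <- lm(y ~ x, data = d)
  grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))
  grid$y <- predict(fit, newdata = grid)
  grid$g <- d$g[1]
  grid
}))

# Points use the continuous colour scale; lines come from the predictions
p <- ggplot(dff, aes(x = x, y = y)) +
  geom_point(aes(colour = z)) +
  scale_colour_gradient2(midpoint = 50, low = "blue", high = "red") +
  geom_line(data = preds, aes(group = g, linetype = g))
```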

How to plot three or more variables in a single scatterplot with automated equations?

I want to plot 2 variables from 3 different dataframes in one scatterplot and also plot the equations of each linear relationship automatically. I am using the following code. However I have two problems:
I get the plots for the same values and not for the whole range (e.g. df1 = 700 values, df2 = 350 values, df3 = 450 values). What is the role of omitting the NAs? I tried it both ways and I still get the same plot.
I can only add the equations as text, which means running the lm function and then adding the relationship to the plot manually. I need to do that automatically.
The code that I am using is:
ggplot(df1, aes(x=noxppb, y=OX, colour = "red")) +
geom_point(colour = "red", shape=2) + # shape 2 draws hollow triangles
geom_smooth(method=lm, se = FALSE) +
geom_point(data = df1, aes(x=noxppb, y=OX)) +
geom_point(colour = "blue", shape=3) +
geom_smooth(method = lm, se = F, colour = "blue", data = df2, aes(x=noxppb, y=OX)) +
geom_point(colour = "green", shape=4) +
geom_smooth(method = lm, se = F, colour = "green", data = df3, aes(x=noxppb, y=OX))
I get the following image:
However I Need something similar to this:

Try this:
d <- plyr::mdply(data.frame(a=c(1,2,3), b=c(-1,0,1)),
function(a,b) data.frame(x=seq(0,10), y=jitter(a*seq(0,10)+b)))
equationise = function(d, ...){
m = lm(y ~ x, d)
eq <- substitute(italic(y) == a + b %.% italic(x),
list(a = format(coef(m)[1], ...),
b = format(coef(m)[2], ...)))
data.frame(x = Inf, y = d$y[nrow(d)],
label = as.character(as.expression(eq)),
stringsAsFactors = FALSE)
}
eqs <- plyr::ddply(d, "a", equationise, digits = 2)
ggplot(d, aes(x=x, y=y, colour = factor(a))) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_label(data=eqs, aes(label = label), parse=TRUE, hjust=1)
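To connect this back to the three data frames in the question, here is a rough sketch that stacks them with an identifying column, so that every layer sees all rows, and builds the equation labels with base R instead of plyr. The data is simulated as a placeholder (the real df1, df2 and df3 were not posted), and make_df, source, eqs and the label format are names and choices made up for this illustration.

```r
library(ggplot2)

# Simulated placeholders for the asker's three data frames
set.seed(1)
make_df <- function(n, slope) {
  noxppb <- runif(n, 0, 100)
  data.frame(noxppb = noxppb, OX = 40 - slope * noxppb + rnorm(n, sd = 3))
}
df1 <- make_df(70, 0.10)
df2 <- make_df(35, 0.20)
df3 <- make_df(45, 0.30)

# Stack the data frames and record where each row came from
all_df <- rbind(cbind(df1, source = "df1"),
                cbind(df2, source = "df2"),
                cbind(df3, source = "df3"))

# One fitted equation per source, formatted as plain text
eqs <- do.call(rbind, lapply(split(all_df, all_df$source), function(d) {
  cf <- coef(lm(OX ~ noxppb, data = d))
  data.frame(source = d$source[1],
             label  = sprintf("y = %.2f %+.2f x", cf[1], cf[2]))
}))
eqs$vjust <- seq(1.5, by = 1.5, length.out = nrow(eqs))  # stack labels vertically

p <- ggplot(all_df, aes(x = noxppb, y = OX, colour = source)) +
  geom_point(aes(shape = source)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = eqs, aes(x = Inf, y = Inf, label = label, vjust = vjust),
            hjust = 1.1, show.legend = FALSE)
```

With a single stacked data frame, the colour and shape mappings replace the repeated geom_point() and geom_smooth() calls, which is likely why the original code kept drawing df1 in every point layer.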
