Plot the convex hull in FDA plot - R

I am trying to add a convex hull for each group in this plot using the ggpubr package. Why does it not work?
Code:
library(dplyr)
library(MASS)
library(ggplot2)
library(scales)
library(ggpubr)
library(data.table)
irisfda <- fda(Species ~ ., data = iris, method = mars)
df1 <- cbind(data.frame(irisfda$fit$fitted.values), species = iris[,"Species"])
ggplot(df1) +
geom_point(aes(X1, X2, color = species, shape = species), size = 2.5) +
labs(x = "FDA1", y = "FDA2") +
stat_chull(aes(color = species, fill = species), geom = "polygon", alpha = 0.1)

You haven't told stat_chull where the x and y points are. You told geom_point where they were, but geoms and stats don't inherit from each other when you add them to a plot. You can either just add the x and y co-ordinates to stat_chull or, better yet, add them to the ggplot call. Then stat_chull can inherit them, and you can save on some typing.
Incidentally, you used library calls for dplyr, MASS, scales and data.table, which aren't needed for this example, but you forgot to put the library call for mda, which is needed:
library(ggplot2)
library(ggpubr)
library(mda)
irisfda <- fda(Species ~ ., data = iris, method = mars)
df1 <- cbind(data.frame(irisfda$fit$fitted.values), species = iris[,"Species"])
ggplot(df1, aes(x = X1, y = X2, color = species, shape = species)) +
geom_point(size = 2.5) +
labs(x = "FDA1", y = "FDA2") +
stat_chull(geom = "polygon", alpha = 0.1)
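For comparison, the first option mentioned above (giving stat_chull its own x and y) might look like the sketch below. It uses the raw iris columns instead of the FDA scores so that it runs without mda; that substitution is an assumption made only for illustration.

```r
library(ggplot2)
library(ggpubr)

# Each layer carries its own mapping; nothing is set in ggplot() itself,
# so stat_chull must be told x and y explicitly.
p <- ggplot(iris) +
  geom_point(aes(Sepal.Length, Sepal.Width, color = Species), size = 2.5) +
  stat_chull(aes(Sepal.Length, Sepal.Width, color = Species, fill = Species),
             geom = "polygon", alpha = 0.1)
p
```

This repeats the mapping in every layer, which is why putting it in the ggplot() call is usually less typing.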

Related

ggplotly: unable to add a frame in PCA score plot in ggplot2

I would like to make a PCA score plot using ggplot2, and then convert the plot into interactive plot using plotly.
What I want to do is add a frame around each group (not an ellipse via stat_ellipse, which I know works).
My problem is that when I try to use sample name as tooltip in ggplotly, the frame will disappear. I don't know how to fix it.
Below is my code
library(ggplot2)
library(plotly)
library(dplyr)
## Demo data
dat <- iris[1:4]
Group <- iris$Species
## Calculate PCA
df_pca <- prcomp(dat, center = TRUE, scale. = FALSE)
df_pcs <- data.frame(df_pca$x, Group = Group)
percentage <-round(df_pca$sdev^2 / sum(df_pca$sdev^2) * 100, 2)
percentage <-paste(colnames(df_pcs),"(", paste(as.character(percentage), "%", ")", sep = ""))
## Visualization
Sample_Name <- rownames(df_pcs)
p <- ggplot(df_pcs, aes(x = PC1, y = PC2, color = Group, label = Sample_Name)) +
xlab(percentage[1]) +
ylab(percentage[2]) +
geom_point(size = 3)
ggplotly(p, tooltip = "label")
Until here it works! You can see that sample names can be properly shown in the ggplotly plot.
Next I tried to add a frame
## add frame
hull_group <- df_pcs %>%
dplyr::mutate(Sample_Name = Sample_Name) %>%
dplyr::group_by(Group) %>%
dplyr::slice(chull(PC1, PC2))
p2 <- p +
ggplot2::geom_polygon(data = hull_group, aes(fill = Group), alpha = 0.1)
You can see that the static plot still works and the frame is added properly.
However, when I convert it to an interactive plotly plot, the frame disappears.
ggplotly(p2, tooltip = "label")
Thanks a lot for your help.
It works if you move the data and mapping from the ggplot() call to the geom_point() call:
p2 <- ggplot() +
geom_point(data = df_pcs, mapping = aes(x = PC1, y = PC2, color = Group, label = Sample_Name), size = 3) +
ggplot2::geom_polygon(data = hull_group, aes(x = PC1, y = PC2, fill = Group, group = Group), alpha = 0.2)
ggplotly(p2, tooltip = "label")
You might want to change the order of the geom_point and geom_polygon to make sure that the points are on top of the polygon (this also affects the tooltip location).
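As a sketch of the reordering suggested above, the polygon layer can be added first so that the points are drawn on top of it. The names pcs and p3 are illustrative and not part of the original answer.

```r
library(ggplot2)
library(dplyr)

# Rebuild the PCA scores and the per-group hull vertices
pcs <- data.frame(prcomp(iris[1:4], center = TRUE, scale. = FALSE)$x,
                  Group = iris$Species)
pcs$Sample_Name <- rownames(pcs)
hull_group <- pcs %>%
  group_by(Group) %>%
  slice(chull(PC1, PC2))

# Polygon first, points second: later layers are drawn on top
p3 <- ggplot() +
  geom_polygon(data = hull_group,
               aes(x = PC1, y = PC2, fill = Group, group = Group), alpha = 0.2) +
  geom_point(data = pcs,
             aes(x = PC1, y = PC2, color = Group, label = Sample_Name), size = 3)

# plotly::ggplotly(p3, tooltip = "label")  # uncomment if plotly is installed
```

ggplot2 warns that geom_point ignores the unknown label aesthetic; the mapping is kept only so that ggplotly can pick it up for the tooltip.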

Is there a way to add a legend to refer to residuals in an effect plot?

I'm looking to add a legend to my plot, for the moment the code I wrote is:
plot(allEffects(covid.lm, residuals = TRUE), # plot with countries on graph
     band.colors = "grey2",
     residuals.color = adjustcolor("steelblue3", alpha.f = 0.5),
     residuals.pch = 16, smooth.residuals = FALSE,
     id = list(n = length(d$COUNTRY), cex = 0.5))
Basically, I added numbers to the points in the plot (for which I created a linear model, covid.lm). Now I need to add a legend for those points, that is, a list of countries. Thanks in advance.
library(ggplot2)
data(iris)
m <- lm(Petal.Length ~ Sepal.Length, data = iris)
iris$Fitted <- predict(m)
iris$Species_num <- as.numeric(iris$Species)
ggplot(iris, aes(x = Petal.Length, y = Fitted)) +
geom_point(aes(color = as.factor(Species_num))) +
geom_text(aes(label = Species_num), hjust = 1.1, vjust = 1.1) +
labs(title = "Residuals", x = "Observed", y = "Fitted") +
guides(color=guide_legend(title="New Legend Title"))
Created on 2022-04-08 by the reprex package (v2.0.1)
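The legend in the code above shows only the numbers. One possible way to turn it into a number-to-name key, which seems to be what the question asks for, is to pass custom labels to the colour scale. This is a sketch on iris, with species standing in for countries; the names d, id and key are made up for the example.

```r
library(ggplot2)

d <- iris
d$id <- as.integer(d$Species)   # the number printed next to each point

# Legend entries of the form "1 = setosa"
key <- paste(seq_along(levels(d$Species)), levels(d$Species), sep = " = ")

p <- ggplot(d, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = Species)) +
  geom_text(aes(label = id), hjust = 1.1, vjust = 1.1, size = 3) +
  scale_color_discrete(name = "Key", labels = key)
p
```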

draw line on geom_density_ridges

I am trying to draw a line through the density plots from ggridges
library(ggplot2)
library(ggridges)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(rel_min_height = 0.01)
I would like to indicate the highest point of each density and label the value of x at that point. Any suggestions on accomplishing this are much appreciated.
One neat approach is to interrogate the ggplot object itself and use it to construct additional features:
# This is the OP chart
library(ggplot2)
library(ggridges)
library(dplyr)  # provides %>%, group_by() and filter() used below
gr <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(rel_min_height = 0.01)
Edit: This next part has been shortened, using purrr::pluck to extract the whole data part of the list, instead of manually specifying the columns we'd need later.
# Extract the data ggplot used to prepare the figure.
# purrr::pluck is grabbing the "data" list from the list that
# ggplot_build creates, and then extracting the first element of that list.
ingredients <- ggplot_build(gr) %>% purrr::pluck("data", 1)
# Pick the highest point. Could easily add quantiles or other features here.
density_lines <- ingredients %>%
group_by(group) %>% filter(density == max(density)) %>% ungroup()
# Use the highest point to add more geoms
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(rel_min_height = 0.01) +
geom_segment(data = density_lines,
aes(x = x, y = ymin, xend = x,
yend = ymin+density*scale*iscale)) +
geom_text(data = density_lines,
aes(x = x, y = ymin + 0.5 *(density*scale*iscale),
label = round(x, 2)),
hjust = -0.2)
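As a rough cross-check on the labelled values, the peak of each density can also be estimated directly with base R's density(). This is only a sketch: it uses the default per-group bandwidth, whereas ggridges may choose a different (joint) bandwidth, so the numbers need not match the plot exactly.

```r
# Location of the density maximum for each species, using base R only
peaks <- sapply(split(iris$Sepal.Length, iris$Species), function(v) {
  d <- density(v)          # kernel density estimate with default settings
  d$x[which.max(d$y)]      # x value where the estimate is highest
})
round(peaks, 2)
```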

ggplot mixture model R

I have a dataset with numeric values and a categorical variable. The distribution of the numeric variable differs for each category. I want to plot "density plots" for each categorical variable so that they are visually below the entire density plot.
This is similar to the components of a mixture model, without calculating the mixture model (as I already know the categorical variable which splits the data).
If I tell ggplot to group according to the categorical variable, each of the four densities is a true density and integrates to one.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density() +
  geom_density(aes(x = Sepal.Width, group = Species, colour = Species))
What I want is to have the density of each category as a sub-density (not integrating to 1), similar to the following code (which I only implemented for two of the three iris species):
library(data.table)
myIris <- as.data.table(iris)
# calculate density for entire dataset
dens_entire <- density(myIris[, Sepal.Width], cut = 0)
dens_e <- data.table(x = dens_entire[[1]], y = dens_entire[[2]])
# calculate density for dataset with setosa
dens_setosa <- density(myIris[Species == 'setosa', Sepal.Width], cut = 0)
dens_sa <- data.table(x = dens_setosa[[1]], y = dens_setosa[[2]])
# calculate density for dataset with versicolor
dens_versicolor <- density(myIris[Species == 'versicolor', Sepal.Width], cut = 0)
dens_v <- data.table(x = dens_versicolor[[1]], y = dens_versicolor[[2]])
# plot densities as mixture model
ggplot(dens_e, aes(x=x, y=y)) + geom_line() + geom_line(data = dens_sa, aes(x = x, y = y/2.5, colour = 'setosa')) +
geom_line(data = dens_v, aes(x = x, y = y/1.65, colour = 'versicolor'))
Above I hard-coded the number to reduce the y values. Is there any way to do it with ggplot? Or to calculate it?
Thanks for your ideas.
Do you mean something like this? Note that the y-axis then shows counts rather than densities.
# after_stat() replaces the older ..count.. notation (ggplot2 >= 3.3.0)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density(aes(y = after_stat(count))) +
  geom_density(aes(x = Sepal.Width, y = after_stat(count),
                   group = Species, colour = Species))
Another option, which works here because the three species have equal sample sizes, may be
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density(aes(y = after_stat(density))) +
  geom_density(aes(x = Sepal.Width, y = after_stat(density / 3),
                   group = Species, colour = Species))
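On the question of how to calculate the scaling instead of hard-coding it: one way, sketched below, is to weight each group's density by that group's share of the observations, so that each sub-density integrates to n_group / n_total. The names n_total and sub_dens are illustrative. Because each group gets its own bandwidth, the sub-densities will only approximately add up to the overall curve.

```r
library(ggplot2)

n_total <- nrow(iris)

# One density per species, scaled by the species' share of the data
sub_dens <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  dens <- density(d$Sepal.Width)
  data.frame(Species = d$Species[1],
             x = dens$x,
             y = dens$y * nrow(d) / n_total)
}))

p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_density() +
  geom_line(data = sub_dens, aes(x = x, y = y, colour = Species))
p
```

With unequal group sizes the weights differ per group, which is the case the fixed division by 3 does not cover.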

Most succinct way to label/annotate extreme values with ggplot?

I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you call plot(lm(y ~ x)) in base R, the first graph shown is Residuals vs Fitted, the second is the normal Q-Q plot, and the third is Scale-Location. Each of these automatically labels the most extreme points with an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Update: scale_size_area() is now used in place of scale_area().
You might be able to take something from this to suit your needs.
library(ggplot2)
#Some data
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified = fortify(m1)
names(df.fortified) # Names for the variables containing residuals and derived quantities
# Select extreme values
df.fortified$extreme = ifelse(abs(df.fortified$`.stdresid`) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
plot = ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
geom_point() +
geom_text(data = df.fortified[df.fortified$extreme == 1, ],
aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
plot
plot1 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
  geom_point() + geom_smooth(se = FALSE)
# show.legend replaces the deprecated show_guide argument
plot2 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid, size = .cooksd)) +
  geom_point() + scale_size_area("Cook's distance") + geom_smooth(se = FALSE, show.legend = FALSE)
library(gridExtra)
grid.arrange(plot1, plot2)
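A somewhat shorter variation, offered as a sketch and not as the answer's own method: compute the standardized residuals with rstandard() and let geom_text filter the plot data through a function, which avoids the helper column. The threshold of 1.5 and the seed are arbitrary choices for the example.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the example is reproducible
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df$stdresid <- rstandard(m1)   # standardized residuals from base R

threshold <- 1.5               # illustrative cutoff for "extreme"

p <- ggplot(df, aes(x = x, y = stdresid)) +
  geom_point() +
  # data = <function> receives the plot data and returns the rows to label
  geom_text(data = function(d) d[abs(d$stdresid) > threshold, ],
            aes(label = x), size = 3, hjust = -0.3)
p
```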
