Rsample - nested_cv (ggplot2 - geom_tile) -> graphical visualization of nested CV - r

I am working on a paper where I am using nested cross-validation, and I would like to present a graphical representation of it.
For the data partition, I am using the package rsample. @topepo presents a great vignette here: https://tidymodels.github.io/rsample/reference/tidy.rsplit.html
#Nested Group
library(ggplot2) # needed for theme_set() and ggplot()
library(rsample)
library(patchwork)
theme_set(theme_bw())
cv <- nested_cv(iris, outside = group_vfold_cv(data = iris, group = "Species", v = 3), inside = rsample::bootstraps(times = 5, apparent = FALSE))
tidy_cv <- tidy(cv)
innerplot<- ggplot(tidy_cv, aes(x = inner_Resample, y = inner_Row, fill = inner_Data)) +
geom_tile() + facet_wrap(~Resample) + scale_fill_brewer()
validation<- ggplot(tidy_cv, aes(x = Resample, y = Row, fill = Data)) +
geom_tile() + facet_wrap(~Resample) + scale_fill_brewer()
validation + innerplot
However, what I am actually trying to achieve is a stack of geom_tiles, something like this: https://i.stack.imgur.com/vTSPw.png
The difference is that it would be faceted with facet_wrap(~Resample).
Is it clear what I am trying to do?
Cheers

Related

ggplot2 geom_qq change theoretical data

I have a set of p-values, i.e. 0 <= pval <= 1.
I want to draw a Q-Q plot using ggplot2.
As in the documentation, the following code will draw a Q-Q plot. However, since my data are p-values, I want the theoretical values to be probabilities as well, i.e. 0 <= theoretical value <= 1.
df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()
I am aware of qqplot.pvalues from the gaston package. It does the job, but the plot is not as customizable as the ggplot version.
In the gaston package the theoretical data are plotted as -log10((n:1)/(n + 1)), where n is the number of p-values. How can I pass these values to ggplot as the theoretical data?
Assuming you have some p-values, say from a normal distribution, you could create the plot manually:
library(ggplot2)
data <- data.frame(outcome = rnorm(150))
data$pval <- pnorm(data$outcome)
data <- data[order(data$pval),]
ggplot(data = data, aes(y = pval, x = pnorm(qnorm(ppoints(nrow(data)))))) +
geom_point() +
geom_abline(slope = 1) +
labs(x = 'theoretical p-val', y = 'observed p-val', title = 'qqplot (pval-scale)')
Although I am not sure this plot is sensible to use for conclusions.
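For the -log10 scale quoted from gaston in the question, the same manual approach can be adapted. This is only a minimal sketch: the p-values are simulated from a uniform distribution as a placeholder, and the names pvals and qq_data are illustrative, not from the question.

```r
library(ggplot2)

set.seed(1)
pvals <- runif(200)   # placeholder p-values (assumption); substitute your own
n <- length(pvals)

qq_data <- data.frame(
  # theoretical values, using the formula quoted from gaston in the question
  expected = -log10((n:1) / (n + 1)),
  # observed values, sorted ascending so they pair with the expected ones
  observed = sort(-log10(pvals))
)

ggplot(qq_data, aes(x = expected, y = observed)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "expected -log10(p)", y = "observed -log10(p)")
```

Because the result is an ordinary ggplot object, themes, colours and extra layers can be added as usual.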

DCA : Labelling points with autoplot or ggplot2

I find it very difficult to add site labels to a DCA plot made with autoplot or ggplot.
I also want to differentiate the points on the autoplot/ggplot according to their groups.
This is the data and the code I used. It went well until the call to autoplot/ggplot:
library(vegan)
data(dune)
d <- vegdist(dune)
csin <- hclust(d, method = "single")
cl <- cutree(csin, 3)
dune.dca <- decorana(dune)
library(ggvegan) # provides the autoplot() method for decorana objects
autoplot(dune.dca)
This is the autoplot obtained:
I am using simple code, and I tried the following, but none of it led me anywhere:
autoplot(dune.dca, label.size = 3, data = dune, colour = cl)
ggplot(dune.dca(x=DCA1, y=DCA2,colour=cl))
ggplot(dune.dca, display = ‘site’, pch = 16, col = cl)
ggrepel::geom_text_repel(aes(dune.dca))
If anyone has a simple suggestion, that would be great.
With the added information (package) I was able to dig a bit deeper.
The problem is (in short) that autoplot.decorana adds the data to the specific layer (either geom_point or geom_text). The data is not inherited by other layers, so adding additional layers results in blank pages.
Notice that one of the two code snippets below results in an error, and note the position of the data argument:
# Error:
ggplot() +
geom_point(data = mtcars, mapping = aes_string(x = 'hp', y = 'mpg')) +
geom_label(aes(x = hp, y = mpg, label = cyl))
# Works:
ggplot(data = mtcars) +
geom_point(mapping = aes_string(x = 'hp', y = 'mpg')) +
geom_label(aes(x = hp, y = mpg, label = cyl))
ggvegan:::autoplot.decorana places the data as in the example that returns an error.
I see 2 ways to get around this problem:
Extract the layers data using ggplot_build or layer_data and create an overall or single layer mapping.
Extract the code for generating the data, and create our plot manually (not using autoplot).
I honestly think the second is simpler, as we might have to extract more information to make our data sensible. By looking at the source code of ggvegan:::autoplot.decorana (simply print it to the console by leaving out the parentheses) we can extract the code below, which generates the same data as used in the plot:
ggvegan_data <- function(object, axes = c(1, 2), layers = c("species", "sites"), ...){
obj <- fortify(object, axes = axes, ...)
obj <- obj[obj$Score %in% layers, , drop = FALSE]
want <- obj$Score %in% c("species", "sites")
obj[want, , drop = FALSE]
}
With this we can then generate any plot that we desire, with plot-level mappings rather than layer-individual mappings:
dune.plot.data <- ggvegan_data(dune.dca)
p <- ggplot(data = dune.plot.data, aes(x = DCA1, y = DCA2, colour = Score)) +
geom_point() +
geom_text(aes(label = Label), nudge_y = 0.3)
p
Which gives us what I hope is your desired output

R ggnetwork: unable to change graph layout

I am trying to use ggnetwork and ggplot2 to plot a graph visualisation, but I am unable to change the graph layout parameter of the ggnetwork function. My reproducible code is as follows, and I am running this on R 4.0.3 on Ubuntu:
install.packages("WDI") # this is the data source I need for this example
library(WDI)
new_wdi_cache <- WDIcache()
library(igraph)
library(tidyverse)
library(ggnetwork)
education<-WDI(indicator=c("SE.PRM.ENRR","SE.SEC.ENRR",
"SE.TER.ENRR","SE.SEC.PROG.ZS","SE.PRM.CMPT.ZS"),
start=2014,
end=2014,
extra= TRUE,
cache=new_wdi_cache)
education<-education[education$region!="Aggregates",]
education<-na.omit(education)
education.features <- education[,4:8]
education.features_scaled <-scale(education.features)
education.distance_matrix <- as.matrix(dist(education.features_scaled))
education.adjacency_matrix <- education.distance_matrix < 1.5
g1<-graph_from_adjacency_matrix(education.adjacency_matrix, mode="undirected")
new.g2<-ggnetwork(g1, layout = "kamadakawai") # LINE A
ggplot(new.g2, aes(x=x, y=y, xend=xend, yend=yend))+
geom_edges(colour="grey")+geom_nodes(size=5,aes(colour=species ))+
theme_blank()+labs(caption='WDI School enrollment and progression datasets')
On line A, I get an error that I really cannot understand:
Error: $ operator is invalid for atomic vectors
What does that mean? If I remove the 'layout=' parameter from ggnetwork, the code runs. However, I really need to change the layout.
The layout parameter doesn't take a string, but the output of an igraph::layout_ function.
So you can do:
new_g2 <- ggnetwork(g1, layout = igraph::layout_with_kk(g1)) # layout_with_kk is the current name of layout.kamada.kawai
ggplot(new_g2, aes(x, y, xend = xend, yend = yend)) +
geom_edges(colour = "grey") +
geom_nodes(size = 8, aes(colour = name)) +
theme_blank() +
labs(caption = 'WDI School enrollment and progression datasets') +
theme(plot.caption = element_text(size = 16))

Using the frame parameter when converting a plot from ggplot to plotly

Here is my data:
library(data.table)
library(ggplot2)
library(plotly)
data <- data.table(year = rep(1980:1985,each = 5),
Relationship = rep(c(" Acquaintance","Unknown","Wife","Stranger","Girlfriend","Friend"), 5),
N = sample(1:100, 30)
)
I can use the plotly::plot_ly function to draw an animated plot across the years like this:
plot_ly(data
,x=~Relationship
,y=~N
,frame=~year
,type = 'bar'
)
But when I use ggplot with the frame parameter, I get an error:
Error in -data$group : invalid argument to unary operator
Here is my ggplot code:
p <- ggplot(data = data,aes(x =Relationship,y = N ))+
geom_bar(stat = "identity",aes(frame = year))
ggplotly(p)
Can you modify my ggplot code to produce the same graph?
This example runs successfully using the frame parameter:
data(gapminder, package = "gapminder")
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
geom_point(aes(size = pop, frame = year)) +
scale_x_log10()
ggplotly(gg)
In case others are still looking, this does appear to be a bug related to geom_bar. Per Stéphane Laurent's GitHub report (https://github.com/ropensci/plotly/issues/1544), a workaround is to use geom_col(position = "dodge2") or geom_col(position = "identity") instead of geom_bar(stat = 'identity').
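As a rough sketch of that workaround, the question's plot could be rewritten with geom_col as below. The data frame bar_data is an illustrative stand-in (an assumption, not the asker's data) in which every relationship appears once per year.

```r
library(ggplot2)
library(plotly)

# Illustrative data (assumption): one row per relationship per year
set.seed(1)
bar_data <- expand.grid(
  Relationship = c("Acquaintance", "Unknown", "Wife",
                   "Stranger", "Girlfriend", "Friend"),
  year = 1980:1985
)
bar_data$N <- sample(1:100, nrow(bar_data))

# geom_col() replaces geom_bar(stat = "identity"). ggplot2 warns that `frame`
# is an unknown aesthetic; ggplotly() uses it to build the animation frames.
p <- ggplot(bar_data, aes(x = Relationship, y = N)) +
  geom_col(aes(frame = year), position = "identity")

ggplotly(p)
```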

How to plot two distribution curves in a faceted way in R / ggplot2?

I have two probability distribution curves, a Gamma and a standardized Normal, that I need to compare:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
f <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun=pgammaX)
f + stat_function(fun = pnorm)
The output is like this
However I need to have the two curves separated by means of the faceting mechanism provided by ggplot2, sharing the Y axis, in a way like shown below:
I know how to do the faceting if the depicted graphics come from data (i.e., from a data.frame), but I don't understand how to do it in a case like this, when the graphics are generated on the fly by functions. Do you have any idea how to do this?
You can generate the data ahead of time, similar to what stat_function does, something like:
x <- seq(-4,9,0.1)
dat <- data.frame(p = c(pnorm(x), pgammaX(x)), g = rep(c(0,1), each = 131), x = rep(x, 2) )
ggplot(dat)+geom_line(aes(x,p, group = g)) + facet_grid(~g)
The issue with facet_wrap is that a single stat_function is applied to every panel of the faceting variable, and you don't have a faceting variable here.
I would instead plot them separately and use grid.arrange to combine them.
f1 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pgammaX) + ggtitle("Gamma") + theme(plot.title = element_text(hjust = 0.5))
f2 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pnorm) + ggtitle("Norm") + theme(plot.title = element_text(hjust = 0.5))
library(gridExtra)
grid.arrange(f1, f2, ncol=2)
Otherwise create the data frame with y values from both pgammaX and pnorm and categorize them under a faceting variable.
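A minimal sketch of that data-frame approach, assuming the same pgammaX as in the question; the names curves and category are illustrative:

```r
library(ggplot2)

pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)

# Evaluate both functions on a common grid and label each curve
x <- seq(-4, 9, by = 0.1)
curves <- rbind(
  data.frame(x = x, p = pgammaX(x), category = "Gamma"),
  data.frame(x = x, p = pnorm(x),   category = "Normal")
)

# One panel per category; the y axis is shared by default
ggplot(curves, aes(x = x, y = p)) +
  geom_line() +
  facet_wrap(~ category)
```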
Finally I got the answer. First, I need to have two data sets and attach each function to each data set, as follows:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
a <- data.frame(x=c(3,9), category="Gamma")
b <- data.frame(x=c(-4,4), category="Normal")
f <- ggplot(a, aes(x)) + stat_function(fun=pgammaX) + stat_function(data = b, mapping = aes(x), fun = pnorm)
Then, using facet_wrap(), I separate into two graphics according to the category assigned to each data set, and establishing a free_x scale.
f + facet_wrap("category", scales = "free_x")
The result is shown below:
