Graphing a histogram overlaid with a fitted 2-parameter Weibull function - r

I would like to plot both a histogram and a fitted Weibull function on the same graph. The code to plot the histogram is:
hist(data$grddia2, prob=TRUE,breaks=5)
The code for fitting the Weibull distribution is (requires the MASS package):
fitdistr(data$grddia2,densfun=dweibull,start=list(scale=1,shape=2))
How do I plot both together on the same graph? I've linked the data set below.
Also, bonus to anyone who can provide code that does the same thing but creates a graph for each column of data; the data set has many columns. It would be nice to have all the graphs on the same page.
https://www.dropbox.com/s/ra9c2kkk49vyyyc/Diameter%20Distribution.csv?dl=0

Here is the code:
library("ggplot2")
library("dplyr")
library("tidyr")
library("MASS")
# Import the dataset and drop the "treeno" column
# Call dplyr::select explicitly: MASS, loaded after dplyr, masks dplyr's select()
data <- read.csv("Diameter Distribution.csv") %>%
dplyr::select(-treeno)
# Function returning the fitted Weibull density for one column, evaluated on a grid from 0 to 7
# The density uses the scale and shape parameters estimated from that column
fitweibull <- function(column) {
x <- seq(0,7,by=0.01)
fitparam <- column %>%
unlist %>%
fitdistr(densfun=dweibull,start=list(scale=1,shape=2))
# Look the estimates up by name rather than by position
return(dweibull(x, scale=fitparam$estimate[["scale"]], shape=fitparam$estimate[["shape"]]))
}
# Apply the function to each column, then combine the curves into one data.frame
# (lapply() iterates over the columns of a data.frame directly)
fitdata <- data %>%
lapply(FUN = fitweibull) %>%
data.frame()
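# Display graphs
# multiplyingFactor is an arbitrary constant that stretches the densities (which integrate to 1)
# up toward the height of the histogram counts; tune it for your data
multiplyingFactor <- 10
ggplot() +
geom_histogram(data=gather(data), aes(x=value, group=key, fill=key), alpha=0.2) +
geom_line(data=gather(fitdata), aes(x=rep(seq(0,7,by=0.01),ncol(fitdata)), y=multiplyingFactor*value, group=key, color=key))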
And the output figure:
Variant: thanks to the wonderful ggplot2 package, you can also draw the graphs in separate panels just by appending this final line to the code:
+ facet_wrap(~ key) + theme(legend.position = "none")
Which gives you this other figure:
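A variant that avoids the hand-tuned multiplyingFactor: if the histogram is drawn on the density scale with after_stat(density), the fitted Weibull densities can be overlaid directly, because both integrate to 1 within each panel. This is a sketch, not the answer's code. Since the CSV is not reproduced here, it uses simulated Weibull data in two hypothetical columns (d1, d2), and it lets fitdistr() choose its own starting values via densfun = "weibull".

```r
library(ggplot2)
library(MASS)  # fitdistr()

set.seed(1)
# Hypothetical stand-in for the CSV: two columns of simulated diameters
sim <- data.frame(
  d1 = rweibull(300, shape = 2.0, scale = 3),
  d2 = rweibull(300, shape = 3.5, scale = 4)
)

# Fit a 2-parameter Weibull to each column; fitdistr() returns named estimates
fits <- lapply(sim, function(v) fitdistr(v, densfun = "weibull")$estimate)

# Evaluate each fitted density on a common grid, in long format
grid <- seq(0, 7, by = 0.01)
curves <- do.call(rbind, lapply(names(fits), function(k) {
  data.frame(key = k, x = grid,
             y = dweibull(grid, shape = fits[[k]][["shape"]],
                          scale = fits[[k]][["scale"]]))
}))

# Long format for the raw values (same layout as tidyr::gather(sim))
long <- data.frame(key = rep(names(sim), each = nrow(sim)),
                   value = unlist(sim, use.names = FALSE))

# Histogram on the density scale, so the fitted curves need no rescaling
p <- ggplot(long, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 fill = "grey80", colour = "grey40") +
  geom_line(data = curves, aes(x = x, y = y), colour = "red") +
  facet_wrap(~ key)
p
```

Dropping facet_wrap(~ key) and mapping fill and colour to key puts all columns on one panel, as in the answer above.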

Related

Change position of legend in plot of pec object

I am trying to plot the prediction error curve from the pec package, but I can't change the legend position and size. Here is an example from the pec package:
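library(survival)  # Surv(), coxph() and the pbc data
library(rms)
library(pec)
data(pbc)
set.seed(1)  # makes the random subsample below reproducible
pbc <- pbc[sample(1:NROW(pbc),size=100),]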
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)
But it shows a big legend in the middle of the plot.
I also tried:
plot(brier, legend = "topright")
class(brier)
But then no legend is shown at all.
How can I change the position of the legend? And is it possible to plot this graph using ggplot?
I think I got what you want using ggplot2. The idea is to pick the elements of your brier object that contain the data for the plot, put them in a data frame and plot that.
library(ggplot2)
# packages for the pipe and pivot_longer; you can do it with base functions, I just prefer these
library(tidyr)
library(dplyr)
df <- do.call(cbind, brier[["AppErr"]]) # contains y values for each model
df <- cbind(brier[["time"]], df) # values of the x axis
colnames(df)[1] <- "time"
df <- as.data.frame(df) %>% pivot_longer(cols = 2:last_col(), names_to = "models", values_to = "values") # pivot table to long format makes it easier to use ggplot
ggplot(data = df, aes(x = time, y = values, color = models)) +
geom_line() # I suppose you know how to customise axis names etc.
Output:
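On the legend itself: once the curves are in a ggplot, the legend position and size are set through theme(). A minimal sketch using a made-up data frame with the same three columns as df above (time, models, values); the model names and numbers are hypothetical placeholders, not pec output.

```r
library(ggplot2)

# Hypothetical stand-in for the long-format df built from brier above:
# same three columns, but the model names and values are made up
demo <- data.frame(
  time   = rep(seq(0, 4000, by = 500), times = 3),
  models = rep(c("Reference", "Weibull", "CoxPH"), each = 9),
  values = c(seq(0, 0.24, length.out = 9),
             seq(0, 0.18, length.out = 9),
             seq(0, 0.17, length.out = 9))
)

p <- ggplot(demo, aes(x = time, y = values, colour = models)) +
  geom_line() +
  labs(x = "Time", y = "Prediction error", colour = "Model") +
  theme(
    legend.position = "bottom",               # move the legend below the panel
    legend.text     = element_text(size = 8), # smaller legend labels
    legend.title    = element_text(size = 9), # smaller legend title
    legend.key.size = grid::unit(0.4, "cm")   # smaller legend keys
  )
p
```

legend.position also accepts "top", "left", "right" or "none".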

ggsurvplot and ggplot lattice?! Plotting a Kaplan-Meier curve with a cumulative incidence function

I would like to plot a Kaplan-Meier (KM) curve and cumulative events or a cumulative incidence function (CIF) in one plot as a lattice.
I switched recently from SAS to R. In SAS you can do it all in one step using a macro (see this image), but I haven't found anything similar in R yet.
Currently, I run code for two separate graphs. The first plots a survfit object using ggsurvplot, which gives a KM curve, while the second plots a cuminc object with ggplot after a number of transformations. ggcompetingrisks was not very customizable, so I don't use it. Also, I am interested in plotting one particular competing risk, for example death from cancer, not all competing risks.
Here is an example of my current code using the BMT data-frame from the survminer package.
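library(survival)  # Surv(), survfit()
library(survminer)
library(cmprsk)
library(dplyr)     # mutate(), filter()
library(purrr)     # list_modify(), map_df()
data(BMT)
# I'll add the variable death to plot overall survival.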
BMT <- mutate(BMT, death = ifelse (status == 1, 1, 0))
# KM plot:
figKM <- ggsurvplot(survfit(Surv(ftime, death) ~ dis, BMT))
figKM
# CIF plot:
cif <- cuminc(ftime = BMT$ftime, fstatus = BMT$status, group = BMT$dis, cencode = 0)
cifDT <- cif %>%
list_modify("Tests" = NULL) %>%
map_df(`[`, c("time", "est"), .id = "id") %>%
filter(id %in% c("0 1","1 1")) # to keep the incident I want
figCIF <- ggplot (cifDT, aes(x = time, y = est, color = id)) + geom_step(lwd = 1.2)
figCIF
Is there a way to put figKM and figCIF together in a lattice plot? Maybe by plotting them differently?
If you look at the contents of your figKM object with class() and str(), you see that the first item in that list is a ggplot object named "plot", so this seems to do what you asked for in your comment:
library(cowplot)
plot_grid(figKM[[1]], figCIF, nrow = 2)
I'm not a tidyverse user, so map_df is perhaps a counterpart of the base functions Reduce or Map, but I don't have enough experience to a) know which package to load or b) figure out what your piped expressions are doing. Commented code would have been easier to follow. I am, however, quite experienced with the survival package.
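For readers who, like the answerer, don't use purrr: map_df() comes from purrr, and the step map_df(`[`, c("time", "est"), .id = "id") keeps the time and est components of each curve in the cuminc list, stacks them into one data frame and tags every row with the curve's list name. A base-R sketch of the same reshaping, run on a hypothetical list with the same layout as a cuminc() result after "Tests" is dropped (the numbers are made up):

```r
# Hypothetical stand-in for a cuminc() result without its "Tests" element:
# a named list whose elements each hold time, est and var vectors
cif_list <- list(
  "0 1" = list(time = c(0, 1, 2), est = c(0, 0.1, 0.2), var = c(0, 0.01, 0.01)),
  "1 1" = list(time = c(0, 1.5),  est = c(0, 0.3),      var = c(0, 0.02))
)

# Base-R equivalent of map_df(`[`, c("time", "est"), .id = "id"):
# keep time and est from each element, tag rows with the element name, stack
cif_df <- do.call(rbind, Map(function(el, id) {
  data.frame(id = id, time = el$time, est = el$est)
}, cif_list, names(cif_list)))
rownames(cif_df) <- NULL

# Equivalent of filter(id %in% c("0 1", "1 1"))
cif_df <- cif_df[cif_df$id %in% c("0 1", "1 1"), ]
cif_df
```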

ggplot statistical differences in plot labels (reproducible code included)

I have code that generates two plots (in reality from different datasets) like this one:
library(ggplot2)
library(patchwork)  # provides the p1 + p2 layout
#Plot 1
p1 <- ggplot(mtcars,aes(x=factor(cyl),fill=factor(gear)))+
geom_bar(position="fill")+
geom_text(aes(label=after_stat(scales::percent(count/sum(count)))),
stat='count',position=position_fill(vjust=0.5))
#Plot 2
p2 <- ggplot(mtcars,aes(x=factor(cyl),fill=factor(gear)))+
geom_bar(position="fill")+
geom_text(aes(label=after_stat(scales::percent(count/sum(count)))),
stat='count',position=position_fill(vjust=0.5))
plot <- p1 + p2
plot
Is it possible, using ggplot or another library, to test for statistical differences among factors and, where there are differences, change a label such as "25%" to something like "25% ↑" or "25% **"? In other words, I want to compare the values and add the test result to the labels. In my example the values are identical, but in reality the plots come from different datasets.
As MrFlick mentioned, ggplot might not be the right tool to do the calculations. But once you have your calculations, you could do something like this:
library(dplyr)  # attaches the %>% pipe; the functions below are still called with dplyr::
# some data with calculated levels of significance
dplyr::tibble(YEAR=rep(c(2019,2020),each=3),
GRP=rep(c("A","B","C"),2),
VAL=c(20,100,30,25,70,30),
SIG=rep(c("*","***",""),2)) %>%
# create labels
dplyr::group_by(GRP) %>%
dplyr::mutate(LABEL=dplyr::case_when(VAL/sum(VAL)<0.5 ~ paste("<",SIG),
VAL/sum(VAL)>0.5 ~ paste(">",SIG),
TRUE ~ paste(""))) %>%
dplyr::ungroup() %>%
# calculate percentages
dplyr::group_by(YEAR) %>%
dplyr::mutate(VAL=VAL/sum(VAL)) %>%
dplyr::ungroup() %>%
# plot data: combining percentages and sig-levels as label
ggplot2::ggplot(ggplot2::aes(x=YEAR,
y=VAL,
fill=GRP,
label=glue::glue("{scales::percent(VAL)} {LABEL}"))) +
ggplot2::geom_bar(stat="identity") +
ggplot2::geom_text(position=ggplot2::position_fill(0.5))
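The SIG column above is assumed to exist already. One way to compute something like it, sketched under assumptions not taken from the thread: compare the share of each gear level between two datasets with Fisher's exact test (fisher.test() from base R's stats) and map the p-values to stars with cut(). Two halves of mtcars (automatic vs manual cars) stand in for the two real datasets, and cyl is ignored for brevity; splitting by cyl works the same way.

```r
# Two hypothetical "datasets": automatic (am == 0) vs manual (am == 1) cars
d1 <- subset(mtcars, am == 0)
d2 <- subset(mtcars, am == 1)

# For each gear level, compare its share in d1 and d2 with Fisher's exact test
res <- do.call(rbind, lapply(sort(unique(mtcars$gear)), function(g) {
  x1 <- sum(d1$gear == g); n1 <- nrow(d1)
  x2 <- sum(d2$gear == g); n2 <- nrow(d2)
  # 2x2 table: rows = (this gear, other gears), columns = (d1, d2)
  tab <- matrix(c(x1, n1 - x1, x2, n2 - x2), nrow = 2)
  data.frame(gear = g, pct1 = x1 / n1, pct2 = x2 / n2,
             p = fisher.test(tab)$p.value)
}))

# Map p-values to the usual star codes
res$stars <- as.character(cut(res$p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                              labels = c("***", "**", "*", "")))

# Label for the second plot, e.g. "0% ***"
res$label <- paste0(round(100 * res$pct2), "% ", res$stars)
res
```

The label column can then be passed to geom_text(aes(label = label)) in place of the plain percentage.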

Return variables to original values in PCA - R

I can apply PCA to the iris dataset and plot components 1 and 2 to see how the transformation separates the species.
library(tidyverse)
x <- iris[,1:4] %>% as.matrix()
pca <- prcomp(x, scale. = TRUE)
summary(pca)
as.data.frame(pca$x) %>%
mutate(Species = iris$Species) %>%
ggplot(aes(PC1,PC2, color = Species)) +
geom_point()
My problem is that prcomp scaled the values: how can I return them to the original values for the plot? As you can see, the axes run from negative to positive, but sizes cannot be negative.
Any help will be greatly appreciated.
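The PC scores themselves have no size units: prcomp() centres each variable by default (and also scales it with scale. = TRUE), so each component's scores average zero and necessarily run from negative to positive. What can be recovered is the original data: multiply the scores by the transposed rotation matrix, then undo the scaling and centring stored in pca$scale and pca$center. A minimal base-R sketch on the same iris columns:

```r
x <- as.matrix(iris[, 1:4])
pca <- prcomp(x, scale. = TRUE)

# scores %*% t(rotation) gives back the centred and scaled data
# (exact here because all four components are kept)
z <- pca$x %*% t(pca$rotation)

# Undo the scaling (multiply by each column's sd), then the centring (add the mean)
x_back <- sweep(z, 2, pca$scale, `*`)
x_back <- sweep(x_back, 2, pca$center, `+`)

all.equal(unname(x_back), unname(x))  # TRUE
```

Keeping only the first k columns of pca$x and pca$rotation in the product gives the rank-k approximation of the original measurements instead, for example what PC1 and PC2 alone reproduce.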

Generate multiple plots from generic code in R

I have a piece of code that is used to generate multiple plots from a dataset. The dataset is filtered based on the parameters required of the visualization and is plotted using ggplot.
library(dplyr)    # filter()
library(ggplot2)
summary <- filter(dataframe)   # filtering conditions omitted
plot <- ggplot(summary)
Now, I have multiple chunks of code for filtering each type of graph and each chunk has a 'summary' and 'plot'.
Is there a way to display several of these objects called 'plot' at once? And is there a way to create a user interface or button that generates all the plots in one go when clicked?
If each plot is the same type of plot with different data, you could write the plotting code once and, as @Victor Ordu suggests, put the plots in a list using a loop, lapply(), or map() from the purrr package.
You could then arrange them using ggarrange from the ggpubr package.
For example:
library(tidyverse)
library(ggpubr)
# Plot function
example_plot <- function(x){
ggplot(x, aes(drat, wt)) +
geom_point()
}
# Summarise in a dataframe and plot into a list
graphs <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(graph = map(data, ~ example_plot(.x))) %>%
pull(graph)
# Plot them all!
ggarrange(plotlist = graphs, ncol = 3)
Otherwise, if you already have the individual plot objects, skip straight to arranging them.
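For the button part of the question, which the answer does not cover: one common option in R is a Shiny app (the shiny package, not mentioned in the thread). A minimal sketch, assuming the plots can be rebuilt by a function; make_plots() is a hypothetical helper that here just splits mtcars by cyl, and the resulting list is laid out with ggarrange() through its plotlist argument.

```r
library(shiny)
library(ggplot2)
library(ggpubr)  # ggarrange()

# Hypothetical helper: returns a list of ggplots, one per cyl group
make_plots <- function() {
  lapply(split(mtcars, mtcars$cyl), function(d) {
    ggplot(d, aes(drat, wt)) + geom_point()
  })
}

ui <- fluidPage(
  actionButton("go", "Generate all plots"),
  plotOutput("all_plots")
)

server <- function(input, output, session) {
  # Rebuild the plots only when the button is clicked
  plots <- eventReactive(input$go, make_plots())
  output$all_plots <- renderPlot({
    ggarrange(plotlist = plots(), ncol = 3)
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in a browser
```

eventReactive() ignores the button's initial value, so nothing is drawn until the first click.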

Resources