geom_qq() labeling each datapoint with the corresponding 'year' - r

library(fpp3)
library(hrbrthemes)
library(readr)
# load 'Fastenal' dataset and convert to tsibble
Fastenal <- read_csv('http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Fastenal%20Sales%20(2004-2013).csv')
Fastenal %>% mutate(YearMonth = yearmonth('2004-01') + 0:119,
TotSales = Total.Sales/1000000) %>%
as_tsibble(index = YearMonth) -> Fastenal
# Fit AvSalesPD using SNAIVE model
Fastenal %>% model(SNAIVE(AvSalesPD/1000000)) -> fit
# Plot residual QQ plot, label each data points with corresponding 'Year'
fit %>% augment() %>%
ggplot(aes(sample = .resid/1000000)) +
geom_qq(aes(label = year(YearMonth)), geom = 'text') + geom_qq_line() + theme_ipsum()
I'm trying to draw a QQ plot using geom = "text".
When I run the code above, I get the following error message:
'Error: geom_text requires the following missing aesthetics: label'
I can't figure out what I did wrong.

I think the problem is that geom_qq does not know how to use the aesthetics you are providing. Specifying geom = "text" in the function call makes it draw text instead of points, but the label you mapped inside aes() does not reach geom_text, so there is no text for it to draw.
Thus we need to supply the labels ourselves. The following code should produce what you asked for.
library(fpp3)
library(readr)
# load 'Fastenal' dataset and convert to tsibble
Fastenal <- read_csv('http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Fastenal%20Sales%20(2004-2013).csv')
Fastenal %>% mutate(YearMonth = yearmonth('2004-01') + 0:119,
TotSales = Total.Sales/1000000) %>%
as_tsibble(index = YearMonth) -> Fastenal
# Fit AvSalesPD using SNAIVE model
Fastenal %>% model(SNAIVE(AvSalesPD/1000000)) -> fit
# Plot residual QQ plot, label each data points with corresponding 'Year'
fit <- fit %>% augment()
# here we define the labels to use.
labs <- year(fit$YearMonth)
labs <- labs[!is.na(fit$.resid)]
labs <- labs[order(na.omit(fit$.resid))] #EDIT: important! sort the labels for QQ-plots
ggplot(fit, aes(sample = .resid/1000000)) +
geom_qq(geom = "text", label = labs, angle = 90, size=1.5) +
geom_qq_line()
Note how we are creating the labels manually. We pass geom_qq the same arguments as you did, but outside aes().
Since we are passing this information outside the aesthetics, we have to make sure that the contents of labs line up with the points that are actually drawn.
The residuals used to draw the QQ plot contain 12 missing entries. These are omitted, so the resulting plot has 108 points, not 120. This is why labs must have exactly one entry for each non-missing point in your data (i.e. each .resid entry that is not NA), sorted in the same order as the residuals.
If you do not filter out the NAs when constructing labs, you end up with an error and no plot.
The above code produces this plot:
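As an alternative sketch (not the code from the answer above), the QQ coordinates can be computed by hand so that each label stays in the same row as its point. This assumes the default normal reference distribution and uses qnorm(ppoints(n)) for the theoretical quantiles; the data frame d, its simulated residuals, and the column names are hypothetical stand-ins for the augmented Fastenal fit.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the augmented fit: 120 monthly residuals,
# with the first 12 missing, as happens with a seasonal naive model.
d <- data.frame(year = rep(2004:2013, each = 12), resid = rnorm(120))
d$resid[1:12] <- NA

# Drop the missing residuals, then sort so row i holds the i-th smallest one.
d <- d[!is.na(d$resid), ]
d <- d[order(d$resid), ]

# Theoretical normal quantiles for the sorted sample.
d$theoretical <- qnorm(ppoints(nrow(d)))

# Because label is an ordinary column, it can be mapped inside aes().
p <- ggplot(d, aes(x = theoretical, y = resid, label = year)) +
  geom_text(size = 2) +
  geom_qq_line(aes(sample = resid), inherit.aes = FALSE)
p
```

Keeping the label in the data frame avoids having to sort a separate label vector to match the order of the plotted points.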

Related

Change position of legend in plot of pec object

I am trying to plot the prediction error curve from pec package but I can't change the legend position and size. There's an example from pec package:
library(rms)
library(pec)
data(pbc)
pbc <- pbc[sample(1:NROW(pbc),size=100),]
f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc)
f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE)
f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE)
brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1)
print(brier)
plot(brier)
But this shows a big legend in the middle of the plot.
I also tried:
plot(brier, legend = "topright")
class(brier)
But then no legend is shown.
How can I change the position of the legend? Also, is it possible to draw this plot using ggplot?
I think I got what you want using ggplot2. The idea is to pick the elements of your brier object that contain the data for the plot, put them in a data frame, and plot it.
library(ggplot2)
# packages for the pipe and pivot_longer; you can do it with base functions, I just prefer these
library(tidyr)
library(dplyr)
df <- do.call(cbind, brier[["AppErr"]]) # contains y values for each model
df <- cbind(brier[["time"]], df) # values of the x axis
colnames(df)[1] <- "time"
df <- as.data.frame(df) %>% pivot_longer(cols = 2:last_col(), names_to = "models", values_to = "values") # pivot table to long format makes it easier to use ggplot
ggplot(data = df, aes(x = time, y = values, color = models)) +
geom_line() # customize axis names etc. as needed
Output:
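Once the curves are drawn with ggplot2, the legend position and size can be controlled through theme(). The following is a minimal sketch on a small made-up data frame (df_demo is hypothetical and only mimics the time / models / values layout built above), so it runs without fitting any survival models.

```r
library(ggplot2)

# Hypothetical long-format data with the same columns as df above.
df_demo <- data.frame(
  time   = rep(0:10, times = 2),
  models = rep(c("Weibull", "CoxPH"), each = 11),
  values = c(seq(0, 0.25, length.out = 11), seq(0, 0.20, length.out = 11))
)

p <- ggplot(df_demo, aes(x = time, y = values, color = models)) +
  geom_line() +
  labs(x = "Time", y = "Prediction error", color = "Model") +
  theme(
    legend.position = "bottom",            # or "top", "left", "right", "none"
    legend.text     = element_text(size = 8),
    legend.title    = element_text(size = 9)
  )
p
```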

How to add AUC to a multiple ROC graph with pROC's ggroc

I have a list of elements of the class "roc" (l_rocs) which I want to plot with ggroc from the package pROC
library("ggplot2")
library("pROC")
#inside a bigger loop
l_rocs[[names[[i]]]] <- roc(predictor=gbm.probs$Yes,
response=testing$Attrition,
levels=levels(testing$Attrition))
#loop end
ggroc(l_rocs) +
labs(color='Sampling Method')
I now want to add the AUC for each curve, ideally right inside the legend, but I cannot find a way to do it because the object I am plotting is a list.
Any advice?
Here is a solution using modified legend labels.
I used some example data since the question did not include data for a reproducible example.
#library
library(pROC)
library(ggplot2)
library(tidyverse)
# example data
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)
#> Setting levels: control = Good, case = Poor
#> Setting direction: controls < cases
#> Setting levels: control = Good, case = Poor
#> Setting direction: controls < cases
#> Setting levels: control = Good, case = Poor
#> Setting direction: controls < cases
# extract auc
roc.list %>%
map(~tibble(AUC = .x$auc)) %>%
bind_rows(.id = "name") -> data.auc
# generate labels
data.auc %>%
mutate(label_long=paste0(name," , AUC = ",paste(round(AUC,2))),
label_AUC=paste0("AUC = ",paste(round(AUC,2)))) -> data.labels
# plot on a single plot with AUC in labels
ggroc(roc.list) +
scale_color_discrete(labels=data.labels$label_long)
If you have multiple ROC curves, it might be better to draw a facet plot:
# plot a facet plot with AUC within plots
ggroc(roc.list) +
facet_wrap(~name) +
geom_text(data = data.labels,
aes(0.5, 1,
label = paste(label_AUC)),
hjust = 1)
Created on 2021-09-30 by the reprex package (v2.0.1)
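For readers who prefer to avoid the tidyverse dependency, the legend labels can also be built with base R. This is a sketch under the same assumptions as the answer above (the aSAH example data shipped with pROC, and a list of roc objects named by predictor); auc_values and legend_labels are names introduced here for illustration.

```r
library(pROC)
library(ggplot2)

# Same example data as above: one roc object per predictor, returned as a named list.
roc.list <- roc(outcome ~ s100b + ndka + wfns, data = aSAH)

# Extract each AUC as a plain number.
auc_values <- vapply(roc.list, function(r) as.numeric(auc(r)), numeric(1))

# One label per curve, in the same order as the list.
legend_labels <- sprintf("%s, AUC = %.2f", names(roc.list), auc_values)

ggroc(roc.list) +
  scale_color_discrete(labels = legend_labels) +
  labs(color = "Predictor")
```

The labels are matched to the curves by position, so they must be built in the same order as the list passed to ggroc.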

Extract critical points of a polynomial model object in R?

I am trying to solve for the critical (stationary) points of a cubic polynomial function which has been fitted to data, i.e. values of x where the first derivative is zero.
I also need a way to find the values of y at the critical points of x.
It is easy enough to fit the model using lm() and to view the model quality with summary(). And I can plot the function easily enough by adding predictions and using geom_line().
There must be a package or a base R function dedicated to this problem. Can anyone suggest a method?
Below is a reprex to depict the problem. Needless to say, the arrows are drawn only to illustrate the question; they are not mapped to the true critical points or I would not be asking this question...
library(tidyverse)
library(modelr)
set.seed(0)
#generate random data and plot the values
df <- tibble(x= sample(x= c(-100:200), size= 50),
y= -0.5*(x^3) + 50*(x^2) + 7*(x) + rnorm(n=50, mean=10000, sd=50000) )
df %>% ggplot(aes(x, y)) +
geom_point()
# fit a model to the data
cubic_poly_model <- lm(data= df, formula = y~poly(x, 3))
# plot the fitted model
df %>%
add_predictions(model = cubic_poly_model) %>%
ggplot(aes(x, y))+
geom_point(alpha=1/3)+
geom_line(aes(x, y=pred))+
annotate('text', label= 'critical point A', x=-50, y=-250000)+
geom_segment(x=-50, xend=-10, y=-200000, yend=-5000, arrow = arrow(length=unit(3, 'mm'), type = 'closed'))+
annotate('text', label= 'critical point B', x=140, y=400000)+
geom_segment(x=110, xend=90, y=300000, yend=100000, arrow = arrow(length=unit(3, 'mm'), type = 'closed'))
# But how can I get the critical values of x and the y values they produce?
Created on 2020-09-03 by the reprex package (v0.3.0)
I devised a solution using the mosaic package. The makeFun() function converts a model object to a function. You can then use base R optimize() to find the maximum or minimum of that function over a specified interval (in this case, the range of x values). Set the maximum argument of optimize() to TRUE for the local maximum or FALSE for the local minimum.
See code below:
library(magrittr)
set.seed(0)
#generate random data and plot the values
df <- tibble::tibble(x= sample(x= c(-100:200), size= 50),
y= -0.5*(x^3) + 50*(x^2) + 7*(x) + rnorm(n=50, mean=10000, sd=50000) )
cubic_poly_model <- lm(data= df, formula = y~poly(x, 3))
crit_values <- cubic_poly_model %>%
mosaic::makeFun() %>%
optimize(interval = c(min(df$x), max(df$x)), maximum = TRUE)
funct_crit_x <- crit_values[['maximum']][[1]]
funct_max <- crit_values[['objective']]
funct_crit_x
funct_max
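Because the model is a cubic, both critical points can also be found in closed form rather than by numerical search. The sketch below is a different technique from the optimize() approach above: it refits the model with raw (non-orthogonal) polynomial terms so that the coefficients are those of 1, x, x^2 and x^3, differentiates by hand, and solves the resulting quadratic with base R polyroot(). The names raw_model, crit_x, crit_y and second_deriv are introduced here for illustration.

```r
set.seed(0)
# Same simulated data as in the question, built with base R only.
df <- data.frame(x = sample(-100:200, size = 50))
df$y <- -0.5 * df$x^3 + 50 * df$x^2 + 7 * df$x +
  rnorm(n = 50, mean = 10000, sd = 50000)

# raw = TRUE gives coefficients of the ordinary powers of x.
raw_model <- lm(y ~ poly(x, 3, raw = TRUE), data = df)
b <- unname(coef(raw_model))   # fitted curve: b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3

# First derivative: b[2] + 2*b[3]*x + 3*b[4]*x^2.
# polyroot() expects coefficients in increasing order of power.
roots  <- polyroot(c(b[2], 2 * b[3], 3 * b[4]))
crit_x <- sort(Re(roots[abs(Im(roots)) < 1e-6]))   # keep real roots only

# Fitted y at the critical points, and the sign of the second derivative
# (2*b[3] + 6*b[4]*x) to tell maxima from minima.
crit_y       <- predict(raw_model, newdata = data.frame(x = crit_x))
second_deriv <- 2 * b[3] + 6 * b[4] * crit_x

data.frame(x = crit_x, y = crit_y,
           type = ifelse(second_deriv < 0, "maximum", "minimum"))
```

The fitted values are the same as for the orthogonal poly(x, 3) model; only the parameterization of the coefficients differs.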

How to make plot scales the same or turn them into log scales in ggplot

I am using this script to plot chemical elements using ggplot2 in R:
# Load the same data set under a different name, because it is only used for plotting elements as a well log:
Core31B1 <- read.csv('OilSandC31B1BatchResultsCr.csv', header = TRUE)
#
# Calculating the ratios of Ca.Ti, Ca.K, Ca.Fe:
C31B1$Ca.Ti.ratio <- (C31B1$Ca/C31B1$Ti)
C31B1$Ca.K.ratio <- (C31B1$Ca/C31B1$K)
C31B1$Ca.Fe.ratio <- (C31B1$Ca/C31B1$Fe)
C31B1$Fe.Ti.ratio <- (C31B1$Fe/C31B1$Ti)
#C31B1$Si.Al.ratio <- (C31B1$Si/C31B1$Al)
#
# Create a subset of ratios and depth
core31B1_ratio <- C31B1[-2:-18]
#
# Removing the totCount column:
Core31B1 <- Core31B1[-9]
#
# Melting the data set based on the depth values, to have only three columns: depth, element and count
C31B1_melted <- melt(Core31B1, id.vars="depth")
#ratio melted
C31B1_ra_melted <- melt(core31B1_ratio, id.vars="depth")
#
# Eliminating the NA data from the data set
C31B1_melted<-na.exclude(C31B1_melted)
# ratios
C31B1_ra_melted <-na.exclude(C31B1_ra_melted)
#
# Rename the columns:
colnames(C31B1_melted) <- c("depth","element","counts")
# ratios
colnames(C31B1_ra_melted) <- c("depth","ratio","percentage")
#
# Plotting the data in well log format using ggplot2:
Core31B1_Sp <- ggplot(C31B1_melted, aes(x=counts, y=depth)) +
theme_bw() +
geom_path(aes(linetype = element))+ geom_path(size = 0.6) +
labs(title='Core 31 Box 1 Bioturbated sediments') +
scale_y_reverse() +
facet_grid(. ~ element, scales='free_x') #rasterImage(Core31Image, 0, 1515.03, 150, 0, interpolate = FALSE)
#
# View the plot:
Core31B1_Sp
I got the following image (as you can see the plot has seven element plots, and each one has its scale. Please ignore the shadings and the image at the far left):
My question is: is there a way to make these scales the same, for example by using log scales? If so, what should I change in my code?
It is not clear what you mean by "the same", because putting the panels on a common scale will not give you the same result as log-transforming the values. Here is how to get the log transformation, which, when combined with not using free_x, will give you the plot I think you are asking for.
First, since you didn't provide any reproducible data (see here for more on how to ask good questions), here is some that gives at least some of the features that I think your data has. I am using tidyverse (specifically dplyr and tidyr) to do the construction:
forRatios <-
names(iris)[1:3] %>%
combn(2, paste, collapse = " / ")
toPlot <-
iris %>%
mutate_(.dots = forRatios) %>%
select(contains("/")) %>%
mutate(yLocation = 1:n()) %>%
gather(Comparison, Ratio, -yLocation) %>%
mutate(logRatio = log2(Ratio))
Note that the last line takes the log base 2 of the ratio. This allows ratios in each direction (above and below 1) to plot meaningfully. I think that step is what you need. You can accomplish something similar with myDF$logRatio <- log2(myDF$ratio) if you don't want to use dplyr.
Then, you can just plot that:
ggplot(
toPlot
, aes(x = logRatio
, y = yLocation) ) +
geom_path() +
facet_wrap(~Comparison)
Gives:
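Instead of transforming the column, the axis itself can be put on a log scale while keeping one shared scale across panels. This is a sketch on simulated data (the demo data frame and its element names are hypothetical, chosen to resemble the melted depth / element / counts layout from the question), using scale_x_log10() and the default fixed facet scales.

```r
library(ggplot2)

set.seed(42)
# Hypothetical melted data: three elements whose counts differ by orders of magnitude.
demo <- data.frame(
  depth   = rep(1:50, times = 3),
  element = rep(c("Ca", "Fe", "Ti"), each = 50),
  counts  = c(rlnorm(50, meanlog = 8), rlnorm(50, meanlog = 5), rlnorm(50, meanlog = 2))
)

p <- ggplot(demo, aes(x = counts, y = depth)) +
  theme_bw() +
  geom_path() +
  scale_x_log10() +          # log10 axis; values must be strictly positive
  scale_y_reverse() +
  facet_grid(. ~ element)    # no scales = 'free_x', so every panel shares one axis
p
```

With a shared log axis, elements with small counts remain readable next to elements with large counts, and tick labels stay in the original units.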

Graphing a histogram overlaid with a fitted 2 parameter Weibull function

I would like to plot both a histogram and a fitted Weibull function on the same graph. The code to plot the histogram is:
hist(data$grddia2, prob=TRUE,breaks=5)
The code for the fitted Weibull function is (this needs the MASS package):
fitdistr(data$grddia2,densfun=dweibull,start=list(scale=1,shape=2))
How do I plot both together on the same graph? I've attached the data set.
Also, bonus to anyone who can provide code that achieves the same thing but creates a graph for each column of data. There are many columns in the data set, and it would be nice to have all the graphs on the same page.
https://www.dropbox.com/s/ra9c2kkk49vyyyc/Diameter%20Distribution.csv?dl=0
Here is the code:
library("ggplot2")
library("dplyr")
library("tidyr")
library("MASS")
# Import dataset and drop the column "treeno"
# Use namespace dplyr:: explicitly because of conflict with MASS:: for function "select"
data <- read.csv("Diameter Distribution.csv") %>%
dplyr::select(-treeno)
# Function to provide the Weibull distribution for each column
# The distribution is calculated based on the estimated scale and shape parameters of the input
fitweibull <- function(column) {
x <- seq(0,7,by=0.01)
fitparam <- column %>%
unlist %>%
fitdistr(densfun=dweibull,start=list(scale=1,shape=2))
return(dweibull(x, scale=fitparam$estimate[1], shape=fitparam$estimate[2]))
}
# Apply function for each column then consolidate all in a data.frame
fitdata <-data %>%
apply(2, as.list) %>%
lapply(FUN = fitweibull) %>%
data.frame()
# Display graphs
multiplyingFactor<-10
ggplot() +
geom_histogram(data=gather(data), aes(x=value, group=key, fill=key), alpha=0.2) +
geom_line(data=gather(fitdata), aes(x=rep(seq(0,7,by=0.01),ncol(fitdata)), y=multiplyingFactor*value, group=key, color=key))
And the output figure:
Variant: thanks to the wonderful ggplot2 package, you can also draw the graphs in separate panels just by adding this final line of code:
+ facet_wrap(~ key) + theme(legend.position = "none")
Which gives you this other figure:
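For the single-column case in the question, a base R version is also possible. The sketch below uses simulated diameters, since the linked CSV is not reproduced here; diam, the sample size, and the true shape and scale values are assumptions made for illustration. It fits the Weibull with MASS::fitdistr() and overlays the fitted density on a density-scaled histogram with curve().

```r
library(MASS)

set.seed(123)
# Simulated stand-in for data$grddia2.
diam <- rweibull(200, shape = 2, scale = 3)

# Two-parameter Weibull fit; fitdistr picks starting values itself for "weibull".
fit <- fitdistr(diam, densfun = "weibull")

# prob = TRUE puts the histogram on the density scale, so the fitted
# density can be overlaid directly without a multiplying factor.
hist(diam, prob = TRUE, breaks = 10,
     main = "Histogram with fitted Weibull density", xlab = "Diameter")
curve(dweibull(x, shape = fit$estimate["shape"], scale = fit$estimate["scale"]),
      add = TRUE, col = "red", lwd = 2)
```

Plotting on the density scale is the design choice that makes the two layers comparable; a count-scaled histogram would need the curve rescaled by the sample size times the bin width.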
