R: Non-normal distribution with specification limits -> quartiles & Cp/Cpk - r

I am having trouble plotting the quartiles of a mixed distribution and, beyond that, calculating Cp and Cpk.
My data:
> dput(hist)
structure(list(index = c(1, 10, 11, 12, 128044, 128045, 128046,
128047, 128048, 128049, 128050, 128051, 128052, 128053, 128054,
128055, 128056, 128057, 128058, 128059, 128060, 128061, 128062,
128063, 128064, 128065, 128066, 128067, 128068, 128069, 128070,
128071, 128072, 128073, 128074, 128075, 128076, 128077, 128078,
128079, 128080, 128081, 128082, 13, 14, 15, 150780, 150781, 150782,
150783, 150784, 150785, 150786, 150787, 150788, 150789, 150790,
150791, 150792, 150793, 150794, 150795, 150796, 150797, 150798,
150799, 150800, 16, 163525, 163526, 163527, 163528, 163529, 163530,
163531, 163532, 163533, 163534, 163535, 163536, 163537, 163538,
163539, 163540, 163541, 163542, 163543, 163544, 163545, 163546,
163547, 163548, 163549, 163550, 163551, 163552, 17), Rundheit = c(0.24,
0.25, 0.23, 0.24, 0.23, 0.24, 0.22, 0.24, 0.21, 0.22, 0.23, 0.24,
0.22, 0.24, 0.27, 0.23, 0.26, 0.27, 0.35, 0.27, 0.27, 0.27, 0.27,
0.27, 0.28, 0.32, 0.31, 0.3, 0.29, 0.28, 0.28, 0.27, 0.28, 0.27,
0.28, 0.28, 0.29, 0.29, 0.28, 0.28, 0.27, 0.26, 0.27, 0.23, 0.26,
0.24, 0.17, 0.52, 0.18, 0.19, 0.17, 0.18, 0.18, 0.18, 0.18, 0.2,
0.17, 0.17, 0.18, 0.18, 0.18, 0.18, 0.18, 0.2, 0.19, 0.18, 0.18,
0.25, 0.23, 0.23, 0.22, 0.23, 0.23, 0.23, 0.22, 0.23, 0.2, 0.21,
0.21, 0.22, 0.23, 0.23, 0.23, 0.23, 0.22, 0.22, 0.23, 0.22, 0.22,
0.22, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.24)), .Names = c("index",
"Rundheit"), row.names = c(17L, 45L, 311125L, 622233L, 872553L,
872581L, 872609L, 872637L, 872665L, 872693L, 872749L, 872777L,
872805L, 872833L, 872861L, 872889L, 872917L, 872945L, 872973L,
873001L, 873057L, 873085L, 873113L, 873141L, 873169L, 873197L,
873225L, 873253L, 873281L, 873309L, 873365L, 873393L, 873421L,
873449L, 873477L, 873505L, 873533L, 873561L, 873589L, 873617L,
873673L, 873701L, 873729L, 933341L, 1244449L, 1555557L, 1579889L,
1579917L, 1579945L, 1579973L, 1580001L, 1580029L, 1580057L, 1580085L,
1580113L, 1580141L, 1580197L, 1580225L, 1580253L, 1580281L, 1580309L,
1580337L, 1580365L, 1580393L, 1580421L, 1580449L, 1580533L, 1866665L,
1976397L, 1976425L, 1976453L, 1976481L, 1976509L, 1976565L, 1976593L,
1976621L, 1976649L, 1976677L, 1976705L, 1976733L, 1976761L, 1976789L,
1976817L, 1976873L, 1976901L, 1976929L, 1976957L, 1976985L, 1977013L,
1977041L, 1977069L, 1977097L, 1977125L, 1977181L, 1977209L, 1977237L,
2177773L), na.action = structure(98:100, .Names = c("2412637",
"2412665", "2412721"), class = "omit"), class = "data.frame")
I plotted it easily with ggplot and the density looks quite good; however, the quartile lines (+/- 2s and +/- 3s) are not correct.
My plot:
vec <- quantile(hist$Rundheit, na.rm = TRUE)
ggplot(data=hist, aes(Rundheit)) +
geom_bar(aes( y=..count..), stat="bin",position="dodge", fill="gray40", colour="white") +
stat_density(color="red", geom="line", size=1, position="identity") +
geom_vline(xintercept=vec, linetype=2, colour="blue", size=1) + #Tolerance/Limits
geom_vline(aes(xintercept=0.55), size = 1, color="red") + #Tolerance/Limits
geom_vline(aes(xintercept=0), size = 1, color="red")
Furthermore, I have tried to calculate Cp and Cpk using the SixSigma package:
library(SixSigma)
cp<- ss.ca.cp(hist$Rundheit, 0,0.55)
cp
[1] 1.922963
cpk <- ss.ca.cpk(hist$Rundheit, 0,0.55)
cpk
[1] 1.658759
However, the Cp and Cpk values calculated by SixSigma do not match the numbers I got from another program, which reports
cp=2.35 and cpk=2.11
For what it's worth, I do not have much background in statistics.
Thanks for any tips!

How about something like this? Is this what you are after? I don't really know what cp, cpk, LSL and USL are, to be honest.
(I renamed hist to dat, as hist is a very commonly used function.)
m <- mean(dat$Rundheit)
s <- sd(dat$Rundheit)
vec <- data.frame(val = c(m, m - 3*s, m + 3*s, m - 5*s, m + 5*s),
sigma = factor(c('mean', '3s', '3s', '5s', '5s'), c('mean', '3s', '5s')))
library(ggplot2)
ggplot(data=dat, aes(Rundheit)) +
geom_bar(aes( y=..count..), stat="bin",position="dodge", fill="gray40",
colour="white") +
stat_density(color="red", geom="line", size=1, position="identity") +
geom_vline(data = vec, aes(xintercept = val, lty = sigma),
colour = "blue", size = 1)
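On the Cp/Cpk part: for reference, the textbook definitions are Cp = (USL - LSL) / (6s) and Cpk = min(USL - mean, mean - LSL) / (3s). Below is a minimal base-R sketch of those formulas, assuming the overall sample standard deviation is used for s. The helper name capability is made up for illustration and is not part of SixSigma. Other programs may estimate sigma differently (for example within subgroups) or use a percentile-based method for non-normal data, which could be one reason the numbers differ.

```r
# Textbook process capability indices (assumes roughly normal data).
# `capability` is a hypothetical helper name, not a SixSigma function.
capability <- function(x, lsl, usl) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)  # overall sample standard deviation
  c(cp  = (usl - lsl) / (6 * s),
    cpk = min(usl - m, m - lsl) / (3 * s))
}

# Small made-up example: mean 3, sd sqrt(2.5), limits 1 and 6
demo <- capability(c(1, 2, 3, 4, 5), lsl = 1, usl = 6)
print(demo)
```

With the data above this would be called as capability(dat$Rundheit, 0, 0.55).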

Related

How to change the x axis to a time series in ggplot2

I'm trying to replicate the graph provided at https://www.chicagofed.org/research/data/cfnai/current-data since I will soon need similar-looking graphs for other data sets. I'm almost there, but I can't figure out how to change the x axis to dates when using ggplot2. Specifically, I would like it to show the dates in the Date column. I tried about a dozen ways and nothing works. The data for this graph is under "indexes" on the website. Here's my code and the graph, where dataSet is the data from the website:
library(ggplot2)
library(reshape2)
library(tidyverse)
library(lubridate)
df = data.frame(time = index(dataSet), melt(as.data.frame(dataSet)))
df
str(df)
df$data1.Date = as.Date(as.character(df$data1.Date))
str(df)
replicaPlot1 = ggplot(df, aes(x = time, y = value)) +
geom_area(aes(colour = variable, fill = variable)) +
stat_summary(fun = sum, geom = "line", size = 0.4) +
labs(title = "Chicago Fed National Activity Index (CFNAI) Current Data")
replicaPlot1 + scale_x_continuous(name = "time", breaks = waiver(), labels = waiver(), limits =
df$data1.Date)
replicaPlot1
Any sort of help on this would be very much appreciated!
I'm not sure what you intended with data.frame(time = index(dataSet), melt(as.data.frame(dataSet))). When I download the data and read it via readxl::read_excel, I get a nice tibble with a date-time column (Date). After reshaping with tidyr::pivot_longer it can be plotted easily, and scale_x_datetime gives a nicely formatted date axis.
Using just the first 20 rows of the data (defined under DATA below), try this:
library(ggplot2)
library(readxl)
library(tidyr)
df <- pivot_longer(df, -Date, names_to = "variable")
ggplot(df, aes(x = Date, y = value)) +
geom_area(aes(colour = variable, fill = variable)) +
stat_summary(fun = sum, geom = "line", size = 0.4) +
labs(title = "Chicago Fed National Activity Index (CFNAI) Current Data") +
scale_x_datetime(name = "time")
#> Warning: Removed 4 rows containing non-finite values (stat_summary).
#> Warning: Removed 4 rows containing missing values (position_stack).
Created on 2021-01-28 by the reprex package (v1.0.0)
DATA
# Data downloaded from https://www.chicagofed.org/~/media/publications/cfnai/cfnai-data-series-xlsx.xlsx?la=en
# df <- readxl::read_excel("cfnai-data-series-xlsx.xlsx")
# dput(head(df, 20))
df <- structure(list(Date = structure(c(
-87004800, -84412800, -81734400,
-79142400, -76464000, -73785600, -71193600, -68515200, -65923200,
-63244800, -60566400, -58060800, -55382400, -52790400, -50112000,
-47520000, -44841600, -42163200, -39571200, -36892800
), tzone = "UTC", class = c(
"POSIXct",
"POSIXt"
)), P_I = c(
-0.26, 0.16, -0.43, -0.09, -0.19, 0.58, -0.05,
0.21, 0.51, 0.33, -0.1, 0.12, 0.07, 0.04, 0.35, 0.04, -0.1, 0.14,
0.05, 0.11
), EU_H = c(
-0.06, -0.09, 0.01, 0.04, 0.1, 0.22, -0.04,
0, 0.32, 0.16, -0.2, 0.34, 0.06, 0.17, 0.17, 0.07, 0.12, 0.12,
0.15, 0.18
), C_H = c(
-0.01, 0.01, -0.05, 0.08, -0.07, -0.01,
0.12, -0.11, 0.1, 0.15, -0.04, 0.04, 0.17, -0.03, 0.05, 0.08,
0.09, 0.05, -0.06, 0.09
), SO_I = c(
-0.01, -0.07, -0.08, 0.02,
-0.16, 0.22, -0.08, -0.07, 0.38, 0.34, -0.13, -0.1, 0.08, -0.07,
0.06, 0.07, 0.12, -0.3, 0.35, 0.14
), CFNAI = c(
-0.34, 0.02, -0.55,
0.04, -0.32, 1, -0.05, 0.03, 1.32, 0.97, -0.46, 0.39, 0.38, 0.11,
0.63, 0.25, 0.22, 0.01, 0.49, 0.52
), CFNAI_MA3 = c(
NA, NA, -0.29,
-0.17, -0.28, 0.24, 0.21, 0.33, 0.43, 0.77, 0.61, 0.3, 0.1, 0.29,
0.37, 0.33, 0.37, 0.16, 0.24, 0.34
), DIFFUSION = c(
NA, NA, -0.17,
-0.14, -0.21, 0.16, 0.11, 0.17, 0.2, 0.5, 0.41, 0.28, 0.2, 0.32,
0.36, 0.32, 0.33, 0.25, 0.31, 0.47
)), row.names = c(NA, -20L), class = c(
"tbl_df",
"tbl", "data.frame"
))
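A quick way to check which ggplot2 date scale fits is to look at the class of the date column: scale_x_datetime is meant for POSIXct columns and scale_x_date for Date columns. A small base-R sketch, using the first timestamp from the data above (the variable names are only for illustration):

```r
# First timestamp from the dput() above: seconds since 1970-01-01 UTC
first_stamp <- as.POSIXct(-87004800, origin = "1970-01-01", tz = "UTC")
print(class(first_stamp))  # POSIXct/POSIXt, so scale_x_datetime() applies

# Converting to Date drops the time-of-day part; the matching scale
# would then be scale_x_date()
first_date <- as.Date(first_stamp)
print(class(first_date))
print(format(first_date, "%Y-%m"))
```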

How to plot truncated distributions (truncdist) with fitdistrplus?

I am attempting to plot goodness of fit curves to truncated distributions from the fitdistrplus package using its plot function.
library(fitdistrplus)
library(truncdist)
library(truncnorm)
dataNum <- c(433.6668, 413.0450, 435.9952, 449.7559, 457.3629, 498.6187, 598.0335, 637.5611, 644.9193, 634.4843, 620.8676, 590.6622, 581.6411, 572.5022, 594.0925, 587.7293, 608.4948, 626.7594, 599.0286, 611.2966, 572.1749, 545.0071, 490.0298, 478.8484, 458.8293, 437.4878, 467.7026, 477.4094, 467.4182, 519.3056, 599.0155, 648.8603, 623.0672, 606.3737, 552.3653, 558.7612, 553.1345, 549.5961, 546.0578, 565.4582, 562.6825, 606.6225, 578.1584, 572.6201, 546.4735, 514.8147, 479.4638, 462.7702, 430.3652, 452.9671)
If I use the library(truncnorm) to fit a truncated normal distribution, everything works fine.
fit.dataNormTrunc2 <- fitdist(dataNum, "truncnorm", fix.arg=list(a=min(dataNum)), start = list(mean = mean(dataNum), sd = sd(dataNum)))
plot(fit.dataNormTrunc2)
However, if I try to use the truncdist package, only the histogram comparison plot prints without any of the other plots (e.g. qq-plot). I also get an error:
Error in qtNorm(p = c(0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, :
unused argument (p = c(0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, 0.17, 0.19, 0.21, 0.23, 0.25, 0.27, 0.29, 0.31, 0.33, 0.35, 0.37, 0.39, 0.41, 0.43, 0.45, 0.47, 0.49, 0.51, 0.53, 0.55, 0.57, 0.59, 0.61, 0.63, 0.65, 0.67, 0.69, 0.71, 0.73, 0.75, 0.77, 0.79, 0.81, 0.83, 0.85, 0.87, 0.89, 0.91, 0.93, 0.95, 0.97, 0.99))
The code used is:
dtNorm <- function(x, mean, sd) {
dtrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
ptNorm <- function(x, mean, sd) {
ptrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
qtNorm <- function(x, mean, sd) {
qtrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
fit.dataNormTrunc <- fitdist(dataNum, "tNorm", start = c(mean=mean(dataNum), sd=sd(dataNum)))
plot(fit.dataNormTrunc)
I have also tried the truncdist approach with the lognormal function, and again the other 3 plots don't print out and I get the same error about the unused argument.
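One thing worth checking, judging from the error message: the quantile function is being called with an argument named p, while qtNorm above only accepts x. R's own d/p/q functions name their first argument x, q and p respectively (compare dnorm, pnorm, qnorm), and the message suggests the plotting code relies on those names. Below is a minimal base-R sketch of a left-truncated normal written with that convention. It does not use truncdist, and the truncation argument lower and the parameter values are assumptions for illustration.

```r
# Left-truncated normal on [lower, Inf), written with base R only.
# First arguments follow the d/p/q naming convention: x, q, p.
dtNorm <- function(x, mean, sd, lower) {
  tail_mass <- pnorm(lower, mean, sd, lower.tail = FALSE)
  ifelse(x >= lower, dnorm(x, mean, sd) / tail_mass, 0)
}
ptNorm <- function(q, mean, sd, lower) {
  base <- pnorm(lower, mean, sd)
  ifelse(q >= lower, (pnorm(q, mean, sd) - base) / (1 - base), 0)
}
qtNorm <- function(p, mean, sd, lower) {
  base <- pnorm(lower, mean, sd)
  qnorm(base + p * (1 - base), mean, sd)
}

# Calling by argument name works, and the quantile function inverts the CDF
probs <- c(0.01, 0.5, 0.99)
quants <- qtNorm(p = probs, mean = 550, sd = 65, lower = 413)
round_trip <- ptNorm(q = quants, mean = 550, sd = 65, lower = 413)
print(cbind(probs, quants, round_trip))
```

With fitdistrplus, lower would presumably be supplied through fix.arg, in the same way a is fixed in the truncnorm call above.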

Plot in R with different pch's

This is my data, and I need to plot:
data=structure(c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2,
0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31,
0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53,
0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64,
0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75,
0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86,
0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97,
0.98, 0.99, -4.29168871465397, -3.11699074587972, 1.09152409255126,
1.55755175826356, -0.172913268677486, 0.138305902738217, -0.38707713636532,
0.0638896647028127, 0.838910810102289, 0.943154102106711, 1.10825647675154,
1.26151733689579, 0.95610404139547, 1.13671597066802, 1.06145162449853,
1.22015975232484, 1.47211564748976, 1.43575780356999, 1.84397139393396,
1.76431139003358, 1.59262327273733, 1.74799121927712, 1.60092115463811,
1.91302749514369, 1.69691050471565, 1.73871696181996, 1.70008388736007,
1.62139419455853, 2.03803222390097, 1.95654400666235, 2.14213709053145,
2.20797610828818, 2.43019994960532, 2.43201814098108, 1.80396697393168,
2.22800019319471, 2.07590961781243, 1.93938306553876, 1.95940985069043,
2.01357121475676, 1.97530323680977, 1.80327169854223, 2.36734705989908,
2.44766094824079, 2.75792381459726, 2.77274665368527, 2.49888229303308,
2.31540449224314, 2.6409962540336, 2.43729957198807, 2.63155885389867,
2.53653088267223, 2.36871141172942, 2.54858578120089, 2.69802567434559,
3.09606341962321, 3.08856133175863, 3.18997559061186, 3.36005160648579,
3.56895022380044, 3.73753226001724, 3.74662085372188, 4.01296134301718,
4.07267448537225, 3.88165588983999, 3.7369314477271, 3.23912007937852,
3.31721703890831, 3.21894991022748, 3.48377059081018, 3.32624243338278,
3.31970136033168, 3.33053692253337, 3.34467916673038, 3.236168836409,
2.93429043790414, 2.9303837626847, 3.15769722112212, 3.75496410153913,
3.60526854720219, 3.82913260531081, 4.12105540857576, 4.00407286724511,
3.86329120505831, 4.01282715673454, 4.27078090625557, 3.57982245847814,
3.42938648057264, 3.04047099021105, 3.22396221972667, 4.4317374989557,
4.55399628631069, 4.51384672365535, 5.19575483872483, 4.77975901314362,
3.67143455937258, 4.83321942758713, 5.82353153779422, 5.4721995802281,
0.209205679527393, 0.36810747913542, 0.767214115569449, 0.631134464438132,
0.950471080949761, 0.955883872576242, 0.861939569072133, 0.978322788509546,
0.650739708163536, 0.609454620741533, 0.416316714902356, 0.424390227854642,
0.509471258981771, 0.45111061569788, 0.482703338045896, 0.415503380452312,
0.281397009944395, 0.312633722543431, 0.172403050166603, 0.157569155616774,
0.223315461391016, 0.134712102225702, 0.187843250166637, 0.109294406499708,
0.115163596824693, 0.138462578171918, 0.119131458337016, 0.174760537513378,
0.060100726330413, 0.0724953102167094, 0.0727020992861007, 0.0538763524104828,
0.0305519665256373, 0.0458544145004334, 0.13222239331969, 0.062914362547982,
0.0997526784831062, 0.11462977656091, 0.116582141802293, 0.0986337165111772,
0.136226138825677, 0.168342590268618, 0.0716128991576213, 0.0676036354494944,
0.0357838762803169, 0.0334279079582225, 0.0610644117339305, 0.0616823286482187,
0.0660736255131733, 0.104368782129991, 0.0705141118177286, 0.0778176025258217,
0.108146014569371, 0.125671355892738, 0.0590267483041353, 0.0294699796128093,
0.0338205013760269, 0.0269159737669502, 0.0134643988629253, 0.00867709725404753,
0.00493722923021656, 0.00323813401160211, 0.000497278521965683,
0.000424360028534299, 0.000603507667276793, 0.00192008642195063,
0.00578745302404915, 0.00632637091749721, 0.0036673526900235,
0.00322317560117313, 0.00315464572099522, 0.00890662685249866,
0.00630278028858244, 0.00172069402847441, 0.00297661131713389,
0.00907593497087, 0.00794661797866469, 0.00360198056893646, 0.000913572843050492,
0.000952621690864408, 0.000214234772719202, 4.55598611162067e-05,
2.0600933563486e-05, 0.00014372066333701, 3.00102200614383e-05,
1.97046007623936e-05, 0.000349337120439941, 0.00580915934418336,
0.0186446024343607, 0.0455194395151208, 0.0067650312952201, 0.00903110379061256,
0.0210099376843247, 0.0126330025977033, 0.0735408204027586, 0.158374400655879,
0.0970807294810527, 0.0643407704341705, 0.408677400389109), .Dim = c(99L,
3L), .Dimnames = list(NULL, c("betas.position", "coef", "pvalue"
)))
I need to plot a graph like this: plot(data[,1],data[,2], pch=8)
When the p-value (data[,3]) is bigger than 0.10, pch should be empty(a line).
I believe that I have to construct some rule, but I am not able to do this so far.
Use an ifelse, which returns a vector which here is either 1 or 2 depending on the value of data[,3]:
plot(data[,1],data[,2],pch=ifelse(data[,3]>0.10,1,2))
so pch=1 for data[,3]>0.10 and pch=2 otherwise. Adjust these for whichever symbols you want, or use NA for nothing. You can use similar logic for setting the symbol size with the cex= parameter.
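To hide the points with a p-value above 0.10 instead of switching symbols, the same ifelse can return NA for them. A minimal sketch on made-up numbers (the matrix toy stands in for the data above), which also draws a connecting line in case "a line" means the curve should stay visible where the symbols are dropped:

```r
# Toy stand-in for the matrix above: position, coefficient, p-value
toy <- cbind(betas.position = c(0.1, 0.2, 0.3, 0.4),
             coef           = c(-1.0, 0.5, 1.2, 2.0),
             pvalue         = c(0.30, 0.08, 0.11, 0.01))

# pch = 8 (star) where p <= 0.10, NA (nothing drawn) where p > 0.10
pch_vec <- ifelse(toy[, "pvalue"] > 0.10, NA, 8)

# type = "n" sets up the axes for the full range; lines() joins all
# coefficients and points() marks only the ones with p <= 0.10
plot(toy[, "betas.position"], toy[, "coef"], type = "n",
     xlab = "betas.position", ylab = "coef")
lines(toy[, "betas.position"], toy[, "coef"])
points(toy[, "betas.position"], toy[, "coef"], pch = pch_vec)
```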
The below will remove the points you don't want from your chart:
data <- as.data.frame(data)
plot(data[data$pvalue <= 0.1,1],data[data$pvalue <= 0.1,2], pch=8)
I'm not sure what you mean by "empty (a line)". If you want to overlay different plot types you should consider ggplot2. It has far more functionality than the Base R plots.

Building a function by defining X and Y and then Integrating in R

I need to construct a function whose x values come from the first column of the matrix below and whose y values come from the second column of the same matrix, so that I can later calculate its integral over a desired range:
matrix=structure(c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2,
0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31,
0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53,
0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64,
0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75,
0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86,
0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97,
0.98, 0.99, -7.38512004893287, -7.38512004893287, -6.4788834441613,
-5.63088940915783, -4.83466644123448, -4.68738146949482, -4.28638930290018,
-4.22411786604579, -3.59136848943044, -3.51706359680799, -3.39972014575003,
-3.28609348968074, -3.08569873266253, -2.99764447889508, -2.89470597729108,
-2.77488515429677, -2.67019029728821, -2.54646363628509, -2.48474483938047,
-2.30542896070156, -2.22485510301423, -2.16689229344011, -2.10316315192181,
-2.05135466960309, -1.90942757945567, -1.87863626704201, -1.82507998490407,
-1.75875817642096, -1.6919717645629, -1.62396997031953, -1.56159595204983,
-1.52152738173419, -1.46478394989911, -1.4590555309334, -1.21744398902807,
-1.21731951113139, -1.15003007559406, -1.07321513324935, -0.993364510081357,
-0.924402354306976, -0.885939210442384, -0.831155619244629, -0.80947326709303,
-0.786842719842383, -0.743834513319968, -0.721194178931262, -0.593033922802471,
-0.514780082129033, -0.50717184901095, -0.44223827942003, -0.403514759789576,
-0.296251921664, -0.204238424399985, -0.1463212643028, -0.0982036017275267,
-0.0705262020944892, 0.0275436976821241, 0.0601977432996216,
0.114959963559268, 0.182222546319913, 0.236503724954577, 0.272244043950984,
0.325188234828891, 0.347862804414816, 0.438932719815686, 0.630570414177834,
0.805087251137292, 0.904903847087405, 0.940702374334727, 0.958351604371838,
1.03920208406121, 1.25808734990267, 1.32634708210007, 1.34458194173569,
1.42693337001189, 1.55016591141652, 1.5710754638668, 1.61795101580197,
1.62472416407376, 1.70223430572367, 1.86164374636379, 1.94317125269006,
2.03941620499986, 2.12071850455654, 2.17753890907921, 2.22227616630581,
2.45586794615095, 2.66160802425205, 2.83084956697756, 2.94669126521054,
3.04536994227142, 3.09217816201639, 3.42405058020625, 3.45140184734503,
3.67343579954061, 4.64233570345934, 4.87075743677502, 5.27924539262207,
5.56822483595709), .Dim = c(99L, 2L), .Dimnames = list(NULL,
c("x", "y")))
So I would have a function that looks like this:
plot(matrix[,1],matrix[,2])
And then, my idea is to calculate the integral of this function using this code in R:
integrating= function(x) return(myfunction(x));
integrate(integrating, lower=0.08, upper=0.15)
Is this possible?
I tried, but it didn't work.
Looking at the matrix you provided (better to name the variable mat rather than matrix, which is also a base R function), I see that your x samples are evenly spaced and the y values are monotone and smooth in x. So simple linear interpolation should model the data well enough.
## read `?approx`
f <- approxfun(mat[, 1], mat[, 2])
Then you can do
integrate (f, lower = 0.08, upper = 0.15)
# -0.2343698 with absolute error < 1.3e-05
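Since approxfun interpolates linearly, the integral between two sample points should match the trapezoid rule on the same points, which makes a handy sanity check. Here is a small sketch on made-up data (x_demo and y_demo are illustrative, not the matrix above). Note that approxfun returns NA outside the range of x by default, so the integration limits need to stay inside the sampled range.

```r
# Made-up, evenly spaced samples of a smooth curve
x_demo <- seq(0, 1, by = 0.1)
y_demo <- x_demo^2

# Linear interpolant, as in the answer above
f_demo <- approxfun(x_demo, y_demo)
by_integrate <- integrate(f_demo, lower = 0.2, upper = 0.8)$value

# Trapezoid rule on the sample points inside [0.2, 0.8]
keep <- x_demo > 0.15 & x_demo < 0.85
xs <- x_demo[keep]
ys <- y_demo[keep]
by_trapezoid <- sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)

print(c(integrate = by_integrate, trapezoid = by_trapezoid))
```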

Quantile-Quantile ggplot with geom_smooth

I would like to use geom_smooth on my QQ plot from ggplot. However, it seems that a ggplot with stat="qq" doesn't react to the geom_smooth layer at all.
Does anyone know how I can use geom_smooth on a QQ plot?
My data and code:
data2 <- structure(list(index = c(1, 10, 100, 1000, 10000, 100001, 100002,
100003, 100004, 100005, 100006, 100007, 100008, 100009, 10001,
100010, 100011, 100012, 100013, 100014, 100015, 100016, 100017,
100018, 100019, 10002, 100020, 100021, 100022, 100023, 100024,
100025, 100026, 100027, 100028, 100029, 10003, 100030, 100031,
100032, 100033, 100034, 100035, 100036, 100037, 100038, 100039,
10004, 100040, 100041, 100042, 100043, 100044, 100045, 100046,
100047, 100048, 100049, 10005, 100050, 100051, 100052, 100053,
100054, 100055, 100056, 100057, 100058, 100059, 10006, 100060,
100061, 100062, 100063, 100064, 100065, 100066, 100067, 100068,
100069, 10007, 100070, 100071, 100072, 100073, 100074, 100075,
100076, 100077, 100078, 100079, 10008, 100080, 100081, 100082,
100083, 100084, 100085, 100086, 100087, 100088, 100089, 10009,
100090, 100091, 100092, 100093, 100094, 100095, 100096, 100097,
100098, 100099, 1001, 10010, 100100, 100101, 100102, 100103,
100104, 100105, 100106, 100107, 100108, 100109, 10011, 100110,
100111, 100112, 100113, 100114, 100115, 100116, 100117, 100118,
100119, 10012, 100120, 100121, 100122, 100123, 100124, 100125,
100126, 100127, 100128, 100129, 10013, 100130, 100131, 100132,
100133, 100134, 100135, 100136, 100137, 100138, 100139, 10014,
100140, 100141, 100142, 100143, 100144, 100145, 100146, 100147,
100148, 100149, 10015, 100150, 100151, 100152, 100153, 100154,
100155, 100156, 100157, 100158, 100159, 10016, 100160, 100161,
100162, 100163, 100164, 100165, 100166, 100167, 100168, 100169,
10017, 100170, 100171, 100172, 100173, 100174, 100175, 100176,
100177, 100178, 100179, 10018, 100180, 100181, 100182, 100183,
100184, 100185, 100186, 100187, 100188, 100189, 10019, 100190,
100191, 100192, 100193, 100194, 100195, 100196, 100197, 100198,
100199, 1002, 10020, 100200, 100201, 100202, 100203, 100204,
100205, 100206, 100207, 100208, 100209, 10021, 100210, 100211,
100212, 100213, 100214, 100215, 100216, 100217, 100218, 100219,
10022, 100220, 100221, 100222, 100223, 100224, 100225, 100226,
100227, 100228, 100229, 10023, 100230, 100231, 100232, 100233,
100234, 100235, 100236, 100237, 100238, 100239, 10024, 100240,
100241, 100242, 100243, 100244, 100245, 100246, 100247, 100248,
100249, 10025, 100250, 100251, 100252, 100253, 100254, 100255,
100256, 100257, 100258, 100259, 10026, 100260, 100261, 100262,
100263, 100264, 100265, 100266, 100267, 100268, 100269, 10027,
100270, 100271, 100272, 100273, 100274, 100275, 100276, 100277,
100278, 100279, 10028, 100280, 100281, 100282, 100283, 100284,
100285, 100286, 100287, 100288, 100289, 10029, 100290, 100291,
100292, 100293, 100294, 100295, 100296, 100297, 100298, 100299,
1003, 10030, 100300, 100301, 100302, 100303, 100304, 100305,
100306, 100307, 100308, 100309, 10031, 100310, 100311, 100312,
100313, 100314, 100315, 100316, 100317, 100318, 100319, 10032,
100320, 100321, 100322, 100323, 100324, 100325, 100326, 100327,
100328, 100329, 10033, 100330, 100331, 100332, 100333, 100334,
100335, 100336, 100337, 100338, 100339, 10034, 100340, 100341,
100342, 100343, 100344, 100345, 100346, 100347, 100348, 100349,
10035, 100350, 100351, 100352, 100353, 100354, 100355, 100356,
100357, 100358, 100359, 10036, 100360, 100361, 100362, 100363,
100364, 100365, 100366, 100367, 100368, 100369, 10037, 100370,
100371, 100372, 100373, 100374, 100375, 100376, 100377, 100378,
100379, 10038, 100380, 100381, 100382, 100383, 100384, 100385,
100386, 100387, 100388, 100389, 10039, 100390, 100391, 100392,
100393, 100394, 100395, 100396, 100397, 100398, 100399, 1004,
10040, 100400, 100401, 100402, 100403, 100404, 100405, 100406,
100407, 100408, 100409, 10041, 100410, 100411, 100412, 100413,
100414, 100415, 100416, 100417, 100418, 100419, 10042, 100420,
100421, 100422, 100423, 100424, 100425, 100426, 100427, 100428,
100429, 10043, 100430, 100431, 100432, 100433, 100434, 100435,
100436, 100437, 100438, 100439, 10044, 100440, 100441, 100442,
100443, 100444, 100445, 100446, 100447), X = c(0.24, 0.25,
0.27, 0.32, 0.24, 0.22, 0.23, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21,
0.21, 0.23, 0.2, 0.21, 0.21, 0.21, 0.22, 0.22, 0.21, 0.22, 0.21,
0.21, 0.21, 0.21, 0.22, 0.22, 0.22, 0.23, 0.22, 0.22, 0.22, 0.21,
0.22, 0.23, 0.22, 0.22, 0.22, 0.21, 0.22, 0.22, 0.22, 0.22, 0.22,
0.22, 0.22, 0.23, 0.23, 0.23, 0.23, 0.21, 0.21, 0.21, 0.21, 0.2,
0.22, 0.23, 0.21, 0.22, 0.2, 0.21, 0.21, 0.2, 0.2, 0.21, 0.21,
0.22, 0.23, 0.21, 0.21, 0.22, 0.21, 0.2, 0.21, 0.21, 0.23, 0.21,
0.21, 0.22, 0.21, 0.21, 0.21, 0.21, 0.21, 0.22, 0.22, 0.22, 0.22,
0.21, 0.21, 0.22, 0.21, 0.21, 0.22, 0.21, 0.21, 0.22, 0.21, 0.21,
0.22, 0.23, 0.21, 0.21, 0.21, 0.22, 0.22, 0.21, 0.22, 0.24, 0.24,
0.24, 0.26, 0.22, 0.24, 0.25, 0.21, 0.23, 0.22, 0.24, 0.24, 0.26,
0.25, 0.24, 0.23, 0.28, 0.27, 0.28, 0.26, 0.27, 0.26, 0.25, 0.25,
0.22, 0.25, 0.22, 0.27, 0.27, 0.26, 0.28, 0.28, 0.28, 0.28, 0.27,
0.26, 0.27, 0.23, 0.27, 0.27, 0.27, 0.27, 0.26, 0.27, 0.28, 0.26,
0.26, 0.25, 0.22, 0.24, 0.26, 0.26, 0.24, 0.24, 0.25, 0.25, 0.24,
0.25, 0.25, 0.22, 0.26, 0.25, 0.25, 0.25, 0.25, 0.26, 0.28, 0.26,
0.27, 0.24, 0.24, 0.26, 0.26, 0.25, 0.25, 0.25, 0.25, 0.23, 0.24,
0.24, 0.24, 0.22, 0.24, 0.25, 0.16, 0.18, 0.17, 0.17, 0.17, 0.14,
0.15, 0.16, 0.23, 0.16, 0.16, 0.16, 0.13, 0.14, 0.15, 0.17, 0.17,
0.17, 0.17, 0.22, 0.17, 0.17, 0.19, 0.19, 0.18, 0.18, 0.18, 0.2,
0.18, 0.19, 0.21, 0.23, 0.17, 0.19, 0.18, 0.18, 0.19, 0.18, 0.18,
0.18, 0.2, 0.18, 0.23, 0.16, 0.17, 0.18, 0.19, 0.18, 0.2, 0.21,
0.21, 0.21, 0.2, 0.23, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21,
0.2, 0.2, 0.22, 0.23, 0.22, 0.22, 0.22, 0.21, 0.22, 0.21, 0.23,
0.22, 0.21, 0.22, 0.24, 0.22, 0.23, 0.22, 0.2, 0.22, 0.21, 0.22,
0.22, 0.22, 0.23, 0.23, 0.24, 0.23, 0.24, 0.22, 0.22, 0.21, 0.22,
0.21, 0.2, 0.2, 0.22, 0.2, 0.22, 0.21, 0.22, 0.22, 0.21, 0.21,
0.23, 0.2, 0.22, 0.22, 0.22, 0.22, 0.21, 0.22, 0.22, 0.21, 0.2,
0.21, 0.21, 0.19, 0.21, 0.22, 0.21, 0.22, 0.22, 0.2, 0.2, 0.21,
0.2, 0.21, 0.21, 0.24, 0.2, 0.2, 0.2, 0.2, 0.2, 0.22, 0.22, 0.22,
0.21, 0.2, 0.17, 0.23, 0.22, 0.22, 0.21, 0.23, 0.23, 0.24, 0.24,
0.24, 0.23, 0.22, 0.24, 0.23, 0.23, 0.24, 0.23, 0.23, 0.23, 0.23,
0.22, 0.21, 0.24, 0.22, 0.22, 0.23, 0.22, 0.22, 0.21, 0.21, 0.23,
0.22, 0.22, 0.23, 0.24, 0.22, 0.23, 0.23, 0.23, 0.22, 0.23, 0.23,
0.24, 0.23, 0.23, 0.23, 0.23, 0.22, 0.22, 0.22, 0.22, 0.22, 0.23,
0.24, 0.23, 0.23, 0.22, 0.24, 0.22, 0.22, 0.22, 0.22, 0.22, 0.23,
0.27, 0.28, 0.27, 0.23, 0.27, 0.26, 0.26, 0.27, 0.26, 0.27, 0.26,
0.27, 0.28, 0.26, 0.23, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26,
0.26, 0.28, 0.29, 0.23, 0.26, 0.27, 0.28, 0.27, 0.27, 0.25, 0.26,
0.26, 0.26, 0.26, 0.22, 0.26, 0.26, 0.26, 0.26, 0.25, 0.26, 0.28,
0.26, 0.26, 0.26, 0.21, 0.22, 0.26, 0.27, 0.25, 0.26, 0.26, 0.26,
0.26, 0.26, 0.26, 0.26, 0.24, 0.26, 0.25, 0.26, 0.26, 0.26, 0.26,
0.26, 0.26, 0.27, 0.26, 0.23, 0.25, 0.24, 0.25, 0.3, 0.3, 0.29,
0.29, 0.28, 0.27, 0.28, 0.23, 0.28, 0.28, 0.27, 0.27, 0.27, 0.29,
0.28, 0.29, 0.26, 0.27, 0.24, 0.26, 0.26, 0.26, 0.29, 0.26, 0.26,
0.28, 0.28)), .Names = c("index", "X"), row.names = c(17L,
45L, 73L, 86L, 121L, 165L, 193L, 221L, 249L, 277L, 305L, 333L,
361L, 389L, 401L, 445L, 473L, 501L, 529L, 557L, 585L, 613L, 641L,
669L, 697L, 709L, 753L, 781L, 809L, 837L, 865L, 893L, 921L, 949L,
977L, 1005L, 1017L, 1061L, 1089L, 1117L, 1145L, 1173L, 1201L,
1229L, 1257L, 1285L, 1313L, 1325L, 1369L, 1397L, 1425L, 1453L,
1481L, 1509L, 1537L, 1565L, 1593L, 1621L, 1633L, 1677L, 1705L,
1733L, 1761L, 1789L, 1817L, 1845L, 1873L, 1901L, 1929L, 1941L,
1985L, 2013L, 2041L, 2069L, 2097L, 2125L, 2153L, 2181L, 2209L,
2237L, 2249L, 2293L, 2321L, 2349L, 2377L, 2405L, 2433L, 2461L,
2489L, 2517L, 2545L, 2557L, 2601L, 2629L, 2657L, 2685L, 2713L,
2741L, 2769L, 2797L, 2825L, 2853L, 2865L, 2909L, 2937L, 2965L,
2993L, 3021L, 3049L, 3077L, 3105L, 3133L, 3161L, 3166L, 3201L,
3245L, 3273L, 3302L, 3330L, 3358L, 3386L, 3414L, 3442L, 3470L,
3498L, 3509L, 3554L, 3582L, 3610L, 3638L, 3666L, 3694L, 3722L,
3750L, 3778L, 3806L, 3817L, 3862L, 3890L, 3918L, 3946L, 3974L,
4002L, 4030L, 4058L, 4086L, 4114L, 4125L, 4170L, 4198L, 4226L,
4254L, 4282L, 4310L, 4338L, 4366L, 4394L, 4422L, 4433L, 4478L,
4506L, 4534L, 4562L, 4590L, 4618L, 4646L, 4674L, 4702L, 4730L,
4741L, 4786L, 4814L, 4842L, 4870L, 4898L, 4926L, 4954L, 4982L,
5010L, 5038L, 5049L, 5094L, 5122L, 5150L, 5178L, 5206L, 5234L,
5262L, 5290L, 5318L, 5346L, 5357L, 5402L, 5430L, 5458L, 5486L,
5514L, 5542L, 5570L, 5598L, 5626L, 5654L, 5665L, 5710L, 5738L,
5766L, 5794L, 5822L, 5850L, 5878L, 5906L, 5934L, 5962L, 5973L,
6018L, 6046L, 6074L, 6102L, 6130L, 6158L, 6186L, 6214L, 6242L,
6270L, 6274L, 6309L, 6354L, 6382L, 6410L, 6438L, 6466L, 6494L,
6522L, 6550L, 6578L, 6606L, 6617L, 6662L, 6690L, 6718L, 6746L,
6774L, 6803L, 6831L, 6859L, 6887L, 6915L, 6925L, 6971L, 6999L,
7027L, 7055L, 7083L, 7111L, 7139L, 7167L, 7195L, 7223L, 7233L,
7279L, 7307L, 7335L, 7363L, 7391L, 7419L, 7447L, 7475L, 7503L,
7531L, 7541L, 7587L, 7615L, 7643L, 7671L, 7699L, 7727L, 7755L,
7783L, 7811L, 7839L, 7849L, 7895L, 7923L, 7951L, 7979L, 8007L,
8035L, 8063L, 8091L, 8119L, 8147L, 8157L, 8203L, 8231L, 8259L,
8287L, 8315L, 8343L, 8371L, 8399L, 8427L, 8455L, 8465L, 8511L,
8539L, 8567L, 8595L, 8623L, 8651L, 8679L, 8707L, 8735L, 8763L,
8773L, 8819L, 8847L, 8875L, 8903L, 8931L, 8959L, 8987L, 9015L,
9043L, 9071L, 9081L, 9127L, 9155L, 9183L, 9211L, 9239L, 9267L,
9295L, 9323L, 9351L, 9379L, 9382L, 9417L, 9463L, 9491L, 9519L,
9547L, 9575L, 9603L, 9631L, 9659L, 9687L, 9715L, 9725L, 9771L,
9799L, 9827L, 9855L, 9883L, 9911L, 9939L, 9967L, 9995L, 10023L,
10033L, 10079L, 10107L, 10135L, 10163L, 10191L, 10219L, 10247L,
10275L, 10303L, 10331L, 10341L, 10387L, 10415L, 10443L, 10471L,
10499L, 10527L, 10555L, 10583L, 10611L, 10639L, 10649L, 10695L,
10723L, 10751L, 10779L, 10807L, 10835L, 10863L, 10891L, 10919L,
10947L, 10957L, 11003L, 11031L, 11059L, 11087L, 11115L, 11143L,
11171L, 11199L, 11227L, 11255L, 11265L, 11311L, 11339L, 11367L,
11395L, 11423L, 11451L, 11479L, 11507L, 11535L, 11563L, 11573L,
11619L, 11647L, 11675L, 11703L, 11731L, 11759L, 11787L, 11815L,
11843L, 11871L, 11881L, 11927L, 11955L, 11983L, 12011L, 12039L,
12067L, 12095L, 12123L, 12151L, 12179L, 12189L, 12235L, 12263L,
12291L, 12319L, 12347L, 12375L, 12403L, 12431L, 12459L, 12487L,
12490L, 12525L, 12571L, 12599L, 12627L, 12655L, 12683L, 12711L,
12739L, 12767L, 12795L, 12823L, 12833L, 12879L, 12907L, 12935L,
12963L, 12991L, 13019L, 13047L, 13075L, 13103L, 13131L, 13141L,
13187L, 13215L, 13243L, 13271L, 13299L, 13327L, 13355L, 13383L,
13411L, 13439L, 13449L, 13495L, 13523L, 13551L, 13579L, 13607L,
13635L, 13663L, 13691L, 13719L, 13747L, 13757L, 13803L, 13831L,
13859L, 13887L, 13915L, 13943L, 13971L, 13999L), class = "data.frame")
ggplot(data = data2, aes(sample = X)) +
geom_point(stat = "qq", colour = "gray40", size = 5) +
stat_smooth(method = "loess") +
theme(axis.text.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
axis.title.x = element_text(size = 18, face = "bold"),
axis.title.y = element_text(size = 18, face = "bold"),
legend.position = "bottom", legend.title = element_blank(),
legend.text = element_text(size = 14))
Additionally, I would like to change the x axis to sample and the y axis to theoretical.
Also: does anyone know whether it is possible to get a QQ plot with probability of exceedance (like here)?
I think you need to calculate the values first:
data2$theoretical <- unlist(qqnorm(data2$X)[1])
Then you can plot them:
ggplot(data2, aes(x = X, y = theoretical)) +
geom_point(colour = "gray40", size = 5) +
geom_smooth(method = "loess") +
theme(axis.text.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
axis.title.x = element_text(size = 18, face = "bold"),
axis.title.y = element_text(size = 18, face = "bold"),
legend.position = "bottom", legend.title = element_blank(),
legend.text = element_text(size = 14)) +
xlab("sample")
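A small refinement, as a sketch: qqnorm draws a base plot as a side effect unless plot.it = FALSE is passed, and it returns the theoretical quantiles in $x and the data in $y. The theoretical quantiles are qnorm(ppoints(n)) rearranged to the order of the data. The toy vector below is made up for illustration:

```r
# Made-up sample standing in for data2$X
x_toy <- c(0.24, 0.21, 0.27, 0.22, 0.30, 0.19, 0.25)

# Compute the QQ coordinates without drawing the base-graphics plot
qq <- qqnorm(x_toy, plot.it = FALSE)
qq_df <- data.frame(sample = qq$y, theoretical = qq$x)

# Same values by hand: normal quantiles at ppoints(n), matched to the
# rank of each observation (no ties in this toy vector)
by_hand <- qnorm(ppoints(length(x_toy)))[rank(x_toy)]
print(cbind(qq_df, by_hand))
```

qq_df can then be passed to ggplot with aes(x = sample, y = theoretical), as in the answer above.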
