How to plot a discrete density function in R

How can I graph a "density function" if I have, for example, X = 12, 13, 14 with probabilities of 0.25, 0.25, 0.50, all on the same graph? Each number has the mentioned probability.

Yes. Many ways to do this. barplot() is one:
dat <- data.frame(
x = 12:14,
p = c(0.25, 0.25, 0.5)
)
barplot(p~x, data=dat)
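A spike ("needle") plot is another common way to draw a discrete probability function. The following is a minimal sketch in base R that reuses the same dat data frame; the axis labels and styling are illustrative choices, not part of the original answer.

```r
# Same data as in the answer above
dat <- data.frame(
  x = 12:14,
  p = c(0.25, 0.25, 0.5)
)

# type = "h" draws a vertical line from 0 up to each probability
plot(dat$x, dat$p, type = "h", lwd = 3,
     ylim = c(0, max(dat$p)), xaxt = "n",
     xlab = "x", ylab = "P(X = x)")
axis(1, at = dat$x)              # tick marks only at the support points
points(dat$x, dat$p, pch = 16)   # mark the top of each spike
```

Checking that the probabilities sum to 1 before plotting (for example with all.equal(sum(dat$p), 1)) is a cheap way to catch typos in the data.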

Related

Adding observations as proportions on a horizontal barplot in R using text() function

I cannot figure out how to get the percentage of responses at the end of the bars. I know I'm missing something within the text() function, just not sure what exactly I'm missing. Thank you!
#Training/Specialty Barplot
trainbarplot <- barplot(table(PSR$training), horiz = TRUE,
main="Respondent Distribution of Training", cex.main = 1.1, font.main = 2,
cex.lab = 0.8, cex.names = 0.4, font.axis = 4, las = 2,
xlab="Response Frequency", xlim=c(0, 40), cex.axis = 0.8,
border="black",
col=rgb (0.1, 0.1, 0.4, 0.5, 0.6),
density=c(50,40,30) , angle=c(9,11,36)
)
text(trainbarplot, table(PSR$training) - 3,
labels=paste(round(proportions(table(PSR$training))*100, 0), "%"))
Generate data
I generated some sample data to replicate your problem. Please note that you should always try to provide an example dataset :)
set.seed(123)
df1 <- data.frame(x = rnorm(10, mean=10, sd=2), y = LETTERS[1:10])
Plot the data
Here's a plot that follows the same structure as your code:
bp <- barplot(df1$x, names.arg = df1$y, horiz = TRUE)
text(x= df1$x+0.5, y= bp, labels=paste0(round(df1$x),"%"), xpd=TRUE)
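The likely reason the text() call in the question misplaces the labels is that, with horiz = TRUE, the midpoints returned by barplot() are y positions, while the bar lengths (the counts) are x positions; the question passes them the other way round. A minimal sketch applied to tabulated counts follows. The training vector is hypothetical and only stands in for PSR$training, which was not provided.

```r
# Hypothetical responses standing in for PSR$training (not from the original post)
training <- c("Surgery", "Surgery", "Medicine", "Medicine", "Medicine", "Other")
counts <- table(training)
pct <- round(proportions(counts) * 100)

# barplot() returns the bar midpoints; with horiz = TRUE these are y positions
mids <- barplot(counts, horiz = TRUE, las = 1,
                xlim = c(0, max(counts) + 1))

# x = bar length (the count), y = bar midpoint;
# pos = 4 places each label just to the right of the end of its bar
text(x = as.vector(counts), y = mids, labels = paste0(pct, "%"), pos = 4)
```

Note that proportions() requires R 4.0.0 or later; prop.table() does the same job in older versions.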
Using ggplot2
You can also plot your data using ggplot2. For instance, you could first create a new column in your dataset with information on the labels...
df1$perc <- paste0(round(df1$x),"%")
Next, you can plot your data using ggplot and adding different relevant layers.
library(ggplot2)
ggplot(df1, aes(x = x, y = y)) +
geom_col() +
geom_text(aes(label = perc)) +
theme_minimal()
Good luck!

R: Reduce number of plots in quantile regression results

By using the following code I am able to plot the results of my quantile regression model:
quant_reg_all <- rq(y_quant ~ X_quant, tau = seq(0.05, 0.95, by = 0.05), data=df_lasso)
quant_plot <- summary(quant_reg_all, se = "boot")
plot(quant_plot)
However, as there are many variables the plots are unreadable as shown in the image below:
Including the label, I have 18 variables.
How could I plot a few of these images at the time so they are readable?
Depending on the number of graphs you want, you could do:
quant_reg_all <- rq(y_quant ~ X_quant, tau = seq(0.05, 0.95, by = 0.05), data=df_lasso)
quant_plot <- summary(quant_reg_all, se = "boot")
plot(quant_plot, 1:3)# plot the first 3
plot(quant_plot, c(3, 6, 9, 10))# plot the 3rd, 6th, 9th and 10th plots
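To page through all the coefficient plots a few at a time, the indices can be split into groups first. This is a sketch only: n_coef = 18 is an assumption based on the count mentioned in the question, and per_page is a hypothetical setting. The final loop is left as a comment because it needs the quantreg package and the quant_plot object from the code above.

```r
# Assumption: the summary contains 18 coefficient plots (taken from the question)
n_coef <- 18
per_page <- 4   # hypothetical choice: how many plots to show per figure

# Split 1..n_coef into consecutive groups of at most per_page indices
idx <- seq_len(n_coef)
chunks <- split(idx, ceiling(idx / per_page))
chunks

# With quantreg loaded and quant_plot defined as above,
# each group can then be drawn as its own figure:
# for (i in chunks) plot(quant_plot, i)
```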

Multiple Layers in ggplot2

I want to overlay a plot of an empirical cdf with a cdf of a normal distribution. I can only get the code to work without using ggplot.
rnd_nv1 <- rnorm(1000, 1.5, 0.5)
plot(ecdf(rnd_nv1))
lines(seq(0, 3, by=.1), pnorm(seq(0, 3, by=.1), 1.5, 0.5), col=2)
For ggplot to work I would need a single data frame, for example joining rnd_nv1 and pnorm(seq(0, 3, by=.1), 1.5, 0.5). This is a problem, because the function rnorm gives me just the function values without values on the domain. I don't even know how rnorm creates these; if I view the table I just see function values. But then again, magically, the plot of rnd_nv1 works.
The following plots the two lines but they overlap, since they are almost equal.
set.seed(1856)
x <- seq(0, 3, by = 0.1)
rnd_nv1 <- rnorm(1000, 1.5, 0.5)
dat <- data.frame(x = x, ecdf = ecdf(rnd_nv1)(x), norm = pnorm(x, 1.5, 0.5))
library(ggplot2)
long <- reshape2::melt(dat, id.vars = "x")
ggplot(long, aes(x = x, y = value, colour = variable)) +
geom_line()
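On the confusion in the question: rnorm() does not return function values at all. It returns random draws, and ecdf() turns those draws into a step function that can be evaluated at any point, which is why it can share an x grid with pnorm(). A small base-R sketch that illustrates this, using the same seed and parameters as the answer:

```r
set.seed(1856)
rnd_nv1 <- rnorm(1000, 1.5, 0.5)   # 1000 random draws, not function values

Fn <- ecdf(rnd_nv1)                # a step function built from the draws

# The ECDF at a point is the share of draws less than or equal to that point
Fn(1.5)
mean(rnd_nv1 <= 1.5)

# Evaluate both CDFs on a common grid and look at the largest gap between them
x <- seq(0, 3, by = 0.1)
max(abs(Fn(x) - pnorm(x, 1.5, 0.5)))
```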

How to calculate moments of a distribution specified by x, y coordinates

Imagine I have a simple dataframe of x, y coordinates.
dta_example <- data.frame(
x=c(0,1,2,3,4,5),
y=c(0.1, 0.4, 0.5, 0.6, 0.3, 0.1)
)
plot(NULL, xlim=c(0, 6), ylim=c(0,1), xlab="x", ylab="f(x)")
polygon(
x=c(dta_example$x[1], dta_example$x, dta_example$x[length(dta_example$x)]),
y=c(0, dta_example$y, 0),
col="red"
)
points(dta_example, pch=16)
How would I go about using the above to produce an empirical probability distribution that I could then characterise in terms of mean, sd, skewness, kurtosis etc? Thanks,
Jon
I would recommend using approxfun on your data. Also, I would add 0s to your data beforehand:
dta_example <- rbind(c(0,0), dta_example, c(0,0))
First create a function corresponding to your data
f <- approxfun(dta_example$x, dta_example$y)
Compute numerically the $n$-th moment
n <- 3
xmin <- min(dta_example$x)
xmax <- max(dta_example$x)
m <- integrate(function(x) x^n*f(x), lower=xmin, upper=xmax)
m
# 47.1216 with absolute error < 0.002
EDIT: An example with a simple triangular distribution.
dat <- data.frame(x = c(-1, 0, 1), y = c(0, 1, 0))
f <- approxfun(dat$x, dat$y)
Plot of the distribution
plot(f, xlim=c(-2,2), col="red") ; grid()
Check that the integral between -1 and +1 is equal to one
integrate(f, lower=-1, upper=+1)
Compute the mean and the second moment (which equals the variance here, because the mean is 0)
integrate(function(x) x*f(x), lower=-1, upper=+1)
integrate(function(x) x^2*f(x), lower=-1, upper=+1)
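The question also asks for sd, skewness and kurtosis. One way to get them, sketched here for the triangular example, is to normalise the curve so that it integrates to 1 and then compute central moments numerically. The helper names pdf_fn and central_moment are introduced for illustration and are not from the original answer.

```r
dat <- data.frame(x = c(-1, 0, 1), y = c(0, 1, 0))
f <- approxfun(dat$x, dat$y)
lo <- min(dat$x)
hi <- max(dat$x)

# Normalise so the curve is a proper density (area under it equals 1)
area <- integrate(f, lo, hi)$value
pdf_fn <- function(x) f(x) / area

# k-th central moment around mu, by numerical integration
central_moment <- function(k, mu) {
  integrate(function(x) (x - mu)^k * pdf_fn(x), lo, hi)$value
}

mu     <- integrate(function(x) x * pdf_fn(x), lo, hi)$value
sigma2 <- central_moment(2, mu)
skew   <- central_moment(3, mu) / sigma2^1.5
kurt   <- central_moment(4, mu) / sigma2^2   # non-excess kurtosis

c(mean = mu, sd = sqrt(sigma2), skewness = skew, kurtosis = kurt)
```

For this symmetric triangle the exact values are mean 0, variance 1/6, skewness 0 and kurtosis 2.4, so the numerical results can be checked against them. The normalisation step matters for data such as dta_example, where the area under the curve is not 1.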

Plot ROC curve and calculate AUC in R at specific cutoff info

Given such data:
SN = Sensitivity;
SP = Specificity
Cutpoint SN 1-SP
1 0.5 0.1
2 0.7 0.2
3 0.9 0.6
How can I plot the ROC curve and calculate the AUC, and then compare the AUC between two different ROC curves? In most packages, such as pROC or ROCR, the expected input differs from the data shown above. Can anybody suggest a way to solve this problem in R or with another tool?
ROCsdat <- data.frame(cutpoint = c(5, 7, 9), TPR = c(0.56, 0.78, 0.91), FPR = c(0.01, 0.19, 0.58))
## plot version 1
op <- par(xaxs = "i", yaxs = "i")
plot(TPR ~ FPR, data = ROCsdat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(ROCsdat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = ROCsdat, pos = 3, labels = ROCsdat$cutpoint)
abline(0, 1)
par(op)
First off, I would recommend visiting your local library and finding an introductory book on R. It is important to have a solid base before you write your own code, and copy-pasting code found on the internet without really understanding what it means is risky at best.
Regarding your question, I believe the (0,0) and (1,1) coordinates are part of the ROC curve, so I included them in the data:
ROCsdat <- data.frame(cutpoint = c(-Inf, 5, 7, 9, Inf), TPR = c(0, 0.56, 0.78, 0.91, 1), FPR = c(0, 0.01, 0.19, 0.58, 1))
AUC
I strongly recommend against setting up your own trapezoid integration function at this stage of your training in R. It's too error-prone and easy to screw up with a small (syntax) mistake.
Instead, use a well established integration code like the trapz function in pracma:
library(pracma)
trapz(ROCsdat$FPR, ROCsdat$TPR)
Plotting
I think you mostly got the plotting, although I would write it slightly differently:
plot(TPR ~ FPR, data = ROCsdat, xlim = c(0,1), ylim = c(0,1), type="b", pch = 25, bg = "black")
text(TPR ~ FPR, data = ROCsdat, pos = 3, labels = ROCsdat$cutpoint)
abline(0, 1, col="lightgrey")
Comparison
For the comparison, let's say you have two AUCs in auc1 and auc2. The if/else syntax looks like this:
if (auc1 < auc2) {
cat("auc1 < auc2!\n")
} else if (auc1 == auc2) {
cat("aucs are identical!\n")
} else {
cat("auc1 > auc2!\n")
}
I suppose you could just compute it manually:
dat <- data.frame(tpr=c(0, .5, .7, .9, 1), fpr=c(0, .1, .2, .6, 1))
sum(diff(dat$fpr) * (dat$tpr[-1] + dat$tpr[-length(dat$tpr)]) / 2)
# [1] 0.785
You need to have the tpr and fpr vectors begin with 0 and end with 1 to compute the AUC properly.
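The manual calculation can be wrapped in a small function so that two curves are handled the same way. This is a sketch under assumptions: auc_trapezoid is a hypothetical helper name, it adds the (0,0) and (1,1) end points itself, and the two inputs are the cutpoint table from the question and the three interior points used in the other answer. Comparing the two numbers this way is purely numerical, not a statistical test.

```r
# Trapezoid-rule AUC from interior ROC points (hypothetical helper).
# The end points (0,0) and (1,1) are added inside the function.
auc_trapezoid <- function(fpr, tpr) {
  o <- order(fpr, tpr)          # sort points from left to right
  fpr <- c(0, fpr[o], 1)
  tpr <- c(0, tpr[o], 1)
  # width of each strip times the average height of its two sides
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Curve 1: the cutpoint table from the question
auc1 <- auc_trapezoid(fpr = c(0.1, 0.2, 0.6), tpr = c(0.5, 0.7, 0.9))
# Curve 2: the interior points used in the other answer
auc2 <- auc_trapezoid(fpr = c(0.01, 0.19, 0.58), tpr = c(0.56, 0.78, 0.91))

c(auc1 = auc1, auc2 = auc2)

# For an equality check on floating-point values, all.equal() is safer than ==
isTRUE(all.equal(auc1, auc2))
```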
