Indicating the statistically significant difference in bar graph base R - r

This has been asked before in this post: Indicating the statistically significant difference in bar graph USING R. However, they wanted to know how to do this using ggplot2. I was wondering how you do this using just the base package or function barplot().
I want something that looks like this image below:
http://i.stack.imgur.com/3I6El.jpg
my current code:
barcenter3<- barplot(newMEANs3$Percent_Viability, names.arg=c("Control", "Cyp28d1", "A3", "A4"), ylab = "Average Emergent", ylim=c(0, 1.1), xlab= "RNAi Line", main = "Trip Nicotine UAS-RNAi Emergents")
segments(barcenter3, newMEANs3$Percent_Viability-newSDs3$Percent_Viability, barcenter3, newMEANs3$Percent_Viability+newSDs3$Percent_Viability, lwd=1);
segments(barcenter3 - 0.1, newMEANs3$Percent_Viability-newSDs3$Percent_Viability, barcenter3 + 0.1, newMEANs3$Percent_Viability-newSDs3$Percent_Viability, lwd=1);
segments(barcenter3 - 0.1, newMEANs3$Percent_Viability+newSDs3$Percent_Viability, barcenter3 + 0.1, newMEANs3$Percent_Viability+newSDs3$Percent_Viability, lwd=1);
dev.off();
I want to add p-values comparing the contrasts.

Here is a simple function to do this.
## Sample Data
means <- seq(10,40,10)
pvals <- seq(0.01, 0.05, 0.02)
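barPs <- function(means, pvals, offset=1, ...) {
  n <- length(means)
  ## barplot() returns the x midpoints of the bars
  breaks <- barplot(means, ylim=c(0, max(means)+3*offset), ...)
  ## Bracket height: just above the taller bar of each adjacent pair
  ylims <- pmax(means[-n], means[-1]) + offset
  ## Horizontal line joining each pair of adjacent bars
  segments(x0=breaks[-n], y0=ylims, x1=breaks[-1], y1=ylims)
  ## Short vertical ticks at the left and right end of each bracket
  segments(x0=c(breaks[-n], breaks[-1]),
           y0=rep(ylims, 2), y1=rep(ylims-offset/2, 2))
  ## P-value label centred above each bracket
  text(breaks[-n]+diff(breaks)/2, ylims+offset,
       labels=paste0("p = ", pvals))
}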
barPs(means, pvals, offset=1, main="Bar w/ P-value",
names.arg=toupper(letters[1:4]))
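For the plot in the question, which also draws error bars, the brackets need to clear the top of each error bar rather than the bar itself, and many figures show stars instead of raw p-values. A minimal sketch of both ideas follows. The helpers `bracket_heights()` and `p_to_stars()` are hypothetical names, the means, SDs and p-values are made up, and the star cut-offs (0.05, 0.01, 0.001) are only a common convention, not something taken from the question.

```r
## Hypothetical helper: height of each bracket between adjacent bars,
## placed `offset` above the taller of the two error-bar tops.
bracket_heights <- function(means, sds, offset) {
  tops <- means + sds
  n <- length(tops)
  pmax(tops[-n], tops[-1]) + offset
}

## Hypothetical helper: turn p-values into star labels
## (cut-offs are a convention, adjust as needed).
p_to_stars <- function(p) {
  as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", "ns"),
                   include.lowest = TRUE))
}

## Made-up data standing in for newMEANs3 / newSDs3
means  <- c(0.90, 0.40, 0.70, 0.65)
sds    <- c(0.05, 0.08, 0.06, 0.07)
pvals  <- c(0.0004, 0.02, 0.60)   # one per adjacent pair of bars
offset <- 0.05

mids <- barplot(means, names.arg = c("Control", "Cyp28d1", "A3", "A4"),
                ylim = c(0, max(means + sds) + 4 * offset))

## Error bars drawn with flat-headed arrows
arrows(mids, means - sds, mids, means + sds,
       angle = 90, code = 3, length = 0.05)

## Brackets between adjacent bars, with star labels just above them
n <- length(means)
h <- bracket_heights(means, sds, offset)
segments(mids[-n], h, mids[-1], h)
text((mids[-n] + mids[-1]) / 2, h, labels = p_to_stars(pvals), pos = 3)
```

Comparing non-adjacent bars would need brackets stacked at different heights, which this sketch does not attempt.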

Related

Adding observations as proportions on a horizontal barplot in R using text() function

I cannot figure out how to get the percentage of responses at the end of the bars. I know I'm missing something within the text() function, just not sure what exactly I'm missing. Thank you!
#Training/Specialty Barplot
trainbarplot <- barplot(table(PSR$training), horiz = TRUE,
main="Respondent Distribution of Training", cex.main = 1.1, font.main = 2,
cex.lab = 0.8, cex.names = 0.4, font.axis = 4, las = 2,
xlab="Response Frequency", xlim=c(0, 40), cex.axis = 0.8,
border="black",
col=rgb (0.1, 0.1, 0.4, 0.5, 0.6),
density=c(50,40,30) , angle=c(9,11,36)
)
text(trainbarplot, table(PSR$training) - 3,
labels=paste(round(proportions(table(PSR$training))*100, 0), "%"))
Generate data
I generated some sample data to replicate your problem. Please note that you should always try to provide an example dataset :)
set.seed(123)
df1 <- data.frame(x = rnorm(10, mean=10, sd=2), y = LETTERS[1:10])
Plot the data
Here's a plot that follows the same structure as your code:
bp <- barplot(df1$x, names.arg = df1$y, horiz = TRUE)  # bp holds the bar midpoints on the y axis
text(x = df1$x + 0.5, y = bp, labels = paste0(round(df1$x), "%"), xpd = TRUE)
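The key point in the call above is that with horiz = TRUE the value returned by barplot() holds the bar midpoints along the y axis, so it goes in the y argument of text() and the bar lengths go in x. A minimal sketch with a table of counts, closer to the question's setup, follows. The `training` vector is made up, and prop.table() is used so the sketch also runs on older R versions that lack proportions().

```r
## Made-up responses standing in for PSR$training
training <- c("Clinical", "Clinical", "Research",
              "Clinical", "Teaching", "Research")

counts <- table(training)
pct <- paste0(round(100 * prop.table(counts)), "%")

## Leave some room on the right for the labels
mids <- barplot(counts, horiz = TRUE, las = 1,
                xlim = c(0, max(counts) * 1.2),
                xlab = "Response Frequency")

## x = bar length, y = bar midpoint; pos = 4 puts the label to the right
text(x = as.vector(counts), y = mids, labels = pct, pos = 4, xpd = TRUE)
```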
Using ggplot2
You can also plot your data using ggplot2. For instance, you could first create a new column in your dataset with information on the labels...
df1$perc <- paste0(round(df1$x),"%")
Next, you can plot your data using ggplot and adding different relevant layers.
library(ggplot2)
ggplot(df1, aes(x = x, y = y)) +
geom_col() +
geom_text(aes(label = perc)) +
theme_minimal()
Good luck!

Plotting single points and their range

I am trying to plot some data points from a matrix together with their standard deviations, but I am having trouble plotting the latter.
My tools are:
a matrix with the data points to plot at a x coordinate within a properly xlim-defined x-axis;
a vector of as many y arbitrary coordinates for the plotting height, just not making them overlap;
a vector of lengths of the standard deviation lines, to be displayed horizontally around the data points.
Yeah, eventually it'll look like a flying saucer invasion.
I can easily plot the points at the given height, one by one - it is the way I want to do it.
Trouble comes in adding the standard deviation horizontal lines for each point.
Does anyone have an idea how to do it?
x<-matrix(c(1:4,NA,NA,10:16), nrow=4, ncol=4)
y<-seq(0.001,0.006, 0.001)
std.dev<-c(runif(7, 0.1, 0.5), NA, NA, runif(7, 0.1, 0.5))
plot(0,0, xlim=c(min = 0, max(x), na.rm=T)+0.001), ylim = c(0,0.016), type = "n", xlab = "My x", yaxt = "n", ylab ="")
points(x = x[1,2], y = y[1], pch = 21, bg = "red", col = "red")
When working with base R, it is surprising to find that it provides no built-in support for error bars. You may want to look at other packages for this.
In base R, the workaround is to use the arrows() function and set the arrow head angle to 90 degrees.
Note: I had to change your data definition, as it threw errors. Also have a look at the xlim argument in your plot() call.
I plot the error bars vertically. You can easily adapt this for horizontal bars. I did this for presentation reasons, to avoid overlapping error bars.
Using your full data will make it easier to keep the bars from overlapping.
x <- matrix(c(1:7, NA, NA, 10:16), nrow = 4, ncol = 4)  # adapted so the 16 values fill the 4x4 matrix
y <- seq(0.001, 0.016, 0.001)                           # adapted to the same length (16)
std.dev <- c(runif(7, 0.1, 0.5), NA, NA, runif(7, 0.1, 0.5))
plot(0, 0,
     xlim = c(0, max(x, na.rm = TRUE)),  # had to fix the xlim definition
     ylim = c(-1, 1),                    # changed so the std.dev bars are visible
     type = "n", xlab = "My x", yaxt = "n", ylab = "")
points(x = x, y = y, pch = 21, bg = "red", col = "red")  # pass all of x and y to show every point
# --------------- add arrows with a "flat head" ---------------
arrows(x0 = x, x1 = x,
       y0 = y - std.dev, y1 = y + std.dev,  # centre the deviation on the data point
       code = 3, angle = 90,                # heads at both ends, at 90 degrees, to emulate an error bar
       length = 0.1)
This yields:
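Since the question asked for horizontal bars, the same arrows() trick can be turned on its side by varying x0 and x1 and keeping y fixed. A minimal sketch with made-up points and deviations (the vectors `xs`, `ys` and `sdev` are illustrative, not the asker's data):

```r
set.seed(1)
xs   <- c(2, 5, 9, 12)                 # made-up x positions
ys   <- seq_along(xs)                  # arbitrary heights so the bars do not overlap
sdev <- runif(length(xs), 0.5, 1.5)    # made-up standard deviations

plot(xs, ys,
     xlim = range(c(xs - sdev, xs + sdev)),  # make room for the bars
     pch = 21, bg = "red", col = "red",
     xlab = "My x", yaxt = "n", ylab = "")

## Horizontal error bars: same y at both ends, x shifted by the deviation
arrows(x0 = xs - sdev, y0 = ys, x1 = xs + sdev, y1 = ys,
       code = 3, angle = 90, length = 0.05)
```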

R - colouring scatterplot points

Hi there, I'm wondering why my code below makes the legend coloured, but the dots themselves are not:
# dataset <- data.frame(IDName, Value, Setpoints)
# dataset <- unique(dataset)
# Paste or type your script code here:
dat <- aggregate(Value ~ Setpoints + IDName, dataset, mean)
x <- dat$Value
y <- dat$Setpoints
z <- dataset$IDName
plot(x,y, main ="Turbidity Frequency Distribution",xlab="% Time < Turbidity level", ylab="Turbidity (NTU)")
lines(spline(x,y))
palette()
legend('topleft', legend = unique(z), col = 1:3, cex = 0.8, pch = 1)
#constant lines
abline(h=c(0.1,0.15,0.3), col=c("red","pink","purple"), lty=2, lwd=3)
Make sure that z is a factor, and take it from the aggregated data (dat) so that it has one entry per plotted point. Then use col = z when you create the plot, and you will get colored points.
In your legend, set legend (the character values to appear in the legend) to the levels of your factor z. In addition, set col to the matching level codes, seq_along(levels(z)), so that the legend colors match your points.
Here is the complete example. In the future, instead of putting data in a comment, please edit your question with the data. Also, you may want to consider ggplot2 for future plotting.
dat <- aggregate(Value ~ Setpoints + IDName, dataset, mean)
x <- dat$Value
y <- dat$Setpoints
z <- factor(dat$IDName)  # one entry per aggregated row
plot(x, y,
     main = "Turbidity Frequency Distribution",
     xlab = "% Time < Turbidity level",
     ylab = "Turbidity (NTU)",
     col = z)            # factor codes index into palette()
lines(spline(x, y))
palette()                # prints the current palette for reference
legend('topleft',
       legend = levels(z),
       col = seq_along(levels(z)),
       cex = 0.8,
       pch = 1)
#constant lines
abline(h=c(0.1,0.15,0.3), col=c("red","pink","purple"), lty=2, lwd=3)
Plot
Data
dataset <- structure(list(IDName = c("Filter01", "Filter01", "Filter01",
"Filter01", "Filter01", "Filter02", "Filter02", "Filter02", "Filter02",
"Filter02"), Setpoints = c(0.16, 0.2, 0.3, 2, 2.2, 0.16, 0.2,
0.3, 2, 2.2), Value = c(96.1, 96.2, 96.428, 99.603, 99.6, 98.8,
98.9, 99.049, 99.194, 99.2)), class = "data.frame", row.names = c(NA,
-10L))
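A small sketch of how a factor maps onto colours may help when adapting this. Base graphics use the integer codes of a factor as indices into palette(), and levels() is sorted while unique() follows the order of first appearance, so the two can disagree. The vector `z` and the coordinates below are made up to show that case.

```r
## Made-up grouping in which the second level appears first
z <- factor(c("Filter02", "Filter01", "Filter02", "Filter01"))

levels(z)                             # sorted: "Filter01" "Filter02"
codes      <- as.integer(z)           # 2 1 2 1, the indices used when col = z
first_seen <- as.integer(unique(z))   # 2 1, the order of first appearance

## Hypothetical coordinates, only to draw something
plot(1:4, c(3, 1, 4, 2), col = z, pch = 19)

## Pairing levels(z) with seq_along(levels(z)) keeps legend and points aligned
legend("topleft", legend = levels(z), col = seq_along(levels(z)), pch = 19)
```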

How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The +/- 1:3 sigma will be labeled as such, and the mean will be labeled as 0 - indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
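The variants above rely on a standard normal, where the tick positions -3:3 happen to coincide with whole standard deviations. For a general mean and standard deviation, the same axis() call can be driven by computed tick positions. A minimal sketch, with mu = 100 and sigma = 15 as made-up values:

```r
mu    <- 100   # hypothetical mean
sigma <- 15    # hypothetical standard deviation

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)

## Tick positions at the mean and at 1, 2 and 3 standard deviations either side
ticks <- mu + (-3:3) * sigma

## Suppress the default x axis, then draw one with the computed ticks
plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "Density")
axis(1, at = ticks, labels = ticks)
```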
If you prefer the hard way, without R's built-in density function, or you want to do this outside R, you can use the formula for the normal density directly.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
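The expression for y above is the normal density, f(x) = 1/(s sqrt(2 pi)) exp(-(x - mu)^2 / (2 s^2)). A quick sanity check is to compare it with dnorm() on the same grid; the names `y_manual` and `y_builtin` are only for illustration.

```r
x  <- seq(-4, 4, length = 200)
s  <- 1
mu <- 0

y_manual  <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
y_builtin <- dnorm(x, mean = mu, sd = s)

## TRUE if the two agree up to floating-point tolerance
all.equal(y_manual, y_builtin)
```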
An extremely inefficient and unusual, but beautiful, solution based on the idea of Monte Carlo simulation is this:
Simulate many draws (or samples) from a given distribution (say the normal) using rnorm. The rnorm function takes arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
Plot the density of these draws.
Thus, to take a sample of size 50,000 from a standard normal (i.e., a normal with mean 0 and standard deviation 1) and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity this will converge in distribution to the normal. To illustrate this, see the image below which shows from left to right and top to bottom 5000,50000,500000, and 5 million samples.
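One way to see how close the simulated density gets is to overlay the exact curve on it. A minimal sketch, assuming a fixed seed purely for reproducibility (the seed value and the name `draws` are arbitrary choices):

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
draws <- rnorm(50000, 0, 1)

## Kernel density estimate of the simulated draws
plot(density(draws), lwd = 2, main = "Simulated vs. exact normal density")

## Exact standard normal density on top, dashed
curve(dnorm(x), add = TRUE, lty = 2, col = "red")

legend("topright", legend = c("simulated", "dnorm"),
       lty = c(1, 2), col = c("black", "red"))
```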
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this. It makes it easy to add graphical information such as a specific area under the curve, which is what you usually need in probability problems such as finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can easily change these parameters to suit your problem.
Axis labels are automatic.
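The shaded region corresponds to a probability that pnorm() can compute directly, and the same shading can be done in base graphics with polygon(). A minimal sketch, assuming the same limits 1 and 1.5 as in the lattice example (the names `a`, `b` and `p` are illustrative):

```r
a <- 1
b <- 1.5
p <- pnorm(b) - pnorm(a)   # P(a < Z < b) for a standard normal, roughly 0.092

x <- seq(-4, 4, length.out = 1000)
y <- dnorm(x)

plot(x, y, type = "l",
     main = sprintf("P(%g < Z < %g) = %.4f", a, b, p))

## Shade the area under the curve between a and b
in_range <- x >= a & x <= b
polygon(c(a, x[in_range], b), c(0, y[in_range], 0), col = "red", border = NA)
abline(v = c(a, b), lty = 2)
```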
This is how to write it as a function:
normalCriticalTest <- function(mu, s) {
  x <- seq(-4, 4, length=200) # x extends from -4 to 4
  # y follows the formula of the normal density f(x)
  y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
  plot(x,y, type="l", lwd=2, xlim = c(-3.5,3.5))
  # vertical lines at -1.96 and 1.96, which leave 2.5% of the area in each tail of the standard normal
  abline(v = c(-1.96, 1.96), col="red")
}
normalCriticalTest(0, 1) # draw a standard normal distribution with the vertical lines
Final result:
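The fixed values -1.96 and 1.96 and the fixed x range only suit the standard normal. A hedged variant that derives both from the parameters with qnorm() could look like this; the function name `normal_critical_plot` and the `alpha` argument are hypothetical additions, not part of the answer above.

```r
## Hypothetical generalisation: critical values follow mu, s and alpha
normal_critical_plot <- function(mu = 0, s = 1, alpha = 0.05) {
  ## Two-sided critical values leaving alpha/2 in each tail
  crit <- mu + qnorm(c(alpha / 2, 1 - alpha / 2)) * s
  x <- seq(mu - 4 * s, mu + 4 * s, length.out = 200)
  plot(x, dnorm(x, mean = mu, sd = s), type = "l", lwd = 2, ylab = "Density")
  abline(v = crit, col = "red")
  invisible(crit)   # return the critical values without printing them
}

crit <- normal_critical_plot(0, 1)   # standard normal: close to -1.96 and 1.96
```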

Howto Plot ROC curve in R with only known SN/PPV/Cutoff info

Given such data:
#Cutpoint SN (1-PPV)
5 0.56 0.01
7 0.78 0.19
9 0.91 0.58
How can I plot a ROC curve with R that produces a result similar to the attached image?
I know the ROCR package, but it doesn't take such input.
If you just want to create the plot (without that silly interpolation spline between points) then just plot the data you give in the standard way, prepending a point at (0,0) and appending one at (1,1) to give the end points of the curve.
## your data with different labels
dat <- data.frame(cutpoint = c(5, 7, 9),
TPR = c(0.56, 0.78, 0.91),
FPR = c(0.01, 0.19, 0.58))
## plot version 1
op <- par(xaxs = "i", yaxs = "i")
plot(TPR ~ FPR, data = dat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(dat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = dat, pos = 3, labels = dat$cutpoint)
abline(0, 1)
par(op)
To explain the code: the first plot() call sets up the plotting region without doing any plotting at all. Note that I force the plot to cover the range (0,1) on both axes. The par() call tells R to draw axes that cover exactly that range; the default extends them by 4 percent of the range at each end.
The next line, with(dat, lines(....)), draws the ROC curve, and here we prepend and append the points at (0,0) and (1,1) to give the full curve. I use type = "o" to give both points and lines overplotted; the points are drawn with plotting character 25, which can be filled with a colour, here black.
Then I add labels to the points using text(....); the pos argument is used to position the label away from the actual plotting coordinates. I take the labels from the cutpoint object in the data frame.
The abline() call draws the 1:1 line (here the 0 and 1 mean an intercept of 0 and a slope of 1, respectively).
The final line resets the plotting parameters to the defaults we saved in op prior to plotting (in the first line).
The resulting plot looks like this:
It isn't an exact facsimile, and I prefer the plot that uses the default axis ranges (extended by 4 percent):
plot(TPR ~ FPR, data = dat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(dat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = dat, pos = 3, labels = dat$cutpoint)
abline(0, 1)
Again, not a true facsimile but close.
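If a single summary number is wanted from the same three points, the area under the piecewise-linear curve can be computed with the trapezoid rule. This is only a sketch under the same relabelling used above (the third column treated as FPR), and with so few cutpoints the value is a rough approximation at best.

```r
dat <- data.frame(cutpoint = c(5, 7, 9),
                  TPR = c(0.56, 0.78, 0.91),
                  FPR = c(0.01, 0.19, 0.58))

## Add the end points (0,0) and (1,1), as in the plot
fpr <- c(0, dat$FPR, 1)
tpr <- c(0, dat$TPR, 1)

## Trapezoid rule: width of each step times the mean height of its two ends
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```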

Resources