I would like to place asterisks in my grouped barplot (base R) to indicate where the paired comparisons differ significantly. I know how to place these stars using the points command. However, from the posts that I have read so far it seems that one needs to find the right coordinates manually (e.g., group I: x=0.635, y=26, see the code below). This would take quite some time if one needs to work that out for all significant pairs.
So my question is: Is there an easier way to find the coordinates that correspond to the middle of each pair of bars, just beyond the bar ends? I would prefer to do this in the base plotting system at the moment but ggplot answers are also welcome. Thank you very much in advance!
Data example
set.seed(123)
dat<-matrix(runif(32, min = 0.5, max = 1), nrow=2, ncol=16)
colnames(dat)<-c(LETTERS[1:16])
par(mar=c(2,4,2,2))
mp<-barplot(dat, col=c("blue","red"), beside=TRUE, horiz=TRUE, xpd=FALSE, axes=FALSE, axisnames=TRUE, cex.names=0.8, las=2, xlim=c(0.5,1.0), main="Data Example")
axis(1, at=seq(0.5,1.0, by=0.1))
axis(2, at=mp, labels=FALSE, tick=FALSE)
points(x=0.635, y=26, pch="*", cex=2) #sign position at I
Let's say you have a vector telling you which pairs are significant. For example:
sign <- rep(TRUE, 16) ; sign[c(5, 7, 13:14)] <- FALSE
you already know the y coordinates of the letters:
colMeans(mp)
so you can define the y coordinates of the asterisks:
ord_sign <- colMeans(mp)[sign]
For the x coordinates, you can place them for example 0.01 point to the right from the max value:
abs_sign <- apply(dat, 2, max)[sign] + 0.01
Then you can draw all your asterisks at once:
points(x=abs_sign, y=ord_sign, pch="*", cex=2)
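The steps above can be wrapped into a small helper. This is only a sketch: star_coords and sig are hypothetical names, and the significance flags are the same made-up ones used above rather than the result of a real test.

```r
# Hypothetical helper: coordinates for stars placed just beyond the longer
# bar of each significant pair in a horizontal grouped barplot.
star_coords <- function(mp, dat, sig, offset = 0.01) {
  data.frame(x = apply(dat, 2, max)[sig] + offset,  # right of the longer bar
             y = colMeans(mp)[sig])                 # middle of each pair
}

set.seed(123)
dat <- matrix(runif(32, min = 0.5, max = 1), nrow = 2, ncol = 16)
colnames(dat) <- LETTERS[1:16]
sig <- rep(TRUE, 16); sig[c(5, 7, 13:14)] <- FALSE  # made-up flags

# barplot() returns the bar midpoints, one column per group
mp <- barplot(dat, col = c("blue", "red"), beside = TRUE, horiz = TRUE,
              xpd = FALSE, xlim = c(0.5, 1.0))
stars <- star_coords(mp, dat, sig)
points(stars$x, stars$y, pch = "*", cex = 2)
```

Naming the flag vector sig rather than sign avoids masking the base function sign().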
I'm really new to R and I'm looking to create a graph similar to the one attached. I have tried to create a density plot using both ggplot and the base program.
I have used code ggplot(data, aes(x = Freq)) + geom_density() but the output is incorrect. I'm getting a spike at each number point rather than an overall curve. Every row is one data point of between 1 to 7 and the frequency distributions for one trait is as follows:
1: 500, 2: 550, 3: 700, 4: 1000, 5: 900, 6: 835, 7: 550
As such I have 5035 rows as one row equates to one score.
Any help is much appreciated.
Here is what I wish the plot would look like. (Note I'll add other traits at a later stage, I just wish to add one line at the moment).
there are a few things going on here, first is generating summary statistics of the data. you just need to call mean and sd in the appropriate way to get mean and standard deviation from your data. you've not shown your data so it would be difficult to suggest much here.
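as a minimal sketch, assuming the counts quoted in the question (one row per score), the raw scores can be rebuilt with rep and then summarised. counts and scores are illustrative names, not anything from the asker's data:

```r
# counts per score level, copied from the question
counts <- c(500, 550, 700, 1000, 900, 835, 550)
scores <- rep(1:7, times = counts)   # one element per row of the raw data

mu    <- mean(scores)   # mean of the trait
sigma <- sd(scores)     # standard deviation of the trait
length(scores)          # should match the number of rows in the raw data
```

with real data the rep step is unnecessary; mean and sd would be called on the score column directly.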
as far as plotting these summary statistics, you can replicate the plot from the original paper easily, but it's pretty bad and I'd suggest you not do that. stronger lines imply more importance, the need to double label everything, mislabelling the y-axis, all of that on top of drawing nice smooth parametric curves gives a false impression of confidence. I've only scanned the paper, but that sort of data is crying out for a multi-level model of some sort
I prefer "base" graphics, ggplot is great for exploratory graphics but if you have hard constraints on what a plot should look like it tends to get in the way. We start with the summary statistics:
df <- read.csv(text="
title, mu, sigma,label, label_x,label_pos
Extraversion, 4.0, 1.08,Extra, 3.85,3
Agreeableness, 5.0, 0.77,Agree, 5.0, 3
Conscientiousness, 4.7, 0.97,Cons, 3.4, 2
Emotional stability,5.3, 0.84,Emot stab,5.9, 4
Intellect, 3.7, 0.86,Intellect,3.7, 3
")
I've just pulled numbers out of the paper here, you'd have to calculate them. the mu column is the mean of the variable, and sigma is the standard deviation. label_x and label_pos are used to draw labels so need to be manually chosen (or the plot can be annotated afterwards in something like Inkscape). label_x is the x-axis position, and label_pos stands for where it is in relation to the x-y point (see text for info about the pos parameter)
next we calculate a couple of things:
lwds <- 1 + seq(3, 1, len=5) ^ 2
label_y <- dnorm(df$label_x, df$mu, df$sigma)
i.e. line widths and label y positions, and we can start to make the plot:
# start by setting up plot nicely and setting plot limits
par(bty='l', mar=c(3, 3, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(0, 0.56), yaxs='i')
# loop over data drawing curves
for (i in seq_len(nrow(df))) {
  curve(dnorm(x, df$mu[[i]], df$sigma[[i]]), add=TRUE, n=151, lwd=lwds[[i]])
}
# draw labels
text(df$label_x, label_y, df$label, pos=df$label_pos)
# draw axes
axis(1, lwd=0, lwd.ticks=1)
axis(2, lwd=0, lwd.ticks=1)
box(lwd=1)
# finally, title and legend
title(xlab='Level of state', ylab='Probability density')
legend('topleft', legend=df$title, lwd=lwds, bty='n', cex=0.85)
this gives us something like:
I've also gone with more modern capitalisation, and started the y-axis at zero as these are probabilities so can't be negative
My preferences would be for something closer to this:
the thin lines cover 2 standard deviations (i.e. roughly 95% intervals) around the mean, thick lines 1 SD (about 68%), and the point is the mean. it's much easier to discriminate each measure and compare across them, and it doesn't artificially make "extraversion" more prominent. the code for this is similar:
par(bty='l', mar=c(3, 8, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(5.3, 0.7))
# draw quantiles
for (i in 1:nrow(df)) {
lines(df$mu[[i]] + df$sigma[[i]] * c(-1, 1), rep(i,2), lwd=3)
lines(df$mu[[i]] + df$sigma[[i]] * c(-2, 2), rep(i,2), lwd=1)
}
# and means
points(df$mu, 1:5, pch=20)
axis(1, lwd=0, lwd.ticks=1)
axis(2, at=1:5, labels=df$title, lwd=0, lwd.ticks=1, las=1)
box()
title(xlab='Level of state')
I am trying to do the following:
plot a time series in R using a polygonal line
plot one or more horizontal lines superimposed
find the intersections of said line with the horizontal ones
I got this far:
set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
p <- plot(c1, type = 'l')
# set thresholds
thresholds <- c(0.7, 0.77)
I can find no way to access the segment line object plotted by R. I really really really would like to do this with base graphics, while realizing that probably there's a ggplot2 concoction out there that would work. Any idea?
abline(h=thresholds, lwd=1, lty=3, col="dark grey")
I will just do one threshold. You can loop through the list to get all of them.
First find the points, x, so that the curve crosses the threshold between x and x+1
shift = (c1 - 0.7)
Lower = which(shift[-1]*shift[-length(shift)] < 0)
Find the actual points of crossing, by finding the roots of Series - 0.7 and plot
shiftedF = approxfun(1:length(c1), c1-0.7)
Intersections = sapply(Lower, function(x) { uniroot(shiftedF, x:(x+1))$root })
points(Intersections, rep(0.7, length(Intersections)), pch=16, col="red")
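To handle every threshold, the same idea can be put in a function and looped. The sketch below swaps uniroot for the closed-form linear interpolation between neighbouring points, which gives the same crossing for a polygonal line. find_crossings is a hypothetical name, and the x positions are observation indices, which match the time axis only for a series that starts at 1 with frequency 1, as in the question.

```r
# Hypothetical helper: x positions where a polygonal line through
# (1, y[1]), (2, y[2]), ... crosses a horizontal threshold.
find_crossings <- function(y, threshold) {
  d <- as.numeric(y) - threshold
  n <- length(d)
  lower <- which(d[-1] * d[-n] < 0)   # sign change between i and i + 1
  # fraction of the way from i to i + 1 at which the segment hits zero
  lower + d[lower] / (d[lower] - d[lower + 1])
}

set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
thresholds <- c(0.7, 0.77)

plot(c1, type = "l")
abline(h = thresholds, lwd = 1, lty = 3, col = "dark grey")
for (th in thresholds) {
  xs <- find_crossings(c1, th)
  points(xs, rep(th, length(xs)), pch = 16, col = "red")
}
```

Points that lie exactly on a threshold are not reported, because the product test uses a strict inequality, just as in the code above.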
I am trying to plot some data over years with two y-axes in R. However, whenever I try to include a legend, the legend dominates my plot. When I use solutions suggested elsewhere, like keyword and/or the cex argument suggested in another post here, it either becomes unreadable or is still too big.
Here is my example with randomly generated data:
#Create years
year.df <- seq(1974, 2014, 1)
# Create y-axis data
set.seed(75)
mean1 <- rnorm(length(year.df), 52.49, 0.87)
mean2 <- rnorm(length(year.df), 52.47, 0.96)
#Create dataframe
df <- data.frame(cbind(year.df, mean1, mean2))
I want a second y-axis, the difference of the two means over the years
df$diff <- abs(df$mean1 - df$mean2)
When I plot using the code below to create two y-axes:
par(mfrow=c(1,1), mar=c(5.1,4.1,4.1,5.1))
with(df, plot(year.df, mean1, type = "l", lwd=4, xlab="Year", ylab="Mean", ylim=c(48,58)))
with(df, lines(year.df, mean2, type = "l", col="green", lwd=4))
par(new=TRUE)
with(df, plot(year.df, diff, type="l", axes=FALSE, xlab=NA, ylab=NA, col="red", lty=5, ylim=c(0,10)))
axis(side = 4)
mtext(side = 4, line = 3, "Annual Difference")
legend("topleft",
legend=c("Calculated", "MST", "Diff"),
lty=c(1,1,5), col=c("black", "green", "red"))
I get:
When I use the cex=0.5 argument in the legend(), it starts to become unreadable:
Is there a way to format my legend in a clear, readable manner? Better than what I have?
The white space in the legend tells me that you manually widened your plot window. Legends do not scale well when it comes to manual re-sizing.
The solution is opening a plot of the exact size you need before plotting. In Windows, this is done with windows(width=10, height=8). Units are in inches.
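Outside Windows, the same idea should work with any device that takes a size in inches. The sketch below writes to a PDF of fixed size; the output path is a hypothetical temporary file, and the plotted data are placeholders rather than the data from the question. For an on-screen window, dev.new(width=10, height=8) is the portable call, although some front ends may ignore the requested size.

```r
out <- file.path(tempdir(), "legend_demo.pdf")   # hypothetical output file

pdf(out, width = 10, height = 8)                 # device size in inches
plot(1974:2014, seq(50, 56, length.out = 41), type = "l", lwd = 4,
     xlab = "Year", ylab = "Mean", ylim = c(48, 58))   # placeholder data
legend("topleft",
       legend = c("Calculated", "MST", "Diff"),
       lty = c(1, 1, 5), col = c("black", "green", "red"))
dev.off()                                        # close the device to write the file
```

Because the device size is fixed before anything is drawn, the legend is laid out once for that size and is never rescaled.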
As you can see below, the legend sits tightly in the corner.
Apparently, I forgot to do the first step of troubleshooting: turn it off and on again. I woke up this morning and ran the script again. Even with cex = 0.5 it turned out fine. I chose to use cex = 0.75. I would still appreciate any help in understanding why that might be. I spent many hours yesterday trying to fix my legend, and now the same code works and produces this result (cex=0.75):
How can I rotate the X axis labels 45 degrees on a grouped bar plot in R?
I have tried the solution suggested here but got something very messy, the labels seem to have been added multiple times (only showing the axis part to protect data privacy):
This solution (gridBase) was also unsuccessful for me, for some reason I get the following error:
"Cannot pop the top-level viewport (grid and graphics output mixed?)"
PS.
Most people seem to recommend this solution in R base but I am stuck with that too because I don't understand what data they are referring to (I need some kind of example data set to understand new command lines...).
Are these solutions not working because my barplot is a grouped barplot? Or should it work nevertheless? Any suggestions are welcome, I have been stuck for quite some time. Thank you.
[edit] On request I am adding the code that I used to generate the picture above (based on one of the text() solutions):
data <- #this is a matrix with 4 columns and 20 rows;
#colnames and rownames are specified.
#the barplot data is grouped by rows
lablist <- as.vector(colnames(data))
barplot(data, beside=TRUE, col=c("darkred","red","grey20","grey40"))
text(1:100, par("usr")[1], labels=lablist, srt=45, pos=1, xpd=TRUE)
I am not proficient in base plotting, so maybe my solution is not the simplest one. I think that using ggplot2 is better here.
def.par <- par(no.readonly = TRUE)
## divide device into two rows and 1 column
## allocate figure 1 for barplot
## allocate figure 2 for barplot labels
## respect relations between widths and heights
nf <- layout(matrix(c(1,1,2,2),2,2,byrow = TRUE), c(1,3), c(3,1), TRUE)
layout.show(nf)
## barplot
par(mar = c(0,1,1,1))
set.seed(1)
nKol <- 8 ## you can change this, but with more than 11 cols
## the result is not really readable
data <- matrix(sample(1:4,nKol*4,rep=TRUE),ncol=nKol)
xx <- barplot(data, beside=TRUE,
col=c("darkred","red","grey20","grey40"))
## labels, create dummy plot for scales
par(mar = c(1,1,0,1))
plot(seq_len(length(xx)),rep(1,length(xx)),type='n',axes=FALSE)
## Create some text labels
labels <- paste("Label", seq_len(ncol(xx)), sep = " ")
## Plot text labels with some rotation at the top of the current figure
text(seq_len(length(xx)), rep(1.4, length(xx)), adj = 1,
     labels = labels, xpd = TRUE, cex = 0.8, srt = 60,
     col = c("darkred","red","grey20","grey40"))
par(def.par) #- reset to default
Try the first answer:
x <- barplot(table(mtcars$cyl), xaxt="n")
labs <- paste(names(table(mtcars$cyl)), "cylinders")
text(cex=1, x=x-.25, y=-1.25, labs, xpd=TRUE, srt=45)
But change cex=1 to cex=.8 or .6 in the text() function:
text(cex=.6, x=x-.25, y=-1.25, labs, xpd=TRUE, srt=45)
In the picture you posted, it appears to me that the labels are just too big. cex sets the size of these labels.
I had the same problem with a grouped bar plot. I assume that you only want one label below each group. I may be wrong about this, since you don't state it explicitly, but this seems to be the case since your labels are repeated in the image. In that case you can use the solution proposed by Stu, although you have to apply colMeans to the x variable when you supply it to the text function. With beside=TRUE, barplot returns a matrix of bar midpoints with one column per group, so colMeans gives the centre of each group:
## grouped example: gears (bars within a group) by cylinders (groups)
x <- barplot(table(mtcars$gear, mtcars$cyl), beside=TRUE, xaxt="n")
labs <- paste(names(table(mtcars$cyl)), "cylinders")
text(cex=1, x=colMeans(x)-.25, y=-1.25, labs, xpd=TRUE, srt=45)
I am using filled.contour() to plot data stored in a matrix. The data is generated by a (highly) non-linear function, hence its distribution is not uniform at all and the range is very large.
Consequently, I have to use the option "levels" to fine tune the plot. However, filled.contour() does not use these custom levels to make an appropriate color key for the heat map, which I find quite surprising.
Here is a simple example of what I mean:
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
filled.contour(x=x,y=y,z=z,color.palette=colorRampPalette(c('green','yellow','red')),levels=c(1:60/3,30,50,150,250,1000,3000))
As you can see, the color key produced with the code above is pretty much useless. I would like to use some sort of projection (perhaps sin(x) or tanh(x)?), so that the upper range is not over-represented in the key (in a linear way).
At this point, I would like to:
1) know if there is something very simple/obvious I am missing, e.g.: an option to make this "key range adapting" automagically;
2) seek suggestions/help on how to do it myself, should the answer to 1) be negative.
Thanks a lot!
PS: I apologize for my English, which is far from perfect. Please let me know if you need me to clarify anything.
I feel your frustration. I never found a way to do this with filled contour, so have usually reverted to using image and then adding my own scale as a separate plot. I wrote the function image.scale to help out with this (link). Below is an example of how you can supply a log-transform to your scale in order to stretch out the small values - then label the scale with the non-log-transformed values as labels:
Example:
source("image.scale.R") # http://menugget.blogspot.de/2011/08/adding-scale-to-image-plot.html
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
pal <- colorRampPalette(c('green','yellow','red'))
breaks <- c(1:60/3,30,50,150,250,1000,3000)
ncolors <- length(breaks)-1
labs <- c(0.5, 1, 3,30,50,150,250,1000,3000)
#x11(width=6, height=6)
layout(matrix(1:2, nrow=1, ncol=2), widths=c(5,1), heights=c(6))
layout.show(2)
par(mar=c(5,5,1,1))
image(x=x,y=y,z=log(z), col=pal(ncolors), breaks=log(breaks))
box()
par(mar=c(5,0,1,4))
image.scale(log(z), col=pal(ncolors), breaks=log(breaks), horiz=FALSE, xlab="", ylab="", xaxt="n", yaxt="n")
axis(4, at=log(labs), labels=labs)
box()
Result:
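If sourcing image.scale.R is not an option, a similar key can be drawn with a second call to image from base graphics. This is a rough sketch of that substitute, not the image.scale function itself: the key is a single column of cells, one per break interval, drawn on the log scale. mids and key_z are illustrative names.

```r
x <- c(20:200/100)
y <- c(20:200/100)
z <- as.matrix(exp(x^2)) %*% exp(y^2)

pal <- colorRampPalette(c("green", "yellow", "red"))
breaks <- c(1:60/3, 30, 50, 150, 250, 1000, 3000)
ncolors <- length(breaks) - 1
labs <- c(0.5, 1, 3, 30, 50, 150, 250, 1000, 3000)

# one value inside each break interval (log scale), so every key cell
# receives the colour of its own interval
lb <- log(breaks)
mids <- (lb[-1] + lb[-length(lb)]) / 2
key_z <- matrix(mids, nrow = 1)

layout(matrix(1:2, nrow = 1, ncol = 2), widths = c(5, 1))
par(mar = c(5, 5, 1, 1))
image(x = x, y = y, z = log(z), col = pal(ncolors), breaks = lb)
box()
par(mar = c(5, 0, 1, 4))
# x and y are cell boundaries here: 2 for the single column, ncolors + 1 rows
image(x = c(0, 1), y = lb, z = key_z, col = pal(ncolors), breaks = lb,
      axes = FALSE, xlab = "", ylab = "")
axis(4, at = log(labs), labels = labs)
box()
```

The tick labels are the untransformed values placed at their log positions, which is the same trick the image.scale example uses.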