Is it possible to create this graph in R?

I'm really new to R and I'm looking to create a graph similar to the one attached. I have tried to create a density plot using both ggplot and the base program.
I have used the code ggplot(data, aes(x = Freq)) + geom_density(), but the output is incorrect: I'm getting a spike at each data point rather than an overall curve. Every row is one data point between 1 and 7, and the frequency distribution for one trait is as follows:
1: 500, 2: 550, 3: 700, 4: 1000, 5: 900, 6: 835, 7: 550
So I have 5035 rows, as one row equates to one score.
Any help is much appreciated.
Here is what I wish the plot would look like. (Note: I'll add other traits at a later stage; I just want to add one line at the moment.)

There are a few things going on here. The first is generating summary statistics of the data: you just need to call mean and sd in the appropriate way to get the mean and standard deviation of your data. You've not shown your data, so it would be difficult to suggest much more here.
As far as plotting these summary statistics goes, you can replicate the plot from the original paper easily, but it's pretty bad and I'd suggest you not do that. Stronger lines imply more importance, everything needs to be labelled twice, and the y-axis is mislabelled; on top of all that, drawing nice smooth parametric curves gives a false impression of confidence. I've only scanned the paper, but that sort of data is crying out for a multi-level model of some sort.
I prefer "base" graphics: ggplot is great for exploratory graphics, but if you have hard constraints on what a plot should look like it tends to get in the way. We start with the summary statistics:
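As a minimal sketch, assuming the scores can be rebuilt from the frequency table in the question (500, 550, 700, 1000, 900, 835 and 550 responses for scores 1 to 7), the mean and standard deviation can be computed either by expanding the table to one value per response or directly from the counts; both routes should agree. The names `k`, `counts` and `scores` are illustrative, not from the original data.

```r
# Scores 1-7 and the number of responses with each score (from the question)
k      <- 1:7
counts <- c(500, 550, 700, 1000, 900, 835, 550)

# Route 1: expand to one element per response, like the 5035-row data set
scores <- rep(k, times = counts)
mu     <- mean(scores)
sigma  <- sd(scores)

# Route 2: work from the counts directly (sd() uses the n - 1 denominator)
n       <- sum(counts)
mu_w    <- sum(k * counts) / n
sigma_w <- sqrt(sum(counts * (k - mu_w)^2) / (n - 1))

# One smooth normal curve for this trait
curve(dnorm(x, mu, sigma), from = 1, to = 7,
      xlab = "Score", ylab = "Probability density")
```

A kernel estimate such as `density(scores)` would be the non-parametric alternative, but with only seven distinct values and the default bandwidth it will likely still show a bump at each integer, which matches the spikes described in the question.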
df <- read.csv(text="
title, mu, sigma,label, label_x,label_pos
Extraversion, 4.0, 1.08,Extra, 3.85,3
Agreeableness, 5.0, 0.77,Agree, 5.0, 3
Conscientiousness, 4.7, 0.97,Cons, 3.4, 2
Emotional stability,5.3, 0.84,Emot stab,5.9, 4
Intellect, 3.7, 0.86,Intellect,3.7, 3
")
I've just pulled the numbers out of the paper here; you'd have to calculate them from your own data. The mu column is the mean of the variable, and sigma is the standard deviation. label_x and label_pos are used to draw labels, so they need to be chosen manually (or the plot can be annotated afterwards in something like Inkscape). label_x is the x-axis position, and label_pos says where the label sits in relation to that x-y point (see ?text for details of the pos parameter).
Next we calculate a couple of things:
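To calculate mu and sigma from raw data instead of copying them from a paper, one option is `tapply()`. This sketch assumes a hypothetical long-format data frame `raw` with one row per response and columns `trait` and `score`; the values are simulated placeholders, not real data. The label columns would still be added by hand.

```r
# Hypothetical long-format data: one row per response (simulated placeholder values)
set.seed(42)
raw <- data.frame(
  trait = rep(c("Extraversion", "Agreeableness"), each = 200),
  score = sample(1:7, 400, replace = TRUE)
)

# Per-trait mean and standard deviation
mu    <- tapply(raw$score, raw$trait, mean)
sigma <- tapply(raw$score, raw$trait, sd)

# Summary table with the same title/mu/sigma columns used for plotting
df <- data.frame(title = names(mu),
                 mu    = as.numeric(mu),
                 sigma = as.numeric(sigma))
df
```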
lwds <- 1 + seq(3, 1, len=5) ^ 2
label_y <- dnorm(df$label_x, df$mu, df$sigma)
These are the line widths and the label y positions. Now we can start to make the plot:
# start by setting up plot nicely and setting plot limits
par(bty='l', mar=c(3, 3, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(0, 0.56), yaxs='i')
# loop over data drawing curves
for (i in 1:nrow(df)) {
curve(dnorm(x, df$mu[[i]], df$sigma[[i]]), add=TRUE, n=151, lwd=lwds[[i]])
}
# draw labels
text(df$label_x, label_y, df$label, pos=df$label_pos)
# draw axes
axis(1, lwd=0, lwd.ticks=1)
axis(2, lwd=0, lwd.ticks=1)
box(lwd=1)
# finally, title and legend
title(xlab='Level of state', ylab='Probability density')
legend('topleft', legend=df$title, lwd=lwds, bty='n', cex=0.85)
This gives us something like:
I've also gone with more modern capitalisation, and started the y-axis at zero, as these are probability densities and so can't be negative.
My preference would be for something closer to this:
The thin lines cover 2 standard deviations (i.e. roughly 95% intervals) around the mean, the thick lines 1 SD (68%), and the point is the mean. It's much easier to discriminate each measure and compare across them, and it doesn't artificially make "extraversion" more prominent. The code for this is similar:
par(bty='l', mar=c(3, 8, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(5.3, 0.7))
# draw quantiles
for (i in 1:nrow(df)) {
lines(df$mu[[i]] + df$sigma[[i]] * c(-1, 1), rep(i,2), lwd=3)
lines(df$mu[[i]] + df$sigma[[i]] * c(-2, 2), rep(i,2), lwd=1)
}
# and means
points(df$mu, 1:5, pch=20)
axis(1, lwd=0, lwd.ticks=1)
axis(2, at=1:5, labels=df$title, lwd=0, lwd.ticks=1, las=1)
box()
title(xlab='Level of state')
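The 68% and 95% figures for the thick and thin lines come from the normal distribution and can be checked with `pnorm()`. Strictly, ±2 SD covers about 95.4%; an exact central 95% interval uses a multiplier of about 1.96 (`qnorm(0.975)`), which could replace the `c(-2, 2)` in the code above if that is preferred. A small check:

```r
# Proportion of a normal distribution within +/- 1 and +/- 2 standard deviations
within_1sd <- pnorm(1) - pnorm(-1)
within_2sd <- pnorm(2) - pnorm(-2)

# Multiplier giving an exact central 95% interval
z95 <- qnorm(0.975)

round(c(within_1sd, within_2sd, z95), 3)
# [1] 0.683 0.954 1.960
```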

Related

Access lines plotted by R using basic plot()

I am trying to do the following:
plot a time series in R using a polygonal line
plot one or more horizontal lines superimposed
find the intersections of said line with the horizontal ones
I got this far:
set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
p <- plot(c1, type = 'l')
# set thresholds
thresholds <- c(0.7, 0.77)
I can find no way to access the segment line object plotted by R. I really really really would like to do this with base graphics, while realizing that probably there's a ggplot2 concoction out there that would work. Any idea?
abline(h=thresholds, lwd=1, lty=3, col="dark grey")
I will just do one threshold. You can loop through the list to get all of them.
First, find the points x such that the curve crosses the threshold between x and x+1:
shift = (c1 - 0.7)
Lower = which(shift[-1]*shift[-length(shift)] < 0)
Then find the actual crossing points by finding the roots of the (linearly interpolated) series minus 0.7, and plot them:
shiftedF = approxfun(1:length(c1), c1-0.7)
Intersections = sapply(Lower, function(x) { uniroot(shiftedF, x:(x+1))$root })
points(Intersections, rep(0.7, length(Intersections)), pch=16, col="red")
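Because `plot(type = "l")` joins the points with straight segments, the crossing inside each segment can also be solved in closed form by linear interpolation, which avoids `uniroot()` and handles every threshold in one loop. This is a swapped-in technique rather than the answer's method, and `crossings()` is a hypothetical helper name. Like the code above, it only detects strict sign changes, so a point lying exactly on a threshold is not reported.

```r
# Hypothetical helper: x positions where the piecewise-linear series y
# (drawn at x = 1, 2, ..., length(y), as for as.ts() with the default start)
# crosses a horizontal line at `level`.
crossings <- function(y, level) {
  d <- as.numeric(y) - level
  i <- which(d[-length(d)] * d[-1] < 0)   # strict sign change between i and i + 1
  i + d[i] / (d[i] - d[i + 1])            # straight-line interpolation within segment i
}

set.seed(34398)
c1 <- as.ts(rbeta(25, 33, 12))
thresholds <- c(0.7, 0.77)

plot(c1, type = "l")
abline(h = thresholds, lty = 3, col = "darkgrey")
for (th in thresholds) {
  xs <- crossings(c1, th)
  points(xs, rep(th, length(xs)), pch = 16, col = "red")
}
```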

How to make the placement of asterisks in a barplot easier?

I would like to place asterisks in my grouped barplot (base R) to indicate where the paired comparisons differ significantly. I know how to place these stars using the points command. However, from the posts that I have read so far, it seems that one needs to find the right coordinates manually (e.g., group I: x=0.635, y=26; see the code below). This would take quite some time if one needs to work that out for all significant pairs.
So my question is: is there an easier way to find the coordinates that correspond to the middle of, and just beside, each pair of bars? I would prefer to do this in the base plotting system at the moment, but ggplot answers are also welcome. Thank you very much in advance!
Data example
set.seed(123)
dat<-matrix(runif(32, min = 0.5, max = 1), nrow=2, ncol=16)
colnames(dat)<-c(LETTERS[1:16])
par(mar=c(2,4,2,2))
mp<-barplot(dat, col=c("blue","red"), beside=TRUE, horiz=TRUE, xpd=FALSE, axes=FALSE, axisnames=TRUE, cex.names=0.8, las=2, xlim=c(0.5,1.0), main="Data Example")
axis(1, at=seq(0.5,1.0, by=0.1))
axis(2, at=mp, labels=FALSE, tick=FALSE)
points(x=0.635, y=26, pch="*", cex=2) #sign position at I
Let's say you have a vector telling you which pairs are significant. For example:
sign <- rep(TRUE, 16) ; sign[c(5, 7, 13:14)] <- FALSE
You already know the y coordinates of the letters:
colMeans(mp)
so you can define the y coordinates of the asterisks:
ord_sign <- colMeans(mp)[sign]
For the x coordinates, you can place them, for example, 0.01 units to the right of the longer bar in each pair:
abs_sign <- apply(dat, 2, max)[sign] + 0.01
Then you can draw all your asterisks at once:
points(x=abs_sign, y=ord_sign, pch="*", cex=2)
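These steps can be wrapped into a small helper. This is a sketch: `star_coords()` is a hypothetical name, and the significance vector is called `sig` here so that base R's `sign()` function is not masked. It also covers vertical barplots by swapping x and y. Calling `barplot()` with `plot = FALSE` returns the bar midpoints without drawing; with the default spacing, the centre of pair I works out to 26, the same value found by hand in the question.

```r
# Hypothetical helper: coordinates for significance stars on a grouped
# (beside = TRUE) barplot, given the midpoints `mp` returned by barplot().
star_coords <- function(dat, mp, sig, offset = 0.01, horiz = TRUE) {
  centre <- colMeans(mp)[sig]                 # middle of each significant pair
  tip    <- apply(dat, 2, max)[sig] + offset  # just past the longer bar of the pair
  if (horiz) list(x = tip, y = centre) else list(x = centre, y = tip)
}

set.seed(123)
dat <- matrix(runif(32, min = 0.5, max = 1), nrow = 2, ncol = 16,
              dimnames = list(NULL, LETTERS[1:16]))
sig <- rep(TRUE, 16)
sig[c(5, 7, 13:14)] <- FALSE                  # named 'sig' so base::sign() is not masked

mp <- barplot(dat, col = c("blue", "red"), beside = TRUE, horiz = TRUE,
              xpd = FALSE, xlim = c(0.5, 1), las = 2, cex.names = 0.8)
pos <- star_coords(dat, mp, sig)
points(pos$x, pos$y, pch = "*", cex = 2)
```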

questions about axes and mtext

I am swimming backwards in my R knowledge. Please help!
ExampleData:
Site, Aluminum_Dissolved, Federal_Guideline
M1, 0.1, 0.4
M1, 0.2, 0.4
M1, 0.5, 0.4
M2, 0.6, 0.4
M2, 0.4, 0.4
M2, 0.3, 0.4
I have a simple function:
boxplot(ExampleData$Aluminum_Dissolved ~ ExampleData$Site, col="purple",
par (cex.axis=2, las=2), mar=c(7,4,4,2)+0.1
X and Y axis Labels:
Once I increase the size of the axis text this much, my xlab and ylab are obscured by the axis text.
I have tried using:
`mpg=c(3,1,0)`
and altering the values, but that seems to get messed up by the margin increase
`mar=c(7,4,4,2)+0.1`
I tried scrapping the xlab and ylab altogether and using mtext, but I can't get that to give me labels outside my axis text that are parallel to the y-axis. I have tried:
`mtext("Dissolved Aluminum", side=2, adj=0, las)` etc....
45 degree text on x-axis:
And, finally, I have tried reconstructing my x and y axes to no avail, and I can't seem to rotate my x-axis labels 45 degrees using the srt parameter. I have tried:
boxplot(ExampleData$Aluminum_Dissolved ~ ExampleData$Site, col="purple",
xaxt='n', yaxt='n', axis(2, cex.axis=2, xlab="Dissolved Aluminum"),
axis(1, cex.axis=2, srt=45)
and this doesn't work. What am I missing? Is there a simple way to do this?
A quick tutorial:
The way that plotting works in base R graphics is generally thought of as a "pen on paper" model. This means that each function you call draws "on top" of what you've created up to that point. Graphical parameters can either be set beforehand via a call to par, or passed directly to the plotting function (with some caveats). So, for example, I would have done this as:
par(cex.axis=2, las=2,mar=c(7,4,4,2)+0.1)
boxplot(Aluminum_Dissolved ~ Site, data = ExampleData,
col="purple", ylab = "Dissolved Aluminum", xlab = "Site")
If you wanted custom axes, you would have done something like:
par(cex.axis=2, las=2,mar=c(7,4,4,2)+0.1)
boxplot(Aluminum_Dissolved ~ Site, data = ExampleData,
col="purple", ylab = "Dissolved Aluminum", xlab = "Site", axes = FALSE)
axis(...)
Subsequent calls (on separate lines) to functions like points or lines would add points or lines to the graph, respectively.
The caveat with par is that some parameters can only be set by calling par directly, not by passing them as named arguments to plotting functions. There is a list of those (which includes mar) located at ?par.
@joran was right; I think I just messed up the order of the function arguments. I got the axis labels working, despite the larger text size, using this code:
boxplot(ExampleData$Aluminum_Dissolved ~ ExampleData$Site, col="purple", par(cex.axis=2, cex.lab=1.8), ylab="Dissolved Aluminum")
The only problem with this is that the label is very close to the axis text, but it is alright.
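Putting the pieces together, here is one possible sketch for both problems in the question, using the example data. `axis()` does not honour `srt`, so the 45-degree labels are drawn with `text()` and `xpd = TRUE` instead, and `title(..., line = )` pushes the axis titles further out so the enlarged tick labels don't cover them. The margin sizes and offsets are guesses that would need tuning for real data.

```r
# Example data from the question
ExampleData <- data.frame(
  Site = c("M1", "M1", "M1", "M2", "M2", "M2"),
  Aluminum_Dissolved = c(0.1, 0.2, 0.5, 0.6, 0.4, 0.3)
)

old_par <- par(mar = c(7, 7, 4, 2) + 0.1)   # room for big labels at bottom and left

# Boxes only: axes and axis titles are added by hand below
bp <- boxplot(Aluminum_Dissolved ~ Site, data = ExampleData, col = "purple",
              axes = FALSE, xlab = "", ylab = "")
box()

# y-axis: large, horizontal tick labels
axis(2, cex.axis = 2, las = 1)

# x-axis: tick marks only; rotated labels come from text() instead
at <- seq_along(bp$names)
axis(1, at = at, labels = FALSE)
usr <- par("usr")
text(x = at, y = usr[3] - 0.05 * (usr[4] - usr[3]), labels = bp$names,
     srt = 45, adj = 1, xpd = TRUE, cex = 2)

# Axis titles pushed further out so the tick labels don't cover them
title(ylab = "Dissolved Aluminum", line = 5)
title(xlab = "Site", line = 5)

par(old_par)   # restore the previous margins
```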

histogram and pdf in the same graph [duplicate]

Closed 10 years ago.
Possible Duplicate:
Fitting a density curve to a histogram in R
I'd like to plot the histogram and various pdfs on the same graph. I've tried it for just one pdf with the following code (adapted from code I found on the web):
hist(data, freq = FALSE, col = "grey", breaks = "FD")
.x <- seq(0, 0.1, length.out=100)
curve(dnorm(.x, mean=a, sd=b), col = 2, add = TRUE)
It gives me an error. Can you advise me?
For multiple pdfs, what's the trick?
And I've observed that the histogram seems to plot the density (on the y-axis) instead of the number of observations. How can I change this?
Many thanks!
It plots the density instead of the frequency because you specified freq=FALSE. It is not very fair to complain about it doing exactly what you told it to do.
The curve function expects an expression involving x (not .x) and it does not require you to precompute the x values. You probably want something like:
a <- 5
b <- 2
hist( rnorm(100, a, b), freq=FALSE )
curve( dnorm(x,a,b), add=TRUE )
To head off your next question: if you specify freq=TRUE (or just leave it out, as that is the default) and add the curve, then the curve just runs along the bottom (that is the whole purpose of plotting the histogram as a density rather than as frequencies). You can work around this by scaling the expression given to curve by the width of the bins and the total number of points:
out <- hist( rnorm(100, a, b) )
curve( dnorm(x,a,b)*100*diff(out$breaks[1:2]), add=TRUE )
Though personally the first option (density scale) without tickmark labels on the y-axis makes more sense to me.
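For several pdfs, the trick is just repeated `curve(..., add = TRUE)` calls, plus an explicit `ylim` so the tallest curve is not clipped. A sketch, assuming simulated gamma-distributed data and two candidate families fitted by simple moment matching (a normal, and a gamma with shape = mean^2/variance and rate = mean/variance); the data and the choice of families are illustrative only.

```r
set.seed(1)
x <- rgamma(500, shape = 4, rate = 2)   # simulated toy data, not real observations

# Method-of-moments parameters for the two candidate families
m <- mean(x)
v <- var(x)
shape <- m^2 / v
rate  <- m / v

# Compute the histogram first so the y-limit can cover every curve
h    <- hist(x, breaks = "FD", plot = FALSE)
grid <- seq(min(x), max(x), length.out = 200)
ymax <- max(h$density, dnorm(grid, m, sqrt(v)), dgamma(grid, shape, rate))

plot(h, freq = FALSE, col = "grey", ylim = c(0, ymax),
     main = "Histogram with two candidate pdfs")
curve(dnorm(x, mean = m, sd = sqrt(v)), add = TRUE, col = 2, lwd = 2)
curve(dgamma(x, shape = shape, rate = rate), add = TRUE, col = 4, lwd = 2)
legend("topright", legend = c("normal", "gamma"), col = c(2, 4), lwd = 2, bty = "n")
```

Inside `curve()`, `x` refers to the grid of plotting positions, not the data vector of the same name, which is why the expressions are written in terms of `x`.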
h<-hist(data, breaks="FD", col="red", xlab="xTitle", main="Normal pdf and histogram")
xfit<-seq(min(data),max(data),length=100)
x.norm<-rnorm(n=100000, mean=a, sd=b)
yfit<-dnorm(xfit,mean=mean(x.norm),sd=sd(x.norm))
yfit <- yfit*diff(h$mids[1:2])*length(data)
lines(xfit, yfit, col="blue", lwd=2)
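Both count-scale approaches rest on how `hist()` defines density: density = count / (n × bin width). A quick sketch with simulated data checks that identity and then scales a normal pdf onto the count axis; the data are made up for illustration.

```r
set.seed(2)
a <- 5
b <- 2
x <- rnorm(100, a, b)                 # simulated toy data
h <- hist(x, plot = FALSE)            # default (Sturges) breaks: equal widths

n     <- length(x)
width <- diff(h$breaks)

# hist() defines density as count / (n * bin width)
all.equal(as.numeric(h$counts), h$density * n * width)

# So a pdf is moved onto the count scale by multiplying by n * bin width
plot(h, col = "grey", main = "Counts with a scaled normal pdf")
curve(dnorm(x, a, b) * n * width[1], add = TRUE, col = "blue", lwd = 2)
```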

Plotting Contrasting graph(s) from a dataset using R

I have a set of data (1000+ animals) from two seasons (winter and summer) and would like to demonstrate the differences in the gestation length (days) pattern in these two seasons. My data is similar to this:
id <- c(1,2,3,4,5,6,7,8,9,10)
season <- c(1,1,2,2,1,2,1,1,2,1)
gest <- c(114,NA,123,116,NA,120,110,NA,116,119)
data <- cbind(id,season,gest)
I would like to have something like this:
http://had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png
OR any similar form of graph that would give me a good contrast.
Thank you for all your help,
Bazon
library(ggplot2)
df <- data.frame(id=id, season=factor(season), gest=gest)
ggplot(df, aes(x=gest, fill=season)) + geom_density(alpha=0.2)
This should give something similar to that example, but you may want to play with the alpha parameter to get the transparency right.
There is a chart type commonly used to show demographic data, in particular to directly contrast two groups when you want to emphasize the comparison of the matching subgroups within each of them (for example, the same age band in both). In the demographics context, the most common application is the age structure of males versus females (a population pyramid). This seems like it might be a good candidate to effectively visualize your data.
The plot shown below was created using the base graphics package in R and the (excellent) R package SVGAnnotation, by Duncan Temple Lang, to create the interactive elements (by re-rendering the image in SVG and post-processing the resultant XML).
(Although the plot was created using R and SVGAnnotation, the image below is from a UK Government site.)
That particular plot that you linked used ggplot2. I'm not really good at using it, so I'll show you how to do it with base graphics:
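A rough base-graphics sketch of that back-to-back (pyramid-style) layout for the gestation data, without the interactive SVG layer: counts for season 1 are drawn as negative bars to the left and season 2 to the right. The 5-day bins and the colours are arbitrary choices that suit the toy data in the question.

```r
# Toy data from the question
gest   <- c(114, NA, 123, 116, NA, 120, 110, NA, 116, 119)
season <- c(1, 1, 2, 2, 1, 2, 1, 1, 2, 1)

# Bin gestation length into 5-day classes (chosen for this toy data)
ok     <- !is.na(gest)
bins   <- cut(gest[ok], breaks = seq(105, 125, by = 5), right = FALSE)
counts <- table(bins, season[ok])            # rows: bins, columns: seasons

left  <- as.numeric(counts[, "1"])           # season 1, drawn to the left
right <- as.numeric(counts[, "2"])           # season 2, drawn to the right
lim   <- max(left, right)

par(mar = c(4, 7, 2, 1))
barplot(-left, horiz = TRUE, space = 0, col = "steelblue", xlim = c(-lim, lim),
        names.arg = rownames(counts), las = 1, axes = FALSE,
        xlab = "Number of animals", main = "Length of gestation (days)")
barplot(right, horiz = TRUE, space = 0, col = "tomato", add = TRUE, axes = FALSE)
axis(1, at = -lim:lim, labels = abs(-lim:lim))   # show counts as positive on both sides
legend("topright", legend = c("Season 1", "Season 2"),
       fill = c("steelblue", "tomato"), bty = "n")
```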
data <- as.data.frame(data)
d1 <- density(data$gest[which(data$season==1)], na.rm=TRUE)
d2 <- density(data$gest[which(data$season==2)], na.rm=TRUE)
plot(d1, ylim=c(0, max(d1$y,d2$y)), xlim=range(c(d1$x, d2$x)),
main="Length of gestation", xlab="Length (days)", col="blue", lwd=2)
polygon(d1$x, d1$y, col=rgb(0, 0, 1, 0.5), lty=0)
lines(d2, col="red", lwd=2)
polygon(d2$x, d2$y, col=rgb(1, 0, 0, 0.5), lty=0)
Alternatively, check out the densityplot function of the lattice package, although I'm not sure how to fill in the lines.
PS: is your dataset really that small? If so, density plots are probably NOT the way to go (a scatterplot would be better).
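For a data set as small as the example, a jittered strip chart (a one-dimensional scatterplot) shows every animal directly. A sketch using base `stripchart()` on the question's toy data, with rows missing `gest` dropped first and the season means marked in red:

```r
# Toy data from the question
gest   <- c(114, NA, 123, 116, NA, 120, 110, NA, 116, 119)
season <- c(1, 1, 2, 2, 1, 2, 1, 1, 2, 1)
d <- na.omit(data.frame(gest, season = factor(season)))   # drop animals with no gestation length

stripchart(gest ~ season, data = d, method = "jitter", jitter = 0.1,
           vertical = TRUE, pch = 19, xlab = "Season", ylab = "Length (days)")

means <- tapply(d$gest, d$season, mean)
points(seq_along(means), means, pch = "-", cex = 3, col = "red")   # season means
```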
EDIT
If you want to do this with histograms you can do something like:
hist(data$gest[which(data$season==1)], main="Length of gestation",
xlab="Length (days)", col=rgb(0, 0, 1, 0.5))
# Note the add=TRUE parameter to superimpose the histograms
hist(data$gest[which(data$season==2)], col=rgb(1, 0, 0, 0.5), add=TRUE)
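One caveat with overlaying two `hist()` calls is that each computes its own breaks, and the first call fixes the axis limits, so the second histogram can be misaligned or clipped. A sketch that uses one shared set of breaks and a common `ylim` (the `pretty()` call and `n = 8` are arbitrary choices):

```r
# Toy data from the question
gest   <- c(114, NA, 123, 116, NA, 120, 110, NA, 116, 119)
season <- c(1, 1, 2, 2, 1, 2, 1, 1, 2, 1)

g1 <- gest[season == 1 & !is.na(gest)]
g2 <- gest[season == 2 & !is.na(gest)]

# One set of breaks covering both seasons, so bars line up and nothing is clipped
brks <- pretty(range(g1, g2), n = 8)
h1 <- hist(g1, breaks = brks, plot = FALSE)
h2 <- hist(g2, breaks = brks, plot = FALSE)

plot(h1, col = rgb(0, 0, 1, 0.5), ylim = c(0, max(h1$counts, h2$counts)),
     main = "Length of gestation", xlab = "Length (days)")
plot(h2, col = rgb(1, 0, 0, 0.5), add = TRUE)
```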
