Add multiple ggplot2 geom_segment() based on mean() and sd() data - r

I have a data frame mydataAll with columns DESWC, journal, and highlight. To calculate the average and standard deviation of DESWC for each journal, I do
avg <- aggregate(DESWC ~ journal, data = mydataAll, mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, sd)
Now I plot a horizontal stripchart with the values of DESWC along the x-axis and each journal along the y-axis. But for each journal, I want to indicate the standard deviation and average with a simple line. Here is my current code and the results.
stripchart2 <-
ggplot(data=mydataAll, aes(x=mydataAll$DESWC, y=mydataAll$journal, color=highlight)) +
geom_segment(aes(x=avg[1,2] - stddev[1,2],
y = avg[1,1],
xend=avg[1,2] + stddev[1,2],
yend = avg[1,1]), color="gray78") +
geom_segment(aes(x=avg[2,2] - stddev[2,2],
y = avg[2,1],
xend=avg[2,2] + stddev[2,2],
yend = avg[2,1]), color="gray78") +
geom_segment(aes(x=avg[3,2] - stddev[3,2],
y = avg[3,1],
xend=avg[3,2] + stddev[3,2],
yend = avg[3,1]), color="gray78") +
geom_point(size=3, aes(alpha=highlight)) +
scale_x_continuous(limit=x_axis_range) +
scale_y_discrete(limits=mydataAll$journal) +
scale_alpha_discrete(range = c(1.0, 0.5), guide='none')
show(stripchart2)
See the three horizontal geom_segments at the bottom of the image indicating the spread? I want to do that for all journals, but without handcrafting each one. I tried using the solution from this question, but when I put everything in a loop and remove the aes(), it gives me an error that says:
Error in x - from[1] : non-numeric argument to binary operator
Can anyone help me condense the geom_segment() statements?

I generated some dummy data to demonstrate. First, we use aggregate like you have done, then we combine those results to create a data.frame in which we create upper and lower columns. Then, we pass these to the geom_segment specifying our new dataset. Also, I specify x as the character variable and y as the numeric variable, and then use coord_flip():
library(ggplot2)
set.seed(123)
df <- data.frame(lets = sample(letters[1:8], 100, replace = TRUE),
vals = rnorm(100),
stringsAsFactors = FALSE)
means <- aggregate(vals~lets, data = df, FUN = mean)
sds <- aggregate(vals~lets, data = df, FUN = sd)
df2 <- data.frame(means, sds)
df2$upper = df2$vals + df2$vals.1
df2$lower = df2$vals - df2$vals.1
ggplot(df, aes(x = lets, y = vals))+geom_point()+
geom_segment(data = df2, aes(x = lets, xend = lets, y = lower, yend = upper))+
coord_flip()+theme_bw()
Here, the lets column would resemble your character variable.
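For the data in the question, the same idea might look like the sketch below. It assumes a data frame shaped like mydataAll (the values here are simulated stand-ins), and the names spread, lower and upper are introduced only for illustration. Merging on journal, rather than binding the two aggregate results side by side, keeps the rows aligned and gives the columns explicit names.

```r
# Simulated stand-in for the asker's mydataAll (assumption: real data has the
# same journal and DESWC columns)
set.seed(42)
mydataAll <- data.frame(
  journal = sample(c("J1", "J2", "J3", "J4"), 60, replace = TRUE),
  DESWC   = rnorm(60, mean = 5000, sd = 800),
  stringsAsFactors = FALSE
)

avg    <- aggregate(DESWC ~ journal, data = mydataAll, FUN = mean)
stddev <- aggregate(DESWC ~ journal, data = mydataAll, FUN = sd)

# Join the two summaries on journal; the suffixes rename the clashing columns
spread <- merge(avg, stddev, by = "journal", suffixes = c("_mean", "_sd"))

# One row per journal, with the end points of the mean +/- sd segment
spread$lower <- spread$DESWC_mean - spread$DESWC_sd
spread$upper <- spread$DESWC_mean + spread$DESWC_sd
spread
```

A single geom_segment(data = spread, aes(x = lower, xend = upper, y = journal, yend = journal), inherit.aes = FALSE) should then replace the hand-written segments. The inherit.aes = FALSE part matters in the original plot because its top-level aes() maps color to highlight, a column that spread does not have.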

Related

How to plot lines of geom_smooth with the p values of the lm model result?

I am trying to plot my data in ggplot with the line type determined by the p-values of the smooth lines (i.e., a dashed line if the regression is not significant, and a solid line when it is). Before posting this question, I tried this answer in this forum, but such answers normally deal with labels, not the line itself.
Below is my failing code with sample data. Thanks in advance for your kind help.
library(plyr)
library(ggplot2)
dat <- data.frame(id = 1: 100,
x = rnorm(100,2,0.5),
y = rnorm(100, 20, 5),
varA = rep(letters[1:4], 25),
varB = factor(sample(c(50,100,150), 100, TRUE)))
pvdat <- ddply(dat,.(varA,varB), function(df) data.frame(pvalue=format(signif(summary(lm(y~x,data=df))[[4]][2, 4], 2),scientific=-2),
lty = ifelse(summary(lm(y~x,data=df))[[4]][2, 4] > 0.05, 0, 1)))
ggplot(data= dat, aes(x = x, y = y, col = as.factor(varB))) + geom_smooth(method = "lm", aes(linetype = pvdat$lty)) + facet_grid(. ~ as.factor(varA), scale = "free_x")
There are two problems here:
pvdat$lty is numeric (continuous), but linetype needs a discrete variable such as a factor
pvdat has one row per varA/varB combination (at most 12), but dat has 100 rows, so ggplot does not know how to map one onto the other
To change your numeric column to a factor, you need as.factor(), and to make the mapping you can use the merge() function to make a single data frame with the values from pvdat mapped for each element of dat. Putting these together:
ggplot(data= merge(dat,pvdat,by = c("varA","varB")), aes(x = x, y = y, col = as.factor(varB))) + geom_smooth(method = "lm", aes(linetype = as.factor(lty))) + facet_grid(. ~ as.factor(varA), scale = "free_x")
will solve your problem.
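As a rough sketch of the same two steps without plyr, the per-group p-values can be computed in base R and merged back onto the data. The names groups, merged and signif, and the "n.s." label, are my own choices for illustration, not part of the original code.

```r
set.seed(1)
dat <- data.frame(
  x    = rnorm(100, 2, 0.5),
  y    = rnorm(100, 20, 5),
  varA = rep(letters[1:4], 25),
  varB = factor(sample(c(50, 100, 150), 100, replace = TRUE))
)

# One data frame per varA/varB combination that actually occurs
groups <- split(dat, list(dat$varA, dat$varB), drop = TRUE)

# One summary row per combination, holding the p-value of the slope
pvdat <- do.call(rbind, lapply(groups, function(d) {
  cf <- summary(lm(y ~ x, data = d))$coefficients
  # Guard: a group too small to estimate a slope gets NA instead of an error
  p <- if (nrow(cf) >= 2) cf[2, 4] else NA_real_
  data.frame(
    varA   = d$varA[1],
    varB   = d$varB[1],
    pvalue = p,
    signif = factor(ifelse(p > 0.05, "n.s.", "p <= 0.05"),
                    levels = c("p <= 0.05", "n.s."))
  )
}))

# Every row of dat now carries the label of its own group
merged <- merge(dat, pvdat, by = c("varA", "varB"))
```

With a labelled factor like this, aes(linetype = signif) together with scale_linetype_manual(values = c("p <= 0.05" = "solid", "n.s." = "dashed")) should give solid and dashed lines with a readable legend.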

Advice on how to plot side by side histograms with line graph going through in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up since each facet is a separate panel. Can anyone give me some advice on how to do this? I am not particularly attached to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
Result. You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")
This produces a violin for each value of rho, and joins the medians for each violin.
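If you prefer to compute the medians yourself, for example to inspect them or to draw the trend line with a separate layer, a minimal base R sketch on the simulated data from the question could look like this. The column name med is my own choice.

```r
set.seed(1001)
opt_lens_geom <- data.frame(
  rho    = rep(seq(0.1, 0.9, by = 0.1), each = 100),
  l_1024 = rnorm(900)
)

# One median per value of rho, returned in increasing order of rho
med <- aggregate(l_1024 ~ rho, data = opt_lens_geom, FUN = median)
names(med)[2] <- "med"
med
```

Because rho stays numeric in med, it can be used directly on a continuous axis, e.g. geom_line(data = med, aes(x = rho, y = med)) drawn on top of an unfaceted plot.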

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How to add dummy values on x-axis in ggplot2
I have 0, 2, 4, 6, 12, 14, 18, 22, 26 in my data, and I have plotted these on the x-axis. Is there a way to add the remaining even numbers for which there is no data in the table? This would leave the appropriate gaps on the x-axis.
Afterwards, the x-axis should show 0,2,4,6,8,10,12,14,16,18,20,22,24,26
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, the added ones (8, 10, etc.) come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The tricks are to make the x-variable with all levels you need and to specify drop = FALSE in scale.
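To see what the explicit levels do without plotting anything, here is a small base R check using the values from the question. The names x_obs and x_fac are introduced only for this illustration.

```r
# Values that actually occur in the data
x_obs <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)

# Declare every even number from 0 to 26 as a level, in numeric order
x_fac <- factor(x_obs, levels = seq(0, 26, by = 2))

levels(x_fac)   # all 14 levels, including those with no data
table(x_fac)    # the empty levels show up with a count of zero
```

The empty levels are exactly the ones that scale_x_discrete(drop = FALSE) keeps on the axis. Since the level order is fixed by the levels argument, it no longer depends on the order in which dummy rows were appended.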

How to make error bars for multiple variables in bar chart

I was hoping someone could help me with the following problem:
I am attempting to make a combined barplot showing the mean and standard errors for 3 different continuous variables (body temp, length, mass) recorded for a binary variable (gender).
I have been able to plot the mean values for each variable but I can't seem to successfully calculate the standard error for these 3 variables using any of the codes I've tried.
I tried many things, but I think I was on the right track with this:
View(test4)
test4 <- aggregate(test4,
by = list(Sex = test4$Sex),
FUN = function(x) c(mean = mean(x), sd = sd(x),
n = length(x)))
test4
#this produced mean, sd, length for ALL variables (including sex)
test4<-do.call(test4)
test4$se<-test4$x.sd / sqrt(test4$x.n)
Then I kept getting the error:
Error in sqrt(test4$x.n) : non-numeric argument to mathematical function
I tried to recode to target my 3 variables after aggregate(test4...) but I couldn't get it to work. Then I subsetted my resulting dataframe to exclude sex, but that didn't work. I then tried to define it as a matrix or vector, but that still didn't work.
I would like my final graph to have y axis = mean values and x axis = variable (3 sub-groups: Tb, Mass, Length), with two bars side by side showing male and female values for comparison.
Any help or direction anyone could provide would be greatly appreciated!!
Many thanks in advance! :)
aggregate does give some crazy output when you are trying to output more than one column.
If you wish to use aggregate I would do mean and SE as separate calls to aggregate.
However, here is a solution using tidyr and dplyr that I don't think is too bad.
I've created some data. I hope it looks like yours. It is so useful to include a simulated dataset with your question.
library(tidyr)
library(dplyr)
library(ggplot2)
# Create some data
test4 <- data.frame(Sex = rep(c('M', 'F'), 50),
bodytemp = rnorm(100),
length = rnorm(100),
mass = rnorm(100))
# Gather the data to 'long' format so the bodytemp, length and mass are all in one column
longdata <- gather(test4, variable, value, -Sex)
head(longdata)
# Create the summary statistics separately for each sex and variable (i.e. bodytemp, length and mass)
# The standard error of the mean is sd / sqrt(n)
summary <- longdata %>%
group_by(Sex, variable) %>%
summarise(mean = mean(value), se = sd(value) / sqrt(length(value)))
# Plot
ggplot(summary, aes(x = variable, y = mean, fill = Sex)) +
geom_bar(stat = 'identity', position = 'dodge') +
geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
width = 0.2,
position = position_dodge(0.9))
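If you would rather stay with aggregate, the two separate calls mentioned at the start might be sketched as follows. The helper name se and the result names means and ses are my own, and the data are simulated in the same way as above.

```r
set.seed(7)
test4 <- data.frame(Sex = rep(c("M", "F"), 50),
                    bodytemp = rnorm(100),
                    length = rnorm(100),
                    mass = rnorm(100))

# Standard error of the mean: sd divided by the square root of n
se <- function(x) sd(x) / sqrt(length(x))

# One call per statistic, so each result is an ordinary data frame
means <- aggregate(cbind(bodytemp, length, mass) ~ Sex, data = test4, FUN = mean)
ses   <- aggregate(cbind(bodytemp, length, mass) ~ Sex, data = test4, FUN = se)

means
ses
```

Each result has one row per sex and one plain numeric column per variable, which avoids the matrix columns that a single aggregate call returning c(mean, sd, n) produces.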
My final plot
Update: I was able to answer my question by combining the initial part of timcdlucas script along with another one I had used when plotting just one output. For anyone else who may be seeking an answer to a similar question, I have posted my script and the resulting graph (see link above):
View(test3) #this dataframe was organized as 'sex', 'tb', 'mass', 'svl'
newtest<-test3
View(newtest)
#transform data to 'long' combining all variables in one column
longdata<-gather(newtest, variable, value, -Sex)
View(longdata)
#set up table in correct format
longdata2 <- aggregate(longdata$value,
by = list(Sex = longdata$Sex, Variable = longdata$variable),
FUN = function(x) c(mean = mean(x), sd = sd(x),
n = length(x)))
longdata2 <- do.call(data.frame, longdata2)
longdata2$se<-longdata2$x.sd / sqrt(longdata2$x.n)
colnames(longdata2)<-c("Sex", "Variable", "mean", "sd", "n", "se")
longdata2$names<-c(paste(longdata2$Variable, "Variable /", longdata2$Sex, "Sex"))
View(longdata2)
dodge <- position_dodge(width = 0.9)
# longdata3 was never defined above; longdata2 is assumed to be the intended data frame
limits <- aes(ymax = mean + se,
ymin = mean - se)
#To order the bars in the way I desire *might not be necessary for future scripts*
positions<-c("Tb", "SVL", "Mass")
#To plot new table:
bfinal <- ggplot(data = longdata2, aes(x = factor(Variable), y = mean,
fill = factor(Sex)))+
geom_bar(stat = "identity",
position = position_dodge(0.9))+
geom_errorbar(limits, position = position_dodge(0.9),
width = (0.25)) +
labs(x = "Variable", y = "Mean") +
ggtitle("")+
scale_fill_discrete(name = "",
labels=c("Male", "Female"))+
scale_x_discrete(breaks=c("Mass", "SVL", "Tb"),
labels=c("Mass", "SVL", "Tb"),
limits=(positions))
bfinal
:)

How to add ggplot legend of two different lines R?

I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
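# intercept and slope go inside aes() too: recent ggplot2 versions ignore the mapping when they are passed as plain arguments
geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one"), size = 1) +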
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the best-fit line and the one-to-one line, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("best fit", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you rather want to stack the two keys in the legend you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.
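A related variant, sketched here under my own naming (fit_lines and the line column are not from either answer), is to precompute both lines per facet in a long data frame, so that the legend labels live in the data instead of in the scale.

```r
set.seed(10)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
                 xvariable = 1:15,
                 yvariable = 2 * (1:15) + rnorm(15, 0, 2))

# For each facet, stack the fitted values and the y = x values, labelled by line
fit_lines <- do.call(rbind, lapply(split(DF, DF$type), function(d) {
  fit <- lm(yvariable ~ xvariable, data = d)
  rbind(
    data.frame(type = d$type, xvariable = d$xvariable,
               yvariable = unname(fitted(fit)), line = "best fit"),
    data.frame(type = d$type, xvariable = d$xvariable,
               yvariable = d$xvariable, line = "one-to-one")
  )
}))

head(fit_lines)
```

Adding geom_line(data = fit_lines, aes(colour = line)) to a plot that already maps xvariable and yvariable should then draw both lines with a two-entry legend. As with the answer above, neither line extends beyond the range of the data.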

Resources