Adding trend lines/boxplots (by group) in ggplot2 - r

I have 40 subjects, of two groups, over 15 weeks, with some measured variable (Y).
I wish to have a plot where: x = time, y = Y, lines are by subjects and colours by groups.
I found it can be done like this:
TIME <- paste("week",5:20)
ID <- 1:40
GROUP <- sample(c("a","b"),length(ID), replace = T)
group.id <- data.frame(GROUP, ID)
a <- expand.grid(TIME, ID)
colnames(a) <-c("TIME", "ID")
group.id.time <- merge(a, group.id)
Y <- rnorm(dim(group.id.time)[1], mean = ifelse(group.id.time$GROUP =="a",1,3) )
DATA <- cbind(group.id.time, Y)
qplot(data = DATA,
x=TIME, y=Y,
group=ID,
geom = c("line"),colour = GROUP)
But now I wish to add something to the plot that shows the difference between the two groups (for example, a trend line for each group, with a shaded confidence band). How can it be done?
I remember once seeing that ggplot2 can (easily) do this with geom_smooth, but I am missing something about how to make it work.
I also wondered about having the lines work like a boxplot for each group (with a line for the different quantiles, fences and so on). But I imagine answering the first question would help me resolve the second.
Thanks.

p <- ggplot(data=DATA, aes(x=TIME, y=Y, group=ID)) +
geom_line(aes(colour=GROUP)) +
geom_smooth(aes(group=GROUP))
geom_smooth plot http://img143.imageshack.us/img143/7678/geomsmooth.png
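For the second part of the question (a per-group summary at each week), one possible approach is sketched below. It assumes the week is stored as a number, so that geom_smooth gets a continuous x and the weeks sort in calendar order rather than alphabetically. The names DATA2 and WEEK and the fixed half/half group split are made up for illustration and are not from the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for DATA: week stored as a number, not as "week 5" text
DATA2 <- expand.grid(WEEK = 5:20, ID = 1:40)
DATA2$GROUP <- ifelse(DATA2$ID <= 20, "a", "b")
DATA2$Y <- rnorm(nrow(DATA2), mean = ifelse(DATA2$GROUP == "a", 1, 3))

# One faint line per subject, plus a loess trend with a confidence band per group
p_trend <- ggplot(DATA2, aes(x = WEEK, y = Y, colour = GROUP)) +
  geom_line(aes(group = ID), alpha = 0.3) +
  geom_smooth(aes(fill = GROUP), method = "loess", formula = y ~ x)

# Boxplots side by side: one box per group at each week
p_box <- ggplot(DATA2, aes(x = factor(WEEK), y = Y, fill = GROUP)) +
  geom_boxplot() +
  labs(x = "week")
```

Printing p_trend or p_box draws the plot. Mapping fill = GROUP in the boxplot version is what makes ggplot2 dodge the two groups within each week.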

Related

How to draw different line segment with different facets

I have a question about using geom_segment in R ggplot2.
For example, I have three facets and two clusters of points(points which have the same y values) in each facets, how do I draw multiple vertical line segments for each clustering with geom_segment?
Like if my data is
x <- (1:24)
y <- (rep(1,2),2,rep(2,2),1,rep(3,2),4, rep(4,1),5,6, ..rep(8,2),7)
facets <-(1,2,3)
factors <-(1,2,3,4,5,6)
xmean <- ( (1+2+3)/3, (4+5+6)/3, ..., (22+23+24)/3)
Note: (1+2+3)/3 is the mean of the first cluster in the first facet, (4+5+6)/3 is the mean of the second cluster in the first facet, and (7+8+9)/3 is the mean of the first cluster in the second facet.
My Code:
ggplot(,aes(x=as.numeric(x),y=as.numeric(y),color=factors)+geom_point(alpha=0.85,size=1.85)+facet_grid(~facets)
+geom_segment(what should I put here to draw this line in different factors?)
Desired result:
(A picture of the desired result, later updated, was attached to the original post.)
Thank you so much! Have a nice day :).
Maybe this is what you are looking for. Instead of working with vectors, put your data in a data frame. You can then easily build an aggregated data frame with the mean x value per facet and cluster, which makes it easy to draw the segments:
Note: Wasn't sure about the setup of your data. You talk about two clusters per facet but your data has 8. So I slightly changed the example data.
library(ggplot2)
library(dplyr)
df <- data.frame(
x = 1:24,
y = rep(1:6, each = 4),
facets = rep(1:3, each = 8)
)
df_sum <- df %>%
group_by(facets, y) %>%
summarise(x = mean(x))
#> `summarise()` has grouped output by 'facets'. You can override using the `.groups` argument.
ggplot(df, aes(x, y, color = factor(y))) +
geom_point(alpha = 0.85, size = 1.85) +
geom_segment(data = df_sum, aes(x = x, xend = x, y = y - .25, yend = y + .25), color = "black") +
facet_wrap(~facets)
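If you would rather not load dplyr, the same summary table can be sketched in base R with aggregate(). This uses the example data from the answer above and is only meant to show what ends up in df_sum, which is one row per facet/cluster pair holding the mean x:

```r
df <- data.frame(
  x = 1:24,
  y = rep(1:6, each = 4),
  facets = rep(1:3, each = 8)
)

# Mean x for every facet/cluster combination (base R counterpart of the dplyr summary)
df_sum <- aggregate(x ~ facets + y, data = df, FUN = mean)
print(df_sum)
```

Because df_sum keeps the facets column, geom_segment(data = df_sum, ...) can draw each segment in the matching facet.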

reorder barchart as bell curve (density) in R

Let's say I have a data frame:
df <- data.frame(x = c("A","B","C"), y = c(10,20,30))
and I wish to plot it with ggplot2 so that I get a plot like a histogram, where instead of plotting counts I plot the y column values from the data frame. (I don't mind whether the x column is a factor or a character column.)
I will add that I know how to reorder a bar chart in descending/ascending order, but ordering it like a histogram (highest values in the middle, around the mean, and decreasing to both sides) is still beyond me.
I thought of transforming the data so that it fits a histogram, e.g. creating a vector with 10 "A" objects, 20 "B" and 30 "C" and then running a histogram on that. But it's not practical for what I'm trying to do, as it seems like a lazy and highly inefficient way to do it. Also, the df data frame is huge as it is, so multiplying it by millions is not going to be kind to my system.
This seems like a strange thing to want to do, since if the ordering is not already implicit in your x variables, then ordering as a bell curve is at best artificial. However, it's fairly trivial to implement if you really want to...
library(ggplot2)
df <- data.frame(yvals = floor(abs(rnorm(26)) * 100),
xvals = LETTERS,
stringsAsFactors = FALSE)
ggplot(data = df, aes(x = xvals, y = yvals)) + geom_bar(stat = "identity")
ordered <- order(df$yvals)
left_half <- ordered[seq(1, length(ordered), 2)]
right_half <- rev(ordered[seq(2, length(ordered), 2)])
new_order <- c(left_half, right_half)
df2 <- df[new_order,]
df2$xvals <- factor(df2$xvals, levels = df2$xvals)
ggplot(data = df2, aes(x = xvals, y = yvals)) + geom_bar(stat = "identity")
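The reordering step can be wrapped in a small helper to make the idea easier to see. This is only a sketch: bell_order is a made-up name, and the five-row data frame is an illustrative extension of the one in the question. The values are sorted ascending and then dealt out alternately to the left and right side, so the largest ones meet in the middle.

```r
# Hypothetical helper: returns row indices that put small values at the edges
# and large values in the middle
bell_order <- function(v) {
  ord <- order(v)                       # indices from smallest to largest
  idx <- seq_along(ord)
  left  <- ord[idx %% 2 == 1]           # 1st, 3rd, 5th ... smallest go left (ascending)
  right <- rev(ord[idx %% 2 == 0])      # 2nd, 4th ... go right (descending)
  c(left, right)
}

df <- data.frame(x = c("A", "B", "C", "D", "E"), y = c(10, 20, 30, 40, 50))
df_bell <- df[bell_order(df$y), ]
df_bell$x <- factor(df_bell$x, levels = df_bell$x)  # freeze this order for the x axis
df_bell$y
# 10 30 50 40 20
```

No rows are duplicated, so this stays cheap even for a large data frame; the only real cost is the sort.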

ggplot2: Different vlines for each graph using facet_wrap [duplicate]

I've poked around, but been unable to find an answer. I want to do a weighted geom_bar plot overlaid with a vertical line that shows the overall weighted average per facet. I'm unable to make this happen. The vertical line seems to be a single value applied to all facets.
require('ggplot2')
require('plyr')
# data vectors
panel <- c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
instrument <-c("V1","V2","V1","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1")
cost <- c(1,4,1.5,1,4,4,1,2,1.5,1,2,1.5,2,1.5,1,2)
sensitivity <- c(3,5,2,5,5,1,1,2,3,4,3,2,1,3,1,2)
# put an initial data frame together
mydata <- data.frame(panel, instrument, cost, sensitivity)
# add a "contribution to" vector to the data frame: contribution of each instrument
# to the panel's weighted average sensitivity.
myfunc <- function(cost, sensitivity) {
return(cost*sensitivity/sum(cost))
}
mydata <- ddply(mydata, .(panel), transform, contrib=myfunc(cost, sensitivity))
# two views of each panel's weighted average; should be the same numbers either way
ddply(mydata, c("panel"), summarize, wavg=weighted.mean(sensitivity, cost))
ddply(mydata, c("panel"), summarize, wavg2=sum(contrib))
# plot where each panel is getting its overall cost-weighted sensitivity from. Also
# put each panel's weighted average on the plot as a simple vertical line.
#
# PROBLEM! I don't know how to get geom_vline to honor the facet breakdown. It
# seems to be computing it over all the data and showing the resulting
# value identically in each facet plot.
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_bar(binwidth=1) +
geom_vline(xintercept=sum(contrib)) +
facet_wrap(~ panel) +
ylab("contrib")
If you pass in the pre-summarized data, it seems to work:
# binwidth belongs to geom_histogram() in current ggplot2 (geom_bar() no longer bins)
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_histogram(binwidth=1) +
geom_vline(data = ddply(mydata, "panel", summarize, wavg = sum(contrib)), aes(xintercept=wavg)) +
facet_wrap(~ panel) +
ylab("contrib") +
theme_bw()
Example using dplyr and facet_wrap in case anyone wants it.
library(dplyr)
library(ggplot2)
df1 <- mutate(iris, Big.Petal = Petal.Length > 4)
df2 <- df1 %>%
group_by(Species, Big.Petal) %>%
summarise(Mean.SL = mean(Sepal.Length))
ggplot() +
geom_histogram(data = df1, aes(x = Sepal.Length, y = ..density..)) +
geom_vline(data = df2, mapping = aes(xintercept = Mean.SL)) +
facet_wrap(Species ~ Big.Petal)
Another option is to merge the per-panel sums into the data, so that the line layer sees the panel column:
vlines <- ddply(mydata, .(panel), summarize, sumc = sum(contrib))
ggplot(merge(mydata, vlines), aes(sensitivity, weight = contrib)) +
geom_histogram(binwidth = 1) + geom_vline(aes(xintercept = sumc)) +
facet_wrap(~panel) + ylab("contrib")
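All of the answers above rely on the same mechanism: the data given to geom_vline must contain the faceting column, so that each row is routed to its own facet. A minimal sketch without plyr or dplyr follows. It assumes the cost and sensitivity values from the question and drops the instrument column, which the plot does not use; the name wavg is illustrative.

```r
library(ggplot2)

mydata <- data.frame(
  panel = rep(c("A", "B"), times = c(6, 10)),
  cost = c(1, 4, 1.5, 1, 4, 4, 1, 2, 1.5, 1, 2, 1.5, 2, 1.5, 1, 2),
  sensitivity = c(3, 5, 2, 5, 5, 1, 1, 2, 3, 4, 3, 2, 1, 3, 1, 2)
)

# Each row's contribution to its own panel's cost-weighted mean sensitivity
mydata$contrib <- with(mydata, cost * sensitivity / ave(cost, panel, FUN = sum))

# One row per panel; the 'panel' column lets facet_wrap route each line to its facet
wavg <- aggregate(contrib ~ panel, data = mydata, FUN = sum)

p <- ggplot(mydata, aes(x = sensitivity)) +
  geom_histogram(aes(weight = contrib), binwidth = 1) +
  geom_vline(data = wavg, aes(xintercept = contrib)) +
  facet_wrap(~ panel) +
  ylab("contrib")
```

By contrast, xintercept = sum(contrib) outside aes() is evaluated once, as a single constant, which is why the original plot showed the same line in every facet.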

changing y scale when using fun.y ggplot

This is an example of my data
library(ggplot2)
set.seed(1)
df <- data.frame(Groups = factor(rep(1:10, each = 10)))
x <- sample(1:100, 50)
df[x, "Style"] <- "Lame"
df[-x, "Style"] <- "Cool"
df$Style <- factor(df$Style)
p <- ggplot() + stat_summary(data = df, aes(Groups, Style, fill = Style),
geom = "bar", fun.y = length, position=position_dodge())
(Sorry, this is my first question... I don't know how to present code snippets like head(df) or the actual plot in SO. Please run this code to understand my question.)
So the plot adequately presents the count of every 'Style' per 'Groups'. However, the y axis scale shows the levels of the factor variable 'Style'. Although values I am plotting are originally discrete, the count of every 'Cool' and 'Lame' per 'Groups' is continuous.
How do I change the 'y' scale of my barplot from discrete to continuous in ggplot2, in order to correspond to the count values and not the original factor levels???
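You can take advantage of ggplot's grouping and let the bar geom do the counting for you. Because Groups is a factor, geom_bar (which counts rows per x value) is used here; geom_histogram expects a continuous x:
p <- ggplot(df, aes(Groups, fill = Style)) + geom_bar(position = position_dodge())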

ggplot2-line plotting with TIME series and multi-spline

This question's theme is simple but drives me crazy:
1. how to use melt()
2. how to deal with multi-lines in single one image?
Here is my raw data:
a 4.17125 41.33875 29.674375 8.551875 5.5
b 4.101875 29.49875 50.191875 13.780625 4.90375
c 3.1575 29.621875 78.411875 25.174375 7.8012
Q1:
I learned from the post "Plotting two variables as lines using ggplot2 on the same graph" how to draw multiple lines for multiple variables, like this:
The following code produces that plot. However, the x-axis is actually a time series.
df <- read.delim("~/Desktop/df.b", header=F)
colnames(df)<-c("sample",0,15,30,60,120)
df2<-melt(df,id="sample")
ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) + geom_line() + geom_point()
I would like it to treat 0, 15, 30, 60 and 120 as real numbers, so that the x-axis shows the time series rather than category names. I also tried the following, but it failed:
row.names(df)<-df$sample
df<-df[,-1]
df<-as.matrix(df)
df2 <- data.frame(sample = factor(rep(row.names(df),each=5)), Time = factor(rep(c(0,15,30,60,120),3)),Values = c(df[1,],df[2,],df[3,]))
ggplot(data = df2, aes(x=Time, y= Values, group = sample, colour=sample)) +
geom_line() +
geom_point()
Loooooooooking forward to your help.
Q2:
I've learnt that the following script can add a spline() curve for a single line. What if I wish to apply spline() to all three lines in one image?
n <-10
d <- data.frame(x =1:n, y = rnorm(n))
ggplot(d,aes(x,y))+ geom_point()+geom_line(data=data.frame(spline(d, n=n*10)))
Your variable column is a factor (you can verify by calling str(df2)). Just convert it back to numeric:
df2$variable <- as.numeric(as.character(df2$variable))
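The detour through as.character() matters because calling as.numeric() directly on a factor returns the internal level codes, not the labels. A small illustration with the time points from the question:

```r
f <- factor(c(0, 15, 30, 60, 120))

as.numeric(f)                # internal level codes: 1 2 3 4 5
as.numeric(as.character(f))  # the actual values: 0 15 30 60 120
```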
For your other question, you might want to stick with using geom_smooth or stat_smooth, something like this:
p <- ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) +
geom_line() +
geom_point()
library(splines)
p + geom_smooth(aes(group = sample),method = "lm",formula = y~bs(x),se = FALSE)
which gives a smooth fitted curve for each sample (the plot from the original answer is not reproduced here).
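If you specifically want interpolating curves from spline(), as in the single-line example in the question, one option is to compute a spline per sample and stack the results. This is a sketch only: the data frame is typed in by hand from the raw data in the question, with the time already numeric, and the name spl is made up.

```r
library(ggplot2)

times <- c(0, 15, 30, 60, 120)
df2 <- data.frame(
  sample = rep(c("a", "b", "c"), each = 5),
  variable = rep(times, 3),
  value = c(4.17125, 41.33875, 29.674375, 8.551875, 5.5,
            4.101875, 29.49875, 50.191875, 13.780625, 4.90375,
            3.1575, 29.621875, 78.411875, 25.174375, 7.8012)
)

# Interpolate each sample separately on 100 points, then bind the pieces together
spl <- do.call(rbind, lapply(split(df2, df2$sample), function(d) {
  s <- spline(d$variable, d$value, n = 100)
  data.frame(sample = d$sample[1], variable = s$x, value = s$y)
}))

# Points from the raw data, lines from the per-sample splines
p <- ggplot(df2, aes(variable, value, colour = sample)) +
  geom_point() +
  geom_line(data = spl)
```

Unlike the geom_smooth fit, these curves pass exactly through every measured point, which may or may not be what you want with only five time points per sample.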
