Reorder barchart by sum of bar segments with ggplot/plyr

I need the 11 bars in the following stacked barplot to be reordered by the sum of the first two segments of each bar, i.e. sorted by the (red+green) segments in the plot.
> dput(q1m.bl)
structure(list(ItemA = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L), .Label = c("sehr wichtig", "wichtig", "unwichtig",
"keine Angabe"), class = "factor"), ItemQ = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 9L,
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L), .Label = c("PUSHERS_AA",
"PUSHERS_COM", "PUSHERS_BED", "PUSHERS_SEC", "PUSHERS_STAB",
"PUSHERS_COST", "PUSHERS_INNO", "PUSHERS_VAL", "PUSHERS_INDEP",
"PUSHERS_STDS", "PUSHERS_SRC"), class = "factor"), Counts = c(1L,
3L, 4L, 1L, 3L, 3L, 2L, 1L, 4L, 2L, 2L, 1L, 3L, 5L, 1L, 1L, 1L,
6L, 1L, 5L, 1L, 2L, 1L, 1L, 1L, 6L, 1L, 2L, 6L, 1L, 2L, 4L, 2L,
1L, 3L, 3L, 2L, 1L, 2L, 1L, 5L, 1L), blpos = c(0.111111111111111,
0.444444444444444, 0.888888888888889, 1, 0.333333333333333, 0.666666666666667,
0.888888888888889, 1, 0.444444444444444, 0.666666666666667, 0.888888888888889,
1, 0.333333333333333, 0.888888888888889, 1, 0.111111111111111,
0.222222222222222, 0.888888888888889, 1, 0.555555555555556, 0.666666666666667,
0.888888888888889, 1, 0.111111111111111, 0.222222222222222, 0.888888888888889,
1, 0.222222222222222, 0.888888888888889, 1, 0.222222222222222,
0.666666666666667, 0.888888888888889, 1, 0.333333333333333, 0.666666666666667,
0.888888888888889, 1, 0.222222222222222, 0.333333333333333, 0.888888888888889,
1)), .Names = c("ItemA", "ItemQ", "Counts", "blpos"), row.names = c(NA,
-42L), class = "data.frame")
The plot ...
ggplot(q1m.bl, aes(x = ItemQ, y = Counts, fill = ItemA)) +
geom_bar(stat="identity", position="fill") +
geom_text(aes(y = blpos, label = Counts), hjust = 1) +
theme(axis.text.x=element_text(angle=90, hjust = 0), text = element_text(size=10)) +
coord_flip()
Ugh, not enough rep points to embed images. Sorry for the inconvenience. Plot is here: http://i.stack.imgur.com/am0Ud.png
I played around with arrange() and, after checking the data frame itself, I thought the following sorting should do the trick. (Note: blpos means "bar label position"; it holds the positions of the numbers shown in the plot.) But plotting this "sorted" data frame gives a plot identical to the one above. I do not understand what I need to change to alter the plotting order of the ItemQ column.
q1m.bl.s <- arrange(q1m.bl, ItemA, desc(blpos))
ggplot(q1m.bl.s, ....
What's the best approach anyway? Should I manipulate the df (using ddply/arrange/reorder/etc.) prior to plotting? Because I tend to think this is a presentation issue and should be done inside ggplot. Does it even matter? The "ggplot ordered barchart" questions I found on SO seem to use both approaches; yet none I found was referring to stacked bar segments and using factor data... hence this new question.
Thank you very much for enlightening me!

It's all about re-ordering the factor levels of the ItemQ variable.
d <- subset(q1m.bl, ItemA %in% c("sehr wichtig", "wichtig"))
totals <- aggregate(d$Counts, list(ItemQ = d$ItemQ), sum)
ItemQ.order <- as.character(totals[order(-totals$x), ]$ItemQ)
q1m.bl$ItemQ <- factor(q1m.bl$ItemQ, levels = ItemQ.order)
Then you should be able to run the code exactly as you provided it and it will produce this:
EDIT (digisus): konvas, I am re-adding your first answer, which uses a dplyr pipeline, because even though I do not feel comfortable with it and do not fully get it, I am sure others can benefit from it. :-) So, with your permission I repost it here:
library(dplyr) # the pipeline below uses dplyr verbs (group_by, filter, summarise), not plyr
ItemQ.order <- q1m.bl %>%
group_by(ItemQ) %>%
filter(ItemA %in% c("sehr wichtig", "wichtig")) %>%
summarise(total = sum(Counts)) %>%
arrange(-total) %>%
select(ItemQ) %>%
unlist %>%
as.character
q1m.bl$ItemQ <- factor(q1m.bl$ItemQ, levels = ItemQ.order)

library(ggplot2)
fac_ord <- function(seed){
set.seed(seed)
return(sample(letters[1:4]))
}
# this seed simulates arbitrary sortings
seed <- 2
fac_ord(seed)
val = c(1,2,3,4,2,2,2,2)
fac = factor(c("a","b","c","d","a","b","c","d"),
levels=fac_ord(seed),
labels=fac_ord(seed),
ordered=FALSE)
dif = c(rep("x",4),rep("y",4))
df = data.frame(val = val, fac = fac, dif = dif)
ggplot(df, aes(x=fac, y=val, fill=dif)) +
geom_bar(stat="identity") +
labs(title = sprintf("seed = %d / %s", seed, paste(fac_ord(seed),collapse=",")))
As the example shows, ggplot uses the internal level order of fac as the order in the plot. So to influence the plotted order, write a function that returns the intended order (depending on whatever facts and values matter), use it to create the factor fac, and then use this properly ordered factor for plotting.
The intended result can also be reached by application of reorder() for reordering the levels of the factor.
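As a minimal sketch of the reorder() route, assuming a small made-up data frame (toy, with columns item, answer and n standing in for ItemQ, ItemA and Counts), the levels can be ordered by the summed counts of the two segments of interest. Counts from the other segments are set to zero before summing, and the sign is flipped because reorder() sorts in increasing order.

```r
# Toy stand-in for the question's data (names and values are invented for illustration)
toy <- data.frame(
  item   = factor(rep(c("A", "B", "C"), each = 3)),
  answer = factor(rep(c("very", "somewhat", "not"), times = 3),
                  levels = c("very", "somewhat", "not")),
  n      = c(1, 2, 6,   4, 3, 2,   2, 2, 5)
)

# Keep only the counts of the first two segments; everything else contributes 0
keep  <- toy$answer %in% c("very", "somewhat")
score <- ifelse(keep, toy$n, 0)

# reorder() sorts levels by FUN(score) per level, ascending; negate for descending
toy$item <- reorder(toy$item, -score, FUN = sum)

levels(toy$item)
```

With coord_flip() the first level is drawn at the bottom of the axis, so dropping the minus sign may be preferable if the largest bar should appear on top.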

Related

How to apply p-value for each group of dataframe in R using facet_wrap in ggpubr

I have a data that looks like this:
melted.df <- structure(list(Time = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("24",
"36", "48", "72"), class = "factor"), id = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), Samples = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L), .Label = c("WT_Ago2_800", "WT_Ago2_400", "WT_Ago2_200",
"WT_Ago4_800"), class = "factor"), Size = c(0, 0, 0, 0, 0, 0,
0.3, 0, 0, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.8, 0.5, 0, 0,
0, 0, 0, 0, 0.1, 0.65, 0.2, 0.85, 0.725, 0.575, 0.1, 1.1, 0.9,
1.325, 1, 0.8, 0.5, 2.2, 1.65, 0, 0, 0, 0, 0, 0, 0.825, 1.175,
0.1, 0.55, 0.85, 0.85, 1.1, 1.4, 0.6, 0.95, 1.15, 0.975, 2.35,
1.15, 2.1, 0, 0, 0, 0, 0, 0, 0.65, 1.4, 0.55, 0.1, 0.7, 1.1,
0.95, 1.85, 0.85, 0.1, 1.5, 1.25, 1.8, 1.75, 2.15)), row.names = c(NA,
-84L), class = "data.frame")
This data covers 4 time points (24, 36, 48 and 72 hours). I want to use the code below to add the p-values computed in stat.test for each level in time.levels to the corresponding facet_wrap panel. For i=1 there are no p-values, so nothing should be added to the figure; for i=2 you do get p-values on the figure. The problem is that I cannot get each set of p-values applied to its own facet: the same p-values show up in all facets. How can I resolve this?
code:
library(devtools)
# install_github("https://github.com/kassambara/rstatix")
library(rstatix) # https://github.com/kassambara/rstatix
library(stringi)
library(ggpubr)
time.levels <- levels(melted.df$Time)
stat.test <- NULL
for (i in 1:length(time.levels)){
stat.test <- aov(Size ~ Samples, data = melted.df[melted.df$Time == time.levels[i],]) %>%
tukey_hsd()
# stat.test <- rbind(stat.test, tmp.stat)
bp <- ggboxplot(melted.df, x = "Samples", y = "Size") +
facet_wrap(vars(Time))+
stat_pvalue_manual(
stat.test, label = "p.adj",
y.position = c(2, 2.5, 3, 3.5, 3.8, 4)
)
bp
}
Note. All your values in Size for Time == 24L are zero:
> filter(melted.df, Time == 24L) %>% select(Size) %>% summary
Size
Min. :0
1st Qu.:0
Median :0
Mean :0
3rd Qu.:0
Max. :0
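A base R sketch of the same check for all time points at once. The data frame toy and its values are invented for illustration and stand in for melted.df; the idea is only to flag Time levels where Size never varies, since a test on such a subset has nothing to compare.

```r
# Toy stand-in for melted.df (values are invented)
toy <- data.frame(
  Time = factor(rep(c("24", "36"), each = 4)),
  Size = c(0, 0, 0, 0,   0.3, 0, 0.1, 0.65)
)

# Width of the range of Size within each Time level; zero means constant values
spread <- tapply(toy$Size, toy$Time, function(s) diff(range(s)))

# Time levels that could be skipped before running the tests
constant_times <- names(spread)[spread == 0]
constant_times
```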
If you wish to proceed anyway, you should make the plots individually and then use gridExtra::grid.arrange:
library(gridExtra)
bp <- vector("list", length = length(time.levels))
for (i in seq_along(time.levels)) {
sdf <- melted.df[melted.df$Time == time.levels[i],]
stat.test <- aov(Size ~ Samples, data = sdf) %>%
tukey_hsd()
bp[[i]] <- ggboxplot(sdf, x = "Samples", y = "Size") +
facet_wrap(vars(Time))+
stat_pvalue_manual(
stat.test, label = "p.adj",
y.position = c(2, 2.5, 3, 3.5, 3.8, 4)
)
}
do.call(grid.arrange, bp)
Note that you have to use the subset data.frame sdf as the input for ggboxplot.
You don't need to use gridExtra::grid.arrange.
Here is a clean solution.
library(rstatix) # latest version
library(ggpubr) # latest version
stat.test <- melted.df %>%
group_by(Time) %>%
tukey_hsd(Size ~ Samples)
ggboxplot(melted.df, x = "Samples", y = "Size", facet.by = "Time") +
stat_pvalue_manual(
stat.test, label = "p.adj",
y.position = c(2, 2.5, 3, 3.5, 3.8, 4)
)

Using R to fit a curve to a dataset using a specific equation

I am using R.
I would like to use a specific equation to fit a curve to one of my data sets (attached)
> dput(data)
structure(list(Gossypol = c(1036.331811, 4171.427741, 6039.995102,
5909.068158, 4140.242559, 4854.985845, 6982.035521, 6132.876396,
948.2418407, 3618.448997, 3130.376482, 5113.942098, 1180.171957,
1500.863038, 4576.787021, 5629.979049, 3378.151945, 3589.187889,
2508.417927, 1989.576826, 5972.926124, 2867.610671, 450.7205451,
1120.955, 3470.09352, 3575.043632, 2952.931863, 349.0864019,
1013.807628, 910.8879471, 3743.331903, 3350.203452, 592.3403778,
1517.045807, 1504.491931, 3736.144027, 2818.419785, 723.885643,
1782.864308, 1414.161257, 3723.629772, 3747.076592, 2005.919344,
4198.569251, 2228.522959, 3322.115942, 4274.324792, 720.9785449,
2874.651764, 2287.228752, 5654.858696, 1247.806111, 1247.806111,
2547.326207, 2608.716056, 1079.846532), Treatment = structure(c(2L,
3L, 4L, 5L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 1L), .Label = c("C", "1c_2d", "3c_2d",
"9c_2d", "1c_7d"), class = "factor"), Damage_cm = c(0.4955, 1.516,
4.409, 3.2665, 0.491, 2.3035, 3.51, 1.8115, 0, 0.4435, 1.573,
1.8595, 0, 0.142, 2.171, 4.023, 4.9835, 0, 0.6925, 1.989, 5.683,
3.547, 0, 0.756, 2.129, 9.437, 3.211, 0, 0.578, 2.966, 4.7245,
1.8185, 0, 1.0475, 1.62, 5.568, 9.7455, 0, 0.8295, 2.411, 7.272,
4.516, 0, 0.4035, 2.974, 8.043, 4.809, 0, 0.6965, 1.313, 5.681,
3.474, 0, 0.5895, 2.559, 0)), .Names = c("Gossypol", "Treatment",
"Damage_cm"), row.names = c(NA, -56L), class = "data.frame")
The equation is: y~yo+a*(1-b^x)
Where:
y = Gossypol (from my data set)
x = Damage_cm (from my data set)
The other 3 parameters are unknown:
yo = Intercept, a = asymptote and b = slope
I guess I have to use the package nls2. So far I wrote the following code:
data<-read.csv("Regression_exp2.csv",header=T, sep = ",")
library(nls2)
attach(data)
m<-nls(Gossypol~Y+A*(1-B^Damage_cm),data=data,start = list(Y=1700,A=4000,B=1))
This gives me the error message:
Error in nlsModel(formula, mf, start, wts): singular gradient matrix
at initial parameter estimates
In the end I would like to use the equation to plot a curve (with SE interval, I usually use ggplot2)
Furthermore, I would like to know the R2 and p value.
I would also be interested in the parameters yo , a and b
I have never done this before and would be extremely grateful if anyone could help me or give me a hint how to do this in R? I suppose I have to use a non linear approach (glm(...))
thanks a lot,
Mike
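You have to adjust your starting values a bit (with B = 1 the term 1 - B^Damage_cm is 0 for every row, so A has no effect on the fit and the gradient matrix is singular). With your data: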
> data
Gossypol Treatment Damage_cm
1 1036.3318 1c_2d 0.4955
2 4171.4277 3c_2d 1.5160
3 6039.9951 9c_2d 4.4090
4 5909.0682 1c_7d 3.2665
5 4140.2426 1c_2d 0.4910
...
54 2547.3262 1c_2d 0.5895
55 2608.7161 3c_2d 2.5590
56 1079.8465 C 0.0000
Then you can call:
m<-nls(data$Gossypol~Y+A*(1-B^data$Damage_cm),data=data,start = list(Y=1000,A=3000,B=0.5))
Printing m gives you:
> m
Nonlinear regression model
model: data$Gossypol ~ Y + A * (1 - B^data$Damage_cm)
data: data
Y A B
1303.4450 2796.0385 0.4939
residual sum-of-squares: 1.03e+08
Now you can get the data based on the fit:
fitData <- 1303.4450 + 2796.0385*(1-0.4939^data$Damage_cm)
Plot the data to compare the fit and the original data:
plot(data$Damage_cm, data$Gossypol, col='black', ylim=c(0,8000))
# add the fitted values to the same axes so both series share one y scale
points(data$Damage_cm, fitData, col='red')
which gives you:
If you want to use nls2 make sure it is installed and if not you can use
install.packages('nls2')
to do so.
library(nls2)
m2<-nls2(data$Gossypol~Y+A*(1-B^data$Damage_cm),data=data,start = list(Y=1000,A=3000,B=0.5))
which gives you the same values as nls:
> m2
Nonlinear regression model
model: data$Gossypol ~ Y + A * (1 - B^data$Damage_cm)
data: structure(list(Gossypol = c(1036.331811, 4171.427741, 6039.995102, 5909.068158, 4140.242559, 4854.985845, 6982.035521, 6132.876396, 948.2418407, 3618.448997, 3130.376482, 5113.942098, 1180.171957, 1500.863038, 4576.787021, 5629.979049, 3378.151945, 3589.187889, 2508.417927, 1989.576826, 5972.926124, 2867.610671, 450.7205451, 1120.955, 3470.09352, 3575.043632, 2952.931863, 349.0864019, 1013.807628, 910.8879471, 3743.331903, 3350.203452, 592.3403778, 1517.045807, 1504.491931, 3736.144027, 2818.419785, 723.885643, 1782.864308, 1414.161257, 3723.629772, 3747.076592, 2005.919344, 4198.569251, 2228.522959, 3322.115942, 4274.324792, 720.9785449, 2874.651764, 2287.228752, 5654.858696, 1247.806111, 1247.806111, 2547.326207, 2608.716056, 1079.846532), Treatment = structure(c(2L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L), .Label = c("C", "1c_2d", "3c_2d", "9c_2d", "1c_7d"), class = "factor"), Damage_cm = c(0.4955, 1.516, 4.409, 3.2665, 0.491, 2.3035, 3.51, 1.8115, 0, 0.4435, 1.573, 1.8595, 0, 0.142, 2.171, 4.023, 4.9835, 0, 0.6925, 1.989, 5.683, 3.547, 0, 0.756, 2.129, 9.437, 3.211, 0, 0.578, 2.966, 4.7245, 1.8185, 0, 1.0475, 1.62, 5.568, 9.7455, 0, 0.8295, 2.411, 7.272, 4.516, 0, 0.4035, 2.974, 8.043, 4.809, 0, 0.6965, 1.313, 5.681, 3.474, 0, 0.5895, 2.559, 0)), .Names = c("Gossypol", "Treatment", "Damage_cm"), row.names = c(NA, -56L), class = "data.frame")
Y A B
1303.4450 2796.0385 0.4939
residual sum-of-squares: 1.03e+08
Number of iterations to convergence: 2
Achieved convergence tolerance: 4.936e-06
If you prefer ggplot2:
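library(ggplot2)
ggplot(data, aes(x = Damage_cm, y = Gossypol)) +
geom_point() +
geom_smooth(method = "nls",
formula = y ~ Y + A * (1 - B^x),
method.args = list(start = list(Y = 1000, A = 3000, B = 0.5)), # arguments for nls go in method.args
se = FALSE) # predict() for nls gives no standard errors, so the ribbon must be off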
Though I'm afraid the standard errors would have to be simulated outside of ggplot.
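For the parameter estimates, their standard errors and p-values, summary() on the nls fit may be enough. The sketch below uses simulated data (the data frame sim and its values are invented, not the Gossypol measurements) and writes the formula with column names rather than data$..., which is what lets predict() work on new data. The pseudo R² shown is simply 1 - RSS/TSS; it is not a standard quantity for nonlinear models and should be read with care. This does not produce a confidence band for the curve.

```r
set.seed(1)
# Simulated stand-in for the question's data (values are invented)
sim <- data.frame(x = runif(100, 0, 10))
sim$y <- 1300 + 2800 * (1 - 0.5^sim$x) + rnorm(100, sd = 100)

# Column names inside the formula (no sim$...) so that predict() accepts newdata
fit <- nls(y ~ Y + A * (1 - B^x), data = sim,
           start = list(Y = 1000, A = 3000, B = 0.5))

# Estimates, standard errors, t values and p-values for Y, A and B
est <- summary(fit)$coefficients

# Pseudo R^2 = 1 - RSS/TSS (interpret with care for nonlinear models)
rss <- sum(residuals(fit)^2)
tss <- sum((sim$y - mean(sim$y))^2)
pseudo_r2 <- 1 - rss / tss

# Fitted curve on a regular grid, e.g. for lines() or geom_line()
grid <- data.frame(x = seq(0, 10, length.out = 50))
grid$y_hat <- predict(fit, newdata = grid)
```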

Faceting bars in ggplot2

I have this problem: I want to build a stacked bar plot with the faceting capabilities, so I can compare the distribution of frequencies for five common categories, within two different objects, separated according to three groups. I have six objects, five categories and three groups. The problem is that each group has only two different and exclusive objects to plot, but so far I can only produce a plot in which the six objects are plotted across the three groups. This is not optimal, since for each group I have four objects with no data.
Is it possible to plot just two objects for each group with the faceting capabilities?
EDITED
This is my data:
races.df <- structure(list(Face = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L), .Label = c("LGH002", "LGH003", "LGM009",
"SCM018", "VAH022", "VAM028"), class = "factor"), Race = structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
.Label = c("1. Amerindian", "2. White", "3. Mestizo", "4. Other races",
"5. Cannot tell"), class = "factor"), Count = c(19L, 0L, 13L, 8L, 0L, 2L,
7L, 23L, 6L, 2L, 1L, 1L, 29L, 6L, 3L, 29L, 0L, 11L, 0L, 0L, 0L, 38L, 1L, 0L,
1L, 0L, 30L, 9L, 0L, 1L), Density = c(0.475, 0, 0.325, 0.2, 0,
0.05, 0.175, 0.575, 0.15, 0.05, 0.025, 0.025, 0.725, 0.15,
0.075, 0.725, 0, 0.275, 0, 0, 0, 0.95, 0.025, 0, 0.025, 0,
0.75, 0.225, 0, 0.025), School = structure(c(1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Municipal",
"Private Fee-Paying", "Private-Voucher"), class = "factor")),
.Names =c("Face", "Race", "Count", "Density", "School"),
class = "data.frame", row.names = c(NA, -30L))
This is the code I'm using to build the plot:
library(ggplot2)
library(scales) # provides percent
P <- ggplot(data = races.df, aes(x = Face, y = Density, fill = Race)) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)
P + facet_grid(School ~ ., scales="free") + coord_flip()
As you can imagine, I only want to see the x-values "SCM018" and "LGH002" in "Municipal"; "LGM009" and "LGH003" in "Private-Voucher"; and "VAH022" and "VAM028" in "Private Fee-Paying" (only two objects per group). Is it possible? Any help?
All the best,
Mauricio.

How to display date format (Sep-12) in bar chart and set y axis limit?

I have two questions on building a bar plot by using ggplot().
How to display date format (Sep-12)?
I would like to display the date in the format of Sep-12. My data is a quarterly summary. I would like to show Mar, Jun, Sep and Dec quarters. However, I used the as.Date(YearQuarter) within the ggplot() function. It shows a different sequence of Apr, July, Oct, Jan.
How to increase y axis limit?
The y axis stops at 70%, so one of the value labels falls outside the picture. I added ylim(0,1) to raise the y limit to 1, but then I lost the percentage format: the y axis no longer displays percentages.
x4.can.t.m <- structure(list(NR_CAT = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L), .Label = c("0%", "1 to 84%", "85% +"
), class = "factor"), TYPE = structure(c(1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("PM BUSINESS", "PM CONSUMER",
"PREPAY"), class = "factor"), YearQuarter = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("2011-09-01",
"2011-12-01", "2012-03-01", "2012-06-01", "2012-09-01"), class = "factor"),
value = c(0.5, 0, 0.5, 0.35, 0, 0.65, 0.28, 0.02, 0.7, 0.4,
0, 0.6, 0.38, 0, 0.62, 0.43, 0.01, 0.56, 0.57, 0, 0.43, 0.35,
0, 0.65, 0.39, 0.01, 0.6, 0.55, 0, 0.45, 0.4, 0.02, 0.58,
0.35, 0.02, 0.63, 0.35, 0, 0.65, 0.55, 0.01, 0.44, 0.47,
0, 0.53)), .Names = c("NR_CAT", "TYPE", "YearQuarter", "value"
), row.names = c(NA, -45L), class = "data.frame")
This is my plot code:
x4.can.t.m$YearQuarter <- as.Date(x4.can.t.m$YearQuarter)
x4.can.t.d.bar <- ggplot(data=x4.can.t.m, aes(x=YearQuarter, y=value,fill=NR_CAT)) +
geom_bar(stat="identity",position = "dodge",ymax=NR_CAT+0.2) +
facet_wrap(~TYPE,ncol=1) +
geom_text(aes(label =paste(round(value*100,0),"%",sep="")),
position=position_dodge(width=0.9),
vjust=-0.25,size=3) +
scale_y_continuous(formatter='percent',ylim=1) +
labs(y="Percentage",x="Year Quarter") +
ylim(0,100%)
x4.can.t.d.bar +scale_fill_manual("Canopy Indicators",values=tourism.cols(c(6,9,8)))+
opts(title="Canopy Indicator: All Customers portout for Network
Issues",size=4)
It looks like you have an older version of ggplot2; the following is for ggplot2 0.9.2.1. I had to fix several things to make your plot work. Starting from your original definition of x4.can.t.m:
x4.can.t.m$YearQuarter <- format(as.Date(x4.can.t.m$YearQuarter),"%b-%y")
library("scales")
ggplot(data=x4.can.t.m, aes(x=YearQuarter, y=value, fill=NR_CAT)) +
geom_bar(stat="identity", position = "dodge") +
geom_text(aes(label = paste(round(value*100,0),"%",sep=""), group=NR_CAT),
position=position_dodge(width=0.9),
vjust=-0.25, size=3) +
scale_y_continuous("Percentage", labels=percent, limits=c(0,1)) +
labs(x="Year Quarter") +
scale_fill_discrete("Canopy Indicators") +
facet_wrap(~TYPE,ncol=1) +
ggtitle("Canopy Indicator: All Customers portout for Network Issues") +
theme(plot.title = element_text(size=rel(1.2)))
The first part of the question is just achieved by formatting YearQuarter into the format you wanted, leaving it as a string.
The second part specifies the limits in scale_y_continuous and uses the labels argument to specify the formatting function. Note that library("scales") is needed for this part to work.
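One caveat: once YearQuarter is a plain string, a discrete x axis is ordered alphabetically rather than by date (for example Dec-11 before Sep-11). A hedged sketch of one way to keep chronological order is to turn the labels into a factor whose levels follow the dates; the vector q below is an invented stand-in for the YearQuarter column.

```r
# Stand-in for the YearQuarter column, deliberately out of order and with a repeat
q <- as.Date(c("2011-09-01", "2012-03-01", "2011-12-01", "2012-06-01", "2011-09-01"))

lab <- format(q, "%b-%y")        # month abbreviations depend on the locale

# Levels in date order, not alphabetical order
chrono <- unique(lab[order(q)])
q_lab  <- factor(lab, levels = chrono)

levels(q_lab)
```

Using such a factor as the x aesthetic keeps the quarters in date order while still showing the Sep-12 style labels.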

Removing per-panel unused factors in a bar chart

I am trying to make a plot using barchart from lattice, but I am having some issues with unused factors for a given panel. I have tried using drop.unused.levels but it seems it only drops factors when they are not used in any panel.
This is the data frame that I am using:
dm <- structure(list(Benchmark = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), class = "factor", .Label = c("416.gamess",
"429.mcf", "436.cactusADM", "458.sjeng", "462.libquantum", "471.omnetpp",
"482.sphinx3")), Class = structure(c(3L, 1L, 2L, 3L, 1L, 4L,
2L, 3L, 1L, 2L, 3L, 1L, 4L, 2L, 3L, 1L, 2L, 3L, 1L, 4L, 2L, 3L,
1L, 2L, 3L, 1L, 4L, 2L, 3L, 1L, 2L, 3L, 1L, 4L, 2L), class = "factor", .Label = c("CS",
"PF", "PI", "PU")), Config = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Disabled",
"Shallowest", "Deepest", "StorePref", "StridedPref"), class = "factor"),
Perf = c(1, 0.72, 0.8, 1, 0.32, 1.16, 0.79, 1, 0.98, 1, 1,
0.72, 1, 0.99, 1, 0.98, 1, 1, 1.12, 0.97, 1, 1, 0.97, 1,
1, 0.99, 0.97, 1, 1, 1.18, 1, 1, 0.99, 0.97, 1)), .Names = c("Benchmark",
"Class", "Config", "Perf"), row.names = c(NA, -35L), class = "data.frame")
First I attempted using barchart like this:
barchart(Perf ~ Benchmark | Class, dm, groups=Config,
scales=list(x=list(relation='free')), auto.key=list(columns=3))
That gave me the following plot:
As you can see, there are gaps between the benchmarks for the PI, PF and CS classes. The reason is that each benchmark (factor level) occurs in only one class, so it is missing in all the others, and barchart seems to leave a gap for it on the x axis.
My second attempt was to call barchart four times (one for each class):
class.subset <- function(dframe, class.name) {
return(dframe[dframe$Class == class.name, ])
}
pl1 <- barchart(Perf ~ Benchmark, class.subset(dm, 'PI'), groups=Config)
pl2 <- barchart(Perf ~ Benchmark, class.subset(dm, 'PF'), groups=Config)
pl3 <- barchart(Perf ~ Benchmark, class.subset(dm, 'CS'), groups=Config)
pl4 <- barchart(Perf ~ Benchmark, class.subset(dm, 'PU'), groups=Config)
print(pl1, split=c(1, 1, 2, 2), more = TRUE)
print(pl2, split=c(1, 2, 2, 2), more = TRUE)
print(pl3, split=c(2, 1, 2, 2), more = TRUE)
print(pl4, split=c(2, 2, 2, 2))
The plot that I got is pretty much what I want, but now I do not know how to create a single global legend for all the subplots (instead of the very same legend for each subplot):
Ideally, I would prefer to solve the problem that I am facing using the first approach (since in that way I would also have the class name in each of the panels). However, if in the second case, it is possible to add a global legend and a title for each subplot containing the class name, that would be okay too.
Here's a quick way using latticeExtra:
pl1 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'PI'), groups=Config,
auto.key=list(columns=3))
pl2 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'PF'), groups=Config)
pl3 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'CS'), groups=Config)
pl4 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'PU'), groups=Config)
library(latticeExtra)
pls <- c(pl1, pl2, pl3, pl4)
pls <- update(pls, scales=list(y="same"))
pls
I was just having the same problem in a factor with 95 levels and for lattice::xyplot.
What worked for me is (with factor being the variable with too many levels):
library(gdata)
key<-simpleKey(levels(drop.levels(df$factor)),...)
xyplot(response~predictor,groups=factor, data=df, key=key)
Worked like a charm for me.
Best wishes!
