Removing per-panel unused factors in a bar chart - r

I am trying to make a plot using barchart from lattice, but I am having some issues with unused factor levels in a given panel. I have tried using drop.unused.levels, but it seems to drop levels only when they are not used in any panel.
This is the data frame that I am using:
dm <- structure(list(Benchmark = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), class = "factor", .Label = c("416.gamess",
"429.mcf", "436.cactusADM", "458.sjeng", "462.libquantum", "471.omnetpp",
"482.sphinx3")), Class = structure(c(3L, 1L, 2L, 3L, 1L, 4L,
2L, 3L, 1L, 2L, 3L, 1L, 4L, 2L, 3L, 1L, 2L, 3L, 1L, 4L, 2L, 3L,
1L, 2L, 3L, 1L, 4L, 2L, 3L, 1L, 2L, 3L, 1L, 4L, 2L), class = "factor", .Label = c("CS",
"PF", "PI", "PU")), Config = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Disabled",
"Shallowest", "Deepest", "StorePref", "StridedPref"), class = "factor"),
Perf = c(1, 0.72, 0.8, 1, 0.32, 1.16, 0.79, 1, 0.98, 1, 1,
0.72, 1, 0.99, 1, 0.98, 1, 1, 1.12, 0.97, 1, 1, 0.97, 1,
1, 0.99, 0.97, 1, 1, 1.18, 1, 1, 0.99, 0.97, 1)), .Names = c("Benchmark",
"Class", "Config", "Perf"), row.names = c(NA, -35L), class = "data.frame")
First I attempted using barchart like this:
barchart(Perf ~ Benchmark | Class, dm, groups=Config,
scales=list(x=list(relation='free')), auto.key=list(columns=3))
That gave me the following plot:
As you can see, there is a gap between the benchmarks for the PI, PF and CS classes. The reason is that each benchmark (factor level) occurs in only one class, so it is missing from all the others, and barchart still reserves a slot for it on the x axis of every panel.
My second attempt was to call barchart four times (one for each class):
class.subset <- function(dframe, class.name) {
return(dframe[dframe$Class == class.name, ])
}
pl1 <- barchart(Perf ~ Benchmark, class.subset(dm, 'PI'), groups=Config)
pl2 <- barchart(Perf ~ Benchmark, class.subset(dm, 'PF'), groups=Config)
pl3 <- barchart(Perf ~ Benchmark, class.subset(dm, 'CS'), groups=Config)
pl4 <- barchart(Perf ~ Benchmark, class.subset(dm, 'PU'), groups=Config)
print(pl1, split=c(1, 1, 2, 2), more = TRUE)
print(pl2, split=c(1, 2, 2, 2), more = TRUE)
print(pl3, split=c(2, 1, 2, 2), more = TRUE)
print(pl4, split=c(2, 2, 2, 2))
The plot that I got is pretty much what I want, but now I do not know how to create a single global legend for all the subplots (instead of the very same legend for each subplot):
Ideally, I would prefer to solve the problem that I am facing using the first approach (since in that way I would also have the class name in each of the panels). However, if in the second case, it is possible to add a global legend and a title for each subplot containing the class name, that would be okay too.

Here's a quick way using latticeExtra:
pl1 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'PI'), groups=Config,
auto.key=list(columns=3))
pl2 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'PF'), groups=Config)
pl3 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'CS'), groups=Config)
pl4 <- barchart(Perf ~ Benchmark|Class, class.subset(dm, 'PU'), groups=Config)
library(latticeExtra)
pls <- c(pl1, pl2, pl3, pl4)
pls <- update(pls, scales=list(y="same"))
pls
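The per-class subsets only lose their gaps if the benchmarks that do not occur in a class are removed from the factor. A minimal base-R sketch of that step, assuming a small toy data frame (`toy`) in place of `dm` and a variant of `class.subset` that calls `droplevels()` explicitly instead of relying on plotting defaults:

```r
# Toy stand-in for the question's `dm`: each benchmark belongs to exactly one class.
toy <- data.frame(
  Benchmark = factor(c("416.gamess", "429.mcf", "436.cactusADM", "458.sjeng")),
  Class     = factor(c("PI", "CS", "PF", "PI")),
  Perf      = c(1, 0.72, 0.8, 1)
)

# Variant of class.subset that also discards levels not present in the subset.
class.subset <- function(dframe, class.name) {
  droplevels(dframe[dframe$Class == class.name, ])
}

pi.raw   <- toy[toy$Class == "PI", ]   # plain subsetting keeps every level
pi.clean <- class.subset(toy, "PI")    # unused levels removed

nlevels(pi.raw$Benchmark)    # all four levels are still attached
nlevels(pi.clean$Benchmark)  # only the two benchmarks in class PI
```

Because the `Class` column is cleaned as well, a formula such as `Perf ~ Benchmark | Class` on the subset shows a single strip with that class name.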

I was just having the same problem in a factor with 95 levels and for lattice::xyplot.
What worked for me is (with factor being the variable with too many levels):
library(gdata)
key<-simpleKey(levels(drop.levels(df$factor)),...)
xyplot(response~predictor,groups=factor, data=df, key=key)
Worked like a charm for me.
Best wishes!

Related

How to plot predicted values with standard errors for lmer model results?

I have a transplant experiment for four locations and four substrates (taken from each location). I have determined survival for each population in each location and substrate combination. This experiment was replicated three times.
I have created a lmm as follows:
Survival.model <- lmer(Survival ~ Location + Substrate + Location:Substrate + (1|Replicate), data=Transplant.Survival, REML = TRUE)
I would like to use the predict command to extract predictions, for example:
Survival.pred <- predict(Survival.model)
Then extract standard errors so that I can plot them with the predictions to generate something like the following plot:
I know how to do this with a standard glm (which is how I created the example plot), but am not sure if I can or should do this with an lmm.
Can I do this or am I as a new user of linear mixed models missing something fundamental?
I did find this post on Stack Overflow which was not helpful.
Based on a comment from RHertel, maybe I should have phrased the
question: How do I plot model estimates and confidence intervals for
my lmer model results so that I can get a similar plot to the one I
have created above?
Sample Data:
Transplant.Survival <- structure(list(Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Steninge", "Molle",
"Kampinge", "Kaseberga"), class = "factor"), Substrate = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L,
4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("Steninge",
"Molle", "Kampinge", "Kaseberga"), class = "factor"), Replicate = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), Survival = c(1, 1, 1, 0.633333333333333,
0.966666666666667, 0.5, 0.3, 0.233333333333333, 0.433333333333333,
0.966666666666667, 0.866666666666667, 0.5, 0.6, 0.266666666666667,
0.733333333333333, 0.6, 0.3, 0.5, 0.3, 0.633333333333333, 0.9,
0.266666666666667, 0.633333333333333, 0.7, 0.633333333333333,
0.833333333333333, 0.9, 0.6, 0.166666666666667, 0.333333333333333,
0.433333333333333, 0.6, 0.9, 0.6, 0.133333333333333, 0.566666666666667,
0.633333333333333, 0.633333333333333, 0.766666666666667, 0.766666666666667,
0.0333333333333333, 0.733333333333333, 0.3, 1.03333333333333,
0.6, 1)), .Names = c("Location", "Substrate", "Replicate", "Survival"
), class = "data.frame", row.names = c(NA, -46L))
Edit: fixed bug in function / figure.
If you would like to plot estimates with CIs, you may want to look at the sjp.lmer function in the sjPlot package. See some examples of the various plot types here.
Furthermore, the arm package provides functions for computing standard errors (arm::se.fixef and arm::se.ranef).
sjp.setTheme("forestgrey") # plot theme
sjp.lmer(Survival.model, type = "fe")
would give following plot
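If you would rather build the plot yourself, population-level predictions and approximate standard errors can be computed from the fixed-effect design matrix and `vcov()`. The following is only a sketch: it assumes lme4 is installed and uses simulated data (`d`) with the same design as the question rather than the real measurements. The resulting intervals reflect fixed-effect uncertainty only, not the variance of the random effects.

```r
library(lme4)

# Simulated stand-in for Transplant.Survival (assumed values, same 4 x 4 x 3 design).
set.seed(1)
sites <- c("Steninge", "Molle", "Kampinge", "Kaseberga")
d <- expand.grid(Location  = factor(sites, levels = sites),
                 Substrate = factor(sites, levels = sites),
                 Replicate = factor(1:3))
d$Survival <- runif(nrow(d))

m <- lmer(Survival ~ Location * Substrate + (1 | Replicate), data = d)

# One row per Location x Substrate combination.
newdat <- unique(d[, c("Location", "Substrate")])

# re.form = NA drops the random effects, giving population-level predictions.
newdat$pred <- predict(m, newdata = newdat, re.form = NA)

# Standard errors from the fixed-effect covariance matrix.
X <- model.matrix(~ Location * Substrate, newdat)
newdat$se    <- sqrt(diag(X %*% as.matrix(vcov(m)) %*% t(X)))
newdat$lower <- newdat$pred - 1.96 * newdat$se
newdat$upper <- newdat$pred + 1.96 * newdat$se
```

The columns `pred`, `lower` and `upper` can then be passed to whatever plotting function you used for the glm version of the figure.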

Using R to fit a curve to a dataset using a specific equation

I am using R.
I would like to use a specific equation to fit a curve to one of my data sets (attached)
> dput(data)
structure(list(Gossypol = c(1036.331811, 4171.427741, 6039.995102,
5909.068158, 4140.242559, 4854.985845, 6982.035521, 6132.876396,
948.2418407, 3618.448997, 3130.376482, 5113.942098, 1180.171957,
1500.863038, 4576.787021, 5629.979049, 3378.151945, 3589.187889,
2508.417927, 1989.576826, 5972.926124, 2867.610671, 450.7205451,
1120.955, 3470.09352, 3575.043632, 2952.931863, 349.0864019,
1013.807628, 910.8879471, 3743.331903, 3350.203452, 592.3403778,
1517.045807, 1504.491931, 3736.144027, 2818.419785, 723.885643,
1782.864308, 1414.161257, 3723.629772, 3747.076592, 2005.919344,
4198.569251, 2228.522959, 3322.115942, 4274.324792, 720.9785449,
2874.651764, 2287.228752, 5654.858696, 1247.806111, 1247.806111,
2547.326207, 2608.716056, 1079.846532), Treatment = structure(c(2L,
3L, 4L, 5L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 1L), .Label = c("C", "1c_2d", "3c_2d",
"9c_2d", "1c_7d"), class = "factor"), Damage_cm = c(0.4955, 1.516,
4.409, 3.2665, 0.491, 2.3035, 3.51, 1.8115, 0, 0.4435, 1.573,
1.8595, 0, 0.142, 2.171, 4.023, 4.9835, 0, 0.6925, 1.989, 5.683,
3.547, 0, 0.756, 2.129, 9.437, 3.211, 0, 0.578, 2.966, 4.7245,
1.8185, 0, 1.0475, 1.62, 5.568, 9.7455, 0, 0.8295, 2.411, 7.272,
4.516, 0, 0.4035, 2.974, 8.043, 4.809, 0, 0.6965, 1.313, 5.681,
3.474, 0, 0.5895, 2.559, 0)), .Names = c("Gossypol", "Treatment",
"Damage_cm"), row.names = c(NA, -56L), class = "data.frame")
The equation is: y~yo+a*(1-b^x)
Where:
y = Gossypol (from my data set)
x = Damage_cm (from my data set)
The other 3 parameters are unknown:
yo = Intercept, a = asymptote and b = slope
I guess I have to use the package nls2. So far I wrote the following code:
data<-read.csv("Regression_exp2.csv",header=T, sep = ",")
library(nls2)
attach(data)
m<-nls(Gossypol~Y+A*(1-B^Damage_cm),data=data,start = list(Y=1700,A=4000,B=1))
This gives me the error message:
Error in nlsModel(formula, mf, start, wts): singular gradient matrix
at initial parameter estimates
In the end I would like to use the equation to plot a curve (with SE interval, I usually use ggplot2)
Furthermore, I would like to know the R2 and p value.
I would also be interested in the parameters yo , a and b
I have never done this before and would be extremely grateful if anyone could help me or give me a hint how to do this in R? I suppose I have to use a non linear approach (glm(...))
thanks a lot,
Mike
You have to adjust your starting values a bit:
> data
Gossypol Treatment Damage_cm
1 1036.3318 1c_2d 0.4955
2 4171.4277 3c_2d 1.5160
3 6039.9951 9c_2d 4.4090
4 5909.0682 1c_7d 3.2665
5 4140.2426 1c_2d 0.4910
...
54 2547.3262 1c_2d 0.5895
55 2608.7161 3c_2d 2.5590
56 1079.8465 C 0.0000
Then you can call:
m<-nls(data$Gossypol~Y+A*(1-B^data$Damage_cm),data=data,start = list(Y=1000,A=3000,B=0.5))
Printing m gives you:
> m
Nonlinear regression model
model: data$Gossypol ~ Y + A * (1 - B^data$Damage_cm)
data: data
Y A B
1303.4450 2796.0385 0.4939
residual sum-of-squares: 1.03e+08
Now you can get the data based on the fit:
fitData <- 1303.4450 + 2796.0385*(1-0.4939^data$Damage_cm)
Plot the data to compare the fit and the original data:
plot(data$Damage_cm, data$Gossypol, col='black', ylim=c(0,8000))
# add the fitted values to the same axes so both series share one y scale
points(data$Damage_cm, fitData, col='red')
which gives you:
If you want to use nls2 make sure it is installed and if not you can use
install.packages('nls2')
to do so.
library(nls2)
m2<-nls2(data$Gossypol~Y+A*(1-B^data$Damage_cm),data=data,start = list(Y=1000,A=3000,B=0.5))
which gives you the same values as nls:
> m2
Nonlinear regression model
model: data$Gossypol ~ Y + A * (1 - B^data$Damage_cm)
data: structure(list(Gossypol = c(1036.331811, 4171.427741, 6039.995102, 5909.068158, 4140.242559, 4854.985845, 6982.035521, 6132.876396, 948.2418407, 3618.448997, 3130.376482, 5113.942098, 1180.171957, 1500.863038, 4576.787021, 5629.979049, 3378.151945, 3589.187889, 2508.417927, 1989.576826, 5972.926124, 2867.610671, 450.7205451, 1120.955, 3470.09352, 3575.043632, 2952.931863, 349.0864019, 1013.807628, 910.8879471, 3743.331903, 3350.203452, 592.3403778, 1517.045807, 1504.491931, 3736.144027, 2818.419785, 723.885643, 1782.864308, 1414.161257, 3723.629772, 3747.076592, 2005.919344, 4198.569251, 2228.522959, 3322.115942, 4274.324792, 720.9785449, 2874.651764, 2287.228752, 5654.858696, 1247.806111, 1247.806111, 2547.326207, 2608.716056, 1079.846532), Treatment = structure(c(2L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L), .Label = c("C", "1c_2d", "3c_2d", "9c_2d", "1c_7d"), class = "factor"), Damage_cm = c(0.4955, 1.516, 4.409, 3.2665, 0.491, 2.3035, 3.51, 1.8115, 0, 0.4435, 1.573, 1.8595, 0, 0.142, 2.171, 4.023, 4.9835, 0, 0.6925, 1.989, 5.683, 3.547, 0, 0.756, 2.129, 9.437, 3.211, 0, 0.578, 2.966, 4.7245, 1.8185, 0, 1.0475, 1.62, 5.568, 9.7455, 0, 0.8295, 2.411, 7.272, 4.516, 0, 0.4035, 2.974, 8.043, 4.809, 0, 0.6965, 1.313, 5.681, 3.474, 0, 0.5895, 2.559, 0)), .Names = c("Gossypol", "Treatment", "Damage_cm"), row.names = c(NA, -56L), class = "data.frame")
Y A B
1303.4450 2796.0385 0.4939
residual sum-of-squares: 1.03e+08
Number of iterations to convergence: 2
Achieved convergence tolerance: 4.936e-06
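The fitted values do not have to be typed in by hand: they can be taken from the model object. A minimal sketch, assuming simulated data (`sim`) in place of the real measurements and bare column names in the formula so that `predict()` accepts new data:

```r
# Simulated stand-in for the question's data (assumed values, not the real measurements).
set.seed(42)
sim <- data.frame(Damage_cm = runif(56, 0, 10))
sim$Gossypol <- 1300 + 2800 * (1 - 0.5^sim$Damage_cm) + rnorm(56, sd = 400)

# Bare column names (no data$...) let predict() evaluate the model on new data.
m <- nls(Gossypol ~ Y + A * (1 - B^Damage_cm), data = sim,
         start = list(Y = 1000, A = 3000, B = 0.5))

est        <- coef(m)                  # named vector with Y, A and B
coef.table <- summary(m)$coefficients  # estimates, standard errors, t and p values

# Evaluate the fitted curve on a regular grid and overlay it on the points.
grid <- data.frame(Damage_cm = seq(0, 10, length.out = 200))
grid$fit <- predict(m, newdata = grid)

plot(sim$Damage_cm, sim$Gossypol, xlab = "Damage_cm", ylab = "Gossypol")
lines(grid$Damage_cm, grid$fit, col = "red")
```

`coef.table` covers the request for the parameter estimates together with their standard errors and p values.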
If you prefer ggplot2:
ggplot(data, aes(x = Damage_cm, y = Gossypol)) +
geom_point() +
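geom_smooth(method = "nls",
formula = y ~ Y + A * (1 - B^x),
method.args = list(start = c(Y=1000, A=3000, B=0.5)), # recent ggplot2 versions pass fitting arguments via method.args
se = FALSE) # predict() for nls fits returns no standard errors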
Though I'm afraid the standard errors would have to be simulated outside of ggplot.

Faceting bars in ggplot2

I have this problem: I want to build a stacked bar plot with the faceting capabilities, so I can compare the distribution of frequencies for five common categories, within two different objects, separated according to three groups. I have six objects, five categories and three groups. The problem is that each group has only two different and exclusive objects to plot, but so far I can only produce a plot in which the six objects are plotted across the three groups. This is not optimal, since for each group I have four objects with no data.
Is it possible to plot just two objects for each group with the faceting capabilities?
EDITED
This is my data:
structure(list(Face = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L), .Label = c("LGH002", "LGH003", "LGM009",
"SCM018", "VAH022", "VAM028"), class = "factor"), Race = structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
.Label = c("1. Amerindian", "2. White", "3. Mestizo", "4. Other races",
"5. Cannot tell"), class = "factor"), Count = c(19L, 0L, 13L, 8L, 0L, 2L,
7L, 23L, 6L, 2L, 1L, 1L, 29L, 6L, 3L, 29L, 0L, 11L, 0L, 0L, 0L, 38L, 1L, 0L,
1L, 0L, 30L, 9L, 0L, 1L), Density = c(0.475, 0, 0.325, 0.2, 0,
0.05, 0.175, 0.575, 0.15, 0.05, 0.025, 0.025, 0.725, 0.15,
0.075, 0.725, 0, 0.275, 0, 0, 0, 0.95, 0.025, 0, 0.025, 0,
0.75, 0.225, 0, 0.025), School = structure(c(1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Municipal",
"Private Fee-Paying", "Private-Voucher"), class = "factor")),
.Names =c("Face", "Race", "Count", "Density", "School"),
class = "data.frame", row.names = c(NA, -30L))
This is the code I'm using to build the plot:
P <- ggplot(data = races.df, aes(x = Face, y = Density, fill = Race)) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)
P + facet_grid(School ~ ., scales="free") + coord_flip()
As you can imagine, I only want to see the x-values "SCM018" and "LGH002" in "Municipal"; "LGM009" and "LGH003" in "Private-Voucher"; and "VAH022" and "VAM028" in "Private Fee-Paying" (only two objects per group). Is it possible? Any help?
All the best,
Mauricio.

reorder barchart by sum of bar segments with ggplot/plyr

I need the 11 bars in the following stacked barplot to be reordered by the sum of the first two segments of each bar, i.e. sorted by the (red+green) segments in the plot.
> dput(q1m.bl)
structure(list(ItemA = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L), .Label = c("sehr wichtig", "wichtig", "unwichtig",
"keine Angabe"), class = "factor"), ItemQ = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 9L,
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L), .Label = c("PUSHERS_AA",
"PUSHERS_COM", "PUSHERS_BED", "PUSHERS_SEC", "PUSHERS_STAB",
"PUSHERS_COST", "PUSHERS_INNO", "PUSHERS_VAL", "PUSHERS_INDEP",
"PUSHERS_STDS", "PUSHERS_SRC"), class = "factor"), Counts = c(1L,
3L, 4L, 1L, 3L, 3L, 2L, 1L, 4L, 2L, 2L, 1L, 3L, 5L, 1L, 1L, 1L,
6L, 1L, 5L, 1L, 2L, 1L, 1L, 1L, 6L, 1L, 2L, 6L, 1L, 2L, 4L, 2L,
1L, 3L, 3L, 2L, 1L, 2L, 1L, 5L, 1L), blpos = c(0.111111111111111,
0.444444444444444, 0.888888888888889, 1, 0.333333333333333, 0.666666666666667,
0.888888888888889, 1, 0.444444444444444, 0.666666666666667, 0.888888888888889,
1, 0.333333333333333, 0.888888888888889, 1, 0.111111111111111,
0.222222222222222, 0.888888888888889, 1, 0.555555555555556, 0.666666666666667,
0.888888888888889, 1, 0.111111111111111, 0.222222222222222, 0.888888888888889,
1, 0.222222222222222, 0.888888888888889, 1, 0.222222222222222,
0.666666666666667, 0.888888888888889, 1, 0.333333333333333, 0.666666666666667,
0.888888888888889, 1, 0.222222222222222, 0.333333333333333, 0.888888888888889,
1)), .Names = c("ItemA", "ItemQ", "Counts", "blpos"), row.names = c(NA,
-42L), class = "data.frame")
The plot ...
ggplot(q1m.bl, aes(x = ItemQ, y = Counts, fill = ItemA)) +
geom_bar(stat="identity", position="fill") +
geom_text(aes(y = blpos, label = Counts), hjust = 1) +
theme(axis.text.x=element_text(angle=90, hjust = 0), text = element_text(size=10)) +
coord_flip()
Ugh, not enough rep points to embed images. Sorry for the inconvenience. Plot is here: http://i.stack.imgur.com/am0Ud.png
I played around with arrange() and after checking the data frame itself, I thought the following sorting should do the trick. (Note: blpos means "bar label position" and are the positions of the various numbers in the plot.) But plotting this "sorted" data frame leads to the identical plot as above. I do not understand which information to change to change the plotting order of the ItemQ column.
q1m.bl.s <- arrange(q1m.bl, ItemA, desc(blpos))
ggplot(q1m.bl.s, ....
What's the best approach anyway? Should I manipulate the df (using ddply/arrange/reorder/etc.) prior to plotting? Because I tend to think this is a presentation issue and should be done inside ggplot. Does it even matter? The "ggplot ordered barchart" questions I found on SO seem to use both approaches; yet none I found was referring to stacked bar segments and using factor data... hence this new question.
Thank you very much for enlightening me!
It's all about re-ordering the factor levels of the ItemQ variable.
d <- subset(q1m.bl, ItemA %in% c("sehr wichtig", "wichtig"))
totals <- aggregate(d$Counts, list(ItemQ = d$ItemQ), sum)
ItemQ.order <- as.character(totals[order(-totals$x), ]$ItemQ)
q1m.bl$ItemQ <- factor(q1m.bl$ItemQ, levels = ItemQ.order)
Then you should be able to run the code exactly as you provided it and it will produce this:
EDIT (digisus): konvas, I am just re-adding your first answer showing the use of dplyr because, even though I do not feel comfortable with it/do not fully get it, I am sure others can benefit from it. :-) So, with your permission I repost it here:
library(dplyr) # the pipeline below uses dplyr verbs and %>%, which plyr does not provide
ItemQ.order <- q1m.bl %>%
group_by(ItemQ) %>%
filter(ItemA %in% c("sehr wichtig", "wichtig")) %>%
summarise(total = sum(Counts)) %>%
arrange(-total) %>%
select(ItemQ) %>%
unlist %>%
as.character
q1m.bl$ItemQ <- factor(q1m.bl$ItemQ, levels = ItemQ.order)
library(ggplot2)
fac_ord <- function(seed){
set.seed(seed)
return(sample(letters[1:4]))
}
# this seed simulates arbitrary sortings
seed <- 2
fac_ord(seed)
val = c(1,2,3,4,2,2,2,2)
fac = factor(c("a","b","c","d","a","b","c","d"),
levels=fac_ord(seed),
labels=fac_ord(seed),
ordered=FALSE)
dif = c(rep("x",4),rep("y",4))
df = data.frame(val = val, fac = fac, dif = dif)
ggplot(df, aes(x=fac, y=val, fill=dif)) +
geom_bar(stat="identity") +
labs(title = sprintf("seed = %d / %s", seed, paste(fac_ord(seed),collapse=",")))
As the example shows, ggplot uses the internal level order of fac as the plotting order. To influence the plotted order, write a function that returns the intended order (based on whatever facts and values you need), use it to create the factor fac, and then plot with this properly ordered factor.
The intended result can also be reached by application of reorder() for reordering the levels of the factor.
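A minimal sketch of the reorder() route, assuming a small toy data frame (`d`) with hypothetical item and answer names rather than the question's q1m.bl. Counts outside the first two answer categories are set to zero, so the per-item sum only reflects the segments of interest:

```r
# Toy stand-in for q1m.bl: three items, three answer categories each.
d <- data.frame(
  item   = factor(rep(c("p", "q", "r"), each = 3)),
  answer = factor(rep(c("very", "some", "none"), times = 3),
                  levels = c("very", "some", "none")),
  n      = c(1, 3, 5,  4, 4, 1,  2, 1, 6)
)

# Only the first two answer categories contribute to the sort key.
top2 <- ifelse(d$answer %in% c("very", "some"), d$n, 0)

# reorder() sorts levels by increasing score; negate to put the largest sum first.
d$item <- reorder(d$item, -top2, FUN = sum)

levels(d$item)
```

The sums of the first two categories are 4 for p, 8 for q and 3 for r, so the levels come out as q, p, r. Passing `FUN = sum` matters here because reorder() uses the mean by default.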

How to display date format (Sep-12) in bar chart and set y axis limit?

I have two questions on building a bar plot by using ggplot().
How to display date format (Sep-12)?
I would like to display the date in the format of Sep-12. My data is a quarterly summary. I would like to show Mar, Jun, Sep and Dec quarters. However, I used the as.Date(YearQuarter) within the ggplot() function. It shows a different sequence of Apr, July, Oct, Jan.
How to increase y axis limit?
The y axis stops at 70%, so one of the value labels falls outside the picture. I have added ylim(0,1) to increase the y limit to 1. However, I then lose the percentage format, as the y axis no longer displays percentages.
x4.can.t.m <- structure(list(NR_CAT = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L), .Label = c("0%", "1 to 84%", "85% +"
), class = "factor"), TYPE = structure(c(1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("PM BUSINESS", "PM CONSUMER",
"PREPAY"), class = "factor"), YearQuarter = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("2011-09-01",
"2011-12-01", "2012-03-01", "2012-06-01", "2012-09-01"), class = "factor"),
value = c(0.5, 0, 0.5, 0.35, 0, 0.65, 0.28, 0.02, 0.7, 0.4,
0, 0.6, 0.38, 0, 0.62, 0.43, 0.01, 0.56, 0.57, 0, 0.43, 0.35,
0, 0.65, 0.39, 0.01, 0.6, 0.55, 0, 0.45, 0.4, 0.02, 0.58,
0.35, 0.02, 0.63, 0.35, 0, 0.65, 0.55, 0.01, 0.44, 0.47,
0, 0.53)), .Names = c("NR_CAT", "TYPE", "YearQuarter", "value"
), row.names = c(NA, -45L), class = "data.frame")
This is my plot code:
x4.can.t.m$YearQuarter <- as.Date(x4.can.t.m$YearQuarter)
x4.can.t.d.bar <- ggplot(data=x4.can.t.m, aes(x=YearQuarter, y=value,fill=NR_CAT)) +
geom_bar(stat="identity",position = "dodge",ymax=NR_CAT+0.2) +
facet_wrap(~TYPE,ncol=1) +
geom_text(aes(label =paste(round(value*100,0),"%",sep="")),
position=position_dodge(width=0.9),
vjust=-0.25,size=3) +
scale_y_continuous(formatter='percent',ylim=1) +
labs(y="Percentage",x="Year Quarter") +
ylim(0,100%)
x4.can.t.d.bar +scale_fill_manual("Canopy Indicators",values=tourism.cols(c(6,9,8)))+
opts(title="Canopy Indicator: All Customers portout for Network
Issues",size=4)
It looks like you have an older version of ggplot2; the following is for ggplot2 0.9.2.1. I had to fix several things to make your plot work. Starting from your original definition of x4.can.t.m:
x4.can.t.m$YearQuarter <- format(as.Date(x4.can.t.m$YearQuarter),"%b-%y")
library("scales")
ggplot(data=x4.can.t.m, aes(x=YearQuarter, y=value, fill=NR_CAT)) +
geom_bar(stat="identity", position = "dodge") +
geom_text(aes(label = paste(round(value*100,0),"%",sep=""), group=NR_CAT),
position=position_dodge(width=0.9),
vjust=-0.25, size=3) +
scale_y_continuous("Percentage", labels=percent, limits=c(0,1)) +
labs(x="Year Quarter") +
scale_fill_discrete("Canopy Indicators") +
facet_wrap(~TYPE,ncol=1) +
ggtitle("Canopy Indicator: All Customers portout for Network Issues") +
theme(plot.title = element_text(size=rel(1.2)))
The first part of the question is just achieved by formatting YearQuarter into the format you wanted, leaving it as a string.
The second part specifies the limits in scale_y_continuous and uses the labels argument to specify the formatting function. Note that library("scales") is needed for this part to work.
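One caveat with keeping the labels as strings: a discrete axis sorts character labels alphabetically, so "Dec-11" would come before "Sep-11". A minimal base-R sketch of one way around this, assuming a short toy vector of quarter dates (`quarters`) instead of the full data frame, is to turn the labels into a factor whose levels follow the dates:

```r
# Toy quarter dates in arbitrary row order (stand-in for x4.can.t.m$YearQuarter).
quarters <- as.Date(c("2012-03-01", "2011-09-01", "2012-09-01",
                      "2011-12-01", "2012-06-01", "2011-09-01"))

# Month abbreviations from %b depend on the current locale.
lab <- format(quarters, "%b-%y")

# Levels in chronological order, derived from the dates rather than the strings.
chrono    <- unique(lab[order(quarters)])
quarter.f <- factor(lab, levels = chrono)

levels(quarter.f)
```

Using such a factor as the x aesthetic keeps the Sep-12 style labels while the bars stay in time order.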
