A barplot with two different variables in R

I would like to plot the following data on the same barplot. It is a length-frequency barplot showing the males and the females of a population by length class.
I am new to this and I don't know how to format my data here, but here is an example:
Lengthclass Both Males Females
60 7 5 2
70 10 5 5
80 11 6 5
90 4 2 2
100 3 3 0
110 3 0 3
120 1 1 0
130 0 0 0
140 1 0 1
150 2 0 2
If I use this code:
barplot()
it does not give me all three variables on the same plot.
I need a graph that looks like this, but in R.
Thank you :)

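# Length classes as interval labels: "[60,70)", "[70,80)", ...
classes <- levels(cut(60:100, breaks = c(60, 70, 80, 90, 100),
                      right = FALSE))
# One row per length class, one column per group
my.df <- data.frame(lengthclass = classes,
                    both = c(7, 10, 11, 4),
                    male = c(5, 5, 6, 2),
                    female = c(2, 5, 5, 2))
# barplot() wants the groups in rows and the classes in columns, hence t();
# beside = TRUE draws the bars side by side (grouped) instead of stacked
barplot(t(as.matrix(my.df[, 2:4])),
        beside = TRUE,
        names.arg = my.df$lengthclass,
        legend.text = TRUE,
        ylim = c(0, 12),
        ylab = "number of individuals",
        xlab = "Length class (cm)")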

Your barplot is known as a "grouped barplot" (in contrast to a "stacked barplot").
Arrange your data in a matrix and use beside=TRUE in your call to barplot(). Here is an example using a built-in dataset:
> VADeaths
Rural Male Rural Female Urban Male Urban Female
50-54 11.7 8.7 15.4 8.4
55-59 18.1 11.7 24.3 13.6
60-64 26.9 20.3 37.0 19.3
65-69 41.0 30.9 54.6 35.1
70-74 66.0 54.3 71.1 50.0
> barplot(VADeaths,beside=TRUE)
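A minimal sketch of the same approach applied to the full table from the question (all ten length classes, typed in from the listing above). The groups go in the rows and the length classes in the columns, so beside = TRUE produces one cluster of three bars per length class.

```r
# Full data from the question: rows = groups, columns = length classes (cm)
counts <- rbind(
  Both    = c(7, 10, 11, 4, 3, 3, 1, 0, 1, 2),
  Males   = c(5, 5, 6, 2, 3, 0, 1, 0, 0, 0),
  Females = c(2, 5, 5, 2, 0, 3, 0, 0, 1, 2)
)
colnames(counts) <- seq(60, 150, by = 10)

# beside = TRUE gives grouped (not stacked) bars; barplot() invisibly
# returns the bar midpoints, a 3 x 10 matrix here
mids <- barplot(counts, beside = TRUE,
                legend.text = rownames(counts),
                ylim = c(0, 12),
                xlab = "Length class (cm)",
                ylab = "Number of individuals")
```

The returned midpoints can be passed to text() to label each bar with its count.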

Related

glmmTMB, post-hoc testing and glht

I am using glmmTMB to analyze a negative binomial generalized linear mixed model (GLMM) where the dependent variable is count data (CT), which is over-dispersed.
There are 115 samples (rows) in the relevant data frame. There are two fixed effects (F1, F2) and a random intercept (R), within which is nested a further random effect (NR). There is also an offset, consisting of the natural logarithm of the total counts in each sample (LOG_TOT).
An example of a data frame, df, is:
CT F1 F2 R NR LOG_TOT
77 0 0 1 1 12.9
167 0 0 2 6 13.7
289 0 0 3 11 13.9
253 0 0 4 16 13.9
125 0 0 5 21 13.7
109 0 0 6 26 13.6
96 1 0 1 2 13.1
169 1 0 2 7 13.7
190 1 0 3 12 13.8
258 1 0 4 17 13.9
101 1 0 5 22 13.5
94 1 0 6 27 13.5
89 1 25 1 4 13.0
166 1 25 2 9 13.6
175 1 25 3 14 13.7
221 1 25 4 19 13.8
131 1 25 5 24 13.5
118 1 25 6 29 13.6
58 1 75 1 5 12.9
123 1 75 2 10 13.4
197 1 75 3 15 13.7
208 1 75 4 20 13.8
113 1 8 1 3 13.2
125 1 8 2 8 13.7
182 1 8 3 13 13.7
224 1 8 4 18 13.9
104 1 8 5 23 13.5
116 1 8 6 28 13.7
122 2 0 1 2 13.1
115 2 0 2 7 13.6
149 2 0 3 12 13.7
270 2 0 4 17 14.1
116 2 0 5 22 13.5
94 2 0 6 27 13.7
73 2 25 1 4 12.8
61 2 25 2 9 13.0
185 2 25 3 14 13.8
159 2 25 4 19 13.7
125 2 25 5 24 13.6
75 2 25 6 29 13.5
121 2 8 1 3 13.0
143 2 8 2 8 13.8
219 2 8 3 13 13.9
191 2 8 4 18 13.7
98 2 8 5 23 13.5
115 2 8 6 28 13.6
110 3 0 1 2 12.8
123 3 0 2 7 13.6
210 3 0 3 12 13.9
354 3 0 4 17 14.4
160 3 0 5 22 13.7
101 3 0 6 27 13.6
69 3 25 1 4 12.6
112 3 25 2 9 13.5
258 3 25 3 14 13.8
174 3 25 4 19 13.5
171 3 25 5 24 13.9
117 3 25 6 29 13.7
38 3 75 1 5 12.1
222 3 75 2 10 14.1
204 3 75 3 15 13.5
235 3 75 4 20 13.7
241 3 75 5 25 13.8
141 3 75 6 30 13.9
113 3 8 1 3 12.9
90 3 8 2 8 13.5
276 3 8 3 13 14.1
199 3 8 4 18 13.8
111 3 8 5 23 13.6
109 3 8 6 28 13.7
135 4 0 1 2 13.1
144 4 0 2 7 13.6
289 4 0 3 12 14.2
395 4 0 4 17 14.6
154 4 0 5 22 13.7
148 4 0 6 27 13.8
58 4 25 1 4 12.8
136 4 25 2 9 13.8
288 4 25 3 14 14.0
113 4 25 4 19 13.5
162 4 25 5 24 13.7
172 4 25 6 29 14.1
2 4 75 1 5 12.3
246 4 75 3 15 13.7
247 4 75 4 20 13.9
114 4 8 1 3 13.1
107 4 8 2 8 13.6
209 4 8 3 13 14.0
190 4 8 4 18 13.9
127 4 8 5 23 13.5
101 4 8 6 28 13.7
167 6 0 1 2 13.4
131 6 0 2 7 13.5
369 6 0 3 12 14.5
434 6 0 4 17 14.9
172 6 0 5 22 13.8
126 6 0 6 27 13.8
90 6 25 1 4 13.1
172 6 25 2 9 13.7
330 6 25 3 14 14.2
131 6 25 4 19 13.7
151 6 25 5 24 13.9
141 6 25 6 29 14.2
7 6 75 1 5 12.2
194 6 75 2 10 14.2
280 6 75 3 15 13.7
253 6 75 4 20 13.8
45 6 75 5 25 13.4
155 6 75 6 30 13.9
208 6 8 1 3 13.5
97 6 8 2 8 13.5
325 6 8 3 13 14.3
235 6 8 4 18 14.1
112 6 8 5 23 13.6
188 6 8 6 28 14.1
The random and nested random effects are treated as factors. The fixed effect F1 has the values 0, 1, 2, 3, 4 and 6. The fixed effect F2 has the values 0, 8, 25 and 75. I am treating the fixed effects as continuous, rather than ordinal, because I would like to identify monotonic, unidirectional changes in the dependent variable CT rather than up-and-down changes.
I previously used the lme4 package to analyze the data as a mixed model:
library(lme4)
m1 <- lmer(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE)
I followed this with glht from the multcomp package for post-hoc analysis, using the formula approach:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
"F1 == 0",
"F1 + 8*F1:F2 == 0",
"F1 + 25*F1:F2 == 0",
"F1 + 75*F1:F2 == 0",
"F1 + (27)*F1:F2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
"F2 + 1*F1:F2 == 0",
"F2 + 2*F1:F2 == 0",
"F2 + 3*F1:F2 == 0",
"F2 + 4*F1:F2 == 0",
"F2 + 6*F1:F2 == 0",
"F2 + (3.2)*F1:F2 == 0"))
glht_omni <- glht(m1)
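To make explicit what a linfct string such as "F1 + 8*F1:F2 == 0" computes, here is a small sketch. It assumes an ordinary lm() fitted to simulated data in place of the mixed model, since the arithmetic only uses the fixed-effect coefficients and their covariance matrix. The slope of F1 at F2 = v is the linear combination L b with L = (0, 1, 0, v), and its standard error is sqrt(L' V L).

```r
# Assumption: lm() on simulated data stands in for the mixed model
set.seed(1)
sim <- expand.grid(F1 = c(0, 1, 2, 3, 4, 6), F2 = c(0, 8, 25, 75), rep = 1:5)
sim$y <- 5 + 0.3 * sim$F1 + 0.02 * sim$F2 + 0.01 * sim$F1 * sim$F2 +
  rnorm(nrow(sim), sd = 0.5)
fit <- lm(y ~ F1 * F2, data = sim)

b <- coef(fit)   # order: (Intercept), F1, F2, F1:F2
V <- vcov(fit)   # covariance matrix of the coefficients

# "F1 + v*F1:F2 == 0" corresponds to the contrast vector L = (0, 1, 0, v)
slope_F1_at <- function(v) {
  L <- c(0, 1, 0, v)
  est <- sum(L * b)                        # L %*% b
  se <- sqrt(drop(t(L) %*% V %*% L))       # sqrt(L' V L)
  c(estimate = est, SE = se, ratio = est / se)
}

F2_values <- c(0, 8, 25, 75, 27)
res <- t(sapply(F2_values, slope_F1_at))
rownames(res) <- paste0("F2=", F2_values)
res
```

At F2 = 0 the combination reduces to the F1 coefficient alone, which is why "F1 == 0" in the first glht call tests the F1 slope at F2 = 0.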
Here is the corresponding negative binomial glmmTMB model, which I now prefer:
library(glmmTMB)
m2 <- glmmTMB(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE, family="nbinom2")
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (or its more recent successor, emmeans).
I followed Ben's suggestion, running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
and I can then use emmeans on the glmmTMB object. For example,
as.glht(emmeans(m2,~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. I can also change F1 and F2 to factors and then try this:
as.glht(emmeans(m2,~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
0, 0 == 0 -6.721
1, 0 == 0 -6.621
2, 0 == 0 -6.342
3, 0 == 0 -6.740
4, 0 == 0 -6.474
6, 0 == 0 -6.967
0, 8 == 0 -6.694
1, 8 == 0 -6.651
2, 8 == 0 -6.227
3, 8 == 0 -6.812
4, 8 == 0 -6.371
6, 8 == 0 -6.920
0, 25 == 0 -6.653
1, 25 == 0 -6.648
2, 25 == 0 -6.282
3, 25 == 0 -6.766
4, 25 == 0 -6.338
6, 25 == 0 -6.702
0, 75 == 0 -6.470
1, 75 == 0 -6.642
2, 75 == 0 -6.091
3, 75 == 0 -6.531
4, 75 == 0 -5.762
6, 75 == 0 -6.612
But, again, I am not sure how to bend this output to my will. If some kind person could tell me how to correctly carry over the use of formulae in glht and linfct to the emmeans scenario with glmmTMB, I would be very grateful. I have read all the manuals and vignettes until I am blue in the face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?), I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
The glht software and post hoc testing carry over directly to the glmmADMB package, but glmmADMB is 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each with 300,000 instances of the negative binomial mixed model, so speed is essential.
Many thanks for your suggestions and help!
The second argument (specs) to emmeans() is not the same as the linfct argument in glht(), so you can't use it the same way; you have to call emmeans() the way it is intended. The as.glht() function converts the result to a glht object, but that is not really necessary, since the emmeans summary gives similar results.
I think the results you were trying to get are obtainable via
emmeans(m2, ~ F2, at = list(F2 = c(0, 8, 25, 75)))
(using the original model with the predictors as quantitative variables). This will compute the adjusted means holding F1 at its average, and at each of the specified values of F2.
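By default, emmeans() reduces a numeric covariate to its mean when it builds the reference grid, so these adjusted means are simply model predictions at F1 = mean(F1) and the listed F2 values. A small sketch of that idea, assuming an lm() on simulated data with no offset (for the nbinom2 model, emmeans() reports on the link, here log, scale unless type = "response" is requested):

```r
# Assumption: lm() on simulated data, no offset, linear response scale
set.seed(2)
sim <- expand.grid(F1 = c(0, 1, 2, 3, 4, 6), F2 = c(0, 8, 25, 75), rep = 1:5)
sim$y <- 5 + 0.3 * sim$F1 + 0.02 * sim$F2 + 0.01 * sim$F1 * sim$F2 +
  rnorm(nrow(sim), sd = 0.5)
fit <- lm(y ~ F1 * F2, data = sim)

# Reference grid: F1 held at its mean, F2 at the requested values
grid <- data.frame(F1 = mean(sim$F1), F2 = c(0, 8, 25, 75))
grid$pred <- predict(fit, newdata = grid)
grid
```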
Please look at the documentation for emmeans(). In addition, there are lots of vignettes that provide explanations and examples -- starting with https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html.
Following the advice of my excellent statistical consultant, I think the solution below provides what I had previously obtained using glht and linfct.
The slopes for F1 are calculated at the various levels of F2 by using contrast and emmeans to compute the difference in the dependent variable between two values of F1 separated by one unit (i.e. c(0,1)). (Since the regression is linear, the two values of F1 are arbitrary, provided they are separated by one unit, e.g. c(3,4).) The same applies, vice versa, to the slopes of F2.
Thus, slopes of F1 at F2 = 0, 8, 25, 75 and 27 (27 is the average of F2):
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=0)),list(c(-1,1)))
(above equivalent to: summary(m2)$coefficients$cond["F1",])
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=8)),list(c(-1,1)))
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=25)),list(c(-1,1)))
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=75)),list(c(-1,1)))
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=27)),list(c(-1,1)))
and slopes of F2 at F1 = 0, 1, 2, 3, 4, 6 and 3.2 (3.2 is the average of F1, excluding the zero value):
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=0)),list(c(-1,1)))
(above equivalent to: summary(m2)$coefficients$cond["F2",])
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=1)),list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=2)),list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=3)),list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=4)),list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=6)),list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=3.2)),list(c(-1,1)))
Interaction of F1 and F2 (the change in the F1 slope per unit of F2), from the 2 x 2 grid F1 = 0, 1 and F2 = 0, 1:
contrast(emmeans(m2, specs=c("F1","F2"), at=list(F1=c(0,1),F2=c(0,1))),list(c(1,-1,-1,1)))
(above equivalent to: summary(m2)$coefficients$cond["F1:F2",])
From the results returned by contrast(), one can pick out as desired the estimated slope (estimate), its standard error (SE), the z statistic for testing the estimated slope against a null hypothesized slope of zero (z.ratio, computed as estimate divided by SE) and the corresponding P value (p.value, computed as 2*pnorm(-abs(z.ratio))).
For example:
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=0)),list(c(-1,1)))
yields:
NOTE: Results may be misleading due to involvement in interactions
contrast estimate SE df z.ratio p.value
c(-1, 1) 0.001971714 0.002616634 NA 0.754 0.4511
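The equivalences noted in the code above can be checked numerically: a one-unit contrast in F1 at F2 = v equals b_F1 + v * b_F1:F2 whatever the starting value of F1, and the 2 x 2 contrast c(1,-1,-1,1) equals the F1:F2 coefficient. This sketch assumes an lm() on simulated data instead of the glmmTMB model; the identities hold for any predictor that is linear in F1, F2 and F1:F2, which for the nbinom2 model means the link (log) scale.

```r
# Assumption: lm() on simulated data in place of the glmmTMB model
set.seed(3)
sim <- expand.grid(F1 = c(0, 1, 2, 3, 4, 6), F2 = c(0, 8, 25, 75), rep = 1:5)
sim$y <- 5 + 0.3 * sim$F1 + 0.02 * sim$F2 + 0.01 * sim$F1 * sim$F2 +
  rnorm(nrow(sim), sd = 0.5)
fit <- lm(y ~ F1 * F2, data = sim)
b <- coef(fit)

# Model predictions for given F1 and F2 values (a scalar F2 is recycled)
pred <- function(F1, F2) unname(predict(fit, newdata = data.frame(F1 = F1, F2 = F2)))

# Contrast c(-1, 1) over F1 = c(a, a + 1) with F2 fixed at v
slope_F1 <- function(v, a = 0) sum(c(-1, 1) * pred(c(a, a + 1), v))

# Contrast c(1, -1, -1, 1) over (F1, F2) = (0,0), (1,0), (0,1), (1,1)
int_contrast <- sum(c(1, -1, -1, 1) * pred(c(0, 1, 0, 1), c(0, 0, 1, 1)))

c(slope_at_F2_8    = slope_F1(8),
  same_from_3_to_4 = slope_F1(8, a = 3),
  b1_plus_8_b3     = unname(b["F1"] + 8 * b["F1:F2"]),
  interaction      = int_contrast,
  b3               = unname(b["F1:F2"]))
```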
Postscript added 1.25 yrs later:
The above gives the correct solutions, but, as Russell Lenth pointed out, the answers are more easily obtained using emtrends. However, I have selected this answer as correct, since there may be some didactic value in showing how to calculate slopes with emmeans by finding the change in the predicted dependent variable when the independent variable changes by 1.

Boxplot doesn't show all the groups in R

I wrote this code to run an ANOVA on a simple data frame, and I want to draw a boxplot from it:
DF <- read.table('chromium.txt',header=TRUE)
Chromium.aov <- aov(Concentration ~ Lab,data=DF)
print(summary(Chromium.aov))
with(DF,boxplot(Concentration,Lab))
Here is the text file:
Lab Concentration
1 26.1
1 21.5
1 22.0
1 22.6
1 24.9
1 22.6
1 23.8
1 23.2
2 18.3
2 19.7
2 18.0
2 17.4
2 22.6
2 11.6
2 11.0
2 15.7
3 19.1
3 13.9
3 15.7
3 18.6
3 19.1
3 16.8
3 25.5
3 19.7
4 30.7
However, R only shows two boxplots, for labs 1 and 2, and none for labs 3 and 4. How can I fix this?
boxplot(DF$Concentration ~ DF$Lab)
The syntax you used draws one box with all the values of 'Concentration' and another with the values of 'Lab'.
When you do with(DF,boxplot(Concentration,Lab)), you are providing two sets of values to be plotted: Concentration and Lab. You want to split Concentration by the unique values of Lab and then create the boxplot.
boxplot(split(DF$Concentration, DF$Lab))
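A self-contained sketch with the data from the question typed in (instead of reading chromium.txt). It also converts Lab to a factor: read.table() reads Lab as an integer, and with a numeric Lab, aov(Concentration ~ Lab) fits a single linear trend (1 df) rather than a one-way ANOVA across the four labs (3 df).

```r
# Data from the question (chromium.txt), entered inline
DF <- data.frame(
  Lab = rep(1:4, times = c(8, 8, 8, 1)),
  Concentration = c(26.1, 21.5, 22.0, 22.6, 24.9, 22.6, 23.8, 23.2,
                    18.3, 19.7, 18.0, 17.4, 22.6, 11.6, 11.0, 15.7,
                    19.1, 13.9, 15.7, 18.6, 19.1, 16.8, 25.5, 19.7,
                    30.7)
)
# Treat Lab as a grouping variable rather than a number
DF$Lab <- factor(DF$Lab)

# One-way ANOVA: 3 df for Lab, 21 residual df
fit <- aov(Concentration ~ Lab, data = DF)
summary(fit)

# Formula interface: one box per level of Lab
b <- boxplot(Concentration ~ Lab, data = DF, xlab = "Lab", ylab = "Concentration")
b$n   # observations per box; lab 4 has a single value, drawn as a line
```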

How to dynamically change the breaks and limits when plotting boxplots with ggplot2

I have the following data
head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
The summary stats:
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
name type na mean disp median mad min max nlevs
1 Ozone integer 37 42.129310 32.987885 31.5 25.94550 1.0 168.0 0
2 Solar.R integer 7 185.931507 90.058422 205.0 98.59290 7.0 334.0 0
3 Wind numeric 0 9.957516 3.523001 9.7 3.40998 1.7 20.7 0
4 Temp integer 0 77.882353 9.465270 79.0 8.89560 56.0 97.0 0
5 Month integer 0 6.993464 1.416522 7.0 1.48260 5.0 9.0 0
6 Day integer 0 15.803922 8.864520 16.0 11.86080 1.0 31.0 0
Now I want to plot boxplots of the continuous variables, and I have the following code, which I was using for another dataset:
library(ggplot2)
library(reshape2) # provides melt()
d <- melt(df)
p <- ggplot(d) +
geom_boxplot(aes(x=variable, y=value, color=variable, fill=variable)) +
labs(x="", y="", title="Box Plot of Variables", subtitle="", caption="") + my_theme() +
scale_y_continuous(breaks=c(seq(0,100000,20000)), limits = c(0,100000)) +
theme(plot.title = element_text(lineheight=.8, face="bold", colour = "steelblue", hjust =0.5, vjust = 2, size = 11)) +
theme(text = element_text(size=10), axis.text.x = element_text(angle=45, hjust=1))
Obviously, the breaks and limits parameters in scale_y_continuous() have to be changed for this data, which means they have to be changed every time I want to plot a boxplot. This approach doesn't give me the flexibility to make the code general.
Say that I want it to be included in my shiny app.
How can I change the breaks and limits parameters dynamically, depending on the input data, without setting them manually each time?
Add this variable to your code:
num.labels <- 10 #or whatever
Then update your call to scale_y_continuous to:
scale_y_continuous(breaks = seq(min(d$value, na.rm = TRUE), max(d$value, na.rm = TRUE), length.out = num.labels),
limits = c(min(d$value, na.rm = TRUE), max(d$value, na.rm = TRUE)))
You should be able to take it from there.
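A variation on the same idea, sketched as a hypothetical helper (axis_scale is not part of ggplot2): base R's pretty() picks round-number breaks that span the data, and using the range of those breaks as the limits keeps every break visible without dropping any values. NAs, which airquality contains, are ignored.

```r
# Hypothetical helper: derive y-axis breaks and limits from the plotted values
axis_scale <- function(values, n = 10) {
  rng <- range(values, na.rm = TRUE)   # ignore missing values
  brks <- pretty(rng, n = n)           # "nice" breaks covering the data range
  list(breaks = brks, limits = range(brks))
}

# Example: all columns of the built-in airquality data as one vector
vals <- unlist(airquality, use.names = FALSE)
ax <- axis_scale(vals, n = 10)
ax$breaks
ax$limits
```

In the plot, the hard-coded scale would become scale_y_continuous(breaks = ax$breaks, limits = ax$limits), with ax computed from d$value (for example inside a Shiny reactive). Omitting breaks and limits altogether also lets ggplot2 choose them from the data.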

Time-series data visualization

I have a pretty large data frame in R stored in long format. It contains body temperature data collected from 40 different individuals at 10-second intervals over 16 days. Individuals have been exposed to two conditions (Cond1 and Cond2). It essentially looks like this:
ID Cond1 Cond2 Day ToD Temp
1 A B 1 18.0 37.1
1 A B 1 18.3 37.2
1 A B 2 18.6 37.5
2 B A 1 18.0 37.0
2 B A 1 18.3 36.9
2 B A 2 18.6 36.9
3 A A 1 18.0 36.8
3 A A 1 18.3 36.7
3 A A 2 18.6 36.7
...
I want to create four separate line plots, one for each combination of conditions (AB, BA, AA, BB), showing mean temp over time (days 1-16).
P.S. ToD stands for time of day. I'm not sure whether I need it to create the plot.
So far I have tried to define the dataset as time series by doing
ts <- ts(data=dataset$Temp, start=1, end=16, frequency=8640)
plot(ts)
This returns a plot of Temp, but I can't figure out how to define condition values for breaking up the data.
Edit:
Essentially, I want a plot like this one, but one for each group separately, using mean Temp values. This plot is for just one individual in one condition; I want one that shows the mean of all individuals in the same condition.
You can use summarise and group_by to group the data by condition and then plot it. Is this what you're looking for?
library(dplyr)
library(ggplot2)
## I created a dataframe df that looks like this:
ID Cond1 Cond2 Day ToD Temp
1 1 A B 1 18.0 37.1
2 1 A B 1 18.3 37.2
3 1 A B 2 18.6 37.5
4 2 B A 1 18.0 37.0
5 2 B A 1 18.3 36.9
6 2 B A 2 18.6 36.9
7 3 A A 1 18.0 36.8
8 3 A A 1 18.3 36.7
9 3 A A 2 18.6 36.7
df$Cond <- paste0(df$Cond1, df$Cond2)
d <- summarise(group_by(df, Cond, Day), t = mean(Temp))
ggplot(d, aes(Day, t, color = Cond)) + geom_line()
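Since the question asks for separate plots per condition combination, here is a base-R sketch that produces one panel per group. It uses only the small example rows shown in the question, so it has three groups rather than four; the column names are assumed to match the real data.

```r
# Small example in the question's long format (the real data has 40 IDs x 16 days)
df <- data.frame(
  ID    = rep(1:3, each = 3),
  Cond1 = rep(c("A", "B", "A"), each = 3),
  Cond2 = rep(c("B", "A", "A"), each = 3),
  Day   = rep(c(1, 1, 2), times = 3),
  ToD   = rep(c(18.0, 18.3, 18.6), times = 3),
  Temp  = c(37.1, 37.2, 37.5, 37.0, 36.9, 36.9, 36.8, 36.7, 36.7)
)

# Combine the two condition columns into one label such as "AB"
df$Cond <- paste0(df$Cond1, df$Cond2)

# Mean temperature per condition and day, across individuals and readings
daily <- aggregate(Temp ~ Cond + Day, data = df, FUN = mean)

# One line plot per condition combination, arranged in a grid
conds <- sort(unique(daily$Cond))
op <- par(mfrow = c(ceiling(length(conds) / 2), 2))
for (cc in conds) {
  sub <- daily[daily$Cond == cc, ]
  sub <- sub[order(sub$Day), ]
  plot(sub$Day, sub$Temp, type = "l", main = cc,
       xlab = "Day", ylab = "Mean Temp")
}
par(op)
```

With ggplot2, adding + facet_wrap(~ Cond) to the plot in the answer gives the same one-panel-per-group layout.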
which results in:

Plotting a new point value in a boxplot. R and ggplot2

I have a simple data frame called msq:
sex wing index
1 h 54 67.4
2 m 60.5 67.9
3 m 60 64.5
4 m 59 66.6
5 m 63.5 63.3
6 m 63 66.7
7 m 61.5 71.8
8 m 62 67.9
9 m 63 67.8
10 m 62.5 72.7
11 m 61.5 70.3
12 h 54.5 70.7
13 m 60 61.1
14 m 63.5 50.9
15 m 63 72.1
My intention is to make a boxplot with ggplot for which I use this code that works fine:
ggplot(msq, aes("index",index))+ geom_boxplot (aes(group="sex"))
and then to plot an outlier that should stand alone at the top of the graph (a value of 73.9). The problem is that if I include it in the data set, the boxplot "absorbs" it, making the whisker longer. I have looked at Hmisc and at stat_summary, but I can't get a clear idea.
Thank you.
You could use geom_point to add points to a plot generated with ggplot2.
library(ggplot2)
ggplot(msq, aes(sex, index)) + # Note. I modified the aes call
geom_boxplot() +
geom_point(aes(y = 73.9)) # add points
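Why the box "absorbs" 73.9: whiskers extend to the most extreme value within 1.5 x IQR of the box, and only values beyond that are drawn as separate points. The sketch below checks this with base R's boxplot.stats() on all index values pooled together (as in the question's single-box plot). ggplot2's geom_boxplot uses the same 1.5 x IQR rule but computes the hinges with quantile(), so its exact numbers can differ slightly.

```r
# Index values from the question plus the extra point 73.9
index <- c(67.4, 67.9, 64.5, 66.6, 63.3, 66.7, 71.8, 67.9, 67.8,
           72.7, 70.3, 70.7, 61.1, 50.9, 72.1)
s <- boxplot.stats(c(index, 73.9), coef = 1.5)

# Upper fence = upper hinge + 1.5 * (upper hinge - lower hinge)
upper_fence <- s$stats[4] + 1.5 * (s$stats[4] - s$stats[2])

s$stats       # lower whisker, lower hinge, median, upper hinge, upper whisker
upper_fence
s$out         # values drawn individually as outliers
```

Because 73.9 lies below the upper fence, it simply becomes the end of the upper whisker. Keeping it out of the data given to geom_boxplot and adding it as its own geom_point layer, as in the answer, avoids that.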
