When I write the following code
ddply(milkers, .(dim_cat, lact_cat), function(x) mean(x$milkyield))
I get the following output
The mean calculations regarding milk production by class of stock (1 vs 2) are correct. I would like to end up with a table more like the one below.
Effectively I am trying to get the number of animals in each time period and calculate their mean milk production. The problem is that it is calculating the total number of animals for all time periods and mean milk production for all time periods.
The code I used to generate this data is below.
heiferdat <- subset(milkers, lact_cat== 1)
cowdat <- subset(milkers, lact_cat== 2)
ddply(milkers, .(dim_cat), function(x) c(Heifers = sum(milkers$lact_cat==1), H_Milk= mean(heiferdat$milkyield), Cows = sum(milkers$lact_cat==2), C_Milk= mean(cowdat$milkyield)))
I had anticipated that in this code the .(dim_cat) variable would be applied to the function to restrict the sum and mean functions to only include animals in the correct time period.
I am looking for advice on how to get output with one row per time period, showing the number of animals in each lact_cat class and the mean milk production for each lact_cat.
Thank you
The following is a subset of the data that I am working with.
dput(milkers[180:200, c(11, 25, 26)])
dput(heiferdat[1:20, c(11, 25, 26)])
dput(cowdat[1:20, c(11, 25, 26)])
> dput(milkers[180:200, c(11, 25, 26)])
structure(list(milkyield = structure(c(8.42, 38.32, 14.27, 7.68,
16.59, 17.19, 24.45, 33.47, 36.16, 25.88, 11.61, 18.96, 11.27,
33.6, 21.57, 20.87, 9.62, 7.93, 21.02, 17.75, 22.01), label = "Milk (L)", class = c("labelled",
"numeric")), dim_cat = structure(c(5L, 3L, 7L, 7L, 2L, 7L, 2L,
2L, 2L, 3L, 6L, 6L, 2L, 3L, 6L, 6L, 6L, 6L, 6L, 7L, 6L), .Label = c("<31",
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled",
"factor"), label = "Days in Milk"), lact_cat = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), row.names = 180:200, class = "data.frame")
> dput(heiferdat[1:20, c(11, 25, 26)])
structure(list(milkyield = structure(c(14.27, 17.19, 11.61, 18.96,
11.27, 21.57, 20.87, 9.62, 7.93, 21.02, 17.75, 22.01, 25.15,
11.75, 12.6, 15.62, 19.29, 8.85, 15.52, 11.62), label = "Milk (L)", class = c("labelled",
"numeric")), dim_cat = structure(c(7L, 7L, 6L, 6L, 2L, 6L, 6L,
6L, 6L, 6L, 7L, 6L, 6L, 6L, 6L, 7L, 6L, 6L, 6L, 6L), .Label = c("<31",
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled",
"factor"), label = "Days in Milk"), lact_cat = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), row.names = c(182L,
185L, 190L, 191L, 192L, 194L, 195L, 196L, 197L, 198L, 199L, 200L,
201L, 202L, 203L, 204L, 205L, 206L, 207L, 208L), class = "data.frame")
> dput(cowdat[1:20, c(11, 25, 26)])
structure(list(milkyield = structure(c(15.73, 14.56, 16.94, 16.25,
39.09, 9.79, 8.41, 3.05, 38.89, 11.7, 29.89, 19.73, 18.2, 20.63,
20.32, 52.99, 10.11, 8.08, 10.84, 33.75), label = "Milk (L)", class = c("labelled",
"numeric")), dim_cat = structure(c(3L, 6L, 6L, 2L, 3L, 7L, 6L,
7L, 3L, 7L, 3L, 6L, 3L, 6L, 2L, 2L, 7L, 6L, 7L, 7L), .Label = c("<31",
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled",
"factor"), label = "Days in Milk"), lact_cat = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")
Following @DanChaltiel's advice to use dplyr, here is a dplyr approach:
library(dplyr)
all_summary = milkers %>%
group_by(dim_cat, lact_cat) %>%
summarise(avg = mean(milkyield),
num = n())
At this point you have all the summary information calculated. The following code is just formatting/presentation.
heifer_summary = all_summary %>%
filter(lact_cat == 1) %>%
select(dim_cat, Heifers = num, H_Milk = avg)
cow_summary = all_summary %>%
filter(lact_cat == 2) %>%
select(dim_cat, Cows = num, C_Milk = avg)
arranged_summary = full_join(heifer_summary, cow_summary, by = "dim_cat") %>%
select(dim_cat, Heifers, H_Milk, Cows, C_Milk) %>%
arrange(dim_cat)
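For completeness, the original ddply call can also be made to produce this table directly. This is a minimal sketch, assuming the problem in the question is that the function refers to the global milkers, heiferdat and cowdat objects instead of its argument x (the rows belonging to one dim_cat). The toy data frame is made up for illustration and only mimics the column names.

```r
library(plyr)

# Made-up toy data with the same column names as in the question.
milkers <- data.frame(
  dim_cat   = factor(c("<31", "<31", "<31", "31-90", "31-90", "31-90"),
                     levels = c("<31", "31-90")),
  lact_cat  = factor(c(1, 1, 2, 1, 2, 2)),
  milkyield = c(10, 20, 30, 12, 20, 40)
)

# x holds only the rows for one dim_cat, so every count and mean
# is computed from x rather than from the full data frame.
res <- ddply(milkers, .(dim_cat), function(x) c(
  Heifers = sum(x$lact_cat == 1),
  H_Milk  = mean(x$milkyield[x$lact_cat == 1]),
  Cows    = sum(x$lact_cat == 2),
  C_Milk  = mean(x$milkyield[x$lact_cat == 2])
))
res
```

Note that attaching plyr after dplyr masks some dplyr functions, so in a session that uses both it is safer to call plyr::ddply() without attaching plyr. A time period with no animals in one class gives a count of 0 and a mean of NaN.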
I have an issue that is related to this one, but was unable to come to a solution for mine.
I have a reactive ggplot that I would like to update using a check box based on group data.
Currently, when I have ONE box selected, the data displays correctly. If I select more than one check box, I lose data points. See pictures below. I think I have to change the way I'm filtering my data and use droplevels somewhere but not sure how to integrate that (I'm new to shiny!). Any suggestions are appreciated!
WHOC_Sum_CMJ <- structure(list(Athlete = structure(c(1L, 1L, 1L, 7L, 7L, 7L,
7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L,
11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L,
14L, 14L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L), .Label = c("Athlete 1", "Athlete 10",
"Athlete 11", "Athlete 12", "Athlete 13", "Athlete 14", "Athlete 2",
"Athlete 3", "Athlete 4", "Athlete 5", "Athlete 6", "Athlete 7",
"Athlete 8", "Athlete 9"), class = "factor"), Date = structure(c(1L,
4L, 5L, 1L, 3L, 5L, 7L, 2L, 3L, 5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L,
5L, 7L, 1L, 3L, 6L, 7L, 2L, 4L, 5L, 8L, 1L, 3L, 5L, 7L, 1L, 3L,
5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L,
6L, 7L, 1L, 3L, 5L, 7L), .Label = c("2020-01-06", "2020-01-07",
"2020-01-13", "2020-01-14", "2020-01-21", "2020-01-23", "2020-01-27",
"2020-01-28"), class = "factor"), Position = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("DEF", "FWD", "GOALIE"), class = "factor"),
Program = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L,
4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("Navy", "Red", "RTP", "White"), class = "factor"),
mRSI = c(0.36, 0.38, 0.42, 0.46, 0.46, 0.47, 0.48, 0.31,
0.3, 0.24, 0.3, 0.29, 0.26, 0.28, 0.28, 0.36, 0.35, 0.43,
0.43, 0.28, 0.31, 0.28, 0.3, 0.33, 0.36, 0.35, 0.37, 0.37,
0.36, 0.37, 0.36, 0.3, 0.36, 0.34, 0.37, 0.26, 0.28, 0.34,
0.3, 0.39, 0.4, 0.43, 0.43, 0.43, 0.47, 0.46, 0.48, 0.34,
0.36, 0.33, 0.37, 0.28, 0.28, 0.34, 0.33), SystemWeight = c(617.21,
612.4, 620.45, 672.08, 682.23, 670.5, 663.41, 517.33, 515.23,
511.62, 517.85, 697.55, 703.92, 689.43, 691.33, 859.06, 845.9,
850.97, 851.84, 655.79, 665.09, 673.91, 667.92, 626.78, 632.92,
634.52, 624.88, 637.55, 645.6, 648.78, 646.64, 558.03, 563.23,
569.58, 560.95, 693.63, 695.54, 684.37, 684.58, 641.18, 660.8,
663.95, 660, 594.92, 596.97, 591.36, 585.64, 522.35, 518.17,
530.95, 523.5, 780.65, 789.81, 775.84, 775.48), FTCT = c(0.61,
0.62, 0.67, 0.74, 0.75, 0.77, 0.77, 0.54, 0.55, 0.44, 0.53,
0.53, 0.49, 0.53, 0.56, 0.6, 0.58, 0.68, 0.68, 0.53, 0.57,
0.54, 0.55, 0.61, 0.63, 0.64, 0.65, 0.59, 0.58, 0.59, 0.59,
0.51, 0.59, 0.59, 0.59, 0.53, 0.57, 0.63, 0.59, 0.76, 0.76,
0.79, 0.78, 0.67, 0.72, 0.72, 0.74, 0.63, 0.65, 0.61, 0.63,
0.49, 0.5, 0.53, 0.57), JumpHeight_cm = c(28.97, 29.78, 31.43,
35.83, 35.41, 36.59, 36.92, 27.56, 26.11, 26.15, 26.82, 26.15,
25.08, 24.98, 24.62, 29.39, 30.17, 32.42, 32.56, 26.6, 27.25,
25.58, 27.88, 29.17, 31.58, 28.48, 31.24, 33.73, 32.78, 33.09,
33.43, 29.73, 31.91, 30.65, 32.98, 24.15, 24.24, 27.57, 25.44,
26.68, 26.39, 27.43, 28.87, 35.44, 36.29, 35.71, 36.06, 26.79,
27.76, 26.82, 29.71, 28.69, 26.9, 31.12, 29.77), EJH = c(17.6,
18.58, 21.11, 26.66, 26.69, 28.08, 28.38, 14.99, 14.39, 11.41,
14.33, 13.8, 12.34, 13.29, 13.67, 17.58, 17.5, 22.03, 22.19,
14.03, 15.59, 13.92, 15.39, 17.7, 19.75, 18.37, 20.3, 19.99,
18.9, 19.62, 19.61, 15.09, 18.8, 18.18, 19.6, 12.78, 13.87,
17.28, 15.06, 20.44, 20.12, 21.74, 22.52, 23.8, 26.25, 25.68,
26.73, 16.99, 18.13, 16.42, 18.82, 14.09, 13.43, 16.61, 16.9
), Weight = c(62.94, 62.45, 63.27, 68.54, 69.57, 68.38, 67.65,
52.76, 52.54, 52.17, 52.81, 71.13, 71.78, 70.31, 70.5, 87.61,
86.26, 86.78, 86.87, 66.88, 67.82, 68.72, 68.11, 63.92, 64.54,
64.71, 63.72, 65.02, 65.84, 66.16, 65.94, 56.91, 57.44, 58.09,
57.2, 70.74, 70.93, 69.79, 69.81, 65.39, 67.39, 67.71, 67.31,
60.67, 60.88, 60.31, 59.72, 53.27, 52.84, 54.15, 53.39, 79.61,
80.54, 79.12, 79.08)), class = "data.frame", row.names = c(NA,
-55L))
```
checkboxGroupInput("Program", label = "Program", choices = unique(WHOC_Sum_CMJ$Program), selected = "Red", inline = TRUE)
# (Note: for the code I cut out some of the styling to make it more readable. That's why it looks different than the pictures).
```
```
renderPlot({
f <- WHOC_Sum_CMJ %>%
select(Date, Athlete, JumpHeight_cm, Program)%>%
filter(Program == input$Program)
p <- ggplot(f)+
geom_line(aes(x=Date, y=JumpHeight_cm, colour = Athlete))+
geom_point(aes(x=Date, y=JumpHeight_cm, colour = Athlete))+
theme_bw() +
labs(title = "Team Jump Height",
x = "Date",
y = "Jump Height (cm)")+
scale_x_date(limits = c(min = min(WHOC_Sum_CMJ$Date), max = max(WHOC_Sum_CMJ$Date)), labels = date_format("%m/%d"),
date_breaks = "2 weeks", expand = c(.08,0))+
guides(col = guide_legend(nrow = 3))+
geom_text_repel(data= subset(f, Date == min(Date)), aes(x=Date, y=JumpHeight_cm,label = unique(Athlete)),
force = .1,
nudge_x = -2,
direction = "y",
hjust = 1,
)
p
})
```
The issue in your code is indeed in the filter call. You need to use %in% instead of == when filtering against a vector of values, which is what input$Program becomes as soon as more than one box is checked. Please see the following:
---
title: "Test"
output: flexdashboard::flex_dashboard
runtime: shiny
---
```{r global, include=FALSE}
library(ggplot2)
library(dplyr)
library(scales)
library(ggrepel)
WHOC_Sum_CMJ <- structure(list(Athlete = structure(c(1L, 1L, 1L, 7L, 7L, 7L,
7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L,
11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L,
14L, 14L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L), .Label = c("Athlete 1", "Athlete 10",
"Athlete 11", "Athlete 12", "Athlete 13", "Athlete 14", "Athlete 2",
"Athlete 3", "Athlete 4", "Athlete 5", "Athlete 6", "Athlete 7",
"Athlete 8", "Athlete 9"), class = "factor"), Date = structure(c(1L,
4L, 5L, 1L, 3L, 5L, 7L, 2L, 3L, 5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L,
5L, 7L, 1L, 3L, 6L, 7L, 2L, 4L, 5L, 8L, 1L, 3L, 5L, 7L, 1L, 3L,
5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L, 5L, 7L, 1L, 3L,
6L, 7L, 1L, 3L, 5L, 7L), .Label = c("2020-01-06", "2020-01-07",
"2020-01-13", "2020-01-14", "2020-01-21", "2020-01-23", "2020-01-27",
"2020-01-28"), class = "factor"), Position = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("DEF", "FWD", "GOALIE"), class = "factor"),
Program = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L,
4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("Navy", "Red", "RTP", "White"), class = "factor"),
mRSI = c(0.36, 0.38, 0.42, 0.46, 0.46, 0.47, 0.48, 0.31,
0.3, 0.24, 0.3, 0.29, 0.26, 0.28, 0.28, 0.36, 0.35, 0.43,
0.43, 0.28, 0.31, 0.28, 0.3, 0.33, 0.36, 0.35, 0.37, 0.37,
0.36, 0.37, 0.36, 0.3, 0.36, 0.34, 0.37, 0.26, 0.28, 0.34,
0.3, 0.39, 0.4, 0.43, 0.43, 0.43, 0.47, 0.46, 0.48, 0.34,
0.36, 0.33, 0.37, 0.28, 0.28, 0.34, 0.33), SystemWeight = c(617.21,
612.4, 620.45, 672.08, 682.23, 670.5, 663.41, 517.33, 515.23,
511.62, 517.85, 697.55, 703.92, 689.43, 691.33, 859.06, 845.9,
850.97, 851.84, 655.79, 665.09, 673.91, 667.92, 626.78, 632.92,
634.52, 624.88, 637.55, 645.6, 648.78, 646.64, 558.03, 563.23,
569.58, 560.95, 693.63, 695.54, 684.37, 684.58, 641.18, 660.8,
663.95, 660, 594.92, 596.97, 591.36, 585.64, 522.35, 518.17,
530.95, 523.5, 780.65, 789.81, 775.84, 775.48), FTCT = c(0.61,
0.62, 0.67, 0.74, 0.75, 0.77, 0.77, 0.54, 0.55, 0.44, 0.53,
0.53, 0.49, 0.53, 0.56, 0.6, 0.58, 0.68, 0.68, 0.53, 0.57,
0.54, 0.55, 0.61, 0.63, 0.64, 0.65, 0.59, 0.58, 0.59, 0.59,
0.51, 0.59, 0.59, 0.59, 0.53, 0.57, 0.63, 0.59, 0.76, 0.76,
0.79, 0.78, 0.67, 0.72, 0.72, 0.74, 0.63, 0.65, 0.61, 0.63,
0.49, 0.5, 0.53, 0.57), JumpHeight_cm = c(28.97, 29.78, 31.43,
35.83, 35.41, 36.59, 36.92, 27.56, 26.11, 26.15, 26.82, 26.15,
25.08, 24.98, 24.62, 29.39, 30.17, 32.42, 32.56, 26.6, 27.25,
25.58, 27.88, 29.17, 31.58, 28.48, 31.24, 33.73, 32.78, 33.09,
33.43, 29.73, 31.91, 30.65, 32.98, 24.15, 24.24, 27.57, 25.44,
26.68, 26.39, 27.43, 28.87, 35.44, 36.29, 35.71, 36.06, 26.79,
27.76, 26.82, 29.71, 28.69, 26.9, 31.12, 29.77), EJH = c(17.6,
18.58, 21.11, 26.66, 26.69, 28.08, 28.38, 14.99, 14.39, 11.41,
14.33, 13.8, 12.34, 13.29, 13.67, 17.58, 17.5, 22.03, 22.19,
14.03, 15.59, 13.92, 15.39, 17.7, 19.75, 18.37, 20.3, 19.99,
18.9, 19.62, 19.61, 15.09, 18.8, 18.18, 19.6, 12.78, 13.87,
17.28, 15.06, 20.44, 20.12, 21.74, 22.52, 23.8, 26.25, 25.68,
26.73, 16.99, 18.13, 16.42, 18.82, 14.09, 13.43, 16.61, 16.9
), Weight = c(62.94, 62.45, 63.27, 68.54, 69.57, 68.38, 67.65,
52.76, 52.54, 52.17, 52.81, 71.13, 71.78, 70.31, 70.5, 87.61,
86.26, 86.78, 86.87, 66.88, 67.82, 68.72, 68.11, 63.92, 64.54,
64.71, 63.72, 65.02, 65.84, 66.16, 65.94, 56.91, 57.44, 58.09,
57.2, 70.74, 70.93, 69.79, 69.81, 65.39, 67.39, 67.71, 67.31,
60.67, 60.88, 60.31, 59.72, 53.27, 52.84, 54.15, 53.39, 79.61,
80.54, 79.12, 79.08)), class = "data.frame", row.names = c(NA,
-55L))
WHOC_Sum_CMJ$Date <- as.Date(WHOC_Sum_CMJ$Date)
```
Column {.sidebar}
-----------------------------------------------------------------------
```{r}
checkboxGroupInput("Program", label = "Program", choices = unique(WHOC_Sum_CMJ$Program), selected = "Red", inline = TRUE)
# (Note: for the code I cut out some of the styling to make it more readable. That's why it looks different than the pictures).
```
Column
-----------------------------------------------------------------------
```{r}
renderPlot({
f <- WHOC_Sum_CMJ %>%
dplyr::select(Date, Athlete, JumpHeight_cm, Program) %>%
filter(Program %in% input$Program)
p <- ggplot(f) +
geom_line(aes(x=Date, y=JumpHeight_cm, colour = Athlete)) +
geom_point(aes(x=Date, y=JumpHeight_cm, colour = Athlete)) +
theme_bw() +
labs(title = "Team Jump Height",
x = "Date",
y = "Jump Height (cm)") +
scale_x_date(limits = c(min = min(WHOC_Sum_CMJ$Date), max = max(WHOC_Sum_CMJ$Date)), labels = date_format("%m/%d"),
date_breaks = "2 weeks", expand = c(.08,0)) +
guides(col = guide_legend(nrow = 3)) +
geom_text_repel(data = subset(f, Date == min(Date)), aes(x=Date, y=JumpHeight_cm, label = Athlete),
force = .1,
nudge_x = -2,
direction = "y",
hjust = 1
)
p
})
```
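To see why == loses data points once several boxes are ticked, here is a small base-R sketch. The program and selected vectors are made up and stand in for the Program column and input$Program. With ==, the shorter vector is recycled and compared position by position, whereas %in% tests each row for membership in the selection.

```r
# Made-up stand-ins for WHOC_Sum_CMJ$Program and input$Program.
program  <- c("Red", "Red", "Navy", "White", "Navy", "White")
selected <- c("Red", "Navy")

# == recycles `selected` as Red, Navy, Red, Navy, ... and compares pairwise,
# so a row is kept only if it happens to line up with the recycled value.
keep_eq <- program == selected

# %in% asks, for every row, whether its value is among the selected ones.
keep_in <- program %in% selected

program[keep_eq]
program[keep_in]
```

Here only the first row survives the == comparison, while %in% keeps all four Red and Navy rows.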
I want to do a paired t-test with a data frame. I think I grouped the data correctly, but I do not know why it reports this error:
Error in complete.cases(x, y) : not all arguments have the same length.
centre_g is my data frame containing all the information I want to use in my analysis. A paired t-test is the right way to analyse it.
str(centre_g)
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':
# 24 obs. of 17 variables
# (I will only list the two variables that are used for my analysis):
# $ BA: Factor w/ 2 levels "after","before": 2 1 2 1 2 1 2 1 2 1 ...
# $ Pb: num 437 1183 1465 3105 NA ...
Previously I extracted "before" and "after" for "Pb", i.e. I pulled two vectors out of the data frame and ran the paired t-test on them, and that works fine:
(tResult <- t.test(before$Pb, after$Pb, paired = TRUE))
But when I tried to do the paired t-test directly on my data frame, I got the error message shown above:
(tResult <- t.test(Pb ~ BA, data = centre_g, paired = TRUE))
I tried several times, with grouped data and with sorted data. I do not know what is wrong with the second method. Is it because of the NA values in my data frame? But then why does the first method work?
Since I have a lot more variables in my data frame waiting to be analysed, I do not want to extract vectors for every single one of them. I would like to run the paired t-test on the data frame itself. Could anyone help me?
The details of centre_g are:
structure(list(day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), SAMPLE.No = structure(c(1L,
13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 3L, 5L, 7L, 9L, 11L,
1L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 3L), .Label = c("s1",
"s1.2", "s10", "s10.2", "s11", "s11.2", "s12", "s12.2", "s13",
"s13.2", "s14", "s14.2", "s2", "s2.2", "s3", "s3.2", "s4", "s4.2",
"s5", "s5.2", "s6", "s6.2", "s7", "s7.2", "s8", "s8.2", "s9",
"s9.2"), class = "factor"), weir = c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 12L, 12L), BA = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L), .Label = c("after", "before"), class = "factor"), centre.bank = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("bank", "centre"), class = "factor"),
Pb = c(436.65, 1182.93, 1465.21, 3105.36, 39.1, 1493.91,
NA, 165.28, 38.83, 351.48, 80.26, 47.39, 151.27, 434.01,
-97.58, 240.83, 56.8, 40.24, 38.8, NA, 41.13, 38.93, 44.39,
39.05), Pb.Error = c(16.41, 30.01, 51.26, 102.44, 27.21,
79.63, NA, 13.82, 48.78, 16.71, 19.1, 21.43, 18.65, 21.41,
232.7, 18.83, 12.19, 15.28, 11.94, NA, 22.24, 14.01, 10.56,
9.63), Zn = c(542.52, 981.83, 1234.78, 7554.41, 529.38, 5240.01,
NA, 542.65, 526.08, 820.87, 649.7, 793.42, 707.23, 1204.3,
-34.56, 209.86, 172.5, 130.29, 187.96, NA, 234.57, 137.38,
165.21, 135.05), Zn.Error = c(19.5, 29.31, 48.12, 161.54,
42.36, 144.56, NA, 23.37, 52.5, 26.18, 33.33, 39.87, 31.89,
35.79, 44.83, 17.24, 15.11, 21.25, 19.76, NA, 26.65, 18.67,
15.12, 13.97), Fe = c(3731.23, 14239.54, 23774.52, 52349.37,
3896.63, 13311.26, NA, 2756.96, 3511.06, 2664.12, 2383.16,
2785.75, 2834.59, 6288.39, -321.14, 14704.05, 3825.8, 5017.52,
13181.67, NA, 31190.39, 8516.23, 14130, 18348.01), Fe.Error = c(106.82,
229.87, 432.59, 884.29, 239.03, 496.1, NA, 111.92, 283.9,
102.44, 137.69, 161.02, 137.66, 172.32, 187.37, 274.6, 140.64,
240.97, 310.62, NA, 565.41, 265.57, 260.75, 291.45), Mn = c(110.65,
1337.08, 1126.82, 3495.03, 410.99, 5267.34, NA, 314.42, 338.8,
591.99, 308.46, 427.59, 573.87, 896.23, 277.82, 421.17, 969.72,
535.07, 879.97, NA, 742.39, 350.62, 379.98, 834.36), Mn.Error = c(43.39,
93.86, 133.34, 297.53, 125.08, 410.14, NA, 63.25, 155.08,
68.16, 82.1, 96.34, 88.97, 89.89, 1470.88, 78, 92.24, 118.6,
112.32, NA, 134.87, 91.97, 72.7, 91.12), Cr = c(-38.15, 50.8,
25.9, 53.32, 21.52, 132.82, NA, 8.13, 5.46, 35.07, 93.78,
88.18, 71.23, 47.26, 32.91, 25.49, 10.36, 19.99, 5.13, NA,
32.61, 22.13, 47.5, -5.82), Cr.Error = c(9.05, 16.41, 7.7,
9.99, 4.58, 33.88, NA, 7.84, 2.86, 9.18, 8.75, 7.55, 7.98,
9.62, 6.38, 5.54, 6.72, 4.6, 6.5, NA, 6.64, 4.62, 9.51, 11.3
), Ca = c(32195.21, 46510.98, 21723.24, 17820.74, 14639.01,
45937.9, NA, 37840.08, 4704.64, 37705.36, 28625.21, 25115.24,
41579.19, 91829.16, 19752.96, 14605.4, 34654.73, 15798.87,
13873.07, NA, 22901.14, 4097.09, 12053.38, 276525.69), Ca.Error = c(211.2,
326.69, 160.54, 142.76, 120.63, 304.76, NA, 219.4, 66.28,
225.41, 187.03, 169.88, 226.15, 378.53, 149.92, 125.47, 208.18,
127.73, 127.4, NA, 168.31, 64.51, 128.02, 908.61)), row.names = c(1L,
4L, 6L, 8L, 10L, 12L, 13L, 16L, 17L, 19L, 21L, 23L, 26L, 28L,
29L, 32L, 34L, 36L, 38L, 39L, 42L, 43L, 46L, 48L), class = "data.frame")
I am interested in doing a paired t-test on the "Pb" column, comparing "before" and "after" (as given in the "BA" column). Each "weir" is an individual.
I worked it out after a day. The cause was a row of NA data. There are some places where I did not manage to take samples, so there is a whole row of NA data (except in the factor columns).
To make sure the data frame keeps its full length (24 rows instead of 23) and does not drop the NA data, add na.rm = FALSE when subsetting the data frame into centre_g.
centre_g <- subset(HM_selected, centre.bank == "centre", na.rm = FALSE)
(I think I gave the right centre_g in my question's dataset, but occasionally I ended up with only 23 rows. Adding na.rm makes it explicit how NA data are handled.)
When doing the paired t-test, also add na.rm = FALSE.
(tRESULT <- t.test(Pb ~ BA, data = centre_g, paired = TRUE, na.rm = FALSE))
That works perfectly for me.
Sorry if there was any confusion in the question.
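As far as I can tell from the documentation, neither subset() nor t.test() has an na.rm argument, so the fix above most likely works because all 24 rows are kept, giving equal numbers of "before" and "after" values. A sketch that does not depend on row counts or row order is to pair the measurements explicitly by weir and pass two vectors to t.test(), which then leaves out incomplete pairs itself. The toy data frame below is invented; only the column names follow the question.

```r
# Invented toy data: four weirs, each measured before and after.
centre_toy <- data.frame(
  weir = rep(1:4, each = 2),
  BA   = factor(rep(c("before", "after"), times = 4)),
  Pb   = c(10, 14, 20, 27, NA, 30, 15, 18)
)

# One row per weir, with the two measurements side by side.
before <- centre_toy[centre_toy$BA == "before", c("weir", "Pb")]
after  <- centre_toy[centre_toy$BA == "after",  c("weir", "Pb")]
paired <- merge(before, after, by = "weir", suffixes = c(".before", ".after"))

# Weir 3 has a missing "before" value, so that pair is left out of the test.
tResult <- t.test(paired$Pb.before, paired$Pb.after, paired = TRUE)
tResult
```

The same pattern can be wrapped in a loop over the other column names (Zn, Fe, and so on) to avoid extracting vectors by hand for each one.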
So I am trying to make a stacked bar graph with bar width mapped to a variable; but I want the spacing between my bars to be constant.
Does anyone know how to make the spacing constant between the bars?
Right now I've got this:
p<-ggplot(dd, aes(variable, value.y, fill=Date, width=value.x / 15))+ coord_flip() + opts(ylab="")
p1<-p+ geom_bar(stat="identity") + scale_fill_brewer(palette="Dark2") + scale_fill_hue(l=55,c=55)
p2<-p1 + opts(axis.title.x = theme_blank(), axis.title.y = theme_blank())
p2
Thanks in advance.
Here's my data by the way (sorry for the long, bulky dput):
> dput(dd)
structure(list(variable = structure(c(1L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 7L, 7L, 7L, 7L, 7L, 2L, 2L,
2L, 2L, 2L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 9L, 9L, 9L,
9L, 9L, 8L, 8L, 8L, 8L, 8L), .Label = c("Alcohol and Tobacco",
"Health and Personal Care", "Clothing", "Energy", "Recreation and Education",
"Household", "Food", "Transportation", "Shelter"), class = "factor", scores = structure(c(2.91,
5.31, 10.08, 15.99, 4.95, 11.55, 11.2, 27.49, 20.6), .Dim = 9L, .Dimnames = list(
c("Alcohol and Tobacco", "Clothing", "Energy", "Food", "Health and Personal Care",
"Household", "Recreation and Education", "Shelter", "Transportation"
)))), value.x = c(2.91, 2.91, 2.91, 2.91, 2.91, 5.31, 5.31,
5.31, 5.31, 5.31, 10.08, 10.08, 10.08, 10.08, 10.08, 15.99, 15.99,
15.99, 15.99, 15.99, 4.95, 4.95, 4.95, 4.95, 4.95, 11.55, 11.55,
11.55, 11.55, 11.55, 11.2, 11.2, 11.2, 11.2, 11.2, 27.49, 27.49,
27.49, 27.49, 27.49, 20.6, 20.6, 20.6, 20.6, 20.6), Date = structure(c(5L,
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L,
3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L,
2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("1993-2001",
"2001-2006", "2007-2010", "2010-2011", "2012 Jan - May"), class = "factor"),
value.y = c(2.1, 2.5, 7.6, 21.7, 2.8, 1.5, 0.3, -4.1, -4.2,
4.7, 3, 16.9, 1.9, 32.8, 23.9, 3.2, 4.6, 11.3, 8.9, 12.9,
1.7, 2, 7.8, 5.9, 10, 1.9, 2.1, 5.6, 2.2, 9.9, 1.4, 1.3,
2.2, 0.6, 17.3, 1.1, 2.3, 6.4, 13.1, 10, 4.3, 7.6, 0.9, 15.2,
20.5)), .Names = c("variable", "value.x", "Date", "value.y"
), row.names = c(NA, -45L), class = "data.frame")
For a categorical ("discrete") scale you can adjust the bar width, but it needs to be between 0 and 1. Your value.x values push it over 1, hence the overlap. You can use rescale from the scales package to fix this quickly, so that the within-category width of each bar still represents some other variable (in this case value.x).
install.packages("scales")
library(scales)
ggplot(dd,aes(x=variable,y=value.y,fill=Date)) +
geom_bar(aes(width=rescale(value.x,c(0.5,1))),stat="identity",position="stack") +
coord_flip()
Play with the rescaling range to find the best-looking result: change 0.5 to 0.25, and so on.
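To make the effect of the target range concrete, here is a base-R sketch of the linear mapping that, to my understanding, rescale(x, to = c(0.5, 1)) performs by default. The helper rescale_manual is hypothetical and not part of the scales package; the widths are the distinct value.x values from the dput above.

```r
# Distinct value.x widths, copied from the dput in the question.
w <- c(2.91, 5.31, 10.08, 15.99, 4.95, 11.55, 11.2, 27.49, 20.6)

# Hypothetical helper (not part of scales): linear map of x onto [to[1], to[2]].
rescale_manual <- function(x, to = c(0, 1)) {
  (x - min(x)) / (max(x) - min(x)) * (to[2] - to[1]) + to[1]
}

# The narrowest bar gets width 0.5, the widest gets width 1.
widths <- rescale_manual(w, to = c(0.5, 1))
round(widths, 3)
```

Lowering the first number of the range increases the contrast between narrow and wide bars, while keeping the upper limit at 1 stops neighbouring categories from overlapping.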
Personally, I think something like this is more informative:
ggplot(dd,aes(x=variable,y=value.y,fill=Date)) +
geom_bar(aes(width=rescale(value.x,c(0.2,1))),stat="identity") +
coord_flip() + facet_grid(~Date) + theme(legend.position="none")
Attempt # 2.
I'm tricking ggplot2 into writing a continuous scale as categorical.
# The numbers for tmp I calculated by hand. Not sure how to program
# this part but the math is
# last + half(previous_width) + half(current_width)
# Change the 1st number in cumsum to adjust the between category width
tmp <- c(2.91,7.02,14.715,27.75,38.22,46.47,57.845,77.19,101.235) + cumsum(rep(5,9))
dd$x.pos1 <- rep(tmp,each=5)
ggplot(dd,aes(x=x.pos1,y=value.y,fill=Date)) +
geom_bar(aes(width=value.x),stat="identity",position="stack") +
scale_x_continuous(breaks=tmp,labels=unique(as.character(dd$variable))) +
coord_flip()
For good measure you will probably want to adjust the text size. That's done with ... + theme(axis.text.y = element_text(size = 12)) (or, in ggplot2 versions before 0.9.2, ... + opts(axis.text.y = theme_text(size = 12))).
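The hand-calculated positions in tmp can also be computed rather than typed in. This is a minimal sketch of the rule given in the comments (previous position plus half of the previous width plus half of the current width), assuming the widths are taken in the row order of dd and that the first position is set to the first width, as in the hand-calculated vector. The names w, gap and centres are introduced here for illustration.

```r
# Widths in row order of dd, i.e. dd$value.x[!duplicated(dd$variable)].
w <- c(2.91, 5.31, 10.08, 15.99, 4.95, 11.55, 11.2, 27.49, 20.6)
gap <- 5  # constant space between neighbouring bars

# Each centre is the previous centre plus half of both adjacent widths.
# The first centre is set to w[1], matching the hand-calculated values.
centres <- w[1] + c(0, cumsum(head(w, -1) / 2 + tail(w, -1) / 2))

# Add the growing constant gap, as in the original tmp definition.
tmp <- centres + cumsum(rep(gap, length(w)))
tmp
```

Changing gap adjusts the spacing between categories without touching the bar widths themselves.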