Create boxplots with groups spread across multiple columns


I'm using the weightloss dataset:
structure(list(id = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "factor"),
diet = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("no", "yes"), class = "factor"),
exercises = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("no", "yes"
), class = "factor"), t1 = c(10.43, 11.59, 11.35, 11.12,
9.5, 9.5, 11.12, 12.51, 11.35, 11.12, 11.12, 10.2, 11.12,
9.96, 12.05, 8.11, 12.05, 12.05, 12.28, 10.66, 11.35, 10.2,
10.2, 9.5, 10.2, 12.98, 13.21, 10.2, 11.59, 12.05, 11.59,
12.05, 11.82, 11.12, 12.51, 11.59, 10.43, 11.35, 11.82, 10.2,
13.67, 10.66, 10.2, 12.05, 11.82, 10.43, 12.74, 11.35), t2 = c(13.21,
10.66, 11.12, 9.5, 9.73, 12.74, 12.51, 12.28, 11.59, 10.66,
13.44, 11.35, 12.51, 12.74, 13.67, 14.37, 14.6, 12.98, 12.05,
14.37, 14.6, 11.82, 14.13, 13.21, 12.51, 12.98, 11.12, 9.73,
13.44, 13.67, 12.98, 11.35, 12.05, 15.29, 11.82, 12.05, 12.51,
14.83, 13.9, 13.21, 14.13, 15.06, 12.98, 11.35, 12.51, 14.13,
12.74, 11.35), t3 = c(11.59, 13.21, 11.35, 11.12, 12.28,
10.43, 11.59, 12.74, 9.96, 11.35, 10.66, 11.12, 15.76, 16.68,
17.84, 14.6, 17.84, 17.61, 18.54, 16.91, 15.52, 17.38, 19,
14.13, 14.6, 14.6, 12.05, 15.52, 13.9, 12.74, 13.21, 14.83,
14.6, 10.89, 15.52, 12.98, 14.37, 15.06, 13.44, 14.13, 15.29,
14.6, 15.06, 15.52, 13.9, 14.37, 15.06, 15.06)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -48L))
So far it looks like at least here I can separate the scores:
weight <- weightloss
summary <- weight %>%
get_summary_stats(type = "mean_sd")
summary
Which gives me this:
A tibble: 3 x 4
variable n mean sd
<chr> <dbl> <dbl> <dbl>
1 t1 48 11.2 1.09
2 t2 48 12.7 1.42
3 t3 48 14.2 2.26
I'm trying to run a repeated-measures ANOVA (RMANOVA) on this, but first I would like a boxplot for each of the three trials (t1, t2, t3), all in a single plot. However, I'm not sure what to map to x and y in this case. I tried using this for the x:
trial_type <- c("t1","t2","t3")
factor(trial_type)
But that's where I'm stuck. I'm not sure how to get the y in this case. The y is clearly the scores from each trial. I tried grouping by this factor to see if that would sort out the scores in some way, but I haven't figured that out either.
I'm just not sure how to plot this with ggplot. Any help would be great! I imagine this is a very useful skill for any data that uses trials.

You may have to pivot_longer first, then the grouping gets easier.
After pivoting, all values will be in the same column ('value'), and there will be a grouping column ('trial').
library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr
# Stack t1, t2, t3 into one 'value' column, labelled by 'trial'
df <- df %>% pivot_longer(cols = matches('t\\d'), names_to = 'trial', values_to = 'value')
with(df, boxplot(value ~ trial))
If you prefer ggplot:
library(ggplot2)
ggplot(df, aes(x = trial, y = value)) +
  geom_boxplot()
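To see what the reshaping step does, here is a minimal base R sketch that uses stack() in place of pivot_longer(). It is only an illustration: `wide` holds the first three rows of the question's data, and the object names `wide` and `long` are made up for this example.

```r
# First three subjects from the question's data, in wide format
wide <- data.frame(
  id = factor(1:3),
  t1 = c(10.43, 11.59, 11.35),
  t2 = c(13.21, 10.66, 11.12),
  t3 = c(11.59, 13.21, 11.35)
)

# stack() puts all scores in one column and records the source column
long <- stack(wide[c("t1", "t2", "t3")])
names(long) <- c("value", "trial")
long$id <- rep(wide$id, times = 3)  # carry the subject id along

# One box per trial: y is the score, x is the trial label
boxplot(value ~ trial, data = long)
```

The long table has one row per subject per trial, which is the shape both boxplot() and ggplot() expect for a grouped plot.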

A ggplot2 and reshape2 way:
library(reshape2)
library(ggplot2)
library(magrittr)  # provides %>%
df %>%
  melt(id.vars = 'id', measure.vars = c('t1', 't2', 't3')) %>%
  ggplot(aes(x = variable, y = value)) +
  geom_boxplot(aes(color = variable))

Related

How can I add a percentage breakdown of the Total to a table using mutate or rowSums

I want to add a Total column showing the total for each user type, along with a percentage breakdown of each user type's intervals relative to that total.
bike_rides %>%
  group_by(user_type) %>%
  summarize("<=5min"  = sum(ride_length_min <= 5),
            "<=15min" = sum(ride_length_min <= 15),
            "<=30min" = sum(ride_length_min <= 30),
            "<=45min" = sum(ride_length_min <= 45),
            "<=60min" = sum(ride_length_min <= 60),
            ">2hrs"   = sum(ride_length_min > 120),
            ">4hrs"   = sum(ride_length_min > 240),
            ">6hrs"   = sum(ride_length_min > 360))
I get the following table:
# A tibble: 2 × 9
  user_type `<=5min` `<=15min` `<=30min` `<=45min` `<=60min` `>2hrs` `>4hrs` `>6hrs`
  <fct>        <int>     <int>     <int>     <int>     <int>   <int>   <int>   <int>
1 casual      172062    915674   1372926   1530708   1603595   31092    5346    3068
2 member      555911   1884360   2394716   2521329   2545510    3629    1339     866
However, I want to add a Total column at the end that shows the total for each user type, alongside a percentage breakdown.
I tried using mutate and rowSums but was unable to get what I want.
dput output:
bike_rides_str <- structure(
list(
user_type = structure(
c(
1L,
1L,
2L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
1L,
2L,
2L,
2L,
1L,
2L,
2L,
2L,
1L,
2L,
2L,
2L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
1L,
2L,
2L,
2L,
2L
),
levels = c("casual",
"member"),
class = "factor"
),
ride_length_min = c(
2.95,
4.35,
4.35,
14.93,
6.03,
3.37,
16.57,
12.07,
25.45,
7.38,
6.35,
12.35,
9.33,
6.88,
10,
1.67,
5.05,
4.92,
9.27,
25.83,
23.17,
4.73,
14.05,
7.12,
10.82,
5.13,
6.78,
7.65,
24,
10.72,
25.9,
9.48,
7.47,
12.48,
14.78,
13.53,
2.65,
9.13,
2.67,
2.75,
16.95,
15.73,
39.02,
7.8,
7.18,
13.42,
4.48,
12.37,
1.52,
6.85
)
),
row.names = c(NA,-50L),
class = c("tbl_df", "tbl", "data.frame")
)
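One possible reading of the request, sketched in base R on a made-up five-row table (`rides`, `cuts`, and the other names are hypothetical, and only three of the question's bounds are used). It assumes "Total" means the number of rides per user type. Because the "<=" bins are cumulative, a 3-minute ride is counted in every column, so a row sum would count rides more than once.

```r
# Hypothetical miniature of bike_rides
rides <- data.frame(
  user_type = factor(c("casual", "casual", "member", "member", "member")),
  ride_length_min = c(3, 20, 4, 12, 50)
)

cuts  <- c(5, 15, 30)              # upper bounds in minutes
types <- levels(rides$user_type)

# Rides at or below each bound, per user type (rows = user types)
counts <- sapply(cuts, function(k) {
  sapply(types, function(u) sum(rides$ride_length_min[rides$user_type == u] <= k))
})
colnames(counts) <- paste0("<=", cuts, "min")

# Total rides per user type; not a row sum, because the bins overlap
total <- as.vector(table(rides$user_type))

# Percentage of each user type's rides at or below each bound
pct <- round(100 * counts / total, 1)
colnames(pct) <- paste0(colnames(counts), "_pct")

result <- data.frame(user_type = rownames(counts), counts, Total = total,
                     pct, check.names = FALSE, row.names = NULL)
print(result)
```

The same arithmetic can be ported to a dplyr pipeline by adding `Total = n()` inside summarize() and dividing each count column by it in a later mutate().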

Weighted proportions and confidence intervals

I have tried to follow this post to calculate a weighted proportion and standard error. However, the answer provided did not have a lot of explanation so I was unsure if my calculations were correct.
I would love confirmation that what I've done is indeed correct, or suggestions for alternative ways to achieve my desired outcome if it is not.
# Test data
test <- structure(list(koala = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Gendry", class = "factor"),
koala.pres = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 2L, 3L,
2L, 2L, 3L, 2L, 1L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 1L, 1L,
1L, 1L), .Label = c("Absent", "Day", "Night"), class = "factor"),
habitat = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("Exposed Sandstone Scribbly Gum", "Sheltered sandstone Blue leafed stringybark forest",
"Transitional Shale Dry Ironbark Forest"), class = "factor"),
tree.sp = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 12L, 12L, 13L, 13L,
13L, 13L, 13L, 14L, 14L, 14L, 14L, 2L, 2L, 2L, 7L, 7L, 7L,
9L, 13L, 1L, 1L, 1L, 2L, 2L, 10L, 11L, 11L, 13L, 13L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L), .Label = c("A. littoralis",
"C. gummifera", "E. amplifolia", "E. beyeriana", "E. crebra",
"E. fibrosa", "E. globoidea", "E. longifolia", "E. oblonga",
"E. piperita", "E. punctata", "E. resinifera", "E. sclerophylla",
"E. sieberi"), class = "factor"), cbh = c(0.76, 0.98, 0.42,
0.34, 0.4, 0.44, 0.45, 0.47, 0.66, 0.59, 0.99, 0.43, 0.35,
0.36, 0.4, 0.46, 0.52, 0.49, 0.4, 1.56, 1.26, 0.83, 1.1,
1.22, 1.04, 1.04, 1.08, 1.7, 1.35, 1.89, 0.88, 0.63, 1.26,
0.45, 1.2, 1.33, 0.41, 1.22, 0.75, 0.32, 0.52, 0.6, 1.37,
1.51, 1.29, 0.51, 0.46, 0.44, 2.35, 1.68, 1.24, 0.58, 0.53,
0.69, 0.45, 0.5, 0.5, 0.51, 1.46, 1.23, 0.32, 1.47, 2.27,
0.41, 0.59, 0.61, 0.83, 0.56, 0.41, 0.47, 0.6, 0.35, 1.91,
0.65, 0.52, 1.41, 0.95, 0.91, 1.51, 1.08, 0.95, 0.52, 1.7,
0.76, 1.03, 0.88, 1.45, 1.81, 0.4, 0.39, 0.34, 0.35, 0.89,
0.8, 1.1, 1.77, 0.52, 1.23, 0.49, 0.46, 2.27, 0.41, 1.4,
0.58, 0.66, 0.41, 0.44, 0.87, 0.51, 0.57, 0.78, 1.18, 1.41,
1.13, 1, 1.48, 1.48, 0.4, 1.8, 0.78, 0.82, 1.23, 1.51, 3.82,
0.51, 1.59, 0.95, 1.04, 1.98, 1.3, 0.88, 0.52, 1, 1.27, NA,
1.07, 0.35, 1.33, 0.45, 0.63, 0.45, 0.32, 0.56, 0.68, 1.67,
1.3, 1.83, 0.58, 0.56, 0.44, 0.9, 0.99, 0.59, 0.63, 2.53,
1.33, 2.1, 0.91, 1.24, 1.13, 1.22, 1.64, 2.35, 1.07, 1.27,
1.4, 1.88, 0.56, 1.86, 1.3, 1.97, 0.92, 1.23, 0.34, 0.8),
dbh = c(0.2419, 0.3119, 0.1337, 0.1082, 0.1273, 0.1401, 0.1432,
0.1496, 0.2101, 0.1878, 0.3151, 0.1369, 0.1114, 0.1146, 0.1273,
0.1464, 0.1655, 0.156, 0.1273, 0.4966, 0.4011, 0.2642, 0.3501,
0.3883, 0.331, 0.331, 0.3438, 0.5411, 0.4297, 0.6016, 0.2801,
0.2005, 0.4011, 0.1432, 0.382, 0.4234, 0.1305, 0.3883, 0.2387,
0.1019, 0.1655, 0.191, 0.4361, 0.4806, 0.4106, 0.1623, 0.1464,
0.1401, 0.748, 0.5348, 0.3947, 0.1846, 0.1687, 0.2196, 0.1432,
0.1592, 0.1592, 0.1623, 0.4647, 0.3915, 0.1019, 0.4679, 0.7226,
0.1305, 0.1878, 0.1942, 0.2642, 0.1783, 0.1305, 0.1496, 0.191,
0.1114, 0.608, 0.2069, 0.1655, 0.4488, 0.3024, 0.2897, 0.4806,
0.3438, 0.3024, 0.1655, 0.5411, 0.2419, 0.3279, 0.2801, 0.4615,
0.5761, 0.1273, 0.1241, 0.1082, 0.1114, 0.2833, 0.2546, 0.3501,
0.5634, 0.1655, 0.3915, 0.156, 0.1464, 0.7226, 0.1305, 0.4456,
0.1846, 0.2101, 0.1305, 0.1401, 0.2769, 0.1623, 0.1814, 0.2483,
0.3756, 0.4488, 0.3597, 0.3183, 0.4711, 0.4711, 0.1273, 0.573,
0.2483, 0.261, 0.3915, 0.4806, 1.2159, 0.1623, 0.5061, 0.3024,
0.331, 0.6303, 0.4138, 0.2801, 0.1655, 0.3183, 0.4043, NA,
0.3406, 0.1114, 0.4234, 0.1432, 0.2005, 0.1432, 0.1019, 0.1783,
0.2165, 0.5316, 0.4138, 0.5825, 0.1846, 0.1783, 0.1401, 0.2865,
0.3151, 0.1878, 0.2005, 0.8053, 0.4234, 0.6685, 0.2897, 0.3947,
0.3597, 0.3883, 0.522, 0.748, 0.3406, 0.4043, 0.4456, 0.5984,
0.1783, 0.5921, 0.4138, 0.6271, 0.2928, 0.3915, 0.1082, 0.2546
), tree.hgt = c(11.2, 9, 9.2, 6.8, 6.2, 6, 6, 6.3, 12.2,
12, 16.5, 7.4, 6.2, 9.8, 9.7, 6, 9, 7.8, 9.2, 16.6, 16.6,
13.8, 14.5, 8.4, 14.2, 15.6, 15.8, 17.8, 14.2, 17.2, 11.6,
11, 16.2, 10.6, 16.2, 14.2, 7.2, 10.2, 12.4, 9.2, 8, 16,
16.8, 15.4, 15.2, 6.6, 6.8, 7.8, 16.3, 17, 12.4, 10.8, 11,
12, 8, 9, 11.2, 14.4, 14.4, 10, 7, 15.6, 18, 6.8, 9, 6, 9.4,
10, 8.2, 8.4, 9, 6, 18.8, 12.2, 7.2, 9.4, 19.2, 14.8, 21.4,
17.4, 17.8, 11.8, 17.8, 13, 14, 14.4, 16.7, 18, 7, 7.2, 5.5,
9.2, 9.6, 14, 16, 19.2, 11, 15.5, 7.2, 9, 19.5, 7.2, 23,
17.6, 11.8, 7.2, 7.5, 14, 11.6, 9.3, 16.8, 16.6, 15, 18.6,
22.8, 20, 19.8, 9, 18.2, 14, 19.2, 16.4, 19.8, 5.8, 11.8,
17.6, 17.8, 14.6, 17.6, 16.9, 16.3, 10.8, 17.8, 17, 20, 15,
8.4, 20.6, 9.2, 14, 8.5, 8.2, 11.2, 6.6, 18.4, 18.4, 21,
9.8, 9.2, 9, 15.2, 17.2, 10.4, 8.8, 19.2, 19, 25, 14.9, 19,
17.8, 11.3, 20, 23, 12, 17.9, 17.9, 15.2, 8, 17, 13, 14,
18, 19.4, 5.4, 16), rel.abu.tree.in.hr = c(18.7, 18.7, 18.7,
18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7,
18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 17.6, 17.6, 17.6, 4.78,
4.78, 4.78, 4.78, 4.78, 4.78, 4.78, 4.78, 4.78, 4.78, 4.78,
4.78, 0.74, 0.74, 2.7, 2.7, 2.7, 2.7, 2.7, 1.47, 1.47, 1.47,
1.47, 18.7, 18.7, 18.7, 17.6, 17.6, 17.6, 4.78, 2.7, 0.78,
0.78, 0.78, 18.7, 18.7, 0.26, 3.4, 3.4, 2.7, 2.7, 0.78, 0.78,
18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7,
18.7, 0.004, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19,
9.19, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19, 9.19,
9.19, 9.19, 9.19, 14.7, 14.7, 14.7, 14.7, 14.7, 14.7, 14.7,
14.7, 14.7, 14.7, 14.7, 14.7, 14.7, 14.7, 14.7, 17.6, 17.6,
17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6,
17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6, 17.6,
17.6, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53,
16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53,
16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 16.53, 3.4,
3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 0.74, 0.74,
0.74, 0.74), prop.hab.class.in.hr = c(18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 18.42105263, 18.42105263, 18.42105263, 18.42105263,
18.42105263, 2.631578947, 2.631578947, 2.631578947, 2.631578947,
2.631578947, 2.631578947, 2.631578947, 2.631578947, 2.631578947,
2.631578947, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842, 78.94736842, 78.94736842,
78.94736842, 78.94736842, 78.94736842), k.pres = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
rel_abun = c(344.473684181, 344.473684181, 344.473684181,
344.473684181, 344.473684181, 344.473684181, 344.473684181,
344.473684181, 344.473684181, 344.473684181, 344.473684181,
344.473684181, 344.473684181, 344.473684181, 344.473684181,
344.473684181, 344.473684181, 344.473684181, 344.473684181,
324.210526288, 324.210526288, 324.210526288, 88.0526315714,
88.0526315714, 88.0526315714, 88.0526315714, 88.0526315714,
88.0526315714, 88.0526315714, 88.0526315714, 88.0526315714,
88.0526315714, 88.0526315714, 88.0526315714, 13.6315789462,
13.6315789462, 49.736842101, 49.736842101, 49.736842101,
49.736842101, 49.736842101, 27.0789473661, 27.0789473661,
27.0789473661, 27.0789473661, 344.473684181, 344.473684181,
344.473684181, 324.210526288, 324.210526288, 324.210526288,
88.0526315714, 49.736842101, 2.05263157866, 2.05263157866,
2.05263157866, 49.2105263089, 49.2105263089, 0.68421052622,
8.9473684198, 8.9473684198, 7.1052631569, 7.1052631569, 61.5789473676,
61.5789473676, 1476.315789454, 1476.315789454, 1476.315789454,
1476.315789454, 1476.315789454, 1476.315789454, 1476.315789454,
1476.315789454, 1476.315789454, 1476.315789454, 1476.315789454,
0.31578947368, 725.5263157798, 725.5263157798, 725.5263157798,
725.5263157798, 725.5263157798, 725.5263157798, 725.5263157798,
725.5263157798, 725.5263157798, 725.5263157798, 725.5263157798,
725.5263157798, 725.5263157798, 725.5263157798, 725.5263157798,
725.5263157798, 725.5263157798, 725.5263157798, 725.5263157798,
725.5263157798, 725.5263157798, 1160.526315774, 1160.526315774,
1160.526315774, 1160.526315774, 1160.526315774, 1160.526315774,
1160.526315774, 1160.526315774, 1160.526315774, 1160.526315774,
1160.526315774, 1160.526315774, 1160.526315774, 1160.526315774,
1160.526315774, 1389.473684192, 1389.473684192, 1389.473684192,
1389.473684192, 1389.473684192, 1389.473684192, 1389.473684192,
1389.473684192, 1389.473684192, 1389.473684192, 1389.473684192,
1389.473684192, 1389.473684192, 1389.473684192, 1389.473684192,
1389.473684192, 1389.473684192, 1389.473684192, 1389.473684192,
1389.473684192, 1389.473684192, 1389.473684192, 1389.473684192,
1304.9999999826, 1304.9999999826, 1304.9999999826, 1304.9999999826,
1304.9999999826, 1304.9999999826, 1304.9999999826, 1304.9999999826,
1304.9999999826, 1304.9999999826, 1304.9999999826, 1304.9999999826,
1304.9999999826, 1304.9999999826, 1304.9999999826, 1304.9999999826,
1304.9999999826, 1304.9999999826, 1304.9999999826, 1304.9999999826,
1304.9999999826, 1304.9999999826, 1304.9999999826, 1304.9999999826,
1304.9999999826, 268.421052628, 268.421052628, 268.421052628,
268.421052628, 268.421052628, 268.421052628, 268.421052628,
268.421052628, 268.421052628, 268.421052628, 58.4210526308,
58.4210526308, 58.4210526308, 58.4210526308)), row.names = c(NA,
-175L), class = "data.frame")
# Calculate a weighted proportion for test$tree.sp
# Weighting variable is test$rel.abu.tree.in.hr
# Calculate weighted proportion
library(survey)
dsurvey <- svydesign(ids = ~1, data = test, weights = ~rel.abu.tree.in.hr)
wpct <- data.frame(svymean(~tree.sp, design = dsurvey))
Output of the above:
wpct
                               mean           SE
tree.spA. littoralis   1.830415e-03 8.345005e-04
tree.spC. gummifera    3.071812e-01 4.180415e-02
tree.spE. amplifolia   1.877349e-06 1.889682e-06
tree.spE. beyeriana    4.744530e-02 1.424895e-02
tree.spE. crebra       4.313209e-02 1.359431e-02
tree.spE. fibrosa      1.034889e-01 2.547820e-02
tree.spE. globoidea    2.395497e-01 3.825613e-02
tree.spE. longifolia   1.939536e-01 3.472975e-02
tree.spE. oblonga      2.916462e-02 8.264006e-03
tree.spE. piperita     1.220277e-04 1.228147e-04
tree.spE. punctata     1.914896e-02 5.681831e-03
tree.spE. resinifera   2.083857e-03 8.701564e-04
tree.spE. sclerophylla 1.013768e-02 3.663735e-03
tree.spE. sieberi      2.759703e-03 1.400363e-03
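One way to sanity-check the point estimates by hand: the weighted proportion for a category is the sum of the weights in that category divided by the sum of all weights. The sketch below uses a made-up four-row table (`toy`, `sp`, and `w` are hypothetical names) and covers only the proportions, not the standard errors.

```r
# Hypothetical toy data: a category and a weight per row
toy <- data.frame(
  sp = factor(c("A", "A", "B", "C")),
  w  = c(2, 2, 4, 2)
)

# Weighted proportion = weight in each category / total weight
wprop <- tapply(toy$w, toy$sp, sum) / sum(toy$w)
print(wprop)

# The proportions across all categories should add up to 1
stopifnot(isTRUE(all.equal(sum(wprop), 1)))
```

Applying the same two lines to `test$tree.sp` and `test$rel.abu.tree.in.hr` should give values comparable to the `mean` column above, assuming no missing weights.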

problem with paired t-test: not all arguments have the same length

I want to do a paired t-test with a data frame. I think I grouped the data correctly, but I do not know why it reports this error:
Error in complete.cases(x, y) : not all arguments have the same length.
centre_g is my data frame containing all the info I want to use in my analysis. A paired t-test is the right way to analyse it.
str(centre_g)
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':
# 24 obs. of 17 variables
# (I list only the two variables used in my analysis):
# $ BA: Factor w/ 2 levels "after","before": 2 1 2 1 2 1 2 1 2 1 ...
# $ Pb: num 437 1183 1465 3105 NA ...
Previously I extracted the "before" and "after" values of "Pb" as two vectors from the data frame and ran the paired t-test on them, which works fine:
(tResult <- t.test(before$Pb, after$Pb, paired = TRUE))
But when I tried to do the paired t-test directly on my data frame, I got the error message quoted above:
(tResult <- t.test(Pb ~ BA, data = centre_g, paired = TRUE))
I tried several times, with grouped data and with sorted data, and I do not know what is wrong with the second method. Is it because of the NA values in my data frame? If so, why does the first method work?
Since I have a lot more information in my data frame waiting to be analysed, I do not want to extract vectors for every single variable. I would like to run the paired t-test directly on my data frame. Could anyone help me?
The details of centre_g are:
structure(list(day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), SAMPLE.No = structure(c(1L,
13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 3L, 5L, 7L, 9L, 11L,
1L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 3L), .Label = c("s1",
"s1.2", "s10", "s10.2", "s11", "s11.2", "s12", "s12.2", "s13",
"s13.2", "s14", "s14.2", "s2", "s2.2", "s3", "s3.2", "s4", "s4.2",
"s5", "s5.2", "s6", "s6.2", "s7", "s7.2", "s8", "s8.2", "s9",
"s9.2"), class = "factor"), weir = c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 12L, 12L), BA = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L), .Label = c("after", "before"), class = "factor"), centre.bank = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("bank", "centre"), class = "factor"),
Pb = c(436.65, 1182.93, 1465.21, 3105.36, 39.1, 1493.91,
NA, 165.28, 38.83, 351.48, 80.26, 47.39, 151.27, 434.01,
-97.58, 240.83, 56.8, 40.24, 38.8, NA, 41.13, 38.93, 44.39,
39.05), Pb.Error = c(16.41, 30.01, 51.26, 102.44, 27.21,
79.63, NA, 13.82, 48.78, 16.71, 19.1, 21.43, 18.65, 21.41,
232.7, 18.83, 12.19, 15.28, 11.94, NA, 22.24, 14.01, 10.56,
9.63), Zn = c(542.52, 981.83, 1234.78, 7554.41, 529.38, 5240.01,
NA, 542.65, 526.08, 820.87, 649.7, 793.42, 707.23, 1204.3,
-34.56, 209.86, 172.5, 130.29, 187.96, NA, 234.57, 137.38,
165.21, 135.05), Zn.Error = c(19.5, 29.31, 48.12, 161.54,
42.36, 144.56, NA, 23.37, 52.5, 26.18, 33.33, 39.87, 31.89,
35.79, 44.83, 17.24, 15.11, 21.25, 19.76, NA, 26.65, 18.67,
15.12, 13.97), Fe = c(3731.23, 14239.54, 23774.52, 52349.37,
3896.63, 13311.26, NA, 2756.96, 3511.06, 2664.12, 2383.16,
2785.75, 2834.59, 6288.39, -321.14, 14704.05, 3825.8, 5017.52,
13181.67, NA, 31190.39, 8516.23, 14130, 18348.01), Fe.Error = c(106.82,
229.87, 432.59, 884.29, 239.03, 496.1, NA, 111.92, 283.9,
102.44, 137.69, 161.02, 137.66, 172.32, 187.37, 274.6, 140.64,
240.97, 310.62, NA, 565.41, 265.57, 260.75, 291.45), Mn = c(110.65,
1337.08, 1126.82, 3495.03, 410.99, 5267.34, NA, 314.42, 338.8,
591.99, 308.46, 427.59, 573.87, 896.23, 277.82, 421.17, 969.72,
535.07, 879.97, NA, 742.39, 350.62, 379.98, 834.36), Mn.Error = c(43.39,
93.86, 133.34, 297.53, 125.08, 410.14, NA, 63.25, 155.08,
68.16, 82.1, 96.34, 88.97, 89.89, 1470.88, 78, 92.24, 118.6,
112.32, NA, 134.87, 91.97, 72.7, 91.12), Cr = c(-38.15, 50.8,
25.9, 53.32, 21.52, 132.82, NA, 8.13, 5.46, 35.07, 93.78,
88.18, 71.23, 47.26, 32.91, 25.49, 10.36, 19.99, 5.13, NA,
32.61, 22.13, 47.5, -5.82), Cr.Error = c(9.05, 16.41, 7.7,
9.99, 4.58, 33.88, NA, 7.84, 2.86, 9.18, 8.75, 7.55, 7.98,
9.62, 6.38, 5.54, 6.72, 4.6, 6.5, NA, 6.64, 4.62, 9.51, 11.3
), Ca = c(32195.21, 46510.98, 21723.24, 17820.74, 14639.01,
45937.9, NA, 37840.08, 4704.64, 37705.36, 28625.21, 25115.24,
41579.19, 91829.16, 19752.96, 14605.4, 34654.73, 15798.87,
13873.07, NA, 22901.14, 4097.09, 12053.38, 276525.69), Ca.Error = c(211.2,
326.69, 160.54, 142.76, 120.63, 304.76, NA, 219.4, 66.28,
225.41, 187.03, 169.88, 226.15, 378.53, 149.92, 125.47, 208.18,
127.73, 127.4, NA, 168.31, 64.51, 128.02, 908.61)), row.names = c(1L,
4L, 6L, 8L, 10L, 12L, 13L, 16L, 17L, 19L, 21L, 23L, 26L, 28L,
29L, 32L, 34L, 36L, 38L, 39L, 42L, 43L, 46L, 48L), class = "data.frame")
I am interested in doing a paired t-test on the "Pb" column, comparing "before" and "after" (as shown in column "BA"). Each "weir" is one individual.

I worked it out after a day. The cause was a row of NA data: there are some places where I did not manage to take samples, so a whole row is NA (except for the factor columns).
To make sure the data frame keeps its full length (24 rows instead of 23) and does not drop the NA row, add na.rm = FALSE when subsetting the data frame into centre_g.
centre_g <- subset(HM_selected, centre.bank == "centre", na.rm = FALSE)
(I think I gave the right centre_g in my question's dataset, but occasionally I got only 23 rows. I added na.rm to control how NA data are handled.)
When doing the paired t-test, also add na.rm = FALSE.
(tRESULT <- t.test(Pb ~ BA, data = centre_g, paired = TRUE, na.rm = FALSE))
And that works perfectly for me.
Sorry if there was any confusion in the question.
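An alternative sketch that does not depend on row counts lining up: match each weir's "before" and "after" reading explicitly, then pass the two aligned vectors to t.test(). The data frame `d` below is a made-up miniature of centre_g, and `pairs` is a hypothetical name. As far as I can tell, neither subset() nor t.test() documents an na.rm argument, so this version does not rely on it.

```r
# Hypothetical miniature of centre_g: 4 weirs, one missing "before" reading
d <- data.frame(
  weir = rep(1:4, each = 2),
  BA   = factor(rep(c("before", "after"), times = 4)),
  Pb   = c(10, 12, 20, 25, NA, 30, 40, 41)
)

# Put each weir's two readings on the same row
pairs <- merge(d[d$BA == "before", ], d[d$BA == "after", ],
               by = "weir", suffixes = c(".before", ".after"))

# With two aligned vectors, an incomplete pair is dropped as a whole
res <- t.test(pairs$Pb.before, pairs$Pb.after, paired = TRUE)
print(res)
```

Here weir 3 has no "before" value, so the test runs on the three complete pairs. Matching by weir means a missing reading cannot shift which values get paired with each other.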

Using cast() or ddply() to summarise the mean for two continuous variables in one dataframe

The data (below) have two factor columns named "Date" and "Independent_Variable" (IV), plus two columns named "Independent_Value" and "Sapflow" containing continuous values.
Column Descriptions:
Date = the month of measurement, over 5 months (June-October).
Independent Variable = 3 independent variables (i.e. temperature, humidity, and radiation).
Independent Value = readings of temperature, radiation, and humidity at daily time steps from June to October.
Sapflow (dependent variable) = sapflow rates in tree species recorded at daily time steps from June to October; the aim is to see how the independent variables affect these rates.
Goal
I would like to summarise the data (found below) by group (i.e. Date and Independent Variable) using either cast() or ddply(), producing a new data frame that shows, per month, the mean recorded value of each independent variable (temperature, humidity, and radiation) and the mean rate of sapflow, in the following format:
*Key
*IV = independent variable (i.e. Temperature, Humidity, and Radiation)
*Mean_IV = the mean of the independent variable
*Mean_Sapflow (dependent variable) = the mean sapflow rate per month per independent variable
  Date IV       Mean_IV Mean_Sapflow
1 June Humidity   19.67        14.97
2 June Humidity   18.82        16.31
3 June Humidity   20.38        17.52
4 June Humidity   14.94         7.45
5 June Humidity   12.92        12.18
6 June Humidity   15.28        15.82
Problem:
I have tried using ddply() and cast() but cannot produce the data frame format shown above. If anyone can help, I would be deeply appreciative.
ddply()
library(plyr)
summarised_Sapflow<-ddply(Sapflow_new, c("Date", "Independent_Variable"), summarise,
N=length(Independent_Value),
mean("Independent Value","Sapflow"))
The output is a series of warnings:
Warning messages:
1: In mean.default("Independent Value", "Sapflow") :
argument is not numeric or logical: returning NA
2: In mean.default("Independent Value", "Sapflow") :
argument is not numeric or logical: returning NA
3: In mean.default("Independent Value", "Sapflow") :
argument is not numeric or logical: returning NA
cast()
library(reshape)
Sapflow.Summary<-cast(Sapflow_new,
Date~Independent_Variable, mean,
value=c('Independent_Value','Sapflow'))
This output is very close to my goal, but mean sapflow is missing and the months in "Date" are in the wrong order (alphabetical rather than calendar), probably because my code is arranged incorrectly.
Date Humidity Radiation Temperature
1 August 18.38968 178.9806 71.73355
2 July 21.80065 270.9065 61.33065
3 June 17.60733 263.6733 70.56133
4 October 11.34867 93.6000 81.74300
5 September 14.82200 152.2333 72.21367
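For comparison, the target layout can be sketched in base R with aggregate() rather than cast() or ddply(). This is an illustration, not the original code: `sap` is a made-up miniature of Sapflow_new with invented numbers, and `means` is a hypothetical name. Converting Date to a factor with calendar-ordered levels addresses the month ordering.

```r
# Hypothetical miniature of Sapflow_new (values invented for illustration)
sap <- data.frame(
  Date = c("June", "June", "July", "July", "June", "July"),
  Independent_Variable = c("Humidity", "Humidity", "Humidity", "Humidity",
                           "Temperature", "Temperature"),
  Independent_Value = c(20, 18, 24, 22, 70, 60),
  Sapflow = c(15, 17, 10, 12, 15, 10)
)

# Order months by calendar instead of alphabetically
sap$Date <- factor(sap$Date, levels = month.name)

# Mean of both continuous columns for each Date x Independent_Variable group
means <- aggregate(
  cbind(Mean_IV = Independent_Value, Mean_Sapflow = Sapflow) ~
    Date + Independent_Variable,
  data = sap, FUN = mean
)
print(means)
```

The cbind() on the left of the formula is what lets one call summarise two value columns at once. The warnings in the ddply() attempt arise because mean() there is given the column names as strings rather than the columns themselves.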
Data:
structure(list(Date = structure(c(3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L,
5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L,
4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L,
5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L,
4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L,
5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L,
4L), .Label = c("August", "July", "June", "October",
"September"
), class = "factor"), Independent_Variable =
structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Humidity",
"Radiation",
"Temperature"), class = "factor"), Independent_Value =
c(19.67,
18.82, 20.38, 14.94, 12.92, 15.28, 15.12, 16.05, 15.19,
16.67,
18.69, 14.61, 16.71, 17.35, 16.98, 15.44, 15.21, 18.62,
20.11,
18.64, 15.66, 17.2, 18.21, 19.32, 23.02, 21.69, 18.03,
18.46,
18.45, 20.78, 23.04, 22.05, 19.71, 20.59, 24.89, 23.34,
24.7,
24.2, 22.43, 18.21, 17.66, 18.23, 20.36, 22.83, 23.52,
22.88,
19.59, 21.51, 22.25, 21.47, 22.03, 22.51, 25.54, 24.01,
24.28,
26.21, 23.72, 17.63, 17.27, 19.19, 19.97, 19.84, 22.78,
24.46,
23.05, 23.31, 24.75, 23.23, 18.91, 15.56, 13.51, 15.8,
17.67,
19.18, 18.93, 20.05, 17.1, 16.87, 18.77, 20.49, 21.5,
18.04,
18.82, 17.38, 13.05, 13.13, 13.48, 16.32, 16.74, 16.11,
15.77,
15.48, 18.17, 18.16, 18.44, 16.63, 16.64, 14.47, 13.07,
14.14,
17.27, 16.71, 18.22, 12.9, 13.95, 14.7, 15.78, 17.52,
19.66,
18.87, 18.07, 16.4, 12.92, 10.57, 10.04, 9.78, 10.24,
14.25,
15.92, 11.59, 9.25, 10.33, 11.22, 15.03, 13.67, 14.26,
15.42,
8.34, 8.56, 12.37, 14.38, 15.47, 16.4, 17.15, 20.05,
11.08, 10.63,
14.34, 13.27, 9.33, 8.1, 10.95, 12.79, 8.64, 11.42,
12.12, 9.91,
7.86, 3.51, 4.97, 3.63, 5.59, 85.07, 79.72, 72.83, 90.1,
83.02,
73.34, 77.11, 74.79, 81.66, 77.71, 66.14, 78.15, 69.33,
68.13,
60.31, 69.47, 81.86, 78.63, 77.69, 77.56, 52.88, 53.32,
53.74,
55.85, 49.56, 55.3, 69.25, 74.96, 69.29, 60.07, 54.31,
48.6,
55.73, 56.74, 47.66, 60.51, 55.64, 58.39, 63.8, 63.16,
73.65,
71.08, 64.34, 60.1, 51.61, 54.87, 58.23, 52.49, 52.56,
59.64,
67.85, 64.42, 60.08, 59.71, 57.12, 58.7, 68.85, 72.44,
89.13,
77.67, 62.17, 61.3, 63.58, 66.26, 60.09, 56.63, 53.11,
59.84,
60.06, 80.76, 79.51, 73.96, 84.58, 78.77, 71.65, 72.59,
77.52,
69.04, 78.26, 77.22, 73.75, 81.95, 82.04, 78.14, 73.41,
72.76,
90.68, 74.24, 71.3, 74.4, 60.26, 66.08, 65.18, 57.17,
66.88,
75.53, 71.52, 74.97, 66.02, 78.06, 73.58, 68.18, 83.55,
80.4,
66.28, 72.32, 72.39, 77.74, 69.81, 74.21, 77.37, 88.28,
65.33,
87.54, 80.49, 69.58, 68.18, 69.25, 60.06, 66.38, 68.51,
71.65,
63.29, 76.63, 80.46, 85.56, 81.25, 94.48, 73.87, 76.8,
72.83,
77.55, 81.5, 77.7, 75.79, 94.38, 99.55, 94.14, 87.29,
84.81,
82.63, 85.27, 84.52, 71.13, 76.28, 78.06, 82.83, 75.18,
83.8,
85.38, 84, 85.33, 197.8, 195.5, 288, 72, 160.5, 337.1,
176.9,
242.3, 189.4, 295.7, 363.2, 158, 290, 251.2, 297.3,
192.6, 163.5,
274.5, 210.7, 243.4, 287.4, 375.7, 290.5, 336.4, 361.6,
369.2,
302.6, 295.2, 348.5, 343.5, 327.6, 358.9, 358.6, 288.9,
325.6,
307.8, 321.3, 321.5, 280.6, 264.9, 253, 279.5, 318.1,
285.1,
330.8, 252, 201, 229.9, 259.3, 230.4, 265.5, 214.1, 307,
311.1,
282.5, 256.9, 227.2, 263.4, 68.2, 130.8, 276.6, 299.2,
276.5,
243.9, 291, 289.3, 290.6, 259.6, 220.5, 72.7, 158.9,
233.8, 105.9,
164.2, 168.1, 188.7, 120.1, 217.7, 111.2, 114.7, 143.6,
55.2,
108.5, 162.2, 185, 197.7, 54.1, 126.3, 111.2, 135.4,
228.3, 214.3,
240.1, 247.6, 173, 172.4, 131.9, 149.4, 203.1, 92.3,
168.5, 146.6,
65.9, 103.6, 200.2, 131.3, 183.5, 128.3, 140.6, 124.1,
125.9,
75.8, 173.2, 47.9, 111.7, 205.8, 188.3, 175.6, 193.7,
170.4,
188.3, 108, 171.1, 59.5, 87.7, 142.2, 111.8, 26.3,
129.9, 103.1,
158.7, 147.9, 109.8, 67.8, 106.6, 12.3, 15.8, 53, 63.4,
86.2,
123.3, 112.9, 128.2, 141.9, 81.6, 102, 86.8, 83.9, 50,
96.8,
100.5, 47), Sapflow = c(14.97, 16.31, 17.52, 7.45,
12.18, 15.82,
11.79, 14.45, 10.95, 13.62, 16.28, 11.42, 16.13, 15.09,
17.28,
14.43, 11.7, 16.06, 17.66, 16.33, 17.79, 18.58, 19.41,
19.8,
21.63, 21.35, 17.81, 17.56, 19.37, 21.27, 23.26, 23.67,
22.64,
21.85, 24.81, 22.36, 24.72, 23.87, 23.67, 22.01, 19.23,
19.92,
21.99, 23.6, 24.9, 24.46, 22.22, 23.95, 24.81, 23.88,
22.98,
24.47, 26.09, 25.97, 25.82, 26.24, 25.09, 22, 16.91,
21.35, 25.32,
25.76, 26.38, 25.78, 25.77, 25.15, 26.29, 26.22,
24.59, 18.26,
18.91, 21.57, 21.37, 21.29, 23.96, 24.85, 21.02, 23.05,
22.69,
23.9, 25.24, 25.4, 23.19, 22.8, 22.08, 21.86, 13.82,
22.05, 23.21,
20.12, 22.73, 21.88, 23.33, 24.76, 23.5, 22.06, 22.01,
20.65,
21.54, 19.9, 21.67, 21.84, 18.82, 17.99, 21.41, 23.53,
23.39,
25.75, 22.62, 22.25, 21.81, 16.81, 20.42, 12.08, 12.36,
15.31,
14.14, 15.48, 15.18, 14.19, 12.09, 12.39, 12.34, 12.61,
10.79,
10.53, 11.29, 9.92, 9.79, 10.86, 10.98, 10.58, 12.54,
12.52,
12.25, 6.38, 0.91, 5.24, 6.56, 5.72, 4.55, 4.99, 2.88,
0.99,
1.03, 1.57, 2.07, 2.3, 2.22, 2.11, 2.21, 2.29, 14.97,
16.31,
17.52, 7.45, 12.18, 15.82, 11.79, 14.45, 10.95, 13.62,
16.28,
11.42, 16.13, 15.09, 17.28, 14.43, 11.7, 16.06, 17.66,
16.33,
17.79, 18.58, 19.41, 19.8, 21.63, 21.35, 17.81, 17.56,
19.37,
21.27, 23.26, 23.67, 22.64, 21.85, 24.81, 22.36,
24.72, 23.87,
23.67, 22.01, 19.23, 19.92, 21.99, 23.6, 24.9, 24.46,
22.22,
23.95, 24.81, 23.88, 22.98, 24.47, 26.09, 25.97, 25.82,
26.24,
25.09, 22, 16.91, 21.35, 25.32, 25.76, 26.38, 25.78,
25.77, 25.15,
26.29, 26.22, 24.59, 18.26, 18.91, 21.57, 21.37, 21.29,
23.96,
24.85, 21.02, 23.05, 22.69, 23.9, 25.24, 25.4, 23.19,
22.8, 22.08,
21.86, 13.82, 22.05, 23.21, 20.12, 22.73, 21.88, 23.33,
24.76,
23.5, 22.06, 22.01, 20.65, 21.54, 19.9, 21.67, 21.84,
18.82,
17.99, 21.41, 23.53, 23.39, 25.75, 22.62, 22.25, 21.81,
16.81,
20.42, 12.08, 12.36, 15.31, 14.14, 15.48, 15.18, 14.19,
12.09,
12.39, 12.34, 12.61, 10.79, 10.53, 11.29, 9.92, 9.79,
10.86,
10.98, 10.58, 12.54, 12.52, 12.25, 6.38, 0.91, 5.24,
6.56, 5.72,
4.55, 4.99, 2.88, 0.99, 1.03, 1.57, 2.07, 2.3, 2.22,
2.11, 2.21,
2.29, 14.97, 16.31, 17.52, 7.45, 12.18, 15.82, 11.79,
14.45,
10.95, 13.62, 16.28, 11.42, 16.13, 15.09, 17.28, 14.43,
11.7,
16.06, 17.66, 16.33, 17.79, 18.58, 19.41, 19.8, 21.63,
21.35,
17.81, 17.56, 19.37, 21.27, 23.26, 23.67, 22.64, 21.85,
24.81,
22.36, 24.72, 23.87, 23.67, 22.01, 19.23, 19.92, 21.99,
23.6,
24.9, 24.46, 22.22, 23.95, 24.81, 23.88, 22.98, 24.47,
26.09,
25.97, 25.82, 26.24, 25.09, 22, 16.91, 21.35, 25.32,
25.76, 26.38,
25.78, 25.77, 25.15, 26.29, 26.22, 24.59, 18.26, 18.91,
21.57,
21.37, 21.29, 23.96, 24.85, 21.02, 23.05, 22.69, 23.9,
25.24,
25.4, 23.19, 22.8, 22.08, 21.86, 13.82, 22.05, 23.21,
20.12,
22.73, 21.88, 23.33, 24.76, 23.5, 22.06, 22.01, 20.65,
21.54,
19.9, 21.67, 21.84, 18.82, 17.99, 21.41, 23.53, 23.39,
25.75,
22.62, 22.25, 21.81, 16.81, 20.42, 12.08, 12.36, 15.31,
14.14,
15.48, 15.18, 14.19, 12.09, 12.39, 12.34, 12.61, 10.79,
10.53,
11.29, 9.92, 9.79, 10.86, 10.98, 10.58, 12.54, 12.52,
12.25,
6.38, 0.91, 5.24, 6.56, 5.72, 4.55, 4.99, 2.88, 0.99,
1.03, 1.57,
2.07, 2.3, 2.22, 2.11, 2.21, 2.29)), class =
"data.frame", row.names = c(NA,
-456L))

It is not a ddply() or a cast() solution, but using tidyverse and reshape2 you can do:
library(dplyr)     # group_by(), summarise(), mutate(), arrange(), left_join(), %>%
library(reshape2)  # dcast()

df %>%
  group_by(Date, Independent_Variable) %>%
  summarise(Independent_Value = mean(Independent_Value)) %>%
  mutate(Independent_Variable = paste(Independent_Variable, "IV", sep = "_")) %>%
  dcast(Date ~ Independent_Variable, value.var = "Independent_Value") %>%
  arrange(factor(Date, levels = month.name)) %>%
  left_join(df %>%
              group_by(Date, Independent_Variable) %>%
              summarise(Sapflow = mean(Sapflow)) %>%
              mutate(Independent_Variable = paste(Independent_Variable, "Sapflow", sep = "_")) %>%
              dcast(Date ~ Independent_Variable, value.var = "Sapflow") %>%
              arrange(factor(Date, levels = month.name)),
            by = "Date")
       Date Humidity_IV Radiation_IV Temperature_IV Humidity_Sapflow
1      June    17.60733     263.6733       70.56133        16.067000
2      July    21.80065     270.9065       61.33065        23.356774
3    August    18.38968     178.9806       71.73355        22.941613
4 September    14.82200     152.2333       72.21367        19.309333
5   October    11.34867      93.6000       81.74300         6.700667
  Radiation_Sapflow Temperature_Sapflow
1         16.067000           16.067000
2         23.356774           23.356774
3         22.941613           22.941613
4         19.309333           19.309333
5          6.700667            6.700667
First, it groups by "Date" and "Independent_Variable" and takes the mean of "Independent_Value". Second, it appends "_IV" to the values in Independent_Variable. Third, it reshapes the data to wide format and arranges the rows in calendar order of the months. Fourth, it repeats the first three steps for "Sapflow", using the suffix "_Sapflow". Finally, it joins the two results by "Date".
Or by using just tidyverse:
df %>%
  group_by(Date, Independent_Variable) %>%     # Grouping
  summarise_all(list(mean = mean)) %>%         # Summarising all variables and adding "_mean" to the new variable names
  arrange(factor(Date, levels = month.name))   # Arranging in calendar order of the months
  Date  Independent_Variable Independent_Value_mean Sapflow_mean
  <fct> <fct>                                 <dbl>        <dbl>
1 June  Humidity                               17.6         16.1
2 June  Radiation                             264.          16.1
3 June  Temperature                            70.6         16.1
4 July  Humidity                               21.8         23.4
5 July  Radiation                             271.          23.4
6 July  Temperature                            61.3         23.4
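
For comparison, the same "mean per group, then one column per variable" idea can be sketched in base R with tapply(). This is only an illustration on made-up toy data in the same long layout as df, not the approach used in the answer above; the object names toy and wide are assumptions.

```r
# Toy data in the same long layout as df (values are made up for illustration)
toy <- data.frame(
  Date = factor(c("June", "June", "July", "July", "June", "June", "July", "July")),
  Independent_Variable = factor(rep(c("Humidity", "Temperature"), each = 4)),
  Independent_Value = c(10, 20, 30, 50, 60, 80, 70, 90)
)

# Mean of Independent_Value for every Date x Independent_Variable cell,
# returned as a wide matrix (rows = Date, columns = Independent_Variable)
wide <- with(toy, tapply(Independent_Value, list(Date, Independent_Variable), mean))

# Reorder the rows by calendar month instead of alphabetically
wide <- wide[order(match(rownames(wide), month.name)), , drop = FALSE]
print(wide)
```

tapply() returns a matrix rather than a data frame, so it suits a quick look at the group means more than further joining.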

Bar graph in ggplot2 with width as a variable and even spacing between bars

So I am trying to make a stacked bar graph with bar width mapped to a variable; but I want the spacing between my bars to be constant.
Does anyone know how to make the spacing constant between the bars?
Right now I've got this:
p<-ggplot(dd, aes(variable, value.y, fill=Date, width=value.x / 15))+ coord_flip() + opts(ylab="")
p1<-p+ geom_bar(stat="identity") + scale_fill_brewer(palette="Dark2") + scale_fill_hue(l=55,c=55)
p2<-p1 + opts(axis.title.x = theme_blank(), axis.title.y = theme_blank())
p2
Thanks in advance.
Here's my data by the way (sorry for the long, bulky dput):
> dput(dd)
structure(list(variable = structure(c(1L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 7L, 7L, 7L, 7L, 7L, 2L, 2L,
2L, 2L, 2L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 9L, 9L, 9L,
9L, 9L, 8L, 8L, 8L, 8L, 8L), .Label = c("Alcohol and Tobacco",
"Health and Personal Care", "Clothing", "Energy", "Recreation and Education",
"Household", "Food", "Transportation", "Shelter"), class = "factor", scores = structure(c(2.91,
5.31, 10.08, 15.99, 4.95, 11.55, 11.2, 27.49, 20.6), .Dim = 9L, .Dimnames = list(
c("Alcohol and Tobacco", "Clothing", "Energy", "Food", "Health and Personal Care",
"Household", "Recreation and Education", "Shelter", "Transportation"
)))), value.x = c(2.91, 2.91, 2.91, 2.91, 2.91, 5.31, 5.31,
5.31, 5.31, 5.31, 10.08, 10.08, 10.08, 10.08, 10.08, 15.99, 15.99,
15.99, 15.99, 15.99, 4.95, 4.95, 4.95, 4.95, 4.95, 11.55, 11.55,
11.55, 11.55, 11.55, 11.2, 11.2, 11.2, 11.2, 11.2, 27.49, 27.49,
27.49, 27.49, 27.49, 20.6, 20.6, 20.6, 20.6, 20.6), Date = structure(c(5L,
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L,
3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L,
2L, 1L, 5L, 4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("1993-2001",
"2001-2006", "2007-2010", "2010-2011", "2012 Jan - May"), class = "factor"),
value.y = c(2.1, 2.5, 7.6, 21.7, 2.8, 1.5, 0.3, -4.1, -4.2,
4.7, 3, 16.9, 1.9, 32.8, 23.9, 3.2, 4.6, 11.3, 8.9, 12.9,
1.7, 2, 7.8, 5.9, 10, 1.9, 2.1, 5.6, 2.2, 9.9, 1.4, 1.3,
2.2, 0.6, 17.3, 1.1, 2.3, 6.4, 13.1, 10, 4.3, 7.6, 0.9, 15.2,
20.5)), .Names = c("variable", "value.x", "Date", "value.y"
), row.names = c(NA, -45L), class = "data.frame")

For a categorical ("discrete") scale you can adjust the bar width, but it needs to be between 0 and 1. Your value.x values are well above 1, hence the overlap. You can use rescale() from the scales package to map them into that range quickly, so that the within-category width of each bar still represents another variable (in this case value.x):
install.packages("scales")
library(scales)
library(ggplot2)
ggplot(dd,aes(x=variable,y=value.y,fill=Date)) +
  geom_bar(aes(width=rescale(value.x,c(0.5,1))),stat="identity",position="stack") +
  coord_flip()
Play with the rescaling range to get the best view, e.g. change 0.5 to 0.25.
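
To make the effect of the rescaling concrete, here is a minimal base-R sketch of the linear mapping that rescale() performs on a numeric vector. The helper name rescale_sketch is hypothetical and is not part of the scales package; the widths are the nine distinct value.x values from the question.

```r
# Hypothetical helper: linearly map x onto the interval `to`
rescale_sketch <- function(x, to = c(0, 1)) {
  rng <- range(x, na.rm = TRUE)
  to[1] + (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1])
}

# One value.x per category, copied from the question's data
value_x <- c(2.91, 5.31, 10.08, 15.99, 4.95, 11.55, 11.2, 27.49, 20.6)

# The smallest value maps to 0.5 and the largest to 1, so no bar exceeds width 1
widths <- rescale_sketch(value_x, to = c(0.5, 1))
print(round(widths, 3))
```

A lower bound closer to 0 exaggerates the width differences between categories; a bound closer to 1 flattens them.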
Personally, I think something like this is more informative:
ggplot(dd,aes(x=variable,y=value.y,fill=Date)) +
  geom_bar(aes(width=rescale(value.x,c(0.2,1))),stat="identity") +
  coord_flip() + facet_grid(~Date) + theme(legend.position="none")

Attempt # 2.
I'm tricking ggplot2 into writing a continuous scale as categorical.
# The numbers for tmp I calculated by hand. Not sure how to program
# this part, but the math is
# last + half(previous_width) + half(current_width)
# Change the number inside rep() in cumsum() to adjust the spacing between categories
tmp <- c(2.91,7.02,14.715,27.75,38.22,46.47,57.845,77.19,101.235) + cumsum(rep(5,9))
dd$x.pos1 <- rep(tmp,each=5)
ggplot(dd,aes(x=x.pos1,y=value.y,fill=Date)) +
  geom_bar(aes(width=value.x),stat="identity",position="stack") +
  # tmp follows the row order of dd, so the labels must too (levels(dd$variable) is in a different order)
  scale_x_continuous(breaks=tmp,labels=unique(as.character(dd$variable))) +
  coord_flip()
For good measure you're probably going to want to adjust the text size. That's done with ... + theme(axis.text.y=element_text(size=12))
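
The hand-computed positions in tmp can also be derived from the bar widths. The following is a minimal sketch, assuming one width per category in the row order of dd (with the data frame at hand, dd$value.x[!duplicated(dd$variable)] should give the same vector); the names w, gap and centres are illustrative only.

```r
# One bar width per category, in the row order of dd (copied from the question's data)
w <- c(2.91, 5.31, 10.08, 15.99, 4.95, 11.55, 11.2, 27.49, 20.6)
gap <- 5  # constant space between neighbouring bars, same as rep(5, 9) above

# centre_i = centre_(i-1) + w_(i-1)/2 + w_i/2, starting from centre_1 = w_1
centres <- cumsum(w) - w / 2 + w[1] / 2

# Shift each bar by one more gap than the previous one to keep the spacing even
tmp <- centres + gap * seq_along(w)
print(tmp)
```

Because every bar is pushed one gap further than its predecessor, the empty space between adjacent bars stays at gap regardless of their widths.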

Create boxplots with groups spread across multiple columns - r

A ggplot2 and reshape2 way:
library(reshape2)  # melt()
library(ggplot2)
library(dplyr)     # provides %>%
df %>%
  melt(id.vars = 'id', measure.vars = c('t1','t2','t3')) %>%
  ggplot(aes(x = variable, y = value)) +
  geom_boxplot(aes(color = variable))
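
The reshaping step is what makes one box per column possible: the t1, t2 and t3 columns are stacked into a single value column plus a column naming the source. A rough base-R sketch of the same idea, using stack() and boxplot() instead of melt() and ggplot2, and only the first three rows of the question's data (the object names wide and long are assumptions):

```r
# First three participants' t1..t3 values from the question's data (wide layout)
wide <- data.frame(
  id = factor(1:3),
  t1 = c(10.43, 11.59, 11.35),
  t2 = c(13.21, 10.66, 11.12),
  t3 = c(11.59, 13.21, 11.35)
)

# stack() turns the three columns into `values` plus a factor `ind` naming the source column
long <- stack(wide[c("t1", "t2", "t3")])

# One box per original column
boxplot(values ~ ind, data = long, xlab = "variable", ylab = "value")
```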
