I need help writing gtsummary R code to produce the following table output. A dummy table is shown above.
library(gtsummary)
id  age  sex  country  edu  ln  ivds  n2  p5
1   a    M    eng      x    45  15    40  15
2   a    M    eng      x    23  26    70  15
4   a    M    eng      x    26  36    35  40
5   b    F    eng      x    26  25    36  47
6   b    F    wal      y    45  45    60  12
7   b    M    wal      y    60  25    36  15
8   c    M    wal      y    70  08    25  36
9   c    F    sco      z    80  25    36  15
10  c    F    sco      z    90  25    26  39
structure(list(id = 1:15, age = structure(c(1L, 1L, 2L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 1L, 2L), .Label = c("a", "b",
"c"), class = "factor"), sex = structure(c(2L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("F", "M"), class = "factor"),
country = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 3L), .Label = c("eng", "scot", "wale"
), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L), .Label = c("x",
"y", "z"), class = "factor"), lon = c(45L, 23L,
25L, 45L, 70L, 69L, 90L, 50L, 62L, 45L, 23L, 25L, 45L, 70L,
69L), is = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 9L,
15L, 26L, 36L, 34L, 2L, 4L), n2 = c(40L, 70L, 50L, 60L,
30L, 25L, 80L, 89L, 10L, 40L, 70L, 50L, 60L, 30L, 25L), p5 = c(15L,
20L, 36L, 48L, 25L, 36L, 28L, 15L, 25L, 15L, 20L, 36L, 48L,
25L, 36L)), row.names = c(NA, 15L), class = "data.frame")
I made a table similar to what you have above (closer to the table you had before you updated it), and I think it will get you most of the way there.
The type of table you're requesting is something that is in the works. In the meantime, you will need to use the bstfun::tbl_2way_summary() function. It lives in a separate package while we work on improving it before integrating it into gtsummary.
library(bstfun) # install with `remotes::install_github("ddsjoberg/bstfun")`
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.1'
# add a column that is all the same value
trial2 <- trial %>% mutate(constant = TRUE)
# loop over each continuous variable, construct table, then merge them together
tbls_row1 <-
  c("age", "marker", "ttdeath") %>%
  purrr::map(
    ~tbl_2way_summary(data = trial2, row = grade, col = constant, con = all_of(.x),
                      statistic = "{mean} ({sd}) - {min}, {max}") %>%
      modify_header(stat_1 = paste0("**", .x, "**"))
  ) %>%
  tbl_merge() %>%
  modify_spanning_header(everything() ~ NA)
# repeat for the second row
tbls_row2 <-
  c("age", "marker", "ttdeath") %>%
  purrr::map(
    ~tbl_2way_summary(data = trial2, row = stage, col = constant, con = all_of(.x),
                      statistic = "{mean} ({sd}) - {min}, {max}") %>%
      modify_header(stat_1 = paste0("**", .x, "**"))
  ) %>%
  tbl_merge() %>%
  modify_spanning_header(everything() ~ NA)
# stack these tables
tbl_stacked <- tbl_stack(list(tbls_row1, tbls_row2))
# lastly, add calculated summary stats for categorical variables, and merge them
tbl_summary_stats <-
  trial2 %>%
  tbl_summary(
    include = c(grade, stage),
    missing = "no"
  ) %>%
  modify_header(stat_0 ~ "**n (%)**") %>%
  modify_footnote(everything() ~ NA)
tbl_final <-
  tbl_merge(list(tbl_summary_stats, tbl_stacked)) %>%
  modify_spanning_header(everything() ~ NA) %>%
  # column spanning column headers
  modify_spanning_header(
    list(c(stat_1_1_2, stat_1_2_2) ~ "**Group 1**",
         stat_1_3_2 ~ "**Group 2**")
  )
Created on 2021-07-10 by the reprex package (v2.0.0)
I am trying to show the distribution of data for three different methods of measuring forest fuels (FAP, One PIT (onetrans), and Two PIT (twotrans), shown in facets below). The count on the y-axis is the number of sample points whose estimate falls in the grouped value on the x-axis (Total.kg.m2), which is a continuous variable. I don't particularly care how wide the bins on the x-axis are, but I want only values that are exactly zero to sit above the "0" label. My current graph [1] is misleading because no sample points estimate "0" for the FAP method. Below are some example data and my code. How can I do this more effectively? My data frame is called "cwd", but I have included a subset at the bottom.
My current graph:
The code for my current graph:
method_names <- c(`FAP` = "FAP", `onetrans` = "PIT - One Transect ", `twotrans` = "PIT - Two Transects")
ggplot(sampleData, aes(Total.kg.m2)) +
  geom_histogram(bins = 40, color = "black", fill = "white") +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black"),
        legend.position = "none", axis.text = element_text(size = 10),
        axis.title = element_text(size = 12)) +
  scale_x_continuous(name = expression("kg m"^"-2"), breaks = seq(0, 16, 1)) +
  scale_y_continuous(name = "Count", breaks = seq(0, 80, 10), limits = c(0, 70)) +
  facet_grid(. ~ method) +
  facet_wrap(~method, ncol = 1, labeller = as_labeller(method_names)) +
  theme(strip.text.x = element_text(size = 14),
        strip.background = element_rect(color = "black", fill = "gray"))
I don't think geom_bar gets me what I want, and when I tried changing the binwidth to 0.05 in geom_histogram the bins became too small. Essentially, I think I need to convert my data from continuous numeric to a factor, but I'm not sure how to make that work.
Here is some sample data:
sampleData
Site Treatment Unit Plot Total.Tons.ac Total.kg.m2 method
130 Thinning CO 10 7 0.4500000 0.1008000 twotrans
351 Shelterwood CO 12 1 7.2211615 1.6175402 twotrans
88 Thinning NB 3 7 1.1400000 0.2553600 twotrans
224 Shelterwood NB 2 3 2.1136105 0.4734487 onetrans
54 Thinning SB 9 11 1.8857743 0.4224134 onetrans
74 Thinning SB 1 3 0.8500000 0.1904000 twotrans
328 Shelterwood DB 7 11 0.8740906 0.1957963 twotrans
341 Shelterwood CO 10 5 2.4210886 0.5423239 twotrans
266 Shelterwood WB 9 7 1.0092961 0.2260823 onetrans
405 Shelterwood WB 9 5 7.0029263 1.5686555 FAP
332 Shelterwood NB 8 7 2.8059152 0.6285250 twotrans
126 Thinning SB 9 11 1.4900000 0.3337600 twotrans
295 Shelterwood NB 2 5 7.6567281 1.7151071 twotrans
406 Shelterwood WB 9 7 3.0703135 0.6877502 FAP
179 Thinning FB 6 9 13.2916773 2.9773357 FAP
185 Thinning FB 7 9 5.3594318 1.2005127 FAP
39 Thinning FB 7 5 0.0000000 0.0000000 onetrans
187 Thinning NB 8 1 0.9477477 0.2122955 FAP
10 Thinning FB 2 7 0.0000000 0.0000000 onetrans
102 Thinning SB 5 11 0.0000000 0.0000000 twotrans
dput(sampleData)
structure(list(Site = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("Shelterwood",
"Thinning"), class = "factor"), Treatment = structure(c(1L, 1L,
4L, 4L, 5L, 5L, 2L, 1L, 6L, 6L, 4L, 5L, 4L, 6L, 3L, 3L, 3L, 4L,
3L, 5L), .Label = c("CO", "DB", "FB", "NB", "SB", "WB"), class = "factor"),
Unit = c(10L, 12L, 3L, 2L, 9L, 1L, 7L, 10L, 9L, 9L, 8L, 9L,
2L, 9L, 6L, 7L, 7L, 8L, 2L, 5L), Plot = c(7L, 1L, 7L, 3L,
11L, 3L, 11L, 5L, 7L, 5L, 7L, 11L, 5L, 7L, 9L, 9L, 5L, 1L,
7L, 11L), Total.Tons.ac = c(0.45, 7.221161504, 1.14, 2.113610483,
1.885774282, 0.85, 0.874090569, 2.421088641, 1.009296069,
7.002926269, 2.805915201, 1.49, 7.656728085, 3.07031351,
13.29167729, 5.359431807, 0, 0.947747726, 0, 0), Total.kg.m2 = c(0.1008,
1.617540177, 0.25536, 0.473448748, 0.422413439, 0.1904, 0.195796287,
0.542323856, 0.22608232, 1.568655484, 0.628525005, 0.33376,
1.715107091, 0.687750226, 2.977335712, 1.200512725, 0, 0.212295491,
0, 0), method = structure(c(3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L,
2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 3L), .Label = c("FAP",
"onetrans", "twotrans"), class = "factor")), .Names = c("Site",
"Treatment", "Unit", "Plot", "Total.Tons.ac", "Total.kg.m2",
"method"), row.names = c(130L, 351L, 88L, 224L, 54L, 74L, 328L,
341L, 266L, 405L, 332L, 126L, 295L, 406L, 179L, 185L, 39L, 187L,
10L, 102L), class = "data.frame")
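One possible way to get a bar that holds only exact zeros is to bin the values by hand with cut() and give zero its own factor level, then draw the counts with geom_bar. The sketch below is only an illustration under assumptions: the small data frame d is made up from a few rows of the sample data, and the bin width of 0.5 and the names width, edges and bin are arbitrary choices, not part of the original code.

```r
library(ggplot2)

# Illustrative subset (values rounded from the sample data above)
d <- data.frame(
  Total.kg.m2 = c(0.1008, 1.6175, 0.2554, 0.4734, 0,
                  1.5687, 0.6878, 2.9773, 0, 0),
  method = c("twotrans", "twotrans", "twotrans", "onetrans", "onetrans",
             "FAP", "FAP", "FAP", "onetrans", "twotrans")
)

# Assumed bin width; edges run from 0 up to the first multiple of `width`
# at or above the maximum value
width <- 0.5
edges <- seq(0, ceiling(max(d$Total.kg.m2) / width) * width, by = width)

# cut() uses right-closed intervals (0, 0.5], (0.5, 1], ... by default,
# so exact zeros fall outside every interval and come back as NA
binned <- cut(d$Total.kg.m2, breaks = edges)

# Put the zeros in their own level, placed first on the axis
d$bin <- factor(
  ifelse(d$Total.kg.m2 == 0, "0", as.character(binned)),
  levels = c("0", levels(binned))
)

# geom_bar counts rows per level; drop = FALSE keeps empty bins on the axis
p <- ggplot(d, aes(bin)) +
  geom_bar(color = "black", fill = "white") +
  scale_x_discrete(drop = FALSE) +
  facet_wrap(~method, ncol = 1)
```

With this layout the "0" bar in the FAP facet stays empty whenever no FAP sample point is exactly zero, at the cost of a discrete rather than continuous x-axis.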
I have created a function to compute basic statistics such as mean, median, mode, SD and variance, based on what the user wants. E.g., if the user wants to see only the mean, only the mean should be calculated. So the statistic is passed as an argument.
The code is:
countfunc<-function(dset,Xaxis,Color,Groupby,AggValue){
S1=select(dset,Xaxis,Color,Groupby)
S2=unique(S1)
str(S2)
stackval5<-aggregate(Groupby~Xaxis+Color,data=S2,FUN=AggValue)
return(stackval5)
}
countfunc(sbarr,"workclass","sex","age","mean")
Sample data :
> dput(head(S1,20))
structure(list(workclass = structure(c(8L, 7L, 5L, 5L, 5L, 5L,
5L, 7L, 5L, 5L, 5L, 8L, 5L, 5L, 5L, 5L, 7L, 5L, 5L, 7L), .Label = c(" Federal-gov",
" Local-gov", " NA", " Never-worked", " Private", " Self-emp-inc",
" Self-emp-not-inc", " State-gov", " Without-pay"), class = "factor"),
sex = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c(" Female",
" Male"), class = "factor"), age = c(39L, 50L, 38L, 53L,
28L, 37L, 49L, 52L, 31L, 42L, 37L, 30L, 23L, 32L, 40L, 34L,
25L, 32L, 38L, 43L)), .Names = c("workclass", "sex", "age"
), row.names = c(NA, 20L), class = "data.frame")
But when I run the function, it reports "In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA", even though the "age" column is an integer. I tried converting it to numeric as well.
str of my DF
'data.frame': 886 obs. of 3 variables:
$ workclass: Factor w/ 9 levels " Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
$ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
$ age : int 39 50 38 53 28 37 49 52 31 42 ...
Xaxis Color Groupby
1 workclass sex NA
If I hard-code the values (aggregate(age~workclass+sex,data=S1,FUN=mean)), it works as expected. It would be a great help if you could guide me or share some thoughts on what I am doing wrong here. Thanks in advance.
Try the following.
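library(dplyr)  # for select() and all_of()
countfunc <- function(dset, Xaxis, Color, Groupby, AggValue) {
  # column names arrive as strings, so select them with all_of()
  S1 <- select(dset, all_of(c(Xaxis, Color, Groupby)))
  S2 <- unique(S1)
  # index columns by name with [[ ]] instead of using a formula
  stackval5 <- aggregate(S2[[Groupby]], list(S2[[Xaxis]], S2[[Color]]), FUN = AggValue)
  names(stackval5) <- c(Xaxis, Color, Groupby)
  stackval5
}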
countfunc(sbarr,"workclass","sex","age","mean")
workclass sex age
1 Private Female 33.60000
2 Self-emp-not-inc Female 43.00000
3 Private Male 39.42857
4 Self-emp-not-inc Male 42.33333
5 State-gov Male 34.50000
The problem was the formula. aggregate looked up the variables Xaxis, Color and Groupby; S2 has no columns with those names, so it found the function arguments instead, whose values are the strings "workclass", "sex" and "age". Since the value "age" is neither numeric nor logical, the result was NA (it effectively computed mean("age")).
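If you would rather keep the formula interface, another option is to build the formula from the strings with reformulate() from the stats package. This is only a sketch: the function name count_by_formula and the toy data frame are made up for illustration and are not part of the original code.

```r
# Hypothetical variant of countfunc that builds the formula from strings
count_by_formula <- function(dset, Xaxis, Color, Groupby, AggValue) {
  S2 <- unique(dset[, c(Xaxis, Color, Groupby)])
  # e.g. age ~ workclass + sex
  f <- reformulate(c(Xaxis, Color), response = Groupby)
  aggregate(f, data = S2, FUN = AggValue)
}

# Toy data, invented for this example
toy <- data.frame(
  workclass = c("Private", "Private", "State-gov", "State-gov"),
  sex       = c("Male", "Male", "Male", "Female"),
  age       = c(30L, 40L, 50L, 20L)
)

res <- count_by_formula(toy, "workclass", "sex", "age", "mean")
```

Because the formula now names real columns of S2, aggregate finds the data it needs and the result columns keep their original names without a separate names() call.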
Suppose I have two boxplots.
trial1 <- ggplot(completionTime, aes(fill=Condition, x=Scenario, y=Trial1))
trial1 + geom_boxplot()+geom_point(position=position_dodge(width=0.75)) + ylim(0, 160)
trial2 <- ggplot(completionTime, aes(fill=Condition, x=Scenario, y=Trial2))
trial2 + geom_boxplot()+geom_point(position=position_dodge(width=0.75)) + ylim(0, 160)
How can I plot Trial 1 and Trial 2 on the same plot, sharing the same x-axis? They have the same y range.
I looked at geom_boxplot(position="identity"), but that plots the two conditions (fill) at the same x position.
I want to plot two y columns against the same x.
Edit: the dataset
User Condition Scenario Trial1 Trial2
1 1 ME a 67 41
2 1 ME b 70 42
3 1 ME c 40 15
4 1 ME d 65 23
5 1 ME e 45 45
6 1 SE a 100 34
7 1 SE b 54 23
8 1 SE c 70 23
9 1 SE d 56 15
10 1 SE e 30 20
11 2 ME a 42 23
12 2 ME b 22 12
13 2 ME c 28 8
14 2 ME d 22 8
15 2 ME e 38 37
16 2 SE a 59 18
17 2 SE b 65 14
18 2 SE c 75 7
19 2 SE d 37 9
20 2 SE e 31 7
dput()
structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Condition = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("ME", "SE"), class = "factor"), Scenario =
structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L), .Label = c("a", "b", "c", "d", "e"), class = "factor"),
Trial1 = c(67L, 70L, 40L, 65L, 45L, 100L, 54L, 70L, 56L,
30L, 42L, 22L, 28L, 22L, 38L, 59L, 65L, 75L, 37L, 31L), Trial2 = c(41L,
42L, 15L, 23L, 45L, 34L, 23L, 23L, 15L, 20L, 23L, 12L, 8L,
8L, 37L, 18L, 14L, 7L, 9L, 7L)), .Names = c("User", "Condition",
"Scenario", "Trial1", "Trial2"), class = "data.frame", row.names = c(NA,
-20L))
You could try using interaction to combine two of your factors and plot against a third. For example, assuming you want to fill by condition as in your original code:
library(tidyr)
library(ggplot2)
# reshape Trial1/Trial2 into long format, then plot each Scenario-trial pair
completionTime %>%
  gather(trial, value, -Scenario, -Condition, -User) %>%
  ggplot(aes(interaction(Scenario, trial), value)) +
  geom_boxplot(aes(fill = Condition))
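A variation on the same idea, sketched here under assumptions, is to reshape with pivot_longer() and put the two trials in side-by-side facets instead of combining them into one axis with interaction(). The small completionTime data frame below is a subset of the question's data rebuilt for illustration, and the column names trial and value are arbitrary.

```r
library(tidyr)
library(ggplot2)

# Subset of the question's data (Scenarios a and b only), rebuilt by hand
completionTime <- data.frame(
  User      = rep(1:2, each = 4),
  Condition = rep(c("ME", "SE"), times = 2, each = 2),
  Scenario  = rep(c("a", "b"), times = 4),
  Trial1    = c(67, 70, 100, 54, 42, 22, 59, 65),
  Trial2    = c(41, 42, 34, 23, 23, 12, 18, 14)
)

# One row per (User, Condition, Scenario, trial)
long <- pivot_longer(completionTime,
                     cols = c(Trial1, Trial2),
                     names_to = "trial",
                     values_to = "value")

# Same x (Scenario) and fill (Condition) in both panels, one panel per trial
p <- ggplot(long, aes(x = Scenario, y = value, fill = Condition)) +
  geom_boxplot() +
  facet_wrap(~trial)
```

Facets share the y-axis by default, which fits the case here where both trials cover the same range.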
This is a question about array and data frame manipulation and calculation, in the
context of models for log odds in contingency tables. The closest question I've found to this is How can i calculate odds ratio in many table, but mine is more general.
I have a data frame representing a 3-way frequency table, of size 5 (litter) x 2 (treatment) x 3 (deaths).
"Freq" is the frequency in each cell, and deaths is the response variable.
Mice <-
structure(list(litter = c(7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 7L, 7L, 8L,
8L, 9L, 9L, 10L, 10L, 11L, 11L), treatment = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A",
"B"), class = "factor"), deaths = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("0", "1",
"2+"), class = "factor"), Freq = c(58L, 75L, 49L, 58L, 33L, 45L,
15L, 39L, 4L, 5L, 11L, 19L, 14L, 17L, 18L, 22L, 13L, 22L, 12L,
15L, 5L, 7L, 10L, 8L, 15L, 10L, 15L, 18L, 17L, 8L)), .Names = c("litter",
"treatment", "deaths", "Freq"), row.names = c(NA, 30L), class = "data.frame")
From this, I want to calculate the log odds for adjacent categories of the last variable (deaths)
and have this value in a data frame with factors litter (5), treatment (2), and contrast (2), as detailed below.
The data can be seen in xtabs() form:
mice.tab <- xtabs(Freq ~ litter + treatment + deaths, data=Mice)
ftable(mice.tab)
deaths 0 1 2+
litter treatment
7 A 58 11 5
B 75 19 7
8 A 49 14 10
B 58 17 8
9 A 33 18 15
B 45 22 10
10 A 15 13 15
B 39 22 18
11 A 4 12 17
B 5 15 8
From this, I want to calculate the (adjacent) log odds of 0 vs. 1 and 1 vs.2+ deaths, which is easy in
array format,
odds1 <- log(mice.tab[,,1]/mice.tab[,,2]) # contrast 0:1
odds2 <- log(mice.tab[,,2]/mice.tab[,,3]) # contrast 1:2+
odds1
treatment
litter A B
7 1.6625477 1.3730491
8 1.2527630 1.2272297
9 0.6061358 0.7156200
10 0.1431008 0.5725192
11 -1.0986123 -1.0986123
But, for analysis, I want to have these in a data frame, with factors litter, treatment and contrast
and a column, 'logodds' containing the entries in the odds1 and odds2 tables, suitably strung out.
More generally, for an I x J x K table, where the last factor is the response, my desired result
is a data frame of IJ(K-1) rows, with adjacent log odds in a 'logodds' column, and ideally, I'd like
to have a general function to do this.
Note that if T is the 10 x 3 matrix of frequencies shown by ftable(), the calculation is essentially
log(T) %*% matrix(c(1, -1, 0,
                    0, 1, -1), nrow = 3)
followed by reshaping and labeling.
Can anyone help with this?
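One way this could be sketched for a 3-way table is to loop over adjacent levels of the last dimension, take the log ratio of the two slices, and string each result out with as.data.frame(). The function name adjacent_logodds is hypothetical, and the sketch assumes the response is the third dimension and that no cell is zero. The Mice data are rebuilt with expand.grid() from the frequencies given above.

```r
# Hypothetical helper: adjacent log odds along the 3rd dimension of a table
adjacent_logodds <- function(tab) {
  K <- dim(tab)[3]
  resp <- dimnames(tab)[[3]]
  pieces <- lapply(seq_len(K - 1), function(k) {
    # I x J matrix of log(n_k / n_{k+1}), strung out to one row per cell
    lo <- log(tab[, , k] / tab[, , k + 1])
    df <- as.data.frame(as.table(lo), responseName = "logodds")
    df$contrast <- paste(resp[k], resp[k + 1], sep = ":")
    df
  })
  out <- do.call(rbind, pieces)
  out$contrast <- factor(out$contrast, levels = unique(out$contrast))
  out
}

# Rebuild the Mice data (treatment varies fastest, then litter, then deaths)
Mice <- expand.grid(treatment = c("A", "B"), litter = 7:11,
                    deaths = c("0", "1", "2+"))
Mice$Freq <- c(58, 75, 49, 58, 33, 45, 15, 39, 4, 5,
               11, 19, 14, 17, 18, 22, 13, 22, 12, 15,
               5, 7, 10, 8, 15, 10, 15, 18, 17, 8)

mice.tab <- xtabs(Freq ~ litter + treatment + deaths, data = Mice)
res <- adjacent_logodds(mice.tab)
```

The result has one row per litter, treatment and contrast combination, IJ(K-1) rows in total, with litter and treatment kept as factors so it can go straight into a model formula.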