I am trying to produce a univariate output table using the gtsummary package.
structure(list(id = 1:10, age = structure(c(3L, 3L, 2L, 3L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("c", "b", "a"), class = c("ordered",
"factor")), sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L), .Label = c("F", "M"), class = "factor"), country = structure(c(1L,
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("eng", "scot",
"wale"), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("x", "y", "z"), class = "factor"),
lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L,
62L, 45L), ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L,
9L, 15L), no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L,
10L, 40L), pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L,
25L, 15L)), row.names = c(NA, 10L), class = "data.frame")
...
library(gtsummary)
publication_dummytable1_sum %>%
select(sex,age,lungfunction,ivdays) %>%
tbl_uvregression(
method =lm,
y = lungfunction,
pvalue_fun = ~style_pvalue(.x, digits = 3)
) %>%
add_global_p() %>% # add global p-value
bold_p() %>% # bold p-values under a given threshold
bold_labels()
...
When I run this code I get the output below. The issue is the labeling of the ordered factor variable (age). R chooses its own labeling for the ordered factor variable. Is it possible to tell R not to choose its own labeling for ordered factor variables?
I want output like the following:
Like many other people, I think you might be misunderstanding the meaning of an "ordered" factor in R. All factors in R are ordered, in a sense; the estimates etc. are typically printed, plotted, etc. in the order of the levels vector. Specifying that a factor is of type ordered has two major effects:
it allows you to evaluate inequalities on the levels of the factor (e.g. you can filter(age > "b"))
the contrasts are set by default to orthogonal polynomial contrasts, which is where the L (linear) and Q (quadratic) labels come from: see e.g. this CrossValidated answer for more details.
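Both effects can be seen in a minimal base-R sketch. The small `age` vector here is made up for illustration and is not the data from the question.

```r
# Toy ordered factor (hypothetical values, for illustration only)
age <- factor(c("a", "b", "c", "b"),
              levels = c("a", "b", "c"), ordered = TRUE)

# 1. Inequalities are defined on the levels of an ordered factor
above_b <- age > "b"   # TRUE only where age is "c"

# 2. The default contrasts of an ordered factor are orthogonal polynomials;
#    with three levels the columns are ".L" (linear) and ".Q" (quadratic)
cp <- contrasts(age)
colnames(cp)
```

The `.L` and `.Q` column names are what end up appended to the variable name in the model output (`age.L`, `age.Q`).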
If you want this variable treated in the same way as a regular factor (so that the estimates are made for differences of groups from the baseline level, i.e. treatment contrasts), you can:
convert back to an unordered factor (e.g. factor(age, ordered=FALSE))
specify that you want to use treatment contrasts in your model (in base R you would specify contrasts = list(age = "contr.treatment"))
set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")
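The three options above can be sketched in base R as follows. The data frame `d` is made up for the example; only `lm()`, `factor()` and `options()` are used, and all three fits should give the same baseline-difference estimates.

```r
# Toy data (hypothetical): y increases across the ordered levels of age
d <- data.frame(
  y   = c(1, 2, 4, 5, 9, 10),
  age = factor(rep(c("a", "b", "c"), each = 2),
               levels = c("a", "b", "c"), ordered = TRUE)
)

# Default for an ordered factor: polynomial contrasts
fit_poly <- lm(y ~ age, data = d)
names(coef(fit_poly))   # "(Intercept)" "age.L" "age.Q"

# Option 1: drop the ordered class (the level order is kept)
d_unordered <- transform(d, age = factor(age, ordered = FALSE))
fit1 <- lm(y ~ age, data = d_unordered)
names(coef(fit1))       # "(Intercept)" "ageb" "agec"

# Option 2: keep the ordered factor, ask for treatment contrasts in this model
fit2 <- lm(y ~ age, data = d, contrasts = list(age = "contr.treatment"))

# Option 3: change the session-wide default, then restore the old setting
old <- options(contrasts = c(unordered = "contr.treatment",
                             ordered   = "contr.treatment"))
fit3 <- lm(y ~ age, data = d)
options(old)
```

Option 3 affects every model fitted afterwards in the session, which is why the sketch restores the previous setting straight away.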
If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.
mutate(age = factor(age,
levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years")))
R sets factor levels in alphabetical order by default, which is sometimes not what you want (but I can't think of a case where the order would be 'random' ...)
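A small base-R illustration of the default versus an explicit level order (the values are made up):

```r
x <- c("low", "medium", "high", "low")

# Default: levels are sorted alphabetically
f_default <- factor(x)
levels(f_default)    # "high" "low" "medium"

# Explicit: levels in the order you actually want
f_explicit <- factor(x, levels = c("low", "medium", "high"))
levels(f_explicit)   # "low" "medium" "high"
```

The first level is the baseline under treatment contrasts, so the explicit version also changes which group the other estimates are compared against.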
The easiest way to remove the odd labelling for the ordered variables is to remove the ordered class from these factor variables. Example below!
library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.4.2'
publication_dummytable1_sum <-
structure(list(id = 1:10, age = structure(c(3L, 3L, 2L, 3L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("c", "b", "a"), class = c("ordered",
"factor")), sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L), .Label = c("F", "M"), class = "factor"), country = structure(c(1L,
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("eng", "scot",
"wale"), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("x", "y", "z"), class = "factor"),
lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L,
62L, 45L), ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L,
9L, 15L), no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L,
10L, 40L), pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L,
25L, 15L)), row.names = c(NA, 10L), class = "data.frame") |>
as_tibble()
# R labels ordered factors like this in lm()
lm(lungfunction ~ age, publication_dummytable1_sum)
#>
#> Call:
#> lm(formula = lungfunction ~ age, data = publication_dummytable1_sum)
#>
#> Coefficients:
#> (Intercept) age.L age.Q
#> 51.17 -10.37 -15.11
tbl <-
publication_dummytable1_sum %>%
# remove ordered class
mutate(across(where(is.ordered), ~factor(., ordered = FALSE))) %>%
select(sex,age,lungfunction,ivdays) %>%
tbl_uvregression(
method =lm,
y = lungfunction,
pvalue_fun = ~style_pvalue(.x, digits = 3)
)
Created on 2021-07-22 by the reprex package (v2.0.0)
I need help writing gtsummary R code to produce the following table output. A dummy table is shown above.
library(gtsummary)
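| id | age | sex | country | edu | ln | ivds | n2 | p5 |
|----|-----|-----|---------|-----|----|------|----|----|
| 1  | a   | M   | eng     | x   | 45 | 15   | 40 | 15 |
| 2  | a   | M   | eng     | x   | 23 | 26   | 70 | 15 |
| 4  | a   | M   | eng     | x   | 26 | 36   | 35 | 40 |
| 5  | b   | F   | eng     | x   | 26 | 25   | 36 | 47 |
| 6  | b   | F   | wal     | y   | 45 | 45   | 60 | 12 |
| 7  | b   | M   | wal     | y   | 60 | 25   | 36 | 15 |
| 8  | c   | M   | wal     | y   | 70 | 08   | 25 | 36 |
| 9  | c   | F   | sco     | z   | 80 | 25   | 36 | 15 |
| 10 | c   | F   | sco     | z   | 90 | 25   | 26 | 39 |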
structure(list(id = 1:15, age = structure(c(1L, 1L, 2L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, 1L, 2L), .Label = c("a", "b",
"c"), class = "factor"), sex = structure(c(2L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("F", "M"), class = "factor"),
country = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 3L), .Label = c("eng", "scot", "wale"
), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L), .Label = c("x",
"y", "z"), class = "factor"), lon = c(45L, 23L,
25L, 45L, 70L, 69L, 90L, 50L, 62L, 45L, 23L, 25L, 45L, 70L,
69L), is = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 9L,
15L, 26L, 36L, 34L, 2L, 4L), n2 = c(40L, 70L, 50L, 60L,
30L, 25L, 80L, 89L, 10L, 40L, 70L, 50L, 60L, 30L, 25L), p5 = c(15L,
20L, 36L, 48L, 25L, 36L, 28L, 15L, 25L, 15L, 20L, 36L, 48L,
25L, 36L)), row.names = c(NA, 15L), class = "data.frame")
I made a table similar to what you have above (more similar to the table you had before you updated it). But I think it'll get you most of the way there.
The type of table you're requesting is something that is in the works. In the meantime, you will need to use the bstfun::tbl_2way_summary() function. This function lives in another package while we work to improve it before integrating it with gtsummary.
library(bstfun) # install with `remotes::install_github("ddsjoberg/bstfun")`
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.1'
# add a column that is all the same value
trial2 <- trial %>% mutate(constant = TRUE)
# loop over each continuous variable, construct table, then merge them together
tbls_row1 <-
c("age", "marker", "ttdeath") %>%
purrr::map(
~tbl_2way_summary(data = trial2, row = grade, col = constant, con = all_of(.x),
statistic = "{mean} ({sd}) - {min}, {max}") %>%
modify_header(stat_1 = paste0("**", .x, "**"))
) %>%
tbl_merge() %>%
modify_spanning_header(everything() ~ NA)
# repeat for the second row
tbls_row2 <-
c("age", "marker", "ttdeath") %>%
purrr::map(
~tbl_2way_summary(data = trial2, row = stage, col = constant, con = all_of(.x),
statistic = "{mean} ({sd}) - {min}, {max}") %>%
modify_header(stat_1 = paste0("**", .x, "**"))
) %>%
tbl_merge() %>%
modify_spanning_header(everything() ~ NA)
# stack these tables
tbl_stacked <- tbl_stack(list(tbls_row1, tbls_row2))
# lastly, add calculated summary stats for categorical variables, and merge them
tbl_summary_stats <-
trial2 %>%
tbl_summary(
include = c(grade, stage),
missing = "no"
) %>%
modify_header(stat_0 ~ "**n (%)**") %>%
modify_footnote(everything() ~ NA)
tbl_final <-
tbl_merge(list(tbl_summary_stats, tbl_stacked)) %>%
modify_spanning_header(everything() ~ NA) %>%
# column spanning column headers
modify_spanning_header(
list(c(stat_1_1_2, stat_1_2_2) ~ "**Group 1**",
stat_1_3_2 ~ "**Group 2**")
)
Created on 2021-07-10 by the reprex package (v2.0.0)
I want to test a 2x3 factorial design and have set the contrasts for the variables like this:
library(lme4)
library(emmeans)
my.helmert = matrix(c(2, -1, -1, 0, -1, 1), ncol = 2)
contrasts(Target3$mask) = my.helmert
contrasts(Target3$length)
So for mask I want to compare the first group with the average of the two other groups and in a second step the second with the third group.
This works fine in my LMM
Target3.2_TT.lmer = lmer(logTotalTime ~ mask*length+ (1+length|Subject) +(1|Trialnum), data = Target3)
There is a significant interaction between mask and length. That's why I want to take a look at this effect and calculate a post hoc test (Turkey) like this:
emmeans(Target3.2_TT.lmer, pairwise ~ mask : length)
This also works fine, with one problem: my contrasts are now gone. The test calculates the differences between all masks and not just 1 vs. 2 and 3, and 2 vs. 3. Is it possible to keep my contrasts in the post hoc test?
This is what the data looks like:
> dput(Target3)
structure(list(mask = structure(c(2L, 1L, 2L, 3L, 1L, 2L, 3L,
2L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 1L, 3L, 1L, 1L, 2L, 3L,
3L, 2L, 1L, 3L, 2L, 3L, 2L), contrasts = structure(c(2, -1, -1,
0, -1, 1), .Dim = c(3L, 2L), .Dimnames = list(c("keine Maske",
"syntaktisch\n korrekt", "syntaktisch \n inkorrekt"), NULL)), .Label = c("keine Maske",
"syntaktisch\n korrekt", "syntaktisch \n inkorrekt"), class = "factor"),
length = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 2L, 1L), .Label = c("kurzes \n N+1", "langes\n N+1"
), class = "factor"), logTotalTime = c(4.969813299576, 5.37989735354046,
5.14166355650266, 5.40717177146012, 5.27299955856375, 5.72358510195238,
5.4249500174814, 6.18001665365257, 5.67675380226828, 5.44241771052179,
5.66988092298052, 5.04985600724954, 5.78996017089725, 5.03043792139244,
5.92958914338989, 5.15329159449778, 6.11146733950268, 5.26269018890489,
5.17614973257383, 6.18001665365257, 6.03068526026126, 5.68697535633982,
5.17614973257383, 5.19849703126583, 5.29330482472449, 5.89989735358249,
5.73979291217923, 5.65599181081985, 5.94017125272043, 5.72031177660741
)), .Names = c("mask", "length", "logTotalTime"), row.names = c(2L,
4L, 6L, 8L, 9L, 11L, 13L, 15L, 16L, 18L, 20L, 22L, 27L, 29L,
31L, 33L, 35L, 37L, 39L, 41L, 42L, 44L, 47L, 49L, 51L, 54L, 55L,
57L, 59L, 61L), class = "data.frame")
Well, if you ask for pairwise comparisons, that’s what you get, and Helmert contrasts are not the same as pairwise comparisons. Further, the Tukey (not Turkey) method applies only to pairwise comparisons, not to other types of contrasts.
Here’s something to try that may give you what you want.
emm = emmeans(Target3.2_TT.lmer,
~ mask | length)
contrast(emm, list(
c1 = c(2, -1, -1)/2,
c2 = c(0, 1, -1)),
adjust = "mvt")
This will work independently of whatever parameterization (i.e., contrasts settings) was used when fitting the model. The model parameterization affects how the model matrix is set up and the interpretation of the coefficients, but does not affect the results from emmeans or its relatives.
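To make the two coefficient vectors concrete, here is a base-R sketch of what they compute from three cell means. The means are made-up numbers, not estimates from the model above; emmeans applies the same weights to its estimated marginal means within each level of length.

```r
# Hypothetical cell means for the three mask levels (within one length level)
cell_means <- c(none = 5.50, correct = 5.80, incorrect = 5.60)

c1 <- c(2, -1, -1) / 2   # first level vs. the average of the other two
c2 <- c(0, 1, -1)        # second level vs. third level

est_c1 <- sum(c1 * cell_means)   # 5.50 - mean(c(5.80, 5.60))
est_c2 <- sum(c2 * cell_means)   # 5.80 - 5.60
```

Dividing the first vector by 2 puts the estimate on the scale of a difference between means, rather than twice that difference.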
I have a dataset and need to produce daily forecasts split by group.
The group is client + stuff.
ts <- read.csv("C:/Users/Admin/Desktop/mydat.csv",sep=";", dec=",")
Here is mydat:
structure(list(Data = structure(c(1L, 3L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 2L, 4L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("01.04.2017",
"01.06.2017", "02.04.2017", "02.06.2017", "03.04.2017", "04.04.2017",
"05.04.2017", "06.04.2017", "07.04.2017", "08.04.2017", "09.04.2017",
"10.04.2017", "11.04.2017", "12.05.2017", "13.05.2017", "14.05.2017",
"15.05.2017", "16.05.2017", "17.05.2017", "18.05.2017", "19.05.2017",
"20.05.2017", "21.05.2017", "22.05.2017", "23.05.2017", "24.05.2017",
"25.05.2017", "26.05.2017", "27.05.2017", "28.05.2017", "29.05.2017",
"30.05.2017", "31.05.2017"), class = "factor"), client = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Horns and hooves", "Kornev & Co."
), class = "factor"), stuff = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("chickens", "hooves", "Oysters"), class = "factor"),
Sales = c(374L, 12L, 120L, 242L, 227L, 268L, 280L, 419L,
12L, 172L, 336L, 117L, 108L, 150L, 90L, 117L, 116L, 146L,
120L, 211L, 213L, 67L, 146L, 118L, 152L, 122L, 201L, 497L,
522L, 65L, 268L, 441L, 247L, 348L, 445L, 477L, 62L, 226L,
476L, 306L)), .Names = c("Data", "client", "stuff", "Sales"
), class = "data.frame", row.names = c(NA, -40L))
Of course, I can manually separate the data into three datasets:
Horns and hooves + hooves
Horns and hooves + chickens
Kornev & Co. + Oysters
But what should I do when I have a huge dataset with hundreds of groups? I do not want to split them manually.
Is it possible to split the data into groups in R and then perform a forecast for each group?
The code for the forecast is simple.
First, I do this:
library(forecast)
library(lubridate)
msts <- msts(ts$Sales,seasonal.periods = c(7,365.25),start = decimal_date(as.Date("2017-05-12")))
plot(msts, main="sales", xlab="Year", ylab="sales")
tbats <- tbats(msts)
plot(tbats, main="Multiple Season Decomposition")
sp<- predict(tbats,h=14) #14 days forecast
plot(sp, main = "TBATS Forecast", include=14)
print(sp)
If the result does not suit me, I perform the forecast via dummy variables:
tsw <- ts(ts$Sales, start = decimal_date(as.Date("2017-05-12")), frequency = 7)
View(tsw)
mytslm <- tslm(tsw ~ trend + season)
print(mytslm)
residarima1 <- auto.arima(mytslm$residuals)
residualsArimaForecast <- forecast(residarima1, h=14)
residualsF <- as.numeric(residualsArimaForecast$mean)
regressionForecast <- forecast(mytslm,h=14)
regressionF <- as.numeric(regressionForecast$mean)
forecastR <- regressionF+residualsF
print(forecastR)
You can use split to split the data into groups by a combination of factors, in this case columns client and stuff.
group_list <- split(mydat, list(mydat$client, mydat$stuff))
group_list <- group_list[sapply(group_list, function(x) nrow(x) != 0)]
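To see what these two lines do, here is a small base-R sketch on a toy data frame. The data and the names `toy`, `groups` and `totals` are made up for illustration. `split()` creates one element per combination of factor levels, including combinations that never occur, which is why the empty ones are filtered out.

```r
# Toy data (hypothetical): 2 clients x 3 products, but only 3 combinations occur
toy <- data.frame(
  client = factor(c("A", "A", "B", "B")),
  stuff  = factor(c("x", "y", "z", "z")),
  Sales  = c(1, 2, 3, 4)
)

groups <- split(toy, list(toy$client, toy$stuff))
length(groups)            # 6: every client/stuff combination, empty or not

# Keep only the combinations that actually have rows
groups <- groups[sapply(groups, nrow) > 0]
names(groups)             # "A.x" "A.y" "B.z"

# Any per-group function can now be applied with lapply/sapply
totals <- sapply(groups, function(g) sum(g$Sales))
```

Passing `drop = TRUE` to `split()` removes the empty combinations in one step and should give the same list.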
Then you can lapply any function you want over this list. The following is how you would perform your first forecast. Note that I have separated the forecast code from the plotting code, and that each step of the forecast is done by its own function: first apply msts to produce a list of msts objects, then apply tbats and predict to produce a list of forecasts.
fun_msts <- function(ts){
msts(ts$Sales, seasonal.periods = c(7,365.25), start = decimal_date(as.Date("2017-05-12")))
}
fun_sp <- function(m){
tbats <- tbats(m)
predict(tbats, h=14) #14 days forecast
}
msts_list <- lapply(group_list, fun_msts)
sp_list <- lapply(msts_list, fun_sp)
Now if you want to, you can plot the results. In order to do that, define two other functions to be lapplyed.
plot_msts <- function(m, new.window = TRUE){
if(new.window) windows()
plot(m, main="Sales", xlab="Year", ylab="Sales")
}
plot_sp <- function(sp, new.window = TRUE){
if(new.window) windows()
plot(sp, main = "TBATS Forecast", include = 14)
}
lapply(msts_list, plot_msts)
lapply(sp_list, plot_sp)
In these functions a new graphics device is opened with the function windows(). If you are not using Microsoft Windows, or if you want to open another type of device, change that instruction but keep the if(new.window).
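One platform-independent alternative is to write each plot to its own file instead of opening a window. The sketch below is a variation on the approach above, not part of the original answer: `series_list` is a made-up stand-in for `msts_list`, and `plot_to_pdf` is a hypothetical helper. For an on-screen device, base R's `dev.new()` can replace `windows()` on any operating system.

```r
# Hypothetical stand-in for msts_list: two short weekly series
series_list <- list(
  grp_a = ts(c(3, 5, 4, 6, 7, 5, 8), frequency = 7),
  grp_b = ts(c(10, 9, 11, 12, 10, 13, 14), frequency = 7)
)

# Plot one series into its own PDF file and return the file path
plot_to_pdf <- function(x, name, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".pdf"))
  pdf(path)                        # file device, available on every OS
  on.exit(dev.off(), add = TRUE)   # close the device even if plot() fails
  plot(x, main = name, xlab = "Time", ylab = "Sales")
  path
}

pdf_paths <- mapply(plot_to_pdf, series_list, names(series_list))
```

With hundreds of groups, files are usually easier to review than hundreds of open windows.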
EDIT.
As for the regression with dummy variables, you can do the following.
fun_tslm <- function(x, start = "2017-05-12", freq = 7){
tsw <- ts(x[["Sales"]], start = decimal_date(as.Date(start)), frequency = freq)
#View(tsw)
mytslm <- tslm(tsw ~ trend + season)
mytslm
}
fun_forecast <- function(x, h = 14){
residarima1 <- auto.arima(x[["residuals"]])
residualsArimaForecast <- forecast(residarima1, h = h)
residualsF <- as.numeric(residualsArimaForecast$mean)
regressionForecast <- forecast(x, h = h)
regressionF <- as.numeric(regressionForecast$mean)
forecastR <- regressionF + residualsF
forecastR
}
tslm_list <- lapply(group_list, fun_tslm)
fore_list <- lapply(tslm_list, fun_forecast)
How can the analysis of repeated replicated design given on this page ( https://stats.stackexchange.com/questions/115135/repeated-measures-anova-with-replicated-measurements ) be done in R? I can perform ANOVA using aov() but I have some doubts as to the Error term there.
The data is as follows:
mydf = structure(list(User = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
Mode = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Trial1Time = c(20L,
5L, 40L, 10L, 15L, 30L, 13L, 11L, 35L), Trial2Time = c(30L,
7L, 25L, 20L, 17L, 35L, 26L, 11L, 38L)), .Names = c("User",
"Mode", "Trial1Time", "Trial2Time"), class = "data.frame", row.names = c(NA,
-9L))
I can't work out why prediction on a test dataset is not working with R neural networks (nnet package).
I have two datasets with similar structures: one for training (trainset, 17 cases) and one for prediction (testset, 9 cases). Each dataset has the columns age, gender, Height and Weight. In the test dataset the age is unknown (NA).
The formula for training is obtained successfully below:
library(nnet)
trainednetwork<-nnet(age~gender+Height+Weight,trainset, size=17)
However, when I try to use the test dataset for prediction in the next line of code,
prediction<-predict(trainednetwork,testset)
I get the error "No component terms, no attribute". Can anyone help?
The data (obtained with dput() function):
testset:
structure(list(
age = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_),
gender = structure(
c(2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
.Label = c("f", "m"),
class = "factor"),
Height= c(9L, 11L, 9L, 11L, 9L, 11L, 9L, 11L, 9L),
Weight= c(1L, 41L, 2L, 1L, 2L, 29L, 12L, 6L, 12L)),
.Names = c("age", "gender", "Height", "Weight"),
class = "data.frame",
row.names = c(NA, 9L))
trainset:
structure(list(
age = c(43L, 35L, 22L, 28L, 20L, 47L, 41L, 23L,
42L, 27L, 22L, 60L, 62L, 47L, 42L, 26L, 54L),
gender = structure(
c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L),
.Label = c("f", "m"),
class = "factor"),
Height= c(7L, 9L, 11L, 11L, 11L, 9L, 11L, 9L, 23L, 9L,
9L, 9L, 10L, 7L, 7L, 11L, 7L),
Weight= c(2L, 2L, 9L, 9L, 28L, 8L, 6L, 3L, 1L, 2L, 40L,
1L, 9L, 1L, 7L, 4L, 35L)),
.Names = c("age", "gender", "Height", "Weight"),
class = "data.frame",
row.names = c(NA, 17L))
I think in the R neuralnet package the command to use for prediction is "compute", not predict, which is very confusing. A