How to extract attribute values from a svyciprop object? - r

How can I extract the attributes of the svyciprop object below into a data.frame?
library(survey)
data(api)  # provides apiclus1; the dataset ships with the survey package
dclus1 <- svydesign(id=~dnum, fpc=~fpc, data=apiclus1)
prop.ci <- svyciprop(~I(ell==0), dclus1, method="li")
Printing
prop.ci
yields:
> prop.ci
                      2.5% 97.5%
I(ell == 0) 0.021858 0.000664  0.11
str(prop.ci)
> str(prop.ci)
Class 'svyciprop' atomic [1:1] 0.0219
..- attr(*, "var")= num [1, 1] 0.000512
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr "as.numeric(I(ell == 0))"
.. .. ..$ : chr "as.numeric(I(ell == 0))"
..- attr(*, "ci")= Named num [1:2] 0.000664 0.107778
.. ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"

You can use the following commands to extract the proportion and the confidence interval from the object prop.ci:
# the proportion
as.vector(prop.ci)
# [1] 0.02185792
# the confidence interval
attr(prop.ci, "ci")
# 2.5% 97.5%
# 0.0006639212 0.1077784084
If you want to access the values of the confidence interval separately, you can use vector indexing:
ci <- attr(prop.ci, "ci")
ci[1]
# 2.5%
# 0.0006639212
ci[2]
# 97.5%
# 0.1077784
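Since the question asks for a data.frame, the pieces extracted above can be combined into a single row. The following is a minimal sketch, not the survey package's own method: the object built with structure() is only a stand-in that copies the attributes shown by str(prop.ci), so that the snippet runs without the survey package, and the name prop.df and its column names are illustrative assumptions.

```r
# Stand-in carrying the same attributes that str(prop.ci) shows above.
# With the survey package loaded, use the real prop.ci instead.
prop.ci <- structure(
  0.02185792,
  var   = matrix(0.000512, 1, 1),
  ci    = c("2.5%" = 0.0006639212, "97.5%" = 0.1077784084),
  class = "svyciprop"
)

ci <- attr(prop.ci, "ci")

# One row: point estimate plus both confidence limits.
# as.vector() drops the class and attributes; [[ ]] drops the names.
prop.df <- data.frame(
  proportion = as.vector(prop.ci),
  lower      = ci[[1]],
  upper      = ci[[2]]
)
prop.df
```

For several proportions, one such row per object can be stacked with rbind().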

In case it's easier to remember:
# everything
prop.ci
# extract just the proportion
prop.ci[ 1 ]
# extract the confidence interval
confint( prop.ci )
# lower bound
confint( prop.ci )[ 1 ]
# upper bound
confint( prop.ci )[ 2 ]

Related

mk.test() results to table/matrix R

I want to apply mk.test() to a large dataset and get the results in a table/matrix.
My data look something like this:
Column A   Column B   ...   ColumnXn
1          2          ...   5
...        ...        ...   ...
3          4          ...   7
So far I managed to perform mk.test() for all columns and print the results:
for(i in 1:ncol(data)) {
print(mk.test(as.numeric(unlist(data[ , i]))))
}
I got all the results printed:
.....
Mann-Kendall trend test
data: as.numeric(unlist(data[, i]))
z = 4.002, n = 71, p-value = 6.28e-05
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
7.640000e+02 3.634867e+04 3.503154e-01
Mann-Kendall trend test
data: as.numeric(unlist(data[, i]))
z = 3.7884, n = 71, p-value = 0.0001516
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
7.240000e+02 3.642200e+04 3.283908e-01
....
However, I was wondering if it is possible to get results in a table/matrix format that I could save as excel.
Something like this:
Column     z        p-value     S              varS           tau
Column A   4.002    0.0001516   7.640000e+02   3.642200e+04   3.283908e-01
...        ...      ...         ...            ...            ...
ColumnXn   3.7884   6.28e-05    7.240000e+02   3.642200e+04   3.283908e-01
Is it possible to do so?
I would really appreciate your help.
Instead of printing the test results, you can store them in a variable. This variable holds the various test statistics and values. To find the names of its components, run the test on the first column and inspect the structure of the result with str():
testres = mk.test(as.numeric(unlist(data[ , 1])))
str(testres)
List of 9
$ data.name : chr "as.numeric(unlist(data[, 1]))"
$ p.value : num 0.296
$ statistic : Named num 1.04
..- attr(*, "names")= chr "z"
$ null.value : Named num 0
..- attr(*, "names")= chr "S"
$ parameter : Named int 3
..- attr(*, "names")= chr "n"
$ estimates : Named num [1:3] 3 3.67 1
..- attr(*, "names")= chr [1:3] "S" "varS" "tau"
$ alternative: chr "two.sided"
$ method : chr "Mann-Kendall trend test"
$ pvalg : num 0.296
- attr(*, "class")= chr "htest"
Here you see that, for example, the z-value is stored in testres$statistic, and similarly for the other components. The values of S, varS and tau are not separate components; they are grouped together in the named numeric vector testres$estimates.
In the code below, create an empty data frame and, inside the loop, append the results of each test to it. At the end, write the data frame to a CSV file with write.csv().
library(trend)
# sample data
mydata = data.frame(ColumnA = c(1,3,5), ColumnB = c(2,4,1), ColumnXn = c(5,7,7))
# empty dataframe to store results
results = data.frame(matrix(ncol=6, nrow=0))
colnames(results) <- c("Column", "z", "p-value", "S", "varS", "tau")
for (i in seq_len(ncol(mydata))) {
  # store test results in variable
  testres <- mk.test(as.numeric(unlist(mydata[ , i])))
  # extract elements of result; list() rather than c() keeps the
  # numeric values numeric instead of coercing them to character
  testvars <- list(colnames(mydata)[i],   # column
                   testres$statistic,     # z
                   testres$p.value,       # p-value
                   testres$estimates[1],  # S
                   testres$estimates[2],  # varS
                   testres$estimates[3])  # tau
  # add to results dataframe
  results[nrow(results) + 1, ] <- testvars
}
write.csv(results, "mannkendall.csv", row.names=FALSE)
The resulting csv file can be opened in Excel.
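The same idea can be written without growing a data frame inside a loop: build one row per column with lapply() and stack the rows with rbind(). The sketch below rests on assumptions: htest_row() is a hypothetical helper, and cor.test(..., method = "kendall") against the time index is used only as a base-R stand-in that also returns an "htest" object, so the snippet runs without the trend package. With trend installed, mk.test(x) would take its place, and the S, varS and tau values would come from its estimates component.

```r
set.seed(42)
# made-up sample data: one column without trend, one with an upward trend
mydata <- data.frame(ColumnA = rnorm(20),
                     ColumnB = rnorm(20) + (1:20) / 5)

# hypothetical helper: flatten an "htest" object into a one-row data.frame
htest_row <- function(res, label) {
  data.frame(Column    = label,
             statistic = unname(res$statistic),
             p.value   = res$p.value,
             estimate  = unname(res$estimate[1]),
             stringsAsFactors = FALSE)
}

rows <- lapply(names(mydata), function(col) {
  x <- mydata[[col]]
  # stand-in for trend::mk.test(x): Kendall correlation of x with time
  res <- cor.test(x, seq_along(x), method = "kendall")
  htest_row(res, col)
})

# stack the one-row data frames; numeric columns stay numeric
results <- do.call(rbind, rows)
results
```

Because every row is a data.frame with typed columns, the result can be passed straight to write.csv().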

R - Alteryx - All columns in a tibble must be vectors

I'm using R in Alteryx to perform some statistical analysis on my data.
I get the following error message: "! All columns in a tibble must be vectors."
Can anybody help me?
Below is my entire code:
library("tibble")
# Calling Data from Connection #1
data <- read.Alteryx("#1")
average_wilcox <- c("1","1","1","1","1","1","1")
# Creating data frame for in case it comes an empty table
df <- data.frame(average_wilcox)
#Verify if p-value is empty
# In case is different that empty, executes the steps for the Hypothesis Test for non-normal data
if (length(data$p.value) == 0) {
write.Alteryx(df, 1)
} else if (data$p.value != '') {
Week1 <- read.Alteryx("#2", mode="data.frame")
"&"
Week2 <- read.Alteryx("#3", mode="data.frame")
# MANN WHITNEY TEST (AVERAGE TEST FOR NON NORMAL)
Week1_data <- Week1$Wk1_feature_value
Week2_data <- Week2$Wk2_feature_value
# DEFINE VECTORS
week1 <- c(Week1_data)
week2 <- c(Week2_data)
merge(cbind(Week1, X=1:length(week1)),
cbind(Week2, X=1:length(week2)), all.y =T) [-1]
# MANN WHITNEY TEST (MEAN TEST FOR NON NORMAL)
average_wilcox <- wilcox.test(week1,week2, alternative='two.sided', conf.level=.95)
average_test <- tibble(average_wilcox)
average_test[] <- lapply(average_test, as.character)
write.Alteryx(average_test, 1)
}
#### NORMAL HYPOTHESIS TEST ####
# Calling Data from Connection #4
data1 <- read.Alteryx("#4")
df1 <- data.frame(Date=as.Date(character()),"p.value"=character(),User=character(),stringsAsFactors=FALSE)
# Verify if p-value is empty
# In case if different than empty, executes the steps for the Hypothesis Test for normal data
if(length(data1$p.value) == 0) {
write.Alteryx(df1, 3)
} else if (data1$p.value != '') {
Week1 <- read.Alteryx("#2", mode="data.frame")
"&"
Week2 <- read.Alteryx("#3", mode="data.frame")
# T TEST (MEAN TEST FOR NORMAL)
Week1_data <- Week1$Wk1_feature_value
Week2_data <- Week2$Wk2_feature_value
# DEFINE VECTORS
week1 <- c(Week1_data)
week2 <- c(Week2_data)
# T TEST (MEAN TEST FOR NORMAL)
t_test <- t.test(week1,week2, alternative='two.sided',conf.level=.95)
write.Alteryx(t_test,3)
}
Does anybody know what I have to do?
Many thanks,
Wil
The reason is that both wilcox.test and t.test return a list of vectors, which may differ in length. Passing that list to write.Alteryx triggers the error, because it expects a data.frame/tibble/data.table. For example:
> str(t.test(1:10, y = c(7:20)))
List of 10
$ statistic : Named num -5.43
..- attr(*, "names")= chr "t"
$ parameter : Named num 22
..- attr(*, "names")= chr "df"
$ p.value : num 1.86e-05
$ conf.int : num [1:2] -11.05 -4.95
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num [1:2] 5.5 13.5
..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means"
$ stderr : num 1.47
$ alternative: chr "two.sided"
$ method : chr "Welch Two Sample t-test"
$ data.name : chr "1:10 and c(7:20)"
- attr(*, "class")= chr "htest"
> x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
> y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
> str(wilcox.test(x, y, alternative = "g") )
List of 7
$ statistic : Named num 35
..- attr(*, "names")= chr "W"
$ parameter : NULL
$ p.value : num 0.127
$ null.value : Named num 0
..- attr(*, "names")= chr "location shift"
$ alternative: chr "greater"
$ method : chr "Wilcoxon rank sum exact test"
$ data.name : chr "x and y"
- attr(*, "class")= chr "htest"
One option is to convert the output of both t.test and wilcox.test to a data.frame/tibble; tidy/glance from broom does this:
...
library(broom)
average_wilcox <- tidy(wilcox.test(week1,week2, alternative='two.sided', conf.level=.95))
write.Alteryx(average_wilcox, 1)
...
t_test <- tidy(t.test(week1,week2, alternative='two.sided',conf.level=.95))
write.Alteryx(t_test,3)
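If adding broom is not an option, a similar one-row table can be assembled by hand. The sketch below is a base-R approximation of that idea, not broom's implementation: the column names are chosen for illustration, and week1/week2 reuse the made-up x and y vectors from the example above. It first shows the unequal component lengths that make the raw list unsuitable as a table.

```r
# made-up sample data (the x and y vectors from the example above)
week1 <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
week2 <- c(1.15, 0.88, 0.90, 0.74, 1.21)

res <- t.test(week1, week2, alternative = "two.sided", conf.level = 0.95)

# components differ in length (conf.int and estimate hold two values each),
# so the list cannot be used as a table directly
lengths(res)

# pick scalar pieces, one per column, to get a well-formed one-row table
t_test_df <- data.frame(
  statistic   = unname(res$statistic),
  parameter   = unname(res$parameter),
  p.value     = res$p.value,
  conf.low    = res$conf.int[1],
  conf.high   = res$conf.int[2],
  method      = res$method,
  alternative = res$alternative,
  stringsAsFactors = FALSE
)
str(t_test_df)
# inside Alteryx, this data.frame is what would be passed to write.Alteryx()
```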

Extract a column from lme4 summary in R

I was wondering what is the most efficient way to extract (not print like HERE) only the Std.Dev. column from the vc object below as a vector?
library(lme4)
library(nlme)
data(Orthodont, package = "nlme")
fm1 <- lmer(distance ~ age + (age|Subject), data = Orthodont)
vc <- VarCorr(fm1) ## extract only the `Std.Dev.` column as a vector
The structure of 'vc' shows that it is a list with a single element, 'Subject', and that 'stddev' is stored as an attribute of that element:
str(vc)
#List of 1
# $ Subject: num [1:2, 1:2] 6.3334 -0.3929 -0.3929 0.0569
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:2] "(Intercept)" "age"
# .. ..$ : chr [1:2] "(Intercept)" "age"
# ..- attr(*, "stddev")= Named num [1:2] 2.517 0.239 ####
So, extract the attribute directly
attr(vc$Subject, "stddev")
The residual standard deviation is stored as an attribute of the outer object:
attr(vc, "sc")
#[1] 1.297364
If we combine them with c, we get a single vector
c(attr(vc$Subject, "stddev"), attr(vc, "sc"))
# (Intercept) age
# 2.5166317 0.2385853 1.2973640
Wrap the result in as.numeric() or as.vector() to drop the names, since it is a named vector.
Alternatively, use attributes():
c(attributes(vc)$sc, attributes(vc$Subject)$stddev)
If you want the three elements in the column, you can use:
as.numeric(c(attr(vc[[1]], "stddev"), attr(vc, "sc")))
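When a model has more than one grouping factor, the same attribute lookup can be applied to every element of the list at once. A minimal sketch follows; the vc object here is a hand-built stand-in that copies the structure printed by str(vc) above, so that the snippet runs without lme4, and the "Residual" label is an illustrative choice.

```r
# Stand-in mimicking str(vc) above; with lme4 loaded, use VarCorr(fm1).
vc <- structure(
  list(
    Subject = structure(
      matrix(c(6.3334, -0.3929, -0.3929, 0.0569), 2, 2,
             dimnames = list(c("(Intercept)", "age"),
                             c("(Intercept)", "age"))),
      stddev = c("(Intercept)" = 2.5166317, age = 0.2385853)
    )
  ),
  sc = 1.297364
)

# pull "stddev" from every grouping factor, then append the residual SD
sds <- c(unlist(lapply(vc, attr, "stddev")),
         Residual = attr(vc, "sc"))
sds

# plain unnamed numeric vector, if the names are not wanted
as.numeric(sds)
```

unlist() prefixes each value with the name of its grouping factor, which keeps the entries distinguishable when several factors share a term such as (Intercept).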

Does glmmTMB return the standard error for random effect variance components like glmmADMB?

Note: The following analysis is solely for reproducible objects -- and is not being put forth as a legitimate way to analyze the bacteria data from MASS.
library(glmmADMB)
library(glmmTMB)
data(bacteria,package="MASS")
bacteria$present <- as.numeric(bacteria$y)-1
bacteria$early <- factor(as.numeric(bacteria$week > 3) + 1)
bfit2 <- glmmadmb(present ~ trt ,
random = ~ (1 | ID) + (1 | early),
family = "binomial", data = bacteria)
bfit2$S
bfit2$sd_S
bfit3 <- glmmTMB(present ~ trt + (1 | ID) + (1 | early),
family = "binomial", data = bacteria)
summary(bfit3)$varcor
confint(bfit3)
In my understanding:
bfit2$S contains the estimates of the random-effect variances for glmmADMB
summary(bfit3)$varcor contains the estimates of the random-effect standard deviations for glmmTMB (that is, the sqrt() of the elements of bfit2$S)
bfit2$sd_S contains the standard errors of the estimates in bfit2$S for glmmADMB (as noted in this SO post)
Where are the standard errors for summary(bfit3)$varcor stored in a glmmTMB object? UPDATE: confint is implemented for glmmTMB objects, so if calculating a 95% CI is the end goal, that is available (thanks, kaskr).
> bfit2$S
$ID
(Intercept)
(Intercept) 1.4047
$early
(Intercept)
(Intercept) 0.51465
> bfit2$sd_S
$ID
(Intercept)
(Intercept) 0.94743
$early
(Intercept)
(Intercept) 0.61533
> summary(bfit3)$varcor
Conditional model:
Groups Name Std.Dev.
ID (Intercept) 1.18513
early (Intercept) 0.71733
> confint(bfit3)
2.5 % 97.5 % Estimate
cond.(Intercept) 1.1934429 4.1351114 2.6642771
cond.trtdrug -2.6375284 -0.0503639 -1.3439462
cond.trtdrug+ -2.0819454 0.5325683 -0.7746885
cond.Std.Dev.ID.(Intercept) 0.6118984 2.2953603 1.1851276
cond.Std.Dev.early.(Intercept) 0.2222685 2.3150437 0.7173293
As we can see, sqrt(1.4047) = 1.18513 and sqrt(0.51465) = 0.71733 so that indicates bfit2$S gives the estimates for the variances and summary(bfit3)$varcor gives the estimates for the standard deviation.
2nd update:
After some digging, I realized that bfit3$sdr returns the variance components on the log-sd scale, along with their standard errors. So one thought was to avoid confint and back-calculate the SEs: compute 95% CIs on the log-sd scale, transform them to the desired scale, and then divide the width of each CI by 2*1.96.
## to get the standard errors from glmmTMB:
bfit3$sdr
## note that theta is just log(sd)
exp(summary(bfit3$sdr, "fixed")[4,1])
exp(summary(bfit3$sdr, "fixed")[5,1])
## calculate the (wald) lower and upper on the log(sd) scale:
low.log.sd.id <- summary(bfit3$sdr, "fixed")[4,1] - 1.96*summary(bfit3$sdr, "fixed")[4,2]
low.log.sd.early <- summary(bfit3$sdr, "fixed")[5,1] - 1.96*summary(bfit3$sdr, "fixed")[5,2]
upp.log.sd.id <- summary(bfit3$sdr, "fixed")[4,1] + 1.96*summary(bfit3$sdr, "fixed")[4,2]
upp.log.sd.early <- summary(bfit3$sdr, "fixed")[5,1] + 1.96*summary(bfit3$sdr, "fixed")[5,2]
## convert to variance scale by taking exp and then squaring
low.var.id <- exp(low.log.sd.id)^2
upp.var.id <- exp(upp.log.sd.id)^2
low.var.early <- exp(low.log.sd.early)^2
upp.var.early <- exp(upp.log.sd.early)^2
## back calculate SEs
(upp.var.id - low.var.id) / (2*1.96)
(upp.var.early - low.var.early) / (2*1.96)
## see how they compare to the confint answers for sd:
sqrt(c(low.var.id, upp.var.id))
sqrt(c(low.var.early, upp.var.early))
Run it:
> ## to get the standard errors from glmmTMB:
> bfit3$sdr
sdreport(.) result
Estimate Std. Error
beta 2.6642771 0.7504394
beta -1.3439462 0.6600031
beta -0.7746885 0.6669800
theta 0.1698504 0.3372712
theta -0.3322203 0.5977910
Maximum gradient component: 4.83237e-06
> ## note that theta is just log(sd)
> exp(summary(bfit3$sdr, "fixed")[4,1])
[1] 1.185128
> exp(summary(bfit3$sdr, "fixed")[5,1])
[1] 0.7173293
> ## calculate the (wald) lower and upper on the log(sd) scale:
> low.log.sd.id <- summary(bfit3$sdr, "fixed")[4,1] - 1.96*summary(bfit3$sdr, "fixed")[4,2]
> low.log.sd.early <- summary(bfit3$sdr, "fixed")[5,1] - 1.96*summary(bfit3$sdr, "fixed")[5,2]
> upp.log.sd.id <- summary(bfit3$sdr, "fixed")[4,1] + 1.96*summary(bfit3$sdr, "fixed")[4,2]
> upp.log.sd.early <- summary(bfit3$sdr, "fixed")[5,1] + 1.96*summary(bfit3$sdr, "fixed")[5,2]
> ## convert to variance scale by taking exp and then squaring
> low.var.id <- exp(low.log.sd.id)^2
> upp.var.id <- exp(upp.log.sd.id)^2
> low.var.early <- exp(low.log.sd.early)^2
> upp.var.early <- exp(upp.log.sd.early)^2
> ## back calculate SEs
> (upp.var.id - low.var.id) / (2*1.96)
[1] 1.24857
> (upp.var.early - low.var.early) / (2*1.96)
[1] 1.354657
> ## see how they compare to the confint answers for sd:
> sqrt(c(low.var.id, upp.var.id))
[1] 0.611891 2.295388
> sqrt(c(low.var.early, upp.var.early))
[1] 0.2222637 2.3150935
The last two rows above match the last two rows of the confint(bfit3) output pretty well. Now I guess I just wonder why the SEs for glmmADMB were 0.94743 and 0.61533 whereas the back-calculated ones for glmmTMB are 1.24857 and 1.354657 respectively...(?)
Not 100% sure about your analysis, but here's what I did to check (including digging in the guts of glmmADMB and using slightly obscure aspects of glmmTMB):
Run glmmADMB and dig into the ADMB .std output file to check the results:
n par estimate sd
4 tmpL 1.6985e-01 3.3727e-01
5 tmpL -3.2901e-01 5.9780e-01
...
62 S 1.4045 9.474e-01
63 S 5.178e-01 6.191e-01
These lines are, respectively, the internal parameters (tmpL: log-standard deviations) and the transformed parameters (variances).
Re-do the transformation and check:
tmpL <- c(0.16985,-0.32901)
cbind(admb.raw=exp(2*tmpL),
admb=unlist(bfit2$S),
tmb =unlist(VarCorr(bfit3)))
## admb.raw admb tmb
## ID 1.4045262 1.40450 1.4045274
## early 0.5178757 0.51788 0.5145613
We get from the standard deviation of the log-std dev of the random effect to the standard deviation of the variance by multiplying by the derivative of the transformation, i.e. the delta method (V = exp(2*logsd), so dV/d(logsd) = 2*exp(2*logsd)):
## sd of log-sd from glmmTMB
tmb_sd <- sqrt(diag(vcov(bfit3,full=TRUE)))[4:5]
tmb_logsd <- bfit3$sdr$par.fixed[4:5]
tmpL_sd <- c(0.33727,0.5978)
cbind(admb.raw=tmpL_sd*2*exp(2*tmpL),
admb=unlist(bfit2$sd_S),
tmb=tmb_sd*2*exp(2*tmb_logsd))
## admb.raw admb tmb
## ID 0.9474091 0.94741 0.9474132
## early 0.6191722 0.61918 0.6152003
So these all seem to match up OK.
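The comparison can also be reproduced without either package by plugging in the theta estimates and standard errors printed by bfit3$sdr in the question. The sketch below is illustrative only; the variable names are assumptions. It computes the delta-method standard errors next to the CI-width back-calculation from the question. The two disagree because the interval is symmetric on the log-sd scale but not on the variance scale, so dividing its width by 2*1.96 does not recover a standard error.

```r
# log-sd estimates and their standard errors, copied from bfit3$sdr above
theta    <- c(ID = 0.1698504, early = -0.3322203)
theta_se <- c(ID = 0.3372712, early = 0.5977910)

# variance estimates: V = exp(2 * logsd)
var_hat <- exp(2 * theta)

# delta method: SE(V) ~ |dV/d(logsd)| * SE(logsd) = 2 * exp(2 * logsd) * SE
var_se_delta <- 2 * exp(2 * theta) * theta_se

# the question's approach: transform the Wald limits, divide width by 2 * 1.96
lower <- exp(2 * (theta - 1.96 * theta_se))
upper <- exp(2 * (theta + 1.96 * theta_se))
var_se_width <- (upper - lower) / (2 * 1.96)

round(cbind(var_hat, var_se_delta, var_se_width), 4)
```

The var_se_delta column is the one that lines up with bfit2$sd_S from glmmADMB; var_se_width reproduces the larger values obtained in the question.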
The $varcor object is a list. The standard deviations are stored as attributes:
str( summary(bfit3)$varcor )
List of 2
$ cond:List of 2
..$ ID : num [1, 1] 1.4
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr "(Intercept)"
.. .. ..$ : chr "(Intercept)"
.. ..- attr(*, "stddev")= Named num 1.19
.. .. ..- attr(*, "names")= chr "(Intercept)"
.. ..- attr(*, "correlation")= num [1, 1] 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr "(Intercept)"
.. .. .. ..$ : chr "(Intercept)"
.. ..- attr(*, "blockCode")= Named num 1
.. .. ..- attr(*, "names")= chr "us"
..$ early: num [1, 1] 0.515
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr "(Intercept)"
.. .. ..$ : chr "(Intercept)"
.. ..- attr(*, "stddev")= Named num 0.717
.. .. ..- attr(*, "names")= chr "(Intercept)"
.. ..- attr(*, "correlation")= num [1, 1] 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr "(Intercept)"
.. .. .. ..$ : chr "(Intercept)"
.. ..- attr(*, "blockCode")= Named num 1
.. .. ..- attr(*, "names")= chr "us"
..- attr(*, "sc")= num 1
..- attr(*, "useSc")= logi FALSE
$ zi : NULL
- attr(*, "sc")= logi FALSE
- attr(*, "class")= chr "VarCorr.glmmTMB"
This will loop over the .$cond elements of that object:
sapply( summary(bfit3)$varcor$cond, function(x) attr( x, "stddev") )
ID.(Intercept) early.(Intercept)
1.1851276 0.7173293

Rename x,y vectors in t-test R

I am performing a t-test in R
out <- t.test(x=input1, y=input2, alternative=c("two.sided","less","greater"), mu=0, paired=TRUE, conf.level = 0.95)
It gives the result
Paired t-test
data: input1 and input2
t = -1.1469, df = 7, p-value = 0.2891
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.15100900 0.05236717
sample estimates:
mean of the differences
-0.04932091
I need to change the names of the data in result. e.g.,
data: Fruits and Vegetables
Can anyone suggest a way to set the data names in the t.test output, for example through some argument or attribute?
With some dummy data
set.seed(1)
input1 <- rnorm(20, mean = -1)
input2 <- rnorm(20, mean = 5)
It would be easier just to rename or create objects with the desired names:
Fruits <- input1
Vegetables <- input2
t.test(x = Fruits, y = Vegetables, paired = TRUE, alternative = "two.sided")
Paired t-test
data: Fruits and Vegetables
t = -18.6347, df = 19, p-value = 1.147e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.454791 -5.151218
sample estimates:
mean of the differences
-5.803005
But if you really want to do this after the fact, then grab the object returned by t.test():
tmp <- t.test(x = input1, y = input2, paired = TRUE, alternative = "two.sided")
Look at the structure of the object tmp
> str(tmp)
List of 9
$ statistic : Named num -18.6
..- attr(*, "names")= chr "t"
$ parameter : Named num 19
..- attr(*, "names")= chr "df"
$ p.value : num 1.15e-13
$ conf.int : atomic [1:2] -6.45 -5.15
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num -5.8
..- attr(*, "names")= chr "mean of the differences"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means"
$ alternative: chr "two.sided"
$ method : chr "Paired t-test"
$ data.name : chr "input1 and input2"
- attr(*, "class")= chr "htest"
and note the data.name component. We can replace that with a character string:
tmp$data.name <- "Fruits and Vegetables"
Then print tmp:
> tmp
Paired t-test
data: Fruits and Vegetables
t = -18.6347, df = 19, p-value = 1.147e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.454791 -5.151218
sample estimates:
mean of the differences
-5.803005
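A third variant avoids both creating new global objects and editing the result afterwards. t.test() builds data.name from the expressions passed as x and y, so evaluating the call inside with() on a named list is enough. This is a small sketch using the same dummy data; the names res and labelled are illustrative assumptions.

```r
set.seed(1)
input1 <- rnorm(20, mean = -1)
input2 <- rnorm(20, mean = 5)

# the list names become the labels, because t.test() deparses the
# expressions supplied for x and y to build data.name
labelled <- list(Fruits = input1, Vegetables = input2)
res <- with(labelled,
            t.test(x = Fruits, y = Vegetables,
                   paired = TRUE, alternative = "two.sided"))

res$data.name
res
```

The test itself is unchanged; only the label printed on the data: line differs.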
