seach all inputs for a function to get only significant outputs - r

I have a function tryCatch that outputs a p-value for different inputs defined in ind_gene.
Is there a way to search all inputs in ind_gene and get all outputs from tryCatch where p < 0.05
ind_gene <- which(rownames(matrix_cpm_spike_liver) == "hsa-miR-320c")
s1 <- tryCatch(survdiff(
Surv(as.numeric(as.character(ClinicalDataHep$new_death))[ind_clin],
ClinicalDataHep$death_event[ind_clin])~
event_rna[ind_gene,ind_tum]), error = function(e) return(NA))
> s1
Call:
survdiff(formula = Surv(as.numeric(as.character(ClinicalDataHep$new_death))[ind_clin],
ClinicalDataHep$death_event[ind_clin]) ~ event_rna[ind_gene,
ind_tum])
N Observed Expected (O-E)^2/E (O-E)^2/V
event_rna[ind_gene, ind_tum]=0 8 6 5.52 0.0420 0.0652
event_rna[ind_gene, ind_tum]=1 51 25 25.48 0.0091 0.0652
Chisq= 0.1 on 1 degrees of freedom, p= 0.798
The p-value can be computed like this:
p.val <- 1 - pchisq(s1$chisq, length(s1$n) - 1)

Related

Fixing call notice printed after an lm() like function in R

I using an lm() like function called robu() from library robumeta within my own function foo.
However, I'm manipulating the formula argument such that when it is missing the default formula would be: formula(dint~1) or else any formula that user defines.
It works fine, however, in the output of foo the printed formula call always is: Model: missing(f) if formula(dint ~ 1) regardless of what formula is inputted in the foo.
Can I correct this part of output so that it only shows the exact formula used? (see below examples)
dat <- data.frame(dint = 1:9, SD = 1:9*.1,
time = c(1,1,2,3,4,3,2,4,1),
study.name = rep(c("bob", "jim", "jon"), 3))
library(robumeta)
# MY FUNCTION:
foo <- function(f, data){
robu(formula = if(missing(f)) formula(dint~1) else formula(f), data = data, studynum = study.name, var = SD^2)
}
# EXAMPLES OF USE:
foo(data = dat) ## HERE I expect: `Model: dint ~ 1`
foo(dint~as.factor(time), data = dat) ## HERE I expect: `Model: dint ~ time`
One option is to update the 'ml' object
foo <- function(f, data){
fmla <- if(missing(f)) {
formula(dint ~ 1)
} else {
formula(f)
}
model <- robu(formula = fmla, data = data, studynum = study.name, var = SD^2)
model$ml <- fmla
model
}
-checking
foo(data = dat)
RVE: Correlated Effects Model with Small-Sample Corrections
Model: dint ~ 1
Number of studies = 3
Number of outcomes = 9 (min = 3 , mean = 3 , median = 3 , max = 3 )
Rho = 0.8
I.sq = 96.83379
Tau.sq = 9.985899
Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept. 4.99 0.577 8.65 2 0.0131 2.51 7.48 **
---
Signif. codes: < .01 *** < .05 ** < .10 *
---
Note: If df < 4, do not trust the results
foo(dint~ as.factor(time), data = dat)
RVE: Correlated Effects Model with Small-Sample Corrections
Model: dint ~ as.factor(time)
Number of studies = 3
Number of outcomes = 9 (min = 3 , mean = 3 , median = 3 , max = 3 )
Rho = 0.8
I.sq = 97.24601
Tau.sq = 11.60119
Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept. 3.98 2.50 1.588 2.00 0.253 -6.80 14.8
2 as.factor.time.2 1.04 4.41 0.236 1.47 0.842 -26.27 28.3
3 as.factor.time.3 1.01 1.64 0.620 1.47 0.617 -9.10 11.1
4 as.factor.time.4 2.52 2.50 1.007 2.00 0.420 -8.26 13.3
---
Signif. codes: < .01 *** < .05 ** < .10 *

How to perform Tukey's pairwise test in R?

I have the following data (dat)
I have the following data(dat)
V W X Y Z
1 8 89 3 900
1 8 100 2 800
0 9 333 4 980
0 9 560 1 999
I wish to perform TukeysHSD pairwise test to the above data set.
library(reshape2)
dat1 <- gather(dat) #convert to long form
pairwise.t.test(dat1$key, dat1$value, p.adj = "holm")
However, every time I try to run it, it keeps running and does not yield an output. Any suggestions on how to correct this?
I would also like to perform the same test using the function TukeyHSD(). However, when I try to use the wide/long format, I run into a error that says
" Error in UseMethod("TukeyHSD") :
no applicable method for 'TukeyHSD' applied to an object of class "data.frame"
We need 'x' to be dat1$value as it is not specified the first argument is taken as 'x' and second as 'g'
pairwise.t.test( dat1$value, dat1$key, p.adj = "holm")
#data: dat1$value and dat1$key
# V W X Y
#W 1.000 - - -
#X 0.018 0.018 - -
#Y 1.000 1.000 0.018 -
#Z 4.1e-08 4.1e-08 2.8e-06 4.1e-08
#P value adjustment method: holm
Or we specify the argument and use in any order we wanted
pairwise.t.test(g = dat1$key, x= dat1$value, p.adj = "holm")
Regarding the TukeyHSD
TukeyHSD(aov(value~key, data = dat1), ordered = TRUE)
#Tukey multiple comparisons of means
# 95% family-wise confidence level
# factor levels have been ordered
#Fit: aov(formula = value ~ key, data = dat1)
#$key
# diff lwr upr p adj
#Y-V 2.00 -233.42378 237.4238 0.9999999
#W-V 8.00 -227.42378 243.4238 0.9999691
#X-V 270.00 34.57622 505.4238 0.0211466
#Z-V 919.25 683.82622 1154.6738 0.0000000
#W-Y 6.00 -229.42378 241.4238 0.9999902
#X-Y 268.00 32.57622 503.4238 0.0222406
#Z-Y 917.25 681.82622 1152.6738 0.0000000
#X-W 262.00 26.57622 497.4238 0.0258644
#Z-W 911.25 675.82622 1146.6738 0.0000000
#Z-X 649.25 413.82622 884.6738 0.0000034

How are the threshold or cutoff points in {Epi} R package selected?

In the R package {Epi} the ROC() function can generate a plot out of the dataset aSAH in in the {pROC} package like this:
with the following commands:
require(Epi)
require(pROC)
data(aSAH)
rock = ROC(form = outcome ~ s100b, data=aSAH, plot = "ROC", MX = T)
The sensitivity and specificity were calculated for 51 points included in the object nrow(rock$res). In this regard, note that nrow(aSAH) is instead 113.
Which points were used to generate rock$res?
If we were using the function roc() in the package {pROC} instead, we could get this via: roc(aSAH$outcome, aSAH$s100b)$threshold. But being different packages, they are probably different.
The answer is... of course... in the package documentation:
res dataframe with variables sens, spec, pvp, pvn and name of the test
variable. The latter is the unique values of test or linear predictor
from the logistic regression in ascending order with -Inf prepended.
So what are the unique values:
points = unique(aSAH$s100b); length(points) [1] 50 plus the pre-ended -Inf!
Nice inkling, but can we prove it... I think so:
require(Epi)
require(pROC)
data(aSAH)
rock = ROC(form = outcome ~ s100b, data=aSAH, plot = "ROC", MX = T)
d = aSAH
> head(d)
gos6 outcome gender age wfns s100b ndka
29 5 Good Female 42 1 0.13 3.01
30 5 Good Female 37 1 0.14 8.54
31 5 Good Female 42 1 0.10 8.09
points = sort(unique(d$s100b))
> head(points)
[1] 0.03 0.04 0.05 0.06 0.07 0.08
> length(points)
[1] 50
## Logistic regression coefficients:
beta.0 = as.numeric(rock$lr$coefficients[1])
beta.1 = as.numeric(rock$lr$coefficients[2])
## Sigmoid function:
sigmoid = 1 / (1 + exp(-(beta.0 + beta.1 * points)))
sigmoid = as.numeric(c("-Inf", sigmoid))
lr.eta = rock$res$lr.eta
length(lr.eta)
head(lr.eta)
head(sigmoid)
> head(lr.eta)
[1] -Inf 0.1663429 0.1732556 0.1803934 0.1877585 0.1953526
> head(sigmoid)
[1] -Inf 0.1663429 0.1732556 0.1803934 0.1877585 0.1953526
## Trying to get the lr.eta number 0.304 on the plot:
> which.max(rowSums(rock$res[, c("sens", "spec")]))
# 0.30426295405785 18
## What do we find in row 18 or res?
> rock$res[18,]
sens spec pvp pvn lr.eta
0.30426295405785 0.6341463 0.8055556 0.2054795 0.35 0.304263
## Yet, lr.eta is not the Youden's J statistic or index:
> rock$res[18,"sens"] + rock$res[18,"spec"] - 1
[1] 0.4397019
## Instead, it is the Probability of the outcome at the input with max Youden's index:
## Excluding the "-Inf" introduced by the ROC function (position 17 as opposed to 18):
max.sens.sp.cut = points[17]
1 / (1 + exp(-(beta.0 + beta.1 * max.sens.sp.cut))) [1] 0.304263 !!!
Done!
The lt.eta is, therefore, the probability of the outcome at the threshold corresponding to the maximum Youden's index.

How to get residuals from Repeated measures ANOVA model in R

Normally from aov() you can get residuals after using summary() function on it.
But how can I get residuals when I use Repeated measures ANOVA and formula is different?
## as a test, not particularly sensible statistically
npk.aovE <- aov(yield ~ N*P*K + Error(block), npk)
npk.aovE
summary(npk.aovE)
Error: block
Df Sum Sq Mean Sq F value Pr(>F)
N:P:K 1 37.0 37.00 0.483 0.525
Residuals 4 306.3 76.57
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
N 1 189.28 189.28 12.259 0.00437 **
P 1 8.40 8.40 0.544 0.47490
K 1 95.20 95.20 6.166 0.02880 *
N:P 1 21.28 21.28 1.378 0.26317
N:K 1 33.14 33.14 2.146 0.16865
P:K 1 0.48 0.48 0.031 0.86275
Residuals 12 185.29 15.44
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Intuitial summary(npk.aovE)$residuals return NULL..
Can anyone can help me with this?
Look at the output of
> names(npk.aovE)
and try
> npk.aovE$residuals
EDIT: I apologize I read your example way too quickly. What I suggested is not possible with multilevel models with aov(). Try the following:
> npk.pr <- proj(npk.aovE)
> npk.pr[[3]][, "Residuals"]
Here's a simpler reproducible anyone can mess around with if they run into the same issue:
x1 <- gl(8, 4)
block <- gl(2, 16)
y <- as.numeric(x1) + rnorm(length(x1))
d <- data.frame(block, x1, y)
m <- aov(y ~ x1 + Error(block), d)
m.pr <- proj(m)
m.pr[[3]][, "Residuals"]
The other option is with lme:
require(MASS) ## for oats data set
require(nlme) ## for lme()
require(multcomp) ## for multiple comparison stuff
Aov.mod <- aov(Y ~ N * V + Error(B/V), data = oats)
the_residuals <- aov.out.pr[[3]][, "Residuals"]
Lme.mod <- lme(Y ~ N * V, random = ~1 | B/V, data = oats)
the_residuals <- residuals(Lme.mod)
The original example came without the interaction (Lme.mod <- lme(Y ~ N * V, random = ~1 | B/V, data = oats)) but it seems to be working with it (and producing different results, so it is doing something).
And that's it...
...but for completeness:
1 - The summaries of the model
summary(Aov.mod)
anova(Lme.mod)
2 - The Tukey test with repeated measures anova (3 hours looking for this!!). It does raises a warning when there is an interaction (* instead of +), but it seems to be safe to ignore it. Notice that V and N are factors inside the formula.
summary(Lme.mod)
summary(glht(Lme.mod, linfct=mcp(V="Tukey")))
summary(glht(Lme.mod, linfct=mcp(N="Tukey")))
3 - The normality and homoscedasticity plots
par(mfrow=c(1,2)) #add room for the rotated labels
aov.out.pr <- proj(aov.mod)
#oats$resi <- aov.out.pr[[3]][, "Residuals"]
oats$resi <- residuals(Lme.mod)
qqnorm(oats$resi, main="Normal Q-Q") # A quantile normal plot - good for checking normality
qqline(oats$resi)
boxplot(resi ~ interaction(N,V), main="Homoscedasticity",
xlab = "Code Categories", ylab = "Residuals", border = "white",
data=oats)
points(resi ~ interaction(N,V), pch = 1,
main="Homoscedasticity", data=oats)

Error with post hoc tests in ANCOVA with factorial design

Does anyone know how to do post hoc tests in an ANCOVA model with a factorial design?
I have two vectors consisting of 23 baseline values (covariate) and 23 values after treatment (dependent variable) and I have two factors with both two levels. I created an ANCOVA model and calculated the adjusted means, standard errors and confidence intervals. Example:
library(effects)
baseline = c(0.7672,1.846,0.6487,0.4517,0.5599,0.2255,0.5946,1.435,0.5374,0.4901,1.258,0.5445,1.078,1.142,0.5,1.044,0.7824,1.059,0.6802,0.8003,0.5547,1.003,0.9213)
after_treatment = c(0.4222,1.442,0.8436,0.5544,0.8818,0.08789,0.6291,1.23,0.4093,0.7828,-0.04061,0.8686,0.8525,0.8036,0.3758,0.8531,0.2897,0.8127,1.213,0.05276,0.7364,1.001,0.8974)
age = factor(c(rep(c("Young","Old"),11),"Young"))
treatment = factor(c(rep("Drug",12),rep("Placebo",11)))
ANC = aov(after_treatment ~ baseline + treatment*age)
effect_treatage = effect("treatment*age",ANC)
data.frame(effect_treatage)
treatment age fit se lower upper
1 Drug Old 0.8232137 0.1455190 0.5174897 1.1289377
2 Placebo Old 0.6168641 0.1643178 0.2716452 0.9620831
3 Drug Young 0.5689036 0.1469175 0.2602413 0.8775659
4 Placebo Young 0.7603360 0.1462715 0.4530309 1.0676410
Now I want test if there is a difference between the adjusted means of
Young-Placebo:Young-Drug
Old-Placebo:Old-Drug
Young-Placebo:Old-Drug
Old-Placebo:Young-Drug
So I tried:
library(multcomp)
pH = glht(ANC, linfct = mcp(treatment*age="Tukey"))
# Error: unexpected '=' in "ph = glht(ANC_nback, linfct = mcp(treat*age="
And:
pH = TukeyHSD(ANC)
# Error in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep3'
# In addition: Warning message:
# In replications(paste("~", xx), data = mf) : non-factors ignored: baseline
Does anyone know how to resolve this?
Many thanks!
PS for more info see
R: How to graphically plot adjusted means, SE, CI ANCOVA
If you wish to use multcomp, then you can take advantage of the wonderful and seamless interface between lsmeans and multcomp packages (see ?lsm), whereas lsmeans provides support for glht().
baseline = c(0.7672,1.846,0.6487,0.4517,0.5599,0.2255,0.5946,1.435,0.5374,0.4901,1.258,0.5445,1.078,1.142,0.5,1.044,0.7824,1.059,0.6802,0.8003,0.5547,1.003,0.9213)
after_treatment = c(0.4222,1.442,0.8436,0.5544,0.8818,0.08789,0.6291,1.23,0.4093,0.7828,-0.04061,0.8686,0.8525,0.8036,0.3758,0.8531,0.2897,0.8127,1.213,0.05276,0.7364,1.001,0.8974)
age = factor(c(rep(c("Young","Old"),11),"Young"))
treatment = factor(c(rep("Drug",12),rep("Placebo",11)))
Treat <- data.frame(baseline, after_treatment, age, treatment)
ANC <- aov(after_treatment ~ baseline + treatment*age, data=Treat)
library(multcomp)
library(lsmeans)
summary(glht(ANC, linfct = lsm(pairwise ~ treatment * age)))
## Note: df set to 18
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: aov(formula = after_treatment ~ baseline + treatment * age, data = Treat)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Drug,Old - Placebo,Old == 0 0.20635 0.21913 0.942 0.783
## Drug,Old - Drug,Young == 0 0.25431 0.20698 1.229 0.617
## Drug,Old - Placebo,Young == 0 0.06288 0.20647 0.305 0.990
## Placebo,Old - Drug,Young == 0 0.04796 0.22407 0.214 0.996
## Placebo,Old - Placebo,Young == 0 -0.14347 0.22269 -0.644 0.916
## Drug,Young - Placebo,Young == 0 -0.19143 0.20585 -0.930 0.789
## (Adjusted p values reported -- single-step method)
This eliminates the need for reparametrization. You can achieve the same results by using lsmeans alone:
lsmeans(ANC, list(pairwise ~ treatment * age))
## $`lsmeans of treatment, age`
## treatment age lsmean SE df lower.CL upper.CL
## Drug Old 0.8232137 0.1455190 18 0.5174897 1.1289377
## Placebo Old 0.6168641 0.1643178 18 0.2716452 0.9620831
## Drug Young 0.5689036 0.1469175 18 0.2602413 0.8775659
## Placebo Young 0.7603360 0.1462715 18 0.4530309 1.0676410
##
## Confidence level used: 0.95
##
## $`pairwise differences of contrast`
## contrast estimate SE df t.ratio p.value
## Drug,Old - Placebo,Old 0.20634956 0.2191261 18 0.942 0.7831
## Drug,Old - Drug,Young 0.25431011 0.2069829 18 1.229 0.6175
## Drug,Old - Placebo,Young 0.06287773 0.2064728 18 0.305 0.9899
## Placebo,Old - Drug,Young 0.04796056 0.2240713 18 0.214 0.9964
## Placebo,Old - Placebo,Young -0.14347183 0.2226876 18 -0.644 0.9162
## Drug,Young - Placebo,Young -0.19143238 0.2058455 18 -0.930 0.7893
##
## P value adjustment: tukey method for comparing a family of 4 estimates
You need to use the which argument in TukeyHSD; "listing terms in the fitted model for which the intervals should be calculated". This is needed because you have a non-factor variable in the model ('baseline'). The variable causes trouble when included, which is default when which is not specified.
ANC = aov(after_treatment ~ baseline + treatment*age)
TukeyHSD(ANC, which = c("treatment:age"))
If you wish to use the more flexible glht, see section 3, page 8- here
Reparametrization is a possibility here:
treatAge <- interaction(treatment, age)
ANC1 <- aov(after_treatment ~ baseline + treatAge)
#fits are equivalent:
all.equal(logLik(ANC), logLik(ANC1))
#[1] TRUE
library(multcomp)
summary(glht(ANC1, linfct = mcp(treatAge="Tukey")))
# Simultaneous Tests for General Linear Hypotheses
#
#Multiple Comparisons of Means: Tukey Contrasts
#
#
#Fit: aov(formula = after_treatment ~ baseline + treatAge)
#
#Linear Hypotheses:
# Estimate Std. Error t value Pr(>|t|)
#Placebo.Old - Drug.Old == 0 -0.20635 0.21913 -0.942 0.783
#Drug.Young - Drug.Old == 0 -0.25431 0.20698 -1.229 0.617
#Placebo.Young - Drug.Old == 0 -0.06288 0.20647 -0.305 0.990
#Drug.Young - Placebo.Old == 0 -0.04796 0.22407 -0.214 0.996
#Placebo.Young - Placebo.Old == 0 0.14347 0.22269 0.644 0.916
#Placebo.Young - Drug.Young == 0 0.19143 0.20585 0.930 0.789
#(Adjusted p values reported -- single-step method)

Resources