Change confidence level in t test (inside of by()) in R - r

I need to change the confidence level used by t.test when it is called inside by():
res<-by(HUS[,1], HUS$air.conditioning, FUN=t.test)
Where can I provide conf.level?

Use an anonymous (lambda) function:
res <- by(HUS[,1], HUS$air.conditioning, FUN =
function(x) t.test(x, conf.level = 0.90))
Or pass the named argument directly; by() forwards it to FUN through its ... argument:
res <- by(HUS[,1], HUS$air.conditioning, FUN=t.test, conf.level = 0.90)
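NOTE: Passing the named argument this way does not work with every function. In ave, for example, ... is reserved for the grouping variables, so extra arguments cannot be forwarded to FUN.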
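To illustrate the ave caveat in the note above, here is a minimal sketch. The vectors x and g are made up for this illustration and are not part of the original question. Because everything passed through ... in ave(x, ..., FUN) is treated as a grouping variable, any extra argument for FUN is wrapped in an anonymous function instead:

```r
# Toy data, invented for this illustration
x <- c(1, NA, 3, 4, 5, NA)
g <- c("a", "a", "a", "b", "b", "b")

# ave() uses its ... for grouping variables, so na.rm cannot be passed
# alongside FUN; it is supplied inside an anonymous function instead.
group_means <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))
group_means
```

Each element of group_means holds the NA-free mean of its own group, in the original order of x.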
Check with a reproducible example:
res1 <- by(mtcars[, "mpg"], mtcars$cyl, FUN = t.test, conf.level = 0.90)
res2 <- by(mtcars[, "mpg"], mtcars$cyl, FUN = function(x) t.test(x, conf.level = 0.90))
# t.test records a different data.name in each call, so align it before comparing
res1 <- lapply(res1, function(x) {x$data.name <- "x"; x})
all.equal(res1, res2, check.attributes = FALSE)
[1] TRUE
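As a follow-up check, one could confirm that the requested level actually reached t.test by reading it back from the results. This is only a sketch on the built-in mtcars data; the names levels_used and ci are chosen here for illustration:

```r
# Same kind of call as in the answer, on the built-in mtcars data
res <- by(mtcars[, "mpg"], mtcars$cyl,
          FUN = function(x) t.test(x, conf.level = 0.90))

# Each element is an "htest" object; its conf.int carries the level
# that was used as the "conf.level" attribute.
levels_used <- sapply(res, function(r) attr(r$conf.int, "conf.level"))

# Collect the interval bounds: one column per group, rows = lower/upper
ci <- sapply(res, function(r) r$conf.int)

levels_used
ci
```

If conf.level had not been forwarded, levels_used would show the default of 0.95 instead.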

Related

Error in R Code if called in a function (multcompLetters, strsplit: non-character argument)

I would like to generate labels from a Tukey test inside a function.
There it throws an error, whereas the same code works absolutely fine when called outside a function.
Other threads say the solution is to convert a variable with "as.character", but when I tried this it did not work.
Sadly I could not figure out the error and it would be great if you could help me. The error and the trace are in the last lines.
Minimal working example:
require(plyr)
require(multcomp)
require(multcompView)
require(datasets)
data(iris)
iris
Dataset <- iris
####################################################################
#Works:
a=aov(Dataset$Sepal.Length ~ Dataset$Species)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
generate_label_df <- function(HSD, flev){
Tukey.levels <- HSD[[flev]][,4]
Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
plot.labels <- names(Tukey.labels[['Letters']])
boxplot.df <- ddply(Dataset, flev, function (x) max(fivenum(x$y)) + 0.2)
plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],stringsAsFactors = FALSE)
labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
return(labels.df)
}
LABELS <- generate_label_df(tHSD, 'Dataset$Species')
####################################################################
#Throws error:
doTukey <- function(y_var, x_var, ret=FALSE) {
require(ggplot2)
require(plyr)
require(multcomp)
a=aov(y_var ~ x_var)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
generate_label_df <- function(HSD, flev){
Tukey.levels <- HSD[[flev]][,4]
Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
plot.labels <- names(Tukey.labels[['Letters']])
boxplot.df <- ddply(Dataset, flev, function (x) max(fivenum(x$y)) + 0.2)
plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],stringsAsFactors = FALSE)
labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
return(labels.df)
}
LABELS <- generate_label_df(tHSD, 'Dataset$Species')
}
doTukey(Dataset$Sepal.Length, Dataset$Species) # Error: Error in strsplit(x, sep) : non-character argument,
#Trace:
#5.strsplit(x, sep)
#4.vec2mat2(namx)
#3.multcompLetters(Tukey.levels)
#2.generate_label_df(tHSD, "Dataset$Species")
#1.doTukey(Dataset$Sepal.Length, Dataset$Species)
Thanks
You are right, the problem comes with this
multcompLetters(Tukey.levels)
When aov is called inside a function, the term names are taken from the variables used in the formula, as you can see in my example below:
Dataset = iris
Tukeyobj <- function(y_var, x_var, ret=FALSE) {
a=aov(y_var ~ x_var)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
return(tHSD)
}
names(Tukeyobj(Dataset$Sepal.Length, Dataset$Species))
#returns "x_var"
So in your function, HSD[[flev]] returns NULL, and multcompLetters ends up calling strsplit on NULL, which, surprisingly, gives you that error. Always check what you are calling inside the function, and try to make it self-contained. I made some minor corrections to what you have; it should work:
require(plyr)
require(multcomp)
require(multcompView)
require(datasets)
data(iris)
generate_label_df <- function(HSD, flev, DATA, yvar){
Tukey.levels <- HSD[[flev]][,4]
Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
plot.labels <- names(Tukey.labels[['Letters']])
# look up the response column by name; x$y does not exist in DATA and gave all NAs
boxplot.df <- ddply(DATA, flev, function (x) max(fivenum(x[[yvar]])) + 0.2)
plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],stringsAsFactors = FALSE)
labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
return(labels.df)
}
doTukey <- function(y_var, x_var,DATA,ret=FALSE) {
# build the formula from column names so that TukeyHSD names the term after x_var
FORMULA = as.formula(paste(y_var,"~",x_var))
a=aov(FORMULA,data=DATA)
HSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
return(generate_label_df(HSD,x_var,DATA,y_var))
}
doTukey("Sepal.Length","Species",iris)
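The failure described in this answer can be reproduced in isolation. The sketch below only uses base R and the built-in iris data; the variable names lookup and msg are chosen here for illustration:

```r
# Fit the model the way the failing function does: bare vectors in the formula
y_var <- iris$Sepal.Length
x_var <- iris$Species
hsd <- TukeyHSD(aov(y_var ~ x_var))

# The term is named after the formula variable, not "Dataset$Species"
names(hsd)

# Looking up the wrong name silently returns NULL
lookup <- hsd[["Dataset$Species"]]

# strsplit() on NULL raises the error reported in the question
msg <- tryCatch(strsplit(names(lookup), "-"),
                error = function(e) conditionMessage(e))
msg
```

Because list lookup by a missing name returns NULL rather than failing, the error only surfaces later, inside multcompLetters.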

r how to keep print method for custom class

I have defined a method for printing a vector with the class test:
print.test <- function(x, ...) {
x <- formatC(
as.numeric(x),
format = "f",
big.mark = ".",
decimal.mark = ",",
digits = 1
)
x[x == "NA"] <- "-"
x[x == "NaN"] <- "-"
print.default(x)
}
which works fine for the following
a <- c(1000.11, 2000.22, 3000.33)
class(a) <- c("test", class(a))
print(a)
[1] "1.000,1" "2.000,2" "3.000,3"
this also works:
round(a)
[1] "1.000,0" "2.000,0" "3.000,0"
this does not:
median(a)
[1] 2000.22
class(median(a))
[1] "numeric"
Now my question is: do I need to write a custom method for this class in order to use median, for example, and if so, what would it look like? Or is there another way (as I simply would like this class to print the data in a certain format)?
The problem is that median.default returns an object of class numeric, so autoprinting the returned object does not call your custom print method.
The following method will do so:
median.test <- function(x, na.rm = FALSE, ...){
y <- NextMethod(x, na.rm = na.rm, ...)
class(y) <- c("test", class(y))
y
}
median(a)
#[1] "2.000,2"
As for the handling of NA values, I will first define another method for a base R function. It is not strictly needed, but it saves some lines of code if objects of class test are used frequently.
c.test <- function(x, ...){
y <- NextMethod(x, ...)
class(y) <- c("test", class(y))
y
}
b <- c(a, NA)
class(b)
#[1] "test" "numeric"
median(b)
#[1] "-"
median(b, na.rm = TRUE)
#[1] "2.000,2"
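The diagnosis at the top of this answer can be checked directly. The sketch below uses a hypothetical class name, "demo", so that it does not interfere with the test class and its methods defined above; the print method is a simplified stand-in:

```r
# "demo" is a hypothetical class name used only for this illustration
print.demo <- function(x, ...) {
  cat("<demo>", format(unclass(x)), "\n")
  invisible(x)
}

a <- c(1000.11, 2000.22, 3000.33)
class(a) <- c("demo", class(a))

# No median.demo method exists, so this dispatches to median.default
m <- median(a)
inherits(m, "demo")   # FALSE: the class was lost, so print.demo is not used

# Re-attaching the class is what a median method for the class has to do
m2 <- structure(m, class = c("demo", class(m)))
inherits(m2, "demo")  # TRUE
```

Any summary function that returns a plain numeric would need the same treatment, which is why a small re-classing helper is convenient.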
EDIT.
The following defines a generic function wMedian, a default method and a method for objects of class "currency", as requested by the OP in a comment.
Note that there must be a method print.currency, which I don't redefine since it's exactly the same as print.test above. As for the other methods, I have made them simpler with the help of a new function, as.currency.
median.currency <- function(x, na.rm = FALSE, ...){
y <- NextMethod(x, na.rm = na.rm, ...)
as.currency(y)
}
c.currency <- function(x, ...){
y <- NextMethod(x, ...)
as.currency(y)
}
as.currency <- function(x){
class(x) <- c("currency", class(x))
x
}
wMedian <- function(x, ...) UseMethod("wMedian")
wMedian.default <- function(x, ...){
matrixStats::weightedMedian(x, ...)
}
wMedian.currency <- function(x, w = NULL, idxs = NULL, na.rm = FALSE, interpolate = is.null(ties), ties = NULL, ...) {
y <- NextMethod(x, w = w, idxs = idxs, na.rm = na.rm, interpolate = interpolate, ties = ties, ... )
as.currency(y)
}
set.seed(1)
x <- rnorm(10)
wMedian(x, w = (1:10)/10)
#[1] 0.4084684
wMedian(as.currency(x), w = (1:10)/10)
#[1] "0,4"

nls boot error must have positive length

I am getting the error below with nlsBoot(). Any idea what is wrong?
Error in apply(tabboot, 1, quantile, c(0.5, 0.025, 0.975)) :
dim(X) must have a positive length
set.seed(1)
x = 1:100
y = x^2+rnorm(100,50,500)
plot(x,y)
d = data.frame(x =x, y=y)
mymodel = nls(y~x^b,start= list(b=1),data = d)
mymodel
library(nlstools)
nlsBoot(mymodel, niter = 999)
Try to define the formula before applying the nls function, like this:
formula <- as.formula(y ~ x^b)
mymodel <- nls(formula,start= list(b=1),data = d)
Added:
Well, I've modified the code and now it can handle a one-parameter fit.
# My suggestion is to erase all the environment first:
rm(list = ls())
# Then we start again:
set.seed(1)
x = 1:100
y = x^2+rnorm(100,50,500)
plot(x,y)
d = data.frame(x =x, y=y)
mymodel = nls(y~x^b,start= list(b=1),data = d)
Here is the function that you have to use:
nlsboot_onepar <- function (nls, niter = 999)
{
if (!inherits(nls, "nls"))
stop("Use only with 'nls' objects")
data2 <- eval(nls$data, sys.frame(0))
fitted1 <- fitted(nls)
resid1 <- resid(nls)
var1 <- all.vars(formula(nls)[[2]])
l1 <- lapply(1:niter, function(i) {
data2[, var1] <- fitted1 + sample(scale(resid1, scale = FALSE),
replace = TRUE)
nls2 <- try(update(nls, start = as.list(coef(nls)),
data = data2), silent = TRUE)
if (inherits(nls2, "nls"))
return(list(coef = coef(nls2), rse = summary(nls2)$sigma))
})
if (sum(sapply(l1, is.null)) > niter/2)
stop(paste("Procedure aborted: the fit only converged in",
round(100 * (1 - sum(sapply(l1, is.null))/niter)), "% during bootstrapping"))
tabboot <- sapply(l1[!sapply(l1, is.null)], function(z) z$coef,simplify =
FALSE)
tabboot <- as.matrix(t(as.numeric(tabboot)))
rownames(tabboot) <- "b"
rseboot <- sapply(l1[!sapply(l1, is.null)], function(z) z$rse)
recapboot <- t(apply(tabboot, 1, quantile, c(0.5, 0.025,
0.975)))
colnames(recapboot) <- c("Median", "2.5%", "97.5%")
estiboot <- t(apply(tabboot, 1, function(z) c(mean(z), sd(z))))
colnames(estiboot) <- c("Estimate", "Std. error")
serr <- sum(sapply(l1, is.null))
if (serr > 0)
warning(paste("The fit did not converge", serr, "times during bootstrapping"))
listboot <- list(coefboot = t(tabboot), rse = rseboot, bootCI = recapboot,
estiboot = estiboot)
class(listboot) <- "nlsBoot"
return(listboot)
}
And then we use it:
result <- nlsboot_onepar(mymodel, niter = 999)
If you want to plot the parameter distribution, you can do this:
graphics.off()
plot(density(as.vector(result$coefboot)))
# or
hist(as.vector(result$coefboot))
I hope that helps you.
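A likely reason the original call fails with a single parameter is that sapply simplifies length-one results to a plain vector, which has no dim for apply to work on. This is an assumption about the cause, inferred from the error message and the modified code above, not something confirmed against the nlstools source. The sketch below uses made-up coefficient values:

```r
# Made-up bootstrap coefficients, for illustration only
one_par <- list(c(b = 2.01), c(b = 1.99), c(b = 2.00))
two_par <- list(c(a = 1.0, b = 2.0), c(a = 1.1, b = 2.1), c(a = 0.9, b = 1.9))

tab1 <- sapply(one_par, function(z) z)  # simplified to a vector: no dim
tab2 <- sapply(two_par, function(z) z)  # simplified to a 2 x 3 matrix

# apply() needs a dim attribute, so the one-parameter case fails
msg <- tryCatch(apply(tab1, 1, quantile, c(0.5, 0.025, 0.975)),
                error = function(e) conditionMessage(e))
msg

# Forcing a one-row matrix restores the shape apply() expects
tab1m <- matrix(tab1, nrow = 1, dimnames = list("b", NULL))
recap <- t(apply(tab1m, 1, quantile, c(0.5, 0.025, 0.975)))
recap
```

The modified function above takes the same route: it converts the coefficients to a one-row matrix before calling apply.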

Remove an argument / element from ellipsis

I want to drop an element from an ellipsis because I want to set it manually inside my function. The example below illustrates the point: the error is sensible, given that I'm passing na.rm both manually and via the ellipsis. How can I remove na.rm from the ellipsis so that the error isn't thrown?
mymean <- function(x, ...){
dots <- list(...)
if (is.null(dots$na.rm)) {
na_arg <- TRUE
} else {
na_arg <- dots$na.rm
}
mean(x, na.rm = na_arg, ...)
}
set.seed(10); a <- c(rnorm(20), NA)
mymean(a)
## [1] -0.06053267
mymean(a, trim = .5)
## [1] -0.08275319
mymean(a, na.rm = TRUE, trim = .1)
## Error in mean.default(x, na.rm = na_arg, ...) :
## formal argument "na.rm" matched by multiple actual arguments
1) lm Look at the source code of lm, which modifies the matched call and then evaluates it. Following that approach:
mymean1 <- function(x, ...){
mc <- match.call()
mc[[1]] <- as.name("mean")
if (is.null(mc$na.rm)) mc$na.rm <- TRUE
eval(mc, parent.frame())
}
mymean1(a, na.rm = TRUE, trim = .1)
## [1] -0.01886715
mymean1(a, trim = .1)
## [1] -0.01886715
2) do.call This would also work although there is a subtle difference, namely ... gets evaluated. In this and most cases it won't matter but it might in some less usual cases.
mymean2 <- function(x, ...) {
dots <- list(...)
if (! "na.rm" %in% names(dots)) dots$na.rm <- TRUE
do.call("mean", c(list(x), dots))
}
mymean2(a, na.rm = TRUE, trim = .1)
## [1] -0.01886715
mymean2(a, trim = .1)
## [1] -0.01886715
3) ordinary arg However, it would be easier to just write:
mymean3 <- function(x, na.rm = TRUE, ...) mean(x, na.rm = na.rm, ...)
mymean3(a, na.rm = TRUE, trim = .1)
## [1] -0.01886715
mymean3(a, trim = .1)
## [1] -0.01886715
This is the usual way wrappers with different defaults are written. For example see the read.csv source code.
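For completeness, the literal request in the question (removing na.rm from the captured ellipsis) can be sketched as a variation of approach 2. The name mymean4 is hypothetical, and, as with do.call in general, the ... arguments are evaluated when they are captured:

```r
mymean4 <- function(x, ...) {
  dots <- list(...)
  # Use the caller's na.rm if given, otherwise default to TRUE
  na_arg <- if (is.null(dots[["na.rm"]])) TRUE else dots[["na.rm"]]
  # Drop na.rm from the captured arguments so it is passed only once
  dots[["na.rm"]] <- NULL
  do.call(mean, c(list(x), dots, list(na.rm = na_arg)))
}

v <- c(1, 2, 3, NA)
mymean4(v)                               # NA removed by default
mymean4(v, na.rm = FALSE)                # caller's choice is respected
mymean4(c(1:9, 100, NA), trim = 0.1)     # other arguments still pass through
```

Assigning NULL to a list element removes it, which is what makes the single na.rm in the final call unambiguous.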

Apply function to all columns but restricted to the rows within each level of a class in R

I have a df with many numeric columns, and one factor variable (catvar) with three levels (A, B and C). I have a function which performs some esoteric calculations and returns a new dataframe with a new column titled "estimatefinal". This works fine, however I would like this to be applied separately for each of the levels of "catvar". How can I do this?
This is my function, though the specifics of it aren't necessarily relevant:
bestpercentile <- function(x){
n = 10
auc_final <- array(0, dim = c(n, n, n))
for (i in 1:n){
for (j in 1:n){
for (k in 1:n){
findbest <- function(x){
x <- spare
x$income_per <-apply(x[,6:9], 1, quantile, probs = i/n, na.rm = T)
x$purchases_per <-apply(x[,10:13], 1, quantile, probs = j/n, na.rm = T)
x$expenses_per <-apply(x[,14:18], 1, quantile, probs = k/n, na.rm = T)
x$estimate <<- x$income_per-x$purchases_per-x$expenses_per
}
findbest(x)
model <- glm(default ~ estimate, data = x, family = "binomial")
auc_coef <- performance(prediction(predict(model , x, type="response"), x$default), measure = "auc")
auc_final[i,j,k] <- auc_coef@y.values[[1]]
}
}
}
optimumpercentile <- which(auc_final == max(auc_final), arr.ind = TRUE)
x$income_per <-apply(x[,6:9], 1, quantile, probs = optimumpercentile[1]/10, na.rm = T)
x$purchases_per <-apply(x[,10:13], 1, quantile, probs = optimumpercentile[2]/10, na.rm = T)
x$expenses_per <-apply(x[,14:18], 1, quantile, probs = optimumpercentile[3]/10, na.rm = T)
x$estimatefinal <- x$income_per-x$purchases_per-x$expenses_per
return(x)
}
The end result would be exactly what I would get if I were to split the df into three, one for each level of "catvar", run the function on each of those new data frames, and finally rbind the three together. Of course I could do it that hacky way, but I would rather have a more elegant solution.
Thanks
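The split, apply and rbind steps described in the question can be written compactly in base R. This is only a sketch: the data frame and add_estimate are hypothetical stand-ins for the real data and for bestpercentile, which cannot be run here. Note also that bestpercentile as posted reads from a global object named spare inside findbest, so it would presumably need to use its own argument before a per-group call has the intended effect.

```r
# Toy data, invented for this illustration
df <- data.frame(
  catvar = factor(rep(c("A", "B", "C"), each = 4)),
  v1 = 1:12,
  v2 = 12:1
)

# Hypothetical stand-in for bestpercentile(): takes a data frame and
# returns it with a new column computed from that data frame only.
add_estimate <- function(x) {
  x$estimatefinal <- x$v1 - mean(x$v1)
  x
}

# Split by level, apply the function to each piece, and bind the pieces
pieces <- split(df, df$catvar)
result <- do.call(rbind, lapply(pieces, add_estimate))
rownames(result) <- NULL
result
```

Here estimatefinal is centered within each level of catvar, which shows that the function only ever saw the rows of one level at a time.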

Resources