I am trying to run a t-test separately for each level of a factor with 24 levels (speaker). My goal is to see whether the continuous variable intensity difference (intdiff) differs significantly between the two orthography levels (jj or L). However, when I use the by() function, it returns the following error:
Error in FUN(X[[1L]], ...) : could not find function "FUN"
My syntax which produced the error was:
by(data, data$speaker, t.test(intdiff~orthography))
I specified the arguments according to the R documentation, so I can't figure out why it's not accepting the function I provided. Any help would be greatly appreciated. In the event you need to try to reproduce the problem, here is the data set with which I am working:
https://www.dropbox.com/s/bxb9ebavln1rh3u/SpanishPalatals.csv
Many thanks in advance.
This, t.test(intdiff~orthography), is not a function but a function call: an expression that R evaluates, whereas by() needs a function it can apply to each subset. It appears you are expecting by() to split a data frame, so this might succeed:
by(data, data$speaker, function(d){ t.test(intdiff ~ orthography, data = d) })
To explain further: function(d){ t.test(intdiff ~ orthography, data = d) } is a function. Or you could try:
by(data, data$speaker, t.test, form= intdiff ~ orthography ) # untested
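The second version uses t.test (a function 'name' rather than a function 'call'), and t.test has a formula method; partial argument matching lets form= stand in for formula=. Be aware, though, that by() passes each subset data frame as the first, unnamed argument, and t.test() is an S3 generic that dispatches on its first argument. With a data frame in that position, the call will most likely reach t.test.default instead of the formula method and fail, so the anonymous-function version above is the safer choice.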
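A self-contained sketch of the anonymous-function approach, using simulated data whose column names mirror the question (the three speakers and all values are made up, not taken from SpanishPalatals.csv):

```r
# Simulated stand-in for the question's data: 3 hypothetical speakers,
# each with 10 "jj" and 10 "L" observations
set.seed(1)
data <- data.frame(
  speaker     = factor(rep(c("s1", "s2", "s3"), each = 20)),
  orthography = factor(rep(c("jj", "L"), times = 30)),
  intdiff     = rnorm(60)
)

# by() hands each per-speaker subset to the function as `d`
res <- by(data, data$speaker, function(d) t.test(intdiff ~ orthography, data = d))

# One p-value per speaker
pvals <- sapply(res, function(tt) tt$p.value)
print(pvals)
```

Because by() returns a list-like object holding one htest result per speaker, sapply() can pull the p-values out in a single step.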
The following, where df is your data frame (called data in the question):
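ff <- function(spkr){
  # t-test of intdiff between the two orthography levels for one speaker
  tt <- t.test(intdiff~orthography, data=df[df$speaker==spkr,])
  p <- tt$p.value
  # speaker, p-value and a significance marker
  return(c(as.character(spkr), p,
           ifelse(p<0.01,"***",ifelse(p<0.05,"**",ifelse(p<0.1,"*","")))))
}
# sapply() gives one column per speaker; transpose to one row per speaker
result <- sapply(unique(df$speaker),ff)
result <- data.frame(t(result))
colnames(result) <- c("speaker","p","")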
Produces this with your dataset:
> result
speaker p
1 f11r 0.274156477338993
2 f13r 0.713051221315941
3 f15a 0.572200487250118
4 f16a 0.192474372524439
5 f19s 0.071456754899202 *
6 f21s 0.172336984420981
7 f23s 0.00711798616059324 ***
8 f24s 0.875438396151962
9 f31s 0.0191665818354575 **
10 f35s 0.550666959777641
11 f36s 0.715870353562376
12 m09a 0.195488505334365
13 m10a 0.0083410071012031 ***
14 m12r 0.461148808729932
15 m14r 0.407116475315898
16 m17s 0.00147426201434577 ***
17 m18s 0.614243811131762
18 m20s 0.204627912633947
19 m25s 0.00652026971231048 ***
20 m26s 0.135705391035981
21 m27s 0.099118573524907 *
22 m28s 0.0789796806312655 *
23 m32s 0.27026239413494
Note that one of the speakers (f22s) had only one orthography level, which makes t.test() fail (it needs exactly two groups), so I removed that speaker.
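If you would rather not drop speakers like f22s by hand, a variant of ff() could check for two orthography levels first and return NA otherwise. A sketch on simulated data (speaker names and values are made up); it also keeps p numeric instead of converting it to character:

```r
# Simulated stand-in for the real data (hypothetical values): speaker "s3"
# has only one orthography level, like f22s in the real data set
set.seed(2)
df <- data.frame(
  speaker     = rep(c("s1", "s2", "s3"), each = 20),
  orthography = c(rep(c("jj", "L"), times = 20), rep("jj", 20)),
  intdiff     = rnorm(60)
)

ff <- function(spkr) {
  d <- df[df$speaker == spkr, ]
  # t.test() needs exactly two groups; return NA instead of failing
  if (length(unique(d$orthography)) != 2) return(NA_real_)
  t.test(intdiff ~ orthography, data = d)$p.value
}

spk <- unique(df$speaker)
result <- data.frame(speaker = spk,
                     p = vapply(spk, ff, numeric(1), USE.NAMES = FALSE))

# Same star thresholds as the answer above; no stars for skipped speakers
result$sig <- ifelse(is.na(result$p), "",
              ifelse(result$p < 0.01, "***",
              ifelse(result$p < 0.05, "**",
              ifelse(result$p < 0.1, "*", ""))))
print(result)
```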
Applying the lmer() function across all columns in a data frame. I have made a list of variables and used lapply(). Below is the code:
varlist=names(Genus_abundance)[5:ncol(Genus_abundance)]
lapply(varlist, function(x){lmer(substitute(i ~ Status + (1|Match), list(i=as.name(x), data=Genus_abundance, na.action = na.exclude)))})
However, I keep getting this error:
Error in eval(predvars, data, env) : object 'Acetatifactor' not found
I have checked and Acetatifactor is in the Genus_abundance dataframe.
Bit stuck about where it's going wrong.
EDIT:
Added a working example:
set.seed(43)
n <- 6
dat <- data.frame(id=1:n, Status=rep(LETTERS[1:2], n/2), age= sample(18:90, n, replace=TRUE), match=1:n, Acetatifactor=runif(n), Acutalibacter=runif(n), Adlercreutzia=runif(n))
head(dat)
id Status age match Acetatifactor Acutalibacter Adlercreutzia
1 1 A 49 1 0.1861022 0.1364904 0.8626298
2 2 B 31 2 0.7297301 0.8246794 0.3169752
3 3 A 23 3 0.4118721 0.5923042 0.2592606
4 4 B 64 4 0.4140497 0.7943970 0.7422665
5 5 A 60 5 0.4803101 0.7690324 0.7473611
6 6 B 79 6 0.4274945 0.9180564 0.9179040
lapply(varlist,
function(x){lmer(substitute(i ~ status + (1|match), list(i=as.name(x))),
data=dd)
})
The specific problem here is misplaced parentheses. You should close the substitute(..., list(i=as.name(x))) with three close-parentheses, so that the whole chunk is properly understood as the first argument to lmer() and data and na.action are passed to lmer() rather than to list().
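To see what the corrected substitute() call produces on its own, without fitting anything, here is a base-R sketch using the names from the question. In the original call, data and na.action ended up inside list(), where they are only substitution values, so lmer() never received the data and looked for Acetatifactor elsewhere:

```r
# Response name as it would arrive from varlist (taken from the question)
x <- "Acetatifactor"

# Corrected call: substitute() is closed right after list(i = as.name(x)),
# so only the symbol substitution happens inside it
f <- substitute(i ~ Status + (1|Match), list(i = as.name(x)))
print(f)

# In the model call, data and na.action then follow as arguments of lmer():
# lmer(substitute(i ~ Status + (1|Match), list(i = as.name(x))),
#      data = Genus_abundance, na.action = na.exclude)
```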
More generally, I agree with @Kat in the comments that this is a good place to look. Since your variable names are already strings (not symbols), you don't really need all of the substitute() business and could use:
fit_fun <- function(v) {
lmer(reformulate(c("status", "(1|match)"), response = v),
data = dd, na.action = na.exclude)
}
lapply(varlist, fit_fun)
Or you could fit the model to the first response column and then use refit() to update that fit with each of the remaining columns as the new response. For large models this is much more efficient.
m1 <- lmer(resp1 ~ status + (1|match), ...)
m_other <- lapply(dd[-(1:3)], refit, object = m1)
c(list(m1), m_other)
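The reformulate() looping pattern can be tried without lme4 by swapping in lm(). A sketch on made-up data, where the fixed effect factor(match) merely stands in for the (1|match) random intercept; this is a different model, used only to show how the per-column formulas are built and fitted:

```r
# Hypothetical data: a status factor, a grouping variable and two responses
set.seed(43)
dd <- data.frame(status = rep(c("A", "B"), 10),
                 match  = rep(1:5, each = 4),
                 resp1  = runif(20),
                 resp2  = runif(20))
varlist <- c("resp1", "resp2")

fit_fun <- function(v) {
  # e.g. v = "resp1" builds resp1 ~ status + factor(match)
  lm(reformulate(c("status", "factor(match)"), response = v),
     data = dd, na.action = na.exclude)
}

fits <- lapply(varlist, fit_fun)
names(fits) <- varlist
```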
I'm trying to extract a value from a list (the fitted object), but I cannot find it using str(list).
This is my code:
install.packages("meta")
library(meta)
m1 <- metacor(c(0.85, 0.7, 0.95), c(20, 40, 10))
m1
COR 95%-CI %W(fixed) %W(random)
1 0.8500 [0.6532; 0.9392] 27.9 34.5
2 0.7000 [0.4968; 0.8304] 60.7 41.7
3 0.9500 [0.7972; 0.9884] 11.5 23.7
Number of studies combined: k = 3
COR 95%-CI z p-value
Fixed effect model 0.7955 [0.6834; 0.8710] 8.48 < 0.0001
Random effects model 0.8427 [0.6264; 0.9385] 4.87 < 0.0001
How could I save the COR (= 0.8427) or the p-value (< 0.0001) for the random effects model as a single value?
It seems that the number you are looking for (COR 0.8427) is created in print.meta. The function is too big, though, so I gave up trying to pinpoint exactly where it gets calculated and what name it has. I don't think it is even saved within the function, just printed.
Anyway I took the alternative road of capturing the output:
# capture the output of the summary: the fifth line gives us what we want
# (the line position may differ between versions of meta)
out <- capture.output(summary(m1))[5]
# extract all the numbers and return the first
unlist(regmatches(out, gregexpr("[[:digit:]]+\\.?[[:digit:]]*", out)))[1]
#[1] "0.8427"
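Rather than parsing printed output, the printed numbers can also be recomputed. The sketch below assumes that metacor() pools on Fisher's z scale with inverse-variance weights n - 3 and uses the DerSimonian-Laird estimator for tau^2; the %W columns in the output above are consistent with this, but newer meta versions may default to a different tau^2 estimator. It uses base R only:

```r
# Inputs from the question: correlations and sample sizes
cor_i <- c(0.85, 0.70, 0.95)
n_i   <- c(20, 40, 10)

z_i <- atanh(cor_i)   # Fisher's z transform
w_i <- n_i - 3        # inverse-variance weights, since var(z) = 1/(n - 3)

# Fixed effect estimate on the z scale
z_fixed <- sum(w_i * z_i) / sum(w_i)

# DerSimonian-Laird between-study variance tau^2
Q    <- sum(w_i * (z_i - z_fixed)^2)
C    <- sum(w_i) - sum(w_i^2) / sum(w_i)
tau2 <- max(0, (Q - (length(z_i) - 1)) / C)

# Random effects estimate, its z statistic and two-sided p-value
w_r         <- 1 / (1 / w_i + tau2)
z_random    <- sum(w_r * z_i) / sum(w_r)
stat_random <- z_random * sqrt(sum(w_r))
p_random    <- 2 * pnorm(-abs(stat_random))

# Back-transform to the correlation scale
cor_fixed  <- tanh(z_fixed)
cor_random <- tanh(z_random)
round(c(cor_fixed = cor_fixed, cor_random = cor_random), 4)  # 0.7955 and 0.8427
```

If these assumptions hold for your version of meta, the fitted object should hold the pooled z value and p-value directly (in the versions I know of, as TE.random and pval.random, so tanh(m1$TE.random) would give 0.8427), but check names(m1) first.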
I assume your problem is accessing the elements of the object.
The $ operator will help you with that: type the variable name followed by $ and press Tab, and the components of that object will be listed. For your question, the values would be:
> m1$cor[1]
[1] 0.85
> mysummary<-summary(m1)
> mysummary$fixed$p
[1] 2.163813e-17
> mysummary$fixed$z
[1] 8.484643
> ifelse(mysummary$fixed$p<0.0001, "<0.0001", "WHATEVER")
[1] "<0.0001"
To select a specific one, you can use [i], where i is an integer (e.g. i = 1 for 0.85).
To get "< 0.0001", I suggest applying an ifelse() statement to the p-value (or z) with the corresponding rule. Cheers!
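The ifelse() rule can be wrapped in a small helper (fmt_p is a hypothetical name) so it works for any p-value, fixed or random effects. A base-R sketch; the second input value is made up for comparison:

```r
# Returns "< 0.0001" for tiny p-values, otherwise the value with 4 decimals
fmt_p <- function(p) {
  ifelse(p < 1e-4, "< 0.0001", formatC(p, format = "f", digits = 4))
}

fmt_p(2.163813e-17)  # fixed effect p-value from the output above
fmt_p(0.0312)        # a made-up larger p-value

# Base R's format.pval() applies a similar threshold rule
format.pval(2.163813e-17, eps = 1e-4)
```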
I am trying to use the R mnnCorrect function (from the scran package). It requires at least 2 input matrices to work.
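# install package (biocLite() is deprecated; current Bioconductor releases use BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("scran")
library(scran)  # newer Bioconductor releases provide mnnCorrect() in the batchelor package instead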
# example matrix 1
B1 <- matrix(rnorm(10000), ncol=50)
# example matrix 2
B2 <- matrix(rnorm(10000), ncol=50)
# function below works fine
out <- mnnCorrect(B1, B2)
However, I am trying to supply these matrices as a list like so (more convenient for automating the process with a variable number of matrices):
mat_list=list()
mat_list[["Mat1"]]=B1
mat_list[["Mat2"]]=B2
str(mat_list)
List of 2
$ Mat1: num [1:200, 1:50] 1.107 -0.828 1.559 -1.353 0.667 ...
$ Mat2: num [1:200, 1:50] -0.231 0.894 0.369 1.606 -1.346 ...
# This works fine
out <- mnnCorrect(mat_list$Mat1, mat_list$Mat2)
# These do not work
out <- mnnCorrect(mat_list)
Error in mnnCorrect(mat_list) : at least two batches must be specified
out <- mnnCorrect(cat(paste(gsub("^","mat_list$",names(mat_list)),collapse=", ")))
Error in mnnCorrect(mat_list) : at least two batches must be specified
out <- mnnCorrect(capture.output(cat(paste(gsub("^","mat_list$",names(mat_list)),collapse=", "))))
Error in mnnCorrect(mat_list) : at least two batches must be specified
library(dplyr)
cat(paste(gsub("^","mat_list$",names(mat_list)),collapse=", ")) %>% mnnCorrect(.)
mat_list$Mat1, mat_list$Mat2Error in mnnCorrect(.) : at least two batches must be specified
Is there a way to achieve this?
In R, you use the function do.call() for that: it calls a function with the elements of a list as its arguments. Here is an example:
do.call(mnnCorrect, mat_list)
See also the help page ?do.call .
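The difference between passing the list itself and passing its elements can be seen with a toy function. n_batches is a hypothetical stand-in for a function like mnnCorrect() that takes its batches through `...`; it only counts how many it received:

```r
# Hypothetical stand-in: counts the arguments passed through `...`
n_batches <- function(...) length(list(...))

mat_list <- list(Mat1 = matrix(rnorm(20), ncol = 2),
                 Mat2 = matrix(rnorm(20), ncol = 2),
                 Mat3 = matrix(rnorm(20), ncol = 2))

n_batches(mat_list)           # 1: the whole list is a single argument
do.call(n_batches, mat_list)  # 3: one argument per matrix
```

do.call() turns the list names (Mat1, Mat2, ...) into argument names. That is harmless when they end up in `...`, but an element whose name matches one of the function's named arguments would be matched to that argument instead, so do.call(mnnCorrect, unname(mat_list)) is the cautious variant.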
I am trying to compute some summary statistics on a large data set where the groups are defined by Entry + Plant. I am using the summaryBy() function, and it appears to be working fine for most of my variables. However, it is transforming one of my variables (YieldPlant) in some unknown way and calculating incorrect means and standard deviations. Here is some sample output:
> library(doBy)
> SP.data <- read.csv("~/Desktop/2014 Summer Research/Within-Line Variation Trial/2014 Heirloom Variation Trial.csv", na.string = c("NA"))
> head(SP.data$YieldPlant, n=10)
[1] NA NA NA NA 16.16 18.58 11.2 10.95 11.61 13.94
> summaryTRAITS <- summaryBy(YieldPlant ~ Entry + Plant, data=SP.data, FUN = function(Plant) { c(m=mean(Plant, na.rm=T), s=sd(Plant, na.rm=T))})
> head(summaryTRAITS$YieldPlant.m, n=10)
[1] NaN 307.8571 444.0000 364.0000 179.5714 354.2857 592.1429 521.3333 729.8571 322.4286
The YieldPlant means should be much smaller than what R is reporting. I'd appreciate any help you all can offer. Thanks!
Hannah
I am attempting to carry out lasso regression using the lars package, but I cannot seem to get the lars() call to work. I have entered this code:
diabetes<-read.table("diabetes.txt", header=TRUE)
diabetes
library(lars)
diabetes.lasso = lars(diabetes$x, diabetes$y, type = "lasso")
However, I get this error message:
Error in rep(1, n) : invalid 'times' argument.
I have tried entering it like this:
diabetes<-read.table("diabetes.txt", header=TRUE)
library(lars)
data(diabetes)
diabetes.lasso = lars(age+sex+bmi+map+td+ldl+hdl+tch+ltg+glu, y, type = "lasso")
But then I get the error message:
Error in lars(age+sex + bmi + map + td + ldl + hdl + tch + ltg + glu, y, type = "lasso") :
object 'age' not found
Where am I going wrong?
EDIT: The data look like the following, but with another 5 columns:
ldl hdl tch ltg glu
1 -0.034820763 -0.043400846 -0.002592262 0.019908421 -0.017646125
2 -0.019163340 0.074411564 -0.039493383 -0.068329744 -0.092204050
3 -0.034194466 -0.032355932 -0.002592262 0.002863771 -0.025930339
4 0.024990593 -0.036037570 0.034308859 0.022692023 -0.009361911
5 0.015596140 0.008142084 -0.002592262 -0.031991445 -0.046640874
I think some of the confusion may have to do with the fact that the diabetes data set that comes with the lars package has an unusual structure.
library(lars)
data(diabetes)
sapply(diabetes,class)
## x y x2
## "AsIs" "numeric" "AsIs"
sapply(diabetes,dim)
## $x
## [1] 442 10
##
## $y
## NULL
##
## $x2
## [1] 442 64
In other words, diabetes is a data frame containing "columns" which are themselves matrices. In this case, with(diabetes,lars(x,y,type="lasso")) or lars(diabetes$x,diabetes$y,type="lasso") work fine. (But just lars(x,y,type="lasso") won't, because R doesn't know to look for the x and y variables within the diabetes data frame.)
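To see what such a structure looks like, here is a base-R sketch that builds a tiny data frame the same way (made-up data and column names; lars itself is not called):

```r
# y is an ordinary column; x is a whole matrix stored as one "column"
set.seed(3)
xmat <- matrix(rnorm(30), nrow = 10,
               dimnames = list(NULL, c("age", "sex", "bmi")))
d <- data.frame(y = rnorm(10))
d$x <- I(xmat)   # I() keeps the matrix intact as a single column

names(d)         # "y" "x"
dim(d$x)         # 10 3

# with() looks up x and y inside d, as in with(diabetes, lars(x, y, ...))
with(d, dim(x))
```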
However, if you are reading in your own data, you'll have to separate the response variable and the predictor matrix yourself, something like
X <- as.matrix(mydiabetes[, names(mydiabetes) != "y"])  # all columns except the response y
mydiabetes.lasso = lars(X, mydiabetes$y, type = "lasso")
Or you might be able to use
X <- model.matrix(y ~ ., data = mydiabetes)[, -1]  # drop the intercept column; lars() fits its own intercept
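A small check that both ways of building the predictor matrix agree once the intercept column is dropped. The data frame is a made-up stand-in for data read with read.table(); note the comma placement, since mydiabetes[cond, ] would select rows rather than columns:

```r
# Hypothetical stand-in: a response y plus three numeric predictors
set.seed(4)
mydiabetes <- data.frame(y   = rnorm(8),
                         age = rnorm(8),
                         bmi = rnorm(8),
                         ldl = rnorm(8))

# Keep every column except y (the condition goes after the comma)
X1 <- as.matrix(mydiabetes[, names(mydiabetes) != "y"])

# model.matrix() adds an "(Intercept)" column; [, -1] removes it
X2 <- model.matrix(y ~ ., data = mydiabetes)[, -1]

dim(X1)                                           # 8 3
all.equal(X1, X2, check.attributes = FALSE)       # TRUE
```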
lars::lars does not appear to have a formula interface, which means you cannot refer to the columns by name in a formula (and furthermore it does not accept a data= argument). For more information on this and other "data mining" topics, you might want to get a copy of the classic text "The Elements of Statistical Learning". Try this:
# this obviously assumes require(lars) and data(diabetes) have been executed.
> diabetes.lasso = with( diabetes, lars(x, y, type = "lasso"))
> summary(diabetes.lasso)
LARS/LASSO
Call: lars(x = x, y = y, type = "lasso")
Df Rss Cp
0 1 2621009 453.7263
1 2 2510465 418.0322
2 3 1700369 143.8012
3 4 1527165 86.7411
4 5 1365734 33.6957
5 6 1324118 21.5052
6 7 1308932 18.3270
7 8 1275355 8.8775
8 9 1270233 9.1311
9 10 1269390 10.8435
10 11 1264977 11.3390
11 10 1264765 9.2668
12 11 1263983 11.0000