The function should perform as follows: The function takes the
arguments: x1, x2, alt = "two-sided", lev = 0.95, where the equality
indicates the default value.
•The arguments x1 and x2 are the X1 and X2 samples, respectively.
•The argument alt is the alternative hypothesis whose two other
possible values are "greater" and "less".
•The argument lev is the confidence level 1 − α.
ii. The function returns an R list containing the test statistic,
p-value, confidence level, and confidence interval.
iii. Inside the function, two Shapiro-Wilk tests of normality are
conducted separately for the two samples (note the normality
assumption at the beginning of the problem). If one or both p-values
are less than 0.05, a warning message is printed out explaining the
situation.
Here is what I have come up with so far, but I am not sure how to create one function that runs both:
library(stats)
x1 <- c(103, 94, 110, 87, 98, 102, 86, 98, 109, 92)
x2 <- c(97, 82, 123, 92, 175, 88, 118, 81, 165, 97, 134, 92, 87, 114)
var.test(x1, x2, alternative = "two.sided", conf.level = 0.95)
shapiro.test(x1)$p.value < 0.05 | shapiro.test(x2)$p.value < 0.05
Some hints:
Your task is to write a function, so you should have something like this:
my_function <- function(x1, x2, alt = "two-sided", lev = 0.95){
# fill in the body of the function here
}
You can do whatever you need to do in the body of the function.
Recall that in R, the last evaluated line of a function is automatically its returned value. So, you might choose to have your last line be list(...) as described in the problem statement.
It will be useful to store results of tests, etc. as variables inside your function, e.g. test_output_1 <- ... so that you can reference those things later in the body of your function.
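Putting the hints together, one possible shape for the function is sketched below. It assumes the test in question is the F test for equal variances, since that is what the var.test() call in the attempt runs. The function name var_test_checked and the names of the list elements are made up for illustration. Because var.test() spells the two-sided alternative "two.sided", the sketch translates the assignment's "two-sided" before passing it on.

```r
# Sketch only: assumes the underlying test is var.test() (F test of equal variances).
# The name var_test_checked and the list element names are illustrative choices.
var_test_checked <- function(x1, x2, alt = "two-sided", lev = 0.95) {
  # Accept the assignment's spelling and map it to var.test()'s "two.sided"
  alt <- match.arg(alt, c("two-sided", "greater", "less"))
  alternative <- if (alt == "two-sided") "two.sided" else alt

  # Shapiro-Wilk normality check, run separately on each sample
  sw1 <- shapiro.test(x1)
  sw2 <- shapiro.test(x2)
  if (sw1$p.value < 0.05 || sw2$p.value < 0.05) {
    warning("Shapiro-Wilk p-value below 0.05 for at least one sample; ",
            "the normality assumption may not hold (p1 = ",
            signif(sw1$p.value, 3), ", p2 = ", signif(sw2$p.value, 3), ").")
  }

  # Store the test output so its pieces can be reused in the returned list
  ft <- var.test(x1, x2, alternative = alternative, conf.level = lev)

  # The last evaluated expression is the function's return value
  list(statistic  = unname(ft$statistic),
       p.value    = ft$p.value,
       conf.level = lev,
       conf.int   = as.numeric(ft$conf.int))
}

x1 <- c(103, 94, 110, 87, 98, 102, 86, 98, 109, 92)
x2 <- c(97, 82, 123, 92, 175, 88, 118, 81, 165, 97, 134, 92, 87, 114)
res <- var_test_checked(x1, x2)
```

Using warning() rather than print() keeps the message separate from the returned list, so the caller can still capture the result with res <- var_test_checked(x1, x2).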
I'm attempting to estimate survival probabilities using the survfit function from the survival package. My dataset consists of animals that were captured at various times over the course of ~2 years. Some animals died, some animals were censored after capture, and some animals lived beyond the end of the study (I'm guessing this means I have left-, right-, and interval-censored data).
I can estimate survival probability using right censors only, but this assumes all animals were captured on the same day and does not account for adding new animals through time. What I would like to do is estimate survival as a function of calendar day and not as a function of time since capture.
Example data:
time1 <- c(2, 386, 0, 1, 384, 3, 61, 33, 385, 64)
time2 <- c(366, 665, 285, 665, 665, 454, 279, 254, 665, 665)
censor <- c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
region <- c(1, 6, 1, 6, 5, 1, 1, 1, 5, 6)
m1<- data.frame(time1, time2, censor, region)
code:
km.2 <- survfit(Surv(m1$time1, m1$time2, m1$censor, type = "interval") ~ m1$region)
Note that the code above runs but doesn't estimate what I described. I hope this is a matter of specifying certain arguments to the survfit function, but this is where I am lost. Thanks for the help.
Not sure if you've figured this out by now since it was nearly a year ago. I'm a bit confused by the experiment you're explaining.
However, one item that pops out immediately is "time1". I believe you can't have any times start or end at 0. I recommend adding 0.5 or 1 to that specific time observation and explaining why in your write-up. Having a 0 value is a likely culprit for why it's not estimating properly.
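One possible reading of the question is that animals enter the risk set on their capture day rather than on day 0 (delayed entry, also called left truncation). If that is the case, the counting-process form Surv(start, stop, event) from the survival package may be closer to what was asked: with both times measured in days since the start of the study, the curve is on the calendar-day scale and each animal counts as at risk only between capture and exit. The sketch below reuses time1 and time2 from the question. The died indicator is invented purely for illustration, because the posted censor column does not say which animals died.

```r
library(survival)

entry <- c(2, 386, 0, 1, 384, 3, 61, 33, 385, 64)            # time1: capture day
exit  <- c(366, 665, 285, 665, 665, 454, 279, 254, 665, 665) # time2: last day observed

# HYPOTHETICAL event indicator (1 = died, 0 = censored); not from the original post
died <- c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0)

m1 <- data.frame(entry, exit, died)

# Counting-process form: an animal is at risk only while entry < day <= exit
km_cal <- survfit(Surv(entry, exit, died) ~ 1, data = m1)
summary(km_cal)
```

Replacing ~ 1 with ~ region (after adding that column to the data frame) would give one curve per region, as in the original call.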
I am trying to follow along the Factor Analysis chapter in "Using Multivariate Statistics", by Tabachnick and Fidell.
The data, and my steps, are as follows:
# data
dat.ski <- data.frame(skiers = paste0("S", 1:5), cost = c(32, 61, 59, 36, 62), lift = c(64, 37, 40, 62, 46), depth = c(65, 62, 45, 34, 43), powder = c(67, 65, 43, 35, 40))
# correlation matrix
cor.ski <- cor(dplyr::select(dat.ski, -skiers))
# eigenvalues and eigenvectors
eig.ski <- eigen(cor.ski)
The correlation matrix and eigenvalues (2.02, 1.94, 0.04 and 0.00) correspond to that in the book. The first two eigenvectors I have are (.352, -0.251, -0.626, -0.647) and (.514, -.664, .322, .280).
However, the book then says that only the first two eigenvalues are retained and the "factor analysis is re-run", which results in the following two eigenvalues*: 2.00, 1.91 and eigenvectors (-0.283, 0.177, 0.568, 0.675) and (0.651, -0.685, 0.252, 0.207). I can't work out how to reproduce these eigenvectors... If I run psych::fa(cor.ski, nfactors=2, fm="pa"), the SS loadings correspond to the new eigenvalues*.
Any help on how to return the eigenvectors as per the text will be greatly appreciated.
Thanks.
I worked this out by remembering that R is a visible language! By looking at the definition of psych::fac, I see that the authors have actually performed 7 iterations of factor analysis, not merely "taken the first two eigenvectors and rerun FA". I also finally understand how factor analysis is performed and can tie it in with the subsequent text, which in a nutshell is:
Starting with the correlation matrix (R) and assuming k factors are used:
Get the eigenvalues (L) and eigenvectors (V) of the correlation matrix R
Calculate C = sum(diag(R))
Calculate the loadings, A = V[,1:k] * sqrt(L[1:k]), i.e. each retained eigenvector scaled by the square root of its eigenvalue (eqn 13.6 of text)
Set R* = AA' (eqn 13.5 of text, R = AA')
Set C* = sum(diag(R*))
Update diag(R) = diag(R*)
Repeat the above steps until the maximum number of iterations is reached, or until e = abs(C - C*) is smaller than some threshold
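The steps above can be sketched in a few lines of R. This is a minimal illustration of the loop as described, not the psych package's implementation, and it is not guaranteed to reproduce the book's numbers (the starting diagonal, iteration count and tolerance are all assumptions here). The function name paf_sketch and its arguments are made up. Note that the scaling is written as a matrix product with diag(sqrt(L)), since a plain element-wise * would recycle the eigenvalues down the rows instead of across the columns.

```r
# Ski data from the question
dat <- data.frame(cost   = c(32, 61, 59, 36, 62),
                  lift   = c(64, 37, 40, 62, 46),
                  depth  = c(65, 62, 45, 34, 43),
                  powder = c(67, 65, 43, 35, 40))
R <- cor(dat)

# Hypothetical helper: iterated principal-axis loop following the listed steps
paf_sketch <- function(R, k, max_iter = 50, tol = 1e-3) {
  R_work <- R
  for (i in seq_len(max_iter)) {
    C <- sum(diag(R_work))                       # current total communality
    e <- eigen(R_work, symmetric = TRUE)
    L <- pmax(e$values[1:k], 0)                  # guard against tiny negative eigenvalues
    A <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(L), k)  # loadings
    R_star <- A %*% t(A)                         # reproduced correlation matrix
    C_star <- sum(diag(R_star))
    diag(R_work) <- diag(R_star)                 # update communalities on the diagonal
    if (abs(C - C_star) < tol) break
  }
  list(loadings = A, communalities = diag(R_star), iterations = i)
}

fit <- paf_sketch(R, k = 2)
colSums(fit$loadings^2)   # sums of squared loadings per factor
```

The column sums of squared loadings are the quantities to compare against the "SS loadings" reported by psych::fa.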
My non-linear model is the following:
fhw <- data.frame(
time=c(10800, 10810, 10820, 10830, 10840, 10850, 10860, 10870, 10880, 10890),
water=c( 105, 103, 103, 104, 107, 109, 112, 113, 113, 112)
)
nl <- nls(formula = water ~ cbind(1, poly(time, 4),
                                  sin(omega_1 * time + phi_1),
                                  sin(omega_2 * time + phi_2),
                                  sin(omega_3 * time + phi_3)),
          data = fhw,
          start = list(omega_1 = (2 * pi) / 545, omega_2 = (2 * pi) / 205,
                       omega_3 = (2 * pi) / 85, phi_1 = pi, phi_2 = pi, phi_3 = pi),
          algorithm = "plinear", control = list(maxiter = 1000))
Time is between 10800 and 17220, but I want to predict ahead. Using the function predict like this:
predict(nl,data.frame(time=17220:17520))
gives wrong results, since the first value it returns is completely different from the last value returned by predict(nl). I think the problem has something to do with poly, but I'm not sure. Furthermore, predicting at a single time point gives the error: 'degree' must be less than number of unique points. Can anybody help?
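The suspicion about poly can be checked in isolation. As far as I know, predict() for an nls fit simply re-evaluates the formula on the new data, so poly(time, 4) would be rebuilt from the new time points alone rather than from the basis used during fitting. The sketch below, which uses degree 2 and a few made-up future time points to stay small, shows that a rebuilt basis differs from the original basis evaluated at the same points, and that a single point triggers the error quoted above.

```r
time <- seq(10800, 10890, by = 10)      # time values from the fhw data frame
new_time <- c(10900, 10910, 10920)      # hypothetical future time points

basis <- poly(time, 2)                  # orthogonal basis built from the fitting data

rebuilt <- poly(new_time, 2)            # basis recomputed from the new points only
carried <- predict(basis, new_time)     # original basis evaluated at the new points

# The two disagree, so coefficients fitted on one do not apply to the other
max(abs(as.vector(rebuilt) - as.vector(carried)))

# A single new point cannot support a degree-4 orthogonal basis at all
single <- tryCatch(poly(10900, 4), error = function(e) conditionMessage(e))
single
```

If this is indeed the cause, two options that might be worth trying are poly(time, 4, raw = TRUE) in the formula, or building the basis once outside nls and applying predict(basis, new_time) to new data.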
I have installed the mixdist package in R to combine distributions. Specifically, I'm using the mix() function. See documentation.
Basically, I'm getting
Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, :
missing value in parameter
I googled the error message, but no useful results popped up.
My first argument to mix() is a data frame called data.df. It is formatted exactly like the built-in data set pike65. I also did data.df <- as.mixdata(data.df).
My second argument has two rows. It is a data frame called datapar, formatted exactly like pikepar. My pi values are 0.5 and 0.5. My mu values are 250 and 463 (based on my data set). My sigma values are 0.5 and 1.
My call to mix() looks like:
fitdata <- mix(data.df, datapar, "norm", constr = mixconstr(consigma="CCV"), emsteps = 3, print.level = 2)
The printing shows that my pi values go from 0.5 to NaN after the first iteration, and that my gradient is becoming 0.
I would appreciate any help in sorting out this error.
Thanks,
n.i.
Using the test data you linked to
library(mixdist)
time <- seq(673,723)
counts <-c(3,12,8,12,18,24,39,48,64,88,101,132,198,253,331,
419,563,781,1134,1423,1842,2505,374,6099,9343,13009,
15097,13712,9969,6785,4742,3626,3794,4737,5494,5656,4806,
3474,2165,1290,799,431,213,137,66,57,41,35,27,27,27)
data.df <- data.frame(time=time, counts=counts)
We can see that
data.mix <- as.mixdata(data.df)
startparam <- mixparam(c(699, 707), 1)
data.fit <- mix(data.mix, startparam, "norm")
gives the same error. This error appears to be closely tied to the data (so the reason this data does not work could differ from the reason yours does not, but this is the only example you offered).
The problem with this data is that the probability between the two groups becomes indistinguishable at some point. When that happens, the "E" step of the algorithm cannot estimate the pi variable properly. Here
pnorm(717,707,1)
# [1] 1
pnorm(717,699,1)
# [1] 1
both are exactly 1 and this seems to be causing the error. When mix takes 1 minus this value and compares the ratio to estimate group, it gets NaN values which are propagated to the estimate of proportions. When internally these NaN values are passed to nlm() to do the estimation, you get the error message
Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, :
missing value in parameter
The same error message can be replicated with
f <- function(x) sum((x-1:length(x))^2)
nlm(f, c(10,10))
nlm(f, c(10,NaN)) #error
So it appears the mixdist package will not work in this scenario. You may wish to contact the package maintainer to see if they are aware of the problem. In the meantime, you will need to find another way to estimate the parameters of your mixture model.
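The underflow described above can be illustrated without the package. This is only a sketch of the arithmetic, not mixdist's actual internals: the variable names are made up, and the ratio is a stand-in for whatever the E step computes. It also shows that the same ratio is finite for a larger sigma (5 here, chosen arbitrarily).

```r
# Upper-tail mass beyond 717 for each component with sigma = 1
tail_a <- 1 - pnorm(717, mean = 699, sd = 1)
tail_b <- 1 - pnorm(717, mean = 707, sd = 1)

# Both tails are exactly 0 in double precision, so the share is 0 / 0
share_a <- tail_a / (tail_a + tail_b)
is.nan(share_a)

# With a wider sigma the tails no longer underflow and the share is finite
wide_a <- 1 - pnorm(717, mean = 699, sd = 5)
wide_b <- 1 - pnorm(717, mean = 707, sd = 5)
share_wide <- wide_a / (wide_a + wide_b)
share_wide
```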
Now, I am not an expert in mixture distributions, but I think @MrFlick's accepted answer is a little misleading for anyone googling the error message (although no doubt correct for the example he gave). The core problem is that in both your linked code and your example, the sigma values are very small compared to the mu values. I think that the algorithm just cannot manage to find a solution with such small starting sigma values. If you increase the sigma values, you will get a solution. Linked code as an example:
library(mixdist)
time <- seq(673,723)
counts <- c(3, 12, 8, 12, 18, 24, 39, 48, 64, 88, 101, 132, 198, 253, 331, 419, 563, 781, 1134, 1423, 1842, 2505, 374, 6099, 9343, 13009, 15097, 13712, 9969, 6785, 4742, 3626, 3794, 4737, 5494, 5656, 4806, 3474, 2165, 1290, 799, 431, 213, 137, 66, 57, 41, 35, 27, 27, 27)
data.df <- data.frame(time=time, counts=counts)
data.mix <- as.mixdata(data.df)
startparam <- mixparam(mu = c(699,707), sigma = 1)
data.fit <- mix(data.mix, startparam, "norm") ## Leads to the error message
startparam <- mixparam(mu = c(699,707), sigma = 5) # Adjust start parameters
data.fit <- mix(data.mix, startparam, "norm")
plot(data.fit)
data.fit ### Estimates somewhat reasonable mixture distributions
# Parameters:
# pi mu sigma
# 1 0.853 699.3 4.494
# 2 0.147 708.6 2.217
Bottom line: if you can increase the sigma values in your start parameters, the mix function might find reasonable estimates for you. You do not necessarily have to try another package.
In addition, you can get this message if you have missing data in your dataset.
Using the package's example data set:
data(pike65)
data(pikepar)
pike65$freq[10] <- NA
fitpike1 <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, :
missing value in parameter
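A quick check for missing values before calling mix() can rule this cause out. The sketch below uses a small made-up data frame as a stand-in for grouped data like pike65 (the values are hypothetical) and relies only on base R.

```r
# HYPOTHETICAL grouped data with one missing frequency
grouped <- data.frame(length = c(19.75, 21.75, 23.75, 25.75),
                      freq   = c(4, 10, NA, 21))

# Locate rows that contain any missing value
incomplete <- which(!complete.cases(grouped))
if (length(incomplete) > 0) {
  message("Rows with missing values: ", paste(incomplete, collapse = ", "))
}

# One option is to drop those rows before fitting
cleaned <- grouped[complete.cases(grouped), ]
```

Dropping a row from grouped data changes the bin structure, so whether to remove it, merge it with a neighbouring bin, or recover the missing count is a judgment call that depends on the data.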