The function should perform as follows:
i. The function takes the arguments x1, x2, alt = "two-sided", lev = 0.95, where the equals sign indicates the default value.
•The arguments x1 and x2 are the X1 and X2 samples, respectively.
•The argument alt is the alternative hypothesis; its two other possible values are "greater" and "less".
•The argument lev is the confidence level 1 − α.
ii. The function returns an R list containing the test statistic, p-value, confidence level, and confidence interval.
iii. Inside the function, two Shapiro-Wilk tests of normality are conducted separately for the two samples (note the normality assumption at the beginning of the problem). If one or both p-values are less than 0.05, a warning message is printed explaining the situation.
Here is what I have come up with so far, but I am not sure how to combine it all into one function:
library(stats)
x1 <- c(103, 94, 110, 87, 98, 102, 86, 98, 109, 92)
x2 <- c(97, 82, 123, 92, 175, 88, 118, 81, 165, 97, 134, 92, 87, 114)
var.test(x1, x2, alternative = "two.sided", conf.level = 0.95)
shapiro.test(x1)$p.value < 0.05 | shapiro.test(x2)$p.value < 0.05
Some hints:
Your task is to write a function, so you should have something like this:
my_function <- function(x1, x2, alt = "two-sided", level = 0.95){
# fill in the body of the function here
}
You can do whatever you need to do in the body of the function.
Recall that in R, the last evaluated line of a function is automatically its returned value. So, you might choose to have your last line be list(...) as described in the problem statement.
It will be useful to store results of tests, etc. as variables inside your function, e.g. test_output_1 <- ... so that you can reference those things later in the body of your function.
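Putting the hints together, one way to assemble the pieces into a single function might look like the sketch below. The use of var.test() follows your draft; the function name my_var_test and the internal variable names are just illustrative:

```r
my_var_test <- function(x1, x2, alt = "two-sided", lev = 0.95) {
  # iii. Shapiro-Wilk normality checks, one per sample
  sw1 <- shapiro.test(x1)
  sw2 <- shapiro.test(x2)
  if (sw1$p.value < 0.05 || sw2$p.value < 0.05) {
    warning("At least one Shapiro-Wilk p-value is below 0.05; ",
            "the normality assumption may not hold.")
  }

  # var.test() spells the two-sided alternative "two.sided",
  # so translate the "two-sided" default from the problem statement
  alternative <- if (alt == "two-sided") "two.sided" else alt
  vt <- var.test(x1, x2, alternative = alternative, conf.level = lev)

  # ii. the last evaluated expression is the returned value
  list(statistic  = vt$statistic,
       p.value    = vt$p.value,
       conf.level = lev,
       conf.int   = vt$conf.int)
}
```

You would then call it as my_var_test(x1, x2) with your two samples, or pass alt and lev explicitly to override the defaults.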
I have generated two survival curves (Kaplan-Meier estimates) using the function survfit from the R survival package, with a survival object of the form Surv(time_1, time_2, event) and the formula Surv(time_1, time_2, event) ~ gender.
I would like to perform a statistical test of equality of the two resulting survival curves.
Unfortunately, such a survival object is not accepted by survdiff. It only accepts Surv(time_2, event), which gives different (and in my case wrong) results.
Is there a function which allows me to compare the two curves based on the results of survfit?
Here is the code to create sample data:
e  <- c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1)
t1 <- c(35, 35, 34, 35, 35, 35, 34, 35, 35, 35, 34, 35, 35, 35, 34, 35)
t2 <- c(36, 37, 37, 36, 36, 37, 35, 36, 36, 37, 37, 36, 36, 37, 35, 36)
g  <- c("F","F","F","F","F","F","F","F","M","M","M","M","M","M","M","M")
# build the data frame directly so the numeric columns stay numeric
# (cbind() would first coerce everything to character)
data_test <- data.frame(g = g, t1 = t1, t2 = t2, e = e)
# results differ
km  <- survfit(Surv(t1, t2, e) ~ g, data = data_test)
km2 <- survfit(Surv(t2, e) ~ g, data = data_test)
From what I gathered reading up a bit on the subject, the usual logrank test is not defined for interval-censored data. This explains why the survdiff function complains about right-censored data.
Nonetheless, there exist generalizations of the logrank test for interval-censored data. Some seem to be implemented in the interval package (described here).
I cannot really help you more, as I only work with right-censored data and have never needed these generalizations.
I hope that this helps you anyway.
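For example, the interval package provides ictest(), a generalized logrank test for interval-censored data. The sketch below is only an illustration: how your e, t1, t2 map onto left/right interval endpoints depends on your study design, and the encoding chosen here (event observed inside [t1, t2]; censored subjects known only to survive past t1) is an assumption:

```r
library(survival)
library(interval)  # install.packages("interval") if needed

# Each subject contributes an interval (left, right];
# right = Inf encodes right censoring in the "interval2" Surv type.
left  <- t1
right <- ifelse(e == 1, t2, Inf)

# Generalized logrank test of equality of the two curves by gender
ictest(Surv(left, right, type = "interval2") ~ g)
```

The test statistic and p-value printed by ictest() then play the role that survdiff's output would play for right-censored data.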
My non-linear model is the following:
fhw <- data.frame(
time=c(10800, 10810, 10820, 10830, 10840, 10850, 10860, 10870, 10880, 10890),
water=c( 105, 103, 103, 104, 107, 109, 112, 113, 113, 112)
)
nl <- nls(formula = water ~ cbind(1,poly(time,4),sin(omega_1*time+phi_1),
sin(omega_2*time+phi_2),
sin(omega_3*time+phi_3)), data = fhw,
start = list(omega_1=(2*pi)/545, omega_2=(2*pi)/205,
omega_3=(2*pi)/85, phi_1=pi, phi_2=pi, phi_3=pi),
algorithm = "plinear", control = list(maxiter = 1000))
Time runs between 10800 and 17220, but I want to predict ahead. Using the function predict like this:
predict(nl, data.frame(time = 17220:17520))
gives wrong results: the first value it returns is completely different from the last value returned by predict(nl). I think the problem has something to do with poly, but I'm not sure. Furthermore, predicting at a single time point gives the error: 'degree' must be less than number of unique points. Can anybody help?
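Your suspicion about poly() is plausible: poly() constructs an orthogonal polynomial basis from whatever data it is handed, so when predict() re-evaluates the model formula on new time values, the basis is recomputed from the new data rather than reusing the one the model was fitted with. That also explains the single-point error, since poly() needs more unique points than the degree. A small illustration of the mismatch (the time ranges below just mirror the ones in the question):

```r
# Two bases of the same degree built from different data:
b_fit <- poly(10800:17220, 4)
b_new <- poly(17220:17520, 4)

# The centering/scaling stored in the "coefs" attribute differs,
# so the basis columns are not comparable between the two calls:
attr(b_fit, "coefs")$alpha[1]
attr(b_new, "coefs")$alpha[1]

# To evaluate the *fitted* basis at new points, reuse its coefs
# via predict.poly() instead of calling poly() on the new data:
b_consistent <- predict(b_fit, newdata = 17220:17520)
```

One common workaround is to use poly(time, 4, raw = TRUE) in the model formula, since raw polynomials are evaluated pointwise and do not depend on the data they are computed from.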
I have basic knowledge of R, and I would like to know how to write the code for an exponential function in R:
F(X) = B(1 - e^(-A*X))
where A is the lambda (rate) parameter, B is a parameter representing the Y data, and X represents the X data below.
I need the exponential model to generate a curve that fits the data; for example:
X <- c(22, 44, 69, 94, 119, 145, 172, 199, 227, 255)
PS: the x-axis values are in millions.
Y <- c(1, 7, 8, 12, 12, 14, 14, 18, 19, 22)
These are the y-axis values.
Any idea how to write the code and fit this model to the data?
In R you can write an exponential function with exp(); in your case:
F <- B * (1 - exp(-A * X))
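To actually estimate A and B from the data, nls() can fit the model directly. The starting values below were just eyeballed from the data (B near the largest Y, A small), so treat them as a sketch rather than a recipe that will converge for any data:

```r
X <- c(22, 44, 69, 94, 119, 145, 172, 199, 227, 255)
Y <- c(1, 7, 8, 12, 12, 14, 14, 18, 19, 22)

# Fit F(X) = B * (1 - exp(-A*X)) by nonlinear least squares
fit <- nls(Y ~ B * (1 - exp(-A * X)),
           start = list(A = 0.01, B = 20))
summary(fit)

# Plot the data with the fitted curve
plot(X, Y)
lines(X, predict(fit))
```

coef(fit) then gives the estimated A and B, and predict(fit, data.frame(X = ...)) evaluates the fitted curve at new X values.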
data <- c(88, 84, 85, 85, 84, 85, 83, 85, 88, 89, 91, 99, 104, 112, 126, 138, 146, 151, 150, 148, 147, 149, 143, 132, 131, 139, 147, 150, 148, 145, 140, 134, 131, 131, 129, 126, 126, 132, 137, 140, 142, 150, 159, 167, 170, 171, 172, 172, 174, 175, 172, 172, 174, 174, 169, 165, 156, 142, 131, 121, 112, 104, 102, 99, 99, 95, 88, 84, 84, 87, 89, 88, 85, 86, 89, 91, 91, 94, 101, 110, 121, 135, 145, 149, 156, 165, 171, 175, 177, 182, 193, 204, 208, 210, 215, 222, 228, 226, 222, 220)
Why do the ARMA models acting on the first differences of the data differ from the corresponding ARIMA models?
for (p in 0:5) {
  for (q in 0:5) {
    # data.arma <- arima(diff(data), order = c(p, 0, q))
    data.arma <- arima(data, order = c(p, 1, q))
    cat("p =", p, ", q =", q, "AIC =", data.arma$aic, "\n")
  }
}
The same happens with Arima(data, c(5,1,4)) and Arima(diff(data), c(5,0,4)) from the forecast package. I can get the desired consistency with
auto.arima(diff(data), max.p = 5, max.q = 5, d = 0, approximation = FALSE, stepwise = FALSE, ic = "aic", trace = TRUE)
auto.arima(data, max.p = 5, max.q = 5, d = 1, approximation = FALSE, stepwise = FALSE, ic = "aic", trace = TRUE)
but it seems the holder of the minimum AIC estimate for these data was never considered by the algorithm behind auto.arima; hence the suboptimal choice of ARMA(3,0) instead of ARMA(5,4) acting on the first differences. A related question (how much the two AIC estimates should differ before one model is considered better than the other) has little to do with programming, but the smallest-AIC model should at least be considered/reported, even though 9 coefficients may be a bit too many for a forecast from 100 observations.
My R questions are:
1) Is there a vectorised version of the double loop, so that it runs faster?
2) Why does arima(5,1,4) acting on the data differ from arma(5,4) acting on the first differences of the data? Which one should be reported?
3) How do I sort the AIC output so that the smallest values come first?
Thanks.
There are a lot of questions and issues raised here. I'll try to respond to each of them.
Arima() is just a wrapper for arima(), so it will give the same model.
arima() handles a model with differencing by using a diffuse prior. That is not the same as just differencing the data before fitting the model. Consequently, you will get slightly different results from arima(x,order=c(p,1,q)) and arima(diff(x),order=c(p,0,q)).
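A quick way to see this on simulated data (not your series; the random walk below is just a convenient example where d = 1 is natural):

```r
set.seed(1)
x <- cumsum(rnorm(100))  # random walk, so first differencing is appropriate

# arima() with d = 1 handles the differencing internally (diffuse prior)
a1 <- arima(x, order = c(1, 1, 1))

# fitting the pre-differenced series is not quite the same model;
# include.mean = FALSE makes the comparison fair, since arima()
# drops the mean automatically whenever d > 0
a2 <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE)

c(a1$aic, a2$aic)  # typically close, but not identical
```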
auto.arima() handles differencing directly and does not use a diffuse prior when fitting. So you will get the same results from auto.arima(x,d=1,...) and auto.arima(diff(x),d=0,...).
auto.arima() has an argument max.order which specifies the maximum of p+q. By default, max.order=5, so your arima(5,1,4) would not be considered. Increase max.order if you want to consider such large models (although I wouldn't recommend it).
You can't vectorize a loop involving nonlinear optimization at each iteration.
If you want to sort your output, you'll need to save it to a data.frame and then sort on the relevant column. The code currently just spits out the results as it goes and nothing is saved except for the most recent model fitted.
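A sketch of that approach, replacing the double loop over p and q with a results table. The tryCatch() guards against orders for which arima() fails to converge (those rows get NA):

```r
# One row per (p, q) combination
orders <- expand.grid(p = 0:5, q = 0:5)

# Fit each model and record its AIC; NA where estimation fails
orders$aic <- mapply(function(p, q) {
  tryCatch(arima(data, order = c(p, 1, q))$aic,
           error = function(e) NA)
}, orders$p, orders$q)

# Sort so the smallest AIC comes first
orders[order(orders$aic), ]
```

Note this does not vectorise the optimization itself (each arima() call still runs its own fit, as explained above); it just collects the output so it can be sorted instead of printed as it goes.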