Test if two survival curves are different in R

I have generated two survival curves (Kaplan-Meier estimates) using the function survfit from the R survival package, with a survival object of the form Surv(time_1, time_2, event) and the formula Surv(time_1, time_2, event) ~ gender.
I would like to perform a statistical test of equality of the two resulting survival curves.
Unfortunately such a form of survival object is not admissible for survdiff. It only accepts Surv(time_2, event) which gives different (and in my case wrong) results.
Is there a function which allows me to compare the two curves based on the results of survfit?
Here is the code to create sample data:
library(survival)
e <- c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1)
t1 <- c(35, 35, 34, 35, 35, 35, 34, 35, 35, 35, 34, 35, 35, 35, 34, 35)
t2 <- c(36, 37, 37, 36, 36, 37, 35, 36, 36, 37, 37, 36, 36, 37, 35, 36)
g <- c("F","F","F","F","F","F","F","F","M","M","M","M","M","M","M","M")
# build the data frame directly: cbind() would coerce every column to character
data_test <- data.frame(Gender = g, time_1 = t1, time_2 = t2, event = e)
# the results differ
km <- survfit(Surv(time_1, time_2, event) ~ Gender, data = data_test)
km2 <- survfit(Surv(time_2, event) ~ Gender, data = data_test)

From what I got reading up a bit on the subject, the usual logrank test is not defined for interval censored data. This explains why the survdiff function is complaining about right-censored data.
Nonetheless, there exist generalizations of the logrank test for interval censored data. Some seem to be implemented in the interval package (described here).
I cannot really help you more, as I only work with right censored data and have never needed these generalizations.
I hope that this can help you anyway.
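One possible workaround, which is not the interval package approach mentioned above but a swapped-in alternative: coxph() from the survival package accepts the three-argument Surv(start, stop, event) form, and the score test of a Cox model with the grouping variable as its only covariate is closely related to the logrank test. A minimal sketch, assuming the sample data from the question is rebuilt with the column names Gender, time_1, time_2 and event. Whether this is appropriate depends on what time_1 and time_2 mean in your data (delayed entry versus a censoring interval).

```r
library(survival)

# Sample data from the question, rebuilt so the numeric columns stay numeric
data_test <- data.frame(
  Gender = rep(c("F", "M"), each = 8),
  time_1 = c(35, 35, 34, 35, 35, 35, 34, 35, 35, 35, 34, 35, 35, 35, 34, 35),
  time_2 = c(36, 37, 37, 36, 36, 37, 35, 36, 36, 37, 37, 36, 36, 37, 35, 36),
  event  = c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1)
)

# Cox model on the same start/stop survival object that was passed to survfit()
fit <- coxph(Surv(time_1, time_2, event) ~ Gender, data = data_test)

# Score test of the model: statistic, degrees of freedom, p-value
score <- summary(fit)$sctest
print(score)
```

A small p-value here would suggest that the hazards of the two groups differ.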

Related

"Manual" Factor Analysis in R

I am trying to follow along the Factor Analysis chapter in "Using Multivariate Statistics", by Tabachnick and Fidell.
The data, and my steps, are as follows:
# data
dat.ski <- data.frame(skiers = paste0("S", 1:5), cost = c(32, 61, 59, 36, 62), lift = c(64, 37, 40, 62, 46), depth = c(65, 62, 45, 34, 43), powder = c(67, 65, 43, 35, 40))
# correlation matrix
cor.ski <- cor(dplyr::select(dat.ski, -skiers))
# eigenvalues and eigenvectors
eig.ski <- eigen(cor.ski)
The correlation matrix and eigenvalues (2.02, 1.94, 0.04 and 0.00) match those in the book. The first two eigenvectors I have are (0.352, -0.251, -0.626, -0.647) and (0.514, -0.664, 0.322, 0.280).
However, the book then goes on to say that only the first two eigenvalues are retained and the "factor analysis is re-run", which results in the following two eigenvalues*: 2.00, 1.91 and eigenvectors (-0.283, 0.177, 0.568, 0.675) and (0.651, -0.685, 0.252, 0.207). I can't work out how to reproduce these eigenvectors... if I run psych::fa(cor.ski, nfactors=2, fm="pa"), the SS loadings correspond to the new eigenvalues*.
Any help on how to return the eigenvectors as per the text will be greatly appreciated.
Thanks.
I worked this out by remembering that R is a visible language! By looking at the definition of psych::fac, I see that the authors have actually performed 7 iterations of factor analysis, not merely "taken the first two eigenvectors and rerun FA". I also finally understand how factor analysis is performed and can tie it in with the subsequent text, which in a nutshell is:
Starting with the correlation matrix (R) and assuming k factors are used:
Get the eigenvalues (L) and eigenvectors (V) of the correlation matrix R
Calculate C = sum(diag(R))
Calculate the loadings, A = V[,1:k] %*% diag(sqrt(L[1:k])) (eqn 13.6 of text)
Set R* = AA' (eqn 13.5 of text, R = AA')
Set C* = sum(diag(R*))
Update diag(R) = diag(R*)
Repeat the above steps until the maximum number of iterations is reached, or until e = abs(C - C*) is smaller than some threshold
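The steps above can be sketched in a few lines of R. This is only an illustration of the loop as described, not the code inside psych: it starts from the unmodified correlation matrix (unit diagonal), and the values of max_iter and tol are assumptions chosen for the example.

```r
# Data from the question (skier labels dropped)
dat.ski <- data.frame(cost   = c(32, 61, 59, 36, 62),
                      lift   = c(64, 37, 40, 62, 46),
                      depth  = c(65, 62, 45, 34, 43),
                      powder = c(67, 65, 43, 35, 40))

R <- cor(dat.ski)   # start from the correlation matrix
k <- 2              # number of factors to retain
max_iter <- 50      # assumed iteration cap
tol <- 1e-3         # assumed convergence threshold

for (i in seq_len(max_iter)) {
  eig <- eigen(R, symmetric = TRUE)
  V <- eig$vectors[, 1:k, drop = FALSE]   # first k eigenvectors
  L <- eig$values[1:k]                    # first k eigenvalues
  C_old <- sum(diag(R))                   # current sum of communalities
  A <- V %*% diag(sqrt(L), nrow = k)      # loadings
  R_star <- A %*% t(A)                    # reproduced correlation matrix
  C_new <- sum(diag(R_star))              # new sum of communalities
  diag(R) <- diag(R_star)                 # replace the diagonal only
  if (abs(C_old - C_new) < tol) break     # stop once the change is small
}

L   # eigenvalues of the last iteration
V   # eigenvectors of the last iteration (signs are arbitrary)
A   # loadings
```

Eigenvector signs are arbitrary, so a column may come out with every sign flipped relative to the book.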

exponential function in R

I have basic knowledge of R and would like to know how to write an exponential function in R:
F(X) = B(1 - e^(-AX))
where A is the lambda parameter, B is a parameter representing the Y data, and X represents the X data below.
I need the exponential model to generate the curve to fit the data; for example:
X <- c(22, 44, 69, 94, 119, 145, 172, 199, 227, 255)
PS: these x-axis values are in millions.
Y <- c(1, 7, 8, 12, 12, 14, 14, 18, 19, 22)
This is the y-axis.
Any idea how to write the code and fit this model to the data?
In R you can write an exponential function with exp(). In your case, with A and B as the parameters:
f <- function(X, A, B) B * (1 - exp(-A * X))
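To actually estimate A and B from the data, one option is nonlinear least squares with nls() from base R. The following is a minimal sketch: the function name expo and the starting values are assumptions picked by eye from the data, and nls() can fail to converge if they are far off.

```r
X <- c(22, 44, 69, 94, 119, 145, 172, 199, 227, 255)
Y <- c(1, 7, 8, 12, 12, 14, 14, 18, 19, 22)

# Model: B is the upper limit of the curve, A the rate (lambda) parameter
expo <- function(x, A, B) B * (1 - exp(-A * x))

# Starting values are rough guesses (assumption), not part of the question
fit <- nls(Y ~ expo(X, A, B), start = list(A = 0.005, B = 30))
cf <- coef(fit)
print(cf)

# Plot the data and overlay the fitted curve
plot(X, Y)
curve(expo(x, cf[["A"]], cf[["B"]]), add = TRUE)
```

summary(fit) gives standard errors for both parameters if you need them.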

Polynomial regression in Maple

In Maple I have two lists
A:=[seq(i, i=1..10)];
B:=[10, 25, 43, 63, 83, 92, 99, 101, 101, 96];
Is it possible to do polynomial or power regression in Maple?
I want to fit a trend line as a 3rd order polynomial where each point is (A[i], B[i]).
All you need is
Statistics:-LinearFit([1,x,x^2,x^3], A, B, x);

Generate samples from frequency table with fixed values

I have a population 2x2 frequency table with specific values: 20, 37, 37, 20. I need to generate N samples from this population (for simulation purposes).
How can I do it in R?
Try this. In the example, the integers represent cell 1, 2, 3, and 4 of the 2x2 table. As you can see the relative frequencies closely resemble those in your 20, 37, 37, 20 table.
probs<-c(20, 37, 37, 20)
N<-1000 #sample size
mysample<-sample(x=c(1,2,3,4), size=N, replace = TRUE, prob = probs/sum(probs))
table(mysample)/N
#Run Again for 100,000 samples
N<-100000
mysample<-sample(x=c(1,2,3,4), size=N, replace = TRUE, prob = probs/sum(probs))
#The relative probabilities should be similar to those in the original table
table(mysample)/N
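If you want the simulated counts as a 2x2 table directly rather than as a vector of cell labels, rmultinom() is an alternative. A small sketch, assuming the four values fill the table column by column (the question does not say which value belongs to which cell) and using an arbitrary seed:

```r
probs <- c(20, 37, 37, 20)
N <- 1000
set.seed(1)  # arbitrary seed, only to make the example reproducible

# One multinomial draw of N observations; prob is normalised internally
counts <- rmultinom(n = 1, size = N, prob = probs)

# Arrange the four cell counts as a 2x2 table (column-wise fill is an assumption)
tab <- matrix(counts, nrow = 2)
tab
tab / N  # relative frequencies
```

Increasing n in rmultinom() returns one column of cell counts per simulated table.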

How to apply Shapiro test in R?

I'm pretty new to statistics and I need your help. I just installed the R software and I have no idea how to work with it. I have a small sample looking as follows:
Group A : 10, 12, 14, 19, 20, 23, 34, 41, 12, 13
Group B : 8, 12, 14, 15, 15, 16, 21, 36, 14, 19
I want to apply a t-test, but before that I would like to apply the Shapiro test to check whether my sample comes from a population with a normal distribution. I know there is a function shapiro.test(), but how can I give my numbers as an input to this function?
Can I simply enter shapiro.test(10,12,14,19,20,23,34,41,12,13, 8,12, 14,15,15,16,21,36,14,19)?
OK, because I'm feeling nice, let's work through this. I am assuming you know how to run commands, etc. First up, put your data into vectors:
A = c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B = c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)
Let's check the help for shapiro.test().
help(shapiro.test)
In there you'll see the following:
Usage
shapiro.test(x)
Arguments
x a numeric vector of data values. Missing values are allowed, but
the number of non-missing values must be between 3 and 5000.
So, the input needs to be a vector of values. Now we know that we can run the 'shapiro.test()' function directly with our vectors, A and B. R uses named arguments for most of its functions, and so we tell the function what we are passing in:
shapiro.test(x = A)
and the result is put to the screen:
Shapiro-Wilk normality test
data: A
W = 0.8429, p-value = 0.0478
then we can do the same for B:
shapiro.test(x = B)
which gives us
Shapiro-Wilk normality test
data: B
W = 0.8051, p-value = 0.0167
If we want, we can test A and B together, although it's hard to know if this is a valid test or not. By 'valid', I mean imagine that you are pulling numbers out of a bag to get A and B. If the numbers in A get thrown back in the bag, and then we take B, we've just double counted. If the numbers in A didn't get thrown back in, testing x =c(A,B) is reasonable because all we've done is increased the size of our sample.
shapiro.test(x = c(A,B))
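Do these mean that the data are normally distributed? The null hypothesis of the test is that the sample comes from a normal distribution, so a small p-value is evidence against normality. Both p-values above are below 0.05, so at the 5% level the test rejects normality for A and for B. In the help we also see this:
Value
...
p.value an approximate p-value for the test. This is said in Royston (1995) to be adequate for p.value < 0.1
That note is about how accurate the approximate p-value is, not a cut-off for deciding normality. Which significance level is good enough depends on your requirements!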
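If you have several groups, you can run the test on each one and pull out just the p-values. A short sketch; the names groups and p_values are made up for this example, and the 0.05 cut-off is only the conventional choice:

```r
A <- c(10, 12, 14, 19, 20, 23, 34, 41, 12, 13)
B <- c(8, 12, 14, 15, 15, 16, 21, 36, 14, 19)

# Keep the groups in a named list so the results stay labelled
groups <- list(A = A, B = B)

# Run shapiro.test() on each group and keep only the p-value
p_values <- sapply(groups, function(x) shapiro.test(x)$p.value)
p_values

# TRUE where normality is rejected at the (assumed) 5% level
p_values < 0.05
```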
