Convert "survfit()" output to a matrix or dataframe - r

Example data below.
My basic problem is that running "survfit" by itself gives a nice column with median lifespan for each category, which is the thing I want to extract from my survfit data. Ideally I'd like to export this "survfit" output as a dataframe/table and ultimately save to .csv. But I get errors however I try.
Thanks for help/advice!
Example data:
df<-data.frame(Gtype = as.factor(c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C")),
Time=as.numeric(c("5","6","7","7","7","7","2","3","3","4","5","7","2","2","2","3","3","4")),
Status=as.numeric(c("1","1","1","1","0","0","1","1","1","1","1","1","1","1","1","1","1","1")))
library(survival)
exsurv<-survfit(Surv(df$Time,df$Status)~strata(df$Gtype))
exsurv
and the "survfit" output I want to get as a dataframe:
> exsurv<-survfit(Surv(df$Time,df$Status)~strata(df$Gtype))
> exsurv
Call: survfit(formula = Surv(df$Time, df$Status) ~ strata(df$Gtype))
n events median 0.95LCL 0.95UCL
strata(df$Gtype)=A 6 4 7.0 6 NA
strata(df$Gtype)=B 6 6 3.5 3 NA
strata(df$Gtype)=C 6 6 2.5 2 NA
edit:
An earlier version of this question included the print() function superfluously. "print(survfit)" and "survfit()" give the same result.

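Yes, the broom::tidy() function works. Note that it returns one row per time point of the survival curve for each stratum (time, n.risk, estimate, and so on), not the per-group median table.
'mykm1' is the object returned by running survfit() on my raw survival data set.
I tried this and it worked well: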
results <- broom::tidy(mykm1)
write.csv(results, "./Desktop/Rout/mykm1.csv")
## the output csv file created in my folder Rout inside my Desktop folder.
The csv file can then be imported easily into any word processor or spreadsheet.

As usual, I was making it way more complicated by not understanding the basic survival function. The basic summary(exsurv) does give you median lifespan, mean lifespan, confidence intervals, etc... from the survfit function.
exsurv<-survfit(Surv(Time, Status)~ strata(Gtype), data=df)
You can put the summary(exsurv) data into a table using the code below
survoutput<-summary(exsurv)$table
And then just save to .csv as an output
write.csv(survoutput, file="exsurvoutput.csv")
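A minimal sketch pulling the steps above together, assuming the example data from the question. The names fit, tab and med and the output file name are illustrative, and the formula uses Gtype directly rather than strata(Gtype); the row labels then read "Gtype=A" and so on.

```r
library(survival)

# Example data from the question
df <- data.frame(
  Gtype  = factor(rep(c("A", "B", "C"), each = 6)),
  Time   = c(5, 6, 7, 7, 7, 7, 2, 3, 3, 4, 5, 7, 2, 2, 2, 3, 3, 4),
  Status = c(1, 1, 1, 1, 0, 0, rep(1, 12))
)

fit <- survfit(Surv(Time, Status) ~ Gtype, data = df)

# summary()$table is a numeric matrix with one row per stratum
tab <- summary(fit)$table

# Keep only the stratum label and the median in a plain data frame
med <- data.frame(stratum = rownames(tab),
                  median  = unname(tab[, "median"]),
                  row.names = NULL)

# Written to a temporary directory here; point this at any path you like
write.csv(med, file.path(tempdir(), "medians.csv"), row.names = FALSE)
```

as.data.frame(tab) would keep every column of the table if more than the median is needed.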

Related

Why is my custom function printing my row names in addition to the data?

I have the following custom function that I am using to create a table of summary statistics in R.
regression.stats<-function(fit){
formula<-fit$call;
data<-eval(getCall(fit)$data);
abserror<-abs(exp(fit$fitted.values)-data$bm)/exp(fit$fitted.values);
QMLE<-exp((sigma(fit)^2)/2);
smear<-sum(exp(fit$residuals))/nrow(data);
RE<-mean(data$bm)/mean(exp(fit$fitted.values));
CF<-(RE+smear+QMLE)/3;
adjPE<-mean(abs((exp(fit$fitted.values)*CF)-data$bm)
/(exp(fit$fitted.values)*CF));
SEE<-exp(sigma(fit)+4.6052)-100;
summary<-summary(fit)
statistics<-data.frame("df"=fit$df.residual,
"r2"=round(summary(fit)$r.squared,4),
"adjr2"=round(summary(fit)$adj.r.squared,4),
"AIC"=AIC(fit),"BIC"=BIC(fit),
"logLik"=logLik(fit),
"PE"=round(mean(abserror)*100,2),QMLE=round(QMLE,3),
smear=round(smear,3),RE=round(RE,3),CF=round(CF,3),
"adjPE"=round(mean(adjPE)*100,2),
"SEE"=round(SEE,2),row.names = print(substitute(fit)));
return(statistics)
}
I want to bind the resulting rows into a data.frame in order to produce a table of comparison statistics between regression analyses. For example, using the data from the mtcars dataset...
data(mtcars)
lm1<-lm(cyl~mpg,data=mtcars)
lm2<-lm(cyl~disp,data=mtcars)
lm3<-lm(disp~mpg,data=mtcars)
rbind(regression.stats(lm1),regression.stats(lm2),regression.stats(lm3))
I am creating this for an R Markdown html file and I want readers to be able to tell which regression equation produced which statistics. However when I run the code it also ends up printing a list of the names of the lm functions in addition to the regression statistics in the resulting html document.
I have managed to track the problem down to the line row.names = print(substitute(fit))) in my function. If I remove that line it no longer prints the lm name when running the function. However, what happens then is my rows are no longer associated with the correct model name. How can I adjust my function so that it only prints the name of the model function as the row name of the summary function, rather than creating an additional list?
The line
...
row.names = print(substitute(fit))
...
should be
row.names = deparse(substitute(fit))
Or simply substitute(fit), as this gets converted to character.
print() is called for its side effect of writing to the console (it returns its argument invisibly), which is why the model names appeared as extra output.
After the change in function
rbind(regression.stats(lm1),regression.stats(lm2),regression.stats(lm3))
# df r2 adjr2 AIC BIC logLik PE QMLE smear RE CF adjPE SEE
#lm1 30 0.7262 0.7171 91.46282 95.86003 -42.73141 NaN 1.570 1.443000e+00 NA NA NaN 1.585700e+02
#lm2 30 0.8137 0.8075 79.14552 83.54273 -36.57276 NaN 1.359 1.317000e+00 NA NA NaN 1.189600e+02
#lm3 30 0.7183 0.7090 363.71635 368.11356 -178.85818 NaN Inf 1.861805e+65 NA NA NaN 1.092273e+31
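To see the naming mechanism in isolation, here is a cut-down sketch. The helper one_row is hypothetical and keeps only two of the statistics; it is not the full regression.stats function from the question.

```r
# deparse(substitute(fit)) captures the name of the object passed in,
# as a character string, without printing anything
one_row <- function(fit) {
  data.frame(df = fit$df.residual,
             r2 = round(summary(fit)$r.squared, 4),
             row.names = deparse(substitute(fit)))
}

lm1 <- lm(cyl ~ mpg, data = mtcars)
lm2 <- lm(cyl ~ disp, data = mtcars)

# Row names are carried through rbind, so each row stays labelled
stats <- rbind(one_row(lm1), one_row(lm2))
stats
```

Because the name is taken from the call, this only gives a useful label when the function is called directly with a named object, not from inside lapply() or another wrapper.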

What is the best way to manage/store result from either posthoc.kruskal.dunn.test() or dunn.test() - where my input data is in dataframe format?

I am a newbie in R programming and seek help in analyzing the Metabolomics data - 118 metabolites with 4 conditions (3 replicates per condition). I would like to know, for each metabolite, which condition(s) is significantly different from which. Here is part of my data
> head(mydata)
Conditions HMDB03331 HMDB00699 HMDB00606 HMDB00707 HMDB00725 HMDB00017 HMDB01173
1 DMSO_BASAL 0.001289121 0.001578235 0.001612297 0.0007772231 3.475837e-06 0.0001221674 0.02691318
2 DMSO_BASAL 0.001158363 0.001413287 0.001541713 0.0007278363 3.345166e-04 0.0001037669 0.03471329
3 DMSO_BASAL 0.001043537 0.002380287 0.001240891 0.0008595932 4.007387e-04 0.0002033625 0.07426482
4 DMSO_G30 0.001195253 0.002338346 0.002133992 0.0007924157 4.189224e-06 0.0002131131 0.05000778
5 DMSO_G30 0.001511538 0.002264779 0.002535853 0.0011580857 3.639661e-06 0.0001700157 0.02657079
6 DMSO_G30 0.001554804 0.001262859 0.002047611 0.0008419137 6.350990e-04 0.0000851638 0.04752020
This is what I have so far.
I learned the first line from this post
kwtest_pvl = apply(mydata[,-1], 2, function(x) kruskal.test(x,as.factor(mydata$Conditions))$p.value)
and this is where I loop through the metabolites that passed the KW test
tCol = colnames(mydata[,-1])[kwtest_pvl <= 0.05]
for (k in tCol){
output = posthoc.kruskal.dunn.test(mydata[,k],as.factor(mydata$Conditions),p.adjust.method = "BH")
}
I am not sure how to manage my output such that it is easier to manage for all the metabolites that passed KW test. Perhaps saving the output from each iteration appending to excel? I also tried dunn.test package since it has an option of table or list output. However, it still leaves me at the same point. Kinda stuck here.
Moreover, should I also apply some kind of p-value adjustment, e.g. FWER, FDR, BH, right after the KW test - before performing the posthoc test?
Any suggestion(s) would be greatly appreciated.
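One possible way to keep the results, sketched under assumptions: the loop above overwrites output on every iteration, so collecting each result in a named list keeps all of them. In this sketch base R's pairwise.wilcox.test() is used purely as a stand-in for the Dunn test so that it runs without extra packages, and the toy data and all names (toy, posthoc, long) are hypothetical. It also assumes the Dunn test result exposes a p.value matrix in the same way; check that before swapping the function in.

```r
# Hypothetical toy data: 4 conditions x 3 replicates, 3 metabolite columns
set.seed(1)
toy <- data.frame(
  Conditions = rep(c("c1", "c2", "c3", "c4"), each = 3),
  m1 = rnorm(12), m2 = rnorm(12), m3 = rnorm(12)
)
cond <- as.factor(toy$Conditions)
metabolites <- setdiff(names(toy), "Conditions")

# One post-hoc result per metabolite, stored in a list named by metabolite
posthoc <- lapply(setNames(metabolites, metabolites), function(k) {
  pairwise.wilcox.test(toy[[k]], cond, p.adjust.method = "BH")
})

# Flatten each p-value matrix to long format and stack them
long <- do.call(rbind, lapply(names(posthoc), function(k) {
  out <- as.data.frame(as.table(posthoc[[k]]$p.value),
                       stringsAsFactors = FALSE)
  names(out) <- c("group1", "group2", "p.adj")
  out$metabolite <- k
  out[!is.na(out$p.adj), ]  # drop the empty upper triangle
}))

# A single table like this is easy to save
write.csv(long, file.path(tempdir(), "posthoc_long.csv"), row.names = FALSE)
```

The long table has one row per metabolite and pair of conditions, which filters and sorts more easily than one printed matrix per metabolite.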

How to read a CSV file and calculate a mean in R dynamically?

I want to read a file and calculate the mean of it.
> list
[1] "book1.csv" "book2.csv"
for book1
observation1
23
24
65
76
34
Each book has a single column: observation1 in book1 and observation2 in book2. I want to write a function that calculates the mean of that column. I am new to R and cannot work out how to subset the variable in each book. Can anyone please help me write the function?
Try this. The file argument is the file to be read in (book1.csv) and the variable argument is the column to take the mean over (observation1).
read.mean<-function(file,variable){
df<-read.csv(file)
mean.df <- mean(df[,variable])
return(mean.df)
}
Make sure to pass both arguments in quotes, including the file extension, i.e. read.mean("book1.csv", "observation1"). There is a way to do it without the quotes (Passing a variable name to a function in R) but it is complicated.
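To run the function over several files in one go, something like the following sketch could work. The files are written to a temporary directory so the example is self-contained; the values in book2.csv are made up, and the names files, vars and means are illustrative.

```r
# Create two small example files (book1 reuses the numbers from the question)
dir <- tempdir()
write.csv(data.frame(observation1 = c(23, 24, 65, 76, 34)),
          file.path(dir, "book1.csv"), row.names = FALSE)
write.csv(data.frame(observation2 = c(10, 20, 30)),
          file.path(dir, "book2.csv"), row.names = FALSE)

read.mean <- function(file, variable) {
  df <- read.csv(file)
  mean(df[[variable]])  # [[ ]] selects the column by its name as a string
}

files <- file.path(dir, c("book1.csv", "book2.csv"))
vars  <- c("observation1", "observation2")

# mapply pairs each file with its column name
means <- mapply(read.mean, files, vars)
names(means) <- basename(files)
means
```

If the values might contain missing entries, mean(df[[variable]], na.rm = TRUE) avoids an NA result.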

R Refer to (part of) data frame using string in R

I have a large data set in which I have to search for specific codes depending on what i want. For example, chemotherapy is coded by ~40 codes, that can appear in any of 40 columns called (diag1, diag2, etc).
I am in the process of writing a function that produces plots depending on what I want to show. I thought it would be good to specify what I want to plot in a input data frame. Thus, for example, in case I only want to plot chemotherapy events for patients, I would have a data frame like this:
Dataframe name: Style
Name SearchIn codes PlotAs PlotColour
Chemo data[substr(names(data),1,4)=="diag"] 1,2,3,4,5,6 | red
I already have a function that searches for codes in specific parts of the data frame and flags the events of interest. What i cannot do, and need your help with, is referring to a data frame (Style$SearchIn[1]) using codes in a data frame as above.
> Style$SearchIn[1]
[1] data[substr(names(data),1,4)=="diag"]
Levels: data[substr(names(data),1,4)=="diag"]
I thought perhaps get() would work, but I can't get it to work:
> get(Style$SearchIn[1])
Error in get(vars$SearchIn[1]) : invalid first argument
or
> get(as.character(Style$SearchIn[1]))
Error in get(as.character(Style$SearchIn[1])) :
object 'data[substr(names(data),1,5)=="TDIAG"]' not found
Obviously, running data[substr(names(data),1,5)=="TDIAG"] works.
Example:
library(survival)
ex <- data.frame(SearchIn="lung[substr(names(lung),1,2) == 'ph']")
lung[substr(names(lung),1,2) == 'ph'] #works
get(ex$SearchIn[1]) # does not work
It is not a good idea to store R code in strings and then try to eval them when needed; there are nearly always better solutions for dynamic logic, such as storing a function (an anonymous function, or lambda) as a value.
I would recommend using a list to store the plot specification, rather than a data.frame. This would allow you to include a function as one of the list's components which could take the input data and return a subset of it for plotting.
For example:
library(survival);
plotFromSpec <- function(data,spec) {
filteredData <- spec$filter(data);
## ... draw a plot from filteredData and other stuff in spec ...
};
spec <- list(
Name='Chemo',
filter=function(data) data[,substr(names(data),1,2)=='ph'],
Codes=c(1,2,3,4,5,6),
PlotAs='|',
PlotColour='red'
);
plotFromSpec(lung,spec);
If you want to store multiple specifications, you could create a list of lists.
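A list of lists along those lines might look like the sketch below. The toy data frame dat, the second specification and the flagging step are hypothetical, added only to show how the stored filter functions would be used.

```r
# Hypothetical toy data standing in for the real data set
dat <- data.frame(diag1 = c(1, 7, 30), diag2 = c(9, 2, 8), other = c(0, 0, 1))

# Each specification is a list; the outer list is named by specification
specs <- list(
  Chemo = list(
    filter = function(data) data[, startsWith(names(data), "diag"), drop = FALSE],
    Codes = 1:6, PlotAs = "|", PlotColour = "red"
  ),
  Other = list(
    filter = function(data) data[, "other", drop = FALSE],
    Codes = 1, PlotAs = "o", PlotColour = "blue"
  )
)

# Apply each spec's filter, then flag rows that contain any of its codes
flags <- lapply(specs, function(spec) {
  sub <- spec$filter(dat)
  apply(sub, 1, function(row) any(row %in% spec$Codes))
})
flags
```

drop = FALSE keeps the result a data frame even when a filter selects a single column, so the row-wise step works the same way for every specification.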
Have you tried using quote()?
I'm not entirely sure what you want but maybe you could store the things you're trying to get() like
quote(data[substr(names(data),1,4)=="diag"])
and then use eval()
eval(quote(data[substr(names(data),1,4)=="diag"]), list(data=data))
For example,
dat <- data.frame("diag1"=1:10, "diag2"=1:10, "other"=1:10)
Style <- list(SearchIn=c(quote(data[substr(names(data),1,4)=="diag"]), quote("Other stuff")))
> head(eval(Style$SearchIn[[1]], list(data=dat)))
diag1 diag2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6

ANOVA in R using summary data

is it possible to run an ANOVA in r with only means, standard deviation and n-value? Here is my data frame:
q2data.mean <- c(90,85,92,100,102,106)
q2data.sd <- c(9.035613,11.479667,9.760268,7.662572,9.830258,9.111457)
q2data.n <- c(9,9,9,9,9,9)
q2data.frame <- data.frame(q2data.mean,q2data.sd,q2data.n)
I am trying to find the mean square residual, so I want to take a look at the ANOVA table.
Any help would be really appreciated! :)
Here you go, using ind.oneway.second from the rpsychi package:
library(rpsychi)
with(q2data.frame, ind.oneway.second(q2data.mean,q2data.sd,q2data.n) )
#$anova.table
# SS df MS F
#Between (A) 2923.5 5 584.70 6.413
#Within 4376.4 48 91.18
#Total 7299.9 53
# etc etc
Update: the rpsychi package was archived in March 2022 but the function is still available here: http://github.com/cran/rpsychi/blob/master/R/ind.oneway.second.R (hat-tip to @jrcalabrese in the comments)
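Since the package is archived, the same table can also be computed by hand in base R. This is a sketch using the standard one-way ANOVA formulas for independent groups, not the rpsychi implementation; the short variable names are illustrative.

```r
m <- c(90, 85, 92, 100, 102, 106)                                     # group means
s <- c(9.035613, 11.479667, 9.760268, 7.662572, 9.830258, 9.111457)  # group SDs
n <- c(9, 9, 9, 9, 9, 9)                                              # group sizes

k <- length(m)
N <- sum(n)
grand <- sum(n * m) / N               # weighted grand mean

ss_between <- sum(n * (m - grand)^2)  # between-groups sum of squares
ss_within  <- sum((n - 1) * s^2)      # within-groups (residual) sum of squares
df_between <- k - 1
df_within  <- N - k
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within   # the mean square residual
f_stat     <- ms_between / ms_within
p_value    <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

anova_table <- data.frame(
  SS = c(ss_between, ss_within),
  df = c(df_between, df_within),
  MS = c(ms_between, ms_within),
  row.names = c("Between", "Within")
)
anova_table
```

The values agree with the table above: a within-groups mean square of about 91.18 and F of about 6.413 on 5 and 48 degrees of freedom.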
As an unrelated side note, your data could do with some renaming. q2data.frame is a data.frame, no need to put it in the title. Also, no need to specify q2data.mean inside q2data.frame - surely mean would suffice. It just means you end up with complex code like:
q2data.frame$q2data.mean
when:
q2$mean
would give you all the info you need.
