I have made a regression that looks like this:
survfit(Surv(YearsToEvent, Event) ~ CancerType, data = RegressionData)
From that I get an output table like this (I removed some of the columns for readability):
n
Cancer A 100
Cancer B 200
However, when I plot the output using ggsurvplot, with a plot plus a "number at risk" table, I want to be able to manually adjust the order of the legend. That is, I want to put Cancer B before Cancer A. I found a similar thread here on SO: How to reorder strata in survfit object for ggsurvplot legend?. However, I did not find an answer in it. I have tried sorting the data frame RegressionData before the regression, without success.
Could anyone help me out here?
Here is an example of how you could achieve what you want.
The main step is to transform the grouping variable into a factor. You can then define the order of the levels by hand, and the strata follow that order.
The other steps are the same:
library(survminer)
library(survival)
library(tidyverse)
lung1 <- lung %>%
  mutate(sex = factor(sex, levels = c(2, 1)))
ggsurvplot(
  fit = survfit(Surv(time, status) ~ sex, data = lung1),
  risk.table = TRUE,
  xlab = "Days",
  ylab = "Overall survival probability")
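If you also want readable legend labels, the same idea works with base R's factor() and its labels argument. The sketch below is only a minimal check, assuming the lung coding of sex (1 = male, 2 = female); the names lung2, fit2 and strata_order are made up for illustration.

```r
library(survival)

# Copy the data and recode sex so that level 2 (female) comes first.
# lung2 and fit2 are illustrative names, not part of the original answer.
lung2 <- lung
lung2$sex <- factor(lung2$sex, levels = c(2, 1), labels = c("Female", "Male"))

fit2 <- survfit(Surv(time, status) ~ sex, data = lung2)

# The strata are stored in factor-level order, which is the order
# the legend and the risk table are built from.
strata_order <- names(fit2$strata)
print(strata_order)
```

Passing fit2 and lung2 to ggsurvplot() in place of the fit and data above should then list the female stratum first.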
I would like to plot a Kaplan-Meier (KM) curve and the cumulative events or cumulative incidence function (CIF) together in one figure, as a lattice of panels.
I have recently switched from SAS to R. In SAS you can do it all in one step using a macro (see this image), but I have not found anything similar in R yet.
Currently, I run code for two separate graphs. The first plots a survfit object using ggsurvplot, which gives a KM curve, while the second plots a cuminc object with ggplot after a number of transformations. I don't use ggcompetingrisks because it was not very customizable. Also, I am interested in plotting one particular competing risk, for example death from cancer, and not all competing risks.
Here is an example of my current code using the BMT data frame from the survminer package.
library(survminer)
library(cmprsk)
library(tidyverse)  # provides %>%, mutate(), filter(), list_modify() and map_df() used below
data(BMT)
# I'll add the variable death to plot overall survival.
BMT <- mutate(BMT, death = ifelse(status == 1, 1, 0))
# KM plot:
figKM <- ggsurvplot(survfit(Surv(ftime, death) ~ dis, BMT))
figKM
# CIF plot:
cif <- cuminc(ftime = BMT$ftime, fstatus = BMT$status, group = BMT$dis, cencode = 0)
cifDT <- cif %>%
  list_modify("Tests" = NULL) %>%                  # drop the test statistics element
  map_df(`[`, c("time", "est"), .id = "id") %>%    # stack time/est of every curve, labelled by id
  filter(id %in% c("0 1", "1 1"))                  # keep only the event I want
figCIF <- ggplot(cifDT, aes(x = time, y = est, color = id)) + geom_step(lwd = 1.2)
figCIF
Is there a way to put figKM and figCIF together in a lattice plot? Maybe by plotting them differently?
If you look at the contents of your figKM object with class and str you see that the first item in that list is a "plot", so this seems to do what you asked for in your comment:
library(cowplot)
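# figKM[[1]] is the ggplot inside the ggsurvplot object; it is used twice here,
# and the second one can be swapped for figCIF once that plot has been built
plot_grid(figKM[[1]], figKM[[1]], nrow = 2)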
I'm not a tidyverse user, so map_df is perhaps some clone of the base functions Reduce or Map, but I don't have enough experience to a) know which package to load, or b) figure out what your piped expressions are doing. Commented code would have been easier to follow. I am quite experienced with the survival package.
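For readers who do not use the tidyverse, the piped expression in the question can be written in base R. This is a sketch that assumes cuminc() returns a list with one element per group/cause combination (named like "0 1"), each holding time and est vectors, plus a Tests element; cif_long and keep are illustrative names.

```r
library(cmprsk)
data(BMT, package = "survminer")

cif <- cuminc(ftime = BMT$ftime, fstatus = BMT$status,
              group = BMT$dis, cencode = 0)

# Curves to keep: cause 1 in group 0 and cause 1 in group 1.
keep <- c("0 1", "1 1")

# Build one data frame per curve and stack them row-wise.
cif_long <- do.call(rbind, lapply(keep, function(id) {
  data.frame(id = id, time = cif[[id]]$time, est = cif[[id]]$est)
}))

head(cif_long)
```

cif_long can then be passed to ggplot() in place of cifDT, and the resulting plot placed next to figKM[[1]] with plot_grid().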
After calling ggsurvplot(...) I want to superimpose some points from another data frame df containing two columns time and survival. I'm looking for tips on accomplishing this.
Edit: some code as an example
require("survival")
require("survminer")
fit<- survfit(Surv(time, status) ~ sex, data = lung)
# Basic survival curves
ggsurvplot(fit, data = lung)
# Example points
x <- fit$time
y <- fit$n.risk
How would I superimpose points(x, y) on the ggsurvplot plot?
The ggplot-type object is part of the object returned by ggsurvplot() and can be addressed as $plot:
ggplot1 <- ggsurvplot(fit, data = lung)$plot
You can work with it as with a usual ggplot object and add other layers. For your specific example, however, it is not clear how you want to define the Y coordinate of your points: fit$n.risk is a number between 1 and 138, while your plot is in the 0..1 range. One option is to rescale n.risk by its maximum:
ggplot1 <- ggsurvplot(fit, data = lung)$plot
df1 <- data.frame(time=fit$time, nRisk=fit$n.risk, nRiskRel=fit$n.risk/max(fit$n.risk))
ggplot1 + geom_point(aes(x=time, y=nRiskRel), data = df1, alpha=0.5, size=3)
You may want to add colors etc.
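If the points should sit on the survival scale instead, one way is to take them from summary(fit) at chosen times. This is a sketch only: the time points 200, 400 and 600 days are picked purely for illustration, and pts, p and p2 are made-up names. Any data frame with time and survival columns can be added the same way.

```r
library(survival)
library(survminer)
library(ggplot2)

fit <- survfit(Surv(time, status) ~ sex, data = lung)
p <- ggsurvplot(fit, data = lung)$plot

# Survival estimates at a few arbitrary time points (illustrative choice).
s <- summary(fit, times = c(200, 400, 600))
pts <- data.frame(time = s$time, survival = s$surv)

# inherit.aes = FALSE keeps the new layer independent of whatever
# aesthetics ggsurvplot has already set up on the plot.
p2 <- p + geom_point(aes(x = time, y = survival), data = pts,
                     inherit.aes = FALSE, size = 3)
```

Printing p2 draws the survival curves with the extra points on top.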
I am trying to plot my gam result and want the plot labels to be in Chinese. However, the x label I supply is used for all plots. How can I create a different x-label for each plot?
fit <- gam(happiness ~ s(age) + s(edu) + s(mobility), family = ocat(R = 5), data = data)
plot(fit, xlab = c("年龄", "教育"))
You could simply change the column names, since each plot is labelled with the name of its variable. I am not sure how to do this in Chinese, though.
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)[1:3]
names(dat)[2:3] <- c("ONE", "TWO")
b <- gam(y~s(ONE)+s(TWO),data=dat)
plot(b,pages=1,residuals=TRUE) ## show partial residuals
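Another option, if you prefer to keep the original column names, is to draw one smooth at a time with the select argument of plot.gam and pass a separate xlab each time. This is a minimal sketch on the same kind of simulated data; the label vector labs is an assumption and can hold any strings, including Chinese ones if your graphics device and font support them.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)  # simulated y, x0, x1, x2, ...
b <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)

# One label per smooth term, in model order (illustrative labels).
labs <- c("Age", "Education", "Mobility")

op <- par(mfrow = c(1, 3))   # three panels side by side
for (i in seq_along(labs)) {
  plot(b, select = i, xlab = labs[i])   # draw only the i-th smooth
}
par(op)                      # restore the previous graphics settings
```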
I have just performed a PCA analysis for a large data set with approximately 20,000 variables. To do so, I used the following code:
df_pca <- prcomp(df, center=FALSE, scale.=TRUE)
I am curious how my variables influenced PCA.1 (Dimension 1 of the PCA analysis) and PCA.2 (Dimension 2 of the PCA analysis).
I used the following code to look at how each variable influenced the dimensional analysis:
fviz_pca_var(df_pca, col.var = "black")
However, this creates a graph with all 20,000 of my variables and since there is so much information, it is unreadable.
Is there a way to select the variables that have most influenced PCA.1 and PCA.2 and graph only those?
Thank you in advance!
To see which variables contribute most to a particular dimension, you can do this:
library(factoextra)
fviz_contrib(df_pca,
choice = "var",
axes = 5,
top = 10, color = 'darkorange3', barfill = 'blue4',fill ='blue4')
The axes argument selects the dimension you want to see, and top limits the plot to the largest contributors. In this case you are looking at dimension 5; for the question as asked, use axes = 1 or axes = 2.
If you want to see the scree plot, which helps you choose the number of dimensions, together with the eigenvalues, you can use this:
fviz_screeplot(df_pca, ncp=14,linecolor = 'darkorange3', barfill = 'blue4',
barcolor ='blue4', xlab = "Dimensions",
ylab = '% variance',
main = 'Reduction of components')
get_eigenvalue(df_pca)
What you want to do first is get the table of loadings, which relates the synthetic variables (the principal components) to the real variables. Do that like this:
a <- df_pca$rotation
Then we can use dplyr to manipulate the data frame and extract what we want:
library(dplyr)
library(tibble)
a %>% as.data.frame %>% rownames_to_column %>%
select(rowname, PC1, PC2) %>% arrange(desc(PC1^2+PC2^2)) %>% head(10)
The above will show the top 10 most important variables for PC1 and PC2 combined. You can run the same thing for PC1 only by changing to arrange(desc(abs(PC1))), or for PC2 only by changing to arrange(desc(abs(PC2))), and see more or fewer than 10 variables by changing head(10).
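The ranking above treats PC1 and PC2 equally. If you would rather weight each component by the variance it explains, a base R variant could look like the sketch below. It uses the built-in USArrests data as a stand-in for df (with centering switched on, unlike the question), and the names pca, load2, w, score and top are illustrative.

```r
# Stand-in data; replace USArrests with your own df.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Squared loadings: each column of rotation has unit length, so each
# column of load2 sums to 1 and gives the share every variable
# contributes to that component.
load2 <- pca$rotation[, 1:2]^2

# Weight PC1 and PC2 by their variances (squared standard deviations).
w <- pca$sdev[1:2]^2
score <- as.vector(load2 %*% w) / sum(w)
names(score) <- rownames(load2)

# Variables ordered from most to least influential on PC1 and PC2 together.
top <- sort(score, decreasing = TRUE)
head(top, 10)
```

The names of the leading entries of top could then be used to subset the variables you plot.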
I'm trying to plot observed and predicted values on the same plot in lattice. My data is a repeated-measures dataset, and I've tried a few things that haven't worked. Any assistance would be much appreciated. The code is given below.
library(nlme)
library(lattice)
# add random conc predictions to data
Theoph$predConc <- rnorm(132, 5)
# My attempt at plotting both predConc and conc against time on the same plot
lattice::xyplot(predConc + conc ~ Time | Subject, groups=Subject, data=Theoph, type="l", layout = c(4,4))
As you can see, it doesn't seem to be doing what I want it to do. Ideally I would like the "conc" and "predConc" to be in different colours but appear together on each panel for each Id so I can compare the two easily.
As was suggested in the comments, it is fixed simply by dropping groups = Subject.
lattice::xyplot(predConc + conc ~ Time | Subject, data = Theoph, type = "l",
auto.key = TRUE)
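If you want the observed values as points and the predictions as a line, lattice can distribute the type vector across the two series. This is a sketch, assuming the same kind of random predConc column as in the question; the object name p and the y-axis label are illustrative.

```r
library(lattice)

set.seed(1)                                  # reproducible fake predictions
Theoph$predConc <- rnorm(nrow(Theoph), 5)

# With distribute.type = TRUE the first series (predConc) is drawn with
# type "l" and the second (conc) with type "p".
p <- xyplot(predConc + conc ~ Time | Subject, data = Theoph,
            type = c("l", "p"), distribute.type = TRUE,
            auto.key = list(points = TRUE, lines = TRUE),
            layout = c(4, 3), ylab = "Concentration")
print(p)
```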