Why doesn't this survfit plot start at 100% - r

When I plot the survfit plot of data with two different censoring events, the overall plot (s0) doesn't start at time = 0, pstate = 100%, but jumps to 100% when the first censoring event occurs.
Here is an example in which the jump occurs at time 1, which is the first censoring event.
library(survival)
library(ggfortify)
library(tidyverse)
set.seed(1337)
dummy_data = tibble(time = sample.int(100, 100, replace = TRUE),
event = sample.int(3, 100, replace = TRUE))%>%
mutate(event = factor(event))
kaplanMeier <- survfit(Surv(time, event) ~ 1, data=dummy_data)
autoplot(kaplanMeier, facets = TRUE)

This does seem to be a bug in ggfortify. As a temporary fix, you can set the survival percentage at t = 0 to 100% by doing:
p <- autoplot(kaplanMeier, facets = TRUE)
p$layers[[1]]$data[1, c(5, 7, 8)] <- 1
p

Related

Difference in differences placebo test plot

How do I make graphs like this in R?
Let's say I have a dataset like this:
data <- tibble(date=sample(seq(as.Date("2006-01-01"),
as.Date("2019-01-01"), by="day"),
10000, replace = T),
treatment=sample(c(0,1),10000, replace= T),
after=ifelse(date>as.Date("2015-03-01"), 1, 0),
score=rnorm(10000)+ifelse(treatment*after==1, 0.2, 0)
)
and I am doing a difference-in-differences analysis:
did <- lm(score~treatment+after+treatment*after, data=data)
summary(did)
How can I make a plot with placebo tests?
Just use the plot_model function in sjPlot.
library(tidyverse)  # provides tibble() and ggplot2's ylim()
data <- tibble(date=sample(seq(as.Date("2006-01-01"),
as.Date("2019-01-01"), by="day"),
10000, replace = T),
treatment=sample(c(0,1),10000, replace= T),
after=ifelse(date>as.Date("2015-03-01"), 1, 0),
score=rnorm(10000)+ifelse(treatment*after==1, 0.2, 0)
)
did <- lm(score~treatment+after+treatment*after, data=data)
summary(did)
sjPlot::plot_model(did,vline = 'black',show.values = T) + ylim(-.25, .5)
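vline (presumably partially matched to plot_model's vline.color argument) sets the color of the reference line that marks a zero effect;
show.values controls whether the estimated values are printed on the plot.
You can check the details of the arguments of plot_model here.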
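The plot above shows the coefficients of a single model. If a placebo test is wanted, one common approach is to re-estimate the same interaction with fake cutoff dates on pre-treatment data only, then plot those estimates next to the real one. The following is only a sketch under stated assumptions: the placebo dates, the helper did_estimate, and the 1.96 normal-approximation interval are illustrative choices, not part of the original question or answer.

```r
library(ggplot2)

set.seed(42)
n <- 10000
true_cutoff <- as.Date("2015-03-01")
dat <- data.frame(
  date = sample(seq(as.Date("2006-01-01"), as.Date("2019-01-01"), by = "day"),
                n, replace = TRUE),
  treatment = sample(c(0, 1), n, replace = TRUE)
)
dat$score <- rnorm(n) + ifelse(dat$treatment == 1 & dat$date > true_cutoff, 0.2, 0)

# Hypothetical helper: fit the DiD model for one cutoff date and
# return the interaction estimate with its standard error
did_estimate <- function(cutoff, d) {
  d$after <- as.numeric(d$date > cutoff)
  fit <- lm(score ~ treatment * after, data = d)
  est <- coef(summary(fit))["treatment:after", ]
  data.frame(cutoff = cutoff,
             estimate = est[["Estimate"]],
             se = est[["Std. Error"]])
}

# Placebo cutoffs (assumed dates) are evaluated on pre-treatment data only
pre <- dat[dat$date <= true_cutoff, ]
placebo_cutoffs <- as.Date(c("2009-03-01", "2011-03-01", "2013-03-01"))
placebo <- do.call(rbind, lapply(placebo_cutoffs, did_estimate, d = pre))
actual <- did_estimate(true_cutoff, dat)

results <- rbind(placebo, actual)
results$type <- c(rep("placebo", nrow(placebo)), "actual")
results$lower <- results$estimate - 1.96 * results$se
results$upper <- results$estimate + 1.96 * results$se

# Point estimates with approximate 95% intervals; the dashed line marks zero effect
placebo_plot <- ggplot(results, aes(x = cutoff, y = estimate,
                                    ymin = lower, ymax = upper, colour = type)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange()
placebo_plot
```

If the design is sound, the placebo intervals should mostly cover zero while the actual estimate does not.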

ggsave ggsurvplot with risk.table

I am trying to save a ggsurvplot with risk.table using ggsave. However, the output of ggsave is always just the risk.table. I also tried this and this. Neither works.
library(data.table)
library(survival)
library(survminer)
OS <- c(c(1:100), seq(1, 75, length = 50), c(1:50))
dead <- rep(1, times = 200)
variable <- c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
dt <- data.table(OS = OS,
dead = dead,
variable = variable)
survfit <- survfit(Surv(OS, dead) ~ variable, data = dt)
ggsurvplot(survfit, data = dt,
risk.table = TRUE)
ggsave("test.png")
The main issue is that a ggsurvplot object is a list of plots. Hence, when using ggsave only the last plot or element of the list is saved.
There is already a GitHub issue on that topic with several workarounds. For example, one of the more recent suggestions works fine for me:
library(survival)
library(survminer)
OS <- c(c(1:100), seq(1, 75, length = 50), c(1:50))
dead <- rep(1, times = 200)
variable <- c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
dt <- data.frame(OS = OS,
dead = dead,
variable = variable)
survfit <- survfit(Surv(OS, dead) ~ variable, data = dt)
# add method to grid.draw
grid.draw.ggsurvplot <- function(x){
survminer:::print.ggsurvplot(x, newpage = FALSE)
}
p <- ggsurvplot(survfit, data = dt, risk.table = TRUE)
ggsave("test.png", p, height = 6, width = 6)
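Because the object is a list of plots rather than a single ggplot, another option that avoids ggsave altogether is to open a graphics device and print the whole object into it. This is a minimal sketch, not one of the workarounds from the GitHub issue; the file name test_device.png, the object name fit, and the size and resolution are arbitrary assumptions.

```r
library(survival)
library(survminer)

dt <- data.frame(
  OS = c(1:100, seq(1, 75, length = 50), 1:50),
  dead = rep(1, times = 200),
  variable = c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
)
fit <- survfit(Surv(OS, dead) ~ variable, data = dt)
p <- ggsurvplot(fit, data = dt, risk.table = TRUE)

# Open a PNG device, draw the curve and the risk table together, then close it
png("test_device.png", width = 6, height = 6, units = "in", res = 300)
print(p)
dev.off()
```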

How do I add the difference in proportions between the levels of a categorical variable in R using tbl_svysummary?

I would like to reproduce the following table (desired table). However, I can't figure out how to add the p-value next to the statistics. The p-value here compares the difference in proportions at each level between the two groups. I'm using this dataset from the questionr library in RStudio. I tried add_difference(), but it doesn't do what I expected. Here is the R code I have so far:
library(questionr)
library(survey)     # svydesign()
library(gtsummary)  # tbl_svysummary()
library(dplyr)      # %>% pipe
data(hdv2003)
d <- hdv2003
d$sport2[d$sport == "Oui"] <- TRUE
d$grpage <- cut(d$age, c(16, 25, 45, 65, 99), right = FALSE, include.lowest = TRUE)
d$etud <- d$nivetud
levels(d$etud) <- c(
"Primaire", "Primaire", "Primaire",
"Secondaire", "Secondaire", "Technique/Professionnel",
"Technique/Professionnel", "Supérieur"
)
d$etud <- forcats::fct_explicit_na(d$etud, "manquant")
d$sexe <- relevel(d$sexe, "Femme")
dw <- svydesign(ids = ~1, data = d, weights = ~poids)
dw %>%
tbl_svysummary(by = sexe,
include = c(sport,sexe , grpage, etud, relig, heures.tv ))

Exclude or set a unique color to the bottom triangle of a correlation matrix heatmap

I have created a correlation matrix of the mtcars dataset in plotly with:
# Load data
data("mtcars")
my_data <- mtcars[, c(1,3,4,5,6,7)]
# print the first 6 rows
head(my_data, 6)
res <- cor(my_data)
round(res, 2)
plot_ly(x=colnames(res), y=rownames(res), z = res, type = "heatmap") %>%
layout(
xaxis=list(tickfont = list(size = 30), tickangle = 45),
margin = list(l = 150, r = 50, b = 150, t = 0, pad = 4))
However, I was told that I shouldn't display the full, symmetrical heatmap, because it contains 50% redundant information (the triangles above and below the diagonal mirror each other). Is there an option within the heatmap plotting package we're using to grey out the bottom half of the heatmap (display it as a single, uniform grey color)?
One option would be to not use the complete correlation dataset and filter out only one half of the matrix using upper.tri. You could even consider setting its diag argument to TRUE to get rid of the arguably unnecessary diagonal ones.
How about the below?
# Load data
library(plotly)
data("mtcars")
my_data <- mtcars[, c(1,3,4,5,6,7)]
# print the first 6 rows
head(my_data, 6)
res <- cor(my_data)
res[upper.tri(res)] <- NA
round(res, 2)
plot_ly(x=colnames(res), y=rownames(res), z = res, type = "heatmap") %>%
layout(
xaxis=list(tickfont = list(size = 30), tickangle = 45),
margin = list(l = 150, r = 50, b = 150, t = 0, pad = 4))
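To illustrate the diag argument mentioned above, here is a small base R sketch showing what the masking does to the matrix before it is handed to the heatmap. The name masked is just an illustrative choice.

```r
# Correlation matrix for the same six mtcars columns
res <- cor(mtcars[, c(1, 3, 4, 5, 6, 7)])

# Blank out the upper triangle and, with diag = TRUE, the diagonal of ones as well
masked <- res
masked[upper.tri(masked, diag = TRUE)] <- NA

round(masked, 2)
```

Only the 15 cells below the diagonal keep their values; masked can be passed as z to plot_ly in place of res.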

data manipulation - R

I am struggling with data manipulation in R. My dataset consists of the variables type (5 levels), intensity (3 levels), and damage (continuous). I want to calculate the mean damage (damage1, damage2 and damage3 separately) with respect to intensity and type. In other words, I want to summarize the average damage by type and intensity. I have created this small reproducible example of my data:
type <- sample(seq(from = 1, to = 5, by = 1), size = 50, replace = TRUE)
intensity <- sample(seq(from = 1, to = 3, by = 1), size = 50, replace = TRUE)
damage1 <- sample(seq(from = 1, to = 50, by = 1), size = 50, replace = TRUE)
damage2 <- sample(seq(from = 1, to = 200, by = 1), size = 50, replace = TRUE)
damage3 <- sample(seq(from = 1, to = 500, by = 1), size = 50, replace = TRUE)
dat <- cbind(type, intensity, damage1, damage2, damage3)
Then, to manipulate the data, I used the pipe operator %>%, but my commands don't seem to work:
dat <- as.data.frame(dat)
dat %>%
filter(type == 1) %>%
group_by(intensity, damage) %>%
summarise(mean_damage = mean(Value))
I have read about multiple useful functions here:
efficient reshaping using data tables
manipulating data tables
Do Faster Data Manipulation using These 7 R Packages
But I wasn't able to make any progress. My questions are:
What is wrong with my code?
Am I even going in the right direction here?
Is there some alternative how to do this?
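One possible reading of the problem: the code above groups by a column named damage and averages a column named Value, neither of which exists in dat, and filter(type == 1) keeps only a single type. A minimal dplyr sketch of the intended summary could look like the following; the seed and the result name means are assumptions added for illustration.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(
  type = sample(1:5, size = 50, replace = TRUE),
  intensity = sample(1:3, size = 50, replace = TRUE),
  damage1 = sample(1:50, size = 50, replace = TRUE),
  damage2 = sample(1:200, size = 50, replace = TRUE),
  damage3 = sample(1:500, size = 50, replace = TRUE)
)

# Mean of each damage column for every observed type/intensity combination
means <- dat %>%
  group_by(type, intensity) %>%
  summarise(across(starts_with("damage"), mean), .groups = "drop")

means
```

A base R alternative that should give the same table is aggregate(cbind(damage1, damage2, damage3) ~ type + intensity, data = dat, FUN = mean).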
