How to add baseline prevalence to a survival/cumulative incidence curve in R

I have approximately 40 years of follow-up data from a cohort. I want to calculate and plot the cumulative incidence of an event (DM) in five groups (Clustnumber) and show all of them in the same plot. I made an initial survival curve using the code
fit = survfit(Surv(Folowup_time, DM_inc) ~ as.factor(Clustnumber), data=Co_followUp)
and then
plot(fit, conf.int=F, xlab = "Time in years", ylab = "Survival probability")
to get the following survival curve. Each line represents one group.
I converted it to a cumulative incidence plot using
plot(fit, conf.int=F, fun = function(x) 1-x, xlab = "Time in years", ylab = "Cumulative incidence")
and got the following plot.
Question: for the event DM, whose incidence over time is shown above, I have another column (DM_B) that gives its prevalence at baseline (the follow-up start date), and I want to show that prevalence in the plot. For example, instead of the cumulative incidence curve starting at zero, I want a line to start at 0.3 to show that 30% of individuals already had the event at baseline when follow-up started. How do I get graphs like that? Any help would be appreciated, as I am really struggling with it :(
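A minimal sketch of one way to do this, assuming the survfit() model is fitted only to people free of DM at baseline (DM_B == 0) and that each group's baseline prevalence p0 is the mean of DM_B in that group. If S(t) is the Kaplan-Meier survival among those at risk, the proportion of the whole group with DM by time t is p0 + (1 - p0)(1 - S(t)): it equals p0 at time 0 and reduces to the original 1 - S(t) when p0 = 0. The toy data and the helper prevalence_adjusted() are hypothetical, and the sketch ignores competing risks such as death.

```r
library(survival)

# Hypothetical toy data standing in for Co_followUp (same column names)
set.seed(1)
n <- 600
Co_followUp <- data.frame(
  Clustnumber  = sample(1:5, n, replace = TRUE),
  DM_B         = rbinom(n, 1, 0.3),   # prevalent DM at baseline
  Folowup_time = runif(n, 1, 40),
  DM_inc       = rbinom(n, 1, 0.3)    # incident DM during follow-up
)

# Baseline prevalence per group: share of everyone with DM_B == 1
p0 <- sapply(split(Co_followUp$DM_B, Co_followUp$Clustnumber), mean)

# Assumption: incidence is estimated only among people free of DM at baseline
at_risk <- subset(Co_followUp, DM_B == 0)
fit <- survfit(Surv(Folowup_time, DM_inc) ~ Clustnumber, data = at_risk)

# Hypothetical helper: maps each stratum's curve S(t) to
# p0 + (1 - p0) * (1 - S(t)), prepending the starting point (t = 0, S = 1)
prevalence_adjusted <- function(fit, p0) {
  grp <- rep(names(fit$strata), fit$strata)
  lapply(names(fit$strata), function(s) {
    key  <- sub("^.*=", "", s)   # "Clustnumber=3" -> "3"
    keep <- grp == s
    surv <- c(1, fit$surv[keep])
    data.frame(time = c(0, fit$time[keep]),
               cum  = p0[[key]] + (1 - p0[[key]]) * (1 - surv))
  })
}
curves <- prevalence_adjusted(fit, p0)

plot(NULL, xlim = c(0, 40), ylim = c(0, 1),
     xlab = "Time in years", ylab = "Cumulative proportion with DM")
for (i in seq_along(curves)) {
  # type = "s" draws the step function of the Kaplan-Meier estimate
  lines(curves[[i]]$time, curves[[i]]$cum, type = "s", col = i)
}
```

With the real data, drop the toy data frame and use Co_followUp directly; each group's line then starts at its own baseline prevalence instead of at zero.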

Related

Why are my survival curves not displaying as stratified categories?

First-time poster, so I hope I've given enough information here.
I'm trying to show my survival curves in 4 categories. The survival tables are stratified by my 4 categories, but the survival plot does not show these 4 categories and instead shows many different survival curves. What am I doing wrong here?
[Image: survival curve]
# categorise ADAMTS13 levels
TMAdata$ADAMTS13level.f<-cut(TMAdata$ADAMTS13level,
breaks=c(0.0,10.0,40.0, 60.0,160.0),
labels=c('0-10.0',
'10.1-40.0',
'40.1-60.0',
'60.1-160.0'))
summary(TMAdata$ADAMTS13level.f)
# use 10-40% ADAMTS13 level as reference point
TMAdata$ADAMTS13level.f = relevel(TMAdata$ADAMTS13level.f, ref="10.1-40.0")
# platelet recovery according to ADAMTS13 level (reference point is 10.1-40.0)
pltrecovery_ADAMTS13_table <- survfit(Surv(TMAdata$Daysplateletrecovery, TMAdata$Recoveredplatelets)~TMAdata$ADAMTS13level.f)
summary(pltrecovery_ADAMTS13_table)
plot(pltrecovery_ADAMTS13_table, conf.int=0,
xlab = "Days",
ylab = "Probability of not achieving platelet count =>150")
legend("topright", inset=0.03,
c("0-10.0",
"10.1-40.0",
"40.1-60.0",
"60.1-160.0"),
lty=1:2,
lwd=2,
cex=1)
The extra lines are confidence boundaries. Specifying conf.int=0 does not suppress confidence interval plotting. That's arguably incorrect behavior, and it's easy to demonstrate using the first example in ?survfit.formula. If you don't want the confidence boundaries, leave out the conf.int parameter altogether (with several strata the default is not to draw them), or set conf.int=FALSE.
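The legend will also show only two line types, because lty=1:2 is recycled across the four labels, and they probably won't match the line types of the plotted survival curves, which plot() draws with its default lty unless you pass one there too.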
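A sketch of the fix, using hypothetical toy data in place of TMAdata: pass the same lty/col vectors to both plot() and legend(), and take the legend labels from names(fit$strata). The labels matter because relevel() moves "10.1-40.0" to the first level, so the curves are no longer drawn in the order a hardcoded legend lists them. The sketch also uses the data= argument instead of TMAdata$... inside the formula, and adds include.lowest = TRUE so that a level of exactly 0 is kept in the first bin rather than becoming NA (an assumption about the intended binning).

```r
library(survival)

# Hypothetical toy data standing in for TMAdata
set.seed(42)
n <- 200
TMAdata <- data.frame(
  ADAMTS13level        = runif(n, 0, 160),
  Daysplateletrecovery = rexp(n, 1 / 10),
  Recoveredplatelets   = rbinom(n, 1, 0.8)
)

# include.lowest = TRUE keeps a value of exactly 0 in "0-10.0"
TMAdata$ADAMTS13level.f <- cut(TMAdata$ADAMTS13level,
  breaks = c(0, 10, 40, 60, 160), include.lowest = TRUE,
  labels = c("0-10.0", "10.1-40.0", "40.1-60.0", "60.1-160.0"))
TMAdata$ADAMTS13level.f <- relevel(TMAdata$ADAMTS13level.f, ref = "10.1-40.0")

fit <- survfit(Surv(Daysplateletrecovery, Recoveredplatelets) ~ ADAMTS13level.f,
               data = TMAdata)

# One line type and colour per stratum, reused in legend() so they match
n_str <- length(fit$strata)
ltys  <- seq_len(n_str)
cols  <- seq_len(n_str)
plot(fit, conf.int = FALSE, lty = ltys, col = cols, lwd = 2,
     xlab = "Days", ylab = "Probability of not achieving platelet count =>150")
legend("topright", inset = 0.03,
       legend = sub("ADAMTS13level.f=", "", names(fit$strata)),
       lty = ltys, col = cols, lwd = 2)
```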

Problems plotting exponential formula

I am trying to make an exponential plot of a variable. The GLM coefficient for this variable is very large (about 350 million). I was able to plot other variables with smaller coefficients with no issues. I have been setting the sequence interval smaller and smaller, but R keeps crashing when I try to plot it.
Any suggestions? I have already tried breaking up the data, with no luck.
My vectors are also very large numerics (18 MB).
chlautcnod<-seq.int(0, 2.45259, 0.000001)
chlautcnodline<- glmnodosaALL$coefficients[1] +
glmnodosaALL$coefficients[2]*mean(bornodosaAP$Chl_spring) +
glmnodosaALL$coefficients[3]*chlautcnod + glmnodosaALL$coefficients[4]*mean(bornodosaAP$Dist_coast) +
glmnodosaALL$coefficients[5]*mean(bornodosaAP$Chl_winter)+ glmnodosaALL$coefficients[6]*mean(bornodosaAP$Depth) +
glmnodosaALL$coefficients[7]*mean(bornodosaAP$Chl_yr_avg)+ glmnodosaALL$coefficients[8]*mean(bornodosaAP$Dist_complete_river) +
glmnodosaALL$coefficients[9]*mean(bornodosaAP$Temp_yr_min)+ glmnodosaALL$coefficients[10]*mean(bornodosaAP$Chl_summer)+
glmnodosaALL$coefficients[11]*mean(bornodosaAP$Chl_yr_max)+ glmnodosaALL$coefficients[12]*mean(bornodosaAP$SWH_summer)+
glmnodosaALL$coefficients[13]*mean(bornodosaAP$SWH_yr_min)+ glmnodosaALL$coefficients[14]*mean(bornodosaAP$SWH_spring)
gc(plot(exp(1)^chlautcnodline~chlautcnod, xlab = (expression(paste("Chlorophyll-α Autumn (mg/m"^"3"~")"))), ylab= "Probability of C. nodosa occurance",ylim=c(0,0.05),xlim=c(0.15,0.17), type="l", bty="l")
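One alternative sketch, assuming glmnodosaALL is a binomial (logistic) GLM, as the probability axis label suggests: let predict() build the curve on a coarse grid with type = "response", which applies the inverse link instead of a manual exp(). Note that exp(1)^x is just exp(x), that a step of 0.000001 over 0 to 2.45259 creates about 2.45 million points, and that exp() overflows to Inf once the linear predictor exceeds roughly 709, which a coefficient near 3.5e8 easily reaches. A coefficient that large can also be a symptom of (quasi-)complete separation. The toy data, the column names Chl_autumn and Depth, and the two-predictor model are hypothetical; with the real model, every other predictor would go into the newdata frame at its mean.

```r
# Hypothetical toy data; column names are illustrative only
set.seed(7)
n <- 500
dat <- data.frame(Chl_autumn = runif(n, 0, 2.5),
                  Depth      = runif(n, 1, 40))
dat$presence <- rbinom(n, 1, plogis(-3 + 1.2 * dat$Chl_autumn - 0.05 * dat$Depth))
fit <- glm(presence ~ Chl_autumn + Depth, family = binomial, data = dat)

# 500 grid points are plenty for a smooth line
grid <- data.frame(Chl_autumn = seq(0, 2.45259, length.out = 500),
                   Depth      = mean(dat$Depth))  # other predictors at their means

# type = "response" returns probabilities on the 0-1 scale via the inverse link
grid$p <- predict(fit, newdata = grid, type = "response")

plot(p ~ Chl_autumn, data = grid, type = "l", bty = "l",
     xlab = expression(paste("Chlorophyll-", alpha, " Autumn (mg/m"^3, ")")),
     ylab = "Probability of C. nodosa occurrence")
```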

Problems with scatterplot

I'm trying to visualize the correlation between two columns in my dataset.
I tried plot() and scatterplot(), but the result is not a readable graph.
For example I used this function:
scatter.smooth(x=Lifestyles$SLEEP_HOURS, y=Lifestyles$SUFFICIENT_INCOME, main="sleep hours and Income", xlab = "Sleep hours", ylab = "income, 1,2")
About the dataset:
I have about 12000 observations and 20 columns.
Both columns are numeric (integer).
Here I'm trying to look at the number of sleep hours and how many tasks are completed daily.
Link to my dataset: https://www.kaggle.com/ydalat/lifestyle-and-wellbeing-data
Thank you all in advance!
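With 12000 rows and only a handful of distinct integer values per column, most points land exactly on top of each other, which is a likely reason the scatterplot is unreadable. A minimal sketch of common workarounds, on hypothetical toy data with the same column names: jitter() plus semi-transparent points, a boxplot per income group, a count table, and a Spearman rank correlation (a suggested choice for ordinal integer scales, not something from the question).

```r
# Hypothetical toy data with the question's column names
set.seed(3)
n <- 12000
Lifestyles <- data.frame(
  SLEEP_HOURS       = sample(1:10, n, replace = TRUE),
  SUFFICIENT_INCOME = sample(1:2, n, replace = TRUE)
)

par(mfrow = c(1, 2))
# jitter() spreads tied integer points; alpha shows where they pile up
plot(jitter(Lifestyles$SLEEP_HOURS), jitter(Lifestyles$SUFFICIENT_INCOME),
     pch = 16, col = rgb(0, 0, 0, 0.05),
     xlab = "Sleep hours", ylab = "Income (1, 2)")
# For a two-level variable, one box per level is often clearer
boxplot(SLEEP_HOURS ~ SUFFICIENT_INCOME, data = Lifestyles,
        xlab = "Income (1, 2)", ylab = "Sleep hours")

# Counts of each combination, and a rank-based correlation
tab <- table(Lifestyles$SLEEP_HOURS, Lifestyles$SUFFICIENT_INCOME)
rho <- cor(Lifestyles$SLEEP_HOURS, Lifestyles$SUFFICIENT_INCOME,
           method = "spearman")
```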

Plot multiple trajectory lines with different max X values using for loop

I have population simulation data with 200 replications of 50 one-year iterations. I want to plot all 200 trajectories as lines (y = population size, x = year) on the same plot. The following code meets this need...
baseline<-read.csv("C:\\Users\\Chelsea Mitchell\\Desktop\\Poster materials\\chinook baseline raw.csv", header=T)
plot(baseline$time.step..year[1:50],
baseline$pop.size[1:50], type="l", main="baseline model"
, xlab= "Year", ylab= "Population size", ylim= c(0,2e+08))
for (i in 2:(length(baseline$time.step..year)/50))
{lines(baseline$time.step..year[(1+(i-1)*50):(i*50)],
baseline$pop.size[(1+(i-1)*50):(i*50)])}
[Image: appropriate plot without extinction]
But in some cases the population goes extinct, and the trajectory stops before year 50. How can I tell the for loop to end a trajectory line before the next simulation's data starts again at year 1?
[Image: problem plot with extinction]
Here is a constructed, simplified version of the data and code with the same issue. The maximum number of years is 10, so the for loop plots trajectory lines over "year" sequences of 1:10. When pop.size reaches 0, the replication stops, so its trajectory line should also stop.
rep <- c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
year <- c(1,2,3,4,1,2,3,4,5,6,1,2,3,4,5,6,7,8,9,10)
pop.size <- c(526,120,165,0,634,637,452,130,189,0,599,436,320,245,336,225,134,37,87,0)
extinct.pop <- data.frame(rep,year,pop.size)
plot(extinct.pop$year[1:10], extinct.pop$pop.size[1:10],
type="l", xlim= c(0,10))
for (i in 2:(length(extinct.pop$year)/10)){
lines(extinct.pop$year[(1+(i-1)*10):(i*10)],
extinct.pop$pop.size[(1+(i-1)*10):(i*10)])}
Thank you for your help!
Your code is plotting just one line.
One alternative is using ggplot2 and you won't need the loop. Do you want something like this?
library(ggplot2)
ggplot(data=extinct.pop, aes(x=year, y=pop.size, group = rep)) + geom_line()
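A base-R alternative that also avoids the fixed-length indexing: split() the data by replicate, so each trajectory is drawn with its own length. If the real CSV has no replicate column, one can be derived with cumsum(year == 1), assuming every replicate restarts at year 1 as described in the question; the column name rep_id is hypothetical.

```r
# Toy data from the question
rep <- c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
year <- c(1,2,3,4,1,2,3,4,5,6,1,2,3,4,5,6,7,8,9,10)
pop.size <- c(526,120,165,0,634,637,452,130,189,0,599,436,320,245,336,225,134,37,87,0)
extinct.pop <- data.frame(rep, year, pop.size)

# Hypothetical column: a new replicate starts wherever year resets to 1
extinct.pop$rep_id <- cumsum(extinct.pop$year == 1)

# One data frame per replicate, whatever its length
trajectories <- split(extinct.pop, extinct.pop$rep_id)

plot(NULL, xlim = range(extinct.pop$year), ylim = range(extinct.pop$pop.size),
     xlab = "Year", ylab = "Population size")
for (tr in trajectories) {
  lines(tr$year, tr$pop.size)  # each line ends where its replicate ends
}
```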

The variance of forecasted time series data

I used forecast() on a model fitted to the first 1526 data points of my data series VIX to estimate the final 300 data points. I want to measure the goodness of fit as the variance of the difference between the actual historical data and the forecast. Is there an easy way to do this in R?
The code currently is
r_vix_3b=diff(log(VIX[,"Close"]))
num_train=1526
h=300
plot_start=1300
plot_labels=126 # interval between x-axis major tick marks
data_fcst_pts=num_train:(num_train+h)
fit_1step=auto.arima(r_vix_3b[1:num_train])
forecast_1step = forecast(fit_1step, h=h)
plot(forecast_1step, xaxt="n", xlim=c(plot_start, num_train+h), ylim=c(-0.3, 0.3)) #ylim=range(r_vix)
points(data_fcst_pts, r_vix_3b[data_fcst_pts],col="blue", type="l", pch=16)
axis(1, at=seq(0,length(r_vix_3b)+h-1,plot_labels), labels=VIX$Date[seq(2, length(r_vix_3b)+h,plot_labels)] )
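# forecasts cover points num_train+1 ... num_train+h; compare them with the point forecasts ($mean)
diff_1_step = as.numeric(r_vix_3b[(num_train+1):(num_train+h)]) - as.numeric(forecast_1step$mean)
var(diff_1_step)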
See ?accuracy in the forecast package.
I guess in your case it would be something like:
acc <- accuracy(forecast_1step, as.numeric(r_vix_3b[(num_train+1):(num_train+h)]))
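accuracy() reports error summaries such as RMSE and MAE, but the variance asked for can also be computed directly from the forecast errors. A base-R sketch on simulated data, swapping stats::arima() and predict() for auto.arima() and forecast() so that it runs without extra packages; the AR(1) series and model order are assumptions for illustration. Keep in mind that var() ignores any systematic bias in the errors, whereas the mean squared error includes it.

```r
# Simulated stand-in for r_vix_3b: an AR(1) series of the same length
set.seed(10)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 1826))
num_train <- 1526
h <- 300

fit <- arima(y[1:num_train], order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = h)$pred           # point forecasts for 1527..1826

actual <- y[(num_train + 1):(num_train + h)]
err    <- actual - as.numeric(fc)               # forecast errors

err_var <- var(err)                             # variance of the errors
mse     <- mean(err^2)                          # includes bias, unlike var()
rmse    <- sqrt(mse)
```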
