generate difference between 2 survdiff objects - r

Is it possible to subtract one survdiff object from another in R, using the survival package?
I want to plot a figure that shows in which intervals one survival curve is higher/lower than the other and by how much.

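One possible solution, assuming survA and survB are survfit objects (summary() returns survival estimates at chosen times for survfit fits, not for survdiff test results):
interval <- 0:2500
# choose a different time interval if you want
# extend = TRUE keeps one row per requested time, even beyond the last observed time
sumA <- summary(survA, times = interval, extend = TRUE)
sumB <- summary(survB, times = interval, extend = TRUE)
both <- data.frame(time = interval, A = sumA$surv, B = sumB$surv)
both$diff <- both$B - both$A
# or both$diff <- both$A - both$B
plot(x = both$time, y = both$diff, type = "l")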
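To make the approach above reproducible, here is a minimal sketch. It assumes the two curves are Kaplan-Meier fits created with survfit(); the lung dataset, the split by sex, and the 0 to 1000 day grid are illustrative choices that are not part of the original question.

```r
library(survival)  # recommended package, shipped with most R installations

# Illustrative data (assumption): two Kaplan-Meier fits on the lung data, split by sex
survA <- survfit(Surv(time, status) ~ 1, data = subset(lung, sex == 1))
survB <- survfit(Surv(time, status) ~ 1, data = subset(lung, sex == 2))

# Evaluate both curves on a common time grid
interval <- seq(0, 1000, by = 10)
sumA <- summary(survA, times = interval, extend = TRUE)
sumB <- summary(survB, times = interval, extend = TRUE)

both <- data.frame(time = interval, A = sumA$surv, B = sumB$surv)
both$diff <- both$B - both$A

# Step plot of the difference; the dashed line marks "no difference"
plot(both$time, both$diff, type = "s",
     xlab = "time", ylab = "S_B(t) - S_A(t)")
abline(h = 0, lty = 2)
```

Values above the dashed line mark intervals where curve B is higher than curve A, and the height gives the size of the gap.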

Related

Moving average on several time series using ggplot

Hi, I am trying to plot several time series with a 12-month moving average.
Here is an example with two time series of flower and seed densities (I have many more time series to work on).
#datasets
library(lubridate) # provides ymd(), used below
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12)) # transform to time-series object as it is necessary to run the ma function
return(ma(log(ts_x + 1), order = 12, centre = T))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
The function ma_12 works correctly. The problem arises when I try to plot both time series (Flower and Seeds) with ggplot: I cannot define the two taxa as separate time series and apply a moving average to each. It seems to be related to stat_summary.
Any help would be more than welcome! Thanks in advance.
Note: The following link is quite useful but does not directly help me, because I want to apply a specific function and plot the result by the levels of a grouping variable. So far I have not found a solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
f <- cbind(f,time(f))
s <- cbind(s,time(s))
serie <- data.frame(rbind(f,s),
taxon=c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f) - 1 # back-transform: ma_12 averaged log(x + 1)
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))
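When there are many series, the per-taxon steps above can be applied to all groups at once. The sketch below is one way to do that in base R only; it assumes that a centred 2x12 moving average corresponds to the weights c(0.5, rep(1, 11), 0.5) / 12, and the names ma_2x12, ma_log and month are made up for this example.

```r
# Rebuild the example data (month replaces the ymd column of the question)
taxon   <- rep(c("Flower", "Seeds"), each = 36)
density <- c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36))
month   <- rep(seq(as.Date("2000-01-01"), by = "month", length.out = 36), 2)
df <- data.frame(taxon, density, month)

# Centred 2x12 moving average of log(x + 1), written with stats::filter
w <- c(0.5, rep(1, 11), 0.5) / 12
ma_2x12 <- function(x) as.numeric(stats::filter(log(x + 1), w, sides = 2))

# ave() applies the function within each taxon and keeps the original row order
df$ma_log <- ave(df$density, df$taxon, FUN = ma_2x12)
df$ma     <- exp(df$ma_log) - 1  # back to the density scale

# Raw points plus one smoothed line per taxon
cols <- c(Flower = "red", Seeds = "blue")
plot(df$month, df$density, col = cols[df$taxon], pch = 16,
     xlab = "", ylab = "density")
for (tx in names(cols)) {
  d <- df[df$taxon == tx, ]
  lines(d$month, d$ma, col = cols[tx])
}
```

The first and last six months of each series are NA because the window does not fit there. The resulting data frame can also be passed to ggplot with geom_line(aes(y = ma, colour = taxon)).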

ts.plot() not plotting Time Series data against custom x-axis

I am having issues with trying to plot some Time Series data; namely, trying to plot the date (increments in months) against a real number (which represents price).
I can plot the data with plot(months, mydata) without any issue, but the result is a scatter plot.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this with ts.plot(ts(months, mydata)), but that gives a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length.
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
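If adding a package is not an option, a base R sketch along the same lines is shown below. It assumes the series is strictly monthly and starts in March 1998, so that ts() can carry the time index itself. The straight line from ts(months, mydata) probably arises because the first argument of ts() is the data, so the dates themselves were plotted.

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

# Option 1: a regular monthly ts object; the time index comes from start/frequency
x <- ts(mydata, start = c(1998, 3), frequency = 12)
ts.plot(x, ylab = "price")

# Option 2: keep the Date vector as the x-axis and draw lines instead of points
plot(months, mydata, type = "l", xaxt = "n", xlab = "", ylab = "price")
labs <- seq(min(months), max(months), length.out = 10)
axis(1, at = labs, labels = format(labs, "%m/%Y"))
```

Option 1 is the shortest route when the data are evenly spaced; option 2 also works for irregular dates.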

Plot time series knowing only start time/date and sampling periods

I want to plot a density time series with following data:
density vector (4,2,5,8,4,6,4)
sampling period vector (unit: seconds) (2,2,2,2,3,2,2)
As you can see, the sampling period is not constant. I only know the starting date and time.
I somehow need to assign the start date and time to the first measurement and then compute the dates and times of the subsequent measurements, but I don't know exactly how to code it.
Try first converting the desired vector to a ts object, given an initial start time and the cumulative sum of the periods.
I assumed that you sample a continuous process (there are no gaps or dead times between samples).
require (lubridate)
require (tidyr)
require (ggplot2)
require (ggfortify)
require (timetk)
density <- c (4,2,5,8,4,6,4)
seconds <- c (2,2,2,2,3,2,2)
starttime <- 0
time <- starttime + cumsum (seconds)
df <- as.data.frame (cbind (time, seconds, density))
df$time <- as_datetime(df$time)
df$ts <- tk_ts (df, select = density)
autoplot (df$ts, ts.geom = 'bar', fill = 'blue')
Plot the density against the cumulative sum of the seconds added to the start.
dens <- c(4,2,5,8,4,6,4)
secs <- c(2,2,2,2,3,2,2)
st <- as.POSIXct("2000-01-01 00:00:00")
plot(st + cumsum(secs), dens, xlab = "", type = "l")
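The line above places the first measurement one period after the start. If the first measurement should fall exactly on the start time, as the question asks, the offsets can be shifted by one position. This is a sketch under the assumption that secs[i] is the gap between measurement i and measurement i + 1, so the last period is not needed; the names offsets and obs are made up for this example.

```r
dens <- c(4, 2, 5, 8, 4, 6, 4)
secs <- c(2, 2, 2, 2, 3, 2, 2)
st   <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

# Offset 0 for the first measurement, then the running sum of the earlier gaps
offsets <- c(0, cumsum(head(secs, -1)))
obs <- data.frame(time = st + offsets, density = dens)

plot(obs$time, obs$density, type = "b", xlab = "", ylab = "density")
```

If each period instead precedes its measurement, st + cumsum(secs) is the right choice.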

Compute smoothed mean on time series

I have got a data.frame where one column represents dates in years and the other column observations of e.g. sea level in mm.
I need to calculate the 10-year smoothed mean.
Here some fake data:
x = rnorm(1:100) #annual sea level rise
date = seq(1801,1900) #years from 1801 to 1900
df = data.frame(date,x) #create data.frame
Is there any R function that could help?
Is the smoothed mean the same as the moving average?
Thanks for any help and/or suggestion
The moving average is the simplest case of a smoothed mean, which is widely used in climate science. R's filter function may be a convenient way to resolve your issue:
# sample data
x <- rnorm(1:100)
date <- seq(1801,1900)
df <- data.frame(date,x)
# coefficients for moving average are the simplest ones
f10 <- rep(1/10,10)
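# stats::filter avoids masking by dplyr::filter; sides = 1 gives a trailing (past-only) average
df[,"x_10ma"] <- stats::filter(df$x, f10, sides = 1)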
# fast check
plot(x = df$date, y = df$x, col="red")
points(x = df$date, y = df$x_10ma,col="blue")
More advanced smoothing options are provided, e.g. by the 'TTR' or 'smooth' packages.
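To illustrate what the sides argument does, the sketch below compares a trailing and a centred 10-year mean on simulated data. The seed and the variable names trailing and centred are arbitrary choices for this example.

```r
set.seed(1)
x    <- rnorm(100)   # simulated annual values
date <- 1801:1900
w    <- rep(1 / 10, 10)

# sides = 1: each value is the mean of the current and the nine previous years
trailing <- as.numeric(stats::filter(x, w, sides = 1))
# sides = 2: the window is placed around the current year
# (with an even window length it cannot be perfectly symmetric)
centred  <- as.numeric(stats::filter(x, w, sides = 2))

plot(date, x, type = "l", col = "grey", xlab = "year", ylab = "x")
lines(date, trailing, col = "blue")
lines(date, centred, col = "red")
```

The trailing version lags behind turning points in the data, while the centred version follows them more closely but cannot be computed for the most recent years.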

2D Histogram in R: Converting from Count to Frequency within a Column

Would appreciate help with generating a 2D histogram of frequencies, where frequencies are calculated within a column. My main issue: converting from counts to column based frequency.
Here's my starting code:
# expected packages
library(ggplot2)
library(plyr)
# generate example data corresponding to expected data input
x_data = sample(101:200,10000, replace = TRUE)
y_data = sample(1:100,10000, replace = TRUE)
my_set = data.frame(x_data,y_data)
# define x and y interval cut points
x_seq = seq(100,200,10)
y_seq = seq(0,100,10)
# label samples as belonging within x and y intervals
my_set$x_interval = cut(my_set$x_data,x_seq)
my_set$y_interval = cut(my_set$y_data,y_seq)
# determine count for each x,y block
xy_df = ddply(my_set, c("x_interval","y_interval"),"nrow") # still need to convert for use with dplyr
# convert from count to frequency based on formula: freq = count/sum(count in given x interval)
################ TRYING TO FIGURE OUT #################
# plot results
fig_count <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = nrow)) # count
fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = freq)) # frequency
I would appreciate any help in how to calculate the frequency within a column.
Thanks!
jac
EDIT: I think the solution will require the following steps
1) Calculate and store overall counts for each x-interval factor
2) Divide the individual bin count by its corresponding x-interval factor count to obtain frequency.
I'm not sure how to carry this out, though.
If you want to normalize over the x_interval values, you can create a column with the total count per interval and then divide by it. I must admit I'm not a ddply wiz, so there may be an easier way, but I would do
xy_df$xnrows<-with(xy_df, ave(nrow, x_interval, FUN=sum))
then
fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) +
geom_tile(aes(fill = nrow/xnrows))
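The same normalization can also be obtained without plyr. The sketch below uses table() and prop.table() from base R; the seed and the names counts, freq_tab and freq_df are made up for this example.

```r
set.seed(42)
x_data <- sample(101:200, 10000, replace = TRUE)
y_data <- sample(1:100, 10000, replace = TRUE)

x_interval <- cut(x_data, seq(100, 200, 10))
y_interval <- cut(y_data, seq(0, 100, 10))

# Counts per (x, y) block, then divide each row by the total of its x interval
counts   <- table(x_interval, y_interval)
freq_tab <- prop.table(counts, margin = 1)

# Long format with one row per block, ready for plotting
freq_df <- as.data.frame(freq_tab, responseName = "freq")
head(freq_df)
```

Within each x interval the frequencies sum to 1, and freq_df can be passed to ggplot with geom_tile(aes(fill = freq)) as in the question.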
