I'm trying to calculate the doubling time of cells using a scatterplot. This is my dataframe:
df = data.frame("x" = 1:5, "y" = c(246, 667, 1715, 4867, 11694))
and I've graphed this dataframe using this code:
plot(df$x, df$y, xlab = "days", ylab = "cells mL -1")
Does anyone know how to calculate the doubling time of these cells using the graph? The equation for doubling time is ln(2) / rate constant.
Plot log2(y) vs. x, suppressing the Y axis so that we can build a nicer one with axis(); the Y axis label is also improved slightly. Then calculate the doubling time. Note that the formula for doubling time in the question works if the rate constant is the slope of the log(y) ~ x regression line, but if we use the regression log2(y) ~ x, i.e. log2 instead of log, then the correct formula is just 1/slope. We show both below.
plot(df$x, log2(df$y), xlab = "days", ylab = "cells/mL", yaxt = "n")
s <- 1:round(log2(max(df$y)))
axis(2, s, parse(text = sprintf("2^%d", s)))
fm <- lm(log2(y) ~ x, df)
abline(fm)
doubling.time <- 1/coef(fm)[[2]]
doubling.time
## [1] 0.7138163
log(2)/coef(lm(log(y) ~ x, df))[[2]] # same
## [1] 0.7138163
legend("topleft", paste("doubling time:", round(doubling.time, 3), "days"), bty = "n")
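For readers who want to see why the two formulas agree, here is a short derivation. It assumes pure exponential growth; the symbols N, N_0, k, b and T_d are notation introduced here for illustration and do not appear in the question.

```latex
\begin{aligned}
N(t) &= N_0\, e^{k t} \\
\ln N(t) &= \ln N_0 + k\, t
  && \text{(slope of } \log(y) \sim x \text{ is } k\text{)} \\
\log_2 N(t) &= \log_2 N_0 + \frac{k}{\ln 2}\, t
  && \text{(slope of } \log_2(y) \sim x \text{ is } b = k / \ln 2\text{)} \\
T_d &= \frac{\ln 2}{k} = \frac{1}{b}
\end{aligned}
```

So dividing ln(2) by a slope is only appropriate when that slope comes from the natural-log fit; with the log2 fit the doubling time is simply the reciprocal of the slope.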
You can visualize the constant rate of change with ggplot2 by scaling the y-axis accordingly:
library(dplyr)
library(ggplot2)
library(broom)
library(scales)
df = data.frame("x" = 1:5, "y" = c(246, 667, 1715, 4867, 11694))
fit <- lm(data = df, log2(y) ~ x)
tidy_fit <- tidy(fit) %>%
  mutate(x = 3, y = 2048)
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  scale_y_continuous(name = "log2(y)",
                     trans = 'log2',
                     breaks = trans_breaks("log2", function(x) 2^x),
                     labels = trans_format("log2", math_format(2^.x))) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_text(tidy_fit,
            mapping = aes(
              x = x,
              y = y,
              label = paste0("log2(y) = ", round(estimate[1], 2), " + ", round(estimate[2], 2), "x",
                             "\n", "Doubling Time: ", round(1 / tidy_fit$estimate[2], 2), " Days")
            ),
            nudge_x = -1,
            nudge_y = 0.5,
            hjust = 0)
Created on 2020-02-03 by the reprex package (v0.3.0)
You can plot the points to show the exponential rise and then linearize the function by applying log2 to the y values. With that you can plot and do a linear fit:
df = data.frame("x" = 1:5, "y" = c(246, 667, 1715, 4867, 11694))
plot(df) # plot not displayed
plot(df$x, log2(df$y))
abline(lm(log2(y)~x,df))
lm(log2(y)~x,df)
#-------------------
Call:
lm(formula = log2(y) ~ x, data = df)
Coefficients:
(Intercept) x
6.563 1.401 #the x-coefficient is the slope of the line
#---------------------
1/coef(lm(log2(y)~x,df))[["x"]] # with a log2 fit the doubling time is 1/slope, not log(2)/slope
#[1] 0.7138163
Checking against the original (not displayed) plot, that does look like a sensible estimate of the doubling time. Be sure to cite this posting if this happens to be a homework problem.
If I were tasked with using the original graph, I would first draw an exponential curve by hand. I would then draw two horizontal lines at y = 2000 and y = 4000, drop perpendicular lines from their intersections with the curve, and read off the difference in x values on the horizontal axis. That is what I meant by my comment above that I "checked" the 1/x-coef value for sensibility.
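The graphical reading described above (where does the curve cross y = 2000 and y = 4000?) can also be mimicked numerically. This is only a sketch: it uses the fitted line in place of a hand-drawn curve, and the helper name x_at is made up for illustration.

```r
df <- data.frame(x = 1:5, y = c(246, 667, 1715, 4867, 11694))
fm <- lm(log2(y) ~ x, data = df)
a <- coef(fm)[[1]]  # intercept on the log2 scale
b <- coef(fm)[[2]]  # slope on the log2 scale (doublings per day)

# Hypothetical helper: the day at which the fitted curve reaches a given cell count
x_at <- function(y) (log2(y) - a) / b

# Time needed to go from 2000 to 4000 cells/mL, i.e. one doubling
doubling_from_curve <- x_at(4000) - x_at(2000)
doubling_from_curve  # algebraically equal to 1 / b
```

Because 4000 is exactly twice 2000, the difference reduces to 1/b whatever the intercept is, which is why any pair of levels y and 2y would give the same reading.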
Related
I am beginning with ML in R, and I really like the idea of visualizing the results of my calculations. I am wondering how to plot a prediction.
library("faraway")
library(tibble)
library(stats)
data("sat")
df<-sat[complete.cases(sat),]
mod_sat_sal <- lm(total ~ salary, data = df)
new_teacher <- tibble(salary = 40)
predict(mod_sat_sal, new_teacher)
Expected result:
Data and Regression Model
data(sat, package = "faraway")
df <- sat[complete.cases(sat), ]
model <- lm(total ~ salary, data = df)
Method (1) : graphics way
# Compute the confidence band
x <- seq(min(df$salary), max(df$salary), length.out = 300)
x.conf <- predict(model, data.frame(salary = x),
                  interval = 'confidence')
# Plot
plot(total ~ salary, data = df, pch = 16, xaxs = "i")
polygon(c(x, rev(x)), c(x.conf[, 2], rev(x.conf[, 3])),
        col = gray(0.5, 0.5), border = NA)
abline(model, lwd = 3, col = "darkblue")
Method (2) : ggplot2 way
library(ggplot2)
ggplot(df, aes(x = salary, y = total)) +
  geom_point() +
  geom_smooth(method = "lm")
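Both methods draw the fitted line and its confidence band, but the question also asked about a single prediction (salary = 40). A minimal base-graphics sketch of marking that one prediction follows; the choice of a 95% prediction interval and the red styling are assumptions made here, not part of the original answer.

```r
data(sat, package = "faraway")
df <- sat[complete.cases(sat), ]
model <- lm(total ~ salary, data = df)

# Predicted value for the new observation, with a prediction interval
new_teacher <- data.frame(salary = 40)
pred <- predict(model, new_teacher, interval = "prediction")

# Widen the y-range so the whole interval is visible
plot(total ~ salary, data = df, pch = 16,
     ylim = range(df$total, pred))
abline(model, lwd = 3, col = "darkblue")

# Mark the prediction and draw its interval as a vertical bar
points(new_teacher$salary, pred[, "fit"], pch = 19, col = "red", cex = 1.5)
arrows(new_teacher$salary, pred[, "lwr"],
       new_teacher$salary, pred[, "upr"],
       angle = 90, code = 3, length = 0.05, col = "red")
```

A prediction interval is wider than the confidence band because it also accounts for the scatter of individual observations around the line.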
I have made a plot of a polynomial function: y = x^2 - 6*x + 9
with a series of several points in a sequence + minor standard error in y. I used these points to construct a spline model for that function from the raw data points, and then I calculated the derivative from the spline model with R's predict() function and then I added both of the spline curves to the plot.
By the way, the expected derivative function is this: dy / dx = 2*x - 6
The original function I colored blue and the 1st derivative function I colored red. I wish to add legends to these plots, but I'm finding that difficult since I did not assign any points to the plots, as I declared the data-frames within the geom_smooth() functions.
The code I'm using is this:
library(ggplot2)
# Plot the function: f(x) = x^2 - 6x + 9
# with a smooth spline:
# And then the derivative of that function from predicted values of the
# smoothed spline: f ' (x) = 2*x - 6
# Get a large sequence of x-values:
x <- seq(from = -10, to = 10, by = 0.01)
# The y-values are a function of each x value.
y <- x^2 - 6*x + 9 + rnorm(length(x), 0, 0.5)
# Fit the curve to a model which is a smoothed spline.
model <- smooth.spline(x = x, y = y)
# Predict the 1st derivative of this smoothed spline.
f_x <- predict(model, x = seq(from = min(x), to = max(x), by = 1), deriv = 1)
# Plot the smoothed spline of the original function and the derivative with respect to x.
p <- ggplot() + theme_bw() + geom_smooth(data = data.frame(x,y), aes(x = x, y = y), method = "loess", col = "blue", se = TRUE) + geom_smooth(data = data.frame(f_x$x, f_x$y), aes(x = f_x$x, y = f_x$y), method = "loess", col = "red", se = TRUE)
# Set the bounds of the plot.
p <- p + scale_x_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-5, 10)) + scale_y_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-10, 10))
# Add some axis labels
p <- p + labs(x = "x-axis", y = "y-axis", title = "Original Function and predicted derivative function")
p <- p + scale_fill_manual(values = c("blue", "red"), labels = c("Original Function", "Derivative Function with respect to x"))
print(p)
I was hoping that I could add the legend with scale_fill_manual(), but my attempt does not add a legend to the plot. Essentially, the plot I get looks like this, minus the messy legend that I added in Paint. I would like that legend, thank you.
I did this because I want to show my chemistry instructor that I can accurately measure the heat capacity just from differential scanning calorimetry data points; I believe the heat capacity is just the first derivative of the heat flow vs. temperature plot with respect to temperature.
So I tried to make a plot showing the original function overlaid with its first derivative with respect to x, showing that the first derivative computed only from a spline curve fitted to raw data points reliably reproduces the expected line dy / dx = 2 * x - 6, which it does.
I just want to add that legend.
Creating a single data frame with your data and mapping color within the aesthetics is the most common way of doing this.
df <- rbind(
  data.frame(data='f(x)', x=x, y=y),
  data.frame(data='f`(x)', x=f_x$x, y=f_x$y))
p <- ggplot(df, aes(x,y, color=data)) + geom_smooth(method = 'loess')
p <- p + scale_x_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-5, 10)) + scale_y_continuous(breaks = scales::pretty_breaks(n = 20), limits = c(-10, 10))
p <- p + labs(x = "x-axis", y = "y-axis", title = "Original Function and predicted derivative function")
p <- p + scale_color_manual(name = "Functions", values = c("blue", "red"), labels = c("Original Function", "Derivative Function with respect to x"))
print(p)
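If you would rather keep the two separate geom_smooth() layers from the question, a legend can still be produced by mapping colour to a constant label inside aes() and then assigning the actual colours with a named vector. This is a sketch of that alternative, not the approach above; the fixed seed and the label strings are assumptions added here. Note that a fixed col = "blue" outside aes() would override the mapping, and that scale_fill_manual() has no effect because nothing is mapped to fill.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible
x <- seq(from = -10, to = 10, by = 0.01)
y <- x^2 - 6*x + 9 + rnorm(length(x), 0, 0.5)
model <- smooth.spline(x = x, y = y)
f_x <- predict(model, x = seq(from = min(x), to = max(x), by = 1), deriv = 1)

p <- ggplot() + theme_bw() +
  # colour is mapped to a label inside aes(), which is what creates the legend
  geom_smooth(data = data.frame(x = x, y = y),
              aes(x = x, y = y, colour = "Original function"),
              method = "loess", formula = y ~ x) +
  geom_smooth(data = data.frame(x = f_x$x, y = f_x$y),
              aes(x = x, y = y, colour = "Derivative with respect to x"),
              method = "loess", formula = y ~ x) +
  # the named vector ties each label to its colour regardless of ordering
  scale_colour_manual(name = "Functions",
                      values = c("Original function" = "blue",
                                 "Derivative with respect to x" = "red"))
print(p)
```

Naming the entries of values avoids depending on the alphabetical order of the labels.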
I'm trying to add error bars to the line graphs produced by the following script.
#####Plot FW roses####
ntreatments <- max(df$Treats)
#get the range for the x and y axis
x2range <- range(df$Days)
y2range <- range(df$FWs)
# set up plot
plot(x2range, y2range,
     type = "n",
     xlab= "Time (days)",
     ylab= "Fresh weight (g)")
colors <- c("blue", "red", "black", "darkgreen", "Purple")
linetype <- c(1:ntreatments)
plotchar <- seq(18, 18+ntreatments, 1)
# add lines
for(i in 1:ntreatments) {
  tr2 <- subset(df, Treats==i)
  lines(tr2$Days, tr2$FWs, type="b",
        lwd=1.5,
        lty=linetype[i],
        col=colors[i],
        pch=plotchar[i])
}
# add a title and subtitle
title("Fresh weight")
# add a legend
legend(x2range[1],
       y2range[2],
       ncol = 2,
       1:ntreatments,
       cex=0.8,
       col=colors,
       pch=plotchar,
       lty=linetype,
       title="Treatment")
I have tried errbar(x2range, y2range, y2range+df$sd, y2range-df$sd)
But the result is that all error bars gather at the beginning and the end of the graph and not at the corresponding y coordinates.
How can I solve this?
Since you don't provide any sample data, here is a simple example using some simulated data.
# Generate some sample data
set.seed(2017)
x <- seq(0, 1, length.out = 10)
y <- 1 + 4 * x + runif(10)
dy <- sqrt(y)
df <- data.frame(x = x, y = y, dy = dy)
Plot in base R and add error bars using segments.
# Plot in base R
plot(df$x, df$y, ylim = c(0, 8), type = "l")
segments(df$x, df$y - df$dy, df$x, df$y + df$dy)
Or plot using ggplot2.
# Plot in ggplot
library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  geom_errorbar(aes(ymin = y - dy, ymax = y + dy))
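To connect this back to the script in the question: x2range and y2range each hold only two values (the minimum and maximum), which is why the bars ended up at the two ends of the graph. The bars need the per-point coordinates, drawn inside the same loop as the lines. The sketch below assumes the column names Days, FWs, Treats and sd from the question, but the data itself is simulated here because none was provided.

```r
# Simulated stand-in for the question's data (assumption: 3 treatments, 4 time points)
set.seed(1)
df <- expand.grid(Days = c(0, 3, 6, 9), Treats = 1:3)
df$FWs <- 20 + 0.5 * df$Days * df$Treats + rnorm(nrow(df), 0, 0.3)
df$sd  <- runif(nrow(df), 0.5, 1.5)

# Empty plot; the y-range includes the error bars so none are clipped
plot(range(df$Days), range(df$FWs - df$sd, df$FWs + df$sd),
     type = "n", xlab = "Time (days)", ylab = "Fresh weight (g)")

for (i in unique(df$Treats)) {
  tr <- subset(df, Treats == i)
  lines(tr$Days, tr$FWs, type = "b", lwd = 1.5, lty = i, col = i)
  # One bar per data point: arrows() with flat heads at both ends
  arrows(tr$Days, tr$FWs - tr$sd, tr$Days, tr$FWs + tr$sd,
         angle = 90, code = 3, length = 0.03, col = i)
}
```

segments(), as used above, works equally well if you do not want the end caps.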
I was wondering how I can modify the following code to have a plot something like
data(airquality)
library(quantreg)
library(ggplot2)
library(data.table)
library(devtools)
# source Quantile LOESS
source("https://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")
airquality2 <- na.omit(airquality[ , c(1, 4)])
#'' quantreg::rq
rq_fit <- rq(Ozone ~ Temp, 0.95, airquality2)
rq_fit_df <- data.table(t(coef(rq_fit)))
names(rq_fit_df) <- c("intercept", "slope")
#'' quantreg::lprq
lprq_fit <- lapply(1:3, function(bw){
  fit <- lprq(airquality2$Temp, airquality2$Ozone, h = bw, tau = 0.95)
  return(data.table(x = fit$xx, y = fit$fv, bw = paste0("bw=", bw), fit = "quantreg::lprq"))
})
#'' Quantile LOESS
ql_fit <- Quantile.loess(airquality2$Ozone, jitter(airquality2$Temp), window.size = 10,
                         the.quant = .95, window.alignment = c("center"))
ql_fit_df <- data.table(x = ql_fit$x, y = ql_fit$y.loess, bw = "bw=1", fit = "Quantile LOESS")
I want to have all these fits in a plot.
geom_quantile can calculate quantiles using the rq method internally, so we don't need to create the rq_fit_df separately. However, the lprq and Quantile LOESS methods aren't available within geom_quantile, so I've used the data frames you provided and plotted them using geom_line.
In addition, to include the rq line in the color and linetype mappings and in the legend we add aes(colour="rq", linetype="rq") as a sort of "artificial" mapping inside geom_quantile.
library(dplyr) # For bind_rows()
ggplot(airquality2, aes(Temp, Ozone)) +
  geom_point() +
  geom_quantile(quantiles=0.95, formula=y ~ x, aes(colour="rq", linetype="rq")) +
  geom_line(data=bind_rows(lprq_fit, ql_fit_df),
            aes(x, y, colour=paste0(gsub("q.*:","",fit),": ", bw),
                linetype=paste0(gsub("q.*:","",fit),": ", bw))) +
  theme_bw() +
  scale_linetype_manual(values=c(2,4,5,1,1)) +
  labs(colour="Method", linetype="Method",
       title="Different methods of estimating the 95th percentile by quantile regression")
When fitting a Cox model that includes spline terms for a continuous covariate, I would like to be able to produce a plot of the hazard ratio across the range of that covariate (relative to a fixed reference value) using ggplot2.
I have adapted an example from Terry Therneau's splines vignette here (see page 3). The only issue with this approach is the lack of a "waist" in the confidence interval at the reference value, as in this plot:
The example below produces the following plot, without the narrowing of the CI at the reference value.
library(survival)
library(splines)
library(ggplot2)
# colon cancer death dataset
ccd <- na.omit(subset(colon, etype == 2))
# fit model with ns() term for age
cox <- coxph(Surv(time, status) ~ rx + sex + ns(age, knots = c(20, 50, 70)), data = ccd)
# get data for plot
tp <- termplot(cox, se = TRUE, plot = FALSE)
# hazard ratio plot for natural spline of age, with reference at 50 yrs
ref <- tp$age$y[tp$age$x == 50]
ggplot() +
  geom_line(data = tp$age, aes(x = x, y = exp(y - ref))) +
  geom_line(data = tp$age, aes(x = x, y = exp(y - 1.96 * se - ref)), linetype = 2) +
  geom_line(data = tp$age, aes(x = x, y = exp(y + 1.96 * se - ref)), linetype = 2) +
  geom_hline(aes(yintercept = 1), linetype = 3) +
  geom_rug(data = ccd, aes(x = age), sides = "b") +
  labs(x = "Age at baseline, years",
       y = "Hazard Ratio (95% CI) vs. 50 years",
       title = "Mortality hazard ratio as a function of age",
       subtitle = "Natural spline: knots at 20, 50, and 70 years")
I am aware that there are features in the rms package and the smoothHR package that produce these types of plots, but I am looking for a solution that is amenable to ggplot2 graphics and the coxph() function in the survival package. My question therefore boils down to:
Is there a way to adapt the output of termplot() to produce a plot with a "waist" at the reference value?
If termplot() cannot be used, how can I obtain the relevant plotting data by other means?
Edit 1: As the first comment suggested, this can be accomplished using rms and ggplot2 together. For example:
library(rms)
dd <- datadist(ccd)
dd$limits$age[2] <- 50
options(datadist = "dd")
cph <- cph(Surv(time, status) ~ rx + sex + rcs(age, c(20, 50, 70)), data = ccd, x = TRUE, y = TRUE)
pdata <- Predict(cph, age, ref.zero = TRUE, fun = exp)
ggplot(data = pdata) +
  geom_hline(aes(yintercept = 1), linetype = 3) +
  labs(x = "Age at baseline, years",
       y = "Hazard Ratio (95% CI) vs. 50 years",
       title = "Mortality hazard ratio as a function of age",
       subtitle = "Natural spline: knots at 20, 50, and 70 years")
Which produces a plot very close to what I am after:
However, I would still like to know if there is a way to do this using coxph() and ns(). Not that I have anything against the rms package; I just have a bunch of old code based on survival functionality.
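One possible way to get the "waist" while staying with coxph() and ns() is to bypass termplot() and compute the standard error of the contrast between each age and the reference age directly from the spline basis and the coefficient covariance matrix. This is a sketch under stated assumptions rather than a vetted solution: it rebuilds the basis with ns() on the same data and knots as the model, and the names basis, X, idx and plot_df are introduced here for illustration.

```r
library(survival)
library(splines)
library(ggplot2)

ccd <- na.omit(subset(colon, etype == 2))
cox <- coxph(Surv(time, status) ~ rx + sex + ns(age, knots = c(20, 50, 70)), data = ccd)

# Rebuild the same basis (same data, same knots) so it can be evaluated at any age
basis <- ns(ccd$age, knots = c(20, 50, 70))
ages  <- seq(min(ccd$age), max(ccd$age), by = 1)
ref   <- 50

# Contrast matrix B(age) - B(ref); the row for age == ref is exactly zero
B  <- matrix(as.numeric(predict(basis, ages)), nrow = length(ages))
B0 <- as.numeric(predict(basis, ref))
X  <- sweep(B, 2, B0)

# Coefficients and covariance block belonging to the spline term
idx  <- grep("^ns\\(age", names(coef(cox)))
beta <- coef(cox)[idx]
V    <- vcov(cox)[idx, idx]

log_hr <- as.numeric(X %*% beta)
se     <- sqrt(rowSums((X %*% V) * X))  # SE of the contrast, zero at the reference

plot_df <- data.frame(age = ages,
                      hr = exp(log_hr),
                      lower = exp(log_hr - 1.96 * se),
                      upper = exp(log_hr + 1.96 * se))

ggplot(plot_df, aes(x = age, y = hr)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line() +
  geom_hline(yintercept = 1, linetype = 3) +
  labs(x = "Age at baseline, years",
       y = "Hazard Ratio (95% CI) vs. 50 years")
```

The interval narrows to a point at age 50 because the quantity being estimated there is the log hazard ratio of age 50 against itself, which is zero with no uncertainty. The termplot() standard errors, by contrast, describe the centred term itself, so subtracting the reference value from the curve does not shrink them.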