Plotting Observed Vs Predicted variables on the same graph in Lattice - r

I'm trying to plot observed and predicted variables on the same plot in lattice. My data is a repeated-measures dataset and I've tried a few things that haven't worked. Any assistance would be much appreciated. Code is given below.
library(nlme)
library(lattice)
# add random conc predictions to data
Theoph$predConc <- rnorm(132, 5)
# My attempt at plotting both predConc and conc against time on the same plot
lattice::xyplot(predConc + conc ~ Time | Subject, groups=Subject, data=Theoph, type="l", layout = c(4,4))
As you can see, it doesn't seem to be doing what I want it to do. Ideally I would like the "conc" and "predConc" to be in different colours but appear together on each panel for each Id so I can compare the two easily.

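As was suggested in the comments, the fix is simply to drop groups = Subject. When groups is supplied, xyplot() no longer treats predConc + conc as two separate series (allow.multiple defaults to is.null(groups) || outer), so it plots their sum instead.
lattice::xyplot(predConc + conc ~ Time | Subject, data = Theoph, type = "l",
                auto.key = TRUE)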
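If an explicit groups argument is preferred, one alternative is to stack the observed and predicted columns into long format first. The sketch below is only an illustration: the names long, value and kind are made up here, and the predictions are random placeholders as in the question.

```r
library(lattice)

# Theoph ships with R's datasets package; drop the groupedData classes
dat <- as.data.frame(Theoph)
set.seed(1)
dat$predConc <- rnorm(nrow(dat), mean = 5)  # placeholder predictions, as in the question

# Stack observed and predicted values into one column, labelled by 'kind'
# ('long', 'value' and 'kind' are hypothetical names chosen for this sketch)
long <- data.frame(
  Subject = rep(dat$Subject, 2),
  Time    = rep(dat$Time, 2),
  value   = c(dat$conc, dat$predConc),
  kind    = factor(rep(c("conc", "predConc"), each = nrow(dat)))
)

# One panel per subject, one coloured line per kind
p <- xyplot(value ~ Time | Subject, groups = kind, data = long,
            type = "l", auto.key = list(lines = TRUE, points = FALSE))
print(p)
```

The long format also makes it easy to add further series later (for example a second model's predictions) without changing the plotting call.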

Related

Reorder legend survfit

I have made a regression that looks like this:
survfit(Surv(YearsToEvent, Event) ~ CancerType, data = RegressionData)
From that I get an output table like this (I removed some of the columns for readability):
n
Cancer A 100
Cancer B 200
However, when I plot the output using ggsurvplot, where I have a plot plus a table with "number at risk", I want to be able to manually adjust the order of the legend. That is, I want to be able to put Cancer B before Cancer A. I found a similar thread here on SO: How to reorder strata in survfit object for ggsurvplot legend?. However, I did not find any answer in it. I have tried to sort the data frame RegressionData before the regression, without success.
Could anyone help me out here?
Here is an example of how you could achieve what you want.
The main step is to convert the grouping variable to a factor, which lets you define the order of the levels by hand.
The other steps stay the same:
library(survminer)
library(survival)
library(tidyverse)
lung1 <- lung %>%
  mutate(sex = factor(sex, levels = c(2, 1)))
ggsurvplot(
  fit = survfit(Surv(time, status) ~ sex, data = lung1),
  risk.table = TRUE,
  xlab = "Days",
  ylab = "Overall survival probability")
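To check that the releveling has taken effect before plotting, it can help to inspect the strata names on the fitted object. This is a minimal sketch using only the survival package; the labels "Female" and "Male" are added here for illustration (in lung, sex is coded 1 = male, 2 = female).

```r
library(survival)

# Relevel so that the group coded 2 comes first; the labels are illustrative
lung1 <- lung
lung1$sex <- factor(lung1$sex, levels = c(2, 1), labels = c("Female", "Male"))

fit <- survfit(Surv(time, status) ~ sex, data = lung1)

# Strata appear in factor-level order
names(fit$strata)
# "sex=Female" "sex=Male"
```

Sorting the data frame has no effect because the strata order comes from the factor levels, not from the row order.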

Using interplot to plot continuous*continuous interactions for multiple individuals

I want to produce a plot which demonstrates the effects of a continuous by continuous interaction, with one line for each individual in the dataset. I have managed to successfully plot the interaction at the population level using interplot:
m2<-lmer(RMR~No_Squares*Temperature+(Temperature|ID), data=female1)
interplot(m = m2, var1 = "No_Squares", var2 = "Temperature", ci=FALSE)
But I am at a loss as to how to produce such a plot for each individual ID (i.e. to show the differences between individuals).
I have tried adding:
+ geom_smooth(aes(group = ID), method = "lm")
To the code, but this doesn't work.
Any ideas?

plotting log(10) lengths differ

I am having difficulty plotting a log(10) formula on to existing data points. I derived a logarithmic function based on a list of data where "Tout_F_6am" is my independent variable and "clo" is my dependent variable.
When I go to plot it, I get an error saying that the lengths of x and y differ. Can someone please help me figure out what's going wrong?
logKT=lm(log10(clo)~ Tout_F_6am,data=passive)
summary(logKT) #r2=0.12
coef(logKT)
plot(passive$Tout_F_6am,passive$clo) #plot data points
x=seq(53,84, length=6381)#match length of x variable
y=logKT
lines(x,y,type="l",lwd=2,col="red")
length(passive$Tout_F_6am) #6381
length(passive$clo) #6381
Additionally, can the formula curve(-0.0219-0.005*log10(x),add=TRUE,col=2) be written as eq=(10^-0.022)*(10^-0.005*x)? Thanks!
The problem is that you are trying to plot the model object, not the predictions from the model. Try something like this:
Define the explanatory values you want to plot, in a data frame (or tibble). It doesn't have to be as many as there are data points.
library(dplyr)
explanatory_data <- tibble(
  Tout_F_6am = seq(53, 84, 0.1)
)
Add a column of predicted values using predict(). This takes a model and your explanatory data. predict() returns values on the scale the model was fitted on, here log10(clo), so you have to back-transform them.
prediction_data <- explanatory_data %>%
  mutate(
    log10_clo = predict(logKT, explanatory_data),
    clo = 10 ^ log10_clo
  )
Finally, draw your plot.
plot(clo ~ Tout_F_6am, data = prediction_data, log="y", type = "l")
The plotting is actually easier using ggplot2. This should give you more or less what you want.
library(ggplot2)
ggplot(passive, aes(Tout_F_6am, clo)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_log10()
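On the second part of the question: since the model is log10(clo) = a + b * Tout_F_6am, the back-transformed curve is clo = 10^a * 10^(b * Tout_F_6am). Note that R parses 10^-0.005*x as (10^-0.005)*x, so the exponent needs parentheses. The sketch below uses simulated data as a stand-in for passive (passive_sim and its coefficients are invented for illustration, not the asker's data) and checks the hand-written formula against predict().

```r
set.seed(42)

# Simulated stand-in for the asker's 'passive' data (not the real data)
passive_sim <- data.frame(Tout_F_6am = runif(200, 53, 84))
passive_sim$clo <- 10^(-0.02 - 0.005 * passive_sim$Tout_F_6am +
                         rnorm(200, sd = 0.05))

fit <- lm(log10(clo) ~ Tout_F_6am, data = passive_sim)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["Tout_F_6am"]]

# Points plus the back-transformed fitted curve on the original scale
plot(clo ~ Tout_F_6am, data = passive_sim)
curve(10^(a + b * x), add = TRUE, col = "red", lwd = 2)

# The hand-written formula agrees with back-transformed predictions
x_new <- c(60, 70, 80)
by_hand <- 10^a * 10^(b * x_new)
by_predict <- 10^predict(fit, data.frame(Tout_F_6am = x_new))
all.equal(by_hand, unname(by_predict))
```

Using curve() with add = TRUE draws the fitted line on top of the existing scatter plot, which avoids having to match the length of x to the data.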

plotting two categorical vectors in ggridges

I have a dataset with a few organisms, which I would like to plot on my y-axis, against date, which I would like to plot on the x-axis. However, I want the fluctuation of the curve to represent the abundance of the organisms, i.e. I would like to plot a time series of relative abundance, separated by organism, to show similar patterns over time.
However, of course, plotting just date against an organism does not yield any information on the abundance. So, my question is, is there a way to make the curve represent abundance using ggridges?
Here is my code for an example dataset:
set.seed(1)
Data <- data.frame(
  Abundance = sample(1:100),
  Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)
Date = rep(seq(from = as.Date("2016-01-01"), to = as.Date("2016-10-01"),
               by = 'month'), times = 10)
Data <- cbind(Date, Data)
ggplot(Data, aes(x = Abundance, y = Organism)) +
  geom_density_ridges(scale = 1.15, alpha = 0.6, color = "grey90")
This produces a plot with the two organisms, but I want the date on the x-axis rather than abundance, and putting Date there instead doesn't work. I have read that you need to specify group=Date or convert the date to Julian day; however, neither lets me incorporate abundance into the plot.
Does anyone have an example of a plot with date vs. a categorical variable (i.e. organism) plotted against a continuous variable in ggridges?
I really like the output from ggridges and would like to be able to use it for these visualizations. Thank you in advance for your help!
Cheers,
Anni
To use geom_density_ridges, it'll help to reshape the data so that each observation is on its own row, rather than summarized as a count in Abundance.
library(ggplot2); library(ggridges); library(dplyr)
# Uncount copies the row "Abundance" number of times
Data_sum <- Data %>%
  tidyr::uncount(Abundance)
ggplot(Data_sum, aes(x = Date, y = Organism)) +
  ggridges::geom_density_ridges(scale = 1, alpha = 0.6, color = "grey90")
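To see what the expansion step does, here is a rough base-R equivalent of the uncount() call, rebuilt on the question's example data. It is only a sketch: expanded is a name made up here, and the plotting step is left out.

```r
set.seed(1)

# Example data as in the question, built in one step
Data <- data.frame(
  Date = rep(seq(as.Date("2016-01-01"), as.Date("2016-10-01"), by = "month"),
             times = 10),
  Abundance = sample(1:100),
  Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)

# Repeat each row 'Abundance' times and drop the count column
expanded <- Data[rep(seq_len(nrow(Data)), times = Data$Abundance),
                 c("Date", "Organism")]

# One row per counted individual
nrow(expanded) == sum(Data$Abundance)
```

After the expansion, a density over Date within each Organism is weighted by abundance, which is what lets the ridge height reflect it.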

Indexing separate survival curves

I would like to plot Kaplan-Meier survival estimates for each of two groups in ggplot.
To do so requires getting a separate survival curve for each group. The survfit function in the survival package splits the data nicely, but I don't know how to index the separate curves to work on them.
Here is sample data:
rearrest<-read.table("http://stats.idre.ucla.edu/stat/examples/alda/rearrest.csv", sep=",", header=T)
This is the curve ungrouped
(sCurve <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~1, data = rearrest)))
It is easy to index elements within this, for example
sCurve$n.event
When I fit the same thing except this time grouped according to the value of the personal variable I get two nice survival curve objects ready to go.
(sCurveA <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~personal, data = rearrest)))
One object is labelled personal=0 and the other personal=1. I have tried indexing with $, [] and [[]], using both numeric and named indexes, all to no avail.
Can anyone help?
sCurveA$strata provides the grouping variable as a vector. You can pull out the key pieces and throw them into a data.frame for ggplot.
library(ggplot2)
df = data.frame(Time = sCurveA$time,
                Survival = sCurveA$surv,
                Strata = sCurveA$strata)
ggplot(df, aes(Time, Survival, col = Strata)) +
  geom_line()
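If the curves are needed as separate objects rather than one data frame, the same pieces can be split by stratum, and a survfit object (as opposed to its summary) can be subset with [. The sketch below uses the lung data shipped with survival as a stand-in for rearrest, so the variable names differ from the question.

```r
library(survival)

# 'lung' stands in for the asker's 'rearrest' data in this sketch
fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit)

# Collect the per-time estimates, then split into one data frame per stratum
km <- data.frame(time = s$time, surv = s$surv, strata = s$strata)
by_group <- split(km, km$strata)
names(by_group)
# "sex=1" "sex=2"

# Subsetting the survfit object itself returns the curve for one stratum
fit_first <- fit[1]
```

Each element of by_group can then be plotted or summarised on its own, for example with head(by_group[["sex=1"]]).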
