Unable to plot binary outcome and continuous predictor? - r

I am trying to show how age (V1) is correlated with a binary outcome (V2), but I am not having any luck plotting this.
Here are my data:
> dput(head(test, 100))
structure(list(V1 = c(48, 92, 36, NA, 69, NA, NA, 19, 69, 82,
NA, 39, 42, NA, 68, 72, 27, 78, 42, 15, 79, 48, 38, 46, 17, 33,
24, 41, 68, 28, 79, NA, 52, 81, 74, 58, 57, 71, 51, 51, 51, 51,
31, 96, 47, NA, 66, 66, 73, 55, 79, 60, 60, 76, 34, 53, 58, 70,
80, 33, 17, 54, 42, 64, NA, 72, 53, 55, 59, NA, 68, 71, 70, 77,
16, 74, 74, 29, 49, NA, 64, 65, 65, 65, 57, 63, 60, 78, 77, 75,
54, 55, 97, NA, NA, 74, 80, 73, 74, 67), V2 = c(1, 0, 1, NA,
1, NA, NA, 1, 1, 1, NA, 0, 1, NA, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 0, 1, 1, 0, NA, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,
1, 1, NA, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
1, NA, 1, 1, 1, 1, NA, 0, 1, 1, 1, 1, 1, 0, 1, 0, NA, 1, 1, 1,
1, 0, 0, 0, 1, 0, 1, 1, 0, 0, NA, NA, 0, 1, 0, 0, 0)), row.names = c(NA,
100L), class = "data.frame")
Here is what I attempted to do, but I am not getting any sort of smoothing curve to show how age is associated with the binary outcome:
ggplot(test, aes(x=V1, y=V2))+
geom_point(size=2, alpha=0.4)+
stat_smooth(method="loess", color="blue", size=1.5)
And this is what I am trying to create (although I am open to suggestions for better plotting methods).
This is my output (haven't changed the axis labels, but the y-axis should be the binary outcome and the x-axis is age):

If you have binary outcome data and a numeric predictor, the typical way to model this would be with logistic regression. You can show a logistic regression quite easily in ggplot by passing method = glm and method.args = list(family = binomial) to geom_smooth.
You can augment this by adding the successes and failures as a sort of "rug plot", and adding a few aesthetic tweaks:
ggplot(test, aes(V1, V2)) +
geom_point(shape = "|", size = 6, na.rm = TRUE, aes(color = factor(V2))) +
geom_smooth(method = glm, method.args = list(family = binomial), na.rm = TRUE,
formula = y ~ x, color = "navy", fill = "lightblue") +
coord_cartesian(ylim = c(0, 1), expand = 0) +
labs(x = "Age", y = "Probability") +
theme_minimal(base_size = 16) +
theme(axis.line = element_line(color = "gray"),
axis.ticks = element_line(color = "gray"),
axis.ticks.length = unit(3, "mm"),
legend.position = "none")
Note that this is preferable to a plain loess, because a loess (or any other method that does not explicitly account for the binary nature of the data) will give inaccurate confidence intervals. Your target plot has a confidence interval that goes above 100% probability, which clearly doesn't make sense.
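To make the confidence-interval point concrete, here is a minimal base-R sketch of the same kind of fit done by hand. It uses only the first 20 rows of the question's data (test_small and grid are names introduced here for illustration), builds the interval on the link (log-odds) scale, and then back-transforms it, which is why the band cannot leave the 0 to 1 range. This is only an approximation of what geom_smooth does internally, not its actual code.

```r
# First 20 rows of the question's data (NA rows are dropped by glm by default)
test_small <- data.frame(
  V1 = c(48, 92, 36, NA, 69, NA, NA, 19, 69, 82,
         NA, 39, 42, NA, 68, 72, 27, 78, 42, 15),
  V2 = c(1, 0, 1, NA, 1, NA, NA, 1, 1, 1,
         NA, 0, 1, NA, 1, 1, 1, 1, 1, 1)
)

# Logistic regression of the binary outcome on age
fit <- glm(V2 ~ V1, family = binomial, data = test_small)

# Predict on a grid of ages, on the link scale, with standard errors
grid <- data.frame(V1 = seq(15, 95, by = 10))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Back-transform with the inverse logit so everything stays within [0, 1]
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$upper <- plogis(pred$fit + 1.96 * pred$se.fit)
print(grid)
```

With so few rows the interval is very wide, but it is still bounded by 0 and 1, unlike a band computed directly on the probability scale.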

Related

From Boxplot to Barplot in ggplot possible?

I have to make a ggplot bar plot with error bars and Tukey significance letters for plants grown with different fertilizer concentrations.
The data should be grouped by concentration, and the significance letters should be added automatically.
I already have code for the same problem with a boxplot, which works nicely. I tried several tutorials for bar plots, but I always get the error: stat_count() can only have an x or y aesthetic.
So I wondered: is it possible to convert my boxplot code into bar plot code? I tried but couldn't do it :) And if not, how do I automatically add the TukeyHSD test significance letters to a ggplot bar plot?
This is my Code for the boxplot with the tukey letters:
value_max <- Dünger %>%
group_by(Duenger.g) %>%
summarize(max_value = max(Höhe.cm))
hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger),
trt = "Duenger.g", group = T)
sig.letters <- hsd$groups[order(row.names(hsd$groups)), ]
J <- ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm)) +
geom_boxplot(aes(fill = Duenger.g)) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(data = value_max, aes(x = Duenger.g, y = 0.1 + max_value, label = sig.letters$groups), vjust = 0) +
stat_boxplot(geom = "errorbar", width = 0.1) +
ggtitle("Auswirkung von Dünger auf die Höhe von Pflanzen") +
xlab("Dünger in g") +
ylab("Höhe in cm")
J
This is how it looks:
boxplot with tukey
Data from dput:
structure(list(Duenger.g = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4), plant = c(1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 18, 19,
21, 23, 24, 25, 26, 27, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40,
41, 42, 43, 44, 48, 49, 50, 53, 54, 55, 56, 57, 58, 61, 62, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 79, 80, 81, 83, 85, 86,
88, 89, 91, 93, 99, 100, 102, 103, 104, 105, 106, 107, 108, 110,
111, 112, 113, 114, 115, 116, 117, 118, 120, 122, 123, 125, 126,
127, 128, 130, 131, 132, 134, 136, 138, 139, 140, 141, 143, 144,
145, 146, 147, 149), height.cm = c(5.7, 2.8, 5.5, 8, 3.5, 2.5,
4, 6, 10, 4.5, 7, 8.3, 11, 7, 8, 2.5, 7.4, 3, 14.5, 7, 12, 7.5,
30.5, 27, 6.5, 19, 10.4, 12.7, 27.3, 11, 11, 10.5, 10.5, 13,
53, 12.5, 12, 6, 12, 35, 8, 16, 56, 63, 69, 62, 98, 65, 77, 32,
85, 75, 33.7, 75, 55, 38.8, 39, 46, 35, 59, 44, 31.5, 49, 34,
52, 37, 43, 38, 28, 14, 28, 19, 20, 23, 17.5, 32, 16, 17, 24.7,
34, 50, 12, 14, 21, 33, 39.3, 41, 29, 35, 48, 40, 65, 35, 10,
26, 34, 41, 32, 38, 23.5, 22.2, 20.5, 29, 34, 45)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -105L))
Thank you
mirai
A bar chart and a boxplot are two different things. By default, geom_boxplot computes the boxplot statistics (stat="boxplot"). In contrast, geom_bar by default counts the number of observations (stat="count"), and the counts are then mapped to y. That's why you get an error. Hence, simply replacing geom_boxplot with geom_bar will not give you the desired result. Instead you could use, e.g., stat_summary to create your bar chart with error bars. Additionally, I created a summary dataset to add the labels on top of the error bars.
library(ggplot2)
library(dplyr)
library(agricolae)
Dünger <- Dünger |>
rename("Höhe.cm" = height.cm) |>
mutate(Duenger.g = factor(Duenger.g))
hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger), trt = "Duenger.g", group = T)
sig.letters <- hsd$groups %>% mutate(Duenger.g = row.names(.))
duenger_sum <- Dünger |>
group_by(Duenger.g) |>
summarize(mean_se(Höhe.cm)) |>
left_join(sig.letters, by = "Duenger.g")
ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm, fill = Duenger.g)) +
stat_summary(geom = "bar", fun = "mean") +
stat_summary(geom = "errorbar", width = .1) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(data = duenger_sum, aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
labs(
title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
x = "Dünger in g", y = "Höhe in cm"
)
#> No summary function supplied, defaulting to `mean_se()`
But as the summary dataset now already contains the mean and the values for the error bars a second option would be to do:
ggplot(duenger_sum, aes(x = Duenger.g, y = y, fill = Duenger.g)) +
geom_col() +
geom_errorbar(aes(ymin = ymin, ymax = ymax), width = .1) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
labs(
title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
x = "Dünger in g", y = "Höhe in cm"
)
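For readers wondering what the bar heights and error bars represent: to my understanding, mean_se() returns the group mean as y and the mean minus or plus one standard error as ymin and ymax. The following base-R sketch reproduces that calculation by hand on ten rows taken from the question's data (the first five plants at 0 g and the last five at 4 g). The names heights and bar_stats are introduced here only for illustration.

```r
# Ten rows from the question's data: first five at 0 g, last five at 4 g
heights <- data.frame(
  Duenger.g = factor(rep(c("0", "4"), each = 5)),
  height.cm = c(5.7, 2.8, 5.5, 8, 3.5, 22.2, 20.5, 29, 34, 45)
)

# Group means and standard errors (sd / sqrt(n)), one value per fertilizer level
means <- tapply(heights$height.cm, heights$Duenger.g, mean)
ses   <- tapply(heights$height.cm, heights$Duenger.g,
                function(x) sd(x) / sqrt(length(x)))

# Same column layout as the summary used for geom_col() + geom_errorbar()
bar_stats <- data.frame(
  Duenger.g = names(means),
  y    = as.numeric(means),
  ymin = as.numeric(means - ses),
  ymax = as.numeric(means + ses)
)
print(bar_stats)
```

A table like bar_stats could be passed to the second plotting option above in place of duenger_sum, minus the significance letters.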

fill delaunay triangles with colors of vertex points in R

here is a reprex
data<- structure(list(lanmark_id = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67), V1 = c(0.00291280916742007,
0.00738863171211713, 0.0226678081211574, 0.0475105228945172,
0.0932285720818941, 0.167467706279089, 0.257162845610094, 0.365202733889021,
0.49347857580521, 0.623654594804239, 0.738846221030799, 0.838001377618909,
0.911583795022151, 0.954620025430512, 0.976736039833402, 0.99275439380643,
1.00100526672829, 0.0751484964183746, 0.136267471453466, 0.223219796351563,
0.312829176190895, 0.396253287447153, 0.589077347394549, 0.682150866526948,
0.771279538477539, 0.856242644022999, 0.915433541338973, 0.493665602840245,
0.491283285973581, 0.488913167946858, 0.486968906096063, 0.384707082576335,
0.43516446651127, 0.48730704698643, 0.541730425616146, 0.590794609520034,
0.176234316360877, 0.230353437655898, 0.295908510434122, 0.350673723300921,
0.2927721757992, 0.228392965512228, 0.634474821310078, 0.692554938010577,
0.757884656518485, 0.809961553290539, 0.760324208523177, 0.696892501347341,
0.299062528225204, 0.371899560139738, 0.440183530232855, 0.488448817156316,
0.542120710507391, 0.613931454931259, 0.683122622479693, 0.614367295821043,
0.544516611213321, 0.487065702940653, 0.43466839036949, 0.367662837035504,
0.329392110306872, 0.439192556373207, 0.488617118648197, 0.543288506065858,
0.652131615571443, 0.541622182786469, 0.486664920417254, 0.437126878794749
), V2 = c(0.201088019764115, 0.335422141956174, 0.468591127485112,
0.597955245417373, 0.719502795031081, 0.826191980419368, 0.912263437847338,
0.978932088608654, 0.996572250349122, 0.975164350943783, 0.906204543800476,
0.817791059656974, 0.711167374856116, 0.587462637963028, 0.457981280500493,
0.327526817895531, 0.19652402489511, 0.0832018969548692, 0.0247526745448235,
0.00543973063471442, 0.0169853862992864, 0.0463565705952832,
0.0442986445765913, 0.0151651597693172, 0.00747493463745755,
0.0263496825405166, 0.0805712600069456, 0.160307477500307, 0.24640401358039,
0.332244740019727, 0.420995916418539, 0.486383354389177, 0.505514985155285,
0.521022030162301, 0.5059272511442, 0.48818970795347, 0.184054088286897,
0.153658218058329, 0.153359749238857, 0.186997311695192, 0.20294291755153,
0.204166125257439, 0.186997311695192, 0.153386090373069, 0.155932705636629,
0.184603717976376, 0.203900583330345, 0.202836636618411, 0.670663080116174,
0.635972857244521, 0.619932598923225, 0.632625553953685, 0.620132318139554,
0.637530241507316, 0.668109937001625, 0.718821664744205, 0.73956412947459,
0.744898219300658, 0.74046882628352, 0.720755964662638, 0.672731384920681,
0.666152981987244, 0.670464844757437, 0.664772611108765, 0.671145517468628,
0.673968618595099, 0.67986363963374, 0.675352028351748), coef2 = c(0,
0, 0, 0, 0, 0, 0, 0, 0.565178003460693, 0, 0, 0, 0, 0, 0, 0,
0, 0.0433232019717308, 0.0433232019717308, 0.442833876807268,
0.574211955093656, 0.574211955093656, 0.574211955093656, 0.574211955093656,
0.442833876807268, 0.0433232019717308, 0.0433232019717308, 0.0612451242746323,
0.0612451242746323, 0, 0, 0, 0, 0, 0, 0, 0.343056259557492, 0.701076795777046,
0.674029769391816, 0, 0.538117834886036, 0.990039002564078, 0.451921167678043,
0.701076795777046, 0.701076795777046, 0.316009233172263, 0.990039002564078,
0.990039002564078, 0.878350036859346, 0.343364662128988, 0.282119537854356,
0.282119537854356, 0.282119537854356, 0.343364662128988, 0.384793696241895,
0.608382647917744, 0.608382647917744, 1, 0.608382647917744, 0.608382647917744,
0.384793696241895, 0.501936678206125, 0.501936678206125, 0, 0.878350036859346,
0, 0.501936678206125, 0.501936678206125)), row.names = c(NA,
-68L), class = c("tbl_df", "tbl", "data.frame"))
I used this data to create a Delaunay plot in R:
library(tidyverse)
library(ggforce)
data%>%
mutate(coef2 = coef2/max(coef2))%>%
ggplot(aes(V1, V2))+
geom_delaunay_tile(aes(colour = coef2, fill = coef2), alpha = .5)+
geom_delaunay_segment2(aes(colour = coef2, fill = coef2))+
geom_point(aes(colour = coef2))+
ylim(1,0)+
scale_color_viridis_c(option = "magma")+
scale_fill_viridis_c(option = "magma")+
theme_minimal()
which gives this:
I want to fill all triangles with a blend of colors that matches the color of each point, just as the lines are colored.
As you can see, I have tried using fill = coef2 within the geom_delaunay layers, but this doesn't really achieve what I want.
Is there a way to do this in R?
Many thanks!

Surface in plotly does not cover all data, leaving a gap between surface and highlight

I am using a plotly surface plot with data that has some missing values.
As you can see in the example below, I am using highlight lines to show that the surface does not reach the highlight, leaving a weird empty gap. It is not a matter of perspective, as the gap also shows in a top-down (zenithal) view.
To be more specific, below I am hovering on row 12, column 2006, and although the missing data starts in row 13, in the plot the missing data seems to start before row 12 ("row 11.9"). My expectation would be that the purple surface would reach all the way to the bright blue highlight in row 12.
Is this a bug, or is there a parameter to make sure this does not happen?
Thanks!
library(dplyr)
library(plotly)
DF_RAW = structure(c(181, 163, 60, 124, 76, 62, 73, 59, 17, 21, 26, 7, NA, NA, NA,
188, 145, 61, 130, 61, 59, 62, 57, 20, 22, 22, 6, NA, NA, NA,
137, 154, 54, 191, 75, 56, 65, 56, 22, 27, 33, 14, NA, NA, NA,
126, 185, 65, 109, 51, 71, 57, 38, 25, 23, 21, 10, NA, NA, NA,
150, 144, 44, 123, 58, 24, 48, 41, 19, 26, 21, 5, NA, NA, NA,
138, 137, 61, 130, 67, 34, 60, 44, 19, 21, 16, 4, NA, NA, NA,
121, 146, 101, 92, 70, 74, 88, 33, 18, 39, 24, 12, NA, NA, NA,
NA, 160, 129, 117, 70, 61, 42, 35, 22, 25, 21, 7, 10, 23, 8,
NA, 129, 130, 107, 64, 61, 44, 25, 23, 30, 18, 11, 20, 58, 40,
NA, 136, 131, 96, 53, 31, 51, 37, 43, 31, 19, 2, 22, 40, 41,
NA, 124, 154, 74, 62, 44, 34, 15, 26, 23, 20, 6, 23, 10, 19,
NA, 126, 251, 76, 73, 84, 47, 40, 32, 25, 32, 6, 13, 10, 13,
NA, 129, 194, 91, 53, 99, 46, 34, 60, 21, 17, 6, 14, 14, 26,
NA, 115, 119, 88, 64, 108, 37, 24, 49, 26, 17, 6, 15, 15, 47),
.Dim = 15:14,
.Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"),
c("2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019")))
DF = DF_RAW
plot1 = plotly::plot_ly(x = ~ colnames(DF),
y = ~ rownames(DF),
z = ~ DF) %>%
plotly::add_surface(name = "3D mesh",
connectgaps = TRUE, hidesurface = TRUE,
contours = list(
x = list(show = TRUE, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = TRUE),
y = list(show = TRUE, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = TRUE),
z = list(show = FALSE, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = FALSE)
)) %>%
plotly::add_surface(name = "surface",
connectgaps = FALSE,
contours = list(
x = list(show = F, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = TRUE),
y = list(show = F, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = TRUE),
z = list(show = FALSE, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = FALSE)
)
)
plot1
EDIT: To emphasize that this is not a matter of perspective, here is a top-down (zenithal) view of the plot. The gap is still visible.
plot1 %>%
plotly::layout(
scene = list(
camera = list(
eye = list(x = 0, y = 0, z = 2),
center = list(x = 0, y = 0, z = 0),
up = list(x = 0, y = 0, z = 1)
)
)
)
And if we get rid of the 3D mesh and show only the surface with the highlight, it is very clear that in row 11 (right) we have all the data (the blue highlight goes all the way from top to bottom), but in row 12 it seems we only have data up to 2013 (the blue line stops there).
plotly::plot_ly(x = ~ colnames(DF),
y = ~ rownames(DF),
z = ~ DF, showscale = FALSE) %>%
plotly::add_surface(name = "surface",
connectgaps = FALSE,
contours = list(
x = list(show = F, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = TRUE),
y = list(show = F, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = TRUE),
z = list(show = FALSE, width = 1, highlightwidth = 2, highlightcolor = "#41a7b3", highlight = FALSE)
)
) %>%
plotly::layout(
scene = list(
xaxis = list(showspikes = FALSE),
yaxis = list(showspikes = FALSE),
zaxis = list(showspikes = FALSE),
camera = list(
eye = list(x = 0, y = 0, z = 2),
center = list(x = 0, y = 0, z = 0),
up = list(x = 0, y = 0, z = 1)
)
)
)

How to calculate the conditional expectation Weibull model?

I would like to calculate the conditional expectation of the Weibull model. Specifically, I would like to estimate the remaining tenure of a client, looking at random moments (time = t) in his or her total tenure.
To do so, I have calculated the total tenure for each client (currently active or inactive) and based on the random moment for each client, calculated his/her tenure at that moment.
The example below is a snapshot of my attempt. I use 2 variables STED and TemporalTenure to predict the dependent variable tenure which has either status 0 = active or 1 = inactive. I use the survival package for obtaining the survival object (km_surv).
df = structure(list(ID = c(16008, 21736, 18851, 20387, 30749,
42159), STED = c(2,
5, 1, 3, 2, 2), TemporalTenure = c(84, 98, 255, 392, 108, 278
), tenure = c(152, 166, 273, 460, 160, 289), status = c(0, 0,
1, 0, 1, 1)), row.names = c(NA,
6L), class = "data.frame")
km_surv <- Surv(time = df$tenure, event = df$status)
df <- data.frame(y = km_surv, df[,!(names(df) %in% c("tenure","status", "ID"))])
weibull_fit <- psm(y ~. , dist="weibull", data = df)
quantsurv <- Quantile(weibull_fit, df)
lp <- predict(weibull_fit, df, type="lp")
print(quantsurv(0.5, lp))
The output of these estimations is way too high. I assume this is caused by including TemporalTenure, but I can't find out how psm calculates this, or whether there are other packages with which it's possible to estimate the remaining tenure of client i at time t.
How can I obtain the predicted tenure conditioned over the time that a client is already active (random moment in time: TemporalTenure) where the dependent tenure can either be a client that is still active or one that is inactive?
EDIT
To clarify, whenever I add time-conditional variables such as TemporalTenure, the number of received payments, and the number of complaints until time t, the predicted lifetime explodes in many cases. Therefore, I suspect that psm is not the right way to go. A similar question is asked here, but the solution given doesn't work for the same reasons.
Below a slightly bigger dataset which already causes problems.
df = structure(list(ID= c(16008, 21736, 18851, 20387, 30749,
42159, 34108, 47511, 47917, 61116, 66600, 131380, 112668, 90799,
113615, 147562, 166247, 191603, 169698, 1020841, 1004077, 1026953,
1125673, 1129788, 22457, 1147883, 1163870, 1220268, 2004623,
1233924, 2009026, 2026688, 2031284, 2042982, 2046137, 2043214,
2033631, 2034252, 2068467, 2070284, 2070697, 2084859, 2090567,
2087133, 2087685, 2095100, 2095720, 2100482, 2105150, 2109353,
28852, 29040, 29592, 29191, 31172, 2126369, 2114207, 2111947,
2102678, 237687, 1093221, 2111607, 2031732, 2105275, 2020226,
1146777, 1028487, 1030165, 1098033, 1142093, 1186763, 2005605,
2007182, 2021092, 2027676, 2027525, 2070471, 2070621, 2072706,
2081862, 2085084, 2085353, 2094429, 2096216, 2109774, 2114526,
2115510, 2117329, 2122045, 2119764, 2122522, 2123080, 2128547,
2130005, 30025, 24166, 61529, 94568, 70809, 159214), STED = c(2,
5, 1, 3, 2, 2, 3, 1, 2, 2, 2, 2, 2, 1, 2, 2, 4, 1, 4, 3, 2, 4,
1, 1, 2, 1, 4, 1, 1, 1, 2, 4, 2, 5, 4, 1, 4, 2, 5, 3, 2, 1, 4,
2, 1, 5, 3, 1, 1, 5, 2, 2, 2, 2, 3, 4, 3, 5, 1, 1, 5, 2, 5, 1,
3, 5, 3, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 3, 5, 2, 2, 1, 2, 1, 2,
3, 1, 1, 3, 5, 1, 2, 2, 2, 2, 1, 2, 1, 3, 1), TemporalTenure = c(84,
98, 255, 392, 108, 278, 120, 67, 209, 95, 224, 198, 204, 216,
204, 190, 36, 160, 184, 95, 140, 256, 142, 216, 56, 79, 194,
172, 155, 158, 78, 24, 140, 87, 134, 111, 15, 126, 41, 116, 66,
60, 0, 118, 22, 116, 110, 52, 66, 0, 325, 323, 53, 191, 60, 7,
45, 73, 42, 161, 30, 17, 30, 12, 87, 85, 251, 120, 7, 6, 38,
119, 156, 54, 11, 141, 50, 25, 33, 3, 48, 58, 13, 113, 25, 18,
23, 2, 102, 5, 90, 0, 101, 83, 44, 125, 226, 213, 216, 186),
tenure = c(152, 166, 273, 460, 160, 289, 188, 72, 233, 163,
266, 266, 216, 232, 247, 258, 65, 228, 252, 99, 208, 324,
201, 284, 124, 84, 262, 180, 223, 226, 146, 92, 208, 155,
202, 179, 80, 185, 64, 184, 120, 65, 6, 186, 45, 120, 170,
96, 123, 12, 393, 391, 64, 259, 73, 42, 69, 141, 47, 229,
37, 19, 37, 17, 155, 99, 319, 188, 75, 11, 49, 187, 180,
55, 52, 209, 115, 93, 88, 6, 53, 126, 31, 123, 26, 26, 24,
9, 114, 6, 111, 4, 168, 84, 112, 193, 294, 278, 284, 210),
status = c(0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1,
0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0,
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 1, 0, 1), TotalValue = c(2579.35, 2472.85,
581.19, 2579.35, 2472.85, 0, 1829.18, 0, 936.79, 2098.2,
850.47, 2579.35, 463.68, 463.68, 2171.31, 3043.03, 561.16,
3043.03, 3043.03, -68.06, 2098.2, 2504.4, 1536.67, 2719.7,
3043.03, 109.91, 2579.35, 265.57, 3560.34, 2266.95, 3123.16,
3544.4, 1379.19, 2288.35, 2472.85, 2560.48, 1414.45, 3741.49,
202.2, 2856.23, 1457.75, 313.68, 191.32, 2266.95, 661.01,
0, 2050.81, 298.76, 1605.44, 373.86, 3043.03, 2579.35, 448.63,
3043.03, 463.68, 977.28, 818.06, 2620.06, 0, 3235.8, 280.99,
0, 0, 194.04, 3212.75, -23.22, 1833.46, 1829.18, 2786.7,
0, 0, 3250.38, 936.79, 0, 1045.21, 3043.03, 1988.36, 2472.85,
1197.94, 0, 313.68, 3212.75, 1419.33, 531.14, 0, 96.28, 0,
142.92, 174.79, 0, 936.79, 156.19, 2472.85, 463.68, 3520.69,
2579.35, 3328.87, 2567.88, 3043.03, 1081.14)), row.names = c(NA,
100L), class = "data.frame")
So here's what I have done: 1) added a library call to load pkg:rms, 2) removed the attempt to place a Surv object in a dataframe column, 3) built the Surv object inside the formula, as Therneau expects formulas to be built, and 4) removed ID from the covariates, where it most probably does not belong.
library(survival); library(rms)
#km_surv <- Surv(time = df$tenure, event = df$status)
#df <- data.frame(y = km_surv, df[,!(names(df) %in% c("tenure","status"))])
weibull_fit <- psm(Surv(time = tenure, event = status) ~TemporalTenure +STED , dist="weibull", data = df)
quantsurv <- Quantile(weibull_fit, df)
lp <- predict(weibull_fit, df, type="lp")
# Results
print(quantsurv(0.5, lp))
1 2 3 4 5 6
151.4129 176.0490 268.4644 466.8266 164.8640 301.2630
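The question also asks for the remaining tenure given that a client has already been active for some time t. For a Weibull distribution with survival function S(t) = exp(-(t/scale)^shape), the conditional median of the remaining time m solves S(t + m) / S(t) = 0.5, which has a closed form. The sketch below illustrates only that formula: the shape and scale values are hypothetical, not taken from the fitted model, and median_remaining is a helper name introduced here. Mapping a psm or survreg fit onto rweibull-style shape and scale is a separate step (the survreg documentation describes the correspondence) and is not shown.

```r
# Hypothetical Weibull parameters, chosen only for illustration
shape <- 1.5
scale <- 250

# Median remaining lifetime for a client who has already survived to time t:
# solve S(t + m) / S(t) = 0.5 with S(t) = exp(-(t / scale)^shape)
median_remaining <- function(t, shape, scale) {
  scale * ((t / scale)^shape + log(2))^(1 / shape) - t
}

t_now <- c(0, 84, 255)  # elapsed tenure at the moment of prediction
m <- median_remaining(t_now, shape, scale)

# Check: the conditional survival probability at t + m should be 0.5
cond_surv <- pweibull(t_now + m, shape, scale, lower.tail = FALSE) /
  pweibull(t_now, shape, scale, lower.tail = FALSE)
print(data.frame(t_now, median_remaining = m, cond_surv))
```

At t = 0 this reduces to the ordinary Weibull median, which matches the unconditional quantile that quantsurv(0.5, lp) is meant to report.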

How do I use column index as x axis in R

I have a data frame with 7 columns and 100 observations.
I divided the observations into two groups.
The question I'm working on is: b) Construct two time plots of the mean blood lead levels superimposed on the blood lead levels at each occasion for succimer and placebo groups.
This is my code so far:
library(tidyverse)
library(haven)
library(dplyr)
library(plyr)
library(foreign)
library(ggplot2)
tlc = read_dta(file = 'tlc.dta')
head(tlc)
## a)
placebo = subset(tlc, tlc$trt==0)
succimer = subset(tlc, tlc$trt==1)
summary(placebo[, 3:6])
summary(succimer[, 3:6])
placebo_mean=colMeans(placebo[ ,3:6])
placebo_std=apply(placebo[ ,3:6],2,sd)
placebo_var=placebo_std^2
succimer_mean=colMeans(succimer[ ,3:6])
succimer_std=apply(succimer[ ,3:6],2,sd)
succimer_var=succimer_std^2
## b)
## c)
placebo_cor=cor(placebo[ , 3:6]) %>% round(digits = 3)
succimer_cor=cor(succimer[ , 3:6]) %>% round(digits = 3)
placebo_cov=cov(placebo[ , 3:6]) %>% round(digits = 3)
succimer_cov=cov(succimer[ , 3:6]) %>% round(digits = 3)
So the purpose is to plot all observations, using the values as the y axis and the columns y0, y1, y4, y6 (representing week 0, week 1, week 4, week 6) as the x axis, and then plot the mean of each group superimposed on the plot. I'm planning to use different colors to distinguish the two groups, so the final plot will have a lot of points at each x coordinate, and two short lines to indicate the means for each group at each x coordinate.
My question is: how do I use the column index as the x axis in R, with or without ggplot? I know this question may be too elementary, but it has caused a lot of trouble for me as a beginner.
Below is my data:
dput(tlc)
structure(list(id = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100), format.stata = "%9.0g"),
trt = structure(c(0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1,
0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1,
1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1), format.stata = "%9.0g", class = "haven_labelled", labels = c(Placebo = 0,
Succimer = 1)), y0 = structure(c(30.7999992370605, 26.5,
25.7999992370605, 24.7000007629395, 20.3999996185303, 20.3999996185303,
28.6000003814697, 33.7000007629395, 19.7000007629395, 31.1000003814697,
19.7999992370605, 24.7999992370605, 21.3999996185303, 27.8999996185303,
21.1000003814697, 20.6000003814697, 24, 37.5999984741211,
35.2999992370605, 28.6000003814697, 31.8999996185303, 29.6000003814697,
21.5, 26.2000007629395, 21.7999992370605, 23, 22.2000007629395,
20.5, 25, 33.2999992370605, 26, 19.7000007629395, 27.8999996185303,
24.7000007629395, 28.7999992370605, 29.6000003814697, 32,
21.7999992370605, 24.3999996185303, 33.7000007629395, 24.8999996185303,
19.7999992370605, 26.7000007629395, 26.7999992370605, 20.2000007629395,
35.4000015258789, 25.2999992370605, 20.2000007629395, 24.5,
20.2999992370605, 20.3999996185303, 24.1000003814697, 27.1000003814697,
34.7000007629395, 28.5, 26.6000003814697, 24.5, 20.5, 25.2000007629395,
34.7000007629395, 30.2999992370605, 26.6000003814697, 20.7000007629395,
27.7000007629395, 24.2999992370605, 36.5999984741211, 28.8999996185303,
34, 32.5999984741211, 29.2000007629395, 26.3999996185303,
21.7999992370605, 27.2000007629395, 22.3999996185303, 32.5,
24.8999996185303, 24.6000003814697, 23.1000003814697, 21.1000003814697,
25.7999992370605, 30, 22.1000003814697, 20, 38.0999984741211,
28.8999996185303, 25.1000003814697, 19.7999992370605, 22.1000003814697,
23.5, 29.1000003814697, 30.2999992370605, 25.3999996185303,
30.6000003814697, 22.3999996185303, 31.2000007629395, 31.3999996185303,
41.0999984741211, 29.3999996185303, 21.8999996185303, 20.7000007629395
), format.stata = "%9.0g"), y1 = structure(c(26.8999996185303,
14.8000001907349, 23, 24.5, 2.79999995231628, 5.40000009536743,
20.7999992370605, 31.6000003814697, 14.8999996185303, 31.2000007629395,
17.5, 23.1000003814697, 26.2999992370605, 6.30000019073486,
20.2999992370605, 23.8999996185303, 16.7000007629395, 33.7000007629395,
25.5, 15.8000001907349, 27.8999996185303, 15.8000001907349,
6.5, 26.7999992370605, 12, 4.19999980926514, 11.5, 21.1000003814697,
3.90000009536743, 26.2000007629395, 21.3999996185303, 13.1999998092651,
21.6000003814697, 21.2000007629395, 26.3999996185303, 17.5,
30.2000007629395, 19.2999992370605, 16.3999996185303, 14.8999996185303,
20.8999996185303, 18.8999996185303, 6.40000009536743, 20.3999996185303,
10.6000003814697, 30.3999996185303, 23.8999996185303, 17.5,
10, 21, 17.2000007629395, 20.1000003814697, 14.8999996185303,
39, 32.5999984741211, 22.3999996185303, 5.09999990463257,
17.5, 25.1000003814697, 39.5, 29.3999996185303, 25.2999992370605,
19.2999992370605, 4, 24.2999992370605, 23.2999992370605,
28.8999996185303, 10.6999998092651, 19, 9.19999980926514,
15.3000001907349, 10.6000003814697, 28.5, 22, 25.1000003814697,
23.6000003814697, 25, 20.8999996185303, 5.59999990463257,
21.8999996185303, 27.6000003814697, 21, 22.7000007629395,
40.7999992370605, 12.5, 28.1000003814697, 11.6000003814697,
21.1000003814697, 7.90000009536743, 16.7999992370605, 3.5,
24.2999992370605, 28.2000007629395, 7.09999990463257, 10.8000001907349,
3.90000009536743, 15.1000003814697, 22.1000003814697, 7.59999990463257,
8.10000038146973), format.stata = "%9.0g"), y4 = structure(c(25.7999992370605,
19.5, 19.1000003814697, 22, 3.20000004768372, 4.5, 19.2000007629395,
28.5, 15.3000001907349, 29.2000007629395, 20.5, 24.6000003814697,
19.5, 18.5, 18.3999996185303, 19, 21.7000007629395, 34.4000015258789,
26.2999992370605, 22.8999996185303, 27.2999992370605, 23.7000007629395,
7.09999990463257, 25.2999992370605, 16.7999992370605, 4,
9.5, 17.3999996185303, 12.8000001907349, 34, 21, 14.6000003814697,
23.6000003814697, 22.8999996185303, 23.7999992370605, 21,
30.2000007629395, 16.3999996185303, 11.6000003814697, 14.5,
22.2000007629395, 18.8999996185303, 5.09999990463257, 19.2999992370605,
9, 26.5, 22.2000007629395, 17.3999996185303, 15.6000003814697,
16.7000007629395, 15.8999996185303, 17.8999996185303, 18.1000003814697,
28.7999992370605, 27.5, 21.7999992370605, 8.19999980926514,
19.6000003814697, 23.3999996185303, 38.5999984741211, 33.0999984741211,
25.1000003814697, 21.8999996185303, 4.19999980926514, 18.3999996185303,
40.4000015258789, 32.7999992370605, 12.6000003814697, 16.2999992370605,
8.30000019073486, 24.6000003814697, 14.3999996185303, 35,
19.1000003814697, 27.7999992370605, 21.2000007629395, 21.7000007629395,
21.7000007629395, 7.30000019073486, 23.6000003814697, 24,
8.60000038146973, 21.2000007629395, 38, 16.7000007629395,
27.5, 13, 21.5, 12.3999996185303, 15.1000003814697, 3, 22.7000007629395,
27, 17.2000007629395, 19.7999992370605, 7, 10.8999996185303,
25.2999992370605, 10.8000001907349, 25.7000007629395), format.stata = "%9.0g"),
y6 = structure(c(23.7999992370605, 21, 23.2000007629395,
22.5, 9.39999961853027, 11.8999996185303, 18.3999996185303,
25.1000003814697, 14.6999998092651, 30.1000003814697, 27.5,
30.8999996185303, 19, 16.2999992370605, 20.7999992370605,
17, 20.2999992370605, 31.3999996185303, 30.2999992370605,
25.8999996185303, 34.2000007629395, 23.3999996185303, 16,
24.7999992370605, 19.2000007629395, 16.2000007629395, 14.5,
21.1000003814697, 12.6999998092651, 28.2000007629395, 22.3999996185303,
11.6000003814697, 27.7000007629395, 21.8999996185303, 22,
24.2000007629395, 27.5, 17.6000003814697, 16.6000003814697,
63.9000015258789, 19.7999992370605, 15.5, 15.1000003814697,
23.7999992370605, 16, 28.1000003814697, 27.2000007629395,
18.6000003814697, 15.1999998092651, 13.5, 17.7000007629395,
18.7000007629395, 21.2999992370605, 34.7000007629395, 22.7999992370605,
21, 23.6000003814697, 18.3999996185303, 22.2000007629395,
43.2999992370605, 28.3999996185303, 27.8999996185303, 21.7999992370605,
11.6999998092651, 27.7999992370605, 39.2999992370605, 31.7999992370605,
21.2000007629395, 18.6000003814697, 18.3999996185303, 32.4000015258789,
18.7000007629395, 30.5, 18.7000007629395, 27.2999992370605,
21.1000003814697, 23.8999996185303, 19.8999996185303, 12.3000001907349,
24.7999992370605, 23.7000007629395, 24.6000003814697, 20.5,
32.7000007629395, 22.2000007629395, 24.7999992370605, 23.1000003814697,
20.6000003814697, 18.8999996185303, 18.7999992370605, 11.5,
20.1000003814697, 25.5, 18.7000007629395, 22.2000007629395,
17.7999992370605, 27.1000003814697, 4.09999990463257, 13,
12.3000001907349), format.stata = "%9.0g")), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
I have also tried this:
p=ggplot(tlc, aes(x=colnames(tlc[,3:6],do.NULL=TRUE)),
y=value)
p=p+geom_point()
Running this code produces no error, but R reports an error (Aesthetics must be either length 1 or the same as the data (100): x) when I call 'p' to plot it.
I don't have your data, but it sounds like you want something that looks like this:
Here is how I made it:
library(tidyverse)
# Setting up some fake data: 100 observations and 7 variables
set.seed(123)
some_data <- data.frame(y0 = rnorm(100),
y1 = runif(100),
y2 = rexp(100, 2),
y3 = rnorm(100, 2, 1),
y4 = rexp(100),
y5 = rnorm(100, 2,2),
y6 = runif(100, -5, 5))
# pivoting the data to longer format:
long_data <- some_data %>%
pivot_longer(cols = everything(),
names_to = "variable")
# building the base plot
p <- ggplot(long_data, aes(x = variable, y = value))
# adding the points - use position_jitter to give it some width if you want
p <- p + geom_point(position = position_jitter(width = 0.2))
# adding the bars at mean - play around with width, color, and size
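# after_stat(y) replaces the deprecated ..y.. notation (ggplot2 >= 3.3.0);
# linewidth replaces size for line geoms (ggplot2 >= 3.4.0)
p <- p + stat_summary(geom = "errorbar",
fun = mean,
width = 0.4,
aes(ymax = after_stat(y), ymin = after_stat(y)),
color = "orange",
linewidth = 1.5)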
p # show plot
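The fake data above has no treatment column and treats the column names as categories. If the x axis should instead show the actual week numbers, and the means are needed per treatment group, one possible base-R sketch is to reshape to long format by hand and turn y0, y1, y4, y6 into 0, 1, 4, 6. The example uses the first three rows of the question's data, rounded to one decimal. The names tlc_small, weeks, long and group_means are introduced here for illustration.

```r
# First three rows of the question's data, rounded to one decimal
tlc_small <- data.frame(
  id  = 1:3,
  trt = c(0, 1, 1),               # 0 = Placebo, 1 = Succimer
  y0  = c(30.8, 26.5, 25.8),
  y1  = c(26.9, 14.8, 23.0),
  y4  = c(25.8, 19.5, 19.1),
  y6  = c(23.8, 21.0, 23.2)
)

# Lookup from column name to the week it represents
weeks <- c(y0 = 0, y1 = 1, y4 = 4, y6 = 6)

# Long format: one row per subject and occasion
long <- data.frame(
  id    = rep(tlc_small$id,  times = length(weeks)),
  trt   = rep(tlc_small$trt, times = length(weeks)),
  week  = rep(unname(weeks), each = nrow(tlc_small)),
  value = unlist(tlc_small[names(weeks)], use.names = FALSE)
)

# Mean blood lead level per treatment group and week
group_means <- aggregate(value ~ trt + week, data = long, FUN = mean)
print(group_means)
```

From here, long can go to ggplot with aes(x = week, y = value, colour = factor(trt)), or to base plot(long$week, long$value, col = long$trt + 1), with the rows of group_means drawn on top.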
