I have code to produce a good barplot and I'm trying to create a boxplot with the same data.
The barplot displays the count of "response" across all people (id). I'd like to create a boxplot for each type of "response" to replace the 3 bars. Boxplots should be calculated from the count of that specific "response" for each participant. So far no luck because I'm stuck on how to count the response for each participant.
current code:
df %>%
ggplot() +
labs(title= "question") +
geom_bar(aes(x = response), fill="red") +
labs(y = "count", x = "responses") +
scale_y_continuous(breaks=seq(0,100,20), limits = c(0,100))
output:
data sample:
structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5),
response = c(0, 1, 1, 0, 0, 0, 1, -1, 1, -1, 0, 1, -1, 1,
0, 0, 0, 0, 1, 1, 1, -1, 0, 1, 0, 1, 1, -1, 0, 1, 1, 1, 0,
1, 0, 0, 1, -1, 0, 1, 1, 1, -1, 1, 1, 1, 0, 0, -1, 1, 1,
-1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0,
1, 1, 0, 0, 0), iscorrect = c(0, 1, 1, 0, 0, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0,
0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0,
0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,
0, 1, 0, 0, 1, 1, 0, 0, 0), min = c(100, 150, 150,
50, 50, 50, 150, 100, 100, 100, 50, 100, 50, 150, 150, 150,
50, 100, 100, 100, 150, 150, 50, 50, 50, 150, 150, 100, 50,
100, 100, 150, 150, 50, 50, 50, 150, 100, 100, 100, 50, 100,
50, 150, 150, 150, 50, 100, 100, 100, 150, 150, 50, 50, 50,
150, 150, 100, 50, 100, 100, 150, 100, 50, 50, 50, 150, 100,
100, 50, 150, 100, 50, 150, 150), max = c(125.4, 180.8,
180.8, 62.4, 62.4, 62.4, 180.8, 125.4, 125.4, 125.4, 62.4,
125.4, 62.4, 180.8, 180.8, 180.8, 62.4, 125.4, 125.4, 125.4,
180.8, 180.8, 62.4, 62.4, 62.4, 180.8, 180.8, 125.4, 62.4,
125.4, 125.4, 180.8, 180.8, 62.4, 62.4, 62.4, 180.8, 125.4,
125.4, 125.4, 62.4, 125.4, 62.4, 180.8, 180.8, 180.8, 62.4,
125.4, 125.4, 125.4, 180.8, 180.8, 62.4, 62.4, 62.4, 180.8,
180.8, 125.4, 62.4, 125.4, 125.4, 180.8, 125.4, 62.4, 62.4,
62.4, 180.8, 125.4, 125.4, 62.4, 180.8, 125.4, 62.4, 180.8,
180.8), time = c(5, 7, 9, 5, 1, 7, 1, 1, 7, 3, 9, 9,
3, 5, 3, 1, 9, 5, 1, 7, 9, 3, 5, 7, 1, 5, 7, 3, 3, 9, 5,
7, 9, 5, 1, 7, 1, 1, 7, 3, 9, 9, 3, 5, 3, 1, 9, 5, 1, 7,
9, 3, 5, 7, 1, 5, 7, 3, 3, 9, 9, 7, 5, 7, 5, 9, 5, 3, 1,
1, 9, 7, 3, 3, 1)), row.names = c(NA, -75L), class = c("tbl_df",
"tbl", "data.frame"))
You can use this code:
df %>%
group_by(id) %>%
count(response) %>%
mutate(response = as.factor(response)) %>%
ggplot(aes(x = response, y = n)) +
geom_boxplot(fill = "red") +
labs(y = "count", x = "responses")
Output:
You can try:
library(dplyr)
library(ggplot2)
df %>%
group_by(id, response) %>%
count() %>%
mutate(id = factor(id), response = factor(response)) %>%
ggplot(aes(response, n)) +
geom_boxplot(fill = "red") +
scale_y_continuous(name = "Number of responses per participant")
Note that boxplots don't work well for discrete data like small counts, unless your actual data has far more participants and a far higher count per response.
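The per-participant counting step that both answers rely on can also be sketched in base R. The toy data frame below is a made-up stand-in for df, not the asker's data. One difference worth noting: table() keeps a zero for a participant who never gave a particular response, whereas count() simply has no row for that combination.

```r
# Hypothetical toy data: 3 participants, 4 responses each
toy <- data.frame(
  id       = rep(1:3, each = 4),
  response = c(0, 1, 1, -1,   0, 0, 1, 1,   1, 1, 1, 0)
)

# Cross-tabulate id by response, then reshape to one row per (id, response)
counts <- as.data.frame(table(id = toy$id, response = toy$response))

# One box per response value, built from the per-participant counts
boxplot(Freq ~ response, data = counts,
        xlab = "responses", ylab = "count per participant")
```

Whether the zeros belong in the boxplot depends on the question being asked, so treat this as one possible choice rather than the required one.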
This question was migrated from Stack Overflow because it can be answered on Cross Validated.
Suppose there are several categorical variables included in the LASSO regression.
A categorical variable with more than two levels must first be expanded into dummy (indicator) columns.
For example, take vaccination status (Vacc_Stat), which has three categories: 1 = not vaccinated, 2 = partially vaccinated, and 3 = fully vaccinated.
Applying the model.matrix function to the vaccination status variable yields two dummy columns, because the value 1 = not vaccinated is the reference level.
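To make the dummy coding concrete, here is a minimal sketch with a made-up six-row data frame (toy is hypothetical; only the variable name Vacc_Stat and its three levels come from the question). It assumes R's default treatment contrasts.

```r
# Hypothetical toy data: a three-level factor coded 1/2/3 as in the question
toy <- data.frame(Vacc_Stat = factor(c(1, 2, 3, 2, 1, 3)))

# Expand the factor into a design matrix; level 1 becomes the reference
X <- model.matrix(~ Vacc_Stat, data = toy)

colnames(X)
# "(Intercept)" "Vacc_Stat2" "Vacc_Stat3"
```

Level 1 has no column of its own: a row of zeros in Vacc_Stat2 and Vacc_Stat3 means "not vaccinated".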
Suppose the final LASSO regression coefficients are as follows:
Vacc_Stat1 .
Vacc_Stat2 .
Vacc_Stat3 -4.208877e-01
Do we then keep only Vacc_Stat3, or do we keep the Vacc_Stat variable as a whole?
I am planning to do a LASSO regression followed by a logistic regression of the remaining variables selected through LASSO regression.
Thank you in advance.
My expectation is that if any one of the dummy variables is selected by the LASSO regression, we should use the original categorical variable as a whole.
The following is a minimal reproducible dataset:
structure(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,
0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 0, 0.34, 0.49, 38, 0.58, 0.2, 0.49, 0.65,
40.57, 2.08, 49.52, 50.77, 38.04, 76.55, 55.95, 53.23, 38.04,
99.72, 80.04, 92.41, 47, 66, 70, 52, 36, 39, 67, 42, 23, 66,
37.109375, 31.22945431, 26.2345679, 20.76124567, 35.3798127,
26.44628099, 23.87511478, 24.8015873, 21.49959688, 22.47120876,
110, 159, 127, 100, 120, 115, 100, 112, 130, 119, 72, 78, 80,
72, 80, 73, 76, 75, 80, 78, 84, 86, 88, 103, 90, 91, 90, 82,
88, 105, 36, 37, 36.5, 36, 38, 38, 36, 36.4, 37, 36, 20, 40,
24, 20, 22, 24, 18, 20, 22, 20, 90, 99, 98, 99, 96, 90, 98, 99,
99, 90, 7, 5, 0, 2, 7, 10, 3, 3, 2, 2, 11.7, 13.8, 13, 10.9,
11.6, 14.5, 15, 16.2, 12.3, 14.2, 3.9, 4.2, 3.6, 4.7, 4, 3.2,
4.4, 5.1, 3, 3.78, 15.7, 28.8, 6, 7.8, 37.6, 13.9, 26.6, 27.2,
33, 23, 138, 139, 121, 135, 139, 132, 133, 138, 137, 128, 75,
64.5, 87.4, 88.9, 47.1, 78, 61.8, 62.52, 56.3, 63.2, 753, 305,
250, 267, 315, 207, 285, 293, 366, 307, 8.7, 8.1, 11.2, 75.9,
13.7, 10.03, 42.2, 10, 9, 10.6, 11.07, 6.8, 1.18, 23.18, 4.33,
5.25, 8.73, 7.44, 8.01, 10.37), dim = c(10L, 76L), dimnames =
list(
c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
c("TEST.Year2",
"TEST.Year3", "TEST.Gender2", "TEST.Vacc_Stat1",
"TEST.Vacc_Stat2",
"TEST.Vacc_Stat3", "TEST.Risk_AI2", "TEST.Risk_Obesity2",
"TEST.Risk_Smoking2",
"TEST.Risk_HT2", "TEST.Risk_DM2", "TEST.Risk_Asthma2",
"TEST.Risk_CHD2",
"TEST.Risk_CVD2", "TEST.Risk_COPD2", "TEST.Risk_TBC2",
"TEST.Risk_CKD2",
"TEST.Risk_CLD2", "TEST.Risk_Brain2", "TEST.Risk_HIV2",
"TEST.Risk_Cancer2",
"TEST.Symptom_Fever2", "TEST.Symptom_Cough2",
"TEST.Symptom_Sore_Throat2",
"TEST.Symptom_Rinnorrhea2", "TEST.Symptom_Anosmia2",
"TEST.Symptom_Myalgia2",
"TEST.Symptom_Headache2", "TEST.Symptom_Malaise2",
"TEST.Symptom_Anorexia2",
"TEST.Symptom_Diarrhea2", "TEST.Symptom_Nausea2",
"TEST.Symptom_Vomitting2",
"TEST.Symptom_Abd_Pain2", "TEST.Symptom_Dyspneu2",
"TEST.Symptom_Chest_Pain2",
"TEST.Symptom_LOC2", "TEST.Lab_RT_PCR2", "TEST.CXR_Proj2",
"TEST.CXR_Proj3",
"TEST.CXR_Pneumonia1", "TEST.CXR_Pneumonia2",
"TEST.CXR_Effusion2",
"TEST.Co_Septic2", "TEST.Co_Septic_Shock2", "TEST.Co_ARDS2",
"TEST.Co_Sx_Infection2", "TEST.Severity_Adm2",
"TEST.Severity_Adm3",
"TEST.Severity_Adm4", "TEST.Severity_Adm_Cat_12",
"TEST.Severity_Adm_Cat_22",
"TEST.Severity_Adm_Cat_32", "TEST.Severity_Worst2",
"TEST.Severity_Worst3",
"TEST.Severity_Worst4", "TEST.Progression2", "TEST.CXR_ALA_Num",
"TEST.CXR_Prob_Num", "TEST.Age", "TEST.BMI", "TEST.Vital_SBP",
"TEST.Vital_DBP", "TEST.Vital_PR", "TEST.Vital_Temp",
"TEST.Vital_RR",
"TEST.Vital_SpO2", "TEST.Symptom_Onset", "TEST.Lab_Hb",
"TEST.Lab_K",
"TEST.Lab_Lim", "TEST.Lab_Na", "TEST.Lab_Neu", "TEST.Lab_Tr",
"TEST.Lab_Ur", "TEST.Lab_WBC")))
I have a nested df that I am trying to clean up.
Sample Data:
df <-
tibble::tribble(
~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod, ~dataBatting, ~rankBatting, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod, ~dataPitching, ~rankPitching,
"2", "0", "111", "-4", 1L, "0", "65", "0", list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(13, 13, 13, 13, 13), value = c(0.3663, 384, 1012, 1102, 164), diff = c(0, 0, 0, 0, 0), rank = c(1, 1, 1, 1, 1)), 1L, "5", "46", "-4", list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(12, 6, 11, 8, 9), value = c(94, 89, 1576, 3.946, 1.2179), diff = c(0, -2, -2, 0, 0), rank = c(2, 8, 3, 6, 5)), 3L,
"8", "13.5", "97.5", "2", 2L, "13", "52", "0", list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(12, 11, 11, 12, 6), value = c(0.3576, 323, 954, 1011, 89), diff = c(0, 0, 0, 0, 0), rank = c(2, 3, 3, 2, 8)), 3L, "5.5", "45.5", "2", list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(2, 7.5, 10, "13", 13), value = c(56, 91, 1508, 3.688, 1.1474), diff = c(-1, 1.5, 0.5, 1, 0), rank = c(12, 6, 4, 1, 1)), 4L
)
The data I am trying to unnest is stored in the dataBatting and dataPitching columns. I want to unnest all of the fields in both columns and bind the results as rows, something akin to pivot_longer, but I wasn't sure of the right way to do this when the same set of field names is nested within 2 separate columns.
My attempt to do this was:
df %>%
unnest_wider(dataBatting) %>%
unnest(c(abbr, roto_points, value, diff, rank)) %>%
unnest_wider(dataPitching) %>%
unnest(c(abbr, roto_points, value, diff, rank))
Error is:
Error: Column names `abbr`, `roto_points`, `value`, `diff`, `rank` must not be duplicated.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
What I want is to stack the columns from dataPitching under the dataBatting columns that share their names (abbr, roto_points, value, diff, rank).
I would also like to know how to rename the duplicated columns. Is tidyr::hoist a better way to do this?
Desired df:
tibble::tribble(
~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod, ~abbr, ~roto_points5, ~value, ~diff, ~rank, ~rankPitching, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod,
2, 0, 111, -4, 1, 0, 65, 0, "OBP", 13, 0.3663, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "HR", 13, 384, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "RBI", 13, 1012, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "R", 13, 1102, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "SB", 13, 164, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "S", 12, 94, 0, 2, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "W", 6, 89, -2, 8, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "K", 11, 1576, -2, 3, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "ERA", 8, 3.946, 0, 6, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "WHIP", 9, 1.2179, 0, 5, 3, 5, 46, -4,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "OBP", 12, 0.3576, 0, 2, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "HR", 11, 323, 0, 3, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "RBI", 11, 954, 0, 3, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "R", 12, 1011, 0, 2, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "SB", 6, 89, 0, 8, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "S", 2, 56, -1, 12, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "W", 7.5, 91, 1.5, 6, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "K", 10, 1508, 0.5, 4, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "ERA", 13, 3.688, 1, 1, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "WHIP", 13, 1.1474, 0, 1, 4, 5.5, 45.5, 2
)
One option is to loop over the 'dataBatting' and 'dataPitching' column names, apply unnest_wider to each separately, unnest the columns of interest, and bind the results together by row (map_dfr: the 'dfr' suffix means it returns a single data frame with the rows bound together from a list of data.frames or tibbles). Note that many tidyverse functions are type sensitive. Here, some list elements have different types, which causes a problem in unnest unless a 'ptype' is specified. To avoid that, we can use type.convert to set the type automatically based on the values and then do the unnesting:
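library(dplyr)
library(tidyr)
library(purrr)
# Unnest each list column separately, then bind the two results by row
map_dfr(c('dataBatting', 'dataPitching'), ~
df %>%
unnest_wider(.x) %>%
# make the element types consistent before unnesting;
# as.is = TRUE keeps strings as character instead of factor
mutate_at(vars(c(abbr, roto_points, value, diff, rank)),
type.convert, as.is = TRUE) %>%
unnest(c(abbr, roto_points, value, diff, rank)) %>%
# drop whichever nested column is still present
select(-any_of(c("dataBatting", "dataPitching"))))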
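The type problem mentioned above can be seen in isolation with the roto_points vector from the second row of the sample data, where one value is quoted. A minimal base R sketch (the variable names mixed and fixed are only illustrative):

```r
# One quoted element forces the whole vector to character
mixed <- c(2, 7.5, 10, "13", 13)
class(mixed)

# type.convert() re-reads the strings and picks a suitable type
fixed <- type.convert(mixed, as.is = TRUE)
class(fixed)
```

The first class() call reports "character" and the second "numeric", which is why converting before unnest lets the batting and pitching pieces combine.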
Consider dput:
structure(list(REAÇÃO = structure(c(0, 1, 0, 0, 1, 0, 1, 1,
0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 1, 0, 1, 1), format.spss = "F11.0"), IDADE = structure(c(22,
38, 36, 58, 37, 31, 32, 54, 60, 34, 45, 27, 30, 20, 30, 30, 22,
26, 19, 18, 22, 23, 24, 50, 20, 47, 34, 31, 43, 35, 23, 34, 51,
63, 22, 29), format.spss = "F11.0"), ESCOLARIDADE = structure(c(6,
12, 12, 8, 12, 12, 10, 12, 8, 12, 12, 12, 8, 4, 8, 8, 12, 8,
9, 4, 12, 6, 12, 12, 12, 12, 12, 12, 12, 8, 8, 12, 16, 12, 12,
12), format.spss = "F11.0"), SEXO = structure(c(1, 1, 0, 0, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 1, 1), format.spss = "F11.0")), .Names = c("REAÇÃO",
"IDADE", "ESCOLARIDADE", "SEXO"), row.names = c(NA, -36L), class = "data.frame")
where REAÇÃO is the dependent variable in the model.
The constant term I am trying to reproduce is -4.438.
How can I obtain this value using a simple function in R?
To obtain the constant term of a discriminant analysis in R (with the MASS package):
groupmean<-(model$prior%*%model$means)
constant<-(groupmean%*%model$scaling)
constant
where model is the fitted lda object:
model<-lda(y~x1+x2+xn,data=mydata)
model
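One way to check the sign of the constant is to rebuild the scores that predict() returns. The sketch below uses the built-in iris data purely as a stand-in (an assumption, not the asker's data) and assumes MASS is installed. As far as I can tell, predict() centres the predictors at the prior-weighted mean before multiplying by the scaling matrix, so the intercept of each discriminant function is the negative of the product computed above.

```r
library(MASS)

# Stand-in example: three species, four predictors
model <- lda(Species ~ ., data = iris)

# Prior-weighted mean of the group means (1 x p matrix)
groupmean <- model$prior %*% model$means

# One constant per discriminant function; note the minus sign
const <- -drop(groupmean %*% model$scaling)

# Rebuild the discriminant scores as X %*% scaling + constant
X <- as.matrix(iris[, 1:4])
scores <- sweep(X %*% model$scaling, 2, const, "+")

# Compare with the scores from predict()
all.equal(scores, predict(model)$x, check.attributes = FALSE)
```

If the comparison holds on your own model, const is the constant term; SPSS may still report the whole function with the opposite sign, since the direction of a discriminant axis is arbitrary.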
I am trying to extract the median values from the following survfit object:
df<-structure(list(n = 26L, time = c(64, 77, 142, 148, 167, 175,
181, 218, 286, 294, 323, 362, 375, 414, 427, 442, 455, 460, 505,
543, 544, 548, 598, 604, 771, 951), n.risk = c(26, 25, 24, 23,
22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7,
6, 5, 4, 3, 2, 1), n.event = c(1, 0, 1, 1, 0, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0), n.censor = c(0,
1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
0, 1, 0, 1), surv = c(0.961538461538462, 0.961538461538462, 0.921474358974359,
0.881410256410256, 0.881410256410256, 0.839438339438339, 0.839438339438339,
0.839438339438339, 0.839438339438339, 0.839438339438339, 0.786973443223443,
0.734508547008547, 0.682043650793651, 0.629578754578755, 0.577113858363858,
0.524648962148962, 0.524648962148962, 0.4663546330213, 0.408060303893637,
0.349765974765975, 0.349765974765975, 0.27981277981278, 0.209859584859585,
0.209859584859585, 0.104929792429792, 0.104929792429792), type = "right",
std.err = c(0.0392232270276368, 0.0392232270276368, 0.0578796660439579,
0.0729817807835649, 0.0729817807835649, 0.0877911880959172,
0.0877911880959172, 0.0877911880959172, 0.0877911880959172,
0.0877911880959172, 0.108967698764172, 0.128980092013706,
0.148762796526449, 0.168939711260041, 0.190043109889266,
0.212620066567793, 0.212620066567793, 0.24309706208875, 0.277404622263805,
0.317431643449181, 0.317431643449181, 0.388281918537096,
0.483834870173886, 0.483834870173886, 0.856794130229766,
0.856794130229766), upper = c(1, 1, 1, 1, 1, 0.997049673308717,
0.997049673308717, 0.997049673308717, 0.997049673308717,
0.997049673308717, 0.974346771572688, 0.945768634864856,
0.912933812389795, 0.876701615980298, 0.837580372384821,
0.795886882462859, 0.795886882462859, 0.751001648029994,
0.70283210436471, 0.651592180391947, 0.651592180391947, 0.598926755204663,
0.541713673163476, 0.541713673163476, 0.56260462703826, 0.56260462703826
), lower = c(0.890389006776242, 0.890389006776242, 0.822651689473135,
0.763934098528765, 0.763934098528765, 0.706741845048289,
0.706741845048289, 0.706741845048289, 0.706741845048289,
0.706741845048289, 0.635633245173389, 0.570438462156972,
0.509547937949868, 0.45211438075625, 0.397645905392106, 0.345848812876783,
0.345848812876783, 0.289595428067216, 0.236917480831754,
0.187749701094333, 0.187749701094333, 0.130725820922461,
0.0812994900059442, 0.0812994900059442, 0.019570157816371,
0.019570157816371), conf.type = "log", conf.int = 0.95, call = survfit(formula = Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],
all_clin$death_event[ind_clin]) ~ event_rna[ind_gene,
ind_tum])), .Names = c("n", "time", "n.risk", "n.event",
"n.censor", "surv", "type", "std.err", "upper", "lower", "conf.type",
"conf.int", "call"), class = "survfit")
I tried to get them as shown below:
x1 <- ifelse (is.na(as.numeric(summary(s)$table[,'median'][1])),'NA',as.numeric(summary(s)$table[,'median'][1]))
x2 <- as.numeric(summary(s)$table[,'median'][2])
if(x1 != 'NA' & x2 != 'NA'){
lines(c(0,x1),c(0.5,0.5),col='blue')
lines(c(x1,x1),c(0,0.5),col='black')
lines(c(x2,x2),c(0,0.5),col='red')
}
I get the following error for both commands:
Error in summary(s)$table[, "median"] : incorrect number of dimensions
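A likely cause: when a survfit object holds a single curve (no strata), summary()$table is a named vector rather than a matrix, so two-index subsetting with [, "median"] fails. The sketch below uses the aml data set shipped with the survival package as a stand-in, and get_median is a hypothetical helper name, not part of the package.

```r
library(survival)

# Stand-in fits: one curve overall, and one curve per level of x
fit_one <- survfit(Surv(time, status) ~ 1, data = aml)
fit_two <- survfit(Surv(time, status) ~ x, data = aml)

# Hypothetical helper: handle both the vector and the matrix case
get_median <- function(fit) {
  tab <- summary(fit)$table
  if (is.null(dim(tab))) tab["median"] else tab[, "median"]
}

get_median(fit_one)  # a single value
get_median(fit_two)  # one value per stratum
```

With a single curve there is no second median, so the code that draws the red line for x2 would also need to be skipped in that case.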
I'm a beginneR using RStudio with R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet" on Windows 7.
Data I'm using...
> dput(head(data,20))
structure(list(case = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), age = c(37, 42, 44, 40, 26, 29, 42, 26,
18, 56, 29, 66, 71, 26, 30, 48, 39, 65, 65, 48), bmi = c(25.95,
29.07, 27.63, 27.4, 25.34, 31.38, 25.08, 28.01, 24.69, 25.06,
27.68, 23.51, 29.86, 21.72, 25.95, 22.86, 23.53, 21.3, 33.2,
29.39), ord.bmi = c(3, 3, 3, 3, 3, 4, 3, 3, 2, 3, 3, 2, 3, 2,
3, 2, 2, 2, 4, 3), alcohol = c(2, 2, 1, 1, 2, 1, 1, 1, 1, 1,
2, 1, 1, 1, 1, 1, 2, 2, 1, 1), tobacco = c(1, 1, 1, 2, 2, 1,
2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1), dent.amalgam = c(1,
2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1), exp.years = c(7,
9, 9, 5, 2, 10, 15, 5, 1, 40, 10, 50, 50, 1, 12, 22, 22, 30,
40, 30), mn = c(0, 0, 0, 1.5, 1.5, 1, 0, 0, 0.5, 0.5, 1, 1, 0,
0, 0, 0.5, 0, 0.5, 2, 1), bn = c(2.5, 5, 2.5, 2, 1.5, 4, 2, 1.5,
4.5, 4.5, 2.5, 2, 6, 2, 5, 4, 1, 1.5, 7, 1.5), ln = c(0.5, 1.5,
0, 2, 1.5, 1.5, 1, 0.5, 2, 2, 1, 1, 4.5, 0, 2, 1, 3, 2, 3, 3),
pn = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5,
0.5, 0.5, 0, 0), cc = c(0, 1, 0, 2, 2, 4, 1, 1.5, 4.5, 2,
0, 3.5, 2, 1.5, 2, 1.5, 0.5, 1, 2, 1.5), kr = c(0, 0, 0,
0, 0, 0, 0.5, 0, 0.5, 1, 0, 0.5, 1.5, 0.5, 0.5, 0.5, 0, 0.5,
0, 0), kl = c(0.5, 2, 0, 1.5, 1.5, 0, 2, 0, 2, 2, 0, 1.5,
1.5, 1, 4, 3, 2, 3.5, 4.5, 2)), .Names = c("case", "age",
"bmi", "ord.bmi", "alcohol", "tobacco", "dent.amalgam", "exp.years",
"mn", "bn", "ln", "pn", "cc", "kr", "kl"), row.names = c(NA,
20L), class = "data.frame")
I'm plotting two different densities (which I get using density.a <- lapply(data[which(data$case == 0),], density) and density.b <- lapply(data[which(data$case == 1),], density)), and everything seems to work fine:
plot.densities <- function(sample.a, sample.b){ # declaring the function arguments
for(i in seq(length(sample.a))){ # for every element in the first argument (expected equal lengths)
plot(range(sample.a[[i]]$x, sample.b[[i]]$x), # generate a plot
range(sample.a[[i]]$y, sample.b[[i]]$y),
xlab = names(sample.a[i]), ylab = "Density", main = paste(names(sample.a[i]), "density plot"))
lines(sample.a[[i]], col = "red") # red lines
lines(sample.b[[i]], col = "green") #green lines
}
}
When I call the function, I get plots like this:
Then, to fill the area between the two curves, I added a call to polygon, and the function looks like this:
filled.plot <- function(sample.a, sample.b){ # declaring the function arguments
for(i in seq(length(sample.a))){ # for every element in the first argument (expected equal lengths)
plot(range(sample.a[[i]]$x, sample.b[[i]]$x), # generate a plot
range(sample.a[[i]]$y, sample.b[[i]]$y),
xlab = names(sample.a[i]), ylab = "Density",
main = paste(names(sample.a[i])))
lines(sample.a[[i]], col = "red") # red lines
lines(sample.b[[i]], col = "green") #green lines
polygon(x = c(range(sample.a[[i]]$x, sample.b[[i]]$x),
rev(range(sample.a[[i]]$x, sample.b[[i]]$x))),
y = c(range(sample.a[[i]]$y, sample.b[[i]]$y),
rev(range(sample.a[[i]]$x, sample.b[[i]]$x))),
col = "skyblue")
}
}
But when I call the filled.plot function, I get plots like this:
I'm stuck, and some help would be just fine!
Thanks in advance.
Try with ggplot (I have changed the case value of rows 11:20 to 2):
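library(ggplot2)
testdf <- data            # the sample data frame from the question
testdf$case[11:20] <- 2   # rows 11:20 become the second group
ggplot()+
geom_density(data=testdf[testdf$case==1,], aes(age),fill='red', alpha=0.5)+
geom_density(data=testdf[testdf$case==2,], aes(age), fill='green', alpha=0.5)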
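For a base-graphics alternative, polygon() fills the shape traced by the x and y vectors it is given. Passing range() values supplies only the extremes rather than the curve, so a sketch along these lines passes the density coordinates themselves. The two samples are simulated stand-ins, not the question's data, and each curve is shaded with a semi-transparent colour (as in the ggplot version) rather than filling only the gap between the curves.

```r
set.seed(1)

# Hypothetical samples standing in for the two case groups
dens_a <- density(rnorm(100, mean = 40, sd = 10))
dens_b <- density(rnorm(100, mean = 50, sd = 12))

# Empty plot sized to hold both curves
plot(range(dens_a$x, dens_b$x), range(dens_a$y, dens_b$y),
     type = "n", xlab = "age", ylab = "Density")

# Fill under each curve using its own x/y coordinates
polygon(dens_a$x, dens_a$y,
        col = adjustcolor("red", alpha.f = 0.5), border = "red")
polygon(dens_b$x, dens_b$y,
        col = adjustcolor("green", alpha.f = 0.5), border = "green")
```

Inside the loop from the question, the same idea would mean passing sample.a[[i]]$x and sample.a[[i]]$y to polygon().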