How to overlay survival plots in R

I would like to overlay two different survival curves on the same plot, for example OS and PFS (the values below are fake).
Patient  OS  OS_Time (years)  PFS  PFS_Time (years)
-------  --  ---------------  ---  ----------------
1        1   12               0    12
2        0   10               1    8
3        0   14               0    14
4        0   10               0    10
5        1   11               1    8
6        1   16               1    6
7        0   11               1    4
8        0   12               1    10
9        1   9                0    9
10       1   10               1    9
First, I import my dataset:
library(readxl)
testR <- read_excel("~/test.xlsx")
View(testR)
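Then I created a survfit object for both OS and PFS:
library(survival)
# column names as in the data sample below (OS_times, PFS_time)
OS <- survfit(Surv(OS_times, OS) ~ 1, data = test)
PFS <- survfit(Surv(PFS_time, PFS) ~ 1, data = test)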
Finally, I can plot each one with:
plot(OS)
plot(PFS)
for example (or with ggplot2, etc.).
My question: how can I overlay the two curves on the same graph?
I tried multipleplot, and also:
ggplot(testR, aes(x)) + # basic graphical object
geom_line(aes(y=y1), colour="red") + # first layer
geom_line(aes(y=y2), colour="green") # second layer
But it didn't work (though I'm not sure I used it correctly).
Can someone help me, please?
Thanks a lot
Here is the code for my data sample:
test <- structure(list(ID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 2, 3, 4, 5, 6, 7, 8, 9),
Sex = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1),
Tabac = c(2, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 1, 1),
Bmi = c(20, 37, 37, 25, 28, 38, 16, 27, 26, 28, 15, 36, 20, 17, 28, 37, 27, 26, 18),
Age = c(75, 56, 45, 65, 76, 34, 87, 43, 67, 90, 56, 37, 84, 45, 80, 87, 90, 65, 23), c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0),
OS_times = c(2, 4, 4, 2, 3, 5, 5, 3, 2, 2, 4, 1, 3, 2, 4, 3, 4, 3, 2),
OS = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0),
PFS_time = c(1, 2, 1, 1, 3, 4, 3, 1, 2, 2, 4, 1, 2, 2, 2, 3, 4, 3, 2),
PFS = c(1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0)),
.Names = c("ID", "Sex", "Tabac", "Bmi", "Age", "LN", "OS_times", "OS", "PFS_time", "PFS"),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -19L))

You can use the ggsurv() function from the GGally package. Stack both sets of variables into one data frame with an extra "type" column, then stratify the survfit() formula by type so that each type becomes its own curve on the plot.
I used your data structure and named it "test", then converted it to a plain data frame named "testdf".
library(survival)  # Surv() and survfit()
library(GGally)    # ggsurv()
testdf <- data.frame(test)
OS_PFS1 <- data.frame(life = testdf$OS, life_times = testdf$OS_times, type= "OS")
OS_PFS2 <- data.frame(life = testdf$PFS, life_times = testdf$PFS_time, type= "PFS")
OS_PFS <- rbind(OS_PFS1, OS_PFS2)
sf.OS_PFS <- survfit(Surv(life_times, life) ~ type, data = OS_PFS)
ggsurv(sf.OS_PFS)
If you want the confidence intervals shown:
ggsurv(sf.OS_PFS, CI = TRUE)
Please let me know whether this is what you want.
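The two separate fits from the question can also be overlaid without GGally: plot() on a survfit object draws the first curve, and lines() adds the second to the same axes. A minimal base-R sketch, assuming the fake values from the table in the question (column names follow the dput() sample; the data frame name `patients`, the colours and the axis labels are arbitrary choices):

```r
library(survival)

# Fake example values from the question's table (event indicator + time in years).
patients <- data.frame(
  OS       = c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1),
  OS_times = c(12, 10, 14, 10, 11, 16, 11, 12, 9, 10),
  PFS      = c(0, 1, 0, 0, 1, 1, 1, 1, 0, 1),
  PFS_time = c(12, 8, 14, 10, 8, 6, 4, 10, 9, 9)
)

fit_os  <- survfit(Surv(OS_times, OS) ~ 1, data = patients)
fit_pfs <- survfit(Surv(PFS_time, PFS) ~ 1, data = patients)

# Draw the OS curve first, then add the PFS curve to the same axes.
plot(fit_os, conf.int = FALSE, col = "blue",
     xlab = "Time (years)", ylab = "Survival probability")
lines(fit_pfs, conf.int = FALSE, col = "red")
legend("bottomleft", legend = c("OS", "PFS"), col = c("blue", "red"), lty = 1)
```

Setting conf.int = TRUE in both calls adds the confidence bands, at the cost of a busier plot.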

Related

Odds ratio for 2*5 Table [duplicate]

I have the following dataframe:
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15), var1 = c(1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1), var2 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1),
var3 = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1), var4 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1), outcome = c(1,
1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)), row.names = c(NA,
-15L), class = c("tbl_df", "tbl", "data.frame"))
I would like to write a script that calculates the odds ratio (using chi-square), with 95% CI and p-value, between each column and the outcome column.
How can I do that?
I installed epitools, but it seems to need a 2x2 contingency table, and I can't figure out how to apply the function to the columns of a data frame.
With mapply, you can use fisher.test(), which doesn't fail when the odds ratio cannot be calculated (it returns 0 or Inf instead). Passing the outcome through MoreArgs keeps it from being iterated over:
mapply(fisher.test, x = df1[, grep("var", names(df1))], MoreArgs = list(y = df1$outcome))
But the output is a 7x4 matrix (one column of test components per variable), which is awkward to tidy into a nice format. Instead, we can use lapply to run Fisher's test on each column and then tidy the results with the broom package.
library(broom)
cols <- df1[,grep("var", names(df1))]
res_list <- lapply(as.list(cols), function(x) fisher.test(x, y=df1$outcome))
do.call(rbind, lapply(res_list, broom::tidy))
# A tibble: 4 x 6
estimate p.value conf.low conf.high method alternative
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 0 1 0 77.9 Fisher's Exact Test ~ two.sided
2 Inf 0.505 0.204 Inf Fisher's Exact Test ~ two.sided
3 2.13 0.608 0.160 37.2 Fisher's Exact Test ~ two.sided
4 Inf 0.505 0.204 Inf Fisher's Exact Test ~ two.sided
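Or, with the tidyverse: reshape to long format first, split on the variable name, and map fisher.test() over the pieces.
library(dplyr)
library(tidyr)  # pivot_longer()
library(purrr)  # map(), map_df()
library(broom)  # tidy()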
df1 %>%
pivot_longer(cols=starts_with("var")) %>%
split(.$name) %>%
map(~fisher.test(x=.$value, y=.$outcome)) %>%
map(tidy) %>%
map_df(~as_tibble(.))
Data:
df1 <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15), var1 = c(1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1), var2 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1),
var3 = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1), var4 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1), outcome = c(1,
1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)), row.names = c(NA,
-15L), class = c("tbl_df", "tbl", "data.frame"))
The following code performs the computation described in the question with epitools, but three of the four variables give errors.
library(epitools)
cols <- grep("var", names(df1), value = TRUE)
res_list <- lapply(cols, function(v){
tbl <- table(df1[, c(v, "outcome")])
tryCatch(oddsratio(x = tbl), error = function(e) e)
})
ok <- !sapply(res_list, inherits, "error")
res_list[ok]
The errors are all the same:
simpleError in uniroot(function(or) { 1 - midp(a1, a0, b1, b0, or) - alpha/2}, interval = interval): f() values at end points not of opposite sign
which can be seen with
res_list[!ok]
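Since the question asks for chi-square-based odds ratios, here is a base-R sketch that builds each 2x2 table with table() and computes the odds ratio directly as ad/(bc), with a Wald confidence interval on the log scale and a Pearson chi-square p-value from chisq.test(). The helper name odds_ratio_chisq is hypothetical, and with counts this small the normal approximation is rough, which is one reason the answers above use fisher.test().

```r
# The var columns and outcome from the question's df1.
df1 <- data.frame(
  var1    = c(1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  var2    = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1),
  var3    = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1),
  var4    = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1),
  outcome = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Hypothetical helper: Wald odds ratio, CI and chi-square p-value for binary x vs y.
# Table layout:      y = 0   y = 1
#            x = 0     a       b
#            x = 1     c0      d
odds_ratio_chisq <- function(x, y, conf.level = 0.95) {
  tab <- table(factor(x, levels = c(0, 1)), factor(y, levels = c(0, 1)))
  a <- tab[1, 1]; b <- tab[1, 2]; c0 <- tab[2, 1]; d <- tab[2, 2]
  or <- (a * d) / (b * c0)
  se <- sqrt(1 / a + 1 / b + 1 / c0 + 1 / d)  # infinite when any cell is zero
  z  <- qnorm(1 - (1 - conf.level) / 2)
  # Small expected counts make chisq.test() warn that the approximation may be off.
  p  <- suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)
  c(estimate  = or,
    conf.low  = exp(log(or) - z * se),
    conf.high = exp(log(or) + z * se),
    p.value   = p)
}

vars <- grep("^var", names(df1), value = TRUE)
res  <- t(sapply(vars, function(v) odds_ratio_chisq(df1[[v]], df1$outcome)))
res
```

Zero cells make the estimate 0 or Inf and the interval uninformative (var1, var2 and var4 here), mirroring the fisher.test() output above.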

One-hot coding to numeric [duplicate]

I am working on a project that requires me to one-hot encode a single variable, and I can't seem to do it correctly.
I simply want to one-hot encode the variable data$Ratings so that the values 1, 2 and 3 are separated into their own columns in the data frame, each equal to either 0 or 1. For example, if data$Ratings is 3, then the dummy for 3 would be 1. All the other columns should stay unchanged.
structure(list(ID = c(284921427, 284926400, 284946595, 285755462,
285831220, 286210009, 286313771, 286363959, 286566987, 286682679
), AUR = c(4, 3.5, 3, 3.5, 3.5, 3, 2.5, 2.5, 2.5, 2.5), URC = c(3553,
284, 8376, 190394, 28, 47, 35, 125, 44, 184), Price = c(2.99,
1.99, 0, 0, 2.99, 0, 0, 0.99, 0, 0), AgeRating = c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1), Size = c(15853568, 12328960, 674816, 21552128,
34689024, 48672768, 6328320, 64333824, 2657280, 1466515), HasSubtitle = c(0,
0, 0, 0, 0, 1, 0, 0, 0, 0), InAppSum = c(0, 0, 0, 0, 0, 1.99,
0, 0, 0, 0), InAppMin = c(0, 0, 0, 0, 0, 1.99, 0, 0, 0, 0), InAppMax = c(0,
0, 0, 0, 0, 1.99, 0, 0, 0, 0), InAppCount = c(0, 0, 0, 0, 0,
1, 0, 0, 0, 0), InAppAvg = c(0, 0, 0, 0, 0, 1.99, 0, 0, 0, 0),
descriptionTermCount = c(263, 204, 97, 272, 365, 368, 113,
129, 61, 87), LanguagesCount = c(17, 1, 1, 17, 15, 1, 0,
1, 1, 1), EngSupported = c(2, 2, 2, 2, 2, 2, 1, 2, 1, 2),
GenreCount = c(2, 2, 2, 2, 3, 3, 3, 2, 3, 2), months = c(7,
7, 7, 7, 7, 7, 7, 8, 8, 8), monthsSinceUpdate = c(29, 17,
25, 29, 15, 6, 71, 12, 23, 134), GameFree = c(0, 0, 0, 0,
0, 1, 0, 0, 0, 0), Ratings = c(3, 3, 3, 3, 2, 3, 2, 3, 2,
3)), row.names = c(NA, 10L), class = "data.frame")
install.packages("mlbench")
install.packages("neuralnet")
install.packages("mltools")
library(mlbench)
library(dplyr)
library(caret)
library(mltools)
library(tidyr)
data2 <- mutate_if(data, is.factor,as.numeric)
data3 <- lapply(data2, function(x) as.numeric(as.character(x)))
data <- data.frame(data3)
summary(data)
head(data)
str(data)
View(data)
#
dput(head(data, 10))
data %>% mutate(value = 1) %>% spread(data$Ratings, value, fill = 0 )
Is this what you want? I assume your data frame is called data, as in the sample you supplied:
library(plm)
plm::make.dummies(data$Ratings) # returns a matrix
## 2 3
## 2 1 0
## 3 0 1
# returns the full data frame with dummies added:
plm::make.dummies(data, col = "Ratings")
## [not printed to save space]
There are some options for plm::make.dummies, e.g., you can select the base category via base and you can choose whether to include the base (add.base = TRUE) or not (add.base = FALSE).
The help page ?plm::make.dummies has more examples and explanation as well as a comparison for LSDV model estimation by a factor variable and by explicitly self-created dummies.
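If you'd rather not depend on plm, a small base-R helper can add the dummy columns directly. The function name one_hot and the Ratings_1, Ratings_2, ... column names are hypothetical choices; the data are two columns taken from the sample above.

```r
# Hypothetical helper: add one 0/1 column per level of `col`,
# leaving every existing column untouched.
one_hot <- function(df, col, levels = sort(unique(df[[col]]))) {
  for (lv in levels) {
    df[[paste0(col, "_", lv)]] <- as.integer(df[[col]] == lv)
  }
  df
}

# Two columns from the question's sample data.
data <- data.frame(
  AUR     = c(4, 3.5, 3, 3.5, 3.5, 3, 2.5, 2.5, 2.5, 2.5),
  Ratings = c(3, 3, 3, 3, 2, 3, 2, 3, 2, 3)
)

# levels = 1:3 creates Ratings_1 even though no row has rating 1.
out <- one_hot(data, "Ratings", levels = 1:3)
head(out)
```

Passing levels explicitly matters when a category never occurs in the data; otherwise only the observed values get a column.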

How to set NA values from a matrix to black-coloured tiles in a ggplot heatmap

I am working on the following structure and the following plotting code:
library(dplyr)    # %>% and mutate()
library(forcats)  # fct_rev()
library(ggplot2)
mat <- structure(c(NA, 11, 9, 9, 21, 7, 2, 5, 3, 0, 0, 1, 31, NA, 3,
2, 1, 0, 0, 10, 3, 0, 0, 0, 31, 16, NA, 2, 2, 10, 0, 5, 0, 0,
0, 0, 59, 65, 1, NA, 2, 4, 0, 4, 0, 0, 0, 0, 156, 23, 7, 17,
NA, 3, 2, 4, 7, 0, 0, 0, 31, 84, 0, 10, 16, NA, 0, 6, 0, 0, 2,
0, 129, 0, 2, 1, 0, 0, NA, 0, 0, 0, 0, 0, 41, 41, 0, 3, 4, 5,
0, NA, 0, 0, 0, 1, 16, 4, 1, 2, 0, 0, 0, 3, NA, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 1, 12, 2, 0, 0, 6, 0, 0, 0, 0,
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(12L,
12L), .Dimnames = list(c("WILL_", "WOULD_", "MAY_", "MIGHT_",
"CAN_", "COULD_", "SHALL_", "SHOULD_", "MUST_", "OUGHT TO_",
"USED TO_", "HAVE TO_"), c("_WILL", "_WOULD", "_MAY", "_MIGHT",
"_CAN", "_COULD", "_SHALL", "_SHOULD", "_MUST", "_OUGHT TO",
"_USED TO", "_HAVE TO")))
breaks <- c(0,1,5,10,50,100,500,100000)
reshape2::melt(mat, value.name = "Freq") %>%
mutate(label = ifelse(is.na(Freq) | Freq == 0, "", as.character(Freq))) %>%
ggplot(aes(Var2, fct_rev(Var1))) +
geom_tile(aes(fill = Freq), color = "black") +
geom_text(aes(label = label), color = "black") +
scale_fill_steps(low = "white", high = "purple", breaks = breaks, na.value = "grey",trans = "log")+
scale_x_discrete(NULL, expand = c(0, 0), position="top") +
scale_y_discrete(NULL, expand = c(0, 0)) +
theme(axis.text.x = element_text(angle=60,vjust = 0.5, hjust = 0))
I am trying to tweak the code so that the original NA values (the tiles forming the diagonal from WILL/WILL to HAVE TO/HAVE TO, plus the whole _HAVE TO column) are shown as black tiles, separately from the other tiles, which I would like to keep as they are.
I'm looking for tips on how to do this, as I suspect I'm doing something wrong in how the values are represented at the beginning of my code.
All the best
Cameron
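One possible approach, sketched here rather than taken from the thread, is to draw the NA cells in their own geom_tile() layer with a fixed black fill, so they no longer depend on the fill scale's na.value and every other tile keeps its current colouring. The sketch uses only the top-left 3x3 corner of the matrix above and base R's as.table() in place of reshape2::melt(); the names m, df and p are illustrative.

```r
library(ggplot2)

# Top-left 3x3 corner of the question's matrix (NAs on the diagonal).
m <- matrix(c(NA, 11, 9, 31, NA, 3, 31, 16, NA), nrow = 3,
            dimnames = list(c("WILL_", "WOULD_", "MAY_"),
                            c("_WILL", "_WOULD", "_MAY")))

# Long format: columns Var1 (row), Var2 (column), Freq (NA cells kept).
df <- as.data.frame(as.table(m))

p <- ggplot(df, aes(Var2, Var1)) +
  # Observed counts use the fill scale as before.
  geom_tile(data = subset(df, !is.na(Freq)), aes(fill = Freq), colour = "black") +
  # Cells that were NA in the matrix get a fixed black fill, outside the scale.
  geom_tile(data = subset(df, is.na(Freq)), fill = "black", colour = "black") +
  scale_fill_steps(low = "white", high = "purple", na.value = "grey")
p
```

In the full code, adding only the second (black) layer after the existing geom_tile() call should have a similar effect, since later layers are drawn on top.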

Use and save lm summary for multiple lm fits

I work with animal trials in which I try to get information about movement for several groups of animals (normally 4 groups of 12 individuals, but not always the same).
My final data frame per trial looks like this.
> dput(aa)
structure(list(Tiempo = c(618.4, 618.6, 618.8, 619, 619.2, 619.4,
619.6, 619.8, 620, 620.2, 620.4), UT1 = c(0, 0, 15, 19, 26, 27,
29, 37, 42, 44, 45), UT2 = c(0, 0, 0, 0, 0, 1, 18, 19, 21, 21,
21), UT3 = c(0, 2, 3, 3, 3, 3, 16, 19, 20, 20, 20), UT4 = c(0,
0, 0, 0, 0, 0, 5, 17, 29, 34, 39), UT5 = c(0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), UT6 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UT7 = c(0,
0, 1, 2, 2, 3, 4, 6, 7, 7, 8), UT8 = c(0, 19, 20, 23, 24, 25,
33, 80, 119, 122, 130), UT9 = c(0, 1, 1, 1, 1, 3, 6, 9, 19, 19,
19), UT10 = c(0, 0, 0, 0, 0, 1, 2, 3, 10, 12, 14), TR1 = c(0,
0, 0, 0, 0, 0, 0, 1, 2, 2, 2), TR2 = c(0, 0, 0, 0, 0, 0, 2, 19,
32, 37, 43), TR3 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), TR4 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TR5 = c(0, 0, 0, 0, 0, 0, 13,
18, 20, 22, 26), TR6 = c(0, 2, 11, 20, 25, 29, 37, 40, 41, 42,
43), TR7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TR8 = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), TR9 = c(0, 0, 4, 9, 16, 19, 23, 27,
31, 33, 34), TR10 = c(0, 1, 9, 25, 32, 41, 49, 49, 51, 57, 60
), UT1.1 = c(0, 10, 15, 17, 23, 31, 37, 48, 53, 57, 58), UT2.1 = c(0,
1, 1, 1, 1, 2, 2, 4, 4, 4, 4), UT3.1 = c(0, 2, 11, 14, 20, 22,
24, 25, 26, 26, 26), UT4.1 = c(0, 0, 0, 0, 0, 0, 0, 11, 13, 13,
14), UT5.1 = c(0, 3, 5, 7, 18, 19, 19, 27, 37, 39, 42), UT6.1 = c(0,
0, 0, 0, 0, 0, 2, 2, 3, 4, 4), UT7.1 = c(0, 0, 2, 8, 9, 9, 12,
16, 18, 18, 18), UT8.1 = c(0, 0, 1, 8, 13, 15, 44, 68, 80, 89,
94), UT9.1 = c(0, 1, 1, 1, 1, 2, 3, 5, 9, 10, 10), UT10.1 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UT11 = c(0, 12, 17, 17, 18, 34,
74, 116, 131, 145, 170), UT12 = c(0, 1, 2, 3, 3, 3, 5, 14, 21,
22, 24), TR1.1 = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1), TR2.1 = c(0,
0, 0, 11, 16, 19, 40, 94, 121, 134, 145), TR3.1 = c(0, 0, 0,
2, 3, 5, 6, 6, 6, 7, 7), TR4.1 = c(0, 0, 0, 1, 1, 1, 1, 1, 4,
4, 5), TR5.1 = c(0, 24, 27, 28, 29, 37, 86, 151, 212, 258, 288
), TR6.1 = c(0, 0, 1, 1, 1, 2, 5, 9, 12, 12, 13), TR7.1 = c(0,
4, 7, 28, 47, 70, 108, 125, 127, 127, 127), TR8.1 = c(0, 1, 2,
2, 2, 2, 3, 3, 4, 4, 4), TR9.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), TR10.1 = c(0, 1, 1, 1, 1, 1, 13, 40, 41, 45, 49), TR11 = c(0,
0, 0, 1, 4, 8, 10, 11, 17, 23, 25), TR12 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0)), .Names = c("Tiempo", "UT1", "UT2", "UT3", "UT4",
"UT5", "UT6", "UT7", "UT8", "UT9", "UT10", "TR1", "TR2", "TR3",
"TR4", "TR5", "TR6", "TR7", "TR8", "TR9", "TR10", "UT1.1", "UT2.1",
"UT3.1", "UT4.1", "UT5.1", "UT6.1", "UT7.1", "UT8.1", "UT9.1",
"UT10.1", "UT11", "UT12", "TR1.1", "TR2.1", "TR3.1", "TR4.1",
"TR5.1", "TR6.1", "TR7.1", "TR8.1", "TR9.1", "TR10.1", "TR11",
"TR12"), row.names = c(NA, -11L), class = "data.frame")
My goal is to fit a linear model to each individual (column) with Tiempo as x, so I do it like this:
fit<-apply(aa,2,function(x) lm(x~aa$Tiempo))
It works perfectly, but all the valuable (and useless) information ends up stored in the lm objects and I can't extract it efficiently. My fit list looks like this:
summary(fit)
Length Class Mode
Tiempo 12 lm list
UT1 12 lm list
UT2 12 lm list
UT3 12 lm list
UT4 12 lm list
UT5 12 lm list
UT6 12 lm list
UT7 12 lm list
UT8 12 lm list
UT9 12 lm list
UT10 12 lm list
TR1 12 lm list
TR2 12 lm list
TR3 12 lm list
TR4 12 lm list
TR5 12 lm list
TR6 12 lm list
TR7 12 lm list
TR8 12 lm list
TR9 12 lm list
TR10 12 lm list
UT1.1 12 lm list
UT2.1 12 lm list
UT3.1 12 lm list
UT4.1 12 lm list
UT5.1 12 lm list
UT6.1 12 lm list
UT7.1 12 lm list
UT8.1 12 lm list
UT9.1 12 lm list
UT10.1 12 lm list
UT11 12 lm list
UT12 12 lm list
TR1.1 12 lm list
TR2.1 12 lm list
TR3.1 12 lm list
TR4.1 12 lm list
TR5.1 12 lm list
TR6.1 12 lm list
TR7.1 12 lm list
TR8.1 12 lm list
TR9.1 12 lm list
TR10.1 12 lm list
TR11 12 lm list
TR12 12 lm list
And each animal's fit looks like this:
summary(fit$UT1)
Call:
lm(formula = x ~ aa$Tiempo)
Residuals:
Min 1Q Median 3Q Max
-6.873 -1.845 1.182 2.314 4.918
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -14642.700 1104.825 -13.25 3.29e-07 ***
aa$Tiempo 23.682 1.784 13.28 3.24e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.742 on 9 degrees of freedom
Multiple R-squared: 0.9514, Adjusted R-squared: 0.946
F-statistic: 176.3 on 1 and 9 DF, p-value: 3.24e-07
I would like to get the summary information organised in a data frame covering all animals (or at least the coefficients and R-squared values) so I can carry on with further statistical analysis. Having that information could also help me write a function that checks whether R-squared is below a fixed threshold, in which case I should inspect that fit (or discard the animal if it's really not performing well). I also need a way to make this reproducible, because currently I'm using
FIT<-data.frame(UT1=fit$UT1$coefficients,
UT2=fit$UT2$coefficients,
UT3=fit$UT3$coefficients,...)
This approach doesn't really do what I'm after, and it's very fragile.
I searched a bit and found the coef() function, but
coef(fit)
NULL
With your fit list, you can extract the coefficients and R-squared values with
fit<-apply(aa,2,function(x) lm(x~aa$Tiempo))
mysummary <- t(sapply(fit, function(x) {
ss<-summary(x); c(coef(x),
r.square=ss$r.squared, adj.r.squared=ss$adj.r.squared)
}))
We use sapply to go over the list you created, taking the coefficients from each model and the R-squared values from its summary. The output is
> mysummary
(Intercept) aa$Tiempo r.square adj.r.squared
Tiempo 0.0000 1.0000000 1.0000000 1.0000000
UT1 -14642.7000 23.6818182 0.9514231 0.9460256
UT2 -8662.4182 14.0000000 0.7973105 0.7747894
UT3 -7535.5091 12.1818182 0.8404400 0.8227111
...
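Alternatively, lm() accepts a matrix response and fits one regression per column in a single call (an "mlm" object); coef() then returns a coefficient matrix and summary() returns one summary per response. The sketch below uses only Tiempo and the first three animals from aa, and leaves Tiempo out of the responses, since regressing Tiempo on itself only produces the meaningless perfect fit seen in the first row above. The 0.8 R-squared cut-off is an arbitrary example threshold.

```r
# Tiempo and the first three animals from the question's aa.
aa <- data.frame(
  Tiempo = c(618.4, 618.6, 618.8, 619, 619.2, 619.4, 619.6, 619.8, 620, 620.2, 620.4),
  UT1 = c(0, 0, 15, 19, 26, 27, 29, 37, 42, 44, 45),
  UT2 = c(0, 0, 0, 0, 0, 1, 18, 19, 21, 21, 21),
  UT3 = c(0, 2, 3, 3, 3, 3, 16, 19, 20, 20, 20)
)

responses <- as.matrix(aa[, -1])             # every column except Tiempo
fit_all <- lm(responses ~ Tiempo, data = aa) # one fit, one coefficient column per animal

sums <- summary(fit_all)                     # list of summary.lm objects, one per animal
res <- data.frame(
  animal        = colnames(responses),
  intercept     = coef(fit_all)[1, ],
  slope         = coef(fit_all)[2, ],
  r.squared     = sapply(sums, function(s) s$r.squared),
  adj.r.squared = sapply(sums, function(s) s$adj.r.squared),
  row.names     = NULL
)

# Flag fits below an (arbitrary) R-squared threshold for inspection.
res[res$r.squared < 0.8, ]
```

For a whole trial, as.matrix(aa[, -1]) picks up every animal column automatically, so nothing has to be typed per animal.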

How to plot a ternary graphic with different size point in ggtern?

I would like to make a ggtern graph in which I can change the size of each point. Some patients in my data have only one of the three possible components, so several patients overlap at the same vertex, and I don't want to jitter the points.
What I have so far:
library(compositions)
library(ggtern)
ds <- structure(list(`GC+` = c(1, 0, 9, 21, 2, 0, 0, 0, 4, 0, 0, 24,
0, 0, 1, 0, 0, 3, 3, 0, 5, 0, 0, 3, 0, 0, 0, 2, 11, 0, 0, 18,
13, 0, 6, 8, 0, 1, 0, 1, 23, 0, 1, 4, 5), `PC+` = c(5, 2, 8,
0, 6, 0, 0, 0, 10, 0, 0, 20, 0, 0, 2, 0, 0, 3, 3, 0, 0, 0, 10,
2, 0, 0, 0, 0, 10, 1, 0, 4, 8, 0, 1, 16, 1, 2, 0, 0, 18, 0, 0,
0, 1), `OT+` = c(0, 2, 7, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0,
0, 2, 5, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 12, 0, 0, 6, 3, 1, 0,
6, 0, 0, 0, 0, 3, 0, 0, 3, 0), size = c(1, 1, 1, 4, 1, 0, 0,
0, 1, 0, 3, 1, 0, 0, 1, 0, 3, 1, 1, 0, 4, 0, 1, 1, 0, 0, 0, 1,
1, 2, 0, 1, 1, 3, 1, 1, 2, 1, 0, 4, 1, 0, 4, 1, 1)), row.names = c(NA,
45L), class = "data.frame")
d.tern <- as.data.frame(acomp(ds))
size <- apply(d.tern, 2, function(x) {
sum(x==1)
})
ds$size <- ifelse(d.tern$`GC+` == 1, 4,
ifelse(d.tern$`PC+` == 1, 2,
ifelse(d.tern$`OT+` == 1, 3, 1)))
ds$size[is.na(ds$size)] <- 0
ggtern(data = ds, aes(`GC+`, `PC+`, `OT+`)) +
geom_mask() +
geom_point(fill="red", shape=21, size = 3) +
theme_bw() +
theme_showarrows() +
theme_clockwise() +
labs(x = "GC+", y = "PC+", z = "OT+",
title = "Composição dos Linfonodos Positivos")
I would like to pass the size column from ds to geom_point(), but it doesn't work.
Here is a way to count the samples for each unique combination of values:
tab <- as.data.frame(table(ds[,1:3]))
# Keep only observed samples
tab <- tab[tab$Freq > 0,]
# Fix colnames to contain plus
colnames(tab) <- gsub("\\.", "+", colnames(tab))
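# as.data.frame() on a table returns the grouping columns as factors,
# so convert them back to numbers via their labels
# (as.numeric() alone would return the factor codes, not the values)
tab[, 1:3] <- lapply(tab[, 1:3], function(x) as.numeric(as.character(x)))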
And then the plotting would be as follows:
ggtern(data = tab, aes(`GC+`, `PC+`, `OT+`)) +
geom_mask() +
geom_point(aes(size = Freq), fill="red", shape=21) +
scale_size_continuous(range = c(3, 5), breaks = sort(unique(tab$Freq))) +
theme_bw() +
theme_showarrows() +
theme_clockwise() +
labs(x = "GC+", y = "PC+", z = "OT+",
title = "Composição dos Linfonodos Positivos")
You can play around with the scale_size_continuous() function until you get sizes you are happy with.
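The counting above groups rows by their exact raw values, so patients such as (21, 0, 0) and (5, 0, 0), which both plot at the GC+ vertex, still end up as separate overlapping points. A variant, sketched here rather than taken from the answer, is to close each row to proportions first and count identical compositions; all-zero rows, which have no composition, are dropped. The example uses a hypothetical subset of rows from the question's ds.

```r
# A subset of rows from the question's ds (including one all-zero row).
ds <- data.frame(
  `GC+` = c(1, 21, 0, 1, 0, 5, 0, 0, 0, 1, 1, 1, 0),
  `PC+` = c(5, 0, 0, 2, 0, 0, 1, 0, 1, 2, 0, 0, 0),
  `OT+` = c(0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0),
  check.names = FALSE
)

# Close each row to proportions, so (21, 0, 0) and (5, 0, 0) both become (1, 0, 0).
totals <- rowSums(ds)
props  <- round(ds[totals > 0, ] / totals[totals > 0], 6)

# One row per distinct composition, with the number of patients in Freq.
key  <- do.call(paste, c(props, sep = "_"))
keep <- !duplicated(key)
tab  <- props[keep, ]
tab$Freq <- as.vector(table(key)[key[keep]])
tab
```

The columns stay numeric and keep their original names, so tab can be passed to the same ggtern() call as above, with aes(size = Freq) in geom_point().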
