Plotting two paths on a ggtern plot in R

In the ggtern package in R, I am trying to plot two paths of different colors on the same ternary plot and label their starting points ONLY. Could someone show me how to do this? I can get each path on its own plot, but not both together on the same one. Here is my example:
require(ggtern)
x <- data.frame(
A = c( 0, 0, 1, 0.1),
B = c( 0, 1, 0, 0.3) ,
C = c( 1, 0, 0, 0.6)
)
yy<-data.frame(
D= c(0.6, 0.2,0.8,0.33 ),
E= c(0.2, 0.8, 0.1,0.33),
F= c(0.2, 0.0, 0.1,0.33)
)
ggtern(data=x,aes(A,B,C)) +
geom_path(color="red")+
geom_point(type="l",shape=21,size=2) +
geom_text(label="", color="blue")+
theme_classic()
ggtern(data=yy,aes(D,E,F)) +
geom_path(color="blue")+
geom_point(type="l",shape=21,size=1) +
theme_classic()

Here I provide an answer to your question, also taking the opportunity to demonstrate some of the additional functionality of ggtern 2.0.1, which was published on CRAN a couple of days ago after completely re-writing the package to be compatible with ggplot2 2.0.0. A summary of the new functionality in ggtern 2.0.X can be found here:
Eric Fail is correct in saying that the best solution requires the data to be combined into a single dataframe, with the paths either grouped or mapped to a different variable for colour in order to distinguish between them. An alternate way is to create two path layers, with a local dataframe passed to each geometry, rather than using the global dataframe passed to the ggtern constructor.
In the following solution, I have combined the data, created a 'Series' variable (subsequently mapped to colour), and then made use of the new geom_label(...) geometry that comes with the new version of ggplot2. Since some of the points lie on the perimeter (and the labels extend beyond the perimeter), I have also applied a manual clipping mask under the layers, which suppresses ggtern's automatic clipping mask, normally rendered in the foreground. I have also applied the theme_rotate(...) convenience function for the purposes of demonstration, and made use of the limit_tern(...) convenience function to extend the range of the axes beyond the standard range of [0,1]. Finally, new labels have been created for the procession arrows, which are different from the apex labels.
The above solution can be produced with the following code:
require(ggtern)
df.A <- data.frame(
A = c( 0, 0, 1, 0.1),
B = c( 0, 1, 0, 0.3) ,
C = c( 1, 0, 0, 0.6)
)
df.B <-data.frame(
A= c(0.6, 0.2,0.8,0.33 ),
B= c(0.2, 0.8, 0.1,0.33),
C= c(0.2, 0.0, 0.1,0.33)
)
df = rbind(data.frame(df.A,Series='A'),
data.frame(df.B,Series='B'))
df$Label = 1:nrow(df)
ggtern(data=df,aes(A,B,C,colour=Series)) +
theme_dark() +
theme_legend_position('topleft') +
theme_showarrows() + custom_percent('%') +
theme_rotate(60) +
geom_mask() +
geom_path(size=1) +
geom_label(aes(label=Label),show.legend = F) +
limit_tern(1.1,1.1,1.1) +
labs(title ="Example Combined Paths",
Tarrow = "Value B",
Larrow = "Value A",
Rarrow = "Value C")
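The question asked for labels at the starting points only, whereas the code above labels every point. Here is a minimal sketch of one way to do that, assuming the first row of each series is its starting point. The name `starts` is introduced here for illustration, and the plot is only drawn if ggtern is installed.

```r
# Rebuild the combined data frame from the answer above.
df.A <- data.frame(A = c(0, 0, 1, 0.1),
                   B = c(0, 1, 0, 0.3),
                   C = c(1, 0, 0, 0.6))
df.B <- data.frame(A = c(0.6, 0.2, 0.8, 0.33),
                   B = c(0.2, 0.8, 0.1, 0.33),
                   C = c(0.2, 0.0, 0.1, 0.33))
df <- rbind(data.frame(df.A, Series = "A"),
            data.frame(df.B, Series = "B"))

# Assumption: the first row of each series is its starting point.
# `starts` is a name introduced here for illustration.
starts <- df[!duplicated(df$Series), ]

# Draw only if ggtern is installed; the data preparation above is base R.
if (requireNamespace("ggtern", quietly = TRUE)) {
  library(ggtern)
  p <- ggtern(data = df, aes(A, B, C, colour = Series)) +
    geom_path() +
    # A layer-local data frame restricts the labels to the start points.
    geom_label(data = starts, aes(label = Series), show.legend = FALSE)
  print(p)
}
```

Passing a separate data frame to the label layer keeps the paths themselves untouched, so the same idea should carry over to the two-layer variant mentioned earlier.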

Related

Create venn diagrams in R with circles one inside another

I want to create Venn diagrams that emphasize that groups (circles) are located completely inside one another, i.e., there are no elements in the inner circles that are not simultaneously in the outer circles.
I've used ggvenn and arrived at these results:
colonias <- c("colônias")
possessoes <- c("possessões", colonias)
dominios <- c("domínios", possessoes, colonias)
ggvenn(tipologia_britanica,
show_elements = T,
label_sep = "\n",
fill_color = brewer.pal(name="Dark2", n=3),
fill_alpha = 0.6,
stroke_size = 0.2,
stroke_alpha = 0.2,
set_name_size = 5,
text_size = 5)
The result is technically correct because it shows that "colonias" are common to all three groups and that "possessoes" are common to both "possessoes" and "dominios". But graphically I would like the groups to be completely inside one another, to show that there are no elements in "colonias" that are not common to all three, and none in "possessoes" that are not common to "dominios". I'm not sure that the ggvenn package is capable of plotting that.
One way is to use the eulerr package.
However, your question isn't very clear, so I will let you experiment with the package.
See the example below:
library(eulerr)
fit <- euler(c("A" = 10, "B" = 10, "A&B" = 8, "A&B&C"=3))
plot(fit,
fills = list(fill = c("red", "steelblue4","green"), alpha = 0.5),
labels = list(col = "black", font = 4),quantities = T)
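The example above uses generic counts in which the circles only partly overlap. Here is a sketch of how the fully nested case from the question might be expressed, assuming eulerr reads a named numeric vector as disjoint region sizes (its default input mode). The name `combos` is introduced for illustration, and the fit is approximate, so the circles should come out nested rather than being guaranteed to.

```r
# The three nested sets from the question.
colonias   <- c("colônias")
possessoes <- unique(c("possessões", colonias))
dominios   <- unique(c("domínios", possessoes))

# Disjoint region sizes: elements in the outer set only, in the two
# outer sets only, and in all three. No other regions are listed,
# so no circle has area outside the one that contains it.
combos <- c(
  "dominios"                     = length(setdiff(dominios, possessoes)),
  "dominios&possessoes"          = length(setdiff(possessoes, colonias)),
  "dominios&possessoes&colonias" = length(colonias)
)

# Fit and draw only if eulerr is installed.
if (requireNamespace("eulerr", quietly = TRUE)) {
  fit <- eulerr::euler(combos)
  print(plot(fit, quantities = TRUE))
}
```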
I don't think ggvenn allows a plot with this kind of relationship. However, it's not terribly difficult to draw it yourself with ggplot and geom_circle from ggforce:
library(ggplot2)
library(ggforce)
ggplot(data.frame(group = c("domínios", "possessões", "colônias"),
r = c(3, 2, 1)),
aes(x0 = 3 - r, y0 = 0, fill = factor(group, group))) +
geom_circle(aes(r = r), alpha = 1) +
geom_text(aes(x = c(0, 1, 2), y = c(2.3, 1.3, 0), label = group),
size = 8) +
scale_fill_manual(values = c('#77bca2', '#e1926b', '#a09cc8'),
guide = 'none') +
coord_equal() +
theme_void()

Control Label of Contour Lines in `contour()`

I am using image() and contour() to create a "heatmap" of probabilities - for example:
I was asked to change the labels such that they "do not overlap the lines, and the lines are unbroken." After consulting ?contour(), I tried changing to method = "edge" and method = "simple", but both fail to print the labels (although the lines are unbroken), and I can't seem to find posts regarding similar issues elsewhere.
Any advice on how to manipulate the labels to appear adjacent to (not on top of) unbroken lines would be much appreciated. I would prefer base R but also would welcome options from more flexible packages or alternative base R functions.
Minimal code to recreate example figure is here:
# Generate Data
Rs <- seq(0.02, 1.0, 0.005)
ks <- 10 ^ seq(-2.3, 0.5, 0.005)
prob <- function(Y,R,k) {
exp(lgamma(k*Y+Y-1) - lgamma(k*Y) - lgamma(Y+1) + (Y-1) * log(R/k) - (k*Y+Y-1) * log(1+R/k))
}
P05 <- matrix(NA, ncol = length(ks), nrow = length(Rs))
for(i in 1:length(Rs)) {
for(j in 1:length(ks)) {
P05[i,j] <- 1 - sum(prob(1:(5 - 1), Rs[i], ks[j]))
}
}
colfunc <- colorRampPalette(c("grey25", "grey90"))
lbreaks <- c(-1e-10, 1e-5, 1e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 1.5e-1, 1)
## Create Figure
image(Rs, ks, P05,
log="y", col = rev(colfunc(length(lbreaks)-1)), breaks = lbreaks, zlim = lbreaks,
ylim = c(min(ks), 2), xlim = c(0,1))
contour(Rs, ks, P05, levels = lbreaks, labcex = 1, add = TRUE)
There is an easy(ish) way to do this in ggplot, using the geomtextpath package.
First, convert your matrix to an x, y, z data frame:
df <- expand.grid(Rs = Rs, ks = ks)
df$z <- c(P05)
Now plot a filled contour, and then geom_textcontour. By default the text will break the lines, as in contour, but if you set the vjust above one or below zero the lines will close up as they don't need to break for the text.
I've added a few theme and scale elements to match the aesthetic of the base graphics function. Note the text and line size, color etc remain independently adjustable.
library(geomtextpath)
ggplot(df, aes(Rs, ks, z = z)) +
geom_contour_filled(breaks = lbreaks) +
geom_textcontour(breaks = lbreaks, color = 'black', size = 5,
aes(label = after_stat(level)), vjust = 1.2) +
scale_y_log10(breaks = c(0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2),
expand = c(0, 0)) +
scale_fill_manual(values = rev(colfunc(9)), guide = 'none') +
scale_x_continuous(expand = c(0, 0)) +
theme_classic(base_size = 16) +
theme(axis.text.y = element_text(angle = 90, hjust = 0.5),
axis.ticks.length.y = unit(3, 'mm'),
plot.margin = margin(20, 20, 20, 20))
The contour function is mostly written in C, and as far as I can see, it doesn't support the kinds of labels you want.
So I think there are two ways to do this, neither of which is very appealing:
1. Modify the source to the function. You can see the start of the labelling code here. I think you would need to rebuild R to incorporate your changes; it's not easy to move a function from a base package to a contributed package.
2. Draw the plot with no labels, and add them manually after the fact. You could add them using text(), or produce an output file and use an external program to edit the output file.
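The second option (no labels from contour(), then text() by hand) can be sketched in base R as follows. This is only one possible approach: it uses contourLines() to recover the line coordinates, and the surface, the levels, and the rule for picking an anchor point (the highest vertex of each line) are assumptions made for illustration.

```r
# Small stand-in surface (assumption: any numeric matrix with x/y grids works).
x <- seq(-2, 2, length.out = 60)
y <- seq(-2, 2, length.out = 60)
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2)))
levs <- c(0.2, 0.5, 0.8)

# Extract the contour coordinates without drawing anything.
cl <- contourLines(x, y, z, levels = levs)

# Pick one anchor point per contour line (here: the vertex with the largest y).
# `label_pos` is a name introduced for illustration.
label_pos <- do.call(rbind, lapply(cl, function(l) {
  i <- which.max(l$y)
  data.frame(level = l$level, x = l$x[i], y = l$y[i])
}))

# Null device so the sketch also runs non-interactively;
# drop this line and dev.off() to draw on screen.
pdf(NULL)
image(x, y, z, col = gray.colors(12))
# drawlabels = FALSE leaves the lines unbroken.
contour(x, y, z, levels = levs, drawlabels = FALSE, add = TRUE)
# pos = 3 places each label just above its anchor, i.e. beside the line.
text(label_pos$x, label_pos$y, labels = label_pos$level, pos = 3, offset = 0.3)
invisible(dev.off())
```

For the data in the question, the anchor rule would probably need tuning (for example, picking the vertex closest to a chosen x value), since the contours there are open curves on a log-scaled axis.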

R stairstep without upwards line after last data point

I'm plotting a cumulative step function, and I want to suppress the line jumping up after the last row in the dataset. This happens both in base R and ggplot2.
Is there a way to do it without specifying xlim to exclude the jump upwards?
data = data.frame(V1 = c(-0.1, 0, 0, 1, 1.1), V2 = c(0, 0, 0.7, 0.3, 0.3))
base R
plot(data$V1, cumsum(data$V2), type="s")
ggplot2
ggplot(data, aes(x=V1, y=cumsum(V2))) +
geom_step()
The way the step function works seems correct to me: sum(data$V2) is 1.3, and that is where your line ends. It is also identical to tail(cumsum(data$V2), 1). However, if you insist on not drawing the last line segment, you can set the last value of data$V2 to 0. Example below:
library(ggplot2)
data = data.frame(V1 = c(-0.1, 0, 0, 1, 1.1), V2 = c(0, 0, 0.7, 0.3, 0.3))
ggplot(data, aes(x = V1, y = cumsum(c(head(V2, -1), 0)))) +
geom_step()
Note that the example doesn't generalise to multiple groups; pre-processing the data should help then.
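As a rough sketch of the pre-processing for several groups, the same "zero the last increment" trick can be applied within each group using ave(). The grouping column `g`, the second group's values, and the `cum` column are hypothetical and added purely for illustration.

```r
# Two hypothetical groups; `g` is a grouping column added for illustration.
data <- data.frame(
  g  = rep(c("a", "b"), each = 5),
  V1 = rep(c(-0.1, 0, 0, 1, 1.1), times = 2),
  V2 = c(0, 0, 0.7, 0.3, 0.3,
         0, 0.2, 0.2, 0.4, 0.5)
)

# Within each group, zero the last increment and then accumulate,
# mirroring cumsum(c(head(V2, -1), 0)) from the single-group example.
data$cum <- ave(data$V2, data$g,
                FUN = function(v) cumsum(c(head(v, -1), 0)))

# Draw only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(data, aes(x = V1, y = cum, colour = g)) + geom_step())
}
```

Computing the cumulative sum before plotting, rather than inside aes(), is what makes the per-group handling possible here.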

R code for plotting multiple line segments with unique R ranges

I know there are many many questions on here around plotting multiple lines in a graph in R, but I've been struggling with a more specific task. I would like to add multiple line segments to a graph using only the intercept and slope specified for each line. abline() would work great for this, except each line has a specific range on the X axis, and I do not want the line plotted beyond the range.
I managed to get the graph I want using plotrix, but I am hoping to publish the work, and the graph does not look up to par (very basic). I am somewhat familiar with ggplot, and think that graphs generated in ggplot look much better than what I have made, especially with the various themes available, but I cannot figure out how to do something similar using ggplot.
Code:
library(plotrix)
plot(1, type="n", xlab="PM2.5(ug/m3)", ylab="LogRR Preeclampsia ", xlim=c(0, 20), ylim=c(-1, 2.5))
ablineclip(a = 0, b = 0.3, x1=1.2, x2=3)
ablineclip(a = 0, b = 0.08, x1=8.0, x2=13.1)
ablineclip(a = 0, b = 0.5, x1=10.1, x2=18.9)
ablineclip(a = 0, b = 0.12, x1=2.6, x2=14.1)
Any help would be appreciated!
Thank you.
You can write a basic function doing a bit of algebra to calculate the start/stop points for the line segments and then feed that into ggplot. For example
to_points <- function(intercept, slope, start, stop) {
data.frame(
segment = seq_along(start),
xstart = start,
xend = stop,
ystart = intercept + slope*start,
yend = intercept + slope*stop)
}
And then use that with
library(ggplot2)
segments <- to_points(0, c(0.3, 0.08, 0.5, .12),
c(1.2, 8.0, 10.1, 2.6),
c(3, 13.1, 18.9, 14.1))
ggplot(segments) +
aes(xstart, ystart, xend=xend, yend=yend) +
geom_segment() +
coord_cartesian(xlim=c(0,20), ylim=c(-1, 2.5)) +
labs(x="PM2.5(ug/m3)", y="LogRR Preeclampsia ")
That will produce the following plot
(Note the third segment is outside the region you specified. You can drop the coord_cartesian to see all the segments.)
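To see which segments fall outside the plotting window without inspecting the figure, the endpoints can be compared against the y limits directly. This is a small sketch using the slopes and x ranges from the question; the `outside` column and the `ylim` variable are names introduced here.

```r
# Same helper as above: endpoints of y = intercept + slope * x on [start, stop].
to_points <- function(intercept, slope, start, stop) {
  data.frame(
    segment = seq_along(start),
    xstart = start,
    xend = stop,
    ystart = intercept + slope * start,
    yend = intercept + slope * stop)
}

segments <- to_points(0, c(0.3, 0.08, 0.5, 0.12),
                      c(1.2, 8.0, 10.1, 2.6),
                      c(3, 13.1, 18.9, 14.1))

# y range requested in the question.
ylim <- c(-1, 2.5)

# A segment is flagged when either endpoint lies outside the y range.
segments$outside <- with(segments,
  pmin(ystart, yend) < ylim[1] | pmax(ystart, yend) > ylim[2])

# Show only the flagged segments.
segments[segments$outside, ]
```

With a slope of 0.5 over x from 10.1 to 18.9, the third line runs from y = 5.05 to y = 9.45, which is why it does not appear inside ylim = c(-1, 2.5).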

Plotting violin plots: when I add a sample to be displayed, the violin plots no longer show up

I am trying to display my results using violin plot and box plot at the same time.
I am using cell count to display the number of immune cells in different cancer samples/groups. When I plot the expression for 4 samples, everything works. When I add another sample (GTEx_M2), the violin plots for all other 4 samples disappear and I end up with only the box plots.
Any suggestion? Thanks in advance!
library(ggplot2)
library(ggpubr)
Cibersort7 = structure(list(
Hot_M1 = c(0.0214400757119873, 0.170557805230298, 0.0804456569076382,
0.0893978598771954, 0.134477669028274, 0, 0.0525708788146097,
0.0511711964723951, 0.126904881120795, 0.0485101553521798,
0.170894800822398, 0.106555021195299, 0.0970104286070479,
0.115825265978309, 0.0427923320117795, 0.0733825856784013,
0.0111265771852828, 0.0657019859547462, 0.11656416302191,
0.172002238486688, 0.0154591596631105, 0.0350445248592811,
0.0795539781894198, 0.0781276090630857, 0.0087982313041526,
0.0289274652853823, 0.0712661645666698, 0.0435482190581647,
0.0455556872660798, 0.0871522448556361),
Cold_M1 = c(0.0346024087291239, 0.0201947741817111, 0.0306194109725081,
0.0277445612030966, 0.00905915199266666, 0.00939058305405205,
0.0146535473252646, 0.0159980760737253, 0.147670469457772,
0.0426119074182886, 0.0219251208462312, 0.0128996237306264,
0.0094816829459359, 0.0219336027293415, 0.0438220246067735,
0.00950926112282649, 0.0838386603270565, 0.0486661009213444,
0.00651564872414969, 0.00110323590537234, 0.0807125087307139, 0,
0.037709808301658, 0, 0.0898041410439557, 0.0417739517920607, 0,
0.0202168551193018, 0.00176008746063679, 0.0161337603014608),
Hotnorm_M1 = c(0.00622155478760928, 0.00864956989565159, 0.0245812979257332,
0.0339687958970202, 8e-04, 0, 0.0582086801600888, 0,
0.03481918582501, 0.021338008027511, 0.0157360408231509,
0.00489068636912568, 0.0281166183638247, 0.0162726467268935,
0.0415769266772567, 0, 0.00344830695596762, 0.00196737745405557,
0.0075141479562764, 0.0232464687737552, 0, 0, 0.0289423690350636,
0.0218584208695064, 0.0255945495324721, 4e-04, 0.0221942067802419,
0.00476738514342175, 0.00722699142988291, 0.00974645683928458),
Coldnorm_M1 = c(0.0280536098964266, 0.0261826834038114, 0.0150413750071331, 0,
0.0199730743908202, 0.0115748800373456, 0.0275674859254823,
0.0168847795974374, 0.0140281070945953, 0.00907861159279308,
0, 0, 0, 0.0453414461512909, 0, 0.00730963773612433,
0.0236424416792874, 0.0866914356225127, 0.0246339344582405,
0.00881531992455549, 0.0140744199322424, 0, 0, 0,
0.0319211626770028, 0.00155291355277603, 0.00295913497381517,
0.00738775271575955, 0.0179786878323852, 0.00442919920031897),
GTEx_M1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0.00551740159760184, 0, 0, 0, 0, 0)),
row.names = c(NA, -30L),
class = c("tbl_df", "tbl", "data.frame"))
This is a small part of my data that still shows the same issue I see.
y_axis = list(na.omit(Cibersort7$Hot_M1),
na.omit(Cibersort7$Cold_M1),
na.omit(Cibersort7$Hotnorm_M1),
na.omit(Cibersort7$Coldnorm_M1),
na.omit(Cibersort7$GTEx_M1))
groupname = groupexpression = data = violinPlot = pairwise_results = list(5)
for (i in 1:5){
groupname[[i]] = as.factor(colnames(Cibersort7[, i]))
groupexpression[[i]] = y_axis[[i]]
data[[i]] = data.frame("Sample" = groupname[[i]],
"Expression" = groupexpression[[i]])
}
dataframe = do.call(rbind, data)
dataframe$Sample = as.factor(dataframe$Sample)
my_comparisons = list(c("Hot_M1", "Cold_M1"),
c("Hot_M1", "Hotnorm_M1"),
c("Hot_M1", "GTEx_M1"),
c("Cold_M1", "Coldnorm_M1"),
c("Cold_M1", "GTEx_M1"))
violinPlot = ggplot(dataframe,
aes(x =Sample, y = Expression, fill = Sample)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1, fill="white") +
labs(title ="Distribution of M2 Macrophages",
x = "Tissue Samples", y = "Cibersort Count") +
theme_classic()
violinPlot
Here is what my violin plots look like:
Here is what they look like before adding the GTEx data:
And here are the GTEx violin plots when displayed alone:
I understand that my GTEx data is zero but why do the violin plots disappear?
geom_violin has an argument named scale, which takes on the default value "area". From ?geom_violin:
if "area" (default), all violins have the same area (before trimming
the tails). If "count", areas are scaled proportionally to the number
of observations. If "width", all violins have the same maximum width.
Since GTEx's Expression values are concentrated at 0, its density peaks sharply at that value. We can see it more obviously in a normal density plot, with each sample's line overlaid atop one another:
ggplot(dataframe,
aes(x = Expression, color = Sample)) +
geom_density() +
theme_classic()
With the default scale = "area" argument, including GTEx in the data means the violin plot for all other samples becomes a lot skinnier, & hence become almost completely covered by the boxplots. You'd still be able to see them if you comment out the boxplot layer.
You can set scale = "width" instead if you want comparable visibility between each violin. You may also want to highlight this to your target audience if you choose this option, as scale = "area" tends to be more common, & people may feel confused when some violins appear clearly larger than others.
ggplot(dataframe,
aes(x = Sample, y = Expression, fill = Sample)) +
geom_violin(trim = FALSE, scale = "width") +
geom_boxplot(width=0.1, fill="white") +
labs(title ="Distribution of M2 Macrophages",
x = "Tissue Samples", y = "Cibersort Count") +
theme_classic()
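The shrinking effect of scale = "area" can also be checked numerically by comparing the peak heights of the kernel density estimates. The sketch below uses two short hypothetical vectors as stand-ins for a spread-out sample and an almost-all-zero sample, and base R's density() with its default bandwidth, so the numbers are only indicative of what the violin layer computes.

```r
# Hypothetical stand-ins: one spread-out sample and one that is almost all zeros.
hot  <- c(0.02, 0.17, 0.08, 0.09, 0.13, 0, 0.05, 0.05, 0.13, 0.05)
gtex <- c(rep(0, 9), 0.0055)

# Peak height of each kernel density estimate (default bandwidth).
peaks <- c(Hot = max(density(hot)$y), GTEx = max(density(gtex)$y))
peaks

# With scale = "area", widths are normalised by the tallest density in the
# plot, so the other violins shrink by roughly this factor relative to GTEx.
peaks["GTEx"] / peaks["Hot"]
```

A ratio well above 10 means the wider samples are drawn at a small fraction of the available width, which is consistent with them disappearing behind the boxplots.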
p.s. You can simplify your data processing steps, which are (from what I can tell) essentially a conversion from wide to long format. The usual way to do this is via melt (from reshape2 package) or gather (from tidyr package). Here's a possible implementation:
library(dplyr)
library(tidyr)
df2 <- Cibersort7 %>%
gather(Sample, Expression) %>%
mutate(Sample = factor(Sample, levels = colnames(Cibersort7)))
> all.equal(dataframe, as.data.frame(df2))
[1] TRUE
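If you would rather avoid extra packages, base R's stack() performs a similar wide-to-long conversion. The following is a minimal sketch on a small hypothetical table standing in for Cibersort7; the `wide` and `long` names are introduced for illustration.

```r
# Small hypothetical wide table standing in for Cibersort7.
wide <- data.frame(Hot_M1  = c(0.02, 0.17, 0.08),
                   Cold_M1 = c(0.03, 0.02, 0.03),
                   GTEx_M1 = c(0, 0, 0.005))

# stack() returns two columns: `values` (the data) and `ind` (the source column).
long <- stack(wide)
names(long) <- c("Expression", "Sample")

# Fix the factor levels explicitly so they follow the original column order.
long$Sample <- factor(long$Sample, levels = names(wide))

# Reorder the columns to match the Sample / Expression layout used above.
long <- long[, c("Sample", "Expression")]
long
```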
p.p.s. If there are multiple people commenting in your thread and you don't @ anyone in your reply, no one is going to get any notification about it, which is rather a waste if you've gone through all the trouble of improving your question. See here for an explanation of how the system works.
