Karyogram and SNP with ggplot2 (geom_path, geom_bar, ggbio)

I want to plot a karyogram with SNP markers.
It works with the base function segments(), but I want to use the ggplot2 package to produce a more elegant graphic.
ggbio:
I tried the function layout_karyogram from the ggbio package, but the chromosomes are plotted in a vertical layout. I couldn't find a way to rotate the plot so that each chromosome's name appears below it, or to write the name of each SNP next to its segment.
geom_bar:
Then I tried geom_bar from the package ggplot2:
library(ggplot2)
data <- data.frame(chromosome = paste0("chr", 1:4), size = c(100, 400, 300, 200), stringsAsFactors = FALSE)
data$chromosome <- factor(data$chromosome, levels = data$chromosome)  # keep chr1..chr4 in this order on the x-axis
SNP <- data.frame(chromosome = c(1, 1, 2, 3, 3, 4), Position = c(50, 70, 250, 20, 290, 110), Type = c("A", "A", "A", "B", "B", "B"), labels = c("SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"))
p <- ggplot(data = data, aes(x = chromosome, y = size)) + geom_bar(stat = "identity", fill = "grey70", width = .5) + theme_bw()
# SNP$chromosome is numeric, so it can be offset around the discrete x positions 1..4
p + geom_segment(data = SNP, aes(x = chromosome - 0.2, xend = chromosome + 0.2, y = Position, yend = Position, colour = Type), size = 1) +
  annotate("text", label = SNP$labels, x = SNP$chromosome - 0.5, y = SNP$Position, size = 2, colour = "red")
The only problem is that it looks more like a bar plot than a set of chromosomes: I would like the ends to be rounded. I found someone else who had the same problem.
geom_path:
Instead of geom_bar, I used geom_path with the option lineend = "round" to get rounded ends.
ggplot() + geom_path(data=NULL, mapping=aes(x=c(1,1), y=c(1,100)),size=3, lineend="round")
The shape looks quite good, so I tried to run the code for several chromosomes.
p <- ggplot()
data<-data.frame(chromosome=paste0("chr", 1:4),size=c(100,400,300,200),stringsAsFactors = FALSE)
for (i in 1:length(data[,1])){
p <- p + geom_path(data=NULL, mapping=aes(x=c(i,i), y=c(1,data[i,2])), size=3, lineend="round")
}
It doesn't work: I don't know why, but p only keeps the last chromosome instead of plotting all four chromosomes in my karyogram.
Any suggestions for these problems?
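A likely explanation for the loop problem: aes() captures the expressions i and data[i, 2] without evaluating them, and ggplot2 only evaluates them when the plot is built, after the loop has finished. Every layer then sees i = 4, so all four paths are drawn on top of each other at the last chromosome. A minimal sketch that avoids the loop altogether, assuming one path per chromosome running from y = 1 to its size; the helper data frame `ends` is a hypothetical name:

```r
library(ggplot2)

data <- data.frame(chromosome = paste0("chr", 1:4),
                   size = c(100, 400, 300, 200),
                   stringsAsFactors = FALSE)

# Hypothetical helper: two rows per chromosome, one for each end of its path
ends <- data.frame(
  x = rep(seq_len(nrow(data)), each = 2),
  y = as.vector(rbind(1, data$size)),  # 1, 100, 1, 400, 1, 300, 1, 200
  chromosome = rep(data$chromosome, each = 2)
)

# A single geom_path layer; `group` keeps each chromosome a separate path.
# (ggplot2 >= 3.4 prefers `linewidth` over `size` for lines.)
p <- ggplot(ends, aes(x = x, y = y, group = chromosome)) +
  geom_path(size = 3, lineend = "round")
p
```

If you prefer to keep the loop, forcing evaluation when each layer is created, for example with aes(x = !!c(i, i), y = !!c(1, data[i, 2])), should also work, since aes() supports tidy-evaluation unquoting.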

I would go for geom_segment. The x start and end of the SNP segments are hardcoded (as.integer(chr) - 0.05 and as.integer(chr) + 0.05), but otherwise the code is fairly straightforward.
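library(ggplot2)
data <- data.frame(chr = paste0("chr", 1:4),
                   size = c(100, 400, 300, 200))
# Store chr as a factor with the same levels as the x-axis, so that
# as.integer(chr) gives each chromosome's x position (a character column gives NA)
SNP <- data.frame(chr = factor(paste0("chr", c(1, 1, 2, 3, 3, 4)),
                               levels = paste0("chr", 1:4)),
                  pos = c(50, 70, 250, 20, 290, 110),
                  type = c("A", "A", "A", "B", "B", "B"))
ggplot() +
  # Chromosomes: thick grey segments with rounded ends
  geom_segment(data = data,
               aes(x = chr, xend = chr, y = 0, yend = size),
               lineend = "round", color = "lightgrey", size = 5) +
  # SNPs: short horizontal ticks centred on each chromosome
  geom_segment(data = SNP,
               aes(x = as.integer(chr) - 0.05, xend = as.integer(chr) + 0.05,
                   y = pos, yend = pos, color = type),
               size = 1) +
  theme_minimal()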
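The question also asked for each SNP's name next to its segment; the chromosome names already appear on the x-axis below each chromosome. A possible extension, not part of the original answer: a geom_text layer that reuses the same integer positions. The `label` column is an assumption, named after the question's SNP1 to SNP6 labels, and chr is stored as a factor so that as.integer(chr) gives the x position:

```r
library(ggplot2)

data <- data.frame(chr = paste0("chr", 1:4),
                   size = c(100, 400, 300, 200))
SNP <- data.frame(chr = factor(paste0("chr", c(1, 1, 2, 3, 3, 4)),
                               levels = paste0("chr", 1:4)),
                  pos = c(50, 70, 250, 20, 290, 110),
                  type = c("A", "A", "A", "B", "B", "B"),
                  # Hypothetical label column, following the question's SNP1..SNP6
                  label = paste0("SNP", 1:6))

p <- ggplot() +
  geom_segment(data = data,
               aes(x = chr, xend = chr, y = 0, yend = size),
               lineend = "round", color = "lightgrey", size = 5) +
  geom_segment(data = SNP,
               aes(x = as.integer(chr) - 0.05, xend = as.integer(chr) + 0.05,
                   y = pos, yend = pos, color = type),
               size = 1) +
  # Left-justified text just to the right of each SNP tick
  geom_text(data = SNP,
            aes(x = as.integer(chr) + 0.1, y = pos, label = label),
            hjust = 0, size = 3) +
  theme_minimal()
p
```

The 0.1 offset is arbitrary; it only needs to clear the 0.05 half-width of the tick.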

Related

Divide the y-axis so that the part with a score < 25 occupies most of the plot in ggplot

I want to divide the y-axis of the attached figure so that the part with a score < 25 occupies most of the figure, while the rest is shown as a small upper part.
I searched for this and I am aware that I should use scale_y_discrete(limits ...). I used p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20, "//", 40:100)), but it doesn't work yet.
I used the attached data and this is my code
Code
library(ggpubr)
p <- ggscatter(data, x = "Year", y = "Score",
               color = "grey", shape = 21, size = 3,  # Points color, shape and size
               add.params = list(color = "blue", fill = "lightgray"),  # Customize reg. line
               add = "loess",  # reg. line
               conf.int = TRUE,
               cor.coef = FALSE, cor.method = "pearson",
               xlab = "Year", ylab = "Score")
p <- p + coord_cartesian(xlim = c(1980, 2020))
p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea; if this were my plot, I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform: values above `threshold` are compressed by `squeeze_factor`
my_transform <- function(threshold = 25, squeeze_factor = 10) {
  # Evaluate the arguments now so that the closures below capture their values
  force(threshold)
  force(squeeze_factor)
  my_transform <- trans_new(
    name = "trans_squeeze",
    transform = function(x) {
      ifelse(x > threshold,
             ((x - threshold) * (1 / squeeze_factor)) + threshold,
             x)
    },
    inverse = function(x) {
      ifelse(x > threshold,
             ((x - threshold) * squeeze_factor) + threshold,
             x)
    }
  )
  return(my_transform)
}
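The arithmetic of this transform can be checked on its own. A minimal sketch with plain functions (the names squeeze and unsqueeze are hypothetical) that mirror the transform and inverse above with the same defaults:

```r
# Hypothetical standalone copies of the transform/inverse pair,
# with the same defaults: threshold = 25, squeeze_factor = 10
squeeze <- function(x, threshold = 25, squeeze_factor = 10) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}
unsqueeze <- function(x, threshold = 25, squeeze_factor = 10) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores <- c(10, 25, 35, 80, 125)
squeeze(scores)             # 10.0 25.0 26.0 30.5 35.0
unsqueeze(squeeze(scores))  # 10 25 35 80 125
```

The inverse undoes the transform because squeezed values above 25 stay above 25. With these defaults, scores from 25 to 80 take up only 5.5 transformed units, against 25 units for scores from 0 to 25, which is what shrinks the upper part of the plot.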
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
  geom_point(color = "grey", shape = 21, size = 3) +
  geom_smooth(method = "loess", fill = "lightgray") +
  # Add fake axis lines
  annotate("segment", x = -Inf, xend = -Inf,
           y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
  # Apply transform to y-axis
  scale_y_continuous(trans = my_transform(25, 10),
                     breaks = seq(0, 80, by = 10)) +
  scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
  theme_classic() +
  # Turn real y-axis line off
  theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.

R, ggplot2 - In the legend, how do I hide unused colors from one geom while showing them in others?

I'm making a plot that has color defined for a geom_point(), and everything looks good.
library(ggplot2)
points_a <- data.frame(x = sample(1:10, 4), y = sample(50:60, 4), id = "a")
points_b <- data.frame(x = sample(1:100, 4), y = sample(1:100, 4), id = "b")
points_c <- data.frame(x = sample(1:100, 4), y = sample(1:100, 4), id = "c")
points_all <- rbind(points_a, points_b, points_c)
ggplot(points_all) + geom_point(aes(x, y, color = id))
I'd like to highlight a group of points by drawing a rectangle around them using geom_rect(), picking up the same color already used in the chart. As desired, the legend adds a border around the item in the key; unfortunately, it also adds a border to every other item in the legend, as shown below:
my_box <- data.frame(left = 1, right = 10, bottom = 50, top = 60, id = "a")
ggplot(points_all) +
  geom_point(aes(x, y, color = id)) +
  geom_rect(data = my_box,
            aes(xmin = left, xmax = right, ymin = bottom, ymax = top, color = id),
            fill = NA, alpha = 1)
I want to get rid of the outlines around items "b" and "c" in the legend, since they're not plotted. I don't know how to do that, since they're defined by the same color aesthetic as the points. Ideally these unused factors should have been dropped from the legend for the outline color, as they rightfully are when showing only one geom_, but that doesn't seem to be how it works. (And defining the color manually outside the aes() call means it wouldn't get shown on the legend for id="a".)
Lots of searching hasn't yet yielded an answer, though I may have overlooked something. What's the best way to hide from the legend unused colors for one geom_ while keeping them for others? (Alternatively: Should I split these into two legends, and how?)
I usually tackle this sort of thing with override.aes in guide_legend(). In your case you can set the line type for the last two legend items to be 0 (no line). The first legend item should have linetype 1.
ggplot(points_all) +
  geom_point(aes(x, y, color = id)) +
  geom_rect(data = my_box,
            aes(xmin = left, xmax = right, ymin = bottom, ymax = top,
                color = id),
            fill = NA, alpha = 1) +
  guides(color = guide_legend(override.aes = list(linetype = c(1, 0, 0))))
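The hard-coded c(1, 0, 0) has to match the number and order of the legend keys. A minimal sketch, not from the original answer, that derives the vector from the data instead, assuming the legend keys follow the default alphabetical factor order of id; the name key_linetypes and the fixed seed are illustrative additions:

```r
library(ggplot2)

set.seed(1)  # the question's points are random; fix the seed for reproducibility
points_all <- rbind(
  data.frame(x = sample(1:10, 4),  y = sample(50:60, 4),  id = "a"),
  data.frame(x = sample(1:100, 4), y = sample(1:100, 4), id = "b"),
  data.frame(x = sample(1:100, 4), y = sample(1:100, 4), id = "c")
)
my_box <- data.frame(left = 1, right = 10, bottom = 50, top = 60, id = "a")

# One linetype per legend key, in factor-level order:
# 1 = solid outline for ids that have a box, 0 = no outline otherwise
key_linetypes <- ifelse(levels(factor(points_all$id)) %in% my_box$id, 1, 0)

p <- ggplot(points_all) +
  geom_point(aes(x, y, color = id)) +
  geom_rect(data = my_box,
            aes(xmin = left, xmax = right, ymin = bottom, ymax = top, color = id),
            fill = NA) +
  guides(color = guide_legend(override.aes = list(linetype = key_linetypes)))
p
```

Adding more rows to my_box then updates the legend outlines without editing the guides() call.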
aosmith has a great answer above. Alternatively, you could split the legends by mapping id to fill (instead of color) in geom_point and using a fillable point shape:
library(ggplot2)
ggplot(points_all) +
  geom_point(aes(x = x, y = y, fill = id), shape = 21, stroke = 0, size = 2.5) +
  geom_rect(
    data = my_box,
    aes(xmin = left, xmax = right, ymin = bottom, ymax = top, color = id),
    fill = NA, alpha = 1
  )

How to add counts on top of bars, and percentage of fill inside bars?

I have some data in which each observation contains 2 factors, classes (a letter between A and E), and flag (0 or 1). After applying a group_by(classes,flag) and a summarize(frequency=n()), I get a data frame similar to this one:
classes <- as.factor(c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"))
flag <- as.factor(rep(c(0, 1), 5))  # one 0/1 pair per class, same length as classes
quantity <- c(856, 569, 463, 125, 795, 313, 1000, 457, 669, 201)
df <- data.frame(classes, flag, quantity)
I managed to get the chart that I want (ordered bars, one for each level of classes, each bar filled with the proportion of flag) with this code:
library(ggplot2)
ggplot(df, aes(x = reorder(classes, -quantity), y = quantity)) +
  geom_bar(aes(fill = as.factor(flag)), stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = NULL, y = "Quantity", fill = "flag") +
  scale_fill_manual(values = c("firebrick", "dodgerblue4"),
                    labels = c("1" = "Yes", "0" = "No")) +
  theme(axis.ticks = element_blank())
However, I am not sure how to use geom_text() to add both the total count on top of each bar and the proportion of the fill value inside the bars.
Thanks for helping!
I don't know a way to automate this; it's probably easiest to calculate the proportions and sums outside the plot.
It's also easier to reorder the classes outside the plot, so that the text layers can reuse the same factor levels.
df$x <- reorder(df$classes, -df$quantity)
Next you can calculate the statistics you want. Below I assume that when we split df by classes, the rows are always in the order flag = 0, flag = 1, so we can take x[2] / (x[1] + x[2]) as the proportion of flag = 1.
text_df <- data.frame(
  class = sapply(split(df$classes, df$classes), unique),
  sum = sapply(split(df$quantity, df$classes), sum),
  prop = sapply(split(df$quantity, df$classes), function(x) {x[2] / (x[1] + x[2])})
)
Then we let text_df$class take on the same ordering as df$x.
text_df$class <- factor(text_df$class, levels = levels(df$x))
Then we make the plot as in your example; remember that we reordered the x variable earlier:
p <- ggplot(df, aes(x = x, y = quantity)) +
  geom_bar(aes(fill = as.factor(flag)), stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = NULL, y = "Quantity", fill = "flag") +
  scale_fill_manual(values = c("firebrick", "dodgerblue4"),
                    labels = c("1" = "Yes", "0" = "No")) +
  theme(axis.ticks = element_blank())
And add two geoms for text, one for the sum and one for the proportion, each with a y-offset:
p +
  geom_text(data = text_df,
            aes(x = class,
                y = sum + 100,  # some offset
                label = sum)) +
  geom_text(data = text_df,
            aes(x = class,
                y = sum - 100,  # opposite offset
                label = scales::percent(prop)))
And I think that did the trick. Good luck!
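The prop column above relies on each class having its flag = 0 row before its flag = 1 row. A base-R sketch that selects the flag = 1 rows explicitly instead, assuming flag is coded "0"/"1" with one row per class and flag as in the question (here flag is built with rep(c(0, 1), 5) so that it has the same length as classes):

```r
classes <- as.factor(c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"))
flag <- as.factor(rep(c(0, 1), 5))  # one 0/1 pair per class
quantity <- c(856, 569, 463, 125, 795, 313, 1000, 457, 669, 201)
df <- data.frame(classes, flag, quantity)

# For each class, pick the flag == "1" rows by value rather than by position
text_df <- do.call(rbind, lapply(split(df, df$classes), function(d) {
  data.frame(class = d$classes[1],
             sum   = sum(d$quantity),
             prop  = sum(d$quantity[d$flag == "1"]) / sum(d$quantity))
}))
text_df
```

The result has the same class, sum and prop columns, so it can be combined with the factor reordering and the geom_text() layers above.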

How to add two different magnitudes of point size in a ggplot bubbles chart?

I just came across the attached graph, in which geom_point is drawn in two colors (I believe it was made with ggplot2). Similarly, I would like dots of one color ranging in size from 1 to 5, and dots of another color for a series ranging from 10 to 50. However, I have no idea how to show two different ranges of points in one graph.
At the basic step I have:
library(ggplot2)
library(reshape2)  # melt()
library(forcats)   # fct_rev()
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
bubba$Species <- factor(bubba$Species, levels = bubba$Species)
xm <- melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
str(xm)
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) + geom_point(aes(size = Size)) + scale_size(range = range(xm$Size)) + theme_bw()
Does anyone have a clue where I should look? Thanks!
I've got an approach that gets 90% of the way there, but I'm not sure how to finish it. To get a single legend for size, I used a transformation that converts the input size to the display size, so the legend matches what is drawn. What I haven't figured out yet is how to apply a similar transformation to the fill so that both can be combined into the same legend.
Here's the transformation, which in this case shrinks every value of 10 or more by a factor of 10:
library(scales)
library(dplyr)  # if_else()
shrink_10s_trans = trans_new("shrink_10s",
  transform = function(y) {
    yt = if_else(y >= 10, y * 0.1, y)
    return(yt)
  },
  inverse = function(yt) {
    return(yt)  # Not a 1-to-1 function, picking one possibility
  }
)
Then we can use this transformation on the size to selectively shrink only the dots that are 10 or larger. This works nicely for the legend, apart from integrating the fill encoding with the size encoding.
ggplot(xm, aes(x = Samples, y = fct_rev(Species), fill = Size < 10)) +
  geom_point(aes(size = Size), shape = 21) +
  scale_size_area(trans = shrink_10s_trans, max_size = 10,
                  breaks = c(1, 2, 3, 10, 20, 30, 40),
                  labels = c(1, 2, 3, 10, 20, 30, 40)) +
  scale_fill_manual(values = c(rgb(136, 93, 100, maxColorValue = 255),
                               rgb(236, 160, 172, maxColorValue = 255))) +
  theme_bw()
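Why the inverse can only pick one possibility: the transform maps both groups onto the same display range, so, for example, 1 and 10 (or 5 and 50) end up the same size and only the fill tells them apart. A quick check of the arithmetic, using a standalone function whose name, shrink_10s, is chosen for illustration, with base ifelse() in place of dplyr::if_else():

```r
# Standalone copy of the answer's transform function
shrink_10s <- function(y) ifelse(y >= 10, y * 0.1, y)

sizes <- c(1, 3, 5, 10, 30, 50)
shrink_10s(sizes)  # 1 3 5 1 3 5
```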
Another option maps Size to both color and size and gives the two scales the same breaks, so that ggplot2 can merge them into a single legend:
library(ggplot2)
library(forcats)  # fct_rev()
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
bubba$Species <- factor(bubba$Species, levels = bubba$Species)
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
  geom_point(aes(size = Size, color = Size)) +
  scale_color_continuous(breaks = c(1, 2, 3, 10, 20, 30), guide = guide_legend()) +
  scale_size(range = range(xm$Size), breaks = c(1, 2, 3, 10, 20, 30)) +
  theme_bw()
Here's a kludge. I haven't had time to figure out the legend yet. Note that 1 and 10 are the same size but a different colour, as are 3 and 40.
library(ggplot2)
library(dplyr)     # mutate()
library(forcats)   # fct_rev()
library(magrittr)  # %<>%
# Create data frame
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
# Restructure data
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
# Calculate bubble size: values of 10 or more are rescaled linearly
# (10 -> 1, 40 -> 3); the cut-off must sit above 5 so that 4 and 5 keep their size
bubble_size <- function(val){
  ifelse(val >= 10, (1/15) * val + (1/3), val)
}
# Calculate bubble colour: one colour per group (1-5 vs. 10-50)
bubble_colour <- function(val){
  ifelse(val >= 10, "A", "B")
}
# Calculate bubble size and colour
xm %<>%
  mutate(bub_size = bubble_size(Size),
         bub_col = bubble_colour(Size))
# Plot data
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
  geom_point(aes(size = bub_size, fill = bub_col), shape = 21, colour = "black") +
  theme(panel.grid.major = element_line(colour = alpha("gray", 0.5), linetype = "dashed"),
        text = element_text(family = "serif"),
        legend.position = "none") +
  scale_size(range = c(1, 20)) +
  scale_fill_manual(values = c("brown", "pink")) +
  ylab("Species")
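The linear map (1/15) * val + 1/3 sends 10 to 1, 40 to 3 and 50 to about 3.67, which is why 10 matches the size of 1 and 40 the size of 3. This only behaves as intended if 4 and 5 fall below the cut-off, otherwise they are shrunk to 0.6 and about 0.67. A quick check of the arithmetic, with a hypothetical helper name and the cut-off placed at 10:

```r
# Hypothetical helper mirroring bubble_size(), with the cut-off between
# the two groups (1-5 and 10-50)
rescale_large <- function(val, threshold = 10) {
  ifelse(val >= threshold, (1 / 15) * val + (1 / 3), val)
}

vals <- c(1:5, seq(10, 50, by = 10))
round(rescale_large(vals), 3)
# 1.000 2.000 3.000 4.000 5.000 1.000 1.667 2.333 3.000 3.667
```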
I think you are looking for bubble plots in R
https://www.r-graph-gallery.com/bubble-chart/
That said, you probably want to build the left and right sides of the graphic separately and then combine them.

ggplot2 geom_line indicating group size

Following http://docs.ggplot2.org/current/aes_group_order.html
library(ggplot2)
data(Oxboys, package = "nlme")  # Oxboys comes from the nlme package
h <- ggplot(Oxboys, aes(age, height))
h + geom_line(aes(group = Subject))
This draws one line per subject.
But if two Subjects have exactly the same line, one subject's line will hide the other. Could we use line thickness or intensity to indicate the number of subjects who have the same line? Could we add a bubble using geom_point() to indicate the number of subjects?
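Use geom_line(aes(group = Subject), alpha = .5), without quotes around Subject (a quoted 'Subject' is a constant, so all points would be joined into a single line). Overlapping lines are then drawn darker. Play around with the alpha value.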
You could accomplish it by first mapping the colour and size aesthetics and then adjusting their values using the scale_size_manual and scale_colour_manual functions. Below is a demonstration of the approach.
library(ggplot2)
# a fake data set with two pairs of identical lines:
df <- data.frame(t = c(1:10, 1:10, 1:10, 1:10),
                 a = c(1:10, 1:10, seq(5, 8, length.out = 10), seq(5, 8, length.out = 10)),
                 c = rep(c("a", "b", "c", "d"), each = 10))
ggplot(df, aes(x = t, y = a, group = c)) +
  geom_line(aes(size = c, colour = c)) +
  scale_size_manual(values = c(4, 2, 3, 1.5)) +
  scale_colour_manual(values = c("black", "red", "blue", "yellow"))
You must consider how your grouping factor (in the example c) is ordered, because the lines are also plotted in this order. So the line which is plotted first should get a larger value for size.
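Another option, not taken from the answers above: count how many subjects share exactly the same trajectory and map that count to line width, which follows the question's idea of using thickness. A minimal sketch on hypothetical toy data; it assumes rows are sorted by time within each subject, and it only detects whole trajectories that are identical, not partial overlaps:

```r
library(ggplot2)

# Hypothetical toy data: subjects 1 and 2 have identical trajectories
df <- data.frame(subject = factor(rep(1:3, each = 3)),
                 time    = rep(1:3, times = 3),
                 value   = c(1, 2, 3,  1, 2, 3,  2, 2, 1))

# Key each subject by its whole trajectory, e.g. "1-2-3"
traj <- sapply(split(df$value, df$subject), paste, collapse = "-")
# Number of subjects sharing each subject's trajectory
n_same <- as.integer(table(traj)[traj])
names(n_same) <- names(traj)
df$n_same <- unname(n_same[as.character(df$subject)])

# Thicker line = more subjects drawn on top of each other
# (ggplot2 >= 3.4 prefers `linewidth` over `size` for lines)
p <- ggplot(df, aes(time, value, group = subject)) +
  geom_line(aes(size = n_same))
p
```

The same n_same column could instead be mapped to the size of a geom_point() layer, which corresponds to the bubble idea in the question.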
