ggplot2 geom_line indicating group size - r

Following http://docs.ggplot2.org/current/aes_group_order.html
library(ggplot2)
library(nlme) # provides the Oxboys data set
h <- ggplot(Oxboys, aes(age, height))
h + geom_line(aes(group = Subject))
Produces
But if two Subjects have exactly the same line, one subject's line will hide the other. Could we use line thickness or intensity to indicate the number of subjects who have the same line? Could we add a bubble using geom_point() to indicate the number of subjects?

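Use geom_line(aes(group = Subject), alpha = .5), with Subject unquoted so that it refers to the column rather than to a constant string. Play around with the alpha value: where lines overlap exactly, they appear darker.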

You could accomplish it by first mapping the colour and size aesthetics and then adjusting their values using the scale_size_manual and scale_colour_manual functions. Below is a demonstration of the approach.
# a fake data set with two pairs of identical lines:
df <- data.frame(t = c(1:10, 1:10, 1:10, 1:10),
a = c(1:10, 1:10, seq(5, 8, length =10), seq(5, 8, length =10)),
c = rep(c("a", "b", "c", "d"), each = 10))
ggplot(df, aes(x = t, y = a, group = c)) +
geom_line(aes(size = c, colour = c)) +
scale_size_manual(values = c(4, 2, 3, 1.5)) +
scale_colour_manual(values = c("black", "red", "blue", "yellow"))
You must consider how your grouping factor (in the example c) is ordered, because the lines are also plotted in this order. So the line which is plotted first should get a larger value for size.
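To let thickness show how many subjects share a line, one option is to count identical trajectories before plotting. The following is only a sketch of that idea, not part of the original answer: the fifth group "e" and the helper columns sig and n are additions for illustration. Newer ggplot2 releases (3.4.0 and later) prefer linewidth over size for lines; size is kept here to match the rest of the thread.

```r
library(ggplot2)

# The answer's fake data plus one extra, unique line ("e", added for illustration)
df <- data.frame(t = rep(1:10, 5),
                 a = c(1:10, 1:10,
                       seq(5, 8, length.out = 10), seq(5, 8, length.out = 10),
                       10:1),
                 c = rep(c("a", "b", "c", "d", "e"), each = 10),
                 stringsAsFactors = FALSE)

# Build one string per group that describes its whole line
df$sig <- ave(paste(df$t, df$a), df$c,
              FUN = function(v) paste(v, collapse = ";"))

# Count how many groups share each line, then attach the count to every row
n_per_line <- tapply(df$c, df$sig, function(g) length(unique(g)))
df$n <- as.integer(n_per_line[df$sig])

# Map the count to line thickness
p <- ggplot(df, aes(x = t, y = a, group = c)) +
  geom_line(aes(size = n)) +
  scale_size_continuous(range = c(0.5, 2), breaks = 1:2,
                        name = "Subjects per line")
```

Printing p draws the shared lines thicker than the unique one. The same n column could instead be mapped to alpha, or to the size of a geom_point() layer for the bubble variant mentioned in the question.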

Related

How to add two different magnitudes of point size in a ggplot bubbles chart?

I just came across the attached graph, in which geom_point is drawn in two colors (I believe it was made with ggplot2). Similarly, I would like dots of one color to range in size from 1 to 5, and dots of another color to cover the range 10 to 50. However, I have no idea how to put two different size ranges in one graph.
As a starting point I have:
library(ggplot2)
library(reshape2) # melt()
library(forcats) # fct_rev()
a <- c(1,2,3,4,5)
b <- c(10,20,30,40,50)
Species <- factor(c("Species1","Species2","Species3","Species4","Species5"))
bubba <- data.frame(Sample1=a,Sample2=b,Species=Species)
bubba$Species=factor(bubba$Species, levels=bubba$Species)
xm=melt(bubba,id.vars = "Species", variable.name="Samples", value.name = "Size")
str(xm)
ggplot(xm,aes(x= Samples,y= fct_rev(Species)))+geom_point(aes(size=Size))+scale_size(range = range(xm$Size))+theme_bw()
Does anyone have a clue where I should look? Thanks!
I've got an approach that gets 90% of the way there, but I'm not sure how to finish the deed. To get a single legend for size, I used a transformation to convert input size to display size. That makes the legend appearance conform to the display. What I don't have figured out yet is how to apply a similar transformation to the fill so that both can be integrated into the same legend.
Here's the transformation, which in this case shrinks everything 10 or more:
library(scales)
library(dplyr) # if_else()
shrink_10s_trans = trans_new("shrink_10s",
transform = function(y){
yt = if_else(y >= 10, y*0.1, y)
return(yt)
},
inverse = function(yt){
return(yt) # Not 1-to-1 function, picking one possibility
}
)
Then we can use this transformation on the size to selectively shrink only the dots that are 10 or larger. This works out nicely for the legend, aside from integrating the fill encoding with the size encoding.
ggplot(xm,aes(x= Samples,y= fct_rev(Species), fill = Size < 10))+
geom_point(aes(size=Size), shape = 21)+
scale_size_area(trans = shrink_10s_trans, max_size = 10,
breaks = c(1,2,3,10,20,30,40),
labels = c(1,2,3,10,20,30,40)) +
scale_fill_manual(values = c(rgb(136,93,100, maxColorValue = 255),
rgb(236,160,172, maxColorValue = 255))) +
theme_bw()
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
bubba$Species <- factor(bubba$Species, levels = bubba$Species)
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
geom_point(aes(size = Size, color = Size)) +
scale_color_continuous(breaks = c(1,2,3,10,20,30), guide = guide_legend()) +
scale_size(range = range(xm$Size), breaks = c(1,2,3,10,20,30)) +
theme_bw()
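Here's a kludge. I haven't got time to figure out the legend at the moment. Note that 1 and 10 are the same size, but a different colour, as are 3 and 40.
# Packages: magrittr for %<>%, dplyr for mutate(), forcats for fct_rev()
library(ggplot2)
library(magrittr)
library(dplyr)
library(forcats)
# Create data frame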
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
# Restructure data
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
# Calculate bubble size
bubble_size <- function(val){
ifelse(val > 3, (1/15) * val + (1/3), val)
}
# Calculate bubble colour
bubble_colour <- function(val){
ifelse(val > 3, "A", "B")
}
# Calculate bubble size and colour
xm %<>%
mutate(bub_size = bubble_size(Size),
bub_col = bubble_colour(Size))
# Plot data
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
geom_point(aes(size = bub_size, fill = bub_col), shape = 21, colour = "black") +
theme(panel.grid.major = element_line(colour = alpha("gray", 0.5), linetype = "dashed"),
text = element_text(family = "serif"),
legend.position = "none") +
scale_size(range = c(1, 20)) +
scale_fill_manual(values = c("brown", "pink")) +
ylab("Species")
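As a quick check of the size function in this kludge, the snippet below evaluates bubble_size() on the values used in the example data. It only restates the function from the answer; the vector name vals is introduced here for illustration.

```r
# Same rescaling function as in the answer above
bubble_size <- function(val) {
  ifelse(val > 3, (1/15) * val + (1/3), val)
}

# The sizes that occur in the example data
vals <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
sizes <- setNames(bubble_size(vals), vals)
print(round(sizes, 3))
```

The output confirms that 1 and 10 map to the same size, as do 3 and 40. It also shows that 4 and 5 fall on the "large" side of the val > 3 threshold and end up smaller than 1, so a threshold such as val >= 10 may be closer to what the question asks for.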
I think you are looking for bubble plots in R:
https://www.r-graph-gallery.com/bubble-chart/
That said, you probably want to build the left and right sides of the graphic separately and then combine them.

ggplot2 in R: geom_segment displays different line than geom_line

Say I have this data frame:
library(data.table) # as.data.table()
library(ggplot2)
treatment <- c(rep("A",6),rep("B",6),rep("C",6),rep("D",6),rep("E",6),rep("F",6))
year <- as.numeric(c(1999:2004,1999:2004,2005:2010,2005:2010,2005:2010,2005:2010))
variable <- c(runif(6,4,5),runif(6,5,6),runif(6,3,4),runif(6,4,5),runif(6,5,6),runif(6,6,7))
se <- c(runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5))
id <- 1:36
df1 <- as.data.table(cbind(id,treatment,year,variable,se))
df1$year <- as.numeric(df1$year)
df1$variable <- as.numeric(df1$variable)
df1$se <- as.numeric(df1$se)
As I mentioned in a previous question (draw two lines with the same origin using ggplot2 in R), I wanted to use ggplot2 to display my data in a specific way.
I managed to do so using the following script:
y1 <- df1[df1$treatment=='A'&df1$year==2004,]$variable
y2 <- df1[df1$treatment=='B'&df1$year==2004,]$variable
y3 <- df1[df1$treatment=='C'&df1$year==2005,]$variable
y4 <- df1[df1$treatment=='D'&df1$year==2005,]$variable
y5 <- df1[df1$treatment=='E'&df1$year==2005,]$variable
y6 <- df1[df1$treatment=='F'&df1$year==2005,]$variable
p <- ggplot(df1,aes(x=year,y=variable,group=treatment,color=treatment))+
geom_line(aes(y = variable, group = treatment, linetype = treatment, color = treatment),size=1.5,lineend = "round") +
scale_linetype_manual(values=c('solid','solid','solid','dashed','solid','dashed')) +
geom_point(aes(colour=factor(treatment)),size=4)+
geom_errorbar(aes(ymin=variable-se,ymax=variable+se),width=0.2,size=1.5)+
guides(colour = guide_legend(override.aes = list(shape=NA,linetype = c("solid", "solid",'solid','dashed','solid','dashed'))))
p+labs(title="Title", x="years", y = "Variable 1")+
theme_classic() +
scale_x_continuous(breaks=c(1998:2010), labels=c(1998:2010),limits=c(1998.5,2010.5))+
geom_segment(aes(x=2004, y=y1, xend=2005, yend=y3),colour='blue1',size=1.5,linetype='solid')+
geom_segment(aes(x=2004, y=y1, xend=2005, yend=y4),colour='blue1',size=1.5,linetype='dashed')+
geom_segment(aes(x=2004, y=y2, xend=2005, yend=y5),colour='red3',size=1.5,linetype='solid')+
geom_segment(aes(x=2004, y=y2, xend=2005, yend=y6),colour='red3',size=1.5,linetype='dashed')+
scale_color_manual(values=c('blue1','red3','blue1','blue1','red3','red3'))+
theme(text = element_text(size=12))
As you can see I used both geom_line and geom_segment to display the lines for my graph.
It's almost perfect but if you look closely, the segments that are drawn (between 2004 and 2005) do not display the same line size, even though I used the same arguments values in the script (i.e. size=1.5 and linetype='solid' or dashed).
Of course I could change manually the size of the segments to get similar lines, but when I do that, segments are not as smooth as the lines using geom_line.
Also, I get the same problem (different line shapes) by including the size or linetype arguments within the aes() argument.
Do you have any idea what causes this difference and how I can get the exact same shapes for both my segments and lines ?
It seems to be an anti-aliasing issue with geom_segment, but that seems like a somewhat cumbersome approach to begin with. I think I have resolved your issue by duplicating the A and B treatments in the original data frame.
# First we are going to duplicate and rename the 'shared' treatments
library(dplyr)
library(ggplot2)
df1 %>%
filter(treatment %in% c("A", "B")) %>%
mutate(treatment = ifelse(treatment == "A",
"AA", "BB")) %>%
bind_rows(df1) %>% # This rejoins with the original data
# Now we create `treatment_group` and `line_type` variables
mutate(treatment_group = ifelse(treatment %in% c("A", "C", "D", "AA"),
"treatment1",
"treatment2"), # This variable will denote color
line_type = ifelse(treatment %in% c("AA", "BB", "D", "F"),
"type1",
"type2")) %>% # And this variable denotes the line type
# Now pipe into ggplot
ggplot(aes(x = year, y = variable,
group = interaction(treatment_group, line_type), # grouping by both linetype and color
color = treatment_group)) +
geom_line(aes(x = year, y = variable, linetype = line_type),
size = 1.5, lineend = "round") +
geom_point(size=4) +
# The rest here is more or less the same as what you had
geom_errorbar(aes(ymin = variable-se, ymax = variable+se),
width = 0.2, size = 1.5) +
scale_color_manual(values=c('blue1','red3')) +
scale_linetype_manual(values = c('dashed', 'solid')) +
labs(title = "Title", x = "Years", y = "Variable 1") +
scale_x_continuous(breaks = c(1998:2010),
limits = c(1998.5, 2010.5))+
theme_classic() +
theme(text = element_text(size=12))
Which will give you the following
My numbers are different since they were randomly generated.
You can then modify the legend to your liking, but my recommendation is to use something like geom_text and set check_overlap = TRUE (geom_label does not support that argument).
Hope this helps!
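One likely explanation for the heavier, rougher segments, which the thread itself does not confirm, is overplotting: constants placed inside aes() are recycled to every row of the data, so each geom_segment layer in the question draws the same segment once per row (36 times for df1). The sketch below uses a small made-up data frame (toy, with hypothetical values) to compare that with annotate(), which draws the segment a single time.

```r
library(ggplot2)
set.seed(1)

# Small stand-in for df1: 12 rows, two treatments (values are made up)
toy <- data.frame(year = rep(1999:2004, 2),
                  variable = runif(12, 4, 6),
                  treatment = rep(c("A", "B"), each = 6))

base <- ggplot(toy, aes(year, variable, group = treatment)) +
  geom_line(size = 1.5)

# Constants inside aes() are recycled to every row of the data,
# so this layer stacks the same segment on top of itself
stacked <- base +
  geom_segment(aes(x = 2004, y = 4, xend = 2005, yend = 5), size = 1.5)

# annotate() builds its own one-row data set, so the segment is drawn once
single <- base +
  annotate("segment", x = 2004, y = 4, xend = 2005, yend = 5, size = 1.5)

# Number of segments each layer actually draws
n_stacked <- nrow(layer_data(stacked, 2))
n_single <- nrow(layer_data(single, 2))
```

If this is indeed the cause, replacing each geom_segment(aes(...)) call in the question with annotate("segment", ...) should give segments that match the geom_line output without restructuring the data.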

Karyogram and SNP with ggplot2 (geom_path, geom_bar, ggbio)

I want to plot a karyogram with SNP markers.
It works with function segments but I want to use ggplot2 package to display an elegant graphic.
ggbio:
I checked the package ggbio with the function layout_karyogram but the chromosomes are plotted in a vertical position. I didn't find a way to rotate the graph with the name below each chromosome and to write the name of my SNP next to their segment.
geom_bar:
Then I tried geom_bar from the package ggplot2:
data<-data.frame(chromosome=paste0("chr", 1:4),size=c(100,400,300,200),stringsAsFactors = FALSE)
data$chromosome<-factor(data$chromosome, levels = data$chromosome)
SNP<-data.frame(chromosome=c(1,1,2,3,3,4),Position=c(50,70,250,20,290,110),Type=c("A","A","A","B","B","B"),labels=c("SNP1","SNP2","SNP3","SNP4","SNP5","SNP6"))
p <- ggplot(data=data, aes(x=chromosome, y=size)) + geom_bar( stat="identity", fill="grey70",width = .5) +theme_bw()
p + geom_segment(data=SNP, aes(x=SNP$chromosome-0.2, xend=SNP$chromosome+0.2, y=SNP$Position,yend=SNP$Position,colour=SNP$Type), size=1) +annotate("text", label =SNP$labels, x =SNP$chromosome-0.5, y = SNP$Position, size = 2, colour= "red")
The only problem is that it looks more like a barplot than a chromosome. I would like rounded ends. I found someone else with the same problem.
geom_path:
Instead of using geom_bar, I used geom_path with the option lineend = "round" to get rounded extremities.
ggplot() + geom_path(data=NULL, mapping=aes(x=c(1,1), y=c(1,100)),size=3, lineend="round")
The shape looks quite good. So I tried to run the code for several chromosomes.
p <- ggplot()
data<-data.frame(chromosome=paste0("chr", 1:4),size=c(100,400,300,200),stringsAsFactors = FALSE)
for (i in 1:length(data[,1])){
p <- p + geom_path(data=NULL, mapping=aes(x=c(i,i), y=c(1,data[i,2])), size=3, lineend="round")
}
It doesn't work: I don't know why, but p only keeps the last chromosome instead of plotting all four chromosomes in my karyogram.
Any suggestions for these problems?
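I would go for geom_segment. The x start and end of the SNP segments are hardcoded (as.integer(chr) ± 0.05), but otherwise the code is fairly straightforward. Note that chr has to be a factor for as.integer(chr) to return its position on the axis. Since R 4.0, data.frame() no longer converts strings to factors by default, so the conversion is made explicit here.
library(ggplot2)
# Define the data first; chr is an explicit factor in both data frames
data <- data.frame(chr = factor(paste0("chr", 1:4)),
size = c(100, 400, 300, 200))
SNP <- data.frame(chr = factor(paste0("chr", c(1, 1, 2, 3, 3, 4)), levels = levels(data$chr)),
pos = c(50, 70, 250, 20, 290, 110),
type = c("A", "A", "A", "B", "B", "B"))
ggplot() +
# Chromosomes: thick grey segments with rounded ends
geom_segment(data = data,
aes(x = chr, xend = chr, y = 0, yend = size),
lineend = "round", color = "lightgrey", size = 5) +
# SNPs: short horizontal ticks centred on each chromosome
geom_segment(data = SNP,
aes(x = as.integer(chr) - 0.05, xend = as.integer(chr) + 0.05,
y = pos, yend = pos, color = type),
size = 1) +
theme_minimal()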
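As for why the loop in the question keeps only the last chromosome: aes() does not evaluate its arguments when the layer is created. The expressions are stored and evaluated when the plot is finally built, and by then i holds its last value, so all four layers draw the same path. A minimal sketch of one way around this, assuming the loop structure is kept, is to give each layer its own small data frame. The names chrom and ends are introduced here for illustration.

```r
library(ggplot2)

chrom <- data.frame(chromosome = paste0("chr", 1:4),
                    size = c(100, 400, 300, 200))

p <- ggplot()
for (i in seq_len(nrow(chrom))) {
  # Fix the coordinates now, in a per-layer data frame, instead of
  # letting aes() look up `i` later when the plot is drawn
  ends <- data.frame(x = c(i, i), y = c(1, chrom$size[i]))
  p <- p + geom_path(data = ends, aes(x = x, y = y),
                     size = 3, lineend = "round")
}
```

Each layer now carries its own x and y values, so printing p shows four separate chromosomes. The single-layer geom_segment answer above is still the simpler route; the loop version is mainly useful for seeing where the original attempt went wrong.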

Generating multiple lines for repeat observations in only some factor levels

I am generating density plots for observations. The observations belong to a species and some are also connected to an individual ID.
With the data below, I want to generate a line for each level of IndID for species One and Two, and only a single line for Species Three, which does not include IndID. There are related questions on SO, but not with reproducible data and looking for different results.
library(ggplot2)
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 30), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 6),rep(NA,50) ),
Value = sample(1:20, replace = T))
Keeping the color aesthetic on the Species level, I want to create multiple lines for Species One and Two (green and red) and a single blue line for species Three.
ggplot(dat, aes(Value)) + geom_density(aes(color = Species), size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
If you want to be able to tell them apart, you can set the linetype to IndID. Note, however, that you will need to change the NA to some other value to (easily) get it to plot.
I also expanded your data a little bit to give enough values per individual to show meaningful lines. I also used geom_line(stat = "density") instead of geom_density() because it omits the line along the bottom and gives legends with lines instead of boxes.
set.seed(1)
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length = 60), rep("Three",50)),
IndID = c(rep(letters[1:5],each = 12),rep("NA",50) ),
Value = sample(1:20, 110, replace = T))
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red"))
gives
If you want the lines to all be solid, you can run:
ggplot(dat
, aes(x = Value
, color = Species
, linetype = IndID)) +
geom_line(stat = "density"
, size = 1.25) +
scale_colour_manual(values = c("darkgreen","blue", "red")) +
scale_linetype_manual(values = rep("solid", 6)) +
guides(linetype = "none")
(or use group as @Henrik suggested in zir comment)
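The group variant could look something like the sketch below. The comment it refers to is not shown here, so the exact call is an assumption. Because every IndID in the expanded data has observations for both Species One and Two, the grouping uses interaction(Species, IndID) rather than IndID alone.

```r
library(ggplot2)
set.seed(1)

# Same expanded data as in the answer above
dat <- data.frame(Species = c(rep(c("One", "Two"), each = 2, length.out = 60),
                              rep("Three", 50)),
                  IndID = c(rep(letters[1:5], each = 12), rep("NA", 50)),
                  Value = sample(1:20, 110, replace = TRUE))

# One density line per Species/IndID combination, all solid,
# with colour still mapped to Species only
p <- ggplot(dat, aes(x = Value, color = Species,
                     group = interaction(Species, IndID))) +
  geom_line(stat = "density", size = 1.25) +
  scale_colour_manual(values = c("darkgreen", "blue", "red"))
```

This avoids the linetype legend altogether, so neither scale_linetype_manual() nor guides(linetype = "none") is needed.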

connect points in ggplot based on specific column values

I have the following data set called t:
n <- 12
t <- data.frame(
V1 = runif(n, 0.12, 0.35),
V2 = runif(n, 0.25, 0.39),
group = gl(3, 4, labels = c("a1", "a2", "a3")),
x = seq_len(n),
color = rep(rep.int(c("R", "G"), 2), c(3, 4, 3, 2))
)
I created the following plot from this data.
p <- ggplot(t, aes(x, colour = color)) +
geom_point(aes(y = V1, size = 10)) +
geom_point(aes(y = V2, size = 10))
What I want to do now is to connect the points according to the group column (e.g., points of group a1 connected with a blue line, points of group a2 connected with a yellow line, ...), and I want the line type to depend on V1 and V2 (dashed line for V1 and solid line for V2).
How can this be done?
First of all: naming a dataset "t" is not a good idea because it is confusing since there is a function t() as well.
The easiest way is to melt() your dataset first (melt() comes from the reshape2 package):
library(reshape2)
Molten <- melt(t, id.vars = c("group", "x", "color"))
ggplot(Molten, aes(x = x, y = value, colour = group, linetype = variable)) + geom_line()
Have a look at the ggplot2 website on how to customise the colours.
If you want to plot your graph without using melt():
p <- ggplot(t) + geom_line(aes(x, V2, color = group)) + geom_line(aes(x, V1, color = group), linetype = "dashed")
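To get the specific colours and line types asked for in the question, manual scales can be added to the long-format plot. The sketch below is one possible version under a few assumptions: the data frame is called pts instead of t, the reshaping is done in base R to avoid extra packages, and the colour for a3 ("red") is a placeholder because the question does not specify it.

```r
library(ggplot2)
set.seed(42)

n <- 12
pts <- data.frame(V1 = runif(n, 0.12, 0.35),
                  V2 = runif(n, 0.25, 0.39),
                  group = gl(3, 4, labels = c("a1", "a2", "a3")),
                  x = seq_len(n))

# Long format: one row per point, with `variable` marking V1 or V2
long <- rbind(
  data.frame(pts[c("group", "x")], variable = "V1", value = pts$V1),
  data.frame(pts[c("group", "x")], variable = "V2", value = pts$V2)
)

p <- ggplot(long, aes(x = x, y = value, colour = group, linetype = variable)) +
  geom_line() +
  geom_point() +
  # Named vectors tie each level to a fixed colour or line type
  scale_colour_manual(values = c(a1 = "blue", a2 = "yellow", a3 = "red")) +
  scale_linetype_manual(values = c(V1 = "dashed", V2 = "solid"))
```

Naming the entries in values means the mapping does not depend on the order of the factor levels.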
