Visualize overlapping and non-overlapping ranges - r

I'm working on some flattening of overlapping ranges and would like to visualize the initial data (overlapping) and the resulting set (flattened) the following way:
Initial data:
Resulting set:
Is this possible with R and, for example, ggplot2?

df <- read.table(header=TRUE, sep=",", text="color,start,end
red,12.5,13.8
blue,0.0,5.4
green,2.0,12.0
yellow,3.5,6.7
orange,6.7,10.0", stringsAsFactors=FALSE)
library(ggplot2)
df$color <- factor(df$color, levels=rev(df$color))
ggplot(df) +
geom_segment(aes(x=start, xend=end, y=color, yend=color, color=color), size=10) +
scale_x_continuous(expand=c(0,0)) +
scale_color_identity() +
labs(x=NULL, y=NULL) +
theme_minimal() +
theme(panel.grid=element_blank()) +
theme(axis.text.x=element_blank()) +
theme(plot.margin=margin(30,30,30,30))
There are other posts on SO that show how to get the y labels like you have shown (we can't do all the work for you ;-)

The second part of the question can be answered by building on @hrbrmstr's great answer to the first part. We can use overplotting to our advantage and simply set the y coordinates of all segments to one fixed value (for example 1, which is where "red" is):
p <- ggplot(df) +
geom_segment(aes(x=start, xend=end, color=color),
y=1, yend=1, size=10) +
scale_x_continuous(expand=c(0,0)) + scale_color_identity() +
labs(x=NULL, y=NULL) +
theme_minimal() +theme(panel.grid=element_blank()) +
theme(axis.text.x=element_blank()) +
theme(plot.margin=margin(30,30,30,30))
print(p)
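If the flattened ranges are needed as data rather than only as a picture, the overplotting rule can be reproduced in base R. The sketch below rests on an assumption about what "flattened" means here: where ranges overlap, the range listed later in df wins, just as it does when the segments are drawn on top of each other. The names cuts, pieces and flat are made up for illustration.

```r
# Same data as in the question
df <- read.table(header = TRUE, sep = ",", stringsAsFactors = FALSE,
                 text = "color,start,end
red,12.5,13.8
blue,0.0,5.4
green,2.0,12.0
yellow,3.5,6.7
orange,6.7,10.0")

# Every start or end is a potential boundary between flattened pieces
cuts   <- sort(unique(c(df$start, df$end)))
pieces <- data.frame(start = head(cuts, -1), end = tail(cuts, -1))

# Assumption: the last row of df covering a piece is the one left visible
mid <- (pieces$start + pieces$end) / 2
pieces$color <- vapply(mid, function(m) {
  hit <- which(df$start <= m & m < df$end)
  if (length(hit) > 0) df$color[max(hit)] else NA_character_
}, character(1))

# Drop gaps that no range covers
flat <- pieces[!is.na(pieces$color), ]
print(flat)
```

flat has the same start, end and color columns as df, so it can be passed to the geom_segment() call above in place of df. Neighbouring pieces that share a color are not merged in this sketch.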

Related

How do I add count label to an axis?

I created this bar chart using ggplot. I had to create my own count function and called it 'a,' but now the label for that axis just has an a and nothing else... How do I fix it?
a <- count(df, glass)
gl <- ggplot(df, aes(x=glass, y="a", fill=glass)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
xlab("Glass type") +
ylab("Count") +
coord_flip() +
theme_minimal() +
theme(legend.position = "none")
gl
Here is my graph
The default stat for geom_bar() is "count", which means geom_bar() uses stat_count() to count the rows for each x value (glass in this case). Using stat="identity" overrides that default and requires you to supply the y values yourself, which I don't believe is what you want here. In addition, y="a" maps the literal string "a" rather than the data frame a, which is why the axis shows only an a.
Try the following and see if it is what you were looking for:
gl <- ggplot(df, aes(x=glass, fill=glass)) +
geom_bar() +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
xlab("Glass type") +
ylab("Count") +
coord_flip() +
theme_minimal() +
theme(legend.position = "none")
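For comparison, stat="identity" is the right choice when the counts have already been computed. This is a minimal sketch, assuming a made-up df with a glass column (the real data is not shown in the question). The counts table and its Freq column come from base R's table(), not from the asker's own count function.

```r
# Hypothetical data standing in for the asker's df
df <- data.frame(glass = c("pint", "pint", "flute", "mug", "mug", "mug"))

# Pre-compute the counts; as.data.frame() names the count column Freq
counts <- as.data.frame(table(glass = df$glass))
print(counts)

# Build the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gl <- ggplot(counts, aes(x = glass, y = Freq, fill = glass)) +
    geom_bar(stat = "identity") +   # y is mapped to a column, not a string
    xlab("Glass type") +
    ylab("Count") +
    coord_flip() +
    theme_minimal() +
    theme(legend.position = "none")
  # print(gl) draws the chart
}
```

Both routes should give the same bars; letting geom_bar() count is simply shorter when the raw rows are at hand.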

How to overlay geom_bar and geom_line plots with different number of elements using ggplot2?

Assuming I have two data.frames with different data but in the same range of x-values
a <-data.frame(x=c(1,1,1,2,2,2,3,3,3),
y=c(0.3,0.4,0.3,0.2,0.5,0.3,0.4,0.4,0.2),
z=c("do","re","mi","do","re","mi","do","re","mi"))
b <- data.frame(x=c(1,2,3),y=c(10,15,8))
Both a and b cover the same x values (1, 2, 3), but a is a data.frame with 9 rows while b has only 3 rows.
I use geom_bar in order to plot the distribution of values of a, like this:
ggplot(a, aes(x=x, y=y, fill=z)) +
geom_bar(position="stack",stat="identity") +
ylab("") +
xlab("x")
And I use geom_line to plot b data, like this:
ggplot(b, aes(x=x, y=y)) +
geom_line(stat="identity") +
ylab("") + xlab("x") + ylim(0,15)
Now I would like to overlay this geom_line plot on the previous geom_bar plot. My first attempt was the following:
ggplot(a, aes(x=x, y=y, fill=z)) +
geom_bar(position="stack",stat="identity") +
ylab("") + xlab("x") +
ggplot(b, aes(x=x, y=y)) +
geom_line(stat="identity") +
ylab("") + xlab("x") + ylim(0,15)
With no success.
How can I overlay a geom_line plot to a geom_bar plot?
Try this
p <- ggplot()
p <- p + geom_bar(data = a, aes(x=x, y=y, fill=z), position="stack",stat="identity")
p <- p + geom_line(data = b, aes(x=x, y=y/max(y)), stat="identity")
p
Update:
You can rescale one of the y variables so that both share a scale. Since I don't know how the two ys relate, I rescaled b's y using y/max(y). Does this solve your problem?
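If dividing by max(y) hides too much of b's original scale, one alternative is a secondary axis: scale b down to the bar heights for drawing and label the right-hand axis with the inverse transformation. This is only a sketch, assuming a simple proportional rescaling is acceptable; k and b_scaled are hypothetical names.

```r
a <- data.frame(x = c(1,1,1,2,2,2,3,3,3),
                y = c(0.3,0.4,0.3,0.2,0.5,0.3,0.4,0.4,0.2),
                z = c("do","re","mi","do","re","mi","do","re","mi"))
b <- data.frame(x = c(1,2,3), y = c(10,15,8))

# Scale factor: tallest line value relative to the tallest stacked bar
k <- max(b$y) / max(tapply(a$y, a$x, sum))
b_scaled <- transform(b, y = y / k)

# Build the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot() +
    geom_bar(data = a, aes(x = x, y = y, fill = z),
             position = "stack", stat = "identity") +
    geom_line(data = b_scaled, aes(x = x, y = y)) +
    # The right-hand axis undoes the scaling so it reads in b's units
    scale_y_continuous(sec.axis = sec_axis(~ . * k, name = "b")) +
    xlab("x") + ylab("")
  # print(p) draws the chart
}
```

The left axis then reads in the units of a and the right axis in the units of b.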
Try merging the datasets first, then plotting, like this:
library(ggplot2)
df <- merge(a,b,by="x")
ggplot(df, aes(x=x, y=y.x, fill=z)) +
geom_bar(position="stack",stat="identity") +
geom_line(aes(x=x, y=y.y)) +
ylab("") + xlab("x")
Output:
I edited the sample data to better illustrate the effects, because the y-axis scaling of the original data would not have matched well:
a <-data.frame(x=c(1,1,1,2,2,2,3,3,3),
y=c(0.3,0.4,0.3,0.2,0.5,0.3,0.4,0.4,0.2),
z=c("do","re","mi","do","re","mi","do","re","mi"))
b <- data.frame(x=c(1,2,3),y=c(.4,1,.4))

ggplot geom_line for specific factor levels

Is there a way to add a line for specific factor levels in ggplot?
This simple example should illustrate what I mean. Here I'd like to avoid drawing the line to the last level.
ggplot(BOD, aes(x=factor(Time), y=demand, group=1)) + geom_line() + geom_point()
You can simply create a new variable that is NA where Time == 7:
BOD$demand2[BOD$Time<7] <- BOD$demand[BOD$Time<7]
and then plot:
ggplot(BOD, aes(x=factor(Time), y=demand2, group=1)) +
geom_line() +
geom_point() +
theme_classic()
You could also do it on the fly by utilizing the functionality of the data.table-package:
library(data.table)
ggplot(data = as.data.table(BOD)[Time==7, demand := NA],
aes(x=factor(Time), y=demand, group=1)) +
geom_line() +
geom_point() +
theme_classic()
To answer your comment, you could include the point at 7 as follows:
ggplot(BOD, aes(x=factor(Time), y=demand2, group=1)) +
geom_line() +
geom_point(aes(x=factor(Time), y=demand)) +
theme_classic()
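Another option, sketched here, leaves BOD untouched and hands only the wanted rows to geom_line() through its data argument, while geom_point() still uses every row. line_data is a hypothetical name; BOD is the dataset that ships with R.

```r
# Rows the line should connect (everything before Time == 7)
line_data <- subset(BOD, Time < 7)
print(line_data)

# Build the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(BOD, aes(x = factor(Time), y = demand, group = 1)) +
    geom_line(data = line_data) +   # line stops at the last kept level
    geom_point() +                  # points for all levels, including 7
    theme_classic()
  # print(p) draws the chart
}
```

The same pattern extends to any set of levels: change the condition inside subset() to select the rows the line should pass through.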

Flip ordering of legend without altering ordering in plot

I have found that when adding coord_flip() to certain plots using ggplot2 that the order of values in the legend no longer lines up with the order of values in the plot.
For example:
dTbl = data.frame(x=c(1,2,3,4,5,6,7,8),
y=c('a','a','b','b','a','a','b','b'),
z=c('q','q','q','q','r','r','r','r'))
print(ggplot(dTbl, aes(x=factor(y),y=x, fill=z)) +
geom_bar(position=position_dodge(), stat='identity') +
coord_flip() +
theme(legend.position='top', legend.direction='vertical'))
I would like the 'q' and 'r' in the legend to be reversed without changing the order of 'q' and 'r' in the plot.
scale_x_reverse() looked promising, but it doesn't seem to work with factors (as is the case for this bar plot).
You're looking for guides:
ggplot(dTbl, aes(x=factor(y),y=x, fill=z)) +
geom_bar(position=position_dodge(), stat='identity') +
coord_flip() +
theme(legend.position='top', legend.direction='vertical') +
guides(fill = guide_legend(reverse = TRUE))
I was reminded in chat by Brian that there is a more general way to do this for arbitrary orderings, by setting the breaks argument:
ggplot(dTbl, aes(x=factor(y),y=x, fill=z)) +
geom_bar(position=position_dodge(), stat='identity') +
coord_flip() +
theme(legend.position='top', legend.direction='vertical') +
scale_fill_discrete(breaks = c("r","q"))
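The breaks vector does not have to be typed by hand. Here is a small sketch, assuming the legend should simply be the reverse of the factor's level order; legend_order is a made-up name.

```r
dTbl <- data.frame(x = c(1,2,3,4,5,6,7,8),
                   y = c('a','a','b','b','a','a','b','b'),
                   z = c('q','q','q','q','r','r','r','r'))

# Reverse of the default (alphabetical) level order
legend_order <- rev(levels(factor(dTbl$z)))
print(legend_order)

# Build the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dTbl, aes(x = factor(y), y = x, fill = z)) +
    geom_bar(position = position_dodge(), stat = 'identity') +
    coord_flip() +
    theme(legend.position = 'top', legend.direction = 'vertical') +
    scale_fill_discrete(breaks = legend_order)
  # print(p) draws the chart
}
```

Any other permutation of the levels can be supplied the same way.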
If you don't like joran's elegant answer, you can go with the hack:
geom_bar(position=position_dodge(-.9), stat='identity')
For arbitrary level reordering, you can modify the order of levels in the factor:
dTbl$z=factor(dTbl$z,levels=c('r','q'))
ggplot(dTbl, aes(x=factor(y),y=x, fill=z)) +
geom_bar(position=position_dodge(), stat='identity') +
coord_flip() +
theme(legend.position='top', legend.direction='vertical')

How to Unearth the Buried Regression Line in GGPLOT

Currently my regression plot looks like this. Notice that
the regression line is deeply buried.
Is there any way I can modify my code here, to show it on top of the dots?
I know I can increase the size but it's still underneath the dots.
p <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5) +
geom_point()
p
Just change the order:
p <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_point() +
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5)
p
The issue is not the color, but the order of the geoms.
If you first call geom_point() and then geom_smooth()
the latter will be on top of the former.
Plot the following for comparison:
# Smooth first, points second: the points cover the line
Before <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y") +
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5) +
geom_point()
# Points first, smooth second: the line is drawn on top
After <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y") +
geom_point() +
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5)
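ggplot2 draws layers in the order they were added, and that order can be inspected on the plot object. The sketch below uses made-up data and a hypothetical helper, layer_geoms(). It relies on the layers element of a ggplot object, so treat it as a debugging aid rather than a documented interface.

```r
# Hypothetical helper: geom class of each layer, in drawing order
layer_geoms <- function(p) {
  vapply(p$layers, function(l) class(l$geom)[1], character(1))
}

# Build the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  my_df <- data.frame(x = 1:10, y = (1:10) + c(0.3, -0.2))  # made-up data
  after <- ggplot(my_df, aes(x = x, y = y)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, color = "red", formula = y ~ x)
  print(layer_geoms(after))  # the last entry is drawn on top
}
```

If the smooth layer is not last in that vector, the points will be painted over the regression line.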
How about transparent points?
library(ggplot2)
# set.seed() makes the simulated data reproducible; assigning seed=616 does not
set.seed(616)
x1 <- sort(runif(1000))
set.seed(626)
x2 <- rnorm(1000)*0.02 + sort(runif(1000))
my_df <- data.frame(x=x1, y=x2)
p <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y") +
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5) +
# a low alpha lets the line show through the points
geom_point(size=2, alpha=0.1)
p
