Plot unsimilar data with ggplot - r

I am trying to plot the different lines, but one of the observations is far more beyond the normal scale.
library(ggplot2)
library(reshape)
library(dplyr)
a = matrix(rnorm(50, 0, 1), ncol=5)
b = c(rnorm(5, 0, 2), rnorm(5,20,2))
dt = data.frame(a,b)
rownames(dt) = paste0('Day',1:10)
colnames(dt) = c('A','B','C','D','E','F')
mdt = melt(as.matrix(dt), varnames=c('Date', 'Model'))
head(mdt)
ggplot(mdt, aes(x=Date, y=value, group=Model, color=Model))+
geom_line(size=1.2)
This is what I got:
As you can see, the large fluctuation of F enlarges my general y axis scale and makes the other five observations' trends unclear.
I tried to set the ylim=(-5,5), then I lost the entire F:
I am not entirely sure how shall I plot all of them together, but I am thinking is it possible to scale the out of range part, like the following part, or you actually wouldn't recommend this?
Any advice is highly appreciated. Thanks!

I don't know if this is the best option for you data, but you can divide each group by it's range to make it more like a ratio.
library(data.table)
library(ggplot2)
set.seed(99)
dt <- data.table(value = c(rnorm(10, 0, 5), rnorm(20)),
cat = rep(c("x", "y", "z"), each = 10),
id = rep(1:10, 3))
ggplot(dt, aes(id, value, color = cat)) +
geom_line(size = 2)
ggplot(dt[, ratio := value / diff(range(value)), cat], aes(id, ratio, color = cat)) +
geom_line(size = 2)

Related

R ggplot: overlay two conditional density plots (same binary outcome variable) - possible?

I know how to plot several density curves/polygrams on one plot, but not conditional density plots.
Reproducible example:
require(ggplot2)
# generate data
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
c <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,c)
# plot 1
ggplot(df, aes(a, fill = c)) +
geom_density(position='fill', alpha = 0.5)
# plot 2
ggplot(df, aes(b, fill = c)) +
geom_density(position='fill', alpha = 0.5)
In my real data I have a bunch of these paired conditional density plots and I would need to overlay one over the other to see (and show) how different (or similar) they are. Does anyone know how to do this?
One way would be to plot the two versions as layers. The overlapping areas will be slightly different, depending on the layer order, based on how alpha works in ggplot2. This may or may not be what you want. You might fiddle with the two alphas, or vary the border colors, to distinguish them more.
ggplot(df, aes(fill = c)) +
geom_density(aes(a), position='fill', alpha = 0.5) +
geom_density(aes(b), position='fill', alpha = 0.5)
For example, you might make it so the fill only applies to one layer, but the other layer distinguishes groups using the group aesthetic, and perhaps a different linetype. This one seems more readable to me, especially if there is a natural ordering to the two variables that justifies putting one in the "foreground" and one in the "background."
ggplot(df) +
geom_density(aes(a, group = c), position='fill', alpha = 0.2, linetype = "dashed") +
geom_density(aes(b, fill = c), position='fill', alpha = 0.5)
I'm not so sure if "on top of one another" is a great idea. Jon's ideas are probably the way to go. But what about just plotting side-by side - our brains can cope with that and we can compare this pretty well.
Make it long, then use facet.
Another option might be an animated graph (see 2nd code chunk below).
require(ggplot2)
#> Loading required package: ggplot2
library(tidyverse)
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
#### BAAAAAD idea to call anything "c" in R!!! Don't do this. ever!
d <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,d)
df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
facet_grid(~name)
library(gganimate)
p <- df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
labs(title = "{closest_state}")
p_anim <- p + transition_states(name)
animate(p_anim, duration = 2, fps = 5)
Created on 2022-06-14 by the reprex package (v2.0.1)
Although it is not the overlay you might have thought of, it facilitates the comparison of density curves:
library(tidyverse)
library(ggridges)
library(truncnorm)
DF <- tibble(
alpha = rtruncnorm(n = 200, a = 0, b = 1000, mean = 500, sd = 50),
beta = rtruncnorm(n = 200, a = 0, b = 1000, mean = 550, sd = 50)
)
DF <- DF %>%
pivot_longer(c(alpha, beta), names_to = "name", values_to = "meas") %>%
mutate(name = factor(name))
DF %>%
ggplot(aes(meas, name, fill = factor(stat(quantile)))) +
stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = T,
quantiles = 4,
quantile_lines = T
) +
scale_fill_viridis_d(name = "Quartiles")

Advice/ on how to plot side by side histograms with line graph going through in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I also would like to draw a trend line for the median values since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up since each facet is a separate graphical device. Can anyone give me some advice on how to do this? I am not particularly partial to the faceting so if it is not needed I do away with it.
I will try and upload sample data, but all simulating 100 values for each of the 9 rhos would work just to get it started like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
Result. You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun.y = "median")
This produces a violin for each value of rho, and joins the medians for each violin.

How to annotate lines like this in ggplot2?

See example:
I hope I don't need to manually assign the coordinators of the texts. If this is too complicated to achieve in ggplot2, what are the alternatives in R? Or maybe even not in R?
As #Axeman says, ggrepel is a decent option. Unfortunately it will only avoid overlap with other labels, and not the lines, so the solution isn't quite perfect.
library(ggplot2)
install.packages("ggrepel")
library(ggrepel)
set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
x = rep(seq(50), times = 3),
group = rep(LETTERS[seq(3)], each = 50))
ggplot(d, aes(x, y, group = group, label = group)) +
geom_line() +
geom_text_repel(data = d[d$x == sample(d$x, 1), ], size = 10)

Mix color and fill aesthetics in ggplot

I wonder if there is the possibility to change the fill main colour according to a categorical variable
Here is a reproducible example
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = c(rep('a', times = 10),
rep('b', times = 10)),
val = rep(1:10, times = 2))
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(color = grp,
fill = val,
size = val))
Of course it is easy to change the circle colour/shape, according to the variable grp, but I'd like to have the a group in shades of red and the b group in shades of blue.
I also thought about using facets, but don't know if the fill gradient can be changed for the two panels.
Anyone knows if that can be done, without gridExtra?
Thanks!
I think there are two ways to do this. The first is using the alpha aesthetic for your val column. This is a quick and easy way to accomplish your goal but may not be exactly what you want:
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(alpha=val,
fill = grp,
size = val)) + theme_minimal()
The second way would be to do something similar to this post: Vary the color gradient on a scatter plot created with ggplot2. I edited the code slightly so its not a range from white to your color of interest but from a lighter color to a darker color. This requires a little bit of work and using the scale_fill_identity function which basically takes a variable that has the colors you want and maps them directly to each point (so it doesn't do any scaling).
This code is:
#Rescale val to [0,1]
df$scaled_val <- rescale(df$val)
low_cols <- c("firebrick1","deepskyblue")
high_cols <- c("darkred","deepskyblue4")
df$col <- ddply(df, .(grp), function(x)
data.frame(col=apply(colorRamp(c(low_cols[as.numeric(x$grp)[1]], high_cols[as.numeric(x$grp)[1]]))(x$scaled_val),
1,function(x)rgb(x[1],x[2],x[3], max=255)))
)$col
df
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(
fill = col,
size = val)) + theme_minimal() +scale_fill_identity()
Thanks to this other post I found a way to visualize the fill bar in the legend, even though that wasn't what I meant to do.
Here's the ouptup
And the code
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = factor(c(rep('a', times = 10),
rep('b', times = 10)),
levels = c('a', 'b')),
val = rep(1:10, times = 2)) %>%
group_by(grp) %>%
mutate(scaledVal = rescale(val)) %>%
ungroup %>%
mutate(scaledValOffSet = scaledVal + 100*(as.integer(grp) - 1))
scalerange <- range(df$scaledVal)
gradientends <- scalerange + rep(c(0,100,200), each=2)
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(fill = scaledValOffSet,
size = val)) +
scale_fill_gradientn(colours = c('white',
'darkred',
'white',
'deepskyblue4'),
values = rescale(gradientends))
Basically one should rescale fill values (e.g. between 0 and 1) and separate them using another order of magnitude, provided by the categorical variable grp.
This is not what I wanted though: the snippet can be improved, of course, to make the whole thing less manual, but still lacks the simple usual discrete fill legend.

How to prevent two labels to overlap in a barchart?

The image below shows a chart that I created with the code below. I highlighted the missing or overlapping labels. Is there a way to tell ggplot2 to not overlap labels?
week = c(0, 1, 1, 1, 1, 2, 2, 3, 4, 5)
statuses = c('Shipped', 'Shipped', 'Shipped', 'Shipped', 'Not-Shipped', 'Shipped', 'Shipped', 'Shipped', 'Not-Shipped', 'Shipped')
dat <- data.frame(Week = week, Status = statuses)
p <- qplot(factor(Week), data = dat, geom = "bar", fill = factor(Status))
p <- p + geom_bar()
# Below is the most important line, that's the one which displays the value
p <- p + stat_bin(aes(label = ..count..), geom = "text", vjust = -1, size = 3)
p
You can use a variant of the well-known population pyramid.
Some sample data (code inspired by Didzis Elferts' answer):
set.seed(654)
week <- sample(0:9, 3000, rep=TRUE, prob = rchisq(10, df = 3))
status <- factor(rbinom(3000, 1, 0.15), labels = c("Shipped", "Not-Shipped"))
data.df <- data.frame(Week = week, Status = status)
Compute count scores for each week, then convert one category to negative values:
library("plyr")
plot.df <- ddply(data.df, .(Week, Status), nrow)
plot.df$V1 <- ifelse(plot.df$Status == "Shipped",
plot.df$V1, -plot.df$V1)
Draw the plot. Note that the y-axis labels are adapted to show positive values on either side of the baseline.
library("ggplot2")
ggplot(plot.df) +
aes(x = as.factor(Week), y = V1, fill = Status) +
geom_bar(stat = "identity", position = "identity") +
scale_y_continuous(breaks = 100 * -1:5,
labels = 100 * c(1, 0:5)) +
geom_text(aes(y = sign(V1) * max(V1) / 30, label = abs(V1)))
The plot:
For production purposes you'd need to determine the appropriate y-axis tick labels dynamically.
Made new sample data (inspired by code of #agstudy).
week <- sample(0:5,1000,rep=TRUE,prob=c(0.2,0.05,0.15,0.5,0.03,0.1))
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
Using function ddply() from library plyr made new data frame text.df for labels. Column count contains number of observations in each combination of Week and Status. Then added column ypos that contains cumulative sum of count for each Week plus 15. This will be used for y position. For Not-Shipped ypos replaced with -10.
library(plyr)
text.df<-ddply(dat,.(Week,Status),function(x) data.frame(count=nrow(x)))
text.df<-ddply(text.df,.(Week),transform,ypos=cumsum(count)+15)
text.df$ypos[text.df$Status=="Not-Shipped"]<- -10
Now labels are plotted with geom_text() using new data frame.
ggplot(dat,aes(as.factor(Week),fill=Status))+geom_bar()+
geom_text(data=text.df,aes(x=as.factor(Week),y=ypos,label=count))
One solution to avoid overlaps is to use to dodge position of bars and texts. To avoid missing values you can set ylim. Here an example.
## I create some more realistic data similar to your picture
week <- sample(0:5,1000,rep=TRUE)
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
## for dodging
dodgewidth <- position_dodge(width=0.9)
## get max y to set ylim
ymax <- max(table(dat$Week,dat$Status))+20
ggplot(dat,aes(x = factor(Week),fill = factor(Status))) +
geom_bar( position = dodgewidth ) +
stat_bin(geom="text", position= dodgewidth, aes( label=..count..),
vjust=-1,size=5)+
ylim(0,ymax)
Based on Didzis plot you could also increase readability by keeping the position on the y axis constant and by colouring the text in the same colour as the legend.
library(ggplot2)
week <- sample(0:5,1000,rep=TRUE,prob=c(0.2,0.05,0.15,0.5,0.03,0.1))
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
library(plyr)
text.df<-ddply(dat,.(Week,Status),function(x) data.frame(count=nrow(x)))
text.df$ypos[text.df$Status=="Not-Shipped"]<- -15
text.df$ypos[text.df$Status=="Shipped"]<- -55
p <- ggplot(dat,aes(as.factor(Week),fill=Status))+geom_bar()+
geom_text(data=text.df,aes(x=as.factor(Week),y=ypos,label=count),colour=ifelse(text.df$Status=="Not-Shipped","#F8766D","#00BFC4"))

Resources