Plotting Gradient for Line chart using ggplot_line() - r

Below is a small dataset that I have reproduced to the best of my understanding. As the attached plot shows, I can get a gradient with geom_point(), but I am trying to get the same visual with geom_line(). Note: we are always provided with data for Temp and var. The cat variable is not given in the data set.
df=data.frame(seq=(1:30),Temp =rnorm(30,mean = 34 ,sd=18))
f=summary(df$Temp)
df$cat <- cut(df$Temp,
breaks=c(f[1], f[3] ,f[4] ,f[6]),
labels=c("low","medium","high"))
f=ggplot(df , aes(x=seq ,y= Temp,colour=cat))+ geom_line()
f
Output of Above Code
Required Output
The gradient should follow High, Medium and Low, using the geom_line() function in ggplot2.

You want to group the lines by cat, but colour them by Temp. For getting a nice gradient, I like the colour scales in the viridis package, but you can play around with scale_colour_gradient instead:
library(viridis)
ggplot(df , aes(x=seq ,y= Temp,colour=Temp, group = cat)) +
geom_line(size = 1.2) +
scale_colour_viridis(option = "A")
Output:
To have a legend for the lines, you can represent cat with something like shape:
ggplot(df , aes(x=seq ,y= Temp,colour=Temp, group = cat, shape = cat)) +
geom_point(size = 3) +
geom_line(size = 1.2) +
scale_colour_viridis(option = "A")
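As noted above, scale_colour_gradient can stand in for viridis. A minimal sketch of that variant, assuming only ggplot2 is installed; the seed, the low/high colours, and the include.lowest = TRUE argument (which keeps the minimum Temp from becoming NA in cat) are my additions, not part of the original post:

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(seq = 1:30, Temp = rnorm(30, mean = 34, sd = 18))

# Same break points as the question (min, median, mean, max), computed directly
brks <- c(min(df$Temp), median(df$Temp), mean(df$Temp), max(df$Temp))
df$cat <- cut(df$Temp, breaks = brks,
              labels = c("low", "medium", "high"),
              include.lowest = TRUE)  # keep the minimum value in the "low" bin

# Group by cat, colour by Temp, with a plain two-colour gradient
p_gradient <- ggplot(df, aes(x = seq, y = Temp, colour = Temp, group = cat)) +
  geom_line() +
  scale_colour_gradient(low = "blue", high = "red")

p_gradient
```

The low and high colours are arbitrary; any pair accepted by scale_colour_gradient should work the same way.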

Is this what you want?
a <- data.frame(seq=(1:30),Temp =rnorm(30,mean = 34 ,sd=18))
ggplot(a, aes(x = seq, y = Temp, color = Temp )) +
geom_line(size = 0.5) +
geom_smooth(aes(color=..y..), size=1.5, method = "loess", se=FALSE) +
scale_colour_gradient2(low = "green", mid = "yellow" , high = "red",
midpoint=median(a$Temp))

Related

How to highlight a column in ggplot2

I have the following graph and I want to highlight both columns for watermelons, as it has the highest juice_content and weight. I know how to change the color of the columns, but I would like the WHOLE columns to be highlighted. Any idea how to achieve this? There doesn't seem to be anything similar online.
library(ggplot2)
library(tidyr)
fruits <- c("apple","orange","watermelons")
juice_content <- c(10,1,1000)
weight <- c(5,2,2000)
df <- data.frame(fruits,juice_content,weight)
df <- gather(df,compare,measure,juice_content:weight, factor_key=TRUE)
plot <- ggplot(df, aes(fruits,measure, fill=compare)) + geom_bar(stat="identity", position=position_dodge()) + scale_y_log10()
An option is to use gghighlight
library(gghighlight)
ggplot(df, aes(fruits,measure, fill = compare)) +
geom_col(position = position_dodge()) +
scale_y_log10() +
gghighlight(fruits == "watermelons")
In response to your comment, how about working with different alpha values:
ggplot(df, aes(fruits,measure)) +
geom_col(data = . %>% filter(fruits == "watermelons"),
mapping = aes(fill = compare),
position = position_dodge()) +
geom_col(data = . %>% filter(fruits != "watermelons"),
mapping = aes(fill = compare),
alpha = 0.2,
position = position_dodge()) +
scale_y_log10()
Or you can achieve the same with one geom_col and a conditional alpha (thanks @Tjebo):
ggplot(df, aes(fruits, measure)) +
geom_col(
mapping = aes(fill = compare, alpha = fruits == 'watermelons'),
position = position_dodge()) +
scale_alpha_manual(values = c(0.2, 1)) +
scale_y_log10()
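With the conditional alpha, ggplot2 also draws a legend for the TRUE/FALSE values. A small sketch of one way to tidy that up, assuming the long data frame is built by hand in base R (rather than with gather) and that hiding the alpha legend is desired; the named values vector and guide = "none" are my choices:

```r
library(ggplot2)

fruits <- c("apple", "orange", "watermelons")

# Long-format data built by hand (same numbers as the question)
df_long <- data.frame(
  fruits  = rep(fruits, times = 2),
  compare = rep(c("juice_content", "weight"), each = 3),
  measure = c(10, 1, 1000, 5, 2, 2000)
)

p_highlight <- ggplot(df_long, aes(fruits, measure)) +
  geom_col(aes(fill = compare, alpha = fruits == "watermelons"),
           position = position_dodge()) +
  # Named values make the TRUE/FALSE mapping explicit; guide = "none" hides the alpha legend
  scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.2), guide = "none") +
  scale_y_log10()

p_highlight
```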
You could use geom_area to highlight behind the bars. You have to force the x scale to be discrete first, which is why I've used geom_blank (see this answer: geom_ribbon overlay when x-axis is discrete). Note that geom_ribbon and geom_area are effectively the same, except that geom_area always has 0 as ymin.
#minor edit so that the level isn't hard coded
watermelon_level <- which(levels(df$fruits) == "watermelons")
AreaDF <- data.frame(fruits = c(watermelon_level-0.5,watermelon_level+0.5))
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes( y = max(df$measure)), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
Edit to address comment
If you want to highlight multiple fruits, you could do something like this. You need a data.frame with the x and y values where you want the geom_area, including dropping it to 0 between highlighted fruits. I'm sure there are slightly tidier ways of building the data.frame, but this one works.
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
AreaDF <- data.frame(fruits = unlist(lapply(highlight_level, function(x) c(x -0.51,x -0.5,x+0.5,x+0.51))),
yval = rep(c(1,max(df$measure),max(df$measure),1), length(highlight_level)))
AreaDF <- AreaDF %>% mutate(
yval = ifelse(floor(fruits) %in% highlight_level & ceiling(fruits) %in% highlight_level, max(df$measure), yval)) %>%
arrange(fruits) %>% distinct()
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes(y = yval ), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
plot

Add legend using geom_point and geom_smooth from different dataset

I am struggling to set the correct legend for a geom_point plot with a loess regression when two data sets are used.
I have a data set that summarizes activity over a day. On the same graph I plot all the activity recorded per hour and per day, plus a regression curve smoothed with a loess function, plus the mean of each hour across all days.
To be more precise, here is the first version of the code and the graph it returns, without a legend, which is exactly what I expected:
# first graph: gives what I expected, but with no legend
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = 20, size = 3) +
geom_smooth(method = "loess", span = 0.2, color = "red", fill = "blue")
and the graph (the grey points are all the data, per hour and per day; the red curve is the loess regression; the blue dots are the means for each hour):
When I tried to set the legend, I failed to produce one that explains both kinds of dots (data in grey, mean in blue) and the loess curve (in red). Below are some examples of what I tried.
# second graph: gives what I expected plus the legend I wanted for the loess,
# but without the legend for the dots
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = "blue", size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_identity(name = "legend model", guide = "legend",
labels = "loess regression \n with confidence interval")
I obtained the correct legend, but for the curve only.
Another attempt:
# I tried to combine both data sets into a single one as follows, but it did not
# work at all, and I really do not understand how legends work in ggplot2
# compared to the normal plots
A <- rbind(dat1, dat2)
p <- ggplot(A, aes(x = Heure, y = value, color = variable)) +
geom_point(data = subset(A, variable == "data"), size = 1) +
geom_point(data = subset(A, variable == "Moy"), size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_manual(name = "légende",
labels = c("Data", "Moy", "loess regression \n with confidence interval"),
values = c("darkgray", "royalblue", "red"))
It appears that all the legend settings are mixed together in a "weird" way: there is a grey dot covered by a grey line, and then the same in blue and in red (for the 3 labels). All of them have a background filled in blue:
If you need to label the mean, you might need to be a bit creative, because it's not so easy to add a legend manually in ggplot.
I simulate something that looks like your data below.
library(dplyr)
library(ggplot2)
dat1 = data.frame(
Hour = rep(1:24,each=10),
value = c(rnorm(60,0,1),rnorm(60,2,1),rnorm(60,1,1),rnorm(60,-1,1))
)
# classify this as raw data
dat1$Data = "Raw"
# calculate mean like you did
dat2 <- dat1 %>% group_by(Hour) %>% summarise(value=mean(value))
# classify this as mean
dat2$Data = "Mean"
# combine the data frames
plotdat <- rbind(dat1,dat2)
# add a dummy variable, we'll use it later
plotdat$line = "Loess-Smooth"
We make the basic dot plot first:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide=FALSE)
Note that for size we set guide to FALSE so it will not appear in the legend. Now we add the loess smooth. One way to introduce a legend entry for it is to map a linetype; since there's only one group, the legend will have just one entry:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide=FALSE)+
geom_smooth(data=subset(plotdat,Data=="Raw"),  # == filters the raw rows; a single = would not
aes(linetype=line),size=1,alpha=0.3,
method = "loess", span = 0.2, color = "red", fill = "blue")
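When restricting the smoother to the raw rows, the condition passed to subset() has to be a logical comparison. A base R sketch on a made-up four-row data frame (toy is hypothetical, not the poster's data) showing the difference between = and ==:

```r
# Hypothetical toy data: two raw rows and two mean rows
toy <- data.frame(
  Hour  = c(1, 1, 2, 2),
  value = c(0.5, 0.7, 1.5, 1.0),
  Data  = c("Raw", "Mean", "Raw", "Mean")
)

# A single = passes a named argument that subset() does not use as a filter,
# so every row is kept
all_rows <- subset(toy, Data = "Raw")

# == is a logical comparison, so only the matching rows are kept
raw_only <- subset(toy, Data == "Raw")

nrow(all_rows)  # 4
nrow(raw_only)  # 2
```

This matters for the smoother because a loess fit on the unfiltered data would include the hourly means as extra points.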

How to put different colors in a multifactor boxplot with ggplot2 in R? [duplicate]

This question already has an answer here:
Changing colour schemes between facets
I am making a boxplot in ggplot2, but I have been unable to find a way to handle multiple colors across a 3 x 3 factor design.
This is an example of what I have been able to do (using this thread as a guide):
library(ggplot2)
data <- data.frame(
value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE)
)
ggplot(data, aes(animals, value)) + geom_boxplot(aes(fill = animals)) +
facet_grid(~region) + scale_fill_brewer()
I am able to use the blue color scale for the categories desert, forest and tundra. You can see the output here.
However, I would like to use a different color scale for each of these categories. For example: a yellow scale for desert, a green scale for forest and a blue scale for tundra. Thanks!
The easiest way is to use alpha (transparency) as a dimension, as suggested at the possible dupe. Getting a nice legend for boxplots takes a little extra work; here's a worked example. (Though, since they have x-labels, you could probably just set guide = FALSE in the alpha scale.)
ggplot(data, aes(animals, value)) +
geom_boxplot(aes(fill = region, alpha = animals)) +
facet_grid( ~ region) +
scale_alpha_discrete(
range = c(0.3, 0.9),
guide = guide_legend(override.aes = list(fill = "black"))) +
scale_fill_manual(values = c("goldenrod2", "forestgreen", "dodgerblue4"))
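The three colours in scale_fill_manual above are matched to the regions in alphabetical order (desert, forest, tundra). A sketch of the same plot with a named colour vector, which makes that mapping explicit; the seed, the region_cols name, and hiding the alpha legend with guide = "none" are my assumptions:

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed for a reproducible example
data <- data.frame(
  value   = sample(1:50),
  animals = sample(c("cat", "dog", "zebra"), 50, replace = TRUE),
  region  = sample(c("forest", "desert", "tundra"), 50, replace = TRUE)
)

# Named vector: each region is tied to its colour regardless of level order
region_cols <- c(desert = "goldenrod2", forest = "forestgreen", tundra = "dodgerblue4")

p_region <- ggplot(data, aes(animals, value)) +
  geom_boxplot(aes(fill = region, alpha = animals)) +
  facet_grid(~ region) +
  # ggplot2 may warn that alpha on a discrete variable is not advised
  scale_alpha_discrete(range = c(0.3, 0.9), guide = "none") +
  scale_fill_manual(values = region_cols)

p_region
```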
You can do this in a not-so-elegant way with data manipulation.
library(ggplot2)
library(dplyr)
data <- data.frame(value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE))
data <- data %>%
dplyr::mutate(fill = paste(animals, "-", region))
ggplot(data, aes(animals, value)) +
geom_boxplot(aes(fill = fill), col = "black", show.legend = F) +
facet_grid(~region) +
scale_fill_manual(values = c("gold3", "green3", "blue",
"yellow", "green4", "blue4",
"goldenrod", 'greenyellow', "dodgerblue2"))

R : ggplot2 plot several data frames in one plot

I'm a little bit stuck on ggplot2, trying to plot several data frames in one plot.
I have several data frames; here I'll present just two examples.
The data frames have the same header but different contents. Let's say that I want to count the balls that I have in 2 boxes.
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
What I'm trying to do is make a bar plot of my counts.
At the moment, this is what I have:
(plot_test=ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1,stat = "identity",color='green')+
geom_bar(data=test2,stat = "identity",color='blue')
)
I want to have x=Color and y=Count, with the bars of the test2 data frame next to those of test1. Here they overlap each other. So each name would appear twice on the x axis, but I want to plot the data frames in different colors and show their names in the legend.
For example: "Green bar" = test1, "Blue bar" = test2
Thank you for your time and your help.
Best regards
You have two options here:
Either tweak the size and position of the bars
ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1, aes(color='test1'), stat = "identity",
width=.4, position=position_nudge(x = -0.2)) +
geom_bar(data=test2, aes(color='test2'), stat = "identity",
width=.4, position=position_nudge(x = 0.2))
or, what I recommend, join the two data frames together and then plot:
library(dplyr)
test1 %>%
full_join(test2, by = 'Color') %>%
data.table::melt(id.vars = 'Color') %>%
ggplot(aes(x= Color, y=value, fill = variable)) +
geom_bar(stat = "identity", position = 'dodge')
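data.table::melt is written for data.table objects, so on a plain data frame its behaviour may depend on the installed data.table version. A sketch of the same join-then-reshape idea using tidyr::pivot_longer instead (a swapped-in function, assuming dplyr and tidyr are available; the suffix labels are my choice):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

name   <- c("red", "blue", "green", "purple", "white", "black")
value1 <- c(2, 3, 4, 2, 6, 8)
value2 <- c(1, 5, 7, 3, 4, 2)
test1  <- data.frame(Color = name, Count = value1)
test2  <- data.frame(Color = name, Count = value2)

# Join on Color, then reshape the two Count columns into long format
long <- full_join(test1, test2, by = "Color", suffix = c(".test1", ".test2")) %>%
  pivot_longer(cols = -Color, names_to = "variable", values_to = "value")

# One bar per data frame, side by side for each colour
p_dodge <- ggplot(long, aes(x = Color, y = value, fill = variable)) +
  geom_col(position = "dodge")

p_dodge
```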
Try this:
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
test1$var <- 'test1'
test2$var <- 'test2'
test_all <- rbind(test1,test2)
(plot_test=ggplot(data=test_all) +
geom_bar(aes(x=Color,y=Count,color=var),
stat = "identity", position=position_dodge(1))+
scale_color_manual(values = c('green', 'blue'))
)
This will do what you were trying to do:
balls <- data.frame(
count = c(c(2,3,4,2,6,8),c(1,5,7,3,4,2)),
colour = c(c('red','blue','green','purple','white','black'),c('red','blue','green','purple','white','black')),
box = c(rep("1", times = 6), rep("2", times = 6))
)
ggplot(balls, aes(x = colour, y = count, fill = box)) +
geom_col() +
scale_fill_manual(values = c("green","blue"))
This is better because it facilitates comparisons between the box counts:
ggplot(balls, aes(x = colour, y = count)) +
geom_col() +
facet_wrap(~ box, ncol = 1, labeller = as_labeller(c("1" = "Box #1", "2" = "Box #2")))

Overlaying histograms with ggplot2 in R

I am new to R and am trying to plot 3 histograms onto the same graph.
Everything worked fine, but my problem is that you don't see where 2 histograms overlap - they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black frame line, and colours look different where curves overlap.
Can someone tell me if something similar can be achieved with the histograms in the 1st picture? This is the code I'm using:
lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
Using @joran's sample data,
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
Note that the default position of geom_histogram is "stack". See "position adjustment" in the geom_histogram documentation.
Your current code:
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want to use three separate calls to geom_histogram, where each one gets its own data frame and fill:
ggplot(histogram, aes(f0)) +
geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
geom_histogram(data = highf0, fill = "green", alpha = 0.2)
Here's a concrete example with some output:
dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),yy = rep(letters[1:3],each = 100))
ggplot(dat,aes(x=xx)) +
geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)
which produces something like this:
Edited to fix typos; you wanted fill, not colour.
While only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results aren't always satisfactory. Borders and coloring need to be used properly so that the eye can differentiate between histograms.
The following functions balance border colors, opacities, and superimposed density plots to enable the viewer to differentiate among distributions.
Single histogram:
plot_histogram <- function(df, feature) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
geom_density(alpha=0.3, fill="red") +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
print(plt)
}
Multiple histogram:
plot_multi_histogram <- function(df, feature, label_column) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
Usage:
Simply pass your data frame into the above functions along with desired arguments:
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
The extra parameter in plot_multi_histogram is the name of the column containing the category labels.
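The eval(parse(text = ...)) calls work, but column names held in strings can also be looked up with the .data pronoun that ggplot2 supports inside aes(). A hedged sketch of the multi-histogram function written that way; the function name plot_multi_histogram_tidy and the bins = 30 setting are hypothetical, and after_stat(density) assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Hypothetical variant of plot_multi_histogram using .data[[...]] for string column names
plot_multi_histogram_tidy <- function(df, feature, label_column) {
  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7,
                   position = "identity", colour = "black", bins = 30) +
    geom_density(alpha = 0.7) +
    # Overall mean of the feature, computed directly from the data frame
    geom_vline(xintercept = mean(df[[feature]]), colour = "black",
               linetype = "dashed") +
    labs(x = feature, y = "Density", fill = label_column)
}

p_tidy <- plot_multi_histogram_tidy(iris, "Sepal.Width", "Species")
p_tidy
```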
We can see this more dramatically by creating a dataframe with many different distribution means:
a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
Passing data frame in as before (and widening chart using options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
To add a separate vertical line for each distribution:
plot_multi_histogram <- function(df, feature, label_column, means) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(xintercept=means, color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
The only change over the previous plot_multi_histogram function is the addition of means to the parameters, and changing the geom_vline line to accept multiple values.
Usage:
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))
Result:
Since I set the means explicitly in many_distros, I can simply pass them in. Alternatively, you can calculate them inside the function and use them that way.
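For the last suggestion, the per-category means can be computed inside the function, for example with tapply. A sketch under that assumption; plot_multi_histogram_auto and the two-group demo data are made up for illustration:

```r
library(ggplot2)

# Hypothetical variant that derives one vertical line per category from the data
plot_multi_histogram_auto <- function(df, feature, label_column) {
  # Mean of the feature within each category
  means <- tapply(df[[feature]], df[[label_column]], mean)

  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7,
                   position = "identity", colour = "black", bins = 30) +
    geom_density(alpha = 0.7) +
    geom_vline(xintercept = as.numeric(means), colour = "black",
               linetype = "dashed") +
    labs(x = feature, y = "Density", fill = label_column)
}

set.seed(123)  # assumption: fixed seed for a reproducible example
demo <- data.frame(
  n        = c(rnorm(200, mean = 1), rnorm(200, mean = 4)),
  category = rep(c("A", "B"), each = 200)
)

p_auto <- plot_multi_histogram_auto(demo, "n", "category")
p_auto
```

This removes the need to pass the means by hand, at the cost of always using the sample mean rather than a value chosen by the caller.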

Resources