Gradient fill columns using ggplot2 doesn't seem to work - r

I would like to create a gradient within each bar of this graph going from red (low) to green (high).
At present I am using specific colours within geom_col but want to have each individual bar scale from red to green depending on the value it depicts.
Here is a simplified version of my graph (there is also a geom_line on the graph (among several other things such as titles, axis adjustments, etc.), but it isn't relevant to this question so I have excluded it from the example):
I have removed the hard-coded colours from the columns and tried using things such as scale_fill_gradient (and numerous other similar functions) to apply a gradient to said columns, but nothing seems to work.
Here is what the output is when I use scale_fill_gradient(low = "red", high = "green"):
What I want is for each bar to have its own gradient and not for each bar to represent a step in said gradient.
How can I achieve this using ggplot2?
My code for the above (green) example:
ggplot(data = all_chats_hn,
aes(x = as.Date(date))) +
geom_col(aes(y = total_chats),
colour = "black",
fill = "forestgreen")

I'm not sure if that is possible with geom_col. It is possible by using geom_line and a little data augmentation. We have to use the y value to create a sequence of y values (y_seq), so that the gradient coloring works. We also create y_seq_scaled in case you want each line to have an "independent" gradient.
library(tidyverse)
set.seed(123) # reproducibility
dat <- data_frame(x = 1:10, y = abs(rnorm(10))) %>%
group_by(x) %>%
mutate(y_seq = list(seq(0, y, length.out = 100))) %>% # create sequence
unnest(y_seq) %>%
mutate(y_seq_scaled = (y_seq - mean(y_seq)) / sd(y_seq)) # scale sequence
# gradient for all together
ggplot(dat, aes(x = factor(x), y = y_seq, colour = y_seq))+
geom_line(size = 2)+
scale_colour_gradient(low = 'red', high = 'green')
# independent gradients per x
ggplot(dat, aes(x = factor(x), y = y_seq, colour = y_seq_scaled))+
geom_line(size = 2)+
scale_colour_gradient(low = 'red', high = 'green')

Related

Overlay violin plots in r

I am trying to plot overlaying violin plots by condition within the same variable.
Var <- rnorm(100,50)
Cond <- rbinom(100, 1, 0.5)
df2 <- data.frame(Var,Cond)
ggplot(df2)+
aes(x=factor(Cond),y=Var, colour = Cond)+
geom_violin(alpha=0.3,position="identity")+
coord_flip()
So, where do I specify that I want them to overlap? Preferably, I want them to become more lighter when overlapping and darker colour when not so that their differences are clear. Any clues?
If you don't want them to have different (flipped) x-values, set x to a constant instead of x = factor(Cond). And if you want them filled in, set a fill aesthetic.
ggplot(df2)+
aes(x=0,y=Var, colour = Cond, fill = Cond)+
geom_violin(alpha=0.3,position="identity")+
coord_flip()
coord_flip isn't often needed anymore--since version 3.3.0 (released in early 2020) all geoms can point in either direction. I'd recommend simplifying as below for a similar result.
df2$Cond = factor(df2$Cond)
ggplot(df2) +
aes(y = 0, x = Var, colour = Cond, fill = Cond) +
geom_violin(alpha = 0.3, position = "identity")

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

Create a ggplot barchart/histogram with fill and outline colors indicated by different variables

I am trying to create a chart in ggplot along these lines:
...in which the color of the boxes is indicated by one variable, and the outline of the boxes is indicated by another variable.
Assuming the data is structured like this:
df<-data.frame(index=1:50,date=sample(seq(as.Date('1999/01/01'), as.Date('1999/06/01'), by="day"), 50,replace=T),
V1=sample(c("Indigenous","Import-related","Imported","Unknown"), 50,replace=T),
V2=sample(c(NA,"Zombie","Pulmonary Hemorrhage"), 50,replace=T))
All I can think of is something like this:
require(ggplot2)
#draw the histogram with fill determined by V1
p<-ggplot(data=df)+geom_histogram(aes(x=date,group=V1,fill=V1),binwidth=7,color="black",alpha=0.9)
#draw the individual boxes for each case
p1<-p+scale_fill_discrete()+geom_histogram(aes(x=date,group=index),binwidth=7,color="black",alpha=0)
#attempt to draw green boxes for one value of V2
p2<-p1+geom_histogram(aes(x=date,group=V2=="Zombie"),binwidth=7,color="green",alpha=0,size=1.2)
#attempt to draw orange boxes for the other value of V2
p3<-p2+geom_histogram(aes(x=date,group=V2=="Pulmonary Hemorrhage"),binwidth=7,color="orange",alpha=0,size=1.2)
However, this is not working as it draws borders everywhere and I cannot isolate individual cases using this approach, as you can see.
Is there a ggplot solution? If I can't do colored boxes, I could indicate V2 by some kind of text annotation on the appropriate boxes, but then would have to figure out x and y for each label, and that's giving me fits as well.
We can try like this
df<-data.table(index=1:50,date=sample(seq(as.Date('1999/01/01'), as.Date('1999/06/01'), by="day"), 50,replace=T),
V1=sample(c("Indigenous","Import-related","Imported","Unknown"), 50,replace=T),
V2=sample(c(NA,"Zombie","Pulmonary Hemorrhage"), 50,replace=T))
df <- df[,.(count=.N), by = .(date,V1,V2)]
windows()
ggplot(data = df, aes(x = date, y = count, color = V2, fill = V1)) +
geom_bar(stat = "identity", position = "stack", width = 2, size = 1) +
scale_fill_manual(values=c("red4", "blue4", "green4", "blue")) +
scale_color_manual(values=c("orange", "green", "black")) +
scale_y_continuous(breaks = seq(1,3,1))

Scatterplot fill doesn't change in ggplot2 even when specified

I am trying to assign a single color for all points on a scatter plot in ggplot2. However, no matter what color I set fill = to, the points always end up being black.
Here is my code (with dummy variables):
ggplot(data = testDF) +
geom_point(aes(x = testDF$X, y = testDF$Y),
fill = "#2EC4B6", color = "#E71D36") # Fill is cyan, color is red
This is what the plot looks like:
http://i.imgur.com/fAgEq9R.png
The default shape used by geom_point does not have a fill aesthetic. You can address this by changing the shape parameter:
ggplot( mtcars, aes( x = wt, y = mpg ) ) +
geom_point( shape = 21, fill = "#2EC4B6", color = "#E71D36" )
You need to set the dot style with pch=21. Default dot style does not use fill.
EDIT actual example:
ggplot(data = testDF) +
geom_point(aes(x = X, y = Y), # testDF has columns X and Y
fill = "#2EC4B6", color = "#E71D36", # Fill is cyan, color is red
pch = 21) # point style that uses both color and fill
As #Artem pointed out, you can also use the shape aesthetic to change the dot style. I don't know exactly what the difference between pch and shape is, but I think pch is from older versions of ggplot2.

Can I fix overlapping dashed lines in a histogram in ggplot2?

I am trying to plot a histogram of two overlapping distributions in ggplot2. Unfortunately, the graphic needs to be in black and white. I tried representing the two categories with different shades of grey, with transparency, but the result is not as clear as I would like. I tried adding outlines to the bars with different linetypes, but this produced some strange results.
require(ggplot2)
set.seed(65)
a = rnorm(100, mean = 1, sd = 1)
b = rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c('A', 'B'), each = 100),
values = c(a, b))
ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 1) +
scale_fill_grey()
Notice that one of the lines that should appear dotted is in fact solid (at a value of x = 4). I think this must be a result of it actually being two lines - one from the 3-4 bar and one from the 4-5 bar. The dots are out of phase so they produce a solid line. The effect is rather ugly and inconsistent.
Is there any way of fixing this overlap?
Can anyone suggest a more effective way of clarifying the difference between the two categories, without resorting to colour?
Many thanks.
One possibility would be to use a 'hollow histogram', as described here:
# assign your original plot object to a variable
p1 <- ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
# p1
# extract relevant variables from the plot object to a new data frame
# your grouping variable 'category' is named 'group' in the plot object
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
# plot using geom_step
ggplot(data = df, aes(x = xmin, y = y, linetype = factor(group))) +
geom_step()
If you want to vary both linetype and fill, you need to plot a histogram first (which can be filled). Set the outline colour of the histogram to transparent. Then add the geom_step. Use theme_bw to avoid 'grey elements on grey background'
p1 <- ggplot() +
geom_histogram(data = dat, aes(x = values, fill = category),
colour = "transparent", position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
df$category <- factor(df$group, labels = c("A", "B"))
p1 +
geom_step(data = df, aes(x = xmin, y = y, linetype = category)) +
theme_bw()
First, I would recommend theme_set(theme_bw()) or theme_set(theme_classic()) (this sets the background to white, which makes it (much) easier to see shades of gray).
Second, you could try something like scale_linetype_manual(values=c(1,3)) -- this won't completely eliminate the artifacts you're unhappy about, but it might make them a little less prominent since linetype 3 is sparser than linetype 2.
Short of drawing density plots instead (which won't work very well for small samples and may not be familiar to your audience), dodging the positions of the histograms (which is ugly), or otherwise departing from histogram conventions, I can't think of a better solution.

Resources