R ggplot conditional color without exact match

I am trying to color points in a line depending on whether they are above or below the yearly mean in ggplot2, and I cannot find any help for cases where colors are not exactly matched to values.
I'm using the following code:
ggplot(aes(x = M, y = O)) + geom_line()
I want it to be one color if O is above mean(O) and another color if it is below.
I tried to follow the advice but I just get a split graph when I use:
mutate(color=ifelse(O>mean(O),"green","red")) %>% ggplot(aes(x=M,y=O,color=color))+geom_line()+scale_color_manual(values=c("red", "darkgreen"))
I get the following graph:

This works, but makes a break in the line.
library(tidyverse)
df <- data.frame(
M = 1:5,
O = c(1, 2, 3, 4, 5)
)
df <- mutate(df, above = O > mean(O))
ggplot(df, aes(x=M,y=O, color=above))+geom_line()

Build a variable color to mark your color type.
For points use geom_point(), not geom_line().
Edit: the color option splits the data into 2 groups. Use group=1 (one value for all) to force a single group.
Advice: avoid naming a variable O; it is easily confused with 0 (zero).
library(tidyverse)
df <- data.frame(M=rnorm(10), O=rnorm(10)) %>%
mutate(color = O > mean(O)) # the comparison already returns TRUE/FALSE
#ggplot(df, aes(x=M, y=O, color = color)) + geom_point()
ggplot(df, aes(x=M, y=O, color = color, group=1)) + geom_line() + scale_color_manual(values=c("red", "green"))
# > df
#             M           O color
# 1  0.05829207 -0.03490925 FALSE
# 2 -0.09255111 -0.52513201 FALSE
# 3  0.44859944  0.19371037 FALSE
# 4 -0.54216222  0.40783749  TRUE
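A minimal sketch of the same idea with fixed data, assuming ggplot2 is installed. The column name status and the object name p are illustrative and not from the question. Naming the entries of values ties each colour to a category, so the result does not depend on the order of the vector.

```r
library(ggplot2)

# Fixed toy data so the result is reproducible
df <- data.frame(M = 1:5, O = c(1, 2, 3, 4, 5))

# Flag each point relative to the mean of O (hypothetical column name)
df$status <- ifelse(df$O > mean(df$O), "above", "below")

# group = 1 keeps all points in a single line even though colour varies
p <- ggplot(df, aes(x = M, y = O, colour = status, group = 1)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = c(below = "red", above = "darkgreen"))
p
```

As far as I can tell, each line segment takes the colour of the point it starts from, so the colour changes at a data point rather than exactly where the line crosses the mean.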

Related

Extract dplyr tbl and create vector where column 1 = column 2

In order to map colours to groups, I am using the scale_colour_manual(values = c("G1" = "grey", ...)) function of the {ggplot2} package.
I have a main tibble with data where you can find groups, and I would like to highlight a specific group. Here G3 is highlighted, however this is not necessarily the case for all plots I want to generate.
Here is some sample data:
groups <- as_tibble(c("G1", "G2", "G3"))
colours <- as_tibble(c("grey", "grey", "purple"))
I then pull the vectors, but I don't know how to get the result mentioned above (values = c("G1" = "grey", ...))
groups_vec <- groups %>% pull()
colours_vec <- colours %>% pull()
myvalues <- c(groups_vec = colours_vec)
# this code returns the following
groups_vec1 groups_vec2 groups_vec3
"grey" "grey" "purple"
whereas I expect the following result:
c("G1" = "grey", "G2" = "grey", "G3" = "purple")
G1 G2 G3
"grey" "grey" "purple"
I can't find the right words to describe my problem, hope the example is clear enough.
The following should help.
library(dplyr)
library(ggplot2)
# simulate some data
some_data <- tibble(
GROUPS = c("Group1", "Group2", "Group3","Group4"),
VALUES = c(2,5,9,3),
COL = c("grey","grey","lightblue","grey")
)
# plot
ggplot(data = some_data) +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL)) +
scale_fill_manual(values = c("grey", "green"))
Try to understand how the variable "COL" and the "values" in scale_fill_manual() work together.
EDITED ANSWER
You did not provide a reproducible example. Thus, the answer might not be exactly what you are fishing for. But I truly hope the following will help you understand how {ggplot} works with aesthetics and how you can control the way these aesthetics are presented.
I build the example on a simple geom_col(). You might have another use-case. But the principle should be transferable to all geoms. Just note that for some geoms you need to use color instead of fill.
P.S. I also recommend using a single data frame for your plot. There is no need to keep every variable in a separate tibble. Just add a column with the color flag you want to use. This might simplify your code and reduce the number of objects you use.
Let's start with a simple plot of what we have.
library(dplyr)
library(ggplot2)
# simulate some data
some_data <- tibble(
GROUPS = c("Group1", "Group2", "Group3","Group4"),
VALUES = c(2,5,9,3),
COL = c("grey","grey","purple","grey")
)
# understand how ggplot uses "categories" based on COL variable
# without specification ggplot uses the default colors for 2 different "categories", i.e. grey and purple
p <- ggplot(data = some_data) +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL))
p
Please note that {ggplot} assigns the default colors to the categories in a specific sequence (for a character column, the alphabetical order of its values). We will come back to this later, as this is the sequence you will need to control. To make this more prominent, recode your colors, e.g. instead of purple set it to highlight (or yes, Y). The characters are only a "category" for {ggplot}. The actual value is of secondary importance (while we humans assign meaning to something like "green" or "purple").
This yields:
Next, look into assigning colors. Key takeaway: the sequence of your color specification matters!
p1 <- p +
scale_fill_manual(values = c("grey", "purple")) +
labs(subtitle = "order of categories [grey, purple]")
p2 <- p +
scale_fill_manual(values = c("purple", "grey")) +
labs(subtitle = "order of categories [purple, grey]")
library(patchwork) # for demo purposes we plot both graphs side by side
p1 + p2
You did not provide a reproducible example. Thus, let's emulate your function f() by assigning a new color based on a value. (Your function might be more complex, but I understand it will give you a flag for the color).
# some operation that can happen in a function to change the color coding
# e.g. we pick the value 5
some_data2 <- some_data %>%
mutate(COL = case_when(
between(VALUES, 4,6) ~ "purple"
,TRUE ~ "grey"
))
# note - now we have a vector of values for each "row" (aka group) in your dataframe
color_vec <- c("grey", "yellow", "grey", "purple")
some_data2 <- some_data2 %>% mutate(COL2 = color_vec)
Have a look at the tibble:
some_data2
# A tibble: 4 × 4
  GROUPS VALUES COL    COL2
  <chr>   <dbl> <chr>  <chr>
1 Group1      2 grey   grey
2 Group2      5 purple yellow
3 Group3      9 grey   grey
4 Group4      3 grey   purple
Let's plot this tibble using aesthetic fill for column COL and set our color-vector as the desired sequence in scale_fill_manual():
some_data2 %>%
ggplot() +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL)) +
scale_fill_manual(values = color_vec)
OOOOOPS!?! what happened here?
Again, try to understand color assignment and the number of colors in your plot (aka "categories").
Let's now use our "new" color column, i.e. COL2.
some_data2 %>%
ggplot() +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL2)) +
scale_fill_manual(values = color_vec)
Try to work out why this does not work either.
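One way to see what goes wrong is to reproduce the pairing by hand in base R. This is a sketch under the assumption that unnamed colours are handed out in order to the categories, and that the categories of a character column are sorted alphabetically. The names categories and pairing are illustrative.

```r
# Colour vector and COL2 column from the example above
color_vec <- c("grey", "yellow", "grey", "purple")
COL2 <- c("grey", "yellow", "grey", "purple")

# Categories as a discrete scale would see them: unique values, sorted
categories <- sort(unique(COL2))

# Unnamed colours are assumed to be assigned to the categories in order
pairing <- setNames(color_vec[seq_along(categories)], categories)
pairing
```

Under that assumption the category "purple" is drawn in yellow and the category "yellow" in grey, and the fourth colour is never used, which is why the bars do not get the intended colours.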
What you probably want is a sequence of colors that depends on the values you want to highlight. Your example does not explicitly mention how you construct this. The above gives you a pointer on how to handle this with a simple highlight vs. no-highlight (2 categories). But you can obviously define more colors, as in the following example, where we make use of the fact that our column COL2 holds the target colors based on your function. Please note that you could define the categories (aka breaks) differently from the color values (think factor label and level).
# note - we define values for the different "categories" that we expect
# in this example there are now 3 (order matters - c.f. above!)
categories <- unique(some_data2$COL2) # one entry per category
color_vec2 <- unique(some_data2$COL2) # if you use flags that are different from colors you can define them here
some_data2 %>%
  ggplot() +
  geom_col(aes(x = GROUPS, y = VALUES, fill = COL2)) +
  scale_fill_manual(
    # breaks sets the sequence of your categories
    breaks = categories,
    # values are the colors you want to use, in the same order as breaks
    values = color_vec2
  )
For geom_col(), you can use the fill aesthetic to color the bars by what I call here "categories".
With scale_fill_manual(), you control the color sequence of these categories, so you may want to build the values vector in the order of the categories.
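For the literal question of turning the two vectors into c("G1" = "grey", ...), base R's setNames() builds the named vector. The sketch below uses plain character vectors in place of the pulled tibble columns.

```r
groups_vec <- c("G1", "G2", "G3")
colours_vec <- c("grey", "grey", "purple")

# c(groups_vec = colours_vec) uses the literal name "groups_vec";
# setNames() attaches the contents of groups_vec as names instead
myvalues <- setNames(colours_vec, groups_vec)
myvalues
```

The result can then be passed as scale_colour_manual(values = myvalues), so each group is matched to its colour by name rather than by position.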

Where does ggplot set the order of the color scheme?

I have a data set that I'm showing in a series of violin plots with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categorical variable was plotted alphabetically (I rotated the plot, so it appears alphabetically from bottom to top). I thought it would look better if I sorted them using the numeric variable.
When I do this, the color scheme doesn't turn out as I wanted it to. It's like R assigned the colors to the violins before it sorted them; after the sorting, they kept their original colors - which is the opposite of what I wanted. I wanted R to sort them first and then apply the color scheme.
I'm using the viridis color scheme here, but I've run into the same thing when I used RColorBrewer.
Here is my code:
# Start plotting
g <- ggplot(NULL)
# Violin plot
g <- g + geom_violin(data = df, aes(x = reorder(catval, -numval,
na.rm = TRUE), y = numval, fill = catval), trim = TRUE,
scale = "width", adjust = 0.5)
(snip)
# Specify colors
g <- g + scale_colour_viridis_d()
# Remove legend
g <- g + theme(legend.position = "none")
# Flip for readability
g <- g + coord_flip()
# Produce plot
g
Here is the resulting plot.
If I leave out the reorder() argument when I call geom_violin(), the color order is what I would like, but then my categorical variable is sorted alphabetically and not by the numeric variable.
Is there a way to get what I'm after?
I think this is a reproducible example of what you're seeing. In the diamonds dataset, the mean price of "Good" diamonds is actually higher than the mean for "Very Good" diamonds.
library(dplyr)
library(ggplot2) # provides the diamonds dataset
diamonds %>%
group_by(cut) %>%
summarize(mean_price = mean(price))
# A tibble: 5 x 2
  cut       mean_price
  <ord>          <dbl>
1 Fair           4359.
2 Good           3929.
3 Very Good      3982.
4 Premium        4584.
5 Ideal          3458.
By default, reorder uses the mean of the sorting variable, so Good is plotted above Very Good. But the fill is still based on the un-reordered variable cut, which is a factor in order of quality.
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price, fill = cut)) +
geom_violin() +
coord_flip()
If you want the color to follow the ordering, then you could reorder upstream of ggplot2, or reorder in both aesthetics:
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price,
fill = reorder(cut, -price))) +
geom_violin() +
coord_flip()
Or
diamonds %>%
mutate(cut = reorder(cut, -price)) %>%
ggplot(aes(x = cut, y = price, fill = cut)) +
geom_violin() +
coord_flip()
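To check which order reorder() produces before plotting, the factor levels can be inspected directly. A small base R sketch with hypothetical toy data (the column names follow the question, the values are made up):

```r
# Hypothetical toy data: three categories with different means
df <- data.frame(
  catval = rep(c("a", "b", "c"), each = 2),
  numval = c(1, 3, 10, 12, 5, 7)
)

# Original order: alphabetical
levels(factor(df$catval))

# reorder() sorts the levels by the mean of -numval within each category
df$catval_sorted <- reorder(df$catval, -df$numval)
levels(df$catval_sorted)
```

Mapping both x and fill to the reordered column makes positions and colours follow the same level order.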

ggplot: How does geom_tile calculate the fill? [duplicate]

I used geom_tile() to plot 3 variables on the same graph with
tile_ruined_coop<-ggplot(data=df.1[sel1,])+
geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
scale_fill_gradient(name="vr")+
facet_grid(Seuil_out_coop_i ~ nb_coop_init)
tile_ruined_coop
and I am pleased with the result!
But what kind of statistical treatment is applied to fill? Is it a mean?
To plot the mean of the fill values you should aggregate your values before plotting. scale_fill_gradient(...) does not work on the data level, but on the visualization level.
Let's start with a toy Dataframe to build a reproducible example to work with.
mydata = expand.grid(bonus = seq(0, 1, 0.25), malus = seq(0, 1, 0.25), type = c("Risquophile","Moyen","Risquophobe"))
mydata = do.call("rbind",replicate(40, mydata, simplify = FALSE))
mydata$value= runif(nrow(mydata), min=0, max=50)
mydata$coop = "cooperative"
Now, before plotting, I suggest you calculate the mean over your groups of 40 values. For this operation I like to use the dplyr package:
library(dplyr)
# group by the columns themselves; quoted names would be treated as constants
data = mydata %>% group_by(bonus, malus, type, coop) %>% summarise(vr = mean(value))
Now you have your dataset ready to plot with ggplot2:
library(ggplot2)
g = ggplot(data, aes(x=bonus,y=malus,fill=vr))
g = g + geom_tile()
g = g + facet_grid(type~coop)
and this is the result:
where you are sure that the fill value is exactly the mean of your values.
Is this what you expected?
It uses stat_identity as can be seen in the documentation. You can test that easily:
DF <- data.frame(x=c(rep(1:2, 2), 1),
y=c(rep(1:2, each=2), 1),
fill=1:5)
# x y fill
#1 1 1 1
#2 2 1 2
#3 1 2 3
#4 2 2 4
#5 1 1 5
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
As you see the fill value for the 1/1 combination is 5. If you use factors it's even more clear what happens:
p <- ggplot(data=DF) +
geom_tile(aes(x=x, y=y, fill=factor(fill)))
print(p)
If you want to depict means, I'd suggest calculating them outside of ggplot2:
library(plyr)
DF1 <- ddply(DF, .(x, y), summarize, fill=mean(fill))
p <- ggplot(data=DF1) +
geom_tile(aes(x=x, y=y, fill=fill))
print(p)
That's easier than trying to find out if stat_summary can play with geom_tile somehow (I doubt it).
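If plyr is not available, the same per-cell mean can be computed in base R. This is a sketch of an alternative using aggregate(), not the method used in the answer above.

```r
DF <- data.frame(x = c(rep(1:2, 2), 1),
                 y = c(rep(1:2, each = 2), 1),
                 fill = 1:5)

# One row per (x, y) cell, with fill replaced by the mean of the cell
DF1 <- aggregate(fill ~ x + y, data = DF, FUN = mean)
DF1
```

The 1/1 cell now holds 3, the mean of its two values 1 and 5, instead of whichever value happens to be drawn last.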
scale_fill_gradient() and geom_tile() apply no statistics (or, more precisely, they apply stat_identity()) to your fill value rf/300. It just computes how many colors you use and then generates the colors with the munsell function 'mnsl()'. If you want to apply a transformation only to the colors displayed, you should use:
scale_fill_gradient(trans = "log")
or
scale_fill_gradient(trans = "sqrt")
Changing the colors among the tiles might not be the best idea, since the plots have to be comparable and you compare the values by their colours. Hope this helps.

Drawing several "numeric" lines using ggplot

I have a dataset which contains 200 different groups, each of which can take some value between 0 and 200. I would like to draw a line for every group, so a total of 200 lines, and have the legend be "numeric". I know how to do this with a factor, but can't get it to work with a numeric variable. Not the best example:
library(tidyverse)
df <- data.frame(Day = 1:100)
df <- df %>% mutate(A = Day + runif(100,1,400) + rnorm(100,3,400) + 2500,
B = Day + rnorm(100,2,900) + -5000 ,
C = Day + runif(100,1,50) + rnorm(100,1,1000) -500,
D = (A+B+C)/5 - rnorm(100, 3,450) - 2500)
df <- gather(df, "Key", "Value", -Day)
df$Key1 <- apply(df, 1, function(x) which(LETTERS == x[2]))
ggplot(df, aes(Day, Value, col = Key)) + geom_line() # I would like to keep 4 lines, but have the legend of the following plots
ggplot(df, aes(Day, Value, col = Key1)) + geom_line() # Not correct lines
ggplot(df, aes(Day, Value)) + geom_line(aes(col = Key1)) # Not correct lines
Likely a duplicate, but I can't find the answer and guess there is something small that is incorrect.
Is this what you mean? I'm not sure since you say you want 200 lines, but in your code you say you want 4 lines.
ggplot(df, aes(Day, Value, group = Key, col=Key1)) + geom_line()
Using group gives you the different lines, using col gives you the different colours.
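The numeric colour column can also be built without apply(). A base R sketch with hypothetical long-format data (4 groups, 3 days each, made-up values), using match() to turn each letter into its position in LETTERS:

```r
# Hypothetical long-format data: 4 groups, 3 days each
df <- data.frame(
  Day = rep(1:3, times = 4),
  Key = rep(c("A", "B", "C", "D"), each = 3),
  Value = c(1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6)
)

# Numeric version of the group label, used only for the colour scale
df$Key1 <- match(df$Key, LETTERS)

# Each Key corresponds to exactly one numeric Key1
table(df$Key, df$Key1)
```

With this column in place, the call ggplot(df, aes(Day, Value, group = Key, col = Key1)) + geom_line() from the answer keeps one line per Key while the legend becomes a continuous colour bar.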

ggplot Highlight a point where x axis equals a value

If you run the code below you will get a line graph. How can I change the color of the point at x = 2 to red and increase its size?
In this case, the point at x = 2 (value .6) would be highlighted in red and made bigger.
Here is my code:
library("ggplot2")
data<-data.frame(time= c(1,2,3), value = c(.4,.6,.7))
ggplot(data, aes( x = time, y=value) ) + geom_line() + geom_point(shape = 7,size = 1)
Thank you!
If your dataset is small you could do this:
library("ggplot2")
data<-data.frame(time= c(1,2,3), value = c(.4,.6,.7),point_size=c(1,10,1),cols=c('black','red','black'))
ggplot(data, aes( x = time, y=value) ) + geom_line() + geom_point(shape = 7,size = data$point_size, colour=data$cols)
Makes:
Also, I would not advise calling your data frame data.
In addition to @Harpal's solution, you can add two more columns to your data frame in which point size and colour are specified according to particular conditions:
df <- data.frame(time= c(1,2,3), value = c(.4,.6,.7))
# specify condition and pointsize here
df$pointsize <- ifelse(df$value==0.6, 5, 1)
# specify condition and pointcolour here
df$pointcol <- ifelse(df$value==0.6, "red", "black")
ggplot(df, aes(x=time, y=value)) + geom_line() + geom_point(shape=7, size=df$pointsize, colour=df$pointcol)
You may change ifelse(df$value==0.6, 5, 1) to meet any criteria you like, or use a more complex approach to specify more conditions to be met:
df <- data.frame(time= c(1,2,3), value = c(.4,.6,.7))
df$pointsize[which(df$value<0.6)] <- 1
df$pointsize[which(df$value>0.6)] <- 8
df$pointsize[which(df$value==0.6)] <- 5
df$pointcol[which(df$value<0.6)] <- "black"
df$pointcol[which(df$value>0.6)] <- "green"
df$pointcol[which(df$value==0.6)] <- "red"
ggplot(df, aes(x=time, y=value)) + geom_line() + geom_point(shape=7, size=df$pointsize, colour=df$pointcol)
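An alternative sketch, assuming ggplot2 is installed: instead of passing df$pointsize and df$pointcol as fixed vectors, map a logical flag inside aes() and let manual scales translate it into colour and size. The column name highlight and the object name p are illustrative, and the condition is placed on time == 2 as in the question.

```r
library(ggplot2)

df <- data.frame(time = c(1, 2, 3), value = c(.4, .6, .7))

# Flag the point to highlight (hypothetical column name)
df$highlight <- df$time == 2

p <- ggplot(df, aes(x = time, y = value)) +
  geom_line() +
  geom_point(aes(colour = highlight, size = highlight), shape = 7) +
  # named values tie each flag to its colour and size; legends are hidden
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"), guide = "none") +
  scale_size_manual(values = c("FALSE" = 1, "TRUE" = 5), guide = "none")
p
```

Keeping the condition in a column means the plot code stays the same if the rule for highlighting changes.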
