Extract dplyr tbl and create vector where column 1 = column 2

To map colours to groups, I am using scale_colour_manual(values = c("G1" = "grey", ...)) from the {ggplot2} package.
I have a main tibble of data that contains the groups, and I would like to highlight a specific group. Here G3 is highlighted, but this will not necessarily be the case for every plot I want to generate.
Here is some sample data:
groups <- as_tibble(c("G1", "G2", "G3"))
colours <- as_tibble(c("grey", "grey", "purple"))
I then pull the vectors, but I don't know how to get the result mentioned above (values = c("G1" = "grey", ...))
groups_vec <- groups %>% pull()
colours_vec <- colours %>% pull()
myvalues <- c(groups_vec = colours_vec)
# this code returns the following
groups_vec1 groups_vec2 groups_vec3
"grey" "grey" "purple"
whereas I expect the following result:
c("G1" = "grey", "G2" = "grey", "G3" = "purple")
G1 G2 G3
"grey" "grey" "purple"
I can't find the right words to describe my problem; I hope the example is clear enough.

The following should help.
library(dplyr)
library(ggplot2)
# simulate some data
some_data <- tibble(
GROUPS = c("Group1", "Group2", "Group3","Group4"),
VALUES = c(2,5,9,3),
COL = c("grey","grey","lightblue","grey")
)
# plot
ggplot(data = some_data) +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL)) +
scale_fill_manual(values = c("grey", "green"))
Try to understand how the variable "COL" and the "values" in scale_fill_manual() work together.
EDITED ANSWER
You did not provide a reproducible example, so this answer might not be exactly what you are after. But I hope the following helps you understand how {ggplot2} works with aesthetics and how you can control the way these aesthetics are presented.
I build the example on a simple geom_col(). You might have another use case, but the principle should transfer to all geoms. Just note that for some geoms you need to use color instead of fill.
P.S. I also recommend using a single data frame for your plot. There is no need to keep every variable in a separate tibble. Just add a column with the color flag you want to use. This may simplify your code and reduce the number of objects you use.
Let's start with a simple plot of what we have.
library(dplyr)
library(ggplot2)
# simulate some data
some_data <- tibble(
GROUPS = c("Group1", "Group2", "Group3","Group4"),
VALUES = c(2,5,9,3),
COL = c("grey","grey","purple","grey")
)
# understand how ggplot uses "categories" based on COL variable
# without specification ggplot uses the default colors for 2 different "categories", i.e. grey and purple
p <- ggplot(data = some_data) +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL))
p
Please note that {ggplot2} assigns its default colors in a specific sequence (as the categories come). We will come back to this later, as this is the sequence you need to control. To make this more obvious, recode your colors, e.g. instead of purple use highlight (or yes, Y). The strings are only a "category" for {ggplot2}; their actual value is of secondary importance (while we humans assign meaning to something like "green" or "purple").
This yields:
Next, look into assigning colors. Key takeaway: the sequence of your color specification matters!
p1 <- p +
scale_fill_manual(values = c("grey", "purple")) +
labs(subtitle = "order of categories [grey, purple]")
p2 <- p +
scale_fill_manual(values = c("purple", "grey")) +
labs(subtitle = "order of categories [purple, grey]")
library(patchwork) # for demo purposes we plot both graphs side by side
p1 + p2
You did not provide a reproducible example, so let's emulate your function f() by assigning a new color based on a value. (Your function might be more complex, but I understand it will give you a flag for the color.)
# some operation that can happen in a function to change the color coding
# e.g. we pick the value 5
some_data2 <- some_data %>%
mutate(COL = case_when(
between(VALUES, 4,6) ~ "purple"
,TRUE ~ "grey"
))
# note - now we have a vector of values for each "row" (aka group) in your dataframe
color_vec <- c("grey", "yellow", "grey", "purple")
some_data2 <- some_data2 %>% mutate(COL2 = color_vec)
Have a look at the tibble:
some_data2
# A tibble: 4 × 4
GROUPS VALUES COL COL2
<chr> <dbl> <chr> <chr>
1 Group1 2 grey grey
2 Group2 5 purple yellow
3 Group3 9 grey grey
4 Group4 3 grey purple
Let's plot this tibble using aesthetic fill for column COL and set our color-vector as the desired sequence in scale_fill_manual():
some_data2 %>%
ggplot() +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL)) +
scale_fill_manual(values = color_vec)
OOOOOPS!?! What happened here?
Again, try to understand the color assignment and the number of colors (aka "categories") in your plot.
Let's now use our "new" color column, i.e. COL2.
some_data2 %>%
ggplot() +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL2)) +
scale_fill_manual(values = color_vec)
Try to work out why this does not work either.
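One way to reason about it, sketched in base R: this assumes that ggplot2 sorts the distinct values of a character column alphabetically and hands out unnamed values position by position. The names col2, categories and assigned are illustrative only.

```r
# COL2 and color_vec as defined above
col2      <- c("grey", "yellow", "grey", "purple")
color_vec <- c("grey", "yellow", "grey", "purple")

# Distinct categories in alphabetical order (the assumed scale order)
categories <- sort(unique(col2))

# Unnamed values are consumed in that order; only the first three are used
assigned <- setNames(color_vec[seq_along(categories)], categories)
print(assigned)
```

Under that assumption the category purple is drawn in yellow and the category yellow in grey, which is why naming the values (or setting breaks) is the more robust option.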
What you probably want is a sequence of colors that depends on the values you want to highlight. Your example does not explicitly say how you construct this. The above gives you a pointer on how to handle a simple highlight vs. no-highlight case (2 categories), but you can obviously define more colors, as in the following example. Here we make use of the fact that our column COL2 already holds the target colors produced by your function. Please note that you could define the categories (aka breaks) differently from the color values (think factor label and level).
# note - we define values for the different "categories" that we expect
# in this example now 3 (order matters - c.f. above!)
categories <- unique(some_data2$COL2) # one entry per category: "grey", "yellow", "purple"
color_vec2 <- unique(some_data2$COL2) # if you use flags that are different from colors you can define them here
some_data2 %>%
ggplot() +
geom_col(aes(x = GROUPS, y = VALUES, fill = COL2)) +
scale_fill_manual(
# breaks sets the sequence of your categories
breaks = categories
# values are the colors you want to use
, values = color_vec2)
For geom_col(), you can use the fill aesthetic to color the bars by what I call "categories" here.
With scale_fill_manual(), you control the color sequence of these categories. You may want to build this vector so that the order of the colors matches the order of the categories.
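For the named vector the question asks about, a minimal sketch using base R's setNames() follows. The vectors repeat the sample data from the question; my_values is an illustrative name. Writing c(groups_vec = colours_vec) does not work because the left-hand side of = is taken literally as the name rather than evaluated.

```r
# Vectors as pulled from the two tibbles in the question
groups_vec  <- c("G1", "G2", "G3")
colours_vec <- c("grey", "grey", "purple")

# Use the group labels as names for the colour values
my_values <- setNames(colours_vec, groups_vec)
print(my_values)
```

The result can then be passed as scale_colour_manual(values = my_values), so each group is matched to its colour by name rather than by position.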

Related

Where does ggplot set the order of the color scheme?

I have a data set that I'm showing in a series of violin plots with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categorical variable was plotted alphabetically (I rotated the plot, so it appears alphabetically from bottom to top). I thought it would look better if I sorted them using the numeric variable.
When I do this, the color scheme doesn't turn out as I wanted it to. It's like R assigned the colors to the violins before it sorted them; after the sorting, they kept their original colors - which is the opposite of what I wanted. I wanted R to sort them first and then apply the color scheme.
I'm using the viridis color scheme here, but I've run into the same thing when I used RColorBrewer.
Here is my code:
# Start plotting
g <- ggplot(NULL)
# Violin plot
g <- g + geom_violin(data = df, aes(x = reorder(catval, -numval,
na.rm = TRUE), y = numval, fill = catval), trim = TRUE,
scale = "width", adjust = 0.5)
(snip)
# Specify colors
g <- g + scale_colour_viridis_d()
# Remove legend
g <- g + theme(legend.position = "none")
# Flip for readability
g <- g + coord_flip()
# Produce plot
g
Here is the resulting plot.
If I leave out the reorder() argument when I call geom_violin(), the color order is what I would like, but then my categorical variable is sorted alphabetically and not by the numeric variable.
Is there a way to get what I'm after?

I think this is a reproducible example of what you're seeing. In the diamonds dataset, the mean price of "Good" diamonds is actually higher than the mean for "Very Good" diamonds.
library(dplyr)
diamonds %>%
group_by(cut) %>%
summarize(mean_price = mean(price))
# A tibble: 5 x 2
cut mean_price
<ord> <dbl>
1 Fair 4359.
2 Good 3929.
3 Very Good 3982.
4 Premium 4584.
5 Ideal 3458.
By default, reorder uses the mean of the sorting variable, so Good is plotted above Very Good. But the fill is still based on the un-reordered variable cut, which is a factor in order of quality.
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price, fill = cut)) +
geom_violin() +
coord_flip()
If you want the color to follow the ordering, then you could reorder upstream of ggplot2, or reorder in both aesthetics:
ggplot(diamonds, aes(x = reorder(cut, -price),
y = price,
fill = reorder(cut, -price))) +
geom_violin() +
coord_flip()
Or
diamonds %>%
mutate(cut = reorder(cut, -price)) %>%
ggplot(aes(x = cut, y = price, fill = cut)) +
geom_violin() +
coord_flip()
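To check what reorder() does to the factor levels before plotting, a small base R sketch on made-up data may help. The data frame and its values are hypothetical; only the column names catval and numval are taken from the question.

```r
# Toy data: three categories with clearly different means
df <- data.frame(
  catval = factor(c("a", "a", "b", "b", "c", "c")),
  numval = c(1, 2, 10, 12, 5, 6)
)

# Reorder the levels by the mean of -numval (largest mean first)
df$catval <- reorder(df$catval, -df$numval)

lvls <- levels(df$catval)
print(lvls)
```

Because the reordered factor is stored in the data, both x and fill see the same level order when they are mapped to catval.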

ggRadar highlight top values in radar

Hi everyone, I am making a radar plot and I want to highlight the two highest values in the factors or levels. Highlighting in this case means making the text of the top three values bold.
require(ggplot2)
require(ggiraph)
require(plyr)
require(reshape2)
require(moonBook)
require(sjmisc)
ggRadar(iris,aes(x=c(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width)))
an example can be like this
thank you

Here is a step-by-step example of how to highlight specific categories in a radar plot. I don't really see the point of all these extra dependencies (ggRadar etc.), as it's pretty straightforward to draw a radar plot in ggplot2 directly using polar coordinates.
First, let's generate some sample data. Following the OP's comments and the example based on the iris dataset, we select the maximal value for every variable (from Sepal.Length, Sepal.Width, Petal.Length, Petal.Width); we then store the result in a long tibble for plotting.
library(purrr)
library(dplyr)
library(tidyr)
df <- iris %>% select(-Species) %>% map_df(max) %>% pivot_longer(everything())
df
# # A tibble: 4 x 2
# name value
# <chr> <dbl>
#1 Sepal.Length 7.9
#2 Sepal.Width 4.4
#3 Petal.Length 6.9
#4 Petal.Width 2.5
Next, we make use of a custom coord_radar function (thanks to this post), that is centred around coord_polar and ensures that polygon lines in a polar plot are straight lines rather than curved arcs.
coord_radar <- function (theta = "x", start = - pi / 2, direction = 1) {
theta <- match.arg(theta, c("x", "y"))
r <- if (theta == "x") "y" else "x"
ggproto(
"CordRadar", CoordPolar, theta = theta, r = r, start = start,
direction = sign(direction),
is_linear = function(coord) TRUE)
}
We now create a new column df$face that is "bold" for the top 3 variables (ranked by decreasing value) and "plain" otherwise. We also need to make sure that factor levels of our categories are sorted by row number (otherwise name and face won't necessarily match later).
df <- df %>%
mutate(
rnk = rank(-value),
face = if_else(rnk < 4, "bold", "plain"),
name = factor(name, levels = unique(name)))
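The ranking step can be checked on its own with a short base R sketch. It uses the four maxima shown above; value and face are plain vectors here rather than tibble columns, which is a simplification for illustration.

```r
# Maxima of the four iris measurements, as in the tibble above
value <- c(Sepal.Length = 7.9, Sepal.Width = 4.4,
           Petal.Length = 6.9, Petal.Width = 2.5)

# rank(-value) gives rank 1 to the largest value
rnk <- rank(-value)

# Top three are bold, the rest plain
face <- ifelse(rnk < 4, "bold", "plain")
print(face)
```

Only Petal.Width, the smallest of the four, ends up as "plain".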
We can now draw the plot
library(ggplot2)
ggplot(df, aes(name, value, group = 1)) +
geom_polygon(fill = "red", colour = "red", alpha = 0.4) +
geom_point(colour = "red") +
coord_radar() +
ylim(0, 10) +
theme(axis.text.x = element_text(face = df$face))
Note that this gives a warning, which I choose to ignore here, as we explicitly make use of the vectorised element_text option.
Warning message:
Vectorized input to element_text() is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.

My suggestion would be to identify the highest values you wish to highlight and put them in a data frame. Then use geom_richtext() from the {ggtext} package to highlight them.

How to add a legend on a multiple line graph in R?

I am trying to plot two different datasets on the same plot. I am using this code to add the lines and to actually plot everything
ggplot()+
geom_point(data=Acc, aes(x=Year, y=Accumulo), color="lightskyblue")+
geom_line(data=Acc, aes(x=Year, y=RM3), color="gold1")+
geom_line(data=Acc, aes(x=Year, y=RM5), color="springgreen3")+
geom_line(data=Acc, aes(x=Year, y=RM50), color="blue")+
geom_line(data=Vulcani, aes(x=Year, y=Accumulo.V), color="red")+
theme_bw()+
scale_x_continuous(expand=expand_scale(0)) + scale_y_continuous(limits=c(50,350),expand=expand_scale(0))
but I can't find a way to add a legend with custom labels for the different series. I found a way to add a legend for a single dataset, but not a way to add one on the side of this plot.

You are better off first creating a single dataset tailored to your plot, in long format, so that you can give a single geom_line() instruction and color the lines with aes(color = ...) inside the call to geom_line(). Here's an example with the midwest dataset (treat the two subsets as distinct datasets for the sake of the example).
library(ggplot2)
library(dplyr)
library(tidyr)
long_midwest <- midwest %>%
select(popwhite, popasian, PID, poptotal) %>%
gather(key = "variable", value = "value", -PID, -poptotal) # convert to long format
long_midwest2 <- midwest %>%
select(poptotal, perchsd, PID) %>%
gather(key = "variable", value = "value", -PID, -poptotal)
plot_data <- bind_rows(long_midwest, long_midwest2) %>% # bind datasets vertically
mutate(line_type = ifelse(variable == 'perchsd', 'A', 'B')) # creates a line_type variable
ggplot(data = plot_data, aes(x=poptotal, y = value))+
geom_line(aes(color = variable, linetype = line_type)) +
scale_color_manual(
values = c('lightskyblue', 'gold1', 'blue'),
name = "My color legend"
) +
scale_linetype_manual(
values = c(3, 1), # play with the numbers to get the correct styling
name = "My linetype legend"
)
I added a line_type variable to show the most generic case, where you want a specific mapping between the column values and the line type. If it is the same as, say, variable, just use aes(color = variable, linetype = variable). You can then decide which linetype you want (see here for more details).
For customising the labels, just change the content of variable within the dataset with the desired values.
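Applied to data shaped like the question's, the relabelling might look like the sketch below. The data frame acc, its values, and the label texts are made up for illustration; only the column names Year, RM3 and RM5 come from the question.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for the asker's Acc data (values are invented)
acc <- data.frame(
  Year = 2000:2004,
  RM3  = c(100, 120, 110, 130, 125),
  RM5  = c(105, 115, 112, 124, 126)
)

# Long format: one row per Year and series
acc_long <- pivot_longer(acc, cols = c(RM3, RM5),
                         names_to = "series", values_to = "value")

# Change the content of the grouping column to the desired legend labels
acc_long$series <- factor(acc_long$series,
                          levels = c("RM3", "RM5"),
                          labels = c("Series RM3", "Series RM5"))

p <- ggplot(acc_long, aes(x = Year, y = value, colour = series)) +
  geom_line() +
  scale_colour_manual(
    values = c("Series RM3" = "gold1", "Series RM5" = "springgreen3"),
    name = "My color legend"
  )
```

Printing p draws both lines with a legend whose entries are the new factor labels.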

R ggplot conditional color without exact match

I am trying to color the points in a line depending on whether they are above or below the yearly mean in ggplot2, and I cannot find any help for cases where colors are not matched exactly to values.
I'm using the following code:
ggplot(aes(x = M, y = O)) + geom_line()
I want it to be one color if O is above mean(O) and another if it is below.
I tried to follow the advice but I just get a split graph when I use:
mutate(color=ifelse(O>mean(O),"green","red")) %>% ggplot(aes(x=M,y=O,color=color))+geom_line()+scale_color_manual(values=c("red", "darkgreen"))
I get the following graph:

This works, but makes a break in the line.
library(tidyverse)
df <- data.frame(
M = 1:5,
O = c(1, 2, 3, 4, 5)
)
df <- mutate(df, above = O > mean(O))
ggplot(df, aes(x=M,y=O, color=above))+geom_line()

Build a variable color to mark your color type.
For points use geom_point(), not geom_line().
Edit: the color aesthetic splits the data into 2 groups. Use group = 1 (one value for all rows) to force a single group.
Advice: avoid naming a variable O; it is easily confused with 0 (zero).
library(tidyverse)
df <- data.frame(M=rnorm(10), O=rnorm(10)) %>%
mutate(color = O > mean(O)) # TRUE where O is above its mean
#ggplot(df, aes(x=M, y=O, color = color)) + geom_point()
ggplot(df, aes(x=M, y=O, color = color, group=1)) + geom_line() + scale_color_manual(values=c("red", "green"))
# > df
# M O color
# 1 0.05829207 -0.03490925 FALSE
# 2 -0.09255111 -0.52513201 FALSE
# 3 0.44859944 0.19371037 FALSE
# 4 -0.54216222 0.40783749 TRUE
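A deterministic variant of the same idea, sketched with fixed data so the result is reproducible. It names the colour values by the logical levels instead of relying on their order; the column name above and the numbers are illustrative assumptions.

```r
library(ggplot2)

# Fixed toy data; the mean of O is 5
df <- data.frame(M = 1:6, O = c(3, 8, 5, 1, 9, 4))
df$above <- df$O > mean(df$O)

# group = 1 keeps one continuous line; colours are matched by name
p <- ggplot(df, aes(x = M, y = O, colour = above, group = 1)) +
  geom_line() +
  scale_colour_manual(values = c("FALSE" = "red", "TRUE" = "green"))
```

Naming the values makes the mapping independent of the order in which the colours are listed.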

Draw heatmap tiles for all combination of x-y

I asked a question about the heatmap which was solved here: custom colored heatmap of categorical variables. I defined my scale_fill_manual for all combinations as suggested in the accepted answer.
Based on this question, I would like to know how to tell ggplot2 to plot a heatmap with all combination of variables and not just the ones that are available in the dataframe (given that they are already in the scale_fill_manual but are not showing in the final plot).
How can I do this?
The current plotting code:
df <- data.frame(X = LETTERS[1:3],
Likelihood = c("Almost Certain","Likely","Possible"),
Impact = c("Catastrophic", "Major","Moderate"),
stringsAsFactors = FALSE)
df$color <- paste0(df$Likelihood,"-",df$Impact)
ggplot(df, aes(Impact, Likelihood)) + geom_tile(aes(fill = color),colour = "white") + geom_text(aes(label=X)) +
scale_fill_manual(values = c("Almost Certain-Catastrophic" = "red","Likely-Major" = "yellow","Possible-Moderate" = "blue"))
scale_fill_manual contains all combination of Impact, Likelihood with their respective colors.

Similar to @aosmith, I tried expand.grid to get a finite set of combinations, but tidyr::complete() works nicely as well. Add the colors and letters, and fill using a set color range.
library(dplyr) # provides %>%
library(ggplot2)
df <- data.frame(Likelihood = c("Almost Certain","Likely","Possible"),
Impact = c("Catastrophic", "Major","Moderate"),
stringsAsFactors = FALSE)
df2 <- df %>% tidyr::complete(Likelihood,Impact) # alt expand.grid(df)
df2$X <- LETTERS[1:9] # Add letters here
df2$color <- paste0(df2$Likelihood,"-",df2$Impact) # Add colors
ggplot(df2, aes(Impact, Likelihood)) + geom_tile(aes(fill = color),colour = "white") + geom_text(aes(label=X)) +
scale_fill_manual(values = RColorBrewer::brewer.pal(9,"Pastel1"))
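The expand.grid alternative mentioned above could be sketched in base R as follows. The palette from rainbow() is only a placeholder assumption, and fill_values is an illustrative name; in practice you would supply the colors defined for each combination.

```r
likelihood <- c("Almost Certain", "Likely", "Possible")
impact     <- c("Catastrophic", "Major", "Moderate")

# All 3 x 3 combinations, not just the ones present in the data
grid <- expand.grid(Likelihood = likelihood, Impact = impact,
                    stringsAsFactors = FALSE)
grid$color <- paste0(grid$Likelihood, "-", grid$Impact)

# Placeholder palette, named by combination so colours are matched by name
fill_values <- setNames(rainbow(nrow(grid)), grid$color)
```

Passing fill_values to scale_fill_manual(values = ...) ties each tile to its combination by name instead of by position.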