ggRadar highlight top values in radar - r

Hi everyone, I am making a radar plot and I want to highlight the highest values among the factors or levels. Highlighting in this case means making the text of the top three values bold.
require(ggplot2)
require(ggiraph)
require(plyr)
require(reshape2)
require(moonBook)
require(sjmisc)
ggRadar(iris,aes(x=c(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width)))
An example could look like this.
Thank you.

Here is a step-by-step example of how to highlight specific categories in a radar plot. I don't really see the point of all these extra dependencies (ggRadar etc.), as it's pretty straightforward to draw a radar plot in ggplot2 directly using polar coordinates.
First, let's generate some sample data. According to the OP's comments and the example based on the iris dataset, we select the maximum value of every variable (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and store the result in a long tibble for plotting.
library(purrr)
library(dplyr)
library(tidyr)
df <- iris %>% select(-Species) %>% map_df(max) %>% pivot_longer(everything())
df
# # A tibble: 4 x 2
# name value
# <chr> <dbl>
#1 Sepal.Length 7.9
#2 Sepal.Width 4.4
#3 Petal.Length 6.9
#4 Petal.Width 2.5
Next, we make use of a custom coord_radar function (thanks to this post) that builds on coord_polar and ensures that polygon edges in a polar plot are drawn as straight lines rather than curved arcs.
coord_radar <- function (theta = "x", start = - pi / 2, direction = 1) {
theta <- match.arg(theta, c("x", "y"))
r <- if (theta == "x") "y" else "x"
ggproto(
"CordRadar", CoordPolar, theta = theta, r = r, start = start,
direction = sign(direction),
is_linear = function(coord) TRUE)
}
We now create a new column df$face that is "bold" for the top 3 variables (ranked by decreasing value) and "plain" otherwise. We also need to make sure that factor levels of our categories are sorted by row number (otherwise name and face won't necessarily match later).
df <- df %>%
mutate(
rnk = rank(-value),
face = if_else(rnk < 4, "bold", "plain"),
name = factor(name, levels = unique(name)))
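One caveat with rank(): by default it averages tied ranks, so ties around the cut-off can change how many labels end up bold. Here is a minimal sketch of a variant, assuming dplyr's min_rank() is acceptable here. The values are copied from the tibble printed earlier so the snippet runs on its own; the cut-off of 3 is the same one used above.

```r
library(dplyr)

# Maxima of the four iris measurements, copied from the printed tibble
df <- tibble(
  name  = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  value = c(7.9, 4.4, 6.9, 2.5)
)

df <- df %>%
  mutate(
    # min_rank() gives tied values the same (smallest) rank,
    # so every value tied for third place is still marked bold
    rnk  = min_rank(desc(value)),
    face = if_else(rnk <= 3, "bold", "plain"),
    # keep the categories in row order so name and face stay aligned
    name = factor(name, levels = unique(name))
  )
```

With this data there are no ties, so the result is the same as with rank(-value).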
We can now draw the plot
library(ggplot2)
ggplot(df, aes(name, value, group = 1)) +
geom_polygon(fill = "red", colour = "red", alpha = 0.4) +
geom_point(colour = "red") +
coord_radar() +
ylim(0, 10) +
theme(axis.text.x = element_text(face = df$face))
Note that this gives a warning, which I choose to ignore here, as we explicitly make use of the vectorised element_text option.
Warning message:
Vectorized input to element_text() is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.

My suggestion would be to identify the highest values you wish to highlight and put them in a data frame, then use geom_richtext() (from the ggtext package) to highlight them.

Related

How to create a boxplot or table that have labels of outliers and count the outliers?

I'm not able to share my data, so sorry for that. Most of my data are either dummy or ordinal or unordered discrete variables. Only age is numeric.
I used this code to see which values are outliers
boxplot(df$var1, plot = TRUE)$out
And this code to count how many outliers there are:
length(boxplot(dataDK$sclmeet)$out)
I replaced the outliers with NAs using the sapply function.
I now want to create either a boxplot or a table that shows how many outliers there are and what their values are. How is this possible?
If you can help with the boxplot method, I can make multiple boxplots and then combine them into one figure using par(mfrow = c(,))
The boxplot could look like this, where 1 (blue) is the value of the outlier and 4 (blue) is the number of times that value occurs:
Edit:
I forgot to mention that I know this method:
out <- boxplot.stats(df$var1)$out
boxplot(df$var1,
ylab = "var1",
main = "Boxplot for var1"
)
mtext(paste("Outliers: ", paste(out, collapse = ", ")))
This will give a plot similar to this. However, it is not a good method when there are many different outliers
(taken from boxplot outlier labels):
These kinds of plots are easier with ggplot, not base R. Have you considered adding a table of your outliers next to your plot? There may be cases where you have different kinds of outliers (and thus your text would be cumbersome). However, if you already know how many outliers you have, you can use annotate to add simple text.
library(tidyverse)
library(cowplot) # to plot stuff side by side
library(gridExtra)
data(iris)
boxplot(iris$Sepal.Width, plot = TRUE)$out
length(boxplot(iris$Sepal.Width)$out)
# https://stackoverflow.com/questions/54993511/how-to-replace-outliers-with-na-in-r-from-vector-created-with-boxplotout
iris$is_outlier <- ifelse(iris$Sepal.Width %in% boxplot.stats(iris$Sepal.Width)$out, 1, 0)
iris <- iris %>%
select(Sepal.Width, is_outlier) %>%
mutate(Sepal.Width_NA = ifelse(is_outlier == 1, NA, Sepal.Width))
t <- iris %>%
filter(is_outlier == 1) %>%
select(Sepal.Width) %>% table() %>% as.data.frame() %>% tableGrob(rows = NULL)
p <- ggplot(iris, aes(y = Sepal.Width_NA)) +
geom_boxplot()
# plot with table side-by-side
plot_grid(p, t, rel_widths = c(2, 1))
# close to your original desired plot
ggplot(iris, aes(y = Sepal.Width_NA)) +
geom_boxplot() +
annotate("label", color = "blue",
size = 4,
x = 0, y = 2,
label = "1 (4)")
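If a plain table is enough, the counting can also be done for several variables at once in base R. The following is only a sketch under the same boxplot.stats() outlier rule used above; the object names outlier_list and outlier_table are made up for illustration, and datasets::iris is referenced explicitly because the code above overwrites iris.

```r
# Keep only the numeric columns of the untouched iris data
num_cols <- datasets::iris[sapply(datasets::iris, is.numeric)]

# Outlier values per variable, using the same rule as boxplot()
outlier_list <- lapply(num_cols, function(v) boxplot.stats(v)$out)

# One row per variable: how many outliers, and which values they are
outlier_table <- data.frame(
  variable   = names(outlier_list),
  n_outliers = lengths(outlier_list),
  values     = sapply(outlier_list, function(o) paste(sort(o), collapse = ", ")),
  row.names  = NULL
)

outlier_table
```

A table like this can be passed to tableGrob() in the same way as t above if it should sit next to a plot.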

How can I manually add labels to multiple ggplot2 mappings created through a for-loop?

I have been working on plotting several lines according to different probability levels and am stuck adding labels to each line to represent the probability level.
Since each curve plotted has varying x and y coordinates, I cannot simply have a large data-frame on which to perform usual ggplot2 functions.
The end goal is to have each line with a label next to it according to the p-level.
What I have tried:
To access the data comfortably, I have created a list df with for example 5 elements, each element containing a nx2 data frame with column 1 the x-coordinates and column 2 the y-coordinates. To plot each curve, I create a for loop where at each iteration (i in 1:5) I extract the x and y coordinates from the list and add the p-level line to the plot by:
plot = plot +
geom_line(data=df[[i]],aes(x=x.coor, y=y.coor),color = vector_of_colors[i])
where vector_of_colors contains varying colors.
I have looked at using ggrepel and its geom_label_repel() or geom_text_repel() functions, but being unfamiliar with ggplot2 I could not get it to work. Below is a simplification of my code so that it may be reproducible. I could not include an image of the actual curves I am trying to add labels to since I do not have 10 reputation.
# CREATION OF DATA
library(ggplot2)
# geom_line() needs data frames, not matrices
plevel0.5 = data.frame(x = c(0,1), y = c(0,1))
plevel0.8 = data.frame(x = c(0.5,3), y = c(0.5,1.5))
data = list(data1 = plevel0.5, data2 = plevel0.8)
# CREATION OF PLOT
plot = ggplot()
for (i in 1:2) {
plot = plot + geom_line(data=data[[i]],mapping=aes(x=x,y=y))
}
Thank you in advance and let me know what needs to be clarified.
EDIT :
I have now attempted the following :
Using bind_rows(), I have created a single dataframe with columns x.coor and y.coor as well as a column called "groups" detailing the p-level of each coordinate.
This is what I have tried:
plot = ggplot(data) +
geom_line(aes(coors.x,coors.y,group=groups,color=groups)) +
geom_text_repel(aes(label=groups))
But it gives me the following error:
geom_text_repel requires the following missing aesthetics: x and y
I do not know how to specify x and y in the correct way since I thought it did this automatically. Any tips?
Your approach is probably a bit too complicated. As far as I understand it, you could simply work with one dataset and use the group aesthetic to get the same result you are trying to achieve with your for loop and multiple geom_line calls. To this end I use dplyr::bind_rows to bind your datasets together. Whether ggrepel is needed depends on your real dataset. In my code below I simply use geom_text to add a label at the rightmost point of each line:
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
library(dplyr)
library(ggplot2)
data <- list(data1 = plevel0.5, data2 = plevel0.8) |>
bind_rows(.id = "id")
ggplot(data, aes(x = x, y = y, group = id)) +
geom_line(aes(color = id)) +
geom_text(data = ~ group_by(.x, id) |> filter(x %in% max(x)), aes(label = id), vjust = -.5, hjust = .5)
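Regarding the "missing aesthetics: x and y" error in the edit: aesthetics set inside geom_line(aes(...)) belong to that layer only, so the text layer never sees them. Here is a sketch of the same idea with x and y mapped once in ggplot() so that every layer inherits them. The names curves and label_points are made up, and the p-levels are used directly as group labels, which is an assumption about what the labels should show.

```r
library(dplyr)
library(ggplot2)

# One long data frame; the list names become the "groups" column
curves <- bind_rows(
  list(
    "0.5" = data.frame(x = c(0, 1),   y = c(0, 1)),
    "0.8" = data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
  ),
  .id = "groups"
)

# One label position per curve: its rightmost point
label_points <- curves %>%
  group_by(groups) %>%
  slice_max(x, n = 1, with_ties = FALSE) %>%
  ungroup()

# x and y are mapped globally, so geom_text() inherits them
p <- ggplot(curves, aes(x = x, y = y, colour = groups)) +
  geom_line() +
  geom_text(data = label_points, aes(label = groups),
            hjust = -0.2, show.legend = FALSE)
```

If the labels overlap in the real data, ggrepel::geom_text_repel() should work as a replacement for geom_text() here, since it now receives x and y as well. It does not take hjust in the same way, so that argument would probably need to be dropped or adjusted.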

How to calculate and label peak value of distribution by multiple conditions/facets in R ggplot?

While the question appears similar to others, there's a key difference in my mind.
I want to be able to calculate and/or print (graphing it would be the ultimate goal, but calculating it in the data frame is the primary goal) the peak value of a density curve for EACH SUB-CONDITION BY FACET. The density graph looks like this:
So, ideally, I would be able to know the intensity (x-axis value) corresponding to the highest peak of the density curves for each condition.
Here's some dummy data:
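set.seed(1234)
library(tidyverse)
library(fs)
n = 100000
silence = factor(c("sil1", "sil2", "sil3", "sil4", "sil5"))
treat = factor(c("con", "uos", "uos+wnt5a", "wnt5a"))
# recycle the levels to exactly n rows so all columns have the same length
silence = rep(silence, length.out = n)
treat = rep(treat, length.out = n)
# sample with replacement: n exceeds the 6001 available integers
intensity = sample(4000:10000, n, replace = TRUE)
# data.frame() keeps the factors; cbind() would return a matrix
df <- data.frame(silence, treat, intensity)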
What I've tried:
Subsetting the primary DF and going through and calculating the density of each condition, but this could take days
Something close to this answer: Calculating peaks in histograms or density functions but not quite. I think the data look better as a histogram personally, but that constructs an arbitrary number of bins for intensity data (a continuous measure). The histogram looks like this:
Again, it would be sufficient to get the peak values for each of these groups (i.e., treatments by silencing subdistributions) just in the console, but adding them as a vertical line in the graphs would be a sweet cherry on top (it could also make it hella busy, so I will see about that piece later)
Thank you!!
Depending on the way you're producing the density plots, there may be a more direct way to recreate the density calculation before it goes into ggplot. That'll be the easiest way to get the peak values and keep them in the format of your data.
Without that, here's a hack that should work in general, but requires some kludging to fit the extracted points back into the form of your original data.
Here's a plot like yours:
mtcars %>%
mutate(gear = as.character(gear)) %>%
ggplot(aes(wt, fill = gear, group = gear)) +
geom_density(alpha = 0.2) +
facet_wrap(~am) ->my_plot
Here are the components that make up that plot:
ggplot_build(my_plot) -> my_plot_innards
With some ugly hacking we can extract the points that make up the curves and make them look kind of like our original data. Some info is destroyed, e.g. the gear values 3/4/5 become group 1/2/3. There might be a cool way to convert back, but I don't know it yet.
extracted_points <- tibble(
wt = my_plot_innards[["data"]][[1]][["x"]],
y = my_plot_innards[["data"]][[1]][["y"]],
gear = (my_plot_innards[["data"]][[1]][["group"]] + 2) %>% as.character, # HACK
am = (my_plot_innards[["data"]][[1]][["PANEL"]] %>% as.numeric) - 1 # HACK
)
ggplot(extracted_points, aes(wt, y, fill = gear)) +
geom_point(size = 0.3) +
facet_wrap(~am)
extracted_points_notes <- extracted_points %>%
group_by(gear, am) %>%
slice_max(y)
my_plot +
geom_point(data = extracted_points_notes,
aes(y = y), color = "red", size = 3, show.legend = FALSE) +
geom_text(data = extracted_points_notes, hjust = -0.5,
aes(y = y, label = scales::comma(y)), color = "red", size = 3, show.legend = FALSE)
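For the more direct route mentioned at the top, here is a rough sketch that computes the peak location per group with stats::density() before anything goes into ggplot. The helper peak_location() is made up for illustration. It uses the default bandwidth and kernel, which as far as I know are also the geom_density() defaults, but the evaluation grid differs, so the peaks should land close to the plotted ones rather than exactly on them.

```r
library(dplyr)

# Hypothetical helper: x position where the kernel density estimate is highest
peak_location <- function(v) {
  d <- stats::density(v)
  d$x[which.max(d$y)]
}

# One row per facet (am) and sub-condition (gear)
peaks <- mtcars %>%
  mutate(gear = as.character(gear)) %>%
  group_by(am, gear) %>%
  summarise(peak_wt = peak_location(wt), .groups = "drop")

peaks
```

Because the grouping columns keep their original values, no group/PANEL conversion is needed, and peaks can be handed to geom_vline(data = peaks, aes(xintercept = peak_wt)) to mark the peaks in each facet.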

Assigning specific colors to specific cases in ridgeline plots in R

Recently this community helped me tremendously with getting Ridgeline plots to work with my data.
Now I am struggling with coloring them according to my needs.
Basically what I want is plotting my cases in different orders but they should keep a specific color so observations remain recognizable even when plotted in a different order. So far I failed with applying the available solutions to my requirements.
Let us take for example this data, where we have a name, a mean and an SD:
caseName            caseMean    caseSD
Svansdottir 2006     -0.0646    0.4032398
Guétin 2009          -0.4649    0.3995663
Raglio 2010a         -0.2145    0.2814031
Let's first sort them by caseMean:
df$caseName <- factor(df$caseName, levels = df$caseName[order(df$caseMean)])
and plot it with the following code:
library(tidyverse); library(ggridges)
n = 100
df3 <- df %>%
mutate(low = caseMean - 3 * caseSD, high = caseMean + 3 * caseSD) %>%
uncount(n, .id = "row") %>%
mutate(x = (1 - row/n) * low + row/n * high,
norm = dnorm(x, caseMean, caseSD))
ggplot(df3, aes(x, caseName, height = norm, fill=caseName)) +
geom_ridgeline(scale = 2,alpha=0.75) +
scale_fill_viridis_d()
we get this:
Now we reverse the order
df$caseName <- factor(df$caseName, levels = df$caseName[order(-df$caseMean)])
and plot again with the code above, we see that the plots have switched colors:
How can I make sure that the same cases have always the same colors no matter the order I put them in?
I would like to have code that doesn't require me to "hard-wire" colors to a specific case name. I want to be able to do this with ridgeline plots of 20, 30, or more observations. The fact that I picked the viridis color palette doesn't matter. I am happy with any solution (like with heat.colors or something similar).
If your new factor is just reversing the order of the previous one, you could use the argument direction in scale_fill_viridis_d().
For more complicated cases (i.e. re-leveling a factor), one possibility is to add the colour manually, possibly in your original data frame, and to pass it to scale_fill_manual().
Simple case: reversing the order of the factor
library(tidyverse)
df <- data.frame(name = letters[3:1], value = c(3,1,2))
pl_1 <- ggplot(aes(x=name, y=value, fill=name), data=df)+
geom_col() +
scale_fill_viridis_d()
pl_1
pl_1 %+% mutate(df, name = factor(name, levels = c("c", "b", "a"))) +
scale_fill_viridis_d(direction=-1)
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.
More complicated case
library(tidyverse)
library(viridis)
df_new <- tibble(name = letters[3:1], value = c(3,1,2),
col = rev(viridis(3))) %>%
mutate(name = factor(name, levels = c("c", "b", "a"))) %>%
arrange(name)
df_new %>%
ggplot(aes(x=name, y=value, fill=name)) +
geom_col() +
scale_fill_manual(values=df_new$col)
Created on 2019-06-06 by the reprex package (v0.3.0)
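A variation on the manual approach that avoids depending on row order at all: give scale_fill_manual() a named vector, since named values are matched to factor levels by name. This is a sketch using the three cases from the question. The palette is generated from the data rather than typed in by hand, pal and plot_cases() are made-up names, and geom_col() stands in for the ridgeline layer.

```r
library(ggplot2)

cases <- data.frame(
  caseName = c("Svansdottir 2006", "Guétin 2009", "Raglio 2010a"),
  caseMean = c(-0.0646, -0.4649, -0.2145),
  stringsAsFactors = FALSE
)

# Build the palette once, before any re-ordering, and name it by case
pal <- setNames(hcl.colors(nrow(cases), palette = "viridis"), cases$caseName)

# Hypothetical helper: same plot, arbitrary level order, fixed colours
plot_cases <- function(level_order) {
  cases$caseName <- factor(cases$caseName, levels = as.character(level_order))
  ggplot(cases, aes(caseName, caseMean, fill = caseName)) +
    geom_col() +
    scale_fill_manual(values = pal)  # named values are matched by level name
}

p_increasing <- plot_cases(cases$caseName[order(cases$caseMean)])
p_decreasing <- plot_cases(cases$caseName[order(-cases$caseMean)])
```

The same pal should work unchanged with geom_ridgeline(), and it scales to 20 or 30 cases because nothing is typed in per case.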

Setting facet-specific breaks in stat_contour

I'd like to show a contour plot using ggplot and stat_contour for two categories of my data with facet_grid. I want to highlight a particular level based on the data. Here's an analogous dummy example using the usual volcano data.
library(dplyr)
library(ggplot2)
v.plot <- volcano %>% reshape2::melt(.) %>%
mutate(dummy = Var1 > median(Var1)) %>%
ggplot(aes(Var1, Var2, z = value)) +
stat_contour(breaks = seq(90, 200, 12)) +
facet_grid(~dummy)
Plot 1:
Let's say within each factor level (here east and west halves, I guess), I want to find the mean height of the volcano and show that. I can calculate it manually:
volcano %>% reshape2::melt(.) %>%
mutate(dummy = Var1 > median(Var1)) %>%
group_by(dummy) %>%
summarise(h.bar = mean(value))
# A tibble: 2 × 2
dummy h.bar
<lgl> <dbl>
1 FALSE 140.7582
2 TRUE 119.3717
Which tells me that the mean heights on each half are 141 and 119. I can draw BOTH of those on BOTH facets, but not just the appropriate one on each side.
v.plot + stat_contour(breaks = c(141, 119), colour = "red", size = 2)
Plot 2:
And you can't put breaks= inside an aes() statement, so passing it in as a column in the original dataframe is out. I realize with this dummy example I could probably just do something like bins=2 but in my actual data I don't want the mean of the data, I want something else altogether.
Thanks!
I made another attempt at this problem and came up with a partial solution, but I'm forced to use a different geom.
volcano %>% reshape2::melt(.) %>%
mutate(dummy = Var1 > median(Var1)) %>%
group_by(dummy) %>%
mutate(h.bar = mean(value), # edit1
is.close = round(h.bar) == value) %>% #
ggplot(aes(Var1, Var2, z = value)) +
stat_contour(breaks = seq(90, 200, 12)) +
geom_point(colour = "red", size = 3, # edit 2
aes(alpha = is.close)) + #
scale_alpha_discrete(range = c(0,1)) + #
facet_grid(~dummy)
In edit 1 I added a mutate() to the above block to generate a variable identifying where value was "close enough" (equal to the mean rounded to the nearest integer) to the desired highlight point (the mean of the data for this example).
In edit 2 I added geom_point to show the grid locations with the desired value, and hid the undesired ones using an alpha of 0, i.e. totally transparent.
Plot 3:
The problem with this solution is that it's very gappy, and trying to bridge those with geom_path is a jumbled mess. I tried coarser rounding as well, and it just made things muddy.
Would love to hear other ideas! Thanks
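One further idea, as a sketch only: since breaks cannot be mapped inside aes(), add one stat_contour() layer per facet, each with its own subset of the data and its own break. A layer whose data contains a single dummy value is only drawn in that facet. The names vol, means and highlight_layers are made up, and the data frame is built with row()/col() instead of reshape2::melt() to keep the snippet self-contained.

```r
library(dplyr)
library(ggplot2)

# Long version of volcano: Var1 = row index, Var2 = column index
vol <- data.frame(
  Var1  = as.vector(row(volcano)),
  Var2  = as.vector(col(volcano)),
  value = as.vector(volcano)
) %>%
  mutate(dummy = Var1 > median(Var1))

# The facet-specific level to highlight (here the mean height)
means <- vol %>%
  group_by(dummy) %>%
  summarise(h.bar = mean(value))

# One contour layer per facet, restricted to that facet's rows
highlight_layers <- lapply(seq_len(nrow(means)), function(i) {
  stat_contour(
    data   = filter(vol, dummy == means$dummy[i]),
    breaks = means$h.bar[i],
    colour = "red"
  )
})

p <- ggplot(vol, aes(Var1, Var2, z = value)) +
  stat_contour(breaks = seq(90, 200, 12)) +
  highlight_layers +          # a list of layers can be added directly
  facet_grid(~dummy)
```

Any other per-facet statistic can replace mean(value) in the summarise() step without changing the rest.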
