I'm trying to add labels to a plot. I did that, but they don't look very good. Here's an example to make it clearer. I have this plot:
And I want to add the values at the end of each line. I did that, but they look strange:
Is there any way I could fix this? The plotting is done at the end of the code, under the #plotting comment.
Here is the reproducible code:
library(glmnet)
library(dplyr)
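library(tidyr)
library(ggplot2)  # needed for ggplot() in the plotting step
library(ggrepel)  # needed for geom_text_repel() in the plotting step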
set.seed(100)
n=100
p=50
X=matrix(rnorm(n*p), nrow=n)
y=matrix(rnorm(n), nrow=n)
lam = seq(0.1,7,length.out=100)
lm=glmnet(X,y,alpha=1,lambda=lam, intercept=FALSE, standardize=FALSE)
value1=as.matrix(coef(lm))
#creating a dataframe
L1 <- function(x)
sum(abs(x))
bind_cols(
as.data.frame(value1) %>%
summarise_all(funs(L1(.))) %>%
t() %>%
as.data.frame() %>%
rename(x = V1),
t(value1) %>%
as.data.frame() %>%
rename_all(funs(gsub("V", "", .)))
) %>%
gather(row, y, 2:(nrow(value1) + 1)) -> dataf
#plotting
ggplot(dataf, aes(x, y, colour = row)) + geom_line() +
geom_text_repel(
data = subset(dataf, x == max(x)),
aes(label = row),
size = 2,
nudge_x = 1
) +
theme(legend.position = "none")
The main thing going on here is that you have a bunch of text labels all in one spot, so by repelling them and letting them have segments attaching labels to their values, you end up with this starburst at the end of your plot.
To see what I mean, filter your data for the maximum x value, which is where you're placing your labels, and for rows where y == 0: there are 35 of these! So you have 35 bits of text all vying for the same spot and being repelled away from one another.
dataf %>%
filter(x == max(x), y == 0) %>%
nrow()
#> [1] 35
A second way to see this is to set the color of the segments that connect the texts to their values. If you set it to gray, you can distinguish those segments from the actual geom_lines, since they are no longer the same color.
ggplot(dataf, aes(x, y, colour = row)) +
geom_line() +
geom_text_repel(
data = . %>% filter(x == max(x)),
aes(label = row),
size = 2,
nudge_x = 0.01,
segment.color = "gray60"
) +
scale_x_continuous(expand = expand_scale(mult = c(0.05, 0.1))) +
theme(legend.position = "none")
Here are a few ways you can avoid this tangle. I decreased nudge_x so the text sits closer to the lines (nudge_x is in the units of your x values, so nudging by 1 when the values only run from 0 to about 0.6 puts the labels very far away). I changed the segment color to something neutral and raised min.segment.length, the minimum distance before a segment is drawn. I added an expand_scale to give some more space on the right side (at the time of writing this was only in the dev version of ggplot2; later releases renamed it expansion()). And most importantly, I took out the labels for values of 0.
You should probably tweak these things to your liking, but hopefully this is a start in cleaning it up.
ggplot(dataf, aes(x, y, colour = row)) +
geom_line() +
geom_text_repel(
data = . %>% filter(x == max(x), y != 0),
aes(label = row),
size = 2,
nudge_x = 0.01,
min.segment.length = 5,
segment.color = "gray60"
) +
scale_x_continuous(expand = expand_scale(mult = c(0.05, 0.1))) +
theme(legend.position = "none")
Created on 2018-06-11 by the reprex package (v0.2.0).
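As a rough sketch of the two main ideas in this answer (dropping labels for lines that end at zero, and sizing the nudge relative to the x range), here is a toy version that runs without the glmnet data. The data frame `toy`, its series names, and the 2% factor are made up for illustration; they are not taken from the original data.

```r
library(dplyr)

# Hypothetical toy data: three series; "b" and "c" end at y == 0
toy <- data.frame(
  row = rep(c("a", "b", "c"), each = 3),
  x   = rep(c(0, 0.3, 0.6), times = 3),
  y   = c(0, 0.2, 0.4,
          0, 0.1, 0,
          0, 0,   0)
)

# Keep only the end points that are worth labelling:
# the last x value of each series, and only where y is non-zero
labels_df <- toy %>% filter(x == max(x), y != 0)

# Express the nudge as a fraction of the x range (2% is an arbitrary choice),
# so it stays sensible whatever units x happens to be in
nudge <- 0.02 * diff(range(toy$x))

labels_df
nudge
```

`labels_df` could then be passed as the `data` argument of geom_text_repel(), with `nudge_x = nudge`.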
Related
I am trying to add a number label on each cell of a heatmap. Because it also needs marginal barcharts I have tried two packages. iheatmapr and ComplexHeatmap.
(1st try) iheatmapr makes it easy to add bars as below, but I couldn't see how to add labels inside the heatmap on individual cells.
library(tidyverse)
library(iheatmapr)
library(RColorBrewer)
in_out <- data.frame(
'Economic' = c(2,1,1,3,4),
'Education' = c(0,3,0,1,1),
'Health' = c(1,0,1,2,0),
'Social' = c(2,5,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
GreenLong <- colorRampPalette(brewer.pal(9, 'Greens'))(12)
lowGreens <- GreenLong[1:5]  # R indexing starts at 1; index 0 is silently dropped
in_out_matrix <- as.matrix(in_out)
main_heatmap(in_out_matrix, colors = lowGreens)
in_out_plot <- iheatmap(in_out_matrix,
colors=lowGreens) %>%
add_col_labels() %>%
add_row_labels() %>%
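# bcio and total were not defined in the example; in_out_matrix and its grand total are assumed here
add_col_barplot(y = colSums(in_out_matrix) / sum(in_out_matrix)) %>%
add_row_barplot(x = rowSums(in_out_matrix) / sum(in_out_matrix))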
in_out_plot
I then saved it with save_iheatmap(in_out_plot, "iheatmapr_test.png"), because I couldn't use ggsave(device = ragg::agg_png, etc.) with an iheatmapr object.
The iheatmapr object's apparent incompatibility with ggsave() (maybe I am wrong) is a problem for me, because I normally use the ragg package to export images with AGG and preserve font sizes. I suspect some other heatmap packages also create custom objects that may be incompatible with patchwork and ggsave. This is the kind of call I would like to use:
ggsave("png/iheatmapr_test.png", plot = in_out_plot,
device = ragg::agg_png, dpi = 72,
units="in", width=3.453, height=2.5,
scaling = 0.45)
(2nd try) ComplexHeatmap makes it easy to label individual number "cells" inside a heatmap, and also offers marginal bars among its "Annotations". I have tried it, but its colour palette system (which uses integers to refer to a set of colours) doesn't suit my RGB vector colour gradient, and overall it is a sophisticated package clearly designed for graphics more advanced than what I am doing.
I am aiming for style as shown in screenshot example below, which was made in Excel.
Please can anyone suggest a more suitable R package for a simple heatmap like this with marginal bars, and number labels inside?
Instead of relying on packages that offer out-of-the-box solutions, one option is to build the plot from scratch using ggplot2 and patchwork. This gives you much more control over styling, adding labels and so on.
Note: The issue with iheatmapr is that its output is built on plotly, not on ggplot2, so there is no ggplot object for ggsave to work with. That's why you can't use ggsave.
library(tidyverse)
library(patchwork)
in_out <- data.frame(
'Economic' = c(1,1,1,5,4),
'Education' = c(0,0,0,1,1),
'Health' = c(1,0,1,0,0),
'Social' = c(1,1,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
in_out_long <- in_out %>%
mutate(y = rownames(.)) %>%
pivot_longer(-y, names_to = "x")
# Summarise data for marginal plots
yin <- in_out_long %>%
group_by(y) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
xin <- in_out_long %>%
group_by(x) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
# Heatmap
ph <- ggplot(in_out_long, aes(x, y, fill = value)) +
geom_tile() +
geom_text(aes(label = value), size = 8 / .pt) +
scale_fill_gradient(low = "#F7FCF5", high = "#00441B") +
theme(legend.position = "bottom") +
labs(x = NULL, y = NULL, fill = NULL)
# Marginal plots
py <- ggplot(yin, aes(value, y)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), hjust = -.1, size = 8 / .pt) +
scale_x_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
px <- ggplot(xin, aes(x, value)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), vjust = -.5, size = 8 / .pt) +
scale_y_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
# Glue plots together
px + plot_spacer() + ph + py + plot_layout(ncol = 2, widths = c(2, 1), heights = c(1, 2))
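As a quick cross-check of the marginal summaries (`yin` and `xin` above), the same shares can be computed in base R directly from the wide data. This is only a sketch to confirm that each margin is a share of the grand total; the names `m`, `row_share` and `col_share` are introduced here for illustration.

```r
# Same example data as in the answer above
in_out <- data.frame(
  Economic  = c(1, 1, 1, 5, 4),
  Education = c(0, 0, 0, 1, 1),
  Health    = c(1, 0, 1, 0, 0),
  Social    = c(1, 1, 0, 3, 1),
  row.names = c("Habitat", "Resource", "Combined", "Protected", "Livelihood")
)

m <- as.matrix(in_out)

# Each margin as a share of the grand total, matching value / sum(value)
row_share <- rowSums(m) / sum(m)  # bars drawn beside the heatmap (py)
col_share <- colSums(m) / sum(m)  # bars drawn above the heatmap (px)

round(100 * row_share, 1)
round(100 * col_share, 1)
```

Both vectors sum to 1, so the percentage labels on each marginal bar chart should add up to 100% (up to rounding).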
I am plotting some data using ggplot2's geom_bar. The data represents a ratio that should center around 1 and not 0. This would allow me to highlight which categories go below or above this central ratio number. I've tried playing with scale_y_continuous() and ylim(), neither of which allows me to set a central axis value.
Basically: how do I make Y center around 1 and not 0?
Sorry if I am asking a question that's been answered... maybe I just don't know the right keywords?
ggplot(data = plotdata) +
geom_col(aes(x = stressclass, y= meanexpress, color = stressclass, fill = stressclass)) +
labs(x = "Stress Response Category", y = "Average Response Normalized to Control") +
facet_grid(exposure_cond ~ .)
As of now my plots look like this:
You can pre-process your y-values so that the plot actually starts at 0, then change the scale labels to reflect the original values (demonstrating with a built-in dataset):
library(dplyr)
library(ggplot2)
cut.off = 500 # (= 1 in your use case)
diamonds %>%
filter(clarity %in% c("SI1", "VS2")) %>%
count(cut, clarity) %>%
mutate(n = n - cut.off) %>% # subtract cut.off from y values
ggplot(aes(x = cut, y = n, fill = cut)) +
geom_col() +
geom_text(aes(label = n + cut.off, # label original values (optional)
vjust = ifelse(n > 0, 0, 1))) +
geom_hline(yintercept = 0) +
scale_y_continuous(labels = function(x) x + cut.off) + # add cut.off to label values
facet_grid(clarity ~ .)
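Applied to ratio data centred on 1, the same shift-and-relabel idea might look like the sketch below. The `ratios` data frame and its values are hypothetical stand-ins for `plotdata`, and `relabel` is just a helper name chosen here.

```r
library(ggplot2)

centre <- 1  # baseline the bars should grow from

# Hypothetical ratios standing in for plotdata$meanexpress
ratios <- data.frame(
  stressclass = c("A", "B", "C"),
  meanexpress = c(0.8, 1.0, 1.3)
)

# Shift the values so that the baseline sits at 0 on the plotted scale
ratios$shifted <- ratios$meanexpress - centre

# Map plotted axis values back to the original ratio scale
relabel <- function(x) x + centre

p <- ggplot(ratios, aes(stressclass, shifted, fill = stressclass)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  scale_y_continuous(labels = relabel) +
  labs(x = "Stress Response Category",
       y = "Average Response Normalized to Control")
p
```

Bars for ratios below 1 now point downwards from the baseline, while the axis still reads in the original units.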
Aim
I am trying to fill the area between two lines in a plot generated with ggplot in R. I would like to fill the area between the lines above the horizontal line with a different color than the area below the horizontal line.
I succeeded in filling everything between the two lines with a single color; however, I did not manage to differentiate the areas above and below the horizontal line with two different colors.
Code
set.seed(123)
# Load packages
library(tidyverse)
# Create sample dataframe
df <- data.frame(x=seq(1,50,1),y=runif(50, min = 0, max = 10))
# Generate plot
ggplot(data = df, aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 5) +
theme_classic() +
geom_ribbon(aes(ymin=5,ymax=y), fill="blue")
Question
How do I fill the space above and below the horizontal line with a different color?
You can calculate the coordinates of the points where the two lines intersect & add them to your data frame:
m <- 5 # replace with desired y-intercept value for the horizontal line
# identify each run of points completely above (or below) the horizontal
# line as a new section
df.new <- df %>%
arrange(x) %>%
mutate(above.m = y >= m) %>%
mutate(changed = is.na(lag(above.m)) | lag(above.m) != above.m) %>%
mutate(section.id = cumsum(changed)) %>%
select(-above.m, -changed)
# calculate the x-coordinate where the line crosses y = m between adjacent
# sections (by linear interpolation), & add these points to the data frame
df.new <- rbind(
df.new,
df.new %>%
group_by(section.id) %>%
filter(x %in% c(min(x), max(x))) %>%
ungroup() %>%
mutate(mid.x = ifelse(section.id == 1 |
section.id == lag(section.id),
NA,
x - (x - lag(x)) /
(y - lag(y)) * (y - m))) %>%
select(mid.x, y, section.id) %>%
rename(x = mid.x) %>%
mutate(y = m) %>%
na.omit())
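The `mid.x` expression above is a linear interpolation between the last point of one section and the first point of the next. As a small worked sketch, the same formula can be wrapped in a helper and checked on two segments by hand; `crossing_x` is a name made up here, not part of the answer's code.

```r
# x-coordinate where the straight segment from (x0, y0) to (x1, y1)
# crosses the horizontal line y = m (assumes y0 != y1)
crossing_x <- function(x0, y0, x1, y1, m) {
  x1 - (x1 - x0) / (y1 - y0) * (y1 - m)
}

# Rising segment: from y = 3 at x = 1 to y = 7 at x = 2, crossing y = 5
up <- crossing_x(1, 3, 2, 7, m = 5)

# Falling segment: from y = 7 at x = 2 to y = 4 at x = 3, crossing y = 5
down <- crossing_x(2, 7, 3, 4, m = 5)

up    # 1.5
down  # 2 + 2/3
```

In the pipeline above, `lag(x)` and `lag(y)` play the role of `x0` and `y0`, and the current row supplies `x1` and `y1`.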
With this data frame, you can then define two separate geom_ribbon layers with different colours. Comparison of results below (note: I also added a geom_point layer for illustration, & changed the colours because the blue in the original is a little glaring on the eyes...)
p1 <- ggplot(df,
aes(x = x, y = y)) +
geom_ribbon(aes(ymin=5, ymax=y), fill="dodgerblue") +
geom_line() +
geom_hline(yintercept = m) +
geom_point() +
theme_classic()
p2 <- ggplot(df.new, aes(x = x, y = y)) +
geom_ribbon(data = . %>% filter(y >= m),
aes(ymin = m, ymax = y),
fill="dodgerblue") +
geom_ribbon(data = . %>% filter(y <= m),
aes(ymin = y, ymax = m),
fill = "firebrick1") +
geom_line() +
geom_hline(yintercept = 5) +
geom_point() +
theme_classic()
Alternatively, fill three layers in order, for a particularly ugly result:
# Generate plot
ggplot(data = df, aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 5) +
theme_classic() +
geom_ribbon(aes(ymin=y,ymax=10), fill="green")+
geom_ribbon(aes(ymin=0,ymax=y), fill="yellow")+
geom_ribbon(aes(ymin=5,ymax=y), fill="blue")
I have created a stacked barplot
ggplot(data %>% count(x, y),
aes(x, n, fill = factor(y))) +
geom_bar(stat="identity")+
theme_light()+
theme(plot.title = element_text(hjust=0.5))
There are (possible) outliers at 50, 54 and 60. How can I add their IDs to the graph?
If you post your data, I'll amend this answer using it. But basically you want
df %>%
count(x, y) %>%
ggplot(aes(x = x, y = n, fill = y)) +
geom_col() +
geom_text(aes(label = x), data = . %>% filter(x >= thresh), vjust = 0, nudge_y = 0.1)
where thresh is some threshold you've set--maybe an arbitrary cutoff point that makes sense, or maybe 3 standard deviations from the mean of x, or whatever. You can store it in an outside variable, you can make a boolean column in your dataframe, or you can calculate it inline inside your geom_text--really up to you. vjust = 0, nudge_y = 0.1 puts the labels just above the bars corresponding to your outliers.
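Here is a sketch of two of the options mentioned above: a fixed cutoff stored in a boolean column, and a cutoff at 3 standard deviations above the mean. The data are invented (the original data were not posted), and `thresh_fixed`, `thresh_sd` and `is_outlier` are names chosen for this example.

```r
library(dplyr)

# Hypothetical data: values 10 to 20 three times each, plus three large values
df <- data.frame(
  x = c(rep(10:20, each = 3), 50, 54, 60),
  y = rep(c("a", "b", "c"), times = 12)
)

# Option 1: an arbitrary cutoff that makes sense for the data
thresh_fixed <- 50

# Option 2: three standard deviations above the mean of x
thresh_sd <- mean(df$x) + 3 * sd(df$x)

# Store the flag as a boolean column on the counted data
counts <- df %>%
  count(x, y) %>%
  mutate(is_outlier = x >= thresh_fixed)

filter(counts, is_outlier)
sum(df$x >= thresh_sd)
```

With these made-up numbers the fixed cutoff flags all three large values, while the 3-sd rule flags only two of them, because the large values inflate the standard deviation themselves. The flagged rows can be passed to the label layer with `data = . %>% filter(is_outlier)`.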
Maybe geom_text(data = mydata %>% filter(just.the.outliers))?
See also this: RE: Alignment of numbers on the individual bars with ggplot2
I would like to create a colour blind test, similar to that below, using ggplot.
The basic idea is to use geom_hex (or perhaps a voronoi diagram, or possibly even circles as in the figure above) as the starting point, and define a dataframe that, when plotted in ggplot, produces the image.
We would start by creating a dataset, such as:
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
then plot this:
ggplot(df, aes(x, y)) +
geom_hex() +
coord_equal() +
scale_fill_gradient(low = "red", high = "green", guide = "none") +
theme_void()
which gives the image below:
The main missing step is to create a dataset that actually plots a meaningful symbol (letter or number), and I'm not sure how best to go about this without painstakingly mapping the coordinates. Ideally one would be able to read in the coordinates perhaps from an image file.
Finally, a bit of tidying up could round the plot edges by removing the outlying points.
All suggestions are very welcome!
EDIT
Getting a little closer to what I'm after, we can use the image below of the letter 'e':
Using the imager package, we can read this in and convert it to a dataframe:
img <- imager::load.image("e.png")
df <- as.data.frame(img)
then plot that dataframe using geom_raster:
ggplot(df, aes(x, y)) +
geom_raster(aes(fill = value)) +
coord_equal() +
scale_y_continuous(trans = scales::reverse_trans()) +
scale_fill_gradient(low = "red", high = "green", guide = "none") +
theme_void()
If we use geom_hex instead of geom_raster, we can get the following plot:
ggplot(df %>% filter(value %in% 1), aes(x, y)) +
geom_hex() +
coord_equal() +
scale_y_continuous(trans = scales::reverse_trans()) +
scale_fill_gradient(low = "red", high = "green", guide = "none") +
theme_void()
so, getting there but clearly still a long way off...
Here's an approach for creating this plot:
Packages you need:
library(tidyverse)
library(packcircles)
Get image into a 2D matrix (x and y coordinates) of values. To do this, I downloaded the .png file of the e as "e.png" and saved in my working directory. Then some processing:
img <- png::readPNG("e.png")
# From http://stackoverflow.com/questions/16496210/rotate-a-matrix-in-r
rotate <- function(x) t(apply(x, 2, rev))
# Convert to one colour layer and rotate it to be in right direction
img <- rotate(img[,,1])
# Check that matrix makes sense:
image(img)
Next, create a whole lot of circles! I did this based on this post.
# Create random "circles"
# *** THESE VALUES MAY NEED ADJUSTING
ncircles <- 1200
offset <- 100
rmax <- 80
x_limits <- c(-offset, ncol(img) + offset)
y_limits <- c(-offset, nrow(img) + offset)
xyr <- data.frame(
x = runif(ncircles, min(x_limits), max(x_limits)),
y = runif(ncircles, min(y_limits), max(y_limits)),
r = rbeta(ncircles, 1, 10) * rmax)
# Find non-overlapping arrangement
res <- circleLayout(xyr, x_limits, y_limits, maxiter = 1000)
cat(res$niter, "iterations performed")
#> 1000 iterations performed
# Convert to data for plotting (just circles for now)
plot_d <- circlePlotData(res$layout)
# Check circle arrangement
ggplot(plot_d) +
geom_polygon(aes(x, y, group=id), colour = "white", fill = "skyblue") +
coord_fixed() +
theme_minimal()
Finally, interpolate the image pixel values for the centre of each circle. This will indicate whether a circle is centered over the shape or not. Add some noise to get variance in colour and plot.
# Get x,y positions of centre of each circle
circle_positions <- plot_d %>%
group_by(id) %>%
summarise(x = min(x) + (diff(range(x)) / 2),
y = min(y) + (diff(range(y)) / 2))
# Interpolate on original image to get z value for each circle
circle_positions <- circle_positions %>%
mutate(
z = fields::interp.surface(
list(x = seq(nrow(img)), y = seq(ncol(img)), z = img),
as.matrix(.[, c("x", "y")])),
z = ifelse(is.na(z), 1, round(z)) # 1 is the "empty" area shown earlier
)
# Add a little noise to the z values
set.seed(070516)
circle_positions <- circle_positions %>%
mutate(z = z + rnorm(n(), sd = .1))
# Bind z value to data for plotting and use as fill
plot_d %>%
left_join(select(circle_positions, id, z)) %>%
ggplot(aes(x, y, group = id, fill = z)) +
geom_polygon(colour = "white", show.legend = FALSE) +
scale_fill_gradient(low = "#008000", high = "#ff4040") +
coord_fixed() +
theme_void()
#> Joining, by = "id"
To get the colours right, tweak the low and high values in scale_fill_gradient.
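To make the "is this circle over the shape?" step easier to follow, here is a simplified stand-in that uses a nearest-pixel lookup in base R instead of fields::interp.surface. It is not the method used above, only a sketch of the same idea; the 4 x 4 matrix, the circle centres and the helper `lookup_pixel` are all invented for illustration.

```r
# Tiny stand-in "image": 0 marks the symbol, 1 the empty background
img <- matrix(1, nrow = 4, ncol = 4)
img[2:3, 2:3] <- 0

# Hypothetical circle centres; the last one lies outside the image
centres <- data.frame(
  x = c(2.2, 3.9, 6.0),
  y = c(2.8, 1.1, 2.0)
)

# Look up the pixel nearest to each centre; centres that fall outside
# the image get the background value, like the is.na(z) fallback above
lookup_pixel <- function(img, x, y, outside = 1) {
  i <- round(x)
  j <- round(y)
  inside <- i >= 1 & i <= nrow(img) & j >= 1 & j <= ncol(img)
  z <- rep(outside, length(x))
  z[inside] <- img[cbind(i[inside], j[inside])]
  z
}

centres$z <- lookup_pixel(img, centres$x, centres$y)
centres
```

Only the first centre lands on the symbol, so it gets z = 0, while the other two get the background value 1.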