There is a very similar question here: Add NA value to ggplot legend for continuous data map.
I tried to understand it, but I didn't manage to make it work for my data.
So I created a super simple example. I have this data:
set.seed(1)
df = data.frame(a=rnorm(50), b=rnorm(50), c=rep(1:5, 10))
df[sample(1:50, 10), ]$c = NA
where all columns are numeric. Now I'd like to make a ggplot with a legend entry for the NA-values. When I do the following:
ggplot(df) +
geom_point(
aes(x = a, y =b, col=c)
)
This is the result
What I want is something like this (when c is a factor, it automatically gets a legend entry):
ggplot(df) +
geom_point(
aes(x = a, y =b, col=factor(c))
)
Can I achieve a similar result fairly easily while keeping my values numeric?
Defining a color for NA is easy by adding scale_color_continuous(na.value="red"), but it is not explicitly labeled in the legend.
To achieve that you could add a second color scale just for the NA value using ggnewscale:
library(ggplot2)
library(ggnewscale)
set.seed(1)
df = data.frame(a=rnorm(50), b=rnorm(50), c=rep(1:5, 10))
df[sample(1:50, 10), ]$c = NA
na.value.forplot <- 'red'
ggplot(df) +
  geom_point(aes(x = a, y = b, col = c)) +
  scale_color_continuous(guide = guide_colorbar(order = 2)) +
  new_scale_color() +
  # Overplot the NA rows with a constant mapping so they get their own legend key
  geom_point(data = subset(df, is.na(c)),
             aes(x = a, y = b, col = "NA")) +
  scale_color_manual(name = NULL, labels = "NA", values = na.value.forplot)
Created on 2021-03-31 by the reprex package (v1.0.0)
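For comparison, here is a minimal sketch of the simpler na.value approach mentioned above. It colors the NA points but, as noted, gives them no explicit label in the legend, which is why the second scale is used. The object name p is only for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(a = rnorm(50), b = rnorm(50), c = rep(1:5, 10))
df$c[sample(1:50, 10)] <- NA

# NA values in c are drawn in red, but the colorbar has no "NA" entry
p <- ggplot(df) +
  geom_point(aes(x = a, y = b, col = c)) +
  scale_color_continuous(na.value = "red")
p
```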
I have the following problem:
My code is like this:
ggplot(data, aes(x = fct_infreq(sub-group), fill = group)) + geom_bar()
And the result was this:
I want to plot the red group first (in ascending order) and then the blue group (also in ascending order), all in the same plot.
How can I do this?
Thanks in advance!
The following is simply providing the limits to the y-axis in the order you want, without bothering with factors.
library(ggplot2)
df <- data.frame(
y = LETTERS[1:20],
group = rep(c("A", "B"), 10),
x = rnorm(20)
)
ggplot(df, aes(x, y, fill = group)) +
geom_col() +
scale_y_discrete(
limits = df$y[rev(order(df$group, df$x))]
)
Created on 2021-12-16 by the reprex package (v2.0.1)
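To see what the limits expression in that answer produces, here is a small sketch on toy data. The values and the names idx and limits are made up for illustration.

```r
# Toy data (hypothetical values) with two groups
df <- data.frame(
  y = c("A", "B", "C", "D"),
  group = c("A", "B", "A", "B"),
  x = c(2, 1, 3, 4)
)

# order() sorts by group first, then by x within each group;
# rev() flips the result because discrete y limits run bottom to top
idx <- rev(order(df$group, df$x))
limits <- df$y[idx]
limits
```

Read from the top of the plot downwards, the bars of group A then appear in ascending x order, followed by the bars of group B in ascending x order.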
My data consists of three numeric variables. Something like this:
set.seed(1)
df <- data.frame(x= rnorm(10000), y= rnorm(10000))
df$col= df$x + df$y + df$x*df$y
Plotting this as a heatplot looks good:
ggplot(df, aes(x, y, col= col)) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
But real variables can have some skewness or outliers and this totally changes the plot. After df$col[nrow(df)] <- 100 same ggplot code as above returns this plot:
Clearly, the problem is that this one point changes the scale, so the plot conveys little information. My solution is to rank the data with rank(), which gives a reasonable color progression for every variable I've tried so far. See here:
ggplot(df, aes(x, y, col= rank(col))) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
The problem with this solution is that the rank scale (2,500 to 10,000) is shown in the color legend. I want the legend to show the original scale (0 to 10) while the color progression still follows the ranked data; i.e. I need to somehow map the original values to the ranked color values. Is that possible? I tried setting limits = c(0, 10) inside scale_color_distiller(), but this does not help.
Side notes: I do not want to remove the outlier. Ranking works well. I want to use scale_color_distiller(). If possible, I do not want to use any packages other than ggplot2.
Rescale the rank to the range of your original df$col:
library(tidyverse)
set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df %>%
mutate(
col = x + y + x * y,
scaled_rank = scales::rescale(rank(col), range(col))
) %>%
ggplot(aes(x, y, col = scaled_rank)) +
geom_point(size = 2) +
scale_color_distiller(palette = "Spectral")
Created on 2021-11-17 by the reprex package (v2.0.1)
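As a rough illustration of what the rescaling step does, here is a sketch on a tiny vector with one outlier. The values and the names ranks and scaled are made up; scales is installed as a dependency of ggplot2.

```r
# Hypothetical values with one outlier
col <- c(1, 2, 3, 100)

# Ranks are evenly spaced regardless of the outlier
ranks <- rank(col)

# Stretch the ranks linearly so they span the original range of col
scaled <- scales::rescale(ranks, to = range(col))
scaled
```

The rescaled values keep the even spacing of the ranks, so the legend spans the original minimum and maximum, but an intermediate legend value does not correspond to the same original value.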
I must plot 25 plots, each with its own dataset. I need to insert a horizontal line into each plot. Problem is, the coordinates cannot be hardcoded as each dataset's range varies.
I need the horizontal line to always be at the first value of the corresponding dataset.
This is the geom I tried for the line (the y-axis intercept is hardcoded in this case, which doesn't help).
+ geom_hline(yintercept=c(75,0), linetype="dotted")
I can grab the value for each line's y-intercept (it is at the same position in every dataset) with this:
dataset[1, 6]
which I could also store in a vector like this
coord <- dataset[1, 6]
But I have not had any success bringing this together.
I tried this, with no luck:
+ geom_hline(yintercept=coord, linetype="dotted")
Example Code:
a <- c(10,40,30,22)
b <- c(1,2,3,4)
df <- data.frame(a,b)
try <- df %>% ggplot(aes(x = b, y = a)) + geom_line() + scale_y_continuous(expand = c(0,0), limits = c(0, NA)) + geom_hline(yintercept=c(30,0), linetype="dotted") + theme_tq()
Thanks in advance
I don't understand what exactly is causing you trouble. If I loop through a list of dataframes, I can set the yintercept of each corresponding plot without too much trouble. Example below:
library(ggplot2)
library(patchwork)
# Split the economics dataset as an example
datasets <- split(economics, cut(seq_len(nrow(economics)), 9))
# Loop through list of dataframes, set hline to [1, 6] (drop because tibble)
plots <- lapply(datasets, function(df) {
ggplot(df, aes(date, unemploy)) +
geom_line() +
scale_y_continuous(limits = c(0, NA)) +
geom_hline(yintercept = c(df[1, 6, drop = TRUE], 0),
linetype = "dotted")
})
# For visualisation purposes
wrap_plots(plots)
Created on 2020-12-04 by the reprex package (v0.3.0)
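The same idea applied to the example data from the question might look like the sketch below. The helper name plot_with_first_value_line is hypothetical, and it assumes the columns are called a and b as in the question.

```r
library(ggplot2)

# Hypothetical helper: draws a line plot plus dotted horizontal lines
# at the first value of column a and at 0
plot_with_first_value_line <- function(data) {
  first_value <- data$a[1]
  ggplot(data, aes(x = b, y = a)) +
    geom_line() +
    scale_y_continuous(expand = c(0, 0), limits = c(0, NA)) +
    geom_hline(yintercept = c(first_value, 0), linetype = "dotted")
}

df <- data.frame(a = c(10, 40, 30, 22), b = c(1, 2, 3, 4))
p <- plot_with_first_value_line(df)
p
```

Because the intercept is computed inside the function, the helper can be passed to lapply() over a list of data frames without hardcoding any coordinates.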
I have discrete data that looks like this:
height <- c(1,2,3,4,5,6,7,8)
weight <- c(100,200,300,400,500,600,700,800)
person <- c("Jack","Jim","Jill","Tess","Jack","Jim","Jill","Tess")
set <- c(1,1,1,1,2,2,2,2)
dat <- data.frame(set,person,height,weight)
I'm trying to plot a graph with one x-axis (person) and two different y-axes (weight and height). All the examples I find either plot the secondary axis (sec_axis) or plot discrete data using base plots.
Is there an easy way to use sec_axis for discrete data in ggplot2?
Edit: Someone in the comments suggested I try the suggested reply. However, I run into this error now
Here is my current code:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("height",sec_axis(~.*1.2, name="height"))
p2
I get the error: Error in x < range[1] :
comparison (3) is possible only for atomic and list types
Alternatively, I have now modified my code to match this posted example.
p <- ggplot(dat, aes(x = person))
p <- p + geom_line(aes(y = height, colour = "Height"))
# adding the relative weight data, transformed to match roughly the range of the height
p <- p + geom_line(aes(y = weight/100, colour = "Weight"))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*100, name = "Relative weight [%]"))
# modifying colours and theme options
p <- p + scale_colour_manual(values = c("blue", "red"))
p <- p + labs(y = "Height [inches]",
x = "Person",
colour = "Parameter")
p <- p + theme(legend.position = c(0.8, 0.9))+ facet_wrap(~set, scales="free")
p
I get an error that says
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
I get the template, but no points get plotted
R function arguments are matched by position when they are not named explicitly. As mentioned by @Z.Lin in the comments, you need sec.axis= before your sec_axis() call to indicate that you are passing it to the sec.axis argument of scale_y_continuous. If you don't, it is passed to the second argument of scale_y_continuous, which is breaks=. The error message thus arises because a sec_axis object is not an acceptable value for the breaks argument:
p1 <- ggplot(data = dat, aes(x = person, y = weight)) +
geom_point(color = "red") + facet_wrap(~set, scales="free")
p2 <- p1 + scale_y_continuous("weight", sec.axis = sec_axis(~.*1.2, name="height"))
p2
The first argument (name=) of scale_y_continuous is for the first y scale, whereas the sec.axis= argument is for the second y scale. I changed your first y scale name to correct that.
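A quick way to check the positional matching described above is to inspect the function's formal arguments. This is only a sketch; the object name s is for illustration.

```r
library(ggplot2)

# Names of the first two formal arguments of scale_y_continuous()
head(names(formals(scale_y_continuous)), 2)

# An unnamed second argument is matched to the second formal,
# so sec_axis() is passed by name here
s <- scale_y_continuous("weight", sec.axis = sec_axis(~ . * 1.2, name = "height"))
```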
I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
c <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
c <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I could combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
library(ggplot2)
library(reshape2)  # provides melt()
#Sample data
survey <- data.frame(
often_post = runif(10, 0, 5),
frequent_read = 5 * rbeta(10, 1, 1),
often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
geom_point() +
geom_smooth()
)
You can use + to combine other plots on the same ggplot object. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
Try this:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()
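Since x_var, y1_var and y2_var are placeholders in the snippet above, here is a runnable sketch of the same long-format approach with made-up values.

```r
library(ggplot2)

# Hypothetical data: two series measured at the same x positions
x_var <- 0:5
y1_var <- c(1, 3, 2, 5, 4, 6)
y2_var <- c(2, 2, 3, 3, 5, 4)

# Stack both series into one long data frame, tagged by type
df <- rbind(
  data.frame(x = x_var, y = y1_var, type = "y1"),
  data.frame(x = x_var, y = y2_var, type = "y2")
)

# One line per type, distinguished by colour
p <- ggplot(df, aes(x, y, group = type, col = type)) + geom_line()
p
```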