Error: Discrete value supplied to continuous scale - stat_ma_line - r

This is the data frame and ggplot code I am using:
ROS<- c(0.03, 0.03, 0.03, 0.03, 0.07, 0.07, 0.07, 0.07, 0.07, 0.1, 0.1, 0.1)
wind<- c(0.84, 1.77, 3.5, 6.44, 0.84, 1.77, 3.5, 6.44, 7.55, 0.84, 1.77, 3.5)
rey <- c(31500,66375,131250,241500,31500,66375,131250,241500,283125,31500,66375,131250)
wind250_1 <- c(69.4,69.4,1,1,31.08,37.07,1,1,1,22.8,19.45,1)
lee250_1 <- c(79.84,125.56,93.34,94.42,33.78,49.6,38.95,40.9,39.32,24.2,32.95,27.46)
df<- data.frame(ROS,wind,rey,wind250_1,lee250_1)
library(ggplot2)
library(ggpmisc)
# scientific_10 is a user-defined label function that is not shown in the question
ggplot() +
stat_ma_line(data = df, mapping = aes(rey, lee250_1), method = "RMA",
range.y = "interval", range.x = "interval",
linewidth = 1, fill = "yellow") +
geom_point(data = df, mapping = aes(x = rey, y = lee250_1, colour = factor(ROS)),
size = 3) +
xlab("Re") + ylab(expression(paste(tau~"windward"))) +
scale_x_continuous(trans = 'log10', labels = scientific_10) +
scale_y_continuous(trans = 'log10') +
scale_color_manual(values = c("#0072B2", "#000000", "#E7B800", "#CC79A7")) +
labs(colour = "ROS (m/s)") +
theme_bw()
When I plot with y = wind250_1, the code works without any problem. But when I use y = lee250_1, I get "Error: Discrete value supplied to continuous scale". The variable is numeric (I checked its class), and here are a few things I tried that didn't work: using y = as.numeric(lee250_1) in the ggplot code, renaming the variables, and running the ggplot code without scale_x_continuous(), scale_y_continuous(), and scale_color_manual().
The error is probably related to stat_ma_line(), because when I plotted with geom_line() instead, it worked, but I need to use stat_ma_line(). Any help on how to solve this error is very much appreciated!

You probably have too few points per group (you may need more than 7 points per group); that's why you get the error. I added some fake data, and now it works:
ROS<- c(0.03, 0.03, 0.03, 0.03, 0.07, 0.07, 0.07, 0.07, 0.07, 0.03, 0.03, 0.03, 0.03, 0.07, 0.07, 0.07, 0.07, 0.07)
rey <- c(31500,66375,131250,241500,31500,66375,131250,241500,131250, 31600,66475,131350,241600,31300,66575,132250,242500,283425)
lee250_1 <- c(79.84,125.56,93.34,94.42,33.78,49.6,24.2,32.95, 79.94,122.54,92.34,91.42,32.78,43.6,31.95,44.9,32.32,22.2)
library(ggplot2)
library(ggpmisc)
df<- data.frame(ROS,rey,lee250_1)
ggplot(df, aes(rey, lee250_1)) +
geom_point(aes(colour = factor(ROS))) +
stat_ma_line(method = "RMA",
range.y = "interval", range.x = "interval", fill = 'yellow', linewidth = 1) +
xlab("Re") + ylab((expression(paste(tau~"windward"))))+
scale_x_continuous(trans='log10') +
scale_y_continuous(trans='log10') +
scale_color_manual(values = c("#0072B2", "#000000","#E7B800","#CC79A7")) +
labs(colour = "ROS (m/s)") +
theme_bw()
Created on 2023-01-26 with reprex v2.0.2

It looks like the model fit returns NAs for RMA, and ggplot cannot handle them. See the more detailed answer here: https://github.com/aphalo/ggpmisc/issues/36
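For readers unfamiliar with method = "RMA": stat_ma_line() hands the fit to lmodel2::lmodel2(), where RMA stands for ranged major axis regression. As a minimal base-R sketch, assuming lmodel2's definition with range.x = range.y = "interval" (each variable rescaled to [0, 1], a major axis fitted to the rescaled data, and the slope converted back to the original units), the point estimate can be computed as below. The function name rma_interval is hypothetical, the sketch is applied to log10 values because the log10 scales transform the data before the stat runs, and it does not reproduce the confidence band drawn with fill.

```r
# Minimal sketch of ranged major axis (RMA) regression, assuming lmodel2's
# definition with range.x = range.y = "interval". Point estimates only;
# rma_interval() is a hypothetical helper, not part of ggpmisc or lmodel2.
rma_interval <- function(x, y) {
  # "interval" ranging: rescale each variable to [0, 1]
  rx <- (x - min(x)) / (max(x) - min(x))
  ry <- (y - min(y)) / (max(y) - min(y))
  sxx <- var(rx)
  syy <- var(ry)
  sxy <- cov(rx, ry)
  d <- syy - sxx
  # Major axis (first principal axis) slope of the ranged variables
  b_ranged <- (d + sqrt(d^2 + 4 * sxy^2)) / (2 * sxy)
  # Convert the slope back to the original units via the ratio of ranges
  slope <- b_ranged * (max(y) - min(y)) / (max(x) - min(x))
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

# Data from the question; log10 because the scales use trans = 'log10'
rey <- c(31500, 66375, 131250, 241500, 31500, 66375, 131250, 241500,
         283125, 31500, 66375, 131250)
lee250_1 <- c(79.84, 125.56, 93.34, 94.42, 33.78, 49.6, 38.95, 40.9,
              39.32, 24.2, 32.95, 27.46)
print(rma_interval(log10(rey), log10(lee250_1)))
```

On perfectly linear data the ranged variables coincide, so the ranged slope is 1 and the back-transformed slope equals the true one.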

Related

Plotting of the mean in boxplot before axis log transformation in R

I want to include the mean inside the boxplot, but the mean is not plotted where it should be. If I calculate the mean from the data, it is 16.2, which corresponds to about 1.2 on the log10 scale. I tried various things, e.g., placing the stat_summary() call before or after the scale transformation, but this does not work.
Help is much appreciated!
Yours,
Kristof
Code:
Data:
library(dplyr)
library(ggplot2)
library(scales)
df <- tibble(value = c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16, 0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44, 0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47))
Output:
df %>%
ggplot(aes(x = 0, y = value)) +
geom_boxplot(width = .12, outlier.color = NA) +
stat_summary(fun=mean, geom="point", shape=21, size=3, color="black", fill="grey") +
labs(
x = "",
y = "Particle counts (P/kg)"
) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))
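The mean calculated by stat_summary() is the mean of log10(value), not of value: scale_y_log10() transforms the data before any statistical summary is computed, so stat_summary() only ever sees the logged values. Below I define a new function, my_mean, that back-transforms the values, averages them, and re-applies log10, which places the point at the arithmetic mean.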
library(ggplot2)
library(dplyr)
library(tibble)
library(scales)
df <- tibble(value = c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16,
0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44,
0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47))
# Define the mean function
my_mean <- function(x) {
log10(mean(10^x))
}
df %>%
ggplot(aes(x = 0, y = value)) +
geom_boxplot(width = .12, outlier.color = NA) +
stat_summary(fun=my_mean, geom="point", shape=21, size=3, color="black", fill="grey") +
labs(
x = "",
y = "Particle counts (P/kg)"
) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
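The difference between the two summaries can be checked without plotting. This base-R sketch reuses the question's values and the answer's my_mean; mean_of_logs and log_of_mean are illustrative names. Averaging the logs gives log10 of the geometric mean, which is never larger than log10 of the arithmetic mean, so the default point sits too low whenever the values are spread out.

```r
# Values from the question
value <- c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16, 0.83, 0.17, 0.09, 0.48,
           4.36, 0.83, 0.2, 0.32, 0.44, 0.22, 0.23, 0.89, 0.23, 1.1, 0.62,
           5, 340, 47)

# What stat_summary() receives after scale_y_log10() has transformed the data
logged <- log10(value)

# Default summary: mean of the logs (log10 of the geometric mean)
mean_of_logs <- mean(logged)

# my_mean from the answer: back-transform, average, re-apply log10
my_mean <- function(x) log10(mean(10^x))
log_of_mean <- my_mean(logged)

cat("mean of log10 values:", mean_of_logs, "\n")
cat("log10 of the mean:   ", log_of_mean, "\n")
cat("arithmetic mean:     ", mean(value), "\n")
```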

Scale fill gradient using absolute values

In the following chart, I would like the gradient to be applied to absolute values rather than signed values. For example, rows I and G should be the same shade of red, as their values are -0.75 and 0.75, respectively. By the same token, rows F and E should be the same shade of green, as their values are -0.15 and 0.15, respectively. Can anyone tell me how to do this?
library(dplyr)
library(ggplot2)
data.frame(grp = LETTERS[1:10],
vals = c(0.11, 0.39, -0.06, 0.42, 0.15, -0.15, 0.75, -0.02, -0.75, 0.00)) %>%
ggplot(aes(x = vals, y = grp, fill = vals)) +
geom_col() +
scale_fill_gradient(low = "green", high = "red")
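You could simply map fill = abs(vals), so that values with the same magnitude get the same color regardless of sign (the legend then shows absolute values):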
data.frame(grp = LETTERS[1:10],
vals = c(0.11, 0.39, -0.06, 0.42, 0.15, -0.15, 0.75, -0.02, -0.75, 0.00)) %>%
ggplot(aes(x = vals, y = grp, fill = abs(vals))) +
geom_col() +
scale_fill_gradient(low = "green", high = "red")
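To see why abs(vals) gives matching colors, here is a minimal base-R sketch of how a two-color continuous fill turns values into colors, assuming simple linear interpolation between green and red. grDevices::colorRamp() interpolates in RGB here, whereas ggplot2's scale_fill_gradient() interpolates in CIELAB by default (space = "Lab"), so the hex codes will not match ggplot2's exactly; only the "equal input, equal color" behavior carries over.

```r
grp <- LETTERS[1:10]
vals <- c(0.11, 0.39, -0.06, 0.42, 0.15, -0.15, 0.75, -0.02, -0.75, 0.00)

# Rescale |vals| onto [0, 1], as a continuous scale does with its limits
a <- abs(vals)
pos <- (a - min(a)) / (max(a) - min(a))

# Interpolate from green (0) to red (1); colorRamp() returns 0-255 RGB columns
ramp <- grDevices::colorRamp(c("green", "red"))
fill <- grDevices::rgb(ramp(pos), maxColorValue = 255)
names(fill) <- grp

# E/F and G/I share a magnitude, so each pair gets the same color
print(fill[c("E", "F", "G", "I")])
```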

ggplot legend for color shows dot instead of bar

documentFracation <- c(0.164, 0.196, 0.102, 0.166, 0.145, 0.017,
0.144, 0.258, 0.139, 0.019, 0.155, 0.013,
0.001,0.099,0.007)
tsSTDCommoncrawl <- c(19,21,23,30,33,34,
38,52,54,65,90,123,
180,181,1014)
average_dice <- c(0.495, 0.505, 0.495, 0.615, 0.48, 0.385,
0.5, 0.555, 0.4, 0.33, 0.405, 0.33,
0.19, 0.32, 0.145)
std_dice <- c(0.278, 0.213, 0.240, 0.184, 0.175,
0.240, 0.282, 0.261, 0.262, 0.2188,
0.2191, 0.1989, 0.143, 0.1874, 0.086)
data2 <- data.frame(type=crawls, df=documentFracation, ts=tsSTDCommoncrawl,
avgdice=average_dice)
# generate scatter plot chart by crawl type with size of point corresponding to average dice value
p <- ggplot() +
geom_point(data=data2, aes(x=df, y=ts, size=avgdice, fill = std_dice), shape=21)
p <- p + scale_y_continuous(trans = 'log10')
# add labels besides points
p <- p + geom_text(data=data2, aes(x=df, y=ts, label=avgdice), size=2, hjust=0.5,vjust= -2)
# add scaled color as paired colors from brewer
#p <- p + scale_color_manual(values=colors)
# legend title
p <- p + guides(fill =
guide_legend(title = "",
label.position = "right",
#keywidth=0.25,
#keyheight=0.2,
default.unit="inch")) + theme(legend.position="right")
p <- p + xlab("document fraction between commoncrawl and directcrawl") +
ylab("timestamp interval standard deviation in commoncrawl (log10 scale)") +
ggtitle("Average Dice value with document fraction \nand timestamp interval variation") +
#mar=c(top,right,bottom,left)
theme(plot.margin = unit(c(1,1,1,1),"cm"))+
theme(plot.title = element_text(hjust = 0.5))
print(p)
I am using the above code and the graph I generated looks like this
On the right side of the figure, why is the color legend drawn as dots rather than a bar? And why do I get a legend for avgdice but not for the color (I use std_dice for the fill color)? I want the color legend to look like the one below:
Thanks for help!

What parameters should I adjust in ggplot if my axis titles are being clipped?

Fairly specific question here, but it may help others who are having similar issues.
I have some simple data:
Y = c(0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.08, 0.09, 0.10, 0.13, 0.17, 0.17, 0.21, 0.22,
0.35, 0.47, 0.51, 0.53, 0.54, 0.65, 0.78)
X = c(0.45, 0.26, 0.35, 0.22, 0.37, 0.09, 0.27, 0.51, 0.39, 0.37, 0.37, 0.27, 0.51, 0.36,
0.44, 0.49, 0.63, 0.49, 0.71, 0.56, 0.67)
self1 = data.frame(X, Y)
I also have a simple custom ggplot theme:
plot.theme = theme(axis.text = element_text(size = 26),
axis.title = element_text(size = 28),
plot.title = element_text(size = 36, margin = margin(0, 0, 20, 0)),
panel.grid.minor = element_blank(),
plot.margin = unit(c(0.1, 0.25, 0.5, 0.85), "cm"),
axis.title.y = element_text(margin = margin(0, 15, 0, 0)),
panel.border = element_rect(color = "black", fill = NA, size = 2),
axis.ticks = element_blank(),
legend.title = element_text(size = 26),
legend.text = element_text(size = 18))
When I plot a scatterplot of the data with marginal histograms:
bing = ggplot(self1, aes(x=X, y=Y)) + geom_point(size=3) +
geom_smooth(method = "lm", se=F, color="black") +
plot.theme +
ylab("Observed selfing rate") +
xlab("Observed crossing rate") +
geom_vline(xintercept = 0.42, linetype="longdash") +
geom_hline(yintercept = 0.25, linetype="longdash")
ggExtra::ggMarginal(bing, type = "histogram", bins=6, size=10)
Everything looks great, except that the "g" in "Observed crossing rate" is getting cut off at the bottom of the graph. I have tried fidgeting with every theme parameter I can think of, and I've also tried adjusting several of the arguments to ggMarginal, but I have yet to find the one I need to change to get everything to stay inside the plot area. Can anyone help me out? I suspect the issue ultimately lies with the way ggMarginal is auto-adjusting the sizes of various theme parameters, but that's just a hunch.
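If g is the object returned by ggMarginal(), you can set g$vp <- grid::viewport(height = 0.9, width = 0.9) before drawing it (with print() or grid::grid.draw()). This shrinks the whole figure to 90% of the device, leaving room for the clipped axis title.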
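The viewport trick can be tried with the grid package alone. In this minimal sketch a plain rectangle grob stands in for the ggMarginal() result (which, as far as we know, is a gtable and therefore a grob with the same vp slot); the rectangle and the off-screen pdf(NULL) device are assumptions made only so the example runs anywhere.

```r
library(grid)

# Hypothetical stand-in for the object returned by ggExtra::ggMarginal()
g <- rectGrob(gp = gpar(fill = "grey80", col = "black"))

# Draw the grob inside a viewport covering 90% of the device in each
# direction, leaving a 5% margin on every side for text that was clipped
g$vp <- viewport(height = 0.9, width = 0.9)

pdf(NULL)  # off-screen device so the sketch also runs non-interactively
grid.newpage()
grid.draw(g)
invisible(dev.off())
```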
I haven't found a way to change the plot margins on the object returned by ggMarginal. So, until someone comes along with a better solution, you can modify the code in the ggMarginal function itself. Here's how:
Type ggExtra::ggMarginal (without parentheses) in the console. This prints the code of ggMarginal. Paste this code into a script window and give the function a new name, like my_ggMarginal = [all the ggMarginal code you just pasted in].
Find the following line inside this function:
p <- p + ggplot2::theme(plot.margin = grid::unit(c(0, 0,
0, 0), "null"))
and change it to this:
p <- p + ggplot2::theme(plot.margin = grid::unit(c(0, 0,
1, 0), "lines"))
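Run the code for the new function you just created so that my_ggMarginal is available in your current workspace. Because the pasted code may call ggExtra's internal (unexported) helper functions, you may also need to run environment(my_ggMarginal) <- asNamespace("ggExtra") so that those helpers can be found.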
Run your new function on bing:
my_ggMarginal(bing, type = "histogram", bins=6, size=10)

Repeating categories on lattice plot (likert function in R)

I am a novice R user and am trying to create a plot using the likert function from the HH package. My problem seems to come from repeating category labels. It is easier to show the issue:
library(HH)
responses <- data.frame( Subtable= c(rep('Var1',5),rep('Var2',4),rep('Var3',3)),
Question=c('very low','low','average','high','very high', '<12', '12-14', '15+',
'missing', '<25','25+','missing'), Res1=as.numeric(c(0.05, 0.19, 0.38, 0.24, .07,
0.09, 0.73, 0.17, 0.02, 0.78, 0.20, 0.02)), Res2=as.numeric(c(0.19, 0.04, 0.39,
0.22, 0.06, 0.09, 0.50, 0.16, 0.02, 0.75, 0.46, 0.20)))
likert(Question ~ . | Subtable, responses,
scales=list(y=list(relation="free")), layout=c(1,3),
positive.order=TRUE,
between=list(y=0),
strip=FALSE, strip.left=strip.custom(bg="gray97"),
par.strip.text=list(cex=.6, lines=3),
main="Description of Sample",rightAxis=FALSE,
ylab=NULL, xlab='Percent')
Unfortunately it creates strange spaces that aren't really there, as exhibited in the bottom panel of the following plot:
This seems to come from the repeated category 'missing'. My actual data has several repeats (e.g., 'no', 'other') and whenever they are included I get these extra spaces. If I run the same code but remove the repeated categories then it runs properly. In this case that means changing 'responses' in the code above to responses[! responses$Question %in% 'missing',].
Can someone tell me how to create the graph using all the categories, without getting the 'extra' spaces? Thanks for your help and patience.
-Z
R 3.0.2
HH 3.0-3
lattice 0.20-24
latticeExtra 0.6-26
Here is a solution that uses ggplot2 to create the graphic:
library(ggplot2)
responses <-
data.frame(Subtable = c(rep('Var1',5), rep('Var2',4), rep('Var3',3)),
Question = c('very low','low','average','high','very high',
'<12', '12-14', '15+', 'missing', '<25','25+',
'missing'),
Res1 = as.numeric(c(0.05, 0.19, 0.38, 0.24, .07, 0.09, 0.73,
0.17, 0.02, 0.78, 0.20, 0.02)),
Res2 = as.numeric(c(0.19, 0.04, 0.39, 0.22, 0.06, 0.09, 0.50,
0.16, 0.02, 0.75, 0.46, 0.20)),
stringsAsFactors = FALSE)
responses$Subtable <- factor(responses$Subtable, levels = paste0("Var", 1:3))
responses$Question <-
factor(responses$Question,
levels = c("missing", "25+","<25", "<12", "12-14", "15+",
"very low", "low", "average", "high", "very high"))
ggplot(responses) +
theme_bw() +
aes(x = 0, y = Question) +
# map the series names to color so the legend labels match the bars
geom_errorbarh(aes(xmin = 0, xmax = Res1, color = "Res1")) +
geom_errorbarh(aes(xmin = 0, xmax = -Res2, color = "Res2")) +
facet_wrap( ~ Subtable, ncol = 1, scales = "free_y") +
scale_color_manual(name = "",
values = c(Res1 = "red", Res2 = "blue")) +
scale_x_continuous(breaks = c(-0.5, 0, 0.5),
labels = c("0.5", "0", "0.5")) +
ylab("") + xlab("Percent") +
theme(legend.position = "bottom")
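With real data containing several repeated labels (e.g. 'no', 'other'), typing the levels by hand gets tedious. A minimal base-R sketch, under our own ordering assumption, derives one level per distinct label with unique(), so each repeated label becomes a single factor level. rev() is used because ggplot2 puts the first factor level at the bottom of a discrete y axis. Labels then appear top to bottom in order of first occurrence across the whole data, so 'missing' sits above <25 and 25+ in the Var3 panel; this differs from the hand-written levels above, so reorder question_levels if you need another position.

```r
responses <- data.frame(
  Subtable = c(rep("Var1", 5), rep("Var2", 4), rep("Var3", 3)),
  Question = c("very low", "low", "average", "high", "very high",
               "<12", "12-14", "15+", "missing", "<25", "25+", "missing"),
  stringsAsFactors = FALSE
)

# Labels that occur in more than one panel
repeated <- names(which(table(responses$Question) > 1))
print(repeated)

# One level per distinct label, in reverse order of first occurrence
question_levels <- rev(unique(responses$Question))
responses$Question <- factor(responses$Question, levels = question_levels)
print(levels(responses$Question))
```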
