Question regarding some specific labelling on plots - r

I have code for a plot to which I am trying to add specific labels.
The data code is this:
CombinedMetricScore<-c("zero", "5", "10", "15", "20", "25", "30", "35", "40",
"45", "50", "60", "M11", "MICKEY", "MEANING", "MICKEYTWO",
"MICKEYTHREE", "MIKE", "PASTA", "MCIDandPASS",
"MICKDorPASS", "MIKEDOORPASS", "WOMAC20andPASS" ,"Ideal")
FalsePositiveRate<-c( 0, 0.05, 0.08, 0.12, 0.2, 0.28, 0.19, 0.5, 0.6, 0.7, 0.8, 0.94,
0.11, 0.28, 0.07, 0.5, 0.08, 0.28, 0.04, 0.3, 0.03, 0.03, 0.22, 1 )
TruePositiveRate<-c(0, 0.31, 0.35, 0.46, 0.69, 0.73, 0.59, 0.92, 0.92, 0.96, 1, 1,
0.46, 0.73, 0.42, 0.88, 0.35, 0.73, 0.46, 0.73, 0.46, 0.46, 0.69, 1)
ScoreOrMetric<-c("Metric", "Score", "Score", "Score", "Score", "Score", "Score", "Score", "Score",
"Score", "Score", "Score", "Metric", "Metric", "Metric", "Metric",
"Metric", "Metric", "Metric", "Metric",
"Metric", "Score", "Score", "Metric" )
COMBINEDSCORETABLE<-data.frame(CombinedMetricScore, FalsePositiveRate, TruePositiveRate, ScoreOrMetric)
The plot code is this:
ggplot(COMBINEDSCORETABLE, aes(x = FalsePositiveRate, y = TruePositiveRate, color = ScoreOrMetric)) +
geom_abline(slope = 1, intercept = .5, lwd = 1.5, color = "grey") +
geom_point(size =2, alpha = .8) +
coord_cartesian(xlim=c(0,1), ylim=c(0, 1)) +
coord_fixed() +
geom_text_repel(label = ifelse(TruePositiveRate > .44 + FalsePositiveRate,
yes = CombinedMetricScore, no = ""),
box.padding = 0.5)
Question: I want to add labels for the following 2 points, "5" and "45", but I don't know how to add them to my existing plot code.

You can use an | ("OR") in your ifelse logic. In general, though, I recommend passing only the data you need to geom_text_repel instead of every row (most of which would have "" as the label), so try this:
library(ggplot2)
ggplot(COMBINEDSCORETABLE, aes(x = FalsePositiveRate, y = TruePositiveRate, color = ScoreOrMetric)) +
  geom_abline(slope = 1, intercept = .5, lwd = 1.5, color = "grey") +
  geom_point(size = 2, alpha = .8) +
  # A second coord_*() call replaces the first, so the limits are set in coord_fixed() itself
  coord_fixed(xlim = c(0, 1), ylim = c(0, 1)) +
  ggrepel::geom_text_repel(
    aes(label = CombinedMetricScore),
    box.padding = 0.5,
    # Label only the rows above the threshold, plus the two requested points
    data = ~ subset(., TruePositiveRate > (0.44 + FalsePositiveRate) | CombinedMetricScore %in% c("5", "45")))
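To see which rows the combined condition keeps, here is a minimal base-R sketch on a five-row excerpt of the question's data. The excerpt and the names scores, always_label, keep, labels_ifelse, and labelled_rows are illustrative and not part of the original code. It shows both the ifelse() form with | and the row-subsetting form.

```r
# Five rows copied from the question's data, for illustration only
scores <- data.frame(
  CombinedMetricScore = c("5", "20", "30", "45", "MIKE"),
  FalsePositiveRate   = c(0.05, 0.20, 0.19, 0.70, 0.28),
  TruePositiveRate    = c(0.31, 0.69, 0.59, 0.96, 0.73),
  stringsAsFactors    = FALSE
)

# Points that should always be labelled, whatever the threshold says
always_label <- c("5", "45")

# TRUE where the threshold is passed OR the row is one of the forced labels
keep <- with(
  scores,
  TruePositiveRate > 0.44 + FalsePositiveRate | CombinedMetricScore %in% always_label
)

# Option 1: one label per row, blank where the row should stay unlabelled
labels_ifelse <- ifelse(keep, scores$CombinedMetricScore, "")

# Option 2: keep only the rows that get a label (what the data = argument does)
labelled_rows <- scores[keep, ]

print(labels_ifelse)
print(labelled_rows$CombinedMetricScore)
```

In this excerpt, "30" is the only row that ends up without a label: it fails the threshold and is not in always_label.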

Related

Plotting of the mean in boxplot before axis log transformation in R

I want to include the mean inside the boxplot but apparently, the mean is not located at the position where it is supposed to be. If I calculate the mean from the data it is 16.2, which would equal 1.2 at the log scale. I tried various things, e.g., changing the position of the stat_summary function before or after the transformation but this does not work.
Help is much appreciated!
Yours,
Kristof
Code:
Data:
df <- c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16, 0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44, 0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47) %>% as.tibble()
Output:
df %>%
ggplot(aes(x = 0, y = value)) +
geom_boxplot(width = .12, outlier.color = NA) +
stat_summary(fun=mean, geom="point", shape=21, size=3, color="black", fill="grey") +
labs(
x = "",
y = "Particle counts (P/kg)"
) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))
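scale_y_log10() transforms the data before any statistics are computed, so the mean calculated by stat_summary is the mean of log10(value), not of value. Below I define a new function, my_mean, that back-transforms the values, averages them, and returns the result on the log scale.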
library(ggplot2)
library(dplyr)
library(tibble)
library(scales)
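# as_tibble() replaces the deprecated as.tibble(); the column is named "value"
df <- c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16,
        0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44,
        0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47) %>% as_tibble()
# Define the mean function: x arrives already log10-transformed,
# so undo the log, average, and return to the log scale
my_mean <- function(x) {
  log10(mean(10^x))
}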
df %>%
ggplot(aes(x = 0, y = value)) +
geom_boxplot(width = .12, outlier.color = NA) +
stat_summary(fun=my_mean, geom="point", shape=21, size=3, color="black", fill="grey") +
labs(
x = "",
y = "Particle counts (P/kg)"
) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
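The difference between the two means can be checked without plotting. The following is a small base-R sketch using the data from the question. The names value, mean_of_logs, and log_of_mean are illustrative.

```r
value <- c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16,
           0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44,
           0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47)

# What stat_summary(fun = mean) computes on a log10 axis:
# the mean of the already-transformed values
mean_of_logs <- mean(log10(value))

# What the question expects: the arithmetic mean, placed on the log axis
log_of_mean <- log10(mean(value))

# The helper from the answer, applied to the transformed values
my_mean <- function(x) {
  log10(mean(10^x))
}

print(mean(value))             # arithmetic mean, about 16.2
print(log_of_mean)             # its position on the log10 axis, about 1.2
print(mean_of_logs)            # where the unmodified stat_summary puts the point
print(my_mean(log10(value)))   # matches log_of_mean
```

The mean of the logs corresponds to the geometric mean, which is never larger than the arithmetic mean. That is why the point appears lower than expected.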

Need help to display legend and in similar color code to the data

I am visualizing a time-series plot using ggplot2 and trying to combine the legend. I have tried many options but have not yet gotten my desired output. In one plot the lines are missing the color coding, and in the other the chart is missing the legend. My desired output is a chart with a legend whose colors match the lines.
Here is the script where the lines are missing the color-coding;
library(tidyverse)
deviation <- read_csv("C:/Users/JohnWaweru/Documents/Thesis/Data/yearly_CSVs/Turkana_new/2018_new.csv")
deviation %>% ggplot() +
geom_line(aes(x = as.Date(Month), y = Upper_curve, col = 'red'), linetype = 2) +
geom_line(aes(x = as.Date(Month), y = Lower_curve, col = 'red'), linetype = 2) +
geom_line(aes(x = as.Date(Month), y = Mean_NDVI, col = 'red'), linetype = 1) +
geom_line(aes(x = as.Date(Month), y = NDVI_2018, col = 'green'), linetype = 1) +
scale_color_manual(name = 'Legend',
values = c('Mean_NDVI'= 'red', 'NDVI_2018' = 'green', 'Upper_curve' = 'red', 'Lower_curve' = 'red'),
labels = c('Mean_NDVI', 'NDVI_2018', 'Upper_curve','Lower_curve')) +
ylim(0.2, 0.6) +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
ylab(label = "NDVI") +
xlab(label = "Month") +
ggtitle("NDVI Deviation 2018")
Here is the sample data I am working with:
structure(list(Month = structure(c(18262, 18293, 18322, 18353, 18383, 18414), class = "Date"),
Mean_NDVI = c(0.26, 0.23, 0.25, 0.34, 0.36, 0.32),
NDVI_2018 = c(0.22, 0.23, 0.23, 0.41, 0.46, 0.32),
Mean_Std = c(0.01, 0.01, 0.01, 0.02, 0.02, 0.02),
Std_2018 = c(0.01, 0.01, 0.03, 0.03, 0.04, 0.03),
Upper_curve = c(0.27, 0.24, 0.26, 0.36, 0.38, 0.34),
Lower_curve = c(0.25, 0.22, 0.24, 0.32, 0.34, 0.3)),
row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
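Setting literal colours only works outside the aes() function or when you use scale_colour_identity(). Inside aes(), a string such as 'red' is treated as a category name rather than a colour. Most of the time when you want to label individual line layers, you can set aes(..., colour = "My legend label") and assign the actual colours in scale_colour_manual().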
library(ggplot2)
deviation <- structure(list(
Month = structure(c(18262, 18293, 18322, 18353, 18383, 18414), class = "Date"),
Mean_NDVI = c(0.26, 0.23, 0.25, 0.34, 0.36, 0.32),
NDVI_2018 = c(0.22, 0.23, 0.23, 0.41, 0.46, 0.32),
Mean_Std = c(0.01, 0.01, 0.01, 0.02, 0.02, 0.02),
Std_2018 = c(0.01, 0.01, 0.03, 0.03, 0.04, 0.03),
Upper_curve = c(0.27, 0.24, 0.26, 0.36, 0.38, 0.34),
Lower_curve = c(0.25, 0.22, 0.24, 0.32, 0.34, 0.3)),
row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")
)
ggplot(deviation) +
geom_line(aes(x = Month, y = Upper_curve, colour = 'Upper_curve'), linetype = 2) +
geom_line(aes(x = Month, y = Lower_curve, colour = 'Lower_curve'), linetype = 2) +
geom_line(aes(x = Month, y = Mean_NDVI, colour = 'Mean_NDVI'), linetype = 1) +
geom_line(aes(x = Month, y = NDVI_2018, colour = 'NDVI_2018'), linetype = 1) +
scale_color_manual(
name = 'Legend',
values = c('Mean_NDVI'= 'red', 'NDVI_2018' = 'green',
'Upper_curve' = 'red', 'Lower_curve' = 'red'),
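# Setting appropriate linetypes; legend keys are sorted alphabetically
# (Lower_curve, Mean_NDVI, NDVI_2018, Upper_curve), so the values follow that order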
guide = guide_legend(
override.aes = list(linetype = c(2,1,1,2))
)
) +
ylim(0.2, 0.6) +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
ylab(label = "NDVI") +
xlab(label = "Month") +
ggtitle("NDVI Deviation 2018")
Created on 2021-08-05 by the reprex package (v1.0.0)
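The order of the linetypes in override.aes depends on how the legend keys are sorted. Here is a minimal base-R sketch of that bookkeeping, assuming the default alphabetical ordering that a discrete scale applies to character labels. The names legend_keys, linetype_by_key, legend_order, and linetypes_in_legend_order are illustrative.

```r
# Labels used inside aes(colour = ...) in the four line layers
legend_keys <- c("Upper_curve", "Lower_curve", "Mean_NDVI", "NDVI_2018")

# Desired linetype per label (2 = dashed, 1 = solid), keyed by name
linetype_by_key <- c(Upper_curve = 2, Lower_curve = 2, Mean_NDVI = 1, NDVI_2018 = 1)

# Character values on a discrete scale are ordered alphabetically by default
legend_order <- sort(legend_keys)

# Reorder the linetypes to match the legend before passing them to override.aes
linetypes_in_legend_order <- unname(linetype_by_key[legend_order])

print(legend_order)
print(linetypes_in_legend_order)
```

Deriving the vector this way avoids having to re-count positions by hand if a label is renamed or a layer is added.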

Text labels in scatterplot not overlapping trend line in r (ggplot)

I am trying to create a scatterplot using ggplot. Is there a way to stop my text labels from overlapping the trend line?
I was only able to stop overlapping the text labels from each other.
rownames = c("dummy", "dummy", "dummy", "dummy", "dummy", "dummy","dummy", "dummy", "dummy", "dummy")
corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02)
corr_pred= c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
corr_complete = data.frame(rownames, corr_truth,corr_pred)
plot_corr_complete = ggplot(data = corr_complete, aes(corr_truth, corr_pred)) + geom_point() +
xlim(-0.5,0.7) +
ylim(-0.5,0.7) +
geom_text(label = corr_complete$rownames, nudge_x = 0.08, nudge_y = 0.005, check_overlap = T) +
geom_smooth(method = "lm", se = FALSE, color = "black")
plot_corr_complete
An example using ggrepel. I needed to add some padding to the solution, so the labels did not overlap the trend line.
library(tidyverse)
library(ggrepel)
rownames = c("dummy", "dummy", "dummy", "dummy", "dummy", "dummy","dummy", "dummy", "dummy", "dummy")
corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02)
corr_pred= c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
corr_complete = data.frame(rownames, corr_truth,corr_pred)
plot_corr_complete = ggplot(data = corr_complete, aes(corr_truth, corr_pred)) + geom_point() +
xlim(-0.5,0.7) +
ylim(-0.5,0.7) +
geom_text_repel(label = corr_complete$rownames,point.padding = 0.2,
nudge_y = 0.005, nudge_x = 0.02) +
geom_smooth(method = "lm", se = FALSE, color = "black")
plot_corr_complete
The ggrepel package provides functions that keep text labels from overlapping.
Once you've installed the package, load it before running the following code.
This revised code worked on my machine:
library(ggplot2)
library(ggrepel)
rownames = c("dummy", "dummy", "dummy", "dummy", "dummy", "dummy","dummy", "dummy", "dummy", "dummy")
corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02)
corr_pred= c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23)
corr_complete = data.frame(rownames, corr_truth,corr_pred)
plot_corr_complete = ggplot(data = corr_complete, aes(corr_truth, corr_pred, label = rownames)) + geom_point() +
xlim(-0.5,0.7) +
ylim(-0.5,0.7) +
geom_text_repel() +
geom_smooth(method = "lm", se = FALSE, color = "black")
plot_corr_complete
Hope this helps
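geom_text_repel() repels labels from data points and from each other, not from a line drawn by another layer. One possible workaround, sketched below, is to place extra points with empty labels along the fitted line, since ggrepel keeps repelling from points whose label is "". This is an assumption-laden sketch rather than either answer's method: the names fit, line_pts, and repel_data, and the choice of 50 points, are illustrative.

```r
library(ggplot2)
library(ggrepel)

corr_complete <- data.frame(
  rownames   = rep("dummy", 10),
  corr_truth = c(-0.39, -0.13, 0.28, -0.49, -0.14, 0.52, 0.43, 0.22, -0.29, -0.02),
  corr_pred  = c(-0.41, 0.01, 0.36, -0.38, -0.28, 0.44, 0.26, 0.24, -0.38, -0.23),
  stringsAsFactors = FALSE
)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(corr_pred ~ corr_truth, data = corr_complete)

# Points along the fitted line; the empty label draws nothing but still repels
line_pts <- data.frame(
  rownames   = "",
  corr_truth = seq(min(corr_complete$corr_truth),
                   max(corr_complete$corr_truth),
                   length.out = 50),
  stringsAsFactors = FALSE
)
line_pts$corr_pred <- predict(fit, newdata = line_pts)

# Real points plus the invisible line points, used only by the label layer
repel_data <- rbind(corr_complete, line_pts[, names(corr_complete)])

plot_corr_complete <- ggplot(corr_complete, aes(corr_truth, corr_pred)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  geom_text_repel(data = repel_data, aes(label = rownames), point.padding = 0.2) +
  xlim(-0.5, 0.7) +
  ylim(-0.5, 0.7)

plot_corr_complete
```

More line points push labels away from the line more reliably, but they also give the layout algorithm more to resolve, so the count may need tuning.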

Different colors in ggplot when you upload a file vs. when you create a data of your own in R

So, I have a dataset in .csv format, which is attached here, and I want to make the following graph in ggplot2.
df1 <- read.csv("obesity.csv")
ggplot(df1, aes(x = Levels, y = OR, color = col)) +
geom_point(size = 3) +
geom_errorbar(aes(ymax = UCI, ymin = LCI), width = 0.3) +
coord_flip() +
facet_grid(name ~ ., scales = "free_y", space = "free_y", switch = "y") +
geom_hline(yintercept = 1, size=1, col="brown") +
scale_x_discrete(position = "top") +
theme_bw() +
scale_color_manual(guide = FALSE, values = c("red", "blue", "black", "green", "orange",
"chocolate", "brown", "grey", "tomato", "navyblue", "purple1", "orchid4"))
As you can see, the order of the colors is completely different from what I chose.
However, if I create the data in R by myself, then there is no problem.
df2 <- data.frame(
Levels = as.factor(c("Africa", "Europe", "Latin America", "Middle East", "High",
"Upper middle", "Lower middle", "Secondary", "Post-Secondary", ">= 9 years", "Male",
"> 60 min/day")),
OR = c(0.72, 3.17, 0.51, 0.51, 0.38, 0.94, 1.04, 2.22, 3.49, 2.24, 1.9, 0.44),
LCI = c(0.23, 0.68, 0.09, 0.2, 0.09, 0.29, 0.49, 0.72, 1.33, 1.14, 0.93, 0.19),
UCI = c(2.28,9.5,2.92,1.32,1.62,3.07,2.22,6.77,9.16,4.39,3.87,1.03),
name = as.factor(c("Region", "Region", "Region", "Region", "Income", "Income",
"Income", "Education", "Education", "Age", "Sex", "PAL")),
col = paste("col",1:12, sep=""))
Now, I am trying to understand what the difference between the two is, and how I can set the colors the way I want when I am using a dataset in csv format.
Edit: I changed the variables in df2 to be factors, not characters.
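One thing worth checking here is how the colours are matched to the values of col. The base-R sketch below assumes the col column holds the strings "col1" to "col12", as in df2. Whether this explains the csv case depends on what the file actually contains, which is not shown. The names chosen_colours, levels_in_data, default_order, unnamed_mapping, and named_cols are illustrative.

```r
# Colours in the order they were written in scale_color_manual()
chosen_colours <- c("red", "blue", "black", "green", "orange", "chocolate",
                    "brown", "grey", "tomato", "navyblue", "purple1", "orchid4")

# Values of the col column, in the intended order
levels_in_data <- paste0("col", 1:12)

# Character values are ordered alphabetically, so "col10" comes before "col2"
default_order <- sort(levels_in_data)

# An unnamed values vector is assigned position by position to that sorted order
unnamed_mapping <- setNames(chosen_colours, default_order)

# A named values vector pins each colour to one specific value instead
named_cols <- setNames(chosen_colours, levels_in_data)

print(default_order)
print(unnamed_mapping[["col2"]])  # colour "col2" gets from the unnamed vector
print(named_cols[["col2"]])       # colour "col2" gets from the named vector
```

Passing the named vector, as in scale_color_manual(values = named_cols), makes the assignment independent of how the values happen to be ordered in either data frame.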

Repeating categories on lattice plot (likert function in R)

I am a novice R user and am trying to create a plot using the likert function from the HH package. My problem seems to come from repeating category labels. It is easier to show the issue:
library(HH)
responses <- data.frame( Subtable= c(rep('Var1',5),rep('Var2',4),rep('Var3',3)),
Question=c('very low','low','average','high','very high', '<12', '12-14', '15+',
'missing', '<25','25+','missing'), Res1=as.numeric(c(0.05, 0.19, 0.38, 0.24, .07,
0.09, 0.73, 0.17, 0.02, 0.78, 0.20, 0.02)), Res2=as.numeric(c(0.19, 0.04, 0.39,
0.22, 0.06, 0.09, 0.50, 0.16, 0.02, 0.75, 0.46, 0.20)))
likert(Question ~ . | Subtable, responses,
scales=list(y=list(relation="free")), layout=c(1,3),
positive.order=TRUE,
between=list(y=0),
strip=FALSE, strip.left=strip.custom(bg="gray97"),
par.strip.text=list(cex=.6, lines=3),
main="Description of Sample",rightAxis=FALSE,
ylab=NULL, xlab='Percent')
Unfortunately it creates strange spaces that aren't really there, as exhibited in the bottom panel of the following plot:
This seems to come from the repeated category 'missing'. My actual data has several repeats (e.g., 'no', 'other') and whenever they are included I get these extra spaces. If I run the same code but remove the repeated categories then it runs properly. In this case that means changing 'responses' in the code above to responses[! responses$Question %in% 'missing',].
Can someone tell me how to create the graph using all the categories, without getting the 'extra' spaces? Thanks for your help and patience.
-Z
R 3.0.2
HH 3.0-3
lattice 0.20-24
latticeExtra 0.6-26
Here is a solution using ggplot2 to create the graphic:
library(ggplot2)
responses <-
data.frame(Subtable = c(rep('Var1',5), rep('Var2',4), rep('Var3',3)),
Question = c('very low','low','average','high','very high',
'<12', '12-14', '15+', 'missing', '<25','25+',
'missing'),
Res1 = as.numeric(c(0.05, 0.19, 0.38, 0.24, .07, 0.09, 0.73,
0.17, 0.02, 0.78, 0.20, 0.02)),
Res2 = as.numeric(c(0.19, 0.04, 0.39, 0.22, 0.06, 0.09, 0.50,
0.16, 0.02, 0.75, 0.46, 0.20)),
stringsAsFactors = FALSE)
responses$Subtable <- factor(responses$Subtable, levels = paste0("Var", 1:3))
responses$Question <-
factor(responses$Question,
levels = c("missing", "25+","<25", "<12", "12-14", "15+",
"very low", "low", "average", "high", "very high"))
ggplot(responses) +
  theme_bw() +
  aes(x = 0, y = Question) +
  # Map colour to the series name; with literal "red"/"blue" strings the keys
  # sort as blue, red, which swaps both the colours and the legend labels
  geom_errorbarh(aes(xmax = 0, xmin = Res1, color = "Res1")) +
  geom_errorbarh(aes(xmin = 0, xmax = -Res2, color = "Res2")) +
  facet_wrap( ~ Subtable, ncol = 1, scales = "free_y") +
  scale_color_manual(name = "",
                     values = c(Res1 = "red", Res2 = "blue")) +
  # Res2 is drawn to the left of zero, so both sides are labelled as positive
  scale_x_continuous(breaks = c(-0.5, 0, 0.5),
                     labels = c("0.5", "0", "0.5")) +
  ylab("") + xlab("Percent") +
  theme(legend.position = "bottom")
