How to facet wrap plots with two different factors - r

For example, i have two trials, each with main factors of year and treatment.
I want to plot response to the same plot, preferably Trial one / treatment / year and the same for trial 2.
Closest that i can get is shown on the simple example picture. Basically i get two graphs joined together, one showing effects of treatment in trial 1 and trial 2 with y axis representing content, and other showing effect of year in trial 1 and trial 2 with y axis representing the same content.
Simple example of plot with Si content affected by treatments and year of sampling
Is it possible to facet wrap graphs together, or at least to remove redundant y axis?
Code that i use is:
a <- ggplot(I1, aes(x=fct_reorder(SISTEM, ORDER), y=Si)) + geom_jitter(show.legend=FALSE, width=0.25, color="black", size=0.5) + stat_summary(fun.data = mean_cl_normal, show.legend=FALSE, color="red", size=0.3) + labs(x=NULL,
y="Si (mg / 100 g)") + facet_wrap(~POSKUS, ncol=2, scales="free_x") + theme_classic(base_family = "Palatino Linotype") + theme(axis.text=element_text(colour="black", size=8), axis.title=element_text(colour="black", size=8), axis.text.x=element_text(angle=45, vjust = 1, hjust=1)) + theme(strip.background = element_blank()) + ggplot(I1, aes(x=Leto, y=Si)) + geom_jitter(show.legend=FALSE, width=0.25, color="black", size=0.5) + scale_x_continuous(breaks=c(2016,2017)) + stat_summary(fun.data = mean_cl_normal, show.legend=FALSE, color="red", size=0.3) + labs(x=NULL,
y="") + facet_wrap(~POSKUS + Leto, ncol=4, scales="free_x") + theme_classic(base_family = "Palatino Linotype") + theme(axis.text=element_text(colour="black", size=8), axis.title=element_text(colour="black", size=8), axis.text.x=element_text(angle=45, vjust = 1, hjust=1)) + theme(strip.background = element_blank())`
The solution as given below sorta works with minor tweaks still needed.
Using the provided code, and expanding it to sort the treatments as needed, and changing the order of variables in the facet wrap provided the plot as shown in figure 2. However, facets labels are now showing only 1 and 2 rather than Trial 1 and Trial 2. Furthermore, is it possible to have only one Trial 1 and Trial 2 name for both TREATMENT and YEAR variables?
Added dput:
I2 <- structure(list(Leto = c("2016", "2016", "2016", "2016", "2016",
"2016", "2016", "2016", "2016", "2016", "2016", "2016", "2016",
"2016", "2016", "2017", "2017", "2017", "2017", "2017", "2017",
"2017", "2017", "2017", "2017", "2017", "2017", "2017", "2017",
"2017", "2016", "2016", "2016", "2016", "2016", "2016", "2016",
"2016", "2016", "2016"), POSKUS = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("Trial 1", "Trial 2"), class = "factor"),
SISTEM = structure(c(5L, 5L, 5L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L,
9L), .Label = c("Manure-N0", "Manure-N1", "Manure-N2", "Manure-N3",
"No.org-N0", "No.org-N3", "Straw-N0", "Straw-N1", "Straw-N2",
"Straw-N3"), class = "factor"), ORDER = c(1, 1, 1, 2, 2,
2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3,
4, 4, 4, 5, 5, 5, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3), DUSIK = c(0,
0, 0, 0, 0, 0, 55, 55, 55, 110, 110, 110, 165, 165, 165,
0, 0, 0, 0, 0, 0, 55, 55, 55, 110, 110, 110, 165, 165, 165,
165, 165, 165, 0, 0, 0, 55, 55, 55, 110), Si = c(9.35, 11.6,
9.71, 8.96, 6.13, 7.08, 3.74, 3.72, 4.75, 1.3, 1.82, 3.41,
5.13, 3.41, 3.68, 7.67, 7.48, 6.21, 5.02, 9.46, 7.79, 8.11,
3.59, 8.28, 7.36, 9.69, 9.08, 6.46, 5.48, 7.9, 5.27, 4.06,
4.22, 5.6, 5.92, 6.9, 3.26, 4.45, 3.09, 4.38), P = c(2.62,
2.26, 2.33, 2.57, 3.06, 2.99, 1.71, 2.06, 2.18, 1.48, 1.71,
2.66, 2.24, 2.38, 2.55, 2.49, 2.48, 3.9, 2.65, 1.79, 2.88,
2.54, 3.22, 2.54, 2.88, 2.93, 3.26, 2.09, 3.03, 2.56, 2.43,
2.72, 2.59, 2.58, 3.71, 2.5, 2.45, 2.48, 3.49, 3.31), S = c(1.24,
0.95, 1.07, 1.17, 1.15, 1.15, 0.81, 1.08, 1.07, 0.89, 0.85,
1.15, 1.12, 1.22, 1.24, 1.16, 0.98, 1.32, 1.29, 1.04, 1,
0.9, 1.19, 1.03, 1.14, 1.05, 1.14, 1.1, 1.13, 1.25, 0.92,
1.19, 0.84, 1.27, 1.14, 1.05, 1.29, 1.05, 1.15, 1.02), Cl = c(0.39,
0.31, 0.32, 0.3, 0.39, 0.38, 0.24, 0.26, 0.32, 0.35, 0.3,
0.31, 0.3, 0.32, 0.28, 0.3, 0.24, 0.27, 0.29, 0.28, 0.25,
0.34, 0.38, 0.34, 0.33, 0.31, 0.33, 0.33, 0.31, 0.35, 0.25,
0.25, 0.26, 0.35, 0.35, 0.39, 0.33, 0.25, 0.25, 0.28), K = c(4.47,
4.05, 3.59, 4.18, 4.07, 4.43, 3.12, 3.79, 4.63, 5.02, 4.52,
4.49, 4.64, 4.21, 4.38, 4.27, 4.08, 5.23, 3.66, 3.39, 4.14,
3.99, 4.21, 3.83, 4.19, 4.95, 5.11, 3.44, 4.27, 4.6, 4.99,
4.54, 4.12, 3.82, 5.55, 4.48, 3.7, 3.8, 5.08, 4.47), Ca = c(0.78,
0.68, 0.66, 0.69, 0.77, 0.73, 0.46, 0.6, 0.66, 0.59, 0.61,
0.71, 0.77, 0.58, 0.7, 0.61, 0.79, 0.87, 0.77, 0.69, 0.84,
0.62, 0.77, 0.62, 0.66, 0.71, 0.68, 0.59, 0.67, 0.73, 0.62,
0.69, 0.61, 0.69, 0.8, 0.72, 0.56, 0.6, 0.63, 0.65), Ti = c(78.5,
73.7, 74, 69, 68.9, 52.3, 33.7, 35, 26.6, 41, 50.7, 42.2,
33.6, 38.7, 41.5, 56.9, 64.6, 60.1, 69.4, 65.7, 65.7, 52.6,
42.2, 46.1, 50.8, 44.1, 35.6, 47.3, 39.2, 47.7, 39.6, 40.3,
38.2, 67.9, 52.3, 63.1, 43.4, 35.1, 37.2, 27), Fe = c(56.2,
52.9, 57.1, 48.8, 46.7, 35.1, 45.8, 48.6, 49.6, 71.5, 66,
85.7, 45.6, 70.2, 58.8, 75.6, 85.2, 93.9, 85.7, 68.7, 70.1,
61.2, 60.6, 76.8, 113, 68.5, 74.9, 91.9, 44.4, 104, 62.1,
55.3, 78.5, 75.7, 51.7, 53.2, 49, 74.4, 51.9, 57.6), Zn = c(31.3,
29.9, 28, 27.4, 27.9, 27.7, 19.6, 19.6, 22, 20.6, 23.1, 20.6,
25.1, 22.6, 22.7, 32.5, 35.5, 31.1, 28.6, 29.2, 29.6, 21.8,
29.5, 25, 26.1, 24.7, 20.1, 23.9, 20.3, 24.6, 20.3, 21.1,
26.6, 27.4, 32.6, 30.4, 19.9, 21.8, 24.7, 20.7), Br = c(8.54,
7.65, 6.27, 5.83, 7.25, 6.92, 4.74, 4.79, 4.51, 7.53, 5.02,
4.35, 3.98, 3.64, 4.26, 10, 13.7, 12.7, 7.67, 8.62, 10.1,
2.52, 3.63, 2.7, 2.44, 2.73, 2.49, 5.9, 2.52, 2.56, 6.05,
5.6, 6.98, 7.81, 12.3, 8.11, 5.91, 6.01, 6.15, 5.74), Rb = c(1.95,
1.53, 2.12, 1.44, 2.54, 1.84, 1.62, 2.78, 2.35, 3.24, 3.62,
3.48, 4.74, 3.34, 4.21, 5.43, 3.94, 5.55, 3.01, 2.19, 3.34,
3.55, 5.08, 2.63, 5.44, 4.67, 4.71, 6.52, 2.99, 3.24, 4.19,
3.11, 4.11, 1.57, 1.26, 1.14, 1.95, 2.21, 2.57, 2.41), Sr = c(0.94,
0.97, 0.86, 1.07, 1.19, 1.97, 1.08, 1.23, 1.35, 1.23, 1.17,
1.03, 0.86, 0.96, 0.86, 3.51, 1.94, 3.44, 1.47, 1.95, 2.14,
1.36, 4.22, 2.07, 1.92, 1.8, 2.34, 2.89, 2.13, 2.62, 1.3,
1.16, 1.95, 1, 1.41, 0.77, 1.25, 1.09, 1.37, 1.28), N = c(5.68,
4.93, 4.36, 6.36, 4.68, 5, 4.67, 3.38, 3.33, 3.94, 3.61,
3.52, 3.03, 2.74, 2.7, 8.47, 7.33, 5.82, 8.01, 7.34, 7.12,
5.84, 4.5, 4.17, 3.91, 3.57, 2.35, 3.44, 4.2, 1.94, 2.97,
3.1, 3.42, 5.04, 5.42, 7.35, 3.28, 3.55, 4.2, 3.79), C = c(-29.04,
-28.81, -29.12, -28.91, -29.07, -29.13, -29.16, -29.16, -28.98,
-28.81, -28.74, -28.56, -28.58, -28.33, -28.51, -29.49, -30.45,
-30.34, -29.91, -30.13, -30.86, -30.3, -30.23, -30.46, -29.69,
-29.43, -29.74, -29.75, -29.92, -29.52, -28.44, -28.24, -28.01,
-28.68, -28.77, -29.13, -29.13, -29.41, -29, -28.85)), row.names = c(NA,
40L), class = "data.frame")

One way to achieve your desired result using faceting would be to split your dataframe into two like so:
The first dataframe contains the data by treatment, the second the data by year.
In each of these data frames rename the vars to be plotted on the x-axis to have the same name (I chose ´x). Doing so allows use to have one x-axis for the two different variables. But make sure to convert year or ´Leto to a character.
Add an identifier to each data.frame which could be used for facetting (besides your variable POSKUS).
Splitting the data into two df we need both two geom_jitter and two stat_summary layers.
Finally I added a custom labelled function to facet_wrap to show only the trial labels in the facet strip text.
library(dplyr)
library(ggplot2)
d1 <- I2 %>%
select(x = SISTEM, Si, POSKUS) %>%
mutate(name = "SISTEM", name = factor(name, levels = c("SISTEM", "Leto")))
d2 <- I2 %>%
select(x = Leto, Si, POSKUS) %>%
mutate(name = "Leto", name = factor(name, levels = c("SISTEM", "Leto")))
base <- ggplot(mapping = aes(x = x, y = Si)) +
geom_jitter(data = d1, show.legend = FALSE, width = 0.25, color = "black", size = 0.5) +
stat_summary(data = d1, fun.data = mean_cl_normal, show.legend = FALSE, color = "red", size = 0.3) +
geom_jitter(data = d2, show.legend = FALSE, width = 0.25, color = "black", size = 0.5) +
stat_summary(data = d2, fun.data = mean_cl_normal, show.legend = FALSE, color = "red", size = 0.3) +
labs(
x = NULL,
y = "Si (mg / 100 g)"
) +
#theme_classic(base_family = "Palatino Linotype") +
theme_classic() +
theme(axis.text = element_text(colour = "black", size = 8),
axis.title = element_text(colour = "black", size = 8),
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
theme(strip.background = element_blank())
base +
facet_wrap(~name+POSKUS, nrow = 1, scales = "free_x", labeller = function(d) list(as.character(d$POSKUS)))
#> Warning: Removed 1 rows containing missing values (geom_segment).
EDIT To have only one label for each trial after changing the order of the variables you could make use of ggh4x:: facet_nested_wrap:
base +
ggh4x::facet_nested_wrap(~POSKUS+name, nrow = 1, scales = "free_x", labeller = function(d) list(as.character(d$POSKUS)))
#> Warning: Removed 1 rows containing missing values (geom_segment).

Related

R: How to change atomic vector to recursive?

I have been trying to use the $ function to extract a certain column from a list of data frames, and always get the same error.
For reference, here is the code I used to read the data and summarize it into a list:
data_files <- list.files("D:\\My\\Data\\Path\\")
mctd_list<-list() # create empty list
for (i in 1:length(data_files)){
list[[i]]<-as.data.frame(read.csv2(data_files[i]))
} # fill in data into list
So far, so good, no issues here. But when I try to delete rows with a negative value in a certain column from each data frame, I get the error. Here's the loop for removing my rows:
for(i in 1:length(list)){
list<-list[[i]][list[[i]]$column_x>=0,]
}
Now this doesn't work, and I get the error:
$ operator is invalid for atomic vectors
To check, if the list really was atomic, I checked the following:
is.atomic(list[[i]]) # Output: TRUE
is.recursive(list[[i]]) # Output: FALSE
Apparently the data is really atomic, but I can not find a way to transform it into a recursive. I tried
as.data.frame(list[[i]])
before the loop, but it seems to be doing nothing. Afterwards, the list is still atomic.
Is there something I'm missing, or an easy way to get around the issue? I would like to avoid using [[]] for extraction, because this brings with it a bunch of new problems.
P.s. here the dput() Output:
data <- as.data.frame(t(list[[i]])) # to transform into recursive
> dput(data)
structure(list(V1 = c(44854.79028, 19.19, -0.27, 0.01, 0, 1479.2
), V2 = c(44854.79063, 19.01, -0.27, 0.01, 0, 1478.64), V3 = c(44854.79097,
18.72, -0.21, 0.01, 0, 1477.69), V4 = c(44854.79132, 18.45, -0.27,
0.01, 0, 1476.82), V5 = c(44854.79167, 18.06, -0.32, 0.01, 0,
1475.56), V6 = c(44854.79201, 17.64, -0.32, 0.01, 0, 1474.19),
V7 = c(44854.79236, 8.41, -1.35, 17.05, 19.03, 1461.43),
V8 = c(44854.79271, 4.69, 4.48, 21.48, 21.29, 1451.97), V9 = c(44854.79306,
2.95, 9.21, 25.62, 23.82, 1449.9), V10 = c(44854.7934, 2.22,
13.08, 26.96, 24.44, 1448.52), V11 = c(44854.79375, 2.07,
16.76, 27.83, 25.05, 1449.04), V12 = c(44854.7941, 1.83,
21.3, 28.6, 25.51, 1449.09), V13 = c(44854.79444, 1.91, 27.65,
29.26, 26.1, 1450.41), V14 = c(44854.79479, 2.14, 32.79,
29.68, 26.63, 1452.08), V15 = c(44854.79514, 2.37, 37.56,
29.89, 26.98, 1453.46), V16 = c(44854.79549, 2.41, 43.58,
30.16, 27.23, 1454.07), V17 = c(44854.79583, 2.41, 50.66,
30.39, 27.42, 1454.49), V18 = c(44854.79618, 2.37, 58.82,
30.63, 27.59, 1454.77), V19 = c(44854.79653, 2.26, 67.14,
30.8, 27.65, 1454.63), V20 = c(44854.79688, 2.18, 75.72,
31.01, 27.76, 1454.71), V21 = c(44854.79722, 2.03, 84.84,
31.42, 27.97, 1454.71), V22 = c(44854.79757, 2.18, 93.68,
31.74, 28.35, 1455.96), V23 = c(44854.79792, 2.26, 101.75,
31.8, 28.47, 1456.51), V24 = c(44854.79826, 2.3, 102.45,
31.76, 28.47, 1456.64), V25 = c(44854.79861, 2.3, 101.04,
31.76, 28.47, 1456.62), V26 = c(44854.79896, 2.3, 98.83,
31.74, 28.45, 1456.55), V27 = c(44854.79931, 2.3, 90.89,
31.51, 28.26, 1456.12), V28 = c(44854.79965, 2.11, 83.34,
31.34, 27.97, 1454.93), V29 = c(44854.8, 1.99, 75.46, 31.17,
27.74, 1454.06), V30 = c(44854.80035, 1.99, 66.89, 30.99,
27.58, 1453.68), V31 = c(44854.80069, 2.18, 58.35, 30.97,
27.72, 1454.38), V32 = c(44854.80104, 2.3, 50.29, 30.75,
27.63, 1454.46), V33 = c(44854.80139, 2.37, 43.26, 30.57,
27.53, 1454.44), V34 = c(44854.80174, 2.53, 36.86, 30.33,
27.46, 1454.7), V35 = c(44854.80208, 2.37, 30.09, 29.92,
27, 1453.37), V36 = c(44854.80243, 2.03, 22.85, 29.46, 26.36,
1451.12), V37 = c(44854.80278, 1.79, 16.08, 29.01, 25.81,
1449.38), V38 = c(44854.80313, 1.44, 10.6, 28.54, 25.17,
1447.09), V39 = c(44854.80347, 1.25, 5.7, 26.58, 23.46, 1443.52
), V40 = c(44854.80382, 0.93, 1.5, 24.88, 21.87, 1439.74),
V41 = c(44854.80417, 0.33, -1.03, 0, 0, 1403.66), V42 = c(44854.80451,
0.37, -0.97, 0, 0, 1403.86), V43 = c(44854.80486, 0.41, -1.03,
0, 0, 1404.06), V44 = c(44854.80521, 0.37, -0.97, 0, 0, 1403.86
), V45 = c(44854.80556, 0.37, -1.03, 0, 0, 1403.86), V46 = c(44854.8059,
0.29, -0.97, 0, 0, 1403.47), V47 = c(44854.80625, 0.25, -0.97,
0, 0, 1403.27), V48 = c(44854.8066, 0.17, -0.97, 0, 0, 1402.87
), V49 = c(44854.80694, 0, -1.51, 4.89, 4.7, 1408.6), V50 = c(44854.80729,
-0.08, -1.51, 4.83, 4.64, 1408.13), V51 = c(44854.80764,
-0.12, -1.46, 4.81, 4.62, 1407.91), V52 = c(44854.80799,
-0.16, -1.46, 4.8, 4.6, 1407.69), V53 = c(44854.80833, -0.24,
-1.46, 4.77, 4.56, 1407.24), V54 = c(44854.80868, -0.28,
-1.41, 4.77, 4.56, 1407.05), V55 = c(44854.80903, -0.36,
-1.41, 4.74, 4.52, 1406.6), V56 = c(44854.80938, -0.4, -1.41,
4.72, 4.5, 1406.37), V57 = c(44854.80972, -0.4, -1.41, 4.72,
4.5, 1406.37), V58 = c(44854.81007, -0.45, -1.41, 4.23, 4.05,
1405.51), V59 = c(44854.81042, -0.45, -1.46, 4.21, 4.03,
1405.48), V60 = c(44854.81076, -0.45, -1.41, 4.21, 4.03,
1405.48), V61 = c(44854.81111, -0.45, -1.46, 4.19, 4.01,
1405.45), V62 = c(44854.81146, -0.49, -1.41, 4.19, 4.01,
1405.26), V63 = c(44854.81181, -0.49, -1.41, 4.19, 4.01,
1405.26), V64 = c(44854.81215, -0.49, -1.41, 4.17, 3.99,
1405.23), V65 = c(44854.8125, -0.49, -1.41, 4.17, 3.99, 1405.23
), V66 = c(44854.81285, -0.49, -1.41, 4.17, 3.99, 1405.23
), V67 = c(44854.81319, -0.49, -1.41, 4.17, 3.99, 1405.23
), V68 = c(44854.81354, -0.53, -1.41, 4.18, 3.99, 1405.03
), V69 = c(44854.81389, -0.53, -1.46, 4.18, 3.99, 1405.03
), V70 = c(44854.81424, -0.57, -1.41, 4.16, 3.97, 1404.8),
V71 = c(44854.81458, -0.61, -1.35, 4.16, 3.97, 1404.6), V72 = c(44854.81493,
-0.61, -1.46, 4.16, 3.97, 1404.6), V73 = c(44854.81528, -0.61,
-1.41, 4.14, 3.95, 1404.57), V74 = c(44854.81563, -0.65,
-1.35, 5.12, 4.82, 1405.69), V75 = c(44854.81597, -0.73,
-1.35, 4.86, 4.57, 1404.92), V76 = c(44854.81632, -0.73,
-1.35, 3.91, 3.73, 1403.64), V77 = c(44854.81667, -0.69,
-1.35, 3.53, 3.39, 1403.33), V78 = c(44854.81701, -0.65,
-1.3, 3.57, 3.43, 1403.59), V79 = c(44854.81736, -0.61, -1.35,
0.33, 0.35, 1399.44), V80 = c(44854.81771, -0.53, -1.35,
0.76, 0.79, 1400.43), V81 = c(44854.81806, -0.4, -1.19, 3.81,
3.67, 1405.15), V82 = c(44854.8184, -0.32, -1.08, 3.73, 3.61,
1405.45), V83 = c(44854.81875, -0.24, -1.03, 3.92, 3.79,
1406.11), V84 = c(44854.8191, -0.12, -1.14, 0.24, 0.27, 1401.79
), V85 = c(44854.81944, -0.04, -0.92, 3.29, 3.23, 1406.27
), V86 = c(44854.81979, 0.05, -0.92, 0.39, 0.43, 1402.8),
V87 = c(44854.82014, 0.13, -0.86, 0.28, 0.31, 1403.04), V88 = c(44854.82049,
0.21, -0.7, 0, 0, 1403.07), V89 = c(44854.82083, 0.29, -1.03,
0.66, 0.71, 1404.35), V90 = c(44854.82118, 0.37, -1.03, 0.85,
0.91, 1405), V91 = c(44854.82153, 0.41, -0.97, 1.88, 1.94,
1406.58), V92 = c(44854.82188, 0.49, -1.03, 1.86, 1.92, 1406.93
), V93 = c(44854.82222, 0.53, -1.03, 2.12, 2.18, 1407.48),
V94 = c(44854.82257, 0.61, -1.03, 1.91, 1.98, 1407.59), V95 = c(44854.82292,
0.73, -1.03, 1.69, 1.76, 1407.87), V96 = c(44854.82326, 0.81,
-1.03, 1.92, 2, 1408.58), V97 = c(44854.82361, 0.93, -0.97,
1.74, 1.82, 1408.91), V98 = c(44854.82396, 1.05, -0.97, 1.75,
1.84, 1409.5), V99 = c(44854.82431, 1.13, -1.03, 1.71, 1.81,
1409.82), V100 = c(44854.82465, 1.25, -1.03, 1.72, 1.82,
1410.41), V101 = c(44854825, 1.36, -1.03, 1.68, 1.79, 1410.92
), V102 = c(44854.82535, 1.48, -0.97, 1.55, 1.67, 1411.32
), V103 = c(44854.82569, 1.64, -0.92, 0.5, 0.57, 1410.68),
V104 = c(44854.82604, 1.76, -0.92, 0.02, 0.03, 1410.61),
V105 = c(44854.82639, 1.87, -1.03, 0.01, 0.01, 1411.14),
V106 = c(44854.82674, 2.03, -0.97, 0, 0, 1411.86), V107 = c(44854.82708,
2.18, -0.81, 0, 0, 1412.6), V108 = c(44854.82743, 2.41, -0.86,
0, 0, 1413.69), V109 = c(44854.82778, 2.57, -0.86, 0, 0,
1414.4), V110 = c(44854.82813, 2.8, -0.92, 0, 0, 1415.47),
V111 = c(44854.82847, 3.1, -0.92, 0, 0, 1416.88), V112 = c(44854.82882,
3.4, -1.03, 1.91, 2.15, 1420.76)), class = "data.frame", row.names = c("Date_Time",
"Temp_C", "Depth_m", "Salinity_psu", "Conduct_mScm", "Sound.Velocity_msec"
))
>
Dummy data:
list <- list()
list[[1]] <- data.frame(column_x = rnorm(5,0,1), V2 = rnorm(5,0,1))
list[[2]] <- data.frame(column_x = rnorm(5,0,1), V2 = rnorm(5,0,1))
Correct loop:
for(i in 1:length(list)){
list[[i]] <- list[[i]][list[[i]]$column_x > 0,]
}
Although I'm pretty sure there has to be a more elegant solution using packages for lists like purrr.

Comparing test data and prediction outcome

I trying Logistic regression on a dataset. I have successfully divided my dataset into train and test. The regression model also works fine however when I apply it on my test I only get an outcome for 393 observations when the length of my test dataset is 480. How can I compare and get the mismatch or find out what went wrong?
My data has no NAs.
I am trying to create a confusion matrix.
This is my code:
n=nrow(wine_log)
shuffled=wine_log[sample(n),]
train_indices=1:round(0.7*n)
test_indices=(round(0.7*n)+1):n
#Making a new dataset
train=shuffled[train_indices,]
test=shuffled[test_indices,]
wmodel = glm(final_take~., family = binomial, data=train)
summary(wmodel)
result1 = predict(wmodel, newdata = test, type = 'response')
result1 = ifelse(result > 0.5, 1, 0) - Can someone also explain how will removing this affect the outcome?
result1
> table(result1)
result1
0 1
255 138
> table(test$final_take)
Bad Good
418 62
structure(list(fixed_acid = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4,
7.9, 7.3, 7.8, 7.5), vol_acid = c(0.7, 0.88, 0.76, 0.28, 0.7,
0.66, 0.6, 0.65, 0.58, 0.5), c_acid = c(0, 0, 0.04, 0.56, 0,
0, 0.06, 0, 0.02, 0.36), res_sugar = c(1.9, 2.6, 2.3, 1.9, 1.9,
1.8, 1.6, 1.2, 2, 6.1), chlorides = c(0.076, 0.098, 0.092, 0.075,
0.076, 0.075, 0.069, 0.065, 0.073, 0.071), free_siox = c(11,
25, 15, 17, 11, 13, 15, 15, 9, 17), total_diox = c(34, 67, 54,
60, 34, 40, 59, 21, 18, 102), density = c(0.9978, 0.9968, 0.997,
0.998, 0.9978, 0.9978, 0.9964, 0.9946, 0.9968, 0.9978), pH = c(3.51,
3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35), sulphates = c(0.56,
0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.8), alcohol = c(9.4,
9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5), final_take = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L), .Label = c("Bad", "Good"
), class = "factor")), row.names = c(NA, -10L), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"),
Your line of code here:
result1 = ifelse(result > 0.5, 1, 0)
Should be referencing result1 in the ifelse statement. I'm guessing that result is another object you have in your environment that isn't 480 rows.
So you should use this instead.
result1 = ifelse(result1 > 0.5, 1, 0)
You also asked what this line of code is doing. It's basically a threshold for your predictions from the glm model. If the prediction from the model is greater than 0.50, then you are translating the prediction to a "1". If it's less than or equal to 0.50 then you are translating that prediction to a "0". It's a way to convert a probability to a TRUE/FALSE or 1/0.

Switching colors in ggplot

I have a pretty simple question. I want the boxes that are currently red to be blue and boxes that are blue to be red. I would like for the legend to stay exactly the way it is at the moment. How can I best achieve this?
labels1 <- c("Male sex", "Male sex", "Age", "Age", "Body mass index", "Body mass index" , "SBP", "SBP", "Black", "Black", "Asian", "Asian", "Deprivation", "Deprivation")
labels1 <- factor(labels1, levels = labels1)
Joint <- c("Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee")
Joint <- factor(Joint, levels = Joint)
#rr <- c(1.02, 0.79, 2.18, 2.45, 2.93, 1.70, 0.98, 0.98, 0.98, 0.42, 1.32, 0.26, 0.99, 0.99)
#rrlow <- c(0.98, 0.76, 2.11, 2.36, 2.83, 1.62, 0.97, 0.97, 0.83, 0.31, 1.14, 0.19, 0.98, 0.98)
#rrhigh <- c(1.06, 0.83, 2.25, 2.54, 3.03, 1.78, 0.99, 1.00, 1.17, 0.57, 1.52, 0.37, 1.00, 1.00)
rr <- c(0.79, 1.02, 2.45, 2.18, 1.70, 2.93, 0.98, 0.98, 0.42, 0.98, 0.26, 1.32, 0.99, 0.99)
rrlow <- c(0.76, 0.98, 2.36, 2.11, 1.62, 2.83, 0.97, 0.97, 0.31, 0.83, 0.19, 1.14, 0.98, 0.98)
rrhigh <- c(0.83, 1.06, 2.54, 2.25, 1.78, 3.03, 1.00, 0.99, 0.57, 1.17, 0.37, 1.52, 1.00, 1.00)
x <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13)
rr <- rr[x]
rrlow <- rrlow[x]
rrhigh <- rrhigh[x]
forestdf <- data.frame(labels1, rr, rrhigh, rrlow, Joint)
#fplottable <- data.frame(labels1, figure1table[c(-16,-17), "rrfull"])
#fplottable <- data.frame(labels1, figure1table[c(-16,-17), "rrfull"])
#colors
dotCOLS = c("red3", "dodgerblue4")
barCOLS = c("red3", "dodgerblue4")
forestdf$color <- c(rep(c("white", "gray95"), 7))
p <- ggplot(forestdf, aes(x=rr, y=labels1, xmin=rrlow, xmax=rrhigh))+
geom_pointrange(shape=22, position = position_dodge(width = 0.75), aes(col = Joint, fill = Joint))+
geom_vline(xintercept = 1, linetype=3)+
#scale_colour_identity() +
xlab("Adjusted Hazard Ratio with 95% Confidence Interval")+theme_bw()+scale_y_discrete(limits=rev(labels1))+
scale_x_log10(limits = c(0.125, 4), breaks = c(0.125, 0.25, 0.5, 1, 2, 4), labels=c("0.125", "0.25", "0.5", "1", "2", "4"), expand = c(0,0))+
theme(axis.title.y=element_blank(), axis.text=element_text(size=11, color = "black"))+
theme(plot.margin = margin(30, 5, 10, 12)) +
scale_fill_manual(values = rev(dotCOLS))+scale_color_manual(values = rev(barCOLS))
p
In both your scale_fill_manual and scale_color_manual you are reversing the order of the color values you specify. If you drop the rev around the values the colors will switch. Is this what you want to achieve?
**labels1 <- c("Male sex", "Male sex", "Age", "Age", "Body mass index", "Body mass index" , "SBP", "SBP", "Black", "Black", "Asian", "Asian", "Deprivation", "Deprivation")
labels1 <- factor(labels1, levels = labels1)
Joint <- c("Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee", "Hip", "Knee")
Joint <- factor(Joint, levels = Joint)
#rr <- c(1.02, 0.79, 2.18, 2.45, 2.93, 1.70, 0.98, 0.98, 0.98, 0.42, 1.32, 0.26, 0.99, 0.99)
#rrlow <- c(0.98, 0.76, 2.11, 2.36, 2.83, 1.62, 0.97, 0.97, 0.83, 0.31, 1.14, 0.19, 0.98, 0.98)
#rrhigh <- c(1.06, 0.83, 2.25, 2.54, 3.03, 1.78, 0.99, 1.00, 1.17, 0.57, 1.52, 0.37, 1.00, 1.00)
rr <- c(0.79, 1.02, 2.45, 2.18, 1.70, 2.93, 0.98, 0.98, 0.42, 0.98, 0.26, 1.32, 0.99, 0.99)
rrlow <- c(0.76, 0.98, 2.36, 2.11, 1.62, 2.83, 0.97, 0.97, 0.31, 0.83, 0.19, 1.14, 0.98, 0.98)
rrhigh <- c(0.83, 1.06, 2.54, 2.25, 1.78, 3.03, 1.00, 0.99, 0.57, 1.17, 0.37, 1.52, 1.00, 1.00)
x <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13)
rr <- rr[x]
rrlow <- rrlow[x]
rrhigh <- rrhigh[x]
forestdf <- data.frame(labels1, rr, rrhigh, rrlow, Joint)
#fplottable <- data.frame(labels1, figure1table[c(-16,-17), "rrfull"])
#fplottable <- data.frame(labels1, figure1table[c(-16,-17), "rrfull"])
#colors
dotCOLS = c("red3", "dodgerblue4")
barCOLS = c("red3", "dodgerblue4")
forestdf$color <- c(rep(c("white", "gray95"), 7))
p <- ggplot(forestdf, aes(x=rr, y=labels1, xmin=rrlow, xmax=rrhigh))+
geom_pointrange(shape=22, position = position_dodge(width = 0.75), aes(col = Joint, fill = Joint))+
geom_vline(xintercept = 1, linetype=3)+
#scale_colour_identity() +
xlab("Adjusted Hazard Ratio with 95% Confidence Interval")+theme_bw()+scale_y_discrete(limits=rev(labels1))+
scale_x_log10(limits = c(0.125, 4), breaks = c(0.125, 0.25, 0.5, 1, 2, 4), labels=c("0.125", "0.25", "0.5", "1", "2", "4"), expand = c(0,0))+
theme(axis.title.y=element_blank(), axis.text=element_text(size=11, color = "black"))+
theme(plot.margin = margin(30, 5, 10, 12)) +
scale_fill_manual(values = dotCOLS) +
scale_color_manual(values = barCOLS)
p

How to manually specify legend text/color in this ggplot composed of several different geoms?

Please, find my data sample ndd below.
Question: how can I add a customized legend to this ggplot composed of different geoms?
I have tried approaching as described in this thread, but wihtout luck.
I have produced this plot:
However, no matter what I try, I cannot manually change the legend. I have tried show.legend=FALSE and adding "an extra" fake color in geom_area, which did not work or, at least, I did it wrong.
I would like the legend to look like this:
The color order correspond to c("#E1B930", "#2C77BF","#E38072","#6DBCC3").
Obviously, I tried specifying that in cols. However, it then changes the color order in geom_area to start with "#E1B930" (the orange). The one geom_point with color placed at geom_point(x=0,y=18.3) should have "#E1B930" (the orange), as currently in the picture/code, whereas geom_area comes in this color order: c("#2C77BF","#E38072","#6DBCC3").
To me, the tricky part is that the four groups in the legend should have the color order as described: c("#E1B930", "#2C77BF","#E38072","#6DBCC3")
Please, feel free to improve or optimize my script. I am completely new to tidyverse and I am eager to learn.
Thank you in advance.
cols = c("#2C77BF", "#E38072","#6DBCC3", "grey40")
as.data.frame(approx(ndd$lnd, ndd$y, xout=c(ndd$lnd, 8.001, 15.00001,100))) %>%
set_names(c("lnd", "y")) %>%
mutate(xcut = cut(lnd, c(0,8,15,100), include.lowest=TRUE)) %>%
ggplot(aes(lnd, y)) +
geom_area(aes(fill=xcut, color=c("f")), alpha=0.2) +
geom_line(size=6,color="white") +
geom_line(size=2, alpha=.9) +
geom_segment(aes(x=0,y=0,xend=8,yend=0),color="#2C77BF", size=1.1) +
geom_segment(aes(x=8,y=0,xend=15,yend=0),color="#E38072", size=1.1) +
geom_segment(aes(x=15,y=0,xend=35,yend=0),color="#6DBCC3", size=1.1) +
geom_segment(aes(x = 0, y = ndd$y[1], xend = 0, yend = -0.2), lty="solid", size=1.2, color="#E1B930") +
geom_point(aes(x = 0, y = ndd$y[1]), size=5, shape=20, col="#E1B930", alpha=0.5) +
geom_point(aes(x=8,y=0.337),color="black",shape=20, size=5) +
geom_point(aes(x=15,y=0.475),color="black",shape=20, size=5) +
annotate("text", x = 7.8, y = 0.35, label = "15%-point increase",hjust=1,
cex=3.28, vjust=0.5, fontface=2, col="darkgrey") +
annotate("text", x = 14.8, y = 0.493, label = "15%-point increase",hjust=1,
cex=3.28, vjust=0.5, fontface=2, col="darkgrey") +
scale_fill_manual(values=cols, breaks = c("f"), name="") +
scale_colour_manual(values=cols, breaks = c("f"), name="") +
scale_x_continuous(name="LND", breaks=seq(0,100,by=5),
label=c(paste0("LND: 0% \nas baseline"), paste0(seq(5,100,5),"%")), limits=c(0,35)) +
scale_y_continuous(name = "5-yrs risk of death", breaks = seq(0,1,by=.1),
label=paste0(seq(0,100,10),"%")) +
coord_cartesian(ylim=c(0,1)) +
theme(axis.text.x = element_text(colour="grey20",size=11, color=c("#E1B930",rep("grey20",15))),
plot.title = element_text(color = "grey20", size = 13,face="bold",hjust = 0.5))
My data
ndd <- structure(list(y = c(0.183, 0.185, 0.188, 0.191, 0.193, 0.196,
0.199, 0.202, 0.205, 0.208, 0.211, 0.214, 0.217, 0.22, 0.223,
0.226, 0.229, 0.232, 0.235, 0.237, 0.24, 0.243, 0.245, 0.248,
0.25, 0.253, 0.255, 0.257, 0.259, 0.261, 0.263, 0.265, 0.267,
0.269, 0.27, 0.272, 0.273, 0.275, 0.276, 0.278, 0.279, 0.28,
0.281, 0.283, 0.284, 0.285, 0.286, 0.287, 0.288, 0.289, 0.29,
0.292, 0.293, 0.294, 0.295, 0.296, 0.297, 0.299, 0.3, 0.301,
0.302, 0.304, 0.305, 0.307, 0.308, 0.31, 0.311, 0.313, 0.315,
0.316, 0.318, 0.32, 0.322, 0.323, 0.325, 0.327, 0.329, 0.331,
0.333, 0.335, 0.337, 0.339, 0.341, 0.343, 0.345, 0.347, 0.349,
0.351, 0.353, 0.355, 0.358, 0.36, 0.362, 0.364, 0.366, 0.368,
0.371, 0.373, 0.375, 0.377, 0.379, 0.382, 0.384, 0.386, 0.388,
0.39, 0.393, 0.395, 0.397, 0.399, 0.401, 0.403, 0.405, 0.407,
0.41, 0.412, 0.414, 0.416, 0.418, 0.42, 0.422, 0.424, 0.426,
0.428, 0.43, 0.432, 0.434, 0.436, 0.438, 0.439, 0.441, 0.443,
0.445, 0.447, 0.449, 0.451, 0.452, 0.454, 0.456, 0.458, 0.46,
0.461, 0.463, 0.465, 0.467, 0.468, 0.47, 0.472, 0.473, 0.475,
0.477, 0.478, 0.48, 0.481, 0.483, 0.485, 0.486, 0.488, 0.489,
0.491, 0.492, 0.494, 0.495, 0.497, 0.498, 0.5, 0.501, 0.503,
0.504, 0.505, 0.507, 0.508, 0.51, 0.511, 0.512, 0.514, 0.515,
0.516, 0.517, 0.519, 0.52, 0.521, 0.522, 0.524, 0.525, 0.526,
0.527, 0.528, 0.53, 0.531, 0.532, 0.533, 0.534, 0.535, 0.536,
0.537, 0.539, 0.54, 0.541, 0.542, 0.543, 0.544, 0.545, 0.546,
0.547, 0.548, 0.549, 0.55, 0.551, 0.551, 0.552, 0.553, 0.554,
0.555, 0.556, 0.557, 0.558, 0.559, 0.559, 0.56, 0.561, 0.562,
0.563, 0.563, 0.564, 0.565, 0.566, 0.566, 0.567, 0.568, 0.569,
0.569, 0.57, 0.571, 0.572, 0.572, 0.573, 0.574, 0.574, 0.575,
0.576, 0.576, 0.577, 0.577, 0.578, 0.579, 0.579, 0.58, 0.58,
0.581, 0.582, 0.582, 0.583, 0.583, 0.584, 0.584, 0.585, 0.585,
0.586, 0.586, 0.587, 0.587, 0.588, 0.588, 0.589, 0.589, 0.59,
0.59, 0.591, 0.591, 0.592, 0.592, 0.593, 0.593, 0.593, 0.594,
0.594, 0.595, 0.595, 0.595, 0.596, 0.596, 0.597, 0.597, 0.597,
0.598, 0.598, 0.599, 0.599, 0.599, 0.6, 0.6, 0.6, 0.601, 0.601,
0.602, 0.602, 0.602, 0.603, 0.603, 0.603), lnd = c(0, 0.1, 0.2,
0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5,
1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8,
2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4, 4.1,
4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.1, 5.2, 5.3, 5.4,
5.5, 5.6, 5.7, 5.8, 5.9, 6, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7,
6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8,
8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9, 9.1, 9.2, 9.3,
9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10, 10.1, 10.2, 10.3, 10.4, 10.5,
10.6, 10.7, 10.8, 10.9, 11, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6,
11.7, 11.8, 11.9, 12, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7,
12.8, 12.9, 13, 13.1, 13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8,
13.9, 14, 14.1, 14.2, 14.3, 14.4, 14.5, 14.6, 14.7, 14.8, 14.9,
15, 15.1, 15.2, 15.3, 15.4, 15.5, 15.6, 15.7, 15.8, 15.9, 16,
16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 17, 17.1,
17.2, 17.3, 17.4, 17.5, 17.6, 17.7, 17.8, 17.9, 18, 18.1, 18.2,
18.3, 18.4, 18.5, 18.6, 18.7, 18.8, 18.9, 19, 19.1, 19.2, 19.3,
19.4, 19.5, 19.6, 19.7, 19.8, 19.9, 20, 20.1, 20.2, 20.3, 20.4,
20.5, 20.6, 20.7, 20.8, 20.9, 21, 21.1, 21.2, 21.3, 21.4, 21.5,
21.6, 21.7, 21.8, 21.9, 22, 22.1, 22.2, 22.3, 22.4, 22.5, 22.6,
22.7, 22.8, 22.9, 23, 23.1, 23.2, 23.3, 23.4, 23.5, 23.6, 23.7,
23.8, 23.9, 24, 24.1, 24.2, 24.3, 24.4, 24.5, 24.6, 24.7, 24.8,
24.9, 25, 25.1, 25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9,
26, 26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9, 27,
27.1, 27.2, 27.3, 27.4, 27.5, 27.6, 27.7, 27.8, 27.9, 28, 28.1,
28.2, 28.3, 28.4, 28.5, 28.6, 28.7, 28.8, 28.9, 29, 29.1, 29.2,
29.3, 29.4, 29.5, 29.6, 29.7, 29.8, 29.9, 30)), row.names = c(NA,
-301L), class = c("data.table", "data.frame"))
Here a possible solution to create your legend of interest.
Basically, I add a group to your approx and cut functions in order to generate 4 groups instead of 3. Then, using scale_fill_manual, it is much easier to get the desired legend.
In order to avoid the repetition of geom_segment, I generate a dataframe that will hold coordinates for each segment based on each groups.
cols_group = c("#E1B930","#2C77BF", "#E38072","#6DBCC3")
DF <- as.data.frame(approx(ndd$lnd, ndd$y, xout=c(ndd$lnd, 0.1,8.001, 15.00001,100))) %>%
set_names(c("lnd", "y")) %>% slice(.,1:nrow(ndd)) %>%
mutate(xcut = cut(lnd, c(0,0.1,8,15,100), include.lowest=TRUE))
segment_df <- DF %>% group_by(xcut) %>%
summarise(xmin = min(lnd, na.rm = TRUE),
xmax = max(lnd, na.rm = TRUE))
Then, I integrate these two dataframe into your ggplot code as follow:
ggplot(DF,aes(lnd, y)) +
geom_area(aes(fill=xcut), alpha=0.2) +
geom_line(size=6,color="white") +
geom_line(size=2, alpha=.9) +
scale_fill_manual(name = "", values = cols_group, labels = paste("Group",1:4))+
geom_segment(data = segment_df,
aes(x = xmin, xend = xmax, y = 0, yend = 0, color = xcut), size = 1.1, show.legend = FALSE)+
geom_segment(aes(x = 0, y = ndd$y[1], xend = 0, yend = 0), lty="solid", size=1.2, color="#E1B930") +
scale_color_manual(values = cols_group)+
geom_point(aes(x = 0, y = ndd$y[1]), size=5, shape=20, col="#E1B930", alpha=0.5) +
geom_point(aes(x=8,y=0.337),color="black",shape=20, size=5) +
geom_point(aes(x=15,y=0.475),color="black",shape=20, size=5) +
annotate("text", x = 7.8, y = 0.35, label = "15%-point increase",hjust=1,
cex=3.28, vjust=0.5, fontface=2, col="darkgrey") +
annotate("text", x = 14.8, y = 0.493, label = "15%-point increase",hjust=1,
cex=3.28, vjust=0.5, fontface=2, col="darkgrey") +
scale_x_continuous(name="LND", breaks=seq(0,100,by=5),
label=c(paste0("LND: 0% \nas baseline"), paste0(seq(5,100,5),"%")), limits=c(0,max(ndd$lnd))) +
scale_y_continuous(name = "5-yrs risk of death", breaks = seq(0,1,by=.1),
label=paste0(seq(0,100,10),"%")) +
coord_cartesian(ylim=c(0,1)) +
theme(axis.text.x = element_text(colour="grey20",size=11, color=c("#E1B930",rep("grey20",15))),
plot.title = element_text(color = "grey20", size = 13,face="bold",hjust = 0.5),
legend.position = "bottom") +
guides(fill = guide_legend(override.aes = list(fill = cols_group, color = cols_group, lwd = 1)))
Does it answer your question ?
EDIT: Adding color borer for each box in legends
As asked by OP, you can add border color to each box of the legend by using guides function and following arguments:
guides(fill = guide_legend(override.aes = list(fill = cols_group, color = cols_group, lwd = 1)))
lwd control the size of the border while color and fill control the border color and the filling of each box.

How do I plot a linear regression line in a specified bin in a histogram?

So we are trying to determine speciation rate as a function of animal weight. Animal weight follows a gaussian distribution when they are plotted altogether; hence, we only want to fit the regression line in the decreasing trend of the histogram. Specifically, the line should start from x = 2.1 and y = 3.0. Fig. 1 is my current plot using the code below, while Fig. 2 is the outcome I would like to acquire (superimposed line via paint), which I don't know how to do. Any help on the matter will be greatly appreciated.
Attached is my code:
x.log = c(-2.9, -2.7, -2.5, -2.3, -2.1, -1.9, -1.7, -1.5, -1.3, -1.1,
-0.9,-0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.5, 0.7, 0.9, 1.1,
1.3, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7,
3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1, 5.3, 5.5, 5.7, 5.9, 6.1,
6.3, 6.5,6.9, 7.1, 7.3, 7.5, 7.7, 7.9)
y.log = c(0, 0, 0, 0.47, 0.60, 0.95, 1.14, 1.38, 1.68, 1.79, 2.10, 2.26,
2.29, 2.39, 2.48, 2.52, 2.79, 2.68, 2.80, 2.84, 2.96, 2.92,
2.91, 3.01, 2.95, 3.05, 2.94, 2.96, 2.98, 2.83, 2.85, 2.83,
2.71, 2.63, 2.61, 2.57, 2.37, 2.26, 2.17, 1.99, 1.87, 1.74,
1.62, 1.36, 1.30, 1.07, 1.20, 0.90, 0.30, 0.69, 0.30, 0.47, 0
0.30, 0)
# plot the histogram
names(log.nspecies) = logbio
log.nspecies = log.nspecies[order (as.numeric(names(log.nspecies)))]
xpos = barplot(log.nspecies, las = 2, space = 0, col = 'red',
xlab = 'ln Weight', ylab = 'ln Number of species')

Resources