R: ggplot2 multiple regression lines grouped by variable - r

I have a dataframe (sample below) with 3 columns. My goal is to have the variable "Return" on the y-axis and "BetaRealized" on the x-axis. Based on that, I would like to have two regression lines grouped by "SML" e.g. one regression line for the two "Theoretical" values and one for the 10 "Empirical" values. Preferably I would like to use ggplot2.
I've looked through several other questions but I wasn't able to find one that fits my case. As I am very new to R, I would greatly appreciate any help. Feel free to help me improve my question for future users if necessary.
Reproducible data sample:
structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"),
Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487,
0.576898009418581, 0.684024167075167, 0.763551381826944,
0.833875797322081, 0.902738972263857, 0.976227211834564,
1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588,
0)), class = "data.frame", row.names = c(NA, -12L))

Following AntoniosK's comment, the solution is to use geom_smooth with color mapped to SML inside aes(). First, turn your sample data into a data frame:
df<-data.frame(structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"),
Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487,
0.576898009418581, 0.684024167075167, 0.763551381826944,
0.833875797322081, 0.902738972263857, 0.976227211834564,
1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588,
0)), class = "data.frame", row.names = c(NA, -12L)))
Then call ggplot like this:
library(ggplot2)
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+geom_smooth(method=lm, se=FALSE)
the output will be this one: graph
Additionally, you can add the regression equation using stat_regline_equation() from the ggpubr package:
library(ggpubr)
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+stat_smooth(method=lm, se=FALSE)+
stat_regline_equation()
Finally, depending on your objective, it may be useful to use facet_wrap to separate the categories into panels:
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+
stat_smooth(method=lm, se=FALSE)+ facet_wrap(~SML)+
stat_regline_equation()
The image will look like this: graph2
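As a cross-check on the plotted lines, the per-group fits that geom_smooth(method = lm) draws can be reproduced with base R's lm(). This is only a sketch: the data frame is rebuilt from the question's values without going through dput(), and the name fits is introduced here purely for illustration.

```r
# Question's data, rebuilt as a plain data frame
df <- data.frame(
  SML = factor(c(rep("Empirical", 10), rep("Theoretical", 2))),
  Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
             0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
             0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
             0.00704131096883141, 0.00471917614445859, 0),
  BetaRealized = c(0.42574984058487, 0.576898009418581, 0.684024167075167,
                   0.763551381826944, 0.833875797322081, 0.902738972263857,
                   0.976227211834564, 1.06544414896672, 1.19436401770255,
                   1.50932083346054, 0.893219438045588, 0)
)

# Fit one simple linear regression per level of SML
fits <- lapply(split(df, df$SML), function(d) {
  coef(lm(Return ~ BetaRealized, data = d))
})

# One row per group: intercept and slope of that group's line
print(do.call(rbind, fits))
```

With only two Theoretical points the fit is exact, so that line simply passes through both of them (the origin and the second point).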

Related

error with stat_compare_means and multiple groups

I would like to label my boxplots with p-values.
Here is my code:
ggplot(df_annot,aes(x=Insect,y=index,fill=Fungi))+geom_boxplot(alpha=0.8)+
geom_point(aes(fill=Fungi),size = 3, shape = 21,position = position_jitterdodge(jitter.width = 0.02,jitter.height = 0))+
facet_wrap(~Location,scales="free" )+
stat_compare_means(aes(group="Insect"))+
guides(fill=guide_legend("M. robertii")) +
scale_x_discrete(labels= c("I+","I-","soil alone"))+
ylab(index_name)+
theme(plot.title = element_text(size = 18, face = "bold"))+
theme(axis.text=element_text(size=14),
axis.title=element_text(size=14)) +
theme(legend.text=element_text(size=14),
legend.title=element_text(size=14)) +
theme(strip.text.x = element_text(size = 14))
Here is the error message that I'm getting:
Warning messages:
1: Unknown or uninitialised column: 'p'.
2: Computation failed in stat_compare_means(): argument "x" is missing, with no default
3: Unknown or uninitialised column: 'p'.
4: Computation failed in stat_compare_means(): argument "x" is missing, with no default
I've tried moving the aes() from the main ggplot call to the boxplot call, and I've tried different inherit.aes settings in stat_compare_means().
I've also tried subsetting the root section first and plotting the sections separately, but I get the same error.
Any help is appreciated.
thanks
here is my data:
> dput(df_annot)
structure(list(Location = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Root", "Rhizospheric Soil"
), class = "factor"), Bean = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Bean", "No bean"), class = "factor"),
Fungi = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L), .Label = c("M+", "M-"), class = "factor"), Insect = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Insect",
"NI"), class = "factor"), index = c(2.90952191983974, 3.19997588762484,
2.96753469534499, 2.93030877512644, 2.72220793003196, 3.09008037591454,
2.63687890737919, 2.73583925812843, 3.06766793411045, 3.26431040286099,
3.03361194852963, 2.9181623054061)), row.names = c("S-B1",
"S-B2", "S-B3", "S-BF-1", "S-BF-2", "S-BF-3", "S-BFi-1", "S-BFi-2",
"S-BFi-3", "S-Bi-1", "S-Bi-2", "S-Bi-3"), class = "data.frame")
A possible and easy fix is to pass the bare variable name rather than a quoted string (i.e. a character constant) in stat_compare_means(), so the call should look like this:
stat_compare_means(aes(group=Insect))
A working example using ggboxplot() is as follows:
library(ggpubr)
boxplot <- ggboxplot(ToothGrowth, x = "dose", y = "len", add = "jitter",
color = "supp", group="supp", palette = "jco", legend.title="Supplier")
boxplot <- boxplot + stat_compare_means(aes(group=supp), label = "p.signif", method="wilcox.test", hide.ns=TRUE, paired=FALSE)
print(boxplot)
The above example produces a warning message, but I do not know how to improve the code to remove it:
`cols` is now required.
Please use `cols = c(p)`

Have two colour scales ggplot [duplicate]

This question already has answers here:
Assign color to 2 different geoms and get 2 different legends
(3 answers)
Closed 4 years ago.
I am trying to have separate colors for my lines and points. My data is split by Arm, so at each time-point there should be two dots and two lines connecting them to the previous and next time-points.
I can get both the line and dot colors to change together, but I would like the lines to be a different colour, still based on Arm. That is, I want the lines to be light blue for Arm=1 and yellow for Arm=2, but the dots to stay the color shown below. Is this possible with ggplot?
Any help would be much appreciated.
What I have:
Code:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm))) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
Data:
TOT <- structure(list(Arm = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
VisitNo = structure(c(0L, 6L, 12L, 16L, 24L, 36L, 0L, 6L, 12L, 16L, 24L, 36L),
label = "VisitNo", class = c("labelled", "integer")),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("PWB", "SWB", "EWB", "FWB", "AC"), class = "factor"),
Mean = c(25.3025326086957, 25.4365119047619, 25.8333333333333, 21.3452380952381,
26, 26.8235294117647, 25.2272727272727, 25.6172839506173,
25.6805555555556, 21.625976744186, 26.24, 26)),
row.names = c(NA, 12L), class = "data.frame")
If you just want the lines to be a bit lighter than the points, you can use alpha to make the lines a bit transparent:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm)), alpha = 0.4) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
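If the lines need genuinely different hues rather than lighter versions of the same ones, a commonly used workaround is to put the points on the fill scale: shape 21 is a filled circle, so the points can use fill while the lines keep colour, and each scale gets its own legend. The sketch below uses a small stand-in data frame (tot, with values rounded from the question) and legend titles invented for illustration; the line thickness from the question is left out for brevity.

```r
library(ggplot2)

# Small stand-in for TOT (hypothetical, values rounded from the question)
tot <- data.frame(
  Arm     = factor(rep(1:2, each = 3)),
  VisitNo = rep(c(0, 6, 12), times = 2),
  Mean    = c(25.3, 25.4, 25.8, 25.2, 25.6, 25.7)
)

p <- ggplot(tot, aes(x = VisitNo, y = Mean, group = Arm)) +
  # Lines use the colour scale
  geom_line(aes(colour = Arm)) +
  # Points use the fill scale; shape 21 has a separate outline and fill
  geom_point(aes(fill = Arm), shape = 21, colour = "black", size = 3) +
  scale_colour_manual(values = c("lightblue", "yellow"), name = "Arm (line)") +
  scale_fill_manual(values = c("blue", "orange"), name = "Arm (point)") +
  theme_bw()

print(p)
```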

creating a factor-based in dendrogram with R and ggplot2

This is less a coding question than a call for help with the general approach ;-) I prepared a table containing taxonomic information about organisms. I only have the "names" of these organisms, with no values from which a distance or clustering could be computed. I just want to use these factors to create a plot that shows the relationships. My data looks like this:
test2<-structure(list(genus = structure(c(4L, 2L, 7L, 8L, 6L, 1L, 3L,
5L, 5L), .Label = c("Aminobacter", "Bradyrhizobium", "Hoeflea",
"Hyphomonas", "Mesorhizobium", "Methylosinus", "Ochrobactrum",
"uncultured"), class = "factor"), family = structure(c(4L, 1L,
2L, 3L, 5L, 6L, 6L, 6L, 6L), .Label = c("Bradyrhizobiaceae",
"Brucellaceae", "Hyphomicrobiaceae", "Hyphomonadaceae", "Methylocystaceae",
"Phyllobacteriaceae"), class = "factor"), order = structure(c(1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Caulobacterales",
"Rhizobiales"), class = "factor"), class = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Alphaproteobacteria", class = "factor"),
phylum = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Proteobacteria", class = "factor")), .Names = c("genus",
"family", "order", "class", "phylum"), class = "data.frame", row.names = c(NA,
9L))
Is it necessary to set up artificial values to describe a distance between the levels?
Here is an attempt using the data.tree library.
First create a string variable of the form:
Proteobacteria/Alphaproteobacteria/Caulobacterales/Hyphomonadaceae/Hyphomonas
library(data.tree)
test2$pathString <- with(test2,
paste(phylum,
class,
order,
family,
genus, sep = "/"))
tree_test2 = as.Node(test2)
plot(tree_test2)
Many things can be done afterwards, for example:
Interactive network:
library(networkD3)
test2_Network <- ToDataFrameNetwork(tree_test2, "name")
simpleNetwork(test2_Network)
or a graph-style plot:
library(igraph)
plot(as.igraph(tree_test2, directed = TRUE, direction = "climb"))
check out the vignette
using ggplot2:
library(ggraph)
graph = as.igraph(tree_test2, directed = TRUE, direction = "climb")
ggraph(graph, layout = 'kk') +
geom_node_text(aes(label = name))+
geom_edge_link(arrow = arrow(type = "closed", ends = "first",
length = unit(0.20, "inches"),
angle = 15)) +
geom_node_point() +
theme_graph()+
coord_cartesian(xlim = c(-3,3), expand = TRUE)
or perhaps:
ggraph(graph, layout = 'kk') +
geom_node_text(aes(label = name), repel = T)+
geom_edge_link(angle_calc = 'along',
end_cap = circle(3, 'mm'))+
geom_node_point(size = 5) +
theme_graph()+
coord_cartesian(xlim = c(-3,3), expand = TRUE)

Reposition geom_errorbar on a faceted bargraph

I have found a lot of questions that deal with repositioning error bars in ggplot2, but none that have my particular problem. I hope this isn't a duplicate!
I have a faceted barplot, where I am trying to add in error bars for confidence intervals that have already been calculated. The arguments stat and position don't seem to be having any effect, whether they are in the aes() argument or just with geom_errorbar(). Here is what I am working with:
> dput(anthro.bp)
structure(list(Source = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("PED", "RES"), class = "factor"),
Response = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("A", "B", "C", "D"
), class = "factor"), Value = c(0.5043315, 0.03813694, 0.20757498,
0.249956615, 0.9232598, 0.0142572, 0.0537258, 0.008757155,
0.897265, 0.03153401, 0.06610772, 0.005093254, 0.8360081,
0.03893782, 0.0370325, 0.088021559), Distance = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("Near", "Far"), class = "factor"), UCI = c(0.5853133,
0.07247573, 0.27357566, 0.32335335, 0.9744858, 0.03844421,
0.08841988, 0.04262752, 0.9422062, 0.0540748, 0.09600908,
0.03348959, 1.2445932, 0.11196198, 0.10133358, 0.52272511
), LCI = c(0.4233497, 0.003798153, 0.1415743, 0.17655988,
0.8720338, -0.009929805, 0.01903172, -0.02511321, 0.8523238,
0.008993231, 0.03620636, -0.02330308, 0.427423, -0.034086335,
-0.02726858, -0.34668199)), .Names = c("Source", "Response",
"Value", "Distance", "UCI", "LCI"), row.names = c(NA, -16L), class = "data.frame")
anthro.bp[,4]<-factor(anthro.bp[,4], levels=c("Near","Far"))
bp <- ggplot(anthro.bp, aes(Value, fill=Response))
bp + geom_bar(aes(x=Source,y=Value), stat="identity", position="dodge") +
geom_errorbar(aes(ymin=LCI,ymax=UCI), stat="identity", position="dodge",width=0.25) +
facet_wrap(~Distance) +
labs(x="Disturbance Source", y="Mean Probability")
I have also tried to use position=position_dodge(width=1), again both within the aes() argument and outside of that in the geom_errorbar() command. My graph is as follows in the link (I don't have a high enough reputation to embed images yet, apologies!).
I'm also getting two error messages:
Warning messages:
1: In loop_apply(n, do.ply) :
position_dodge requires non-overlapping x intervals
2: In loop_apply(n, do.ply) :
position_dodge requires non-overlapping x intervals
This is the first time that I have used ggplot2 outside of a classroom environment, so constructive criticism is highly encouraged.
For some reason that is not clear to me, ggplot2 is dodging your bars and error bars by different widths. I got around this by specifying the dodge width manually for both layers. Also, you had set the x and y aesthetics only in geom_bar; note that they are now in the main ggplot() call, so geom_errorbar inherits them. Lastly, stat='identity' is not needed in the geom_errorbar call.
bp <- ggplot(anthro.bp, aes(x=Source,y=Value, fill=Response))
bp + geom_bar(stat="identity", position = position_dodge(width = 0.90)) +
geom_errorbar(aes(ymin=LCI,ymax=UCI), position = position_dodge(width = 0.90),width=0.25) +
facet_wrap(~Distance) +
labs(x="Disturbance Source", y="Mean Probability")
Based on "How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point", is this what you wanted?
anthro.bp$dmin <- anthro.bp$Value - anthro.bp$LCI
anthro.bp$dmax <- anthro.bp$UCI - anthro.bp$Value
ggplot(data=anthro.bp, aes(x=Source, ymin=Value-dmin, ymax=Value+dmax, fill=Response)) +
geom_bar(position=position_dodge(), aes(y=Value), stat="identity") +
geom_errorbar(position=position_dodge(width=0.9), colour="black") + facet_wrap(~Distance) + labs(x="Disturbance Source", y="Mean Probability")
I believe your main problem was the undefined x value (Source) in the aes of bp.
bp <- ggplot(anthro.bp, aes(x=Source,y=Value, fill=Response))
bp + geom_bar(stat="identity", position=position_dodge()) +
geom_errorbar(aes(ymin=LCI,ymax=UCI),width=0.25, stat="identity", position=position_dodge(0.9)) +
facet_wrap(~Distance) +
labs(x="Disturbance Source", y="Mean Probability")

wrong linking point with lines in ggplot

I don't know what I'm missing but I cannot figure out a very simple task. This is a small piece of my dataframe:
dput(df)
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "SOU55", class = "factor"), Depth = c(2L, 4L,
6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L), Value = c(211.8329815,
278.9603866, 255.6111086, 212.6163368, 193.7281895, 200.9584658,
160.9289157, 192.0664419, 174.5951019, 7.162682425)), .Names = c("ID",
"Depth", "Value"), class = "data.frame", row.names = c(NA, -10L
))
What I'm trying to do is simply plot Depth versus Value with ggplot. This is the code:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_line()
and this is the result:
But it is quite different from what I really want. This is the plot made with LibreOffice:
It seems that ggplot doesn't link the values correctly. What am I doing wrong?
Thanks to all!
You need geom_path() to connect the observations in the original order. geom_line() sorts the data according to the x-aesthetic before plotting:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_path()
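The difference between the two geoms can be seen without plotting by comparing the order in which each one visits the rows. This is a base-R sketch of that idea, with the ID column omitted and the names path_order and line_order introduced only for illustration:

```r
# The question's data (ID column omitted)
df <- data.frame(
  Depth = seq(2, 20, by = 2),
  Value = c(211.8329815, 278.9603866, 255.6111086, 212.6163368, 193.7281895,
            200.9584658, 160.9289157, 192.0664419, 174.5951019, 7.162682425)
)

# geom_path() connects rows in data order, i.e. by increasing Depth here
path_order <- seq_len(nrow(df))

# geom_line() connects rows in order of the x aesthetic, which is Value here
line_order <- order(df$Value)

# Depths in the order each geom would visit them
print(df$Depth[path_order])
print(df$Depth[line_order])
```

The second ordering jumps back and forth in depth, which is the zigzag seen in the first plot.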
