I'm trying to fix an issue with my ggballoonplot graph concerning how R orders the axis labels.
By default, R plots the data with the labels ranked in reverse alphabetical order, but the pattern in the data only shows when they are plotted in a specific order. The only way I've been able to trick the software is to manually add a prefix to each label in my .csv table so that R ranks them properly in the output. This is time-consuming, since I have to order the data by hand before adding the prefix and then plotting.
I would like to supply a character vector (or something similar) that specifies the order in which the data should be plotted, so the pattern shows without any prefix in the label names.
I have made some attempts with "scale_y_discrete" without success. I would also like to do the same for the X axis, since I've had to use the same "trick" to display the columns in the proper non-alphabetical order, which offsets the position of the labels. Any idea how to get ggplot to display my values as seen in the graph without having to "trick" the software?
Data + Code
#Assign data to "Stack_Overflow_DummyData"
Stack_Overflow_DummyData <- structure(list(Species = structure(c(8L, 3L, 1L, 5L, 6L, 2L,
7L, 4L, 8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L, 8L, 3L, 1L, 5L, 6L, 2L,
7L, 4L, 8L, 3L, 1L, 5L, 6L, 2L, 7L, 4L), .Label = c("Ani", "Cal",
"Can", "Cau", "Fis", "Ort", "Sem", "Zan"), class = "factor"),
Species_prefix = structure(c(8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 8L, 7L, 6L, 5L, 4L, 3L,
2L, 1L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("ac.Cau",
"ad.Sem", "af.Cal", "ag.Ort", "as.Fis", "at.Ani", "be.Can",
"bf.Zan"), class = "factor"), Dist = structure(c(2L, 3L,
5L, 2L, 1L, 1L, 4L, 5L, 2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L, 2L,
3L, 5L, 2L, 1L, 1L, 4L, 5L, 2L, 3L, 5L, 2L, 1L, 1L, 4L, 5L
), .Label = c("End", "Ind", "Pan", "Per", "Wid"), class = "factor"),
Region = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Cen", "Col",
"Far", "Nor"), class = "factor"), Region_prefix = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("a.Far", "b.Nor", "c.Cen", "d.Col"), class = "factor"),
Frequency = c(75, 50, 25, 50, 0, 0, 0, 0, 11.1, 22.2, 55.6,
55.6, 11.1, 0, 5.6, 0, 0, 2.7, 36.9, 27.9, 65.8, 54.1, 37.8,
28.8, 0, 0, 0, 3.1, 34.4, 21.9, 78.1, 81.3)), class = "data.frame", row.names = c(NA,
-32L))
# Plot Data With Prefix Trick
library(ggplot2)
library(ggpubr)
# make color based on Dist, size dependent on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region_prefix", y = "Species_prefix",
size = "Frequency", size.range = c(1, 9), fill = "Dist") +
theme_set(theme_gray() +
theme(legend.key=element_blank())) +
# Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) +
# Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4)
# Add Frequency Values Next to the circles
# Plot Data Without Prefix Trick
library(ggplot2)
library(ggpubr)
# make color based on Dist, size dependent on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region", y = "Species",
size = "Frequency", size.range = c(1, 9), fill = "Dist") +
theme_set(theme_gray() +
theme(legend.key=element_blank())) +
# Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) +
# Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4)
# Add Frequency Values Next to the circles
Here are the graphs:
Good Graph.
Using the label prefix trick with the visible pattern in the data:
Wrong Graph (R default).
Without the prefix trick when GGplot automatically orders the data/labels and the graph makes no sense:
To sum up, I would like the Good Graph output without first having to add a prefix to my labels.
Many thanks in advance for your help.
For the axis labels, I would first define a function that overrides the labels:
shlab <- function(lbl_brk){
sub("^[a-z]+\\.","",lbl_brk) # strips a leading lowercase prefix such as "a." or "ab."
}
Then, to change the labels, use scale_x_discrete and scale_y_discrete with labels = shlab. The help page for scale_x_discrete lists, as one of the options for labels, "a function that takes the breaks as input and returns labels as output".
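As a quick check of what such a label function returns, here is a minimal sketch that applies shlab directly to a few of the prefixed labels from the question's data. The vectors region_breaks and species_breaks are only illustrative names, not part of the original code.

```r
# Same helper as above: drop a leading lowercase prefix that ends in a dot.
shlab <- function(lbl_brk) {
  sub("^[a-z]+\\.", "", lbl_brk)
}

# Prefixed labels taken from the question's data (illustrative subset)
region_breaks  <- c("a.Far", "b.Nor", "c.Cen", "d.Col")
species_breaks <- c("ac.Cau", "ad.Sem", "bf.Zan")

# The order of the input is untouched; only the prefix is removed.
shlab(region_breaks)
shlab(species_breaks)
```

Because the scale still sorts on the prefixed values, the plotting order stays as arranged and only the displayed text changes.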
For the colours, it is enough to change the values in scale_fill_manual; for the size of the legend keys, use guides:
library(ggplot2)
library(ggpubr)
shlab <- function(lbl_brk){
sub("^[a-z]+\\.","",lbl_brk)
}
ggballoonplot(Stack_Overflow_DummyData, x = "Region_prefix", y = "Species_prefix", size = "Frequency", size.range = c(1, 9), fill = "Dist") +
scale_x_discrete(labels = shlab) +
scale_y_discrete(labels = shlab) +
scale_fill_manual(values = c("green", "blue", "red", "black", "white")) +
guides(fill = guide_legend(override.aes = list(size=8))) +
theme_set(theme_gray() + theme(legend.key=element_blank())) + # Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) + # Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4) # Add Frequency Values Next to the circles
UPDATE:
With the new dataset and vector labels:
library(ggplot2)
library(ggpubr)
# make color based on Dist, size dependent on Frequency
ggballoonplot(Stack_Overflow_DummyData, x = "Region", y = "Species",
size = "Frequency", size.range = c(1, 9), fill = "Dist") +
scale_y_discrete(limits = c("Cau", "Sem", "Cal", "Ort", "Fis", "Ani", "Can", "Zan")) +
scale_x_discrete(limits = c("Far", "Nor", "Cen", "Col")) +
theme_set(theme_gray() +
theme(legend.key=element_blank())) +
# Sets Grey Theme and removes grey background from legend panel
theme(axis.title = element_blank()) +
# Removes X axis title (Region)
geom_text(aes(label=Frequency), alpha=1.0, size=3, nudge_x = 0.4)
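A related option, sketched here under assumptions, is to store the desired order in the data itself by turning the character vector into factor levels. In plain ggplot2 a discrete axis follows the level order by default; whether ggballoonplot keeps or rearranges that order is not verified here, so the limits approach above remains the direct route. The data frame toy is a hypothetical stand-in for the real data.

```r
# Desired plotting order, as in the limits vectors above
species_order <- c("Cau", "Sem", "Cal", "Ort", "Fis", "Ani", "Can", "Zan")
region_order  <- c("Far", "Nor", "Cen", "Col")

# Hypothetical toy data with plain character labels (no prefixes)
toy <- data.frame(
  Species = c("Zan", "Can", "Ani", "Cau"),
  Region  = c("Far", "Nor", "Cen", "Col"),
  stringsAsFactors = FALSE
)

# Re-create the columns as factors whose levels follow the chosen order
toy$Species <- factor(toy$Species, levels = species_order)
toy$Region  <- factor(toy$Region,  levels = region_order)

levels(toy$Species)      # level order now matches species_order
as.integer(toy$Species)  # position of each row's label in that order
```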
My question is related to the question linked below. I want "2014" to appear only in the 4-year facet. I tried to follow that approach, but my code doesn't give what I want.
Annotating text on individual facet in ggplot2
This is my data
Shannon.long2 <- structure(list(Rot = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2-year",
"3-year", "4-year"), class = "factor"), Rot.Herb = structure(c(3L,
3L, 4L, 4L, 13L, 13L, 14L, 14L, 5L, 5L, 6L, 6L, 9L, 9L, 10L,
10L, 15L, 15L, 16L, 16L, 1L, 1L, 2L, 2L, 7L, 7L, 8L, 8L, 11L,
11L, 12L, 12L, 17L, 17L, 18L, 18L), .Label = c("A4-conv", "A4-low",
"C2-conv", "C2-low", "C3-conv", "C3-low", "C4-conv", "C4-low",
"O3-conv", "O3-low", "O4-conv", "O4-low", "S2-conv", "S2-low",
"S3-conv", "S3-low", "S4-conv", "S4-low"), class = "factor"),
variable = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Diversity",
"Evenness"), class = "factor"), N = c(4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4), value = c(0.78537789925, 0.613408315,
1.305194686, 0.79519430975, 0.4481728555, 0.30608817425,
1.20978861475, 0.8580643725, 0.92387324875, 0.630166121,
0.945954185, 0.561172324, 1.43952456275, 0.8616864655, 1.23679146725,
0.831737624, 1.033474108, 0.80689293925, 0.9910142125, 0.79342098075,
1.175512223, 0.6293940245, 0.981614832, 0.62342189825, 1.351710013,
0.805075937, 1.6598348325, 0.7983622545, 1.01606920875, 0.5751418795,
1.0500365255, 0.56408326225, 1.07162937725, 0.6756859865,
0.45699816625, 0.44444147325), sd = c(0.354077266902404,
0.208934910331856, 0.169501822767995, 0.0774319459391732,
0.737366460962239, 0.40697977697835, 0.494107033311986, 0.11906912863268,
0.491492768082854, 0.34236657107712, 0.219739438843007, 0.205905593411204,
0.319301583035043, 0.0696484379979274, 0.0563293598951725,
0.0978700910274188, 0.446850757364563, 0.175073468716825,
0.426859848850874, 0.180469101499932, 0.526842123835502,
0.200470277385505, 0.574885944755375, 0.27189545397305, 0.39621771945215,
0.150798258847229, 0.275863362594154, 0.111178397407429,
0.254811233135664, 0.158920851982914, 0.198698241334475,
0.0730606635175717, 0.717706309307313, 0.453776579066358,
0.574276936403411, 0.513758415496589), se = c(0.177038633451202,
0.104467455165928, 0.0847509113839974, 0.0387159729695866,
0.368683230481119, 0.203489888489175, 0.247053516655993,
0.0595345643163399, 0.245746384041427, 0.17118328553856,
0.109869719421504, 0.102952796705602, 0.159650791517521,
0.0348242189989637, 0.0281646799475863, 0.0489350455137094,
0.223425378682282, 0.0875367343584126, 0.213429924425437,
0.090234550749966, 0.263421061917751, 0.100235138692753,
0.287442972377688, 0.135947726986525, 0.198108859726075,
0.0753991294236146, 0.137931681297077, 0.0555891987037145,
0.127405616567832, 0.0794604259914568, 0.0993491206672376,
0.0365303317587859, 0.358853154653656, 0.226888289533179,
0.287138468201705, 0.256879207748294), ci = c(0.563415944919255,
0.332462066715199, 0.26971522480343, 0.123211505132525, 1.1733145846647,
0.647595643784969, 0.786234551289211, 0.189465554245211,
0.782074671929471, 0.544781614588516, 0.349654482635521,
0.327641747494367, 0.508080071600555, 0.110826207087643,
0.089632581638694, 0.155733154793995, 0.71103927089404, 0.278580956835532,
0.679229274424713, 0.287166612643164, 0.838323385234058,
0.318992946792351, 0.914771825423139, 0.432646341459985,
0.630470808679215, 0.23995368085579, 0.438960169525453, 0.176909640028318,
0.40546153371869, 0.252878539112781, 0.316173242000635, 0.116255819336536,
1.14203089616693, 0.722059798737006, 0.91380275723334, 0.817504285602766
)), .Names = c("Rot", "Rot.Herb", "variable", "N", "value",
"sd", "se", "ci"), row.names = c(NA, -36L), class = "data.frame")
and the code to graph
p <- ggplot(Shannon.long2, aes(x=Rot.Herb, y=value, fill=factor(variable)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_brewer(palette = "Set1")+
theme_bw() +
theme(panel.grid.major=element_blank()) +
facet_grid(~Rot, scales = "free_x", space="free_x")+
theme(legend.title=element_blank(),legend.text=element_text(size=20),legend.position="top")+
geom_errorbar(aes(ymin=value-se, ymax=value+se), size=0.5, width=.25,position=position_dodge(.9))+
xlab("\nTreatment") +
theme(axis.title = element_text(size=24,face="bold", vjust=4), axis.text.x = element_text(size=20,angle = 90, hjust = 1)) +
ylab("Shannon's H' and E'") +
theme(axis.title = element_text(size=24,face="bold", vjust=2), axis.text.y = element_text(size=20, color="black"))+
theme(strip.text.x = element_text(colour = "black", size = 20), strip.background = element_rect(fill = "white"))
produced graph (please don't mind the "2014" on the y-axis).
New code to annotate 2014, with help from eipi10
ann_text <- data.frame(x = "S4-conv",y = 1.75,lab = "2014", Rot.Herb=NA,
value=NA, variable=NA,
N=NA, sd=NA, se=NA, ci=NA,
Rot = factor("4-year",levels = c("2-year","3-year","4-year")))
I got the error "Error: Discrete value supplied to continuous scale" after running p + geom_text(data = ann_text,label = "2014"). Could you tell me what is wrong with my code or data format? Thanks.
It turns out that when you include value=NA in ann_text, it is interpreted as logical (rather than numeric, which is its mode in Shannon.long2). That causes the error, because ggplot expects a numeric variable rather than a categorical one. Set value=NA_real_ in ann_text to ensure value is interpreted as numeric and resolve the error (in addition to NA, R has class-specific missing value constants; see ?NA for more info). Or set value to any number, e.g., value=0.
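The difference is easy to see in base R. The sketch below only inspects classes; ann_bad and ann_good are hypothetical names for one-column data frames built the same way as ann_text.

```r
# A bare NA is logical; NA_real_ is the numeric (double) missing value.
class(NA)
class(NA_real_)

# The same distinction carries over to data frame columns.
ann_bad  <- data.frame(value = NA)        # logical column
ann_good <- data.frame(value = NA_real_)  # numeric column

class(ann_bad$value)
class(ann_good$value)
```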
In the example below, I've removed all of the theme and lab statements to shorten the code down to the essentials:
p = ggplot(Shannon.long2, aes(x=Rot.Herb, y=value, fill=factor(variable))) +
geom_bar(stat="identity", position="dodge") +
geom_errorbar(aes(ymin=value-se, ymax=value+se), size=0.5, width=.25,position=position_dodge(.9)) +
facet_grid(~Rot, scales = "free_x", space="free_x")
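# Rot is kept so that the label is drawn in the 4-year facet only
ann_text <- data.frame(x = "S4-conv", y = 1.75, lab = "2014", Rot.Herb=NA,
value=NA_real_, variable=NA,
Rot = factor("4-year", levels = c("2-year","3-year","4-year")))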
p + geom_text(data = ann_text, aes(label=lab, x, y))
Note that you also need to feed x and y values to geom_text to provide the label location.
Another option is to use the same x and y variable names as in your original data frame, since ggplot already knows these names and has scaled the graph based on them. Apart from Rot, which keeps the label in the 4-year facet, the only missing column we need to add is variable:
ann_text <- data.frame(Rot.Herb = "S4-conv", value = 1.75, lab = "2014", variable=NA,
Rot = factor("4-year", levels = c("2-year","3-year","4-year")))
p + geom_text(data = ann_text, aes(label=lab, Rot.Herb, value))
Thanks to "combine stacked bars and dodged bars", I created the plot below from the data frame shown. But since the axis labels already name the bars, how can I remove every legend entry except those for the one stacked bar? That is, can the legend show only the segments of the Big8 bar?
> dput(combo)
structure(list(firm = structure(c(12L, 1L, 11L, 13L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("Avg.", "Co", "Firm1",
"Firm2", "Firm3", "Firm4", "Firm5", "Firm6", "Firm7", "Firm8",
"Median", "Q1", "Q3"), class = "factor"), metric = structure(c(5L,
1L, 4L, 6L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Avg.",
"Big8", "Co", "Median", "Q1", "Q3"), class = "factor"), value = c(0.0012,
0.0065, 0.002, 0.0036, 0.0065, 0.000847004466666667, 0.000658907411111111,
0.0002466389, 8.41422555555556e-05, 8.19149222222222e-05, 7.97185555555556e-05,
7.82742555555556e-05, 7.56679888888889e-05), grp = structure(c(1L,
2L, 3L, 6L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Q1",
"Avg.", "Median", "Co", "Big8", "Q3"), class = "factor")), .Names = c("firm",
"metric", "value", "grp"), row.names = c(NA, -13L), class = "data.frame")
Here is the plotting code.
ggplot(combo, aes(x=grp, y=value, fill=firm)) +
geom_bar(stat="identity") +
labs(x = "", y = "") +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2))
The plot, which ideally would have a smaller set of elements in the legend.
You can manually set the breaks for scale_fill_discrete:
library(ggplot2)
ggplot(combo, aes(x=grp, y=value, fill=firm)) +
geom_bar(stat="identity") +
labs(x = "", y = "") +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2)) +
scale_fill_discrete(breaks = combo$firm[combo$metric=="Big8"])
I'm not 100% sure which labels you want to keep, but a manually entered vector, combo$firm and combo$metric will all work.
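To make the subsetting step concrete, here is a minimal sketch of how the breaks vector is derived. It uses a small hypothetical data frame, combo_toy, with the same firm and metric columns as combo; the as.character() call is an optional extra that passes plain strings to the scale.

```r
# Hypothetical miniature of combo: two summary bars and two Big8 firms
combo_toy <- data.frame(
  firm   = c("Q1", "Avg.", "Firm1", "Firm2"),
  metric = c("Q1", "Avg.", "Big8",  "Big8"),
  stringsAsFactors = TRUE
)

# Keep only the firms whose metric is "Big8"; these become the legend entries
big8_breaks <- as.character(combo_toy$firm[combo_toy$metric == "Big8"])
big8_breaks

# In the plot this vector would be passed as:
#   scale_fill_discrete(breaks = big8_breaks)
```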
a <- structure(list(
X1 = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
.Label = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"), class = "factor"),
X2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("A", "B", "C"), class = "factor"),
value = c(0.03508924, 0.03054929, 0.03820896, 0.18207091, 0.25985142, 0.03909991, 0.03079736,
0.41436334, 0.02957787, 0.03113289, 0.03239794, 0.1691519, 0.16368845, 0.0287741, 0.02443448,
0.33474091, 0.03283068, 0.02668754, 0.03597605, 0.17098721, 0.23048966, 0.0385765, 0.02597068, 0.36917749),
se = c(0.003064016, 0.003189752, 0.003301929, 0.006415592, 0.00825635, 0.003479607,
0.003195332, 0.008754099, 0.005594554, 0.006840959, 0.006098068, 0.012790908, 0.014176414,
0.006249045, 0.005659445, 0.018284739, 0.005051873, 0.004719352, 0.005487301, 0.011454206,
0.01290797, 0.005884275, 0.004738851, 0.014075813)),
.Names = c("X1", "X2", "value", "se"), class = "data.frame", row.names = c(NA, -24L))
I'm plotting the above data (kept in dataset "a"), and I can't get the confidence intervals to sit in the middle of each bar in the grouped chart. My attempts so far have only managed to put the lines at the side of each bar, not in the middle as in the geom_errorbar help file. I've tried manipulating the dodge parameters, but that only made it worse.
The chart needs to stay flipped, and in the code below I used geom_linerange, but geom_errorbar would be even better.
Another thing I haven't managed to do is change the scale to whole numbers (without multiplying the original table).
I've used the code below on a<-a[1:16,] (the first two groups).
When I use the same code on the full table I get even worse results with the confidence intervals.
Would anyone be able to help? Many thanks in advance.
limits <- aes(ymax = value + se, ymin=value - se)
p<-ggplot(data = a, aes(x = X1, y =value))+
geom_bar(aes(fill=X2),position = "dodge") +
scale_x_discrete(name="")+
scale_fill_manual(values=c("grey80","black","red"))+
scale_y_continuous(name="%")+
theme(axis.text.y = element_text(face='bold'),
legend.position ="top",
legend.title=element_blank())+
coord_flip()
p + geom_linerange(limits)
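Try this:
# "a" is the data frame from the question; stat = "identity" plots the values as given
p<-ggplot(data = a, aes(x = X1, y =value,fill= X2))+
geom_bar(stat = "identity", position=position_dodge()) +
geom_errorbar(aes(ymax = value + 2* se, ymin=value,colour = X2),position=position_dodge(.9))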
p <- p + scale_x_discrete(name="")+
scale_fill_manual(values=c("grey80","black","red"))+
scale_y_continuous(name="%")+
theme(axis.text.y = element_text(face='bold'),
legend.position ="top",
legend.title=element_blank())+
coord_flip()
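The question also asks how to show whole numbers on the value axis without multiplying the table. One possible approach, which is not part of the answer above, is to give scale_y_continuous a labels function that rescales only the displayed text. The helper name pct_label is hypothetical.

```r
# Hypothetical helper: turn proportions into percent labels for display only.
# Rounding to one decimal avoids floating-point noise such as 30.000000000000004.
pct_label <- function(x) {
  as.character(round(x * 100, 1))
}

pct_label(c(0, 0.1, 0.25))

# In the plot it would replace the existing y scale, for example:
#   scale_y_continuous(name = "%", labels = pct_label)
```

The underlying data and the bar heights stay unchanged; only the tick labels are rewritten.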
This code throws an error and I can't figure out why...
library( plyr )
library( ggplot2 )
library( grid )
library( proto )
# the master dataframe
myDF = structure(list(Agg52WkPrceRange = c(2L, 2L, 2L, 2L, 2L, 2L, 3L,
5L, 3L, 5L, 3L, 5L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 3L, 4L, 3L, 4L, 4L, 4L, 4L), OfResidualPntReturn52CWk = c(0.201477324,
0.22350293, 0.248388728, 0.173871456, 0.201090654, 0.170666183,
0.18681883, 0.178840521, 0.159744891, 0.129811042, 0.13209741,
0.114989407, 0.128347625, 0.100945992, 0.057017002, 0.081123718,
0.018900252, 0.021784814, 0.081931816, 0.059067844, 0.095879746,
0.038977508, 0.078895248, 0.051344317, 0.077515295, 0.011776214,
0.099216033, 0.054714439, 0.022879951, -0.079558277, -0.050889584,
-0.006934821, -0.003407085, 0.032545474, -0.003387139, 0.030418511,
0.053942523, 0.051398537, 0.073482355, 0.087963039, 0.079555591,
-0.040490418, -0.130754663, -0.125826649, -0.141766316, -0.150708718,
-0.171906882, -0.174623614, -0.212945405, -0.174480554), IndependentVariableBinned = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 3L, 10L, 3L, 10L, 4L, 10L, 4L, 2L, 4L, 4L,
4L, 5L, 2L, 2L, 2L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 3L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 8L, 9L, 9L, 9L, 9L,
10L, 10L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), class = "factor")), .Names = c("Agg52WkPrceRange",
"OfResidualPntReturn52CWk", "IndependentVariableBinned"), row.names = 28653:28702, class = "data.frame")
# secondary data frame
meansByIndependentVariableBin = ddply( myDF , .( IndependentVariableBinned ) , function( df ) mean( df[[ "OfResidualPntReturn52CWk" ]] ) )
# construct the plot
thePlot = ggplot( myDF , aes_string( x = "IndependentVariableBinned" , y = "OfResidualPntReturn52CWk" ) )
thePlot = thePlot + geom_point( data = meansByIndependentVariableBin , aes( x = IndependentVariableBinned , y = V1 ) )
thePlot = thePlot + geom_line( data = meansByIndependentVariableBin , aes( x = IndependentVariableBinned , y = V1 , group = 1 ) )
thePlot = thePlot + geom_ribbon( data = meansByIndependentVariableBin , aes( group = 1 , x = IndependentVariableBinned , ymin = V1 - 1 , ymax = V1 + 1 ) )
# print - error!
print( thePlot )
I've tried with/without group=1. The error is:
Error in eval(expr, envir, enclos) :
object 'OfRelStrength52CWk' not found
but not sure how that is relevant?? I must be missing something obvious. Take away the last geom (ribbon) and it plots just fine!
There is no bug in geom_ribbon. The error arises because you define y = OfResidualPntReturn52CWk in your ggplot call, so geom_ribbon inherits that mapping and looks for the column. Since you pass a different data frame to geom_ribbon, one that has no such column, the lookup fails. Note also that although y = OfResidualPntReturn52CWk is set in the ggplot call, no layer actually uses it, so it contributes nothing to the plot.
Here is how to do it correctly (if I am understanding what you intend to do in this plot)
MIVB = meansByIndependentVariableBin
thePlot = ggplot(myDF , aes(x = IndependentVariableBinned)) +
geom_point(aes(y = OfResidualPntReturn52CWk)) +
geom_point(data = MIVB, aes(y = V1), colour = 'red') +
geom_line(data = MIVB , aes(y = V1, group = 1), colour = 'red') +
geom_ribbon(data = MIVB, aes(group = 1, ymin = V1 - 1 , ymax = V1 + 1),
alpha = 0.2)
Here is the output it produces
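If you would rather keep y in the top-level ggplot call, a possible variation is to switch off inheritance for the ribbon layer with inherit.aes = FALSE and give that layer all of its own mappings. This is a minimal sketch on made-up data; raw, means, bin and ret are hypothetical stand-ins for myDF, meansByIndependentVariableBin and their columns.

```r
library(ggplot2)

# Hypothetical toy data standing in for myDF
raw <- data.frame(
  bin = factor(rep(1:3, each = 4)),
  ret = c(0.20, 0.22, 0.25, 0.17,
          0.08, 0.06, 0.10, 0.04,
          -0.13, -0.14, -0.15, -0.17)
)

# One mean per bin, standing in for meansByIndependentVariableBin
means <- aggregate(ret ~ bin, data = raw, FUN = mean)
names(means)[2] <- "V1"

# y is mapped at the top level, but the ribbon ignores it because
# inherit.aes = FALSE; the layer supplies x, ymin, ymax and group itself.
p <- ggplot(raw, aes(x = bin, y = ret)) +
  geom_point() +
  geom_ribbon(data = means,
              aes(x = bin, ymin = V1 - 1, ymax = V1 + 1, group = 1),
              inherit.aes = FALSE, alpha = 0.2)
```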
Here is another way to do it, without computing the means in advance. I have also used mean ± standard error in the ribbon, as I find the choice of ± 1 arbitrary.
myDF$IndependentVariableBinned = as.numeric(myDF$IndependentVariableBinned)
thePlot = ggplot(myDF , aes(x = IndependentVariableBinned, y =
OfResidualPntReturn52CWk)) +
geom_point() +
geom_point(stat = 'summary', fun.y = 'mean', colour = 'red') +
geom_line(stat = 'summary', fun.y = 'mean', colour = 'red') +
geom_ribbon(stat = 'summary', fun.data = 'mean_se', alpha = 0.2)
This produces
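For reference, the ribbon limits in this version are the mean minus and plus one standard error. A base R sketch of the same arithmetic for a single bin follows; std_err and the sample values are illustrative, and missing values are not handled here.

```r
# Standard error of the mean, written out by hand
std_err <- function(x) sd(x) / sqrt(length(x))

# Hypothetical values for a single bin
x <- c(0.20, 0.22, 0.25, 0.17)

# Lower limit, centre and upper limit of the ribbon for this bin
ribbon <- c(ymin = mean(x) - std_err(x),
            y    = mean(x),
            ymax = mean(x) + std_err(x))
ribbon
```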
@Ramnath is spot on. Your initial call to ggplot is not needed, as all of the layers you are plotting come from the summarized data.frame made by ddply(). You can also simplify your call to ddply() by using the summarize function:
meansByIndependentVariableBin2 = ddply( myDF , .( IndependentVariableBinned )
, summarize, means = mean(OfResidualPntReturn52CWk) )
I would then plot your graph as such:
ggplot(meansByIndependentVariableBin2, aes(x = as.numeric(IndependentVariableBinned), y = means)) +
geom_ribbon(aes(ymin = (means - 1), ymax = (means + 1)), alpha = .4) +
geom_point() +
geom_line()
Is that what you had in mind? I added an alpha to the ribbon layer so we can see the lines and points clearly.