I have a factor on the x-axis and have ordered its levels so that the plot reads intuitively in ggplot. That works fine. However, when I use the subset argument within a ggplot layer, it re-orders my original sequence of factor levels.
Is it possible to do subsetting within ggplot and preserve the order of factor levels?
Here is the data and code:
library(ggplot2)
library(plyr)
dat <- structure(list(SubjectID = structure(c(12L, 4L, 6L, 7L, 12L,
7L, 5L, 8L, 14L, 1L, 15L, 1L, 7L, 1L, 7L, 5L, 4L, 2L, 9L, 6L,
7L, 13L, 12L, 2L, 15L, 3L, 5L, 13L, 13L, 10L, 7L, 8L, 10L, 10L,
1L, 10L, 12L, 7L, 6L, 10L), .Label = c("s001", "s002", "s003",
"s004", "s005", "s006", "s007", "s008", "s009", "s010", "s011",
"s012", "s013", "s014", "s015"), class = "factor"), Parameter = structure(c(7L,
3L, 5L, 3L, 6L, 4L, 6L, 7L, 7L, 4L, 7L, 12L, 8L, 11L, 1L, 4L,
3L, 4L, 6L, 4L, 6L, 6L, 12L, 5L, 12L, 1L, 7L, 13L, 11L, 1L, 4L,
1L, 6L, 13L, 10L, 10L, 10L, 13L, 5L, 8L), .Label = c("(Intercept)",
"c0.008", "c0.01", "c0.015", "c0.02", "c0.03", "PrevCorr1", "PrevFail1",
"c0.025", "c0.004", "c0.006", "c0.009", "c0.012", "c0.005"), class = "factor"),
Weight = c(0.0352725634087837, 1.45546697427904, 2.29457594510248,
0.479548914792514, 6.39680995359234, 1.48829600339586, 2.69253113220079,
-0.171219812386926, -0.453625394224277, 1.43732884325816,
0.742416863226952, 0.256935761466245, -0.29401087047524,
0.34653127811481, 0.33120592543102, 2.79213318878505, 2.47047299128637,
1.022450287681, 6.92891513416868, 0.648982326396105, 6.58336282626389,
6.40600461501379, 1.80062359655524, 3.86658202530889, 1.23833324887194,
-0.026560261876089, 0.121670468861011, 0.9290824087063, 0.349104382483186,
0.24722583823016, 1.82473621255801, -0.712668411699556, 6.51789901685784,
0.74682257127003, 0.0755807984938072, 0.131705709322157,
0.246465073382095, 0.876279316248929, 1.83442709571662, -0.579086982613267
)), .Names = c("SubjectID", "Parameter", "Weight"), row.names = c(2924L,
784L, 1537L, 1663L, 3138L, 1744L, 1266L, 1996L, 3548L, 86L, 3692L,
230L, 1613L, 213L, 1627L, 1024L, 832L, 384L, 2418L, 1568L, 1714L,
3362L, 3200L, 497L, 3632L, 683L, 1020L, 3281L, 3263L, 2779L,
1632L, 1995L, 2674L, 2753L, 312L, 2638L, 3198L, 1809L, 1569L,
2589L), class = "data.frame")
## Sort factors in the order that will make it intuitive to read the plot
## It goes, "(Intercept)", "PrevCorr1", "PrevFail1", "c0.004", "c0.006", etc.
paramNames <- levels(dat$Parameter)
contrastNames <- sort(paramNames[grep("c0",paramNames)])
biasNames <- paramNames[!paramNames %in% contrastNames]
dat$Parameter <- factor(dat$Parameter, levels=c(biasNames, contrastNames))
## Add grouping parameter that will be used to plot different weights in different colors
dat$plotColor <-"Contrast"
dat$plotColor[dat$Parameter=="(Intercept)"] <- "Intercept"
dat$plotColor[grep("PrevCorr", dat$Parameter)] <- "PrevSuccess"
dat$plotColor[grep("PrevFail", dat$Parameter)] <- "PrevFail"
p <- ggplot(dat, aes(x=Parameter, y=Weight)) +
# The following command, which adds geom_line to data points of the graph, changes the order of levels
# If I uncomment the next line, the factor level order goes wrong.
#geom_line(subset=.(plotColor=="Contrast"), aes(group=1), stat="summary", fun.y="mean", color="grey50", size=1) +
geom_point(aes(group=Parameter, color=plotColor), size=5, stat="summary", fun.y="mean") +
geom_point(aes(group=Parameter), size=2.5, color="white", stat="summary", fun.y="mean") +
theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1))
print(p)
Here is the plot when geom_line is commented out:
And here is what happens when geom_line is uncommented:
If you switch the order in which you plot the objects, the problem disappears:
p <- ggplot(dat, aes(x=Parameter, y=Weight)) +
# Plot the points from the full data first, so the x scale sees every factor level,
# and only then add the geom_line built from the subset.
geom_point(aes(group=Parameter, color=plotColor), size=5, stat="summary", fun.y="mean") +
geom_line(subset = .(plotColor == "Contrast"), aes(group=1), stat="summary", fun.y="mean", color="grey50", size=1) +
geom_point(aes(group=Parameter), size=2.5, color="white", stat="summary", fun.y="mean") +
theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1))
print(p)
I think the problem is that plotting the subsetted data first ditches the levels of the original data, so when you add the points back in, ggplot doesn't know where to put them. When you plot the original data first, it keeps the levels. I'm not sure, though; you might have to take my word for it.
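As far as I know, recent ggplot2 releases no longer accept a layer-level subset argument, so a minimal sketch of the same idea is to pass the subset to the layer through data and pin the x-axis order with explicit limits. The toy data frame below is made up for illustration, and fun = mean assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Toy stand-in for `dat` (values invented for illustration)
lev <- c("(Intercept)", "PrevCorr1", "c0.004", "c0.01")
toy <- data.frame(
  Parameter = factor(c(lev, "c0.004", "c0.01"), levels = lev),
  Weight    = c(0.2, 0.1, 1.0, 2.0, 1.4, 2.4)
)
toy$plotColor <- ifelse(grepl("^c0", toy$Parameter), "Contrast", "Bias")

# subset() keeps every factor level, including levels with no rows left
contrasts_only <- subset(toy, plotColor == "Contrast")

p <- ggplot(toy, aes(x = Parameter, y = Weight)) +
  geom_line(data = contrasts_only, aes(group = 1),
            stat = "summary", fun = mean, colour = "grey50") +
  geom_point(aes(colour = plotColor), stat = "summary", fun = mean, size = 5) +
  # explicit limits fix the axis order regardless of the order of the layers
  scale_x_discrete(limits = levels(toy$Parameter))
print(p)
```

With the limits set explicitly, the line layer can come before or after the point layers without changing the axis order.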
I am trying to reproduce a graph with ggplot2. However, I am stuck at adding a second x-axis (or some workaround that places the information above the graph so that it looks like a second x-axis):
Any ideas on how I can replicate the top x-axis (the values from 8 to 1) with ggplot2? Options like sec.axis can only transform the main x-axis, but I need to map a new variable to this axis.
dput of my data:
structure(list(df = c(0L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 8L), lambda = c(1.07439006452315, 0.981356323478726,
0.888322582434298, 0.79528884138987, 0.702255100345442, 0.609221359301013,
0.516187618256585, 0.423153877212157, 0.330120136167729, 0.237086395123301,
0.144052654078873, 0.0510189130344448, -0.0420148280099833, -0.135048569054412,
-0.22808231009884, -0.321116051143268, -0.414149792187696, -0.507183533232124,
-0.600217274276552, -0.69325101532098, -0.786284756365408, -0.879318497409836,
-0.972352238454264, -1.06538597949869, -1.15841972054312, -1.25145346158755,
-1.34448720263198, -1.4375209436764, -1.53055468472083, -1.62358842576526,
-1.71662216680969, -1.80965590785412, -1.90268964889855, -1.99572338994297,
-2.0887571309874, -2.18179087203183, -2.27482461307626, -2.36785835412069,
-2.46089209516511, -2.55392583620954, -2.64695957725397, -2.7399933182984,
-2.83302705934283, -2.92606080038725, -3.01909454143168, -3.11212828247611,
-3.20516202352054, -3.29819576456497, -3.3912295056094, -3.48426324665382,
-3.57729698769825, -3.67033072874268, -3.76336446978711, -3.85639821083154,
-3.94943195187596, -4.04246569292039, -4.13549943396482, -4.22853317500925,
-4.32156691605368, -4.4146006570981, -4.50763439814253), cv = c(19.3204817596425,
19.1213671183882, 18.8773619995934, 18.6774892277955, 18.4835208555261,
18.2897100568571, 18.0708770943675, 17.8323381067138, 17.6066859085164,
17.4107724812813, 17.2471326487745, 17.1112451165908, 16.997104959334,
16.8967562285072, 16.8077689802323, 16.7318984219901, 16.664519563401,
16.6099449837552, 16.5672421336285, 16.5338375435914, 16.5069624936659,
16.4847328120999, 16.4657265866304, 16.448486821364, 16.4320103091219,
16.4170983362393, 16.4064896391581, 16.4015230468903, 16.3998890848963,
16.4005528671873, 16.4019304286688, 16.4036216529537, 16.4067528891749,
16.4100161601128, 16.4132605899296, 16.4165395864851, 16.4199042281655,
16.4233781838506, 16.426769853484, 16.4296843196208, 16.4324964410732,
16.4350449454725, 16.4374759407496, 16.4398124053385, 16.4420255031771,
16.4441043985724, 16.4460493459917, 16.4478923998711, 16.4496464721569,
16.4512671846374, 16.452755372096, 16.4541355260057, 16.4554098004139,
16.4565844426538, 16.4576660291829, 16.4586609088677, 16.4595751922309,
16.4604147172934, 16.4611850278838, 16.4618913616671, 16.4625380587224
), cvup = c(101.889140389973, 100.915687017117, 99.5338387199812,
98.3978656187505, 97.3342038792051, 96.2945440744101, 95.0979526365153,
93.8258765150077, 92.6520846572708, 91.6517399890698, 90.8248921229177,
90.1446520206297, 89.5760195678867, 89.0768732000128, 88.6326431812006,
88.2580379443768, 87.9285651379465, 87.6667164658585, 87.46798180289,
87.3169642537599, 87.1985439655234, 87.1045127640699, 87.0287736938534,
86.9594484427788, 86.8898849898434, 86.8259652241002, 86.7845239044576,
86.774810006682, 86.7837821754338, 86.8070951335521, 86.8328275771822,
86.8596790387766, 86.8928184061523, 86.9251464730962, 86.9562034124854,
86.9862365456731, 87.0154823680639, 87.0440872246828, 87.0713299919281,
87.0954330102414, 87.1183027814994, 87.1395653522666, 87.1596335907399,
87.1785452291483, 87.1962104973293, 87.2126312068382, 87.227857678753,
87.242092892569, 87.2554416547414, 87.2677222300107, 87.2789778575492,
87.289357594465, 87.2989009052576, 87.3076669365146, 87.3157130025923,
87.3230930779294, 87.3298580386636, 87.3360556493216, 87.3417305999759,
87.3469245807757, 87.3516741645461), cvund = c(91.3156772064514,
90.2979841667648, 89.239781275953, 88.3770266592041, 87.5010046760554,
86.6025564941611, 85.6108183071601, 84.4975045521302, 83.4147744278928,
82.4559848237428, 81.6464343648275, 80.9677991452778, 80.3950300254529,
79.8906890850589, 79.4450466211225, 79.0609462755239, 78.7166304960634,
78.4327333716939, 78.2044395333952, 78.0214111821543, 77.8710809711359,
77.7428153569291, 77.6284921724508, 77.525419770861, 77.4302181013752,
77.3450181382926, 77.2803724871237, 77.2404204622206, 77.215108673529,
77.1984335383206, 77.1864767095062, 77.1765374907606, 77.1747104855969,
77.1750151280321, 77.1764024868109, 77.1791593191782, 77.1835599135915,
77.1896946138232, 77.196368542912, 77.2014101859665, 77.2066616292326,
77.2108841024579, 77.2151258167557, 77.2195788242368, 77.2240445344412,
77.2284127788855, 77.2326357811641, 77.2368311061419, 77.2410230668273,
77.2449496163633, 77.2485758634109, 77.251997665592, 77.255197098881,
77.2581774900237, 77.2609472892365, 77.2635160107474, 77.265893883645,
77.2680915236126, 77.2701196788622, 77.2719890358956, 77.2737064226781
)), row.names = c("s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
"s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", "s16",
"s17", "s18", "s19", "s20", "s21", "s22", "s23", "s24", "s25",
"s26", "s27", "s28", "s29", "s30", "s31", "s32", "s33", "s34",
"s35", "s36", "s37", "s38", "s39", "s40", "s41", "s42", "s43",
"s44", "s45", "s46", "s47", "s48", "s49", "s50", "s51", "s52",
"s53", "s54", "s55", "s56", "s57", "s58", "s59", "s60"), class = "data.frame")
The short version of my code (with axis names and other details stripped out):
p <- ggplot(data=data_cv, aes(x=lambda, y=cv)) + geom_point(colour = "blue")
p2 <- p + geom_errorbar(data=data_cv, mapping=aes(ymin=cvund, ymax=cvup), width=0.2, size=0.1, color="black")
You can create the extra axis as a separate plot, then arrange it in a grid together with the main plot:
library(cowplot)
data_cv=data.frame(x = seq(-5,5,0.1), y = exp(seq(-5,5,0.1)))
p1 = ggplot(data_cv, aes(x,y)) + geom_line()
p2 = ggplot(data_cv, aes(x,y)) + geom_blank() +
scale_x_continuous(breaks = seq(-5,5,length.out = 10),
labels = c(8,8,8,8,8,7,6,5,4,3)) +
theme(axis.title.y = element_text(color = NA),
axis.title.x = element_blank(),
axis.text.y = element_text(color = NA),
axis.ticks.y = element_line(color = NA),
axis.line.y = element_line(color = NA))
plot_grid(p2,p1,align = 'h', ncol = 1, rel_heights = c(1,5))
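An alternative sketch, if a single plot is preferred: the labels of a secondary axis do not have to be a transformation of the primary ones, so dup_axis() can be given breaks at chosen lambda positions and the matching df values as labels. The toy data and the choice of every second row are assumptions for illustration, not the asker's actual values.

```r
library(ggplot2)

# Toy stand-in for data_cv (values invented for illustration)
toy_cv <- data.frame(
  lambda = seq(1, -4, by = -1),
  cv     = c(19.3, 17.6, 16.6, 16.4, 16.43, 16.46),
  df     = c(0L, 3L, 5L, 7L, 7L, 8L)
)

# Rows whose lambda positions get a tick on the top axis
idx <- seq(1, nrow(toy_cv), by = 2)

p <- ggplot(toy_cv, aes(x = lambda, y = cv)) +
  geom_point(colour = "blue") +
  scale_x_continuous(
    sec.axis = dup_axis(
      breaks = toy_cv$lambda[idx],             # where the top ticks sit
      labels = as.character(toy_cv$df[idx]),   # what the top ticks say
      name   = "df"
    )
  )
print(p)
```

This keeps both axes on one panel, so there is nothing to align; the separate-plot approach above remains more flexible if the top strip needs its own styling.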
I want to estimate an "average curve" from the curves of multiple trials. I have done this before using approx(), but then I had a fixed set of x-axis values against which y was measured.
In this dataset, both x and y vary (i.e., there are no fixed values of x for which y has been measured). Instead, each trial has a different set of x values.
Is there a way to average curves in these situations (with standard errors)?
Alternatively:
How would you extract y-values (for a fixed set of x-values) from different curves and construct a new data frame?
I have provided a sample dataset (melted) and the code for plotting the curves for individual trials.
P1, P2, P3, P4 and P5 are the names/IDs of the individual trials.
> dput(head(dat,74))
structure(list(ID = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L), .Label = c("LCRA_P1", "LCRA_P2",
"LCRA_P3", "LCRA_P4", "LCRA_P5", "LCRA_P6", "P1", "P2", "P3",
"P4", "P5"), class = "factor"), Time = c(170L, 452L, 572L, 692L,
812L, 932L, 1052L, 1172L, 1292L, 1412L, 1532L, 1652L, 1772L,
1892L, 2012L, 2132L, 2252L, 54L, 290L, 410L, 530L, 650L, 770L,
890L, 1010L, 1130L, 1250L, 1370L, 1490L, 1610L, 1730L, 1850L,
1970L, 115L, 235L, 355L, 475L, 595L, 715L, 835L, 955L, 1075L,
1195L, 1315L, 1435L, 1555L, 1675L, 1795L, 135L, 201L, 321L, 441L,
561L, 681L, 801L, 921L, 1041L, 1161L, 1281L, 1401L, 100L, 251L,
371L, 431L, 491L, 611L, 731L, 791L, 851L, 911L, 971L, 1031L,
1091L, 1151L), I = c(154.5066034, 138.3819058, 104.8425346, 61.6283449,
40.34374398, 35.18384073, 29.37894957, 40.34374398, 44.85865933,
27.44398585, 31.9589012, 41.6337198, 54.53347792, 64.20829652,
70.65817559, 66.78824815, 66.78824815, 154.5066034, 90.00781278,
73.88311512, 62.2733328, 61.6283449, 57.75841746, 53.24350211,
48.08359886, 55.17846583, 51.30853839, 42.92369561, 53.24350211,
50.66355049, 54.53347792, 38.40878026, 54.53347792, 154.5066034,
73.88311512, 62.2733328, 61.6283449, 57.75841746, 53.24350211,
48.08359886, 55.17846583, 51.30853839, 42.92369561, 38.40878026,
54.53347792, 37.79284177, 35.21289014, 39.08281758, 154.5066034,
129.997063, 84.84790953, 51.30853839, 40.98873189, 33.24887701,
29.37894957, 27.44398585, 33.24887701, 33.89386492, 31.9589012,
31.9589012, 135.1569662, 85.49289744, 48.08359886, 48.08359886,
22.2840826, 27.44398585, 49.37357467, 51.30853839, 31.9589012,
28.73396167, 23.57405841, 21.63909469, 9.384324471, 25.50902213
)), .Names = c("ID", "Time", "I"), row.names = c(NA, 74L), class = "data.frame")
(The code for plotting is included)
> ggplot(dat, aes(x=Time, y = I, colour=ID))+
geom_point()+
labs(x="Time (Seconds)", y ="Infiltration (mm/hour)")+
scale_x_continuous(breaks=seq(0,2500,100))+
scale_y_continuous(breaks=seq(0,160,10))+
geom_line(aes(group=ID))
To average, I used this:
ggplot(df2,aes(x=Time, y=I))+
stat_summary(fun.data="mean_se",mult=1, geom="smooth")
The result (the figure below) does not make any sense.
I'm still not sure what's the exact output you want, but here are a few simple examples you can adapt. I think you still had the color or group set in your aes when you made the geom_smooth, which is why you have lots of lines. If you want lines or points or any other geom for the different IDs, but then want a single smoothing line that averages all the IDs, you need to separate what gets a color or group and what doesn't.
Study up on the arguments to stat_smooth; there's a lot you can do to specify the curve it draws, including the method and formula, and arguments depending on the method. Note (from the output geom_smooth gives) that the default for a small number of observations is a loess curve, which might be the type of averaging you're looking for.
Here are examples of where you might want to take this:
library(ggplot2)
ggplot(df, aes(x = Time, y = I)) +
geom_point(aes(color = ID)) +
geom_smooth()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(df, aes(x = Time, y = I)) +
geom_point(aes(color = ID)) +
geom_smooth(se = F, method = lm)
ggplot(df, aes(x = Time, y = I)) +
geom_line(aes(group = ID), alpha = 0.5) +
geom_smooth(size = 0.8, se = F, span = 0.2)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2018-06-14 by the reprex package (v0.2.0).
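For the alternative question (extracting y-values at a fixed set of x-values and averaging across trials), one possible sketch is to interpolate each trial onto a common grid with approx() and then take the mean and standard error per grid point. The toy data, the grid, and the names interp and avg are assumptions for illustration; linear interpolation may or may not suit the real curves.

```r
# Toy stand-in for `dat`: two trials measured at different times
toy <- data.frame(
  ID   = rep(c("P1", "P2"), each = 3),
  Time = c(0, 100, 200, 50, 150, 250),
  I    = c(10, 20, 30, 20, 30, 40)
)

# Common grid, restricted to the range covered by every trial
# (approx() returns NA outside a trial's own range by default)
grid <- seq(50, 200, by = 50)

# One column per trial: I interpolated linearly at the grid points
interp <- sapply(split(toy, toy$ID),
                 function(d) approx(d$Time, d$I, xout = grid)$y)

# Mean and standard error across trials at each grid point
avg <- data.frame(
  Time = grid,
  mean = rowMeans(interp),
  se   = apply(interp, 1, sd) / sqrt(ncol(interp))
)
avg
```

The resulting avg data frame can be plotted with geom_line() for the mean and geom_ribbon(aes(ymin = mean - se, ymax = mean + se)) for the error band.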
I am new to making joy plots in R. Below is a plot I made with some simulated data. I'm confused, though, because my data variable foo contains no negative values, yet the resulting plot suggests that it does:
library(ggjoy)
p <- ggplot(results, aes(foo, bar)) + geom_joy()
The data is:
results <- structure(list(foo = c(462.834004209936, 460.834004209936, 73.0340042099357,
106.134004209936, 165.634004209936, 200.134004209936, 490.434004209936,
157.334004209936, 460.834004209936, 131.434004209936, 269.934004209936,
457.534004209936, 459.634004209936, 475.534004209936, 180.034004209936,
142.134004209936, 294.734004209936, 419.534004209936, 279.834004209936,
280.734004209936, 448.034004209936, 206.334004209936, 283.134004209936,
243.034004209936, 530.334004209936, 396.934004209936, 49.8340042099357,
136.134004209936, 210.234004209936, 59.0340042099357, 269.834004209936,
123.034004209936, 385.434004209936, 78.7340042099357, 226.434004209936,
391.034004209936, 219.434004209936, 338.134004209936, 87.0340042099357,
434.234004209936, 123.034004209936, 75.7340042099357, 247.234004209936,
192.334004209936, 146.234004209936, 259.334004209936, 72.5340042099357,
110.934004209936, 287.134004209936, 122.634004209936, 197.834004209936,
379.334004209936), bar = structure(c(3L, 8L, 1L, 5L, 10L, 8L,
7L, 9L, 8L, 10L, 9L, 8L, 8L, 9L, 2L, 3L, 5L, 6L, 9L, 1L, 3L,
5L, 6L, 8L, 7L, 9L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 10L, 4L, 7L,
5L, 6L, 8L, 8L, 1L, 8L, 8L, 9L, 5L, 6L, 5L, 6L, 7L, 9L, 1L, 9L
), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
), class = "factor")), .Names = c("foo", "bar"), row.names = c(NA,
-52L), class = "data.frame")
I think it may have to do with stat:
Stats
The default stat used with geom_joy is stat_joy. However, it may not do exactly what you want it to do, and there are other stats that can be used that may be better for your respective application.
First, stat_joy estimates the data range and bandwidth for the density estimation from the entire data at once, rather than from each individual group of data. This choice makes joyplots look more uniform, but the density estimates can in some cases look quite different from what you would get from geom_density or stat_density. This problem can be remedied by using stat_density with geom_joy. This works just fine, we just need to make sure that we map the calculated density onto the height aesthetic.
The geom_joy() function estimates a density function, and that estimate is not bounded by the min/max values of your data. Because you've supplied only a few data points, the densities spread over too wide a range. You can see it here:
ggplot(results, aes(foo, bar)) +
geom_point() +
geom_joy(alpha=.3)
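The same effect can be reproduced with base R's density(), independent of ggjoy. This is only an illustration of why a kernel density estimate can reach below zero for strictly positive data; the simulated x is an assumption, not the asker's foo.

```r
set.seed(1)
x <- runif(5, min = 50, max = 500)   # a few strictly positive values

# Default: the evaluation grid extends `cut` = 3 bandwidths past the data
d_default <- density(x)

# Restricting the grid to the observed range (the curve is truncated,
# not re-normalised)
d_clipped <- density(x, from = min(x), to = max(x))

range(x)
range(d_default$x)   # wider than range(x)
range(d_clipped$x)   # same as range(x)
```

With only a handful of points the bandwidth is large, so the default grid extends far beyond the data on both sides.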
I am calling the ggplot function
ggplot(data,aes(x,y,fill=category))+geom_bar(stat="identity")
The result is a barplot with bars filled by various colours corresponding to category. However the ordering of the colours is not consistent from bar to bar. Say there is pink, green and blue. Some bars go pink,green,blue from bottom to top and some go green,pink,blue. I don't see any obvious pattern.
How are these orderings chosen? How can I change it? At the very least, how can I make ggplot choose a consistent ordering?
The classes of x, y and category are integer, numeric and factor, respectively. If I make category an ordered factor, it does not change this behavior.
Anyone know how to fix this?
Reproducible example:
dput(data)
structure(list(mon = c(9L, 10L, 11L, 10L, 8L, 7L, 7L, 11L, 9L,
10L, 12L, 11L, 7L, 12L, 8L, 12L, 9L, 7L, 9L, 10L, 10L, 8L, 12L,
7L, 11L, 10L, 8L, 7L, 11L, 12L, 12L, 9L, 9L, 7L, 7L, 12L, 12L,
9L, 9L, 8L), gclass = structure(c(9L, 1L, 8L, 6L, 4L, 4L, 3L,
6L, 2L, 4L, 1L, 1L, 5L, 7L, 1L, 6L, 8L, 6L, 4L, 7L, 8L, 7L, 9L,
8L, 3L, 5L, 9L, 2L, 7L, 3L, 5L, 5L, 7L, 7L, 9L, 2L, 4L, 1L, 3L,
8L), .Label = c("Down-Down", "Down-Stable", "Down-Up", "Stable-Down",
"Stable-Stable", "Stable-Up", "Up-Down", "Up-Stable", "Up-Up"
), class = c("ordered", "factor")), NG = c(222614.67, 9998.17,
351162.2, 37357.95, 4140.48, 1878.57, 553.86, 40012.25, 766.52,
15733.36, 90676.2, 45000.29, 0, 375699.84, 2424.21, 93094.21,
120547.69, 291.33, 1536.38, 167352.21, 160347.01, 26851.47, 725689.06,
4500.55, 10644.54, 75132.98, 42676.41, 267.65, 392277.64, 33854.26,
384754.67, 7195.93, 88974.2, 20665.79, 7185.69, 45059.64, 60576.96,
3564.53, 1262.39, 9394.15)), .Names = c("mon", "gclass", "NG"
), row.names = c(NA, -40L), class = "data.frame")
ggplot(data,aes(mon,NG,fill=gclass))+geom_bar(stat="identity")
Starting in ggplot2_2.0.0, the order aesthetic is no longer available. To get a graph with the stacks ordered by fill color, you can simply order the dataset by the grouping variable you want to order by.
I often use arrange from dplyr for this. Here I'm ordering the dataset by the fill factor within the ggplot call rather than creating an ordered dataset but either will work fine.
library(dplyr)
ggplot(arrange(data, gclass), aes(mon, NG, fill = gclass)) +
geom_bar(stat = "identity")
This is easily done in base R, of course, using the classic order with the extract brackets:
ggplot(data[order(data$gclass), ], aes(mon, NG, fill = gclass)) +
geom_bar(stat = "identity")
With the resulting plot in both cases now in the desired order:
ggplot2_2.2.0 update
In ggplot2_2.2.0, fill order is based on the order of the factor levels. The default order will plot the first level at the top of the stack instead of the bottom.
If you want the first level at the bottom of the stack you can use reverse = TRUE in position_stack. Note you can also use geom_col as shortcut for geom_bar(stat = "identity").
ggplot(data, aes(mon, NG, fill = gclass)) +
geom_col(position = position_stack(reverse = TRUE))
You need to specify the order aesthetic as well.
ggplot(data,aes(mon,NG,fill=gclass,order=gclass))+
geom_bar(stat="identity")
This may or may not be a bug.
To control the order, set the levels argument of factor() explicitly. Like this:
data$gclass
(data$gclass2 <- factor(data$gclass,levels=sample(levels(data$gclass)))) # Note the difference in the order of the factor levels
ggplot(data,aes(mon,NG,fill=gclass2))+geom_bar(stat="identity")
You can change the colour using the scale_fill_ functions. For example:
ggplot(dd,aes(mon,NG,fill=gclass)) +
geom_bar(stat="identity") +
scale_fill_brewer(palette="Blues")
To get consistent ordering in the bars, then you need to order the data frame:
dd = dd[with(dd, order(gclass, -NG)), ]
To change the ordering of the legend, alter the gclass factor. So something like:
dd$gclass= factor(dd$gclass,levels=sort(levels(dd$gclass), TRUE))
Since this exchange shows up first for "factor fill order", I will add one more solution, which I believe is a bit more straightforward and doesn't require altering your underlying data.
ggplot(data, aes(x, y, fill = factor(category, levels = c("Down-Down", "Down-Stable", "Down-Up", "Stable-Down", "Stable-Stable", "Stable-Up", "Up-Down", "Up-Stable", "Up-Up")))) +
geom_col(position = position_stack(reverse = FALSE))
Or, as I prefer, I first create a vector of levels to simplify the code later and make it easier to edit:
v_factor_levels <- c("Down-Down", "Down-Stable", "Down-Up", "Stable-Down", "Stable-Stable", "Stable-Up", "Up-Down", "Up-Stable", "Up-Up")
ggplot(data, aes(x, y, fill = factor(category, levels = v_factor_levels))) +
geom_col(position = position_stack(reverse = FALSE))
You don't need the position argument within geom_col(). I keep it as a reminder in case I want to reverse the stack, but you could simplify further by dropping it.
Building on @aosmith's answer, another way to order the bars that I found slightly more intuitive is:
ggplot(data, aes(x=mon, y=reorder(NG,gclass), fill = gclass)) +
geom_bar(stat = "identity")
The beauty of the reorder function from the base stats package is that it is called as reorder(x, X, FUN): the levels of x are reordered by the result of applying FUN (sum, mean, etc.; the default is mean) to the values of X within each level.
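A small base R illustration of how reorder() changes the level order; the vectors f and v are made-up toy values, not the data from the question.

```r
f <- factor(c("a", "b", "c", "a", "b", "c"))
v <- c(3, 1, 2, 3, 1, 2)

levels(f)                          # alphabetical: "a" "b" "c"

# Reorder the levels of f by the sum of v within each level
# (sums are a = 6, b = 2, c = 4, so the increasing order is b, c, a)
f_by_sum <- reorder(f, v, FUN = sum)
levels(f_by_sum)                   # "b" "c" "a"
```

Only the level order changes; the values stored in the factor stay the same.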
My data set:
structure(list(Site = c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L), Average.worm.weight..g. = c(0.1934,
0.249, 0.263, 0.262, 0.4186, 0.204, 0.311, 0.481, 0.326, 0.657,
0.347, 0.311, 0.239, 0.4156, 0.31, 0.3136, 0.4033, 0.302, 0.277
), Average.total.immune.cell.count = structure(c(8L, 16L, 11L,
12L, 10L, 1L, 4L, 15L, 4L, 3L, 17L, 13L, 18L, 7L, 5L, 6L, 9L,
14L, 2L), .Label = c("0", "168750", "18650000", "200,000", "21,600,000",
"226666.6", "22683333.33", "2533333.33", "283333.333", "291666.6",
"335833.3", "435800", "474816666.7", "500000", "6450000", "729166.667",
"7433333.3", "9916667"), class = "factor"), Average.eleocyte.number = structure(c(2L,
5L, 14L, 10L, 1L, 1L, 6L, 1L, 6L, 7L, 1L, 9L, 15L, 8L, 12L, 3L,
11L, 13L, 4L), .Label = c("0", "1266666.67", "153333.3", "168740",
"17", "200,000", "2266666.667", "22683333.33", "23116666.67",
"264000", "283333.333", "442", "500000", "7.3", "9916667"), class = "factor")), .Names = c("Site",
"Average.worm.weight..g.", "Average.total.immune.cell.count",
"Average.eleocyte.number"), class = "data.frame", row.names = c(NA,
-19L))
This is my R script so far:
# Plotting multiple data series on a graph
y1<-dframe1$"Average.total.immune.cell.count"
y2<-dframe1$"Average.eleocyte.number"
x<-dframe1$"Average.worm.weight..g."
plot.default(y1~x,type="p" )
points(y2~x)
I am trying to add two y series to the same scatterplot and am struggling to do so. I want different symbols for the points so that the two data series can be told apart. I would also like the axes to meet at the bottom left-hand corner; how can I do that? Finally, I would like the y-axis to be in standard form, but I do not know how to get R to do that.
Best regards.
K.
So this is an object lesson in getting your data into the correct format to begin with. Your numbers contain commas, which R does not like. Hence the numbers are read as character strings and imported as factors (which your structure(...) clearly shows). You need to fix that or, better yet, get rid of the commas prior to exporting.
Something like this will work
colnames(dframe) <- c("Site","x","y1","y2")
dframe$y1 <- as.numeric(as.character(gsub(",","",dframe$y1,fixed=TRUE)))
dframe$y2 <- as.numeric(as.character(gsub(",","",dframe$y2,fixed=TRUE)))
plot(y1~x,dframe, col="red", pch=20)
points(y2~x,dframe, col="blue", pch=20)
But there are additional problems. One of the numbers (in row 12) is a factor of 10 larger than all the others, so the plot above is not very informative. It's hard to know if this is a data input error, or a genuine outlier in your data.
EDIT: Response to OP's comment
dframe <- dframe[-12,] # remove row 12
dframe <- dframe[order(dframe$x),] # order by increasing x
plot(y1~x,dframe, col="red", pch=20, type="b")
points(y2~x,dframe, col="blue", pch=20, type="b")
legend("topleft",legend=c("y1","y2"),col=c("red","blue"),pch=20)
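The remaining parts of the question (axes meeting at the bottom left and a y-axis in standard form) could be approached along these lines in base graphics. This is only a sketch on made-up toy values: xaxs/yaxs = "i" remove the default axis padding, and the y-axis labels are formatted in scientific notation by hand.

```r
# Toy values in the same comma-formatted style as the question's data
raw <- factor(c("200,000", "21,600,000", "168750"))
y <- as.numeric(gsub(",", "", as.character(raw), fixed = TRUE))
x <- c(0.31, 0.31, 0.277)

plot(x, y, pch = 20, col = "red",
     xaxs = "i", yaxs = "i",            # no padding, so the axes meet at the corner
     xlim = c(0, max(x) * 1.1), ylim = c(0, max(y) * 1.1),
     yaxt = "n", bty = "l")             # suppress the default y-axis, L-shaped box

# Draw the y-axis manually with labels in scientific (standard) form
ticks <- pretty(c(0, max(y) * 1.1))
axis(2, at = ticks, labels = format(ticks, scientific = TRUE), las = 1)
```

A second series can then be added with points() using a different pch value, as in the code above.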