Fitting gaussian to data geom_point in ggplot2 - r

I have the following data set
structure(list(Collimator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("n", "y"), class = "factor"), angle = c(0L,
15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L, 180L,
0L, 15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L,
180L), X1 = c(2099L, 11070L, 17273L, 21374L, 23555L, 23952L,
23811L, 21908L, 19747L, 17561L, 12668L, 6008L, 362L, 53L, 21L,
36L, 1418L, 6506L, 10922L, 12239L, 8727L, 4424L, 314L, 38L, 21L,
50L), X2 = c(2126L, 10934L, 17361L, 21301L, 23101L, 23968L, 23923L,
21940L, 19777L, 17458L, 12881L, 6051L, 323L, 40L, 34L, 46L, 1352L,
6569L, 10880L, 12534L, 8956L, 4418L, 344L, 58L, 24L, 68L), X3 = c(2074L,
11109L, 17377L, 21399L, 23159L, 23861L, 23739L, 21910L, 20088L,
17445L, 12733L, 6046L, 317L, 45L, 26L, 46L, 1432L, 6495L, 10862L,
12300L, 8720L, 4343L, 343L, 38L, 34L, 60L), average = c(2099.6666666667,
11037.6666666667, 17337, 21358, 23271.6666666667, 23927, 23824.3333333333,
21919.3333333333, 19870.6666666667, 17488, 12760.6666666667,
6035, 334, 46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
10888, 12357.6666666667, 8801, 4395, 333.6666666667, 44.6666666667,
26.3333333333, 59.3333333333)), .Names = c("Collimator", "angle",
"X1", "X2", "X3", "average"), row.names = c(NA, -26L), class = "data.frame")
I first scale the average counts for both collimator y and n so that the highest count is 1:
df <- ddply(df, .(Collimator), transform,
norm.average = average / max(average))
and plot the curves:
ggplot(df, aes(x=angle,y=norm.average,col=Collimator)) +
geom_point() + geom_line()
Using geom_line is not very pleasing to the eye, and I would rather fit a curve to the data using stat_smooth. Each data set should be symmetric about the mean, so I think a Gaussian fit should be ideal. How can I fit a Gaussian to the data sets for Collimator = "y" and Collimator = "n", either in ggplot2 or using base R? I would also like to output the mean and standard deviation. Can this be done?

Strictly speaking, your data is not Gaussian, only Gaussian-shaped, but here is an example of fitting such a curve with nls and visualizing the fit:
fit <- dlply(df, .(Collimator), function(x) {
co <- coef(nls(norm.average ~ exp(-(angle - m)^2/(2 * s^2)), data = x, start = list(s = 50, m = 80)))
stat_function(fun = function(x) exp(-(x - co["m"])^2/(2 * co["s"]^2)), data = x)
})
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) + geom_point() + fit
Updated
To obtain the parameters:
fit <- dlply(df, .(Collimator), function(x) {
co <- coef(nls(norm.average ~ exp(-(angle - m)^2/(2 * s^2)), data = x, start = list(s = 50, m = 80)))
r <- stat_function(fun = function(x) exp(-(x - co["m"])^2/(2 * co["s"]^2)), data = x)
attr(r, ".coef") <- co
r
})
then,
> ldply(fit, attr, ".coef")
Collimator s m
1 n 52.99117 82.60820
2 y 21.99518 86.61268
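For a base R route without plyr, the same nls model can be fitted per group with split() and sapply(). This is only a sketch: the data frame is rebuilt from the angle and average columns of the question (rounded to four decimals), and fit_gaussian is a hypothetical helper name. In this model, m is the fitted centre (the mean) and s is the fitted standard deviation.

```r
# Angles and averaged counts copied from the question (rounded to 4 decimals)
angle <- seq(0, 180, by = 15)
avg_n <- c(2099.6667, 11037.6667, 17337, 21358, 23271.6667, 23927, 23824.3333,
           21919.3333, 19870.6667, 17488, 12760.6667, 6035, 334)
avg_y <- c(46, 27, 42.6667, 1400.6667, 6523.3333, 10888, 12357.6667,
           8801, 4395, 333.6667, 44.6667, 26.3333, 59.3333)

df <- data.frame(
  Collimator = rep(c("n", "y"), each = length(angle)),
  angle      = rep(angle, times = 2),
  average    = c(avg_n, avg_y)
)

# Scale each group so that its highest count is 1
df$norm.average <- ave(df$average, df$Collimator,
                       FUN = function(v) v / max(v))

# Hypothetical helper: fit an unnormalised Gaussian and return c(s, m)
fit_gaussian <- function(d) {
  fit <- nls(norm.average ~ exp(-(angle - m)^2 / (2 * s^2)),
             data = d, start = list(s = 50, m = 80))
  coef(fit)
}

# One row per collimator setting, columns s (std. dev.) and m (mean)
params <- t(sapply(split(df, df$Collimator), fit_gaussian))
print(round(params, 2))
```

The starting values s = 50 and m = 80 are the ones used in the answer above; nls may need different starting values for other data.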

Related

How to make scatterplot with colors based on a column and add a mean line through stats_summary with grouping based on another column?

I have a data.frame (see below) and I would like to build a scatterplot where the colour of the dots is based on a factor column (replicate). At the same time, I want to add a line that represents the mean of y for each x. The problem is that when I define the stat_summary, it uses the colours I requested for grouping, and hence I get three mean lines (one for each colour) instead of one. Trying to redefine the groups in either the ggplot() or the stat_summary() call did not work.
If I disable colours I get what I want (a single mean line).
How do I keep the colours (plot # 1), yet still have a single mean line (plot # 2)?
structure(list(conc = c(10L, 10L, 10L, 25L, 25L, 25L, 50L, 50L,
50L, 75L, 75L, 75L, 100L, 100L, 100L, 200L, 200L, 200L, 300L,
300L, 300L, 400L, 400L, 400L, 500L, 500L, 500L, 750L, 750L, 750L,
1000L, 1000L, 1000L), citric_acid = c(484009.63, 409245.09, 303193.26,
426427.47, 332657.35, 330875.96, 447093.71, 344837.39, 302873.98,
435321.69, 359146.09, 341760.28, 378298.37, 342970.87, 323146.92,
362396.98, 361246.41, 290638.14, 417357.82, 351927.66, 323611.37,
416280.3, 359430.65, 327950.99, 431167.14, 361429.91, 291901.43,
340166.41, 353640.91, 341839.08, 393392.69, 311375.19, 342103.54
), MICIT = c(20771.28, 18041.97, 12924.35, 49814.13, 38683.32,
38384.72, 106812.16, 82143.12, 72342.43, 156535.39, 128672.12,
119397.14, 187208.46, 167814.92, 159418.62, 350813.47, 357227.48,
295948.31, 505553.77, 523282.46, 489652.3, 803544.84, 704431.61,
654753.29, 1030485.41, 895451.64, 717698.52, 1246839.19, 1309712.63,
1212111.53, 1930503.38, 1499838.89, 1642091.64), replicate = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L
), .Label = c("1", "2", "3"), class = "factor"), MICITNorm = c(0.0429150139016862,
0.0440859779160698, 0.0426274317575529, 0.116817357005636, 0.116285781751102,
0.116009395182412, 0.238903293897827, 0.238208275500519, 0.238853235263062,
0.359585551549246, 0.358272367659634, 0.34935932285636, 0.494869856298879,
0.489297881187402, 0.493331701877276, 0.968036405822146, 0.98887482369721,
1.01827072661558, 1.21131974956166, 1.48690347328766, 1.51308744189056,
1.93029754230503, 1.95985403582026, 1.99649737297637, 2.38999059622215,
2.47752500616233, 2.45870162403795, 3.6653801002868, 3.70350995307641,
3.54585417793659, 4.90731889298706, 4.81682207885606, 4.79998435561351
)), class = "data.frame", row.names = c(NA, -33L))
ggplot(xx, aes (conc, MICIT, colour = replicate)) + geom_point () +
stat_summary(geom = "line", fun = mean)
Use aes(group = 1):
ggplot(xx, aes(conc, MICIT, colour = replicate)) +
geom_point() +
geom_line() +
stat_summary(aes(group = 1), geom = "line", fun = mean)
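To see what the summary line computes, the per-concentration means can also be calculated up front with aggregate() and drawn as a separate layer. A minimal sketch, assuming only the first three concentrations of the question's data; the means data frame is a name introduced here for illustration, and the plot is skipped if ggplot2 is not installed.

```r
# First three concentrations from the question's data
xx <- data.frame(
  conc = rep(c(10, 25, 50), each = 3),
  MICIT = c(20771.28, 18041.97, 12924.35,
            49814.13, 38683.32, 38384.72,
            106812.16, 82143.12, 72342.43),
  replicate = factor(rep(1:3, times = 3))
)

# One mean per concentration, ignoring the replicate grouping
means <- aggregate(MICIT ~ conc, data = xx, FUN = mean)
print(means)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(xx, aes(conc, MICIT)) +
    geom_point(aes(colour = replicate)) +  # colour applies to the points only
    geom_line(data = means)                # single line through the means
  print(p)
}
```

Because colour is mapped only inside geom_point(), the line layer has no grouping to inherit, which has the same effect as aes(group = 1).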

ggpairs formatting for points only

I'm looking to increase the size of the points AND outline them in black while keeping the line weight the same across the remaining plots.
library(ggplot2)
library(GGally)
pp <- ggpairs(pp.sed, columns = c(1,2), aes(color=pond.id, alpha = 0.5)) +
theme_bw()
print(pp)
Which gives me the following figure:
Data for reproducibility, and TIA!
> dput(pp.sed)
structure(list(Fe.259.941 = c(905.2628883, 825.7883359, 6846.128702,
1032.932924, 997.8037721, 588.9599882, 6107.641947, 798.4493611,
1046.38376, 685.2485692, 6452.273486, 730.8656684, 902.8585447,
1039.886406, 7408.801001, 2512.089991, 911.2101809, 941.3712067,
659.1069185, 1070.090445, 1017.666402, 925.3221586, 645.0500668,
954.0009756, 1022.594904, 803.5865352, 7653.184537, 1082.714082,
1048.51115, 773.9070604, 6889.060748, 973.0971769, 1002.091143,
798.9670583, 5089.035978, 2361.713222, 970.8258109, 748.3574529,
3942.04816, 889.1760124), Mn.257.611 = c(17.24667962, 14.90488024,
14.39265671, 20.51133433, 19.92596564, 11.76690074, 19.76386229,
14.29779164, 20.23646264, 13.55374658, 16.8847698, 13.11784439,
15.91777975, 20.64068844, 16.78681661, 28.61732162, 15.88328987,
19.59750367, 13.09735943, 21.59458118, 17.680152, 19.87127449,
12.8082581, 20.12050221, 17.57143193, 18.72196029, 16.21525793,
22.0518966, 18.39642397, 18.32238508, 16.17696923, 20.69668404,
17.96018218, 18.71945309, 16.50162126, 30.60719123, 17.69058768,
14.99048753, 16.28302375, 18.32277507), pond.id = structure(c(6L,
5L, 2L, 1L, 3L, 5L, 2L, 1L, 3L, 5L, 2L, 1L, 6L, 3L, 2L, 4L, 6L,
3L, 4L, 4L, 6L, 3L, 4L, 1L, 6L, 3L, 2L, 1L, 6L, 3L, 2L, 1L, 6L,
3L, 2L, 1L, 6L, 5L, 2L, 1L), .Label = c("LIL", "RHM", "SCS",
"STN", "STS", "TS"), class = "factor")), class = "data.frame", row.names = c(11L,
12L, 13L, 15L, 26L, 27L, 28L, 30L, 36L, 37L, 38L, 40L, 101L,
102L, 103L, 105L, 127L, 128L, 129L, 131L, 142L, 143L, 144L, 146L,
157L, 158L, 159L, 161L, 172L, 173L, 174L, 176L, 184L, 185L, 186L,
188L, 199L, 200L, 201L, 203L))
The GGally package already offers a family of wrap_xxx functions which could be used to set parameters to override default behaviour, e.g. using wrap you could override the default size of points using wrap(ggally_points, size = 5).
To use the wrapped function instead of the default you have to call
ggpairs(..., lower = list(continuous = wrap(ggally_points, size = 5))).
Switching the outline is a bit more tricky. Using wrap we could switch the shape of the points to 21 and set the outline color to "black". However, doing so the points are no longer colored. Unfortunately I have found no way to override the mapping. While it is possible to add a global fill aes, a drawback of doing so is that we lose the black outline for the densities.
One option to fix that is to write a wrapper for ggally_points which adjusts the mapping so that the fill aes is used instead of color.
library(ggplot2)
library(GGally)
ggally_points_filled <- function(data, mapping, ...) {
names(mapping)[grepl("^colour", names(mapping))] <- "fill"
ggally_points(data, mapping, ..., shape = 21)
}
w_ggally_points_filled <- wrap(ggally_points_filled, size = 5, color = "black")
ggpairs(pp.sed, columns = c(1, 2), aes(color = pond.id, alpha = 0.5),
lower = list(continuous = w_ggally_points_filled)) +
theme_bw()

Order Bars in ggplot2 from high to low (when repeating words are used)

I am trying to reorder the bars in a ggplot2 bar plot from the highest values to the lowest, with the highest values at the top of the chart and the lowest at the bottom.
I've used this stack overflow post in other plots and it works with no problem.
However, ggplot2 seems to have a problem when the same words appear in both facets. It does not produce the correct ordering in the plot.
Here is what it looks like now. As you can see, it is out of order. Ideally, I'd like the Unvax_to_Vax facet to read (from top to bottom): safe, sheep, good, dumb, stupid, scared and I'd like the Vax_to_Unvax facet to read (from top to bottom): stupid, selfish, ignorant, dumb, unsafe, foolish.
Here is the data and code to reproduce the figure.
df <- structure(list(Var1 = structure(c(8L, 7L, 4L, 1L, 9L, 2L, 5L,
10L, 3L, 1L, 8L, 6L), .Label = c("dumb", "foolish", "good", "ignorant",
"safe", "scared", "selfish", "stupid", "unsafe", "sheep"), class = "factor"),
Freq = c(101L, 94L, 47L, 33L, 29L, 24L, 27L, 22L, 18L, 15L,
15L, 11L), Percent = c(8.82096069868996, 8.20960698689956,
4.10480349344978, 2.882096069869, 2.53275109170306, 2.09606986899563,
5.54414784394251, 4.51745379876797, 3.69609856262834, 3.08008213552361,
3.08008213552361, 2.25872689938398), Group = c("Vax_to_Unvax",
"Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax",
"Vax_to_Unvax", "Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax",
"Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax")), row.names = c(319L,
292L, 147L, 82L, 375L, 98L, 173L, 182L, 76L, 54L, 190L, 176L), class = "data.frame")
ggplot(df,
aes( x= reorder(Var1, Freq), y = Percent, fill = Group)) +
geom_bar(stat="identity") +
facet_wrap(Group ~. , scales = "free") +
coord_flip()
Thank you for your help.
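One possible approach, sketched here under assumptions rather than taken from an accepted answer: reorder(Var1, Freq) computes a single score per word (the mean Freq across both facets), so words that occur in both groups, such as dumb and stupid, get one shared position. Making the ordering key unique per facet avoids that. The "___" separator and the key and strip_suffix names are arbitrary choices for this example, the Percent values are rounded from the question's data, and the plot is skipped if ggplot2 is not installed.

```r
# Data from the question (Percent rounded to 2 decimals)
df <- data.frame(
  Var1 = c("stupid", "selfish", "ignorant", "dumb", "unsafe", "foolish",
           "safe", "sheep", "good", "dumb", "stupid", "scared"),
  Freq = c(101, 94, 47, 33, 29, 24, 27, 22, 18, 15, 15, 11),
  Percent = c(8.82, 8.21, 4.10, 2.88, 2.53, 2.10,
              5.54, 4.52, 3.70, 3.08, 3.08, 2.26),
  Group = rep(c("Vax_to_Unvax", "Unvax_to_Vax"), each = 6)
)

# Make each word unique per facet, then order the combined key by Freq
df$key <- reorder(paste(df$Var1, df$Group, sep = "___"), df$Freq)

# Strip the facet suffix again when labelling the axis
strip_suffix <- function(x) sub("___.*$", "", x)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = key, y = Percent, fill = Group)) +
    geom_col() +
    facet_wrap(~ Group, scales = "free") +
    coord_flip() +
    scale_x_discrete(labels = strip_suffix)
  print(p)
}
```

With coord_flip(), the last factor level is drawn at the top, so ordering the key by increasing Freq puts the most frequent word at the top of each facet.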

Drawing SE in xyplot with errorbars

I am trying to construct a simple XY-Graph with the milk production (called FCM) of two different groups of cows (from the output I got from the mixed model, using the lsmeans and SE).
I was able to construct the plot displaying the lsmeans using the xyplot function in lattice:
library(lattice)
xyplot(lsmean~Time, type="b", group=Group, data=lsmeans2[order(lsmeans2$Time),],
pch=16, ylim=c(10,35), col=c("darkorange","darkgreen"),
ylab="FCM (kg/day)", xlab="Week", lwd=2,
key=list(space="top",
lines=list(col=c("darkorange","darkgreen"),lty=c(1,1),lwd=2),
text=list(c("Confinement Group","Pasture Group"), cex=0.8)))
I now want to add the error bars. I tried some things with the panel.arrows function, just copying and pasting from other examples, but didn't get any further.
I would really appreciate some help!
My lsmeans2 dataset:
Group Time lsmean SE df lower.CL upper.CL
Stall wk1 26.23299 0.6460481 59 24.19243 28.27356
Weide wk1 25.12652 0.6701080 58 23.00834 27.24471
Stall wk10 21.89950 0.6460589 59 19.85890 23.94010
Weide wk10 18.45845 0.6679617 58 16.34705 20.56986
Stall wk2 25.38004 0.6460168 59 23.33957 27.42050
Weide wk2 22.90409 0.6679617 58 20.79269 25.01549
Stall wk3 25.02474 0.6459262 59 22.98455 27.06492
Weide wk3 24.05886 0.6679436 58 21.94751 26.17020
Stall wk4 23.91630 0.6456643 59 21.87694 25.95565
Weide wk4 22.23608 0.6678912 58 20.12490 24.34726
Stall wk5 23.97382 0.6493483 59 21.92283 26.02481
Weide wk5 18.14550 0.6677398 58 16.03480 20.25620
Stall wk6 24.48899 0.6456643 59 22.44963 26.52834
Weide wk6 19.40022 0.6697394 58 17.28319 21.51724
Stall wk7 24.98107 0.6459262 59 22.94089 27.02126
Weide wk7 19.71200 0.6677398 58 17.60129 21.82270
Stall wk8 22.65167 0.6460168 59 20.61120 24.69214
Weide wk8 19.35759 0.6678912 58 17.24641 21.46877
Stall wk9 22.64381 0.6460481 59 20.60324 24.68438
Weide wk9 19.26869 0.6679436 58 17.15735 21.38004
For completeness, here is a solution using xyplot:
# Reproducible data
lsmeans2 = structure(list(Group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Stall",
"Weide"), class = "factor"), Time = structure(c(1L, 1L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L,
10L), .Label = c("wk1", "wk10", "wk2", "wk3", "wk4", "wk5", "wk6",
"wk7", "wk8", "wk9"), class = "factor"), lsmean = c(26.23299,
25.12652, 21.8995, 18.45845, 25.38004, 22.90409, 25.02474, 24.05886,
23.9163, 22.23608, 23.97382, 18.1455, 24.48899, 19.40022, 24.98107,
19.712, 22.65167, 19.35759, 22.64381, 19.26869), SE = c(0.6460481,
0.670108, 0.6460589, 0.6679617, 0.6460168, 0.6679617, 0.6459262,
0.6679436, 0.6456643, 0.6678912, 0.6493483, 0.6677398, 0.6456643,
0.6697394, 0.6459262, 0.6677398, 0.6460168, 0.6678912, 0.6460481,
0.6679436), df = c(59L, 58L, 59L, 58L, 59L, 58L, 59L, 58L, 59L,
58L, 59L, 58L, 59L, 58L, 59L, 58L, 59L, 58L, 59L, 58L), lower.CL = c(24.19243,
23.00834, 19.8589, 16.34705, 23.33957, 20.79269, 22.98455, 21.94751,
21.87694, 20.1249, 21.92283, 16.0348, 22.44963, 17.28319, 22.94089,
17.60129, 20.6112, 17.24641, 20.60324, 17.15735), upper.CL = c(28.27356,
27.24471, 23.9401, 20.56986, 27.4205, 25.01549, 27.06492, 26.1702,
25.95565, 24.34726, 26.02481, 20.2562, 26.52834, 21.51724, 27.02126,
21.8227, 24.69214, 21.46877, 24.68438, 21.38004)), .Names = c("Group",
"Time", "lsmean", "SE", "df", "lower.CL", "upper.CL"), class = "data.frame", row.names = c(NA,
-20L))
xyplot(lsmean~Time, type="b", group=Group, data=lsmeans2[order(lsmeans2$Time),],
panel = function(x, y, ...){
panel.arrows(x, y, x, lsmeans2$upper.CL, length = 0.15,
angle = 90, col=c("darkorange","darkgreen"))
panel.arrows(x, y, x, lsmeans2$lower.CL, length = 0.15,
angle = 90, col=c("darkorange","darkgreen"))
panel.xyplot(x,y, ...)
},
pch=16, ylim=c(10,35), col=c("darkorange","darkgreen"),
ylab="FCM (kg/day)", xlab="Week", lwd=2,
key=list(space="top",
lines=list(col=c("darkorange","darkgreen"),lty=c(1,1),lwd=2),
text=list(c("Confinement Group","Pasture Group"), cex=0.8)))
The length argument in panel.arrows changes the width of the error heads. You can fiddle around with this parameter to get a width you like.
Notice that even though you passed lsmeans2[order(lsmeans2$Time),] as the data = argument, the ordering of Time is still wrong. This is because Time is a factor whose levels are sorted alphabetically, and R doesn't know that you want them ordered by the numeric suffix of wk. As a result, wk10 is sorted before wk2, because the character 1 comes before 2. You can use the little trick below to order the levels correctly:
# Order first by the character length, then by Time
Timelevels = levels(lsmeans2$Time)
Timelevels = Timelevels[order(nchar(Timelevels), Timelevels)]
# Reorder the levels
lsmeans2$Time = factor(lsmeans2$Time, levels = Timelevels)
# Create Subset
lsmeansSub = lsmeans2[order(lsmeans2$Time),]
xyplot(lsmean~Time, type="b", group=Group, data=lsmeansSub,
panel = function(x, y, yu, yl, ...){
panel.arrows(x, y, x, lsmeansSub$upper.CL, length = 0.15,
angle = 90, col=c("darkorange","darkgreen"))
panel.arrows(x, y, x, lsmeansSub$lower.CL, length = 0.15,
angle = 90, col=c("darkorange","darkgreen"))
panel.xyplot(x, y, ...)
},
pch=16, ylim=c(10,35), col=c("darkorange","darkgreen"),
ylab="FCM (kg/day)", xlab="Week", lwd=2,
key=list(space="top",
lines=list(col=c("darkorange","darkgreen"),lty=c(1,1),lwd=2),
text=list(c("Confinement Group","Pasture Group"), cex=0.8)))
Note that even after reordering the levels of "Time", I still need to use the sorted data for the data = argument. This is because xyplot plots the points in the order in which they appear in the dataset, not in the order of the factor levels.
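As an alternative to sorting by character length, the levels can be ordered by the number that follows the wk prefix. A small sketch, assuming every level has the form wk followed by digits; wk_num is a name introduced for this example.

```r
# Factor with the same alphabetically sorted levels as in the question
Time <- factor(c("wk1", "wk10", "wk2", "wk3", "wk4",
                 "wk5", "wk6", "wk7", "wk8", "wk9"))
levels(Time)  # alphabetical order, so wk10 comes right after wk1

# Extract the numeric suffix of each level and order the levels by it
wk_num <- as.integer(sub("^wk", "", levels(Time)))
Time <- factor(Time, levels = levels(Time)[order(wk_num)])
levels(Time)  # now in week order
```

This variant does not depend on all shorter labels sorting before longer ones, but it does rely on the prefix being exactly wk.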
Is there a particular reason you want to use xyplot? I find ggplot2 much easier to work with, and its output prettier. Here's an example of what I think you want.
#load ggplot2
library(ggplot2)
#load data
d = structure(list(Group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Stall",
"Weide"), class = "factor"), Time = structure(c(1L, 1L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L,
10L), .Label = c("wk1", "wk10", "wk2", "wk3", "wk4", "wk5", "wk6",
"wk7", "wk8", "wk9"), class = "factor"), lsmean = c(26.23299,
25.12652, 21.8995, 18.45845, 25.38004, 22.90409, 25.02474, 24.05886,
23.9163, 22.23608, 23.97382, 18.1455, 24.48899, 19.40022, 24.98107,
19.712, 22.65167, 19.35759, 22.64381, 19.26869), SE = c(0.6460481,
0.670108, 0.6460589, 0.6679617, 0.6460168, 0.6679617, 0.6459262,
0.6679436, 0.6456643, 0.6678912, 0.6493483, 0.6677398, 0.6456643,
0.6697394, 0.6459262, 0.6677398, 0.6460168, 0.6678912, 0.6460481,
0.6679436), df = c(59L, 58L, 59L, 58L, 59L, 58L, 59L, 58L, 59L,
58L, 59L, 58L, 59L, 58L, 59L, 58L, 59L, 58L, 59L, 58L), lower.CL = c(24.19243,
23.00834, 19.8589, 16.34705, 23.33957, 20.79269, 22.98455, 21.94751,
21.87694, 20.1249, 21.92283, 16.0348, 22.44963, 17.28319, 22.94089,
17.60129, 20.6112, 17.24641, 20.60324, 17.15735), upper.CL = c(28.27356,
27.24471, 23.9401, 20.56986, 27.4205, 25.01549, 27.06492, 26.1702,
25.95565, 24.34726, 26.02481, 20.2562, 26.52834, 21.51724, 27.02126,
21.8227, 24.69214, 21.46877, 24.68438, 21.38004)), .Names = c("Group",
"Time", "lsmean", "SE", "df", "lower.CL", "upper.CL"), class = "data.frame", row.names = c(NA,
-20L))
#fix week
library(stringr)
library(magrittr)
d$Time %<>% as.character() %>% str_replace(pattern = "wk", replacement = "") %>% as.numeric()
#plot
ggplot(d, aes(Time, lsmean, color = Group, group = Group)) +
geom_point() +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = .2) +
geom_line() +
ylim(10, 35) +
scale_x_continuous(name = "Week", breaks = 1:10) +
ylab("FCM (kg/day)") +
scale_color_discrete(labels = c("Confinement Group","Pasture Group"))
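The plot above draws the confidence limits. If the bars should show the standard error instead, as the question title suggests, the bounds can be computed as lsmean minus and plus SE. A minimal sketch, assuming only the first three weeks of the question's data; the ymin and ymax column names are introduced here, and the plot is skipped if ggplot2 is not installed.

```r
# First three weeks from the question's lsmeans2 table
d <- data.frame(
  Group  = rep(c("Stall", "Weide"), times = 3),
  Time   = rep(1:3, each = 2),
  lsmean = c(26.23299, 25.12652, 25.38004, 22.90409, 25.02474, 24.05886),
  SE     = c(0.6460481, 0.6701080, 0.6460168, 0.6679617, 0.6459262, 0.6679436)
)

# Error bar bounds: one standard error below and above each lsmean
d$ymin <- d$lsmean - d$SE
d$ymax <- d$lsmean + d$SE

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(Time, lsmean, color = Group, group = Group)) +
    geom_point() +
    geom_line() +
    geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.2)
  print(p)
}
```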

Scaling data in R data frame and fitting gaussian to geom_point

2 questions based on my data.frame
structure(list(Collimator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("n", "y"), class = "factor"), angle = c(0L,
15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L, 180L,
0L, 15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L,
180L), X1 = c(2099L, 11070L, 17273L, 21374L, 23555L, 23952L,
23811L, 21908L, 19747L, 17561L, 12668L, 6008L, 362L, 53L, 21L,
36L, 1418L, 6506L, 10922L, 12239L, 8727L, 4424L, 314L, 38L, 21L,
50L), X2 = c(2126L, 10934L, 17361L, 21301L, 23101L, 23968L, 23923L,
21940L, 19777L, 17458L, 12881L, 6051L, 323L, 40L, 34L, 46L, 1352L,
6569L, 10880L, 12534L, 8956L, 4418L, 344L, 58L, 24L, 68L), X3 = c(2074L,
11109L, 17377L, 21399L, 23159L, 23861L, 23739L, 21910L, 20088L,
17445L, 12733L, 6046L, 317L, 45L, 26L, 46L, 1432L, 6495L, 10862L,
12300L, 8720L, 4343L, 343L, 38L, 34L, 60L), average = c(2099.6666666667,
11037.6666666667, 17337, 21358, 23271.6666666667, 23927, 23824.3333333333,
21919.3333333333, 19870.6666666667, 17488, 12760.6666666667,
6035, 334, 46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
10888, 12357.6666666667, 8801, 4395, 333.6666666667, 44.6666666667,
26.3333333333, 59.3333333333)), .Names = c("Collimator", "angle",
"X1", "X2", "X3", "average"), row.names = c(NA, -26L), class = "data.frame")
I wish to plot detector counts versus angle with and without a collimator attached to the device. I guess geom_point is probably the best way to summarise the data
p <- ggplot(df, aes(x=angle,y=average,col=Collimator)) + geom_point() + geom_line()
Instead of plotting average count in the y-axis, I would prefer to rescale the data so that the angle with max counts has a value 1 for both collimator Y and N. The way I have done this seems quite cumbersome
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
coly = subset(df,Collimator=='y')
coly$norm_count = range01(coly$average)
coln = subset(df,Collimator=='n')
coln$norm_count = range01(coln$average)
df = rbind(coln,coly)
p <- ggplot(df, aes(x=angle,y=norm_count,col=Collimator)) + geom_point() + geom_line()
I'm sure this can be done in a more efficient manner, applying the function to the data.frame based on the variable 'Collimator'. How can I do this?
Also, I want to fit a function to the data rather than using geom_line. I think a Gaussian function may work in this case, but I have no idea how (or whether) I can implement this in stat_smooth. Can I also pull out the mean and standard deviation from such a fit?
ggplot2 goes hand in hand with the package plyr:
df <- ddply(df,.(Collimator),
transform,
norm_count1 = (average - min(average)) / (max(average) - min(average)) )
joran's answer scales the highest value to 1 and the lowest to 0; if you just want to scale to make the highest value 1 (and leaving 0 as 0), it is even simpler.
library("plyr")
df <- ddply(df, .(Collimator), transform,
norm.average = average / max(average))
Then the plot is
ggplot(df, aes(x=angle,y=norm.average,col=Collimator)) +
geom_point() + geom_line()
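The same per-group scaling can be done in base R with ave(), which applies a function within each level of a grouping variable and returns a vector aligned with the original rows. A minimal sketch, assuming a four-angle subset of the question's data (values rounded to four decimals):

```r
# Four angles per collimator setting, taken from the question's data
df <- data.frame(
  Collimator = rep(c("n", "y"), each = 4),
  angle      = rep(c(60, 75, 90, 105), times = 2),
  average    = c(23271.6667, 23927, 23824.3333, 21919.3333,
                 6523.3333, 10888, 12357.6667, 8801)
)

# Scale so that the highest value in each group is 1 (0 stays 0)
df$norm.average <- ave(df$average, df$Collimator,
                       FUN = function(v) v / max(v))

# Scale each group to the range [0, 1], as range01 does in the question
range01 <- function(v) (v - min(v)) / (max(v) - min(v))
df$norm_count <- ave(df$average, df$Collimator, FUN = range01)

print(df)
```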
