Adding axis labels and title to ggballoonplot() - r

The code I used and the result can be seen in the image below. The main problem is that the title doesn't appear in the center and the x and y labels don't appear at all. How do I fix this?
The graph and code

Please post your code as text rather than as an image, along with your data, so that we can easily reproduce the problem on our own machines.
Take the example below: it recreates the data set, so you can run the code immediately.
Use ggtitle(), xlab() and ylab() to add the text, and center the title with theme(plot.title = element_text(hjust = 0.5)).
If this does not help, the problem is probably in your print / render settings.
library(ggplot2)
library(data.table)  # provides data.table(), fread() and a melt() method
# Recreate the example data so the code runs without any external file
balloon <- data.table(structure(list(Genera = c("Prevotella", "Treponema", "Fusobacterium","Selenomonas", "Veillonella", "Porphyromonas", "Streptococcus","Leptotrichia", "Aggregatibacter", "Succiniclasticum"), S1 = c(97L,28L, 11L, 40L, 5L, 13L, 10L, 24L, 0L, 16L), S3 = c(5370L, 3760L,5551L, 2087L, 533L, 873L, 1330L, 5877L, 1213L, 44L), S4 = c(7892L,8004L, 11017L, 19712L, 5115L, 2695L, 7451L, 13611L, 301L, 2557L), S5 = c(23L, 79L, 30L, 7L, 0L, 34L, 0L, 2L, 2L, 0L), S6 = c(8310L,3379L, 38058L, 1133L, 2506L, 17811L, 12103L, 403L, 668L, 3L),S2 = c(7379L, 14662L, 10085L, 148L, 1502L, 5222L, 1010L,2463L, 4790L, 28L), S7 = c(6238L, 18977L, 2674L, 2198L, 27L,2999L, 174L, 1197L, 5268L, 5L), S8 = c(20019L, 18674L, 15306L,1472L, 1898L, 9600L, 1683L, 2221L, 3435L, 1109L), S9 = c(153L,12L, 23L, 36L, 15L, 15L, 6L, 41L, 0L, 30L), S10 = c(20103L,29234L, 10857L, 2869L, 4923L, 14206L, 1415L, 4574L, 649L,2160L)), .Names = c("Genera", "S1", "S3", "S4", "S5", "S6","S2", "S7", "S8", "S9", "S10"), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
# Alternatively, read your own data from a file instead:
# balloon <- fread("Downloads/balloon.csv")
balloon
# Reshape to long format: one row per Genera / sample combination
balloon_melted <- melt(balloon, id.vars = "Genera")
head(balloon_melted)
p <- ggplot(balloon_melted, aes(x = variable, y = Genera))
p +
  geom_point(aes(size = value)) +
  theme(panel.background = element_blank(),
        # in ggplot2 >= 3.4.0, use linewidth = 1 instead of size = 1
        panel.border = element_rect(colour = "blue", fill = NA, size = 1)) +
  ggtitle("Pretty title") +
  xlab("x lab label") +
  ylab("y lab label") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title

Related

Order Bars in ggplot2 from high to low (when repeating words are used)

I am trying to reorder the bars in a ggplot2 bar chart from the highest values to the lowest, with the highest values at the top of the chart and the lowest at the bottom.
I've used this Stack Overflow post for other plots and it works with no problem.
However, ggplot2 seems to have a problem when the same values appear in both facets: it does not produce the correct ordering in the plot.
Here is what it looks like now. As you can see, it is out of order. Ideally, I'd like the Unvax_to_Vax facet to read (from top to bottom): safe, sheep, good, dumb, stupid, scared and I'd like the Vax_to_Unvax facet to read (from top to bottom): stupid, selfish, ignorant, dumb, unsafe, foolish.
Here is the data and code to reproduce the figure.
df <- structure(list(Var1 = structure(c(8L, 7L, 4L, 1L, 9L, 2L, 5L,
10L, 3L, 1L, 8L, 6L), .Label = c("dumb", "foolish", "good", "ignorant",
"safe", "scared", "selfish", "stupid", "unsafe", "sheep"), class = "factor"),
Freq = c(101L, 94L, 47L, 33L, 29L, 24L, 27L, 22L, 18L, 15L,
15L, 11L), Percent = c(8.82096069868996, 8.20960698689956,
4.10480349344978, 2.882096069869, 2.53275109170306, 2.09606986899563,
5.54414784394251, 4.51745379876797, 3.69609856262834, 3.08008213552361,
3.08008213552361, 2.25872689938398), Group = c("Vax_to_Unvax",
"Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax",
"Vax_to_Unvax", "Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax",
"Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax")), row.names = c(319L,
292L, 147L, 82L, 375L, 98L, 173L, 182L, 76L, 54L, 190L, 176L), class = "data.frame")
ggplot(df,
       aes(x = reorder(Var1, Freq), y = Percent, fill = Group)) +
  geom_bar(stat = "identity") +
  facet_wrap(Group ~ ., scales = "free") +
  coord_flip()
Thank you for your help.
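One possible approach, offered as a sketch rather than a definitive answer: reorder(Var1, Freq) orders each word by a single summary of Freq over all rows, so a word that occurs in both facets ("dumb", "stupid") gets one global position. Building a key that is unique per word and facet avoids this. The column name key and the "___" separator are arbitrary choices made for this illustration, and the data frame below is a hand-copied version of the question's data with Percent rounded to two decimals.

```r
library(ggplot2)

# Hand-copied version of the question's data (Percent rounded for brevity)
df <- data.frame(
  Var1 = c("stupid", "selfish", "ignorant", "dumb", "unsafe", "foolish",
           "safe", "sheep", "good", "dumb", "stupid", "scared"),
  Freq = c(101, 94, 47, 33, 29, 24, 27, 22, 18, 15, 15, 11),
  Percent = c(8.82, 8.21, 4.10, 2.88, 2.53, 2.10,
              5.54, 4.52, 3.70, 3.08, 3.08, 2.26),
  Group = rep(c("Vax_to_Unvax", "Unvax_to_Vax"), each = 6)
)

# Key that is unique per word *and* facet ("___" is an arbitrary separator)
df$key <- paste(df$Var1, df$Group, sep = "___")
# Each key occurs exactly once, so ordering by Freq is now done per facet
df$key <- reorder(df$key, df$Freq)

p <- ggplot(df, aes(x = key, y = Percent, fill = Group)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ Group, scales = "free") +
  coord_flip() +
  # Strip the facet suffix so the axis shows the plain word again
  scale_x_discrete(labels = function(x) sub("___.*$", "", x))
print(p)
```

Words with equal Freq inside one facet (here "dumb" and "stupid", both 15) are tied, so their relative order follows the existing level order rather than the data.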

Using geom_errorbar in ggplot2 results in "Error: geom_errorbar requires the following missing aesthetics: ymin, ymax"

I wanted to create a visualisation for some data I had collected using ggplot2. Everything works fine, except that I cannot add error bars for some reason. The code I used is the following:
graph2 <- ggplot(enth_comb, aes(saturated, eocv, color=oil))
graph2 <- graph2 + geom_point()
This worked fine and resulted in the graph I expected. Then I added the following
graph2 <- graph2 + geom_errorbar(aes(ymin = v_lowlim, ymax = v_highlim))
This gives me the error "Error: geom_errorbar requires the following missing aesthetics: ymin, ymax" despite having provided ymin and ymax. I also tried adding an x value and removing 'aes' but it resulted in the same error.
The data is the following
I appreciate any help or suggestions.
Edit: Added output of dput(enth_comb)
structure(list(oil = structure(c(4L, 6L, 3L, 5L, 2L, 1L), .Label = c("coconut",
"palm", "peanut", "rapeseed", "rice", "sunflower"), class = "factor"),
saturated = c(8L, 11L, 17L, 25L, 82L, 88L), sonounsaturated = c(64L,
20L, 46L, 38L, 7L, 12L), Polyunsaturated = c(28L, 69L, 32L,
37L, 11L, 0L), eocv = c(26991L, 26746L, 28817L, 30056L, 20635L,
29497L), eocm = c(31204L, 30892L, 32964L, 34436L, 22979L,
33233L), eocv_error = c(2073L, 602L, 1932L, 5578L, 2128L,
1267L), eocm_error = c(2396L, 695L, 2210L, 6391L, 2369L,
1427L), v_highlim = c(29064L, 27348L, 30749L, 35634L, 22763L,
30764L), v_lowlim = c(24918L, 26144L, 26885L, 24478L, 18507L,
28230L), m_highlim = c(33600L, 31587L, 35174L, 40827L, 25348L,
34660L), m_lowlim = c(28808L, 30197L, 30754L, 28045L, 20610L,
31806L)), class = "data.frame", row.names = c(NA, -6L))
The full solution is to chain all the layers together in one expression:
ggplot(enth_comb, aes(saturated, eocv, color = oil)) +
  geom_point() +
  geom_errorbar(aes(ymin = v_lowlim, ymax = v_highlim))

Fitting gaussian to data geom_point in ggplot2

I have the following data set
structure(list(Collimator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("n", "y"), class = "factor"), angle = c(0L,
15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L, 180L,
0L, 15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L,
180L), X1 = c(2099L, 11070L, 17273L, 21374L, 23555L, 23952L,
23811L, 21908L, 19747L, 17561L, 12668L, 6008L, 362L, 53L, 21L,
36L, 1418L, 6506L, 10922L, 12239L, 8727L, 4424L, 314L, 38L, 21L,
50L), X2 = c(2126L, 10934L, 17361L, 21301L, 23101L, 23968L, 23923L,
21940L, 19777L, 17458L, 12881L, 6051L, 323L, 40L, 34L, 46L, 1352L,
6569L, 10880L, 12534L, 8956L, 4418L, 344L, 58L, 24L, 68L), X3 = c(2074L,
11109L, 17377L, 21399L, 23159L, 23861L, 23739L, 21910L, 20088L,
17445L, 12733L, 6046L, 317L, 45L, 26L, 46L, 1432L, 6495L, 10862L,
12300L, 8720L, 4343L, 343L, 38L, 34L, 60L), average = c(2099.6666666667,
11037.6666666667, 17337, 21358, 23271.6666666667, 23927, 23824.3333333333,
21919.3333333333, 19870.6666666667, 17488, 12760.6666666667,
6035, 334, 46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
10888, 12357.6666666667, 8801, 4395, 333.6666666667, 44.6666666667,
26.3333333333, 59.3333333333)), .Names = c("Collimator", "angle",
"X1", "X2", "X3", "average"), row.names = c(NA, -26L), class = "data.frame")
I first scale the average counts for both collimator settings (y and n) so that the highest count in each is 1:
library(plyr)
df <- ddply(df, .(Collimator), transform,
            norm.average = average / max(average))
and plot the curves:
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) +
  geom_point() + geom_line()
Using geom_line is not very pleasing to the eye, and I would rather fit a curve to the data using stat_smooth. Each data set should be symmetric about the mean, so I think a Gaussian fit should be ideal. How can I fit a Gaussian to the data for Collimator == "y" and Collimator == "n", in ggplot2 or using base R? I would also like to output the mean and standard deviation. Can this be done?
Strictly speaking, your data are not Gaussian, only Gaussian-shaped. Here is an example of fitting such a curve and plotting the fit:
fit <- dlply(df, .(Collimator), function(x) {
  co <- coef(nls(norm.average ~ exp(-(angle - m)^2/(2 * s^2)), data = x, start = list(s = 50, m = 80)))
  stat_function(fun = function(x) exp(-(x - co["m"])^2/(2 * co["s"]^2)), data = x)
})
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) + geom_point() + fit
Updated
To obtain the parameters:
fit <- dlply(df, .(Collimator), function(x) {
  co <- coef(nls(norm.average ~ exp(-(angle - m)^2/(2 * s^2)), data = x, start = list(s = 50, m = 80)))
  r <- stat_function(fun = function(x) exp(-(x - co["m"])^2/(2 * co["s"]^2)), data = x)
  attr(r, ".coef") <- co  # keep the fitted coefficients with the layer
  r
})
Then:
> ldply(fit, attr, ".coef")
  Collimator        s        m
1          n 52.99117 82.60820
2          y 21.99518 86.61268
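Since the question also asks about base R, here is a minimal sketch that fits the same model with nls() and draws it with curve(), without plyr or ggplot2. The data below are a stand-in for the Collimator "y" rows of the question (averages rounded to whole counts), and the names d, fit and co are chosen for this illustration only.

```r
# Stand-in for the Collimator "y" rows of the question (averages rounded)
angle <- seq(0, 180, by = 15)
average <- c(46, 27, 43, 1401, 6523, 10888, 12358, 8801, 4395, 334, 45, 26, 59)
d <- data.frame(angle = angle, norm.average = average / max(average))

# Gaussian-shaped curve with peak height fixed at 1, same model as above
fit <- nls(norm.average ~ exp(-(angle - m)^2 / (2 * s^2)),
           data = d, start = list(m = 80, s = 50))
co <- coef(fit)  # co["m"]: fitted centre (mean), co["s"]: fitted width (sd)
print(co)

# Points plus the fitted curve
plot(d$angle, d$norm.average, xlab = "angle", ylab = "normalised average")
curve(exp(-(x - co["m"])^2 / (2 * co["s"]^2)), add = TRUE)
```

Because the input is only rounded, the estimates should land close to the "y" row of the table above. The same call can be repeated for the "n" rows, for example by looping over split(df, df$Collimator).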

Scaling data in R data frame and fitting gaussian to geom_point

2 questions based on my data.frame
structure(list(Collimator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("n", "y"), class = "factor"), angle = c(0L,
15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L, 180L,
0L, 15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L,
180L), X1 = c(2099L, 11070L, 17273L, 21374L, 23555L, 23952L,
23811L, 21908L, 19747L, 17561L, 12668L, 6008L, 362L, 53L, 21L,
36L, 1418L, 6506L, 10922L, 12239L, 8727L, 4424L, 314L, 38L, 21L,
50L), X2 = c(2126L, 10934L, 17361L, 21301L, 23101L, 23968L, 23923L,
21940L, 19777L, 17458L, 12881L, 6051L, 323L, 40L, 34L, 46L, 1352L,
6569L, 10880L, 12534L, 8956L, 4418L, 344L, 58L, 24L, 68L), X3 = c(2074L,
11109L, 17377L, 21399L, 23159L, 23861L, 23739L, 21910L, 20088L,
17445L, 12733L, 6046L, 317L, 45L, 26L, 46L, 1432L, 6495L, 10862L,
12300L, 8720L, 4343L, 343L, 38L, 34L, 60L), average = c(2099.6666666667,
11037.6666666667, 17337, 21358, 23271.6666666667, 23927, 23824.3333333333,
21919.3333333333, 19870.6666666667, 17488, 12760.6666666667,
6035, 334, 46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
10888, 12357.6666666667, 8801, 4395, 333.6666666667, 44.6666666667,
26.3333333333, 59.3333333333)), .Names = c("Collimator", "angle",
"X1", "X2", "X3", "average"), row.names = c(NA, -26L), class = "data.frame")
I wish to plot detector counts versus angle with and without a collimator attached to the device. I guess geom_point is probably the best way to summarise the data
p <- ggplot(df, aes(x=angle,y=average,col=Collimator)) + geom_point() + geom_line()
Instead of plotting average count in the y-axis, I would prefer to rescale the data so that the angle with max counts has a value 1 for both collimator Y and N. The way I have done this seems quite cumbersome
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
coly = subset(df,Collimator=='y')
coly$norm_count = range01(coly$average)
coln = subset(df,Collimator=='n')
coln$norm_count = range01(coln$average)
df = rbind(coln,coly)
p <- ggplot(df, aes(x=angle,y=norm_count,col=Collimator)) + geom_point() + geom_line()
I'm sure this can be done in a more efficient manner, applying the function to the data.frame based on the variable 'Collimator'. How can I do this?
Also I want to fit a function to the data rather than using geom_line. I think a Gaussian function may work in this case, but I have no idea how (or whether) I can implement this in stat_smooth. Can I also pull out the mean and standard deviation from such a fit?
ggplot2 goes hand in hand with the package plyr:
library(plyr)
df <- ddply(df, .(Collimator),
            transform,
            norm_count1 = (average - min(average)) / (max(average) - min(average)))
joran's answer scales the highest value to 1 and the lowest to 0; if you just want to scale to make the highest value 1 (and leaving 0 as 0), it is even simpler.
library("plyr")
df <- ddply(df, .(Collimator), transform,
            norm.average = average / max(average))
Then the plot is
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) +
  geom_point() + geom_line()
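For readers who prefer not to load plyr, both scalings above can also be sketched with base R's ave(), which applies a function within each group and returns a vector aligned with the original rows. The toy data frame below is made up for illustration; with the real data, only the two ave() lines would be needed.

```r
# Toy data standing in for the question's data frame (values are made up)
df <- data.frame(
  Collimator = rep(c("n", "y"), each = 3),
  average = c(10, 40, 20, 1, 5, 2)
)

# Scale so the largest value in each group is 1 (0 stays 0)
df$norm.average <- ave(df$average, df$Collimator,
                       FUN = function(x) x / max(x))

# Scale each group to the range [0, 1] (smallest value becomes 0)
df$norm_count <- ave(df$average, df$Collimator,
                     FUN = function(x) (x - min(x)) / (max(x) - min(x)))
print(df)
```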

R plot- SGAM plot counts vs. time - how do I get dates on the x-axis?

I'd like to plot this vs. time, with the actual dates (years, actually: 1997, 1998, ..., 2010). The dates are in a raw format, à la SAS: days since 1960 (hence the as.Date conversion). If I convert the dates to a variable x using as.Date and do the GAM plot, I get an error. It works fine with the raw day numbers, but I want the plot to display the years (the data are not equally spaced).
structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L,
928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L,
928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L), date = c(13493L,
13534L, 13566L, 13611L, 13723L, 13752L, 13804L, 13837L, 13927L,
14028L, 14082L, 14122L, 14150L, 14182L, 14199L, 16198L, 16279L,
16607L, 16945L, 17545L, 17650L, 17743L, 17868L, 17941L, 18017L,
18092L), y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L,
11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L, 6L, 1L, 0L, 5L,
0L)), .Names = c("site", "date", "y"), class = "data.frame", row.names = c(NA,
-26L))
sgam1 <- gam(sites$y ~ s(sites$date))
sgam <- predict(sgam1, se=TRUE)
plot(sites$date,sites$y,xaxt="n", xlab='Time', ylab='Counts')
x<-as.Date(sites$date, origin="1960-01-01")
axis(1, at=1:26,labels=x)
lines(sites$date,sgam$fit, lty = 1)
lines(sites$date,sgam$fit + 1.96* sgam$se, lty = 2)
lines(sites$date,sgam$fit - 1.96* sgam$se, lty = 2)
ggplot2 has a solution (it doesn't mind the as.Date conversion), but it gives me other problems...
Use the origin= argument to as.Date() to specify a particular offset:
R> as.Date(c(928, 928, 930), origin="1960-01-01")
[1] "1962-07-17" "1962-07-17" "1962-07-19"
R>
Once you have a Date type for your data, you have options for formatting the axis as you wish.
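As one illustration of those formatting options, the sketch below converts a few of the question's day counts to Date and labels the x-axis with one tick per year using axis.Date(). Only five of the 26 observations are used to keep the example short, and the tick range 1997 to 2010 is taken from the years mentioned in the question.

```r
# A few of the question's observations (days since 1960-01-01, SAS style)
days <- c(13493, 14028, 16198, 17545, 18092)
y <- c(7, 24, 14, 4, 0)
dates <- as.Date(days, origin = "1960-01-01")

# Suppress the default x-axis, then draw one tick per year showing only the year
plot(dates, y, xaxt = "n", xlab = "Time", ylab = "Counts")
ticks <- seq(as.Date("1997-01-01"), as.Date("2010-01-01"), by = "year")
axis.Date(1, at = ticks, format = "%Y")
```

Fitted lines can then be added with lines(dates, ...) exactly as with the raw day numbers, since the x positions are the same observations expressed as dates.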
sites <- read.table("349.txt", header = TRUE, sep = "\t", quote="\"", dec=".")
p<-as.Date(sites$date, origin="1960-01-01")
sgam1 <- gam(sites$y ~ s(sites$date))
sgam <- predict(sgam1, se=TRUE)
plot(p,sites$y, xlab='Time', ylab='Counts')
lines(p,sgam$fit, lty = 1)
lines(p,sgam$fit + 1.96* sgam$se, lty = 2)
lines(p,sgam$fit - 1.96* sgam$se, lty = 2)
This works!
