I would like to add a bracket with geom_bracket that spans two of my countries, the United Kingdom (UK) and France (FR). The following code plots the three estimates:
library(ggpubr)
library(ggplot2)
df %>%
ggplot(aes(estimate, cntry)) +
geom_point()
However, whenever I add geom_bracket as below, I get an error. I have tried to get around it in different ways, but it still does not work. Could someone let me know what I am doing wrong?
df %>%
ggplot(aes(estimate, cntry)) +
geom_point() +
geom_bracket(ymin = "UK", ymax = "FR", x.position = -.75, label.size = 7,
label = "group 1")
Here is a reproducible example:
structure(list(cntry = structure(1:3, .Label = c("BE", "FR",
"UK"), class = "factor"), estimate = c(-0.748, 0.436,
-0.640)), row.names = c(NA, -3L), groups = structure(list(
cntry = structure(1:3, .Label = c("BE", "FR", "UK"), class = "factor"),
.rows = structure(list(1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Well, it's pretty damn late, but I figured out a workaround for this. I thought that I might as well post it here in case anyone finds it useful.
Firstly, as Basti mentioned, ymin, ymax, and x.position aren't arguments that can be used - you have to use xmin, xmax, and y.position. Now, won't this only work for a flipped graph (i.e. x = cntry, y = estimate)? Yes, it will. However you can easily get around this by using coord_flip().
Secondly, it turns out that geom_bracket doesn't inherit the plot's data (df) and won't run unless the data is passed to it directly. Why? No idea. But this is what was causing the error. Additionally, for some reason, merely supplying the data isn't enough; a label must also be given. Not a problem here, I just thought I'd mention it for dumb people like me who decided to use geom_bracket to add brackets to stat_compare_means.
Here's an example of the OP that should work, along with data generation:
library(ggplot2)
library(ggpubr)
library(tibble) #I like tibbles
df <- tibble(cntry = factor(c("BE", "FR", "UK")),
estimate = c(-0.748,0.436,-0.64)) #dataframe generation
df %>%
ggplot(aes(cntry, estimate)) +
geom_point() +
coord_flip() + #necessary if you want to keep this weird x/y orientation
geom_bracket(data = df, xmin = "UK", xmax = "FR", y.position = -.75,
label.size = 7, label = "group 1", coord.flip = TRUE)
#coord.flip = TRUE reflects the added coord_flip()
You can then play around with y coordinates, size, etc. You can also expand the graph using expand_limits().
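As a rough illustration of the expand_limits() suggestion (a sketch only; the value -1 is an arbitrary choice for illustration, not something from the original post), widening the estimate axis leaves room for a bracket placed near -0.75:

```r
library(ggplot2)

# Same three estimates as in the question
df <- data.frame(cntry = factor(c("BE", "FR", "UK")),
                 estimate = c(-0.748, 0.436, -0.64))

# expand_limits() forces the estimate axis to include -1
# (-1 is an arbitrary illustrative value)
p <- ggplot(df, aes(cntry, estimate)) +
  geom_point() +
  coord_flip() +
  expand_limits(y = -1)

print(p)
```

The geom_bracket layer from the example above can then be added to p in the usual way.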
I have the following code, that generates the following heatmap in R.
ggplot(data = hminput, color=category, aes(x = Poblaciones, y = Variantes)) +
geom_tile(aes(fill = Frecuencias)) + scale_colour_gradient(name = "Frecuencias",low = "blue", high = "white",guide="colourbar")
hminput is a data frame with three columns: Poblaciones, Variantes and Frecuencias, where the first two are the x and y axis and the third one is the color reference.
My desired output is a heatmap with a continuous colour bar as the legend instead of those separate blocks, and with a white-to-blue gradient instead of the multicolour palette.
To achieve that, I tried what's in my code, but I'm not achieving what I want (I'm getting the graph you see in the picture). Any thoughts? Thanks!
As some people asked, here is the dput of the data frame :
> dput(hminput)
structure(list(Variantes = structure(c(1L, 2L, 3L, 4L,...), .Label =
c("rs10498633", "rs10792832", "rs10838725",
"rs10948363", ..., "SNP"), class = "factor"),
Poblaciones = c("AFR", "AFR", ...), Frecuencias = structure(c(12L,
10L,...), .Label = c("0.01135", "0.0121",
"0.01286", "0.01513", "0.02194", "0.05144", "0.05825", "0.059",
"0.07716", "0.0938", "0.1051", "0.1225", "0.1346", "0.1407",
"0.1566", "0.1604", "0.1619", "0.1838", "0.1914", "0.1929",
...,
"0.45", "0.5", "0.4"), class = "factor")), .Names = c("Variantes",
"Poblaciones", "Frecuencias"), row.names = c("frqAFR.33", "frqAFR.31",
"frqAFR.27", "frqAFR.14", "frqAFR.24",...
), class = "data.frame")
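One possible explanation, judging from the dput output: Frecuencias is stored as a factor, so ggplot2 treats it as discrete and draws a legend of separate blocks. In addition, geom_tile is coloured through the fill aesthetic, so a colour scale has no effect on it. A minimal sketch under those assumptions follows. The small data frame is a made-up stand-in for hminput (the "EUR" population and the particular values are invented for illustration):

```r
library(ggplot2)

# Hypothetical stand-in for hminput; Frecuencias is a factor, as in the dput
hminput <- data.frame(
  Variantes   = rep(c("rs10498633", "rs10792832"), times = 2),
  Poblaciones = rep(c("AFR", "EUR"), each = 2),
  Frecuencias = factor(c("0.1225", "0.0938", "0.45", "0.5"))
)

# Convert the factor labels to numbers (via character, not the level codes)
hminput$Frecuencias <- as.numeric(as.character(hminput$Frecuencias))

# A fill scale (not a colour scale) controls the tile colours
p <- ggplot(hminput, aes(x = Poblaciones, y = Variantes)) +
  geom_tile(aes(fill = Frecuencias)) +
  scale_fill_gradient(name = "Frecuencias", low = "white", high = "blue",
                      guide = "colourbar")

print(p)
```

Swap low and high if the gradient should run from blue to white instead.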
Let's say I have a saved plot named my_plot, produced with ggplot. Also, let's say that the column in the my_plot[[1]] data frame used for the horizontal axis is named my_dates.
Now, I want to add some vertical lines to the plot, which, of course, can be done with something like this:
my_plot +
geom_vline(aes(xintercept = my_dates[c(3, 8)]))
Since I perform this task on a regular basis, I want to write a function for it, something like this:
ggplot.add_lines <- function(given_plot, given_points) {
finale <- given_plot +
geom_vline(aes(xintercept = given_plot[[1]]$my_dates[given_points]))
return(finale)
}
Which, as it's probably obvious to everyone, doesn't work:
> ggplot.add_lines(my_plot, c(3, 5))
Error in eval(expr, envir, enclos) : object 'given_plot' not found
So, my question would be what am I doing wrong, and how can it be fixed? Below is some data for a reproducible example:
> dput(my_plot)
structure(list(data = structure(list(my_dates = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10), my_points = c(-2.20176409422924, -1.12872396340683,
-0.259703895194354, 0.634233385649338, -0.678983982973015, -1.83157126614836,
1.33360095418957, -0.120455389285709, -0.969431974863616, -1.20451262626184
)), .Names = c("my_dates", "my_points"), row.names = c(NA, -10L
), class = "data.frame"), layers = list(<environment>), scales = <S4 object of class structure("Scales", package = "ggplot2")>,
mapping = structure(list(x = my_dates, y = my_points), .Names = c("x",
"y"), class = "uneval"), theme = list(), coordinates = structure(list(
limits = structure(list(x = NULL, y = NULL), .Names = c("x",
"y"))), .Names = "limits", class = c("cartesian", "coord"
)), facet = structure(list(shrink = TRUE), .Names = "shrink", class = c("null",
"facet")), plot_env = <environment>, labels = structure(list(
x = "my_dates", y = "my_points"), .Names = c("x", "y"
))), .Names = c("data", "layers", "scales", "mapping", "theme",
"coordinates", "facet", "plot_env", "labels"), class = c("gg",
"ggplot"))
According to this post, below is my solution to this problem. The environment issue in the **ply and ggplot is annoying.
ggplot.add_lines <- function(given_plot, given_points) {
finale <- eval(substitute( expr = {given_plot +
geom_vline(aes(xintercept = my_dates[given_points]))}, env = list(given_points = given_points)))
return(finale)
}
The following code runs well on my machine. (I cannot make your reproducible example work on my machine...)
df <- data.frame(my_dates = 1:10, val = 1:10)
my_plot <- ggplot(df, aes(x = my_dates, y = val)) + geom_line()
my_plot <- ggplot.add_lines(my_plot, c(3, 5))
print(my_plot)
Update: The above solution fails when more than two points are used.
It seems that we can easily solve this problem by not using aes at all (subsetting inside aes causes problems):
ggplot.add_lines <- function(given_plot, given_points) {
finale <- given_plot + geom_vline(xintercept = given_plot[[1]]$my_dates[given_points])
return(finale)
}
I would take the following approach: extract the data.frame of interest, and pass it to the new layer,
df <- data.frame(my_dates = 1:10, val = rnorm(10))
my_plot <- ggplot(df, aes(x = my_dates, y = val)) + geom_line()
add_lines <- function(p, given_points=c(3,5), ...){
d <- p[["data"]][given_points,]
p + geom_vline(data = d, aes_string(xintercept="my_dates"), ...)
}
add_lines(my_plot, c(3,5), lty=2)
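Since aes_string() is deprecated in recent ggplot2 releases, here is a sketch of the same idea using the .data pronoun instead. The x_col argument is an addition for illustration and is not part of the original answer:

```r
library(ggplot2)

# Pull the rows of interest out of the plot's own data and
# hand them to geom_vline as that layer's data.
# x_col is a hypothetical extra argument naming the x column.
add_lines <- function(p, given_points = c(3, 5), x_col = "my_dates", ...) {
  d <- p$data[given_points, , drop = FALSE]
  p + geom_vline(data = d, aes(xintercept = .data[[x_col]]), ...)
}

set.seed(1)
df <- data.frame(my_dates = 1:10, val = rnorm(10))
my_plot <- ggplot(df, aes(x = my_dates, y = val)) + geom_line()

# Works for any number of points, here three
p2 <- add_lines(my_plot, c(3, 5, 8), linetype = 2)
print(p2)
```

Because the rows are selected before the layer is built, nothing inside aes() needs to look up the function's arguments in the plot environment, which is what triggered the original "object 'given_plot' not found" error.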
My data looks something like this:
There are 10,000 rows, each representing a city and all months since 1998-01 to 2013-9:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to do a plot for all months since 1998 for any city or more than one city.
I tried this, but I get an error. I am not sure if I am even attempting this the right way. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plots(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
You might try
require(reshape)
require(ggplot2)
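# id.vars must match the actual identifier column names in your data
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# "1998-01" has no day component, so append one before converting to Date
forecl$month <- as.Date(paste0(forecl$month, "-01"))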
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()
To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are too many cities and the colors will be hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)
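For reference, a self-contained sketch of the reshape-then-plot approach using the four sample rows from the question. It uses tidyr::pivot_longer in place of reshape::melt (a substitution for illustration, not what the answers above use), and keeps only two of the identifier columns for brevity:

```r
library(ggplot2)
library(tidyr)

# Sample rows from the question; check.names = FALSE keeps "1998-01" as a name
forecl <- data.frame(
  RegionName = c("New York", "Los Angeles", "Philadelphia", "Phoenix"),
  State      = c("NY", "CA", "PA", "AZ"),
  `1998-01`  = c(1.3414, 12.8841, 1.626, 2.7046),
  `1998-02`  = c(1.344, 12.5466, 0.5639, 2.5525),
  `1998-03`  = c(1.3514, 12.2737, 0.2414, 2.3472),
  check.names = FALSE
)

# Wide to long: one row per city and month
long <- pivot_longer(forecl, cols = -c(RegionName, State),
                     names_to = "month", values_to = "value")

# Append a day so the month labels parse as dates
long$month <- as.Date(paste0(long$month, "-01"))

p <- ggplot(long, aes(x = month, y = value, colour = RegionName)) +
  geom_line()
print(p)
```

To plot only some cities, filter long on RegionName before passing it to ggplot().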
Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case, the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
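If you prefer to stay with ts objects, note that ts() expects one column per series, so the wide table has to be transposed first. A sketch using the sample rows from the question (only two identifier columns are kept here, for brevity):

```r
# Sample rows from the question; check.names = FALSE keeps "1998-01" as a name
forecl <- data.frame(
  RegionName = c("New York", "Los Angeles", "Philadelphia", "Phoenix"),
  State      = c("NY", "CA", "PA", "AZ"),
  `1998-01`  = c(1.3414, 12.8841, 1.626, 2.7046),
  `1998-02`  = c(1.344, 12.5466, 0.5639, 2.5525),
  `1998-03`  = c(1.3514, 12.2737, 0.2414, 2.3472),
  check.names = FALSE
)

# Drop the identifier columns and transpose: rows = months, columns = cities
m <- t(as.matrix(forecl[, -(1:2)]))
colnames(m) <- forecl$RegionName

series <- ts(m, start = c(1998, 1), frequency = 12)

# plot.type = "single" draws all series in one panel, which avoids the
# 10-series limit of the default "multiple" layout
plot(series, plot.type = "single", col = seq_len(ncol(series)))
```

Selecting columns by name, for example series[, c("New York", "Phoenix")], plots just those cities.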
This question is a direct successor to a previous question asked here, “ggplot scatter plot of two groups with superimposed means with X and Y error bars”. That question's answer appears to do exactly what I am trying to accomplish; however, the code provided results in an error that I can't get around. I will use my data as an example here, but I have tried the original question's code as well, with the same result.
I have a data frame which looks like this:
structure(list(Meta_ID = structure(c(15L, 22L, 31L, 17L), .Label = c("NM*624-46",
"NM*624-54", "NM*624-56", "NM*624-61", "NM*624-70", "NM624-36",
"NM624-38", "NM624-39", "NM624-40", "NM624-41", "NM624-43", "NM624-46",
"NM624-47", "NM624-51", "NM624-54 ", "NM624-56", "NM624-57",
"NM624-59", "NM624-61", "NM624-64", "NM624-70", "NM624-73", "NM624-75",
"NM624-77", "NM624-81", "NM624-82", "NM624-83", "NM624-84", "NM625-02",
"NM625-10", "NM625-11", "SM621-43", "SM621-44", "SM621-46", "SM621-47",
"SM621-48", "SM621-52", "SM621-53", "SM621-55", "SM621-56", "SM621-96",
"SM621-97", "SM622-51", "SM622-52", "SM623-14", "SM623-23", "SM623-26",
"SM623-27", "SM623-32", "SM623-33", "SM623-34", "SM623-55", "SM623-56",
"SM623-57", "SM623-58", "SM623-59", "SM623-61", "SM623-62", "SM623-64",
"SM623-65", "SM623-66", "SM623-67", "SM680-74", "SM681-16"), class = "factor"),
Region = structure(c(1L, 1L, 1L, 1L), .Label = c("N", "S"
), class = "factor"), Tissue = structure(c(1L, 2L, 1L, 1L
), .Label = c("M", "M*"), class = "factor"), Tag_Num = structure(c(41L,
48L, 57L, 43L), .Label = c("621-43", "621-44", "621-46",
"621-47", "621-48", "621-52", "621-53", "621-55", "621-56",
"621-96", "621-97", "622-51", "622-52", "623-14", "623-23",
"623-26", "623-27", "623-32", "623-33", "623-34", "623-55",
"623-56", "623-57", "623-58", "623-59", "623-61", "623-62",
"623-64", "623-65", "623-66", "623-67", "624-36", "624-38",
"624-39", "624-40", "624-41", "624-43", "624-46", "624-47",
"624-51", "624-54", "624-56", "624-57", "624-59", "624-61",
"624-64", "624-70", "624-73", "624-75", "624-77", "624-81",
"624-82", "624-83", "624-84", "625-02", "625-10", "625-11",
"680-74", "681-16"), class = "factor"), Lab_Num = structure(1:4, .Label = c("C4683",
"C4684", "C4685", "C4686", "C4687", "C4688", "C4689", "C4690",
"C4691", "C4692", "C4693", "C4694", "C4695", "C4696", "C4697",
"C4698", "C4699", "C4700", "C4701", "C4702", "C4703", "C4704",
"C4705", "C4706", "C4707", "C4708", "C4709", "C4710", "C4711",
"C4712", "C4713", "C4714", "C4715", "C4716", "C4717", "C4718",
"C4719", "C4720", "C4721", "C4722", "C4723", "C4724", "C4725",
"C4726", "C4727", "C4728", "C4729", "C4730", "C4731", "C4732",
"C4733", "C4734", "C4735", "C4736", "C4737", "C4738", "C4739",
"C4740", "C4741", "C4742", "C4743", "C4744", "C4745", "C4746",
"C4747", "C4748"), class = "factor"), C = c(46.5, 46.7, 45,
43.6), N = c(12.9, 13.7, 14.5, 13.4), C.N = c(3.6, 3.4, 3.1,
3.3), d13C = c(-19.7, -19.5, -19.4, -19.2), d15N = c(13.3,
12.4, 11.7, 11.9)), .Names = c("Meta_ID", "Region", "Tissue",
"Tag_Num", "Lab_Num", "C", "N", "C.N", "d13C", "d15N"), row.names = c(NA,
4L), class = "data.frame")
What I want to produce is a scatter plot of the raw data with an overlay of the data means for each “Region” with bidirectional error bars. To accomplish that I use plyr to summarize my data and generate the means and SD’s. Then I use ggplot2:
library(plyr)
Basic <- ddply(First.run,.(Region),summarise,
N = length(d13C),
d13C.mean = mean(d13C),
d15N.mean = mean(d15N),
d13C.SD = sd(d13C),
d15N.SD = sd(d15N))
ggplot(data=First.run, aes(x = First.run$d13C, y = First.run$d15N))+
geom_point(aes(colour = Region))+
geom_point(data = Basic,aes(colour = Region))+
geom_errorbarh(data = Basic, aes(xmin = d13C.mean + d13C.SD, xmax = d13C.mean - d13C.SD,
y = d15N.mean, colour = Region, height = 0.01))+
geom_errorbar(data = Basic, aes(ymin = d15N.mean - d15N.SD, ymax = d15N.mean + d15N.SD,
x = d13C.mean,colour = Region))
But each time I run this code I get the same error and can’t figure out what the problem is.
Error: Aesthetics must either be length one, or the same length as the data
Problems:Region
Any help would be much appreciated.
Edit: Since my example data is taken from the head of my full dataset it only includes samples from the "N" Region. With only this one region the code works fine but if you use fix() to change the provided dataset so that at least one other Region is included (in my data the other Region is "S") then the error I get shows up. My mistake in not including some data from each Region.
I ended up changing two of the "N" Regions to "S" so I could calculate standard deviation for both groups.
I think the problem was that you were missing required aesthetics in some of your geoms (geom_point was missing x and y, for example). At least getting all the required aesthetics into each geom seemed to get everything working. I cleaned up a few other things while I was at it to shorten the code up a bit.
ggplot(data = First.run, aes(x = d13C, y = d15N, colour = Region)) +
geom_point() +
geom_point(data = Basic,aes(x = d13C.mean, y = d15N.mean)) +
geom_errorbarh(data = Basic, aes(xmin = d13C.mean - d13C.SD,
xmax = d13C.mean + d13C.SD, y = d15N.mean, x = d13C.mean), height = .5) +
geom_errorbar(data = Basic, aes(ymin = d15N.mean - d15N.SD,
ymax = d15N.mean + d15N.SD, x = d13C.mean, y = d15N.mean), width = .01)
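For completeness, a self-contained sketch of the whole pipeline. It uses the four d13C/d15N values from the question, with the last two rows relabelled "S" (which two rows to relabel is my assumption), and base R's aggregate in place of ddply. A likely contributor to the original error is the use of First.run$d13C inside aes(): those vectors are inherited by the layers that use Basic, where their length no longer matches the data. Setting inherit.aes = FALSE on the summary layers is one way to sidestep that:

```r
library(ggplot2)

# Values from the question; the "S" labels are assumed for illustration
First.run <- data.frame(
  Region = c("N", "N", "S", "S"),
  d13C   = c(-19.7, -19.5, -19.4, -19.2),
  d15N   = c(13.3, 12.4, 11.7, 11.9)
)

# Per-region mean and SD; do.call(data.frame, ...) flattens the matrix
# columns into d13C.mean, d13C.SD, d15N.mean, d15N.SD
Basic <- do.call(data.frame,
                 aggregate(cbind(d13C, d15N) ~ Region, data = First.run,
                           FUN = function(v) c(mean = mean(v), SD = sd(v))))

p <- ggplot(First.run, aes(x = d13C, y = d15N, colour = Region)) +
  geom_point() +
  geom_point(data = Basic, aes(x = d13C.mean, y = d15N.mean), size = 3) +
  # Summary layers define their own aesthetics instead of inheriting x and y
  geom_errorbarh(data = Basic, inherit.aes = FALSE,
                 aes(y = d15N.mean, xmin = d13C.mean - d13C.SD,
                     xmax = d13C.mean + d13C.SD, colour = Region),
                 height = 0.05) +
  geom_errorbar(data = Basic, inherit.aes = FALSE,
                aes(x = d13C.mean, ymin = d15N.mean - d15N.SD,
                    ymax = d15N.mean + d15N.SD, colour = Region),
                width = 0.02)
print(p)
```

The height and width values only control the size of the error bar caps and can be adjusted to taste.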