Is there any way to add points to a ggplot graph, as with the points() function in base graphics? I don't often use ggplot and usually prefer base graphics, but this time I have to deal with it. With + geom_point(x = c(1,2,3), y = c(1,2,3)) I get an error:
Error: Aesthetics must be either length 1 or the same as the data (33049): x, y
I'm not quite sure what you're looking for, but you can use the data= argument to geom_point() to override the default behaviour (which is to inherit data from the original ggplot call). As @dc37 points out, x and y need to be specified within a data frame, but you can do this on the fly. You might also need to specify the mapping, if the original x and y variables aren't called x and y ...
+ geom_point(data= data.frame(x = c(1,2,3), y = c(1,2,3)),
mapping = aes(x=x, y=y))
Alternatively (and maybe better):
+ annotate( geom="point", x = 1:3, y = 1:3)
From ?annotate:
This function adds geoms to a plot, but unlike [a typical] geom
function, the properties of the geoms are not mapped from
variables of a data frame, but are instead passed in as vectors.
This is useful for adding small annotations (such as text labels)
or if you have your data in vectors, and for some reason don't
want to put them in a data frame.
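Both approaches can be sketched as follows. This is only a minimal example: the built-in mtcars data and the three extra coordinates are placeholders I picked, not anything from the question.

```r
library(ggplot2)

# Base plot; mtcars and its columns stand in for the asker's data.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Option 1: give the extra layer its own data frame and mapping.
extra <- data.frame(x = c(2, 3, 4), y = c(15, 20, 25))
p1 <- p + geom_point(data = extra, mapping = aes(x = x, y = y), colour = "red")

# Option 2: pass the coordinates as plain vectors via annotate().
p2 <- p + annotate(geom = "point", x = c(2, 3, 4), y = c(15, 20, 25), colour = "red")
```

In both cases the second layer holds exactly the three added points, independent of the 32 rows in the base data.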
I am trying to loop a ggplot2 plot with a linear regression line over it. It works when I type the y column name manually, but the loop method I am trying does not work. It is definitely not a dataset issue.
I've tried many solutions from various websites on how to loop a ggplot and the one I've attempted is the simplest I could find that almost does the job.
The code that works is the following:
plots <- ggplot(Everything.any, mapping = aes(x = stock_VWRETD, y = stock_10065)) +
geom_point() +
labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
geom_smooth(method='lm',formula=y~x)
But I do not want to do this another 40 times (and then 5 times more for other reasons). The code that I found online and tried to adapt to my needs is the following:
plotRegression <- function(z,na.rm=TRUE,...){
nm <- colnames(z)
for (i in seq_along(nm)){
plots <- ggplot(z, mapping = aes(x = stock_VWRETD, y = nm[i])) +
geom_point() +
labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
geom_smooth(method='lm',formula=y~x)
ggsave(plots,filename=paste("regression1",nm[i],".png",sep=" "))
}
}
plotRegression(Everything.any)
I expected the usual stock returns vs. market returns scatter plot. Instead, the y-axis shows a single value, the name of the respective column, and the market returns are plotted normally along x but all on one horizontal line at that single y value. Please let me know what I am doing wrong.
Desired Plot:
Actual Plot:
Sample Data is available on Google Drive here:
https://drive.google.com/open?id=1Xa1RQQaDm0pGSf3Y-h5ZR0uTWE-NqHtt
The problem is that when you assign variables to aesthetics in aes, you mix bare names and strings. In this example, both X and Y are supposed to be variables in z:
aes(x = stock_VWRETD, y = nm[i])
You refer to stock_VWRETD using a bare name (as required with aes), however for y=, you provide the name as a character vector produced by colnames. See what happens when we replicate this with the iris dataset:
ggplot(iris, aes(Petal.Length, 'Sepal.Length')) + geom_point()
Since aes expects variable names to be given as bare names, it doesn't interpret 'Sepal.Length' as a variable in iris but as a separate vector (consisting of a single character value) which holds the y-values for each point.
What can you do? Here are 2 options that both give the proper plot
1) Use aes_string and change both variable names to character:
ggplot(iris, aes_string('Petal.Length', 'Sepal.Length')) + geom_point()
2) Use the .data pronoun with [[ ]] subsetting to look up the appropriate variable by its name:
ggplot(iris, aes(Petal.Length, .data[['Sepal.Length']])) + geom_point()
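Applied to the loop in the question, option 2 might look like the sketch below. The function name plot_regressions, the x_col argument and the choice to return a named list are my own illustrative assumptions; iris stands in for the real data, and the .data pronoun needs ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Build one scatter plot with a regression line per numeric y column.
# `plot_regressions` and `x_col` are hypothetical names for illustration.
plot_regressions <- function(df, x_col) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  y_cols <- setdiff(numeric_cols, x_col)
  plots <- lapply(y_cols, function(y_col) {
    # .data[[...]] looks the column up by its name stored in a string.
    ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(x = x_col, y = y_col)
  })
  names(plots) <- y_cols
  plots
}

plots <- plot_regressions(iris, "Petal.Length")
# Each element can then be written to disk, e.g.
# ggsave(paste0("regression_", names(plots)[1], ".png"), plots[[1]])
```

Returning the plots instead of saving inside the loop keeps the function easy to inspect; the ggsave call can be looped over names(plots) afterwards.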
You need to use aes_string instead of aes, with double quotes around your x variable; then you can use your i variable directly. You can also simplify your for loop call. Here is an example using iris.
library(ggplot2)
plotRegression <- function(z,na.rm=TRUE,...){
nm <- colnames(z)
for (i in nm){
plots <- ggplot(z, mapping = aes_string(x = "Sepal.Length", y = i)) +
geom_point()+
geom_smooth(method='lm',formula=y~x)
ggsave(plots,filename=paste("regression1_",i,".png",sep=""))
}
}
myiris<-iris
plotRegression(myiris)
This question follows from these two topics:
How to use stat_bin2d() to compute counts labels in ggplot2?
How to show the numeric cell values in heat map cells in r
In the first topic, a user wants to use stat_bin2d to generate a heatmap, and then wants the count of each bin written on top of the heat map. The method the user initially wants to use doesn't work, the best answer stating that stat_bin2d is designed to work with geom = "rect" rather than "text". No satisfactory response is given.
The second question is almost identical to the first, with one crucial difference: the variables in the second question are text, not numeric. The answer produces the desired result, placing the count value for a bin over the bin in a stat_bin2d heat map.
To compare the two methods I've prepared the following code:
library(ggplot2)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(data, aes(x = x, y = y)) +
geom_bin2d() +
stat_bin2d(geom="text", aes(label=..count..))
We know this first gives you the error:
"Error: geom_text requires the following missing aesthetics: x, y".
Same issue as in the first question. Interestingly, changing from stat_bin2d to stat_binhex works fine:
library(ggplot2)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(data, aes(x = x, y = y)) +
geom_hex() +
stat_binhex(geom="text", aes(label=..count..))
Which is great and all, but generally I don't think hex binning is very clear, and for my purposes it won't work for the data I'm trying to describe. I really want to use stat_bin2d.
To get this to work, I've prepared the following workaround based on the second answer:
library(ggplot2)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
# round to the nearest integer so that each value becomes a discrete bin label
x_t<-as.character(round(data$x,0))
y_t<-as.character(round(data$y,0))
x_x<-as.character(seq(-3,3,1))
y_y<-as.character(seq(-3,3,1))
data<-cbind(data,x_t,y_t)
ggplot(data, aes(x = x_t, y = y_t)) +
geom_bin2d() +
stat_bin2d(geom="text", aes(label=..count..))+
scale_x_discrete(limits =x_x) +
scale_y_discrete(limits=y_y)
This workaround allows one to bin numerical data, but to do so, you have to determine the bin width (I did it via rounding) before bringing the data into ggplot. I actually figured it out while writing this question, so I may as well finish.
This is the result: (turns out I can't post images)
So my real question here is: does anyone have a better way to do this? I'm happy I at least got it to work, but so far I haven't seen an answer for putting labels on stat_bin2d bins when using a numerical variable.
Does anyone have a method for passing x and y arguments on to geom_text from stat_bin2d without having to use a workaround? Can anyone explain why it works with text variables but not with numbers?
Another work around (but perhaps less work). Similar to the ..count.. method you can extract the counts from the plot object in two steps.
library(ggplot2)
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
# plot
p <- ggplot(dat, aes(x = x, y = y)) + geom_bin2d()
# Get data - this includes counts and x,y coordinates
newdat <- ggplot_build(p)$data[[1]]
# add in text labels
p + geom_text(data=newdat, aes((xmin + xmax)/2, (ymin + ymax)/2,
label=count), col="white")
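If you would rather not build the plot twice, the counts can also be computed before plotting while keeping both axes numeric. The sketch below is only one way to do that: the bin width of 0.5 and the column names xbin, ybin and n are assumptions for illustration, and the result is drawn with geom_tile rather than geom_bin2d.

```r
library(ggplot2)
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

binwidth <- 0.5  # assumed bin width, chosen only for illustration

# Snap every point to the centre of the bin it falls into.
dat$xbin <- (floor(dat$x / binwidth) + 0.5) * binwidth
dat$ybin <- (floor(dat$y / binwidth) + 0.5) * binwidth
dat$n <- 1

# One row per occupied bin, holding the number of points in that bin.
counts <- aggregate(n ~ xbin + ybin, data = dat, FUN = sum)

# Tiles coloured by count, with the count written on top of each tile.
p <- ggplot(counts, aes(x = xbin, y = ybin)) +
  geom_tile(aes(fill = n)) +
  geom_text(aes(label = n), colour = "white", size = 3)
```

Because the bin centres are ordinary numbers, the axes stay continuous and no scale_x_discrete limits are needed.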
In a plot of Y and X over categories Z, I would like the categories to be represented by points of different colors, except for one category, which I would like to be displayed as a line connecting the points.
Here is the data and what I have so far:
library(ggplot2);library(reshape);library(scales);library(directlabels)
dat <- read.csv("https://dl.dropboxusercontent.com/u/4329509/Fdat_graf.csv")
dat_long <- melt(dat, id="ano")
p <- qplot(ano,value, data=dat_long, colour=variable)+
scale_y_log10(breaks=c(.1,1,10,100,500,1000),labels = comma) +
scale_x_continuous(breaks=seq(from=1960, to=2010, by=10)) +
theme_bw()
direct.label(p)
I would like for the "Lei_de_Moore" category to be represented by a line, as in this example (done in Stata):
Also, I would like to change a few things (maybe I should ask them in a different topic?):
Make the graph colors more "vivid", as in the Stata example.
Change the Y axis. I just want plain numbers in non-scientific notation. I used labels = comma, but I don't want the comma itself. Ideally I would like the comma to be the decimal separator.
EDIT: I had asked another question on how to embed the legend for this graph (this post: Legend as text alongside points for each category and with same collor)
You can mix geoms if you use ggplot and pass only a subset of the data to different geoms. Here you can pass everything in dat_long to geom_point except rows where variable is Lei_de_Moore, and then pass only those dat_long rows to geom_line in a different call.
p <- ggplot(dat_long, aes(ano, value, color=variable)) +
geom_point(data=dat_long[dat_long$variable != 'Lei_de_Moore',]) +
geom_line(data=dat_long[dat_long$variable == 'Lei_de_Moore',]) +
scale_y_log10(breaks=c(.1,1,10,100,500,1000),labels = comma) +
scale_x_continuous(breaks=seq(from=1960, to=2010, by=10)) +
theme_bw()
For colors, have a look at RColorBrewer package palettes. Install the package and use ?brewer.pal to see some more options. For example, this one might work:
p <- p + scale_color_brewer(palette="Set1")
For the y-axis labels, you'll probably have to hack something together. Have a look at this question. So you could do something like this:
fmt <- function(){
f <- function(x) sub(".", ",", as.character(round(x,1)), fixed=T)
f
}
p <- ggplot(dat_long, aes(ano, value, color=variable)) +
geom_point(data=dat_long[dat_long$variable != 'Lei_de_Moore',]) +
geom_line(data=dat_long[dat_long$variable == 'Lei_de_Moore',]) +
scale_y_log10(breaks=c(.1,1,10,100,500,1000), labels=fmt()) +
scale_x_continuous(breaks=seq(from=1960, to=2010, by=10)) +
theme_bw() +
scale_color_brewer(palette="Set1")
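If the breaks ever become large enough that as.character() switches to scientific notation (1e+05, for instance), a variant of the formatter built on format(scientific = FALSE) avoids that. This is just a sketch in base R; fmt_comma is a name I made up.

```r
# Format numbers without scientific notation, using a comma as the decimal mark.
# `fmt_comma` is a hypothetical helper name, not part of any package.
fmt_comma <- function(x) {
  vapply(x, function(v) {
    if (is.na(v)) return(NA_character_)
    # Format each value on its own so no padding or trailing zeros are added.
    sub(".", ",", format(v, scientific = FALSE), fixed = TRUE)
  }, character(1))
}

fmt_comma(c(0.1, 1, 1000, 100000))
# "0,1" "1" "1000" "100000"
```

It can be passed straight to the scale, e.g. scale_y_log10(breaks = c(.1, 1, 10, 100, 500, 1000), labels = fmt_comma).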
I'm trying to write a simple plot function, using the ggplot2 library. But the call to ggplot doesn't find the function argument.
Consider a data.frame called means that stores two conditions and two mean values that I want to plot (condition will appear on the X axis, means on the Y).
library(ggplot2)
m <- c(13.8, 14.8)
cond <- c(1, 2)
means <- data.frame(means=m, condition=cond)
means
# The output should be:
# means condition
# 1 13.8 1
# 2 14.8 2
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=meansdf$condition, y=meansdf$means, x = meansdf$condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(means)
# This will output the following error:
# Error in eval(expr, envir, enclos) : object 'meansdf' not found
So it seems that ggplot is calling eval, which can't find the argument meansdf. Does anyone know how I can successfully pass the function argument to ggplot?
(Note: Yes I could just call the ggplot function directly, but in the end I hope to make my plot function do more complicated stuff! :) )
The "proper" way to use ggplot programmatically is to use aes_string() instead of aes() and use the names of the columns as characters rather than as objects:
For more programmatic uses, for example if you wanted users to be able to specify column names for various aesthetics as arguments, or if this function is going in a package that needs to pass R CMD CHECK without warnings about variable names without definitions, you can use aes_string(), with the columns needed as characters.
testplot <- function(meansdf, xvar = "condition", yvar = "means",
fillvar = "condition") {
p <- ggplot(meansdf,
aes_string(x = xvar, y= yvar, fill = fillvar)) +
geom_bar(position="dodge", stat="identity")
}
As Joris and Chase have already correctly answered, standard best practice is to simply omit the meansdf$ part and directly refer to the data frame columns.
testplot <- function(meansdf)
{
p <- ggplot(meansdf,
aes(fill = condition,
y = means,
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
This works, because the variables referred to in aes are looked for either in the global environment or in the data frame passed to ggplot. That is also the reason why your example code - using meansdf$condition etc. - did not work: meansdf is neither available in the global environment, nor is it available inside the data frame passed to ggplot, which is meansdf itself.
The fact that the variables are looked for in the global environment instead of in the calling environment is actually a known bug in ggplot2 that Hadley does not consider fixable at the moment.
This leads to problems, if one wishes to use a local variable, say, scale, to influence the data used for the plot:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale, # does not work, since scale is not found
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
A very nice workaround for this case is provided by Winston Chang in the referenced GitHub issue: Explicitly setting the environment parameter to the current environment during the call to ggplot.
Here's what that would look like for the above example:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale,
x = condition),
environment = environment()) # This is the only line changed / added
p + geom_bar(position = "dodge", stat = "identity")
}
## Now, the following works
testplot(means)
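Note that the behaviour described above belongs to older ggplot2 releases. As far as I know, from ggplot2 3.0.0 on aes() captures each expression together with the environment it was written in, so a local variable is found without the environment argument. A minimal sketch, reusing the means data frame from the question:

```r
library(ggplot2)

means <- data.frame(means = c(13.8, 14.8), condition = c(1, 2))

testplot <- function(meansdf) {
  scale <- 0.5  # local variable, found through the environment captured by aes()
  ggplot(meansdf, aes(x = condition, y = means * scale, fill = condition)) +
    geom_bar(position = "dodge", stat = "identity")
}

p <- testplot(means)
```

Columns of the data frame still take precedence over local variables of the same name, so it is worth picking local names that do not clash with column names.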
Here is a simple trick I use a lot to define my variables in my function's environment (second line):
FUN <- function(fun.data, fun.y) {
fun.data$fun.y <- fun.data[, fun.y]
ggplot(fun.data, aes(x, fun.y)) +
geom_point() +
scale_y_continuous(fun.y)
}
# x must exist before data.frame() is called, because y and z are derived from it
x <- rnorm(100, 0, 1)
datas <- data.frame(x = x,
y = x + rnorm(100, 2, 2),
z = x + rnorm(100, 5, 10))
FUN(datas, "y")
FUN(datas, "z")
Note how the y-axis label also changes when different variables or data-sets are used.
I don't think you need to include the meansdf$ part in your function call itself. This seems to work on my machine:
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition, y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(meansdf)
to produce:
This is an example of a problem that is discussed earlier. Basically, it comes down to ggplot2 being coded for use in the global environment mainly. In the aes() call, the variables are looked for either in the global environment or within the specified dataframe.
library(ggplot2)
means <- data.frame(means=c(13.8,14.8),condition=1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition,
y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
EDIT:
Update: After seeing the other answer and updating the ggplot2 package, the code above works. The reason is, as explained in the comments, that ggplot will look for the variables in aes in either the global environment (when the dataframe is specifically added as meansdf$... ) or within the mentioned environment.
For this, be sure you work with the latest version of ggplot2.
If it is important to pass the variables (column names) to the custom plotting function unquoted, while different variable names are used within the function, then another workaround that I tried was to make use of match.call() and eval (like here as well):
library(ggplot2)
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
arg <- match.call()
scale <- 0.5
p <- ggplot(df, aes(x = eval(arg$x),
y = eval(arg$y) * scale,
fill = eval(arg$x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, condition, means)
Created on 2019-01-10 by the reprex package (v0.2.1)
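For unquoted column names, the embrace operator {{ }} is an alternative to match.call() and eval(). This is a sketch under the assumption that ggplot2 3.0.0 or later and rlang 0.4.0 or later are installed; the function signature simply mirrors the one above.

```r
library(ggplot2)

meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)

testplot <- function(df, x, y) {
  scale <- 0.5
  # {{ x }} and {{ y }} forward the unquoted column names given by the caller.
  ggplot(df, aes(x = {{ x }}, y = {{ y }} * scale, fill = {{ x }})) +
    geom_bar(position = "dodge", stat = "identity")
}

p <- testplot(meansdf, condition, means)
```

The call site is unchanged: testplot(meansdf, condition, means).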
Another workaround, but with passing quoted variables to the custom plotting function is using get():
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
scale <- 0.5
p <- ggplot(df, aes(x = get(x),
y = get(y) * scale,
fill = get(x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, "condition", "means")
This frustrated me for some time. I wanted to send different data frames with different variable names and I wanted the ability to plot different columns from the data frame. I finally got a work around by creating some dummy (global) variables to handle plotting and forcing assignment inside the function
plotgraph <- function(df,df.x,df.y) {
# <<- assigns into the global environment so that aes() can find the dummies
dummy.df <<- df
dummy.x <<- df.x
dummy.y <<- df.y
p = ggplot(dummy.df,aes(x=dummy.x,y=dummy.y,.....))
print(p)
}
then in the main code I can just call the function
plotgraph(data,data$time,data$Y1)
plotgraph(data,data$time,data$Y2)
Short answer: Use qplot
Long answer:
In essence you want something like this:
my.barplot <- function(x=this.is.a.data.frame.typically) {
# R code doing the magic comes here
...
}
But that lacks flexibility because you must stick to consistent column naming to avoid the annoying R scope idiosyncrasies. Of course the next logical step is:
my.barplot <- function(data=data.frame(), x=..., y=...) {
# R code doing something really really magical here
...
}
But then that starts looking suspiciously like a call to qplot(), right?
qplot(data=my.data.frame, x=some.column, y=some.other.column,
geom="bar", stat="identity",...)
Of course now you'd like to change things like scale titles, and for that a function comes in handy... the good news is that scoping issues are mostly gone.
my.plot <- qplot(data=my.data.frame, x=some.column, y=some.other.column,...)
# scale functions take the axis title through their name argument
set.scales <- function(p, xscale=scale_x_continuous, xtitle=NULL,
yscale=scale_y_continuous, ytitle=NULL) {
return(p + xscale(name=xtitle) + yscale(name=ytitle))
}
my.plot.prettier <- set.scales(my.plot, scale_x_discrete, 'Days',
scale_y_discrete, 'Count')
Another workaround is to pass the aes(...) mapping as an argument of your function:
func <- function(meansdf, mapping) {}  # called as func(df, aes(...))
This just worked fine for me on a similar topic
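A minimal sketch of that idea, assuming the caller builds the mapping with aes() and the function only forwards it (the argument name mapping is my own choice):

```r
library(ggplot2)

meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)

# `mapping` is an illustrative argument name; the caller creates it with aes().
testplot <- function(df, mapping) {
  ggplot(df, mapping) +
    geom_bar(position = "dodge", stat = "identity")
}

p <- testplot(meansdf, aes(x = condition, y = means, fill = condition))
```

Since the aes() call is written at the call site, the column names never have to be known inside the function.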
You don't need anything fancy, not even dummy variables. You only need to add a print() inside your function; it is like using cat() when you want something to show in the console.
myplot <- ggplot(......) # + whatever you want here
print(myplot)
It has worked for me more than once inside the same function.
I just generate new data frame variables with the desired names inside the function:
testplot <- function(df, xVar, yVar, fillVar) {
df$xVar = df[,which(names(df)==xVar)]
df$yVar = df[,which(names(df)==yVar)]
df$fillVar = df[,which(names(df)==fillVar)]
p <- ggplot(df,
aes(x=xVar, y=yVar, fill=fillVar)) +
geom_bar(position="dodge", stat="identity")
}