ggplot geom_text annotation with variable label - r

Is it possible to annotate a ggplot figure with a "text" element indicating a feature of the data (variable)?
library(ggplot2)
library(datasets)
my.mean <- mean(mtcars$mpg, na.rm=T)
my.mean <- as.name(my.mean)
gplot <- ggplot(mtcars, aes(mpg))+geom_histogram()
gplot <- gplot + geom_text(aes_string(label=my.mean, y=5), size=3)
This produces something on the plot that looks like a succession of numbers. Any ideas how to resolve this?
Edit: this question is different since I am not trying to annotate each histogram bin with a value. The objective is to add one single text element to the plot.

If I understood you correctly, you want to add text to your plot that comes from another dataset, i.e. one that was not passed as an argument to ggplot().
Solution: pass that dataset directly to geom_text() through its data argument.
library(ggplot2)
library(datasets)
my.mean <- mean(mtcars$mpg, na.rm=T)
ggplot(mtcars, aes(mpg)) +
geom_histogram() +
geom_text(data=data.frame(my.mean=my.mean), aes(y=5, x=my.mean, label=my.mean), size=3)

It should work like this:
gplot <- gplot + geom_text(aes(15, 5, label="some random text"))
gplot
The two numbers set the x and y position of the label, in the plot's data coordinates.
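One caveat with geom_text() is that it inherits the plot's data, so a constant label is drawn once for every row of the data, stacked on top of itself. A minimal sketch of an alternative, assuming a single label at the mean is what is wanted, uses annotate(), which draws one element regardless of the data. The position y = 5, the bins = 30 setting and the rounding to two decimals are illustrative choices, not part of the original answers.

```r
library(ggplot2)

my.mean <- mean(mtcars$mpg, na.rm = TRUE)

# annotate() ignores the plot's data frame, so exactly one label is drawn
gplot <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 30) +
  annotate("text", x = my.mean, y = 5,
           label = paste("mean =", round(my.mean, 2)), size = 3)

gplot
```

layer_data(gplot, 2) can be used to check that the text layer holds a single row.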

Related

How to use position_jitter_tern() in ggtern() in R?

I am creating a simple ternary plot.
ggtern(data=data.frame(x=c(0.1,0.1),y=c(0.2,0.2),z=c(0.7,0.7)),aes(x,y,z)) + geom_point()
How can I jitter the point so that the plot will display two points?
I tried using position_jitter_tern like so, but it doesn't change anything:
ggtern(data=data.frame(x=c(0.1,0.1),y=c(0.2,0.2),z=c(0.7,0.7)),aes(x,y,z, position_jitter_tern(0.1,0.1,0.1))) + geom_point()
You need to pass it as the "position" argument of the geom_point function, not inside aes().
library(ggtern)
df <- data.frame(x=c(0.1,0.1),y=c(0.2,0.2),z=c(0.7,0.7))
ggtern(data=df, aes(x,y,z) ) +
geom_point(position= position_jitter_tern(x=0.1, y=0.1, z=0.02))
Alternatively, you can apply the base jitter function to each column of the data frame before plotting.
library(ggtern)
library(ggplot2)
data=data.frame(x=c(0.1,0.1),y=c(0.2,0.2),z=c(0.7,0.7))
data[] <- lapply(data, jitter, 3)
ggtern(data,aes(x,y,z)) + geom_point()

How to compare two histograms in R?

I want to compare two histograms in one graph in R, but I couldn't work out how to implement it.
My histograms are based on two sub-dataframes, which are split from the dataset according to a type (Action, Adventure, Family).
My first histogram is:
split_action <- split(df, df$type)
dataset_action <- split_action$Action
hist(dataset_action$year)
split_adventure <- split(df, df$type)
dataset_adventure <- split_adventure$Adventure
hist(dataset_adventure$year)
I want to see how much they overlap by comparing them, based on year, in the same histogram. Thank you in advance.
Using the iris dataset, suppose you want to make a histogram of sepal length for each species. First, you can make 3 data frames for each species by subsetting.
irissetosa<-subset(iris,Species=='setosa',select=c('Sepal.Length','Species'))
irisversi<-subset(iris,Species=='versicolor',select=c('Sepal.Length','Species'))
irisvirgin<-subset(iris,Species=='virginica',select=c('Sepal.Length','Species'))
Then draw a histogram for each of these 3 data frames. Don't forget to set the argument "add" to TRUE for the second and third histograms, so that they are drawn on top of the first one.
hist(irissetosa$Sepal.Length,col='red')
hist(irisversi$Sepal.Length,col='blue',add=TRUE)
hist(irisvirgin$Sepal.Length,col='green',add=TRUE)
This draws all three histograms on the same axes, so you can see which parts overlap.
I know it's not ideal, though: opaque bars hide the ones drawn before them.
Another way to see which part is overlapping is by using density function.
plot(density(irissetosa$Sepal.Length),col='red')
lines(density(irisversi$Sepal.Length),col='blue')
lines(density(irisvirgin$Sepal.Length),col='green')
Hope it helps!!
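If you stay with base graphics, a rough sketch of a cleaner overlay is to give every group the same breaks and a semi-transparent fill. The break width of 0.25, the colours and the plot title below are arbitrary choices made for illustration.

```r
# Split Sepal.Length by species (a list of three numeric vectors)
sl <- split(iris$Sepal.Length, iris$Species)

# Common breaks spanning all groups, so the bars are directly comparable
breaks <- seq(floor(min(iris$Sepal.Length)),
              ceiling(max(iris$Sepal.Length)), by = 0.25)

# Semi-transparent fills let overlapping bars show through
cols <- c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4), rgb(0, 1, 0, 0.4))

# Pre-compute the histograms to find a y-limit that fits all of them
h <- lapply(sl, hist, breaks = breaks, plot = FALSE)
ymax <- max(sapply(h, function(x) max(x$counts)))

plot(h[[1]], col = cols[1], xlim = range(breaks), ylim = c(0, ymax),
     main = "Sepal length by species", xlab = "Sepal.Length")
plot(h[[2]], col = cols[2], add = TRUE)
plot(h[[3]], col = cols[3], add = TRUE)
legend("topright", legend = names(sl), fill = cols)
```

Computing the histograms first with plot = FALSE avoids the problem that the first hist() call alone decides the axis limits.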
You don't need to split the data if using ggplot. The key is to use transparency ("alpha") and change the value of the "position" argument to "identity" since the default is "stack".
Using the iris dataset:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_histogram(binwidth=0.2, alpha=0.5, position="identity") +
theme_minimal()
It's not easy to see the overlap, so a density plot may be a better choice if that's the main objective. Again, use transparency to avoid obscuring overlapping plots.
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_density(alpha=0.5) +
xlim(3.9,8.5) +
theme_minimal()
So for your data, the command would be something like this:
ggplot(data=df, aes(x=year, fill=type)) +
geom_histogram(alpha=0.5, position="identity")

How to add geom_point() to autolayer() line?

I'm trying to add points (geom_point()) to an autolayer() line ("fitted" in pic). autolayer() is a companion to autoplot() for ggplot2 in Rob Hyndman's forecast package (there's a base autoplot/autolayer in ggplot2 too, so the same likely applies there).
The problem (I'm no ggplot2 expert, and the autoplot wrapper makes it trickier) is that geom_point() applies fine to the main call, but how do I apply something similar to the autolayer (fitted values)?
I tried type="b", but that is not a parameter autolayer() accepts.
require(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
forecast::autoplot(mdeaths) +
forecast::autolayer(model.ses.fc$fitted, series="Fitted") + # cannot set to show points, and type="b" not allowed
geom_point() # this works fine against the main autoplot call
This seems to work:
library(forecast)
library(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
# Pre-compute the fitted layer so we can extract the data out of it with
# layer_data()
fitted_layer <- forecast::autolayer(model.ses.fc$fitted, series="Fitted")
fitted_values <- fitted_layer$layer_data()
plt <- forecast::autoplot(mdeaths) +
fitted_layer +
geom_point() +
geom_point(data = fitted_values, aes(x = timeVal, y = seriesVal))
There might be a way to make forecast::autolayer do what you want directly but this solution works. If you want the legend to look right, you'll want to merge the input data and fitted values into a single data.frame.
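A minimal sketch of that last suggestion, assuming plain ggplot2 is acceptable instead of autoplot()/autolayer(): put the observed and fitted values into one long data frame and map the series name to colour, so lines, points and legend all come from the same data. The column names time, value and series are made up for this example, and only the in-sample fitted values are included, not the forecast horizon.

```r
library(ggplot2)
library(forecast)

model.ses <- ets(mdeaths, model = "ANN", alpha = 0.4)

# Long format: one row per (time, series) pair
plot_df <- rbind(
  data.frame(time = as.numeric(time(mdeaths)),
             value = as.numeric(mdeaths),
             series = "Observed"),
  data.frame(time = as.numeric(time(mdeaths)),
             value = as.numeric(fitted(model.ses)),
             series = "Fitted")
)

# One mapping drives the lines, the points and the legend
plt <- ggplot(plot_df, aes(x = time, y = value, colour = series)) +
  geom_line() +
  geom_point()

plt
```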

How can I define a color palette (normalize) for multiple hexbin plots in R

I want to find a way to set a certain range of a color palette that is used for a hexbin plot to normalize multiple plots in R.
So far I have tried:
library(hexbin)
library(gplots)
my.colors <- function (n)
{
(rich.colors(n))
}
plot(hexbin(lastthousand$V4, lastthousand$V5, xbnds=c(0,35), ybnds=c(0,35),), xlab="Green Pucks", ylab="Red Pucks",colramp = my.colors, colorcut = seq(0, 1, length = 25),lcex=0.66)
Which results in the following plot:
hexbin plot #1
I understand that "colorcut" controls the resolution of the color palette, but I found no way to control the min/max values.
Let's say I have a second plot, 'hexbin plot #2', with counts from 1 (dark blue) to 100 (red). Is there a way to use only the colors from 1 (dark blue) to 24 (light blue), i.e. only a part of the 1 (dark blue) to 100 (red) scale, for hexbin plot #1?
The final goal is to have several hexbin plots next to each other which follow the same colour scheme (min and max based on the one with the highest counts).
-this is my first question here :) and I'm new to R, please be gentle
//edit: For everyone with the same problem: my supervisor suggested using facets in ggplot2. I will see how that works and return with another edit if it solves the issue.
//edit2: facets did the trick:
library(gplots)
library(ggplot2)
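# refer to columns by name inside aes(); lastthousand$V4 bypasses the data argument and can break faceting
p <- ggplot(data=lastthousand, aes(V4, V5)) + geom_hex()
# geom_hex() maps the bin count to fill, so the gradient has to be a fill scale
p + facet_grid(. ~ Market) + xlab("green pucks") + ylab("red pucks") + scale_fill_gradientn(colours=rainbow(7))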
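Facets share one fill scale automatically. If the plots have to stay separate, a rough sketch of the same idea is to give every plot a fill scale with identical limits. The data frames d1 and d2 and the helpers base_plot() and shared_fill() are hypothetical stand-ins, not part of the original question, and geom_hex() needs the hexbin package to be installed.

```r
library(ggplot2)  # geom_hex() also requires the hexbin package

set.seed(42)
# Hypothetical stand-ins for two datasets with very different counts
d1 <- data.frame(x = runif(1000, 0, 35), y = runif(1000, 0, 35))
d2 <- data.frame(x = rnorm(5000, 17, 4), y = rnorm(5000, 17, 4))

# Same binning for every plot
base_plot <- function(d) {
  ggplot(d, aes(x, y)) + geom_hex(bins = 25)
}

# Largest bin count across both plots defines the top of the colour range
max_count <- max(layer_data(base_plot(d1))$count,
                 layer_data(base_plot(d2))$count)

# Identical limits mean the same count gets the same colour in each plot
shared_fill <- function() {
  scale_fill_gradientn(colours = rainbow(7), limits = c(0, max_count))
}

p1 <- base_plot(d1) + shared_fill()
p2 <- base_plot(d2) + shared_fill()
```

Printing p1 and p2 side by side then shows both on the scale of the plot with the highest counts.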
Maybe this can be useful: https://gist.github.com/wahalulu/1376861
and this for ranges:
https://stackoverflow.com/a/15505591/1600108
https://stackoverflow.com/a/14586941/1600108

Histogram with fraction in qplot / ggplot

So far I have been missing a histogram function that puts a fraction on the y-axis. Like this:
require(ggplot2)
data(diamonds)
idealD <- diamonds[diamonds[,"cut"]=="Ideal",]
fracHist <- function(x){
frac <- (hist(x,plot=F)$counts) / (sum(hist(x,plot=F)$counts))
barplot(frac)
}
### call
fracHist(idealD$carat)
It ain't pretty, but basically it should explain what I want: bar heights should add up to one. Plus, the breaks should label the x-axis. I'd love to create the same with ggplot2 but can't figure out how to get around plotting the frequencies of frac instead of plotting frac itself.
All I get with `ggplot` is density...
m <- ggplot(idealD, aes(x=carat))
m + geom_histogram(aes(y = ..density..)) + geom_density()
The solution is to use stat_bin and map the aesthetic y=..count../sum(..count..)
library(ggplot2)
ggplot(idealD, aes(x=carat)) + stat_bin(aes(y=..count../sum(..count..)))
From a quick scan of ?hist I couldn't find how the values are binned in hist. This means the graphs won't be identical unless you fiddle with the binwidth argument of stat_bin.
The trick works with geom_histogram as well.
require(ggplot2)
data(diamonds)
idealD <- diamonds[diamonds[,"cut"]=="Ideal",]
ggplot(idealD, aes(x=carat)) + geom_histogram(aes(y=..count../sum(..count..)))
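A small sketch combining both answers, assuming ggplot2 3.3.0 or later, where after_stat() is the current spelling of the ..count.. notation. Passing the breaks chosen by hist() to geom_histogram() should make the bins line up with the base plot. The axis label "fraction" is just an illustrative choice.

```r
library(ggplot2)

idealD <- diamonds[diamonds$cut == "Ideal", ]

# Reuse the breaks that hist() would pick, so both plots bin the data alike
h <- hist(idealD$carat, plot = FALSE)

# Bar height = bin count divided by the total count
p <- ggplot(idealD, aes(x = carat)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 breaks = h$breaks) +
  ylab("fraction")

p
```

sum(layer_data(p)$y) can be used to confirm that the bar heights add up to one.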
