I am looking to resolve an error that I encounter when trying to use direct.label to label a ggplot with only one series. Below is an example to illustrate how direct.label fails if there is only a single series.
In my real data, I am looping through regions and want to use direct labels on the sub-regions. However, some of the regions have only one sub-region, which results in an error when using direct.label. Any assistance would be greatly appreciated.
library(ggplot2)
library(directlabels)
# sample data from the ggplot2 movies data (newer ggplot2 releases ship it in the separate ggplot2movies package)
mry <- do.call(rbind, by(movies, round(movies$rating), function(df) {
nums <- tapply(df$length, df$year, length)
data.frame(rating=round(df$rating[1]), year = as.numeric(names(nums)), number=as.vector(nums))
}))
# use direct labels to label based on rating
p <- ggplot(mry, aes(x=year, y=number, group=rating, color=rating)) + geom_line()
direct.label(p, "last.bumpup")
# subset to only a single rating
mry2 = subset(mry, rating==10)
p2 <- ggplot(mry2, aes(x=year, y=number, group=rating, color=rating)) + geom_line()
p2
# direct labels fails when attempting to label plot with a single series
direct.label(p2, "last.bumpup")
This was indeed a bug, and the package maintainer has already fixed it. To obtain the updated version:
install.packages("directlabels", repos="http://r-forge.r-project.org")
I've just checked, and everything now runs fine. Nice catch!
I have been struggling to create an area chart using ggplot for a while now, to no avail!
Here is my code:
strings <- cbind("rstarUS","rstarUK","rstarJAP","rstarGER","rstarFRA","rstarITA","rstarCA")
time <- as.numeric(rep(seq(1,50),each=7))
rstar <- rep(strings,times=50)
v <- variance.decomposition$rstarUS*100
data <- data.frame(time,v)
data <- data.frame(time, percent=as.vector(t(data[-1])), rstar)
percent <- as.numeric(data$percent)
plot.us <- ggplot(data, aes(x=time, y=percent, fill=rstar)) + geom_area()
plot.us
My data are already percentages (they are FEVD values), but every time I run my code I get lines instead of shaded areas.
I am essentially trying to get a stacked percentage area chart.
Perhaps the issue is with variance.decomposition$rstarUS; if you change the value of v to something else it appears to run as expected:
library(tidyverse)
strings <- cbind("rstarUS","rstarUK","rstarJAP","rstarGER","rstarFRA","rstarITA","rstarCA")
time <- as.numeric(rep(seq(1,50),each=7))
rstar <- rep(strings,times=50)
v <- rpois(350, 10)
data <- data.frame(time,v)
data <- data.frame(time, percent=as.vector(t(data[-1])), rstar)
percent <- as.numeric(data$percent)
plot.us <- ggplot(data, aes(x=time, y=percent, fill=rstar)) +
geom_area()
plot.us
Created on 2022-07-15 by the reprex package (v2.0.1)
What is variance.decomposition$rstarUS and are you sure you have calculated it properly?
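One thing worth checking, as a sketch with synthetic numbers (variance.decomposition itself is not shown in the question): geom_area needs percent to be numeric inside the data frame, with exactly one row per time/rstar pair. Note that the line percent <- as.numeric(data$percent) in the question creates a separate vector and leaves data$percent unchanged, so a character or factor column would still reach ggplot. The runif() values below are made up purely for illustration.

```r
library(ggplot2)

series <- c("rstarUS", "rstarUK", "rstarJAP", "rstarGER",
            "rstarFRA", "rstarITA", "rstarCA")

set.seed(42)
# One row per (time, rstar) pair; rstar varies fastest, as in the question
data <- expand.grid(rstar = series, time = 1:50, stringsAsFactors = FALSE)

# Synthetic shares (assumption: stands in for variance.decomposition$rstarUS)
data$percent <- runif(nrow(data))
# Rescale so that the shares at each time point sum to 100
data$percent <- 100 * data$percent / ave(data$percent, data$time, FUN = sum)

# The column used by ggplot must itself be numeric
stopifnot(is.numeric(data$percent))

plot.us <- ggplot(data, aes(x = time, y = percent, fill = rstar)) +
  geom_area()
plot.us
```

If the real column turns out to be character or factor, converting it in place (data$percent <- as.numeric(as.character(data$percent))) before plotting may be all that is needed.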
I made a grouped barchart in R using the ggplot package. I used the following code:
ggplot(completedDF,aes(year,value,fill=variable)) + geom_bar(position=position_dodge(),stat="identity")
And the graph looks like this:
The problem is that I want the 1999-2008 data to be at the end.
Is there any way to move it?
Thanks, any help is appreciated.
ggplot follows the order of the levels in a factor. If you didn't order your factor yourself, the levels are sorted alphabetically.
If you want your "1999-2008" level to be at the end, just reorder your factor using
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
For example :
library(ggplot2)
# Create a sample data set
set.seed(2014)
years_labels <- c( "1999-2008","1999-2002", "2002-2005", "2005-2008")
variable_labels <- c("pointChangeVector", "nonPointChangeVector",
"onRoadChangeVector", "nonRoadChangeVector")
years <- rbinom(n=1000, size=3,prob=0.3)
variables <- rbinom(n=1000, size=3,prob=0.3)
year <- factor(x=years , levels=0:3, labels=years_labels)
variable <- factor(x=variables , levels=0:3, labels=variable_labels)
completed <- data.frame( year, variable)
# Plot
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
# change the order
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
A further benefit of this approach is that your results will also come out in the right order in other functions such as summary or plot.
Does it help?
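The effect of the level order can be checked without plotting anything. The following is a minimal base R sketch; the year_chr vector is a made-up illustration, not data from the question.

```r
# Hypothetical character vector of period labels
year_chr <- c("2002-2005", "1999-2008", "1999-2002", "2005-2008", "1999-2002")

# Default: levels are sorted alphabetically, so "1999-2008" comes second
default_order <- levels(factor(year_chr))

# Explicit levels: "1999-2008" is moved to the end
year_fct <- factor(year_chr,
                   levels = c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))

# table() (like summary() and ggplot's discrete axis) follows the level order
counts <- table(year_fct)
print(default_order)
print(counts)
```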
Yeah, this is a real problem in ggplot. By default it puts non-numeric (character) values in alphabetical order.
The easiest way to solve it is to add scale_x_discrete in this way:
p <- ggplot(completedDF,aes(year,value,fill=variable))
p <- p + geom_bar(position=position_dodge(),stat="identity")
p <- p + scale_x_discrete(limits = c("1999-2002","2002-2005","2005-2008","1999-2008"))
I have two datasets in my diagram, "y1" and "baseline". As labels in a ggplot diagram, I want to use the difference between the y-values of the two, computed on the fly.
Dataframe:
df <- data.frame(c(10,20,40),c(0.1,0.2,0.3),c(0.05,0.1,0.2))
names(df)[1] <- "classes"
names(df)[2] <- "y1"
names(df)[3] <- "baseline"
df$classes <- factor(df$classes,levels=c(10,20,40), labels=c("10m","20m","40m"))
library(reshape2)  # melt() comes from reshape2 (the older reshape package also provides it)
dfm=melt(df)
To start with, I defined a function which returns the baseline y-value corresponding to a particular x-value.
I tested it and it works fine:
getBaselineY <- function(xValue){
return(dfm[dfm$classes==xValue & dfm$variable=="baseline",]$value[1])
}
Unfortunately, passing this function into the ggplot code only gives me the baseline y-value for the first x-value:
diagram <- ggplot(dfm, aes(x=classes, y=value, group=variable, colour=variable))
diagram <- diagram + geom_point() + geom_line()
diagram <- diagram + geom_text(aes(label=getBaselineY(classes)))
diagram <- diagram + theme_bw(base_size=16)
diagram
Nevertheless, replacing the function call with just the x-value gives me the respective x-value for each point:
diagram <- diagram + geom_text(aes(label=classes))
I don't understand why this happens or how best to solve it. Any help is highly appreciated!
Alternatively, this could be solved by calculating the difference beforehand and adding an additional column to the data frame:
df$Difference<-df$y1-df$baseline
dfm=melt(df,id.var=c(1,4))
And use it directly as geom_text label:
diagram <- diagram + geom_text(aes(label=Difference))
The problem is your function getBaselineY. I guess you wrote and tested it with a single xValue in mind. But you are passing a vector to the function and return only the first value.
To get the labels the way you described use an ifelse:
diagram + geom_text(aes(label = ifelse(variable == "baseline", value,
value - value[variable == "baseline"])))
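To make the vector issue concrete, here is a small sketch of a vectorised lookup. The name getBaselineYVec is hypothetical, and the long data frame is built by hand as a stand-in for melt(df) so that the example needs only base R; match() is used to find the baseline row for each x-value.

```r
# Wide data from the question
df <- data.frame(
  classes  = factor(c("10m", "20m", "40m"), levels = c("10m", "20m", "40m")),
  y1       = c(0.1, 0.2, 0.3),
  baseline = c(0.05, 0.1, 0.2)
)

# Long format built by hand (stand-in for melt(df)); same columns as dfm
dfm <- data.frame(
  classes  = rep(df$classes, times = 2),
  variable = factor(rep(c("y1", "baseline"), each = 3),
                    levels = c("y1", "baseline")),
  value    = c(df$y1, df$baseline)
)

# Hypothetical vectorised replacement for getBaselineY():
# for every element of xValue, match() returns the row of its baseline entry
getBaselineYVec <- function(xValue, data = dfm) {
  base <- data[data$variable == "baseline", ]
  base$value[match(xValue, base$classes)]
}

# One baseline value per row, instead of only the first one
baseline_for_rows <- getBaselineYVec(dfm$classes)
dfm$diff <- dfm$value - baseline_for_rows
print(dfm)
```

With a function like this, aes(label = value - getBaselineYVec(classes)) should yield one label per point, although the ifelse version above avoids the helper altogether.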
What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with the ggplot2 geom_histogram, since the bottom of the bar is at zero, and the log of that gives you trouble.
My workaround is to use the freqpoly geom, and that more or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data, I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to be able to do is use points at every vertex of the freqpoly curve, and no lines connecting them. Is there a way to do this easily?
The easiest way to get the desired plot is to just recast your data. Then you can use geom_point. Since you don't provide an example, I used the standard example for geom_histogram to show this:
# load packages
require(ggplot2)
require(reshape)
# get data
data(movies)
movies <- movies[, c("title", "rating")]
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x=rating, y=..density..), binwidth=.001) +
scale_y_continuous(trans = 'log10')
# recast the data
df1 <- recast(movies, value~., measure.var="rating")
names(df1) <- c("rating", "number")
# alternative way to recast data
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot
p <- ggplot(df1, aes(x=rating)) + scale_y_continuous(trans="log10", name="density")
# with lines
p + geom_linerange(aes(ymax=number, ymin=.9))
# only points
p + geom_point(aes(y=number))
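If you want to keep fixed-width bins, as in the original freqpoly call, another option is to do the binning yourself with hist(plot = FALSE) and plot only the non-empty bins as points. This is a sketch under assumptions: the zcoord values are synthetic (the original zcoorddist data is not available), and the bin width of 0.001 is taken from the question.

```r
library(ggplot2)

set.seed(1)
# Synthetic stand-in for the question's zcoorddist data frame
zcoorddist <- data.frame(zcoord = rnorm(5000, mean = 0, sd = 0.01))

binwidth <- 0.001
# Breaks padded by one bin on each side so that every value is covered
breaks <- seq(min(zcoorddist$zcoord) - binwidth,
              max(zcoorddist$zcoord) + binwidth,
              by = binwidth)

# Bin the data without drawing anything
h <- hist(zcoorddist$zcoord, breaks = breaks, plot = FALSE)

# Keep bin midpoints and densities, dropping empty bins (log10(0) is -Inf)
binned <- data.frame(zcoord = h$mids, density = h$density)
binned <- binned[binned$density > 0, ]

# Points only, on a log10 y-axis
p <- ggplot(binned, aes(x = zcoord, y = density)) +
  geom_point() +
  scale_y_continuous(trans = "log10")
p
```

Dropping the empty bins is what removes the vertical drops at the edges, since those come from bins whose count is zero.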
While trying to overlay a new line on an existing ggplot, I am getting the following error:
Error: ggplot2 doesn't know how to deal with data of class uneval
The first part of my code works fine. Below is an image of "recent" hourly wind generation data from a Midwestern United States electric power market.
Now I want to overlay the last two days' worth of observations in red. It should be easy, but I can't figure out why I am getting an error.
Any assistance would be greatly appreciated.
Below is a reproducible example:
# Read in Wind data
fname <- "https://www.midwestiso.org/Library/Repository/Market%20Reports/20130510_hwd_HIST.csv"
df <- read.csv(fname, header=TRUE, sep="," , skip=7)
df <- df[1:(length(df$MKTHOUR)-5),]
# format variables
df$MWh <- as.numeric(df$MWh)
df$Datetime <- strptime(df$MKTHOUR, "%m/%d/%y %I:%M %p")
# Create some variables
df$Date <- as.Date(df$Datetime)
df$HrEnd <- df$Datetime$hour+1
# Subset recent and last data
last.obs <- range(df$Date)[2]
df.recent <- subset(df, Date %in% seq(last.obs-30, last.obs-2, by=1))
df.last <- subset(df, Date %in% seq(last.obs-2, last.obs, by=1))
# plot recent in Grey
p <- ggplot(df.recent, aes(HrEnd, MWh, group=factor(Date))) +
geom_line(color="grey") +
scale_y_continuous(labels = scales::comma) +
scale_x_continuous(breaks = seq(1,24,1)) +
labs(y="MWh") +
labs(x="Hour Ending") +
labs(title="Hourly Wind Generation")
p
# plot last two days in Red
p <- p + geom_line(df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
p
When you add a new data set to a geom, you need to use the data= argument, or else put the arguments in the proper order: mapping=..., data=.... Take a look at the arguments in ?geom_line.
Thus:
p + geom_line(data=df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
Or:
p + geom_line(aes(HrEnd, MWh, group=factor(Date)), df.last, color="red")
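Since the CSV in the question may no longer be reachable, here is a self-contained sketch of the same fix on made-up data. The dates and MWh values are synthetic assumptions; only the column names follow the question.

```r
library(ggplot2)

set.seed(123)
# Synthetic hourly data: 30 days x 24 hours (values are made up)
df <- expand.grid(HrEnd = 1:24, day = 0:29)
df$Date <- as.Date("2013-04-10") + df$day
df$MWh <- round(runif(nrow(df), min = 500, max = 8000))

# Split into the last two days and everything before them
last.obs <- max(df$Date)
df.recent <- subset(df, Date < last.obs - 1)
df.last <- subset(df, Date >= last.obs - 1)

# Base plot: earlier days in grey
p <- ggplot(df.recent, aes(HrEnd, MWh, group = factor(Date))) +
  geom_line(color = "grey")

# Overlay: the second data set must be passed as data=, because the first
# positional argument of geom_line() is mapping, not data
p <- p + geom_line(data = df.last,
                   aes(HrEnd, MWh, group = factor(Date)),
                   color = "red")
p
```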
Another cause is accidentally putting the data=... inside the aes(...) instead of outside:
RIGHT:
ggplot(data=df[df$var7=='9-06',], aes(x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
WRONG:
ggplot(aes(data=df[df$var7=='9-06',],x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
In particular, this can happen when you prototype your plot command with qplot(), which doesn't use an explicit aes(), and then edit or copy-and-paste it into a ggplot() call:
qplot(data=..., x=...,y=..., ...)
ggplot(data=..., aes(x=...,y=...,...))
It's a pity ggplot's error message isn't Missing 'data' argument! instead of this cryptic nonsense, because that's what this message often means.
This could also occur if you refer to a variable in the data.frame that doesn't exist. For example, recently I forgot to tell ddply to summarize by one of my variables that I used in geom_line to specify line color. Then, ggplot didn't know where to find the variable I hadn't created in the summary table, and I got this error.
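A quick way to rule out that last cause is to compare the columns the plot needs with the columns the data frame actually has. This is a minimal sketch; summary_df and the needed column names are hypothetical.

```r
# Hypothetical summary table that is missing the "Date" column
summary_df <- data.frame(HrEnd = 1:3, MWh = c(10, 20, 15))

# Columns referenced inside aes() of the plot
needed <- c("HrEnd", "MWh", "Date")

# Any name listed here is used by the plot but absent from the data
missing_cols <- setdiff(needed, names(summary_df))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```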