Struggling with a Stacked Area Chart in R

I have been struggling to create an area chart using ggplot for a while now, to no avail!
Here is my code:
strings <- cbind("rstarUS","rstarUK","rstarJAP","rstarGER","rstarFRA","rstarITA","rstarCA")
time <- as.numeric(rep(seq(1,50),each=7))
rstar <- rep(strings,times=50)
v <- variance.decomposition$rstarUS*100
data <- data.frame(time,v)
data <- data.frame(time, percent=as.vector(t(data[-1])), rstar)
percent <- as.numeric(data$percent)
plot.us <- ggplot(data, aes(x=time, y=percent, fill=rstar)) + geom_area()
plot.us
My data is already in percentages (they are FEVDs), but every time I run my code I get lines instead of shaded FEVD areas.
I am essentially trying to get a stacked percentage area chart.

Perhaps the issue is with variance.decomposition$rstarUS; if you change the value of v to something else it appears to run as expected:
library(tidyverse)
strings <- cbind("rstarUS","rstarUK","rstarJAP","rstarGER","rstarFRA","rstarITA","rstarCA")
time <- as.numeric(rep(seq(1,50),each=7))
rstar <- rep(strings,times=50)
v <- rpois(350, 10)
data <- data.frame(time,v)
data <- data.frame(time, percent=as.vector(t(data[-1])), rstar)
data$percent <- as.numeric(data$percent)  # store the numeric column back in data so ggplot uses it
plot.us <- ggplot(data, aes(x=time, y=percent, fill=rstar)) +
geom_area()
plot.us
Created on 2022-07-15 by the reprex package (v2.0.1)
What is variance.decomposition$rstarUS and are you sure you have calculated it properly?
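One thing worth ruling out is the reshaping itself. If variance.decomposition came from vars::fevd() (an assumption), each element such as rstarUS is typically a matrix with one row per forecast horizon and one column per variable, holding shares that sum to 1 across each row. Under that assumption, the sketch below builds the long data frame straight from the matrix and its column names, so each (time, rstar) pair appears exactly once, and then stacks the areas. The matrix fevd_us is a hypothetical random stand-in for variance.decomposition$rstarUS.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for variance.decomposition$rstarUS:
# 50 horizons (rows) x 7 shocks (columns), each row summing to 1
shocks  <- c("rstarUS", "rstarUK", "rstarJAP", "rstarGER", "rstarFRA", "rstarITA", "rstarCA")
raw     <- matrix(runif(50 * 7), nrow = 50, dimnames = list(NULL, shocks))
fevd_us <- raw / rowSums(raw)

# as.vector() reads the matrix column by column, so time cycles fastest
# and each shock name is repeated once per horizon
long <- data.frame(
  time    = rep(seq_len(nrow(fevd_us)), times = ncol(fevd_us)),
  rstar   = rep(colnames(fevd_us), each = nrow(fevd_us)),
  percent = as.vector(fevd_us) * 100
)

# One row per (time, rstar) pair, so geom_area() stacks cleanly to 100
plot.us <- ggplot(long, aes(x = time, y = percent, fill = rstar)) +
  geom_area()
plot.us
```

If the shares do not already add up to 100 at every horizon, geom_area(position = "fill") rescales each time point to a total of 1.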

Related

ggplotly plots negative bars in positive direction

I am using a geom_bar plot in ggplotly, and it renders negative bars positive. Any ideas why this might be the case, and in particular how to solve this?
library(ggplot2)
library(plotly)
dat1 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(-13.53, 16.81, 16.24, 17.42)
)
# Bar graph, time on x-axis, color fill grouped by sex -- use position_dodge()
g <- ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
ggplotly(g)
Why would the first bar be in a positive direction, with a negative value?
The versions I am using are the latest:
plotly_3.4.13
ggplot2_2.1.0
If you write your plotly object to another variable you can modify the plotly properties including the 'data' it uses to render the plot.
For your specific example append this to your code:
#create plotly object to manipulate
gly<-ggplotly(g)
#confirm existing data structure/values
gly$x$data[[1]]
# $y holds 13.53 and 16.81, i.e. the absolute values for the first group (Female)
#assign to original data
gly$x$data[[1]]$y <- dat1$total_bill[grep("Female",dat1$sex)]
#could do for second group too if needed
gly$x$data[[2]]$y <- dat1$total_bill[grep("Male",dat1$sex)]
#to see ggplotly object with changes
gly
I have come up with a general solution that works in cases where facet_wrap is used. Here is an example of the problem with toy data:
set.seed(45)
df <- data.frame( group=rep(1:4,5), TitleX=rep(1:5,4), TitleY=sample(-5:5,20, replace = TRUE))
h <- ggplot(df) + geom_bar(aes(TitleX,TitleY),stat = 'identity') + facet_wrap(~group)
h
When we use ggplotly we see what OP saw, which is that the negatives have disappeared:
gly <- ggplotly(h)
gly
I wrote a function that checks, in each facet's trace, for bars whose hover text gives the y value as 0 (this seems to go hand in hand with the sign problem) and makes those y values negative:
library(dplyr)  # for %>% and mutate()
fix_bar_ly <- function(element,yname){
# collect the bar heights and hover text of one trace
tmp <- as.data.frame(element[c("y","text")])
tmp <- tmp %>% mutate(
# where the text reports "<yname>: 0", the bar should be negative, so flip its sign
y=ifelse(grepl(paste0(yname,": 0$"),text),
ifelse(y!=0,-y,y),
y)
)
element$y <- tmp$y
element
}
Now I apply this function to the data for each facet:
data.list <- gly$x$data
m <- lapply(data.list,function(x){fix_bar_ly(x,"TitleY")})
gly$x$data <- m
gly
For some reason the spaces between the bars have disappeared, but at least the values are negative in the appropriate places.
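To see what fix_bar_ly() does without building a plot, the sketch below runs a base R variant (fix_bar_ly_base, a hypothetical name that drops the dplyr dependency) on a hand-made list mimicking one trace. The y and text values are invented for illustration; real ggplotly traces may format their hover text differently, so inspect gly$x$data[[1]]$text before relying on the pattern.

```r
# Base R variant of fix_bar_ly(): flip the sign of y wherever the hover
# text reports the y variable as exactly 0 (a genuine 0 stays 0)
fix_bar_ly_base <- function(element, yname) {
  zero_in_text <- grepl(paste0(yname, ": 0$"), element$text)
  element$y <- ifelse(zero_in_text & element$y != 0, -element$y, element$y)
  element
}

# Hypothetical trace: the first and third bars show "TitleY: 0" in their text
trace <- list(
  y    = c(3, 2, 0),
  text = c("TitleX: 1<br />TitleY: 0",
           "TitleX: 2<br />TitleY: 2",
           "TitleX: 3<br />TitleY: 0")
)

fixed <- fix_bar_ly_base(trace, "TitleY")
fixed$y  # -3  2  0
```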

adding layer to a plot in R

Take some generic data:
A <- c(1997,2000,2000,1998,2000,1997,1997,1997)
B <- c(0,0,1,0,0,1,0,0)
df <- data.frame(A,B)
counts <- t(table(A,B))
frac <- counts[1,]/(counts[2,]+counts[1,])
C <- c(1998,2001,2000,1995,2000,1996,1998,1999)
D <- c(1,0,1,0,0,1,0,1)
df2 <- data.frame(C,D)
counts2 <- t(table(C,D))
frac2 <- counts2[1,]/(counts2[2,]+counts2[1,])
If we then want to create a scatterplot of the two data sets on the same scale,
we can:
plot(frac, pch=22)
points(frac2, pch=19)
But this has two problems.
First, we want to put our year values (which appear in df$A and df$C) along the x axis.
Second, we want the x axis to adjust its scale automatically when the second data set is added.
A solution using ggplot2 or base R would be desired.
ggplot will do the scaling for you. Convert the frac vectors to data frames and pass them to ggplot:
library(ggplot2)
ggplot(data.frame(y=frac, x=names(frac)), aes(x, y)) +
geom_point(col="salmon") +
geom_point(data=data.frame(y=frac2, x=names(frac2)), aes(x, y), col="steelblue") +
theme_bw()
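For base R, a minimal sketch: convert the year names to numbers and compute xlim and ylim from both data sets before calling plot(), because points() draws on the existing axes and never rescales them. The legend labels are illustrative. The same conversion, x = as.numeric(names(frac)), would also give the ggplot version a proportional year axis instead of a discrete one.

```r
A <- c(1997, 2000, 2000, 1998, 2000, 1997, 1997, 1997)
B <- c(0, 0, 1, 0, 0, 1, 0, 0)
C <- c(1998, 2001, 2000, 1995, 2000, 1996, 1998, 1999)
D <- c(1, 0, 1, 0, 0, 1, 0, 1)

# Fraction of zeros per year, as in the question
counts  <- t(table(A, B))
frac    <- counts[1, ] / (counts[2, ] + counts[1, ])
counts2 <- t(table(C, D))
frac2   <- counts2[1, ] / (counts2[2, ] + counts2[1, ])

# The years are stored as names; turn them into a numeric x axis
x1 <- as.numeric(names(frac))
x2 <- as.numeric(names(frac2))

# points() cannot rescale an existing plot, so size the axes for both sets up front
xlim <- range(x1, x2)
ylim <- range(frac, frac2)

plot(x1, frac, pch = 22, xlim = xlim, ylim = ylim, xlab = "Year", ylab = "Fraction")
points(x2, frac2, pch = 19)
legend("bottomleft", legend = c("df", "df2"), pch = c(22, 19))
```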

how do I stop ggplot automatically arranging my graph?

I made a grouped bar chart in R using the ggplot2 package with the following code:
ggplot(completedDF,aes(year,value,fill=variable)) + geom_bar(position=position_dodge(),stat="identity")
(The image of the resulting graph is not available.)
The problem is that I want the 1999-2008 data to be at the end.
Is there any way to move it?
Thanks, any help is appreciated.
ggplot follows the order of the levels of a factor. If you haven't set the levels yourself, character values are ordered alphabetically.
If you want your "1999-2008" level to be at the end, just reorder the factor:
completedDF$year <- factor(x=completedDF$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
For example:
library(ggplot2)
# Create a sample data set
set.seed(2014)
years_labels <- c( "1999-2008","1999-2002", "2002-2005", "2005-2008")
variable_labels <- c("pointChangeVector", "nonPointChangeVector",
"onRoadChangeVector", "nonRoadChangeVector")
years <- rbinom(n=1000, size=3,prob=0.3)
variables <- rbinom(n=1000, size=3,prob=0.3)
year <- factor(x=years , levels=0:3, labels=years_labels)
variable <- factor(x=variables , levels=0:3, labels=variable_labels)
completed <- data.frame( year, variable)
# Plot
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
# change the order
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
Another benefit of this approach is that other functions, such as summary() or plot(), will also show your results in this order.
Does it help?
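If you would rather not retype every level, a small base R helper can move one level to the end while keeping the others in their current order; move_to_end() is a hypothetical name and the year vector is a toy stand-in for completedDF$year. With the forcats package, fct_relevel(f, "1999-2008", after = Inf) does the same job.

```r
# Hypothetical helper: move one level of a factor to the end,
# keeping the other levels in their existing order
move_to_end <- function(f, level) {
  f <- factor(f)
  factor(f, levels = c(setdiff(levels(f), level), level))
}

# Toy stand-in for completedDF$year
year <- c("1999-2008", "1999-2002", "2002-2005", "2005-2008", "1999-2002")
levels(factor(year))  # default alphabetical order: "1999-2008" comes second

year_f <- move_to_end(year, "1999-2008")
levels(year_f)
```

Applied to the real data, this would be completedDF$year <- move_to_end(completedDF$year, "1999-2008").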
Yeah, this is a real problem in ggplot: it sorts non-numeric (character) values alphabetically, which is often not the order you want.
The easiest way to solve it is to add scale_x_discrete in this way:
p <- ggplot(completedDF,aes(year,value,fill=variable))
p <- p + geom_bar(position=position_dodge(),stat="identity")
p <- p + scale_x_discrete(limits = c("1999-2002","2002-2005","2005-2008","1999-2008"))

direct.label error on ggplot with single series to be labeled

I am looking to resolve an error that I encounter when trying to use direct.label to label a ggplot with only one series. Below is an example that illustrates how direct.label fails if there is only a single series.
In my real data, I am looping through regions and want to use direct labels on the sub-regions. However, some of the regions have only one sub-region, which results in an error from direct.label. Any assistance would be greatly appreciated.
library(ggplot2)
library(directlabels)
library(ggplot2movies)  # the movies data set moved here from ggplot2 in version 2.0.0
# sample data from the movies data set
mry <- do.call(rbind, by(movies, round(movies$rating), function(df) {
nums <- tapply(df$length, df$year, length)
data.frame(rating=round(df$rating[1]), year = as.numeric(names(nums)), number=as.vector(nums))
}))
# use direct labels to label based on rating
p <- ggplot(mry, aes(x=year, y=number, group=rating, color=rating)) + geom_line()
direct.label(p, "last.bumpup")
# subset to only a single rating
mry2 = subset(mry, rating==10)
p2 <- ggplot(mry2, aes(x=year, y=number, group=rating, color=rating)) + geom_line()
p2
# direct labels fails when attempting to label plot with a single series
direct.label(p2, "last.bumpup")
This was indeed a bug, and the package maintainer has already fixed it. To obtain the updated version:
install.packages("directlabels", repos="http://r-forge.r-project.org")
I've just checked, everything now runs fine. Nice catch!
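If you are stuck on a directlabels version that still fails on a single series, one possible workaround inside the region loop is to label the last point of each series yourself with geom_text(), which behaves the same for one series or many. last_points() is a hypothetical helper and the toy data frame stands in for mry2; you may need to widen the x limits so the label is not clipped.

```r
library(ggplot2)

# Hypothetical helper: for each group, keep the row with the largest x value
last_points <- function(df, x, group) {
  pieces <- split(df, df[[group]])
  do.call(rbind, lapply(pieces, function(d) d[which.max(d[[x]]), , drop = FALSE]))
}

# Toy stand-in for mry2: a single series (rating 10)
mry2 <- data.frame(year = c(2000, 2001, 2002), number = c(5, 7, 6), rating = 10)
labs_df <- last_points(mry2, "year", "rating")

p2 <- ggplot(mry2, aes(x = year, y = number, group = rating, color = rating)) +
  geom_line() +
  # the label layer inherits x, y and colour from the plot; hjust pushes it right of the point
  geom_text(data = labs_df, aes(label = rating), hjust = -0.2) +
  theme(legend.position = "none")
p2
```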

Plot results from dist_tab() function from qdap library

I am interested in plotting the results of the following code, which produces a frequency distribution table. I would like to graph the Freq column as bars and cum.Freq as a line, both sharing the interval column as the x-axis.
library("qdap")
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
dist_tab(x)
I have been able to build the bar chart using ggplot, but I want to take it further by adding cum.Freq on a secondary axis. I also want to add the percent and cum.percent values as data labels. Any help is appreciated.
library("ggplot2")
ggplot(dist_tab(x), aes(x=interval)) + geom_bar(aes(y=Freq), stat="identity")
Not sure if I understand your question. Is this what you are looking for?
library(reshape2)  # for melt()
df <- dist_tab(x)
df.melt <- melt(df, id.vars="interval", measure.vars=c("Freq", "cum.Freq"))
#
ggplot(df.melt, aes(x=interval, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")
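Closer to what the question describes, Freq can be drawn as bars and cum.Freq as a line on the same axes, since both are counts; a secondary axis is then only needed to re-express that count scale as cumulative percent. The sketch below builds a stand-in table with base R's table() because it assumes dist_tab(x) returns the columns interval, Freq, cum.Freq, percent and cum.percent named in the question; if it does, use df <- dist_tab(x) instead. It also assumes ggplot2 2.2.0 or later for geom_col() and sec_axis().

```r
library(ggplot2)

x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)

# Stand-in for qdap::dist_tab(x) with the assumed column names
df <- as.data.frame(table(x))
names(df) <- c("interval", "Freq")
df$cum.Freq    <- cumsum(df$Freq)
df$percent     <- round(100 * df$Freq / sum(df$Freq), 2)
df$cum.percent <- round(100 * df$cum.Freq / sum(df$Freq), 2)

n <- sum(df$Freq)  # total count, used to map counts to percentages

ggplot(df, aes(x = interval)) +
  # Freq as bars, labelled with percent
  geom_col(aes(y = Freq), fill = "grey70") +
  geom_text(aes(y = Freq, label = paste0(percent, "%")), vjust = -0.4, size = 3) +
  # cum.Freq as a line; group = 1 joins the points across a discrete x axis
  geom_line(aes(y = cum.Freq, group = 1), colour = "steelblue") +
  geom_point(aes(y = cum.Freq), colour = "steelblue") +
  geom_text(aes(y = cum.Freq, label = paste0(cum.percent, "%")),
            vjust = -0.8, size = 3, colour = "steelblue") +
  # secondary axis shows the shared count scale as a cumulative percentage
  scale_y_continuous(name = "Count",
                     sec.axis = sec_axis(~ . / n * 100, name = "Cumulative percent"))
```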
