ggplot2: yearmon scale and geom_bar - r

More than a solution, I'd like to understand why something that should be quite easy actually is not.
[I am borrowing part of the code from a different post that touched on the issue but ended up with a solution I didn't like]
library(ggplot2)
library(xts)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.yearmon(tmp$dt)
tmp$status <- as.factor(tmp$status)
### Not good. Why?
ggplot(tmp, aes(x = yearmon, fill = status)) +
geom_bar() +
scale_x_yearmon()
### Almost good but long-winded and ticks not great
chartData <- tmp %>%
group_by(yearmon, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
geom_col() +
scale_x_yearmon()
The first plot is all wrong; the second is almost perfect (ticks on the X axis are not great but I can live with that). Isn't geom_bar() supposed to perform the count job I have to manually perform in the second chart?
FIRST CHART
SECOND CHART
My question is: why is the first chart so poor? There is a warning which is meant to suggest something ("position_stack requires non-overlapping x intervals") but I really fail to understand it.
Thanks.
MY PERSONAL ANSWER
This is what I learned (thanks so much to all of you!):
Even though there is a scale_#_yearmon or scale_#_date, ggplot unfortunately treats those object types as continuous numbers. That makes geom_bar unusable.
geom_histogram might do the trick, but you lose control over relevant parts of the aesthetics.
Bottom line: you need to group/sum before you chart.
I am not sure xts or lubridate are really that useful (if you plan to use ggplot2) for what I was trying to achieve. I suspect they will be perfect for any continuous, date-wise case.
All in, I ended with this which does perfectly what I am after (notice how there is no need for xts or lubridate):
library(ggplot2)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))
tmp$status <- as.factor(tmp$status)
### GOOD
chartData <- tmp %>%
group_by(yearmon, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
geom_col() +
scale_x_date(labels = date_format("%h-%y"),
breaks = seq(from = min(chartData$yearmon),
to = max(chartData$yearmon), by = "month"))
FINAL OUTPUT

You could also aes(x=factor(yearmon), ...) as a shortcut fix.
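A minimal sketch of that shortcut, reusing the sample data from the question. The column name ym_f is made up for illustration. Because the axis becomes discrete, months with no rows (for example January 2016) do not appear on it at all, which may or may not be what you want.

```r
library(ggplot2)
library(zoo)  # as.yearmon() comes from zoo (xts loads it as a dependency)

csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(text = csvData)
tmp$yearmon <- as.yearmon(as.Date(tmp$dt))
tmp$status <- as.factor(tmp$status)

# ym_f is a hypothetical helper column: one discrete level per distinct month
tmp$ym_f <- factor(tmp$yearmon)

# With a discrete x, geom_bar() counts the rows per level itself
p <- ggplot(tmp, aes(x = ym_f, fill = status)) +
  geom_bar()
print(p)
```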

The reason the first plot is broken is basically that ggplot2 does not know exactly what a yearmon is. As you can see here, internally it is just a number with labels.
> as.numeric(tmp$yearmon)
[1] 2015.917 2015.917 2015.917 2015.833 2015.750 2015.917 2016.417 2016.333 2016.167 2015.917
So when you plot without the previous aggregation, the bars are spread out. You need to assign an appropriate binwidth using geom_histogram(), like this:
ggplot(tmp, aes(x = yearmon, fill = status)) +
geom_histogram(binwidth = 1/12) +
scale_x_yearmon()
A binwidth of 1/12 corresponds to one month, because a yearmon is measured in years and there are 12 months in each year.
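To make the 1/12 concrete, here is a small sketch showing that a yearmon value is stored as year + (month - 1) / 12, so consecutive months sit 1/12 apart. The variable names are only for illustration.

```r
library(zoo)  # provides as.yearmon()

dates <- as.Date(c("2015-10-17", "2015-12-03", "2016-03-31"))
ym <- as.yearmon(dates)

# Numeric value behind each yearmon: year + (month - 1) / 12
ym_num <- as.numeric(ym)
by_hand <- c(2015 + 9 / 12, 2015 + 11 / 12, 2016 + 2 / 12)
print(all.equal(ym_num, by_hand))

# Moving from December 2015 to January 2016 is a step of 1/12
month_step <- as.numeric(as.yearmon(as.Date("2016-01-01"))) -
  as.numeric(as.yearmon(as.Date("2015-12-01")))
print(all.equal(month_step, 1 / 12))
```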
For a plot after aggregation, as @ed_sans suggests, I also prefer lubridate, since I know better how to change ticks and modify axis labels with it.
library(lubridate) # needed for floor_date() and months()
chartData <- tmp %>%
mutate(ym = floor_date(dt,"month")) %>%
group_by(ym, status) %>%
summarise(count = n()) %>%
as.data.frame()
ggplot(chartData, aes(x = ym, y = count, fill = status)) +
geom_col() +
scale_x_date(labels = date_format("%Y-%m"),
breaks = as.Date("2015-09-01") +
months(seq(0, 10, by = 2)))

Related

Combine scale_x_upset with scale_y_break

I made an upset plot using the ggupset package and added a break to the y axis with scale_y_break from the ggbreak package.
However, when I add scale_y_break, the combination matrix under the bar plot disappears.
Is there a way to combine the combination matrix of the plot made without scale_y_break with the bar plot portion of a plot made with scale_y_break? I can't seem to be able to access the grobs of these plots or use any other workaround. If anyone could help, I would greatly appreciate it!
Example with scale_x_upset and scale_y_break:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
I would like to combine the barplot portion of the plot created with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
with the combination matrix portion of the plot made with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)
Thanks!

ggplot boxplot: position_dodge does not work

I have made a relatively simple boxplot with ggplot
ggplot(l8tc.df_17_18,aes(x=landcover,y= tcw_17, group=landcover))+
geom_boxplot()+
geom_boxplot(aes(y= tcw_18),position_dodge(1))
A screenshot to get an idea of the data used:
This is the output:
I want the different boxplots to be next to each other and not in one vertical line. I have looked through all related questions and tried out a couple of options, however I could not find a solution so far.
I am still a ggplot beginner though.
Any ideas?
In this case you should use a different (long) data format, so melt the data first.
require(reshape2)
require(tidyverse)
# format data
melted_data <- l8tc.df_17_18 %>%
select(landcover, tcw_17, tcw_18) %>%
melt('landcover', variable.name = 'tcw')
# plot
ggplot(melted_data, aes(x = as.factor(landcover), y = value)) + geom_boxplot(aes(fill = tcw))
Dodging should be automatic, but if you want to experiment, use geom_boxplot(aes(fill = tcw), position = position_dodge())
https://ggplot2.tidyverse.org/reference/position_dodge.html
You can also write it as a single pipeline without creating a temporary object:
l8tc.df_17_18 %>%
select(landcover, tcw_17, tcw_18) %>%
melt('landcover', variable.name = 'tcw') %>%
ggplot(aes(x = as.factor(landcover), y = value)) + geom_boxplot(aes(fill = tcw))
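The same reshaping can be sketched with tidyr::pivot_longer() instead of reshape2::melt(). The real l8tc.df_17_18 is only shown as a screenshot, so the data frame toy below is a made-up stand-in with the same three column names; the values are random and purely illustrative.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for l8tc.df_17_18 (assumed columns: landcover, tcw_17, tcw_18)
set.seed(1)
toy <- data.frame(
  landcover = rep(1:3, each = 5),
  tcw_17 = rnorm(15),
  tcw_18 = rnorm(15, mean = 0.5)
)

# Long format: one row per (landcover, year variable) measurement
long <- pivot_longer(toy,
                     cols = c(tcw_17, tcw_18),
                     names_to = "tcw",
                     values_to = "value")

# Mapping fill to tcw gives two boxes per landcover class, dodged side by side
p <- ggplot(long, aes(x = factor(landcover), y = value, fill = tcw)) +
  geom_boxplot()
print(p)
```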

ggplot fill does not work - no errors [MRE]

The ggplot analysis below is intended to show the number of survey responses by date. I'd like to color the bars by the three survey administrations (the Admini variable). While there are no errors thrown, the bars do not color.
Can anyone point out how/why my bars are not color-coded? THANKS!
library(ggplot2)
library(dplyr)
library(RCurl)
OSTadminDates2<-getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/OSTadminDates.csv")
OSTadminDates<-read.csv(text=OSTadminDates2)
ndate1<-as.Date(OSTadminDates$Date,"%m/%d/%y");ndate1
SurvAdmin<-as.factor(OSTadminDates$Admini)
R<-ggplot(data=OSTadminDates,aes(x=ndate1),fill=Admini,group=1) +
geom_bar(stat = "count",width = .5 )
R
Here's a work-around you could use:
library(ggplot2)
library(dplyr)
library(RCurl)
OSTadminDates2<-getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/OSTadminDates.csv")
OSTadminDates<-read.csv(text=OSTadminDates2)
OSTadminDates$Date<-as.Date(OSTadminDates$Date,"%m/%d/%y")
OSTadminDates$Admini <- factor(OSTadminDates$Admini)
df <- OSTadminDates %>%
group_by(Date, Admini) %>%
summarise(n = n())
ggplot(data = df) +
geom_bar(aes(x = Date, y = n, fill = Admini), stat = "identity")
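A likely reason the original call draws uncolored bars is that fill=Admini sits outside aes(), as an extra argument to ggplot(), so it never becomes an aesthetic mapping. The sketch below moves it inside aes() and lets geom_bar() do the counting. The data frame toy is a made-up stand-in for OSTadminDates, since the real CSV lives at a remote URL.

```r
library(ggplot2)

# Hypothetical stand-in for OSTadminDates (assumed columns: Date, Admini)
toy <- data.frame(
  Date = as.Date(c("2017-01-10", "2017-01-10", "2017-01-11",
                   "2017-03-02", "2017-03-02", "2017-05-20")),
  Admini = factor(c(1, 1, 1, 2, 2, 3))
)

# fill is mapped inside aes(), and both variables are columns of the data frame
p <- ggplot(toy, aes(x = Date, fill = Admini)) +
  geom_bar(width = 0.5)
print(p)
```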

How to generate grouped bar plot or pie chart from list of csv files?

I have a list of data.frames that need to be classified. I manipulated the list and exported the results as CSV files to the default folder. To make the exported data more informative, I would like to generate a grouped bar plot or a pie chart for each data.frame object. I am a beginner still learning ggplot2, so I have little idea how to do this easily. How can I generate an informative grouped bar plot for a list of files? Thanks in advance :)
reproducible data :
savedDF <- list(
bar.saved = data.frame(start=sample(100, 15), stop=sample(150, 15), score=sample(36, 15)),
cat.saved = data.frame(start=sample(100, 20), stop=sample(100,20), score=sample(45,20)),
foo.saved = data.frame(start=sample(125, 24), stop=sample(140, 24), score=sample(32, 24))
)
dropedDF <- list(
bar.droped = data.frame(start=sample(60, 12), stop=sample(90,12), score=sample(35,12)),
cat.droped = data.frame(start=sample(75, 18), stop=sample(84,18), score=sample(28,18)),
foo.droped = data.frame(start=sample(54, 14), stop=sample(72,14), score=sample(25,14))
)
So I get the list of CSV files from this pipeline:
comb <- do.call("rbind", c(savedDF, dropedDF))
cn <- c("letter", "saved","seq")
DF <- cbind(read.table(text = chartr("_", ".", rownames(comb)), sep = ".", col.names = cn), comb)
DF <- transform(DF, updown = ifelse(score>= 12, "stringent", "weak"))
by(DF, DF[c("letter", "saved", "updown")],
function(x) write.csv(x[-(1:3)],
sprintf("%s_%s_%s.csv", x$letter[1], x$updown[1], x$saved[1])))
To better understand the exported data, I think a grouped bar plot and a pie chart for each data.frame object would be much more informative.
In the desired plot, I want to see the number of features in each CSV file for each data.frame object.
How can I do this easily with ggplot2? Is there a more efficient way? Thanks a lot.
If I understand correctly, this may work for you as a rough solution. Please comment to let me know if this is acceptable. In the future, if you can provide a rough sketch along with your data to show what you're trying to achieve that would be a good idea.
library(dplyr)
library(ggplot2)
plot_data <- DF %>%
group_by(letter, saved, updown) %>%
tally %>%
group_by(saved, updown) %>%
mutate(percentage = n/sum(n))
ggplot(plot_data, aes(x = saved, y = n, fill = saved)) +
geom_bar(stat = "identity") +
facet_wrap(~ letter + updown, ncol = 2)
You can always change the facet_wrap(~ letter + updown, ncol = 2) to an explicit facet_grid(letter ~ updown) if you wish.
Or you could view it this way:
ggplot(plot_data, aes(x = letter, y = n)) +
geom_bar(stat = "identity") +
facet_wrap(~updown+saved, ncol = 2)
For a pie (cleaning up and labeling is up to you):
ggplot(plot_data, aes(x = 1, y = percentage, fill = letter)) +
geom_bar(stat = "identity", width =1) +
facet_wrap(~updown+saved, ncol = 2) +
coord_polar(theta = "y") +
theme_void()
A pie for "bar" alone, split by the four saved/updown interactions, just requires some manipulation of your data:
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data <- DF %>%
unite(interaction, saved, updown, sep = "-") %>%
group_by(letter, interaction) %>%
tally %>%
mutate(percentage = n/sum(n)) %>%
filter(letter == "bar")
ggplot(plot_data, aes(x = 1, y = percentage, fill = interaction)) +
geom_bar(stat = "identity", width =1) +
coord_polar(theta = "y") +
theme_void()
You should really look into the dplyr, tidyr and ggplot2 packages. Read their documentation and vignettes and work through the examples. The best way to learn is by doing.

Plotting a bar graph in R

Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce:
Edit:
To be extra fancy, e.g. to make the x-axis labels go from "Before" to "After", you can add scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot call.
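Put together, the plot with the explicit x-axis order might look like the sketch below. It assumes, as the answer above does, that the first row is "Before" and the second is "After"; the object names long and p are only for illustration.

```r
library(ggplot2)
library(tidyr)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS = c(155.48, 139.27),
  change = c("Before", "After")  # assumed meaning of row 1 and row 2
)

# Long format: one row per (change, measure) pair
long <- gather(rest_change, key, value, -change)

p <- ggplot(long, aes(x = change, y = value)) +
  geom_bar(stat = "identity") +
  facet_grid(~key) +
  # Without this, the discrete axis sorts alphabetically ("After" first)
  scale_x_discrete(limits = c("Before", "After"))
print(p)
```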
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.var = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)
