How to fix ggplot2 time series barchart bars that are too wide? - r

I am plotting a time series bar chart of a measure for different categories. Each bar is so wide that it spans many dates and neighbouring bars touch, even when they are a month apart, so it is unclear which date a bar corresponds to. How do I change the code so that each bar appears only over its date in the underlying dataframe?
I have successfully plotted another time series bar chart with exactly the same ggplot code but different underlying data, so it is unclear to me why this is happening with this particular dataframe.
In the following example, I use a dataframe with only one category to keep the issue simple:
library(ggplot2)
data <- data.frame(a = c(as.Date("2019-05-30"), as.Date("2019-06-19")), b = c("FX FORWARD", "FX FORWARD"), c = c(29.2, 74.7))
colnames(data) <- c("Expiration Date", "Security Type", "Exposure $M")
plot <- ggplot(data, aes(x=`Expiration Date`, y=`Exposure $M`, fill=`Security Type`)) +
  geom_bar(stat="identity") + scale_x_date(labels = scales::date_format("%d-%b"), date_breaks = "3 day")
I expected each bar to appear only above the day stored in the dataframe, not as shown in the chart: i.e. $29.2M above 30th May 2019 only, not spread from 23rd May to 8th June, and likewise for the second data point. Can anyone advise how I may correct this in my code?
Thanks in advance for any help; I've looked all over for a solution.
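One possible fix, sketched under the assumption that the cause is ggplot2's default bar width: when no width is given, bars are drawn at 90% of the smallest gap between x values, which here is 20 days. That would also explain why the same code looked fine on a dataframe with closely spaced dates. On a date axis the width is measured in days, so width = 1 gives one-day bars. The names p and bars are illustrative only.

```r
library(ggplot2)

# Same data as in the question; check.names = FALSE keeps the spaced column names.
data <- data.frame(
  `Expiration Date` = as.Date(c("2019-05-30", "2019-06-19")),
  `Security Type` = c("FX FORWARD", "FX FORWARD"),
  `Exposure $M` = c(29.2, 74.7),
  check.names = FALSE
)

# width = 1 means one day on a Date axis, instead of 90% of the 20-day gap.
p <- ggplot(data, aes(x = `Expiration Date`, y = `Exposure $M`, fill = `Security Type`)) +
  geom_col(width = 1) +
  scale_x_date(date_labels = "%d-%b", date_breaks = "3 days")

# Inspect the width of the drawn bars, in days.
bars <- ggplot_build(p)$data[[1]]
print(bars$xmax - bars$xmin)
```

geom_col() is equivalent to geom_bar(stat = "identity"); if one-day bars look too thin, a larger width such as 2 or 3 is a matter of taste.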

Related

How to plot only vertical lines of different colours over the date axis with geom_vline in Shiny?

I am a complete newbie to ggplot2. I have been trying to understand how to use aesthetics and geom_vline to do the following with no success. I presume there must be a simple and elegant solution to this in one line, but I can't get it.
So, I have a data frame with columns (dates, category, colour).
For example:
dates = c("1/Jan/2022 06:00", "1/Jan/2022 18:00", "2/Jan/2022 06:00", "2/Jan/2017 18:00")
category = c(1, 5, 6, 3)
colour = c("black", "red", "blue", "red")
data = data.frame(dates, category, colour)
The Y-axis is categorical, but could be continuous.
I would like to plot vertical lines for each date on X-axis with specified colour and on Y-axis every line should go from category - 0.5 up to category + 0.5.
On X-axis I would like to observe only months on ticks, and on Y-axis all the categories.
The data frame has thousands of dates. Is there a risk of overlapping lines? Is there a way of controlling the line thickness using the pixel size of this plot in the Shiny app and the total number of lines?
Any help is appreciated!
I don't know what your dates are, so I provide my own dates like this:
dates = seq.Date(as.Date("2022-01-01"), as.Date("2022-02-01"), length.out = 4)
data$dates = dates  # replace the character column so the plot uses real dates
Now you can simply use geom_linerange. I've invented a set of shiny-like inputs, just to suggest how you might use user inputs from a Shiny app to set the point size and line width:
input = list(linewidth=1, pointsize=2)
ggplot(data, aes(dates,category, color=colour)) +
  geom_point(size=input$pointsize) +
  # in ggplot2 >= 3.4.0, use linewidth= instead of size= for lines
  geom_linerange(aes(ymin=category-0.5, ymax=category+0.5),size=input$linewidth) +
  scale_color_identity()  # draw each line in the colour named in the column

Setting column width in charts in R

I'm very new to R. I want to plot graphs by months with ggplot2, but the last dates of the year variable are intertwined on the x-axis. I have attached the image below. Any ideas on how I can adjust the width on the x-axis? Can I also print each year in the date variable? My dates are between 2010-2020.
Updated version. Op seems to be asking for this. The time variable shows the full date ("year-month-day"). Modifying the x-axis using scale_x_date for showing only calendar years:
library(ggplot2)
library(data.table)
# example dataset
dt <- data.table(date=as.Date(seq(1,1000,100),origin = "2010-01-01"),var=rnorm(10))
head(dt)
# display only the YEAR
ggplot(dt,aes(y=var,x=date))+geom_point()+
scale_x_date(date_breaks = "1 year", date_labels = "%Y")
# display 6 months intervals
ggplot(dt,aes(y=var,x=date))+geom_point()+
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y")
Older version: the time variable shows only years.
For showing each single year of the data here are two options.
As for increasing the width, I guess you mean when saving the plot to a file.
Clarification: if you use RStudio, as it seems from the screenshot, you can change the temporary display of the plot in many ways using the GUI.
Clarification #2: check ?facet_wrap to see how you can display the facets in multiple rows and columns, that could also help the specific visualization of your plot.
library(ggplot2)
library(data.table)
# create example dataset (no values for 2015)
dt <- data.table(var=rnorm(40),year=sample(c(seq(2010,2014,1),seq(2016,2020,1)),40,replace = TRUE))
# clearly plot each specific year by considering it as factor (2015 not shown)
ggplot(dt,aes(y=var,x=as.factor(year)))+geom_point()+
xlab("Year") # nicer x-axis naming
# clearly plot each specific year by modifying breaks (shows also empty years if present)
ggplot(dt,aes(y=var,x=year))+geom_point()+
scale_x_continuous(breaks = seq(min(dt[,year]),max(dt[,year]),1))
# save the file with exaggerated width (just an example)
ggsave("myfilepath/myfilename.jpg",width=20,height=4,units = "cm")

choosing specific values on the X axis when using ggplot2

I am trying to plot a graph showing the number of events at the Olympics as a function of the year that a specific Olympic took place.
My data frame is called supertable and it consists of 2 columns, the first is the year and the second is the number of events in the games held that year.
My problem is that on the x axis I only get the years 1920 and 1980, and I would like to have 1920, 1950, 1980 and 2010.
This is my code:
ggplot(data = supertable,aes(x=year,y=no.of.events))+geom_point(colour='red')+
scale_x_discrete(breaks=c(1920,1950,1980,2010))
This is the picture I get
I tried doing this:
scale_x_discrete(breaks=c(1920,1950,1980,2010),limits=c(1920,1950,1980,2010))
but it didn't help.
I assume it is something small that I am missing. I tried searching for the answer but didn't find it.
Your x-axis is a continuous variable, so you need to use scale_x_continuous.
You used breaks correctly to indicate where your ticks on the x axis are, but the limits value should be a c(min, max) of the range of the plot you want to show.
Try this: scale_x_continuous(breaks=c(1920,1950,1980,2010), limits = c(1920, 2019))
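Put together as a complete plot, the suggestion might look like the sketch below. The supertable here is a hypothetical stand-in with made-up values, since the original data frame is not shown; only the column names year and no.of.events come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's supertable (values are made up).
supertable <- data.frame(
  year = seq(1920, 2010, by = 10),
  no.of.events = seq(100, 280, by = 20)
)

p <- ggplot(data = supertable, aes(x = year, y = no.of.events)) +
  geom_point(colour = "red") +
  # year is numeric, so a continuous scale is used;
  # breaks set the tick positions, limits set the c(min, max) range shown
  scale_x_continuous(breaks = c(1920, 1950, 1980, 2010), limits = c(1920, 2019))

print(p)
```

Note that points outside the limits would be dropped from the plot, so the range should cover all the years in the data.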

Revisiting R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot

Please don't tag this as a duplicate of R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot: some people commented that the example there was too long/convoluted/weird, so here is a simpler example that reproduces the problem. If a moderator thinks it is a good idea, I will delete the original (longer) question.
I am trying to create a function that does a stacked bar plot of some yearly measures. The function takes as parameters the data and the min and max year I want to plot. The problem is that for some combination of the years the bars get weird.
Here is the code, it defines the function, creates a simple simulated dataset and creates four plots with different parameters. The resulting images are below.
library(ggplot2)
library(plyr)
# Plot a stacked bar chart of yearly quantities for the years minYear to maxYear.
doPlot <- function(data,minYear,maxYear) {
  title = paste("Bob's Performance ",minYear,"-",maxYear)
  # Aggregate quantity by year and category
  byYear <- aggregate(Quantity ~ Year+Category, data, sum)
  # Get coordinates for numbers in stacked bars
  byYear = ddply(byYear, "Year", mutate, label_y = cumsum(Quantity))
  g <- ggplot(byYear, aes(x=Year,y=Quantity))
  g <- g + geom_bar(stat="identity",aes(fill=Category), colour="black") +
    ggtitle(title) +
    scale_fill_discrete("Category",labels=c("Sheep","Cactus","Chicken"),drop=FALSE,c=45, l=80)+
    scale_x_continuous(name="Year", limits=c(minYear,maxYear), breaks=seq(minYear,maxYear,1)) +
    geom_text(aes(label=Quantity,y=label_y), vjust=1.3,size=6)
  print(g)
}
consts = paste('"Category","Year","Name","Quantity"\n',
'CACTUS,1997,Bob,45\n',
'CHICKEN,1997,Bob,6\n',
'SHEEP,1998,Bob,2\n',
'SHEEP,1999,Bob,4\n',
'SHEEP,2005,Bob,5\n',sep = "")
data <- read.csv(text=consts,header = TRUE)
data$Category <- factor(data$Category, levels = c("SHEEP", "CACTUS", "CHICKEN"))
# This works OK
doPlot(data,1996,2006)
# This doesn't: the bars on the left and right sides disappear
doPlot(data,1997,2005)
# This doesn't: the left bar disappears, but it seems it was not plotted.
doPlot(data,1998,2000)
# This is weird: why does the bar width span over 5 years?
doPlot(data,1999,2011)
The first plot is OK since the data is all inside the years range:
In the second plot the years range is exactly the same as the range of years in the data. The leftmost and rightmost bars are not plotted, but the numbers are.
In the third plot the year range is very narrow -- again leftmost and/or rightmost bars are not plotted. There's a hint here that the bar width could not be fitted in the plot -- see the width for 1999!
In the fourth plot the year range is wider, but again the leftmost and/or rightmost bars are not plotted, and the one bar that is plotted covers several years.
I can make the plot sort of work by using always an extended range for years, but this is bugging me. I guess I didn't specify something that controls the bar widths, but what?
I noticed that there are similar problems with the leftmost and rightmost bars, e.g. In ggplot2 - how to ensure geom_errorbar displays bar limits for all points when controlling x-axis with xlim() , and the solutions are similar, but I believe there ought to be a better way.
I must point out that using
scale_x_continuous(name="Year", breaks=seq(minYear,maxYear,1)) +
coord_cartesian(xlim=c(minYear,maxYear)) +
instead of
scale_x_continuous(name="Year", limits=c(minYear,maxYear),breaks=seq(minYear,maxYear,1)) +
solves the "bar over several years" issue of the fourth plot, but causes parts of the leftmost/rightmost bars to be plotted:
thanks
Rafael
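A sketch of one way around both symptoms, under the following assumptions: limits= on a scale discards anything that falls outside the range (a bar centred on minYear reaches down to minYear - 0.45, so it is dropped), and the default bar width is then computed from the years that survive (only 1999 and 2005 in the fourth plot, hence a bar about five years wide). Zooming with coord_cartesian, padded by half a year on each side, leaves the data untouched and keeps the edge bars whole. The function name doPlotZoom is hypothetical, and the labels and fill scale are left out for brevity.

```r
library(ggplot2)

# Same data as in the question, written out directly.
data <- data.frame(
  Category = factor(c("CACTUS", "CHICKEN", "SHEEP", "SHEEP", "SHEEP"),
                    levels = c("SHEEP", "CACTUS", "CHICKEN")),
  Year = c(1997, 1997, 1998, 1999, 2005),
  Quantity = c(45, 6, 2, 4, 5)
)

# Hypothetical variant of doPlot: zoom with the coord instead of scale limits.
doPlotZoom <- function(data, minYear, maxYear) {
  ggplot(data, aes(x = Year, y = Quantity)) +
    # explicit width so bars stay one year wide whatever range is shown
    geom_col(aes(fill = Category), colour = "black", width = 0.9) +
    scale_x_continuous(name = "Year", breaks = seq(minYear, maxYear, 1)) +
    # pad by 0.5 so the first and last bars are not cut in half
    coord_cartesian(xlim = c(minYear - 0.5, maxYear + 0.5)) +
    ggtitle(paste("Bob's Performance", minYear, "-", maxYear))
}

g <- doPlotZoom(data, 1997, 2005)
built <- ggplot_build(g)$data[[1]]
print(nrow(built))  # number of bar segments kept in the plot data
```

The same call with other ranges, for example doPlotZoom(data, 1998, 2000), only changes the visible window; bars outside it are clipped rather than removed from the data.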

How to plot two y axis? or combine(merge) two plots? Should handle faceted column as well

I've a combination of two difficult(I'm naive) requirements :(
Consider the Weather data as example. Let's say I've dataset with following information.
"Datetime", "Word", "Frequency", "Temperature"
Visualization: I want to see change in frequency of a word over time and at temperature.
X-axis shows the time series(date)
Y-axis has the frequency scale(0 to max freq).
Requirements:
I need to draw frequencies of several words(Column "word") over the time.
Correlate the frequency with temperature.
I started with ggplot2:
ggplot(TemperatureData, aes(x=timeId, y=termFrequency)) + geom_line() + facet_wrap(~Keyword) +
geom_line(data = TemperatureData, aes(y = temperature)) +
labs(x="Time Series over X days", y = "Term Frequency")
The above approach results in overlapping y axes (frequency, temperature) and a separate bin for each "Word" (a facet in ggplot), i.e. the plot has 3 bins for each keyword. Each bin shows temperature over time and the frequency of a word over time.
Problems:
I want to be able to separate the y-axes for temperature and frequency. Also, I do not want to normalize these y-axes, as it gets tough to understand what the high/low values of each axis are over the days, and the plot loses readability. I learnt that two y-axes are not possible using ggplot2.
A separate bin for each keyword is not required. One horizontal line per keyword is what I'm looking for.
The plot should have only one appearance (line graph) of temperature to reflect change over time.
I tried using par(), but could not succeed.
Example solution using plotrix package
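As an alternative sketch within ggplot2 itself: more recent ggplot2 versions provide sec_axis(), which draws a second y axis as a fixed transformation of the first. The temperature line still has to be rescaled onto the frequency axis internally, but the right-hand axis is labelled in the original temperature units, so neither series needs to be read as normalized. The column names follow the question; the data values, the keywords and the scaling factor k are made up for illustration.

```r
library(ggplot2)

# Made-up data using the column names from the question.
days <- seq(as.Date("2020-01-01"), by = "day", length.out = 6)
words <- data.frame(
  timeId = rep(days, times = 2),
  Keyword = rep(c("rain", "snow"), each = 6),
  termFrequency = c(12, 15, 9, 20, 18, 14, 3, 5, 8, 2, 6, 7)
)
temps <- data.frame(timeId = days, temperature = c(4.5, 6.0, 3.2, 7.1, 5.5, 2.8))

# Illustrative scaling factor mapping temperature onto the frequency axis.
k <- max(words$termFrequency) / max(temps$temperature)

p <- ggplot(words, aes(x = timeId)) +
  # one line per keyword in a single panel, instead of one facet per keyword
  geom_line(aes(y = termFrequency, colour = Keyword)) +
  # a single temperature line, rescaled to share the panel
  geom_line(data = temps, aes(y = temperature * k), linetype = "dashed") +
  # the secondary axis undoes the scaling, so it reads in temperature units
  scale_y_continuous(name = "Term Frequency",
                     sec.axis = sec_axis(~ . / k, name = "Temperature")) +
  labs(x = "Time Series over X days")

print(p)
```

Mapping Keyword to colour rather than faceting gives one line per keyword and a single temperature line, which is what the requirements above describe.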
