I have two datasets, one for 2020 and one for 2019. Each is divided into 5 groups, and each month has its own value.
I want to create a graph that compares, for each group and each month, the 2020 figure with the 2019 figure.
The data for 2020 looked like this:
The data for 2019 had the same layout.
I combined the two datasets into this:
The problem is that all the graphs I looked at on the internet have either one column of values or no division into months.
How can I create one graph that compares each month between 2019 and 2020?
library(tidyverse)
library(ggplot2)
# bring table into long format
longerTable <- tibble(month = 1:12, value_2020 = rnorm(12), value_2019 = rnorm(12)) %>%
  pivot_longer(cols = starts_with("value"), names_to = "year", values_to = "value")
# plot with ggplot
ggplot(longerTable, aes(x = month, y = value, fill = year)) +
  # stat = identity -> plot numbers as they are
  # position = dodge -> show bars next to each other
  geom_bar(stat = "identity", position = "dodge")
Created on 2020-10-01 by the reprex package (v0.3.0)
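The example above uses a single series per year. Since the question also mentions 5 groups, here is a minimal sketch of one way to keep the groups visible, assuming each yearly table has one row per group and month. The names data_2019, data_2020, group and value are hypothetical stand-ins for the real columns, and the numbers are random.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-ins for the two yearly tables: one row per group and month.
set.seed(1)
make_year <- function() {
  expand.grid(group = paste("Group", 1:5), month = 1:12) %>%
    mutate(value = runif(n(), 10, 100))
}
data_2019 <- make_year()
data_2020 <- make_year()

# Stack the two tables and record which year each row came from.
combined <- bind_rows(`2019` = data_2019, `2020` = data_2020, .id = "year")

# One panel per group, months on the x-axis, the two years side by side.
p <- ggplot(combined, aes(x = factor(month), y = value, fill = year)) +
  geom_col(position = "dodge") +
  facet_wrap(~ group) +
  labs(x = "Month", y = "Value", fill = "Year")
p
```

Faceting by group keeps each panel down to 24 bars, which is usually easier to read than putting all 120 bars on one axis.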
I have created a trend chart of complaints at monthly granularity from two variables (Customer.Complaint and Date_by_month), but the y-axis shows the text of the complaints. Instead, I want it to show the total number of complaints in each month.
This is the graph I am getting:
As you can see, the y-axis values are unreadable. I want to show the count of complaints instead.
Here is my code snippet
library(ggplot2)
library(scales)
### Provide the trend chart for the number of complaints at monthly and daily granularity levels.
# converting date var to date type
comcast$Date <- gsub('-', '/', comcast$Date)
comcast$Date <- as.Date(comcast$Date, '%d/%m/%Y')
# plotting graph for monthly granularity levels
comcast$Date_by_month <- as.Date(cut(comcast$Date, breaks='month'))
ggplot(comcast, aes(Date_by_month, Customer.Complaint)) + stat_summary(fun.y=sum, geom='bar') + scale_x_date(labels=date_format("%Y-%m"), breaks='1 month') + scale_y_continuous(labels = fun.y= length)
There are a few issues here. The code you posted has a syntax error in the last line, so it doesn't run at all: scale_y_continuous(labels = fun.y= length). Whatever code produced your plot, it wasn't the code you posted.
In the line stat_summary(fun.y=sum, geom='bar') you are asking for the sum of a text variable, which doesn't make any sense (maybe you meant count or length?).
And, of course, your problem isn't reproducible because you haven't given us any data to try it out on.
That said, let's recreate a similar data frame:
library(ggplot2)
library(lubridate)
library(scales)
random_words <- function(x) paste0(sample(c(" ", " ", letters), 50, TRUE), collapse = "")
Date_by_month <- as.Date(as.POSIXct("2015-01-01") + months(sample(12, 100, TRUE)))
complaints <- sapply(1:50, random_words)
comcast <- data.frame(Date_by_month = as_date(Date_by_month), Customer.Complaint = complaints)
head(comcast)
#> Date_by_month Customer.Complaint
#> 1 2015-09-30 impzvcx esxfmknrpufewh fxqknamay qhob cvpzlgubpu
#> 2 2015-08-31 mwt aezkcolutpengovtggeqavkxnfr myrq famttzzurj ug
#> 3 2015-04-30 uusewv wjxdpywsssqxgclhmlksrxqnqdfsip u jrdsfbldey
#> 4 2015-08-31 sf jytjtwseahfaqtvzisozuhhtrzygysxndyjifxoaytxhncf
#> 5 2015-06-30 vabtbfijnkeflhgpsspxyasiistuqqqjxuqs bsucp lbdrgbn
#> 6 2015-03-01 eylmurltlfgcp rvfdx as hiehnqdrn lrqanrmf quvzbhgh
Now, we don't actually need to include the complaints themselves in the chart. We can just give the dates to ggplot and it will automatically perform counts if we select geom_bar:
ggplot(comcast, aes(Date_by_month)) +
  geom_bar() +
  scale_x_date(labels = date_format("%Y-%m"), breaks = '1 month')
Created on 2020-02-28 by the reprex package (v0.3.0)
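If you would rather see the monthly totals as numbers before plotting them, the counting step can also be done by hand. This is only a sketch on made-up data: complaints_df and monthly are hypothetical names, and the dates are random.

```r
library(ggplot2)

# Hypothetical data: one row per complaint, tagged with the first day of its month.
set.seed(1)
complaints_df <- data.frame(
  Date_by_month = as.Date(sprintf("2015-%02d-01", sample(12, 100, TRUE)))
)

# Count rows per month explicitly; table() gives one count per distinct date.
monthly <- as.data.frame(table(Date_by_month = complaints_df$Date_by_month),
                         stringsAsFactors = FALSE)
monthly$Date_by_month <- as.Date(monthly$Date_by_month)

# geom_col() plots the precomputed counts as they are.
p <- ggplot(monthly, aes(Date_by_month, Freq)) +
  geom_col() +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month") +
  labs(x = "Month", y = "Number of complaints")
p
```

The picture should match what geom_bar() draws from the raw rows; the difference is that monthly can be printed or exported as a table.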
I have a dataset like the one below and would like to create a graph like the one shown. I have tried many ways to represent the data well. Is there any way I could represent the following dataset as in the given graph?
d <- read.table(text="
MONTH ACTUAL PREDICTED
1 January -18.0521472 -0.05621367
2 February 5.7084035 2.06652079
3 March 1.5226629 -2.13900349
4 April -6.2783397 -1.4275986
", stringsAsFactors=FALSE)
d$MONTH <- factor(d$MONTH, levels=unique(d$MONTH))
Graph:
[Figure: X axis is the Month and Y axis is Values of Actual and Predicted]
Here the x-axis is the month and the y-axis is the values of Actual and Predicted. I would like to show the labels as well. Thanks in advance.
library(tidyverse)
d %>%
  gather(var, value, -MONTH) %>%
  ggplot(aes(MONTH, value, col = var, group = var)) +
  geom_line(linetype = "dashed") +
  geom_point()
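The question also asks for labels, which the plot above does not draw. A possible sketch, assuming the labels wanted are the plotted values themselves: it uses pivot_longer() (the current replacement for gather()) and adds geom_text(). The rounding to two decimals and the vjust offset are arbitrary choices.

```r
library(tidyr)
library(ggplot2)

# Same four months as in the question, typed in directly.
d <- data.frame(
  MONTH = c("January", "February", "March", "April"),
  ACTUAL = c(-18.0521472, 5.7084035, 1.5226629, -6.2783397),
  PREDICTED = c(-0.05621367, 2.06652079, -2.13900349, -1.4275986)
)
d$MONTH <- factor(d$MONTH, levels = unique(d$MONTH))

# One row per month and series.
long <- pivot_longer(d, cols = c(ACTUAL, PREDICTED),
                     names_to = "var", values_to = "value")

p <- ggplot(long, aes(MONTH, value, colour = var, group = var)) +
  geom_line(linetype = "dashed") +
  geom_point() +
  # Print each value, rounded to two decimals, just above its point.
  geom_text(aes(label = round(value, 2)), vjust = -0.8, show.legend = FALSE)
p
```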
I have a data frame with two columns, date and sales. The date column runs from 2012-10-22 to 2016-09-22. I want to plot sales in January 2013 by day without creating a subset.
I have used this:
ggplot(subsales,aes(Date,spdby))+geom_line()
Is it possible using ggplot()?
I have plotted the sales per day, and the plot looks like this:
I want to zoom in on January 2013 and extract that part as a new plot.
Yes, this is possible in ggplot:
library(ggplot2)
subsales <- data.frame(
  date = seq(as.Date("2013-1-1"), as.Date("2017-1-1"), by = "day"),
  spdby = runif(1462, 2000, 6000)
)
# Full series
ggplot(subsales, aes(date, spdby)) +
  geom_line()
# Same plot restricted to January 2013; rows outside the limits are dropped, hence the warning
ggplot(subsales, aes(date, spdby)) + geom_line() +
  scale_x_date(limits = c(as.Date("2013-1-1"), as.Date("2013-1-31")))
#> Warning: Removed 1431 rows containing missing values (geom_path).
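The warning appears because scale limits discard the rows that fall outside them before drawing. If you only want to zoom the view and keep all the data, coord_cartesian() is an alternative. A minimal sketch on the same kind of fake data, assuming a ggplot2 version that accepts Date values in xlim:

```r
library(ggplot2)

set.seed(1)
subsales <- data.frame(
  date = seq(as.Date("2013-01-01"), as.Date("2017-01-01"), by = "day")
)
subsales$spdby <- runif(nrow(subsales), 2000, 6000)

# coord_cartesian() changes only the visible window; no rows are removed.
p <- ggplot(subsales, aes(date, spdby)) +
  geom_line() +
  coord_cartesian(xlim = as.Date(c("2013-01-01", "2013-01-31")))
p
```

One visible difference: the line is not cut at the edges of the window, because the neighbouring points are still part of the data.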
So, I've hit something I don't think I have ever come across. I scoured Google looking for the answer, but have not found anything (yet)...
I have two data sets - one for 2015 and one for 2016. They represent the availability of an IT system. The data frames read as such:
2015 Data Set:
variable value
Jan 2015 100
Feb 2015 99.95
... ...
2016 Data Set:
variable value
Jan 2016 99.99
Feb 2016 99.90
... ...
They just go from Jan to Dec, listing the availability of the system. The "variable" column is an as.yearmon data type and the value is a simple numeric.
I want to create a geom_line() chart with ggplot2 that will basically have the percentages as the y-axis and the months as the x-axis. I have been able to do this where there are two lines, but the x-axis runs from Jan 2015 - Dec 2016. What I'd like is to have them only be plotted by month, so they overlap. I have tried some various things with the scales and so forth, but I have yet to figure out how to do this.
Basically, I need the x-axis to read January - December in chronological order, but I want to plot both 2015 and 2016 on the same chart. Here is my ggplot code (non-working) as I have it now:
ggplot(data2015,aes(variable,value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
scale_x_yearmon() +
theme_classic()
This plots in a continuous stream as I am dealing with a yearmon() data type. I have tried something like this:
ggplot(data2015,aes(months(variable),value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
theme_classic()
Obviously that won't work. I figure the months() is probably still carrying the year somehow. If I plot them as factors() they are not in order. Any help would be very much appreciated. Thank you in advance!
To get a separate line for each year, you need to extract the year from each date and map it to colour. To get months (without year) on the x-axis, you need to extract the month from each date and map to the x-axis.
library(zoo)
library(lubridate)
library(ggplot2)
Let's create some fake data with the dates in as.yearmon format. I'll create two separate data frames so as to match what you describe in your question:
# Fake data
set.seed(49)
dat1 = data.frame(date = seq(as.Date("2015-01-15"), as.Date("2015-12-15"), "1 month"),
                  value = cumsum(rnorm(12)))
dat1$date = as.yearmon(dat1$date)
dat2 = data.frame(date = seq(as.Date("2016-01-15"), as.Date("2016-12-15"), "1 month"),
                  value = cumsum(rnorm(12)))
dat2$date = as.yearmon(dat2$date)
Now for the plot. We'll extract the year and month from date with the year and month functions, respectively, from the lubridate package. We'll also turn the year into a factor, so that ggplot will use a categorical color palette for year, rather than a continuous color gradient:
ggplot(rbind(dat1, dat2), aes(month(date, label=TRUE, abbr=TRUE),
                              value, group=factor(year(date)), colour=factor(year(date)))) +
  geom_line() +
  geom_point() +
  labs(x="Month", colour="Year") +
  theme_classic()
month value year
Jan   100   2015
Feb   99.95 2015
Jan   99.99 2016
Feb   99.90 2016
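You need one long-form dataset that has a year column, like the one above. Then you can plot both lines with ggplot, mapping year to colour as a factor (so the palette is discrete) and to group (so each year gets its own line):
ggplot(dataset, aes(x = month, y = value, color = factor(year), group = year)) + geom_line()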
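For completeness, a small sketch of what such a long-form dataset could look like in R, using the four values listed in the question. The name dataset follows the snippet above. Storing month as a factor with month.abb as its levels is my assumption, to keep the axis in calendar order.

```r
library(ggplot2)

# Long-form data: one row per month and year (values taken from the question).
dataset <- data.frame(
  month = factor(c("Jan", "Feb", "Jan", "Feb"), levels = month.abb),
  value = c(100, 99.95, 99.99, 99.90),
  year  = c(2015, 2015, 2016, 2016)
)

# factor(year) gives a discrete palette; group = year draws one line per year.
p <- ggplot(dataset, aes(x = month, y = value,
                         colour = factor(year), group = year)) +
  geom_line() +
  geom_point() +
  labs(x = "Month", colour = "Year")
p
```

Without the group aesthetic, ggplot cannot tell which points to connect when the x-axis is a factor.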
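ggseasonplot from the forecast package can do that for you. Example code with a ts object (a10 is an example series from the fpp2 package):
library(forecast)
library(fpp2)  # provides the a10 series
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
  ylab("$ million") +
  ggtitle("Seasonal plot: antidiabetic drug sales")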
head(bktst.plotdata)
date method product type actuals forecast residual Percent_error month
1 2012-12-31 bauwd CUSTM NET 194727.51 -8192.00 -202919.51 -104.21 Dec12
2 2013-01-31 bauwd CUSTM NET 470416.27 1272.01 -469144.26 -99.73 Jan13
3 2013-02-28 bauwd CUSTM NET 190943.57 -1892.45 -192836.02 -100.99 Feb13
4 2013-03-31 bauwd CUSTM NET -42908.91 2560.05 45468.96 -105.97 Mar13
5 2013-04-30 bauwd CUSTM NET -102401.68 358807.48 461209.16 -450.39 Apr13
6 2013-05-31 bauwd CUSTM NET -134869.73 337325.33 472195.06 -350.11 May13
I have been trying to plot my backtest results using ggplot2. A sample of the dataset is given above. The dates range from Dec 2012 to Jul 2013, and there are 3 levels in 'method', 5 levels in 'product' and 2 levels in 'type'.
I tried the code below. The trouble is that R does not label the x-axis correctly: I get 'Jan, Feb, Mar, Apr, May, Jun, Jul, Aug' on the x-axis, whereas I expect the plot to run from Dec to Jul.
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= date, y=Percent_error, colour=method))
facet4 <- facet_grid(product~type,scales="free_y")
title3 <- ggtitle("Percent Error - Month-over-Month")
xaxis2 <- xlab("Date")
yaxis3 <- ylab("Error (%)")
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
# Tried changing the code to this still not getting the X-axis right
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= format(date,'%b%y'), y=Percent_error, colour=method))
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
Well, it looks like you are plotting the last day of each month, so it actually makes sense to me that December 31 is plotted very very close to January. If you look at the plotted points (with geom_point) you can see that each point is just to the left of the closest month axis.
It sounds like you want to plot years and months instead of actual dates. There are a variety of ways you might do this, but one thing you could do is change the day part of the date to the first of the month instead of the last of the month. Here I show how you could do this using some functions from the lubridate package along with paste (I have assumed your variable date is already a Date object).
require(lubridate)
bktst.plotdata$date2 = as.Date(with(bktst.plotdata,
                                    paste(year(date), month(date), "01", sep = "-")))
Then the plot axes start at December. You can change the format of the x axis if you load the scales package.
require(scales)
ggplot(data=bktst.plotdata, aes(x = date2, y=Percent_error, colour=method)) +
  facet_grid(product~type, scales="free_y") +
  ggtitle("Percent Error - Month-over-Month") +
  xlab("Date") + ylab("Error (%)") +
  geom_line() +
  scale_x_date(labels=date_format(format = "%m-%Y"))
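As a side note, lubridate also has floor_date(), which rounds a date down to the start of its month in one call and could stand in for the paste() construction. A small sketch on three of the month-end dates from the question (month_ends and month_starts are hypothetical names):

```r
library(lubridate)

# Three of the month-end dates from the sample data.
month_ends <- as.Date(c("2012-12-31", "2013-01-31", "2013-02-28"))

# Round each date down to the first day of its month.
month_starts <- floor_date(month_ends, unit = "month")
month_starts
#> [1] "2012-12-01" "2013-01-01" "2013-02-01"
```

In the data frame this would read bktst.plotdata$date2 = floor_date(bktst.plotdata$date, "month"), again assuming date is already a Date object.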