I have created a trend chart at monthly granularity from two variables (Customer.Complaint and Date_by_month), but the y-axis shows the text of the complaints. Instead, I want it to show the total number of complaints in each month.
This is the graph I am getting
As you can see, the values on the y-axis are quite annoying. Instead, I want the y-axis to show the count of complaints.
Here is my code snippet
library(ggplot2)
library(scales)
### Provide the trend chart for the number of complaints at monthly and daily granularity levels.
# converting date var to date type
comcast$Date <- gsub('-', '/', comcast$Date)
comcast$Date <- as.Date(comcast$Date, '%d/%m/%Y')
# plotting graph for monthly granularity levels
comcast$Date_by_month <- as.Date(cut(comcast$Date, breaks='month'))
ggplot(comcast, aes(Date_by_month, Customer.Complaint)) + stat_summary(fun.y=sum, geom='bar') + scale_x_date(labels=date_format("%Y-%m"), breaks='1 month') + scale_y_continuous(labels = fun.y= length)
There are a few issues here. The code you posted has a syntax error in its last line, scale_y_continuous(labels = fun.y= length), so it doesn't run at all. Whatever code produced your plot, it wasn't the code you posted.
In the line stat_summary(fun.y=sum, geom='bar') you are asking for the sum of a text variable, which doesn't make any sense (maybe you meant count or length?).
And, of course, your problem isn't reproducible because you haven't given us any data to try it out on.
That said, let's recreate a similar data frame:
library(ggplot2)
library(lubridate)
library(scales)
random_words <- function(x) paste0(sample(c(" ", " ", letters), 50, TRUE), collapse = "")
Date_by_month <- as.Date(as.POSIXct("2015-01-01") + months(sample(12, 100, TRUE)))
complaints <- sapply(1:50, random_words)
comcast <- data.frame(Date_by_month = as_date(Date_by_month), Customer.Complaint = complaints)
head(comcast)
#> Date_by_month Customer.Complaint
#> 1 2015-09-30 impzvcx esxfmknrpufewh fxqknamay qhob cvpzlgubpu
#> 2 2015-08-31 mwt aezkcolutpengovtggeqavkxnfr myrq famttzzurj ug
#> 3 2015-04-30 uusewv wjxdpywsssqxgclhmlksrxqnqdfsip u jrdsfbldey
#> 4 2015-08-31 sf jytjtwseahfaqtvzisozuhhtrzygysxndyjifxoaytxhncf
#> 5 2015-06-30 vabtbfijnkeflhgpsspxyasiistuqqqjxuqs bsucp lbdrgbn
#> 6 2015-03-01 eylmurltlfgcp rvfdx as hiehnqdrn lrqanrmf quvzbhgh
Now, we don't actually need to include the complaints themselves in the chart. We can just give the dates to ggplot and it will automatically perform counts if we select geom_bar:
ggplot(comcast, aes(Date_by_month)) +
geom_bar() +
scale_x_date(labels = date_format("%Y-%m"), breaks='1 month')
Created on 2020-02-28 by the reprex package (v0.3.0)
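If you would rather compute the monthly totals yourself (for example to inspect them before plotting), the same count can be sketched in base R. The tiny data frame below is a made-up stand-in for comcast, and the column name n_complaints is an assumption chosen for illustration:

```r
# Hypothetical stand-in for the comcast data: one row per complaint,
# with dates already truncated to the first day of the month.
comcast <- data.frame(
  Date_by_month = as.Date(c("2015-01-01", "2015-01-01", "2015-02-01",
                            "2015-03-01", "2015-03-01", "2015-03-01")),
  Customer.Complaint = c("slow", "billing", "outage",
                         "slow", "billing", "outage")
)

# Count the rows in each month. length() counts the complaints,
# whereas sum() cannot be applied to a text column.
monthly_counts <- aggregate(Customer.Complaint ~ Date_by_month,
                            data = comcast, FUN = length)
names(monthly_counts)[2] <- "n_complaints"

print(monthly_counts)
```

The resulting data frame has one row per month, so it could be plotted with geom_col() by mapping Date_by_month to x and n_complaints to y.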
When I convert a numeric variable (number of days since 1 January 2015) to a date, the boxplot shows only part of the range of y-values.
In this example, I plotted "gender" vs "months". Months were obtained by transforming the original "days" variable (i.e. days starting from 2015/1/1). The range of numeric values should extend from the end of March to the beginning of April of the subsequent year, but ggplot() is only plotting values between Aug and Jan and showing only month labels within that range in the y-axis.
Any help to solve this issue is very welcome!
Here is the code and the corresponding plot:
gender <- c(rep("female",144), rep("male",144))
days <- c(274,285,302,330,117,230,271,207,235,249,268,NA,NA,NA,NA,210,255,290,267,252,257,268,288,220,264,270,277,303,222,252,296,323,369,NA,258,NA,240,245,310,271,272,282,314,345,214,211,258,268,145,176,244,273,249,257,277,284,272,273,272,282,290,297,260,266,277,213,247,244,269,349,268,NA,220,235,269,299,266,273,274,307,285,299,300,224,257,284,291,305,278,294,455,280,262,272,276,295,338,264,339,232,277,230,270,312,276,285,308,241,273,340,249,260,270,352,297,217,247,287,320,191,249,265,287,320,432,262,265,324,309,234,441,409,264,381,262,276,316,330,252,264,298,315,287,330,274,287,371,237,259,266,349,247,249,241,333,379,486,198,249,270,275,279,314,182,234,252,289,319,216,262,293,234,272,284,311,258,NA,299,314,290,292,296,300,274,289,359,267,319,NA,492,294,319,293,265,273,315,307,315,287,378,238,239,315,325,361,249,NA,192,224,226,204,208,234,263,283,294,430,267,273,307,327,460,240,307,319,492,300,311,485,348,297,348,317,317,318,338,316,316,336,255,284,316,249,302,307,308,301,265,273,316,281,326,272,283,NA,NA,243,254,271,191,259,324,287,265,310,337,287,326,304,399,337,295,313,228,288,307,270,347,290,245,NA,283,423,223,NA,264,314,283)
mytable <- data.frame(gender,days)
range(mytable$days, na.rm=T) # 117 to 492
mytable$months <- (as.Date(days,origin = "2015/1/1"))
ggplot(mytable, aes(x=gender, y=months,fill=gender)) +
geom_boxplot()
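I am not sure about the intuition behind this plot, but setting the breaks and limits of the date scale explicitly should give you what you want. Values outside the limits are dropped before the boxplot is computed, so the limits need to cover the whole data range (117 to 492 days after 2015-01-01, i.e. 2015-04-28 to 2016-05-07):
ggplot(mytable, aes(x=gender, y=months, fill=gender)) +
geom_boxplot() +
# "%b %Y" keeps the labels unambiguous because the axis spans more than 12 months
scale_y_date(date_labels="%b %Y", date_breaks ="1 month",
limits = c(as.Date("2015-04-01"), as.Date("2016-06-01")))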
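Hard-coded limits can silently drop observations that fall outside them, so it may be safer to derive them from the data. A minimal sketch in base R, using only a handful of the day offsets from the question (including the two extremes); the names lower, upper and date_limits are assumptions for illustration:

```r
# A few of the day offsets from the question, including the extremes (117 and 492).
days <- c(274, 117, 492, NA, 330)
months <- as.Date(days, origin = "2015-01-01")

# First day of the month that contains the earliest date.
lower <- as.Date(format(min(months, na.rm = TRUE), "%Y-%m-01"))

# First day of the month after the one that contains the latest date.
last_month_start <- as.Date(format(max(months, na.rm = TRUE), "%Y-%m-01"))
upper <- seq(last_month_start, by = "month", length.out = 2)[2]

date_limits <- c(lower, upper)
print(date_limits)
```

date_limits could then be passed as the limits argument of scale_y_date(), so that every non-missing value stays inside the plotted range.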
So, I've hit something I don't think I have ever come across. I scoured Google looking for the answer, but have not found anything (yet)...
I have two data sets - one for 2015 and one for 2016. They represent the availability of an IT system. The data frames read as such:
2015 Data Set:
variable value
Jan 2015 100
Feb 2015 99.95
... ...
2016 Data Set:
variable value
Jan 2016 99.99
Feb 2016 99.90
... ...
They just go from Jan to Dec, listing the availability of the system. The "variable" column is of class yearmon (created with as.yearmon) and "value" is a plain numeric.
I want to create a geom_line() chart with ggplot2 that will basically have the percentages as the y-axis and the months as the x-axis. I have been able to do this where there are two lines, but the x-axis runs from Jan 2015 - Dec 2016. What I'd like is to have them only be plotted by month, so they overlap. I have tried some various things with the scales and so forth, but I have yet to figure out how to do this.
Basically, I need the x-axis to read January - December in chronological order, but I want to plot both 2015 and 2016 on the same chart. Here is my ggplot code (non-working) as I have it now:
ggplot(data2015,aes(variable,value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
scale_x_yearmon() +
theme_classic()
This plots in a continuous stream as I am dealing with a yearmon() data type. I have tried something like this:
ggplot(data2015,aes(months(variable),value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
theme_classic()
Obviously that won't work. I figure the months() is probably still carrying the year somehow. If I plot them as factors() they are not in order. Any help would be very much appreciated. Thank you in advance!
To get a separate line for each year, you need to extract the year from each date and map it to colour. To get months (without year) on the x-axis, you need to extract the month from each date and map to the x-axis.
library(zoo)
library(lubridate)
library(ggplot2)
Let's create some fake data with the dates in as.yearmon format. I'll create two separate data frames so as to match what you describe in your question:
# Fake data
set.seed(49)
dat1 = data.frame(date = seq(as.Date("2015-01-15"), as.Date("2015-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat1$date = as.yearmon(dat1$date)
dat2 = data.frame(date = seq(as.Date("2016-01-15"), as.Date("2016-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat2$date = as.yearmon(dat2$date)
Now for the plot. We'll extract the year and month from date with the year and month functions, respectively, from the lubridate package. We'll also turn the year into a factor, so that ggplot will use a categorical color palette for year, rather than a continuous color gradient:
ggplot(rbind(dat1,dat2), aes(month(date, label=TRUE, abbr=TRUE),
value, group=factor(year(date)), colour=factor(year(date)))) +
geom_line() +
geom_point() +
labs(x="Month", colour="Year") +
theme_classic()
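You need one long-form dataset that has a year column, for example:
month value year
Jan 100 2015
Feb 99.95 2015
Jan 99.99 2016
Feb 99.90 2016
Then you can plot both lines with ggplot. Keep month as a factor ordered Jan to Dec, and map year as a factor with a matching group so that each year gets its own line and a discrete colour:
ggplot(dataset, aes(x = month, y = value, color = factor(year), group = year)) + geom_line()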
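One way to build such a long-form dataset from the two yearly data frames is sketched below in base R. The data frames are stand-ins with made-up values (only the January and February figures come from the question), and plain Date columns are assumed; a yearmon column would need as.Date() first:

```r
# Hypothetical stand-ins for data2015 and data2016. Values after February
# are made up purely so that each year has twelve rows.
data2015 <- data.frame(
  variable = seq(as.Date("2015-01-01"), by = "month", length.out = 12),
  value = c(100, 99.95, rep(99.9, 10))
)
data2016 <- data.frame(
  variable = seq(as.Date("2016-01-01"), by = "month", length.out = 12),
  value = c(99.99, 99.90, rep(99.8, 10))
)

dataset <- rbind(data2015, data2016)

# Month as a factor whose levels run Jan..Dec, so the x-axis is
# chronological rather than alphabetical. month.abb is used instead of
# format(x, "%b") because it does not depend on the locale.
month_num <- as.integer(format(dataset$variable, "%m"))
dataset$month <- factor(month.abb[month_num], levels = month.abb)

# Year as a factor, so it maps to a discrete colour scale.
dataset$year <- factor(format(dataset$variable, "%Y"))

print(head(dataset, 3))
```

With month and year set up this way, both years share the same twelve x positions and overlap instead of running end to end.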
ggseasonplot() from the forecast package can do that for you. Example code with a ts object (a10, the monthly antidiabetic drug sales series that ships with the fpp2 package):
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
ylab("$ million") +
ggtitle("Seasonal plot: antidiabetic drug sales")
When I download data from Google Trends, the dataset looks like this:
Week nuclear atomic nuclear.weapons unemployment
2004-01-04 - 2004-01-10 11 11 1 15
2004-01-11 - 2004-01-17 11 13 1 13
2004-01-18 - 2004-01-24 10 11 1 13
How can I change the dates in "Week" from this format "Y-m-d - Y-m-d" to a format like "Year-Week"?
Furthermore, how can I tell ggplot to print only the years on the x-axis instead of every x value?
@Mattrition: Thank you. I followed your advice:
# melt() is assumed to come from the reshape2 package
library(reshape2)
trends <- melt(trends, id = "Week",
measure = c("nuclear", "atomic", "nuclear.weapons", "unemployment"))
trends$Week<- gsub("^(\\d+-\\d+-\\d+).+", "\\1", trends$Week)
trends$Week <- as.Date(trends$Week)
ggplot(trends, aes(Week, value, colour = variable, group=variable)) +
geom_line() +
ylab("Trends") +
theme(legend.position="top", legend.title=element_blank(),
panel.background = element_rect(fill = "#FFFFFF", colour="#000000"))+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73"))+
stat_smooth(method="loess")
Now, every second year is labeled (2004, 2006, ...) in x-axis. How can I tell ggplot to label every year (2004, 2005, ...)?
ggplot will understand Date objects (see ?Date) and work out appropriate labelling if you can convert your dates to this format.
You can use something like gsub to extract the starting day of each week. The regular expression matches the whole string and replaces it with whatever was captured inside the parentheses, i.e. the first date:
df$startingDay <- gsub("^(\\d+-\\d+-\\d+).+", "\\1", df$Week)
Then call as.Date() on the extracted day strings to convert to Date objects:
df$date <- as.Date(df$startingDay)
You can then use the date objects to plot whatever you wanted to plot:
g <- ggplot(df, aes(date, as.numeric(atomic))) + geom_line()
print(g)
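For the "Year-Week" format asked about in the question, the extracted start dates can be formatted as strings. A minimal sketch using the three sample rows from the question; it assumes the week number should follow the %U convention (weeks start on Sunday), and the names week_ranges, start_day and year_week are made up for illustration:

```r
# The three "Week" values shown in the question.
week_ranges <- c("2004-01-04 - 2004-01-10",
                 "2004-01-11 - 2004-01-17",
                 "2004-01-18 - 2004-01-24")

# Keep only the first date of each range and convert it to a Date.
start_day <- as.Date(sub("^(\\d+-\\d+-\\d+).+", "\\1", week_ranges))

# "%U" is the week of the year, counting from the first Sunday.
year_week <- format(start_day, "%Y-%U")
print(year_week)
```

Note that these strings are labels rather than dates, so for the x-axis it is usually better to keep plotting the Date column and let the scale handle the labelling.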
EDIT:
To answer your additional question, add the following to your ggplot object:
library(scales)
g <- g + scale_x_date(breaks=date_breaks(width="1 year"),
labels=date_format("%Y"))
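Recent versions of ggplot2 also accept the break width and label format directly through the date_breaks and date_labels arguments of scale_x_date(), without calling the scales helpers. A sketch under that assumption, with a made-up weekly series standing in for the Google Trends data:

```r
library(ggplot2)

# Hypothetical weekly series standing in for the Google Trends data.
df <- data.frame(date = seq(as.Date("2004-01-04"), as.Date("2007-12-30"),
                            by = "week"))
df$atomic <- seq_len(nrow(df)) %% 10

# One labelled break per year, showing only the four-digit year.
g <- ggplot(df, aes(date, atomic)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")
print(g)
```

This should label every year rather than every second one, in the same way as the date_breaks() and date_format() version above.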