R, ggplot2, skip printing x values - r

This might be fairly simple, but I can't seem to find out how to do it.
I have a nice plot with a group of lines in it.
The y axis represents an amount; the x axis represents dates.
The problem is simple: there are so many dates that they are printed on top of each other.
The code:
sp = rbind(sp1,sp2,sp3,sp4)
pm = ggplot(data = sp, aes(x = date,
y = amount,
colour=sm,
group=sm)) +
geom_line()
How can I make the x axis only print for example every 5 dates instead of all of them?
Thanks in advance!

library(scales)
sp = rbind(sp1,sp2,sp3,sp4)
pm = ggplot(data = sp, aes(x = date, y = amount, colour=sm, group=sm)) +
geom_line() +
scale_x_date("x axis title", breaks = "5 years")
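scale_x_date will sort out the x-axis labels for you. To specify the label intervals, use the scales package as above, changing "5 years" to whatever interval suits your data (for example "5 days"). (P.S. Your dates need to be of class Date for scale_x_date; for POSIXct values, use scale_x_datetime instead.)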
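As a minimal sketch of the same idea with day-level spacing, the example below uses the date_breaks and date_labels arguments of scale_x_date. The sp data frame here is a made-up stand-in for the asker's combined data, and the "5 days" interval and label format are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the combined `sp` data frame from the question.
sp <- data.frame(
  date   = rep(seq(as.Date("2024-01-01"), by = "day", length.out = 30), times = 2),
  amount = c(1:30, 30:1),
  sm     = rep(c("a", "b"), each = 30)
)

# date_breaks sets the spacing between labelled ticks;
# date_labels sets how each label is formatted.
pm <- ggplot(sp, aes(x = date, y = amount, colour = sm, group = sm)) +
  geom_line() +
  scale_x_date(name = "Date", date_breaks = "5 days", date_labels = "%d %b")
```

Printing pm draws the plot. Note that the breaks are spaced by calendar interval, not by every fifth row of the data, so irregularly spaced dates still get evenly spaced labels.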

Related

How to put axis labels in between the axis ticks in ggplot2

In base R, I like to make plots with time on the x-axis where the labels are shown in between long tick marks. For example, there may be tick marks at June 1 and June 30, but the text "June" shows up in between, centered around June 15. In base R, I simply draw 2 axes, one with the ticks and one with the labels.
However, I haven't been able to figure out how to make this style of axis in ggplot2.
Simply offsetting the text adjustment is not precise enough.
Creating a single axis with labels = c("","June","") almost works but tick marks only accept one length so something like axis.ticks.length = unit(c(.25,0,.25),"cm") doesn't work.
I think something like this might be possible with the ggh4x package but I haven't been able to figure it out. I will be happy for any solution compatible with ggplot2, regardless of which package.
Have you looked into the scales package? Instead of manually creating x-axis ticks and labels, you can specify exactly how many x-axis tick marks and labels you want with breaks_pretty and specify date formatting with label_date. More info on date formatting here.
library(tidyverse)
library(scales)
time <- seq(as.Date("2020-1-1"), as.Date("2022-1-1"), by = "months")
var <- c(1:15, 15:6)
df <- data.frame(var, time) %>%
mutate(time = as.Date(time))
# original
df %>%
ggplot(aes(x = time, y = var)) +
geom_bar(stat = "identity")
# just month names
df %>%
ggplot(aes(x = time, y = var)) +
geom_bar(stat = "identity") +
scale_x_date(labels = label_date("%B"))
# increase tick marks with month names and full year
df %>%
ggplot(aes(x = time, y = var)) +
geom_bar(stat = "identity") +
scale_x_date(labels = label_date("%B %Y"),
breaks = breaks_pretty(n = 12)) # <- change 12 to another number
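To see what the two scales helpers do on their own, they can be called outside a plot. This is only a sketch: the variable names brks and lbls are made up here, and "%Y-%m" is used instead of "%B" because month names depend on the locale.

```r
library(scales)

# Same monthly sequence as in the answer above.
time <- seq(as.Date("2020-01-01"), as.Date("2022-01-01"), by = "month")

# breaks_pretty() returns a function; calling it on the data range yields
# roughly (not exactly) n evenly spaced dates.
brks <- breaks_pretty(n = 12)(range(time))

# label_date() also returns a function, which formats dates as text.
lbls <- label_date("%Y-%m")(brks)
```

Because breaks_pretty only treats n as a target, the number of breaks returned can differ from 12.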

Why is ggplot drawing straight lines intermittently along my time series geom_line?

I have a ggplot with two geom_lines that plot a rolling 7-day sum of data for 2019 and 2020. On the x axis is a numeric variable where I have removed the year from the date to give a day/month value (101, 102, all the way to 1231).
Something, probably the date, is causing the ggplot to draw straight lines over ~70 days. Any ideas how I can plot all the points without skipping like this?
Attached is a picture of the ggplot and Excel where it plots perfectly.
library(ggplot2)
library(plotly)
options(scipen=1000000)
plot_a <- ggplot(data = Total_Arrivals) +
geom_line(mapping = aes(x = Date, y = SevenDaySum2019, group ='Date'), colour= company_colour[3]) +
geom_line(mapping = aes(x = Date, y = SevenDaySum2020, group ='Date'), colour= company_colour[2]) +
theme_company
ggplotly(plot_a)
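No answer is recorded here, but one plausible cause (an assumption, not something confirmed in the thread) is the numeric month/day encoding itself: the values jump from 131 to 201, from 228 to 301, and so on, so a continuous x axis has empty stretches of about 70 units that geom_line bridges with a straight segment. The sketch below reproduces those jumps and shows one possible fix, mapping each value onto a real Date in a single reference year. The helper as_ref_date is hypothetical.

```r
# Rebuild the month/day numeric keys described in the question.
days <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
mmdd <- as.integer(format(days, "%m%d"))   # 101, 102, ..., 1231

# Consecutive days differ by 1, except at month ends where the number jumps.
gaps <- diff(mmdd)
table(gaps)

# Hypothetical helper: place every value in one reference year so the axis
# is continuous (2020 is a leap year, so 229 would also be valid).
as_ref_date <- function(mmdd, ref_year = 2020) {
  as.Date(sprintf("%d-%02d-%02d", ref_year, mmdd %/% 100, mmdd %% 100))
}
x <- as_ref_date(mmdd)
```

Plotting against x with scale_x_date(date_labels = "%d %b") would then keep the year out of the labels while leaving no artificial gaps.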

Can't change the colors on a ggplot2 histogram

I'm working with a dataset of 5k finish times that looks a little bit like this:
"15:34"
"14:23"
"17:34"
and so on, there's a lot, but they're all formatted like that. I'm able to convert all of them to POSIXct, and store them in a data frame to make using ggplot2 easier, but for the life of me, I cannot get ggplot to change colors. The fill command doesn't work, the graph just remains grey.
I've tried just referencing the POSIXct object I made, but ggplot throws an error and tells me it doesn't work well with POSIXct. The only way I've been able to display a histogram is by storing it in a dataframe.
The code I'm currently using looks like:
#make the data frame
df <- data.frame(
finish_times = times_list)
#set the limits on the x axis as datetime objects
lim <- as.POSIXct(strptime(c('2018-3-18 14:15', '2018-3-18 20:00'), format = "%Y-%m-%d %M:%S"))
#making the plot
ggplot(data = df, aes(x = finish_times)) +
geom_histogram(fill = 'red') + #this just doesn't work
stat_bin(bins = 30) +
scale_x_datetime(labels = date_format("%M:%S"),
breaks = date_breaks("1 min"),
limits = lim) +
labs(title = "2017 5k finishers",
x='Finish Times',
y= 'Counts')
I've crawled through a lot of ggplot and R documentation, and I'm not sure what I'm missing. I appreciate all help, thanks.
stat_bin(bins = 30) is hiding what you set in geom_histogram(): it adds a second histogram layer, drawn on top of the first with the default gray fill. Generally, each geom has an associated default stat, and you can plot the object using one of the two, but when you try to do it with both, you can end up with problems. There are several solutions to this. Here's an example.
ggplot(diamonds, aes(x = carat)) + geom_histogram(fill = "red") + stat_bin(bins = 30)
Produces a plot with gray fill
ggplot(diamonds, aes(x = carat)) + geom_histogram(fill = "red", bins = 30)
Produces a plot with red fill
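The difference between the two calls can also be checked without looking at the picture. The following is a small sketch using layer_data from ggplot2; the object names p_two_layers and p_one_layer are made up for illustration.

```r
library(ggplot2)

# geom_histogram() plus a separate stat_bin() gives two layers.
p_two_layers <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(fill = "red") +
  stat_bin(bins = 30)

# Passing bins to geom_histogram() keeps everything in one layer.
p_one_layer <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(fill = "red", bins = 30)

n_two <- length(p_two_layers$layers)
n_one <- length(p_one_layer$layers)

# Fill actually used by the single-layer plot.
fill_one <- unique(layer_data(p_one_layer, 1)$fill)
```

In the two-layer plot the second layer is drawn last, which is why its default fill is what you see.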

ggplot x axis trouble

Currently, I have this plot that looks like this:
I don't like how on the x-axis there are weird lines / bars. I suspect this may be because ggplot can't fit all 540000 observations in the x axis. Here is the code I used to graph this:
data %>%
ggplot() +
geom_point(aes(x = dates_df$date, y = Quantity)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Invoice Date", y = "Quantity", title = "Quantity vs Invoice Date")
What can I do to get rid of / solve this mess on the x-axis?
As was pointed out in the comments, the date column is messy and you are using two separate data frames. First, join the data. I assume both data frames share an id or some other key column:
library("dplyr")
data <- left_join(data, dates_df, by = "id")
As was also mentioned, date is stored as character. If you haven't already, convert it to Date with as.Date after joining:
data$date <- as.Date(data$date, "%m/%d/%Y")
You can find other date formats here: http://www.statmethods.net/input/dates.html
You said there are 540,000 observations on the x axis. My suggestion is to split the chart into one panel per year. To do this, use facet_grid inside ggplot:
library(lubridate)
ggplot(data, aes(x = date, y = Quantity)) +
geom_point() +
facet_grid(~ year(date))
Hope it helped :)
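For reference, here is a self-contained sketch of the steps in this answer (join, convert to Date, facet by year). The two tiny data frames and the id key are invented for illustration, since the asker's actual columns are not shown.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical miniature versions of the asker's two data frames.
data <- data.frame(id = 1:4, Quantity = c(5, 3, 8, 2))
dates_df <- data.frame(
  id   = 1:4,
  date = c("12/01/2010", "12/15/2010", "01/04/2011", "02/20/2011")
)

# Join on the shared key, then parse the character dates.
joined <- left_join(data, dates_df, by = "id") %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"),
         year = year(date))

# One panel per year; free x scales so each panel spans only its own dates.
p <- ggplot(joined, aes(x = date, y = Quantity)) +
  geom_point() +
  facet_grid(~ year, scales = "free_x")
```

Referring to columns by bare name inside aes(), instead of df$column, matters here because faceting needs the columns to come from the plot's own data.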

labeling axis of dates in ggplot?

I am trying to make plots using ggplot in R, and I have the same problem that was discussed in the question below.
Date axis labels in ggplot2 is one day behind
My data ranges from 2016-09-01 to 2016-09-30, but labels in plots say 2016-08-31 is the first day of data.
I solved the problem with the solution in the previous question, which is:
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_datetime(breaks =df$x , labels = format(df$x, "%Y-%m-%d"))
(Does this set the breaks and labels by taking the exact dates from the data?)
Anyway, I have a new problem:
the dates now match the labels, but the plot does not look good.
I am not complaining that the date labels are too long, but with the solution above I can't set breaks and labels by week or by a certain number of days.
Also, I have many missing dates.
What should I do to solve this problem? I need a new solution.
Just use this if you want your dates to appear vertically (that way you can see all your dates):
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_datetime(breaks =df$x , labels = format(df$x, "%Y-%m-%d")) +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
I found the solution. Maybe my question was not described in enough detail.
My solution for the situation where dates did not match the values on the axis, and where I wanted the plots to look better, is:
# set breaks first with seq.POSIXt
breaks.index <- seq.POSIXt(from = as.POSIXct("2020-01-01", format = "%Y-%m-%d"),
to = as.POSIXct("2020-12-31", format = "%Y-%m-%d"),
by = "1 week")
and
# plot (geom_point() added so the plot draws something, as in the question)
plot <- ggplot(data, aes(x = date, y = y)) +
geom_point() +
scale_x_datetime(breaks = breaks.index, labels = format(breaks.index, "%Y-%m-%d"))
plot
I don't understand how this differs from using scale_x_date(date_labels = '%F') or why this code works, but it does.
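A shorter variant of the same idea is sketched below, using the date_breaks, date_labels, and timezone arguments of scale_x_datetime. Two things here are assumptions, not facts from the thread: the data frame is invented, and the guess that the one-day shift comes from a timezone mismatch between the data and the labels (pinning both to "UTC" is one way to rule that out).

```r
library(ggplot2)

# Hypothetical data: one POSIXct value per day in September 2016,
# with a few days missing.
all_days <- seq(as.POSIXct("2016-09-01", tz = "UTC"),
                as.POSIXct("2016-09-30", tz = "UTC"),
                by = "day")
kept <- all_days[-c(3, 4, 10, 17)]
df <- data.frame(x = kept, y = seq_along(kept))

# Weekly breaks built by hand, as in the asker's own solution.
breaks.index <- seq(min(all_days), max(all_days), by = "1 week")

# The same spacing requested directly from the scale.
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_datetime(date_breaks = "1 week",
                   date_labels = "%Y-%m-%d",
                   timezone = "UTC")
```

Missing dates do not affect either approach, because the breaks are generated from the axis range and not from the rows of the data.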
