ggplot x axis trouble - r

Currently, I have this plot that looks like this:
I don't like how on the x-axis there are weird lines / bars. I suspect this may be because ggplot can't fit all 540000 observations in the x axis. Here is the code I used to graph this:
data %>%
ggplot() +
geom_point(aes(x = dates_df$date, y = Quantity)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Invoice Date", y = "Quantity", title = "Quantity vs Invoice Date")
What can I do to get rid of / solve this mess on the x-axis?

As noted in the comments, there seem to be two problems: the date column is messy, and you are plotting from two separate data frames. First, join the data. I assume both data frames share an id or some other key column:
library("dplyr")
data <- left_join(data, dates_df, by = "id")
The date column is also stored as character, as was mentioned. If you haven't converted it already, use the as.Date function after joining:
data$date <- as.Date(data$date, "%m/%d/%Y")
you can find other date formats here: http://www.statmethods.net/input/dates.html
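To illustrate why the conversion matters, here is a minimal sketch on made-up data (the toy data frame and its values are assumptions, not the asker's data). A character column gets a discrete axis with one tick per distinct string, while a Date column gets a continuous axis whose break spacing scale_x_date can control:

```r
library(ggplot2)

# Hypothetical stand-in for the joined data: dates stored as text
toy <- data.frame(
  date = c("01/15/2011", "02/20/2011", "03/05/2011", "04/18/2011"),
  Quantity = c(10, 25, 7, 14)
)

# Parse the text into real Date values
toy$date <- as.Date(toy$date, format = "%m/%d/%Y")

# A Date column gets a continuous axis; ask for one labelled break per month
p <- ggplot(toy, aes(x = date, y = Quantity)) +
  geom_point() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
```

With real data spanning several years, a wider interval such as date_breaks = "3 months" would probably read better.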
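You said there are 540,000 observations on the x-axis. My suggestion is to split the chart into one panel per year. To do this, use the facet_grid function inside ggplot:
library(ggplot2)
library(lubridate)
# refer to columns by name inside aes(); they are looked up in `data`
ggplot(data, aes(x = date, y = Quantity)) +
  geom_point() +
  # one panel per year; free_x lets each panel show only its own dates
  facet_grid(~ year(date), scales = "free_x")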
Hope it helped :)

Related

How to put axis labels in between the axis ticks in ggplot2

In base R, I like to make plots with time on the x-axis where the labels are shown in between long tick marks. For example, there may be tick marks at June 1 and June 30, but the text "June" shows up in between, centered around June 15. In base R, I simply draw 2 axes, one with the ticks and one with the labels.
However, I haven't been able to figure out how to make this style of axis in ggplot2.
Simply offsetting the text adjustment is not precise enough.
Creating a single axis with labels = c("","June","") almost works but tick marks only accept one length so something like axis.ticks.length = unit(c(.25,0,.25),"cm") doesn't work.
I think something like this might be possible with the ggh4x package but I haven't been able to figure it out. I will be happy for any solution compatible with ggplot2, regardless of which package.
Have you looked into the scales package? Instead of manually creating x-axis ticks and labels, you can specify exactly how many x-axis tick marks and labels you want with breaks_pretty and specify date formatting with label_date. More info on date formatting here.
library(tidyverse)
library(scales)
time <- seq(as.Date("2020-1-1"), as.Date("2022-1-1"), by = "months")
var <- c(1:15, 15:6)
df <- data.frame(var, time) %>%
mutate(time = as.Date(time))
# original
df %>%
ggplot(aes(x = time, y = var)) +
geom_bar(stat = "identity")
# just month names
df %>%
ggplot(aes(x = time, y = var)) +
geom_bar(stat = "identity") +
scale_x_date(labels = label_date("%B"))
# increase tick marks with month names and full year
df %>%
ggplot(aes(x = time, y = var)) +
geom_bar(stat = "identity") +
scale_x_date(labels = label_date("%B %Y"),
breaks = breaks_pretty(n = 12)) # <- change 12 to another number

How Can I Plot Percentage Change for 3 Vectors in Same DataFrame?

I have three vectors and a list of crimes. Each crime represents a row. On each row, each vector identifies the percentage change in the number of incidents of each type from the prior year.
Below is the reproducible example. Unfortunately, the df takes the first value and repeats it down each column (this is my first sorta reproducible example).
crime_vec = c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')
change15to16vec = as.double(825, -1.56, -66.67, -19.13)
change16to17vec = as.double(8.11, .96, 50, 4.84)
change17to18vec = as.double(-57.50, 1.29, 83.33, 28.72)
df = data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
df
I need a graph that will take the correct data frame, show the crimes down the y axis and ALL 3 percentage change vectors on the x-axis in a dodged bar. The examples I've seen plot only two vectors. I've tried plot(), geom_bar, geom_col, but can only get one column to graph (occasionally).
Any suggestions for a remedy would help.
Not sure if this is what you are looking for:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(-crime_vec) %>%
ggplot(aes(x = value, y = crime_vec, fill = as.factor(name))) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
xlab("Percentage Change") +
ylab("Crime") +
labs(fill = "Change from")
To plot several columns with ggplot2, you first need to bring your data into long format. After that, geom_bar with position = "dodge" should create the plot you want.
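The repeated values in the question's data frame appear to come from as.double(825, -1.56, ...), which converts only its first argument; data.frame then recycles that single number down the column. A minimal sketch of the same example built with c() instead (the column names period and pct_change are my own choice, not from the question):

```r
library(tidyr)
library(ggplot2)

crime_vec <- c("\tSTRONGARM - NO WEAPON", "$500 AND UNDER",
               "ABUSE/NEGLECT: CARE FACILITY", "AGG CRIM")

# c() combines all four values into one vector per column
df <- data.frame(
  crime_vec,
  change15to16vec = c(825, -1.56, -66.67, -19.13),
  change16to17vec = c(8.11, 0.96, 50, 4.84),
  change17to18vec = c(-57.50, 1.29, 83.33, 28.72)
)

# Long format: one row per crime and period
long <- pivot_longer(df, -crime_vec,
                     names_to = "period", values_to = "pct_change")

# Three dodged bars per crime, one for each period
p <- ggplot(long, aes(x = pct_change, y = crime_vec, fill = period)) +
  geom_col(position = "dodge") +
  labs(x = "Percentage Change", y = "Crime", fill = "Change from")
p
```

geom_col is used here as shorthand for geom_bar(stat = "identity").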

R shiny ggplot geomline

I tried to generate a line chart with 3 lines, one per year, but I can only get 1 line with my code. What should I do?
Try this:
Without seeing your data it is guess work on my part.
I suspect YEAR is being treated as a continuous variable and to get distinct colours you need YEAR to be a discrete variable.
ggplot(data = crimessum2)+
geom_line(mapping = aes(x=HOUR, y = Numbers, col = factor(YEAR), group = YEAR))+
xlab("HOUR")+
ylab("Total Paid by Insurance in $$")+
ggtitle(" ")
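Since the original data is not shown, here is a small self-contained sketch of the same idea on invented data (the values in this crimessum2 are hypothetical; only the column names follow the answer above). Grouping by YEAR draws one line per year, and factor(YEAR) gives each line its own discrete colour:

```r
library(ggplot2)

# Hypothetical stand-in for crimessum2: one count per hour for three years
crimessum2 <- expand.grid(HOUR = 0:23, YEAR = 2016:2018)
crimessum2$Numbers <- 100 + 10 * (crimessum2$YEAR - 2016) + crimessum2$HOUR

# group = YEAR splits the data into one line per year;
# factor(YEAR) makes the colour scale discrete instead of a gradient
p <- ggplot(data = crimessum2) +
  geom_line(aes(x = HOUR, y = Numbers, colour = factor(YEAR), group = YEAR)) +
  labs(x = "HOUR", colour = "YEAR")
p
```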

labeling axis of dates in ggplot?

I am trying to make plots using ggplot in R, and I have the same problem that was discussed in the question below.
Date axis labels in ggplot2 is one day behind
My data ranges from 2016-09-01 to 2016-09-30, but labels in plots say 2016-08-31 is the first day of data.
I solved the problem with the solution in the previous question, which is:
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_datetime(breaks =df$x , labels = format(df$x, "%Y-%m-%d"))
(Is this to set breaks and labels by taking exact dates from the data?)
Anyway, I have a new problem: the dates now match the labels, but the plot does not look good.
I am not complaining that the date labels are too long. What I don't like is that, with the solution above, I can't set breaks and labels by week or by a certain number of days.
Also, I have many missing dates.
What should I do to solve this problem? I need a new solution.
Just use this if you want your dates to appear vertically (that way you can see all your dates):
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_datetime(breaks =df$x , labels = format(df$x, "%Y-%m-%d")) +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
I found the solution... Maybe my question was not described here in detail.
Here is my solution for the situation where the dates did not match the values on the axis and I wanted the plots to look better:
# set weekly breaks first with seq.POSIXt
breaks.index <- seq.POSIXt(from = as.POSIXct("2020-01-01", format = "%Y-%m-%d"),
                           to = as.POSIXct("2020-12-31", format = "%Y-%m-%d"),
                           by = "1 week")
# plot, labelling each break with its own formatted date
plot <- ggplot(data, aes(x = date, y = y)) +
  geom_point() +
  scale_x_datetime(breaks = breaks.index, labels = format(breaks.index, "%Y-%m-%d"))
plot
I don't fully understand how this differs from using scale_x_date(date_labels = '%F') or why it works, but it does.
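If the x column only carries calendar days, another option is to store it as class Date and let scale_x_date place the weekly breaks. This is a minimal sketch on invented data (the df below, with its missing days, is an assumption modelled on the question's September 2016 range), not a diagnosis of the original one-day offset:

```r
library(ggplot2)

# Hypothetical data: September 2016 with several days missing
set.seed(1)
days <- as.Date("2016-09-01") + c(0, 1, 4, 7, 8, 13, 14, 20, 21, 27, 29)
df <- data.frame(x = days, y = rnorm(length(days)))

# Weekly breaks counted from the first day of the data range
weekly <- seq(as.Date("2016-09-01"), as.Date("2016-09-30"), by = "1 week")

# Breaks are independent of which days are present in the data
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_date(breaks = weekly, date_labels = "%Y-%m-%d") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```

Because Date values have no time-of-day or time zone component, the labels are formatted from the same calendar days that are plotted.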

R, ggplot2, skip printing x values

This might be fairly simple, yet I can't seem to find out how to do it.
I have a nice plot with a group of lines in it.
The y-axis represents an amount, and the x-axis represents dates.
The problem is simple: there are so many dates that they are printed on top of each other.
The code :
sp = rbind(sp1,sp2,sp3,sp4)
pm = ggplot(data = sp, aes(x = date,
y = amount,
colour=sm,
group=sm)) +
geom_line()
How can I make the x axis only print for example every 5 dates instead of all of them?
Thanks in advance!
library(scales)
sp = rbind(sp1,sp2,sp3,sp4)
pm = ggplot(data = sp, aes(x = date, y = amount, colour=sm, group=sm)) +
geom_line() +
scale_x_date("x axis title", breaks = "5 years")
scale_x_date will sort out the x-axis labels for you. To set the interval between labels, pass a string such as "5 years" to breaks, as above. (P.S. scale_x_date needs the dates to be of class Date; for POSIXct values use scale_x_datetime instead.)
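For a label every few days instead of every few years, the same scale accepts a shorter interval. A minimal sketch, assuming daily data shaped like sp (the values below are invented for illustration):

```r
library(ggplot2)

# Hypothetical data in the shape of `sp`: 60 daily amounts for two groups
sp <- data.frame(
  date = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 60), times = 2),
  amount = c(cumsum(rep(1, 60)), cumsum(rep(2, 60))),
  sm = rep(c("a", "b"), each = 60)
)

# date_breaks sets the spacing between labels, date_labels their format
pm <- ggplot(sp, aes(x = date, y = amount, colour = sm, group = sm)) +
  geom_line() +
  scale_x_date("x axis title", date_breaks = "5 days", date_labels = "%d %b") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
pm
```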
