Plotting this week's data versus last week's in ggplot - r

I have a dataset structured the following way:
date transaction
8/15/2020 585
8/14/2020 780
8/13/2020 1427.8
8/12/2020 4358
8/11/2020 780.9
8/8/2020 585
8/6/2020 1107.4
8/5/2020 2917.35
8/4/2020 1237.1
Is there a way to plot a line graph with all the transactions that occurred this week compared to the previous week? I tried filtering the data manually and assigning it to a new data frame, which seemed to work, but it's very labor-intensive. Would it be possible to use today() so that the script picks up the day of execution and computes the results from there? Thanks!

To do that, you need to:
convert the dates to real Date objects (using as.Date), so that we can deal with them numerically (not categorically) and break them into weeks;
use format to get each date's week of the year; and
use facet_wrap so that each week gets its own facet with a distinct x axis.
dat$date <- as.Date(dat$date, format = "%m/%d/%Y")
# %V is the ISO 8601 week number; %W counts weeks starting on Monday
dat$week <- format(dat$date, format = "%V") # or %W
library(ggplot2)
ggplot(dat, aes(date, transaction)) +
  facet_wrap("week", ncol = 1, scales = "free_x") +
  geom_path()
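The question also asks about anchoring the comparison to the day the script runs. The following is a minimal sketch of one way to do that, not part of the answer above. It assumes weeks start on Monday, and the names today, this_monday, last_monday and recent are made up for illustration. The date is fixed to 2020-08-15 here so the example is reproducible; in practice you would use Sys.Date() (or lubridate's today()).

```r
library(ggplot2)

# Sample data copied from the question
dat <- data.frame(
  date = c("8/15/2020", "8/14/2020", "8/13/2020", "8/12/2020", "8/11/2020",
           "8/8/2020", "8/6/2020", "8/5/2020", "8/4/2020"),
  transaction = c(585, 780, 1427.8, 4358, 780.9, 585, 1107.4, 2917.35, 1237.1)
)
dat$date <- as.Date(dat$date, format = "%m/%d/%Y")

# Assumption: fixed date for reproducibility; replace with Sys.Date() in real use
today <- as.Date("2020-08-15")

# Monday of the current week: %u is the weekday number, 1 (Mon) to 7 (Sun)
this_monday <- today - (as.integer(format(today, "%u")) - 1)
last_monday <- this_monday - 7

# Keep only the two weeks of interest and label them
recent <- dat[dat$date >= last_monday & dat$date < this_monday + 7, ]
recent$week <- ifelse(recent$date >= this_monday, "this week", "last week")
recent$weekday <- as.integer(format(recent$date, "%u"))

# Overlay both weeks on a shared Monday-to-Sunday axis
p <- ggplot(recent, aes(weekday, transaction, colour = week)) +
  geom_line() +
  scale_x_continuous(breaks = 1:7,
                     labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
p
```

Plotting against the weekday rather than the date puts both weeks on the same x axis, which makes the comparison direct; the faceted version above keeps the real dates instead.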

Related

Creating a Cumulative Sum Plot using ggplot with duplicate x values

In my hypothetical example, people order ice-cream at a stand and each time an order is placed, the month the order was made and the number of orders placed is recorded. Each row represents a unique person who placed the order. For each flavor of ice-cream, I am curious to know the cumulative orders placed over the various months. For instance if a total of 3 Vanilla orders were placed in April and 4 in May, the graph should show one data point at 3 for April and one at 7 for May.
The issue I am running into is each row is being plotted separately (so there would be 3 separate points at April as opposed to just 1).
My secondary issue is that my dates are not in chronological order on my graph. I thought converting the Month column to Date format would fix this but it doesn't seem to.
Here is my code below:
library(lubridate)
library(ggplot2)
Flavor <- c("Vanilla", "Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Vanilla","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","Strawberry","chocolate","chocolate","chocolate")
Month <- c("1-Jun-21", "1-May-19", "1-May-19","1-Apr-19", "1-Apr-19","1-Apr-19","1-Apr-19", "1-Mar-19", "1-Mar-19", "1-Mar-19","1-Mar-19", "1-Apr-19", "1-Mar-19", " 1-Apr-19", " 1-Jan-21", "1-May-19", "1-May-19","1-May-19","1-May-19","1-Jun-19","2-September-19", "1-September-19","1-September-19","1-December-19","1-May-19","1-May-19","1-Jun-19")
Orders <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2)
data <- data.frame(Flavor,Month,Orders)
data$Month <- dmy(data$Month)
str(data)
data2 <- data[data$Flavor == "Vanilla",]
ggplot(data=data2, aes(x=Month, y=cumsum(Orders))) + geom_point()
In these situations, it's usually best to pre-compute your desired summary and send that to ggplot, rather than messing around with ggplot's summary functions. I've also added a geom_line() for clarity.
library(dplyr)
library(ggplot2)
data %>%
  group_by(Flavor, Month) %>%
  summarize(Orders = sum(Orders)) %>%
  group_by(Flavor) %>%
  arrange(Month) %>%
  mutate(Orders = cumsum(Orders)) %>%
  ggplot(data = ., aes(x=Month, y=Orders, color = Flavor)) + geom_point() + geom_line()
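For readers who prefer to avoid dplyr, the same "sum per month, then accumulate per flavor" idea can be sketched in base R. This is an illustrative alternative, not the answerer's code; the small orders data frame is made up to match the April/May example in the question.

```r
# Hypothetical toy data: 3 Vanilla orders in April, 4 in May
orders <- data.frame(
  Flavor = "Vanilla",
  Month = as.Date(c(rep("2019-04-01", 3), rep("2019-05-01", 4))),
  Orders = 1
)

# Step 1: one row per flavor and month, holding that month's total
monthly <- aggregate(Orders ~ Flavor + Month, data = orders, FUN = sum)

# Step 2: sort chronologically within each flavor
monthly <- monthly[order(monthly$Flavor, monthly$Month), ]

# Step 3: running total within each flavor
monthly$Cumulative <- ave(monthly$Orders, monthly$Flavor, FUN = cumsum)
monthly
```

The resulting monthly data frame has a single point per month, so it can be passed to ggplot with aes(x = Month, y = Cumulative, color = Flavor).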

Align left border of geom_col column with data anchor

I'm trying to plot some timestamped data with ggplot2 and R. Here is a minimal, reproducible example of my current work:
library(lubridate)
library(ggplot2)
sample_size <- 100
start_date = as.POSIXct("2020-01-01 00:00")
end_date = as.POSIXct("2020-01-02 00:00")
timestamps <- as.POSIXct(sample(seq(start_date, end_date, by=60), sample_size))
amount <- rpois(sample_size, 5)
df <- data.frame(timestamps=timestamps, amount=amount)
df$hour_group <- floor_date(df$timestamps, unit="1 hour")
ggplot(df, aes(x=hour_group, y=amount)) + geom_col()
Explanation: First, a sample data frame with the columns timestamps and amount is created. The timestamps are sampled uniformly between start_date and end_date. I'd like to plot the amount variable for each hour of the day, so another column, hour_group, is created and filled with the hour of each timestamp.
Plotting this data yields the following graph:
The columns look alright, but since the first column, for example, represents the sum of the amounts with timestamps between 00:00 and 01:00, I'd like the column to fill exactly this space (not 23:30 to 00:30 as in the current plot). So I want to align the left border of each column with the anchor point (00:00 in the example) rather than center the column on it. How can this be achieved?
My approach: One way I can think of is to create another column with the shifted anchor points. In the example, a 30-minute shift is necessary.
df$hour_group_shifted <- df$hour_group + 60*30
The new plot creates the expected result
I'm still wondering if there may be a simpler way to achieve this directly with a ggplot setting without the extra column.
You can use position_nudge.
ggplot(df, aes(x=hour_group, y=amount)) +
  geom_col(position = position_nudge(60*30))
Since ggplot2 3.4.0, you can use just = 0 to align your columns as needed:
ggplot(df, aes(x=hour_group, y=amount)) +
  geom_col(just = 0)
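One caveat with the position_nudge approach: geom_col stacks rows that share an x value by default, and passing a different position replaces that stacking, so several rows in the same hour would be drawn on top of each other instead of summed. A minimal sketch of a workaround is to sum per hour first and then nudge. The hourly data frame and the fixed seed are assumptions for illustration, and base R's trunc() stands in for floor_date() to keep the example dependency-free.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
start_date <- as.POSIXct("2020-01-01 00:00", tz = "UTC")
end_date <- as.POSIXct("2020-01-02 00:00", tz = "UTC")
timestamps <- sample(seq(start_date, end_date, by = 60), 100)
df <- data.frame(timestamps = timestamps, amount = rpois(100, 5))

# Truncate each timestamp to the start of its hour
df$hour_group <- as.POSIXct(trunc(df$timestamps, units = "hours"))

# One row per hour, holding the summed amount, so no stacking is needed
hourly <- aggregate(amount ~ hour_group, data = df, FUN = sum)

# Shift each bar by 30 minutes so its left edge sits on the hour
p <- ggplot(hourly, aes(x = hour_group, y = amount)) +
  geom_col(position = position_nudge(x = 60 * 30))
p
```

With the data aggregated like this, either position_nudge or just = 0 gives bars whose height is the hourly total.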

R, Format date on X axis using ggplot

After aggregating, "filtering", and parsing a date-and-hour column with as.Date, then reshaping with the melt function, my data looks like this:
  DATE     variable    value
1 13-09-20 Billete_50  20405
2 14-09-20 Billete_50  19808
3 13-09-20 Billete_100 27787
4 14-09-20 Billete_100 20361
5 13-09-20 Total       48192
6 14-09-20 Total       40169
I want a line graph to show the data, but it looks like this:
It's in Spanish, sorry, but the dates should be almost the same: the axis should only show September (Sep), yet it shows October, January, April, July, then back to October.
This is my ggplot line:
ggplot(plotCajero1Melted, aes(x=DATE, y=value, col=variable)) + geom_line() +xlab("Fecha") +ylab("Saldo") +ggtitle("Cajero 1")
What should I do? Is it something with the date? I was thinking that the year is written in short form, which is probably making ggplot behave strangely, or maybe there is a ggplot option that I'm missing?
You're right, your date format is not being read as it should be (it seems to be read as Year-Month-Day).
You can fix this, for example, by using the dmy function from the lubridate package to tell R to read the date as Day-Month-Year:
library(lubridate)
df$DATE <- dmy(df$DATE)
ggplot(df, aes(x = DATE, y = value, color = variable)) +
  geom_line()
Is this what you are looking for?
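To see why the axis goes wrong, it can help to compare the default parse with an explicit format. The sketch below uses base R's as.Date instead of lubridate; the variable names are illustrative only.

```r
raw <- c("13-09-20", "14-09-20")

# Default parsing treats the string as Year-Month-Day,
# so "13-09-20" becomes 20 September of the year 13
misread <- as.Date(raw)

# Explicit format: day, month, two-digit year
parsed <- as.Date(raw, format = "%d-%m-%y")

format(misread, "%m")  # month is still September, but day and year are swapped
parsed                 # 13 and 14 September 2020
```

Because the misread dates are two days in different years (13 and 14), the axis spans a whole year and shows months that never appear in the data.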

How can I fix my x-axis, and plot my dates using GGPlot2 in R? [closed]

So I have a CSV file with two columns. Date (DD-MM-YYYY) and Gas Price (0.00). I also converted the date using as.Date. But when I attempted to plot it using ggplot, I kept getting this error that it can't work with numeric type or something.
Here is my source code:
gasoline <- read.csv(file.choose())
Date <- gasoline$Date
Price <- as.numeric(gasoline$Price)
str(Price)
ggplot(gasoline, aes(Date, Price)) + geom_line(colour="red")
Unfortunately, this code results in my graph having every date crammed down at the bottom so that it is barely legible:
(imgur.com/a/iitXw).
It's just a flat line. It also says, "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?"
How can I plot dates such as these in ggplot?
It is hard to diagnose the problem or propose a solution without your code or data. However, the code below is a very simple way to create a basic plot from example data that matches your description of your data:
# attach the ggplot package
library(ggplot2)
# make example data that matches your description
# (where the dates are January, April, July, and October 1 of 2015)
dates <- c('01-01-2015', '01-04-2015', '01-07-2015', '01-10-2015')
prices <- rnorm(4, 3)
# now convert dates to date format
dates <- as.Date(dates, format='%d-%m-%Y')
# you should then be able to create a simple plot using qplot
# (qplot() is deprecated as of ggplot2 3.4.0; ggplot() + geom_line() is the current equivalent)
qplot(dates, prices, geom='line', xlab='Date', ylab='Price',
      main='Gas Prices Over Time')
Note the use of the format option in the as.Date() function. Since the dates are in DD-MM-YYYY format, while as.Date() assumes YYYY-MM-DD format, your dates would not convert correctly without that option:
dates <- c('01-01-2015', '01-04-2015', '01-07-2015', '01-10-2015')
dates <- as.Date(dates)
format(dates, format="%B %d %Y")
[1] "January 20 1" "April 20 1" "July 20 1" "October 20 1"
Since you had some problem with your dates being numeric, I suspect you did some data cleaning after converting them to dates; for example,
dates <- c('01-01-2015', '01-04-2015', '01-07-2015', '01-10-2015', NA)
dates <- as.Date(dates, format='%d-%m-%Y')
class(dates)
[1] "Date"
dates <- ifelse(is.na(dates), NA, dates)
class(dates)
[1] "numeric"
This is because Date objects are stored internally as numbers (days since 1970-01-01), and ifelse() returns those underlying values without the Date class (see this page at IDRE). However, that still should not throw an error when plotting; the axis labels should simply be numbers rather than dates.
prices <- c(rnorm(4, 3), NA)
qplot(dates, prices, geom='line', xlab='Date', ylab='Price',
      main='Gas Prices Over Time')
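If the Date class has been lost this way, it can be restored, or the loss avoided in the first place. The following is a small sketch under the assumption that missing dates are being replaced by some fill value; fill_value is a made-up placeholder, not something from the question.

```r
dates <- as.Date(c("01-01-2015", "01-04-2015", NA), format = "%d-%m-%Y")

# ifelse() drops the class and returns the underlying day counts
stripped <- ifelse(is.na(dates), NA, dates)
class(stripped)

# Option 1: convert the numbers back to dates
restored <- as.Date(stripped, origin = "1970-01-01")

# Option 2: assign into the Date vector directly, which keeps the class
fill_value <- as.Date("2015-01-01")  # hypothetical replacement for missing dates
kept <- dates
kept[is.na(kept)] <- fill_value
class(kept)
```

Subset assignment dispatches on the Date vector, so the class survives, whereas ifelse() builds its result from the logical test vector.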
Hopefully this will get you started on a solution to your problem.
EDIT:
Having seen the first twelve rows of your csv file, I can give you the following solution:
library(ggplot2)
df <- read.csv('dataframe.csv') # change file name/path as appropriate
df$Date <- as.Date(df$Date, format='%m/%d/%Y')
ggplot(df, aes(x=Date, y=Price)) + geom_line(color="red")
For me this produced the following plot:
I believe the issue was that in the code you provided me, the date variable was not converted to a date object; it was a factor (as seen in the lower right of the screenshot you provided). There was only one observation per level of the factor, resulting in the error you saw. You can get rid of that error by adding group=1 to the aesthetic (i.e. aes(x=Date, y=Price, group=1)), but this does not accomplish exactly what you want.
The key is making sure you convert the variable to a Date object (being careful about the format, as mentioned above); then everything works out nicely.

Order dates in ggplot by month

I have DF$Date in the as.Date format "yyyy-mm-dd" as shown below. Is there an easy way to get these grouped by month in ggplot?
Date
2015-07-30
2015-08-01
2015-08-02
2015-08-06
2015-08-11
2015-08-12
I've added a column DF$Month as "year Monthname" (e.g. April 2015.)
I'm doing this by DF$Month<-strftime(DF$Date,format="%B %Y")
Is there a quick way to factor the month/years so that they are ordinal?
I used a workaround by formatting using:
DF$Month<-strftime(DF$Date,format="%Y-%m") so that the year comes first, followed by the month number.
This gives the output, which is sortable:
DF$Month
"2015-07"
"2015-08"
This output allows me to get this grouping:
http://imgur.com/df1FI3s
When using this plot:
MonthlyActivity<-ggplot(DF,aes(x=Month, y=TotalSteps))+
geom_boxplot()
MonthlyActivity
Any alternatives so I can use the full month name and still be in the correct time order?
There are probably other solutions, but here is one with full month names as a factor. As you already found out, you need an x variable to group by. We can then treat it as an 'ordering a factor' problem instead of a date-scale problem.
# first, generate some data
dat <- data.frame(date = sample(seq(as.Date("01012015", format = "%d%m%Y"),
                                    as.Date("01082015", format = "%d%m%Y"), by = 1),
                                1000, TRUE),
                  value = rnorm(1000))
We find the minimum and maximum month, and do some date arithmetic to allow for all start days (so that February doesn't get skipped when the minimum date is on the 29th/30th/31st). I used lubridate for this.
library(lubridate)
min_month <- min(dat$date) - day(min(dat$date)) + 1
max_month <- max(dat$date) - day(max(dat$date)) + 1
We generate a grouping variable. It is a factor with labels like 'January 2015, March 2015'. However, we force the order by creating a sequence (by month) from min date to max date and formatting it in the same way.
dat$group <- factor(format(dat$date, "%B %Y"),
                    levels = format(seq(min_month, max_month, by = "month"),
                                    "%B %Y"))
This forces the ordering on the axis:
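The plot call itself is not shown above, so here is a self-contained sketch of the whole approach, including a boxplot on the ordered factor. It is an illustration rather than the answerer's exact code: it derives the first of each month with format() instead of lubridate, and the seed and the name month_levels are assumptions.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed for reproducibility
dat <- data.frame(
  date = sample(seq(as.Date("2015-01-01"), as.Date("2015-08-01"), by = 1),
                1000, TRUE),
  value = rnorm(1000)
)

# First day of the earliest and latest month present in the data
min_month <- as.Date(format(min(dat$date), "%Y-%m-01"))
max_month <- as.Date(format(max(dat$date), "%Y-%m-01"))

# Chronologically ordered labels such as "January 2015", "February 2015", ...
month_levels <- format(seq(min_month, max_month, by = "month"), "%B %Y")

# Factor whose level order follows the calendar, not the alphabet
dat$group <- factor(format(dat$date, "%B %Y"), levels = month_levels)

p <- ggplot(dat, aes(x = group, y = value)) +
  geom_boxplot()
p
```

ggplot draws a discrete axis in factor-level order, so the boxes appear in calendar order while keeping the full month names.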
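Try adding
scale_x_discrete(limits = month.abb)
so your code would be
MonthlyActivity <- ggplot(DF, aes(x = Month, y = TotalSteps)) +
  geom_boxplot() +
  scale_x_discrete(limits = month.abb)
month.abb is a constant built into base R, so no package beyond ggplot2 is needed. Note that this only orders the axis correctly if Month contains the abbreviated month names ("Jan", "Feb", ...).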
