I have the following code:
library(ggplot2)
library(fpp2)
library(tidyverse)
library(tidyr)
library(lubridate)
library(writexl)
library(plyr)
library(forecast)
Sales171819 <- SalesNL[SalesNL$TransactionDate >= "2017-01-01" & SalesNL$TransactionDate <= "2019-12-31",]
#create time series
myts <- ts(Sales171819[,2],start = decimal_date(as.Date("2017-05-01")), frequency = 365)
#plot time series
view(myts)
autoplot(myts) + ggtitle("TAF Sales NL 2017/2018")+
ylab("SalesQty") + xlab("days")
# seasonal plot sales
ggseasonplot(myts) + ggtitle("Sales Per Dag")+
ylab("Sales") + xlab("Days")
I would like the x axis of both the autoplot and the ggseasonplot to show the actual dates instead of day 1, 2, 3, etc. I would also like to highlight specific points in the plots by their actual dates. How can I edit my code to do this?
The data looks like this:
TransactionDate NetSalesQty
1 2017-05-01 1221
2 2017-05-02 1275
3 2017-05-03 1198
4 2017-05-04 1792
5 2017-05-05 1842
6 2017-05-06 1183
structure(list(TransactionDate = structure(c(17287, 17288, 17289,
17290, 17291, 17292), class = "Date"), NetSalesQty = c(1221,
1293, 1525, 1475, 1854, 2189)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Thanks in advance.
I was not able to make autoplot() work with the myts object, but going by your ylab() and xlab() calls I built this plot directly with ggplot():
Of course you can add geom_line() or others to make it look as you expect.
The code:
library(ggplot2)
SalesNL <- structure(list(TransactionDate = structure(c(17287, 17288, 17289,
17290, 17291, 17292), class = "Date"), NetSalesQty = c(1221,
1293, 1525, 1475, 1854, 2189)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Sales171819 <- SalesNL[SalesNL$TransactionDate >= "2017-01-01" & SalesNL$TransactionDate <= "2019-12-31",]
ggplot(data = Sales171819,
aes(x = TransactionDate,
y = NetSalesQty,
color = ifelse(TransactionDate %in% as.Date(c("2017-05-02", "2017-05-04")), "outstanding", "normal")
)
) +
geom_point() +
scale_x_date(name = "Days",
# date_breaks = "1 day", # uncomment to get a label for every day
breaks = as.Date(c("2017-05-02", "2017-05-04"))) + # just pass a vector with dates you want to highlight
scale_y_continuous(name = "Sales") +
scale_color_manual(name = "highlights",
values = c("outstanding" = "red", "normal" = "black"))
You can also do it the other way around, with a color based on the y value:
ggplot(data = Sales171819,
aes(x = TransactionDate,
y = NetSalesQty,
color = ifelse(
NetSalesQty >= 1500,
"outstanding", # name for the legend and the choice of the color, see scale_color_manual
"normal") # name for the legend and the choice of the color, see scale_color_manual
)) +
geom_point() +
scale_x_date(name = "Days",
# date_breaks = "1 day",
breaks = Sales171819[Sales171819$NetSalesQty >= 1500, 1]$TransactionDate) +
scale_y_continuous(name = "Sales") +
scale_color_manual(name = "highlights",
values = c("outstanding" = "red", "normal" = "black"))
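If you would rather start from the ts object, one option is to rebuild the calendar dates from the known start date and plot the resulting data frame with ggplot2. This is only a minimal sketch using the six sample values from the question. It assumes one observation per consecutive day, and `start_date` and `sales_df` are names introduced here for illustration.

```r
library(ggplot2)

# Sample values from the question's dput(); the start date is taken from the data.
start_date <- as.Date("2017-05-01")
myts <- ts(c(1221, 1293, 1525, 1475, 1854, 2189),
           start = c(2017, 121),   # 1 May is day 121 of 2017
           frequency = 365)

# Assumption: observations fall on consecutive days, so the dates can be
# rebuilt by adding 0, 1, 2, ... days to the start date.
sales_df <- data.frame(
  TransactionDate = start_date + (seq_along(myts) - 1),
  NetSalesQty = as.numeric(myts)
)

p <- ggplot(sales_df, aes(x = TransactionDate, y = NetSalesQty)) +
  geom_line() +
  geom_point() +
  scale_x_date(name = "Days", date_breaks = "1 day", date_labels = "%d %b %Y") +
  scale_y_continuous(name = "Sales")
```

Calling print(p) draws the plot. If the series has gaps (missing days), the offset trick no longer holds and the original TransactionDate column should be used instead.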
I am trying to modify a line graph I have already made. The x axis currently shows the date on which a participant completed a task. However, I would like the x axis to simply show each completed session of the task as day 1, day 2, etc. Is there a way to do this?
My code for the line graph is as follows:
ggplot(data = p07_points_scored, aes(x = day, y = total_score, group = 1)) +
geom_line() +
geom_point() +
theme(axis.text.x = element_text(angle = 60, vjust = 0.5)) +
labs(title=" P07s Total score on the training tool",
x = "Date of training completion",
y = "Total Score",
color = "lightblue") +
geom_smooth()
One more thing: I have 4 separate line graphs, one per participant, showing their total scores within the task. Is there a way to combine the separate graphs into one?
Many thanks :)
Here is an example with fake data. The key point is to create a new column, days, with mutate() and map it to the x axis with fct_inorder():
library(tidyverse)
library(lubridate)
# Create some fake data:
date <- dmy("6-8-2022"):dmy("5-9-2022")
y = rnorm(31, mean = 2300, sd = 100)
df <- tibble(date, y)
df %>%
mutate(days = paste0("day",row_number())) %>%
ggplot(aes(x = fct_inorder(days), y = y, group= 1)) +
geom_point()+
geom_line()
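The reason fct_inorder() matters here is that character labels are otherwise ordered alphabetically, which puts "day10" before "day2". A small base R illustration of the difference (the `days` vector is made up for the example, and `factor(..., levels = unique(...))` is used as a base R stand-in for fct_inorder()):

```r
days <- paste0("day", 1:12)

# Default factor levels are sorted alphabetically: day1, day10, day11, day12, day2, ...
alphabetical <- levels(factor(days))

# Keeping the order of first appearance preserves day1, day2, day3, ...
in_order <- levels(factor(days, levels = unique(days)))

print(head(alphabetical, 3))
print(head(in_order, 3))
```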
data:
df <- structure(list(date = 19210:19240, y = c(2379.71407792736, 2349.90296535465,
2388.14396999868, 2266.84629740315, 2261.95099255488, 2270.90461436351,
2438.19569234793, 2132.6468717962, 2379.46892613664, 2406.13636097426,
2176.9392984643, 2219.0521150482, 2221.22674399102, 2399.82972150781,
2396.76276645913, 2233.62763324748, 2468.98833991591, 2397.47855248058,
2486.96828322353, 2330.04116860874, 2280.66624489061, 2411.09933781266,
2281.06682518505, 2281.63162850277, 2235.66952459084, 2271.2152525563,
2481.86164459452, 2544.25592495568, 2411.90218614317, 2275.60378793237,
2297.98843827031)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-31L))
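For the second part of the question (showing several participants in one graph), a common approach is to stack the per-participant data frames into one long data frame and map the participant to colour. The sketch below uses made-up scores; the participant IDs and the column names `participant` and `session` are assumptions, not the asker's data.

```r
library(ggplot2)

# Made-up data: one data frame per participant, as in the question's setup.
p07 <- data.frame(day = as.Date("2022-08-06") + c(0, 2, 5), total_score = c(10, 14, 19))
p08 <- data.frame(day = as.Date("2022-08-07") + c(0, 1, 4), total_score = c(8, 15, 17))

# Stack them and record which participant each row belongs to.
scores <- rbind(cbind(participant = "P07", p07),
                cbind(participant = "P08", p08))

# Number the sessions within each participant (1, 2, 3, ...) in date order.
scores <- scores[order(scores$participant, scores$day), ]
scores$session <- ave(seq_along(scores$day), scores$participant, FUN = seq_along)

p <- ggplot(scores, aes(x = session, y = total_score, colour = participant)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(scores$session),
                     labels = paste("day", unique(scores$session))) +
  labs(x = "Training session", y = "Total score")
```

Because the x axis is the session number rather than the calendar date, participants who trained on different dates still line up session by session.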
Raw data
structure(list(attainment_target = c(7.5, 15), quarter_2022 = c("Q1",
"Q2"), total_attainment = c(2, 4), percent_attainment = c(0.2666,
0.2666)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
Quarter    | Target | Attainment
2022-01-01 | 7.5    | 2
2022-04-01 | 15     | 4
Scenario
I would like to create a ggplot bar chart (geom_col or geom_bar) with Quarter on the x axis and Attainment on the y axis, with Target drawn as a horizontal dashed line that shows how far off I am from that value.
However, I am having trouble plotting the YTD value (total attainment accumulated over the quarters so far) in the same plot. Below is the output I would like, followed by my attempt at calculating the YTD fields with dplyr:
Desired output
Quarter    | Target | Attainment | YTD | % Attainment
2022-01-01 | 7.5    | 2          | 2   | 27
2022-04-01 | 15     | 4          | 6   | 40
What is the best way to plot this with ggplot in R? Here is my current approach, but I am having trouble incorporating all of the above:
df1 <- df %>%
mutate(YTD_TOTAL = sum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = sum(total_attainment) / max(attainment_target))
ggplot(data = df1, aes(fill=quarter_2022, x=attainment_target, y=total_attainment, color = quarter_2022, palette = "Paired",
label = TRUE,
position = position_dodge(0.9)))
Not sure exactly what you have in mind but here are some of the pieces you might want to use:
df %>%
mutate(YTD_TOTAL = cumsum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = YTD_TOTAL/ attainment_target) %>%
ggplot(aes(quarter_2022, total_attainment)) +
geom_col(aes(y = YTD_TOTAL), fill = NA, color = "gray20") +
geom_text(aes(y = YTD_TOTAL, label = scales::percent(YTD_PERCENT_ATTAINMENT)),
vjust = -0.5) +
geom_col(fill = "gray70", color = "gray20") +
geom_text(aes(label = total_attainment),
position = position_stack(vjust = 0.5)) +
geom_segment(aes(x = as.numeric(as.factor(quarter_2022)) - 0.4,
xend = as.numeric(as.factor(quarter_2022)) + 0.4,
y = attainment_target, yend = attainment_target),
linetype = "dashed")
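The key change from the question's code is cumsum() instead of sum(): sum() returns one grand total repeated on every row, whereas cumsum() returns the running (year-to-date) total, and dividing by each quarter's own target reproduces the percentages in the desired output. A quick check with the question's numbers in base R (the data frame name `ytd` is made up for the example):

```r
ytd <- data.frame(
  quarter_2022 = c("Q1", "Q2"),
  attainment_target = c(7.5, 15),
  total_attainment = c(2, 4)
)

# sum() collapses to a single grand total, recycled onto every row.
ytd$GRAND_TOTAL <- sum(ytd$total_attainment)

# cumsum() gives the running total per quarter.
ytd$YTD_TOTAL <- cumsum(ytd$total_attainment)

# Percent of each quarter's target reached so far, rounded to whole percent.
ytd$YTD_PERCENT <- round(100 * ytd$YTD_TOTAL / ytd$attainment_target)

print(ytd)
```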
I think I'm missing something very easy here, but I just can't figure it out at the moment:
I would like to consistently assign colors to certain values from a column across multiple plots.
So I have this tibble (sl):
# A tibble: 15 x 3
class hex x
<chr> <chr> <int>
1 translational slide #c23b22 1
2 rotational slide #AFC6CE 2
3 fast flow-type #b7bf5e 3
4 complex #A6CEE3 4
5 area subject to rockfall/topple #1F78B4 5
6 fall-type #B2DF8A 6
7 n.d. #33A02C 7
8 NA #FB9A99 8
9 area subject to shallow-slides #E31A1C 9
10 slow flow-type #FDBF6F 10
11 topple #FF7F00 11
12 deep-seated movement #CAB2D6 12
13 subsidence #6A3D9A 13
14 areas subject to subsidence #FFFF99 14
15 area of expansion #B15928 15
This should recreate it:
structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
Now I would like to plot each class as a bar in the color of its hex code (for now just for visualization purposes). So I did:
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = sl$hex) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But these are not the colors as they are in the tibble.
So I tried to follow this guide: How to assign colors to categorical variables in ggplot2 that have stable mapping? and created this:
# create the color palette
mycols = sl$hex
names(mycols) = sl$class
and then plotted it with
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = mycols) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But the result is the same. It's this:
For example, the translational slide has the hex code "#c23b22" and should be a darkish pastel red.
Does anyone have an idea what I'm missing here?
Consider this:
sl <- structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
sl$class <- factor( sl$class, levels=unique(sl$class) )
cl <- sl$hex
names(cl) <- paste( sl$class )
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual( values = cl, na.value = cl["NA"] ) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
By converting class to a factor with explicit levels, using a named vector for the values in scale_fill_manual, and setting na.value there properly, you should get something that looks more like what you expect.
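To see why the named vector matters: when the values passed to scale_fill_manual() are unnamed, they are assigned to the scale's levels in level order, which for a character column is alphabetical and not the row order of the tibble. A named vector is matched by name instead. A small base R illustration with three of the classes (the `classes`, `hex` and `pal` objects are made up for the example):

```r
classes <- c("translational slide", "rotational slide", "fast flow-type")
hex <- c("#c23b22", "#AFC6CE", "#b7bf5e")

# A character column is ordered alphabetically when it becomes a discrete scale,
# so an unnamed palette would give "#c23b22" to "fast flow-type".
level_order <- levels(factor(classes))
print(level_order)

# A named palette is looked up by name, so the order no longer matters.
pal <- setNames(hex, classes)
print(pal[level_order])
```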
You need to supply the colors in the same order as the values of the column you map to fill. Since there is already a column called 'x' in row order, I used it for that. I also replaced NA with the character string 'NA'. I have checked a few of the colors; please let me know if this is not the desired output.
#Assuming df is your dataframe:
df[is.na(df$class), 'class'] <- 'NA'
ggplot(df) +
geom_col(aes(x = x,
y = 1,
fill = factor(x))) +
scale_fill_manual(values = df$hex, labels=df$class) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
I think the problem is that scale_fill_manual expects the order of its values and labels arguments to match the order of the levels being filled. This isn't the case with your dataset.
Does
sl %>% ggplot() +
geom_col(aes(x = x,
y = 1,
fill = hex)) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90) +
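scale_fill_manual(values = setNames(sl$hex, sl$hex), breaks = sl$hex, labels = sl$class) # named values, so each hex code maps to itself regardless of sort order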
Give you what you want?
Next time, please dput() your test data: it took me as long to create the test dataset as to answer your question. Also, using hex codes for colours makes it difficult to check that the colours are as expected. For an MWE, blue/green/black etc. would have been more helpful.
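When a column already holds the literal colours, as hex does here, another option worth trying is scale_fill_identity(), which uses the values as they are and so avoids any ordering issue. A minimal sketch with three of the classes, assuming a cut-down data frame called `sl_small` (a name made up for this example); breaks and labels are passed only so that the legend shows class names rather than hex codes:

```r
library(ggplot2)

sl_small <- data.frame(
  class = c("translational slide", "rotational slide", "fast flow-type"),
  hex   = c("#c23b22", "#AFC6CE", "#b7bf5e"),
  x     = 1:3
)

p <- ggplot(sl_small) +
  geom_col(aes(x = x, y = 1, fill = hex)) +
  geom_text(aes(x = x, y = 0.5, label = class), angle = 90) +
  # Use the hex codes directly as fill colours; guide = "legend" turns the legend back on.
  scale_fill_identity(guide = "legend",
                      breaks = sl_small$hex,
                      labels = sl_small$class)
```

Calling print(p) draws the plot.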
I'm trying to create a plot showing the temporal evolution (x) of different values from the same column (y), thus requiring to create a plot with different lines.
I am able to create separate plots for each value of y, so my problem seems to be specifically about adding multiple lines showing different values (of different lengths it seems).
This is the dput of the columns "Date" and "Journal" I use from my dataset "test" :
> structure(list(Date = structure(c(9132, 9136, 9136, 9141, 9141,
9142), class = "Date", tzone = "Europe/Paris"), Journal = c("Libération",
"Libération", "Libération", "Libération", "Le Monde", "La Tribune (France)"
)), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x000002146c471ef0>, class = c("data.table",
"data.frame"))
I used the following code to successfully create a barplot which shows the evolution of column "Journal" according to the column "Date".
dateplot <- ggplot(cleantest) + aes(x = format(Date, "%Y")) + geom_bar()
I also managed to create single line plots for each value of Y, with the following code :
valueplot <- ggplot(subset(test, Journal %in% "value")) + aes(x = format(Date, "%Y")) + geom_line(stat = "count", na.rm = TRUE, group = 1)
Therefore, I typed the following codes to obtain, for example, two lines in the same plot, and each of them returned a different error :
jourplot <- ggplot(test, aes(x = format(Date, "%Y"))) + geom_line(aes(y = subset(test, Journal %in% "Libération"), colour = "blue")) + geom_line(aes(y = subset(test, Journal %in% "Le Figaro"), colour =
"red"))
The error is :
> Don't know how to automatically pick scale for object of type data.table/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (17307): y
So I tried this :
jourplot <- ggplot(test, aes(x = format(Date, "%Y")) + geom_line(aes(y = subset(test, Journal %in% "Libération"), colour = "blue"), stat = "count", na.rm = TRUE, group = 1) + geom_line(aes(y = subset(test, Journal %in% "Le Figaro"), colour = "red"), stat = "count", na.rm = TRUE, group = 1"))
But this one doesn't even create the "jourplot" object.
There is obviously something wrong with my code and / or my data, but as a newbie I really don't see it. It seems to be about length, but how do I get over this ? Or is this about the classes of the columns that make things difficult to process for ggplot ?
Does anyone understand what is going on ?
Edit : I deleted the "+" symbols from prompt
Is that your full dataset? In my opinion, your example is too small to get a sense of what you are trying to plot.
From my understanding, you are trying to plot the count of each journal per year. But your example covers only a few points in 1995, and some journals appear only once, so you cannot draw a line through a single point.
Here, I simulate a data frame with a sequence of dates covering every week for five years, and I randomly assign one of three journals to each week. Then I round each date down to its year and plot the count for each year as follows:
library(lubridate)
rep_df <- data.frame(Date = seq(ymd("1995-01-01"),ymd("2000-01-01"), by = "weeks"),
Journal = sample(c("Liberation","Le Monde","Le Figaro"), 261, replace = TRUE))
rep_df$Year <- floor_date(rep_df$Date, unit = "year")
head(rep_df)
Date Journal Year
1 1995-01-01 Le Monde 1995-01-01
2 1995-01-08 Le Figaro 1995-01-01
3 1995-01-15 Liberation 1995-01-01
4 1995-01-22 Le Monde 1995-01-01
5 1995-01-29 Liberation 1995-01-01
6 1995-02-05 Liberation 1995-01-01
library(ggplot2)
ggplot(rep_df, aes(x = Year))+
geom_point(aes(color = Journal), stat = "count")+
geom_line(aes(color = Journal),stat = "count")+
scale_x_date(date_labels = "%Y", date_breaks = "1 year", name = "")
Does this look like what you are trying to get?
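As for why the attempts in the question failed: aes(y = subset(...)) hands ggplot a whole data frame where it expects a vector as long as the data, hence the length error. An alternative to stat = "count" is to compute the counts explicitly first and then map Journal to colour, so that each journal gets its own line. A minimal sketch with made-up data (the `test` and `counts` objects below are illustrative, not the asker's dataset):

```r
library(ggplot2)

# Made-up articles: one row per article, with its date and journal.
test <- data.frame(
  Date = as.Date(c("1995-01-02", "1995-06-10", "1996-03-05",
                   "1996-07-19", "1996-11-30", "1997-02-14")),
  Journal = c("Liberation", "Le Monde", "Liberation",
              "Liberation", "Le Monde", "Le Monde")
)
test$Year <- as.integer(format(test$Date, "%Y"))

# Count articles per year and journal: add a column of ones and sum it by group.
counts <- aggregate(n ~ Year + Journal, data = transform(test, n = 1L), FUN = sum)

# One line per journal, because colour = Journal also defines the grouping.
p <- ggplot(counts, aes(x = Year, y = n, colour = Journal)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = sort(unique(counts$Year)))
```

Calling print(p) draws the plot. Years in which a journal has no articles are simply absent from `counts`, so its line skips them rather than dropping to zero.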
I have a simple R script to create a forecast based on a file.
Data has been recorded since 2014, but I am having trouble accomplishing the following two goals:
1. Plot only a subset of the forecast information (from 11/2017 onwards).
2. Show month and year on the axis in a specific format (e.g. Jun 17).
Here is the link to the dataset and below you will find the code made by me so far.
# Load required libraries
library(forecast)
library(ggplot2)
# Load dataset
emea <- read.csv(file="C:/Users/nsoria/Downloads/AMS Globales/EMEA_Depuy_Finanzas.csv", header=TRUE, sep=';', dec=",")
# Create time series object
ts_fin <- ts(emea$Value, frequency = 26, start = c(2014,11))
# Pull out the seasonal, trend, and irregular components from the time series
model <- stl(ts_fin, s.window = "periodic")
# Forecast the next 5 periods (h = 5); with frequency = 26 each period is two weeks
pred <- forecast(model, h = 5)
# Plot the results
plot(pred, include = 5, showgap = FALSE, main = "Ticket amount", xlab = "Timeframe", ylab = "Quantity")
I would appreciate any help or suggestions on these two points, and on getting a clean plot.
Thanks in advance.
Edit 01/10 - Issue 1:
I added the screenshot output for suggested code.
Plot1
Edit 01/10 - Issue 2:
Once the data is transformed with the code below, the dates no longer line up and the results are off. Please see the two screenshots and compare the last value.
Screenshot 1
Screenshot 2
Plotting using ggplot2 w/ ggfortify, tidyverse, lubridate and scales packages
library(lubridate)
library(tidyverse)
library(scales)
library(ggfortify)
# Convert pred from list to data frame object
df1 <- fortify(pred) %>% as_tibble()
# Convert ts decimal time to Date class
df1$Date <- as.Date(date_decimal(df1$Index)) # no format string: the second argument of as.Date() for date-times is a time zone
str(df1)
# Remove Index column and rename other columns
# Select only data pts after 2017
df1 <- df1 %>%
select(-Index) %>%
filter(Date >= as.Date("2017-01-01")) %>%
rename("Low95" = "Lo 95",
"Low80" = "Lo 80",
"High95" = "Hi 95",
"High80" = "Hi 80",
"Forecast" = "Point Forecast")
df1
### Updated: To connect the gap between the Data & Forecast,
# assign the last non-NA row of Data column to the corresponding row of other columns
lastNonNAinData <- max(which(complete.cases(df1$Data)))
df1[lastNonNAinData, !(colnames(df1) %in% c("Data", "Fitted", "Date"))] <- df1$Data[lastNonNAinData]
# Or: use [geom_segment](http://ggplot2.tidyverse.org/reference/geom_segment.html)
plt1 <- ggplot(df1, aes(x = Date)) +
ggtitle("Ticket amount") +
xlab("Time frame") + ylab("Quantity") +
geom_ribbon(aes(ymin = Low95, ymax = High95, fill = "95%")) +
geom_ribbon(aes(ymin = Low80, ymax = High80, fill = "80%")) +
geom_point(aes(y = Data, colour = "Data"), size = 4) +
geom_line(aes(y = Data, group = 1, colour = "Data"),
linetype = "dotted", size = 0.75) +
geom_line(aes(y = Fitted, group = 2, colour = "Fitted"), size = 0.75) +
geom_line(aes(y = Forecast, group = 3, colour = "Forecast"), size = 0.75) +
scale_x_date(breaks = scales::pretty_breaks(), date_labels = "%b %y") +
scale_colour_brewer(name = "Legend", type = "qual", palette = "Dark2") +
scale_fill_brewer(name = "Intervals") +
guides(colour = guide_legend(order = 1), fill = guide_legend(order = 2)) +
theme_bw(base_size = 14)
plt1
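If lubridate is not available, the decimal-year-to-date step and the "from November 2017 onwards" filter can be approximated in base R. This is a rough sketch, not the answer's method: `decimal_to_date` is a hypothetical helper that rounds down to whole days, and `ts_demo` is a made-up biweekly series rather than the asker's data.

```r
# Hypothetical helper: convert a decimal year (e.g. 2017.5) to a Date,
# rounding down to whole days within that year.
decimal_to_date <- function(x) {
  year <- floor(x)
  year_start <- as.Date(paste0(year, "-01-01"))
  year_end <- as.Date(paste0(year + 1, "-01-01"))
  days_in_year <- as.numeric(year_end - year_start)
  year_start + floor((x - year) * days_in_year)
}

# Made-up biweekly series (frequency = 26) starting at period 20 of 2017.
ts_demo <- ts(c(30, 28, 35, 40, 38, 42), frequency = 26, start = c(2017, 20))

demo_df <- data.frame(
  Date = decimal_to_date(as.numeric(time(ts_demo))),
  Value = as.numeric(ts_demo)
)

# Keep only observations from November 2017 onwards.
from_nov <- demo_df[demo_df$Date >= as.Date("2017-11-01"), ]
print(from_nov)
```

The same filter applied to df1 above, with as.Date("2017-11-01") in place of as.Date("2017-01-01"), would restrict the ggplot to the period asked for in the question.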