How to overlap series in ggplot grouped by year in R

I'd like to create a chart of a time series (xts object) by day and month in ggplot to compare trends across years.
Here is how I create the dataset:
sales_ <- read_csv("sales.csv")
sales_ <- sales_ %>% mutate(date_ = as_date(date_))
sales_xts <- xts(sales_[!names(sales_) %in% 'date_'], order.by = sales_$date_)
head(sales_xts)
x1 x2
2017-01-02 10 20000.89
2017-01-03 100 90000.53
2017-01-04 120 100000.16
--- -- --
2018-01-06 16 10000.25
2018-01-09 180 100000.68

You can use tidyr and ggplot2. gather() expects a data frame, so start from the sales_ tibble rather than the xts object:
library(dplyr); library(tidyr); library(ggplot2)
sales_ %>%
gather("key", "value", -date_) %>% # reshapes the data into long format
# key will contain the column names x1 and x2 (you can rename it)
# value will contain the corresponding numbers
# date_ is the date column created above; change it if yours is named differently
ggplot(aes(x = date_, y = value, color = as.factor(key))) +
geom_line(size = 1)
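The snippet above draws each column as one continuous line over time. To lay the years on top of each other, as the question asks, one option is to plot against the day of the year and colour by year. The following is only a sketch under assumptions: the data are randomly generated stand-ins for the sales series, and the helper columns year and doy are names made up for illustration.

```r
library(ggplot2)

# Hypothetical daily data standing in for the question's sales series
set.seed(1)
sales <- data.frame(
  date_ = seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
)
sales$x1 <- round(runif(nrow(sales), 10, 200))

# Split each date into a year (the group) and a day of year (the shared x axis)
sales$year <- format(sales$date_, "%Y")
sales$doy  <- as.integer(format(sales$date_, "%j"))

# One line per year, all drawn on the same January-to-December axis
p <- ggplot(sales, aes(x = doy, y = x1, colour = year, group = year)) +
  geom_line() +
  labs(x = "Day of year", y = "x1", colour = "Year")
p
```

Using the day of year keeps the x axis numeric; for month labels on the axis, the breaks would have to be set by hand.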

Related

Is it possible in R to split my date-time values into 5 different columns (Year, month, date, hour, minute)?

I am really new at R and this is probably a really basic question: Let's say I have a dataset with a column that includes date values of the format ("y-m-d H:M:S") as a Factor value.
How do I split the one column into 5?
Given example:
x <- as.factor(c("2018-01-03 12:34:32.92382", "2018-01-03 12:50:40.00040"))
x <- as_datetime(x) # convert to a date-time (POSIXct), requires lubridate
x <- x %>%
dplyr::mutate(year = lubridate::year(x),
month = lubridate::month(x),
day = lubridate::day(x),
hour = lubridate::hour(x),
minute = lubridate::minute(x),
second = lubridate::second(x))
I get an error saying that mutate cannot be used on objects of class c('POSIXct', 'POSIXt').
Convert it to a data frame first; then the mutate step works:
x %>%
as.data.frame() %>%
rename(x = '.') %>%
dplyr::mutate(year = lubridate::year(x),
month = lubridate::month(x),
day = lubridate::day(x),
hour = lubridate::hour(x),
minute = lubridate::minute(x),
second = lubridate::second(x))
x year month day hour minute second
1 2018-01-03 12:34:32 2018 1 3 12 34 32.92382
2 2018-01-03 12:50:40 2018 1 3 12 50 40.00040
You could also make your mutate a little cleaner by using across with a named list of functions:
library(dplyr)
library(lubridate)
x %>%
data.frame(date = .) %>%
mutate(across(date,
list(year = year, month = month, day = day, hour = hour, minute = minute, second = second),
.names = "{.fn}"))
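If you would rather avoid extra packages, the same split can be sketched in base R through POSIXlt, which stores the components as list elements. This is an illustration only; the UTC time zone and the names ts, lt and parts are assumptions, not part of the original answers.

```r
# Same two timestamps as in the question, stored as a factor
x <- as.factor(c("2018-01-03 12:34:32.92382", "2018-01-03 12:50:40.00040"))

# factor -> character -> POSIXct (the UTC time zone is an assumption)
ts <- as.POSIXct(as.character(x), tz = "UTC")

# POSIXlt exposes the date-time components as list elements
lt <- as.POSIXlt(ts)
parts <- data.frame(
  x      = ts,
  year   = lt$year + 1900L,  # years are stored as offsets from 1900
  month  = lt$mon + 1L,      # months are stored as 0-11
  day    = lt$mday,
  hour   = lt$hour,
  minute = lt$min,
  second = lt$sec
)
parts
```

Converting the factor to character first matters: calling a numeric conversion on a factor would use the internal level codes rather than the text.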

Plotting data in order of date

The aim is to draw a historical plot in R using the following data:
date new_cases
1 22/01/2020 NA
2 23/01/2020 0
3 24/01/2020 1
4 25/01/2020 0
5 26/01/2020 3
6 27/01/2020 0
7 28/01/2020 0
8 29/01/2020 1
9 30/01/2020 0
10 31/01/2020 2
... etc.
I have never plotted like this, the only thing I know is using
plot(data$new_cases)
But surely there is a more aesthetically pleasing way to do this?
Thanks
In base R, you can convert the date column to date (as suggested above) using as.Date. Then, you can plot and label your graph.
df$date <- as.Date(df$date, "%d/%m/%Y")
plot(new_cases ~ date, df, xaxt = "n", type = "l")
axis(1, df$date, format(df$date, "%d/%m/%Y"), cex.axis = .7)
Another option is to use tidyverse and convert the date, then pipe into ggplot2:
library(tidyverse)
df %>%
mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
ggplot(aes(date,new_cases)) +
geom_line()
Or if you just want points, then you can do as @Dave2e suggested:
df$date <- as.Date(df$date, "%d/%m/%Y")
plot(x=df$date, y=df$new_cases)
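All of the options above convert the column with as.Date before plotting. A small sketch of why that step matters: dates kept as dd/mm/yyyy text sort as text, not chronologically. The four dates below are made up for illustration.

```r
# Hypothetical dd/mm/yyyy strings, deliberately out of order
raw <- c("22/01/2020", "01/02/2020", "31/01/2020", "02/02/2020")

# Sorting the text compares characters, so 1 February comes before 22 January
sort(raw)

# Parsing with the matching format string gives real dates
parsed <- as.Date(raw, format = "%d/%m/%Y")

# Sorting the Date vector is chronological
sort(parsed)
```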
For completeness, date is a poor and potentially risky name for a variable and may result in a namespace conflict, as date is also the name of base and lubridate functions. When using this data, I would suggest that you rename the column to case_date or equivalent.
On another matter, if you are using tidyverse you may want to explore lubridate, which is part of tidyverse.
# pasted_data <- read.delim(pipe("pbpaste"))
library("tidyverse")
library("tidyr")
library("lubridate")
pasted_data %>%
separate(
col = 1,
sep = "\\s+",
into = c("col_ind", "case_date", "cases"),
convert = TRUE
) %>%
mutate(case_date = dmy(case_date)) %>%
ggplot(aes(x = case_date, y = cases)) +
geom_line()
For posterity: if pasted in the available format, the data should be separated into distinct columns. In tidyverse, this can be achieved using tidyr::separate.

Ggplot: how to show boxplots in a given order?

I have a peculiar problem with arranging boxplots in a given order on the x-axis. I am adding two boxplots from different dataframes to the same plot, and each time I add the second geom_boxplot, R reorders my x-axis alphabetically instead of following the ordered levels of factor(x).
So, I have two dataframes of different lengths looking something like this:
df1:
id value
1 A 1
2 A 2
3 A 3
4 A 5
5 B 10
6 B 8
7 B 1
8 C 3
9 C 7
df2:
id value
1 A 4
2 A 5
3 B 6
4 B 8
There are always more observations per id in df1 than in df2, and there are some ids in df1 that are not available in df2.
I'd like df1 to be sorted by the median(value) (ascending) and to first plot boxplots for each id in that order.
Then I add a second layer with boxplots for all other measurements per id from df2, which should maintain the same order on the x-axis.
Here's how I approached that:
vec <- df1 %>%
group_by(id) %>%
summarize(m = median(value)) %>%
arrange(m) %>%
pull(id)
p1 <- df1 %>%
ggplot(aes(x = factor(id, levels = vec), y = value)) +
geom_boxplot()
p1
p2 <- p1 +
geom_boxplot(data = df2, aes(x = factor(id, levels = vec), y = value))
p2
p1 shows the right order (ids are ordered based on ascending medians), p2 always throws my order off and goes back to plotting ids alphabetically (my id is a character column with names actually). I tried with sample dataframes and the above code achieves what is required. Hence, I am not sure what could be specifically wrong about my data so that the code fails when applied to the specific data and not the above mock data.
Any ideas?
Thanks a lot in advance!
If I understood correctly, this should work.
library(tidyverse)
# Sample data
df1 <-
tibble(
id = c("A","A","A","A","B","B","B","C","C"),
value = c(1,2,3,5,10,8,1,3,7),
type = "df1"
)
df2 <-
tibble(
id = c("A","A","B","B"),
value = c(4,5,6,8),
type = "df2"
)
# Create a single data frame
df <-
df1 %>%
bind_rows(df2) %>%
# Reorder id by median(value)
mutate(id = fct_reorder(id,value,median))
df %>%
ggplot(aes(id, y = value, fill = type)) +
geom_boxplot()
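Note that the answer above orders id by the median of the combined data, while the question asks for the order given by df1 alone. A possible variation, shown here only as a sketch on the mock data, is to compute the order from df1 and pin it with scale_x_discrete(limits = ...), so the axis order does not depend on how the layers are combined. The red colour and narrower width for the df2 boxes are arbitrary choices made for illustration.

```r
library(ggplot2)

# Mock data from the question
df1 <- data.frame(id = c("A","A","A","A","B","B","B","C","C"),
                  value = c(1,2,3,5,10,8,1,3,7))
df2 <- data.frame(id = c("A","A","B","B"),
                  value = c(4,5,6,8))

# Median per id in df1 only, sorted ascending
med <- tapply(df1$value, df1$id, median)
vec <- names(sort(med))

# Fix the x-axis order explicitly instead of relying on factor levels
p <- ggplot(df1, aes(x = id, y = value)) +
  geom_boxplot() +
  geom_boxplot(data = df2, colour = "red", width = 0.4) +
  scale_x_discrete(limits = vec)
p
```

Ids that appear in df1 but not in df2 (here C) simply get no second box.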

Plotting dummy variables with ggplot2

I actually need help building on this question:
ggplot2 graphic order by grouped variable instead of in alphabetical order.
I need to produce a similar graph and I actually have a problem with the black points. I have data where column names are dates and rows are filled with 0 or 1, and I need to plot a point if the value is 1. To reproduce, here is a small sample (in my dataset, there are over 300 columns):
df <- data.frame(id=c(1,2,3),
"26April1970"=c(0,0,1),
"14August1970"=c(0,1,0))
I need to plot the dates on the x axis, match the id to the canton and show the points where the value is 1.
Could anyone help?
Try this:
library(dplyr); library(tidyr); library(ggplot2)
plot_data = df %>%
## put data in long format
pivot_longer(-id, names_to = "colname") %>%
## keep only 1s
filter(value == 1) %>%
## convert dates to Date class
mutate(date = as.Date(colname, format = "%d%B%Y"))
plot_data
# # A tibble: 2 x 4
# id colname value date
# <dbl> <chr> <dbl> <date>
# 1 2 14August1970 1 1970-08-14
# 2 3 26April1970 1 1970-04-26
## plot
ggplot(plot_data, aes(x = date, y = factor(id))) +
geom_point()
Using this data:
df <- data.frame(id=c(1,2,3),
"26April1970"=c(0,0,1),
"14August1970"=c(0,1,0), check.names = FALSE)
Maybe you are looking for this:
library(ggplot2)
library(dplyr)
library(tidyr)
#Data
df <- data.frame(id=c(1,2,3),
"26April1970"=c(0,0,1),
"14August1970"=c(0,1,0))
#Code
df %>% pivot_longer(-id) %>%
ggplot(aes(x=name,y=factor(value)))+
geom_point(aes(color=factor(value)))+
scale_color_manual(values=c('transparent','black'))+
theme(legend.position = 'none')+xlab('Date')+ylab('value')

box plot for multiple observations

I have multiple observations of rainfall for the same station for around 14 years. The data frame looks something like this:
df (from date 01/01/2000)
v1 v2 v3 v4 v5 v6 ........ v20
1 1 2 4 8 9..............
1.4 4 3.8..................
1.5 3 1.6....................
1.6 8 .....................
.
.
.
.
until 31/12/2013, i.e. a total of 5114 observations
where v1, v2, ... v20 are the rainfall simulations for the same point. I want to draw box plots that show the collective range of quantiles and the median for each month when all the observations are taken together.
I can plot box plot for single monthly values using :
df$month<-factor(month.name,levels=month.name)
library(reshape2)
df.long<-melt(df,id.vars="month")
ggplot(df.long,aes(month,value))+geom_boxplot()
but in this problem the data is daily and there are multiple observations, so I don't know where to start.
sample data
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
In case you want to work with a zoo object:
date<-seq(as.POSIXct("2000-01-01 00:00:00","GMT"),as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min")
If you want, you can also convert it to a zoo object:
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"), as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
I am not familiar with zoo, so I converted your sample to a data frame. Your idea of using melt() is the right way. Then you need to aggregate the rain amount by month. I think it is worth looking up aggregate() and other options. Here, I used dplyr and tidyr to arrange the sample data. I hope this will let you move forward.
### zoo to data frame by # Joshua Ulrich
### http://stackoverflow.com/questions/14064097/r-convert-between-zoo-object-and-data-frame-results-inconsistent-for-different
zoo.to.data.frame <- function(x, index.name="Date") {
stopifnot(is.zoo(x))
xn <- if(is.null(dim(x))) deparse(substitute(x)) else colnames(x)
setNames(data.frame(index(x), x, row.names=NULL), c(index.name,xn))
}
library(zoo)
library(reshape2)
library(dplyr)
library(tidyr)
### to data frame (pass the zoo object x, not the plain data frame df)
foo <- zoo.to.data.frame(x)
str(foo)
### wide to long data frame, aggregate rain amount by Date
ana <- foo %>%
melt(., id.vars = "Date") %>%
group_by(Date) %>%
summarize(rain = sum(value))
### Aggregate rain amount by year and month
bob <- ana %>%
separate(Date, c("year", "month", "date")) %>%
group_by(year, month) %>%
summarize(rain = sum(rain))
### Drawing a ggplot figure
ggplot(data = bob, aes(x = month, y = rain)) +
geom_boxplot()
I just found an easier way to do it; however, your answer really helped, jazzuro.
install.packages("reshape2")
library(zoo)
library(dplyr)
library(reshape2)
require(ggplot2)
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"),
as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
v<-aggregate(x, as.yearmon, mean)
months<- rep(1:12,14)
lol<-data.frame(v,months)
df.m <- melt(lol, id.var = "months")
View(df.m)
p <- ggplot(df.m, aes(factor(months), value))
p + geom_boxplot(aes(fill = months))
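For comparison, the same idea can be sketched in base R without zoo or reshape2: build a year-month key, aggregate each simulation by that key, then pool all simulations into one box per calendar month. This is an assumption-laden illustration: the exponential random data stand in for the 20 rainfall simulations, monthly totals are used rather than means, and the names sims, ym, monthly, mon and long are made up here.

```r
set.seed(42)
dates <- seq(as.Date("2000-01-01"), as.Date("2013-12-31"), by = "day")

# Hypothetical stand-in: 20 simulated daily rainfall series (one per column)
sims <- matrix(rexp(length(dates) * 20, rate = 0.5), ncol = 20)

# Year-month key for each day, e.g. "2000-01"
ym <- format(dates, "%Y-%m")

# Monthly totals per simulation: rows are year-months, columns are simulations
monthly <- rowsum(sims, group = ym)

# Calendar month of each row, as a factor ordered Jan..Dec
mon <- factor(month.abb[as.integer(substr(rownames(monthly), 6, 7))],
              levels = month.abb)

# Pool all simulations: as.vector() stacks the columns, so repeat mon per column
long <- data.frame(month = rep(mon, times = ncol(monthly)),
                   rain  = as.vector(monthly))

# One box per calendar month across all years and simulations
boxplot(rain ~ month, data = long, xlab = "Month", ylab = "Monthly rainfall")
```

Swapping the totals for monthly means would only change the aggregation step, for example by dividing each row of monthly by the number of days in that month.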
