Plotting data in order of date - r

The aim is to plot a time series in R using the following data:
date new_cases
1 22/01/2020 NA
2 23/01/2020 0
3 24/01/2020 1
4 25/01/2020 0
5 26/01/2020 3
6 27/01/2020 0
7 28/01/2020 0
8 29/01/2020 1
9 30/01/2020 0
10 31/01/2020 2
... etc.
I have never plotted anything like this; the only thing I know is using
plot(data$new_cases)
But surely there is a more aesthetically pleasing way to do this?
Thanks

In base R, you can convert the date column to the Date class (as suggested above) using as.Date. You can then plot and label your graph.
df$date <- as.Date(df$date, "%d/%m/%Y")
plot(new_cases ~ date, df, xaxt = "n", type = "l")
axis(1, df$date, format(df$date, "%d/%m/%Y"), cex.axis = .7)
Another option is to use tidyverse and convert the date, then pipe into ggplot2:
library(tidyverse)
df %>%
mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
ggplot(aes(date,new_cases)) +
geom_line()
Or if you just want points, then you can do as @Dave2e suggested:
df$date <- as.Date(df$date, "%d/%m/%Y")
plot(x=df$date, y=df$new_cases)
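As a rough sketch of why the conversion matters for plotting "in order of date": once the column is a real Date, the rows can be sorted chronologically before plotting. The sample values below are made up in the question's layout and deliberately shuffled; the plot call is guarded so it only draws in an interactive session.

```r
# Hypothetical sample in the question's layout, deliberately out of order
df <- data.frame(
  date = c("24/01/2020", "22/01/2020", "23/01/2020"),
  new_cases = c(1, NA, 0)
)

# Parse day/month/year text into Date objects
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# Sort rows chronologically; sorting the raw strings would order by day first
df <- df[order(df$date), ]

# Draw only in an interactive session; NA values leave gaps in the line
if (interactive()) {
  plot(new_cases ~ date, data = df, type = "b")
}
```

Sorting the unparsed strings would compare them character by character, which is why the conversion comes first.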

For completeness: date is a poor and potentially perilous name for a variable and may result in a naming conflict, since date is also the name of functions in base R and lubridate. When using this data, I would suggest renaming the column to case_date or equivalent.
On another matter, if you are using tidyverse you may want to explore lubridate, which is part of the tidyverse.
# pasted_data <- read.delim(pipe("pbpaste"))
library("tidyverse")
library("tidyr")
library("lubridate")
pasted_data %>%
separate(
col = 1,
sep = "\\s+",
into = c("col_ind", "case_date", "cases"),
convert = TRUE
) %>%
mutate(case_date = dmy(case_date)) %>%
ggplot(aes(x = case_date, y = cases)) +
geom_line()
For posterity: if the data is pasted in the format shown, it first needs to be separated into distinct columns. In the tidyverse, this can be achieved using tidyr::separate.

Related

How to Invert date format

I have a large database with one of the columns containing dates with the following format: DD-MM-YYYY.
I would like to invert the date format, to something like this: YYYY-MM-DD.
Can someone tell me how I can do it using bash or R?
A possible solution:
library(tidyverse)
library(lubridate)
df <- data.frame(date=c("11-4-2021","5-6-2019"))
df %>%
mutate(date2 = dmy(date) %>% ymd)
#> date date2
#> 1 11-4-2021 2021-04-11
#> 2 5-6-2019 2019-06-05
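If you would rather avoid extra packages, the same inversion can be sketched in base R with as.Date and format. The input vector here is a made-up example, not the asker's data.

```r
# Hypothetical input in DD-MM-YYYY form, with and without zero padding
x <- c("11-4-2021", "5-6-2019", "30-12-2021")

# Parse as day-month-year, then write the dates back out as YYYY-MM-DD text
iso <- format(as.Date(x, format = "%d-%m-%Y"), "%Y-%m-%d")
print(iso)
# [1] "2021-04-11" "2019-06-05" "2021-12-30"
```

Incomplete values such as MM-YYYY will generally not match this format and should come back as NA, so they need separate handling.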
In bash, we can use string manipulation:
dmy=30-12-2021
echo "${dmy:6:4}-${dmy:3:2}-${dmy:0:2}" # 2021-12-30
or with read:
IFS="-" read -r d m y <<<"$dmy"
echo "$y-$m-$d"
I used R to solve my problem like this:
df is a data.frame with dates in the column "eventDate". Dates were in the format DD-MM-YYYY. There were several cells with incomplete dates (e.g. MM-YYYY or YYYY).
library(tidyr)
x <- separate(df, col = eventDate, into = c("day", "month", "year"), sep="-")
y <- x %>% unite("eventDate_2", year, month, day, remove=TRUE, sep="-", na.rm= TRUE) # columns are pasted in the order listed
y <- cbind(y, df$eventDate) # add the original column for comparing if it had worked and correct individual errors.

Plotting Time Series Data in R and Tidy

I am trying to use lubridate to sort out time series data from my temperature sensors. I would ultimately like a plot that has time on the x axis and temperature on the y axis. I have been using the function parse_date_time to try and create a new date variable but all I get is NA.
temps<-temps %>% as_tibble() %>%
mutate(date = parse_date_time(Date.Time..GMT..0500, "mdYHM"))
temps
The problem is that you used a capital Y when the year part only contains two digits. You should use a lowercase y, i.e.
temps %>% as_tibble() %>%
mutate(date = parse_date_time(Date.Time..GMT..0500, "mdyHM"))
To produce a simple plot, here is some basic code:
ggplot(temps) +
aes(x = date, y = TempF) +
geom_line()
For further details on the plot itself, I suggest you have a look at the ggplot2 documentation.
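The same two-digit versus four-digit distinction exists in base R's format codes. As a minimal sketch with made-up timestamps in the question's layout, %y reads a two-digit year:

```r
# Hypothetical timestamps in the question's month/day/two-digit-year layout
stamps <- c("6/18/18 12:57", "6/18/18 13:57")

# %y reads a two-digit year; %Y expects the full four-digit year
parsed <- as.POSIXct(stamps, format = "%m/%d/%y %H:%M", tz = "UTC")
print(format(parsed, "%Y-%m-%d %H:%M"))
# [1] "2018-06-18 12:57" "2018-06-18 13:57"
```

The tz = "UTC" argument is an assumption for the example; use whatever zone your sensors actually logged in.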
With my sample data, it worked:
temps <- data.frame(
Date.Time..GMT..0500 = c("6/18/18 12:57", "6/18/18 13:57", "6/18/18 14:57"),
var = c(1,2,3)
)
parse_date_time(temps$Date.Time..GMT..0500, "mdYHM")
# [1] "2018-06-18 12:57:00 UTC" "2018-06-18 13:57:00 UTC" "2018-06-18 14:57:00 UTC"

How to overlap series in ggplot group by year in R

I'd like to create a chart representing a time series (xts object) by days and months in ggplot to compare trends over years.
Here is how I create the dataset:
sales_ <- read_csv("sales.csv")
sales_ <- sales_ %>% mutate(date_ = as_date(date_))
sales_xts <- xts(sales_[!names(sales_) %in% 'date_'], order.by = sales_$date_)
head(sales_xts)
x1 x2
2017-01-02 10 20000.89
2017-01-03 100 90000.53
2017-01-04 120 100000.16
--- -- --
2018-01-06 16 10000.25
2018-01-09 180 100000.68
You can use tidyr and ggplot2. Note that gather() comes from tidyr and needs a data frame, so this starts from the sales_ tibble rather than the xts object:
library(tidyr); library(ggplot2)
sales_ %>%
gather("key", "value", -date_) %>% # gathers your data into the long format
# key will contain the names x1 and x2 (you can change it)
# value will contain the numbers of each
# date_ is the date column from the question; change it if yours is named differently
ggplot(aes(x = date_, y = value, color = as.factor(key))) +
geom_line(size = 1)
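To actually overlap the years, one possible approach (a sketch, not the answerer's method) is to split each date into a year, used for grouping, and a day of year, used as the shared x axis. The data frame and the year and doy column names below are made up for illustration.

```r
# Hypothetical daily values spanning two years (stand-in for the sales data)
sales <- data.frame(
  date_ = as.Date(c("2017-01-02", "2017-01-03", "2018-01-02", "2018-01-03")),
  x1 = c(10, 100, 16, 180)
)

# Year becomes the grouping variable, day of year the shared x axis
sales$year <- format(sales$date_, "%Y")
sales$doy <- as.integer(format(sales$date_, "%j"))

# One list element per year, ready to be drawn as separate lines
by_year <- split(sales, sales$year)
print(names(by_year))
# [1] "2017" "2018"
```

In ggplot2 the same columns could then be mapped as aes(x = doy, y = x1, color = year) with geom_line(), which draws one line per year on a common axis.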

Adding date ticks to ggplot in R

I am trying to add ticks to my x-axis in this graph to show all the months of the year:
My code is as follows:
library(ggplot2)
library(scales)
p <- ggplot(df_test, aes(time, reading))
p + geom_point(alpha = 1/4) + geom_smooth()
I have tried to use scale_x_date but have come across the following error:
Error: Invalid input: date_trans works with objects of class Date only
Here's the data frame I'm using:
hour reading date time
1 53 1/1/15 2015-01-01 01:00:00
2 55 1/1/15 2015-01-01 02:00:00
3 56 1/1/15 2015-01-01 03:00:00
The class of my time variable:
class(df_test$time)
"POSIXct" "POSIXt"
Use scale_x_date(breaks="month", labels=date_format("%b")). Here's an example.
library(quantmod)
sp500 <- getSymbols("SP500", src="FRED", auto.assign=FALSE)
sp500 <- sp500["2015-01-01::"]
sp500 <- data.frame(time=as.POSIXct(index(sp500), origin="1970-01-01"),sp500)
class(sp500$time)
# [1] "POSIXct" "POSIXt"
library(ggplot2)
library(scales) # for date_format(...)
ggplot(sp500, aes(x=as.Date(time), y=SP500))+
geom_line()+
scale_x_date(breaks="month", labels=date_format("%b"))
You are trying to use a scale specific to Date on a POSIXct object. The solution is to convert the POSIXct object to a Date using as.Date:
> Sys.time()
[1] "2015-09-16 09:52:42 CEST"
> as.Date(Sys.time())
[1] "2015-09-16"
To do this on your data.frame, I recommend using the dplyr package:
df_test = df_test %>% mutate(time = as.Date(time))
Combining the answers of both @PaulHiemstra and @jihoward, I've managed to figure out an answer.
First using the dplyr library to rework the data:
library(dplyr)
df_test1 = df_test %>% mutate(time = as.Date(time))
Then using scale_x_date:
library(ggplot2)
library(scales)
p <- ggplot(df_test1, aes(time, reading))
p + geom_point(alpha = 1/4)+
scale_x_date(breaks="month", labels=date_format("%b"))
Gives the result:
Since the most recent answers (2015), date_format() has been superseded (I believe). Replace it with label_date(). The scales package is installed along with ggplot2 but is not attached by library(ggplot2), so either call library(scales) or use the scales:: prefix.
Here's a copy and paste of the accepted answer by @timothyylim with that change.
library(dplyr)
df_test1 = df_test %>% mutate(time = as.Date(time))
library(ggplot2)
library(scales)
p <- ggplot(df_test1, aes(time, reading))
p + geom_point(alpha = 1/4)+
scale_x_date(breaks="month", labels = scales::label_date("%b"))

box plot for multiple observations

I have multiple observations of rainfall for the same station over around 14 years. The data frame looks something like this:
df (from date -01/01/2000)
v1 v2 v3 v4 v5 v6 ........ v20
1 1 2 4 8 9..............
1.4 4 3.8..................
1.5 3 1.6....................
1.6 8 .....................
.
.
.
.
up to 31/12/2013, i.e. 5114 observations in total
where v1, v2, ..., v20 are the rainfall simulations for the same point. I want to draw a box plot that shows the monthly median and range of quantiles when all the observations are taken together.
I can draw a box plot for single monthly values using:
df$month<-factor(month.name,levels=month.name)
library(reshape2)
df.long<-melt(df,id.vars="month")
ggplot(df.long,aes(month,value))+geom_boxplot()
But in this problem the data is daily and there are multiple observations, so I have no idea where to start.
sample data
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
If you want to work with a zoo object, here is the date index:
date<-seq(as.POSIXct("2000-01-01 00:00:00","GMT"),as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min")
You can also convert the data frame to a zoo object:
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"), as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
I am not familiar with zoo, so I converted your sample to a data frame. Your idea of using melt() is the right way to go. Then you need to aggregate the rain amount by month. It is worth looking up aggregate() and other options. Here, I used dplyr and tidyr to arrange the sample data. I hope this lets you move forward.
### zoo to data frame by # Joshua Ulrich
### http://stackoverflow.com/questions/14064097/r-convert-between-zoo-object-and-data-frame-results-inconsistent-for-different
zoo.to.data.frame <- function(x, index.name="Date") {
stopifnot(is.zoo(x))
xn <- if(is.null(dim(x))) deparse(substitute(x)) else colnames(x)
setNames(data.frame(index(x), x, row.names=NULL), c(index.name,xn))
}
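### to data frame (x is the zoo object created in the question; df itself is a plain data frame)
library(zoo)
foo <- zoo.to.data.frame(x)
str(foo)
library(dplyr)
library(tidyr)
library(reshape2) # for melt()
library(ggplot2)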
### wide to long data frame, aggregate rain amount by Date
ana <- foo %>%
melt(., id.vars = "Date") %>%
group_by(Date) %>%
summarize(rain = sum(value))
### Aggregate rain amount by year and month
bob <- ana %>%
separate(Date, c("year", "month", "date")) %>%
group_by(year, month) %>%
summarize(rain = sum(rain))
### Drawing a ggplot figure
ggplot(data = bob, aes(x = month, y = rain)) +
geom_boxplot()
I just found an easier way to do it; however, your answer really helped, jazzuro.
install.packages("reshape2")
library(dplyr)
library(reshape2)
library(zoo) # for zoo() and as.yearmon
require(ggplot2)
df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"),
as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
v<-aggregate(x, as.yearmon, mean)
months<- rep(1:12,14)
lol<-data.frame(v,months)
df.m <- melt(lol, id.var = "months")
View(df.m)
p <- ggplot(df.m, aes(factor(months), value))
p + geom_boxplot(aes(fill = months))
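For comparison, here is a rough base R sketch of a slightly different reading of the question: instead of averaging by month first, it pools every daily value from every series into one box per calendar month. The dates, the three series and the rain, month and long names are all made up for illustration.

```r
# Hypothetical stand-in: three simulated daily rainfall series over two years
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
n <- length(dates)
rain <- data.frame(v1 = rexp(n), v2 = rexp(n), v3 = rexp(n))

# Label every day with its calendar month, kept in calendar order
month <- factor(month.name[as.integer(format(dates, "%m"))], levels = month.name)

# Stack the series column by column: one row per day per series
long <- data.frame(
  month = rep(month, times = ncol(rain)),
  value = unlist(rain, use.names = FALSE)
)

# One box per month over all days and all series (interactive sessions only)
if (interactive()) {
  boxplot(value ~ month, data = long, las = 2)
}
```

Whether pooling daily values or aggregating to monthly values first is appropriate depends on what the boxes are meant to summarise.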
