Keep only the year from a timestamp column in R

Given a data frame like this:
df <- data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"), stringsAsFactors = FALSE)
How can I keep only the year in the timestamp column, given that all years are after 2000? Expected output:
data.frame(id = c(1,3), timestamp = c("2009", "2017"), stringsAsFactors = FALSE)

Base R:
format(as.Date(df$timestamp, "%d-%m-%Y %H:%M:%S"), "%Y")
[1] "2009" "2017"
So in the dataframe:
df$year <- format(as.Date(df$timestamp, "%d-%m-%Y %H:%M:%S"), "%Y")
id timestamp year
1 1 20-10-2009 11:35:12 2009
2 3 01-01-2017 12:21:21 2017
Another option, if you're familiar with regular expressions, is this:
sub(".*([0-9]{4}).*", "\\1", df$timestamp)
[1] "2009" "2017"
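If you want the timestamp column itself replaced by an integer year rather than a character string, one base R sketch is to parse with as.POSIXlt() and read its year component, which counts years since 1900. The format string assumes the dd-mm-YYYY HH:MM:SS layout shown in the question.

```r
# Example data from the question (with the argument spelled stringsAsFactors)
df <- data.frame(id = c(1, 3),
                 timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"),
                 stringsAsFactors = FALSE)

# Parse the day-month-year timestamps; tz = "UTC" avoids daylight-saving surprises
parsed <- as.POSIXlt(df$timestamp, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# POSIXlt stores years as offsets from 1900, so add 1900 to get the calendar year
df$timestamp <- parsed$year + 1900L
df
```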

See if this answers your question. The code and its output are as follows:
library(lubridate)
library(tidyverse)
df <- data.frame(id = c(1,3,4), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21","01-01-1998 12:21:21"), stringsAsFactors = FALSE)
df$timestamp <- dmy_hms(df$timestamp)
df1 <- df %>%
filter(year(timestamp) > 2000) %>%
mutate(new_year = year(timestamp))
df1
#id timestamp new_year
#1 1 2009-10-20 11:35:12 2009
#2 3 2017-01-01 12:21:21 2017
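The same filter-then-add-a-column idea can be sketched in base R, without lubridate or dplyr, by formatting the parsed timestamps with "%Y" and converting the result to integer. This is an alternative technique, not the answer's own code.

```r
df <- data.frame(id = c(1, 3, 4),
                 timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21",
                               "01-01-1998 12:21:21"),
                 stringsAsFactors = FALSE)

# Parse the timestamps and pull out the four-digit year as an integer
parsed <- as.POSIXct(df$timestamp, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
df$new_year <- as.integer(format(parsed, "%Y"))

# Keep only rows after 2000, mirroring filter(year(timestamp) > 2000)
df1 <- subset(df, new_year > 2000)
df1
```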

If you're not afraid of external packages, one option would be to make use of the lubridate package:
library(dplyr)
df <- data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"))
df <- df %>%
mutate(timestamp = lubridate::dmy_hms(timestamp)) %>%
mutate(year = lubridate::year(timestamp))
If you want to replace the timestamp column instead, change the last mutate() call so that it assigns the year to timestamp rather than to a new year column. Result:
id timestamp year
1 1 2009-10-20 11:35:12 2009
2 3 2017-01-01 12:21:21 2017

I have a tidyverse solution to your problem:
library(tidyverse)
data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"), stringsAsFactors = FALSE) %>%
mutate(timestamp = timestamp %>%
str_extract("\\d{4}"))
str_extract("\\d{4}") extracts the first run of four consecutive digits in each value. Here that is always the year, because the day and month fields have only two digits each.
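A base R counterpart of str_extract() is regmatches() combined with regexpr(), which likewise returns the first match of the pattern in each string. Because "[0-9]{4}" requires four consecutive digits, the two-digit day and month fields are skipped.

```r
x <- c("20-10-2009 11:35:12", "01-01-2017 12:21:21")

# regexpr() locates the first run of four digits; regmatches() extracts it
years <- regmatches(x, regexpr("[0-9]{4}", x))
years
```

Note that regmatches() drops elements with no match, so the result can be shorter than the input if some timestamps lack a four-digit year.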

Related

Loop to convert to date for multiple columns in R

I have a dataframe like this:
df <- data.frame(id = c("a", "b"), date1 = c("06/10/2003", "2006-05-12"), date2 = c("2003-07-15", "10/01/2010"))
id date1 date2
a 06/10/2003 2003-07-15
b 2006-05-12 10/01/2010
I would like to convert these characters to dates. So far, I have been able to do it one column at a time with the following code:
df$new_date <- as.Date(df$date1, format = "%m/%d/%Y")
df$new_date2 <- as.Date(df$date1, format = "%Y-%m-%d")
df <- df %>%
mutate(date1 = coalesce(new_date,new_date2))
But I have a bunch of columns; is there a way to loop over them? Thanks in advance!
We can use a function from lubridate, along with across within mutate:
library(tidyverse)
df %>%
mutate(across(starts_with("date"),
~lubridate::parse_date_time(.,orders = c("mdy", "ymd"))))
# id date1 date2
# 1 a 2003-06-10 2003-07-15
# 2 b 2006-05-12 2010-10-01
You could reshape the data frame using pivot_longer so all of the dates are in one column, then use a vectorized condition to address each of the formatting variations in turn, then use pivot_wider to return the data frame to its original shape.
library(tidyverse)
pivot_longer(df, cols = c("date1", "date2")) %>%
mutate(
value = case_when(
grepl("-", value) ~ as.Date(value, format = "%Y-%m-%d"),
grepl("/", value) ~ as.Date(value, format = "%m/%d/%Y")
)
) %>%
pivot_wider(names_from = "name", values_from = "value")
# A tibble: 2 x 3
id date1 date2
<chr> <date> <date>
1 a 2003-06-10 2003-07-15
2 b 2006-05-12 2010-10-01
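A base R version of the loop, with no lubridate or tidyr, can try each known format in turn and fill in only the values that are still missing. The helper name parse_mixed() is hypothetical, and the format list assumes that only the two layouts from the question occur.

```r
df <- data.frame(id = c("a", "b"),
                 date1 = c("06/10/2003", "2006-05-12"),
                 date2 = c("2003-07-15", "10/01/2010"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: try each format and keep the first one that parses
parse_mixed <- function(x, formats = c("%m/%d/%Y", "%Y-%m-%d")) {
  out <- as.Date(rep(NA_character_, length(x)), format = "%Y-%m-%d")  # all-NA Date vector
  for (f in formats) {
    todo <- is.na(out)  # only re-parse values that are still unparsed
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

# Apply the helper to every column whose name starts with "date"
date_cols <- grep("^date", names(df))
df[date_cols] <- lapply(df[date_cols], parse_mixed)
df
```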

How to reorder file in chronological order

I have a dataset with multiple columns, and I'd like to reorder it chronologically by date!
Station year ID
1 2020 D
2 2019 C
3 2017 A
4 2018 B
This is a really bad example but would there be a code to reorder by date oldest to newest?
Station year ID
3 2017 A
4 2018 B
2 2019 C
1 2020 D
To look something like this!
Any help would be amazing! :)
Thank you
Well... "2020" is not a date, so you can simply order that column as a regular integer.
But if you had dates like "2020-01-25", converting the strings to dates and sorting them is as easy as:
library(dplyr)
df <- tibble(n = c(1,2,3,4),
dt = c("2020-01-01","2019-01-01","2017-01-01", "2018-01-01"),
l = c("D","C","A","B"))
# convert the strings to Date, then sort oldest to newest
df <- df %>%
mutate(dt = as.Date(dt)) %>%
arrange(dt)
Use the ymd() function from the lubridate package to convert dt to a date, and year() to extract the year. Once converted, you can sort with arrange():
library(dplyr)
library(lubridate)
# data borrowed from abreums
df <- tibble(n = c(1,2,3,4),
dt = c("2020-01-01","2019-01-01","2017-01-01", "2018-01-01"),
l = c("D","C","A","B"))
df1 <- df %>%
mutate(dt = ymd(dt), # "2020-01-01"
dt = year(dt)) %>% # "2020"
arrange(dt)
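For the year-only table in the question, a base R sketch needs no date conversion at all: order() on the integer year column gives the oldest-to-newest row order. Column names are taken from the question's example.

```r
# The example table from the question; year is a plain integer, not a date
stations <- data.frame(Station = 1:4,
                       year = c(2020L, 2019L, 2017L, 2018L),
                       ID = c("D", "C", "A", "B"),
                       stringsAsFactors = FALSE)

# order() returns the row indices that sort year from oldest to newest
stations <- stations[order(stations$year), ]
rownames(stations) <- NULL  # renumber rows 1..n after sorting
stations
```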

Standardizing the date format using R

I am having trouble standardizing the date format to dd-Mon-YYYY (for example 23-Jul-2020). This is my current code:
Dataset
date
1 23/07/2020
2 22-Jul-2020
Current Output
df$date<-as.Date(df$date)
df$date = format(df$date, "%d-%b-%Y")
date
1 20-Jul-0022
2 <NA>
Desired Output
date
1 23-Jul-2020
2 22-Jul-2020
You can try this:
library(lubridate)
df$date <- dmy(df$date)
df$date <- format(df$date, format = "%d-%b-%Y")
# date
# 1 23-Jul-2020
# 2 22-Jul-2020
Data
df <- read.table(text = "date
1 23/07/2020
2 22-Jul-2020", header = TRUE)
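I've saved your example data set as a data frame named df. I used group_by() from dplyr so that each date is converted separately with the format that fits it. This matters because, when no format is given, as.Date() picks one entry of tryFormats based on the first non-NA element and applies it to the whole vector.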
library(tidyverse)
df %>%
group_by(date) %>%
mutate(date = as.Date(date, tryFormats = c("%d-%b-%Y", "%d/%m/%Y"))) %>%
mutate(date = format(date, "%d-%b-%Y"))
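A base R sketch that avoids grouping is to parse with the first format, then re-parse only the entries that came back NA with the second. Because %b matches month abbreviations in the current locale, this sketch assumes English month names and switches LC_TIME to "C" for that reason; note that this changes a session-wide setting.

```r
# Assumption: month abbreviations such as "Jul" are English, so use the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

df <- data.frame(date = c("23/07/2020", "22-Jul-2020"), stringsAsFactors = FALSE)

# First pass: dd/mm/YYYY; entries in any other layout become NA
parsed <- as.Date(df$date, format = "%d/%m/%Y")

# Second pass: re-parse only the NAs as dd-Mon-YYYY
still_na <- is.na(parsed)
parsed[still_na] <- as.Date(df$date[still_na], format = "%d-%b-%Y")

# Standardize everything to the dd-Mon-YYYY display format
df$date <- format(parsed, "%d-%b-%Y")
df
```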

Format Date to Year-Month in R

I would like to keep my date column in year-month format, but still as a date. It currently gets converted to character. I have tried as_datetime() but it coerces all values to NA.
The format I am looking for is: "2017-01"
library(lubridate)
df<- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df$Date <- as_datetime(df$Date)
df$Date <- ymd(df$Date)
df$Date <- strftime(df$Date,format="%Y-%m")
Thanks in advance!
lubridate only handles dates, and dates have days. However, as alistaire mentions, you can floor them to the month if you want to work monthly:
library(tidyverse)
library(lubridate)
df_month <-
df %>%
mutate(Date = floor_date(as_date(Date), "month"))
If you want to aggregate by month, for example, just use group_by() and summarize():
df_month %>%
group_by(Date) %>%
summarize(N = sum(N)) %>%
ungroup()
#> # A tibble: 4 x 2
#> Date N
#> <date> <dbl>
#>1 2017-01-01 59
#>2 2018-01-01 20
#>3 2018-02-01 33
#>4 2018-03-01 45
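Without lubridate, a base R sketch of the same floor-then-summarize idea formats each date as its year and month with the day fixed to 01, converts that back to Date, and sums with aggregate(). The column names Month and label are chosen for illustration.

```r
df <- data.frame(Date = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04",
                          "2018-01-01", "2018-01-02", "2018-02-01", "2018-03-02"),
                 N = c(24, 10, 13, 12, 10, 10, 33, 45),
                 stringsAsFactors = FALSE)

# Floor each date to the first of its month; the result is still of class Date
df$Month <- as.Date(format(as.Date(df$Date), "%Y-%m-01"))

# Sum N within each month
monthly <- aggregate(N ~ Month, data = df, FUN = sum)

# For display, a year-month label can be formatted separately
monthly$label <- format(monthly$Month, "%Y-%m")
monthly
```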
You can solve this with the zoo::as.yearmon() function:
library(tidyquant)
library(magrittr)
library(dplyr)
df <- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df %<>% mutate(Date = zoo::as.yearmon(Date))
You can use the cut() function with breaks = "month" to map every date to the first day of its month, so all dates within the same month get the same value in the new column.
This is useful for grouping the other variables in your data frame by month (essentially what you are trying to do). cut() returns a factor, but that can be converted back to a date, so you can still keep the date class in your data frame.
You just can't get rid of the day in a date (because then it is no longer a date). Afterwards you can create a nice format for axes or tables. For example:
true_date <-
as.POSIXlt(
c(
"2017-01-01",
"2017-01-02",
"2017-01-03",
"2017-01-04",
"2018-01-01",
"2018-01-02",
"2018-02-01",
"2018-03-02"
),
format = "%F"
)
df <-
data.frame(
Date = cut(true_date, breaks = "month"),
N = c(24, 10, 13, 12, 10, 10, 33, 45)
)
## here df$Date is a 'factor'. You could use substr to create a formatted column
df$formated_date <- substr(df$Date, start = 1, stop = 7)
## and you can convert back to date class. format = "%F", is ISO 8601 standard date format
df$true_date <- strptime(x = as.character(df$Date), format = "%F")
str(df)

Aggregate Daily Data to Month/Year intervals

I don't often have to work with dates in R, but I imagine this is fairly easy. I have a column that represents a date in a data frame. I simply want to create a new data frame that summarizes a second column by month/year using the date. What is the best approach?
I want a second dataframe so I can feed it to a plot.
Any help you can provide will be greatly appreciated!
EDIT: For reference:
> str(temp)
'data.frame': 215746 obs. of 2 variables:
$ date : POSIXct, format: "2011-02-01" "2011-02-01" "2011-02-01" ...
$ amount: num 1.67 83.55 24.4 21.99 98.88 ...
> head(temp)
date amount
1 2011-02-01 1.670
2 2011-02-01 83.550
3 2011-02-01 24.400
4 2011-02-01 21.990
5 2011-02-03 98.882
6 2011-02-03 24.900
I'd do it with lubridate and plyr, rounding dates down to the nearest month to make them easier to plot:
library(lubridate)
df <- data.frame(
date = today() + days(1:300),
x = runif(300)
)
df$my <- floor_date(df$date, "month")
library(plyr)
ddply(df, "my", summarise, x = mean(x))
There is probably a more elegant solution, but splitting into months and years with strftime() and then calling aggregate() should do it. Then reassemble the date for plotting.
x <- as.POSIXct(c("2011-02-01", "2011-02-01", "2011-02-01"))
mo <- strftime(x, "%m")
yr <- strftime(x, "%Y")
amt <- runif(3)
dd <- data.frame(mo, yr, amt)
dd.agg <- aggregate(amt ~ mo + yr, dd, FUN = sum)
dd.agg$date <- as.POSIXct(paste(dd.agg$yr, dd.agg$mo, "01", sep = "-"))
A bit late to the game, but another option would be using data.table:
library(data.table)
setDT(temp)[, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = month(date))]
# or if you want to apply the 'mean' function to several columns:
# setDT(temp)[, lapply(.SD, mean), by=.(year(date), month(date))]
this gives:
yr mon mn_amt
1: 2011 2 42.610
2: 2011 3 23.195
3: 2011 4 61.891
If you want names instead of numbers for the months, you can use:
setDT(temp)[, date := as.IDate(date)
][, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = months(date))]
this gives:
yr mon mn_amt
1: 2011 februari 42.610
2: 2011 maart 23.195
3: 2011 april 61.891
As you can see, this gives the month names in your system language (Dutch, in my case).
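If you want English month labels regardless of the system language, one base R option is to index the built-in month.abb constant (or month.name for full names) with the month number. POSIXlt counts months from 0, hence the + 1.

```r
d <- as.Date(c("2011-02-01", "2011-03-03", "2011-04-05"))

# $mon is 0-based (January = 0), so add 1 before indexing month.abb
mon_labels <- month.abb[as.POSIXlt(d)$mon + 1L]
mon_labels
```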
Or using a combination of lubridate and dplyr:
library(dplyr)
library(lubridate)
temp %>%
group_by(yr = year(date), mon = month(date)) %>%
summarise(mn_amt = mean(amount))
Used data:
# example data (modified the OP's data a bit)
temp <- structure(list(date = structure(1:6, .Label = c("2011-02-01", "2011-02-02", "2011-03-03", "2011-03-04", "2011-04-05", "2011-04-06"), class = "factor"),
amount = c(1.67, 83.55, 24.4, 21.99, 98.882, 24.9)),
.Names = c("date", "amount"), class = c("data.frame"), row.names = c(NA, -6L))
You can do it as:
short.date = strftime(temp$date, "%Y/%m")
aggr.stat = aggregate(temp$amount ~ short.date, FUN = sum)
Just use the xts package for this:
library(xts)
ts <- xts(temp$amount, as.Date(temp$date, "%Y-%m-%d"))
# FUN is the function you aggregate the data with, for example sum
FUN <- sum
# convert daily data to monthly, yearly and quarterly series
ts_m <- apply.monthly(ts, FUN)
ts_y <- apply.yearly(ts, FUN)
ts_q <- apply.quarterly(ts, FUN)
Here's a dplyr option:
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
mutate(ym = format(date, '%Y-%m')) %>%
group_by(ym) %>%
summarize(ym_mean = mean(x))
I have a function monyr that I use for this kind of stuff:
monyr <- function(x)
{
x <- as.POSIXlt(x)
x$mday <- 1
as.Date(x)
}
n <- as.Date(1:500, "1970-01-01")
nn <- monyr(n)
You can change the as.Date at the end to as.POSIXct to match the date format in your data. Summarising by month is then just a matter of using aggregate/by/etc.
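As a usage sketch, here is a copy of monyr() applied to the example dates and amounts from the "Used data" block above (converted to Date), followed by the aggregate() step.

```r
# Copy of monyr(): move every date to the first day of its month
monyr <- function(x) {
  x <- as.POSIXlt(x)
  x$mday <- 1L
  as.Date(x)
}

temp <- data.frame(date = as.Date(c("2011-02-01", "2011-02-02", "2011-03-03",
                                    "2011-03-04", "2011-04-05", "2011-04-06")),
                   amount = c(1.67, 83.55, 24.4, 21.99, 98.882, 24.9))

# Tag each row with its month, then sum the amounts per month
temp$month <- monyr(temp$date)
monthly <- aggregate(amount ~ month, data = temp, FUN = sum)
monthly
```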
One more solution:
rowsum(temp$amount, format(temp$date,"%Y-%m"))
For a plot, you could use barplot:
barplot(t(rowsum(temp$amount, format(temp$date,"%Y-%m"))), las=2)
Also, if your time series is in xts format, you can aggregate your daily series to a monthly series using the mean function like this:
d2m <- function(x) {
aggregate(x, format(as.Date(zoo::index(x)), "%Y-%m"), FUN=mean)
}
