R: How can I choose the earliest date column from date columns?

I would like to get a column that has the earliest date in each row from multiple date columns.
My dataset is like this.
df = data.frame( x_date = as.Date( c("2016-1-3", "2016-3-5", "2016-5-5")) , y_date = as.Date( c("2016-2-2", "2016-3-1", "2016-4-4")), z_date = as.Date(c("2016-3-2", "2016-1-1", "2016-7-1")) )
+---+------------+------------+------------+
|   | x_date     | y_date     | z_date     |
+---+------------+------------+------------+
| 1 | 2016-01-03 | 2016-02-02 | 2016-03-02 |
| 2 | 2016-03-05 | 2016-03-01 | 2016-01-01 |
| 3 | 2016-05-05 | 2016-04-04 | 2016-07-01 |
+---+------------+------------+------------+
I would like to get something like the following column.
+---+---------------+
|   | earliest_date |
+---+---------------+
| 1 | 2016-01-03    |
| 2 | 2016-01-01    |
| 3 | 2016-04-04    |
+---+---------------+
This is my code, but it returns the single earliest date across all rows and columns instead of one date per row.
library(dplyr)
df %>% dplyr::mutate(earliest_date = min(x_date, y_date, z_date))

One option is pmin, which takes the minimum element-wise (that is, row by row):
df %>%
mutate(earliest_date = pmin(x_date, y_date, z_date))
# x_date y_date z_date earliest_date
#1 2016-01-03 2016-02-02 2016-03-02 2016-01-03
#2 2016-03-05 2016-03-01 2016-01-01 2016-01-01
#3 2016-05-05 2016-04-04 2016-07-01 2016-04-04
If only the new column is needed, use transmute instead of mutate:
df %>%
transmute(earliest_date = pmin(x_date, y_date,z_date))
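To see why min() fails in the question while pmin() works, here is a minimal base R sketch on the question's sample data. The names overall_min and row_min are only illustrative and do not come from the original posts.

```r
# Sample data from the question
df <- data.frame(
  x_date = as.Date(c("2016-01-03", "2016-03-05", "2016-05-05")),
  y_date = as.Date(c("2016-02-02", "2016-03-01", "2016-04-04")),
  z_date = as.Date(c("2016-03-02", "2016-01-01", "2016-07-01"))
)

# min() pools all of its arguments and returns a single value,
# which mutate() then recycles into every row
overall_min <- min(df$x_date, df$y_date, df$z_date)
print(overall_min)  # "2016-01-01"

# pmin() compares the vectors position by position,
# so it returns one value per row
row_min <- pmin(df$x_date, df$y_date, df$z_date)
print(row_min)      # "2016-01-03" "2016-01-01" "2016-04-04"
```

Both functions keep the Date class, so the result can be assigned straight back to a column.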

If you want the output as a data frame with the original columns as rows, reshape the data to long format first. Note that grouping by variable gives the earliest date of each column, not of each row.
library(reshape2)
melt(df) %>% group_by(variable) %>% summarize(earliest_date = min(value))

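You can use apply row-wise to get the minimum date. apply converts the data frame to a character matrix, so the result is a character vector; the comparison still works because the dates are formatted as yyyy-mm-dd, which sorts correctly as text.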
apply(df, 1, min)
#[1] "2016-01-03" "2016-01-01" "2016-04-04"
Or you can also use pmin with do.call
do.call(pmin, df)
#[1] "2016-01-03" "2016-01-01" "2016-04-04"
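The two base R approaches above return different classes, which matters if the result is used as a date later. The sketch below checks this and also shows one way to restrict pmin to the date columns and skip missing values. The id column, the injected NA, and the "_date" suffix rule are assumptions added for illustration only.

```r
df <- data.frame(
  x_date = as.Date(c("2016-01-03", "2016-03-05", "2016-05-05")),
  y_date = as.Date(c("2016-02-02", "2016-03-01", "2016-04-04")),
  z_date = as.Date(c("2016-03-02", "2016-01-01", "2016-07-01"))
)

via_apply <- apply(df, 1, min)   # goes through a character matrix
via_pmin  <- do.call(pmin, df)   # each column is passed to pmin as-is

class(via_apply)  # "character"
class(via_pmin)   # "Date"

# Hypothetical extras: a non-date column and a missing date
df$id <- 1:3
df$y_date[2] <- NA

# Pick the date columns by their (assumed) "_date" suffix and
# pass na.rm = TRUE through do.call so NA values are skipped
date_cols <- grep("_date$", names(df), value = TRUE)
earliest  <- do.call(pmin, c(df[date_cols], na.rm = TRUE))
print(earliest)
```

Appending na.rm = TRUE to the argument list works because do.call matches named list elements to named arguments of pmin.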

Related

Create new column based on cut off date (date from another column)

I have a sample df below (with date formatted into as.Date):
| date |
--------------
| 2020-03-03 |
| 2020-06-30 |
| 2020-01-23 |
| 2020-02-10 |
| 2020-11-29 |
I am trying to add a column based on a cut-off date of 2020-05-01 and expect to get this table:
| date | cutoff |
------------------------
| 2020-03-03 | prior |
| 2020-06-30 | later |
| 2020-01-23 | prior |
| 2020-02-10 | prior |
| 2020-11-29 | later |
I used dplyr and called the mutate to create a column and initially applied case_when:
df %>%
mutate(cutoff = case_when(
date < 2020-05-01 ~ "prior",
"later"
))
The code above created cutoff column with only "later" values.
I also tried the ifelse:
df <- with(df, ifelse(date < 2020-05-01, "prior", "later"))
The code above replaced the values in the date column with NA value.
I gave a different code a try:
df %>%
mutate(cutoff = case_when(date < 2020-05-01 ~ "prior",
TRUE ~ "later"))
but the result was the same as the first code I tried.
I also tried converting the date to POSIXct format, but each attempt above produced the same output.
First convert the column to Date class with ymd, then compare it with a cut-off that is also parsed as a date, using ifelse:
library(lubridate)
library(dplyr)
df %>%
mutate(date = ymd(date),
cutoff = ifelse(date < ymd("2020-05-01"), "prior", "later"))
date cutoff
1 2020-03-03 prior
2 2020-06-30 later
3 2020-01-23 prior
4 2020-02-10 prior
5 2020-11-29 later
data:
df <- structure(list(date = c("2020-03-03", "2020-06-30", "2020-01-23",
"2020-02-10", "2020-11-29")), class = "data.frame", row.names = c(NA,
-5L))
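The reason the question's attempts all returned "later" can be shown in a few lines of base R. This is a sketch on two of the sample dates; the variable names are illustrative only.

```r
# Unquoted, 2020-05-01 is ordinary arithmetic: 2020 - 5 - 1
unquoted <- 2020-05-01
print(unquoted)  # 2014

d <- as.Date(c("2020-03-03", "2020-06-30"))

# A Date is stored as the number of days since 1970-01-01,
# so both values are far larger than 2014
print(as.numeric(d))

wrong <- d < 2020-05-01             # compares day counts with 2014
right <- d < as.Date("2020-05-01")  # compares with an actual date

cutoff <- ifelse(right, "prior", "later")
print(cutoff)  # "prior" "later"
```

Quoting the cut-off and parsing it with as.Date (or ymd, as in the answer above) is what makes the comparison meaningful.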

How do I transpose similar record values into separate columns in R with (reshape2 or etc)?

So I would like to transform the following:
days <- c("MONDAY", "SUNDAY", "MONDAY", "SUNDAY", "MONDAY", "SUNDAY")
dates <- c("2020-03-02", "2020-03-08", "2020-03-09", "2020-03-15", "2020-03-16", "2020-03-22")
df <- cbind(days, dates)
+--------+------------+
| days | dates |
+--------+------------+
| MONDAY | 2020.03.02 |
| SUNDAY | 2020.03.08 |
| MONDAY | 2020.03.09 |
| SUNDAY | 2020.03.15 |
| MONDAY | 2020.03.16 |
| SUNDAY | 2020.03.22 |
+--------+------------+
Into this:
+------------+------------+
| MONDAY | SUNDAY |
+------------+------------+
| 2020.03.02 | 2020.03.08 |
| 2020.03.09 | 2020.03.15 |
| 2020.03.16 | 2020.03.22 |
+------------+------------+
Do you have any hints on how I should do it? Thank you in advance!
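In base R (this assumes df is a data frame, for example df <- data.frame(days, dates), rather than the matrix that cbind returns):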
sapply(split(df,df$days), function(x) x$dates)
MONDAY SUNDAY
[1,] "2020-03-02" "2020-03-08"
[2,] "2020-03-09" "2020-03-15"
[3,] "2020-03-16" "2020-03-22"
Here is a tidyr solution that takes JohannesNE's pertinent comment into account.
You can think of this as the 'trick' you were referring to in your reply (assuming each consecutive Monday and Sunday form a pair):
df <- as.data.frame(df) # tidyr needs a df object
df <- cbind(pair = rep(1:3, each = 2), df) # the 'trick'!
pair days dates
1 1 MONDAY 2020-03-02
2 1 SUNDAY 2020-03-08
3 2 MONDAY 2020-03-09
4 2 SUNDAY 2020-03-15
5 3 MONDAY 2020-03-16
6 3 SUNDAY 2020-03-22
Now the tidyr implementation:
library(tidyr)
df %>% pivot_wider(names_from = days, values_from = dates)
# A tibble: 3 x 3
pair MONDAY SUNDAY
<int> <chr> <chr>
1 1 2020-03-02 2020-03-08
2 2 2020-03-09 2020-03-15
3 3 2020-03-16 2020-03-22
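The pair index above is hard-coded with rep(1:3, each = 2). As a hedged alternative, the sketch below derives it from the data with a within-group counter and then reshapes in base R with unstack. It assumes the rows are already in chronological order and that both days occur equally often; the data frame is built with data.frame rather than cbind so that $ works.

```r
days  <- c("MONDAY", "SUNDAY", "MONDAY", "SUNDAY", "MONDAY", "SUNDAY")
dates <- c("2020-03-02", "2020-03-08", "2020-03-09",
           "2020-03-15", "2020-03-16", "2020-03-22")
df <- data.frame(days, dates)

# Number the occurrences of each day: 1st MONDAY, 2nd MONDAY, ...
# This reproduces the pair column without hard-coding its length
df$pair <- ave(seq_along(df$days), df$days, FUN = seq_along)
print(df$pair)  # 1 1 2 2 3 3

# unstack() splits dates by days and binds the pieces as columns
wide <- unstack(df, dates ~ days)
print(wide)
```

The derived pair column can be used with pivot_wider exactly like the hard-coded one.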

R: merge two datasets within range of dates

I have one dataset x that looks something like this:
id | date
1 | 2014-02-04
1 | 2014-03-15
2 | 2014-02-04
2 | 2014-03-15
And I would like to merge it with another dataset, y, by id and date, such that the date from x is the same as or precedes the date in y for every observation. Dataset y looks like this:
id | date | value
1 | 2014-02-07 | 100
2 | 2014-02-04 | 20
2 | 2014-03-22 | 80
So I would want my final dataset to be:
id | date.x | date.y | value
1 | 2014-02-04 | 2014-02-07 | 100
1 | 2014-03-15 | |
2 | 2014-02-04 | 2014-02-04 | 20
2 | 2014-03-15 | 2014-03-22 | 80
I really do not have a lead on how to approach something like this, any help is appreciated. Thanks!
This is easy in data.table, using the roll argument.
First, create sample data with actual dates:
library( data.table )
DT1 <- fread("id | date
1 | 2014-02-04
1 | 2014-03-15
2 | 2014-02-04
2 | 2014-03-15")
DT2 <- fread("id | date | value
1 | 2014-02-07 | 100
2 | 2014-02-04 | 20
2 | 2014-03-22 | 80")
DT1[, date := as.Date( date ) ]
DT2[, date := as.Date( date ) ]
Now perform an update join on DT1, where the columns date.y and value are the result of the (left rolling) join DT2[ DT1, .( x.date, value), on = .(id, date), roll = -Inf ].
This code joins on two columns, id and date; the roll argument -Inf is applied to the last one (i.e. date). To make sure the date value from DT2 is returned, and not the date from DT1, we ask for x.date instead of date (which would return the date value from DT1).
#rolling update join
DT1[, c("date.y", "value") := DT2[ DT1, .( x.date, value), on = .(id, date), roll = -Inf ]][]
# id date date.y value
# 1: 1 2014-02-04 2014-02-07 100
# 2: 1 2014-03-15 <NA> NA
# 3: 2 2014-02-04 2014-02-04 20
# 4: 2 2014-03-15 2014-03-22 80
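To make the matching rule explicit, here is a base R sketch of the same logic: for each row of x, take the earliest row of y with the same id whose date is on or after the x date. It only illustrates what the rolling join returns on this sample data and is not how data.table implements it; idx and result are illustrative names.

```r
x <- data.frame(
  id   = c(1, 1, 2, 2),
  date = as.Date(c("2014-02-04", "2014-03-15", "2014-02-04", "2014-03-15"))
)
y <- data.frame(
  id    = c(1, 2, 2),
  date  = as.Date(c("2014-02-07", "2014-02-04", "2014-03-22")),
  value = c(100, 20, 80)
)

# For every row of x, find the row of y to attach (NA if none qualifies)
idx <- vapply(seq_len(nrow(x)), function(i) {
  ok <- which(y$id == x$id[i] & y$date >= x$date[i])
  if (length(ok) == 0L) return(NA_integer_)
  ok[which.min(as.numeric(y$date[ok]))]  # closest following date
}, integer(1))

result <- data.frame(
  id     = x$id,
  date.x = x$date,
  date.y = y$date[idx],
  value  = y$value[idx]
)
print(result)
```

This loop is quadratic in the worst case, so for large tables the data.table join above is the more practical choice.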
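Another option is to full_join by year and month. Note that this only matches dates that fall in the same calendar month and does not check that the date in x precedes the date in y, so it fits the sample data rather than the general case.
First, add a column that extracts the year and month from the date column: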
library(zoo)
library(dplyr)
xx <- x %>%
mutate(y_m = as.yearmon(date))
yy <- y %>%
mutate(y_m = as.yearmon(date))
Then we need to fully join by id and y_m:
out <- full_join(xx,yy, by = c("id","y_m")) %>%
select(-y_m)
> out
# A tibble: 4 x 4
id date.x date.y value
<dbl> <date> <date> <dbl>
1 1 2014-02-04 2014-02-07 100
2 1 2014-03-15 NA NA
3 2 2014-02-04 2014-02-04 20
4 2 2014-03-15 2014-03-22 80

How to find the Min and Max value based on another column?

My input table is given below,
+------+------------------+
| Name | Datetime |
+------+------------------+
| ABC | 26-01-2019 4:55 |
| ABC | 26-01-2019 4:35 |
| ABC | 26-01-2019 5:00 |
| XYZ | 26-01-2019 2:50 |
| XYZ | 26-01-2019 4:00 |
| XYZ | 26-01-2019 4:59 |
+------+------------------+
From the above table I want to find the min and max value of 'Datetime' for each 'Name', dropping the 'Datetime' values in between, and automatically create another column indicating whether that person was admitted early or late, using R, as shown below:
+------+------------------+--------+
| Name | Datetime | Col3 |
+------+------------------+--------+
| ABC | 26-01-2019 4:35 | Early |
| ABC | 26-01-2019 5:00 | Late |
| XYZ | 26-01-2019 2:50 | Early |
| XYZ | 26-01-2019 4:59 | Late |
+------+------------------+--------+
Thank you in advance.
Using dplyr, one way is to convert the Datetime column to POSIXct, arrange by Datetime, select the first and last row (min and max) in each group, and add a new column.
library(dplyr)
df %>%
mutate(Datetime = as.POSIXct(Datetime, format = "%d-%m-%Y %H:%M")) %>%
arrange(Datetime) %>%
group_by(Name) %>%
slice(c(1L, n())) %>%
mutate(Col3 = c("Early", "Late"))
# Name Datetime Col3
# <fct> <dttm> <chr>
#1 ABC 2019-01-26 04:35:00 Early
#2 ABC 2019-01-26 05:00:00 Late
#3 XYZ 2019-01-26 02:50:00 Early
#4 XYZ 2019-01-26 04:59:00 Late
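Converting to POSIXct before sorting matters because the raw values are text in day-month-year order with unpadded hours. The sketch below shows the difference; the 14:00 timestamp is a hypothetical value added for illustration, and tz = "UTC" is an assumption used to keep the result independent of the local time zone.

```r
# One timestamp from the question plus a hypothetical later one
stamps <- c("26-01-2019 4:55", "26-01-2019 14:00")

# As text, the comparison runs character by character,
# so "14:00" sorts before "4:55" because "1" < "4"
string_min <- min(stamps)
print(string_min)  # "26-01-2019 14:00"

# Parsed as date-times, 4:55 is correctly the earlier one
parsed   <- as.POSIXct(stamps, format = "%d-%m-%Y %H:%M", tz = "UTC")
time_min <- stamps[which.min(as.numeric(parsed))]
print(time_min)    # "26-01-2019 4:55"
```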
Here is a base R option (with df as defined in the data block below),
transform(stack(data.frame(
do.call(cbind,
tapply(as.POSIXct(df$Datetime, format = '%d-%m-%Y %H:%M'), df$Name, function(i)
as.character(c(min(i), max(i))))), stringsAsFactors = FALSE)),
col3 = c('Early', 'Late'))
# values ind col3
#1 2019-01-26 04:35:00 ABC Early
#2 2019-01-26 05:00:00 ABC Late
#3 2019-01-26 02:50:00 XYZ Early
#4 2019-01-26 04:59:00 XYZ Late
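We can use tidyverse (dmy_hm comes from lubridate, which library(tidyverse) attaches only from tidyverse 2.0.0 onward, so it is loaded explicitly here)
library(tidyverse)
library(lubridate)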
df %>%
arrange(dmy_hm(Datetime)) %>%
group_by(Name) %>%
filter(row_number() %in% c(1, n())) %>%
mutate(Col3 = c("Early", "Late"))
# A tibble: 4 x 3
# Groups: Name [2]
# Name Datetime Col3
# <chr> <chr> <chr>
#1 XYZ 26-01-2019 2:50 Early
#2 ABC 26-01-2019 4:35 Early
#3 XYZ 26-01-2019 4:59 Late
#4 ABC 26-01-2019 5:00 Late
data
df <- structure(list(Name = c("ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"
), Datetime = c("26-01-2019 4:55", "26-01-2019 4:35", "26-01-2019 5:00",
"26-01-2019 2:50", "26-01-2019 4:00", "26-01-2019 4:59")),
class = "data.frame", row.names = c(NA,
-6L))

How to add a column for next date in an R data frame

I have a data frame of stocks and dates. I want to add a "next date" column. How should I do this?
The data is this:
df = data.frame(ticker = c("BHP", "BHP", "BHP", "BHP", "ANZ", "ANZ", "ANZ"), date = c("1999-05-31", "2000-06-30", "2001-06-29", "2002-06-28", "1999-09-30", "2000-09-29", "2001-09-28"))
df$date = as.POSIXct(df$date)
In human-readable form:
ticker | date
-----------------
BHP | 1999-05-31
BHP | 2000-06-30
BHP | 2001-06-29
BHP | 2002-06-28
ANZ | 1999-09-30
ANZ | 2000-09-29
ANZ | 2001-09-28
What I want is to add a column for the next date:
ticker | date | next_date
------------------------------------
BHP | 1999-05-31 | 2000-06-30
BHP | 2000-06-30 | 2001-06-29
BHP | 2001-06-29 | 2002-06-28
BHP | 2002-06-28 | NA # (or some default value)
ANZ | 1999-09-30 | 2000-09-29
ANZ | 2000-09-29 | 2001-09-28
ANZ | 2001-09-28 | NA
Using dplyr, group by ticker and take the lead of date within each group:
library(dplyr)
df %>%
group_by(ticker) %>%
mutate(next_date = lead(date))
We can use ave from base R to do this
df$next_date <- with(df, ave(as.Date(date), ticker, FUN = function(x) c(x[-1], NA)))
df$next_date
#[1] "2000-06-30" "2001-06-29" "2002-06-28" NA "2000-09-29" "2001-09-28" NA
Or we can use data.table
library(data.table)
setDT(df)[, next_date := shift(date, type = "lead"), by = ticker]
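All three answers take the next row within each ticker, so they rely on the rows being in date order. The base R sketch below sorts first and then applies the same ave idea. The shuffled row order is an assumption made to show why sorting matters, and Date is used instead of POSIXct to avoid time-zone conversion.

```r
df <- data.frame(
  ticker = c("BHP", "BHP", "BHP", "BHP", "ANZ", "ANZ", "ANZ"),
  date   = as.Date(c("1999-05-31", "2000-06-30", "2001-06-29", "2002-06-28",
                     "1999-09-30", "2000-09-29", "2001-09-28"))
)

# Hypothetical: the rows arrive out of order
df <- df[c(3, 6, 1, 7, 4, 5, 2), ]

# Sort by ticker and date so "next row" really means "next date"
df <- df[order(df$ticker, df$date), ]

# Within each ticker, shift the dates up by one and pad with NA
df$next_date <- ave(df$date, df$ticker, FUN = function(x) c(x[-1], NA))
print(df)
```

With dplyr, the equivalent is to add arrange(ticker, date) before group_by.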
