Final Output
| Date | New_Date |
|-----------| --------- |
|1967-07-01 | |
|1967-07-02 | |
|1967-07-03 | |
|1967-07-04 | |
|1967-07-05 | |
|1967-07-06 | |
|1967-07-07 | 07-July |
|1967-07-08 | |
|1967-07-09 | |
|1967-07-10 | |
|1967-07-11 | |
|1967-07-12 | |
|1967-07-13 | |
|1967-07-14 | 14-July |
Is there any function or library I can use to get "New_Date" (final output every 7 days)?
I've tried this code, but I am not getting the desired final output:
df <- df %>%
  mutate(New_Date = seq.Date(Date, by = 7),
         format(New_Date, format = "%d-%b"))
We can use case_when
library(dplyr)
df %>%
  mutate(New_date = case_when((row_number() - 1) %% 7 + 1 == 7 ~ format(Date, '%d-%b'),
                              TRUE ~ ''))
output
Date New_date
1 1967-07-01
2 1967-07-02
3 1967-07-03
4 1967-07-04
5 1967-07-05
6 1967-07-06
7 1967-07-07 07-Jul
8 1967-07-08
9 1967-07-09
10 1967-07-10
11 1967-07-11
12 1967-07-12
13 1967-07-13
14 1967-07-14 14-Jul
data
df <- data.frame(Date = seq(as.Date('1967-07-01'), length.out = 14, by = '1 day'))
You can create an index of every 7th day, change the format of those dates, and assign them to a new column.
inds <- seq(7, nrow(df), 7)
df$New_Date <- ''
df$New_Date[inds] <- format(df$Date[inds], '%d-%b')
df
# Date New_Date
#1 1967-07-01
#2 1967-07-02
#3 1967-07-03
#4 1967-07-04
#5 1967-07-05
#6 1967-07-06
#7 1967-07-07 07-Jul
#8 1967-07-08
#9 1967-07-09
#10 1967-07-10
#11 1967-07-11
#12 1967-07-12
#13 1967-07-13
#14 1967-07-14 14-Jul
If the Date column is not of class Date, run df$Date <- as.Date(df$Date) first.
So I would like to transform the following:
days <- c("MONDAY", "SUNDAY", "MONDAY", "SUNDAY", "MONDAY", "SUNDAY")
dates <- c("2020-03-02", "2020-03-08", "2020-03-09", "2020-03-15", "2020-03-16", "2020-03-22")
df <- cbind(days, dates)
+--------+------------+
| days | dates |
+--------+------------+
| MONDAY | 2020.03.02 |
| SUNDAY | 2020.03.08 |
| MONDAY | 2020.03.09 |
| SUNDAY | 2020.03.15 |
| MONDAY | 2020.03.16 |
| SUNDAY | 2020.03.22 |
+--------+------------+
Into this:
+------------+------------+
| MONDAY | SUNDAY |
+------------+------------+
| 2020.03.02 | 2020.03.08 |
| 2020.03.09 | 2020.03.15 |
| 2020.03.16 | 2020.03.22 |
+------------+------------+
Do you have any hints how should I do it? Thank you in advance!
In base R:
sapply(split(df, df$days), function(x) x$dates)
MONDAY SUNDAY
[1,] "2020-03-02" "2020-03-08"
[2,] "2020-03-09" "2020-03-15"
[3,] "2020-03-16" "2020-03-22"
Here is a solution in tidyr which takes into account JohannesNE's pertinent comment.
You can think of this as the 'trick' you were referring to in your reply (assuming each consecutive Monday and Sunday forms a pair):
df <- as.data.frame(df) # tidyr needs a df object
df <- cbind(pair = rep(1:3, each = 2), df) # the 'trick'!
pair days dates
1 1 MONDAY 2020-03-02
2 1 SUNDAY 2020-03-08
3 2 MONDAY 2020-03-09
4 2 SUNDAY 2020-03-15
5 3 MONDAY 2020-03-16
6 3 SUNDAY 2020-03-22
Now the tidyr implementation:
library(tidyr)
df %>% pivot_wider(names_from = days, values_from = dates)
# A tibble: 3 x 3
pair MONDAY SUNDAY
<int> <chr> <chr>
1 1 2020-03-02 2020-03-08
2 2 2020-03-09 2020-03-15
3 3 2020-03-16 2020-03-22
My input table is given below,
+------+------------------+
| Name | Datetime |
+------+------------------+
| ABC | 26-01-2019 4:55 |
| ABC | 26-01-2019 4:35 |
| ABC | 26-01-2019 5:00 |
| XYZ | 26-01-2019 2:50 |
| XYZ | 26-01-2019 4:00 |
| XYZ | 26-01-2019 4:59 |
+------+------------------+
From the above table I want to find the min and max 'Datetime' for each 'Name', discarding the in-between 'Datetime' rows, and automatically create another column indicating whether that person was admitted early or late, using R, as shown below:
+------+------------------+--------+
| Name | Datetime | Col3 |
+------+------------------+--------+
| ABC | 26-01-2019 4:35 | Early |
| ABC | 26-01-2019 5:00 | Late |
| XYZ | 26-01-2019 2:50 | Early |
| XYZ | 26-01-2019 4:59 | Late |
+------+------------------+--------+
Thank you in advance.
Using dplyr, one way would be to convert the Datetime column to POSIXct, arrange by Datetime, select the first and last row (min and max) in each group, and add a new column.
library(dplyr)
df %>%
  mutate(Datetime = as.POSIXct(Datetime, format = "%d-%m-%Y %H:%M")) %>%
  arrange(Datetime) %>%
  group_by(Name) %>%
  slice(c(1L, n())) %>%
  mutate(Col3 = c("Early", "Late"))
# Name Datetime Col3
# <fct> <dttm> <chr>
#1 ABC 2019-01-26 04:35:00 Early
#2 ABC 2019-01-26 05:00:00 Late
#3 XYZ 2019-01-26 02:50:00 Early
#4 XYZ 2019-01-26 04:59:00 Late
Here is a base R option:
transform(stack(data.frame(
do.call(cbind,
tapply(as.POSIXct(dd$Datetime, format = '%d-%m-%Y %H:%M'), dd$Name, function(i)
as.character(c(min(i), max(i))))), stringsAsFactors = FALSE)),
col3 = c('Early', 'Late'))
# values ind col3
#1 2019-01-26 04:35:00 ABC Early
#2 2019-01-26 05:00:00 ABC Late
#3 2019-01-26 02:50:00 XYZ Early
#4 2019-01-26 04:59:00 XYZ Late
We can use the tidyverse (note that dmy_hm comes from lubridate, which must be loaded explicitly with tidyverse versions before 2.0.0)
library(tidyverse)
library(lubridate)
df %>%
  arrange(dmy_hm(Datetime)) %>%
  group_by(Name) %>%
  filter(row_number() %in% c(1, n())) %>%
  mutate(Col3 = c("Early", "Late"))
# A tibble: 4 x 3
# Groups: Name [2]
# Name Datetime Col3
# <chr> <chr> <chr>
#1 XYZ 26-01-2019 2:50 Early
#2 ABC 26-01-2019 4:35 Early
#3 XYZ 26-01-2019 4:59 Late
#4 ABC 26-01-2019 5:00 Late
data
df <- structure(list(Name = c("ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"
), Datetime = c("26-01-2019 4:55", "26-01-2019 4:35", "26-01-2019 5:00",
"26-01-2019 2:50", "26-01-2019 4:00", "26-01-2019 4:59")),
class = "data.frame", row.names = c(NA,
-6L))
I wish to average, for each individual, the most recent rows per company that occur on or before a specified date.
In other words, for each individual and each date, I would like to average the most recent previous alpha value from each company.
table1 <- fread(
"individual_id | date
1 | 2018-01-02
1 | 2018-01-04
1 | 2018-01-05
2 | 2018-01-02
2 | 2018-01-05",
sep ="|"
)
table1$date = as.IDate(table1$date)
table2 <- fread(
"individual_id | date2 | company_id | alpha
1 | 2018-01-02 | 62 | 1
1 | 2018-01-04 | 62 | 1.5
1 | 2018-01-05 | 63 | 1
2 | 2018-01-01 | 71 | 2
2 | 2018-01-02 | 74 | 1
2 | 2018-01-05 | 74 | 4",
sep = "|"
)
So for example:
Observation 1 in table 1 is individual "1" on 2018-01-02.
To achieve this I look in table 2 and see that individual 1 has one instance prior to or on 2018-01-02, for company 62. Hence there is only one value to average, and the mean alpha is 1.
Example 2:
Observation for individual 2 on 2018-01-05.
Here there are 3 observations for individual 2: one for company 71 and two for company 74. We choose the most recent for each company, which leaves us with two observations, company 71 on 2018-01-01 and company 74 on 2018-01-05, with alpha values of 2 and 4; the mean alpha is then 3.
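The arithmetic in example 2 can be sketched in base R. This is only an illustration of the "most recent row per company" logic, using a hand-built subset of table2's rows for individual 2 on or before 2018-01-05:

```r
# Rows of table2 for individual 2 with date2 on or before 2018-01-05
sub <- data.frame(company_id = c(71, 74, 74),
                  date2 = as.Date(c("2018-01-01", "2018-01-02", "2018-01-05")),
                  alpha = c(2, 1, 4))
# Order by date, then keep only the most recent row per company
sub <- sub[order(sub$date2), ]
latest <- sub[!duplicated(sub$company_id, fromLast = TRUE), ]
mean(latest$alpha)  # (2 + 4) / 2 = 3
```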
The result should look like:
table1 <- fread(
"individual_id | date | mean alpha
1 | 2018-01-02 | 1
1 | 2018-01-04 | 1.5
1 | 2018-01-05 | (1.5+1)/2 = 1.25
2 | 2018-01-02 | (2+1)/2 = 1.5
2 | 2018-01-05 | (2+4)/2 = 3",
sep ="|"
)
I can get the sub sample of the first row from table2 using:
table2[, .SD[1], by=company_id]
But I am unsure how to limit by the date and combine this with the first table.
Edit
This produces the result for each individual but not by company.
table1[, mean_alpha :=
table2[.SD, on=.(individual_id, date2 <= date), mean(alpha, na.rm = TRUE), by=.EACHI]$V1]
individual_id date mean_alpha
1 2018-01-02 1.000000
1 2018-01-04 1.250000
1 2018-01-05 1.166667
2 2018-01-02 1.500000
2 2018-01-05 2.333333
Here is another possible approach:
#ensure that order is correct before using the most recent for each company
setorder(table2, individual_id, company_id, date2)
table1[, mean_alpha :=
#perform non-equi join
table2[table1, on=.(individual_id, date2<=date),
#for each row of table1,
by=.EACHI,
#get most recent alpha by company_id and average the alphas
mean(.SD[, last(alpha), by=.(company_id)]$V1)]$V1
]
output:
individual_id date mean_alpha
1: 1 2018-01-02 1.00
2: 1 2018-01-04 1.50
3: 1 2018-01-05 1.25
4: 2 2018-01-02 1.50
5: 2 2018-01-05 3.00
data:
library(data.table)
table1 <- fread(
"individual_id | date
1 | 2018-01-02
1 | 2018-01-04
1 | 2018-01-05
2 | 2018-01-02
2 | 2018-01-05",
sep ="|"
)
table1[, date := as.IDate(date)]
table2 <- fread(
"individual_id | date2 | company_id | alpha
1 | 2018-01-02 | 62 | 1
1 | 2018-01-04 | 62 | 1.5
1 | 2018-01-05 | 63 | 1
2 | 2018-01-01 | 71 | 2
2 | 2018-01-02 | 74 | 1
2 | 2018-01-05 | 74 | 4",
sep = "|"
)
table2[, date2 := as.IDate(date2)]
table2[table1,
on = "individual_id",
allow.cartesian = TRUE][
date2 <= date, ][order(-date2)][,
.SD[1,],
by = .(individual_id, company_id, date)][,
mean(alpha),
by = .(individual_id, date)][
order(individual_id, date)]
What I did there: joined tables 1 and 2 on individual, allowing for all possible combinations. Then I filtered out the combinations in which date2 was greater than date, so we kept only the dates2 prior to (or on) each date. I ordered them in descending order by date2 so that we could select only the most recent occurrence (that is what .SD[1, ] does) for each individual_id, company_id and date combination.
After that, it's just a matter of calculating the mean by individual and date, and sorting the table to match your expected output.
I am trying to do a vlookup in R using data.table. I am looking up the value for a specific date; if it is not available, I would like the nearest next value.
table1 <- fread(
"id | date_created
1 | 2018-01-02
1 | 2018-01-03
2 | 2018-01-08
2 | 2018-01-09",
sep ="|"
)
table2<- fread(
"otherid | date | value
1 | 2018-01-02 | 1
2 | 2018-01-04 | 5
3 | 2018-01-07 | 3
4 | 2018-01-08 | 5
5 | 2018-01-11 | 3
6 | 2018-01-12 | 2",
sep = "|"
)
The result should look like:
table1 <- fread(
"id | date | value2
1 | 2018-01-02 | 1
1 | 2018-01-03 | 5
2 | 2018-01-08 | 5
2 | 2018-01-09 | 3",
sep ="|"
)
Edit
I fixed it, this works:
table1[, value2:= table2[table1, value, on = .(date=date_created), roll = -7]]
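For anyone puzzled by the roll argument: a negative roll rolls backwards, i.e. matches the nearest following key value, and its magnitude caps how far forward the match may reach (here, 7 days). A minimal sketch with a toy lookup table, not the question's data:

```r
library(data.table)
lookup <- data.table(date = as.Date(c("2018-01-02", "2018-01-04")), value = c(1, 5))
setkey(lookup, date)
# 2018-01-03 has no exact match; roll = -7 takes the next date within 7 days
lookup[.(as.Date("2018-01-03")), value, roll = -7]  # 5
```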
I have a data frame of stocks and dates. I want to add a "next date" column. How should I do this?
The data is this:
df = data.frame(ticker = c("BHP", "BHP", "BHP", "BHP", "ANZ", "ANZ", "ANZ"), date = c("1999-05-31", "2000-06-30", "2001-06-29", "2002-06-28", "1999-09-30", "2000-09-29", "2001-09-28"))
df$date = as.POSIXct(df$date)
In human-readable form:
ticker | date
-----------------
BHP | 1999-05-31
BHP | 2000-06-30
BHP | 2001-06-29
BHP | 2002-06-28
ANZ | 1999-09-30
ANZ | 2000-09-29
ANZ | 2001-09-28
What I want is to add a column for the next date:
ticker | date | next_date
------------------------------------
BHP | 1999-05-31 | 2000-06-30
BHP | 2000-06-30 | 2001-06-29
BHP | 2001-06-29 | 2002-06-28
BHP | 2002-06-28 | NA # (or some default value)
ANZ | 1999-09-30 | 2000-09-29
ANZ | 2000-09-29 | 2001-09-28
ANZ | 2001-09-28 | NA
library(dplyr)
df %>%
  group_by(ticker) %>%
  mutate(next_date = lead(date))
We can use ave from base R to do this
df$next_date <- with(df, ave(as.Date(date), ticker, FUN = function(x) c(x[-1], NA)))
df$next_date
#[1] "2000-06-30" "2001-06-29" "2002-06-28" NA "2000-09-29" "2001-09-28" NA
Or we can use data.table
library(data.table)
setDT(df)[, next_date := shift(date, type = "lead"), by = ticker]