Final Output
| Date | New_Date |
|-----------| --------- |
|1967-07-01 | |
|1967-07-02 | |
|1967-07-03 | |
|1967-07-04 | |
|1967-07-05 | |
|1967-07-06 | |
|1967-07-07 | 07-July |
|1967-07-08 | |
|1967-07-09 | |
|1967-07-10 | |
|1967-07-11 | |
|1967-07-12 | |
|1967-07-13 | |
|1967-07-14 | 14-July |
Is there any function or library I can use to get "New_Date" (final output every 7 days)?
I've tried this code, but I am not getting the desired final output:
df <- df %>%
  mutate(New_Date = seq.Date(Date, by = 7),
         format(New_Date, format = "%d-%b"))
We can use case_when
library(dplyr)
df %>%
  mutate(New_date = case_when((row_number() - 1) %% 7 + 1 == 7 ~ format(Date, '%d-%b'),
                              TRUE ~ ''))
output
Date New_date
1 1967-07-01
2 1967-07-02
3 1967-07-03
4 1967-07-04
5 1967-07-05
6 1967-07-06
7 1967-07-07 07-Jul
8 1967-07-08
9 1967-07-09
10 1967-07-10
11 1967-07-11
12 1967-07-12
13 1967-07-13
14 1967-07-14 14-Jul
data
df <- data.frame(Date = seq(as.Date('1967-07-01'), length.out = 14, by = '1 day'))
You can create an index of every 7th day, change the format of those dates, and assign them to a new column.
inds <- seq(7, nrow(df), 7)
df$New_Date <- ''
df$New_Date[inds] <- format(df$Date[inds], '%d-%b')
df
# Date New_Date
#1 1967-07-01
#2 1967-07-02
#3 1967-07-03
#4 1967-07-04
#5 1967-07-05
#6 1967-07-06
#7 1967-07-07 07-Jul
#8 1967-07-08
#9 1967-07-09
#10 1967-07-10
#11 1967-07-11
#12 1967-07-12
#13 1967-07-13
#14 1967-07-14 14-Jul
If the Date column is not of class Date, run df$Date <- as.Date(df$Date) first.
So I would like to transform the following:
days <- c("MONDAY", "SUNDAY", "MONDAY", "SUNDAY", "MONDAY", "SUNDAY")
dates <- c("2020-03-02", "2020-03-08", "2020-03-09", "2020-03-15", "2020-03-16", "2020-03-22")
df <- cbind(days, dates)
+--------+------------+
| days | dates |
+--------+------------+
| MONDAY | 2020.03.02 |
| SUNDAY | 2020.03.08 |
| MONDAY | 2020.03.09 |
| SUNDAY | 2020.03.15 |
| MONDAY | 2020.03.16 |
| SUNDAY | 2020.03.22 |
+--------+------------+
Into this:
+------------+------------+
| MONDAY | SUNDAY |
+------------+------------+
| 2020.03.02 | 2020.03.08 |
| 2020.03.09 | 2020.03.15 |
| 2020.03.16 | 2020.03.22 |
+------------+------------+
Do you have any hints how should I do it? Thank you in advance!
In base R:
sapply(split(df, df$days), function(x) x$dates)
MONDAY SUNDAY
[1,] "2020-03-02" "2020-03-08"
[2,] "2020-03-09" "2020-03-15"
[3,] "2020-03-16" "2020-03-22"
Here is a solution in tidyr which takes into account JohannesNE's pertinent comment.
You can think of this as the 'trick' you were referring to in your reply (assuming each consecutive Monday and Sunday forms a pair):
df <- as.data.frame(df) # tidyr needs a df object
df <- cbind(pair = rep(1:3, each = 2), df) # the 'trick'!
pair days dates
1 1 MONDAY 2020-03-02
2 1 SUNDAY 2020-03-08
3 2 MONDAY 2020-03-09
4 2 SUNDAY 2020-03-15
5 3 MONDAY 2020-03-16
6 3 SUNDAY 2020-03-22
Now the tidyr implementation:
library(tidyr)
df %>% pivot_wider(names_from = days, values_from = dates)
# A tibble: 3 x 3
pair MONDAY SUNDAY
<int> <chr> <chr>
1 1 2020-03-02 2020-03-08
2 2 2020-03-09 2020-03-15
3 3 2020-03-16 2020-03-22
My input table is given below,
+------+------------------+
| Name | Datetime |
+------+------------------+
| ABC | 26-01-2019 4:55 |
| ABC | 26-01-2019 4:35 |
| ABC | 26-01-2019 5:00 |
| XYZ | 26-01-2019 2:50 |
| XYZ | 26-01-2019 4:00 |
| XYZ | 26-01-2019 4:59 |
+------+------------------+
From the above table I want to find the min and max 'Datetime' for each 'Name', discarding the in-between 'Datetime' rows, and automatically create another column indicating whether that person was admitted early or late, using R, as shown below:
+------+------------------+--------+
| Name | Datetime | Col3 |
+------+------------------+--------+
| ABC | 26-01-2019 4:35 | Early |
| ABC | 26-01-2019 5:00 | Late |
| XYZ | 26-01-2019 2:50 | Early |
| XYZ | 26-01-2019 4:59 | Late |
+------+------------------+--------+
Thank you in advance.
Using dplyr, one way would be to convert the Datetime column to POSIXct, arrange by Datetime, select the first and last row (min and max) in each group, and add a new column.
library(dplyr)
df %>%
  mutate(Datetime = as.POSIXct(Datetime, format = "%d-%m-%Y %H:%M")) %>%
  arrange(Datetime) %>%
  group_by(Name) %>%
  slice(c(1L, n())) %>%
  mutate(Col3 = c("Early", "Late"))
# Name Datetime Col3
# <fct> <dttm> <chr>
#1 ABC 2019-01-26 04:35:00 Early
#2 ABC 2019-01-26 05:00:00 Late
#3 XYZ 2019-01-26 02:50:00 Early
#4 XYZ 2019-01-26 04:59:00 Late
Here is a base R option:
transform(stack(data.frame(
do.call(cbind,
tapply(as.POSIXct(dd$Datetime, format = '%d-%m-%Y %H:%M'), dd$Name, function(i)
as.character(c(min(i), max(i))))), stringsAsFactors = FALSE)),
col3 = c('Early', 'Late'))
# values ind col3
#1 2019-01-26 04:35:00 ABC Early
#2 2019-01-26 05:00:00 ABC Late
#3 2019-01-26 02:50:00 XYZ Early
#4 2019-01-26 04:59:00 XYZ Late
We can use the tidyverse (note that dmy_hm comes from lubridate, which must be loaded explicitly with tidyverse versions before 2.0.0)
library(tidyverse)
library(lubridate)
df %>%
  arrange(dmy_hm(Datetime)) %>%
  group_by(Name) %>%
  filter(row_number() %in% c(1, n())) %>%
  mutate(Col3 = c("Early", "Late"))
# A tibble: 4 x 3
# Groups: Name [2]
# Name Datetime Col3
# <chr> <chr> <chr>
#1 XYZ 26-01-2019 2:50 Early
#2 ABC 26-01-2019 4:35 Early
#3 XYZ 26-01-2019 4:59 Late
#4 ABC 26-01-2019 5:00 Late
data
df <- structure(list(Name = c("ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"
), Datetime = c("26-01-2019 4:55", "26-01-2019 4:35", "26-01-2019 5:00",
"26-01-2019 2:50", "26-01-2019 4:00", "26-01-2019 4:59")),
class = "data.frame", row.names = c(NA,
-6L))
I wish to average, for each individual, the most recent rows per company that occur on or before a specified date.
In other words, for each individual and each date, I would like to average the most recent previous alpha value from each company.
table1 <- fread(
"individual_id | date
1 | 2018-01-02
1 | 2018-01-04
1 | 2018-01-05
2 | 2018-01-02
2 | 2018-01-05",
sep ="|"
)
table1$date = as.IDate(table1$date)
table2 <- fread(
"individual_id | date2 | company_id | alpha
1 | 2018-01-02 | 62 | 1
1 | 2018-01-04 | 62 | 1.5
1 | 2018-01-05 | 63 | 1
2 | 2018-01-01 | 71 | 2
2 | 2018-01-02 | 74 | 1
2 | 2018-01-05 | 74 | 4",
sep = "|"
)
So for example:
Observation 1 in table 1 is individual "1" on 2018-01-02.
To achieve this I look in table 2 and see that individual 1 has one instance prior to or on 2018-01-02, for company 62. Hence there is only one value to average, and the mean alpha is 1.
Example 2:
Observation for individual 2 on 2018-01-05.
Here there are 3 observations for individual 2: one for company 71 and two for company 74. We choose the most recent for each company, which leaves us with two observations, company 71 on 2018-01-01 and company 74 on 2018-01-05, with alpha values of 2 and 4; the mean alpha is then 3.
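The arithmetic in example 2 can be sketched in base R. This is only an illustration of the "most recent row per company" logic, using a hand-built subset of table2's rows for individual 2 on or before 2018-01-05:

```r
# Rows of table2 for individual 2 with date2 on or before 2018-01-05
sub <- data.frame(company_id = c(71, 74, 74),
                  date2 = as.Date(c("2018-01-01", "2018-01-02", "2018-01-05")),
                  alpha = c(2, 1, 4))
# Order by date, then keep only the most recent row per company
sub <- sub[order(sub$date2), ]
latest <- sub[!duplicated(sub$company_id, fromLast = TRUE), ]
mean(latest$alpha)  # (2 + 4) / 2 = 3
```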
The result should look like:
table1 <- fread(
"individual_id | date | mean alpha
1 | 2018-01-02 | 1
1 | 2018-01-04 | 1.5
1 | 2018-01-05 | (1.5+1)/2 = 1.25
2 | 2018-01-02 | (2+1)/2 = 1.5
2 | 2018-01-05 | (2+4)/2 = 3",
sep ="|"
)
I can get the sub sample of the first row from table2 using:
table2[, .SD[1], by=company_id]
But I am unsure how to limit by the date and combine this with the first table.
Edit
This produces the result for each individual but not by company.
table1[, mean_alpha :=
table2[.SD, on=.(individual_id, date2 <= date), mean(alpha, na.rm = TRUE), by=.EACHI]$V1]
individual_id date mean_alpha
1 2018-01-02 1.000000
1 2018-01-04 1.250000
1 2018-01-05 1.166667
2 2018-01-02 1.500000
2 2018-01-05 2.333333
Here is another possible approach:
#ensure that order is correct before using the most recent for each company
setorder(table2, individual_id, company_id, date2)
table1[, mean_alpha :=
#perform non-equi join
table2[table1, on=.(individual_id, date2<=date),
#for each row of table1,
by=.EACHI,
#get most recent alpha by company_id and average the alphas
mean(.SD[, last(alpha), by=.(company_id)]$V1)]$V1
]
output:
individual_id date mean_alpha
1: 1 2018-01-02 1.00
2: 1 2018-01-04 1.50
3: 1 2018-01-05 1.25
4: 2 2018-01-02 1.50
5: 2 2018-01-05 3.00
data:
library(data.table)
table1 <- fread(
"individual_id | date
1 | 2018-01-02
1 | 2018-01-04
1 | 2018-01-05
2 | 2018-01-02
2 | 2018-01-05",
sep ="|"
)
table1[, date := as.IDate(date)]
table2 <- fread(
"individual_id | date2 | company_id | alpha
1 | 2018-01-02 | 62 | 1
1 | 2018-01-04 | 62 | 1.5
1 | 2018-01-05 | 63 | 1
2 | 2018-01-01 | 71 | 2
2 | 2018-01-02 | 74 | 1
2 | 2018-01-05 | 74 | 4",
sep = "|"
)
table2[, date2 := as.IDate(date2)]
table2[table1,
on = "individual_id",
allow.cartesian = TRUE][
date2 <= date, ][order(-date2)][,
.SD[1,],
by = .(individual_id, company_id, date)][,
mean(alpha),
by = .(individual_id, date)][
order(individual_id, date)]
What I did there: joined tables 1 and 2 on individual, allowing for all possible combinations. Then I filtered out the combinations in which date2 was greater than date, so we kept only the dates2 prior to (or on) each date. I ordered them in descending order by date2 so that we could select only the most recent occurrence (that is what .SD[1, ] does) for each individual_id, company_id and date combination.
After that, it's just a matter of calculating the mean by individual and date, and sorting the table to match your expected output.
I am trying to do a vlookup in R using data.table. I am looking up the value for a specific date; if it is not available, I would like the nearest next value.
table1 <- fread(
"id | date_created
1 | 2018-01-02
1 | 2018-01-03
2 | 2018-01-08
2 | 2018-01-09",
sep ="|"
)
table2<- fread(
"otherid | date | value
1 | 2018-01-02 | 1
2 | 2018-01-04 | 5
3 | 2018-01-07 | 3
4 | 2018-01-08 | 5
5 | 2018-01-11 | 3
6 | 2018-01-12 | 2",
sep = "|"
)
The result should look like:
table1 <- fread(
"id | date | value2
1 | 2018-01-02 | 1
1 | 2018-01-03 | 5
2 | 2018-01-08 | 5
2 | 2018-01-09 | 3",
sep ="|"
)
Edit
I fixed it, this works:
table1[, value2:= table2[table1, value, on = .(date=date_created), roll = -7]]
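For anyone puzzled by the roll argument: a negative roll rolls backwards, i.e. matches the nearest following key value, and its magnitude caps how far forward the match may reach (here, 7 days). A minimal sketch with a toy lookup table, not the question's data:

```r
library(data.table)
lookup <- data.table(date = as.Date(c("2018-01-02", "2018-01-04")), value = c(1, 5))
setkey(lookup, date)
# 2018-01-03 has no exact match; roll = -7 takes the next date within 7 days
lookup[.(as.Date("2018-01-03")), value, roll = -7]  # 5
```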
I have a data frame of stocks and dates. I want to add a "next date" column. How should I do this?
The data is this:
df = data.frame(ticker = c("BHP", "BHP", "BHP", "BHP", "ANZ", "ANZ", "ANZ"), date = c("1999-05-31", "2000-06-30", "2001-06-29", "2002-06-28", "1999-09-30", "2000-09-29", "2001-09-28"))
df$date = as.POSIXct(df$date)
In human-readable form:
ticker | date
-----------------
BHP | 1999-05-31
BHP | 2000-06-30
BHP | 2001-06-29
BHP | 2002-06-28
ANZ | 1999-09-30
ANZ | 2000-09-29
ANZ | 2001-09-28
What I want is to add a column for the next date:
ticker | date | next_date
------------------------------------
BHP | 1999-05-31 | 2000-06-30
BHP | 2000-06-30 | 2001-06-29
BHP | 2001-06-29 | 2002-06-28
BHP | 2002-06-28 | NA # (or some default value)
ANZ | 1999-09-30 | 2000-09-29
ANZ | 2000-09-29 | 2001-09-28
ANZ | 2001-09-28 | NA
library(dplyr)
df %>%
  group_by(ticker) %>%
  mutate(next_date = lead(date))
We can use ave from base R to do this
df$next_date <- with(df, ave(as.Date(date), ticker, FUN = function(x) c(x[-1], NA)))
df$next_date
#[1] "2000-06-30" "2001-06-29" "2002-06-28" NA "2000-09-29" "2001-09-28" NA
Or we can use data.table
library(data.table)
setDT(df)[, next_date := shift(date, type = "lead"), by = ticker]