I have a data frame containing daily prices from a stock exchange, with the corresponding dates, for several years. The dates are trading dates, so weekends and holidays are excluded. For example:
df$date <- as.Date(c("2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04"))
I have used lubridate to extract a column containing the month of each date, but I am struggling to create a column that, for each month of every year, gives the trading-day number within that month. In the example, the counter should start at 1 for 2017-04-03, because that is the first observation of the month (not 3, even though it is the third day of the month), and run up to the last observation of the month. The column would look like this:
df$DayofMonth <- c(22, 23, 1, 2)
and not
df$DayofMonth <- c(30, 31, 3, 4)
Is there anybody that can help me?
Maybe this helps:
library(data.table)
library(stringr)
setDT(df)  # converts df to a data.table by reference
df[, YearMonth := str_sub(as.character(date), 1, 7)]
df[, DayofMonth := seq_len(.N), by = YearMonth]
This adds a column called YearMonth with values such as '2017-03'.
Then, within each group (month), every date gets a running index, which in your case corresponds to the trading day.
This gives 1 for '2017-04-03', since it is the first trading day of that month. It only works if df is sorted from the earliest to the latest date, and it counts the rows that are present, so every trading day of the month has to be in the data.
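The same grouping idea can be sketched in base R without extra packages. This is only an illustration on the four example dates from the question, and it assumes the rows are already sorted by date. Because March has just two rows in this sample, the counter there starts at 1; with a full month of data it would reach 22 and 23.

```r
# Example data from the question (trading dates only, sorted ascending)
df <- data.frame(date = as.Date(c("2017-03-30", "2017-03-31",
                                  "2017-04-03", "2017-04-04")))

# Year-month key such as "2017-03"
df$YearMonth <- format(df$date, "%Y-%m")

# Number the rows 1, 2, ... within each year-month group
df$DayofMonth <- ave(seq_along(df$date), df$YearMonth, FUN = seq_along)

df$DayofMonth  # 1 2 1 2
```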
One way is to extract the date components with lubridate and then group by year and month with dplyr.
library(dplyr)
library(lubridate)
df <- data.frame(date = as.Date(c("2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04")))
df %>%
  mutate(month = month(date),
         year = year(date),
         day = day(date)) %>%
  group_by(year, month) %>%
  mutate(DayofMonth = day - min(day) + 1)
# A tibble: 4 x 5
# Groups: year, month [2]
date month year day DayofMonth
<date> <dbl> <dbl> <int> <dbl>
1 2017-03-30 3 2017 30 1
2 2017-03-31 3 2017 31 2
3 2017-04-03 4 2017 3 1
4 2017-04-04 4 2017 4 2
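One caveat worth illustrating: `day - min(day) + 1` measures the calendar offset from the first observed day, so it drifts away from the trading-day count as soon as a weekend or holiday falls inside the month. A possible alternative, sketched here on a made-up sample that spans a weekend, is to number the rows within each group with `row_number()`. The column names `by_subtraction` and `by_position` are illustrative only.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample: Thursday, Friday, then Monday after a weekend
df <- data.frame(date = as.Date(c("2017-04-06", "2017-04-07", "2017-04-10")))

res <- df %>%
  arrange(date) %>%
  group_by(year = year(date), month = month(date)) %>%
  mutate(by_subtraction = day(date) - min(day(date)) + 1,  # calendar offset
         by_position    = row_number()) %>%                # count of observations
  ungroup()

res$by_subtraction  # 1 2 5
res$by_position     # 1 2 3
```

The positional count assumes that every trading day of the month is present in the data and that the rows are sorted by date.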
You can try the following:
For each date, find the first day of its month.
Then count how many working days fall between first_day_of_month and the current date (both inclusive).
library(dplyr)
library(lubridate)
df %>%
  mutate(first_day_of_month = floor_date(date, 'month'),
         day_of_month = purrr::map2_dbl(first_day_of_month, date,
           ~sum(!weekdays(seq(.x, .y, by = 'day')) %in% c('Saturday', 'Sunday'))))
# date first_day_of_month day_of_month
#1 2017-03-30 2017-03-01 22
#2 2017-03-31 2017-03-01 23
#3 2017-04-03 2017-04-01 1
#4 2017-04-04 2017-04-01 2
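You can drop the first_day_of_month column if not needed. Note that this counts weekdays only, so exchange holidays are not excluded, and that weekdays() returns names in the current locale, so the comparison with 'Saturday' and 'Sunday' assumes an English locale.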
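If holidays matter, the counting step could be extended along the following lines. This is a base R sketch, not part of the answer above: the helper name `trading_day_of_month` and the `holidays` argument are hypothetical, and the holiday used in the usage lines is made up purely for illustration. Weekends are detected through the numeric `wday` field of `as.POSIXlt()`, which does not depend on the locale.

```r
# Count business days from the first of the month up to each date,
# skipping weekends and any dates in a user-supplied holiday vector.
trading_day_of_month <- function(dates, holidays = as.Date(character(0))) {
  vapply(seq_along(dates), function(i) {
    first <- as.Date(format(dates[i], "%Y-%m-01"))
    days  <- seq(first, dates[i], by = "day")
    wday  <- as.POSIXlt(days)$wday            # 0 = Sunday, 6 = Saturday
    sum(!(wday %in% c(0, 6)) & !(days %in% holidays))
  }, integer(1))
}

dates <- as.Date(c("2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04"))

no_holidays <- trading_day_of_month(dates)
no_holidays    # 22 23 1 2

# Made-up holiday on a Friday, for illustration only
with_holiday <- trading_day_of_month(dates, holidays = as.Date("2017-03-17"))
with_holiday   # 21 22 1 2
```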
data
df <- data.frame(date = as.Date(c("2017-03-30", "2017-03-31",
                                  "2017-04-03", "2017-04-04")))
I have a data.frame in which one column holds many different dates in the format Year-Month-Day, and I would like to keep only the rows whose month is 12, i.e. December.
I tried two different approaches:
First version:
IBES1985_1990[IBES1985_1990$`Forecast Period End Date, SAS Format` !=
month(1,2,3,4,5,6,7,8,9,10,11, )]
But here I get an error saying that undefined columns were selected.
Second version:
IBES1985_1990 <- IBES1985_1990 %>%
mutate(`Forecast Period End Date, SAS Format`= ifelse(month(`Forecast Period End Date, SAS Format`)
%in% c(1,2,3,4,5,6,7,8,9,10,11),NA,`Forecast Period End Date, SAS Format`))
Here I wanted to delete all rows containing NA afterwards, but the dates changed to plain numbers and I couldn't convert them back to check whether the non-December dates had already been removed.
In summary, I would like code that deletes all rows that are not in December.
If your data looks like this
library(lubridate)
df <- data.frame(dates = seq.Date(ymd("2022-09-02"), ymd("2023-02-02"), "month"),
data = 1:6)
df
dates data
1 2022-09-02 1
2 2022-10-02 2
3 2022-11-02 3
4 2022-12-02 4
5 2023-01-02 5
6 2023-02-02 6
keep all December dates e.g. by using strftime
df[strftime(df$dates, format="%b") == "Dec", ]
dates data
4 2022-12-02 4
With dplyr you can do
library(dplyr)
df %>%
rowwise() %>%
summarize(dates = dates[strftime(dates, format="%b") == "Dec"], data)
# A tibble: 1 × 2
dates data
<date> <int>
1 2022-12-02 4
or, if you want to use lubridate's month()
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
summarize(dates = dates[month(dates) == 12], data)
# A tibble: 1 × 2
dates data
<date> <int>
1 2022-12-02 4
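A shorter route may be a plain `filter()`, sketched below on the same example data. It also avoids comparing against the month abbreviation "Dec", which `strftime(..., "%b")` only returns in an English locale. As for the second attempt in the question, `ifelse()` drops the Date class from its result, which is why the dates turned into numbers. The object names `december_only` and `december_base` are illustrative.

```r
library(dplyr)
library(lubridate)

df <- data.frame(dates = seq.Date(ymd("2022-09-02"), ymd("2023-02-02"), "month"),
                 data  = 1:6)

# Keep only the rows whose month number is 12
december_only <- df %>% filter(month(dates) == 12)

# Base R equivalent using the locale-independent month number
december_base <- df[format(df$dates, "%m") == "12", ]

december_only
```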
I am looking to calculate a 3-month rolling sum of the values in one column of a data frame, based on the dates in another column and on the product.
The newResults data frame has the columns Product, Date and Value.
In this example, I wish to calculate the rolling sum of Value over 3 months for each Product. I have sorted the data frame by Product and Date.
Dataset Example:
Sample Dataset
My Code:
newResults = newResults %>%
group_by(Product) %>%
mutate(Roll_12Mth =
rollapplyr(Value, width = 1:n() - findInterval( Date %m-% months(3), date), sum)) %>%
ungroup
Error: Problem with mutate() input Roll_12Mth.
x could not find function "%m-%"
i Input Roll_12Mth is rollapplyr(...).
Output:
Output
If the dates are always spaced one month apart, it is easy: a window of 3 rows then covers 3 months.
dat=data.frame(Date=seq(as.Date("2/1/2017", "%m/%d/%Y"), as.Date("1/1/2018", "%m/%d/%Y"), by="month"),
Product=rep(c("A", "B"), each=6),
Value=c(4182, 4822, 4805, 6235, 3665, 3326, 3486, 3379, 3596, 3954, 3745, 3956))
library(zoo)
library(dplyr)
dat %>%
group_by(Product) %>%
arrange(Date, .by_group=TRUE) %>%
mutate(Value=rollapplyr(Value, 3, sum, partial=TRUE))
Date Product Value
<date> <fct> <dbl>
1 2017-02-01 A 4182
2 2017-03-01 A 9004
3 2017-04-01 A 13809
4 2017-05-01 A 15862
5 2017-06-01 A 14705
6 2017-07-01 A 13226
7 2017-08-01 B 3486
8 2017-09-01 B 6865
9 2017-10-01 B 10461
10 2017-11-01 B 10929
11 2017-12-01 B 11295
12 2018-01-01 B 11655
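For dates that are not evenly spaced, a date-based window is needed instead of a fixed row count. The error in the question appears because `%m-%` is a lubridate operator, so lubridate has to be loaded before it can be used. The following is a minimal sketch under assumptions: the helper `roll_sum_months` and the sample data are made up for illustration, and the window is taken as the 3 months up to and including each date, excluding the date exactly 3 months earlier.

```r
library(dplyr)
library(lubridate)  # provides %m-%

# Hypothetical helper: for each date, sum the values whose dates fall
# within the preceding n_months (window is open at the start, closed at the end)
roll_sum_months <- function(dates, values, n_months = 3) {
  vapply(seq_along(dates), function(i) {
    window_start <- dates[i] %m-% months(n_months)
    sum(values[dates > window_start & dates <= dates[i]])
  }, numeric(1))
}

# Made-up, irregularly spaced sample data
dat <- data.frame(
  Product = c("A", "A", "A", "A"),
  Date    = as.Date(c("2017-01-15", "2017-02-20", "2017-05-01", "2017-06-10")),
  Value   = c(10, 20, 30, 40)
)

roll3 <- dat %>%
  group_by(Product) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(Roll_3Mth = roll_sum_months(Date, Value)) %>%
  ungroup()

roll3$Roll_3Mth  # 10 30 50 70
```

This compares every row with every other row in its group, which is fine for small groups but may be slow on large ones.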
I am new to R and would like to ask how to transform the data set below into two outcome tables.
The first should have one row per unique name, list trip 1, 2, 3, 4, 5 and so on for each person, and show the average trips and the grand total in the last column and the last row.
In the second table I want the lag in days between trips, with the average lag of each person as the last column. Lag is the number of days between consecutive trips.
Dataset
name <- c('Mary', 'Sue', 'Peter', 'Mary', 'Mary', 'John', 'Sue', 'Peter',
'Peter', 'John', 'John', 'John', 'Mary', 'Mary')
date <- c('01/04/2018', '03/02/2017', '01/01/2019', '24/04/2017',
'02/03/2019', '31/05/2019', '08/09/2019', '17/12/2019',
'02/08/2017', '10/11/2017', '30/12/2017', '18/02/2018',
'18/02/2018', '18/10/2019')
data <- data.frame(name, date)
The desired results:
Result 1
Name Trip 1 Trip2 Total trips
Mary dd/mm/yyyy dd/mm/yyyy 2
John dd/mm/yyyy. N/A 1
Total Trip 2 1 3
Result 2
Name Lag1 Lag2 Avg.Lag
Mary 3 4 3.5
John 5 1 3
Result 1 can be achieved by arranging the data by date (after converting it to Date format) and using group_by() per person to calculate the rank and the count of the trips. These can then be pivoted into columns using pivot_wider() from the tidyr package (the paste0() calls are there to ensure readable column names).
For result 2, the difference in days between trips needs to be calculated using difftime(), which gives NA for the first trip. The rest of the procedure is similar to result 1, but some columns have to be removed before the pivot.
library(dplyr)
library(tidyr)
name <- c('Mary','Sue','Peter','Mary','Mary','John','Sue','Peter','Peter','John',
'John','John','Mary','Mary')
date <- c('01/04/2018','03/02/2017','01/01/2019','24/04/2017',
'02/03/2019','31/05/2019','08/09/2019','17/12/2019',
'02/08/2017','10/11/2017','30/12/2017','18/02/2018',
'18/02/2018','18/10/2019')
data <- data.frame(name,date, stringsAsFactors = F)
data <- data %>%
mutate(date = as.Date(date, format = '%d/%m/%Y')) %>%
arrange(name, date) %>%
group_by(name) %>%
mutate(trip_nr = rank(date),
total_trips = n()) %>%
ungroup()
result1 <- data %>%
mutate(trip_nr = paste0('Trip_', trip_nr)) %>%
pivot_wider(names_from = trip_nr, values_from = date)
result2 <- data %>%
group_by(name) %>%
mutate(lag = difftime(date, lag(date), units = 'days'),
lag_avg = mean(lag, na.rm = T)) %>%
ungroup() %>%
filter(!is.na(lag)) %>%
mutate(lag_nr = paste0('Lag_', trip_nr-1)) %>%
select(-date,-trip_nr,-total_trips) %>%
pivot_wider(names_from = lag_nr, values_from = lag)
This gives the output for result1:
# A tibble: 4 x 7
name total_trips Trip_1 Trip_2 Trip_3 Trip_4 Trip_5
<chr> <int> <date> <date> <date> <date> <date>
1 John 4 2017-11-10 2017-12-30 2018-02-18 2019-05-31 NA
2 Mary 5 2017-04-24 2018-02-18 2018-04-01 2019-03-02 2019-10-18
3 Peter 3 2017-08-02 2019-01-01 2019-12-17 NA NA
4 Sue 2 2017-02-03 2019-09-08 NA NA NA
and result2:
# A tibble: 4 x 6
# Groups: name [4]
name lag_avg Lag_1 Lag_2 Lag_3 Lag_4
<chr> <drtn> <drtn> <drtn> <drtn> <drtn>
1 John 189.00 days 50 days 50 days 467 days NA days
2 Mary 226.75 days 300 days 42 days 335 days 230 days
3 Peter 433.50 days 517 days 350 days NA days NA days
4 Sue 947.00 days 947 days NA days NA days NA days
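library(data.table)
# 'data' is the data frame from the question, with dates stored as dd/mm/yyyy strings
data$date <- as.character(data$date)
data <- data[order(as.Date(data$date, "%d/%m/%Y")), ]
data <- data.table(data)
data[, date := as.Date(date, "%d/%m/%Y")]
# trip number per person
data[, Trips := seq_len(.N), by = "name"]
# date of the previous trip per person (NA for the first trip)
data[, Lag := shift(date, 1), by = "name"]
# time difference in days between consecutive trips
data[, diff := difftime(Lag, date, units = "days"), by = "name"]
data[, diff := abs(as.numeric(diff))]
# creating the second summary table: average lag per person
data_summary_second_table <- data[, .(Avg_lag = mean(diff, na.rm = TRUE)), by = "name"]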
How do I separate date and time into 2 different variables if the_date column is as follows:
the_date
12/25/17 0:00
How can I separately retrieve
year as 2017,
month as 12 or December,
day as 25, and
time as 0:00?
After converting the column to a date-time with mdy_hm(), we can extract each of the components:
library(lubridate)
library(dplyr)
df1 %>%
mutate(v1 = mdy_hm(v1),
Year = year(v1),
Month = month(v1),
Date = day(v1),
time = format(v1, "%H:%M:%S"))
# A tibble: 1 x 5
# v1 Year Month Date time
# <dttm> <dbl> <dbl> <int> <chr>
#1 2017-12-25 00:00:00 2017 12 25 00:00:00
data
df1 <- tibble(v1 = "12/25/17 0:00")
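The same split can be sketched in base R, without lubridate, by parsing with an explicit format string. The variable names below are illustrative, and the time zone "UTC" is an assumption, since the source string carries no zone information.

```r
x <- "12/25/17 0:00"

# Parse month/day/two-digit year and hour:minute
dt <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")

parts <- data.frame(
  year  = as.integer(format(dt, "%Y")),
  month = as.integer(format(dt, "%m")),
  day   = as.integer(format(dt, "%d")),
  time  = format(dt, "%H:%M")
)

parts
```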
I need some help with R time series. I have daily temperature values for a 30-year period = 365*30 days = 10950 days (if leap years are not considered). I want to create a "daily climatology", that is, the average of
each (the 30 values) 1st of January, 2nd of January, etc., to create a time series with 365 values. Could anyone help me with this topic? Thanks in advance.
Something like this with dplyr + lubridate:
library(dplyr)
library(lubridate)
df %>%
group_by(month = month(date), day = day(date)) %>%
summarize(avg_value = mean(value)) %>%
pull(avg_value) %>%
ts() %>%
plot(ylab = "avg_value")
Result:
> df %>%
+ group_by(month = month(date), day = day(date)) %>%
+ summarize(avg_value = mean(value))
# A tibble: 366 x 3
# Groups: month [?]
month day avg_value
<dbl> <int> <dbl>
1 1 1 0.19750444
2 1 2 0.30492408
3 1 3 0.16760465
4 1 4 -0.09357058
5 1 5 0.10606383
6 1 6 -0.14456526
7 1 7 0.23384988
8 1 8 -0.11987095
9 1 9 -0.01166687
10 1 10 -0.08134161
# ... with 356 more rows
Data:
df = data.frame(date = seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days"),
value = rnorm(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days"))))
I had the same problem to solve and found an answer here:
Daily average calculation from multiple year daily weather data?
It took me some time to understand and reorder all the comments because there was no complete code.
So here I give a complete example based on the link above.
As an example, 3 years of random precipitation and temperature data:
test_data <- data.frame("date"= seq(from = as.Date("1990/1/1"), to = as.Date("1992/12/31"), by = "day"),"prec" =runif(1096, 0, 10),"temp" = runif(1096, 0, 10))
The next step is to add a new column holding the variable on which the average will be based, here the day of the year as month and day:
test_data$day <- format(test_data$date, format='%m-%d')
In this column every day of the year appears 3 times because of the 3 years (29 February appears only once, since 1992 is the only leap year in the range). So we can calculate the mean for every day:
test_data_daily_mean <- aggregate(cbind(prec, temp) ~ (day), data=test_data, FUN=mean)
Hint: for this solution the date column must actually contain R dates. Otherwise you have to convert it like this:
as.Date(data$date, format='%d-%m-%Y')
This answer is a little late, but maybe it helps someone else!
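Since the question asks for exactly 365 values, one option is to drop 29 February before averaging. The sketch below reuses the idea from this answer on freshly generated random data. Removing the leap day is an assumption about what the asker wants; other conventions (for example merging it with 28 February) are possible.

```r
set.seed(1)  # reproducible random data

dates <- seq(as.Date("1990-01-01"), as.Date("1992-12-31"), by = "day")
test_data <- data.frame(date = dates,
                        temp = runif(length(dates), 0, 10))

# Month-day key, e.g. "01-01"
test_data$day <- format(test_data$date, "%m-%d")

# Drop the leap day so that every remaining key occurs once per year
no_leap <- test_data[test_data$day != "02-29", ]

# Mean per calendar day across all years
daily_mean <- aggregate(temp ~ day, data = no_leap, FUN = mean)

nrow(daily_mean)  # 365
```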