I have a data frame with year and day
df <- data.frame(year = rep(1980:2015,each = 365), day = 1:365)
Please note that I only need 365 days a year, i.e. I am assuming each year has 365 days.
I want to generate two columns:
1) which month each day falls in
2) which 15-day period each day falls in. A year will have 24 15-day periods, i.e. each month will be split into two halves, something like this:
Jan: 1st - 15th: 1st Quarter
Jan: 16th- 31st: 2nd Quarter
Feb: 1st - 15th: 3rd Quarter
Feb: 16th - 28th: 4th Quarter
March: 1st - 15th: 5th Quarter
.
.
December: 16th - 31st: 24th Quarter
My final data should look like this
Year Day Quarter Month
1980 1 1 1
1980 2 1 1
.
.
1980 365 24 12
.
.
2015 1 1 1
2015 2 1 1
.
.
2015 365 24 12
I can generate the month using this:
library(dplyr)
months <- list(1:31, 32:59, 60:90, 91:120, 121:151, 152:181, 182:212, 213:243, 244:273, 274:304, 305:334, 335:365)
df1 <- df %>% group_by(year) %>%
mutate(month = sapply(day, function(x) which(sapply(months, function(y) x %in% y))))
But I do not know how to generate the 15-day period.
Because Feb 29th should not be included in leap years, we can generate a complete sequence of dates and then remove every Feb 29th. The month is taken from the date. The 15-day period is twice the month number, minus 1 if the day of the month (%d) is 15 or less.
# complete sequence of dates
# use two years in this example, with 2012 being a leap year
dates <- data.frame(date = seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "1 day"))
# remove Feb 29th in leap years
d <- dates[format(dates$date, "%m-%d") != "02-29", , drop = FALSE]
# create month (month() comes from the lubridate package)
library(lubridate)
d$month <- month(d$date)
# create 15-day period number
d$twoweek <- d$month * 2 - (as.numeric(format(d$date, "%d")) <= 15)
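As a rough sketch of how the same idea could be applied to the 365-day data frame from the question: build a day-of-year lookup from any non-leap reference year and index into it with day. The reference year 2011 and the names ref and lookup are arbitrary choices for illustration; the column name quarter follows the question's wording.

```r
# Fixed 365-day calendar taken from a non-leap reference year (2011 is arbitrary)
ref <- as.Date("2011-01-01") + 0:364
ref_month <- as.integer(format(ref, "%m"))
ref_mday  <- as.integer(format(ref, "%d"))

lookup <- data.frame(
  day     = 1:365,
  month   = ref_month,
  # first half of a month gets 2 * month - 1, second half gets 2 * month
  quarter = ref_month * 2L - (ref_mday <= 15)
)

# Data frame from the question: 365 days for every year
df <- data.frame(year = rep(1980:2015, each = 365), day = 1:365)

# day runs from 1 to 365, so it can index the lookup rows directly
df$quarter <- lookup$quarter[df$day]
df$month   <- lookup$month[df$day]
```

Because the lookup never contains Feb 29th, every year gets the same mapping, which matches the 365-day assumption in the question.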
Related
For simplicity, I have data with two columns. One column is the year (year) and the other is the day of the year (yday). So year with a value of 1980 and yday with a value of 1 is January 1, 1980. Year with a value of 1980 and yday with a value of 365 is December 31, 1980. How do I separate the single yday column into two columns: a month column and a day-of-the-month column? For example, 365 would be 12 for the month and 31 for the day. Thanks in advance.
Create a Date from the yday + year columns, then extract the day of month, and month separately:
dat <- data.frame(year=1980, yday=c(1,365))
# year yday
#1 1980 1
#2 1980 365
dat[c("month","day")] <- lapply(c("%m","%d"), \(x) {
d <- as.Date(paste(dat$year, dat$yday), format="%Y %j")
as.integer(format(d, x))
})
# year yday month day
#1 1980 1 1 1
#2 1980 365 12 30
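Note that day 365 comes out as 30 December here because 1980 is a leap year. If the data really uses a fixed 365-day year, as the question seems to assume, one possible workaround is to count days from 1 January of a non-leap reference year instead. This is only a sketch; the reference year 1981 and the names real and fixed are arbitrary choices for illustration.

```r
dat <- data.frame(year = 1980, yday = c(1, 365))

# Real calendar: 1980 has 366 days, so day 365 is 30 December
real <- as.Date(paste(dat$year, dat$yday), format = "%Y %j")

# Fixed 365-day calendar: offset from 1 January of a non-leap year (1981 is arbitrary)
fixed <- as.Date("1981-01-01") + (dat$yday - 1)

format(real, "%m-%d")   # "01-01" "12-30"
format(fixed, "%m-%d")  # "01-01" "12-31"

# Month and day of month under the fixed calendar
dat$month <- as.integer(format(fixed, "%m"))
dat$day   <- as.integer(format(fixed, "%d"))
```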
I have a data for 2000 events with start and end date of each event and the length.
What I am trying to do is find the frequency of events by month and year. But several events are split between two consecutive months (say May and June), and I want these events to be counted in the month in which they last longer. If an event is split equally between two months, it should be counted in the month in which it starts.
Eg:
> date01[1:5,9:11]
# A tibble: 5 x 3
StrD EndD EvLength
<date> <date> <drtn>
1 1993-12-30 1994-01-01 3 days # this would be reported Dec frequency
2 2000-07-23 2000-08-02 11 days # this would be reported July frequency
3 2001-02-28 2001-03-01 2 days # this would be reported Feb frequency (as it started in Feb)
4 2006-05-29 2006-06-01 4 days # this would be reported May frequency (as it started in May)
5 2010-07-30 2010-08-04 6 days
I tried to use group_by (from dplyr), but I am still not able to figure it out.
1) Convert the dates to Date format with ymd() from the lubridate package.
2) Compute the number of event days in the start month and in the end month with days_in_month() and basic arithmetic. The start day itself is counted, hence the +1.
3) Pick the month that holds more of the event's days with ifelse(); on a tie, the start month is used.
4) Get the month abbreviation with month.abb[Month].
5) Get the year from the start date.
6) Group and summarise.
library(dplyr)
library(lubridate)
df %>%
mutate(across(1:2, ymd)) %>%
mutate(prev_month_days = days_in_month(StrD)-day(StrD)+1,
next_month_days = day(EndD)) %>%
mutate(Month = ifelse(prev_month_days>= next_month_days, month(StrD), month(EndD))) %>%
mutate(Month = month.abb[Month]) %>%
mutate(Year = year(StrD)) %>%
group_by(Year, Month) %>%
summarise(n = n())
Output:
Year Month n
<int> <chr> <int>
1 1993 Dec 1
2 2000 Jul 1
3 2001 Feb 1
4 2006 May 1
5 2010 Aug 1
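One thing to watch: the year is always taken from the start date, so an event that starts at the end of December but lies mostly in January would be counted under January of the start year. A possible variation is to pick the deciding date first and derive both month and year from it. The following is only a base R sketch of that idea, not the dplyr pipeline above; the second row is a made-up event, and the names ev, ref and use_start are illustrative.

```r
ev <- data.frame(
  StrD = as.Date(c("1993-12-30", "1993-12-31", "2001-02-28")),  # 2nd row is hypothetical
  EndD = as.Date(c("1994-01-01", "1994-01-05", "2001-03-01"))
)

# Days of the event in the end month, and the remainder in the start month
# (assumes an event touches at most two consecutive months)
end_days   <- as.integer(format(ev$EndD, "%d"))
total_days <- as.integer(ev$EndD - ev$StrD) + 1L
start_days <- total_days - end_days

# Events inside a single month, and ties, go to the start month
same_month <- format(ev$StrD, "%Y-%m") == format(ev$EndD, "%Y-%m")
use_start  <- same_month | start_days >= end_days

# Deciding date: start date by default, end date where the end month wins
ref <- ev$StrD
ref[!use_start] <- ev$EndD[!use_start]

ev$Year  <- as.integer(format(ref, "%Y"))
ev$Month <- month.abb[as.integer(format(ref, "%m"))]

# Frequency by year and month
freq <- aggregate(n ~ Year + Month, data = transform(ev, n = 1L), FUN = sum)
```

Here the hypothetical event from 1993-12-31 to 1994-01-05 is counted under Jan 1994 rather than Jan 1993.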
I have a data set of daily values. It spans from December 1, 2018 to April 1, 2020.
The columns are "date" and "value". As shown here:
date <- c("2018-12-01","2018-12-02", "2018-12-03",
...
"2020-03-30","2020-03-31","2020-04-01")
value <- c(1592,1825,1769,1909,2022, .... 2287,2169,2366,2001,2087,2099,2258)
df <- data.frame(date,value)
What I would like to do is sum the values by week and then calculate the week-over-week change from the previous year to the current year.
I know that I can sum by week using the following function:
Data_week <- df%>% group_by(category ,week = cut(date, "week")) %>% mutate(summed= sum(value))
My questions are twofold:
1) How do I sum by week and then manipulate the data frame so that I can calculate the week-over-week change (e.g. week of Dec. 1, 2019 / week of Dec. 1, 2018)?
2) How can I do the above, but using a "customized" week? Let's say I want to define a week by moving 7 days back from the latest date I have data for. E.g. the latest week I would have would be the week starting on March 26th (April 1st - 7 days).
We can use lag from dplyr to help and also some convenience functions from lubridate.
library(dplyr)
library(lubridate)
df %>%
mutate(year = year(date)) %>%
group_by(week = week(date),year) %>%
summarize(summed = sum(value)) %>%
arrange(year, week) %>%
ungroup %>%
mutate(change = summed - lag(summed))
# week year summed change
# <dbl> <dbl> <dbl> <dbl>
# 1 48 2018 3638. NA
# 2 49 2018 15316. 11678.
# 3 50 2018 13283. -2033.
# 4 51 2018 15166. 1883.
# 5 52 2018 12885. -2281.
# 6 53 2018 1982. -10903.
# 7 1 2019 14177. 12195.
# 8 2 2019 14969. 791.
# 9 3 2019 14554. -415.
#10 4 2019 12850. -1704.
#11 5 2019 1907. -10943.
If you would like to define "weeks" in different ways, there are also isoweek and epiweek. See this answer for a great explanation of your options.
Data
set.seed(1)
df <- data.frame(date = seq.Date(from = as.Date("2018-12-01"), to = as.Date("2019-01-29"), "days"), value = runif(60,1500,2500))
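For the "customized" week in the second part of the question, one possible approach is to number the weeks backwards from the latest date, so that week 0 is the most recent 7 days. The sketch below uses base R and simulated data covering the date range described in the question; the column names weeks_back, week_start, prev_year and yoy_ratio are made up for illustration, and comparing against the window 52 weeks (364 days) earlier is an assumption about what "previous year" should mean.

```r
set.seed(1)
df <- data.frame(date = seq(as.Date("2018-12-01"), as.Date("2020-04-01"), by = "day"))
df$value <- runif(nrow(df), 1500, 2500)

latest <- max(df$date)

# 0 for the latest 7 days, 1 for the 7 days before that, and so on
df$weeks_back <- as.integer(latest - df$date) %/% 7L

# Sum by custom week (the oldest week may cover fewer than 7 days)
wk <- aggregate(value ~ weeks_back, data = df, FUN = sum)
names(wk)[names(wk) == "value"] <- "summed"
wk$week_start <- latest - 7 * wk$weeks_back - 6

# Same 7-day window 52 weeks earlier; NA where no such week exists
wk$prev_year <- wk$summed[match(wk$weeks_back + 52L, wk$weeks_back)]
wk$yoy_ratio <- wk$summed / wk$prev_year
```

With the latest date being 2020-04-01, week 0 starts on 2020-03-26, as in the question.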
I have a data frame like this. The time span is 10 years. It is Chinese market data, and China has lunar holidays, so the holidays fall on different dates of the western calendar each year.
When it is a holiday, the stock market does not open, so it is a non-trading day. Weekends are non-trading days too.
I want to find out which month of which year has the fewest trading days and, most importantly, what that number is.
There are no repeated days.
date change open high low close volume
1 1995-01-03 -1.233 637.72 647.71 630.53 639.88 234518
2 1995-01-04 2.177 641.90 655.51 638.86 653.81 422220
3 1995-01-05 -1.058 656.20 657.45 645.81 646.89 430123
4 1995-01-06 -0.948 642.75 643.89 636.33 640.76 487482
5 1995-01-09 -2.308 637.52 637.55 625.04 625.97 509851
6 1995-01-10 -2.503 616.16 617.60 607.06 610.30 606925
If there are no repeated days, you can count the days per year and month with:
library(data.table)
library(lubridate)
dt <- as.data.table(dt)
dt_days <- dt[, .(count_day=.N), by=.(year(date), month(date))]
Then you only need to do this to get the min:
dt_days[count_day==min(count_day)]
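The same counting idea can be sketched in base R without any packages, which may help to see what is going on. The first six dates are the sample rows from the question; the two February dates are made up so that there is more than one month to compare, and the names dates, counts and least are illustrative.

```r
dates <- as.Date(c(
  "1995-01-03", "1995-01-04", "1995-01-05",
  "1995-01-06", "1995-01-09", "1995-01-10",
  "1995-02-01", "1995-02-02"   # hypothetical extra rows
))

# One count per year-month; each row of the data is one trading day
counts <- table(format(dates, "%Y-%m"))

# Year-month(s) with the fewest trading days, together with that number
least <- counts[counts == min(counts)]
least
```

If several months share the minimum, all of them are returned.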
The chron and bizdays packages deal with business days, but neither actually contains a usable calendar of holidays, which limits their usefulness.
We will use chron below, assuming you have defined the .Holidays vector of dates that are holidays. (If you run the code below without doing that, essentially only weekdays will be regarded as business days, as the default .Holidays vector supplied by chron has very few dates in it.) DF has 120 rows (one row for each year/month), and the last line subsets that to just the month in each year with the fewest business days.
library(chron)
library(zoo)
st <- as.yearmon("2001-01")
en <- as.yearmon("2010-12")
ym <- seq(st, en, 1/12) # sequence of year/months of interest
# no of business days in each yearmonth
busdays <- sapply(ym, function(x) {
s <- seq(as.Date(x), as.Date(x, frac = 1), "day")
sum(!is.weekend(s) & !is.holiday(s))
})
# data frame with one row per year/month
yr <- as.integer(ym)
DF <- data.frame(year = yr, month = cycle(ym), yearmon = ym, busdays)
# keep one row per year: the (first) month with the fewest business days
wx.min <- ave(busdays, yr, FUN = function(x) which.min(x) == seq_along(x))
DF[wx.min == 1, ]
giving:
year month yearmon busdays
2 2001 2 Feb 2001 20
14 2002 2 Feb 2002 20
26 2003 2 Feb 2003 20
38 2004 2 Feb 2004 20
50 2005 2 Feb 2005 20
62 2006 2 Feb 2006 20
74 2007 2 Feb 2007 20
95 2008 11 Nov 2008 20
98 2009 2 Feb 2009 20
110 2010 2 Feb 2010 20
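To illustrate how a holiday list changes the count, here is a small base R sketch for a single month. It does not use chron; it simply excludes weekends and a user-supplied vector of dates. The two holiday dates are placeholders chosen for illustration, not an actual exchange calendar, and the names holidays, month_days and busdays_jan are made up.

```r
# Hypothetical holiday dates (placeholders, not a real market calendar)
holidays <- as.Date(c("2001-01-24", "2001-01-25"))

# All days of January 2001
month_days <- seq(as.Date("2001-01-01"), as.Date("2001-01-31"), by = "day")

# "%u" gives the ISO weekday as a string: "1" = Monday, ..., "7" = Sunday
is_weekend <- format(month_days, "%u") %in% c("6", "7")
is_holiday <- month_days %in% holidays

# Business days: neither weekend nor holiday
busdays_jan <- sum(!is_weekend & !is_holiday)
busdays_jan  # 21 (23 weekdays minus the 2 placeholder holidays)
```

With a real list of market holidays in place of the placeholders, the same logic could be applied month by month.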
I have a data frame that looks similar to this:
I know the starting year of the first obs (1963). The obs are in exact chronological order. So the next instance of "Jan" (obs 13) indicates that the year is 1964. Is there a way to create a column "Year" that increases the current year every time the next occurrence of "Jan" happens?
In the pic, it would be "1964" and then when "Jan" happens again, 1965 and so on....
There is an answer to a similar problem that was suggested but it doesn't quite do it and here it is:
## Make data easily reproducible
df <- data.frame(day=c(24, 21, 20, 10, 20, 20, 10, 15),
month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec", "Dec"))
## Convert each month-day combo to its corresponding "julian date"
datestring <- paste("1963", match(df[[2]], month.abb), df[[1]], sep = "-")
date <- strptime(datestring, format = "%Y-%m-%d")
julian <- as.integer(strftime(date, format = "%j"))
## Transitions between years occur wherever julian date increases between
## two observations
df$year <- 1963 - cumsum(diff(c(julian[2], julian))>0)
But this does not work: because the last two observations have the same month ("Dec" and then another "Dec"), the year count changes between them:
The last observation should still read "1960" NOT "1959".
The OP has requested to complete the years in ascending order starting in 1963.
The approach below works without date conversion and dummy dates and can be amended to work with fiscal years (see here).
df$year <- 1963 + cumsum(c(0L, diff(100L*as.integer(
factor(df$month, levels = month.abb)) + df$day) < 0))
df
day month year
1 24 Jun 1963
2 21 Mar 1964
3 20 Jan 1965
4 10 Dec 1965
5 20 Jun 1966
6 20 Jan 1967
7 10 Dec 1967
8 15 Dec 1967
Note that there is a question which seems to be similar but was asking to complete years in descending order. The solution there needs to be changed in two places to work here.
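For reuse, the one-liner for the ascending case could be wrapped in a small helper. This is just a sketch; the function name fill_years and its arguments are made up here. It builds a sortable key of 100 * month number + day and starts a new year whenever that key drops from one observation to the next.

```r
# Hypothetical helper: assign ascending years to chronologically ordered
# month/day observations, starting at start_year
fill_years <- function(month, day, start_year) {
  # e.g. "Jun" 24 -> 624, "Dec" 15 -> 1215
  key <- 100L * match(month, month.abb) + as.integer(day)
  # a drop in the key means the calendar has wrapped into the next year
  as.integer(start_year + cumsum(c(0L, diff(key) < 0)))
}

df <- data.frame(day = c(24, 21, 20, 10, 20, 20, 10, 15),
                 month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec", "Dec"))
df$year <- fill_years(df$month, df$day, 1963)
df$year
# 1963 1964 1965 1965 1966 1967 1967 1967
```

Two consecutive observations in the same month (such as the two "Dec" rows) stay in the same year as long as the day does not decrease.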