I'm trying to filter some daily panel data, but I only want to use the month-end observations. First I need to know the month-end date for each month.
data example:
https://imgur.com/a/SqL7A7F
I tried the code below to get the month-end dates. However, some dates are missed: in some months the last available day is the 24th, 25th, or 26th, so I lose a lot of data.
My question is: how can I get the last date present in the data for each month, without ignoring months whose last day falls earlier (like 3/23, 6/25, etc.)?
library(anytime)
x=anydate(as.vector(fundbv$date))
y=unique(as.Date(format(x+28,"%Y-%m-01"))-1)
finaldays=x[x %in% unique(as.Date(format(x+28,"%Y-%m-01"))-1)]
finaldays=unique(finaldays)
Thanks, much appreciated!
Here's how to do it with dplyr and lubridate:
library(dplyr)
library(lubridate)
# generate a data frame with dates to play with
(df <- tibble(
  date = seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = 6),
  amount = rgamma(length(date), shape = 2, scale = 20)))
# last available date in each month
df %>%
  group_by(month = floor_date(date, "month")) %>%
  summarize(date = max(date))
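If the goal is to keep the full rows for each month's last available date (rather than just listing the dates), the same grouping can feed a filter instead of a summary. This is a minimal sketch: the `panel` data frame and its `value` column are made up for illustration, and it assumes the date column is already of class Date.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily panel: the last observation in a month is not
# always the calendar month end (e.g. 3/23, 6/25).
panel <- tibble(
  date  = as.Date(c("2017-03-22", "2017-03-23",
                    "2017-06-23", "2017-06-25",
                    "2017-07-31")),
  value = c(1, 2, 3, 4, 5)
)

# Keep every row whose date is the latest one observed in its month
month_end_rows <- panel %>%
  group_by(month = floor_date(date, "month")) %>%
  filter(date == max(date)) %>%
  ungroup()

month_end_rows
```

Because the filter compares against the latest date actually present in each month, months that end early in the data are kept, and a panel with several rows per date keeps all of them.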
Related
Apologies if the wording of the title is confusing, it's difficult to describe exactly what I'm looking for. I've got some data with two date fields, let's call them start_date and end_date. I'm interested in knowing whether or not a particular observation "covered" June 30th of any given year (the data spans multiple years).
So, for instance, if start_date = "02-25-2021" and end_date = "01-12-2022", this observation would fit my criteria. By contrast, an observation with start_date = "07-02-2015" and end_date = "08-25-2015" would not, since June 30th does not occur in between the start and end date variables.
The issue is that because my data spans multiple years, it's not straightforward to me how I can identify cases which pass over a date regardless of year. How can I do this type of filtering without having to manually specify a range for every single year? Hope this is clear enough -- thanks for any assistance you can provide.
You could use lubridate to add a column with your test date, and then test whether it falls %within% each interval. If you could share a sample of your data with dput(), it would be easier to provide a clear example. Off the top of my head, I'd try something like:
library(tidyverse)
library(lubridate)
df %>%
mutate(test_date = ymd(paste0(year(end_date),'0630')),
in_range = test_date %within% interval(start_date, end_date))
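One thing to watch, as far as I can tell: building the test date from year(end_date) only checks the June 30th of the end year, so an interval such as 02-25-2021 to 01-12-2022 would be reported as not covering it. A possible variant is to find the first June 30th on or after start_date and compare it with end_date. The sketch below uses a made-up two-row data frame taken from the examples in the question and assumes both columns are Date values.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data based on the two examples in the question
df <- tibble(
  start_date = mdy(c("02-25-2021", "07-02-2015")),
  end_date   = mdy(c("01-12-2022", "08-25-2015"))
)

result <- df %>%
  mutate(
    # June 30th in the same calendar year as start_date
    june30_start_year = make_date(year(start_date), 6, 30),
    # first June 30th that is not before start_date
    next_june30 = if_else(june30_start_year >= start_date,
                          june30_start_year,
                          make_date(year(start_date) + 1, 6, 30)),
    # the interval covers a June 30th if that date is not after end_date
    covers_june30 = next_june30 <= end_date
  )

result
```

This treats both endpoints as inclusive; if an interval that starts or ends exactly on June 30th should not count, the comparisons would need to be strict.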
Here is a solution with base R that can be used in a Tidyverse context. It is a bit hacky, but it does work.
The idea is to create a vector of dates between start_date and end_date and then strip away the year and the dash. When done in this order, the date can be matched as many times as it actually occurs in the vector. The rest is fairly self-explanatory: with basic dplyr functions, you can filter, count, etc.
# Packages
lapply(c("dplyr","tibble","stringr","lubridate"), library, character.only = TRUE)
# Create vector of dates without year
prep_m_d_vec <- function(start_date,
end_date){
out <- seq.Date(from = start_date,
to = end_date,
by = 1) %>%
str_remove_all(pattern = "^[1-3]{1}[0-9]{3}-")
return(out)
}
# Optional: RM year of date of choice
rm_year <- function(d){
out <- format(d,
format="%m-%d")
return(out)
}
# Does not include date_of_choice
date_vec <- prep_m_d_vec(start_date = dmy("30-03-2021"),
end_date = dmy("30-05-2021"))
# Set date
date_of_choice <- rm_year(dmy("30-06-2022"))
# Filter rows
tibble(date = date_vec) %>%
filter(date == date_of_choice)
# date_of_choice is included 40x
date_vec <- prep_m_d_vec(start_date = dmy("30-03-2000"),
end_date = dmy("30-03-2040"))
# Filter rows
tibble(date = date_vec) %>%
filter(date == date_of_choice)
# Check if present and count
tibble(date = date_vec) %>%
summarise(n_date_of_choice = sum(date %in% date_of_choice),
date_of_choice_present = (date_of_choice %in% date))
I have dates in the following format: yyyymmdd. So 10 March 2022 is formatted as 20220310, with no separator between the year, month, and day. Now I want to replace the column containing those dates with a column that only contains the year. Normally I would use the following code:
df <- df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd')))))
Then I would separate the column into three columns (year, month, and day) and delete the month and day columns. But somehow the code above doesn't work. I hope someone can help me out.
Not as fancy, but you could simply get the year from a substring of the whole date:
df$Year <- as.numeric(substr(as.character(df$Date),1,4))
You can try this:
df$column_with_date <- as.integer(x = substr(x = df$column_with_date, start = 1, stop = 4))
The as.integer function is optional, but you could use it to save more space in memory.
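If you would rather go through a real date than a substring, here is a short sketch of two alternatives. The sample values are made up, and it assumes the column holds yyyymmdd values stored as numbers; both approaches are illustrations, not the only way to do it.

```r
library(lubridate)

# Hypothetical column of yyyymmdd values stored as numbers
df <- data.frame(Date = c(20220310, 20191231))

# Option 1: parse to a Date with ymd(), then extract the year
df$Year <- year(ymd(df$Date))

# Option 2: integer division drops the last four digits (mmdd)
df$Year2 <- df$Date %/% 10000

df
```

Option 1 validates the values along the way (an impossible date becomes NA with a warning), while option 2 is pure arithmetic and only works when the column is numeric.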
Your code works if the data is in the format below. You can use mutate_at with a list of year, month, and day to create the three columns like this:
df <- data.frame(Date = c("20220310"))
library(lubridate)
library(dplyr)
df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd'))))) %>%
mutate_at(vars(Date), list(year = year, month = month, day = day))
#> Date year month day
#> 1 2022-03-10 2022 3 10
Created on 2022-07-25 by the reprex package (v2.0.1)
Let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can sort by week, but I don't know how to set a custom range.
Thanks
We can group by year-month (as.yearmon) and by a grouping value built from a logical test on the day of the month (day > 15):
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
  setNames('Date') %>%   # base R setNames(); set_names() would need purrr or rlang attached
  group_by(yearmon = as.yearmon(Date),
           Daygroup = (day(Date) > 15) + 1) %>%
  mutate(Label = str_c(format(Date, '%b'),
                       str_c(min(day(Date)), max(day(Date)), sep='-'), sep= ' '))
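Using base R, you can create two groups in each month by pasting the month abbreviation of each date together with a value of 1 or 2 based on the day of the month.
names(daterange) <- "date"  # give the single column a usable name
# 1 = days 1-15, 2 = day 16 to the end of the month
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
                          ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))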
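To sanity-check a half-month grouping like this, it can help to count the days that land in each group. The sketch below is base R only; the key format ("2020-03-H1") is just a naming choice made for this example, picked because it includes the year and does not depend on the locale.

```r
# Same date range as in the question
dates <- seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day")

# 1 for days 1-15, 2 for day 16 to the end of the month
half <- ifelse(as.integer(format(dates, "%d")) <= 15, 1L, 2L)

# Group key such as "2020-03-H1" (hypothetical naming scheme)
key <- paste(format(dates, "%Y-%m"), half, sep = "-H")

# Number of days in each half-month group
counts <- table(key)
counts
```

Every first half should contain 15 days, and the second half should contain the rest of the month; July has no second half here because the range stops on the 15th.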
I am an aspiring data scientist, and this is my first ever question on Stack Overflow.
I have this line of code to help wrangle my data. My date filter is static, and I would prefer not to have to go in and change this hardcoded value every year. What is the best alternative to make my date filter more dynamic? The date column is also difficult to work with because it is not a "date", it is a "dbl".
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
Tried so far:
df %>%
filter(DATE >= 20191231)
# load packages (lubridate for dates)
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
This looks like this:
DATE
1 20191230
2 20191231
3 20200122
# and now...
df %>% # take the dataframe
mutate(DATE = ymd(DATE)) %>% # turn the DATE column actually into a date
filter(DATE >= floor_date(Sys.Date(), "year") - days(1))
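...and filter rows where DATE is on or after the day before the first day of the current year (floor_date(Sys.Date(), "year") gives January 1st of this year, and subtracting days(1) gives December 31st of the previous year).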
DATE
1 2019-12-31
2 2020-01-22
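Because the filter depends on Sys.Date(), the output above only reproduces when the code is run in 2020. One way to make this easier to check is to wrap it in a small function that takes the reference date as an argument. The function name and the `today` parameter below are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: keep rows dated on or after the last day of the
# year before `today` (defaults to the current date)
filter_since_prev_year_end <- function(data, today = Sys.Date()) {
  cutoff <- floor_date(today, "year") - days(1)
  data %>%
    mutate(DATE = ymd(DATE)) %>%   # convert the yyyymmdd numbers to Date
    filter(DATE >= cutoff)
}

df <- data.frame(DATE = c(20191230, 20191231, 20200122))

# Pin the reference date so the result does not change from year to year
res <- filter_since_prev_year_end(df, today = as.Date("2020-02-01"))
res
```

In everyday use you would call it without `today`, and the cutoff moves forward by itself each year.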
I want to calculate number of days between two dates by excluding the weekends.
You can make a sequence of dates, label each one with its day of the week (both with lubridate), filter out the weekends, and count the number of rows:
library(dplyr)
library(lubridate)
df <- tibble(date = seq(ymd("2018-06-01"), ymd("2018-09-30"), by = "days"))
# note: the "Sat"/"Sun" labels assume an English locale
days <- mutate(df, day = wday(date, label = TRUE)) %>%
  filter(day != "Sat", day != "Sun") %>%
  nrow()
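If you need this for many date pairs, or want to avoid relying on locale-specific day labels, a small base R function is one option. This is a sketch under a few assumptions: the function name is made up, both endpoints are counted, and holidays are not taken into account.

```r
# Hypothetical helper: number of weekdays (Mon-Fri) between two dates,
# counting both endpoints
count_weekdays <- function(start_date, end_date) {
  all_days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  # "%u" gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday
  iso_weekday <- as.integer(format(all_days, "%u"))
  sum(iso_weekday <= 5)
}

count_weekdays("2018-06-01", "2018-09-30")
```

To apply it to two columns of a data frame, something like mapply(count_weekdays, df$start, df$end) should work, where `start` and `end` stand in for your own column names.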
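You can do it with the help of the chron package.
Sample code:
library(chron)
# START_DATE and END_DATE are placeholders for your own Date values
number_of_days <- seq(START_DATE, END_DATE, by = 1)
length(number_of_days)  # all days
length(number_of_days[!is.weekend(number_of_days)])  # weekdays only
Hope this helps.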