Given a dataframe:
df = pd.DataFrame({'c':[0,1,1,2,2,2],'date':pd.to_datetime(['2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-05'])})
How can I get the beginning of the previous month for each date? The code below doesn't work for 2016-06-05, and the result also has an extra time portion.
pd.to_datetime(df['date'], format="%Y%m") + pd.Timedelta(-1,unit='M') + MonthBegin(0)
EDIT
I have a workaround (2 steps back and 1 step forward):
(df['date']+ pd.Timedelta(-2,unit='M')+ MonthBegin(1)).dt.date
I don't like this; there should be something better.
You can first subtract MonthEnd to get to the end of the previous month, then MonthBegin to get to the beginning of the previous month:
df['date'] - pd.offsets.MonthEnd() - pd.offsets.MonthBegin()
The resulting output:
0 2015-12-01
1 2016-01-01
2 2016-02-01
3 2016-03-01
4 2016-04-01
5 2016-05-01
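Below is a self-contained check of the offset approach using the question's df. For comparison it also shows a different technique that does not come from the answer: convert to monthly periods, step back one period, and take each period's start timestamp. Both are sketches; since the input dates are at midnight, both results should carry no extra time portion.

```python
import pandas as pd

df = pd.DataFrame({
    'c': [0, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01',
                            '2016-04-01', '2016-05-01', '2016-06-05']),
})

# Answer's approach: roll back to the previous month end,
# then back to the first day of that month.
via_offsets = df['date'] - pd.offsets.MonthEnd() - pd.offsets.MonthBegin()

# Alternative (not from the answer): shift monthly periods back by one
# and convert each period to its start timestamp.
via_periods = (df['date'].dt.to_period('M') - 1).dt.to_timestamp()

print(pd.DataFrame({'date': df['date'], 'offsets': via_offsets, 'periods': via_periods}))
```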
I have looked for ages, but none of the code I found gives me what I want. I need to create a variable that calculates the difference in months between two date variables.
For example, if I have the data below:
start_date end_date
2010-01-01 2010-12-31
2016-05-01 2016-12-31
2004-03-01 2004-10-31
1997-10-01 1998-08-31
I would like the outcome to look like the following:
start_date end_date month_count
2010-01-01 2010-12-31 12
2016-05-01 2016-12-31 8
2004-03-01 2004-10-31 8
1997-10-01 1998-08-31 11
That is, I would like the final month to be counted in full. Much of the code I have tried gives 11 months for the first observation instead of 12, for example. Also, many answers specify the actual dates, but since I have a large dataset I can't do that and need to work from the variables instead.
Thank you in advance!
A dplyr way:
library(lubridate)
library(dplyr)
df %>% mutate(across(everything(), ~as.Date(.))) %>%
# whole years become 12 months each; +1 counts the final month in full
mutate(months = (year(end_date) - year(start_date)) * 12 + month(end_date) - month(start_date) + 1)
Here is a possible way:
library(data.table)
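dtt <- fread(text = 'start_date end_date
2010-01-01 2010-12-31
2016-05-01 2016-12-31
2004-03-01 2004-10-31
1997-10-01 1998-08-31')
# include the year difference so ranges that span New Year are counted correctly
dtt[, month_count := (year(end_date) - year(start_date)) * 12 + month(end_date) - month(start_date) + 1]
dtt
# start_date end_date month_count
# 1: 2010-01-01 2010-12-31 12
# 2: 2016-05-01 2016-12-31 8
# 3: 2004-03-01 2004-10-31 8
# 4: 1997-10-01 1998-08-31 11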
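The count the question asks for reduces to one formula, where y and m are the calendar year and month of each date and the trailing +1 counts the final month in full:

```latex
\mathrm{month\_count} = 12\,(y_{\mathrm{end}} - y_{\mathrm{start}}) + (m_{\mathrm{end}} - m_{\mathrm{start}}) + 1
```

For the cross-year row 1997-10-01 to 1998-08-31 this gives 12·1 + (8 − 10) + 1 = 11, as expected; dropping the year term would give −1 for that row.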
I have a roundabout solution to get the last Thursday of each month; the reproducible code is below:
import pandas as pd
start = pd.Timestamp('2016-07-27 00:00:00')
end = pd.Timestamp('2016-11-18 00:00:00')
dt_range = pd.Series(pd.date_range(start, end, freq='W-THU'))
t = dt_range.groupby(dt_range.dt.month).last().values.astype('datetime64[D]')
However, it seems unnecessary to generate a range of dates and group them by month just to get the last Thursday. I tried
dt_range = pd.Series(pd.date_range(start, end, freq='4W-THU'))
but this can select the second-to-last Thursday in months that have five Thursdays.
How can I accomplish this more efficiently, preferably in the date_range function itself?
You can use pd.offsets.LastWeekOfMonth to build your custom frequency:
last_thu = pd.offsets.LastWeekOfMonth(weekday=3)
dt_range = pd.Series(pd.date_range('2016-01-01', periods=12, freq=last_thu))
The resulting output:
0 2016-01-28
1 2016-02-25
2 2016-03-31
3 2016-04-28
4 2016-05-26
5 2016-06-30
6 2016-07-28
7 2016-08-25
8 2016-09-29
9 2016-10-27
10 2016-11-24
11 2016-12-29
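A minimal sketch applying the same offset to the question's own start and end dates, with the question's groupby version alongside for comparison. It assumes the standard pandas behaviour that date_range rolls start forward to the first on-offset date and stops at the last on-offset date not after end.

```python
import pandas as pd

# weekday=3 means Thursday (Monday=0)
last_thu = pd.offsets.LastWeekOfMonth(weekday=3)

start = pd.Timestamp('2016-07-27')
end = pd.Timestamp('2016-11-18')

# Offset-based range: only dates that are the last Thursday of their month
last_thursdays = pd.date_range(start, end, freq=last_thu)
print(last_thursdays)

# The question's groupby version, for comparison
weekly = pd.Series(pd.date_range(start, end, freq='W-THU'))
groupby_version = weekly.groupby(weekly.dt.month).last()
print(groupby_version)
```

The two differ at the edge of the range: November's last Thursday (2016-11-24) falls after end, so the offset-based range stops at October, while the groupby version reports 2016-11-17, the last Thursday within the range rather than the last Thursday of November. Which behaviour you want depends on the use case.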
I have a data set containing data for about 4.5 years. I'm trying to create two different data frames from this, for what I will call holiday and non-holiday periods. There are multiple periods per year, and these periods will repeat over multiple years.
For example, I'd like to choose a time period between Thanksgiving and New Year's Day, as well as periods prior to Valentine's Day and Mother's Day for each year, and make this my holiday data frame. Everything else would be non-holiday.
I apologize if this has been asked before; I just can't find it. I found a similar question for SQL, but I'm trying to figure out how to do this in R.
I've tried filtering and selecting, to no avail.
wine.holiday <- wine.sub2 %>%
select(total, cdate) %>%
subset(cdate>=2011-11-25, cdate<=2011-12-31)
wine.holiday
Source: local data frame [27,628 x 3]
Groups: clubgroup_id.x [112]
clubgroup_id.x total cdate
(chr) (dbl) (date)
1 1 45 2011-10-04
2 1 45 2011-10-08
3 1 45 2011-10-09
4 1 45 2011-10-09
5 1 45 2011-10-11
6 1 45 2011-10-15
7 1 45 2011-10-24
8 1 90 2011-11-13
9 1 45 2011-11-18
10 1 45 2011-11-26
.. ... ... ...
Clearly something isn't right, because not only is it not limiting the date range, but it's including a column in the data frame that I'm not even selecting.
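As mentioned in the comments, dplyr uses filter(), not subset(), and the date bounds must be Date objects: written bare, 2011-11-25 is just arithmetic (2011 minus 11 minus 25 = 1975). The extra clubgroup_id.x column appears because wine.sub2 is grouped by it (see the Groups: line), and select() always keeps grouping variables; call ungroup() first if you don't want it. A simple change to your code (not a complete solution to your issue, but hopefully a help) should get the subset working.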
wine.holiday <- wine.sub2 %>%
select(total, cdate)
wine.holiday <- subset(wine.holiday, cdate>=as.Date("2011-11-25") & cdate<=as.Date("2011-12-31"))
wine.holiday
Or, to stick with dplyr piping:
wine.holiday <- wine.sub2 %>%
select(total, cdate) %>%
filter( cdate>=as.Date("2011-11-25") & cdate<=as.Date("2011-12-31") )
wine.holiday
EDIT to add: If the dplyr select isn't working (it looks fine to me), you could try this:
wine.holiday <- subset( wine.sub2, select = c( total, cdate ) )
wine.holiday <- subset(wine.holiday, cdate>=as.Date("2011-11-25") & cdate<=as.Date("2011-12-31"))
wine.holiday
You could, of course, combine those two lines into one. This makes it harder to read, but would probably improve the processing efficiency:
wine.holiday <- subset(wine.sub2, cdate>=as.Date("2011-11-25") & cdate<=as.Date("2011-12-31"), select=c(total,cdate) )
I found another method after looking through SO posts (it took a while). The holiday functions come from the timeDate package:
> library(timeDate)
> library(data.table)
> wine.holiday <- data.table(start = c(as.Date(USThanksgivingDay(2010:2020))),
+ end = as.Date(USNewYearsDay(2011:2021))-1)
> wine.holiday
start end
1: 2010-11-25 2010-12-31
2: 2011-11-24 2011-12-31
3: 2012-11-22 2012-12-31
4: 2013-11-28 2013-12-31
5: 2014-11-27 2014-12-31
6: 2015-11-26 2015-12-31
7: 2016-11-24 2016-12-31
8: 2017-11-23 2017-12-31
9: 2018-11-22 2018-12-31
10: 2019-11-28 2019-12-31
11: 2020-11-26 2020-12-31
I still need to figure out how to add other ranges (e.g. two weeks before Valentine's Day or Mother's Day) to this, and will update this answer if/when I figure it out.
I'm trying to define a custom week for a dataframe.
I have a dataframe with timestamps.
I've read the questions on here regarding isocalendar. While it does the job, it's not what I want.
I'm trying to define weeks as running from Friday to Thursday.
For example:
Friday 2nd Jan 2015 would be the first day of the week.
Thursday 8th Jan 2015 would be the last day of the week.
And this would be week 1.
Is there a way to set a custom first day of the week, so that when I use the datetime functionality I get the result I expect?
df['Week_Number'] = df['Date'].dt.week
Here's one solution - convert your dates to a Period representing weeks that end on Thursday.
In [39]: df = pd.DataFrame({'Date':pd.date_range('2015-1-1', '2015-12-31')})
In [40]: df['Period'] = df['Date'].dt.to_period('W-THU')
In [41]: df['Week_Number'] = df['Period'].dt.week
In [44]: df.head()
Out[44]:
Date Period Week_Number
0 2015-01-01 2014-12-26/2015-01-01 1
1 2015-01-02 2015-01-02/2015-01-08 2
2 2015-01-03 2015-01-02/2015-01-08 2
3 2015-01-04 2015-01-02/2015-01-08 2
4 2015-01-05 2015-01-02/2015-01-08 2
Note that it follows the same convention as datetimes, where week 1 can be incomplete, so you may have to do a little extra munging if you want 1 to be the first complete week.
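One way to do that munging, assuming "week 1" should be the first Friday-to-Thursday week that starts in the calendar year (as in the question's example): count whole weeks from each year's first Friday. friday_week_number is a hypothetical helper, not a pandas function; the Period column from the answer is kept only to label each week's date range.

```python
import pandas as pd

def friday_week_number(dates: pd.Series) -> pd.Series:
    """Number Friday-to-Thursday weeks within each calendar year.

    Assumed convention: week 1 starts on the year's first Friday, and any
    days before that Friday get week 0.
    """
    # January 1 of each date's year
    jan1 = pd.to_datetime(pd.DataFrame({'year': dates.dt.year, 'month': 1, 'day': 1}))
    # Days from January 1 to that year's first Friday (dayofweek: Monday=0, Friday=4)
    to_friday = (4 - jan1.dt.dayofweek) % 7
    first_friday = jan1 + pd.to_timedelta(to_friday, unit='D')
    # Whole weeks elapsed since the first Friday; floor division maps earlier days to 0
    return (dates - first_friday).dt.days // 7 + 1

df = pd.DataFrame({'Date': pd.date_range('2015-01-01', '2015-12-31')})
df['Period'] = df['Date'].dt.to_period('W-THU')  # Friday-to-Thursday weeks, as in the answer
df['Week_Number'] = friday_week_number(df['Date'])
print(df.head())
```

If you would rather treat the stray days before the first Friday as the last week of the previous year, that needs a different rule; this sketch simply labels them 0.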
I have the following data set:
>d
x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012
I want:
> d
x date
1 1 31-12-2013
2 2 31-12-2010
3 3 31-12-2011
4 4 31-12-2012
i.e. the last day of the last month of each date's year.
Please Help!
You can also just use the ceiling_date function from the lubridate package.
You can do something like this, where date is a Date vector (e.g. dmy(d$date) for the question's day-month-year strings):
library(lubridate)
last_date <- ceiling_date(date,"year") - days(1)
ceiling_date(date,"year") gives you the first day of the next year, so subtracting days(1) (or simply 1) gives the last day of the current year.
Hope this helps.
Another option using the lubridate package:
## using d from Roland's answer
transform(d,last =dmy(paste0('3112',year(dmy(date)))))
x date last
1 1 1-3-2013 2013-12-31
2 2 2-4-2010 2010-12-31
3 3 2-5-2011 2011-12-31
4 4 1-6-2012 2012-12-31
d <- read.table(text="x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012", header=TRUE)
d$date <- as.Date(d$date, "%d-%m-%Y")
d$date <- as.POSIXlt(d$date)
d$date$mon <- 11
d$date$mday <- 31
d$date <- as.Date(d$date)
# x date
#1 1 2013-12-31
#2 2 2010-12-31
#3 3 2011-12-31
#4 4 2012-12-31
1) cut.Date
Define cut_year to give the first day of the year. Adding 366 gets us to the next year, and applying cut_year again gets us to the first day of the next year. Finally, subtract 1 to get the last day of the year. The code uses base functionality only.
cut_year <- function(x) as.Date(cut(as.Date(x), "year"))
transform(d, date = cut_year(cut_year(date) + 366) - 1)
2) format
transform(d, date = as.Date(format(as.Date(date), "%Y-12-31")))
3) zoo
A "yearmon" class variable stores the date as the year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec. Thus taking its floor and adding 11/12 gets one to Dec, and as.Date.yearmon(..., frac = 1) uses the last day of the month instead of the first.
library(zoo)
transform(d, date = as.Date(floor(as.yearmon(as.Date(date))) + 11 / 12, frac = 1))
Note: The inner as.Date in cut_year and in the other two solutions can be omitted if it is known that date is already of "Date" class.