Time frequency in R

Good afternoon, colleagues! I have a problem with the following task: I need to plot a time-series graph using a parameter "frequency" that defines the time between two observations. The data are shown below:
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
By default the frequency in this data is 1 hour, but I have two questions:
- How can I determine this frequency from the data automatically, so that it also works for other data? (I tried selecting the time column and calculating frequency = time[2]-time[1], but I got an error.)
- If the required frequency is 3 hours, how can I select the data at that frequency (in other words: the 1st observation, then the 4th, then the 7th, and so on)?
Thank you!
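A minimal sketch of both steps, written in Python with pandas rather than R, and assuming the date and time columns are stored as text (which would also explain why subtracting time[2] - time[1] fails). It parses the two columns into real timestamps, takes the most common gap between consecutive rows as the frequency, and then keeps every k-th row, where k is the target spacing divided by that frequency. The variable names (`freq`, `target`, `every_3h`, `aligned`) are illustrative only.

```python
import pandas as pd

# The six rows from the question (date as "YYYY.MM.DD", time as "HH:MM")
df = pd.DataFrame({
    "date": ["1999.04.08"] * 6,
    "time": ["11:00", "12:00", "13:00", "14:00", "15:00", "16:00"],
    "close": [1.0809, 1.0807, 1.0813, 1.0844, 1.0844, 1.0834],
})

# Text cannot be subtracted, so combine date + time into real timestamps first
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"], format="%Y.%m.%d %H:%M")

# Observation frequency: the most common gap between consecutive rows
freq = df["timestamp"].diff().mode().iloc[0]
print(freq)  # 0 days 01:00:00

# Positional selection: keep rows 1, 4, 7, ... (assumes no missing rows)
target = pd.Timedelta(hours=3)
k = int(target / freq)
every_3h = df.iloc[::k]

# Time-based selection: keep rows that lie a whole number of 3-hour steps
# after the first observation (still correct if some rows are missing)
aligned = df[(df["timestamp"] - df["timestamp"].iloc[0]) % target == pd.Timedelta(0)]
print(aligned[["timestamp", "close"]])
```

The positional version matches the "1st, 4th, 7th" wording of the question; the time-based version is a safer choice if the series has gaps.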

Related

Power BI - Calculating Sum Between 2 Times with DAX

I have half-hourly consumption data and I need to calculate the sum of consumption that takes place between 2:30 AM and 5:00 AM.
I have achieved this in Excel with a SUMIF statement. How do I do this with DAX, though?
Assuming you have a table with columns similar to those in the sample table below, I've included the DAX to calculate the sum of consumption between the times given. This also assumes that you want to calculate this sum for ALL days between 2:30 AM and 5:00 AM.
Sample Table (Table)
id consumption timestamp
1 4 2022-05-28 02:00
2 4 2022-05-28 02:30
3 5 2022-05-28 03:00
4 5 2022-05-28 03:30
5 6 2022-05-28 04:00
6 6 2022-05-28 04:30
7 5 2022-05-28 05:00
8 5 2022-05-28 05:30
Solution Measure
Consumption Sum =
CALCULATE(
    SUM('Table'[consumption]),
    TIME(
        HOUR('Table'[timestamp]),
        MINUTE('Table'[timestamp]),
        SECOND('Table'[timestamp])
    ) >= TIMEVALUE("02:30:00"),
    TIME(
        HOUR('Table'[timestamp]),
        MINUTE('Table'[timestamp]),
        SECOND('Table'[timestamp])
    ) <= TIMEVALUE("05:00:00")
)
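Sample Result: with the sample table above, the measure returns 31, the sum of rows 2 to 7 (4 + 5 + 5 + 6 + 6 + 5).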
A similar result could be achieved using the SUMX function if that's more intuitive for you.
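For comparison only, here is a hedged sketch of the same time-of-day filter in Python with pandas (not DAX, and not part of the Power BI answer). It rebuilds the sample table, compares only the clock time of each timestamp so the window applies to every day, and treats both ends (02:30 and 05:00) as inclusive, like the `>=` and `<=` in the measure.

```python
from datetime import time

import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    "id": range(1, 9),
    "consumption": [4, 4, 5, 5, 6, 6, 5, 5],
    "timestamp": pd.to_datetime([
        "2022-05-28 02:00", "2022-05-28 02:30", "2022-05-28 03:00", "2022-05-28 03:30",
        "2022-05-28 04:00", "2022-05-28 04:30", "2022-05-28 05:00", "2022-05-28 05:30",
    ]),
})

# Compare only the clock time, so the filter applies to all days in the table
clock = df["timestamp"].dt.time
in_window = (clock >= time(2, 30)) & (clock <= time(5, 0))

consumption_sum = df.loc[in_window, "consumption"].sum()
print(consumption_sum)  # 31
```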

R is not giving the correct week number

Hello, I am trying to find the week number for a series of dates over three years, but R is not giving the correct week number. I am generating a sequence of dates from 2016-04-01 to 2019-03-30 and then trying to number the weeks continuously over the three years, so that I get week numbers 54, 55, 56 and so on.
However, for 2016-04-03 R shows week number 14, whereas Excel gives week number 15. R also seems to simply count blocks of 7 days instead of following the actual calendar weeks, and the week number restarts at 1 at the start of every year.
The code looks like this:
library(lubridate)  # month() and week() are assumed to come from lubridate
days <- seq(as.Date("2016-04-01"), as.Date("2019-03-30"), "days")
weekdays <- data.frame(days = days, Month = month(days), week = week(days), nweek = rep(1, length(days)))
This is what the results look like:
days week
2016-04-01 14
2016-04-02 14
2016-04-03 14
2016-04-04 14
2016-04-05 14
2016-04-06 14
2016-04-07 14
2016-04-08 15
2016-04-09 15
2016-04-10 15
2016-04-11 15
2016-04-12 15
However, when I check in Excel, this is what I get:
days week
2016-04-01 14
2016-04-02 14
2016-04-03 15
2016-04-04 15
2016-04-05 15
2016-04-06 15
2016-04-07 15
2016-04-08 15
2016-04-09 15
2016-04-10 16
2016-04-11 16
2016-04-12 16
Can someone please help me identify where I am going wrong?
Thanks a lot in advance!!
It's not anything you're doing wrong per se; there is just a difference in how R (I presume you're using the lubridate package) and Excel calculate week numbers.
R calculates week numbers based on seven-day blocks counted from 1 January of that year; but
Excel calculates week numbers based on weeks starting on Sunday.
Take the first few days of January 2016 as an example. On Friday, 1 January 2016, both R and Excel say this is week 1.
On Sunday, 3 January 2016:
this is within the first seven days of the start of the year so R will return week number 1; but
it is a Sunday, so Excel ticks over to week number 2.
Try this:
ifelse(test = weekdays.Date(days[1]) == "Sunday", yes = epiweek(days[1]), no = epiweek(days[1]) + 1) + cumsum(weekdays.Date(days) == "Sunday")
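This tests whether the first day is a Sunday and picks an appropriate starting week number, then adds one to the week number each Sunday. Because the count is never reset on 1 January, a week that spans two years keeps a single number and the numbering continues past 52 (54, 55, 56, ...). Note that weekdays.Date() returns day names in the current locale, so the comparison with "Sunday" assumes an English locale, and that the + 1 offset lines epiweek() up with Excel for a start date in 2016 (where 1 January is a Friday); a start date in another year may need a different offset.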
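To make the two conventions concrete, here is a sketch in Python (not R) under stated assumptions: `lubridate_week` re-implements the "seven-day blocks from 1 January" rule described above, `excel_weeknum` re-implements Excel's default WEEKNUM rule (weeks start on Sunday, week 1 contains 1 January), and `continuous_weeks` mirrors the counting idea of the R answer by starting from the Excel week of the first day and adding one every Sunday. All three function names are hypothetical helpers, not library APIs.

```python
from datetime import date, timedelta


def lubridate_week(d):
    # Seven-day blocks counted from 1 January, plus one
    return (d.timetuple().tm_yday - 1) // 7 + 1


def excel_weeknum(d):
    # Weeks start on Sunday; week 1 is the week containing 1 January
    jan1 = date(d.year, 1, 1)
    jan1_sunday_index = (jan1.weekday() + 1) % 7  # Sunday=0 ... Saturday=6
    return (d.timetuple().tm_yday - 1 + jan1_sunday_index) // 7 + 1


def continuous_weeks(days):
    # Start from the Excel week of the first day, then add one on every Sunday,
    # never resetting at New Year (so numbers continue past 52)
    week = excel_weeknum(days[0])
    out = []
    for i, d in enumerate(days):
        if i > 0 and d.weekday() == 6:  # Python: Monday=0 ... Sunday=6
            week += 1
        out.append(week)
    return out


days = [date(2016, 4, 1) + timedelta(days=n) for n in range(12)]
for d, r_week, excel_week in zip(days, map(lubridate_week, days), continuous_weeks(days)):
    print(d, r_week, excel_week)
```

For 2016-04-01 to 2016-04-12 the two printed columns reproduce the R and Excel tables in the question.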

Given time column, how can I create time bins in R?

Given a data frame with, say, 3 columns:
date time respond
1/1/2018 15:40 1
4/5/2017 08:25 0
3/4/2016 09:00 1
5/4/2017 09:25 1
....
I want to bin my time column into 24 bins, one for each hour, so that, for example, all times between 08:00 and 09:00 fall into the 08:00 bin, and so on.
Once I have done this, I want to count how many responders I have within each bin:
bin08:00 = 10 responders
bin09:00 = 134 responders
and plot the counts using ggplot2.
Also, how can I create a different bin map, for example:
08:00 to 12:00: hourly bins;
12:00 to 15:00: 15-minute bins, etc.
How can I do this?
One way to do this is to use strptime to parse your time column into POSIX date-time objects, and then use format on those objects to round down to the hour, like so:
library(dplyr)
# Parse "HH:MM" strings and keep only the hour, e.g. "08:25" -> "08:00"
df$hour <- format(strptime(df$time, "%H:%M"), "%H:00")
# Number of responders (respond == 1) in each hourly bin
df %>% group_by(hour) %>% summarize(respond = sum(respond))
# # A tibble: 3 x 2
# hour respond
# <chr> <int>
# 1 08:00 0
# 2 09:00 2
# 3 15:00 1
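As a hedged illustration in Python with pandas (not R), the sketch below does both parts of the question on the four sample rows: hourly bins (same result as the dplyr output above) and a custom bin map with hourly bins from 08:00 to 12:00 followed by 15-minute bins from 12:00 to 15:00. The custom edges are an assumption based on the question's example; times outside 08:00 to 15:00 are simply left out of the custom bins.

```python
import pandas as pd

# Sample rows from the question (the real data frame has more rows)
df = pd.DataFrame({
    "date": ["1/1/2018", "4/5/2017", "3/4/2016", "5/4/2017"],
    "time": ["15:40", "08:25", "09:00", "09:25"],
    "respond": [1, 0, 1, 1],
})

t = pd.to_datetime(df["time"], format="%H:%M")

# 1) Hourly bins: label each time with the start of its hour
df["hour"] = t.dt.strftime("%H:00")
hourly = df.groupby("hour")["respond"].sum()

# 2) Custom bin map, in minutes since midnight (assumed edges):
#    hourly 08:00-12:00, then every 15 minutes 12:00-15:00
minutes = t.dt.hour * 60 + t.dt.minute
edges = list(range(8 * 60, 12 * 60, 60)) + list(range(12 * 60, 15 * 60 + 1, 15))
labels = [f"{m // 60:02d}:{m % 60:02d}" for m in edges[:-1]]
df["bin"] = pd.cut(minutes, bins=edges, right=False, labels=labels)  # [start, end)

# observed=False keeps empty bins, so they show up with a count of 0
custom = df.groupby("bin", observed=False)["respond"].sum()
print(hourly)
print(custom)
```

Working in minutes since midnight keeps the bin edges as plain integers; `right=False` makes each bin include its start time and exclude its end time, so 09:00 falls into the 09:00 bin.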

Combine timedelta and date column, group by time interval

I need to combine two separate columns to one datetime column.
The pandas dataframe looks as follows:
calendarid time_delta_actualdeparture actualtriptime
20140101 0 days 06:35:49.000020000 27.11666667
20140101 0 days 06:51:37.000020000 24.83333333
20140101 0 days 07:11:40.000020000 28.1
20140101 0 days 07:31:40.000020000 23.03333333
20140101 0 days 07:53:34.999980000 23.3
20140101 0 days 08:14:13.000020000 51.81666667
I would like to convert it to look like this:
calendarid actualtriptime
2014-01-01 6:30:00 mean of trip times in time interval
2014-01-01 7:00:00 mean of trip times in time interval
2014-01-01 7:30:00 mean of trip times in time interval
2014-01-01 8:00:00 mean of trip times in time interval
2014-01-01 8:30:00 mean of trip times in time interval
Essentially, I would like to combine the two columns into one and then group into 30-minute time intervals, taking the mean of the actual trip time in each interval. I've unsuccessfully tried many techniques, but I am still learning Python/pandas. Can anyone help me with this?
Convert your 'calendarid' column to a datetime and add the delta to get the starting times.
In [5]: df['calendarid'] = pd.to_datetime(df['calendarid'], format='%Y%m%d')
In [7]: df['calendarid'] = df['calendarid'] + df['time_delta_actualdeparture']
In [8]: df
Out[8]:
calendarid time_delta_actualdeparture actualtriptime
0 2014-01-01 06:35:49.000020 06:35:49.000020 27.116667
1 2014-01-01 06:51:37.000020 06:51:37.000020 24.833333
2 2014-01-01 07:11:40.000020 07:11:40.000020 28.100000
3 2014-01-01 07:31:40.000020 07:31:40.000020 23.033333
4 2014-01-01 07:53:34.999980 07:53:34.999980 23.300000
5 2014-01-01 08:14:13.000020 08:14:13.000020 51.816667
Then you can set your date column as the index and resample at a 30-minute frequency to get the mean over each interval.
In [19]: df.set_index('calendarid')[['actualtriptime']].resample('30min', label='right').mean()
Out[19]:
actualtriptime
calendarid
2014-01-01 07:00:00 25.975000
2014-01-01 07:30:00 28.100000
2014-01-01 08:00:00 23.166667
2014-01-01 08:30:00 51.816667
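A self-contained sketch of the same steps with the current pandas API, assuming `calendarid` is stored as an integer like 20140101 and the departure offsets can be parsed with `pd.to_timedelta`. The combined column is given a new, illustrative name (`departure`), and the default `label='left'` is kept so each interval is labelled by its start time (06:30, 07:00, ...), which matches the layout the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "calendarid": [20140101] * 6,
    "time_delta_actualdeparture": pd.to_timedelta([
        "06:35:49.000020", "06:51:37.000020", "07:11:40.000020",
        "07:31:40.000020", "07:53:34.999980", "08:14:13.000020",
    ]),
    "actualtriptime": [27.11666667, 24.83333333, 28.1, 23.03333333, 23.3, 51.81666667],
})

# Midnight of the calendar day plus the time elapsed since midnight
df["departure"] = (pd.to_datetime(df["calendarid"].astype(str), format="%Y%m%d")
                   + df["time_delta_actualdeparture"])

# 30-minute intervals, each labelled by its start time; mean trip time per interval
means = df.set_index("departure")["actualtriptime"].resample("30min").mean()
print(means)
```

Intervals without any trips come out as NaN rather than being dropped.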

Pandas custom week

I'm trying to define a custom week for a dataframe.
I have a dataframe with timestamps.
I've read the questions on here regarding isocalendar. While it does the job, it's not what I want.
I'm trying to define the weeks from Friday to Thursday.
For example:
Friday 2nd Jan 2015 would be the first day of the week.
Thursday 8th Jan 2015 would be the last day of the week.
And this would be week 1.
Is there a way to set a custom first day of the week, so that when I use the datetime accessor I get the result I expect?
df['Week_Number'] = df['Date'].dt.week
Here's one solution - convert your dates to a Period representing weeks that end on Thursday.
In [39]: df = pd.DataFrame({'Date':pd.date_range('2015-1-1', '2015-12-31')})
In [40]: df['Period'] = df['Date'].dt.to_period('W-THU')
In [41]: df['Week_Number'] = df['Period'].dt.week
In [44]: df.head()
Out[44]:
Date Period Week_Number
0 2015-01-01 2014-12-26/2015-01-01 1
1 2015-01-02 2015-01-02/2015-01-08 2
2 2015-01-03 2015-01-02/2015-01-08 2
3 2015-01-04 2015-01-02/2015-01-08 2
4 2015-01-05 2015-01-02/2015-01-08 2
Note that it follows the same convention as datetimes, where week 1 can be incomplete, so you may have to do a little extra munging if you want 1 to be the first complete week.
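If week 1 should be the first complete Friday-to-Thursday week, as in the question, one possible sketch (an assumption, not a pandas built-in) is to compute each date's week start by stepping back to the most recent Friday and count whole weeks from the first Friday of that year. Under this hypothetical convention, days before the first Friday get week 0; the column names `Week_Start` and `Week_Number` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2015-01-01", "2015-12-31")})

# pandas dayofweek: Monday=0 ... Friday=4 ... Sunday=6
days_since_friday = (df["Date"].dt.dayofweek - 4) % 7
df["Week_Start"] = df["Date"] - pd.to_timedelta(days_since_friday, unit="D")

# First Friday of each date's year (2015-01-02 for 2015)
year_start = df["Date"] - pd.to_timedelta(df["Date"].dt.dayofyear - 1, unit="D")
first_friday = year_start + pd.to_timedelta((4 - year_start.dt.dayofweek) % 7, unit="D")

# Whole weeks since the first Friday, plus one (days before it get 0)
df["Week_Number"] = (df["Week_Start"] - first_friday).dt.days // 7 + 1
print(df.head(10))
```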
