Why would per-minute step count and per-hour step count from HealthKit differ?

I am using a method similar to the Swift example in Get total step count for every date in HealthKit to acquire the number of steps from HealthKit. That works great.
My preference would be to get the number of steps per minute or per hour, though, instead of the per-day totals that code produces. While the sum of hourly steps perfectly matches the daily step count reported by HealthKit, the sum of per-minute steps matches neither the hourly nor the daily sums.
Is there a way to get per-minute step summaries to work? Or is there a logical explanation for why they are vastly different?
The only differences between the code above and my code are the following for per-hour calculations (works the same):
interval.hour = 1
var anchorComponents = calendar.dateComponents([.hour, .day, .month, .year], from: NSDate() as Date)
and the following for per-minute calculations (usually overcounts):
interval.minute = 1
var anchorComponents = calendar.dateComponents([.minute, .hour, .day, .month, .year], from: NSDate() as Date)
Clearly I am missing something. Thanks for any insight.

There are options (.strictStartDate, .strictEndDate) for the query predicate that determine whether the query finds all samples that are strictly within the interval (strict stop and start date), or whether it will include samples that stop or start outside of the interval.
My suspicion is that your step samples may cross minute boundaries, and depending on how you define the interval, this will make a significant difference.
I work with daily summaries and have defined the interval to be (strict, not-strict) so that a sample that spans two days is only counted in the one that it starts in. (Swift 4)
let predicate = HKQuery.predicateForSamples(withStart: start, end: end, options: .strictStartDate)
I note that the final answer in the SO question you linked to defines a (strict, not-strict) interval as well.
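To make the suspicion above concrete, here is a small Python simulation (not HealthKit code). The sample data and the helper bucket_totals are made up for illustration, and the "any overlap" rule is only my model of a predicate without strict options, not a statement of how HealthKit aggregates internally. It shows that a sample spanning several minute buckets gets counted once per bucket it touches unless you require the start to fall inside the bucket.

```python
from datetime import datetime, timedelta

# Hypothetical step samples: (start, end, steps). The first one spans
# three different clock minutes, which is the case the answer suspects.
samples = [
    (datetime(2024, 1, 1, 9, 0, 30), datetime(2024, 1, 1, 9, 2, 10), 120),
    (datetime(2024, 1, 1, 9, 2, 10), datetime(2024, 1, 1, 9, 2, 50), 40),
]

def bucket_totals(samples, origin, width, count, strict_start):
    """Sum steps per bucket of the given width, starting at origin."""
    totals = []
    for i in range(count):
        lo = origin + i * width
        hi = lo + width
        total = 0
        for start, end, steps in samples:
            if strict_start:
                matches = lo <= start < hi          # sample must start inside the bucket
            else:
                matches = start < hi and end > lo   # any overlap with the bucket counts
            if matches:
                total += steps
        totals.append(total)
    return totals

origin = datetime(2024, 1, 1, 9, 0)
minute = timedelta(minutes=1)
hour = timedelta(hours=1)

true_total = sum(steps for _, _, steps in samples)
loose_minutes = sum(bucket_totals(samples, origin, minute, 60, strict_start=False))
strict_minutes = sum(bucket_totals(samples, origin, minute, 60, strict_start=True))
loose_hours = sum(bucket_totals(samples, origin, hour, 1, strict_start=False))

print(true_total, loose_minutes, strict_minutes, loose_hours)
```

In this toy data the hourly bucket is unaffected because no sample crosses an hour boundary, while the per-minute sum is inflated by the sample that crosses two minute boundaries. That matches the pattern described in the question, but your real samples may behave differently.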


A difference between startof and firstof values of indexAt parameter in to.period

According to the documentation:
To adjust the final indexing style, it is possible to set indexAt to
one of the following: ‘yearmon’, ‘yearqtr’, ‘firstof’, ‘lastof’,
‘startof’, or ‘endof’. The final index will then be yearmon, yearqtr,
the first time of the period, the last time of the period, the
starting time in the data for that period, or the ending time in the
data for that period, respectively.
Now I am trying to use to.hourly on my minute data. I see that by default all values are indexed at the last minute in the data. I want them indexed at the first minute instead. It seems indexAt is the parameter exactly for that. I have one hour that starts at 09:30. As I understand from the description, 'firstof' should set it to 09:00 (first minute of the hourly period) and 'startof' should set it to 09:30 (first available minute in the hour's data). 'startof' seems to work for me but 'firstof' does not work and still returns 09:59! Am I missing something?
If you look at the Usage section of ?to.period, you will see that only to.monthly and to.quarterly have indexAt arguments. That is why to.hourly ignores the indexAt argument.
Issue #158 briefly discusses the possibility of adding indexAt support for periods other than monthly and quarterly.
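For reference, the distinction the question expects between the labels can be written out in a few lines of Python. This is only a sketch of the documented wording as the question reads it; it is not xts code, and hourly_index and the sample timestamps are hypothetical names and data of my own.

```python
from datetime import datetime

# Hypothetical minute timestamps for one hour whose data begins at 09:30.
stamps = [datetime(2024, 1, 2, 9, m) for m in range(30, 60)]

def hourly_index(stamps, index_at):
    """Return one label per hour according to the requested indexing style."""
    by_hour = {}
    for ts in stamps:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        by_hour.setdefault(hour_start, []).append(ts)
    labels = []
    for hour_start, members in sorted(by_hour.items()):
        if index_at == "firstof":      # first time of the period itself
            labels.append(hour_start)
        elif index_at == "startof":    # first timestamp present in the data
            labels.append(min(members))
        elif index_at == "endof":      # last timestamp present in the data
            labels.append(max(members))
        else:
            raise ValueError(f"unsupported index_at: {index_at}")
    return labels

print(hourly_index(stamps, "firstof"))  # label at 09:00
print(hourly_index(stamps, "startof"))  # label at 09:30
print(hourly_index(stamps, "endof"))    # label at 09:59
```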

R - Cluster x number of events within y time period

I have a dataset with 59k entries recorded over 63 years. I need to identify clusters of events, the criterion being:
6 or more events within 6 hours
Each event has a unique ID, a time (HH:MM:SS) and a date (DD:MM:YY). The output would ideally have a cluster ID, the events that took place within each cluster, and the start and finish time and date.
Thinking about the problem in R, we would need to look at every date/time and count the number of events in the following 6 hours. If the number is 6 or greater, save the event IDs; if not, move on to the next date and perform the same task. I have taken a data extract that just contains EventID, Date, Time and Year.
If I come up with anything in the meantime I will post below.
Update: Having taken a break to think about the problem I have a new approach.
Add 6 hours to the Date/Time of each event, then count the number of events that fall between that start and end time. If there are 6 or more, take the event IDs and assign them a cluster ID. Then move on to the next event and repeat, 59k times, as a loop.
Don't use clustering. It's the wrong tool. And the wrong term. You are not looking for abstract "clusters", but something much simpler and much more well defined. In particular, your data is 1 dimensional, which makes things a lot easier than the multivariate case omnipresent in clustering.
Instead, sort your data and use a sliding window.
If your data is sorted, and time[x+5] - time[x] < 6 hours, then these events satisfy your condition.
Sorting is O(n log n), but highly optimized. The remainder is O(n) in a single pass over your data. This will beat every single clustering algorithm, because they don't exploit your data characteristics.
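A minimal sketch of the sort-plus-sliding-window idea, written in Python rather than R so it can be run standalone. The function name find_bursts, the (event_id, timestamp) tuple layout and the rule of merging overlapping windows into one group are my own assumptions; the answer only specifies the time[x+5] - time[x] < 6 hours test.

```python
from datetime import datetime, timedelta

def find_bursts(events, min_events=6, span=timedelta(hours=6)):
    """events: iterable of (event_id, timestamp) pairs in any order.

    Returns one dict per group of events covered by at least one window of
    min_events events spanning less than `span`. Overlapping windows are
    merged, so a merged group may itself last longer than `span`.
    """
    ordered = sorted(events, key=lambda e: e[1])
    bursts = []  # each entry is [first_index, last_index] into ordered
    for i in range(len(ordered) - min_events + 1):
        j = i + min_events - 1
        if ordered[j][1] - ordered[i][1] < span:
            if bursts and i <= bursts[-1][1]:
                bursts[-1][1] = j        # overlaps the previous hit: extend it
            else:
                bursts.append([i, j])    # start a new group
    return [
        {
            "cluster_id": n,
            "event_ids": [e[0] for e in ordered[a:b + 1]],
            "start": ordered[a][1],
            "end": ordered[b][1],
        }
        for n, (a, b) in enumerate(bursts, start=1)
    ]

# Hypothetical data: seven events close together, two isolated ones.
base = datetime(1990, 1, 1)
events = [(i + 1, base + timedelta(hours=h))
          for i, h in enumerate([0, 1, 2, 3, 4, 5, 5.5])]
events += [(8, base + timedelta(days=10)), (9, base + timedelta(days=20))]
events.reverse()  # input order does not matter because the function sorts

result = find_bursts(events)
print(result)
```

The loop after the sort is a single pass, as described above. If you need every group to fit strictly inside 6 hours, drop the merging step and report each qualifying window on its own.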

Graphite: append a "current time" point to the end of a series

I have a "succeeded" metric that is just the timestamp. I want to see the time between successive successes (this is how long the data is stale for). I have
but I also want to know how long it has been between the last success time and the current time. Since derivative transforms xs[n] to xs[n+1] - xs[n], the "last" delta doesn't exist. How can I do this? Something like:
derivative(append(Success, now()))
I don't see any graphite functions for appending series, and I don't see any user-defined graphite functions.
The general problem is to be alerted when the data is stale, via graphite monitoring. There may be a better solution than the one I'm thinking about.
identity is a function whose value at any given time is the timestamp of that time.
keepLastValue is a function that takes a series and replicates data points forward over gaps in the data.
So then diffSeries(identity("now"), keepLastValue(Success)) will be a "sawtooth" series that climbs steadily while Success isn't updated, and jumps down to zero (or close to it — there might be some time skew) every time Success has a data point. If you use graphite monitoring to get the current value of that expression and compare it to some threshold, it will probably do what you want.
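The arithmetic behind that expression can be checked outside Graphite with a few lines of Python. This is only a model of the three steps (timestamp of each point, carry the last value forward, subtract); the function staleness and the sample series are hypothetical and nothing here calls Graphite.

```python
def staleness(timestamps, success):
    """Model of diffSeries(identity, keepLastValue(Success)).

    timestamps[i] is the time of datapoint i (what identity yields);
    success[i] is the success timestamp recorded at that point, or None
    where the series has a gap.
    """
    last = None
    out = []
    for now, value in zip(timestamps, success):
        if value is not None:
            last = value  # carry the most recent success forward over gaps
        out.append(None if last is None else now - last)
    return out

# Hypothetical one-minute series: successes at t=0 and t=180, gaps elsewhere.
timestamps = [0, 60, 120, 180, 240]
success = [0, None, None, 180, None]
ages = staleness(timestamps, success)
print(ages)
```

The result climbs by 60 at each point without a success and drops back to zero when a success arrives, which is the sawtooth you would compare against an alert threshold.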

VB or macro to exclude period of times from time duration calculation in Excel

I have an Excel table which contains thousands of incident tickets. Each ticket typically spans a few hours or a few days, and I usually calculate the total duration by subtracting the opening date and time from the closing date and time.
However I would like to take into account and not count the out of office hours (night time), week-ends and holidays.
I have therefore created two additional reference tables, one which contains the non-working hours (eg everyday after 7pm until 7am in the morning, saturday and sunday all day, and list of public holidays).
Now I need to find some sort of VB macro that would automatically calculate each ticket "real duration" by removing from the total ticket time any time that would fall under that list.
I had a look around this website and other forums, however I could not find what I am looking for. If someone can help me achieve this, I would be extremely grateful.
You can use the NETWORKDAYS function to calculate the number of working days in the interval. Actually you seem to be perfectly set up for it: it takes start date, end date and a pointer to a range of holidays. By default it counts all days non-weekend.
For calculating the intraday time, you will need some additional magic. Assuming that tickets are only opened and closed during business hours, it would look like this:
first_day_hrs := dayend - ticketstart
last_day_hrs := ticketend - daystart
inbetween_hrs := (NETWORKDAYS(ticketstart, ticketend, rng_holidays) - 2) * (dayend - daystart)
total_hrs := first_day_hrs + inbetween_hrs + last_day_hrs
Of course the names should in reality refer to Excel cells. I recommend using lists and/or names.
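To sanity-check the formula before building it into a sheet, here is the same calculation as a Python sketch. working_days is my own stand-in for NETWORKDAYS (inclusive count of Monday to Friday dates that are not holidays), the 7am to 7pm office day is taken from the question, and the dates are made up. As in the formula, it assumes tickets are opened and closed inside business hours on working days.

```python
from datetime import date, datetime, time, timedelta

def working_days(start, end, holidays):
    """Inclusive count of Mon-Fri dates in [start, end] that are not holidays."""
    days = 0
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:
            days += 1
        d += timedelta(days=1)
    return days

def business_hours(opened, closed, holidays=(), day_start=time(7), day_end=time(19)):
    """Ticket duration in hours, counting only office hours on working days."""
    first_day = datetime.combine(opened.date(), day_end) - opened
    last_day = closed - datetime.combine(closed.date(), day_start)
    day_len = (datetime.combine(date.min, day_end)
               - datetime.combine(date.min, day_start))
    # Minus 2 because the first and last day are handled separately above.
    in_between = (working_days(opened.date(), closed.date(), set(holidays)) - 2) * day_len
    return (first_day + in_between + last_day).total_seconds() / 3600

# Hypothetical ticket: opened Friday 16:00, closed the following Tuesday 10:00.
opened = datetime(2024, 3, 1, 16, 0)
closed = datetime(2024, 3, 5, 10, 0)
print(business_hours(opened, closed))                                  # Monday counted
print(business_hours(opened, closed, holidays=[date(2024, 3, 4)]))     # Monday is a holiday
print(business_hours(datetime(2024, 3, 5, 9, 0), datetime(2024, 3, 5, 11, 30)))
```

Note that the formula also gives the right answer for a ticket opened and closed on the same day, because the "minus 2" term then subtracts exactly one office day.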

Interval of one month back not working on the 31st?

Essentially, I have a query that is responsible for fetching all records (with specific filters) within the last month. I'm using Oracle's interval keyword and all was working great until today (December 31st, 2009). The code I'm using is
select (sysdate - interval '1' month) from dual
and the error I get is
ORA-01839: date not valid for month specified
How can I use the interval keyword to be compatible with any date? Or if anyone has a better way of approaching the issue, I'm all ears.
Thank you.
select add_months(sysdate,-1) from dual
Being pedantic...
The requirements are not quite specified perfectly unambiguously. What does the business mean by "within the last month"? Most people would take that to mean "within the current calendar month" in which case I'd use:
Otherwise, perhaps they want an arbitrary period of 1 month prior to the current date - but then how do you define that? As you've found, INTERVAL '1' MONTH simply subtracts one from the month portion of the date - e.g. 15-JAN-2009 - INTERVAL '1' MONTH returns 15-DEC-2008. For some dates, this results in an invalid date because not all months have the same number of days.
ADD_MONTHS resolves this by returning the last day in the month, e.g. ADD_MONTHS(31-DEC-2009,-1) returns 30-NOV-2009.
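The difference between the two behaviours can be illustrated with a short Python sketch. Both functions are my own approximations written for this answer, not Oracle code: naive_month_shift changes only the month field and fails when the day does not exist, while add_months clamps to the last day of the target month. Oracle's ADD_MONTHS has a further rule for inputs that are already the last day of their month, which this sketch does not model.

```python
import calendar
from datetime import date

def naive_month_shift(d, months):
    """Change only the month (and year) fields, keeping the day as is."""
    index = d.year * 12 + (d.month - 1) + months
    # Raises ValueError when the day does not exist in the target month.
    return date(index // 12, index % 12 + 1, d.day)

def add_months(d, months):
    """Shift by whole months, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

print(naive_month_shift(date(2009, 1, 15), -1))  # a valid date, no problem
print(add_months(date(2009, 12, 31), -1))        # clamped to the end of November

try:
    naive_month_shift(date(2009, 12, 31), -1)    # 31 November does not exist
    naive_failed = False
except ValueError:
    naive_failed = True
print(naive_failed)
```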
Another possibility is that the business actually wants to use an average month period - e.g. 365/12 which is approximately 30.4. They might want you to use SYSDATE-30, although of course twelve iterations of this will only cover 360 days of the year.
