SQLite query GROUP BY range

Here are my table's columns:
Time | Close | High | Low | Open | pairVolume | Trades | Volume
I would love to have my data grouped by ranges of time.
Now the tricky part is that this range is custom: it's a user input which could very well be grouping by 10 minutes, 2 hours, or even 5 days.
My time field is stored in milliseconds since epoch.
The solution I found for now, which I'm uncertain about:
SELECT time + (21600000 - (time % 21600000)) AS gap, COUNT(time)
FROM price_chart
WHERE time >= 1517418000000 AND time <= 1518195600000
GROUP BY gap
21600000 is 6 hours in milliseconds; time is milliseconds since epoch.

Yes, it works.
Putting some numbers into Excel with your formula below, it works for me. Your gap value will be returned as the top end of each time range grouping.
SELECT time + (21600000 - (time%21600000)) as gap ...
Using the below:
SELECT time - (time%21600000) as gap_bottom ...
This would return the bottom end of each time range grouping. You could add it as an additional calculated column and have both ends returned, as in the sketch below.
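A minimal sketch of that combined query, reusing the table, time bounds, and 6-hour interval from the question (note gap_top is always gap_bottom + 21600000):
SELECT time - (time % 21600000) AS gap_bottom,
       time + (21600000 - (time % 21600000)) AS gap_top,
       COUNT(time)
FROM price_chart
WHERE time >= 1517418000000 AND time <= 1518195600000
GROUP BY gap_bottom;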
EDIT / PS:
You can also use SQLite's date formatting functions after dividing your epoch time by 1,000 and converting it to SQLite's unixepoch seconds:
strftime('%Y-%m-%d %H:%M:%S', datetime(1517418000000 / 1000, 'unixepoch') )
... for ...
SELECT strftime('%Y-%m-%d %H:%M:%S', datetime( (time + (21600000 - (time%21600000))) / 1000, 'unixepoch') ) as gap ...
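And since the grouping interval is a user input, the 21600000 need not be hardcoded; a sketch using SQLite's numbered parameters, with ?1 bound to the interval in milliseconds and ?2/?3 to the time range:
SELECT time - (time % ?1) AS gap_bottom, COUNT(time)
FROM price_chart
WHERE time >= ?2 AND time <= ?3
GROUP BY gap_bottom;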

Related

How to select a fixed number of data points spread evenly over a time range

I am having a hard time creating a SQLite command that will return evenly spaced data points, based on time, when the number of data points exceeds 50.
Basically, I have data stored every 30 seconds. However, if I want to see the data for the last hour, that is a very large amount of data, and it ends up freezing my RPi as it attempts to visualize it. So, my solution is to create a SQLite command that limits the number of returned data points to 50, spread evenly across a time range.
I have separate commands for the last minute, 5 minutes, 10 minutes, etc. Once it goes beyond 1 hour, I need to limit the data, so I can hard-code this into the command (no need for IF statements).
Here is my current attempt at the command, which is not working:
select Voltage from Battery2 where Timestamp >= Datetime('now', '-1 hour') % (SELECT COUNT(*)/50 FROM Battery2)=0;
This is based on this Stack Overflow post: How to select fixed number of evenly spread rows in timeseries sqlite database
EDIT:
Here is some sample data from the output of the function:
Voltage: 54
Timestamp: "2022-01-13 16:47:47"
Voltage: 54
Timestamp: "2022-01-13 16:48:18"
Voltage: 54
Timestamp: "2022-01-13 16:48:49"
You can use the NTILE() window function to divide the result set into 50 groups based on the column Timestamp, and then, with aggregation, pick one row from each group with the MAX() or MIN() aggregate function:
WITH cte AS (
  SELECT *, NTILE(50) OVER (ORDER BY Timestamp) nt
  FROM Battery2
  WHERE Timestamp >= datetime('now', '-1 hour')
)
SELECT MAX(Timestamp) AS Timestamp, Voltage
FROM cte
GROUP BY nt;
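If a representative value per bucket is preferable to a single sampled row, the same CTE can feed an average instead; a minimal sketch, assuming the same Battery2 table and columns:
WITH cte AS (
  SELECT *, NTILE(50) OVER (ORDER BY Timestamp) nt
  FROM Battery2
  WHERE Timestamp >= datetime('now', '-1 hour')
)
SELECT MAX(Timestamp) AS Timestamp,
       AVG(Voltage) AS Voltage  -- mean voltage within each of the 50 buckets
FROM cte
GROUP BY nt;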

Get number of milliseconds for a localised date, taking into account daylight savings

I have data in Google BigQuery that looks like this:
sample_date_time_UTC      time_zone     milliseconds_between_samples
-----------------------   ------------  ----------------------------
2019-03-31 01:06:03 UTC   Europe/Paris  60000
2019-03-31 01:16:03 UTC   Europe/Paris  60000
...
Data samples are expected at regular intervals, indicated by the value of the milliseconds_between_samples field. The time_zone is a string that represents a Google Cloud supported timezone value.
I'm then checking the ratio of the actual number of samples compared to the expected number over any particular day, for any single day range (expressed as a local date, for the given time_zone):
with data as
(
  select
    -- convert sample_date_time_UTC to the equivalent local datetime for the timezone
    DATETIME(sample_date_time_UTC, time_zone) as localised_sample_date_time,
    milliseconds_between_samples
  from `mytable`
  where sample_date_time between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
)
select
  date(localised_sample_date_time) as localised_date,
  count(*) / (86400000 / avg(milliseconds_between_samples)) as ratio_of_daily_sample_count_to_expected
from data
group by localised_date
order by localised_date
The problem is that this has a bug, as I've hardcoded the expected number of milliseconds in a day to 86400000. This is incorrect: when daylight saving begins in the specified time_zone (Europe/Paris), a day is 1 hour shorter, and when daylight saving ends, the day is 1 hour longer.
So, the query above is incorrect. It queries data for 31 March 2019 in the Europe/Paris timezone, which is when daylight saving started in that timezone. The number of milliseconds in that day should be 82800000.
Within the query, how can I get the correct number of milliseconds for the specified localised_date?
Update:
I tried doing this to see what it returns:
select DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000+02:00', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000+01:00', 'Europe/Paris'), MILLISECOND)
That didn't work - I get 86400000
You can get the difference in milliseconds for the two timestamps by removing the +01:00 and +02:00. Note that this gives the difference between the timestamps in UTC: 90000000, which is not the same as the actual milliseconds that passed.
You can do something like this to get the milliseconds for one day:
select 86400000 + (86400000 - DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000', 'Europe/Paris'), MILLISECOND))
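For the dates above this evaluates to 86400000 + (86400000 - 90000000) = 82800000 ms, i.e. the 23-hour spring-forward day.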
Thanks @Juta for the hint on using UTC times for the calculation. As I'm grouping my data for each day by a localised date, I figured out that I can work out the milliseconds for each day by getting the beginning and end datetimes (in UTC) for my 'localised' date, using the following logic:
-- get UTC start datetime for localised date
-- get UTC end datetime for localised date
-- this then gives the milliseconds for that localised date:
datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND);
So, my full query becomes:
with daily_sample_count as (
  with data as
  (
    select
      -- get the date in the local timezone, for sample_date_time_UTC
      DATE(sample_date_time_UTC, time_zone) as localised_date,
      time_zone,  -- carried through; needed below to compute the UTC day bounds
      milliseconds_between_samples
    from `mytable`
    where sample_date_time between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
  )
  select
    localised_date,
    count(*) as daily_record_count,
    avg(milliseconds_between_samples) as daily_avg_millis_between_samples,
    datetime(timestamp(localised_date, time_zone)) as utc_start_datetime,
    datetime(timestamp(date_add(localised_date, interval 1 day), time_zone)) as utc_end_datetime
  from data
  group by localised_date, time_zone
)
select
  localised_date,
  -- ratio_of_daily_sample_count_to_expected, based on the actual vs expected
  -- number of samples for the day; the number of milliseconds in the day changes
  -- when transitioning in/out of daylight saving, so we calculate it per day
  daily_record_count / (datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND) / daily_avg_millis_between_samples) as ratio_of_daily_sample_count_to_expected
from daily_sample_count
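As a sanity check, the UTC day-bound difference for the spring-forward date in the question can be computed directly; a minimal sketch in BigQuery:
select datetime_diff(
  datetime(timestamp('2019-04-01', 'Europe/Paris')),
  datetime(timestamp('2019-03-31', 'Europe/Paris')),
  MILLISECOND)  -- returns 82800000, the 23-hour day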

Fetch data at every 2 minutes in SQLite

Please can anyone help me with fetching data from my SQLite table at every 2 minutes between my start time and stop time?
I have two columns, Data and TimeStamp, and I am filtering between two timestamps, which works fine. What I am trying to do is return my data at 2-minute intervals. For example, if my start time is 2016-12-15 10:00:00 and my stop time is 2016-12-15 10:10:00, the results should be at 2016-12-15 10:00:00, 2016-12-15 10:02:00, 2016-12-15 10:04:00, ...
Add, to your where clause, an expression that looks for 2 minute boundaries:
strftime("%s", TimeStamp) % 120 = 0
This assumes you have data on exact, 2-minute boundaries. It will ignore data between those points.
strftime("%s", TimeStamp) converts your time stamp string into a single number representing the number of seconds since Jan 1st, 1970. The % 120 does modulo arithmetic resulting in 0 every 120 seconds. If you want minute boundaries, use 60. If you want hourly, use 3600.
What's more interesting -- and I've used this -- is to take all the data between boundaries and average them together:
SELECT CAST(strftime("%s", TimeStamp) / 120 AS INTEGER) * 120 as stamp, AVG(Data)
FROM table
WHERE TimeStamp >= '2016-12-15 10:00:00' AND
TimeStamp < '2016-12-15 10:10:00'
GROUP BY stamp;
This averages all data with timestamps in the same 2-minute "bin". The second date comparison is < rather than <= because with <= the last bin would only average one sample, whereas the other bins would be averages of multiple values. You could also add MAX(Data) and MIN(Data) columns if you want to know how much the data changed within each bin, as sketched below.
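A minimal sketch of that variant (here "table" still stands in for your real table name, as in the answer above):
SELECT CAST(strftime('%s', TimeStamp) / 120 AS INTEGER) * 120 AS stamp,
       AVG(Data) AS avg_data,
       MIN(Data) AS min_data,  -- lowest sample in the bin
       MAX(Data) AS max_data   -- highest sample in the bin
FROM "table"
WHERE TimeStamp >= '2016-12-15 10:00:00'
  AND TimeStamp < '2016-12-15 10:10:00'
GROUP BY stamp;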

sqlite : time difference between two dates in decimals

I have two timestamp fields (START, END) and a TIME_DIFF field which is of integer type. I am trying to calculate the time between the START and END fields. I created a trigger to do that:
CREATE TRIGGER [TIME_DIFF]
AFTER UPDATE OF [END]
ON [KLOG]
BEGIN
  update klog set TIME_DIFF =
    cast(
      (
        strftime('%s', KLOG.END) -
        strftime('%s', KLOG.START)
      ) as INT
    ) / 60 / 60;
END
This gives me the result in whole hours. Anything between 0 and 59 minutes is neglected.
I am wondering how I can modify this trigger so it displays decimals?
Meaning, if the time difference is 1 hour 59 minutes, the result would display 1.59. If the time difference is 35 minutes, it would display 0.35.
To interpret a number of seconds as a timestamp, use the unixepoch modifier. Then you can simply use strftime() to format the value:
strftime('%H:%M',
         strftime('%s', KLOG.END) - strftime('%s', KLOG.START),
         'unixepoch')
If you use Julian days instead of seconds, you do not need a separate modifier:
strftime('%H:%M',
         julianday(KLOG.END) - julianday(KLOG.START))
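Note that this yields an HH:MM display such as 01:59 rather than a number. If a true decimal value is wanted instead, here is a sketch (observe that 1 hour 59 minutes comes out as roughly 1.98 hours, not 1.59):
round((strftime('%s', KLOG.END) - strftime('%s', KLOG.START)) / 3600.0, 2)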

How to add/subtract date/time components using a calculated interval?

I'd like to get this to work in Teradata:
Updated SQL for better example
select
  case
    when current_date between
         cast('03-10-2013' as date format 'mm-dd-yyyy') and
         cast('11-03-2013' as date format 'mm-dd-yyyy')
    then 4
    else 5
  end Offset,
  (current_timestamp + interval Offset hour) GMT
However, I get an error of "Expected something like a string or a Unicode character". It seems that you have to hardcode the interval, like this:
select current_timestamp + interval '4' day
Yes, I know I hardcoded it in my first example, but that was only to demonstrate a calculated result.
If you must know, I am having to convert all dates and times in a few tables to GMT, but I have to account for daylight saving time. I am in Eastern, so I need to add 4 hours if the date is within the DST timeframe and add 5 hours otherwise.
I know I can just create separate update statements for each period and just change the value from a 4 to a 5 accordingly, but I want my query to be dynamic and smart.
Here's the solution:
select
  case
    when current_date between
         cast('03-10-2013' as date format 'mm-dd-yyyy') and
         cast('11-03-2013' as date format 'mm-dd-yyyy')
    then 4
    else 5
  end Offset,
  (current_timestamp + cast(Offset as interval hour)) GMT
You have to actually cast the case statement's return value as an interval. I didn't even know interval types existed in Teradata. Thanks to this page for helping me along:
http://www.teradataforum.com/l081007a.htm
If I understand correctly, you want to multiply the interval by some number. Believe it or not, that's literally all you need to do:
select current_timestamp as right_now
, right_now + (interval '1' day) as same_time_tomorrow
, right_now + (2 * (interval '1' day)) as same_time_next_day
Intervals have always challenged me for some reason; I don't use them very often. But I've had this little example in my Teradata "cheat sheet" for quite a while.
Two remarks:
You could return an INTERVAL instead of an INT
The recommended way to write a date literal in Teradata is DATE 'YYYY-MM-DD' instead of CAST/FORMAT
select
  case
    when current_date between DATE '2013-03-10' and DATE '2013-11-03'
    then interval '4' hour
    else interval '5' hour
  end AS Offset,
  current_timestamp + Offset AS GMT
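Applied to the original conversion task, the same pattern works in an UPDATE; a sketch with hypothetical table and column names (my_table, event_ts):
update my_table
set event_ts = event_ts + case
      when cast(event_ts as date) between DATE '2013-03-10' and DATE '2013-11-03'
      then interval '4' hour
      else interval '5' hour
    end;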
