Create a benchmark for ScadaLTS that checks how much data is stored in the database per second.
Create a data source.
Copy it three times.
Enable every data source and data point.
How do you count the data saved per second in ScadaLTS?
You can do this using SQL:
SELECT
    COUNT(*),
    YEAR(FROM_UNIXTIME(ts/1000)),
    MONTH(FROM_UNIXTIME(ts/1000)),
    DAY(FROM_UNIXTIME(ts/1000)),
    HOUR(FROM_UNIXTIME(ts/1000)),
    MINUTE(FROM_UNIXTIME(ts/1000)),
    SECOND(FROM_UNIXTIME(ts/1000))
FROM pointValues
WHERE
    YEAR(FROM_UNIXTIME(ts/1000)) = YEAR(NOW()) AND
    MONTH(FROM_UNIXTIME(ts/1000)) = MONTH(NOW()) AND
    DAY(FROM_UNIXTIME(ts/1000)) = DAY(NOW())
GROUP BY
    YEAR(FROM_UNIXTIME(ts/1000)),
    MONTH(FROM_UNIXTIME(ts/1000)),
    DAY(FROM_UNIXTIME(ts/1000)),
    HOUR(FROM_UNIXTIME(ts/1000)),
    MINUTE(FROM_UNIXTIME(ts/1000)),
    SECOND(FROM_UNIXTIME(ts/1000))
HAVING COUNT(*) > 5
ORDER BY COUNT(*) DESC;
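As a rough cross-check, here is a simpler variant of the same idea (a sketch, assuming MySQL as the ScadaLTS database): group directly on the per-second bucket and filter on the raw ts column.
SELECT FROM_UNIXTIME(ts DIV 1000) AS sec, COUNT(*) AS samples  -- ts is in milliseconds
FROM pointValues
WHERE ts >= UNIX_TIMESTAMP(CURDATE()) * 1000  -- today only, without per-row date functions in the filter
GROUP BY sec
HAVING COUNT(*) > 5
ORDER BY samples DESC;
Filtering on ts directly also allows an index on ts to be used, which matters once pointValues grows large.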
I have an SQLite database of about 1.4 million rows and 16 columns.
I have to run an operation on 80,000 IDs:
Get all rows associated with that ID
Convert to an R date object and sort by date
Calculate the difference between the 2 most recent dates
For each ID I have been querying SQLite from R using dbSendQuery and dbFetch for step 1, while steps 2 and 3 are done in R. Is there a faster way? Would it be faster or slower to load the entire SQLite table into a data.table?
It heavily depends on how you are working on that problem.
Normally, loading the whole query result into memory and then doing the operations on it will be faster, from what I have experienced; I can't show you a benchmark right now, but it also makes sense logically, because otherwise you repeat several operations multiple times on multiple data.frames. Loading 80k rows in one go is pretty fast, faster than three separate loads of ~26k rows.
However, you could have a look at the parallel package and use multiple cores on your machine to load subsets of your data and process them in parallel, each subset on its own core.
Here you can find information on how to do this:
http://jaehyeon-kim.github.io/2015/03/Parallel-Processing-on-Single-Machine-Part-I
If you're doing all that in R and fetching rows from the database 80,000 times in a loop... you'll probably have better results doing it all in one go in SQLite instead.
Given a skeleton table like:
CREATE TABLE data(id INTEGER, timestamp TEXT);
INSERT INTO data VALUES (1, '2019-07-01'), (1, '2019-06-25'), (1, '2019-06-24'),
(2, '2019-04-15'), (2, '2019-04-14');
CREATE INDEX data_idx_id_time ON data(id, timestamp DESC);
a query like:
SELECT id
, julianday(first_ts)
- julianday((SELECT max(d2.timestamp)
FROM data AS d2
WHERE d.id = d2.id AND d2.timestamp < d.first_ts)) AS days_difference
FROM (SELECT id, max(timestamp) as first_ts FROM data GROUP BY id) AS d
ORDER BY id;
will give you
id days_difference
---------- ---------------
1 6.0
2 1.0
An alternative for modern versions of SQLite (3.25 or newer) (EDIT: on a test database with 16 million rows and 80,000 distinct IDs, it runs considerably slower than the one above, so you don't want to actually use it):
WITH cte AS
(SELECT id, timestamp
, lead(timestamp, 1) OVER id_by_ts AS next_ts
, row_number() OVER id_by_ts AS rn
FROM data
WINDOW id_by_ts AS (PARTITION BY id ORDER BY timestamp DESC))
SELECT id, julianday(timestamp) - julianday(next_ts) AS days_difference
FROM cte
WHERE rn = 1
ORDER BY id;
(The index is essential for performance in both versions. You'll probably want to run ANALYZE on the table at some point after it's populated and your index(es) are created, too.)
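For example (a sketch, using the skeleton table and index from above):
ANALYZE;  -- refresh the planner statistics
-- should report a scan using data_idx_id_time rather than a full table scan:
EXPLAIN QUERY PLAN
SELECT id, max(timestamp) AS first_ts
FROM data
GROUP BY id;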
I have two tables. Config and Data. Config table has info to define what I call "Predefined Points". The columns are configId, machineId, iotype, ioid, subfield and predeftype. I have a second table that contains all the data for all the items in the config table linked by configId. Data table contains configId, timestamp, value.
I am trying to return each row from the config table with 2 new columns in the result which would be min timestamp of this particular predefined point and max timestamp of this particular predefined point.
Pseudocode would be
select a.*, min(b.timestamp), max(b.timestamp) from TrendConfig a join TrendData b on a.configId = b.configId where configId = (select configId from TrendConfig)
Where the subquery would return multiple values.
Any idea how to formulate this?
Try an inner join:
select a.*, min(b.timestamp), max(b.timestamp)
from config a
inner join data b
on a.configId = b.configID
I was able to find an answer using: Why can't you mix Aggregate values and Non-Aggregate values in a single SELECT?
The solution was indeed GROUP BY as CL mentioned above.
select a.*, min(b.timestamp), max(b.timestamp)
from TrendConfig a
join TrendData b on a.configId = b.configId
group by a.configId
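If a stricter SQL mode objects to the bare a.* combined with GROUP BY (e.g. MySQL's ONLY_FULL_GROUP_BY), the columns can be listed explicitly instead; a sketch, using the TrendConfig columns named in the question:
SELECT a.configId, a.machineId, a.iotype, a.ioid, a.subfield, a.predeftype,
       MIN(b.timestamp) AS first_ts,
       MAX(b.timestamp) AS last_ts
FROM TrendConfig a
JOIN TrendData b ON a.configId = b.configId
GROUP BY a.configId, a.machineId, a.iotype, a.ioid, a.subfield, a.predeftype;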
I have an SQLite3 DB with the following 3-column layout:
typ (1=gas or 0=electrical power) | time (seconds since epoch) | value (float)
In there, I document events from a gas meter which fires every 10 liter of consumed gas. This is (when the gas heating is active) once every ~20 seconds. The value written together with the timestamp is 0 (zero).
I want to automatically fill an aggregation table with the count of all records within each interval of 10 minutes.
I had success with this query to get the counts within the intervals:
select time/600*600+600 _time, count(*) _count
from data
where typ = 1 and value = 0
group by _time
order by _time
But how would I achieve the following:
run this query regularly every 10 minutes (or at every INSERT, with a TRIGGER?) at xx:10 / xx:20 / xx:30 / ...
write the resulting count of only the last 10 minutes to an aggregation table, together with the interval end time.
I of course could do this with a program (e.g. PHP) but I'd prefer a DB-only solution if possible.
Thanks for any help.
This trigger will run for every inserted row, and tries to insert a corresponding row in an aggregate table if one does not already exist. Then it increments the counter value in the aggregate table for the timespan of the newly inserted row.
create trigger data_count_ai after insert on data  -- a trigger name is required; "data_count_ai" is arbitrary
begin
-- create the 10-minute bucket row if it does not exist yet
insert or ignore into aggregateData(startTime, counter) values ((new.time / 600) * 600, 0);
-- then increment the counter for that bucket
update aggregateData set counter = counter + 1 where startTime = (new.time / 600) * 600;
end;
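Note that INSERT OR IGNORE only skips the insert when it would violate a constraint, so startTime must be unique in the aggregate table. A minimal sketch of such a table (the name aggregateData is taken from the trigger above):
CREATE TABLE aggregateData(
    startTime INTEGER PRIMARY KEY,  -- start of the 10-minute bucket, seconds since epoch
    counter   INTEGER NOT NULL DEFAULT 0
);
Also note the trigger as written counts every inserted row; adding WHEN new.typ = 1 AND new.value = 0 after "on data" would restrict it to the rows the original query counts.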
I think that I found an easier solution which in the end creates the same result:
Just turn my aggregate query into a view:
CREATE VIEW _aggregate as
select time/600*600+600 _time, count(*) _count
from data
where typ = 1 and value = 0
group by _time
order by _time
This gives me exactly my desired result if I do a:
select * from _aggregate
For me it's good enough to have the aggregated values computed at query time rather than stored. Or do you see a substantial difference from your solution?
I want to update multiple columns in a table using a correlated subquery. Updating a single column is straightforward:
UPDATE route
SET temperature = (SELECT amb_temp.temperature
FROM amb_temp
WHERE amb_temp.location = route.location)
However, I'd like to update several columns of the route table. As the subquery is much more complex in reality (JOIN with a nested subquery using SpatiaLite functions), I want to avoid repeating it like this:
UPDATE route
SET
temperature = (SELECT amb_temp.temperature
FROM amb_temp
WHERE amb_temp.location = route.location),
error = (SELECT amb_temp.error
FROM amb_temp
WHERE amb_temp.location = route.location),
Ideally, SQLite would let me do something like this:
UPDATE route
SET (temperature, error) = (SELECT amb_temp.temperature, amb_temp.error
FROM amb_temp
WHERE amb_temp.location = route.location)
Alas, that is not possible. Can this be solved in another way?
Here's what I've been considering so far:
use INSERT OR REPLACE as proposed in this answer. It seems it's not possible to refer to the route table in the subquery.
prepend the UPDATE query with a WITH clause, but I don't think that is useful in this case.
For completeness' sake, here's the actual SQL query I'm working on:
UPDATE route SET (temperature, time_distance) = ( -- (C)
SELECT -- (B)
temperature.Temp,
MIN(ABS(julianday(temperature.Date_HrMn)
- julianday(route.date_time))) AS datetime_dist
FROM temperature
JOIN (
SELECT -- (A)
*, Distance(stations.geometry,route.geometry) AS distance
FROM stations
WHERE EXISTS (
SELECT 1
FROM temperature
WHERE stations.USAF = temperature.USAF
AND stations.WBAN_ID = temperature.NCDC
LIMIT 1
)
GROUP BY stations.geometry
ORDER BY distance
LIMIT 1
) tmp
ON tmp.USAF = temperature.USAF
AND tmp.WBAN_ID = temperature.NCDC
)
High-level description of this query:
using geometry (= longitude & latitude) and date_time from the route table,
(A) find the weather station (stations table, uniquely identified by the USAF and NCDC/WBAN_ID columns)
closest to the given longitude/latitude (geometry)
for which temperatures are present in the temperature table
(B) find the temperature table row
for the weather station found above
closest in time to the given timestamp
(C) store the temperature and "time_distance" in the route table
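For what it's worth, newer SQLite releases make the workaround unnecessary: 3.15.0 added row-value assignments, so the "ideal" form above is accepted as written, and 3.33.0 added UPDATE ... FROM, which lets the lookup be written once as a join. A sketch of the latter on the simplified example (assuming a table amb_temp with location, temperature, and error columns):
UPDATE route  -- requires SQLite 3.33+
SET temperature = a.temperature,
    error = a.error
FROM amb_temp AS a  -- the join condition replaces the repeated correlated WHERE clauses
WHERE a.location = route.location;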
I'm running the following on my sqlite3 DB, but the result is not limited to the last 3 records. It is returning the average for all records.
SELECT AVG(time) FROM tbl_aa ORDER BY ID LIMIT 3
Any thoughts?
Use a subquery to get the first 3 records and then calculate the average of them:
select avg(time) from
(
SELECT time
FROM tbl_aa
ORDER BY ID
LIMIT 3
) x
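(Note the question asks about the last 3 records; assuming higher IDs are more recent, order the subquery descending to get those instead:)
select avg(time) from
(
SELECT time
FROM tbl_aa
ORDER BY ID DESC  -- last three by ID instead of first three
LIMIT 3
) x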
LIMIT restricts the number of rows in your result set; however, AVG is calculated over the entire set and returns only one row, so the LIMIT is redundant.