Minimum variability check in R or PostgreSQL - r

My Data-set contains temperature values. I want to preform minimum variability check. I would like to check if 3 successive temperature values do not changed with respect to a per-defined threshold (.05), then replacing them with mean value of last three observations.

WITH A as (
SELECT ambtemp,
date_trunc('hour', dt)+
CASE WHEN date_part('minute', dt) >= 6
THEN interval '6 minutes'
ELSE interval '0 minutes'
END as t
FROM temm),
B as(
SELECT ambtemp,t,
max(ambtemp::float(23)) OVER (PARTITION BY t) as max_temp,
min(ambtemp::float(23)) OVER (PARTITION BY t) as min_temp
FROM A)
SELECT *
FROM B
WHERE (max_temp - min_temp) <= 0.5

Related

How to get final length of a line in a query?

I am just learning SQL and I got a task, that I need to find the final length of a discontinuous line when I have imput such as:
start | finish
0 | 3
2 | 7
15 | 17
And the correct answer here would be 9, because it spans from 0-3 and then I am suppsed to ignore the parts that are present multiple times so from 3-7(ignoring the two because it is between 0 and 3 already) and 15-17. I am supposed to get this answer solely through an sql query(no functions) and I am unsure of how. I have tried to experiment with some code using with, but I can't for the life of me figure out how to ignore all the multiples properly.
My half-attempt:
WITH temp AS(
SELECT s as l, f as r FROM lines LIMIT 1),
cte as(
select s, f from lines where s < (select l from temp) or f > (select r from temp)
)
select * from cte
This really only gives me all the rows tha are not completly usless and extend the length, but I dont know what to do from here.
Use a recursive CTE that breaks all the (start, finish) intervals to as many 1 unit length intervals as is the total length of the interval and then count all the distinct intervals:
WITH cte AS (
SELECT start x1, start + 1 x2, finish FROM temp
WHERE start < finish -- you can omit this if start < finish is always true
UNION
SELECT x2, x2 + 1, finish FROM cte
WHERE x2 + 1 <= finish
)
SELECT COUNT(DISTINCT x1) length
FROM cte
See the demo.
Result:
length
9

How can I accumulate values from rows on a per closest date basis to a list of dates as a parameter according the parameter dates?

There is table that has a date and cnt column e.g.
timestamp cnt
------------------
1547015021 14
1547024080 2
This table can be created using :-
DROP TABLE IF EXISTS roundit_base;
CREATE TABLE IF NOT EXISTS roundit_base (timestamp INTEGER, cnt INTEGER);
INSERT INTO roundit_base VALUES (1547015021,14),(1547024080,2);
The result should be the sum of the cnt column of rows that are the closest timestamp to a list of supplied timestamps, e.g. the supplied data could be
1546905600 - 0
1546992000 - 0
1547078400 - 0
...
The result should be along the lines of
1546905600 - 0
1546992000 - 14
1547078400 - 2
That is two columns:-
the timestamp from the list of supplied timestamps, that the respective rows from the database are closest to and
the sum of the cnt column those rows on a per supplied timestamp
Although the results are different from the expected results in that the calculations used places both 1547015021 and 1547024080 as being closest to the suplied timestamp of 1546992000;
The following could be the basis of an SQLite based solution :-
WITH
-- The supplied list of timestamps
v (cv,dflt) AS (
VALUES (1546905600,0),(1546992000,0),(1547078400,0)
),
-- Join the two sets calculating the difference
cte1 AS (
SELECT *, abs(cv - timestamp) AS diff FROM roundit_base INNER JOIN v
),
-- Find the closest (smallest difference) for each timestamp
cte2 AS (
SELECT *, min(diff) FROM cte1 GROUP BY timestamp
)
-- For each compartive value sum the counts allocated/assigned (timestamps) to that
SELECT cv,
CASE
WHEN
(SELECT sum(cnt) FROM cte2 WHERE cv = v.cv) IS NOT NULL
THEN
(SELECT sum(cnt) FROM cte2 WHERE cv = v.cv)
ELSE 0
END AS cnt
FROM v;
;
The above results in :-

Creating even ranges based on values in an oracle table

I have a big table which is 100k rows in size and the PRIMARY KEY is of the datatype NUMBER. The way data is populated in this column is using a random number generator.
So my question is, can there be a possibility to have a SQL query that can help me with getting partition the table evenly with the range of values. Eg: If my column value is like this:
1
2
3
4
5
6
7
8
9
10
And I would like this to be broken into three partitions, then I would expect an output like this:
Range 1 1-3
Range 2 4-7
Range 3 8-10
It sounds like you want the WIDTH_BUCKET() function. Find out more.
This query will give you the start and end range for a table of 1250 rows split into 20 buckets based on id:
with bkt as (
select id
, width_bucket(id, 1, 1251, 20) as id_bucket
from t23
)
select id_bucket
, min(id) as bkt_start
, max(id) as bkt_end
, count(*)
from bkt
group by id_bucket
order by 1
;
The two middle parameters specify min and max values; the last parameter specifies the number of buckets. The output is the rows between the minimum and maximum bows split as evenly as possible into the specified number of buckets. Be careful with the min and max parameters; I've found poorly chosen bounds can have an odd effect on the split.
This solution works without width_bucket function. While it is more verbose and certainly less efficient it will split the data as evenly as possible, even if some ID values are missing.
CREATE TABLE t AS
SELECT rownum AS id
FROM dual
CONNECT BY level <= 10;
WITH
data AS (
SELECT id, rownum as row_num
FROM t
),
total AS (
SELECT count(*) AS total_rows
FROM data
),
parts AS (
SELECT rownum as part_no, total.total_rows, total.total_rows / 3 as part_rows
FROM dual, total
CONNECT BY level <= 3
),
bounds AS (
SELECT parts.part_no,
parts.total_rows,
parts.part_rows,
COALESCE(LAG(data.row_num) OVER (ORDER BY parts.part_no) + 1, 1) AS start_row_num,
data.row_num AS end_row_num
FROM data
JOIN parts
ON data.row_num = ROUND(parts.part_no * parts.part_rows, 0)
)
SELECT bounds.part_no, d1.ID AS start_id, d2.ID AS end_id
FROM bounds
JOIN data d1
ON d1.row_num = bounds.start_row_num
JOIN data d2
ON d2.row_num = bounds.end_row_num
ORDER BY bounds.part_no;
PART_NO START_ID END_ID
---------- ---------- ----------
1 1 3
2 4 7
3 8 10

Sum all field with the other field < itself in sqlite

Sorry because I dont think good title for my problem.
I have table a(f1 integer, date Long), date increase, and the data
f1 date
1 1
2 2
3 3
...
I need to sum f1 by date, with record 1{1,1} the sum f1 is 1,with record 2 the sum f1 is 1+2, record 3 the sum f1 is 1+2+3...
How can I do that?
This requires a correlated subquery:
SELECT date,
(SELECT SUM(f1)
FROM a AS a2
WHERE a2.date <= a.date
) AS f1_sum
FROM a
ORDER BY date;
But it's inefficient. Consider just scanning the table, sorted by the date, and summing f1 as you're reading it.

SQLite count(*) in while clause

I have a calendar table in which there are all the dates in the future and a workday field:
fld_date / fld_workday
2014-01-01 / 1
2014-01-02 / 1
2014-01-03 / 0
...
I want select a date which are n workday far from another date. I tried two ways, but i failed:
The 5th workday from 2014-11-07:
1.
SELECT n1.fld_date FROM calendar as n1 WHERE n1.fld_workday=1 AND
(select count(*) FROM calendar as n2 WHERE n2.fld_date>='2014-11-07' AND n2.fld_workday=1)=5
It gave back 0 row.
2.
SELECT fld_date FROM calendar WHERE fld_date>='2014-11-07' AND fld_workday=1 LIMIT 1 OFFSET 5
It's ok, but i would like to change the 5 days constant to a field, and it's cannot (it would be inside a bigger select statement):
SELECT fld_date FROM calendar WHERE fld_date>='2014-11-07' AND fld_workday=1 LIMIT 1 OFFSET fld_another_field
Any suggestion?
In the first query, the subquery does not refer to the row in n1.
You need a correlated subquery:
SELECT fld_Date
FROM Calendar AS n1
WHERE fld_WorkDay = 1
AND (SELECT COUNT(*)
FROM Calendar AS n2
WHERE fld_Date BETWEEN '2014-11-07' AND n1.fld_Date
AND fld_WorkDay = 1
) = 5
LIMIT 1
The subquery is extremly inefficient if there is no index on the fld_Date column.
You can avoid executing the subquery for every row in n1 by adding another condition with an estimate of the result date (assuming that there are between about four to five work days per week, and using a few extra days to be sure):
...
WHERE fldDate BETWEEN date('2014-11-07', (5 * 4/7 - 10) || ' days')
AND date('2014-11-07', (5 * 5/7 + 10) || ' days')
AND fldWorkDay = 1
AND (SELECT ...

Resources