I have a large table (about 100k rows) whose PRIMARY KEY column is of datatype NUMBER. The values in this column are generated by a random number generator.
Is there a SQL query that can split the table into evenly sized partitions, each described by a range of key values? For example, if my column values look like this:
1
2
3
4
5
6
7
8
9
10
And if I want this broken into three partitions, I would expect output like this:
Range 1 1-3
Range 2 4-7
Range 3 8-10
It sounds like you want the WIDTH_BUCKET() function.
This query will give you the start and end range for a table of 1250 rows split into 20 buckets based on id:
with bkt as (
select id
, width_bucket(id, 1, 1251, 20) as id_bucket
from t23
)
select id_bucket
, min(id) as bkt_start
, max(id) as bkt_end
, count(*)
from bkt
group by id_bucket
order by 1
;
The two middle parameters specify the min and max values; the last parameter specifies the number of buckets. The output is the rows between the minimum and maximum bounds, split as evenly as possible into the specified number of buckets. Be careful with the min and max parameters; I've found poorly chosen bounds can have an odd effect on the split.
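To see why the bounds matter, it can help to spell out the arithmetic. The Python sketch below mimics how an equi-width bucket number is assigned for an ascending range. The function name `width_bucket` and its argument names are my own, and the sketch assumes integer arguments with low < high; it illustrates the idea and is not Oracle's implementation.

```python
def width_bucket(value, low, high, num_buckets):
    """Return the 1-based equi-width bucket of value within [low, high).

    Values below low fall into bucket 0 and values at or above high fall
    into bucket num_buckets + 1 (the underflow and overflow buckets).
    Assumes integer arguments and low < high.
    """
    if value < low:
        return 0
    if value >= high:
        return num_buckets + 1
    # Each bucket covers (high - low) / num_buckets units of the range.
    return (value - low) * num_buckets // (high - low) + 1


if __name__ == "__main__":
    # Same bounds as the query above: ids 1..1250, upper bound 1251, 20 buckets.
    for sample in (1, 63, 64, 1250, 1251):
        print(sample, width_bucket(sample, 1, 1251, 20))
```

Because the buckets have equal width rather than equal row counts, keys that cluster in one part of the range will fill some buckets far more than others, which is one way badly chosen bounds can skew the split.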
This solution works without the width_bucket function. While it is more verbose and certainly less efficient, it will split the data as evenly as possible, even if some ID values are missing.
CREATE TABLE t AS
SELECT rownum AS id
FROM dual
CONNECT BY level <= 10;
WITH
data AS (
SELECT id, rownum as row_num
FROM t
),
total AS (
SELECT count(*) AS total_rows
FROM data
),
parts AS (
SELECT rownum as part_no, total.total_rows, total.total_rows / 3 as part_rows
FROM dual, total
CONNECT BY level <= 3
),
bounds AS (
SELECT parts.part_no,
parts.total_rows,
parts.part_rows,
COALESCE(LAG(data.row_num) OVER (ORDER BY parts.part_no) + 1, 1) AS start_row_num,
data.row_num AS end_row_num
FROM data
JOIN parts
ON data.row_num = ROUND(parts.part_no * parts.part_rows, 0)
)
SELECT bounds.part_no, d1.ID AS start_id, d2.ID AS end_id
FROM bounds
JOIN data d1
ON d1.row_num = bounds.start_row_num
JOIN data d2
ON d2.row_num = bounds.end_row_num
ORDER BY bounds.part_no;
PART_NO START_ID END_ID
---------- ---------- ----------
1 1 3
2 4 7
3 8 10
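The same row-count-based split can be sketched outside the database. The Python below is a minimal illustration under my own assumptions: the function name `partition_bounds` is hypothetical, the ids fit in memory, and the boundary row of each part is part_no * total / parts rounded half up, mirroring the ROUND() in the query above.

```python
def partition_bounds(ids, parts):
    """Split sorted ids into `parts` ranges with near-equal row counts.

    Returns a list of (part_no, start_id, end_id) tuples.
    """
    ordered = sorted(ids)
    total = len(ordered)
    bounds = []
    start = 0  # 0-based index of the first row in the current partition
    for part_no in range(1, parts + 1):
        # Round part_no * total / parts half up using integer arithmetic.
        end = (2 * part_no * total + parts) // (2 * parts)
        if end > start:  # skip empty partitions when parts > total
            bounds.append((part_no, ordered[start], ordered[end - 1]))
            start = end
    return bounds


if __name__ == "__main__":
    print(partition_bounds(range(1, 11), 3))
    # Ids with gaps still yield partitions of 3, 4 and 3 rows.
    print(partition_bounds([2, 5, 9, 14, 20, 21, 30, 41, 50, 77], 3))
```

The split is driven by row positions, not by key values, so gaps in the ids change the reported ranges but not the number of rows per part.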
I'd like to calculate the median value of a numeric column. How can I do that in SQLite 4?
Let's say that the median is the element in the middle of an ordered list.
SQLite (4 or 3) does not have any built-in function for that, but it's possible to do this by hand:
SELECT x
FROM MyTable
ORDER BY x
LIMIT 1
OFFSET (SELECT COUNT(*)
FROM MyTable) / 2
When there is an even number of records, it is common to define the median as the average of the two middle records.
In this case, the average can be computed like this:
SELECT AVG(x)
FROM (SELECT x
FROM MyTable
ORDER BY x
LIMIT 2
OFFSET (SELECT (COUNT(*) - 1) / 2
FROM MyTable))
Combining the odd and even cases then results in this:
SELECT AVG(x)
FROM (SELECT x
FROM MyTable
ORDER BY x
LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2 -- odd 1, even 2
OFFSET (SELECT (COUNT(*) - 1) / 2
FROM MyTable))
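The combined query can be tried out with Python's built-in sqlite3 module. This is a small sketch, assuming an in-memory database and a single-column table named MyTable as in the queries above; the helper name `median_of` is my own.

```python
import sqlite3

# The combined odd/even median query from above.
MEDIAN_SQL = """
SELECT AVG(x)
FROM (SELECT x
      FROM MyTable
      ORDER BY x
      LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2
      OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable))
"""


def median_of(values):
    """Load values into a throwaway table and return their median."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (x REAL)")
    conn.executemany("INSERT INTO MyTable (x) VALUES (?)",
                     [(v,) for v in values])
    (result,) = conn.execute(MEDIAN_SQL).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(median_of([5, 1, 3]))     # odd count: the middle value
    print(median_of([7, 1, 5, 3]))  # even count: mean of the two middle values
```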
There is an extension pack of various math functions for sqlite3. It includes group functions like median.
It will be more work getting this going than CL's answer, but might be worthwhile if you think you will need any of the other functions.
http://www.sqlite.org/contrib/download/extension-functions.c?get=25
(Here is the guide for how to compile and load SQLite extensions.)
From description:
Provide mathematical and string extension functions for SQL queries using the loadable extensions mechanism. Math: acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference, degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp, log, log10, power, sign, sqrt, square, ceil, floor, pi. String: replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim, replace, reverse, proper, padl, padr, padc, strfilter. Aggregate: stdev, variance, mode, median, lower_quartile, upper_quartile.
UPDATE 2015-04-12: Fixing "undefined symbol: sinh"
As has been mentioned in comments, this extension may not work properly despite a successful compile.
For example, compiling may work and on Linux you might copy the resulting .so file to /usr/local/lib. But .load /usr/local/lib/libsqlitefunctions from the sqlite3 shell may then generate this error:
Error: /usr/local/lib/libsqlitefunctions.so: undefined symbol: sinh
Compiling it this way seems to work:
gcc -fPIC -shared extension-functions.c -o libsqlitefunctions.so -lm
And copying the .so file to /usr/local/lib shows no similar error:
sqlite> .load /usr/local/lib/libsqlitefunctions
sqlite> select cos(pi()/4.0);
---> 0.707106781186548
I'm not sure why the order of options to gcc matters in this particular case, but apparently it does.
Credit for noticing this goes to Ludvick Lidicky's comment on this blog post
There is a log table with timestamp, label, and latency. We want to see the median latency of each label, grouped by timestamp. Format every latency value as a 15-character string with leading zeroes, concatenate the strings in sorted order, and cut out the value(s) at the middle position: that is the median.
select L, --V,
case when N % 2 = 0 then -- even row count: average the two middle values
( substr( V, ( C - 1 ) * 15 + 1, 15) * 1 + substr( V, C * 15 + 1, 15) * 1 ) / 2.0
else -- odd row count: take the single middle value
substr( V, C * 15 + 1, 15) * 1
end as MEDST
from (
select L, group_concat(ST, '') as V, count(ST) as N, count(ST) / 2 as C
from (
select label as L,
substr( timeStamp, 1, 8) * 1 as T,
printf( '%015d',latency) as ST
from log
where label not like '%-%' and responseMessage = 'OK'
order by L, T, ST ) as XX
group by L
) as YY
Dixtroy provided the best solution via group_concat().
Here is a full sample for this:
DROP TABLE IF EXISTS [t];
CREATE TABLE [t] (name, value INT);
INSERT INTO t VALUES ('A', 2);
INSERT INTO t VALUES ('A', 3);
INSERT INTO t VALUES ('B', 4);
INSERT INTO t VALUES ('B', 5);
INSERT INTO t VALUES ('B', 6);
INSERT INTO t VALUES ('C', 7);
This results in the following table:
name|value
A|2
A|3
B|4
B|5
B|6
C|7
Now we use the (slightly modified) query from Dixtroy:
SELECT name, --string_list, count, middle,
CASE WHEN count%2=0 THEN
0.5 * substr(string_list, middle-10, 10) + 0.5 * substr(string_list, middle, 10)
ELSE
1.0 * substr(string_list, middle, 10)
END AS median
FROM (
SELECT name,
group_concat(value_string,'') AS string_list,
count() AS count,
1 + 10*(count()/2) AS middle
FROM (
SELECT name,
printf( '%010d',value) AS value_string
FROM [t]
ORDER BY name,value_string
)
GROUP BY name
);
...and get this result:
name|median
A|2.5
B|5.0
C|7.0
If you are using PDO then ::loadExtension() used in Paul's answer might not be available to you.
Assuming you are using PHP, an alternative is to create an aggregate function.
$pdo_handle->sqliteCreateAggregate(
'median', // the name of the function to declare
function($context, $row_number, $value){ // a method called for each row
$context[] = $value; // store the values
return $context;
},
function($context, $row_count){ // a method called once all rows have been iterated over
// sort the values
sort($context, SORT_NUMERIC);
// count the number of values
$count = count($context);
// get the mid point of the array (the upper of the two middle indexes when the count is even)
$middle = intdiv($count, 2);
// if there is an even number of values
if (($count % 2) == 0) {
// average the two middle values to find the median
return ($context[$middle - 1] + $context[$middle])/2;
} else {
// odd amount of elements, so the median value is simply the one in the middle
return $context[$middle];
}
},
1
);
You are then free to do a
SELECT median("column_name") FROM "table_name";
Similar "create function" might be available in other languages.
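As one example in another language, Python's built-in sqlite3 module offers `Connection.create_aggregate`. The sketch below registers a `median` aggregate and runs it per group on the sample data from the group_concat answer above; the class name `Median` and the helper `grouped_medians` are my own, and the values are simply held in memory, which assumes each group is reasonably small.

```python
import sqlite3
import statistics


class Median:
    """Aggregate that collects a group's values and returns their median."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row; skip NULLs like the built-in aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group; None becomes SQL NULL for an empty group.
        if not self.values:
            return None
        return statistics.median(self.values)


def grouped_medians(rows):
    """Return [(name, median), ...] for (name, value) input rows."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("median", 1, Median)
    conn.execute("CREATE TABLE t (name TEXT, value INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT name, median(value) FROM t GROUP BY name ORDER BY name"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("A", 2), ("A", 3), ("B", 4), ("B", 5), ("B", 6), ("C", 7)]
    print(grouped_medians(sample))
```

The aggregate is registered per connection, so it has to be created again on every new connection that wants to call median().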
The SELECT AVG(x) returns just the year of date values formatted as YYYY-MM-DD, so I tweaked CL's solution just slightly to accommodate dates:
SELECT DATE(JULIANDAY(MIN(MyDate)) + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate)))/2) as Median_Date
FROM (
SELECT MyDate
FROM MyTable
ORDER BY MyDate
LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2) -- odd 1, even 2
OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable)
);
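The date variant can be checked the same way. This sketch assumes the dates are stored as YYYY-MM-DD text in a table named MyTable with a MyDate column, as in the query above; the helper name `median_date` is my own.

```python
import sqlite3

# The date-adapted median query from above.
MEDIAN_DATE_SQL = """
SELECT DATE(JULIANDAY(MIN(MyDate))
            + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate))) / 2)
FROM (SELECT MyDate
      FROM MyTable
      ORDER BY MyDate
      LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2)
      OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable))
"""


def median_date(dates):
    """Return the median of YYYY-MM-DD date strings as a date string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (MyDate TEXT)")
    conn.executemany("INSERT INTO MyTable (MyDate) VALUES (?)",
                     [(d,) for d in dates])
    (result,) = conn.execute(MEDIAN_DATE_SQL).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(median_date(["2020-01-10", "2020-01-01", "2020-01-03"]))
    print(median_date(["2020-01-01", "2020-01-03", "2020-01-05", "2020-01-10"]))
```

With an even number of rows the result is the midpoint between the two middle dates; when they are an odd number of days apart, DATE() drops the half day and returns the earlier calendar day.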
My data set contains temperature values. I want to perform a minimum variability check: if 3 successive temperature values do not change by more than a pre-defined threshold (.05), I would like to replace them with the mean value of the last three observations.
WITH A as (
SELECT ambtemp,
date_trunc('hour', dt)+
CASE WHEN date_part('minute', dt) >= 6
THEN interval '6 minutes'
ELSE interval '0 minutes'
END as t
FROM temm),
B as(
SELECT ambtemp,t,
max(ambtemp::float(23)) OVER (PARTITION BY t) as max_temp,
min(ambtemp::float(23)) OVER (PARTITION BY t) as min_temp
FROM A)
SELECT *
FROM B
WHERE (max_temp - min_temp) <= 0.5
Here is my example table:
column_example
10
20
25
50
Here is what I would like:
column_example2
10
5
25
I'm sure this is a simple question, but I haven't found the answer in the SQLite Syntax web page or via Google.
EDIT: To clarify, the code would likely return the outputs for:
20-10
25-20
50-25
This solution might be slow, but I had to consider the potential gaps between succeeding rowids:
http://sqlfiddle.com/#!5/daeed/1
SELECT
(SELECT x
FROM t AS t3
WHERE t3.rowid =
(SELECT MIN(tt.rowid)
FROM t AS tt
WHERE tt.rowid > t.rowid
)
)
- x AS diff
FROM t
WHERE diff IS NOT NULL
If it is guaranteed to not have any gaps between rowids, then you can use this simpler query:
http://sqlfiddle.com/#!5/1f906/3
SELECT t_next.x - t.x
FROM t
INNER JOIN t AS t_next
ON t_next.rowid = t.rowid + 1
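To try the gap-tolerant approach locally, here is a sketch using Python's built-in sqlite3 module. It assumes a table t with a single column x, as in the queries above; the helper name `successive_differences` and the throwaway value 999 used to force a rowid gap are my own. The difference is computed in a subquery so the NULL produced for the last row can be filtered in standard SQL.

```python
import sqlite3

# For each row, subtract x from the x of the row with the next higher rowid.
DIFF_SQL = """
SELECT diff
FROM (SELECT t.rowid AS rid,
             (SELECT t_next.x
              FROM t AS t_next
              WHERE t_next.rowid = (SELECT MIN(tt.rowid)
                                    FROM t AS tt
                                    WHERE tt.rowid > t.rowid)) - t.x AS diff
      FROM t)
WHERE diff IS NOT NULL
ORDER BY rid
"""


def successive_differences(values):
    """Return differences between consecutive rows, tolerating rowid gaps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
    # Delete a row in the middle so the rowids are no longer contiguous.
    conn.execute("DELETE FROM t WHERE x = 999")
    diffs = [row[0] for row in conn.execute(DIFF_SQL)]
    conn.close()
    return diffs


if __name__ == "__main__":
    print(successive_differences([10, 20, 999, 25, 50]))
```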
I need some help to build SQL Query. I have table having data like:
ID Date Name
1 1/1/2009 a
2 1/2/2009 b
3 1/3/2009 c
I need to get result something like...
1 1/1/2009 a
2 1/2/2009 b
3 1/3/2009 c
4 1/4/2009 Null
5 1/5/2009 Null
6 1/6/2009 Null
7 1/7/2009 Null
8 1/8/2009 Null
............................
............................
............................
30 1/30/2009 Null
31 1/31/2009 Null
I want query something like..
Select * from tbl where month(Date)=1 AND year(Date)=2010
The above is not a complete query.
I need to get all the records of a particular month, even if some dates are missing.
I guess the query needs an equi join; I am trying to build it using one.
Thanks
BIG EDIT
Now I understand the OP's question.
Use a common table expression and a left join to get this effect.
DECLARE @FirstDay DATETIME;
-- Set start time
SELECT @FirstDay = '2009-01-01';
WITH Days AS
(
SELECT @FirstDay as CalendarDay
UNION ALL
SELECT DATEADD(d, 1, CalendarDay) as CalendarDay
FROM Days
WHERE DATEADD(d, 1, CalendarDay) < DATEADD(m, 1, @FirstDay)
)
SELECT DATEPART(d,d.CalendarDay), d.CalendarDay, t.Name FROM Days d
LEFT JOIN tbl t
ON
d.CalendarDay = t.Date
ORDER BY
d.CalendarDay;
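The same calendar-plus-left-join idea can be tried in SQLite through Python's built-in sqlite3 module. This is a sketch under my own assumptions: dates are stored as YYYY-MM-DD text, the table is named tbl as in the question, SQLite's date() function stands in for DATEADD, and the helper name `month_calendar` is hypothetical.

```python
import sqlite3

# Recursive CTE that yields every day of the month starting at :first_day,
# left-joined to the data so days without a record still appear.
CALENDAR_SQL = """
WITH RECURSIVE days(calendar_day) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(calendar_day, '+1 day')
    FROM days
    WHERE date(calendar_day, '+1 day') < date(:first_day, '+1 month')
)
SELECT CAST(strftime('%d', d.calendar_day) AS INTEGER) AS day_no,
       d.calendar_day,
       t.Name
FROM days AS d
LEFT JOIN tbl AS t ON t.Date = d.calendar_day
ORDER BY d.calendar_day
"""


def month_calendar(rows, first_day):
    """Return one (day_no, date, name) tuple per day of the month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (ID INTEGER, Date TEXT, Name TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(CALENDAR_SQL, {"first_day": first_day}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    data = [(1, "2009-01-01", "a"), (2, "2009-01-02", "b"), (3, "2009-01-03", "c")]
    for row in month_calendar(data, "2009-01-01"):
        print(row)
```

Days without a matching record come back with None (SQL NULL) in the name column, which matches the shape of the result requested in the question.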
I have left the original answer below.
You need DATEPART, sir.
SELECT * FROM tbl WHERE DATEPART(m,Date) = 1
If you want to choose month and year, then you can use DATEPART twice or go for a range.
SELECT * FROM tbl WHERE DATEPART(m,Date) = 1 AND DATEPART(yyyy,Date) = 2009
Range :-
SELECT * FROM tbl WHERE Date >= '2009-01-01' AND Date < '2009-02-01'
See this link for more info on DATEPART.
http://msdn.microsoft.com/en-us/library/ms174420.aspx
You can use a range with greater-or-equal and less-than comparisons.
Like so:
select * from tbl where date >= '2009-01-01' and date < '2009-02-01'
However, it is unclear if you want month 1 from all years?
You can check more examples and functions on "Date and Time Functions" from MSDN
Create a temporary table containing all days of the month in question.
Do a left outer join between that table and your data table on tempTable.month = @month.
You now have one big result with all days of the desired month: the records that match a date, plus empty records for the dates that have no data.
I hope that's what you want.