How can I calculate the median of values in SQLite? - sqlite

I'd like to calculate the median value of a numeric column. How can I do that in SQLite 4?

Let's say that the median is the element in the middle of an ordered list.
SQLite (4 or 3) does not have any built-in function for that, but it's possible to do this by hand:
SELECT x
FROM MyTable
ORDER BY x
LIMIT 1
OFFSET (SELECT COUNT(*)
FROM MyTable) / 2
When there is an even number of records, it is common to define the median as the average of the two middle records.
In this case, the average can be computed like this:
SELECT AVG(x)
FROM (SELECT x
FROM MyTable
ORDER BY x
LIMIT 2
OFFSET (SELECT (COUNT(*) - 1) / 2
FROM MyTable))
Combining the odd and even cases then results in this:
SELECT AVG(x)
FROM (SELECT x
FROM MyTable
ORDER BY x
LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2 -- odd 1, even 2
OFFSET (SELECT (COUNT(*) - 1) / 2
FROM MyTable))
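As a quick check of the combined query, here is a minimal Python sketch using the standard-library sqlite3 module. The table MyTable and column x follow the answer above; the helper name sql_median and the sample data are assumptions made for illustration only.

```python
import sqlite3
import statistics

# The combined odd/even query from the answer, unchanged apart from layout.
MEDIAN_SQL = """
SELECT AVG(x)
FROM (SELECT x
      FROM MyTable
      ORDER BY x
      LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2   -- odd 1, even 2
      OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable))
"""


def sql_median(values):
    """Load values into an in-memory table and run the combined median query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MyTable (x REAL)")
        conn.executemany("INSERT INTO MyTable (x) VALUES (?)",
                         [(v,) for v in values])
        return conn.execute(MEDIAN_SQL).fetchone()[0]
    finally:
        conn.close()


odd = [5, 1, 9]
even = [5, 1, 9, 3]
# Compare against Python's own median as a reference.
print(sql_median(odd), statistics.median(odd))
print(sql_median(even), statistics.median(even))
```

For an empty table the inner query yields no rows, so AVG returns NULL (None in Python).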

There is an extension pack of various math functions for sqlite3. It includes group functions like median.
It will be more work getting this going than CL's answer, but might be worthwhile if you think you will need any of the other functions.
http://www.sqlite.org/contrib/download/extension-functions.c?get=25
(Here is the guide for how to compile and load SQLite extensions.)
From description:
Provide mathematical and string extension functions for SQL queries using the loadable extensions mechanism. Math: acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference, degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp, log, log10, power, sign, sqrt, square, ceil, floor, pi. String: replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim, replace, reverse, proper, padl, padr, padc, strfilter. Aggregate: stdev, variance, mode, median, lower_quartile, upper_quartile.
UPDATE 2015-04-12: Fixing "undefined symbol: sinh"
As has been mentioned in comments, this extension may not work properly despite a successful compile.
For example, compiling may work and on Linux you might copy the resulting .so file to /usr/local/lib. But .load /usr/local/lib/libsqlitefunctions from the sqlite3 shell may then generate this error:
Error: /usr/local/lib/libsqlitefunctions.so: undefined symbol: sinh
Compiling it this way seems to work:
gcc -fPIC -shared extension-functions.c -o libsqlitefunctions.so -lm
And copying the .so file to /usr/local/lib shows no similar error:
sqlite> .load /usr/local/lib/libsqlitefunctions
sqlite> select cos(pi()/4.0);
---> 0.707106781186548
I'm not sure why the order of options to gcc matters in this particular case, but apparently it does.
Credit for noticing this goes to Ludvick Lidicky's comment on this blog post

There is a log table with timestamp, label, and latency. We want to see the median latency of each label, grouped by timestamp. Format every latency value as a 15-character string with leading zeroes, concatenate the strings, and cut out the value (or the two values) at the middle position: that is the median.
select L, --V,
case when C % 2 = 0 then
( substr( V, ( C - 1 ) * 15 + 1, 15) * 1 + substr( V, C * 15 + 1, 15) * 1 ) / 2.0 -- 2.0 avoids integer division
else
substr( V, C * 15 + 1, 15) * 1
end as MEDST
from (
select L, group_concat(ST, '') as V, count(ST) / 2 as C
from (
select label as L,
substr( timeStamp, 1, 8) * 1 as T,
printf( '%015d',latency) as ST
from log
where label not like '%-%' and responseMessage = 'OK'
order by L, T, ST ) as XX
group by L
) as YY

Dixtroy provided the best solution via group_concat().
Here is a full sample for this:
DROP TABLE IF EXISTS [t];
CREATE TABLE [t] (name, value INT);
INSERT INTO t VALUES ('A', 2);
INSERT INTO t VALUES ('A', 3);
INSERT INTO t VALUES ('B', 4);
INSERT INTO t VALUES ('B', 5);
INSERT INTO t VALUES ('B', 6);
INSERT INTO t VALUES ('C', 7);
This results in the following table:
name|value
A|2
A|3
B|4
B|5
B|6
C|7
Now we use the (slightly modified) query from Dixtroy:
SELECT name, --string_list, count, middle,
CASE WHEN count%2=0 THEN
0.5 * substr(string_list, middle-10, 10) + 0.5 * substr(string_list, middle, 10)
ELSE
1.0 * substr(string_list, middle, 10)
END AS median
FROM (
SELECT name,
group_concat(value_string,'') AS string_list,
count() AS count,
1 + 10*(count()/2) AS middle
FROM (
SELECT name,
printf( '%010d',value) AS value_string
FROM [t]
ORDER BY name,value_string
)
GROUP BY name
);
...and get this result:
name|median
A|2.5
B|5.0
C|7.0
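The slicing arithmetic behind the fixed-width trick may be easier to follow outside SQL. The following Python sketch mirrors what the query does for a single group; the function name median_from_fixed_width is hypothetical, and it assumes non-negative integers that fit into the chosen width.

```python
def median_from_fixed_width(values, width=10):
    """Median of non-negative integers via one concatenated fixed-width string."""
    if not values:
        return None
    # Zero-padded strings of equal width sort in the same order as the numbers.
    parts = sorted(f"{v:0{width}d}" for v in values)
    string_list = "".join(parts)        # plays the role of group_concat(..., '')
    count = len(parts)
    middle = width * (count // 2)       # 0-based offset of the upper middle value
    upper = int(string_list[middle:middle + width])
    if count % 2 == 0:
        # Even count: average the two values around the middle position.
        lower = int(string_list[middle - width:middle])
        return 0.5 * lower + 0.5 * upper
    return 1.0 * upper


for name, group in {"A": [2, 3], "B": [4, 5, 6], "C": [7]}.items():
    print(name, median_from_fixed_width(group))
```

Note that SQL substr() counts from 1 while Python slices count from 0, which is why the query adds 1 to its middle position and this sketch does not.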

If you are using PDO then ::loadExtension() used in Paul's answer might not be available to you.
Assuming you are using PHP, an alternative is to create an aggregate function.
$pdo_handle->sqliteCreateAggregate(
'median', // the name of the function to declare
function($context, $row_number, $value){ // a method called for each row
$context[] = $value; // store the values
return $context;
},
function($context, $row_count){ // a method called once all rows have been iterated over
// sort the values
sort($context, SORT_NUMERIC);
// count the number of values
$count = count($context);
// get the mid point of array (lowest one)
$middle = floor($count/2);
// if there is an even amount of values
if (($count % 2) == 0) {
// average the two middle values to find the median
return ($context[$middle--] + $context[$middle])/2;
} else {
// odd amount of elements, so the median value is simply the one in the middle
return $context[$middle];
}
},
1
);
You are then free to do a
SELECT median("column_name") FROM "table_name";
A similar "create function" facility might be available in other languages.
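For instance, Python's standard sqlite3 module offers Connection.create_aggregate(). The sketch below is one possible equivalent of the PHP aggregate above, not a port of it; the class name Median and the sample table t are assumptions chosen for illustration.

```python
import sqlite3


class Median:
    """Aggregate that collects every non-NULL value and returns the median."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row; NULLs are skipped like in built-in aggregates.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once after the last row of the group.
        if not self.values:
            return None
        ordered = sorted(self.values)
        middle = len(ordered) // 2
        if len(ordered) % 2 == 0:
            return (ordered[middle - 1] + ordered[middle]) / 2
        return ordered[middle]


conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)   # name, argument count, class
conn.execute("CREATE TABLE t (name TEXT, value INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("A", 2), ("A", 3), ("B", 4), ("B", 5), ("B", 6), ("C", 7)])
rows = conn.execute(
    "SELECT name, median(value) FROM t GROUP BY name ORDER BY name").fetchall()
print(rows)
```

Like the PHP version, this keeps all values of a group in memory until the group is finished, so it suits moderate group sizes.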

The SELECT AVG(x) returns just the year of date values formatted as YYYY-MM-DD, so I tweaked CL's solution just slightly to accommodate dates:
SELECT DATE(JULIANDAY(MIN(MyDate)) + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate)))/2) as Median_Date
FROM (
SELECT MyDate
FROM MyTable
ORDER BY MyDate
LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2) -- odd 1, even 2
OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable)
);
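To see how the date variant behaves for odd and even row counts, here is a small Python sketch with the standard sqlite3 module. MyTable and MyDate follow the query above; the helper name median_date and the sample dates are assumptions.

```python
import sqlite3

# The date query from above, unchanged apart from layout.
MEDIAN_DATE_SQL = """
SELECT DATE(JULIANDAY(MIN(MyDate))
            + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate))) / 2) AS Median_Date
FROM (SELECT MyDate
      FROM MyTable
      ORDER BY MyDate
      LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2)  -- odd 1, even 2
      OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable))
"""


def median_date(dates):
    """Run the median-date query on a list of 'YYYY-MM-DD' strings."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MyTable (MyDate TEXT)")
        conn.executemany("INSERT INTO MyTable VALUES (?)", [(d,) for d in dates])
        return conn.execute(MEDIAN_DATE_SQL).fetchone()[0]
    finally:
        conn.close()


print(median_date(["2020-01-01", "2020-01-03", "2020-01-20"]))
print(median_date(["2020-01-01", "2020-01-03", "2020-01-10", "2020-01-20"]))
```

With an even count, the midpoint of the two middle dates can fall at noon; DATE() then drops the time part, so the result is the earlier calendar day.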

Related

An optimum way to locate the right OFFSET for a given data

To avoid fetching all the data at once, which would cause an out-of-memory error, we are implementing paging in our app using limit and offset.
At any time, our app displays only 1 page.
Page 0 : select * from note order by order_id limit 10000 offset 0;
Page 1 : select * from note order by order_id limit 10000 offset 10000;
Page 2 : select * from note order by order_id limit 10000 offset 20000;
Page 3 : select * from note order by order_id limit 10000 offset 30000;
...
Whenever the user adds a new item, we know the search criteria to locate it in SQLite.
select * from note where uuid = '1234-5678-9ABC';
However, we need to reload our app with the correct page.
But we have no idea how to find out efficiently which page (which offset) the new item belongs to.
We can use the following brute-force approach to find out which offset the item belongs to:
select * from (select * from note order by order_id limit 10000 offset 0) where uuid = '1234-5678-9ABC';
select * from (select * from note order by order_id limit 10000 offset 10000) where uuid = '1234-5678-9ABC';
select * from (select * from note order by order_id limit 10000 offset 20000) where uuid = '1234-5678-9ABC';
select * from (select * from note order by order_id limit 10000 offset 30000) where uuid = '1234-5678-9ABC';
...
But that is highly inefficient.
Is there any "smart" way to locate the right offset for a given item with good performance?
Thanks.
Use row_number() to find offset
The offset can be calculated by using the row_number window function to compute a row index. https://www.sqlite.org/windowfunctions.html#builtins
FIND ROW INDEX
select
uuid, (row_number() over (order by order_id) - 1) as row_index
from note
FIND OFFSET
The offset can be computed using modular arithmetic.
select row_index - (row_index % 10000) as offset
from note_row_index
where uuid = '1234-5678-9ABC'
USE OFFSET IN QUERY
with note_row_index(uuid, row_index) AS (
select
uuid, (row_number() over (order by order_id) - 1) as row_index
from note
),
note_offset(offset) AS (
select row_index - (row_index % 10000) as offset
from note_row_index
where uuid = '1234-5678-9ABC'
)
select *
from note
order by order_id
limit 10000
offset (select offset from note_offset)
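The offset calculation can be tried out with a minimal Python sketch using the standard sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); the helper name page_offset, the small page size, and the sample rows are illustrative assumptions rather than part of the answer.

```python
import sqlite3

PAGE_SIZE = 10  # the question uses 10000; a small page keeps the demo readable

# Same idea as the CTE above: number the rows, then round down to a page start.
OFFSET_SQL = """
WITH note_row_index(uuid, row_index) AS (
    SELECT uuid, row_number() OVER (ORDER BY order_id) - 1
    FROM note
)
SELECT row_index - (row_index % :page_size)
FROM note_row_index
WHERE uuid = :uuid
"""


def page_offset(conn, uuid, page_size=PAGE_SIZE):
    """Return the OFFSET of the page containing uuid, or None if not found."""
    row = conn.execute(OFFSET_SQL,
                       {"uuid": uuid, "page_size": page_size}).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (uuid TEXT PRIMARY KEY, order_id INTEGER)")
# 25 notes whose order_id values are deliberately not contiguous.
conn.executemany("INSERT INTO note VALUES (?, ?)",
                 [(f"note-{i}", i * 7) for i in range(25)])

print(page_offset(conn, "note-13"))
```

The row at index 13 lands on the page that starts at offset 10, which can then be passed to the paged query as its OFFSET.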

How to get final length of a line in a query?

I am just learning SQL and I got a task in which I need to find the final length of a discontinuous line, given input such as:
start | finish
0 | 3
2 | 7
15 | 17
The correct answer here would be 9, because the line spans 0-3, then 3-7 (I am supposed to ignore parts that are covered more than once, so the stretch from 2 to 3 is skipped because it already lies within 0-3), and then 15-17. I am supposed to get this answer solely through an SQL query (no functions), and I am unsure how. I have tried experimenting with some code using WITH, but I can't for the life of me figure out how to ignore all the overlaps properly.
My half-attempt:
WITH temp AS(
SELECT s as l, f as r FROM lines LIMIT 1),
cte as(
select s, f from lines where s < (select l from temp) or f > (select r from temp)
)
select * from cte
This really only gives me all the rows that are not completely useless and extend the length, but I don't know what to do from here.
Use a recursive CTE that breaks each (start, finish) interval into unit-length intervals, one for every unit of its length, and then count the distinct unit intervals:
WITH cte AS (
SELECT start x1, start + 1 x2, finish FROM temp
WHERE start < finish -- you can omit this if start < finish is always true
UNION
SELECT x2, x2 + 1, finish FROM cte
WHERE x2 + 1 <= finish
)
SELECT COUNT(DISTINCT x1) length
FROM cte
See the demo.
Result:
length
9
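The recursive CTE can be run against the sample data with a short Python sketch based on the standard sqlite3 module. The table name segments and the helper covered_length are hypothetical; the sketch also spells out WITH RECURSIVE and assumes integer endpoints, as in the question.

```python
import sqlite3

# Each interval is expanded into unit steps [x1, x1 + 1); distinct x1 are counted.
LENGTH_SQL = """
WITH RECURSIVE cte(x1, x2, finish) AS (
    SELECT start, start + 1, finish FROM segments
    WHERE start < finish
    UNION
    SELECT x2, x2 + 1, finish FROM cte
    WHERE x2 + 1 <= finish
)
SELECT COUNT(DISTINCT x1) FROM cte
"""


def covered_length(intervals):
    """Total length covered by integer (start, finish) intervals, overlaps once."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE segments (start INTEGER, finish INTEGER)")
        conn.executemany("INSERT INTO segments VALUES (?, ?)", intervals)
        return conn.execute(LENGTH_SQL).fetchone()[0]
    finally:
        conn.close()


print(covered_length([(0, 3), (2, 7), (15, 17)]))
```

Because every unit of length becomes a row, this approach fits short lines; very long intervals would generate correspondingly many rows.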

How to format a Float number in SQLite?

In SQLite I need to format a number to display it with the thousand separator and decimal separator. Example: The number 123456789 should be displayed as 1,234,567.89
What I did partially works because it does not display the thousand separator as I expected:
select *, printf ("U$%.2f", CAST(unit_val AS FLOAT) / 100) AS u_val FROM items;
u_val shows: U$1234567.89 but I need U$1,234,567.89
The following is one way that you could accomplish the result:-
select *, printf ("U$%.2f", CAST(unit_val AS FLOAT) / 100) AS u_val FROM items;
Could become :-
SELECT
*,
CASE
WHEN len < 9 THEN myfloat
WHEN len> 8 AND len < 12 THEN substr(myfloat,1,len - 6)||','||substr(myfloat,len - 5)
WHEN len > 11 AND len < 15 THEN substr(myfloat,1,len -9)||','||substr(myfloat,len-8,3)||','||substr(myfloat,len-5)
WHEN len > 14 AND len < 18 THEN substr(myfloat,1,len - 12)||','||substr(myfloat,len -11,3)||','||substr(myfloat,len-8,3)||','||substr(myfloat,len-5)
END AS u_val
FROM
(
SELECT *, length(myfloat) AS len
FROM
(
SELECT *, printf("U$%.2f", CAST(unit_val AS FLOAT) / 100) AS myfloat
FROM Items
)
)
The innermost SELECT extracts the original data plus a new column, as per your original SELECT.
The intermediate SELECT adds another column holding the length of that new column (the value derived from unit_val via printf). This could have been done in the innermost SELECT, but having the value available simplifies the outermost SELECT (in my opinion); alternatively, you could use multiple length(myfloat) calls in the outermost SELECT.
RESULT - Example
[Screenshot of the test results not preserved. It showed the original columns, the two intermediate columns (including the source column myfloat), and the resulting column.]
Edit
As you've clarified that the input is an integer, then :-
SELECT *,'U$'||printf('%,d',(unit_val/100))||'.'||CAST((unit_val % 100) AS INTEGER) AS u_val FROM Items
would work assuming that you are using at least version 3.18 of SQLite.
Correction
With the SQL immediately above, if the last part (the cents) is less than 10, the leading 0 is dropped. The corrected SQL follows. Note that, for simplicity, the CAST has also been dropped, and rather than concatenating the '.', it has been added to the printf format string:-
SELECT
'U$' ||
printf('%,d', (unit_val / 100)) ||
printf('.%02d',unit_val % 100)
AS u_val
FROM Items
Or as a single line
SELECT 'U$' || printf('%,d', (unit_val / 100)) || printf('.%02d',unit_val % 100) AS u_val FROM Items
Here is a suggestion:
WITH cte AS (SELECT 123456789 AS unit_val)
SELECT printf('%,d.%02d', unit_val/100, unit_val%100) FROM cte;
The Common Table Expression is just there to supply a dummy value, in the absence of variables.
The %,d format adds thousands separators, but, as many have pointed out, only for integers. Because of that, you will need to use the unit_val twice, once for the integer part, and again to calculate the decimal part.
SQLite truncates integer division, so unit_val/100 gives you your dollar part. The % operator is a remainder operator (not strictly the same as “mod”), so unit_val%100 gives the cents part, as another integer. The %02d format ensures that this is always 2 digits, padding with zeroes if necessary.
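The formatting expression can be checked from Python with the standard sqlite3 module. This is only a sketch: the helper name format_cents is hypothetical, the value is assumed to be a non-negative integer number of cents, and the SQLite build must support the `,` flag of printf.

```python
import sqlite3


def format_cents(unit_val):
    """Format an integer amount of cents as U$ with thousands separators."""
    conn = sqlite3.connect(":memory:")
    try:
        # Integer division gives the dollars, the remainder gives the cents.
        sql = "SELECT 'U$' || printf('%,d.%02d', :v / 100, :v % 100)"
        return conn.execute(sql, {"v": unit_val}).fetchone()[0]
    finally:
        conn.close()


print(format_cents(123456789))
print(format_cents(100005))
```

The second call shows why %02d matters: the cents part 5 is padded to 05 instead of losing its leading zero.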

How to convert the Long value to String using sql

I am doing a long to string conversion using java in following way.
Long longValue = 367L;
String str = Long.toString(longValue, 36).toUpperCase();
This returns the value A7. How can I achieve the same thing in Oracle SQL?
UPDATED:
I have analyzed how the Java code works and want to implement the same thing in a procedure.
The first point is the input values: a LONG and a radix. In my case the radix is 36, so the digits come only from the set 0..9, A..Z.
The second point is the long value itself. We divide this value by the radix; if the quotient is still 36 or more, we divide again.
For example, 367 gives 10 (quotient) and 7 (remainder), that is A7.
3672 gives 102 (quotient) and 0 (remainder); dividing 102 again gives 2 (quotient) and 30 (remainder, which is U). Reading the digits in reverse order, the final value is 2U0.
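The repeated-division procedure described above can be sketched in a few lines of Python. The function name to_base and the DIGITS constant are illustrative assumptions; the sketch handles non-negative integers only and is not meant as a replacement for Java's Long.toString.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(value, radix=36):
    """Convert a non-negative integer to an upper-case string in the given radix."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 36")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    out = []
    while value > 0:
        # Each division yields the next digit, least significant first.
        value, remainder = divmod(value, radix)
        out.append(DIGITS[remainder])
    # The digits were collected in reverse order, so flip them.
    return "".join(reversed(out))


print(to_base(367), to_base(3672))
```

For 367 the loop produces 7 and then A; for 3672 it produces 0, U, and 2, which reversed read 2U0.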
UPDATE 2:
Using Oracle built-in functions we can do this. A friend of mine solved it and gave me a function, and I want to thank my friend. It gives me output as follows.
For 367, the converted value is 10 (quotient) and 7 (remainder), that is *A*7. (I modified this to my requirement.)
FUNCTION ENCODE_STRING(BASE_STRING IN VARCHAR2,
FROM_BASE IN NUMBER,
TO_BASE IN NUMBER)
RETURN VARCHAR2
IS
V_ENCODED_STRING VARCHAR(100);
BEGIN
WITH N1 AS (
SELECT SUM((CASE
WHEN C BETWEEN '0' AND '9'
THEN TO_NUMBER(C)
ELSE
ASCII(C) - ASCII('A') + 10
END) * POWER(FROM_BASE, LEN - RN)
) AS THE_NUM
FROM (SELECT SUBSTR(BASE_STRING, ROWNUM, 1) C, LENGTH(BASE_STRING) LEN, ROWNUM RN
FROM DUAL
CONNECT BY ROWNUM <= LENGTH(BASE_STRING))
),
N2 AS (
SELECT (CASE
WHEN N < 10
THEN TO_CHAR(N)
ELSE CHR(ASCII('A') + N - 10)
END) AS DIGI, RN
FROM (SELECT MOD(TRUNC(THE_NUM/POWER(TO_BASE, ROWNUM - 1)), TO_BASE) N, ROWNUM RN
FROM N1
CONNECT BY ROWNUM <= TRUNC(LOG(TO_BASE, THE_NUM)) + 1)
)
SELECT SYS_CONNECT_BY_PATH(DIGI, '*') INTO V_ENCODED_STRING
FROM N2
WHERE RN = 1
START WITH RN = (SELECT MAX(RN) FROM N2)
CONNECT BY RN = PRIOR RN - 1;
RETURN V_ENCODED_STRING;
END ENCODE_STRING;
In PL/SQL (or Oracle SQL) you have a function called TO_CHAR.
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions181.htm
It is not possible to do it in pure SQL. You have to use PL/SQL.
A simple example of how to do it in PL/SQL:
CREATE TABLE long_tbl
(
long_col LONG
);
INSERT INTO long_tbl VALUES('How to convert the Long value to String using sql');
DECLARE
l_varchar VARCHAR2(32767);
BEGIN
SELECT long_col
INTO l_varchar
FROM long_tbl;
DBMS_OUTPUT.PUT_LINE(l_varchar);
END;
-- How to convert the Long value to String using sql
There is the TO_LOB function, but it can only be used when you insert data into a table.
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions185.htm
You can apply this function only to a LONG or LONG RAW column, and
only in the select list of a subquery in an INSERT statement.
There is also another, more proper way to do it using "dbms_sql.column_value_long", but this gets complicated (fetching the LONG column and appending to a CLOB).
http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_sql.htm#i1025399
(Oracle Database PL/SQL Packages and Types Reference)

How To Create A Stored Procedure That return a Random Number If Not Exist In The Table

I want to create a stored procedure that returns a random number between 11111 and 99999, provided that the number does not already exist in the table.
I use this complicated function to do that, but I need to convert it to a stored procedure.
Function GiveRandomStudentNumber() As String
s:
Dim rnd As New Random
Dim st_num As String = rnd.Next(11111, 99999)
Dim cmd As New SqlCommand("select count(0) from student where st_num = " & st_num,con)
dd.con.Open()
Dim count As Integer = cmd.ExecuteScalar()
dd.con.Close()
If count <> 0 Then
GoTo s
Else
Return st_num
End If
End Function
This function works, but I need to convert it to a stored procedure.
Thanks in advance.
CREATE PROCEDURE [dbo].[Select_RandomNumber]
(
@Lower INT, --11111-- The lowest random number
@Upper INT --99999-- The highest random number
)
AS
BEGIN
IF NOT (@Lower < @Upper) RETURN -1
--TODO: If all the numbers between Lower and Upper are in the table,
--you should return from here
--RETURN -2
DECLARE @Random INT;
SELECT @Random = ROUND(((@Upper - @Lower -1) * RAND() + @Lower), 0)
WHILE EXISTS (SELECT * FROM YourTable WHERE randCol = @Random)
BEGIN
SELECT @Random = ROUND(((@Upper - @Lower -1) * RAND() + @Lower), 0)
END
SELECT @Random
END
Create a table of student IDs. Fill it up with IDs between X and Y. Every time you want to use an ID, remove it from the table.
create table [FreeIDs] (
[ID] int,
[order] uniqueidentifier not null default newid() primary key);
insert into [FreeIDs] ([ID]) values (11111),(11112),...,(99999);
to get a free ID:
with cte as (
select top(1) [ID]
from [FreeIDs]
order by [order])
delete cte
output deleted.ID;
The persisted predetermined order speeds up generating new IDs.
BTW, if you're tempted to 'optimize' the table and go by a numbers table:
with Digits as (
select Digit
from (
values (0), (1), (2), (3), (4), (5),
(6), (7), (8), (9)) as t(Digit)),
Numbers as (
select u.Digit + t.Digit*10 +h.Digit*100 + m.Digit*1000+tm.Digit*10000 as Number
from Digits u
cross join Digits t
cross join Digits h
cross join Digits m
cross join Digits tm)
select top(1) Number
from Numbers
where Number between 11111 and 99999
and Number not in (
select ID
from Students)
order by (newid());
just don't. The requirement to randomize the set is a performance killer, and the join to eliminate existing (used) IDs is also problematic. But most importantly, the solution fails under concurrency, as multiple requests can get the same ID (and this becomes more likely as the number of free IDs shrinks). And of course, the semantically equivalent naive row-by-painfully-slow-row processing, like your original code or Kaf's answer, has exactly the same problem but is also just plain slow. It is really worth testing the solution when all but one of the IDs are taken: watch the light dim as you wait for the random number generator to hit the jackpot...
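The free-ID pool idea can be illustrated with a small Python sketch on top of the standard sqlite3 module. It is an analogue of the [FreeIDs] table, not the SQL Server code itself: the table free_ids, its pos column, and both helper names are assumptions, and a tiny ID range is used for the demo.

```python
import random
import sqlite3


def create_free_id_pool(conn, low, high):
    """Store every ID in [low, high] once, in a shuffled, persisted order."""
    ids = list(range(low, high + 1))
    random.shuffle(ids)
    conn.execute(
        "CREATE TABLE free_ids (pos INTEGER PRIMARY KEY, id INTEGER NOT NULL)")
    # pos records the predetermined random order of the IDs.
    conn.executemany("INSERT INTO free_ids (pos, id) VALUES (?, ?)",
                     enumerate(ids))
    conn.commit()


def take_free_id(conn):
    """Remove and return the next free ID, or None when the pool is empty."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT pos, id FROM free_ids ORDER BY pos LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM free_ids WHERE pos = ?", (row[0],))
        return row[1]


conn = sqlite3.connect(":memory:")
create_free_id_pool(conn, 11111, 11120)   # a tiny range for the demo
taken = [take_free_id(conn) for _ in range(10)]
print(taken)
```

Each call costs one indexed lookup no matter how few IDs remain, which is the point of persisting the order up front. This sketch uses a single connection; with several writers, the select and delete would need to run inside one write transaction.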
