HOW TO SELECT RANDOM ROWS IN mb maria - mariadb

SELECT col1 FROM tbl ORDER BY RAND() LIMIT 10;
This can work fine for small tables. However, for big table, it will have a serious performance problem as in order to generate the list of random rows, MySQL need to assign random number to each row and then sort them.
Even if you want only 10 random rows from a set of 100k rows, MySQL need to sort all the 100k rows and then, extract only 10 of them.
My solution for this problem, is to use RAND in the WHERE clause and not in the ORDER BY clause. First, you need to calculate the fragment of your desired result set rows number from the total rows in your table. Second, use this fragment in the WHERE clause and ask only for RAND numbers that smallest (or equal) from this fragment.

SELECT col1 FROM tbl WHERE RAND()<=0.0005;
In order to get exactly 100 row in the result set, we can increase the fragment number a bit and limit the query:
For example:

Related

Is there a way to stop a sqlite3 total() early when over a certain value?

I have this sample table:
NAME SIZE
sam 100
skr 200
sss 50
thu 150
I want to do this query:
select total(size) > 300 from sample;
but my table is very big, so I want it to stop computing total(size) early if it's already greater than 300, instead of going through the entire table. (There are no negative sizes in my table.) Is there any way to do this in SQLite?
I've found a way to allow it to stop early, by using a window function, but unfortunately it makes it slower for a different reason. I hope someone else has a way to do this faster. To truly make it fast, you might need to create a custom aggregate function.
Window function method
A normal aggregate function like total() will always add all of the rows its given, but you can use an aggregate window function instead to add only some of the rows:
select name, size,
total(size) over (rows between unbounded preceding
and current row)
from sample
will give you
sam|100|100.0
skr|200|300.0
sss|50|350.0
thu|150|500.0
in which the third column is a cumulative sum. You can see in this result that you'd like to stop this query once you see the 350. You can do this by putting the above query into a subquery and using the EXISTS operator:
select exists(
select 1
from (select total(size) over (rows between unbounded preceding
and current row)
as total_size
from sample)
where total_size > 300)
This will filter the query to only the rows > 300, and then stop and return true (1) as soon as it finds one of them. If it never finds a row with that sum, it returns false (0).
However: this version can take longer than simply
select total(size) > 300 from sample
because it re-calculates the sum for each row, instead of just adding the next row's size to the running total.

Perform operation on all substrings of a string in SQL (MariaDB)

Disclaimer: This is not a database administration or design question. I did not design this database and I do not have rights to change it.
I have a database in which many fields are compound. For example, a single column is used for acre usage for a district. Many districts have one primary crop and the value is a single number, such as 14. Some have two primary crops and it has two numbers separated by a comma like "14,8". Some have three, four, or even five primary crops resulting in a compound value like "14,8,7,4,3".
I am pulling data out of this database for analytical research. Right now, I am pulling columns like that into R, splitting them into 5 values (padding nulls if there aren't 5 values), and performing work on the values. I want to do it in the database itself. I want to split the value on the comma, perform an operation on the resulting values, and then concatenate the result of the operation back into the original column format.
Example, I have a column that is in acres. I want it in square meters. So, I want to take "14,8", temporarily turn it into 14 and 8, multiply each of those by 4046.86, and get "56656.04,32374.88" as my result. What I am currently doing is using regexp_replace. I start with all rows where "acres REGEXP '^[0-9.]+,[0-9.]+,[0-9.]+,[0-9.]+$'" for the where clause. That gives me rows with 5 numbers in the field. Then, I can do the first number with "cast(regexp_replace(acres,',.*%','') as float) * 4046.86". I can do each of the 5 using a different regexp_replace. I can concatenate those values back together. Then, I run a query for those with 4 numbers, then 3, then 2, and finally the single number rows.
Is this possible as a single query?
Use a function to parse the string and to convert it to desired result. This will allow for you to use a sigle query for the job.

Ordering SQLite table by sum of two columns

I have a database that has two integer columns, and I'm trying to find a way to select the top 'x' amount of rows with the highest sums of these two columns. I'm trying to eliminate the need of creating a third column that stores the sum of the two, unless there's a way to to automatically update this column every time one of the other two are altered. I'm using SQLite by the way, as I know there are some slight differences here and there between SQL/SQLite syntax.
Any help is appreciated.
Something like
SELECT a, b
FROM yourtable
ORDER BY a + b DESC
LIMIT :x
should do it.

How to get multiple random numbers in sqlite3

I'm trying to sample multiple records from big table randomly. Currently, a way I can go is to make random numbers in python and use them in sqlite3. Though I know random() of sqlite gives a random number, I don't know how to get (DISTINCT) multiple random numbers in sqlite. If I can make a table filled with random numbers, it would be OK.
If you know how many rows you want, you can use something like:
SELECT DISTINCT col1, col2, etc FROM yourtable ORDER BY random() LIMIT 10;

SQLite - Update with random unique value

I am trying to populate everyrow in a column with random ranging from 0 to row count.
So far I have this
UPDATE table
SET column = ABS (RANDOM() % (SELECT COUNT(id) FROM table))
This does the job but produces duplicate values, which turned out to be bad. I added a Unique constraint but that just causes it to crash.
Is there a way to update a column with random unique values from certain range?
Thanks!
If you want to later read the records in a random order, you can just do the ordering at that time:
SELECT * FROM MyTable ORDER BY random()
(This will not work if you need the same order in multiple queries.)
Otherwise, you can use a temporary table to store the random mapping between the rowids of your table and the numbers 1..N.
(Those numbers are automatically generated by the rowids of the temporary table.)
CREATE TEMP TABLE MyOrder AS
SELECT rowid AS original_rowid
FROM MyTable
ORDER BY random();
UPDATE MyTable
SET MyColumn = (SELECT rowid
FROM MyOrder
WHERE original_rowid = MyTable.rowid) - 1;
DROP TABLE MyOrder;
What you seem to be seeking is not simply a set of random numbers, but rather a random permutation of the numbers 1..N. This is harder to do. If you look in Knuth (The Art of Computer Programming), or in Bentley (Programming Pearls or More Programming Pearls), one suggested way is to create an array with the values 1..N, and then for each position, swap the current value with a randomly selected other value from the array. (I'd need to dig out the books to check whether it is any arbitrary position in the array, or only with a value following it in the array.) In your context, then you apply this permutation to the rows in the table under some ordering, so row 1 under the ordering gets the value in the array at position 1 (using 1-based indexing), etc.
In the 1st Edition of Programming Pearls, Column 11 Searching, Bentley says:
Knuth's Algorithm P in Section 3.4.2 shuffles the array X[1..N].
for I := 1 to N do
Swap(X[I], X[RandInt(I,N)])
where the RandInt(n,m) function returns a random integer in the range [n..m] (inclusive). That's nothing if not succinct.
The alternative is to have your code thrashing around when there is one value left to update, waiting until the random number generator picks the one value that hasn't been used yet. As a hit and miss process, that can take a while, especially if the number of rows in total is large.
Actually translating that into SQLite is a separate exercise. How big is your table? Is there a convenient unique key on it (other than the one you're randomizing)?
Given that you have a primary key, you can easily generate an array of structures such that each primary key is allocated a number in the range 1..N. You then use Algorithm P to permute the numbers. Then you can update the table from the primary keys with the appropriate randomized number. You might be able to do it all with a second (temporary) table in SQL, especially if SQLite supports UPDATE statements with a join between two tables. But it is probably nearly as simple to use the array to drive singleton updates. You'd probably not want a unique constraint on the random number column while this update is in progress.

Resources