I have a web service that generates radio station playlists, and I'm trying to ensure that no playlist contains tracks from the same artist more than n times.
So, for example (unless it is Mandatory Metallica, haha), no artist should ever dominate any 8-hour programming segment.
Today we use a query similar to this, which generates smaller randomized playlists out of existing very large playlists:
SELECT FilePath FROM vwPlaylistTracks
WHERE Owner='{0}' COLLATE NOCASE AND
Playlist='{1}' COLLATE NOCASE
ORDER BY RANDOM()
LIMIT {2};
Someone then has to review the playlists manually and edit them if the same artist appears consecutively or more often than the desired limit.
Supposing the producer wants to ensure that no artist appears more than twice in the playlist generated by this query (the vwPlaylistTracks view does have an artist field), is GROUP BY the correct way to accomplish this?
I've been experimenting with the view to accomplish this, but the query below always returns only 1 track from each artist.
SELECT
a.Name as 'Artist',
f.parentPath || '\' || f.fileName as 'FilePath',
p.name as 'Playlist',
u.username as 'Owner'
FROM mp3_file f,
mp3_track t,
mp3_artist a,
mp3_playlist_track pt,
mp3_playlist p,
mp3_user u
WHERE f.file_id = t.track_id
AND t.artist_id = a.artist_id
AND t.track_id = pt.track_id
AND pt.playlist_id = p.playlist_id
AND p.user_id = u.user_id
--AND p.Name = 'Alternative Rock'
GROUP BY a.Name
--HAVING Count(a.Name) < 3
--ORDER BY RANDOM()
--LIMIT 50;
GROUP BY creates exactly one result record for each distinct value in the grouped column, so this is not what you want.
You have to count any previous records with the same artist, which is not easy because the random ordering is not stable.
However, this is possible with a temporary table, which is ordered by its rowid:
CREATE TEMPORARY TABLE RandomTracks AS
SELECT a.Name as Artist, parentPath, name, username
FROM ...
WHERE ...
ORDER BY RANDOM();
CREATE INDEX RandomTracks_Artist on RandomTracks(Artist);
SELECT *
FROM RandomTracks AS r1
WHERE -- filter out if there are any two previous records with the same artist
(SELECT COUNT(*)
FROM RandomTracks AS r2
WHERE r2.Artist = r1.Artist
AND r2.rowid < r1.rowid
) < 2
AND -- filter out if the directly previous record has the same artist
r1.Artist IS NOT (SELECT Artist
FROM RandomTracks AS r3
WHERE r3.rowid = r1.rowid - 1)
LIMIT 50;
DROP TABLE RandomTracks;
It might be easier and faster to just read the entire playlist and to filter and reorder it in your code.
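As a rough sketch of that in-code approach, assuming the application can load the whole playlist as (artist, file path) pairs: shuffle the list, then walk it once and skip a track when its artist has already reached the cap or was the track accepted just before. The name build_playlist and its parameters are made up for illustration, and Python is only an example language.

```python
import random
from collections import Counter


def build_playlist(tracks, limit, max_per_artist=2, seed=None):
    """Pick up to `limit` tracks from `tracks`, a list of (artist, file_path).

    Hypothetical helper: no artist appears more than `max_per_artist` times
    and never twice in a row.
    """
    rng = random.Random(seed)
    pool = list(tracks)
    rng.shuffle(pool)  # same role as ORDER BY RANDOM()

    playlist = []
    counts = Counter()
    for artist, path in pool:
        if len(playlist) >= limit:
            break
        if counts[artist] >= max_per_artist:
            continue  # artist already at the cap
        if playlist and playlist[-1][0] == artist:
            continue  # would play the same artist back to back
        playlist.append((artist, path))
        counts[artist] += 1
    return playlist
```

This is a single greedy pass, so a skipped track is never reconsidered and the result can be shorter than the limit when the source playlist has few distinct artists. Note also that the back-to-back check here compares against the last accepted track, whereas the SQL above compares against the previous row of the temporary table.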
Using SQLite I can get all tablenames in my database:
SELECT name AS Tablename FROM sqlite_master WHERE type = 'table'
Result will be some tablenames, for example:
Tablename:
cars
planes
bus
How can I write a SQL query that counts the number of records in each table that is found? The result should be:
Tablename Records:
cars 100
planes 200
bus 300
I understand that in this example I could simply run 3 SELECT COUNT() statements; however, the number of tables can vary, so I cannot hardcode a fixed number of SELECT COUNT() statements.
All table and column names in a statement need to be known at the time it is compiled, so you can't do this dynamically.
You'd have to programmatically build up a new query string based on the results of getting the table names from sqlite_master. Either one query per table like you mentioned, or all together by creating something that looks like
SELECT 'table1' AS Tablename, count(*) AS Records FROM table1
UNION ALL
SELECT 'table2', count(*) FROM table2
-- etc.
You don't mention what language you're working in, so in pseudo-code of a functional style:
var allcounts = query("SELECT name FROM sqlite_master WHERE type = 'table'")
.map(name -> "SELECT '$name' AS Tablename, count(*) AS Records FROM \"$name\"")
.join(" UNION ALL ");
var totals = query(allcounts);
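For a concrete version of the same idea, here is a minimal sketch using Python's built-in sqlite3 module. Python is just an assumption about the host language, and count_all_tables and quote_ident are names invented for this example. Table names are passed as bound parameters for the label column and double-quoted as identifiers in the FROM clause.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def count_all_tables(conn):
    """Return a list of (Tablename, Records) rows, one per table."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    if not names:
        return []
    # Build one SELECT per table and glue them together with UNION ALL.
    parts = [
        f"SELECT ? AS Tablename, count(*) AS Records FROM {quote_ident(n)}"
        for n in names
    ]
    sql = " UNION ALL ".join(parts)
    return conn.execute(sql, names).fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        "CREATE TABLE cars(id); CREATE TABLE planes(id);"
        "INSERT INTO cars VALUES (1); INSERT INTO planes VALUES (1);"
    )
    for tablename, records in count_all_tables(demo):
        print(tablename, records)
```

With a very large number of tables the combined statement could hit SQLite's limits on compound SELECT terms or bound parameters, in which case running one count query per table is the simpler fallback.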
I stumbled upon this standard SQL BigQuery documentation this week, which got me started with a Firebase Analytics closed funnel. However, I got the wrong results (view image below). There should be no users who had a "Tutorial_LessonCompleted" without first having a "Tutorial_LessonStarted >> Lesson = 1". This could be for various reasons.
Questions:
Is it wise to use the User Property = "first_open_time", or is it better to use the Event = "first_open"? What would the latter implementation look like?
I suspect I am perhaps not drilling down correctly to: Event (String = "Tutorial_LessonStarted") >> parameter (String = "LessonNumber") >> value (String = "lesson1")?
How would a filter on _TABLE_SUFFIX = '20170701' work? I read that this will be cheaper. Any optimised code suggestions are received with open arms and an up-vote!
#standardSQL
SELECT
step1, step2, step3, step4, step5, step6,
COUNT(*) AS funnel_count,
COUNT(DISTINCT user_id) AS users
FROM (
SELECT
user_dim.app_info.app_instance_id AS user_id,
event.timestamp_micros AS event_timestamp,
event.name AS step1,
LEAD(event.name, 1) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step2,
LEAD(event.name, 2) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step3,
LEAD(event.name, 3) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step4,
LEAD(event.name, 4) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step5,
LEAD(event.name, 5) OVER (
PARTITION BY user_dim.app_info.app_instance_id
ORDER BY event.timestamp_micros ASC) as step6
FROM
`......`,
UNNEST(event_dim) AS event,
UNNEST(user_dim.user_properties) AS user_prop
WHERE user_prop.key = "first_open_time"
ORDER BY 1, 2, 3, 4, 5 ASC
)
WHERE step6 = "Tutorial_LessonStarted" AND EXISTS (
SELECT *
FROM `......`,
UNNEST(event_dim) AS event,
UNNEST(event.params)
WHERE key = 'LessonNumber' AND value.string_value = "lesson1")
GROUP BY step1, step2, step3, step4, step5, step6
ORDER BY funnel_count DESC
LIMIT 100;
Note:
Enter the table you want to query in the FROM clause, e.g. project_id.com_game_example_IOS.app_events_20170212.
I left out the funnel_count and user_count.
Output:
----------------------------------------------------------
Update since original question above:
@Elliot: I don’t understand why you said: -- ensure that an event with lesson1 precedes Tutorial_LessonStarted.
Tutorial_LessonStarted has a parameter "LessonNumber" with values lesson1,lesson2,lesson3,lesson4.
I want to count all funnels that took place with a last step in the funnel equal to LessonNumber=lesson1.
So, applied to event log data for a brand new user's first session (i.e. a user that fired first_open_time), the answer would be the table below:
View.OnboardingWelcomePage
View.OnboardingFinalPage
View.JamLoading
View.JamLoading
Jam.UserViewsJam
Jam.ProjectOpened
View.JamMixer
Tutorial.LessonStarted (the value of the “LessonNumber” parameter would be “lesson1”)
Jam.ProjectPlayStarted
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
View.JamLoopSelector
View.JamMixer
Tutorial.LessonCompleted
Tutorial.LessonStarted (the value of the “LessonNumber” parameter would be “lesson2”)
So it is important first to get all the users that had a first_open_time on a specific day, and then to structure the events into a funnel whose last event matches a given event name and a specific parameter value, forming the funnel "backwards" from there.
Let me go through some explanation, then see if I can suggest a query to get you started.
It looks like you want to analyze the sequence of events in your analytics data, but the sequence is already there for you--you have an array of the events. Looking at the Firebase schema for BigQuery, event_dim is the relevant column, and unless I'm misunderstanding something, these events are ordered by time. If you want to check what the sixth event's name was, you can use:
event_dim[SAFE_ORDINAL(6)].name
This will evaluate to NULL if there were fewer than six events, or else it will give you the string with the event name.
Another observation is that you are attempting to analyze both event_dim and user_dim, but you are taking the cross product of the two, which will explode the number of rows and make it hard to reason about the results of the query. To look for a specific user property, use an expression of this form:
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Combining these two filters, your FROM and WHERE clause would look something like this:
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>'
Using the bracket operator to access the steps from event_dim, we can do something like this:
WITH FilteredInput AS (
SELECT *
FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'first_open_time') = '<expected property value>' AND
-- ensure that an event with lesson1 precedes Tutorial_LessonStarted
EXISTS (
SELECT 1
FROM UNNEST(event_dim) WITH OFFSET event_offset
CROSS JOIN UNNEST(params)
WHERE key = 'LessonNumber' AND
value.string_value = 'lesson1' AND
event_offset < 5
)
)
SELECT
event_dim[ORDINAL(1)].name AS step1,
event_dim[ORDINAL(2)].name AS step2,
event_dim[ORDINAL(3)].name AS step3,
event_dim[ORDINAL(4)].name AS step4,
event_dim[ORDINAL(5)].name AS step5,
event_dim[ORDINAL(6)].name AS step6,
COUNT(*) AS funnel_count,
COUNT(DISTINCT user_dim.user_id) AS users
FROM FilteredInput
GROUP BY step1, step2, step3, step4, step5, step6;
This will return all unique "paths" along with a count and number of distinct users for each. Note that I'm just writing this off the top of my head--I don't have representative data that I can try it on--so there may be syntax or other errors.
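To make the "backwards" funnel idea from the question concrete outside of BigQuery, here is a small plain-Python sketch. It assumes each user's events are already available as a time-ordered list of (event name, params dict) pairs; the function funnels_ending_at and that input shape are invented for illustration and are not part of the Firebase export or any BigQuery API. For every event that matches the final step, it takes that event plus the five before it as one path.

```python
from collections import Counter


def funnels_ending_at(events_by_user, last_name, param_key, param_value, steps=6):
    """Count paths of `steps` events that end at a matching event.

    events_by_user: {user_id: [(event_name, params_dict), ...]} in time order.
    Returns {path_tuple: (funnel_count, distinct_users)}.
    """
    counts = Counter()
    users = {}
    for user_id, events in events_by_user.items():
        for i, (name, params) in enumerate(events):
            if name != last_name or params.get(param_key) != param_value:
                continue
            if i + 1 < steps:
                continue  # not enough earlier events to fill the funnel
            # The matching event is the last step; walk back steps-1 events.
            path = tuple(n for n, _ in events[i + 1 - steps:i + 1])
            counts[path] += 1
            users.setdefault(path, set()).add(user_id)
    return {path: (count, len(users[path])) for path, count in counts.items()}
```

Applied to the sample session listed in the question, the only path returned would be the six events ending at the lesson1 Tutorial.LessonStarted, with a count of 1 for 1 user.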
Imagine that I need to retrieve all records except those that are associated
with a specific ID. For instance, if you consider the table below and choose RestaurantID 1, the resulting table should not include rows that contain CuisineID 3, 4 or 7.
If RestaurantID is 6, then the resulting table should return everything without CuisineID 1 and 8,
and so on.
My table
Kind regards
erwre
If you use a subselect in your query, you can get the list of CuisineIDs to exclude and filter them out with the NOT IN clause.
select
t.*
from
mytable t
where
t.CuisineID NOT IN
(
select
t2.CuisineID
from
mytable t2
where
t2.ID = #YOUR_RESTAURANT_ID
)
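Here is a small runnable sketch of the same NOT IN pattern using Python's sqlite3 module. Since the original table is not visible here, the table name mytable and the columns RestaurantID and CuisineID are assumptions taken from the wording of the question, and rows_excluding_cuisines_of is a made-up helper name. The restaurant ID is passed as a bound parameter.

```python
import sqlite3


def rows_excluding_cuisines_of(conn, restaurant_id):
    """Return all rows whose CuisineID is not used by the given restaurant."""
    sql = """
        SELECT t.RestaurantID, t.CuisineID
        FROM mytable AS t
        WHERE t.CuisineID NOT IN (
            SELECT t2.CuisineID
            FROM mytable AS t2
            WHERE t2.RestaurantID = ?
        )
        ORDER BY t.RestaurantID, t.CuisineID
    """
    return conn.execute(sql, (restaurant_id,)).fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE mytable (RestaurantID INTEGER, CuisineID INTEGER)")
    demo.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [(1, 3), (1, 4), (1, 7), (2, 3), (2, 5), (6, 1), (6, 8)],
    )
    print(rows_excluding_cuisines_of(demo, 1))
```

One thing to watch: if the subquery can return a NULL CuisineID, NOT IN will filter out every row, so either exclude NULLs in the subquery or use NOT EXISTS instead.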
I'm trying to write a query on two simple tables. Tables are simple, the query is not :)
Anyway...
Here is the database schema:
and here is an overview of the table content:
I'm trying to write a query that would list all assets in the corresponding table, but only if they are marked as "wanted" (meaning the boolean field asset_owned = 0) and are also referenced as "owned" by another owner.
This is what I have so far and it works :
SELECT
user.user_pseudo AS REQUESTER,
asset.asset_sku AS SKU,
asset.asset_name AS ASSET_NAME
FROM
asset
INNER JOIN user ON asset.id_user = user.id
WHERE
asset.asset_owned = 0
AND
asset.asset_sku IN (SELECT asset.asset_sku FROM asset WHERE asset.asset_owned = 1)
But, in the same query (if possible) I would like to get the owner name as well.
The first result of such a query on those tables would be:
me,003,Test003,you.
I've tried inline SELECT and nested subqueries like:
SELECT
user.user_pseudo as ASKER,
asset.asset_sku as SKU,
asset.asset_name as NAME,
subquery1.user.user_pseudo as OWNER
FROM
asset
INNER JOIN user ON asset.id_user = user.id,
(SELECT user.user_pseudo.asset_asset_sku FROM asset INNER JOIN user ON asset.id_user = user.id WHERE asset.asset_owned = 1) subquery1
WHERE
asset.asset_owned = 0 AND
subquery1.asset.asset_sku IN (SELECT asset.asset_sku FROM asset INNER JOIN user ON asset.id_user = user.id WHERE asset.asset_owned=1)
but of course that does not work.
Thanks for any direction you could point me to.
happy new year
Mathias
So this was fun for me (I'm learning SQL, so this is good practice!) - I appreciate the very clear question.
Hopefully this works for you - I used two sub-queries (one each for 'owner' and 'requester') and then joined those on SKU and name. It works in SQLite with the small sample data shown above.
SELECT requester, subq1.SKU, subq1.name, owner
FROM
(SELECT user.user_pseudo AS requester, asset.asset_sku AS SKU, asset.asset_name AS name
FROM asset, user
WHERE asset.asset_owned = 0
AND user.id = asset.id_user) subq1,
(SELECT user.user_pseudo AS owner, asset.asset_sku AS SKU, asset.asset_name AS name
FROM asset, user
WHERE asset.asset_owned = 1
AND asset.id_user = user.id) subq2
WHERE subq1.SKU = subq2.SKU
AND subq1.name = subq2.name;
I have a product page on a webpage that shows categories of products. This is done with a listview populated from a database. The issue that I have is that the main supplier has demanded that their products are first in the category list. So what I need to do is run a query that will return the results, display those two categories first and then display the rest alphabetically.
So I've been trying to do this using a UNION ALL query like this:
SELECT cat, cat_id, image FROM prod_categories WHERE cat_id = 19 OR cat_id = 65
UNION ALL
SELECT cat, cat_id, image FROM prod_categories WHERE cat_id <> 19 AND cat_id <> 65
I thought with a union like this it would display the results of the first select query first, but it's not doing that.
I can add an 'order by cat' clause on the end, but obviously that only displays them in the correct order if the two categories I want to display come first alphabetically, which they don't.
If anyone has any ideas how to do this it would be greatly appreciated.
Thanks
How about this:
SELECT cat, cat_id, image FROM prod_categories
order by case when cat_id in (19, 65) then 1 else 2 end, cat
Cuts out the need to UNION altogether. Might even produce a more efficient execution plan (possibly...).
(using Transact-SQL for SQL Server - the exact syntax may need adjusting for MySQL etc.)
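To check the ordering behaviour quickly, here is a sketch that runs the CASE-based sort against SQLite from Python. The helper name categories_pinned_first is made up, the pinned IDs are passed as bound parameters instead of being hard-coded, and the second sort key is the category name so that the rest come out alphabetically as the question asks.

```python
import sqlite3


def categories_pinned_first(conn, pinned_ids):
    """Return categories with `pinned_ids` first, then the rest, each by name."""
    pinned_ids = list(pinned_ids)
    placeholders = ", ".join("?" for _ in pinned_ids)
    sql = f"""
        SELECT cat, cat_id, image
        FROM prod_categories
        ORDER BY CASE WHEN cat_id IN ({placeholders}) THEN 1 ELSE 2 END, cat
    """
    return conn.execute(sql, pinned_ids).fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE prod_categories (cat TEXT, cat_id INTEGER, image TEXT)")
    demo.executemany(
        "INSERT INTO prod_categories VALUES (?, ?, ?)",
        [("Zebra", 19, "z.png"), ("Apple", 1, "a.png"), ("Mango", 65, "m.png")],
    )
    for row in categories_pinned_first(demo, [19, 65]):
        print(row)
```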
Try something like this.
SELECT cat, cat_id, image, 1 as [srt]
FROM prod_categories WHERE cat_id = 19 OR cat_id = 65
UNION ALL
SELECT cat, cat_id, image, 2 as [srt]
FROM prod_categories WHERE cat_id <> 19 AND cat_id <> 65
ORDER BY srt ASC, cat
Don't hard-code this into your query. What happens when the next supplier wants to come second? Or last? For that matter, you may want to list categories in some sort of "group", anyways.
Instead, you should be using an ordering table (or multiple). Something simple to get you started:
CREATE TABLE Category_Order (categoryId INTEGER, -- fk to category.id, unique
priority INTEGER) -- when to display category
Then you want to insert the values for the current "special" categories:
INSERT INTO Category_Order (categoryId, priority) VALUES (19, 2147483647), (65, 0)
You'll also need an entry for rows that are not currently prioritized:
INSERT INTO Category_Order (categoryId, priority)
SELECT cat_id, -2147483648
FROM prod_categories
WHERE cat_id NOT IN (19, 65)
Which can then be queried like this:
SELECT cat, cat_id, image
FROM prod_categories
JOIN Category_Order
ON categoryId = cat_id
ORDER BY priority DESC, cat
If you write a small maintenance program for this table, you can then push re-ordering duties off onto the correct business department. Reordering of entries can be accomplished by splitting the difference between existing entries, although you'll want a procedure to re-distribute if things get too crowded.
Note that, in the event your db supports a clause like ORDER BY priority NULLS LAST, the entries for non-prioritized categories are unnecessary, and you can simply LEFT JOIN to the ordering table.
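Here is a sketch of that LEFT JOIN variant, run against SQLite from Python. It assumes the prod_categories and Category_Order tables described above, with only the prioritized categories present in Category_Order. The helper name ordered_categories is made up. Instead of NULLS LAST, which not every database or version accepts, it sorts on "priority IS NULL" first, which should give the same effect.

```python
import sqlite3


def ordered_categories(conn):
    """Prioritized categories first (highest priority first), then the rest by name."""
    sql = """
        SELECT c.cat, c.cat_id, c.image
        FROM prod_categories AS c
        LEFT JOIN Category_Order AS o
            ON o.categoryId = c.cat_id
        -- rows without an ordering entry have a NULL priority and sort last
        ORDER BY o.priority IS NULL, o.priority DESC, c.cat
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript("""
        CREATE TABLE prod_categories (cat TEXT, cat_id INTEGER, image TEXT);
        CREATE TABLE Category_Order (categoryId INTEGER UNIQUE, priority INTEGER);
    """)
    demo.executemany(
        "INSERT INTO prod_categories VALUES (?, ?, ?)",
        [("Zebra", 19, "z.png"), ("Apple", 1, "a.png"), ("Mango", 65, "m.png")],
    )
    demo.executemany(
        "INSERT INTO Category_Order VALUES (?, ?)",
        [(19, 2147483647), (65, 0)],
    )
    for row in ordered_categories(demo):
        print(row)
```

With this layout, adding or moving a supplier only means inserting or updating a row in Category_Order, and the query itself stays unchanged.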