'merge' rows if they are duplicated in a table - SQLite

Table is the following:
CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)
where the uid field should be unique.
Here is some sample data:
| uid         | clicks | lang   |
|-------------|--------|--------|
| "898187354" | 4      | "ru"   |
| "898187354" | 4      | "ru"   |
| "123456789" | 1      | <null> |
| "123456789" | 10     | "en"   |
| "140922382" | 13     | <null> |
As you can see, I have multiple rows where the uid field is duplicated. I would like those rows to be merged in the following way:
the clicks fields are added together, and the lang field is updated if its previous value was null.
For the data shown above, it would look something like this:
| uid         | clicks | lang   |
|-------------|--------|--------|
| "898187354" | 8      | "ru"   |
| "123456789" | 11     | "en"   |
| "140922382" | 13     | <null> |
It seems that I can find many ways to simply delete duplicate data, which I do not necessarily want to do. I'm unsure how I can introduce logic in SQL statements that does this.

First, update the first row (lowest rowid) of each uid group with the aggregated values:
update userlog
set
    clicks = (select sum(u.clicks) from userlog u where u.uid = userlog.uid),
    lang   = (select max(u.lang) from userlog u where u.uid = userlog.uid)
where not exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
);
and then delete the duplicate rows that are not needed:
delete from userlog
where exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
);
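If you also want to prevent new duplicates from appearing later, another option (just a sketch, not part of the answer above; the UserLog_merged name is made up) is to rebuild the table once with a UNIQUE uid, merging while copying:
-- Merge duplicates into a fresh table that enforces uniqueness of uid.
CREATE TABLE UserLog_merged(uid TEXT UNIQUE, clicks INT, lang TEXT);

INSERT INTO UserLog_merged(uid, clicks, lang)
SELECT uid, SUM(clicks), MAX(lang)   -- MAX() ignores NULLs, so any non-null lang wins
FROM UserLog
GROUP BY uid;

-- Swap the merged table in place of the original.
DROP TABLE UserLog;
ALTER TABLE UserLog_merged RENAME TO UserLog;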

Related

MariaDB DATETIME Index not working with Between FROM_UNIXTIME()

I have a table with a DATETIME field, which is indexed with a B-tree. Now I want to query it with the following statement:
SELECT
count(us.CITY) as metric,
us.CITY as Name,
us.LATITUDE as latitude,
us.LONGITUDE as longitude
FROM
FACT
LEFT JOIN
USER us
ON
us.ID_USER = FACT.USER
WHERE
ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1601568552) AND FROM_UNIXTIME(1604028277)
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| id   | select_type | table | type   | possible_keys              | key     | key_len | ref                          | rows   | Extra                                        |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
|    1 | SIMPLE      | FACT  | ALL    | INDEX_FACT_ASSESSMENT_DATE | NULL    | NULL    | NULL                         | 762621 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | us    | eq_ref | PRIMARY                    | PRIMARY | 46      | dwh0.FACT.USER,dwh0.FACT.ENV |      1 |                                              |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
2 rows in set (0.001 sec)
Interestingly, just by changing the dates manually into DATETIME-formatted strings, it uses the index. But in my opinion the FROM_UNIXTIME() function should return exactly the same thing...
SELECT
count(us.CITY) as metric,
us.CITY as Name,
us.LATITUDE as latitude,
us.LONGITUDE as longitude
FROM
FACT
LEFT JOIN
USER us
ON
us.ENV = FACT.ENV AND us.ID_USER = FACT.USER
WHERE
-- ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1596649101) AND FROM_UNIXTIME(1599108827)
ASSESSMENT_DATE BETWEEN '2020-08-05 11:30:11.987' AND '2020-09-03 11:30:11.987'
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
| id   | select_type | table | type   | possible_keys              | key                        | key_len | ref                          | rows   | Extra                                                  |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
|    1 | SIMPLE      | FACT  | range  | INDEX_FACT_ASSESSMENT_DATE | INDEX_FACT_ASSESSMENT_DATE | 5       | NULL                         | 132008 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | us    | eq_ref | PRIMARY                    | PRIMARY                    | 46      | dwh0.FACT.USER,dwh0.FACT.ENV |      1 |                                                        |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
2 rows in set (0.001 sec)
Has anyone come across such a problem? The WHERE clause is generated by Grafana, so I cannot change that, but I can change the rest if it makes a difference.
Thanks for suggestions!
Sorry for bothering... after around 10^5 more inserts, it works for both cases. Maybe it was just bad luck.
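Since the problem went away after more inserts, one guess (only an assumption, as the question resolved itself) is that the index statistics were stale at the time. Both statements below are standard MariaDB/MySQL features that are cheap to try before restructuring anything:
-- Refresh the statistics the optimizer uses to cost the range scan on ASSESSMENT_DATE.
ANALYZE TABLE FACT;

-- For comparison, the index choice can also be forced on a test run of the original query.
SELECT count(us.CITY) AS metric,
       us.CITY        AS Name,
       us.LATITUDE    AS latitude,
       us.LONGITUDE   AS longitude
FROM FACT FORCE INDEX (INDEX_FACT_ASSESSMENT_DATE)
LEFT JOIN USER us ON us.ID_USER = FACT.USER
WHERE ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1601568552) AND FROM_UNIXTIME(1604028277)
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;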

Split data in SQLite column

I have a SQLite database that looks similar to this:
----------    ------------    ------------
| Car    |    | Computer |    | Category |
----------    ------------    ------------
| id     |    | id       |    | id       |
| make   |    | make     |    | record   |
| model  |    | price    |    ------------
| year   |    | cpu      |
----------    | weight   |
              ------------
The record column in my Category table contains a comma-separated list of the table name and id of the items that belong to that Category, so an entry would look like this:
Car_1,Car_2.
I am trying to split the items in the record on the comma to get each value:
Car_1
Car_2
Then I need to take it one step further and split on the _ and return the Car records.
So if I know the Category id, I'm trying to wind up with this in the end:
----------------    ------------------
| Car          |    | Car            |
----------------    ------------------
| id: 1        |    | id: 2          |
| make: Honda  |    | make: Toyota   |
| model: Civic |    | model: Corolla |
| year: 2016   |    | year: 2013     |
----------------    ------------------
I have had some success splitting on the comma and getting 2 records back, but I'm stuck on splitting on the _ and making the join to the table named in the record.
This is my query so far:
WITH RECURSIVE record(recordhash, data) AS (
    SELECT '', record || ',' FROM Category WHERE id = 1
    UNION ALL
    SELECT
        substr(data, 0, instr(data, ',')),
        substr(data, instr(data, ',') + 1)
    FROM record
    WHERE data != ''
)
SELECT recordhash
FROM record
WHERE recordhash != ''
This is returning
--------------
| recordhash |
--------------
| Car_1 |
| Car_2 |
--------------
Any help would be greatly appreciated!
If your recursive CTE works as expected, then you can split each value of recordhash on the _ delimiter and use the part after the _ as the id of the rows to return from Car:
select * from Car
where id in (
    select substr(recordhash, 5)
    from record
    where recordhash like 'Car%'
)
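Note that the record CTE only exists inside the statement that defines it, so the lookup has to be attached to the same WITH. A combined sketch (splitting on the _ inside the join; the substr/CAST arithmetic assumes the Table_id format shown in the question) could look like this:
WITH RECURSIVE record(recordhash, data) AS (
    SELECT '', record || ',' FROM Category WHERE id = 1
    UNION ALL
    SELECT substr(data, 0, instr(data, ',')),
           substr(data, instr(data, ',') + 1)
    FROM record
    WHERE data != ''
)
SELECT Car.*
FROM Car
JOIN record
  ON substr(record.recordhash, 1, 4) = 'Car_'   -- keep only the Car_* entries
 AND Car.id = CAST(substr(record.recordhash, instr(record.recordhash, '_') + 1) AS INTEGER)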

How to obtain distinct values based on another column in the same table?

I'm not sure how to word the title properly so sorry if it wasn't clear at first.
What I want to do is to find users that have logged into a specific page, but not the other.
The table I have looks like this:
Users_Logins
------------------------------------------------------
| IDLogin | Username | Page  | Date       | Hour     |
|---------|----------|-------|------------|----------|
| 1       | User_1   | Url_1 | 2019-05-11 | 11:02:51 |
| 2       | User_1   | Url_2 | 2019-05-11 | 14:16:21 |
| 3       | User_2   | Url_1 | 2019-05-12 | 08:59:48 |
| 4       | User_2   | Url_1 | 2019-05-12 | 16:36:27 |
| ...     | ...      | ...   | ...        | ...      |
------------------------------------------------------
So as you can see, User 1 logged into Url 1 and 2, but User 2 logged into Url 1 only.
How should I go about finding users that logged into Url 1, but never logged into Url 2 during a certain period of time?
Thanks in advance!
I will try to improve the title of your question later, but for the time being, this is how I accomplished what you are asking for:
Query:
select distinct username from User_Logins
where page = 'Url_1'
and username not in
(select username from User_Logins
where Page = 'Url_2')
and date BETWEEN '2019-05-12' AND '2019-05-12'
and hour BETWEEN '00:00:00' AND '12:00:00';
Returns:
User_2
Comments:
I basically used a subquery to filter out the usernames you don't care about. :)
The time range is getting only 1 result, which you can test by removing the "distinct" in the first line of the query. If you then remove the time range from the query, you'll get 2 results.
You can do it by grouping by username and applying the conditions in a HAVING clause:
select username
from User_Logins
where
    date between '..........' and '..........'
    and
    hour between '..........' and '..........'
group by username
having
    sum(page = 'Url_1') > 0
    and
    sum(page = 'Url_2') = 0;
Replace the dots with the date/time intervals you want.
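If you would rather keep all the filtering in the WHERE clause, an equivalent sketch with NOT EXISTS (using the Users_Logins spelling from the question and illustrative date/time bounds) behaves like the NOT IN version but is not tripped up by NULL usernames:
SELECT DISTINCT ul.username
FROM Users_Logins ul
WHERE ul.page = 'Url_1'
  AND ul.date BETWEEN '2019-05-11' AND '2019-05-12'
  AND NOT EXISTS (
        SELECT 1
        FROM Users_Logins x
        WHERE x.username = ul.username
          AND x.page = 'Url_2'
          AND x.date BETWEEN '2019-05-11' AND '2019-05-12'
  );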

Getting App Maker to respect the order of a many-to-many relation

I'm having some trouble getting App Maker to respect the order of a many-to-many relation.
Let's say I have two models:
Model 1 has an ID and a many-to-many relation to model 2 which also has an ID.
App Maker generates three tables:
DESCRIBE model_1;
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
+--------------------+--------------+------+-----+---------+----------------+
DESCRIBE model_2;
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
+--------------------+--------------+------+-----+---------+----------------+
DESCRIBE model_1_Has_model_2;
+------------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------+------+-----+---------+-------+
| parentModel1_fk | int(11) | NO | MUL | NULL | |
| childModel2_fk | int(11) | NO | MUL | NULL | |
+------------------+---------+------+-----+---------+-------+
Now let's say I have a model_1 object with ID 1 and three model_2 objects with IDs 1, 2, 3. If I assign model_1.childModel_2 to [model_2_ID_1, model_2_ID_2] the model_1_Has_model_2 table will contain:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 1
1 | 2
Now let's say I splice model_1.childModel_2 using model_1.childModel_2.splice(0, 1) and then insert model_2 ID 3 at index 0 using model_1.childModel_2.splice(0, 0, model_2_ID_3). I would expect my table to contain the following:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 3
1 | 1
However, it contains the opposite:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 1
1 | 3
Is there any way I can stop this behavior short of clearing the entire relation and then setting it to my new expected order?
The short answer is no. App Maker is just creating a new record, not rearranging the table. Otherwise it would have to edit all the records below the desired insertion point (which could be a prohibitively time-consuming transaction). If this is the functionality you need, you'll have to implement it manually.
I would seriously consider creating your own join table that will allow you to have additional columns, where you can store the desired sort order.
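A minimal sketch of such a hand-rolled join table (the table and column names here are made up, and the DDL assumes the MySQL database App Maker deploys behind the scenes):
-- Hypothetical join table with an explicit ordering column you control.
CREATE TABLE model_1_Has_model_2_ordered (
    parentModel1_fk INT NOT NULL,
    childModel2_fk  INT NOT NULL,
    sort_order      INT NOT NULL,
    PRIMARY KEY (parentModel1_fk, childModel2_fk)
);

-- Read the children back in the order you stored, not in insertion order.
SELECT childModel2_fk
FROM model_1_Has_model_2_ordered
WHERE parentModel1_fk = 1
ORDER BY sort_order;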

sqlite, order by date/integer in joined table

I have two tables
Names
id | name
---------
5 | bill
15 | bob
10 | nancy
Entries
id | name_id | added | description
----------------------------------
2 | 5 | 20140908 | i added this
4 | 5 | 20140910 | added later on
9 | 10 | 20140908 | i also added this
1 | 15 | 20140805 | added early on
6 | 5 | 20141015 | late to the party
I'd like to order Names by each name's lowest added value in the Entries table, keeping only that earliest entry per name, and display the rows from both tables ordered by the added column overall, so the results would look something like this:
names.id | names.name | entries.added | entries.description
-----------------------------------------------------------
15 | bob | 20140805 | added early on
5 | bill | 20140908 | i added this
10 | nancy | 20140908 | i also added this
I looked into joins on the first item (e.g. SQL Server: How to Join to first row) but wasn't able to get it to work.
Any tips?
Give this query a try:
SELECT Names.id, Names.name, Entries.added, Entries.description
FROM Names
INNER JOIN Entries
ON Names.id = Entries.name_id
ORDER BY Entries.added
Add DESC if you want it in reverse order i.e.: ORDER BY Entries.added DESC.
This should do it:
SELECT n.id, n.name, e.added, e.description
FROM Names n
INNER JOIN
    (SELECT name_id, description, MIN(added) AS added
     FROM Entries
     GROUP BY name_id) e
ON n.id = e.name_id
ORDER BY e.added
In SQLite, when MIN() is used with GROUP BY, the other selected columns (here description) are taken from the row that holds the minimum, so each name comes back with its earliest entry.
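If you would rather not lean on SQLite's bare-column behaviour with MIN(), a correlated subquery (just an alternative sketch) returns the same earliest entry per name:
SELECT n.id, n.name, e.added, e.description
FROM Names n
JOIN Entries e ON e.name_id = n.id
WHERE e.added = (SELECT MIN(e2.added) FROM Entries e2 WHERE e2.name_id = n.id)
ORDER BY e.added;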
