sqlite, order by date/integer in joined table - sqlite

I have two tables
Names
id | name
---------
5 | bill
15 | bob
10 | nancy
Entries
id | name_id | added | description
----------------------------------
2 | 5 | 20140908 | i added this
4 | 5 | 20140910 | added later on
9 | 10 | 20140908 | i also added this
1 | 15 | 20140805 | added early on
6 | 5 | 20141015 | late to the party
I'd like to order Names by the first of the numerically-lowest added values in the Entries table, and display the rows from both tables ordered by the added column overall, so the results will be something like:
names.id | names.name | entries.added | entries.description
-----------------------------------------------------------
15 | bob | 20140805 | added early on
5 | bill | 20140908 | i added this
10 | nancy | 20140908 | i also added this
I looked into joins on the first item (e.g. SQL Server: How to Join to first row) but wasn't able to get it to work.
Any tips?

Give this query a try:
SELECT Names.id, Names.name, Entries.added, Entries.description
FROM Names
INNER JOIN Entries
ON Names.id = Entries.name_id
ORDER BY Entries.added
Add DESC if you want it in reverse order i.e.: ORDER BY Entries.added DESC.

This should do it (in SQLite 3.7.11 or later, description is taken from the row that holds MIN(added)):
SELECT n.id, n.name, e.added, e.description
FROM Names n INNER JOIN
(SELECT name_id, description, MIN(added) AS added FROM Entries GROUP BY name_id) e
ON n.id = e.name_id
ORDER BY e.added
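The result the question asks for (one row per name, showing its earliest entry) can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module and the sample rows from the question; it uses a correlated subquery rather than either query above, and the variable name first_entries and the tie-break on n.id are assumptions made for illustration.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Names (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Entries (id INTEGER PRIMARY KEY, name_id INTEGER,
                          added INTEGER, description TEXT);
    INSERT INTO Names VALUES (5, 'bill'), (15, 'bob'), (10, 'nancy');
    INSERT INTO Entries VALUES
        (2, 5, 20140908, 'i added this'),
        (4, 5, 20140910, 'added later on'),
        (9, 10, 20140908, 'i also added this'),
        (1, 15, 20140805, 'added early on'),
        (6, 5, 20141015, 'late to the party');
""")

# Keep only the entry whose "added" equals the minimum for that name,
# then sort by that value (n.id breaks ties; that choice is an assumption).
first_entries = conn.execute("""
    SELECT n.id, n.name, e.added, e.description
    FROM Names n
    JOIN Entries e ON e.name_id = n.id
    WHERE e.added = (SELECT MIN(added) FROM Entries WHERE name_id = n.id)
    ORDER BY e.added, n.id
""").fetchall()

for row in first_entries:
    print(row)
```

If a name had two entries sharing the same lowest added value, this form would return both of them.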

Related

De duping Table Joined to itself

I have the following table:
ID|ID2
-----+---
1234 |56473
56473|1234
34521|56473
35462|23457
23457|35462
56473|34521
As you can see, these ids were linked together by a previous join on different fields, so each combination of ids appears in the table twice, once in each order.
Desired output:
ID|ID2
-----+---
1234 |56473
34521|56473
35462|23457
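You can use SQLite's two-argument scalar MIN() and MAX() functions to put each pair in a fixed order, then drop the repeats with DISTINCT: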
select distinct
min(ID, ID2) ID, max(ID, ID2) ID2
from tablename
Results:
| ID | ID2 |
| ----- | ----- |
| 1234 | 56473 |
| 34521 | 56473 |
| 23457 | 35462 |
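As a quick check of the normalize-then-DISTINCT idea, here is a minimal sketch with Python's sqlite3 module. The table name pairs and the aliases low_id and high_id are hypothetical; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (ID INTEGER, ID2 INTEGER)")  # hypothetical table name
conn.executemany("INSERT INTO pairs VALUES (?, ?)", [
    (1234, 56473), (56473, 1234), (34521, 56473),
    (35462, 23457), (23457, 35462), (56473, 34521),
])

# With two arguments, min() and max() are scalar functions in SQLite,
# so every pair is rewritten as (smaller, larger) before DISTINCT applies.
unique_pairs = conn.execute("""
    SELECT DISTINCT min(ID, ID2) AS low_id, max(ID, ID2) AS high_id
    FROM pairs
    ORDER BY low_id
""").fetchall()

print(unique_pairs)
```

Note that the smaller id always lands in the first column, so the pair 35462|23457 comes back as 23457|35462.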

Split data in SQLite column

I have a SQLite database that looks similar to this:
---------- ------------ ------------
| Car | | Computer | | Category |
---------- ------------ ------------
| id | | id | | id |
| make | | make | | record |
| model | | price | ------------
| year | | cpu |
---------- | weight |
------------
The record column in my Category table contains a comma separated list of the table name and id of the items that belong to that Category, so an entry would look like this:
Car_1,Car_2.
I am trying to split the items in the record on the comma to get each value:
Car_1
Car_2
Then I need to take it one step further and split on the _ and return the Car records.
So if I know the Category id, I'm trying to wind up with this in the end:
---------------- ------------------
| Car | | Car |
---------------| -----------------|
| id: 1 | | id: 2 |
| make: Honda | | make: Toyota |
| model: Civic | | model: Corolla |
| year: 2016 | | year: 2013 |
---------------- ------------------
I have had some success on splitting on the comma and getting 2 records back, but I'm stuck on splitting on the _ and making the join to the table in the record.
This is my query so far:
WITH RECURSIVE record(recordhash, data) AS (
SELECT '', record || ',' FROM Category WHERE id = 1
UNION ALL
SELECT
substr(data, 0, instr(data, ',')),
substr(data, instr(data, ',') + 1)
FROM record
WHERE data != '')
SELECT recordhash
FROM record
WHERE recordhash != ''
This is returning
--------------
| recordhash |
--------------
| Car_1 |
| Car_2 |
--------------
Any help would be greatly appreciated!
If your recursive CTE works as expected then you can split each of the values of recordhash with _ as a delimiter and use the part after _ as the id of the rows from Car to return:
select * from Car
where id in (
select substr(recordhash, 5)
from record
where recordhash like 'Car%'
)
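Putting the recursive CTE and the lookup together, a complete run might look like the sketch below (Python's sqlite3 module, SQLite 3.8.3 or later for WITH RECURSIVE). The third Car row is a made-up extra that is not in the question, added only to show that unlisted cars are filtered out. Locating the underscore with instr() instead of a fixed offset is a small variation on the answer above, not part of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (id INTEGER PRIMARY KEY, make TEXT, model TEXT, year INTEGER);
    CREATE TABLE Category (id INTEGER PRIMARY KEY, record TEXT);
    INSERT INTO Car VALUES
        (1, 'Honda', 'Civic', 2016),
        (2, 'Toyota', 'Corolla', 2013),
        (3, 'Ford', 'Focus', 2010);  -- hypothetical row, not in the Category
    INSERT INTO Category VALUES (1, 'Car_1,Car_2');
""")

cars = conn.execute("""
    WITH RECURSIVE record(recordhash, data) AS (
        SELECT '', record || ',' FROM Category WHERE id = ?
        UNION ALL
        SELECT substr(data, 1, instr(data, ',') - 1),   -- next item
               substr(data, instr(data, ',') + 1)       -- rest of the list
        FROM record
        WHERE data != ''
    )
    SELECT * FROM Car
    WHERE id IN (
        -- text after the underscore, converted to an integer id
        SELECT CAST(substr(recordhash, instr(recordhash, '_') + 1) AS INTEGER)
        FROM record
        WHERE recordhash LIKE 'Car%'
    )
    ORDER BY id
""", (1,)).fetchall()

print(cars)
```

The table to join against is still fixed in the query text, so a Category that mixes Car and Computer items would need one such lookup per table.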

'merge' rows if they are duplicated in a table - SQLite

Table is the following:
CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)
Where uid field should be unique.
Here is some sample data:
| uid | clicks | lang |
----------------------------------------
| "898187354" | 4 | "ru" |
| "898187354" | 4 | "ru" |
| "123456789" | 1 | <null> |
| "123456789" | 10 | "en" |
| "140922382" | 13 | <null> |
As you can see, I have multiple rows where the uid field is duplicated. I would like those rows to be merged in the following way:
the clicks fields are added together, and the lang field is filled in if its previous value was null.
For the data shown above, it would look something like this:
| uid | clicks | lang |
---------------------------------------
| "898187354" | 8 | "ru" |
| "123456789" | 11 | "en" |
| "140922382" | 13 | <null> |
It seems that I can find many ways to simply delete duplicate data, which I do not necessarily want to do. I'm unsure how I can introduce logic in SQL statements that does this.
First update:
update userlog
set
clicks = (select sum(u.clicks) from userlog u where u.uid = userlog.uid),
lang = (select max(u.lang) from userlog u where u.uid = userlog.uid)
where not exists (
select 1 from userlog u
where u.uid = userlog.uid and u.rowid < userlog.rowid
);
and then delete the duplicate rows that are not needed:
delete from userlog
where exists (
select 1 from userlog u
where u.uid = userlog.uid and u.rowid < userlog.rowid
);
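The two statements can be tried against the sample data with a short script. This is a sketch using Python's sqlite3 module; running both statements in one transaction and adding a unique index afterwards are assumptions on my part (the index name idx_userlog_uid is made up), motivated by the requirement that uid be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)")
conn.executemany("INSERT INTO UserLog VALUES (?, ?, ?)", [
    ("898187354", 4, "ru"),
    ("898187354", 4, "ru"),
    ("123456789", 1, None),
    ("123456789", 10, "en"),
    ("140922382", 13, None),
])

with conn:  # commit both steps together, or roll both back on error
    # Step 1: fold totals into the first row (lowest rowid) of each uid.
    # MAX() skips NULLs, so a non-null lang wins over a null one.
    conn.execute("""
        UPDATE UserLog
        SET clicks = (SELECT SUM(u.clicks) FROM UserLog u WHERE u.uid = UserLog.uid),
            lang   = (SELECT MAX(u.lang)   FROM UserLog u WHERE u.uid = UserLog.uid)
        WHERE NOT EXISTS (
            SELECT 1 FROM UserLog u
            WHERE u.uid = UserLog.uid AND u.rowid < UserLog.rowid
        )
    """)
    # Step 2: remove every row that has an earlier row with the same uid.
    conn.execute("""
        DELETE FROM UserLog
        WHERE EXISTS (
            SELECT 1 FROM UserLog u
            WHERE u.uid = UserLog.uid AND u.rowid < UserLog.rowid
        )
    """)

# Assumption: enforce uniqueness from now on so duplicates cannot return.
conn.execute("CREATE UNIQUE INDEX idx_userlog_uid ON UserLog(uid)")

merged = conn.execute(
    "SELECT uid, clicks, lang FROM UserLog ORDER BY rowid"
).fetchall()
print(merged)
```

If two duplicate rows carried different non-null lang values, MAX() would simply keep the alphabetically last one, which may or may not be what is wanted.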

What database schema to use for storing survey answers

I've been asked to design a survey system for our customer.
It's based on ASP.NET, and the database is Oracle.
I have no experience here, so I'd like to ask for advice about:
What database schema to use for storing user answers. I'm afraid my current design is likely to have performance issues...
About the survey:
There'll be two or more surveys going on at the same time.
Surveys may be triggered once a year or more frequently, so I think I need a Survey Period table.
Surveys are targeting different products, so there'll be a mapping between products and surveys
Currently my design:
Survey Category table
+------------+--------------+
| CatageryId | CatageryName |
+------------+--------------+
| 1 | cat1 |
| 2 | cat2 |
+------------+--------------+
Survey Category version table
+-----------+------------+--------------------+
| VersionId | CatageryId | VersionDescription |
+-----------+------------+--------------------+
| 1 | 1 | 'cat1 version1' |
| 2 | 1 | 'cat1 version2' |
| 3 | 2 | 'cat2 version1' |
+-----------+------------+--------------------+
Survey Period Table
+----------+--------------------+
| PeriodId | PeriodDescription |
+----------+--------------------+
| 1 | 'cat1 period2016' |
| 2 | 'cat1 period2017' |
| 3 | 'cat2 period2016' |
+----------+--------------------+
Survey Period-Version map table
+----------+-----------+
| PeriodId | VersionId |
+----------+-----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
+----------+-----------+
A Version-Question map table
+-----------+------------+
| VersionId | QuestionId |
+-----------+------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
+-----------+------------+
A Version-Product map table
+-----------+-----------+
| VersionId | ProductId |
+-----------+-----------+
| 1 | 'prodA' |
| 1 | 'prodB' |
| 1 | 'prodC' |
| 2 | 'prodA' |
+-----------+-----------+
And to store the survey result data, I have to repeat a lot of information across rows:
User Answer table
+----------+------------+----------+-----------+-----------+--------+-----------+
| AnswerId | QuestionId | PeriodId | UserId/Ip | ProductId | Answer | VersionId |
+----------+------------+----------+-----------+-----------+--------+-----------+
| 1 | 1 | 1 | 'adam' | 'prodA' | 'Yes' | 2 |
| 2 | 2 | 1 | 'Joe' | 'prodA' | 'Yes' | 2 |
| 3 | 1 | 2 | 'adam' | 'prodB' | 'A' | 3 |
+----------+------------+----------+-----------+-----------+--------+-----------+
We're expecting tens of products and thousands of users for this system.
So, assuming 30 products, 5000 users, 50 questions per survey and 4 surveys per year,
in the current design there will be 5000 * 4 * 50 * 30 = 30 million records added to the User Answer table per year.
I'm really not sure whether it would still work properly, so are there any suggestions for optimizing it?
Edit 1:
Add VersionId column in user answer table as suggested.
This looks like a case of premature optimization. You should probably worry more about correctness and flexibility than performance.
30 million rows per year, especially in these skinny tables, is a small amount of data for any Oracle system. Don't worry too much about indexes and partitioning yet, those can be added later if necessary.
Your solution is similar to the Entity Attribute Value (EAV) model. It's worth knowing that term since much has been written about it. There are 2 common problems with EAV models you want to avoid:
Avoid extremes. Don't use EAV for everything, but don't completely avoid it either. EAV is slow and inconvenient compared to a normal table structure. It should not be used for every interesting column, otherwise you have created a database within a database. For example, if virtually every survey has fields like a username and a date created, store those as regular columns and not in a generic column. It's OK to have a column that is only populated 99% of the time. On the other hand, it's a bad idea to always avoid the EAV and try to hack something together with 1,000 column tables or object-relational types.
Always use the correct type. Always, always, always store data as the correct type. Store numbers as numbers, dates as dates, and strings as strings. Your queries will be easier, faster, and safer, if you have at least three columns for the data: ANSWER_NUMBER, ANSWER_STRING, ANSWER_DATE. I explain the type safety problem more in this answer. Those extra columns may look bad in the model diagram, but they are a life-saver when you're querying the data.
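To make the three-column idea concrete, here is a rough sketch. It uses SQLite through Python's sqlite3 module purely as a stand-in, since the question targets Oracle, where the columns would more likely be NUMBER, VARCHAR2 and DATE. The table name user_answer, the CHECK constraint that exactly one typed column is filled, and the sample value 42 are all assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_answer (
        answer_id     INTEGER PRIMARY KEY,
        question_id   INTEGER NOT NULL,
        answer_number REAL,   -- numeric answers
        answer_string TEXT,   -- free-text or choice answers
        answer_date   TEXT,   -- ISO-8601 date (SQLite has no DATE type)
        -- Assumption: each answer fills exactly one of the typed columns.
        CHECK ((answer_number IS NOT NULL)
             + (answer_string IS NOT NULL)
             + (answer_date   IS NOT NULL) = 1)
    )
""")

conn.execute("INSERT INTO user_answer (question_id, answer_string) VALUES (1, 'Yes')")
conn.execute("INSERT INTO user_answer (question_id, answer_number) VALUES (2, 42)")

# An answer stored in two columns at once violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO user_answer (question_id, answer_number, answer_string) "
        "VALUES (3, 7, 'seven')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Numeric answers can be aggregated directly, with no string-to-number casts.
numeric_total = conn.execute(
    "SELECT SUM(answer_number) FROM user_answer"
).fetchone()[0]
print(rejected, numeric_total)
```

The point of the layout is that a query over answer_number never has to parse strings, so a stray text answer cannot break a numeric report.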

How to make a query for getting the specific rows with the latest time column value

Below is my sample data, I would like to get the host:value pair with the latest time.
+------+-------+-------+
| HOST | VALUE | TIME |
+------+-------+-------+
| A | 100 | 13:40 |
| A | 150 | 13:00 |
| A | 222 | 13:23 |
| B | 210 | 13:55 |
| B | 300 | 13:44 |
+------+-------+-------+
I want only the row with the latest time value for each host.
The result should be like:
A 100 13:40
B 210 13:55
I think there are several analytic functions that can do this in Oracle, but I'm not sure what I can do in SQLite.
Can you let me know how I can make a query?
Here is an ANSI-compliant way of performing your query which should run on all versions of SQLite. For a potentially shorter solution see the answer by @CL.
-- my_table is a placeholder; the question does not name its table
SELECT t1.HOST || '-' || t1.VALUE || '-' || t1.TIME AS HOSTVALUETIME
FROM my_table t1 INNER JOIN
(
SELECT HOST, MAX(TIME) AS MAXTIME
FROM my_table
GROUP BY HOST
) t2
ON t1.HOST = t2.HOST AND t1.TIME = t2.MAXTIME
ORDER BY t1.HOST
Output:
+---------------+
| HOSTVALUETIME |
+---------------+
| A-100-13:40 |
| B-210-13:55 |
+---------------+
In SQLite 3.7.11 or later, MAX() selects from which row in a group the other column values come:
SELECT Host,
Value,
MAX(Time)
FROM TheNameOfThisTableIsSoSecretThatICantTellYou
GROUP BY Host;
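Both approaches can be compared side by side on the sample data. The sketch below uses Python's sqlite3 module; the table name readings is hypothetical because the question never names its table, and the times are stored as zero-padded HH:MM text, so they sort correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (HOST TEXT, VALUE INTEGER, TIME TEXT)")  # hypothetical name
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("A", 100, "13:40"),
    ("A", 150, "13:00"),
    ("A", 222, "13:23"),
    ("B", 210, "13:55"),
    ("B", 300, "13:44"),
])

# Portable form: join each row to the per-host maximum time.
join_rows = conn.execute("""
    SELECT t1.HOST, t1.VALUE, t1.TIME
    FROM readings t1
    JOIN (SELECT HOST, MAX(TIME) AS MAXTIME
          FROM readings
          GROUP BY HOST) t2
      ON t1.HOST = t2.HOST AND t1.TIME = t2.MAXTIME
    ORDER BY t1.HOST
""").fetchall()

# SQLite-specific form (3.7.11+): with a single MAX(), the bare VALUE
# column is taken from the row that supplied the maximum.
bare_rows = conn.execute("""
    SELECT HOST, VALUE, MAX(TIME)
    FROM readings
    GROUP BY HOST
    ORDER BY HOST
""").fetchall()

print(join_rows)
print(bare_rows)
```

The two differ when a host has several rows sharing the latest time: the join returns all of them, while the GROUP BY form returns one row per host.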
