De-duping a table joined to itself - SQLite

I have the following table:
| ID | ID2 |
| ----- | ----- |
| 1234 | 56473 |
| 56473 | 1234 |
| 34521 | 56473 |
| 35462 | 23457 |
| 23457 | 35462 |
| 56473 | 34521 |
As you can see, these IDs are linked together via a previous join based on different fields; each combination of IDs repeats throughout the table, just in a different order.
Desired output:
| ID | ID2 |
| ----- | ----- |
| 1234 | 56473 |
| 34521 | 56473 |
| 35462 | 23457 |

You can use SQLite's scalar, multi-argument MIN() and MAX() functions to put each pair into a canonical order, then deduplicate with DISTINCT:
select distinct
  min(ID, ID2) ID,
  max(ID, ID2) ID2
from tablename
Results:
| ID | ID2 |
| ----- | ----- |
| 1234 | 56473 |
| 34521 | 56473 |
| 23457 | 35462 |
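
If you also want to remove the mirrored duplicates from the table itself, rather than just select the canonical pairs, here is a minimal sketch (assuming the table is named tablename, as above): delete each row whose IDs are in descending order whenever its mirror image exists:
DELETE FROM tablename
WHERE ID > ID2
  AND EXISTS (
    -- the same pair stored in the opposite order
    SELECT 1 FROM tablename t2
    WHERE t2.ID = tablename.ID2
      AND t2.ID2 = tablename.ID
  );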

How to match two columns in one dataframe using values in another dataframe in R

I have two dataframes. One is a set of ≈4000 entries that looks similar to this:
| grade_col1 | grade_col2 |
| --- | --- |
| A- | A- |
| B | 86 |
| C+ | C+ |
| B- | D |
| A | A |
| C- | 72 |
| F | 96 |
| B+ | B+ |
| B | B |
| A- | A- |
The other is a set of ≈700 entries that looks similar to this:
| grade | scale |
| --- | --- |
| A+ | 100 |
| A+ | 99 |
| A+ | 98 |
| A+ | 97 |
| A | 96 |
| A | 95 |
| A | 94 |
| A | 93 |
| A- | 92 |
| A- | 91 |
| A- | 90 |
| B+ | 89 |
| B+ | 88 |
...and so on.
What I'm trying to do is create a new column that shows whether grade_col2 matches grade_col1, with a binary 0-1 output (0 = no match, 1 = match). Most of grade_col2 is recorded as a letter grade, but every once in a while an entry in grade_col2 was accidentally entered as a numeric grade instead. I want this match column to give me a "1" even when grade_col2 is a numeric grade rather than a letter grade. In other words, if grade_col1 is B and grade_col2 is 86, I want this to still be read as a match. Only when grade_col1 is F and grade_col2 is 96 would this not be a match (similar to when grade_col1 is B- and grade_col2 is D: not a match).
The second data frame gives me the information I need to translate between one and the other (entries between 97-100 are A+, between 93-96 are A, and so on). I just don't know how to run a script that uses this information to find matches through all ≈4000 entries. Theoretically, I could do this manually, but the real dataset is so lengthy that this isn't realistic.
I had been thinking of using nested if_else statements with dplyr, but once I got past the first "if" statement I got stuck. I'd appreciate any help people can offer.
You can do this using a join.
Let your first dataframe be grades_df and your second dataframe be lookup_df, then you want something like the following:
output = grades_df %>%
  # join on the lookup, keeping every row of the grades table
  left_join(lookup_df, by = c(grade_col2 = "scale")) %>%
  # combine grade_col2 from grades_df and grade from lookup_df
  mutate(grade_col2b = ifelse(is.na(grade), grade_col2, grade)) %>%
  # indicator column: 1 when the (translated) grades match
  mutate(indicator = ifelse(grade_col1 == grade_col2b, 1, 0))
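Here is a toy run of that idea (a sketch: the sample data is made up, and scale is cast to character on the assumption that grade_col2 stores its mixed letter/numeric grades as text, since the join keys must share a type):
library(dplyr)

# toy versions of the two data frames from the question
grades_df <- data.frame(
  grade_col1 = c("B", "F", "A-"),
  grade_col2 = c("86", "96", "A-")  # mixed letter and numeric grades, as text
)
lookup_df <- data.frame(
  grade = c("A", "A-", "B", "F"),
  scale = as.character(c(96, 92, 86, 50))  # cast to match grade_col2's type
)

output <- grades_df %>%
  left_join(lookup_df, by = c(grade_col2 = "scale")) %>%
  mutate(grade_col2b = ifelse(is.na(grade), grade_col2, grade)) %>%
  mutate(indicator = ifelse(grade_col1 == grade_col2b, 1, 0))
# indicator: B/86 -> 1, F/96 -> 0, A-/A- -> 1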

How to separate the unique values from a column in Kusto and make new rows for them?

I have a table in Kusto. It looks like this:-
------------------
| Tokens | Shop |
------------------
| a | P |
| a,b | Q |
| c,d,e | P |
| c | R |
| c,d | Q |
------------------
There are 12 distinct tokens in total, and the Tokens column can contain any combination of them (including none); the Shop column always holds exactly one of 5 possible values (it can't be empty).
I want to get an output table with three columns, like this:
----------------------------------
| Distinct Tokens | Shop | Count |
----------------------------------
| a | P | 12 |
| b | P | 13 |
| c | R | 16 |
| d | Q | 2 |
----------------------------------
In short, I want all distinct tokens in one column, each token mapped to each of the 5 available shops, and a count indicating the number of rows in the original table where that token appears with that shop.
Note: the count of 'a' with shop 'P' in the new table should include every row of the original table that has 'a' anywhere in its comma-separated values.
I am unable to write a Kusto query for this. Please help.
Here is one approach:
datatable(Tokens: dynamic, Shop: string) [
    dynamic(["a"]), "P",
    dynamic(["a", "b"]), "Q",
    dynamic(["a", "d", "e"]), "P",
    dynamic(["c"]), "R",
    dynamic(["a", "b", "c", "d"]), "Q"
]
| mv-expand Token = Tokens to typeof(string)
| summarize count() by Token, Shop
| order by Token asc
Here is the output (count_ is the default name of the count() aggregate):
| Token | Shop | count_ |
| --- | --- | --- |
| a | P | 2 |
| a | Q | 2 |
| b | Q | 2 |
| c | Q | 1 |
| c | R | 1 |
| d | P | 1 |
| d | Q | 1 |
| e | P | 1 |
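If your Tokens column is stored as a comma-separated string (as in the question) rather than a dynamic array, split it first; a sketch, assuming your source table is named T:
T
| mv-expand Token = split(Tokens, ",") to typeof(string)
| summarize count() by Token, Shop
| order by Token asc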

Split data in SQLite column

I have a SQLite database that looks similar to this:
---------- ------------ ------------
| Car | | Computer | | Category |
---------- ------------ ------------
| id | | id | | id |
| make | | make | | record |
| model | | price | ------------
| year | | cpu |
---------- | weight |
------------
The record column in my Category table contains a comma-separated list of the table name and id of the items that belong to that Category, so an entry would look like this:
Car_1,Car_2.
I am trying to split the items in the record on the comma to get each value:
Car_1
Car_2
Then I need to take it one step further and split on the _ and return the Car records.
So if I know the Category id, I'm trying to wind up with this in the end:
---------------- ------------------
| Car | | Car |
---------------| -----------------|
| id: 1 | | id: 2 |
| make: Honda | | make: Toyota |
| model: Civic | | model: Corolla |
| year: 2016 | | year: 2013 |
---------------- ------------------
I have had some success on splitting on the comma and getting 2 records back, but I'm stuck on splitting on the _ and making the join to the table in the record.
This is my query so far:
WITH RECURSIVE record(recordhash, data) AS (
  SELECT '', record || ',' FROM Category WHERE id = 1
  UNION ALL
  SELECT
    substr(data, 0, instr(data, ',')),
    substr(data, instr(data, ',') + 1)
  FROM record
  WHERE data != '')
SELECT recordhash
FROM record
WHERE recordhash != ''
This is returning
--------------
| recordhash |
--------------
| Car_1 |
| Car_2 |
--------------
Any help would be greatly appreciated!
If your recursive CTE works as expected, then you can split each recordhash value on the _ delimiter and use the part after the _ as the id of the rows to return from Car:
select * from Car
where id in (
  select substr(recordhash, 5)
  from record
  where recordhash like 'Car%'
)
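Putting the two pieces together into a single runnable statement (a sketch reusing the CTE from the question; substr(recordhash, 5) simply strips the four-character 'Car_' prefix):
WITH RECURSIVE record(recordhash, data) AS (
  SELECT '', record || ',' FROM Category WHERE id = 1
  UNION ALL
  SELECT
    substr(data, 0, instr(data, ',')),
    substr(data, instr(data, ',') + 1)
  FROM record
  WHERE data != '')
SELECT *
FROM Car
WHERE id IN (
  SELECT substr(recordhash, 5)  -- the id part after 'Car_'
  FROM record
  WHERE recordhash LIKE 'Car%'
);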

How can we transpose a table in Spotfire while keeping column names as row names?

My sample data is shown below. However, the original data is very large, so I can't hardcode anything.
+-----+-------+---------+
| IDN | NAME | VALUE |
+-----+-------+---------+
| 121 | test | 1254.25 |
| 152 | testa | 1585.25 |
| 587 | testb | 5878.69 |
+-----+-------+---------+
After the transpose function:
+---------+---------+---------+
| 121 | 152 | 587 |
+---------+---------+---------+
| test | testa | testb |
| 1254.25 | 1585.25 | 5878.69 |
+---------+---------+---------+
Expected:
+-------+---------+---------+---------+
| IDN | 121 | 152 | 587 |
+-------+---------+---------+---------+
| NAME | test | testa | testb |
| VALUE | 1254.25 | 1585.25 | 5878.69 |
+-------+---------+---------+---------+
I was using the t() function in Spotfire, but in the resultant data table I am missing the column names as row names. Is there any way to keep them?
You can do this with Unpivot and Pivot transformations (Insert > Transformation):
1. Add the Unpivot with the settings below and hit OK.
2. Add the Pivot with the settings below.
3. Change the column name for column 1 with Edit > Column properties.
Here are the data table settings:
a. Unpivot
- Columns to pass through: IDN
- Columns to transform: NAME, VALUE
- Category column name: Column
- Category column data type: String
- Value column name: Value
- Value column data type: String
- Select 'Include null values'
b. Pivot
- Row identifiers: Column
- Value columns and aggregation methods: Concatenate(Value)
- Column titles: IDN
- Column naming pattern: %M(%V) for %C
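If you would rather transpose in code, for example in an R/TERR data function (the t() the question mentions), here is a minimal sketch, assuming the input table arrives as a data frame named df:
# toy version of the table from the question
df <- data.frame(IDN = c(121, 152, 587),
                 NAME = c("test", "testa", "testb"),
                 VALUE = c(1254.25, 1585.25, 5878.69))

out <- as.data.frame(t(df[-1]))         # transpose everything except IDN
colnames(out) <- df$IDN                 # old IDN values become the column names
out <- cbind(IDN = rownames(out), out)  # old column names become the first column
# note: the values are coerced to character, since each row now mixes types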

SQLite: order by date/integer in joined table

I have two tables
Names
id | name
---------
5 | bill
15 | bob
10 | nancy
Entries
id | name_id | added | description
----------------------------------
2 | 5 | 20140908 | i added this
4 | 5 | 20140910 | added later on
9 | 10 | 20140908 | i also added this
1 | 15 | 20140805 | added early on
6 | 5 | 20141015 | late to the party
I'd like to order Names by each name's numerically lowest added value in the Entries table, displaying one row per name with the columns from both tables, ordered by the added column overall, so the results would be something like:
names.id | names.name | entries.added | entries.description
-----------------------------------------------------------
15 | bob | 20140805 | added early on
5 | bill | 20140908 | i added this
10 | nancy | 20140908 | i also added this
I looked into joins on the first item (e.g. SQL Server: How to Join to first row) but wasn't able to get it to work.
Any tips?
Give this query a try:
SELECT Names.id, Names.name, Entries.added, Entries.description
FROM Names
INNER JOIN Entries
ON Names.id = Entries.name_id
ORDER BY Entries.added
Add DESC if you want the reverse order, i.e. ORDER BY Entries.added DESC.
This should do it:
SELECT n.id, n.name, e.added, e.description
FROM Names n
INNER JOIN (
  -- one row per name: SQLite pairs the bare description column
  -- with the row that holds MIN(added)
  SELECT name_id, description, MIN(added) AS added
  FROM Entries
  GROUP BY name_id
) e ON n.id = e.name_id
ORDER BY e.added
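An equivalent sketch using a window function, if your SQLite is 3.25 or newer (row_number() picks each name's earliest entry explicitly, instead of relying on the bare-column behavior of MIN()):
SELECT id, name, added, description
FROM (
  SELECT n.id, n.name, e.added, e.description,
         ROW_NUMBER() OVER (PARTITION BY n.id ORDER BY e.added) AS rn
  FROM Names n
  JOIN Entries e ON n.id = e.name_id
)
WHERE rn = 1
ORDER BY added;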
