SQLite script SELECT from multiple tables

I am trying to bring several tables into a single script. To do this, I realize I will need to use INNER JOIN, and this is confusing me.
"SELECT UserSpecies.commonName, UserSpecies.commonNameFR, UserSpecies.commonNameES, UserSpecies.commonNameDE, UserSpecies.speciesName, UserSpecies.speciesRegion, UserSpecies.speciesDetails, UserSpecies.maxSize, UserSpecies.creditSource, UserSpecies.UserCreated, UserSpecies.genusUC, UserSpecies.familyUC, UserSpecies.orderUC, UserSpecies.groupUC, UserSpecies.subGroupUC, UserSpecies.authority, Genus.name, Family.name, Orders.name, Groups.name, SubGroups.name, Types.name, IUCN.name
FROM UserSpecies
INNER JOIN Genus ON UserSpecies.genusKey = Genus.id
INNER JOIN Family ON UserSpecies.familyKey = Family.id
INNER JOIN Orders ON UserSpecies.orderKey = Orders.id
INNER JOIN Groups ON UserSpecies.groupKey = Groups.id
INNER JOIN SubGroups ON UserSpecies.subGroupKey = SubGroups.id
INNER JOIN Types ON UserSpecies.typeKey = Types.id
INNER JOIN IUCN ON UserSpecies.iucnKey = IUCN.id
WHERE UserSpecies.id = %d"
When I run the above, I do not get any errors, but it simply does not retrieve the data.
Note: All tables and columns are correct.
What am I missing?

It's likely you have missing information in your database.
An INNER JOIN retrieves a row only if there is at least one match in the joined table.
In your case this means that if data is missing in one of the joined tables (Genus, Family, etc.) you won't get any result.
For example, if there is no Family with id = UserSpecies.familyKey in the Family table, you will get no data.
You can try using a LEFT JOIN instead of an INNER JOIN: this returns a result even if there is no match, with the corresponding values set to null. If there is no matching Family, you will get null for Family.name. With this modified query you can easily spot the missing data (if any):
"SELECT UserSpecies.commonName, UserSpecies.commonNameFR, UserSpecies.commonNameES, UserSpecies.commonNameDE, UserSpecies.speciesName, UserSpecies.speciesRegion, UserSpecies.speciesDetails, UserSpecies.maxSize, UserSpecies.creditSource, UserSpecies.UserCreated, UserSpecies.genusUC, UserSpecies.familyUC, UserSpecies.orderUC, UserSpecies.groupUC, UserSpecies.subGroupUC, UserSpecies.authority, Genus.name, Family.name, Orders.name, Groups.name, SubGroups.name, Types.name, IUCN.name
FROM UserSpecies
LEFT JOIN Genus ON UserSpecies.genusKey = Genus.id
LEFT JOIN Family ON UserSpecies.familyKey = Family.id
LEFT JOIN Orders ON UserSpecies.orderKey = Orders.id
LEFT JOIN Groups ON UserSpecies.groupKey = Groups.id
LEFT JOIN SubGroups ON UserSpecies.subGroupKey = SubGroups.id
LEFT JOIN Types ON UserSpecies.typeKey = Types.id
LEFT JOIN IUCN ON UserSpecies.iucnKey = IUCN.id
WHERE UserSpecies.id = %d"
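Once a row comes back with null in one of the joined name columns, you can confirm which reference is broken with a quick per-table check. For example, for Family (a sketch; 42 stands in for whatever species id you are testing):
SELECT us.id, us.familyKey, f.id AS family_id
FROM UserSpecies us
LEFT JOIN Family f ON us.familyKey = f.id
WHERE us.id = 42;
A null family_id means the familyKey value has no matching row in Family.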
If you want more details on the different flavors of the JOIN keyword, have a look here:
What is the difference between "INNER JOIN" and "OUTER JOIN"?
Be careful: SQLite only supports a subset of the join operations (for example, RIGHT and FULL OUTER JOIN were only added in version 3.39).

Related

How to find the cardinality of all columns in Kusto?

I'm trying to find the number of distinct values in all columns for some query. I found that dcount works well, but you have to supply a specific column. I want to do this for all columns, where the column names and the number of columns are dynamic.
You'll have to explicitly include all columns of interest.
Note that any additional column you add to the query will increase its resource utilization, so if you have any knowledge about which columns are likely to be of high cardinality, consider including only those.
FWIW: you can generate the query (for all columns, with the caveat above) dynamically, then invoke the result of this:
let tableName = "my_table";
let datetime_column_name = "my_datetime_column";
let lookback_period = 1h;
let column_names = toscalar(
table(tableName)
| getschema
| summarize make_set(ColumnName)
);
print query = strcat(
tableName,
"\n| where ",
datetime_column_name,
" > ago(timespan(",
lookback_period,
"))\n| summarize dcount(",
strcat_array(column_names, "),\ndcount("),
")")

full_join by date plus one or minus one

I want to use full_join to join two tables. Below is my pseudo code:
join <- full_join(a, b, by = c("a_ID" = "b_ID" , "a_DATE_MONTH" = "b_DATE_MONTH" +1 | "a_DATE_MONTH" = "b_DATE_MONTH" -1 | "a_DATE_MONTH" = "b_DATE_MONTH"))
a_DATE_MONTH and b_DATE_MONTH are in date format "%Y-%m".
I want to do full join based on condition that a_DATE_MONTH can be one month prior to b_DATE_MONTH, OR one month after b_DATE_MONTH, OR exactly equal to b_DATE_MONTH. Thank you!
While SQL allows for (almost) arbitrary conditions in a join statement (such as a_month = b_month + 1 OR a_month + 1 = b_month), I have not found dplyr to allow the same flexibility.
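For reference, here is a sketch of that condition in SQL, reusing the table and column names from the pseudo code above (exact syntax varies by database):
SELECT *
FROM a
FULL OUTER JOIN b
ON a.a_ID = b.b_ID
AND (a.a_DATE_MONTH = b.b_DATE_MONTH
OR a.a_DATE_MONTH = b.b_DATE_MONTH + 1
OR a.a_DATE_MONTH = b.b_DATE_MONTH - 1);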
The only way I have found to join in dplyr on anything other than a_column = b_column is to do a more general join and filter afterwards. Hence I recommend you try something like the following:
join <- full_join(a, b, by = c("a_ID" = "b_ID")) %>%
  filter(is.na(a_DATE_MONTH) | is.na(b_DATE_MONTH) |  # keep unmatched rows
         abs(a_DATE_MONTH - b_DATE_MONTH) <= 1)
This approach still produces the same records in your final results (the is.na checks preserve the unmatched rows a full join keeps, which the filter would otherwise drop).
It may perform worse / slower if R materializes the complete full join before doing any filtering. With a database backend (dbplyr), evaluation is lazy and both steps are collapsed into a single SQL query; with in-memory data frames the join is computed eagerly, so watch memory usage on large tables.

Syntax error in UNION ALL of recursive CTE in PostgreSQL

I have been struggling to fit my logic into a recursive CTE, as it seems to be the best approach to solve my hierarchization problem using SQL. On running the structure below, I am getting
syntax error at or near "DROP"
LINE 7: create temp table allnewreleases AS (SELECT mc.changeset_id,...
I was wondering: can we create and drop temp tables inside the recursive part of a recursive CTE?
create temp table maincompo AS(SELECT * from X)
create temp table allnew AS (SELECT * FROM Y inner join maincompo)
--anchor query starts--
(WITH RECURSIVE mainquery AS ((SELECT * FROM allnew )
--anchor query ends--
UNION ALL
DROP TABLE maincompo
DROP TABLE allnew
create temp table maincompo AS(SELECT * from X inner join mainquery)
create temp table allnew AS (SELECT * FROM Y inner join maincompo)
SELECT * from allnew )
SELECT * from mainquery)
Is something like this possible, keeping in mind that the temp tables within the UNION ALL have to be recreated every time something gets appended to the anchor query by the recursive query?
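For reference, PostgreSQL only accepts plain SELECT terms on either side of the UNION ALL inside a recursive CTE; DDL such as CREATE or DROP TABLE cannot appear there, which is what triggers the syntax error above. A minimal sketch of the legal shape, using hypothetical id/parent_id columns for the hierarchy, looks like this:
WITH RECURSIVE mainquery AS (
    SELECT id, parent_id FROM Y WHERE parent_id IS NULL  -- anchor term
    UNION ALL
    SELECT c.id, c.parent_id
    FROM Y c
    JOIN mainquery p ON c.parent_id = p.id  -- recursive term references the CTE itself
)
SELECT * FROM mainquery;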

Converting a date to a varchar using "like" in PL/SQL

I need to go through a few million rows searching for a year that is sent as a parameter to a method. The year comes as a varchar.
This is the query I'm working with
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND to_char(cre_date, 'YYYY') = year_;
cre_date is of type DATE and year_ is of type VARCHAR.
When performing this query, it takes around 25 minutes to complete.
Does anyone know of a different approach that would execute more quickly?
Please help.
This didn't work out.
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND cre_date LIKE '%2013';
The reason might be that cre_date and '%2013' are of different types: LIKE forces an implicit conversion of the date using the session's default date format, which usually does not end with a four-digit year.
If you have an index on (mch_code, contract, cre_date) columns, you can improve performance by doing something like:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= to_date('01/01/'||year_, 'dd/mm/yyyy')
and cre_date < add_months(to_date('01/01/'||year_, 'dd/mm/yyyy'), 12);
Even better would be to declare the start of the year as a DATE variable prior to running the sql, eg:
v_year_dt := to_date('01/01/'||year_, 'dd/mm/yyyy');
which would make the query:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= v_year_dt
and cre_date < add_months(v_year_dt, 12);
If you don't have an index on those three columns, you could create a function based index on (mch_code, contract, to_char(cre_date, 'yyyy')) that should help speed up your query, depending on the percentage of rows you're expecting to select. It may help even more if you added the x and y columns into the index, so that no table access was required at all.
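As a sketch, the two index options described above would look like this (index names are illustrative):
CREATE INDEX a_idx ON a (mch_code, contract, cre_date);
-- or a function-based index matching the original predicate:
CREATE INDEX a_fbi ON a (mch_code, contract, TO_CHAR(cre_date, 'yyyy'));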
Alternatively, you could think about partitioning the table on cre_date, monthly or yearly.
The reason your query is slow is that you're applying a function to a column on every row in your table. Let's try it another way:
SELECT X,Y
FROM A
WHERE mch_code = 'KN' AND
contract = '15KTN' AND
CRE_DATE >= TO_DATE('01/01/' || year_, 'DD/MM/YYYY') AND
CRE_DATE < TO_DATE('01/01/' || year_, 'DD/MM/YYYY') + INTERVAL '1' YEAR;
This eliminates the need to apply a function against every row in the table, and should allow any indexes on CRE_DATE to be used.
Best of luck.
You can try the EXTRACT function (note that this still applies a function to every row, so it has the same index problem as the original query):
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND EXTRACT(YEAR FROM cre_date) = year_;

How to get count of all columns of a table, which are not null using PL/SQL?

Is there any PL/SQL function that allows passing a table name and returns the count of non-null values in each of its columns?
I have a huge number of columns and don't want to query each and every column individually. I'm new to PL/SQL and highly appreciate your help.
As noted in a comment to the question, one approach to solve this is the following query:
SELECT t.table_name,
t.num_rows,
c.column_name,
c.num_nulls,
t.num_rows - c.num_nulls num_not_nulls,
c.data_type,
c.last_analyzed
FROM all_tab_cols c
JOIN sys.all_all_tables t ON c.table_name = t.table_name
AND c.owner = t.owner -- match the owner too, or same-named tables in other schemas will duplicate rows
WHERE c.table_name LIKE 'EXT%'
AND c.nullable = 'Y'
GROUP BY t.table_name,
t.num_rows,
c.column_name,
c.num_nulls,
c.data_type,
c.last_analyzed
ORDER BY t.table_name,
c.column_name
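Note that num_nulls comes from the optimizer statistics (hence the last_analyzed column in the query above), so the counts are only as accurate as the last statistics gathering. If you need exact counts, a small dynamic-SQL loop is an alternative; a minimal sketch, where MY_TABLE is a placeholder for your table:
DECLARE
  v_sql VARCHAR2(4000);
  v_cnt NUMBER;
BEGIN
  FOR c IN (SELECT column_name
            FROM user_tab_columns
            WHERE table_name = 'MY_TABLE') LOOP
    -- COUNT(col) counts only the rows where col is not null
    v_sql := 'SELECT COUNT(' || c.column_name || ') FROM MY_TABLE';
    EXECUTE IMMEDIATE v_sql INTO v_cnt;
    DBMS_OUTPUT.PUT_LINE(c.column_name || ': ' || v_cnt);
  END LOOP;
END;
/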
