We have two options to handle null values coming from the database:
ISNull
Coalesce
The queries for the two functions above can be written as follows:
Select IsNull(Columnname, '') As validColumnValue From TableName
Select Coalesce(Columnname, '') As validColumnValue From TableName
Query - Which should be preferred in which situation, and why?
This has been hashed and re-hashed. In addition to the tip I pointed out in the comment and the links and explanation @xQbert posted above, by request here is an explanation of COALESCE vs. ISNULL using a subquery. Let's consider these two queries, which are identical in terms of results:
SELECT COALESCE((SELECT TOP (1) name FROM sys.objects), N'foo');
SELECT ISNULL((SELECT TOP (1) name FROM sys.objects), N'foo');
(Please direct comments about using TOP without ORDER BY to /dev/null, thanks.)
In the COALESCE case, the logic actually gets expanded to something like this:
SELECT CASE WHEN (SELECT TOP (1) ...) IS NOT NULL
            THEN (SELECT TOP (1) ...)
            ELSE N'foo'
       END;
With ISNULL, this does not happen. There is an internal optimization that seems to ensure that the subquery is only evaluated once. I don't know if anyone outside of Microsoft is privy to exactly how this optimization works, but you can see this if you compare the plans. Here is the plan for the COALESCE version:
And here is the plan for the ISNULL version - notice how much simpler it is (and that the scan only happens once):
In the COALESCE case the scan happens twice, meaning the subquery is evaluated twice, even if it doesn't yield any results. If you add a WHERE clause such that the subquery yields 0 rows, you'll see a similar disparity - the plan shapes might change, but you'll still see a double seek+lookup or scan for the COALESCE case. Here is a slightly different example:
SELECT COALESCE((SELECT TOP (1) name FROM sys.objects
WHERE name = N'no way this exists'), N'foo');
SELECT ISNULL((SELECT TOP (1) name FROM sys.objects
WHERE name = N'no way this exists'), N'foo');
The plan for the COALESCE version this time - again you can see the whole branch that represents the subquery repeated verbatim:
And again a much simpler plan, doing roughly half the work, using ISNULL:
You can also see this question over on dba.se for some more discussion:
Performance difference for COALESCE versus ISNULL?
My suggestion is this (and you can see my reasons why in the tip and the above question): trust but verify. I always use COALESCE (because it is ANSI standard, supports more than two arguments, and doesn't do quite as wonky things with data type precedence) unless I know I am using a subquery as one of the expressions (which I don't recall ever doing outside of theoretical work like this) or I am experiencing a real performance issue and just want to compare to see if COALESCE vs. ISNULL has any substantial performance difference (which outside of the subquery case, I have yet to find). Since I am almost always using COALESCE with arguments of like data types, I rarely have to do any testing other than looking back at what I've said about it in the past (I was also the author of the aspfaq article that xQbert pointed out, 7 years ago).
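To illustrate the data type point (a minimal sketch, assuming SQL Server; the variable is made up): ISNULL takes its return type from the first argument, while COALESCE follows data type precedence across all arguments:
DECLARE @c CHAR(3) = NULL;
SELECT ISNULL(@c, 'longer string');   -- 'lon': the replacement is truncated to CHAR(3), the type of the first argument
SELECT COALESCE(@c, 'longer string'); -- 'longer string': varchar wins by precedence, so nothing is truncated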
begin humor: The 1st; the 2nd will never work as spelled in the question ("Coleasce") :D END humor
---Cleaned up response---
Features of isNull(value1,value2)
Supports only one fallback: if the first value is null, the second is used, so if that is null too, you get null back!
Is non-ANSI standard, meaning if database portability is an issue, don't use this one.
isnull(value1, value2) returns the data type of value1,
and will therefore fail when value2 can't be implicitly converted to value1's data type.
Features of Coalesce(Value1, Value2, value3, value...)
Supports multiple values: it returns the first non-null value from the list; if all values in the list are null, null is returned.
Is ANSI standard, meaning database portability shouldn't be an issue.
Returns the data type determined by precedence across the arguments, and fails if the arguments can't all be implicitly converted to that type (see the example below).
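For instance (a minimal sketch, assuming SQL Server), mixing incompatible types fails with COALESCE because every argument must convert to the highest-precedence type, while ISNULL sticks with the first argument's type:
SELECT COALESCE('abc', 1);  -- error: int outranks varchar, so 'abc' must convert to int and fails
SELECT ISNULL('abc', 1);    -- returns 'abc': the return type is taken from the first argument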
So to answer the question directly: it depends on the situation. If you need to develop SQL that
is DB independent: coalesce is the more correct choice.
allows for multiple evaluations: coalesce is the more correct choice (of course you could just nest isnull over and over and over, as in the sketch after this list), but put that under a performance microscope and coalesce may just win (I've not tested it).
Are you using a DB engine that supports isNull? If not, use coalesce.
How do you want type casting handled: implicitly or not?
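A quick sketch of what the nesting looks like (column and table names are just placeholders, following the question's style):
Select Coalesce(Column1, Column2, Column3, '') As validColumnValue From TableName
Select IsNull(Column1, IsNull(Column2, IsNull(Column3, ''))) As validColumnValue From TableName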
---ORIGINAL------
isnull only supports 2 evaluations
coalesce supports many more... coalesce (columnName1, ColumnName2, ColumnName3, '')
coalesce determines its return datatype the way a CASE expression does, whereas isnull returns the datatype of the first item in the list (which I found interesting!)
As to when to use which: you'd have to investigate by looking at the execution plan of both on SQL 2008 and 2005; different versions, different engines, different ways to execute.
Furthermore, coalesce is ANSI standard and isnull is engine specific, so if you want greater portability between database engines, use coalesce.
More info here aspfaq
or here msdn blog
You may take this into consideration.
1. The ISNULL function requires two parameters: the value to check and the replacement for null values.
2. The COALESCE function works a bit differently: it takes any number of parameters and returns the first non-NULL value. I prefer COALESCE over ISNULL because it meets ANSI standards, while ISNULL does not.
I hope you found the answer to your question.
I found the json_insert function in the JSON section of the SQLite documentation, but it doesn't seem to work the way I expected.
e.g. select json_insert('[3,2,1]', '$[3]', 4) as result;
The result column returns '[3,2,1,4]', which is correct.
But for select json_insert('[3,2,1]', '$[1]', 4) as result;
I am expecting something like '[3,2,4,1]' to be returned, instead of '[3,2,1]'.
Am I missing something? I don't see an alternative to json_insert.
P.S. I am playing it on https://sqlime.org/#demo.db, the SQLite version is 3.37.2.
The documentation states that json_insert() will not overwrite values ("Overwrite if already exists? - No"). That means you can't insert elements in the middle of the array.
My interpretation: The function is primarily meant to insert keys into an object, where this kind of behavior makes more sense - not changing the length of an array is a sacrifice for consistency.
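You can see that rule in action below (a quick illustration; note that json_set and json_replace overwrite the element in place rather than shifting the array, so they don't give you an insert-in-the-middle either):
select json_insert('[3,2,1]', '$[1]', 4);   -- '[3,2,1]': the path already exists, so nothing happens
select json_replace('[3,2,1]', '$[1]', 4);  -- '[3,4,1]': the existing element is overwritten
select json_set('[3,2,1]', '$[1]', 4);      -- '[3,4,1]': overwrite or append, never shift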
You could shoehorn it into SQLite by turning the JSON array into a table, appending your element, sorting the result, and turning it all back into a JSON array:
select json_group_array(x.value) from (
select key, value from json_each('[3,2,1]')
union
select 1.5, 4 -- 1.5 = after 1, before 2
order by 1
) x
This will produce '[3,2,4,1]'.
But you can probably see that this won't scale, and even if there was a built-in function that did this for you, it wouldn't scale, either. String manipulation is slow. It might work well enough for one-offs, or when done infrequently.
In the long run, I would recommend properly normalizing your database structure instead of storing "non-blob" data in JSON blobs. Manipulating normalized data is much easier than manipulating JSON, not to mention faster by probably orders of magnitude.
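For example, a minimal sketch of what that could look like (the table and column names are made up): with one row per element and a sortable position column, inserting in the middle is a plain INSERT rather than string surgery.
create table list_items (
  list_id  integer not null,
  position real    not null,   -- a gap-friendly ordering key
  value    integer not null,
  primary key (list_id, position)
);
-- elements 3, 2, 1 stored at positions 1, 2, 3; "insert" 4 between 2 and 1:
insert into list_items values (1, 2.5, 4);
select value from list_items where list_id = 1 order by position;  -- 3, 2, 4, 1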
I am a beginner with Progress 4GL. I am confused by the following logic, especially how the index actually works.
I have added 2 fields to one index. As you can see below, I have written three queries:
Query 1 uses the index and filters on both indexed fields to retrieve the data.
Query 2 uses the same index but filters on only one field.
Query 3 uses the same indexed fields plus one non-indexed field.
define temp-table tt_creldata no-undo
field tt_cscx_order as character
field tt_cscx_part as character
field tt_cscx_shipfrom as character
index tt_cscx
tt_cscx_order
tt_cscx_part
.
**Query 1:**
find first tt_creldata use-index tt_cscx
where tt_cscx_order = "153"
and tt_cscx_part = "113" no-lock no-error.
**Query 2:**
find first tt_creldata use-index tt_cscx
where tt_cscx_order = "153" no-lock no-error.
**Query 3:**
find first tt_creldata use-index tt_cscx
where tt_cscx_order = "153"
and tt_cscx_part = "113"
and tt_cscx_shipfrom = "US" no-lock no-error.
Question 1: Which query helps to improve performance?
Question 2: What happens if I don't use one of the indexed fields when I specify use-index?
Question 3: What happens if I add a non-indexed field when I specify use-index?
As a general rule of thumb, you should never use use-index.
The AVM will select one or more indexes to use for a query at compile time, and by forcing it to use one of your choosing, you are removing the possibility of this.
Having extra, possibly non-indexed, fields in your where clause will only affect the indexes chosen if you let the AVM choose (i.e. don't use use-index). This is also true if you don't use indexed fields in your query.
You can see which indexes are used if you compile the program with the xref or xml-xref options and look for the SEARCH items.
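For example (a minimal sketch; the program and file names are placeholders):
/* compile with an XREF listing, then look for the SEARCH lines to see which index was chosen */
COMPILE myprog.p XREF myprog.xref.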
As nwahmaet says, you should never use USE-INDEX. In this case it is especially pointless because there is only one index. In cases where there are multiple indexes a FIND statement will only use one of them no matter how complex the WHERE clause but the compiler will almost always do a better job picking an efficient index than you will. (The FOR EACH statement and its associated dynamic queries are capable of using multiple indexes. FIND is always limited to just one index.) In those rare cases where you think you are doing a better job you should thoroughly document why your choice is better and include detailed test cases and results.
All of your queries are using FIRST. This is necessary because your index is not defined as unique. That may be your intent but it seems unusual. And it means that in the event of duplicate records with the same key values you are magically making the "first" record more special than the others. Which is a data normalization faux pas (you are making "firstness" an attribute of the data) and a bug waiting to happen.
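If order + part really does identify a row, a unique index makes that explicit and removes the need for FIRST; here is a sketch of the same temp-table reworked under that assumption:
define temp-table tt_creldata no-undo
    field tt_cscx_order    as character
    field tt_cscx_part     as character
    field tt_cscx_shipfrom as character
    index tt_cscx is unique primary
        tt_cscx_order
        tt_cscx_part
    .

/* no FIRST needed: the unique index guarantees at most one match */
find tt_creldata
    where tt_cscx_order = "153"
      and tt_cscx_part  = "113"
    no-lock no-error.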
FIND FIRST and USE-INDEX are often used together to (try to) cover up for each other's deficiencies. By specifying a particular index the FIRST becomes more consistent. Likewise, FIRST is often used to "cure" performance issues that arise from insufficient index definitions, inadequate WHERE clauses or choosing FIND when FOR EACH would have been more appropriate.
None of these queries are going to perform notably faster than the others.
Query 2 may, or may not, return the same record as query 1. For instance, if there is a part = "112" then query 2 will have a different "first" record. But it will be just as fast to return as query 1.
Likewise, query 3 may have a different result depending on which records contain shipfrom = "US". In the best case, where the very first record with order = "153" and part = "113" also satisfies shipfrom = "US", it will be the same speed as the others.
However, query 3 might be a lot slower depending on how many records have to be scanned before one is found that has shipfrom = "US" since that field is not a part of any index and matching it will, therefore, require scanning records until one is found which matches. That might be the first record or it might be the 10 zillionth.
I'm trying to resolve the issue below:
I need to prepare a table that consists of 3 columns:
user_id,
month
value.
Each of the over 200 users has different values for the parameters that determine the expected value, namely: LOB, CHANNEL, SUBSIDIARY. So I decided to store them in the table ASYSTENT_GOALS_SET. But I wanted to avoid multiplying rows and thought it would be nice to put all the conditions into a piece of code that I could use in the WHERE clause further on in the procedure.
So, as an example - instead of multiple rows:
I created such entry:
So far I have created a test table, ASYSTENT_TEST (where I collect month and value for a given user), and written a piece of a procedure using BULK COLLECT.
declare
type test_row is record
(
month NUMBER,
value NUMBER
);
type test_tab is table of test_row;
BULK_COLLECTOR test_tab;
p_lob varchar2(10) :='GOSP';
p_sub varchar2(14);
p_ch varchar2(10) :='BR';
begin
select subsidiary into p_sub from ASYSTENT_GOALS_SET where user_id='40001001';
execute immediate 'select mc, sum(ppln_wartosc) plan from prod_nonlife.mis_report_plans
where report_id = (select to_number(value) from prod_nonlife.view_parameters where view_name=''MIS'' and parameter_name=''MAX_REPORT_ID'')
and year=2017
and month between 7 and 9
and ppln_jsta_symbol in (:subsidiary)
and dcs_group in (:lob)
and kanal in (:channel)
group by month order by month' bulk collect into BULK_COLLECTOR
using p_sub,p_lob,p_ch;
forall x in BULK_COLLECTOR.first..BULK_COLLECTOR.last insert into ASYSTENT_TEST values BULK_COLLECTOR(x);
end;
Now, when the SUBSIDIARY column (varchar) in table ASYSTENT_GOALS_SET contains the string 12_00_00 (which is the code of one subsidiary), everything works fine. But the problem is when a user works in two subsidiaries, let's say 12_00_00 and 13_00_00. I have no clue how to write that down. Should the SUBSIDIARY column contain:
'12_00_00','13_00_00'
or
"12_00_00","13_00_00"
or maybe
12_00_00','13_00_00
I have tried a lot of options after digging through topics like "Dealing with single/escaped/double quotes".
Maybe I should change something in the EXECUTE IMMEDIATE as well?
Or maybe my approach to that issue is completely wrong from the very beginning (hopefully not :) ).
I would be grateful for support.
I didn't create the table function described there, but that article inspired me to go back and try the regexp_substr function again.
I changed ppln_jsta_symbol in (:subsidiary) to:
ppln_jsta_symbol in (select regexp_substr((select subsidiary from ASYSTENT_GOALS_SET where user_id=''fake_num''), ''[^,]+'', 1, level) from dual
connect by regexp_substr((select subsidiary from ASYSTENT_GOALS_SET where user_id=''fake_num''), ''[^,]+'', 1, level) is not null)
Now it works like a charm! Thank you @Dessma very much for your time and suggestion!
"I wanted to avoid multiplying rows and thought it would be nice to put all conditions as a part of the code that I would use in 'where' clause further in procedure"
This seems a misguided requirement. You shouldn't worry about the number of rows: databases are optimized for storing and retrieving rows.
What they are not good at is dealing with "multi-value" columns. As your own solution proves, it is not nice; it is very far from nice, in fact it is a total pain in the neck. From now on, every time anybody needs to work with subsidiary they will have to invoke a function. Adding, changing or removing a user's subsidiary is much harder than it ought to be. Also, there is no chance of enforcing data integrity, i.e. validating a subsidiary against a reference table.
Maybe none of this matters to you. But there are very good reasons why Codd mandated "no repeating groups" as a criterion of First Normal Form, the foundation step of building a sound data model.
The correct solution, industry best practice for almost forty years, would be to recognise that SUBSIDIARY exists at a different granularity to CHANNEL and so should be stored in a separate table.
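A minimal sketch of that design (the new table name and column sizes are assumptions): one row per user/subsidiary pair, which the dynamic query can then use in an IN subquery with a single bind instead of a delimited string:
create table asystent_user_subsidiary (
  user_id    varchar2(20) not null,
  subsidiary varchar2(14) not null,
  constraint pk_asystent_user_sub primary key (user_id, subsidiary)
);
-- inside the dynamic SQL, instead of "ppln_jsta_symbol in (:subsidiary)":
-- ppln_jsta_symbol in (select subsidiary from asystent_user_subsidiary where user_id = :user_id)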
I want to use a rowset variable as a scalar variable.
@cnt = SELECT COUNT(*) FROM @tab1;
IF (@cnt > 0) THEN
    @cnt1 = SELECT * FROM @tab2;
END;
Is it possible?
======================================
I want to gate the complex U-SQL code on some condition, let's say based on some control table. In my original code, I wrote 10-15 U-SQL statements and I want to wrap them within the IF statement. I don't want to do a cross join because it again starts trying to join the tables; if I use a cross join, there is no significant saving in execution time. The point of the IF statement is that if the condition is not met, the complete piece of code should not execute. Is it possible?
To add to wBob's and Alex's answers:
U-SQL does not provide data driven control flow within a script. The current IF statement requires the expression to be evaluated at compile time.
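For example (a rough sketch; the @doRun parameter is made up, @tab2 is as in the question), IF works when its condition is a compile-time value such as a declared parameter, not the result of a query:
DECLARE @doRun bool = true;   // known at compile time, e.g. set by the submitting job

IF @doRun == true THEN
    @cnt1 = SELECT * FROM @tab2;
END;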
Consider a U-SQL script as just a single declarative query. So you have the following options:
Express your problem with relational expressions. This means that you will have to write a (cross) join to guard the execution. If you feel that the query optimizer does a bad job at optimizing such guards (e.g., it evaluates the expensive side of the join before the cheap guard), please report an issue and we will take a look.
Split your script into several scripts and look at the result of each script before doing your next step. This is a form of orchestration that you can do with ADF or writing your own orchestration with Powershell or any of the SDKs. The caveat here is that you will have to write intermediate results into files and download the files into your orchestration layer.
Having said this, it theoretically is possible to extend the language algebra with a "don't execute the remaining part of this operator tree if a condition is not satisfied" operator. However, that is a major work item and can lead to very large query plans during compilation that may go beyond the current limits. If you feel that neither 1 nor 2 above are sufficient to help with your scenario, please add your vote to https://feedback.azure.com/forums/327234-data-lake/suggestions/17635906-please-add-dynamic-if-evaluation-to-u-sql.
@cnt1 =
SELECT @tab2.*
FROM @tab2
CROSS JOIN (SELECT COUNT(*) AS cnt FROM @tab1) AS c
WHERE c.cnt > 0;
(Adding explanation) CROSS JOIN returns the Cartesian product of all rows from @tab2 and the single row generated by the COUNT query. The WHERE condition then ensures the result of the query is all rows from @tab2 if COUNT(*) > 0, and no rows otherwise.
I have two tables defined for actual and expected with exactly the same schema. I insert two rows into the expected table with say Ids of 2, 1.
I run
INSERT INTO actual EXEC tSQLt.ResultSetFilter 1, '{statement}'
to populate the actual then
EXEC tSQLt.AssertEqualsTable @expected = 'expected', @actual = 'actual'
to compare the results.
Even though the data is in a different order (Ids are 1, 2 in the actual), the test passes.
I confirmed that the data was different by adding SELECT * FROM actual and SELECT * FROM expected in the test and running the test on its own with tSQLt.Run '{test name}'.
Does anyone know if this is a known bug? Apparently it is supposed to check per row, so the ordering should be checked. All the other columns returned are NULL; it is just the Id column that contains a value.
Unless an order by clause is specified in the select statement, the order isn't guaranteed by SQL server (see the top bullet point at this MSDN page) - although in practice it is often ordered as you might expect.
Because of this, I believe that tSQLt looking for non-identical and identical rows makes sense - but checking the order doesn't - otherwise the answer could change at the whim of SQL Server and the test would be meaningless (and worse - intermittently failing!). The tSQLt user guide on AssertEqualsTable states that it checks the content of the table, but not that it checks the ordering therein. What leads you to conclude that the order should be checked as well? I couldn't find mention of it.
If you need the order to be checked, you could insert both expected and actual results into a temporary table with an identity column (or use ROW_NUMBER) and check the resultant table - if the order is different then the identity cols would be different.
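A rough sketch of that approach (assuming a tSQLt version that accepts temp-table names in AssertEqualsTable, and assuming the statement returns a single Id column - extend the column lists to match your schema):
-- expected rows, numbered in the order you expect them
CREATE TABLE #expected_ordered (seq INT IDENTITY(1,1), Id INT);
INSERT INTO #expected_ordered (Id) VALUES (2), (1);

-- actual rows, numbered in the order the statement returned them
CREATE TABLE #actual_ordered (seq INT IDENTITY(1,1), Id INT);
INSERT INTO #actual_ordered (Id) EXEC tSQLt.ResultSetFilter 1, '{statement}';

-- if the row order differs, the seq values won't line up and the assert fails
EXEC tSQLt.AssertEqualsTable @Expected = '#expected_ordered', @Actual = '#actual_ordered';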
There is a similar method documented here on Greg M Lucas' blog.
Relying on the order returned from the table without an order by clause is not recommended (MSDN link) - so I'd suggest including one in your application's call to the statement (or, if it's a stored procedure, within it) if the order of returned rows is important.