Optimise PL/SQL or migrate to Clojure (a parallelizable language)? - plsql

I have a large table and wish to iterate over its records (> 1,000,000), perform some checks based on another two tables (each also large), and output the result to a text file.
The PL/SQL that does this takes several hours. I could optimize it, or alternatively I could rewrite it as a Clojure program that is parallelizable, since there are only selects and no writes (to the tables).
Questions:
1. What challenges/limits are there in optimizing the PL/SQL?
2. Are there major upsides to migrating the code to Clojure versus optimizing the PL/SQL?
EDIT
Here is the meat of it:
OPEN cur;
LOOP
   FETCH cur INTO l_cur;
   EXIT WHEN cur%NOTFOUND;

   SELECT NVL (SUM ( (total - total_old)), 0),
          NVL (SUM ( (new - old)), 0)
     INTO li_debt, li_debt   -- NOTE: both values land in the same variable; presumably a second variable was intended
     FROM tbl1
    WHERE accounting_date = l_cur.accounting_date
      AND USER_ID = l_cur.USER_ID
      AND USER_ACCOUNT = l_cur.USER_ACCOUNT;

   SELECT NVL (
             SUM (
                DECODE (a.DEBITS,
                        'foo', ABS (amount),
                        ABS (amount) * -1)),
             0)
             amount
     INTO li_dad_bill
     FROM daily_transactions d, ACCOUNTS a
    WHERE d.USER_ID = l_cur.USER_ID
      AND d.USER_ACCOUNT = l_cur.USER_ACCOUNT
      AND d.f_actual >= l_cur.accounting_date
      AND d.acc_code = a.acc_code
      AND d.concept = a.conc
      AND (   d.tarrif = a.tariff
           OR (d.acc_code, d.concept) NOT IN
                  (SELECT UNIQUE acc_code, conc
                     FROM ACCOUNTS
                    WHERE TRIM (tariff) IS NOT NULL));

   SELECT NVL (
             SUM (
                DECODE (a.DEBITS,
                        'foo', ABS (amount),
                        ABS (amount) * -1)),
             0)
             amount
     INTO li_dad_coll
     FROM daily_transactions d, ACCOUNTS a
    WHERE d.USER_ID = l_cur.USER_ID
      AND d.USER_ACCOUNT = l_cur.USER_ACCOUNT
      AND d.f_actual = l_cur.accounting_date
      AND d.acc_code = a.acc_code
      AND d.concept = a.conc
      AND SUBSTR (d.acc_code, 3, 1) <> '1';

   IF ABS ( (li_debt - li_debt) - (li_dad_bill + li_dad_coll)) > 0.9
   THEN
      DBMS_OUTPUT.put_line (
            LPAD (TO_CHAR (l_cur.USER_ID) || ',', 20, ' ')
         || LPAD (TO_CHAR (l_cur.USER_ACCOUNT) || ',', 20, ' '));
   END IF;
END LOOP;
CLOSE cur;

Well it depends on many things.
The main thing obviously would be your degree of competence in optimizing SQL statements vs rewriting the logic in Clojure. I'm not familiar with Clojure, but I would expect that you would need at least a good understanding of SQL in general and Oracle in particular to produce an efficient parallel solution. Running many single-row statements in parallel is not a good strategy performance-wise.
The second thing that comes to mind is that it will depend on the bottleneck. If the bottleneck right now is disk IO, for instance, you won't achieve better performance with parallelization. It would help to know where the program is spending its time (is it the big 1,000,000-row SELECT, the subsequent checks, or even writing to the file?).
As a general rule, you'll be hard-pressed to outperform a well-optimized SQL statement with a do-it-yourself parallel solution. That's because many operations, like joining and sorting, are more efficient in set logic than in row-by-row logic, and because thinking in sets is easier with SQL, in my opinion.
Now I suspect that your program is probably something like this:
FOR cur IN (SELECT * /* ~1,000,000 rows */ FROM view) LOOP
   check(cur.x, cur.y); -- check row by row, lookup to other tables
   IF (condition) THEN
      write_to_file(cur.z);
   END IF;
END LOOP;
If you can easily rewrite most of the conditions with joins in the main cursor, you will probably have a huge performance gain with only light modifications.
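For illustration only, the join-based rewrite could look roughly like the sketch below. The names (driving_view, the threshold, the aggregated columns) are borrowed from or modeled on the snippets above and may not match the real schema; it is only meant to show the shape of moving the per-row lookups into one grouped join.

SELECT c.user_id,
       c.user_account,
       NVL (SUM (t.total - t.total_old), 0) AS li_debt
  FROM driving_view c
  JOIN tbl1 t
    ON t.accounting_date = c.accounting_date
   AND t.user_id         = c.user_id
   AND t.user_account    = c.user_account
 GROUP BY c.user_id, c.user_account
HAVING ABS (NVL (SUM (t.total - t.total_old), 0)) > 0.9;   -- illustrative condition only

One set-based statement like this lets Oracle use hash joins and full scans instead of millions of indexed single-row lookups.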
If you cannot, because the conditions are too heavily dependent upon the content for instance, this might be a good case for parallelization, assuming that each individual statement is already efficient. In that case you could run N jobs with an additional WHERE clause that distributes the work more or less equally among them, then concatenate the results.
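If you do go parallel, a simple way to split the driving cursor into N roughly equal slices is a modulo filter on a numeric key, as in this sketch (job_count and job_no would be passed to each job; it assumes USER_ID is numeric and reasonably evenly distributed):

SELECT *
  FROM view
 WHERE MOD (user_id, :job_count) = :job_no;   -- each job runs with its own :job_no in 0 .. :job_count - 1

Each job then writes its own output file, and you concatenate the files at the end.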

Related

Can anyone explain why this function is not working in MariaDB?

Can anyone help with the code below? I'm new to MariaDB and am struggling to create the function. I'm not even sure if I'm highlighting and executing it right. Whatever I do, I get many errors.
DELIMITER $$

CREATE FUNCTION singerExperience(
    experience DECIMAL(10,2)
)
RETURNS VARCHAR(20)
DETERMINISTIC
BEGIN
    DECLARE singerExperience VARCHAR(20);

    IF hours > 4000 THEN
        SET singerExperience = 'PLATINUM';
    ELSEIF (hours >= 4000 AND
            hours <= 1000) THEN
        SET singerExperience = 'GOLD';
    ELSEIF hours < 1000 THEN
        SET singerExperience = 'SILVER';
    END IF;

    RETURN (singerExperience);
END $$

DELIMITER ;

SELECT singer_id, singerExperience(experience)
FROM experiencelog
ORDER BY singer_id;
If you rename the parameter experience to hours (or the variable references from hours to experience), the names will resolve. Note also that the middle branch's condition is inverted: hours >= 4000 AND hours <= 1000 can never be true; it should be hours >= 1000 AND hours <= 4000.
However, why do you need a function at all if you can handle this within the statement?
SELECT singer_id,
       CASE WHEN experience < 1000 THEN 'SILVER'
            WHEN experience < 4000 THEN 'GOLD'
            ELSE 'PLATINUM'
       END
FROM experiencelog
ORDER BY singer_id;
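For completeness, if you do want to keep the function, a corrected sketch could look like this (the only changes are renaming the parameter to hours and fixing the middle range check; treat it as illustrative rather than tested):

DELIMITER $$

CREATE FUNCTION singerExperience(
    hours DECIMAL(10,2)             -- renamed so the body's references resolve
)
RETURNS VARCHAR(20)
DETERMINISTIC
BEGIN
    DECLARE experience_level VARCHAR(20);

    IF hours > 4000 THEN
        SET experience_level = 'PLATINUM';
    ELSEIF hours >= 1000 THEN       -- covers 1000 .. 4000
        SET experience_level = 'GOLD';
    ELSE
        SET experience_level = 'SILVER';
    END IF;

    RETURN experience_level;
END $$

DELIMITER ;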

Getting ORA-22922 (nonexistent LOB value) or no result at all with wm_concat()

(Using Oracle 11.2)
I have a rather complicated SQL with something like
wm_concat( distinct abc )
that is expected to return some varchar2(4000) compatible result.
It causes ORA-00932: inconsistent datatypes in my select used in some coalesce( some_varchar_col, wm_concat( ... ) ).
So I tried casting it via two different methods:
dbms_lob.substr( ..., 4000 ) -- L) tried even with 3000 in case of "unicode byte blow-up"
cast( ... as varchar2(4000)) -- C) tried even with 3000 in case of "unicode byte blow-up"
(They are used in a view, but playing around with it suggests it is not related to the views.)
Depending on the column and other operators I either get N) no result or O) ORA-22922:
select * from view_with_above_included where rownum <= 100
N) My Eclipse Data Explorer JDBC connection returns without any result (no columns, no results, no "(0 rows affected)", only the query time statistics). (It could be an internal exception not treated as such?)
O)
ORA-22922: nonexistent LOB value
ORA-06512: at "SYS.DBMS_LOB", line 1092
ORA-06512: at line 1
Strangely the following test queries work:
-- rownum <= 100 would already cause the above problems
select * from view_with_above_included where rownum <= 10
or
select * from view_with_above_included
but looking at the actual aggregated data does not reveal any aggregate that would exceed 1,000 characters in length.
Luckily, it works with the listagg( ... ) function available since 11.2 (which we are already running), so we did not have to investigate further:
listagg( abc, ',' ) within group ( order by abc )
(Where wm_concat(...) is, as one should know, some internal and officially unsupported function.)
A rather nice solution (because it is not so bloated) for implementing the distinct functionality is a self-referencing regexp, which should work in many cases:
regexp_replace(
listagg( abc, ',' ) within group ( order by abc )
, '(^|,)(.+)(,\2)+', '\1\2' )
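As a tiny self-contained illustration of the dedup (made-up sample data, Oracle syntax):

-- 'a,a,b' collapses to 'a,b'
SELECT regexp_replace(
         listagg( val, ',' ) within group ( order by val ),
         '(^|,)(.+)(,\2)+', '\1\2' ) AS dedup_list
  FROM (SELECT 'a' AS val FROM dual UNION ALL
        SELECT 'a' FROM dual UNION ALL
        SELECT 'b' FROM dual);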
(Maybe/hopefully we will see a working listagg( distinct abc ) in the future, which would be as neat as the wm_concat syntax. For example, this has long been no problem with Postgres' string_agg( distinct abc )¹.)
-- 1: Postgres SQL example:
select string_agg( distinct x, ',' ) from unnest('{a,b,a}'::text[]) as x
If the list exceeds 4000 characters, one cannot use listagg anymore (ORA-22922 again).
But luckily we can use the xmlagg function here (as mentioned here).
If you want a distinct on a 4000-chars-truncated result here, you can uncomment the lines marked (1).
-- in lowercase: everything that could/should be specific to your query
-- uncomment the lines marked (1) to get a distinct on a 4000-chars-truncated result
WITH cfg AS (
    SELECT ','                  AS list_delim,
           '([^,]+)(,\1)*(,|$)' AS list_dist_match,  -- regexp match for the distinct functionality
           '\1\3'               AS list_dist_repl    -- regexp replace for the distinct functionality
      FROM dual
)
SELECT
    --REGEXP_REPLACE( DBMS_LOB.SUBSTR(                                    -- (1)
    RTRIM( XMLAGG( XMLELEMENT( e, mycol, list_delim ).EXTRACT('//text()')
                   ORDER BY mycol ).GetClobVal(), list_delim )
    --, 4000 ), list_dist_match, list_dist_repl )                         -- (1)
    AS mylist
FROM mytab, cfg

Distinct Count expression ssas

I want to create a new calculated member in the OLAP cube to count the number of distinct clients. I'm trying to write this expression, but I don't know how to express it in MDX:
Distinct count ([DIM.Clients].[CuNumber], where Sum([Measures].[QQT - FACT Ventes] >=1)
Any help please !
Thanks!
Hi, thank you for the replies.
I spent a couple of days trying to make the query work, but without much progress.
First, I ran a SQL query on my data warehouse to know what result I should get from my OLAP cube.
this is my SQL query:
use [Warehouse]
select count(*) as count_row
From
(Select F.FaCunumberX
from [dbo].[Dim_FaClients] F
inner join [dbo].[FACT_Ventes] V on F.[SK_FAClients] = V.SK_FaClients
inner join [dbo].[Dim_Date] D on D.SK_Date = V.SK_Date
where
D.Year = '2014'
Group by F.FaCunumberX
having SUM(V.QQT) >= 1) test
The result I got is 26026.
On my OLAP cube I tried several queries, but I didn't get the same result.
These are some of the expressions that I tried:
WITH SET MySet AS
(Filter({[DIM FA Clients].[FaCuNumberX].[FaCuNumberX]}*{([Dim Date].[Year].&[2014],[Measures].[QQT - Fact Ventes])},[Measures].[QQT - Fact Ventes]>1 or [Measures].[QQT - Fact Ventes]=1)
MEMBER MEASURES.SETDISTINCTCOUNT AS
DISTINCTCOUNT(MySet)
SELECT {MEASURES.SETDISTINCTCOUNT} ON 0
FROM [CubeAll]
The result I got with this one is 31575.
I also tried this expression:
DistinctCount(Filter([DIM.Clients].[CuNumber].[CuNumber].Members,
[Measures].[QQT - FACT Ventes] >= 1
)
)
The same result: 31575.
Honestly, I don't see what I'm missing in my expressions.
Thanks for your help!
This would be something like
DistinctCount(Filter([DIM.Clients].[CuNumber].[CuNumber].Members,
[Measures].[QQT - FACT Ventes] >= 1
)
)
See the documentation of Filter and DistinctCount for details.

Slow data insert on SQL Server

I'm just trying to import data into a database (SQL Server) from a dataset, but it gets quite slow when I try to import 70,000 rows. Am I doing something wrong or missing something?
Could you please give me some advice on how I can do it better?
Here is my ASP.NET code:
ArtiDB entity = new ArtiDB();
int grid = 50;
foreach (string item_kisiler in kisiler)
{
    if (item_kisiler == "")
        continue;
    if (Tools.isNumber(item_kisiler) == false)
        continue;
    else
    {
        string gsm1 = item_kisiler;
        if (gsm1.Length > 10)
            gsm1 = gsm1.Substring(1, 10);
        entity.veriaktar(gsm1, gg, grid);
    }
}
This is my stored procedure:
alter proc veriaktar
(
    @gsm1 nvarchar(50) = null,
    @userid uniqueidentifier,
    @grupid int = 0
)
as
begin
    declare @AltMusID int

    if not exists (select * from tbl_AltMusteriler with (updlock, rowlock, holdlock)
                   where Gsm1 = @gsm1 and UserId = @userid)
    begin
        insert into tbl_AltMusteriler (Gsm1, UserId)
        values (@gsm1, @userid)

        set @AltMusID = scope_identity()
    end
    else
    begin
        set @AltMusID = (select AltMusteriID from tbl_AltMusteriler with (updlock, rowlock, holdlock)
                         where Gsm1 = @gsm1 and UserId = @userid)
    end

    if (@grupid != 0)
    begin
        if not exists (select * from tbl_KisiGrup with (updlock, rowlock, holdlock)
                       where GrupID = @grupid and AltMusteriID = @AltMusID)
        begin
            insert into tbl_KisiGrup values (@grupid, @AltMusID)
        end
    end
end
go
The server is designed to work with sets. You're requiring it to deal with one row at a time, and with each row three times. Stop doing that and things will get better.
First, go back to your .NET docs and look for a way to do one INSERT for all 70,000 rows. If you can use the Bulk Copy feature of SQL Server (bcp, or SqlBulkCopy from .NET), you should be able to insert the whole set in 10-20 seconds.
The read-test-update paradigm might work here, but it's error-prone and forces the server to work much harder than necessary. If some of the 70,000 are new and others are updates, bulk them into a temporary table and use MERGE to apply it to tbl_AltMusteriler.
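A hedged sketch of that staging + MERGE approach (assuming the 70,000 values are first bulk-loaded into a staging table, here called #stage, with the same Gsm1/UserId columns):

-- #stage is an assumed staging table filled in one bulk operation (e.g. SqlBulkCopy)
MERGE tbl_AltMusteriler AS target
USING #stage AS source
   ON target.Gsm1 = source.Gsm1
  AND target.UserId = source.UserId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Gsm1, UserId)
    VALUES (source.Gsm1, source.UserId);

One such statement replaces 70,000 separate procedure calls, and tbl_KisiGrup can then be populated with a similar set-based INSERT ... SELECT.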
Second, uniqueidentifier isn't a good sign. It looks like tbl_AltMusteriler is used to generate a surrogate key. Why wouldn't a simple integer do? It would be faster to generate (with IDENTITY), easier to read, faster to query, and have better PK properties generally. (Also, make sure both the natural key and the surrogate are declared to be unique. What would it mean if two rows have the same values for gsm1 and userid, differing only by AltMusteriID?)
In short, find a way to insert all rows at once, so that your interaction with the DBMS is limited to one or at most two calls.

How to return a cursor or a result set from an Oracle stored function

I have a stored function
CREATE OR REPLACE FUNCTION schedule(name in varchar2,pass in varchar2 )
begin
select t.name,s.starttime from traininfo t,schedule s, trainslot ts
where t.trainid in( select ts.trainid from trainslot
where ts.slotid in (select s.slotid from schedule s
where s.source='dhaka'
and s.dest='bogra' ))
end
I want to return this result set using a cursor.
I don't see where you are using either of the input parameters in your function. I'll assume that is either intentional or an oversight because you're simplifying the code. It also appears that your query is missing conditions to join between the traininfo, schedule, and trainslot tables in the outer query. It seems odd that your nested IN statements are turning around and querying the schedule and trainslot tables given this lack of join conditions. I don't know whether this is a result of copy-and-paste errors or something that was missed in posting the question or whether these are real problems. I'll make a guess at the query you're intending to write but if my guess is wrong, you'll have to tell us what your query is supposed to do (posting sample data and expected outputs would be exceptionally helpful for this).
CREATE OR REPLACE FUNCTION schedule(name in varchar2,pass in varchar2 )
RETURN sys_refcursor
is
l_rc sys_refcursor;
begin
open l_rc
for select t.name, s.starttime
from traininfo t,
schedule s,
trainslot ts
where t.trainid = ts.trainid
and s.slotid = ts.slotid
and s.source = 'dhaka'
and s.dest = 'bogra';
return l_rc;
end;
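To consume the returned cursor, you could use an anonymous PL/SQL block like the following sketch (the variable names and types are just for the example, and the two parameters are currently unused by the function):

DECLARE
   l_rc        sys_refcursor;
   l_name      VARCHAR2(100);   -- adjust to the actual type of traininfo.name
   l_starttime DATE;            -- adjust to the actual type of schedule.starttime
BEGIN
   l_rc := schedule('some_name', 'some_pass');
   LOOP
      FETCH l_rc INTO l_name, l_starttime;
      EXIT WHEN l_rc%NOTFOUND;
      DBMS_OUTPUT.put_line(l_name || ' ' || l_starttime);
   END LOOP;
   CLOSE l_rc;
END;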
