Teradata: drop statistics if statistics exist

A DDL/DML script is modifying a table's structure, and the presence of statistics on the affected columns stops it from proceeding.
The solution found was to first drop the statistics on the columns being modified or dropped.
However, if the statistics do not exist, the script is in turn aborted when it tries to drop them.
What is a cheap way to implement "if stats exist, drop stats" logic?

This one seems to be working as intended, per Dieter's feedback. Note that the .IF / .GOTO / .LABEL conditional commands below are BTEQ syntax, so the script needs to run through BTEQ rather than SQL Assistant:
/* Check whether statistics exist on the column */
SELECT *
FROM dbc.StatsV
WHERE TableName = '${table_name}'
  AND ColumnName = '${column_name}';

.IF ACTIVITYCOUNT > 0 THEN .GOTO lbl_drop;

SELECT 'Stats not found for column ${column_name}';
.EXIT;

.LABEL lbl_drop;
SELECT 'Dropping stats for column ${column_name}';
DROP STATISTICS ON ${table_name} COLUMN ${column_name};

How to SELECT a single record in table X with the largest value for X.a WHERE values for fields X.b & X.c are specified

I am using the following query to obtain the current component serial number (tr_sim_sn) installed on the host device (tr_host_sn) from the most recent record in a transaction history table (PUB.tr_hist):
SELECT tr_sim_sn
FROM PUB.tr_hist
WHERE tr_trnsactn_nbr = (SELECT max(tr_trnsactn_nbr)
                         FROM PUB.tr_hist
                         WHERE tr_domain = 'vattal_us'
                           AND tr_lot = '99524136'
                           AND tr_part = '6684112-001')
The actual table has ~190 million records. The excerpt below contains only a few sample records, and only the fields relevant to the search, to illustrate the query above:
tr_sim_sn |tr_host_sn* |tr_host_pn |tr_domain |tr_trnsactn_nbr |tr_qty_loc
_______________|____________|_______________|___________|________________|___________
... |
356136072015140|99524135 |6684112-000 |vattal_us |178415271 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178424418 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178628048 |1.0000000000
356136072015050|99524136 |6684112-001 |vattal_us |178628051 |-1.0000000000
356136072015836|99524137 |6684112-005 |vattal_us |178645337 |-1.0000000000
...
* = key field
The excerpt illustrates multiple occurrences of tr_trnsactn_nbr for a single value of tr_host_sn. The largest value for tr_trnsactn_nbr corresponds to the current tr_sim_sn installed within tr_host_sn.
This query works, but it is very slow: ~8 minutes.
I would appreciate suggestions to improve or refactor this query to increase its speed.
Check with your admins to determine when they last updated the SQL statistics. If the answer is "we don't know" or "never", then you might want to ask them to run the following 4GL program, which will create a SQL script to accomplish that:
/* genUpdateSQL.p
 *
 * mpro dbName -p util/genUpdateSQL.p -param "tmp/updSQLstats.sql"
 *
 * sqlexp -user userName -password passWord -db dbName -S servicePort -infile tmp/updSQLstats.sql -outfile tmp/updSQLstats.log
 *
 */

output to value( ( if session:parameter <> "" then session:parameter else "updSQLstats.sql" )).

for each _file no-lock where _hidden = no:

  put unformatted
    "UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."
    '"' _file._file-name '"' ";"
    skip
  .

  put unformatted "commit work;" skip.

end.

output close.

return.
This will generate a script that updates statistics for all tables and all indexes. You could edit the output to only update the tables and indexes that are part of this query if you want.
Also, if the admins are nervous, they could of course try this on a test db or a restored backup before implementing it in a production environment.
I am posting this as a response to my request for an improved query.
As it turns out, the following syntax features two distinct changes that greatly improved the speed of the query. One is to include the tr_domain search criterion in both the main and nested portions of the query. The second is to narrow the search by increasing the number of search criteria, which in the following are all included in the nested section of the syntax:
SELECT tr_sim_sn
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
  AND tr_trnsactn_nbr IN (
      SELECT MAX(tr_trnsactn_nbr)
      FROM PUB.tr_hist
      WHERE tr_domain = 'vattal_us'
        AND tr_part = '6684112-001'
        AND tr_lot = '99524136'
        AND tr_type = 'ISS-WO'
        AND tr_qty_loc < 0)
This syntax results in ~0.5 s response time. (Credit to my colleague, Daniel V.)
To be fair, this query uses criteria outside the parameters stated in the original post, making it difficult or impossible for others to attempt a reasonable answer. This omission was not on purpose, of course; rather, it was due to my being fairly new to the fundamentals of good query design. This query is in part a result of learning that when too few or non-indexed fields are used as search criteria in a large table, it is sometimes helpful to narrow the search by increasing the number of search criteria. The original had 3; this one has 5.

Allowing user to select multiple values from a list with RGtk2 in R?

I am creating a fairly simple GUI in R that creates figures from different data analyses based on values selected by a user. I am having trouble figuring out how to enable a user to select multiple values from a list. The method I am working on is below. The problem area is the if statement, which is where I need to place the user's selections into a list.
CallSpecies <- function(options) {
  dialog <- gtkMessageDialog(NULL, 0, "question", "ok-cancel", "Choose a species", show = FALSE)
  sppmodel <- rGtkDataFrame(Species)
  sppview <- gtkTreeView(sppmodel)
  sppview$getSelection()$setMode("multiple")
  column <- gtkTreeViewColumn("Species Code", gtkCellRendererText(), text = 0)
  column1 <- gtkTreeViewColumn("Common Name", gtkCellRendererText(), text = 1)
  sppview$appendColumn(column)
  sppview$appendColumn(column1)
  scrolled_window <- gtkScrolledWindow()
  scrolled_window$setSizeRequest(-1, 150)
  scrolled_window$add(sppview)
  dialog[["vbox"]]$add(scrolled_window)
  if (dialog$run() == GtkResponseType["ok"]) {
  }
  dialog$destroy()
}
I ended up solving my own issue after quite a bit of reading documentation from other languages and translating it to R syntax. I have edited the code to show my solution. I am by no means an expert in R or GTK but this seems to work.
CallSpecies <- function(options) {
  dialog <- gtkMessageDialog(NULL, 0, "question", "ok-cancel",
                             "Choose a species. Hold control and click to select multiple species.",
                             show = FALSE)
  sppmodel <- rGtkDataFrame(Species)
  sppview <- gtkTreeView(sppmodel)
  sppview$getSelection()$setMode("multiple")
  column <- gtkTreeViewColumn("Species Code", gtkCellRendererText(), text = 0)
  column1 <- gtkTreeViewColumn("Common Name", gtkCellRendererText(), text = 1)
  sppview$appendColumn(column)
  sppview$appendColumn(column1)
  scrolled_window <- gtkScrolledWindow()
  scrolled_window$setSizeRequest(-1, 150)
  scrolled_window$add(sppview)
  dialog[["vbox"]]$add(scrolled_window)
  spplist <- c()
  if (dialog$run() == GtkResponseType["ok"]) {
    selection <- sppview$getSelection()
    sel_paths <- selection$getSelectedRows()$retval
    i <- 1
    for (p in sel_paths) {
      sel_row <- sel_paths[[i]]$getIndices()[[1]]  # 0-based row index from GTK
      sel_row <- sel_row + 1                       # convert to R's 1-based indexing
      elem <- Species[sel_row, "SpeciesCode"]
      elem <- as.character(elem)
      spplist <- c(spplist, elem)
      i <- i + 1
    }
    print("spplist")
    print(spplist)
  }
  dialog$destroy()
}
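For anyone trying to reproduce this, a minimal usage sketch is below. The Species data frame here is a made-up placeholder (the real one comes from the poster's project), assumed to have SpeciesCode as its first column and CommonName as its second, matching the columns the tree view displays:
library(RGtk2)

# hypothetical example data standing in for the real Species data frame
Species <- data.frame(SpeciesCode = c("SP01", "SP02", "SP03"),
                      CommonName  = c("Species one", "Species two", "Species three"),
                      stringsAsFactors = FALSE)

CallSpecies(NULL)  # opens the dialog; the selected SpeciesCode values are printed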

BigQuery Timeout Errors in R Loop Using bigrquery

I am running a query in a loop for each store in a dataframe. Typically there are 70 or so stores, so the query repeats that many times per complete run of the loop.
Maybe 75% of the time this loop works all the way through with no errors.
About 25% of the time I get the following error during any one of the loop iterations:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached
Then I have to figure out which iteration bombed, and repeat the loop excluding iterations that completed successfully.
I can't find anything on the web to help me understand what is causing this seemingly random error. Perhaps it is a BQ technical issue? There does not seem to be any relation to the size of the result set it crashes on.
Here is the part of my code that does the loop (again, it works all the way through most of the time). The cartesian product across IDs is intentional, as I want every combination of each Test ID with all possible Control IDs within each store.
sql<-"SELECT pstore as store, max(pretrips) as pretrips FROM analytics.campaign_ids
group by 1 order by 1"
store_maxtrips<-query_exec(sql,project=project, max_pages = 1)
store_maxtrips
for (i in 1:length(store_maxtrips$store)) {
  # pull back all ids shopping in same primary store as each test ID with their pre metrics
  sql <- paste("SELECT a.pstore as pstore, a.id as test_id,
                b.id as ctl_id,
                (abs(a.zpbsales-b.zpbsales)*", wt_pb_sales, ")+(abs(a.zcatsales-b.zcatsales)*", wt_cat_sales, ")+
                (abs(a.zsales-b.zsales)*", wt_retail_sales, ")+(abs(a.ztrips-b.ztrips)*", wt_retail_trips, ") as zscore
                FROM analytics.campaign_ids a inner join analytics.pre_zscores b
                on a.pstore=b.pstore
                where a.id<>b.id and a.pstore=", store_maxtrips$store[i], " order by a.pstore, a.id, zscore")

  print(paste("processing store", store_maxtrips$store[i]))

  query_exec(sql, project = project, destination_table = "analytics.campaign_matches",
             write_disposition = "WRITE_APPEND", max_pages = 1)
}
Solved!
It turns out I was using query_exec, but I should have been using insert_query_job, since I do not want to retrieve any results. The errors were all happening while R was trying to retrieve results from BigQuery, which I didn't want anyhow.
Using insert_query_job + wait_for(job) in my loop instead of the query_exec command eliminated all issues with the loop finishing.
I did also need to add a try() function to help circumvent some rare errors that still popped up with this approach. Thanks to MarkeD for this tip. So my final solution looked like this:
try(job <- insert_query_job(sql, project = project, destination_table = "analytics.campaign_matches",
                            write_disposition = "WRITE_APPEND"))
wait_for(job)
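For context, here is a minimal sketch of how these calls might sit inside the original store loop. The NULL check around wait_for() is my own addition (not part of the posted solution), there to skip stores whose job submission failed so they can be re-run later:
library(bigrquery)  # the legacy API that provides insert_query_job() and wait_for()

for (i in 1:length(store_maxtrips$store)) {
  # sql is built exactly as in the original loop body (omitted here for brevity)
  print(paste("processing store", store_maxtrips$store[i]))

  job <- NULL
  try(job <- insert_query_job(sql, project = project,
                              destination_table = "analytics.campaign_matches",
                              write_disposition = "WRITE_APPEND"))

  # only wait on jobs that were actually submitted; a failed submission is
  # reported so that store can be re-run instead of aborting the whole loop
  if (!is.null(job)) {
    wait_for(job)
  } else {
    message("store ", store_maxtrips$store[i], " failed to submit")
  }
}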
Thanks to everyone who commented and helped me research the issue.

Package input parameter in SQL Developer

I am trying to create a package in Oracle SQL Developer and want to have a public input parameter that another user can pass a date to. I tried the following code:
Create PACKAGE Assignment_1_Pack is
vstartDate date := to_date('&startDate', 'DD/MM/YYYY');
vendDate date := to_date('endDAte', 'DD/MM/YYYY');
END;
When I try to run it, I get the following message:
Empty package Assignment_1_pack definition (no public members).
I was expecting the window that pops up to prompt for an input, but I haven't used packages before, so I am not sure what I am doing wrong.
Run set define on;
Use the command create OR REPLACE package Assignment_1_Pack ...
SET DEFINE ON/OFF toggles substitution variables on or off. Probably substitution of variables was turned off, so SQL Developer doesn't ask for the value.
See this for details: https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12040.htm#sthref2736
SET DEF[INE] {& | c | ON | OFF}
Sets the character used to prefix substitution variables to c.
ON or OFF controls whether SQL*Plus will scan commands for substitution variables and replace them with their values. ON changes the value of c back to the default '&', not the most recently used character. The setting of DEFINE to OFF overrides the setting of the SCAN variable.
create OR REPLACE package ... prevents errors in the case when the package has already been created. A simple CREATE PACKAGE xxx command fails if the package already exists, so once you have created the package the first time, all subsequent plain CREATE attempts will fail. CREATE OR REPLACE ... drops the package if it already exists and then creates it again, so it never fails.

Undefined columns selected error in R

I apologize in advance because I'm extremely new to coding and was thrust into it just a few days ago by my boss for a project.
My data set is called s1. s1 has 123 variables, and 4 of them have some form of "QISSUE" in their name. I want to take these four variables and duplicate them all, adding "Rec" to the end of each one (that way I can freely play with the new variables while still keeping the original ones).
Running this line of code keeps giving me an error:
b <- llply(s1[, str_c(names(s1)[str_detect(names(s1), fixed("QISSUE"))], "Rec")],
           table)
The error is as such:
Error in `[.data.frame`(s1, , str_c(names(s1)[str_detect(names(s1), fixed("QISSUE")) & :
undefined columns selected
Thank you!
Use this to get the subset. Of course, there are other ways to do this with simpler code:
# subset only the existing QISSUE columns (no "Rec" suffix yet -- those columns don't exist,
# which is what caused the "undefined columns selected" error)
b <- llply(s1[, names(s1)[str_detect(names(s1), fixed("QISSUE"))]], c)

# build the new names with the "Rec" suffix and assemble the copies into a data frame
nwnam <- str_c(names(s1)[str_detect(names(s1), fixed("QISSUE"))], "Rec")
ndf <- data.frame(do.call(cbind, b))
colnames(ndf) <- nwnam
ndf

# of course you can do
cbind(s1, ndf)
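As a side note, if the goal is simply to copy each QISSUE column into a new column whose name ends in "Rec", a minimal base-R sketch (no plyr or stringr needed, assuming s1 is a plain data frame) would be:
# find the columns whose names contain "QISSUE"
qissue_cols <- grep("QISSUE", names(s1), fixed = TRUE, value = TRUE)

# copy each one into a new column named <original>Rec
s1[paste0(qissue_cols, "Rec")] <- s1[qissue_cols]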
