Ignore empty datasets - U-SQL

I'm writing a U-SQL script that sometimes ends up with an empty data set.
Today the outputter writes an empty file when that happens. I would like the outputter to write nothing in that case, since otherwise I will flood ADLS with empty files...
I have tried two things so far:
An IF statement - the problem here is that I do a SELECT COUNT(*) over the data set, and I cannot write IF #COUNT > 0 since #COUNT is a rowset, while the IF statement expects a scalar variable.
A custom outputter - but I have noticed that it is not the outputter that writes the file; some other code that runs afterwards does, and the file gets created after the custom outputter is done.
Does anyone have any guidance?
Thanks in advance!

One method is to cook your data into a table first. Then you can INSERT into the table instead of writing to a file. Empty INSERTs do not cause job failure, nor will they affect runtime performance or future performance of the table. Let me know if you have other questions!
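As a rough sketch of that pattern (the database, table, file path and column names here are placeholders, not from the original question):

CREATE DATABASE IF NOT EXISTS MyDb;
USE DATABASE MyDb;

CREATE TABLE IF NOT EXISTS dbo.CookedData
(
    Id int,
    Name string,
    INDEX idx_id CLUSTERED (Id ASC) DISTRIBUTED BY HASH (Id)
);

@input =
    EXTRACT Id int,
            Name string
    FROM "/input/data.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

@result =
    SELECT Id, Name
    FROM @input
    WHERE Name != "";   // this filter may legitimately return zero rows

// An empty @result simply inserts nothing - no empty file lands in ADLS.
INSERT INTO dbo.CookedData
SELECT * FROM @result;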

Related

How to Handle BQ GA Export Changes?

I'm trying to reprocess ga_sessions_yyyymmdd data, but I'm finding that ga_sessions never used to have a field called [channelGrouping], although it does in more recent data.
So my jobs work fine for the latest version of ga_sessions, but when I try to reprocess earlier ga_sessions data the job fails because it's missing the [channelGrouping] field.
Obviously this is usually what you want, but in this case it's not. I want to make sure I'm sticking to the latest ga_sessions schema, and would like the job to just set missing columns to null where they did not exist.
Is there any way around this?
Perhaps I need to make an empty table called ga_sessions_template_latest and union it onto whatever ga_sessions_ daily table I'm handling - maybe this will 'upgrade' the old ga_sessions to the new structure.
Attached is a screenshot of exactly what I mean (my union idea will actually be horrible due to the nested fields in ga_sessions).
I don't have such a script yet, but since the tables are under your project, you are able to update them. You can write a script that updates the schema on all tables with columns missing from the most recent schema.
I envision a script that gets the most recent table schema.
It then goes back one by one through the past tables, does a compare, identifies the missing columns, defines them as not required and nullable, and then reads the old schema, applies the additional columns and runs the update on the table. Data won't be modified; you will just have additional columns with null values.
You can also try some of this from the Web UI.
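If you go the scripted route, a rough sketch with the google-cloud-bigquery Python client might look like the following (project, dataset and table names are placeholders, and only top-level columns are handled - the nested ga_sessions RECORD fields would need extra care):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Use the newest daily table as the reference schema.
latest = client.get_table("my-project.my_dataset.ga_sessions_20170101")
old = client.get_table("my-project.my_dataset.ga_sessions_20150101")

old_names = {f.name for f in old.schema}

# Any column the old table lacks is added as NULLABLE (not required),
# so existing rows simply read as NULL in that column.
missing = [
    bigquery.SchemaField(f.name, f.field_type, mode="NULLABLE", fields=f.fields)
    for f in latest.schema
    if f.name not in old_names
]

if missing:
    old.schema = list(old.schema) + missing
    client.update_table(old, ["schema"])

Looping this over all the older ga_sessions_ tables gives every one of them the latest set of columns without touching the data.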

Teradata SET table

I have a SET table in Teradata. When I load duplicate records through Informatica, the session fails because it tries to push duplicate records into the SET table.
I want Informatica to reject duplicate records whenever they are loaded, using either a TPT or a relational connection.
Can anyone help me with the properties I need to set?
Do you really need to keep track of what records are rejected due to duplication in the TPT logs? It seems like you are open to suggestions about TPT or relational connections, so I assume you don't really care about TPT level logs.
If this assumption is correct, then you can simply put an Aggregator Transformation in the mapping and mark every field as Group By. As expected, this adds a GROUP BY clause to the generated query and eliminates duplicates in the source data.
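In effect, grouping on every port is equivalent to running something like this against the source (table and column names are purely illustrative):

SELECT   emp_id, emp_name, dept_id
FROM     source_table
GROUP BY emp_id, emp_name, dept_id;   -- every column grouped, so exact duplicates collapse to one row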
Please try following things:
1. If you use FastLoad (fload) or the TPT FastLoad operator, the utility will implicitly remove the duplicates, but these utilities can only be used for loading into empty tables.
2. If you are trying to load data into a non-empty table, then place a Sorter and de-dupe your data in Informatica (a SQL-side alternative is sketched below).
3. Also try changing the stop-on-error flag to 0 and the error limit flag on the target to -1.
Please share your results with us.
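If you would rather de-duplicate on the Teradata side instead of with a Sorter (an alternative to point 2 above, not something the answer requires), a SQL override along these lines can be used; the table, key columns and ordering column are hypothetical:

SELECT  *
FROM    staging_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id, line_no ORDER BY load_ts DESC) = 1;   -- keep one row per key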

XOJO delete a record from SQLite database

I am using Xojo 2013 Version 1. I am trying to delete a record from a SQLite database. But I am failing miserably. Instead of deleting the record, it duplicates it for some reason.
Here is the code that I use:
command = "DELETE * from names where ID = 10"
namesDB.SQLExecute(command)
I am generating command dynamically, but however I change it, it always does the same thing. Same result with or without quotes.
Any ideas?
The very first thing I would do is check to see if there is an error being generated.
// Check the database error flag right after the SQLExecute call.
if namesDB.Error then
  dim s as string = namesDB.ErrorMessage
  msgbox s
  return
end if
It will tell you if there's a database error and what the error is. If there's no error then the problem lies elsewhere.
FWIW, always, always, always check the error bit after every db operation. Unlike other languages, Xojo does NOT generate/throw an exception if there's a database error so it's up to you to check it.
Try calling Commit().
I just made a sample SQLite database with a "names" table, and this code worked fine:
db.SQLExecute("Delete from names where ID=2")
db.Commit
I have done a lot of work with XOJO and SQLite, and they work well together. I have never seen a record duplicated erroneously as you report. That is very weird. If this doesn't help, post more of your code. For example, I assume your "command" variable is a String, but maybe it's a Variant, etc.
I think on SQLite you don't put the * between the DELETE and the FROM; the syntax is just DELETE FROM names WHERE ID = 10.
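Putting both answers together, a minimal corrected version might look like this (assuming namesDB is an already-connected SQLiteDatabase):

dim command as string = "DELETE FROM names WHERE ID = 10"   // no * in a DELETE statement
namesDB.SQLExecute(command)

// Always check the error bit after a database operation.
if namesDB.Error then
  msgbox namesDB.ErrorMessage
  return
end if

// As suggested above, commit so the delete is actually saved.
namesDB.Commit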

Import CSV to SQL using schema.ini as validator

I'm using schema.ini to validate the data types/columns in my CSV file before loading it into SQL. If there is a datatype mismatch in a row, it will still import the row but leaves that particular mismatched cell blank. Is there a way I can stop the user from importing the CSV file if there are any issues, and/or provide an error report (i.e. which rows have problems)?
The best approach would be to check the file itself for any mismatches, but in the case of a large file this is not feasible.
You might need to load it first and then check the loaded data in the table for mismatches. This is much faster than checking the file (you can use a simple T-SQL script to check for NULLs in the table).
If mismatches are found, the user can be notified and the table can then be cleared.
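For example, a simple T-SQL check along these lines (the staging table and column names are placeholders) will flag the offending rows:

-- Cells that failed the type check arrive as NULL in the staging table
-- (note that legitimately empty cells will show up here as well).
SELECT  *
FROM    dbo.StagingImport
WHERE   Amount IS NULL
    OR  OrderDate IS NULL;

-- If any rows come back, report them to the user and clear the staging table:
-- TRUNCATE TABLE dbo.StagingImport;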
Have a look at the FileHelpers library: http://www.filehelpers.com/
This is a very powerful library for doing all kinds of imports, including CSV, and it also has a pretty neat error-handling part:
Using the Different Error Modes
The FileHelpers library has support for 3 kinds of error handling.
In the standard mode you can catch the exceptions when something fails. This approach is not bad, but you lose some info about the current record and you can't use the records array because it is not assigned.
A more intelligent way is using the ErrorMode.SaveAndContinue of the ErrorManager. Using the engine like this, you have the good records in the records array, and in the ErrorManager you have the records with errors and can do whatever you want with them.
Another option is to ignore the errors and continue, as shown in this example:
engine.ErrorManager.ErrorMode = ErrorMode.IgnoreAndContinue;
records = engine.ReadFile(...
In the records array you only have the good records.
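As a rough C# sketch of the SaveAndContinue mode (the record class, field names and file path are made up for illustration):

using System;
using FileHelpers;

[DelimitedRecord(",")]
public class ImportRow
{
    public int Id;
    public string Name;
    public decimal Amount;
}

class Program
{
    static void Main()
    {
        var engine = new FileHelperEngine<ImportRow>();
        engine.ErrorManager.ErrorMode = ErrorMode.SaveAndContinue;

        // Good rows end up in 'records'; bad rows are collected by the ErrorManager.
        ImportRow[] records = engine.ReadFile("import.csv");

        if (engine.ErrorManager.HasErrors)
        {
            foreach (ErrorInfo err in engine.ErrorManager.Errors)
                Console.WriteLine("Line " + err.LineNumber + ": " + err.ExceptionInfo.Message);

            return;   // stop here instead of loading a bad file into SQL
        }

        // ... load 'records' into SQL here ...
    }
}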

How to restrict PL/SQL code from executing twice with the same values to the input parameters?

I want to prevent my PL/SQL code from being executed repeatedly. That is, I have written a PL/SQL procedure with three input parameters, viz. Month, Year and a Flag. I have executed the procedure with the following values for the parameters:
Month: March
Year : 2011
Flag: Y
Now, if I try to execute the procedure with the same parameter values as above, I want to write some code in the PL/SQL to block that unwanted second execution. Can anyone help? I hope the question is not ambiguous.
You can use the function result cache: http://www.oracle-developer.net/display.php?id=504. So Oracle can do this for you.
I would create another table that stores the 3 parameters of each request. When your procedure is called, it would first check the "parameter request" table to see if the calling parameters have been used before. If found, then exit the procedure. If not found, then save the parameters and execute the rest of the procedure.
You're going to need to keep "state" of the last call somewhere. I would recommend creating a table with a datetime column.
When your procedure is called, update this table. Then, the next time your procedure is called, check this table to see when it was last called and proceed accordingly.
Why not set up a table to track what arguments you've already executed it with?
In your procedure, first check that table to see if similar parameters have already been processed. If so, exit (with or without an error).
If not, insert them and do the processing necessary.
Depending on how tight the requirements are, you'll need to get an exclusive lock on that table to prevent concurrent execution.
A nice plus would be an extra column with "in progress"/"done"/"error" status so that you can check if things are going on properly. (Maybe a timestamp too if that's important/interesting.)
This setup allows you to easily clear some of the executions (by deleting some rows) if you find things need to be re-done for whatever reason.
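A minimal sketch of that tracking-table idea (all object names and status values are illustrative):

CREATE TABLE proc_run_log (
  run_month  VARCHAR2(20),
  run_year   NUMBER(4),
  run_flag   CHAR(1),
  status     VARCHAR2(20),
  run_time   TIMESTAMP DEFAULT SYSTIMESTAMP,
  CONSTRAINT proc_run_log_uq UNIQUE (run_month, run_year, run_flag)
);

CREATE OR REPLACE PROCEDURE my_monthly_proc (
  p_month IN VARCHAR2,
  p_year  IN NUMBER,
  p_flag  IN CHAR
) AS
BEGIN
  -- The unique constraint makes a second call with the same parameters fail here.
  INSERT INTO proc_run_log (run_month, run_year, run_flag, status)
  VALUES (p_month, p_year, p_flag, 'IN PROGRESS');

  -- ... the real processing goes here ...

  UPDATE proc_run_log
     SET status = 'DONE'
   WHERE run_month = p_month
     AND run_year  = p_year
     AND run_flag  = p_flag;

  COMMIT;
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN
    RAISE_APPLICATION_ERROR(-20001,
      'Procedure has already been executed with these parameters.');
END my_monthly_proc;
/

Deleting the relevant row from proc_run_log is then all it takes to allow a re-run.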
Make an insert at the beginning of the procedure, and do a SELECT FOR UPDATE to lock the table so no one else can process any data; if everything goes OK with the procedure, commit and release the table 😀
