How to Handle BQ GA Export Changes? - google-analytics

I'm trying to reprocess ga_sessions_yyyymmdd data but am finding the ga_sessions never used to have a field called [channelGrouping] but it does in more recent data.
So my jobs work fine for the latest version of ga_sessions but when i try reprocess earleir ga_sessions data the job fails as it's missing the [channelGrouping] field.
Obviously usually this is what you want, but in this case it's not. I want to make sure i'm sticking to the latest ga_sessions schema and would like the job to just set missing cols to null for when they did not exist.
Is there any way around this?
Perhaps i need to make an empty table called ga_sessions_template_latest and union it on to whatever ga_sessions_ daily table i'm handling - maybe this will 'upgrade' the old ga_sessions to the new structure.
Attached is a screenshot of exactly what i mean (my union idea will actually be horrible due to nested fields in ga_sessions).

I don't have such a script yet. But since the tables are under your project you are able to update them. You can write a script and update the schema on all tables with missing columns from the most recent schema set.
I envision a script that gets most recent table schema.
Then goes back one by one to past tables, does a compare, identifies the missing columns, defines them as not required and nullable, and reads the schema + applies the additional columns and runs the update on the table. Data won't be modified, you will have just additional columns with null values.
you can try out for some also from the Web UI.

Related

cache intersystems command to get the last updated timestamp of a table

I want to know the last update time of a Cache Intersystems DB table. Please let me know the relevant command. I ran through their command documentation:
http://docs.intersystems.com/latest/csp/docboo/DocBook.UI.Page.cls?KEY=GTSQ_commands
But I don't see any such command there. I also tried searching through this :
http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_currenttimestamp
Is this not the complete documentation of commands ?
Cache' does not maintain "last updated" information by default as it might introduce unnecessary performance penalty on DML operations.
You can add this field manually to every table of interest:
Property LastUpdated As %TimeStamp [ SqlComputeCode = { Set {LastUpdated}= $ZDT($H, 3) }, SqlComputed, SqlComputeOnChange = (%%INSERT, %%UPDATE) ];
This way it would keep the time of last Update/Insert for every row, but still it would not help you with Delete.
Alternatively - you can setup triggers for every DML operation that would maintain timestamp in a separate table.
Without additional coding the only way to gather this information is to scan Journal files, which is not really intended use for these and would be slow at best.

Teradata set table

I have a set table in teradata , when I load duplicate records throough informatica , session fails because it tries to push duplicate records in SET table.
I want that whenever duplicate records being loaded informatica rejects them using TPT or Relation connection
can anyone help me with properties I need to set
Do you really need to keep track of what records are rejected due to duplication in the TPT logs? It seems like you are open to suggestions about TPT or relational connections, so I assume you don't really care about TPT level logs.
If this assumption is correct then you can simply put an Aggregator Transformation in the mapping and mark every field as Group By. As expected, this will add a group by clause in the generated query and eliminate duplicates in the source data.
Please try following things:
1. If you'll use fload or TPT Fast load then the utility will implicitly remove the duplicates but this utility can only be used for loading into empty tables.
2. If you are trying to load data in non-empty table then place a sorter and de-dupe your data in Informatica
3. Also try changing the flag stop on error to 0 and flag Error limit in target to -1
Please share your results with us.

R - how to react to database inserts/updates/deletes?

I'm reading in data from an SQLite database table into a data.frame with R's DBI. Often (as often as every 5 secs), new records get added into the database table externally, or existing ones updated/deleted, at which point I need to propagate these changes to my data.frame.
So the question is how can I hook onto and respond to these database events in R? I don't want to have to keep querying the database every 5 secs just to make sure nothing has changed. Is there some callback mechanism at my disposal?
If you have access to the C code that is writing your SQL data, then you can implement a callback:
http://www.sqlite.org/c3ref/update_hook.html
and then in your callback function you could update the timestamp of a file if the table being modified is one your R code cares about. Then your R code checks the timestamp of that file, and if its changed, only then does it need to query the SQLite database.
Now I don't know if you could add a callback to the SQLite connection held by R and expect to get a callback if another SQLite connection/process changes the database table. I doubt it, I suspect the callbacks are only triggered if the connection they are registered with does the update, because otherwise all sorts of asynchronous things happen, and there's no event handler.
Another idea is to use triggers to update a database table of modification times. Define triggers on all tables you care about so that they update a row in a "last modified" table. Then use the file modification time to check for any change to the database, and then your R only has to query the "last modified" table to see what specific table has changed since last check.

Determine flyway variables from earlier SQL step

I'd like to use flyway for a DB update with the situation that an DB already exists with productive data in it. The problem I'm looking at now (and I did not find a nice solution yet), is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new
object IDs. However I do not know at development time, what ID
numbers these will be
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determine the actual IDs every time the trigger runs but rather put the IDs directly into the trigger
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the flyway placeholders but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like - so, that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.

SQLite Modify Column

I need to modify a column in a SQLite database but I have to do it programatically due to the database already being in production. From my research I have found that in order to do this I must do the following.
Create a new table with new schema
Copy data from old table to new table
Drop old table
Rename new table to old tables name
That seems like a ridiculous amount of work for something that should be relatively easy. Is there not an easier way? All I need to do is change a constraint on a existing column and give it a default value.
That's one of the better-known drawbacks of SQLite (no MODIFY COLUMN support on ALTER TABLE), but it's on the list of SQL features that SQLite does not implement.
edit: Removed bit that mentioned it may being supported in a future release as the page was updated to indicate that is no longer the case
If the modification is not too big (e.g. change the length of a varchar), you can dump the db, manually edit the database definition and import it back again:
echo '.dump' | sqlite3 test.db > test.dump
then open the file with a text editor, search for the definition you want to modify and then:
cat test.dump | sqlite3 new-test.db
As said here, these kind of features are not implemented by SQLite.
As a side note, you could make your two first steps with a create table with select:
CREATE TABLE tmp_table AS SELECT id, name FROM src_table
When I ran "CREATE TABLE tmp_table AS SELECT id, name FROM src_table", I lost all the column type formatting (e.g., time field turned into a integer field
As initially stated seems like it should be easier, but here is what I did to fix. I had this problem b/c I wanted to change the Not Null field in a column and Sqlite doesnt really help there.
Using the 'SQLite Manager' Firefox addon browser (use what you like). I created the new table by copying the old create statement, made my modification, and executed it. Then to get the data copied over, I just highlighted the rows, R-click 'Copy Row(s) as SQL', replaced "someTable" with my table name, and executed the SQL.
Various good answers already given to this question, but I also suggest taking a look at the sqlite.org page on ALTER TABLE which covers this issue in some detail: What (few) changes are possible to columns (RENAME|ADD|DROP) but also detailed workarounds for other operations in the section Making Other Kinds Of Table Schema Changes and background info in Why ALTER TABLE is such a problem for SQLite. In particular the workarounds point out some pitfalls when working with more complex tables and explain how to make changes safely.

Resources