I'm having problems configuring a filter that replicates specific tables only - tungsten-replicator

I am trying to use filters to select specific tables to replicate.
I tried running this with the installer:
./tools/tungsten-installer --master-slave -a \
...
--svc-extractor-filters=replicate \
--property=replicator.filter.replicate.do=test,*.foo
and got this exception in trepctl status after the master had failed to install properly:
Plugin class name property is missing or null: key=replicator.filter.replicate
Which file is this properties file, and how do I find it? Moreover, when specifying the settings for the filter, how do I know what exactly to put?
I discovered that, according to Issue 219, I am supposed to modify the configuration template file prior to configuration, but what changes am I supposed to make in tungsten-replicator-2.0.5-diff that will later be patched into the extraction?
Issue 254 suggests that if you want to apply a filter out of the box, you can use these options with tungsten-installer:
-a --property=replicator.filter.Replicate.ignoreFilter=schema_x.tablex,schema_x,tabley,schema_y,tablez
--svc-thl-filter=Replicate
However, when I try the same approach with --property=replicator.filter.replicate.do, the problem is still the same:
pendingExceptionMessage: Plugin class name property is missing or null: key=replicator.filter.replicate
Your assistance will be greatly appreciated.
Rumbi
Update:
Hi
I had a look at this file: /root/tungsten/tungsten-replicator/samples/conf/filters/default/tableignore.tpl. According to this sample, a static-SERVICE_NAME.properties file is supposed to have something like this configured; please confirm whether this is the correct syntax:
replicator.filter.tabledo=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.tabledo.script=${replicator.home.dir}/samples/scripts/javascript-advanced/tabledo.js
replicator.filter.tabledo.tables=foo(database).bar(table)
replicator.stage.thl-to-dbms.filters=tabledo
However, I did not find tabledo.js (or anything similar) in the directory where tableignore.js exists. Could I please have the location of this file? If there is an alternative way of specifying --property=replicator.filter.replicate.do=test without the use of this .js file, your suggestions are most welcome.

Download the latest version of Tungsten Replicator. The missing .tpl file was added about a month ago. After installation, the filtered tables should be listed in static-SERVICE_NAME.properties under the FILTERS section.

Locate your replicator configuration file in static-YOUR_SERVICE_NAME.properties, e.g.
/opt/continuent/tungsten/tungsten-replicator/conf/static-mysql2vertica.properties
Make sure the individual dbms properties are set, in particular the setting replicator.applier.dbms:
# Batch applier basic configuration information.
replicator.applier.dbms=com.continuent.tungsten.replicator.applier.batch.SimpleBatchApplier
replicator.applier.dbms.url=jdbc:mysql:thin://${replicator.global.db.host}:${replicator.global.db.port}/tungsten_${service.name}?createDB=true
replicator.applier.dbms.driver=org.drizzle.jdbc.DrizzleDriver
replicator.applier.dbms.user=${replicator.global.db.user}
replicator.applier.dbms.password=${replicator.global.db.password}
replicator.applier.dbms.startupScript=${replicator.home.dir}/samples/scripts/batch/mysql-connect.sql
# Timezone and character set.
replicator.applier.dbms.timezone=GMT+0:00
replicator.applier.dbms.charset=UTF-8
# Parameters for loading and merging via stage tables.
replicator.applier.dbms.stageTablePrefix=stage_xxx_
replicator.applier.dbms.stageDirectory=/tmp/staging
replicator.applier.dbms.stageLoadScript=${replicator.home.dir}/samples/scripts/batch/mysql-load.sql
replicator.applier.dbms.stageMergeScript=${replicator.home.dir}/samples/scripts/batch/mysql-merge.sql
replicator.applier.dbms.cleanUpFiles=false
Depending on the database you are replicating to, you may have to omit or modify some of these lines.
For more information see:
https://code.google.com/p/tungsten-replicator/wiki/Replicator_Batch_Loading

I don't know if this problem is still open or not.
I am using version 2.0.6-xxx, and installing the service using these parameters works for me.
I would like to point out that, as the name says, "--svc-extractor-filters" defines an extractor filter, meaning that the parameters guide the extraction of data on the master server.
If you intend to use the filter on the slave side, you should use "--svc-applier-filters" instead, as in the sketch below.
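For example, a sketch of the applier-side variant of the installer command from the question (only the filter flag changes; the flags elided as "..." stay as in your original command):
./tools/tungsten-installer --master-slave -a \
...
--svc-applier-filters=replicate \
--property=replicator.filter.replicate.do=test,*.foo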
The parameters
--svc-extractor-filters=replicate \
--property=replicator.filter.replicate.do=test,*.foo
are supposed to create the following in the properties file:
# This is the filter setup.
replicator.filter.replicate=com.continuent.tungsten.replicator.filter.ReplicateFilter
replicator.filter.replicate.ignore=
replicator.filter.replicate.do=test,*.foo
You should also be able to find the
replicator.stage.binlog-to-q.filters=replicate
parameter set.
If you intend to use this filter on the slave, find the line
replicator.stage.q-to-dbms.filters=mysqlsessions,pkey,bidiSlave
and change it to
replicator.stage.q-to-dbms.filters=mysqlsessions,pkey,bidiSlave,replicate
Hope this brief description helped you!

Related

How to run liquibase changelogSyncSQL or changelogSync up to a tag and/or labels?

I'm adding liquibase to an existing project where tables and data already exist. I would like to know how I might limit the scope of changelogSync[SQL] to a subset of available changes.
Background
I've run liquibase generateChangeLog to capture the current state and placed this into, say, src/main/resources/db/changelog/changes/V2021.04.13.00.00.00__init01.yaml.
I've also added another changeset to cover some new requirements in a new file. Let's call it src/main/resources/db/changelog/changes/V2021.04.13.00.00.00__new-feature.yaml.
I've added a main changelog file src/main/resources/db/changelog/db.changelog-master.yaml with the following contents:
databaseChangeLog:
  - includeAll:
      path: changes
      relativeToChangelogFile: true
I now want to ensure that when I run liquibase changelogSync[SQL] against a particular version of the db, the scope is limited to the first changelog, init01, thereby allowing a liquibase update or updateToTag et al. to continue with the changes following init01 from that point on.
I'm surprised to see that the changelogSync[SQL] commands don't seem to offer a way (that I can see from the docs) to do this.
Besides printing the SQL and manually changing it, is there something I've missed? Any suggested approaches welcome. Thanks!
What about changelogSyncToTagSQL?
Wouldn't it cover your needs?
Or maybe you could try changelogSyncSQL with the additional parameters "labels" and/or "context"?
Relevant docs: changelogSyncToTagSQL, context, labels.
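For the tag route, a minimal sketch (the changeset id/author and the tag name init01 are illustrative, and the exact CLI spelling varies across Liquibase versions). First add a tagDatabase changeset at the end of the init01 changelog:
databaseChangeLog:
  - changeSet:
      id: tag-init01
      author: me
      changes:
        - tagDatabase:
            tag: init01
Then mark everything up to the tag as already applied and let update handle the rest:
liquibase changelogSyncToTagSQL init01   (inspect the generated SQL first)
liquibase changelogSyncToTag init01
liquibase update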
As it stands, the only solution I've found is to generate the SQL and then manually edit its contents to filter out the change sets which don't correspond to your current schema.
Then you can apply the SQL to the db.

Dynamic output issue when rowset is empty

I'm running a u-sql script similar to this:
@output =
    SELECT Tag,
           Json
    FROM @table;
OUTPUT @output
TO @"/Dir/{Tag}/filename.txt"
USING Outputters.Text(quoting : false);
The problem is that @output is empty and the execution crashes. I already checked that if I don't use {Tag} in the output path, the script works well (it writes an empty file, but that's expected).
Is there a way to avoid the crash and simply not output anything?
Thank you
The form you are using is not yet publicly supported. Output to Files (U-SQL) documents the only supported version right now.
That said, depending on the runtime you are using and the flags you have set in the script, you might be running the private preview feature of outputting to a set of files. In that case, I would expect it to work properly.
Are you able to share a job link?

How to change reports or output saving location in robot framework from RIDE

When I run test cases from RIDE, the reports are saved in the path below:
C:\Windows\Temp\RIDExf4xla.d
I want to save reports in a specific path. Can I do this from RIDE? Is there any setting to change the reports location?
Can anyone please suggest a way to do it.
Thanks
Look at the --outputdir option in the Robot Framework documentation.
Here is what I use:
--outputdir C:/Robot/AutomationLogs/etc/etc --timestampoutputs
You use this one-liner in the "Arguments" field, right at the top of RIDE within the Run tab.
From Waman's comment, you can add format specifiers to the end of the argument to change the dir name dynamically; see the second answer within that SO question. This should be enough to get what you're asking for.
There is no way to set this within the UI itself; just set it by pasting that argument option into the "Arguments" field at the top.
Use the command below on the command line:
C:\Tests\> robot -d C:\Test_results Test.robot

How does one create optional command line arguments in oozie workflow xml

Please bear in mind that I'm a complete rookie with Oozie. I know that one can specify command line arguments in the Oozie workflow XML by using the arg tag. I wondered how it is possible to specify an optional command line argument, such that Oozie will not complain that a required parameter is missing if the user doesn't specify it?
Many thanks in advance. If the information I've given is not specific enough, I can provide a concrete example when I log into my work machine tomorrow. We use Apache Commons CLI to parse the options.
E.g. I want to make the following argument optional:
-e${endDateTime}
In your workflow, wherever you would use ${myparam}, replace it with ${firstNotNull(wf:conf('myparam'), 'mydefaultvalue')}
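For the -e example from the question, a sketch of how that looks inside the action (the surrounding java action element and the default date value are illustrative):
<action name="my-action">
    <java>
        ...
        <arg>-e${firstNotNull(wf:conf('endDateTime'), '1970-01-01T00:00Z')}</arg>
    </java>
    ...
</action>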
In theory, you should be able to use a "config-default.xml" file next to your "workflow.xml" file to give default values to the params in the workflow (see https://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html), but I couldn't get it working.

What is job.get() and job.getBoolean() in mapreduce

I am working on PDF document clustering over Hadoop, so I am learning MapReduce by reading some examples on the internet. The WordCount example has lines like
job.get("map.input.file")
job.getBoolean()
What is the function of these methods? What exactly is map.input.file, and where is it set? Or is it just a name given to the input folder?
Please post an answer if anyone knows.
For the code, see the WordCount 2.0 example: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html
These are job configurations, i.e. the set of configurations which are passed on to each mapper and reducer. These consist of well-defined mapreduce/hadoop configurations as well as user-defined ones.
In your case, map.input.file is a pre-defined configuration, and it is set to the path of the input file the current map task is reading (the input paths themselves are what you set on the job).
wordcount.skip.patterns, on the other hand, is a custom configuration set from the user's input, and you can see it being set in run() as follows:
conf.setBoolean("wordcount.skip.patterns", true);
As for when to use get and when to use getBoolean, it should be self-explanatory: whenever you want to read a value of type boolean, use getBoolean (and setBoolean to set it). Similarly, there are specific methods for other data types; if the value is a string, use get().
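A minimal sketch of both calls inside a mapper, using the old org.apache.hadoop.mapred API that the r1.0.4 tutorial is written against (the class name and the false default are illustrative; wordcount.skip.patterns comes from the tutorial):
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class SkipAwareMapper extends MapReduceBase {
    private String inputFile;
    private boolean skipPatterns;

    @Override
    public void configure(JobConf job) {
        // Pre-defined config: path of the file this map task is reading.
        inputFile = job.get("map.input.file");
        // User-defined config; falls back to false if it was never set.
        skipPatterns = job.getBoolean("wordcount.skip.patterns", false);
    }
}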
