What are job.get() and job.getBoolean() in MapReduce?

I am working on PDF document clustering over Hadoop, so I am learning MapReduce by reading some examples on the internet. The WordCount example has lines such as:
job.get("map.input.file")
job.getBoolean()
What do these functions do? And what exactly is map.input.file, where is it set? Or is it just a name given to the input folder?
Please post an answer if you know.
For the code, see the WordCount 2.0 example at http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html

These are job configurations, i.e. the set of configurations that is passed to each mapper and reducer. They include well-defined MapReduce/Hadoop configurations as well as user-defined ones.
In your case, map.input.file is a predefined configuration: the framework sets it to the path of the input file the map task is currently processing.
wordcount.skip.patterns, on the other hand, is a custom configuration set from the user's input, and you can see it being set in run() as follows:
conf.setBoolean("wordcount.skip.patterns", true);
As for when to use get versus getBoolean, it should be self-explanatory: whenever you want a value of type boolean, use setBoolean and getBoolean to set and get that config value; there are corresponding methods for the other data types as well. If the value is a string, use get().
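To illustrate how the two sides fit together, here is a minimal sketch modeled on the old org.apache.hadoop.mapred API that the tutorial uses (the map body here is simplified and illustrative, not the actual WordCount logic):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private String inputFile;
    private boolean skipPatterns;

    // Called once per task; the JobConf carries every job configuration.
    @Override
    public void configure(JobConf job) {
        // Set by the framework: the file this map task is currently reading.
        inputFile = job.get("map.input.file");
        // User-defined flag; the driver sets it with
        // conf.setBoolean("wordcount.skip.patterns", true);
        // the second argument is the default if it was never set.
        skipPatterns = job.getBoolean("wordcount.skip.patterns", false);
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Simplified body: emit each input line once.
        output.collect(new Text(value.toString()), new IntWritable(1));
    }
}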

Related

Pass search strategy to filter from rest URI

First time using api-platform and Symfony 4 to create an API interface for a MySQL db.
I'm updating an old search interface for the db, for which I need to replicate many of the search options. This includes being able to search on a given field using various matching operators/strategies, e.g. starts with, contains, exactly equals, etc.
I've set everything up for the API using annotations.
The #ApiFilter(SearchFilter::class, properties={"fieldname": "strategy"}) annotation on my table class works as designed, but I am limited to one-and-only-one strategy per field. I need to be able to pass the strategy to the API search function in the URL, something like:
/api/staff?lastname[start]=dav
or
/api/staff?lastname=david&match=contains
or
/api/staff/lastname/son?searchtype=end
would be fine.
I can't figure out how to set this up. Shockingly (to me, anyway), this common requirement doesn't seem to be documented at all.
The file CustomSearchFilter.php in the repo https://github.com/jordonedavidson/custom_search_filter solves this use case using the
/api/staff?lastname[start]=dav
syntax.
The file was written by Kévin Dunglas (the author of Api Platform) and is presented with his blessing.

Robot Framework: Populating parameters based on a robot.properties file?

I am attempting to mock up a robot.properties file to be used within my test cases with Robot Framework. My robot.properties file contains entries like, for example:
project.username=stackoverflow
Inside my test case file I have tried several times to import the robot.properties file by adding, under Settings, Resource ../path/to/properties and so on (see directory structure below), but when I attempt to pass project.username as an argument to a test, it is passed as the literal string 'project.username' and not the value 'stackoverflow'. I am new to Robot; I have implemented this in other languages like Java/C#, and I fully assume that the import is preventing me from accessing my value. Any help would be greatly appreciated; unfortunately this way of driving testing isn't referenced much online, as far as I can find.
Dir Structure:
Tests/Acceptance/MyTestCase.robot
Tests/robot.properties
If I try Library ../robot.properties I get:
"Import by filename is not supported"
If I try Resource ../robot.properties I get:
"Unsupported file format .properties"
Robot Framework doesn't support ".properties" files.
One solution is to use a variable file, which lets you define variables in Python. Since you want to use dot notation, one way is to create a class and define your variables as properties of the class. The variable file can then make an instance of that class available as a variable, and you can use extended variable syntax to access its attributes.
The advantage of a variable file over a plain text file is that you can create variables dynamically by calling other Python functions. As a simple example, you could create a variable called "now" that contains the current date, or "host" that contains the hostname of the machine running the test.
Example:
properties.py
import platform

class Properties(object):
    username = "stackoverflow"
    password = "SuperSecret!"
    hostname = platform.uname()[1]

properties = Properties()
example.robot
*** Settings ***
Variables    properties.py
Suite Setup    log    running on ${properties.hostname}

*** Test Cases ***
Example
    should be equal    ${properties.username}    stackoverflow

Searching a DICOM server for metadata

I want to search a DICOM server. If, for example, the user enters a patient ID to search for, my app should populate a table with all the metadata relating to that ID (such as ID, name, accession number, etc.), provided the study exists on the DICOM server. How can this be done using the dcm4chee kit?
You can use the dcm4che3 tool dcm4che-tool-findscu. Its code shows you how to do a C-FIND against a PACS (or anything else implementing C-FIND as an SCP).
FindSCU.java is quite clear; take your time and don't get lost in the Apache Commons CLI code that handles console input. Most of the CLI management code is not in this project; you can find it in the dcm4che3 dcm4che-tool-common project, in the org.dcm4che3.tool.common.CLIUtils class.
Take into account the following considerations:
Specify the Query/Retrieve search level. You can use several search levels to match attributes on a PACS. If you look at lines 260-265 of FindSCU.java, you will see that you can use four different levels: PATIENT|STUDY|SERIES|IMAGE. This instructs the C-FIND SCP how to search for matching attributes.
Tell the C-FIND SCP which attributes you want returned. If you want to search for studies to be retrieved later, you must ask for the (0020,000D) StudyInstanceUID tag.
Of course, add all the attributes you want to populate your table with.
Use the retrieved (0020,000D) StudyInstanceUID tag value to do the C-GET/C-MOVE operation.
You can see how to configure attribute keys for the C-FIND SCU in the CLIUtils.java class, which is part of the dcm4che3 dcm4che-tool-common project. See CLIUtils.addAttributes(Attributes, String[]).
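As a minimal sketch of those considerations using the dcm4che3 Attributes API (the class name QueryKeys and the particular return keys chosen here are illustrative, not part of the tool):

import org.dcm4che3.data.Attributes;
import org.dcm4che3.data.Tag;
import org.dcm4che3.data.VR;

public class QueryKeys {
    // Build a STUDY-level C-FIND identifier: matching keys plus return keys.
    public static Attributes forPatient(String patientId) {
        Attributes keys = new Attributes();
        // Search level: PATIENT | STUDY | SERIES | IMAGE.
        keys.setString(Tag.QueryRetrieveLevel, VR.CS, "STUDY");
        // Matching key: the patient ID the user entered.
        keys.setString(Tag.PatientID, VR.LO, patientId);
        // Return keys: empty attributes the SCP fills in on each match.
        keys.setNull(Tag.PatientName, VR.PN);
        keys.setNull(Tag.AccessionNumber, VR.SH);
        keys.setNull(Tag.StudyInstanceUID, VR.UI); // needed later for C-GET/C-MOVE
        return keys;
    }
}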
Hope it helps!
Edit
Since your comment says you are using dcm4che2 and that you already have a DicomObject with the search result: to obtain metadata from this DicomObject you must parse it first, using DicomInputStream, and then you can use the getXXXX(Tag) methods from BasicDicomObject, something like this:
import org.dcm4che2.data.DicomObject;
import org.dcm4che2.data.Tag;
import org.dcm4che2.io.DicomInputStream;
DicomInputStream dis = new DicomInputStream(file);
try {
    DicomObject dcmObj = dis.readDicomObject();
    String someVar = dcmObj.getString(Tag.SeriesInstanceUID);
} finally {
    dis.close(); // always release the underlying stream
}
Keep in mind that some attributes live inside sequences, so you may have to look for them there first.
You can also take a look at dcm4che-tool-dcm2txt: in Dcm2Txt.java, at lines 170 and onwards, you can see how to parse a whole DICOM object.
If you need some general description about the DICOM network protocol, you could read the "Understanding DICOM with Orthanc" guide, and more specifically the section about C-Find.

Attempting to deploy a binary to a location where a different binary is already stored

When I am publishing my page from Tridion 2009, I am getting the error below:
Destination with name 'FTP=[Host=servername, Location=\RET, Password=******, Port=21, UserName=retftp]' reported the following failure:
A processing error occurred processing a transport package Attempting to deploy a binary [Binary id=tcm:553-974947-16 variantId= sg= path=/Images/image_thumbnail01.jpg] to a location where a different binary is already stored Existing binary: tcd:pub[553]/binarymeta[974950]
Below is my code snippet:
// Resolve the big image component and publish its binary content.
Component bigImageComp = th.GetComponentValue("bigimage", imageMetaFields);
string bigImagefileName = bigImageComp.BinaryContent.Filename;
string bigImagePath = m_Engine.AddBinary(bigImageComp.Id, TcmUri.UriNull, null,
    bigImageComp.BinaryContent.GetByteArray(), Path.GetFileName(bigImagefileName));
imageBigNode.InnerText = bigImagePath;
Please suggest a solution.
Chris Summers addressed this on his blog; have a read of the article: http://www.urbancherry.net/blogengine/post/2010/02/09/Unique-binary-filenames-for-SDL-Tridion-Multimedia-Components.aspx
Generally, Tridion Content Delivery can only keep one version of a Component. To get multiple "versions" of a multimedia component, we have to publish it as variants; this way we can produce as many variants as we need via templating.
You can refer to the article below for more detail:
http://yatb.mitza.net/2012/03/publishing-images-as-variants.html#!/2012/03/publishing-images-as-variants.html
When adding binaries you must ensure that the file and its metadata are unique. If one of the values, e.g. the filename, appears to be the same but the rest of the metadata does not match, deployment will fail.
In the given example (as Nuno points out) binary 910 is trying to deploy over binary 703. The filename is the same but the binaries are identified as not being the same (in this case, a different ID from the same publication). Here you will need to rename one of the binaries (either the file itself or its path) and everything will be fine.
Another scenario is that the same image is used from two different templates and the template ID is used as the variant ID. In that case it is the same image, BUT the variant ID check fails, so to avoid overwriting the image the deployer rejects it.
Unpublishing can often help; however, the image is only removed when ALL references to it are removed, so if it is used from more than one place there are still open references.
This is deliberate protection by the deployer. You would not want the wrong image replacing another, upsetting the layout or even changing the content's meaning (think of an advertising banner).
That is the actual cause of, and reason for, the problem above (taken from a forum).

I'm having problems configuring a filter that replicates specific tables only

I am trying to use filters to select specific tables to replicate.
I tried running this with the installer:
./tools/tungsten-installer --master-slave -a \
...
--svc-extractor-filters=replicate \
--property="replicator.filter.replicate.do=test,*.foo"
and got this exception in trepctl status after the master had failed to install properly:
Plugin class name property is missing or null: key=replicator.filter.replicate
Which properties file is this? How do I find it? Moreover, when specifying the settings for the filter, how do I know exactly what to put?
I discovered that I am supposed to modify the configuration template file prior to configuration, according to Issue 219, but what changes am I supposed to make in tungsten-replicator-2.0.5-diff that will later be patched into the extraction?
Issue 254 suggests that if you want to apply a filter out of the box, you can use these options with tungsten-installer:
-a --property=replicator.filter.Replicate.ignoreFilter=schema_x.tablex,schema_x,tabley,schema_y,tablez
--svc-thl-filter=Replicate
However, when I try the same approach with --property=replicator.filter.replicate.do,
the problem is still the same:
pendingExceptionMessage: Plugin class name property is missing or null: key=replicator.filter.replicate
Your assistance will be greatly appreciated.
Rumbi
Update:
I had a look at this file: /root/tungsten/tungsten-replicator/samples/conf/filters/default/tableignore.tpl. According to this sample, a static-SERVICE_NAME.properties file is supposed to have something like the following configured; please confirm that this is the correct syntax:
replicator.filter.tabledo=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.tabledo.script=${replicator.home.dir}/samples/scripts/javascript-advanced/tabledo.js
replicator.filter.tabledo.tables=foo(database).bar(table)
replicator.stage.thl-to-dbms.filters=tabledo
However, I did not find tabledo.js (or anything similar) in the directory where tableignore.js exists. Could I please have the location of this file? If there is an alternative way of specifying --property=replicator.filter.replicate.do=test without the use of this .js file, your suggestions are most welcome.
Download the latest version of Tungsten Replicator; the missing tpl file was added about a month ago. After installation, the filtered tables should be added to static-SERVICE_NAME.properties under the FILTERS section.
Locate your replicator configuration file, static-YOUR_SERVICE_NAME.properties, e.g.
/opt/continuent/tungsten/tungsten-replicator/conf/static-mysql2vertica.properties
Make sure the individual dbms properties are set, in particular replicator.applier.dbms:
# Batch applier basic configuration information.
replicator.applier.dbms=com.continuent.tungsten.replicator.applier.batch.SimpleBatchApplier
replicator.applier.dbms.url=jdbc:mysql:thin://${replicator.global.db.host}:${replicator.global.db.port}/tungsten_${service.name}?createDB=true
replicator.applier.dbms.driver=org.drizzle.jdbc.DrizzleDriver
replicator.applier.dbms.user=${replicator.global.db.user}
replicator.applier.dbms.password=${replicator.global.db.password}
replicator.applier.dbms.startupScript=${replicator.home.dir}/samples/scripts/batch/mysql-connect.sql
# Timezone and character set.
replicator.applier.dbms.timezone=GMT+0:00
replicator.applier.dbms.charset=UTF-8
# Parameters for loading and merging via stage tables.
replicator.applier.dbms.stageTablePrefix=stage_xxx_
replicator.applier.dbms.stageDirectory=/tmp/staging
replicator.applier.dbms.stageLoadScript=${replicator.home.dir}/samples/scripts/batch/mysql-load.sql
replicator.applier.dbms.stageMergeScript=${replicator.home.dir}/samples/scripts/batch/mysql-merge.sql
replicator.applier.dbms.cleanUpFiles=false
Depending on the database you are replicating to you may have to omit/modify some of the lines.
For more information see:
https://code.google.com/p/tungsten-replicator/wiki/Replicator_Batch_Loading
I don't know if this problem is still open or not.
I am using version 2.0.6-xxx, and installing the service using these parameters works for me.
I would like to point out that, as its name says, "--svc-extractor-filters" defines an extractor filter, meaning that the parameters will guide the extraction of data on the master server.
If you intend to use it on the slave service, you should use "--svc-applier-filters" instead.
The parameters
--svc-extractor-filters=replicate \
--property="replicator.filter.replicate.do=test,*.foo"
are supposed to create the following in the properties file.
This is the filter setup:
replicator.filter.replicate=com.continuent.tungsten.replicator.filter.ReplicateFilter
replicator.filter.replicate.ignore=
replicator.filter.replicate.do=test,*.foo
And you should also be able to find the
replicator.stage.binlog-to-q.filters=replicate
parameter set.
If you intend to use this filter on the slave, find the line
replicator.stage.q-to-dbms.filters=mysqlsessions,pkey,bidiSlave
and change it to
replicator.stage.q-to-dbms.filters=mysqlsessions,pkey,bidiSlave,replicate
Hope this brief description helps!
