How to process lines in a file on a specific Hadoop slave?

We have a custom input format extending FileInputFormat, which generates a separate split for each line in the input file. The file provides the host name on which the mapper handling each line should run.
How do I achieve this?
This is needed because the mapper reads data from a DB and I want to run the mapper on the same machine as the DB server.

Not possible without writing your own implementation within the Hadoop code base.
If you are trying to add more data to the map input, pass it in as an argument to the job. It is then available in your map(), where you can concatenate it with the input.

Related

How can I access context property (incoming file name) in transformation (custom xslt)?

There are many historical posts about the BizTalk Context Accessor (CodePlex), but all the links are broken. Is there a state-of-the-art context accessor functoid / component to use today? Or is there another way, such as creating a helper class?
My aim is to add file name (without path) into the destination message in a map using Custom XSLT. No existing orchestration, only picking up a file and running a map to transform message from source to destination format (that requires source file name added to it...).
I solved my problem (this time) using an orchestration where I can access the context of the incoming message easily, and after mapping, inject/update the outgoing message with the file name.
I had one additional problem to solve, which helped me accept an orchestration as the solution this time. Two birds with one stone.
(The problem was, as a note to self: I wanted to reuse the destination schema in another debatching scenario, i.e. it was an envelope schema. Funnily enough, BizTalk was not able to resolve the body content schema when the map was run in the receive port. When the map ran inside an orchestration, however, it was able to resolve the body content schema, and mapping to the envelope schema as destination worked.)
An alternative to the Context Accessor functoid is to use the BRE Pipeline Framework to read the context property and inject it into the XML payload.

Executing a method which is named via a config file

In short: I have a method name provided via a JSON configuration file. I'd like to call a method using this provided name. The method (with a matching name) will exist in the backend. What's the best way of going about this?
I am not quite sure what I should be searching for as an example.
In detail: I am working with a legacy application, hence the VB.NET. I am building a single PDF file from multiple PDF sources. Most of these are used as is: I simply read the configuration, grab the relevant files and the job is done. However, some require processing, and I'd like the configuration file to pass in the name of a method that will perform extra processing on the PDF, whatever that may be.
As there can be a lot of PDF files that can vary, I cannot simply use a property such as "PostProcessing: true".
Any ideas?
You could use reflection to look up a method by the name passed in from the property in the config file.
Like so (in C#):
Type magicType = Type.GetType("MagicClass");
MethodInfo magicMethod = magicType.GetMethod("ItsMagic");
object magicValue = magicMethod.Invoke(magicClassObject, new object[]{100});
That would work, but to be honest I'd go with a case statement, as you'll be hardcoding the method names anyway (because they are code), and it will be strongly typed (less chance of typos and errors).
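The trade-off above can be sketched with an explicit lookup table, which keeps the by-name call of reflection while fixing the set of callable methods in code, as a case statement would. This is a minimal illustration in Python rather than VB.NET, chosen only for brevity. The handler names, the "File" and "PostProcessing" keys and the sample config are all hypothetical.

```python
import json


# Hypothetical post-processing steps; real ones would modify the PDF.
def rotate_pages(pdf_name):
    return f"{pdf_name}: rotated"


def add_watermark(pdf_name):
    return f"{pdf_name}: watermarked"


# Explicit table of the method names the config file is allowed to use.
POST_PROCESSORS = {
    "RotatePages": rotate_pages,
    "AddWatermark": add_watermark,
}


def post_process(entry):
    """Run the step named in a config entry, if the entry names one."""
    name = entry.get("PostProcessing")
    if name is None:
        return f"{entry['File']}: unchanged"
    try:
        handler = POST_PROCESSORS[name]
    except KeyError:
        # A typo in the config fails loudly instead of calling something unexpected.
        raise ValueError(f"Unknown post-processing step: {name!r}") from None
    return handler(entry["File"])


# Assumed config shape: a list of entries, some naming a step.
config = json.loads(
    '[{"File": "a.pdf"}, {"File": "b.pdf", "PostProcessing": "AddWatermark"}]'
)
results = [post_process(entry) for entry in config]
```

Adding a new step then means writing the function and adding one line to the table. A name in the config that is not in the table is rejected.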

Use of struct in CAPL CANalyzer

I'm writing a piece of code to simulate some diagnostic functionality.
With CANalyzer I've created a panel with tons of information that needs to be shown using a picklist (called a combobox).
What I want to do is to create a giant array of that struct that needs to be selected using the SPN combobox (the picklist), and the other parameters of the struct/object need to populate the other elements of the panel.
Is this possible without doing tons of SysSetVariableInt or SysSetVariableString calls for each element?
Previously I did this with another technique: I parse the file containing all the information, which is stored in a giant matrix, then I use "on sysvar update" on the variable associated with the SPN picklist to get its index, search for that index in the matrix, and use SysSetVariableInt or similar to set the values of the elements in the panel.
To populate the picklist I found a pretty nice method, "sysSetVariableDescriptionForValue", that helps to add elements. The problem with this method is that you can only overwrite entries, not replace the whole set, so if a later iteration pushes fewer elements into the picklist, you will still see the old ones.
With "sysSetVariableDescriptionForValue" you are basically writing the value table of that system variable via code, and (according to Vector) it cannot be flushed at runtime... :/
I would love to do this using another approach; maybe it is possible with the struct... I really don't know.
Any help would be much appreciated!
Regards!
TL;DR: build a tool that creates a .sysvar file from a structured input (comma-separated, for instance), run it, and link the resulting .sysvar file to the CANalyzer configuration.
I once had to create the entire testing interface with some components of the software. We didn't have a structured release procedure, and the test environment was rebuilt every time from scratch based on the new internal software interfaces. I too had to add hundreds of variables.
My solution was to generate .sysvar files programmatically outside CANalyzer. Links to the .sysvar files are symbolic in the CANalyzer configuration, meaning that if a file with the right name is in the right location, that file is going to be loaded.
"What I want to do is to create a giant array of that struct that needs to be selected using the SPN combobox (the picklist), and the other parameters of the struct/object need to populate the other elements of the panel. Is this possible without doing tons of SysSetVariableInt or SysSetVariableString calls for each element?"
Create an external script to generate the .sysvar file. In the end it is just an XML file, and you can study the structure of a demo one that you save from CANalyzer. Then import that file into the CANalyzer config. You may need to close and re-open the configuration when the .sysvar file changes.
PROs: no need to write a complicated CAPL script and update it every time a variable changes.
CONs: you must have a source for all the information, even a simple Excel sheet, with all the descriptions and such, and you have to create a tool that accepts the input file (let's assume a .csv file) and turns it into an XML file with the .sysvar extension.
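As a rough sketch of such a tool, the Python snippet below turns CSV rows into an XML document holding one variable with a value table. The element and attribute names ("systemvariables", "namespace", "variable", "valuetable", "valuetableentry") are placeholders, not a verified description of the .sysvar schema. The CSV column names, the "Diag" namespace and the sample rows are likewise made up. Copy the real structure from a .sysvar file saved by CANalyzer before relying on the output.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_sysvar_xml(csv_text, namespace="Diag", variable="SPN"):
    """Build an XML string with one value-table entry per CSV row.

    All tag and attribute names are placeholders (assumption); replace them
    with the ones found in a real .sysvar file.
    """
    root = ET.Element("systemvariables")
    ns = ET.SubElement(root, "namespace", name=namespace)
    var = ET.SubElement(ns, "variable", name=variable, type="int")
    table = ET.SubElement(var, "valuetable")
    # Assumed CSV layout: a header row with "value" and "description" columns.
    for row in csv.DictReader(io.StringIO(csv_text)):
        ET.SubElement(
            table,
            "valuetableentry",
            value=row["value"],
            description=row["description"],
        )
    return ET.tostring(root, encoding="unicode")


# Made-up sample input standing in for an exported spreadsheet.
sample_csv = "value,description\n0,Engine Speed\n1,Coolant Temperature\n"
xml_text = build_sysvar_xml(sample_csv)
```

Writing xml_text to a file with the .sysvar extension, in the location the configuration expects, would complete the workflow described above. Because the whole file is regenerated on each run, stale picklist entries do not survive a regeneration.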

BAM looping over multiple XML files from a BizTalk Orchestration

I have a BizTalk Orchestration which loops to create multiple XML files. I have configured BAM activities and views and deployed the Tracking .btt file to track the data.
The BAM activity does not loop through these multiple XML files, it creates only one instance. I need the BAM activity to loop through all the XML files.
Have you tried calling the BAM API directly within your looping structure?
Put an expression shape in the loop with something like this:
Microsoft.BizTalk.Bam.EventObservation.OrchestrationEventStream.BeginActivity("someactivity", someID);
Microsoft.BizTalk.Bam.EventObservation.OrchestrationEventStream.UpdateActivity("someactivity", someID, "someProperty", someNamespace);
Microsoft.BizTalk.Bam.EventObservation.OrchestrationEventStream.EndActivity("someactivity", someID);
Have a look at the Typed BAM API.
https://generatetypedbamapi.codeplex.com/
You should initiate a new BAM Activity from within the loop.
Also, make sure you use a unique ActivityId for each XML file in your loop. I suspect this is the problem you are experiencing now.

What's a good way of deserializing data into mock objects?

I'm writing a mock backend service for my Flex application. Because I will likely need to add, edit and modify the mock data over time, I'd prefer not to generate the data in code like this:
var mockData = new Array();
mockData.push(new Foo(1, "abc", "xyz"));
mockData.push(new Foo(2, "def", "xyz"));
...
Rather, I'd like to store the data in a file, in a format that can easily be deserialized into my strongly-typed value objects (i.e. Foo above). Ideally I'd like to create the data in a self-describing format (i.e. one that records what data type each field is, what class it represents, etc.).
Does this make sense? Any suggestions?
I would highly recommend the asx3m library. It easily allows serialization to a very readable XML format like this for an object of class Foo:
<com.example.Foo>
<myVar>Something</myVar>
<myArrList>
<string>one</string>
<string>two</string>
</myArrList>
</com.example.Foo>
The code to de-serialize looks like this:
Asx3mer.instance.fromXML(someXMLObj)
The project site has some good examples and it's not too hard to get this off the ground.
Write a method to serialize an "inflated" version of your object. Put the output of that into a file and load it up as part of your test setup. When you want to edit the values, simply edit the XML file. I don't know if this is possible in Flex, but I usually include these files as a resource in my test library so that I do not need to copy the file to any specific location for a test run.
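The round trip described above (serialize a populated object once, then load the edited file during test setup) can be sketched as follows. This is an illustration in Python with JSON, swapped in for ActionScript and XML only for brevity. The field names of Foo are hypothetical, since the question only shows positional constructor arguments.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Foo:
    # Hypothetical field names; the question only shows Foo(1, "abc", "xyz").
    id: int
    name: str
    group: str


def dump_mock_data(items):
    """Serialize populated objects to editable text (run once to seed the file)."""
    return json.dumps([asdict(item) for item in items], indent=2)


def load_mock_data(text):
    """Rebuild typed objects from the stored text (run in test setup)."""
    return [Foo(**record) for record in json.loads(text)]


mock_text = dump_mock_data([Foo(1, "abc", "xyz"), Foo(2, "def", "xyz")])
mock_data = load_mock_data(mock_text)
```

In practice mock_text would live in a file that is edited by hand. A record with a missing or misspelled field then fails in the Foo constructor at load time rather than later in the test.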
