KafkaIO GroupId after restart - apache-beam-io

I am using Apache Beam's KafkaIO to read from a Kafka topic. Everything works as expected, but when my job is terminated and restarted, the new job generates a new groupID and therefore ends up reading from the beginning of the topic.
In other words, if my initial job had group.id = Reader-0_offset_consumer_11111111_my_group, the next job may end up with group.id = Reader-0_offset_consumer_22222222_my_group. As you can see, a unique prefix gets prepended to my specified my_group.
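For reference, here is roughly how the reader is configured (a minimal sketch; the broker address, topic name, and the withConsumerConfigUpdates call are assumptions based on a recent Beam release, older releases expose updateConsumerProperties instead):

import java.util.HashMap;
import java.util.Map;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadFromKafka {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // group.id is passed through to the underlying Kafka consumer; KafkaIO still
    // prepends its own Reader-N_offset_consumer_<random>_ prefix for the internal
    // consumer it uses to track offsets.
    Map<String, Object> consumerConfig = new HashMap<>();
    consumerConfig.put(ConsumerConfig.GROUP_ID_CONFIG, "my_group");

    p.apply(KafkaIO.<String, String>read()
        .withBootstrapServers("kafka:9092")            // assumed broker address
        .withTopic("my_topic")                         // assumed topic name
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withConsumerConfigUpdates(consumerConfig));

    p.run().waitUntilFinish();
  }
}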
Is there any way to avoid this and keep the same group id each time?
Thank you

Related

Intershop: How To Delete a Channel After Orders Have Been Created

I am able to delete a channel from the back office UI and run the DeleteDomainReferences job in SMC to clear the reference and be able to create a new channel again with the same id.
However, once an order has been created, the above mentioned process won't work.
I heard that we can run some stored procedures against the database for situations like this.
Question: what are the stored procedures and steps to take to be able to clean any reference in Intershop so that I can create a channel with the same id again?
Update 9/26:
I configured a new job in SMC to call the DeleteDomainReferencesTransaction pipeline with the ToBeRemovedDomainID attribute set to the domain id that I am trying to clean up.
The job ran without error in the log file, though it finished almost instantly.
Then I ran the DeleteDomainReferences job in SMC. This is the job I normally run after deleting a channel when there are no orders in that channel. This job failed with the following exception in the log file.
ORA-02292: integrity constraint (INTERSHOP.BASKETADDRESS_CO001) violated - child record found
ORA-06512: at "INTERSHOP.SP_DELETELINEITEMCTNRBYDOMAIN", line 226
ORA-06512: at line 1
Then I checked the BASKETADDRESS table and did see records for that domain id. This is, I guess, the reason why the DeleteDomainReferences job failed.
I also executed the SP_BASKET_OBSERVER procedure with that domain id, but it didn't seem to make a difference.
Is there something I am missing?
sp_deleteLineItemCtnrByDomain
-- Description : This procedure deletes basket/order related stuff.
-- Input : domainID The domain id of the domain to be deleted.
-- Output : none
-- Example : exec sp_deleteLineItemCtnrByDomain(domainid)
This stored procedure should delete the orders. Look up the domainid that you want to delete in the domaininformation table and call this procedure.
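Roughly, that lookup and call could look like this (the DOMAININFORMATION column names are an assumption, so verify them against your schema):

-- Find the domain id of the channel to clean up (column names assumed).
select domainid, domainname
  from domaininformation
 where domainname = 'MyChannel';

-- Then call the cleanup procedure from SQL*Plus with that id.
exec sp_deleteLineItemCtnrByDomain('<domainid from the query above>')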
You can also call the pipeline DeleteDomainReferencesTransaction. Set up an SMC job that calls this pipeline with the domain id that you want to clean up as a parameter. It also calls a second stored procedure that cleans up the payment data, so it is actually a better approach.
Update 9/27
I tried this out on my local 7.7 environment. The DeleteDomainReferences job also removes the orders from the isorder table, so there is no need to run sp_deleteLineItemCtnrByDomain separately. After recreating the channel I see no old orders. I'm guessing that you discovered a bug in the version you are running, maybe related to the address table being split into different tables. Open a ticket with support to have them look at this.
With assistance from Intershop support, it has been determined that, in IS 7.8.1.4, sp_deleteLineItemCtnrByDomain.sql has an issue.
Lines 117 and 118 from 7.8.1.4:
delete from staticaddress_av where ownerid in (select uuid from staticaddress where lineitemctnrid = i.uuid);
delete from staticaddress where lineitemctnrid = i.uuid;
should be replaced by
delete from basketaddress_av where ownerid in (select uuid from basketaddress where basketid = i.uuid);
delete from basketaddress where basketid = i.uuid;
After making the stored procedure update, the DeleteDomainReferences job finishes without error and I was able to re-create the same channel again.
I was told the fix will become available in a 7.8.2 hotfix.

cache intersystems command to get the last updated timestamp of a table

I want to know the last update time of an InterSystems Caché DB table. Please let me know the relevant command. I went through their command documentation:
http://docs.intersystems.com/latest/csp/docboo/DocBook.UI.Page.cls?KEY=GTSQ_commands
But I don't see any such command there. I also tried searching through this:
http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_currenttimestamp
Is this not the complete documentation of commands?
Caché does not maintain "last updated" information by default, as it might introduce an unnecessary performance penalty on DML operations.
You can add this field manually to every table of interest:
Property LastUpdated As %TimeStamp [ SqlComputeCode = { Set {LastUpdated}= $ZDT($H, 3) }, SqlComputed, SqlComputeOnChange = (%%INSERT, %%UPDATE) ];
This way it would keep the time of the last UPDATE/INSERT for every row, but it still would not help you with DELETE.
Alternatively, you can set up triggers for every DML operation that maintain a timestamp in a separate table.
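A rough sketch of that idea in class-definition syntax (both class names and the audit schema are hypothetical; verify the trigger syntax against your Caché version):

/// Hypothetical audit class: one row per monitored table.
Class App.TableAudit Extends %Persistent
{
Property TableName As %String;
Property LastChanged As %TimeStamp;
Index TableNameIdx On TableName [ Unique ];
}

/// Hypothetical data class; the trigger stamps the audit row on any DML.
Class App.Person Extends %Persistent
{
Property Name As %String;

Trigger RecordChange [ Event = INSERT/UPDATE/DELETE, Foreach = row/object, Time = AFTER ]
{
    // Assumes a row for 'App.Person' already exists in App.TableAudit.
    &sql(UPDATE App.TableAudit
         SET LastChanged = CURRENT_TIMESTAMP
         WHERE TableName = 'App.Person')
}
}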
Without additional coding, the only way to gather this information is to scan the journal files, which is not really their intended use and would be slow at best.

BizTalk 2013 file receive location trigger on non-event

I have a file receive location which is scheduled to run at a specific time of day. I need to trigger an alert or email if the receive location is unable to find any file at that location.
I know I can create custom components or use BizTalk 360 to do so, but I am looking for an out-of-the-box BizTalk feature.
BizTalk is not very good at triggering on non-events. Non-events are things that did not happen, but still represent a certain scenario.
What you could do is:
Insert the filename of any file triggering the receive location in a custom SQL table.
Once per day (scheduled task adapter or polling via stored procedure) you would trigger a query on the SQL table, which would only create a message in case no records were made that day.
Also think about cleanup: that approach will require you to delete any existing records.
Another option could be a scheduled task with a custom C# program which would create a file only if there were no input files, etc.
The sequential convoy solution should work, but I'd be concerned about a few things:
It might consume the good message when the other subscriber is down, which might cause you to miss what you'd normally consider a subscription failure
Long running orchestrations can be difficult to manage and maintain. It sounds like this one would be running all day/night.
I like Pieter's suggestion, but I'd expand on it a bit:
Create a table, something like this:
CREATE TABLE tFileEventNotify
(
    ReceiveLocationName VARCHAR(255) NOT NULL PRIMARY KEY,
    LastPickupDate DATETIME NOT NULL,
    NextPickupDate DATETIME NOT NULL,
    NotificationSent BIT NULL,
    CONSTRAINT CK_FileEventNotify_Dates CHECK (NextPickupDate > LastPickupDate)
);
You could also create a procedure for this, which should be called every time you receive a file at that location (from a custom pipeline or an orchestration), something like:
CREATE PROCEDURE usp_Mrg_FileEventNotify
(
    @rlocName VARCHAR(255),
    @LastPickupDate DATETIME,
    @NextPickupDate DATETIME
)
AS
BEGIN
    IF EXISTS (SELECT 1 FROM tFileEventNotify WHERE ReceiveLocationName = @rlocName)
    BEGIN
        UPDATE tFileEventNotify
           SET LastPickupDate = @LastPickupDate,
               NextPickupDate = @NextPickupDate
         WHERE ReceiveLocationName = @rlocName;
    END
    ELSE
    BEGIN
        INSERT tFileEventNotify (ReceiveLocationName, LastPickupDate, NextPickupDate)
        VALUES (@rlocName, @LastPickupDate, @NextPickupDate);
    END
END
And then you could create a polling port that had the following Polling Data Available statement:
SELECT 1 FROM tFileEventNotify WHERE NextPickupDate < GETDATE() AND ISNULL(NotificationSent, 0) <> 1
And write up a procedure to produce a message from that table that you could then map to an email sent via SMTP port (or whatever other notification mechanism you want to use). You could even add columns to tFileEventNotify like EmailAddress or SubjectLine etc. You may want to add a field to the table to indicate whether a notification has already been sent or not, depending on how large you make the polling interval. If you want it sent every time you can ignore that part.
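As a rough sketch (the procedure name and returned columns are assumptions), such a procedure could both return the overdue receive locations for the SMTP map and flag them so they are not re-sent:

CREATE PROCEDURE usp_Get_FileEventNotifications
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows that are overdue and have not been notified yet.
    SELECT ReceiveLocationName, LastPickupDate, NextPickupDate
      FROM tFileEventNotify
     WHERE NextPickupDate < GETDATE()
       AND ISNULL(NotificationSent, 0) <> 1;

    -- Mark them as notified so the polling statement stops firing for them.
    UPDATE tFileEventNotify
       SET NotificationSent = 1
     WHERE NextPickupDate < GETDATE()
       AND ISNULL(NotificationSent, 0) <> 1;
END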
One option is to set up a BAM Alert to trigger if no file is received during the day.
Here's one mostly out of the box solution:
BizTalk Server: Detecting a Missing Message
Basically, it's an Orchestration that listens for any message from that Receive Port and resets a timer. If the timer expires, it can do something.

SOLR Delta import takes longer than next scheduled delta import cron job

We are using Solr 5.0.0. The delta-import configuration is very simple, just like the Apache wiki example.
We have setup cron job to do delta-imports every 30 mins, simple setup as well:
0,30 * * * * /usr/bin/wget http://<solr_host>:8983/solr/<core_name>/dataimport?command=delta-import
Now, what happens if the currently running delta-import sometimes takes longer than the next scheduled cron job?
Does Solr launch the next delta-import in a parallel thread, or ignore the request until the previous one is done?
Extending the interval in the cron scheduler isn't an option, as a similar problem could happen again as the user and document numbers increase over time...
I had a similar problem on my end.
Here is how I worked around it.
Note: I have implemented Solr with multiple cores.
I have one table in which I keep info about Solr, such as core name, last re-index date, re-indexing-required, and current_status.
I have written a scheduler that checks in the above table which cores need re-indexing (delta-import) and starts the re-index.
Re-indexing requests are sent every 20 minutes (in your case, 30 minutes).
When I start the re-indexing I also update the table and mark the status for the specific core as "inprogress".
After ten minutes I fire a request to check whether the re-indexing has completed.
To check the re-indexing status I use the following request:
final URL url = new URL(SOLR_INDEX_SERVER_PROTOCOL, SOLR_INDEX_SERVER_IP, Integer.valueOf(SOLR_INDEX_SERVER_PORT),
"/solr/"+ core_name +"/select?qt=/dataimport&command=status");
I check the status for "Committed" or "idle", then consider the re-indexing completed and mark that core's status as "idle" in the table.
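A minimal sketch of that check (plain string matching on the raw response is an assumption; a real implementation would parse the XML/JSON status instead):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DihStatusCheck {

    // Fetch the DIH status page for a core and report whether it looks idle.
    public static boolean isIdle(String protocol, String host, int port, String coreName) throws Exception {
        URL url = new URL(protocol, host, port,
                "/solr/" + coreName + "/select?qt=/dataimport&command=status");

        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        // DIH reports "idle" when no import is running and "busy" while one is in progress.
        return body.toString().contains("idle");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isIdle("http", "localhost", 8983, "my_core"));
    }
}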
So the re-indexing scheduler won't pick cores which are in "inprogress" status.
It also considers for re-indexing only those cores that have updates (identified by the flag "re-indexing-required").
Re-indexing is invoked only if re-indexing-required is true and the current status is idle.
If there are updates (re-indexing-required is true) but current_status is "inprogress", the scheduler won't pick that core for re-indexing.
I hope this helps.
Note: I have used DIH (DataImportHandler) for indexing and re-indexing.
Solr will simply ignore the next import request until the first one has finished, and it will not queue the second request. I can observe this behaviour, and I have read it somewhere but couldn't find the reference now.
In fact I'm dealing with the same problem. I tried to optimize the queries:
deltaImportQuery="select * from Assests where ID='${dih.delta.ID}'"
deltaQuery="select [ID] from Assests where date_created > '${dih.last_index_time}' "
I only retrieve the ID field in the delta query and then retrieve the intended document in the delta import query.
You may also specify your fields instead of the '*' sign; since I use a view, that doesn't apply in my case.
I will update if I find another solution.
Edit After Solution
Beyond the queries suggested above, I changed one more thing that sped up my indexing process 10 times. I had two big entities nested; I used one entity inside the other, like:
<entity name="TableA" query="select * from TableA">
    <entity name="TableB" query="select * from TableB where TableB.TableA_ID='${TableA.ID}'"/>
</entity>
This yields multi-valued TableB fields, but for every TableA row a separate request is made to the database for TableB.
I changed my view to use a WITH clause combined with a comma-separated field value, parsed that value via the Solr field mapping, and indexed it into a multi-valued field.
My whole indexing process sped up from hours to minutes. Below are my view and Solr mapping config.
WITH tableb_with AS (SELECT * FROM TableB)
SELECT *,
       STUFF((SELECT ',' + REPLACE(fieldb1, ',', ';') FROM tableb_with WHERE tableb_with.TableA_ID = TableA.ID
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') AS field1WithComma,
       STUFF((SELECT ',' + REPLACE(fieldb2, ',', ';') FROM tableb_with WHERE tableb_with.TableA_ID = TableA.ID
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') AS field2WithComma
FROM TableA
All the fancy joins and unions go into the WITH clause for TableB, and there are also a lot of joins on TableA. This view holds about 200 fields in total.
The Solr mapping goes like this:
<field column="field1WithComma" name="field1" splitBy=","/>
Hope it may help someone.

Extracting data files for different dates from database table

I am on Windows and on Oracle 11.0.2.
I have a table TEMP_TRANSACTION consisting of transactions for 6 months or so. Each record has a transaction date and other data with it.
Now I want to do the following:
1. Extract data from the table for each transaction date;
2. Create a flat file named after the transaction date;
3. Output the data for this transaction date to the flat file;
4. Move on to the next date and do steps 1-3 again.
I created a simple SQL script to spool the data out for a single transaction date, and it works. Now I want to put this in a loop or something similar so that it iterates over each transaction date.
I know this is asking for something from scratch but I need pointers on how to proceed.
I have Powershell, Java at hand and no access to Unix.
Please help!
Edit: Removed PowerShell, as my primary goal is to get this out of Oracle (PL/SQL) and, if not, then explore PowerShell or Java.
-Abhi
I was finally able to achieve what I was looking for. Below are the steps (maybe not the most efficient ones, but it worked :) ).
Created a SQL script which spools the data I was looking for (for a single day).
set colsep '|'
column spoolname new_val spoolname;
select 'TRANSACTION_' || substr(&1,0,8) ||'.txt' spoolname from dual;
set echo off
set feedback off
set linesize 5000
set pagesize 0
set sqlprompt ''
set trimspool on
set headsep off
set verify off
spool &spoolname
Select
''||local_timestamp|| ''||'|'||Field1|| ''||'|'||field2
from << transaction_table >>
where local_timestamp = &1;
select 'XX|'|| count(1)
from <<source_table>>
where local_timestamp = &1;
spool off
exit
I created a file named content.txt and populated it with the local timestamp values (i.e. the transaction date timestamps), such as:
20141007000000
20140515000000
20140515000000
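If it helps, this list of dates can be spooled from the table itself instead of being typed by hand; a rough SQL*Plus sketch (assuming LOCAL_TIMESTAMP is a DATE column; drop the TO_CHAR if it is already stored in that YYYYMMDDHH24MISS form):

-- Spool the distinct transaction dates into content.txt,
-- one value per line, for the PowerShell loop below.
set pagesize 0
set feedback off
set trimspool on
spool C:\TEMP\content.txt
select distinct to_char(local_timestamp, 'YYYYMMDDHH24MISS')
  from TEMP_TRANSACTION
 order by 1;
spool off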
Finally, I used a loop in PowerShell which picks up one value at a time from content.txt, calls the SQL script (from step 1), and passes the parameter:
PS C:\TEMP\data> $content = Get-Content C:\TEMP\content.txt
PS C:\TEMP\data> foreach ($line in $content){sqlplus user/password '@C:\temp\ExtractData.sql' $line}
And that is it!
I still have to refine a few things, but at least the idea of splitting the data is working :)
Hope this helps others who are looking for a similar thing.
