WebSphere WorkManager Cluster Double Execution of the Job - asynchronous

I have an application deployed on WebSphere 8.5 with Java 1.7.1, defined with a cluster of two nodes.
In this application there is an EJB that submits an asynchronous job through a work manager.
The problem is that on WAS 8.5 the job is executed twice, once on each node of the cluster. On WAS 6.1 this did not happen.
The work is submitted via an AlarmManager. The relevant code is extracted below:
WorkManager wm = serviceLocator.getWorkManager("NameOfCustomWorkManager");
AsynchScope scope = wm.findAsynchScope("scopeName");
if (scope == null) {
    scope = wm.createAsynchScope("scopeName");
}
AlarmManager alarmManager = scope.getAlarmManager();
// fires at the configured time of day
alarmManager.create(listener, "Alarm Context Info", (int) (DateUtils.getNextTime(nextTime) - System.currentTimeMillis()));
logger.info("Alarm fired.");
Does anybody know whether additional configuration is needed on WAS 8.5 to avoid the problem described?

WorkManager in WebSphere Application Server, regardless of version, has never been able to operate or coordinate across remote JVMs. By design, a WorkManager only runs the work you submit on the same JVM from which you submitted it; it has no awareness of duplicate work submitted from a different JVM and no mechanism for coordinating work across JVMs. The same is true of AlarmManager instances obtained from the WorkManager. (WebSphere Application Server does have a facility that can coordinate this across cluster members, the Scheduler, but the code above is not using it.) Could it be that some earlier logic in the application, affected by the version change, is now causing the alarm to be created on both members, whereas previously it was only created on one?
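If the Scheduler is not an option, the application itself has to make sure only one member creates the alarm. A minimal sketch of one possible guard, not part of the original application: each member tries to claim the alarm in a shared database row, and only the member that wins the row lock (claimAlarm returns true) goes on to call alarmManager.create(...). The JNDI name jdbc/AppDS and the ALARM_GUARD table are assumptions.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class AlarmGuard {

    /**
     * Returns true only on the member that wins the row lock for this alarm,
     * so only that member should go on to create the alarm. Assumes a table
     * ALARM_GUARD(ALARM_NAME VARCHAR(64) PRIMARY KEY, NEXT_FIRE_TIME BIGINT)
     * seeded with one row per alarm name. Error handling/rollback omitted.
     */
    public boolean claimAlarm(String alarmName, long nextFireTime) throws Exception {
        DataSource ds = (DataSource) new InitialContext().lookup("jdbc/AppDS"); // hypothetical JNDI name
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            boolean claimed = false;
            // Only one member at a time can hold the row lock; the other blocks here,
            // then sees the updated NEXT_FIRE_TIME and skips creating the alarm.
            try (PreparedStatement sel = con.prepareStatement(
                    "SELECT NEXT_FIRE_TIME FROM ALARM_GUARD WHERE ALARM_NAME = ? FOR UPDATE")) {
                sel.setString(1, alarmName);
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next() && rs.getLong(1) < nextFireTime) {
                        try (PreparedStatement upd = con.prepareStatement(
                                "UPDATE ALARM_GUARD SET NEXT_FIRE_TIME = ? WHERE ALARM_NAME = ?")) {
                            upd.setLong(1, nextFireTime);
                            upd.setString(2, alarmName);
                            upd.executeUpdate();
                        }
                        claimed = true;
                    }
                }
            }
            con.commit();
            return claimed;
        }
    }
}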


The configured execution strategy 'SqlRetryingExecutionStrategy' does not support user-initiated transactions

I have an ASP.NET 4.7.2 Windows service which processes NServiceBus messages. Currently it is deployed to an on-premise server. It has a retry mechanism as well and works fine. Now I am containerizing it. While running in a Docker Windows container, it performs SQL operations using Entity Framework and throws the exception below:
The configured execution strategy 'SqlRetryingExecutionStrategy' does not support user-initiated transactions. Use the execution strategy returned by 'DbContext.Database.CreateExecutionStrategy()' to execute all the operations in the transaction as a retriable unit.
When running locally (installed manually) or on the on-premise server it works fine, but in the container it throws this exception.
Can anyone help me identify the root cause?
It sounds like the code does manual transaction management and is not wrapped in an execution strategy's Execute call.
If your code starts a transaction with BeginTransaction(), you are defining your own group of operations that must be treated as a unit, and everything inside the transaction would need to be replayed should a failure occur.
The solution is to manually invoke the execution strategy with a delegate representing everything that needs to be executed. If a transient failure occurs, the execution strategy will invoke the delegate again.
https://learn.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency#execution-strategies-and-transactions
using var db = new SomeContext();
var strategy = db.Database.CreateExecutionStrategy();
strategy.Execute(
    () =>
    {
        // Everything inside the delegate is retried as a single unit on transient failures.
        using var context = new SomeContext();
        using var transaction = context.Database.BeginTransaction();
        context.SaveChanges();
        transaction.Commit();
    });

Embedded hazelcast cluster occasionally breaks for no apparent reason

The Hazelcast cluster runs embedded in an application running on Kubernetes. I can't see any traces of partitioning or other problems in the logs. At some point, this exception starts to appear in the logs:
hz.dazzling_morse.partition-operation.thread-1 com.hazelcast.logging.StandardLoggerFactory$StandardLogger: app-name, , , , , - [172.30.67.142]:5701 [app-name] [4.1.5] Executor is shut down.
java.util.concurrent.RejectedExecutionException: Executor is shut down.
at com.hazelcast.scheduledexecutor.impl.operations.AbstractSchedulerOperation.checkNotShutdown(AbstractSchedulerOperation.java:73)
at com.hazelcast.scheduledexecutor.impl.operations.AbstractSchedulerOperation.getContainer(AbstractSchedulerOperation.java:65)
at com.hazelcast.scheduledexecutor.impl.operations.SyncBackupStateOperation.run(SyncBackupStateOperation.java:39)
at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184)
at com.hazelcast.spi.impl.operationexecutor.OperationRunner.runDirect(OperationRunner.java:150)
at com.hazelcast.spi.impl.operationservice.impl.operations.Backup.run(Backup.java:174)
at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:256)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:237)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:452)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:166)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:136)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
I can't see any particular operation failing prior to that. I do run some scheduled operations myself, but they execute inside try-catch blocks and never throw.
The consequence is that whenever a node in the cluster restarts, no data is replicated to the new node, which eventually renders the entire cluster useless: all data that's supposed to be cached and replicated among the nodes disappears.
What could be the cause? How can I get more details about what causes whatever executor hazelcast uses to shut down?
Based on other conversations...
Your Runnable / Callable should implement HazelcastInstanceAware.
Don't keep the HazelcastInstance or IExecutorService in a non-transient field of the task... as the instance where the runnable is submitted will be different from the one where it runs.
See this.
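A minimal sketch of that advice in code (the class name and map name are hypothetical): the task holds no serialized reference to a HazelcastInstance; instead the field is transient and Hazelcast injects the local instance on whichever member actually executes the task.

import java.io.Serializable;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;

public class CacheRefreshTask implements Runnable, Serializable, HazelcastInstanceAware {

    // transient: never shipped with the task; injected by Hazelcast on the executing member
    private transient HazelcastInstance hazelcastInstance;

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.hazelcastInstance = hazelcastInstance;
    }

    @Override
    public void run() {
        // use the locally injected instance, e.g. to touch a replicated map
        hazelcastInstance.getMap("cache").size();
    }
}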

CloudFoundry App instances - EF Core database migration

I've written a .NET Core REST API which migrates/updates the database (using Entity Framework Core) in Startup.cs. Currently only one instance is running in the production environment, but it seems to be recommended to run two instances in production.
What happens while executing the cf push command? Are both instances stopped automatically or do I need to execute cf stop?
In addition, how do I prevent both instances from updating the database?
I've read about the CF_INSTANCE_INDEX environment variable. Is it OK to only start the database migration when CF_INSTANCE_INDEX is 0? Or does Cloud Foundry provide the following mechanism: start the first instance and, once it is up and running, start the second instance?
What happens while executing the cf push command? Are both instances stopped automatically or do I need to execute cf stop?
Yes, your app will stop. The new code will be staged (i.e. the buildpacks run) and produce a droplet. Then the system will bring up all the requested instances using the new droplet.
In addition, how do I prevent both instances from updating the database? I've read about the CF_INSTANCE_INDEX environment variable. Is it OK to only start the database migration when CF_INSTANCE_INDEX is 0?
You can certainly do it that way. The instance number is guaranteed to be unique and the zeroth instance will always exist, so if you limit to the zeroth instance then it's guaranteed to only run once.
Another option is to run your migration as a task (i.e. cf run-task). This runs in its own container, so it would only run once regardless of the number of instances you have. This SO post has some tips about running a migration as a task.
Or does CloudFoundry provide the next mechanism: start the first instance and when this one is up-and-running, the second instance will be started?
It does: it's the --strategy=rolling flag for cf push.
See https://docs.cloudfoundry.org/devguide/deploy-apps/rolling-deploy.html
I'm not sure this feature would work for ensuring your migration runs only once. According to the docs (see the "How it works" section at the link above), your new and old containers can overlap for a short period of time. If that's the case, running the migration could potentially break the old instances. It's only a short window, until they are replaced with new instances, but it's something to consider.

HCM Full Data Sync to FSCM not publishing data

I am setting up Integration Broker messaging from HCM 9.2 to FSCM 9.2 using the PERSON_BASIC_FULLSYNC service operation (the delivered process) to sync data from HCM to FSCM. I have activated the service operation, handler, queue, and routing on both sides; however, when I run the Full Data Publish process, it runs to No Success with the following error:
Fetching array element 0: index is not in range 1 to 3.
(180,252) EOL_PUBLISH.PUBDTL.GBL.default.1900-01-01.Step05.OnExecute PCPC:16088 Statement:266
I had initially run this process and it ran to Success; however, it did not publish any new data to PS_PERSONAL_DATA in FSCM, so I updated the service operation version in HCM from 'INTERNAL' to 'VERSION_1', as the corresponding service operation in FSCM only had the 'VERSION_1' version available. But after I change the versions so they match and run the process, it goes to No Success.
If I set the version of the service operation in HCM back to 'INTERNAL' and run the process, it is successful but no data gets published to PS_PERSONAL_DATA. Any thoughts on what I should look at?
Sounds like a service operation routing problem. Confirm the routing directions and ensure that any aliases that are set don't cause issues. The service operations on each side need to be the same.

Running Apache spark job from Spring Web application using Yarn client or any alternate way

I have recently started using Spark and I want to run a Spark job from a Spring web application.
I have a situation where I am running the web application in a Tomcat server using Spring Boot. My web application receives a REST web service request, and based on that it needs to trigger a Spark calculation job in a YARN cluster. Since my job can take long to run and accesses data from HDFS, I want to run the Spark job in yarn-cluster mode, and I don't want to keep a Spark context alive in my web layer. Another reason is that my application is multi-tenant, so each tenant can run its own job; in yarn-cluster mode each tenant's job can start its own driver and run in its own Spark cluster. In the web app JVM, I assume I can't run multiple Spark contexts in one JVM.
I want to trigger Spark jobs in yarn-cluster mode from Java code in my web application. What is the best way to achieve this? I am exploring various options and would appreciate guidance on which one is best:
1) I can use the spark-submit command-line shell to submit my jobs. But to trigger it from my web application I need to use either the Java ProcessBuilder API or some package built on top of it. This has two issues. First, it doesn't sound like a clean way of doing it; I should have a programmatic way of triggering my Spark applications. Second, I will lose the ability to monitor the submitted application and get its status. The only crude way of doing that is reading the output stream of the spark-submit shell, which again doesn't sound like a good approach.
2) I tried using the YARN Client to submit the job from the Spring application. The following is the code I use to submit a Spark job using the YARN Client:
Configuration config = new Configuration(); // Hadoop configuration; without the cluster's *-site.xml files the ResourceManager address defaults to 0.0.0.0:8032
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf conf = new SparkConf();
ClientArguments cArgs = new ClientArguments(sparkArgs, conf); // sparkArgs: the arguments otherwise passed to spark-submit
Client client = new Client(cArgs, config, conf); // org.apache.spark.deploy.yarn.Client
client.run();
But when I run the above code, it tries to connect on localhost only. I get this error:
15/08/05 14:06:10 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/08/05 14:06:12 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So I don't think it can connect to remote machine.
Please suggest the best way of doing this with the latest version of Spark. Later I plan to deploy this entire application on Amazon EMR, so the approach should work there as well.
Thanks in advance
Spark JobServer might help: https://github.com/spark-jobserver/spark-jobserver. This project receives RESTful web requests and starts a Spark job; the result is returned as a JSON response.
I also had similar issues trying to run a Spark app that connects to a YARN cluster: with no cluster configuration available it was trying to connect to the local machine as the cluster's master node, which obviously failed.
It worked for me when I placed core-site.xml and yarn-site.xml on the classpath (src/main/resources in a typical sbt or Maven project structure); the application then connected to the cluster correctly.
When using spark-submit, the location of those files is typically specified by the HADOOP_CONF_DIR environment variable, but for a stand-alone application that had no effect.
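If classpath placement is inconvenient, another option is to load the cluster's configuration files explicitly into the Hadoop Configuration that is handed to the YARN Client. A minimal sketch, assuming the files have been copied from the cluster to /etc/hadoop/conf on the machine running the web application (that path is an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class YarnClusterConfig {
    // Explicitly load the cluster's Hadoop/YARN configuration instead of relying on
    // HADOOP_CONF_DIR or classpath discovery (the file locations below are assumptions).
    public static Configuration load() {
        Configuration config = new Configuration();
        config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        config.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
        // Assuming yarn-site.xml defines yarn.resourcemanager.address, a Client built
        // with this config (new Client(cArgs, config, conf)) no longer falls back to 0.0.0.0:8032.
        return config;
    }
}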
