We recently moved to Nitrogen-SR3, and we have a customized two-node clustering setup. When we restart a node (i.e., after failback), we observe the following exception in karaf.log and the node is unable to rejoin the cluster. Any help is highly appreciated.
java.util.concurrent.TimeoutException: Connection attempt failed
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:129)[505:org.opendaylight.controller.sal-distributed-datastore:1.6.3]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$2(AbstractShardBackendResolver.java:142)[505:org.opendaylight.controller.sal-distributed-datastore:1.6.3]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)[:1.8.0_66]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)[:1.8.0_66]
at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)[:1.8.0_66]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)[:1.8.0_66]
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)[:1.8.0_66]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)[:1.8.0_66]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)[:1.8.0_66]
Caused by: org.opendaylight.controller.cluster.access.concepts.RetiredGenerationException: Originating generation was superseded by 3
at org.opendaylight.controller.cluster.datastore.Shard.findFrontend(Shard.java:482)[505:org.opendaylight.controller.sal-distributed-datastore:1.6.3]
at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:522)[505:org.opendaylight.controller.sal-distributed-datastore:1.6.3]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:325)[505:org.opendaylight.controller.sal-distributed-datastore:1.6.3]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[490:org.opendaylight.controller.sal-akka-raft:1.6.3]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44)[498:org.opendaylight.controller.sal-clustering-commons:1.6.3]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[321:com.typesafe.akka.persistence:2.4.20]
I think you are hitting this open bug.
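If it is that bug, a workaround that is often suggested for RetiredGenerationException (a sketch, assuming the default Karaf distribution layout; adjust paths for your deployment) is to clear the restarted member's persisted Akka journal and snapshots before it rejoins, so it comes back with a fresh frontend generation:

# on the restarted member
$KARAF_HOME/bin/stop
rm -rf $KARAF_HOME/journal $KARAF_HOME/snapshots
$KARAF_HOME/bin/start

Note that this wipes the member's local datastore replication state; it will re-sync from the current shard leaders on startup.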
Recently I installed Intel oneAPI, including the C compiler, Fortran compiler, and MPI library, and compiled VASP with it.
Before presenting the question, I need to clarify some tricks I used during the installation of VASP:
glibc 2.14: the cluster is an old machine with glibc 2.12, while oneAPI needs version 2.14, so I compiled glibc 2.14 and exported the library path (see the sketch after this list): export LD_LIBRARY_PATH="~/mysoft/glibc214/lib:$LD_LIBRARY_PATH"
ld 2.24: the cluster's ld is version 2.20, while a higher version is needed, so I installed binutils 2.24.
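For reference, a minimal sketch of those two installation steps (the glibc prefix mirrors the path mentioned above, the binutils prefix is a placeholder, and configure options may need adjusting for your system; note that ~ does not expand inside double quotes, so $HOME is used here):

# build glibc 2.14 into a private prefix (glibc requires an out-of-tree build)
tar xf glibc-2.14.tar.gz && mkdir glibc-build && cd glibc-build
../glibc-2.14/configure --prefix=$HOME/mysoft/glibc214
make && make install
export LD_LIBRARY_PATH="$HOME/mysoft/glibc214/lib:$LD_LIBRARY_PATH"

# build binutils 2.24 and put its newer ld first on the PATH
cd .. && tar xf binutils-2.24.tar.gz && cd binutils-2.24
./configure --prefix=$HOME/mysoft/binutils224
make && make install
export PATH="$HOME/mysoft/binutils224/bin:$PATH"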
The cluster has one master computer connected to 30 compute nodes. The calculation can be run in three ways:
When I run the calculation on the master, it is totally OK.
When I log in to a node manually with the rsh command, the calculation on that node is also no problem.
But usually I submit the calculation script from the master (with Slurm or PBS), which then runs the calculation on a node. In that case, I get the following error message:
[mpiexec#node3.alineos.net] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec#node3.alineos.net] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec#node3.alineos.net] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1062): error waiting for event
[mpiexec#node3.alineos.net] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1015): error setting up the bootstrap proxies
[mpiexec#node3.alineos.net] Possible reasons:
[mpiexec#node3.alineos.net] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec#node3.alineos.net] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec#node3.alineos.net] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec#node3.alineos.net] 4. pbs bootstrap cannot launch processes on remote host. You may try using -bootstrap option to select alternative launcher.
I only met this error with oneAPI-compiled code, not with code compiled by Intel® Parallel Studio XE. Do you have any idea about this error? Your response will be highly appreciated.
Best,
Léon
Could it be a permissions issue, with the Slurm agent not having the correct permissions or library path? Jobs launched through the scheduler do not inherit your interactive login environment, so the custom glibc and binutils paths may never reach the compute nodes.
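One way to test that theory is to re-export the custom paths inside the batch script itself, and to tell Intel MPI's hydra launcher to bootstrap through Slurm. A minimal sketch (the job parameters, binutils prefix, and vasp_std binary name are placeholders):

#!/bin/bash
#SBATCH --job-name=vasp_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

# re-export the custom glibc/binutils paths so the compute nodes see them
export LD_LIBRARY_PATH="$HOME/mysoft/glibc214/lib:$LD_LIBRARY_PATH"
export PATH="$HOME/mysoft/binutils224/bin:$PATH"

# let hydra launch through slurm instead of ssh/rsh (see reason 4 in the error above)
export I_MPI_HYDRA_BOOTSTRAP=slurm

mpirun vasp_std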
I'm currently working through getting an old project up and running again; it uses both DSE Search and DSE Graph. I don't have much experience with DSE, but so far I've created one keyspace for the regular Cassandra database (searchable), and I've created a graph on the same server using the Gremlin console.
The back-end is written in Node.js and uses the dse-driver to get data from the DSE server.
When I run the Cassandra server with the -g flag, the dse-driver runs fine and does exactly what I want it to do, but obviously none of my search functionality works.
When I run the Cassandra server with the -g and -s flags, my search functionality works, but then I receive errors whenever the back-end tries to use the dse-driver to get data from the graph via the executeGraph function.
Is this something that can be fixed, or do I need to create more nodes/clusters? I'm really new to DSE, so your help is appreciated. Here is the error:
Error: com.google.common.util.concurrent.UncheckedExecutionException: com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Error injecting constructor, com.datastax.bdp.gcore.datastore.DataStoreException: Failed to execute statement 40f07a96-98bf-490c-a738-6c9d0021afba
at com.datastax.bdp.graph.impl.DseGraphImpl.<init>(DseGraphImpl.java:192)
at com.datastax.bdp.graph.impl.GraphModule.configure(Unknown Source) (via modules: com.datastax.bdp.graph.impl.DseGraphFactoryImpl$$Lambda$1580/1437671705 -> com.google.inject.util.Modules$OverrideModule -> com.datastax.bdp.graph.impl.GraphModule)
while locating com.datastax.bdp.graph.impl.DseGraphImpl

1 error
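For reference (a single-node development setup is assumed here), DSE does support enabling both workloads on the same node at startup, which is what the flag combination above is doing:

# start DSE with both Search (-s) and Graph (-g) enabled on one node
dse cassandra -s -g

So mixing -s and -g by itself should not be the problem; the "Failed to execute statement" in the ProvisionException suggests the underlying graph statement failed on the server side rather than the flags being invalid.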
I had a Corda 3.3 test installation and recently updated it to version 4.1. After that, when I build my nodes with the deployNodes task and start them with runnodes, I always receive the following exception in the node's console as soon as it starts. What can this mean? I don't have a clue what could be causing it.
I tried to build and run the nodes without CorDapps and they work, so somehow my CorDapps cause this error to happen. What other information should I provide to help you figure out this issue?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
at kotlin.io.ByteStreamsKt.readBytes(IOStreams.kt:123)
at kotlin.io.ByteStreamsKt.readBytes$default(IOStreams.kt:120)
at net.corda.core.internal.InternalUtils.readFully(InternalUtils.kt:123)
at net.corda.node.internal.cordapp.JarScanningCordappLoader.getJarHash(JarScanningCordappLoader.kt:228)
at net.corda.node.internal.cordapp.JarScanningCordappLoader.toCordapp(JarScanningCordappLoader.kt:153)
at net.corda.node.internal.cordapp.JarScanningCordappLoader.loadCordapps(JarScanningCordappLoader.kt:106)
at net.corda.node.internal.cordapp.JarScanningCordappLoader.access$loadCordapps(JarScanningCordappLoader.kt:44)
at net.corda.node.internal.cordapp.JarScanningCordappLoader$cordapps$2.invoke(JarScanningCordappLoader.kt:56)
at net.corda.node.internal.cordapp.JarScanningCordappLoader$cordapps$2.invoke(JarScanningCordappLoader.kt:44)
at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
at net.corda.node.internal.cordapp.JarScanningCordappLoader.getCordapps(JarScanningCordappLoader.kt)
at net.corda.node.internal.cordapp.CordappLoaderTemplate$cordappSchemas$2.invoke(JarScanningCordappLoader.kt:422)
at net.corda.node.internal.cordapp.CordappLoaderTemplate$cordappSchemas$2.invoke(JarScanningCordappLoader.kt:389)
at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
at net.corda.node.internal.cordapp.CordappLoaderTemplate.getCordappSchemas(JarScanningCordappLoader.kt)
at net.corda.node.internal.AbstractNode.<init>(AbstractNode.kt:153)
at net.corda.node.internal.AbstractNode.<init>(AbstractNode.kt:126)
at net.corda.node.internal.Node.<init>(Node.kt:98)
at net.corda.node.internal.Node.<init>(Node.kt:97)
at net.corda.node.internal.NodeStartup.createNode(NodeStartup.kt:194)
at net.corda.node.internal.NodeStartup$initialiseAndRun$5.invoke(NodeStartup.kt:186)
at net.corda.node.internal.NodeStartup$initialiseAndRun$5.invoke(NodeStartup.kt:137)
at net.corda.node.internal.NodeStartupLogging$DefaultImpls.attempt(NodeStartup.kt:509)
at net.corda.node.internal.NodeStartup.attempt(NodeStartup.kt:137)
at net.corda.node.internal.NodeStartup.initialiseAndRun(NodeStartup.kt:185)
at net.corda.node.internal.NodeStartupCli.runProgram(NodeStartup.kt:128)
at net.corda.cliutils.CordaCliWrapper.call(CordaCliWrapper.kt:190)
at net.corda.node.internal.NodeStartupCli.call(NodeStartup.kt:83)
at net.corda.node.internal.NodeStartupCli.call(NodeStartup.kt:64)
at picocli.CommandLine.execute(CommandLine.java:1056)
Corda's memory usage has been slowly creeping upwards. It is possible that your machine does not have enough memory to run 3-4+ nodes at the same time after upgrading to 4.
I recommend trying to run a single node with your CorDapps installed and seeing what happens. If it is still happening then, something else could be going wrong.
Looking at the stack trace, it is also possible that your CorDapp itself is really, really big and the node ran out of memory while reading and loading it; in that case, giving the node a larger heap may help, as sketched below.
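If the heap is the problem, a quick test (the heap size is just an example; corda.jar sits in each node directory generated by deployNodes) is to start one node by hand with a larger heap:

# from the node's directory under build/nodes/
java -Xmx2048m -jar corda.jar
# or pass the setting through Corda's Capsule wrapper
java -Dcapsule.jvm.args="-Xmx2048m" -jar corda.jar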
OpenStack version: Pike
Installation process: multinode Kolla
I am observing the error "Unable to establish connection to http://controller:8774/v2.1/flavors/detail", together with "keystoneauth1.exceptions.connection.ConnectFailure". However, this error is intermittent, which is why I have not found an exact solution. Can anybody help with what the issue could be?
I tried changing the timeout values but am still observing the errors.
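Since the failure is intermittent, it may help to probe the endpoint repeatedly from the host where the client runs, to see whether nova-api itself is flapping ('controller' here is the host name taken from the error above):

# hit the nova-api root once a second and log any connection failures
while true; do
  curl -sS -o /dev/null -w "%{http_code}\n" http://controller:8774/v2.1/ || echo "connect failed"
  sleep 1
done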
I executed my R program, and when I try to push the result to a table using
ore.create(score, table="xyz")
I'm getting the following error:
Error in .oci.GetQuery(conn, statement, data = data, prefetch = prefetch, :
ORA-12801: error signaled in parallel query server P007, instance XY.ab.dc.cd:abc (2)
ORA-06520: PL/SQL: Error loading external library
ORA-06522: /app/oracle/product/11.2.0/dbhome_1/lib/librqe.so: cannot open shared object file: No such file or directory
ORA-06512: at "RQSYS.RQROWEVALIMPL", line 20
ORA-06512: at "RQSYS.RQROWEVALIMPL", line 16
ORA-06512: at line 4
Please help me solve this issue; I have been trying to solve it for the past week but have not been able to, as I am new to this.
Any help is much appreciated.
This looks like a problem with your installation of the Oracle R Enterprise (ORE) package.
The message indicates you are running on 11gR2. ORE requires 11.2.0.3 or higher, or 11.2.0.1 with a specific patch applied. Check this OTN Forum thread for details.
You need an Oracle Support contract to get hold of these patches. If you don't have a contract, you will need to migrate to Database 12c in order to use ORE.
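As a first diagnostic, it is also worth checking whether librqe.so actually exists on every database node (the path is taken from the ORA-06522 message above); ORA-12801 names a specific parallel query instance, so on a RAC system ORE has to be installed on all nodes:

# run on each database node
ls -l /app/oracle/product/11.2.0/dbhome_1/lib/librqe.so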