I am using monolog with this configuration:
monolog:
    channels:
        - deprecation # Deprecations are logged in the dedicated "deprecation" channel when it exists

when@dev:
    monolog:
        handlers:
            main:
                type: stream
                path: "%kernel.logs_dir%/%kernel.environment%.log"
                level: debug
                channels: ["!event"]
            # uncomment to get logging in your browser
            # you may have to allow bigger header sizes in your Web server configuration
            #firephp:
            #    type: firephp
            #    level: info
            #chromephp:
            #    type: chromephp
            #    level: info
            console:
                type: console
                process_psr_3_messages: false
                channels: ["!event", "!doctrine", "!console"]

when@test:
    monolog:
        handlers:
            main:
                type: fingers_crossed
                action_level: error
                handler: nested
                excluded_http_codes: [404, 405]
                channels: ["!event"]
            nested:
                type: stream
                path: "%kernel.logs_dir%/%kernel.environment%.log"
                level: debug

when@prod:
    monolog:
        handlers:
            main:
                type: fingers_crossed
                action_level: error
                handler: nested
                excluded_http_codes: [404, 405]
                buffer_size: 50 # How many messages should be saved? Prevent memory leaks
            nested:
                type: stream
                path: "%kernel.logs_dir%/prod.log"
                level: debug
                formatter: monolog.formatter.json
            console:
                type: console
                process_psr_3_messages: false
                channels: ["!event", "!doctrine"]
            deprecation:
                type: stream
                channels: [deprecation]
                path: php://stderr
The only thing I actually changed from the default is:
path: "%kernel.logs_dir%/prod.log"
for the production environment. The prod.log file exists, but after getting 500 errors it is still empty. Nothing is logged there.
APP_ENV is set to prod in .env.
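In other words, relative to the recipe defaults (which, if I remember correctly, point this handler at php://stderr), the effective change is a single line on the nested handler:

when@prod:
    monolog:
        handlers:
            nested:
                type: stream
                path: "%kernel.logs_dir%/prod.log" # default recipe value: php://stderr
                level: debug
                formatter: monolog.formatter.json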
Could you please check the web server's error log to see what gets logged there on 5xx errors? Also, you could try removing when@prod.monolog.handlers.nested.formatter so the config looks like this:
...
when@prod:
    monolog:
        handlers:
            main:
                type: fingers_crossed
                action_level: error
                handler: nested
                excluded_http_codes: [404, 405]
                buffer_size: 50 # How many messages should be saved? Prevent memory leaks
            nested:
                type: stream
                path: "%kernel.logs_dir%/prod.log"
                level: debug
            console:
                type: console
                process_psr_3_messages: false
...
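If removing the formatter alone doesn't change anything, also make sure the prod cache was rebuilt after the config change and that the PHP user can actually write the file. A rough checklist, assuming a standard Symfony directory layout and a www-data PHP-FPM user (adjust to your setup):

# rebuild the prod container so the new monolog config is picked up
APP_ENV=prod php bin/console cache:clear

# verify the web server / PHP-FPM user can write the log file
ls -ld var/log var/log/prod.log
sudo -u www-data sh -c 'echo permission-test >> var/log/prod.log'

# watch the file while reproducing the 500
tail -f var/log/prod.log

Keep in mind that with fingers_crossed the buffered records are only flushed to prod.log once a record at action_level (error) is actually logged, so a 500 that never reaches Symfony's kernel (for example a web server or PHP-FPM level failure) will only show up in the web server's error log.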
I am trying to deploy a node with the official Docker image using the following command:
docker run -ti \
--memory=2048m \
--cpus=2 \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/config:/etc/corda \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/certificates:/opt/corda/certificates \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/persistence:/opt/corda/persistence \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/logs:/opt/corda/logs \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/cordapps:/opt/corda/cordapps \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/additional-node-infos:/opt/corda/additional-node-infos \
-v /Users/aliceguo/IdeaProjects/car-cordapp/build/nodes/PartyC/network-parameters:/opt/corda/network-parameters \
-p 10011:10011 \
-p 10012:10012 \
corda/corda-corretto-5.0-snapshot
The node seems to start successfully, but I cannot connect to it via RPC from my laptop (the Docker container is running on the same laptop). I will attach some logs and a screenshot below. Any help would be appreciated!
Node Log:
[INFO ] 2019-07-19T03:21:23,163Z [main] cliutils.CordaCliWrapper.call - Application Args: --base-directory /opt/corda --config-file /etc/corda/node.conf
[INFO ] 2019-07-19T03:21:24,146Z [main] manifests.Manifests.info - 115 attributes loaded from 152 stream(s) in 61ms, 115 saved, 2353 ignored: ["ActiveMQ-Version", "Agent-Class", "Ant-Version", "Application-Class", "Application-ID", "Application-Library-Allowable-Codebase", "Application-Name", "Application-Version", "Archiver-Version", "Automatic-Module-Name", "Bnd-LastModified", "Branch", "Build-Date", "Build-Host", "Build-Id", "Build-Java-Version", "Build-Jdk", "Build-Job", "Build-Number", "Build-Timestamp", "Built-By", "Built-OS", "Built-Status", "Bundle-Activator", "Bundle-Category", "Bundle-ClassPath", "Bundle-Copyright", "Bundle-Description", "Bundle-DocURL", "Bundle-License", "Bundle-ManifestVersion", "Bundle-Name", "Bundle-NativeCode", "Bundle-RequiredExecutionEnvironment", "Bundle-SymbolicName", "Bundle-Vendor", "Bundle-Version", "Caller-Allowable-Codebase", "Can-Redefine-Classes", "Can-Retransform-Classes", "Can-Set-Native-Method-Prefix", "Caplets", "Change", "Class-Path", "Codebase", "Corda-Platform-Version", "Corda-Release-Version", "Corda-Revision", "Corda-Vendor", "Created-By", "DynamicImport-Package", "Eclipse-BuddyPolicy", "Eclipse-LazyStart", "Export-Package", "Extension-Name", "Fragment-Host", "Gradle-Version", "Hibernate-JpaVersion", "Hibernate-VersionFamily", "Implementation-Build", "Implementation-Build-Date", "Implementation-Title", "Implementation-URL", "Implementation-Url", "Implementation-Vendor", "Implementation-Vendor-Id", "Implementation-Version", "Import-Package", "Include-Resource", "JCabi-Build", "JCabi-Date", "JCabi-Version", "JVM-Args", "Java-Agents", "Java-Vendor", "Java-Version", "Kotlin-Runtime-Component", "Kotlin-Version", "Liquibase-Package", "Log4jReleaseKey", "Log4jReleaseManager", "Log4jReleaseVersion", "Main-Class", "Main-class", "Manifest-Version", "Min-Java-Version", "Min-Update-Version", "Module-Email", "Module-Origin", "Module-Owner", "Module-Source", "Multi-Release", "Originally-Created-By", "Os-Arch", "Os-Name", "Os-Version", "Permissions", "Premain-Class", "Private-Package", "Provide-Capability", "Require-Capability", "SCM-Revision", "SCM-url", "Scm-Connection", "Scm-Revision", "Scm-Url", "Service-Component", "Specification-Title", "Specification-Vendor", "Specification-Version", "System-Properties", "Tool", "Trusted-Library", "X-Compile-Source-JDK", "X-Compile-Target-JDK"]
[INFO ] 2019-07-19T03:21:24,188Z [main] BasicInfo.printBasicNodeInfo - Logs can be found in : /opt/corda/logs
[INFO ] 2019-07-19T03:21:25,096Z [main] subcommands.ValidateConfigurationCli.logRawConfig$node - Actual configuration:
{
  "additionalNodeInfoPollingFrequencyMsec" : 5000,
  "additionalP2PAddresses" : [],
  "attachmentCacheBound" : 1024,
  "baseDirectory" : "/opt/corda",
  "certificateChainCheckPolicies" : [],
  "cordappSignerKeyFingerprintBlacklist" : [
    "56CA54E803CB87C8472EBD3FBC6A2F1876E814CEEBF74860BD46997F40729367",
    "83088052AF16700457AE2C978A7D8AC38DD6A7C713539D00B897CD03A5E5D31D",
    "6F6696296C3F58B55FB6CA865A025A3A6CC27AD17C4AFABA1E8EF062E0A82739"
  ],
  "crlCheckSoftFail" : true,
  "dataSourceProperties" : "*****",
  "database" : {
    "exportHibernateJMXStatistics" : false,
    "initialiseAppSchema" : "UPDATE",
    "initialiseSchema" : true,
    "mappedSchemaCacheSize" : 100,
    "transactionIsolationLevel" : "REPEATABLE_READ"
  },
  "detectPublicIp" : false,
  "devMode" : true,
  "emailAddress" : "admin@company.com",
  "extraNetworkMapKeys" : [],
  "flowMonitorPeriodMillis" : {
    "nanos" : 0,
    "seconds" : 60
  },
  "flowMonitorSuspensionLoggingThresholdMillis" : {
    "nanos" : 0,
    "seconds" : 60
  },
  "flowTimeout" : {
    "backoffBase" : 1.8,
    "maxRestartCount" : 6,
    "timeout" : {
      "nanos" : 0,
      "seconds" : 30
    }
  },
  "jarDirs" : [],
  "jmxReporterType" : "JOLOKIA",
  "keyStorePassword" : "*****",
  "lazyBridgeStart" : true,
  "myLegalName" : "O=PartyC,L=New York,C=US",
  "noLocalShell" : false,
  "p2pAddress" : "localhost:10011",
  "rpcSettings" : {
    "address" : "localhost:10012",
    "adminAddress" : "localhost:10052",
    "standAloneBroker" : false,
    "useSsl" : false
  },
  "rpcUsers" : [],
  "security" : {
    "authService" : {
      "dataSource" : {
        "passwordEncryption" : "NONE",
        "type" : "INMEMORY",
        "users" : [
          {
            "ignoresFallbacks" : false,
            "resolved" : true,
            "value" : {
              "loadFactor" : 0.75,
              "modCount" : 3,
              "size" : 3,
              "table" : {},
              "threshold" : 3
            }
          }
        ]
      }
    }
  },
  "trustStorePassword" : "*****",
  "useTestClock" : false,
  "verifierType" : "InMemory"
}
[INFO ] 2019-07-19T03:21:25,119Z [main] internal.Node.logStartupInfo - Vendor: Corda Open Source
[INFO ] 2019-07-19T03:21:25,119Z [main] internal.Node.logStartupInfo - Release: 5.0-SNAPSHOT
[INFO ] 2019-07-19T03:21:25,119Z [main] internal.Node.logStartupInfo - Platform Version: 5
[INFO ] 2019-07-19T03:21:25,119Z [main] internal.Node.logStartupInfo - Revision: df19b444ddd32d3afd10ed0b76c1b2f68d985968
[INFO ] 2019-07-19T03:21:25,119Z [main] internal.Node.logStartupInfo - PID: 19
[INFO ] 2019-07-19T03:21:25,120Z [main] internal.Node.logStartupInfo - Main class: /opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-node-5.0-SNAPSHOT.jar
[INFO ] 2019-07-19T03:21:25,120Z [main] internal.Node.logStartupInfo - CommandLine Args: -Xmx512m -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -javaagent:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/quasar-core-0.7.10-jdk8.jar=x(antlr**;bftsmart**;co.paralleluniverse**;com.codahale**;com.esotericsoftware**;com.fasterxml**;com.google**;com.ibm**;com.intellij**;com.jcabi**;com.nhaarman**;com.opengamma**;com.typesafe**;com.zaxxer**;de.javakaffee**;groovy**;groovyjarjarantlr**;groovyjarjarasm**;io.atomix**;io.github**;io.netty**;jdk**;junit**;kotlin**;net.bytebuddy**;net.i2p**;org.apache**;org.assertj**;org.bouncycastle**;org.codehaus**;org.crsh**;org.dom4j**;org.fusesource**;org.h2**;org.hamcrest**;org.hibernate**;org.jboss**;org.jcp**;org.joda**;org.junit**;org.mockito**;org.objectweb**;org.objenesis**;org.slf4j**;org.w3c**;org.xml**;org.yaml**;reflectasm**;rx**;org.jolokia**;com.lmax**;picocli**;liquibase**;com.github.benmanes**;org.json**;org.postgresql**;nonapi.io.github.classgraph**) -Dcorda.dataSourceProperties.dataSource.url=jdbc:h2:file:/opt/corda/persistence/persistence;DB_CLOSE_ON_EXIT=FALSE;WRITE_DELAY=0;LOCK_TIMEOUT=10000 -Dvisualvm.display.name=Corda -Djava.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT -Dcapsule.app=net.corda.node.Corda_5.0-SNAPSHOT -Dcapsule.dir=/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT -Dcapsule.jar=/opt/corda/bin/corda.jar -Djava.security.egd=file:/dev/./urandom
[INFO ] 2019-07-19T03:21:25,120Z [main] internal.Node.logStartupInfo - bootclasspath: /usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/classes
[INFO ] 2019-07-19T03:21:25,120Z [main] internal.Node.logStartupInfo - classpath: /opt/corda/bin/corda.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-shell-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-rpc-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-node-api-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-tools-cliutils-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-common-configuration-parsing-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-common-validation-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-common-logging-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-confidential-identities-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/log4j-slf4j-impl-2.9.1.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/log4j-web-2.9.1.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/jul-to-slf4j-1.7.25.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-jackson-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-serialization-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/corda-core-5.0-SNAPSHOT.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/kotlin-stdlib-jdk8-1.2.71.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/jackson-module-kotlin-2.9.5.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/kotlin-reflect-1.2.71.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/quasar-core-0.7.10-jdk8.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/kryo-serializers-0.42.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/kryo-4.0.0.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/jimfs-1.1.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/metrics-new-relic-1.1.1.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/guava-25.1-jre.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/caffeine-2.6.2.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/disruptor-3.4.2.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/commons-collections4-4.1.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/artemis-amqp-protocol-2.6.2.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/artemis-server-2.6.2.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/artemis-jdbc-store-2.6.2.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/artemis-journal-2.6.2.jar:/opt/corda/.capsule/apps/net.corda.node.Corda_5.0-SNAPSHOT/art...
In order to solve this, you need to bind the ports to 0.0.0.0:xxxx instead of localhost:xxxx in node.conf. These are your current values (a corrected sketch follows the excerpt below):
"p2pAddress" : "localhost:10011",
"rpcSettings" : {
"address" : "localhost:10012",
"adminAddress" : "localhost:10052",
"standAloneBroker" : false,
"useSsl" : false
},
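A minimal sketch of the relevant part of node.conf after that change, in the usual HOCON syntax (everything else in your file stays as it is):

p2pAddress = "0.0.0.0:10011"
rpcSettings {
    address = "0.0.0.0:10012"
    adminAddress = "0.0.0.0:10052"
}

After editing the file mounted at /etc/corda, restart the container; from the host you can then confirm the published RPC port answers with something like nc -vz localhost 10012 before retrying your client.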
Description
I'm having issues with an overlay network using docker swarm mode (IMPORTANT: swarm mode, not swarm). I have an overlay network named "internal". I have a service named "datacollector" that is scaled to 12 instances. I docker exec into a container of another service running in the same swarm (and on the same overlay network) and run curl http://datacollector 12 times. However, 4 of the requests result in a timeout. I then run dig tasks.datacollector and get a list of 12 IP addresses. Sure enough, 8 of the IP addresses work but 4 time out every time.
I tried scaling the service down to 1 instance and then back up to 12, but got the same result.
I then used docker service ps datacollector to find each running instance of my service. I used docker kill xxxx on each node to manually kill all instances and let the swarm recreate them. I then checked dig again and verified that the list of IP addresses for the task was no longer the same. After this I ran curl http://datacollector 12 more times. Now only 3 requests work and the remaining 9 time out!
This is the second time this has happened in the last 2 weeks or so. The previous time I had to remove all services, remove the overlay network, recreate the overlay network, and re-create all of the services in order to resolve the issue. Obviously, this isn't a workable long-term solution :(
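For reference, the check described above boils down to a small loop run from inside another container on the same overlay network (nothing here is specific beyond the service name; the 5-second timeout is an arbitrary cut-off):

# probe the service VIP once, then every task IP individually
curl -s -o /dev/null -m 5 -w 'datacollector: %{http_code}\n' http://datacollector
for ip in $(dig +short tasks.datacollector); do
  printf '%s -> ' "$ip"
  curl -s -o /dev/null -m 5 -w '%{http_code}\n' "http://$ip"   # 000 means the request timed out / failed
done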
Output of `docker service inspect datacollector`:
[
{
"ID": "2uevc4ouakk6k3dirhgqxexz9",
"Version": {
"Index": 72152
},
"CreatedAt": "2016-11-12T20:38:51.137043037Z",
"UpdatedAt": "2016-11-17T15:22:34.402801678Z",
"Spec": {
"Name": "datacollector",
"TaskTemplate": {
"ContainerSpec": {
"Image": "507452836298.dkr.ecr.us-east-1.amazonaws.com/swarm/api:61d7931f583742cca91b368bc6d9e15314545093",
"Args": [
"node",
".",
"api/dataCollector"
],
"Env": [
"ENVIRONMENT=stage",
"MONGODB_URI=mongodb://mongodb:27017/liveearth",
"RABBITMQ_URL=amqp://rabbitmq",
"ELASTICSEARCH_URL=http://elasticsearch"
]
},
"Resources": {
"Limits": {},
"Reservations": {}
},
"RestartPolicy": {
"Condition": "any",
"MaxAttempts": 0
},
"Placement": {
"Constraints": [
"node.labels.role.api==true",
"node.labels.role.api==true",
"node.labels.role.api==true",
"node.labels.role.api==true",
"node.labels.role.api==true"
]
}
},
"Mode": {
"Replicated": {
"Replicas": 12
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause"
},
"Networks": [
{
"Target": "88e9fd9715o5v1hqu6dnkg3vp"
}
],
"EndpointSpec": {
"Mode": "vip"
}
},
"Endpoint": {
"Spec": {
"Mode": "vip"
},
"VirtualIPs": [
{
"NetworkID": "88e9fd9715o5v1hqu6dnkg3vp",
"Addr": "192.168.1.23/24"
}
]
},
"UpdateStatus": {
"State": "completed",
"StartedAt": "2016-11-17T15:19:34.471292948Z",
"CompletedAt": "2016-11-17T15:22:34.402794312Z",
"Message": "update completed"
}
}
]
Output of docker network inspect internal:
[
{
"Name": "internal",
"Id": "88e9fd9715o5v1hqu6dnkg3vp",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "192.168.1.0/24",
"Gateway": "192.168.1.1"
}
]
},
"Internal": false,
"Containers": {
"03ac1e71139ff2140f93c80d9e6b1d69abf442a0c2362610bee3e116e84ef434": {
"Name": "datacollector.5.cxmvk7p1hwznautresir94m3s",
"EndpointID": "22445be80ba55b67d7cfcfbc75f2c15586bace5f317be8ba9b59c5f9f338525c",
"MacAddress": "02:42:c0:a8:01:72",
"IPv4Address": "192.168.1.114/24",
"IPv6Address": ""
},
"08ae84c7cb6e57583baf12c2a9082c1d17f1e65261cfa93346aaa9bda1244875": {
"Name": "auth.10.aasw00k7teq4knxibctlrrj7e",
"EndpointID": "c3506c851f4c9f0d06d684a9f023e7ba529d0149d70fa7834180a87ad733c678",
"MacAddress": "02:42:c0:a8:01:44",
"IPv4Address": "192.168.1.68/24",
"IPv6Address": ""
},
"192203a127d6831c3f4a41eabdd8df5282e33c3e92b99c3baaf1f213042f5418": {
"Name": "parkingcollector.1.8yrm6d831wrfsrkzhal7cf2pm",
"EndpointID": "34de6e9621ef54f7d963db942a7a7b6e0013ac6db6c9f17b384de689b1f1b187",
"MacAddress": "02:42:c0:a8:01:9a",
"IPv4Address": "192.168.1.154/24",
"IPv6Address": ""
},
"24258109e16c1a5b15dcc84a41d99a4a6617bcadecc9b35279c721c0d2855141": {
"Name": "stream.8.38npsusmpa1pf8fbnmaux57rx",
"EndpointID": "b675991ffbd5c0d051a4b68790a33307b03b48582fd1b37ba531cf5e964af0ce",
"MacAddress": "02:42:c0:a8:01:74",
"IPv4Address": "192.168.1.116/24",
"IPv6Address": ""
},
"33063b988473b73be2cbc51e912e165112de3d01bc00ee2107aa635e30a36335": {
"Name": "billing.2.ca41k2h44zkn9wfbsif0lfupf",
"EndpointID": "77c576929d5e82f1075b4cc6fcb4128ce959281d4b9c1c22d9dcd1e42eed8b5e",
"MacAddress": "02:42:c0:a8:01:87",
"IPv4Address": "192.168.1.135/24",
"IPv6Address": ""
},
"8b0929e66e6c284206ea713f7c92f1207244667d3ff02815d4bab617c349b220": {
"Name": "shotspottercollector.2.328408tiyy8aryr0g1ipmm5xm",
"EndpointID": "f2a0558ec67745f5d1601375c2090f5cd141303bf0d54bec717e3463f26ed74d",
"MacAddress": "02:42:c0:a8:01:90",
"IPv4Address": "192.168.1.144/24",
"IPv6Address": ""
},
"938fe5f6f9bb893862e8c06becd76c1a7fe5f2d3b791fc55d7d8164e67ee3553": {
"Name": "inrixproxy.2.ed77crvat0waw41phjknhhm6v",
"EndpointID": "88f550fecd60f0bdb0dfc9d5bf0c74716a91d009bcc27dc4392b113ab1215038",
"MacAddress": "02:42:c0:a8:01:96",
"IPv4Address": "192.168.1.150/24",
"IPv6Address": ""
},
"970f9d4c6ae6cc4de54a1d501408720b7d95114c28a6615d8e4e650b7e69bc40": {
"Name": "rabbitmq.1.e7j721g6hfhs8r7p3phih4g9v",
"EndpointID": "c04a4a5650ee6e10b87884004aa2cb1ec6b1c7036af15c31579462b6621436a2",
"MacAddress": "02:42:c0:a8:01:1e",
"IPv4Address": "192.168.1.30/24",
"IPv6Address": ""
},
"b1f676e6d38eec026583943dc0abff1163d21e6be9c5901539c46288f8941638": {
"Name": "logspout.0.51j8juw8aj0rjjccp2am0rib5",
"EndpointID": "98a93153abd6897c58276340df2eeec5c0ceb77fbe17d1ce8c465febb06776c7",
"MacAddress": "02:42:c0:a8:01:10",
"IPv4Address": "192.168.1.16/24",
"IPv6Address": ""
},
"bab4d80be830fa3b3fefe501c66e3640907a2cbb2addc925a0eb6967a771a172": {
"Name": "auth.2.8fduvrn5ayk024b0lkhyz50of",
"EndpointID": "7e81d41fa04ec14263a2423d8ef003d6d431a8c3ff319963197f8a8d73b4e361",
"MacAddress": "02:42:c0:a8:01:3a",
"IPv4Address": "192.168.1.58/24",
"IPv6Address": ""
},
"bc3c75a7c2d8c078eb7cc1555833ff0d374d82045dd9fb24ccfc37868615bb5e": {
"Name": "reverseproxy.6.2g20zphn5j1r2feylzcplyorg",
"EndpointID": "6c2138966ebcd144b47229a94ee603d264f3954a96ccd024d9e96501b7ffd5c0",
"MacAddress": "02:42:c0:a8:01:6c",
"IPv4Address": "192.168.1.108/24",
"IPv6Address": ""
},
"cd59d61b16ac0325336121a8558e8215e42aa5300f75054df17a70bf1f3e6c0c": {
"Name": "usgscollector.1.0h0afyw8va8maoa4tjd5qz588",
"EndpointID": "952073efc6a567ebd3f80d26811222c675183e8c76005fbf12388725a97b1bee",
"MacAddress": "02:42:c0:a8:01:48",
"IPv4Address": "192.168.1.72/24",
"IPv6Address": ""
},
"d40476e56b91762b0609acd637a4f70e42c88d266f8ebb7d9511050a8fc1df17": {
"Name": "kibana.1.6hxu5b97hfykuqr5yb9i9sn5r",
"EndpointID": "08c5188076f9b8038d864d570e7084433a8d97d4c8809d27debf71cb5d652cd7",
"MacAddress": "02:42:c0:a8:01:06",
"IPv4Address": "192.168.1.6/24",
"IPv6Address": ""
},
"e29369ad8ee5b12fb0c6f9bcb899514ab092f7da291a7c05eea758b0c19bfb65": {
"Name": "weatherbugcollector.1.crpub0hf85cewxm0qt6annsra",
"EndpointID": "afa1ddbad8ab8fdab69505ddb5342ac89c0d17bc75a11e9ac0ac8829e5885997",
"MacAddress": "02:42:c0:a8:01:2e",
"IPv4Address": "192.168.1.46/24",
"IPv6Address": ""
},
"f1bf0a656ecb9d7ef9b837efa94a050d9c98586f7312435e48b9a129c5e92e46": {
"Name": "socratacollector.1.627icslq6kdb4syaha6tzkb19",
"EndpointID": "14bea0d9ec3f94b04b32f36b7172c60316ee703651d0d920126a49dd0fa99cf5",
"MacAddress": "02:42:c0:a8:01:1b",
"IPv4Address": "192.168.1.27/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "257"
},
"Labels": {}
}
]
Output of dig datacollector:
; <<>> DiG 9.9.5-9+deb8u8-Debian <<>> datacollector
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38227
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;datacollector. IN A
;; ANSWER SECTION:
datacollector. 600 IN A 192.168.1.23
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Nov 17 16:11:57 UTC 2016
;; MSG SIZE rcvd: 60
Output of dig tasks.datacollector:
; <<>> DiG 9.9.5-9+deb8u8-Debian <<>> tasks.datacollector
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9810
;; flags: qr rd ra; QUERY: 1, ANSWER: 12, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;tasks.datacollector. IN A
;; ANSWER SECTION:
tasks.datacollector. 600 IN A 192.168.1.115
tasks.datacollector. 600 IN A 192.168.1.66
tasks.datacollector. 600 IN A 192.168.1.22
tasks.datacollector. 600 IN A 192.168.1.114
tasks.datacollector. 600 IN A 192.168.1.37
tasks.datacollector. 600 IN A 192.168.1.139
tasks.datacollector. 600 IN A 192.168.1.148
tasks.datacollector. 600 IN A 192.168.1.110
tasks.datacollector. 600 IN A 192.168.1.112
tasks.datacollector. 600 IN A 192.168.1.100
tasks.datacollector. 600 IN A 192.168.1.39
tasks.datacollector. 600 IN A 192.168.1.106
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Nov 17 16:08:54 UTC 2016
;; MSG SIZE rcvd: 457
Output of docker version:
Client:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 23:26:11 2016
OS/Arch: darwin/amd64
Server:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 21:44:32 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 58
Running: 15
Paused: 0
Stopped: 43
Images: 123
Server Version: 1.12.3
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 430
Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host null overlay bridge
Swarm: active
NodeID: 8uxexr2uz3qpn5x1km9k4le9s
Is Manager: true
ClusterID: 2kd4md2qyu67szx4y6q2npnet
Managers: 3
Nodes: 8
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 10.10.44.201
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-91-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.676 GiB
Name: stage-0
ID: 76Z2:GN43:RQND:BBAJ:AGUU:S3F7:JWBC:CCCK:I4VH:PKYC:UHQT:IR2U
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: herbrandson
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Labels:
provider=generic
Insecure Registries:
127.0.0.0/8
Additional environment details:
Docker swarm mode (not swarm). All nodes are running on AWS. The swarm has 8 nodes (3 managers and 5 workers).
UPDATE:
Per the comments, here's a snippet from the Docker daemon logs on the swarm master:
time="2016-11-17T15:19:45.890158968Z" level=error msg="container status
unavailable" error="context canceled" module=taskmanager task.id=ch6w74b3cu78y8r2ugkmfmu8a
time="2016-11-17T15:19:48.929507277Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=exb6dfc067nxudzr8uo1eyj4e
time="2016-11-17T15:19:50.104962867Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=6mbbfkilj9gslfi33w7sursb9
time="2016-11-17T15:19:50.877223204Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=drd8o0yn1cg5t3k76frxgukaq
time="2016-11-17T15:19:54.680427504Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=9lwl5v0f2v6p52shg6gixs3j7
time="2016-11-17T15:19:54.949118806Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=51q1eeilfspsm4cx79nfkl4r0
time="2016-11-17T15:19:56.485909146Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=3vjzfjjdrjio2gx45q9c3j6qd
time="2016-11-17T15:19:56.934070026Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:00.000614497Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:00.163458802Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=4xa2ub5npxyxpyx3vd5n1gsuy
time="2016-11-17T15:20:01.463407652Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:01.949087337Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:02.942094926Z" level=error msg="Failed to create real server 192.168.1.150 for vip 192.168.1.32 fwmark 947 in sb 938fe5f6f9bb893862e8c06becd76c1a7fe5f2d3b791fc55d7d8164e67ee3553: no such process"
time="2016-11-17T15:20:03.319168359Z" level=error msg="Failed to delete a new service for vip 192.168.1.61 fwmark 2133: no such process"
time="2016-11-17T15:20:03.363775880Z" level=error msg="Failed to add firewall mark rule in sbox /var/run/docker/netns/5de57ee133a5: reexec failed: exit status 5"
time="2016-11-17T15:20:05.772683092Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:06.059212643Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:07.335686642Z" level=error msg="Failed to delete a new service for vip 192.168.1.67 fwmark 2134: no such process"
time="2016-11-17T15:20:07.385135664Z" level=error msg="Failed to add firewall mark rule in sbox /var/run/docker/netns/6699e7c03bbd: reexec failed: exit status 5"
time="2016-11-17T15:20:07.604064777Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:07.673852364Z" level=error msg="Failed to delete a new service for vip 192.168.1.75 fwmark 2097: no such process"
time="2016-11-17T15:20:07.766525370Z" level=error msg="Failed to add firewall mark rule in sbox /var/run/docker/netns/6699e7c03bbd: reexec failed: exit status 5"
time="2016-11-17T15:20:09.080101131Z" level=error msg="Failed to create real server 192.168.1.155 for vip 192.168.1.35 fwmark 904 in sb 192203a127d6831c3f4a41eabdd8df5282e33c3e92b99c3baaf1f213042f5418: no such process"
time="2016-11-17T15:20:11.516338629Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:20:11.729274237Z" level=error msg="Failed to delete a new service for vip 192.168.1.83 fwmark 2124: no such process"
time="2016-11-17T15:20:11.887572806Z" level=error msg="Failed to add firewall mark rule in sbox /var/run/docker/netns/5b810132057e: reexec failed: exit status 5"
time="2016-11-17T15:20:12.281481060Z" level=error msg="Failed to delete a new service for vip 192.168.1.73 fwmark 2136: no such process"
time="2016-11-17T15:20:12.395326864Z" level=error msg="Failed to add firewall mark rule in sbox /var/run/docker/netns/5b810132057e: reexec failed: exit status 5"
time="2016-11-17T15:20:20.263565036Z" level=error msg="Failed to create real server 192.168.1.72 for vip 192.168.1.91 fwmark 2163 in sb cd59d61b16ac0325336121a8558e8215e42aa5300f75054df17a70bf1f3e6c0c: no such process"
time="2016-11-17T15:20:20.410996971Z" level=error msg="Failed to delete a new service for vip 192.168.1.95 fwmark 2144: no such process"
time="2016-11-17T15:20:20.456710211Z" level=error msg="Failed to add firewall mark rule in sbox /var/run/docker/netns/88d38a2bfb77: reexec failed: exit status 5"
time="2016-11-17T15:20:21.389253510Z" level=error msg="Failed to create real server 192.168.1.46 for vip 192.168.1.99 fwmark 2145 in sb cd59d61b16ac0325336121a8558e8215e42aa5300f75054df17a70bf1f3e6c0c: no such process"
time="2016-11-17T15:20:22.208965378Z" level=error msg="Failed to create real server 192.168.1.46 for vip 192.168.1.99 fwmark 2145 in sb e29369ad8ee5b12fb0c6f9bcb899514ab092f7da291a7c05eea758b0c19bfb65: no such process"
time="2016-11-17T15:20:23.334582312Z" level=error msg="Failed to create a new service for vip 192.168.1.97 fwmark 2166: file exists"
time="2016-11-17T15:20:23.495873232Z" level=error msg="Failed to create real server 192.168.1.48 for vip 192.168.1.17 fwmark 552 in sb e29369ad8ee5b12fb0c6f9bcb899514ab092f7da291a7c05eea758b0c19bfb65: no such process"
time="2016-11-17T15:20:25.831988014Z" level=error msg="Failed to create real server 192.168.1.116 for vip 192.168.1.41 fwmark 566 in sb 03ac1e71139ff2140f93c80d9e6b1d69abf442a0c2362610bee3e116e84ef434: no such process"
time="2016-11-17T15:20:25.850904011Z" level=error msg="Failed to create real server 192.168.1.116 for vip 192.168.1.41 fwmark 566 in sb 03ac1e71139ff2140f93c80d9e6b1d69abf442a0c2362610bee3e116e84ef434: no such process"
time="2016-11-17T15:20:37.159637665Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=6yhu3glre4tbz6d08lk2pq9eb
time="2016-11-17T15:20:48.229343512Z" level=error msg="Error closing logger: invalid argument"
time="2016-11-17T15:51:16.027686909Z" level=error msg="Error getting service internal: service internal not found"
time="2016-11-17T15:51:16.027708795Z" level=error msg="Handler for GET /v1.24/services/internal returned error: service internal not found"
time="2016-11-17T16:15:50.946921655Z" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=cxmvk7p1hwznautresir94m3s
time="2016-11-17T16:16:01.994494784Z" level=error msg="Error closing logger: invalid argument"
UPDATE 2:
I tried removing the service and re-creating it and that did not resolve the issue.
UPDATE 3:
I went through and rebooted each node in the cluster one-by-one. After that things appear to be back to normal. However, I still don't know what caused this. More importantly, how do I keep this from happening again in the future?