How can I get more precise log sources from my Deis apps/containers? - deis

I have a Deis cluster running in a (hopefully-soon-to-be) Production environment, with quite a few different apps using the Dockerfile deployment method. Everything's running fine, but promoting this system to a true Production environment (that is, converting the DNS over) isn't really possible unless I can get some worthwhile log output. Using the standard Deis logging platform, here's some sample output of a Web hit (with a bit more output, for context):
Feb 10 01:46:04 ip-10-21-2-154.ec2.internal systemd[1]: Starting Generate /run/coreos/motd...
Feb 10 01:46:04 ip-10-21-2-154.ec2.internal systemd[1]: Started Generate /run/coreos/motd.
Feb 10 01:46:08 ip-10-21-2-154.ec2.internal docker[1867]: [info] GET /containers/json
Feb 10 01:46:08 ip-10-21-2-154.ec2.internal docker[1867]: [215084df] +job containers()
Feb 10 01:46:08 ip-10-21-2-154.ec2.internal docker[1867]: [215084df] -job containers() = OK (0)
Feb 10 01:46:09 ip-10-21-2-154.ec2.internal sh[1316]: 2015/02/10 01:46:09 set /deis/services/production-web/production-web_v8.cmd.1 -> 10.21.2.154:49409
Feb 10 01:46:12 ip-10-21-2-154.ec2.internal sh[9844]: 2015-02-10 01:46:12.302721 7f213ae14700 0 mon.ip-10-21-2-154.ec2.internal@4(peon).data_health(58) update_stats avail 80% total 102400 MB, used 17621 MB, avail 82542 MB
Feb 10 01:46:18 ip-10-21-2-154.ec2.internal docker[1867]: [info] GET /containers/json
Feb 10 01:46:18 ip-10-21-2-154.ec2.internal docker[1867]: [215084df] +job containers()
Feb 10 01:46:18 ip-10-21-2-154.ec2.internal docker[1867]: [215084df] -job containers() = OK (0)
Feb 10 01:46:19 ip-10-23-1-151.ec2.internal sh[1521]: [INFO] - [10/Feb/2015:01:46:27 +0000] - 10.21.2.179 - - - 200 - "GET / HTTP/1.1" - 4927 - "-" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36" - "~^production-web\x5C.(?<domain>.+)$" - 10.21.2.154:49409
Feb 10 01:46:19 ip-10-21-2-154.ec2.internal sh[8468]: ===========
Feb 10 01:46:19 ip-10-21-2-154.ec2.internal sh[8468]: HIT TRACKER
Feb 10 01:46:19 ip-10-21-2-154.ec2.internal sh[8468]: SLUG: public/javascripts/bundle.js
Feb 10 01:46:19 ip-10-21-2-154.ec2.internal sh[8468]: ===========
That contains a lot of platform information, which is great to have, if only I could filter it out. The problem is all the lines whose source is sh, but with different PIDs. Those are each completely different containers:
1316 is deis-publisher
9844 is deis-store-monitor
1521 is deis-router
8468 is my web application, production-web
The only way for me to find that out is to ssh into the box and run ps. What's worse, if I had any logs from my other containers at the same time, they would have also shown up as sh – in a production environment with several active apps all logging to the same stream, this situation is obviously untenable. The ideal situation would have sh replaced by the name of the Docker container or, preferably, the Deis app.
I've pored over the documentation and dug into the logspout and logger source code, but I can't find anything to fix this. Any chance I could get some pointers here?

To get at the name of the Deis container that logged a given line, the best approaches I've found are to either:
Run the output of journalctl -f -o short through netcat to a fluentd or logstash TCP listener. You can then use those tools to filter on the fields, such as _SYSTEMD_UNIT, that matter to you (see the sketch after this list).
Use ianblenke/fluentd with LOG_DOCKER_JSON defined, or fork and modify the autobuild source docker-ianblenke/fluentd. This uses the fluentd-docker plugin to follow the raw Docker container JSON logs.
If you're using CoreOS, I use this fluentd.cloud-init to auto-feed my logs to a local elasticsearch instance on TCP 9200. You will find other useful CoreOS cloud-init configs in that project as well.
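For the first option, a minimal sketch; the listener host and port here are placeholders, so point it at wherever your fluentd or logstash TCP input actually listens:
journalctl -f -o short | nc logs.internal 5170
# or, for structured journal fields such as _SYSTEMD_UNIT that are easier to filter downstream:
journalctl -f -o json | nc logs.internal 5170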

Related

Clickhouse default http handlers not supported

I have been trying to run ClickHouse on an EC2 instance provisioned with Terraform. So far the EC2 instance runs well and I can reach the HTTP interface at localhost:8123. However, when I try to access localhost:8123/play I get the following message:
There is no handle /play
Use / or /ping for health checks.
Or /replicas_status for more sophisticated health checks.
Send queries from your program with POST method or GET /?query=...
Use clickhouse-client:
For interactive data analysis:
clickhouse-client
For batch query processing:
clickhouse-client --query='SELECT 1' > result
clickhouse-client < query > result
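(For context, the HTTP interface that message refers to can be exercised directly; a minimal sketch, assuming the default host and port shown above:)
# send a query with POST, as the message suggests
curl 'http://localhost:8123/' --data-binary 'SELECT 1'
# or as a GET with the query parameter
curl 'http://localhost:8123/?query=SELECT%201'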
I don't understand why this is happening, as I was not getting that error when running locally.
When I check the status of the clickhouse server I get the following output:
● clickhouse-server.service - ClickHouse Server
Loaded: loaded (/lib/systemd/system/clickhouse-server.service; enabled; vendor preset: enabled)
Mar 25 12:14:35 systemd[1]: Started ClickHouse Server.
Mar 25 12:14:35 clickhouse-server[11774]: Include not found: clickhouse_remote_servers
Mar 25 12:14:35 clickhouse-server[11774]: Include not found: clickhouse_compression
Mar 25 12:14:35 clickhouse-server[11774]: Logging warning to /var/log/clickhouse-server/clickhouse-server.log
Mar 25 12:14:35 clickhouse-server[11774]: Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Mar 25 12:14:35 clickhouse-server[11774]: Include not found: networks
Mar 25 12:14:35 clickhouse-server[11774]: Include not found: networks
Mar 25 12:14:37 clickhouse-server[11774]: Include not found: clickhouse_remote_servers
Mar 25 12:14:37 clickhouse-server[11774]: Include not found: clickhouse_compression
I don't know if this will help, but maybe it is related to the problem (the log files are empty).
Another question, which has nothing to do with the problem above, is about how ClickHouse actually works: there are many articles about ClickHouse, but none seem very clear to me. They often talk about "nodes". So far my understanding is that ClickHouse runs on servers that form clusters; inside those clusters we define shards, and each shard holds replicas, the so-called "nodes". As we will be running in production, I just want to make sure whether "nodes" refers to containers that act as compute units, or to something else entirely.
So far I've tried opening all ingress and egress ports, but that did not fix the problem. I've also checked the ClickHouse documentation, which mentions custom HTTP endpoints, but nothing covers this error.

cloud-init error: not running my userdata from cdrom iso

I am getting the error below:
[ 698.855708] cloud-init[1158]: 2017-10-09 23:48:42,438 - util.py[WARNING]: Broken config drive: /dev/sr0
I am trying to run cloud-init during boot of the VM and to have it take the ISO from the CD-ROM. It's a CentOS machine.
I also get this error:
[ 846.922986] cloud-init[1158]: 2017-10-09 23:51:10,627 - DataSourceEc2.py[CRITICAL]: Giving up on md from ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 125 seconds
[ 847.834620] cloud-init[1158]: 2017-10-09 23:51:10,912 - util.py[WARNING]: Getting data from failed
[ 995.764648] cloud-init[3092]: Cloud-init v. 0.7.5 running 'modules:config' at Tue, 10 Oct 2017 03:53:12 +0000. Up 969.10 seconds.
[ 1080.429808] cloud-init[3507]: Cloud-init v. 0.7.5 running 'modules:final' at Tue, 10 Oct 2017 03:54:49 +0000. Up 1065.33 seconds.
ci-info: no authorized ssh keys fingerprints found for user centos.
Please suggest.
The above error most likely appears when you have a malformed meta_data.json or the wrong parameter(s) defined in it.
See the error-checking test in the cloud-init source:
https://github.com/racker/cloud-init-debian-pkg/blob/409f73b8434717b6a2f0353c605399dde35d1f66/tests/unittests/test_datasource/test_configdrive.py
Check out the test test_seed_dir_bad_json_metadata().
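A quick way to check for that, assuming the config drive from the warning above (/dev/sr0) follows the usual openstack/latest layout that cloud-init's config-drive datasource reads:
sudo mkdir -p /mnt/config && sudo mount -o ro /dev/sr0 /mnt/config
# a parse error here means meta_data.json is malformed JSON
python -c "import json; json.load(open('/mnt/config/openstack/latest/meta_data.json'))"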

Docker on CentOS 7.2: kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

I'm running Docker on CentOS 7. From time to time the following message is displayed:
Message from syslogd@dev-master at Mar 29 17:23:03 ...
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
I've googled a lot, read many of the resources I found, and tried many things, like keeping my system updated and upgrading the kernel, but the message still keeps showing up; it's not too frequent, but sooner or later I'll see it. I also found that the issue for this problem on Docker's GitHub is still open, so my questions are:
What does this message mean? Could somebody give me a simple explanation why docker causes it?
Is there any workaround for this?
If it cannot be fixed yet (the issue is still open), will it affect the server or the services running inside Docker containers? Could it become a serious performance issue? It also happens on our production servers.
Docker version:
Client:
Version: 1.11.1
API version: 1.23
Go version: go1.5.4
Git commit: 5604cbe
Built: Wed Apr 27 00:34:42 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.1
API version: 1.23
Go version: go1.5.4
Git commit: 5604cbe
Built: Wed Apr 27 00:34:42 2016
OS/Arch: linux/amd64
OS info:
CentOS 7, with kernel version: 4.6.0-1.el7.elrepo.x86_64
I'd really appreciate any info, tips, or resources. Thanks a lot.
Your best source of information is the issue you linked, docker#5618. This is a kernel bug that has not yet been resolved. The issue is "triggered" by Docker because network interfaces are created and destroyed for containers as they start and stop.
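To gauge how often it happens on a given host (a diagnostic only, not a fix; the message text is taken from the warning above):
dmesg -T | grep -c 'unregister_netdevice: waiting for lo to become free'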

Shiny server Connection closed. Info: {"type":"close","code":4503,"reason":"The application unexpectedly exited","wasClean":true}

I've encountered a problem deploying my Shiny app on Ubuntu 16.04 LTS.
After I run sudo systemctl start shiny-server and point my browser to http://192.168..*:3838/StockVis/, the web page greys out within a second.
I found some warnings in the web console, shown below, and have been searching the web for about two weeks, but still have no solution. :(
***"Thu Feb 16 2017 12:20:49 GMT+0800 (CST) [INF]: Connection opened. http://192.168.**.***:3838/StockVis/"
Thu Feb 16 2017 12:20:49 GMT+0800 (CST) [DBG]: Open channel 0
The application unexpectedly exited.
Diagnostic information is private. Please ask your system admin for permission if you need to check the R logs.
Thu Feb 16 2017 12:20:50 GMT+0800 (CST) [INF]: Connection closed. Info: {"type":"close","code":4503,"reason":"The application unexpectedly exited","wasClean":true}
Thu Feb 16 2017 12:20:50 GMT+0800 (CST) [DBG]: SockJS connection closed
Thu Feb 16 2017 12:20:50 GMT+0800 (CST) [DBG]: Channel 0 is closed
Thu Feb 16 2017 12:20:50 GMT+0800 (CST) [DBG]: Removed channel 0, 0 left
Please give me some suggestions on how to move forward.
This can indicate that something in your R code is causing an error. As that R error could be anything, this answer is meant to help you gather that information; the browser console messages will not tell you what it is. To access the error, you need to configure Shiny Server not to delete the log when the application exits.
Assuming you have sudo access:
$ sudo vi /etc/shiny-server/shiny-server.conf
Place the following line in the file after run_as shiny; :
preserve_logs true;
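A minimal sketch of the relevant portion of the file after the edit (your existing server and location blocks stay as they are):
run_as shiny;
preserve_logs true;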
Restart shiny:
sudo systemctl restart shiny-server
Reload your Shiny app.
In the /var/log/shiny-server/ directory there will be a log file named after your application. Viewing that file will give you more information about what is going on.
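For example (the exact file name is generated by Shiny Server from your app name and a timestamp, so the placeholder below is illustrative):
sudo ls -t /var/log/shiny-server/ | head
sudo tail -n 100 /var/log/shiny-server/<your-app-log-file>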
Warning: after you are done, take the preserve_logs true; line back out of the conf file and restart Shiny Server. Otherwise you will keep generating a bunch of log files you don't want.

jMeter Distributed Testing: Master won't shut down

I have a simple 4-server setup running JMeter (3 slaves, 1 master):
Slave 1: 10.135.62.18 running ./jmeter-server -Djava.rmi.server.hostname=10.135.62.18
Slave 2: 10.135.62.22 running ./jmeter-server -Djava.rmi.server.hostname=10.135.62.22
Slave 3: 10.135.62.20 running ./jmeter-server -Djava.rmi.server.hostname=10.135.62.20
Master: 10.135.62.11 with remote_hosts=10.135.62.18,10.135.62.22,10.135.62.20
I start the test with ./jmeter -n -t /root/jmeter/simple.jmx -l /root/jmeter/result.jtl -r
With the following output:
Writing log file to: /root/apache-jmeter-3.0/bin/jmeter.log
Creating summariser <summary>
Created the tree successfully using /root/jmeter/simple.jmx
Configuring remote engine: 10.135.62.18
Configuring remote engine: 10.135.62.22
Configuring remote engine: 10.135.62.20
Starting remote engines
Starting the test @ Mon Aug 29 11:22:38 UTC 2016 (1472469758410)
Remote engines have been started
Waiting for possible Shutdown/StopTestNow/Heapdump message on port 4445
The Slaves print:
Starting the test on host 10.135.62.22 @ Mon Aug 29 11:22:39 UTC 2016 (1472469759257)
Finished the test on host 10.135.62.22 @ Mon Aug 29 11:22:54 UTC 2016 (1472469774871)
Starting the test on host 10.135.62.18 @ Mon Aug 29 11:22:39 UTC 2016 (1472469759519)
Finished the test on host 10.135.62.18 @ Mon Aug 29 11:22:57 UTC 2016 (1472469777173)
Starting the test on host 10.135.62.20 @ Mon Aug 29 11:22:39 UTC 2016 (1472469759775)
Finished the test on host 10.135.62.20 @ Mon Aug 29 11:22:56 UTC 2016 (1472469776670)
Unfortunately, the master waits for messages on port 4445 indefinitely even though all slaves have finished the test.
Is there anything I have missed?
I figured it out myself just before submitting the question. I guess the solution could be useful nonetheless:
Once I start the test (on the main server) with this:
./jmeter -n -t /root/jmeter/simple.jmx -l /root/jmeter/result.jtl -r -Djava.rmi.server.hostname=10.135.62.11 -Dclient.rmi.localport=4001
It works just fine. I wonder why the documentation doesn't mention something like this.
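As an alternative to passing those flags on every run, the same settings can live in the master's property files; a minimal sketch, assuming JMeter 3.x defaults:
# user.properties on the master -- JMeter property, matching -Dclient.rmi.localport above
client.rmi.localport=4001
# system.properties on the master -- JVM system property, matching -Djava.rmi.server.hostname above
java.rmi.server.hostname=10.135.62.11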
