Messages dropped in Syslog-NG - Buffer configuration?

I'm using syslog-ng 3.8 as a syslog server receiving messages from many different sources, about 400 servers (filtering them and eventually relaying them to a Splunk server).
However, it looks like many messages are "dropped" even before they are filtered and forwarded to the Splunk instance.
I have a configuration that writes all "incoming" messages to a flat file before any filtering, and some messages never show up in that file (see example below), even though I can see them arrive in a tcpdump capture. In other words, syslog-ng "drops" messages at the "source" stage.
I suspect this is due to the large amount of messages arriving on its interface, and that I need to do some tuning with buffers and specific options.
Here is a concrete example of the problem:
From the source machine, I run a small loop sending one message per second with an incrementing ID (so 20, 21, 22, 23, 24 and so on):
root@sm1u1050vmo /var/log: for ((i=20;i<100;i++)); do logger -p auth.notice "test auth notice $i" ; sleep 1 ; done
If I tail the "incoming.log" flat file on the syslog-ng receiver, I can see many missing messages:
[root@xm1p1034vmo 20]# tail -f incoming.log | grep sm1u1050vmo
Sep 20 12:27:32 sm1u1050vmo root: [ID 702911 auth.notice] test auth notice 28
Sep 20 12:27:34 sm1u1050vmo root: [ID 702911 auth.notice] test auth notice 30
Sep 20 12:27:37 sm1u1050vmo root: [ID 702911 auth.notice] test auth notice 33
Sep 20 12:27:42 sm1u1050vmo root: [ID 702911 auth.notice] test auth notice 38
Sep 20 12:27:43 sm1u1050vmo root: [ID 702911 auth.notice] test auth notice 39
Sep 20 12:27:52 sm1u1050vmo root: [ID 702911 auth.notice] test auth notice 48
We can clearly see that many messages are missing.
Here is a small part of my config, with options and sources:
options {
    chain_hostnames(no);
    log_msg_size(8192);
    time_reopen(2);
    create_dirs(yes);
    use_dns(yes);
    keep_hostname(yes);
    stats_freq(3600);
    flush_lines(1);
    log_fifo_size(1000);
};
The source being used:
source s_EXTERNAL {
    network(transport("udp") log-fetch-limit(500));
};
the "local copy" destination being used to track these incoming messages before they are filtered :
destination d_INCOMING_ALL
{
    file("/app/syslog-ng/logs/${YEAR}/${MONTH}/${DAY}/incoming.log" create-dirs(yes));
};
Note: I am currently trying to play with options such as so-rcvbuf() in the source (and adapting the kernel parameter rmem_max at the same time). It was 16 MB and I increased it to 64 MB, but it hasn't changed anything so far. I can still see RcvbufErrors and "packet receive errors" increasing when running netstat -us.
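For reference, here is roughly what I'm experimenting with right now (a sketch of the tuning, not a confirmed fix; the 64 MB value is simply what I'm testing):

source s_EXTERNAL {
    network(
        transport("udp")
        log-fetch-limit(500)
        so-rcvbuf(67108864)   # 64 MB UDP receive buffer for this socket
    );
};

# on the syslog-ng host, the kernel must allow a buffer that large:
sysctl -w net.core.rmem_max=67108864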
Any hints please?
Thanks.

I'd tell you a UDP joke, but you probably wouldn't get it. Seriously, UDP delivery is not guaranteed. Change the protocol to TCP on each end, and you should be set. Protocol aside, if your system is dropping messages that arrive at only one per second, you likely have configuration issues on your network or it's simply overloaded.
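A minimal sketch of the receiving side over TCP (assuming the same network() source; the clients would have to be switched to TCP as well):

source s_EXTERNAL {
    # TCP lets the sender and kernel apply backpressure instead of
    # silently discarding datagrams the way UDP does
    network(transport("tcp") port(514) log-fetch-limit(500));
};

The 400 senders then need to forward over TCP as well; how depends on their syslog implementation.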

Related

ldapsearch is slow to launch (not slow to search, slow to launch)

On one host ldapsearch was taking 20 seconds to launch.
Even if I just asked it what its version number is, it still took 20 seconds:
time ldapsearch -VV
ldapsearch: @(#) $OpenLDAP: ldapsearch 2.4.44 (Sep 30 2020 17:16:36) $
mockbuild@x86-02.bsys.centos.org:/builddir/build/BUILD/openldap-2.4.44/openldap-2.4.44/clients/tools
(LDAP library: OpenLDAP 20444)
real 0m20.034s
user 0m0.006s
sys 0m0.008s
This isn't a question about time to search - if I asked it to search, it would spend 20 seconds before it even starts searching.
Once it starts, the search succeeds and takes about the same time as it does when invoked from other hosts.
I tried adding various command line parameters.
The only thing that returned a different result was ldapsearch --help, which returns basically instantly, suggesting that the problem wasn't in loading libraries or anything like that.
Running strace showed that the delay was in network traffic, specifically port 53 (DNS):
socket(AF_INET6, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3 <0.000038>
connect(3, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "...
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}]) <0.000011>
sendto(3, "..."..., 34, MSG_NOSIGNAL, NULL, 0) = 34 <0.000033>
poll([{fd=3, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005182>
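Per-call timings like the ones above can be captured with something along these lines (the exact invocation isn't given in the post, so treat it as an assumption):

# -T prints the time spent in each system call (the <...> timings above)
strace -T -f ldapsearch -VV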
The destination of the connect call turned out to be an IP address listed in /etc/resolv.conf.
That IP address was unreachable.
Removing the unreachable IP address from /etc/resolv.conf made the delay go away.
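A quick way to confirm a dead resolver like that (a sketch; 192.0.2.53 below is a placeholder, not the address that was actually in resolv.conf):

# query a suspect nameserver directly, with a short timeout
dig @192.0.2.53 example.com +time=2 +tries=1

A working resolver answers immediately; the broken one just times out.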

Phabricator not sending outbound emails

I have set up my outbound emails on phabricator by following this guide.
However, my emails don't arrive; they all stay queued. When I went to the daemons page in the Phabricator UI, I saw that several tasks were failing. They all look like this:
Task 448: PhabricatorMetaMTAWorker
Task Status: Queued
Task Class: PhabricatorMetaMTAWorker
Lease Status: Leased
Lease Owner: 13195:1624502950:mail.icicbcoin.com:11
Lease Expires: 1 h, 59 m
Duration: Not Completed
Data: phabricator/ $ ./bin/mail show-outbound --id 154
Retries
Failure Count: 5
Maximum Retries: 250
Retries After: 1 m, 2 m, 4 m, 6 m, 8 m, 11 m, 14 m, 17 m, 20 m, 23 m, 27 m, ...
I'm curious about this Data part. To me it sounds like Phabricator fails running this command, which is weird, because if I run ./bin/mail show-outbound --id 154 manually I get this:
ID: 154
Status: queued
Related PHID:
Message: fputs(): send of 28 bytes failed with errno=32 Broken pipe
PARAMETERS
sensitive: 1
mustEncrypt:
subject: [Phabricator] Welcome to Phabricator
to: ["PHID-USER-qezqlvc7rxton2lshjue"]
force: 1
HEADERS
TEXT BODY
Welcome to Phabricator!
admin (John Doe) has created an account for you.
Username: some.person
To log in to Phabricator, follow this link and set a password:
http://phabricator.innolabsolutions.rs/login/once/welcome/9/b2jf7j6mg5xomwjhmcfcxbigs7474jyq/10/
After you have set a password, you can log in to Phabricator in the future by going here:
http://phabricator.innolabsolutions.rs/
Love,
Phabricator
HTML BODY
(This message has no HTML body.)
Actually, the problem was the SMTP server configuration, even though this error didn't tell me that. I changed the SMTP port from 465 to 587, restarted the daemons and it worked.
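For anyone hitting the same thing, a quick way to sanity-check the SMTP side outside of Phabricator (the hostname below is a placeholder for your own mail relay):

# confirm the submission port answers and offers STARTTLS
openssl s_client -connect smtp.example.com:587 -starttls smtp -crlf
# then restart the Phabricator daemons so the queued mail is retried
./bin/phd restart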
I had the same problem twice.
The second time, it was because I could not resolve the smtp server name:
$ ping gandi.net
ping: gandi.net: Temporary failure in name resolution
Then I added a DNS server to /etc/resolv.conf:
nameserver 127.0.0.1
nameserver 8.8.8.8 # <--- added
search home
and restarted the resolver service:
sudo service systemd-resolved restart
Right after, Phabricator automatically sent all the queued emails.

EUCA 4.4.5 VPCMIDO Instances Terminate at Launch

I have set up a small test cloud on 3 pieces of hardware. It works fine in EDGE mode, but when I configure it for VPCMIDO, new instances begin to launch but then time out and move to a terminated state. I can also see the instances' initial volume and config data appear in the NC and CC data directories. Below is my system layout and network.json.
HOST 1 : CLC/UFS/WALRUS/MIDO CLUSTER/MIDO GATEWAY/MIDOLMAN AGENT:
em1 (All Services including Mido Cluster): 10.0.0.21
em3 (Target VPCMIDO Adapter): 10.0.0.22
HOST 2 : CC/SC
em1 : 10.0.0.23
HOST 3 : NC/MIDOLMAN AGENT
em1 : 10.0.0.24
{
  "Mido": {
    "Gateways": [
      {
        "Ip": "10.0.0.22",
        "ExternalDevice": "em3",
        "ExternalCidr": "192.168.0.0/16",
        "ExternalIp": "192.168.0.2",
        "ExternalRouterIp": "192.168.0.1"
      }
    ]
  },
  "Mode": "VPCMIDO",
  "PublicIps": [
    "10.0.100.1-10.0.100.254"
  ]
}
I may be misunderstanding the intent of reserving an interface just for the Mido gateway. All of my eucalyptus/zookeeper/cassandra/midonet configs use the 10.0.0.21 interface and seem to communicate fine. The MidoNet tunnel zone reports my CLC host and NC host successfully in the tunnel zone. The only part of my config that references the interface I intend to use for the MidoNet gateway is network.json. No errors were returned at any time during my configuration, so I think I may be missing something conceptual.
You may need to start eucanetd as described here:
https://docs.eucalyptus.cloud/eucalyptus/4.4.5/index.html#install-guide/starting_euca_clc.html
The eucanetd component in VPCMIDO mode runs on the cloud controller and is responsible for controlling MidoNet.
When eucanetd is not running, instances will fail to start because the required network resources will not be created.
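A minimal sketch of doing that (assuming a systemd-based install on the CLC):

# enable and start eucanetd on the cloud controller, then confirm it is running
systemctl enable eucanetd
systemctl start eucanetd
systemctl status eucanetd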
I configured a bridge on the NC, and instances were able to launch and I no longer got an error in my nc.log. The docs and the comments in eucalyptus.conf tell me I shouldn't need to do this in VPCMIDO networking mode: https://docs.eucalyptus.cloud/eucalyptus/4.4.5/index.html#install-guide/configuring_bridge.html
Despite all that, adding the bridge fixed this issue.
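For reference, the bridge definition involved looks roughly like this on CentOS 7 (a sketch; br0, em1 and the 10.0.0.24 address are assumptions based on the host layout above):

# /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=10.0.0.24
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE=em1
BRIDGE=br0
ONBOOT=yes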

Command died with signal 6: "/usr/libexec/dovecot/deliver"

Emails are not being delivered to a particular email ID.
We are using Sentora panel and Postfix mail server.
Error message:
Command died with signal 6: "/usr/libexec/dovecot/deliver"
Mail log:
Feb 14 09:50:27 host postfix/pipe[24913]: CBD7D2010A5: to=,
relay=dovecot, delay=13047, delays=13045/0/0/1.3, dsn=4.3.0,
status=SOFTBOUNCE (Command died with signal 6:
"/usr/libexec/dovecot/deliver")
Please help.
Signal 6 is SIGABRT, which is typically raised when Dovecot's deliver binary aborts because of an internal problem. There are a number of reasons this could happen.
You can turn on LDA logging within your Dovecot config to get more insight on what's actually happening:
protocol lda {
  ...
  # remember to give proper permissions for these files as well
  log_path = /var/log/dovecot-lda-errors.log
  info_log_path = /var/log/dovecot-lda.log
}
This can also happen when mail_temp_dir (default: /tmp) does not have enough space to extract attachments. It was fixed in https://github.com/dovecot/core/commit/43d7f354c44b358f45ddd10deb3742ec1cc94889 but the fix is not yet available in some Linux distributions (such as Debian bullseye).
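As a workaround, the temp directory can be pointed at a filesystem with more free space (a sketch; /var/tmp is just an example location):

# dovecot.conf (or a conf.d fragment): extract attachments somewhere roomier than /tmp
mail_temp_dir = /var/tmp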

bind failure: Address already in use even though recycle and reuse flags are set to 1

Environment:
Unix client and Unix server.
Tool used: curl.
The client/server should ignore the TIME_WAIT period (2 * MSL) when establishing the connection.
This is done by executing the following commands:
sysctl net.ipv4.tcp_tw_reuse=1
sysctl net.ipv4.tcp_tw_recycle=1
The local port must be specified so that it can be re-used.
Start the connection.
Example: while [ 1 ]; do curl --local-port 9056 192.168.40.2; sleep 30; done
I am still seeing the error even though the TIME_WAIT period should have been ignored.
Any idea why this is happening?
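One thing worth checking between runs (a diagnostic sketch, not a fix) is whether the local port is still sitting in TIME_WAIT when the next curl starts:

# list TCP sockets in TIME_WAIT that used local port 9056
ss -tan state time-wait '( sport = :9056 )'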
