pandorafms monitoring - networking

I'm new to network monitoring. I'm using the free version of Pandora FMS 4.0.2 on an Ubuntu server. I added about 1,167 agents and 5,831 remote monitors. The number of unknown agents and unknown monitors is high. It rises and falls but never reaches 0. I checked a few unknown monitors at random and pinged their IP addresses from a terminal. The hosts respond, but Pandora FMS still shows them in the unknown state. I checked again about 6 hours later and the network lag value is still high. I need help.

The behavior you describe could have two causes:
A lack of resources on the server. Normally a Pandora server can monitor 2000 devices, but that depends on the resources of the machine that hosts Pandora. You can check the minimum requirements here.
It could be a bug :-). A bug related to network monitoring was detected in the 4.x versions; it causes random failures when monitoring with ping.
I would update your installation to the new 5.0 version. If the unknown modules persist, you can test for a lack of resources by disabling some agents. You can also check some tips for configuring Pandora for large environments here.
Hope it helps.
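To get a rough feel for the "lack of resources" case, here is a back-of-the-envelope sketch in Python. It only does arithmetic: how many checks per second the server must complete, and roughly how many parallel workers that implies. The 300-second module interval and the average check durations are assumptions for illustration, not values from the question, and the function names are made up.

```python
import math


def required_check_rate(modules: int, interval_s: float) -> float:
    """Checks per second needed so every module runs once per interval."""
    return modules / interval_s


def required_workers(modules: int, interval_s: float, avg_check_s: float) -> int:
    """Parallel workers needed if one check takes avg_check_s seconds on average."""
    return math.ceil(required_check_rate(modules, interval_s) * avg_check_s)


# 5,831 remote monitors as in the question; the 300 s interval is an assumption.
rate = required_check_rate(5831, 300)
print(f"checks per second needed: {rate:.1f}")

# Assumed average durations per check (slow or timing-out hosts raise the average).
for avg in (0.1, 1.0, 2.0):
    print(f"avg check {avg:>4} s -> about {required_workers(5831, 300, avg)} workers")
```

If the worker count this suggests is well above what the server is configured or sized for, checks will fall behind schedule, which would be consistent with monitors drifting into the unknown state even though the hosts answer ping.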

Related

How to disallow Perforce UNIX server to generate thousands of IDLE processes

I'm asking this question because we have run out of ideas on how to handle the current situation with our Perforce versioning server.
The Server
The server is hosted on Scaleway on a bare-metal machine with two SSDs (we know it is not a hardware issue).
We are currently using the free license of Perforce to evaluate it.
P4 info yields the following:
The Problem
We are using Perforce on a UNIX server to version our Unreal Engine 4 project. We recently discovered that the server had accumulated 2771 processes, around 80% of them p4d processes. We suspect these IDLE connections/processes are swamping the server and are the root of the connectivity issues we encounter at the office.
We enabled monitoring to keep an eye on RUNNING and IDLE processes
p4 configure set monitoring=2
When we now display the monitored processes we see IDLE ones running for more than one hour
p4 monitor show
We already tried disabling keepalive connections with
p4 configure set net.keepalive.disable=1
And we see the following which is going on for a while
The Question
Now the question I want to ask is:
Has anybody else encountered this behaviour with a Perforce server on UNIX?
Does anybody know how we can tell the server to discard IDLE connections?
EDIT
After some tracking we discovered that the proxy our office network sits behind causes the problem: for some reason it doesn't allow the connections to close. Does anyone have any clues on how to get around this issue?
Based on the monitor output it appears that these clients are opening a bunch of connections and holding them open, basically DOSing the server. You could go through and kill the pids on the server side, but this sounds like a bug in the client that should be raised with Perforce technical support.
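As a rough illustration of "going through the pids", the Python sketch below parses text in the usual `p4 monitor show` layout (pid, status, user, hh:mm:ss, command) and lists the idle entries older than a threshold. The sample lines, the function names and the one-hour threshold are assumptions made for the example; check the layout against your own server's output before relying on it.

```python
import re
from typing import List, NamedTuple


class MonitorEntry(NamedTuple):
    pid: int
    status: str
    user: str
    seconds: int
    command: str


# Assumed line layout: "<pid> <status> <user> <hh:mm:ss> <command...>"
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+):(\d{2}):(\d{2})\s*(.*)$")


def parse_monitor_show(text: str) -> List[MonitorEntry]:
    """Turn `p4 monitor show` style text into structured entries."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unexpected lines
        pid, status, user, hours, minutes, seconds, command = match.groups()
        elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        entries.append(MonitorEntry(int(pid), status, user, elapsed, command.strip()))
    return entries


def idle_longer_than(entries: List[MonitorEntry], min_seconds: int) -> List[int]:
    """Return pids whose status is 'I' (idle) for at least min_seconds."""
    return [e.pid for e in entries if e.status == "I" and e.seconds >= min_seconds]


# Made-up sample output, for illustration only.
SAMPLE = """\
 1001 I alice 01:12:45 IDLE
 1002 R bob   00:00:03 sync
 1003 I carol 00:05:10 IDLE
"""

print(idle_longer_than(parse_monitor_show(SAMPLE), 3600))
```

The resulting pids are only candidates: they could be passed to `p4 monitor terminate` or killed on the server side as suggested above, but that treats the symptom, not whatever is holding the connections open.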

warning google cloud compute instance over utilized

I recently installed a Bitnami WordPress Network stack on Google Compute Engine.
I keep getting a warning saying that the instance is overutilised. However, when I view the CPU and disk usage statistics, I cannot see how this is possible. Both are usually very low, only spiking when I am administering websites (i.e. importing large files, running backups, etc.).
For example, as I post this message right now, usage for the
Is this just a marketing ploy to get me to upgrade my instance?
What happens when an instance is overutilised anyway? What are the symptoms? My WordPress network appears to be functioning flawlessly.
Please see images of my disk and cpu usage over the last 7 days
[CPU utilisation statistics 7 days][1]
[disk operations 7 days][2]
[Network Packets statistics 7 days][3]
[1]: https://i.stack.imgur.com/iZa0L.png
[2]: https://i.stack.imgur.com/lUOno.png
[3]: https://i.stack.imgur.com/SnbHq.jpg
You need to install the Monitoring Agent in order to get accurate recommendations.
If the monitoring agent is installed and running on a VM instance, the
CPU and memory metrics collected by the agent are automatically used
to compute sizing recommendations. The agent metrics provided by the
monitoring agent give better insights into resource utilization of the
instance than the default Compute Engine metrics. This allows the
recommendation engine to estimate resource requirements better and
make more precise recommendations.
Read: https://cloud.google.com/compute/docs/instances/apply-sizing-recommendations-for-instances?hl=en_GB&_ga=2.217293398.-1509163014.1517671366#using_the_monitoring_agent_for_more_precise_recommendations
How to install the Monitoring Agent to get accurate sizing recommendations:
https://cloud.google.com/monitoring/agent/install-agent
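Since the quoted documentation mentions memory alongside CPU, one quick cross-check before installing the agent is to look at memory use from inside the VM. The Python sketch below parses `/proc/meminfo` style text and computes the share of memory in use. It is an illustration only: the function names are invented here, and whether memory is what triggers the warning on this particular instance is an assumption, not something the answer states.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo style text into {field: value} (values usually in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(values: Dict[str, int]) -> float:
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return 100.0 * (total - available) / total


def read_local_memory_used_percent(path: str = "/proc/meminfo") -> float:
    """Convenience wrapper for a Linux guest; call this on the VM itself."""
    with open(path) as handle:
        return memory_used_percent(parse_meminfo(handle.read()))
```

On the instance, `read_local_memory_used_percent()` gives a single snapshot. A value that stays high while the CPU and disk graphs look quiet would fit the idea that the default graphs are not showing the whole picture.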

Application Insights Behind a Firewall

I'm having issues getting Application Insights to report data to Visual Studio Online from behind our firewall. I opened the firewall rules noted in this article, but it didn't make a difference. I've uninstalled and reinstalled several times. The only thing showing up in the Operations Logs is that it periodically purges items from the "AppDiagnostics" queue because the queue exceeded the maximum allowed size of 15 MB (full error below).
Get-WebApplicationMonitoringStatus shows all the applications I would expect to be monitored being monitored.
The health service has removed some items from the send queue for management group "AppDiagnostics" since it exceeded the maximum allowed size of 15 megabytes.
The IP addresses and hosts that you need to configure/allow are officially documented here:
https://learn.microsoft.com/en-us/azure/application-insights/app-insights-ip-addresses
I'd copy and paste the "relevant" portions, but there's a huge number of them depending on what you want/need to do, and they'd be wrong here whenever the above doc is updated.

LoadRunner - Monitoring linux counters gives RPC error

The Linux distribution is Red Hat. I'm monitoring Linux counters with the LoadRunner Controller's System Resources Graphs - Unix Resources. Monitoring works properly at first and the graphs are plotted in real time, but after a few minutes errors appear:
Monitor name :UNIX Resources. Internal rpc error (error code:2).
Machine: 31.2.2.63. Hint: Check that RPC on this machine is up and running.
Check that rstat daemon on this machine is up and running
(use rpcinfo utility for this verification).
Details: RPC: RPC call failed.
RPC-TCP: recv()/recvfrom() failed.
RPC-TCP: Timeout reached. (entry point: Factory::CollectData).
[MsgId: MMSG-47197]
I logged on to the Linux server and found that rstatd was still running. After I cleared the measurements in the Controller's Unix Resources and added them again, monitoring started to work again, but after a few minutes the same error occurred.
What might cause this error? Is it due to network traffic?
Consider using SiteScope, which has been the preferred monitoring foundation for collecting UNIX/Linux status since version 8.0 of LoadRunner. Every LoadRunner license since version 8 has come with a 500-point SiteScope license in the box for this purpose. More points are available upon request for test-exclusive use of the instance.
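If you stay with the built-in monitor for now, the hint in the error message (verify rstatd with rpcinfo) can be scripted. The Python sketch below parses `rpcinfo -p <host>` style output and reports how rstatd (RPC program number 100001) is registered. The sample text and function names are invented for illustration. Because the error mentions RPC-TCP, the example also checks for a TCP registration, but whether the Controller actually requires TCP here is an assumption.

```python
from typing import Dict, List

RSTATD_PROGRAM = 100001  # standard RPC program number for rstatd


def parse_rpcinfo(text: str) -> List[Dict[str, object]]:
    """Parse `rpcinfo -p` style rows: program, version, protocol, port, service."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header or malformed line
        rows.append({
            "program": int(fields[0]),
            "version": int(fields[1]),
            "proto": fields[2],
            "port": int(fields[3]),
            "service": fields[4] if len(fields) > 4 else "",
        })
    return rows


def rstatd_registrations(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the rows that belong to rstatd."""
    return [row for row in rows if row["program"] == RSTATD_PROGRAM]


def has_tcp_registration(rows: List[Dict[str, object]]) -> bool:
    """True if rstatd is registered over TCP in the given rows."""
    return any(row["proto"] == "tcp" for row in rstatd_registrations(rows))


# Made-up sample output, for illustration only.
SAMPLE = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100001    3   udp  32768  rstatd
"""

rows = parse_rpcinfo(SAMPLE)
print(rstatd_registrations(rows))
print("rstatd over TCP:", has_tcp_registration(rows))
```

Running the real `rpcinfo -p` against the monitored machine at intervals, and comparing the output from before and after the error appears, would show whether the registration changes or whether the timeouts come from somewhere else, such as the network path.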

Biztalk Cluster Servers

We used to have one BizTalk 2006 R2 32-bit server, which we recently upgraded to Enterprise. Because of our traffic volume, a single server didn't give us enough power and memory, so we also installed a second BizTalk server, a 2006 R2 64-bit, and put the two in a shared cluster. Since then a problem has arisen (actually two, but I'm guessing they are connected). One of our 19 host instances keeps going into the "stopped" status. This host instance mainly handles TCP ports. We have a script that checks whether host instances are stopped and starts them again, but it is of little use since the instance keeps returning to the stopped state. There is also an error in our event viewer, namely:
Faulting application btsntsvc.exe, version 3.6.1404.0, stamp 4674b0a4, faulting module kernel32.dll, version 5.2.3790.4480, stamp 49c51f0a, debug? 0, fault address 0x0000bef7.
Does anyone have any idea?
Thanks
Having automated scripts restart the host instance is not a good idea IMO; you need to get to the bottom of the problem. It looks like a known issue, and a hotfix is available. It's worth looking at this KB: http://support.microsoft.com/kb/978059
