Kolla-ansible too many open files - openstack

I am having an issue with a relatively small OpenStack cluster deployed with kolla-ansible. After a few days the controllers stop working. When I look at the Docker container logs, all of them report "Too many open files" errors. I have tried raising the open-file limits in limits.conf and via sysctl, for both processes and users, but the issue still shows up.
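For what it's worth, these are roughly the knobs I tried. The values are only examples, and the daemon.json part assumes your Docker version supports the default-ulimits option:
# /etc/security/limits.conf (on the controller hosts)
* soft nofile 65536
* hard nofile 65536
# system-wide maximum via sysctl
sysctl -w fs.file-max=2097152
# /etc/docker/daemon.json -- default ulimits applied to every container
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Soft": 65536, "Hard": 65536 }
  }
}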
Interestingly, this was not happening until I had to reboot all of the controllers. I rebooted them because I needed to increase the amount of RAM they have after they died swapping. My first thought was that kolla-ansible sets some configuration when running deploy, but I can't find anywhere in the repo where kolla-ansible changes ulimits or anything similar.
Any theories about what could cause this? Could it be related to increasing the RAM? Should I run reconfigure/deploy on each controller? I've looked through kolla-ansible's docs and forums and couldn't find anyone else hitting this issue.
Update: this hasn't been fixed yet.
I submitted a bug report: https://bugs.launchpad.net/kolla-ansible/+bug/1901898

I don't know which versions of Kolla-Ansible and Linux you are using, but your problem looks closely related to this one:
On Ubuntu 16.04, please uninstall lxd and lxc packages. (An issue exists with cgroup mounts, mounts exponentially increasing when restarting container) (source: docs.openstack.org/kolla-ansible/4.0.0/quickstart.html)
I also had this problem of the number of mount points growing exponentially after restarting my Docker containers. My single-node test deployment became very slow because of it, though I can't remember at the moment whether I saw the same "too many open files" error.
You can remove the packages with apt-get remove lxc-common lxcfs lxd lxd-client. I applied this fix together with a complete reinstallation of kolla-ansible, so I don't know whether it also helps an existing installation. You should also use docker-ce instead of the docker package from the apt repos.
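If you want to check whether you are hitting the same mount-growth problem, a rough sketch (run on the host, substitute any of your container names) is to count the cgroup mounts before and after a container restart and see whether the number keeps climbing:
# count cgroup mounts on the host
grep -c cgroup /proc/mounts
# restart any container, then count again; a steadily growing number
# points at the lxd/lxcfs cgroup-mount issue
docker restart <some_container>
grep -c cgroup /proc/mounts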

This was fixed with a workaround described in bug https://bugs.launchpad.net/keystonemiddleware/+bug/1883659. The problem was that the neutron server was keeping memcached connections open and never closing them, until the memcached container hit its open-files limit. A workaround is described in the linked bug.
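To confirm you are hitting the same thing, you can watch the memcached process's open file descriptors climb over time; this is just a sketch run on the controller host as root, assuming the process is named memcached:
# count the memcached process's open file descriptors
ls /proc/$(pgrep -x memcached | head -1)/fd | wc -l
If I remember the bug discussion correctly, the workaround amounts to enabling the pooled memcache backend (memcache_use_advanced_pool = true under [keystone_authtoken]) in the affected service's config, but check the bug report for the authoritative steps.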

Related

Artifactory pro v7.30.x fails to start (multiple versions and installation methods)

I am evaluating a self-hosted Artifactory installation on a trial license. I followed the official installation instructions for the Docker container and for the Linux archive file. Neither of these installation options is working: the Artifactory service fails to start.
I have opened an issue to track the problem: https://www.jfrog.com/jira/browse/RTFACT-27182
TL;DR: a component fails, a nasty stack trace appears in the logs, and eventually the services stop.
It would seem there is a bug in Artifactory. I have traced this back through multiple versions, and the issue spans multiple years.
The problem appears to be that Artifactory cannot get past the bootstrapping/initialization phase when started with artifactoryctl. At a certain point (around 2-5 minutes in) all the services stop and a stale pid file is left behind.
The workaround I have found is that the service only gets past this initialization phase after multiple start/stop cycles (three, to be exact). In other words, call artifactoryctl start, wait for everything to fail, then artifactoryctl stop, and repeat two more times. On the fourth and final start, the service comes online (in about 150-190 s). From then on, the service starts correctly with a single call to artifactoryctl start.
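If you need to script that, a rough sketch of the cycling (the repetition count matches what I observed above; the sleep is just a generous guess at how long the failures take) could be:
# cycle artifactoryctl through three failed starts, then start for real
for i in 1 2 3; do
    artifactoryctl start || true
    sleep 200            # wait long enough for the services to fail
    artifactoryctl stop
done
artifactoryctl start     # fourth start: the service should come online in ~150-190 s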
I have not yet looked at the systemd unit file. My guess is that it has, or could be configured to have, a number of retries to work around this issue, and perhaps the issue does not exist when using the service wrapper.
I also have not looked again at the Docker container, which appears to be failing for the same reason. A workaround off the top of my head would be to modify the entrypoint script. If you were to docker exec into the container and try the workaround above, it would likely terminate the root process and kill the container.

How to extend docker environment generated by wp-env

I've been using wp-env for a while now to run local WordPress environments for development on my Mac. With the introduction of Monterey, Apple removed PHP from macOS. There are a couple of ways I can think of to handle this situation. Many people seem to be using Homebrew and MAMP. However, I'd prefer not to use Homebrew, partly because of past personal experience, but also because going down this path seems to create a whole other mess around how to handle PHP and Composer (see, for example, Using PHPCS with Homebrew On MacOS Monterey).
So my thought was: maybe I can just start doing development inside the Docker container. The questions, then:
How do I extend the wp-env npm module to add things to the Docker container by default, without modifying the wp-env source? I.e., does Docker have some sort of config I can write that will run wp-env and then add other things to the image (e.g. npm, git, eslint, etc.), so that the container itself becomes a development environment?
As I'm writing this question: does it even make sense to do it this way? I've found hints that a few people work like this (e.g., a commenter on Using Docker in development the right way described a setup with vim/tmux/vscode/zsh configuration and shortcuts baked in, and recommended running all services as containers inside that volume, which he claims is a huge performance increase over a host bind mount). Unfortunately, he linked to a git repo that either no longer exists or is at least no longer public.
While I cannot assist you specifically with wp-env, I would recommend DDEV (https://ddev.readthedocs.io/en/stable/). You basically get the freedom to choose custom PHP environments, it comes with predefined configurations for specific stacks (e.g. Laravel, WordPress, Drupal), and it is dead simple to use.
I understand you might like to continue with wp-env, but maybe this will help you out.
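For what it's worth, getting a WordPress project running under DDEV is roughly this; the flags are taken from the DDEV docs, so double-check them against your installed version:
# inside your project directory
ddev config --project-type=wordpress --docroot=.
ddev start
ddev describe      # shows the site URL and database credentials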

Meteor killed by Ubuntu while trying to "run meteor" on a server

My goal is to follow the "Deploying a Meteor app with Nginx from scratch" tutorial available here.
After installing Meteor, Node, Forever and Git, and doing the npm install, I try to run meteor to see if it works.
After downloading meteor-tools, the process begins to extract it... it looks like it hangs for a couple of minutes and then stops without any warning.
So my guess is that something causes the extraction to quit, but I don't know what exactly.
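One thing I could check (just a guess) is whether the kernel's OOM killer terminated the extraction; on Ubuntu something along these lines should show it:
# look for out-of-memory kills around the time the extraction stopped
dmesg | grep -i -E 'killed process|out of memory'
grep -i oom /var/log/syslog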
Yes, Meteor likes plenty of RAM. I would recommend using Phusion Passenger with nginx for Meteor; it's very easy to set up, and their getting-started tutorials are very good:
https://www.phusionpassenger.com/library/install/nginx/install/oss/
I haven't found the exact reason, but I got it working. I was using a DigitalOcean server (512 MB RAM / 20 GB disk). I tried with a bigger server (16 GB / 160 GB) and it works.
So I guess my server's RAM or disk capacity was too small.
Edit:
Testing smaller configurations, I noticed that the minimum for Meteor to work is 2 GB of RAM and a 40 GB disk.

Saltstack: network.ip_addrs is not available

I've run into an issue with Saltstack version 2014.7.0, where I cannot get network information from Salt.
If I run:
salt-call network.ip_addrs
I get:
Function network.ip_addrs is not available
This only seems to happen on some of my hosts. It affects almost all of the functions in salt.modules.network, but everything else works as expected.
I suspect something in my environment is to blame. I am running Salt inside a CentOS 7 Docker container. I followed these instructions to get systemd running under Docker, and it seems to be functioning just fine, so I don't think that's the issue, but I wouldn't be surprised if it's related. I'm using Docker as a development environment, but I will be using these formulas to orchestrate virtual machines in production.
Has anyone encountered the network module not being loaded properly? Is there something that needs to be available for that module to be accessible?
I have other mechanisms to get the IP address, but none that are as easy to work with in other salt formulas.
It turns out my problem was that I had my own custom module called "network", which was shadowing the upstream network module.
I'm pretty sure this was working at some point in the past, so I wonder whether a more recent version of Salt changed things so that custom modules now conflict at the module level instead of merging functions from modules with the same name, but I suppose it's possible it never worked.
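In case anyone hits the same thing, a quick way to see what the loader is actually exposing (a sketch; the search path depends on your file_roots layout) is:
# list the functions the loader currently exposes under "network"
salt-call sys.list_functions network
# look for a custom _modules/network.py in your file roots that shadows the builtin
find /srv/salt -path '*_modules*' -name 'network.py'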

Glassfish admin console slow loading

Today I stopped/started my Glassfish V3 instance, and now I cannot access the admin console at http://servername:4848/. The screen says "The admin console is loading..." and this goes on forever.
I have tried the following:
I added the following entry to my domain.xml, located at /glassfishv3/glassfish/domains/domain1/config, as suggested in another Stack Overflow Q&A, but after restarting the server there is still no luck.
<java-options>-Dcom.sun.enterprise.tools.admingui.NO_NETWORK=true</java-options>
I have also installed Glassfish v3 on my local machine and cannot recreate the problem; I can go to http://localhost:4848 without any issues.
I have also looked at the server.log and jvm.log files under /glassfishv3/glassfish/domains/domain1/logs, and nothing there sheds any light.
Any help would be very much appreciated.
I had similar symptoms, and I tried some of what Dario had suggested as well, but it didn't work. It could be that I had a unique configuration for my dev environment: I'm running Glassfish 3.1 on a VirtualBox Ubuntu 11.04 64-bit guest on a Windows 7 64-bit host. Quite by accident, I discovered an additional symptom: if I turned off the network on the Ubuntu guest, the console would load successfully in a browser on localhost. That is, with the guest's network off, I could navigate to http://localhost:4848 and see the Glassfish admin console as expected. With the network on, however, I had exactly the behavior described by the original poster: http://localhost:4848 would just sit forever on the initial loading page.
To make a long story short, I found that adding the following argument to the JVM options for server-config fixed the problem:
-Djava.net.preferIPv4Stack=true
When I made that change and restarted the Glassfish server, everything worked.
(Note that I also had in place some of the other settings recommended above, i.e., NO_NETWORK=true, and I'd adjusted the JVM memory footprint and set it to -server instead of -client. It could be that these settings are required as well, though they weren't sufficient on their own in my case.)
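In case it helps, the same option can also be added from the command line instead of editing domain.xml by hand; this assumes the default server-config target and domain name, so adjust if yours differ:
asadmin create-jvm-options "-Djava.net.preferIPv4Stack=true"
asadmin restart-domain domain1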
I was having this exact same problem. I could deploy in run mode, but it would hang forever in debug mode. IntelliJ was hanging on the breakpoints. I muted the breakpoints, and Glassfish 3 worked as good as new. I didn't have to change any domain.xml settings. Check your breakpoints!
I found a solution to my problem. Setting the NO_NETWORK java-option to true did not work, so I upgraded from 3.0.1 to 3.1 and that fixed it. Not immediately, though: I had to stop/start the Glassfish server a couple of times before I could get into the admin console without any really long delays.
Solution
The solution was to upgrade from the command line using the pkg utility.
You can find the steps in this link:
http://download.oracle.com/docs/cd/E18930_01/html/821-2437/gkthu.html#gktjf
Or do as follows:
Go to as-install-parent/bin
./pkg image-update
as-install-parent/glassfish/bin/asadmin start-domain --upgrade domain-name
as-install-parent/glassfish/bin/asadmin start-domain domain-name
UPDATE
I had performance issues again and found this other solution in Joshi's tech blog:
http://joshitech.blogspot.com/2009/09/glassfish-application-server.html
Basically, add the following JVM options to domain.xml. They should improve Glassfish boot and deployment performance:
<jvm-options>-server</jvm-options>
<jvm-options>-Xms3000m</jvm-options>
<jvm-options>-Xmx3000m</jvm-options>
<jvm-options>-XX:MaxPermSize=192m</jvm-options>
<jvm-options>-XX:NewRatio=2</jvm-options>
<jvm-options>-XX:+AggressiveHeap</jvm-options>
<jvm-options>-XX:+AggressiveOpts</jvm-options>
<jvm-options>-XX:+UseParallelGC</jvm-options>
<jvm-options>-XX:+UseParallelOldGC</jvm-options>
<jvm-options>-XX:ParallelGCThreads=5</jvm-options>
I don't know if you are referencing this answer, but there is a second step described there (disabling the update module).
Two more ideas:
Check whether the NO_NETWORK=true option really works (there should be no ads in the GF admin console).
Watch the server.log (glassfish-install-dir/glassfish/domains/domain1/logs) during startup and look for the last log entry before the delay occurs. This could be a hint about the source of the delay.
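For example, something along these lines (using the path from the original question) makes the last entry before the hang easy to spot:
tail -f /glassfishv3/glassfish/domains/domain1/logs/server.log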
Beware of blindly following Dario's example unless you have a lot more RAM than most people do.
-Xms3000m gives 3 GB to Glassfish. Do YOU have that much spare RAM?
I tried this on my 4 GB Mac with 1 GB for Glassfish. It made no discernible difference at all; performance still sucks.
