Artifactory Pro v7.30.x fails to start (multiple versions and installation methods)

I am evaluating a self-hosted Artifactory installation on a trial license. I followed the official installation instructions for both the Docker container and the Linux archive file. Neither of these installation options is working: the Artifactory service fails to start.
I have opened an issue to track the problem: https://www.jfrog.com/jira/browse/RTFACT-27182
TL;DR: A component fails, a nasty stack trace appears in the logs, and eventually the services stop.

It would seem that there is a bug in Artifactory; I have traced it back through multiple versions, and the issue spans multiple years.
The problem appears to be that Artifactory cannot get past the bootstrapping/initialization phase when started with artifactoryctl. At a certain point (around 2-5 minutes in) all the services stop, and a pid file is left behind, which is bad.
The workaround I have found is that the service can pass this initialization phase only after multiple start/stop cycles (three, to be exact). In other words, call artifactoryctl start, wait for everything to fail, then call artifactoryctl stop, and repeat two more times. On the fourth and final start, the service comes online (in about 150-190 s). From then on, the service starts correctly with a single call to artifactoryctl start.
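Scripted out, the workaround looks roughly like this (a sketch only; the sleep durations are generous guesses based on the timings above):
    #!/bin/bash
    # Rough sketch of the workaround (untested): three failing
    # start/stop cycles, then a fourth start that comes online.
    for i in 1 2 3; do
        artifactoryctl start
        sleep 300              # wait out the failure window (~2-5 min)
        artifactoryctl stop    # a stale pid file may remain; remove it if so
    done
    artifactoryctl start       # should come online in ~150-190 s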
I have not yet looked at the systemd unit file. My guess is that it has, or could be made to have, a number of retries to work around this issue, and perhaps the issue does not exist when using the service wrapper.
I have also not yet looked again at the Docker container, which appears to be failing for the same reason. A workaround off the top of my head would be to modify the entrypoint script; if you were to docker exec into the container and try the workaround above, it would likely terminate the root process and kill the container.
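If someone wants to experiment, such a wrapper entrypoint might look like the sketch below. Both the artifactoryctl path inside the image and the original entrypoint are assumptions on my part; check them with docker inspect before trying this.
    #!/bin/bash
    # Hypothetical wrapper entrypoint (untested): apply the same
    # three-cycle workaround, then hand off to the real entrypoint.
    CTL=/opt/jfrog/artifactory/app/bin/artifactoryctl   # path is a guess
    for i in 1 2 3; do
        "$CTL" start || true
        sleep 300
        "$CTL" stop || true
    done
    # Find the original entrypoint with:
    #   docker inspect --format '{{.Config.Entrypoint}}' <image>
    exec /entrypoint.sh "$@"    # placeholder for the real entrypoint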

Related

Kolla-Ansible: too many open files

I am having an issue with a relatively small OpenStack cluster deployed with kolla-ansible. After a few days, the controllers stop working, and when I go into the Docker container logs I see "Too many open files" errors in all of them. I have tried changing limits.conf and the sysctl max-files settings for processes and the user. After all of that, the issue still shows up.
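For illustration, those changes were along the following lines (the values here are examples, not the exact ones used):
    # /etc/security/limits.conf -- per-user/per-process open-file limits
    *    soft    nofile    65536
    *    hard    nofile    65536
    # /etc/sysctl.conf -- system-wide file handle limit
    fs.file-max = 2097152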
One interesting thing is that this was not happening until I had to reboot all of the controllers. I rebooted them because I needed to increase the amount of RAM they have after they died swapping. My first thought was that kolla-ansible sets some configuration after running deploy, but I can't find any point in the repo where kolla-ansible changes ulimits or anything similar.
Any theories on what could cause this? Could it be related to increasing the RAM? Should I run reconfigure/deploy on each controller? I've looked through kolla-ansible's docs and forums and couldn't see anyone else having this issue.
Update: this hasn't been fixed yet. I submitted a bug report: https://bugs.launchpad.net/kolla-ansible/+bug/1901898
I don't know which versions of Kolla-Ansible and Linux you are using, but your problem seems closely related to this one:
On Ubuntu 16.04, please uninstall lxd and lxc packages. (An issue exists with cgroup mounts, mounts exponentially increasing when restarting container) (source: docs.openstack.org/kolla-ansible/4.0.0/quickstart.html)
I also had this problem with an exponentially growing number of mount points after restarting my Docker containers. My single-node test deployment had become very slow because of it, though I can't remember at the moment whether I also had the same "too many open files" error.
You can remove the packages with apt-get remove lxc-common lxcfs lxd lxd-client. I applied this fix together with a complete reinstallation of Kolla-Ansible, so I don't know whether it also helps with an already existing installation. You should also use docker-ce instead of the docker package from the apt repos.
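If you want to verify you are hitting the same mount growth, counting the cgroup mounts before and after restarting a container should show it (a quick check, not something from the docs):
    # compare this count before and after restarting a container
    grep -c cgroup /proc/mounts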
This was fixed with a workaround in bug https://bugs.launchpad.net/keystonemiddleware/+bug/1883659: the problem was that the neutron server was keeping memcached connections open and not closing them until the memcached container hit "too many open files". The workaround is described in the bug link.
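To confirm that leak on a running controller, you can watch the open file descriptors of the memcached process grow (a rough check; pgrep -f may need a tighter pattern on your hosts):
    # the container's memcached process is visible from the host
    pid=$(pgrep -f memcached | head -n1)
    sudo ls /proc/$pid/fd | wc -l    # rerun over time and compare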

HipHop error log showing syntax errors and deployment process getting stuck when doing hot deployment in parallel (allAtATime) via AWS CodeDeploy

We tried a POC to deploy code via AWS CodeDeploy on 20 live servers behind a load balancer. We have nginx running in front of HipHop. We tried hot deployment, i.e. deploying while nginx was running.
As soon as the deployment process moves the new file into its designated place on the production servers, we start getting the following error, which continues indefinitely on some servers, and the Jenkins job times out after polling for 50 minutes:
Fatal error: syntax error, unexpected $end in /path/to/file.php on line 19477
It appears that only part of the file gets loaded and parsed, even though the file in its entirety has no syntax errors.
Restarting nginx on such servers manually fixes the problem, but that does not seem to be a good solution.
We are trying to find out the reason behind this issue.
HHVM version being used - HipHop VM 3.12.0-dev (rel)
Nginx version - 1.8.0
Alternative approach
We are now trying cold deployment (shut down nginx, do the deployment, then turn nginx back on), but that too is throwing up its own issues. I will not post those details here, but the idea is to take advantage of the large number of servers we have and do the cold deployment in such a way that only a small percentage of the servers behind the LB have nginx off at any one time, so that it does not put too much load on the running servers.
CodeDeploy will indeed replace files during a deployment. I recommend you try your approach of doing a cold deployment, in which you fully shut down before deploying and start up again after it's done.
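For the cold deployment, the nginx stop/start maps naturally onto the appspec.yml lifecycle hooks, paired with a deployment configuration such as CodeDeployDefault.OneAtATime so only a fraction of the fleet is down at once. A minimal sketch (script names and paths are placeholders, not from your setup):
    version: 0.0
    os: linux
    files:
      - source: /
        destination: /path/to/app        # placeholder
    hooks:
      ApplicationStop:
        - location: scripts/stop_nginx.sh    # e.g. "service nginx stop"
          timeout: 60
      ApplicationStart:
        - location: scripts/start_nginx.sh   # e.g. "service nginx start"
          timeout: 60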

Why are my Firebase Functions deploys (from the Netherlands) failing?

Update: The problems passed for a while, but then returned with a vengeance yesterday. Deploys now take forever and always fail with Server Error. connect ETIMEDOUT, Upload Error: Cannot read property 'response' of undefined, or something else.
After experimenting with connecting via a US location using HideMyAss, I found that this completely resolved my issues! Note that the problem occurs not only when deploying from our office in Amsterdam, but also from our office in Rotterdam. In the meantime I have also heard from more people experiencing issues with other Google services.
I have replied to the related Firebase Support email with this information and hope they will look into it. Until then, I guess I'll have to keep on using HideMyAss.
--
Deploying Functions has been taking increasingly long after adding more of them. It was occasional at first, but recently there are periods where, every time I try $ firebase deploy --only functions, one of the functions being deployed fails at random with:
⚠ functions[foo]: Deploy Error: Failure in the execution environment
When I try again an hour or so later, it deploys without a problem (though it still takes 2 minutes, which seems a little slow).
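(For what it's worth, a single function can also be deployed on its own, which at least isolates the failing one; foo here stands in for the function named in the error:)
    firebase deploy --only functions:foo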
Perhaps the deploy process is timing out; it always fails after a long time, never quickly.
Perhaps my location outside of America is causing latency-related issues in the deployment process? That doesn't seem very likely, though...
I'm also looking into Firebase Functions logging "Function execution took 60002 ms, finished with status: 'timeout'" and other performance issues, so I wonder if these are all related.
PS: I also reported this via https://firebase.google.com/support/ but the last report I made there is still unanswered after 15 days, so I'm going ahead and posting it here as well. I included a firebase-debug.log with that report, but I'd rather not share it publicly here (I'm not sure whether there are any tokens in it, etc.).
Also having issues
I have been having similar issues today and I am deploying from the United States.
For example, a function will fail showing these two errors:
Deploy Error: Failure in the execution environment
Error: Functions did not deploy properly.
I found that one time this happened, it was because my internet connection was dropping and then dropped completely during a firebase deploy.
Another time it happened because I was trying to deploy at the exact moment that a lot of my cloud functions happened to kick off doing work on their servers.
Once a cloud function had failed, even when my internet connection was back and my functions weren't busy running, it would not let me redeploy to get the jammed function working again. No matter what, the broken copy of it was stuck on their servers with the tag:
Failure in execution environment
My Solution
I found that once you have a problem like that, you can actually rename the function. On the next deploy, it will effectively delete the old broken function and upload it as a working copy under the new name.
I would think that if your project requires the function to keep its original name, you could do this process one more time: delete the copy with the new name and restore the copy with the name you need. Or even block-comment the function out to delete it on one deploy, then uncomment it on another deploy to reinstall it.
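Recent versions of the CLI can also delete the stuck copy directly, which avoids the rename dance entirely (foo again standing in for your function's name):
    firebase functions:delete foo
    firebase deploy --only functions:foo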
How that helps you
I'm hoping that if you are still having issues from the Netherlands, it helps to know that it may be a slow-connection issue or a busy-server issue, as I have found both of these to cause problems for me from within the States.
Also, my solution of deleting and redeploying the function might help speed up a deploy if the issue is with a copy stuck on their servers. It would be interesting to know whether that helps, because even though the function looks OK on the server, it may have had issues during a previous deploy that are jamming up future deploys.
Sorry for the late response; hopefully you are no longer having these issues with Firebase (I hate Firebase, by the way; always issues like this).

Nexus Sonatype "Cleanup Old Snapshots" fails

I'm working on a Nexus 1.9.2.3 implementation and we're trying to run the Scheduled Task "Cleanup Old Snapshots".
The task runs anywhere from 2-5 minutes and then fails with "Error [XmYs]" in the Last Result field (where X and Y are minutes/seconds values).
Logging shows that the task starts and is waiting, and then no failure details appear in the logs (neither nexus.log nor wrapper.log).
We're trying to remove an excessively large collection of snapshots that has been allowed to accumulate over the years, and ultimately move to keeping just the last 10. The next step will be to upgrade to a newer version, but this has become a bit of a blocker.
I'm at my wits' end on this and am close to just manually deleting all the files, but I know that doesn't cleanly clear out all the metadata, so it's a last-resort option.
Any help or advice would be greatly appreciated!
Remove the files on the filesystem and run a scheduled task to update the Maven metadata.
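If you go the filesystem route, something like the sketch below prunes snapshot artifacts older than a cutoff. The storage path assumes the default sonatype-work layout, so verify it on your install and back everything up first; afterwards run the metadata-rebuild scheduled task mentioned above.
    # prune snapshot artifacts not touched in 90+ days (sketch: verify
    # the path and run with -print alone before adding -delete)
    find /path/to/sonatype-work/nexus/storage/snapshots \
         -type f -path '*-SNAPSHOT/*' -mtime +90 -print -delete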
And then upgrade Nexus as soon as possible. You are using a VERY old version.

Apache Solr process killed automatically

I am using Apache Solr for one of my Drupal sites. I start Solr with java -Xmx256M -jar start.jar to cap its memory. I run it inside screen, but at times I see that my Solr instance gets stopped/killed automatically. On my dev server I find it very tedious to keep starting it manually. Is there a fix to stop the instance from getting killed automatically?
By the way, the following are some of the warnings I get in the console:
"solrconfig.xml: is deprecated and no longer recommended used."
"WARNING: and configuration sections are deprecated (but still work). Please use instead."
Thanks.
I had this problem initially as well. I solved it by starting a screen session as root and starting Solr within that session. Also try adding a nohup before the java command and see if that works.
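In other words, something along these lines (heap size mirroring the one in the question):
    # survive the shell/session exiting; log output to a file
    nohup java -Xmx256M -jar start.jar > solr.log 2>&1 &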
