Application disappears from application list, but host still exists - Cloudify

I have a problem with my application "zdaas", installed by Cloudify. It has worked well for a long time, but today I found that the application "zdaas" has disappeared from the application combo box (attachment picture 1), while in the Hosts tab the machine, USM, GSA, etc. still appear normally (picture 2).
I then used Admin to restart the GSM, and the result is that nothing appears in the Applications tab (picture 3).
To solve this problem I tried restarting the management machine, but the problem persists.
As a last resort I could tear down Cloudify and reinstall "zdaas", but since my application is running in a production environment (where it works very well), I am not allowed to tear it down.
So, how can I resolve this problem?
Thank you very much!
![enter image description here][1]
![enter image description here][2]
![enter image description here][3]

[1]: https://cloudifysource.zendesk.com/attachments/token/DjjxOBl6CIfPLda5fMpGrI67z/?name=1.png
[2]: https://cloudifysource.zendesk.com/attachments/token/DjjxOBl6CIfPLda5fMpGrI67z/?name=2.png
[3]: https://cloudifysource.zendesk.com/attachments/token/DjjxOBl6CIfPLda5fMpGrI67z/?name=3.png

From the screenshots you sent, it looks like you are using a single Cloudify manager. If it is running without persistence (not saving state to disk) then a restart of the Cloudify Manager host would cause all manager state to be lost.
After such a restart, the agent VMs would reconnect to the manager so you would still see them on the hosts page, but the state of installed services would be lost.
It is recommended to run Cloudify in production with two managers running in a highly-available cluster, so that the failure of any one machine will not cause data loss.
To set the Cloudify manager to run with HA, edit your *-cloud groovy file and set the numberOfManagementMachines field to a value of 2. When you bootstrap Cloudify with this value, two manager machines will be started. Both machines are active, and each backs up the other's data.
cloud {
    ...
    provider {
        ...
        numberOfManagementMachines 2
    }
}
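For completeness, a minimal sketch of applying this change from the Cloudify 2.x shell, assuming the EC2 cloud driver (substitute your own provider name and application recipe path; exact commands depend on your Cloudify version):
# tear down only when the environment can safely be taken offline
teardown-cloud ec2
# re-bootstrap; with numberOfManagementMachines 2, two managers are started
bootstrap-cloud ec2
# reinstall the application from its recipe folder (path is illustrative)
install-application /path/to/zdaas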


airflow logs: break long logs into multiple smaller files

I see that airflow logs are stored at
base_log_folder/dag_id/task_id/date_time/1.log
i.e:
base_log_folder/dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log
Sometimes my logs are huge, and I know it's not a good idea to check them from the web UI because Chrome can't handle that much log data.
I have access to the server and can check the logs there.
So how can I break the logs into smaller files, i.e.:
{try_number}_1.log
{try_number}_2.log
{try_number}_3.log
...
I have also noticed that the log file {try_number}.log is only created when the task is completed.
While the task is running I can check the logs in the web UI, but I don't see any file in the corresponding log folder.
So I need two things for logging on the server side:
break large log files into smaller files
see the log file live while the task is running, not only after the task is completed
In Airflow 2.4.0 there is an option to view the full logs or only the first fragment, so huge logs are not loaded automatically:
Starting with Airflow 2.5.0 the web UI also auto-tails logs (PR).
Airflow does show live logs. If, for example, you set up a Sensor task that pokes a resource, you will see the poking attempts in the log while the task is running. It's important to note that there are local logs and remote logs (docs):
In the Airflow UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs can not be found or accessed, local logs will be displayed. Note that logs are only sent to remote storage once a task is complete (including failure). In other words, remote logs for running tasks are unavailable (but local logs are available).
Huge logs are often a sign of not using log levels. If you have entries that are only relevant for debugging, log them at DEBUG rather than INFO; that way you have better control over the log size displayed in the UI, using the AIRFLOW__LOGGING__LOGGING_LEVEL variable.
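As a hedged sketch of the two server-side pieces (assuming shell access to the machine holding base_log_folder; the file names below are illustrative):
# raise the logging threshold so only INFO and above reach the task logs
# (set in the environment of the scheduler and workers, or via airflow.cfg)
export AIRFLOW__LOGGING__LOGGING_LEVEL=INFO
# split an already-written large attempt log into ~10 MB pieces on the server,
# e.g. attempt=1.log -> attempt=1.log.part_aa, attempt=1.log.part_ab, ...
split -b 10M "attempt=1.log" "attempt=1.log.part_"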

How to unblock IPAM Access in Windows Server 2022?

I'm using Windows Server 2022, where I'm stuck completing my IPAM server tasks after step 4, "Start server discovery", and proceeding to step 5, "Select or add servers to manage and verify IPAM access".
When I tried to "Edit Server", I encountered the error you can see in the screenshot below.
I have already run the commands below in PowerShell.
Invoke-IPAMGPOProvisioning -Domain depeddumaschools.com -GPOPrefixName DCSGROUP -IPAMServerFQDN WIN-LODU3GE5I1E.depeddumaschools.com -DelegatedGPOUser DEPEDDUMASCHOOL\Administrator
gpupdate /force
I still can't manage to unblock the IPAM access, even though I have thoroughly followed the steps in the two articles below.
https://msftwebcast.com/2020/01/install-and-configure-ipam-in-windows-server-2019.html
https://mehic.se/2017/05/23/install-and-configure-ip-address-management-ipam-2016-part-1/
As you can see in my Group Policy Management console below, I was able to update the group policy on our domain controller. Is there anything else that I missed in my settings and configuration along the way? Please advise. Thanks.

Private Endpoint in ACI not available for germanywestcentral?

I am facing an issue where containers that should only have private endpoints cannot be deployed:
"The requested resource is not available in the location 'germanywestcentral' at this moment. Please retry with a different resource request or in another location. Resource requested: '1' CPU '1.5' GB memory 'Linux' OS virtual network"
The same container works fine as soon as I select a public interface.
I didn't find anything about it in the documentation or on the internet, so maybe someone here has an idea?
Thanks in advance
Stefan
I tried with the quickstart image and a Docker Hub registry as well, getting the same result.
This error indicates that due to heavy load in the region in which you are attempting to deploy, the resources specified for your container can't be allocated at that time. Use one or more of the following mitigation steps to help resolve your issue.
1. Deploy to a different Azure region (see the sketch below).
2. Deploy at a later time.
3. You can also reach out to support for more information.
Reference: https://learn.microsoft.com/en-us/answers/questions/394616/trying-to-re-deploy-a-container-instance-but-error.html
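A minimal sketch of retrying the same private (vnet-only) deployment in another region with the Azure CLI; the resource group, names, vnet and subnet are placeholders and assume an existing vnet/subnet in the target region:
# redeploy the container group in a nearby region (e.g. westeurope),
# keeping it private by attaching it to a vnet/subnet instead of a public IP
az container create \
  --resource-group my-rg \
  --name my-private-aci \
  --image mcr.microsoft.com/azuredocs/aci-helloworld \
  --location westeurope \
  --vnet my-vnet --subnet my-aci-subnet \
  --cpu 1 --memory 1.5 --os-type Linux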

How to pin openstack container versions when using kolla-ansible?

When installing OpenStack via kolla-ansible you specify the OpenStack version in globals.yml, i.e. openstack_release: "victoria". This is as specific as you can get; there are no point-in-time tags, just a moving target like "victoria".
In my experience containers are updated randomly, not all-at-once, and frequently. Every time I rebuild I'm having to wait for docker to pull down things which have changed since my last deploy. This is problematic for multiple reasons, most acutely:
This is a fast-moving community-driven project. I'm having to work through new issues every few times I rebuild as a result of changes.
If I deploy onto one set of hosts, then deploy onto more hosts hours later, I'm waiting again on updates, and my stack is running containers of different versions.
These pulls take time and make my deployments vulnerable to timeouts and network problems.
To emphasize what a problem the second issue is, usually I can reset a failed deployment and try again, but not always. There have been times where I had residual issues, and due to my noobness it was quicker to dump fresh disks and start over. I'm using external ceph (the only ceph option in kolla-ansible:victoria), colocated with the compute nodes. Resetting pool / OSD state to an earlier point in time isn't in my toolbox yet, so I also wipe my OSD's and redo the ceph installation. I can pin version on ceph containers, but I start to sweat once the kolla-ansible installation starts. For a 4-hour total install, there's a not-small chance that another container will change in this time.
The obvious answer for anybody who does IT or software professionally is to pin my kolla:* container versions to a specific point-in-time tag, and not "victoria". I could pin each container to a digest, but that's not supported in the playbooks as written. I'd need to edit the ansible playbooks and add a variable for every container that I want to pin, and then maintain that logic as new containers are added. I'm pulling 43 containers right now. This approach feels like "2 trailer park girls go 'round the outside".
A far simpler approach, which I'm planning, is to pull all the "victoria"-tagged containers and then iterate through pushing them back into my own docker repo (e.g. "victoria-feralcoder-20210321"), then update globals.yml to use this stable tag. I'm new to managing my own docker repos, so I don't know if I can retag images in a pull-through cache or if I need to set up a private repo for that, so I may also have to switch kolla-ansible between docker.io and a private feralcoder repo, depending on whether I want to do a latest-pull or a pinned-pull. That would be a little "hey nineteen", cleaner and nicer, still not quite right...
I feel like this pull-retag-push-reconfigure-redeploy approach is hack jankery. Does anybody have a better suggestion? Like, to not check upstream for container changes if there's already a tag-match in the local mirror? Or maybe a way to pull-thru-and-retag, at the registry level?
Thanks in advance, and thanks also to the kolla-ansible contributors for all their work, the lack of version stability aside.
Here is one answer, for an existing deployment:
If you have already pulled containers to all your hosts, you can edit some ansible or python so that docker_container.pull=false for all containers.
This is the implementing module:
.../lib/python3.6/site-packages/ansible/modules/cloud/docker/docker_container.py.
This file might be in /usr/local/share/kolla-ansible/, or .../venvs/kolla-ansible/. When pull is false, a container that already exists on the host won't be re-pulled.
This doesn't help the situation where a host hasn't yet pulled the package and you have a version already in your local mirror. In that situation, the stack host will pull the container, and your pull-through cache will pull down any container updates since last pull.
This is my current preferred solution, which is still, admittedly, a hack:
Pull the latest images as a batch, then tag them and push them to a local registry.
First, I need 2 docker registries: I can't push to a pull-through cache, so I also needed to set up a private registry, which I can push to.
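As a hedged sketch (host, port and container names are illustrative), both registries can be run from the standard registry:2 image; a pull-through cache is read-only for pushes, so the plain private registry is the one pushed to:
# plain private registry to push tagged images to (port 4001, as in the globals.yml example below)
docker run -d --restart=always --name kolla-registry -p 4001:5000 registry:2
# pull-through cache of Docker Hub: a second instance with REGISTRY_PROXY_REMOTEURL set
docker run -d --restart=always --name dockerhub-cache -p 4000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io registry:2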
I need to toggle settings in globals.yml back and forth during kolla-ansible deploy to achieve this:
When I run "kolla-ansible bootstrap-servers" I need the local registry configured, so that stack hosts are configured with appropriate insecure-registries configs.
I use "kolla-ansible pull" to prefetch the latest packages, when I want to update. For this I reconfigure globals.yml to point at kolla/*:victoria.
After I fetch the latest containers, I run a loop on one of my stack hosts to pull them from my pull-through cache, tag them to my local registry with a date stamp tag, and push them to my local registry.
Before I run the actual deploy I configure globals.yml to use my local registry and tags.
These are the globals.yml settings of interest:
## PINNED CONTAINER VERSIONS
#docker_registry: 192.168.127.220:4001
#docker_namespace: "feralcoder"
#openstack_release: "feralcoder-20210321"
# LATEST CONTAINER VERSIONS
docker_registry:
docker_registry_username: feralcoder
docker_namespace: "kolla"
openstack_release: "victoria"
My pseudocode is like this (intermediate steps pruned...):
use_localized_containers () {
    cp $KOLLA_SETUP_DIR/files/kolla-globals-localpull.yml /etc/kolla/globals.yml
    cat $KOLLA_SETUP_DIR/files/kolla-globals-remainder.yml >> /etc/kolla/globals.yml
}
use_latest_dockerhub_containers () {
    # We switch to dockerhub container fetches, to get the latest "victoria" containers
    cp $KOLLA_SETUP_DIR/files/kolla-globals-dockerpull.yml /etc/kolla/globals.yml
    cat $KOLLA_SETUP_DIR/files/kolla-globals-remainder.yml >> /etc/kolla/globals.yml
}
localize_latest_containers () {
    for CONTAINER in `ls $KOLLA_PULL_THRU_CACHE`; do
        ssh_control_run_as_user root "docker image pull kolla/$CONTAINER:victoria" $PULL_HOST
        ssh_control_run_as_user root "docker image tag kolla/$CONTAINER:victoria $LOCAL_REGISTRY/feralcoder/$CONTAINER:$TAG" $PULL_HOST
        ssh_control_run_as_user root "docker image push $LOCAL_REGISTRY/feralcoder/$CONTAINER:$TAG" $PULL_HOST
    done
}
use_localized_containers
kolla-ansible -i $INVENTORY bootstrap-servers
use_latest_dockerhub_containers
kolla-ansible -i $INVENTORY pull
localize_latest_containers
use_localized_containers
kolla-ansible -i $INVENTORY deploy

Euca 5.0 No Node Controllers

I've used the ansible install to run all services on a single host and have two separate physical node controllers.
Everything installed fine and all of my services are green. But I don't think image workers are launching to do my first image uploads. As I'm trying to troubleshoot I see that no node controllers are reported by:
euserv-describe-node-controllers
It doesn't return an error, just blank output. I've unregistered and re-registered the two node controllers and copied the CLC admin keys with no errors, but I still can't see output from that command. cloud-output and the various NC log files seem to show successful startup.
I've switched to ImagingServiceAdministrator to look for imaging worker instances with the command below and got blank output, which is what started me looking at the NCs:
euca-describe-instances --filter tag-value=euca-internal-imaging-workers
The imaging service is not required for installing instance-store images, e.g.:
python <(curl -Ls https://eucalyptus.cloud/images)
or (on an ansible deployed cloud):
eucalyptus-images --size 1
To check on the status of node controllers in a deployment you will need to have cloud administrator credentials. You can check this using:
euare-getcallerid
euare-accountlist
and verifying that the eucalyptus account is being used.
Node controllers are managed via a cluster controller so you should check the status for both:
euserv-describe-services -a --filter service-type=cluster
euserv-describe-services -a --filter service-type=node
This differs from euserv-describe-node-controllers as it does not include information on running instances.
If there are any issues you can check for service events:
euserv-describe-events
and look at the logs (/var/log/eucalyptus/...) to further investigate.
Check that the IP addresses you used when registering the node controllers are the ones that the node controllers are listening on (NC_ADDR in /etc/eucalyptus/eucalyptus.conf).
If you are using firewalld, restart/reload it after deployment to ensure it is running with the latest settings.
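As a hedged sketch of those last two checks, run on each node controller (commands are illustrative of the checks described above):
# confirm the address the NC is configured to listen on matches the one registered
grep NC_ADDR /etc/eucalyptus/eucalyptus.conf
# reload firewalld so the rules written during deployment are active, then inspect them
systemctl reload firewalld
firewall-cmd --list-all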
