Euca 5.0 No Node Controllers - eucalyptus

I've used the Ansible install to run all services on a single host, and I have two separate physical node controllers.
Everything installed fine and all of my services are green, but I don't think imaging workers are launching to handle my first image uploads. While troubleshooting I see that no node controllers are reported by:
euserv-describe-node-controllers
It doesn't return an error, just blank output. I've unregistered and re-registered the two node controllers and copied the CLC admin keys with no errors, but I still can't see any output from that command. cloud-output and the various NC log files seem to show successful startup.
I've switched to ImagingServiceAdministrator credentials to look for imaging worker instances with the command below and got blank output, which is what started me looking at the NCs:
euca-describe-instances --filter tag-value=euca-internal-imaging-workers

The imaging service is not required for installing instance-store images, e.g.:
python <(curl -Ls https://eucalyptus.cloud/images)
or (on an Ansible-deployed cloud):
eucalyptus-images --size 1
To check the status of node controllers in a deployment you will need cloud administrator credentials. You can check this by running:
euare-getcallerid
euare-accountlist
and verifying that the eucalyptus account is being used.
Node controllers are managed via a cluster controller, so you should check the status of both:
euserv-describe-services -a --filter service-type=cluster
euserv-describe-services -a --filter service-type=node
This differs from euserv-describe-node-controllers in that it does not include information on running instances.
If there are any issues you can check for service events:
euserv-describe-events
and look at the logs (/var/log/eucalyptus/...) to further investigate.
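For example, a quick look at the logs mentioned above (cloud-output on the cloud controller host, the NC logs on each node controller host; exact file names can vary by version, so treat this as a sketch):
tail -n 100 /var/log/eucalyptus/cloud-output.log
tail -n 100 /var/log/eucalyptus/nc.log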
Check that the IP addresses you used when registering the node controllers are the ones the node controllers are listening on (NC_ADDR in /etc/eucalyptus/eucalyptus.conf).
If you are using firewalld, restart or reload it after deployment to ensure it is running with the latest settings.
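For example, a minimal sketch of those last two checks on a node controller host (assuming firewalld is managed with firewall-cmd):
grep NC_ADDR /etc/eucalyptus/eucalyptus.conf
firewall-cmd --reload
firewall-cmd --list-all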

Related

How to resolve the error: There are 0 NebulaGraph Analytics available?

When I used NebulaGraph Explorer, I expected to use the workflow feature, but the task failed with this error:
There are 0 NebulaGraph Analytics available. clusterSize should be less than or equal to it
You can work through the following checks (see the sketch after these steps):
Check whether SSH password-free login between the nodes is configured correctly. You can run the ssh <user_name>@<node_ip> command on the Dag Controller machine to check whether the login succeeds.
Note that if the Dag Controller and Analytics are on the same machine, you also need to configure SSH password-free login.
Check the configuration file of the Dag Controller.
Check whether the SSH user in etc/dag-ctrl-api.yaml is the same as the user who starts the Dag Controller service and who configured SSH password-free login.
Check whether the algorithm path in etc/tasks.yaml is correct.
Check whether Hadoop and Java paths in scripts/set_env.sh are correct.
Restart the Dag Controller for the settings to take effect.
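A minimal sketch of those checks, run from the Dag Controller installation directory (<user_name> and <node_ip> are placeholders as above):
ssh <user_name>@<node_ip> 'echo ok'
cat etc/dag-ctrl-api.yaml
cat etc/tasks.yaml
cat scripts/set_env.sh
The ssh command should log you in without a password prompt; the three files are where the SSH user, the algorithm path, and the Hadoop/Java paths are configured.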

.net output in Docker logs

I'm trying to get log output (Console.WriteLine(..)) to show up in my Docker logs, but to no avail.
I've tried:
Console.WriteLine(..)
Trace.WriteLine(..)
Flushing the console, flushing the trace.
I can see these outputs in the VS Output window when I'm debugging, so they do go somewhere.
I'm on Windows containers, using microsoft/aspnet:4.7.1-windowsservercore-1709 and .NET 4.7.
These are the logs I get on container start:
docker logs -f exportapi
ERROR ( message:Cannot find requested collection element. )
Applied configuration changes to section "system.applicationHost/applicationPools" for "MACHINE/WEBROOT/APPHOST" at configuration commit path "MACHINE/WEBROOT/APPHOST"
You have many good lateral options, like self-contained/self-hosted executables (e.g. .NET Core using microsoft/dotnet:runtime will surface Console.WriteLine in docker logs by default with the dotnet new web scaffold). Zero-configuration STDOUT logging has never been a common approach on IIS, but these modern options adopt it as best practice (logging should be a transparent backing service).
If you want or need a chain of three programs/assemblies to get your web service up (ServiceMonitor, W3SVC, and finally your assembly), then you need something like this: https://blog.sixeyed.com/relay-iis-log-entries-to-read-them-in-docker/
Overriding the entrypoint to tail more logs than the image does by default is unfortunately a common hack (not just in Microsoft land). So, in your case, I believe you need at least a trace listener config to capture Trace.WriteLine, and then the approach above to relay it: https://learn.microsoft.com/en-us/dotnet/framework/debug-trace-profile/how-to-create-and-initialize-trace-listeners

Terminate and restart Google DataLab instance?

I am finding when working with larger datasets that the kernel may die, something I also experience on my local machine. Sometimes it comes back and sometimes it doesn't, and even the Tree panel won't react when I try to terminate an errant kernel, e.g. "restart" does not work and the server itself seems to die, so the tree view won't respond or refresh. On my local machine I just kill the terminal instance and start over.
What is the "proper" way to restart everything?
FWIW, the instance seems pegged at 150% CPU utilization at the moment.
Related: is there any way to allow long-running work to complete?
I am trying to use a report generator (pandas-profiling) on a 2 million record dataset. It works on my local machine.
Found it here: https://cloud.google.com/datalab/getting-started
FWIW, these commands can be used in the new command-line shell on the Cloud Console page (see https://cloud.google.com/shell/docs/), without the SDK on your machine. You need to modify the commands slightly since you will already be logged into your project.
Stopping/starting VM instances
You may want to stop a Cloud Datalab managed VM instance to avoid incurring ongoing charges. To stop a Cloud Datalab managed machine instance, go to a command prompt, and run:
$ gcloud auth login
$ gcloud config set project <YOUR PROJECT ID>
$ gcloud preview app versions stop main
After confirming that you want to continue, wait for the command to complete, and make sure that the output indicates that the version has stopped. If you used a non-default instance name when deploying, please use that name instead of "main" in the stop command, above (and in the start command, below).
For restarting a stopped instance, run:
$ gcloud auth login
$ gcloud config set project <YOUR PROJECT ID>
$ gcloud preview app versions start main

Running Apache spark job from Spring Web application using Yarn client or any alternate way

I have recently started using Spark and I want to run a Spark job from a Spring web application.
I have a situation where I am running a web application in a Tomcat server using Spring Boot. My web application receives a REST request, and based on that it needs to trigger a Spark calculation job in a YARN cluster. Since my job can take a long time to run and accesses data from HDFS, I want to run the Spark job in yarn-cluster mode, and I don't want to keep a Spark context alive in my web layer. Another reason is that my application is multi-tenant, so each tenant can run its own job; in yarn-cluster mode each tenant's job can start its own driver and run in its own Spark application on the cluster. I assume I can't run multiple Spark contexts in one web app JVM anyway.
I want to trigger Spark jobs in yarn-cluster mode from Java code in my web application. What is the best way to achieve this? I am exploring various options and would like your guidance on which one is best.
1) I can use the spark-submit command line tool to submit my jobs, but to trigger it from my web application I need to use either the Java ProcessBuilder API or some package built on top of it. This has two issues. First, it doesn't sound like a clean way of doing it; I should have a programmatic way of triggering my Spark applications. Second, I lose the ability to monitor the submitted application and get its status; the only crude way of doing that is reading the output stream of spark-submit, which again doesn't sound like a good approach.
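For reference, a spark-submit invocation of this kind would look roughly like the line below (the class name, jar path, and arguments are placeholders, not from my project); monitoring would then have to go through YARN itself, e.g. yarn application -status <application-id>:
spark-submit --master yarn --deploy-mode cluster --class com.example.MyJob /path/to/myjob.jar arg1 arg2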
2) I tried using the YARN Client to submit the job from the Spring application. Following is the code I use to submit the Spark job using the YARN Client:
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

Configuration config = new Configuration(); // empty Hadoop config, so it falls back to the defaults
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf conf = new SparkConf();
ClientArguments cArgs = new ClientArguments(sparkArgs, conf); // sparkArgs: spark-submit-style argument array
Client client = new Client(cArgs, config, conf);
client.run();
But when I run the above code, it tries to connect on localhost only. I get this error:
15/08/05 14:06:10 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/08/05 14:06:12 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So I don't think it can connect to the remote machine.
Please suggest the best way of doing this with the latest version of Spark. Later I plan to deploy this entire application in Amazon EMR, so the approach should work there as well.
Thanks in advance
Spark JobServer might help: https://github.com/spark-jobserver/spark-jobserver. This project receives RESTful web requests and starts a Spark job; results are returned as a JSON response.
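A rough sketch of that REST flow (the port, jar path, app name, and job class below are placeholders; check the project README for the exact endpoints):
curl --data-binary @/path/to/my-job.jar localhost:8090/jars/myapp
curl -d '' 'localhost:8090/jobs?appName=myapp&classPath=com.example.MyJob'
curl localhost:8090/jobs/<job-id>
The first call uploads the job jar, the second starts a job from it, and the third polls the job's status/result.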
I also had similar issues trying to run a Spark app that connects to a YARN cluster: having no cluster configuration, it was trying to connect to the local machine as the cluster's master node, which obviously failed.
It worked for me when I placed core-site.xml and yarn-site.xml on the classpath (src/main/resources in a typical sbt or Maven project structure); the application then connected to the cluster correctly.
When using spark-submit, the location of those files is typically specified by the HADOOP_CONF_DIR environment variable, but for a stand-alone application that had no effect.
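A minimal sketch of both variants (/etc/hadoop/conf is a placeholder for wherever your cluster's client configs live):
export HADOOP_CONF_DIR=/etc/hadoop/conf
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/yarn-site.xml src/main/resources/
The first line is what spark-submit relies on; the second puts the files on the classpath for a stand-alone application.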

Application disappear from application list, but host always exists

I have a problem with my application "zdaas", installed by Cloudify. It has worked well for a long time, but today I find that the application "zdaas" has disappeared from the applications combo box (attachment picture 1), while in the hosts tab the machine, the USM, GSA, etc. still appear normally (picture 2).
I then used Admin to restart the GSM, and the result is that nothing appears in the applications tab (picture 3).
To solve this problem I tried restarting the management machine, but the problem persists.
As a last resort I could tear down Cloudify and reinstall "zdaas", but since my application is running in a production environment (where it works very well), I am not allowed to do a teardown.
So, how can I resolve this problem?
Thank you very much!
![Picture 1][1]
![Picture 2][2]
![Picture 3][3]
[1]: https://cloudifysource.zendesk.com/attachments/token/DjjxOBl6CIfPLda5fMpGrI67z/?name=1.png
[2]: https://cloudifysource.zendesk.com/attachments/token/DjjxOBl6CIfPLda5fMpGrI67z/?name=2.png
[3]: https://cloudifysource.zendesk.com/attachments/token/DjjxOBl6CIfPLda5fMpGrI67z/?name=3.png
From the screenshots you sent, it looks like you are using a single Cloudify manager. If it is running without persistence (not saving state to disk) then a restart of the Cloudify Manager host would cause all manager state to be lost.
After such a restart, the agent VMs would reconnect to the manager so you would still see them on the hosts page, but the state of installed services would be lost.
It is recommended to run Cloudify in production with two managers running in a highly-available cluster, so that the failure of any one machine will not cause data loss.
To set the Cloudify manager to run with HA, edit your *-cloud groovy file and set the numberOfManagementMachines field to 2. When you bootstrap Cloudify with this value, 2 manager machines will be started up. Both machines are active, and each backs up the other's data.
cloud {
    ...
    provider {
        ...
        numberOfManagementMachines 2
    }
}
