When I tried to use the workflow feature in NebulaGraph Explorer, the task failed with this error:
There are 0 NebulaGraph Analytics available. clusterSize should be less than or equal to it
You can check according to the following procedure:
Check whether SSH password-free login between nodes is configured correctly. You can run the ssh <user_name>@<node_ip> command on the Dag Controller machine to check whether the login succeeds.
Note that if the Dag Controller and Analytics are on the same machine, you also need to configure SSH password-free login.
Check the configuration file of the Dag Controller.
Check whether the SSH user in etc/dag-ctrl-api.yaml is the same as the user who starts the Dag Controller service and configures SSH password-free login.
Check whether the algorithm path in etc/tasks.yaml is correct.
Check whether Hadoop and Java paths in scripts/set_env.sh are correct.
Restart the Dag Controller for the settings to take effect.
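The SSH check in the first step can be scripted so that it fails instead of hanging on a password prompt. This is a minimal sketch, not part of NebulaGraph: the function name check_passwordless_ssh is made up for illustration, and it relies only on the standard OpenSSH options BatchMode and ConnectTimeout.

```shell
# Returns 0 only if key-based login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# $1 = SSH user, $2 = node IP or hostname (both supplied by the caller).
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${1}@${2}" true
}

# Example with a hypothetical user and address, run on the Dag Controller machine:
#   check_passwordless_ssh nebula 192.168.8.100 && echo "key login OK"
```

Run it as the same user that starts the Dag Controller service, once for every Analytics node, including the local machine.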
I see that Airflow logs are stored at
base_log_folder/dag_id/task_id/date_time/1.log
i.e.:
base_log_folder/dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log
Sometimes my logs are huge, and I know it's not a good idea to check them from the web UI, because Chrome can't handle logs that large.
I have access to the server and can check the logs there.
So how can I break the logs into smaller files?
i.e
{try_number}_1.log
{try_number}_2.log
{try_number}_3.log
...
I also noticed that the log file {try_number}.log is only created when the task is completed.
While the task is running I can check the logs in the web UI, but I don't see any file in the corresponding log folder.
So I need two things for logging on the server side:
break large log files into smaller files
see the log file live while the task is running, not only after the task is completed
In Airflow 2.4.0 there is an option to view either the full logs or only the first fragment, so huge logs are not loaded automatically.
Starting with Airflow 2.5.0, the web UI also auto-tails logs (PR).
Airflow does show live logs. If you set up, for example, a Sensor task that pokes a resource, you will see the poking attempts in the log while the task is running. It's important to note that there are local logs and remote logs (docs):
In the Airflow UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs can not be found or accessed, local logs will be displayed. Note that logs are only sent to remote storage once a task is complete (including failure). In other words, remote logs for running tasks are unavailable (but local logs are available).
Huge logs are often a sign of not using log levels. If some entries are only relevant for debugging, log them at DEBUG level rather than INFO. That way you can control the size of the logs displayed in the UI with the AIRFLOW__LOGGING__LOGGING_LEVEL variable.
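If you still need to read a huge attempt log directly on the server, standard Unix tools can cut it into pieces after the fact. The sketch below is not an Airflow feature: split_log is a made-up helper name, and it uses only POSIX split, so the parts get letter suffixes (aa, ab, ...) rather than the _1, _2 numbering from the question.

```shell
# Split a finished attempt log into parts of at most $2 lines each
# (default 100000). Parts are written next to the original as
# <name>.part.aa, <name>.part.ab, and so on; the original is left untouched.
split_log() {
    log_file="$1"
    lines_per_part="${2:-100000}"
    split -l "$lines_per_part" "$log_file" "${log_file}.part."
}

# Example (the path is a placeholder for one of your own attempt logs):
#   split_log "/path/to/attempt=1.log" 50000
# To follow a local log file that is still being written:
#   tail -f "/path/to/attempt=1.log"
```

Whether a local file exists while the task runs depends on where the task executes, so look on the machine that actually runs the task.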
When I used the workflow feature of NebulaGraph Explorer, the task reported the following error:
handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain?
How to resolve the error?
You need to reconfigure the permissions to 744 on the folder .ssh and 600 on the file .ssh/authorized_keys.
The workflow feature (Dag Controller) uses SSH for some of its RPC calls (dirty, but it works), which requires SSH key-based authentication.
To debug this, log in to the workflow machine and manually try the SSH login with exactly the same user and host (SSH is still needed even when the machine calls itself). A useful tip is to add the -vvv argument to show, in verbose mode, where the login goes wrong. As @lisa liu posted, the cause could be the permissions on the corresponding files, or a cipher handshake issue.
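The permission fix and the verbose retry can be written down as a short sketch. The function name fix_ssh_permissions is made up for illustration; the modes are the ones given in the answer above, and the placeholders in the ssh command must be replaced with your own user and node address.

```shell
# Apply the permissions from the answer above to one user's SSH files.
# $1 is that user's home directory (defaults to the current user's $HOME).
fix_ssh_permissions() {
    home_dir="${1:-$HOME}"
    chmod 744 "$home_dir/.ssh" &&
    chmod 600 "$home_dir/.ssh/authorized_keys"
}

# Then retry the login verbosely to see where authentication stops:
#   ssh -vvv <user_name>@<node_ip>
```

Run this on the machine being logged in to, as the user whose authorized_keys file is being checked.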
I've used the ansible install to run all services on a single host and have two separate physical node controllers.
Everything installed fine and all of my services are green. But I don't think image workers are launching to do my first image uploads. As I'm trying to troubleshoot I see that no node controllers are reported by:
euserv-describe-node-controllers
It doesn't return an error, just blank output. I've unregistered and re-registered the two node controllers and copied the CLC admin keys with no errors, but I still can't see any output from that command. cloud-output and the various NC log files seem to show successful startup.
I've switched to ImagingServiceAdministrator to look for imaging worker instances with the command below and got blank output, which is what started me looking at the NCs:
euca-describe-instances --filter tag-value=euca-internal-imaging-workers
The imaging service is not required for installing instance-store images, e.g.:
python <(curl -Ls https://eucalyptus.cloud/images)
or (on an ansible deployed cloud):
eucalyptus-images --size 1
To check on the status of node controllers in a deployment you will need to have cloud administrator credentials. You can check this using:
euare-getcallerid
euare-accountlist
and verifying that the eucalyptus account is being used.
Node controllers are managed via a cluster controller so you should check the status for both:
euserv-describe-services -a --filter service-type=cluster
euserv-describe-services -a --filter service-type=node
Unlike euserv-describe-node-controllers, these commands do not include information on running instances.
If there are any issues you can check for service events:
euserv-describe-events
and look at the logs (/var/log/eucalyptus/...) to further investigate.
Check that the IP addresses you used when registering the node controllers are the ones that the node controllers are listening on (NC_ADDR in /etc/eucalyptus/eucalyptus.conf).
If using firewalld, restart or reload its configuration after deployment to ensure it is running with the latest settings.
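To compare the registered address with the configured one, the NC_ADDR value can be read out of the configuration file on each node controller. This is a sketch under an assumption: it expects the setting on a line of the form NC_ADDR="10.0.0.5", and nc_addr is a made-up helper name, not a Eucalyptus command.

```shell
# Print the NC_ADDR value from a eucalyptus.conf-style file.
# Assumes a line like NC_ADDR="10.0.0.5"; commented-out lines are ignored,
# and if the key appears more than once the last occurrence wins.
nc_addr() {
    conf_file="${1:-/etc/eucalyptus/eucalyptus.conf}"
    sed -n 's/^NC_ADDR=//p' "$conf_file" | tr -d '"' | tail -n 1
}

# Example, run on each node controller:
#   nc_addr
# Reload firewalld afterwards if its rules were changed:
#   firewall-cmd --reload
```

The printed address should match the host shown for that node by euserv-describe-services -a --filter service-type=node.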
I have a bash script that I want to be executed before a user can log in to the server. I cannot find any information on when exactly this script is executed for different images. Can I assume that this happens before a user is able to log in using ssh? I'm using cirros.
openstack server create --user-data before_login.sh ...
As soon as your instance boots, the user-data script "before_login.sh" is executed on it, before any user logs in to the instance.
User scripts run at the final stage, which runs as late in boot as possible. Any scripts that a user is accustomed to running after logging in to a system should run correctly here.
See the link below for more information on cloud-init boot behavior:
https://cloudinit.readthedocs.io/en/latest/topics/boot.html
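One way to check the timing on your own image is to have the script record when it ran and compare that with your first login. The following is only a sketch of what before_login.sh could contain: the marker path /tmp/user-data-ran is a hypothetical choice, and the middle of the script stands in for your real setup steps.

```shell
#!/bin/sh
# before_login.sh: records when the user-data script started and finished.
# MARKER is a hypothetical path chosen for this sketch.
MARKER="${MARKER:-/tmp/user-data-ran}"
echo "user-data started: $(date)" > "$MARKER"

# ... the real pre-login setup would go here ...

echo "user-data finished: $(date)" >> "$MARKER"
```

After the first SSH login, cat /tmp/user-data-ran shows both timestamps, so you can see whether the script had finished before you got in.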
As part of a batch job, I create 4 command lines through Control-M which invoke a legacy console application written in VB6. The console application invokes an ActiveX server which performs a set of analytic jobs calculating outputs. The ActiveX server was coded as a singleton, but when invoked through Control-M I get 4 instances running. The ActiveX server does not tear down once the job has completed and the command line has closed itself.
I created 4 .bat files which, once launched manually on the server, simulate the calls made through Control-M, and the ActiveX server behaves as expected, i.e. there is only ever 1 instance running, and once complete it closes down gracefully.
What am I doing wrong?
Control-M jobs run under a service account, which is the same as logging in as that user and executing the job. How did you test this? Did you manually execute each batch job one after another, or did you execute all of them at the same time from different terminals? One thing you can try: run the Control-M jobs at intervals, for example the first at 09:00, the second at 09:05, the third at 09:10 and the fourth at 09:15, and see if that fixes your issue.
Maybe your job cannot use the Desktop environment.
Check your agent service settings:
Log on As: User account under which Control‑M Agent service will run.
Valid values:
  Local System Account – Service logs on as the system account.
    Allow Service to Interact with Desktop – This option is valid only if the service is running as a local system account.
      Selected – the service provides a user interface on a desktop that can be used by whoever is logged in when the service is started. Default.
      Unselected – the service does not provide a user interface.
  This Account – User account under which Control‑M Agent service will run.
NOTE: If the owner of any Control-M/Server jobs has a "roaming profile" or if job output (OUTPUT) will be copied to or from other computers, the Log in mode must be set to This Account.
Default: Local System Account