I have prepared a Robot Framework test script and now I'm trying to run it in multiple browsers (at the same time) using BlazeMeter - Taurus. The Taurus YAML file looks like the code below.
I have used the same approach with JMeter, and Taurus runs smoothly with JMeter as expected.
execution:
- concurrency: 5
  executor: selenium
  runner: robot
  ramp-up: 50s
  hold-for: 2h
  scenario:
    script: WebFlow.robot
reporting:
- console
- final-stats
- blazemeter
I'm expecting it to start 5 browser windows and run the Robot script concurrently. But even with concurrency set to 5, it opens the browsers one at a time: only once the whole Robot script has finished running does it start a browser for the second time.
In Taurus you can easily create multiple execution instances; they run in parallel for Robot scripts and the results are aggregated into a single report as expected. For example:
execution:
- executor: robot
  concurrency: 1
  iterations: 5
  scenario:
    script: /tools/robot/phx-read-1.robot
- executor: robot
  concurrency: 1
  iterations: 5
  scenario:
    script: /tools/robot/phx-read-2.robot
- executor: robot
  concurrency: 1
  iterations: 5
  scenario:
    script: /tools/robot/phx-read-3.robot
- executor: robot
  concurrency: 1
  iterations: 5
  scenario:
    script: /tools/robot/phx-read-4.robot
- executor: robot
  concurrency: 1
  iterations: 5
  scenario:
    script: /tools/robot/phx-read-5.robot
reporting:
- console
- final-stats
- blazemeter
Yes, you have to specify it many times... but it's easily scriptable (see the sketch below). In my case I actually need different scripts anyway, but Taurus nicely aggregates everything.
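For illustration, here is a minimal Python sketch of generating such a config, assuming PyYAML is available; the script paths, iteration count and output file name are hypothetical:

# generate_taurus_config.py - emit a Taurus config with one execution entry per Robot suite
import yaml  # pip install pyyaml

SCRIPTS = ["/tools/robot/phx-read-%d.robot" % i for i in range(1, 6)]  # hypothetical paths

config = {
    "execution": [
        {
            "executor": "robot",
            "concurrency": 1,
            "iterations": 5,
            "scenario": {"script": path},
        }
        for path in SCRIPTS
    ],
    "reporting": ["console", "final-stats", "blazemeter"],
}

with open("parallel-robot.yml", "w") as fh:
    yaml.safe_dump(config, fh, default_flow_style=False)

The generated file can then be passed to Taurus as usual, e.g. bzt parallel-robot.yml.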
I would recommend checking out the pabot library for Robot Framework. I have used it before for running tests in parallel and it worked great.
A parallel executor for Robot Framework tests. With Pabot you can split one execution into multiple and save test execution time.
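Purely as an illustration of typical usage (not from the original answer): pabot is normally installed with pip (pip install robotframework-pabot) and invoked in place of the robot command, e.g. pabot --processes 5 WebFlow.robot, which runs suites (or, with --testlevelsplit, individual tests) in parallel processes.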
I have a Python project which uses the GitLab job retry API to retry a job of a pipeline.
But the retried job fails with the error "This job depends on other jobs with expired / erased artifact". What could be the reason for this error?
stages:
  - build

build:
  tags: [kubernetes, linux, default]
  image: #image-url
  stage: build
  script:
    - python3 setup.py sdist bdist wheel
  artifacts:
    paths:
      - $CI_PROJECT_DIR/dist
      - ${CI_PROJECT_DIR}/job
      - ${CI_PROJECT_DIR}/*.egg-info/PKG-INFO
    expire_in: 600 mins
Your artifacts expire after 600 minutes, so if you re-run a pipeline stage later than that, the artifacts will no longer be present. If the pipeline stage you re-ran depends on a previous stage's artifacts, then the error you are seeing occurs. Increasing expire_in, or re-running the producing stage first so fresh artifacts exist, avoids it.
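Since the question mentions driving the retry from Python, here is a minimal sketch of that idea, assuming the python-gitlab client (which wraps the same retry endpoint); the URL, token, project path and job IDs are placeholders:

import gitlab  # pip install python-gitlab

gl = gitlab.Gitlab("https://gitlab.example.com", private_token="YOUR_TOKEN")
project = gl.projects.get("group/my-python-project")

# Re-run the producing build job first so fresh artifacts exist again.
build_job = project.jobs.get(123456)      # placeholder ID of the build-stage job
build_job.retry()

# In practice, wait until the new build run has finished successfully (for example
# by polling project.jobs.list() for the retried job) before retrying the job
# that consumes the artifacts, otherwise it can still fail the same way.
dependent_job = project.jobs.get(123457)  # placeholder ID of the failing job
dependent_job.retry()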
I'm running a DAG that runs once per day. It starts with 9 concurrently running tasks that all do the same thing: each basically polls S3 to see whether that task's one designated file exists. Each task uses the same code and is wired into the DAG in the same way. One of these tasks, on random days, fails to "begin": it never enters the running state and just sits as queued. When it does this, here's what its log says:
*** Log file isn't local.
*** Fetching here: http://:8793/log/my.dag.name./my_airflow_task/2020-03-14T07:00:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
Could not read logs from s3://mybucket/airflow/logs/my.dag.name./my_airflow_task/2020-03-14T07:00:00
Why does this only happen on random days? All similar questions I've seen point to this error happening consistently and, once overcome, not recurring. To "trick" this task into running, I manually touch whatever the log file is supposed to be named, and then it changes to running.
So the issue appears to have been the system's ownership rules for the folder that the logs for that particular task were written to. I used a CI tool to ship the new task_3 when I deployed my updated Airflow Python code to the production environment, so the task's directory was created that way. When I peeked at the log directory ownership, I noticed this for the tasks:
# inside/airflow/log/dir:
drwxrwxr-x 2 root root 4096 Mar 25 14:53 task_3 # is the offending task
drwxrwxr-x 2 airflow airflow 20480 Mar 25 00:00 task_2
drwxrwxr-x 2 airflow airflow 20480 Mar 25 15:54 task_1
So I think what was going on was that, randomly, Airflow couldn't get permission to write the log file, and thus it wouldn't start the rest of the task. I applied the appropriate chown command, something like sudo chown -R airflow:airflow task_3. Ever since I changed this, the issue has disappeared.
An Airflow (v1.10.5) DAG that ran fine with the SequentialExecutor now has many (though not all) simple tasks that fail without any log information when run with the LocalExecutor and minimal parallelism, e.g.:
<airflow.cfg>
# overall task concurrency limit for airflow
parallelism = 8 # which is same as number of cores shown by lscpu
# max tasks per dag
dag_concurrency = 2
# max instances of a given dag that can run on airflow
max_active_runs_per_dag = 1
# max threads used per worker / core
max_threads = 2
# 40G of RAM available total
# CPUs: 8 (sockets 4, cores per socket 4)
see https://www.astronomer.io/guides/airflow-scaling-workers/
Looking at the airflow-webserver.* logs nothing looks out of the ordinary, but looking at airflow-scheduler.out I see...
[airflow#airflowetl airflow]$ tail -n 20 airflow-scheduler.out
....
[2019-12-18 11:29:17,773] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table1 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:17,779] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table2 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:17,782] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table3 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:18,833] {scheduler_job.py:832} WARNING - Set 1 task instances to state=None as their associated DagRun was not in RUNNING state
[2019-12-18 11:29:18,844] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table4 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status success for try_number 1
....
but not really sure what to take away from this.
Anyone know what could be going on here or how to get more helpful debugging info?
Looking again at my lscpu specs, I noticed...
[airflow#airflowetl airflow]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Notice Thread(s) per core: 1
Looking at my airflow.cfg settings I see max_threads = 2 (under the [scheduler] section). Setting max_threads = 1 and restarting the scheduler seems to have fixed the problem.
If anyone knows more about what exactly is going wrong under the hood (e.g. why the task fails rather than just waiting for another thread to become available), I would be interested to hear about it.
Our airflow installation is using CeleryExecutor.
The concurrency configs were
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 16
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16
# Are DAGs paused by default at creation
dags_are_paused_at_creation = True
# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 64
# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
[celery]
# This section only applies if you are using the CeleryExecutor in
# [core] section above
# The app name that will be used by celery
celery_app_name = airflow.executors.celery_executor
# The concurrency that will be used when starting workers with the
# "airflow worker" command. This defines the number of task instances that
# a worker will take, so size up your workers based on the resources on
# your worker box and the nature of your tasks
celeryd_concurrency = 16
We have a DAG that executes daily. It has a number of tasks running in parallel, each following the same pattern: sense whether the data exists in HDFS, then sleep 10 minutes, and finally upload to S3 (roughly as sketched below).
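Purely as an illustration of that pattern (not the actual DAG), a minimal Airflow 1.10-style sketch might look like the following; the DAG ID, table names, paths, connection ID and callables are all hypothetical:

# Sketch of the "sense HDFS -> sleep 10 min -> upload to S3" pattern described above.
import time
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.sensors.hdfs_sensor import HdfsSensor


def sleep_ten_minutes():
    time.sleep(600)


def upload_to_s3(table):
    # Placeholder: the real task would copy the HDFS data to S3 here.
    print("uploading %s to s3" % table)


dag = DAG(
    dag_id="wh_hdfs_to_s3_example",
    start_date=datetime(2019, 5, 1),
    schedule_interval="@daily",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)

for table in ["table1", "table2", "table3"]:  # many such branches run in parallel
    sense = HdfsSensor(
        task_id="sense_%s" % table,
        filepath="/data/%s/_SUCCESS" % table,
        hdfs_conn_id="hdfs_default",
        dag=dag,
    )
    wait = PythonOperator(
        task_id="sleep_%s" % table,
        python_callable=sleep_ten_minutes,
        dag=dag,
    )
    upload = PythonOperator(
        task_id="upload_%s" % table,
        python_callable=upload_to_s3,
        op_args=[table],
        dag=dag,
    )
    sense >> wait >> upload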
Some of the tasks have been encountering the following error:
2019-05-12 00:00:46,212 ERROR - Executor reports task instance <TaskInstance: example_dag.task1 2019-05-11 04:00:00+00:00 [queued]> finished (failed) although the task says its queued. Was the task killed externally?
2019-05-12 00:00:46,558 INFO - Marking task as UP_FOR_RETRY
2019-05-12 00:00:46,561 WARNING - section/key [smtp/smtp_user] not found in config
This kind of error occurs randomly in those tasks. When it happens, the state of the task instance is immediately set to up_for_retry, and there are no logs on the worker nodes. After some retries, they execute and eventually finish.
This problem sometimes causes large ETL delays for us. Does anyone know how to solve it?
We were facing similar problems, which were resolved by the "-x, --donot_pickle" option.
For more information: https://airflow.apache.org/cli.html#backfill
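For context, this flag belongs to airflow backfill; a hypothetical invocation (dates and DAG ID made up) would look like airflow backfill -s 2019-05-01 -e 2019-05-12 --donot_pickle my_dag, which tells the workers to run their own copy of the DAG code instead of receiving a pickled DAG object.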
I was seeing very similar symptoms in my DagRuns. I thought it was due to the ExternalTaskSensor and concurrency issues, given the queued/killed-task language that looked like this: Executor reports task instance <TaskInstance: dag1.data_table_temp_redshift_load 2019-05-20 08:00:00+00:00 [queued]> finished (failed) although the task says its queued. Was the task killed externally? But when I looked at the worker logs, I saw there was an error caused by setting a variable with Variable.set in my DAG. The issue is described here: "duplicate key value violates unique constraint when adding path variable in airflow dag" - the scheduler polls the dagbag at regular intervals to refresh any changes dynamically, and the error on every heartbeat was causing significant ETL delays.
Are you performing any logic in your wh_hdfs_to_s3 DAG (or others) that might be causing errors or delays / these symptoms?
We fixed this already. Let me answer my own question:
We have 5 Airflow worker nodes. After installing Flower to monitor the tasks distributed to these nodes, we found out that the failing task was always sent to a specific node. We tried using the airflow test command to run the task on other nodes and it worked. Eventually, the reason turned out to be a wrong Python package on that specific node.
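For reference, airflow test runs a single task instance locally without going through the scheduler or executor, which makes it handy for isolating node-specific problems; a hypothetical invocation, run directly on the suspect worker, would be something like airflow test wh_hdfs_to_s3 task1 2019-05-11 (DAG ID, task ID and execution date as placeholders).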
The issue I've got is that PHPUnit does not work properly (something does happen) when simply clicking "Run" (Shift+F10 on Windows) in PhpStorm.
First up, I followed these tutorials/setup guides:
https://blog.jetbrains.com/phpstorm/2016/11/docker-remote-interpreters/
https://blog.alejandrocelaya.com/2017/02/01/run-phpunit-tests-inside-docker-container-from-phpstorm/
https://stackoverflow.com/a/47578104
So now I've pretty much got a working setup - apart from the fact that it doesn't work. Running from PhpStorm gives:
Testing started at 15:21 ...
[docker://IMAGE_NAME:latest/]:php bin/.phpunit/phpunit-6.5/phpunit --configuration /var/www/html/phpunit.xml.dist --teamcity
PHPUnit 6.5.14 by Sebastian Bergmann and contributors.
Testing Project Test Suite
Fatal error: Uncaught PDOException: SQLSTATE[HYT00]: [unixODBC][Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired in /var/www/html/src/Legacy/Connection/MssqlConnection.php on line 178
PDOException: SQLSTATE[HYT00]: [unixODBC][Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired in /var/www/html/src/Legacy/Connection/MssqlConnection.php on line 178
Call Stack:
0.0003 393408 1. {main}() /var/www/html/bin/.phpunit/phpunit-6.5/phpunit:0
0.0571 923544 2. PHPUnit\TextUI\Command::main() /var/www/html/bin/.phpunit/phpunit-6.5/phpunit:17
0.0571 923656 3. Symfony\Bridge\PhpUnit\Legacy\CommandForV6->run() /var/www/html/bin/.phpunit/phpunit-6.5/src/TextUI/Command.php:148
0.2019 4269152 4. Symfony\Bridge\PhpUnit\Legacy\TestRunnerForV6->doRun() /var/www/html/bin/.phpunit/phpunit-6.5/src/TextUI/Command.php:195
0.2158 4697272 5. PHPUnit\Framework\TestSuite->run() /var/www/html/bin/.phpunit/phpunit-6.5/src/TextUI/TestRunner.php:545
0.2181 4702968 6. PHPUnit\Framework\TestResult->startTestSuite() /var/www/html/bin/.phpunit/phpunit-6.5/src/Framework/TestSuite.php:689
0.2233 4717824 7. App\Tests\Helper\DeleteDBOnceListener->startTestSuite() /var/www/html/bin/.phpunit/phpunit-6.5/src/Framework/TestResult.php:368
0.2270 4739216 8. App\Legacy\Connection\MssqlConnection->databaseExists() /var/www/html/tests/Helper/DeleteDBOnceListener.php:55
0.2270 4739216 9. App\Legacy\Connection\MssqlConnection->findDbFromDSN() /var/www/html/src/Legacy/Connection/MssqlConnection.php:38
0.2271 4740104 10. PDO->__construct() /var/www/html/src/Legacy/Connection/MssqlConnection.php:178
Process finished with exit code 255
Obviously this reads as: cannot connect to DB. But!
If I log into the Docker container and then run the command, it works! Command:
php bin/.phpunit/phpunit-6.5/phpunit --configuration /var/www/html/phpunit.xml.dist
Generates output:
user#hash:/var/www/html# php bin/.phpunit/phpunit-6.5/phpunit --configuration /var/www/html/phpunit.xml.dist
PHPUnit 6.5.14 by Sebastian Bergmann and contributors.
Testing Project Test Suite
Dropping current database...
.Creating database..
................................................................ 65 / 80 ( 81%)
............... 80 / 80 (100%)
Time: 3.25 minutes, Memory: 56.12MB
OK (80 tests, 336 assertions)
So why does this fail when I execute it with "Run" from PhpStorm? Did I miss a setting?
Since you asked your question 3 years ago, you have probably answered it yourself by now. However, I will leave my answer for people who come across this in the future.
The HYT00 error is a timeout error from the MSSQL client. It basically means that your client cannot reach the server (or that the server is responding very slowly).
If you configure PhpStorm to use the Docker container as an interpreter, that is exactly what it does: it only uses the PHP you have inside the Docker container as the interpreter, but it does not run the tests inside the container. You are still running them in your host environment, on your host network, etc.
Make sure you have access to the database from your host machine and that you are using a correct connection string while running the tests. In your test configuration you may have Docker's internal hostnames (between Docker containers you may, for example, use container names as hostnames). When running from the host machine you will need a hostname that is accessible there, like localhost:1433 if you have the ports mapped to the host machine.
With this checked, the tests should execute correctly.
Just an off-topic note: unless you are doing integration tests, it's not a good idea to connect to SQL in unit tests. Better to use mocks :)
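As a purely illustrative example (not from the original answer), in PHPUnit 6.5 you could stub the connection class that appears in the stack trace with $this->createMock(MssqlConnection::class) inside a test case, so the suite never opens a real PDO connection.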