In our project we are using DataprocClusterCreateOperator, which lived under contrib (from airflow.contrib.operators import dataproc_operator). It works fine with Airflow version 1.10.14.
We are in the process of upgrading to Airflow 2.1.2. While testing our DAGs that spin up a Dataproc cluster, we hit this error: airflow.exceptions.AirflowException: Invalid arguments were passed to DataprocClusterCreateOperator (task_id: <task_id>). Invalid arguments were: **kwargs: {'config_bucket': None, 'autoscale_policy': None}
I am not able to find any documentation for this operator in Airflow 2 that would let me identify the new parameters or the changes that happened.
Please share the relevant link.
We are using google-cloud-composer version 1.17.2, which has Airflow version 2.1.2.
Since Airflow 2.0, third-party provider operators/hooks (like Google's in this case) have been moved out of the Airflow core into separate provider packages. You can read more here.
Since you are using Cloud Composer, the Google providers package is already installed.
Regarding the DataprocClusterCreateOperator, it has been renamed to DataprocCreateClusterOperator and moved to airflow.providers.google.cloud.operators.dataproc, so you can import it with:
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator
The accepted parameters differ from the ones in Airflow 1.x. You can find an example of usage here.
The supported parameters for the DataprocCreateClusterOperator in Airflow 2 can be found here, in the source code. The cluster configuration parameters that can be passed to the operator can be found here.
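For illustration, the new call shape looks roughly like this: the old flat kwargs now live inside a cluster_config dict (the project, bucket and policy names below are placeholders, and the exact ClusterConfig fields should be checked against the linked source):

from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

# ClusterConfig fields replace the old flat kwargs; for example, config_bucket
# and autoscaling_config/policy_uri stand in for the old config_bucket and
# autoscale_policy arguments that triggered the error above.
CLUSTER_CONFIG = {
    "config_bucket": "my-staging-bucket",
    "autoscaling_config": {
        "policy_uri": "projects/my-project/regions/us-central1/autoscalingPolicies/my-policy",
    },
    "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
    "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
}

create_cluster = DataprocCreateClusterOperator(
    task_id="create_cluster",
    project_id="my-project",
    cluster_name="my-cluster",
    region="us-central1",
    cluster_config=CLUSTER_CONFIG,
)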
The DataprocClusterCreateOperator was renamed to DataprocCreateClusterOperator on January 13, 2020, as per this GitHub commit, and was ported from airflow.contrib.operators to the airflow.providers.google.cloud.operators.dataproc import path.
As given in @itroulli's answer, an example implementation of the operator can be found here.
I have a static pipeline with the following architecture:
main.py
setup.py
requirements.txt
module1/
    __init__.py
    functions.py
module2/
    __init__.py
    functions.py
dist/
    setup_tarball
The setup.py and requirements.txt contain the non-native PyPI packages and the local functions to be used by the Dataflow worker nodes. The Dataflow options are written as follows:
import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from module2.functions import function_to_use
dataflow_options = ['--extra_package=./dist/setup_tarball', '--temp_location=<gcs_temp_location>', '--runner=DataflowRunner', '--region=us-central1', '--requirements_file=./requirements.txt']
So then the pipeline will run something like this:
options = PipelineOptions(dataflow_options)
p = beam.Pipeline(options=options)
transform = (p | ReadFromText(gcs_url) | beam.Map(function_to_use) | WriteToText(gcs_output_url))
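For completeness, the pipeline would then be launched with the standard Beam calls (not shown in the original snippet):

result = p.run()
result.wait_until_finish()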
Running this locally, Dataflow takes around 6 minutes to complete, with most of the time going to worker startup. I tried automating this code with Composer and rearranged the architecture as follows: my main (DAG) function in the dags folder, the modules in plugins, and setup_tarball and requirements.txt in the data folder. So the only parameters that really changed are:
'--extra_package=/home/airflow/gcs/data/setup_tarball'
'--requirements_file=/home/airflow/gcs/data/requirements.txt'
When I try running this modified code in Composer, it works... but it takes much, much longer. Once the worker starts up, it takes anywhere from 20 to 30 minutes before actually running the pipeline (which itself takes only a few seconds). This is much longer than triggering Dataflow from my local code, which completed in about 6 minutes. I realize this question is very general, but since the code works, I don't think it's related to the Airflow task itself. Where would be a reasonable place to start troubleshooting this? At the Airflow level, what can be modified? How does Composer (Airflow) interact with Dataflow, and what could potentially cause this bottleneck?
It turns out that the problem was associated with Composer itself. The fix was to increase the capacity of Composer, i.e., increase the number of vCPUs. I'm not sure why this would be the case, so if anyone has an idea of the reason behind this issue, your input would be much appreciated!
Snowflake is not showing in the connections dropdown.
I am using MWAA 2.0 and the providers are already in requirements.txt.
MWAA uses Python 3.7; I don't know if that could be a factor.
Requirements.txt:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.7.txt"
asn1crypto
azure-common
azure-core
azure-storage-blob
boto3
botocore
certifi
cffi
chardet
cryptography
greenlet
idna
isodate
jmespath
msrest
numpy
oauthlib
oscrypto
pandas
pyarrow
pycparser
pycryptodomex
PyJWT
pyOpenSSL
python-dateutil
pytz
requests
requests-oauthlib
s3transfer
six
urllib3
apache-airflow-providers-http
apache-airflow-providers-snowflake
#apache-airflow-providers-snowflake[slack]
#apache-airflow-providers-slack
snowflake-connector-python >=2.4.1
snowflake-sqlalchemy >=1.1.0
If anyone runs into this: instead of choosing Snowflake in the dropdown, you can choose Amazon Web Services as the connection type and it will work fine.
It took me a while to finally figure this one out after trying many different parameter combinations.
My full Snowflake URL is:
https://xx12345.us-east-2.aws.snowflakecomputing.com
The correct format for the Host field is:
xx12345.us-east-2.snowflakecomputing.com
For the Extra field, this is what worked for me:
{
    "account": "xx12345.us-east-2.aws",
    "warehouse": "my_warehouse_name",
    "database": "my_database_name"
}
Make sure you put Amazon Web Services for the Conn Type, like @AXI said.
Also, I have these modules defined in my requirements.txt file:
apache-airflow-providers-snowflake==1.3.0
snowflake-connector-python==2.4.5
snowflake-sqlalchemy==1.2.4
My Airflow version is 2.0.2.
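Once the connection resolves, a minimal smoke-test task can confirm it end to end. A sketch, where snowflake_conn is a placeholder connection id:

from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Runs a trivial query through the configured connection to verify it works.
check_snowflake = SnowflakeOperator(
    task_id="check_snowflake",
    snowflake_conn_id="snowflake_conn",
    sql="select current_version()",
)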
According to the MWAA docs, it should be enough to add apache-airflow-providers-snowflake==1.3.0 to the requirements file. When I added it to an existing MWAA env, where I had already tried many different combinations of packages, it helped only partially: it was possible to create a connection using the CLI, but not with the UI.
But when I created a new, clean MWAA env with the requirements file as stated in the mentioned AWS doc, it worked well. The connection was available in the UI.
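For reference, the CLI route uses the standard Airflow 2 command (on MWAA the CLI is invoked remotely via a CLI token rather than a local shell; the conn id and credentials below are placeholders):

airflow connections add snowflake_conn \
    --conn-type snowflake \
    --conn-host xx12345.us-east-2.snowflakecomputing.com \
    --conn-login my_user \
    --conn-password my_password \
    --conn-extra '{"account": "xx12345.us-east-2.aws", "warehouse": "my_warehouse_name", "database": "my_database_name"}'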
Importing the operator in the following way:
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
Then trying to use it in a DAG:
download_file = GoogleCloudStorageDownloadOperator(
    bucket='us-central1-scale-training-d7d12089-bucket',
    google_cloud_storage_conn_id='google_cloud_default',
    object='params.json',
    filename='params.json')
Receiving this error:
'GoogleCloudStorageDownloadOperator' is not defined
Edit: I am using Google Cloud Composer, so I assume the relevant dependencies are installed.
If you haven't already, you also need to add the GCP dependency to Airflow:
pip install apache-airflow[gcp_api]
There's more information about installation in the docs: https://airflow.apache.org/installation.html
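A quick way to check from inside the environment that the dependency and import path actually resolve is a generic Python one-liner (nothing Composer-specific about it):

python -c "from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator; print('import ok')"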
After installing Yosemite and a new version of MAMP, when I try to execute
domain/app_dev.php/es/venues/3/show
with 'es' as the locale I get errors; if I change it to 'en' there's no problem. This route renders a form containing a language type field, so it requires ICU.
The errors are:
[1/2] ResourceBundleNotFoundException: The resource bundle
"/Users/a77/Documents/DEV/UVox
Com/vendor/symfony/icu/Symfony/Component/Icu/Resources/data/lang/root.php"
does not exist.
[2/2] Couldn't read the indices [Languages] from
"/Users/a77/Documents/DEV/UVox
Com/vendor/symfony/icu/Symfony/Component/Icu/Resources/data/lang/es.res".
The indices also couldn't be found in the fallback locale(s)
"root.res".
My Symfony version is 2.5, and I'm running MAMP's PHP 5.5.10.
I updated dependencies via composer, including "symfony/intl": "*".
I have followed several guides to install icu and intl via pecl, but I still get the error. I don't know how to check whether the installations or the configs are OK. Maybe you can let me know how to test both via the terminal and I'll report the result.
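For that terminal check: assuming MAMP's php binary is the one on your PATH (adjust the path otherwise), something like this shows whether intl is loaded and which ICU version it was built against (INTL_ICU_VERSION is a constant the intl extension defines):

php -m | grep intl
php -r 'var_dump(extension_loaded("intl"), defined("INTL_ICU_VERSION") ? INTL_ICU_VERSION : null);'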
This is because you are trying to get resources for the language code es alone. Since the ICU data was imported into Symfony, you need to request language resources via the combined language and country code, i.e. es_ES.
You may not be able to simply activate intl.so after the Yosemite update. I solved the issue by installing intl.so following an excellent article by Danilo Braband: http://dab.io/posts/getting-started-with-symfony-on-yosemite.html
Solved by upgrading to Symfony 2.5.6.
I am trying to find a tool that will extract the module version information (a part of the module record) from an X server module. For example, I can see the following information for the librecord module in my Xorg.0.log file...
[ 39.892] (II) Loading /usr/lib/xorg/modules/extensions/librecord.so
[ 39.905] (II) Module record: vendor="X.Org Foundation"
[ 39.905] compiled for 1.9.0, module version = 1.13.0
[ 39.905] Module class: X.Org Server Extension
[ 39.905] ABI class: X.Org Server Extension, version 4.0
Is there a tool that would allow me to easily extract the aforementioned information? Sometimes you can use modinfo on the module and that will have version information, but that does not always work. The only consistent way I know of now is to parse the Xorg log file (a sketch of that fallback follows below). Thanks.
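That log-parsing fallback is easy to script; here is a minimal Python sketch, where the regexes are my assumptions based on the log lines quoted above:

import re
import sys

# Matches lines like: (II) Module record: vendor="X.Org Foundation"
mod_re = re.compile(r'\(II\) Module (\S+): vendor="([^"]*)"')
# Matches lines like: compiled for 1.9.0, module version = 1.13.0
ver_re = re.compile(r'compiled for (\S+), module version = (\S+)')

def module_versions(log_path):
    current = None
    with open(log_path) as log:
        for line in log:
            m = mod_re.search(line)
            if m:
                current = m.group(1)
                continue
            m = ver_re.search(line)
            if m and current:
                yield current, m.group(2), m.group(1)
                current = None

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/Xorg.0.log"
    for name, version, built_for in module_versions(path):
        print("%s: module version %s (compiled for %s)" % (name, version, built_for))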
Yes, there is, and you can also try to write a small one yourself.
http://gitorious.org/xdriverprobe
The problem is that xdriverprobe won't compile on newer servers since I didn't update it to the newest ABIs. Also, xdriverprobe is only used for video drivers, but it can be adapted to be used on other modules. The main source code file (xdriverprobe.c) has less than 500 lines, so you can easily learn by reading it.
It works in Ubuntu 11.10... ./xdriverprobe -o moduledata gives the information you want.
Look at its source code. It does:
dlopen() the module
find a symbol called modulenameModuleData (if your module is called modulename)
that symbol is an XF86ModuleData* (see /usr/include/xorg/xf86Module.h)
check its member named vers
Spend a few hours and you'll be able to write a very tiny code that does what you want.
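As a rough illustration of those steps in Python, here is a sketch using cffi: the struct layout is copied from xf86Module.h (and may drift across server versions), and the RTLD_LAZY load is the same trick xdriverprobe relies on. It can still fail if the module has unresolved data references, so treat it as a starting point, not a tested tool:

from cffi import FFI

ffi = FFI()
# Mirrors XF86ModuleVersionInfo / XF86ModuleData from xf86Module.h
# (CARD32 -> unsigned int, CARD8 -> unsigned char, CARD16 -> unsigned short).
ffi.cdef("""
typedef struct {
    const char *modname;
    const char *vendor;
    unsigned int _modinfo1_;
    unsigned int _modinfo2_;
    unsigned int xf86version;
    unsigned char majorversion;
    unsigned char minorversion;
    unsigned short patchlevel;
    const char *abiclass;
    unsigned int abiversion;
    const char *moduleclass;
    unsigned int checksum[4];
} XF86ModuleVersionInfo;

typedef struct {
    XF86ModuleVersionInfo *vers;
    void *setup;
    void *teardown;
} XF86ModuleData;

extern XF86ModuleData recordModuleData;  /* <name>ModuleData, here for librecord */
""")

# RTLD_LAZY so the module's references to X server functions stay unresolved.
lib = ffi.dlopen("/usr/lib/xorg/modules/extensions/librecord.so", ffi.RTLD_LAZY)
v = lib.recordModuleData.vers
print("%s %d.%d.%d" % (ffi.string(v.modname).decode(),
                       v.majorversion, v.minorversion, v.patchlevel))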
More information: http://www.xfree86.org/current/DESIGN17.html#65 (very old document, but most of what's written there is still true today). If you're not happy with that document, you have to read the Xorg source code.
Happy hacking!