S3CopyObjectOperator - An error occurred (NoSuchKey) when calling the CopyObject operation: The specified key does not exist - airflow

If I run this locally in the CLI, it runs successfully and copies the files from the other bucket/key into the correct location in my bucket:
aws s3 sync s3://client_export/ref/commissions/snapshot_date=2022-01-01/ s3://bi-dev/KSM/refinery29/commissions/snapshot_date=2022-01-01/
When I try with the S3CopyObjectOperator I see the NoSuchKey error:
copy_commissions_data = S3CopyObjectOperator(
    task_id='copy_commissions_data',
    aws_conn_id='aws_default',
    source_bucket_name='client_export',
    dest_bucket_name='bi-dev',
    source_bucket_key='ref/commissions/snapshot_date=2022-01-01,
    dest_bucket_key='KSM/refix/commissions/snapshot_date=2022-01-01',
    dag=dag
)
I've also tried adding a / before the key names, after them, and both, but I get the same error.

You are missing a closing quote at the end of line 6 (the source_bucket_key line); it should be:
source_bucket_key='ref/commissions/snapshot_date=2022-01-01',
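For reference, the full corrected task would look something like this (a sketch; it reuses the connection id, bucket names, and keys from the question, and the import path may vary slightly with your apache-airflow-providers-amazon version):

# import path may differ depending on the provider version installed
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator

copy_commissions_data = S3CopyObjectOperator(
    task_id='copy_commissions_data',
    aws_conn_id='aws_default',
    source_bucket_name='client_export',
    dest_bucket_name='bi-dev',
    source_bucket_key='ref/commissions/snapshot_date=2022-01-01',
    dest_bucket_key='KSM/refix/commissions/snapshot_date=2022-01-01',
    dag=dag,
)

Note also that, unlike aws s3 sync, S3CopyObjectOperator copies a single object, so the source key has to resolve to an exact object rather than a prefix.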

Dataflow from Colab issue

I'm trying to run a Dataflow job from Colab and getting the following worker error:
sdk_worker_main.py: error: argument --flexrs_goal: invalid choice: '/root/.local/share/jupyter/runtime/kernel-1dbd101c-a79e-432e-89b3-5ba68df104d7.json' (choose from 'COST_OPTIMIZED', 'SPEED_OPTIMIZED')
I haven't provided the flexrs_goal argument, and if I do it doesn't fix this issue. Here are my pipeline options:
beam_options = PipelineOptions(
    runner='DataflowRunner',
    project=...,
    job_name=...,
    temp_location=...,
    subnetwork='regions/us-west1/subnetworks/default',
    region='us-west1'
)
My pipeline is very simple, it's just:
with beam.Pipeline(options=beam_options) as pipeline:
    (pipeline
     | beam.io.ReadFromBigQuery(
         query=f'SELECT column FROM {BQ_TABLE} LIMIT 100')
     | beam.Map(print))
It looks like the command line args for the sdk worker are getting polluted by jupyter somehow. I've rolled back to the past two apache-beam library versions and it hasn't helped. I could move over to Vertex Workbench but I've invested a lot in this Colab notebook (plus I like the easy sharing) and I'd rather not migrate.
Figured it out. The PipelineOptions constructor will pull in sys.argv if no parameter is given for the first argument (called flags). In my case it was pulling in the command line args that my jupyter notebook was started with and passing them as Beam options to the workers.
I fixed my issue by doing this:
beam_options = PipelineOptions(
    flags=[],
    ...
)
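Put together with the options from the question, the fix looks something like this (a sketch; the project, job name, and temp location values are placeholders for whatever you already use):

from apache_beam.options.pipeline_options import PipelineOptions

beam_options = PipelineOptions(
    flags=[],  # an empty list stops PipelineOptions from reading sys.argv (Jupyter's own args)
    runner='DataflowRunner',
    project='my-project',                 # placeholder
    job_name='my-job',                    # placeholder
    temp_location='gs://my-bucket/tmp',   # placeholder
    subnetwork='regions/us-west1/subnetworks/default',
    region='us-west1',
)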

Airflow PostgresOperator SQL parameter syntax error

I have the following task in my DAG:
create_features_table = PostgresOperator(
    task_id="create_features_table",
    postgres_conn_id="featuredb",
    sql="src/test.sql "
)
But when I test the task, I get this error:
psycopg2.errors.SyntaxError: syntax error at or near "src"
LINE 1: src/test.sql
The content of the test.sql script is:
CREATE TABLE test(
    C1 int,
    C2 int
);
I can't spot the error in the syntax, but this is my first DAG, so any help would be greatly appreciated.
If I run the script directly from the postgres container's psql using "\i src/test.sql" it works fine.
I have tested the connection from the airflow web server and the connection works.
I found that I had to put a space before the closing quotes to avoid a jinja2.exceptions.TemplateNotFound error, but I haven't been able to find the syntax error.
According to the documentation (https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/_api/airflow/providers/postgres/operators/postgres/index.html#airflow.providers.postgres.operators.postgres.PostgresOperator),
if you are passing the path of a SQL script, it must end with .sql.
Your path has a trailing white space, so the operator thinks it is a query to be executed against your Postgres database, not a file containing a query. You can see this in the response from Postgres: run src/test.sql as a query on your Postgres instance and you will get the same syntax error.
You can fix it easily by removing that white space:
sql="src/test.sql"

How to map tests to source/test files in utPLSQL?

I could use a little hand with utPLSQL.
I am trying to produce test results that Sonar will pick up and scan. So far, Sonar is picking up the report file, but the test executions are ignored because they do not reference the appropriate source files.
I am trying to reference the source and test files when running ut.run(ut_sonar_test_reporter());. Our Jenkins does not have utPLSQL-cli installed; short version: they said they will not install it.
To get a result for a single test, I tried the following:
spool sonar_results.xml;
exec ut.run('test_get_something');
exec ut.run(ut_sonar_test_reporter(), a_source_file_mapping => ut_file_mappings(ut_file_mapping(file_name => 'this_dir/get_something.fnc', object_owner=> 'GET_SOMETHING_OWNER', object_name=> 'GET_SOMETHING', object_type=>'FUNCTION'));
spool off;
And got the following error message:
Error starting at line : 4 in command -
BEGIN ut.run(ut_sonar_test_reporter(), a_source_file_mapping => ut_file_mappings(ut_file_mapping(file_name => 'this_dir/get_something.fnc', object_owner=> 'GET_SOMETHING_OWNER', object_name=> 'GET_SOMETHING', object_type=>'FUNCTION'));
Error report -
ORA-06550: line 1, column 219:
PLS-00306: wrong number or types of arguments to call to 'RUN'
ORA-06550: line 1, column 219
PL/SQL: Statement ignored
utPLSQL's documentation doesn't provide anything about referencing parameters like a_source_file_mapping or a_test_file_mapping.
I am a little stumped.

Slackr: x Problem with `id` - Cannot send messages

I am not an admin, so I can't change the scopes. I can send slackr_bot messages to a channel I set up when creating the app in the UI, but doing the below does not work. Has anyone found a solution to this?
I created a txt file called: test.txt
Within that txt file it looks like this:
api_token: xxxxxxxxxxxx
channel: #channel_name
username: myusername
incoming_webhook_url: https://hooks.slack.com/services/xxxxxxxxxxx/xxxxxxxxxxxxx
Then I simply want to send a message, but eventually I would like to run the function:
ggslackr(qplot(mpg, wt, data=mtcars))
slackr_setup(config_file = "test.txt")
my_message <- paste("I'm sending a Slack message at", Sys.time(), "from my R script.")
slackr_msg(my_message, channel = "#channel_name", as_user=F)
Here is the error message:
Error: Join columns must be present in data.
x Problem with `id`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In structure(vars, groups = group_vars, class = c("dplyr_sel_vars", :
Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
Consider 'structure(list(), *)' instead.
Edit #2:
Okay, I learned some things regarding packages. If I had to do this over, I'd have gone to their github repo and read the issue tracker.
It appears that slackr has a few issues related to changes in Slack's API.
Also, since the big update to R (version 4.x), a lot of packages have broken.
My sense is that our issue is a line of code inside a slackr function (slackr_util.r, iirc) that performs a dplyr join looking for a particular id that does not exist.
So, I'm going to watch the issue tracker and see what comes of it.
Edit: I tried slackr_bot(my_message, channel = "#general") and it worked as advertised!
But ggslackr continues to fail.
I'm having the same issue. In another thread I found a starting point for debugging:
`rlang::last_error()`
When I run that,
Backtrace:
1. slackr::slackr_msg(my_message, channel = "#general")
5. slackr::slackr_chtrans(channel)
6. slackr::slackr_ims(api_token)
8. dplyr:::left_join.data.frame(users, ims, by = "id", copy = TRUE)
9. dplyr:::join_mutate(...)
10. dplyr:::join_cols(...)
11. dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
12. dplyr:::check_join_vars(by$y, y_names)
So step 8 is a join by id, which I suppose implies that 'id' is missing.
Yet if I run slackr::slackrSetup(echo=TRUE), as suggested in the GitHub issue tracker, I get the following:
{
"SLACK_CHANNEL": ["#general"],
"SLACK_USERNAME": ["slackr_brian"],
"SLACK_ICON_EMOJI": ["NA"],
"SLACK_INCOMING_URL_PREFIX": ["https://hooks.xxxxxxx"],
"SLACK_API_TOKEN": ["token secret"]
}
I'm not sure where to go from here, as the issue tracker conversation mentions confirming that webhooks go to the correct channel and then becomes very user specific.
So, that's as far as I have gotten.

OpenMDAO: adding command line args for ExternalCodeComp that won't result in a runtime error

In OpenMDAO V3.1 I am using an ExternalCodeComp to execute a CFD code. Typically, I would call it as such:
mpirun nodet_mpi --design_run
If the above call is made in the appropriate directory, it will find the appropriate run file and execute the CFD run. I have tried the following command args for the ExternalCodeComp:
execute = ['mpirun', 'nodet_mpi', '--design_run']
execute = ['mpirun', 'nodet_mpi --design_run']
execute = ['mpirun nodet_mpi --design_run']
I either get an error such as:
RunTimeError: 255, execvp error on file "nodet_mpi --design_run" (No such file or directory)
Or that the command cannot be found.
Is there any way to setup the execute statement to include commandline args for the flow solver when an input file is not defined?
Thanks in advance!
One detail in your question seems incorrect: you state that you have tried execute = "...", but the ExternalCodeComp uses an option called command. I will assume that you are using the correct option in your code.
The most correct form to use is the list with all arguments as single entries in the list:
self.options['command'] = ['mpirun', 'nodet_mpi', '--design_run']
Your error msg seems to indicate that the directory that OpenMDAO is running in is not the same as the directory you would like to execute the CFD code from. The absolute simplest solution would be to make sure that you are in the correct directory via cd in the terminal window before executing your python script.
However, there is likely a reason that your python script is in a different place so there are other options I can suggest:
1. You can use a combination of os.getcwd() and os.chdir() inside the compute method that you have implemented, to make sure you switch into and out of the working directory for the CFD code (a sketch follows below).
2. If you would like to, you can modify the entries of the list you've assigned to the self.options['command'] option on the fly within your compute method. You would again be relying on some of the methods in the os module for help: os.path.exists can be used to test whether the specific input files you need exist, and you can modify the command option accordingly.
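For option 1, a minimal sketch might look like this (the run-directory attribute is purely illustrative, not an ExternalCodeComp built-in; you would set it yourself, e.g. in setup):

def compute(self, inputs, outputs):
    import os

    start_dir = os.getcwd()
    os.chdir(self._cfd_run_dir)   # hypothetical attribute holding the CFD run directory
    try:
        # the parent compute launches self.options['command'] in the new working directory
        super().compute(inputs, outputs)
    finally:
        os.chdir(start_dir)       # always switch back, even if the run fails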
For option 2, code would look something like this:
# note: this assumes "import os" at the top of your module
def compute(self, inputs, outputs):
    if os.path.exists('some_input.file'):
        self.options['command'] = ['mpirun', 'nodet_mpi', '--design_run']
    else:
        self.options['command'] = ['mpirun', 'nodet_mpi', '--design_run', '--other_options']

    # the parent compute function actually runs the external code
    super().compute(inputs, outputs)
