What is the proper partition config for a Dagster job? - dagster

I am currently getting a dagster.core.errors.PartitionExecutionError, but the error logs from Dagster are not obvious to me.
dagster.core.errors.PartitionExecutionError: Error occurred during the evaluation of the `run_config_for_partition` function for partition set download_firebase_data_local_partition_set
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/site-packages/dagster/grpc/impl.py", line 292, in get_partition_config
return ExternalPartitionConfigData(name=partition.name, run_config=run_config)
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/site-packages/dagster/core/errors.py", line 192, in user_code_error_boundary
raise error_cls(
The above exception was caused by the following exception:
TypeError: daily_download_config() takes 1 positional argument but 2 were given
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/site-packages/dagster/core/errors.py", line 185, in user_code_error_boundary
yield
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/site-packages/dagster/grpc/impl.py", line 291, in get_partition_config
run_config = partition_set_def.run_config_for_partition(partition)
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/site-packages/dagster/core/definitions/partition.py", line 441, in run_config_for_partition
return copy.deepcopy(self._user_defined_run_config_fn_for_partition(partition))
File "/Users/bryan/miniconda3/envs/dagster-injector/lib/python3.9/site-packages/dagster/core/definitions/time_window_partitions.py", line 192, in <lambda>
run_config_for_partition_fn=lambda partition: fn(
My current setup is:
@graph
def download():
    """
    Download data from BigQuery then upload to S3
    """
    extract_data_in_date()

@daily_partitioned_config(start_date=datetime(2021, 12, 1))
def daily_download_config(date: datetime):
    return {
        "resources": {
            "date": date.strftime("%Y-%m-%d")
        }
    }

download_local_job = download.to_job(
    name=f'{NAME}_local',
    resource_defs={
        **{
            "date": make_values_resource(date=str),
            "project_name": ResourceDefinition.hardcoded_resource("test-123")
        },
        **RESOURCES_LOCAL,
    },
    config=daily_download_config,
    executor_def=in_process_executor
)
I am not sure where I went wrong. Can you please help?

The function decorated with @daily_partitioned_config needs to accept two arguments: one for the start of the time window and one for the end. daily_download_config doesn't actually use the end date, but it still needs to appear in the signature because Dagster passes two arguments to this function regardless.
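A minimal sketch of the corrected signature follows. The decorator is left as a comment so the snippet runs without Dagster installed, and the final call only simulates how the function would be invoked with a window start and end. The parameter names start and _end are illustrative, and the returned dictionary is kept exactly as in the question.

```python
from datetime import datetime


# In the real job this function would keep the decorator from the question:
#   @daily_partitioned_config(start_date=datetime(2021, 12, 1))
# It is omitted here so the sketch runs on its own.
def daily_download_config(start: datetime, _end: datetime):
    # _end is unused, but it must be in the signature because the function
    # is called with both the start and the end of the time window.
    return {
        "resources": {
            "date": start.strftime("%Y-%m-%d")
        }
    }


# Simulate the call for the partition covering 2021-12-01.
config = daily_download_config(datetime(2021, 12, 1), datetime(2021, 12, 2))
print(config)
```

Prefixing the unused parameter with an underscore is only a convention to signal that it is intentionally ignored.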

Related

Unhandled Exception for Downloader for Reddit

I'm having difficulty downloading a series of results when using the GUI. I'm able to log into my account and find a user and/or subreddit, but when I download I only get 1 result and it says:
ERROR: Failed to extract due to: Unsupported Domain
I have installed all of the requirements in the requirements.txt file but am not sure how to resolve this issue. The latest logfile result is below:
"levelname": "CRITICAL",
"asctime": "02/12/2023 02:12:47 PM",
"filename": "main.py",
"module": "main",
"name": "DownloaderForReddit.main",
"funcName": "log_unhandled_exception",
"lineno": 48,
"message": "Unhandled exception",
"exc_info": "Traceback (most recent call last):\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect\n return fn()\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 363, in connect\n return _ConnectionFairy._checkout(self)\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout\n fairy = _ConnectionRecord.checkout(pool)\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 492, in checkout\n rec = pool._do_get()\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 238, in _do_get\n return self._create_connection()\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection\n return _ConnectionRecord(self)\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 437, in init\n self.__connect(first_connect_check=True)\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 652, in __connect\n connection = pool._invoke_creator(self)\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect\n return dialect.connect(*cargs, **cparams)\n File "/home/ads/projects/DownloaderForReddit-master/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 489, in c###
I think it has something to do with the SQLite package or lack thereof. I'm very new to coding and am trying to run this on my Linux machine in a virtual environment.

AuthorizationFailed - "The client 'xxx' with object id 'xxx' does not have authorization to perform action"

I've tried to get Workspace from config which I do have access to, but it fails with the following error:
import azureml.core
print("SDK version:", azureml.core.VERSION)
from azureml.core.workspace import Workspace
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')
SDK version: 0.1.80 Found the config file in:
C:\Users\gubert\Repos\Gimmonix\HotelMappingAI\aml_config\config.json
get_workspace error using subscription_id=xxxxxxxxxxxxxxxxxxxxxxx,
resource_group_name=xxxxxxxxxxxx, workspace_name=gmx-ml-mapping
Traceback (most recent call last): File
"C:\Users\gubert.azureml\envs\myenv\lib\site-packages\azureml_project_commands.py",
line 320, in get_workspace workspace_name) File
"C:\Users\gubert.azureml\envs\myenv\lib\site-packages\azureml_base_sdk_common\workspace\operations\workspaces_operations.py",
line 78, in get raise
models.ErrorResponseWrapperException(self._deserialize, response)
azureml._base_sdk_common.workspace.models.error_response_wrapper.ErrorResponseWrapperException:
Operation returned an invalid status code 'Forbidden'
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd_launcher.py",
line 38, in main(sys.argv) File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd_main_.py",
line 265, in main wait=args.wait) File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd_main_.py",
line 256, in handle_args run_main(addr, name, kind, *extra, **kwargs)
File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd_local.py",
line 52, in run_main runner(addr, name, kind == 'module', *extra,
**kwargs) File "c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd\runner.py",
line 32, in run set_trace=False) File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd_vendored\pydevd\pydevd.py",
line 1283, in run return self._exec(is_module, entry_point_fn,
module_name, file, globals, locals) File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd_vendored\pydevd\pydevd.py",
line 1290, in _exec pydev_imports.execfile(file, globals, locals) #
execute the script File
"c:\Users\gubert.vscode\extensions\ms-python.python-2018.10.1\pythonFiles\experimental\ptvsd\ptvsd_vendored\pydevd_pydev_imps_pydev_execfile.py",
line 25, in execfile exec(compile(contents+"\n", file, 'exec'), glob,
loc) File "c:\Users\gubert\Repos\Gimmonix\HotelMappingAI\test.py",
line 8, in ws = Workspace.from_config() File
"C:\Users\gubert.azureml\envs\myenv\lib\site-packages\azureml\core\workspace.py",
line 153, in from_config auth=auth) File
"C:\Users\gubert.azureml\envs\myenv\lib\site-packages\azureml\core\workspace.py",
line 86, in init auto_rest_workspace = _commands.get_workspace(auth,
subscription_id, resource_group, workspace_name) File
"C:\Users\gubert.azureml\envs\myenv\lib\site-packages\azureml_project_commands.py",
line 326, in get_workspace resource_error_handling(response_exception,
WORKSPACE) File
"C:\Users\gubert.azureml\envs\myenv\lib\site-packages\azureml_base_sdk_common\common.py",
line 270, in resource_error_handling raise
ProjectSystemException(response_message)
azureml.exceptions._azureml_exception.ProjectSystemException: {
"error_details": { "error": { "code": "AuthorizationFailed",
"message": "The client 'xxxxxxxxxx@microsoft.com' with object id
'xxxxxxxxxxxxx' does not have authorization to perform action
'Microsoft.MachineLearningServices/workspaces/read' over scope
'/subscriptions/xxxxxxxxxxxxxx/resourceGroups/CarsolizeCloud - Test
Global/providers/Microsoft.MachineLearningServices/workspaces/gmx-ml-mapping'."
} }, "status_code": 403, "url":
"https://management.azure.com/subscriptions/xxxxxxxxxxxxx/resourceGroups/CarsolizeCloud%20-%20Test%20Global/providers/Microsoft.MachineLearningServices/workspaces/gmx-ml-mapping?api-version=2018-03-01-preview"
}
You're using a fairly old preview version of the SDK. Try upgrading to the newest version, 1.0.10. If you still have a problem, let me know, as I work on this SDK.

pexpect python throw error

Although this is my first attempt at using pexpect, the python3 script using pexpect is pretty simple; yet it fails.
#!/usr/bin/env python3
import sys
import pexpect
SSH_NEWKEY = r'Are you sure you want to continue connecting \(yes/no\)\?'
child = pexpect.spawn("ssh -i /user/aws/key.pem ec2-user@xxx.xxx.xxx.xxx date")
i = child.expect( [ pexpect.TIMEOUT, SSH_NEWKEY ] )
if i == 1:
    child.sendline('yes')
print(child.before)
The SSH_NEWKEY is the only response I'm expecting, but the example showed a list containing pexpect.TIMEOUT in it so I used it.
$ ./test.py
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 144, in read_nonblocking
s = os.read(self.child_fd, size)
OSError: [Errno 5] Input/output error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/pexpect/expect.py", line 97, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/usr/local/lib/python3.4/site-packages/pexpect/pty_spawn.py", line 455, in read_nonblocking
return super(spawn, self).read_nonblocking(size)
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 149, in read_nonblocking
raise EOF('End Of File (EOF). Exception style platform.')
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./min.py", line 15, in <module>
i = child.expect( [ pexpect.TIMEOUT, SSH_NEWKEY ] )
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 315, in expect
timeout, searchwindowsize, async)
File "/usr/local/lib/python3.4/site-packages/pexpect/spawnbase.py", line 339, in expect_list
return exp.expect_loop(timeout)
File "/usr/local/lib/python3.4/site-packages/pexpect/expect.py", line 102, in expect_loop
return self.eof(e)
File "/usr/local/lib/python3.4/site-packages/pexpect/expect.py", line 49, in eof
raise EOF(msg)
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0x7f70ea4fbcf8>
command: /usr/bin/ssh
args: ['/usr/bin/ssh', '-i', '/user/aws/key.pem', 'ec2-user@xxx.xxx.xxx.xxx', 'date']
searcher: None
buffer (last 100 chars): b''
before (last 100 chars): b'Fri May 6 13:50:18 EDT 2016\r\n'
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: 0
flag_eof: True
pid: 31293
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
What am I missing?
CentOS 6.4
python 3.4.3
An EOF error is being raised during your expect call. This means that the output received does not match SSH_NEWKEY, and the child reaches end of file within the timeout period. To catch this exception, you should change your expect line to read:
i = child.expect( [ pexpect.TIMEOUT, SSH_NEWKEY, pexpect.EOF ] )
You can then make your if more robust:
if i == 1:
    child.sendline('yes')
elif i == 0:
    print("Timeout")
elif i == 2:
    print("EOF")
print(child.before)
This doesn't address why you are not receiving a response containing the expected string. It's hard to know without looking at more code, but it's likely because you have the expected string slightly wrong. If you run the SSH command manually, you should be able to see the actual response and use it in your code.
You can also print child.before after your expect call, or print child.read() instead of your expect call to see what is being sent back as a response.

"Adapter does not support geometry" exception when declaring geometry field

In my application, an "Adapter does not support geometry" exception is thrown when attempting to create a field of type "geometry()". For my test application, I'm using an SQLite DB (production will use Postgres):
db = DAL('sqlite://storage.sqlite', pool_size = 1, fake_migrate_all= False)
The DB table in question is declared within a class, inside a module, and contains several fields, some of which hold location data:
from gluon.dal import Field, geoPoint, geoLine, geoPolygon
class Info(Base_Model):
    def __init__(...):
        try:
            db.define_table('t_info',
                ...
                Field('f_geolocation', type='geometry()',
                      label = current.T('Geolocation')),
                Field('f_city', type='string',
                      label = current.T('City')),
                ...
        except Exception as e:
            ...
Edit:
As per Anthony's suggestion, I've modified the DAL constructor call to the following:
db = DAL('spatialite://storage.sqlite', pool_size = 1)
It produces the following error message:
Traceback (most recent call last):
File "C:\...\web2py\gluon\restricted.py", line 227, in restricted
exec ccode in environment
File "C:/My_Stuff/Programs/web2py/applications/Proj/models/db.py", line 38, in <module>
db = DAL('spatialite://storage.sqlite', pool_size = 1)
File "C:\...\web2py\gluon\packages\dal\pydal\base.py", line 171, in __call__
obj = super(MetaDAL, cls).__call__(*args, **kwargs)
File "C:\...\web2py\gluon\packages\dal\pydal\base.py", line 457, in __init__
raise RuntimeError("Failure to connect, tried %d times:\n%s" % (attempts, tb))
RuntimeError: Failure to connect, tried 5 times:
Traceback (most recent call last):
File "C:\...\web2py\gluon\packages\dal\pydal\base.py", line 435, in __init__
self._adapter = ADAPTERS[self._dbname](**kwargs)
File "C:\...\web2py\gluon\packages\dal\pydal\adapters\base.py", line 53, in __call__
obj = super(AdapterMeta, cls).__call__(*args, **kwargs)
File "C:\...\web2py\gluon\packages\dal\pydal\adapters\sqlite.py", line 169, in __init__
if do_connect: self.reconnect()
File "C:\...\web2py\gluon\packages\dal\pydal\connection.py", line 129, in reconnect
self.after_connection_hook()
File "C:\...\web2py\gluon\packages\dal\pydal\connection.py", line 81, in after_connection_hook
self.after_connection()
File "C:\...\web2py\gluon\packages\dal\pydal\adapters\sqlite.py", line 177, in after_connection
self.execute(r'SELECT load_extension("%s");' % libspatialite)
File "C:\...\web2py\gluon\packages\dal\pydal\adapters\base.py", line 1326, in execute
return self.log_execute(*a, **b)
File "C:\...\web2py\gluon\packages\dal\pydal\adapters\base.py", line 1320, in log_execute
ret = self.cursor.execute(command, *a[1:], **b)
OperationalError: The specified module could not be found.
If you want to use geometry fields with SQLite, you must use the spatialite adapter, which relies on the SpatiaLite extension for SQLite:
db = DAL('spatialite://storage.sqlite', pool_size = 1)
Note that SpatiaLite must be installed for this to work.
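One way to check whether SpatiaLite is installed where SQLite can find it is to try loading the extension directly with Python's built-in sqlite3 module. This is only a sketch: spatialite_available is a hypothetical helper, and the library name "mod_spatialite" is an assumption that may differ on your system (the web2py traceback above suggests the adapter looks for its own library name).

```python
import sqlite3


def spatialite_available(library_name: str = "mod_spatialite") -> bool:
    """Return True if the named SQLite extension can be loaded."""
    conn = sqlite3.connect(":memory:")
    try:
        # Extension loading is disabled by default and may be missing
        # entirely in some Python builds (hence the AttributeError catch).
        conn.enable_load_extension(True)
        conn.load_extension(library_name)
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()


print("SpatiaLite loadable:", spatialite_available())
```

If this prints False, the "specified module could not be found" error above is most likely the same problem: the extension library is not installed or not on the search path.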

Getting the input from a PUT method in Web.py

I'm using the following code in my server program:
class AddLibSong:
    def PUT(self):
        db = MahData.getDBConnection()
        songs = json.loads(web.input().to_add)
        addToLibrary(songs)
        return
But for some reason when I do a PUT with the data:
"to_add=[ { "album" : "Unknonwn", "artist" : "Unknonwn", "host_lib_id" : "1", "is_deleted" :
"false", "server_lib_id" : "-1", "song" : "Moneytalks" } ]"
I get the following error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/web/application.py", line 237, in process
return self.handle()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/web/application.py", line 228, in handle
return self._delegate(fn, self.fvars, args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/web/application.py", line 409, in _delegate
return handle_class(cls)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/web/application.py", line 385, in handle_class
return tocall(*args)
File "/Users/kurtis/sandbox/udj/webserver/Library.py", line 114, in PUT
song = json.loads(web.input().to_add)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/web/utils.py", line 76, in __getattr__
raise AttributeError, k
AttributeError: 'to_add'
127.0.0.1:51096 - - [29/Sep/2011 19:02:58] "HTTP/1.1 PUT /add_songs_to_library" - 500 Internal Server Error
Does anybody know why this is? I think I saw something about Web.py being able to get input only if given a POST or GET, but I didn't see anything in the source code that should prevent this.
Anyway, if you want more details on how to use PUT with web.py, I would recommend this great link.
To make it work on the latest version of web.py, you should change the "main" code to:
if __name__ == "__main__":
    app = web.application(urls, globals())
    app.run()
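If web.input() still does not expose the to_add field for a PUT request, one possible workaround is to parse the raw request body by hand. The sketch below is not taken from the thread: it assumes the body is form-encoded and already available as a string (web.py exposes the raw body through web.data(), which may need decoding first), and parse_to_add is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_to_add(raw_body: str) -> list:
    """Extract and JSON-decode the 'to_add' field from a form-encoded body."""
    fields = parse_qs(raw_body)
    if "to_add" not in fields:
        raise KeyError("to_add")
    # parse_qs maps each field name to a list of values; take the first one.
    return json.loads(fields["to_add"][0])


# Build a body shaped like the one in the question, then parse it back.
songs_in = [{"artist": "Unknown", "host_lib_id": "1", "song": "Moneytalks"}]
body = urlencode({"to_add": json.dumps(songs_in)})
songs = parse_to_add(body)
print(songs[0]["song"])
```

Inside the PUT handler, the result of parse_to_add could then be passed to addToLibrary in place of json.loads(web.input().to_add).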
