I had been having issues with Python 3.7 for quite some time about seemingly pointless indentation errors, so I decided to go back to 3.6, specifically repl.it's Python 3.6.1. As I mentioned, the errors appear for no good reason whatsoever as far as I can tell. The code is written below:
from random import randint
import functools
printf = functools.partial(print, end=" ")
defNuc = ['C','A','T','G']
def opNuc():
def create():
    nuc = [0]
    nucop = [0]
    length = randint(11,16)
    print (length - 1)
    for i in range(1,length):
        part = randint(1,4)
        for a in range(1,4)
            if part == a:
                nuc = defNuc[a]
                nucOp = defNuc[-a]
        if i != length - 1:
            printf(nuc[i],i,"-")
        else:
            print(nuc[i],i)
    for i in range (1,length):
        if i != length - 1:
            printf(nucOp[i],"-")
        else:
            print(nucop[i])
The error is at line 9, at
def create():
and as for the reason for the error, it just says
expected an indented block
Edit:
This was completely my own stupidity; don't take the post seriously, it will be deleted in 10 minutes.
You never finished the definition of opNuc, so the parser is expecting an indented line to continue the body of that function. Either add a pass statement to provide a trivial body:
def opNuc():
    pass
or indent the definition of create if that is supposed to be local to the body of opNuc (unlikely, but possible):
def opNuc():
    def create():
        ...
The problem is that your first function, opNuc, was never finished. I have made this simple mistake many times myself, and it is very easy to miss. It's easy to fix, though: just put pass inside the opNuc function and it should be fine. Hope I helped!
Is there a built-in facility or some operator that will run a sensor and negate its status? I am writing a workflow that needs to detect that an object does not exist in order to proceed to eventual success. I have a sensor, but it detects when the object does exist.
For instance, I would like my workflow to detect that an object does not exist. I need almost exactly S3KeySensor, except that I need to negate its status.
The use case you are describing is checking for a key in S3: if it exists, wait; otherwise continue the workflow. As you mentioned, this is a sensor use case. The S3Hook has a check_for_key function that checks whether a key exists, so all that's needed is to wrap it with a sensor's poke function.
A simple basic implementation would be:
from typing import TYPE_CHECKING, Optional, Sequence, Union

from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.sensors.base import BaseSensorOperator

if TYPE_CHECKING:
    from airflow.utils.context import Context


class S3KeyNotPresentSensor(BaseSensorOperator):
    """Waits for a key to not be present in S3."""

    template_fields: Sequence[str] = ('bucket_key', 'bucket_name')

    def __init__(
        self,
        *,
        bucket_key: str,
        bucket_name: Optional[str] = None,
        aws_conn_id: str = 'aws_default',
        verify: Optional[Union[str, bool]] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.bucket_name = bucket_name
        self.bucket_key = [bucket_key] if isinstance(bucket_key, str) else bucket_key
        self.aws_conn_id = aws_conn_id
        self.verify = verify
        self.hook: Optional[S3Hook] = None

    def poke(self, context: 'Context'):
        # Succeed only when none of the keys are present in the bucket.
        return not any(
            self.get_hook().check_for_key(key, self.bucket_name)
            for key in self.bucket_key
        )

    def get_hook(self) -> S3Hook:
        """Create and return an S3Hook."""
        if self.hook:
            return self.hook
        self.hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
        return self.hook
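A hedged usage sketch (the DAG id, dates, bucket, and key below are made up for illustration) would wire it into a DAG like any other sensor:

from datetime import datetime

from airflow import DAG

with DAG('key_absence_example', start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    key_absent = S3KeyNotPresentSensor(
        task_id='key_absent',
        bucket_key='path/to/object.csv',
        bucket_name='my-bucket',
        poke_interval=60,    # check once a minute
        timeout=60 * 60,     # give up after an hour
        mode='reschedule',   # free the worker slot between pokes
    )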
I ended up going another way. I can use the trigger_rule argument of (any) Task -- by setting it to one_failed or all_failed on the next task I can play around with the desired status.
For example,
file_exists = FileSensor(task_id='exists', timeout=3, poke_interval=1, filepath='/tmp/error', mode='reschedule')
sing = SmoothOperator(task_id='sing', trigger_rule='all_failed')
file_exists >> sing
It requires no added code or operator, but has the possible disadvantage of being somewhat surprising.
Replying to myself in the hope that this may be useful to someone else. Thanks!
I'm trying to extract vector representations of text using BERT in the transformers library, and have stumbled on the following note in the documentation for the BertModel class's forward method:
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Can anybody explain this in more detail? A forward pass makes intuitive sense to me (I am trying to get final hidden states after all), but I can't find any additional information on what "pre and post processing" means in this context.
Thanks up front!
I think this is just general advice about working with PyTorch Modules. The transformers models are nn.Modules, and they require a forward method. However, one should not call model.forward() manually but instead call model(). The reason is that PyTorch does some work under the hood when you just call the Module instance. You can see that in the source code of nn.Module.__call__:
def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
You'll see that forward is called when necessary.
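For the original use case (getting the final hidden states), a minimal sketch would therefore call the model instance rather than forward; the model name and the exact return type vary with the transformers version:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

inputs = tokenizer("Some text to encode", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # calls __call__, which runs the hooks and then forward()

# The first element is the sequence of final hidden states; newer versions
# return a ModelOutput, older ones a plain tuple, but indexing works for both.
last_hidden_states = outputs[0]
print(last_hidden_states.shape)  # (batch_size, sequence_length, hidden_size)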
I've inherited a piece of code that I need to run somewhere other than its original environment, with some minor changes. I am trying to map a list of strings to something that applies a function to each element of that list using Python 3.6 (a language I am not familiar with).
I would like to use map rather than a list comprehension, but now I doubt this is possible.
In the following example I've tried a combination of for loops, yield (or not), and next(...) (or not), but I am not able to make the code work as expected.
I would like to see the print:
AAA! xxx
Found: foo
Found: bar
each time the counter xxx modulo 360 is 0 (zero).
I understand the map function does not execute the code immediately, so I need to do something to "apply" that function to each element of the input list.
However, I am not able to make this work. The documentation at https://docs.python.org/3.6/library/functions.html#map and https://docs.python.org/3.6/howto/functional.html#iterators does not help that much; I went through it and thought at least one of the commented-out attempts below (# <python code>) should have worked. I am not an experienced Python developer and I think I am missing some gotchas about the syntax/conventions of Python 3.6 regarding iterators/generators.
issue_counter = 0

def foo_func(serious_stuff):
    # this is actually a call to a module to send an email with the "serious_stuff"
    print("Found: {}".format(serious_stuff))

def report_issue():
    global issue_counter
    # this actually executes once per minute (removed the logic to run this fast)
    while True:
        issue_counter += 1
        # every 6 hours (i.e. 360 minutes) I would like to send emails
        if issue_counter % 360 == 0:
            print("AAA! {}".format(issue_counter))
            # for stuff in map(foo_func, ["foo", "bar"]):
            #     yield stuff
            #     stuff()
            #     print(stuff)
            iterable_stuff = map(foo_func, ["foo", "bar"])
            for stuff in next(iterable_stuff):
                # yield stuff
                print(stuff)

report_issue()
I get lots of different errors/unexpected behaviors from that for loop when running the script:
not printing anything when I call print(...)
TypeError: 'NoneType' object is not callable
AttributeError: 'map' object has no attribute 'next'
TypeError: 'NoneType' object is not iterable
Printing what I am expecting, interleaved with None, e.g.:
AAA! 3047040
Found: foo
None
Found: bar
None
I found out that the call to next(iterable_thingy) actually invokes the mapped function.
Knowing the length of the input list used to generate the iterable means we know how many times we have to invoke next(iterable_thingy), so the function report_issue (in my previous example) runs as expected when defined like this:
def report_issue():
    global issue_counter
    original_data = ["foo", "bar"]
    # this executes once per minute
    while True:
        issue_counter += 1
        # every 6 hours I would like to send emails
        if issue_counter % 360 == 0:
            print("AAA! {}".format(issue_counter))
            iterable_stuff = map(foo_func, original_data)
            for idx in range(len(original_data)):
                next(iterable_stuff)
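That said, since foo_func is called purely for its side effect, a simpler sketch (my suggestion, not part of the original code) avoids counting next() calls entirely, either by exhausting the map object or by dropping map altogether:

original_data = ["foo", "bar"]

# Option 1: force the lazy map object to run by exhausting it
list(map(foo_func, original_data))

# Option 2 (usually clearer): skip map and call the function in a plain loop
for item in original_data:
    foo_func(item)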
To troubleshoot this iterable stuff, I found it useful to run ipython (an interactive REPL) to check the type and documentation of the generated iterable, like this:
In [2]: def foo_func(serious_stuff):
   ...:     # this is actually a call to a module to send an email with the "serious_stuff"
   ...:     print("Found: {}".format(serious_stuff))
   ...:

In [3]: iterable_stuff = map(foo_func, ["foo", "bar"])

In [4]: iterable_stuff?
Type:        map
String form: <map object at 0x7fcdbe8647b8>
Docstring:
map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from
each of the iterables.  Stops when the shortest iterable is exhausted.

In [5]: next(iterable_stuff)
Found: foo

In [6]: bar_item = next(iterable_stuff)
Found: bar

In [7]: bar_item?
Type:        NoneType
String form: None
Docstring:   <no docstring>

In [8]:
Just for learning purposes, I tried to use a dictionary as a global variable through an accumulator. The add function works well, but when I run the code and update the dictionary inside the map function, it always comes back empty.
Similar code that sets a list as a global variable works, though.
import re

from pyspark.accumulators import AccumulatorParam


class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)


if __name__ == "__main__":
    sc, sqlContext = init_spark("generate_score_summary", 40)
    rdd = sc.textFile('input')
    # print(rdd.take(5))
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        global dict1
        ls = re.split(',', line)
        dict1 += {ls[0]: ls[1]}
        return line

    rdd = rdd.map(lambda x: file_read(x)).cache()
    print(dict1)
For anyone who arrives at this thread looking for a Dict accumulator for pyspark: the accepted solution does not solve the posed problem.
The issue is actually in the DictParam as defined: it does not return the updated dictionary. This works:
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1
The original code was missing the return value.
I believe that print(dict1) simply gets executed before rdd.map() ever actually runs.
In Spark, there are 2 types of operations:
transformations, which describe the future computation,
and actions, which actually trigger the execution.
Accumulators are updated only when some action is executed:
Accumulators do not change the lazy evaluation model of Spark. If they
are being updated within an operation on an RDD, their value is only
updated once that RDD is computed as part of an action.
If you check out the end of this section of the docs, there is an example exactly like yours:
accum = sc.accumulator(0)

def g(x):
    accum.add(x)
    return f(x)

data.map(g)
# Here, accum is still 0 because no actions have caused the `map` to be computed.
So you would need to add some action, for instance:
rdd = rdd.map(lambda x: file_read(x)).cache() # transformation
foo = rdd.count() # action
print(dict1)
Please make sure to check on the details of various RDD functions and accumulator peculiarities because this might affect the correctness of your result. (For instance, rdd.take(n) will by default only scan one partition, not the entire dataset.)
For accumulator updates performed inside actions only, Spark guarantees that each task's update to the accumulator will only be applied once, i.e. restarted tasks will not update the value.
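Putting both points together, a minimal sketch (reusing the question's names; 'input' is the question's placeholder path) would be:

dict1 = sc.accumulator({}, DictParam())  # DictParam with the added `return value1`

def file_read(line):
    ls = line.split(',')
    dict1.add({ls[0]: ls[1]})  # accumulator update inside a transformation
    return line

rdd = sc.textFile('input').map(file_read).cache()
rdd.count()          # an action forces the map to actually run
print(dict1.value)   # the accumulated dict is now visible on the driver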
I would like to parse a query for a database of chemical elements.
The database is stored in an XML file. Parsing that file produces a nested dictionary that is stored in a singleton object that inherits from collections.OrderedDict.
Asking for an element gives me an ordered dictionary of its corresponding properties
(i.e. ELEMENTS['C'] --> {'name': 'carbon', 'neutron': 0, 'proton': 6, ...}).
Conversely, asking for a property gives me an ordered dictionary of its values for all the elements (i.e. ELEMENTS['proton'] --> {'H': 1, 'He': 2, ...}).
A typical query could be:
mass > 10 or (nucleon < 20 and atomic_radius < 5)
where each 'subquery' (i.e. mass > 10) will return the set of elements that match it.
Then, the query will be converted and transformed internally into a string that will be evaluated further to produce a set of the indexes of the elements that matched it. In that context the operators and/or are not boolean operators but rather set operators that act upon Python sets, as the small illustration below shows.
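(A tiny illustration with made-up index sets: 'and' means set intersection and 'or' means set union.)

mass_gt_10 = {5, 6, 7, 8}       # indexes matching "mass > 10"
nucleon_lt_20 = {1, 2, 5, 6}    # indexes matching "nucleon < 20"

print(mass_gt_10 & nucleon_lt_20)  # "and" -> intersection: {5, 6}
print(mass_gt_10 | nucleon_lt_20)  # "or"  -> union: {1, 2, 5, 6, 7, 8}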
I recently posted a question about building such a query. Thanks to the useful answers I got, I think that I have more or less done the job (in a nice way, I hope!), but I still have some questions related to pyparsing.
Here is my code:
import numpy
from pyparsing import *

# This imports a singleton object storing the database dictionary as
# described earlier.
from ElementsDatabase import ELEMENTS

and_operator = oneOf(['and', '&'], caseless=True)
or_operator = oneOf(['or', '|'], caseless=True)

# ELEMENTS.properties is a property getter that returns the list of
# registered properties in the database.
props = oneOf(ELEMENTS.properties, caseless=True)

# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props

# When parsed, it must be replaced by the following expression that
# will be eval'ed later.
props.setParseAction(lambda t: "numpy.array(ELEMENTS['%s'].values())" % t[0].lower())

quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t: int(t[0]))
float_ = Regex(r'[+-]?(\d+(\.\d*)?)?([eE][+-]?\d+)?').setParseAction(lambda t: float(t[0]))

comparison_operator = oneOf(['==', '!=', '>', '>=', '<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
comparison_expr.setParseAction(lambda t: "set(numpy.where(%s)%s%s)" % tuple(t))

grammar = Combine(operatorPrecedence(comparison_expr,
                                     [(and_operator, 2, opAssoc.LEFT),
                                      (or_operator, 2, opAssoc.LEFT)]))

# A test query
res = grammar.parseString('"mass " > 30 or (nucleon == 1)', parseAll=True)

print(eval(' '.join(res._asStringList())))
My questions are the following:
1 Using transformString instead of parseString never triggers any exception, even when the string to be parsed does not match the grammar. However, that is exactly the functionality I need. Is there a way to do so?
2 I would like to reintroduce white spaces between my tokens so that my eval does not fail. The only way I found to do so is the one implemented above. Do you see a better way of doing this with pyparsing?
Sorry for the long post, but I wanted to introduce its context in some detail. BTW, if you find this approach bad, do not hesitate to tell me!
Thank you very much for your help.
Eric
Do not worry about my concerns, I found a workaround. I used the SimpleBool.py example shipped with pyparsing (thanks for the hint, Paul).
Basically, I used the following approach:
1 for each subquery (i.e. mass > 10), using the setParseAction method, I attached a function that returns the set of elements that match the subquery
2 then, I attached the following functions to each logical operator (and, or and not):
def not_operator(token):
    _, s = token[0]
    # ELEMENTS is the singleton described in my original post
    return set(ELEMENTS.keys()).difference(s)

def and_operator(token):
    s1, _, s2 = token[0]
    return s1 & s2   # set intersection

def or_operator(token):
    s1, _, s2 = token[0]
    return s1 | s2   # set union

# Thanks to Paul for the hint.
grammar = operatorPrecedence(comparison_expr,
                             [(not_token, 1, opAssoc.RIGHT, not_operator),
                              (and_token, 2, opAssoc.LEFT, and_operator),
                              (or_token, 2, opAssoc.LEFT, or_operator)])
Please note that these operators act upon Python sets rather than on booleans.
And that does the job.
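As a hedged usage sketch (assuming, per point 1, that the comparison parse action now returns the set of matching element symbols), the parsed result's single element is the final set:

# The query from my original post; result[0] is the set produced by the
# operator parse actions above.
result = grammar.parseString('mass > 10 or (nucleon < 20 and atomic_radius < 5)', parseAll=True)
matching_elements = result[0]
print(sorted(matching_elements))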
I hope that this approach will help some of you.
Eric