How to loop through Python dictionary keys using wildcards?

I have created a dictionary and am looping through it to create a final, much smaller dictionary. I have successfully done this, but it is long and cumbersome. It seems to me I should be able to loop through using some sort of wildcard. My problem is that I am using user input, but some keys can be a, a_1, a_2, ...a_5. Not all go up to 5 possible options, and some have as few as one. Please be kind, I am new to programming. I have tried the fnmatch() function to no avail.
check_parm=yae_list.values()
new_dict={}
x=int(len(check_parm))
while x>0:
    element=check_parm[x-1]
    if element in yae_atom_parms_dic:
        new_dict[element]=yae_atom_parms_dic.get(element)
    elif not yae_atom_parms_dic.get(element):
        element_1=element+'_1'
        if element_1 in yae_atom_parms_dic:
            new_dict[element_1]=yae_atom_parms_dic.get(element_1)
        element_2=element+'_2'
        if element_2 in yae_atom_parms_dic:
            new_dict[element_2]=yae_atom_parms_dic.get(element_2)
        element_3=element+'_3'
        if element_3 in yae_atom_parms_dic:
            new_dict[element_3]=yae_atom_parms_dic.get(element_3)
        element_4=element+'_4'
        if element_4 in yae_atom_parms_dic:
            new_dict[element_4]=yae_atom_parms_dic.get(element_4)
        element_5=element+'_5'
        if element_5 in yae_atom_parms_dic:
            new_dict[element_5]=yae_atom_parms_dic.get(element_5)
        if element_1 or element_2 or element_3 or element_4 or element_5:
            print("Which one do you want?")
            print new_dict
    x=x-1
This is working as I would like, giving the output:
Which one do you want?
{'C_1': (6, 4, 1, 2, 's', -21.4, 1.625, 0, 1, 0), 'C_2': (6, 4, 1, 2, 'p', -11.4,1.625,)}

You can try something like this:
import re
# Escape each base name and join them into one alternation, e.g. "C|a|new".
constant_keys = "|".join(map(re.escape, yae_list.values()))
# Match a base name, optionally followed by "_1" through "_5".
re_expression = re.compile("^(%s)(_[1-5])?$" % constant_keys)
allowed_keys = filter(re_expression.match, yae_atom_parms_dic.keys())
new_dict = {key: yae_atom_parms_dic[key] for key in allowed_keys}
print(new_dict)
Here you first build a regular expression by escaping each value in yae_list and joining them with the | character, then substitute that alternation into ^(%s)(_[1-5])?$. The resulting pattern matches a key that is exactly one of those values, optionally followed by an underscore and a digit from 1 to 5. You then filter the dictionary's keys with that pattern and build the new dict from the keys left in allowed_keys.
Example:
If you have the values "C", "a" and "new", the regular expression looks as follows: ^(C|a|new)(_[1-5])?$
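To see which keys the pattern accepts and rejects, here is a minimal Python 3 sketch of the same idea. The sample dictionaries and the helper name select_keys are hypothetical stand-ins, not the asker's real data.

```python
import re

# Hypothetical sample data standing in for the asker's dictionaries.
yae_list = {"first": "C", "second": "H"}
yae_atom_parms_dic = {
    "C_1": (6, 4, 1, 2, "s"),
    "C_2": (6, 4, 1, 2, "p"),
    "H": (1, 1, 1, 1, "s"),
    "Ca_1": (20, 2, 1, 4, "s"),  # different base name, must not match "C"
    "C_9": (0,),                 # suffix outside 1-5, must not match
}


def select_keys(base_names, params):
    """Keep entries whose key is a base name, optionally suffixed _1 to _5."""
    alternation = "|".join(map(re.escape, base_names))
    pattern = re.compile(r"^(%s)(_[1-5])?$" % alternation)
    return {key: value for key, value in params.items() if pattern.match(key)}


new_dict = select_keys(yae_list.values(), yae_atom_parms_dic)
print(sorted(new_dict))
```

The ^ and $ anchors are what keep "Ca_1" and "C_9" out of the result; without them, a key that merely starts with a base name would also pass.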

Related

Airflow SqlToS3Operator adds an unwanted index column at the beginning

Recent versions of airflow-providers-amazon have deprecated MySQLToS3Operator and introduced SqlToS3Operator, which now adds an index column at the beginning of the CSV dump.
For example, if I run the following
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
)
The S3 file has something like this:
,created_at,score
1,2023-01-01,5
2,2023-01-02,6
The output seems to be a direct dump from Pandas. How can I remove this unwanted preceding index column?
The operator uses a pandas DataFrame under the hood.
You should use pd_kwargs. It lets you pass keyword arguments through to DataFrame.to_parquet(), .to_json() or .to_csv().
Since your output is csv the relevant pandas.DataFrame.to_csv parameters are:
header: bool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index: bool, default True
Write row names (index).
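Thus you can do the following. Note that "index": False is what removes the index column; "header": False additionally removes the row of column names, so omit it if you want to keep the header:
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    file_format="csv",
    pd_kwargs={"index": False, "header": False},
)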
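The effect of these two to_csv parameters can be checked locally without Airflow. The following is a minimal sketch that assumes pandas is installed and uses a hypothetical two-row frame in place of the query result. With no path argument, to_csv returns the CSV text as a string.

```python
import pandas as pd

# Hypothetical stand-in for the rows returned by the SQL query.
df = pd.DataFrame({"created_at": ["2023-01-01", "2023-01-02"], "score": [5, 6]})

# Default behaviour: the row index is written as an unnamed first column.
with_index = df.to_csv().splitlines()

# index=False drops the index column but keeps the header row.
no_index = df.to_csv(index=False).splitlines()

# index=False plus header=False leaves only the data rows.
data_only = df.to_csv(index=False, header=False).splitlines()

print(with_index[0])
print(no_index[0])
print(data_only[0])
```

Whatever dictionary is passed as pd_kwargs plays the role of the keyword arguments used here.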

jinja2 variable inside a variable

I am passing a dictionary to a template.
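from jinja2 import Environment, FileSystemLoader

road_len = {"US_NEWYORK_MAIN": 24,
            "US_BOSTON_WALL": 18,
            "FRANCE_PARIS_RUE": 9,
            "MEXICO_CABOS_STILL": 8}
# Assumption: template.txt lives in the current directory.
env = Environment(loader=FileSystemLoader("."))
env.globals.update(country="MEXICO")
env.globals.update(city="CABOS")
env.globals.update(street="STILL")
with open("output.txt", "w") as file_handle:
    file_handle.write(env.get_template("template.txt").render(road_len=road_len))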
template.txt
This is a road length is: {{road_len["{{country}}_{{city}}_{{street}}"]}}
Expected output.txt
This is a road length is: 8
But this does not work, as nested variable substitution is not allowed.
You never nest Jinja {{..}} markers. You're already inside a template context, so if you want to use the value of a variable you just use the variable. It helps if you're familiar with Python, because you can use most of Python's string formatting constructs.
So you could write:
This is a road length is: {{road_len["%s_%s_%s" % (country, city, street)]}}
Or:
This is a road length is: {{road_len[country + "_" + city + "_" + street]}}
Or:
This is a road length is: {{road_len["{}_{}_{}".format(country, city, street)]}}
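The three variants can be compared side by side. This sketch assumes the jinja2 package is installed and renders each template from a string with Environment.from_string, so no template file is needed; the data and globals are the ones from the question.

```python
from jinja2 import Environment

# Values taken from the question.
road_len = {"US_NEWYORK_MAIN": 24, "US_BOSTON_WALL": 18,
            "FRANCE_PARIS_RUE": 9, "MEXICO_CABOS_STILL": 8}

env = Environment()
env.globals.update(country="MEXICO", city="CABOS", street="STILL")

# Each template builds the dictionary key inside a single {{ ... }} expression.
templates = [
    'This is a road length is: {{road_len["%s_%s_%s" % (country, city, street)]}}',
    'This is a road length is: {{road_len[country + "_" + city + "_" + street]}}',
    'This is a road length is: {{road_len["{}_{}_{}".format(country, city, street)]}}',
]

rendered = [env.from_string(t).render(road_len=road_len) for t in templates]
for line in rendered:
    print(line)
```

All three expressions build the same key, "MEXICO_CABOS_STILL", so choosing between them is mostly a matter of readability.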

What does "Some" mean in a collect result in Scala?

"Some" is not a distinctive search term, so Googling for it returns nothing useful.
My question comes from the following example I was working through:
b.collect:
Array[(Int, String)] = Array((3,dog), (6,salmon), (3,rat), (8,elephant))
d.collect:
Array[(Int, String)] = Array((3,dog), (3,cat), (6,salmon), (6,rabbit), (4,wolf), (7,penguin))
If I do a join and then collect the result, as in b.join(d).collect, I get the following:
Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)))
That seems understandable. However, if I do b.leftOuterJoin(d).collect, I get:
Array[(Int, (String, Option[String]))] = Array((6,(salmon,Some(salmon))), (6,(salmon,Some(rabbit))), (3,(dog,Some(dog))), (3,(dog,Some(cat))), (3,(rat,Some(dog))), (3,(rat,Some(cat))), (8,(elephant,None)))
My question is why the results are expressed differently: why does the second result contain "Some"? What is the difference between a value with "Some" and one without? Can "Some" be removed? Does "Some" have any impact on later operations on the contents of the RDD?
Thank you very much.
When you do a normal join with b.join(d).collect, you get Array[(Int, (String, String))].
This is because an inner join returns only the keys present in both RDD b and RDD d, so every result is guaranteed to have a value from each side.
With b.leftOuterJoin(d).collect the return type is Array[(Int, (String, Option[String]))], which allows for missing values. In a leftOuterJoin there is no guarantee that every key of RDD b is also present in RDD d, so the right-hand value is returned as an Option[String], which takes one of two forms:
Some(String) => if the key is present in both RDDs
None => if the key is present in b but not in d
You can unwrap the Option by taking the value out of Some and supplying a default for None, as below.
val z = b.leftOuterJoin(d).map(x => (x._1, (x._2._1, x._2._2.getOrElse("")))).collect
Now you should get Array[(Int, (String, String))], with output such as
Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)), (8,(elephant,)))
Where you can replace "" with any other string as you require.
Hope this helps.
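For readers more at home in Python, the same behaviour can be sketched without Spark. This is only an analogy under stated assumptions: Python has no Option type, so None stands in for Scala's None and a plain string stands in for Some(value). The helper left_outer_join is hypothetical, and the row order differs from Spark's.

```python
from collections import defaultdict

# The two pair collections from the question.
b = [(3, "dog"), (6, "salmon"), (3, "rat"), (8, "elephant")]
d = [(3, "dog"), (3, "cat"), (6, "salmon"), (6, "rabbit"), (4, "wolf"), (7, "penguin")]


def left_outer_join(left, right):
    """Pair every left row with each matching right value, or None if no match."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    joined = []
    for key, left_value in left:
        matches = right_by_key.get(key)
        if matches:
            joined.extend((key, (left_value, m)) for m in matches)
        else:
            joined.append((key, (left_value, None)))  # the "None" case
    return joined


joined = left_outer_join(b, d)

# Equivalent of getOrElse(""): replace a missing right-hand value with a default.
filled = [(k, (lv, rv if rv is not None else "")) for k, (lv, rv) in joined]

print(joined)
print(filled)
```

Key 8 exists only in b, so it is the one row whose right-hand side is missing before the default is applied.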

pyparsing recursive grammar space separated list inside a comma separated list

I have the following string that I'd like to parse:
((K00134,K00150) K00927,K11389) (K00234,K00235)
Each step is separated by a space and alternation is represented by a comma. I'm stuck on the first part of the string, where there is a space inside the brackets. The desired output I'm looking for is:
[[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
What I've got so far is a basic setup to do recursive parsing, but I'm stumped on how to code in a space separated list into the bracket expression
from pyparsing import Word, Literal, Combine, nums, \
Suppress, delimitedList, Group, Forward, ZeroOrMore
ortholog = Combine(Literal('K') + Word(nums, exact=5))
exp = Forward()
ortholog_group = Suppress('(') + Group(delimitedList(ortholog)) + Suppress(')')
atom = ortholog | ortholog_group | Group(Suppress('(') + exp + Suppress(')'))
exp <<= atom + ZeroOrMore(exp)
You are on the right track, but I think you only need one place where you include grouping with ()'s, not two.
import pyparsing as pp
LPAR,RPAR = map(pp.Suppress, "()")
ortholog = pp.Combine('K' + pp.Word(pp.nums, exact=5))
ortholog_group = pp.Forward()
ortholog_group <<= pp.Group(LPAR + pp.OneOrMore(ortholog_group | pp.delimitedList(ortholog)) + RPAR)
expr = pp.OneOrMore(ortholog_group)
tests = """\
((K00134,K00150) K00927,K11389) (K00234,K00235)
"""
expr.runTests(tests)
gives:
((K00134,K00150) K00927,K11389) (K00234,K00235)
[[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
[0]:
  [['K00134', 'K00150'], 'K00927', 'K11389']
  [0]:
    ['K00134', 'K00150']
  [1]:
    K00927
  [2]:
    K11389
[1]:
  ['K00234', 'K00235']
This is not exactly what you said you were looking for:
you wanted: [[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
I output : [[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
I'm not sure why there is grouping in your desired output around the space-separated part (K00134,K00150) K00927. Is this your intention or a typo? If intentional, you'll need to rework the definition of ortholog_group, something that will do a delimited list of space-delimited groups in addition to the grouping at parens. The closest I could get was this:
[[[[['K00134', 'K00150']], 'K00927'], ['K11389']], [['K00234', 'K00235']]]
which required some shenanigans to group on spaces, but not group bare orthologs when grouped with other groups. Here is what it looked like:
ortholog_group <<= pp.Group(LPAR + pp.delimitedList(pp.Group(ortholog_group*(1,) & ortholog*(0,))) + RPAR) | pp.delimitedList(ortholog)
The & operator in combination with the repetition operators gives the space-delimited grouping (*(1,) is equivalent to OneOrMore and *(0,) to ZeroOrMore, and the same notation supports *(10,) for "10 or more", or *(3,5) for "at least 3 and no more than 5"). This too is not exactly what you asked for, but it may get you closer if you really do need to group the space-delimited bits.
But I must say that grouping on spaces is ambiguous - or at least confusing. Should "(A,B) C D" be [[A,B],C,D] or [[A,B],C],[D] or [[A,B],[C,D]]? I think, if possible, you should permit comma-delimited lists, and perhaps space-delimited also, but require the ()'s when items should be grouped.
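The convention suggested above (group only at parentheses, and treat commas and spaces alike as separators) is simple enough to sketch without pyparsing. The following is a hand-written recursive-descent parser using only the standard library, not the pyparsing grammar itself; the function names are hypothetical. Like the first grammar above, it does not record whether two items were separated by a comma or by a space.

```python
import re

# One token is an ortholog ID (K plus five digits) or one of "(", ")", ",".
TOKEN = re.compile(r"\s*(K\d{5}|[(),])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_items(tokens, pos, closing):
    """Collect items until `closing` (")" inside a group, None at top level)."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "(":
            group, pos = parse_items(tokens, pos + 1, ")")
            items.append(group)  # parentheses are the only source of nesting
        elif tok == ")":
            if closing != ")":
                raise ValueError("unbalanced ')'")
            return items, pos + 1
        elif tok == ",":
            pos += 1  # commas only separate items
        else:
            items.append(tok)
            pos += 1
    if closing is not None:
        raise ValueError("missing ')'")
    return items, pos


def parse(text):
    items, _ = parse_items(tokenize(text), 0, None)
    return items


print(parse("((K00134,K00150) K00927,K11389) (K00234,K00235)"))
```

Because nesting comes only from parentheses, an input like "(A,B) C D" has exactly one reading under this scheme, which is the point of requiring explicit grouping.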

XQuery Type of value does not match

declare variable $fb := doc("factbook.xml")/mondial;
for $c in $fb//country
where ($c/encompassed/@continent = 'f0_119') and ($c/@population < 100000)
return concat('Country: ',$c/name, ', Population: ',$c/@population);
It returns:
Type Error: Type of value '
()
' does not match sequence type: xs:anyAtomicType?
At characters 11681-11698
At File "q2_3.xq", line 4, characters 13-67
At File "q2_3.xq", line 4, characters 13-67
At File "q2_3.xq", line 4, characters 13-67
However, if I do not use concat in the return and return just the name or the population, it works. The strangest thing is that I have another program:
declare variable $fb := doc("factbook.xml")/mondial;
for $c in $fb//country
where $c/religions = 'Seventh-Day Adventist'
order by $c/name
return concat('Country: ',$c/name, ', Population: ',$c/@population);
The return syntax is exactly the same, yet this one works.
Why does this happen?
Without seeing an example of your data it's impossible to say for sure, but each argument of concat must be at most a single atomic value (xs:anyAtomicType?), so if $c/name returns more than one value, your error would make sense. Do you have any results where there is more than one name element?
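One way to check the data for that condition is to count the name elements per country. This sketch uses Python's standard library rather than XQuery, and the XML sample is hypothetical, since the real factbook.xml is not shown; only the element and attribute names are taken from the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second country has two <name> children.
SAMPLE = """
<mondial>
  <country population="50000"><name>Aland</name></country>
  <country population="70000"><name>Borduria</name><name>Bordurie</name></country>
</mondial>
"""


def countries_with_multiple_names(root):
    """Return the list of names for every country that has more than one."""
    offenders = []
    for country in root.iter("country"):
        names = [n.text for n in country.findall("name")]
        if len(names) > 1:
            offenders.append(names)
    return offenders


root = ET.fromstring(SAMPLE)
print(countries_with_multiple_names(root))
```

If the real file yields any such countries among those selected by the where clause, concat would be handed a sequence of names for them, which is the situation the answer describes.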
