Return multiple nested dictionaries from Tcl

I have a Tcl proc that creates two dictionaries from a large file. It is something like this:
...
...
proc makeCircuitData {spiceNetlist} {
    # read the spiceNetlist file line by line
    # create a dict with multilevel nesting called elementMap that will have the following structure:
    #   elementMap key1 key2 value12
    #   elementMap keyA keyB valueAB
    # and so on
    # ... some other code here ...
    # create another dict with multilevel nesting called cktElementAttr that will have the following structure:
    #   cktElementAttr resistor leftVoltageNode1 rightVoltageNode1 resValue11
    #   cktElementAttr resistor leftVoltageNode2 rightVoltageNode2 resValue12
    #   cktElementAttr inductor leftVoltageNode2 rightVoltageNode2 indValue11
    #   cktElementAttr inductor leftVoltageNode2 rightVoltageNode2 indValue12
    #   cktElementAttr capacitor leftVoltageNode2 rightVoltageNode2 capValue11
    #   ... and so on ...
}
I want to return these two nested dictionaries, cktElementAttr and elementMap, from procedures like the one above, since they are used by other parts of my program.
What is the recommended way to return two dictionaries from a Tcl proc?
Thanks.

This should work:
return [list $cktElementAttr $elementMap]
Then, at the caller, you can assign the return value to a list:
set theDictionaries [makeCircuitData ...]
or assign them to different variables:
lassign [makeCircuitData ...] cEltAttr elmMap
In Tcl 8.4 or older (which are obsolete!), you can (ab)use foreach to do the job of lassign:
foreach {cEltAttr elmMap} [makeCircuitData ...] break
Documentation: break, foreach, lassign, list, return, set

Related

snakemake error: 'Wildcards' object has no attribute 'batch'

I don't understand how to redefine my snakemake rule to fix the Wildcards issue below.
Ignore the logic of the batches; it makes sense internally in the Python script. In theory, I want the rule to be run once for each batch 1-20. I use the BATCHES list for {batch} in the output, and in the shell command I use {wildcards.batch}:
OUTDIR="my_dir/"
nBATCHES = 20
BATCHES = list(range(1,21)) # [1,2,3 ..20] list

[...]

rule step5:
    input:
        ids = expand('{IDLIST}', IDLIST=IDLIST)
    output:
        type1 = expand('{OUTDIR}/resources/{batch}_output_type1.csv.gz', OUTDIR=OUTDIR, batch=BATCHES),
        type2 = expand('{OUTDIR}/resources/{batch}_output_type2.csv.gz', OUTDIR=OUTDIR, batch=BATCHES),
        type3 = expand('{OUTDIR}/resources/{batch}_output_type3.csv.gz', OUTDIR=OUTDIR, batch=BATCHES)
    shell:
        "./some_script.py --outdir {OUTDIR} --idlist {input.ids} --total_batches {nBATCHES} --current_batch {wildcards.batch}"
Error:
RuleException in rule step5 in line 241 of Snakefile:
AttributeError: 'Wildcards' object has no attribute 'batch', when formatting the following:
./somescript.py --outdir {OUTDIR} --idlist {input.idlist} --totalbatches {nBATCHES} --current_batch {wildcards.batch}
Executing the script manually for a single batch looks like this (and works); total_batches is a constant and current_batch is supposed to iterate:
./somescript.py --outdir my_dir/ --idlist ids.csv --total_batches 20 --current_batch 1
You seem to want to run the rule step5 once for each batch in BATCHES. So you need to structure your Snakefile to do exactly that.
In the following Snakefile running the rule all runs your rule step5 for all combinations of OUTDIR and BATCHES:
OUTDIR = "my_dir"
nBATCHES = 20
BATCHES = list(range(1, 21))  # [1,2,3 ..20] list
IDLIST = ["a", "b"]  # dummy data, I don't have the original


rule all:
    input:
        type1=expand(
            "{OUTDIR}/resources/{batch}_output_type1.csv.gz",
            OUTDIR=OUTDIR,
            batch=BATCHES,
        ),


rule step5:
    input:
        ids=expand("{IDLIST}", IDLIST=IDLIST),
    output:
        type1="{OUTDIR}/resources/{batch}_output_type1.csv.gz",
        type2="{OUTDIR}/resources/{batch}_output_type2.csv.gz",
        type3="{OUTDIR}/resources/{batch}_output_type3.csv.gz",
    shell:
        "./some_script.py --outdir {OUTDIR} --idlist {input.ids} --total_batches {nBATCHES} --current_batch {wildcards.batch}"
In your earlier version, {batch} was just an expand placeholder, not a wildcard, so the rule was only called once.
Instead of the rule all, this could be a subsequent rule that uses one or more of the outputs generated by step5, for example the sketch below.
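For instance, a hypothetical downstream rule could consume the per-batch type1 files; the rule name step6, the merged filename, and the cat merge command are made up for illustration (concatenated gzip members still form a valid gzip stream), not part of the original answer:
rule step6:
    input:
        type1=expand(
            "{OUTDIR}/resources/{batch}_output_type1.csv.gz",
            OUTDIR=OUTDIR,
            batch=BATCHES,
        ),
    output:
        merged=f"{OUTDIR}/resources/merged_output_type1.csv.gz",
    shell:
        "cat {input.type1} > {output.merged}"
Requesting output.merged (for example from rule all) would then drive step5 once per batch, just as the expand in rule all does above.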

Dynamically generate multiple tasks based on output dictionary from task in Airflow

I have a task whose output is a dictionary with a list as the value of each key:
@task(task_id="gen_dict")
def generate_dict():
    ...
    return output_dict  # output looks like this: {"A": ["aa", "bb", "cc"], "B": ["dd", "ee", "ff"]}

# my DAG (not showing the part that creates the DAG and its properties)
start = DummyOperator(task_id="st")
end = DummyOperator(task_id="ed")

output = generate_dict()

for keys, values in output.items():
    for v in values:
        dm = DummyOperator(task_id=f"dm_{keys}_{v}")
        dm >> end

start >> output
For the sample output above, this should create 6 dummy tasks: dm_A_aa, dm_A_bb, dm_A_cc, dm_B_dd, dm_B_ee, dm_B_ff.
But right now I'm facing this import error:
AttributeError: 'XComArg' object has no attribute 'items'
Is it possible to do what I aim to do? If not, is it possible to do it using a list like ["aa", "bb", "cc", "dd", "ee", "ff"] instead?
The code in the question won't work as-is because the loop shown would run when the DAG is parsed (which happens when the scheduler starts up and periodically thereafter), but the data it would loop over isn't known until the task that generates it actually runs.
There are ways to do something similar, though.
AIP-42 added the ability to map list data into task kwargs in Airflow 2.3:
@task
def generate_lists():
    # presumably the data below would come from a query executed at runtime
    return [["aa", "bb", "cc"], ["dd", "ee", "ff"]]


@task
def use_list(the_list):
    for item in the_list:
        print(item)


with DAG(...) as dag:
    use_list.expand(the_list=generate_lists())
The code above will create two tasks with output:
aa
bb
cc
dd
ee
ff
In 2.4 the expand_kwargs function was added. It's an alternative to expand (shown above) which operates on dicts instead.
It takes an XComArg referencing a list of dicts whose keys are the names of the arguments that you're mapping the data into. So the following code...
@task
def generate_dicts():
    # presumably the data below would come from a query made at runtime
    return [{"foo": 6, "bar": 7}, {"foo": 8, "bar": 9}]


@task
def two_things(foo, bar):
    print(foo, bar)


with DAG(...) as dag:
    two_things.expand_kwargs(generate_dicts())
... gives two tasks with output:
6 7
...and...
8 9
expand only lets you create tasks from the Cartesian product of the input lists, while expand_kwargs lets you associate the data with specific kwargs at runtime.
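To get something close to what the question asks for (one task run per (key, value) pair of the dictionary), a sketch along these lines might work on Airflow 2.4+. The DAG name mapped_pairs and the task names gen_pairs/handle_pair are made up for illustration, and note that the mapped instances share a single task_id with map indexes rather than getting individual ids like dm_A_aa:
import pendulum
from airflow.decorators import dag, task


@dag(start_date=pendulum.datetime(2023, 1, 1), schedule=None, catchup=False)
def mapped_pairs():

    @task
    def gen_pairs():
        output_dict = {"A": ["aa", "bb", "cc"], "B": ["dd", "ee", "ff"]}
        # flatten {"A": ["aa", ...], ...} into [{"key": "A", "value": "aa"}, ...]
        return [{"key": k, "value": v} for k, vs in output_dict.items() for v in vs]

    @task
    def handle_pair(key, value):
        print(f"dm_{key}_{value}")  # one mapped task instance per (key, value) pair

    handle_pair.expand_kwargs(gen_pairs())


mapped_pairs()
This gives six mapped instances of handle_pair, one per pair, with the flattening done at runtime inside gen_pairs rather than at parse time.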

TCL multi level dictionary - how to update same key by appending to it

I have a dict called test, and I wish to iteratively update the same key by appending new items to it, so that eventually it looks like this:
AIM
test {
1 {flps {o1 o2 o3 ...}}
}
What is the standard way to do this? I have the code below right now:
set test [dict create]
dict set test 1 "flps" "o1"            ;# first value o1 added
set new "[dict get $test 1 regs] o2"   ;# temp variable that appends the new value to the old one
dict set test 1 regs $new              ;# does dict set overwrite?
There is no built-in command for this purpose; one approach is to use dict update:
% set test [dict create]
% dict update test 1 1 { dict lappend 1 flps "o1" }
flps o1
% set test
1 {flps o1}
% dict update test 1 1 { dict lappend 1 flps "o2" }
flps {o1 o2}
% set test
1 {flps {o1 o2}}
However, this easily becomes unhandy when there are more levels of nesting or even an unknown nesting depth.
Based on mrcalvin's link, I found this to work:
% set d {key1 {key2 value1}}
% dictlappendsub d key1 key2 value2
key1 {key2 {value1 value2}}
% dictlappendsub d key1 key3 value3
key1 {key2 {value1 value2} key3 value3}
This might be achieved by:
# dictlappendsub dict key1 ... keyn value
proc dictlappendsub {dictName args} {
    upvar 1 $dictName d
    set keys [lrange $args 0 end-1]
    if {[dict exists $d {*}$keys]} {    ;# What does this do!? WOW
        dict set d {*}$keys [linsert [dict get $d {*}$keys] end [lindex $args end]]
    } else {
        dict set d {*}$keys [lrange $args end end]
    }
}
There is no standard command for doing this because the number of possible commands for all the different things that might be done gets too large; only common cases are provided for you (with the exact choice depending in part on what is technically unambiguous given that Tcl's dictionaries allow arbitrary values as keys by design). Instead, dict update and dict with are provided as tools to allow people to make their own solutions.
In particular, by making the assumption that we are appending exactly one list item and with dict update, upvar and some recursion, we can make this:
proc dict_lappend_item {dictVar args} {
    upvar 1 $dictVar d
    if {[llength $args] == 1} {
        lappend d [lindex $args 0]
    } else {
        set args [lassign $args key]
        dict update d $key inner {
            dict_lappend_item inner {*}$args
        }
    }
    return $d
}
Here's using it:
% set test [dict create]
% dict set test 1 "flps" "o1"
1 {flps o1}
% dict_lappend_item test 1 flps o2
1 {flps {o1 o2}}
% dict_lappend_item test 1 flps o3
1 {flps {o1 o2 o3}}
That looks like it is doing the thing you need. (Note that dict lappend doesn't make that assumption; it instead handles the append-multiple-items case, which has some true performance nasties if not treated specially in C code.)
You can plug the above procedure into dict.
set map [namespace ensemble configure dict -map]
dict set map additem ::dict_lappend_item
namespace ensemble configure dict -map $map
and then you can do:
% dict additem test 1 flps o4
1 {flps {o1 o2 o3 o4}}
Be careful if you do this. Overriding standard subcommand implementations is possible, but it's a recipe for code that doesn't work unless you're very careful about following the existing API.

Will a nested loop help in parsing results

I am trying to pull information from two different dictionaries. (Excuse me; I am literally hacking my way through this to understand it.)
I have a for loop that gives me the vmName. I have another for loop that gives me the other information, like the replication ID.
I could be making a huge assumption here, but hey, I'll start there. What I want to do is interleave for loop 1 and for loop 2 so the results come out like this. Is it even possible?
Initial output of for loop 1, which I can get:
vma
vmb
vmc
Initial output of for loop 2, which I can get:
replication job 1
replication job 2
replication job 3
Desired result is:
vma
replication job 1
vmb
replication job 2
vmc
replication job 3
def get_replication_job_status():
    sms = boto3.client('sms')
    resp = sms.get_replication_jobs()
    #print(resp)
    things = [(cl['replicationJobId'], cl['serverId']) for cl in resp['replicationJobList']]
    thangs = [cl['vmServer'] for cl in resp['replicationJobList']]
    for i in thangs:
        print()
        print("this is vm " + (i['vmName']))
        print("this is the vm location " + (i['vmPath']))
        print("this is the vm address, " + (str(i['vmServerAddress'])))
        for j in things:
            print("The Replication ID is : " + (str(j[0])))
again I want:
vma
replication job 1
vmb
replication job 2
vmc
replication job 3
I am getting:
vma
replication job 1
replication job 2
replication job 3
vmb
replication job 1
replication job 2
replication job 3
..
..
..
If you are sure that both of your lists have the same length, then what you need is Python's built-in zip function:
for thing, thang in zip(things, thangs):
    print()
    print(thing)
    print(thang)
But if one of the lists is longer than the other, zip will crop both lists to the length of the shorter one, for example:
>>> for i, j in zip(range(3), range(5)):
...     print(i, j)
...
(0, 0)
(1, 1)
(2, 2)
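As an aside (not part of the original answer), if you need to keep the leftover items rather than cropping, itertools.zip_longest pads the shorter iterable with a fill value; a minimal sketch:
from itertools import zip_longest

# pads the shorter iterable with fillvalue instead of stopping at the shortest
for i, j in zip_longest(range(3), range(5), fillvalue=None):
    print(i, j)
# 0 0
# 1 1
# 2 2
# None 3
# None 4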
UPD:
You can also unpack your tuples right in the for loop definition, so each item in the things list (they are 2-tuples) gets saved to two variables:
for (replicationJobId, serverId), thang in zip(things, thangs):
    print()
    print(replicationJobId)
    print(serverId)
    print(thang)
UPD 2:
Why do you split resp into two lists at all?
def get_replication_job_status():
    sms = boto3.client('sms')
    resp = sms.get_replication_jobs()
    #print(resp)
    for replication_job in resp['replicationJobList']:
        vm_server = replication_job['vmServer']
        print()
        print("this is vm:", vm_server['vmName'])
        print("this is the vm location:", vm_server['vmPath'])
        print("this is the vm address:", vm_server['vmServerAddress'])
        print("The Replication ID is :", replication_job['replicationJobId'])

Doing a complex join on files

I have files (~1k) that look (basically) like this:
NAME1.txt
NAME ATTR VALUE
NAME1 x 1
NAME1 y 2
...
NAME2.txt
NAME ATTR VALUE
NAME2 x 19
NAME2 y 23
...
where the ATTR column is the same in every file and the NAME column is just some version of the filename. I would like to combine them into one file that looks like:
All_data.txt
ATTR NAME1_VALUE NAME2_VALUE NAME3_VALUE ...
X 1 19 ...
y 2 23 ...
...
Is there a simple way to do this with just command-line utilities, or will I have to resort to writing a script?
Thanks
You need to write a script.
gawk is the obvious candidate.
You could build up an associative array as you read each line, using FILENAME as the key and
ATTR " " VALUE
as the value.
Then create your output in an END block.
gawk can process all the txt files together if you pass *.txt as the filename argument.
It's a bit optimistic to expect there to be a ready-made command that does exactly what you want.
Very few commands join data horizontally.
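If you do end up writing a script, a rough Python alternative to the gawk approach could look like the sketch below; the NAME*.txt glob, the All_data.txt output name, whitespace-separated columns, and the NA placeholder for missing values are assumptions based on the samples above:
#!/usr/bin/env python3
"""Rough sketch: combine NAME*.txt files (columns NAME ATTR VALUE) into one table."""
import glob

table = {}   # ATTR -> {NAME: VALUE}
names = []   # output column order, in the order names are first seen

for path in sorted(glob.glob("NAME*.txt")):      # assumed input naming
    with open(path) as fh:
        fh.readline()                            # skip the "NAME ATTR VALUE" header
        for line in fh:
            name, attr, value = line.split()
            if name not in names:
                names.append(name)
            table.setdefault(attr, {})[name] = value

with open("All_data.txt", "w") as out:
    out.write("ATTR " + " ".join(f"{n}_VALUE" for n in names) + "\n")
    for attr, row in table.items():
        out.write(attr + " " + " ".join(row.get(n, "NA") for n in names) + "\n")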
