Sort dictionary based on len(value) where value is a set

I know there are many solutions (How do I sort a dictionary by value?) for sorting a dictionary by its values. However, most of those predate Python 3.7, which guarantees that dictionaries preserve insertion order.
I am also aware of Fastest way to sort a python 3.7+ dictionary, which seems close to the answer I need.
I have a large dictionary whose keys are ints and whose values are sets of strings.
I want to create a new dictionary that is sorted by the length of each value's set, in descending order.
Dictionary:
dict1={
'12':{'sym1', 'sym2'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'},
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'}
}
New sorted dictionary:
>>> sorted_dict1
{
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'12':{'sym1', 'sym2'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'}
}

I think this is a very slow way to do it, but here goes:
First, create a dictionary that has the same keys, where each value is the length of the corresponding set in dict1.
from operator import itemgetter
dict1={
'12':{'sym1', 'sym2'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'},
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'}
}
dict1_len = {}
for k, v in dict1.items():
    dict1_len[k] = len(v)
Next, sort dict1_len by its values in descending order.
sorted_dict1_len = {k: v for k,v in sorted(dict1_len.items(), key=itemgetter(1), reverse=True)}
Finally, using the keys in the order given by sorted_dict1_len, add each key and its value from the original dict1 to sorted_dict1.
sorted_dict1 = {}
for k in sorted_dict1_len.keys():
    print(k)
    sorted_dict1[k] = dict1.get(k)
Edit: improved answer
from operator import itemgetter
dict1={
'12':{'sym1', 'sym2'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'},
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'}
}
def len_val(tup):
    return len(tup[1])  # length of the value, i.e. number of elements in the set
dict2 = sorted(dict1.items(), key=len_val, reverse=True)
print(dict2)
This prints a list of (key, value) tuples:
[('18', {'sym88', 'sym5', 'sym89', 'sym3', 'sym34'}), ('13', {'sym1', 'sym5', 'sym6', 'sym4'}), ('12', {'sym1', 'sym2'}), ('14', {'sym3', 'sym1'}), ('15', {'sym2'}), ('16', {'sym2'}), ('17', {'sym2'})]
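Since the goal was a new dictionary rather than a list, one way to finish the job is to pass the sorted items straight to dict(). This is only a minimal sketch and assumes Python 3.7+, where dictionaries keep insertion order. The lambda is an inline stand-in for len_val above.

```python
dict1 = {
    '12': {'sym1', 'sym2'},
    '13': {'sym1', 'sym4', 'sym5', 'sym6'},
    '14': {'sym1', 'sym3'},
    '15': {'sym2'},
    '16': {'sym2'},
    '17': {'sym2'},
    '18': {'sym3', 'sym89', 'sym34', 'sym5', 'sym88'},
}

# Sort the (key, value) pairs by set size, largest first, then rebuild a dict.
# Relies on Python 3.7+ dicts preserving insertion order.
sorted_dict1 = dict(sorted(dict1.items(), key=lambda kv: len(kv[1]), reverse=True))

print(list(sorted_dict1))  # ['18', '13', '12', '14', '15', '16', '17']
```

Because sorted() is stable, keys whose sets have the same length keep their original relative order.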

Related

How to change each key in a dictionary

Is there any way I can change the keys in a dictionary at once?
For example, mydict={0:0.0, 1:1.1, 2:2.2}.
How can I get newdict={1:0.0, 2:1.1, 0:2.2}?
In Python 3.7+, where dictionaries preserve insertion order and you can therefore control the order of the keys, you could use a pandas Series to assign new keys to the existing dictionary values.
import pandas as pd
mydict={0:0.0, 1:1.1, 2:2.2}
new_keys = [1, 2, 0]
# Make dictionary with same values assigned to new keys.
newdict = pd.Series(list(mydict.values()), index=new_keys).to_dict()
newdict
# {1: 0.0, 2: 1.1, 0: 2.2}
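If pandas is not otherwise needed, the same re-keying can probably be done with built-ins alone. The sketch below again assumes Python 3.7+, so that mydict.values() comes back in insertion order. new_keys is the same list as above.

```python
mydict = {0: 0.0, 1: 1.1, 2: 2.2}
new_keys = [1, 2, 0]

# Pair each new key with the existing values, taken in insertion order.
newdict = dict(zip(new_keys, mydict.values()))

print(newdict)  # {1: 0.0, 2: 1.1, 0: 2.2}
```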

What does "Some" mean in a collect result in Scala?

"Some" is not a distinctive search term, so Googling for it returns nothing useful.
My question comes from the example below:
b.collect:
Array[(Int, String)] = Array((3,dog), (6,salmon), (3,rat), (8,elephant))
d.collect:
Array[(Int, String)] = Array((3,dog), (3,cat), (6,salmon), (6,rabbit), (4,wolf), (7,penguin))
If I do a join and then collect the result, e.g. b.join(d).collect, I get the following:
Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)))
This seems understandable. However, if I run b.leftOuterJoin(d).collect, I get:
Array[(Int, (String, Option[String]))] = Array((6,(salmon,Some(salmon))), (6,(salmon,Some(rabbit))), (3,(dog,Some(dog))), (3,(dog,Some(cat))), (3,(rat,Some(dog))), (3,(rat,Some(cat))), (8,(elephant,None)))
My question is: why are the two results expressed differently? That is, why does the second result contain "Some"? What is the difference between a value with "Some" and one without? Can "Some" be removed? Does "Some" affect any later operations on the contents of the RDD?
Thank you very much.
When you do a normal join with b.join(d).collect, you get Array[(Int, (String, String))].
This is because a normal join only returns keys that are present in both RDD b and RDD d, so every key is guaranteed to have a value on each side.
But when you use b.leftOuterJoin(d).collect, the return type is Array[(Int, (String, Option[String]))]. This is to handle missing values: in a leftOuterJoin, there is no guarantee that every key of RDD b is also present in RDD d, so the right-hand value is returned as an Option[String], which takes one of two forms:
Some(String): if the key is present in both RDDs
None: if the key is present in b but not in d
You can unwrap Some by extracting its value and supplying a default for the None case, as below.
val z = b.leftOuterJoin(d).map(x => (x._1, (x._2._1, x._2._2.getOrElse("")))).collect
Now you should get Array[(Int, (String, String))], with output such as
Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)), (8,(elephant,)))
You can replace "" with any other default string you require.
Hope this helps.
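To see where None comes from, here is a rough plain-Python illustration of the semantics of the two joins. It is not Spark's implementation or API: the helper names index_by_key, inner_join and left_outer_join are made up for this sketch, Python's None stands in for Scala's None, matched values appear bare because Python has no Option wrapper, and the result order differs from what Spark returns.

```python
from collections import defaultdict

b = [(3, "dog"), (6, "salmon"), (3, "rat"), (8, "elephant")]
d = [(3, "dog"), (3, "cat"), (6, "salmon"), (6, "rabbit"), (4, "wolf"), (7, "penguin")]

def index_by_key(pairs):
    """Group values by key: {key: [values...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def inner_join(left, right):
    """Keep only keys present on both sides, so both values always exist."""
    right_index = index_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_index.get(k, [])]

def left_outer_join(left, right):
    """Keep every left key; a missing right value becomes None."""
    right_index = index_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_index.get(k, [None])]

joined = left_outer_join(b, d)

# Rough equivalent of getOrElse(""): replace a missing right value with a default.
filled = [(k, (lv, rv if rv is not None else "")) for k, (lv, rv) in joined]

print(joined[-1])  # (8, ('elephant', None))
print(filled[-1])  # (8, ('elephant', ''))
```

The key 8 only exists in b, which is why it is dropped by the inner join but kept, with an empty right-hand side, by the left outer join.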

How do you incorporate a variable in a NSLayoutConstraint string?

Is there a correct way to use a variable within a constraint string as demoed below?
let x = 6
self.addConstraints(
NSLayoutConstraint.constraintsWithVisualFormat(
"H:|-x-[subView(==16)]|",
options:[], metrics:nil,
views:viewDictionary))
self.addConstraints(
NSLayoutConstraint.constraintsWithVisualFormat(
"V:|-x-[subView(==16)]|",
options:[], metrics:nil,
views:viewDictionary))
That's what the metrics dictionary is for. Pass a dictionary like [ "x": x ] as the metrics argument instead of nil.

IndexError: list index out of range, scores.append( (fields[0], fields[1]))

I'm trying to read a file and put its contents in a list. I have done this many times before and it has worked, but this time it throws the error "list index out of range".
The code is:
with open("File.txt") as f:
    scores = []
    for line in f:
        fields = line.split()
        scores.append( (fields[0], fields[1]))
print(scores)
The text file is in the format:
Alpha:[0, 1]
Bravo:[0, 0]
Charlie:[60, 8, 901]
Foxtrot:[0]
I can't see why it is giving me this problem. Is it because I have more than one value for each item? Or is it the fact that I have a colon in my text file?
How can I get around this problem?
Thanks
If I understand you correctly, this code will print your desired result:
import re
with open("File.txt") as f:
    # Build a dictionary of scores: {name: scores}.
    scores = {}
    # Regular expressions to parse the team name and the team scores from a line.
    patternScore = r'\[([^\]]+)\]'
    patternName = r'(.*):'
    for line in f:
        # Find the team name and its scores.
        fields = re.search(patternScore, line).groups()[0].split(', ')
        name = re.search(patternName, line).groups()[0]
        # Update the dictionary with the new entry.
        scores[name] = fields
# Print the first score of each entry, followed by its name.
for key in scores:
    print(scores[key][0] + ':' + key)
You will receive output like the following (the order of the lines depends on dictionary ordering, which is guaranteed to be insertion order only from Python 3.7 onward):
60:Charlie
0:Alpha
0:Bravo
0:Foxtrot
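As for why the original code failed: line.split() splits on whitespace, so a line such as Foxtrot:[0] yields a single field and fields[1] does not exist. A minimal sketch of an alternative, assuming every non-blank line has the form name:[numbers], is to split on the colon and parse the bracketed part with ast.literal_eval. The lines list below is a stand-in for File.txt so that the example is self-contained.

```python
import ast

# Stand-in for the contents of File.txt (assumption: one "name:[...]" per line).
lines = ["Alpha:[0, 1]", "Bravo:[0, 0]", "Charlie:[60, 8, 901]", "Foxtrot:[0]"]

# Whitespace splitting gives a different number of fields per line,
# which is why fields[1] raised IndexError for "Foxtrot:[0]".
print("Foxtrot:[0]".split())  # ['Foxtrot:[0]']

scores = []
for line in lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # Split once on the first colon: name on the left, list text on the right.
    name, _, raw_values = line.partition(":")
    # literal_eval safely turns the text "[60, 8, 901]" into a list of ints.
    scores.append((name, ast.literal_eval(raw_values)))

print(scores)
```

This keeps every score for each name as a list of integers rather than only the first two whitespace-separated fields.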

Julia dictionary "key not found" only when using loop

Still trying to figure out this problem (I was having problems building a dictionary, but managed to get that working thanks to rickhg12hs).
Here's my current code:
#open files with codon:amino acid pairs, initiate dictionary:
file = open(readall, "rna_codons.txt")
seq = open(readall, "rosalind_prot.txt")
codons = {"UAA" => "stop", "UGA" => "stop", "UAG" => "stop"}
#generate dictionary entries using pairs from file:
for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])\s", file)
    codon, aa = m.captures
    codons[codon] = aa
end
All of that code seems to work as intended. At this point, I have the dictionary I want, and the right keys point to the right entries. If I just do print(codons["AUG"]) for example, it prints 'M', which is the correct output. Now I want to scan through a string in the second file, and for every 3 letters, pull out the entry referenced in the dictionary and add it to the prot string. So I tried:
for m in eachmatch(r"([AUGC]{3,3})", seq)
    amac = codons[m.captures]
    prot = "$prot$amac"
end
But this kicks out the error key not found: ["AUG"]. I know the key exists, because I can print codons["AUG"] and it returns the proper entry, so why can't it find that key when it's in the loop?
