I know there are many solutions (How do I sort a dictionary by value?) to sorting a dictionary by values. However, most of those predate Python 3.7's guarantee that dictionaries preserve insertion order.
I am also aware of Fastest way to sort a python 3.7+ dictionary, which seems close to the answer I need.
I have a large dictionary whose keys are ints and whose values are sets of strings.
I want to create a new dictionary that is sorted by the size of each value's set, largest first.
Dictionary:
dict1={
'12':{'sym1', 'sym2'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'},
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'}
}
New sorted dictionary:
>>> sorted_dict1
{
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'12':{'sym1', 'sym2'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'}
}
I think this is a very slow way to do this, but here goes:
Create a dictionary with the same keys, where each value is the length of that key's set.
from operator import itemgetter
dict1={
'12':{'sym1', 'sym2'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'},
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'}
}
dict1_len = {}
for k, v in dict1.items():
    dict1_len[k] = len(v)  # number of elements in the set
Sort dict1_len by the value numbers in reverse (descending).
sorted_dict1_len = {k: v for k,v in sorted(dict1_len.items(), key=itemgetter(1), reverse=True)}
Using the keys in the order given by sorted_dict1_len, add each key and its value from the original dict1 to sorted_dict1.
sorted_dict1 = {}
for k in sorted_dict1_len:
    print(k)
    sorted_dict1[k] = dict1[k]
Edit: improved answer
from operator import itemgetter
dict1={
'12':{'sym1', 'sym2'},
'13':{'sym1', 'sym4', 'sym5', 'sym6'},
'14':{'sym1', 'sym3'},
'15':{'sym2'},
'16':{'sym2'},
'17':{'sym2'},
'18':{'sym3', 'sym89', 'sym34', 'sym5', 'sym88'}
}
def len_val(tup):
    return len(tup[1])  # length of the value, i.e. the number of elements in the set
dict2 = sorted(dict1.items(), key=len_val, reverse=True)
print(dict2)
This prints a list of (key, value) tuples rather than a dictionary:
[('18', {'sym88', 'sym5', 'sym89', 'sym3', 'sym34'}), ('13', {'sym1', 'sym5', 'sym6', 'sym4'}), ('12', {'sym1', 'sym2'}), ('14', {'sym3', 'sym1'}), ('15', {'sym2'}), ('16', {'sym2'}), ('17', {'sym2'})]
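Because dictionaries keep insertion order from Python 3.7 on, one way to end up with a dictionary instead of a list is to pass the sorted items straight to dict(). A minimal sketch, using a trimmed copy of dict1 and a lambda in place of len_val (both chosen only for illustration):

```python
dict1 = {
    '12': {'sym1', 'sym2'},
    '13': {'sym1', 'sym4', 'sym5', 'sym6'},
    '14': {'sym1', 'sym3'},
    '15': {'sym2'},
    '18': {'sym3', 'sym89', 'sym34', 'sym5', 'sym88'},
}

# Sort the (key, value) pairs by set size, largest first, then rebuild a dict.
# sorted() is stable, so keys with equally sized sets keep their original order.
sorted_dict1 = dict(sorted(dict1.items(), key=lambda kv: len(kv[1]), reverse=True))

print(list(sorted_dict1))  # ['18', '13', '12', '14', '15']
```

This sorts once and avoids the intermediate dictionary of lengths.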
Still trying to figure out this problem (I was having problems building a dictionary, but managed to get that working thanks to rickhg12hs).
Here's my current code:
# open files with codon:amino acid pairs, initialize dictionary:
file = open(readall, "rna_codons.txt")
seq = open(readall, "rosalind_prot.txt")
codons = {"UAA" => "stop", "UGA" => "stop", "UAG" => "stop"}
#generate dictionary entries using pairs from file:
for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])\s", file)
    codon, aa = m.captures
    codons[codon] = aa
end
All of that code seems to work as intended. At this point, I have the dictionary I want, and the right keys point to the right entries. If I just do print(codons["AUG"]) for example, it prints 'M', which is the correct output. Now I want to scan through a string in the second file, and for every 3 letters, pull out the entry referenced in the dictionary and add it to the prot string. So I tried:
for m in eachmatch(r"([AUGC]{3,3})", seq)
    amac = codons[m.captures]
    prot = "$prot$amac"
end
But this kicks out the error key not found: ["AUG"]. I know the key exists, because I can print codons["AUG"] and it returns the proper entry, so why can't it find that key when it's in the loop?
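The error text shows the key being looked up as ["AUG"], a one-element collection, rather than the string "AUG". That suggests m.captures holds all capture groups of the match, so the first capture would have to be picked out before the lookup. The same situation, sketched in Python for comparison (the tiny codon table and sequence are made up for illustration, and Python's m.groups() plays the role assumed here for m.captures):

```python
import re

# Hypothetical miniature codon table; the original is built from rna_codons.txt.
codons = {"AUG": "M", "GCC": "A", "UUU": "F"}
seq = "AUGGCCUUU"

prot = ""
for m in re.finditer(r"([AUGC]{3})", seq):
    # m.groups() is the whole collection of captures, e.g. ('AUG',),
    # which is not a key of the table; m.group(1) is the string 'AUG'.
    assert m.groups() not in codons
    prot += codons[m.group(1)]

print(prot)  # MAF
```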
I am trying to use Pyparsing to identify a keyword that does not begin with $. So for the following input:
$abc = 5 # is not a valid one
abc123 = 10 # is valid one
abc$ = 23 # is a valid one
I tried the following
var = Word(printables, excludeChars='$')
var.parseString('$abc')
But this doesn't allow any $ in var. How can I specify all printable characters other than $ in the first character position? Any help will be appreciated.
Thanks
Abhijit
You can use the method I used to define "all characters except X" before I added the excludeChars parameter to the Word class:
NOT_DOLLAR_SIGN = ''.join(c for c in printables if c != '$')
keyword_not_starting_with_dollar = Word(NOT_DOLLAR_SIGN, printables)
This should be a bit more efficient than building up with a Combine and a NotAny. But this will match almost anything (integers, words, valid identifiers, invalid identifiers), so I'm skeptical of the value of this kind of expression in your parser.
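The "one character set for the first position, another for the rest" idea can also be tried out without pyparsing. The sketch below uses only the standard library re module, so it is an analogue of Word(NOT_DOLLAR_SIGN, printables), not pyparsing itself. leading_word is a hypothetical helper name, and printables is rebuilt here on the assumption that it means all non-whitespace printable ASCII characters.

```python
import re
import string

# Assumption: mirrors pyparsing's `printables` (printable, non-whitespace ASCII).
printables = "".join(c for c in string.printable if not c.isspace())
not_dollar = printables.replace("$", "")

# First character: any printable except '$'; remaining characters: any printable.
pattern = re.compile(
    "[" + re.escape(not_dollar) + "][" + re.escape(printables) + "]*"
)

def leading_word(text):
    """Return the word at the start of `text`, or None if it starts with '$'."""
    m = pattern.match(text)
    return m.group(0) if m else None

print(leading_word("$abc = 5"))     # None
print(leading_word("abc123 = 10"))  # abc123
print(leading_word("abc$ = 23"))    # abc$
```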
I want to return all users that I follow who are not members of any groups that I am in. If a followed user is a member of even one group that I am in, it should not be returned.
However, I am getting an error:
None.get
Neo.DatabaseError.Statement.ExecutionFailure
when I try this query:
MATCH (g1:groups)<-[:MEMBER_OF]-(u1:users{userid1:"56"})-[:FOLLOWS]->(u2:users)-[:MEMBER_OF]->(g2:groups)
WITH collect(g1.groupid) AS my_groups,u2,collect(g2.groupid) AS foll_groups
WHERE NOT any(t in foll_groups WHERE t IN extract(x IN my_groups))
RETURN u2
Here is one solution:
MATCH (g1:groups)<-[:MEMBER_OF]-(u1:users { userid1:"56" })-[:FOLLOWS]->(u2:users)-[:MEMBER_OF]->(g2:groups)
WITH u2, collect(g2) AS foll_groups, collect(g1) AS my_groups
WITH u2, reduce(dup = FALSE, g IN foll_groups | (dup OR g IN my_groups)) AS has_dup
WHERE NOT has_dup
RETURN u2;
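The filtering condition itself is a set-disjointness check: a followed user is kept only if none of their groups is also one of mine. A small Python sketch of that logic with made-up data (the user names and group ids are hypothetical, and nothing here talks to Neo4j):

```python
# Hypothetical data standing in for the graph: group ids per followed user.
my_groups = {"g1", "g2"}
followed = {
    "alice": {"g2", "g7"},  # shares g2 with me, so excluded
    "bob": {"g5"},          # no shared group, so kept
    "carol": {"g8", "g9"},  # no shared group, so kept
}

# Keep only the followed users whose groups do not overlap with mine.
result = [user for user, groups in followed.items() if my_groups.isdisjoint(groups)]

print(result)  # ['bob', 'carol']
```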
I'm working to grab two different elements in a string.
The strings look like this:
str <- c('a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy', 'zxy')
I have tried the different options in ?grep, but I can't get it right. I'm doing something like this:
grep('[_abc]:[_zxy]',str, value = TRUE)
and what I would like is:
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
Any help would be appreciated.
Use normal parentheses ( ) for grouping, not square brackets [ ], which define a character class:
grep('_(abc|zxy)',str, value = TRUE)
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
To make the grep a bit more flexible, you could do something like:
grep('_.{3}$',str, value = TRUE)
This matches an underscore _ followed by any character . repeated three times {3}, immediately followed by the end of the string $.
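The difference between grouping and a character class is easy to check interactively. The sketch below uses Python's re module instead of R's grep, on the assumption that these particular patterns behave the same way in both; the variable names are only for illustration.

```python
import re

strings = ['a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy', 'zxy']

# Parentheses group alternatives: an underscore followed by "abc" or "zxy".
alternation = [s for s in strings if re.search(r'_(abc|zxy)', s)]

# An underscore followed by exactly three characters at the end of the string.
suffix = [s for s in strings if re.search(r'_.{3}$', s)]

# Square brackets form a character class: any ONE of '_', 'a', 'b', 'c'.
char_class = [s for s in strings if re.search(r'[_abc]', s)]

print(alternation)  # ['a_abc', 'b_abc', 'z_zxy', 'x_zxy']
print(suffix)       # ['a_abc', 'b_abc', 'z_zxy', 'x_zxy']
print(char_class)   # ['a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy']
```

The character-class version also lets 'abc' through, because a single matching character anywhere in the string is enough.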
This should work: grep('_abc|_zxy', str, value = TRUE)
X|Y matches when either X matches or Y matches
In this case just doing:
str[grep("_",str)]
will work... is it more complicated in your specific case?