Pythonic shorthand for keys in a dictionary?

Simple question: Is there a shorthand for checking the existence of several keys in a dictionary?
'foo' in dct and 'bar' in dct and 'baz' in dct

all(x in dct for x in ('foo','bar','baz'))

You can use all() with a generator expression:
>>> all(x in dct for x in ('foo', 'bar', 'qux'))
False
>>> all(x in dct for x in ('foo', 'bar', 'baz'))
True
>>>
It saves you a whopping 2 characters (but will save you a lot more if you have a longer list to check).

{"foo","bar","baz"}.issubset(dct.keys())
For Python versions before 2.7, you'll have to replace the set literal with set(["foo","bar","baz"]).
If you like operators and don't mind the cost of creating another set, you can use the <= operator on the set and the dict's key set.
Both variations combined would look like:
set(["foo","bar","baz"]) <= set(dct)
Finally, if you use Python 3, dict.keys() returns a set-like object, which means that you can use the operator without a performance penalty, like this:
{"foo","bar","baz"} <= dct.keys()
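Putting the variants together, here is a quick sanity check of all three spellings against a sample dict (the contents of dct are assumed for illustration):

```python
dct = {"foo": 1, "bar": 2, "baz": 3}

# all() with a generator expression
print(all(k in dct for k in ("foo", "bar", "baz")))  # True

# set.issubset against the keys
print({"foo", "bar", "baz"}.issubset(dct.keys()))    # True

# Python 3: dict.keys() is set-like, so <= works directly
print({"foo", "bar", "baz"} <= dct.keys())           # True
print({"foo", "bar", "qux"} <= dct.keys())           # False
```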

iterating 2D array in Elixir

I am new to Elixir language and I am having some issues while writing a piece of code.
What I am given is a 2D array like
list1 = [
  [1, 2, 3, 4, "nil"],
  [6, 7, 8, 9, 10],
  [11, "nil", 13, "nil", 15],
  [16, 17, "nil", 19, 20]
]
Now, what I've to do is to get all the elements that have values between 10 and 20, so what I'm doing is:
final_list = []
Enum.each(list1, fn row ->
  Enum.each(row, &(if (&1 >= 10 and &1 <= 99) do final_list = final_list ++ &1 end))
end)
Doing this, I'm expecting that I'll get my list of numbers in final_list but I'm getting blank final list with a warning like:
warning: variable "final_list" is unused (there is a variable with the same name in the context, use the pin operator (^) to match on it or prefix this variable with underscore if it is not meant to be used)
iex:5
:ok
and upon printing final_list, it is not updated.
When I try to check whether my code is working properly or not, using IO.puts as:
iex(5)> Enum.each(list1, fn row ->
...(5)>   Enum.each(row, &(if (&1 >= 10 and &1 <= 99) do IO.puts(final_list ++ &1) end))
...(5)> end
...(5)> )
The Output is:
10
11
13
15
16
17
19
20
:ok
What could I possibly be doing wrong here? Shouldn't it add the elements to the final_list?
If this is wrong ( probably it is), what should be the possible solution to this?
Any kind of help will be appreciated.
As mentioned in Adam's comments, this is a FAQ, and the important thing is the message: warning: variable "final_list" is unused (there is a variable with the same name in the context, use the pin operator (^) to match on it or prefix this variable with underscore if it is not meant to be used). This message actually indicates a very serious problem.
It tells you that the assignment final_list = final_list ++ &1 is useless, since it just creates a local variable that shadows the outer one. Elixir variables are immutable, so you need to seriously reorganize your code.
The simplest way is
final_list =
  for sublist <- list1,
      n <- sublist,
      is_number(n),
      n in 10..20,
      do: n
Note that every time you write final_list = ..., you actually declare a new variable with the same name, so the final_list you declared inside your anonymous function is not the final_list outside the anonymous function.

Parsing nested SQL queries with parenthesized predicates using pyparsing

I'm trying to parse nested queries of the form that contains predicates with parentheses. Example:
query = '(A LIKE "%.something.com" AND B = 4) OR (C In ("a", "b") AND D Contains "asdf")'
I've tried many of the answers/examples I've seen but without getting them to work, and this is what I have come up with so far that almost(?) works:
from pyparsing import *
r = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'*+,-./:;<=>?@[\]^_`{|}~'
any_keyword = CaselessKeyword("AND") | CaselessKeyword("OR")
non_keyword = ~any_keyword + Word(r)
expr = infixNotation(originalTextFor(non_keyword[1, ...]),
    [
        (oneOf("AND OR", caseless=True, asKeyword=True), 2, opAssoc.LEFT)
    ])
Then, running expr.parseString(query).asList() returns only:
[['A LIKE "%.something.com"', 'AND', 'B = 4']]
without the rest of the query.
As far as I understand, this is due to the C In ("a", "b") part, since there are parentheses there.
Is there a way to "disregard" parentheses inside the predicates so that parsing returns the expected answer:
[[['A LIKE "%.something.com"', 'AND', 'B = 4'], 'OR', ['C In ("a", "b")', 'AND', 'D Contains "asdf"']]]
Welcome to pyparsing! You've made some good headway from following other examples, but let's back up just a bit.
infixNotation is definitely the right way to go here, but there are many more operators in this expression than just AND and OR. These are all sub-expressions in your input string:
A LIKE "%.something.com"
B = 4
C in ("a", "b")
D contains "asdf"
Each of these is its own binary expression, with operators "LIKE", "=", "in" and "contains". Also, your operands can be not only identifiers, but quoted strings, a collection of quoted strings, and numbers.
I like your intuition that this is a logical expression of terms, so let's define 2 levels of infixNotations:
1. an expression of column or numeric "arithmetic" (using '=', "LIKE", etc.)
2. an expression of terms defined by #1, combined with logical NOT, AND, and OR operators
If we call #1 a column_expr, #2 would look similar to what you have already written:
expr = infixNotation(column_expr,
    [
        (NOT, 1, opAssoc.RIGHT),
        (AND, 2, opAssoc.LEFT),
        (OR, 2, opAssoc.LEFT),
    ])
I've added NOT as a reasonable extension to what you have now - most logical expressions include these 3 operators. Also, it is conventional to define them in this order of precedence, so that "X OR Y AND Z" eventually evaluates in the order "X OR (Y AND Z)", since AND has higher precedence than OR (and NOT is higher still).
#1 takes some more work, so I've written a little BNF for what we should expect for an individual operand for column_expr (I cannot recommend taking this step highly enough!):
identifier ::= one or more Words composed of printable letters (we may come back to this)
number ::= an integer or real number (we can use the one defined in pyparsing_common.number)
quotedString ::= (a quoted string like the one defined by pyparsing)
quotedString_list ::= '(' quotedString [',' quotedString]... ')'
# put identifier last, since it will match just about anything, and we want to try the other
# expressions first
column_operand ::= quotedString | quotedString_list | number | identifier
Then a column_expr will be an infixNotation using these column_operands:
ppc = pyparsing_common
LPAR, RPAR = map(Suppress, "()")
# use Group to keep these all together
quotedString_list = Group(LPAR + delimitedList(quotedString) + RPAR)
column_operand = quotedString | quotedString_list | ppc.number | identifier
column_expr = infixNotation(column_operand,
    [
        (IN | CONTAINS, 2, opAssoc.LEFT),
        (LIKE, 2, opAssoc.LEFT),
        ('=', 2, opAssoc.LEFT),
    ])
If you find you have to add other operators like "!=", most likely you will add them in to column_expr.
Some other notes:
You probably want to remove ' and " from r, since they should really be handled as part of the quoted strings
As your list of keywords grows, you will find it easier to define them using something like this:
AND, OR, NOT, LIKE, IN, CONTAINS = keyword_exprs = list(map(CaselessKeyword, """
AND OR NOT LIKE IN CONTAINS
""".split()))
any_keyword = MatchFirst(keyword_exprs)
Then you can reference them more easily as I have done in the code above.
Write small tests first before trying to test the complex query you posted. Nice work in including many of the variations of operands. Then use runTests to run them all, like this:
expr.runTests("""
A LIKE "%.something.com"
B = 4
C in ("A", "B")
D CONTAINS "ASDF"
(A LIKE "%.something.com" AND B = 4) OR (C In ("a", "b") AND D Contains "asdf")
""")
With these changes, I get this for your original query string:
[[['A', 'LIKE', '"%.something.com"'], 'AND', 'B = 4'], 'OR', [['C', 'IN', ['"a"', '"b"']], 'AND', ['D', 'CONTAINS', '"asdf"']]]
Hmmm, I'm not keen on a term that looks like 'B = 4', now that we are actually parsing the sub expressions. I suspect it is because your definition of identifier is a little too aggressive. If we cut it back to just ~any_keyword + Word(alphas, r), forcing a leading alpha character and without the [1, ...] for repetition, then we get the better-looking:
[[['A', 'LIKE', '"%.something.com"'], 'AND', ['B', '=', 4]], 'OR', [['C', 'IN', ['"a"', '"b"']], 'AND', ['D', 'CONTAINS', '"asdf"']]]
If in fact you do want these sub-expressions to be retained as they were found in the original, and just break up on the logical operators, then you can just wrap column_expr in originalTextFor as you used before, giving:
[['A LIKE "%.something.com"', 'AND', 'B = 4'], 'OR', ['C In ("a", "b")', 'AND', 'D Contains "asdf"']]
Good luck with your SQL parsing project!

Conditional subset array between ranges

I wish to filter data between a specific range.
dummy = [1,2,3,4,5,6,7,8,9,10]
This works for a single condition:
dummy[dummy .> 4]
If I try set a range:
dummy[dummy .> 4 & dummy .< 7]
This logic doesn't produce the expected output of filtering for values > 4 and < 7.
This did the trick
dummy[(dummy .> 4) .& (dummy .< 7)]
Indexing by a boolean array, either dummy[(4 .< dummy) .& (dummy .< 7)] or dummy[4 .< dummy .< 7], should work; the parentheses in the first snippet are required due to operator precedence. For additional clarity with larger filters, the generation of the boolean array can be written using the @. broadcasting macro:
dummy[@. 4 < dummy < 7]
Note that filtering using boolean arrays will allocate memory for the intermediate array; thus, the filter/filter! functions may come in handy. Both of the following calls are equivalent, with the latter improving readability for larger conditions.
filter(x -> 4 < x < 7, dummy)
filter(dummy) do x
4 < x < 7
end
The filter! function may be used in place of filter if mutation of the existing array is acceptable.

Convert Dict to DataFrame in Julia

Suppose I have a Dict defined as follows:
x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])
I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:
DataFrame(A = [1,2,3], B = [4,5,6])
I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:
macro dict_to_df(x)
    typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
    return quote
        DataFrame(
            for k in keys(eval(x))
                @eval ($k) = $(eval(x)[$k])
            end
        )
    end
end
I also tried writing this as a function, which does work when all dictionary values have the same length:
function dict_to_df(x::Dict)
    s = "DataFrame("
    for k in keys(x)
        v = x[k]
        if typeof(v) <: AbstractString
            v = string('"', v, '"')
        end
        s *= "$(k) = $(v),"
    end
    s = chop(s) * ")"
    return eval(parse(s))
end
Is there a better, faster, or more idiomatic approach to this?
Another method could be
DataFrame(Any[values(x)...],Symbol[map(symbol,keys(x))...])
It was a bit tricky to get the types in order to access the right constructor. To get a list of the constructors for DataFrames I used methods(DataFrame).
The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:
DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
Finally, things would be cleaner if x had originally been with symbols. Then the code would go:
x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)

How can I get a flat result from a list comprehension instead of a nested list?

I have a list A, and a function f which takes an item of A and returns a list. I can use a list comprehension to convert everything in A like [f(a) for a in A], but this returns a list of lists. Suppose my input is [a1,a2,a3], resulting in [[b11,b12],[b21,b22],[b31,b32]].
How can I get the flattened list [b11,b12,b21,b22,b31,b32] instead? In other words, in Python, how can I get what is traditionally called flatmap in functional programming languages, or SelectMany in .NET?
(In the actual code, A is a list of directories, and f is os.listdir. I want to build a flat list of subdirectories.)
See also: How do I make a flat list out of a list of lists? for the more general problem of flattening a list of lists after it's been created.
You can have nested iterations in a single list comprehension:
[filename for path in dirs for filename in os.listdir(path)]
which is equivalent (at least functionally) to:
filenames = []
for path in dirs:
    for filename in os.listdir(path):
        filenames.append(filename)
>>> from functools import reduce # not needed on Python 2
>>> list_of_lists = [[1, 2],[3, 4, 5], [6]]
>>> reduce(list.__add__, list_of_lists)
[1, 2, 3, 4, 5, 6]
The itertools solution is more efficient, but this feels very pythonic.
You can find a good answer in the itertools recipes:
import itertools
def flatten(list_of_lists):
    return list(itertools.chain.from_iterable(list_of_lists))
The question asked for flatmap. Some implementations have been proposed, but they may unnecessarily create intermediate lists. Here is one implementation based on iterators:
import itertools

def flatmap(func, *iterable):
    return itertools.chain.from_iterable(map(func, *iterable))
In [148]: list(flatmap(os.listdir, ['c:/mfg','c:/Intel']))
Out[148]: ['SPEC.pdf', 'W7ADD64EN006.cdr', 'W7ADD64EN006.pdf', 'ExtremeGraphics', 'Logs']
In Python 2.x, use itertools.imap in place of map.
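Since the transcript above depends on the local filesystem, here is a reproducible check of the same flatmap using range in place of os.listdir:

```python
import itertools

def flatmap(func, *iterable):
    # same definition as above, repeated so this snippet runs on its own
    return itertools.chain.from_iterable(map(func, *iterable))

print(list(flatmap(range, [1, 2, 3])))  # [0, 0, 1, 0, 1, 2]
```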
You could just do the straightforward:
subs = []
for d in dirs:
    subs.extend(os.listdir(d))
You can concatenate lists using the normal addition operator:
>>> [1, 2] + [3, 4]
[1, 2, 3, 4]
The built-in function sum will add the numbers in a sequence and can optionally start from a specific value:
>>> sum(xrange(10), 100)
145
Combine the above to flatten a list of lists:
>>> sum([[1, 2], [3, 4]], [])
[1, 2, 3, 4]
You can now define your flatmap:
>>> def flatmap(f, seq):
... return sum([f(s) for s in seq], [])
...
>>> flatmap(range, [1,2,3])
[0, 0, 1, 0, 1, 2]
Edit: I just saw the critique in the comments for another answer and I guess it is correct that Python will needlessly build and garbage collect lots of smaller lists with this solution. So the best thing that can be said about it is that it is very simple and concise if you're used to functional programming :-)
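To make that critique concrete: sum-based flattening rebuilds the accumulator list on every +, while itertools.chain.from_iterable walks each element once. Both produce the same result (the sizes here are arbitrary):

```python
import itertools

list_of_lists = [[i, i + 1] for i in range(1000)]

flat_chain = list(itertools.chain.from_iterable(list_of_lists))  # O(total elements)
flat_sum = sum(list_of_lists, [])  # O(n^2): each + copies the whole accumulator

print(flat_chain == flat_sum)  # True
print(len(flat_chain))         # 2000
```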
subs = []
map(subs.extend, (os.listdir(d) for d in dirs))
(but Ants's answer is better; +1 for him. Note that map is lazy in Python 3, so this trick only works as written in Python 2.)
import itertools
x=[['b11','b12'],['b21','b22'],['b31']]
y=list(itertools.chain(*x))
print y
itertools has been available since Python 2.3.
You could try itertools.chain(), like this:
import itertools
import os
dirs = ["c:\\usr", "c:\\temp"]
subs = list(itertools.chain(*[os.listdir(d) for d in dirs]))
print subs
itertools.chain() returns an iterator, hence the passing to list().
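One difference worth knowing if your outer sequence is itself lazy: chain(*x) unpacks the outer sequence eagerly, while chain.from_iterable consumes it lazily, so the latter also works when the outer iterable is a generator:

```python
import itertools

lazy_lists = ([x, x + 1] for x in (1, 3, 5))  # a generator of lists

# from_iterable never materializes the outer generator all at once
flat = list(itertools.chain.from_iterable(lazy_lists))
print(flat)  # [1, 2, 3, 4, 5, 6]
```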
This is the simplest way to do it:
def flatMap(array):
    return reduce(lambda a, b: a + b, array)
The a + b refers to concatenation of two lists. (In Python 3, import reduce from functools first.)
You can use pyxtension:
from pyxtension.streams import stream
stream([ [1,2,3], [4,5], [], [6] ]).flatMap() == range(7)
Google brought me the next solution:
def flatten(l):
    if isinstance(l, list):
        return sum(map(flatten, l), [])
    else:
        return [l]
If listA = [list1, list2, list3]:
flattened_list = reduce(lambda x, y: x + y, listA)
This will do.
