How to remove common elements from both lists in Python 3.6?

If
L1=[2,4,6,8,2,4,6,8]
L2=[1,3,2,2,4]
then after performing the operation my result should be:
L1=[6,8,4,6,8]
L2=[1,3]
The operation should remove elements present in common in both List1 and List2. Tell me a method to do this.

For lower time complexity, I suggest building a set of the common elements first:
uniq = set(L1).intersection(L2)
L1_uniq = [x for x in L1 if x not in uniq]
L2_uniq = [x for x in L2 if x not in uniq]

L1_unique=[i for i in L1 if i not in L2]
L2_unique=[i for i in L2 if i not in L1]
This is called a list comprehension, which is a very useful feature of Python. It is built on a for loop, which can be written out explicitly as:
L1_unique = []
for i in L1:
    if i not in L2:
        L1_unique.append(i)
This is equivalent to a double for loop, where the else branch runs only if the inner loop finishes without hitting break:
L1_unique = []
for i in L1:
    for j in L2:
        if i == j:
            break
    else:
        L1_unique.append(i)
As the other answer showed (and I upvoted it), building a set from the intersection of the two lists before the list comprehension reduces time complexity, because each membership test then runs against a set instead of scanning the second list. (If you use IPython, you can check this with %%timeit.)
In principle, you could restructure the second list so that an unsuccessful search does not have to traverse the entire list. But I doubt it would be faster than the list comprehension in practice.
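Note that the expected output in the question keeps one 4 in L1, which suggests that occurrences should cancel pairwise rather than every shared value being dropped. Both approaches above give L1 = [6, 8, 6, 8] instead. Below is a minimal sketch of the pairwise reading, assuming that is what is wanted; the helper name `cancel_common` is made up for illustration.

```python
from collections import Counter

def cancel_common(a, b):
    """Remove elements pairwise: one occurrence in a cancels one occurrence in b."""
    # Multiset intersection: for each value, the smaller of the two counts.
    common = Counter(a) & Counter(b)

    def strip(seq):
        remaining = Counter(common)  # fresh copy of the counts for each list
        kept = []
        for x in seq:
            if remaining[x] > 0:
                remaining[x] -= 1    # this occurrence is cancelled
            else:
                kept.append(x)
        return kept

    return strip(a), strip(b)

L1 = [2, 4, 6, 8, 2, 4, 6, 8]
L2 = [1, 3, 2, 2, 4]
new_L1, new_L2 = cancel_common(L1, L2)
print(new_L1)  # [6, 8, 4, 6, 8]
print(new_L2)  # [1, 3]
```

This removes the earliest occurrences first and runs in time linear in the combined length of the two lists.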

Related

Find which sum of any numbers in an array equals amount

I have a customer who sends electronic payments but doesn't bother to specify which invoices they cover. I'm left guessing which ones, and I would rather not try every single combination manually. I need some sort of pseudo-code to do it, which I can then adapt, but I'm not sure I can come up with a good algorithm myself. I'm familiar with PHP, bash, and Python, but I can adapt.
I would need an array with the following numbers: [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]. Those are the amounts of the invoices. Then I would need to find a sum of any combination of two or more of those numbers that adds up to 596.57. Once found the program would need to tell me exactly which numbers it used to reach the sum so I can then know which invoices got paid.
This is very similar to the Subset Sum problem and can be solved using a similar approach to the typical brute-force method used for that problem. I have to do this often enough that I keep a simple template of this algorithm handy for when I need it. What is posted below is a slightly modified version [1].
This has no restrictions on whether the values are integer or float. The basic idea is to iterate over the list of input values and keep a running list of every subset that sums to less than the target value (since there might be a later value in the inputs that will yield the target). It could be modified to handle negative values as well by removing the rule that only keeps candidate subsets if they sum to less than the target. In that case, you'd keep all subsets, and then search through them at the end.
import copy

def find_subsets(base_values, target):
    possible_matches = [[0, []]]  # [[known_attainable_value, [list, of, components]], [...], ...]
    matches = []  # we'll return ALL subsets that sum to `target`
    for base_value in base_values:
        temp = copy.deepcopy(possible_matches)  # Can't modify in loop, so use a copy
        for possible_match in possible_matches:
            new_val = possible_match[0] + base_value
            if new_val <= target:
                new_possible_match = [new_val, possible_match[1]]
                new_possible_match[1].append(base_value)
                temp.append(new_possible_match)
                if new_val == target:
                    matches.append(new_possible_match[1])
        possible_matches = temp
    return matches

# Usage: find_subsets(list_of_input_values, target_sum)
This is a very inefficient algorithm and it will blow up quickly as the size of the input grows. The Subset Sum problem is NP-Complete, so you are not likely to find a generalized solution that will work in all cases and is efficient.
1: The way lists are being used here is kludgy. If the goal was to simply find any match, the nested lists could be replaced with a dictionary, and we could exit right away once a match is found. But doing that will cause intermediate subsets that sum to the same value to also map to the same dictionary slot, so only one subset with that sum is kept. Since we need to report all matching subsets (because the values represent checks and are presumably not fungible even if the dollar amounts are equal), a dictionary won't work.
You can use itertools.combinations(t,r) to list all combinations of r elements in array t.
So we loop on the possible values of r, then on the results of itertools.combinations:
import itertools

def find_sum(t, obj):
    t = [x for x in t if x < obj]  # filter out elements which are too big
    for r in range(1, len(t)+1):  # loop on number of elements
        for subt in itertools.combinations(t, r):  # loop on combinations of r elements
            if sum(subt) == obj:
                return subt
    return None

find_sum([1,2,3,4], 6)
# (2, 4)
find_sum([1,2,3,4], 10)
# (1, 2, 3, 4)
find_sum([1,2,3,4], 11)
# None
find_sum([35715, 22373, 10699, 8996, 31239, 12000], 59657)
# None
Rounding errors:
The code above is meant to be used with integers, rather than floats.
To use it with floats, replace the test sum(subt) == obj with the more forgiving test abs(sum(subt) - obj) < 0.01. (Without abs, every combination that sums to less than obj would also pass.)
Relevant documentation:
itertools.combinations
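Another way to sidestep rounding errors is to convert the amounts to integer cents before comparing. The sketch below assumes two-decimal currency amounts and, as in the question, only considers combinations of two or more invoices. The function name `find_sums_in_cents` and the `min_items` parameter are made up for illustration, and unlike find_sum it returns every matching combination rather than the first.

```python
import itertools

def find_sums_in_cents(amounts, target, min_items=2):
    """Return every combination of at least min_items amounts that sums to target."""
    cents = [round(a * 100) for a in amounts]   # exact integer arithmetic from here on
    target_cents = round(target * 100)
    matches = []
    for r in range(min_items, len(cents) + 1):
        # Combine indices rather than values so equal amounts stay distinct invoices.
        for idx in itertools.combinations(range(len(cents)), r):
            if sum(cents[i] for i in idx) == target_cents:
                matches.append([amounts[i] for i in idx])
    return matches

invoices = [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]
print(find_sums_in_cents(invoices, 596.57))          # [] (no combination matches)
print(find_sums_in_cents([1.10, 2.20, 3.30], 3.30))  # [[1.1, 2.2]]
```

This is still brute force, so it is only practical for a small number of invoices.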

Iterate through all possibilities in Julia

If I want to do something on each pair of letters, it could look like this in Julia:
for l1 in 'a':'z'
    for l2 in 'a':'z'
        w = l1*l2
        # ... do something with w ...
    end
end
I want to generalise this to words of any length, given a value n specifying the number of letters desired. How do I best do this in Julia?
You can use:
for ls in Iterators.product(fill('a':'z', n)...)
    w = join(ls)
    # ... do something with w ...
end
In particular if you wanted to collect them in an array you could write:
join.(Iterators.product(fill('a':'z', n)...))
or flatten it to a vector
vec(join.(Iterators.product(fill('a':'z', n)...)))
Note, however, that in most cases this will not be needed and for larger n it is better not to materialize the output but just iterate over it as suggested above.
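For readers more at home in Python, the same lazy Cartesian-product idea can be sketched with itertools.product. This is a comparison only, not a translation of the Julia code: the helper name `words` is made up, and the iteration order differs (Python varies the last letter fastest, whereas Julia's Iterators.product varies the first).

```python
import itertools
import string

def words(n):
    """Lazily yield every lowercase word of length n."""
    for letters in itertools.product(string.ascii_lowercase, repeat=n):
        yield "".join(letters)

first_three = list(itertools.islice(words(2), 3))
print(first_three)                  # ['aa', 'ab', 'ac']
print(sum(1 for _ in words(2)))     # 676
```

As with the Julia version, iterating over the generator avoids materializing all 26^n words at once.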

Elixir loop over a matrix

I have a list of elements and I am converting it into a list of lists using the Enum.chunk_every method.
The code is something like this:
matrix = Enum.chunk_every(list_1d, num_cols)
Now I want to loop over the matrix and access the neighbors
Simply if I have the list [1,2,3,4,5,6,1,2,3] it is converted to a 3X3 matrix like:
[[1,2,3], [4,5,6], [1,2,3]]
Now how do I loop over this matrix? And what if I want to access the neighbors of the elements? For example the neighbors of 5 are 2,4,6 and 2.
I can see that recursion is a way to go but how will that work here?
There are many ways to solve this, and I think that you should consider first what is your use case (size of the matrix, number of matrices, number of accesses...) and adapt your data structure accordingly.
Nevertheless, here is a simple implementation (in the Erlang shell; I'll let you adapt it to Elixir):
1> L = [[1,2,3], [4,5,6], [1,2,3]].
[[1,2,3],[4,5,6],[1,2,3]]
2> Get = fun(I,J,L) ->
       try
           V = lists:nth(I,lists:nth(J,L)),
           {ok,V}
       catch
           _:_ -> {error,out_of_bound}
       end
   end.
#Fun<erl_eval.18.99386804>
3> Get(1,2,L).
{ok,4}
4> Get(2,3,L).
{ok,2}
5> Get(2,4,L).
{error,out_of_bound}
6> Neighbor = fun(I,J,L) ->
       [ V || {I1,J1} <- [{I,J-1},{I-1,J},{I+1,J},{I,J+1}],
              {ok,V} <- [Get(I1,J1,L)]
       ]
   end.
#Fun<erl_eval.18.99386804>
7> Neighbor(2,2,L).
[2,4,6,2]
8> Neighbor(1,2,L).
[1,5,1]
9>
Remark: I like list comprehensions; you may prefer to use lists:map in this case. This code is not efficient, since it traverses the list 4 times to get the neighbors. Its only advantage is that it is straightforward, so it should be easy to read.
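The same bounds-checked lookup can be sketched in Python for comparison. This is an illustration of the idea, not Elixir code: indices here are 0-based (row, column), whereas the Erlang session above uses 1-based (column, row), and the helper names `chunk_every` and `neighbors` are made up.

```python
def chunk_every(flat, num_cols):
    """Split a flat list into rows of num_cols elements."""
    return [flat[k:k + num_cols] for k in range(0, len(flat), num_cols)]

def neighbors(matrix, row, col):
    """Return the values above, left, right and below (row, col), skipping out-of-bounds cells."""
    found = []
    for r, c in [(row - 1, col), (row, col - 1), (row, col + 1), (row + 1, col)]:
        if 0 <= r < len(matrix) and 0 <= c < len(matrix[r]):
            found.append(matrix[r][c])
    return found

matrix = chunk_every([1, 2, 3, 4, 5, 6, 1, 2, 3], 3)
print(neighbors(matrix, 1, 1))  # [2, 4, 6, 2]  (neighbors of the 5)
print(neighbors(matrix, 1, 0))  # [1, 5, 1]     (neighbors of the 4)
```

An explicit bounds check is used instead of catching exceptions because a negative index in Python would silently wrap around to the other side of the list.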

R add to a list in a loop, using conditions

I have a data.frame with dim = (200, 500).
I want to run shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
    x <- shapiro.test(I.df.nocov[1:200,i])
    colstoremove[[i]] <- x[2]
}
However this is failing. Some pointers? (background is mainly python, not much of an R user)
Consider lapply(): a data frame passed to it is processed column by column, and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
Here is what happens in
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) is a 2-element vector which contains the minimal and maximal values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. No wonder it fails!
The problem is that R's function range and Python's function with the same name do completely different things. The equivalent of Python's range is called seq. For example, seq(5)=c(1,2,3,4,5), while seq(3,5)=c(3,4,5), and seq(1,10,2)=c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n is the same as seq(m,n) (but the priority of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
Generally, if something does not work in R, you should print the subexpressions, working from the innermost to the outermost. If some subexpression is too big to be printed, use str(x) (str means "structure"). And never assume that functions in Python and R are the same! If there is a function with the same name, it usually does a different thing.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
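For someone coming from Python, it may help to see the Python spellings of the seq calls mentioned above side by side. This is a small sketch; the variable names are made up. The main differences are that Python's range is half-open (the stop value is excluded) and starts at 0 by default, while R's seq is inclusive and starts at 1.

```python
# R: seq(5)        -> 1 2 3 4 5
one_to_five = list(range(1, 6))
# R: seq(3, 5)     -> 3 4 5
three_to_five = list(range(3, 6))
# R: seq(1, 10, 2) -> 1 3 5 7 9
odds = list(range(1, 11, 2))
# Python's default start is 0, so range(5) is NOT the same as R's seq(5).
zero_based = list(range(5))

print(one_to_five, three_to_five, odds, zero_based)
```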

Numpy indexing using array

I'm trying to return a (square) section from an array, where the indices wrap around the edges. I need to juggle some indexing, but it works. However, I expected the last two lines of code to give the same result. Why don't they? How does numpy interpret the last line?
And as a bonus question: Am I being woefully inefficient with this approach? I'm using the product because I need to modulo the range so it wraps around, otherwise I'd use a[imin:imax, jmin:jmax, :], of course.
import numpy as np
from itertools import product
i = np.arange(-1, 2) % 3
j = np.arange(1, 4) % 3
a = np.random.randint(1,10,(3,3,2))
print a[i,j,:]
# Gives 3 entries [(i[0],j[0]), (i[1],j[1]), (i[2],j[2])]
# This is not what I want...
indices = list(product(i, j))
print indices
indices = zip(*indices)
print 'a[indices]\n', a[indices]
# This works, but when I'm explicit:
print 'a[indices, :]\n', a[indices, :]
# Huh?
The problem is that advanced indexing is triggered if:
the selection object, obj, is [...] a tuple with at least one sequence object or ndarray
The easiest fix in your case is to use repeated indexing:
a[i][:, j]
An alternative would be to use ndarray.take, which will perform the modulo operation for you if you specify mode='wrap':
a.take(np.arange(-1, 2), axis=0, mode='wrap').take(np.arange(1, 4), axis=1, mode='wrap')
Here is another method of advanced indexing, which in my opinion is better than the product solution.
If you supply an integer array for every dimension, these are broadcast together and the output has the broadcast shape (you will see what I mean)...
i, j = np.ix_(i,j) # this adds extra empty axes
print i,j
print a[i,j]
# and now you will actually *not* be surprised:
print a[i,j,:]
Note that this is a 3x3x2 array, while you had a 9x2 array, but simple reshape will fix that and the 3x3x2 array is actually closer to what you want probably.
Actually the surprise is still hidden in a way, because in your examples a[indices] is the same as a[indices[0], indices[1]], but a[indices,:] is a[(indices[0], indices[1]),:], so it is not a big surprise that it is different. Note that a[indices[0], indices[1],:] does give the same result.
See : http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
When you add :, you are mixing integer indexing and slicing. The rules are quite complicated and better explained than I could in the above link.
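The approaches suggested above (repeated indexing, take with mode='wrap', and np.ix_) can be checked against each other. The sketch below uses Python 3 syntax and a deterministic array instead of random data so the result is reproducible; the variable names are made up for illustration.

```python
import numpy as np

a = np.arange(18).reshape(3, 3, 2)
rows = np.arange(-1, 2) % 3   # [2, 0, 1]
cols = np.arange(1, 4) % 3    # [1, 2, 0]

# Outer-product style selection: every row index combined with every column index.
block_ix = a[np.ix_(rows, cols)]
block_repeated = a[rows][:, cols]
block_take = (a.take(np.arange(-1, 2), axis=0, mode='wrap')
               .take(np.arange(1, 4), axis=1, mode='wrap'))

# Plain advanced indexing pairs the indices instead: (2,1), (0,2), (1,0).
paired = a[rows, cols]

print(block_ix.shape)   # (3, 3, 2)
print(paired.shape)     # (3, 2)
```

All three block selections contain the same values, and block_ix.reshape(9, 2) gives the 9x2 layout that the product-based approach in the question produces.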
