How can I get `True` from `[1, 1, 0, 0, 0] == [0, 0, 1, 1, 0]` in Python? - boolean-algebra

Example:
I have a solution list a:
a = [1, 1, 0, 0, 0]
and input lists bs:
b1 = [1, 1, 0, 0, 0]
b2 = [0, 1, 1, 0, 0]
b3 = [0, 0, 1, 1, 0]
...
bn = [1, 0, 0, 0, 1]
If I compare a to any of b1, b2, ..., bn, I expect to get True from the comparison. Of course, this simple expression will not work:
if a == b:
...
because in Python only identical lists can be equal.
Is there some elegant math that I can easily implement in a programming language? I am thinking about building some kind of hash function, but I'm not sure how.
Note: 1) this can easily be implemented with a for loop, but I need something more robust; 2) this may also be related to the problem in this post: Cyclic group

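A simple solution could be to normalize the a and b values before comparing them:
a_original = [5, 2, 3, 1, 4]
a_formatted = sorted(a_original)
Then you can compare the formatted variables. A simple "for" loop can be used to format all of your lists. Note that sorting makes any two lists with the same elements compare equal, not only rotations of each other, so this fits only if the arrangement of the elements does not matter.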
Hope this helps!
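If only rotations of a should compare equal (as with b1 ... bn in the question), one option is to check whether b occurs as a contiguous slice of a + a, or to reduce every list to a canonical rotation that can be hashed. The following is a minimal sketch, not a tuned implementation; the names is_rotation and canonical_rotation are made up for illustration, and both functions run in quadratic time.

```python
def is_rotation(a, b):
    """Return True if list b is a cyclic rotation of list a."""
    n = len(a)
    if n != len(b):
        return False
    doubled = a + a
    # b is a rotation of a exactly when it appears as a slice of a + a.
    return n == 0 or any(doubled[i:i + n] == b for i in range(n))


def canonical_rotation(a):
    """Return the lexicographically smallest rotation as a hashable tuple."""
    # All rotations of the same list share this value, so it can act as a key.
    return min((tuple(a[i:] + a[:i]) for i in range(len(a))), default=())


a = [1, 1, 0, 0, 0]
print(is_rotation(a, [0, 0, 1, 1, 0]))   # True
print(is_rotation(a, [1, 0, 1, 0, 0]))   # False
print(canonical_rotation(a) == canonical_rotation([1, 0, 0, 0, 1]))  # True
```

The canonical tuple can be stored in a set or used as a dict key, which is one way to get the "hash function" the question mentions.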

R - Create new variable using the difference between lagged values

The problem: I need to create a new variable (eventWindowTime) in R that is based on data from two columns - obstacle present (1=yes,0=no) and timeOnTask (assessed continuously).
The dataset: I have data that was continuously collected (to the fractional seconds) from several participants as they performed a task. At various points, participants encountered one or more obstacles. I would like to create obstacle event windows that range from -5s (5 s before the obstacle) to +20s (20 s after the obstacle).
Additional challenges:
Some event windows are overlapping
Some timestamps have multiple measurements (so I can't rep values -5 to 20 relative to the first obstaclePresent == 1)
Things I've tried:
The way I would typically approach this is to use ifelse or case_when functions with lag() to:
set eventWindowTime to 0 when obstaclePresent == 1 && lag(obstaclePresent == 0)
set eventWindowTime to the lagged eventWindowTime value plus the difference in timeOnTask values across the two rows when obstaclePresent == 1 && lag(obstaclePresent == 1).
then backfill the negative seconds in a second step.
However, R does not seem to hold the lagged values in memory, and I keep getting the error "Error in vec_slice(): ! x must be a vector, not NULL."
Here's a small subset of code and a file which can be used to reproduce the problem:
mre <- data.frame(Sub = rep(1, 41), Time = c(723.2, 723.2, 723.3, 723.3, 723.3, 723.4, 723.4, 723.5, 723.5, 723.6, 723.6, 723.6, 723.7, 723.7, 723.7, 723.8, 723.9, 723.9, 723.9, 724, 724, 724, 724, 724.1, 724.1, 724.2, 724.2, 724.2, 724.3, 724.3, 724.3, 724.4, 724.4, 724.5, 724.5, 724.6, 724.6, 724.6, 724.7, 724.7, 724.8), obstaclePresent = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))
mre$obstacleEventWindow <- case_when(
mre$obstaclePresent != lag(mre$obstaclePresent,1) & mre$obstaclePresent == 0 ~ 0,
mre$obstaclePresent == lag(mre$obstaclePresent,1) ~ lag(mre$obstacleEventWindow) + mre$newTime - lag(mre$newTime,1),
TRUE ~ 0
)
To be clear, I understand that the case_when() statement is self-referencing. I've worked with other programs where a column is populated on the fly, and you can reference lagged cells without issue. That isn't working here, but I'm at a loss with respect to what to do instead.
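One way to avoid the self-referencing column is to locate the obstacle onsets first and then express each row's time relative to an onset, which also copes with repeated timestamps. The sketch below is in Python rather than R and is only an illustration of that idea. The function name event_window_times, the rule for overlapping windows (keep the closest onset), and the single-participant input are all assumptions, not part of the question.

```python
def event_window_times(times, obstacle, before=5.0, after=20.0):
    """Per row, return the time relative to the nearest obstacle onset, or None."""
    # An onset is a row where obstaclePresent switches from 0 to 1.
    onsets = [t for i, t in enumerate(times)
              if obstacle[i] == 1 and (i == 0 or obstacle[i - 1] == 0)]
    result = []
    for t in times:
        # Offsets to every onset whose window [-before, +after] contains t.
        offsets = [round(t - onset, 3) for onset in onsets
                   if -before <= t - onset <= after]
        # Assumption: when windows overlap, keep the onset closest in time.
        result.append(min(offsets, key=abs) if offsets else None)
    return result


times = [723.8, 723.9, 724.0, 724.0, 724.1, 724.2]
obstacle = [0, 0, 1, 1, 1, 1]
print(event_window_times(times, obstacle))
# [-0.2, -0.1, 0.0, 0.0, 0.1, 0.2]
```

Because every row is computed from the onset times rather than from the previous row, nothing needs to be filled in "on the fly", and the negative seconds do not need a separate backfill step.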

Broadcasting struct creation with `Base.@kwdef`

If I have a large struct that I want to create an array of (e.g. to later create a StructArray), how can I create an array of structs when I have keyword defaults?
E.g.
Base.@kwdef struct MyType
a = 0
b = 0
c = 0
d = 0
... # can be up to 10 or 20 fields
end
Base.@kwdef is nice because I can create objects with MyType(b=10,e=5), but sometimes I have arrays of the arguments. I would like to be able to broadcast or succinctly construct an array of the structs.
That is, I would like the following to create an array of three MyTypes: MyType.(c=[5,6,7],d = [1,2,3])
Instead, it creates a single MyType where c and d are arrays rather than scalar values.
What are ways to keep the convenience of both Base.@kwdef and easy array-of-struct construction?
Seems like a good use case for a comprehension:
julia> [MyType(c=cval, d=dval) for (cval, dval) in zip([5, 6, 7], [1, 2, 3])]
3-element Vector{MyType}:
MyType(0, 0, 5, 1)
MyType(0, 0, 6, 2)
MyType(0, 0, 7, 3)
Another possibility (based on this answer) is to explicitly do the broadcast call yourself:
julia> broadcast((cval, dval) -> MyType(c = cval, d = dval), [5, 6, 7], [1, 2, 3])
3-element Vector{MyType}:
MyType(0, 0, 5, 1)
MyType(0, 0, 6, 2)
MyType(0, 0, 7, 3)
or the equivalent ((cval, dval) -> MyType(c = cval, d = dval)).([5, 6, 7], [1, 2, 3]) as mentioned in the comment there.
Out of these, the array comprehension seems to me the clearest and most obvious way to go about it.
According to this issue: https://github.com/JuliaLang/julia/issues/34737 there is no nice built-in syntax for your case.
One option is a comprehension (see the other answer); a second option (which I prefer here) is to build an anonymous function and broadcast over it, such as:
julia> ((x,y)->MyType(;c=x,d=y)).([1,2],[3,5])
2-element Vector{MyType}:
MyType(0, 0, 1, 3)
MyType(0, 0, 2, 5)
It is also possible to call broadcast directly as:
julia> broadcast((x,y)->MyType(;c=x,d=y), [1,2],[3,5])
2-element Vector{MyType}:
MyType(0, 0, 1, 3)
MyType(0, 0, 2, 5)

Most common term in a vector - PARI/GP

I feel like I'm being really stupid here as I would have thought there's a simple command already in Pari, or it should be a simple thing to write up, but I simply cannot figure this out.
Given a vector, say V, which will have duplicate entries, how can one determine what the most common entry is?
For example, say we have:
V = [ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ]
I want something which would return the value 8.
I'm aware of things like vecsearch, but I can't see how that can be tweaked to make this work?
Very closely related to this, I want this result to return the most common non-zero entry, and some vectors I look at will have 0 as the most common entry. Eg: V = [ 0, 0, 0, 0, 3, 3, 5 ]. So whatever I execute here I would like to return 3.
I tried writing up something which would remove all zero terms, but again struggled.
The thing I have tried in particular is:
rem( v ) = {
my( c );
while( c = vecsearch( v, 0 ); #c, v = vecextract( v, "^c" ) ); v
}
but vecextract doesn't seem to like this set up.
If you can ensure that all the elements are within some fixed range, then a counting sort is enough, with PARI/GP code like this:
counts_for(v: t_VEC, lower: t_INT, upper: t_INT) = {
my(counts = vector(1+upper-lower));
for(i=1, #v, counts[1+v[i]-lower]++);
vector(#counts, i, [i-1, counts[i]])
};
V1 = [0, 1, 2, 2, 3, 4, 6, 8, 8, 8];
vecsort(counts_for(V1, 0, 8), [2], 4)[1][1]
> 8
V2 = [0, 0, 0, 0, 3, 3, 5];
vecsort(counts_for(V2, 0, 5), [2], 4)[1][1]
> 0
You also can implement the following short-cut for the sake of convenience:
counts_for1(v: t_VEC) = {
counts_for(v, vecmin(v), vecmax(v))
};
most_frequent(v: t_VEC) = {
my(counts=counts_for1(v));
vecsort(counts, [2], 4)[1][1]
};
most_frequent(V1)
> 8
most_frequent(V2)
> 0
The function matreduce provides this in a more general setting: applied to a vector of objects, it returns a 2-column matrix whose first column contains the distinct objects and the second their multiplicity in the vector. (The function has a more general form that takes the union of multisets.)
most_frequent(v) = my(M = matreduce(v), [n] = matsize(M)); M[n, 1];
most_frequent_non0(v) =
{ my(M = matreduce(v), [n] = matsize(M), x = M[n, 1]);
if (x == 0, M[n - 1, 1], x);
}
? most_frequent([ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ])
%1 = 8
? most_frequent([x, x, Mod(1,3), [], [], []])
%2 = []
? most_frequent_non0([ 0, 0, 0, 0, 3, 3, 5 ])
%3 = 5
? most_frequent_non0([x, x, Mod(1,3), [], [], []])
%4 = x
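The first function will error out if fed an empty vector, and the second one if there are no non-zero entries. The second function tests for "0" using the x == 0 test (and we famously have [] == 0 in GP); for a more rigorous semantic, use x === 0 in the function definition. Note that %3 above is 5, not the 3 the question asks for: matreduce appears to order its rows by the objects rather than by multiplicity, so M[n, 1] picks the largest entry; to get the most frequent one, select the row whose second column is largest.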
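For comparison, the same counting idea (distinct entries plus their multiplicities, then pick the largest multiplicity) can be sketched outside GP. The following is a Python illustration only; the function name most_frequent and its skip_zero flag are made up for this example.

```python
from collections import Counter


def most_frequent(values, skip_zero=False):
    """Return the most common entry, optionally ignoring zeros."""
    counts = Counter(v for v in values if not (skip_zero and v == 0))
    if not counts:
        raise ValueError("no eligible entries")
    # most_common(1) returns [(value, count)]; ties go to the entry seen first.
    return counts.most_common(1)[0][0]


print(most_frequent([0, 1, 2, 2, 3, 4, 6, 8, 8, 8]))            # 8
print(most_frequent([0, 0, 0, 0, 3, 3, 5]))                     # 0
print(most_frequent([0, 0, 0, 0, 3, 3, 5], skip_zero=True))     # 3
```

Filtering the zeros before counting avoids the separate "remove all zero terms" step the question struggled with.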

MPI Communication Pattern

I was wondering if there is a smart way to do this. Let's say I have three nodes, 0, 1, 2, and each node has an array, a0, a1, a2. Suppose the contents of the arrays are something like
a0 = {0, 1, 2, 1}
a1 = {1, 2, 2, 0}
a2 = {0, 0, 1, 2}
Is there a clever communication pattern to move each number to its corresponding node, i.e.
a0 = {0, 0, 0, 0}
a1 = {1, 1, 1, 1}
a2 = {2, 2, 2, 2}
The approach I have in mind would involve sorting and temporary buffers, but I was wondering if there is a smarter way?
You can use MPI_Alltoallv for this in the following way:
1. Sort local_data (a) by the destination node of each element, in increasing order.
2. Create a send_displacements array such that send_displacements[r] is the index of the first element in local_data that belongs to node r.
3. Create a send_counts array such that send_counts[r] equals the number of elements in local_data that belong to node r. This can be computed as send_counts[r] = send_displacements[r+1] - send_displacements[r], except for the last rank.
4. MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT, comm)
5. Compute recv_displacements such that recv_displacements[r] = sum(recv_counts[r'] for all r' < r).
6. Prepare a recv_data buffer with sum(recv_counts) elements.
7. MPI_Alltoallv(local_data, send_counts, send_displacements, MPI_INT, recv_data, recv_counts, recv_displacements, MPI_INT, comm)
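The bookkeeping in the steps above can be checked without an MPI installation. The sketch below is a plain Python simulation, not real MPI code: the helper names plan_send and simulate_alltoallv are made up, all "ranks" live in one process, and each element's value is assumed to be its destination rank, as in the question.

```python
def plan_send(local_data, n_ranks):
    """Sort by destination rank and derive send counts and displacements."""
    ordered = sorted(local_data)  # here the value itself is the destination rank
    counts = [ordered.count(r) for r in range(n_ranks)]
    displs = [sum(counts[:r]) for r in range(n_ranks)]
    return ordered, counts, displs


def simulate_alltoallv(all_data):
    """Mimic the Alltoall + Alltoallv exchange for every rank in one process."""
    n = len(all_data)
    plans = [plan_send(data, n) for data in all_data]
    results = []
    for dest in range(n):
        # What MPI_Alltoall would deliver: each source's send_counts[dest].
        recv_counts = [plans[src][1][dest] for src in range(n)]
        recv_displs = [sum(recv_counts[:src]) for src in range(n)]
        recv_data = [None] * sum(recv_counts)
        for src in range(n):
            ordered, counts, displs = plans[src]
            chunk = ordered[displs[dest]:displs[dest] + counts[dest]]
            start = recv_displs[src]
            recv_data[start:start + len(chunk)] = chunk
        results.append(recv_data)
    return results


data = [[0, 1, 2, 1], [1, 2, 2, 0], [0, 0, 1, 2]]
print(simulate_alltoallv(data))
# [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

In a real program each rank would run only its own plan_send, and the two inner loops would be replaced by the MPI_Alltoall and MPI_Alltoallv calls.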

Mathematica: part assignment

I'm trying to implement an algorithm to build a decision tree from a dataset.
I wrote a function to calculate the information gain between a subset and a particular partition; then I try all the possible partitions and want to choose the "best" one, in the sense that it has the lowest entropy.
This procedure must be recursive, hence, after the first iteration, it needs to work for every subset of the partition you got in the previous step.
These are the data:
X = {{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}, {1, 1, 0, 0}}
Xfin[0]=X
This is the function: for every subset of the partition, it tries all the possible partitions and calculates the IG. Then it selects the partition with IGMAX:
Partizioneottimale[X_, n_] :=
For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
For[v = 1, v <= m, v++,
If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
X[n + 1][[l]] := Partizione[X[n][[l]], v]]]]
then I call it:
Partizioneottimale[Xfin, 0]
and it works fine for the first one:
Xfin[1]
{{{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}}, {{1, 0, 0, 0}}}
That is the partition with lowest entropy.
But it doesn't work for the next ones:
Partizioneottimale[Xfin, 1]
SetDelayed::setps: Xfin[1+1] in the part assignment is not a symbol
Has anybody any idea about how to solve this?
Thanks
Without unraveling all your logic, a simple fix is this:
Partizioneottimale[X_, n_] := (
xnp1 = Table[Null, {Length[Flatten[X[n], n - 1]]}] ;
For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
For[v = 1, v <= m, v++,
If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
xnp1[[l]] = Partizione[X[n][[l]], v]]]] ;
X[n+1] = xnp1 ; )
