Mathematica: part assignment - recursion

I'm trying to implement an algorithm to build a decision tree from a dataset.
I wrote a function to calculate the information gain between a subset and a particular partition. I then try all possible partitions and want to choose the "best" one, in the sense that it has the lowest entropy.
This procedure must be recursive, hence, after the first iteration, it needs to work for every subset of the partition you got in the previous step.
These are the data:
X = {{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}, {1, 1, 0, 0}}
Xfin[0]=X
This is the function: for every subset of the partition, it tries all possible partitions and calculates the IG. Then it selects the partition with IGMAX:
Partizioneottimale[X_, n_] :=
For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
For[v = 1, v <= m, v++,
If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
X[n + 1][[l]] := Partizione[X[n][[l]], v]]]]
then I call it:
Partizioneottimale[Xfin, 0]
and it works fine for the first one:
Xfin[1]
{{{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}}, {{1, 0, 0, 0}}}
That is the partition with lowest entropy.
But it doesn't work for the next ones:
Partizioneottimale[Xfin, 1]
SetDelayed::setps: Xfin[1+1] in the part assignment is not a symbol.
Has anybody any idea about how to solve this?
Thanks

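Without unraveling all your logic, a simple fix is the following. Part assignment (lhs[[l]] = ...) only works when lhs is a symbol holding a list, and X[n + 1] is not a symbol, so build the new level in a plain symbol and assign it to X[n + 1] at the end: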
Partizioneottimale[X_, n_] := (
xnp1 = Table[Null, {Length[Flatten[X[n], n - 1]]}] ;
For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
For[v = 1, v <= m, v++,
If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
xnp1[[l]] = Partizione[X[n][[l]], v]]]] ;
X[n+1] = xnp1 ; )


Most common term in a vector - PARI/GP

I feel like I'm being really stupid here as I would have thought there's a simple command already in Pari, or it should be a simple thing to write up, but I simply cannot figure this out.
Given a vector, say V, which will have duplicate entries, how can one determine what the most common entry is?
For example, say we have:
V = [ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ]
I want something which would return the value 8.
I'm aware of things like vecsearch, but I can't see how that can be tweaked to make this work?
Very closely related to this, I want the result to be the most common non-zero entry, and some vectors I look at will have 0 as the most common entry. E.g. V = [ 0, 0, 0, 0, 3, 3, 5 ]. So whatever I execute here should return 3.
I tried writing up something which would remove all zero terms, but again struggled.
The thing I have tried in particular is:
rem( v ) = {
my( c );
while( c = vecsearch( v, 0 ); #c, v = vecextract( v, "^c" ) ); v
}
but vecextract doesn't seem to like this set up.
If you can ensure that all the elements lie within some fixed range, then a counting sort is enough. In PARI/GP it looks like this:
counts_for(v: t_VEC, lower: t_INT, upper: t_INT) = {
my(counts = vector(1+upper-lower));
for(i=1, #v, counts[1+v[i]-lower]++);
vector(#counts, i, [lower+i-1, counts[i]])
};
V1 = [0, 1, 2, 2, 3, 4, 6, 8, 8, 8];
vecsort(counts_for(V1, 0, 8), [2], 4)[1][1]
> 8
V2 = [0, 0, 0, 0, 3, 3, 5];
vecsort(counts_for(V2, 0, 5), [2], 4)[1][1]
> 0
You also can implement the following short-cut for the sake of convenience:
counts_for1(v: t_VEC) = {
counts_for(v, vecmin(v), vecmax(v))
};
most_frequent(v: t_VEC) = {
my(counts=counts_for1(v));
vecsort(counts, [2], 4)[1][1]
};
most_frequent(V1)
> 8
most_frequent(V2)
> 0
The function matreduce provides this in a more general setting: applied to a vector of objects, it returns a 2-column matrix whose first column contains the distinct objects and the second their multiplicity in the vector. (The function has a more general form that takes the union of multisets.) The rows are sorted by object, not by multiplicity, so pick the row whose multiplicity is largest; vecmax stores the index of a largest entry in its optional second argument:
most_frequent(v) =
{ my(M = matreduce(v), i);
vecmax(M[,2], &i);
M[i, 1]
}
most_frequent_non0(v) = most_frequent(select(e -> e != 0, v))
? most_frequent([ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ])
%1 = 8
? most_frequent([x, x, Mod(1,3), [], [], []])
%2 = []
? most_frequent_non0([ 0, 0, 0, 0, 3, 3, 5 ])
%3 = 3
? most_frequent_non0([x, x, Mod(1,3), [], [], []])
%4 = x
The first function will error out if fed an empty vector, and the second one if there are no non-zero entries. The second function tests for "0" using the e != 0 test (and we famously have [] == 0 in GP); for a more rigorous semantic, use !(e === 0) in the function definition.
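For comparison outside PARI/GP, here is a minimal Python sketch of the same counting idea. The function name most_common_nonzero is hypothetical, and the sketch assumes the entries are hashable (plain integers, as in the question). It raises ValueError when there is nothing to count, mirroring the error cases described above.

```python
from collections import Counter

def most_common_nonzero(values):
    """Return the most frequent non-zero entry of values.

    Ties are broken in favour of the entry seen first.
    """
    # Drop the zeros first, then count what is left.
    counts = Counter(x for x in values if x != 0)
    if not counts:
        raise ValueError("no non-zero entries to count")
    # most_common(1) returns a list holding one (value, count) pair.
    return counts.most_common(1)[0][0]
```

Removing the zeros before counting avoids having to inspect the runner-up when 0 happens to be the overall winner.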

How can I get `True` from `[1, 1, 0, 0, 0] == [0, 0, 1, 1, 0]` in Python?

Example:
I have a solution list a:
a = [1, 1, 0, 0, 0]
and input lists bs:
b1 = [1, 1, 0, 0, 0]
b2 = [0, 1, 1, 0, 0]
b3 = [0, 0, 1, 1, 0]
...
bn = [1, 0, 0, 0, 1]
If I compare a to any of b1, b2, ..., bn, I expect to get True from the comparison. Of course, this simple expression will not work:
if a == b:
...
because in Python two lists compare equal only if they contain the same elements in the same order.
Is there any beautiful math that I can easily implement in a programming language? I am thinking about building some hash function, but I'm still not sure how.
Notes: 1) it can easily be implemented with a plain for loop, but I need something more robust; 2) this may also be related to the problem in this post: Cyclic group
A simple solution could be to normalise the a and b values before comparing them, for example by sorting:
a_original = [5, 2, 3, 1, 4]
a_formatted = sorted(a_original)
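Then you can compare the formatted variables directly. A simple "for" loop can be used to format all of your variables. Note that sorted lists compare equal whenever they hold the same elements, so this also accepts lists that are not rotations of each other, such as [1, 0, 1, 0, 0].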
Hope this helps!
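If the comparison has to accept only cyclic shifts, one common trick is that two lists of equal length are rotations of each other exactly when one of them appears as a contiguous slice of the other concatenated with itself. A minimal sketch, where the function name is_rotation is hypothetical:

```python
def is_rotation(a, b):
    """Return True if list b is a cyclic rotation of list a."""
    if len(a) != len(b):
        return False
    if not a:
        # Two empty lists are trivially rotations of each other.
        return True
    doubled = a + a  # every rotation of a is a slice of a + a
    n = len(b)
    return any(doubled[i:i + n] == b for i in range(len(a)))
```

This takes quadratic time in the worst case, which is fine for short lists like the ones in the question. For long lists, a linear-time substring search over a + a would be the usual refinement.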

MPI Communication Pattern

I was wondering if there is a smart way to do this. Let's say I have three nodes, 0, 1, 2. And let's say each node has an array, a0, a1, a2. If the contents of the nodes are something like
a0 = {0, 1, 2, 1}
a1 = {1, 2, 2, 0}
a2 = {0, 0, 1, 2}
Is there a clever communication pattern to move each number to its corresponding node, i.e.
a0 = {0, 0, 0, 0}
a1 = {1, 1, 1, 1}
a2 = {2, 2, 2, 2}
The approach I have in mind, would involve sorting and temporary buffers, but I was wondering if there was a smarter way?
You can use MPI_Alltoallv for this in the following way:
1. Sort the local_data (a) by the destination node of each element, in increasing order.
2. Create a send_displacements array such that send_displacements[r] is the index of the first element in local_data that belongs to node r.
3. Create a send_counts array such that send_counts[r] equals the number of elements in local_data that belong to node r. This can be computed as send_counts[r] = send_displacements[r+1] - send_displacements[r]; for the last rank, use the length of local_data minus send_displacements[r].
4. MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT, comm)
5. Compute recv_displacements such that recv_displacements[r] = sum(recv_counts[r'] for all r' < r).
6. Prepare a recv_data buffer with sum(recv_counts) elements.
7. MPI_Alltoallv(local_data, send_counts, send_displacements, MPI_INT, recv_data, recv_counts, recv_displacements, MPI_INT, comm)
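The bookkeeping in these steps can be checked without an MPI installation. The following Python sketch makes no MPI calls: it models each rank as an entry of a list and performs the exchange with plain slicing. The names plan_send and simulate_alltoallv are hypothetical, and the sketch assumes, as in the question, that each value is itself the rank it belongs to.

```python
def plan_send(local_data, nranks):
    """Sort by destination rank and build send counts and displacements."""
    # Assumption: the value of an element is its destination rank.
    send_buf = sorted(local_data)
    send_counts = [0] * nranks
    for dest in send_buf:
        send_counts[dest] += 1
    # Displacements are the exclusive prefix sums of the counts.
    send_displs = [0] * nranks
    for r in range(1, nranks):
        send_displs[r] = send_displs[r - 1] + send_counts[r - 1]
    return send_buf, send_counts, send_displs

def simulate_alltoallv(all_local):
    """Mimic the data movement of the exchange for every rank at once."""
    nranks = len(all_local)
    plans = [plan_send(data, nranks) for data in all_local]
    result = []
    for dst in range(nranks):
        recv = []
        for src in range(nranks):
            buf, counts, displs = plans[src]
            # Rank dst receives the slice that rank src reserved for it.
            start = displs[dst]
            recv.extend(buf[start:start + counts[dst]])
        result.append(recv)
    return result

print(simulate_alltoallv([[0, 1, 2, 1], [1, 2, 2, 0], [0, 0, 1, 2]]))
```

In a real program each rank would only run the plan_send part on its own data and leave the slicing to MPI_Alltoallv.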

Mathematica plotting based on all previous equation results

I have a plot
Plot[40500*x^(-0.1), {x, 1, 100}, PlotRange -> {0, 50000}]
I'm trying to plot the cumulative of these y values. I'll try to explain with an example:
I'm trying to get
for x=1: 40500*1^(-0.1)
for x=2: 40500*(2^(-0.1)+1^(-0.1))
for x=3: 40500*(3^(-0.1)+2^(-0.1)+1^(-0.1))
and so on up to x=100.
Is there a way to do that?
Running some examples for x = 3
for x=3: 40500*(3^(-0.1)+2^(-0.1)+1^(-0.1))
114574.
This can be found using Sum:
Sum[40500*i^(-0.1), {i, 3}]
or using Fold
Fold[#1 + 40500*#2^(-0.1) &, 0, {1, 2, 3}]
114574.
FoldList outputs the intermediate steps.
FoldList[#1 + 40500*#2^(-0.1) &, 0, {1, 2, 3}]
{0, 40500., 78287.8, 114574.}
Accumulating to 100 and discarding the initial zero value:
ListLinePlot[Rest[FoldList[#1 + 40500*#2^(-0.1) &, 0, Range[100]]]]
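For readers who want to cross-check the numbers outside Mathematica, the same running total can be sketched in Python with itertools.accumulate, which plays the role of FoldList without the initial zero. The function name cumulative_values is hypothetical.

```python
from itertools import accumulate

def cumulative_values(n):
    """Running totals of 40500 * x**(-0.1) for x = 1..n."""
    terms = (40500 * x ** -0.1 for x in range(1, n + 1))
    return list(accumulate(terms))

totals = cumulative_values(100)
# totals[0] is the value for x = 1, totals[2] the value for x = 3, and so on.
print(totals[:3])
```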

Mathematica: integrate symbolic vector function

I wrote a program that defines two piecewise functions "gradino[x_]" and "gradino1[x_]", where x is a vector of m components.
I'm not able to write these functions explicitly using the x_i, I need to keep x as a vector.
I need to measure the distance between these two functions by computing:
Integrate[Abs[gradino[x]-gradino1[x]],{x[[1]],0,100},{x[[2]],0,100},{x[[3]],0,100}...{x[[m]],0,100}]
but it's not working.
Any idea how to do this? Remembering that I can't simply express gradino[x1_,x2_ etc...].
Re: "it's not working": posting the actual error message is usually a good idea.
In this case, "Part specification x[[1]] is longer than depth of object." tells you exactly what the problem is. If x is not already defined as a list, you cannot use list elements as integration variables.
f[y_] := y[[1]] y[[2]];
Integrate[ f[x] , {x[[1]], 0, 1}, {x[[2]], 0, 1}]
(* error Part specification x[[1]] is longer than depth of object. *)
If you first define x as a list, then it works:
x = Array[z, 2];
Integrate[ f[x] , {x[[1]], 0, 1}, {x[[2]], 0, 1}]
(*1/4*)
Note that you cannot do this with NIntegrate:
NIntegrate[ f[x] , {x[[1]], 0, 1}, {x[[2]], 0, 1}]
(*error Tag Part in x[[1]] is Protected *)
You need to use the explicit elements:
NIntegrate[ f[x] , {z[1], 0, 1}, {z[2], 0, 1}]
(* 0.25 *)
According to the model above, with
x = Array[z, 2];
why is the following OK:
f[y_] := NIntegrate[y[[1]] y[[2]] t, {t, 0, 1}];
NIntegrate[f[x], {z[1], 0, 1}, {z[2], 0, 1}]
but the following is not?
f[y_] := NIntegrate[y[[1]] y[[2]] Exp[t], {t, 0, 1}];
NIntegrate[f[x], {z[1], 0, 1}, {z[2], 0, 1}]
The only difference is changing t in the inner integration into Exp[t].
