Measure of Dispersion for Labelled Data

I have data of the form {'A': 5, 'B': 10, 'C': 3} and I'd like to compute a value that's between 0 and 1 for the level of dispersion between labeled data. For example, if all the data were of the form {'A':3}, then there would be no dispersion and the value computed would be 0. If the data were of the form {'A': 1, 'B': 1, 'C': 1}, then the value computed would be 1 because there is maximal dispersion between the labels. And finally if the data were of the form {'A': 1, 'B': 2} then the value computed would be somewhere between 0 and 1.
More specifically, I'm trying to compute a value that indicates the level of physical dispersion between offices. For example, if a team of 10 people is in a single office, then there's no dispersion. If they're in 10 different offices, then there's maximal dispersion. {'A': 2, 'B': 2, 'C': 2} should have a value less than {'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 1} because the latter is more dispersed.
What's the correct mathematical function for this?
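One candidate that matches every example in the question is the Shannon entropy of the label shares divided by the logarithm of the total head count. This is an assumption offered as a sketch, not an established "correct" function, and the name dispersion is hypothetical. Dividing by log(total) rather than by log(number of labels) is what makes {'A': 2, 'B': 2, 'C': 2} score lower than six offices with one person each.

```python
import math

def dispersion(counts):
    """Shannon entropy of the label shares, divided by log(total count).

    Hypothetical helper: returns 0.0 when everyone shares one label and
    a value close to 1.0 when every item has its own label.
    """
    total = sum(counts.values())
    if total <= 1:
        return 0.0  # zero or one item cannot be dispersed
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log(p)
    # Clamp to guard against floating-point overshoot just above 1.
    return min(1.0, entropy / math.log(total))

print(dispersion({'A': 3}))                  # 0.0
print(dispersion({'A': 1, 'B': 1, 'C': 1}))  # approximately 1
print(dispersion({'A': 1, 'B': 2}))          # strictly between 0 and 1
```

Other normalizations (for example, dividing by the log of the number of labels) would treat {'A': 2, 'B': 2, 'C': 2} as maximally dispersed, so the choice depends on which of the stated requirements matters most.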

Related

only update key in dictionary from another dictionary as reference

For simplicity, I have two dictionaries below. I want to update the second dictionary so that it gains any keys it is missing from the first dictionary, with their values set to 0.
original dictionaries:
dict1={'a': 1, 'b': 2, 'c': 3}
dict2 ={'a': 2, 'b': 2}
after update:
dict1 ={'a': 1, 'b': 2, 'c': 3}
dict2 ={'a': 2, 'b': 2, 'c': 0}

You can use dict2.update with a dict comprehension to do this:
dict1={'a': 1, 'b': 2, 'c': 3}
dict2 ={'a': 2, 'b': 2}
dict2.update({k:0 for k,v in dict1.items() if k not in dict2})
print (dict1)
print (dict2)
{'a': 1, 'b': 2, 'c': 3}
{'a': 2, 'b': 2, 'c': 0}

Counting digit Frequencies with Python

I need your help in Python 2.7.
I made a dictionary:
{'1': 1, '3': 1, '2': 5, '6': 5}
Question 1:
What can I do if I want to print the key that has the highest value?
Question 2 :
In this case, '2' and '6' have both the same high value,
so I want Python to print the digit (type: int) of the highest key only (6).
How can I program this?
I tried many times and failed...
I'm looking for an easier way, without using 'sort'.

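Simple Pythonic code (Python 2.7):
a = {'1': 1, '3': 1, '2': 5, '6': 5}
# Sort by count first, then by the digit's integer value, so ties on the count end with the largest digit
m = sorted(a.items(), key=lambda x: (x[1], int(x[0])))
# All keys that share the highest count
d = [i[0] for i in m if i[1] == m[-1][1]]
print d
# The highest key among them, as an int
print int(m[-1][0])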
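Since the question asks for a way that avoids sorting, here is a minimal sketch using max with a composite key. The function name highest_key_with_max_value is hypothetical, and the code assumes every key is a digit string that int() can parse.

```python
def highest_key_with_max_value(freq):
    # Compare by (count, digit as int): the count decides first,
    # and ties on the count go to the larger digit.
    best = max(freq, key=lambda k: (freq[k], int(k)))
    return int(best)

a = {'1': 1, '3': 1, '2': 5, '6': 5}
result = highest_key_with_max_value(a)
print(result)  # 6
```

The single pass with max avoids building a sorted list, and the tuple key handles both questions at once.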

MPI Communication Pattern

I was wondering if there was a smart way to do this. Let's say I have three nodes, 0, 1, 2. And let's say each node has an array, a0, a1, a2. If the contents of each node is something like
a0 = {0, 1, 2, 1}
a1 = {1, 2, 2, 0}
a2 = {0, 0, 1, 2}
Is there a clever communication pattern to move each number to its corresponding node, i.e.
a0 = {0, 0, 0, 0}
a1 = {1, 1, 1, 1}
a2 = {2, 2, 2, 2}
The approach I have in mind, would involve sorting and temporary buffers, but I was wondering if there was a smarter way?

You can use MPI_Alltoallv for this in the following way:
1. Sort the local_data (a) by the destination node of each element, in increasing order.
2. Create a send_displacements array such that send_displacements[r] is the index of the first element in local_data destined for node r.
3. Create a send_counts array such that send_counts[r] equals the number of elements in local_data destined for node r. This can be computed as send_counts[r] = send_displacements[r+1] - send_displacements[r], except for the last rank, where it is the length of local_data minus send_displacements[r].
4. MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT, comm)
5. Compute recv_displacements such that recv_displacements[r] = sum(recv_counts[r'] for all r' < r).
6. Prepare a recv_data buffer with sum(recv_counts) elements.
7. MPI_Alltoallv(local_data, send_counts, send_displacements, MPI_INT, recv_data, recv_counts, recv_displacements, MPI_INT, comm)
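The count and displacement bookkeeping in the steps above can be sketched in plain Python. This is only a single-process simulation under stated assumptions: it makes no MPI calls, the helper names plan_exchange and simulate_alltoallv are hypothetical, and each value is assumed to be the rank it should end up on, as in the question's example.

```python
def plan_exchange(local_data, num_ranks):
    """Return (sorted_data, send_counts, send_displacements) for one rank."""
    sorted_data = sorted(local_data)  # each value is its destination rank
    send_counts = [0] * num_ranks
    for dest in sorted_data:
        send_counts[dest] += 1
    # Displacements are the exclusive prefix sum of the counts.
    send_displacements = [0] * num_ranks
    for r in range(1, num_ranks):
        send_displacements[r] = send_displacements[r - 1] + send_counts[r - 1]
    return sorted_data, send_counts, send_displacements

def simulate_alltoallv(all_local_data):
    """Mimic the data movement of the exchange across all ranks."""
    num_ranks = len(all_local_data)
    plans = [plan_exchange(d, num_ranks) for d in all_local_data]
    received = []
    for dest in range(num_ranks):
        buf = []
        for src in range(num_ranks):
            data, counts, displs = plans[src]
            start = displs[dest]
            buf.extend(data[start:start + counts[dest]])
        received.append(buf)
    return received

nodes = [[0, 1, 2, 1], [1, 2, 2, 0], [0, 0, 1, 2]]
result = simulate_alltoallv(nodes)
print(result)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

In a real program the inner loops are replaced by the MPI_Alltoall and MPI_Alltoallv calls; the per-rank arrays are the same.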

Mathematica: part assignment

I'm trying to implement an algorithm to build a decision tree from a dataset.
I wrote a function to calculate the information gain between a subset and a particular partition, then I try all the possible partitions and want to choose the "best" one, in the sense that it has the lowest entropy.
This procedure must be recursive, hence, after the first iteration, it needs to work for every subset of the partition you got in the previous step.
These are the data:
X = {{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}, {1, 1, 0, 0}}
Xfin[0]=X
This is the function: for every subset of the partition, it tries all the possible partitions and calculates the IG. Then it selects the partition with IGMAX:
Partizioneottimale[X_, n_] :=
For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
For[v = 1, v <= m, v++,
If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
X[n + 1][[l]] := Partizione[X[n][[l]], v]]]]
then I call it:
Partizioneottimale[Xfin, 0]
and it works fine for the first one:
Xfin[1]
{{{1, 0, 1, 1}, {1, 1, 1, 1}, {0, 1, 1, 1}, {1, 1, 1, 0}}, {{1, 0, 0, 0}}}
That is the partition with lowest entropy.
But it doesn't work for the next ones:
Partizioneottimale[Xfin, 1]
SetDelayed::setps: Xfin[1+1] in the part assignment is not a symbol.
Has anybody any idea about how to solve this?
Thanks

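Without unraveling all your logic, a simple fix is to build the result in a temporary symbol and assign it to X[n+1] at the end, because part assignment (lhs[[i]] = ...) only works when lhs is a symbol: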
Partizioneottimale[X_, n_] := (
xnp1 = Table[Null, {Length[Flatten[X[n], n - 1]]}] ;
For[l = 1, l <= Length[Flatten[X[n], n - 1]], l++,
For[v = 1, v <= m, v++,
If[IG[X[n][[l]], Partizione[X[n][[l]], v]] == IGMAX[X[n][[l]]],
xnp1[[l]] = Partizione[X[n][[l]], v]]]] ;
X[n+1] = xnp1 ; )

Searching matrices in Mathematica 8 - Trying to find other elements on the same row as X

The text in italics describes my general goal, if anyone is interested. Question is underneath.
I am trying to graph the energy levels of simple molecules using Mathematica 8. My method is crude, and goes like this:
1. Find eigenvalues of simple Hückel matrix.
2. Delete duplicates and determine size of list.
3. Evaluate the number of degeneracies by comparing duplicate list with no-duplicate list.
4. Create an n x 2 zero matrix where n is the number of unique energy levels.
5. Fill first column with unique energy levels, second column with degeneracies.
The matrix generated in step 5 can look like this:
(1 2)
(3 1) == M
(-1 1)
I wish to evaluate the maximum of column 2, and then find the value of the element in the same row, but in column 1. In this case, the answer I am looking for is 1.
These commands both evaluate to -1:
Extract[M[[All, 1]], M[[Max[M[[All, 2]]], 1]]]
M[[Max[M[[All, 1]]], 1]]
which is not the answer I want.
Any tips?
EDIT: This
Part[Part[Position[M, Max[M[[All, 2]]]], 1], 1]
works, but I don't understand why I have to use Part[] twice.

m = {{1, 2}, {3, 1}, {-1, 1}}
max = Max[m[[All, 2]]]
So find the position of the max and replace the second column with the first:
pos=Position[m, max] /. {x_,_}:>{x,1}
{{1,1}}
Then take the first element from pos, i.e. {1,1}, and use it in Part:
m[[Sequence @@ First[pos]]]
1
But having said that I prefer something like this:
Cases[m, {x_, max} :> x]
{1}
The result is a list. You could either use First@Cases[...] or you might want to keep a list of results to cover cases where the maximum value occurs more than once in a column.

The inner Part gives you the first occurrence of the maximum. Position returns a list of positions, even if there is only one element that has the maximum value, like this:
M = {{2, 2}, {2, 3}, {2, 2}, {1, 1}}
{{2, 2}, {2, 3}, {2, 2}, {1, 1}}
Position[M, Max[M[[All, 2]]]]
{{2, 2}}
So you want the first element in the first element of this output. You could condense your code like this:
Position[M, Max[M[[All, 2]]]][[1, 1]]
However, one thing that I think your code needs to handle better is this case:
M = {{3, 2}, {2, 3}, {2, 2}, {1, 1}}
{{3, 2}, {2, 3}, {2, 2}, {1, 1}}
Position[M, Max[M[[All, 2]]]]
{{1, 1}, {2, 2}}
You will get the wrong answer with your code in this case.
Better would be:
M[[All, 1]][[Position[M[[All, 2]], Max[M[[All, 2]]]][[1, 1]] ]]
Or alternatively
M[[Position[M[[All, 2]], Max[M[[All, 2]]]][[1, 1]], 1]]

If you only want a single column-one value when the maximum occurs more than once in column two, I suggest that you make use of Ordering:
m = {{1, 3}, {1, 8}, {5, 7}, {2, 2}, {1, 9}, {4, 9}, {5, 6}};
m[[ Ordering[m[[All, 2]], -1], 1 ]]
{4}
