How to calculate mean value in a dictionary? - dictionary

I'm trying to compute the mean value of a dictionary that holds many lists.
My dictionary looks like:
{0:[0,1,3,6,1,-5,....],1:[0,3,7,3,-5,2,...],...}
with a total of k entries and lists of length N.
However, I am NOT trying to compute the mean of each list. What I need is the mean between the elements at the same position in the lists, such that (a,b)=mean, i.e. looking at the dictionary above: (0,0)=0, (1,3)=2, (3,7)=5,....
Is there a way to compute something like this?
Thanks.

You can unwrap the values and use zip to correlate matching indexes:
from numpy import mean
result = [mean(x) for x in zip(*(d.values()))]
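For reference, the same zip idea can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation: the sample dictionary d is made up from the values in the question, elementwise_mean is a hypothetical helper name, and the code assumes all lists have the same length (zip stops at the shortest list otherwise).

```python
from statistics import fmean

# Hypothetical sample data, shortened from the question.
d = {0: [0, 1, 3, 6, 1, -5], 1: [0, 3, 7, 3, -5, 2]}

def elementwise_mean(table):
    """Mean across the lists at each index, not the mean of each list."""
    # zip(*values) groups the i-th elements of every list together.
    return [fmean(column) for column in zip(*table.values())]

result = elementwise_mean(d)
print(result)
```

Each entry of result is the mean of one index position across all k lists, so the result has N entries.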

Given that all lists are of the same length:
var k = Object.keys(json).length; // k, number of lists
var n = json[0].length; // N, elements in lists
for (var i = 0; i < n; i++) {
    var sum = 0;
    for (var j = 0; j < k; j++) {
        sum += json[j][i];
    }
    var mean = sum / k;
    console.log(mean);
}

Related

Find the number of possible sums which add to N using (1,...,K)

I have the following problem to solve: given a number N and 1<=k<=N, count the number of possible sums of (1,...,k) which add to N. There may be equal factors (e.g. if N=3 and k=2, (1,1,1) is a valid sum), but permutations must not be counted (e.g., if N=3 and k=2, count (1,2) and (2,1) as a single solution). I have implemented the recursive Python code below but I'd like to find a better solution (maybe with dynamic programming? ). It seems similar to the triple step problem, but with the extra constraint of not counting permutations.
def find_num_sums_aux(n, min_k, max_k):
    # base case
    if n == 0:
        return 1
    count = 0
    # due to lower bound min_k, we evaluate only ordered solutions and prevent permutations
    for i in range(min_k, max_k+1):
        if n-i>=0:
            count += find_num_sums_aux(n-i, i, max_k)
    return count
def find_num_sums(n, k):
    count = find_num_sums_aux(n,1,k)
    return count
This is a standard dynamic programming problem (a counting variant of the subset sum problem).
Let's define the function f(i,j) as the number of ways you can get the sum j using a subset of the numbers (1...i); then the answer to your problem is f(k,n).
Each number x in the range (1...i) either is part of the sum j or is not, so we need to count both possibilities.
Note: f(i,0) = 1 for any i, which means there is exactly one way to get the sum 0: taking no number from the range (1...i).
Here is the code written in C++:
int n = 10;
int k = 7;
int f[8][11];
// initializing the array with zeroes
for (int i = 0; i <= k; i++)
    for (int j = 0; j <= n; j++)
        f[i][j] = 0;
f[0][0] = 1;
for (int i = 1; i <= k; i++) {
    for (int j = 0; j <= n; j++) {
        if (j == 0)
            f[i][j] = 1;
        else {
            f[i][j] = f[i - 1][j]; // without adding i to the sum j
            if (j - i >= 0)
                f[i][j] = f[i][j] + f[i - 1][j - i]; // adding i to the sum j
        }
    }
}
cout << f[k][n] << endl; // print f(k,n)
Update
To handle the case where elements can be repeated, e.g. (1,1,1) giving the sum 3, you just need to allow picking the same element multiple times by changing the following line of code:
f[i][j] = f[i][j] + f[i - 1][j - i];//adding i to the sum
To this:
f[i][j] = f[i][j] + f[i][j - i];
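As an illustration, the repeated-elements variant can be sketched in Python with a single row instead of the full table. This is a sketch under assumptions: count_sums is a hypothetical name, and the one-dimensional array relies on iterating j upward so that the same part can be reused, which mirrors the f[i][j - i] term above.

```python
def count_sums(n, k):
    """Count the multisets of numbers from 1..k that add up to n."""
    # ways[j] = number of ways to form the sum j with the parts seen so far
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to reach 0: take nothing
    for part in range(1, k + 1):
        # Ascending j allows the same part to be used more than once.
        for j in range(part, n + 1):
            ways[j] += ways[j - part]
    return ways[n]

# N=3, k=2: the two solutions are (1,1,1) and (1,2)
print(count_sums(3, 2))
```

The loop runs in O(n*k) time and O(n) memory, and permutations are never counted twice because the parts are introduced in a fixed order.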

Shuffle an array in Arduino software

I have a problem with shuffling this array with the Arduino software:
int questionNumberArray[10]={0,1,2,3,4,5,6,7,8,9};
Does anyone know a built-in function or a way to shuffle the values in the array without any repeats?
The simplest way would be this little for loop:
int questionNumberArray[] = {0,1,2,3,4,5,6,7,8,9};
const size_t n = sizeof(questionNumberArray) / sizeof(questionNumberArray[0]);
for (size_t i = 0; i < n - 1; i++)
{
    size_t j = random(i, n);
    int t = questionNumberArray[i];
    questionNumberArray[i] = questionNumberArray[j];
    questionNumberArray[j] = t;
}
Let's break it down line by line, shall we?
int questionNumberArray[] = {0,1,2,3,4,5,6,7,8,9};
You don't need to specify the number of cells if you initialize an array like that. Just leave the brackets empty like I did.
const size_t n = sizeof(questionNumberArray) / sizeof(questionNumberArray[0]);
I decided to store the number of cells in the constant n. The sizeof operator gives you the number of bytes taken by your array and the number of bytes taken by one cell. Divide the first number by the second and you have the size of your array.
for (size_t i = 0; i < n - 1; i++)
Please note that the loop stops before n - 1. We don't need i to ever take the value of the last index.
size_t j = random(i, n);
We declare a variable j that points to a random cell with an index of at least i. Arduino's random(min, max) returns a value from min up to max - 1, so j always stays within bounds. That is also why i never needs the value n - 1: the only possible j would then be i itself, and the swap would change nothing. We get the random number with Arduino's random function: https://www.arduino.cc/en/Reference/Random
int t = questionNumberArray[i];
questionNumberArray[i] = questionNumberArray[j];
questionNumberArray[j] = t;
A simple swap of two values. It's possible to do it without the temporary variable t, but the code is less readable then.
In my case the result was as follows:
questionNumberArray[0] = 0
questionNumberArray[1] = 9
questionNumberArray[2] = 7
questionNumberArray[3] = 4
questionNumberArray[4] = 6
questionNumberArray[5] = 5
questionNumberArray[6] = 1
questionNumberArray[7] = 8
questionNumberArray[8] = 2
questionNumberArray[9] = 3
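The swap loop described above is a Fisher-Yates shuffle. For comparison, here is a minimal Python sketch of that technique, in which j is drawn from the indices i to n - 1. The function name fisher_yates_shuffle and the rng parameter are assumptions made for illustration, not part of the Arduino answer.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Shuffle items in place so that no value is lost or repeated."""
    n = len(items)
    for i in range(n - 1):
        # randrange(i, n) returns an index from i to n - 1 inclusive.
        j = rng.randrange(i, n)
        items[i], items[j] = items[j], items[i]
    return items

question_numbers = list(range(10))
fisher_yates_shuffle(question_numbers)
print(question_numbers)
```

Because the loop only ever swaps elements, the result is always a permutation of the input, which is what "without any repeats" requires.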

R: How to compute correlation between rows of a matrix without having to transpose it?

I have a big matrix and am interested in computing the correlation between the rows of the matrix. Since the cor method computes correlation between the columns of a matrix, I am transposing the matrix before calling cor. But since the matrix is big, transposing it is expensive and is slowing down my program. Is there a way to compute the correlations among the rows without having to take transpose?
EDIT: Thanks for the responses. I thought I'd share some findings. My input matrix is 16 rows by 239766 columns and comes from a .mat file. I wrote C# code to do the same thing using the csmatio library. It looks like this:
foreach (var file in Directory.GetFiles(path, interictal_pattern))
{
    var reader = new MatFileReader(file);
    var mla = reader.Data[0] as MLStructure;
    convert(mla.AllFields[0] as MLNumericArray<double>, data);
    double sum = 0;
    for (var i = 0; i < 16; i++)
    {
        for (var j = i + 1; j < 16; j++)
        {
            sum += cor(data, i, j);
        }
    }
    var avg = sum / 120;
    if (++count == 10)
    {
        var t2 = DateTime.Now;
        var t = t2 - t1;
        Console.WriteLine(t.TotalSeconds);
        break;
    }
}
static double[][] createArray(int rows, int cols)
{
    var ans = new double[rows][];
    for (var row = 0; row < rows; row++)
    {
        ans[row] = new double[cols];
    }
    return ans;
}
static void convert(MLNumericArray<double> mla, double[][] M)
{
    var rows = M.Length;
    var cols = M[0].Length;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            M[i][j] = mla.Get(i, j);
}
static double cor(double[][] M, int i, int j)
{
    var count = M[0].Length;
    double sum1 = 0, sum2 = 0;
    for (int ctr = 0; ctr < count; ctr++)
    {
        sum1 += M[i][ctr];
        sum2 += M[j][ctr];
    }
    var mu1 = sum1 / count;
    var mu2 = sum2 / count;
    double numerator = 0, sumOfSquares1 = 0, sumOfSquares2 = 0;
    for (int ctr = 0; ctr < count; ctr++)
    {
        var x = M[i][ctr] - mu1;
        var y = M[j][ctr] - mu2;
        numerator += x * y;
        sumOfSquares1 += x * x;
        sumOfSquares2 += y * y;
    }
    return numerator / Math.Sqrt(sumOfSquares1 * sumOfSquares2);
}
This gave a throughput of 22.22 s for 10 files, or 2.22 s/file.
Then I profiled my R code:
ptm=proc.time()
for(file in files)
{
  i = i + 1;
  mat = readMat(paste(path,file,sep=""))
  a = t(mat[[1]][[1]])
  C = cor(a)
  correlations[i] = mean(C[lower.tri(C)])
}
print(proc.time()-ptm)
To my surprise, it runs faster than the C# code, with a throughput of 5.7 s per 10 files, or about 0.6 s/file (an improvement of almost 4x!). The bottleneck in C# is the methods inside the csmatio library that parse double values from the input stream.
And if I do not convert the csmatio classes into a double[][], the C# code runs extremely slowly (an order of magnitude slower, ~20-30 s/file).
Seeing that this problem arises from a data input issue whose details are not stated (and only hinted at in a comment), I will assume this is a comma-delimited file of unquoted numbers with the number of columns= Ncol. This does the transposition on input.
in.mat <- matrix( scan("path/to/the_file/fil.txt", what =numeric(0), sep=","),
                  ncol=Ncol, byrow=TRUE)
cor(in.mat)
One dirty work-around would be to apply cor-functions row-wise and produce the correlation matrix from the results. You could try if this is any more efficient (which I doubt, though you could fine-tune it by not double computing everything or the redundant diagonal cases):
# Apply 2-fold nested row-wise functions
set.seed(1)
dat <- matrix(rnorm(1000), nrow=10)
cormat <- apply(dat, MARGIN=1, FUN=function(z) apply(dat, MARGIN=1, FUN=function(y) cor(z, y)))
cormat[1:3,1:3] # Show few first
# [,1] [,2] [,3]
#[1,] 1.000000000 0.002175792 0.1559263
#[2,] 0.002175792 1.000000000 -0.1870054
#[3,] 0.155926259 -0.187005418 1.0000000
Though, generally I would expect the transpose to have a really, really efficient implementation, so it's hard to imagine when that would be the bottle-neck. But, you could also dig through the implementation of 'cor' function and call the correlation C-function itself by first making sure your rows are suitable. Type 'cor' in the terminal to see the implementation, which is mostly a wrapper that makes input suitable for the C-function:
# Row with C-call from the implementation of 'cor':
# if (method == "pearson")
# .Call(C_cor, x, y, na.method, FALSE)
You can use outer:
outer(seq(nrow(mat)), seq(nrow(mat)),
Vectorize(function(x, y) cor(mat[x , ], mat[y , ])))
where mat is the name of your matrix.
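The row-wise idea from the answers above, with each pair computed only once and the diagonal skipped, can be sketched as follows. This is an illustration in plain Python rather than R. The function names and the toy matrix are assumptions, and the code assumes no row is constant (a constant row would cause a division by zero).

```python
import math

def row_correlation(m, i, j):
    """Pearson correlation between rows i and j of m (rows of equal length)."""
    n = len(m[i])
    mu_i = sum(m[i]) / n
    mu_j = sum(m[j]) / n
    num = ss_i = ss_j = 0.0
    for a, b in zip(m[i], m[j]):
        x, y = a - mu_i, b - mu_j
        num += x * y
        ss_i += x * x
        ss_j += y * y
    return num / math.sqrt(ss_i * ss_j)

def mean_pairwise_row_correlation(m):
    """Average correlation over all unordered pairs of distinct rows."""
    rows = len(m)
    pairs = [(i, j) for i in range(rows) for j in range(i + 1, rows)]
    return sum(row_correlation(m, i, j) for i, j in pairs) / len(pairs)

# Hypothetical toy matrix: 3 rows, 4 columns.
mat = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
print(mean_pairwise_row_correlation(mat))
```

Rows are indexed directly, so no transposed copy of the matrix is ever built. For 16 rows this visits 120 pairs, matching the divisor used in the C# code.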

Solving linear equations with numeric.js

I must be missing something really simple here. I've got some JS code that creates simple linear systems (I'm trying to create the shortest line between two skew lines). I've gotten to the point where I have Ax = b, and need to solve for x. A is a 3 x 2 matrix, b is 3 x 1.
I have:
function build_equation_system(v1, v2, b) {
    var a = [ [v1.x, v2.x], [v1.y, v2.y], [v1.z, v2.z] ];
    var b = [ [b.x], [b.y], [b.z]];
    return numeric.solve(a,b)
}
Numeric returns a 1 x 3 matrix of NaNs, even when there is a solution.
Using numeric you can do the following:
Create a function which computes the pseudoinverse of your A matrix:
function pinv(A) {
    return numeric.dot(numeric.inv(numeric.dot(numeric.transpose(A),A)),numeric.transpose(A));
}
Use that function to solve your linear least squares equation to get the coefficients.
var p = numeric.dot(pinv(a),b);
I tried your initial method of using numeric.solve and could not get it to work either so I'd be interested to know what the problem is.
A simple test...
var x = new Array(10);
var y = new Array(10);
for (var i = 0; i < 10; ++i) {
    x[i] = i;
    y[i] = i;
}
// Solve for the first order equation representing this data
var n = 1;
// Construct Vandermonde matrix.
var A = numeric.rep([x.length, n + 1], 1);
for (var i = 0; i < x.length; ++i) {
    for (var j = n-1; j >= 0; --j) {
        A[i][j] = x[i] * A[i][j+1];
    }
}
// Solves the system Ap = y
var p = numeric.dot(pinv(A),y);
// p = [1, 2.55351295663786e-15]
I've used this method to recreate MATLAB's polyfit for Javascript use.
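The pseudoinverse formula above is equivalent to solving the normal equations (A^T A) x = A^T b. For the two-column case in the question, that can be sketched in plain Python without any library. The name least_squares_2col is hypothetical, b is passed as a flat list rather than as a column matrix, and the example system is made up for illustration.

```python
def least_squares_2col(a, b):
    """Least-squares solution of A x = b for an m x 2 matrix A."""
    # Entries of the symmetric 2 x 2 matrix A^T A
    s00 = sum(r[0] * r[0] for r in a)
    s01 = sum(r[0] * r[1] for r in a)
    s11 = sum(r[1] * r[1] for r in a)
    # Entries of the vector A^T b
    t0 = sum(r[0] * bi for r, bi in zip(a, b))
    t1 = sum(r[1] * bi for r, bi in zip(a, b))
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    # Cramer's rule applied to (A^T A) x = A^T b
    return [(t0 * s11 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]

# Hypothetical 3 x 2 system that has the exact solution x = [2, 3].
a = [[1, 0], [0, 1], [1, 1]]
b = [2, 3, 5]
x = least_squares_2col(a, b)
print(x)
```

When the three equations are consistent, as here, the least-squares solution coincides with the exact one. Otherwise it minimizes the squared residual.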

Algorithm to generate interval graph

I wonder if there is any algorithm or some easy procedure to generate an interval graph?
I need to generate interval graphs with n nodes, where n ranges from 1 to, say, 10000.
If possible, I need an incidence matrix representation of the graph.
An additional restriction is that these graphs should not all be complete.
Thanks everyone in advance!
==ADDITION==
Here is an implementation in Java:
public Object generate(int numberOfNodes) {
    int listCapacity = numberOfNodes * 2;
    List<Integer> arr = new ArrayList<Integer>();
    int[][] adjacencyMatrix = new int[numberOfNodes][numberOfNodes];
    Integer nodeNumber = 0;
    for (int i = 0; i < listCapacity; i = i + 2) {
        arr.add(nodeNumber);
        arr.add(nodeNumber);
        nodeNumber++;
    }
    Collections.shuffle(arr);
    for (int i = 0; i < numberOfNodes; i++) {
        for (int j = arr.indexOf(i); j < arr.lastIndexOf(i); j++) {
            adjacencyMatrix[i][arr.get(j)] = 1;
            adjacencyMatrix[arr.get(j)][i] = 1;
        }
        adjacencyMatrix[i][i] = 0;
    }
    return new Graph(adjacencyMatrix);
}
However, in some cases it fails to produce an interval graph.
One possible way to generate an interval graph with n nodes:
1. Create an array [1, 1, 2, 2, ... n, n].
2. Shuffle the array.
3. Create a graph:
   - each node v_i corresponds to the pair of occurrences of i in the shuffled array;
   - two nodes v_i and v_j are connected with an edge iff i and j are interleaved in the array, that is i j i j or i j j i, but not i i j j. In other words, the intervals i and j intersect.
This graph is guaranteed to be an interval graph (every node is an interval in the original array), and every interval graph can be created this way.
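The procedure above can be sketched in Python as follows. This is a minimal sketch under assumptions: nodes are labelled 0..n-1 instead of 1..n, the result is an adjacency matrix (as in the Java attempt) rather than an incidence matrix, and both function names are hypothetical.

```python
import random

def adjacency_from_sequence(seq, n):
    """Adjacency matrix for a sequence that contains each of 0..n-1 twice."""
    first, last = {}, {}
    for pos, node in enumerate(seq):
        first.setdefault(node, pos)  # position of the first occurrence
        last[node] = pos             # ends up as the second occurrence
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two intervals intersect unless one ends before the other starts.
            if first[i] < last[j] and first[j] < last[i]:
                adj[i][j] = adj[j][i] = 1
    return adj

def random_interval_graph(n, rng=random):
    """Shuffle [0, 0, 1, 1, ...] and read the intervals off the result."""
    seq = [node for node in range(n) for _ in range(2)]
    rng.shuffle(seq)
    return adjacency_from_sequence(seq, n)

print(adjacency_from_sequence([0, 1, 0, 1, 2, 2], 3))
```

The pairwise check takes O(n^2) time, which is the same order as filling the n x n matrix itself. Whether a particular shuffle happens to yield a complete graph is left to chance here.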
