My implementation of k-means gives different results - initialization

I tried implementing Lloyd's algorithm and it seemed good until I ran it multiple times. Sometimes it gives the results I want, sometimes it gives strange centres.
I tried to change the condition so it stops when it has converged, but it doesn't help. Sorry for not translating comments to English, I hope it's clear enough.
The only randomness I have in the code is in the situation where my cluster empties so I replace it with a random point. I have no other idea what to do when this happens.
I can't see the problem. Can you give me an idea what might be the problem from the result figures?
This is my code:
(A is a matrix whose rows are my points)
% initialization of centroids; farthest-first method
n=size(A,1);
dim=size(A,2);
centri=zeros(k,dim); % matrix of centroids
for i=1:n
    centri(1,:)=centri(1,:)+A(i,:);
end
centri(1,:)=centri(1,:)/n;
for j=2:k % in each step, choose as the next centre the point farthest from centres 1..j-1
    maks=zeros(1,n);
    % maks(i) is the largest distance from point i to a centre: max d(x(i),c) over centres c
    for i=1:n
        dist=zeros(1,j-1);
        for l=1:j-1
            dist(l)=norm(A(i,:)-centri(l,:));
        end
        if(size(dist,2)==1) maks(i)=dist;
        else
            maks(i)=max(dist);
        end
        %maks(i)=0;
        %for l=1:j-1
        %    if(maks(i)<dist(l)) maks(i)=dist(l);
        %    end
        %end
    end
    [maksi, ind]=max(maks);
    centri(j,:)=A(ind(1),:);
end
indeksi=zeros(1,n);
for i=1:n
    indeksi(i)=randi(k,1);
end
% centri now holds the initial centroids
br_iter=0;
tic
while br_iter<=1000
    br_iter=br_iter+1;
    for i=1:n
        dist=zeros(1,k); % distances from point x to centre j
        for j=1:k
            dist(j)=norm(A(i,:)-centri(j,:));
        end
        [mini, ind]=min(dist); % ind is the index at which the minimum is attained
        indeksi(i)=ind(1); % take the first one
    end
    % now compute the new centroids as the arithmetic mean of all vectors assigned to each
    for j=1:k
        centri(j,:)=zeros(1,dim);
        brojac=0;
        for i=1:n
            if indeksi(i)==j
                centri(j,:)=centri(j,:)+A(i,:);
                brojac=brojac+1;
            end
        end
        if brojac
            centri(j,:)=centri(j,:)/brojac;
        else
            ind=randi(n, 1);
            centri(j,:)=A(ind,:);
        end
    end
end
toc
for i=1:n
    plot(A(i,1), A(i,2), '.b');
    if(i==1) hold on;
    end
end
for i=1:k
    plot(centri(i,1), centri(i,2), '+r');
end
hold off

Starting with centers all zero is not a recommended approach.
After the first iteration, all but one of these centers will be empty. So randomness does have an effect on your result.
The results you show are typical for k-means. It is not guaranteed to find the optimum; it can get stuck in a "local optimum".
So I don't think there is an error in your code. The starting condition is just not chosen very wisely, and you are mistaken to expect k-means to always give good results.
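To illustrate the dependence on the starting condition, here is a minimal Python sketch (not the asker's MATLAB code). It assumes random initial centres drawn from the data, and it uses one common remedy for local optima: run k-means several times and keep the run with the lowest total squared distance. The names kmeans and best_of and the restarts parameter are made up for this example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random start; returns (centres, cost)."""
    centres = rng.sample(points, k)  # assumption: start from k distinct data points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            clusters[nearest].append(p)
        # An empty cluster is re-seeded with a random point, as in the question.
        new_centres = [mean(cl) if cl else rng.choice(points) for cl in clusters]
        if new_centres == centres:  # converged: the centres no longer move
            break
        centres = new_centres
    cost = sum(min(sq_dist(p, c) for c in centres) for p in points)
    return centres, cost

def best_of(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, cost = best_of(data, 2)
    print(centres, cost)
```

Restarting does not guarantee the global optimum either; it only makes a bad local optimum less likely.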


Schroders Big number sequence

I am implementing a recursive program to calculate certain values in the Schroder sequence, and I'm having two problems:
1. I need to calculate the number of calls in the program;
2. Past a certain number, the program generates incorrect values (I think it's because the number is too big).
Here is the code:
let rec schroder n =
  if n <= 0 then 1
  else if n = 1 then 2
  else 3 * schroder (n-1) + sum n 1
and sum n k =
  if (k > n-2) then 0
  else schroder k * schroder (n-k-1) + sum n (k+1)
When I try to return tuples (1.), the function sum stops working because it's trying to return int when it has type int * int;
Regarding 2., when I do schroder 15 it returns:
-357364258
when it should be returning
3937603038.
EDIT:
First, thanks for the tips. Second, after some hours of struggle I managed to write the function; now my problem is installing zarith. I think I got it installed, but:
In the terminal, when I run ocamlc -I +zarith test.ml, I get an error saying Required module 'Z' is unavailable.
In utop, after doing #load "zarith.cma";; and #install_printer Z.pp_print;;, I can compile and run the function, and it works. However, I'm trying to use Scanf.scanf so that I can print different values of the sequence. Whenever I try to run the scanf, I don't get a chance to type any number; I get a message saying that '\\n' is not a decimal digit.
I will most probably also have problems printing the value, because I don't think I can print such a big number with %d. The let r1,c1 = in the following code is an example of what I'm talking about.
Here's what i'm using :
(function)
..
let v1, v2 = Scanf.scanf "%d %d" (fun v1 v2-> v1,v2);;
let r1,c1 = schroder_a (Big_int_Z.of_int v1) in
Printf.printf "%d %d\n" (Big_int_Z.int_of_big_int r1) (Big_int_Z.int_of_big_int c1);
let r2,c2 = schroder_a v2 in
Printf.printf "%d %d\n" r2 c2;
P.S. 'r1' and 'r2' stand for the result, and 'c1' and 'c2' stand for the number of calls of schroder's recursive function.
P.P.S. The prints are written differently because I was just testing, but I can't even get past the scanf, so..
This is the third time I've seen this problem here on StackOverflow, so I assume it's some kind of school assignment. As such, I'm just going to make some comments.
OCaml doesn't have a function named sum built in. If it's a function you've written yourself, the obvious suggestion would be to rewrite it so that it knows how to add up the tuples that you want to return. That would be one approach, at any rate.
It's true, ints in OCaml are subject to overflow. If you want to calculate larger values you need to use a "big number" package. The one to use with a modern OCaml is Zarith (I have linked to the description on ocaml.org).
However, none of the other people solving this assignment have mentioned overflow as a problem. It could be that you're OK if you just solve for representable OCaml int values.
3937603038 is larger than what a 32-bit int can hold, and will therefore overflow. You can fix this by using int64 instead (until you overflow that too). You'll have to use int64 literals, using the L suffix, and operations from the Int64 module. Here's your code converted to compute the value as an int64:
let rec schroder n =
  if n <= 0 then 1L
  else if n = 1 then 2L
  else Int64.add (Int64.mul 3L (schroder (n-1))) (sum n 1)
and sum n k =
  if (k > n-2) then 0L
  else Int64.add (Int64.mul (schroder k) (schroder (n-k-1))) (sum n (k+1))
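As a quick check of the overflow explanation, the following Python sketch reproduces the wrong value from the question by wrapping the true result to a fixed-width signed integer. The helper wrap_signed is invented for this illustration. OCaml's native int is 31 bits wide on 32-bit platforms and 63 bits wide on 64-bit ones; for this particular value, wrapping at 31 and at 32 bits gives the same number.

```python
def wrap_signed(value, bits):
    """Interpret value modulo 2**bits as a two's-complement signed integer."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >> (bits - 1):           # top bit set: the number is negative
        return value - (1 << bits)
    return value

if __name__ == "__main__":
    print(wrap_signed(3937603038, 32))  # the value reported in the question
    print(wrap_signed(3937603038, 31))
    print(wrap_signed(3937603038, 63))  # fits, so it is unchanged
```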
I need to calculate the number of calls in the program;
...
the function 'sum' stops working because it's trying to return 'int' when it has type 'int * int'
Make sure that you have updated all the recursive calls to schroder. Remember that it now returns a pair, not a number, so you can't, for example, just add it to something; you need to unpack the pair first. E.g.,
...
else
let r,i = schroder (n-1) (i+1) in
3 * r + sum n 1 and ...
and so on.
Past a certain number, the program will generate incorrect values (I think it's because the number is too big);
You need to use arbitrary-precision numbers, e.g., zarith.
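The two suggestions (return a pair, and use arbitrary-precision numbers) can be sketched together in Python, whose built-in integers are already arbitrary precision. This is an illustration of the idea, not a translation of the asker's final OCaml code: here both functions return a (value, calls) pair, and only calls to schroder are counted, which is an assumption about what the assignment wants.

```python
def schroder(n):
    """Return (value, calls): the n-th Schroder number and the number of
    calls to schroder needed to compute it, this call included."""
    if n <= 0:
        return 1, 1
    if n == 1:
        return 2, 1
    prev, prev_calls = schroder(n - 1)   # unpack the pair before using it
    total, sum_calls = partial_sum(n, 1)
    return 3 * prev + total, 1 + prev_calls + sum_calls

def partial_sum(n, k):
    """Return (value, calls) for the sum over k..n-2 of S(k) * S(n-k-1)."""
    if k > n - 2:
        return 0, 0
    left, left_calls = schroder(k)
    right, right_calls = schroder(n - k - 1)
    rest, rest_calls = partial_sum(n, k + 1)
    return left * right + rest, left_calls + right_calls + rest_calls

if __name__ == "__main__":
    for n in (3, 15):
        value, calls = schroder(n)
        print(n, value, calls)
```

The same shape carries over to OCaml: every place that used the result of schroder or sum as a number has to destructure the pair first, and the additions and multiplications become Zarith operations.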

How to calculate the number distribution of data in IDL

I have data consisting of time and flux (4117 rows x 2 columns). I want to calculate and plot the number distribution of the brightness variation between all pairs of consecutive data points, as in the picture "distribution of brightness variation".
This is the code I used in IDL:
nx=4117
t=fltarr(nx)
f=fltarr(nx)
df=fltarr(nx-1)
dt=fltarr(nx-1)
n=4116
dff=fltarr(n)
dc=fltarr(n-1)
data=read_table('data.dat')
;print,data(0,*) ;this is t (time)
;print,data(1,*) ;this is f (flux)
; Plot the light curve
window,0
plot,data(0,*)/data(0,0),data(1,*)/data(1,0),yrange=[0.93,1.1],ystyle=1
; calculate the flux difference (dff)
for i=0,nx-2 do begin
  df(i)=data(1,i+1)/data(1,0) - data(1,i)/data(1,0)
  dt(i)=data(0,i+1)/data(0,0) - data(0,i)/data(0,0)
endfor
for i=0,n-1 do dff(i)=min(df)+i*(max(df)-min(df))/float(n-1.0)
print,dff
; calculate the number distribution (dc), I want the counter to reset to zero after every point and start to count again
for i=0,n-2 do begin
  c=0.0
  for j=0,nx-2 do begin
    IF (df(j) < dff(i+1)) or (df(j) > dff(i)) THEN begin
      c=c+1
      dc(i)=c
    endif
  endfor
  print, dc(i)
endfor
end
When I run the code, every value of dc is 4116. I think the way I calculated dc is wrong. Any suggestions for doing this properly?
I'm pretty sure the problem is this line:
IF (df(j) < dff(i+1)) or (df(j) > dff(i)) THEN begin
In IDL, a < b is shorthand for "the lesser of a and b", and a > b is likewise "the greater of a and b". So right now, your IF statement is actually evaluated as:
IF (df(j) or dff(i+1), whichever is less) OR (df(j) or dff(i), whichever is more) THEN begin
and since non-zero floats evaluate as TRUE in IDL, it's ultimately this:
IF TRUE or TRUE THEN begin
Because the IF statement is always true, c is always incremented, and every value of dc ends up being 4116.
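To fix this, you want to use the relational operators LT and GT, which return TRUE or FALSE. Note also that joining the two tests with OR would still accept every value, because any df(j) is either below dff(i+1) or above dff(i); to count the values that fall between the two bin edges, the tests have to be joined with AND. In other words, your code should read:
IF (df(j) LT dff(i+1)) AND (df(j) GT dff(i)) THEN begin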
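For comparison, here is a short Python sketch of the same counting task: take the differences between consecutive flux values and count how many fall into each of a fixed number of equal-width bins. The function name and the n_bins parameter are invented for this example, and the binning (left-closed bins, with the maximum placed in the last bin) is one reasonable choice rather than the one in the question.

```python
def difference_histogram(flux, n_bins):
    """Count consecutive differences of `flux` in n_bins equal-width bins.

    Returns a list of n_bins counts covering [min(diff), max(diff)].
    """
    diffs = [b - a for a, b in zip(flux, flux[1:])]
    lo, hi = min(diffs), max(diffs)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for d in diffs:
        if width == 0:
            index = 0                      # all differences are identical
        else:
            # the maximum would land in bin n_bins, so clamp it into the last bin
            index = min(int((d - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

if __name__ == "__main__":
    flux = [1.0, 2.0, 2.0, 4.0, 3.0]       # differences: 1, 0, 2, -1
    print(difference_histogram(flux, 3))
```

Each difference is placed directly into its bin, so the data are scanned once instead of once per bin.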

The Eight-Queen Puzzle in Programming in Lua Fourth Edition

I'm currently reading Programming in Lua Fourth Edition and I'm already stuck on the first exercise of "Chapter 2. Interlude: The Eight-Queen Puzzle."
The example code is as follows:
N = 8 -- board size
-- check whether position (n, c) is free from attacks
function isplaceok (a, n, c)
  for i = 1, n - 1 do -- for each queen already placed
    if (a[i] == c) or -- same column?
       (a[i] - i == c - n) or -- same diagonal?
       (a[i] + i == c + n) then -- same diagonal?
      return false -- place can be attacked
    end
  end
  return true -- no attacks; place is OK
end
-- print a board
function printsolution (a)
  for i = 1, N do -- for each row
    for j = 1, N do -- and for each column
      -- write "X" or "-" plus a space
      io.write(a[i] == j and "X" or "-", " ")
    end
    io.write("\n")
  end
  io.write("\n")
end
-- add to board 'a' all queens from 'n' to 'N'
function addqueen (a, n)
  if n > N then -- all queens have been placed?
    printsolution(a)
  else -- try to place n-th queen
    for c = 1, N do
      if isplaceok(a, n, c) then
        a[n] = c -- place n-th queen at column 'c'
        addqueen(a, n + 1)
      end
    end
  end
end
-- run the program
addqueen({}, 1)
The code's quite commented and the book's quite explicit, but I can't answer the first question:
Exercise 2.1: Modify the eight-queen program so that it stops after
printing the first solution.
At the end of this program, a contains all possible solutions; I can't figure out whether addqueen (a, n) should be modified so that a contains only one possible solution, or whether printsolution (a) should be modified so that it only prints the first possible solution.
Even though I'm not sure I fully understand backtracking, I tried to implement both hypotheses without success, so any help would be much appreciated.
At the end of this program, a contains all possible solutions
As far as I understand the solution, a never contains all possible solutions; it either includes one complete solution or one incomplete/incorrect one that the algorithm is working on. The algorithm is written in a way that simply enumerates possible solutions skipping those that generate conflicts as early as possible (for example, if first and second queens are on the same line, then the second queen will be moved without checking positions for other queens, as they wouldn't satisfy the solution anyway).
So, to stop after printing the first solution, you can simply add os.exit() after printsolution(a) line.
Listing 1 is an alternative to implement the requirement. The three lines, commented respectively with (1), (2), and (3), are the modifications to the original implementation in the book and as listed in the question. With these modifications, if the function returns true, a solution was found and a contains the solution.
-- Listing 1
function addqueen (a, n)
  if n > N then -- all queens have been placed?
    return true -- (1)
  else -- try to place n-th queen
    for c = 1, N do
      if isplaceok(a, n, c) then
        a[n] = c -- place n-th queen at column 'c'
        if addqueen(a, n + 1) then return true end -- (2)
      end
    end
    return false -- (3)
  end
end
-- run the program
a = {1}
if not addqueen(a, 2) then print("failed") end
printsolution(a)
a = {1, 4}
if not addqueen(a, 3) then print("failed") end
printsolution(a)
Let me start from Exercise 2.2 in the book, which, based on my past experience explaining "backtracking" algorithms to other people, may help to better understand the original implementation and my modifications.
Exercise 2.2 requires generating all possible permutations first. A straightforward and intuitive solution is in Listing 2, which uses nested for-loops to generate all permutations and validates them one by one in the innermost loop. Although it fulfills the requirement of Exercise 2.2, the code does look awkward. It is also hard-coded to solve an 8x8 board.
-- Listing 2
local function allsolutions (a)
  -- generate all possible permutations
  for c1 = 1, N do
    a[1] = c1
    for c2 = 1, N do
      a[2] = c2
      for c3 = 1, N do
        a[3] = c3
        for c4 = 1, N do
          a[4] = c4
          for c5 = 1, N do
            a[5] = c5
            for c6 = 1, N do
              a[6] = c6
              for c7 = 1, N do
                a[7] = c7
                for c8 = 1, N do
                  a[8] = c8
                  -- validate the permutation
                  local valid
                  for r = 2, N do -- start from 2nd row
                    valid = isplaceok(a, r, a[r])
                    if not valid then break end
                  end
                  if valid then printsolution(a) end
                end
              end
            end
          end
        end
      end
    end
  end
end
-- run the program
allsolutions({})
Listing 3 is equivalent to Listing 2 when N = 8. The for-loop in the else-end block does what the whole set of nested for-loops in Listing 2 does. Using a recursive call makes the code not only compact but also flexible, i.e., it is capable of solving an NxN board and a board with pre-set rows. However, recursive calls can sometimes cause confusion. I hope the code in Listing 2 helps.
-- Listing 3
local function addqueen (a, n)
  n = n or 1
  if n > N then
    -- verify the permutation
    local valid
    for r = 2, N do -- start from 2nd row
      valid = isplaceok(a, r, a[r])
      if not valid then break end
    end
    if valid then printsolution(a) end
  else
    -- generate all possible permutations
    for c = 1, N do
      a[n] = c
      addqueen(a, n + 1)
    end
  end
end
-- run the program
addqueen({}) -- empty board, equivalent allsolutions({})
addqueen({1}, 2) -- a queen in 1st row and 1st column
Comparing the code in Listing 3 with the original implementation, the difference is that Listing 3 validates only after all eight queens are placed on the board, while the original implementation validates every time a queen is added and does not go on to the next row if the newly added queen causes a conflict. This is what "backtracking" is about: it is a brute-force search that abandons a search branch as soon as it finds a node that cannot lead to a solution, and it has to reach a leaf of the search tree to determine that it has a valid solution.
Back to the modifications in Listing 1.
(1) When the function hits this point, it reaches a leaf of the search tree and a valid solution is found, so let it return true representing success.
(2) This is the point at which to stop the function from searching further. In the original implementation, the for-loop continues regardless of what happened in the recursive call. With modification (1) in place, the recursive call returns true if a solution was found; in that case the function needs to stop and propagate the success signal back. Otherwise, it continues the for-loop, searching for other possible solutions.
(3) This is the point where the function returns after finishing the for-loop. With modifications (1) and (2) in place, reaching this point means that it failed to find a solution, so let it explicitly return false representing failure.
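The same three modifications can be shown in a compact Python sketch. It is written for illustration only: it uses 0-based rows and columns and takes the board size as a parameter, both of which differ from the book's Lua code, and the function names are made up.

```python
def is_place_ok(queens, row, col):
    """True if a queen at (row, col) is not attacked by the queens in rows 0..row-1."""
    for r in range(row):
        c = queens[r]
        if c == col or abs(c - col) == row - r:   # same column or same diagonal
            return False
    return True

def add_queen(queens, row, n):
    """Place queens from `row` onward; return True as soon as one solution is found."""
    if row == n:
        return True                               # (1) reached a leaf: success
    for col in range(n):
        if is_place_ok(queens, row, col):
            queens[row] = col
            if add_queen(queens, row + 1, n):
                return True                       # (2) propagate success, stop searching
    return False                                  # (3) no column worked in this row

def first_solution(n):
    """Return the first solution found as a list of columns, or None if there is none."""
    queens = [None] * n
    return queens if add_queen(queens, 0, n) else None

if __name__ == "__main__":
    print(first_solution(8))
```

Because the search stops by returning a value instead of exiting the process, the caller decides what to do with the result, for example print the solution or report a failure.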

Simple subtraction in Verilog

I've been working on a hex calculator for a while, but seem to be stuck on the subtraction portion, particularly when B>A. I'm trying to simply subtract two positive integers and display the result. It works fine for A>B and A=B. So far I'm able to use two 7-segment displays to show the integers to be subtracted, and I get the proper difference as long as A>=B.
When B>A I see a pattern that I'm not able to debug because of my limited knowledge in Verilog case/if-else statements. Forgive me if I'm not explaining the best way but what I'm observing is that once the first number, A, "reaches" 0 (after being subtracted from) it loops back to F. The remainder of B is then subtracted from F rather than 0.
For example: If A=1, B=3
A - B =
1 - 1 = 0
0 - 1 = F
F - 1 = E
Another example could be 4-8=C
Below are the important snippets of code I've put together thus far.
First, my subtraction statement
always @*
begin
    begin
        Cout1 = 7'b1000000; //0
    end
    case(PrintDifference[3:0])
        4'b0000 : Cout0 = 7'b1000000; //0
        4'b0001 : Cout0 = 7'b1111001; //1
        ...
        4'b1110 : Cout0 = 7'b0000110; //E
        4'b1111 : Cout0 = 7'b0001110; //F
    endcase
end
My subtraction is pretty straightforward
output [4:0]Difference;
output [4:0] PrintDifference;
assign PrintDifference = A-B;
I was thinking I could just do something like
if A>=B, Difference = A-B
else, Difference = B-A
Thank you everyone in advance!
This is the expected behaviour of two's complement addition and subtraction, which I would recommend reading up on since it is so essential.
A negative result can be changed back into an unsigned magnitude by inverting all the bits and adding one. Checking the most significant bit will tell you whether the number is negative or not.
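The arithmetic can be checked with a small Python model of the hardware. It assumes 4-bit inputs and the 5-bit difference declared in the question, treats bit 4 as the sign bit, and uses an invented function name; it is a model of the behaviour, not Verilog.

```python
def subtract_4bit(a, b):
    """Model a 5-bit two's-complement subtraction of two 4-bit values.

    Returns (low_nibble, negative, magnitude):
      low_nibble - what a single hex digit display of bits [3:0] would show
      negative   - True if the most significant bit (bit 4) is set
      magnitude  - |a - b|, recovered by inverting the bits and adding one
    """
    raw = (a - b) & 0x1F                 # keep 5 bits, as in output [4:0]
    negative = bool(raw & 0x10)          # most significant bit is the sign
    magnitude = (~raw + 1) & 0x1F if negative else raw
    return raw & 0xF, negative, magnitude

if __name__ == "__main__":
    for a, b in [(1, 3), (4, 8), (5, 2)]:
        low, negative, magnitude = subtract_4bit(a, b)
        print(f"{a} - {b}: display {low:X}, negative={negative}, magnitude={magnitude}")
```

For 1 - 3 the low nibble is E and for 4 - 8 it is C, matching the values seen on the display, while the recovered magnitudes are 2 and 4.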

How to turn these loops into big-O notation

The question:
For the pseudo-code given below, with T_i being the time (instruction period) needed to run the i-th line, provide the total execution time in big-O notation.
// get a positive integer from input
if n > 10
    print "this might take a while"
for k=1 to n
    for j = 1 to k
        print k*j
print "Done!"
I know what the code does, but I can't work out how to express it in big-O notation.
EDIT: loop as php
for k=1 to n
    for j = 1 to k
        print k*j
The outer loop will iterate n times, that part is easy. Since it does no work other than running the inner loop we can ignore it for the purposes of Big O calculation. The inner loop will iterate 1 + 2 + 3 + 4 ... + n times which is a triangular number or (n*(n+1))/2. Big O notation ignores constants, so that can be simplified to O(n*n) or O(n^2).
It's worth noting that the worst, best and average case for this algorithm are all the same.
The inner loop body runs n*(n+1)/2 times, so it's O(n^2).
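The count can be confirmed empirically with a few lines of Python that mirror the two loops and count how often the innermost statement runs (the function name is made up for this check):

```python
def count_inner_iterations(n):
    """Count how many times the body of the inner loop executes for a given n."""
    count = 0
    for k in range(1, n + 1):        # for k = 1 to n
        for j in range(1, k + 1):    # for j = 1 to k
            count += 1               # stands in for: print k*j
    return count

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, count_inner_iterations(n), n * (n + 1) // 2)
```

Doubling n roughly quadruples the count, which is the quadratic growth that O(n^2) describes.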
