I tried the code from the squeen.icl example. With BoardSize :== 11 there is no problem, but when I change it to 12, the output is just [. Why, and how can I fix it?
module squeen
import StdEnv
BoardSize :== 12
Queens::Int [Int] [[Int]] -> [[Int]]
Queens row board boards
    | row>BoardSize = [board : boards]
    | otherwise     = TryCols BoardSize row board boards
TryCols::Int Int [Int] [[Int]] -> [[Int]]
TryCols 0 row board boards = boards
TryCols col row board boards
    | Save col 1 board = TryCols (col-1) row board queens
    | otherwise        = TryCols (col-1) row board boards
    where queens = Queens (row+1) [col : board] boards
Save::!Int !Int [Int] -> Bool
Save c1 rdiff [] = True
Save c1 rdiff [c2:cols]
    | cdiff==0 || cdiff==rdiff || cdiff==0-rdiff = False
    | otherwise = Save c1 (rdiff+1) cols
    where cdiff = c1 - c2
Start::(Int,[Int])
Start = (length solutions, hd solutions)
    where solutions = Queens 1 [] []
This is because you're running out of heap space. By default, the heap of a Clean program is 2M. You can change this, of course: when using clm from the command line, you can add -h 4M to the clm command line or to the command line of the compiled program itself. If you're using the Clean IDE, you can change the heap size through Project Options, Application.
The reason that ( is still printed (which is what I get, rather than [) is the following. A Clean program outputs as much of its result as possible as early as possible, rather than waiting until the whole output is known. This means, for example, that a simple line such as Start = [0..] will immediately start filling your terminal instead of waiting until the whole infinite list is in memory and then printing it. In the case of squeen.icl, Clean sees that the result of Start will be a tuple and therefore prints the opening parenthesis right away. However, while computing the elements of the tuple (length solutions and hd solutions), the heap fills up and the program terminates.
I don't know what a full heap looks like on Windows, but on Linux (and Mac), it looks like this:
$ clm squeen -o squeen && ./squeen -h 2M
Linking squeen
Heap full.
Execution: 0.13 Garbage collection: 0.03 Total: 0.16
($
Note that the opening parenthesis of the tuple is on the last line, so in a terminal this error is quite easy to spot.
Interestingly, since length is tail-recursive, the first element of the tuple can be computed on its own even with a small heap (you can try this by replacing the second element with []). Likewise, the second element can be computed on its own with a small heap (replace the first element with 0).
The point is that length is computed before hd, since its result has to be printed first. With a plain length call, the parts of the list that have already been visited can be garbage collected (after iterating over the first 100 elements, they can be discarded, which keeps heap usage small). Here, however, the pending hd call keeps the first element of the list alive. If the first element cannot be discarded, then neither can the second, the third, and so on. Hence the whole list is kept in memory, although this is not actually necessary. Swapping the length and hd calls solves the issue:
Start :: ([Int], Int)
Start = (hd solutions, length solutions)
    where solutions = Queens 1 [] []
Now, after hd has been called, there is no reason to keep the whole list in memory, so length can discard elements it has iterated over, and the heap doesn't fill up.
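The retention argument above can be mimicked in C++, using reference counting as a stand-in for Clean's garbage collector. This is an analogy only: Clean's collector works by reachability rather than by counting, and Node and peak_live_nodes are hypothetical names. The list is produced one cell at a time, like a lazily evaluated list, and the function reports how many cells were alive at the same moment (assumes C++17).

```cpp
#include <algorithm>
#include <memory>

// Analogy only: a cell stays alive while anything still points to it.
struct Node {
    static inline int live = 0;  // number of cells currently alive
    int value;
    std::shared_ptr<Node> next;
    explicit Node(int v) : value(v) { ++live; }
    ~Node() { --live; }
};

// Produces a list of n cells one at a time and returns the largest number
// of cells alive at once. keep_head = true plays the role of the pending
// `hd solutions`, which still needs the first cell.
int peak_live_nodes(int n, bool keep_head) {
    auto cur = std::make_shared<Node>(0);
    std::shared_ptr<Node> head = keep_head ? cur : nullptr;
    int peak = Node::live;
    for (int i = 1; i < n; ++i) {
        cur->next = std::make_shared<Node>(i);  // produce the next cell on demand
        cur = cur->next;  // without `head`, the previous cell is freed here
        peak = std::max(peak, Node::live);
    }
    return peak;
}
```

Walking 1000 cells keeps a single cell alive when nothing else holds the first one, but all 1000 when the first cell is kept, which mirrors why putting hd first lets length run in a small heap.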
My OpenCL kernel crunches some numbers. This particular kernel then searches an array of 8-bit char4 vectors for a matching string of numbers. For example, if the array holds 3 67 8 2 56 1 3 7 8 2 0 2, the kernel loops over it (the actual string is 1024 digits long), searches for 1 3 7 8 2, and "returns" data letting the host program know that it found a match.
As a combined learning exercise and programming experiment, I wanted to see whether I could loop over an array and search for a range of values, where the array holds char4 vectors rather than plain char values, WITHOUT using a single if statement in the kernel. Two reasons:
1: After half an hour of compile errors, I realized that you cannot do:
if(charvector[3] == searchvector[0])
because comparing two vectors yields a vector (some components may match and some may not), not a single true/false value. And 2:
I'm new to OpenCL, I've read a lot about how branches can hurt a kernel's speed, and if I understand the internals of kernels correctly, some math may actually be faster than if statements. Is that the case?
Anyway... first, the kernel in question:
void search(__global uchar4 *rollsrc, __global uchar *srch, char srchlen)
{
    size_t gx = get_global_id(0);
    size_t wx = get_local_id(0);
    __private uint base = 0;
    __local uchar4 queue[8092];
    __private uint chunk = 8092 / get_local_size(0);
    __private uint ctr, start, overlap = srchlen-1;
    __private int4 srchpos = 0, srchtest = 0;
    uchar4 searchfor;
    event_t e;
    start = max((int)((get_group_id(0)*32768) - overlap), 0);
    barrier(CLK_LOCAL_MEM_FENCE);
    e = async_work_group_copy(queue, rollsrc+start, 8092, 0);
    wait_group_events(1, &e);
    for(ctr = 0; ctr < chunk+overlap; ctr++) {
        base = min((uint)((get_group_id(0) * chunk) + ctr), (uint)((N*32768)-1));
        searchfor.x = srch[max(srchpos.x, 0)];
        searchfor.y = srch[max(srchpos.y, 0)];
        searchfor.z = srch[max(srchpos.z, 0)];
        searchfor.w = srch[max(srchpos.w, 0)];
        srchpos += max((convert_int4(abs_diff(queue[base], searchfor))*-100), -100) | 1;
        srchpos = max(srchpos, 0);
        srchtest = clamp(srchpos-(srchlen-1), 0, 1) << 31;
        srch[0] |= (any(srchtest) * 255);
        // if(get_group_id(0) == 0 && get_local_id(0) == 0)
        //     printf("%u: %v4u %v4u\n", ctr, srchpos, srchtest);
    }
    barrier(CLK_LOCAL_MEM_FENCE);
}
There's some unneeded code in there; it was copied from a previous kernel and I haven't cleaned up the extra junk yet. That said, here is, in short and in plain English, how the math-based "if statement" works:
Since I need to search for a range, and I'm searching a vector, I first set a char4 vector (searchfor) so that each of its elements x, y, z and w is individually set to the number I am searching for. It's done individually because each of x, y, z and w holds a different stream, and the search counter (how many matches in a row we've had) will be different for each member of the vector. I'm sure there's a better way to do it than what I did. Suggestions?
So then, an int4 vector, srchpos, which holds the current position in the search array for each of the 4 vector positions, gets this added to it:
max((convert_int4(abs_diff(queue[base], searchfor))*-100), -100) | 1;
What this does: it takes the absolute difference between the current element of the target queue (queue) and the searchfor vector set in the previous 4 lines. The result is a vector where each member is either a positive number (no match) or zero (a match, no difference).
It's converted to int4 (as uchar cannot be negative), then multiplied by -100, then run through max(x, -100). Now each member is either -100 or 0. ORing it with 1 turns that into -99 or 1.
End result: srchpos either increments by 1 (a match) or is reduced by 99, resetting any previous partial match. (Searches can be up to 96 characters long, so it is possible to match 91 characters and then miss, and the counter has to be able to wipe all of that out.) It is then max'ed with 0 so any negative result is clamped to zero. Again, I'm open to suggestions to make that more efficient. I realized while writing this that I could probably use saturating addition to remove some of the max calls.
The last part takes the current srchpos, which now equals the number of consecutive matches, subtracts srchlen - 1 (one less than the length of the search string), and clamps the result to 0..1, ending up with either 1 (a full match) or 0. We shift this left by 31, so the result is 0 or 0x80000000, and put it into srchtest.
Lastly, we bitwise OR the first character of the search string with any(srchtest) * 255. It's one of the few ways I'm aware of to test something across a vector and get a single integer back from it (any() returns 1 if any member of the vector has its MSB set, which the shift in the previous step takes care of).
End result? srch[0] is unchanged, or, in the case of a match, it's set to 0xff. When the kernel returns, the host can read back srch from the buffer. If the first character is 0xff, we found a match.
It probably has too many steps and can be cleaned up. It also may be less efficient than just doing 4 if checks per loop. Not sure.
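To check the arithmetic above off the GPU, here is a host-side C++17 emulation of the matcher. It is a sketch, not the original program: Lanes and branchless_search are hypothetical names, each lane of a uchar4 is treated as an independent stream, and the `<< 31` plus any() step is collapsed into "did any lane reach a full match". One deliberate difference: the lookup index is clamped to srchlen - 1, because once a lane has fully matched, srchpos equals srchlen, and the kernel's srch[max(srchpos.x, 0)] would read one element past the pattern unless that buffer is padded.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Lanes = std::array<std::uint8_t, 4>;  // stands in for OpenCL uchar4

// Returns 255 if any lane saw the whole pattern, else 0 (what gets ORed into srch[0]).
// Assumes srch is non-empty.
int branchless_search(const std::vector<Lanes>& queue, const std::vector<std::uint8_t>& srch) {
    const int srchlen = static_cast<int>(srch.size());
    std::array<int, 4> srchpos{};  // consecutive matches per lane
    int flag = 0;
    for (const Lanes& v : queue) {
        bool any_hit = false;
        for (int lane = 0; lane < 4; ++lane) {
            int idx = std::min(srchpos[lane], srchlen - 1);               // stay inside srch
            int diff = std::abs(int(v[lane]) - int(srch[idx]));           // abs_diff + convert_int4
            int step = std::max(diff * -100, -100) | 1;                   // match: 1, mismatch: -99
            srchpos[lane] = std::max(srchpos[lane] + step, 0);            // reset on mismatch
            int hit = std::clamp(srchpos[lane] - (srchlen - 1), 0, 1);    // 1 on a full match
            any_hit = any_hit || hit == 1;                                // replaces << 31 and any()
        }
        flag |= int(any_hit) * 255;
    }
    return flag;
}
```

Feeding the example stream 3 67 8 2 56 1 3 7 8 2 0 2 into any one lane with the pattern 1 3 7 8 2 returns 255. Like the kernel, the emulation resets a lane to 0 on a mismatch without re-testing that same value against srch[0], so a stream such as 1 1 3 7 8 2 is not reported as a match.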
But, after this massive post, the thing that has me pulling my hair out:
When I UNCOMMENT the two lines at the end that print debug information, the script works. This is the end of the output in my terminal window as I run it:
36: 0,0,0,0 0,0,0,0
37: 0,0,0,0 0,0,0,0
38: 0,0,0,0 0,0,0,0
39: 0,0,0,0 0,0,0,0
Search = 613.384 ms
Positive
Done read loop: -1 27 41
Positive means the string was found. The -1 27 41 is the first 3 characters of the search string, the first being set to -1 (signed char on the host side).
Here's what happens when I comment out the printf debugging info:
Search = 0.150 ms
Negative
Done read loop: 55 27 41
IT DOES NOT FIND IT. What?! How is that possible? Of course, I noticed that the execution time jumps from 0.15 ms to 600+ ms because of the printf, so I thought that maybe the host is somehow reading the data back BEFORE the kernel finishes, and the extra delay from the printf gives it a pause. So I added a barrier(CLK_LOCAL_MEM_FENCE); at the end, thinking that would make sure all work items are done before returning. Nope. No effect. I then added a 2-second sleep on the host side, after running the kernel and clFinish, and before clReadBuffer.
NOPE! Still Negative. But I put the printf back in - and it works. How is that possible? Why? Does anyone have any idea? This is the first time I've had a programming bug that baffled me to the point of pulling hair out, because it makes absolutely zero sense. The work items are not clashing, they each read their own block, and even have an overlap in case the search string is split across two work item blocks.
Please - save my hair - how can a printf of irrelevant data cause this to work and removing it causes it to not?
Oh - one last fun thing: If I remove the parameters from the printf - just have it print text like "grr please work" - the kernel returns a negative, AND, nothing prints out. The printf is ignored.
What the heck is going on? Thanks for reading, I know this was absurdly long.
For anyone referencing this question in the future, the issue was caused by my arrays being read out of bounds. When that happens, all heck breaks loose and all results are unpredictable.
Once I fixed the work and group size and made sure I was not exceeding the memory bounds, it worked as expected.
I need to implement a tree structure in Fortran for a project, so I've read various guides online explaining how to do it. However, I keep getting errors or weird results.
Let's say I want to build a binary tree where each node stores an integer value. I also want to be able to insert new values into a tree and to print the nodes of the tree. So I wrote a type "tree" that contains an integer, two pointers to the child sub-trees, and a logical flag, isleaf, which I set to .true. if there are no child sub-trees:
module class_tree
  implicit none
  type tree
    logical :: isleaf
    integer :: value
    type (tree), pointer :: left,right
  end type tree
  interface new
    module procedure newleaf
  end interface
  interface insert
    module procedure inserttree
  end interface
  interface print
    module procedure printtree
  end interface
contains
  subroutine newleaf(t,n)
    implicit none
    type (tree), intent (OUT) :: t
    integer, intent (IN) :: n
    t % isleaf = .true.
    t % value = n
    nullify (t % left)
    nullify (t % right)
  end subroutine newleaf
  recursive subroutine inserttree(t,n)
    implicit none
    type (tree), intent (INOUT) :: t
    integer, intent (IN) :: n
    type (tree), target :: tleft,tright
    if (t % isleaf) then
      call newleaf(tleft,n)
      call newleaf(tright,n)
      t % isleaf = .false.
      t % left => tleft
      t % right => tright
    else
      call inserttree(t % left,n)
    endif
  end subroutine inserttree
  recursive subroutine printtree(t)
    implicit none
    type (tree), intent (IN) :: t
    if (t % isleaf) then
      write(*,*) t % value
    else
      write(*,*) t % value
      call printtree(t % left)
      call printtree(t % right)
    endif
  end subroutine printtree
end module class_tree
Insertion always goes into the left sub-tree, unless the target is a leaf. In that case, the value is inserted into both sub-trees, so that every node always has either 0 or 2 children. Printing is done in prefix (pre-order) traversal.
Now if I try to run the following program:
program main
  use class_tree
  implicit none
  type (tree) :: t
  call new(t,0)
  call insert(t,1)
  call insert(t,2)
  call print(t)
end program main
I get the desired output 0 1 2 2 1. But if I add "call insert(t,3)" after "call insert(t,2)" and run again, the output is 0 1 2 0 and then I get a segfault.
I tried to see whether the fault happened during insertion or printing so I tried to run:
program main
  use class_tree
  implicit none
  type (tree) :: t
  call new(t,0)
  call insert(t,1)
  call insert(t,2)
  write(*,*) 'A'
  call insert(t,3)
  write(*,*) 'B'
  call print(t)
end program main
This makes the segfault go away, but I get very weird output: A B 0 1 2673568 6 1566250180.
When searching online for similar errors, I found results like here, saying it might be due to too many recursive calls. However, the call to insert(t,3) should only make 3 recursive calls... I've also tried compiling with gfortran using -g -Wall -pedantic -fbounds-check and running it in a debugger. The fault seems to happen at the "if (t % isleaf)" line in the printing subroutine, but I have no idea how to make sense of that.
Edit:
Following the comments, I have compiled with -g -fbacktrace -fcheck=all -Wall in gfortran and tried to check the state of the memory. I'm quite new to this so I'm not sure I'm using my debugger (gdb) correctly.
After the three insertions and before the call to print, it seems that everything went well: for example when I type p t % left % left % right % value in gdb I get the expected output (that is 3). If I just type p t, the output is (.FALSE.,0,x,y), where x and y are hexadecimal numbers (memory addresses, I guess). However, if I try p t % left, I get something like a "description" of the pointer:
PTR TO -> (Type tree
logical(kind=4) :: isleaf
integer(kind=4) :: value
which repeats itself a lot since each pointer points to a tree that contains two pointers. I would have expected an output similar to that of p t, but I have no idea whether that's normal.
I also tried to examine the memory: for example if I type x/4uw t % left, I get 4 words, the first 2 words seem to correspond to isleaf and value, the last 2 to memory addresses. By following the memory addresses like that, I managed to visit all the nodes and I didn't find anything wrong.
The segfault happens within the printing routine. If I type p t after the fault, it says I cannot access the 0x0 address. Does that mean my tree is somehow modified when I try to print it?
The reason for your problems is that variables which go out of scope are no longer valid, and pointers to them are left dangling. This is in contrast to languages like Python, where an object stays alive as long as something still refers to it (reference counting).
In your particular case, this means that the calls newleaf(tleft, n) and newleaf(tright, n) set up the local variables tleft and tright, but these variables go out of scope when inserttree returns, so t % left and t % right end up pointing to invalid memory.
A better approach is to allocate each leaf as it is needed (except the first one, which is declared in the main program and therefore stays in scope until the end of the program):
recursive subroutine inserttree(t,n)
  implicit none
  type (tree), intent (INOUT) :: t
  integer, intent (IN) :: n
  if (t % isleaf) then
    ! allocate the children so that they outlive this call
    allocate(t%left)
    allocate(t%right)
    call newleaf(t%left,n)
    call newleaf(t%right,n)
    t % isleaf = .false.
  else
    call inserttree(t % left,n)
  endif
end subroutine inserttree
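The same lifetime rule holds in C++, where it may be easier to see. Below is a rough C++14 analogue of the corrected module (a port for illustration, not Fortran; Tree, new_leaf, insert and prefix are hypothetical names). std::make_unique plays the role of allocate, while pointing a raw Tree* at a local Tree inside insert would reproduce the original bug, because that local dies when the function returns.

```cpp
#include <memory>
#include <vector>

struct Tree {
    bool isleaf = true;
    int value = 0;
    std::unique_ptr<Tree> left, right;  // owning pointers: children live on the heap
};

// Counterpart of newleaf: a fresh heap-allocated leaf.
std::unique_ptr<Tree> new_leaf(int n) {
    auto t = std::make_unique<Tree>();
    t->value = n;
    return t;
}

// Counterpart of the corrected inserttree.
void insert(Tree& t, int n) {
    if (t.isleaf) {
        t.left = new_leaf(n);   // heap allocation outlives this call
        t.right = new_leaf(n);
        t.isleaf = false;
    } else {
        insert(*t.left, n);
    }
}

// Prefix traversal, collecting values instead of printing them.
void prefix(const Tree& t, std::vector<int>& out) {
    out.push_back(t.value);
    if (!t.isleaf) {
        prefix(*t.left, out);
        prefix(*t.right, out);
    }
}
```

Starting from a tree holding 0 and inserting 1, 2 and 3 yields the prefix order 0 1 2 3 3 2 1, which extends the 0 1 2 2 1 output from the question.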
I've created the following toy example that counts in a loop and writes the value to an Async.Pipe:
open Sys
open Unix
open Async.Std
let (r,w) = Pipe.create ()
let rec readloop r =
  Pipe.read r >>=
  function
  | `Eof -> return ()
  | `Ok v -> return (printf "Got %d\n" v) >>=
    fun () -> after (Core.Time.Span.of_sec 0.5) >>=
    fun () -> readloop r
let countup hi w =
  let rec loop i =
    printf "i=%d\n" i ;
    if (i < hi &&( not (Pipe.is_closed w))) then
      Pipe.write w i >>>
      fun () -> loop (i+1)
    else Pipe.close w
  in
  loop 0
let () =
  countup 10 w;
  ignore(readloop r);;
Core.Never_returns.never_returns (Scheduler.go ())
Notice that the readloop function is recursive: it continuously reads values from the Pipe as they become available. However, I've added a 0.5 s delay between reads. The countup function is similar, but it loops and writes to the same Pipe.
When I run this I get:
i=0
i=1
Got 0
i=2
Got 1
i=3
Got 2
i=4
Got 3
i=5
Got 4
i=6
Got 5
i=7
Got 6
i=8
Got 7
i=9
Got 8
i=10
Got 9
Apart from the first three lines of output above, each subsequent line appears only after the half-second wait. So it seems that the Pipe blocks after a write until there is a read from the Pipe (Pipe.write w data appears to block waiting for a Pipe.read r).
What I thought should happen (since this is an Async Pipe of some sort) is that values would be queued up in the Pipe until the reads take place, something like:
i=0
Got 0 (* now reader side waits for 1/2 second before reading again *)
i=1 (* meanwhile writer side keeps running *)
i=2
i=3
i=4
i=5
i=6
i=7
i=8
i=9 (* up till here, all output happens pretty much simultaneously *)
Got 1 (* 1/2 second between these messages *)
Got 2
Got 3
Got 4
Got 5
Got 6
Got 7
Got 8
Got 9
Is there a way to get this behavior using Async?
My real use case is that I have a TCP socket open (as a client). If I were using threads, then after some setup between client and server I would start a thread that just sits and reads data coming in from the server over the socket and puts it into a queue of messages that the main thread can examine when it's ready. However, instead of threads I want to use Core.Async to achieve the same thing: read data from the socket as it comes in from the server and, when data is available, examine the message and do something based on its content. There could be other things going on as well, which is what the "wait half a second" in the code above simulates. I thought Pipe would queue up the messages so that they could be read when the reader side was ready, but that doesn't seem to be the case.
Indeed, a pipe is a queue, but by default its size budget is 0. Because of pushback, the producer therefore stops as soon as it has written a value and waits until the reader has taken it. You can change the size with the Pipe.set_size_budget function.
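To see why a budget of 0 produces exactly the interleaving in the question, here is a toy, single-threaded model of pushback in C++. It is not the Async API: writer_ticks and its tick-based scheduling are assumptions made for illustration. A write is pushed back while the queue holds more than budget values, and the reader takes one value per tick, where one tick stands for the 0.5 s sleep.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy pushback model. Returns, for each value i in [0, hi), the tick at
// which the writer wrote it (i.e. printed "i=...").
std::vector<int> writer_ticks(int hi, std::size_t budget) {
    std::deque<int> queue;   // values written but not yet read
    std::vector<int> ticks;
    int next = 0;
    // The writer continues only while the queue is within its size budget.
    auto run_writer = [&](int tick) {
        while (next < hi && queue.size() <= budget) {
            ticks.push_back(tick);
            queue.push_back(next++);
        }
    };
    for (int tick = 0; next < hi || !queue.empty(); ++tick) {
        run_writer(tick);
        if (!queue.empty()) queue.pop_front();  // reader: "Got ...", then sleeps one tick
        run_writer(tick);                       // the read may release a pushed-back writer
    }
    return ticks;
}
```

With budget 0 and hi = 10 this returns 0 0 1 2 3 4 5 6 7 8: i=0 and i=1 are written immediately, then one value per half second, as in the output above. With a budget of 10, every value is written at tick 0, which is the queued behavior the question was after.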
I was trying some basic pointer manipulation and have an issue I would like clarified. Here is the code snippet I am referring to:
int arr[3] = {0};
*(arr+0) = 12;
*(arr+1) = 24;
*(arr+2) = 74;
*(arr+3) = 55;
cout<<*(arr+3)<<"\t"<<(long)(arr+3)<<endl;
//cout<<"Address of array arr : "<<arr<<endl;
cout<<(long)(arr+0)<<"\t"<<(long)(arr+1)<<"\t"<<(long)(arr+2)<<endl;;
for(int i=0;i<4;i++)
    cout<<*(arr+i)<<"\t"<<i<<"\t"<<(long)(arr+i)<<endl;
//*(arr+3) = 55;
cout<<*(arr+3)<<endl<<endl;
My problem is:
When I try to access arr+3 outside the for loop, I get the expected value 55 printed. But when I access it inside the for loop, I get a different value (3 in this case). After the for loop, it prints the value 4. Could someone explain to me what is happening? Thanks in advance.
You have created an array of size 3 and you are trying to access the 4th element. The outcome is therefore undefined.
Since the array is allocated on the stack, writing the 4th element actually writes past the end of the space reserved for the array, into memory that may belong to something else. Depending on the compiler and build settings (for example Debug versus Release), this may appear to work, silently corrupt another variable, or crash; nothing is guaranteed.
The second time you read the value at the 4th position, you get 4. This is consistent with the compiler having placed the loop variable i in the stack slot right after the array: after the loop has finished, i holds 4. Presumably the same explanation covers the 3 printed inside the loop: on the iteration where i is 3, that slot contains 3.
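A minimal corrected sketch, assuming the goal is simply to store four values: declare room for four elements, and prefer std::array::at(), which is bounds-checked and throws std::out_of_range, so the mistake is reported instead of silently overwriting a neighboring variable. fill_four and rejects_index are hypothetical helper names used for illustration.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Four values need four elements.
std::array<int, 4> fill_four() {
    std::array<int, 4> arr{};
    arr.at(0) = 12;
    arr.at(1) = 24;
    arr.at(2) = 74;
    arr.at(3) = 55;  // valid: index 3 exists in a 4-element array
    return arr;
}

// Returns true if writing index i of a 3-element array is rejected by at().
bool rejects_index(std::size_t i) {
    std::array<int, 3> small{};
    try {
        small.at(i) = 55;
        return false;
    } catch (const std::out_of_range&) {
        return true;  // out-of-bounds access detected instead of undefined behavior
    }
}
```

Plain arr[3] or *(arr+3) on a 3-element array is still undefined behavior; at() only helps because it checks the index at run time.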
As the array has been defined with 3 elements, the data is stored sequentially: 12, 24, 74. When you assign 55 to the 4th element, it is written to memory outside the array, which the program may also use for something else. The first time, it happens to print correctly, but that memory is later reused, so afterwards it prints a garbage value.
I'm working on a p2p app that uses hash trees.
I am writing the hash tree construction functions (publ/4 and publ_top/4) but I can't see how to fix publ_top/4.
I try to build a tree with publ/1:
nivd:publ("file.txt").
prints hashes...
** exception error: no match of right hand side value [67324168]
in function nivd:publ_top/4
in call from nivd:publ/1
The code in question is here:
http://github.com/AndreasBWagner/nivoa/blob/886c624c116c33cc821b15d371d1090d3658f961/nivd.erl
Where do you think the problem is?
Thank You,
Andreas
Looking at your code, I can see one issue that would generate that particular exception error:
publ_top(_,[],Accumulated,Level) ->
    %% Go through the accumulated list of hashes from the prior level
    publ_top(string:len(Accumulated),Accumulated,[],Level+1);
publ_top(FullLevelLen,RestofLevel,Accumulated,Level) ->
    case FullLevelLen =:= 1 of
        false -> [F,S|T]=RestofLevel,
            io:format("~w---~w~n",[F,S]),
            publ_top(FullLevelLen,T,lists:append(Accumulated,[erlang:phash2(string:concat([F],[S]))]),Level);
        true -> done
    end.
In the first function clause you match against the empty list. In the second clause you match against a list of length (at least) 2 ([F,S|T]). What happens when FullLevelLen is different from 1 and RestofLevel is a list of length 1? (Hint: you'll get the above error; note the one-element list [67324168] in the exception.)
The error would be easier to spot if you pattern matched on the function arguments, perhaps something like:
publ_top(_,[],Accumulated,Level) ->
    %% Go through the accumulated list of hashes from the prior level
    publ_top(string:len(Accumulated),Accumulated,[],Level+1);
publ_top(1, _, _, _) ->
    done;
publ_top(FullLevelLen, [F,S|T], Accumulated, Level) ->
    io:format("~w---~w~n",[F,S]),
    publ_top(FullLevelLen,T,lists:append(Accumulated,[erlang:phash2(string:concat([F],[S]))]),Level);
%% Missing case:
% publ_top(_, [H], Accumulated, Level) ->
%     ...
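What the missing clause should do depends on the tree design. One common choice is to carry a lone trailing hash up to the next level unchanged (another is to pair it with itself); the sketch below takes the first option. It is written in C++ rather than Erlang, and combine is a hypothetical, deliberately simple stand-in for erlang:phash2, so the values are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Hash = std::uint64_t;

// Hypothetical stand-in for erlang:phash2 over a pair of child hashes.
Hash combine(Hash a, Hash b) { return a * 31 + b; }

// One level up: hash neighbours pairwise (the [F,S|T] case) and carry an
// odd last hash up unchanged (the [H] case that publ_top/4 does not handle).
std::vector<Hash> next_level(const std::vector<Hash>& level) {
    std::vector<Hash> up;
    std::size_t i = 0;
    for (; i + 1 < level.size(); i += 2) up.push_back(combine(level[i], level[i + 1]));
    if (i < level.size()) up.push_back(level[i]);  // odd element left over
    return up;
}

// Repeat until a single hash remains (the FullLevelLen =:= 1 case).
// Assumes at least one leaf.
Hash root(std::vector<Hash> level) {
    while (level.size() > 1) level = next_level(level);
    return level.front();
}
```

With five leaves the level sizes go 5, 3, 2, 1, so every odd-sized level exercises the [H] case exactly once.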