Let's say you want 5 threads to process data simultaneously. Also assume you have 89 tasks to process.
Off the bat you know 89 / 5 = 17 with a remainder of 4. The best way to split up the tasks would be to have 4 (the remainder) threads process 18 (17 + 1) tasks each, and then have 1 (# threads minus the remainder) thread process 17.
This will eliminate the remainder. Just to verify:
Thread 1: Tasks 1-18 (18 tasks)
Thread 2: Tasks 19-36 (18 tasks)
Thread 3: Tasks 37-54 (18 tasks)
Thread 4: Tasks 55-72 (18 tasks)
Thread 5: Tasks 73-89 (17 tasks)
Giving you a total of 89 tasks completed.
I need a way of getting the starting and ending range of each thread mathematically/programmatically, where the following should print exactly what I have listed above:
$NumTasks = 89
$NumThreads = 5
$Remainder = $NumTasks % $NumThreads
$DefaultNumTasksAssigned = floor($NumTasks / $NumThreads)
For $i = 1 To $NumThreads
    If $i <= $Remainder Then
        $NumTasksAssigned = $DefaultNumTasksAssigned + 1
    Else
        $NumTasksAssigned = $DefaultNumTasksAssigned
    EndIf
    $Start = ??????????
    $End = ??????????
    print Thread $i: Tasks $Start-$End ($NumTasksAssigned tasks)
Next
This should also work for any number of $NumTasks.
Note: Please stick to answering the math at hand, and avoid making suggestions about (or assumptions regarding) the underlying situation.
Why? Rather than predetermining the scheduling order, stick all of the tasks on a queue, and then have each thread pull them off one by one when it's ready. Then your tasks will basically run "as fast as possible".
If you pre-allocate, one thread may be doing a particularly long bit of processing and blocking the running of all the tasks stuck behind it. With the queue, as each task finishes and a thread frees up, it grabs the next task and keeps going.
Think of it like a bank with one line per teller versus one line and many tellers. In the former, you might get stuck behind the person depositing coins and counting them out one by one; in the latter, you get to the next available teller while Mr. PocketChange counts away.
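For illustration, here is a minimal sketch of that queue approach in Python (my sketch, not code from the question; process() is a stand-in for the real per-task work):

import queue
import threading

NUM_THREADS = 5
NUM_TASKS = 89

def process(task):
    pass  # stand-in for the real per-task work

def worker(q):
    while True:
        try:
            task = q.get_nowait()  # grab the next available task
        except queue.Empty:
            return                 # no tasks left; this thread is done
        process(task)
        q.task_done()

q = queue.Queue()
for task in range(1, NUM_TASKS + 1):
    q.put(task)

threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()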
I second Will Hartung's remark. You may just feed them one task at a time (or a few tasks at a time, depending on the overhead, i.e. if individual tasks typically complete very fast relative to the cost of starting/recycling threads). Your subsequent comments effectively explain that your "threads" carry a heavy creation cost, hence your desire to feed them once with as much work as possible, rather than wasting time creating new "threads", each fed a small amount of work.
Anyway... on to the math question...
If you'd like to assign tasks just once, the following formulas, plugged in lieu of the ?????????? in your logic, should do the trick:
$Start = 1
         + (($i - 1) * ($DefaultNumTasksAssigned + 1))
         - max(0, $i - 1 - $Remainder)
$End = $Start + $NumTasksAssigned - 1
The formula is explained as follows:
The 1 is there because your display/logic is one-based, not zero-based.
The second term is there because we generally add ($DefaultNumTasksAssigned + 1) with each iteration.
The third term corrects for the last few iterations: only the first $Remainder threads actually receive ($DefaultNumTasksAssigned + 1) tasks; every thread after them receives one task fewer, so by thread $i the second term has over-counted by ($i - 1 - $Remainder). The max() clamps that correction to 0 for as long as $i - 1 <= $Remainder, i.e. while $i is still among the threads that get the extra task.
The formula for $End is easier; the only trick is the minus 1, which is needed because the $Start and $End values are inclusive (for example, between 1 and 18 inclusive there are 18 tasks, not 17).
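As a quick sanity check (my own verification, not part of the original answer), plugging these formulas into a few lines of Python reproduces the table from the question:

NumTasks, NumThreads = 89, 5
Remainder = NumTasks % NumThreads    # 4
Default = NumTasks // NumThreads     # 17
for i in range(1, NumThreads + 1):
    n = Default + 1 if i <= Remainder else Default
    Start = 1 + (i - 1) * (Default + 1) - max(0, i - 1 - Remainder)
    End = Start + n - 1
    print("Thread %d: Tasks %d-%d (%d tasks)" % (i, Start, End, n))

This prints Thread 1: Tasks 1-18 (18 tasks) through Thread 5: Tasks 73-89 (17 tasks), matching the listing above.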
The following slightly modified piece of logic should also work; it avoids the "fancy" formula by keeping a running tab of the $Start variable, rather than recomputing it each time:
$NumTasks = 89
$NumThreads = 5
$Remainder = $NumTasks % $NumThreads
$DefaultNumTasksAssigned = floor($NumTasks / $NumThreads)
$Start = 1
For $i = 1 To $NumThreads
    If $i <= $Remainder Then   // fixed here! need <= because $i is one-based
        $NumTasksAssigned = $DefaultNumTasksAssigned + 1
    Else
        $NumTasksAssigned = $DefaultNumTasksAssigned
    EndIf
    $End = $Start + $NumTasksAssigned - 1
    print Thread $i: Tasks $Start-$End ($NumTasksAssigned tasks)
    $Start = $Start + $NumTasksAssigned
Next
Here's a Python transcription of the above
>>> import math
>>> def ShowWorkAllocation(NumTasks, NumThreads):
...     Remainder = NumTasks % NumThreads
...     DefaultNumTasksAssigned = math.floor(NumTasks / NumThreads)
...     Start = 1
...     for i in range(1, NumThreads + 1):
...         if i <= Remainder:
...             NumTasksAssigned = DefaultNumTasksAssigned + 1
...         else:
...             NumTasksAssigned = DefaultNumTasksAssigned
...         End = Start + NumTasksAssigned - 1
...         print("Thread", i, ": Tasks", Start, "-", End, "(", NumTasksAssigned, ")")
...         Start = Start + NumTasksAssigned
...
>>>
>>> ShowWorkAllocation(89, 5)
Thread 1 : Tasks 1 - 18 ( 18 )
Thread 2 : Tasks 19 - 36 ( 18 )
Thread 3 : Tasks 37 - 54 ( 18 )
Thread 4 : Tasks 55 - 72 ( 18 )
Thread 5 : Tasks 73 - 89 ( 17 )
>>> ShowWorkAllocation(11, 5)
Thread 1 : Tasks 1 - 3 ( 3 )
Thread 2 : Tasks 4 - 5 ( 2 )
Thread 3 : Tasks 6 - 7 ( 2 )
Thread 4 : Tasks 8 - 9 ( 2 )
Thread 5 : Tasks 10 - 11 ( 2 )
>>>
>>> ShowWorkAllocation(89, 11)
Thread 1 : Tasks 1 - 9 ( 9 )
Thread 2 : Tasks 10 - 17 ( 8 )
Thread 3 : Tasks 18 - 25 ( 8 )
Thread 4 : Tasks 26 - 33 ( 8 )
Thread 5 : Tasks 34 - 41 ( 8 )
Thread 6 : Tasks 42 - 49 ( 8 )
Thread 7 : Tasks 50 - 57 ( 8 )
Thread 8 : Tasks 58 - 65 ( 8 )
Thread 9 : Tasks 66 - 73 ( 8 )
Thread 10 : Tasks 74 - 81 ( 8 )
Thread 11 : Tasks 82 - 89 ( 8 )
>>>
I think you've solved the wrong half of your problem.
It's going to be virtually impossible to precisely determine the time it will take to complete all your tasks, unless all of the following are true:
- your tasks are 100% CPU-bound: that is, they use 100% CPU while running and don't need to do any I/O
- none of your tasks have to synchronize with any of your other tasks in any way
- you have exactly as many threads as you have CPUs
- the computer that is running these tasks is not performing any other interesting tasks at the same time
In practice, most of the time, your tasks are I/O-bound rather than CPU-bound: that is, you are waiting for some external resource such as reading from a file, fetching from a database, or communicating with a remote computer. In that case, you only make things worse by adding more threads, because they're all contending for the same scarce resource.
Finally, unless you have some really weird hardware, it's unlikely you can actually have exactly five threads running simultaneously (processor configurations usually come in multiples of at least two). Usually the sweet spot is about 1 thread per CPU if your tasks are very CPU-bound, about 2 threads per CPU if the tasks spend half their time being CPU-bound and half their time doing I/O, etc.
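As a rough illustration of that rule of thumb (my sketch; cpu_bound_fraction is an invented knob, not a standard API):

import os

def suggested_thread_count(cpu_bound_fraction):
    # Heuristic only: ~1 thread per CPU for pure CPU work,
    # ~2 threads per CPU when tasks are half CPU, half I/O.
    cpus = os.cpu_count() or 1
    return max(1, round(cpus / max(cpu_bound_fraction, 0.5)))

print(suggested_thread_count(1.0))  # CPU-bound: about 1 x CPUs
print(suggested_thread_count(0.5))  # 50/50 mix: about 2 x CPUs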
tl;dr: We need to know a lot more about what your tasks and hardware look like before we can advise you on this question.
Related
While working on exercise 2.2 of "Programming in Lua" (4th edition), I have to create a function that builds all permutations of the numbers 1-8. I decided to use Heap's algorithm and made the following script. I'm testing with the numbers 1-3.
In the function I store each permutation as a table ({1,2,3}, {2,1,3}, and so on) in the local "a" and add it to the global "perm". But something goes wrong, and at the end of the recursion I get the same permutation in every slot. I can't figure it out. Please help.
function generateperm (k, a)
  if k == 1 then
    perm[#perm + 1] = a        -- adds the recent permutation to the table
    io.write(table.unpack(a))  -- debug print: it shows the last added one,
    io.write("\n")             -- so I can see the algorithm works fine
  else
    for i = 1, k do
      generateperm(k - 1, a)
      if k % 2 == 0 then       -- builds a permutation
        a[i], a[k] = a[k], a[i]
      else
        a[1], a[k] = a[k], a[1]
      end
    end
  end
end
--
perm = {}
generateperm(3, {1,2,3}) -- start
--
for k, v in ipairs(perm) do      -- prints all stored permutations
  for k, v in ipairs(perm[k]) do -- but it's 6 times {1,2,3}
    io.write(v)
  end
  io.write("\n")
end
debug print:
123
213
312
132
231
321
123
123
123
123
123
123
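The symptom (correct permutations printed while recursing, but six identical tables at the end) typically means that perm[#perm + 1] = a stores a reference to the single table a, rather than a snapshot of its contents, so every slot ends up showing a's final state; the usual Lua fix is to store a copy, e.g. perm[#perm + 1] = {table.unpack(a)}. For what it's worth, the same reference-versus-copy pitfall is easy to reproduce in Python:

perm = []
a = [1, 2, 3]
for _ in range(3):
    perm.append(a)      # stores a reference to the one list object
    a[0], a[1] = a[1], a[0]
print(perm)             # all three entries alias a: [[2, 1, 3], [2, 1, 3], [2, 1, 3]]

perm = []
a = [1, 2, 3]
for _ in range(3):
    perm.append(a[:])   # a[:] stores a snapshot (shallow copy)
    a[0], a[1] = a[1], a[0]
print(perm)             # [[1, 2, 3], [2, 1, 3], [1, 2, 3]]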
I am trying to pull information from two different dictionaries. (Excuse me, I am literally hacking my way to understanding.)
I have one for loop that gives me the VM name. I have another for loop that gives me the other information, like 'replicationid'.
I could be making a very big assumption here, but I'll start there: what I want is to interleave for loop 1 and for loop 2 so the results look like the output below. Is that even possible?
Initial output of for loop 1, which I can get:
vma
vmb
vmc
Initial output of for loop 2, which I can get:
replication job 1
replication job 2
replication job 3
The desired result is:
vma
replication job 1
vmb
replication job 2
vmc
replication job 3
import boto3

def get_replication_job_status():
    sms = boto3.client('sms')
    resp = sms.get_replication_jobs()
    #print(resp)
    things = [(cl['replicationJobId'], cl['serverId'])
              for cl in resp['replicationJobList']]
    thangs = [cl['vmServer'] for cl in resp['replicationJobList']]
    for i in thangs:
        print()
        print("this is vm " + (i['vmName']))
        print("this is the vm location " + (i['vmPath']))
        print("this is the vm address, " + (str(i['vmServerAddress'])))
        for j in things:
            print("The Replication ID is : " + (str(j[0])))
Again, I want:
vma
replication job 1
vmb
replication job 2
vmc
replication job 3
I am getting:
vma
replication job 1
replication job 2
replication job 3
vmb
replication job 1
replication job 2
replication job 3
..
..
..
If you are sure that both your lists have the same length, then what you need is Python's built-in zip function:
for thing, thang in zip(things, thangs):
    print()
    print(thing)
    print(thang)
But if one of the lists is longer than the other, then zip will crop both to the length of the shortest, for example:
>>> for i, j in zip(range(3), range(5)):
...     print(i, j)
...
0 0
1 1
2 2
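A side note beyond the original answer: if you'd rather pad the shorter list than crop the longer one, itertools.zip_longest does exactly that:

>>> from itertools import zip_longest
>>> for i, j in zip_longest(range(3), range(5), fillvalue='-'):
...     print(i, j)
...
0 0
1 1
2 2
- 3
- 4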
UPD:
You can also unpack your tuples right in the for loop definition, so each item (they are 2-tuples) in the things list gets saved into two variables:
for (replicationJobId, serverId), thang in zip(things, thangs):
    print()
    print(replicationJobId)
    print(serverId)
    print(thang)
UPD 2:
Why do you split resp into two lists at all? You can get everything in a single pass over resp['replicationJobList']:
def get_replication_job_status():
    sms = boto3.client('sms')
    resp = sms.get_replication_jobs()
    #print(resp)
    for replication_job in resp['replicationJobList']:
        vm_server = replication_job['vmServer']
        print()
        print("this is vm:", vm_server['vmName'])
        print("this is the vm location:", vm_server['vmPath'])
        print("this is the vm address:", vm_server['vmServerAddress'])
        print("The Replication ID is :", replication_job['replicationJobId'])
Here is an MCVE to demonstrate losing workers over time. This is a follow-up to:
Distributing graphs to across cluster nodes
The example is not quite minimal, but it does give an idea of our typical work patterns. The sleep is necessary to trigger the problem; it occurs in the full application because of the need to generate a large graph from previous results.
When I run this on a cluster, I use dask-ssh to get 32 workers over 8 nodes:
dask-ssh --nprocs 4 --nthreads 1 --scheduler-port 8786 --log-directory `pwd` --hostfile hostfile.$JOBID &
sleep 10
It should run in less than about 10 minutes with the full set of workers. I follow the execution on the diagnostics screen. Under events, I see the workers being added, but then I sometimes (not always) see the removal of a number of workers, usually leaving only those on the node hosting the scheduler.
""" Test to illustrate losing workers under dask/distributed.
This mimics the overall structure and workload of our processing.
Tim Cornwell 9 Sept 2017
realtimcornwell#gmail.com
"""
import numpy
from dask import delayed
from distributed import Client

# Make some randomly located points on 2D plane
def init_sparse(n, margin=0.1):
    numpy.random.seed(8753193)
    return numpy.array([numpy.random.uniform(margin, 1.0 - margin, n),
                        numpy.random.uniform(margin, 1.0 - margin, n)]).reshape([n, 2])

# Put the points onto a grid and FFT, skip to save time
def grid_data(sparse_data, shape, skip=100):
    grid = numpy.zeros(shape, dtype='complex')
    loc = numpy.round(shape * sparse_data).astype('int')
    for i in range(0, sparse_data.shape[0], skip):
        grid[loc[i,:]] = 1.0
    return numpy.fft.fft(grid).real

# Accumulate all psfs into one psf
def accumulate(psf_list):
    lpsf = 0.0 * psf_list[0]
    for p in psf_list:
        lpsf += p
    return lpsf
if __name__ == '__main__':
    import sys
    import time
    start = time.time()

    # Process nchunks, each of len_chunk 2D points, making a psf of size shape
    len_chunk = int(1e6)
    nchunks = 16
    shape = [512, 512]
    skip = 100

    # We pass in the scheduler from the invoking script
    if len(sys.argv) > 1:
        scheduler = sys.argv[1]
        client = Client(scheduler)
    else:
        client = Client()
    print("On initialisation", client)

    sparse_graph = [delayed(init_sparse)(len_chunk) for i in range(nchunks)]
    sparse_graph = client.compute(sparse_graph, sync=True)
    print("After first sparse_graph", client)

    xfr_graph = [delayed(grid_data)(s, shape=shape, skip=skip) for s in sparse_graph]
    xfr = client.compute(xfr_graph, sync=True)
    print("After xfr", client)

    tsleep = 120.0
    print("Sleeping now for %.1f seconds" % tsleep)
    time.sleep(tsleep)
    print("After sleep", client)

    sparse_graph = [delayed(init_sparse)(len_chunk) for i in range(nchunks)]
    # sparse_graph = client.compute(sparse_graph, sync=True)
    xfr_graph = [delayed(grid_data)(s, shape=shape, skip=skip) for s in sparse_graph]
    psf_graph = delayed(accumulate)(xfr_graph)
    psf = client.compute(psf_graph, sync=True)
    print("*** Successfully reached end in %.1f seconds ***" % (time.time() - start))
    print(numpy.max(psf))
    print("After psf", client)

    client.shutdown()
    exit()
Grep'ing a typical run for Client shows:
On initialisation <Client: scheduler='tcp://sand-8-17:8786' processes=16 cores=16>
After first sparse_graph <Client: scheduler='tcp://sand-8-17:8786' processes=16 cores=16>
After xfr <Client: scheduler='tcp://sand-8-17:8786' processes=16 cores=16>
After sleep <Client: scheduler='tcp://sand-8-17:8786' processes=4 cores=4>
After psf <Client: scheduler='tcp://sand-8-17:8786' processes=4 cores=4>
Thanks,
Tim
It's not quite clear why this works, but it did. We were using dask-ssh but needed more control over the creation of the workers. Eventually we settled on:
scheduler=$(head -1 hostfile.$JOBID)
hostIndex=0
for host in `cat hostfile.$JOBID`; do
    echo "Working on $host ...."
    if [ "$hostIndex" = "0" ]; then
        echo "run dask-scheduler"
        ssh $host dask-scheduler --port=8786 &
        sleep 5
    fi
    echo "run dask-worker"
    ssh $host dask-worker --host ${host} --nprocs NUMBER_PROCS_PER_NODE \
        --nthreads NUMBER_THREADS \
        --memory-limit 0.25 --local-directory /tmp $scheduler:8786 &
    sleep 1
    hostIndex="1"
done
echo "Scheduler and workers now running"
The following code writes to the screen every iteration. Based on my understanding of the DateDiff documentation, it should only write every 30 seconds. What did I do wrong?
lasttime = Now
Do While Not data.eof
    'looping through database records
    If DateDiff(s, lasttime, Now) >= 30 Then
        lasttime = Now
        WScript.Echo "It's been 30 seconds..."
    End If
Loop
Change this line:
    If DateDiff(s, lasttime, Now) >= 30 Then
To this (note the quotes around "s": DateDiff expects its interval argument to be a string, and "s" means seconds):
    If DateDiff("s", lasttime, Now) >= 30 Then
I was implementing an adjacency list in Pascal (by first reading the edge endpoints, and then using dynamic arrays to assign the required amount of memory to the edge list of each node). The program executes fine and gives correct output, but produces runtime error 216 just before exiting.
The code is:
type
  aptr = array of longint;

var
  edgebuf: array[1..200000, 1..2] of longint;
  ptrs: array[1..100000] of longint;
  i, j, n, m: longint;
  elist: array[1..100000] of aptr;

{main}
begin
  readln(n, m);
  fillchar(ptrs, sizeof(ptrs), #0);
  for i := 1 to m do begin
    readln(edgebuf[i][1], edgebuf[i][2]);
    inc(ptrs[edgebuf[i][1]]);
  end;
  for i := 1 to n do begin
    setlength(elist[i], ptrs[i]);
  end;
  fillchar(ptrs, sizeof(ptrs), #0);
  for i := 1 to m do begin
    inc(ptrs[edgebuf[i][1]]);
    elist[edgebuf[i][1]][ptrs[edgebuf[i][1]]] := edgebuf[i][2];
  end;
  for i := 1 to n do begin
    writeln(i, ' begins');
    for j := 1 to ptrs[i] do begin
      write(j, ' ', elist[i][j], ' ');
    end;
    writeln();
    writeln(i, ' ends');
  end;
  writeln('bye');
end.
When run on the file
4 5
1 2
3 2
4 3
2 1
2 3
it gives the output:
1 begins
1 2
1 ends
2 begins
1 1 2 3
2 ends
3 begins
1 2
3 ends
4 begins
1 3
4 ends
bye
Runtime error 216 at $0000000000416644
$0000000000416644
$00000000004138FB
$0000000000413740
$0000000000400645
$00000000004145D2
$0000000000400180
Once the program says "bye", what is the program executing that is giving runtime error 216?
RTE 216 is, in general, a fatal exception: GPF/SIGSEGV and in some cases SIGILL/SIGBUS. That probably means your program corrupts memory somewhere.
Compiling with runtime checks on might help you find such errors (in Free Pascal: -Criot).
A likely culprit in this code: dynamic arrays in Free Pascal are zero-based, so after setlength(elist[i], ptrs[i]) the valid indices are 0..ptrs[i]-1, while the second filling loop writes elist[...][1..ptrs[...]], one element past the end. Range checking (the -Cr part of -Criot) would flag exactly this kind of overrun.
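For example, assuming the source file is called graph.pas (the actual file name isn't given in the question):

fpc -Criot -gl graph.pas

With range checking enabled, the out-of-bounds array write is reported as run-time error 201 (range check error) at the offending line (-gl adds line numbers to the backtrace), instead of surfacing as a mysterious crash on exit.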