Identify cause of high GHC memory consumption without profile build - ghc

I heard GHC is slow in terms of LOC per second. So I created this Haskell program:
module Main where
x1 = 1
x2 = 2
x3 = 3
...
x999998 = 999998
x999999 = 999999
x1000000 = 1000000
main = putStrLn "1M LOC!"
And I can't even compile it! At least I can see that parser can do 43 lines per second:
[1 of 1] Compiling Main ( 1Mloc.hs, 1Mloc.o )
*** Parser [Main]:
Parser [Main]: alloc=22735369056 time=23683.420
As far as I know, GHC RTS must be recompiled with profiling enabled to start digging into the cause. Given I don't have profiled GHC, is there any chance to figure out what is causing this? I can't even collect statistics because it gets killed...
Killed process 16609 (ghc) total-vm:1074093288kB, anon-rss:6804448kB ...
Actually, I can't compile 10K LOC either. With down to 1K LOC at least I can see horrible productivity numbers. I realize this is a synthetic program, but what could be so bad about it?
Linking 1Kloc ...
1,383,344,416 bytes allocated in the heap
325,164,408 bytes copied during GC
60,849,840 bytes maximum residency (9 sample(s))
282,960 bytes maximum slop
58 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 233 colls, 0 par 0.227s 0.230s 0.0010s 0.0066s
Gen 1 9 colls, 0 par 0.149s 0.174s 0.0193s 0.0588s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0(0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.522s ( 1.047s elapsed)
GC time 0.376s ( 0.404s elapsed)
EXIT time 0.000s ( 0.008s elapsed)
Total time 0.899s ( 1.460s elapsed)
Alloc rate 2,647,974,824 bytes per MUT second
Productivity 58.1% of total user, 71.7% of total elapsed

Related

Why is my IR signal not powering on my TV

I am currently working on a personal project with pigpio and piscope on raspberry PI 4.
I try to simulate my TV remote by sending IR signal through an IR LED setup connected on GPIO 23 and GND pin (setup is a simple IR LED with a 200 ohm resistor)
I searched on LIRC database my TV remote config file and I did not find it, but I found another one (MKJ40653802-TV) which is said to be working also for my TV which is a LG 50PS3000:
https://www.remote-control-world.eu/lg-c-2_64/lg-mkj42519615-replacement-remote-control-p-4195
also config file :
begin remote
name MKJ40653802-TV
bits 16
flags SPACE_ENC|CONST_LENGTH
eps 30
aeps 100
header 9061 4473
one 591 1660
zero 591 521
ptrail 590
pre_data_bits 16
pre_data 0x20DF
gap 108029
toggle_bit_mask 0x0
begin codes
KEY_POWER 0x10EF # Was: power
After reading LIRC documentation and explainations on how to contruct an IR signal, I managed to get my hands through a python script which create IR waveform to be fired through IR LED
https://github.com/bschwind/ir-slinger/blob/master/pyslinger.py
I simply changed the NEC protocol paramters to the values present in the config file.
Also my power on/off hex value is 0x20DF23DC (pre-data + command) that I convert to binary 32 bits :
00100000110111110010001111011100
my code below :
#!/usr/bin/env python3
# Python IR transmitter
# Requires pigpio library
# Supports NEC, RC-5 and raw IR.
# Danijel Tudek, Aug 2016
import subprocess
import ctypes
import time
# This is the struct required by pigpio library.
# We store the individual pulses and their duration here. (In an array of these structs.)
class Pulses_struct(ctypes.Structure):
_fields_ = [("gpioOn", ctypes.c_uint32),
("gpioOff", ctypes.c_uint32),
("usDelay", ctypes.c_uint32)]
# Since both NEC and RC-5 protocols use the same method for generating waveform,
# it can be put in a separate class and called from both protocol's classes.
class Wave_generator():
def __init__(self,protocol):
self.protocol = protocol
MAX_PULSES = 12000 # from pigpio.h
Pulses_array = Pulses_struct * MAX_PULSES
self.pulses = Pulses_array()
self.pulse_count = 0
def add_pulse(self, gpioOn, gpioOff, usDelay):
self.pulses[self.pulse_count].gpioOn = gpioOn
self.pulses[self.pulse_count].gpioOff = gpioOff
self.pulses[self.pulse_count].usDelay = usDelay
self.pulse_count += 1
# Pull the specified output pin low
def zero(self, duration):
self.add_pulse(0, 1 << self.protocol.master.gpio_pin, duration)
# Protocol-agnostic square wave generator
def one(self, duration):
period_time = 1000000.0 / self.protocol.frequency
on_duration = int(round(period_time * self.protocol.duty_cycle))
off_duration = int(round(period_time * (1.0 - self.protocol.duty_cycle)))
total_periods = int(round(duration/period_time))
total_pulses = total_periods * 2
# Generate square wave on the specified output pin
for i in range(total_pulses):
if i % 2 == 0:
self.add_pulse(1 << self.protocol.master.gpio_pin, 0, on_duration)
else:
self.add_pulse(0, 1 << self.protocol.master.gpio_pin, off_duration)
# NEC protocol class
class NEC():
def __init__(self,
master,
frequency=38000,
duty_cycle=0.5,
leading_pulse_duration=9061,
leading_gap_duration=4473,
one_pulse_duration = 591,
one_gap_duration = 1660,
zero_pulse_duration = 591,
zero_gap_duration = 521,
trailing_pulse = [1, 590]):
self.master = master
self.wave_generator = Wave_generator(self)
self.frequency = frequency # in Hz, 38000 per specification
self.duty_cycle = duty_cycle # duty cycle of high state pulse
# Durations of high pulse and low "gap".
# The NEC protocol defines pulse and gap lengths, but we can never expect
# that any given TV will follow the protocol specification.
self.leading_pulse_duration = leading_pulse_duration # in microseconds, 9000 per specification
self.leading_gap_duration = leading_gap_duration # in microseconds, 4500 per specification
self.one_pulse_duration = one_pulse_duration # in microseconds, 562 per specification
self.one_gap_duration = one_gap_duration # in microseconds, 1686 per specification
self.zero_pulse_duration = zero_pulse_duration # in microseconds, 562 per specification
self.zero_gap_duration = zero_gap_duration # in microseconds, 562 per specification
self.trailing_pulse = trailing_pulse # trailing 562 microseconds pulse, some remotes send it, some don't
print("NEC protocol initialized")
# Send AGC burst before transmission
def send_agc(self):
print("Sending AGC burst")
self.wave_generator.one(self.leading_pulse_duration)
self.wave_generator.zero(self.leading_gap_duration)
# Trailing pulse is just a burst with the duration of standard pulse.
def send_trailing_pulse(self):
print("Sending trailing pulse")
self.wave_generator.one(self.trailing_pulse[1])
# This function is processing IR code. Leaves room for possible manipulation
# of the code before processing it.
def process_code(self, ircode):
if (self.leading_pulse_duration > 0) or (self.leading_gap_duration > 0):
self.send_agc()
for i in ircode:
if i == "0":
self.zero()
elif i == "1":
self.one()
else:
print("ERROR! Non-binary digit!")
return 1
if self.trailing_pulse[0] == 1:
self.send_trailing_pulse()
return 0
# Generate zero or one in NEC protocol
# Zero is represented by a pulse and a gap of the same length
def zero(self):
self.wave_generator.one(self.zero_pulse_duration)
self.wave_generator.zero(self.zero_gap_duration)
# One is represented by a pulse and a gap three times longer than the pulse
def one(self):
self.wave_generator.one(self.one_pulse_duration)
self.wave_generator.zero(self.one_gap_duration)
# RC-5 protocol class
# Note: start bits are not implemented here due to inconsistency between manufacturers.
# Simply provide them with the rest of the IR code.
class RC5():
def __init__(self,
master,
frequency=36000,
duty_cycle=0.33,
one_duration=889,
zero_duration=889):
self.master = master
self.wave_generator = Wave_generator(self)
self.frequency = frequency # in Hz, 36000 per specification
self.duty_cycle = duty_cycle # duty cycle of high state pulse
# Durations of high pulse and low "gap".
# Technically, they both should be the same in the RC-5 protocol, but we can never expect
# that any given TV will follow the protocol specification.
self.one_duration = one_duration # in microseconds, 889 per specification
self.zero_duration = zero_duration # in microseconds, 889 per specification
print("RC-5 protocol initialized")
# This function is processing IR code. Leaves room for possible manipulation
# of the code before processing it.
def process_code(self, ircode):
for i in ircode:
if i == "0":
self.zero()
elif i == "1":
self.one()
else:
print("ERROR! Non-binary digit!")
return 1
return 0
# Generate zero or one in RC-5 protocol
# Zero is represented by pulse-then-low signal
def zero(self):
self.wave_generator.one(self.zero_duration)
self.wave_generator.zero(self.zero_duration)
# One is represented by low-then-pulse signal
def one(self):
self.wave_generator.zero(self.one_duration)
self.wave_generator.one(self.one_duration)
# RAW IR ones and zeroes. Specify length for one and zero and simply bitbang the GPIO.
# The default values are valid for one tested remote which didn't fit in NEC or RC-5 specifications.
# It can also be used in case you don't want to bother with deciphering raw bytes from IR receiver:
# i.e. instead of trying to figure out the protocol, simply define bit lengths and send them all here.
class RAW():
def __init__(self,
master,
frequency=36000,
duty_cycle=0.33,
one_duration=520,
zero_duration=520):
self.master = master
self.wave_generator = Wave_generator(self)
self.frequency = frequency # in Hz
self.duty_cycle = duty_cycle # duty cycle of high state pulse
self.one_duration = one_duration # in microseconds
self.zero_duration = zero_duration # in microseconds
def process_code(self, ircode):
for i in ircode:
if i == "0":
self.zero()
elif i == "1":
self.one()
else:
print("ERROR! Non-binary digit!")
return 1
return 0
# Generate raw zero or one.
# Zero is represented by low (no signal) for a specified duration.
def zero(self):
self.wave_generator.zero(self.zero_duration)
# One is represented by pulse for a specified duration.
def one(self):
self.wave_generator.one(self.one_duration)
class IR():
def __init__(self, gpio_pin, protocol, protocol_config):
print("Starting IR")
print("Loading libpigpio.so")
self.pigpio = ctypes.CDLL('libpigpio.so')
print("Initializing pigpio")
PI_OUTPUT = 1 # from pigpio.h
self.pigpio.gpioInitialise()
subprocess.Popen('piscope', shell=True)
time.sleep(1)
self.gpio_pin = gpio_pin
print("Configuring pin %d as output" % self.gpio_pin)
self.pigpio.gpioSetMode(self.gpio_pin, PI_OUTPUT) # pin 17 is used in LIRC by default
print("Initializing protocol")
if protocol == "NEC":
self.protocol = NEC(self, **protocol_config)
elif protocol == "RC-5":
self.protocol = RC5(self, **protocol_config)
elif protocol == "RAW":
self.protocol = RAW(self, **protocol_config)
else:
print("Protocol not specified! Exiting...")
return 1
print("IR ready")
# send_code takes care of sending the processed IR code to pigpio.
# IR code itself is processed and converted to pigpio structs by protocol's classes.
def send_code(self, ircode):
print("Processing IR code: %s" % ircode)
code = self.protocol.process_code(ircode)
if code != 0:
print("Error in processing IR code!")
return 1
clear = self.pigpio.gpioWaveClear()
print(clear)
if clear != 0:
print("Error in clearing wave!")
return 1
pulses = self.pigpio.gpioWaveAddGeneric(self.protocol.wave_generator.pulse_count, self.protocol.wave_generator.pulses)
if pulses < 0:
print("Error in adding wave!")
return 1
wave_id = self.pigpio.gpioWaveCreate()
# Unlike the C implementation, in Python the wave_id seems to always be 0.
if wave_id >= 0:
print("Sending wave...")
result = self.pigpio.gpioWaveTxSend(wave_id, 0)
if result >= 0:
print("Success! (result: %d)" % result)
else:
print("Error! (result: %d)" % result)
return 1
else:
print("Error creating wave: %d" % wave_id)
return 1
while self.pigpio.gpioWaveTxBusy():
time.sleep(0.1)
print("Deleting wave")
self.pigpio.gpioWaveDelete(wave_id)
print("Terminating pigpio")
self.pigpio.gpioTerminate()
# Simply define the GPIO pin, protocol (NEC, RC-5 or RAW) and
# override the protocol defaults with the dictionary if required.
# Provide the IR code to the send_code() method.
# An example is given below.
if __name__ == "__main__":
protocol = "NEC"
gpio_pin = 23
protocol_config = dict(one_pulse_duration = 591,
zero_pulse_duration = 591)
ir = IR(gpio_pin, protocol, protocol_config)
ir.send_code("00100000110111110001000011101111")
print("Exiting IR")
When launching the script it's working, I can see the IR LED blinking through phone cam and also I see the waveform generating through piscope :
Everything looks correct to me but I don't know why it's not powering on my TV...
Could you please help me with this problem ? I don't know if I missed something or if I am using the wrong TV code...
Thanks a lot !
I tried other remote code, I tried the toggle-bit-mask on the first bit (toggle_bit_mask = 0x0)
I tried other codes (on and off) from this page :
https://gist.github.com/francis2110/8f69843dd57ae07dce80
with no success
It's working.
I just had to get close to tv (less than 1 meter away).
So I am reviewing my LED setup adding a transistor.
As seen online it should be working from longer distances...

Progress bar slows the loop

I've done some tests with the progress bar and it slows down the test code considerably.
Are there any alternatives or solutions? I'm looking for a way to track current index while looping and there are some primitive ways to put more conditions to print when step reached but isn't there something good that's built in?
Oh and one more question, Is there a way to print time elapsed from when the function started and display with the index? let me clarify, I know about #time and etc but is there a way to count time and display it with corresponding index like
"Reached index $i in iteration in time $time"
Code for the tests done:
function test(x)
summ = BigInt(0);
Juno.progress(name = "foo") do id
for i = 1:x
summ+=i;
#info "foo" progress=i/x _id=id
end
end
println("sum up to $x is $summ");
return summ;
end
#benchmark test(10^4)
function test2(x)
summ = BigInt(0);
for i = 1:x
summ+=i;
(i%10 == 0) && println("Reached this milestone $i")
end
println("sum up to $x is $summ");
return summ;
end
#benchmark test2(10^4)
EDIT 1
for Juno.progress:
BenchmarkTools.Trial:
memory estimate: 21.66 MiB
allocs estimate: 541269
--------------
minimum time: 336.595 ms (0.00% GC)
median time: 345.875 ms (0.00% GC)
mean time: 345.701 ms (0.64% GC)
maximum time: 356.436 ms (1.34% GC)
--------------
samples: 15
evals/sample: 1
For the crude simple version:
BenchmarkTools.Trial:
memory estimate: 1.22 MiB
allocs estimate: 60046
--------------
minimum time: 111.251 ms (0.00% GC)
median time: 117.110 ms (0.00% GC)
mean time: 119.886 ms (0.51% GC)
maximum time: 168.116 ms (15.31% GC)
--------------
samples: 42
evals/sample: 1
I'd recommend using Juno.#progress directly for much better performance:
using BenchmarkTools
function test(x)
summ = BigInt(0)
Juno.progress(name = "foo") do id
for i = 1:x
summ += i
#info "foo" progress = i / x _id = id
end
end
println("sum up to $x is $summ")
return summ
end
#benchmark test(10^4) # min: 326ms
function test1(x)
summ = BigInt(0)
Juno.#progress "foo" for i = 1:x
summ += i
end
println("sum up to $x is $summ")
return summ
end
#benchmark test1(10^4) # min 5.4ms
function test2(x)
summ = BigInt(0)
for i = 1:x
summ += i
end
println("sum up to $x is $summ")
return summ
end
#benchmark test2(10^4) # min 0.756ms
function test3(x)
summ = BigInt(0);
for i = 1:x
summ+=i;
(i%10 == 0) && println("Reached this milestone $i")
end
println("sum up to $x is $summ");
return summ;
end
#benchmark test3(10^4) # min 33ms
Juno.progress can make no performance optimizations at all for you, but you can implement them manually:
function test4(x)
summ = BigInt(0)
update_interval = x÷200 # update every 0.5%
Juno.progress(name = "foo") do id
for i = 1:x
summ += i
if i % update_interval == 0
#info "foo" progress = i / x _id = id
end
end
end
println("sum up to $x is $summ")
return summ
end
#benchmark test4(10^4) # min: 5.2ms
As was stated by High Performance Mark writing to the screen is fundamentally slow (crazy fast in human scale, very slow in computer scale.) You could abandon writing the output to the progress bar, but you can also simply update the progress bar less often. In your test case you're doing 10000 additions and updating the progress bar 10000 times. To be honest I've never used Julia and I have no idea what the progress bar looks like. Even if it is a GUI progress bar on a 4K screen and each of these updates actually changes it at all I guarantee a human can't see the difference. I would update it at the beginning (to be 0) and at the end (to be 100%) and then use an if statement with a modulo test to only update every so many additions. Example below in python which I'll claim is pseudo code since I've never used julia:
updateEvery = 2
for i in range(1,x):
sum += i
if x % updateEvery == 0:
updateProgressBar(i/x)
By varying updateEvery you can decrease or increase the number of progress bar updates. You can even calculate it dynamically based on x, say updateEvery = x/100, this would mean the progress bar would line up pretty well to percentages. The inefficiency caused by the progress bar updates is also probably meaningless for small values of x and as x increases the number of updates per number to be added will decrease (because the total number of updates will be constant.
Oh and if you really need great performance to the counting clock tick level (which you probably don't,) modulo is faster for powers of 2 as it can be done with a binary and operation. I assume Julia will figure this optimisation out for you and you can just use % and round the value of updateEvery to the next power of 2. Though if you really care about that level of performance you'd be best to just get rid of the progress bar to eliminate the loop altogether.

How do I finish a recursive binary search in fortran?

I am trying to find the smallest index containing the value i in a sorted array. If this i value is not present I want -1 to be returned. I am using a binary search recursive subroutine. The problem is that I can't really stop this recursion and I get lot of answers(one right and the rest wrong). And sometimes I get an error called "segmentation fault: 11" and I don't really get any results.
I've tried to delete this call random_number since I already have a sorted array in my main program, but it did not work.
program main
implicit none
integer, allocatable :: A(:)
real :: MAX_VALUE
integer :: i,j,n,s, low, high
real :: x
N= 10 !size of table
MAX_VALUE = 10
allocate(A(n))
s = 5 ! searched value
low = 1 ! lower limit
high = n ! highest limit
!generate random table of numbers (from 0 to 1000)
call Random_Seed
do i=1, N
call Random_Number(x) !returns random x >= 0 and <1
A(i)= anint(MAX_VALUE*x)
end do
call bubble(n,a)
print *,' '
write(*,10) (a(i),i=1,N)
10 format(10i6)
call bsearch(A,n,s,low,high)
deallocate(A)
end program main
The sort subroutine:
subroutine sort(p,q)
implicit none
integer(kind=4), intent(inout) :: p, q
integer(kind=4) :: temp
if (p>q) then
temp = p
p = q
q = temp
end if
return
end subroutine sort
The bubble subroutine:
subroutine bubble(n,arr)
implicit none
integer(kind=4), intent(inout) :: n
integer(kind=4), intent(inout) :: arr(n)
integer(kind=4) :: sorted(n)
integer :: i,j
do i=1, n
do j=n, i+1, -1
call sort(arr(j-1), arr(j))
end do
end do
return
end subroutine bubble
recursive subroutine bsearch(b,n,i,low,high)
implicit none
integer(kind=4) :: b(n)
integer(kind=4) :: low, high
integer(kind=4) :: i,j,x,idx,n
real(kind=4) :: r
idx = -1
call random_Number(r)
x = low + anint((high - low)*r)
if (b(x).lt.i) then
low = x + 1
call bsearch(b,n,i,low,high)
else if (b(x).gt.i) then
high = x - 1
call bsearch(b,n,i,low,high)
else
do j = low, high
if (b(j).eq.i) then
idx = j
exit
end if
end do
end if
! Stop if high = low
if (low.eq.high) then
return
end if
print*, i, 'found at index ', idx
return
end subroutine bsearch
The goal is to get the same results as my linear search. But I'am getting either of these answers.
Sorted table:
1 1 2 4 5 5 6 7 8 10
5 found at index 5
5 found at index -1
5 found at index -1
or if the value is not found
2 2 3 4 4 6 6 7 8 8
Segmentation fault: 11
There are a two issues causing your recursive search routine bsearch to either stop with unwanted output, or result in a segmentation fault. Simply following the execution logic of your program at the hand of the examples you provided, elucidate the matter:
1) value present and found, unwanted output
First, consider the first example where array b contains the value i=5 you are searching for (value and index pointed out with || in the first two lines of the code block below). Using the notation Rn to indicate the the n'th level of recursion, L and H for the lower- and upper bounds and x for the current index estimate, a given run of your code could look something like this:
b(x): 1 1 2 4 |5| 5 6 7 8 10
x: 1 2 3 4 |5| 6 7 8 9 10
R0: L x H
R1: Lx H
R2: L x H
5 found at index 5
5 found at index -1
5 found at index -1
In R0 and R1, the tests b(x).lt.i and b(x).gt.i in bsearch work as intended to reduce the search interval. In R2 the do-loop in the else branch is executed, idx is assigned the correct value and this is printed - as intended. However, a return statement is now encountered which will return control to the calling program unit - in this case that is first R1(!) where execution will resume after the if-else if-else block, thus printing a message to screen with the initial value of idx=-1. The same happens upon returning from R0 to the main program. This explains the (unwanted) output you see.
2) value not present, segmentation fault
Secondly, consider the example resulting in a segmentation fault. Using the same notation as before, a possible run could look like this:
b(x): 2 2 3 4 4 6 6 7 8 8
x: 1 2 3 4 5 6 7 8 9 10
R0: L x H
R1: L x H
R2: L x H
R3: LxH
R4: H xL
.
.
.
Segmentation fault: 11
In R0 to R2 the search interval is again reduced as intended. However, in R3 the logic fails. Since the search value i is not present in array b, one of the .lt. or .gt. tests will always evaluate to .true., meaning that the test for low .eq. high to terminate a search is never reached. From this point onwards, the logic is no longer valid (e.g. high can be smaller than low) and the code will continue deepening the level of recursion until the call stack gets too big and a segmentation fault occurs.
These explained the main logical flaws in the code. A possible inefficiency is the use of a do-loop to find the lowest index containing a searched for value. Consider a case where the value you are looking for is e.g. i=8, and that it appears in the last position in your array, as below. Assume further that by chance, the first guess for its position is x = high. This implies that your code will immediately branch to the do-loop, where in effect a linear search is done of very nearly the entire array, to find the final result idx=9. Although correct, the intended binary search rather becomes a linear search, which could result in reduced performance.
b(x): 2 2 3 4 4 6 6 7 |8| 8
x: 1 2 3 4 5 6 7 8 |9| 10
R0: L xH
8 found at index 9
Fixing the problems
At the very least, you should move the low .eq. high test to the start of the bsearch routine, so that recursion stops before invalid bounds can be defined (you then need an additional test to see if the search value was found or not). Also, notify about a successful search right after it occurs, i.e. after the equality test in your do-loop, or the additional test just mentioned. This still does not address the inefficiency of a possible linear search.
All taken into account, you are probably better off reading up on algorithms for finding a "leftmost" index (e.g. on Wikipedia or look at a tried and tested implementation - both examples here use iteration instead of recursion, perhaps another improvement, but the same principles apply) and adapt that to Fortran, which could look something like this (only showing new code, ...refer to existing code in your examples):
module mod_search
implicit none
contains
! Function that uses recursive binary search to look for `key` in an
! ordered `array`. Returns the array index of the leftmost occurrence
! of `key` if present in `array`, and -1 otherwise
function search_ordered (array, key) result (idx)
integer, intent(in) :: array(:)
integer, intent(in) :: key
integer :: idx
! find left most array index that could possibly hold `key`
idx = binary_search_left(1, size(array))
! if `key` is not found, return -1
if (array(idx) /= key) then
idx = -1
end if
contains
! function for recursive reduction of search interval
recursive function binary_search_left(low, high) result(idx)
integer, intent(in) :: low, high
integer :: idx
real :: r
if (high <= low ) then
! found lowest possible index where target could be
idx = low
else
! new guess
call random_number(r)
idx = low + floor((high - low)*r)
! alternative: idx = low + (high - low) / 2
if (array(idx) < key) then
! continue looking to the right of current guess
idx = binary_search_left(idx + 1, high)
else
! continue looking to the left of current guess (inclusive)
idx = binary_search_left(low, idx)
end if
end if
end function binary_search_left
end function search_ordered
! Move your routines into a module
subroutine sort(p,q)
...
end subroutine sort
subroutine bubble(n,arr)
...
end subroutine bubble
end module mod_search
! your main program
program main
use mod_search, only : search_ordered, sort, bubble ! <---- use routines from module like so
implicit none
...
! Replace your call to bsearch() with the following:
! call bsearch(A,n,s,low,high)
i = search_ordered(A, s)
if (i /= -1) then
print *, s, 'found at index ', i
else
print *, s, 'not found!'
end if
...
end program main
Finally, depending on your actual use case, you could also just consider using the Fortran intrinsic procedure minloc saving you the trouble of implementing all this functionality yourself. In this case, it can be done by making the following modification in your main program:
! i = search_ordered(a, s) ! <---- comment out this line
j = minloc(abs(a-s), dim=1) ! <---- replace with these two
i = merge(j, -1, a(j) == s)
where j returned from minloc will be the lowest index in the array a where s may be found, and merge is used to return j when a(j) == s and -1 otherwise.

Julia is slow with cat command

I wanted to have a look at the julia language, so I wrote a little script to import a dataset I'm working with. But when I run and profile the script it turns out that it is much slower than a similar script in R.
When I do profiling it tells me that all the cat commands have a bad performance.
The files look like this:
#
#Metadata
#
Identifier1 data_string1
Identifier2 data_string2
Identifier3 data_string3
Identifier4 data_string4
//
I primarily want to get the data_strings and split them up into a matrix of single characters.
This is a somehow minimal code example:
function loadfile()
f = open("/file1")
first=true
m = Array(Any, 1,0)
for ln in eachline(f)
if ln[1] != '#' && ln[1] != '\n' && ln[1] != '/'
s = split(ln[1:end-1])
s = split(s[2],"")
if first
m = reshape(s,1,length(s))
first = false
else
s = reshape(s,1,length(s))
println(size(m))
println(size(s))
m = vcat(m, s)
end
end
end
end
Any idea why julia might be slow with the cat command or how i can do it differently?
Thanks for any suggestions!
Using cat like that is slow in that it requires a lot of memory allocations. Every time we do a vcat we are allocating a whole new array m which is mostly the same as the old m. Here is how I'd rewrite your code in a more Julian way, where m is only created at the end:
function loadfile2()
f = open("./sotest.txt","r")
first = true
lines = Any[]
for ln in eachline(f)
if ln[1] == '#' || ln[1] == '\n' || ln[1] == '/'
continue
end
data_str = split(ln[1:end-1]," ")[2]
data_chars = split(data_str,"")
# Can make even faster (2x in my tests) with
# data_chars = [data_str[i] for i in 1:length(data_str)]
# But this inherently assumes ASCII data
push!(lines, data_chars)
end
m = hcat(lines...)' # Stick column vectors together then transpose
end
I made a 10,000 line version of your example data and found the following performance:
Old version:
elapsed time: 3.937826405 seconds (3900659448 bytes allocated, 43.81% gc time)
elapsed time: 3.581752309 seconds (3900645648 bytes allocated, 36.02% gc time)
elapsed time: 3.57753696 seconds (3900645648 bytes allocated, 37.52% gc time)
New version:
elapsed time: 0.010351067 seconds (11568448 bytes allocated)
elapsed time: 0.011136188 seconds (11568448 bytes allocated)
elapsed time: 0.010654002 seconds (11568448 bytes allocated)

Passing information between threads (foreach with %dopar%)

I'm using doSNOW- package for parallelizing tasks, which differ in length. When one thread is finished, I want
some information generated by old threads passed to the next thread
start the next thread immediatly (loadbalancing like in clusterApplyLB)
It works in singlethreaded (see makeClust(spec = 1 ))
#Register Snow and doSNOW
require(doSNOW)
#CHANGE spec to 4 or more, to see what my problem is
registerDoSNOW(cl <- makeCluster(spec=1,type="SOCK",outfile=""))
numbersProcessed <- c() # init processed vector
x <- foreach(i = 1:10,.export=numbersProcessed) %dopar% {
#Do working stuff
cat(format(Sys.time(), "%X"),": ","Starting",i,"(Numbers processed so far:",numbersProcessed, ")\n")
Sys.sleep(time=i)
#Appends this number to general vector
numbersProcessed <- append(numbersProcessed,i)
cat(format(Sys.time(), "%X"),": ","Ending",i,"\n")
cat("--------------------\n")
}
#End it all
stopCluster(cl)
Now change the spec in "makeCluster" to 4. Output is something like this:
[..]
Type: EXEC
18:12:21 : Starting 9 (Numbers processed so far: 1 5 )
18:12:23 : Ending 6
--------------------
Type: EXEC
18:12:23 : Starting 10 (Numbers processed so far: 2 6 )
18:12:25 : Ending 7
At 18:12:21 thread 9 knew, that thread 1 and 5 have been processed. 2 seconds later thread 6 ends. The next thread has to know at least about 1, 5 and 6, right?. But thread 10 only knows about 6 and 2.
I realized, this has to do something with the cores specified in makeCluster. 9 knows about 1, 5 and 9 (1 + 4 + 4), 10 knows about 2,6 and 10 (2 + 4 + 4).
Is there a better way to pass "processed" stuff to further generations of threads?
Bonuspoints: Is there a way to "print" to the master- node in parallel processing, without having these "Type: EXEC" etc messages from the snow package? :)
Thanks!
Marc
My bad. Damn.
I thought, foreach with %dopar% is load-balanced. This isn't the case, and makes my question absolete, because there can nothing be executed on the host-side while parallel processing. That explains why global variables are only manipulated on the client side and never reach the host.

Resources