BLAS.axpy! slower than += in Julia - julia

Consider the following 4 functions in Julia. They all pick or compute a random column of a matrix A and add a constant times this column to a vector z.
The difference between slow1 and fast1 is how z is updated and likewise for slow2 and fast2.
The difference between the 1 functions and 2 functions is whether the matrix A is passed to the functions or computed on the fly.
The odd thing is that for the 1 functions, fast1 is faster (as I would expect when using BLAS instead of +=), but for the 2 functions slow2 is faster.
On this computer I get the following timings (for the second run of each function):
@time slow1(A, z, 10000);
0.172560 seconds (110.01 k allocations: 940.102 MB, 12.98% gc time)
@time fast1(A, z, 10000);
0.142748 seconds (50.07 k allocations: 313.577 MB, 4.56% gc time)
@time slow2(complex(float(x)), complex(float(y)), z, 10000);
2.265950 seconds (120.01 k allocations: 1.529 GB, 1.20% gc time)
@time fast2(complex(float(x)), complex(float(y)), z, 10000);
4.351953 seconds (60.01 k allocations: 939.410 MB, 0.43% gc time)
Is there an explanation for this behaviour? And is there a way to make BLAS faster than +=?
M = 2^10
x = [-M:M-1;]
N = 2^9
y = [-N:N-1;]
A = cis( -2*pi*x*y' )
z = rand(2*M) + rand(2*M)*im
function slow1(A::Matrix{Complex{Float64}}, z::Vector{Complex{Float64}}, maxiter::Int)
S = [1:size(A,2);]
for iter = 1:maxiter
idx = rand(S)
col = A[:,idx]
a = rand()
z += a*col
end
end
function fast1(A::Matrix{Complex{Float64}}, z::Vector{Complex{Float64}}, maxiter::Int)
S = [1:size(A,2);]
for iter = 1:maxiter
idx = rand(S)
col = A[:,idx]
a = rand()
BLAS.axpy!(a, col, z)
end
end
function slow2(x::Vector{Complex{Float64}}, y::Vector{Complex{Float64}}, z::Vector{Complex{Float64}}, maxiter::Int)
S = [1:length(y);]
for iter = 1:maxiter
idx = rand(S)
col = cis( -2*pi*x*y[idx] )
a = rand()
z += a*col
end
end
function fast2(x::Vector{Complex{Float64}}, y::Vector{Complex{Float64}}, z::Vector{Complex{Float64}}, maxiter::Int)
S = [1:length(y);]
for iter = 1:maxiter
idx = rand(S)
col = cis( -2*pi*x*y[idx] )
a = rand()
BLAS.axpy!(a, col, z)
end
end
Update:
Profiling slow2:
2260 task.jl; anonymous; line: 92
2260 REPL.jl; eval_user_input; line: 63
2260 profile.jl; anonymous; line: 16
2175 /tmp/axpy.jl; slow2; line: 37
10 arraymath.jl; .*; line: 118
33 arraymath.jl; .*; line: 120
5 arraymath.jl; .*; line: 125
46 arraymath.jl; .*; line: 127
3 complex.jl; cis; line: 286
3 complex.jl; cis; line: 287
2066 operators.jl; cis; line: 374
72 complex.jl; cis; line: 286
1914 complex.jl; cis; line: 287
1 /tmp/axpy.jl; slow2; line: 38
84 /tmp/axpy.jl; slow2; line: 39
5 arraymath.jl; +; line: 96
39 arraymath.jl; +; line: 98
6 arraymath.jl; .*; line: 118
34 arraymath.jl; .*; line: 120
Profiling fast2:
4288 task.jl; anonymous; line: 92
4288 REPL.jl; eval_user_input; line: 63
4288 profile.jl; anonymous; line: 16
1 /tmp/axpy.jl; fast2; line: 47
1 random.jl; rand; line: 214
3537 /tmp/axpy.jl; fast2; line: 48
26 arraymath.jl; .*; line: 118
44 arraymath.jl; .*; line: 120
1 arraymath.jl; .*; line: 122
4 arraymath.jl; .*; line: 125
53 arraymath.jl; .*; line: 127
7 complex.jl; cis; line: 286
3399 operators.jl; cis; line: 374
116 complex.jl; cis; line: 286
3108 complex.jl; cis; line: 287
2 /tmp/axpy.jl; fast2; line: 49
748 /tmp/axpy.jl; fast2; line: 50
748 linalg/blas.jl; axpy!; line: 231
Oddly, the computing time of col differs even though the functions are identical up to this point.
But += is still relatively faster than axpy!.
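For readers less familiar with the two update styles being compared: in Julia, `z += a*col` builds a new vector and rebinds the name `z` to it, whereas `BLAS.axpy!(a, col, z)` overwrites the existing `z` with `a*col + z`. A minimal Python sketch of that difference follows; the function names are hypothetical and it says nothing about why the timings above differ.

```python
def add_scaled_copy(a, x, y):
    """Return a NEW list holding a*x + y (analogue of `z += a*col`, which allocates)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def axpy_inplace(a, x, y):
    """Overwrite y with a*x + y (analogue of BLAS.axpy!(a, x, y))."""
    for i, xi in enumerate(x):
        y[i] += a * xi
    return y


col = [1 + 2j, 3 - 1j]
z = [0.5 + 0j, 1j]

z_new = add_scaled_copy(2.0, col, z)  # z itself is left untouched here
axpy_inplace(2.0, col, z)             # z now holds the updated values
```

One consequence in the Julia code above is that slow1 and slow2 never modify the caller's z, while fast1 and fast2 do.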

Some more info now that Julia 0.6 is out. To multiply a vector by a scalar, there are at least four options; all but the first (`a *= -1.`, which allocates a new array) work in place. Following Tim's suggestions, I used BenchmarkTools' @btime macro. It turns out that loop fusion, the most Julian way to write it, is on par with calling BLAS. That's something the Julia developers can be proud of!
using BenchmarkTools
function bmark(N)
a = zeros(N);
@btime $a *= -1.;
@btime $a .*= -1.;
@btime LinAlg.BLAS.scal!($N, -1.0, $a, 1);
@btime scale!($a, -1.);
end
And the results for 10^5 numbers.
julia> bmark(10^5);
78.195 μs (2 allocations: 781.33 KiB)
35.102 μs (0 allocations: 0 bytes)
34.659 μs (0 allocations: 0 bytes)
34.664 μs (0 allocations: 0 bytes)
The profiling backtrace shows that scale! just calls BLAS in the background, so the last two should give the same best time.

Related

What does the return value nfeval represent in the DEoptim R package?

I would like to know what nfeval returned by DEoptim R package represents.
The documentation simply states nfeval: number of function evaluations.
I assumed this meant "number of times the objective function was evaluated" and therefore should be approximately equal to the population size multiplied by the number of generations. However that doesn't seem to be the case, for example:
Rosenbrock <- function(x){
x1 <- x[1]
x2 <- x[2]
100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
run_deoptim_report_nfeval <- function(itermax,NP){
lower <- c(-10,-10)
upper <- -lower
opt <- DEoptim(Rosenbrock, lower, upper,
control = DEoptim.control(itermax=itermax,NP=NP,trace=FALSE))
return(opt$optim$nfeval)
}
library(DEoptim)
## vary number of generations
sapply(seq(10,200,10), function(itmx) run_deoptim_report_nfeval(itmx,NP=100))
[1] 22 42 62 82 102 122 142 162 182 202 222 242 262 282 302 322 342 362 382 402
## vary population size
sapply(seq(10,200,10), function(np) run_deoptim_report_nfeval(itermax=20,NP=np))
[1] 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42
What am I missing here? Why doesn't increasing NP increase nfeval? Why does nfeval=2*(1+itermax)?

Fibonacci sequence less than 1000 in R

I'm trying to print the Fibonacci sequence below 1000 using a while loop in R.
So far,
fib <- c(1,1)
counter <-3
while (fib[counter-1]<1000){
fib[counter]<- fib[counter-2]+fib[counter-1]
counter = counter+1
}
fib
I have this code. Only the first two numbers are given: 1,1. This is printing:
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
How do I fix my code to print only less than 1000?
Instead of comparing the last element with 1000, check the sum of the last two elements (that is, the next number to be added), like so:
fib <- c(1,1)
counter <-3
while (fib[counter-2]+fib[counter - 1]<1000){
fib[counter]<- fib[counter-2]+fib[counter-1]
counter = counter+1
}
fib
The issue with your approach is that by the time the while condition (fib[counter-1]<1000) becomes FALSE, you have already added a number greater than 1000 to fib.
You could return fib[-length(fib)] to remove the last number or check the number before inserting the number in fib.
fib <- c(1,1)
counter <-3
while (TRUE){
temp <- fib[counter-2] + fib[counter-1]
if(temp < 1000)
fib[counter] <- temp
else
break
counter = counter+1
}
fib
#[1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
You could change the while condition to sum the last 2 answers instead of just the last one:
fib <- c(1,1)
counter <-3
while (sum(fib[counter - 1:2]) < 1000){
fib[counter]<- fib[counter-2]+fib[counter-1]
counter = counter+1
}
fib
#> [1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Or just get rid of counter completely:
fib <- c(1,1)
while (sum(fib[length(fib) - 0:1]) < 1000) fib <- c(fib, sum(fib[length(fib) - 0:1]))
fib
#> [1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
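The check-before-append idea used in the answers above can also be sketched outside R. The Python version below is only an illustration; the function name `fib_below` is hypothetical and it assumes a limit greater than 1, so the two seed values always belong in the result.

```python
def fib_below(limit):
    """Return the Fibonacci numbers (seeded with 1, 1) that are strictly below limit."""
    fib = [1, 1]
    while True:
        nxt = fib[-2] + fib[-1]  # candidate value, not yet stored
        if nxt >= limit:
            break                # stop before the too-large value is appended
        fib.append(nxt)
    return fib


print(fib_below(1000))
```

Testing the candidate before storing it is what keeps 1597 out of the result.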

How to switch between radians and degrees in SAS

I'm looking for an easy way to run trig functions in SAS without having to manually correct each calculation. Below is what I am working with.
I am probably running this in SAS 9 (the SAS Studio Student Module), but this is a general SAS question.
I have manually created a variable, 'rad' in the 'calc' data step to deal with this but it adds a step of complexity that I would like to avoid.
I am asking whether there is a system setting, alternate trig function or ... ? that would change the calculation from:
bh_x = cos(rad*bh_a)*bh_l ;
to:
bh_x = cos(bh_a)*bh_l ;
so I don't have to manually convert my angle in degrees to radians for the trig function to work.
Thanks to anyone reading this and putting any mental effort to the solution!
Tim
data spec ;
length
b2h_a 8
b2h_l 8
b2h_l_e 8
bike $ 8
name $ 16
;
input
bike $
name $
bh_a
bh_l
ht_a
spcr
st_h
st_a
st_l
hb_r
hb_a
;
datalines ;
srcn (0,0) 0 0 67 0 0 0 0 0 0
srcn c 41 658 71.5 27 40 25 120 100 13
srcn ne_27_n13 41 658 71.5 27 40 27 127 100 13
srcn ne_15_0 41 658 71.5 15 40 27 127 100 0
srcn ne_5_0 41 658 71.5 5 40 27 127 100 0
srcn ne_2_n9 41 658 71.5 2 40 27 127 100 9
srcn ne_5_10 41 658 71.5 5 40 27 127 100 -10
srcn ne_10_rf10 41 658 71.5 10 40 27 127 20 -10
srcn max 41 658 90 250 0 0 250 0 0
;
run ;
data calc ;
set spec ;
pi=constant('pi') ;
rad=pi/180 ;
bh_x = cos(rad*bh_a)*bh_l ;
bh_y = sin(rad*bh_a)*bh_l ;
sr_x = (cos(rad*ht_a)*(spcr+st_h/2))*-1 ;
sr_y = sin(rad*ht_a)*(spcr+st_h/2);
st_x = cos(rad*(90-ht_a+st_a))*st_l ;
st_y = sin(rad*(90-ht_a+st_a))*st_l ;
hb_x = cos(rad*(90-hb_a))*hb_r*-1 ;
hb_y = sin(rad*(90-hb_a))*hb_r ;
hd_x = bh_x + sr_x + st_x + hb_x ;
hd_y = bh_y + sr_y + st_y + hb_y ;
if hd_x=0 then do ;
b2h_a=0 ;
b2h_l=0 ;
end ;
else do ;
b2h_a = atan(hd_y/hd_x)/rad ;
b2h_l = hd_y/sin(b2h_a*rad) ;
end ;
b2h_l_e = b2h_l/25.4 ;
drop pi rad ;
format
b2h_a 5.
b2h_l 5.
b2h_l_e 5.
bh_a 5.
bh_l 5.
ht_a 5.
spcr 5.
st_h 5.
st_a 5.
st_l 5.
hb_r 5.
hb_a 5.
bh_x 5.
bh_y 5.
sr_x 5.
sr_y 5.
st_x 5.
st_y 5.
hb_x 5.
hb_y 5.
hd_x 5.
hd_y 5.
b2h_a 5.
b2h_l 5.
b2h_l_e 5.1
;
run ;
There are no trig functions in SAS that accept DEGREE or GRADIAN arguments. You always need to convert from your data's angular measurement system to RADIAN.
You can write a macro to perform the conversion. Example:
%macro cosD(theta);
%* theta is angle in degrees;
%* emit data step source code that performs conversion from degrees to radians;
cos(&theta*constant('PI')/180)
%mend;
In use:
data calc ;
set spec ;
bh_x = %cosD(bh_a) * bh_l ;
You could convert the angular data to radians during the step where input occurs and then not have to worry about it again.
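The wrapper idea behind the %cosD macro can be illustrated in Python as well. This is a sketch for comparison only, not SAS code; `cosd` and `sind` are hypothetical helper names, and the sample values are the bh_a and bh_l entries from the datalines above.

```python
import math


def cosd(theta_deg):
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(theta_deg))


def sind(theta_deg):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(theta_deg))


bh_a, bh_l = 41, 658      # angle in degrees, length
bh_x = cosd(bh_a) * bh_l  # same quantity as cos(rad*bh_a)*bh_l
bh_y = sind(bh_a) * bh_l  # same quantity as sin(rad*bh_a)*bh_l
```

Keeping the conversion inside the helper means the degree-to-radian factor appears in one place instead of in every formula.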

Create a vector from a specific sequence of intervals

I have 20 intervals:
10 intervals from 1 to 250 of size 25:
[1.25] [26.50] [51.75] [76.100] [101.125] [126.150] ... [226.250]
10 intervals from 251 to 1000 of size 75:
[251,325] [326,400] [401,475] [476,550] [551,625] ... [926,1000]
I would like to create a vector composed of the first 5 elements of each interval like:
(1,2,3,4,5, 26,27,28,29,30, 51,52,53,54,55, 76,77,78,79,80, ....,
251,252,253,254,255, 326,327,328,329,330, ...)
How can I create this vector using R?
Let's assume you have two sets of interval start points like:
interval1 <- seq(1.25, 226.250, 25)
interval2 <- seq(251, 1000, 75)
We can combine the two into one vector of start points and then use mapply to build a five-element sequence from each start:
new_interval <- c(as.integer(interval1), interval2)
c(mapply(`:`, new_interval, new_interval + 4))
#[1] 1 2 3 4 5 26 27 28 29 30 51 52 53 54 .....
#[89] ..... 779 780 851 852 853 854 855 926 927 928 929 930
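The same construction, sketched in Python for comparison: list the start of each interval, then expand every start into five consecutive integers. The variable names are illustrative, and the starts are taken as 1, 26, ..., 226 and 251, 326, ..., 926, as in the question.

```python
# Start points of the 10 intervals of width 25 and the 10 intervals of width 75.
starts = list(range(1, 250, 25)) + list(range(251, 1000, 75))

# First five integers of every interval, flattened into one list.
first_five = [value for start in starts for value in range(start, start + 5)]

print(first_five[:10])
print(len(first_five))
```

With 20 intervals and five values from each, the result has 100 elements.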

Creating data continuously using rnorm until an outlier occurs in R

Sorry for the confusing title, but I wasn't sure how to describe what I am trying to do. My objective is to create a dataset of 1000 observations, each of which is the length of a run. I have created a phase1 dataset, from which a set of control limits is produced. What I am trying to do now is create a phase2 dataset, most likely using rnorm: a repeat loop that keeps generating values until one of them falls outside the control limits produced from the phase1 dataset. For example, with control limits of 3.0 and -3.0, the phase2 dataset would accumulate observations until, say, observation 398, whose value happens to be 3.45, which stops the creation of data. My objective is then to record the number 398.
Furthermore, I then want to loop back to the phase1 dataset/control limits portion, create a new set of control limits, and run another phase2, until I have 1000 run lengths recorded. The code I have for the phase1/control limits works fine and looks like this:
nphase1=50
nphase2=1000
varcount=1
meanshift= 0
sigmashift= 1
##### phase1 dataset/ control limits #####
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- apply(phase1, 2, mean)
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
I have previously written this code in SAS, where it looks like this. It might be a better reference for what I am trying to achieve than my attempt to explain it.
%macro phase2_dataset (n=,varcount=, meanshift=, sigmashift=, nphase1=,simID=,);
%do z=1 %to &n;
%phase1_dataset (n=&nphase1, varcount=&varcount);
data phase2; set control_limits n=lastobs;
call streaminit(0);
do until (phase2_var1<Lower_SPC_limit_method1_var1 or
phase2_var1>Upper_SPC_limit_method1_var1);
phase2_var1 = rand("normal", &meanshift, &sigmashift);
output;
end;
run;
ods exclude all;
proc means data=phase2;
var phase2_var1;
ods output summary=x;
run;
ods select all;
data run_length; set x;
keep Phase2_var1_n;
run;
proc append base= QA.Phase2_dataset&simID data=Run_length force; run;
%end;
%mend;
I have also been researching whether to use a while loop in place of the repeat loop.
I'm new to R, so any ideas you are able to throw my way are greatly appreciated. Thanks!
Using a while loop indeed seems to be the way to go. Here's what I think you're looking for:
set.seed(10) #Making results reproducible
replicate(100, { #100 is easier to display here
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- colMeans(phase1) #Slightly better than apply
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
#Phase 2
x <- 0
count <- 0
while(x > Lower_SPC_Limit_Method1 && x < Upper_SPC_Limit_Method1) {
x <- rnorm(1)
count <- count + 1
}
count
})
The result is:
[1] 225 91 97 118 304 275 550 58 115 6 218 63 176 100 308 844 90 2758
[19] 161 311 1462 717 2446 74 175 91 331 210 118 1517 420 32 39 201 350 89
[37] 64 385 212 4 72 730 151 7 1159 65 36 333 97 306 531 1502 26 18
[55] 67 329 75 532 64 427 39 352 283 483 19 9 2 1018 137 160 223 98
[73] 15 182 98 41 25 1136 405 474 1025 1331 159 70 84 129 233 2 41 66
[91] 1 23 8 325 10 455 363 351 108 3
If performance becomes a problem, perhaps it would be interesting to explore some improvements, like creating more numbers with rnorm() at a time and then counting how many are necessary to exceed the limits and repeat if necessary.
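The batching idea can be sketched as follows. This is a Python illustration of the counting logic only, with hypothetical names (`run_length`, `batch`): draw a block of values, find the first one outside the limits, and draw another block if none is. In pure Python the batching brings no speed-up by itself; the benefit would come from a vectorised generator such as rnorm(n) in R.

```python
import random


def run_length(lower, upper, mean=0.0, sd=1.0, batch=256, rng=random):
    """Count draws up to and including the first one outside (lower, upper)."""
    count = 0
    while True:
        # Generate a whole block of normal draws at once.
        draws = [rng.gauss(mean, sd) for _ in range(batch)]
        for i, x in enumerate(draws, start=1):
            if x <= lower or x >= upper:
                return count + i  # position of the first out-of-limit draw
        count += batch            # nothing out of limits: repeat with a new block


random.seed(10)
lengths = [run_length(-3.0, 3.0) for _ in range(5)]
print(lengths)
```

The stopping rule mirrors the while condition in the R answer: a draw that lands exactly on a limit also ends the run.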
