Using zip and drop in Julia

This code doesn't work for some reason:
collect(zip(drop([1,2,3], 1), drop([1,2,3], 1)))
I'm trying to drop the first element of a collection and zip up two copies of the result.

This code runs perfectly fine for me. Please check your version using versioninfo():
julia> collect(zip(drop([1,2,3], 1), drop([1,2,3], 1)))
2-element Array{Tuple{Int64,Int64},1}:
(2,2)
(3,3)
julia> versioninfo()
Julia Version 0.5.1
Commit 6445c82 (2017-03-05 13:25 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)
julia>

Related

Rcpp modules method overloading

Is there an "internal" (or, say, correct) Rcpp Modules way to export overloaded methods?
The Rcpp-modules vignette still has a "TODO" on providing a good example (in section 2.2.5 it says "TODO: mention overloading, need good example.").
I can export my overloaded methods following this solution from Romain François. However, issues can occur with the provided solutions, e.g.:
library(Rcpp)
# define example class in C++
sourceCpp(code = paste0('
#include<Rcpp.h>
class Test {
private:
int a;
public:
Test(): a{0} {};
Test(int x): a{x} {};
int foo();
int foo(int x);
};
int Test::foo() {
return a;
}
int Test::foo(int x) {
return a + x;
}
RCPP_MODULE(rawdata_module) {',
# solution 1 to handle member function overloading provided in
# https://lists.r-forge.r-project.org/pipermail/rcpp-devel/2010-November/001326.html
' int (Test::*foo1)(int x) = &Test::foo;
int (Test::*foo0)() = &Test::foo;
Rcpp::class_<Test>( "Test" )
.constructor()
.constructor<int>()',
# works: foo with 1 argument before 0 arguments!
' .method("foo", foo1)
.method("foo", foo0)',
# solution 2 (also working)
#works: foo with 1 argument before 0 arguments!
' //.method("foo", ( int (Test::*)(int) )(&Test::foo) )
//.method("foo", ( int (Test::*)() )(&Test::foo) )
;
}
'))
# create new object in R
obj0 <- Test$new()
obj1 <- Test$new(5L)
# test overloading
obj0$foo()
obj0$foo(3L)
obj1$foo()
obj1$foo(3L)
With both solutions, if the zero-argument method is exported before the one-argument method (foo0 before foo1 in solution 1), the code compiles fine and both call forms are accepted from R; however, only the zero-argument foo is ever called.
Thx & please be patient with me. I'd call myself rather inexperienced in C++...
Compiler info:
gcc (Debian 10.2.1-6) 10.2.1 20210110
R session info:
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Progress Linux 6.99 (fuchur-backports)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rcpp_1.0.8.3 colorout_1.2-2
loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0 codetools_0.2-18 RhpcBLASctl_0.21-247.1
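For reference, the C++ side of the two solutions in the question (picking one overload of foo by giving the member-function pointer an explicit type) can be sketched without Rcpp. This is only a minimal illustration under stated assumptions: the aliases Foo0 and Foo1 and the helpers call_foo0, call_foo1 and demo are made-up names, static_cast stands in for the C-style cast of solution 2, and the sketch says nothing about the order in which Rcpp Modules tries registered overloads.

```cpp
#include <cassert>

// Same shape as the Test class in the question, without any Rcpp dependency.
class Test {
    int a;
public:
    Test() : a{0} {}
    Test(int x) : a{x} {}
    int foo() { return a; }
    int foo(int x) { return a + x; }
};

// Hypothetical aliases for the two member-function pointer types.
using Foo0 = int (Test::*)();
using Foo1 = int (Test::*)(int);

// Style of solution 1: a typed pointer variable selects the overload.
int call_foo0(Test& t) {
    Foo0 f = &Test::foo;   // the target type picks foo()
    return (t.*f)();
}

// Style of solution 2: an explicit cast selects the overload
// (static_cast used here instead of the C-style cast in the question).
int call_foo1(Test& t, int x) {
    auto f = static_cast<Foo1>(&Test::foo);   // the cast picks foo(int)
    return (t.*f)(x);
}

// Mirrors the four R calls from the question on the C++ side.
void demo() {
    Test obj0;
    Test obj1(5);
    assert(call_foo0(obj0) == 0);
    assert(call_foo1(obj0, 3) == 3);
    assert(call_foo0(obj1) == 5);
    assert(call_foo1(obj1, 3) == 8);
}
```

Both pointer types resolve unambiguously in plain C++, so any order dependence seen from R would have to come from how the module dispatches between the registered methods, not from the pointer selection itself.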

In Julia, creating a Weights vector in statsbase

I am playing a bit with Julia.
Consider this function:
function drawValues(fromDistribution, byCount)
#=
inputs:
fromDistribution :
A 2D array
Each element is an array with two elements
The first one is a value, and the second one is the probability of that value
We will draw a value out of this distribution from a random number generator
byCount :
An integer
We draw that many values from the source distribution
=#
values = []
wts = []
for i = 1:length(fromDistribution)
push!(values, fromDistribution[i][1])
push!(wts , fromDistribution[i][2])
end
w = Weights(wts)
res = []
for i = 1:byCount
r = sample(values, w)
push!(res, r)
end
plot(values, wts)
print(res)
end
This throws the error:
ERROR: MethodError: no method matching Weights(::Array{Any,1}, ::Float64)
Closest candidates are:
Weights(::var"#18#V", ::var"#16#S") where {var"#16#S"<:Real, var"#17#T"<:Real, var"#18#V"<:AbstractArray{var"#17#T",1}} at /home/hedgehog/.julia/packages/StatsBase/EA8Mh/src/weights.jl:13
Weights(::Any) at /home/hedgehog/.julia/packages/StatsBase/EA8Mh/src/weights.jl:16
Stacktrace:
[1] Weights(::Array{Any,1}) at /home/hedgehog/.julia/packages/StatsBase/EA8Mh/src/weights.jl:16
[2] drawValues(::Array{Array{Float64,1},1}, ::Int64) at /home/hedgehog/LASER.jl:51
[3] top-level scope at REPL[13]:1
[4] run_repl(::REPL.AbstractREPL, ::Any) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288
It seems that the second definition (Weights(::Array{Any,1})) would fit. But somehow Julia sees two input arguments?
Please help.
Version details :
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD Ryzen 7 3700X 8-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-10.0.1 (ORCJIT, znver2)
Your vectors have element type Any.
It should be:
wts = Float64[]
When you write wts = [] it is equivalent to wts = Any[].
Have a look at the methods of weights:
julia> methods(weights)
# 3 methods for generic function "weights":
[1] weights(vs::AbstractArray{T,1} where T<:Real) in StatsBase at c:\JuliaPkg\Julia1.5.3\packages\StatsBase\EA8Mh\src\weights.jl:76
[2] weights(vs::AbstractArray{T,N} where N where T<:Real) in StatsBase at c:\JuliaPkg\Julia1.5.3\packages\StatsBase\EA8Mh\src\weights.jl:77
[3] weights(model::StatisticalModel) in StatsBase at c:\JuliaPkg\Julia1.5.3\packages\StatsBase\EA8Mh\src\statmodels.jl:143
A container whose element type is a subtype of Real is required.
Similarly, providing the element type is recommended for the other containers as well:
values = Float64[]
res = Float64[] # or maybe Int[] depending on what your code does

Julia: segfault when assigning DataFrame column while using threads

The following code creates a segfault for me - is this a bug? And if so, in which component?
using DataFrames
function test()
Threads.@threads for i in 1:50
df = DataFrame()
df.foo = 1
end
end
test()
(Julia needs to be started with multithreading support for this to work, e.g. JULIA_NUM_THREADS=50 julia)
It only generates a segfault if the number of iterations / threads is sufficiently high, e.g. 50. For lower numbers it segfaults only sporadically or never.
My environment:
julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 50
The segfault is most likely caused by the deprecated syntax you are using: something in the deprecation handling probably goes wrong when it is hit from many threads (I do not have enough cores to test this).
In general your code uses deprecated syntax (and produces something different than you probably expect):
~$ julia --depwarn=yes --banner=no
julia> using DataFrames
julia> df = DataFrame()
0×0 DataFrame
julia> df.foo=1
┌ Warning: `setproperty!(df::DataFrame, col_ind::Symbol, v)` is deprecated, use `df[!, col_ind] .= v` instead.
│ caller = top-level scope at REPL[3]:1
└ @ Core REPL[3]:1
1
julia> df # note that the deprecated syntax has added the column, but it has 0 rows
0×1 DataFrame
julia> df2 = DataFrame()
0×0 DataFrame
julia> df2.foo = [1] # this is a correct syntax - assign a vector
1-element Array{Int64,1}:
1
julia> df2[:, :foo2] .= 1 # or use broadcasting
1-element Array{Int64,1}:
1
julia> insertcols!(df2, :foo3 => 1) # or use insertcols! which does broadcasting automatically, see the docstring for details
1×3 DataFrame
│ Row │ foo │ foo2 │ foo3 │
│ │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1 │ 1 │ 1 │ 1 │
The reason why df.foo = 1 is disallowed and df.foo = [1] is required follows from the fact that, as opposed to e.g. R, Julia distinguishes scalars and vectors (in R everything is a vector).
Going back to the original question, something like this should work:
using DataFrames
function test()
Threads.@threads for i in 1:50
df = DataFrame()
df.foo = [1]
end
end
test()
Please let me know whether it still causes problems. Thank you!

ERROR: undefined reference to `sdot_' when using arma::dot

I run Rcpp::sourceCpp("test.cpp") and get the error output below. Note that check1() works and check2() fails; the only difference between them is arma::vec versus arma::fvec. The error happens on Windows. On Linux, the same code works.
(EDIT: I have added my R environment on Linux. PS: The results on Linux show that float is faster than double, which is why I prefer using float.)
C:/RBuildTools/3.5/mingw_64/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-36~1.1/include" -DNDEBUG -I../inst/include -fopenmp -I"C:/Users/wenji/OneDrive/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/wenji/OneDrive/Documents/R/win-library/3.6/RcppArmadillo/include" -I"Y:/" -O2 -Wall -mtune=generic -c check.cpp -o check.o
C:/RBuildTools/3.5/mingw_64/bin/g++ -shared -s -static-libgcc -o sourceCpp_3.dll tmp.def check.o -fopenmp -LC:/PROGRA~1/R/R-36~1.1/bin/x64 -lRlapack -LC:/PROGRA~1/R/R-36~1.1/bin/x64 -lRblas -lgfortran -lm -lquadmath -LC:/PROGRA~1/R/R-36~1.1/bin/x64 -lR
check.o:check.cpp:(.text+0xa18): undefined reference to `sdot_'
collect2.exe: error: ld returned 1 exit status
Below is the R environment on Windows:
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1 RcppArmadillo_0.9.850.1.0
[4] Rcpp_1.0.3
Below is the R environment on Linux:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3
[3] RcppArmadillo_0.9.850.1.0 Rcpp_1.0.4
Below is the code of "test.cpp":
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
return x * 2;
}
// [[Rcpp::export]]
arma::vec check1(arma::vec x1, arma::vec x2, int rep){
int n = x1.size();
arma::vec y(n);
y.fill(0);
for(int i = 0; i < rep; i ++){
y += x1 * arma::dot(x1, x2);
}
return y;
}
// [[Rcpp::export]]
arma::fvec check2(arma::fvec x1, arma::fvec x2, int rep){
int n = x1.size();
arma::fvec y(n);
y.fill(0);
for(int i = 0; i < rep; i ++){
y += x1 * arma::dot(x1, x2);
}
return y;
}
// You can include R code blocks in C++ files processed with sourceCpp
// (useful for testing and development). The R code will be automatically
// run after the compilation.
//
/*** R
timesTwo(42)
n = 100000
x1 = rnorm(n)
x2 = rnorm(n)
rep = 1000
system.time(y1 <- check1(x1, x2, rep))
system.time(y2 <- check2(x1, x2, rep))
head(y1)
head(y2)
*/
Below is the output on Linux:
> system.time(y1 <- check1(x1, x2, rep))
user system elapsed
0.156 0.000 0.160
> system.time(y2 <- check2(x1, x2, rep))
user system elapsed
0.088 0.000 0.100
There are two questions here:
1. Why did it work on Linux but not Windows?
R only has int and double, but not float (or 64-bit integer). The missing symbol sdot_ is the single-precision BLAS dot product. On Windows you are linking against R's own internal BLAS and LAPACK (-lRblas and -lRlapack in the link line above), which likely only provide the double-precision routines. On Linux the single-precision routines may be present in the system BLAS/LAPACK. That is my best guess.
2. Can you / should you use float with Armadillo?
Not really. R only has double and not float, so getting values back and forth will always involve copies, which is less efficient. I would stick with double.
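To make the point about copies concrete, here is a minimal sketch in plain C++ (no Rcpp or Armadillo, so it does not reproduce the link error). The helper names to_float and dot are made up for illustration. The idea is only that data arriving as double has to be converted element by element into a separate float buffer before any single-precision routine can use it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: narrowing to float needs a new buffer and an
// element-by-element conversion; the double data cannot be reused in place.
std::vector<float> to_float(const std::vector<double>& x) {
    return std::vector<float>(x.begin(), x.end());
}

// Hypothetical helper: plain dot product, accumulating in the element type T.
template <typename T>
T dot(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    T acc = T(0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Calling dot on the result of to_float(x) runs in single precision, but only after the extra pass over x, and values such as 0.1 no longer compare equal to their double originals after the round trip.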

PyOpenCL enqueue_copy hanging when run on different devices

I'm having trouble getting a kernel to run on two different OpenCL platforms. The only difference between the platforms is that one is OpenCL 1.1 and the other is OpenCL 1.2, as shown below:
Code works on this device (OS X 10.8):
===============================================================
('Platform name:', 'Apple')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'Apple')
('Platform version:', 'OpenCL 1.2 (Sep 20 2012 17:42:28)')
---------------------------------------------------------------
('Device name:', 'Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz')
('Device type:', 'CPU')
('Device memory: ', 8192L, 'MB')
('Device max clock speed:', 1800, 'MHz')
('Device compute units:', 4)
Target device (Ubuntu 11.04):
===============================================================
('Platform name:', 'NVIDIA CUDA')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'NVIDIA Corporation')
('Platform version:', 'OpenCL 1.1 CUDA 4.2.1')
---------------------------------------------------------------
('Device name:', 'Tesla M2050')
('Device type:', 'GPU')
('Device memory: ', 3071, 'MB')
('Device max clock speed:', 1147, 'MHz')
('Device compute units:', 14)
===============================================================
('Platform name:', 'NVIDIA CUDA')
('Platform profile:', 'FULL_PROFILE')
('Platform vendor:', 'NVIDIA Corporation')
('Platform version:', 'OpenCL 1.1 CUDA 4.2.1')
---------------------------------------------------------------
('Device name:', 'Tesla M2050')
('Device type:', 'GPU')
('Device memory: ', 3071, 'MB')
('Device max clock speed:', 1147, 'MHz')
('Device compute units:', 14)
I've traced what I believe to be the source of the hang to the following code:
# set up
host_array = numpy.array(arr)
device_buffer = pyopencl.Buffer(context, pyopencl.mem_flags.WRITE_ONLY, host_array.nbytes)
# run the kernel
program.run(queue, host_array.shape, None, device_buffer)
# copy the results back --- this call causes the code to hang ----
pyopencl.enqueue_copy(queue, host_array, device_buffer)
There are no code changes between the two devices and both devices are running PyOpenCL 2013.1. Am I missing something? Any suggestion is much appreciated.
Try adding .wait() to the program.run call. This will tell you whether it is actually the kernel that is hanging.
It turns out the problem was a threading issue. I was using a second thread, spawned with the threading module, to make my pyopencl calls. I believe the problem was that the context I was using for those calls had been created on the main thread.
To fix it, I made sure to create my context, queue, and program on the second thread instead of on the main thread.
