In Cython, one can use array views, e.g.
cdef void func(float[:, :] arr)
In my usage the second dimension should always have a shape of 2. Can I tell Cython this? I was thinking of something like:
cdef void func(float[:, 2] arr)
but this is invalid syntax. Or is it possible to have something more similar to C++, e.g.
cdef void func(tuple<float, float>[:] arr)
Thanks in advance!
You can use a 2D static array instead. Just use the pointer notation. Here is how you achieve it:
def pyfunc():
    # static 1D array
    cdef float *arr1d = [1, -1, 0, 2, -1, -1, 4]
    # static 2D array
    cdef float[2] *arr2d = [[1., 2.], [3., 4.]]
    # pass to a "cdef"ed function
    cfunc(arr2d)

# your function signature would now look like this
cdef void cfunc(float[2] *arr2d):
    print("my 2D static array")
    print(arr2d[0][0], arr2d[0][1], arr2d[1][0], arr2d[1][1])
Calling it you get:
>>> pyfunc()
my 2D static array
1.0, 2.0, 3.0, 4.0
I don't think this is really supported, but if you want to do this then the best way is probably to use memoryviews of structs (which are compatible with NumPy's custom dtypes):
import numpy as np

cdef packed struct Pair1:  # packed ensures it matches custom numpy dtypes
    # (but probably doesn't matter here!)
    double x
    double y

# pair 1 matches arrays of this dtype
pair_1_dtype = [('x', np.float64), ('y', np.float64)]

cdef packed struct Pair2:
    double data[2]

pair_2_dtype = [('data', np.float64, (2,))]

def pair_func1(Pair1[::1] x):
    # do some very basic work
    cdef Pair1 p
    cdef Py_ssize_t i
    p.x = 0; p.y = 0
    for i in range(x.shape[0]):
        p.x += x[i].x
        p.y += x[i].y
    return p  # take advantage of auto-conversion to a dict

def pair_func2(Pair2[::1] x):
    # do some very basic work
    cdef Pair2 p
    cdef Py_ssize_t i
    p.data[0] = 0; p.data[1] = 0
    for i in range(x.shape[0]):
        p.data[0] += x[i].data[0]
        p.data[1] += x[i].data[1]
    return p  # take advantage of auto-conversion to a dict
and a function to show you how to call it:
def call_pair_funcs_example():
    # generate data of correct dtype
    d = np.random.rand(100, 2)
    d1 = d.view(dtype=pair_1_dtype).reshape(-1)
    print(pair_func1(d1))
    d2 = d.view(dtype=pair_2_dtype).reshape(-1)
    print(pair_func2(d2))
The thing I'd like to have done is:
ctypedef double[2] Pair3

def pair_func3(Pair3[::1] x):
    # do some very basic work
    cdef Pair3 p
    cdef Py_ssize_t i
    p[0] = 0; p[1] = 0
    for i in range(x.shape[0]):
        p[0] += x[i][0]
        p[1] += x[i][1]
    return p  # ???
That compiles successfully, but I couldn't find any way of converting it from numpy. If you could work out how to get this version to work then I think it would be the most elegant solution.
Note that I'm not convinced of the performance advantages of any of these solutions. Your best move is probably to tell Cython that the trailing dimension is contiguous in memory (e.g. double [:,::1]) but let it be any size.
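For anyone who wants to see what the structured-dtype views in the struct approach do on the NumPy side, here is a minimal pure-Python sketch. The names pair_fields and pair_array are made up for this illustration (they mirror pair_1_dtype and pair_2_dtype), and the tiny 3x2 input is an assumption chosen so the sums are easy to check by hand.

```python
import numpy as np

# Hypothetical names for this sketch; the dtypes mirror pair_1_dtype / pair_2_dtype.
pair_fields = np.dtype([("x", np.float64), ("y", np.float64)])
pair_array = np.dtype([("data", np.float64, (2,))])

d = np.arange(6, dtype=np.float64).reshape(3, 2)  # rows: [0, 1], [2, 3], [4, 5]

# Reinterpret each row of two doubles as one record; no data is copied.
d1 = d.view(pair_fields).reshape(-1)
d2 = d.view(pair_array).reshape(-1)

sum_x = d1["x"].sum()              # first column: 0 + 2 + 4
sum_y = d1["y"].sum()              # second column: 1 + 3 + 5
sum_data = d2["data"].sum(axis=0)  # both columns at once

print(d1.shape, d2.shape, sum_x, sum_y, sum_data)
```

Both views have one record per original row, so the length-2 constraint is carried by the dtype rather than by the array shape.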
I am trying to write an explicit successive over-relaxation (SOR) function over a 2D matrix, in this case for an electrostatic potential.
When trying to optimize this in Cython I seem to get an error that I am not quite sure I understand.
%%cython
cimport cython
import numpy as np
cimport numpy as np
from libc.math cimport pi
#SOR function
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.initializedcheck(False)
@cython.nonecheck(False)
def SOR_potential(np.float64_t[:, :] potential, mask, int max_iter, float error_threshold, float alpha):
    #the ints
    cdef int height = potential.shape[0]
    cdef int width = potential.shape[1] #more general non quadratic
    cdef int it = 0
    #the floats
    cdef float error = 0.0
    cdef float sor_adjustment
    #the copy array we will iterate over and return
    cdef np.ndarray[np.float64_t, ndim=2] input_matrix = potential.copy()
    #set the ideal alpha if user input is 0.0
    if alpha == 0.0:
        alpha = 2/(1+(pi/((height+width)*0.5)))
    #start the SOR loop. The for loops omit the 0 and -1 index
    #because they are *shadow points* used for Neumann boundary conditions
    cdef int row, col
    #iteration loop
    while True:
        #2-stencil loop
        for row in range(1, height-1):
            for col in range(1, width-1):
                if not(mask[row][col]):
                    potential[row][col] = 0.25*(input_matrix[row-1][col] + \
                                                input_matrix[row+1][col] + \
                                                input_matrix[row][col-1] + \
                                                input_matrix[row][col+1])
                    sor_adjustment = alpha * (potential[row][col] - input_matrix[row][col])
                    input_matrix[row][col] = sor_adjustment + input_matrix[row][col]
                    error += np.abs(input_matrix[row][col] - potential[row][col])
        #by the end of this loop input_matrix and potential have diff values
        if error<error_threshold:
            break
        elif it>max_iter:
            break
        else:
            error = 0
            it = it + 1
    return input_matrix, error, it
and I used a very simple example for an array to see if it would give an error output.
test = [[True, False], [True, False]]
pot = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float64)
SOR_potential(pot, test, 50, 0.1, 0.0)
Gives out this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [30], line 1
----> 1 SOR_potential(pot, test, 50, 0.1, 0.0)
File _cython_magic_6c09a5060df996862b8e35adacc0e25c.pyx:21, in _cython_magic_6c09a5060df996862b8e35adacc0e25c.SOR_potential()
TypeError: Cannot convert _cython_magic_6c09a5060df996862b8e35adacc0e25c._memoryviewslice to numpy.ndarray
But when I delete the np.float64_t[:, :] part from
def SOR_potential(np.float64_t[:, :] potential,...)
the code works. Of course, the simple 2x2 matrix will not converge but it gives no errors. Where is the mistake here?
I also tried importing the modules differently as suggested here
Cython: how to resolve TypeError: Cannot convert memoryviewslice to numpy.ndarray?
but I got 2 errors instead of 1 where there were type mismatches.
Note: I would also like to ask, how would I define a numpy array of booleans to put in front of the "mask" input in the function?
A minimal reproducible example of your error message would look like this:
def foo(np.float64_t[:, :] A):
    cdef np.ndarray[np.float64_t, ndim=2] B = A.copy()
    # ... do something with B ...
    return B
The problem is that A is a memoryview while B is an np.ndarray, and A.copy() returns another memoryview, which cannot be assigned to an ndarray-typed variable. If both A and B are memoryviews, i.e.
def foo(np.float64_t[:, :] A):
    cdef np.float64_t[:, :] B = A.copy()
    # ... do something with B ...
    return np.asarray(B)
your example will compile without errors. Note that you then need to call np.asarray if you want to return a np.ndarray.
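To make the np.asarray step a bit more concrete, here is a small plain-Python sketch. It uses Python's built-in memoryview as a stand-in for a Cython typed memoryview (an assumption for illustration only; the two are different types, but both expose the buffer protocol that np.asarray relies on).

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.float64)
view = memoryview(a)   # stand-in for a Cython typed memoryview
b = np.asarray(view)   # wrap the same buffer as an ndarray, no copy

b[0, 0] = 42.0         # writing through the wrapper changes the original data
print(type(view).__name__, type(b).__name__, a[0, 0])
```

The wrapper is cheap because it shares the underlying buffer, so returning np.asarray(B) does not duplicate the array.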
Regarding your second question: You could use a memoryview with dtype np.uint8_t
def foo(np.float64_t[:, :] A, np.uint8_t[:, :] mask):
    cdef np.float64_t[:, :] B = A.copy()
    # ... do something with B and mask ...
    return np.asarray(B)
and call it like this from Python:
mask = np.array([[True, True], [False, False]], dtype=bool)
A = np.ones((2,2), dtype=np.float64)
foo(A, mask)
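In case your Cython version rejects the bool array with a buffer dtype mismatch (that is an assumption on my side, it may well be accepted as shown above), one possible workaround is to reinterpret the mask as uint8 on the Python side before passing it in. A minimal NumPy-only sketch:

```python
import numpy as np

mask = np.array([[True, True], [False, False]], dtype=bool)
mask_u8 = mask.view(np.uint8)  # same bytes reinterpreted as 0/1, no copy

print(mask_u8.dtype, mask_u8.tolist(), np.shares_memory(mask, mask_u8))
```

Because bool and uint8 both occupy one byte per element, the view costs nothing and mask_u8 could then be passed where the uint8 memoryview is expected.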
PS: If your array's buffers are guaranteed to be C-Contiguous, you can use contiguous memoryviews for better performance:
def foo(np.float64_t[:, ::1] A, np.uint8_t[:, ::1] mask):
    cdef np.float64_t[:, ::1] B = A.copy()
    # ... do something with B and mask ...
    return np.asarray(B)
I would like to do something like this:
Base.@kwdef mutable struct Setup
    # physics
    lx = 20.0
    dc = 1.0
    n = 4
    # initial condition
    ic(x) = exp(-(x-lx/4)^2)
    # numerics
    nx = 200
    nvis = 50
    # derived numerics
    dx = lx/nx
    dt = dx^2/dc/10
    nt = nx^2 ÷ 5
    # arrays
    xc = LinRange(dx/2,lx-dx/2,nx)
    C0 = ic.(xc)
    C = copy(C)
    q = zeros(nx-1)
    # collections for easy use
    dgl_params=[dc,n]
end
The problem here is that it says ic was undefined. Makes sense, because ic is not in the global scope.
Then I tried writing an outside constructor instead (I am not writing an inside constructor as that would overwrite the default constructor).
Base.@kwdef mutable struct Setup
    # physics
    lx = 20.0
    dc = 1.0
    n = 4
    # initial condition
    ic(x) = exp(-(x-lx/4)^2)
    # numerics
    nx = 200
    nvis = 50
    # derived numerics
    dx = lx/nx
    dt = dx^2/dc/10
    nt = nx^2 ÷ 5
    # arrays
    xc = LinRange(dx/2,lx-dx/2,nx)
    # C0 = ic.(xc)
    C0
    C = copy(C)
    q = zeros(nx-1)
    # collections for easy use
    dgl_params=[dc,n]
end

function Setup()
    Setup(Setup.ic(Setup.xc))
end

Setup()
But now it says DataType has no field ic, which of course makes sense: I want the ic of the object itself. However, there appears to be no self or this keyword in Julia.
Strangely enough, the above seems to work fine with dx or dt, which also depend on other variables.
Normally the design is to have multiple dispatch and functions outside of the object.
When creating structs, always provide the datatype of the elements.
For large structs like this one, you will usually find the Parameters package more convenient when debugging later.
The easiest way to circumvent the limitation is to have a lambda function in a field such as (this is however not the recommended Julia style):
using Parameters

@with_kw mutable struct Setup
    lx::Float64 = 20.0
    ic::Function = x -> lx * x
end
This can be now used as:
julia> s = Setup(lx=30)
Setup
lx: Float64 30.0
ic: #10 (function of type var"#10#14"{Int64})
julia> s.ic(10)
300
Actually, it is not in the design to have what in Java or C++ you would call "member functions". Part of this is Julia's will to benefit from the multiple dispatch programming paradigm. In Julia, mutables are pointers, so you pass them directly to a function, e.g.
function ic(setup::Setup, x)
    return exp(-(x-setup.lx/4)^2)
end
That said, there is still a way to have more Java-esque classes, though it is not really recommended. Check this thread and, in particular, the answer marked as the solution, given by one of Julia's authors.
Okay, I found the solution.
This does not work, because there are no methods in Julia:
Base.@kwdef mutable struct S
    n = 5
    m
    f(x) = x + 100
    A = f.(randn(n,m))
end
s = S(m=5) # ERROR: UndefVarError: f not defined
s.A
s.f(5)
But this does work, because here f is a variable and not a function:
Base.@kwdef mutable struct S
    n = 5
    m
    f = x -> x + 100
    A = f.(randn(n,m))
end
s = S(m=5)
s.A
s.f(5)
I am trying to convert a Python list of lists to a Cython multidimensional array.
The list has 300,000 elements; each element is a list of 10 integers, created randomly for this example. The way I tried works fine as long as my Cython multidimensional array is not bigger than about [210000][10]. My actual project is of course more complex, but I believe that if I get this example to work, the rest is just more of the same.
I have a Cython file "array_cy.pyx" with the following content:
cpdef doublearray(list list1):
    cdef int[200000][10] a
    cdef int i
    cdef int y
    cdef int j
    cdef int value = 0
    for i in range(200000):
        for y in range(10):
            a[i][y] = list1[i][y]
    print("doublearray")
    print(a[40000][6])

cpdef doublearray1(list list1):
    cdef int[300000][10] a
    cdef int i
    cdef int y
    cdef int value = 0
    for i in range(300000):
        for y in range(10):
            a[i][y] = list1[i][y]
    print("doublearray1")
    print(a[40000][6])
Then in the main.py I have
import array_cy
import random
list1 = []
for i in range(300000):
    list2 = []
    for j in range(10):
        list2.append(random.randint(0, 22))
    list1.append(list2)
array_cy.doublearray(list1)
array_cy.doublearray1(list1)
And the output is:
doublearray
4
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
So the function doublearray(list) works fine and the output is some random number, as expected. But doublearray1(list) gives SIGSEGV. If in doublearray1(list) I comment out the line
print(a[40000][6])
it also runs through without a problem, which makes sense because I never try to access the array. I don't understand why it does not work. I thought that in C the limit on the number of elements in an array would be defined by the hardware. My goal is to convert the Python list of lists into a Cython multidimensional array that I can access without any Python interaction.
The suggested question is about using malloc. I think that is what I need, but I still can't get it to work, because if I change the two functions to:
cpdef doublearray(list list1):
    cdef int[200000][10] a = <int**> malloc(200000 * 10 * sizeof(int))
    cdef int i
    cdef int y
    cdef int j
    cdef int value = 0
    for i in range(200000):
        for y in range(10):
            a[i][y] = list1[i][y]
    print("doublearray")
    print(a[40000][6])

cpdef doublearray1(list list1):
    cdef int[300000][10] a = <int**> malloc(300000 * 10 * sizeof(int))
    cdef int i
    cdef int y
    cdef int value = 0
    for i in range(300000):
        for y in range(10):
            a[i][y] = list1[i][y]
    print("doublearray1")
    print(a[40000][6])
still only the smaller array works.
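The segfault comes from the stack: with 4-byte ints, a local cdef int[300000][10] needs about 12 MB (300000 * 10 * 4 bytes), which is more than the typical 8 MiB default stack limit on Linux, whereas 200000 rows (8 MB) still just fit. The way to do this in C is to flatten the list of lists of length 10 into a 1D array, using malloc to allocate enough space on the heap and freeing it afterwards. Another way is to use an array of pointers.
from libc.stdlib cimport malloc, free

cpdef doublearray1(list list1):
    cdef int *a = <int *> malloc(3000000 * sizeof(int))
    cdef int i
    cdef int y
    if a == NULL:
        raise MemoryError()
    try:
        for i in range(300000):
            for y in range(10):
                a[i*10+y] = list1[i][y]
        print("doublearray1")
        # same as a[2][5] in a 2D array
        print(a[25])
    finally:
        free(a)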
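If it helps to check the index arithmetic outside of Cython, the same flattening can be sketched in plain Python with the standard library array module. The helper names flatten_rows and at are hypothetical, and the 4x10 input is a small stand-in for the 300,000x10 list.

```python
from array import array

def flatten_rows(rows, ncols):
    """Copy a list of equal-length integer rows into one flat C-int array."""
    flat = array("i")
    for row in rows:
        if len(row) != ncols:
            raise ValueError("every row must have exactly ncols entries")
        flat.extend(row)
    return flat

def at(flat, ncols, i, j):
    """Row-major lookup: element (i, j) lives at offset i * ncols + j."""
    return flat[i * ncols + j]

rows = [[10 * i + j for j in range(10)] for i in range(4)]
flat = flatten_rows(rows, 10)
print(len(flat), at(flat, 10, 2, 5), flat[25])
```

Here at(flat, 10, 2, 5) and flat[25] read the same slot, which is exactly the a[i*10+y] mapping used in the malloc version.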
Why is the image coming out black when saved? I am just beginning to learn OpenCL.
Without OpenCL, purely on the CPU, the loop iterates through the matrix and uses the rgb2gray average formula to store the values in the gray array.
I am using Windows and Python 3.8.
import pyopencl
import numpy as np
import imread
import matplotlib.pyplot as plt
ocl_platforms = (platform.name for platform in pyopencl.get_platforms())
print("\n".join(ocl_platforms))
# select platform
platform = pyopencl.get_platforms()[0]
# select device
device = platform.get_devices()[0]
# create context
ctx = pyopencl.Context(devices=[device])
img = imread.imread('gigapixel.jpg')
r = np.array(img[:, :, 0], dtype=np.float32)
g = np.array(img[:, :, 1], dtype=np.float32)
b = np.array(img[:, :, 2], dtype=np.float32)
gray = np.empty_like(r)
# without gpu
for i in range(r.shape[0]):
    for j in range(r.shape[1]):
        gray[i, j] = (r[i, j] + g[i, j] + b[i, j]) / 3
plt.imshow(gray)
plt.show()
# convert to uint8
gray = np.uint8(gray)
# save image
imread.imsave('gray_cpu.jpg', gray)
With the GPU, the rest of the code is:
gray = np.empty_like(r)
program_source = """
__kernel void rgb2gray(__global float *r, __global float *g, __global float *b,
                       __global float *gray) {
    int i = get_global_id(0);
    int j = get_global_id(1);
    gray[i, j] = (r[i, j] + g[i, j] + b[i, j])/ 3;
}
"""
gpu_program_source = pyopencl.Program(ctx, program_source)
gpu_program = gpu_program_source.build()
program_kernel_names = gpu_program.get_info(pyopencl.program_info.KERNEL_NAMES)
print(program_kernel_names)
queue = pyopencl.CommandQueue(ctx)
r_buf = pyopencl.Buffer(ctx, pyopencl.mem_flags.READ_ONLY |
                        pyopencl.mem_flags.COPY_HOST_PTR, hostbuf=r)
g_buf = pyopencl.Buffer(ctx, pyopencl.mem_flags.READ_ONLY |
                        pyopencl.mem_flags.COPY_HOST_PTR, hostbuf=g)
b_buf = pyopencl.Buffer(ctx, pyopencl.mem_flags.READ_ONLY |
                        pyopencl.mem_flags.COPY_HOST_PTR, hostbuf=b)
gray_buf = pyopencl.Buffer(ctx, pyopencl.mem_flags.WRITE_ONLY, r.nbytes)
gpu_program.rgb2gray(queue, r.shape, None, r_buf, g_buf, b_buf, gray_buf)
pyopencl.enqueue_copy(queue, gray, gray_buf)
plt.imshow(gray)
plt.show()
gray = np.uint8(gray)
imread.imsave('gigapixel_gray.jpg', gray)
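If you need to keep the array[x, y] notation, try Numba instead of PyOpenCL. Numba compiles Python functions directly (to machine code via LLVM, and to CUDA kernels for GPUs), so NumPy-style indexing keeps working.
PyOpenCL is only a wrapper over the OpenCL API, so the kernel source is compiled as OpenCL C. There, r[i, j] uses the comma operator and means r[j], and the buffers are flat float pointers, so you need to compute the row-major offset yourself:
int i = get_global_id(0);
int j = get_global_id(1);
int width = get_global_size(1);
gray[i * width + j] = (r[i * width + j] + g[i * width + j] + b[i * width + j]) / 3;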
If you want to see the errors produced at any stage (kernel compilation, buffer copying, etc.), you need to catch Python exceptions, because PyOpenCL maps OpenCL errors to Python exceptions. You can check them like this:
try:
    your_opencl_accelerated_function()
except pyopencl.Error as exc:
    print("Something didn't work:", exc)
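To see why comma-style indexing inside a C kernel would leave most of the image unwritten (and therefore black), here is a toy simulation in plain Python. Everything in it is hypothetical scaffolding: run_kernel, comma_index and flat_index are names invented for this sketch, and the nested loops only imitate the 2D work-item grid; no OpenCL is involved.

```python
def run_kernel(height, width, index_of):
    """Run a toy per-pixel 'kernel' over a flat buffer, one call per (i, j)."""
    r = [float(k) for k in range(height * width)]
    gray = [None] * (height * width)   # None marks "never written"
    for i in range(height):            # plays the role of get_global_id(0)
        for j in range(width):         # plays the role of get_global_id(1)
            k = index_of(i, j, width)
            gray[k] = r[k]
    return gray

def comma_index(i, j, width):
    """In C, the expression `i, j` is the comma operator and evaluates to j."""
    return j

def flat_index(i, j, width):
    """Row-major offset of element (i, j) in a flat buffer."""
    return i * width + j

broken = run_kernel(3, 4, comma_index)
fixed = run_kernel(3, 4, flat_index)
print(sum(v is not None for v in broken), sum(v is not None for v in fixed))
```

With the comma-style index only the first row's worth of slots is ever written, while the row-major offset touches every pixel.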
How would I determine the number of elements in a pointer variable in Cython? I saw that in C one way seems to be sizeof(ptr)/sizeof(int), if the pointer points to int variables. But that doesn't seem to work in Cython. E.g. when I tried to join two memory views into a single pointer like so:
from libc.stdlib cimport malloc, free

cdef int * join(int[:] a, int[:] b):
    cdef:
        int n_a = a.shape[0]
        int n_b = b.shape[0]
        int new_size = n_a + n_b
        int *joined = <int *> malloc(new_size*sizeof(int))
        int i
    try:
        for i in range(n_a):
            joined[i] = a[i]
        for i in range(n_b):
            joined[n_a+i] = b[i]
        return joined
    finally:
        free(joined)

@cython.cdivision(True)
def join_memviews(int[:] n, int[:] m):
    cdef int[:] arr_fst = n
    cdef int[:] arr_snd = m
    cdef int *arr_new
    cdef int new_size
    arr_new = join(arr_fst,arr_snd)
    new_size = sizeof(arr_new)/sizeof(int)
    return [arr_new[i] for i in range(new_size)]
I do not get the desired result when calling join_memviews from a python script, e.g.:
# in python
a = np.array([1,2])
b = np.array([3,4])
a_b = join_memviews(a,b)
I also tried using the types
DTYPE = np.int
ctypedef np.int_t DTYPE_t
as the argument inside sizeof(), but that didn't work either.
Edit: The handling of the pointer variable was apparently a bit careless of me. I hope the following is fine (even though it might not be a prudent approach):
cdef int * join(int[:] a, int[:] b, int new_size):
    cdef:
        int n_a = a.shape[0]
        int n_b = b.shape[0]
        int *joined = <int *> malloc(new_size*sizeof(int))
        int i
    for i in range(n_a):
        joined[i] = a[i]
    for i in range(n_b):
        joined[n_a+i] = b[i]
    return joined

def join_memviews(int[:] n, int[:] m):
    cdef int[:] arr_fst = n
    cdef int[:] arr_snd = m
    cdef int *arr_new
    cdef int new_size = n.shape[0] + m.shape[0]
    try:
        arr_new = join(arr_fst,arr_snd, new_size)
        return [arr_new[i] for i in range(new_size)]
    finally:
        free(arr_new)
You can't. It doesn't work in C either. sizeof(ptr) returns the amount of memory used to store the pointer (i.e. typically 4 or 8 depending on your system) rather than the length of the array. The lengths of your malloced arrays are something that you need to keep track of manually.
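The same effect can be observed from plain Python with ctypes, which may make the point easier to verify. This is only an illustrative sketch; the array sizes 5 and 500 are arbitrary choices.

```python
import ctypes

IntArray5 = ctypes.c_int * 5
IntArray500 = ctypes.c_int * 500
small, large = IntArray5(), IntArray500()

# Pointers to the first element of each array.
p_small = ctypes.cast(small, ctypes.POINTER(ctypes.c_int))
p_large = ctypes.cast(large, ctypes.POINTER(ctypes.c_int))

# The pointer size is the same no matter how long the array behind it is.
print(ctypes.sizeof(p_small), ctypes.sizeof(p_large))

# Only the array type itself still knows its length.
print(ctypes.sizeof(small) // ctypes.sizeof(ctypes.c_int))  # 5
```

Once an array has decayed to a pointer, the length information is gone, which is why it has to be passed around separately.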
Additionally the following code is a recipe for disaster:
cdef int *joined = <int *> malloc(new_size*sizeof(int))
try:
    return joined
finally:
    free(joined)
The free happens immediately on function exit so that an invalid pointer is returned to the calling function.
You should be using properly managed Python arrays (either from numpy or the standard library array module) unless you absolutely can't avoid it.
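As a rough idea of what the managed alternative could look like with the standard library array module, here is a minimal sketch. The function name join_int_arrays is hypothetical, and plain lists are used as inputs to keep it self-contained.

```python
from array import array

def join_int_arrays(a, b):
    """Concatenate two integer sequences into one array of C ints."""
    joined = array("i", a)
    joined.extend(b)
    return joined

a_b = join_int_arrays([1, 2], [3, 4])
print(len(a_b), a_b.tolist())
```

The array object owns its memory and knows its own length, so there is neither a manual free nor a separate size variable to keep in sync.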