What does 'CuArray only supports element types that are stored inline' mean in Julia using CuArrays? - julia

I want to know why some code works fine with standard arrays but fails with CuArrays.
For example, I have an array time_idx defined as:
1×32 CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
0.71173 0.941251 0.571602 0.037198 0.212053 0.227296 0.457712 0.697708 0.788338 0.994031 0.228599 … 0.856314 0.830083 0.111376 0.0333812 0.722638 0.293733 0.114187 0.072304 0.275268
and a vector of CuArrays vehicle_states, each of size 7×32:
3-element Vector{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
[0.49417984 0.11234676 … 0.107337356 0.72619927; 0.46416637 0.21656695 … 0.18117706 0.18970703; … ; 0.15575896 0.79976654 … 0.3788491 0.29301012; 0.97315633 0.8638843 … 0.5506643 0.30244973]
[0.4448264 0.9205822 … 0.61369383 0.5310524; 0.75463957 0.29982162 … 0.13896087 0.09793778; … ; 0.60275537 0.39284942 … 0.2803427 0.7379274; 0.8305204 0.056631837 … 0.16771089 0.9385667]
[0.78282833 0.594285 … 0.65157485 0.82812166; 0.28565544 0.021899216 … 0.7051293 0.48643407; … ; 0.18139555 0.44223073 … 0.9017556 0.3409817; 0.5128845 0.79966474 … 0.039010685 0.53230214]
I want to concatenate them using broadcasting, but this raises an error (the same code works with standard arrays):
vcat.(time_idx, vehicle_states) # CuArray only supports element types that are stored inline
But if I don't use broadcasting, it will work just fine:
[vcat(time_idx, vehicle_state) for vehicle_state in vehicle_states]
3-element Vector{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
[0.7117298 0.94125116 … 0.07230403 0.27526757; 0.49417984 0.11234676 … 0.107337356 0.72619927; … ; 0.15575896 0.79976654 … 0.3788491 0.29301012; 0.97315633 0.8638843 … 0.5506643 0.30244973]
[0.7117298 0.94125116 … 0.07230403 0.27526757; 0.4448264 0.9205822 … 0.61369383 0.5310524; … ; 0.60275537 0.39284942 … 0.2803427 0.7379274; 0.8305204 0.056631837 … 0.16771089 0.9385667]
[0.7117298 0.94125116 … 0.07230403 0.27526757; 0.78282833 0.594285 … 0.65157485 0.82812166; … ; 0.18139555 0.44223073 … 0.9017556 0.3409817; 0.5128845 0.79966474 … 0.039010685 0.53230214]
Why is that?

When you try to run:
vcat.(time_idx, vehicle_states)
broadcasting is probably trying to make the outer container a CuArray instead of a Vector. If you look at the type of vehicle_states:
3-element Vector{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}:
the outer type is just Vector: it is a Vector of CuArrays. More importantly, you cannot have a CuArray of CuArrays, for the reason given in the error message.
The error message is basically saying that each element inside a CuArray has to be isbits (that is the only way to store it in VRAM). When you have a Vector of Vectors, each element is a "pointer to a vector", which is not "just some bits" and thus not GPU compatible.

Related

Line profiling with cython in jupyter notebook

I'm trying to use the line_profiler library in a Jupyter notebook with a Cython function. It only half works: the result I get consists only of the first line of the function, with no profiling results.
%%cython -a
# cython: linetrace=True
# cython: binding=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1
import numpy as np
cimport numpy as np
from datetime import datetime
import math
cpdef np.int64_t get_days(np.int64_t year, np.int64_t month):
    cdef np.ndarray months=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
    if month==2:
        if (year%4==0 and year%100!=0) or (year%400==0):
            return 29
    return months[month-1]
The profiling result shows only one line of code:
Timer unit: 1e-07 s
Total time: 0.0015096 s
File: .ipython\cython\_cython_magic_0154a9feed9bbd6e4f23e57d73acf50f.pyx
Function: get_days at line 15
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    15                                           cpdef np.int64_t get_days(np.int64_t year, np.int64_t month):
This can be seen as a bug in the line_profiler (if it is supposed to support Cython). To get the code of the profiled function, line_profiler reads the pyx-file and tries to extract the code with help of inspect.getblock:
...
# read pyx-file
all_lines = linecache.getlines(filename)
# try to extract the body of the function starting at start_lineno:
sublines = inspect.getblock(all_lines[start_lineno-1:])
...
However, getblock knows nothing about cpdef functions, as Python has only def functions, and thus it yields the wrong function body (i.e. only the signature).
Workaround:
A simple workaround is to add a dummy def function after the cpdef function to act as a sentinel, so that inspect.getblock yields the whole body of the cpdef function plus the body of the sentinel function, i.e.:
%%cython
...
cpdef np.int64_t get_days(np.int64_t year, np.int64_t month):
    ...
def get_days_sentinel():
    pass
Now the report from %lprun -f get_days get_days(2019,3) looks as follows:
Timer unit: 1e-06 s
Total time: 1.7e-05 s
File: XXXX.pyx
Function: get_days at line 10
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                           cpdef np.int64_t get_days(np.int64_t year, np.int64_t month):
    11         1         14.0     14.0     82.4      cdef np.ndarray months=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
    12         1          1.0      1.0      5.9      if month==2:
    13                                                   if (year%4==0 and year%100!=0) or (year%400==0):
    14                                                       return 29
    15         1          2.0      2.0     11.8      return months[month-1]
    16
    17                                           def get_days_sentinel():
    18                                               pass
There are still somewhat ugly trailing lines from the sentinel, but that is probably better than not seeing anything at all.
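The getblock behavior described in this answer can be reproduced without Cython or line_profiler. The following is a minimal sketch, assuming CPython's standard inspect module; the source lines are a simplified stand-in for the notebook cell (plain int instead of np.int64_t, a shortened body), and the helper name extract_block is hypothetical.

```python
import inspect

# Simplified stand-in for the .pyx source (not the original notebook cell).
CPDEF_ONLY = [
    "cpdef int get_days(int year, int month):\n",
    "    if month == 2:\n",
        "        return 29\n",
    "    return 31\n",
]

# Same source with a dummy def-function appended as a sentinel.
WITH_SENTINEL = CPDEF_ONLY + [
    "def get_days_sentinel():\n",
    "    pass\n",
]


def extract_block(lines):
    """Hypothetical helper: return what inspect.getblock treats as the block."""
    return inspect.getblock(lines)


if __name__ == "__main__":
    # getblock looks for "def", "class" or "lambda"; "cpdef" is none of these,
    # so only the first line comes back.
    print(len(extract_block(CPDEF_ONLY)))
    # With the sentinel, the block runs through the end of the sentinel body.
    print(len(extract_block(WITH_SENTINEL)))
```

In this sketch the first call should return just the signature line, which matches the truncated report in the question, while the second should return the cpdef body together with the two sentinel lines.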

R data.table fread fails on special character

I can only give you a picture of the data I'm working with, or of the character that creates my problems in the .csv file. I don't know how to produce that character.
This pillar character stops fread from working. Is there a way to escape it? readr's read_csv reads through it with no problem. I have tried dropping it, making it a character column, and using comment.char = "", but nothing seems to work.
Here is what I'm hoping to get (what I get with read_csv):
# A tibble: 5 x 4
     X1 trade date       trade_condition
  <dbl> <dbl> <date>     <chr>
1  2902  28.3 2019-01-14 -12------P----
2  2903  28.0 2019-01-14 P
3  2904  28.0 2019-01-14 P
4  2905  28.0 2019-01-14 P
5  2906  28.1 2019-01-14 P
I'm using data.table_1.12.0
Here is the output with verbose = T:
omp_get_max_threads() = 8
omp_get_thread_limit() = 2147483647
DTthreads = 0
RestoreAfterFork = true
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 8 threads (omp_get_max_threads()=8, nth=8)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
Opening file C:/Users/Markku/Desktop/KONECRANES_2019.01.14/trades.csv
File opened, size = 592KB (606768 bytes).
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <<,trade,date,trade_condition,sy>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=',' with 100 lines of 9 fields using quote rule 0
Detected 9 columns on line 1. This line is either column names or first data row. Line starts as: <<,trade,date,trade_condition,sy>>
Quote rule picked = 0
fill=false and the most number of columns found is 9
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 10 because (606767 bytes from row 1 to eof) / (2 * 27623 jump0size) == 10
Type codes (jump 000) : 57AAAA5AA Quote rule 0
A line with too-few fields (4/9) was found on line 4 of sample jump 7. Most likely this jump landed awkwardly so type bumps here will be skipped.
A line with too-few fields (4/9) was found on line 13 of sample jump 9. Most likely this jump landed awkwardly so type bumps here will be skipped.
Type codes (jump 010) : 57AAAA5AA Quote rule 0
'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 858 sample rows
=====
Sampled 858 rows (handled \n inside quoted fields) at 11 jump points
Bytes from first data row on line 2 to the end of last row: 606683
Line length: mean=213.01 sd=86.78 min=59 max=372
Estimated number of rows: 606683 / 213.01 = 2849
Initial alloc = 5698 rows (2849 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
=====
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 57AAAA5AA
[10] Allocate memory for the datatable
Allocating 9 column slots (9 - 0 dropped) with 5698 rows
[11] Read the data
jumps=[0..1), chunk_size=606683, total_size=606683
Restarting team from jump 0. nSwept==0 quoteRule==1
jumps=[0..1), chunk_size=606683, total_size=606683
Restarting team from jump 0. nSwept==0 quoteRule==2
jumps=[0..1), chunk_size=606683, total_size=606683
Restarting team from jump 0. nSwept==0 quoteRule==3
jumps=[0..1), chunk_size=606683, total_size=606683
Read 2903 rows x 9 columns from 592KB (606768 bytes) file in 00:00.014 wall clock time
[12] Finalizing the datatable
Type counts:
2 : int32 '5'
1 : float64 '7'
6 : string 'A'
=============================
0.003s ( 21%) Memory map 0.001GB file
0.007s ( 50%) sep=',' ncol=9 and header detection
0.000s ( 0%) Column type detection using 858 sample rows
0.000s ( 0%) Allocation of 5698 rows x 9 cols (0.000GB) of which 2903 ( 51%) rows used
0.004s ( 29%) Reading 1 chunks (0 swept) of 0.579MB (each chunk 2903 rows) using 1 threads
+ 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
+ 0.002s ( 14%) Transpose
+ 0.002s ( 14%) Waiting
0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
0.014s Total
Warning message:
In fread(trades_file, verbose = T) :
Stopped early on line 2905. Expected 9 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<2903,28.04,2019-01-14,"P>>

Return multiple nested dictionaries from Tcl

I have a Tcl proc that creates two dictionaries from a large file. It is something like this:
...
...
proc makeCircuitData {spiceNetlist} {
    # read the spiceNetlist file line by line
    # create a dict with multilevel nesting called elementMap that will have the following structure:
    # elementMap key1 key2 value12
    # elementMap keyA keyB valueAB
    # and so on
    # ... some other code here ...
    # create another dict with multilevel nesting called cktElementAttr that will have the following structure:
    # cktElementAttr resistor leftVoltageNode1 rightVoltageNode1 resValue11
    # cktElementAttr resistor leftVoltageNode2 rightVoltageNode2 resValue12
    # cktElementAttr inductor leftVoltageNode2 rightVoltageNode2 indValue11
    # cktElementAttr inductor leftVoltageNode2 rightVoltageNode2 indValue12
    # cktElementAttr capacitor leftVoltageNode2 rightVoltageNode2 capValue11
    # ... so on...
}
I want to return these two nested dictionaries, cktElementAttr and elementMap, from procedures like the one above, as both are used by other parts of my program.
What is the recommended way to return two dictionaries from Tcl procs?
Thanks.
This should work:
return [list $cktElementAttr $elementMap]
Then, at the caller, you can assign the return value to a list:
set theDictionaries [makeCircuitData ...]
or assign them to different variables:
lassign [makeCircuitData ...] cEltAttr elmMap
In Tcl 8.4 or older (which are obsolete!), you can (ab)use foreach to do the job of lassign:
foreach {cEltAttr elmMap} [makeCircuitData ...] break
Documentation: break, foreach, lassign, list, return, set

Error DimensionMismatch in Julia

Hi,
Learning a bit more about Julia (0.4.0), I am facing an interesting situation, probably with a simple solution that escapes me.
I have an array similar to this one:
17200x11 Array{Any,2}:
1 -16.449 -1.091 -3.6087 -12.6724 -1.5945 -14.7705 -7.2174 -25.2609 -3.7766 -14.3509
1 -16.6168 -5.2032 1.091 -3.8605 1.1749 -11.6653 -6.1264 -16.3651 -2.0142 -14.0991
1 -16.8686 -7.3853 3.8605 6.2103 -0.9232 -6.546 -8.1406 -10.0708 -2.2659 -16.3651
1 -16.5329 -10.4904 -1.7624 8.1406 -10.2386 1.3428 -16.0294 -6.4621 -4.6158 -19.5541
1 -13.8474 -13.5117 -13.6795 1.9302 -18.5471 3.6087 -22.995 -4.2801 -8.2245 -17.9596
1 -9.1476 -13.7634 -20.6451 -1.7624 -18.2953 1.091 -24.0021 -2.7695 -10.4904 -8.3923
1 -4.6997 -8.9798 -14.267 1.6785 -10.7422 1.1749 -19.3024 -2.2659 -11.0779 -2.6016
I have built a function like this one:
function aligner(mat,sc=schord)
    ls=@parallel vcat for i=1:Int64(size(mat,1)/sc)
        hcat(mat[((i-1)*sc+1),1],reshape(mat[((i-1)*sc+1):(i*sc),2:end],length(mat[((i-1)*sc+1):(i*sc),2:end]))') # reshape to convert array to vector and ' to transpose
    end
    return ls
end
Running this line
tmpU=aligner(tmpR,100)
I got this error:
ERROR: DimensionMismatch("mismatch in dimension 1 (expected 1 got 100)")
in cat_t at abstractarray.jl:824
in hcat at abstractarray.jl:849
[inlined code] from none:3
in anonymous at no file:1500
in anonymous at multi.jl:684
in run_work_thunk at multi.jl:645
in remotecall_fetch at multi.jl:718
in remotecall_fetch at multi.jl:734
in anonymous at multi.jl:1485
in yieldto at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
in wait at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib (repeats 3 times)
in preduce at multi.jl:1489
[inlined code] from multi.jl:1498
in anonymous at expr.jl:1543
in aligner at none:2
Curiously, if I use only the core of the function (and of course, mat=myArray and sc=100), it works perfectly.
ls=@parallel vcat for i=1:Int64(size(mat,1)/sc)
    hcat(mat[((i-1)*sc+1),1],reshape(mat[((i-1)*sc+1):(i*sc),2:end],length(mat[((i-1)*sc+1):(i*sc),2:end]))') # reshape to convert array to vector and ' to transpose
end
172x1001 Array{Any,2}:
1 -16.449 -16.6168 -16.8686 -16.5329 -13.8474 -9.1476 -4.6997 … 10.3226 3.273 -0.2518 4.364 7.2174 1.3428 -6.2103
1 -21.6522 -14.6866 -15.0223 -19.9738 -21.7361 -22.5754 -23.3307 12.1689 12.1689 8.0566 3.6926 3.0212 3.9444 1.3428
1 -6.6299 -4.6997 3.6926 7.5531 7.3013 4.1962 5.3711 -15.5258 -12.2528 -7.5531 -7.1335 -12.3367 -17.4561 -17.2882
1 9.903 5.9586 3.3569 4.1962 4.8676 4.6997 8.3923 0.9232 -0.5035 -5.9586 -9.9869 -9.6512 -1.7624 4.4479
1 19.1345 14.183 10.1547 10.4904 8.2245 2.4338 -3.6926 -4.8676 -6.7978 -8.8959 -11.5814 -15.0223 -11.0779 -3.1891
1 -3.1052 -0.7553 6.3782 6.2943 0.9232 0.8392 4.0283 … -8.0566 -8.5602 -9.5673 -10.6583 -8.0566 -2.2659 1.2589
I would appreciate any help to understand/solve the problem.
Kind Regards, RN
Well, it seems the solution is really simple: annotate the types of the arguments and of ls.
function aligner(mat::Array,sc::Int=schord)
    ls::Array=@parallel vcat for i=1:Int64(size(mat,1)/sc)
        hcat(mat[((i-1)*sc+1),1],reshape(mat[((i-1)*sc+1):(i*sc),2:end],length(mat[((i-1)*sc+1):(i*sc),2:end]))') # reshape to convert array to vector and ' to transpose
    end
    return ls
end

Reading data from URL

Is there a reasonably easy way to get data from a URL? I tried the most obvious version, but it does not work:
readcsv("https://dl.dropboxusercontent.com/u/.../testdata.csv")
I did not find any usable reference. Any help?
If you want to read a CSV from a URL, you can use the Requests package as @waTeim shows and then read the data through an IOBuffer. See the example below.
Or, as @Colin T Bowers comments, you could use the currently (December 2017) more actively maintained HTTP.jl package like this:
julia> using HTTP
julia> res = HTTP.get("https://www.ferc.gov/docs-filing/eqr/q2-2013/soft-tools/sample-csv/transaction.txt");
julia> mycsv = readcsv(res.body);
julia> for (colnum, myheader) in enumerate(mycsv[1,:])
           println(colnum, '\t', myheader)
       end
1 transaction_unique_identifier
2 seller_company_name
3 customer_company_name
4 customer_duns_number
5 tariff_reference
6 contract_service_agreement
7 trans_id
8 transaction_begin_date
9 transaction_end_date
10 time_zone
11 point_of_delivery_control_area
12 specific location
13 class_name
14 term_name
15 increment_name
16 increment_peaking_name
17 product_name
18 transaction_quantity
19 price
20 units
21 total_transmission_charge
22 transaction_charge
Using the Requests.jl package:
julia> using Requests
julia> res = get("https://www.ferc.gov/docs-filing/eqr/q2-2013/soft-tools/sample-csv/transaction.txt");
julia> mycsv = readcsv(IOBuffer(res.data));
julia> for (colnum, myheader) in enumerate(mycsv[1,:])
           println(colnum, '\t', myheader)
       end
1 transaction_unique_identifier
2 seller_company_name
3 customer_company_name
4 customer_duns_number
5 tariff_reference
6 contract_service_agreement
7 trans_id
8 transaction_begin_date
9 transaction_end_date
10 time_zone
11 point_of_delivery_control_area
12 specific location
13 class_name
14 term_name
15 increment_name
16 increment_peaking_name
17 product_name
18 transaction_quantity
19 price
20 units
21 total_transmission_charge
22 transaction_charge
If you are looking to read into a dataframe, this will also work in Julia:
using CSV
dataset = CSV.read(download("https://mywebsite.edu/ml/machine-learning-databases/my.data"))
The Requests package seems to work pretty well. There are others (see the entire package list) but Requests is actively maintained.
Obtaining it
julia> Pkg.add("Requests")
julia> using Requests
Using it
You can use one of the exported functions that correspond to the various HTTP verbs (get, post, etc.), each of which returns a Response type:
julia> res = get("http://julialang.org")
Response(200 OK, 21 Headers, 20913 Bytes in Body)
julia> typeof(res)
Response (constructor with 8 methods)
And then, for example, you can print the data using @printf
julia> @printf("%s",res.data);
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
...
If it is directly a csv file, something like this should work:
A = readdlm(download(url),';')
Nowadays you can also use UrlDownload.jl, which is pure Julia, takes care of the download details, processes data in memory, and can also work with compressed files.
Usage is straightforward:
using UrlDownload
A = urldownload("https://data.ok.gov/sites/default/files/unspsc%20codes_3.csv")
