Why does Julia Dict.keys show a power-of-2 number of values?

Using Julia 0.6.2.
When I create a dictionary of 10 items, the array for the keys has 16 elements, apparently rounding up to the next power of 2.
julia> dk.keys
16-element Array{Int64,1}:
0
4
9
25
100
81
0
0
16
36
64
0
49
0
0
1
When I create a dictionary with 17 keys:
julia> dkk = Dict(k^2 => "*"^k for k = 1:17)
Dict{Int64,String} with 17 entries:
...
julia> dkk.keys
64-element Array{Int64,1}:
0
0
100
0
121
81
0
0
16
0
⋮
4536409040
4536409456
36
225
256
0
0
4536409904
1
Why 64 instead of the next power of 2, which would be 32?
Either way, I really just want the keys and not the hash table.
Note: when the dictionary is accessed directly, the number of entries is what I'd expect.
julia> dk
Dict{Int64,String} with 10 entries:
julia> dkk
Dict{Int64,String} with 17 entries:

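The length is a power of 2 because Dict is implemented as a hash table, not a tree: dk.keys is the table's raw slot array, and a power-of-2 length lets a hash be reduced to a slot index with a cheap bit mask. The table is kept deliberately sparse and grows by more than one doubling at a time, which is why 17 entries sit in 64 slots rather than 32; the zeros and the large garbage values are unused slots holding uninitialized memory. Avoid directly grabbing internals. Instead, use the iterator keys(dk). If you want the keys as an array, use collect(keys(dk)).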
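To make the 16 and the 64 concrete, here is a minimal sketch of the growth rule as I recall it from base/dict.jl around Julia 0.6. The thresholds (start at 16 slots, rebuild once more than 2/3 full, make room for 4x the current count) and the helper names tablesz and slots_after are assumptions for illustration, not a public API, and other Julia versions may differ. The second half shows the supported way to get just the keys; dk is rebuilt here on the assumption that it was constructed like dkk in the question.

```julia
# Assumed sizing rule: at least 16 slots, otherwise the next power of 2.
tablesz(n) = n < 16 ? 16 : nextpow(2, n)   # hypothetical helper

# Simulate how many slots the table has after n_inserts insertions.
function slots_after(n_inserts)
    sz = 16
    for count in 1:n_inserts
        if count * 3 > sz * 2              # more than 2/3 full
            sz = tablesz(count * 4)        # make room for 4x the current count
        end
    end
    return sz
end

slots_after(10)   # 16
slots_after(17)   # 64

# Getting only the keys, without touching internals.
dk = Dict(k^2 => "*"^k for k = 1:10)   # assumed construction, mirroring dkk
ks = collect(keys(dk))                 # Vector of the 10 keys, order unspecified
sorted_ks = sort(ks)                   # sort if a deterministic order is needed
```

Under these assumptions the 11th insertion is the one that trips the 2/3 threshold, and 4 * 11 = 44 rounds up to 64, so any dictionary built this way with 11 to 42 entries would show 64 slots.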

Related

Julia BitArray with 128 Bits

I need a Julia BitArray-like object that can encode more than 64 bits, say 128 bits. Does a simple replacement of UInt64 with UInt128 in bitarray.jl work?
Based on the information in your comment, the existing BitArray would itself serve your needs. Note that BitArray uses UInt64s internally, but that's not a limitation on the size of the array - it actually stores the bits as a Vector of UInt64s, so there's no special size limitation. You can create a 5x5x5 BitArray with no problem.
julia> b = BitArray(undef, 5, 5, 5);
julia> b .= 0;
julia> b[3, 5, 5] = 1
1
julia> b[3, :, :]
5×5 BitMatrix:
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 1
Maybe this part of the documentation threw you off:
BitArrays pack up to 64 values into every 8 bytes, resulting in an 8x space efficiency over Array{Bool, N} and allowing some
operations to work on 64 values at once.
but that's talking about internal implementation details. BitArrays are not limited to 8 bytes, so they're not limited to having just 64 values in them either.
Creating a new type of bit array using UInt128s would likely not be optimized, and is unnecessary anyway.
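As a small illustration of the point above, the following sketch builds a 128-bit BitVector and sets bits on both sides of the 64-bit boundary. It assumes Julia 1.x (for findall) and uses only Base functions; the variable names are mine.

```julia
# A 128-bit vector with all bits cleared; falses returns a BitVector.
bits = falses(128)

# Set bits on both sides of the 64-bit chunk boundary.
bits[1] = true
bits[64] = true
bits[65] = true
bits[128] = true

nset = count(bits)        # number of set bits: 4
idx = findall(bits)       # indices of the set bits: [1, 64, 65, 128]

# Element-wise negation works across the whole vector, not just one chunk.
flipped = .!bits
nflipped = count(flipped) # 124
```

Nothing here depends on how many UInt64 chunks back the array, which is the reason a UInt128-based variant is not needed for this use.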

How to delete a selected element in a range construct in Julia?

From here I found that in a range construct one cannot find and replace elements via array functions... How can it be done anyway?
Suppose I want to delete the elements 2,6,7,8,13,19 in range(1, step=1, stop=21). Or more generally, suppose a is a random array that contains numbers in the range [1,21] and one wants to delete these elements from the given range.
You cannot delete from a range object, since that is immutable, but you can filter it:
julia> filter(x -> x ∉ [2,6,7,8,13,19], a)
15-element Array{Int64,1}:
1
3
4
5
9
10
11
12
14
15
16
17
18
20
21
However, if a is a "real" array, you can use filter! to operate in-place.
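A minimal sketch of the difference between the two, assuming the range and the elements to drop from the question (the names r, drop, kept and a are mine):

```julia
r = 1:21                         # the range from the question
drop = [2, 6, 7, 8, 13, 19]      # elements to remove

# A range is immutable, so filter returns a new Vector and leaves r alone.
kept = filter(x -> x ∉ drop, r)

# collect copies the range into a mutable Vector, which filter! shrinks in place.
a = collect(r)
filter!(x -> x ∉ drop, a)

# setdiff gives the same result here because the elements of r are unique.
alt = setdiff(r, drop)
```

For a long list of elements to drop, wrapping it in a Set (for example Set(drop)) makes each membership test cheaper than scanning a Vector.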
Another solution that is often convenient is to use the InvertedIndices.jl package, which exports Not, so you can just use indexing:
julia> using InvertedIndices
julia> r = 1:21
1:21
julia> x = [2,6,7,8,13,19]
6-element Array{Int64,1}:
2
6
7
8
13
19
julia> r[Not(x)]
15-element Array{Int64,1}:
1
3
4
5
9
10
11
12
14
15
16
17
18
20
21

Print variable names in table() with 2 binary variables in R

I'm sure I'll kick myself for not being able to figure this out, but when you have a table with 2 variables (i.e. cross-tab) and both are binary or otherwise have the same levels, how can you make R show which variable is displayed row-wise and which is column-wise?
For example:
> table(tc$tr, tc$fall_term)
       0    1
  0 1569  538
  1    0  408
is a little confusing because it's not immediately obvious which is which. Of course, I checked out ?table but I don't see an option to do this, at least not a logical switch that doesn't require me to already know which is which.
I tried ftable but had the same problem.
The output I want would be something like this:
> table(tc$tr, tc$fall_term)
             tr   tr
              0    1
fallterm 0 1569  538
fallterm 1    0  408
or
> table(tc$tr, tc$fall_term)
     fallterm fallterm
            0        1
tr 0     1569      538
tr 1        0      408
You can use the dnn option :
table(df$tr,df$fall_term) # impossible to tell the difference
     0  1
  0 18 33
  1 15 34
table(df$tr,df$fall_term,dnn=c('tr','fall_term')) # you have the names
   fall_term
tr   0  1
  0 18 33
  1 15 34
Note that if df contains only these two columns, in this order, it's easier (and safer) to do table(df$tr,df$fall_term,dnn=colnames(df))
Check out dimnames, and in particular their names. I’m using another example here since I don’t have your data:
x = HairEyeColor[, , Sex = 'Male']
names(dimnames(x))
# [1] "Hair" "Eye"
names(dimnames(x)) = c('Something', 'Else')
x
#          Else
# Something Brown Blue Hazel Green
#     Black    32   11    10     3
#     Brown    53   50    25    15
#     Red      10   10     7     7
#     Blond     3   30     5     8

R: filling matrix with values does not work

I have a data frame vec that I need to prepare for an image.plot() plot. The structure of vec is as follows:
> str(vec)
'data.frame': 31212 obs. of 5 variables:
$ x : int 8 24 40 56 72 88 104 120 136 152 ...
$ y : int 8 8 8 8 8 8 8 8 8 8 ...
$ dx: num 0 0 0 0 0 0 0 0 0 0 ...
$ dy: num 0 0 0 0 0 0 0 0 0 0 ...
$ d : num 0 0 0 0 0 0 0 0 0 0 ...
Note: the values in $dx, $dy and $d are not zero but only too small to be shown in this overview.
Background: the data is the output of a pixel tracking software. $x and $y are pixel coordinates while in $d are the displacement vector lengths (in pixels) of that pixel.
image.plot() expects as first and second argument the dimension of the matrix as ordered vectors, so I think sort(unique(vec$x)) and sort(unique(vec$y)) respectively should be good. So, I would like to end up with image.plot(sort(unique(vec$x)),sort(unique(vec$y)), data)
The third argument is the actual data. To build this I tried:
# spanning an empty matrix
data = matrix(NA,length(unique(vec$x)),length(unique(vec$y)))
# filling the matrix
data[match(vec$x, sort(unique(vec$x))), match(vec$y, sort(unique(vec$y)))] = vec$d
But, unfortunately, this isn't working. It reports no errors but data contains no values! This works:
for(i in c(1:length(vec$x))) data[match(vec$x[i], sort(unique(vec$x))), match(vec$y[i], sort(unique(vec$y)))] = vec$d[i]
But is very slow.
a) is there a better way to build data?
b) is there a better way to deal with my problem, anyways?
R allows indexing of a matrix by a two-column matrix, where the first column of the index is interpreted as the row index, and the second column as the column index. So create the indexes into data as a two-column matrix
idx = cbind(match(vec$x, sort(unique(vec$x))),
match(vec$y, sort(unique(vec$y))))
and use that
data[idx] = vec$d

extracting data from excel spreadsheet that is not organized in columns but repeats every x number of rows

I'm trying to extract information from an Excel spreadsheet that is organized by rows rather than by columns. Key points:
The Excel spreadsheet was converted to CSV, resulting in 2023 rows and 5 columns.
I read this file and converted it to a data.frame called "test".
I attempted to create a data.frame with 2 loops.
Result:
There were 50 or more warnings (use warnings() to see the first 50)
warning(extractor)
Error in FUN(X[[1L]], ...) :
cannot coerce type 'closure' to vector of type 'character'
I would very much appreciate your help.
extractor<-function(test){
##x<-data.frame(matrix(NA,nrow=920,ncol=3))
x<-data.frame(name=character(920),date=numeric(920),ton=numeric(920))
for (i in 1:920){
m<-11*i-9
{for(j in 1:5) {
x$name[i]=test[m,][1]
x$date[i]=test[m+j+2,][1]
x$ton[i]=test[m+j+2,][3]
}
}
}
test.csv looks like this:
XXXX-XXX-LHS-P1
2 XXXX-XXX-BHS-P1
3 Date blasted BLASTED (T) MUCKED (T) REM'G (T)
4 BLAST #1 0 0
5 BLAST #2 0.00 0
6 BLAST #3 0 0
7 BLAST #4 0 0
8 BLAST #5 0 0
9 TOTAL 0 0
10 % Mucked to Date 0 0 of design
11 REM'G TO BLAST 25419
12 XXXX-XXX-LHS-P1
13 XXXX-XXX-BHS-P1 10069 Ready? 0
14 Date blasted BLASTED (T) MUCKED (T) REM'G (T)
15 41556 BLAST #1 10069 10069
16 BLAST #2 0 0
17 BLAST #3 0 0
18 BLAST #4 0 0
19 BLAST #5 0 0
20 TOTAL 10069 9656 413
21 % Mucked to Date 0.958983017
22 REM'G TO BLAST 0
...
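I'm not sure that this will definitely address all of the warnings, but try adding the argument stringsAsFactors=FALSE to the end of the line where you create the data.frame.
In R versions before 4.0, data.frame() converts character columns to factors by default, and a factor can't be given a new value by simple assignment unless that value is already one of its levels. Your command should read x<-data.frame(name=character(920),date=numeric(920),ton=numeric(920),stringsAsFactors=FALSE).
The error itself is separate from the warnings: warning(extractor) passes the function extractor to warning(), which tries to turn it into a message string and fails with "cannot coerce type 'closure'". To inspect the warnings, call warnings() with no arguments.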
