I saw here that Julia supports %f-style formatting when printing floats.
I would like to know whether the same is possible when setting the name of an output file. In my program I have something like:
...
for epsilon in epsilon_array
...
printfile = open("outputfile_epsilon$(epsilon).dat", "w")
...
end
...
So, for instance, if I have epsilon=0.010000000 (Float64), I want the name of the output file to be just outputfile_epsilon0.01.dat
EDIT:
For example, consider the following way of filling the array:
epsilon_array = zeros(Float64,100)
iijj = 0.0
for ii in 1:100
epsilon_array[ii] = iijj
iijj += 0.01
end
If I take a look at some outputfile, I will have:
outputfile_epsilon0.9500000000000006.dat
So the problem is that there's a nasty digit at the end of the float, which forces Julia to print the full decimal expansion.
The crux of the problem seems to be that you are not accounting for roundoff error in your summation. Float64 can't represent 0.01 exactly, so the computer stores the closest approximation instead. Each time you add that approximation to iijj, the accumulated error grows. The range and linspace functions are aware of this, and are thus able to avoid the problem (in Julia 1.0 and later, linspace is gone and range with the length keyword takes its place):
iijj = 0:0.01:1
for ii in 1:100
epsilon_array[ii] = iijj[ii]
end
Alternatively, you might decide that you just always really want 2 digits printed, in which case you could use printf-style formatting for the filename:
using Printf  # needed for @sprintf on Julia 0.7 and later
open(@sprintf("outputfile_epsilon%.2f.dat", epsilon), "w")
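To make both suggestions concrete, here is a minimal sketch. The helper name accumulate_steps is made up for illustration, and the sketch assumes Julia 0.7 or later, where @sprintf lives in the Printf standard library. It builds the array once by repeated addition and once from a range, then formats one element of each with %.2f:

```julia
using Printf

# Hypothetical helper: builds the array the way the question does,
# by adding `step` repeatedly to a running total.
function accumulate_steps(step, n)
    values = Vector{Float64}(undef, n)
    acc = 0.0
    for i in 1:n
        values[i] = acc
        acc += step
    end
    return values
end

accumulated = accumulate_steps(0.01, 100)   # roundoff builds up along the way
from_range  = collect(0:0.01:0.99)          # elements come from the range, not a running sum

# Formatting with two decimals keeps the accumulated error out of the file name.
name_accumulated = @sprintf("outputfile_epsilon%.2f.dat", accumulated[96])
name_from_range  = @sprintf("outputfile_epsilon%.2f.dat", from_range[96])
println(name_accumulated)
println(name_from_range)
```

Either fix alone is enough for the file names; the range additionally keeps the stored values themselves clean.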
You can use the rstrip() function, which removes instances of a given character from the right side of a string (see documentation here). E.g.
epsilon=0.010000000
eps_String = rstrip("$epsilon", '0')
println(eps_String)
# 0.01
edit: I'm not certain that any kind of formatting is actually necessary here though. If your numbers are indeed stored as floats, then trailing zeros shouldn't be stored with them. E.g. even without the rstrip() in the example above I get:
julia> epsilon_array = [0.010000000, 0.02000000]
2-element Array{Float64,1}:
0.01
0.02
julia> for epsilon in epsilon_array
println("outputfile_epsilon$(epsilon).dat")
end
outputfile_epsilon0.01.dat
outputfile_epsilon0.02.dat
I am trying to extract values from a vector to generate random numbers from a GEV distribution. I keep getting an error. This is my code
x=rand(Truncated(Poisson(2),0,10),10)
t=[]
for i in 1:10 append!(t, maximum(rand(GeneralizedExtremeValue(2,4,3, x[i])))
I am new to this program and I think I am not passing the variable x properly. Any help will be appreciated. Thanks
If I am correctly understanding what you are trying to do, you might want something more like
x = rand(Truncated(Poisson(2),0,10),10)
t = Float64[]
for i in 1:10
append!(t, max(rand(GeneralizedExtremeValue(2,4,3)), x[i]))
end
Among other things, you were missing a paren, and probably want max instead of maximum here.
Also, while it would technically work, t = [] creates an empty array of type Any, which tends to be very inefficient, so you can avoid that by just telling Julia what type you want that array to hold with e.g. t = Float64[].
Finally, since you already know t only needs to hold ten results, you can make this more efficient still by pre-allocating t:
x = rand(Truncated(Poisson(2),0,10),10)
t = Array{Float64}(undef,10)
for i in 1:10
t[i] = max(rand(GeneralizedExtremeValue(2,4,3)), x[i])
end
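To make the max/maximum distinction and the element-type point concrete, here is a small sketch that uses only Base, so it runs without Distributions. The sample values and the names samples and threshold are made up for illustration:

```julia
samples = [0.3, 2.5, 1.1]
threshold = 2.0

largest     = maximum(samples)            # reduces a collection to its largest element
clipped     = max(samples[1], threshold)  # compares its arguments directly
elementwise = max.(samples, threshold)    # broadcasting applies max to every element

untyped = []          # element type Any
typed   = Float64[]   # element type Float64
push!(typed, clipped)

println(largest, " ", clipped, " ", elementwise)
println(eltype(untyped), " ", eltype(typed))
```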
I'm a "write Fortran in all languages" kind of person trying to learn modern programming practices. I have a one dimensional function ft(lx)=HT(x,f(x),lx), where x, and f(x) are one dimensional arrays of size nx, and lx is the size of output array ft. I want to apply HT on a multidimensional array f(x,y,z).
Basically I want to apply HT on all three dimensions to go from f(x,y,z) defined on (nx,ny,nz) dimensional grid, to ft(lx,ly,lz) defined on (lx,ly,lz) dimensional grid:
ft(lx,y,z) = HT(x,f(x,y,z) ,lx)
ft(lx,ly,z) = HT(y,ft(lx,y,z) ,ly)
ft(lx,ly,lz) = HT(z,ft(lx,ly,z),lz)
In f95 style I would tend to write something like:
FTx=zeros((lx,ny,nz))
for k=1:nz
for j=1:ny
FTx[:,j,k]=HT(x,f[:,j,k],lx)
end
end
FTxy=zeros((lx,ly,nz))
for k=1:nz
for i=1:lx
FTxy[i,:,k]=HT(y,FTx[i,:,k],ly)
end
end
FTxyz=zeros((lx,ly,lz))
for j=1:ly
for i=1:lx
FTxyz[i,j,:]=HT(z,FTxy[i,j,:],lz)
end
end
I know idiomatic Julia would require using something like mapslices. I was not able to understand how to go about doing this from the mapslices documentation.
So my question is: what would be the idiomatic Julia code, along with proper type declarations, equivalent to the Fortran style version?
A follow up sub-question would be: Is it possible to write a function
FT = HTnD((Tuple of x,y,z etc.),f(x,y,z), (Tuple of lx,ly,lz etc.))
that works with arbitrary dimensions? I.e. it would automatically adjust computation for 1,2,3 dimensions based on the sizes of input tuples and function?
I have a piece of code here which is fairly close to what you want. The key tool is Base.Cartesian.@nexprs, which you can read up on in the linked documentation.
The three essential lines in my code are Lines 30 to 32. Here is a verbal description of what they do.
Line 30: reshape an n1 x n2 x ... nN-sized array C_{k-1} into an n1 x prod(n2,...,nN) matrix tmp_k.
Line 31: Apply the function B[k] to each column of tmp_k. In my code, there are some indirections here since I want to allow for B[k] to be a matrix or a function, but the basic idea is as described above. This is the part where you would want to bring in your HT function.
Line 32: Reshape tmp_k back into an N-dimensional array and circularly permute the dimensions such that the second dimension of tmp_k ends up as the first dimension of C_k. This makes sure that the next iteration of the "loop" implied by @nexprs operates on the second dimension of the original array, and so on.
As you can see, my code avoids forming slices along arbitrary dimensions by permuting such that we only ever need to slice along the first dimension. This makes programming much easier, and it can also have some performance benefits. For example, computing the matrix-vector products B * C[i1,:,i3] for all i1, i3 can be done easily and very efficiently by moving the second dimension of C into the first position of tmp and using gemm to compute B * tmp. Doing the same efficiently without the permutation would be much harder.
Following @gTcV's code, your function would look like:
using Base.Cartesian
ht(x,F,d) = mapslices(f -> HT(x, f, d), F, dims = 1)
@generated function HTnD(
xx::NTuple{N,Any},
F::AbstractArray{<:Any,N},
newdims::NTuple{N,Int}
) where {N}
quote
F_0 = F
Base.Cartesian.@nexprs $N k->begin
tmp_k = reshape(F_{k-1},(size(F_{k-1},1),prod(Base.tail(size(F_{k-1})))))
tmp_k = ht(xx[k], tmp_k, newdims[k])
F_k = Array(reshape(permutedims(tmp_k),(Base.tail(size(F_{k-1}))...,size(tmp_k,1))))
# https://github.com/JuliaLang/julia/issues/30988
end
return $(Symbol("F_",N))
end
end
A simpler version, which shows the usage of mapslices would look like this
function simpleHTnD(
xx::NTuple{N,Any},
F::AbstractArray{<:Any,N},
newdims::NTuple{N,Int}
) where {N}
for k = 1:N
F = mapslices(f -> HT(xx[k], f, newdims[k]), F, dims = k)
end
return F
end
You could even use foldl if you are a friend of one-liners ;-)
fold_HTnD(xx, F, newdims) = foldl((F, k) -> mapslices(f -> HT(xx[k], f, newdims[k]), F, dims = k), 1:length(xx), init = F)
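Since HT itself is not shown in the question, the following sketch uses a made-up stand-in, toy_HT, which only has the same shape contract (a length-n slice in, a length-l vector out). Under that assumption it checks that the mapslices loop changes the array size as expected and that, along the first dimension, it agrees with the explicit Fortran-style loop:

```julia
# Hypothetical stand-in for HT: projects f onto l cosine modes sampled at x.
toy_HT(x, f, l) = [sum(f .* cos.(m .* x)) for m in 1:l]

# Same structure as simpleHTnD above, but calling the stand-in.
function apply_along_all_dims(xs, F, newdims)
    for k in 1:ndims(F)
        F = mapslices(f -> toy_HT(xs[k], f, newdims[k]), F, dims = k)
    end
    return F
end

xs = (range(0, stop = 1, length = 4),
      range(0, stop = 1, length = 5),
      range(0, stop = 1, length = 6))
F = rand(4, 5, 6)

FT = apply_along_all_dims(xs, F, (2, 3, 2))
println(size(FT))

# Explicit loop over the first dimension, for comparison with mapslices.
FTx_loop = zeros(2, 5, 6)
for k in 1:6, j in 1:5
    FTx_loop[:, j, k] = toy_HT(xs[1], F[:, j, k], 2)
end
FTx_map = mapslices(f -> toy_HT(xs[1], f, 2), F, dims = 1)
```

Because the number of dimensions comes from ndims(F), the same function also handles 1-D and 2-D inputs when the tuples have matching length.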
I am trying to use the aggregate function to compute the mean of a variable by group
using Distributions, PooledArrays
N=Int64(2e9/8); K=100;
pool = [@sprintf "id%03d" k for k in 1:K]
pool1 = [@sprintf "id%010d" k for k in 1:(N/K)]
function randstrarray(pool, N)
PooledArray(PooledArrays.RefArray(rand(UInt8(1):UInt8(K), N)), pool)
end
using JuliaDB
DT = IndexedTable(Columns([1:N;]), Columns(
id1 = randstrarray(pool, N),
v3 = rand(round.(rand(Uniform(0,100),100),4), N) # numeric e.g. 23.5749
));
res = IndexedTables.aggregate(mean, DT, by=(:id1,), with=:v3)
However, I get the error
MethodError: no method matching mean(::Float64, ::Float64)
Closest candidates are:
mean(!Matched::Union{Function, Type}, ::Any) at statistics.jl:19
mean(!Matched::AbstractArray{T,N} where N, ::Any) where T at statistics.jl:57
mean(::Any) at statistics.jl:34
in at base\<missing>
in #aggregate#144 at IndexedTables\src\query.jl:119
in aggregate_to at IndexedTables\src\query.jl:148
however
IndexedTables.aggregate(+ , DT, by=(:id1,), with=:v3)
works fine
Edit:
res = IndexedTables.aggregate_vec(mean, DT, by=(:id1,), with=:v3)
from help:
help?> IndexedTables.aggregate_vec
aggregate_vec(f::Function, x::IndexedTable)
Combine adjacent rows with equal indices using a function from vector to scalar, e.g. mean.
Old answer:
(I am keeping it because it was a pleasant exercise (for me) in creating a helper type and functions when something doesn't work the way we want. Maybe it will help someone in the future :)
I am not sure how you would like to aggregate the mean. My idea is to calculate the "center of gravity" of points with equal mass.
center of two points: G = (A+B)/2
adding (aggregating) third point C is (2G+C)/3 (2G because G's mass is A's mass +B's mass)
etc.
struct Atractor
center::Float64
mass::Int64
end
" two points create new atractor with double mass "
mediocre(a::Float64, b::Float64) = Atractor((a+b)/2, 2)
# pls forgive me function's name! :)
" aggregate new point to atractor "
function mediocre(a::Atractor, b::Float64)
mass = a.mass + 1
Atractor((a.center*a.mass+b)/mass, mass)
end
Test:
tst_array = rand(Float64, 100);
isapprox(mean(tst_array), reduce(mediocre, tst_array).center)
true # at least in my tests! :)
mean(tst_array) == reduce(mediocre, tst_array).center # sometimes true
For aggregate function we need a little more work:
import Base.convert
" we need method for convert Atractor to Float64 because aggregate
function wants to store result in Float64 "
convert(::Type{Float64}, x::Atractor) = x.center
And now it (probably :P) works
res = IndexedTables.aggregate(mediocre, DT, by=(:id1,), with=:v3)
id1 │
────────┼────────
"id001" │ 45.9404
"id002" │ 47.0032
"id003" │ 46.0846
"id004" │ 47.2567
...
I hope you can see that aggregating the mean this way has an impact on precision! (There are more sum and divide operations.)
You need to tell it how to reduce two numbers to one. mean is for arrays. So just use an anonymous function:
res = IndexedTables.aggregate((x,y)->(x+y)/2, DT, by=(:id1,), with=:v3)
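One caveat is worth checking before relying on a two-argument average: folded over more than two values, it does not in general reproduce the arithmetic mean. The sketch below uses only Base foldl and the Statistics standard library on made-up numbers; whether IndexedTables.aggregate can carry tuple-valued state like this is an assumption the thread does not confirm.

```julia
using Statistics  # provides `mean` on Julia 0.7 and later

group = [1.0, 2.0, 6.0]

# Folding a two-argument average weights later values more heavily.
pairwise = foldl((x, y) -> (x + y) / 2, group)   # ((1 + 2) / 2 + 6) / 2

# Carrying (sum, count) pairs through the fold reproduces the mean.
add_pairs(a, b) = (a[1] + b[1], a[2] + b[2])
total, n = foldl(add_pairs, [(v, 1) for v in group])
from_pairs = total / n

println(pairwise, " vs ", from_pairs, " vs ", mean(group))
```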
I'd really like to help you, but it took me 10 minutes to install all the packages and another few minutes to run the code and figure out what it actually does (or doesn't). It would be great if you provided a "minimal working example" that focuses on the problem. In fact, the only requirement to reproduce your problem seems to be IndexedTables and two random arrays.
(Sorry, this is not a complete answer, but too long to be a comment.)
Anyway, if you read the docstring of IndexedTables.aggregate, you see that it requires a function which takes two arguments and returns a single value:
help?> IndexedTables.aggregate
aggregate(f::Function, arr::IndexedTable)
Combine adjacent rows with equal indices using the given 2-argument
reduction function, returning the result in a new array.
You see in the error message you posted, that there is
no method matching mean(::Float64, ::Float64)
Since I don't know what you expect to be calculated, I now assume that you want to calculate the mean value of the two numbers. In this case you can define another method for mean():
Base.mean(x, y) = (x+y) / 2
This will fulfil the signature that aggregate requires. Note, though, that reducing more than two values pairwise this way does not give their arithmetic mean, so I am not sure this is what you want.
I want to retrieve all the elements along the last dimension of an N-dimensional array A. That is, if idx is an (N-1) dimensional tuple, I want A[idx...,:]. I've figured out how to use CartesianRange for this, and it works as shown below
A = rand(2,3,4)
for idx in CartesianRange(size(A)[1:end-1])
i = zeros(Int, length(idx))
[i[bdx] = idx[bdx] for bdx in 1:length(idx)]
@show(A[i...,:])
end
However, there must be an easier way to create the index i shown above. Splatting idx does not work. What am I doing wrong?
You can just index directly with the CartesianIndex that gets generated from the CartesianRange!
julia> for idx in CartesianRange(size(A)[1:end-1])
@show(A[idx,:])
end
A[idx,:] = [0.0334735,0.216738,0.941401,0.973918]
A[idx,:] = [0.842384,0.236736,0.103348,0.729471]
A[idx,:] = [0.056548,0.283617,0.504253,0.718918]
A[idx,:] = [0.551649,0.55043,0.126092,0.259216]
A[idx,:] = [0.65623,0.738998,0.781989,0.160111]
A[idx,:] = [0.177955,0.971617,0.942002,0.210386]
The other recommendation I'd have here is to use the un-exported Base.front function to extract the leading dimensions from size(A) instead of indexing into it. Working with tuples in a type-stable way like this can be a little tricky, but they're really fast once you get the hang of it.
It's also worth noting that Julia's arrays are column-major, so accessing the trailing dimension like this is going to be much slower than grabbing the columns.
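On Julia 0.7 and later, CartesianRange was replaced by CartesianIndices. A minimal sketch of the same idea under that assumption, combined with Base.front as suggested above (the array contents are chosen so the result is easy to check by hand):

```julia
# A[i, j, k] == i + 2(j - 1) + 6(k - 1) for this array.
A = reshape(collect(1.0:24.0), 2, 3, 4)

# Base.front drops the last element of a tuple: (2, 3, 4) -> (2, 3).
leading = Base.front(size(A))

# Index directly with each CartesianIndex; the colon selects the last dimension.
fibers = [A[idx, :] for idx in CartesianIndices(leading)]

println(fibers[1, 1])
```

The comprehension keeps the shape of the leading dimensions, so fibers[i, j] holds A[i, j, :].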
I'm trying to return a (square) section from an array, where the indices wrap around the edges. I need to juggle some indexing, but it works. However, I expected the last two lines of code to give the same result. Why don't they? How does numpy interpret the last line?
And as a bonus question: Am I being woefully inefficient with this approach? I'm using the product because I need to modulo the range so it wraps around, otherwise I'd use a[imin:imax, jmin:jmax, :], of course.
import numpy as np
from itertools import product
i = np.arange(-1, 2) % 3
j = np.arange(1, 4) % 3
a = np.random.randint(1,10,(3,3,2))
print a[i,j,:]
# Gives 3 entries [(i[0],j[0]), (i[1],j[1]), (i[2],j[2])]
# This is not what I want...
indices = list(product(i, j))
print indices
indices = zip(*indices)
print 'a[indices]\n', a[indices]
# This works, but when I'm explicit:
print 'a[indices, :]\n', a[indices, :]
# Huh?
The problem is that advanced indexing is triggered if:
the selection object, obj, is [...] a tuple with at least one sequence object or ndarray
The easiest fix in your case is to use repeated indexing:
a[i][:, j]
An alternative would be to use ndarray.take, which will perform the modulo operation for you if you specify mode='wrap':
a.take(np.arange(-1, 2), axis=0, mode='wrap').take(np.arange(1, 4), axis=1, mode='wrap')
Here is another advanced-indexing method which, in my opinion, is better than the product solution.
If you have an integer array for every dimension, these are broadcast together and the output has the broadcast shape (you will see what I mean)...
i, j = np.ix_(i,j) # this adds extra empty axes
print i,j
print a[i,j]
# and now you will actually *not* be surprised:
print a[i,j,:]
Note that this is a 3x3x2 array, while you had a 9x2 array, but a simple reshape will fix that, and the 3x3x2 array is probably closer to what you want anyway.
Actually, the surprise is still hidden in a way: in your examples a[indices] is the same as a[indices[0], indices[1]], but a[indices, :] is a[(indices[0], indices[1]), :], so it is no big surprise that the result is different. Note that a[indices[0], indices[1], :] does give the same result.
See: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
When you add :, you are mixing integer indexing and slicing. The rules are quite complicated and better explained than I could in the above link.