Add a column of DateTime to an existing dataframe - julia

I have the following DataFrame:
using DataFrames, Dates
df = DataFrame( A=Int[], B=Int[] )
push!(df, [1, 10])
push!(df, [2, 20])
push!(df, [3, 30])
df[!, :C] .= DateTime()
for r in eachrow(df)
    # Super complex function, simplified for example
    r.C = now() + Day(r.A)
end
But df[!, :C] .= DateTime() does not create a column of DateTimes (I just want to allocate the column; the actual DateTimes will be populated by the for r in eachrow(df) loop).

There is no zero-argument method DateTime(). Try
df[!, :C] .= DateTime(0)
instead.
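If a placeholder value such as DateTime(0) feels arbitrary, another option is to allocate a typed but uninitialized vector and fill it afterwards. The sketch below uses only the Dates standard library. The names n, start, and col are illustrative, and a fixed start date stands in for now() so the result is reproducible. The finished vector could then be attached with df.C = col.

```julia
using Dates

n = 3                                # number of rows, e.g. nrow(df)
start = DateTime(2024, 1, 1)         # fixed stand-in for now()

# Allocate a Vector{DateTime} without choosing placeholder values.
# The contents are undefined until every slot has been written.
col = Vector{DateTime}(undef, n)
for i in 1:n
    col[i] = start + Day(i)
end
```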

While @Daniel's answer is correct for the question as asked, you could replace your for loop with a map and avoid having to preallocate.
df.C = map(eachrow(df)) do r
    # Super complex function, simplified for example
    return now() + Day(r.A)
end
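The do-block form is shorthand for passing an anonymous function as the first argument of map. A minimal sketch on a plain vector, using only Dates; the names a_values and start are illustrative, and a fixed date replaces now() so the output is deterministic:

```julia
using Dates

a_values = [1, 2, 3]
start = DateTime(2024, 1, 1)

# do-block form: the block becomes the first argument of map
c_do = map(a_values) do a
    start + Day(a)
end

# equivalent call with an explicit anonymous function
c_fn = map(a -> start + Day(a), a_values)
```

Both calls return a Vector{DateTime}, so no preallocation or placeholder value is needed.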

Related

Julia - last() does not work inside the function

I am writing the below code, and in the code the last() function does not work. I get the below error.
ERROR: UndefVarError: last not defined
But when I use last() outside of the function with same logic it works.
I am trying to write the below function -
function mergeOverlappingIntervals(intervals)
    sort!(intervals, by = x -> intervals[1])
    new_interval = intervals[1]
    for i in range(2, length(intervals))
        if last(new_intervals)[2] >= intervals[i][1]
            last(new_intervals) = [minimum!(last(new_intervals)[1], intervals[i][1]), maximum!(last(new_intervals)[2], intervals[2])]
        else
            push!(new_interval, intervals[i])
        end
    end
end
Can you please help?
If you are trying to merge the intervals, note that your sort does not work, since by = x -> intervals[1] returns the same value for every element. You wanted to say by = x -> x[1], and the default sort on vectors already orders by the first element.
You could instead do:
using Random
intervals = shuffle([[1, 2], [3, 5], [4, 6]])
function mergeintervals(intervals)
    @assert length(intervals) > 1
    sort!(intervals)
    a = [copy(intervals[begin])]
    for v in @view intervals[begin+1:end]
        if a[end][end] >= v[begin]
            if a[end][end] < v[end]
                a[end][end] = v[end]
            end
        else
            push!(a, copy(v))
        end
    end
    return a
end
@show mergeintervals(intervals) # [[1, 2], [3, 6]]
The reason last(x) does not work is that (unlike x[end]) it does not return an lvalue, that is, it does not produce a value or syntactic expression that you can assign to. So Julia thinks you are trying to redefine the function last(x) when you attempt to assign to it (as DNF pointed out). (An lvalue is something that can be used as the left hand side of an assignment, which in Julia does not mean it is a direct memory address: see below).
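The difference can be sketched in a few lines. The function name shadow_demo is hypothetical and exists only to show what the assignment syntax turns into:

```julia
x = [[1, 2], [3, 5]]

# Indexing on the left-hand side lowers to setindex!, so it mutates x.
x[end] = [3, 6]

# last(x) returns the inner vector itself; indexing into it is again setindex!.
last(x)[2] = 8

# By contrast, `last(v) = ...` is short-form function definition syntax.
# Inside a function it creates a local function named `last` instead of
# assigning to the final element.
function shadow_demo(v)
    last(v) = 0
    return last(v)
end

result = shadow_demo([1, 2, 3])
```

Here result is 0, because the call reaches the locally defined last. Since that name is local to the whole function body, a call that runs before the definition is reached would find it undefined, which appears consistent with the UndefVarError in the question.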
Here is a straightforward implementation after fixing two errors: last() cannot be assigned to, as others said, and minimum and maximum operate on arrays, so you want max here to compare scalars. Finally, you should return the new merged stack of intervals.
function mergeIntervals(intervals)
    sort!(intervals)
    stack = [intervals[1]]
    for i in 2:length(intervals)
        if stack[end][2] < intervals[i][1]
            push!(stack, intervals[i])
        else
            stack[end][2] = max(stack[end][2], intervals[i][2])
        end
    end
    return stack
end
Test an example:
intervals = [[6, 8], [1, 9], [2, 4], [4, 7]]
mergeIntervals(intervals)
[[1, 9]]
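The distinction between max and maximum mentioned in this answer can be sketched as follows; the variable names are illustrative only:

```julia
a, b = 7, 9

# max and min compare their arguments directly.
largest_scalar = max(a, b)
smallest_scalar = min(a, b)

# maximum and minimum reduce over a single collection.
largest_in_collection = maximum([a, b])
smallest_in_collection = minimum([a, b])
```

The variants with a trailing !, maximum! and minimum!, are in-place reductions that write into a destination array, so they do not fit a comparison of two scalars either.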
I was able to solve the problem by taking the suggestions from all the answers posted. Thank you.
Below is the code -
function mergeOverlappingIntervals(intervals)
    # Sort by each interval's start; the key function must use its argument x
    sort!(intervals, by = x -> x[1])
    new_interval = [intervals[1]]
    for i in range(2, length(intervals))
        if new_interval[end][2] >= intervals[i][1]
            new_interval[end] = [min(new_interval[end][1], intervals[i][1]), max(new_interval[end][2], intervals[i][2])]
        else
            push!(new_interval, intervals[i])
        end
    end
    return new_interval
end

Convert a vector of tuples in an array in JULIA

I'm quite new to Julia and I'm trying to convert a vector of tuples into an array.
Here's an example
using Statistics
a = randn((10, 100))
q = (0.05, 0.95)
conf_intervals = [quantile(a[i,:], q) for i in 1:10]
and conf_intervals is a 10-element Vector{Tuple{Float64, Float64}}.
The expected result should be a 10×2 Matrix{Float64}
I tried splatting conf_intervals with [conf_intervals...] but the vector doesn't change.
Thank you very much
You can use a comprehension:
mat2x10 = [tup[k] for k in 1:2, tup in conf_intervals]
mat10x2 = [tup[k] for tup in conf_intervals, k in 1:2]
Or you can just re-interpret the same memory. This is more fragile -- it won't work for all vectors of tuples, e.g. Any[(i, i^2/2) for i in 1:10]. But for Vector{Tuple{Float64, Float64}}:
if VERSION >= v"1.6"
    reinterpret(reshape, Float64, conf_intervals)
else
    reshape(reinterpret(Float64, conf_intervals), 2, :)
end
mat2x10 == ans # true
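To make the shapes concrete, here is a small sketch on three fixed tuples instead of random quantiles. The values and variable names are made up for illustration. It also shows why splatting into square brackets leaves the vector unchanged: the elements are still tuples.

```julia
conf_intervals = [(-1.5, 1.2), (-2.1, 1.6), (-1.4, 1.7)]

# Splatting only rebuilds the same vector of tuples.
same = [conf_intervals...]

# The first iterator in the comprehension runs along the first dimension.
mat2x3 = [tup[k] for k in 1:2, tup in conf_intervals]
mat3x2 = [tup[k] for tup in conf_intervals, k in 1:2]

# Reinterpreting form that does not need Julia 1.6
flat = reshape(reinterpret(Float64, conf_intervals), 2, :)
```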
You need to use collect to convert tuples to vectors, and then you can combine them:
julia> hcat(collect.(conf_intervals)...)
2×10 Matrix{Float64}:
-1.59757 -2.10057 -1.4437 -1.32868 -1.10686 -1.41256 -1.5696 -1.67288 -1.51947 -1.72257
1.24604 1.61692 1.77684 1.3599 1.90853 1.30831 1.10667 1.58356 1.56811 1.70685
If you need to transpose the result, add an apostrophe ' at the end of the command.
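A short sketch of the transposing step on made-up values. Note that the apostrophe returns a lazy wrapper around the original matrix, whereas permutedims returns a new Matrix; which one is preferable depends on what you do with the result. reduce(hcat, ...) is shown as an alternative that avoids splatting a long vector.

```julia
conf_intervals = [(-1.5, 1.2), (-2.1, 1.6), (-1.4, 1.7)]

cols = collect.(conf_intervals)   # one 2-element Vector per tuple
wide = hcat(cols...)              # 2x3 matrix
wide_reduced = reduce(hcat, cols) # same result without splatting

tall_lazy = wide'                 # lazy adjoint wrapper, 3x2
tall = permutedims(wide)          # freshly allocated 3x2 Matrix
```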

Julia : Multidimensional Array indexing by a predicate

I have a multidimensional array 69 x 4 in JULIA. I would like to filter the rows using a condition on one of the columns of the frame.
updown[updown[:,4] .> .5]
does not seem to work.
You could pass something for the second axis, basically saying "all columns":
julia> updown = randn(69, 4);
julia> updown[updown[:, 4] .> 1.5, :]
4×4 Array{Float64,2}:
1.76637 -0.307257 -0.125816 1.89179
0.0858598 -0.812886 -0.030113 1.66113
-0.144546 0.374371 -0.731996 1.56694
0.330211 0.108665 0.98783 1.71425
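The same idea on a small fixed matrix, so the result can be checked by eye. The values are made up. The mask has one Bool per row, and the colon selects every column:

```julia
updown = [1.0 2.0 3.0 0.2;
          4.0 5.0 6.0 0.9;
          7.0 8.0 9.0 0.7]

mask = updown[:, 4] .> 0.5        # one Bool per row
filtered = updown[mask, :]        # keep rows where the mask is true
row_numbers = findall(mask)       # the indices of the kept rows
```

Without the second index, the mask is treated as an index into the whole array, and its length (one entry per row) does not match the number of elements.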

Convert Dict to DataFrame in Julia

Suppose I have a Dict defined as follows:
x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])
I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:
DataFrame(A = [1,2,3], B = [4,5,6])
I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:
macro dict_to_df(x)
    typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
    return quote
        DataFrame(
            for k in keys(eval(x))
                @eval ($k) = $(eval(x)[$k])
            end
        )
    end
end
I also tried writing this as a function, which does work when all dictionary values have the same length:
function dict_to_df(x::Dict)
    s = "DataFrame("
    for k in keys(x)
        v = x[k]
        if typeof(v) <: AbstractString
            v = string('"', v, '"')
        end
        s *= "$(k) = $(v),"
    end
    s = chop(s) * ")"
    # On Julia 1.x, string parsing lives in Meta.parse
    return eval(Meta.parse(s))
end
Is there a better, faster, or more idiomatic approach to this?
Another method could be
DataFrame(Any[values(x)...], Symbol[Symbol(k) for k in keys(x)])
It was a bit tricky to get the types in order to access the right constructor. To get a list of the constructors for DataFrames I used methods(DataFrame).
The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:
DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
Finally, things would be cleaner if x had been defined with symbol keys in the first place. Then the code would be:
x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)
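The keyword-splatting mechanism itself does not depend on DataFrames. A minimal sketch with a hypothetical helper, kwnames, that simply reports which keyword names it received:

```julia
# Hypothetical helper: returns the keyword names it receives, sorted.
kwnames(; kwargs...) = sort(collect(keys(kwargs)))

x = Dict("A" => [1, 2, 3], "B" => [4, 5, 6])

# Convert string keys to symbols, then splat the pairs as keywords.
sym_pairs = [Symbol(k) => v for (k, v) in x]
names_from_pairs = kwnames(; sym_pairs...)

# A Dict that already has Symbol keys can be splatted directly.
sym_dict = Dict(:A => [1, 2, 3], :B => [4, 5, 6])
names_from_dict = kwnames(; sym_dict...)
```

The names are sorted here only because Dict iteration order is not guaranteed, which also means the column order of a DataFrame built this way may vary.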

Julia list comprehension changes the type

Suppose we have a Vector of tuples (Int64, Int64) in julia:
In [1] xx = [(1, 2), (3, 4), (5, 6)]
typeof(xx) == Vector{(Int64, Int64)}
Out[1] true
Now I want to construct a new vector of the first indices of the tuples.
In [2] indices = [x[1] for x in xx]
typeof(indices)
Out[2] Array{Any, 1}
I expect it to be an Array{Int64, 1} type. How can I fix this?
edit: I am using 0.3.9.
function f()
    xx = [(1, 2), (3, 4), (5, 6)]
    inds = [ x[1] for x in xx ]
    return(inds)
end
y = f()
typeof(y)
The last line of code returns Array{Int64, 1}.
The problem here is that you are working in global scope. For Julia's type inference to be able to do its magic, you need to work in a local scope. In other words, wrap all your code in functions. This rule is very, very important, but, having come from a MATLAB background myself, I can see why people forget it. Just remember, 90% of questions saying "Why is my Julia code slow?" occur because the user was working in global scope, not local scope.
ps, even in local scope, type inference of loop comprehensions can stumble in particularly complex cases. This is a known issue and is being worked on. If you want to provide the compiler with some "help" you can do something like:
inds = Int[ x[1] for x in xx ]
You can also use map and preserve the type:
#passing a lambda that takes the 1st element, and the iterable
inds = map( (x)-> x[1], xx)
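The options from this thread side by side, as a sketch with illustrative variable names. The broadcast form first.(xx) is an addition not mentioned in the answers. The question concerns Julia 0.3.9; this sketch assumes a 1.x release.

```julia
xx = [(1, 2), (3, 4), (5, 6)]

# Typed comprehension: the element type is stated explicitly.
inds_typed = Int[x[1] for x in xx]

# map with an anonymous function that takes the first element.
inds_map = map(x -> x[1], xx)

# Broadcasting first over the vector of tuples.
inds_first = first.(xx)
```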
