Is there a way to convert an object in Julia to a code representation generating the same object?
I am basically looking for an equivalent to R's dput function.
So if I have an object like:
A = rand(2,2)
# Which outputs
>2×2 Array{Float64,2}:
0.0462887 0.365109
0.698356 0.302478
I can do something like dput(A) which prints something like the following to the console that can be copy-pasted to be able to replicate the object:
[0.0462887 0.365109; 0.698356 0.302478]
I think you are looking for repr:
julia> A = rand(2, 2);
julia> repr(A)
"[0.427705 0.0971806; 0.395074 0.168961]"
Just use Base.dump.
julia> dump(rand(2,2))
Array{Float64}((2, 2)) [0.162861 0.434463; 0.0823066 0.519742]
You can copy the second part.
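To check that the string returned by repr really is copy-pasteable code, it can be parsed and evaluated back into an object. This is a minimal sketch using only Base. The matrix values are chosen for illustration (floats that are exactly representable), so the printed form does not depend on how a given Julia version rounds its output:

```julia
# Illustrative matrix; any array whose repr is valid Julia syntax works the same way.
A = [0.5 0.25; 1.0 2.0]

code = repr(A)                 # a String: "[0.5 0.25; 1.0 2.0]"
B = eval(Meta.parse(code))     # parse the string as Julia code and evaluate it

println(code)
println(B == A)                # the round trip reproduces the original values
```

If the round trip fails for a type, its show method probably prints a display form rather than constructor syntax, which is the situation the DataFrame answer deals with.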
(This is a modified crosspost of https://stackoverflow.com/a/73337342/18431399)
repr might not work as expected for DataFrames.
Here is one way to mimic the behaviour of R's dput for DataFrames in Julia:
julia> using DataFrames
julia> using Random; Random.seed!(0);
julia> df = DataFrame(a = rand(3), b = rand(1:10, 3))
3×2 DataFrame
Row │ a b
│ Float64 Int64
─────┼──────────────────
1 │ 0.405699 1
2 │ 0.0685458 7
3 │ 0.862141 2
julia> repr(df) # Attempting with repr()
"3×2 DataFrame\n Row │ a b\n │ Float64 Int64\n─────┼──────────────────\n 1 │ 0.405699 1\n 2 │ 0.0685458 7\n 3 │ 0.862141 2"
julia> julian_dput(x) = invoke(show, Tuple{typeof(stdout), Any}, stdout, x);
julia> julian_dput(df)
DataFrame(AbstractVector[[0.4056994708920292, 0.06854582438651502, 0.8621408571954849], [1, 7, 2]], DataFrames.Index(Dict(:a => 1, :b => 2), [:a, :b]))
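That is, julian_dput() takes a DataFrame as input and prints a constructor call that can recreate it. It works by bypassing the DataFrame-specific show method and falling back to the generic one. Note that the printed call relies on DataFrames.Index, so it may not be stable across DataFrames.jl versions.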
Source: https://discourse.julialang.org/t/given-an-object-return-julia-code-that-defines-the-object/80579/12
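The same invoke trick can be tried without DataFrames. The sketch below uses a made-up type, Point2, with a custom pretty-printing show method (both are assumptions for illustration only), and then uses invoke to reach the generic show(::IO, ::Any) fallback, which prints constructor-style output:

```julia
# Hypothetical type used only for illustration.
struct Point2
    x::Int
    y::Int
end

# A custom display form that is NOT valid Julia code.
Base.show(io::IO, p::Point2) = print(io, "<point at ", p.x, ", ", p.y, ">")

p = Point2(1, 2)

pretty  = sprint(show, p)                                       # uses the custom method
generic = sprint(io -> invoke(show, Tuple{IO, Any}, io, p))     # forces the generic fallback

println(pretty)    # <point at 1, 2>
println(generic)   # constructor-style text ending in Point2(1, 2)
```

Capturing the text with sprint instead of writing to stdout gives a string that can be stored or passed to Meta.parse.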
I have a Julia DataFrame which I can work with fine in any normal way. For example, let's say the df is
df = DataFrame(:x => [1,2,3], :y => [4,5,6], :z => [7,8,9]);
I can easily do
julia> df[:, :x]
3-element Vector{Int64}:
1
2
3
julia> df[:, [:x, Symbol("y")]]
3×2 DataFrame
Row │ x y
│ Int64 Int64
─────┼──────────────
1 │ 1 4
2 │ 2 5
3 │ 3 6
However, if I do the following I get a massive error.
julia> @select(df, Symbol("y"))
ERROR: LoadError: ArgumentError: Malformed expression in DataFramesMeta.jl macro
Stacktrace:
[1] fun_to_vec(ex::Expr; gensym_names::Bool, outer_flags::NamedTuple{(Symbol("@byrow"), Symbol("@passmissing"), Symbol("@astable")), Tuple{Base.RefValue{Bool}, Base.RefValue{Bool}, Base.RefValue{Bool}}}, no_dest::Bool)
@ DataFramesMeta ~/.julia/packages/DataFramesMeta/yzaoq/src/parsing.jl:289
[2] (::DataFramesMeta.var"#47#48"{NamedTuple{(Symbol("@byrow"), Symbol("@passmissing"), Symbol("@astable")), Tuple{Base.RefValue{Bool}, Base.RefValue{Bool}, Base.RefValue{Bool}}}})(ex::Expr)
@ DataFramesMeta ./none:0
[3] iterate(::Base.Generator{Vector{Any}, DataFramesMeta.var"#47#48"{NamedTuple{(Symbol("@byrow"), Symbol("@passmissing"), Symbol("@astable")), Tuple{Base.RefValue{Bool}, Base.RefValue{Bool}, Base.RefValue{Bool}}}}})
@ Base ./generator.jl:47
[4] select_helper(x::Symbol, args::Expr)
@ DataFramesMeta ~/.julia/packages/DataFramesMeta/yzaoq/src/macros.jl:1440
[5] var"@select"(__source__::LineNumberNode, __module__::Module, x::Any, args::Vararg{Any})
@ DataFramesMeta ~/.julia/packages/DataFramesMeta/yzaoq/src/macros.jl:1543
in expression starting at REPL[35]:1
Any clues?
If you want the call to the Symbol constructor to be evaluated as ordinary Julia code rather than interpreted by the macro, you need to escape it with $:
julia> @select(df, $(Symbol("y")))
3×1 DataFrame
Row │ y
│ Int64
─────┼───────
1 │ 4
2 │ 5
3 │ 6
See https://juliadata.github.io/DataFramesMeta.jl/stable/#dollar for more examples.
This is needed because DataFramesMeta.jl introduces non-standard evaluation: the macro receives the unevaluated expression Symbol("y") rather than the value :y. If you want something to be evaluated in the standard way, you need to escape it.
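To see why the macro treats the two spellings differently, it helps to look at what a macro receives for each argument. The sketch below uses plain quoting in Base Julia to show the two forms. It does not reproduce how DataFramesMeta.jl processes them internally:

```julia
# A macro sees its arguments as unevaluated syntax, like these quoted forms.
literal = :(:y)              # what a macro receives when you write :y
call    = :(Symbol("y"))     # what a macro receives when you write Symbol("y")

println(typeof(literal))     # QuoteNode: a literal symbol, recognisable as a column name
println(typeof(call))        # Expr: a function call that has not been run yet
println(call.head, " ", call.args)

# Only after evaluation do the two become the same value.
println(eval(call) === :y)
```

Escaping with $ asks the macro to evaluate the expression first and use the resulting value.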
pandas has a number of very handy utilities for manipulating datetime indices. Is there any similar functionality in Julia? I have not found any tutorials for working with such things, though it obviously must be possible.
Some examples of pandas utilities:
dti = pd.to_datetime(
["1/1/2018", np.datetime64("2018-01-01"),
datetime.datetime(2018, 1, 1)]
)
dti = pd.date_range("2018-01-01", periods=3, freq="H")
dti = dti.tz_localize("UTC")
dti.tz_convert("US/Pacific")
idx = pd.date_range("2018-01-01", periods=5, freq="H")
ts = pd.Series(range(len(idx)), index=idx)
ts.resample("2H").mean()
Julia libraries follow a "do one thing and do it well" philosophy, so the ecosystem resembles Unix (a battery of small tools that combine to accomplish a common goal) more than Python's.
Hence you have separate libraries for DataFrames and Dates:
julia> using Dates, DataFrames
Going through some of the examples of your tutorial:
Pandas
dti = pd.to_datetime(
["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
)
Julia
julia> DataFrame(dti=[Date("1/1/2018", "m/d/y"), Date("2018-01-01"), Date(2018,1,1)])
3×1 DataFrame
Row │ dti
│ Date
─────┼────────────
1 │ 2018-01-01
2 │ 2018-01-01
3 │ 2018-01-01
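pd.to_datetime guesses the format of each string, whereas Date needs the format spelled out. If the input mixes a few known formats, one option is a small helper that tries each format in turn. The function name parse_any_date and the list of formats are assumptions made for this sketch, not part of the Dates standard library:

```julia
using Dates

# Hypothetical helper: try each format in order, return `nothing` if none match.
function parse_any_date(s::AbstractString,
                        formats = (dateformat"y-m-d", dateformat"m/d/y"))
    for fmt in formats
        d = tryparse(Date, s, fmt)   # returns `nothing` instead of throwing
        d === nothing || return d
    end
    return nothing
end

dates = parse_any_date.(["1/1/2018", "2018-01-01"])
println(dates)
```

The order of the formats matters whenever a string could match more than one of them.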
Pandas
dti = pd.date_range("2018-01-01", periods=3, freq="H")
Julia
julia> dti = DateTime("2018-01-01") .+ Hour.(0:2)
3-element Vector{DateTime}:
2018-01-01T00:00:00
2018-01-01T01:00:00
2018-01-01T02:00:00
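An alternative to broadcasting, closer in spirit to pd.date_range, is the start:step:stop range syntax, which works for DateTime values with a Period step. A minimal sketch using only the Dates standard library (the variable names are illustrative):

```julia
using Dates

start = DateTime(2018, 1, 1)

# A lazy range of hourly timestamps; collect turns it into a Vector{DateTime}.
hourly = start:Hour(1):(start + Hour(2))
println(collect(hourly))

# It yields the same values as the broadcast form used in the answer.
println(collect(hourly) == start .+ Hour.(0:2))
```

The range is not materialised until it is collected, which can matter for long series.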
Pandas
dti = dti.tz_localize("UTC")
dti.tz_convert("US/Pacific")
Julia
Note that there is a separate library in Julia for time zones. Additionally, "US/Pacific" is a legacy name of a time zone.
julia> using TimeZones
julia> dti = ZonedDateTime.(dti, tz"UTC")
3-element Vector{ZonedDateTime}:
2018-01-01T00:00:00+00:00
2018-01-01T01:00:00+00:00
2018-01-01T02:00:00+00:00
julia> astimezone.(dti, TimeZone("US/Pacific", TimeZones.Class(:LEGACY)))
3-element Vector{ZonedDateTime}:
2017-12-31T16:00:00-08:00
2017-12-31T17:00:00-08:00
2017-12-31T18:00:00-08:00
Pandas
idx = pd.date_range("2018-01-01", periods=5, freq="H")
ts = pd.Series(range(len(idx)), index=idx)
ts.resample("2H").mean()
Julia
For resampling or other complex manipulations you will want to use the split-apply-combine pattern (see https://docs.juliahub.com/DataFrames/AR9oZ/1.3.1/man/split_apply_combine/)
julia> df = DataFrame(date=DateTime("2018-01-01") .+ Hour.(0:4), vals=1:5)
5×2 DataFrame
Row │ date vals
│ DateTime Int64
─────┼────────────────────────────
1 │ 2018-01-01T00:00:00 1
2 │ 2018-01-01T01:00:00 2
3 │ 2018-01-01T02:00:00 3
4 │ 2018-01-01T03:00:00 4
5 │ 2018-01-01T04:00:00 5
julia> df.date2 = floor.(df.date, Hour(2));
julia> using StatsBase
julia> combine(groupby(df, :date2), :date2, :vals => mean => :vals_mean)
5×2 DataFrame
Row │ date2 vals_mean
│ DateTime Float64
─────┼────────────────────────────────
1 │ 2018-01-01T00:00:00 1.5
2 │ 2018-01-01T00:00:00 1.5
3 │ 2018-01-01T02:00:00 3.5
4 │ 2018-01-01T02:00:00 3.5
5 │ 2018-01-01T04:00:00 5.0
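The combine call above passes :date2 through as a column, so it returns one row per original row. Leaving :date2 out of the argument list gives one row per two-hour bin, which is closer to what resample("2H").mean() returns in pandas. The binning itself needs nothing beyond the standard library. The sketch below is a stand-in for the groupby/combine step, not the DataFrames API, and its variable names are illustrative:

```julia
using Dates, Statistics

times = DateTime(2018, 1, 1) .+ Hour.(0:4)
vals  = collect(1:5)

# Assign every timestamp to the start of its two-hour bin.
bins = floor.(times, Hour(2))

# One (bin, mean) pair per distinct bin, in order of first appearance.
resampled = [(b, mean(vals[bins .== b])) for b in unique(bins)]

for (b, m) in resampled
    println(b, "  ", m)
end
```

With these inputs the three bins have means 1.5, 3.5 and 5.0, matching the values in the table above.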
I must be doing something wrong. I have a Julia script (below) that uses both vcat and plot. When I run the script, vcat returns an empty DataFrame. Another function calls plot and no plot is generated.
When I manually type the commands in the terminal window the commands behave normally.
Any help would be appreciated.
f_l = file_list[start_row_num:end_row_num] # Build a dataframe containing the data
len = length(f_l)
tmp_stock_df = DataFrame(CSV.File(f_l[1]))
vcat(s_d_df, tmp_stock_df)
println(s_d_df)
for i = 2:len
tmp_stock_df = DataFrame(CSV.File(f_l[i]))
tmp_stock_df.quote_datetime = map((x) -> DateTime(x, "yyyy-mm-dd HH:MM:SS"), tmp_stock_df.quote_datetime)
DataFrames.vcat(s_d_df, tmp_stock_df)
end
It's hard to say what you're doing differently when typing the commands manually, but it seems to me that this code would never produce the results you're looking for. Apart from the fact that s_d_df is not defined, vcat does not mutate its arguments, so you're never actually adding to your DataFrame:
julia> using DataFrames
julia> df1 = DataFrame(a = rand(2), b = rand(2)); df2 = DataFrame(a = rand(2), b = rand(2));
julia> vcat(df1, df2)
4×2 DataFrame
Row │ a b
│ Float64 Float64
─────┼────────────────────
1 │ 0.918298 0.343344
2 │ 0.538763 0.188229
3 │ 0.347177 0.385166
4 │ 0.18795 0.98408
julia> df1
2×2 DataFrame
Row │ a b
│ Float64 Float64
─────┼────────────────────
1 │ 0.918298 0.343344
2 │ 0.538763 0.188229
You probably want s_d_df = vcat(s_d_df, tmp_stock_df) to assign the result of the concatenation.
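The same distinction between returning a new object and mutating an existing one can be seen with plain vectors, without any packages. A minimal sketch (the variable names are illustrative):

```julia
a = [1, 2]
b = [3, 4]

vcat(a, b)            # builds a new vector, which is discarded here
println(a)            # a is unchanged

c = vcat(a, b)        # keep the result by assigning it
println(c)

append!(a, b)         # the trailing ! marks a function that mutates its first argument
println(a)
```

DataFrames.jl follows the same convention: vcat returns a new DataFrame, while append! adds rows to an existing one in place.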
On a related note, it looks like you just have a list of files f_l with different csv files stored on your system which you want to read into a single DataFrame, in which case you can just replace the whole loop with:
s_d_df = vcat(CSV.read.(f_l, DataFrame)...)
(potentially also use the dateformat = "yyyy-mm-dd HH:MM:SS" kwarg in CSV.read to directly parse the dates when reading in the file).
I am new to Julia. When I try to import a CSV file with
using CSV
CSV.read("C:\\Users\\...\\loan_predicton.csv")
I get the error below:
Error : ArgumentError: provide a valid sink argument, like `using DataFrames; CSV.read(source, DataFrame)`
Use:
using CSV
using DataFrames
df = CSV.read("C:\\Users\\...\\loan_predicton.csv", DataFrame)
Once you have more experience with Julia, you will find that you can read a CSV file into different tabular data formats. That is why CSV.read asks you to provide the type of the output you want to read your data into. Here is a small example:
julia> write("test.csv",
"""
a,b,c
1,2,3
4,5,6
""")
18
julia> using CSV, DataFrames
julia> CSV.read("test.csv", DataFrame)
2×3 DataFrame
Row │ a b c
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 2 3
2 │ 4 5 6
julia> CSV.read("test.csv", NamedTuple)
(a = [1, 4], b = [2, 5], c = [3, 6])
You can see that in the first case the result is stored in a DataFrame, and in the second in a NamedTuple.
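The idea behind a sink argument can be illustrated with a toy reader that parses the text once and then hands the columns to whatever constructor the caller supplies. Everything below (the toy_read function and its integer-only parsing) is hypothetical and greatly simplified. It is not how CSV.jl is implemented:

```julia
# Hypothetical toy reader: parse integer columns, then pass them to `sink`.
function toy_read(text::AbstractString, sink)
    lines  = split(strip(text), '\n')
    header = Symbol.(split(lines[1], ','))
    rows   = [parse.(Int, split(line, ',')) for line in lines[2:end]]
    cols   = [[row[j] for row in rows] for j in eachindex(header)]
    return sink(header .=> cols)     # the sink decides the output container
end

text = "a,b,c\n1,2,3\n4,5,6\n"

as_dict = toy_read(text, Dict)                  # columns in a Dict
as_nt   = toy_read(text, ps -> (; ps...))       # columns in a NamedTuple

println(as_dict[:a])
println(as_nt)
```

The parsing code is shared. Only the final step differs, which is why the output type has to be named explicitly.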
I am new to Julia and am working with creating a properly shaped multidimensional array.
function get_deets(curric)
curric = curric.metrics
return ["" curric["complexity"][1] curric["blocking factor"][1] curric["delay factor"][1]]
end
function compare_currics(currics...)
headers = [" ", "Complexity", "Blocking Factor", "Delay Factor"]
data = [get_deets(curric) for curric in currics]
return pretty_table(data, headers)
end
The data I am getting back is:
3-element Array{Array{Any,2},1}:
["" 393.0 184 209.0]
["" 361.0 164 197.0]
["" 363.0 165 198.0]
However, I need something that looks like this:
3×4 Array{Any,2}:
"" 393.0 184 209.0
"" 361.0 164 197.0
"" 363.0 165 198.0
I would replace the comprehension [get_deets(curric) for curric in currics] with a reduction.
For example:
using Random
function getdeets(curric)
# random "deets", as a 1-D Vector
return [randstring(4), rand(), 10rand(), 100rand()]
end
function getdata(currics)
# All 1-D vectors are concatenated horizontally, to produce a
# 2-D matrix with "deets" as columns (efficient since Julia matrices
# are stored in column major order)
data = reduce(hcat, getdeets(curric) for curric in currics)
return data
end
With this, you get a slightly different structure from the one you asked for: it is transposed, but it should be more efficient overall.
julia> getdata(1:3)
4×3 Array{Any,2}:
"B2Mq" "S0hO" "6KCn"
0.291359 0.00046518 0.905285
4.03026 0.612037 8.6458
35.3133 79.3744 6.49379
If you want your tabular data to be presented in the same way as your question, this solution can easily be adapted:
function getdeets(curric)
# random "deets", as a row matrix
return [randstring(4) rand() 10rand() 100rand()]
end
function getdata(currics)
# All rows are concatenated vertically, to produce a
# 2-D matrix
data = reduce(vcat, getdeets(curric) for curric in currics)
return data
end
This produces:
julia> getdata(1:3)
3×4 Array{Any,2}:
"eU7p" 0.563626 0.282499 52.1877
"3pIw" 0.646435 8.16608 27.534
"AI6z" 0.86198 0.235428 25.7382
It looks like for the stuff you want to do you need a DataFrame rather than an Array.
Look at the sample Julia session below:
julia> using DataFrames, Random
julia> df = DataFrame(_=randstring(4), Complexity=rand(4), Blocking_Factor=rand(4), Delay_Factor=rand(4))
4×4 DataFrame
│ Row │ _ │ Complexity │ Blocking_Factor │ Delay_Factor │
│ │ String │ Float64 │ Float64 │ Float64 │
├─────┼────────┼────────────┼─────────────────┼──────────────┤
│ 1 │ S6vT │ 0.817189 │ 0.00723053 │ 0.358754 │
│ 2 │ S6vT │ 0.569289 │ 0.978932 │ 0.385238 │
│ 3 │ S6vT │ 0.990195 │ 0.232987 │ 0.434745 │
│ 4 │ S6vT │ 0.59623 │ 0.113731 │ 0.871375 │
julia> Matrix(df[!,2:end])
4×3 Array{Float64,2}:
0.817189 0.00723053 0.358754
0.569289 0.978932 0.385238
0.990195 0.232987 0.434745
0.59623 0.113731 0.871375
Note that in the last step we converted the numerical part of the data into an Array (I assume you need an Array at some point). This Array contains only Float64 elements. In practice this means that no boxing occurs when storing values, and operations on such an Array can be an order of magnitude faster. To illustrate the point, have a look at the code below (I copy the data from df into two almost identical Arrays).
julia> m = Matrix(df[!,2:end])
4×3 Array{Float64,2}:
0.817189 0.00723053 0.358754
0.569289 0.978932 0.385238
0.990195 0.232987 0.434745
0.59623 0.113731 0.871375
julia> m2 = Matrix{Any}(df[!,2:end])
4×3 Array{Any,2}:
0.817189 0.00723053 0.358754
0.569289 0.978932 0.385238
0.990195 0.232987 0.434745
0.59623 0.113731 0.871375
julia> using BenchmarkTools, Statistics
julia> @btime mean($m)
5.099 ns (0 allocations: 0 bytes)
0.5296580253263143
julia> @btime mean($m2)
203.103 ns (12 allocations: 192 bytes)
0.5296580253263143
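The difference between the two arrays can also be inspected directly, without benchmarking, by looking at their element types. A small sketch using only the standard library (the matrix values are made up for illustration):

```julia
using Statistics

m  = [0.5 1.5; 2.5 3.5]       # Matrix{Float64}: values stored inline
m2 = Matrix{Any}(m)           # Matrix{Any}: each element is a boxed value

println(eltype(m),  "  isbits: ", isbitstype(eltype(m)))
println(eltype(m2), "  isbits: ", isbitstype(eltype(m2)))

# Both hold the same numbers, so the results agree; only the cost differs.
println(mean(m), "  ", mean(m2))
```

With a concrete element type the compiler knows the memory layout in advance, whereas with Any it has to check the type of every element at run time.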