How do you apply a shift to a Julia DataFrame?

In Python pandas, the shift function shifts the rows of a dataframe forward or backward relative to the original, which is useful for calculating changes in time series data. What is the equivalent method in Julia?

Normally one would use the lag and lead functions from ShiftedArrays.jl and apply them to the columns that need shifting.

Here is a small working example:
using DataFrames, ShiftedArrays
df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
transform(df, :a => lag => :lag_a)
3×3 DataFrame
 Row │ a      b      lag_a
     │ Int64  Int64  Int64?
─────┼───────────────────────
   1 │     1      4  missing
   2 │     2      5        1
   3 │     3      6        2
Alternatively, you could assign the shifted column directly:
df.c = lag(df.a)
or, to lead the column by two rows:
df.c = lead(df.a, 2)
etc.
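If you would rather not add a dependency, the behaviour of pandas' shift can be sketched in plain Julia. The helper name shiftvec below is hypothetical (it is not part of Base, DataFrames.jl or ShiftedArrays.jl); it returns a new vector padded with missing, where a positive n plays the role of lag and a negative n the role of lead.

```julia
# Hypothetical helper, not a library function: a pandas-style shift for a plain vector.
# Positive n moves values down (padding the top with missing), negative n moves them up.
function shiftvec(v::AbstractVector{T}, n::Integer = 1) where {T}
    w = collect(v)                                  # 1-based copy of the input
    len = length(w)
    out = Vector{Union{Missing, T}}(missing, len)   # start with an all-missing result
    k = min(abs(n), len)                            # shifting past the end leaves only missing
    if n >= 0
        out[k+1:len] .= w[1:len-k]
    else
        out[1:len-k] .= w[k+1:len]
    end
    return out
end

# Usage: a percentage change, the typical time-series use of a shift
prices = [100.0, 110.0, 99.0]
prev = shiftvec(prices)               # [missing, 100.0, 110.0]
pct = (prices .- prev) ./ prev        # [missing, 0.1, -0.1]
```

With a data frame this would be used as df.c = shiftvec(df.a) or df.c = shiftvec(df.a, -2). The ShiftedArrays functions return a lazy ShiftedArray rather than a copy, which is a reason to prefer them for large columns.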

Related

How can the R code below be written equivalently in Julia?

I am trying some Julia code, as shown here:
However, I get an error:
In Julia, 1 and 1.0 are different: 1 is an integer, while 1.0 is a floating-point number. In R, a numeric literal such as 1 is a double by default (an integer literal needs the L suffix, as in 1L). You want x and y to be Integers.
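The question's code is not shown, so here is only a generic sketch of the difference described above: an integer and a float literal compare equal with ==, but they are different types, and places that require an integer, such as array indexing, reject a Float64.

```julia
x = 1      # Int (Int64 on 64-bit systems)
y = 1.0    # Float64

println(typeof(x), " ", typeof(y))
println(x == y)    # numerically equal
println(x === y)   # not identical: the types differ

v = [10, 20, 30]
v[x]               # fine: integer index
# v[y] would throw an ArgumentError, because a Float64 is not a valid index

# Convert explicitly when a float holds a whole number; Int(1.5) throws an InexactError
idx = Int(y)
v[idx]
```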
You are most likely using filtering incorrectly.
Suppose you have a data frame:
data = DataFrame(a=1:6, b='a':'f');
One way to filter would be to use a BitVector such as:
julia> rows = data.a .< 3
6-element BitVector:
 1
 1
 0
 0
 0
 0
julia> data[rows, :]
2×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
You could of course just write data[data.a .< 3, :]
If you want to use filter instead, the code could look like this:
julia> filter(row -> row.a < 3, data)
2×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
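Two more ways to write the same filter, sketched under the assumption of DataFrames.jl 1.x (where subset is available). The column => predicate form of filter is usually faster than passing a function of the whole row, and several conditions can be combined with broadcasted & on Boolean vectors.

```julia
using DataFrames

data = DataFrame(a = 1:6, b = 'a':'f')

# `subset` takes column => predicate pairs; ByRow applies the predicate element by element
small = subset(data, :a => ByRow(<(3)))

# `filter` also accepts a column => predicate pair instead of a row function
small2 = filter(:a => <(3), data)

# Conditions on several columns: combine Boolean vectors with broadcasted &
both = data[(data.a .< 5) .& (data.b .!= 'b'), :]
```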

How to create TypedTable from names (strings) and vectors?

Consider
import TypedTables as TT
TT.Table(this=[1,2,3])
Fine. Now instead I have
a = "this"
b = [1,2,3]
How do I create the same table from a and b? Going via a NamedTuple is a bit roundabout but seems to work:
TT.Table((; Symbol(a) =>b))
Is a less roundabout approach available?
You can skip the NamedTuple construction and just pass the pair as a keyword argument:
julia> TT.Table(; Symbol(a) => b)
Table with 1 column and 3 rows:
     this
   ┌─────
 1 │ 1
 2 │ 2
 3 │ 3
Regarding the comments about multiple columns:
julia> as = ["this", "that"];
julia> bs = [[1,2,3],[4,5,6]];
julia> TT.Table(; (Symbol.(as) .=> bs)...)
Table with 2 columns and 3 rows:
     this  that
   ┌───────────
 1 │ 1     4
 2 │ 2     5
 3 │ 3     6
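The work in the answer is done by Symbol.(as) .=> bs. This Base-only sketch shows what that expression produces and the NamedTuple of columns it becomes; since the question shows Table accepting a NamedTuple, passing cols to TT.Table should give the same two-column table. The NamedTuple(itr) constructor used for cols2 needs Julia 1.6 or later, as far as I know.

```julia
# Column names arrive as strings, the columns as vectors
as = ["this", "that"]
bs = [[1, 2, 3], [4, 5, 6]]

# Broadcasting => pairs each name (converted to a Symbol) with its column
pairs_vec = Symbol.(as) .=> bs

# Splatting the pairs builds a NamedTuple of columns
cols = (; pairs_vec...)

# Equivalent constructor form: a NamedTuple from an iterator of Symbol => value pairs
cols2 = NamedTuple(pairs_vec)
```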

Is there an as.factor analogue in Julia?

I have an integer column in a data frame. How can I convert its values into strings in Julia?
In R I can simply write:
mutate(column2 = as.factor(column1))
In Julia:
julia> using DataFramesMeta, CategoricalArrays
julia> df = DataFrame(a=1:3, b='a':'c')
3×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c
julia> @transform!(df, :b = categorical(:b))
3×2 DataFrame
 Row │ a      b
     │ Int64  Cat…
─────┼─────────────
   1 │     1  a
   2 │     2  b
   3 │     3  c
Use @transform instead if you want a new data frame. The target column name can also differ, e.g. :b_categorical = categorical(:b).
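For readers not using DataFramesMeta.jl, here is a sketch of the same idea with plain DataFrames.jl transform!, assuming DataFrames.jl 1.x and CategoricalArrays.jl. It also covers the conversion to strings that the question body asks about, which is a different thing from a factor.

```julia
using DataFrames, CategoricalArrays

df = DataFrame(a = 1:3, b = 'a':'c')

# as.factor analogue without macros: `categorical` receives the whole column at once
transform!(df, :a => categorical => :a_factor)

# Strings instead of a factor: convert each element with ByRow(string)
transform!(df, :a => ByRow(string) => :a_string)
```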

Return the maximum sum in `DataFrames.jl`?

Suppose my DataFrame has two columns, v and g. First, I grouped the DataFrame by column g and calculated the sum of column v. Second, I used the function maximum to retrieve the maximum sum. I am wondering whether it is possible to retrieve the value in one step. Thanks.
julia> using Random
julia> Random.seed!(1)
TaskLocalRNG()
julia> dt = DataFrame(v = rand(15), g = rand(1:3, 15))
15×2 DataFrame
 Row │ v          g
     │ Float64    Int64
─────┼──────────────────
   1 │ 0.0491718      3
   2 │ 0.119079       2
   3 │ 0.393271       2
   4 │ 0.0240943      3
   5 │ 0.691857       2
   6 │ 0.767518       2
   7 │ 0.087253       1
   8 │ 0.855718       1
   9 │ 0.802561       3
  10 │ 0.661425       1
  11 │ 0.347513       2
  12 │ 0.778149       3
  13 │ 0.196832       1
  14 │ 0.438058       2
  15 │ 0.0113425      1
julia> gdt = combine(groupby(dt, :g), :v => sum => :v)
3×2 DataFrame
 Row │ g      v
     │ Int64  Float64
─────┼────────────────
   1 │     1  1.81257
   2 │     2  2.7573
   3 │     3  1.65398
julia> maximum(gdt.v)
2.7572966050340257
I am not sure if that is what you mean, but you can retrieve the values of v and g in one step with the following command:
julia> v, g = findmax(x -> (x.v, x.g), eachrow(gdt))[1]
(2.7572966050340257, 2)
DataFramesMeta.jl has an @by macro:
julia> @by(dt, :g, :sv = sum(:v))
3×2 DataFrame
 Row │ g      sv
     │ Int64  Float64
─────┼────────────────
   1 │     1  1.81257
   2 │     2  2.7573
   3 │     3  1.65398
which gives you somewhat neater syntax for the first part of this.
With that, you can do either:
julia> @by(dt, :g, :sv = sum(:v)).sv |> maximum
2.7572966050340257
or (IMO more readably):
julia> @chain dt begin
           @by(:g, :sv = sum(:v))
           maximum(_.sv)
       end
2.7572966050340257
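Without DataFramesMeta.jl, a sketch using only DataFrames.jl: iterating a GroupedDataFrame yields one SubDataFrame per group, so the group sums can be fed straight into maximum. The random numbers are not guaranteed to match the table above across Julia versions, so no output is shown.

```julia
using DataFrames, Random

Random.seed!(1)
dt = DataFrame(v = rand(15), g = rand(1:3, 15))

# One expression: iterate over the groups (each a SubDataFrame) and keep the largest sum
maxsum = maximum(sum(sdf.v) for sdf in groupby(dt, :g))

# If the group key is needed as well, combine first and look up the row of the largest sum
gdt = combine(groupby(dt, :g), :v => sum => :v)
i = argmax(gdt.v)
best_g, best_v = gdt.g[i], gdt.v[i]
```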

Julia DataFrame: deleting a row from a SubDataFrame

Problem statement: deleting a row from a SubDataFrame
Code:
x=[rand(3) for i in 1:3]
dfx=DataFrame(x,:auto)
dfy = @view dfx[2:3, :]
Q: I want to delete the first row from dfy so that it is deleted from dfx too.
I take a subset of the original dfx to check further whether the subsetted rows fulfill some conditions. At the end I want to decide whether to keep each row in dfx or to delete it. I operate on the subset of dfx, which is dfy.
You are not allowed to perform row deletion in views. Here is one example showing why it would be problematic:
julia> using DataFrames
julia> df = DataFrame(a=1:3)
3×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
julia> dfv = view(df, [1, 1, 1, 1], :)
4×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     1
   3 │     1
   4 │     1
Now assume you want to remove rows 2 and 3 from the dfv view. Both point to row 1 of the parent, which cannot be removed twice; and after such a deletion, what would the state of dfv be?
As for taking a subset of dfx to check whether its rows fulfill some conditions: note that you can use the parentindices function to get the indices of your view's rows in the parent, so that you can later remove the appropriate rows from the parent.
EDIT
An example:
julia> x=[rand(3) for i in 1:3]
3-element Vector{Vector{Float64}}:
 [0.9362990387940191, 0.872386665989372, 0.9062520245175714]
 [0.31161625031197393, 0.21614040488877717, 0.7277794414244152]
 [0.35548885964798926, 0.4422493896149622, 0.45150837090448315]
julia> dfx=DataFrame(x, :auto)
3×3 DataFrame
 Row │ x1        x2        x3
     │ Float64   Float64   Float64
─────┼──────────────────────────────
   1 │ 0.936299  0.311616  0.355489
   2 │ 0.872387  0.21614   0.442249
   3 │ 0.906252  0.727779  0.451508
julia> dfy = @view dfx[2:3, :]
2×3 SubDataFrame
 Row │ x1        x2        x3
     │ Float64   Float64   Float64
─────┼──────────────────────────────
   1 │ 0.872387  0.21614   0.442249
   2 │ 0.906252  0.727779  0.451508
julia> row_to_remove = parentindices(dfy)[1][1]
2
julia> deleteat!(dfx, row_to_remove)
2×3 DataFrame
 Row │ x1        x2        x3
     │ Float64   Float64   Float64
─────┼──────────────────────────────
   1 │ 0.936299  0.311616  0.355489
   2 │ 0.906252  0.727779  0.451508
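The same approach extends to several rows at once. A sketch, assuming DataFrames.jl 1.3 or later (where deleteat! is the row-deletion function for data frames); the data and the keep_row condition are made up for illustration.

```julia
using DataFrames

dfx = DataFrame(x1 = [0.9, 0.8, 0.7], x2 = [0.3, 0.2, 0.1])
dfy = @view dfx[2:3, :]

# Hypothetical condition deciding which rows of the view should be kept
keep_row(r) = r.x2 > 0.15

# Positions (within the view) of the rows that fail the check
drop_in_view = findall(!keep_row, eachrow(dfy))

# Map view positions to parent row numbers, then delete from the parent in one call;
# deleteat! expects sorted, unique row numbers
rows_in_parent = parentindices(dfy)[1][drop_in_view]
deleteat!(dfx, sort!(unique(rows_in_parent)))

# dfy still holds the old parent row numbers, so do not reuse it after the deletion
```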
