I have a dataframe with some columns,sometimes it can be : [Type_House, Name, Location].
And sometimes it can be: [Type_Build, Name, Location]
There is a way to acess this dataframe column Type dynamically, like?
colName = "House"
dataframe.Type_colName
Thanks.
if you have
colName = "House"
you can access the column with
df[!, colName]
and from there you can use typeof() or eltype() to get the type or element type of that column
As indicated by #jling but specific to your question it would be:
> colName = "House"
> df[!, "Type_"*colName]
or
> getproperty(df, "Type_"*colName)
then you can just change colName="Build" to select the other column.
If you want to access the column that starts with Type_, you can use the names function this way:
julia> df = DataFrame( Type_Build = ["foo", "bar"], Name = ["A", "B"])
2×2 DataFrame
Row │ Type_Build Name
│ String String
─────┼────────────────────
1 │ foo A
2 │ bar B
julia> names(df, startswith("Type_"))
1-element Vector{String}:
"Type_Build"
To access the values in the column, you can use that to index into the dataframe:
julia> df[!, names(df, startswith("Type_"))]
2×1 DataFrame
Row │ Type_Build
│ String
─────┼────────────
1 │ foo
2 │ bar
Related
I have the following bar plot:
using Plots, DataFrames
Plots.bar(["A", "B", "c"],[6,5,3],fillcolor=[:red,:green,:blue], legend = :none)
Output:
I would like to add a simple small table inside to plot in the top right corner. The table should have the following values of the dataframe:
df = DataFrame(x = ["A", "B", "c"], y = [6,5,3])
3×2 DataFrame
Row │ x y
│ String Int64
─────┼───────────────
1 │ A 6
2 │ B 5
3 │ c 3
So I was wondering if anyone knows how to add a simple table to a Plots graph in Julia?
You can use the following:
using Plots, DataFrames
df = DataFrame(; x=["A", "B", "c"], y=[6, 5, 3])
plt = Plots.bar(
df[!, :x], df[!, :y]; fillcolor=[:red, :green, :blue], legend=:none, dpi=300
)
Plots.annotate!(
1,
4,
Plots.text(
replace(string(df), ' ' => '\u00A0'); family="Courier", halign=:left, color="black"
),
)
Plots.savefig("plot.svg")
Which gets you the following plot (after conversion to PNG for uploading to StackOverflow):
Note that we have to replace space characters with '\u00a0', non-breaking space, to prevent multiple consecutive spaces from being collapsed by the SVG (SVG‘s collapse consecutive spaces by default).
Suppose I have a DataFrame with numeric elements. I want to check that all the elements are non-negative. I can do something like:
df .> 0
which results in a DataFrame of ones and zeros. How do I reduce it to a one true/false value?
The almost non-allocating and efficient way to do it is:
all(all.(>(0), eachcol(df)))
or
all(all.(x -> isless(0, x), eachcol(df)))
depending on how you want to handle missing values.
Here is an example of the difference:
julia> df = DataFrame(a=[1, missing], b=1:2)
2×2 DataFrame
Row │ a b
│ Int64? Int64
─────┼────────────────
1 │ 1 1
2 │ missing 2
julia> all(all.(>(0), eachcol(df)))
missing
julia> all(all.(x -> isless(0, x), eachcol(df)))
true
as with isless missing value is treated as greater than any other value.
I have an excel file with dates and stock prices. I read this data into a dataframe with DataFrames.jl
using DataFrames, StatsPlots, Indicators
df = DataFrame(XLSX.readtable("Demo-sv.xlsx", "Blad3")...)
This works great and here I print the first 6 entries.
6×2 DataFrame
│ Row │ Date │ Closeprice │
│ │ Any │ Any │
├─────┼────────────┼────────────┤
│ 1 │ 2019-05-03 │ 169.96 │
│ 2 │ 2019-05-02 │ 168.06 │
│ 3 │ 2019-04-30 │ 165.58 │
│ 4 │ 2019-04-29 │ 166.4 │
│ 5 │ 2019-04-26 │ 167.76 │
│ 6 │ 2019-04-25 │ 167.46 │
I then plot this data with StatsPlots.jl #df df plot(df.Date, df.Closeprice) and get a nice plot graph.
The problem is when I want to plot a simple moving average with Indicators.jl
movingaverage = sma(df, n=200)
plot!(movingaverage, linewidth=2, color=:red)
I get this error message
ERROR: LoadError: MethodError: no method matching sma(::DataFrame; n=200)
Closest candidates are:
sma(::Array{T,N} where N; n) where T<:Real at
/Users/HBrovell/.julia/packages/Indicators/QGmEX/src/ma.jl:8
sma(::Temporal.TS{V,T}; args...) where {V, T} at
/Users/HBrovell/.julia/packages/Indicators/QGmEX/src/temporal.jl:64
What I understand, I need to convert the DataFrame so I will be able to use the Indicators.jl sma function. I have tried with convert(Array{Float64}, df[2]) to only convert the Closeprice column, but that didn't work the way I wanted. I guess I don't want to convert the date column?
So how can I convert the DataFrame, so I can use the sma function in Indicators.jl, or is there a better way than using DataFrames.jl?
I assume what you need is:
sma(sort(df, :Date).ClosePrice, n=200)
One additional problem you have is the data type of your ClosePrice column that should be numeric rather than Any
You need to convert it somehow, for an example:
df[!, :ClosePrice] .= Float64.(df.ClosePrice)
I have a Dataframe of several columns say column1, column2...column100. How do I select only a subset of the columns eg (not column1) should return all columns column2...column100.
data[[colnames(data) .!= "column1"]])
doesn't seem to work.
I don't want to mutate the dataframe. I just want to select all the columns that don't have a particular column name like in my example
EDIT 2/7/2021: as people seem to still find this on Google, I'll edit this to say write at the top that current DataFrames (1.0+) allows both Not() selection supported by InvertedIndices.jl and also string types as column names, including regex selection with the r"" string macro. Examples:
julia> df = DataFrame(a1 = rand(2), a2 = rand(2), x1 = rand(2), x2 = rand(2), y = rand(["a", "b"], 2))
2×5 DataFrame
Row │ a1 a2 x1 x2 y
│ Float64 Float64 Float64 Float64 String
─────┼────────────────────────────────────────────────
1 │ 0.784704 0.963761 0.124937 0.37532 a
2 │ 0.814647 0.986194 0.236149 0.468216 a
julia> df[!, r"2"]
2×2 DataFrame
Row │ a2 x2
│ Float64 Float64
─────┼────────────────────
1 │ 0.963761 0.37532
2 │ 0.986194 0.468216
julia> df[!, Not(r"2")]
2×3 DataFrame
Row │ a1 x1 y
│ Float64 Float64 String
─────┼────────────────────────────
1 │ 0.784704 0.124937 a
2 │ 0.814647 0.236149 a
Finally, the names function has a method which takes a type as its second argument, which is handy for subsetting DataFrames by the element type of each column:
julia> df[!, names(df, String)]
2×1 DataFrame
Row │ y
│ String
─────┼────────
1 │ a
2 │ a
In addition to indexing with square brackets, there's also the select function (and its mutating equivalent select!), which basically takes the same input as the column index in []-indexing as its second argument:
julia> select(df, Not(r"a"))
2×3 DataFrame
Row │ x1 x2 y
│ Float64 Float64 String
─────┼────────────────────────────
1 │ 0.124937 0.37532 a
2 │ 0.236149 0.468216 a
Original answer below
As #Reza Afzalan said, what you're trying to do returns an array of strings, while column names in DataFrames are symbols.
Given that Julia doesn't have conditional list comprehension, the nicest thing you could do I guess would be
data[:, filter(x -> x != :column1, names(df))]
This will give you the data set with column 1 removed (without mutating it). You could extend this to checking against lists of names as well:
data[:, filter(x -> !(x in [:column1,:column2]), names(df))]
UPDATE: As Ian says below, for this use case the Not syntax is now the best way to go.
More generally, conditional list comprehensions are also available by now, so you could do:
data[:, [x for x in names(data) if x != :column1]]
As of DataFrames 0.19, seems that you can now do
select(data, Not(:column1))
to select all but the column column1. To select all except for multiple columns, use an array in the inverted index:
select(data, Not([:column1, :column2]))
To select several columns by name:
df[[:col1, :col2]
or, for other versions of the DataFrames library, I use:
select(df, [:col1, :col2])
colnames(data) .!= "column1" # => returns an array of bool
I think the right way is to use a filter function that returns desired column names
filter(x->x != "column1", colnames(data)) # => returns an array of string
DataFrame column names are of Symbol datatype
map(symbol ,str_array_of_filterd_column_names) # => returns array of identical symbols
One way is selecting a range of columns using the index
idx = length(data)
data[2:idx]
Other ways to do conditional selection are in the DataFrames docs
For example say you create a Julia DataFrame like so with 20 columns:
y=convert(DataFrame, randn(10,20))
How do you convert the column names (:x1 ... :x20) to something else, like (:col1, ..., :col20) for example, all at once?
You might find the names! function more concise:
julia> using DataFrames
julia> df = DataFrame(x1 = 1:2, x2 = 2:3, x3 = 3:4)
2x3 DataFrame
|-------|----|----|----|
| Row # | x1 | x2 | x3 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 4 |
julia> names!(df, [symbol("col$i") for i in 1:3])
Index([:col2=>2,:col1=>1,:col3=>3],[:col1,:col2,:col3])
julia> df
2x3 DataFrame
|-------|------|------|------|
| Row # | col1 | col2 | col3 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 4 |
One way to do this is with the rename! function. The method of the rename function takes a DataFrame as input though only allows you to change a single column name at a time (as of the development version 0.3 branch on 1/4/2014). Looking into the code of Index.jl in the DataFrames repository lead me to this solution which works for me:
rename!(y.colindex, [(symbol("x$i")=>symbol("col$i")) for i in 1:20])
y.colindex returns the index for the dataframe y, and the next argument creates a dictionary mapping the old column symbols to the new column symbols. I imagine that by the time someone else needs this, there will be a nicer way to do this, but I just spent a few hours figuring this out in the development version 0.3 of Julia, so I thought i would share!
As an update to the answer of #JohnMylesWhite, the names! function has been deprecated in DataFrames v 0.20.2. The latest way of going about this is by using the rename! function:
import DataFrames
DF = DataFrames
df = DF.DataFrame(x1 = 1:2, x2 = 2:3, x3 = 3:4)
println(df)
DF.rename!(df, [Symbol("Col$i") for i in 1:size(df,2)])
println(df)
v1.1.0
One can directly change the column names by
names!(df, colNames_as_Symbols)
To rename the columns with a vector of strings, this can be done via
names!(df, Symbol.(colNames_as_strings) )
# import Pkg; Pkg.add("DataFrames")
using DataFrames
The question has been answered, but for the additional clarity, sometimes you just want to specify the names without using loops (i.e. over-engineering):
rename!(df, [:Date, :feature_1, :feature_2 ], makeunique=true)
Example output:
141 rows × 3 columns
Date feature_1 feature_2
Date Float64? Float64?
1 2020-08-03 44.3 missing
Update:
For Julia 0.4, as described by John Myles White, all the names can be changed with:
names!(df::AbstractDataFrame, vals)
where vals is a Vector{Symbol} the same length as
the number of columns in df.
Specific names can be changed with:
rename!(df::AbstractDataFrame, from::Symbol, to::Symbol)
rename!(df::AbstractDataFrame, d::Associative)
rename!(f::Function, df::AbstractDataFrame)
where d is an Associative type that maps the original name to a new name
and f is a function that has the old column name (a symbol) as input
and new column name (a symbol) as output.
This is documented in the code at https://github.com/JuliaStats/DataFrames.jl/blob/7e2f48ad9f31185d279fdd81d6413a79b7e42e87/src/abstractdataframe/abstractdataframe.jl
This is the short and simple answer for Julia 1.1.1:
names!(df, [Symbol("Col$i") for i in 1:size(df,2)])
Use the rename function with an array containing the new names:
Vector_with_names = ["col1","col2","col3"]
rename!(df,Vector_with_names)
Using John's dataframe, i had to use colnames! instead of names!
df = DataFrame(x1 = 1:2, x2 = 2:3, x3 = 3:4)
colnames!(df, ["col$i" for i in 1:3])
My version of Julia is 0.2.1