Add table inside plot Julia - plot

I have the following bar plot:
using Plots, DataFrames
Plots.bar(["A", "B", "c"],[6,5,3],fillcolor=[:red,:green,:blue], legend = :none)
Output:
I would like to add a small, simple table inside the plot, in the top right corner. The table should show the following values from the dataframe:
df = DataFrame(x = ["A", "B", "c"], y = [6,5,3])
3×2 DataFrame
Row │ x y
│ String Int64
─────┼───────────────
1 │ A 6
2 │ B 5
3 │ c 3
So I was wondering if anyone knows how to add a simple table to a Plots graph in Julia?

You can use the following:
using Plots, DataFrames
df = DataFrame(; x=["A", "B", "c"], y=[6, 5, 3])
plt = Plots.bar(
df[!, :x], df[!, :y]; fillcolor=[:red, :green, :blue], legend=:none, dpi=300
)
Plots.annotate!(
1,
4,
Plots.text(
replace(string(df), ' ' => '\u00A0'); family="Courier", halign=:left, color="black"
),
)
Plots.savefig("plot.svg")
Which gets you the following plot (after conversion to PNG for uploading to StackOverflow):
Note that we have to replace space characters with '\u00a0' (non-breaking space) to prevent multiple consecutive spaces from being collapsed in the SVG output (SVG collapses consecutive spaces by default).
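A minimal sketch of that substitution on its own, using only Base Julia. The sample string is made up for illustration and is not taken from the plot above.

```julia
# A made-up line of table text containing a run of consecutive spaces.
row = "1 | A       6"

# Swap every ordinary space for a non-breaking space (U+00A0).
nbsp_row = replace(row, ' ' => '\u00A0')

# The text keeps its length, but no ordinary spaces remain.
println(length(row) == length(nbsp_row))
println(occursin(' ', nbsp_row))
```

Because every space is replaced one for one, the column alignment produced by string(df) is preserved as long as a monospaced font is used.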

Related

Change variable name dynamically

I have a dataframe whose columns are sometimes [Type_House, Name, Location]
and sometimes [Type_Build, Name, Location].
Is there a way to access the Type column of this dataframe dynamically, like this?
colName = "House"
dataframe.Type_colName
Thanks.
If you have
colName = "House"
you can access the column with
df[!, colName]
and from there you can use typeof() or eltype() to get the type or element type of that column.
As indicated by @jling, but specific to your question, it would be:
> colName = "House"
> df[!, "Type_"*colName]
or
> getproperty(df, "Type_"*colName)
then you can just change colName="Build" to select the other column.
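A minimal sketch of the same idea wrapped in a function. The data and the helper name type_column are hypothetical, chosen only for illustration.

```julia
using DataFrames

# Hypothetical example data; only one Type_* column is present at a time.
df = DataFrame(Type_House = ["flat", "villa"], Name = ["A", "B"], Location = ["X", "Y"])

# Hypothetical helper: build the column name from the suffix, then index with it.
type_column(df, suffix) = df[!, "Type_" * suffix]

colName = "House"
col = type_column(df, colName)
```

Calling type_column(df, "Build") on a dataframe that has a Type_Build column would work the same way, so the suffix is the only thing that changes between the two layouts.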
If you want to access the column that starts with Type_, you can use the names function this way:
julia> df = DataFrame( Type_Build = ["foo", "bar"], Name = ["A", "B"])
2×2 DataFrame
Row │ Type_Build Name
│ String String
─────┼────────────────────
1 │ foo A
2 │ bar B
julia> names(df, startswith("Type_"))
1-element Vector{String}:
"Type_Build"
To access the values in the column, you can use that to index into the dataframe:
julia> df[!, names(df, startswith("Type_"))]
2×1 DataFrame
Row │ Type_Build
│ String
─────┼────────────
1 │ foo
2 │ bar
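Indexing with the vector returned by names gives a one-column DataFrame, as shown above. If the column itself is wanted as a vector, one option is to extract the single matching name first. This is a sketch that assumes exactly one column starts with Type_:

```julia
using DataFrames

df = DataFrame(Type_Build = ["foo", "bar"], Name = ["A", "B"])

# `only` throws an error unless exactly one column name matches the prefix.
type_col = only(names(df, startswith("Type_")))

# Indexing with a single name returns the column vector, not a DataFrame.
vals = df[!, type_col]
```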

Separating a column by the first 3 characters

I have a set of data below and I would like to separate the first three characters from the bm_id column into a separate column with the rest of the characters in another column.
  bm_id
1 popCL20TE
2 agrST20
3 agrST20-09SE
I have tried solutions to a similar question on Stack Overflow; however, I end up creating extra empty columns while my data remains together.
bm_id[c('species', 'id')] <- tstrsplit(bm_id$bm_id, '(?<=.{3})', perl = TRUE)
The same happens with this code:
bm_id2 <- tidyr::separate(bm_id, bm_id, into = c("species", "id"), sep = 3)
How about substr?
df <- data.frame(vec= c("popCL20TE", "agrST20"))
df$first3 <- substr(df$vec, 1, 3)
df$last <- substr(df$vec, 4, nchar(df$vec))
df
vec first3 last
1 popCL20TE pop CL20TE
2 agrST20 agr ST20

How to plot based on a wildcard

I have data that looks like this:
A 2 3 LOGIC:A
B 3 3 LOGIC:B
C 2 2 COMBO:A
plot(Data$V2[Data$V4 == "LOGIC:A"], Data$V3[Data$V4 == "LOGIC:A"])
However, I want to plot every row whose column 4 starts with LOGIC: when I provide "LOGIC" in the plot command, it should plot both "LOGIC:A" and "LOGIC:B". Right now it only accepts the exact column 4 value. Can I use wildcards?
You can use grepl to find occurrences of your string.
x <- c("LOGIC: A", "COMBO: B")
x[grepl("LOGIC", x)]
[1] "LOGIC: A"
Using Data, shown reproducibly in the Note at the end, this will plot those rows for which V4 contains the substring LOGIC, using the character after the colon to represent each point. If you want all points to be represented by the same character, omit the pch argument from plot.
plot(V3 ~ V2, Data, subset = grep("LOGIC", V4), pch = sub("LOGIC:", "", V4))
Note
Lines <- "A 2 3 LOGIC:A
B 3 3 LOGIC:B
C 2 2 COMBO:A"
Data <- read.table(text = Lines, as.is = TRUE, strip.white = TRUE)

Comparing two columns in a dataframe using R or Excel

I have a csv file containing two columns, "Taxon" in column A and "Tip" in column C. I would like to compare column A against column C, and if the string matches another string in column C I'd like it to print "y" or something similar in column B next to the string in column A, if not I would like to print "n" or equivalent. Here is the beginning of my data:
Taxon                                   B  Tip
Nitrosotalea devanaterra                   Methanothermobacter thermautotrophicus
Nitrososphaera gargensis                   Methanobacterium beijingense
Nitrososphaera sca5445                     Methanobacterium bryantii
Nitrososphaera sca2170                     Methanosarcina mazei
Methanobacterium beijingense               Persephonella marina
Methanobacterium bryantii                  Sulfurihydrogenibium azorense
Methanothermobacter thermautotrophicus     Balnearium lithotrophicum
Methanosarcina mazei                       Isosphaera pallida
Koribacter versatilis                      Methanobacterium beijingense
Acidicapsa borealis                        Parachlamydia acanthamoebae
Acidobacterium capsulatum                  Leptospira biflexa
This is only a small part of the data, but the idea is that "n" would be printed in column B for all of the bacteria apart from "Methanobacterium beijingense" and "Methanobacterium bryantii", which are also found in the "Tip" column, and so "y" would be posted there. These could also just be "1" and "0".
I know dplyr has some good functions for filtering and joining data, however I can't find anything that exactly matches my needs. If there is an alternative method of using Excel to do this that's fine too.
Thanks.
For Excel, use the following formula in B2:
=if(isnumber(match(a2, c:c, 0)), "y", "n")
Fill down or double-click the 'drag button'.
A method using R and dplyr:
# create example data
x = read.table(header = TRUE, stringsAsFactors = FALSE, text =
"Taxon B Tip
Nitrosotalea_devanaterra 1 Methanothermobacter_thermautotrophicus
Nitrososphaera_gargensis 1 Methanobacterium_beijingense
Nitrososphaera_sca5445 1 Methanobacterium_bryantii
Nitrososphaera_sca2170 1 Methanosarcina_mazei
Methanobacterium_beijingense 1 Persephonella_marina
Methanobacterium_bryantii 1 Sulfurihydrogenibium_azorense
Methanothermobacter_thermautotrophicus 1 Balnearium_lithotrophicum
Methanosarcina_mazei 1 Isosphaera_pallida
Koribacter_versatilis 1 Methanobacterium_beijingense
Acidicapsa_borealis 1 Parachlamydia_acanthamoebae
Acidobacterium_capsulatum 1 Leptospira_biflexa")
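# Data management part
library(dplyr)
x1 = data.frame(A = x$Taxon, B = x$B)
x2 = data.frame(A = x$Tip, B = x$B)
# Rows of x1 (Taxon) that have no match in x2 (Tip)
unmatched = anti_join(x1, x2, by = c("A", "B"))
# Unmatched taxa get 0; taxa that also appear in Tip keep 1
x$B[x$Taxon %in% unmatched$A] = 0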

How to select only a subset of dataframe columns in julia

I have a Dataframe of several columns say column1, column2...column100. How do I select only a subset of the columns eg (not column1) should return all columns column2...column100.
data[[colnames(data) .!= "column1"]])
doesn't seem to work.
I don't want to mutate the dataframe. I just want to select all the columns that don't have a particular column name like in my example
EDIT 2/7/2021: as people still seem to find this on Google, I'll note at the top that current DataFrames (1.0+) supports both Not() selection (provided by InvertedIndices.jl) and strings as column names, including regex selection with the r"" string macro. Examples:
julia> df = DataFrame(a1 = rand(2), a2 = rand(2), x1 = rand(2), x2 = rand(2), y = rand(["a", "b"], 2))
2×5 DataFrame
Row │ a1 a2 x1 x2 y
│ Float64 Float64 Float64 Float64 String
─────┼────────────────────────────────────────────────
1 │ 0.784704 0.963761 0.124937 0.37532 a
2 │ 0.814647 0.986194 0.236149 0.468216 a
julia> df[!, r"2"]
2×2 DataFrame
Row │ a2 x2
│ Float64 Float64
─────┼────────────────────
1 │ 0.963761 0.37532
2 │ 0.986194 0.468216
julia> df[!, Not(r"2")]
2×3 DataFrame
Row │ a1 x1 y
│ Float64 Float64 String
─────┼────────────────────────────
1 │ 0.784704 0.124937 a
2 │ 0.814647 0.236149 a
Finally, the names function has a method which takes a type as its second argument, which is handy for subsetting DataFrames by the element type of each column:
julia> df[!, names(df, String)]
2×1 DataFrame
Row │ y
│ String
─────┼────────
1 │ a
2 │ a
In addition to indexing with square brackets, there's also the select function (and its mutating equivalent select!), which basically takes the same input as the column index in []-indexing as its second argument:
julia> select(df, Not(r"a"))
2×3 DataFrame
Row │ x1 x2 y
│ Float64 Float64 String
─────┼────────────────────────────
1 │ 0.124937 0.37532 a
2 │ 0.236149 0.468216 a
Original answer below
As @Reza Afzalan said, what you're trying to do returns an array of strings, while column names in DataFrames are symbols.
Given that Julia doesn't have conditional list comprehension, the nicest thing you could do I guess would be
data[:, filter(x -> x != :column1, names(data))]
This will give you the data set with column1 removed (without mutating it). You could extend this to checking against lists of names as well:
data[:, filter(x -> !(x in [:column1,:column2]), names(data))]
UPDATE: As Ian says below, for this use case the Not syntax is now the best way to go.
More generally, conditional list comprehensions are also available by now, so you could do:
data[:, [x for x in names(data) if x != :column1]]
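In current DataFrames releases, names(data) returns strings rather than symbols, so a comparison against :column1 never excludes anything. A minimal sketch of the same comprehension under that assumption, with made-up data:

```julia
using DataFrames

# Made-up data with the column names used in the question.
data = DataFrame(column1 = [1, 2], column2 = [3, 4], column3 = [5, 6])

# names(data) returns a Vector{String}, so compare against a string.
kept = [x for x in names(data) if x != "column1"]
result = data[:, kept]

# propertynames(data) returns Symbols, if a Symbol comparison is preferred.
kept_syms = [x for x in propertynames(data) if x != :column1]
```

Both variants build a new DataFrame through data[:, ...], so data itself is left unchanged.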
As of DataFrames 0.19, it seems that you can now do
select(data, Not(:column1))
to select all but the column column1. To select all except for multiple columns, use an array in the inverted index:
select(data, Not([:column1, :column2]))
To select several columns by name:
df[:, [:col1, :col2]]
or, for other versions of the DataFrames library, I use:
select(df, [:col1, :col2])
colnames(data) .!= "column1" # => returns an array of bool
I think the right way is to use a filter function that returns desired column names
filter(x->x != "column1", colnames(data)) # => returns an array of string
DataFrame column names are of Symbol datatype
map(symbol ,str_array_of_filterd_column_names) # => returns array of identical symbols
One way is selecting a range of columns using the index
idx = length(data)
data[2:idx]
Other ways to do conditional selection are in the DataFrames docs
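The snippets in this answer use an older DataFrames API. As a sketch of the positional selection with the two-argument indexing that current releases expect (the data is made up for illustration):

```julia
using DataFrames

# Made-up data with the column names used in the question.
data = DataFrame(column1 = [1, 2], column2 = [3, 4], column3 = [5, 6])

# ncol(data) is the number of columns; rows and columns are both indexed explicitly.
by_count = data[:, 2:ncol(data)]

# `end` in the column position refers to the last column.
by_end = data[:, 2:end]
```

Selecting by position only works when the unwanted column is known to come first; selecting by name, as in the other answers, does not depend on column order.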
