I have SecurityLog with fields like DstIP_s and want to display records matching my trojanDst table
let trojanDst = datatable (DstIP_s:string)
[ "1.1.1.1","2.2.2.2","3.3.3.3"
];
SecurityLog |
| join trojanDst on DstIP_s
I am getting query could not be parsed error ?
The query you posted has a redundant pipe (|) before the join.
From an efficiency standpoint, make sure the left side of the join is the smaller one, as suggested here: https://learn.microsoft.com/en-us/azure/kusto/query/best-practices#join-operator
This is too long for a comment. As #Yoni L pointed the problem is doubled pipe operator.
For anyone with SQL background join may be a bit counterintuitive(in reality it is kind=innerunique):
JOIN operator:
kind unspecified, kind=innerunique
Only one row from the left side is matched for each value of the on
key. The output contains a row for each match of this row with rows
from the right.
Kind=inner
There's a row in the output for every combination of matching rows
from left and right.
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ a │ 1 │ c │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
Demo
SQL style JOIN version:
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join kind=inner t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ b │ 1 │ c │
│ 1 │ a │ 1 │ c │
│ 1 │ b │ 1 │ d │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
Demo
There are many join types in KQL such as innerunique, inner, leftouter, rightouter, fullouter, anti and more. here you can find the full list
Related
I have this dataset:
text sentiment
randomstring positive
randomstring negative
randomstring netrual
random mixed
Then if I run a countmap i have:
"mixed" -> 600
"positive" -> 2000
"negative" -> 3300
"netrual" -> 780
I want to random sample from this dataset in a way that I have records of all smallest class (mixed = 600) and the same amount of each of other classes (positive=600, negative=600, neutral = 600)
I know how to do this in pandas:
df_teste = [data.loc[data.sentiment==i]\
.sample(n=int(data['sentiment']
.value_counts().nsmallest(1)[0]),random_state=SEED) for i in data.sentiment.unique()]
df_teste = pd.concat(df_teste, axis=0, ignore_index=True)
But I am having a hard time to do this in Julia.
Note: I don´t want to hardcode which of the class is the lowest one, so I am looking for a solution that infer that from the countmap or freqtable, if possible.
Why do you want a countmap or freqtable solution if you seem do want to use a data frame in the end?
This is how you would do this with DataFrames.jl (but without StatsBase.jl and FreqTables.jl as they are not needed for this):
julia> using Random
julia> using DataFrames
julia> df = DataFrame(text = [randstring() for i in 1:6680],
sentiment = shuffle!([fill("mixed", 600);
fill("positive", 2000);
fill("ngative", 3300);
fill("neutral", 780)]))
6680×2 DataFrame
Row │ text sentiment
│ String String
──────┼─────────────────────
1 │ R3W1KL5b positive
2 │ uCCpNrat ngative
3 │ fwqYTCWG ngative
⋮ │ ⋮ ⋮
6678 │ UJiNrlcw ngative
6679 │ 7aiNOQ1o neutral
6680 │ mbIOIQmQ ngative
6674 rows omitted
julia> gdf = groupby(df, :sentiment);
julia> min_len = minimum(nrow, gdf)
600
julia> df_sampled = combine(gdf) do sdf
return sdf[randperm(nrow(sdf))[1:min_len], :]
end
2400×2 DataFrame
Row │ sentiment text
│ String String
──────┼─────────────────────
1 │ positive O0QsyrJZ
2 │ positive 7Vt70PSh
3 │ positive ebFd8m4o
⋮ │ ⋮ ⋮
2398 │ neutral Kq8Wi2Vv
2399 │ neutral yygOzKuC
2400 │ neutral NemZu7R3
2394 rows omitted
julia> combine(groupby(df_sampled, :sentiment), nrow)
4×2 DataFrame
Row │ sentiment nrow
│ String Int64
─────┼──────────────────
1 │ positive 600
2 │ ngative 600
3 │ mixed 600
4 │ neutral 600
If your data is very large and you need the operation to be very fast there are more efficient ways to do it, but in most situations this should be fast enough and the solution does not require any extra packages.
I am using Julia CSV and I am trying to read data with DateTime in the form 10/17/2012 12:00:00 AM i tried
dfmt = dateformat"mm/dd/yyyy HH:MM:SS"
data =CSV.File("./Fremont_Bridge_Bicycle_Counter.csv", dateformat=dfmt) |> DataFrame
println(first(data,8))
but the thing is that I think the AM and PM makes the string not recognized as a date can someone help show how to pass this as a date
You can use the p specifier, which matches AM or PM. With that, your date format would look like this:
dfmt = dateformat"mm/dd/yyyy HH:MM:SS p"
You can see that the parsing is correct:
julia> DateTime("10/17/2012 12:00:00 AM", dfmt)
2012-10-17T00:00:00
To see all the possible format characters, check out the docstring of Dates.DateFormat, which is accessible in the REPL through ?DateFormat.
With the file Fremont_Bridge_Bicycle_Counter.csv
N1, N2, fecha
hola, 3, 10/03/2020 10:30:00
pepe, 5, 10/03/2020 11:40:50
juan, 5, 03/04/2020 20:10:12
And with the julia code:
using DataFrames, Dates, CSV
dfmt = dateformat"mm/dd/yyyy HH:MM:SS p"
data =CSV.File("./Fremont_Bridge_Bicycle_Counter.csv", dateformat=dfmt) |> DataFrame
println(first(data,8))
It gives the right result:
3×3 DataFrame
│ Row │ N1 │ N2 │ fecha │
│ │ String │ Int64 │ DateTime │
├─────┼────────┼───────┼─────────────────────┤
│ 1 │ hola │ 3 │ 2020-10-03T10:30:00 │
│ 2 │ pepe │ 5 │ 2020-10-03T11:40:50 │
│ 3 │ juan │ 5 │ 2020-03-04T20:10:12 │
I am prettry new to Julia and I am just playing around, and suddenly
the following code starts throwing errors, but it has worked in the past.
using SQLite
db = SQLite.DB("db")
data = SQLite.Query(db,"SELECT * FROM d")
throws:
ERROR: LoadError: MethodError: no method matching
SQLite.Query(::SQLite.DB, ::String)
can someone please enlighten me wha the problem is? Thank you.
I also tried with lower case: query.
Here is a short MWE of differences using SQLLite (v0.9.0 vs v1.0.0) with the current Julia version (1.3.1).
You do not have the table so you need to create it first:
using SQLite
using DataFrames
db = SQLite.DB("db")
# v0.9.0
SQLite.Query(db,"CREATE TABLE d (col1 INT, col2 varchar2(100))")
# v1.0.0
DBInterface.execute(db,"CREATE TABLE d (col1 INT, col2 varchar2(100))")
Now you can check if the table exits:
julia> SQLite.tables(db) |> DataFrame
1×1 DataFrames.DataFrame
│ Row │ name │
│ │ String⍰ │
├─────┼─────────┤
│ 1 │ d │
Let's insert some rows (note how one should sepearate data from SQL code via precompiled statements):
stmt = SQLite.Stmt(db, "INSERT INTO d (col1, col2) VALUES (?, ?)")
#v0.9.0
SQLite.execute!(stmt; values=(1, "Hello world"))
SQLite.execute!(stmt; values=(2, "Goodbye world"))
#v1.0.0
DBInterface.execute(stmt, (1, "Hello world"))
DBInterface.execute(stmt, (2, "Goodbye world"))
Now let us get the data
v0.9.0
julia> data = SQLite.Query(db,"SELECT * FROM d") |> DataFrame
3×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64⍰ │ String⍰ │
├─────┼────────┼───────────────┤
│ 1 │ 1 │ Hello world │
│ 2 │ 2 │ Goodbye world │
v1.0.0
julia> data = DBInterface.execute(db, "select * from d") |> DataFrame
3×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64⍰ │ String⍰ │
├─────┼────────┼───────────────┤
│ 1 │ 1 │ Hello world │
│ 2 │ 2 │ Goodbye world │
Having a graph like:
CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})
CREATE
(Alice)-[:CALL]->(Bob),
(Bob)-[:SMS]->(Charlie),
(Charlie)-[:SMS]->(Bob),
(Fanny)-[:SMS]->(Charlie),
(Esther)-[:SMS]->(Fanny),
(Esther)-[:CALL]->(David),
(David)-[:CALL]->(Alice),
(David)-[:SMS]->(Esther),
(Alice)-[:CALL]->(Esther),
(Alice)-[:CALL]->(Fanny),
(Fanny)-[:CALL]->(Fraudster)
When trying to query like:
MATCH (a)-->(b)
WHERE b.fraud = 1
RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)
I want to compute the fraudulence of a user which (as fraud is only either 0 or 1 is defined as the mean of all connected nodes fraud level:
MATCH ()--(f)
RETURN f.id, f.fraud, COUNT(*), COLLECT(f) AS fs
returns the correct number of friends, but is not able to access these i.e. in the collect statement is only accessing the node itself:
╒══════╤═════════╤══════════════╤══════════╤══════════════════════════════════════════════════════════════════════╕
│"f.id"│"f.fraud"│"avg(f.fraud)"│"COUNT(*)"│"fs" │
╞══════╪═════════╪══════════════╪══════════╪══════════════════════════════════════════════════════════════════════╡
│"h" │1 │1 │1 │[{"fraud":1,"id":"h"}] │
├──────┼─────────┼──────────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│"f" │0 │0 │4 │[{"fraud":0,"id":"f"},{"fraud":0,"id":"f"},{"fraud":0,"id":"f"},{"frau│
│ │ │ │ │d":0,"id":"f"}] │
....
I.e. naively calculating the average
MATCH ()--(f)
RETURN f.id, avg(f.fraud)
will only consider this single node and not the network. How can I consider the social network of a node instead (up to a defined depth, i.e. here 1) to improve the original answer of neo4j percentage of attribute for social network
edit
MATCH p = ()--()
UNWIND nodes(p) AS f
RETURN f.id, f.fraud, COUNT(*), COLLECT({id: f.id, fraud: f.fraud}) AS fs
will return only duplicates of the original node in the list and not the connected nodes:
│"f.id"│"f.fraud"│"COUNT(*)"│"fs" │
╞══════╪═════════╪══════════╪══════════════════════════════════════════════════════════════════════╡
│"h" │1 │2 │[{"id":"h","fraud":1},{"id":"h","fraud":1}] │
├──────┼─────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│"f" │0 │8 │[{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":│
│ │ │ │"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fr│
│ │ │ │aud":0},{"id":"f","fraud":0}] │
edit 2
MATCH p = (source)--(destination)
RETURN source.id, source.fraud, COUNT(*), COLLECT({id: destination.id, fraud: destination.fraud}) AS neighbors
is already pretty close - but lacking the avg function
MATCH p = (source)-[*..3]-(destination)
RETURN source.id, source.fraud, COUNT(*), avg(destination.fraud), COLLECT({id: destination.id, fraud: destination.fraud}) AS neighbors
includes the fraudulence defined as the average
This may be a stupid question, but for the life of me I can't figure out how to get Julia to read a csv file with column names that start with numbers and use them in DataFrames. How does one do this?
For example, say I have the file "test.csv" which contains the following:
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
If I just use readtable(), I get this:
julia> using DataFrames
julia> df = readtable("test.csv")
2x4 DataFrames.DataFrame
| Row | x | x1Y | x2Y | x3Y |
|-----|------|-----|-----|-----|
| 1 | "1Y" | 11 | 12 | 13 |
| 2 | "2Y" | 21 | 22 | 23 |
What gives? How can I get the column names to be what they're supposed to be, "1Y, "2Y, etc.?
The problem is that in DataFrames, column names are symbols, which aren't meant to (see comment below) start with a number.
You can see this by doing e.g. typeof(:2), which will return Int64, rather than (as you might expect) Symbol. Thus, to get your columnnames into a useable format, DataFrames will have to prefix it with a letter - typeof(:x2) will return Symbol, and is therefore a valid column name.
Unfortunately, you can't use numbers for starting names in DataFrames.
The code that does the parsing of names makes sure that this restriction stays like this.
I believe this is because of how parsing takes place in julia: :aa names a symbol, while :2aa is a value (makes more sense considering 1:2aa is a range)
You could just use rename!() after the import:
df = csv"""
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
"""
rename!(df, Dict(:x1Y =>Symbol("1Y"), :x2Y=>Symbol("2Y"), :x3Y=>Symbol("3Y") ))
2×4 DataFrames.DataFrame
│ Row │ x │ 1Y │ 2Y │ 3Y │
├─────┼──────┼────┼────┼────┤
│ 1 │ "1Y" │ 11 │ 12 │ 13 │
│ 2 │ "2Y" │ 21 │ 22 │ 23 │
Still you may experience problems later in your code, better to avoid column names starting with numbers...