Kibana: how to build a visualisation from a mathematical expression?

So I have 3 searches.
I'm interested in 3 lines of log (each line is a document, msg is a field)
S1 : msg = Sending to ELK
S2 : msg = ELK failure - rejected
S3 : msg = ELK failure due to us
Search 1 counts attempts; searches 2 and 3 count failures. I need a graph that displays this:
(CountS1-(CountS2+CountS3))/(CountS1/100) on the Y axis and the date of the log on the X axis.
I know how to use the date of the logs on the X axis, but for the Y axis I can only do things such as count, average, sum, etc. of a single search.
Any ideas?
Thanks.

Yes, the best solution is to go to Scripted fields and create the field that you need.
For example, you can do it this way:
(doc['CountS1'].value - (doc['CountS2'].value + doc['CountS3'].value)) / (doc['CountS1'].value / 100)
This gives you a new field that you can use just by referencing the name you gave it. For example, if you name the field Example1, it will then appear among the fields available in your visualization options.
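As a quick sanity check of the arithmetic, the expression from the question reduces to a success percentage. A minimal Python sketch, where `success_percentage` is a hypothetical helper name and the counts are made up for illustration:

```python
def success_percentage(count_s1: int, count_s2: int, count_s3: int) -> float:
    """Percentage of attempts (S1) that did not end in a failure (S2 or S3).

    Mirrors (CountS1 - (CountS2 + CountS3)) / (CountS1 / 100) from the question.
    """
    if count_s1 == 0:
        # No attempts in this time bucket, so the ratio is undefined.
        raise ValueError("CountS1 is zero; percentage is undefined")
    return (count_s1 - (count_s2 + count_s3)) / (count_s1 / 100)


# Illustrative counts only: 200 attempts, 10 rejected, 5 failed on our side.
print(success_percentage(200, 10, 5))
```

Note the zero-attempt case: any bucket on the X axis with no "Sending to ELK" lines would otherwise divide by zero.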

CreateML Recommender Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1

I'm using CreateML to generate a Recommender model using an implicit dataset of the format: User ID, Item ID. The data is loaded into CreateML as a CSV with about 400k rows.
When attempting to 'Train' the model, I receive the following error:
Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1
My dataset is in the following format:
"user_id","item_id"
"e7ca1b039bca4f81a33b21acc202df24","f7267c60-6185-11ea-b8dd-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","e643af62-6185-11ea-9d27-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","f2fd13ce-6185-11ea-b210-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","e95864ae-6185-11ea-a254-0657986dc989"
"31042cbfd30c42feb693569c7a2d3f0a","e513a2dc-6185-11ea-9b4c-0657986dc989"
"39e95dbb21854534958d53a0df33cbf2","f27f62c6-6185-11ea-b14c-0657986dc989"
"5c26ca2918264a6bbcffc37de5079f6f","ec080d6c-6185-11ea-a6ca-0657986dc989"
I've tried modifying both Item ID and User ID to enumerated IDs, but I still receive the training error. Example:
"item_ids","user_ids"
0,0
1,0
2,0
2,0
0,225
400,225
409,225
0,282
0,4
8,4
8,4
I receive this error both within the CreateML UI and when using CreateML within a Swift playground. I've also tried removing duplicates and verified that the maximum ID for each column is (num_items - 1).
I've searched for documentation on what the exact requirement is for the set of IDs with no luck.
Thank you in advance for any help clarifying this error message.
I was able to discuss this issue with Apple's CoreML developers during WWDC2020. They described this as a known bug which will be fixed with the upcoming OS (Big Sur). The work-around for this bug is:
In the CSV dataset, create records for a single user which interacts with ALL items, and create records for a single item interacted with by ALL users.
Using pandas in Python, I essentially implemented the following:
import csv
import pandas as pd
# ratings_df is the original DataFrame with 'user_id' and 'item_id' columns
# Find the unique item ids
item_ids = ratings_df.item_id.unique()
# Find the unique user ids
user_ids = ratings_df.user_id.unique()
# Create a 'dummy user' which interacts with all items
mock_user_interactions_df = pd.DataFrame({'item_id': item_ids, 'user_id': 'mock-user'})
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
ratings_with_mocks_df = pd.concat([ratings_df, mock_user_interactions_df])
# Create a 'dummy item' which is interacted with by all users
mock_item_interactions_df = pd.DataFrame({'item_id': 'mock-item', 'user_id': user_ids})
ratings_with_mocks_df = pd.concat([ratings_with_mocks_df, mock_item_interactions_df])
# Export the CSV
ratings_with_mocks_df.to_csv('data/ratings-w-mocks.csv', quoting=csv.QUOTE_NONNUMERIC, index=True)
Using this CSV, I successfully generated a CoreML model using CreateML.
Try adding an unnamed first column to your CSV data which counts rows from 0 ... number of items - 1,
like:
"","userID","itemID","rating"
0,"a","x",1
1,"a","y",0
...
I think it started working for me today after I added this column. I use UUIDs for userID and itemID in my training data. Also be sure to sort the rows by itemID so that all rows for one itemID are next to each other.
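A minimal sketch of that suggestion using only the standard library, assuming the rows are already loaded as dictionaries. `add_row_index` is a hypothetical helper name, and whether this layout satisfies CreateML is only what the answer above reports, not something verified here:

```python
import csv
import io


def add_row_index(rows, item_key="itemID"):
    """Sort rows by item ID and prepend an unnamed 0..n-1 row counter column."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Keep all rows for the same item next to each other.
    ordered = sorted(rows, key=lambda row: row[item_key])
    fieldnames = [""] + list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave numbers bare
        lineterminator="\n",
    )
    writer.writeheader()
    for index, row in enumerate(ordered):
        writer.writerow({"": index, **row})
    return buffer.getvalue()


# Illustrative rows only.
sample_rows = [
    {"userID": "a", "itemID": "y", "rating": 0},
    {"userID": "a", "itemID": "x", "rating": 1},
]
print(add_row_index(sample_rows))
```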

Multiple altair charts generated by the same cell

I have a list of pandas dataframes I named entries, which I want to visualize after running code from the same cell. Below is the code I used :
alt.data_transformers.disable_max_rows()
for entry in entries:
    entry['ds'] = entry.index
    entry['y'] = entry['count']
    entry['floor'] = 0
    serie = alt.Chart(entry).mark_line(size=2, opacity=0.7, color='Black').encode(
        x=alt.X('ds:T', title='date'),
        y='y'
    ).interactive().properties(
        title='Evolution of ' + entry.event.iloc[0] + ' events over time'
    )
    alt.layer(serie)\
        .properties(width=870, height=450)\
        .configure_title(fontSize=20)
When I run the same code outside the 'for' loop, I see the one chart that corresponds to one dataframe, but when I run the code above, I don't get any charts at all.
Does anyone know why it's not working or how to solve this issue?
TLDR: use chart.display()
Unless a chart appears at the end of the cell, you must manually display it.
By analogy, if you run
x + 1
by itself, Python will display the result. However, if you run
for x in range(10):
    x + 1
Python will not display anything, because the last statement in the cell (in this case the for loop) has no return value to display. Instead you have to write
for x in range(10):
    print(x + 1)
For altair, the mechanism is similar: if the chart is defined in the last statement in the cell, it will be automatically displayed. Otherwise, you have to manually trigger the display, which you can do using the display method:
for i in range(10):
    chart = alt.Chart(...)
    chart.display()
For more information on display troubleshooting in Altair, see https://altair-viz.github.io/user_guide/troubleshooting.html
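The rule described above (only a trailing expression is displayed automatically) can be modelled with the standard `ast` module. This is a simplified sketch of the notebook's default behaviour, not its actual implementation, and `cell_auto_displays` is a hypothetical helper name:

```python
import ast


def cell_auto_displays(source: str) -> bool:
    """Return True if the last top-level statement of a cell is a bare expression.

    Simplified model: only the value of a final expression statement is
    shown automatically; a trailing for loop, assignment, etc. shows nothing.
    """
    tree = ast.parse(source)
    return bool(tree.body) and isinstance(tree.body[-1], ast.Expr)


loop_cell = "for x in range(10):\n    x + 1\n"
expr_cell = "x = 1\nx + 1\n"

print(cell_auto_displays(loop_cell))  # False: the cell ends with a for loop
print(cell_auto_displays(expr_cell))  # True: the cell ends with an expression
```

The chart-building loop in the question falls into the first case, which is why each chart needs an explicit `display()` call inside the loop.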

Is it possible to combine dimensions and metrics in calculated fields?

We have the variables:
"Unique User"
"Version" (Plus or Light, in a ratio of 79:21 across all Unique Users)
"total Events"
"Eventkatagories".
And following scenario:
We can't get exact data on how many users are Plus or Light users.
But we know how many events are triggered per version (Plus/Light).
Now we want to know the relative frequency of triggered events, grouped by version and event category.
So in a pivot table the row dimension is Version and the column dimension is event category.
The metric should be the relative frequency.
The simple custom calculated field would be "total events / users". But remember, we can't get the absolute number of users by version; we only know the ratio (roughly 80:20).
So I built another calculated field called UsersbyVersion with the following statement:
CASE
WHEN (Version = "light") THEN SUM(User) * 0.21
WHEN (Version = "Plus") THEN SUM(User) * 0.79
END
But this formula gives the following error:
Invalid formula - Invalid input expression. - Failed to parse CASE statement
If I use absolute numbers in the statement, it works.
Example:
CASE
WHEN (Version = "Normal") THEN 5000
WHEN (Version = "Plus") THEN 25000
END
But we need the statement to be "User * ratio". The ratio won't change much, but the user count depends on the date range we set in the Data Studio report.
So I guess the problem is that the statement won't work with a combination of metrics and dimensions.
I already tried putting "User * 0.79" and "User * 0.21" in custom metrics, but this doesn't work either.
Is there a way to combine dimensions and metrics in a calculated field used as a metric?
Thanks for your help.
Create 2 metrics:
users * 0.2 (let's call this UsersP2)
users * 0.8 (let's call this UsersP8)
Now this should work:
CASE
WHEN (Version = "light") THEN UsersP2
WHEN (Version = "Plus") THEN UsersP8
END
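To make the intended calculation concrete, here is a rough Python sketch of "events per estimated user" by version. It is not Data Studio syntax; the names `VERSION_SHARE`, `estimated_users` and `events_per_user` are hypothetical, the shares use the 79:21 ratio from the question, and the numbers in the usage line are made up:

```python
# Assumed share of all unique users per version (79:21 ratio from the question).
VERSION_SHARE = {"Plus": 0.79, "light": 0.21}


def estimated_users(version: str, total_users: float) -> float:
    """Estimate users of one version from the overall user count and its share."""
    return total_users * VERSION_SHARE[version]


def events_per_user(version: str, events: float, total_users: float) -> float:
    """Relative frequency: events of this version per estimated user of it."""
    return events / estimated_users(version, total_users)


# Illustrative numbers only: 1000 unique users, 1580 events from Plus users.
print(events_per_user("Plus", 1580, 1000))
```

Each version's share is applied to the aggregated user count first, and only then is the version used to pick which estimate applies, which is the same split the two helper metrics plus the CASE statement perform.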

getPost() does not retrieve the reactions component, and setting "reactions" and "likes" to the same logical value returns neither an error nor a warning message

[Win 10; R 3.4.3; RStudio 1.1.383; Rfacebook 0.6.15]
Hi!
I would like to ask two questions concerning the Rfacebook's getPost function:
1. Even though I have tried all possible combinations of the logical values for the arguments "comments", "reactions" and "likes", the best result I could get so far was a list of 3 components for each post ("post", "comments", and "likes"), that is, without the "reactions" component. Nevertheless, according to the rdocumentation page for getPost, "getPost returns a list with up to four components: post, likes, comments, and reactions".
2. Besides the (somewhat strange) fact that, according to the same documentation, the argument "reactions" should be FALSE (the default) in order to retrieve info on the total reactions to the post(s), I noticed a seemingly odd result: if I set "reactions" and "likes" to the same value, either both TRUE or both FALSE, R returns neither an error nor a warning message. I find this odd because the function's own definition has likes = !reactions.
Here is the code:
#packageVersion("Rfacebook")
#[1] '0.6.15'
## temporary access token
fb_oauth <- "user access token"
qtd <- 5000
#pag_loop$id[1]
#[1] "242862559586_10156144461009587"
# arguments with default value (reactions = F, likes = T, comments = T)
x <- getPost(pag_loop$id[1], token = fb_oauth, n = qtd)
str(x)
# retrieves a list of 3: posts, likes, comments
Can someone please explain to me why I don't get the reactions component?
Best,
Luana
This is caused by the new version of the Facebook API. It worked fine up to v2.10 of the API; from v2.11 onward, it no longer works well.
I also cannot capture the reactions, and the user's name is null. I have Win 10 and R 3.4.2. Could it be the R version? If you manage to resolve this issue, please send the solution to my email.

Encryption or Hashing of Date Value

I have an old program that has been discontinued which communicates with an SQL database. When I enter certain information in the defunct software, it is encrypted, encoded, or hashed before being entered into the database.
I am creating another application to interact with the same data, and I need to figure out how the end result is being produced.
Here's an example:
I enter 6/18/2017, I get y/7w/iXIE
I enter 6/18/2099, I get y/7w/iXBM
I enter 6/12/2017, I get y/7c/iXIE
I enter 12/11/2018, I get SN/u0/ZmWk
The last one throws me for a loop... what method is being used and how can I replicate this?
It might be format-preserving encryption or just substitution. In every case, each section delimited by / has the same number of characters in the input and in the output. With enough samples (all 12 months, all 31 days, and the years), you should be able to work out the method.
6/18/2017 -> y/7w/iXIE
6/18/2099 -> y/7w/iXBM
6/12/2017 -> y/7c/iXIE
12/11/2018 -> SN/u0/ZmWk
months: 6 -> y, 12 -> SN
days: 11 -> u0, 12 -> 7c, 18 -> 7w
years: 2017 -> iXIE, 2018 -> ZmWk, 2099 -> iXBM
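If the scheme really is a per-section substitution, the mapping above can be turned into a lookup table that grows as more samples are collected. A minimal Python sketch under that assumption (the four samples suggest the sections are independent but do not prove it); `encode_date` and the table names are hypothetical:

```python
# Observed plaintext -> ciphertext pairs per date section, from the four samples.
MONTHS = {"6": "y", "12": "SN"}
DAYS = {"11": "u0", "12": "7c", "18": "7w"}
YEARS = {"2017": "iXIE", "2018": "ZmWk", "2099": "iXBM"}


def encode_date(date: str) -> str:
    """Encode an M/D/YYYY date by substituting each section independently."""
    month, day, year = date.split("/")
    try:
        return "/".join((MONTHS[month], DAYS[day], YEARS[year]))
    except KeyError as missing:
        # A section with no collected sample yet cannot be encoded.
        raise KeyError(f"no sample collected for section {missing}") from None


# Reproduces one of the known samples.
print(encode_date("12/11/2018"))
```

Any combination the table produces that was not in the original samples should be checked against the old program before relying on it.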
