Elastic - Grok pattern parses string field incorrectly - Kibana

I have my log message field parsed into separate fields via an ingest pipeline with a grok processor, but one of these fields (a string) is parsed in the format of a separate new log entry. It's better explained with a specific example.
This is my log:
{"#timestamp":"2021-08-27T10:53:04.669661+02:00","#version":1,"host":"fafca1a6b0d9","message":"Loose white designer T-Shirt,L,29,1,sylius,1","type":"sylius","channel":"app","level":"INFO","monolog_level":200}
This is my simple ingest pipeline:
[
  {
    "grok": {
      "field": "message",
      "patterns": [
        "%{DATA:product-name},%{DATA:product-variant},%{NUMBER:current-stock:float},%{NUMBER:order-quantity:float},%{USERNAME:identity},%{NUMBER:authenticated:float}"
      ]
    }
  },
  {
    "remove": {
      "field": "message"
    }
  }
]
The problem is with the product-name field. Instead of 'Loose white designer T-Shirt', the field value gets parsed like this:
{"#Timestamp":"2021-08-27t11:40:28.159124+02:00","#version":1,"host":"fafca1a6b0d9","message":"Loose white designer T-Shirt
It looks like the original log format, cut in half. What could be wrong? I tested the same message and Grok pattern in the Grok Debugger, and there the field was separated correctly.

I want to share my solution for this. I'm not sure why, but it gets parsed correctly if I put a delimiter at the beginning and end of the message and change the pattern accordingly. I used a semicolon at the beginning and the end.
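For reference, a sketch of the adjusted pipeline, assuming the log producer now wraps the message in semicolons (so the message arrives as ;Loose white designer T-Shirt,L,29,1,sylius,1;):
[
  {
    "grok": {
      "field": "message",
      "patterns": [
        ";%{DATA:product-name},%{DATA:product-variant},%{NUMBER:current-stock:float},%{NUMBER:order-quantity:float},%{USERNAME:identity},%{NUMBER:authenticated:float};"
      ]
    }
  },
  {
    "remove": {
      "field": "message"
    }
  }
]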

Related

In WordPress, can I store a JSON object as a post and retrieve it as-is through REST?

Is it possible in WP to paste a JSON object into a post, such as
[
  {
    name: 'hello',
    type: 'myself',
  },
]
and retrieve it exactly like this?
The most I've been able to do is add this in a code block, and then I get
\n<pre class=\"wp-block-code\"><code> [{\"name\":\"hello\",\"type\":\"myself\",
Obviously, if the string is obtained through a variable with content="...", every " inside has to be escaped. I'm not deleting the question, but it was a nonsense question.

Elastic Search in ASP.NET - using ampersand sign

I'm new to Elasticsearch in ASP.NET, and I have a problem which, so far, I've been unable to resolve.
From the documentation, I've seen that the & sign is not listed as a special character. Yet when I submit my search, the & sign is completely ignored. For example, if I search for procter & gamble, the & is ignored. That causes quite a few problems for me, because I have companies with names like M&S. When the & is ignored, I basically get everything that has M or S in it. If I try an exact search (M&S), I have the same problem.
My code is:
void Connect()
{
    node = new Uri(ConfigurationManager.AppSettings["Url"]);
    settings = new ConnectionSettings(node);
    settings.DefaultIndex(ConfigurationManager.AppSettings["defaultIndex"]);
    settings.ThrowExceptions(true);
    client = new ElasticClient(settings);
}

private string escapeChars(string inStr)
{
    var temp = inStr;
    temp = temp
        .Replace(@"\", @"\\")
        .Replace(@">", string.Empty)
        .Replace(@"<", string.Empty)
        .Replace(@"{", string.Empty)
        .Replace(@"}", string.Empty)
        .Replace(@"[", string.Empty)
        .Replace(@"]", string.Empty)
        .Replace(@"*", string.Empty)
        .Replace(@"?", string.Empty)
        .Replace(@":", string.Empty)
        .Replace(@"/", string.Empty);
    return temp;
}
And then inside one of my functions
Connect();
ISearchResponse<ElasticSearch_Result> search_result;
var QString = escapeChars(searchString);
search_result = client.Search<ElasticSearch_Result>(s => s
    .From(0)
    .Size(101)
    .Query(q => q
        .QueryString(b => b
            .Query(QString)
            //.Analyzer("whitespace")
            .Fields(fs => fs.Field(f => f.CompanyName))
        )
    )
    .Highlight(h => h
        .Order("score")
        .TagsSchema("styled")
        .Fields(fs => fs
            .Field(f => f.CompanyName)
        )
    )
);
I've tried including analyzers, but then found out that they change how the tokenizer splits words. I haven't been able to implement changes to the tokenizer.
I would like to have the following scenario:
Search: M&S Company Foo Bar
Tokens: M&S, Company, Foo, Bar (a bonus would be to also get M and S as tokens)
I'm using Elasticsearch 5.0.
Any help is more than welcome, including better documentation than what's found here: https://www.elastic.co/guide/en/elasticsearch/client/net-api/5.x/writing-queries.html.
By default, the analyzer applied to a text field is the standard analyzer. This analyzer applies the standard tokenizer along with the lowercase token filter. So when you index a value for that field, the standard analyzer is applied to the value and the resulting tokens are indexed against the field.
Let's understand this with an example. For the field companyName (text type), assume the value passed while indexing a document is M&S Company Foo Bar. The resulting tokens for this value after the standard analyzer is applied will be:
m
s
company
foo
bar
What you can notice is that not just whitespace but also & is used as a delimiter to split the value and generate the tokens.
When you query this field and don't specify an analyzer in the search query, by default the same analyzer is applied at search time that was used for indexing. Therefore, if you search for M&S, it gets tokenized into M and S, and the actual query searches for these two tokens instead of M&S.
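You can see this for yourself with the _analyze API; for example:
POST _analyze
{
  "analyzer": "standard",
  "text": "M&S Company Foo Bar"
}
This returns exactly the tokens listed above: m, s, company, foo and bar.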
To solve this, you need to change the analyzer for the field companyName. Instead of the standard analyzer, you can create a custom analyzer which uses the whitespace tokenizer and the lowercase filter (to make the search case-insensitive). For this you need to change the settings and mapping as below:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "companyName": {
          "type": "text",
          "analyzer": "whitespace_lowercase",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Now for the above input the tokens generated will be:
m&s
company
foo
bar
This will ensure that when searching for M&S, & is not ignored.
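To verify, you can run the same text through the custom analyzer with the _analyze API against the index that defines it (my_index is a placeholder for your index name):
POST my_index/_analyze
{
  "analyzer": "whitespace_lowercase",
  "text": "M&S Company Foo Bar"
}
This should return m&s, company, foo and bar as tokens.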

Google Cloud Vision Raw JSON Response

When trying out Google Cloud Vision with the drag-and-drop demo (Try Drag and Drop), the last tab shows the raw JSON. What parameter do we need to pass to get that data?
I'm currently doing DOCUMENT_TEXT_DETECTION but it only gives data at the level of words and not of individual characters.
Edit: I modified this vision test code and changed the feature ...
feature := &vision.Feature{
    Type: "DOCUMENT_TEXT_DETECTION",
}
and the printing to ...
body, err := json.Marshal(res)
fmt.Println(string(body))
I'm only seeing textAnnotations in the output.
The JSON file contains different things: text, locations, etc. Your concern is about getting the full text.
Here I am adding some Python code. You can get the full text by reading the JSON file; you will find your required result at data['fullTextAnnotation']['text']. You can get to the characters by breaking the file down into smaller chunks; I believe the JSON file has the individual characters in it, but I have never worked with that part.
import json
from pprint import pprint
data = json.load(open('File Path'))
pprint(data['fullTextAnnotation']['text'])
Well, if you look closely, there are various things available in that last tab containing the raw JSON.
Based on your requirements, you can fetch any of them.
From the response that you get from DOCUMENT_TEXT_DETECTION, you can fetch text_annotations, full_text_annotations, etc.
From text_annotations, you can fetch the description, the language of the entire text, individual words, numeric digits, special characters, and their respective coordinates.
From full_text_annotations, you can fetch pages, blocks of data, paragraphs, and individual characters, with their respective coordinates and confidence scores.
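To actually get down to individual characters, here is a rough Python sketch in the spirit of the snippet above, assuming the raw JSON response for a single image was saved to a file (response.json is a placeholder). fullTextAnnotation nests pages > blocks > paragraphs > words > symbols, where each symbol is a single character:
import json

# 'response.json' is a placeholder for the saved raw JSON response
with open('response.json') as f:
    data = json.load(f)

# walk the hierarchy down to the individual characters (symbols)
for page in data['fullTextAnnotation']['pages']:
    for block in page['blocks']:
        for paragraph in block['paragraphs']:
            for word in paragraph['words']:
                for symbol in word['symbols']:
                    # each symbol carries its own text and bounding box
                    print(symbol['text'], symbol.get('boundingBox'))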
Using the same Go code template you are using:
Search for "type Feature struct" on that page. You can see the following feature types and descriptions:
// Type: The feature type.
//
// Possible values:
//   "TYPE_UNSPECIFIED" - Unspecified feature type.
//   "FACE_DETECTION" - Run face detection.
//   "LANDMARK_DETECTION" - Run landmark detection.
//   "LOGO_DETECTION" - Run logo detection.
//   "LABEL_DETECTION" - Run label detection.
//   "TEXT_DETECTION" - Run text detection / optical character recognition
//   (OCR). Text detection is optimized for areas of text within a larger
//   image; if the image is a document, use `DOCUMENT_TEXT_DETECTION` instead.
//   "DOCUMENT_TEXT_DETECTION" - Run dense text document OCR. Takes precedence
//   when both `DOCUMENT_TEXT_DETECTION` and `TEXT_DETECTION` are present.
//   "SAFE_SEARCH_DETECTION" - Run Safe Search to detect potentially unsafe
//   or undesirable content.
//   "IMAGE_PROPERTIES" - Compute a set of image properties, such as the
//   image's dominant colors.
//   "CROP_HINTS" - Run crop hints.
//   "WEB_DETECTION" - Run web detection.
There is no option to directly show the JSON tab contents. The JSON tab is the combination of all the other tabs' output; users tend to ask for just one. For example, someone analyzing faces is not interested in text detection.
If you need more than one, you can obtain multiple feature outputs by combining the results of all the desired feature types. Based on that, I have added the following lines to your code:
feature2 := &vision.Feature{
    Type:       "LABEL_DETECTION",
    MaxResults: 10,
}
req2 := &vision.AnnotateImageRequest{
    Image:    img,
    Features: []*vision.Feature{feature2},
}
batch2 := &vision.BatchAnnotateImagesRequest{
    Requests: []*vision.AnnotateImageRequest{req2},
}
res2, err := svc.Images.Annotate(batch2).Do()
if err != nil {
    log.Fatal(err)
}
body2, err := json.Marshal(res2)
fmt.Println(string(body2))
I have tested it and it works. You should add a block like this for each feature you are interested in. If you intend to add many of them, I would suggest creating a function or loop to avoid repeating code, as sketched below.
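A possible sketch of that idea (untested, reusing img and svc from the code above): an AnnotateImageRequest accepts a list of features, so you can build one request carrying all of them instead of one request per feature:
featureTypes := []string{"DOCUMENT_TEXT_DETECTION", "LABEL_DETECTION", "WEB_DETECTION"}
features := make([]*vision.Feature, 0, len(featureTypes))
for _, t := range featureTypes {
    // MaxResults is optional; 10 is just an example value
    features = append(features, &vision.Feature{Type: t, MaxResults: 10})
}
req := &vision.AnnotateImageRequest{
    Image:    img,
    Features: features,
}
batch := &vision.BatchAnnotateImagesRequest{
    Requests: []*vision.AnnotateImageRequest{req},
}
res, err := svc.Images.Annotate(batch).Do()
if err != nil {
    log.Fatal(err)
}
body, err := json.Marshal(res)
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(body))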
Anyway, I suggest filling in the request here in order to obtain exactly the JSON output (which gives data at the level of words or letters) by calling the API directly instead of using a client library. I used the following request body to obtain the bounding box for the numbers of my interest:
{
  "requests": [
    {
      "features": [
        {
          "type": "",
          "maxResults": ""
        },
        {
          "type": ""
        }
      ],
      "image": {
        "source": {
          "gcsImageUri": ""
        }
      }
    }
  ]
}

Watson Conversation Dialogue, how to save user input using slot

In my Watson Conversation dialog I am trying to read user input using a slot.
My requirement is to prompt the user to enter an issue description and save it in a variable named issue_description.
But in a slot, Watson checks for an intent or entity before saving the input into a variable. In my case I have put an intent to check against, but the input is not saved into the variable after the check; I always get true as issue_description.
How can I save the issue description into a variable?
What should the intent and entity for this be?
If you want to save the user input, you can use <?input.text?> to save the input into any context variable:
"context":{
"issue_description":"<?input.text?>"
}
To capture something like a description in a slot, my recommendation is to:
1. Define an entity based on a pattern that describes what the description should look like.
2. In the pattern, you could use quotes as delimiters for the string to capture.
3. In the slot definition, have Watson look for that entity, and provide the name of a context variable the entity value is saved to.
4. Access the context variable to process the captured value.
There is a sample workspace I wrote that captures an event description using a pattern. In the dialog, I cut the quotes off the string and then send it to a function for post-processing. The eventName entity is defined as follows; the pattern in patterns is the interesting part:
{
  "entity": "eventName",
  "values": [
    {
      "type": "patterns",
      "value": "shortname",
      "created": "2018-01-31T13:28:56.245Z",
      "updated": "2018-02-07T09:08:31.651Z",
      "metadata": null,
      "patterns": [
        "[\"„“][A-Za-z0-9.:| #\\']+[\"”“]"
      ]
    }
  ]
}
To store the user input in the context variable issue_description, you can either use an intent if you are not validating the input (description), or you can use an entity whose value is based on a pattern. By doing this, you can configure the bot to recognize the condition and save the value to the context variable.

Removing of unwanted line returns/breaks in a csv export in openUI5

I'm learning OpenUI5 as part of my new job/internship and I've hit a snag in the product I'm working on. The exported CSV is correct in that everything we want is properly exported, but if the string/input of an item contains a newline character or was ended with the Enter key, it breaks the CSV export, even though the model within the table still displays correctly.
description.replace(/(\r\n|\n|\r)/gm," ");
That would remove any line returns or enters found within the string, but the data in this application is bound within this type of structure:
exportType : new sap.ui.core.util.ExportTypeCSV({
    separatorChar : "," //;
}),
models : table.getModel(),
rows : {
    path : "/interactions"
},
columns : [ {
    name : "description",
    template : {
        content : "{description}"
    }
}] // There are more listings after this, but they're not important
// ... more items here
}); // End of the bound data to export
As stated previously, my item 'description' can contain newline characters, but when I convert it to CSV in the export, it does something like this:
90000440,Information Protection Policy,Scene1_QuestionDraw01_Slide1_TrueFalse_0_0,The Information Security Officer is responsible for the review and revision of this policy.
(True or False),false,false,1,1
There isn't supposed to be an actual line break within the output CSV, but since there is a newline character within the description, one ends up in the export.
Any amount of help that leads to me solving this issue would be fantastic.
Thank you, Jordan.
The best way would be to use string delimiters, as indicated in the comment by criticalfix. This normally works by default; see the following code from the current UI5 codebase: github. It might be that you have a UI5 version that does not cover this, because this was fixed in the summer of last year (see this commit). You can see the versions which contain this commit in the commit itself (immediately above the author line).
If you cannot upgrade to a version that contains this commit, then maybe your first idea of replacing the newlines would be appropriate. You can use a formatter in conjunction with your binding to remove the newlines:
// all the stuff before
columns : [ {
    name : "description",
    template : {
        content : {
            path: "description",
            formatter: function (description) {
                return description.replace(/(\r\n|\n|\r)/gm, " ");
            }
        }
    }
}]
