Can fluentbit forward fluentbit_metrics as plain text instead of a JSON object to a port? - fluent-bit

I am trying to send fluentbit metrics to an external source for processing. My understanding from the documentation is that the fluentbit_metrics input is intended to be used with output plugins that are for specific telemetry solutions like Prometheus, OpenTelemetry, etc. However, for my purposes, I cannot actually use any of those solutions and instead have to use a different bespoke metrics solution. For this to work, I would like to just send lines of text to a port that my metrics solution is listening on.
I am trying to use the fluentbit forward output to send data to this endpoint, but my metrics solution returns an error because it is receiving a big JSON object that it can't parse. However, when I output the same fluentbit_metrics input to a file or to stdout, the contents of the file are more like what I would expect: each metric is just a line of text. If these text lines were what was being sent to my metrics endpoint, I wouldn't have any issue ingesting them.
I know that I could take on the work of changing my metrics solution to parse and process this JSON map, but before I do that, I wanted to check whether this is the only way forward for me. So, my question is: is there a way to get fluentbit to send fluentbit_metrics to a forward output without converting the metrics into a big JSON object? Is the schema for that JSON object specific to Prometheus? Is there a reason why the outputs differ so substantially?
Here is a copy of an example config I am using with fluentbit:
[SERVICE]
    # This is a commented line
    Daemon       off
    log_level    info
    log_file     C:\MyFolder\fluentlog.txt
    flush        1
    parsers_file .\parsers.conf

[INPUT]
    name            fluentbit_metrics
    tag             internal_metrics
    scrape_interval 2

[OUTPUT]
    Name            forward
    Match           internal_metrics
    Host            127.0.0.1
    Port            28232
    tag             internal_metrics
    Time_as_Integer true

[OUTPUT]
    name  stdout
    match *
And here is the output from the forward output plugin:
{
  "meta": {
    "cmetrics": {},
    "external": {},
    "processing": {
      "static_labels": []
    }
  },
  "metrics": [
    {
      "meta": {
        "ver": 2,
        "type": 0,
        "opts": {
          "ns": "fluentbit",
          "ss": "",
          "name": "uptime",
          "desc": "Number of seconds that Fluent Bit has been running."
        },
        "labels": [
          "hostname"
        ],
        "aggregation_type": 2
      },
      "values": [
        {
          "ts": 1670884364820306500,
          "value": 22,
          "labels": [
            "myHostName"
          ],
          "hash": 16603984480778988994
        }
      ]
    }, etc.
And here is the output of the same metrics from stdout:
2022-12-12T22:02:13.444100300Z fluentbit_uptime{hostname="myHostName"} = 2
2022-12-12T22:02:11.721859000Z fluentbit_input_bytes_total{name="tail.0"} = 1138
2022-12-12T22:02:11.721859000Z fluentbit_input_records_total{name="tail.0"} = 12
2022-12-12T22:02:11.444943400Z fluentbit_input_files_opened_total{name="tail.0"} = 1
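For what it's worth, the parsing work mentioned above is fairly small. Below is a rough sketch (just an illustration, not a Fluent Bit feature) of how a receiver could flatten the forwarded payload shown earlier into Prometheus-style text lines like the stdout output; the function name is a placeholder, and it assumes the payload always matches the sample structure:

# Rough sketch: flatten a cmetrics-style JSON payload (as in the sample above)
# into Prometheus-style text lines like the stdout output. Assumes the payload
# always matches the sample structure; real payloads may differ between versions.
from datetime import datetime, timezone

def metrics_to_lines(payload):
    lines = []
    for metric in payload.get("metrics", []):
        opts = metric["meta"]["opts"]
        # "fluentbit" + "uptime" -> "fluentbit_uptime" (empty subsystem is skipped)
        name = "_".join(p for p in (opts["ns"], opts["ss"], opts["name"]) if p)
        label_keys = metric["meta"].get("labels", [])
        for v in metric.get("values", []):
            labels = ",".join(
                f'{k}="{val}"' for k, val in zip(label_keys, v.get("labels", []))
            )
            # "ts" in the sample looks like nanoseconds since the epoch
            ts = datetime.fromtimestamp(v["ts"] / 1e9, tz=timezone.utc).isoformat()
            lines.append(f"{ts} {name}{{{labels}}} = {v['value']}")
    return lines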

Related

How to parse GraphSON data from Neptune as a list of dictionaries?

If you make a signed request using the code provided by AWS here: https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-python.html
Then if you do a query like this from a python script:
make_signed_request(query="g.V().limit(10).valueMap(true).toList()")
It outputs an ugly unusable thing like this:
{
  "requestId": "bf942e84-ff49-42c7-a65c-ef43f45g5h63",
  "status": {
    "message": "",
    "code": 200,
    "attributes": {
      "#type": "g:Map",
      "#value": []
    }
  },
  "result": {
    "data": {
      "#type": "g:List",
      "#value": [
        {
          "#type": "g:Map",
          "#value": [
            "names",
            {
              "#type": "g:List",
              "#value": ["David Bowie"]
            }
            ..., etc.
Whereas if I run the same query on a notebook, like this:
%%gremlin --store-to foo
g.V().limit(10).valueMap(true).toList()
Then foo is a nicely formatted list of dictionaries, like this:
[
  {'names': ['David Bowie'], 'dob': [08-01-1947]},
  {'names': ['Michael Jackson'], 'dob': [29-08-1958]},
]
How do I get the make_signed_request function to return data in the same way that the notebook does?
What you are seeing is the default HTTP response format that you can expect to see from any "Gremlin Server compatible" TinkerPop endpoint. Under the covers, the graph-notebook notebooks are using the Gremlin Python client and sending the request over a web socket. The Gremlin Python client nicely deserializes that result for you. You essentially have two options when calling the Neptune Gremlin endpoint:
1. Use a specific Gremlin client for your preferred programming language (if one exists).
2. Call the HTTP endpoint and post-process the GraphSON result. Rather than write your own, you can most likely repurpose the serializers from one of the clients.
If possible, I would use option 1.
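For option 1, a minimal sketch using the Gremlin Python client might look like the following; the endpoint is a placeholder, and an IAM-auth-enabled cluster would additionally need SigV4 request signing, which is not shown here:

# Sketch of option 1: let the Gremlin Python client deserialize GraphSON
# into plain Python types (a list of dicts), as the notebook does.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Placeholder endpoint - replace with your Neptune cluster endpoint
endpoint = "wss://your-neptune-endpoint:8182/gremlin"

connection = DriverRemoteConnection(endpoint, "g")
g = traversal().withRemote(connection)

# Returns a list of dicts, similar to what the notebook stores in foo
results = g.V().limit(10).valueMap(True).toList()
print(results)

connection.close()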

Context size limit inside a node in Watson Conversation

My Watson Conversation bots typically have a node where I load some data into context. This usually contains all possible answers, strings, various other data.
So one of my first nodes in any bot looks like this:
{
  "type": "standard",
  "title": "Load Messages",
  "output": {
    "text": {
      "values": [
        ""
      ],
      "selection_policy": "sequential"
    }
  },
  "context": {
    // A whole bunch of data here
  }
  ...
Is there a limit on how much data I can put there? Currently I have around 70 kilobytes, but potentially I can put a few megabytes there just for the convenience of running the logic inside Conversation. (Yes I am aware that this entire data will be sent back to the client, which is not very efficient)
There is no documented limit. You are more likely to hit network issues before Watson Assistant has any issues.
But storing your whole application's logic in the context object is considered an anti-pattern.
Your context object should only store what is required in Watson Assistant, and then if possible only for the related portion of the conversation.
For one time context values you can store them in the output object.
{
  "context": {
  },
  "output": {
    ...
    "one_time_var": "abc"
  }
}
This will be discarded on your next call.
If you have a large volume of data that could be used at different times, then one pattern to use is a context request object.
For example:
"context": {
  "request": "name,address,id"
}
Your next response from the application layer would send this:
"context": {
  "name": "Bob",
  "address": "123 street",
  "id": "1234"
}
Have your returning response update those variables, then clear the context variables again. If you have other context variables that need to stay, store those in an object and erase just that object.
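On the application side, a rough sketch of handling that request pattern might look like this; the lookup data and function name are placeholders:

# Sketch of the application layer for the context "request" pattern above.
# The lookup data and function name are placeholders.
AVAILABLE_DATA = {
    "name": "Bob",
    "address": "123 street",
    "id": "1234",
}

def build_next_context(watson_response):
    """Fill in only the context variables the dialog asked for, then drop the request."""
    context = watson_response.get("context", {})
    requested = context.pop("request", "")
    for key in (k.strip() for k in requested.split(",") if k.strip()):
        context[key] = AVAILABLE_DATA.get(key)
    return context  # send this back as the context on the next message call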

Using nginx to redirect dynamic request

I have a druid service which runs on my local machine at port 8082, as follows:
Method POST: http://localhost:8082/druid/v2/?pretty
Body:
{
  "queryType": "topN",
  "dataSource": "some_source",
  "intervals": ["2015-09-12/2015-09-13"],
  "granularity": "all",
  "dimension": "page",
  "metric": "edits",
  "threshold": 25,
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "pix_id",
        "value": "1234"
      }
    ]
  }
}
Hitting this query gives me a list of records based on the value of the dimension 'pix_id'.
Now, I want to set up nginx so that the external application does not have any clue about my druid service. I just want the external application to hit the URL:
http://localhost:80/pix_id/98765
This URL should dynamically generate a JSON query with the above-mentioned pix_id, send the request to druid, and return the response to the user.
Is it possible to do this in nginx?
Yes, you can do this, but I would instead suggest having a small PHP or Python script in between to serve the results.
So the setup would be:
1. Have the script receive the request.
2. Make a curl call from the script to druid, locally.
3. Get the result and pass on the response.
There are multiple benefits of doing this, e.g.:
- You completely mask druid (and this is not necessarily limited to druid).
- You can do more calculations in the script before sending the request to druid.
- You can cache at the script layer.
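For illustration, a rough sketch of that in-between script in Python (Flask) might look like this, reusing the topN query from the question; the framework choice, route and port are assumptions, not something druid or nginx requires:

# Sketch: a small Python (Flask) script that receives /pix_id/<id>,
# builds the Druid topN query from the question, and passes the response on.
import requests
from flask import Flask

app = Flask(__name__)
DRUID_URL = "http://localhost:8082/druid/v2/?pretty"  # local Druid endpoint

@app.route("/pix_id/<pix_id>")
def query_by_pix_id(pix_id):
    query = {
        "queryType": "topN",
        "dataSource": "some_source",
        "intervals": ["2015-09-12/2015-09-13"],
        "granularity": "all",
        "dimension": "page",
        "metric": "edits",
        "threshold": 25,
        "filter": {
            "type": "and",
            "fields": [
                {"type": "selector", "dimension": "pix_id", "value": pix_id}
            ],
        },
    }
    resp = requests.post(DRUID_URL, json=query)
    # Pass Druid's JSON response straight through to the caller
    return app.response_class(resp.text, status=resp.status_code,
                              mimetype="application/json")

if __name__ == "__main__":
    app.run(port=80)  # port 80 to match the URL in the question

The external application then only ever sees http://localhost:80/pix_id/98765 and never talks to druid directly.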

"Reverse formatting" Riak search results

Let's say I have an object in the test bucket in my Riak installation with the following structure:
{
  "animals": {
    "dog": "woof",
    "cat": "miaow",
    "cow": "moo"
  }
}
When performing a search request for this object, the structure of the search results is as follows:
{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "q": "animals_cow:moo",
      "q.op": "or",
      "filter": "",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": "0.353553",
    "docs": [
      {
        "id": "test",
        "index": "test",
        "fields": {
          "animals_cat": "miaow",
          "animals_cow": "moo",
          "animals_dog": "woof"
        },
        "props": {}
      }
    ]
  }
}
As you can see, the way the object is stored, the cat, cow and dog keys are nested within animals. However, when the search results come back, none of the keys are nested, and are simply separated by _.
My question is this: Is there any way provided by Riak to "reverse format" the search, and return the fields of the object in the correct (nested) format? This becomes a problem when storing and returning user data that might possibly contain _.
I do see that the latest version of Riak (beta release) provides a search schema, but I can't seem to see whether my question would be answered by this.
What you receive back in the search result is what the object looked like after passing through the json analyzer. If you need the data formatted differently, you can use a custom analyzer. However, this will only affect newly put data.
For existing data, you can use the id field and issue a GET request for the original object, or use the Solr query as input to a MapReduce job.
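For the second part (fetching the original, still-nested object by id), a rough sketch over Riak's HTTP interface might look like this; the host, port and /buckets/<bucket>/keys/<key> path assume a default Riak setup:

# Sketch: use the "id" of each search hit to GET the original nested object.
# Host, port and URL path assume a default Riak HTTP setup.
import requests

RIAK_BASE = "http://localhost:8098"

def fetch_originals(bucket, search_result):
    objects = []
    for doc in search_result["response"]["docs"]:
        key = doc["id"]
        resp = requests.get(f"{RIAK_BASE}/buckets/{bucket}/keys/{key}")
        resp.raise_for_status()
        objects.append(resp.json())  # original nested JSON, e.g. {"animals": {...}}
    return objects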

Google Cloud Datastore runQuery returning 412 "no matching index found"

** UPDATE **
Thanks to Alfred Fuller for pointing out that I need to create a manual index for this query.
Unfortunately, using the JSON API, from a .NET application, there does not appear to be an officially supported way of doing so. In fact, there does not officially appear to be a way to do this at all from an app outside of App Engine, which is strange since the Cloud Datastore API was designed to allow access to the Datastore outside of App Engine.
The closest hack I could find was to POST the index definition using RPC to http://appengine.google.com/api/datastore/index/add. Can someone give me the raw spec for how to do this exactly (i.e. URL parameters, what exactly should the body look like, etc), perhaps using Fiddler to inspect the call made by appcfg.cmd?
** ORIGINAL QUESTION **
According to the docs, "a query can combine equality (EQUAL) filters for different properties, along with one or more inequality filters on a single property".
However, this query fails:
{
  "query": {
    "kinds": [
      {
        "name": "CodeProse.Pogo.Tests.TestPerson"
      }
    ],
    "filter": {
      "compositeFilter": {
        "operator": "and",
        "filters": [
          {
            "propertyFilter": {
              "operator": "equal",
              "property": {
                "name": "DepartmentCode"
              },
              "value": {
                "integerValue": "123"
              }
            }
          },
          {
            "propertyFilter": {
              "operator": "greaterThan",
              "property": {
                "name": "HourlyRate"
              },
              "value": {
                "doubleValue": 50
              }
            }
          },
          {
            "propertyFilter": {
              "operator": "lessThan",
              "property": {
                "name": "HourlyRate"
              },
              "value": {
                "doubleValue": 100
              }
            }
          }
        ]
      }
    }
  }
}
with the following response:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "FAILED_PRECONDITION",
        "message": "no matching index found.",
        "locationType": "header",
        "location": "If-Match"
      }
    ],
    "code": 412,
    "message": "no matching index found."
  }
}
The JSON API does not yet support local index generation, but we've documented a process that you can follow to generate the xml definition of the index at https://developers.google.com/datastore/docs/tools/indexconfig#Datastore_Manual_index_configuration
Please give this a shot and let us know if it doesn't work.
This is a temporary solution that we hope to replace with automatic local index generation as soon as we can.
The error "no matching index found." indicates that an index needs to be added for the query to work. See the auto index generation documentation.
In this case you need an index with the properties DepartmentCode and HourlyRate (in that order).
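As a sketch, the corresponding index definition (using the index.yaml format covered in the next answer) might look like this; the kind name is taken from the query above:

# Hypothetical index.yaml entry for the query above
indexes:
- kind: CodeProse.Pogo.Tests.TestPerson
  properties:
  - name: DepartmentCode
  - name: HourlyRate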
For gcloud-node I fixed it with these three links:
https://github.com/GoogleCloudPlatform/gcloud-node/issues/369
https://github.com/GoogleCloudPlatform/gcloud-node/blob/master/system-test/data/index.yaml
and, most importantly, https://cloud.google.com/appengine/docs/python/config/indexconfig#Python_About_index_yaml to write your index.yaml file
As explained in the last link, an index is what allows complex queries to run faster by storing their result set. When you get "no matching index found", it means you tried to run a complex query involving order or filter and no matching index exists yet. To make your query work, you need to define the indexes that represent it in a config file and create them in the Google Datastore indexes. Here is how to fix it:
1. Create an index.yaml file in a folder in your app directory (named, for example, indexes), following the directives for the Python conf file: https://cloud.google.com/appengine/docs/python/config/indexconfig#Python_About_index_yaml, or get inspiration from the gcloud-node tests in https://github.com/GoogleCloudPlatform/gcloud-node/blob/master/system-test/data/index.yaml
2. Create the indexes from the config file with this command:
gcloud preview datastore create-indexes indexes/index.yaml
(see https://cloud.google.com/sdk/gcloud/reference/preview/datastore/create-indexes)
3. Wait for the indexes to serve on your developer console in Cloud Datastore/Indexes; the interface should display "serving" once the index is built.
4. Once it is serving, your query should work.
For example for this query:
var q = ds.createQuery('project')
  .filter('tags =', category)
  .order('-date');
index.yaml looks like:
indexes:
- kind: project
  ancestor: no
  properties:
  - name: tags
  - name: date
    direction: desc
Try not to order the result. After removing orderby(), it worked for me.
