Is it possible to use a fluent-bit record's timestamp? - fluent-bit

I'm trying to create a fluent-bit config which copies a record's timestamp into a custom key using a filter. Something like:
[INPUT]
    Name tail
    Path /some/path
    ...
[FILTER]
    Name record_modifier
    Match *
    Record fluentbit_orig_ts SOME_MAGIC_WAY_TO_GET_UNIXTIME
[OUTPUT]
    Name stdout
    Match *
The rationale for this is that I'm using several parsers, each with its own time format (Time_Format, as used in the regular expression parser; even Time_Keep wouldn't help, because the time is specified differently by different services, each with a different Time_Format). I'd like the records reaching an [OUTPUT] to share the same key describing the timestamp. In this example that key would be fluentbit_orig_ts.
Is this possible?

I got an answer from the fluent-bit slack channel.
It seems this is possible using Lua filters. Specifically, this example seems relevant: https://github.com/fluent/fluent-bit/blob/master/scripts/append_tag.lua

I had the same issue.
Following #BugoK's answer above, I've solved it.
This is the Lua script:
function append_tag(tag, timestamp, record)
    -- note: os.date() formats the current wall-clock time,
    -- not the record's own timestamp
    local new_record = record
    new_record["log_time"] = os.date("%Y-%m-%d %H:%M:%S")
    return 1, timestamp, new_record
end
And this is the td-agent-bit config:
[FILTER]
    Name lua
    Match nginx.access
    script override_time.lua
    call append_tag
Then restart td-agent-bit. It works!
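Note that os.date() in the script above stamps records with the time the filter runs, not the record's original timestamp. The timestamp argument that fluent-bit passes to a Lua callback is the record's own timestamp as unix time, so the key from the original question can be filled directly from it. A minimal sketch (the function name append_orig_ts is just illustrative; the key name is the one from the question):
function append_orig_ts(tag, timestamp, record)
    -- 'timestamp' is the record's timestamp in unix time
    -- (seconds, possibly with a fractional part)
    record["fluentbit_orig_ts"] = timestamp
    return 1, timestamp, record
end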

Related

Fluentbit rewrite_tag not working with JSON Array

We are using the fluent-bit tail plugin to read from a file and send the records to an HTTP endpoint.
The sample log looks like the following.
tenant 1 testing 100
The configuration for input looks like the following.
[INPUT]
    Name tail
    Path /var/log/input/**/*.log
    Tag tenant
    Path_Key filename
We then use a lua filter to add a key based on the filepath. This works as expected.
[FILTER]
    Name lua
    Match *
    script /etc/td-agent-bit/test.lua
    call extract_id
At this point, we try to filter the message and rewrite the tag based on the tenantid.
[FILTER]
    Name rewrite_tag
    Match *
    Rule $tenantid ^([a-z]+)-([0-9]+)$ from.$tenantid false
    Emitter_Name re_emitted
With a stdout output like the one below,
[OUTPUT]
    Name stdout
    Match *
we verified the message to be like the following.
tenant: [1630073320.394812583, {"log"=>"tenant 1 testing 100", "tenantid"=>"tenant1", "filename"=>"/var/log/input/tenant1/file1.log"}]
It looks like the rewrite_tag plugin is not changing the tag as expected. Is there a problem with the regex pattern? Any help on this will be hugely appreciated.
I believe your regex needs a very slight tweak to remove the dash in the match pattern. At the moment, it's looking for abc-123, but your tenant id format is abc123.
By removing the dash from your regex, the example tenantid field should match:
^([a-z]+)-([0-9]+)$ will match "tenant-1"
^([a-z]+)([0-9]+)$ will match "tenant1"
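Applied to the filter from the question, the corrected rule would look like this:
[FILTER]
    Name rewrite_tag
    Match *
    Rule $tenantid ^([a-z]+)([0-9]+)$ from.$tenantid false
    Emitter_Name re_emitted
With that change, a record whose tenantid is tenant1 should be re-emitted under the tag from.tenant1.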

Solr supports something like NOW() for /update operation?

Is there any Solr support for filling a datetime field with the current Solr server time, like the NOW() function found in common relational databases (e.g. PostgreSQL, MySQL)?
For example, consider NOW to be some reserved word representing the current Solr server UTC time. If I send the following document to the /update endpoint:
{
"available": false,
"oid": "c5f2788641cb33aea2bb2969a05aede6",
"last_update": NOW
}
The expected value of last_update would be the current server time.
If there is nothing ready-made, is there some way to extend the DateField class or a parser to define it?
Regarding the question about a default value applied in the schema: I am looking for something dynamic, determined when the document is POSTed to /update, not a default value set up in the schema.
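For what it's worth, one documented avenue is Solr's TimestampUpdateProcessorFactory, which fills a field with the current server time during update processing rather than via a schema default. A sketch of a solrconfig.xml chain, assuming the field name from the example above:
<updateRequestProcessorChain name="add-timestamp">
  <!-- fills last_update with the server time if the document omits it -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_update</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Documents would then be posted with update.chain=add-timestamp, or the chain made the default for the /update handler.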

Compare strings ignoring case

I am trying to find entities (Tags) in the database via their Name property, where it is important to ignore case (when database is searched for and the database contains a Tag named Database, that one should be returned). Right now, my code looks like this:
public IEnumerable<db.Tag> FindByNames(IEnumerable<string> tagNames)
{
    return this.DatabaseContext.Tags
        .Where(tag => tagNames.Contains(tag.Name));
}
Obviously, this one is case sensitive. I have tried to provide StringComparer.OrdinalIgnoreCase as a comparer to the Contains method, but got a warning during execution that the expression could not be translated and would be evaluated in code rather than in the database (I don't remember the exact message; I will edit as soon as I am back on my development machine). I can live with that if I have to, but it would be nice to know how to let the database do the work in this case. Is it possible?
No change should be necessary. SQLite's "LIKE" is already case-insensitive.
The default behavior of the LIKE operator is to ignore case for ASCII characters.
(cf. https://www.sqlite.org/pragma.html#pragma_case_sensitive_like and "Case sensitive and insensitive like in SQLite")
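Incidentally, that default can be flipped per connection; the pragma from the linked page is a one-liner:
PRAGMA case_sensitive_like = ON;
-- after this, 'a' LIKE 'A' evaluates to 0 on this connection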
Of course, you can always use .FromSql() to get the exact query you want. Example (note the wildcard pattern has to be passed in as the parameter value; a placeholder inside a quoted SQL literal is not parameterized):
context.Tags.FromSql("SELECT * FROM tags WHERE name LIKE {0}", "%" + tagName + "%")
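If you would rather stay in LINQ, a sketch that should translate to the database on most providers (EF Core turns ToLower() into SQL LOWER() and Contains on a local list into IN; the names are the ones from the question):
public IEnumerable<db.Tag> FindByNames(IEnumerable<string> tagNames)
{
    // normalize the search terms once, in memory
    var lowered = tagNames.Select(n => n.ToLower()).ToList();
    // compare against the lower-cased column; translated to
    // WHERE LOWER(name) IN (...) on the server
    return this.DatabaseContext.Tags
        .Where(tag => lowered.Contains(tag.Name.ToLower()));
}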

how to get query parameter in lua or nginx?

I am trying to implement this Lua/nginx-based URL shortener:
https://gist.github.com/MendelGusmao/2356310
The only change I want to implement is that when some query-string parameter comes with the shortened URL, I need to take that parameter and insert it into the long URL.
e.g.
http://google.com?test=2 is shortened to http://abc.in/abc;
when hitting http://abc.in/abc?test=3, I should get redirected to http://google.com?test=3.
For that I need to take the query-string parameters from $request_uri. Can anyone help with some code?
You should be able to use ngx.var.arg_name where name is the name of the query parameter you want to access. See Variables with Infinite Names section in this tutorial for details on query parameter handling; you may also check my blog post for Lua nginx/openresty examples.
As an alternative, you can use ngx.req.get_uri_args() to retrieve all query parameters as one table. See this section in the same tutorial for the brief comparison between these methods.
You can also use ngx.var.query_string to access the raw query string and unescape and parse it yourself.
You can obtain the query parameter with just nginx by using $arg_test, where test is the name of the query parameter in this example.
This is documented at http://nginx.org/en/docs/http/ngx_http_core_module.html#var_arg_.
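Putting the pieces together, a minimal OpenResty sketch (the location and the redirect target are just placeholders echoing the question, not the gist's actual lookup code):
location /abc {
    content_by_lua_block {
        -- single parameter: nil when absent
        local test = ngx.var.arg_test

        -- or: all query parameters as one Lua table
        local args = ngx.req.get_uri_args()

        if test then
            -- append the parameter to the long URL and redirect (302)
            return ngx.redirect("http://google.com/?test=" .. test)
        end
        ngx.redirect("http://google.com/")
    }
}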

Transforming the Default URI when using MLCP

I have a delimited file as the input source to ingest data into MarkLogic using content pump (mlcp) on Unix. There is no column in the file that is unique throughout, so none can serve as the URI. The problem is that since duplicate URIs are not possible, records that repeat a URI value are skipped/overwritten.
The options available are:
-delimited_uri_id my_column_name
-output_uri_prefix my_prefix_string
-output_uri_suffix my_suffix_string
-output_uri_replace pattern,'string'
The command for mlcp is:
bin/mlcp.sh import -host localhost -port 8042 -username name -password password -input_file_path hdfs://path/to/file -delimiter '|' -delimited_uri_id column_name -input_file_type delimited_text -mode distributed
The problem here is that if I modify the above command to include:
-output_uri_prefix $(date +%s%N)
it takes the time (in nanoseconds) at which the command was executed and prefixes it to every URI. But that doesn't solve my problem, since the same value is repeated for all records. The same would happen with the other options available. What could be done to construct a unique URI for every record ingested?
One way or another it is up to you to provide unique ids. For a delimited file the easiest answer might be to add a new column and populate it with a unique id, generated however you like.
Or you could use the DelimitedDataLoader from http://marklogic.github.io/recordloader/ with the special option ID_NAME=#AUTO. But keep in mind that ID_NAME=#AUTO will single-thread ingestion.
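If you go the add-a-column route, a throwaway preprocessing sketch (uuidgen, the uid column name, and the file names are assumptions, not part of mlcp):
# prepend a unique-id column so -delimited_uri_id has something unique;
# assumes a header row and '|' as the delimiter
awk 'BEGIN { FS = OFS = "|" }
     NR == 1 { print "uid", $0; next }                # extend the header
     { cmd = "uuidgen"; cmd | getline id; close(cmd)  # one uuid per record
       print id, $0 }' input.psv > input_with_ids.psv
Then point mlcp at the new file with -delimited_uri_id uid.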
