Ingest pipeline is not working over logs obtained from an event hub with filebeat - kibana

I am sending logs to an Azure Event Hub with Serilog (using WriteTo.AzureEventHub(eventHubClient)). After that I run a filebeat process with the azure module enabled, so these logs are sent to Elasticsearch and I can explore them with Kibana.
The problem I have is that all the information goes into the field "message"; I would need to split the information of my logs into different fields to be able to run good queries.
The way I found was to create an ingest pipeline in Kibana and, through a grok processor, separate the fields inside the "message" and generate multiple fields. In filebeat.yml I set the pipeline name, but nothing happens; it seems the pipeline is not being used.
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: "filebeat-otc"
Does anybody know what I am missing? Thanks in advance.
EDIT: I will add an example of my pipeline and my data. In the simulation it works properly:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{TIME:timestamp}\\s%{LOGLEVEL}\\s{[a-zA-Z]*:%{UUID:CorrelationID},[a-zA-Z]*:%{TEXT:OperationTittle},[a-zA-Z]*:%{TEXT:OriginSystemName},[a-zA-Z]*:%{TEXT:TargetSystemName},[a-zA-Z]*:%{TEXT:OperationProcess},[a-zA-Z]*:%{TEXT:LogMessage},[a-zA-Z]*:%{TEXT:ErrorMessage}}"
          ],
          "pattern_definitions": {
            "LOGLEVEL" : "\\[[^\\]]*\\]",
            "TEXT" : "[a-zA-Z0-9- ]*"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "15:13:59 [INF] {CorrelationId:83355884-a351-4c8b-af8d-b77c48462f36,OperationTittle:Operation1,OriginSystemName:Fexa,TargetSystemName:Usina,OperationProcess:Testing Log Data,LogMessage:Esto es una buena prueba,ErrorMessage:null}"
      }
    },
    {
      "_source": {
        "message": "20:13:48 [INF] {CorrelationId:8451ee54-efca-40be-91c8-8c8e18e33f58,OperationTittle:null,OriginSystemName:Fexa,TargetSystemName:Donna,OperationProcess:Testing Log Data,LogMessage:null,ErrorMessage:null}"
      }
    }
  ]
}

It seems that when you use a module, it creates and uses its own ingest pipeline in Elasticsearch, and the pipeline option in the output is ignored.
So my solution was to modify index.final_pipeline. For this, in Kibana I went to Stack Management / Index Management, found my index there, went to Edit Settings and set "index.final_pipeline": "the-name-of-my-pipeline".
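For reference, the same setting can also be applied through the Elasticsearch index settings API; here is a minimal sketch with Python and requests, where the index name is a placeholder and the pipeline name is the one from my filebeat.yml:
import requests

# Placeholder index name; use the index your filebeat data is written to.
index = "filebeat-7.9.3-2021.01.01"
pipeline = "filebeat-otc"

# index.final_pipeline runs after the pipeline the module already requested,
# so the grok processors are applied even though the module's pipeline runs first.
resp = requests.put(
    f"http://localhost:9200/{index}/_settings",
    json={"index": {"final_pipeline": pipeline}},
)
print(resp.json())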
I hope this helps somebody.
This was thanks to leandrojmp

Related

Authorization error when trying to apply an index template to elasticsearch

So we have an Elasticsearch service running in AWS; the Elasticsearch version is 7.8.0. I need to add an index template to limit the number of shards allocated to new indices when they are created.
I followed this example of how to add an index template and got this very simple template:
PUT _index_template/shard_limitation
{
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
When running this request from inside Kibana's Dev Tools console I get the following error: {"Message":"Your request: '/_index_template/shard_limitation' is not allowed."}, as well as an Unauthorized - 401 icon. I'm running this command as the admin user.
I tested it locally (Elasticsearch running on my machine) and it all works fine. Any idea why this might happen?
SOLUTION:
As was suggested by @Ajinkya, the correct way to do this is to not include "_index" before the template API, i.e. use _template instead of _index_template. The correct way to achieve what I was trying to do is the following:
PUT _template/shard_limitation
{
  "index_patterns": ["some-pattern"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
The '_index_template' operation might not be supported in AWS Elasticsearch. You can check the supported operations for your AWS ES version in the AWS documentation.
You can still use the '_template' API to add an index template:
PUT _template/shard_limitation
{
  "index_patterns": ["test*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
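If you would rather do this from code, below is a rough Python sketch of the same legacy-template call using requests; the domain endpoint is a placeholder, and authentication (basic auth or SigV4 signing) depends on your domain's access policy, so treat it as an illustration rather than a drop-in snippet.
import requests

# Hypothetical AWS ES domain endpoint; replace with your own.
endpoint = "https://my-domain.us-east-1.es.amazonaws.com"

template = {
    "index_patterns": ["test*"],
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
}

# PUT the legacy index template; add auth (e.g. HTTPBasicAuth or SigV4 signing)
# as required by the domain's access policy.
resp = requests.put(f"{endpoint}/_template/shard_limitation", json=template)
print(resp.status_code, resp.json())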

Hadoop Streaming jar not found when submitting Google Dataproc Hadoop Job?

When trying to submit a Hadoop MapReduce job programmatically (from a Java application using the Dataproc library), the job fails immediately. When submitting the exact same job through the UI, it works fine.
I've tried SSHing into the Dataproc cluster to confirm the file exists, checking permissions, and changing the jar reference. Nothing has worked yet.
The error I'm getting:
Exception in thread "main" java.lang.ClassNotFoundException: file:///usr/lib/hadoop-mapreduce/hadoop-streaming-2.8.4.jar
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:18)
Job output is complete
When I clone the failed job in the console and look at the REST equivalent this is what I see:
POST /v1/projects/project-id/regions/us-east1/jobs:submit/
{
  "projectId": "project-id",
  "job": {
    "reference": {
      "projectId": "project-id",
      "jobId": "jobDoesNotWork"
    },
    "placement": {
      "clusterName": "cluster-name",
      "clusterUuid": "uuid"
    },
    "submittedBy": "service-account@project.iam.gserviceaccount.com",
    "jobUuid": "uuid",
    "hadoopJob": {
      "args": [
        "-Dmapred.reduce.tasks=20",
        "-Dmapred.output.compress=true",
        "-Dmapred.compress.map.output=true",
        "-Dstream.map.output.field.separator=,",
        "-Dmapred.textoutputformat.separator=,",
        "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
        "-Dmapreduce.input.fileinputformat.split.minsize=268435456",
        "-Dmapreduce.input.fileinputformat.split.maxsize=268435456",
        "-mapper",
        "/bin/cat",
        "-reducer",
        "/bin/cat",
        "-inputformat",
        "org.apache.hadoop.mapred.lib.CombineTextInputFormat",
        "-outputformat",
        "org.apache.hadoop.mapred.TextOutputFormat",
        "-input",
        "gs://input/path/",
        "-output",
        "gs://output/path/"
      ],
      "mainJarFileUri": "file:///usr/lib/hadoop-mapreduce/hadoop-streaming-2.8.4.jar"
    }
  }
}
When I submit the job through the console it works. The REST equivalent of that job:
POST /v1/projects/project-id/regions/us-east1/jobs:submit/
{
  "projectId": "project-id",
  "job": {
    "reference": {
      "projectId": "project-id",
      "jobId": "jobDoesWork"
    },
    "placement": {
      "clusterName": "cluster-name",
      "clusterUuid": ""
    },
    "submittedBy": "user_email_account@email.com",
    "jobUuid": "uuid",
    "hadoopJob": {
      "args": [
        "-Dmapred.reduce.tasks=20",
        "-Dmapred.output.compress=true",
        "-Dmapred.compress.map.output=true",
        "-Dstream.map.output.field.separator=,",
        "-Dmapred.textoutputformat.separator=,",
        "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
        "-Dmapreduce.input.fileinputformat.split.minsize=268435456",
        "-Dmapreduce.input.fileinputformat.split.maxsize=268435456",
        "-mapper",
        "/bin/cat",
        "-reducer",
        "/bin/cat",
        "-inputformat",
        "org.apache.hadoop.mapred.lib.CombineTextInputFormat",
        "-outputformat",
        "org.apache.hadoop.mapred.TextOutputFormat",
        "-input",
        "gs://input/path/",
        "-output",
        "gs://output/path/"
      ],
      "mainJarFileUri": "file:///usr/lib/hadoop-mapreduce/hadoop-streaming-2.8.4.jar"
    }
  }
}
I ssh’ed into the box and confirmed that the file is, in fact, present. The only difference I can really see is the “submittedBy”. One works, one doesn't. I'm guessing this is a permission thing, but I cannot seem to tell where the permissions are being pulled from in each scenario. In both cases, the Dataproc cluster is created with the same service account.
Looking at permissions for that jar on the cluster I see:
-rw-r--r-- 1 root root 133856 Nov 27 20:17 hadoop-streaming-2.8.4.jar
lrwxrwxrwx 1 root root 26 Nov 27 20:17 hadoop-streaming.jar -> hadoop-streaming-2.8.4.jar
I tried changing the mainJarFileUri from pointing explicitly to the versioned jar to the symlink (since it had open permissions), but didn't really expect it to work. And it didn't.
Does anyone with more Dataproc experience have any idea what's going on here, and how I can resolve it?
One common mistake that's easy to make in code is to call setMainClass when you intended to call setMainJarFileUri or vice-versa. The java.lang.ClassNotFoundException you received indicates that Dataproc was trying to submit that jarfile string as a classname rather than a jarfile, so somewhere along the way Dataproc thought you set main_class. You might want to double-check your code to see if this is the bug you encountered.
The reason using "clone job" in the GUI hides this problem is that the GUI tries to be more user-friendly by offering a single text box for setting either main_class or main_jar_file_uri, and infers whether it is a jarfile by looking at the file extension. So, if you submit a job with a jarfile URI in the main_class field and it fails, then you click clone and submit the new job, the GUI will try to be smart and recognize that the new job actually specified a jarfile name, and thus will correctly set the main_jar_file_uri field in the JSON request instead of main_class.
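The question's client code is Java, but as a minimal illustration of the same point, here is a hedged Python sketch using the google-cloud-dataproc client (request-style API; project, region and cluster names are placeholders): the streaming jar must go into main_jar_file_uri, while putting it in main_class produces exactly the ClassNotFoundException above.
from google.cloud import dataproc_v1

# Placeholders; substitute your own project, region and cluster.
project_id = "project-id"
region = "us-east1"
cluster_name = "cluster-name"

client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "hadoop_job": {
        # The jar belongs in main_jar_file_uri; setting it as main_class is the
        # mistake that produces the ClassNotFoundException.
        "main_jar_file_uri": "file:///usr/lib/hadoop-mapreduce/hadoop-streaming-2.8.4.jar",
        "args": [
            "-mapper", "/bin/cat",
            "-reducer", "/bin/cat",
            "-input", "gs://input/path/",
            "-output", "gs://output/path/",
        ],
    },
}

response = client.submit_job(
    request={"project_id": project_id, "region": region, "job": job}
)
print(response.reference.job_id)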

Using nginx to redirect dynamic request

I have a Druid service which runs on my local machine on port 8082 as follows:
Method POST: http://localhost:8082/druid/v2/?pretty
Body:
{
  "queryType" : "topN",
  "dataSource" : "some_source",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "edits",
  "threshold" : 25,
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "pix_id",
        "value": "1234"
      }
    ]
  }
}
Hitting this query gives me a list of records based on the value of the dimension 'pix_id'.
Now, I want to set up nginx so that the external application has no clue about my Druid service. I just want the external application to hit the URL:
http://localhost:80/pix_id/98765
This URL should dynamically generate a JSON body with the above-mentioned pix_id, send a request to Druid, and return the response to the user.
Is it possible to do this in nginx?
Yes, you can do this, but I would rather suggest putting a PHP or Python script in between to produce the results (a minimal sketch follows below).
So the setup would be:
Have a PHP page receive the request.
Make a curl call from PHP to Druid, locally.
Get the result and pass on the response.
There are multiple benefits of doing this, e.g.:
You completely mask Druid, and this is not necessarily limited to Druid.
You can do more calculations in PHP before sending the request to Druid.
Caching at the PHP end.
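Here is a minimal sketch of such an in-between service, written in Python with Flask rather than PHP; the Druid URL, datasource and field names are taken from the question, and the proxy's own port is an assumption (nginx, or the app itself, would sit on port 80 in front of it).
import requests
from flask import Flask

app = Flask(__name__)
DRUID_URL = "http://localhost:8082/druid/v2/?pretty"

@app.route("/pix_id/<pix_id>")
def topn_by_pix_id(pix_id):
    # Build the Druid topN query dynamically from the pix_id in the URL.
    query = {
        "queryType": "topN",
        "dataSource": "some_source",
        "intervals": ["2015-09-12/2015-09-13"],
        "granularity": "all",
        "dimension": "page",
        "metric": "edits",
        "threshold": 25,
        "filter": {
            "type": "and",
            "fields": [
                {"type": "selector", "dimension": "pix_id", "value": pix_id}
            ],
        },
    }
    # Forward the query to Druid and pass its response through untouched.
    resp = requests.post(DRUID_URL, json=query)
    return resp.text, resp.status_code, {"Content-Type": "application/json"}

if __name__ == "__main__":
    # Assumed port; put nginx (or this app) on port 80 in front of it.
    app.run(port=8080)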

Openstack API Authentication

OpenStack noob here. I have set up an Ubuntu VM with DevStack, and am trying to authenticate with Keystone to obtain a token to be used for subsequent OpenStack API calls. The identity endpoint shown on the "API Access" page in Horizon is: http://<DEVSTACK_IP>/identity.
When I post the below JSON payload to this endpoint, I get the error get_version_v3() got an unexpected keyword argument 'auth'.
{
  "auth": {
    "identity": {
      "methods": [
        "password"
      ],
      "password": {
        "user": {
          "name": "admin",
          "domain": {
            "name": "Default"
          },
          "password": "AdminPassword"
        }
      }
    }
  }
}
Based on the OpenStack docs, I should be hitting http://<DEVSTACK_IP>/v3/auth/tokens to obtain a token, but when I hit that endpoint, I get 404 Not Found.
I'm currently using Postman to test this, but will eventually be doing it programmatically.
Does anybody have any experience with authenticating against the Openstack API that can help?
Not sure whether you want to do it the Python way, but if you do, here is a way to do it:
from keystoneauth1.identity import v3
from keystoneauth1 import session

v3_auth = v3.Password(auth_url=V3_AUTH_URL,
                      username=USERNAME,
                      password=PASSWORD,
                      project_name=PROJECT_NAME,
                      project_domain_name="default",
                      user_domain_name="default")
v3_ses = session.Session(auth=v3_auth)
auth_token = v3_ses.get_token()
And your V3_AUTH_URL should be http://<DEVSTACK_IP>:5000/v3, since Keystone uses port 5000 by default.
If you have a multi-domain DevStack you can change the domains; otherwise they should be default.
Just in case you don't have the client library installed: pip install python-keystoneclient
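Once you have the session you can hand it to the individual OpenStack client libraries, or use the token yourself; here is a small sketch of the latter, where the /v3/projects call is just an illustrative example of a follow-up request.
import requests

# Reuse the session from the snippet above and grab the token.
token = v3_ses.get_token()

# Subsequent API calls carry the token in the X-Auth-Token header.
# Replace <DEVSTACK_IP> with your DevStack host.
resp = requests.get(
    "http://<DEVSTACK_IP>:5000/v3/projects",
    headers={"X-Auth-Token": token},
)
print(resp.status_code, resp.json())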
Here is a good doc for you to read about it:
https://docs.openstack.org/keystoneauth/latest/using-sessions.html
HTH

Can't create cloudsql role for Service Account via api

I have been trying to use the API to create service accounts in GCP.
To create a service account I send the following POST request:
import requests

base_url = f"https://iam.googleapis.com/v1/projects/{project}/serviceAccounts"
auth = f"?access_token={access_token}"
data = {"accountId": name}

# Create a service account
r = requests.post(base_url + auth, json=data)
This returns a 200 and creates a service account.
Then, this is the code that I use to grant the specific roles:
sa = f"{name}@dotmudus-service.iam.gserviceaccount.com"
sa_url = base_url + f'/{sa}:setIamPolicy' + auth

data = {"policy":
    {"bindings": [
        {
            "role": roles,
            "members": [
                f"serviceAccount:{sa}"
            ]
        }
    ]}
}
If roles is set to one of roles/viewer, roles/editor or roles/owner, this approach does work.
However, if I want to use, specifically, roles/cloudsql.viewer, the API tells me that this option is not supported.
Here are the roles.
https://cloud.google.com/iam/docs/understanding-roles
I don't want to give this service account full viewer rights to my project, it's against the principle of least privilege.
How can I set specific roles from the API?
EDIT:
Here is the response using the Resource Manager API, with roles/cloudsql.admin as the role:
POST https://cloudresourcemanager.googleapis.com/v1/projects/{project}:setIamPolicy?key={YOUR_API_KEY}
{
  "policy": {
    "bindings": [
      {
        "members": [
          "serviceAccount:sa@{project}.iam.gserviceaccount.com"
        ],
        "role": "roles/cloudsql.viewer"
      }
    ]
  }
}
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.cloudresourcemanager.projects.v1beta1.ProjectIamPolicyError",
        "type": "SOLO_REQUIRE_TOS_ACCEPTOR",
        "role": "roles/owner"
      }
    ]
  }
}
With the code provided, it appears that you are appending to the first base_url, which is not the correct context for modifying project roles.
This will place the appended path under: https://iam.googleapis.com/v1/projects/{project}/serviceAccounts
The POST path for adding roles needs to be: https://cloudresourcemanager.googleapis.com/v1/projects/{project}:setIamPolicy
If you remove /serviceAccounts from the base_url, it should work.
Edited response to add more information due to your edit
OK, I see the issue here, sorry but I had to set up a new project to test this.
cloudresourcemanager.projects.setIamPolicy needs to replace the entire policy. It appears that you can add constraints to what you change, but you have to submit a complete policy in JSON for the project.
Note that gcloud has a --log-http option that will help you dig through some of these issues. If you run
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME --role roles/cloudsql.viewer --log-http
It will show you how it pulls the existing policy, appends the new role, and writes it back.
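A rough sketch of that read-modify-write flow in the same requests style as the question (project, access_token and sa are the question's placeholders; the token is sent here as a Bearer header rather than a query parameter):
import requests

crm = f"https://cloudresourcemanager.googleapis.com/v1/projects/{project}"
headers = {"Authorization": f"Bearer {access_token}"}

# 1. Pull the existing policy (the returned etag guards against concurrent edits).
policy = requests.post(f"{crm}:getIamPolicy", headers=headers, json={}).json()

# 2. Append the new binding to the bindings that are already there.
policy.setdefault("bindings", []).append({
    "role": "roles/cloudsql.viewer",
    "members": [f"serviceAccount:{sa}"],
})

# 3. Write the complete policy back.
resp = requests.post(f"{crm}:setIamPolicy", headers=headers, json={"policy": policy})
print(resp.status_code, resp.json())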
I would recommend using the example code provided here to make these changes if you don't want to use gcloud or the console to add the role to the user as this could impact the entire project.
Hopefully they improve the API for this need.
