'Illegal base64 character' exception when indexing pdf in elasticsearch from shell script - unix

I installed elasicsearch 5.0.1 with ingest attachment and tried indexing pdf in elasticsearch from shell script using below command
#!/bin/ksh
var=$(base64 sample.pdf | perl -pe 's/\n/\\n/g')
var1=$(curl -XPUT 'http://localhost:9200/my_index5/my_type/my_id?pipeline=attachment&pretty' -d' { "data" : "'$var'" }')
echo $var1
Got an error as
{ "error" : { "root_cause" : [ { "type" : "exception", "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: IllegalArgumentException[Illegal base64 character a];", "header" : { "processor_type" : "attachment" } } ]
Can anyone please help resolving the above error

Rectified the error.
Cause for this error is, In the base64 encoded content \n was present which caused the "Illegal format exception".
As a solution, When tried like below it worked
var=$(base64 sample.pdf | perl -pe 's/\n//g')

Related

Jq: how to address a random-like key in json

in AWS CLI, the command aws quicksight describe-data-set blah blah returns a json document with the following troublesome syntax:
{
"Status": 200,
"DataSet": {
"Arn": "arn:aws:quicksight:<region>:<acct>:dataset/b7c87122-e180-47a9-a8a4-19f171e13fc8",
"DataSetId": "b7c87122-e180-47a9-a8a4-19f171e13fc8",
"Name": "MyName",
"CreatedTime": "2022-08-16T12:01:54.948000-05:00",
"LastUpdatedTime": "2022-08-19T08:47:55.553000-05:00",
"PhysicalTableMap": {
"6fac5dee-3691-4ddd-ba7a-0667168bb80c": {
"CustomSql": {
"DataSourceArn": "arn:aws:quicksight:<region>:<acct>:datasource/46f83f8b-181e-4575-8d61-84c50125f3aa",
I need to address that DataSetArn, but the key "6fac5dee-3691-4ddd-ba7a-0667168bb80c" is unknown to me at runtime. How do I address it?
I tried:
jq -r '.DataSet.PhysicalTableMap.*.CustomSql.DataSourceArn'
jq -r '.DataSet.PhysicalTableMap.\*.CustomSql.DataSourceArn'
jq -r '.DataSet.PhysicalTableMap.?.CustomSql.DataSourceArn'
jq -r '.DataSet.PhysicalTableMap.\?.CustomSql.DataSourceArn'
jq -r '.DataSet.PhysicalTableMap.%.CustomSql.DataSourceArn'
jq -r '.DataSet.PhysicalTableMap.\%.CustomSql.DataSourceArn'
All return an error similar to:
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.DataSet.PhysicalTableMap.\?.CustomSql.DataSourceArn
jq: 1 compile error
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
I'm a noob, I know I'm guessing here. Does anyone have any insight on this?
Something like this:
jq -r '.DataSet.PhysicalTableMap[].CustomSql.DataSourceArn'
The part .DataSet.PhysicalTableMap returns the object as one result, the following filter [] takes that object and returns each value as one result. The following filters take each of these separate results and refines more stuff.
Note: If the object is the top-level item then the syntax is .[] .

Using kislyuk/yq returns syntax error, unexpected INVALID_CHARACTER with additional /0 at the end

I am using kislyuk/yq - The more often talked about version, which is a wrapper over jq, written in Python using the PyYAML library for YAML parsing
The version is yq 2.12.2
My jq is jq-1.6
I'm using ubuntu and bash scripts to do my parsing.
I wrote this as bash
alias=alias1
token=abc
yq -y -i ".tokens += { $alias: { value: $token }}" /root/.github.yml
I get the following error
jq: error: abc/0 is not defined at <top-level>, line 1:
.tokens += { alias1: { value: abc }}
I don't get it. Why would there be a /0 at the end?
The problem is abc is not interpreted as a literal string, when the double quotes are expanded by the shell. The underlying jq wrapper tries to match with abc as a standard built-in or a user-defined function which it was not able to resolve to, hence the error.
A JSON string (needed for jq) type needs to be quoted with ".." to be consistent with the JSON grammar. One way would be to pass the arg via command line with the --arg support
yq -y -i --arg t "$token" --arg a "$alias" '.tokens += { ($a): { value: $t } }' /root/.github.yml
Or have a quoting mess like below, which I don't recommend at all
yq -y -i '.tokens += { "'"$alias"'": { value: "'"$token"'" }}' /root/.github.yml

R library "XML" doesn't recognize encoding

Problem
I have an XML file that I would like to parse in R. I know that this file is not corrupted because the following Python code seems to work:
>>> import xml.etree.ElementTree as ET
>>> xml_tree = ET.parse(PATH_TO_MY_XML_FILE)
>>> do_my_regular_xml_stuff_that_seems_to_work_no_problem(xml_tree)
Now, when I try to run the following code in R, I get an error message:
> library("XML")
> xml_tree <- XML::xmlParse(PATH_TO_MY_XML_FILE)
Error in nchar(text_repr): invalid multibyte string, element 1
Traceback:
Alright, maybe the parser doesn't recognize the encoding. Luckily this should be specified in a decent XML file. So, I go to my shell and check:
$ head -n1 PATH_TO_MY_XML_FILE
??<?xml version="1.0" encoding="utf-16"?>
Now, I can go back to R and explicitly pass on the encoding, only to face the next error message where I got stuck now:
> library("XML")
> xml_tree <- XML::xmlParse(PATH_TO_MY_XML_FILE, encoding='UTF-16')
Start tag expected, '<' not found
Error: 1: Start tag expected, '<' not found
Traceback:
1. XML::xmlParse(filePath, encoding = "UTF-16")
2. (function (msg, ...)
. {
. if (length(grep("\\\n$", msg)) == 0)
. paste(msg, "\n", sep = "")
. if (immediate)
. cat(msg)
. if (length(msg) == 0) {
. e = simpleError(paste(1:length(messages), messages, sep = ": ",
. collapse = ""))
. class(e) = c(class, class(e))
. stop(e)
. }
. messages <<- c(messages, msg)
. })(character(0))
A last attempt to check (in R) if the file is in fact "UTF-16" encoded yields:
> f <- file(filePath, 'r', encoding = "UTF-16")
> firstLine <- readLines(f, n=1)
> close(f)
> print(line)
[1] "<?xml version=\"1.0\" encoding=\"utf-16\"?>"
Which looks just about right to me.
Question(s)
Does anyone know what is happening here? Is this a bug from the XML library? Is the file maybe not 'UTF-16' encoded, even though it claims it is? What are the two question marks ?? that I see when I print the file into the shell? These question marks don't appear when reading in the file properly...
Is this a bug from the XML library?
I think there could be a bug here. If I generate a valid UTF-16 XML document, which will have an initial byte-order mark:
$ echo '<a>😊</a>' | iconv -t utf-16 >a-utf16.xml
$ xxd a-utf16.xml
00000000: fffe 3c00 6100 3e00 3dd8 0ade 3c00 2f00 ..<.a.>.=...<./.
00000010: 6100 3e00 0a00 a.>...
then I can parse it with:
> XML::xmlParse('a-utf16.xml')
<?xml version="1.0"?>
<a>😊</a>
but not if I specify the encoding:
> XML::xmlParse('a-utf16.xml', encoding='utf-16')
Start tag expected, '<' not found
Error: 1: Start tag expected, '<' not found
Your original problem was when you weren't specifying the encoding. However:
I know that this file is not corrupted because the following Python code seems to work
That's a good hint, but I think you'll find edge cases where that doesn't hold. Try iconv for a second opinion on whether the file is encoded correctly.
For a more specific response, you'll need to post a reproducible XML file,

Apigee Command Line import returns 500 with NullPointerException

I'm trying to customise the deploy scripts to allow me to deploy each of my four API proxies from the command line. It looks very similar to the one provided in the samples on Github:
#!/bin/bash
if [[ $# -eq 0 ]] ; then
echo 'Must provide proxy name.'
exit 0
fi
dirname=$1
proxyname="teamname-"$dirname
source ./setup/setenv.sh
echo "Enter your password for user $username in the Apigee Enterprise organization $org, followed by [ENTER]:"
read -s password
echo Deploying $proxyname to $env on $url using $username and $org
./tools/deploy.py -n $proxyname -u $username:$password -o $org -h $url -e $env -p / -d ./$dirname
echo "If 'State: deployed', then your API Proxy is ready to be invoked."
echo "Run '$ sh invoke.sh'"
echo "If you get errors, make sure you have set the proper account settings in /setup/setenv.sh"
However when I run it, I get the following response:
Deploying teamname-gameassets to int on https://api.enterprise.apigee.com using my-email-address and org-name
Writing ./gameassets/teamname-gameassets.xml to ./teamname-gameassets.xml
Writing ./gameassets/policies/Add-CORS.xml to policies/Add-CORS.xml
Writing ./gameassets/proxies/default.xml to proxies/default.xml
Writing ./gameassets/targets/development.xml to targets/development.xml
Writing ./gameassets/targets/production.xml to targets/production.xml
Import failed to /v1/organizations/org-name/apis?action=import&name=teamname-gameassets with status 500:
{
"code" : "messaging.config.beans.ImportFailed",
"message" : "Failed to import the bundle : java.lang.NullPointerException",
"contexts" : [ ],
"cause" : {
"contexts" : [ ]
}
}
How should I go about debugging when I receive errors during the deploy process? Is there some sort of console I can view once logged in to Apigee?
I'm not sure how your proxy ended up this way, but it looks like the top-level directory is named "gameassets." It should be named "apiproxy". If you rename this directory you should see a successful deployment.
Also, before you customize too much, please try out "apigeetool," which is a more flexible command-line tool for deploying proxies:
https://github.com/apigee/api-platform-tools

Call Elastic search from shell script for indexing pdf document

I installed elasticsearch 5.0.1 and corresponding ingest attachment. Tried indexing pdf document from shell script as below
#!/bin/ksh
var=$(base64 file_name.pdf)
var1=$(curl -XPUT 'http://localhost:9200/my_index4/my_type/my_id?pipeline=attachment&pretty' -d' { "data" : $var }')
echo $var1
I got error as
{ "error" : { "root_cause" : [ { "type" : "exception", "reason" :
"java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field
[data]]; nested: IllegalArgumentException[Illegal base64 character 24];",
"header" : { "processor_type" : "attachment" } } ]...
Can anyone please help on resolving the above issue ... Not sure whether I am passing invalid base64 character ?
Please note that when I pass like this, It works !
var1=$(curl -XPUT 'http://localhost:9200/my_index4/my_type/my_id?pipeline=attachment&pretty'
-d' { "data" : "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=" }')
I guess the issue has to shell not expanding the variables within single-quotes, you need to double-quote to expand it. i.e.
change -d' { "data" : $var }'
to
-d '{"data" : "'"$(base64 file_name.pdf)"'"}'
directly to pass the base64 stream.
(or)
-d '{"data" : "'"$var"'"}'
More about quoting and variables in ksh here.

Resources