Convert Evernote Markup Language (ENML) to Markdown with pandoc

I'm trying to convert Evernote Markup Language (ENML) to Markdown using Pandoc. ENML is mostly a subset of XHTML with a few additional elements. The element I'm trying to convert is the special checkbox element <en-todo checked="true"/>. Here's a sample ENML document with two en-todo items:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div><en-todo checked="true"/>This is a thing<br/></div>
<div><en-todo checked="false"/>This is another thing<br/></div>
</en-note>
I'm trying to convert it to the following Markdown:
[X] This is a thing
[ ] This is another thing
My current approach is to create a JSON filter:
pandoc --parse-raw -f html -t json test.enml | \
./my-filter | pandoc -f json -t markdown
I'm not sure how to properly parse the RawInline blocks:
[
  {
    "Para": [
      {
        "RawInline": [
          "html",
          "<en-todo checked=\"true\">"
        ]
      },
      {
        "RawInline": [
          "html",
          "</en-todo>"
        ]
      },
      {
        "Str": "This"
      },
      "Space",
      {
        "Str": "is"
      },
      "Space",
      {
        "Str": "a"
      },
      "Space",
      {
        "Str": "thing"
      },
      "LineBreak"
    ]
  },
  {
    "RawBlock": [
      "html",
      "</div>"
    ]
  },
  {
    "RawBlock": [
      "html",
      "<div>"
    ]
  },
  {
    "Para": [
      {
        "RawInline": [
          "html",
          "<en-todo checked=\"false\">"
        ]
      },
      {
        "RawInline": [
          "html",
          "</en-todo>"
        ]
      },
      {
        "Str": "This"
      },
      "Space",
      {
        "Str": "is"
      },
      "Space",
      {
        "Str": "another"
      },
      "Space",
      {
        "Str": "thing"
      },
      "LineBreak"
    ]
  }
]
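A minimal filter sketch in Python, using the pandocfilters package (an assumption on my part; any script that rewrites the JSON tree works), would turn each opening <en-todo> tag into a literal checkbox string and drop the stray closing tags:

#!/usr/bin/env python
# Hypothetical filter: rewrite raw <en-todo> inlines as checkbox text.
from pandocfilters import toJSONFilter, Str

def en_todo(key, value, fmt, meta):
    if key == 'RawInline':
        raw_format, raw = value
        if raw_format == 'html' and raw.startswith('<en-todo'):
            checked = 'checked="true"' in raw
            # Trailing space because the AST has no Space node
            # between the tag and the following Str.
            return Str('[X] ' if checked else '[ ] ')
        if raw_format == 'html' and raw.startswith('</en-todo'):
            return []  # drop the element entirely

if __name__ == '__main__':
    toJSONFilter(en_todo)

Saved as my-filter and made executable, this slots into the pipeline above. Note that pandoc's markdown writer may escape the square brackets; if it does, returning RawInline('markdown', ...) instead of Str is one workaround.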

Related

How to create a json file in R with list with lists as elements with fromJson()

I am trying to create a .json file. The third element should be a list with lists as elements.
What am I doing wrong?
Below is the JSON file I created with R:
{
  "list1": [
    "element1"
  ],
  "list2": [
    "element2"
  ],
  "List_with_lists_as_elements": [
    "Child1": {
      "Name": "Child1",
      "Child1_Title": [
        "Title1",
        "Title2",
        "Title3"
      ],
      "Child1_Subtitle": [
        "Subtitle_1",
        "Subtitle_2",
        "Subtitle_3"
      ]
    },
    "Child2": {
      "Name": "Child2",
      "Child2_Title": [
        "Title1",
        "Title2",
        "Title3"
      ],
      "Child2_Subtitle": [
        "Subtitle2_1",
        "Subtitle2_2",
        "Subtitle2_3"
      ]
    },
    "Child3": {
      "Name": "Child3",
      "Child2_Title": [
        "Title1",
        "Title2",
        "Title3"
      ],
      "Child2_Subtitle": [
        "Subtitle3_1",
        "Subtitle3_2",
        "Subtitle3_3"
      ]
    }
  ]
}
I then save this as example_json.json and load it using fromJSON(txt = 'example_json.json'), but I get an error message, probably because I don't know quite how to create a .json file:
Error in parse_con(txt, bigint_as_char) :
parse error: after array element, I expect ',' or ']'
_as_elements": [ "Child1":{ "Name": "Child1",
(right here) ------^
How can I create a .json file that gives me a list with lists as elements?
The issue is that you have keys inside your array:
...
"List_with_lists_as_elements": [
  "Child1": {
    "Name": "Child1",
    ...
  },
  "Child2": {
    "Name": "Child2",
    ...
  },
  "Child3": {
    "Name": "Child3",
    ...
  }
]
...
You have a Name field which contains the key values, so you can probably just remove the keys:
...
"List_with_lists_as_elements": [
  {
    "Name": "Child1",
    ...
  },
  {
    "Name": "Child2",
    ...
  },
  {
    "Name": "Child3",
    ...
  }
]
...
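Since the parse error comes from the JSON syntax itself rather than from R, any strict parser will reject the original file at the same spot. A quick way to sanity-check the file before loading it in R, sketched here in Python (the file name matches the question):

import json

# json.load raises an error at the same position jsonlite reports,
# i.e. at the "Child1": key sitting directly inside the array.
with open('example_json.json') as f:
    data = json.load(f)

print(type(data))  # once the file is valid JSON, this is a dict of lists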

convert very large custom json to csv using jq bash

I have very large JSON data like the sample below:
{
  "10.10.10.1": {
    "asset_id": 1,
    "referencekey": "ASSET-00001",
    "hostname": "testDev01",
    "fqdn": "ip-10-10.10.1.ap-northeast-2.compute.internal",
    "network_zone": [
      "DEV",
      "Dev"
    ],
    "service": {
      "name": "TEST_SVC",
      "account": "AWS_TEST",
      "billing": "Testpay"
    },
    "aws": {
      "tags": {
        "Name": "testDev01",
        "Service": "TEST_SVC",
        "Usecase": "Dev",
        "billing": "Testpay",
        "OsVersion": "20.04"
      },
      "instance_type": "t3.micro",
      "ami_imageid": "ami-e000001",
      "state": "running"
    }
  },
  "10.10.10.2": {
    "asset_id": 3,
    "referencekey": "ASSET-47728",
    "hostname": "Infra_Live01",
    "fqdn": "ip-10-10-10-2.ap-northeast-2.compute.internal",
    "network_zone": [
      "PROD",
      "Live"
    ],
    "service": {
      "name": "Infra",
      "account": "AWS_TEST",
      "billing": "infra"
    },
    "aws": {
      "tags": {
        "Name": "Infra_Live01",
        "Service": "Infra",
        "Usecase": "Live",
        "billing": "infra",
        "OsVersion": "16.04"
      },
      "instance_type": "r5.large",
      "ami_imageid": "ami-e592398b",
      "state": "running"
    }
  }
}
Can I use jq to do a conversion like the one below, or is there an easier way to solve it?
Thank you.
Expected result
_key,asset_id,referencekey,hostname,fqdn,network_zone/0,network_zone/1,service/name,service/account,service/billing,aws/tags/Name,aws/tags/Service,aws/tags/Usecase,aws/tags/billing,aws/tags/OsVersion,aws/instance_type,aws/ami_imageid,aws/state
10.10.10.1,1,ASSET-00001,testDev01,ip-10-10.10.1.ap-northeast-2.compute.internal,DEV,Dev,TEST_SVC,AWS_TEST,Testpay,testDev01,TEST_SVC,Dev,Testpay,20.04,t3.micro,ami-e000001,running
10.10.10.2,3,ASSET-47728,Infra_Live01,ip-10-10-10-2.ap-northeast-2.compute.internal,PROD,Live,Infra,AWS_TEST,infra,Infra_Live01,Infra,Live,infra,16.04,r5.large,ami-e592398b,running
jq lets you do the conversion to CSV easily. The following code produces the desired output:
jq -r 'to_entries
  | map([.key,
         .value.asset_id, .value.referencekey, .value.hostname, .value.fqdn,
         .value.network_zone[0], .value.network_zone[1],
         .value.service.name, .value.service.account, .value.service.billing,
         .value.aws.tags.Name, .value.aws.tags.Service, .value.aws.tags.Usecase, .value.aws.tags.billing, .value.aws.tags.OsVersion,
         .value.aws.instance_type, .value.aws.ami_imageid, .value.aws.state])
  | ["_key","asset_id","referencekey","hostname","fqdn","network_zone/0","network_zone/1","service/name","service/account","service/billing","aws/tags/Name","aws/tags/Service","aws/tags/Usecase","aws/tags/billing","aws/tags/OsVersion","aws/instance_type","aws/ami_imageid","aws/state"]
  , .[]
  | @csv' "$INPUT"
Remarks
If some nodes in the input JSON are missing, the code does not break but fills in empty values in the CSV file.
If more than two network zones are given, only the first two are covered in the CSV file.
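As for the "easier way" part of the question: if the data fits in memory, a short script in a general-purpose language is an alternative. A rough Python sketch of the same flattening (the column list mirrors the jq answer; the input path is assumed to be the first command-line argument):

import csv
import json
import sys

# Paths into each record, mirroring the jq answer's column list.
FIELDS = [
    ("asset_id",), ("referencekey",), ("hostname",), ("fqdn",),
    ("network_zone", 0), ("network_zone", 1),
    ("service", "name"), ("service", "account"), ("service", "billing"),
    ("aws", "tags", "Name"), ("aws", "tags", "Service"),
    ("aws", "tags", "Usecase"), ("aws", "tags", "billing"),
    ("aws", "tags", "OsVersion"),
    ("aws", "instance_type"), ("aws", "ami_imageid"), ("aws", "state"),
]

def dig(obj, path):
    # Walk a path of keys/indices; missing nodes become empty cells,
    # matching the jq behaviour described in the remarks above.
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return ""
    return obj

with open(sys.argv[1]) as f:
    data = json.load(f)

writer = csv.writer(sys.stdout)
writer.writerow(["_key"] + ["/".join(map(str, p)) for p in FIELDS])
for key, record in data.items():
    writer.writerow([key] + [dig(record, p) for p in FIELDS])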

Combine multiple json to single json using jq

I am new to jq and have been stuck on this problem for a while. Any help is appreciated.
I have two json files,
In file1.json:
{
  "version": 4,
  "group1": [
    {
      "name": "olditem1",
      "content": "old content"
    }
  ],
  "group2": [
    {
      "name": "olditem2"
    }
  ]
}
And in file2.json:
{
  "group1": [
    {
      "name": "newitem1"
    },
    {
      "name": "olditem1",
      "content": "new content"
    }
  ],
  "group2": [
    {
      "name": "newitem2"
    }
  ]
}
Expected result is:
{
  "version": 4,
  "group1": [
    {
      "name": "olditem1",
      "content": "old content"
    },
    {
      "name": "newitem1"
    }
  ],
  "group2": [
    {
      "name": "olditem2"
    },
    {
      "name": "newitem2"
    }
  ]
}
Criteria for the merge:
Has to merge only group1 and group2
Match only by name
I have tried
jq -S '.group1+=.group1|.group1|unique_by(.name)' file1.json file2.json
but this filters down to group1 and all the other info is lost.
This approach uses INDEX to build a dictionary of unique elements keyed on their .name field, reduce to iterate over the group fields to be merged, and an initial state created by combining the slurped (-s) input files using add, after removing the group fields (which are processed separately) using del. Note that when INDEX meets duplicate names, the last occurrence wins, which is why olditem1 ends up with the content from file2.json in the output below.
jq -s '
  ["group1", "group2"] as $gs | . as $in | reduce $gs[] as $g (
    map(del(.[$gs[]])) | add;
    .[$g] = [INDEX($in[][$g][]; .name)[]]
  )
' file1.json file2.json
{
  "version": 4,
  "group1": [
    {
      "name": "olditem1",
      "content": "new content"
    },
    {
      "name": "newitem1"
    }
  ],
  "group2": [
    {
      "name": "olditem2"
    },
    {
      "name": "newitem2"
    }
  ]
}
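For readers less familiar with jq, here is a rough Python restatement of the same merge logic (an illustration of what the filter above does, not part of the original answer):

import json

# Load both documents; like jq's -s, we work on the pair of inputs.
with open('file1.json') as f1, open('file2.json') as f2:
    docs = [json.load(f1), json.load(f2)]

merged = {}
for doc in docs:
    # Non-group fields are combined as with jq's add (later files win).
    merged.update({k: v for k, v in doc.items() if k not in ('group1', 'group2')})

for group in ('group1', 'group2'):
    index = {}  # keyed by name; later entries overwrite, like INDEX
    for doc in docs:
        for item in doc.get(group, []):
            index[item['name']] = item
    merged[group] = list(index.values())

print(json.dumps(merged, indent=2))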

Treat Entry As Primary Key Print Once, Print Associated Entry Array, As CSV, Drop Empties

I have records like the ones below, sometimes with duplicate srcPath entries, though with different references.
For example /content/dam/foo/about-bar/photos/rayDavis.PNG appears 3 times in one record, with different references.
I'd like to get the unique srcPath printed once, and the associated references.
I also have empty records,
{
  "pages": []
}
I don't want to see those.
I'd really like a CSV with srcPath, perhaps a different field like published, and then the associated references array as consecutive comma-separated values on the same line (first reference, second reference, third reference, etc.), like:
"/content/dam/foo/about-bar/pdf/theplan.pdf", true, "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/image/link", "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/textboximg/boxFtr", "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content1/textboximg/text"
"/content/dam/foo/about-bar/photos/rayDavis.PNG", true, "/content/foo/en/about-bar/jcr:content/content1B/promos_1/image/fileReference", "/content/foo/en/about-bar/monkey-development/tales-of-giving/ray-moose-davis/jcr:content/content1/textboximg/fileReference", "/content/foo/en/about-bar/monkey-development/tales-of-giving/jcr:content/content1/textboximg_2/fileReference"
"/content/dam/foo/about-bar/pdf/foo_19thNewsletter.pdf", true, "/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg/text"
"/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf", true, "/content/foo/en/gremlins/jcr:content/content2C/textboximg_114671747/text", "/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf", "/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg_0/text"
In other words, unique srcPath entries with associated references.
I imagine that if I wanted path too, I wouldn't be able to have unique srcPath lines in the CSV?
DATA:
{
  "pages": [
    {
      "srcPath": "/content/dam/foo/about-bar/pdf/theplan.pdf",
      "srcTitle": "theplan.pdf",
      "path": "/content/foo/en/about-bar/the-plan-and-vision",
      "title": "the Plan and Vision",
      "references": [
        "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/image/link",
        "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content2/textboximg/boxFtr",
        "/content/foo/en/about-bar/the-plan-and-vision/jcr:content/content1/textboximg/text"
      ],
      "published": false,
      "isPage": "true"
    }
  ]
}
{
  "pages": []
}
{
  "pages": []
}
{
  "pages": [
    {
      "srcPath": "/content/dam/foo/about-bar/photos/rayDavis.PNG",
      "srcTitle": "rayDavis.PNG",
      "path": "/content/foo/en/about-bar",
      "title": "About bar",
      "references": [
        "/content/foo/en/about-bar/jcr:content/content1B/promos_1/image/fileReference"
      ],
      "published": true,
      "isPage": "true"
    },
    {
      "srcPath": "/content/dam/foo/about-bar/photos/rayDavis.PNG",
      "srcTitle": "rayDavis.PNG",
      "path": "/content/foo/en/about-bar/monkey-development/tales-of-giving/ray-moose-davis",
      "title": "ray moose Davis",
      "references": [
        "/content/foo/en/about-bar/monkey-development/tales-of-giving/ray-moose-davis/jcr:content/content1/textboximg/fileReference"
      ],
      "published": true,
      "isPage": "true"
    },
    {
      "srcPath": "/content/dam/foo/about-bar/photos/rayDavis.PNG",
      "srcTitle": "rayDavis.PNG",
      "path": "/content/foo/en/about-bar/monkey-development/tales-of-giving",
      "title": "tales of Giving",
      "references": [
        "/content/foo/en/about-bar/monkey-development/tales-of-giving/jcr:content/content1/textboximg_2/fileReference"
      ],
      "published": true,
      "isPage": "true"
    }
  ]
}
{
  "pages": [
    {
      "srcPath": "/content/dam/foo/about-bar/pdf/foo_19thNewsletter.pdf",
      "srcTitle": "foo_19thNewsletter.pdf",
      "path": "/content/foo/en/gremlins/stay-tuned",
      "title": "Stay tuned",
      "references": [
        "/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg/text"
      ],
      "published": true,
      "isPage": "true"
    }
  ]
}
{
  "pages": [
    {
      "srcPath": "/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf",
      "srcTitle": "barNews_fall1617.pdf",
      "path": "/content/foo/en/gremlins",
      "title": "gremlins",
      "references": [
        "/content/foo/en/gremlins/jcr:content/content2C/textboximg_114671747/text"
      ],
      "published": true,
      "isPage": "true"
    },
    {
      "srcPath": "/content/dam/foo/about-bar/pdf/barNews_fall1617.pdf",
      "srcTitle": "barNews_fall1617.pdf",
      "path": "/content/foo/en/gremlins/stay-tuned",
      "title": "Stay tuned",
      "references": [
        "/content/foo/en/gremlins/stay-tuned/jcr:content/content3/textboximg_0/text"
      ],
      "published": true,
      "isPage": "true"
    }
  ]
}
You can use the following:
jq --raw-output '.pages | group_by(.srcPath)[] | [.[0].srcPath, .[0].published, .[].references[]] | @csv'
We group the pages by srcPath and map each group into an array containing the srcPath and published of the group's first element, followed by the references of every element in the group. Each of these arrays becomes a row in the CSV result. Documents with an empty pages array produce no rows at all. Note that group_by operates within each input document of the stream; since the duplicate srcPath entries in this data always occur inside the same document, that is enough, but merging duplicates across documents would require slurping the inputs first.

How to create/modify a jupyter notebook from code (python)?

I am trying to automate my project creation process and, as part of it, would like to create a new Jupyter notebook and populate it with some cells and content that I usually have in every notebook (e.g., imports, titles, etc.).
Is it possible to do this via python?
You can do it using nbformat. Below is an example taken from Creating an IPython Notebook programmatically:
import nbformat as nbf

nb = nbf.v4.new_notebook()

text = """\
# My first automatic Jupyter Notebook
This is an auto-generated notebook."""

code = """\
%pylab inline
hist(normal(size=2000), bins=50);"""

nb['cells'] = [nbf.v4.new_markdown_cell(text),
               nbf.v4.new_code_cell(code)]

fname = 'test.ipynb'

with open(fname, 'w') as f:
    nbf.write(nb, f)
This is absolutely possible. Notebooks are just JSON files. This notebook, for example, is just:
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Header 1"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "ExecuteTime": {
          "end_time": "2016-09-16T16:28:53.333738",
          "start_time": "2016-09-16T16:28:53.330843"
        },
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def foo(bar):\n",
        "    # Standard functions I want to define.\n",
        "    pass"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Header 2"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 2",
      "language": "python",
      "name": "python2"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 2
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython2",
      "version": "2.7.10"
    },
    "toc": {
      "toc_cell": false,
      "toc_number_sections": true,
      "toc_threshold": 6,
      "toc_window_display": false
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
While messy, it's just a list of cell objects. I would probably create my template in an actual notebook and save it rather than trying to generate the initial template by hand. If you want to add titles or other variables programmatically, you could always copy the raw notebook text from the *.ipynb file into a Python file and insert values using string formatting.
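As a minimal sketch of that string-formatting approach (the template and title below are made up for illustration):

# Hypothetical template: a bare-bones nbformat 4 notebook with one
# markdown cell whose title is filled in at generation time.
TEMPLATE = '''{{
 "cells": [
  {{"cell_type": "markdown", "metadata": {{}}, "source": ["# {title}"]}}
 ],
 "metadata": {{}},
 "nbformat": 4,
 "nbformat_minor": 0
}}'''

with open('generated.ipynb', 'w') as f:
    f.write(TEMPLATE.format(title='My Project'))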
