Word boundary in regex with mongolite (R)

I am facing a problem using a word-boundary regex with mongolite. It looks like the word boundary \b does not work there, whereas it works in normal MongoDB queries.
Here is a working example:
I create this toy collection:
db.test2.insertMany([
{ item: "journal gouttiere"},
{ item: "notebook goutte"},
{ item: "paper plouf"},
{ item: "planner gouttement"},
{ item: "postcard goutte"}
]);
With mongosh:
db.test2.aggregate(
{
$match: {
item: RegExp("\\bgoutte\\b")
}
})
returns:
[
{
"_id": {
"$oid": "63206efeb0e1e89db6ef0c20"
},
"item": "notebook goutte"
},
{
"_id": {
"$oid": "63206efeb0e1e89db6ef0c23"
},
"item": "postcard goutte"
}
]
But:
library(mongolite)
connection <- mongo(collection="test2",db="test",
url = "mongodb://localhost:27017",
verbose = T)
connection$aggregate(pipeline = '[{
"$match": {
"item":{"$regex" : "\\bgoutte\\b", "$options" : "i"}
}
}]',options = '{"allowDiskUse":true}')
returns 0 rows. Without the word boundaries, the same query does return records:
connection$aggregate(pipeline = '[{
"$match": {
"item":{"$regex" : "goutte", "$options" : "i"}
}
}]',options = '{"allowDiskUse":true}')
Imported 3 records. Simplifying into dataframe...
_id item
1 63206efeb0e1e89db6ef0c20 notebook goutte
2 63206efeb0e1e89db6ef0c22 planner gouttement
3 63206efeb0e1e89db6ef0c23 postcard goutte
It looks like the word-boundary regex does not work the same way with mongolite. What is the proper solution?

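Ottie is right (and should post an answer! I'd be fine with deleting mine then):
Backslashes have special meaning both in R string literals and in the regex. R consumes one level of escaping when it reads the string literal, and the JSON parser that receives the pipeline consumes another. You therefore need two additional backslashes (one per \) so that \\ is passed from R to MongoDB, where \\b in the JSON then stands for the regex token \b. With only two backslashes in R, the JSON contains \b, which JSON reads as a backspace character. See e.g. this SO question. I just checked: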
con <- mongo(
"test",
url = "mongodb+srv://readwrite:test@cluster0-84vdt.mongodb.net/test"
)
con$insert('{"item": "notebook goutte" }')
con$insert('{"item": "postcard goutte" }')
Now
con$aggregate(pipeline = '[{
"$match": {
"item":{"$regex" : "\\\\bgoutte\\\\b", "$options" : "i"}
}
}]',options = '{"allowDiskUse":true}')
yields
_id item
1 63234ac1435f9b7c2a0787c2 notebook goutte
2 63234ac5435f9b7c2a0787c5 postcard goutte
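The two layers of escaping can be reproduced outside R and MongoDB. The following is only a sketch: Python stands in for R (both process backslashes in ordinary string literals), json.loads stands in for the JSON parser that receives the pipeline, and Python's re module stands in for MongoDB's regex engine. The helper name matching_items is made up for illustration.

```python
import json
import re

items = [
    "journal gouttiere",
    "notebook goutte",
    "paper plouf",
    "planner gouttement",
    "postcard goutte",
]

def matching_items(json_text):
    """Parse the JSON text the driver would receive, then apply its $regex."""
    pattern = json.loads(json_text)["$regex"]
    hits = [s for s in items if re.search(pattern, s, re.IGNORECASE)]
    return pattern, hits

# Raw strings show the characters that remain AFTER the host language
# has processed its own string literal.
one_backslash = r'{"$regex": "\bgoutte\b"}'      # written "\\b" in the source code
two_backslashes = r'{"$regex": "\\bgoutte\\b"}'  # written "\\\\b" in the source code

# JSON decodes \b as a backspace character, so nothing matches.
print(matching_items(one_backslash))

# JSON decodes \\b as backslash + b, which the regex engine reads as a word boundary.
print(matching_items(two_backslashes))
```

The first call searches for a literal backspace on either side of "goutte", which explains the empty result in the question.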

Related

jq pipe operator in nested array

I am using the code below to pull data from a local JSON file.
The file is very large and is nested with objects and arrays. There are multiple fields in each element of .ratings[] that I would like to extract.
How can I use the pipe operator in the .ratings[] array so that I don't have to retype .ratings[] for each piece of data that I would like to pull?
jq -r '.players[] | [.firstName,.lastName,.tid,.pid,.ratings[].spd,.ratings[].jmp] | join(", ")'
You can enclose it in () to use the pipe sign:
.players[] | [.firstName, .lastName, .tid, .pid, (.ratings[] | .spd, .jmp)] | join(", ")
You didn't specify the expected output, so it is not clear if your proposed solution gives you the output you want.
Given the following input:
{
"players": [
{
"firstName": "fname1",
"lastName": "lname1",
"tid": "tid1",
"pid": "pid1",
"ratings": [
{
"spd": "spd1-1",
"jmp": "jmp1-1"
}
]
},
{
"firstName": "fname2",
"lastName": "lname2",
"tid": "tid2",
"pid": "pid2",
"ratings": [
{
"spd": "spd2-1",
"jmp": "jmp2-1"
},
{
"spd": "spd2-2",
"jmp": "jmp2-2"
}
]
},
{
"firstName": "fname3",
"lastName": "lname3",
"tid": "tid3",
"pid": "pid3",
"ratings": [
{
"spd": "spd3-1",
"jmp": "jmp3-2"
},
{
"spd": "spd3-2",
"jmp": "jmp3-2"
},
{
"spd": "spd3-3",
"jmp": "jmp3-3"
}
]
}
]
}
Your solution and the answer from 0ston0 will give you 1 line per player, but a different number of columns per line:
.players[] | [.firstName,.lastName,.tid,.pid,(.ratings[]|.spd,.jmp)] | join(", ")
generates:
fname1, lname1, tid1, pid1, spd1-1, jmp1-1
fname2, lname2, tid2, pid2, spd2-1, jmp2-1, spd2-2, jmp2-2
fname3, lname3, tid3, pid3, spd3-1, jmp3-2, spd3-2, jmp3-2, spd3-3, jmp3-3
This might or might not be what you want your result to look like.
A different solution will print one line per rating, but duplicate the players' names. Running:
.players[] | [.firstName,.lastName,.tid,.pid] + (.ratings[]|[.spd,.jmp]) | join(", ")
will result in:
fname1, lname1, tid1, pid1, spd1-1, jmp1-1
fname2, lname2, tid2, pid2, spd2-1, jmp2-1
fname2, lname2, tid2, pid2, spd2-2, jmp2-2
fname3, lname3, tid3, pid3, spd3-1, jmp3-2
fname3, lname3, tid3, pid3, spd3-2, jmp3-2
fname3, lname3, tid3, pid3, spd3-3, jmp3-3
Both solutions are valid; which one fits depends on how you are going to process the data afterwards.
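For readers more at home in Python than in jq, the difference between the two output shapes can be sketched as follows. This is an illustration only, using the first two players of the sample input; the function names are made up here and are not part of jq.

```python
players = [
    {"firstName": "fname1", "lastName": "lname1", "tid": "tid1", "pid": "pid1",
     "ratings": [{"spd": "spd1-1", "jmp": "jmp1-1"}]},
    {"firstName": "fname2", "lastName": "lname2", "tid": "tid2", "pid": "pid2",
     "ratings": [{"spd": "spd2-1", "jmp": "jmp2-1"},
                 {"spd": "spd2-2", "jmp": "jmp2-2"}]},
]

def base(p):
    """The per-player columns shared by both output shapes."""
    return [p["firstName"], p["lastName"], p["tid"], p["pid"]]

def one_line_per_player(players):
    # Mirrors [..., (.ratings[] | .spd, .jmp)]:
    # every rating is flattened into the player's single row.
    return [
        ", ".join(base(p) + [v for r in p["ratings"] for v in (r["spd"], r["jmp"])])
        for p in players
    ]

def one_line_per_rating(players):
    # Mirrors [...] + (.ratings[] | [.spd, .jmp]):
    # the player's columns are repeated once per rating.
    return [
        ", ".join(base(p) + [r["spd"], r["jmp"]])
        for p in players
        for r in p["ratings"]
    ]

print("\n".join(one_line_per_player(players)))
print("\n".join(one_line_per_rating(players)))
```

The first shape has a variable column count, while the second keeps six columns per row at the cost of repeating the player's fields.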

Parse an array of json object using jq

I am trying to parse the JSON file below and store the result in another JSON file. How do I achieve this?
{
"Objects": [
{
"ElementName": "Test1",
"ElementArray": ["abc","bcd"],
"ElementUnit": "4"
},
{
"ElementName": "Test2",
"ElementArray": ["abc","bcde"],
"ElementUnit": "8"
}
]
}
Expected result:
{
"Test1" :[
"abc","bcd"
],
"Test2" :[
"abc","bcde"
]
}
I've tried something along the lines of the commands below, but I seem to be off:
jq '[.Objects[].ElementName ,(.Objects[]. ElementArray[])]' user1.json
jq ".Objects[].ElementName .Objects[].ElementArray" user1.json
Your expected output needs to be wrapped in curly braces in order to be a valid JSON object. That said, use from_entries to create an object from an array of key-value pairs, which you can produce by mapping over the Objects array of the input:
.Objects | map({key: .ElementName, value: .ElementArray}) | from_entries
{
"Test1": [
"abc",
"bcd"
],
"Test2": [
"abc",
"bcde"
]
}
Demo: https://jqplay.org/s/YbjICOd8EJ
You can also use reduce
reduce .Objects[] as $o ({}; .[$o.ElementName]=$o.ElementArray)
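As a rough cross-check of what the two jq filters compute, here is the same transformation sketched in Python. The dict comprehension plays the role of map(...) | from_entries, and functools.reduce plays the role of jq's reduce; the variable and function names are illustrative.

```python
from functools import reduce

doc = {
    "Objects": [
        {"ElementName": "Test1", "ElementArray": ["abc", "bcd"], "ElementUnit": "4"},
        {"ElementName": "Test2", "ElementArray": ["abc", "bcde"], "ElementUnit": "8"},
    ]
}

# Counterpart of: .Objects | map({key: .ElementName, value: .ElementArray}) | from_entries
by_name = {o["ElementName"]: o["ElementArray"] for o in doc["Objects"]}

# Counterpart of: reduce .Objects[] as $o ({}; .[$o.ElementName] = $o.ElementArray)
def add_entry(acc, o):
    """Return a copy of the accumulator with one more name -> array entry."""
    return {**acc, o["ElementName"]: o["ElementArray"]}

by_name_reduced = reduce(add_entry, doc["Objects"], {})

print(by_name)
print(by_name == by_name_reduced)
```

Both variants start from an empty object and add one key per element of Objects, so they yield the same result.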

Select code block or paragraph in VS Code/R

I would like to select an R code block via a shortcut.
At the moment I am using CTRL+L to select the current line and CTRL+ALT+UP/DOWN to expand the selection. This, however, is cumbersome.
Is there any way to tell VS Code to select everything in a paragraph?
Example:
library(dplyr)
starwars %>%
filter(species == "Droid")
starwars %>%
|mutate(name, bmi = mass / ((height / 100) ^ 2)) %>% # <- The cursor is where "|" is for example
select(name:mass, bmi)
This is what should be selected in this example:
starwars %>%
mutate(name, bmi = mass / ((height / 100) ^ 2)) %>%
select(name:mass, bmi)
This could be done with the aid of an extension. See, e.g., the Select By extension in which you can specify the starting and ending regex within the keybinding. Keybinding:
{
"key": "alt+q", // whatever you want
"command": "selectby.regex",
"args": {
"flags": "m",
"backward": "^\\w", // since your block starts flush left apparently
"forward": "\n^$", // stop at first empty line
"forwardInclude": false,
"backwardInclude": true
}
}
Here is one that I wrote: Jump and Select. Use this keybinding:
{
"key": "alt+q", // whatever keybinding you want
"command": "jump-and-select.jumpBackwardSelect",
"args": {
"text": "^\\w",
"putCursorBackward": "beforeCharacter",
"restrictSearch": "document"
}
}
This should select from the cursor back to the first blank line (given your well-structured code examples).
To select the block from anywhere, you also need a macro extension like multi-command and this keybinding:
{
"key": "alt+q",
"command": "extension.multiCommand.execute",
"args": {
// "interval": 200,
"sequence": [
{
"command": "jump-and-select.jumpBackward",
"args": {
"text": "^\\w",
"putCursorBackward": "beforeCharacter"
}
},
{
"command": "jump-and-select.jumpForwardSelect",
"args": {
"text": "^[^\\w]$\n?",
"putCursorBackward": "afterCharacter"
}
}
]
},
"when": "editorFocus"
},
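To make the intent of the two regexes in these keybindings concrete, here is a small sketch of the selection logic in Python. It is not how either extension is implemented; it only assumes, as the keybindings do, that a block starts on a line that begins flush left with a word character and ends before the next empty line. The function name block_bounds and the sample text (with indentation and blank lines restored) are assumptions.

```python
import re

def block_bounds(lines, cursor):
    """Return (start, end) line indices of the block around `cursor`, end inclusive."""
    start = cursor
    # Walk backward to the nearest line that starts flush left with a word character.
    while start > 0 and not re.match(r"\w", lines[start]):
        start -= 1
    end = cursor
    # Walk forward until the next line is empty or the text ends.
    while end + 1 < len(lines) and lines[end + 1] != "":
        end += 1
    return start, end

code = """library(dplyr)

starwars %>%
  filter(species == "Droid")

starwars %>%
  mutate(name, bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
""".splitlines()

# Cursor on the mutate() line (index 6).
start, end = block_bounds(code, cursor=6)
print("\n".join(code[start:end + 1]))
```

The sketch relies on continuation lines being indented and on blocks being separated by blank lines, which is the same assumption the regexes make.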

Using `sed` to remove all elements from JSON array

I know StackOverflow is not a code-writing-service, but sed has been driving me crazy for the past 3 hours.
In summary, I need to modify the contents of a .json file that I have.
What the file looks like:
{
// ...
"first": {
"second": [
"indexZero",
"theseStringsAreDynamic",
"soINeedToUseWildcard"
]
}
// ...
}
Desired result:
{
// ...
"first": {
"second": [
]
}
// ...
}
What have you done?
I have tried about a million variations loosely based upon:
sed -i 's/\"second\": \[.*\]/\"second\": []/' "my.json"
## ~ Which gives: ~
#
# "first": {
# "second": []
# "indexZero",
# "theseStringsAreDynamic",
# "soINeedToUseWildcard"
# ]
# },
Essentially, I need to remove all elements from an array in a .json file, so if sed is not the correct tool for the job, please let me know.
Thank you for your time.
The correct tool for the job is jq:
$ jq '.first.second = []' input.json
{
"first": {
"second": []
}
}
To replace the original file, it's a two step process - redirect output to a temporary file and then rename it:
jq '.first.second = []' orig.json > tmp.json && mv -f tmp.json orig.json
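If jq is not available, the same two-step approach (write a temporary file, then rename it over the original) can be sketched in Python. This assumes the real file is valid JSON, i.e. the // ... lines in the question are only placeholders; the function name empty_array is made up for this example.

```python
import json
import os
import tempfile

def empty_array(path, keys):
    """Replace the value found under the nested `keys` with [] and rewrite the file."""
    with open(path) as f:
        data = json.load(f)

    # Walk down to the parent of the array, then empty the array.
    target = data
    for key in keys[:-1]:
        target = target[key]
    target[keys[-1]] = []

    # Write to a temporary file in the same directory, then rename it over the original.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False, suffix=".tmp") as tmp:
        json.dump(data, tmp, indent=2)
        tmp.write("\n")
    os.replace(tmp.name, path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "my.json")
        with open(path, "w") as f:
            json.dump({"first": {"second": ["indexZero", "dynamic"]}, "other": 1}, f)
        empty_array(path, ["first", "second"])
        with open(path) as f:
            print(f.read())
```

Unlike the sed attempt, this does not depend on how the array is spread across lines, because the file is parsed rather than pattern-matched.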

Parsing JSON dict of CloudFormation parameters for '--parameter-overrides'

I'm using AWS CloudFormation at the moment, and I need to parse out parameters due to differences between stack creation and deployment. The command aws cloudformation create-stack accepts a JSON parameters file, but aws cloudformation deploy only accepts inline parameters of the form Key=Value.
I have this JSON file:
[
{
"ParameterKey": "EC2KeyPair",
"ParameterValue": "$YOUR_EC2_KEY_PAIR"
},
{
"ParameterKey": "SSHLocation",
"ParameterValue": "$YOUR_SSH_LOCATION"
},
{
"ParameterKey": "DjangoEnvVarDebug",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DEBUG"
},
{
"ParameterKey": "DjangoEnvVarSecretKey",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_SECRET_KEY"
},
{
"ParameterKey": "DjangoEnvVarDBName",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_NAME"
},
{
"ParameterKey": "DjangoEnvVarDBUser",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_USER"
},
{
"ParameterKey": "DjangoEnvVarDBPassword",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_PASSWORD"
},
{
"ParameterKey": "DjangoEnvVarDBHost",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_HOST"
}
]
And I want to turn it into this:
'EC2KeyPair=$YOUR_EC2_KEY_PAIR SSHLocation=$YOUR_SSH_LOCATION DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST'
This would be the equivalent Python code:
>>> import json
>>> thing = json.load(open('stack-params.example.json', 'r'))
>>> convert = lambda item: f'{item["ParameterKey"]}={item["ParameterValue"]}'
>>> print(list(map(convert, thing)))
['EC2KeyPair=$YOUR_EC2_KEY_PAIR', 'SSHLocation=$YOUR_SSH_LOCATION', 'DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG', 'DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY', 'DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME', 'DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER', 'DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD', 'DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST']
>>> ' '.join(map(convert, thing))
'EC2KeyPair=$YOUR_EC2_KEY_PAIR SSHLocation=$YOUR_SSH_LOCATION DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST'
I have this little snippet:
$ cat stack-params.example.json | jq '.[] | "\(.ParameterKey)=\(.ParameterValue)"'
"EC2KeyPair=$YOUR_EC2_KEY_PAIR"
"SSHLocation=$YOUR_SSH_LOCATION"
"DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG"
"DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY"
"DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME"
"DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER"
"DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD"
"DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST"
But I'm not sure how to join the strings together. I was looking at reduce but I think it only works on lists, and streams of strings aren't lists. So I'm thinking the correct approach is to convert each key/value pair into a 'key=value' string within the list, then join them all together, though I have trouble working with the regex. Does anybody have any tips?
The goal as exemplified by the illustrative output seems highly dubious, but it can easily be achieved using the -r command-line option together with the filter:
map("\(.ParameterKey)=\(.ParameterValue)") | "'" + join(" ") + "'"
Footnote
"I was looking at reduce but I think it only works on lists, and streams of strings aren't lists."
To use reduce on a list, say $l, you could simply use [] as in:
reduce $l[] as $x (_;_)
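To illustrate the reduce pattern with something other than placeholders, here is a sketch in Python, the language already used in the question. functools.reduce walks the list one element at a time and carries an accumulator, much as jq's reduce does over the stream $l[]; the two sample parameters and the helper name append_pair are chosen for illustration.

```python
from functools import reduce

params = [
    {"ParameterKey": "EC2KeyPair", "ParameterValue": "$YOUR_EC2_KEY_PAIR"},
    {"ParameterKey": "SSHLocation", "ParameterValue": "$YOUR_SSH_LOCATION"},
]

def append_pair(acc, item):
    """Append one Key=Value pair to the accumulated string, separated by a space."""
    pair = f'{item["ParameterKey"]}={item["ParameterValue"]}'
    return pair if acc is None else f"{acc} {pair}"

# None is the initial accumulator, like the first slot in jq's reduce (init; update).
joined = reduce(append_pair, params, None)
print(joined)
```

For this particular task, mapping and then joining, as in the filter above, is shorter; the reduce form is mainly useful when the accumulator is more than a simple concatenation.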
