How to send Erlang function source to Riak MapReduce via HTTP?

I'm trying to use Riak's MapReduce via HTTP. This is what I'm sending:
{
    "inputs":{
        "bucket":"test",
        "key_filters":[["matches", ".*"]]
    },
    "query":[
        {
            "map":{
                "language":"erlang",
                "source":"value(RiakObject, _KeyData, _Arg) -> Key = riak_object:key(RiakObject), Count = riak_kv_crdt:value(RiakObject, <<\"riak_kv_pncounter\">>), [ {Key, Count} ]."
            }
        }
    ]
}
Riak fails with "[worker_startup_failed]", which isn't very informative. Could anyone please help me get this to actually execute the function?

WARNING
Allowing arbitrary Erlang functions via map-reduce is a security risk. Any valid Erlang can be executed, including sending your entire data set offsite or formatting the hard drive.
You have been warned.
However, if you implicitly trust any client that may connect to your cluster, you can allow Erlang source to be passed in a map-reduce request by setting {allow_strfun, true} in the riak_kv section of app.config (or in advanced.config if you are using riak.conf).
Once you have allowed passing an Erlang function in a map-reduce phase, you need to pass in a function of the form fun(RiakObject,KeyData,Arg) -> [result] end. Note that this must be an anonymous fun, so fun is a keyword, not a name, and it must end with end.
Your function should handle the case where {error,notfound} is passed as the first argument instead of an object. Simply adding a catch-all clause to the function could accomplish that.
Perhaps something like:
{
    "inputs":{
        "bucket":"test",
        "key_filters":[["matches", ".*"]]
    },
    "query":[
        {
            "map":{
                "language":"erlang",
                "source":"fun(RiakObject, _KeyData, _Arg) ->
                              Key = riak_object:key(RiakObject),
                              Count = riak_kv_crdt:value(
                                  RiakObject,
                                  <<\"riak_kv_pncounter\">>),
                              [ {Key, Count} ];
                             (_,_,_) -> [{error,0}]
                          end."
            }
        }
    ]
}
Allowing the source to be passed in the request is very useful while developing and debugging. For production, you really should put the functions in a dedicated pre-compiled module that you copy to the code path of each node so that the phase spec can specify the module and function by name instead of providing arbitrary code.
{"map":{
"language":"erlang",
"module":"yourprecompiledmodule",
"function":"functionname"}}

You need to enable allow_strfun on all nodes in your cluster. To do so in Riak 2, you will need to use the advanced.config file to add this to the riak_kv configuration:
[
    {riak_kv, [
        {allow_strfun, true}
    ]}
].
The other option is to create your own Erlang module by using the compiler shipped with Riak and placing the *.beam file in a well-known location for Riak to find. The basho-patches directory is one such place.
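A minimal sketch of such a module, reusing the map function from the example above (the module and function names match the placeholder phase spec shown earlier; compile with erlc and copy the resulting *.beam into basho-patches on every node):

%% yourprecompiledmodule.erl -- sketch of a precompiled MapReduce map module.
%% Compile with `erlc yourprecompiledmodule.erl` and copy the .beam file
%% into the basho-patches directory on each node.
-module(yourprecompiledmodule).
-export([functionname/3]).

%% Handle missing objects first, then the normal case.
functionname({error, notfound}, _KeyData, _Arg) ->
    [];
functionname(RiakObject, _KeyData, _Arg) ->
    Key = riak_object:key(RiakObject),
    Count = riak_kv_crdt:value(RiakObject, <<"riak_kv_pncounter">>),
    [{Key, Count}].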
Please see the documentation as well:
advanced.config
Installing custom Erlang code
HTTP MapReduce
Using MapReduce
Advanced MapReduce
MapReduce / curl example

Related

How to send a GET request longer than 65,535 characters from Rust?

I am rewriting part of my API from Python to Rust. In particular, I am trying to make an HTTP request to an OSRM server to get a big distance matrix. This kind of request can have quite a large URL. In Python everything works fine, but in Rust I get an error:
thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(TooLong)'
I have tried several HTTP client libraries: reqwest, surf, isahc, awc. But it turns out the constraining logic lives in the URL-processing library https://github.com/hyperium/http, and most HTTP clients depend on it, so they all behave the same. I could not use some of the libs at all; for example, with awc I got compile-time errors with my async code.
Is there any way to send a large GET request from Rust, preferably asynchronously?
As freakish pointed out in the comments already, having such a long URL is a bad idea, anything longer than 2,000 characters won't work in most browsers.
That being said: In the comments, you stated that an external API wants those crazily long URIs, so you don't really have an alternative. Therefore, let's give this problem a shot.
It looks like the limitation to 65,534 bytes exists because the http library stores the position of the query string as a u16 (and uses 65,535 if there is no query part). The following patch seems to make the code use u32 instead, thereby raising the limit to 4,294,967,294 characters (if you've got longer URIs than that, you might be able to use u64 instead, but that would be a URI longer than 4 GB – I doubt you need this):
--- a/src/uri/mod.rs
+++ b/src/uri/mod.rs
@@ -141,7 +141,7 @@ enum ErrorKind {
}
// u16::MAX is reserved for None
-const MAX_LEN: usize = (u16::MAX - 1) as usize;
+const MAX_LEN: usize = (u32::MAX - 1) as usize;
// URI_CHARS is a table of valid characters in a URI. An entry in the table is
// 0 for invalid characters. For valid characters the entry is itself (i.e.
diff --git a/src/uri/path.rs b/src/uri/path.rs
index be2cb65..9abec4c 100644
--- a/src/uri/path.rs
+++ b/src/uri/path.rs
@@ -11,10 +11,10 @@ use crate::byte_str::ByteStr;
#[derive(Clone)]
pub struct PathAndQuery {
pub(super) data: ByteStr,
- pub(super) query: u16,
+ pub(super) query: u32,
}
-const NONE: u16 = ::std::u16::MAX;
+const NONE: u32 = ::std::u32::MAX;
impl PathAndQuery {
// Not public while `bytes` is unstable.
@@ -32,7 +32,7 @@ impl PathAndQuery {
match b {
b'?' => {
debug_assert_eq!(query, NONE);
- query = i as u16;
+ query = i as u32;
break;
}
b'#' => {
You could try to get this merged; however, the issue covering this problem suggests that a pull request might not be accepted. Depending on your use case, you could fork the repository, commit the fix, and then use the Cargo feature for overriding dependencies to make Cargo use your patched version instead of the version in the repositories. The following addition to your Cargo.toml might get you started:
[patch.crates-io]
http = { git = 'https://github.com/your/repository' }
Note however that this only overrides the current version of the http crate – as soon as a new version of the original crate is published, Cargo will probably choose it until you update your fork.

Evernote iOS SDK fetchResourceByHashWith throws exception

I am working with the Evernote iOS SDK 3.0.
I would like to retrieve a specific resource from a note using
fetchResourceByHashWith
This is how I am using it. Just for this example, to be 100% sure the hash is correct, I first download the note with a single resource using fetchNote and then request that resource by its unique hash using fetchResourceByHashWith (the hash looks correct when I print it):
ENSession.shared.primaryNoteStore()?.fetchNote(withGuid: guid, includingContent: true, resourceOptions: ENResourceFetchOption.includeData, completion: { note, error in
    if error != nil {
        print(error)
        seal.reject(error!)
    } else {
        let hash = note?.resources[0].data.bodyHash
        ENSession.shared.primaryNoteStore()?.fetchResourceByHashWith(guid: guid, contentHash: hash, options: ENResourceFetchOption.includeData, completion: { res, error in
            if error != nil {
                print(error)
                seal.reject(error!)
            } else {
                print("works")
                seal.fulfill(res!)
            }
        })
    }
})
Call to fetchResourceByHashWith fails with
Optional(Error Domain=ENErrorDomain Code=0 "Unknown error" UserInfo={EDAMErrorCode=0, NSLocalizedDescription=Unknown error})
The equivalent setup works with the Android SDK.
Everything else works so far in the iOS SDK (chunkSync, auth, getting notebooks, etc., so this is not an issue with auth tokens).
It would be great to know whether this is an SDK bug or I am still doing something wrong.
Thanks
This is a bug in the SDK's "EDAM" Thrift client stub code. First the analysis and then your workarounds.
Evernote's underlying API transport uses a Thrift protocol with a documented schema. The SDK framework includes a layer of autogenerated stub code that is supposed to marshal input and output params correctly for each request and response. You are invoking the underlying getResourceByHash API method on the note store, which is defined per the docs to accept a string type for the contentHash argument. But it turns out the client is sending the hash value as a purely binary field.
The service is failing to parse the request, so you're seeing a generic error on the client. This could reflect evolution in the API definition, but more likely this has always been broken in the iOS SDK (getResourceByHash probably doesn't see a lot of usage).
If you dig into the more recent Python version of the SDK, or indeed also the Java/Android version, you can see a different pattern for this method: it says it's going to write a string-type field, and then actually emits a binary one. Weirdly, this works. And if you hack up the iOS SDK to do the same thing, it will work, too.
Workarounds:
Best advice is to report the bug and just avoid this method on the note store. You can get resource data in different ways: First of all, you actually got all the data you needed in the response to your fetchNote call, i.e. let resourceData = note?.resources[0].data.body and you're good! You can also pull individual resources by their own guid (not their hash), using fetchResource (use note?.resources[0].guid as the param). Of course, you may really want to use the access-by-hash pattern. In that case...
You can hack in the correct protocol behavior. In the SDK files, which you'll need to build as part of your project, find the ObjC file called ENTProtocol.m. Find the method +sendMessage:toProtocol:withArguments.
It has one line like this:
[outProtocol writeFieldBeginWithName:field.name type:field.type fieldID:field.index];
Replace that line with:
[outProtocol writeFieldBeginWithName:field.name type:(field.type == TType_BINARY ? TType_STRING : field.type) fieldID:field.index];
Rebuild the project and you should find that your code snippet works as expected. This is a massive hack, however, and although I don't think any other note store methods will be adversely affected by it, it's possible that other internal user store or other calls will suddenly start acting funny. You'd also have to maintain the hack through updates. It's probably better to report the bug and not use the method until Evernote publishes a proper fix.

How to concatenate a constant string with JSONPath

I have an AWS Step Functions state machine, and one of the steps notifies failures using the SNS service. I want to include some metadata from the input JSON in the outgoing message, so I am trying to concatenate a constant string with a JSONPath like below:
"Notify Failure": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"Message.$": "A job submitted through Step Functions failed for document id $.document_id",
"Subject":"Job failed",
"TopicArn": "arn:aws:sns:us-west-2:xxxxxxx:xxxxxxxx"
},
"End": true
}
where document_id is one of the properties in the input JSON.
However, when I try to save the state machine definition, I get this error:
There is a problem with your ASL definition, please review it and try
again The value for the field 'Message.$' must be a valid JSONPath
I was able to solve a similar issue using:
"Message.$": "States.Format('A job submitted through Step Functions failed for document id {}', $.document_id)",
This is described in an AWS News Blog post.
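Applied to the 'Notify Failure' state from the question (everything else unchanged), that gives:
"Notify Failure": {
    "Type": "Task",
    "Resource": "arn:aws:states:::sns:publish",
    "Parameters": {
        "Message.$": "States.Format('A job submitted through Step Functions failed for document id {}', $.document_id)",
        "Subject": "Job failed",
        "TopicArn": "arn:aws:sns:us-west-2:xxxxxxx:xxxxxxxx"
    },
    "End": true
}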
The JSONPath implementation referenced from the AWS Step Functions documentation supports string concatenation via $.concat($..prop), but sadly this does not work when deployed to AWS, suggesting that AWS uses a different implementation.
Therefore there is no way to do string concatenation with JSONPath in AWS.
As the message suggests, you need to provide a valid JSONPath:
"Message.$": "$.document_id"
You cannot use any string interpolation as it invalidates the JSONPath format. You will need to construct the message in the preceding state.
I know that this thread is quite old, but I think it might be useful for some people.
It IS actually possible to concatenate strings or JSONPaths in AWS Step Functions thanks to the function States.Format.
The principle is the same as the string format method in Python.
Example with strings
"States.Format('{}<separator_1>{}<separator_2>{}', 'foo', 'bar', 'baz')"
will give you
'foo<separator_1>bar<separator_2>baz'
Example with JSONPaths
"States.Format('{}<separator>{}', $.param_1, $.param_2)"
will give you
'<value of param_1><separator><value of param_2>'
NB: You can also combine strings with JSONPaths.
Hope it helps!

Cosmos Db library Microsoft.Azure.DocumentDB.Core (2.1.0) - Actual REST invocations

We are attempting to use WireMock (https://github.com/WireMock-Net/WireMock.Net) to mock CosmosDb invocations so we can build integration tests for our .NET Core 2.1 microservice.
By looking at the WireMock instance Request/Response entries, we can observe the following:
1) GET towards "/"
We mock the returning metadata of databases
THIS IS OK
2) GET towards collection (in our case: "/dbs/Chunker/colls/RHTMLChunks")
Returns metadata about the collections
THIS IS OK
3) POST of a query to the documents endpoint on the collection (in our case: "/dbs/Chunker/colls/RHTMLChunks/docs") that should result in one document being returned
I have tried to emulate what we get when we do the exact same query towards the CosmosDb instance in Postman, including headers and response.
However I observe that the lib does the query again, and again, and again....
(I can see this by pausing in Visual Studio and then looking at the RequestLog in WireMock.)
Does anyone know what should be returned? I have set up WireMock to return the following JSON payload:
{
    "_rid": "q0dcAOelSAI=",
    "Documents": [
        {
            "id": "gL20020621z2D34-1",
            "ChunkSize": 658212,
            "TotalChunks": 2,
            "Metadata": {
                "Active": true,
                "PublishedDate": ""
            },
            "ChunkId": 1,
            "Markup": "<h1>hello</h1>",
            "MainDestination": "gL20020621z2D34",
            "_rid": "q0dcAOelSAIHAAAAAAAAAA==",
            "_self": "dbs/q0dcAA==/colls/q0dcAOelSAI=/docs/q0dcAOelSAIHAAAAAAAAAA==/",
            "_etag": "\"0100e92a-0000-0000-0000-5ba96cf70000\"",
            "_attachments": "attachments/",
            "_ts": 1537830135
        }
    ],
    "_count": 0
}
Problems:
1) Cannot find a .pdb belonging to Microsoft.Azure.DocumentDB.Core v2.1.0.
2) What payload/headers should be returned so that the library will NOT blow up and retry when we invoke:
var response = await documentQuery.ExecuteNextAsync<DocumentDto>(); // this hangs forever
Please help :)
We're working on open sourcing the C# code base and some other fun improvements to make this easier. In the meantime, I'd advocate using the emulator for local testing etc., although I understand mocking is still a lot faster and nicer - it'll just be hard :)
My best pointer is actually our Node.js code base, since that's public already. The query code is relatively hard to follow, but basically: you create a query, we look up all the partitions we need to talk to, then we send a request for each partition and keep querying until we don't get back a continuation token anymore (or maxBufferedItemCount etc. goes over the limit, and we pause until it goes back down, etc.).
Effectively, we send out N requests per partition, where N is the number of pages of results, and N can vary per partition and query. You'd likely be able to mock a single-partition, single-page response relatively easily, but a full multi-partition response isn't gonna be fun.
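To make that concrete, here is a minimal sketch of the client-side loop that drives those per-page requests; client, collectionUri, feedOptions, and DocumentDto are assumed to come from the question's surrounding code:

using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Each ExecuteNextAsync call translates into at least one HTTP request;
// the SDK keeps issuing requests as long as the response carries a
// continuation token (the x-ms-continuation header), which is why an
// incomplete mock can make it query again and again.
var documentQuery = client
    .CreateDocumentQuery<DocumentDto>(collectionUri, "SELECT * FROM c", feedOptions)
    .AsDocumentQuery();

while (documentQuery.HasMoreResults)
{
    var page = await documentQuery.ExecuteNextAsync<DocumentDto>();
    // process the documents in this page...
}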
As I mentioned in the beginning, we've got some cool stuff coming, hopefully before the end of the year, which will make offline mocking easier, as well as open sourcing it finally. You might be better off with the emulator until then.

Key vault values from deployment, and linked templates parameters

I have a template that creates a key vault and a secret within it. I also have a Service Fabric template that requires 3 things from the key vault: the vault URI, the certificate URL, and the certificate thumbprint.
If I create the key vault and secret with PowerShell, it is easy to manually copy these 3 things from the output and paste them into the parameters of the Service Fabric template. However, because this cert has the same life cycle as the Service Fabric cluster, what I am hoping to do is link from the key vault template to the Service Fabric template, so that when I deploy the key vault and secret (which, by the way, is a key that has been base64-encoded to a string; I could have this as a secret in yet another key vault...), I can pass the 3 values on as parameters.
So I have two questions.
How do I retrieve the 3 values in the ARM template? PowerShell outputs them as 'ResourceId' of the key vault, 'Id' of the secret, and 'Version' of the secret. My attempt:
"sourceVaultValue": {
"value": "resourceId('Microsoft.KeyVault/vaults/', parameters('keyVaultName')"
},
"certificateThumbprint": {
"value": "[listKeys(resourceId('secrets', parameters('secretName')), '2015-06-01')"
},
"certificateUrlValue": { "value": "[concat('https://', parameters('keyVaultName'), '.vault.azure.net:443/secrets/', parameters('secretName'), resourceId('secrets', parameters('secretName')))]"
But the certificateUrlValue is incorrect. You can see I tried with and without listKeys, but neither seemed to work... (The thumbprint is within the certUrl itself)
If I were to get the correct values, I would like to pass them as parameters to the next template. However, the template in question has quite a few more parameters than the 3 I want to pass. So is it possible to have a parametersLink element to link to the parameter file, as well as a parameters element for just those 3? Or is there an intended way of doing this?
Cheers
Ok, try this when you get back to the keyboard...
1) For the URI, you can use an output like:
"secretUri": {
    "type": "string",
    "value": "[reference(resourceId('Microsoft.KeyVault/vaults/secrets', parameters('keyVaultName'), parameters('secretName'))).secretUri]"
}
For #2, you cannot mix and match the link and some values; it's one or the other.
A couple thoughts on how you could do this (it depends a bit on how you want to structure the rest of your deployment)...
One way to think of this: instead of nesting the SF template, deploy both in the same template, since they have the same lifecycle.
Alternatively, instead of nesting the SF template, nest the KV template and reference the outputs of that deployment in the SF template...
Aside from that I can't think of anything elegant - since you want to pass "dynamic" params to a nested deployment, really the only way to do that is to dynamically write the param file behind the link, or pass all the params into the deployment resource (sketched below).
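For that last option, a rough sketch of a nested deployment resource with inline parameters could look like the following; the template URI is a placeholder, and every remaining SF parameter (including the thumbprint) would have to be inlined as well, since parametersLink and parameters cannot be combined:

{
    "type": "Microsoft.Resources/deployments",
    "apiVersion": "2015-01-01",
    "name": "serviceFabricDeployment",
    "properties": {
        "mode": "Incremental",
        "templateLink": {
            "uri": "https://example.com/templates/servicefabric.json",
            "contentVersion": "1.0.0.0"
        },
        "parameters": {
            "sourceVaultValue": {
                "value": "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]"
            },
            "certificateUrlValue": {
                "value": "[reference(resourceId('Microsoft.KeyVault/vaults/secrets', parameters('keyVaultName'), parameters('secretName'))).secretUri]"
            }
        }
    }
}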
HTH - LMK if it doesn't...
Note also: you can't reference a secret with a dynamic id in a parameter file!
The obvious problems with this way of doing things are that someone needs to type the cleartext password, which means it needs to be known to anyone who provisions the environment, and it is unclear how to feed it into an automated environment deployment. If I store the password in a parameter… ???????
"variables": {
"tenantPassword": {
"reference": {
"keyVault": {
"ID": "[concat(subscription().id,'/resourceGroups/',parameters('keyVaultResourceGroup'),'/providers/Microsoft.KeyVault/vaults/', parameters('VaultName'))]"
},
"secretName": "tenantPassword"
}
}
},
