Batching technique using Protobufs - http

Is there an efficient technique to batch different Protobuf events while sending over HTTP?
The goal is to have a list of multi-type Protobuf messages in one request. One idea I have is to separate messages in small arrays and specify their type to be able to deserialize them on the server.

You can use some Any message type combined with repeated, as follows:
message Any {
string type_url = 1;
bytes value = 2;
}
message Envelope {
repeated Any events = 1;
}
Then, in your code:
when serializing, you must set type_url according to the message type that you serialize in value
when deserializing, you must read type_url to know which type is contained in value, and deserialize accordingly
The example above reproduces the google/protobuf/any, that is documented here:
https://developers.google.com/protocol-buffers/docs/proto3#any

Related

Accessing pubsublite message attributes in beam pipeline - Java

We have been using PubSubLite in our Go program without any issues and I just started using the Java library with Beam.
Using the PubSubLite IO, we get PCollection of SequencedMessage specifically: https://cloud.google.com/java/docs/reference/google-cloud-pubsublite/latest/com.google.cloud.pubsublite.proto.SequencedMessage
Now, from it I can get the data by doing something like:
message.getMessage().getData().toByteArray()
and then doing the normal conversion.
But for attributes, I cannot seem to get it correctly, just the value. In Go, I could do:
msg.Attributes["attrKey"]
but when I do:
message.getMessage().getAttributesMap().get("attrKey")
I am getting an Object which I cannot seem to convert to just string value of it. As far as I understand, it returns a Map<String, AttributeValues> and they all seem to be just wrapper over the internal protobuf. Also, Map is an interface so how do I get to the actual implementation to get the underlying value of each of the attribute.
The SequencedMessage attributes represent a multimap of string to bytes, not a map of string to string like in standard Pub/Sub. In the go client, by default the client will error if there are multiple values for a given key or if any of the values is not valid UTF-8, and thus presents a map[string]string interface.
When you call message.getMessage().getAttributesMap().get("attrKey"), you have a value of type AttributeValues which is a holder for a list of ByteStrings. To convert this to a single String, you would need to throw if the list is not of length 1, then call toStringUtf8 on the byte string element with index 0.
If you wish to interact with the standard Pub/Sub message format like you would in go, you can convert to this format by doing:
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.io.gcp.pubsublite.CloudPubsubTransforms;
PCollection<SequencedMessage> messages = ...
PCollection<PubsubMessage> transformed = messages.apply(CloudPubsubTransforms.toCloudPubsubMessages());

How can I stream arrays with Grpc?

I want to transport an array of int64.
I looked up how to use it. In my proto-file I either need a stream:
service myService {
rpc GetValues(myRequest) returns (stream myResponse);
}
message myRequest {
}
message myResponse {
int64 values = 1;
}
or a repeated response:
message myRepeatedResponse {
repeated int64 value = 1;
}
Is one option better than the other?
My use case is that I want to read the latest x entries from my Database and send these values as an array to my client.
But I didn't get how I am supposed to do it, because when assigning the values in the overridden function of MyService.MyService.Base I can only pass values of type 'long' and not 'long[]'.
For the stream vs repeated question, the answer is: it depends.
The distinction between the two is that:
streaming sends one or more messages (each message possibly containing repeated fields)
unary sends a single message containing a repeated field
So, I think your decision is based upon:
how the server obtains the repeated field.
the size of the message (including the repeated field that's being sent)
the 'integrity' of the message
If the server is unable to obtain the entirety of the repeated field in one go, then your answer is simpler; the server will need to stream the messages (including the repeated field) as it obtains them.
By 'integrity' of the message, is there some reason why decomposing the message into many (to stream) is problematic. If the repeated field must be transmitted as a single chunk, almost as a transactional unit, then I think you may prefer to not stream the message as chunks.
You should also consider the consequence on your client(s). Are your clients able to receive one larger message or, would many smaller messages be preferred, e.g. an IoT SoC device that's resource constrained.
Otherwise, if individual messages are large1, then you'd want to decompose them into smaller 'bites' and stream them.
1: Large Data Set and note that there is a hard limit of 2GB/message.

How GRPC handle pointer that appear more then once?

For example (golang):
type {
Product struct {
Name string
}
Customer struct {
Name string
Products []*Product
}
}
Which is the correct behavior:
GRPC honor the *Product pointer and transfer it only once.
GRPC will transfer the same *Product as many times as it associated to different Customer.
Michael,
It is not clear on your message, but I am assuming that you will send a Customer as part of your request to a gRPC server.
Golang will marshal the struct into []byte (https://godoc.org/github.com/golang/protobuf/proto#Marshal), so the message will not have such thing as a pointer. It will be just an encoded message. (see
https://github.com/golang/protobuf/blob/master/proto/wire.go#L22).
gRPC is not a Golang thing, so a pointer on a side (e.g. server) does not mean it must be a point on the other side (e.g. client).
Finally, answering your question, the expected behavior is 2. However, you may take a deeper look into proto buff serialization (https://developers.google.com/protocol-buffers/docs/encoding). I have no idea how it works, but maybe the message is compressed, so repeated []bytes maybe be discarded.

Is there a tangible benefit to using wrapper requests over plain messages in grpc service calls?

Lets say we have a message containing ID of some record in the database
message Record {
uint64 id = 1;
}
We also have an rpc call that returns all of the rows from table DATA that said record is mentioned in.
rpc GetDataForRecord(Record) returns (Data) {}
If we, for example, wrap Record in
RqData{
Record id = 1;
}
then once we need to only return, for example, "active" data, we won't need to make
GetActiveDataForRecord
instead we could extend RqData as:
RqData{
Record id = 1;
bool use_active = 2;
}
and use
rpc GetDataForRecord(RqData) returns (Data) {}
and clients that know of this new functionality will be able to call it, while older clients will just use it as it was passing only Record part within the Rq wrapper, without specifying active or not.
Here's the question: is there really a reason to use this kind of wrapping of everything into a separate request, or am I overthinking things and just passing plain structures will do?
I am kinda trying to think about the future, but not sure if I am not overcomplicating things.
In general, making a method-specific request and response is a Good Thing™ and is encouraged. For a Foo method you'd have FooRequest and FooResponse. Having specialized messages for the method allows you to add new "arguments," as you mentioned.
But for some cases it turns out fine to break the pattern and avoid the wrapping; it's a judgement call. Although you're asking from a different perspective, you may be interested in this answer about related methods.

Sending HTTP request with multiple parameters having same name

I need to send a HTTP request (and get XML response) from Flash that looks similar to following:
http://example.com/somepath?data=1&data=2&data=3
I.e. having several parameters that share same name, but have different values.
Until now I used following code to make HTTP requests:
var resp:XML = new XML();
resp.onLoad = function(success:Boolean) {/*...*/};
resp.ignoreWhite = true;
var req:LoadVars = new LoadVars();
req["someParam1"] = 3;
req["someParam2"] = 12;
req.sendAndLoad("http://example.com/somepath", resp, "GET");
In this case this will not do: there will be only one parameter having last value.
What are my options? I'm using actionscript 2.
Added
I guess, I can do something like that:
var url:String = myCustomFunctionForBuildingRequestString();
var resp:XML = new XML();
resp.onLoad = function(success:Boolean) {/*...*/};
resp.load(url);
But in that case I am loosing ability to do POST requests. Any alternatives?
Changing request is not appropriate.
The standard http way of sending array data is
http://example.com/?data[0]=1&data[1]=2
But this isn't wrong either (added from comment):
http://example.com/?data[]=1&data[]=2
Sending more parameters with the same name like you're doing, in practice means that all but the last item should be ignored. This is because when reading variables, the server overwrites (in memory) any item that has the same name as that one, because renaming a variable isn't good practice and never was.
I don't know much AS (none :p) but you'd access it as a list or array or whatever data structures it has.
Although POST may be having multiple values for the same key, I'd be cautious using it, since some servers can't even properly handle that, which is probably why this isn't supported ... if you convert "duplicate" parameters to a list, the whole thing might start to choke, if a parameter comes in only once, and suddendly you wind up having a string or something ... but i guess you know what you're doing ...
I am sorry to say so, but what you want to do, is not possible in pure AS2 ... the only 2 classes available for HTTP are LoadVars and XML ... technically there's also loadVariables, but it will simply copy properties from the passed object into the request, which doesn't change your problem, since properties are unique ...
if you want to stick to AS2, you need an intermediary tier:
a server to forward your calls. if you have access to the server, then you create a new endpoint for AS2 clients, which will decode the requests and pass them to the normal endpoint.
use javascript. with flash.external::ExternalInterface you can call JavaScript code. You need to define a callback for when the operation is done, as well as a JavaScript function that you can call (there are other ways but this should suffice). Build the request string inside flash, pump it to JavaScript and let JavaScript send it to the server in a POST request and get the response back to flash through the callback.
up to you to decide which one is more work ...
side note: in AS3, you'd use flash.net::URLLoader with dataFormat set to flash.net::URLLoaderDataFormat.TEXT, and then again encode parameters to a string, and send them.
Disclaimer; I've never used Actionscript and have no means for testing this.
Putting the same variable name with several values on the query string is the standard way of sending multi-value variables (for example form checkboxes) to web servers. If LoadVars is capable of sending multiple values then it seems plausible that the values should be stored in an array:
req["someParam1"] = ["foo","bar","bas"];
There also seems to be a decode function to LoadVars, what happens if you try to import the query string you want into the object?:
req.decode("someParam1=foo&someParam1=bar&someParam1=bas");
You cannot use loadvars like this - because data can be either 1 or 2 or 3, not all of them at the same time.
You can either pass it as a comma separated list:
var req:LoadVars = new LoadVars();
req["data"] = "1,2,3";
or as an xml string, and parse it at the server. I am not familiar with manipulating xml in AS2, but this is how you'd do it in AS3:
var xml:XML = <root/>;
xml.appendChild(<data>1</data>);
xml.appendChild(<data>2</data>);
xml.appendChild(<data>3</data>);
//now pass it to loadvars
req["data"] = xml.toXMLString();
The string you send is:
<root>
<data>1</data>
<data>2</data>
<data>3</data>
</root>

Resources