How can Postfix filter email (DKIM) without keeping the message in memory and without writing it to disc twice?

I need to DKIM sign possibly huge emails (up to 150MB). I’m running Postfix and so far want to keep that MTA.
Conceptually, DKIM needs to go over the email twice: once to calculate and sign the checksum, and once to write the message out with the result of the previous step in the headers.[1]
A DKIM signer can do this by either keeping the message in memory (a no-go for me) or writing it to a file.
For the task at hand I want to use a Postfix (filter) mechanism that allows me to do that without keeping the message in memory and without having it written to disc twice!
So far I see that the after-queue content filter mechanism forces you to write the email to disc again, and for no good reason! It should instead pass a seekable file descriptor to the filter's stdin, but the implementation does not do that.
The alternative, the before-queue milter, is insufficiently documented for me to see whether it avoids keeping the message in memory and avoids writing the original mail to a file twice. This is why I have opendkim in my tags: maybe those experts know how the milter API can avoid these pitfalls and how OpenDKIM actually does avoid them.
[1] ...as POSIX file systems have no prepend operation

Postfix queue files are not flat mails, so adding a header does not require a rewrite. To take advantage of that, use the milter interface. The answers I got from postfix-users make me believe mail is not kept in memory during milter processing either, at least not by Postfix.
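For reference, a minimal main.cf sketch for attaching a DKIM milter (the socket address is an assumption; it presumes an OpenDKIM daemon already listening on 127.0.0.1:8891):

    # attach the signer as a milter for SMTP and for local/pickup submissions
    smtpd_milters = inet:127.0.0.1:8891
    non_smtpd_milters = $smtpd_milters
    # keep accepting mail if the milter is down
    milter_default_action = accept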
Using the pipe mechanism with the after-queue content filter would not do it, as mentioned in the question. Writing the mail out to a file to avoid holding it in memory would probably be reasonable enough, though, and better than keeping it in memory.
While the milter interface is good enough for DKIM, I'd like to list its shortcomings (all of them could have been avoided):
you cannot modify either headers or body (parts) before the entire message is received
no proper in-transit piping
you cannot back-reference any header or body once you are allowed to replace/modify content
the milter client needs to keep a copy during the reception phase if it needs the information
the body can only be replaced in its entirety
header substitution/deletion requires a name and an index, but the milter server does not pass the index number (or any other opaque unique reference)
the milter client needs to count headers for any header it might later decide to replace
Postfix has some shortcomings as well:
Postfix offers 3 filter mechanisms at 2 positions
you cannot mix and match mechanism and position
the most appropriate mechanism for DKIM is the milter
the most appropriate place for DKIM signing is after the queue
an after-queue milter is not available
within limits, that would be possible
Postfix can actually already fake SMTP/milter environments to make milters work in new areas (the non_smtpd_milters setting)
no mechanism exploits all benefits of what would be possible with the current queue data structure
not needed for DKIM, though; just saying

Related

Is end-to-end encryption possible with Realm Mobile Platform?

On the client device, a synced Realm can be set up with an encryption key that's unique to the user and stored in the device keychain, so data is stored encrypted on the client.
(related question: Can "data at rest" in the Realm Mobile Platform be encrypted?)
Realm Object Server and the clients can communicate via TLS, so data is encrypted in transit.
But the Realm Object Server does not appear to store data using encryption, since an admin user is able to access all the database contents via Realm Browser (https://realm.io/docs/realm-object-server/#data-browser).
Is it possible to set up Realm Mobile Platform so user data is encrypted end-to-end, such that no one but the user (not even server admins) has access to the decryption key?
Due to the way we handle conflict resolution, we currently are unable to provide end-to-end encryption, as you correctly deduced. Let's go a tiny bit into detail with regards to the conflict resolution.
In order to handle conflicts the way we do, we use something called operational transformation. This means that instead of sending the data over directly, the client tells the server the intent of the change, rather than the result. For example, when two users edit a text field, we would tell the server insert(data='new text', offset=0) because the first user prepended data at the beginning of the text field, and insert(data='some more stuff', offset=10) because the second user added data in the middle of the field. These two separate operations allow the server to uniquely resolve what happened, and have conflictless resolution of the two writes.
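To make the idea concrete, here is a toy sketch in Go (purely illustrative, not Realm's actual algorithm) of how two concurrent inserts can be reconciled by shifting the later operation's offset:

    package main

    import "fmt"

    // insert describes the intent of an edit: what to add and where.
    type insert struct {
        data   string
        offset int
    }

    // transform rewrites op b so it can be applied after op a has already been applied.
    func transform(a, b insert) insert {
        if a.offset <= b.offset {
            b.offset += len(a.data)
        }
        return b
    }

    // apply performs an insert on a plain string document.
    func apply(s string, op insert) string {
        return s[:op.offset] + op.data + s[op.offset:]
    }

    func main() {
        doc := "the quick brown fox"
        a := insert{data: "new text ", offset: 0}         // first user prepends
        b := insert{data: "some more stuff ", offset: 10} // second user edits the middle
        doc = apply(doc, a)
        doc = apply(doc, transform(a, b))
        fmt.Println(doc) // "new text the quick some more stuff brown fox"
    }

Both intents survive, even though neither client saw the other's edit before syncing.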
This also means that if we encrypt everything, the server would be unable to handle this conflict resolution.
This being said, that's for the current version. We do have a number of thoughts on how we could handle this in the future while providing (some degree of) encryption. Mainly this would mean more work on the client, and maybe finding a new algorithm that would allow us to tell the client the intent and let the client figure out how to merge everything. This is a quadratic problem, though, so we're reluctant to put too much work on the client side, as it could really drain the battery.
That might be acceptable for some users, which is why we're looking into it. Basically, there will be a trade-off. As the old adage goes: fast, secure, convenient: pick two. We just have to figure out how to handle this properly.
I just opened a feature request around possibly using Tresorit's ZeroKit to solve the end-to-end encryption question posed. Sounds like the conflict resolution implementation will still cause an issue though, but maybe there is a different conflict resolution level that can be applied for those that don't need the realtime dynamic editing of individual data fields (like patient health data, where only a single clinician ever really edits a record at any given time).
https://github.com/realm/realm-mobile-platform/issues/96

How to encrypt files with AES256-GCM in golang?

AES256-GCM could be implemented in Go as in https://gist.github.com/cannium/c167a19030f2a3c6adbb5a5174bea3ff
However, the Seal method of the cipher.AEAD interface has the signature:
Seal(dst, nonce, plaintext, additionalData []byte) []byte
So for very large files, one must read all file contents into memory, which is unacceptable.
A possible way is to implement Reader/Writer interfaces on top of Seal and Open, but shouldn't that be solved by the block-cipher "modes" of AEAD? So I wonder whether this is a design mistake in the Go cipher library, or whether I missed something important about GCM.
AEADs should not be used to encrypt large amounts of data in one go. The API is designed to discourage this.
Encrypting large amounts of data in a single operation means that a) either all the data has to be held in memory or b) the API has to operate in a streaming fashion, by returning unauthenticated plaintext.
Returning unauthenticated data is dangerous: it's not
hard to find people on the internet suggesting things like gpg -d your_archive.tgz.gpg | tar xz because the gpg command also provides a streaming interface.
With constructions like AES-GCM it's, of course, very easy to
manipulate the plaintext at will if the application doesn't
authenticate it before processing. Even if the application is careful
not to "release" plaintext to the UI until the authenticity has been
established, a streaming design exposes more program attack surface.
By normalising large ciphertexts and thus streaming APIs, the next
protocol that comes along is more likely to use them without realising
the issues and thus the problem persists.
Preferably, plaintext inputs would be chunked into reasonably large
parts (say 16KiB) and encrypted separately. The chunks only need to be
large enough that the overhead from the additional authenticators is
negligible. With such a design, large messages can be incrementally
processed without having to deal with unauthenticated plaintext, and
AEAD APIs can be safer. (Not to mention that larger messages can be
processed since AES-GCM, for one, has a 64GiB limit for a single
plaintext.)
Some thought is needed to ensure that the chunks are in the correct
order, i.e. by counting nonces, that the first chunk should be first, i.e. by starting the nonce at zero, and that the last chunk should be
last, i.e. by appending an empty, terminator chunk with special
additional data. But that's not hard.
For an example, see the chunking used in miniLock.
Even with such a design it's still the case that an attacker can cause
the message to be detectably truncated. If you want to aim higher, an
all-or-nothing transform can be used, although that requires two
passes over the input and isn't always viable.
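As a concrete illustration, here is a minimal Go sketch of the chunked approach described above (this is not the miniLock format; the 16 KiB chunk size, counter nonce and terminator flag simply follow the scheme sketched in the answer):

    package main

    import (
        "crypto/aes"
        "crypto/cipher"
        "encoding/binary"
        "io"
        "log"
        "os"
    )

    const chunkSize = 16 * 1024

    // encryptStream seals the input in 16 KiB chunks. The nonce is a counter so
    // chunks cannot be reordered, and the last (possibly empty) chunk is flagged
    // via the additional data so truncation at the end is detectable.
    func encryptStream(key []byte, r io.Reader, w io.Writer) error {
        block, err := aes.NewCipher(key)
        if err != nil {
            return err
        }
        aead, err := cipher.NewGCM(block)
        if err != nil {
            return err
        }
        nonce := make([]byte, aead.NonceSize()) // starts at zero
        buf := make([]byte, chunkSize)
        for {
            n, readErr := io.ReadFull(r, buf)
            last := readErr == io.EOF || readErr == io.ErrUnexpectedEOF
            if readErr != nil && !last {
                return readErr
            }
            ad := []byte{0}
            if last {
                ad[0] = 1 // terminator chunk carries special additional data
            }
            if _, err := w.Write(aead.Seal(nil, nonce, buf[:n], ad)); err != nil {
                return err
            }
            if last {
                return nil
            }
            // increment the counter nonce (low 8 bytes, big-endian)
            ctr := binary.BigEndian.Uint64(nonce[len(nonce)-8:])
            binary.BigEndian.PutUint64(nonce[len(nonce)-8:], ctr+1)
        }
    }

    func main() {
        key := make([]byte, 32) // demo only: an all-zero AES-256 key
        if err := encryptStream(key, os.Stdin, os.Stdout); err != nil {
            log.Fatal(err)
        }
    }

The decryption side reads chunkSize+16 bytes at a time (GCM's 16-byte tag overhead), opens each chunk with the matching counter nonce and additional data, and only then releases the plaintext.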
It's not a design mistake. It's just that the API is incomplete in that regard.
GCM is a streaming mode of operation and is therefore able to handle encryption and decryption on demand without stopping the stream. It seems that you cannot reuse the same AEAD instance with the previous MAC state, though, so you cannot directly use this API for streaming GCM encryption.
You could implement your own GCM on top of cipher.NewCTR (in crypto/cipher) and your own implementation of GHASH.

Rebus Pub / Sub Encryption

Is it possible to turn off message encryption on subscription messages?
We want to let external parties subscribe to messages via Azure Service Bus and SAS keys per queue, but we use encryption of messages and don't want to expose that key to external parties.
One way is to create another bus for that, but that just seems complicated. Is there another way?
Also, I want to thank you for the quick responses here on Stack Overflow and for a great product.
You could probably achieve what you want by wrapping the outgoing step that encrypts the message body in something that would decide to invoke the step (and thus encrypt the body) or not, depending on the message's intent value in the headers, but I think that is a little bit icky.
The reason I think it is icky, is because I think you should treat your encrypted bus as an internally used bus only, and then have a completely separate bus that you use to publish messages to third parties.
This separation would have the benefit that you can update internal message schemas etc. without worrying about breaking anything in the integration with your external parties.

Consequences of POST not being idempotent (RESTful API)

I am wondering if my current approach makes sense or if there is a better way to do it.
I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that.
However since POST is not idempotent the request may get lost and sending it again may create a second object. Also requests being lost might be quite common since the API is often accessed through mobile networks.
As a result I decided to split the whole thing into a two-step process:
First, sending a POST request to create a new object, which returns the URI of the new object in the Location header.
Second, performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours, the server may delete it through some kind of batch job.
Does that sound reasonable or is there a better approach?
The only advantage of POST-creation over PUT-creation is the server generation of IDs.
I don't think it is worth the lack of idempotency (and the resulting need to remove duplicates or empty objects).
Instead, I would use a PUT with a UUID in the URL. Thanks to the way UUID generators work, you can be nearly sure that the ID you generate client-side will be unique server-side.
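A minimal Go sketch of that pattern (the endpoint URL and the github.com/google/uuid dependency are assumptions made for illustration):

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "net/http"

        "github.com/google/uuid"
    )

    func main() {
        id := uuid.New().String() // generated once, reused on every retry
        url := "https://api.example.com/objects/" + id
        body := []byte(`{"name":"example"}`)

        // Retrying this exact request is safe: the same id always maps to the same resource.
        req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status) // e.g. 201 Created on first creation, 200/204 on retries
    }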
Well, it all depends. To start with, you should think more in terms of URIs, resources and representations, and not be concerned about objects.
The POST method is designed for non-idempotent requests, or requests with side effects, but it can be used for idempotent requests.
on POST of form data to /some_collection/
normalize the natural key of your data (e.g. lowercase the Title field for a blog post)
calculate a suitable hash value (e.g. the simplest case is your normalized field value)
lookup resource by hash value
if none then
generate a server identity, create resource
Respond => "201 Created", "Location": "/some_collection/<new_id>"
if found but no updates should be carried out due to app logic
Respond => 302 Found/Moved Temporarily or 303 See Other
(the client will need to GET that resource, which might include fields required for updates, like version_numbers)
if found but updates may occur
Respond => 307 Temporary Redirect, Location: /some_collection/<id>
(like a 302, but the client should reuse the original HTTP method, and might do so automatically)
A suitable hash function might be as simple as some concatenated fields, or for large fields or values a truncated MD5 could be used. See [hash function] for more details.
I've assumed that you need a different identity value than a hash value, and that the data fields used for identity can't be changed.
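A minimal Go sketch of the flow above (the route, field name and in-memory index are illustrative assumptions):

    package main

    import (
        "crypto/md5"
        "encoding/hex"
        "fmt"
        "log"
        "net/http"
        "strings"
        "sync"
    )

    var (
        mu    sync.Mutex
        index = map[string]string{} // hash of natural key -> resource id
        next  int
    )

    func createHandler(w http.ResponseWriter, r *http.Request) {
        // normalize the natural key, then hash it
        title := strings.ToLower(strings.TrimSpace(r.FormValue("title")))
        sum := md5.Sum([]byte(title))
        hash := hex.EncodeToString(sum[:])

        mu.Lock()
        defer mu.Unlock()
        if id, ok := index[hash]; ok {
            // already exists: point the client at the existing resource
            w.Header().Set("Location", "/some_collection/"+id)
            w.WriteHeader(http.StatusSeeOther) // 303; use 307 if updates may occur
            return
        }
        next++
        id := fmt.Sprintf("%d", next) // server-generated identity
        index[hash] = id
        w.Header().Set("Location", "/some_collection/"+id)
        w.WriteHeader(http.StatusCreated) // 201
    }

    func main() {
        http.HandleFunc("/some_collection/", createHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }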
Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options. Either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Round 2000, I read an article advocating this solution called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because it's there, without ever broaching the problems with non-idempotent requests.
In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.
Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.
But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.
Another approach: screen incoming POST's against some log to see whether it is a repeat. Should be easy to find: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. And you could check extra parameters like the originating IP, same authentication, ...
No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.
If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.
I think you could also collapse the creation and update requests into a single request (upsert). In order to create a new resource, the client POSTs to a “factory” resource, located for example at /factory-url-name. The server then returns the URI for the new resource.
Why don't you use a request ID at your originating point? Your originating point should do two things. First, send a GET request for request_id=2 to see whether its request has been applied (e.g. a response saying the person was created as part of request_id=2).
This ensures your originating system knows which request was executed last, because the request ID is stored in the DB.
Second, if your originating point finds that the last request was still 1 and not yet 2, it may try again with request 3, to cover the case where just the GET response was lost but request 2 was actually created in the DB.
You can introduce a number of retries for your GET request, a time to wait before firing the GET again, and that kind of mechanism.

Determining the set of message destinations at runtime in BizTalk application

I’m a complete newbie at BizTalk and I need to create a BizTalk 2006 application which broadcasts messages in a specific way. I’m not asking for a complete solution, but for advise and guidelines, which capabilities of BizTalk I should use.
There's a message source; for simplicity, say, a directory where the user adds files to publish them. There are several subscribers, each having a directory to receive published files. The number of subscribers can vary while the program is in operation. There are also some rules which determine whether a particular subscriber needs to receive a particular file, based on the filename. For example, each subscriber has a pattern or mask which the names of the files they receive must match. Those rules (for example, the patterns) can change over time as well.
I don't know how to do this. Create a set of send ports at runtime, one for each destination? Is that possible? Use one port, changing its binding? Would that work correctly with concurrent sends? Are there other ways?
EDIT
I realized my question may be too obscure and general to prefer one answer over another to accept, so I just upvoted them.
You could look at using dynamic send ports to achieve this - if your subscribers are truly dynamic. This introduces a bit of complexity since you'll need to use an orchestration to configure the send port's properties based on your rules.
If you can, try to remove the complexity. If you know that you don't need to be truly dynamic when adding subscribers (i.e. a subscriber and its rules can be configured one time only) and you have a manageable number of subscribers, then I would suggest configuring each subscriber with its own send port and using a filter to create subscriptions based on message context properties. The beauty of this approach is that you don't need to create and deploy an orchestration, and this becomes a highly performant and scalable solution.
If the changes to the destinations are going to be frequent, you are right in seeking a more dynamic solution. One nice solution is using dynamic send ports and the Business Rules Engine. You create a rule set for the messages you are receiving. This could be based on a destination property or a customer ID in the message. Using these facts, the rules engine can return a bunch of information like the file mask, server name, IP address of the delivery server, etc. You can then use this information to configure the dynamic send port in the orchestration. The really nice thing here is that you can update the rule set in the rules engine without redeploying the whole solution. These are somewhat advanced concepts for a newbie, but not as difficult as you may think.
For a simpler solution, you might want to look at setting the FILE send adapter's properties via its property schema (i.e. file name, directory, etc.). You could pull these values from a database with a helper class inside an expression shape. For each message going out, use the property schema to set where the message will be sent and what it will be named. This way, you just update the database as things change.
Good Luck!
