I am experimenting with firebase-queue. I saw the option for sanitizing. It's described in the doc as
sanitize - specifies whether the data object passed to the processing
function is sanitized of internal keys reserved for use by the queue.
Defaults to true.
What does it mean?
I am getting an error for not specifying { sanitize : false }
When the sanitize option is set, the queue sanitizes (or cleans) the input provided to the processing function so that it resembles that which the original client placed onto the queue, and doesn't contain any of the keys added by the implementation of the queue itself.
If, however, you rely on a key (usually the keys starting with an underscore, e.g. _id) that is added by the queue, and not the original client, you need to set sanitize: false so those keys are returned to your function and they're not undefined.
You can clearly see the difference with a simple processing function that just performs a console.log(data).
A quick note about why these keys are removed by default: Reading or writing directly to the location (as it looks like you're perhaps doing, by passing undefined into the client SDK child() method instead of data._id) is generally a bad idea from within the worker itself as writes performed directly are not guarded by the extensive transaction logic in the queue to prevent race conditions. If you can isolate the work to taking input from the provided data field, and returning outputs to the resolve() function, you'll likely have a better time scaling up your queue.
Related
When writing event based cloud functions for firebase firestore it's common to update fields in the affected document, for example:
When a document of users collection is updated a function will trigger, let's say we want to determine the user info state and we have a completeInfo: boolean property, the function will have to perform another update so that the trigger will fire again, if we don't use a flag like needsUpdate: boolean to determine if excecuting the function we will have an infinite loop.
Is there any other way to approach this behavior? Or the situation is a consequence of how the database is designed? How could we avoid ending up in such scenario?
I have a few common approaches to Cloud Functions that transform the data:
Write the transformed data to a different document than the one that triggers the Cloud Function. This is by far the easier approach, since there is no additional code needed - and thus I can't make any mistakes in it. It also means there is no additional trigger, so you're not paying for that extra invocation.
Use granular triggers to ensure my Cloud Function only gets called when it needs to actually do some work. For example, many of my functions only need to run when the document gets created, so by using an onCreate trigger I ensure my code only gets run once, even if it then ends up updating the newly created document.
Write the transformed data into the existing document. In that case I make sure to have the checks for whether the transformation is needed in place before I write the actual code for the transformation. I prefer to not add flag fields, but use the existing data for this check.
A recent example is where I update an amount in a document, which then needs to be fanned out to all users:
exports.fanoutAmount = functions.firestore.document('users/{uid}').onWrite((change, context) => {
let old_amount = change.before && change.before.data() && change.before.data().amount ? change.before.data().amount : 0;
let new_amount = change.after.data().amount;
if (old_amount !== new_amount) {
// TODO: fan out to all documents in the collection
}
});
You need to take care to avoid writing a function that triggers itself infinitely. This is not something that Cloud Functions can do for you. Typically you do this by checking within your function if the work was previously done for the document that was modified in a previous invocation. There are several ways to do this, and you will have to implement something that meets your specific use case.
I would take this approach from an execution time perspective, this means that the function for each document will be run twice. Each time when the document is triggered, a field lastUpdate would be there with a timestamp and the function only updates the document if the time is older than my time - eg 10 seconds.
What's the proper way to interrupt a long chain of compose or pipe functions ?
Let's say the chain doesn't need to run after the second function because it found an invalid value and it doesn't need to continue the next 5 functions as long as user submitted value is invalid.
Do you return an undefined / empty parameter, so the rest of the functions just check if there is no value returned, and in this case just keep passing the empty param ?
I don't think there is a generic way of dealing with that.
Often when working with algebraic data types, things are defined so that you can continue to run functions even when the data you would prefer is not present. This is one of the extremely useful features of types such as Maybe and Either for instance.
But most versions of compose or related functions don't give you an early escape mechanism. Ramda's certainly doesn't.
While you can't technically exit a pipeline, you can use pipe() within a pipeline, so there's no reason why you couldn't do something like below to 'exit' (return from a pipeline or kick into another):
pipe(
// ... your pipeline
propOr(undefined, 'foo'), // - where your value is missing
ifElse(isNil, always(undefined), pipe(
// ...the rest of your pipeline
))
)
I recognized that (insert/delete)-XQueries executed with the BaseX client always returning an empty string. I find this very confusing or unintuitive.
Is there a way to find out if the query was "successful" without querying the database again (and using potentially buggy "transitive" logic like "if I deleted a node, there must be 'oldNodeCount-1' nodes in the XML")?
XQuery Update statements do not return anything -- that's how they are defined. But you're not the only one who does not like those restrictions, and BaseX added two ways around this limitation:
Returning Results
By default, it is not possible to mix different types of expressions
in a query result. The outermost expression of a query must either be
a collection of updating or non-updating expressions. But there are
two ways out:
The BaseX-specific update:output() function bridges this gap: it caches the results of its arguments at runtime and returns them after
all updates have been processed. The following example performs an
update and returns a success message:
update:output("Update successful."), insert node <c/> into doc('factbook')/mondial
With the MIXUPDATES option, all updating constraints will be turned off. Returned nodes will be copied before they are modified by
updating expressions. An error is raised if items are returned within
a transform expression.
If you want to modify nodes in main memory, you can use the transform
expression.
The transform expression will not help you, as you seem to modify the data on disk. Enabling MIXUPDATES allows you to both update the document and return something at the same time, for example running something like
let $node := <c/>
return ($node, insert node $node into doc('factbook')/mondial)
MIXUPDATES allows you to return something which can be further processed. Results are copied before being returned, if you run multiple updates operations and do not get the expected results, make sure you got the concept of the pending update list.
The db:output() function intentionally breaks its interface contract: it is defined to be an updating function (not having any output), but at the same time it prints some information to the query info. You cannot further process these results, but the output can help you debugging some issues.
Pending Update List
Both ways, you will not be able to have an immediate result from the update, you have to add something on your own -- and be aware updates are not visible until the pending update list is applied, ie. after the query finished.
Compatibility
Obviously, these options are BaseX-specific. If you strongly require compatible and standard XQuery, you cannot use these expressions.
I'm writing a program in Scala and trying to remain as functionally pure as is possible. The problem I am facing is not Scala specific; it's more to do with trying to code functionally. The logic for the function that I have to code goes something like:
Take some value of type A
Use this value to generate log information
Log this information by calling a function in an external library and evaluate the return status of the logging action (ie was it a successful log or did the log action fail)
Regardless of whether the log succeeded or failed, I have to return the input value.
The reason for returning the input value as the output value is that this function will be composed with another function which requires a value of type A.
Given the above, the function I am trying to code is really of type A => A i.e. it accepts a value of type A and returns a value of type A but in between it does some logging. The fact that I am returning the same value back that I inputted makes this function boil down to an identity function!
This looks like code smell to me and I am wondering what I should do to make this function cleaner. How can I separate out the concerns here? Also the fact that the log function goes away and logs information means that really I should wrap that call in a IO monad and call some unsafePerformIO function on it. Any ideas welcome.
What you're describing sounds more like debugging than logging. For example, Haskell's Debug.Trace.trace does exactly that and its documentation states: "These can be useful for investigating bugs or performance problems. They should not be used in production code."
If you're doing logging, the logging function should only log and have no further return value. As mentioned by #Bartek above, its type would be A -> IO (), i.e. returning no information () and having side-effects (IO). For example Haskell's hslogger library provides such functions.
The problem
One data source generating data in format {key, value}
Multiple receivers each waiting for different key
Example
Getting data is run in loop. Sometimes I will want to get next value labelled with key by using
Value = MyClass:GetNextValue(Key)
I want my code to stop there until the value is ready (making some sort of future(?) value). I've tried using simple coroutines, but they work only when waiting for any data.
So the question I want to ask is something like How to implement async values in lua using coroutines or similar concept (without threads)?
Side notes
The main processing function will, apart from returning values to waiting consumers, process some of incoming data (say, labeled with special key) itself.
The full usage context should look something like:
-- in loop
ReceiveData()
ProcessSpecialData()
--
-- Called outside the loop:
V = RequestDataWithGivenKey(Key)
How to implement async values
You start by not implementing async values. You implement async functions: you don't get the value back until has been retrieved.
First, your code must be in a Lua coroutine. I'll assume you understand the care and feeding of coroutines. I'll focus on how to implement RequestDataWithGivenKey:
function RequestDataWithGivenKey(key)
local request = FunctionThatStartsAsyncGetting(key)
if(not request:IsComplete()) then
coroutine.yield()
end
--Request is complete. Return the value.
return request:GetReturnedValue()
end
FunctionThatStartsAsyncGetting returns a request back to the function. The request is an object that stores all of the data needs to process the specific request. It represents asking for the value. This should be a C-function that starts the actual async getting.
The request will be either a userdata or an encapsulated Lua table that stores enough information to communicate with the C-code that's doing the async fetching. IsComplete uses the internal request data to see if that request has completed. GetReturnedValue can only be called when IsComplete returns true; it puts the value on the Lua stack, so that this function can return it.
Your external code simply needs to handle the async stuff internally. Between resumes of these Lua coroutines, you'll need to pump whatever async stuff is doing the fetching, if there are outstanding requests.