Multiple set operations - which will be performed last? - firebase

I have to sync a file with Firebase whenever it changes. I know how to set up a handler for file changes and then call Firebase.set. But what happens if the file is quite big (9 MB), the set operation takes, say, 30 seconds to finish, and the file changes again in the meantime? In other words, what happens when a set on some location is in progress and a second set on the same location is performed?
Will Firebase automatically cancel the in-progress operation and start again?
Or will it wait until the in-progress operation has finished and then start the next one?
Or would it be best to monitor whether a set on a given location is in progress and, if it is, queue the next set operation until the in-progress one finishes?
Sample test:
ref1 = new Firebase('https://my_firebase.firebaseio.com/some_location')
ref2 = new Firebase('https://my_firebase.firebaseio.com/some_location')
ref1.set({ some_large_data: 'abcef...' });
ref2.set({ some_large_data: '12345...' });
// which set will take effect on server? second? Or random (second that actually completes)?

If I understand you correctly, your question is about this flow:
1. client A starts uploading a huge file
2. client B starts uploading a smaller file
3. client B's upload completes
4. client A's upload completes
Firebase's servers will handle the updates transactionally, so no partial updates will ever occur.
The last update to reach the Firebase server completely is the one that will end up in the node.
If you want control over this type of concurrent access, you're probably better off using a worker queue. Of course Firebase is extremely well suited for synchronizing access to such a worker queue too. :-)
So in the flow above, no write to your Firebase node will occur on the server until after step 3. After step 4, Firebase will write the huge file and "forget" that the smaller file from client B ever existed. If you want to prevent such unintended overwriting, you could consider adding a step 0: "Client locks the upload location with a transaction call". This would essentially implement pessimistic locking on top of Firebase's optimistic locking approach.
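The "step 0" lock suggested above could look roughly like the update function below. It is only a sketch: the lock node, its owner/expiresAt fields, the makeLockUpdate name and the 60-second timeout are all assumptions made for illustration, not part of Firebase. The one Firebase behaviour it relies on is that returning undefined from a transaction update function aborts the transaction.

```javascript
// Hypothetical helper: builds the update function for a transaction() call
// on a dedicated lock location. Field names and the timeout are assumptions.
function makeLockUpdate(clientId, now, ttlMs = 60000) {
  return function (current) {
    // current is the lock node's value as the client currently sees it
    // (null when there is no lock or nothing is cached locally yet).
    const heldByOther =
      current && current.owner !== clientId && current.expiresAt > now;
    if (heldByOther) {
      return; // undefined aborts the transaction: someone else holds the lock
    }
    // Take (or renew) the lock. The expiry uses the client's own clock,
    // so it is only as reliable as that clock.
    return { owner: clientId, expiresAt: now + ttlMs };
  };
}

// Local demonstration without any network access:
const free = makeLockUpdate('client-A', 1000, 500)(null);
const busy = makeLockUpdate('client-B', 1200, 500)(free);
console.log(free, busy);
```

A client would pass the result to transaction() on the lock location and start uploading only if the completion callback reports that the transaction was committed. The expiry is there so that a crashed client cannot hold the lock forever.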

So after testing, it seems that Firebase performs all set operations on the server sequentially, one after another, without cancelling any of them, even when the next one is issued locally before the previous one has completed remotely.
This means that if you call set() 100 times, each set operation will affect the remote data (database). For my use case this is bad, because on every file change the file will be uploaded to Firebase in its entirety. I will have to write my own algorithm that decides when to sync data with Firebase.
The test consists of two scripts: one writes data to the same Firebase location multiple times, and the other reads data from that location.
set_data.coffee
Firebase = require 'firebase'
firebaseUrl = 'https://my_firebase.firebaseio.com/same_location'
ref1 = new Firebase firebaseUrl
ref2 = new Firebase firebaseUrl
ref3 = new Firebase firebaseUrl
ref1.on 'value', (snapshot) -> console.log 'local same_location changed to ' + Buffer.byteLength((snapshot.val() or ''), 'utf8') + ' bytes'
data1 = ''
data2 = ''
data3 = ''
data1 += '1' for i in [1..1000*1000*2]
data2 += '2' for i in [1..1000*1000*3]
data3 += '3' for i in [1..100]
console.log 'data1 ' + Buffer.byteLength(data1, 'utf8') + ' bytes'
console.log 'data2 ' + Buffer.byteLength(data2, 'utf8') + ' bytes'
console.log 'data3 ' + Buffer.byteLength(data3, 'utf8') + ' bytes'
t1 = new Date()
ref1.set data1, (err) ->
  elapsed = new Date().getTime() - t1.getTime()
  console.log 'set1 finished ('+elapsed+'ms) - err is '+err
t2 = new Date()
ref2.set data2, (err) ->
  elapsed = new Date().getTime() - t2.getTime()
  console.log 'set2 finished ('+elapsed+'ms) - err is '+err
t3 = new Date()
ref3.set data3, (err) ->
  elapsed = new Date().getTime() - t3.getTime()
  console.log 'set3 finished ('+elapsed+'ms) - err is '+err
get_data.coffee
Firebase = require 'firebase'
firebaseUrl = 'https://my_firebase.firebaseio.com/same_location'
ref1 = new Firebase firebaseUrl
ref1.on 'value', (snapshot) -> console.log 'remote same_location changed to ' + Buffer.byteLength((snapshot.val() or ''), 'utf8') + ' bytes'
output from set_data.coffee
data1 2000000 bytes
data2 3000000 bytes
data3 100 bytes
local same_location changed to 2000000 bytes
local same_location changed to 3000000 bytes
local same_location changed to 100 bytes
set1 finished (118314ms) - err is null
set2 finished (149844ms) - err is null
set3 finished (149845ms) - err is null
output from get_data.coffee
remote same_location changed to 0 bytes
remote same_location changed to 2000000 bytes
remote same_location changed to 3000000 bytes
remote same_location changed to 100 bytes
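One way to decide when to sync, given that every set() is sent to the server, is to keep at most one write in flight and remember only the newest value that arrived in the meantime. The sketch below does that. createCoalescingWriter and writeFn are hypothetical names, and writeFn is assumed to return a promise that settles when the write has completed (for example a small wrapper around ref.set(value, callback)).

```javascript
// Hypothetical helper: at most one write runs at a time; values requested
// while a write is running overwrite each other, so only the newest is sent.
function createCoalescingWriter(writeFn) {
  let pending = null;  // { value } holding the newest unwritten value, or null
  let draining = null; // promise of the running drain loop, or null when idle

  async function drain() {
    while (pending) {
      const { value } = pending;
      pending = null;
      try {
        await writeFn(value);
      } catch (err) {
        // A failed write is only logged here; a real sync would likely retry.
        console.error('write failed:', err);
      }
    }
    draining = null;
  }

  // Returns a promise that resolves once nothing is left to write.
  return function requestWrite(value) {
    pending = { value };
    if (!draining) draining = drain();
    return draining;
  };
}

// Demonstration with a fake 20 ms "upload" instead of a real Firebase call.
const uploaded = [];
const fakeUpload = (value) =>
  new Promise((resolve) => setTimeout(() => { uploaded.push(value); resolve(); }, 20));

const requestWrite = createCoalescingWriter(fakeUpload);
requestWrite('version 1');
requestWrite('version 2'); // superseded before it is ever sent
requestWrite('version 3').then(() => console.log(uploaded));
```

With this in place, a burst of file changes costs at most two uploads: the one already running and one more with the latest content.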

Firestore Native Client SDK cold start? (React Native Firebase)

In short: is there some kind of cold start when connecting to Firestore directly from the client SDK?
Hey. I'm using the Firestore client SDK in an Android and iOS application through @react-native-firebase.
Everything works perfectly, but I have noticed weird behavior that I haven't found an explanation for.
I added logging to see how long it takes from user login to retrieving the data for the corresponding uid from Firestore, and this time has been ~0.4-0.6s. This is basically the whole onAuthStateChanged workflow.
let userLoggedIn: Date;
let userDataReceived: Date;
auth().onAuthStateChanged(async (user) => {
  userLoggedIn = new Date();
  const eventsRetrieved = async (data: UserInformation) => {
    userDataReceived = new Date();
    getDataDuration = `Get data duration: ${(
      (userDataReceived.getTime() - userLoggedIn.getTime()) /
      1000
    ).toString()}s`;
    console.log(getDataDuration)
    // function to check user role and to advance timing logs
    onUserDataReceived(data);
  };
  const errorRetrieved = () => {
    signOut();
    authStateChanged(false);
  };
  let unSub: (() => void) | undefined;
  if (user && user.uid) {
    const userListener = () => {
      return firestore()
        .collection('Users')
        .doc(user.uid)
        .onSnapshot((querySnapshot) => {
          if (querySnapshot && querySnapshot.exists) {
            const data = querySnapshot.data() as UserInformation;
            data.id = querySnapshot.id;
            eventsRetrieved(data);
          } else errorRetrieved();
        });
    };
    unSub = userListener();
  } else {
    if (typeof unSub === 'function') unSub();
    authStateChanged(false);
  }
});
Now the problem. When I open the application ~30-50 minutes after it was last opened, retrieving the data for the uid from Firestore takes ~3-9s. What is this delay and why does it happen? If I then reopen the application right away, the time is low again: ~0.4-0.6s.
I have been experiencing this behavior for weeks. It is hard to debug, as it happens only in the built application (not in local environments) and only after an interval of 30+ minutes.
Points to notice:
The listener query (which I'm using in this case; I have also tried a simple getDoc call) is really simple and targets a single document, and all project configuration works well. The long data retrieval occurs only after this time interval, which looks just like a cold start.
Firestore Rules should not be slowing the query, as subsequent requests are fast. The rules for the 'Users' collection are as follows, in pseudocode:
function checkCustomer() {
  let data =
    get(/databases/$(database)/documents/Users/$(request.auth.uid)).data;
  return (resource.data.customerID == data.customerID);
}
match /Users/{id} {
  allow read: if
    checkUserRole() // Checks user is logged in and has certain customClaim
    && idComparison(request.auth.uid, id) // Checks user uid is same as document id
    && checkCustomer() // User can read user data only if data is under same customer
}
The device cache doesn't seem to affect the issue, as the application's cache can be cleared and the "cold start" still occurs.
Firestore can be called from another environment or from another mobile device, and this "cold start" still occurs on each device individually (meaning it doesn't help if another device opened the application just before). This is unlike Cloud Run with min instances, where once a call has been fired from any environment, the following calls are fast regardless of the environment (web or mobile).
EDIT
I have also tested this by changing the listener to a simple getDoc call. The same behavior still happens in the built application. I replaced the listener with:
await firestore()
  .collection('Users')
  .doc(user.uid)
  .get()
  .then(async document => {
    if (document.exists) {
      const data = document.data() as UserInformation;
      if (data) data.id = document.id;
      eventsRetrieved(data);
    }
  });
EDIT2
Testing further, there has now been a 3-15s "cold start" on the first Firestore getDoc. Also, in some cases the time between app opens has been only 10 minutes, so the 30-minute minimum no longer applies. I'm going to contact the Firebase bug report team to investigate further.
Since you're using React Native, I assume that the documents in the snapshot are being stored in the local cache by the Firestore SDK (the local cache is enabled by default on native clients). And since you use an onSnapshot listener, it will actually re-retrieve the results from the server if the same listener is still active after 30 minutes. From the documentation:
If offline persistence is enabled and the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
The wording here is slightly different, but given the 30m mark you mention, I do expect that this is what you're affected by.
In the end I didn't find a straight answer as to why this cold start appeared. I ended up switching from the native client SDK to the web client SDK, which works correctly: the first data fetch takes ~0.6s (always 0.5-1s). The package change fixed the issue for me, even though the functions that fetch the data are almost completely identical.

hiredis run Sync command from Async Context

I'm using the hiredis C client library to interact with Redis in an async context.
At some point in my workflow I have to make a sync call to Redis, but I'm not able to get a successful response from Redis.
I'm not sure whether I can issue a sync command to Redis from an async context, but...
I have something like this:
redisAsyncContext * redis_ctx;
redisReply * reply;
// ...
reply = redisCommand(&(redis_ctx->c), COMMAND);
After the redisCommand call, my reply is NULL, which is documented as an error condition, and my redis_ctx->c looks like this:
err = 0
errstr = '\000' <repeats 127 times>
fd = 11
flags = 2
obuf = "*5\r\n$4\r\nEVAL\r\n$215\r\n\"math.randomseed(tonumber(ARGV[1])) local keys = redis.call('hkeys',KEYS[1]) if #keys == 0 then return nil end local key = keys[math.random(#keys)] local value = redis.call('hget', KEYS[1], key) return {key, value}\"\r\n$1\r\n1\r\n$0\r\n\r\n$1\r\n1\r\n"
reader = 0x943730
I can't figure out whether the command was issued or not.
Hope it's not too late. I'm no Redis expert, but if you need to make a sync call to Redis, why would you use an async context?
If you just use redisCommand with a redisContext, everything should be fine.
Assuming that variable ctx has been declared as
redisContext *ctx;
you can use redisCommand like this:
reply = (redisReply *)redisCommand(ctx, "HGET %s %s", hash, key);

Strange behaviour of firebase transaction

My firebase looks like this:
This is the test code (CoffeeScript):
Firebase = require 'firebase'
ref = new Firebase 'https://my_firebase.firebaseio.com/items'
ref.once 'child_added', (snapshot) ->
  childRef = snapshot.ref()
  console.log "child_added", childRef.toString(), snapshot.val()
  childRef.transaction(
    (data) ->
      console.log 'transaction on data', data
      return if !data or data.my_key isnt 'my_val'
      data.my_key = 'new_val'
      return data
    ,
    (err, commited, snapshot) ->
      if err
        console.error 'error', err
        return
      console.log 'commited? '+commited
      console.log 'server data', snapshot.val()
    ,
    false
  )
And output:
child_added https://my_firebase.firebaseio.com/items/item1 { my_key: 'my_val' }
transaction on data null
commited? false
server data null
The same happens when the third parameter of transaction(...) is true.
To make this code work, I have to change ref.once 'child_added', (snapshot) -> to ref.on 'child_added', (snapshot) -> (once to on). After this change the output is:
child_added https://my_firebase.firebaseio.com/items/item1 { my_key: 'my_val' }
transaction on data { my_key: 'my_val' }
commited? true
server data { my_key: 'new_val' }
It seems that, for some reason, when I use once the data is not synced properly, the local snapshot is not updated, and the transaction "thinks" that there is no data under the ref. Is it a bug, or am I doing something wrong? I know that a transaction's updateFunction can be called more than once, and I know about the third parameter (I have tried both true and false for it), but I still can't understand why the transaction does not work when I use once to obtain a child.
The transaction should eventually succeed, and run on the correct state of the data, but will initially run in an "uncached" state, meaning it will run against the client's local copy of the data (likely to be null), try to commit the change to the server (which will fail), and then re-try the transaction.
This is normal and expected. If, however, the transaction never succeeds, I would recommend reaching out to the support folks at support@firebase.com to continue troubleshooting the problem.
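To make the retry behaviour concrete, here is a deliberately simplified model of an optimistic transaction. It is not Firebase's implementation; simulateTransaction and both update functions are made up for illustration. It assumes what the documentation describes: the update function first runs against the (possibly null) local value, returning undefined aborts the transaction, and a mismatch with the server value triggers a re-run with the real data.

```javascript
// Simplified model only: an uncached client guesses null, the "server"
// rejects a write based on a wrong guess and hands back the real value.
function simulateTransaction(serverValue, update, maxTries = 5) {
  let assumed = null; // uncached client: the first guess is null
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    // Pass a copy so the update function cannot mutate our "server" value.
    const proposed = update(assumed === null ? null : { ...assumed });
    if (proposed === undefined) {
      return { committed: false, attempts: attempt, value: assumed };
    }
    if (JSON.stringify(assumed) === JSON.stringify(serverValue)) {
      return { committed: true, attempts: attempt, value: proposed };
    }
    assumed = serverValue; // guess was wrong: retry with the real data
  }
  return { committed: false, attempts: maxTries, value: serverValue };
}

// Mirrors the question's update function: gives up when it sees null.
const abortOnNull = (data) => {
  if (!data || data.my_key !== 'my_val') return;
  data.my_key = 'new_val';
  return data;
};

// Variant that tolerates the uncached first run by passing null through.
const tolerateNull = (data) => {
  if (data === null) return null;
  if (data.my_key !== 'my_val') return;
  data.my_key = 'new_val';
  return data;
};

const server = { my_key: 'my_val' };
console.log(simulateTransaction(server, abortOnNull));
// { committed: false, attempts: 1, value: null }
console.log(simulateTransaction(server, tolerateNull));
// { committed: true, attempts: 2, value: { my_key: 'new_val' } }
```

In this model, an update function that returns early on null aborts on the first, uncached run, while one that passes null through is re-run with the server value and commits on the second attempt. Whether this fully explains the difference between once and on in the question is an assumption.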

Download large file with LuaSocket's HTTP module while keeping UI responsive

I would like to use LuaSocket's HTTP module to download a large file while displaying progress in the console and later on in a GUI. The UI must never block, not even when the server is unresponsive during the transfer. Additionally, creating a worker thread to handle the download is not an option.
Here's what I got so far:
local io = io
local ltn12 = require("ltn12")
local http = require("socket.http")
local fileurl = "http://www.example.com/big_file.zip"
local fileout_path = "big_file.zip"
local file_size = 0
local file_down = 0
-- counter filter used in ltn12
function counter(chunk)
  if chunk == nil then
    return nil
  elseif chunk == "" then
    return ""
  else
    file_down = file_down + #chunk
    ui_update(file_size, file_down) -- update ui, run main ui loop etc.
    return chunk -- return unmodified chunk
  end
end
-- first request
-- determine file size
local r, c, h = http.request {
  method = "HEAD",
  url = fileurl
}
file_size = h["content-length"]
-- second request
-- download file
r, c, h = http.request {
  method = "GET",
  url = fileurl,
  -- set our chain, count first then write to file
  sink = ltn12.sink.chain(
    counter,
    ltn12.sink.file(io.open(fileout_path, "w"))
  )
}
There are a few problems with the above, ignoring error checking and hard-coding:
It requires 2 HTTP requests when it is possible with only 1 (a normal GET request also sends content-length)
If the server is unresponsive, then the UI will also be unresponsive, as the filter only gets called when there is data to process.
How could I do this making sure the UI never blocks?
There is an example of non-preemptive multithreading in Programming in Lua that uses non-blocking LuaSocket calls and coroutines to run multiple downloads in parallel. It should be possible to apply the same logic to your process to avoid blocking. I can only add that you should consider calling this logic from the idle event in your GUI (if there is such a thing) to avoid getting "attempt to yield across metamethod/c-call boundary" errors.

is node.js' console.log asynchronous?

Are console.log/debug/warn/error in Node.js asynchronous? I mean, will JavaScript code execution halt until the output is printed on screen, or will it be printed at a later stage?
Also, I am interested in knowing whether it is possible for a console.log to NOT display anything if the statement immediately after it crashes Node.
Update: Starting with Node 0.6 this post is obsolete, since stdout is synchronous now.
Well let's see what console.log actually does.
First of all it's part of the console module:
exports.log = function() {
process.stdout.write(format.apply(this, arguments) + '\n');
};
So it simply does some formatting and writes to process.stdout, nothing asynchronous so far.
process.stdout is a getter defined on startup which is lazily initialized, I've added some comments to explain things:
.... code here...
process.__defineGetter__('stdout', function() {
  if (stdout) return stdout; // only initialize it once
  /// many requires here ...
  if (binding.isatty(fd)) { // a terminal? great!
    stdout = new tty.WriteStream(fd);
  } else if (binding.isStdoutBlocking()) { // a file?
    stdout = new fs.WriteStream(null, {fd: fd});
  } else {
    stdout = new net.Stream(fd); // a stream?
    // For example: node foo.js > out.txt
    stdout.readable = false;
  }
  return stdout;
});
In the case of a TTY on UNIX we end up in the tty.WriteStream branch, which inherits from socket. So all that Node basically does is push the data onto the socket; the terminal then takes care of the rest.
Let's test it!
var data = '111111111111111111111111111111111111111111111111111';
for(var i = 0, l = 12; i < l; i++) {
data += data; // warning! gets very large, very quick
}
var start = Date.now();
console.log(data);
console.log('wrote %d bytes in %dms', data.length, Date.now() - start);
Result
....a lot of ones....1111111111111111
wrote 208896 bytes in 17ms
real 0m0.969s
user 0m0.068s
sys 0m0.012s
The terminal needs around 1 second to print out the socket's content, but Node needs only 17 milliseconds to push the data to the terminal.
The same goes for the stream case, and the file case is also handled asynchronously.
So yes, Node.js holds true to its non-blocking promises.
console.warn() and console.error() are blocking. They do not return until the underlying system calls have succeeded.
Yes, it is possible for a program to exit before everything written to stdout has been flushed. process.exit() will terminate node immediately, even if there are still queued writes to stdout. You should use console.warn to avoid this behavior.
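If the last message must not be lost, one option is to exit only after the stream reports that the write has been handled. The sketch below uses the callback argument of stream.write(); writeThen is a hypothetical helper name, and the demonstration writes to an in-memory stream rather than to stdout.

```javascript
const { Writable } = require('node:stream');

// Hypothetical helper: write text to any writable stream and call done(err)
// once the stream has handled the chunk.
function writeThen(stream, text, done) {
  stream.write(text, (err) => done(err));
}

// Demonstration with an in-memory sink standing in for process.stdout.
const received = [];
const memorySink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  },
});

writeThen(memorySink, 'last words\n', (err) => {
  // With process.stdout as the stream, this is where process.exit() would go.
  console.log('handled:', received, 'error:', err);
});
```

In a real script the call would be writeThen(process.stdout, 'shutting down\n', () => process.exit(0)), so the exit happens only after the final line has been handed off.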
My conclusion, after reading the Node.js 10.* docs (quoted below), is that you can use console.log for logging; console.log is synchronous and implemented in low-level C.
Although console.log is synchronous, it won't cause a performance issue as long as you are not logging huge amounts of data.
(The command-line example below demonstrates a case where console.log is async and console.error is sync.)
Based on the Node.js docs:
The console functions are synchronous when the destination is a terminal or a file (to avoid lost messages in case of premature exit) and asynchronous when it's a pipe (to avoid blocking for long periods of time).
That is, in the following example, stdout is non-blocking while stderr is blocking:
$ node script.js 2> error.log | tee info.log
In daily use, the blocking/non-blocking dichotomy is not something you should worry about unless you log huge amounts of data.
Hope it helps
console.log is asynchronous on Windows, while it is synchronous on Linux/Mac. To make console.log synchronous on Windows, put this line at the start of your code, probably in your index.js file. Any console.log after this statement will be treated as synchronous by the interpreter.
if (process.stdout._handle) process.stdout._handle.setBlocking(true);
You can use this for synchronous logging:
const fs = require('fs')
fs.writeSync(1, 'Sync logging\n')
