I want to refactor my Cloud Functions code to improve readability and maintainability. The code below works, but after waiting for all promises to complete with Promise.all(), the function times out.
The things I don't understand are:
It works fine and completes without a timeout when toiletJsonObject["fields"]["adresse"] = formatAddress(toiletJsonObject["fields"]["adresse"]) is commented out.
If it works without the line above, the timeout should be due to the formatAddress() function. However, this function is not async and just returns a string synchronously. Maybe that's what I misunderstand.
So my questions are:
How do I correct my code to avoid the timeout?
What's the best way to factor code out into helper functions that are only used inside the file and therefore don't need to be exported?
The entire code:
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";
import fetch from "node-fetch";
admin.initializeApp();
const db = admin.firestore();
export const tempoCF = functions.firestore.document("/tempo/{docId}").onCreate(async () => {
    console.log("onCreate")
    const settings = { method: "Get" }
    const metaUrl = "https://opendata.paris.fr/api/datasets/1.0/sanisettesparis/"
    const toiletUpdateDateRef = db.collection('toilets').doc("updateDate")
    try {
        // Get meta data to check the last update date
        const metaResponse = await fetch(metaUrl, settings)
        const metaJson = await metaResponse.json()
        const metaUpdateDate = metaJson["metas"]["modified"]
        const lastUpdatedDateDoc = await toiletUpdateDateRef.get()
        if (!lastUpdatedDateDoc.exists) {
            console.log("No existing date document, create one and add last update date: " + metaUpdateDate)
            await fetchDataFromURL()
            return toiletUpdateDateRef.set({ "lastUpdateDate": metaUpdateDate })
        } else {
            const lastUpdateDate = lastUpdatedDateDoc.data()["lastUpdateDate"]
            // If the date from the meta data is newer than the saved date: get data and update
            if (new Date(lastUpdateDate) < new Date(metaUpdateDate)) {
                console.log("New data available, update database")
                await fetchDataFromURL()
                return toiletUpdateDateRef.set({ "lastUpdateDate": metaUpdateDate })
            } else {
                console.log("No new data available, do nothing")
                return null
            }
        }
    } catch (error) {
        console.log(error);
        return null;
    }
});
async function fetchDataFromURL() {
    const dataUrl = "https://opendata.paris.fr/api/records/1.0/search/?dataset=sanisettesparis&q=&rows=-1"
    const settings = { method: "Get" }
    try {
        const response = await fetch(dataUrl, settings)
        const json = await response.json()
        const promises = []
        console.log("fetch data and add toilets to collection")
        json["records"].forEach(toiletJsonObject => {
            delete toiletJsonObject["fields"]["geo_shape"]
            toiletJsonObject["fields"]["adresse"] = formatAddress(toiletJsonObject["fields"]["adresse"])
            console.log("after updating adresse field: " + toiletJsonObject["fields"].toString())
            const p = db.collection("toilets").doc(toiletJsonObject["recordid"]).set(toiletJsonObject["fields"])
            promises.push(p)
        })
        console.log("finished creating promises. Wait for all to complete")
        return Promise.all(promises);
    } catch (error) {
        console.log(error);
        return null;
    }
}
const linkWords = ["de", "des", "du", "le"]
const linkLetters = ["l", "d"]
const firstWordsAddress = ["face", "opposé", "au"]
const alwaysLowerCaseWords = ["ville", "rue"]
function formatAddress(address) {
    let processedAddress = ""
    if (address != null) {
        //if (address.length <= 1) processedAddress = address.toUpperCase();
        // Split string into list of words
        var wordsList = address.split(' ')
            .filter((word) => {
                // If there is a word in front of the street number, don't use it
                if (firstWordsAddress.includes(word.toLowerCase())) return false
                // Else use it
                return true
            })
        var capitalizedList = wordsList.map((word) => {
            const lowerCaseWord = word.toLowerCase() //TOSTRING ?
            // If current word is a link word, don't capitalize
            if (linkWords.includes(lowerCaseWord))
                return lowerCaseWord
            // If current word is a link letter, add ' char
            else if (linkLetters.includes(lowerCaseWord))
                return lowerCaseWord + '\''
            // If current word should always be in lower case, don't capitalize
            else if (alwaysLowerCaseWords.includes(lowerCaseWord))
                return word.toLowerCase() //TOSTRING
            // Else, capitalize the word
            return word[0].toUpperCase() + word.substr(1).toLowerCase()
        });
        // Always capitalize first word of the address
        capitalizedList[0] = capitalizedList[0][0].toUpperCase() + capitalizedList[0].substr(1).toLowerCase()
        processedAddress = capitalizedList.join(' ')
        processedAddress = processedAddress.replace("\' ", "\'")
        processedAddress = processedAddress.trim()
    }
    return processedAddress
}
Regarding the formatAddress() helper function you defined, there doesn't appear to be an issue with it in its current form. It can happily run through the entire list of 644 addresses ~210 times per second.
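For example, calling it directly returns immediately (this snippet is just an illustration; the address is made up):
// Synchronous call; no Promise involved, so it can't be the source of the timeout.
console.log(formatAddress("FACE AU 12 RUE DE BELLEVILLE"));
// -> "12 rue de Belleville"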
Any timeouts are instead likely to be caused by performing so many database writes in quick succession. When running fetchDataFromURL(), you "spam" the Firestore server with a request for each toilet object you are uploading.
The best-practice approach would be to compile a Batched Write and then commit the result once you've finished processing the data.
As stated in that documentation:
A batched write can contain up to 500 operations. Each operation in the batch counts separately towards your Cloud Firestore usage. Within a write operation, field transforms like serverTimestamp, arrayUnion, and increment each count as an additional operation.
Note: The current list of field transforms includes serverTimestamp, arrayUnion, arrayRemove, and increment. Reference: FieldValue
Creating/deleting/writing a document to Firestore is considered "one operation". Because a field transform requires reading the document, then writing data to that document, it is counted as "two operations".
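To illustrate the counting, here is a minimal sketch (the document IDs and field values are made up) contrasting a plain write with one that includes a serverTimestamp field transform:
const batch = db.batch();
// Plain data: counts as ONE operation towards the 500-operation limit.
batch.set(db.collection("toilets").doc("toilet-1"), { adresse: "20 Rue Santerre" });
// Contains a FieldValue transform: counts as TWO operations.
batch.set(db.collection("toilets").doc("updateDate"), {
    lastUpdateDate: admin.firestore.FieldValue.serverTimestamp()
});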
Because a single batched write is limited to 500 operations, you should split your data up into smaller batched writes so that each batch is less than this 500 operations limit. The easiest way to achieve this would be to use this MultiBatch class (included below) that I've updated from one of my old answers.
If the data you are writing to a Cloud Firestore document is just basic data, use one of multibatch.create(), multibatch.delete(), multibatch.set(), or multibatch.update(). Each time one of these is called, the internal operations counter is increased by 1.
If the data you are writing to Cloud Firestore contains any FieldValue transforms, use one of multibatch.transformCreate(), multibatch.transformSet(), or multibatch.transformUpdate(). Each time one of these is called, the internal operations counter is increased by 2.
Once the internal counter would exceed 500, it automatically starts a new batched write and adds it to its internal list.
When you've queued up all your data ready to send off to Firestore, call multibatch.commit().
console.log("Fetching data from third-party server...")
const response = await fetch(dataUrl, settings)
const json = await response.json()
console.log("Data obtained. Parsing as Firestore documents...")
const batch = new MultiBatch(db)
json["records"].forEach(toiletJsonObject => {
delete toiletJsonObject["fields"]["geo_shape"]
toiletJsonObject["fields"]["adresse"] = formatAddress(toiletJsonObject["fields"]["adresse"])
console.log("after updating adresse field: " + toiletJsonObject["fields"].toString())
batch.set(db.collection("toilets").doc(toiletJsonObject["recordid"]), toiletJsonObject["fields"])
})
console.log("Finished parsing. Committing data to Firestore...")
const results = await batch.commit() // see notes about MultiBatch#commit()
console.log("Finished data upload!")
return results;
import { firestore } from "firebase-admin";

/**
 * Helper class to compile an expanding `firestore.WriteBatch`.
 *
 * Using an internal operations counter, this class will automatically start a
 * new `firestore.WriteBatch` instance when it detects it has hit the operations
 * limit of 500. Once prepared, you can commit the batches together.
 *
 * Note: `FieldValue` transform operations such as `serverTimestamp`,
 * `arrayUnion`, `arrayRemove`, `increment` are counted as two operations. If
 * your written data makes use of one of these, you should use the appropriate
 * `transformCreate`, `transformSet` or `transformUpdate` method so that the
 * internal counter is correctly increased by 2 (the normal versions only
 * increase the counter by 1).
 *
 * If not sure, just use `delete`, `transformCreate`, `transformSet`, or
 * `transformUpdate` for every operation, as this will make sure you don't
 * exceed the limit.
 *
 * @author Samuel Jones [MIT License] (@samthecodingman)
 * @see https://stackoverflow.com/a/66692467/3068190
 * @see https://firebase.google.com/docs/firestore/manage-data/transactions
 * @see https://firebase.google.com/docs/reference/js/firebase.firestore.FieldValue
 */
export class MultiBatch {
    constructor(dbRef) {
        this.dbRef = dbRef;
        this.committed = false;
        this.currentBatch = this.dbRef.batch();
        this.currentBatchOpCount = 0;
        this.batches = [this.currentBatch];
    }

    _getCurrentBatch(count) {
        if (this.committed) throw new Error("MultiBatch already committed.");
        if (this.currentBatchOpCount + count > 500) {
            // operation limit exceeded, start a new batch
            this.currentBatch = this.dbRef.batch();
            this.currentBatchOpCount = 0;
            this.batches.push(this.currentBatch);
        }
        this.currentBatchOpCount += count;
        return this.currentBatch;
    }

    /** Creates the document, fails if it exists. */
    create(ref, data) {
        this._getCurrentBatch(1).create(ref, data);
        return this;
    }

    /**
     * Creates the document, fails if it exists.
     *
     * Used for writes that contain serverTimestamp, arrayUnion, etc.
     */
    transformCreate(ref, data) {
        this._getCurrentBatch(2).create(ref, data);
        return this;
    }

    /** Writes the document, creating/overwriting/etc as applicable. */
    set(ref, data, options = undefined) {
        this._getCurrentBatch(1).set(ref, data, options);
        return this;
    }

    /**
     * Writes the document, creating/overwriting/etc as applicable.
     *
     * Used for writes that contain serverTimestamp, arrayUnion, etc.
     */
    transformSet(ref, data, options = undefined) {
        this._getCurrentBatch(2).set(ref, data, options);
        return this;
    }

    /** Merges data into the document, failing if the document doesn't exist. */
    update(ref, data, ...fieldsOrPrecondition) {
        this._getCurrentBatch(1).update(ref, data, ...fieldsOrPrecondition);
        return this;
    }

    /**
     * Merges data into the document, failing if the document doesn't exist.
     *
     * Used for writes that contain serverTimestamp, arrayUnion, etc.
     */
    transformUpdate(ref, data, ...fieldsOrPrecondition) {
        this._getCurrentBatch(2).update(ref, data, ...fieldsOrPrecondition);
        return this;
    }

    /** Deletes the document. */
    delete(ref) {
        this._getCurrentBatch(1).delete(ref);
        return this;
    }

    /**
     * Commits all of the batches to Firestore.
     *
     * Note: Unlike normal batch operations, this may cause one or more atomic
     * writes. One batch may succeed where others fail. By default, if any batch
     * fails, it will fail the whole promise. This can be suppressed by passing in
     * a truthy value as the first argument and checking the results returned by
     * this method.
     *
     * @param {boolean} [suppressErrors=false] Whether to suppress errors on a
     * per-batch basis.
     * @return {Promise<firestore.WriteResult[][]>} a promise for an array
     * containing an array of `WriteResult` objects (or error-batch pairs if
     * `suppressErrors=true`), one entry per batch.
     */
    commit(suppressErrors = false) {
        this.committed = true;
        const mapCallback = suppressErrors
            ? (batch) => batch.commit().catch((error) => ({ error, batch }))
            : (batch) => batch.commit();
        return Promise.all(this.batches.map(mapCallback));
    }
}
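As a usage note, here is a hedged sketch of committing with error suppression and inspecting the per-batch results (the result shape follows the commit() implementation above):
const results = await batch.commit(/* suppressErrors: */ true);
for (const result of results) {
    // On success, `result` is an array of WriteResult objects; on failure
    // (with suppression), it is an { error, batch } pair instead.
    if (result && result.error) {
        console.error("A batch failed to commit:", result.error);
    }
}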
Before I start the business process, I select the attachments. I can do this many times: remove attachments and choose them again.
I want to display a dynamic table with information about the attachments.
For example, to retrieve all the attachment details, I use code like this:
...
var divWithAnchors = YAHOO.util.Selector.query("#page_x002e_data-form_x002e_task-details_x0023_default_assoc_packageItems-cntrl")[0];
var anchors = divWithAnchors.getElementsByTagName('a');
var attachments = new Array();
for (var i = 0; i < anchors.length; i++) {
    attachments[i] = anchors[i].href.split('=')[1];
}
...
It gives me references to nodes, for example:
...
workspace://SpacesStore/c5a27463-c2aa-4c70-aca7-1f999d3ac76a
workspace://SpacesStore/29e9f035-403c-47b6-8421-624d584ff7eb
workspace://SpacesStore/712aaca2-9c90-4733-a690-bbf9bacb26e6
workspace://SpacesStore/68893fde-ee7c-4ecb-a2df-d4953dc69439
...
Then I can do AJAX requests to the REST back-end (WebScripts) and get the responses:
...
for (var i = 0; i < attachments.length; i++) {
    Alfresco.util.Ajax.jsonGet(
        ...
        // parse JSON and fill the table
Is this the correct way? I'm not sure about the ID:
page_x002e_data-form_x002e_task-details_x0023_default_assoc_packageItems-cntrl
Is this a constant? Can this identifier be changed?
In fact, all these NodeRefs are available in the object selectedItems = {} and can be obtained in the method getAddedItems() (see object-finder.js):
...
/**
 * Selected items. Keeps a list of selected items for correct Add button state.
 *
 * @property selectedItems
 * @type object
 */
selectedItems: null,
...
/**
 * Returns items that have been added to the current value
 *
 * @method getAddedItems
 * @return {array}
 */
getAddedItems: function ObjectFinder_getAddedItems() {
    var addedItems = [],
        currentItems = Alfresco.util.arrayToObject(this.options.currentValue.split(","));
    for (var item in this.selectedItems) {
        if (this.selectedItems.hasOwnProperty(item)) {
            if (!(item in currentItems)) {
                addedItems.push(item);
            }
        }
    }
    return addedItems;
},
...
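For example, instead of scraping the DOM, you could look up the component instance and call this method directly. A minimal sketch, assuming a single object finder control on the page (the lookup parameters are an assumption, not from the original answer):
var objectFinders = Alfresco.util.ComponentManager.find({ name: "Alfresco.ObjectFinder" });
if (objectFinders.length > 0) {
    // Each entry is a NodeRef string such as "workspace://SpacesStore/..."
    var attachments = objectFinders[0].getAddedItems();
}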
Next, you need to send these NodeRefs to a WebScript and get all the necessary properties using the NodeService service.
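A minimal sketch of such a repository-tier WebScript controller (the URL, parameter name, and returned fields here are assumptions for illustration):
// GET /alfresco/service/sample/attachment-details?nodeRefs=workspace://SpacesStore/...,workspace://SpacesStore/...
var nodeRefs = (args.nodeRefs || "").split(","),
    items = [];
for (var i = 0; i < nodeRefs.length; i++) {
    var node = search.findNode(nodeRefs[i]); // null if the node does not exist
    if (node !== null) {
        items.push({
            nodeRef: nodeRefs[i],
            name: node.properties["cm:name"],
            title: node.properties["cm:title"],
            modified: node.properties["cm:modified"]
        });
    }
}
model.items = items; // rendered by the accompanying *.get.json.ftl template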
So I followed Will Abson's guide and source code for extending custom media viewers in Alfresco.
I have a couple of issues though.
I'm already on Alfresco 4.2+, so there's no need to use the deprecated head.ftl; I'm using a second extensibility module to add my own configuration automatically, BUT:
How can I access the jsNode in my web-preview.get.js? Or better, is there a way to access the property values and aspects of the node being displayed?
I know about both the server- and client-side approaches, var jsNode = new Alfresco.util.Node(model.widgets[i].options.nodeRef) and var jsNode = AlfrescoUtil.getNodeDetails(model.widgets[i].options.nodeRef),
which were mentioned in another question here, but it seems that, apart from default values like mimeType, size, and nodeRef, I'm not able to use those to get data from the file.
These are my changes:
web-preview.get.js in the -config folder of my custom media viewer
//<import resource="classpath:/alfresco/templates/org/alfresco/import/alfresco-util.js">
if (model.widgets)
{
for (var i = 0; i < model.widgets.length; i++)
{
var at = "test";
//var jsNode = AlfrescoUtil.getNodeDetails(model.widgets[i].options.nodeRef);
//var author = jsNode.properties["cm:author"];
var widget = model.widgets[i];
if (widget.id == "WebPreview")
{
var conditions = [];
// Insert new pluginCondition(s) at start of the chain
conditions.push({
attributes: {
mimeType: "application/pdf"
},
plugins: [{
name: "PDF",
attributes: {
}
}]
});
var oldConditions = eval("(" + widget.options.pluginConditions + ")");
// Add the other conditions back in
for (var j = 0; j < oldConditions.length; j++)
{
conditions.push(oldConditions[j]);
}
// Override the original conditions
model.pluginConditions = jsonUtils.toJSONString(conditions);
widget.options.pluginConditions = model.pluginConditions;
}
}
}
PDF.js
/**
 * Copyright (C) 2014 Will Abson
 */

/**
 * This is the "PDF" plug-in used to display documents directly in the web browser.
 *
 * Supports the "application/pdf" mime types.
 *
 * @namespace Alfresco.WebPreview.prototype.Plugins
 * @class Alfresco.WebPreview.prototype.Plugins.PDF
 */
(function()
{
   /**
    * PDF plug-in constructor
    *
    * @param wp {Alfresco.WebPreview} The Alfresco.WebPreview instance that decides which plugin to use
    * @param attributes {Object} Arbitrary attributes brought in from the <plugin> element
    */
   Alfresco.WebPreview.prototype.Plugins.PDF = function(wp, attributes)
   {
      this.wp = wp;
      this.attributes = YAHOO.lang.merge(Alfresco.util.deepCopy(this.attributes), attributes);
      //this.wp.options.nodeRef = this.wp.nodeRef;
      return this;
   };

   Alfresco.WebPreview.prototype.Plugins.PDF.prototype =
   {
      /**
       * Attributes
       */
      attributes:
      {
         /**
          * Maximum size to display given in bytes if the node's content is used.
          * If the node content is larger than this value the image won't be displayed.
          * Note! This doesn't apply if src is set to a thumbnail.
          *
          * @property srcMaxSize
          * @type String
          * @default "2000000"
          */
         srcMaxSize: "2000000"
      },

      /**
       * Tests if the plugin can be used in the user's browser.
       *
       * @method report
       * @return {String} Returns nothing if the plugin may be used, otherwise returns a message containing the reason
       * it can't be used as a string.
       * @public
       */
      report: function PDF_report()
      {
         // TODO: Detect whether Adobe PDF plugin is installed, or if navigator is Chrome
         // See https://stackoverflow.com/questions/185952/how-do-i-detect-the-adobe-acrobat-version-installed-in-firefox-via-javascript
         var srcMaxSize = this.attributes.srcMaxSize;
         if (!this.attributes.src && srcMaxSize.match(/^\d+$/) && this.wp.options.size > parseInt(srcMaxSize))
         {
            return this.wp.msg("pdf.tooLargeFile", this.wp.options.name, Alfresco.util.formatFileSize(this.wp.options.size), Alfresco.util.formatFileSize(this.attributes.srcMaxSize));
         }
      },

      /**
       * Display the node.
       *
       * @method display
       * @public
       */
      display: function PDF_display()
      {
         // TODO: Support rendering the content of the thumbnail specified
         var src = this.wp.getContentUrl();
         var test = this.attributes.author;
         //var test = this.wp.options.nodeRef;
         //var jsNode = new Alfresco.util.Node(test);
         //var jsNode = AlfrescoUtil.getNodeDetails(this.wp.options.nodeRef);
         //var author = jsNode.properties["cm:author"];
         //var test = this.wp.options.author;
         //var test1 = this.wp.options.mimeType;
         //var test = this.attributes.author.replace(/[^\w_\-\. ]/g, "");
         //.replace(/[^\w_\-\. ]/g, "");
         return '<iframe name="' + test + '" src="' + src + '"></iframe>';
      }
   };
})();
As you can see from the commented sections, I tried different methods to access node properties/values, even simple strings, but I'm missing something for sure.
Thanks.
If you take a look at the source code, you'll see that the helper method does nothing more than a remote call to var url = '/slingshot/doclib2/node/' + nodeRef.replace('://', '/');
So take a look at what that repository WebScript returns and match it to the properties you need.
I don't normally use this one, but I know for sure that /api/metadata returns all the properties.
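For instance, a hedged client-side sketch of fetching those properties through the Share proxy (assuming a nodeRef variable is in scope):
Alfresco.util.Ajax.jsonGet({
    url: Alfresco.constants.PROXY_URI + "api/metadata?nodeRef=" + encodeURIComponent(nodeRef),
    successCallback: {
        fn: function (response) {
            // Properties come back keyed by full QName, e.g.
            // response.json.properties["{http://www.alfresco.org/model/content/1.0}author"]
            var properties = response.json.properties;
        },
        scope: this
    }
});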
I have the following DynamoDB query, which returns the first record with hash key "apple" and a timestamp less than or equal to some_timestamp.
Map<String, Condition> keyConditions = newHashMap();
keyConditions.put("HASH", new Condition().
withComparisonOperator(EQ).
withAttributeValueList(new AttributeValue().withS("apple")))
);
keyConditions.put("TIMESTAMP", new Condition().
withComparisonOperator(LE).
withAttributeValueList(new AttributeValue().withN(some_timestamp)))
);
QueryResult queryResult = dynamoDBClient.query(
new QueryRequest().
withTableName("TABLE").
withKeyConditions(keyConditions).
withLimit(1).
withScanIndexForward(SCAN_INDEX_FORWARD)
);
I need to execute many queries of this kind, so my question is: is it possible to batch-execute these queries? Something like the following API:
Map<String, Condition> keyConditions = newHashMap();
keyConditions.put("HASH", new Condition().
withComparisonOperator(EQ).
withAttributeValueList(new AttributeValue().withS("apple")))
);
keyConditions.put("TIMESTAMP", new Condition().
withComparisonOperator(LE).
withAttributeValueList(new AttributeValue().withN(some_timestamp)))
);
QueryRequest one = new QueryRequest().
withTableName("TABLE").
withKeyConditions(keyConditions).
withLimit(1).
withScanIndexForward(SCAN_INDEX_FORWARD);
keyConditions = newHashMap();
keyConditions.put("HASH", new Condition().
withComparisonOperator(EQ).
withAttributeValueList(new AttributeValue().withS("pear")))
);
keyConditions.put("TIMESTAMP", new Condition().
withComparisonOperator(LE).
withAttributeValueList(new AttributeValue().withN(some_other_timestamp)))
);
QueryRequest two = new QueryRequest().
withTableName("TABLE").
withKeyConditions(keyConditions).
withLimit(1).
withScanIndexForward(SCAN_INDEX_FORWARD)
ArrayList<String> queryRequests = new ArrayList<String>() {{
add(one);
add(two);
}};
List<QueryResult> queryResults = dynamoDBClient.query(queryRequests);
From a very similar question in the AWS forums here:
DynamoDB's Query API only supports a single "use" of the index in the query operation, and as a result, the "hash" of the index you're querying has to be specified as an EQ condition. DynamoDB does not currently have any kind of "batch query" API, so unfortunately what you're looking for is not possible today in a single API call. If these were GetItem requests (not suitable for your use case though), you could issue a BatchGetItem request.
In the meantime, since it looks like you're using Java, my recommendation would be to use threads to issue multiple query requests in parallel. Here's some sample code that accomplishes this, but you'll want to consider how you want your application to handle pagination / partial results, and errors:
/**
 * Simulate a "Batch Query" operation in DynamoDB by querying an index for
 * multiple hash keys.
 *
 * Resulting list may be incomplete if any queries time out. Returns a list of
 * QueryResult so that LastEvaluatedKeys can be followed. A better implementation
 * would answer the case where some queries fail, deal with pagination (and
 * Limit), and have configurable timeouts. One improvement on this end would be
 * to make a simple immutable bean that contains a query result or exception,
 * as well as the associated request. Maybe it could even be called back with
 * a previous list for pagination.
 *
 * @param hashKeyValues (you'll also need table name / index name)
 * @return a list of query results for the queries that succeeded
 * @throws InterruptedException
 */
public List<QueryResult> queryAll(String... hashKeyValues)
throws InterruptedException {
// initialize accordingly
int timeout = 2 * 1000;
ExecutorService executorService = Executors.newFixedThreadPool(10);
final List<QueryResult> results =
new ArrayList<QueryResult>(hashKeyValues.length);
final CountDownLatch latch =
new CountDownLatch(hashKeyValues.length);
// Loop through the hash key values to "OR" in the final list of results
for (final String hashKey : hashKeyValues) {
executorService.submit(new Runnable() {
#Override
public void run() {
try {
// fill in parameters
QueryResult result = dynamodb.query(new QueryRequest()
.withTableName("MultiQueryExample")
.addKeyConditionsEntry("City", new Condition()
.withComparisonOperator("EQ")
.withAttributeValueList(new AttributeValue(hashKey))));
// one of many flavors of dealing with concurrency
synchronized (results) {
results.add(result);
}
} catch (Throwable t) {
// Log and handle errors
t.printStackTrace();
} finally {
latch.countDown();
}
}
});
}
// Wait for all queries to finish or time out
latch.await(timeout, TimeUnit.MILLISECONDS);
// return a copy to prevent concurrent modification of
// the list in the face of timeouts
synchronized (results) {
return new ArrayList<QueryResult>(results);
}
}
I'm using 'Simple Login Web' with Google, and I discovered that in the auth callback, the email property is missing and the thirdPartyUserData object is empty when I set preferRedirect to true.
This is either:
a bug in Firebase
a bug in my own code that I should fix myself (see the code below; also, the configuration at Google is done exactly as instructed at https://www.firebase.com/docs/security/simple-login-google.html)
a known restriction that needs to be documented
So, my question is: Which? And if it's a bug in my own code, how do I fix it?
var ref = new Firebase('https://<myfirebase>.firebaseio.com/');
var auth = new FirebaseSimpleLogin(ref, function(error, user) {
// When logging in with `preferRedirect: true`, `user` contains:
// * accessToken
// * displayName
// * firebaseAuthToken
// * id
// * provider
// * thirdPartyUserData: empty object
// * uid
//
// When logging in WITHOUT `preferRedirect`, `user` contains:
// * accessToken
// * displayName
// * email
// * firebaseAuthToken
// * id
// * provider
// * thirdPartyUserData:
// * email
// * family_name
// * gender
// * given_name
// * hd
// * id
// * link
// * locale
// * name
// * picture
// * verified_email
// * uid
});
auth.login('google', {
preferRedirect: true,
rememberMe: true,
scope: 'https://www.googleapis.com/auth/plus.login'
});