I'm currently trying to learn Symfony, and a big part of it is Doctrine. I've been reading the official Doctrine documentation, and in the part about the Collections library I stumbled upon this thing called an "ordered map". I tried to search for it on Google, but I wasn't able to find any satisfying answer; there were just answers for specific languages (mostly Java and C++), and I want to understand it in general: how it works and what it is. In the Doctrine documentation they compare it to the ArrayCollection, so I hope that if I can understand what an ordered map is, it will be easier for me to understand ArrayCollection as well.
I tried to search for things like "what is an ordered map" or "ordered map explained", but as I said earlier, I didn't find what I was looking for.
A map is sometimes called ordered when the entries remain in the same sequence in which they were inserted.
For example, arrays in PHP are ordered (preserve insertion order). So creating/modifying an array like this:
$array = [2 => 'a', 1 => 'b'];
$array[0] = 'c';
will indeed result in the PHP array [2 => 'a', 1 => 'b', 0 => 'c'] - it preserves the insertion order - while in some other languages it will be turned into [0 => 'c', 1 => 'b', 2 => 'a'].
This affects a few operations. Iterating over an array with foreach will return the entries in insertion order. You can do key-wise or value-wise sorting on PHP arrays; note that the default sorting function sort() drops the original keys and reindexes numerically. Serialization and deserialization with numeric keys may have unintended consequences. There are other effects that are sometimes beneficial and sometimes surprising or annoying (or both); you can read about many of them on PHP's array documentation page and the array function pages.
In the context of Doctrine (since it's written in PHP) this means that a collection whose values are entity objects can be sorted in any manner you want (including by id, of course), and when you iterate over that collection, you get the entity objects in the order Doctrine added them (the order of the SQL/DQL query). Doctrine also allows you to set the keys to the entities' ids while still preserving the SQL/DQL query order. This can simplify code, since Doctrine's Collection implements PHP's ArrayAccess.
As a counterexample, maps can also be unordered or sorted. Unordered means that when you retrieve the pairs, the order can be arbitrary (in Go, for instance, map iteration order is deliberately randomized); sorted means the entries are automatically kept sorted by key (like SortedMap in Java).
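To see the same distinction outside PHP, here is a small JavaScript sketch (purely illustrative): a plain object re-orders integer-like keys, while a Map behaves as an ordered map and keeps insertion order.
// A plain object enumerates integer-like keys in ascending numeric order,
// not in insertion order:
const obj = { 2: 'a', 1: 'b' };
obj[0] = 'c';
console.log(Object.keys(obj)); // ['0', '1', '2']
// A Map keeps insertion order when you iterate over it:
const map = new Map([[2, 'a'], [1, 'b']]);
map.set(0, 'c');
console.log([...map.keys()]); // [2, 1, 0]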
Related
I noticed that Firestore allows arrays and some operations on them like containsAny([...]).
I'm thinking of having an array of values, but the values I'll be putting in are UUIDs (strings). So it may look like this:
MyCollection {
categoryIds List<String>
}
And I'll do operations like where(categoryIds, containsAny(uuid1, uuid2))
Is there a performance hit compared to storing numbers instead of strings? Does it matter?
Firestore queries are generally based on indexes, so I doubt there's any performance difference between the two.
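For reference, here is roughly what that query looks like with the JavaScript web SDK (the collection and field names are taken from the question; the UUID values and the db instance are placeholders):
// assumes `db` is an initialized firebase.firestore.Firestore instance
const uuid1 = '3f2504e0-4f89-11d3-9a0c-0305e82c3301'; // placeholder UUIDs
const uuid2 = '9b2e61a0-1dcb-4b1a-9f63-2f6c1a0d4b2e';
db.collection('MyCollection')
  .where('categoryIds', 'array-contains-any', [uuid1, uuid2])
  .get()
  .then(snapshot => snapshot.forEach(doc => console.log(doc.id, doc.data())));
Note that Firestore caps how many values you can pass to a single array-contains-any clause, so check the current limit in the documentation if your lists get long.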
Also note: Firestore "arrays" are ABSOLUTELY NOT ARRAYS. They are ORDERED LISTS, generally in the order they were added to the array. The SDK presents them to the CLIENT as arrays, but Firestore itself does not STORE them as actual arrays - THE NUMBER YOU SEE IN THE CONSOLE is the order, not an index. Matching elements in an array (arrayContains, e.g.) requires matching the WHOLE element - if you store an ordered list of objects, you CANNOT query the "array" on sub-elements.
The client SDKs generally present the values in the arrays/"ordered lists" to you as an array - which has more to do with most languages not having a primitive element that is an ordered list.
I have a collection where the documents are uniquely identified by a date, and I want to get the n most recent documents. My first thought was to use the date as a document ID, and then my query would sort by ID in descending order. Something like .orderBy(FieldPath.documentId, descending: true).limit(n). This does not work, because it requires an index, which can't be created because __name__-only indexes are not supported.
My next attempt was to use .limitToLast(n) with the default sort, which is documented here.
By default, Cloud Firestore retrieves all documents that satisfy the query in ascending order by document ID
According to that snippet from the docs, .limitToLast(n) should work. However, because I didn't specify a sort, it says I can't limit the results. To fix this, I tried .orderBy(FieldPath.documentId).limitToLast(n), which should be equivalent. This, for some reason, gives me an error saying I need an index. I can't create it for the same reason I couldn't create the previous one, but I don't think I should need to because they must already have an index like that in order to implement the default ordering.
Should I just give up and copy the document ID into the document as a field, so I can sort that way? I know it should be easy from an algorithms perspective to do what I'm trying to do, but I haven't been able to figure out how to do it using the API. Am I missing something?
Edit: I didn't realize this was important, but I'm using the flutterfire firestore library.
A few points. It is ALWAYS a good practice to use random, well-distributed documentIds in Firestore for scale and efficiency. Related to that, there is effectively NO WAY to query by documentId - the few circumstances where you can (especially for a range) are VERY tricky, because they require inequalities, and you can only do inequalities on one field. IF there's a reason to search on an ID, yes, it is PERFECTLY appropriate to store it in the document as well - in fact, my wrapper library always does this.
The correct notation, btw, would be FieldPath.documentId() (a method, not a constant) - alternatively, __name__ - but I believe this only works in queries. The reason it requested a new index is that without the () it assumed you had a field named FieldPath with a subfield named documentId.
Further: FieldPath.documentId() does NOT generate the documentId at the server - it generates the FULL PATH to the document - see Firestore collection group query on documentId for a more complete explanation.
So net:
=> documentIds should be as random as possible within a collection; it's generally best to let Firestore generate them for you.
=> a valid exception is when you have ONE AND ONLY ONE sub-document under another - for example, every "user" document might have one and only one "forms of Id" document in a subcollection. It is valid to use the SAME ID as the parent document in this exceptional case (see the sketch after this list).
=> anything you want to query should be a FIELD in a document, and generally a simple field.
=> WORD TO THE WISE: Firestore "arrays" are ABSOLUTELY NOT ARRAYS. They are ORDERED LISTS, generally in the order they were added to the array. The SDK presents them to the CLIENT as arrays, but Firestore itself does not STORE them as ACTUAL ARRAYS - THE NUMBER YOU SEE IN THE CONSOLE is the order, not an index. Matching elements in an array (arrayContains, e.g.) requires matching the WHOLE element - if you store an ordered list of objects, you CANNOT query the "array" on sub-elements.
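A minimal JavaScript web SDK sketch of the first two points, with made-up collection names and assuming `db` is an initialized Firestore instance:
// Let Firestore generate a random, well-distributed document id:
const userRef = db.collection('users').doc(); // no argument: auto-generated id
// The one-and-only-one sub-document exception: reuse the parent's id:
const idFormsRef = userRef.collection('formsOfId').doc(userRef.id);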
From what I've found:
FieldPath.documentId does not match on the documentId, but on the refPath (which it gets automatically if passed a document reference).
As such, since the documents are to be sorted by timestamp, it is better to create a timestamp field value for createdAt rather than a human-readable date string, because timestamps sort chronologically while strings sort lexicographically and can order dates incorrectly.
From there, you can simply sort by date and limit to last. You can keep the document IDs as you intend.
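A rough sketch of that approach with the JavaScript web SDK (the flutterfire call chain is analogous; the collection name and n are placeholders):
// assumes `db` is an initialized firebase.firestore.Firestore instance
const n = 10; // how many of the most recent documents to fetch
db.collection('myCollection')
  .orderBy('createdAt') // ascending on the timestamp field; single-field indexes are automatic
  .limitToLast(n)       // keep only the n most recent
  .get()
  .then(snapshot => snapshot.docs.map(doc => doc.data()));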
I'm integrating Crossfilter with Vue and wonder about efficiency.
Whenever the state of the UI updates, I'm doing calculations using code like this one, to get various metrics from the dataset:
const originKeys = this.dimensions.origin.group().all().map(value => value.key)
At this point I realised that the group being created by the call is stored in the "registry" of the dimension every time the UI updates, but since I'm not storing the reference to the group, it's effectively "lost".
Then, whenever the data set updates, all of these "lost" groups do their calculations, even though the results are never used.
This is what I assumed, correct me if I'm wrong, please.
The calculations change according to the UI, so it's pointless to store references to the groups.
To overcome this issue, I created a simple helper function that creates a temporary group and disposes of it right after the calculation is done.
function temporaryGroup(dimension, calculator) {
  // create a throwaway group, run the calculation, then unregister the group
  const group = dimension.group()
  const result = calculator(group, dimension)
  group.dispose() // stops the group from being updated on future filter/data changes
  return result
}
Using it like this:
const originKeys = temporaryGroup(this.dimensions.origin, (group) => {
return group.all().map(value => value.key)
})
The question is, is there a better (more efficient) way for temporary calculations like the one above?
The answer is no. Your stated assumptions are correct. That's not efficient, and there is no more efficient way of using this library for temporary groups.
Crossfilter is designed for quick interaction between a fixed set of dimensions and groups.
It's stateful by design, adding and removing just the specific rows from each group that have changed based on the changes to filters.
This matters especially for range-based filters, since if you drag a brush interactively a small segment of domain is added and a small segment is removed at each mousemove.
There are also array indices created to track the mapping from data -> key -> bin. One array of keys and one integer array of indices for the dimension, and one integer array of bin indices for the group. This makes updates fast, but it may be inefficient for temporary groups.
If you don't have a consistent set of charts, it would be more efficient in principle to do the calculation yourself, using typed arrays and e.g. d3-array.
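For illustration, here is a rough sketch of that do-it-yourself route with d3-array, computing the same origin keys in a single pass (the record shape and field name are made up to mirror the question):
import { rollup } from 'd3-array'
const rows = [
  { origin: 'DE', value: 1 },
  { origin: 'FR', value: 2 },
  { origin: 'DE', value: 3 },
]
// Map { 'DE' => 2, 'FR' => 1 } - nothing is registered or retained afterwards
const countsByOrigin = rollup(rows, group => group.length, row => row.origin)
const originKeys = [...countsByOrigin.keys()]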
On the other hand, if you are doing this to comply with Vue's data model, you might see if Vue has a concept similar to React's "context", where shared state is associated with a parent component. This is how e.g. react-dc-js holds onto crossfilter objects.
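If you do go the shared-instance route, Vue's provide/inject is probably the closest analogue to React's context. A minimal sketch, assuming Vue 3 and the crossfilter2 package (all names are illustrative):
// Parent component: owns the crossfilter instance and any long-lived dimensions
import { provide } from 'vue'
import crossfilter from 'crossfilter2'
export default {
  setup() {
    const cf = crossfilter([{ origin: 'DE' }, { origin: 'FR' }])
    provide('crossfilter', cf) // children call inject('crossfilter') instead of re-creating it
  },
}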
I would like to return the field names of a given mongodb collection from R mongolite.
Starting from recent mongolite versions (i.e. 1.5+), you can run a raw command on the MongoDB database. For instance, I can use the below to return all the collections:
m = mongo(db = 'dbname', url='urlofdb')
m$run('{"listCollections":1}')
This would return a list of collections:
$cursor
$cursor$id
[1] 0
$cursor$ns
[1] "db.$cmd.listCollections"
$cursor$firstBatch
name type readOnly idIndex.v idIndex._id idIndex.name idIndex.ns
1 collection-name collection FALSE 1 1 _id_ db.collection
Can you please advise how I could return the column names of a given collection using the run command?
Thanks!
I don't think you really can do it directly.
If you could, that would largely go against the entire philosophy of a NoSQL-database (which Mongo is). The idea behind a NoSQL-database is that you have a collection of documents, which can all have their own fields.
The analogy to paper documents really does work, and the concept of 'columns' is replaced by 'fields', which don't pertain to the collection as a whole, but to individual documents, and each document can contain anything. And there is no overarching mandatory template into which everything must fit. In practice, a lot of documents will have a similar structure, but this is by no means guaranteed. This means that it's entirely possible that you have 100 million documents with 3 fields called "a", "b" and "c", and that document 100000001 has 4 fields: a, b, c and d.
It could be that the database-engine keeps track of what fields are somewhere in a collection, but I doubt that. And if it doesn't, the only way to get all four names a, b, c and d, is to go through all 100000001 documents (or more), which will take a while. Undoubtedly, some optimisation is implemented, but it will always be a hard question.
If you just want an answer for a small DB, I think simply querying for all documents and taking the column-names of the resulting data.frame is easiest.
But if your database is large, this question is no longer about R or mongolite, and I'm not well-versed enough in Mongo to help you further.
I'm trying to add something to rooms/users using a data model that looks like this:
rooms: {
name: roomname
users: {
0: email#email.com
}
}
My question is if there is any way to append a new item to the users array. I would normally do this using update(), but update() requires a key for the data to be set to when I just want to set it to the next array index. I figure that I can do this by getting the current rooms/users array, appending to it locally, and using set() to overwrite it, but I was wondering if there was a better (built in) way to go about this.
Using arrays in a potentially massively distributed system such as Firebase is in general a bad idea. And from what you've described, your use-case falls under the "in general" category.
From the Firebase documentation on arrays:
Why not just provide full array support? Since array indices are not permanent, unique IDs, concurrent real-time editing will always be problematic.
Consider, for example, if three users simultaneously updated an array on a remote service. If user A attempts to move the value at key 2, user B attempts to remove it, and user C attempts to change it, the results could be disastrous. For example, among many other ways this could fail, here's one:
// starting data
['a', 'b', 'c', 'd', 'e']
// record at key 2 moved to position 5 by user A
// record at key 2 is removed by user B
// record at key 2 is updated by user C to foo
// what ideally should have happened
['a', 'b', 'd', 'e']
// what actually happened
['a', 'c', 'foo', 'b']
Instead of using arrays, Firebase uses a concept called "push ids". These are consistently increasing (like array indices), but (unlike array indices) you don't have to know the current count to add a new push id.
With push ids, you can add a new user with:
var ref = new Firebase('https://yours.firebaseio.com/rooms/users');
ref.push('email#email.com'); // push() generates a new, chronologically ordered key for this value
Note that the Firebase documentation is in general considered to be pretty good. I highly recommend that you follow at least the programming guide for JavaScript, from which I copied the above.