I am trying to send a message to IoT Hub and save it to DocumentDB using Stream Analytics, but I'm having trouble outputting it to DocumentDB's "Partitioned" collection.
I was able to output the message to a "Single Partition" collection, but no documents are written to the "Partitioned" collection.
The details are as follows:
[Stream Analytics output for the DocDB "Partitioned" collection]
Output Alias : outdocdbpart
PartitionKey : DeviceId
Document Id : id
[Stream Analytics query]
/*Partitioned:no document inserted*/
SELECT * INTO [outdocdbpart] FROM [inputiothub]
[Format of JSON content to insert is something like this]
{
"DeviceId": "device001",
"id": "{Guid}",
...
}
(*) I added the "id" property for the "Partitioned" collection only; for the "Single Partition" collection, I didn't include it.
[Settings for DocDB's "Partitioned" collection]
PartitionMode: Partitioned
PartitionKey : /DeviceId
The above resources are all in the same resource group and region.
What could be the cause of the problem?
Am I missing something?
Azure Stream Analytics currently cannot output to Azure DocumentDB partitioned collections. Our team is currently working on implementing this, but we do not have an ETA yet.
In the meantime, can you vote for this item on the Azure Stream Analytics feedback forum?
https://feedback.azure.com/forums/270577-stream-analytics/suggestions/13431888-output-new-documentdb-single-partition-partition
I use Google Firestore for my iOS app built in Swift/SwiftUI and would like to implement the snapshot listeners feature in my app.
I want to list all documents in the debts collection in realtime by using snapshot listeners. Every document in this collection has a subcollection debtors, which I want to get in realtime for each debts document as well. Each document in debtors has a field userId, which refers to a DocumentID in the users collection, on which I would also love to have a realtime connection (for example, when a user changes his name, I would love to see it instantly in the debt entity inside the list). This means I must initialize two more snapshot listeners for each document in the debts collection. I'm concerned that this is too many open connections once I have, say, 100 debts in the list. I can't come up with any approach other than doing one-time fetches.
Have any of you ever dealt with this kind of nested snapshot listener? Do I have a reason to worry?
This is my Firestore db
Debts
  document
    - description
    - ...
    - debtors (subcollection)
      - userId
      - amount
      - ...
Users
  document
    - name
    - profileImage
    - email
I uploaded this gist where you can see how I operate with Firestore right now.
https://gist.github.com/michalpuchmertl/6a205a66643c664c46681dc237e0fb5d
If you want to read all debtors documents anywhere in the database with a given value for userId, you can use a collection group query to do so.
In Swift that'd look like:
db.collectionGroup("debtors")
    .whereField("userId", isEqualTo: "uidOfTheUser")
    .getDocuments { (snapshot, error) in
        // ...
    }
This will read from any collection named debtors. You'll have to add the index for this yourself, and set up the proper security rules. Both of those are covered in the collection group query documentation.
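Since your goal is realtime updates, the same collection group query can be attached to a snapshot listener instead of a one-time fetch. A minimal sketch (the handler body is just illustrative):

let listener = db.collectionGroup("debtors")
    .whereField("userId", isEqualTo: "uidOfTheUser")
    .addSnapshotListener { (snapshot, error) in
        guard let snapshot = snapshot else {
            print("Listener error: \(error?.localizedDescription ?? "unknown")")
            return
        }
        // Each document is a debtors entry for this user, across all debts.
        for document in snapshot.documents {
            print(document.reference.path, document.data())
        }
    }

// Detach the listener (e.g. when the view disappears) to close the connection:
// listener.remove()

One listener like this can replace the per-document listeners you were worried about.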
I am developing an application that needs to find all nearby users within, let's say, 100 miles WITHOUT sharing their coordinates. In the example below I am using a geohash to help me calculate the distance.
In Firestore, I have the following document inside the users collection.
{
"userId" : "12345",
"displayName" : "username",
"geoHash" : "gbsuv",
"photoUrl" : "example.com/user.jpg",
"refId" : "0001"
}
The question is: how can I prevent the "geoHash" field from being retrieved with each document in the collection?
Firestore security rules grant access on a document level. So either the user can read an entire document, or they can't read anything in that document. There's no way to grant users access to only part of a document.
This means that you can't query something that the client can't read. So in your current structure, if the user needs to query on geoHash, they will be able to read that field too.
The only alternative is to not let the client do the querying, but instead do the querying on a server (such as in Cloud Functions). For this you'd store the geohash for each user in a separate document (say, in a collection called locations). The Cloud Function then queries this collection and returns the real user document(s) (which no longer contain the geohash) to the user.
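From the client side, that setup could be exposed as a callable Cloud Function. A hedged sketch in Swift, where findNearbyUsers is a hypothetical function name and the request/response shape is an assumption:

import FirebaseFunctions

let functions = Functions.functions()

// "findNearbyUsers" is assumed to query the private `locations` collection
// server-side and return matching user documents without their geohashes.
functions.httpsCallable("findNearbyUsers")
    .call(["geoHash": "gbsuv", "radiusMiles": 100]) { (result, error) in
        if let error = error {
            print("Callable failed: \(error.localizedDescription)")
            return
        }
        if let users = result?.data as? [[String: Any]] {
            print("Nearby users: \(users)")
        }
    }

Clients never read locations directly, so security rules can deny all client access to that collection.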
Imagine that I have a collection of documents:
collection:
'products'
And inside that collection I have documents, and each document has a really heavy field that I want filtered out before it gets sent to my clients.
document product1:
{
id: 'someUniqueKey',
title: 'This is the title',
price: 35.00,
heavyField: [
'longStringxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
'longStringxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
'longStringxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
'longStringxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
'longStringxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
'longStringxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
  ]
}
Can I query for those documents inside the products collection and leave this specific heavyField out of my response?
I would like to be able to get the documents with one of their fields filtered out, or maybe select the fields I want to receive on my client.
Is this possible? Or in this case, is it best to structure my data differently and keep the heavyField in a different collection?
In SQL, this would be called a "projection" in a query. Cloud Firestore doesn't support projections from web and mobile client SDKs. You should instead put your "heavy" fields into documents in another collection, and query that only as needed.
If you're wondering why, it partly has to do with the way that the client local persistence layer works. It would make caching much more complicated if only certain fields existed locally for a given document. It's much more straightforward to simply fetch and store the entire document, so there is no question whether or not the document is locally available.
For the purpose of data modeling, it's best to think of documents as "atomic units", all or nothing, that can't be broken down.
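To make the split concrete, here is a minimal sketch in Swift, assuming a hypothetical productDetails collection that mirrors products by document ID and holds only the heavy field:

import FirebaseFirestore

let db = Firestore.firestore()

// List screen: reads only the light documents (id, title, price).
db.collection("products").getDocuments { (snapshot, error) in
    // render the product list
}

// Detail screen: fetch the heavy payload for a single product on demand.
db.collection("productDetails").document("someUniqueKey").getDocument { (snapshot, error) in
    let heavyField = snapshot?.get("heavyField") as? [String]
    // use heavyField
}

This way the heavy data is only ever read as a whole document, which fits the "atomic units" model above.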
The documentation for the Cosmos DB Change Feed mentions that the documents in the Change Feed are persisted and can be processed asynchronously. CreateDocumentChangeFeedQuery has the following prototype:
CreateDocumentChangeFeedQuery(Uri collectionLink, ChangeFeedOptions feedOptions);
So the only control offered is at the level of ChangeFeedOptions, where we can specify the partition key range and the start time after which we need documents.
Is there any way we can query the Change Feed by passing in a custom query, as can be done against a Cosmos DB collection? For example, querying based on document properties?
Right now, filtering the Change Feed by a query is not supported. As you said, you can filter by PartitionKeyRangeId and use the ContinuationToken to iterate over the feed.
There is a UserVoice item that tracks this request, along with an alternative using Spark.
Coming from a MongoDB background, I'd like to set up a document with an embedded collection.
For instance, if I have a profile object
Profile
  name : string
  followers : [
    name: string
  ]
such that it has an embedded collection of followers.
Is there a way that I can create an index on Profile so that I can query for all profiles where Profile.followers includes myUsername?
In short, can I query for the profiles I'm following from a DynamoDB table?
In Mongo I can easily do this by setting up an index on Profile.followers and doing an $in query. Is there something similar for DynamoDB?
This documentation suggests there is nothing like an in clause:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
Currently DynamoDB does not support indexes on non-scalar types (i.e., the Set, List, or Map data types; see http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html). If you have a separate users table, you can keep track of all the profiles you are following in a Set/List attribute there.
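As a sketch of that modeling (the attribute names here are assumptions), an item in the users table could look like:
{
  "username": "myUsername",
  "following": ["profileA", "profileB"]
}
A single GetItem on your own user item then returns every profile you follow, with no index on a non-scalar attribute required.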