Retrieving distinct properties [duplicate] - azure-cosmosdb

I would like to be able to provide a list of all the properties across all documents in a collection.
The best way I can come up with is to query for all documents and then build the list in the client, but this feels wrong.

The only way to do what you want is to read all of the documents. However, if you are worried about bandwidth, then you can do it in a stored procedure that only returns the list of properties.
If you take that route, I recommend that you start with the countDocuments sproc here and be prepared to call it as many times as necessary, until the continuation comes back empty and there are no 429 errors... or use documentdb-utils, which takes care of that for you.
Alternatively, I could give you a full-on example here. Just let me know.
Another approach would be to maintain a list of properties as documents are being written. This would be preferred if you need this list often.
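If you go the write-time route, here is a hedged sketch of what the bookkeeping might look like with the classic DocumentDB SDK and Json.NET (usings omitted). The "property-list" metadata document, the database/collection names, and the method itself are all assumptions, and a real version would also need etag-based optimistic concurrency to survive competing writers:

static async Task SaveWithPropertyTracking(DocumentClient client, Uri collectionUri, JObject doc)
{
    await client.UpsertDocumentAsync(collectionUri, doc);

    // Merge this document's property names into a well-known metadata document.
    var names = new HashSet<string>(doc.Properties().Select(p => p.Name));
    var metadataUri = UriFactory.CreateDocumentUri("MyDatabase", "MyCollection", "property-list");
    try
    {
        var existing = JObject.Parse((await client.ReadDocumentAsync(metadataUri)).Resource.ToString());
        names.UnionWith(existing["properties"].Select(t => (string)t));
    }
    catch (DocumentClientException e) when (e.StatusCode == HttpStatusCode.NotFound)
    {
        // First write: the metadata document doesn't exist yet.
    }

    await client.UpsertDocumentAsync(collectionUri, new JObject
    {
        ["id"] = "property-list",
        ["properties"] = new JArray(names)
    });
}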

You can store documents with any kind of structure in a collection; they can all be different.
A collection does not restrict you to storing objects that share the same "schema".
So getting all the properties available in a collection is not really something the DocumentDB API or SDK supports: you either read the whole collection, or rely on some convention that you establish when you create your objects.
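For what it's worth, a minimal sketch of the read-everything approach, assuming the classic DocumentDB SDK (Microsoft.Azure.Documents.Client/Linq) and Json.NET, with a DocumentClient and collection URI already in hand:

static async Task<HashSet<string>> GetAllPropertyNames(DocumentClient client, Uri collectionUri)
{
    var names = new HashSet<string>();
    var query = client.CreateDocumentQuery<JObject>(collectionUri, "SELECT * FROM c")
                      .AsDocumentQuery();
    while (query.HasMoreResults) // follow continuations until the feed is drained
    {
        foreach (var doc in await query.ExecuteNextAsync<JObject>())
            foreach (var property in doc.Properties())
                names.Add(property.Name);
    }
    return names;
}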

You can use Slazure for this. Example follows which lists all property names for a given set of documents:
using System;
using System.Collections.Generic;
using SysSurge.Slazure.AzureDocumentDB.Linq;
using SysSurge.Slazure.Core;
using SysSurge.Slazure.Core.Linq.QueryParser;

public void ShowPropertyNames()
{
    // Get a reference to the TestCustomers collection
    dynamic storage = new QueryableStorage<DynDocument>("URL=https://contoso.documents.azure.com:443/;DBID=DDBExample;TOKEN=VZ+qKPAkl9TtX==");
    QueryableCollection<DynDocument> collection = storage.TestCustomers;

    // Build collection query
    var queryResult = collection.Where("SignedUpForNewsletter = true and Age < 22");

    // Enumerate the matching documents and print every property name
    foreach (DynDocument document in queryResult)
    {
        foreach (KeyValuePair<string, IDynProperty> keyValuePair in document)
        {
            Console.WriteLine(keyValuePair.Key);
        }
    }
}

Related

How to fetch only part of document from dynamoDB?

To make fetching faster I want to fetch only a part of the document, as the document can be large. How do I achieve this in DynamoDB?
@JsonIgnoreProperties(ignoreUnknown = true)
@DynamoDBTable(tableName = "tab")
@Data
@NoArgsConstructor
@JsonInclude(JsonInclude.Include.NON_NULL)
public class Doc {
    @DynamoDBAttribute(attributeName = "id")
    @JsonProperty("id")
    @DynamoDBHashKey
    private String id;

    @DynamoDBAttribute
    Info info;

    @DynamoDBAttribute
    List<Participants> participants;

    @DynamoDBAttribute
    List<Phases> phases;
}
The load call below will load the complete document, but I don't want that. I want to load only info.
Doc doc = dynamoDBMapper.load(Doc.class, txnId);
How do I achieve that?
I'll give you a generic DynamoDB answer - you can look up the specific syntax for your language/SDK on your own.
You said that you have a large document and want to fetch only a part of it so it will be faster. This question has two parts - first how to fetch only a part of the document, and second whether it will actually be faster.
The answer for the first part is that yes - it is indeed possible to fetch only a part of the document, by passing additional parameters to the GetItem (or Query, Scan, etc.) request. The older parameter is AttributesToGet, letting you provide a list of top-level attributes you want to fetch for the item. The newer and recommended replacement is the ProjectionExpression parameter, which also allows you to retrieve parts of a nested document (an attribute which is itself a list or a map).
However, it is less obvious whether this will be "faster". First you need to know that it will not be cheaper: the cost of a read request is calculated based on the size of the entire item (which the DynamoDB implementation reads from disk in its entirety), not just the specific attributes you ask to retrieve. It will also not be faster for DynamoDB to read the data from disk. The one thing that can be faster is the networking part (since the response is smaller), but whether that translates into an appreciable speedup in your application depends on your exact setup.
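To illustrate, here is a hedged sketch of such a projection in one concrete SDK (the AWS SDK for .NET; the Java SDK used in the question exposes the same ProjectionExpression parameter). The table and attribute names come from the question, the rest is assumed:

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Fetch only the top-level "info" attribute of a single item.
static async Task<Dictionary<string, AttributeValue>> LoadInfoOnly(IAmazonDynamoDB client, string txnId)
{
    var response = await client.GetItemAsync(new GetItemRequest
    {
        TableName = "tab",
        Key = new Dictionary<string, AttributeValue>
        {
            ["id"] = new AttributeValue { S = txnId }
        },
        ProjectionExpression = "info" // comma-separate names to fetch more attributes
    });
    return response.Item;
}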

Something akin to "Sparse Fieldsets" in .NET

I'm trying to find the vocabulary to describe what I want and if it exists.
I have a table that shows a few data points from large objects. Loading the entire objects just for the table is very slow. Is there a way to pass only the few properties I want to the front end without having to define a new object?
I found something called Sparse Fieldsets in JSON API, and I'm wondering if something like this exists for .NET under another name.
Update:
Talking to another coder, I realized it probably makes more sense to implement something like this between the backend and the database and make a specific call for this table. I still need to work out whether I need to create a new object to support this. I think it'd still be faster if I just kept the same object but nulled out all of the connected objects that I don't need for the table. But maybe that's considered bad practice? Also, we're using Entity Framework, for what it's worth.
Update 2:
I just created a new query without all of the .Include() calls, and it works well enough for what I need:
_dataContext.ApplePie
.Include(f => f.Apples).ThenInclude(f => f.Apple)
.Include(f => f.Sugars).ThenInclude(f => f.MolecularStructure)
.Include(f => f.Recipe)
Maybe you are looking for Anonymous Types?
For example, if you had a typed object with three properties, but you only wanted to operate on two:
var threePropThing = new ThreePropertyThing { Id = 1, Message = "test", ExtraProperty = "ex" };
var myAnonThing = new { Id = threePropThing.Id, Message = threePropThing.Message };
Best practice would be to not pass this anonymous object around. But, if you really needed to, you could return it as type object.
Typically, when passing data around in C#, you want it to be typed.
C# is a strongly-typed language, and I would say it is unusual for C# to support scenarios where the object definition (its properties) is not known in advance, as with the JSON API "fields" parameter. Implementing this would mean using reflection to filter the properties, which is usually slow and error-prone.
When implementing C# web services, people usually create one DTO response model per request.
If your table has a fixed set of fields, I would personally recommend creating a DTO class containing only the fields required for your table, and then creating a method which returns this response for your specific request. While this doesn't align with "without having to define a new object" from the question, it makes the intention clear and makes the API much easier to maintain in the future.
You might want to use a library like AutoMapper to save time and avoid the duplicated code of copying values from the data model to the DTO, if you have many such methods.
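For instance, a hedged sketch of that DTO approach with Entity Framework, reusing the asker's _dataContext; the ApplePieRow type and its properties are made up for illustration:

// Hypothetical row DTO containing just what the table displays.
public class ApplePieRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int AppleCount { get; set; }
}

// EF translates the Select into SQL that reads only these columns,
// so none of the heavy related objects are materialized.
var rows = _dataContext.ApplePie
    .Select(p => new ApplePieRow
    {
        Id = p.Id,
        Name = p.Name,
        AppleCount = p.Apples.Count
    })
    .ToList();

Because the projection happens in the database, this also avoids the slow eager loading that the .Include() chains cause.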

Saving an entire one-to-many structure of transient objects in one query

In Short
I seem to have landed on a MAJOR anti-pattern of saving objects WAY too many times. I've read through the limited Objectify docs and can't seem to find the right pattern to use.
Details
I have multiple objects I want to store. They are all transient (they don't exist in the database yet) and they have a one-to-many relationship. I don't want to sit and call ofy().save() on every last object in my hierarchy.
In the following example, a Player has a List of Cards.
My Model:
@Entity
public class Player {
    @Id private Long id = null; // will be generated
    private List<Ref<Card>> cards = new ArrayList<Ref<Card>>();
    // getters and setters here
}

@Entity
public class Card {
    @Id private Long id = null; // will be generated
    // lots of other fields and getters and setters here
}
My Operation:
I need to create a new player and new card, with the player having a reference to the card in his List "cards."
IDEAL SOLUTION:
I would like to just create the player and card Java objects, set their relationships, and pass them to Objectify to be saved. Like this:
Player player = new Player();
Card card = new Card();
player.getCards().add(Ref.create(card));
ofy().save().entity(player).now();
That will fail. The 3rd line attempts to create a new Ref for the Card, which cannot be done because the Card doesn't have an id yet; one will only be assigned once it has been persisted. It seems I must never associate an object with another until one of them has already been saved.
Current Crappy Solution
So, my solution must be to save the Card first, and then relate it to the Player, then save the player.
Player player = new Player();
Card card = new Card();
ofy().save().entity(card).now(); // save the card first so it gets an id
player.getCards().add(Ref.create(card));
ofy().save().entity(player).now(); // now the player can reference it
This is insane. It seems reasonable at first, but my app is dealing with many more relationships than just this, and with this pattern my algorithm will be a spiderweb of checking for transient objects inside collections before saving the entity I'm actually concerned with.
There MUST be some way to tell Objectify to just SAVE all child/related entities along with the entity I've requested, and furthermore generate the Ids necessary instead of throwing an Exception at me.
Furthermore, I'll also need this sort of "recursive save" solution even when none of my objects are transient (i.e. they all have IDs already). I can't waste my time iterating through collections, and then all the collections WITHIN those collections, and saving them all. I'm going to need some way of telling Objectify to just SAVE THIS WHOLE HIERARCHY OF OBJECTS I just passed you.
I've been reading around this @Load annotation and I feel like maybe there's something in there I'm missing... I don't know. Need help. Documentation is slim.
UPDATED SOLUTION
For posterity -
Using the allocateId() method decouples ID generation from the database entirely, and you get a VERY clean pattern, particularly if you do as I did:
All database @Entity classes get a private constructor and a public static factory for creating transient objects. This static factory method (createTransient()) always allocates a new ID. All client code can then use this method for acquiring new transient objects, or the obvious Objectify load for acquiring existing persisted instances. Simple. Done. Lovely.
I recommend two things:
Allocate ids manually when you construct your objects using ObjectifyFactory.allocateId(). Do not use the "save with null autogenerates" feature. As you've noticed, it's a PITA to deal with entity objects that have null ids, so don't allow them to exist.
Use deferred saves. ofy().defer().save().entity(blah); You can save almost any number of things this way and they'll only get saved once on commit (or closing of the objectify session). Deferring save on the same entity multiple times produces only a single save.
This pattern of leaving ids null and filling them in on save is a holdover from the JPA days. It didn't work very well with JPA either; there were plenty of frustrating edge cases dealing with entities missing ids (especially when you wanted to put them in maps or sets). The best solution is to simply guarantee that no entity is ever missing an id in the first place.
Note that you'll want to allocate the id in a custom constructor, not the no-args constructor that Objectify uses to build your entity on load. Allocating an id is cheap but still a call to the GAE service layer and you don't want to do this on every load.

Meteor drop collection after every query

I am writing an application which communicates with an API and stores the response in a Meteor Collection so I can have the power of mongo to sort/filter.
I would like to clear the collection for every new result set. But a Meteor Collection is persistent.
What is the preferred way of clearing the collection? I know you can drop the meteor collection, but is that the preferred method?
Help appreciated. Thank you!
I would go about creating a local Mongo collection, which will be available on the client side only. To create a client-side collection, just don't give it a name argument.
// This collection is client-only and will not be synced with the server
myCollection = new Mongo.Collection();
// To be more explicit, you can use `null` for the name:
myCollection = new Mongo.Collection(null);
Once you are done using the data, empty the collection:
myCollection.remove({});
myCollection.remove({}) is the syntax for removing all documents from a collection. This will only work on the server unless the collection is a client-side collection, as per @Nakib's example. Otherwise, documents can only be deleted by _id on the client side. Normally your allow/deny rules should block any attempt to delete anything on the client, as it provides a great attack vector.
I'm not completely familiar with Meteor best practices, but if you were going to clear out an array in JavaScript, the best practice would be to run the following:
myArray.length = 0;
For more information I recommend this blog post by David Walsh where he details the reasoning behind zeroing out an array as follows:
Setting the length equal to zero empties the existing array, not creating another array! This helps you to avoid pointer issues with arrays as well.

query ravendb from web api 2 and return one document

I want to do the following using ASP.NET Web API 2 and RavenDB:
1. Send a string to RavenDB.
2. Look up a document containing a field called UniqueString that contains the string I passed to RavenDB.
3. Return either the document that matched, or a "YES/NO a document with that string exists" message.
I am completely new to NoSQL and RavenDB, so this has proven to be quite difficult :-) I hope someone can assist me, and I assume it is actually quite easy to do, though I haven't been able to find any guides showing it.
This has nothing to do with WebAPI 2, but you can do what you ask for using RavenDb combined with WebAPI 2.
First you need to have an index (or let RavenDb auto-create one for you) on the document and the property/properties you want indexed. This index can be created from code like this:
public class MyDocument_ByUniqueString : AbstractIndexCreationTask<MyDocument>
{
    public override string IndexName
    {
        get { return "MyDocumentIndex/ByUniqueString"; }
    }

    public MyDocument_ByUniqueString()
    {
        Map = documents => from doc in documents
                           select new
                           {
                               doc.UniqueString
                           };
    }
}
or created in the RavenDb Studio:
from doc in docs.MyDocuments
select new {
    doc.UniqueString
}
After that you can do an "advanced document query" (from a WebAPI 2 controller or similar in your case) on that index and pass in a Lucene wildcard:
using (var session = documentStore.OpenSession())
{
    var result = session.Advanced
        .DocumentQuery<MyDocument>("MyDocumentIndex/ByUniqueString")
        .Where("UniqueString: *uniq*")
        .ToList();
}
This query will return all documents that have a property "UniqueString" containing the term "uniq". The document in my case looked like this:
{
    "UniqueString": "This is my unique string"
}
Please note, however, that this kind of wildcard in Lucene might not be very performant, as it may need to scan large amounts of text. The RavenDB documentation even warns against this:
Warning
RavenDB allows to search by using such queries but you have to be aware that leading wildcards drastically slow down searches. Consider if you really need to find substrings, most cases looking for words is enough. There are also other alternatives for searching without expensive wildcard matches, e.g. indexing a reversed version of text field or creating a custom analyzer.
http://ravendb.net/docs/article-page/2.0/csharp/client-api/querying/static-indexes/searching
Hope this helps!
1. Get the WebApi endpoint working to collect your input. This is independent of RavenDB.
2. Using the RavenDB client, query the database using Linq or one of the other methods.
3. After the document is retrieved, you may need to write some logic to return the expected result.
I skipped the step where the database gets populated with the data to query. I would leverage the RavenDB client tools as much as possible in your app versus trying to use the HTTP API.
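For instance, a minimal sketch of that flow in a WebAPI 2 controller action, reusing the MyDocument class and index from the answer above; the controller shape and parameter name are assumptions:

public class DocumentsController : ApiController
{
    // In a real app the store would be created once at startup and injected.
    private readonly IDocumentStore documentStore;

    public DocumentsController(IDocumentStore documentStore)
    {
        this.documentStore = documentStore;
    }

    public IHttpActionResult Get(string uniqueString)
    {
        using (var session = documentStore.OpenSession())
        {
            var match = session.Query<MyDocument, MyDocument_ByUniqueString>()
                .FirstOrDefault(d => d.UniqueString == uniqueString);

            // Return the matching document, or a 404 as the "NO" answer.
            return match != null ? (IHttpActionResult)Ok(match) : NotFound();
        }
    }
}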
