To fetch documents faster, I want to fetch only part of each one, since a document can be large. How can I achieve this in DynamoDB?
@JsonIgnoreProperties(ignoreUnknown = true)
@DynamoDBTable(tableName = "tab")
@Data
@NoArgsConstructor
@JsonInclude(JsonInclude.Include.NON_NULL)
public class Doc {
    @DynamoDBAttribute(attributeName = "id")
    @JsonProperty("id")
    @DynamoDBHashKey
    private String id;
    @DynamoDBAttribute
    Info info;
    @DynamoDBAttribute
    List<Participants> participants;
    @DynamoDBAttribute
    List<Phases> phases;
}
The load method below loads the complete document, but I don't want that. I want to load only info.
Doc doc = dynamoDBMapper.load(Doc.class, txnId);
How can I achieve that?
I'll give you a generic DynamoDB answer - you can look up the specific syntax for your language/SDK on your own.
You said that you have a large document and want to fetch only a part of it so it will be faster. This question has two parts - first how to fetch only a part of the document, and second whether it will actually be faster.
The answer to the first part is yes: it is indeed possible to fetch only part of the document, by passing additional parameters to the GetItem (or Query, Scan, etc.) request. The older parameter is AttributesToGet, which lets you provide a list of top-level attributes you want to fetch for the item. The newer and recommended replacement is the ProjectionExpression parameter, which also allows you to retrieve parts of nested documents (an attribute that is itself a list or a map).
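As a rough illustration of the request side, the Java sketch below builds the two values a projected GetItem needs: a ProjectionExpression and the matching ExpressionAttributeNames map. It uses only the standard library; ProjectionBuilder and its method are hypothetical names, and how you hand the two values to the request depends on your SDK. Each path segment gets a #pN placeholder so that names colliding with DynamoDB reserved words still work. It assumes plain dotted map paths such as info.title and does not handle list indexes like phases[0].

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds ProjectionExpression / ExpressionAttributeNames values.
// Handles dotted map paths only, not list indexes such as phases[0].
final class ProjectionBuilder {

    // The two request values; both would be set on the GetItem (or Query) request.
    record Projection(String expression, Map<String, String> attributeNames) {}

    static Projection build(List<String> paths) {
        Map<String, String> names = new LinkedHashMap<>();        // "#p0" -> "info"
        Map<String, String> placeholders = new LinkedHashMap<>(); // "info" -> "#p0"
        StringJoiner expression = new StringJoiner(", ");
        for (String path : paths) {
            StringJoiner segments = new StringJoiner(".");
            for (String segment : path.split("\\.")) {
                // Reuse one placeholder per distinct attribute name.
                String placeholder = placeholders.get(segment);
                if (placeholder == null) {
                    placeholder = "#p" + placeholders.size();
                    placeholders.put(segment, placeholder);
                    names.put(placeholder, segment);
                }
                segments.add(placeholder);
            }
            expression.add(segments.toString());
        }
        return new Projection(expression.toString(), Map.copyOf(names));
    }
}
```

For the question above, build(List.of("info")) yields the expression #p0 with #p0 mapped to info. Keep the requested paths disjoint: DynamoDB rejects a projection in which two document paths overlap, such as info together with info.title.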
However, it is less obvious whether this will be "faster". First, you need to know that it will not be cheaper: the cost of a read request is calculated based on the size of the entire item (which the DynamoDB implementation reads from disk in full), not just the specific attributes you ask to retrieve. It will also not be faster for DynamoDB to read the data from disk. The part that can be faster is the networking (since the response is smaller), but whether or not this translates into any appreciable speedup in your application depends on your exact setup.
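To make the cost point concrete, here is a small arithmetic sketch in Java. It assumes the standard read-capacity rule (one read capacity unit per strongly consistent read of up to 4 KB, half that for an eventually consistent read) and treats 4 KB as 4096 bytes; the class name is hypothetical. The input is the size of the whole item, so projecting a single attribute does not change the result.

```java
// Sketch of the read-cost arithmetic: the whole item's size is rounded up to the
// next 4 KB unit, regardless of which attributes the request projects.
final class ReadCost {
    static final long READ_UNIT_BYTES = 4 * 1024; // assumption: 1 KB = 1024 bytes

    /** Read capacity units consumed by one GetItem on an item of the given size. */
    static double readCapacityUnits(long itemSizeBytes, boolean stronglyConsistent) {
        // Ceiling division; any read consumes at least one unit.
        long units = Math.max(1, (itemSizeBytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES);
        // Eventually consistent reads cost half as much.
        return stronglyConsistent ? units : units / 2.0;
    }
}
```

For a 10 KB item the sketch gives 3 units for a strongly consistent read and 1.5 for an eventually consistent one, whether you fetch every attribute or only info.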
I would like to be able to provide a list of all the properties across all documents in a collection.
The best way I can come up with is to query for all documents and then build the list in the client, but this feels wrong.
The only way to do what you want is to read all of the documents. However, if you are worried about bandwidth, then you can do it in a stored procedure that only returns the list of properties.
If you take that route, I recommend that you start with the countDocuments sproc here and be prepared to call it as many times as necessary until the continuation comes back empty and there are no 429 errors... or use documentdb-utils, which takes care of that for you.
Alternatively, I could give you a full example here. Just let me know.
Another approach would be to maintain a list of properties as documents are being written. This would be preferred if you need this list often.
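A minimal sketch of the "maintain a list as documents are written" approach, in Java with only the standard library. PropertyRegistry and its methods are hypothetical names, documents are modeled as plain maps, and only top-level property names are tracked. The static fromAll method shows the other option from the question: reading every document and taking the union of their keys on the client.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical registry of top-level property names seen across a collection.
final class PropertyRegistry {
    private final Set<String> properties = new TreeSet<>(); // sorted for stable output

    // Call on every write so the list is ready without re-reading the collection.
    void onWrite(Map<String, ?> document) {
        properties.addAll(document.keySet());
    }

    Set<String> properties() {
        return Collections.unmodifiableSet(properties);
    }

    // The read-everything alternative: union of keys over all documents.
    static Set<String> fromAll(Iterable<? extends Map<String, ?>> documents) {
        Set<String> all = new TreeSet<>();
        for (Map<String, ?> document : documents) {
            all.addAll(document.keySet());
        }
        return all;
    }
}
```

This registry only ever grows: if a document is deleted or loses a property, the name stays listed. Handling that would need per-property counts or a periodic rebuild with fromAll.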
You can store documents with any kind of structure in a collection; they could all be different.
A collection does not restrict you to storing objects that share the same "schema".
So getting all the properties available in a collection is not really something supported by the DocumentDB API or SDK: you either read the whole collection, or rely on some sort of convention that you follow when you create objects.
You can use Slazure for this. Example follows which lists all property names for a given set of documents:
using System;
using System.Collections.Generic;
using SysSurge.Slazure.AzureDocumentDB.Linq;
using SysSurge.Slazure.Core;
using SysSurge.Slazure.Core.Linq.QueryParser;
public void ShowPropertyNames()
{
// Get a reference to the TestCustomers collection
dynamic storage = new QueryableStorage<DynDocument>("URL=https://contoso.documents.azure.com:443/;DBID=DDBExample;TOKEN=VZ+qKPAkl9TtX==");
QueryableCollection<DynDocument> collection = storage.TestCustomers;
// Build collection query
var queryResult = collection.Where("SignedUpForNewsletter = true and Age < 22");
foreach (DynDocument document in queryResult)
{
foreach (KeyValuePair<string, IDynProperty> keyValuePair in document)
{
Console.WriteLine(keyValuePair.Key);
}
}
}
We have an aggregate root as follows.
@AggregateRoot
class Document {
DocumentId id;
}
The problem statement given by the client is "A document can have multiple document as attachments"
So refactoring the model will lead to
//Design One
@AggregateRoot
class Document {
DocumentId id;
//Since Document is an aggregate root it is referenced by its id only
Set<DocumentId> attachments;
attach(Document doc);
detach(Document doc);
}
But this model alone won't be sufficient as the client wants to store some meta information about the attachment, like who attached it and when it was attached. This will lead to creation of another class.
class Attachment {
DocumentId mainDocument;
DocumentId attachedDocument;
Date attachedOn;
UserId attachedBy;
//no operation
}
and we could again refactor the Document model as below
//Design Two
@AggregateRoot
class Document {
DocumentId id;
Set<Attachment> attachments;
attach(Document doc);
detach(Document doc);
}
The different possibilities of modeling that I could think of are given below.
If I go with design one then I could model Attachment class as an aggregate root and use Events to create them whenever a Document is attached. But it doesn't look like an aggregate root.
If I choose design two then Attachment class could be modeled as a value object or an entity.
Or, if I use CQRS, I could go with design one and model Attachment as a query model populated by Events.
So, which is the right way to model this scenario? Is there any other way to model it, other than what I have mentioned?
You might find in the long term that passing values, rather than entities, makes your code easier to manage. If attach/detach don't care about the entire document, then just pass in the bits they do care about (aka Interface Segregation Principle).
attach(DocumentId);
detach(DocumentId);
this model alone won't be sufficient as the client wants to store some meta information about the attachment, like who attached it and when it was attached.
Yes, that makes a lot of sense.
which is the right way to model this scenario?
Not enough information provided (the polite way of saying "it depends".)
Aggregate boundaries are usually discovered by looking at behaviors, rather than at structures or relationships. Is the attachment relationship just an immutable value that you can add/remove, or is it an entity with an internal state that changes over time? If you change an attachment, what other information do you need, and so on.
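One way Design Two could look if the attachment turns out to be an immutable value you add and remove. This is a Java sketch under assumptions: DocumentId and UserId are modeled as simple records, the caller supplies the timestamp, and attachments are keyed by the attached document's id so the same document cannot be attached twice (mirroring the Set in Design One). The mainDocument field is dropped because the owning Document already supplies it. Following the interface-segregation suggestion above, attach and detach take ids rather than whole Document aggregates.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Simple identifier types, assumed for the sketch.
record DocumentId(String value) {}
record UserId(String value) {}

// Immutable value object carrying the metadata the client asked for.
record Attachment(DocumentId attachedDocument, UserId attachedBy, Instant attachedOn) {}

// Aggregate root owning its attachments (Design Two).
final class Document {
    private final DocumentId id;
    // Keyed by the attached document's id: each document is attached at most once.
    private final Map<DocumentId, Attachment> attachments = new LinkedHashMap<>();

    Document(DocumentId id) { this.id = Objects.requireNonNull(id); }

    // Returns false if that document was already attached.
    boolean attach(DocumentId other, UserId by, Instant on) {
        return attachments.putIfAbsent(other, new Attachment(other, by, on)) == null;
    }

    // Returns false if that document was not attached.
    boolean detach(DocumentId other) {
        return attachments.remove(other) != null;
    }

    Collection<Attachment> attachments() {
        return Collections.unmodifiableCollection(attachments.values());
    }
}
```

If the attachment later needs its own lifecycle (for example, being edited after it is created), that is the signal to model it as an entity instead.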
In Short
I seem to have landed on a MAJOR anti-pattern of saving objects WAY too many times. I've read through the limited Objectify docs and can't seem to find the right pattern to use.
Details
I have multiple objects I want to store. They are all transient (they don't exist in the database yet) and they have a one-to-many relationship. I don't want to sit and call ofy().save() on every last object in my hierarchy.
In the following example, a Player has a List of Cards.
My Model:
@Entity
public class Player {
    @Id private Long id = null; // will be generated
    private List<Ref<Card>> cards = new ArrayList<Ref<Card>>();
    // getters and setters here
}
@Entity
public class Card {
    @Id private Long id = null; // will be generated
    // lots of other fields and getters and setters here
}
My Operation:
I need to create a new player and new card, with the player having a reference to the card in his List "cards."
IDEAL SOLUTION:
I would like to just create the player and card java objects, set their relationships, and pass them to Objectify to be saved. Like this:
Player player = new Player();
Card card = new Card();
player.getCards().add(Ref.create(card));
ofy().save().entity(player).now();
That will fail. The 3rd line attempts to create a new Ref for Card, which cannot be done because Card doesn't have an Id yet; the Id is only assigned once it has been persisted. It seems I must never associate an object with another until one has already been saved.
Current Crappy Solution
So, my solution must be to save the Card first, and then relate it to the Player, then save the player.
Player player = new Player();
Card card = new Card();
ofy().save().entity(card).now();
player.getCards().add(Ref.create(card));
ofy().save().entity(player).now();
This is insane. It seems reasonable at first, but my app is dealing with many more relationships than just this, and with this pattern my algorithm will be a spiderweb of checking for transient objects inside collections before saving the entity I'm actually concerned with.
There MUST be some way to tell Objectify to just SAVE all child/related entities along with the entity I've requested, and furthermore generate the Ids necessary instead of throwing an Exception at me.
Furthermore, I'll also need this sort of "recursive save" solution even when none of my objects are transient (i.e. they all have IDs already). I can't waste my time iterating through collections and then all the collections WITHIN those collections and saving them all. I'm going to need some way of telling Objectify to just SAVE THIS WHOLE HIERARCHY OF OBJECTS I just passed you.
I've been reading about this @Load annotation and I feel like maybe there's something in there I'm missing... I don't know. Need help. Documentation is slim.
UPDATED SOLUTION
For posterity -
Using the allocateId() method decouples the entire ID generation constraint away from the database and you get a VERY clean pattern, particularly if you do as I did:
All database @Entity classes get a private constructor and a static public factory for creating transient objects. This static factory method ( createTransient() ) will always allocate a new ID. So then, all client code can use this method for acquiring new transient objects, or the obvious objectify load for acquiring existing persisted instances. Simple. Done. Lovely.
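A standard-library Java sketch of that factory pattern. IdAllocator is a hypothetical interface standing in for the allocateId() call, and the card list holds plain Long ids in place of Ref<Card> so the example runs without Objectify. What it shows is that a Card has an id from the moment it is created, so the Player can reference it before anything is saved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the datastore's id allocation: hands out ids without saving.
interface IdAllocator {
    long allocateId(Class<?> kind);
}

final class Card {
    private Long id;

    private Card() {} // no-args constructor, left for the loader; it allocates nothing

    // Factory for new (transient) cards: the id exists before the first save.
    static Card createTransient(IdAllocator ids) {
        Card card = new Card();
        card.id = ids.allocateId(Card.class);
        return card;
    }

    Long getId() { return id; }
}

final class Player {
    private Long id;
    private final List<Long> cardIds = new ArrayList<>(); // stands in for List<Ref<Card>>

    private Player() {}

    static Player createTransient(IdAllocator ids) {
        Player player = new Player();
        player.id = ids.allocateId(Player.class);
        return player;
    }

    // Safe at any time: every Card built by the factory already has an id.
    void addCard(Card card) { cardIds.add(card.getId()); }

    Long getId() { return id; }
    List<Long> getCardIds() { return Collections.unmodifiableList(cardIds); }
}
```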
I recommend two things:
Allocate ids manually when you construct your objects using ObjectifyFactory.allocateId(). Do not use the "save with null autogenerates" feature. As you've noticed, it's a PITA to deal with entity objects that have null ids, so don't allow them to exist.
Use deferred saves. ofy().defer().save().entity(blah); You can save almost any number of things this way and they'll only get saved once on commit (or closing of the objectify session). Deferring save on the same entity multiple times produces only a single save.
This pattern of leaving ids null and filling them in on save is a holdover from the JPA days. It didn't work very well with JPA either; there were plenty of frustrating edge cases dealing with entities missing ids (especially when you wanted to put them in maps or sets). The best solution is to simply guarantee that no entity is ever missing an id in the first place.
Note that you'll want to allocate the id in a custom constructor, not the no-args constructor that Objectify uses to build your entity on load. Allocating an id is cheap but still a call to the GAE service layer and you don't want to do this on every load.
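The deferred-save behavior described above can be modeled in memory as follows. This is not Objectify's implementation; DeferredSession and its methods are hypothetical, and the sketch simply assumes pending saves are keyed by kind and id. Because ids are allocated up front, every entity has a stable key, and deferring the same entity again just replaces the pending copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// In-memory model of deferred saves: one pending write per (kind, id),
// flushed once when the session closes.
final class DeferredSession implements AutoCloseable {
    private final Map<String, Object> pending = new LinkedHashMap<>();
    private final Consumer<Object> writer; // hypothetical: performs the actual write

    DeferredSession(Consumer<Object> writer) { this.writer = writer; }

    void deferSave(String kind, long id, Object entity) {
        // A later defer of the same key replaces the earlier pending copy.
        pending.put(kind + ":" + id, entity);
    }

    @Override
    public void close() {
        pending.values().forEach(writer); // each entity is written exactly once
        pending.clear();
    }
}
```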
I want to do the following using Asp.net Web API 2 and RavenDB.
Send a string to RavenDB.
Look up a document containing a field called UniqueString that contains the string I passed to RavenDB.
Return either the document that matched, or a "YES/NO a document with that string exists" - message.
I am completely new to NoSQL and RavenDB, so this has proven to be quite difficult :-) I hope someone can assist me. I assume it is actually quite easy to do, though I haven't been able to find any guides showing it.
This has nothing to do with WebAPI 2, but you can do what you ask for using RavenDb combined with WebAPI 2.
First you need to have an index (or let RavenDb auto create one for you) on the document and property/properties you want to be indexed. This index can be created from code like this:
public class MyDocument_ByUniqueString : AbstractIndexCreationTask<MyDocument>
{
public override string IndexName
{
get { return "MyDocumentIndex/ByUniqueString"; }
}
public MyDocument_ByUniqueString()
{
Map = documents => from doc in documents
select new
{
doc.UniqueString
};
}
}
or created in the RavenDb Studio:
from doc in docs.MyDocuments
select new {
doc.UniqueString
}
After that you can do an "advanced document query" (from a WebAPI 2 controller or similar in your case) on that index and pass in a Lucene wildcard:
using (var session = documentStore.OpenSession())
{
var result = session.Advanced
.DocumentQuery<MyDocument>("MyDocumentIndex/ByUniqueString")
.Where("UniqueString: *uniq*")
.ToList();
}
This query will return all documents that have a property "UniqueString" containing the term "uniq". The document in my case looked like this:
{
"UniqueString": "This is my unique string"
}
Please note, however, that these kinds of wildcards in Lucene might not be very performant, as they may need to scan large amounts of text. The RavenDB documentation even warns against this:
Warning
RavenDB allows to search by using such queries but you have to be
aware that leading wildcards drastically slow down searches. Consider
if you really need to find substrings, most cases looking for words is
enough. There are also other alternatives for searching without
expensive wildcard matches, e.g. indexing a reversed version of text
field or creating a custom analyzer.
http://ravendb.net/docs/article-page/2.0/csharp/client-api/querying/static-indexes/searching
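A rough Java sketch of the "index a reversed version of text field" alternative mentioned in the warning. It is an in-memory illustration with hypothetical names, not RavenDB's API: a leading-wildcard query *suffix becomes a prefix lookup on the reversed strings, which a sorted map can answer with a range scan instead of checking every entry. It only covers the leading-wildcard case; a *uniq* query with wildcards on both sides is not solved by reversal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sorted index keyed by the reversed field value, answering "*suffix" queries.
final class ReversedIndex {
    private final TreeMap<String, List<String>> byReversed = new TreeMap<>(); // reversed value -> doc ids

    void add(String docId, String value) {
        String reversed = new StringBuilder(value).reverse().toString();
        byReversed.computeIfAbsent(reversed, k -> new ArrayList<>()).add(docId);
    }

    // Equivalent to matching the leading-wildcard pattern "*" + suffix.
    List<String> endingWith(String suffix) {
        String prefix = new StringBuilder(suffix).reverse().toString();
        List<String> ids = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, starting at the first key >= prefix.
        for (Map.Entry<String, List<String>> e : byReversed.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            ids.addAll(e.getValue());
        }
        return ids;
    }
}
```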
Hope this helps!
Get the WebApi endpoint working to collect your input. This is independent of RavenDB.
Using the RavenDB client, query the database using Linq or one of the other methods.
After the document is retrieved you may need to write some logic to return the expected result.
I skipped the step where the database gets populated with the data to query. I would leverage the RavenDB client tools as much as possible in your app vs trying to use the HTTP api.
This is one of the few times I couldn't find my question already asked here, so I'm trying to describe my problem and hope to get some help and ideas!
Let's say...
I want to design a RESTful API for a domain model, that might have entities/resources like the following:
class Product
{
String id;
String name;
Price price;
Set<Tag> tags;
}
class Price
{
String id;
String currency;
float amount;
}
class Tag
{
String id;
String name;
}
The API might look like:
GET /products
GET /products/<product-id>
PUT /prices/<price-id>?currency=EUR&amount=12.34
PATCH /products/<product-id>?name=updateOnlyName
When it comes to updating references:
PATCH /products/<product-id>?price=<price-id>
PATCH /products/<product-id>?price=
may set the Product's Price reference to another existing Price, or delete this reference.
But how can I add a new reference of an existing Tag to a Product?
If I wanted to store that reference in a relational database, I would need a relationship table 'products_tags' for that many-to-many relationship, which leads to a clear solution:
POST /product_tags [product: <product-id>, tag: <tag-id>]
But a document-based NoSQL database (like MongoDB) could store this as a one-to-many-relationship for each Product, so I don't need to model a 'new resource' that has to be created to save a relationship.
But
POST /products/<product-id>/tags/ [name: ...]
creates a new Tag (in a Product),
PUT /products/<product-id>/tags/<tag-id>?name=
creates a new Tag with <tag-id> or replaces an existing
Tag with the same id (in a Product),
PATCH /products/<product-id>?tags=<tag-id>
sets the Tag-list and doesn't add a new Tag, and
PATCH /products/<product-id>/tags/<tag-id>?name=...
sets a certain attribute of a Tag.
So I might want to say something like this:
ATTACH /products/<product-id>?tags=<tag-id>
ATTACH /products/<product-id>/tags?tag=<tag-id>
So the point is:
I don't want to create a new resource,
I don't want to set the attribute of a resource, but
I want to ADD a resource to another resource's attribute, which is a set. ^^
Since everything is about resources, one could say:
I want to ATTACH a resource to another.
My question: Which method is the right one, and what should the URL look like?
Your REST is an application state driver, not aimed to be reflection of your entity relationships.
As such, there's no 'if this was the case in the db' in REST. That said, you have pretty good URIs.
You talk about IDs. What is a tag? Isn't a tag a simple string? Why does it have an id? Why isn't its id its namestring?
Why not have PUT /products/<product-id>/tags/tag_name=?
PUT is idempotent, so you are basically asserting the existence of a tag for the product referred to by product-id. If you send this request multiple times, you'd get 201 Created the first time and 200 OK the next time.
If you are building a simple system with a single concurrent user running on a single web server with no concurrency in requests, you may stop reading now.
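An in-memory Java sketch of that idempotent PUT, with hypothetical names. It assumes the product already exists and models only its tag set. Repeating the PUT leaves the same state and returns 201 Created once, then 200 OK. DELETE is included to illustrate how a later PUT re-creates a deleted tag; returning 204 on removal and 404 when the tag is already gone is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Models PUT /products/<product-id>/tags/<tag-name> as "assert this tag exists".
final class ProductTags {
    private final Map<String, Set<String>> tagsByProduct = new HashMap<>();

    int put(String productId, String tag) {
        boolean added = tagsByProduct.computeIfAbsent(productId, k -> new TreeSet<>()).add(tag);
        return added ? 201 : 200; // Created the first time, OK on every repeat
    }

    // Also idempotent in state: the tag is absent afterwards however often this runs.
    int delete(String productId, String tag) {
        Set<String> tags = tagsByProduct.get(productId);
        return tags != null && tags.remove(tag) ? 204 : 404;
    }

    Set<String> tags(String productId) {
        return Collections.unmodifiableSet(tagsByProduct.getOrDefault(productId, Set.of()));
    }
}
```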
If someone in between goes and deletes that tag, your next put request would re-create the tag. Is this what you want?
With optimistic concurrency control, you would pass along the ETag a of the document every time, and return 409 Conflict if you have a newer version b on the server and the diff a..b cannot be reconciled. In the case of tags, you are just using the PUT and DELETE verbs, so you wouldn't have to diff or look at reconciliation.
If you are building a moderately advanced concurrent system, with first-writer-wins semantics, running on a single server, you can stop reading now.
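A minimal Java sketch of that check, with hypothetical names: the ETag is a quoted version number, and the client must echo the one it last saw. Following the answer, a stale write gets 409 Conflict; this sketch treats every mismatch as irreconcilable rather than attempting a diff. HTTP conditional requests using If-Match conventionally answer a failed precondition with 412 Precondition Failed, so check which status your clients expect.

```java
// Optimistic concurrency on one resource: writes must carry the current ETag.
final class VersionedProduct {
    private long version = 1;
    private String body;

    VersionedProduct(String body) { this.body = body; }

    synchronized String etag() { return "\"" + version + "\""; }

    synchronized int replace(String ifMatchEtag, String newBody) {
        if (!etag().equals(ifMatchEtag)) {
            return 409; // stale write: the server has moved on since the client read it
        }
        body = newBody;
        version++; // every successful write produces a new ETag
        return 200;
    }

    synchronized String body() { return body; }
}
```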
That said, I don't think you have considered your transactional boundaries. What are you modifying? A resource? No, you are modifying value objects of the product resource; its tags. So then, according to your model of resources, you should be using PATCH. Do you care about concurrency? Well, then you have much more to think about with regards to PATCH:
How do you represent the diff of a hierarchical JSON object?
How do you know which PATCH requests conflict in a semantic way? For example, we may not care about DELETEs on Tags, but two other properties might interact semantically.
The RFC for HTTP PATCH says this:
With PATCH, however, the enclosed entity contains a set of
instructions describing how a resource currently residing on the
origin server should be modified to produce a new version. The PATCH
method affects the resource identified by the Request-URI, and it also
MAY have side effects on other resources; i.e., new resources may be
created, or existing ones modified, by the application of a PATCH.
PATCH is neither safe nor idempotent as defined by [RFC2616], Section
9.1.
I'm probably going to stop putting strange ideas in your head now. Comment if you want me to continue down this path a bit longer ;). Suffice to say that there are many more considerations that can be done.