Evict objects from objectify cache - objectify

This question is especially for the Objectify team.
I'm persisting my objects through this code pattern:
Entity filled = ofy().save().toEntity(myPojo);
filled.setUnindexedProperty("myStuff", "computedSpecialValue");
datastore.put(filled);
Reading back my objects, I noticed they are fetched from the cache, since Objectify was never notified that it should evict the updated entity.
I like the Objectify cache feature since it saves me the time of grabbing data from memcache and reconstructing the objects on every read, so I do want my objects cached, but I also want to be able to evict them.
This discussion says there was no solution as of mid-2013: https://groups.google.com/forum/#!msg/objectify-appengine/n3FJjnYVVsk/6Xp99zReOKQJ
If that's still the case, I'd expect an API like
ofy().save().entity(myPojo).evict();
And by the way, I imagine the API would be more consistent if
Entity filled = ofy().save().toEntity(myPojo);
was replaced by
Entity filled = ofy().save().entity(myPojo).toEntity();
Naturally, there's a costly workaround to the issue:
save the entity twice (once manually, then through Objectify).

While there is no formal API for evicting cache entries, it's not hard to do:
MemcacheServiceFactory
.getMemcacheService(ObjectifyFactory.MEMCACHE_NAMESPACE)
.delete(key.toWebSafeString());
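For reference, key here would be the Objectify Key of the entity that was just written. A rough sketch of how this could fit the question's scenario (assuming the low-level Entity filled from the question, and that ofy().clear() can be used to reset the session cache as well; this is illustrative only, not a formal API):
import static com.googlecode.objectify.ObjectifyService.ofy;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.ObjectifyFactory;

// Derive the Objectify key from the raw datastore key of the entity
// that was just written with the low-level API.
Key<?> key = Key.create(filled.getKey());

// Evict the (now stale) copy from Objectify's memcache-backed global cache.
MemcacheServiceFactory
    .getMemcacheService(ObjectifyFactory.MEMCACHE_NAMESPACE)
    .delete(key.toWebSafeString());

// The session cache is a separate layer; clearing it avoids a stale hit
// if the same Objectify session is reused for the read.
ofy().clear();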

Related

Objectify: is there a way to know whether an entity is loaded from the Objectify session or directly from the datastore?

I have two questions:
1) Suppose I have already loaded 50 entities from the datastore using a set of filters, so they are present in the Objectify session. After a while, if I try to load the same entities with a different set of filters, will it fetch them from the Objectify session or from the datastore?
2) I have 50 entities already loaded and available in the Objectify session, and now I try to load some entities with a set of filters. Say this filter fetches 55 entities, of which 50 are the ones I have already loaded and the other 5 are new. Will it fetch all 55 entities from the datastore, or will it fetch 50 from the session and the remaining 5 from the datastore?
Objectify always prefers to give you objects from the session. The answer to 1 is that you will get objects from the session. The answer to 2 is that you will get objects from the session (as many as possible).
Keep in mind that queries (i.e. not get-by-key operations) always reach out to the datastore to execute. Depending on a variety of factors, Objectify might issue a keys-only query and then perform a batch get-by-key for any "missing" entities, or it might issue a full query and throw out any extra data that is already present in the session.
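A rough illustration of that behavior (the Thing entity class and its filter fields here are made up for the example, not taken from the original question):
import static com.googlecode.objectify.ObjectifyService.ofy;
import com.googlecode.objectify.Key;
import java.util.List;

// First query: executes in the datastore and seeds the session cache.
List<Thing> red = ofy().load().type(Thing.class).filter("color", "red").list();

// Second query with different filters: the query itself still runs in the
// datastore, but any entity already sitting in the session is handed back
// as the same instance rather than rebuilt from the query results.
List<Thing> big = ofy().load().type(Thing.class).filter("size >", 10).list();

// Get-by-key is different: if the entity is in the session, there is no
// datastore round trip at all.
Thing one = ofy().load().key(Key.create(Thing.class, 123L)).now();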

Why add a new entity just to modify another one?

I'm reading the source of an ASP.NET Core example project (MSDN) and trying to understand all of it.
There's an Edit Razor page which shows the values of an entity record in <input> fields, allowing the user to see and change different fields of a given record. It contains these lines:
Movie = await _context.Movie.FirstOrDefaultAsync(m => m.ID == id);
...
_context.Attach(Movie).State = EntityState.Modified;
I don't understand why it adds a new entity and changes its EntityState to Modified, instead of fetching the record, changing it, and then calling SaveChanges().
My guess is that their example is loading the movie in one call, passing it to the view, then in another update action passing the modified entity from the view back to the controller, which attaches it, sets its state to Modified, and then calls SaveChanges().
IMHO this is an extremely bad practice with EF for a number of reasons, and I have no idea why Microsoft uses it in examples (other than that it makes CRUD look easy-peasy).
By serializing entities to the view, you are typically serializing far more data to send across the wire than your view actually needs. You give malicious, or merely curious, users far more information about your system than you should.
You are bound to run into serializer errors with bi-directional references ("A" has a reference to "B", which has a reference back to "A"). Serializers (like JSON serializers) generally don't like these.
You are bound to run into performance issues with lazy loading calls as the serializer "touches" references. When dealing with collections of results, the resulting lazy load calls can blow performance completely out of the water.
Without lazy loading enabled, you can easily run into issues where referenced data comes through as null, or as potentially incomplete collections, because the context may have some of the referenced data in its cache to associate with the entity, but not the complete set of child records.
By passing entities back to the controller, you expose your system to unintentional modifications: an attacker can modify the entity data in ways you never intended, and when you attach it, set the state to Modified, and save, you overwrite the data state - e.g. changing FKs, or otherwise altering data that is not supported, or even displayed, by your UI.
You are bound to run into stale-data issues where the data may have changed between the point you read the entity initially and the point it is saved. Attaching and saving takes a brutal "last-in-wins" approach to concurrent data.
My general advice is to:
1. Leverage Select or Automapper's ProjectTo to populate ViewModel classes with just the fields your view will need. This avoids the risks of lazy loads, and minimizes the data passed to the client. (Faster, and reveals nothing extra about your system)
2. Trust absolutely nothing coming back from the client. Validate the returned view model object, then only after you're satisfied it is legit, load the entity(ies) from the context and copy the applicable fields across. This also gives you the opportunity to evaluate row versioning to handle concurrency issues.
Disclaimer: You certainly can address most of the issues I pointed out while still passing entities back & forth, but you definitely leave the door wide open to vulnerabilities and bugs when someone just defaults to an attach & save, or when lazy loads start creeping in.

GAE Datastore kind design with a requirement of too many entity fields

One of my datastore entities is growing too many fields, so it could become a performance bottleneck in the future.
As of now the entity consists of 100 fields; if I need to fetch 100 entities, each having 100 fields, it would definitely be a performance hit (considering the underlying serialization and deserialization while fetching the data from the datastore).
So is it a good idea to convert the entire entity to a blob, store it under a key, and later parse the data back into the required object format?
Any valuable suggestions please?
Unless you have done some profiling and find that serialization is a real bottleneck, I wouldn't worry about how many fields you have. Object assembly and disassembly in Java is fast. In the unlikely event that you actually are hitting limits (say, thousands of entities with thousands of fields) you can write a custom Objectify translator that eliminates all the reflection overhead.
This sounds like premature optimization.
I'm not so sure that converting the entity to a blob will increase performance much, since you will still need to deserialise the blob into entities later on in your application code.
If you never need all the fields of the object, then one method of increasing performance is to use projection queries. (See https://developers.google.com/appengine/docs/java/datastore/projectionqueries)
Projection queries basically allow you to return only the properties you require. This works because they use the information stored in the indexes, so the datastore never needs to deserialise the entity. It also means that you must have an index defined for any property you use in a projection query.
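For illustration, a low-level projection query might look roughly like this (a sketch only: the Person kind and its name/age properties are made up, and each projected property must be indexed):
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.PropertyProjection;
import com.google.appengine.api.datastore.Query;
import java.util.List;

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

// Ask only for the two properties the caller actually needs; the full
// entities are never loaded or deserialised.
Query q = new Query("Person")
    .addProjection(new PropertyProjection("name", String.class))
    .addProjection(new PropertyProjection("age", Long.class));

List<Entity> results = datastore.prepare(q)
    .asList(FetchOptions.Builder.withLimit(100));

for (Entity e : results) {
    String name = (String) e.getProperty("name");
    Long age = (Long) e.getProperty("age");
    // build whatever lightweight view is needed from just these properties
}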

Symfony2 Doctrine merge

I am studying https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/reference/working-with-associations.html but I cannot figure out what cascade merge does. I have seen elsewhere that
$new_object = $em->merge($object);
basically creates a new managed object based on $object. Is that correct?
$em->merge() is used to take an Entity which has been taken out of the context of the entity manager and 'reattach it'.
If the Entity was never managed, merge is equivalent to persist.
If the Entity was detached, or serialized (put in a cache perhaps) then merge more or less looks up the id of the entity in the data store and then starts tracking any changes to the entity from that point on.
Cascading a merge extends this behavior to associated entities of the one you are merging. This means that changes are cascaded to the associations and not just the entity being merged.
I know this is an old question, but I think it is worth mentioning that $em->merge() is deprecated and will be removed soon:
Merge operation is deprecated and will be removed in Persistence 2.0. Merging should be part of the business domain of an application rather than a generic operation of ObjectManager.
Also, please read this doc (v3) on how they expect entities to be stored:
https://www.doctrine-project.org/projects/doctrine-orm/en/latest/cookbook/entities-in-session.html#entities-in-the-session
It is a good idea to avoid storing entities in serialized formats such as $_SESSION: instead, store the entity identifiers or raw data.

What's the best way to cache complicated search queries in a .NET webapp?

I have a website that allows users to query for specific recipes using various search criteria. For example, you can say "Show me all recipes that I can make in under 30 minutes that will use chicken, garlic and pasta but not olive oil."
This query is sent to the web server over JSON, and deserialized into a SearchQuery object (which has various properties, arrays, etc).
The actual database query itself is fairly expensive, and there's a lot of default search templates that would be used quite frequently. For this reason, I'd like to start caching common queries. I've done a little investigation into various caching technologies and read plenty of other SO posts on the subject, but I'm still looking for advice on which way to go. Right now, I'm considering the following options:
Built in System.Web.Caching: This would provide a lot of control over how many items are in the cache, when they expire, and their priority. However, cached objects are keyed by a string, rather than a hashable object. Not only would I need to be able to convert a SearchQuery object into a string, but the hash would have to be perfect and not produce any collisions.
Develop my own InMemory cache: What I'd really like is a Dictionary<SearchQuery, Results> object that persists in memory across all sessions. Since search results can start to get fairly large, I'd want to be able to cap how many queries would be cached and provide a way for older queries to expire. Something like a FIFO queue would work well here. I'm worried about things like thread safety, and am wondering if writing my own cache is worth the effort here.
I've also looked into some other third party cache providers such as NCache and Velocity. These are both distributed cache providers and are probably completely overkill for what I need at the moment. Plus, it seems every cache system I've seen still requires objects to be keyed by a string. Ideally, I want something that holds a cache in process, allows me to key by an object's hash value, and allows me to control expiration times and priorities.
I'd appreciate any advice or references to free and preferably open source solutions that could help me out here. Thanks!
Based on what you are saying, I recommend you use System.Web.Caching and build it into your DataAccess layer, shielding it from the rest of your system. When called, you can either run your real-time query or pull from a cached object, based on your business/application needs. I do this today, but with Memcached.
An in-memory cache should be pretty easy to implement. I can't think of any reason why you should have particular concerns about validating the uniqueness of a SearchQuery object versus any other - that is, while the key must be a string, you can just store the original object along with the results in the cache, and validate equality directly after you've got a hit on the hash. I would use System.Web.Caching for the benefits you've noted (expiration, etc.). If there happened to be a collision, then the 2nd one would just not get cached. But this would be extremely rare.
Also, the amount of memory needed to store search results should be trivial. You don't need to keep the data of every single field, of every single row, in complete detail. You just need to keep a fast way to access each result, e.g. an int primary key.
Finally, if there are possibly thousands of results for a search that could be cached, you don't even need to keep an ID for each one - just keep the first 100 or something (as well as the total number of hits). I suspect if you analyzed how people use search results, it's a rare person that goes beyond a few pages. If someone did, then you can just run the query again.
So basically you're just storing a primary key for the first X records of each common search, and then if you get a hit on your cache, all you have to do is run a very inexpensive lookup of a handful of indexed keys.
Take a quick look at the Enterprise Library Caching Application Block. Assuming you want a web-application-wide cache, this might be the solution you're looking for.
I'm assuming that generating a database query from a SearchQuery object is not expensive, and you want to cache the result (i.e. rowset) obtained from executing the query.
You could generate the query text from your SearchQuery object and use that text as the key for a lookup using System.Web.Caching.
From a quick reading of the documentation for the Cache class, it appears that the keys have to be unique - which they would be if you used the query text, not a hash of it.
EDIT
If you are concerned about long cache keys then check the following links:
Cache key length in asp.net
Maximum length of cache keys in HttpRuntime.Cache object?
It seems that the Cache class stores the cached items in an internal dictionary, which uses the key's hash. Keys (query text) with the same hash would end up in the same bucket in the dictionary, where it's just a quick linear search to find the required one when doing a cache lookup. So I think you'd be okay with long key strings.
The asp.net caching is pretty well thought out, and I don't think this is a case where you need something else.
