I’m using DatastoreIO from my streaming Dataflow pipeline and getting an error when writing an entity with the same key.
2016-12-10T22:51:04.385Z: Error: (af00222cfd901860): Exception: com.google.datastore.v1.client.DatastoreException: A non-transactional commit may not contain multiple mutations affecting the same entity., code=INVALID_ARGUMENT
If I use a random number in the key, things work, but I need to update the same key. Is there a transactional way to do this using DatastoreIO?
// makeKey and makeValue are static imports from com.google.datastore.v1.client.DatastoreHelper
static class CreateEntityFn extends DoFn<KV<String, Tile>, Entity> {
    private static final long serialVersionUID = 0;
    private final String namespace;
    private final String kind;
    CreateEntityFn(String namespace, String kind) {
        this.namespace = namespace;
        this.kind = kind;
    }
    public Entity makeEntity(String key, Tile tile) {
        Entity.Builder entityBuilder = Entity.newBuilder();
        Key.Builder keyBuilder = makeKey(kind, key);
        if (namespace != null) {
            keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        entityBuilder.setKey(keyBuilder.build());
        entityBuilder.getMutableProperties().put("tile", makeValue(tile.toString()).build());
        return entityBuilder.build();
    }
    @Override
    public void processElement(ProcessContext c) {
        String key = c.element().getKey();
        // this works: key = key.concat(":" + UUID.randomUUID().toString());
        c.output(makeEntity(key, c.element().getValue()));
    }
}
...
...
inputData = pipeline
.apply(PubsubIO.Read.topic(pubsubTopic));
windowedDataStreaming = inputData
.apply(Window.<String>into(
SlidingWindows.of(Duration.standardMinutes(15))
.every(Duration.standardSeconds(31))));
...
...
...
//Create a Datastore entity
PCollection<Entity> siteTileEntities = tileSiteKeyed
.apply(ParDo.named("CreateSiteEntities").of(new CreateEntityFn(options.getNamespace(), options.getKind())));
// write site tiles to datastore
siteTileEntities
.apply(DatastoreIO.v1().write().withProjectId(options.getDataset()));
// Run the pipeline
pipeline.run();
Your code snippet doesn't show how tileSiteKeyed is created. Presumably it's a PCollection<KV<String, Tile>>, but if it can contain duplicate String keys, that would explain the issue.
In general, a PCollection<KV<K, V>> may contain multiple KV pairs with the same key. To ensure unique keys per window, apply a GroupByKey: it gives you a PCollection<KV<K, Iterable<V>>> with unique keys per window. Then change CreateEntityFn to take an Iterable<Tile> and build a single mutation containing all the changes you need to make.
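To make the grouping step concrete, here is a plain-Java model (not Beam code) of what a GroupByKey followed by a merging CreateEntityFn does to the elements of one window: every key comes out exactly once, so only one mutation per key is produced. Pair and the merge rule (joining the sorted tile strings) are hypothetical placeholders for KV<String, Tile> and your real Tile-merging logic. In the pipeline itself the grouping would come from GroupByKey.<String, Tile>create(); since GroupByKey does not guarantee the order of the grouped values, the sketch deliberately uses an order-independent merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupThenMerge {
    // Hypothetical stand-in for KV<String, Tile>; a Tile is modeled as a String here.
    static final class Pair {
        final String key;
        final String value;

        Pair(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Models GroupByKey within one window: each key once, with all of its values.
    static Map<String, List<String>> groupByKey(List<Pair> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Collapses each group into exactly one entity (one mutation) per key.
    // The merge rule is a hypothetical, order-independent placeholder.
    static Map<String, String> oneEntityPerKey(List<Pair> pairs) {
        Map<String, String> entities = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : groupByKey(pairs).entrySet()) {
            List<String> tiles = new ArrayList<>(group.getValue());
            Collections.sort(tiles); // do not rely on the order GroupByKey delivers values in
            entities.put(group.getKey(), String.join("+", tiles));
        }
        return entities;
    }

    public static void main(String[] args) {
        List<Pair> window = Arrays.asList(
                new Pair("site-1", "tileA"),
                new Pair("site-2", "tileB"),
                new Pair("site-1", "tileC"));
        System.out.println(oneEntityPerKey(window)); // {site-1=tileA+tileC, site-2=tileB}
    }
}
```

The input has two pairs for site-1, but the output holds a single entity for it, which is what keeps a commit from containing two mutations for the same key.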
This error indicates that Cloud Datastore received a Commit request with two mutations for the same key (i.e. it tries to insert the same entity twice or modify the same entity twice).
You can avoid the error by only including one mutation per key per Commit request.
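If you control how mutations are grouped into commits (for example in your own writer DoFn), another way to respect the one-mutation-per-key rule is to close the current commit as soon as a key repeats. The sketch below is plain Java that works on key strings only; toCommitBatches and the maxBatchSize cap are hypothetical names, and it is not a description of how DatastoreIO batches internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueKeyBatcher {
    // Splits an ordered list of mutation keys into commit batches so that no batch
    // touches the same key twice. A repeated key (or a full batch) closes the current
    // batch and starts a new one, preserving the order of mutations for each key.
    static List<List<String>> toCommitBatches(List<String> mutationKeys, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Set<String> keysInCurrent = new HashSet<>();
        for (String key : mutationKeys) {
            if (keysInCurrent.contains(key) || current.size() == maxBatchSize) {
                batches.add(current);
                current = new ArrayList<>();
                keysInCurrent = new HashSet<>();
            }
            current.add(key);
            keysInCurrent.add(key);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("k1", "k2", "k1", "k3");
        System.out.println(toCommitBatches(keys, 500)); // [[k1, k2], [k1, k3]]
    }
}
```

Because a repeated key always starts a new batch, the later mutation for a key lands in a later commit, so the last write wins. The cap value used in main is arbitrary.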
Related
Does the Java DynamoDBMapper load whole items when the @DynamoDBTable-annotated class maps only a subset of their attributes?
Example: a "Product" table holds items with these attributes: id, name, description. I would like to get the names of several products without loading the description (which would be a huge amount of data).
Does this code load description from DynamoDB?
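import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperFieldModel.DynamoDBAttributeType;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTyped;
import java.util.UUID;
@DynamoDBTable(tableName = "Product")
public class ProductName {
    private UUID id;
    private String name;
    @DynamoDBHashKey
    @DynamoDBTyped(DynamoDBAttributeType.S)
    public UUID getId() { return id; }
    public void setId(UUID id) { this.id = id; }
    @DynamoDBAttribute
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}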
...
DynamoDBMapper dynamoDBMapper = ...
dynamoDBMapper.batchLoad(products); // TODO is description loaded? what is the amount of Consumed Read Capacity Units?
As the DynamoDB documentation says:
DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application. For this reason, the number of capacity units consumed will be the same whether you request all of the attributes (the default behavior) or just some of them (using a projection expression). The number will also be the same whether or not you use a filter expression.
As you can see, projections do not affect the number of capacity units consumed.
By the way, in your case the description will be returned anyway: you do not need to annotate every field with a DynamoDB annotation, only those that are keys, are named differently, or need custom converters. All non-annotated fields are populated from the corresponding DB attributes automatically.
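To make the quoted rule concrete, the calculator below applies the documented rounding: read capacity is charged per 4 KB of stored item size, each item in a BatchGetItem request is rounded up separately, and an eventually consistent read costs half of a strongly consistent one. It takes the full stored size of each item, which is the point of the quote: mapping only id and name does not lower the cost. The item sizes are invented, and the consistency mode is a parameter because the sketch does not assume which mode your mapper configuration uses.

```java
public class ReadCapacity {
    // Read capacity is charged in 4 KB blocks of stored item size.
    static final long BLOCK_BYTES = 4 * 1024;

    // Read capacity units for a batch read of items with the given stored sizes (bytes).
    // Each item is rounded up to the next 4 KB block on its own, as for BatchGetItem;
    // eventually consistent reads cost half as much as strongly consistent ones.
    // The sizes are full item sizes: a projection would not reduce them.
    static double readCapacityUnits(long[] storedItemSizesBytes, boolean stronglyConsistent) {
        long blocks = 0;
        for (long size : storedItemSizesBytes) {
            blocks += Math.max(1, (size + BLOCK_BYTES - 1) / BLOCK_BYTES);
        }
        return stronglyConsistent ? blocks : blocks / 2.0;
    }

    public static void main(String[] args) {
        // Three hypothetical products whose description makes each item 100 KB.
        long[] items = {100 * 1024, 100 * 1024, 100 * 1024};
        System.out.println(readCapacityUnits(items, true));  // 75.0
        System.out.println(readCapacityUnits(items, false)); // 37.5
    }
}
```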
Could someone give a simple use case showing why someone would use copyToRealm() instead of createObject()?
It is not clear to me why and when anyone would use copyToRealm() if there is createObject().
In the examples at https://realm.io/docs/java/latest/ they seem pretty much the same.
copyToRealm() takes an unmanaged object and connects it to a Realm, while createObject() creates an object directly in a Realm.
For example, it is very useful when you copy objects created by GSON from your REST API responses into Realm.
realm.createObject() also returns a RealmProxy instance that you manipulate directly, so storing N objects creates N instances. Instead, you can use the following pattern to store N objects with only 1 unmanaged instance:
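// RealmUtils.executeInTransaction appears to be the answerer's own helper; Realm's built-in equivalent is realm.executeTransaction(...)
RealmUtils.executeInTransaction(realm -> {
    Cat defaultCat = new Cat(); // unmanaged realm object, reused for every row
    for (CatBO catBO : catsBO.getCats()) {
        defaultCat.setId(catBO.getId());
        defaultCat.setSourceUrl(catBO.getSourceUrl());
        defaultCat.setUrl(catBO.getUrl());
        realm.insertOrUpdate(defaultCat); // copies the current values; defaultCat stays unmanaged
    }
});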
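Reusing one instance is safe because insertOrUpdate() copies the object's current field values into the Realm and leaves the object you passed in unmanaged. The plain-Java model below imitates only that copy-on-insert behavior; Cat and InMemoryStore here are hypothetical stand-ins, not Realm classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReuseUnmanagedObject {
    // Hypothetical stand-in for an unmanaged Cat; NOT a Realm class.
    static final class Cat {
        String id;
        String url;
    }

    // Hypothetical stand-in for a Realm. Like insertOrUpdate(), it copies the object's
    // current field values (keyed by primary key) and keeps no reference to the object,
    // so the caller can mutate and reuse the same instance.
    static final class InMemoryStore {
        final Map<String, String> urlById = new LinkedHashMap<>();

        void insertOrUpdate(Cat cat) {
            urlById.put(cat.id, cat.url); // insert a new key or update an existing one
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        Cat reused = new Cat(); // a single instance for all rows
        String[][] rows = {{"1", "a.jpg"}, {"2", "b.jpg"}, {"1", "c.jpg"}};
        for (String[] row : rows) {
            reused.id = row[0];
            reused.url = row[1];
            store.insertOrUpdate(reused);
        }
        System.out.println(store.urlById); // {1=c.jpg, 2=b.jpg}
    }
}
```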
But to actually answer your question: copyToRealmOrUpdate() makes sense if you want to persist elements, put the returned managed objects in a RealmList<T>, and set that RealmList on another RealmObject. This mostly happens when your RealmObject classes match the downloaded, parsed objects.
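import com.bluelinelabs.logansquare.annotation.JsonField;
import com.bluelinelabs.logansquare.annotation.JsonObject;
import io.realm.RealmObject;
import io.realm.annotations.PrimaryKey;
@JsonObject
public class Cat extends RealmObject {
    @PrimaryKey
    @JsonField(name = "id")
    String id;
    @JsonField(name = "source_url")
    String sourceUrl;
    @JsonField(name = "url")
    String url;
    // getters, setters
}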
final List<Cat> cats = ...; // get from LoganSquare
realm.executeTransaction(new Realm.Transaction() {
    @Override
    public void execute(Realm realm) {
        Person person = realm.where(Person.class).equalTo("id", id).findFirst();
        RealmList<Cat> realmCats = new RealmList<>();
        for (Cat cat : realm.copyToRealmOrUpdate(cats)) {
            realmCats.add(cat);
        }
        person.setCats(realmCats);
    }
});
I want to use Realm to replace SQLite on Android to store a list of objects. My code is very simple:
public class MyRealmObject extends RealmObject {
public String getField() {
return field;
}
public void setField(String field) {
this.field = field;
}
private String field;
...
}
List<MyObject> myObjects = new ArrayList<>();
Realm realm = Realm.getInstance(this);
for (MyRealmObject obj : realm.allObjects(MyRealmObject.class)) {
    myObjects.add(new MyObject(obj));
}
realm.close();
return myObjects;
However, it is actually slower than a simple SQLite table on my test device. Am I using it the wrong way? Are there any optimization tricks?
Why do you wrap all your RealmObjects in the MyObject class? Copying the entire result set in particular means you lose the main benefit of using Realm, namely that it doesn't copy data unless needed.
RealmResults implements the List interface so you should be able to use the two interchangeably.
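List<MyRealmObject> myObjects;
Realm realm = Realm.getInstance(this);
// RealmResults is a live view into the Realm, so keep this Realm instance open while myObjects is in use
myObjects = realm.allObjects(MyRealmObject.class);
return myObjects;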
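The cost difference described above can be illustrated with a lazy List. The class below is an analogy only, not how RealmResults is implemented: it builds an element only when get() is called, so reading one row loads one row, while copying the list into an ArrayList (which is what wrapping every result in MyObject amounts to) forces every element to load. LazyList and its loads counter are hypothetical.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class LazyVsCopy {
    // Read-only List view that builds element i only when get(i) is called.
    // An analogy for "don't copy data until it's needed"; NOT RealmResults' implementation.
    static final class LazyList<T> extends AbstractList<T> {
        private final int size;
        private final IntFunction<T> loader;
        int loads = 0; // how many elements have been materialized so far

        LazyList(int size, IntFunction<T> loader) {
            this.size = size;
            this.loader = loader;
        }

        @Override
        public T get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            loads++;
            return loader.apply(index);
        }

        @Override
        public int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        LazyList<String> rows = new LazyList<>(100_000, i -> "row-" + i);
        String first = rows.get(0); // only the element you read is built
        System.out.println(first + " after " + rows.loads + " load(s)");

        List<String> copy = new ArrayList<>(rows); // copying forces every element to be built
        System.out.println(copy.size() + " copied, " + rows.loads + " loads in total");
    }
}
```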
I'm trying to update an entity using Entity Framework version 6.
I'm selecting the entity from the database like so...
public T Find<T>(object id) where T : class
{
return this._dbContext.Set<T>().Find(id);
}
And I update the entity like so:
public T Update<T>(T entity) where T : class
{
// get the primary key of the entity
object id = this.GetPrimaryKeyValue(entity);
// get the original entry
T original = this._dbContext.Set<T>().Find(id);
if (original != null)
{
// do some automatic stuff here (taken out for example)
// overwrite original property values with new values
this._dbContext.Entry(original).CurrentValues.SetValues(entity);
this._dbContext.Entry(original).State = EntityState.Modified;
// commit changes to database
this.Save();
// return entity with new property values
return entity;
}
return default(T);
}
The GetPrimaryKeyValue function is as follows:
private object GetPrimaryKeyValue<T>(T entity) where T : class
{
var objectStateEntry = ((IObjectContextAdapter)this._dbContext).ObjectContext
.ObjectStateManager
.GetObjectStateEntry(entity);
return objectStateEntry.EntityKey.EntityKeyValues[0].Value;
}
For clarity: I'm selecting the original entry because I need to perform some concurrency logic (which I've taken out). That data isn't posted with the entity, so I need to select it manually from the DB again to perform the checks.
I know the GetPrimaryKeyValue function is not ideal if the entity has a composite primary key. I just want it to work for now.
When updating, Entity Framework throws the error below while executing the GetPrimaryKeyValue function:
The ObjectStateManager does not contain an ObjectStateEntry with a reference to an object of type 'NAME_OF_ENTITY_IT_CANNOT_FIND'
I've written many repositories before and never had this issue, and I can't find why it's not working (hence the post).
Any help would be much appreciated.
Thanks guys!
Steve
It seems like you are having trouble getting the primary key from the entity being passed in. That entity is most likely detached (it was not loaded by or attached to this context), which is why the ObjectStateManager has no entry for it. Instead of going through EF to get this data, you could use EF's Key attribute or create your own attribute, and use reflection to collect the key property names. This also lets you retrieve multiple keys if needed. Below is an example I created in LINQPad; set it to "Program" mode, paste this in, and you should see it work. Adapt the code as you like. I implemented an IEntity interface, but it is not required, and you can change the attribute to anything you want.
Here are the results:
Keys found:
CustomIdentifier
LookASecondKey
Here is the code:
// this is just a usage demo
void Main()
{
// create your object from wherever
var car = new Car(){ CustomIdentifier= 1, LookASecondKey="SecretKey", Doors=4, Make="Nissan", Model="Altima" };
// pass the object in
var keys = GetPrimaryKeys<Car>(car);
// you have the list of keys now so work with them however
Console.WriteLine("Keys found: ");
foreach(var k in keys)
Console.WriteLine(k);
}
// this is the method you probably want; add whatever custom logic or checking you need
private IEnumerable<string> GetPrimaryKeys<T>(T entity) where T : class, IEntity
{
// place to store keys
var keys = new List<string>();
// loop through each property on the entity
foreach(var prop in typeof(T).GetProperties())
{
// check for the custom attribute you created, replace "EntityKey" with your own
if(prop.CustomAttributes.Any(p => p.AttributeType.Equals(typeof(EntityKey))))
keys.Add(prop.Name);
}
// check for key and throw if not found (up to you)
if(!keys.Any())
throw new Exception("No EntityKey attribute was found; please make sure the entity has this attribute on at least one property.");
// return all the keys
return keys;
}
// example of the custom attribute you could use
[AttributeUsage(AttributeTargets.Property)]
public class EntityKey : Attribute
{
}
// this interface is not required, but I like to restrict the DAL to an interface
public interface IEntity { }
// example of your model
public class Car : IEntity
{
[EntityKey] // add the attribute to the property
public int CustomIdentifier {get;set;}
[EntityKey] // I am demonstrating multiple keys, but you can have just one
public string LookASecondKey {get;set;}
public int Doors {get;set;}
public string Make {get;set;}
public string Model {get;set;}
}
I am facing some problems in my project: when I try to update an entity, I get different types of errors.
From what I read online, these errors happen because:
1 - I get the entity object from a method that creates its own DataContext locally, and the update method creates another DataContext locally, so the entity is not updated (it does not even throw an exception).
I found many articles related to this problem:
1 - Adding a timestamp column to the table (this had no effect in my project; I tried it).
One person said to use a SINGLE DataContext for everyone.
I did this by creating the following class:
public class Factory
{
private static LinqDemoDbDataContext db = null;
public static LinqDemoDbDataContext DB
{
get
{
if (db == null)
db = new LinqDemoDbDataContext();
return db;
}
}
}
public static Student GetStudent(long id)
{
LinqDemoDbDataContext db = Factory.DB;
//LinqDemoDbDataContext db = new LinqDemoDbDataContext();
Student std = (from s in db.Students
where s.ID == id
select s).Single();
return std;
}
public static void UpdateStudent(long studentId, string name, string address)
{
Student std = GetStudent(studentId);
LinqDemoDbDataContext db = Factory.DB;
std.Name = name;
std.Address = address;
db.SubmitChanges();
}
In this case I want to update a student's details.
It solved my problem, but now the question is:
Is it a good approach to use the above technique in a web-based application?
No. DataContext is not thread-safe, so you cannot safely share one DataContext among the threads handling different requests.
Also, this pattern is called Singleton, not Factory.
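Microsoft's LINQ to SQL guidance describes a DataContext as lightweight and meant to last for one unit of work, which also fixes the original two-contexts problem: load and save through the same instance. The Java sketch below models that pattern; Student, DataContext and updateStudent are hypothetical classes that only imitate change tracking and are not the LINQ to SQL API. Each call creates its own context, so no instance is shared between request threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContextPerRequest {
    static final class Student {
        final long id;
        String name;
        String address;

        Student(long id, String name, String address) {
            this.id = id;
            this.name = name;
            this.address = address;
        }
    }

    // Hypothetical change-tracking context, loosely modeled on a DataContext (NOT LINQ to SQL's API).
    // Like DataContext it is not thread-safe: the tracked map is a plain HashMap.
    static final class DataContext implements AutoCloseable {
        private final Map<Long, String[]> database; // shared store: id -> {name, address}
        private final Map<Long, Student> tracked = new HashMap<>();

        DataContext(Map<Long, String[]> database) {
            this.database = database;
        }

        // Loads a student and starts tracking it in THIS context.
        Student getStudent(long id) {
            return tracked.computeIfAbsent(id, key -> {
                String[] row = database.get(key);
                return new Student(key, row[0], row[1]);
            });
        }

        // Writes back only the entities this context is tracking.
        void submitChanges() {
            for (Student s : tracked.values()) {
                database.put(s.id, new String[] {s.name, s.address});
            }
        }

        @Override
        public void close() {
            tracked.clear();
        }
    }

    // One short-lived context per unit of work (e.g. per web request): the entity is
    // loaded and saved through the same instance, and no instance is shared between threads.
    static void updateStudent(Map<Long, String[]> database, long id, String name, String address) {
        try (DataContext ctx = new DataContext(database)) {
            Student student = ctx.getStudent(id);
            student.name = name;
            student.address = address;
            ctx.submitChanges();
        }
    }

    public static void main(String[] args) {
        Map<Long, String[]> database = new ConcurrentHashMap<>();
        database.put(1L, new String[] {"Ann", "Old Street"});
        updateStudent(database, 1L, "Ann", "New Street");
        System.out.println(database.get(1L)[1]); // New Street
    }
}
```

In the C# code above, the same idea would be a using (var db = new LinqDemoDbDataContext()) { ... } block inside UpdateStudent that both loads and saves the student, instead of a static shared instance.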