DataNucleus - HashMultimap datastore created as mediumblob - jdo

I am trying to use the type HashMultimap in one of my entity classes:
@Persistent
private HashMultimap<String, String> testMap;
JDO metadata:
<field name="testMap" persistence-modifier="persistent" serialized="false">
<join />
</field>
However, when I run SchemaTool with DataNucleus, the field is created as a mediumblob. My question is whether I can force this to be created as a join table.
What I am after is basically a hashmap-like data type that supports duplicate keys (multiple values per key).
I am using MySQL as the datastore.
Thanks

Andy from DataNucleus
HashMultimap is not one of the types supported by default; DataNucleus serializes field types it doesn't support, which is why the column comes out as a mediumblob.
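Given that, one possible workaround (a sketch only, assuming Guava's HashMultimap; the holder class and field names here are hypothetical) is to persist the multimap contents as a join-table collection of simple key/value objects and rebuild the multimap on demand:

import java.util.ArrayList;
import java.util.List;
import javax.jdo.annotations.Join;
import javax.jdo.annotations.NotPersistent;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import com.google.common.collect.HashMultimap;

@PersistenceCapable
class MultimapEntry {
    @Persistent String key;     // duplicate keys simply become multiple rows
    @Persistent String value;
    MultimapEntry(String key, String value) { this.key = key; this.value = value; }
}

@PersistenceCapable
public class MyEntity {
    @Persistent
    @Join   // 1-N stored via a join table rather than a serialized blob
    private List<MultimapEntry> entries = new ArrayList<MultimapEntry>();

    @NotPersistent
    private HashMultimap<String, String> testMap;   // transient view, rebuilt lazily

    public void putEntry(String key, String value) {
        entries.add(new MultimapEntry(key, value));
        testMap = null;   // invalidate the cached view
    }

    public HashMultimap<String, String> getTestMap() {
        if (testMap == null) {
            testMap = HashMultimap.create();
            for (MultimapEntry e : entries) {
                testMap.put(e.key, e.value);
            }
        }
        return testMap;
    }
}

This keeps the data in a join table, so duplicate keys are just extra rows rather than part of a serialized blob.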

Related

How to update a single property of multiple entities of a specific kind in the datastore?

I want to update one property of each entity in one particular kind in my datastore. In traditional SQL, we would do something like this:
update <tablename> set <property> = <value>; {where clause is optional}
Now, how can I do the same thing for the datastore using Go code?
In Datastore you can't perform an update like that without retrieving the entities. You have to pull all entities in that kind, update the property on each, and re-upsert the now-updated entities (preferably in batches); see the sketch after the links below.
Go Datastore Queries: https://cloud.google.com/datastore/docs/concepts/queries#datastore-datastore-basic-query-go
Go Update Entities: https://cloud.google.com/datastore/docs/concepts/entities#datastore-datastore-update-go
Go Batch Upsert: https://cloud.google.com/datastore/docs/concepts/entities#datastore-datastore-batch-upsert-go
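The linked pages show the exact Go calls. To make the pattern concrete, here is a sketch of the same read-modify-upsert loop using the com.google.cloud.datastore Java client; the kind "Task" and property "status" are made-up placeholders, and the batching reflects Datastore's limit of 500 writes per commit:

import java.util.ArrayList;
import java.util.List;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;

public class BulkUpdate {
    public static void main(String[] args) {
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

        // 1. Pull all entities of the kind (there is no server-side UPDATE).
        QueryResults<Entity> results =
                datastore.run(Query.newEntityQueryBuilder().setKind("Task").build());

        // 2. Update the property on each entity in memory.
        List<Entity> updated = new ArrayList<>();
        while (results.hasNext()) {
            updated.add(Entity.newBuilder(results.next()).set("status", "done").build());
        }

        // 3. Re-upsert the modified entities in batches.
        for (int i = 0; i < updated.size(); i += 500) {
            List<Entity> batch = updated.subList(i, Math.min(i + 500, updated.size()));
            datastore.put(batch.toArray(new Entity[0]));
        }
    }
}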

Joins in datastore

How can I implement joins in the datastore? I am using Java, and I want to insert a file (Excel, image, Word, or PDF) into the datastore and retrieve it later.
Joins are not supported in GAE. See this documentation:
http://code.google.com/appengine/docs/java/datastore/jdo/relationships.html
If you are looking for an RDBMS style database in GAE, then Google Cloud SQL would be your choice: http://code.google.com/apis/sql/docs/developers_guide_java.html
Joins are not supported in the GAE datastore, but you can use the newer Cloud SQL service:
https://cloud.google.com/products/cloud-sql
See this documentation:
https://developers.google.com/cloud-sql/docs/introduction
You can't do joins in the datastore; a better option is to filter entries as you go. For example, suppose you want to join a Student entity with a Class entity. You can't write a simple
select s.student, c.class from student s join class c on s.class_id = c.class_id where ...
Instead, do a filter on Student first, collect those values (the 'foreign key') in a buffer, and then use that buffer to filter on the other kind, Class, as in the sketch below.
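Here is a sketch of that two-step pattern using the GAE low-level datastore API; the Student/Class kinds and property names are hypothetical, and note that an IN filter is limited to 30 values, so a large buffer would need to be chunked:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;

public class FilterJoin {
    public static void main(String[] args) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // Step 1: filter on Student and buffer the 'foreign key' values.
        Query studentQuery = new Query("Student")
                .setFilter(new FilterPredicate("dept", FilterOperator.EQUAL, "CS"));
        Set<Long> classIds = new HashSet<Long>();
        for (Entity student : ds.prepare(studentQuery).asIterable()) {
            classIds.add((Long) student.getProperty("classId"));
        }

        // Step 2: use the buffered keys to filter on Class.
        Query classQuery = new Query("Class")
                .setFilter(new FilterPredicate("classId", FilterOperator.IN,
                        new ArrayList<Long>(classIds)));
        for (Entity clazz : ds.prepare(classQuery).asIterable()) {
            System.out.println(clazz.getProperty("name"));
        }
    }
}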

How to generate a Java class from a SQLite database for ORMLite

Given a SQLite database as input, I want to know how I can generate an ORMLite Java class that maps to the associated database. Many thanks.
You could try Telosys Tools, an Eclipse plugin for code generation that works from an existing database with customizable Velocity templates.
See: https://sites.google.com/site/telosystools/
A set of JPA templates is available on GitHub:
https://github.com/telosys-tools-community/jpa-templates-TT206-v2
A Java class for JPA is very close to an ORMLite class, so it's possible to adapt the templates in order to generate ORMLite Java classes.
A global tutorial for Spring MVC and JPA:
https://sites.google.com/site/telosystutorial/springmvc-jpa-springdatajpa
(you can just consider the JPA bundle)
I am new to ORMLite and also have the same need.
For SQLite, I read and parse the SQL statement in the "sql" field of the "sqlite_master" table.
Although this works well with tables, I had to find another way to deal with views; now I use Excel to load data from views into ADO objects and parse the fields' properties to generate the Java POJO class definition text, then paste it into the IDE.
It's not perfect, but it has saved me a lot of time.
This is not something that ORMLite can do itself -- you are going to have to help it. If you want to edit your question and include your SQLite schema, I'll edit my answer to include some of the necessary objects.
For example, here are some field mappings:
INTEGER -> int
VARCHAR -> String
BOOLEAN -> boolean
TIMESTAMP -> Date
BIGINT -> long
...
I would suggest creating a class and using the TableUtils.getCreateTableStatements(ConnectionSource, Class<T>) method to see what schema is dumped out and how it compares to your existing schema. Then add or modify the fields until you get as close a match as possible.
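For example, here is a sketch of that workflow with hypothetical table and column names: hand-write a candidate class using the mappings above, then dump the schema ORMLite would generate and compare it with what is stored in sqlite_master:

import java.util.Date;
import com.j256.ormlite.field.DatabaseField;
import com.j256.ormlite.jdbc.JdbcConnectionSource;
import com.j256.ormlite.support.ConnectionSource;
import com.j256.ormlite.table.DatabaseTable;
import com.j256.ormlite.table.TableUtils;

public class SchemaCheck {

    @DatabaseTable(tableName = "account")
    public static class Account {
        @DatabaseField(generatedId = true)
        int id;             // INTEGER -> int
        @DatabaseField
        String name;        // VARCHAR -> String
        @DatabaseField
        boolean active;     // BOOLEAN -> boolean
        @DatabaseField
        Date created;       // TIMESTAMP -> Date
        @DatabaseField
        long balance;       // BIGINT -> long
    }

    public static void main(String[] args) throws Exception {
        ConnectionSource cs = new JdbcConnectionSource("jdbc:sqlite:mydb.db");
        // Print the CREATE TABLE statements ORMLite would issue, then compare
        // them by eye with the schema in sqlite_master.
        for (String sql : TableUtils.getCreateTableStatements(cs, Account.class)) {
            System.out.println(sql);
        }
        cs.close();
    }
}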

Avoid DataNucleus joins?

I'm experimenting with moving a JDBC webapp to JDO DataNucleus 2.1.1.
Assume I have some classes that look something like this:
public class Position {
    private Integer id;
    private String title;
}
public class Employee {
    private Integer id;
    private String name;
    private Position position;
}
The contents of the Position SQL table really don't change very often. Using JDBC, I read the entire table into memory (with the ability to refresh periodically or at-will). Then, when I read an Employee into memory, I simply retrieve the position ID from the Employee table and use that to obtain the in-memory Position instance.
However, using DataNucleus, if I iterate over all Positions:
Extent<Position> extent = pm.getExtent(Position.class, true);
Iterator<Position> iter = extent.iterator();
while (iter.hasNext()) {
    Position position = iter.next();
    System.out.println(position.toString());
}
And then later, with a different PersistenceManager, iterate over all Employees, obtaining their Position:
Extent<Employee> extent = pm.getExtent(Employee.class, true);
Iterator<Employee> iter = extent.iterator();
while (iter.hasNext()) {
    Employee employee = iter.next();
    System.out.println(employee.getPosition());
}
Then DataNucleus appears to produce SQL joining the two tables when I obtain an Employee's Position:
SELECT A0.POSITION_ID,B0.ID,B0.TITLE FROM MYSCHEMA.EMPLOYEE A0 LEFT OUTER JOIN MYSCHEMA."POSITION" B0 ON A0.POSITION_ID = B0.ID WHERE A0.ID = <1>
My understanding is that DataNucleus will use a cached Position instance when available. (Is that correct?) However, I'm concerned that the joins will degrade performance. I'm not yet far enough along to run benchmarks. Are my fears misplaced? Should I continue, and benchmark? Is there a way to have DataNucleus avoid the join? Here is my JDO metadata:
<jdo>
    <package name="com.example.staff">
        <class name="Position" identity-type="application" schema="MYSCHEMA" table="Position">
            <inheritance strategy="new-table"/>
            <field name="id" primary-key="true">
                <column name="ID" jdbc-type="integer"/>
            </field>
            <field name="title">
                <column name="TITLE" jdbc-type="varchar"/>
            </field>
        </class>
    </package>
</jdo>
<jdo>
    <package name="com.example.staff">
        <class name="Employee" identity-type="application" schema="MYSCHEMA" table="EMPLOYEE">
            <inheritance strategy="new-table"/>
            <field name="id" primary-key="true">
                <column name="ID" jdbc-type="integer"/>
            </field>
            <field name="name">
                <column name="NAME" jdbc-type="varchar"/>
            </field>
            <field name="position" table="Position">
                <column name="POSITION_ID" jdbc-type="int" />
                <join column="ID" />
            </field>
        </class>
    </package>
</jdo>
I guess what I'm hoping to be able to do is tell DataNucleus to go ahead and read the POSITION_ID int as part of the default fetch group, and see if the corresponding Position is already cached. If so, then set that field. If not, then do the join later, if called upon. Better yet, go ahead and stash that int ID somewhere, and use it if getPosition() is later called. That would avoid the join in all cases.
I would think that knowing the class and the primary key value would be enough to avoid the naive case, but I don't yet know enough about DataNucleus.
With the helpful feedback I've received, my .jdo is now cleaned up. However, after adding the POSITION_ID field to the default fetch group, I'm still getting a join.
SELECT 'com.example.staff.Employee' AS NUCLEUS_TYPE,A0.ID,A0."NAME",A0.POSITION_ID,B0.ID,B0.TITLE FROM MYSCHEMA.EMPLOYEE A0 LEFT OUTER JOIN MYSCHEMA."POSITION" B0 ON A0.POSITION_ID = B0.ID
I understand why it is doing that; the naive method will always work. I was just hoping it was capable of more. Although DataNucleus might not read all columns from the result set, instead returning the cached Position, it is still calling upon the datastore to access a second table, with all that entails, including possible disk seeks and reads. The fact that it will throw that work away is little consolation.
What I was hoping to do was tell DataNucleus that all Positions will be cached, trust me on that. And if for some reason you find one that isn't, blame me for the cache miss. I understand that you'll have to (transparently) perform a separate select on the Position table. (Even better, pin any Positions you do have to go fetch due to a cache miss. That way there won't be a cache miss on the object again.)
That is what I'm doing now using JDBC, by way of a DAO. One of the reasons for investigating a persistence layer was to ditch these DAOs. It is difficult to imagine moving to a persistence layer that can't move beyond naive fetches resulting in expensive joins.
As soon as Employee has not only a Position but also a Department and other fields, an Employee fetch causes a half dozen tables to be accessed, even though all of those objects are already pinned in the cache and are addressable given their class and primary key. In fact, I can implement this myself by changing Employee.position to an Integer, creating an IntIdentity, and passing it to PersistenceManager.getObjectById(), as in the sketch below.
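A sketch of that manual approach (the DAO shape here is hypothetical), using the JDO identity API so the lookup can be satisfied from the cache:

import javax.jdo.PersistenceManager;
import javax.jdo.identity.IntIdentity;
import com.example.staff.Position;

public class EmployeeRow {
    private Integer positionId;   // the raw FK column value, not a Position reference

    public Position getPosition(PersistenceManager pm) {
        // Class + primary key is enough to build the JDO identity; the PM
        // consults its caches before touching the datastore.
        return (Position) pm.getObjectById(new IntIdentity(Position.class, positionId));
    }
}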
What I think I'm hearing is that DataNucleus is not capable of this optimization. Is that right? It's fine, just not what I expected.
By default, a join will not be done when the Employee entity is fetched from the datastore, it will only be done when Employee.position is actually read (this is called lazy loading).
Additionally, this second fetch can be avoided using the level 2 cache. First check that the level 2 cache is actually enabled (in DataNucleus 1.1 it is disabled by default; in 2.0 it is enabled by default). You should probably then "pin" the class so that Position entities are cached indefinitely.
The level 2 cache can cause issues if other applications use the same database, however, so I would recommend only enabling it for classes such as Position which rarely change. For other classes, set the "cacheable" attribute to false (the default is true).
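For instance, here is a sketch of enabling the cache and pinning via the standard JDO DataStoreCache API; the property name and value are assumptions to be checked against your DataNucleus version's documentation:

import java.util.Properties;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManagerFactory;
import com.example.staff.Position;

public class CacheSetup {
    public static PersistenceManagerFactory createFactory(Properties props) {
        // Ensure the level 2 cache is enabled (it may already be, depending on version).
        props.setProperty("datanucleus.cache.level2.type", "soft");
        PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);

        // Pin Position (and any subclasses) so cached instances are kept indefinitely.
        pmf.getDataStoreCache().pinAll(true, Position.class);
        return pmf;
    }
}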
EDITED TO ADD:
The <join> tag in your metadata is not suitable for this situation. In fact, you don't need to specify the relationship explicitly at all; DataNucleus will figure it out from the types. But you are right when you say that you need POSITION_ID to be read in the default fetch group. This can all be achieved with the following change to your metadata:
<field name="position" default-fetch-group="true">
<column name="POSITION_ID" jdbc-type="int" />
</field>
EDITED TO ADD:
Just to clarify, after making the metadata change described above, I ran the test code you provided (backed by a MySQL database) and I saw only these two queries:
SELECT 'com.example.staff.Position' AS NUCLEUS_TYPE,`THIS`.`ID`,`THIS`.`TITLE` FROM `POSITION` `THIS` FOR UPDATE
SELECT 'com.example.staff.Employee' AS NUCLEUS_TYPE,`THIS`.`ID`,`THIS`.`NAME`,`THIS`.`POSITION_ID` FROM `EMPLOYEE` `THIS` FOR UPDATE
If I run only the second part of the code (the Employee extent), then I see only the second query, without any access to the POSITION table at all. Why? Because DataNucleus initially provides "hollow" Position objects, and the default implementation of Position.toString() inherited from Object doesn't access any internal fields. If I override the toString() method to return the position's title (see the sketch after the queries below) and then run the second part of your sample code, the calls to the database are:
SELECT 'com.example.staff.Employee' AS NUCLEUS_TYPE,`THIS`.`ID`,`THIS`.`NAME`,`THIS`.`POSITION_ID` FROM `EMPLOYEE` `THIS` FOR UPDATE
SELECT `A0`.`TITLE` FROM `POSITION` `A0` WHERE `A0`.`ID` = <2> FOR UPDATE
SELECT `A0`.`TITLE` FROM `POSITION` `A0` WHERE `A0`.`ID` = <1> FOR UPDATE
(and so on, one fetch per Position entity). As you can see, there are no joins being performed, and so I'm surprised to hear that your experience is different.
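For reference, a toString() override of that sort is just:

@Override
public String toString() {
    return title;   // reading a field forces the hollow Position to be loaded
}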
Regarding your description of how you hope caching should work, that is how the level 2 cache ought to work when a class is pinned. In fact, I wouldn't even bother trying to pre-load Position objects into the cache at application start-up. Just let DN cache them cumulatively.
It's true that you may have to accept some compromises if you adopt JDO...you'll have to relinquish the absolute control that you get with hand-rolled JDBC-based DAOs. But in this case at least you should be able to achieve what you want. It really is one of the archetypal use cases for the level 2 cache.
Adding on to Todd's reply, to clarify a few things:
A <join> tag on a 1-1 relation means nothing. Well, it could be interpreted as saying "create a join table to store this relationship", but DataNucleus doesn't support such a concept, since best practice is to use an FK in either the owner or the related table. So remove the <join>.
A "table" attribute on a 1-1 relation suggests that the field is stored in a secondary table, and you don't want that either, so remove it.
1. You retrieve Position objects, so it issues something like
SELECT 'org.datanucleus.test.Position' AS NUCLEUS_TYPE,A0.ID,A0.TITLE FROM "POSITION" A0
2. You retrieve Employee objects, so it issues something like
SELECT 'org.datanucleus.test.Employee' AS NUCLEUS_TYPE,A0.ID,A0."NAME" FROM EMPLOYEE A0
Note that it doesn't retrieve the FK for the position here, since that field is not in the default fetch group (it is lazy-loaded).
3. You access the position field of an Employee object, so the FK needs retrieving (since DataNucleus doesn't know which Position object relates to this Employee), so it issues
SELECT A0.POSITION_ID,B0.ID,B0.TITLE FROM EMPLOYEE A0 LEFT OUTER JOIN "POSITION" B0 ON A0.POSITION_ID = B0.ID WHERE A0.ID = ?
4. At this point it doesn't need to retrieve the Position object itself, since it is already present in the cache, so that object is returned.
All of this is expected behaviour IMHO. You could put the "position" field of Employee into its default fetch group, so that the FK is retrieved in step 2, hence removing the SQL call in step 3.

Using GUID with SQL Server and NHibernate

I'm running NHibernate with SQL Server CE, and I am trying to use GUIDs as my ID column. This is the code I have so far:
Mapping:
<class name="DatabaseType" table="DBMON_DATABASE_TYPE">
<id name="Id" column="DATABASE_TYPE_ID">
<generator class="guid" />
</id>
<property name="DispName" />
</class>
And this is the create statement it generates:
create table DBMON_DATABASE_TYPE (
    DATABASE_TYPE_ID BIGINT not null,
    DispName NVARCHAR(255) null,
    primary key (DATABASE_TYPE_ID)
)
And this is the kind of insert statement I want to be able to run on it:
Insert into DBMON_DATABASE_TYPE (DATABASE_TYPE_ID,DISPNAME) values ('f5c7181e-e117-4a98-bc06-733638a3a264','DOC')
And this is the error I get when I try that:
Major Error 0x80040E14, Minor Error 26306
> Insert into DBMON_DATABASE_TYPE (DATABASE_TYPE_ID,DISPNAME) values ('f5c7181e-e117-4a98-bc06-733638a3a264','DOC')
Data conversion failed. [ OLE DB status value (if known) = 2 ]
Once again, my goal is to be able to use GUIDs as the ID column of my table; they don't even need to be auto-generated, as I can generate them manually in the Save/SaveOrUpdate methods of NHibernate. If there is any other information you need, please let me know!
In your mapping you need to indicate that you want NHibernate to treat the ID as a GUID in the database schema (it looks like you are generating your schema from the NHibernate mapping). Off the top of my head, the following should work:
<id name="Id" column="DATABASE_TYPE_ID" type="Guid">
<generator class="guid" />
</id>
instead of
DATABASE_TYPE_ID BIGINT not null,
it needs to be
DATABASE_TYPE_ID UNIQUEIDENTIFIER not null,
I would strongly advise you to use NEWSEQUENTIALID() as the default instead, since you are using the ID as a clustered index; sequential GUIDs won't cause page splits and thus fragmentation. See here for how to use it: http://msdn.microsoft.com/en-us/library/ms189786.aspx
Dare I recommend using NHibernate's guid.comb generator instead?
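That would be a one-line change in the mapping; a sketch using NHibernate's built-in guid.comb generator class:

<id name="Id" column="DATABASE_TYPE_ID" type="Guid">
    <generator class="guid.comb" />
</id>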
