Doctrine2 Batch Insert - symfony

Is it possible to insert several entities into the DB with a single query? When I use an example from here, I can see several queries in the Web Debugger.
UPDATED 23.08.2012
I found the following related links. I hope they will help someone understand batch processing:
http://www.doctrine-project.org/blog/doctrine2-batch-processing.html
doctrine2 - How to improve flush efficiency?
Doctrine 2: weird behavior while batch processing inserts of entities that reference other entities
The main points:
Some people seem to be wondering why Doctrine does not use multi-inserts (insert into (...) values (...), (...), (...), ...).
First of all, this syntax is only supported on MySQL and newer PostgreSQL versions. Secondly, there is no easy way to get hold of all the generated identifiers in such a multi-insert when using AUTO_INCREMENT or SERIAL, and an ORM needs the identifiers for identity management of the objects. Lastly, insert performance is rarely the bottleneck of an ORM. Normal inserts are more than fast enough for most situations, and if you really want to do fast bulk inserts, then a multi-insert is not the best way anyway; Postgres COPY or MySQL LOAD DATA INFILE are several orders of magnitude faster.
These are the reasons why it is not worth the effort to implement an abstraction that performs multi-inserts on MySQL and PostgreSQL in an ORM.
I hope that clears up some question marks.
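For reference, the multi-row INSERT syntax being discussed looks like this (just a sketch against a hypothetical users table):

INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com'),
    ('Carol', 'carol@example.com');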

I think that there will still be several INSERT statements, but they will all be executed within a single transaction per "flush" call.
As mentioned here:
http://doctrine-orm.readthedocs.org/en/2.0.x/reference/working-with-objects.html
Each "persist" adds an operation to the current UnitOfWork; it is the call to EntityManager#flush() which actually writes to the database (encapsulating all the operations of the UnitOfWork in a single transaction).
But I have not checked that the behavior I describe above is the actual behavior.
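As a rough illustration (a sketch with a hypothetical product table, assuming MySQL and auto-increment ids), a single flush() of three newly persisted entities would typically send something like:

START TRANSACTION;
INSERT INTO product (name, price) VALUES ('Chair', 49.90);
INSERT INTO product (name, price) VALUES ('Table', 149.00);
INSERT INTO product (name, price) VALUES ('Lamp', 19.90);
COMMIT;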
Best regards,
Christophe

Related

Doctrine schema update or Doctrine migrations

What are the practical advantages of Doctrine Migrations over just running a schema update?
Safety?
The orm:schema-tool:update command (doctrine:schema:update in Symfony) warns
This operation should not be executed in a production environment.
but why is this? Sure, it can delete data but so can a migration.
Flexibility?
I thought I could tailor my migrations to add stuff like column defaults but this often doesn't work as Doctrine will notice the discrepancy between the schema and the code on the next diff and stomp over your changes.
When you are using the schema-tool, no history of database modification is kept, and in a production/staging environment this is a big downside.
Let's assume you have a complicated database structure in a live project. And in the next changeset you have to alter the database somehow. For example, your users' contact phones need to be stored in a different format, not a VARCHAR, but three SMALLINT columns for country code, area code and the phone number.
Well, it's not so hard to figure out a query that would fetch the current data, separate it into three values and insert them back. That's when migrations come into play: you can create your new fields, then do the transforms, and finally drop the field that was holding the data before.
And even more! You can also describe the backwards process (the down migration) for when you need to undo the changes introduced in your migration. Let's assume that someone somewhere relied heavily on the format of the VARCHAR field, and now that you've changed the structure, their code is not working as expected. So you run the down migration and everything gets reverted: in this specific case you'd just bring back the old VARCHAR column, concatenate the values back, and then drop the new fields.
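A rough sketch of what such a migration's up and down steps might execute (MySQL-flavored SQL; the users table, column names and phone format are hypothetical):

-- up: add the new columns, split the old value, drop the old column
-- (assumes the old value was stored like '1-212-5551234')
ALTER TABLE users ADD country_code SMALLINT, ADD area_code SMALLINT, ADD local_number INT;
UPDATE users SET
    country_code = SUBSTRING_INDEX(phone, '-', 1),
    area_code    = SUBSTRING_INDEX(SUBSTRING_INDEX(phone, '-', 2), '-', -1),
    local_number = SUBSTRING_INDEX(phone, '-', -1);
ALTER TABLE users DROP COLUMN phone;

-- down: recreate the old column and concatenate the values back
ALTER TABLE users ADD phone VARCHAR(32);
UPDATE users SET phone = CONCAT(country_code, '-', area_code, '-', local_number);
ALTER TABLE users DROP COLUMN country_code, DROP COLUMN area_code, DROP COLUMN local_number;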
Doctrine's migration tool basically does most of the work for you. When you diff your schema, it generates all the necessary up's and down's, so the only thing you'll have to do is handle the data that could be damaged when the migration is applied.
Also, migrations give the other developers on your team a clear signal that it's time to update their schemas. With just the schema-tool, your teammates would have to run doctrine:schema:update each and every time they pull, because they wouldn't know whether the schema has really changed.
When using migrations, you always see that there are some updates in the migrations folder, which means you need to update your schema.
I think that you indeed nailed it on Safety. With migrations you can go back to another state of the table (just like you can in Git version control). With the schema update command you can only update the tables; there is no log kept that would let you go back in case of a failure once data has already been saved in those tables. I don't know exactly, but doesn't a migration also save the data of the corresponding table that's being updated? That would be essential in my opinion, otherwise there is no big reason to use them.
So yes, I personally think that the main reason for using migrations in a production environment is safety and maybe a bit of flexibility. Safety would be the winner here I think :)
Hope this helps.
edit: Here is another answer with references to the Symfony docs: Is it safe to use doctrine2 migrations in production environment with symfony2 and php
You also can't perform large updates with a plain Doctrine migration. Try, for example, updating an index on a database with 30 million users: it will take a lot of time, during which your app will not be accessible.

How to plan for schema changes in an SQLite database?

I am currently developing an application that will store data in an SQLite database. The database will have much more read- than write-access (in fact, it will be filled with data once, and then almost only reading will happen). Read performance is therefore much more important. The schema I am currently developing is very likely to change in the future, with additional columns and tables being added. I do not have very much experience with databases in general. My question is: specifically in SQLite, are there any pitfalls to be considered when changing a schema? Are there any patterns or best practices to plan ahead for such cases?
Here are some suggestions:
Don't use select * from ... because the meaning of * changes with schema changes; explicitly name the columns your query uses
Keep the schema version number in the database and keep code in the application to convert from schema version N to version N+1; then all the code in the application works with the latest schema version. This may mean having default values to fill added columns (see the sketch after this list).
You can avoid copying tables for schema updates with SQLite version 3.1.3 or better which supports ALTER TABLE ADD COLUMN...
Look into data-marts and star schema design. This might be overkill for your situation, but at least it will prevent you from designing at random.
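A minimal sketch of the version-number suggestion, using SQLite's built-in user_version pragma (the measurements table and the unit column are hypothetical):

-- on startup, read the schema version the file was written with
PRAGMA user_version;

-- upgrade step from version 1 to version 2: add a column with a default value
ALTER TABLE measurements ADD COLUMN unit TEXT DEFAULT 'mm';
PRAGMA user_version = 2;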

Avoiding unnecessary updates in an UPDATE query

In our application, many pages include an "update", and when we update a table we also update unnecessary columns which don't change.
I want to know whether there is a way to avoid unnecessary column updates. We use stored procedures in .NET 2003. In the following link I found a solution, but it is not for stored procedures.
http://blogs.msdn.com/alexj/archive/2009/04/25/tip-15-how-to-avoid-loading-unnecessary-properties.aspx
Thanks
You can really only accomplish this with a good ORM tool that generates the update query for you. It will typically look at what changed and generate the query for only the columns that changed.
If you're using a stored procedure, then all of the column values get sent over to the database anyway when you call it, so you can't save anything there. The SP will probably just execute a run-of-the-mill UPDATE statement. The RDBMS then takes over; it won't physically change the data on disk if it's not different. It's smart enough for that.
So my answer in short: don't worry about it. It's not really a big deal, it would require drastic changes to get what you want, and you won't even see performance benefits.
When I was working at a financial software company, performance was vital. Some tables had hundreds of columns, and the update statements were costly. We created our own ORM layer (in java) which included an object cache. When we generated the update statement, we compared the current values of every field to the values as they were on load and only updated the changed fields.
Our db was SQLServer. I do not remember the performance improvement, but it was substantial and worth the investment. We also did bulk inserts and updates where possible.
I believe that Hibernate and the other big ORMs all do this sort of thing for you, if you do not want to write one yourself.
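To make the contrast concrete, here is a sketch (hypothetical accounts table; the @-parameters stand for the stored procedure's arguments) of the difference between a fixed stored-procedure update and what a dirty-checking ORM emits when only one field changed:

-- what a one-size-fits-all stored procedure typically runs
UPDATE accounts
SET name = @name, email = @email, balance = @balance, updated_at = @updated_at
WHERE id = @id;

-- what a dirty-checking ORM generates when only the balance changed
UPDATE accounts SET balance = @balance WHERE id = @id;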

Ways to circumvent a bad database schema?

Our team has been asked to write a Web interface to an existing SQL Server backend that has its roots in Access.
One of the requirements/constraints is that we must limit changes to the SQL backend. We can create views and stored procedures but we have been asked to leave the tables/columns as-is.
The SQL backend is less than ideal. Most of the relationships are implicit due to a lack of foreign keys. Some tables lack primary keys. Table and column names are inconsistent and include characters like spaces, slashes, and pound signs.
Other than getting a new job or asking them to reconsider this requirement, can anyone provide any good patterns for addressing this deficiency?
NOTE: We will be using SQL Server 2005 and ASP.NET with the .NET Framework 3.5.
Views and Synonyms can get you so far, but fixing the underlying structure is clearly going to require more work.
I would certainly try again to convince the stakeholders that by not fixing the underlying problems they are accruing technical debt, and this will mean a slower velocity of development going forward, eventually to the point where the debt becomes crippling. Whilst you can work around it, the debt will be there.
Even with a data access layer, the underlying problems on the database such as the keys / indexes you mention will cause problems if you attempt to scale.
Easy: make sure that you have robust Data Access and Business Logic layers. You must avoid the temptation to program directly to the database from your ASPX codebehinds!
Even with a robust database schema, I now make it a practice never to work with SQL in the codebehinds - a practice that developed only after learning the hard way that it has its drawbacks.
Here are a few tips to help the process along:
First, investigate the ObjectDataSource class. It will allow you to build a robust BLL that can still feed controls like the GridView without using direct SQL. They look like this:
<asp:ObjectDataSource ID="ObjectDataSource1" runat="server"
OldValuesParameterFormatString="original_{0}"
SelectMethod="GetArticles" <-- The name of the method in your BLL class
OnObjectCreating="OnObjectCreating" <-- Needed to provide an instance whose constructor takes arguments (see below)
TypeName="MotivationBusinessModel.ContentPagesLogic"> <-- The BLL Class
<SelectParameters>
<asp:SessionParameter DefaultValue="News" Name="category" <-- Pass parameters to the method
SessionField="CurPageCategory" Type="String" />
</SelectParameters>
</asp:ObjectDataSource>
If constructing an instance of your BLL class requires that you pass arguments, you'll need the OnObjectCreating link. In your codebehind, implement this as so:
public void OnObjectCreating(object sender, ObjectDataSourceEventArgs e)
{
e.ObjectInstance = new ContentPagesLogic(sessionObj);
}
Next, implementing the BLL requires a few more things that I'll save you the trouble of Googling. Here's an implementation that matches the calls above.
namespace MotivationBusinessModel <-- My business model namespace
{
public class ContentPagesItem <-- The class that "stands in" for a table/query - a List of these is returned after I pull the corresponding records from the db
{
public int UID { get; set; }
public string Title { get; set; } <-- My DAL requires properties but they're a good idea anyway
....etc...
}
[DataObject] <-- Needed to makes this class pop up when you are looking up a data source from a GridView, etc.
public class ContentPagesLogic : BusinessLogic
{
public ContentPagesLogic(SessionClass inSessionObj) : base(inSessionObj)
{
}
[DataObjectMethodAttribute(DataObjectMethodType.Select, true)] <-- Needed to make this *function* pop up as a data source
public List<ContentPagesItem> GetArticles(string category) <-- Note the use of a generic list - which is iEnumerable
{
using (BSDIQuery qry = new BSDIQuery()) <-- My DAL - just for perspective
{
return
qry.Command("Select UID, Title, Content From ContentLinks ")
.Where("Category", category)
.OrderBy("Title")
.ReturnList<ContentPagesItem>();
// ^-- This is a simple table query but it could be any type of join, View or sproc call.
}
}
}
}
Second, it is easy to add DAL/BLL DLLs to your solution as additional projects and then reference them from the main web project. Doing this not only gives your DAL and BLL their own identities, it makes unit testing a snap.
Third, I almost hate to admit it, but this might be one place where Microsoft's Entity Framework comes in handy. I generally dislike LINQ to Entities, but it does permit the kind of code-side specification of data relationships you are lacking in your database.
Finally, I can see why changes to your db structure (e.g. moving fields around) would be a problem but adding new constraints (especially indices) shouldn't be. Are they afraid that a foreign key will end up causing errors in other software? If so...isn't this sort of a good thing; you have to have some pain to know where the disease lies, no?
At the very least, you should be pushing for the ability to add indexes as needed for performance reasons. Also, I agree with others that Views can go a long way toward making the structure more sensible. However, this really isn't enough in the long run. So...go ahead and build Views (Stored Procedures too) but you should still avoid coding directly to the database. Otherwise, you are still anchoring your implementation to the database schema and escaping it in the future will be harder than if you isolate db interactions to a DAL.
I've been here before. Management thinks it will be faster to use the old schema as it is and move forward. I'd try to convince them that a new schema would cause this and all future development to be faster and more robust.
If you are still shot down and cannot do a complete redesign, approach them with a compromise where you can rename columns and tables and add PKs, FKs, and indexes. This shouldn't take much time and will help a lot.
Short of that, you'll have to grind it out. I'd encapsulate everything in stored procedures, where you can add all sorts of checks to enforce data integrity. I'd still sneak in all the PK, FK, index and column-rename changes possible.
Since you stated that you need to "limit" the changes to the database schema, it seems that you can possibly add a Primary Key field to those tables that do not have them. Assuming existing applications aren't performing any "SELECT *..." statements, this shouldn't break existing code. After that's been done, create views of the tables that are in a uniform approach, and set up triggers on the view for INSERT, UPDATE, and DELETE statements. It will have a minor performance hit, but it will allow for a uniform interface to the database. And then any new tables that are added should conform to the new standard.
Very hacky way of handling it, but it's an approach I've taken in the past with a limited access to modify the existing schema.
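A sketch of the view-plus-trigger approach (SQL Server syntax; the legacy table and column names here are hypothetical):

-- a view that gives the legacy table a uniform, bracket-free interface
CREATE VIEW dbo.Customers
AS
SELECT [Cust #] AS CustomerId, [Cust Name] AS CustomerName
FROM dbo.[Customer Master#];
GO

-- an INSTEAD OF trigger so inserts against the view land in the real table
CREATE TRIGGER dbo.Customers_Insert ON dbo.Customers
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.[Customer Master#] ([Cust #], [Cust Name])
    SELECT CustomerId, CustomerName FROM inserted;
END;
GO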
I would do an "end-around" if faced with this problem. Tell your bosses that the best way to handle this is to create a "reporting" database, which is essentially a dynamic copy of the original. You would create scripts or a separate data-pump application to update your reporting database with changes to the original, and propagate changes made to your database back to the original, and then write your web interface to communicate only with your "reporting" database.
Once you have this setup in place, you'd be free to rationalize your copy of the database and bring it back to the realm of sanity by adding primary keys, indexes, normalizing it etc. Even in the short-term, this is a better alternative than trying to write your web interface to communicate with a poorly-designed database, and in the long-term your version of the database would probably end up replacing the original.
can anyone provide any good patterns for addressing this deficiency
Seems that you have been denied the only possibility to "address this deficiency".
I suggest adding integrity-check logic to the stored procedures you are going to write. Foreign keys, unique keys: it can all be enforced manually. Not a nice job, but doable.
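For example, a sketch of enforcing a "foreign key" by hand inside a stored procedure (the legacy table and column names are hypothetical):

CREATE PROCEDURE dbo.Order_Insert
    @CustomerId INT,
    @OrderDate  DATETIME
AS
BEGIN
    -- manual foreign-key check: refuse the insert if the customer does not exist
    IF NOT EXISTS (SELECT 1 FROM dbo.[Customer Master#] WHERE [Cust #] = @CustomerId)
    BEGIN
        RAISERROR('Unknown customer.', 16, 1);
        RETURN;
    END;
    INSERT INTO dbo.[Order Header#] ([Cust #], [Order Date])
    VALUES (@CustomerId, @OrderDate);
END;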
One thing you could do is create lots of views, so that the table and column names can be more reasonable, getting rid of the troublesome characters. I tend to prefer not having spaces in my table/column names so that I don't have to use square brackets everywhere.
Unfortunately you will have problems due to the lack of foreign keys, but you may want to create unique indexes on columns that must be unique, such as a name, to help out.
At least on the bright side you shouldn't really have many joins to do, so your SQL should be simple.
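For instance, a unique index can stand in for a missing key constraint (hypothetical legacy names again):

CREATE UNIQUE NONCLUSTERED INDEX UX_Customer_Name
ON dbo.[Customer Master#] ([Cust Name]);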
If you use LinqToSql, you can manually create relationships in the DBML. Additionally, you can rename objects to your preferred standard (instead of the slashes, etc).
I would start with indexed views on each of the tables, defining a primary key, and perhaps more indexed views in places where you want indexes on the tables. That will start to address some of your performance issues and all of your naming issues.
I would treat these views as if they were raw tables. This is what your stored procedures or data access layer should be manipulating, not the base tables.
In terms of enforcing integrity rules, I prefer to put that in a data access layer in the code with only minimal integrity checks at the database layer. If you prefer the integrity checks in the database, then you should use sprocs extensively to enforce them.
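A sketch of one such indexed view (SQL Server; hypothetical legacy names; note that indexed views require SCHEMABINDING and particular SET options such as QUOTED_IDENTIFIER ON):

CREATE VIEW dbo.Orders
WITH SCHEMABINDING
AS
SELECT [Order No] AS OrderId, [Cust #] AS CustomerId, [Order Date] AS OrderDate
FROM dbo.[Order Header#];
GO

-- the unique clustered index materializes the view and enforces uniqueness,
-- effectively giving the legacy table the primary key it never had
CREATE UNIQUE CLUSTERED INDEX IX_Orders_OrderId ON dbo.Orders (OrderId);
GO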
And don't underestimate the possibility of finding a new job ;-)
It really depends on how much automated test coverage you have. If you have little or none, you can't make any changes without breaking stuff (if it wasn't already broken to start with).
I would recommend that you sensibly refactor things as you make other changes, but don't do a large-scale refactoring on the DB, it will introduce bugs and/or create too much testing work.
Some things like missing primary keys are really difficult to work around and should be rectified sooner rather than later. Inconsistent table names can basically be tolerated forever.
I do a lot of data integration with poorly-designed databases, and the method that I've embraced is to use a cross-referenced database that sits alongside the original source db; this allows me to store views, stored procedures, and maintenance code in an associated database without touching the original schema. The one exception I make to touching the original schema is that I will implement indexes if needed; however, I try to avoid that if at all possible.
The beauty of this method is you can isolate new code from old code, and you can create views that address poor entity definitions, naming errors, etc without breaking other code.
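A minimal sketch of that side-by-side arrangement (SQL Server three-part names; CompanionDb and the legacy names are hypothetical):

USE CompanionDb;
GO
-- a clean view living in the companion database, reading from the untouched source db
CREATE VIEW dbo.Customers
AS
SELECT [Cust #] AS CustomerId, [Cust Name] AS CustomerName
FROM LegacySourceDb.dbo.[Customer Master#];
GO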
Stu

Saving MFC Model as SQLite database

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
I avoid file format changes (SQLite takes care of that)
Free query engine
Undo stack is simplified (table name, column name, new value and so on...)
Opinions?
This is a fine idea. Sqlite is very pleasant to work with!
But remember the old truism (I can't get an authoritative answer from Google about where it originally comes from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points, lines and segments that refer to each other by name, is a good candidate for storing in database tables. But when you start having composite objects with a hierarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.
This is a great idea but before you start I have a few recommendations:
Be careful that each database is uniquely identifiable in some way besides its file name, for example by having a table that describes the file within the database.
Take a look at some of the MFC-based examples and wrappers already available before creating your own. The ones I have seen had borrowed from each other to create a better result. Google: MFC SQLite Wrapper.
Using an SQLite database is also useful for maintaining state. Think ahead about how you would manage that, keeping in mind which features are included in SQLite and which are missing.
You can also think now about how you may extend your application to the web, by making sure your database table structure is easily exportable to other SQL database systems, as well as easy enough to extend to a backup system.
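A hypothetical sketch of the first recommendation, together with the undo-stack idea from the question (all table and column names are made up):

-- a table that identifies the document independently of its file name
CREATE TABLE document_info (
    property TEXT PRIMARY KEY,
    value    TEXT
);
INSERT INTO document_info VALUES ('format_version', '1');
INSERT INTO document_info VALUES ('document_id', 'drawing-0001');

-- a simple undo stack: one row per change (table, column, row, old and new value)
CREATE TABLE undo_stack (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    row_id      INTEGER NOT NULL,
    old_value   TEXT,
    new_value   TEXT
);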

Resources