Optimizing Doctrine performance: is SELECT * a bad idea? - Symfony

We are trying to optimize a project that is consuming a lot of memory. All of our queries are built using this kind of syntax:
$qb->select(array('e'))
->from('MyBundle:Event', 'e');
This is converted in a query selecting every field of the table, like this:
SELECT t0.id AS id1,
t0.field1 AS field12,
t0.field2 AS field23,
t0.field3 AS field34,
t0.field4 AS field45
FROM event t0
Is it a good idea for performance to use the partial object syntax to hydrate only some predefined fields? I really don't know whether it will help performance, and whether I will run into a lot of disadvantages because the other fields will be null. What do you usually do in your select queries with Doctrine?
Regards.
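For reference, a minimal sketch of Doctrine's partial object syntax (field names borrowed from the generated SQL above; adjust to your own entity):
// Hydrates only id and field1; every other property stays null
$qb->select('partial e.{id, field1}')
   ->from('MyBundle:Event', 'e');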

My two cents
I suppose that hydration (object hydration with lazy loading, of course) is fine as long as you don't know how many fields, and which ones, you will need to pull from the DB tables into objects. If you know you have to retrieve all fields, it is better to fetch them once and work with them, instead of running a time-consuming query every time.
However, as a good practice, when I have retrieved and used my objects I unset them explicitly (except when they are the last instructions of my function, which will return and implicitly free them).
Update
$entity_manager = $this->getDoctrine()->getManager();
$my_obj_repo = $entity_manager->getRepository('MyBundleName:my_obj');
$my_obj = $my_obj_repo->fooHydrateFunction(12); // here I don't pull all data out of the DB

// do stuff with this object, like extracting or manipulating data

if ($my_obj->getBarField() == null) // the only field loaded by fooHydrateFunction, so no lazy loading
{
    $my_obj->setBarField($request->query->get('bar'));
    $entity_manager->persist($my_obj);
    $entity_manager->flush();
}

// here my object isn't necessary anymore
unset($my_obj); // this is a plain PHP instruction

// continue with my business logic
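A sketch of what fooHydrateFunction() could look like on the repository, using the partial syntax from the question (a hypothetical implementation; barField is assumed to be the only field needed besides the identifier):
public function fooHydrateFunction($id)
{
    // Partial DQL: hydrates only id and barField, leaving the rest null
    return $this->createQueryBuilder('o')
        ->select('partial o.{id, barField}')
        ->where('o.id = :id')
        ->setParameter('id', $id)
        ->getQuery()
        ->getOneOrNullResult();
}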

Related

RecordSortedList and temporary table

I have a performance issue with multiple temporary tables that I'm trying to solve with RecordSortedList, but I'm getting strange results. I have a temporary table that has a couple hundred thousand records inserted into it, which is then used elsewhere for joins to other temporary tables. The problem is that, after trace-parsing this solution, all the individual inserts are taking too long, so I was hoping to use a RecordSortedList to bulk-insert into the staging table. However, I can't find a handle to the temporary table after the RecordSortedList.insertDatabase() call.
I've tried something like this:
RecordSortedList tmpTableSortedList;
MyTempTable      myTempTable;
AssetTrans       assetTrans;
int              i = 1;

tmpTableSortedList = new RecordSortedList(tableNum(MyTempTable));
tmpTableSortedList.sortOrder(fieldNum(MyTempTable, LineNum));

// the real scenario has much more complicated data gathering; this is just a sample
while select * from assetTrans
{
    myTempTable.AssetGroup = assetTrans.AssetGroup;
    myTempTable.LineNum = i;
    tmpTableSortedList.ins(myTempTable);
    i++;
}

tmpTableSortedList.insertDatabase();
// strange things happen here
MyTempTable     myTempTableCopy;
AnotherTmpTable anotherTmpTable;

tmpTableSortedList.first(myTempTableCopy); // returns a buffer, but not one usable in a join

// does not work, I imagine because myTempTableCopy isn't actually pointing to the
// records inserted above; somehow the temp table is out of scope
while select * from anotherTmpTable
    join myTempTableCopy
    where anotherTmpTable.id == myTempTableCopy.id
{
    // logic
}
Is there a way to get a pointer to the temp table after the call to RecordSortedList.insertDatabase()? I've also tried linkPhysicalTable() and a few other things, but maybe RecordSortedList was not supposed to be used with tempDb tables?
Edit: as Aliaksandr points out below, this works with RecordInsertList instead of RecordSortedList.
but maybe RecordSortedList was not supposed to be used with tempDb tables?
Error message when using TempDb tables:
RecordInsertList or RecordSortedList operations are not allowed with database temporary tables.
So it's not allowed, which might make sense because RecordSortedList is a memory-based object and TempDb tables are not. I would have thought you could, though, because I'm not sure there's a huge difference between a TempDb table and a regular table when they're both stored on disk.
If you wanted to use an InMemory table, look at \Classes\CustVendSettle, specifically the variable rslTmpOverUnderReverseTax, which uses an InMemory table.
If TempDb tables were allowed, you would use getPhysicalTableName() to get the handle, combined with useExistingTempDBTable().
Or did I misread your question?
does not work, I imagine because the myTempTableCopy isn't actually pointing to the inserted records above; somehow the temp table is out of scope.
The new method of RecordSortedList has an additional Common parameter where you should pass your tempDB table buffer.
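A sketch of what that looks like, with the constructor signature as described above (verify it against your kernel version; the buffer must be instantiated server-side, see the quote below):
MyTempTable myTempTable; // tempDB buffer, instantiated on the server

// Pass the buffer so the list is tied to that physical table instance
tmpTableSortedList = new RecordSortedList(tableNum(MyTempTable), myTempTable);
tmpTableSortedList.sortOrder(fieldNum(MyTempTable, LineNum));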
Error message when using TempDb tables:
RecordInsertList or RecordSortedList operations are not allowed with database temporary tables.
So it's not allowed, which might make sense because RecordSortedList is a memory-based object and TempDb tables are not.
Although the message says we can't use temporary tables for such operations, in fact we can. We just need to be careful, because the code must be executed on the server.
RecordSortedList objects must be server-located before the insertDatabase method can be called. Otherwise, an exception is thrown.
I have a temporary table that has a couple hundred thousand records being inserted into it
There is no limit to the size of a RecordSortedList object, but it is completely memory-based, so there are potential memory consumption problems. This may not be the best solution in your case.

Avoiding inserting duplicate data into table in MS Dynamics AX

I have a custom table that I'm inserting data into. I do not want duplicate data to end up there, so I created a unique index consisting of the 20-ish fields that I want to be unique. As expected, when I run my job to insert data, it fails, tells me it was trying to insert a duplicate record, and stops the job there. If I wrap a tts around it, the whole thing fails.
My question is: how can I make the job continue and just stop the duplicates from being inserted? Note, as mentioned above, I have 20-ish fields that make up the key; it'd be cumbersome to write something that checks for existing records matching all 20 fields.
I found it: keeping the unique index on the table, I wrapped the insert in a try/catch, which apparently has its own exception type for this, in place of the bare insert():
try
{
    customTable.insert();
}
catch (Exception::DuplicateKeyException)
{
    // clears the last infolog message, which is created by trying to insert a duplicate
    infolog.clear(Global::infologLine() - 1);
}
Man, I wouldn't delegate the management of this to exception handling. If it's only in a job, it's OK, but if you plan to manage records elsewhere, be warned that if you use nested try/catch blocks, control goes to the outermost try/catch block, skipping the inner ones. Well, there are two or three exception types that are exempt (check the programming manual, I don't remember them now; they were related to database record locking and the like).
I would create a static exists method on the table, and be careful to select only RecId for performance. Yes, writing 20 fields in a select is a pain, but you will do that ONCE, and in the long term it's the best and most maintainable approach.
public static boolean exists(Type1 _field1, Type2 _field2 /* ... */)
{
    MyTable myTable;
    boolean ret = false;

    if (_field1 && _field2) // mandatory fields
    {
        ret = (select firstonly RecId from myTable
                   where myTable.Field1 == _field1
                      && myTable.Field2 == _field2).RecId != 0;
    }

    return ret;
}
In general I wouldn't use this method inside insert() or update() unless there's a good reason for it (in that case, it can be interesting to set AllowDuplicates == Yes if performance is critical, because you're managing duplicates manually; be careful with doUpdate()/doInsert() and with external inserts/updates). I would use this method in your job or other places to check for duplicates before inserting/updating.
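For example, in the job (a sketch; CustomTable and the field names are placeholders for your table and its 20-ish key fields):
// Check before inserting, instead of relying on exception handling
if (!CustomTable::exists(customTable.Field1, customTable.Field2 /* , ... */))
{
    customTable.insert();
}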
Why don't you implement a validateWrite method and avoid inserting the duplicates?
if (table.validateWrite())
{
    table.insert();
}
else
{
    // log the skipped record
}

SQLite - Get a specific row index for a Sorted/Filtered Query

I'm creating a caching system to take data from an SQLite database table using a sorted/filtered query and display it. The tables I'm pulling from can be potentially very large and, of course, I need to minimize impact on memory by only retaining a maximum number of rows in memory at any given time. This is easily done by using LIMIT and OFFSET to load only the records I need and update the cache as needed. Implementing this is trivial. The problem I'm having is determining where the insertion index is for a new record inserted into a particular query so I can update my UI appropriately. Is there an easy way to do this? So far the ideas I've had are:
Dump the entire cache, re-count the Query results (there's no guarantee the new row will be included), refresh the cache and refresh the entire UI. I hope it's obvious why that's not really desirable.
Use my own algorithm to determine whether the new row is included in the current query, whether it is included in the currently cached results, and at what index it should be inserted if it's within the current cached scope. The biggest downfall of this approach is its complexity and the risk that my own sorting/filtering algorithm won't match SQLite's.
Of course, what I want is to be able to ask SQLite: Given 'Query A' what is the index of 'Row B', without loading the entire query results. However, so far I haven't been able to find a way to do this.
I don't think it matters, but this is all occurring on an iOS device, using Objective-C.
More Info
The query and subsequent cache are based on user input. Essentially the user can re-sort and filter (or search) to alter the results they're seeing. My reluctance to simply recreate the cache on insertions (and edits, actually) comes from wanting to provide a 'smoother' UI experience.
I should point out that I'm leaning toward option "2" at the moment. I played around with creating my own caching/indexing system by loading all the records in a table and performing the sort/filter in memory using my own algorithms. So much of the code needed to determine whether and/or where a particular record is in the cache is already there, so I'm slightly predisposed to use it. The danger lies in having a cache that doesn't match the underlying query. If I include a record in the cache that the query wouldn't return, I'll be in trouble and probably crash.
You don't need record numbers.
Save the values of the ordered field in the first and last records of the LIMITed query result.
Then you can use these to check whether the new record falls into this range.
In other words, assuming that you order by the Name field, and that the original query was this:
SELECT Name, ...
FROM mytab
WHERE some_conditions
ORDER BY Name
LIMIT x OFFSET y
then try to get at the new record with a similar query:
SELECT 1
FROM mytab
WHERE some_conditions
AND PrimaryKey = LastInsertedValue
AND Name BETWEEN CachedMin AND CachedMax
Similarly, to find out before (or after) which record the new record was inserted, start directly after the inserted record and use a limit of one, like this:
SELECT Name
FROM mytab
WHERE some_conditions
AND Name > MyInsertedName
AND Name BETWEEN CachedMin AND CachedMax
ORDER BY Name
LIMIT 1
This doesn't give you a number; you still have to check where the returned Name is in your cache.
Typically you'd expect a cache to be invalidated if there were underlying data changes. I think dropping it and starting over will be your simplest, maintainable solution. I would recommend it unless you have a very good reason.
You could write another query that just returns the row count (example below) to see whether your cache should be invalidated. That would save recreating the cache when nothing changed.
SELECT name,address FROM people WHERE area_code=970;
SELECT COUNT(rowid) FROM people WHERE area_code=970;
The information you'd need from sqlite to know when your cache was invalidated would require some rather intimate knowledge of how the query and/or index was working. I would say that is fairly high coupling.
Otherwise, you'd want to know where it was inserted with regards to the sorting. You would probably key each page on the sorted field. Delete anything greater than the insert/delete field. Any time you change the sorting you'd drop everything.
Something like the code below would be a start if you were using C++. I realize you aren't doing C++, but hopefully it is evident what I'm trying to do.
#include <set>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::string addr;
};

struct Page {
    std::string key;               // value of the sorted field for this page
    std::vector<Person> persons;

    struct Less {
        bool operator()(const Page &lhs, const Page &rhs) const {
            return lhs.key.compare(rhs.key) < 0;
        }
    };
};

typedef std::set<Page, Page::Less> pages_t;
pages_t pages;

bool sql_insert(const Person &person); // your actual INSERT call

void insert(const Person &person) {
    if (sql_insert(person)) {
        Page probe;
        probe.key = person.name;   // key pages on the sorted field
        pages_t::iterator drop_cache_start = pages.lower_bound(probe);
        // ... drop this page and everything after it
    }
}
You'd have to do some wrangling to get different key datatypes to work nicely, but it's possible.
Theoretically you could just leave the pages out of it and only use the objects themselves. The database would no longer "own" the data though. If you only fill pages from the database, then you'll have less data consistency worries.
This may be a bit off topic, but you aren't re-implementing views, are you? A view doesn't cache per se, but it isn't clear whether caching is a requirement of your project.
The solution I came up with is not exactly simple, but it's currently working well. I realized that the index of a record in a query's results is also the count of all its preceding records. What I needed to do was 'convert' all the ORDER BY terms in the query into a series of WHERE conditions that return only the preceding records, and then take a count of those records. It's trickier than it sounds (or maybe not... it sounds tricky). The biggest issue I had was making sure the query was, in fact, sorted in a way I could predict. This meant I needed an order column in the order parameters that was based on a column with unique values. So, whenever a user sorts on a column, I append another order parameter on a unique column (I used a "Modified Date Stamp") to break ties.
Creating the WHERE portion of the statement requires more than just tacking on a bunch of ANDs. It's easier to demonstrate. Say you have 3 Order columns: "LastName" ASC, "FirstName" DESC, and "Modified Stamp" ASC (the tie breaker). The WHERE statement would have to look something like this ('?' = record value):
WHERE
"LastName" < ? OR
("LastName" = ? AND "FirstName" > ?) OR
("LastName" = ? AND "FirstName" = ? AND "Modified Stamp" < ?)
Each set of WHERE conditions grouped together by parentheses is a tie breaker. If, in fact, the record values of "LastName" are equal, we must then look at "FirstName", and finally "Modified Stamp". Obviously, this statement can get really long if you're sorting on a bunch of order parameters.
There's still one problem with the above solution. Comparisons against NULL values never evaluate to true, and yet SQLite sorts NULL values first when ordering. Therefore, to deal with NULL values appropriately you've got to add another layer of complication. First, all equality comparisons (=) must be replaced by IS. Second, every < comparison must be grouped with an OR ... IS NULL so that NULL values are included appropriately under the < operator. This turns the above clause into:
WHERE
("LastName" < ? OR "LastName" IS NULL) OR
("LastName" IS ? AND "FirstName" > ?) OR
("LastName" IS ? AND "FirstName" IS ? AND ("Modified Stamp" < ? OR "Modified Stamp" IS NULL))
I then take a count of the rowid using the above WHERE clause.
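Put together, the counting query for the three-column ordering above might look like this (a sketch, reusing the mytab/some_conditions placeholders from the earlier answer; the bound values come from the inserted record):
SELECT COUNT(rowid)
FROM mytab
WHERE some_conditions
AND (("LastName" < ? OR "LastName" IS NULL)
  OR ("LastName" IS ? AND "FirstName" > ?)
  OR ("LastName" IS ? AND "FirstName" IS ? AND ("Modified Stamp" < ? OR "Modified Stamp" IS NULL)))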
It turned out easy enough for me to do, mostly because I had already constructed a set of objects to represent various aspects of my SQL statement, which could be assembled to generate the statement. I can't even imagine trying to manipulate a SQL statement like this any other way.
So far, I've tested using this on several iOS devices with up to 10,000 records in a table and I've had no noticeable performance issues. Of course, it's designed for single record edits/insertions so I don't really need it to be super fast/efficient.

More efficient SQL for retrieving thousands of records on a view

I am using LINQ to SQL as my ORM, and I have a list of ids (up to a few thousand) passed into my retriever method; with that list I want to grab all User records that correspond to those unique ids. To clarify, imagine I have something like this:
List<IUser> GetUsersForListOfIds(List<int> ids)
{
    using (var db = new UserDataContext(_connectionString))
    {
        var results = (from user in db.UserDtos
                       where ids.Contains(user.Id)
                       select user);
        return results.Cast<IUser>().ToList();
    }
}
Essentially that gets translated into SQL as:
select * from dbo.Users where userId in ([comma-delimited list of ids])
I'm looking for a more efficient way of doing this. The problem is that the IN clause in SQL seems to take too long (over 30 seconds).
We will need more information on your database setup, like indexes and the type of server (see Mitch Wheat's post). The type of database would help as well; some databases handle IN clauses poorly.
From a troubleshooting standpoint... have you isolated the time delay to the SQL server? Can you run the query directly on your server and confirm it's the query taking the extra time?
SELECT * can also have a bit of a performance impact... could you narrow down the result set being returned to just the columns you require?
Edit: just saw the 'view' comment that you added... I've had problems with view performance in the past. Is it a materialized view... or could you make it into one? Recreating the view logic as a stored procedure may also help.
Have you tried converting this to a list, so the application is doing this in-memory? i.e.:
List<IUser> GetUsersForListOfIds(List<int> ids)
{
    using (var db = new UserDataContext(_connectionString))
    {
        var results = (from user in db.UserDtos.ToList()
                       where ids.Contains(user.Id)
                       select user);
        return results.Cast<IUser>().ToList();
    }
}
This will obviously be memory-intensive if it is run on a public-facing page of a hard-hit site. If it still takes 30+ seconds in staging/development, then my guess is that the view itself takes that long to process, or you're transferring tens of MB of data each time you retrieve it. Either way, my only suggestions are to access the table directly and retrieve only the data you need, rewrite the view, or create a new view for this particular scenario.
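Another workaround worth trying (a hedged sketch, not from the answers above): batch the id list so each generated IN clause stays small, which many servers handle far better than a single multi-thousand-item list. The batch size of 500 is illustrative, not tuned:
List<IUser> GetUsersForListOfIds(List<int> ids)
{
    const int batchSize = 500; // illustrative; tune for your server
    var users = new List<IUser>();
    using (var db = new UserDataContext(_connectionString))
    {
        for (int i = 0; i < ids.Count; i += batchSize)
        {
            var batch = ids.Skip(i).Take(batchSize).ToList();
            // Each iteration produces a small IN (...) clause
            users.AddRange(db.UserDtos
                .Where(u => batch.Contains(u.Id))
                .ToList()
                .Cast<IUser>());
        }
    }
    return users;
}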

NHibernate -- Executing some simple HQL

I have mapped the entities in .hbm.xml files and defined the attributes for all the entity classes.
I have accomplished some basics and can get all the records using the code below.
public List<DevelopmentStep> getDevelopmentSteps()
{
    List<DevelopmentStep> developmentStep;
    developmentStep = Repository.FindAll<DevelopmentStep>(new OrderBy("Id", Order.Asc));
    return developmentStep;
}
I have checked on the net that we can write HQL. Now the problem is how to execute HQL like this:
string hql = "From DevelopmentSteps d inner join table2 t2 d.id=t2.Id where d.id=IDValue";
What additional classes or other things do I need to add to execute this kind of HQL?
Please help me ---- Thanks
To write dynamic queries, I recommend using the Criteria API. It fits here because you have a single query for several different types and you also want to set the ordering dynamically.
The queries are always object-oriented. You don't need to join on foreign keys, you just navigate through the class model. There are also no "tables" in the queries, only entities.
Getting (single) instances by ID should always be done using session.Get (or session.Load). Only then can NHibernate take the instance directly from the cache, without a database roundtrip, if it has already been loaded.
for instance:
public IList<T> GetAll<T>(string orderBy)
{
    return session.CreateCriteria(typeof(T))
        .AddOrder(Order.Asc(orderBy))
        .List<T>();
}
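For completeness, HQL itself is executed through ISession.CreateQuery, and the by-ID case mentioned above uses session.Get. A sketch, where idValue is a placeholder (and note that the join in the question's HQL would need to navigate the mapped association rather than a table):
// Executing an HQL query with a named parameter
IList<DevelopmentStep> steps = session
    .CreateQuery("from DevelopmentStep d where d.Id = :id")
    .SetParameter("id", idValue)
    .List<DevelopmentStep>();

// Fetching a single instance by primary key; served from the session
// cache without a roundtrip if already loaded
DevelopmentStep step = session.Get<DevelopmentStep>(idValue);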
