I'm parsing a big XML file, with many items. Each item has many categories, which can repeat. Here's a sample XML.
<item>
<category>Category1</category>
<category>Category2</category>
<category>Category3</category>
<category>Category4</category>
<category>Category5</category>
</item>
<item>
<category>Category1</category>
<category>Category2</category>
<category>Category3</category>
<category>Category7</category>
<category>Category9</category>
</item>
Using Doctrine to handle the many-to-many relationship described above, I have code like this:
$em = $this->getDoctrine()->getEntityManager();
foreach ($items as $item) {
    [...]
    $categories = ... //Array with category names, parsed from the XML.
    foreach ($categories as $category) {
        //This will check if the 'item' entity
        //already has a category with that name.
        $exists = $entity->getCategories()->exists(function($key, $element) use ($category) {
            return $category == $element->getName();
        });
        if (!$exists) {
            //If there's already one on the database, we'll load it.
            //Otherwise, we'll save a new Category..
            $query = $this->_entityManager->createQueryBuilder();
            $query->select('c')
                ->from("MyBundle:Category", 'c')
                ->where("c.name = :name")
                ->setParameter("name", $category);
            $result = $query->getQuery()->getOneOrNullResult();
            if ($result != null) {
                $item->addCategory($result);
            } else {
                $categoryEntity = new Category($category);
                $em->persist($categoryEntity);
                $item->addCategory($categoryEntity);
            }
        }
    }
}
The thing is: I only flush() the entity manager once I have finished looping through all items. Therefore, $query->getQuery()->getOneOrNullResult() returns null for every category that has been persisted but not yet flushed, leading me to create duplicate categories.
For the XML example above, I end up with the following:
| item |
|------|
| 1    |
| 2    |

| category.id | category.name |
|-------------|---------------|
| 1           | Category1     |
| 2           | Category2     |
| 3           | Category3     |
| 4           | Category4     |
| 5           | Category5     |
| 6           | Category1     |
| 7           | Category2     |
| 8           | Category3     |
| 9           | Category7     |
| 10          | Category9     |

| item | category |
|------|----------|
| 1    | 1        |
| 1    | 2        |
| 1    | 3        |
| 1    | 4        |
| 1    | 5        |
| 2    | 6        |
| 2    | 7        |
| 2    | 8        |
| 2    | 9        |
| 2    | 10       |
I wanted the following:
| item |
|------|
| 1    |
| 2    |

| category.id | category.name |
|-------------|---------------|
| 1           | Category1     |
| 2           | Category2     |
| 3           | Category3     |
| 4           | Category4     |
| 5           | Category5     |
| 6           | Category7     |
| 7           | Category9     |

| item | category |
|------|----------|
| 1    | 1        |
| 1    | 2        |
| 1    | 3        |
| 1    | 4        |
| 1    | 5        |
| 2    | 1        |
| 2    | 2        |
| 2    | 3        |
| 2    | 6        |
| 2    | 7        |
Simply adding $em->flush() after $em->persist($categoryEntity) solves it, but I don't want to flush anything just yet (or, for that matter, flush only a category). There is a lot of unfinished work to do and I don't want to interrupt my transaction. I still want to be able to roll back to the very beginning and exclude all unused categories if I need to (and, obviously, without running additional queries).
My question is: is there a way to access both the database and Doctrine's internal entity mapping to retrieve an entity that might or might not have an ID? Or will I have to build this mapping myself, run a DQL query, and then check my own mapping?
Doctrine 2 can't do this for you, but it's pretty easy to store the newly created categories in your loop and check them when the database lookup misses:
// Declare this once, before the outer foreach over $items.
$_created_categories = array();
if (!$exists) {
    // If there's already one on the database, we'll load it.
    // Otherwise, we'll save a new Category..
    $query = $this->_entityManager->createQueryBuilder();
    $query->select('c')
        ->from("MyBundle:Category", 'c')
        ->where("c.name = :name")
        ->setParameter("name", $category);
    $result = $query->getQuery()->getOneOrNullResult();
    if ($result) {
        $item->addCategory($result);
    } elseif ( isset($_created_categories[$category]) ) {
        // Not in the database yet, but already created earlier in this run.
        $item->addCategory($_created_categories[$category]);
    } else {
        $categoryEntity = new Category($category);
        $em->persist($categoryEntity);
        $item->addCategory($categoryEntity);
        $_created_categories[$category] = $categoryEntity;
    }
}
Storing the new category entities in the $_created_categories array adds almost no memory overhead: PHP keeps object handles in the array, not copies of the objects.
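The lookup order described in this answer (database first, then a local map of objects created during the run, and only then a new object) can be sketched independently of Doctrine. The following is a minimal illustration in Python with sqlite3, chosen only so it runs on its own; the `Category` dataclass, the `find_or_create` helper, and the `category` table layout are assumptions made for the example, not part of Doctrine or of the original code.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    name: str
    id: Optional[int] = None  # stays None until the row is actually written


def find_or_create(conn, pending, name):
    """Hypothetical helper: database first, then the in-memory map, then create."""
    row = conn.execute(
        "SELECT id, name FROM category WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        # Already stored in the database.
        return Category(name=row[1], id=row[0])
    if name not in pending:
        # Database miss and first time this name is seen in this run.
        pending[name] = Category(name)
    # Database miss, but the object was (or has just been) created in this run.
    return pending[name]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("INSERT INTO category (name) VALUES ('Category1')")

pending = {}  # plays the role of $_created_categories
items = [
    ["Category1", "Category2", "Category3"],
    ["Category1", "Category2", "Category7"],
]
linked = [[find_or_create(conn, pending, n) for n in names] for names in items]

# Nothing new has been written so far; the deferred write happens once, at the end.
conn.executemany(
    "INSERT INTO category (name) VALUES (?)", [(c.name,) for c in pending.values()]
)
conn.commit()
total = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
```

Both items end up sharing the same pending `Category2` object, so the single write at the end inserts each new name only once.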
I’m trying to write a query that limits the number of results I get per category, and could use some help finding a good function for this.
Quick Example:
| ID | Category | Value | A bunch of other important columns |
|-----------|-----------------|--------------|-------------------------------------------|
| 1 | A | GUID | |
| 2 | A | GUID | |
| 3 | A | GUID | |
| 4 | A | GUID | |
| 5 | B | GUID | |
| 6 | B | GUID | |
I want to return only N GUIDs per category, largely because I’m hitting the 64MB Kusto query limit for some categories that won’t be useful anyway.
The Top-nested operator looks good at first, BUT I don’t want to do any aggregation, and it filters out other important columns. Per the note on the page, I can use Ignore=max(1) to remove the aggregation, then do some serializing of all my other columns to a certain value, then unpack after the filter. But that feels like I’m doing something very wrong.
I've also tried something like:
| partition by Category ( top 3 by Value)
But it's limited to 64 partitions, and I need closer to 500.
Any idea of a good pattern to do this?
Here you go:
let NumItemsPerCategory = 3;
datatable(ID:long, Category:string, Value:guid)
[
1, "A", guid(40b73f8f-78d2-4eae-bd5b-b3e00f38ac33),
2, "A", guid(043ee507-aadf-4453-bcc6-d8f4f541b043),
3, "A", guid(f71d3cc0-ce46-474f-9dcd-f3883fa08859),
4, "A", guid(bf259fc8-e9fe-4a99-a296-ca81e1fa250a),
5, "B", guid(d8ee3ac7-da76-4e87-a9ed-e5a37c943ad2),
6, "B", guid(282e74ff-3b71-407c-a2a7-92bb1cb17b27),
]
| summarize PackedItems = make_list(pack_all(), NumItemsPerCategory) by Category
| project-away Category
| mv-expand PackedItem = PackedItems
| evaluate bag_unpack(PackedItem)
| project-away PackedItems
Result:
| ID | Category | Value |
|----|----------|--------------------------------------|
| 1 | A | 40b73f8f-78d2-4eae-bd5b-b3e00f38ac33 |
| 2 | A | 043ee507-aadf-4453-bcc6-d8f4f541b043 |
| 3 | A | f71d3cc0-ce46-474f-9dcd-f3883fa08859 |
| 5 | B | d8ee3ac7-da76-4e87-a9ed-e5a37c943ad2 |
| 6 | B | 282e74ff-3b71-407c-a2a7-92bb1cb17b27 |
I am trying to parse the below data in Kusto. Need help.
[[ObjectCount][LinkCount][DurationInUs]]
[ChangeEnumeration][[88][9][346194]]
[ModifyTargetInLive][[3][6][595903]]
Need generic implementation without any hardcoding.
Ideally, you'd be able to change the component that produces the source data so that it emits a standard format (e.g. CSV, JSON, etc.) instead.
The following could work, but you should consider it very inefficient:
let T = datatable(s:string)
[
'[[ObjectCount][LinkCount][DurationInUs]]',
'[ChangeEnumeration][[88][9][346194]]',
'[ModifyTargetInLive][[3][6][595903]]',
];
let keys = toscalar(
    T
    | where s startswith "[["
    | take 1
    | project extract_all(@'\[([^\[\]]+)\]', s)
);
T
| where s !startswith "[["
| project values = extract_all(@'\[([^\[\]]+)\]', s)
| mv-apply with_itemindex = i keys on (
    extend Category = tostring(values[0]), p = pack(tostring(keys[i]), values[i + 1])
    | summarize b = make_bag(p) by Category
)
| project-away values
| evaluate bag_unpack(b)
--->
| Category | ObjectCount | LinkCount | DurationInUs |
|--------------------|-------------|-----------|--------------|
| ChangeEnumeration | 88 | 9 | 346194 |
| ModifyTargetInLive | 3 | 6 | 595903 |
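If the data can be preprocessed before it reaches Kusto, as suggested at the start of this answer, the same logic is short to express in a general-purpose language. The sketch below is in Python and mirrors the query: the line starting with "[[" supplies the keys, every other line supplies a category followed by its values. The function name `parse_rows` is made up for the example, and values are kept as strings.

```python
import re

# Same pattern as in the query: one bracketed token with no nested brackets.
TOKEN = re.compile(r"\[([^\[\]]+)\]")


def parse_rows(lines):
    """Turn the bracketed lines into a list of dicts, without hardcoded keys."""
    header = next(line for line in lines if line.startswith("[["))
    keys = TOKEN.findall(header)
    records = []
    for line in lines:
        if line.startswith("[["):
            continue  # skip the header line
        category, *values = TOKEN.findall(line)
        record = {"Category": category}
        record.update(zip(keys, values))
        records.append(record)
    return records


lines = [
    "[[ObjectCount][LinkCount][DurationInUs]]",
    "[ChangeEnumeration][[88][9][346194]]",
    "[ModifyTargetInLive][[3][6][595903]]",
]
records = parse_rows(lines)
```

The resulting list of dictionaries can then be written out as JSON or CSV, which Kusto can ingest directly.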
I have a SQLite database that looks similar to this:
| Car   | Computer | Category |
|-------|----------|----------|
| id    | id       | id       |
| make  | make     | record   |
| model | price    |          |
| year  | cpu      |          |
|       | weight   |          |
The record column in my Category table contains a comma separated list of the table name and id of the items that belong to that Category, so an entry would look like this:
Car_1,Car_2.
I am trying to split the items in the record on the comma to get each value:
Car_1
Car_2
Then I need to take it one step further and split on the _ and return the Car records.
So if I know the Category id, I'm trying to wind up with this in the end:
| id | make   | model   | year |
|----|--------|---------|------|
| 1  | Honda  | Civic   | 2016 |
| 2  | Toyota | Corolla | 2013 |
I have had some success on splitting on the comma and getting 2 records back, but I'm stuck on splitting on the _ and making the join to the table in the record.
This is my query so far:
WITH RECURSIVE record(recordhash, data) AS (
SELECT '', record || ',' FROM Category WHERE id = 1
UNION ALL
SELECT
substr(data, 0, instr(data, ',')),
substr(data, instr(data, ',') + 1)
FROM record
WHERE data != '')
SELECT recordhash
FROM record
WHERE recordhash != ''
This is returning
--------------
| recordhash |
--------------
| Car_1 |
| Car_2 |
--------------
Any help would be greatly appreciated!
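If your recursive CTE works as expected, then you can split each value of recordhash on the _ delimiter and use the part after it as the id of the rows to return from Car. Keep the WITH RECURSIVE clause and replace its final SELECT with the query below; the 5 is the position of the first character after 'Car_':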
select * from Car
where id in (
select substr(recordhash, 5)
from record
where recordhash like 'Car%'
)
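To check the combined statement end to end, here is a small sketch that runs the question's recursive CTE together with the lookup against Car in an in-memory SQLite database, driven from Python's sqlite3 module. Two things differ from the answer and are assumptions of this example: the id is taken with instr() rather than the fixed offset 5, so the table-name prefix can have any length, and the third car row is made-up data added only to show that unrelated rows are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (id INTEGER PRIMARY KEY, make TEXT, model TEXT, year INTEGER);
    CREATE TABLE Category (id INTEGER PRIMARY KEY, record TEXT);
    INSERT INTO Car VALUES (1, 'Honda', 'Civic', 2016);
    INSERT INTO Car VALUES (2, 'Toyota', 'Corolla', 2013);
    INSERT INTO Car VALUES (3, 'Ford', 'Focus', 2010);  -- made-up row, not in the category
    INSERT INTO Category VALUES (1, 'Car_1,Car_2');
""")

QUERY = """
WITH RECURSIVE record(recordhash, data) AS (
    SELECT '', record || ',' FROM Category WHERE id = ?
    UNION ALL
    SELECT substr(data, 0, instr(data, ',')),
           substr(data, instr(data, ',') + 1)
    FROM record
    WHERE data != ''
)
SELECT Car.id, Car.make, Car.model, Car.year
FROM Car
WHERE Car.id IN (
    -- part after the underscore, converted to a number
    SELECT CAST(substr(recordhash, instr(recordhash, '_') + 1) AS INTEGER)
    FROM record
    -- part before the underscore must be the table name
    WHERE substr(recordhash, 1, instr(recordhash, '_') - 1) = 'Car'
)
ORDER BY Car.id
"""

cars = conn.execute(QUERY, (1,)).fetchall()
```

The table name still has to be written into the query, because SQL cannot join to a table whose name is only known from column data; supporting Computer as well would need a second, similar query.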
The table is defined as follows:
CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)
The uid field should be unique.
Here is some sample data:
| uid | clicks | lang |
----------------------------------------
| "898187354" | 4 | "ru" |
| "898187354" | 4 | "ru" |
| "123456789" | 1 | <null> |
| "123456789" | 10 | "en" |
| "140922382" | 13 | <null> |
As you can see, there are multiple rows in which the uid is duplicated. I would like those rows to be merged in the following way:
the clicks fields are added together, and the lang field is updated if its previous value was null.
For the data shown above, it would look something like this:
| uid | clicks | lang |
---------------------------------------
| "898187354" | 8 | "ru" |
| "123456789" | 11 | "en" |
| "140922382" | 13 | <null> |
It seems that I can find many ways to simply delete duplicate data, which I do not necessarily want to do. I'm unsure how I can introduce logic in SQL statements that does this.
First, update the earliest row for each uid so that it holds the summed clicks and a non-null lang when one exists (max() ignores NULL values):
update userlog
set
    clicks = (select sum(u.clicks) from userlog u where u.uid = userlog.uid),
    lang = (select max(u.lang) from userlog u where u.uid = userlog.uid)
where not exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
);
Then delete the duplicate rows that are no longer needed:
delete from userlog
where exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
);
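As a quick check, the two statements can be run against the sample data from the question in an in-memory SQLite database. This sketch uses Python's sqlite3 module and wraps both statements in one transaction so the table is never left half-merged. The unique index created at the end is an optional extra suggested by the question's requirement that uid be unique, and its name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)")
conn.executemany(
    "INSERT INTO UserLog VALUES (?, ?, ?)",
    [
        ("898187354", 4, "ru"),
        ("898187354", 4, "ru"),
        ("123456789", 1, None),
        ("123456789", 10, "en"),
        ("140922382", 13, None),
    ],
)

with conn:  # commits both statements together, or rolls both back on error
    # Step 1: fold the totals into the earliest row of each uid.
    conn.execute("""
        UPDATE UserLog
        SET clicks = (SELECT sum(u.clicks) FROM UserLog u WHERE u.uid = UserLog.uid),
            lang   = (SELECT max(u.lang)   FROM UserLog u WHERE u.uid = UserLog.uid)
        WHERE NOT EXISTS (
            SELECT 1 FROM UserLog u
            WHERE u.uid = UserLog.uid AND u.rowid < UserLog.rowid
        )
    """)
    # Step 2: remove every row that has an earlier row with the same uid.
    conn.execute("""
        DELETE FROM UserLog
        WHERE EXISTS (
            SELECT 1 FROM UserLog u
            WHERE u.uid = UserLog.uid AND u.rowid < UserLog.rowid
        )
    """)
    # Optional (assumption): enforce uniqueness from now on.
    conn.execute("CREATE UNIQUE INDEX idx_userlog_uid ON UserLog(uid)")

rows = conn.execute("SELECT uid, clicks, lang FROM UserLog ORDER BY rowid").fetchall()
```

Note that max(lang) picks the alphabetically largest value, so if one uid has two different non-null languages, the merged row keeps only one of them.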
If I am browsing a taxonomy page in Drupal, is there a way to get the term ID of that page?
For example:
select * from term_data limit 2;
+-----+-----+--------------------------+-------------+--------+----------+------+
| tid | vid | name | description | weight | language | trid |
+-----+-----+--------------------------+-------------+--------+----------+------+
| 24 | 1 | Central African Republic | | 0 | en | 0 |
| 26 | 1 | Cyprus | | 0 | en | 0 |
+-----+-----+--------------------------+-------------+--------+----------+------+
If I browse the page for Cyprus, how can I get its tid?
Thanks...
I found the answer: it is arg(2).
arg(0) ==> returns "taxonomy"
arg(1) ==> returns "term"
arg(2) ==> returns tid
In other words:
// On a taxonomy term page the path is taxonomy/term/<tid>.
if (arg(0) == 'taxonomy' && arg(1) == 'term' && is_numeric(arg(2))) {
    return arg(2);
}
else {
    return FALSE;
}