Select DISTINCT on a single column using Silverstripe ORM - silverstripe

My table:
-----------------------------------------
| ID | RoomTypeId | ChargeTypeId | Name |
-----------------------------------------
| 1  | 23         | 32           | DD   |
| 2  | 26         | 32           | DD   |
| 3  | 28         | 31           | CC   |
-----------------------------------------
The ORM already applies DISTINCT by default, but it does so across every column, so all 3 rows are returned.
The result I need:
-----------------------
| ChargeTypeId | Name |
-----------------------
| 32           | DD   |
| 31           | CC   |
-----------------------
I'm hoping there is an implemented method of achieving this without having to resort to DB::query().

I found that toMap() will create a DISTINCT query based on your chosen columns.
Example:
$result = \ChargeTypes::get()->toMap("ChargeTypeId", "Name");
$result->toArray():
array(2) {
[32]=>
string(2) "DD"
[6]=>
string(2) "CC"
}
UPDATE: I don't believe this actually creates a DISTINCT query; it just happened to work in my case. The following shows why: assigning to an array key that already exists simply overwrites the earlier value.
$myArray = array();
$myArray[32] = "DD";
$myArray[32] = "DD";
$myArray[32] = "DD";
$myArray[6] = "CC";
$myArray[6] = "CC";
var_dump($myArray);
Result:
array(2) {
[32]=>
string(2) "DD"
[6]=>
string(2) "CC"
}
So in theory, as long as each key always maps to the same value, this isn't that bad a solution, despite the redundant iterations.
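For comparison, here is a minimal Python sketch using the standard sqlite3 module rather than Silverstripe. It contrasts a real SELECT DISTINCT on two columns with the key-overwrite effect described above. The table name ChargeTypes and the in-memory database are assumptions made for illustration, and the rows are the three from the question.

```python
import sqlite3

# In-memory stand-in for the question's table (table name is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ChargeTypes ("
    "ID INTEGER PRIMARY KEY, RoomTypeId INTEGER, ChargeTypeId INTEGER, Name TEXT)"
)
conn.executemany(
    "INSERT INTO ChargeTypes VALUES (?, ?, ?, ?)",
    [(1, 23, 32, "DD"), (2, 26, 32, "DD"), (3, 28, 31, "CC")],
)

# A genuine DISTINCT restricted to the two columns of interest.
distinct_rows = conn.execute(
    "SELECT DISTINCT ChargeTypeId, Name FROM ChargeTypes ORDER BY ChargeTypeId DESC"
).fetchall()

# No DISTINCT: all three rows come back, and the mapping collapses them
# only because a repeated key overwrites the previous entry.
all_rows = conn.execute(
    "SELECT ChargeTypeId, Name FROM ChargeTypes ORDER BY ID"
).fetchall()
as_map = {}
for charge_type_id, name in all_rows:
    as_map[charge_type_id] = name

print(distinct_rows)
print(as_map)
```

Both approaches end up with two entries here, but only the first one reduces the number of rows fetched from the database.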

Related

Parse data in Kusto

I am trying to parse the data below in Kusto and need help.
[[ObjectCount][LinkCount][DurationInUs]]
[ChangeEnumeration][[88][9][346194]]
[ModifyTargetInLive][[3][6][595903]]
I need a generic implementation without any hardcoding.
Ideally, you'd be able to change the component that produces the source data so that it emits a standard format (e.g. CSV or JSON) instead.
The following could work, but you should consider it very inefficient:
let T = datatable(s:string)
[
    '[[ObjectCount][LinkCount][DurationInUs]]',
    '[ChangeEnumeration][[88][9][346194]]',
    '[ModifyTargetInLive][[3][6][595903]]',
];
// The header row (the one starting with "[[") supplies the column names.
let keys = toscalar(
    T
    | where s startswith "[["
    | take 1
    | project extract_all(@'\[([^\[\]]+)\]', s)
);
T
| where s !startswith "[["
| project values = extract_all(@'\[([^\[\]]+)\]', s)
// values[0] is the category; values[i + 1] pairs with keys[i].
| mv-apply with_itemindex = i keys on (
    extend Category = tostring(values[0]), p = pack(tostring(keys[i]), values[i + 1])
    | summarize b = make_bag(p) by Category
)
| project-away values
| evaluate bag_unpack(b)
--->
| Category | ObjectCount | LinkCount | DurationInUs |
|--------------------|-------------|-----------|--------------|
| ChangeEnumeration | 88 | 9 | 346194 |
| ModifyTargetInLive | 3 | 6 | 595903 |
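If the data can be pre-processed before it reaches Kusto, the same idea is short to express outside the query language. The following is a Python sketch, not a Kusto feature. It assumes the header line (the one starting with "[[") appears before the data lines. The names TOKEN and parse are made up for this example.

```python
import re

lines = [
    "[[ObjectCount][LinkCount][DurationInUs]]",
    "[ChangeEnumeration][[88][9][346194]]",
    "[ModifyTargetInLive][[3][6][595903]]",
]

# One bracketed token with no nested brackets inside, as in the regex above.
TOKEN = re.compile(r"\[([^\[\]]+)\]")


def parse(lines):
    """Turn the bracketed lines into one dict per data line."""
    keys = []
    rows = []
    for line in lines:
        tokens = TOKEN.findall(line)
        if line.startswith("[["):
            # Header line: its tokens are the column names.
            keys = tokens
        else:
            # Data line: first token is the category, the rest are values.
            row = {"Category": tokens[0]}
            row.update(zip(keys, tokens[1:]))
            rows.append(row)
    return rows


rows = parse(lines)
for row in rows:
    print(row)
```

The values stay strings here; converting them to numbers is left out because the question does not say which columns are numeric.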

Split data in SQLite column

I have a SQLite database that looks similar to this:
----------    ------------    ------------
| Car    |    | Computer |    | Category |
----------    ------------    ------------
| id     |    | id       |    | id       |
| make   |    | make     |    | record   |
| model  |    | price    |    ------------
| year   |    | cpu      |
----------    | weight   |
              ------------
The record column in my Category table contains a comma separated list of the table name and id of the items that belong to that Category, so an entry would look like this:
Car_1,Car_2.
I am trying to split the items in the record on the comma to get each value:
Car_1
Car_2
Then I need to take it one step further and split on the _ and return the Car records.
So if I know the Category id, I'm trying to wind up with this in the end:
-----------------    ------------------
| Car           |    | Car            |
-----------------    ------------------
| id: 1         |    | id: 2          |
| make: Honda   |    | make: Toyota   |
| model: Civic  |    | model: Corolla |
| year: 2016    |    | year: 2013     |
-----------------    ------------------
I have had some success on splitting on the comma and getting 2 records back, but I'm stuck on splitting on the _ and making the join to the table in the record.
This is my query so far:
WITH RECURSIVE record(recordhash, data) AS (
    SELECT '', record || ',' FROM Category WHERE id = 1
    UNION ALL
    SELECT
        substr(data, 0, instr(data, ',')),
        substr(data, instr(data, ',') + 1)
    FROM record
    WHERE data != ''
)
SELECT recordhash
FROM record
WHERE recordhash != ''
This is returning
--------------
| recordhash |
--------------
| Car_1 |
| Car_2 |
--------------
Any help would be greatly appreciated!
If your recursive CTE works as expected, you can split each recordhash value on the _ delimiter and use the part after it as the id of the Car rows to return. Put this after the WITH clause in place of your final SELECT (substr(recordhash, 5) skips the 4-character prefix Car_):
select * from Car
where id in (
    select substr(recordhash, 5)
    from record
    where recordhash like 'Car%'
)
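To try the combined statement end to end, here is a small sketch using Python's sqlite3 module with an in-memory database. The Honda and Toyota rows come from the question, while the third car is a made-up row added only to show that unrelated rows are filtered out. Unlike the query above, this variant finds the _ with instr() instead of hard-coding position 5 and casts the id explicitly. Both are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Car (id INTEGER PRIMARY KEY, make TEXT, model TEXT, year INTEGER);
CREATE TABLE Category (id INTEGER PRIMARY KEY, record TEXT);
INSERT INTO Car VALUES (1, 'Honda', 'Civic', 2016);
INSERT INTO Car VALUES (2, 'Toyota', 'Corolla', 2013);
INSERT INTO Car VALUES (3, 'Ford', 'Focus', 2010);  -- hypothetical extra row
INSERT INTO Category VALUES (1, 'Car_1,Car_2');
""")

query = """
WITH RECURSIVE record(recordhash, data) AS (
    SELECT '', record || ',' FROM Category WHERE id = ?
    UNION ALL
    SELECT substr(data, 1, instr(data, ',') - 1),
           substr(data, instr(data, ',') + 1)
    FROM record
    WHERE data != ''
)
SELECT * FROM Car
WHERE id IN (
    -- part after the underscore, converted to an integer id
    SELECT CAST(substr(recordhash, instr(recordhash, '_') + 1) AS INTEGER)
    FROM record
    -- part before the underscore must be the table name
    WHERE substr(recordhash, 1, instr(recordhash, '_') - 1) = 'Car'
)
ORDER BY id
"""

cars = conn.execute(query, (1,)).fetchall()
for car in cars:
    print(car)
```

The table name still has to be known in advance, because SQL cannot choose the table to join from a value stored in a column.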

'merge' rows if they are duplicated in a table - SQLite

Table is the following:
CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)
Where uid field should be unique.
Here is some sample data:
| uid | clicks | lang |
----------------------------------------
| "898187354" | 4 | "ru" |
| "898187354" | 4 | "ru" |
| "123456789" | 1 | <null> |
| "123456789" | 10 | "en" |
| "140922382" | 13 | <null> |
As you can see, I have multiple rows in which the uid field is duplicated. I would like those rows to be merged in the following way:
the clicks fields are added together, and the lang field is updated if its previous value was null.
For the data shown above, it would look something like this:
| uid | clicks | lang |
---------------------------------------
| "898187354" | 8 | "ru" |
| "123456789" | 11 | "en" |
| "140922382" | 13 | <null> |
It seems that I can find many ways to simply delete duplicate data, which I do not necessarily want to do. I'm unsure how I can introduce logic in SQL statements that does this.
First, update the first row of each uid with the merged values:
update userlog
set
    clicks = (select sum(u.clicks) from userlog u where u.uid = userlog.uid),
    lang = (select max(u.lang) from userlog u where u.uid = userlog.uid)
where not exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
);
and then delete the duplicate rows that are no longer needed:
delete from userlog
where exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
);
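The two statements can be checked against the sample data with a short script. This sketch uses Python's sqlite3 module and an in-memory database, both of which are choices made for this example. The table definition and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserLog(uid TEXT, clicks INT, lang TEXT)")
conn.executemany(
    "INSERT INTO UserLog VALUES (?, ?, ?)",
    [
        ("898187354", 4, "ru"),
        ("898187354", 4, "ru"),
        ("123456789", 1, None),
        ("123456789", 10, "en"),
        ("140922382", 13, None),
    ],
)

# Step 1: fold the totals into the first row (lowest rowid) of each uid.
# max() ignores NULLs, so a non-null lang wins over a null one.
conn.execute("""
update userlog
set
    clicks = (select sum(u.clicks) from userlog u where u.uid = userlog.uid),
    lang = (select max(u.lang) from userlog u where u.uid = userlog.uid)
where not exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
)
""")

# Step 2: drop every row that has an earlier row with the same uid.
conn.execute("""
delete from userlog
where exists (
    select 1 from userlog u
    where u.uid = userlog.uid and u.rowid < userlog.rowid
)
""")

merged = conn.execute(
    "SELECT uid, clicks, lang FROM UserLog ORDER BY rowid"
).fetchall()
for row in merged:
    print(row)
```

If two rows for the same uid carried different non-null languages, max() would keep the alphabetically last one, which may or may not be what is wanted.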

R apply script output in different formats for similar inputs

I'm using a double apply function to get a list of p-values for cor.test between any two columns of two tables.
hel_plist<-apply(bc, 2, function(x) { apply(otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}}) })
The otud data.frame is 90x11 (90 rows, 11 columns, i.e. dim(otud) is 90 11) and will be used with different data.frames.
bc and hel are both 90x2 data.frames, so for each of them I get 2*11 = 22 p-values out of the functions:
bc_plist<-apply(bc, 2, function(x) { apply(otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}}) })
hel_plist<-apply(hel, 2, function(x) { apply(otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}}) })
For bc the output has dim = NULL: a nested list of p-values, one element per pair of columns (the format I have always got from these scripts and am happy with).
But for hel the output has dim 11 2: an 11x2 table with the p-values written inside.
Shortened examples of the output:
hel_plist
+--------+--------------+--------------+
| | axis1 | axis2 |
+--------+--------------+--------------+
| Otu037 | 1.126362e-18 | 0.01158251 |
| Otu005 | 3.017458e-2 | NULL |
| Otu068 | 0.00476002 | NULL |
| Otu070 | 1.27646e-15 | 5.252419e-07 |
+--------+--------------+--------------+
bc_plist
$axis1
$axis1$Otu037
[1] 1.247717e-06
$axis1$Otu005
[1] 1.990313e-05
$axis1$Otu068
[1] 5.664597e-07
Why is it like that when the input formats are all the same? (Shortened examples)
bc
+-------+-----------+-----------+
| group | axis1 | axis2 |
+-------+-----------+-----------+
| 1B041 | 0.125219 | 0.246319 |
| 1B060 | -0.022412 | -0.030227 |
| 1B197 | -0.088005 | -0.305351 |
| 1B222 | -0.119624 | -0.144123 |
| 1B227 | -0.148946 | -0.061741 |
+-------+-----------+-----------+
hel
+-------+---------------+---------------+
| group | axis1 | axis2 |
+-------+---------------+---------------+
| 1B041 | -0.0667782322 | -0.1660606406 |
| 1B060 | 0.0214470932 | -0.0611351008 |
| 1B197 | 0.1761876858 | 0.0927570627 |
| 1B222 | 0.0681058251 | 0.0549292399 |
| 1B227 | 0.0516864361 | 0.0774155225 |
| 1B235 | 0.1205676221 | 0.0181712761 |
+-------+---------------+---------------+
How could I force my scripts to always produce "flat" outputs, as in the case of bc?
OK, the different outputs are caused by the NULL results from the conditional function in the bc_plist case. If I modify the code to replace the possible NULLs with NAs, I get 2-D tables in every case.
So, to keep things consistent:
bc_nmds_plist<-apply(bc_nmds, 2, function(x) { apply(stoma_otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}else NA}) })
And I get a 2-D table out for bc_nmds_plist too.
So I guess this can be called solved, as I now have a piece of code that produces predictable output for any correct input.
If anyone has an idea how to force the output to conform to the previous bc_plist format instead, I would still be interested, as I actually prefer that form:
$axis1
$axis1$Otu037
[1] 1.247717e-06
$axis1$Otu005
[1] 1.990313e-05
$axis1$Otu068
[1] 5.664597e-07
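The conversion asked for, from a 2-D table with missing entries to a nested list that holds only the significant values, is language-independent. Below is a minimal Python sketch of that reshaping, not R code. The numbers are the hel_plist values shown earlier, with None standing in for NULL/NA. The names p_table and nested are made up for this example.

```python
# Column-wise view of the 2-D table: axis -> {OTU -> p-value or None}.
p_table = {
    "axis1": {
        "Otu037": 1.126362e-18,
        "Otu005": 3.017458e-2,
        "Otu068": 0.00476002,
        "Otu070": 1.27646e-15,
    },
    "axis2": {
        "Otu037": 0.01158251,
        "Otu005": None,
        "Otu068": None,
        "Otu070": 5.252419e-07,
    },
}

# Drop the missing entries column by column, keeping the nesting axis -> OTU.
nested = {
    axis: {otu: p for otu, p in column.items() if p is not None}
    for axis, column in p_table.items()
}

for axis, column in nested.items():
    for otu, p in column.items():
        print(axis, otu, p)
```

The equivalent in R would filter the NA entries out of each column of the table and collect what remains in a list per column.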

Doctrine: Retrieving entities that may not be saved to the database yet

I'm parsing a big XML file, with many items. Each item has many categories, which can repeat. Here's a sample XML.
<item>
<category>Category1</category>
<category>Category2</category>
<category>Category3</category>
<category>Category4</category>
<category>Category5</category>
</item>
<item>
<category>Category1</category>
<category>Category2</category>
<category>Category3</category>
<category>Category7</category>
<category>Category9</category>
</item>
Using Doctrine to handle the many-to-many relationship described above, I have sample code like this:
$em = $this->getDoctrine()->getEntityManager();
foreach ($items as $item) {
    [...]
    $categories = ... //Array with category names, parsed from the XML.
    foreach ($categories as $category) {
        // This will check if the 'item' entity
        // already has a category with that name.
        $exists = $entity->getCategories()->exists(function($key, $element) use ($category) {
            return $category == $element->getName();
        });
        if (!$exists) {
            // If there's already one in the database, we'll load it.
            // Otherwise, we'll save a new Category.
            $query = $this->_entityManager->createQueryBuilder();
            $query->select('c')
                ->from("MyBundle:Category", 'c')
                ->where("c.name = :name")
                ->setParameter("name", $category);
            $result = $query->getQuery()->getOneOrNullResult();
            if ($result != null) {
                $item->addCategory($result);
            } else {
                $categoryEntity = new Category($category);
                $em->persist($categoryEntity);
                $item->addCategory($categoryEntity);
            }
        }
    }
}
The thing is: I only flush() the entitymanager when I complete looping through all items. Therefore, $query->getQuery()->getOneOrNullResult() always returns null, leading me to create duplicated categories.
In the XML example above, I have the following:
| item |
| 1 |
| 2 |
| category.id, category.name |
| 1, Category1 |
| 2, Category2 |
| 3, Category3 |
| 4, Category4 |
| 5, Category5 |
| 6, Category1 |
| 7, Category2 |
| 8, Category3 |
| 9, Category7 |
| 10, Category9 |
| item | category |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
| 2 | 9 |
| 2 | 10 |
I wanted the following:
| item |
| 1 |
| 2 |
| category.id, category.name |
| 1, Category1 |
| 2, Category2 |
| 3, Category3 |
| 4, Category4 |
| 5, Category5 |
| 6, Category7 |
| 7, Category9 |
| item | category |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 6 |
| 2 | 7 |
Simply adding $em->flush() after $em->persist($categoryEntity) solves it, but I don't want to flush things just yet (or, for that matter, flush only a category). There is a lot of unfinished work to do and I don't want to interrupt my transaction. I still want to be able to roll back to the very beginning and exclude all unused categories if I need to (and, obviously, without running additional queries).
My question is: is there a way to access both the database and Doctrine's internal entity mapping to retrieve an entity that might or might not have an ID yet? Or will I have to create this mapping myself, run a DQL query and then check my own mapping?
Doctrine2 can't do this for you, but it's pretty easy to store the newly created categories in your loop and check them when you get a miss from the database.
// Initialise once, before the loops.
$_created_categories = array();
if (!$exists) {
    // If there's already one in the database, we'll load it.
    // Otherwise, we'll reuse one created earlier in this run, or save a new Category.
    $query = $this->_entityManager->createQueryBuilder();
    $query->select('c')
        ->from("MyBundle:Category", 'c')
        ->where("c.name = :name")
        ->setParameter("name", $category);
    $result = $query->getQuery()->getOneOrNullResult();
    if ($result) {
        $item->addCategory($result);
    } elseif (isset($_created_categories[$category])) {
        $item->addCategory($_created_categories[$category]);
    } else {
        $categoryEntity = new Category($category);
        $em->persist($categoryEntity);
        $item->addCategory($categoryEntity);
        $_created_categories[$category] = $categoryEntity;
    }
}
Storing the new category entities in the $_created_categories array adds almost no memory overhead, because PHP stores object handles in the array rather than copies of the objects.
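The lookup order (database first, then the not-yet-flushed objects, then create) does not depend on Doctrine. As a rough illustration, here is a Python sketch in which a plain dict named saved stands in for the database query. The Category class and the function resolve_categories are hypothetical names for this example, not part of any ORM.

```python
class Category:
    """Minimal stand-in for the Category entity."""

    def __init__(self, name):
        self.name = name


def resolve_categories(items, saved):
    """Attach categories to items, creating each missing category only once.

    items: list of lists of category names (one list per item).
    saved: dict name -> Category, standing in for the database lookup.
    """
    created = {}  # categories created during this run, not yet flushed
    result = []
    for names in items:
        item_categories = []
        for name in names:
            category = saved.get(name)          # 1. look in the "database"
            if category is None:
                category = created.get(name)    # 2. look in the pending ones
            if category is None:
                category = Category(name)       # 3. create and remember it
                created[name] = category
            if category not in item_categories:
                item_categories.append(category)
        result.append(item_categories)
    return result, created


items = [
    ["Category1", "Category2", "Category3", "Category4", "Category5"],
    ["Category1", "Category2", "Category3", "Category7", "Category9"],
]
result, created = resolve_categories(items, saved={})
print(sorted(created))
```

With the two items from the XML sample, seven categories are created instead of ten, and both items share the same Category1 object.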
