JPA/JPQL COUNT question - count

I have the following JPQL query -
SELECT f.md5
FROM File f, Collection leafCollections, Collection instCollections
WHERE (f.status = com.foo.bar.FileStatus.Happy OR f.status = com.foo.bar.FileStatus.Sad)
AND f.collectionId = leafCollections.collectionId
AND leafCollections.instanceCollectionId = instCollections.collectionId
GROUP BY f.md5, instCollections.collectionId
It basically returns the md5s for files which are organized in a hierarchy (tree) such that if the same MD5 appears in more then one leaf in a particular branch of the hierarchy it will be only shown once (thanks to the GROUP BY).
This works fine. Let's say I get 100 rows back. Each row containing an md5 as a string.
Now I want to get the COUNT of the rows returned. I thought I could simply do:
SELECT COUNT(f.md5)
FROM File f, Collection leafCollections, Collection instCollections
WHERE (f.status = com.foo.bar.FileStatus.Happy OR f.status = com.foo.bar.FileStatus.Sad)
AND f.collectionId = leafCollections.collectionId
AND leafCollections.instanceCollectionId = instCollections.collectionId
GROUP BY f.md5, instCollections.collectionId
However this returns 100 rows, each one containing a long representing the number of times the md5 appeared in a branch. What I wanted was simply to get 1 row back with a long value of 100 being the total count of rows the original query returned. I feel like I am missing something obvious.
Suggestions?

As #doc_180 has commented. Using getSingleResult() will yield the count
Use the following code to get the result.
// Assuming that the Query is strQ
Query q = entityManager.createQuery( strQ );
int count = ( (Integer) q.getSingleResult() ).intValue();

Related

Algorithm to count instances of a value from a file

I am reading through a file of financial data with beneficiaries. I need to count the number of beneficiaries and then calculate their allocated percentage. If there is 1 beneficiary, the allocation is 100%, if there are 2, if there are 3, 33.33%, etc. The file is sorted by investment then beneficiary, so if there is more than one beneficiary per investment they will be in order in the file. Here's an example:
input file data
the output that I want
Here is my code, but it's wrong because this way I am assigning 100% to the first beneficiary, 50% to the second beneficiary, 33.333% to the third, etc. How can I change it to do the count, then create the beneficiaries with the right count? (There is an outer loop which is a table of investments.)
iBeneficiaryCount = 0.
dTempPercentage = 100.
FOR EACH ttJointData WHERE ttJointData.inv-num EQ ttInvestment.inv-num:
IF ttJointData.Joint_Type EQ "Joint" THEN DO:
cTemp = "JT".
RUN CreateOwner (....).
END.
ELSE IF ttJointData.Joint_Type EQ "Beneficiary" THEN DO:
iBeneficiaryCount = iBeneficiaryCount + 1.
dTempPercentage = 100 / iBeneficiaryCount.
RUN AddBeneficiary(ttJointData.investment-num,ttInvestment.benficiary-id,dTempPercentage).
END.
END.
What are the best ways to capture that beneficiary percentage? I am thinking that I need to read through the data and put that value into the ttJointData table. Or is there a way to do it on the loop? Regardless, I need a neat algorithm to count up the instances from an input file and create and assign a percentage value.
You can use a query to calculate the number of beneficiaries before you loop through them.
Something like
DEFINE VARIABLE dTempPercentage AS DECIMAL NO-UNDO.
DEFINE VARIABLE iBeneficiaryCount AS INTEGER NO-UNDO.
DEFINE QUERY qryJD FOR ttJointData.
dTempPercentage = 100.
FOR EACH ttInvestment:
// calculate how many beneficiaries; must use PRESELECT here
OPEN QUERY qryJD PRESELECT EACH ttJointData WHERE ttJointData.inv-num EQ ttInvestment.inv-num.
iBeneficiaryCount = QUERY qryJD:NUM-RESULTS.
dTempPercentage = 100 / iBeneficiaryCount.
GET FIRST qryJD .
DO WHILE AVAILABLE ttJointData :
IF ttJointData.Joint_Type EQ "Joint" THEN DO:
cTemp = "JT".
RUN CreateOwner (....).
END.
ELSE IF ttJointData.Joint_Type EQ "Beneficiary" THEN DO:
RUN AddBeneficiary(ttJointData.investment-num,ttInvestment.benficiary-id,dTempPercentage).
END.
GET NEXT qryJD .
END.
CLOSE QUERY qryJD.
END.

Differences in result in two similar functions: finding the key with maximun value

I am currently having an issue. Basically, I have 2 similar functions in terms of concept but the results do not align. These are the codes I learned from Bioinformatics I on Coursera.
The first code is simply creating a dictionary of occurrences of each k-mer pattern from a text (which is a long stretch of nucleotides). In this case, k is 5.
def FrequencyMap(text,k):
freq ={}
for i in range (0, len(text)-k+1):
freq[text[i:i+k]]=0
for j in range (0, len(text)-k+1):
if text[j:j+k] == text[i:i+k]:
freq[text[i:i+k]] +=1
return freq, max(freq)
The text and the result dictionary are kinda long, but the main point is when I call max(freq), it returns the key 'TTTTC', which has a value of 1.
Meanwhile, I wrote another code that is simply based on the previous code to generate the 5-mer patterns that have the max values (number of occurrences in the text).
def FrequentWords(text, k):
a = FrequencyMap(text, k)
m = max(a.values())
words = []
for i in a:
if a[i]==m:
words.append(i)
return words,m
And this code returns 'ACCTA', which has the value of 99, meaning it appears 99 times in the text. This makes total sense.
I used the same text and k (k=5) for both codes. I ran the codes on Jupyter Notebook. Why does the first one not return 'ACCTA'?
Thank you so much,
Here is the text, if anyone wants to try:
"ACCATCCCTAGGGCATACCTAAGTCTACCTAAAAGGCTACCTAATACCATACCTAATTACCTAACTACCTAAAATAAGTCTACCTAATACCTAATACCTAAAGTTACCTAACGTACCTAATACCTAATACCTAACCACTACCTAATCCGATTTACCTAACAACCGATCGAGTACCTAATCGATACCTAAATAACGGACAATATACCTAATTACCTAATACCTAATACCTAAGTGTACCTAAGACGTCTACCTAATTGTACCTAACTACCTAATTACCTAAGATTAATACCTAATACCTAATTTACCTAATACCTAACGTGGACTACCTAATACCTAACTTTTCCCCTACCTAATACCTAACTGTACCTAAATACCTAATACCTAAGCTACCTAAAGAACAACATTGTACGTGCGCCGTACCTAAATACCTAACAACTACCTAACTGATACCTAATAGTGATTACCTAACGCTTCTACCTAACTACCTAAGTACCTAACGCTACCTAACTACCTAATGTCCACAAAATACCTAATACCTAATAGCTACCTAATTGTGTACCTAAGTACCTAACCTACCTAATAATACCTAAAAATACCTAAGTACCTAACGTACCTAAATTTTACCTAATCTACCTAACGTACCTAATACCTAATTATACCTAATTACCTAATGGTTACCTAAGTTACCTAATATGCCACTACCTAACCTTACCTAAGACCTACCTAATAGGTACCTAACTGGGTACCTAAGGCAGTTTACCTAATTCAGGGCTACCTAATGTACCTAATACCTAAGTACCTAATACCTAATCCCATACCTAATATTTACCTAAGGGCACCGGTACCTAATACCTAATACCTAATACCTAAACCTTCGTACCTAAATACCTAATCTACCTAATGTACCTAAGGTACCTAATACCTAAGTCACTACCTAATACCTAATACCTAATGGGAGGAGCTTACCTAAGGTTACCTAATTACCTAAATACCTAATCGTTACCTAA"
Why does the first one not return 'ACCTA'?
Because max(freq) returns the maximum key of the dictionary. In this case the keys are strings (the k-mers), and strings are compared alphabetically. Hence the maximum one is the last string when the are sorted alphabetically.
If you want the first function to return the k-mer that occurs most often, you should change max(freq) to max(freq.items(), key=lambda key_value_pair: key_value_pair[1])[0]. Here, you are sorting the (kmer, count) pairs (that's the key_value_pair parameter of the lambda expression) based on the frequency and then selecting the kmer.

Best way to search in 950 numbers in a single column?

I have millions of rows in a sqlite3 database. The column 'points' of a single row contains the following example:
{1399808086,1366221142,1374614902,1374608759,1375598069,1375270116,1935207612,1914502332,1913478333,1930188205,1934563311,1942881023,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988 ... up to nearly 1'000 numbers, ending with }
What is the best way to search for rows where the column 'points' includes
a) all three example search values
1366221142,1374614902,1374608759 (Position two, three and four in the above content)
b) as much as possible (3 or 2 or 1) of the above 3 example search values
I tried it with Indexes, but a search with LIKE and '%1366221142%' takes "forever".
Actually I try it with FTS5, but the import into a new created virtual table seems to take several days.
Do you know any other possibility?
If you can use Python then this approach is 'pretty fast'.
import sqlite3
conn = sqlite3.connect(':memory:')
points = '{1399808086,1366221142,1374614902,1374608759,1375598069,1375270116,1935207612,1914502332,1913478333,1930188205,1934563311,1942881023,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860,1929778348,1900414380,1883651988,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,-778051210,-740437658,-749943514,-754136794,-770946570,1376606678,1380850053,1380854148,1381902468,1381971092,308228244,324940180,324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860}'
conn.execute('CREATE TABLE something (recno, points)')
for r in range(10000):
conn.execute('INSERT INTO something (recno, points) values (?,?)', (r, points))
seeking = ['1366221142', '1374614902', '1374608759']
first = True
for row in conn.execute('SELECT recno, points FROM something'):
pointsList = points[1:-1].split(',')
counts = { _:pointsList.count(_) for _ in seeking }
if first:
print (row)
print (counts)
first = False
The (abridged) output is:
(0, '{1399808086,1366221142,1374614902,1374608759,1375598069,1375270116,1935207612,1914502332,1913478333,1930188205,1934563311,1942881023,1373508175,-778100129,-788765075,-788763091,-790856156,-790835404,-791756027,-795938489,-779165370,... 324948372,327078869,292409717,275550503,275554606,275547438,812554558,812489022,1894554398,1895733774,1895741966,1912515343,1943993629,1935471709,1918694493,1914490972,1913409788,1913475260,1913458860}')
{'1374614902': 1, '1374608759': 1, '1366221142': 1}
Note that the code arranges to put 10,000 copies of your string, extended to 1,000 numbers, into the database and then processes them. Of course, the database is in memory, which is a factor to consider.
You could just try it.

Creating a HAVING COUNT(column) > 2 clause in pyDAL

I have the following pyDAL table:
market = db.define_table(
'market',
Field('name'),
Field('ask', type='double'),
Field('timestamp', type='datetime', default=datetime.now)
)
I would like to use the expression language to execute the following SQL:
SELECT * FROM market
GROUP BY name
ORDER BY timestamp DESCENDING
HAVING COUNT(name) > 1
I know how to do the ORDER BY and the GROUP BY:
db().select(
db.market.ALL,
orderby=~db.market.timestamp,
groupby=db.market.name
)
but I do not know how to do a count within a having clause even after reading the section in the web2py book on the HAVING clause.
The count() function returns an expression which can be used both as a field in the select query, and to build an argument to the query's having parameter. The Grouping and counting section from the web2py manual has a few hints on this topic.
The following code will give the desired result. The row objects will hold both the market objects and their respective row counts.
count = db.market.name.count()
rows = db().select(
db.market.ALL,
count,
groupby=db.market.name,
orderby=~db.market.timestamp,
having=(count > 2)
)

How to get count of repeated values in datatable in asp.net

I want total count of repeated values in a datable.
as shown in image total count should be : 2 because repeated values are only 1 and 2.
I tried as given below:
DataView dv = new DataView(dtTemp);
int iRowCount = dv.ToTable(true, "Column1").Rows.Count;
but it returns 3 which is incorrect.
Does anyone knows how to do it.
I don't want to use loop becoz sometimes data table contains 4000-5000 rows so if we use loop it will take much more time to get the total count.
Thanks.

Resources