Astyanax column slice query: how to do an inclusive and exclusive column slice

I have a composite column that I can do inclusive and exclusive on both ends of a range like so
AnnotatedCompositeSerializer serializer = info1.getCompositeSerializer();
CompositeRangeBuilder range = serializer.buildRange();
if (fromInclusive)
    range = range.greaterThanEquals(from);
else
    range = range.greaterThan(from);
if (toInclusive)
    range = range.lessThanEquals(to);
else
    range = range.lessThan(to);
return range;
GREAT so far, but when I have a normal column family with no composite names, where the names are either Integer, Decimal, or String, how do I do the same thing? Right now I just have
ByteBufferRange range = new RangeBuilder().setStart(from).setEnd(to).setLimit(batchSize).build();
but there are no methods for inclusive/exclusive.
How to do this one?
NOTE: Integer and Decimal CAN be two's complement so they could start with fffff and who knows how long the values are since they can be as big as someone wants.
thanks,
Dean

Related

Algorithm to count instances of a value from a file

I am reading through a file of financial data with beneficiaries. I need to count the number of beneficiaries and then calculate their allocated percentage. If there is 1 beneficiary, the allocation is 100%; if there are 2, 50%; if there are 3, 33.33%; etc. The file is sorted by investment then beneficiary, so if there is more than one beneficiary per investment they will be in order in the file. Here's an example:
input file data
the output that I want
Here is my code, but it's wrong because this way I am assigning 100% to the first beneficiary, 50% to the second beneficiary, 33.333% to the third, etc. How can I change it to do the count, then create the beneficiaries with the right count? (There is an outer loop which is a table of investments.)
iBeneficiaryCount = 0.
dTempPercentage = 100.
FOR EACH ttJointData WHERE ttJointData.inv-num EQ ttInvestment.inv-num:
    IF ttJointData.Joint_Type EQ "Joint" THEN DO:
        cTemp = "JT".
        RUN CreateOwner (....).
    END.
    ELSE IF ttJointData.Joint_Type EQ "Beneficiary" THEN DO:
        iBeneficiaryCount = iBeneficiaryCount + 1.
        dTempPercentage = 100 / iBeneficiaryCount.
        RUN AddBeneficiary(ttJointData.investment-num,ttInvestment.benficiary-id,dTempPercentage).
    END.
END.
What are the best ways to capture that beneficiary percentage? I am thinking that I need to read through the data and put that value into the ttJointData table. Or is there a way to do it on the loop? Regardless, I need a neat algorithm to count up the instances from an input file and create and assign a percentage value.
You can use a query to calculate the number of beneficiaries before you loop through them.
Something like
DEFINE VARIABLE dTempPercentage AS DECIMAL NO-UNDO.
DEFINE VARIABLE iBeneficiaryCount AS INTEGER NO-UNDO.
DEFINE QUERY qryJD FOR ttJointData.
dTempPercentage = 100.
FOR EACH ttInvestment:
    // calculate how many beneficiaries; must use PRESELECT here
    // only "Beneficiary" rows are counted, so "Joint" rows do not dilute the percentage
    OPEN QUERY qryJD PRESELECT EACH ttJointData
        WHERE ttJointData.inv-num EQ ttInvestment.inv-num
          AND ttJointData.Joint_Type EQ "Beneficiary".
    iBeneficiaryCount = QUERY qryJD:NUM-RESULTS.
    CLOSE QUERY qryJD.
    IF iBeneficiaryCount > 0 THEN
        dTempPercentage = 100 / iBeneficiaryCount.
    // second pass: the final percentage is known before any beneficiary is created
    FOR EACH ttJointData WHERE ttJointData.inv-num EQ ttInvestment.inv-num:
        IF ttJointData.Joint_Type EQ "Joint" THEN DO:
            cTemp = "JT".
            RUN CreateOwner (....).
        END.
        ELSE IF ttJointData.Joint_Type EQ "Beneficiary" THEN DO:
            RUN AddBeneficiary(ttJointData.investment-num,ttInvestment.benficiary-id,dTempPercentage).
        END.
    END.
END.
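The count-first idea is independent of ABL. As a rough, language-neutral sketch in Python (the tuple layout, the function name allocate_beneficiaries and the sample rows are all made up for illustration, not taken from the question's data), one pass counts the beneficiaries per investment and a second pass hands out the equal shares:

```python
from collections import Counter


def allocate_beneficiaries(rows):
    """rows: iterable of (inv_num, joint_type, name) tuples (hypothetical layout)."""
    rows = list(rows)
    # Pass 1: count beneficiaries per investment; "Joint" rows are not counted.
    counts = Counter(
        inv for inv, joint_type, _ in rows if joint_type == "Beneficiary"
    )
    # Pass 2: every beneficiary of an investment gets the same share of 100%.
    return [
        (inv, name, round(100 / counts[inv], 2))
        for inv, joint_type, name in rows
        if joint_type == "Beneficiary"
    ]


# Made-up sample: investment 1 has one joint owner and two beneficiaries.
sample = [
    (1, "Joint", "Ann"),
    (1, "Beneficiary", "Bob"),
    (1, "Beneficiary", "Cid"),
    (2, "Beneficiary", "Dee"),
]
shares = allocate_beneficiaries(sample)
```

Because the count is finished before any share is assigned, the first beneficiary of a pair gets 50% rather than 100%.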

Differences in result in two similar functions: finding the key with maximum value

I am currently having an issue. Basically, I have 2 similar functions in terms of concept but the results do not align. These are the codes I learned from Bioinformatics I on Coursera.
The first code is simply creating a dictionary of occurrences of each k-mer pattern from a text (which is a long stretch of nucleotides). In this case, k is 5.
def FrequencyMap(text,k):
    freq ={}
    for i in range (0, len(text)-k+1):
        freq[text[i:i+k]]=0
        for j in range (0, len(text)-k+1):
            if text[j:j+k] == text[i:i+k]:
                freq[text[i:i+k]] +=1
    return freq, max(freq)
The text and the result dictionary are kinda long, but the main point is when I call max(freq), it returns the key 'TTTTC', which has a value of 1.
Meanwhile, I wrote another code that is simply based on the previous code to generate the 5-mer patterns that have the max values (number of occurrences in the text).
def FrequentWords(text, k):
    a = FrequencyMap(text, k)
    m = max(a.values())
    words = []
    for i in a:
        if a[i]==m:
            words.append(i)
    return words,m
And this code returns 'ACCTA', which has the value of 99, meaning it appears 99 times in the text. This makes total sense.
I used the same text and k (k=5) for both codes. I ran the codes on Jupyter Notebook. Why does the first one not return 'ACCTA'?
Thank you so much,
Here is the text, if anyone wants to try:
"ACCATCCCTAGGGCATACCTAAGTCTACCTAAAAGGCTACCTAATACCATACCTAATTACCTAACTACCTAAAATAAGTCTACCTAATACCTAATACCTAAAGTTACCTAACGTACCTAATACCTAATACCTAACCACTACCTAATCCGATTTACCTAACAACCGATCGAGTACCTAATCGATACCTAAATAACGGACAATATACCTAATTACCTAATACCTAATACCTAAGTGTACCTAAGACGTCTACCTAATTGTACCTAACTACCTAATTACCTAAGATTAATACCTAATACCTAATTTACCTAATACCTAACGTGGACTACCTAATACCTAACTTTTCCCCTACCTAATACCTAACTGTACCTAAATACCTAATACCTAAGCTACCTAAAGAACAACATTGTACGTGCGCCGTACCTAAATACCTAACAACTACCTAACTGATACCTAATAGTGATTACCTAACGCTTCTACCTAACTACCTAAGTACCTAACGCTACCTAACTACCTAATGTCCACAAAATACCTAATACCTAATAGCTACCTAATTGTGTACCTAAGTACCTAACCTACCTAATAATACCTAAAAATACCTAAGTACCTAACGTACCTAAATTTTACCTAATCTACCTAACGTACCTAATACCTAATTATACCTAATTACCTAATGGTTACCTAAGTTACCTAATATGCCACTACCTAACCTTACCTAAGACCTACCTAATAGGTACCTAACTGGGTACCTAAGGCAGTTTACCTAATTCAGGGCTACCTAATGTACCTAATACCTAAGTACCTAATACCTAATCCCATACCTAATATTTACCTAAGGGCACCGGTACCTAATACCTAATACCTAATACCTAAACCTTCGTACCTAAATACCTAATCTACCTAATGTACCTAAGGTACCTAATACCTAAGTCACTACCTAATACCTAATACCTAATGGGAGGAGCTTACCTAAGGTTACCTAATTACCTAAATACCTAATCGTTACCTAA"
Why does the first one not return 'ACCTA'?
Because max(freq) returns the maximum key of the dictionary. In this case the keys are strings (the k-mers), and strings are compared lexicographically (alphabetically). Hence the maximum one is the last string when they are sorted alphabetically.
If you want the first function to return the k-mer that occurs most often, you should change max(freq) to max(freq.items(), key=lambda key_value_pair: key_value_pair[1])[0]. Here, max compares the (kmer, count) pairs (that's the key_value_pair parameter of the lambda expression) by their count, and the trailing [0] selects the kmer from the winning pair. The shorter max(freq, key=freq.get) gives the same result.
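The difference is easy to see on a tiny input. The sketch below uses a short made-up text and a hypothetical frequency_map helper built on collections.Counter (not the course's FrequencyMap) to contrast the largest key with the most frequent key:

```python
from collections import Counter


def frequency_map(text, k):
    """Count every k-mer of text (hypothetical helper, not the course code)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


# Made-up text: "ACG" occurs twice, every other 3-mer once.
freq = frequency_map("ACGACGTTT", 3)

largest_key = max(freq)                  # compares the keys as strings
most_frequent = max(freq, key=freq.get)  # compares the counts instead
```

Here largest_key is "TTT" even though it occurs only once, while most_frequent is "ACG".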

SQLite3 - Calculated SELECT with padding and concatenation

I have the following SQLite table (a stub of the real table which has a few other columns)
CREATE TABLE IF NOT EXISTS fingers(id INTEGER,intLL INTEGER,fracLat INTEGER,fracLng INTEGER,PRIMARY KEY(id)) WITHOUT ROWID;
A typical entry in this table would be along the lines of
INSERT INTO fingers(id,intLL,fracLat,fracLng) VALUES(1,12899,42513,4025);
From time-to-time I need to query this table to pull out rows for matching intLL values in such a way that a calculated value meets a variable condition. For example
SELECT * FROM fingers WHERE intLL = 12899 AND ('8508' = (CAST((ROUND(CAST(fracLat AS REAL)/500))
AS INTEGER) || CAST((ROUND(CAST(fraCLng AS REAL)/500)) AS INTEGER)));
Explanation
Transform the fracLat and fracLng columns by dividing them by 10, 250 or 500. The CAST AS REAL is required to prevent the default integer division that would be performed by SQLite
Round the decimal result to the closest integer. After rounding you will by default get a value with a trailing .0. The CAST AS INTEGER ensures that this is removed
Concatenate the two parts. The concatenation is going wrong. In the present case the concatenated result would be 858 which is not what I want
Compare against an incoming value: 8508 in this case.
My questions
How can I pad the two parts with 0s when required prior to concatenation so as to ensure that they have the same number of digits?
Is there a simpler way of achieving this?
One way to pad 0s is to concatenate 00 at the start of the number and with SUBSTR() return the last 2 chars.
Also, you can divide by 500.0 to avoid integer division:
SELECT * FROM fingers
WHERE intLL = 12899
AND '8508' = SUBSTR('00' || CAST(ROUND(fracLat / 500.0) AS INTEGER), -2) ||
SUBSTR('00' || CAST(ROUND(fracLng / 500.0) AS INTEGER), -2)
Another way to do it is with the function printf() which formats a number:
SELECT * FROM fingers
WHERE intLL = 12899
AND '8508' = printf('%02d', ROUND(fracLat / 500.0)) ||
printf('%02d', ROUND(fracLng / 500.0))
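To check the padding against the sample row from the question, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The matching_ids helper and the QUERY constant are illustrative names, and the query keeps the question's rounding step before formatting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fingers(id INTEGER, intLL INTEGER, fracLat INTEGER, "
    "fracLng INTEGER, PRIMARY KEY(id)) WITHOUT ROWID"
)
# The sample row from the question.
conn.execute("INSERT INTO fingers VALUES (1, 12899, 42513, 4025)")

# Each part is rounded, then zero-padded to two digits before concatenation.
QUERY = """
SELECT id FROM fingers
WHERE intLL = ?
  AND ? = printf('%02d', ROUND(fracLat / 500.0))
       || printf('%02d', ROUND(fracLng / 500.0))
"""


def matching_ids(int_ll, key):
    """Return the ids whose padded, concatenated key equals `key`."""
    return [row[0] for row in conn.execute(QUERY, (int_ll, key))]
```

With this data matching_ids(12899, "8508") finds the row and matching_ids(12899, "858") does not. A width of 2 assumes each part stays below 100 after division.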

PHPExcel: Setting column width based on column number

I am using PHPExcel and have searched a lot for a way to set the column width based on the column number. I found results based on column IDs but couldn't find any for setting the width based on the column number, which is what I am asking about. What I tried is
$length = strlen($tempval);
$objPHPExcel->getActiveSheet()->getColumnDimensionByColumn($dataColumn)->setWidth($length+10);
But it is showing me a fatal error. What is the right way to do this?
You can get the column ID from the column number using the static PHPExcel_Cell::stringFromColumnIndex() method: pass the column index (e.g. 32 or 7) and it will return the column ID (like AG or H).
There is also a corresponding PHPExcel_Cell::columnIndexFromString() static method: pass the column ID (like "AB") as an argument, and it will return the column number (e.g. 28).
Note that (for historic reasons) PHPExcel_Cell::stringFromColumnIndex() is 0-based (0 will return A, 1 will return B, etc); whereas PHPExcel_Cell::columnIndexFromString() is 1-based (A will return 1, B will return 2, etc).
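The off-by-one between the two directions is easy to trip over. The following Python sketch is not PHPExcel code; it only mirrors the mapping described above, with the hypothetical names string_from_column_index (0-based) and column_index_from_string (1-based):

```python
def string_from_column_index(index: int) -> str:
    """0-based index to column letters: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # switch to 1-based for the bijective base-26 conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def column_index_from_string(column: str) -> int:
    """Column letters to 1-based number: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    n = 0
    for ch in column.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n
```

A round trip therefore needs a correction of one: column_index_from_string(string_from_column_index(i)) equals i + 1.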

What is the fastest way to find if a large integer is power of ten?

I could just use division and modulus in a loop, but this is slow for really large integers. The number is stored in base two, and may be as large as 2^8192. I only need to know if it is a power of ten, so I figure there may be a shortcut (other than using a lookup table).
If your number x is a power of ten then
x = 10^y
for some integer y, which means that
x = (2^y)(5^y)
So, shift the integer right until there are no more trailing zero bits (should be a very low cost operation) and count the number of bits shifted (call this k). Now check if the remaining number is 5^k. If it is, then your original number is 10^k, a power of 10. Otherwise, it's not. Since 2 and 5 are both prime, the factorization is unique, so this will always work.
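A minimal sketch of the shift-and-compare check in Python, whose built-in integers are arbitrary precision. The name is_power_of_ten is made up, and treating 1 (10^0) as a power of ten is an assumption about what the question wants:

```python
def is_power_of_ten(x: int) -> bool:
    """True if x == 10**k for some integer k >= 0."""
    if x <= 0:
        return False
    # x & -x isolates the lowest set bit; its position is the number of
    # trailing zero bits, i.e. the exponent of 2 in x.
    k = (x & -x).bit_length() - 1
    # After removing 2**k, what remains must be exactly 5**k.
    return (x >> k) == 5 ** k
```

For example, 100 is 0b1100100: two trailing zero bits give k = 2, and 100 >> 2 is 25, which equals 5^2.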
Let's say that X is your input value, and we start with the assumption.
X = 10 ^ Something
Where Something is an Integer.
So we say the following:
log10(X) = Something.
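So if X is a power of 10, then Something will be an integer. Note that this relies on floating-point arithmetic, so it is only reliable while X fits in a primitive type; a double tops out near 1.8 × 10^308, far below 2^8192.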
Example
int x = 10000;
double test = Math.log10(x);
if (test == ((int) test))
    System.out.println("Is a power of 10");
