Complex select in SQLite view - sqlite

I have two tables where Security holds the access bit mask for a given NTFS file system scan and FileSystemRights which equates to the string representations for the well known bit masks. I need to create a view which exposes the expected (not just proper) string representations for a given bit mask. The problem is several enum values composite and contain combinations of lower values, so the desired idea is not to repeat the implicit values.
For example, a value of 1179817 (Security.Id = 24) should only report ReadAndExecute and Synchronize, excluding ExecuteFile, ListDirectory, Read, ReadAttributes, ReadData, ReadExtendedAttributes, ReadPermissions and Traverse, as those are all part of ReadAndExecute (eg. ReadAndExecute & Read == Read). Its obviously correct to show them all, but a user wants only to see the non implicit values.
I'm lost within the constraints of SQL to produce a join that behaves like this without some abysmal nested case that would be a nightmare to look at.
Does a better programmatic approach exist?
FileSystemRights
================
Id Name Value
-- ---- -----
1 None 0
2 ListDirectory 1
3 ReadData 1
4 WriteData 2
5 CreateFiles 2
6 CreateDirectories 4
7 AppendData 4
8 ReadExtendedAttributes 8
9 WriteExtendedAttributes 16
10 ExecuteFile 32
11 Traverse 32
12 DeleteSubdirectoriesAndFiles 64
13 ReadAttributes 128
14 WriteAttributes 256
15 Write 278
16 Delete 65536
17 ReadPermissions 131072
18 Read 131209
19 ReadAndExecute 131241
20 Modify 197055
21 ChangePermissions 262144
22 TakeOwnership 524288
23 Synchronize 1048576
24 FullControl 2032127
25 GenericAll 268435456
26 GenericExecute 536870912
27 GenericWrite 1073741824
28 GenericRead 2147483648
Security
========
Id FileSystemRights IdentityReference
-- ---------------- -----------------
20 2032127 BUILTIN\Administrators
21 2032127 BUILTIN\Administrators
22 2032127 NT AUTHORITY\SYSTEM
23 268435456 CREATOR OWNER
24 1179817 BUILTIN\Users
25 4 BUILTIN\Users
26 2 BUILTIN\Users
MyView
======
SELECT s.Id AS SecurityId,
f.Name
FROM Security s
JOIN FileSystemRights f
ON CASE f.Value
WHEN 0 THEN s.FileSystemRights = f.Value
ELSE (s.FileSystemRights & f.Value) == f.Value
END
ORDER BY s.Id, f.Name;

Add the actual value of the name to the query.
Then wrap another query around that to filter out values for the same entry that are a subset of another value:
WITH AllValues(SecurityId, Name, Value) AS (
SELECT s.Id,
f.Name,
f.Value
FROM Security s
JOIN FileSystemRights f
ON CASE f.Value
WHEN 0 THEN s.FileSystemRights = f.Value
ELSE (s.FileSystemRights & f.Value) == f.Value
END
)
SELECT SecurityId,
Name
FROM AllValues
WHERE NOT EXISTS (SELECT *
FROM AllValues AS AV2
WHERE AV2.SecurityId = AllValues.SecurityId
AND (AV2.Value & AllValues.Value) != 0
AND AV2.Value > AllValues.Value
)
ORDER BY 1, 2;

Related

Extracting record from big endian data

I have the following code for network protocol implementation. As the protocol is big endian, I wanted to use the Bit_Order attribute and High_Order_First value but it seems I made a mistake.
With Ada.Unchecked_Conversion;
with Ada.Text_IO; use Ada.Text_IO;
with System; use System;
procedure Bit_Extraction is
type Byte is range 0 .. (2**8)-1 with Size => 8;
type Command is (Read_Coils,
Read_Discrete_Inputs
) with Size => 7;
for Command use (Read_Coils => 1,
Read_Discrete_Inputs => 4);
type has_exception is new Boolean with Size => 1;
type Frame is record
Function_Code : Command;
Is_Exception : has_exception := False;
end record
with Pack => True,
Size => 8;
for Frame use
record
Function_Code at 0 range 0 .. 6;
Is_Exception at 0 range 7 .. 7;
end record;
for Frame'Bit_Order use High_Order_First;
for Frame'Scalar_Storage_Order use High_Order_First;
function To_Frame is new Ada.Unchecked_Conversion (Byte, Frame);
my_frame : Frame;
begin
my_frame := To_Frame (Byte'(16#32#)); -- Big endian version of 16#4#
Put_Line (Command'Image (my_frame.Function_Code)
& " "
& has_exception'Image (my_frame.Is_Exception));
end Bit_Extraction;
Compilation is ok but the result is
raised CONSTRAINT_ERROR : bit_extraction.adb:39 invalid data
What did I forget or misunderstand ?
UPDATE
The real record in fact is
type Frame is record
Transaction_Id : Transaction_Identifier;
Protocol_Id : Word := 0;
Frame_Length : Length;
Unit_Id : Unit_Identifier;
Function_Code : Command;
Is_Exception : Boolean := False;
end record with Size => 8 * 8, Pack => True;
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Function_Code at 7 range 0 .. 6;
Is_Exception at 7 range 7 .. 7;
end record;
Where Transaction_Identifier, Word and Length are 16-bit wide.
These ones are displayed correctly if I remove the Is_Exception field and extend Function_Code to 8 bits.
The dump of the frame to decode is as following:
00000000 00 01 00 00 00 09 11 03 06 02 2b 00 64 00 7f
So my only problem is really to extract the 8th bit of the last byte.
So,
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Function_Code at 7 range 0 .. 6;
Is_Exception at 7 range 7 .. 7;
end record;
It seems you want Is_Exception to be the the LSB of the last byte?
With for Frame'Bit_Order use System.High_Order_First; the LSB will be bit 7,
(also, 16#32# will never be -- Big endian version of 16#4#, the bit pattern just doesn't match)
It may be more intuitive and clear to specify all of your fields relative to the word they're in, rather than the byte:
Unit_ID at 6 range 0..7;
Function_Code at 6 range 8 .. 14;
Is_Exception at 6 range 15 .. 15;
Given the definition of Command above, the legal values for the last byte will then be:
2 -> READ_COILS FALSE
3 -> READ_COILS TRUE
8 -> READ_DISCRETE_INPUTS FALSE
9 -> READ_DISCRETE_INPUTS TRUE
BTW,
by applying your update to your original program, and adding/changing the following, you program works for me
add
with Interfaces;
add
type Byte_Array is array(1..8) of Byte with Pack;
change, since we don't know the definition
Transaction_ID : Interfaces.Unsigned_16;
Protocol_ID : Interfaces.Unsigned_16;
Frame_Length : Interfaces.Unsigned_16;
Unit_ID : Interfaces.Unsigned_8;
change
function To_Frame is new Ada.Unchecked_Conversion (Byte_Array, Frame);
change
my_frame := To_Frame (Byte_Array'(00, 01, 00, 00, 00, 09, 16#11#, 16#9#));
I finally found what was wrong.
In fact, the Modbus Ethernet Frame definition mentioned that, in case of exception, the returned code should be the function code plus 128 (0x80) (see explanation on Wikipedia). That's the reason why I wanted to represent it through a Boolean value but my representation clauses were wrong.
The correct clauses are these ones :
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Is_Exception at 6 range 8 .. 8;
Function_Code at 6 range 9 .. 15;
end record;
This way, the Modbus network protocol is correctly modelled (or not but at least, my code is working).
I really thank egilhh and simonwright for making me find what was wrong and explain the semantics behind the aspects.
Obviously, I don't know who reward :)
Your original record declaration works fine (GNAT complains about the Pack, warning: pragma Pack has no effect, no unplaced components). The problem is with working out the little-endian Byte.
---------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | BE bit numbers
---------------------------------
| c c c c c c c | e |
---------------------------------
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | LE bit numbers
---------------------------------
so if you want the Command to be Read_Discrete_Inputs, the Byte needs to have BE bit 4 (LE bit 3) set i.e. LE 16#8#.
Take a look at this AdaCore post on bit order and byte order to see how they handle it. After reading that, you will probably find that the bit order of your frame value is really 16#08#, which probably is not what you are expecting.
Big Endian / Little Endian typically refers to Byte order rather than bit order, so when you see that Network protocols are Big Endian, they mean Byte order. Avoid setting Bit_Order for your records. In modern systems, you will almost never need that.
Your record is only one byte in size, so byte order won't matter for it by itself. Byte order comes into play when you have larger field values (>8 bits long).
The bit_order pragma doesn't reverse the order that the bits appear in memory. It simply defines whether the most significant bit (left most) will be logically referred to as zero (High_Order_First) or the least significant bit will be referred to as zero (Low_Order_First) when interpreting the First_Bit and Last_Bit offsets from the byte position in the representation clause. Keep in mind that these offsets are taken from the MSB or LSB of the scalar the record component belongs to AS A VALUE. So in order for the byte positions to carry the same meaning on a little endian CPU as they do on a big endian CPU (as well as the in memory representation of multibyte machine scalars, which exist when one or more record components with the same byte position have a last_bit value which exceeds the capacity of a single byte) then 'Scalar_Storage_Order must also be specified.

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num enc_num ordering_date medication age_event
1111 888888 07NOV2008 Wellbutrin 48
1111 876578 11MAY2011 Bupropion 50
2222 999999 08DEC2009 Amitriptyline 32
2222 999999 08DEC2009 Escitalopram 32
3333 656463 12APR2007 Imipramine 44
3333 643211 21DEC2008 Zoloft 45
3333 543213 02FEB2009 Fluoxetine 45
Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num enc_num ordering_date medication age_event RX
1111 888888 07NOV2008 Wellbutrin 48 1
1111 876578 11MAY2011 Bupropion 50 2
2222 999999 08DEC2009 Amitriptyline 32 1
2222 999999 08DEC2009 Escitalopram 32 2
3333 656463 12APR2007 Imipramine 44 1
3333 643211 21DEC2008 Zoloft 45 2
3333 543213 02FEB2009 Fluoxetine 45 3
I think it would be more appropriate to recode the encounter numbers into a new variable so that they reflect a style similar to the RX variable. Where each encounter is listed 1-n, and the number will repeat if multiple scripts are made in the same encounter. Such as below:
Patient_num enc_num ordering_date medication age_event RX Enc_
1111 888888 07NOV2008 Wellbutrin 48 1 1
1111 876578 11MAY2011 Bupropion 50 2 2
2222 999999 08DEC2009 Amitriptyline 32 1 1
2222 999999 08DEC2009 Escitalopram 32 2 1
3333 656463 12APR2007 Imipramine 44 1 1
3333 643211 21DEC2008 Zoloft 45 2 2
3333 543213 02FEB2009 Fluoxetine 45 3 3
From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.
I tried to do the following code (using ordering_date instead because its already sorted properly) but everything under Enc_ is printed as a 1. I'm sure my logic is all wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
.First/.Last flags doesn't require sorting if data is properly ordered or you use NOTSORTED in your BY statement. If your variable in BY statement is not properly ordered then BY statment will throw error and stop executing when encounter deviations. Like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see how exacly .first/.last flags works:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
Second
Those condidions works differently than you think:
if expression;
If expression is true then continue with next instructions after if.
Otherwise return to begining of data step (no implicit output). This also implies your observation is not retained in the output.
In most cases if without then is equivalent to where. However
whereworks faster but it is limited to variables that comes from data set you are reading
if can be used with any type of expression including calculated fields
More info:: IF
Statement, Subsetting
Third
I think lag() function can be your answear.
data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
if first.patient_num then enc_ = 1;
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
end;
run;
With lag() function you can look what was the value of vairalbe on the previos observation and compare it with current one later.
But be carefull. lag() doesn't look for variable value from previous observation. It takes vale of variable and stores it in a FIFO queue with size of 1. On next call it retrives stored value from queue and put new value there.
More info: LAG Function
I'm not sure if this hurts the rest of your analysis, but what about just
proc freq data=pt_meds noprint;
tables patient_num ordering_date / out=pt_meds_freq;
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;

Hash Table + Binary Search

I'm using an Hash Table to store some values. Here are the details:
There will be roughly 1M items to store (not known before, so no perfect-hash possible).
Table is 10M large.
Hash function is MurMurHash3.
I did some tests and storing 1M values I get 350,000 collisions and 30 elements at the most-colliding hash table's slot.
Are these result good?
Would it make sense to implement Binary Search for lists that get created at colliding hash-table's slots?
What' your advice to improve performances?
EDIT: Here is my code
var
HashList: array [0..10000000 - 1] of Integer;
for I := 0 to High(HashList) do
HashList[I] := 0;
for I := 1 to 1000000 do
begin
Y := MurmurHash3(UIntToStr(I));
Y := Y mod Length(HashList);
Inc(HashList[Y]);
if HashList[Y] > 1 then
Inc(TotalCollisionsCount);
if HashList[Y] > MostCollidingSlotItemCount then
MostCollidingSlotItemCount := HashList[Y];
end;
Writeln('Total: ' + IntToStr(TotalCollisionsCount) + ' Max: ' + IntToStr(MostCollidingSlotItemCount));
Here is the result I get:
Total: 48169 Max: 5
Am I missing something?
This is what you get when you put 1M items randomly into 10M cells
calendar_size=10000000 nperson = 1000000
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 9048262 (0.904826) 0 (0.000000) 0 0 0
1: 905064 (0.090506) 905064 (0.905064) 1 905064 905064
2: 45136 (0.004514) 90272 (0.090272) 3 135408 1040472
3: 1488 (0.000149) 4464 (0.004464) 6 8928 1049400
4: 50 (0.000005) 200 (0.000200) 10 500 1049900
----+---------+--------+----------+--------+------+--------+--------
5: 10000000 1000000 1.049900 1049900
The left column is the number of items in a cell. The second: the number of cells having this itemcount.
WRT the binary search: it is obvious that for small tables like this (maximum chain length=4, but most chains are of length=1), linear search outperforms binary search. The takeover-point is probably somewhere between 10 and 100.

How can I apply fisher test on this set of data (nominal variables)

I'm pretty new in statistics:
fisher = function(idxToTest, idxATI){
idxDependent=c()
dependent=c()
p = c()
for(i in c(1:length(idxToTest)))
{
tbl = table(data[[idxToTest[i]]], data[[idxATI]])
rez = fisher.test(tbl, workspace = 20000000000)
if(rez$p.value<0.1){
dependent=c(dependent, TRUE)
if(rez$p.value<0.1){
idxDependent = c(idxDependent, idxToTest[i])
}
}
else{
dependent = c(dependent, FALSE)
}
p = c(p, rez$p.value)
}
}
This is the function I use. It seems to work.
What I understood until now is that I have to pass as first parameter data like:
Men Women
Dieting 10 30
Non-dieting 5 60
My data comes from a CSV:
data = read.csv('***.csv', header = TRUE, sep=',');
My first problem is that I don't know how to converse from:
Loan.Purpose Home.Ownership
lp_value_1 ho_value_2
lp_value_1 ho_value_2
lp_value_2 ho_value_1
lp_value_3 ho_value_2
lp_value_2 ho_value_3
lp_value_4 ho_value_2
lp_value_3 ho_value_3
to:
ho_value_1 ho_value_2 ho_value_3
lp_value1 0 2 0
lp_value2 1 0 1
lp_value3 0 1 1
lp_value4 0 1 0
The second issue is that I don't know what the second parameter should be
POST UPDATE: This is what I get using fisher.test(myTable):
Error in fisher.test(test) : FEXACT error 501.
The hash table key cannot be computed because the largest key
is larger than the largest representable int.
The algorithm cannot proceed.
Reduce the workspace size or use another algorithm.
where myTable is:
MORTGAGE NONE OTHER OWN RENT
car 18 0 0 5 27
credit_card 190 0 2 38 214
debt_consolidation 620 0 2 87 598
educational 5 0 0 3 7
...
Basically, fisher tests only work on smallish data sets because they require alot of memory. But all is good because chi-square tests make minimal additional assumptions and are easier on the computer. Just do:
chisq.test(Loan.Purpose,Home.Ownership)
to get your p-values.
Make sure you read through and understand the help page for chisq.test, especially the examples at the bottom.
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/chisq.test.html
Then look at a mosaicplot to see the quantities like:
mosaicplot(Loan.Purpose,Home.Ownership)
this reference explains how mosaicplots work.
http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day12/

Number pattern for checkboxes in asp.net?

I have a database table which contanis a field name Province bits and it adds up the count of the pattern like:
AB-1
BC-2
CD-4
DE-8
EF-16.... and so on.
Now in the table entry I have a value-13(Province bit), which implies checkboxes against entry AB,CD,DE(adds up to 13)should be checked.
I am not able to get the logic behind the same, how can check only those checkboxes whose sum adds up to the entry in the table?
You need to check to see if the value is in the bitwise total.
if( interestedInValue & totalValue == interestedInValue)
{
// this value is in the total, check the box
}
Documentation on & http://msdn.microsoft.com/en-us/library/sbf85k1c(v=vs.71).aspx
e.g. 13 = 1 + 4 + 8
13 & 1 == 1 // true
13 & 2 == 2 // false
13 & 4 == 4 // true
13 & 8 == 8 // true
13 & 16 == 16 // false
EDIT: for more clarification
ab.Checked = 1 && ProvinceBit == 1 // checkbox AB
bc.Checked = 2 && ProvinceBit == 2 // checkbox BC
...
The field is using bit flags.
13 is 1101 binary.
So convert the value to bits and assign one bit to each checkbox.
By converting your number to a string, you can convert to an array or just iterate through the string. A bit brute force, but will give you what you need.
var value = 13
string binary = Convert.ToString(value, 2);
//binary = "1101"

Resources