Combining Column result into Oracle? - oracle11g

I have the query below:
select distinct pj.pid,pj.seq, wm_concat(distinct pj.job)
from people_jobs pj
join entities aa on pj.pid=aa.pid and aa.mmd = 'A' and aa.mmd_cat in (50,70,80)
group by pj.pid, pj.seq
having wm_concat(distinct pj.job) like '%45%' and wm_concat(distinct pj.job) like '%70%'
Result
PID SEQ WM_concat(distinct pj.job)
9001800 10 16,45,70
9348300 1 16,45,70,90
9349100 1 16,19,45,54,56,70,81,82
9370600 1 19,45,55,70,90
9374000 1 10,42,45,70,90
9374000 3 10,45,70
9374400 1 45,70,90
9411100 1 45,54,57,58,70,90
9602500 10 19,45,70,90
Here I want to combine the seq rows per pid, so that each pid appears only once in the result. Can you please help me?

Related

How to know if there is a different element in one array in Scilab?

My goal is to check if there are misplaced objects in one array.
for example the array is
2.
2.
2.
2.
2.
1.
3.
1.
3.
3.
3.
1.
3.
1.
1.
1.
1.
I want to know if the first 5 elements, 6 to 13 and 14-17 are the same.
The purpose of this is to identify the misplaced elements in a clustering solution.
I have tried for the first 5 elements
ISet=5
IVer=7
IVir=5
for i=1:ISet
if(isequal(FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2))==%f)
numMisp=numMisp+1
mprintf("Set misp: %i",numMisp)
end
end
For the next 6 to 13 elements
for i=ISet+1:IVer+ISet-1
if(isequal(FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2))==%f)
mprintf("%i %i Ver misp: %i\n",FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2),i)
numMisp=numMisp+1
end
end
For the next 14 to 17 elements
for i=IVer+ISet:IVer+IVir-1
if(isequal(FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2))==%f)
mprintf("%i %i Ver misp: %i\n",FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2),i)
numMisp=numMisp+1
mprintf("Vir misp: %i",i)
end
end
You can use unique for that purpose. For example, the following test checks whether the first five elements are the same:
x=[2 2 2 2 2 1 3 1 3 3 3 1 3 1 1 1 1];
if length(unique(x(1:5))) == 1
//
end
You can do the same for the other clusters by replacing 1:5 with 6:13 and then 14:17.
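For comparison, here is a minimal Python sketch of the same idea, applied to all three ranges at once. The names `clusters`, `is_uniform` and `count_misplaced` are made up for illustration, and treating the most frequent label in a range as the "correct" one is an assumption, not something the question states.

```python
# Cluster labels from the question, in order.
x = [2, 2, 2, 2, 2, 1, 3, 1, 3, 3, 3, 1, 3, 1, 1, 1, 1]

# 1-based inclusive ranges from the question: 1-5, 6-13 and 14-17.
# The range names are hypothetical, borrowed from the variable names ISet/IVer/IVir.
clusters = {"Set": (1, 5), "Ver": (6, 13), "Vir": (14, 17)}


def is_uniform(values):
    """Return True when every element of values is the same."""
    return len(set(values)) == 1


def count_misplaced(values):
    """Count elements that differ from the most frequent label (assumption)."""
    most_common = max(set(values), key=values.count)
    return sum(1 for v in values if v != most_common)


report = {}
for name, (start, end) in clusters.items():
    segment = x[start - 1:end]  # convert the 1-based range to a 0-based slice
    report[name] = (is_uniform(segment), count_misplaced(segment))
    print(name, report[name])
```

The uniformity check is the direct equivalent of `length(unique(...)) == 1`; the misplaced count is an extra that only makes sense if the majority label is the intended one.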

How do I count how many of my comments contain words in a list?

My dataframe(df) looks like this:
Comments
-----------------
1 | comment1
2 | comment2
3 | comment3
4 | comment4
...
I have created 2 lists as follows:
list1<-c("money","finance","aid")
list2<-c("major","degree")
I want to search through the rows of a dataframe that holds comments from different people. When any of the words in list1 is found in a particular row, counter1 should increment, and when any of the words in list2 is found, counter2 should increment.
I want to get results as:
counter1=10 ; counter2=25
Note: I don't wish to increment the counter at each frequency of words. For example, if a comment contains both "money" and "finance" the counter should increment only once. But if it has "money" and "major", counter1 and counter2 both should increment.
You can collapse your list with |'s, so grepl will return TRUE if a match is found. Example:
Sample data
comments = data.frame(text=c("only list 1 since money","only list 2 since major","both lists money major","money finance list 1 once"))
text
1 only list 1 since money
2 only list 2 since major
3 both lists money major
4 money finance list 1 once
Code
list1<-c("money","finance","aid")
list2<-c("major","degree")
counter1=sum(grepl(paste(list1,collapse="|"),comments$text))
counter2=sum(grepl(paste(list2,collapse="|"),comments$text))
Result
counter1: 3
counter2: 2
Hope this helps!
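The same "join the words with | and count matching rows" approach can be sketched in Python as follows. The helper `count_matching` and its `whole_words` flag are hypothetical names used only for this illustration. The flag is there because a plain alternation also matches inside longer words (for example "aid" inside "said"), which may or may not be what you want.

```python
import re

comments = [
    "only list 1 since money",
    "only list 2 since major",
    "both lists money major",
    "money finance list 1 once",
]
list1 = ["money", "finance", "aid"]
list2 = ["major", "degree"]


def count_matching(texts, words, whole_words=False):
    """Count texts that contain at least one of the words (each text counts once)."""
    alternation = "|".join(re.escape(w) for w in words)
    # Optionally require word boundaries so "aid" does not match "said".
    pattern = rf"\b(?:{alternation})\b" if whole_words else alternation
    regex = re.compile(pattern)
    return sum(1 for text in texts if regex.search(text))


counter1 = count_matching(comments, list1)
counter2 = count_matching(comments, list2)
print("counter1:", counter1)
print("counter2:", counter2)
```

Because each comment is tested once per list, a comment containing both "money" and "finance" still adds only one to counter1.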

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num enc_num ordering_date medication age_event
1111 888888 07NOV2008 Wellbutrin 48
1111 876578 11MAY2011 Bupropion 50
2222 999999 08DEC2009 Amitriptyline 32
2222 999999 08DEC2009 Escitalopram 32
3333 656463 12APR2007 Imipramine 44
3333 643211 21DEC2008 Zoloft 45
3333 543213 02FEB2009 Fluoxetine 45
Currently I have the dataset sorted by patient_num and then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num enc_num ordering_date medication age_event RX
1111 888888 07NOV2008 Wellbutrin 48 1
1111 876578 11MAY2011 Bupropion 50 2
2222 999999 08DEC2009 Amitriptyline 32 1
2222 999999 08DEC2009 Escitalopram 32 2
3333 656463 12APR2007 Imipramine 44 1
3333 643211 21DEC2008 Zoloft 45 2
3333 543213 02FEB2009 Fluoxetine 45 3
I think it would be more appropriate to recode the encounter numbers into a new variable so that they reflect a style similar to the RX variable. Where each encounter is listed 1-n, and the number will repeat if multiple scripts are made in the same encounter. Such as below:
Patient_num enc_num ordering_date medication age_event RX Enc_
1111 888888 07NOV2008 Wellbutrin 48 1 1
1111 876578 11MAY2011 Bupropion 50 2 2
2222 999999 08DEC2009 Amitriptyline 32 1 1
2222 999999 08DEC2009 Escitalopram 32 2 1
3333 656463 12APR2007 Imipramine 44 1 1
3333 643211 21DEC2008 Zoloft 45 2 2
3333 543213 02FEB2009 Fluoxetine 45 3 3
From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.
I tried the following code (using ordering_date instead because it's already sorted properly), but everything under Enc_ is printed as a 1. I'm sure my logic is all wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
The first./last. flags don't require sorting if the data is already properly ordered or if you use NOTSORTED in your BY statement. If the variable in the BY statement is not properly ordered, the BY statement throws an error and the step stops executing when it encounters the first deviation. Like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see exactly how the first./last. flags work:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
Second
Those conditions work differently than you think:
if expression;
If the expression is true, processing continues with the statements after the if.
Otherwise SAS returns to the beginning of the data step without the implicit output, so the observation is not written to the output data set.
In your code, if first.patient_num; keeps only the first observation of each patient, and first.ordering_date is always 1 on that observation, which is why enc_ is always 1.
In most cases, if without then is equivalent to where. However:
where works faster, but it is limited to variables that come from the data set you are reading
if can be used with any type of expression, including calculated fields
More info: IF Statement, Subsetting
Third
I think the lag() function can be your answer.
data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
/* call lag() on every observation so the queue always holds the previous row */
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
/* restart at 1 for each patient, add 1 whenever the date changes */
if first.patient_num then enc_ = 1;
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
run;
With the lag() function you can look up the value a variable had on the previous observation and compare it with the current one.
But be careful: lag() does not actually read the value from the previous observation. It stores the current value in a FIFO queue of size 1; each call returns the value stored by the previous call and puts the new value into the queue. That is why it should be called unconditionally on every observation, as in the code above.
More info: LAG Function
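To make the numbering logic explicit, here is a minimal Python sketch of the same "compare with the previous row" idea. It assumes the rows are already ordered by patient and then by date, as in the question; the function name `number_encounters` and the tuple layout are made up for this illustration, and the dates are kept as the original strings.

```python
# (patient_num, ordering_date, medication), already sorted by patient and date.
rows = [
    (1111, "07NOV2008", "Wellbutrin"),
    (1111, "11MAY2011", "Bupropion"),
    (2222, "08DEC2009", "Amitriptyline"),
    (2222, "08DEC2009", "Escitalopram"),
    (3333, "12APR2007", "Imipramine"),
    (3333, "21DEC2008", "Zoloft"),
    (3333, "02FEB2009", "Fluoxetine"),
]


def number_encounters(sorted_rows):
    """Append an encounter counter that restarts at 1 for every patient."""
    numbered = []
    prev_patient = prev_date = None
    enc = 0
    for patient, date, medication in sorted_rows:
        if patient != prev_patient:
            enc = 1          # first row of a new patient
        elif date != prev_date:
            enc += 1         # same patient, new ordering date
        # same patient and same date: keep the current counter
        numbered.append((patient, date, medication, enc))
        prev_patient, prev_date = patient, date
    return numbered


for row in number_encounters(rows):
    print(*row)
```

Patient 2222 gets the same counter for both prescriptions because only a change of date (not every row) increments it. This treats one ordering date as one encounter, which is the same simplification the SAS code makes.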
I'm not sure if this hurts the rest of your analysis, but what about just
proc freq data=pt_meds noprint;
tables patient_num*ordering_date / out=pt_meds_freq;
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;

Display Row number using UNIX command

Which Unix command returns the row number for every record in a file? Below is the requirement.
id name salary
10 a 1000
20 b 2000
30 c 3000
But I want output like this.
Row_id id name salary
1 10 a 1000
2 20 b 2000
3 30 c 3000
Thanks for your effort in advance.
Try:
nl file ## numbers the lines (by default, empty lines are not numbered)
or
cat -n file ## numbers every line, including empty ones
or
awk '{ print FNR " " $0 }' file
This also prints the line number in front of each line.
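Note that the commands above number the header line as well, while the requested output labels it Row_id and starts counting at the first data row. A minimal Python sketch of that variant follows; the function name `number_rows` is hypothetical, and the sample is read from an in-memory string instead of a real file.

```python
import io


def number_rows(lines, header_label="Row_id"):
    """Prefix the header with a label and every following line with its row number."""
    numbered = []
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        # Line 0 is the header; data rows are numbered starting from 1.
        prefix = header_label if index == 0 else str(index)
        numbered.append(f"{prefix} {text}")
    return numbered


sample = io.StringIO("id name salary\n10 a 1000\n20 b 2000\n30 c 3000\n")
numbered = number_rows(sample)
print("\n".join(numbered))
```

With a real file you would pass an open file object instead of the `io.StringIO` sample.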

Make a file counting instances in sets of 5

I have a file that looks like this:
1 rs531842 503939 61733 G A
1 rs10494103 35025 114771 C T
1 rs17038458 254490 21116837 G A
1 rs616378 525783 21127670 T C
1 rs3845293 432526 21199392 A C
2 rs16840461 233620 157112959 A G
2 rs1560628 224228 157113214 T C
2 rs17200880 269314 257145829 C T
2 rs10497165 35844 357156412 C T
2 rs7607531 624696 457156575 T C
...with column 1 stretching on to 22, and several thousand entries in total.
I want to create a file that lists bins of 5 million from column 4 which have data, separating by column 1.
Basically, all but columns 1 and 4 can be discarded. A simple input would look like this:
InputChr1:
61733
114771
21116837
21127670
21199392
InputChr2:
157112959
157113214
257145829
357156412
457156575
So, for the example above, I would want to get two files that look like this:
OutputChr1.txt
Start End Occurrences
1 5000000 2
20000001 25000000 3
OutputChr2.txt
Start End Occurrences
155000001 160000000 2
255000001 260000000 1
355000001 360000000 1
455000001 460000000 1
Any ideas? It seems like something that should be doable with lapply in R, but I can't get the for loops to work...
EDIT: Actually, I made this look much harder than it needed to be - basically, I want to split the original file by column 1, extract the data in column 4, and then count the instances in bins of 5 million.
(Apologies for slightly random tags, just trying to think of which tools might be best!)
Well, this turned out to be very challenging. I couldn't find a way to do it with a single awk command, though.
awk -v const=5000000 -v max=150 '
{a[$1,int($4/const)]++; b[$1]}
END{for (i in b)
{for (j=0; j<max; j++)
print i, j*const +1, (j+1)*const, a[i,j]
}
}' file
And then to get only the results:
awk 'NF==4'
Explanation
-v const=5000000 -v max=150 sets the variables. const is the bin size of 5 million. max is the number of bins that the END block loops over (150 bins cover positions up to 750,000,000).
a[$1,int($4/const)]++ builds an array indexed by (1st field, bin number). int($4/const) maps the 4th field to its bin: 23432 --> 0, 6000000 --> 1, etc.
b[$1] keeps track of the first-column values that have been seen.
END{for (i in b) {for (j=0; j<max; j++) print i, j*const +1, (j+1)*const, a[i,j]}} prints the values. Bins without data are printed with only three fields, because a[i,j] is empty.
awk 'NF==4' prints only the lines that have 4 fields, that is, the bins in which there was at least one match.
In case you want to store the values into a new file, you can do
awk 'NF==4 {print > ("OutputChr" $1 ".txt")}'
Sample output
$ awk -v const=5000000 -v max=150 '{a[$1,int($4/const)]++; b[$1]} END{for (i in b) {for (j=0; j<max; j++) print i, j*const +1, (j+1)*const, a[i,j]}}' a | awk 'NF==4'
1 1 5000000 2
1 20000001 25000000 3
2 155000001 160000000 2
2 255000001 260000000 1
2 355000001 360000000 1
2 455000001 460000000 1
All in one
awk '{ v=int($4/const)
a[$1 FS v]++
if (!($1 in min) || v<min[$1]) min[$1]=v # smallest bin index seen for group $1
if (!($1 in max) || v>max[$1]) max[$1]=v # largest bin index seen for group $1
}END{ for (i in min)
for (j=min[i];j<=max[i];j++) # loop over the bins between the min and max index
if (a[i FS j]!="") print j*const+1,(j+1)*const,a[i FS j] > ("OutputChr" i ".txt") # if the bin has data, print it to "OutputChr" i ".txt"
}' const=5000000 file
result:
$ cat OutputChr1.txt
1 5000000 2
20000001 25000000 3
$ cat OutputChr2.txt
155000001 160000000 2
255000001 260000000 1
355000001 360000000 1
455000001 460000000 1
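Since the question also asked about other tools, here is a minimal Python sketch of the same binning, under a few assumptions: the input is whitespace-separated with the chromosome in column 1 and the position in column 4, and the names `BIN_SIZE` and `count_bins` are invented for this example. It uses `(position - 1) // BIN_SIZE` so that a position of exactly 5000000 lands in the bin 1 to 5000000, a boundary case where `int($4/const)` would pick the next bin. The sketch prints the counts instead of writing the OutputChr files.

```python
from collections import Counter

BIN_SIZE = 5_000_000  # bin width from the question

sample = """\
1 rs531842 503939 61733 G A
1 rs10494103 35025 114771 C T
1 rs17038458 254490 21116837 G A
1 rs616378 525783 21127670 T C
1 rs3845293 432526 21199392 A C
2 rs16840461 233620 157112959 A G
2 rs1560628 224228 157113214 T C
2 rs17200880 269314 257145829 C T
2 rs10497165 35844 357156412 C T
2 rs7607531 624696 457156575 T C
"""


def count_bins(lines, bin_size=BIN_SIZE):
    """Count positions per (chromosome, bin index); only non-empty bins are stored."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, position = fields[0], int(fields[3])
        counts[(chrom, (position - 1) // bin_size)] += 1
    return counts


counts = count_bins(sample.splitlines())
for (chrom, index), n in sorted(counts.items()):
    print(chrom, index * BIN_SIZE + 1, (index + 1) * BIN_SIZE, n)
```

Because only bins that actually occur are stored, there is no need for an upper limit such as max=150 or for a second filtering pass.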
