How do I represent medication alternatives in FHIR? - dstu2-fhir

I would like to know if there is a Resource I could use to represent medication alternatives, such as:
If you are taking Drug A, you should use Drug B or Drug C instead

This is not something that FHIR has implemented yet - it's clinical/medical knowledge. The nearest that FHIR currently has is the Clinical Quality Framework: http://hl7-fhir.github.io/cqif/cqif.html
Warning: It's very much a work in progress, and the link above is to the current build, not a stable release

Related

Firebase Revenue AB testing algorithm

We have run an A/B test at Firebase which has the following results:
I was also building my own Bayesian AB-test suite and was wondering how they came to these conclusions.
What I was doing was querying the data of this test for the Control Group and Variant C:
Control Group: $11943 Revenue from 900 payers of 80491 users.
Variant C: $16487 Revenue from 894 payers of 80224 users.
I based my algorithm on this tool: https://vidogreg.shinyapps.io/bayes-arpu-test/. When I enter these inputs I get the following result:
This tool seems to be much more confident that Variant C is better than the control group than Firebase is. It also seems like the Firebase distributions for revenue per user are skewed, while the Bayesian ARPU tool has very symmetrical distributions.
The code for the Bayesian ARPU tool is available. They used conjugate priors to get to these conclusions based on this paper:
https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf
Can anyone help me out with which results are best?
I have found out what my problem was.
The first problem is that it has to be broken into two steps. As it is a freemium app, most users do not pay. This means that these users do not give extra information for the distribution.
So, we first need to find posterior distributions for the payer percentage. This can be done as explained in the paper I mentioned. In Python, a function for sampling this posterior distribution is:
import numpy as np

def binomial_rvar(successes, total, samples):
    # Beta posterior for the payer rate: Beta(1, 1) prior, 'successes' payers out of 'total' users
    rvar = np.random.beta(1 + successes, 1 + (total - successes), samples)
    return rvar
Secondly, of all payers, we want to get the revenue. The paper also describes how to handle revenue, but it assumes the revenue is exponentially distributed. This is not the case for our app: we have some users that spend insane amounts of money on it. If such a user were in one of the groups, this method would immediately think that group is the best.
What we can do is take the log of the Pareto-distributed samples, which transforms a Pareto distribution into an exponential distribution. We first take the log of each user's revenue and then sum these together, creating the "logsum", and count how many users it came from. We can then use the same approach as the paper. In Python this would be something like this:
def get_exponential_rvars(total_sum, users, samples):
    # Gamma posterior on the exponential rate of the log-revenues
    # (shape = number of payers, rate = 1 + logsum); the reciprocal
    # gives draws of the posterior mean log-revenue per payer
    r_var = 1. / np.random.gamma(users, 1. / (1. + total_sum), samples)
    return r_var
We can now multiply these two r_var arrays, giving the final distribution for the revenue per user in each group.
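For completeness, the whole comparison can be sketched compactly in R (a sketch only: rbeta/rgamma are the base-R conjugate samplers, and the logsum arguments must be computed from the individual payer revenues, which the totals above do not give you):

# Posterior probability that group B beats group A on revenue per user.
# payers/users: counts per group; logsum: sum of log(revenue) over that group's payers.
prob_b_beats_a <- function(payers_a, users_a, logsum_a,
                           payers_b, users_b, logsum_b, n = 1e5) {
  rate_a <- rbeta(n, 1 + payers_a, 1 + users_a - payers_a)          # payer-rate posteriors
  rate_b <- rbeta(n, 1 + payers_b, 1 + users_b - payers_b)
  mean_a <- 1 / rgamma(n, shape = payers_a, rate = 1 + logsum_a)    # mean log-revenue posteriors
  mean_b <- 1 / rgamma(n, shape = payers_b, rate = 1 + logsum_b)
  mean(rate_b * mean_b > rate_a * mean_a)
}

Called with the counts from the question (900 of 80491 vs. 894 of 80224) plus the two logsums, this returns the fraction of posterior draws in which Variant C wins.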

Understanding Event Coincidence Analysis from CoinCalc R package

I have two binarized events (eventA and eventB), and I want to know if there is any coincidence between these two events. So I'll use the new package CoinCalc to investigate the potential relation between the two.
library(CoinCalc) # note that the package is not visible (at least for me) in CRAN. I got it from GitHub: https://github.com/JonatanSiegmund/CoinCalc
# two binary events
eventA= c(0,1,0,0,1,1,0,0,1,1,0,0,1,0,1,0,1,0,1,1,1,1,0,0,0,1,1,1,0,0,1,1,0,1,1,0,1,0,0,0,1,1,0,0,0,1,1,0,1,1,1,1,1,1,0,1,0,0,0,1,1,0,0,0,0,0,1,1,0,0,1,1,1,0,0,1,0,1,1,1,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,1,1,0,0,1,1,0,1,1,0,0,0,1,0,0,0,0,0,1,1,0,1,0,1,1,0,1,0,0,0,1,0,0,1,0,1)
eventB = c(0,1,0,0,0,0,1,0,0,0,1,1,1,1,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,1,0,1,1,0,1,1,1,0,0,1,1,1,0,0,0,1,1,0,1,1,1,1,1,1,0,1,1,1,0,0,1,0,1,1,1,1,1,1,0,0,1,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,0,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,1,0,0,1,0,1,0,1,1,0,0,0)
# run ECA analysis
ca.out <- CC.eca.ts(eventA, eventB, delT = 2, tau = 2)
This yields:
$NH precursor
[1] TRUE

$NH trigger
[1] FALSE

$p-value precursor
[1] 0.2544052

$p-value trigger
[1] 0.003287963

$precursor coincidence rate
[1] 0.8243243

$trigger coincidence rate
[1] 0.9285714
I want to make sure I'm understanding this properly. Based on the results, the null hypothesis can only be rejected for the trigger, which is statistically significant at the 0.003 level, and the coincidence rate is 0.92 (very high; is this equivalent to R2?). Can this be interpreted as eventB having a strong influence on eventA, but not the other way around?
Then I can plot these two events using the CC.plot function:
CC.plot(eventA,eventB,dates=c(1900:2040),delT=2, tau=2, seriesAname = 'EventA', seriesBname = 'EventB')
Which yields:
Is there any way to modify the graphical parameters in CC.plot? The dummy years are not visible in this plot. I'd like to change fonts, sizes, colours, etc. Also, is there any way to draw the same figure by calling the model output (ca.out)?
Thanks in advance!
I'll try to answer your questions:
Question #1: The most important problem that I see in your example is that your events are not "rare". Therefore the most important precondition of the analytical significance test that you used by default (sigtest="poisson") is not fulfilled. Another "problem" is that the events in both series seem to be clustered (which may also be an effect of the high number of events). I would recommend using sigtest="shuffle.surrogate", which is more appropriate for this case. More information about the significance tests can be found in Siegmund et al. 2017 (http://www.sciencedirect.com/science/article/pii/S0098300416305489)
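In code, this is just an extra argument to the same call (using the argument name mentioned above):

ca.out <- CC.eca.ts(eventA, eventB, delT = 2, tau = 2, sigtest = "shuffle.surrogate")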
Executing this reveals that both coincidence rates are not significant. By the way: with such a high number of events, it is extremely unlikely that you would ever get a 'significant' coincidence rate, because the chance that simultaneities occur at random is very, very high.
Nevertheless, if the trigger coincidence rate were significant and the precursor rate were not, your interpretation would be a possible one.
Question #2: The problem with the plot is, again, that there are too many events (compared to what the method was originally designed for). This is why everything looks so messy. The function was meant more as an aid to explain how the method works and what you have done.
If you, for example, only plot 20 years of your data
CC.plot(eventA[120:140], eventB[120:140], dates = c(2020:2040), delT = 2, tau = 2, seriesAname = 'EventA', seriesBname = 'EventB')
you will get a much better image, which, due to the high event density of almost 50%, is still not very pretty.
(image: CoinCalc plot)
For now, there are no options to change the plot parameters. This might come in a future version of the package.
I hope that this helps you a bit!

audio comparison with R

I am working on a project where my task deals with speech/audio/voice comparison. The project is used for judging the winner in mimicry competitions. Practically, I need to capture the user's speech/voice, compare it with the original audio file, and return a percentage match. I need to develop this in the R language.
I have already tried voice-related packages in R (tuneR, audio, seewave), but in my search I was not able to find information related to comparison.
I need some assistance: where can I find information related to this work, what is the best way to handle this type of problem, and what are the prerequisites for processing this kind of audio work?
Basically, the best features to use for speech/voice comparison are MFCCs (mel-frequency cepstral coefficients).
There are some programs that can be used to extract these coefficients: see the Praat website.
You can also try to find a library to extract these coefficients.
[Edit: I've found in the tuneR documentation that it has a function to extract MFCCs - search for the function melfcc()]
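As a minimal sketch in R (the file name is hypothetical, and melfcc()'s default frames-by-coefficients output is assumed):

library(tuneR)

wav  <- readWave("sample.wav")      # hypothetical input file
mfcc <- melfcc(wav, numcep = 12)    # one row of 12 cepstral coefficients per frame
feat <- colMeans(mfcc)              # average over frames into one fixed-length vector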
After you've extracted these features, you can use machine learning (SVM, random forests, or something like that) to develop a classifier.
I have a seminar that I've presented about speaker recognition systems; take a look at it, it may be helpful. (Seminar)
If you have time and interest, you could also read:
Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors.
After you get a feature vector for each audio sample (with MFCCs and/or other features), you'll need to compare pairs of feature vectors (features from A versus features from B):
You could try to use the absolute difference between these feature vectors:
abs(feature vector from A - feature vector from B)
The result of the operation above is a feature vector where every element is >= 0, and it has the same size as the A (or B) feature vector.
You could also test the element-wise multiplication between A and B features:
(A1*B1, A2*B2, ... , An*Bn)
Then you need to label each feature vector
(1 if person A == person B and 0 if person A != person B).
Usually the absolute difference performs better than the multiplication feature vector, but you can concatenate both vectors and test the performance of the classifier using both the abs-diff and the multiplication features at the same time.
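Putting this together, a sketch in R (featA/featB are per-sample feature vectors as above; X and y stand for your own pair data, and SVM is just one classifier choice):

library(e1071)

pair_features <- function(featA, featB) {
  c(abs(featA - featB),   # element-wise absolute difference
    featA * featB)        # element-wise multiplication, concatenated
}

# X: one pair_features() vector per pair of samples (stacked as rows)
# y: 1 if both samples come from the same speaker, 0 otherwise
# fit <- svm(x = X, y = as.factor(y))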

Trying to use the Naive Bayes Learner in R but predict() giving different results than model would suggest

I'm trying to use the Naive Bayes Learner from e1071 to do spam analysis. This is the code I use to set up the model.
library(e1071)
emails=read.csv("emails.csv")
emailstrain=read.csv("emailstrain.csv")
model <- naiveBayes(type ~ ., data = emailstrain)
There are two sets of emails that both have a 'statement' and a 'type'. One is for training and one is for testing. When I run
model
and just read the raw output, it seems to give a higher-than-zero chance of a statement being spam when it is indeed spam, and the same is true when it is not. However, when I try to use the model to predict the testing data with
table(predict(model,emails),emails$type)
I get that
        ham spam
  ham  2086  321
  spam    2    0
which seems wrong. I also tried testing on the training set itself; in this case it should give quite good results, or at least as good as what was observed in the model. However, it gave
        ham spam
  ham  2735  420
  spam    0    6
which is only slightly better than with the testing set. I think something must be wrong with how the predict function is working.
Here is how the data files are set up, with some examples of what's inside:
type,statement
ham,How much did ur hdd casing cost.
ham,Mystery solved! Just opened my email and he's sent me another batch! Isn't he a sweetie
ham,I can't describe how lucky you are that I'm actually awake by noon
spam,This is the 2nd time we have tried to contact u. U have won the £1450 prize to claim just call 09053750005 b4 310303. T&Cs/stop SMS 08718725756. 140ppm
ham,"TODAY is Sorry day.! If ever i was angry with you, if ever i misbehaved or hurt you? plz plz JUST SLAP URSELF Bcoz, Its ur fault, I'm basically GOOD"
ham,Cheers for the card ... Is it that time of year already?
spam,"HOT LIVE FANTASIES call now 08707509020 Just 20p per min NTT Ltd, PO Box 1327 Croydon CR9 5WB 0870..k"
ham,"When people see my msgs, They think Iam addicted to msging... They are wrong, Bcoz They don\'t know that Iam addicted to my sweet Friends..!! BSLVYL"
ham,Ugh hopefully the asus ppl dont randomly do a reformat.
ham,"Haven't seen my facebook, huh? Lol!"
ham,"Mah b, I'll pick it up tomorrow"
ham,Still otside le..u come 2morrow maga..
ham,Do u still have plumbers tape and a wrench we could borrow?
spam,"Dear Voucher Holder, To claim this weeks offer, at you PC please go to http://www.e-tlp.co.uk/reward. Ts&Cs apply."
ham,It vl bcum more difficult..
spam,UR GOING 2 BAHAMAS! CallFREEFONE 08081560665 and speak to a live operator to claim either Bahamas cruise of£2000 CASH 18+only. To opt out txt X to 07786200117
I would really love suggestions. Thank you so much for your help
Actually, the predict function works just fine. Don't get me wrong, but the problem is in what you are doing. You are building the model using this formula: type ~ ., right? It is clear what we have on the left-hand side of the formula, so let's look at the right-hand side.
In your data you have only two variables - type and statement - and because type is the dependent variable, the only thing that counts as an independent variable is statement. So far everything is clear.
Let's take a look at the Bayesian classifier. The a priori probabilities are obvious, right? What about the conditional probabilities? From the classifier's point of view you have only one categorical variable (your sentences), and to the classifier it is only a list of labels. All of them are unique, so the a posteriori probabilities will be close to the a priori probabilities.
In other words, the only thing we can tell when we get a new observation is that the probability of it being spam is equal to the probability of a message being spam in your training set.
If you want to use any machine learning method to work with natural language, you have to pre-process your data first. Depending on your problem, that could mean, for example, stemming, lemmatization, computing n-gram statistics, or tf-idf. Training the classifier is the last step.
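As a minimal sketch of such a pipeline in R with the tm package (the 0.99 sparsity cutoff and the presence/absence binarization are illustrative choices, not the only possible ones):

library(tm)
library(e1071)

corpus <- VCorpus(VectorSource(as.character(emailstrain$statement)))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# stemming would additionally be tm_map(corpus, stemDocument), via SnowballC

dtm <- removeSparseTerms(DocumentTermMatrix(corpus), 0.99)  # drop very rare terms

# naiveBayes works with categorical predictors, so binarize term presence
X <- as.data.frame(as.matrix(dtm) > 0)
X[] <- lapply(X, factor, levels = c(FALSE, TRUE))
model <- naiveBayes(X, as.factor(emailstrain$type))
# the test set must go through the same steps, with
# DocumentTermMatrix(test_corpus, control = list(dictionary = Terms(dtm)))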

Friends selection algorithm

In a .NET project we have a group of 200 people of two types, let's say x and y, who need to be separated into groups of 7 or 8.
We have a web page where people write down the other members they want to be in a group with. Each person builds a list of wanted members.
After this, there should be an algorithm that builds the 7-8 member groups considering the people's ratings, with the following condition: each group has at least 2 people of each type (x/y).
I'm pretty sure there must be a well-known algorithm for something like this, but I didn't find one. Does anyone know how to do it?
This problem smells NP-hard, so I suggest using artificial intelligence tools.
A possible approach is steepest-ascent hill climbing [SAHC].
First, we define our utility function (call it u) as mentioned in the comments to the question: the sum, over all users, of the number of wanted members in that user's group. Let's define u(s) = -1 for an illegal solution s.
Next, we define our 'world': S is the set of all possible solutions.
For each solution s in S we define:
next(s) = {all possibilities of moving one person to a different group}
All we have to do now is run SAHC with random restarts (a concrete sketch follows below):
1.   best <- -INFINITY
2.   while there is more time:
3.       s <- a random legal solution
4.       NEXT <- next(s)
5.       if max{ u(n) : n in NEXT } <= u(s):   // s is the top of the hill
5.1.         if u(s) > best: best <- u(s)      // if s is better than the previous result - store it
5.2.         go to 2                           // restart the hill climbing from a different random point
6.       else:
6.1.         s <- argmax{ u(n) : n in NEXT }   // climb the steepest hill
6.2.         go to 4
7.   return best                               // when out of time, return the best solution found so far
It is an anytime algorithm, meaning it will give a better result the more time you let it run, and eventually [at time infinity] it will find the optimal result.
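For illustration, here is a sketch in R of the utility function u and one steepest-ascent step (the data structures assign, wants, and type are assumptions; a .NET implementation would have the same shape):

# assign: group id per person; wants: list where wants[[i]] holds the indices
# person i wants; type: vector of "x"/"y" per person
u <- function(assign, wants, type) {
  for (g in unique(assign)) {
    members <- which(assign == g)
    if (length(members) < 7 || length(members) > 8) return(-1)   # illegal: group size
    if (sum(type[members] == "x") < 2 ||
        sum(type[members] == "y") < 2) return(-1)                # illegal: type mix
  }
  score <- 0
  for (i in seq_along(assign))   # wanted members sharing person i's group
    score <- score + sum(wants[[i]] %in% which(assign == assign[i]))
  score
}

# one SAHC step: the best single-person move, i.e. the max over next(s)
steepest_step <- function(assign, wants, type) {
  best_s <- assign
  best_u <- u(assign, wants, type)
  for (i in seq_along(assign)) for (g in unique(assign)) {
    if (g == assign[i]) next
    cand <- assign
    cand[i] <- g
    cu <- u(cand, wants, type)
    if (cu > best_u) { best_u <- cu; best_s <- cand }
  }
  best_s   # equal to assign when s is already the top of the hill
}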
