I have created and implemented a predictive model that provides us with the probability that an account will convert. We also have the expected value of each account.
Our team members are graded on two parts:
(Accounts Converted)/(Total Accounts Assigned to Member)
And
(Value of Accounts converted)/(Total Value of all assigned accounts)
The average of the two is used to Grade each team member. So converting accounts is not always the best idea, you want to convert big accounts that are worth more $$.
The Question:
If a team member makes 200 calls a day, which accounts should he/she be working on to maximize their grade?
Since I have the probability that each account will convert, I would like to run a simulation to design a strategy to optimize the team members efforts and optimize their Grade. I am not sure where to start or how to design the simulation.
Would a Monte-Carlo Simulation work for this particular problem?
I would usually provide my attempt, but I am not sure where to start here.
Since the simulation is entirely based on probabilities(is not empirical), you dont need montecarlo, you can simply compute the expected value and assume an error in your predictions.
Related
In a survey participants are supposed to solve computational tasks and their payment should differ by their rank. How can I compare the scores of the participants within a survey/during a session to determine who has done best, etc.? Is this even possible?
I'm currently building a data-driven attribution model for my company. It relies on Markov Chains, and I have used the package ChannelAttribution in R which is great!
It gave me results at global level: what is the weight of every channel on total conversions and revenue. But now I have to move towards deployment phase in the systems.
To do that, I need that every new transaction happening gets its revenue distributed on channels in the path. My problem is that Markov Model for attribution is not a "predictive" model so there is no output at the transaction level (it basically simulates paths to calculate removal effects so no granular info).
Does anyone have an idea on how to translate the global output of the model into a set of rules that would allow for distributing revenue of new transactions in real time (or approx. real time like everyday)? I guess there can be additional hypothesis or another layer of modelling that could do the trick but I can't put my finger on it.
Any help appreciated!
Thanks,
Baptiste
Let's say I have a strategy with multiple rules that generates multiple orders on the same symbol at the same timestamp. For example, on 2012-05-23 one rule might buy 10 shares of IBM while another rule sells 5 shares of IBM. In production, a reasonable system would use netting and execute one order to buy 5 shares, rather than one order to buy 10 shares and another order to sell 5 shares.
Is there a way to get this behaviour in quantstrat? From my experiments, quantstrat does not do netting, and for example will add transaction fees for both opposing orders as if two separate orders were executed.
If quantstrat cannot net orders then it should still be possible to obtain the desired PnL in backtesting by using a custom TxnFees function. If this is the correct way to go, how would one go about defining a custom function to net the transaction fees?
A 'reasonable system' would likely do no such thing. My experience of simultaneous execution on tick data is basically zero for aggressive orders.
On bar data, yes, internal netting would make sense, and would be handled by a production order management system. Or, for example, internalizing resting internal limit orders against other signals asking for aggressive orders on the other side, or netting positions. Does any investor of non-trivial size use bar data?
That seems to miss the point of what quantstrat is for. You are looking to figure out (in research) some strategy that makes good predictions and evaluate the quality of those predictions by writing a backtest.
Backtests aren't reality.
Further, netting would completely muddle any ability to figure out if your signal process has predictive power.
The account in blotter will net P&L automatically, so it will have the same result as your order netting, in the absence of fees. So I don't think you would need a separate TxnFees function to understand the possible impact of netting, pre-fees.
I'm doing some exploratory work on recommendation systems and have been reading about collaborative filtering techniques involving user-based, item-based, and SVD algorithms. I am also trying out R's recommenderlab package.
One apparent assumption in the literature is that the user data has labelled items based on a rating scale, e.g. between 1 and 5 stars. I'm looking at problems where the user data does not have ratings but rather just transactions. For example, if I want to recommend restaurants to a user, the only data I have is how often he has visited other restaurants.
How can I convert these "transaction" counts into ratings that can be used by recommendation algorithms that expect a fixed-scale rating? One approach I thought of is simple binning:
0 stars = 0-1 visits
1 star = 2-3 visits
...
5 stars = 10+ visits
However, that doesn't seem like it would work well. For example, if someone visited a restaurant only once, he may still really love it.
Any help would be appreciated.
I would try different approaches. As you said, only visited once may indicate that the user still loves the restaurant but you don't know for sure. Your goal is not to optimize for one single user rather for all users. So for this, you can split your data into training and test data. Train on the training data with different scales and test on the test data.
The different scales may be
a binary scale (0:never visited, 1: visited). This is mostly used in online shops (bought or not). Would support your assuption with the one time visit.
your presented scale or other ranges for the 5 stars. You can also use more than 5 stars. I would potentially not group 0-1 visits.
The approach with the best accuracy should be chosen.
Here's an idea: restaurants the user has visited zero or one times tell you nothing about what they like. Restaurants they have visited many times tell you lots. Why not just look for restaurants similar to those the customer most regularly frequents? In this way, you're using positive information (what they like) but none of the negative since you don't have access to it anyway.
If you absolutely had to infer some continuous measure, I think it would only be sensible to look at the propensity for another visit given past behaviour. This would start with the prior probability of choosing that restaurant (background frequency, or just uniform over restaurants) with a likelihood term related to the number of visits to that restaurant. In this way the more a user visits a restaurant the more likely they are to visit again.
I'm looking for a rating system that does not only weight the rating on number of votes, but also time and "activity"
To clarify a bit:
Consider a site where users produce something, like a picture.
There is another type of user that can vote on other peoples pictures (on a scale 1-5), but one picture will only recieve one vote.
The rating a productive user gets is derived from the rating his/hers pictures have recieved, but should be affected by:
How long ago the picture was made
How productive the user has been
A user who's getting 3's and 4's and still making 10 pictures per week should get higher rating than a person that have gotten 5's but only made 1 pic per week and stopped a few month ago.
I've been looking at Bayesian estimate, but that only considers the total amount of votes independent of time or productivity.
My math fu is pretty strong, so all I need is a nudge in right direction and I can probably modify something to fit my needs.
There are many things you could do here.
The obvious approach is to have your measure of the scores decay with time in your internal calculations, for example using an exponential decay with a time constant T. For example, use value = initial_score*exp(-t/T) where t is the time that's passed since picture was submitted. So if T is one month, after one month this score will contribute 1/e, or about 0.37 that it originally did. (You can also do this differentially, btw, with value -= (dt/T)*value, if that's more convenient.)
There's probably a way to work this with a Bayesian approach, but it seems forced to me. Bayesian approaches are generally about predicting something new based on a (usually large) set of prior data, which doesn't directly match your model.