I am working on an optimization problem. The objective is to determine the optimal number of target hires (COUNT below) to set per recruitment sourcing channel so as to yield the lowest number of attritors (ATTRITORS below) at the end of each month at the minimum sourcing cost (COST below). The constraints are that the total COUNT should equal 600 and that all channels should be utilized.
Given the two objective functions, is this something that can be accomplished using R, Solver, or any open source tool?
I tried formatting some dummy data, and it looks something like this:
[image of the dummy data not available]
Thanks!
A simple way of dealing with this problem is to quantify the attrition cost. You already have the cost per hire; in the same way, use your domain knowledge and conversations with the business to come up with a per-employee attrition cost (call this the first model). The per-employee attrition cost will most likely differ by skill level, so for a better approximation you may want to calculate it as an average for each channel (call this the second model).
Once you have a per-employee attrition cost that is the same for every channel (first model), you can simply add per-employee attrition cost * total attrition to the objective function. In the second model you do the same thing with the channel dimension added: per-employee attrition cost for a channel * total attrition for that channel. Depending on the business interpretation, people also go one level further: factor * per-employee attrition cost for a channel * total attrition for that channel, where the factor adjusts the relative importance of hiring cost and attrition cost (though I would expect the costs alone to handle this).
You can do this in Excel Solver, choose a package from https://cran.r-project.org/web/views/Optimization.html, or go for commercial solvers such as Gurobi or CPLEX, which have APIs for R and Python.
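To make the second model concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the channel names, the costs, the attrition rates, the single `attrition_cost` figure, and in particular the per-channel `capacity`, which is not in the question. Some cap (or a nonlinear attrition curve) is needed, because a purely linear objective would otherwise put every hire beyond the minimum into the single cheapest channel. With a linear combined cost, filling the cheapest channels first solves this small problem; a real solver is the better choice once you add more constraints.

```python
def allocate_hires(channels, total_hires, attrition_cost, min_per_channel=1):
    """Allocate hires across channels, cheapest combined cost first.

    channels: dict name -> (cost_per_hire, attrition_rate, capacity)
    All figures are hypothetical; capacity is assumed >= min_per_channel.
    """
    # Combined per-hire cost: sourcing cost plus expected attrition cost.
    unit_cost = {
        name: cost + attrition_cost * rate
        for name, (cost, rate, _cap) in channels.items()
    }
    # Every channel must be used, so start each one at the minimum.
    plan = {name: min_per_channel for name in channels}
    remaining = total_hires - min_per_channel * len(channels)
    if remaining < 0:
        raise ValueError("total_hires is too small to use every channel")
    # Fill the cheapest channels first, up to each channel's capacity.
    for name in sorted(channels, key=unit_cost.get):
        room = channels[name][2] - plan[name]
        extra = min(room, remaining)
        plan[name] += extra
        remaining -= extra
    if remaining > 0:
        raise ValueError("channel capacities cannot absorb total_hires")
    total_cost = sum(unit_cost[name] * plan[name] for name in plan)
    return plan, total_cost


# Hypothetical data: (cost per hire, attrition rate, capacity).
channels = {
    "referral": (500.0, 0.10, 200),
    "job_board": (300.0, 0.25, 400),
    "agency": (900.0, 0.15, 300),
}
plan, total_cost = allocate_hires(channels, total_hires=600, attrition_cost=2000.0)
print(plan, total_cost)
```

Changing `attrition_cost` (or multiplying it by the importance factor mentioned above) shifts the allocation toward channels with lower attrition.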
I am using Looker Studio to visualize my data and I need to create a table with a monthly granularity that shows the month-over-month (MoM) difference in a metric called "metric". The MoM difference should be calculated as the current month's value minus the previous month's value. Is there a way to achieve this directly in Looker Studio without creating a new BigQuery table?
Example table:
| year-month | metric | metric MoM |
|---|---|---|
| Jan 2023 | 10 | {calculated value} |
| Feb 2023 | 8 | -2 |
I have tried using the "Comparison Date Range" feature, but it doesn't seem to work in a table and also it compares the current year with a previous one which is not a solution I am looking for.
I was trying something similar and found that one way to achieve it is with blends. Create a blend with the month as the dimension and your metric as the metric. Join it with a second table that has the same structure but, instead of the month, uses the previous month, via a calculated dimension like
MONTH(date) - 1
Perform the join on the month dimension of both tables. The resulting blended table contains the month, the metric for this month, and the metric for the previous month. Now you can easily calculate the difference.
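The logic of that blend can be sketched outside Looker Studio. The Python below is only an illustration of the join, not Looker Studio syntax, and the function and variable names are made up. One assumption differs from the formula above: the key is a (year, month) pair, because a bare month number minus 1 gives 0 for January and would not match December.

```python
def month_over_month(monthly):
    """monthly: dict mapping (year, month) -> metric value."""

    def previous(year, month):
        # Wrap January back to December of the prior year.
        return (year - 1, 12) if month == 1 else (year, month - 1)

    rows = []
    for (year, month), value in sorted(monthly.items()):
        # Left join: months without a predecessor get no difference.
        prev_value = monthly.get(previous(year, month))
        diff = None if prev_value is None else value - prev_value
        rows.append(((year, month), value, diff))
    return rows


data = {(2023, 1): 10, (2023, 2): 8}
rows = month_over_month(data)
for row in rows:
    print(row)
```

With the example data from the question, February comes out as 8 - 10 = -2, and January has no difference because there is no earlier month to join against.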
I have created and implemented a predictive model that provides us with the probability that an account will convert. We also have the expected value of each account.
Our team members are graded on two parts:
(Accounts Converted)/(Total Accounts Assigned to Member)
And
(Value of Accounts converted)/(Total Value of all assigned accounts)
The average of the two is used to Grade each team member. So converting accounts is not always the best idea, you want to convert big accounts that are worth more $$.
The Question:
If a team member makes 200 calls a day, which accounts should he/she be working on to maximize their grade?
Since I have the probability that each account will convert, I would like to run a simulation to design a strategy that optimizes the team members' efforts and maximizes their Grade. I am not sure where to start or how to design the simulation.
Would a Monte-Carlo Simulation work for this particular problem?
I would usually provide my attempt, but I am not sure where to start here.
Since the simulation is entirely based on probabilities (it is not empirical), you don't need Monte Carlo; you can simply compute the expected value and allow for some error in your predictions.
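As a rough sketch of that expected-value calculation in Python: the grade is the average of the two ratios, so working account i adds an expected 0.5 * p_i * (1/N + v_i/V) to the grade, where N is the number of assigned accounts and V their total value. This rests on assumptions that are not in the question: one call per account, an account converts only if it is worked, and the model's probabilities are taken at face value. The account data and the name `rank_accounts` are invented for illustration.

```python
def rank_accounts(accounts, calls_per_day):
    """accounts: list of (account_id, conversion_probability, value)."""
    n = len(accounts)
    total_value = sum(value for _id, _p, value in accounts)

    def expected_gain(account):
        # Expected contribution to the grade if this account is worked.
        _id, p, value = account
        return 0.5 * p * (1.0 / n + value / total_value)

    # Work the accounts with the largest expected contribution first.
    ranked = sorted(accounts, key=expected_gain, reverse=True)
    chosen = ranked[:calls_per_day]
    expected_grade = sum(expected_gain(a) for a in chosen)
    return [a[0] for a in chosen], expected_grade


# Hypothetical accounts: (id, probability of converting, value).
accounts = [
    ("A", 0.9, 100.0),
    ("B", 0.2, 1000.0),
    ("C", 0.5, 400.0),
    ("D", 0.1, 500.0),
]
chosen, expected_grade = rank_accounts(accounts, calls_per_day=2)
print(chosen, expected_grade)
```

To allow for error in the predictions, you could rerun the ranking with the probabilities shifted up and down and see whether the chosen set changes.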
I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform which rates hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
hotel can answer via comment and it stores the date of it
the platform stores the total number of each rating level (i.e.: all rates with 1 point, all rates with 2 point etc.)
platform stores information of the user: sex, name, total number of votes he/she made and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which one is additive/semi additive/non-additive)
I realized my example is kind of difficult, because it’s hard to decide if it belongs to a fact table or dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate them with all dimensions
User comment -> semi-additive? – because I can aggregate them with the date dimension (I don't know if my argument is correct, but I know I would have new comments every day, which for me is a reason to store them in a fact table)
Answer as comment -> same handling as the user comments
Date of comment-> dimension
Total Number of all votes (1/2/3/4/5) -> semi-additive facts – it makes no sense to aggregate them, since they are already totals, but I could get the average
User information sex and name, address -> user-dimension
User Information: total number of votes -> could be a dimension or a fact. It depends on how often it changes. If it changes often, I store it in a fact. If it's not that often, then in a dimension
I still have questions; I hope someone can help me:
1st Question: should I create two date dimensions, or can I store both pieces of information in one date dimension?
2nd Question: each user and hotel has just one address. Are there arguments for separating the address dimension into its own hierarchy? Can I create a 1:1 relationship between a user dimension and an address dimension?
Your model looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User Information: total number of votes: it is factual information about a user (dimension) and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (i.e. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers each dimension to have its own set of address columns (but modelled as consistently as possible).
I'm looking for a rating system that weights the rating not only on the number of votes, but also on time and "activity".
To clarify a bit:
Consider a site where users produce something, like a picture.
There is another type of user that can vote on other people's pictures (on a scale of 1-5), but one picture will only receive one vote.
The rating a productive user gets is derived from the ratings his/her pictures have received, but should be affected by:
How long ago the picture was made
How productive the user has been
A user who's getting 3's and 4's and still making 10 pictures per week should get a higher rating than a person who got 5's but only made 1 picture per week and stopped a few months ago.
I've been looking at the Bayesian estimate, but that only considers the total number of votes, independent of time or productivity.
My math fu is pretty strong, so all I need is a nudge in right direction and I can probably modify something to fit my needs.
There are many things you could do here.
The obvious approach is to have your measure of the scores decay with time in your internal calculations, for example using an exponential decay with a time constant T. For example, use value = initial_score*exp(-t/T), where t is the time that has passed since the picture was submitted. So if T is one month, after one month this score will contribute 1/e, or about 0.37, of what it originally did. (You can also do this differentially, btw, with value -= (dt/T)*value, if that's more convenient.)
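A minimal Python sketch of the decay, under one assumption that goes beyond the formula above: the decayed scores are summed rather than averaged, so that a steady stream of recent pictures counts for more than a single old one. The function name, the time unit (days), and the sample votes are made up for illustration.

```python
import math


def decayed_rating(votes, now, time_constant):
    """Sum of scores, each decayed by exp(-age / time_constant).

    votes: list of (score, submitted_at) pairs; times are in days.
    """
    return sum(
        score * math.exp(-(now - submitted_at) / time_constant)
        for score, submitted_at in votes
    )


now = 100.0
T = 30.0  # time constant of roughly one month

# Ten pictures scored 3 over the last ten days vs. one 5 from 90 days ago.
active = [(3, now - days_ago) for days_ago in range(10)]
idle = [(5, now - 90)]

print(decayed_rating(active, now, T), decayed_rating(idle, now, T))
```

Summing is only one choice; dividing by a decayed vote count instead would give a time-weighted average that ignores productivity.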
There's probably a way to work this with a Bayesian approach, but it seems forced to me. Bayesian approaches are generally about predicting something new based on a (usually large) set of prior data, which doesn't directly match your model.
Sorry the question title isn't very clear, this is a challenging question to ask without providing a more concrete example. Consider the following scenario:
I have a number of friends whose birthdays are coming up on dates (d1..dn), and I've managed to come up with a number of gifts I'd like to purchase them of cost (c1..cn). Unfortunately, I only have a fixed amount of money (m) that I can save per day towards purchasing these gifts. The question I'd like to ask is:
What is the ideal distribution of savings per gift (mi, where the sum of mi from 1..n == m) in order to minimize the aggregate deviance between my friends' birthdays and the dates on which I'll have saved enough money to purchase each gift?
What I'm looking for is either a solution to this problem, or a mapping to a solved problem that I can utilize to deterministically answer this question. Thanks for pondering it, and let me know if I can provide any additional clarification!
I think you've stated a form of a knapsack problem with some additional complications - the knapsack problem is NP-Complete (p 247, Garey and Johnson). The basic knapsack problem is where you have a number of objects each with a volume and a value - you want to fill a knapsack of fixed volume with the objects to maximize the value without exceeding the knapsack capacity.
The fact that you have stages (days) and resources (money), and that the resources change by day while you decide what purchases to make, leads me toward a dynamic programming solution technique rather than a straight optimization model.
Could you clarify in comments "minimizing the deviance"? I'm not sure I understand that part.
BTW, mathoverflow.com is probably not helpful for this. If you look at algorithm questions, 50 on stackoverflow and 50 on mathoverflow, you'll find the questions (and answers) on stackoverflow have a lot more in common with the problem you are considering. There is a new site called OR Exchange, but there's not a lot of traffic there yet.