When will the Netflix Prize be won? - netflix

Does anyone have a graph of the best Netflix prize submission by day? I'd like to get a prediction of when it will be solved based on extrapolating the existing progress.
Alternatively, when do you think it will be won, and why?

You can parse the leaderboard, sort it by date, then calculate the daily maximum, etc. Personally, I do not believe such an extrapolation would tell you much, because you are trying to predict the outcome of a complex process with many unknowns. Here's a nice review of the contest from the NYT.
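A minimal Python sketch of those steps, assuming the leaderboard has already been scraped into hypothetical (submission date, percent improvement) pairs; the parsing itself is not shown. It keeps the best score per day, carries the running best forward, and fits a straight line to estimate when the 10% Grand Prize threshold would be crossed. As noted above, a straight line is a crude model of a process with many unknowns, so treat the result as a rough guess at best.

```python
from datetime import date, timedelta

def daily_best(entries):
    """entries: iterable of (date, improvement_percent) pairs, one per submission.
    Returns [(date, best improvement so far)] sorted by date."""
    best_that_day = {}
    for day, score in entries:
        best_that_day[day] = max(score, best_that_day.get(day, float("-inf")))
    series, running = [], float("-inf")
    for day in sorted(best_that_day):
        running = max(running, best_that_day[day])  # the record never goes down
        series.append((day, running))
    return series

def extrapolate_target_date(series, target=10.0):
    """Least-squares line through (days since first entry, best score),
    solved for `target`. Returns None if there is no upward trend."""
    if len(series) < 2:
        return None
    origin = series[0][0]
    xs = [(day - origin).days for day, _ in series]
    ys = [score for _, score in series]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return origin + timedelta(days=round((target - intercept) / slope))

# Illustrative made-up numbers, not real leaderboard data.
entries = [(date(2007, 1, 1), 1.0), (date(2007, 1, 1), 0.5), (date(2007, 1, 11), 2.0)]
series = daily_best(entries)
print(extrapolate_target_date(series))  # 2007-04-01 for this toy input
```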

Contest begins October 2, 2006 and continues through at least October 2, 2011.
See The Rules

Related

Using OptaPlanner to create school time tables with some tricky constraints

I'm going to use OptaPlanner to lay out time tables for a school.
We're laying out the time tables for a full semester and every week could, if necessary, be slightly different.
There are some tricky constraints to take into account:
1. Weekly schedules
The lectures in one subject should be spread out somewhat evenly over the semester.
We can't for example put 20 math lectures the first week and "be done" with math for this semester.
In fact, it's nice to have some weekly predictability:
"Science year 2 have biology on Tuesday mornings"
This constraint must not be set in stone, however. Some weeks have to include work experience sessions, PE excursions, etc., in which case they must deviate from the other weeks.
Problem
If I create a constraint that, say, gives -1soft for not scheduling a subject at the same time as in the previous week, then OptaPlanner will waste a lot of time before it "accidentally" finds a good placement for a lecture. And even if it manages to converge so that each subject is scheduled at the same time every week, it will never manage to move the entire series of lectures by moving them one by one. (That local optimum will never be escaped.)
2. Cross student group subjects
There's a strong correlation between student groups and courses: for example, all students in Science year 2 mostly take the same courses (Chemistry for Science year 2, Biology for Science year 2, ...).
The exception is language courses.
Each student can choose to study French, German or Spanish. So Spanish for year 2 is studied by a cross section of Science year 2 students, and Social Studies year 2 students, etc.
From the experience of previous (manual) scheduling, the optimal solution is almost guaranteed to schedule all language classes in the same time slots. (If French is scheduled at 9 on Thursdays, then German and Spanish can be scheduled "for free" at 9 on Thursdays.)
Problem
There are many time slots in one semester, and the chances that OptaPlanner will discover a solution where all language lectures are scheduled at the same time by randomly moving individual lectures is small.
Also, similarly to problem 1: if OptaPlanner does manage to schedule French, German and Spanish at the same time, these "blocks" will never be moved elsewhere, since they are individual lectures and the chance that all of them will "randomly" move to the same new slot is tiny, even with a large Tabu history length and so on.
My thoughts so far
As for problem 1 ("Weekly predictability") I'm thinking of doing the following:
In the construction phase for the full-semester-schedule I create a reduced version of the problem, that schedules (a reduced set of lectures) into a single "template week". Let's call it a "single-week-pre-scheduling". This template week is then repeated in the construction of the initial solution of the full semester which is the "real" planning entity.
The local search steps will then only focus on inserting PE excursions etc, and adjusting the schedule for the affected weeks.
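A rough plain-Python sketch of that construction step, only to show the data flow (this is not OptaPlanner API; in OptaPlanner it would have to be wired in as a custom initialization step). The names `Lecture`, `expand_template` and the `blocked` parameter are hypothetical. It repeats a solved template week across the semester and leaves any lecture that collides with a blocked slot (an excursion, say) unplaced, so local search only has to deal with those.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lecture:
    subject: str
    week: int
    slot: Optional[tuple[int, int]]  # (day, period); None means not yet placed

def expand_template(template, weeks, blocked=None):
    """template: subject -> list of (day, period) slots from the solved template week.
    blocked: week -> set of (day, period) slots unavailable that week.
    Returns one Lecture per template slot per week."""
    blocked = blocked or {}
    lectures = []
    for week in range(weeks):
        unavailable = blocked.get(week, set())
        for subject, slots in template.items():
            for slot in slots:
                # Copy the template slot unless it clashes with that week's blocked slots.
                placed = None if slot in unavailable else slot
                lectures.append(Lecture(subject, week, placed))
    return lectures

template = {"Biology": [(1, 0)], "Math": [(0, 0), (2, 1)]}
semester = expand_template(template, weeks=2, blocked={1: {(1, 0)}})
print([lec for lec in semester if lec.slot is None])
# [Lecture(subject='Biology', week=1, slot=None)]
```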
As for problem 2, I'm thinking that the solution to problem 1 might solve it too. In a one-week schedule, it seems reasonable to assume that OptaPlanner will realize that language classes should be scheduled at the same time.
Regarding the local optimum settled by the single-week-pre-scheduling ("Biology is scheduled on Tuesday mornings"), I imagine that I could create a custom move operation that "bundles" these lectures into a single move. I have no idea how simple this is. I would really like to keep the code as simple as possible.
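To illustrate the "bundle" idea, here is a plain-Python sketch (not the OptaPlanner Move API) of a move that relocates a whole group of lectures in one step and can be undone, which is what a local search step needs. `Lecture` and `bundle_move` are hypothetical names; the same function covers a subject's weekly series or a block of language lectures, depending on the group predicate passed in.

```python
from dataclasses import dataclass

@dataclass
class Lecture:
    subject: str
    week: int
    slot: tuple[int, int]  # (day, period) within its own week

def bundle_move(lectures, in_group, new_slot):
    """Move every lecture for which in_group(lecture) is true to new_slot
    (each within its own week). Returns a function that undoes the move."""
    previous = [(lec, lec.slot) for lec in lectures if in_group(lec)]
    for lec, _ in previous:
        lec.slot = new_slot
    def undo():
        for lec, old_slot in previous:
            lec.slot = old_slot
    return undo

lectures = [Lecture("Biology", w, (1, 0)) for w in range(3)]
lectures += [Lecture(lang, 0, (3, 0)) for lang in ("French", "German", "Spanish")]

# Shift the whole Biology series at once...
undo_biology = bundle_move(lectures, lambda lec: lec.subject == "Biology", (2, 1))
# ...or move all language lectures together so they stay in one shared slot.
bundle_move(lectures, lambda lec: lec.subject in {"French", "German", "Spanish"}, (4, 2))
```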
Questions
Are my thoughts reasonable? Is there a more clever way to approach these problems? If I have to create custom moves anyways, perhaps I don't need to construct a template-week?
Is there a way to assign hints or weights to moves? If so, I could perhaps generate moves with slightly larger weight that adjusts scheduling to adhere to predictable weeks and language scheduled in the same time slots.
A question well asked!
With regards to your first problem, I suggest you take a look at OptaWeb Employee Rostering and the concept of rotations. A rotation is "how things generally are", and Planner then has the freedom to diverge from the rotation at a penalty. Once you understand the concept of the rotation from the UI, take a look at the planning entity Shift and how the rotation is implemented with the employee and rotationEmployee variables. Note that only employee is an actual @PlanningVariable; rotationEmployee is fixed.
That means that you have to define your rotations manually, thereby doing the solver's work yourself. However, since I assume this operation is only done once a semester, maybe the solution could be to have a simpler solver generate a reasonable general rotation first, and then have a second solver take it and figure out the specific necessary adjustments?
With regards to your second problem, rotations could help there too. But maybe some move filtering and custom moves could help OptaPlanner move either all language classes or none? Writing efficient custom moves is not easy, and filtering stock moves is cumbersome, so I would only do it once the other options are exhausted. If you end up doing this, look at MoveIteratorFactory.
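A minimal plain-Python sketch of the rotation idea, by analogy with the description of Shift above (it does not reproduce OptaWeb's code or OptaPlanner's API; all names are hypothetical). Each lecture carries a fixed `rotation_slot`, playing the role of rotationEmployee and never planned, and a planned `slot`, playing the role of employee. The soft score charges -1 per lecture that deviates from its rotation. Because each lecture is compared against a fixed reference rather than against the previous week, moving a single lecture back onto its rotation slot is rewarded on its own.

```python
from dataclasses import dataclass

@dataclass
class Lecture:
    subject: str
    week: int
    rotation_slot: tuple[int, int]  # fixed "how it usually is", like rotationEmployee
    slot: tuple[int, int]           # the planned value, like employee

def rotation_soft_score(lectures):
    """-1 soft for every lecture that is not in its rotation slot."""
    return -sum(1 for lec in lectures if lec.slot != lec.rotation_slot)

usual = (1, 0)  # e.g. Tuesday, first period
lectures = [Lecture("Biology", w, usual, usual) for w in range(4)]
lectures[2].slot = (3, 2)  # week 2 deviates, e.g. because of a PE excursion
print(rotation_soft_score(lectures))  # -1
```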
My answer is a little vague, as we do not get into the specifics of the domain model, but for the purposes of designing the overall solution, it hopefully gives enough clues.

Find the equation to calculate Daily, Weekly, and Monthly rental costs

I have recently taken over a web development project for a local car rental company and need help finding out how to calculate the Daily, Weekly, and Monthly cost of a vehicle.
The previous developer used a plugin that allowed you to create "Pricing Schemes" where you define a day range and its price:
19.99/day, 99.99/week, 299.99/month:
Day 1-5 = $19.99
Day 5-6 = $16.665
Day 6-7 = $14.284
Day 7-8 = $14.9975
and so on...
Sadly, the developer left no notes on how he got these numbers, and each pricing scheme he made only extends to the 31st day, which causes an issue when a user wants to rent a car for longer than a month (which is common).
What I need help finding out is the equation he used to get these numbers so I can add on to the pricing schemes and create others if the need arises. I will add a screenshot of a full pricing scheme for reference below.
Any help with this would be greatly appreciated and I will be available to answer any questions if my question is not clear enough. Thank you!
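The four listed values are consistent with one rule: charge the cheapest combination of day, week and month blocks that covers the rental (a block may cover more days than needed, so six days cost one week's price), then divide by the number of days. This reading assumes "Day 5-6" means a 6-day rental, "Day 6-7" a 7-day rental, and so on. Under that reading, 6 days cost $99.99 (99.99 / 6 = 16.665), 7 days cost $99.99 (about 14.284 per day), and 8 days cost $99.99 + $19.99 = $119.98 (119.98 / 8 = 14.9975). The Python sketch below reproduces this; the month length (`MONTH_DAYS = 30`) is a guess, and without the full screenshot the rest of the scheme cannot be checked. Prices are kept in cents to avoid floating-point rounding in the totals.

```python
DAY_CENTS = 1999      # $19.99 per day
WEEK_CENTS = 9999     # $99.99 per week
MONTH_CENTS = 29999   # $299.99 per month
MONTH_DAYS = 30       # assumption: the original scheme's month length is unknown

def total_cents(days: int) -> int:
    """Cheapest total for a rental of `days` days using day/week/month blocks.
    best[n] is the cheapest way to cover n days; a block may overshoot."""
    best = [0] * (days + 1)
    for n in range(1, days + 1):
        best[n] = min(
            best[n - 1] + DAY_CENTS,
            best[max(0, n - 7)] + WEEK_CENTS,
            best[max(0, n - MONTH_DAYS)] + MONTH_CENTS,
        )
    return best[days]

def per_day_rate(days: int) -> float:
    """Effective daily price in dollars, as listed in the pricing scheme."""
    if days < 1:
        raise ValueError("rental must be at least one day")
    return total_cents(days) / days / 100

for days in (5, 6, 7, 8, 31, 45):
    print(days, round(per_day_rate(days), 4))
```

Because the rule is a simple recurrence, it extends past the 31st day without any extra table entries.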

How can I use the occurrences to calculate the end date of a recurring event in eas

Can anyone tell me the best way to calculate the end date of a recurring event from the number of occurrences and the pattern in which the event occurs?
For example:
I have an event with a start date of 10/07/2014 (Tuesday) that occurs every week on Tuesday. This event will end after (say) 10 occurrences. So my method should return the end date as 12/09/2014.
The method should also handle more complex situations, for example an event that occurs yearly on the first Monday of October and has 10 occurrences in total.
(This isn't an answer which gives you a complete solution by any means, but hopefully it's a step in the right direction.)
Good luck. I've worked on an ActiveSync implementation, and recurrent events are fundamentally painful. You'll need to think about all kinds of corner cases - if something occurs every month on the 30th, what happens in February? What happens if it happens at 1.30am, and the clocks go forward or backward in the event's time zone so that 1.30am happens 0 or 2 times for a particular day?
Noda Time can help with this, but it doesn't provide a complete solution, partly because all the requirements will vary so much.
The important types you'll need to know about are LocalDate and LocalDateTime to provide time-zone-neutral dates/times, and Period which represents a not-necessarily-fixed period of time, such as "1 month". That will help with things like "add a week" - and there are methods on LocalDate for things like "next Monday after this date". It gets harder for events which are "weekly, on Monday and Wednesday" - you'll want to step through the weeks, working out which days occur within a particular week, until you've gone through all the events you need.
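As an illustration of the "step through it" approach, here is a Python sketch using only the standard library's `datetime` (not Noda Time, and with no time-zone handling). `weekly_end_date`, `nth_weekday_of_month` and `yearly_nth_weekday_end_date` are hypothetical helpers; they walk forward occurrence by occurrence rather than computing the end date directly. They assume the start date counts as the first occurrence only if it matches the pattern, and they read the question's dates as month/day/year, which is the reading under which 10/07/2014 is a Tuesday.

```python
from datetime import date, timedelta

def weekly_end_date(start: date, weekdays: set[int], occurrences: int) -> date:
    """Last date of an event repeating every week on `weekdays` (Monday=0),
    counting from `start` inclusive."""
    if occurrences < 1 or not weekdays:
        raise ValueError("need at least one occurrence and one weekday")
    day, count = start, 0
    while True:
        if day.weekday() in weekdays:
            count += 1
            if count == occurrences:
                return day
        day += timedelta(days=1)

def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> date:
    """The n-th given weekday of a month, e.g. the first Monday of October.
    n must be small enough to stay inside the month (not checked here)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def yearly_nth_weekday_end_date(start: date, month: int, weekday: int,
                                n: int, occurrences: int) -> date:
    """Last date of an event on the n-th `weekday` of `month` every year."""
    if occurrences < 1:
        raise ValueError("need at least one occurrence")
    year, count = start.year, 0
    while True:
        candidate = nth_weekday_of_month(year, month, weekday, n)
        if candidate >= start:
            count += 1
            if count == occurrences:
                return candidate
        year += 1

print(weekly_end_date(date(2014, 10, 7), {1}, 10))                    # 2014-12-09
print(yearly_nth_weekday_end_date(date(2014, 10, 1), 10, 0, 1, 10))  # 2023-10-02
```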
Noda Time 2.0 has the concept of "adjusters" which will make life somewhat simpler for things like "the first Monday of October" but everything you need to do can be done with Noda Time 1.3. (Don't wait for Noda Time 2.0, which I wouldn't expect to be released for another 6 months at least.)
I think my strongest pieces of advice would be:
Keep it simple. Focus on getting the right results first, then work out any optimizations you need. (For example, don't try to "guess" when the 100th instance of an event will occur - stepping through 100 instances with simple steps will be slower, but get you to the right answer. Do measure the performance, but make sure you have good tests before you optimize.)
Introduce your own types to represent exactly what you know about the event. Use the Noda Time types where they match, of course, but don't be tempted to use an existing type just because it's quite like what you're trying to represent. The small differences will bite you eventually.
Make sure you know what you actually want the results to be. Write lots of tests. Date and time work is a naturally data-oriented domain, so invest in making it as easy as possible to write tests for all the corner cases you should be thinking of. (And you really should be thinking about them. Pay particular attention to leap years and time zones.)
Be aware that time arithmetic doesn't follow the normal rules of arithmetic: x + 1 month + 1 month isn't always the same as x + 2 months.
If/when behaviour surprises you, do come back to ask specific questions here. There aren't very many of us working on Noda Time, but questions tend to be answered quickly :)
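To illustrate the month-arithmetic point from the list above, here is a small Python sketch of month addition that clamps to the last day of the target month. This is one common convention, and `add_months` is a hypothetical helper, since Python's standard library has no month arithmetic. Starting from January 31, adding one month twice lands on March 28, while adding two months at once lands on March 31.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

start = date(2014, 1, 31)
print(add_months(add_months(start, 1), 1))  # 2014-03-28
print(add_months(start, 2))                 # 2014-03-31
```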

Rating system that considers time and activity

I'm looking for a rating system that weights the rating not only by the number of votes, but also by time and "activity".
To clarify a bit:
Consider a site where users produce something, like a picture.
There is another type of user that can vote on other people's pictures (on a scale of 1-5), but one picture will only receive one vote.
The rating a productive user gets is derived from the ratings his/her pictures have received, but should be affected by:
How long ago the picture was made
How productive the user has been
A user who's getting 3s and 4s and still making 10 pictures per week should get a higher rating than a person who has gotten 5s but only made 1 picture per week and stopped a few months ago.
I've been looking at Bayesian estimate, but that only considers the total amount of votes independent of time or productivity.
My math-fu is pretty strong, so all I need is a nudge in the right direction and I can probably modify something to fit my needs.
There are many things you could do here.
The obvious approach is to have your measure of the scores decay with time in your internal calculations, for example using an exponential decay with a time constant T. For example, use value = initial_score*exp(-t/T), where t is the time that has passed since the picture was submitted. So if T is one month, after one month this score will contribute 1/e, or about 0.37, of what it originally did. (You can also do this differentially, btw, with value -= (dt/T)*value, if that's more convenient.)
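A minimal Python sketch of that exponential decay, with time measured in days. The `user_rating` function sums the decayed scores rather than averaging them; that choice is an assumption added here, not part of the answer, made because a sum rewards users who keep producing, which is what the question asks for. `decayed`, `user_rating` and `decay_step` are hypothetical names, and the sample users are made up to mirror the question's example.

```python
import math

def decayed(score: float, age: float, T: float) -> float:
    """value = initial_score * exp(-t / T)."""
    return score * math.exp(-age / T)

def user_rating(pictures, now: float, T: float) -> float:
    """pictures: list of (submitted_at, score) pairs, times in days.
    Summing (not averaging) lets frequent recent work outweigh old 5s."""
    return sum(decayed(score, now - t, T) for t, score in pictures)

def decay_step(value: float, dt: float, T: float) -> float:
    """Differential form: value -= (dt / T) * value, accurate for small dt."""
    return value - (dt / T) * value

T = 30.0     # time constant of about one month
now = 200.0  # current time, in days
# 10 pictures per week for the last 4 weeks, alternating 3s and 4s.
prolific = [(now - i * 0.7, 3 + i % 2) for i in range(40)]
# One 5 per week for 20 weeks, stopping 12 weeks (84 days) ago.
lapsed = [(now - 84 - 7 * w, 5) for w in range(20)]
print(round(user_rating(prolific, now, T), 1), round(user_rating(lapsed, now, T), 1))
```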
There's probably a way to work this with a Bayesian approach, but it seems forced to me. Bayesian approaches are generally about predicting something new based on a (usually large) set of prior data, which doesn't directly match your model.

How to mitigate against bandwagon effect (voting behavior) in my ranking system?

By bandwagon effect I mean the following:
Already top-ranked items have a higher tendency to get voted on at all, possibly even to get upvoted.
What I am hoping to get is some concrete recommendations, ideally based on your practical experience with a mathematical formula and the situation in which it helped.
However, any useful pointers are more than welcome!
My ranking system
Please consider a ranking system on a website that has a reputation system, where users can only upvote items, and where the ranking table is reset to start fresh every month.
Every user has one upvote per item within each month, and there is a reward for users who, within a certain month, upvoted an item that made it into the top ranks at the end of that month.
Users are told the following about what increases the weight of their upvote:
1)... the more reputation you have at the time of upvoting
2)... the fewer items you upvote within the current month (including the current upvote)
3)... the fewer upvotes that item already has within the current month before your own upvote
The ranking table is recalculated once a day and is visible to all.
Goal
I'd like to implement part 3) in an effort to correct the ranks of items where one cannot tell whether some users upvoted them just because of the bandwagon effect (those users might hope to gain a "tactical" advantage simply by voting for what they perceive lots of other users have already upvoted).
I also hope this will mitigate the possible use of sock puppets that have managed to attain some reputation but all upvote the same item or group of items.
Question
Is there a (maybe even tested?) mathematical formula that I could apply to the time-ordered list of upvotes for each item to get a coefficient for each of those upvotes, so that their weights are corrected in a sensible fashion?
I'm thinking it has to be some kind of logarithmic function, but I can't quite get a grip on it...
Thank you!
Edit
Zack says: "beyond a certain level of popularity, additional upvotes decrease the probability that something will be displayed"
To clarify further: what I am after is which mathematical approaches are worth trying that would express this decrease in popularity as a mathematical function (i.e., coefficients applied to the weights, see above) in a sensible, balanced manner.
My hope is that someone has practical experience with such approaches, either in a similar situation or in general.
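One way to try the logarithmic idea from the question, offered as an untested sketch rather than an established formula: give the k-th upvote an item receives in a month (k = 0 for the first) the coefficient c_k = 1 / (1 + alpha * ln(1 + k)), and multiply it into the weight that factors 1) and 2) already produce. With alpha = 0 nothing changes; a larger alpha damps later votes more. The total still grows with every extra vote (roughly like n / ln n), so popular items are damped but never penalized outright. `upvote_coefficients`, `damped_item_score` and `alpha` are hypothetical names.

```python
import math

def upvote_coefficients(n_upvotes: int, alpha: float = 1.0) -> list[float]:
    """Coefficient for each upvote in time order: 1 / (1 + alpha * ln(1 + k))."""
    return [1.0 / (1.0 + alpha * math.log1p(k)) for k in range(n_upvotes)]

def damped_item_score(base_weights: list[float], alpha: float = 1.0) -> float:
    """base_weights: time-ordered upvote weights that already include the
    reputation and voter-activity factors; returns the damped item score."""
    coeffs = upvote_coefficients(len(base_weights), alpha)
    return sum(w * c for w, c in zip(base_weights, coeffs))

print([round(c, 3) for c in upvote_coefficients(5)])
print(damped_item_score([1.0] * 10), damped_item_score([1.0] * 100))
```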
Consider applying the "Indie Rock Peter Principle": beyond a certain level of popularity, additional upvotes decrease the probability that something will be displayed.
Term coined by Leonard Richardson in this paper. Indie Rock Peter is of course from Diesel Sweeties.
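The principle itself comes without a formula. One hypothetical shape, sketched in Python: the display probability rises with upvotes up to a `threshold` and then falls off as roughly threshold/upvotes beyond it, with +1 smoothing so items with no votes can still appear. The function names and the exact curve are assumptions for illustration only.

```python
import random

def display_probability(upvotes: int, threshold: int) -> float:
    """Rises to 1.0 at `threshold` upvotes, then decreases with further upvotes."""
    if upvotes <= threshold:
        return (1 + upvotes) / (1 + threshold)
    return (1 + threshold) / (1 + upvotes)

def should_display(upvotes: int, threshold: int, rng: random.Random) -> bool:
    """Randomly decide whether to show an item, using display_probability."""
    return rng.random() < display_probability(upvotes, threshold)

rng = random.Random(42)
for votes in (0, 10, 50, 200):
    print(votes, round(display_probability(votes, threshold=50), 3),
          should_display(votes, 50, rng))
```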
I have always disliked the bandwagon effect in voting systems, especially "most viewed" rankings in which simply clicking on a highly ranked item increases its rank. My solution to this problem, which I have never tested or seen implemented, would be to keep track of how an item was reached (and then voted for), and ignore (or greatly decrease the weight of) votes that came from any sorted-by-ranking page.
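A minimal Python sketch of this (untested) idea, assuming each vote is recorded together with the page it was cast from. The page identifiers in `RANKED_PAGES` and the default `ranked_factor` of 0.1 are hypothetical; setting the factor to 0 ignores such votes entirely, which is the other option mentioned above.

```python
RANKED_PAGES = {"top_rated", "most_viewed"}  # hypothetical sorted-by-ranking pages

def vote_weight(base_weight: float, source_page: str, ranked_factor: float = 0.1) -> float:
    """Down-weight (or, with ranked_factor=0, ignore) votes reached via a ranking page."""
    return base_weight * (ranked_factor if source_page in RANKED_PAGES else 1.0)

def item_score(votes, ranked_factor: float = 0.1) -> float:
    """votes: list of (base_weight, source_page) pairs for one item."""
    return sum(vote_weight(w, page, ranked_factor) for w, page in votes)

votes = [(1.0, "search"), (1.0, "top_rated"), (2.0, "profile")]
print(item_score(votes), item_score(votes, ranked_factor=0.0))
```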

Resources