I'm using the "Users Points Voting API" module to combine the "User Points" module and the "Fivestar" module.
It works, but not as expected.
When a user rates a post, the author gains N points.
If a user removes their vote, the author loses those N points.
N is a fixed number that I set in the module's settings. It is not related to the number of stars: the points depend on the number of votes an author receives, not on how good or bad those votes are (1, 2, ... or 5 stars).
Can anybody confirm what I've written? It seems the module should work differently, because it allows you to specify positive and negative votes. Maybe it has not been fully developed for the Fivestar module.
thanks
All votes in the Fivestar module are positive, in the sense that 1 star is above zero. So the problem is not so much the integration itself as the fact that the two modules don't mix that well. User Points reacts to positive/negative votes, but since all Fivestar votes are positive, a 1-star vote gives the same result as a 5-star vote.
You could argue this is a bad idea, but this is how it was designed.
I am working on a case study to finish my (not so advanced) data science course, and I have already been helped a lot by topics here, thanks!
Unfortunately now I am stuck again and cannot find an existing answer.
My data comes from a bike shop, and I want to see whether the products bought during customers' first registered purchase are related to/have an impact on how important those customers become to the shop in the future. I have grouped customers into 5 clusters (from those who registered and never made a registered purchase again, through those who made 2-3 purchases for little money and those who made a few purchases for a lot of money, to those who purchase regularly and really bring a lot of money to the shop) and ordered them into an ordinal dependent variable.
As independent variables I have prepared 20+ binary variables that identify the products/services bought during the first purchase from this shop (the first purchase as a registered customer), one row per customer. So I want to test the idea that there are combinations of products (probably "extras" to the bike purchase) that increase the chance that a customer registers and hopefully stays a loyal customer in the future.
The dream would be to be able to say, for example: if you buy a cheap or mid-priced bike during this first purchase, you probably won't contribute much to the bike shop in the long term, so you get a low grade on the dependent variable; but those who bought a mid-priced bike AND a helmet AND a lock (probably at a special price) are more likely to become loyal registered customers who bring in money for a longer time.
There might be no such relation, but I want to test it anyway. One application of the result could be recommending an extra product during a purchase (at a good price).
I am learning R during this course. We went through some techniques, and at first I imagined I could work with neural networks (just because it sounded the most fun to try), with all these products as input in a sparse matrix and the customer clusters as the output (I hoped it was similar to the examples I had read about, with a sparse matrix of pixels from a picture as the input and the numbers 1-9 as the output), but then I was told those examples rely on real patterns in the pictures, and in my case I don't even know if there is any pattern.
Then I thought I could try an ordinal forest, but it doesn't predict my clusters well at all (2 out of 5 clusters get no predictions). That is OK; I don't expect the first purchase to be able to predict a customer's entire future. But I would really like to see whether there are combinations of products that increase the chance that a customer ends up in one of the "higher" clusters on the loyalty scale.
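For concreteness, my ordinal forest attempt looks roughly like this (the data frame and column names are placeholders for my real data):

    # Sketch of what I tried with the "ordinalForest" package; "customers" has
    # one row per customer, 20+ binary product columns and the cluster label.
    library(ordinalForest)

    customers$cluster <- factor(customers$cluster, ordered = TRUE)

    set.seed(1)
    train_idx <- sample(nrow(customers), floor(0.7 * nrow(customers)))
    train <- customers[train_idx, ]
    test  <- customers[-train_idx, ]

    fit  <- ordfor(depvar = "cluster", data = train)
    pred <- predict(fit, newdata = test)

    # Confusion table: in my data, 2 of the 5 clusters never get predicted
    table(observed = test$cluster, predicted = pred$ypred)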
I am not sure if this was clear enough. :) Do you think that there is any way of testing my idea? What could I try to do? Let me know if you need more information.
I'm doing some exploratory work on recommendation systems and have been reading about collaborative filtering techniques involving user-based, item-based, and SVD algorithms. I am also trying out R's recommenderlab package.
One apparent assumption in the literature is that users have labelled items on a rating scale, e.g. between 1 and 5 stars. I'm looking at problems where the user data does not have ratings, just transactions. For example, if I want to recommend restaurants to a user, the only data I have is how often they have visited other restaurants.
How can I convert these "transaction" counts into ratings that can be used by recommendation algorithms that expect a fixed-scale rating? One approach I thought of is simple binning:
0 stars = 0-1 visits
1 star = 2-3 visits
...
5 stars = 10+ visits
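In code, that binning would be something like this (the cut-points are just a first guess):

    # Sketch: map visit counts to a 0-5 "star" rating with fixed cut-points
    visits_to_stars <- function(visits) {
      breaks <- c(-Inf, 1, 3, 5, 7, 9, Inf)   # 0-1, 2-3, 4-5, 6-7, 8-9, 10+
      as.integer(cut(visits, breaks = breaks)) - 1L
    }

    visits_to_stars(c(0, 1, 2, 4, 10, 25))
    # 0 0 1 2 5 5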
However, that doesn't seem like it would work well. For example, if someone visited a restaurant only once, he may still really love it.
Any help would be appreciated.
I would try different approaches. As you said, a single visit may still mean the user loves the restaurant, but you don't know for sure. Your goal is not to optimize for one single user but for all users. So split your data into training and test sets, train on the training data with different scales, and evaluate on the test data.
The different scales might be:
a binary scale (0: never visited, 1: visited). This is commonly used in online shops (bought or not bought) and would accommodate your concern about the one-time visit.
your proposed scale, or other ranges for the 5 stars. You could also use more than 5 stars. I would probably not group 0 and 1 visits together.
The approach with the best accuracy should be chosen.
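A rough sketch of such a comparison with recommenderlab (the "visits" count matrix is a placeholder name, and the split parameters are arbitrary):

    # Sketch: evaluate a binary scale with recommenderlab; repeat with other
    # scales (e.g. the binned 0-5 stars) and compare the accuracy.
    library(recommenderlab)

    ratings <- as(visits, "realRatingMatrix")     # visits: users x restaurants counts
    binary  <- binarize(ratings, minRating = 1)   # visited at least once -> 1

    scheme <- evaluationScheme(binary, method = "split", train = 0.8, given = 3)
    rec    <- Recommender(getData(scheme, "train"), method = "UBCF")
    pred   <- predict(rec, getData(scheme, "known"), type = "topNList", n = 5)

    # Precision/recall against the held-out visits
    calcPredictionAccuracy(pred, getData(scheme, "unknown"), given = 3)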
Here's an idea: restaurants the user has visited zero or one times tell you nothing about what they like. Restaurants they have visited many times tell you lots. Why not just look for restaurants similar to those the customer most regularly frequents? In this way, you're using positive information (what they like) but none of the negative since you don't have access to it anyway.
If you absolutely had to infer some continuous measure, I think it would only be sensible to look at the propensity for another visit given past behaviour. This would start with the prior probability of choosing that restaurant (background frequency, or just uniform over restaurants) with a likelihood term related to the number of visits to that restaurant. In this way the more a user visits a restaurant the more likely they are to visit again.
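One concrete version of this (illustrative, not prescriptive) is to smooth each user's visit counts towards the background frequency:

    # Sketch: smoothed propensity of another visit for one user.
    # prior: background visit frequency over all restaurants (sums to 1)
    # visits: this user's visit counts; alpha: how strongly the prior is weighted
    visit_propensity <- function(visits, prior, alpha = 5) {
      (visits + alpha * prior) / (sum(visits) + alpha)
    }

    prior  <- c(0.5, 0.3, 0.2)        # background popularity of 3 restaurants
    visits <- c(0, 1, 6)              # this user's counts
    visit_propensity(visits, prior)   # the often-visited restaurant dominates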
What I mean by the bandwagon effect is this:
Items that are already top-ranked have a higher tendency to get voted on at all, and possibly even to get upvoted.
What I am hoping for are some concrete recommendations, ideally based on your practical experience with a mathematical formula and the situation in which it helped.
However, any useful pointers are more than welcome!
My ranking system
Please consider a ranking system on a website that has a reputation system, where users cast only upvotes on items and the ranking table is reset to start fresh every month.
Every user has one upvote per item within each month, and there is a reward for users who, within a certain month, upvoted an item that made it into the top ranks at the end of that month.
Users are told the following about what increases the weight of their upvote:
1)... the more reputation you have at the time of upvoting
2)... the fewer items you upvote within the current month (including the current upvote)
3)... the fewer upvotes that item already has within the current month before your own upvote
The ranking table is recalculated once a day and is visible to all.
Goal
I'd like to implement part 3) in an effort to correct the ranks of items where one cannot tell whether some users upvoted them only because of the bandwagon effect (those users might hope to gain a "tactical" advantage simply by voting for what they perceive lots of other users have already upvoted).
I also hope this will mitigate the possible use of sock puppets that have managed to attain some reputation but upvote the same item or group of items.
Question
Is there a (maybe even tested?) mathematical formula that I could just apply to the time-ordered list of upvotes for each item to get a coefficient for each of those upvotes, so that their weights are corrected in a sensible fashion?
I'm thinking it's got to be something of a logarithmic function, but I can't quite get a grip on it...
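To make that concrete, the closest I have come so far is something along these lines (in R; the functional forms are pure guesses on my part):

    # Sketch: coefficient for the k-th upvote an item receives this month,
    # discounting later upvotes logarithmically (rule 3)
    upvote_coefficient <- function(k) 1 / log2(k + 1)

    # Combined weight of one upvote under rules 1)-3): reputation at voting time
    # and the voter's number of upvotes this month (n_votes) are the other inputs
    upvote_weight <- function(reputation, n_votes, k) {
      log1p(reputation) * (1 / n_votes) * upvote_coefficient(k)
    }

    round(upvote_coefficient(1:5), 2)   # 1.00 0.63 0.50 0.43 0.39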
Thank you!
Edit
Zack says: "beyond a certain level of popularity, additional upvotes decrease the probability that something will be displayed"
To further clarify: what I am after is which actual mathematical approaches are worth trying out that will, in the form of a mathematical function, translate this decrease in popularity (i.e., apply coefficients to the weights, see above) in a sensible, balanced manner.
My hope is that someone has practical experience with such approaches in a similar or more general situation to the one above.
Consider applying the "Indie Rock Peter Principle": beyond a certain level of popularity, additional upvotes decrease the probability that something will be displayed.
Term coined by Leonard Richardson in this paper. Indie Rock Peter is of course from Diesel Sweeties.
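Purely as an illustration (this exact form is not from the paper), the display weight could grow with upvotes up to a popularity threshold and then decay:

    # Illustration: weight that rises with upvotes, peaks at the threshold,
    # and then falls, so very popular items get displayed less often
    display_weight <- function(votes, threshold = 50) {
      votes * exp(-votes / threshold)
    }

    round(display_weight(c(5, 25, 50, 200)), 1)   # 4.5 15.2 18.4 3.7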
I have always disliked the bandwagon effect in voting systems, especially "most viewed" rankings in which simply clicking on a highly ranked item increases its rank. My solution to this problem, which I have never tested or seen implemented, would be to keep track of how an item was reached (and then voted for), and ignore (or greatly decrease the weight of) votes that came from any sorted-by-ranking page.
I have a client who has suggested laying out a long list of categories in a custom order. The order would be decided by them based on which product items they sell the most, etc.
I tend to disagree; I feel that people browsing the internet prefer to scan lists of categories that are in alphabetical order or sorted by something they can use as a reference, such as a date.
I would like to know others' thoughts on this, and it would be appreciated if anyone could point me in the direction of any openly available surveys that have been conducted in this area.
Thanks
Ben
What a silly stance to take regarding a simple customer request. Allow for both orderings, and others too. No survey will demonstrate that the client is wrong, as they are, by definition, correct.
Code that allows for different orderings has greater utility anyway, and real user data will show them which, if either, should be the default.
I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system.
Every X number of days, I will be generating a "chart" of the most popular videos. So my question is: how should I determine the popularity of a given video?
If I went the route of tallying up the videos with the most views, this could have the effect of exceptionally bad videos making it to the top of the charts (just because they're so bad).
If I go the route of a scoring system based on the number of "like" and "dislike" votes (e.g. 100 like votes and 50 dislike votes equals a score of 2), videos with very few views could appear at the top of the charts.
So what I need is a combination of the two, barring, of course, spammy views and votes.
What are your thoughts on the subject?
Edit: the following tags were removed: [mysql] [postgresql], to make room for other, more representative tags; the SQL technology used in the intended implementation does not seem to bear much on the considerations regarding the rating model per se.
You seem to be missing the point that likes and dislikes for movies are anything but objective, even within the context of a relatively homogeneous group of "voters". Think of how the term "Chix Flix" or the success story called "NetFlix" illustrates this subjectivity...
Yet, if you persist in implementing the model you suggest, there are several hidden variables and system dynamics that need to be acknowledged and possibly taken into account in the rating formula.
the existence of a third, implicit, value of the vote: "No vote"
i.e. when someone views the movie page and yet doesn't vote, either way.
The problem with dealing with this extra value is its ambiguity: do people not vote because they didn't see the movie, or because they neither truly liked nor disliked it? Very likely a bit of both; therefore we can/should use the count of "page views without a vote" in the formula, to boost (somewhat) the rating of movies that do not generate a strong (positive or negative) sentiment (lest the "polarizing" movies appear more notorious or popular).
the bandwagon effect
Past a certain threshold, and particularly if the rating and/or vote counts are visible before the page view, the rating and vote counts can influence the way people decide to vote (either way), or even lead them to abstain from voting. The implication is that the total vote and/or view counts do not relate linearly to the effective rating.
"quality" vs. "notoriety"
Vote ratios in general (e.g. "likes" / "total" or "likes" / "dislikes", etc.) are indicative of the "quality" of a movie (note the quotes around quality...), whereas the number of votes (and of views) is indicative of the notoriety ("name recognition", etc.) of a movie.
statistical representativity
Very small vote and/or view counts are to be handled carefully because they introduce a lot of volatility into the rating. Phrased otherwise, small samples make for not very statistically representative ratings.
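One standard way to handle this (a suggestion on my part, not a requirement of the model) is to rank on the lower bound of a Wilson score interval for the like proportion rather than on the raw ratio:

    # Sketch: lower bound of the Wilson score interval for the "like" proportion.
    # Small samples get pulled towards zero, so 2 likes / 0 dislikes does not
    # outrank 90 likes / 10 dislikes.
    wilson_lower <- function(likes, total, z = 1.96) {
      if (total == 0) return(0)
      p <- likes / total
      (p + z^2 / (2 * total) -
         z * sqrt((p * (1 - p) + z^2 / (4 * total)) / total)) / (1 + z^2 / total)
    }

    wilson_lower(2, 2)     # ~0.34
    wilson_lower(90, 100)  # ~0.83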
trends (the time variable)
At the risk of complicating the model, consider keeping [some] record of when votes/views happened, to allow identifying "hot" (and "cooling") movies in the collection. This info may inform the rating logic, but may also be used to direct users towards currently hot items. BTW, this feeds the bandwagon effect mentioned above :-( but also increases the voting sample size :-).
All these considerations suggest caution in implementing this rating system. They also hint at the likely need to include statistics about the complete set of movies in the rating formula for an individual movie. In other words, do not rate a given movie solely on the basis of its own vote/view counts, but also on, say, the average vote count a movie receives, the maximum number of views a movie page gets, etc. In fact, an iterative process, whereby movies are [roughly] ranked at first and the ranking is then recalculated using the statistics of groups of similarly rated movies, may provide a better system (provided the formulas are "fair" and somehow converge).
A standard trick is to start with a neutral baseline: say 10 likes and 10 dislikes, which gives a score of 1. The first few votes don't change the ratio much, but as votes accumulate, the baseline is overwhelmed. The exact choice of the baseline values (the two don't have to be equal) will influence the rating of a new movie and how many votes are needed to change the rating substantially.
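A minimal sketch of that trick, using the like/dislike ratio from the question as the score:

    # Sketch: add 10 phantom likes and 10 phantom dislikes before taking the
    # like/dislike ratio, so a brand-new video starts at a neutral score of 1
    baseline_score <- function(likes, dislikes, prior_likes = 10, prior_dislikes = 10) {
      (likes + prior_likes) / (dislikes + prior_dislikes)
    }

    baseline_score(0, 0)      # 1.00  (no votes yet)
    baseline_score(3, 0)      # 1.30  (a few early likes move it only a little)
    baseline_score(100, 50)   # 1.83  (approaches the raw ratio of 2 as votes pile up)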