Formula for Smiley rating system - rating-system

I'm using the smiley rating system in my website: Like (Smile face), Dislike (Sad face), and Neutral (Straight face).
I used the formula below for evaluating the overall rating as a smiley and not as a number.
Let x= number of likes - number of dislikes
If x<0, then rating = dislike
If x>0 & x> number of apathetic, then rating = like
If x>0 & x<number of apathetic, then rating = apathetic
How can I calculate the average rating number/score in such rating system?
Note: I need the score in order to sort the highest-rated stuff at the top and lowest-rated at the bottom

I am not sure if you are using object orientated coding, but either way, if you have 'something' to vote on, I would guess that you also have other information on that 'something', which is stored either in a database or some other medium.
That being said, you can simply store the votes, as an integer, with the object being voted on.
Example:
You have a posting site, or a blog where users can vote on posts made by other users.
Each post would be stored with information like:
Who posted it
The date
Details of the post
With that information you can also store the upvotes and downvotes for it.
Or if you wanted to take it further you could create a vote object and a relationship to the post if you wanted to store more than just the amount of upvotes and downvotes (additional information like who voted or the date they voted, if for instance you did not want people to vote on the same thing twice).
Something else to keep in mind, always keep track of all the upvotes + downvotes and use logic to determine the display. Do not just keep a single integer and add or subtract to it. A post with 1 upvote and 1 downvote = 0 score, is not the same as a post with 1000 upvotes and 1000 downvotes = 0 score, as it got 2000 views, which in its own should account for something.
The correct way to do this:
I know that you do not want to display amounts, and rather emoticons, but you should still store it in that way and render the display differently.
Create an additional table and call it something like vote
Store information like the voter, date, vote type
Link the table in a many-to-many relationship between users and votes
Every time a vote is made, update the table
Calculate the difference between the ups and downs and display the correct smileys on your front-end, or display it as a number
The "I have a crappy client", "I'm lazy" or "I'm coding on 2 hours of sleep" way of doing it:
Create a single integer field with the object being voted on
Increase and decrease that as users vote
Hope I understood the question and this helps!

Related

Firestore data model that can filter by 'not contains' or similar [duplicate]

This is the current sample structure
Posts(Collection)
- post1Id : {
viewCount : 100,
likes : 45,
points : 190,
title : "Title",
postType : image/video
url : FileUrl,
createdOn : Timestamp,
createdBy : user20Id,
userName : name,
profilePic: url
}
Users(Collection)
- user1Id(Document):{
postsCount : 10,
userName : name,
profilePic : url
}
viewed(Collection)
- post1Id(Document):{
viewedTime : ""
}
- user2Id(Document)
The End goal is
I need to getPosts that the current user did not view and in points field descending order with paging.
What are the possible optimal solutions(like changing structure, cloud functions, multiple queries from client-side)?
I'm working on a solution to show trending posts and eliminate posts that are already seen by users or poor content. It's really painful to deal with two queries especially when the user base is increasing.
It's difficult to maintain the "viewed" collection and filter the new posts. Imagine having 1 million viewed posts and then filter for the un-seen posts.
So I figured a solution, which is not that great, but still cool.
So here is our data structure
posts(Collection)
--postid(document)
Title.
Description.
Image.
timestamp.
priority
This is a simple post structure with basic details. You can see I have added a Priority field. This field will do the magic.
How to use Priority.
We should query the posts that start with the higher priority and ends with lower priority.
When a user posts a new Post. Assign the current timestamp as the default priority.
When the user upvotes (Likes) a post increase the priority by 1 minute(60000 milliseconds)
When the user downvotes (Dislike) a post decrease the priority by 1 minute (60000 ms)
You can reset the priority every 24 hours. If you start browsing the feed today morning you will see posts with the last 24 hours in past. Once the 24-hour duration reached you can reset the priority to the present time. The 24-hour limit can be changed according to your needs. You may want to reset the limit every 15 min. because in every 15 min 100s of new posts might have added. This limit will ensure the repetition of content in the feed.
So when you start scrolling the feed you will get all the trending posts first then lower priority posts later. If you post a post today and people start upvoting it. It will get an increased lifetime, thus overpowers the poor content and when you downvote it, it will push down the post as long as users will not reach it.
Using timestamp as a priority because the old posts should lose priority with time. Even the trending posts today should lose the priority tomorrow.
Things to consider:
The lifetime can vary according to your needs.
The bigger the user base. You should lower the lifetime value. because if a post posted today is upvoted by 10,000 users it trends 6.9 days in the future. And if there are more than 100 posts that have been upvoted by more than 10,000 users then you will never get to see a new post in those 6.9 days.
So a trending post should hardly last a day or two.
So in this case you can give 10 seconds lifetime, it will give 1.1 day lifetime for 10,000 upvotes.
This is not a perfect solution but it may help you get started.
Edit: 11th June 2021
Nowadays, there are two more options that can help you solve such a problem. The first one would be the whereNotEqualTo method and the second one would be whereNotIn. You might choose one, or the other according to your needs.
Seeing your database structure, I can say you're almost there. According to your comment, you are hosting under the following reference:
Users(Collection) -> userId(Document) -> viewed(Collection)
As documents, all the posts a user has seen and you want to get all the post that the user hasn't seen. Because there is no != (not equal to) operator in Firestore nor a arrayNotContains() function, the only option that you have is to create an extra database call for each post that you want to display and check if that particular post is already seen or not.
To achieve this, first you need to add another property under your post object named postId, which will hold as String the actual post id. Now everytime you want to display the new posts, you should check if the post id already exist in viewed collection or not. If it dons't exist, display that post in your desired view, otherwise don't. That's it.
Edit: According to your comments:
So, for the first post to appear, it needs two Server calls.
Yes, for the first post to appear, two database calls are need, one to get post and second to see if it was or not seen.
large number of server calls to get the first post.
No, only two calls, as explained above.
Am I seeing it the wrong way
No, this is how NoSQL database work.
or there is no other efficient way?
Not I'm aware of. There is another option that will work but only for apps that have limited number of users and limited number of post views. This option would be to store the user id within an array in each post object and everytime you want to display a post, you only need to check if that user id exist or not in that array.
But if a post can be viewd by millions of users, storing millions of ids within an array is not a good option because the problem in this case is that the documents have limits. So there are some limits when it comes to how much data you can put into a document. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB total of data in a single document. So you cannot store pretty much everything in a document.

modeling scenario with mostly semi-additive facts

Im learning dimensional modeling and Im trying to create a model. I was thinking about a social media platform which rates hotels. The platform has following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
hotel can answer via comment and it stores the date of it
the platform stores the total number of each rating level (i.e.: all rates with 1 point, all rates with 2 point etc.)
platform stores information of the user: sex, name, total number of votes he/she made and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which one is additive/semi additive/non-additive)
I realized my example is kind of difficult, because it’s hard to decide if it belongs to a fact table or dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate them with all dimensions
User comment -> semi additive? – because I can aggregate them with the date dimension (I don’t know if my argument is correct, but I know I would have new comments every day, which is for me a reason to store it in a fact table
Answer as comment -> same handling like with the user comments
Date of comment-> dimension
Total Number of all votes (1/2/3/4/5) -> semi-additive facts – makes no sense to aggregate them, since its already total but I would get the average
User information sex and name, address -> user-dimension
User Information: total number of votes -> could be dimension or fact. It depends how often it changes. If it changes often, I store it in a fact. If its not that often, then dimension
I still have question, hope someone can help me:
My Question: should I create two date dimensions, or can I store both information in one date dimension?
2nd Question: each user and hotel just have one address. Are there arguments, to separate the address dimension in a own hierarchy? Can I create a 1:1 relationship to a user dimension and address dimension?
For your model, it looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User Information: total number of votes: it is factual information about a user (dimension) and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (i.e. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers for each dimension to have its own set of address columns(but modelled as consistently as possible).

Frustrating Formula - Help Needed with Google Sheets formula/method

I am struggling to come up with a formula that fits certain criteria and was hoping someone with a better math brain than me might be able to help. What I have is a Google Sheets based tool that determines how much a someone has purchased of a product and then calculates the amount of times a special additional offer will be redeemed based on the amount spent.
As an example, the offer has three tiers to it. Though the actual costs will be variable for different offers let's say the first tier is gained with a $10 purchase, the second with a $20 purchase and the third with a $35 purchase (the only real relationship between the prices is that they get higher for each tier but there is no specific pattern to the costing of different offers). So if the customer bought $35 worth of goods they would get three free gifts, if they bought $45 worth they would get 4 and then an additional spend of $5 (totaling $50) would then allow them to redeem 5 gifts in total. It can be considered like filling a bucket, each time you hit the red line you get a new gift, when the bucket is full it's emptied and the process begins again.
If each tier of the offer was the same cost (e.g. $5, $10 and $15) this would be a simple case of division by the total purchase amount but as there is no specific relationship between the cost of the tiers (they are based on the value of the contents) I am having trouble coming up with a simple 'bucket filling' formula or calculation method that will work for any price ranges given to it. My current solution involved taking the modulus, subtracting offer amounts from the purchase amount etc. but provides plenty of cases where it breaks . If anyone could give me a start or provide some information that might help in my quest I would be highly appreciative and let me know if my explanation is unclear.! Thanks in advance and all the best
EDIT:
The user has three tiers and then the offer wraps around to the start after the initial three are unlocked once, looping until the offer has been maxed out. Avoiding a long sheet with a dynamic column of prices would be preferable and a small, multicell formula would be ideal
What you need is a lookup table. Create a table with the tier value in the left column, and the corresponding number of gifts for that tier value in the right column. Then you can use Vlookup to match the amount spent to correct tier.
I am not quite sure about, everything into one entire formula(is there a formula for loop and building arrays?)
from my understanding the tier amounts are viable, so every time you add a new tier with a new price limit then it must be calculated with a new limit price number...wouldn't it be much easier to write such module in javascript than in a google sheet? :o
anyways here is my workaround, that may could help you to find an idea
Example Doc
https://docs.google.com/spreadsheets/d/1z6mwkxqc2NyLJsH16NFWyL01y0jGcKrNNtuYcJS5dNw/edit#gid=0
my approach :
- enter purchases value
-> filter all items based by smaller than or equal "<=" (save all item somewhere as placeholder)
-> then decrease the purchases value by amount of existing number(max value) based on filtered items
-> save the new purchases value somewhere and begin from filtering again and decreasing the purchases value
(this needs to be done as many times again, till the purchases is empty)
after that, sums up all placeholder

Trying to build a 'random text' WordPress plugin. No idea where to start

Here’s what I’ve done in a spreadsheet:
I’ve assigned people to one or more categories (ie. male, female, tall, short)
I’ve assigned weights to these people (ie. 200 lbs, 120 lbs, 300 lbs)
I’ve assigned names to these people (ie. John, Jane, Bill)
Here’s what I need to do in a plugin:
Find some way to get my data into it (maybe through an admin interface, or via my spreadsheet)
Filter results by one or more categories (ie. only male; only tall + female, etc)
From those filtered results, pull 2 or 3 people (as many as I can fit) whose combined weights equal X or less
Display the names of those 2 or 3 people as a list to front-end users
At the press of a button, randomly generate another 2 or 3 person team
I don’t mind getting my hands dirty, but I don't know where to begin. If you guys could give me any advice, best practices, code to get me started, or names of plugins that already do this, etc, I’d really appreciate it.
Also, if I’m biting off too much for a complete noob, feel free to let me know. Because if it comes down to it, I’ll just create the teams manually and throw them into a random text plugin, or something.
This depends heavily on what format it's in presently. If it's in a spreadsheet, you can import it pretty easily by saving as a CSV and processing it with fgetscsv
Assuming this is going into MySQL (as most WP plugins do), this is just a SQL query (ie WHERE wp_custom_person_record_weight > 100 AND wp_custom_person_record_name != 'Bill'
Same as #2 but with a JOIN and a SUM and a WHERE query against that sum.
This is the same SQL query, if you call it through mysqli_query you'll get an array back that you can output on the page
Random records can be gleaned a number of ways, either by going through a limit of X,2 where X is a randomly generated number between 0 and the # of records or through MySQL itself (although that is not recommended for performance reasons).

Determining the popularity of a video with ratings and views

I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system.
Every X number of days, I will be generating a "chart" of the most popular videos. So my question is: how should I determine the popularity of a given video?
If I went the route of tallying up the videos with the most views, this could have the effect of exceptionally bad videos making it to the of the charts (just because they're so bad).
If I go the route of a scoring system based on the amount of "like" and "dislike" votes (eg. 100 like votes, and 50 dislike votes equals a score of 2), videos with few views could appear on the top of the charts.
So, what I need to do is a combination of the two. Barring, of course, spammy views and votes.
What's your guys' thoughts on the subject?
Edit: the following tags were removed: [mysql] [postgresql], to make room for other, more representative tags; the SQL technology used in the intended implementation does not seem to bear much on the considerations regarding the rating model per-se.
You seem to be missing the point that likes and dislikes in movies are anything but objective even within the context of a relatively homogeneous group of "voters". Think how the term "Chix Flix" or the success story called "NetFlix", illustrate this subjectivity...
Yet, if you persist in implementing the model you suggest, there are several hidden variables and system dynamics that need to be acknowledged and possibly taken into account in the rating's formula.
the existence of a third, implicit, value of the vote: "No vote"
i.e. when someone views the movie page and yet doesn't vote, either way.
The problem of dealing with this extra value is its ambiguity: do people not vote because they didn't see the movie or because they neither truly like nor disliked it? Very likely a bit of both, therefore we can/should use the count of the "Page views without vote" in the formula, to boost (somewhat) the rating of movies that do not generate a strong (positive or negative) sentiment (lest the "polarizing" movies will appear more notorious or popular)
the bandwagon effect
Past a certain threshold, and particularly if the rating and/or vote counts is visible before the page view, the rating and vote counts can influence the way people decide to vote (either way) or even decide to abstain from voting. The implication is that the total vote and/or view counts do not relate linearly to the effective rating.
"quality" vs. "notoriety"
Vote ratios in general (eg "likes" / "total" or "likes"/"dislikes" etc.) are indicative of the "quality" of a movie (note the quotes around quality...), whereby the number of votes (and of views) is indicative of the notoriety ("name recognition" etc.) of a movie.
statistical representativity
Very small vote and/or view counts are to be handled carefully because they introduce much volatility in the rating. Phrased otherwise, small samples make for not so statically representative ratings.
trends (the time variable)
At the risk of complicating the model, consider keeping [some] record of when votes/view happened, to allow identifying "hot" (and "cooling") movies in the collection. This info may inform the rating logic, but also may be used to direct the users towards currently hot items. BTW, hence feeding the bandwagon effect mentioned :-( but also, increasing the voting sample size :-).
All these considerations suggest caution in implementing this rating system. It also hints at the likely need of including statistics about the complete set of movies into the rating formula for an individual movie. In other words, do not rate a given movie solely on the basis of the its own vote/view counts but also on say the average vote counts a move receives, the maximum view a movie page gets etc. In fact, an iterative process, whereby movies are [roughly] ranked at first and then the ranking is recalculated by using the statistics of groups of movies similarly rated may provide a better system (provided the formulas are "fair" and somehow converge)
A standard trick is to start with a neutral baseline: say 10 likes and 10 dislikes that gives a score of 1. The first few votes don't change the ratio too much, but as votes accumulate, the baseline is overwhelmed. The exact choice of the baseline values will influence the rating of a new movie (the two values don't have to be equal), and how many votes are needed to change the rating substantially.

Resources