This is the current sample structure:
Posts(Collection)
- post1Id : {
    viewCount : 100,
    likes : 45,
    points : 190,
    title : "Title",
    postType : image/video,
    url : FileUrl,
    createdOn : Timestamp,
    createdBy : user20Id,
    userName : name,
    profilePic : url
  }
Users(Collection)
- user1Id(Document):{
    postsCount : 10,
    userName : name,
    profilePic : url
  }
  viewed(Collection)
  - post1Id(Document):{
      viewedTime : ""
    }
- user2Id(Document)
The end goal:
I need to get the posts that the current user has not viewed, ordered by the points field in descending order, with paging.
What are the possible optimal solutions (for example changing the structure, Cloud Functions, or multiple queries from the client side)?
I'm working on a solution to show trending posts and filter out posts that users have already seen, as well as poor content. It's really painful to deal with two queries, especially when the user base is growing.
It's difficult to maintain the "viewed" collection and filter the new posts. Imagine having 1 million viewed posts and then filtering for the unseen ones.
So I figured out a solution, which is not that great, but still cool.
So here is our data structure
posts(Collection)
--postid(document)
Title.
Description.
Image.
timestamp.
priority
This is a simple post structure with basic details. You can see I have added a Priority field. This field will do the magic.
How to use Priority.
Query the posts starting with the highest priority and ending with the lowest.
When a user creates a new post, assign the current timestamp as the default priority.
When a user upvotes (likes) a post, increase the priority by 1 minute (60,000 milliseconds).
When a user downvotes (dislikes) a post, decrease the priority by 1 minute (60,000 ms).
You can reset the priority every 24 hours. If you start browsing the feed this morning, you will see posts from the past 24 hours. Once the 24-hour window has passed, you can reset the priority to the present time. The 24-hour limit can be changed according to your needs. You may want to reset every 15 minutes, because hundreds of new posts might be added in that time. This limit controls how much content repeats in the feed.
So when you start scrolling the feed, you get all the trending posts first and lower-priority posts later. If you publish a post today and people start upvoting it, it gets a longer lifetime and so outranks the poor content. When people downvote it, it is pushed so far down that users will not reach it.
A timestamp is used as the priority because old posts should lose priority with time. Even today's trending posts should lose priority tomorrow.
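The priority rules above can be sketched in a few lines of Python. This is only an in-memory illustration, not Firestore code: the Post class and the new_post, upvote, downvote and feed helpers are hypothetical names, and the 60,000 ms step is the value suggested above.

```python
from dataclasses import dataclass
from typing import List

VOTE_STEP_MS = 60_000  # one minute per vote, the value suggested above


@dataclass
class Post:
    post_id: str
    timestamp_ms: int  # creation time in milliseconds
    priority: int      # starts out equal to timestamp_ms


def new_post(post_id: str, now_ms: int) -> Post:
    """A new post gets the current timestamp as its default priority."""
    return Post(post_id=post_id, timestamp_ms=now_ms, priority=now_ms)


def upvote(post: Post) -> None:
    """An upvote pushes the post one step into the future."""
    post.priority += VOTE_STEP_MS


def downvote(post: Post) -> None:
    """A downvote pulls the post one step into the past."""
    post.priority -= VOTE_STEP_MS


def feed(posts: List[Post]) -> List[str]:
    """Return post ids ordered from highest to lowest priority."""
    ordered = sorted(posts, key=lambda p: p.priority, reverse=True)
    return [p.post_id for p in ordered]
```

With two posts created 30 seconds apart, a single upvote on the older one is enough to move it above the newer one; two downvotes move it back below.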
Things to consider:
The lifetime can vary according to your needs.
The bigger the user base, the lower the lifetime value should be, because if a post published today is upvoted by 10,000 users, it trends 6.9 days into the future (10,000 × 1 minute). And if more than 100 posts have each been upvoted by more than 10,000 users, you will never get to see a new post during those 6.9 days.
So a trending post should last a day or two at most.
In this case you can give each upvote a 10-second lifetime, which gives about 1.16 days of lifetime for 10,000 upvotes.
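The lifetime figures above are simple arithmetic and can be checked with a small helper. The function name trending_days is hypothetical; the inputs are the numbers from the text.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def trending_days(upvotes: int, step_ms: int) -> float:
    """How many days into the future a post is pushed by its upvotes."""
    return upvotes * step_ms / MS_PER_DAY


# 10,000 upvotes at 1 minute each
print(round(trending_days(10_000, 60_000), 2))  # 6.94
# 10,000 upvotes at 10 seconds each
print(round(trending_days(10_000, 10_000), 2))  # 1.16
```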
This is not a perfect solution but it may help you get started.
Edit: 11th June 2021
Nowadays, there are two more options that can help you solve such a problem. The first one would be the whereNotEqualTo method and the second one would be whereNotIn. You might choose one, or the other according to your needs.
Seeing your database structure, I can say you're almost there. According to your comment, you are hosting under the following reference:
Users(Collection) -> userId(Document) -> viewed(Collection)
Under that reference you store, as documents, all the posts a user has seen, and you want to get all the posts that the user hasn't seen. Because there is no != (not equal to) operator in Firestore, nor an arrayNotContains() function, the only option you have is to make an extra database call for each post that you want to display and check whether that particular post has already been seen.
To achieve this, first add another property named postId to your post object, which holds the actual post id as a String. Now, every time you want to display new posts, check whether the post id already exists in the viewed collection. If it doesn't exist, display that post in your desired view; otherwise don't. That's it.
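The check described above can be sketched as follows. This is an in-memory stand-in, not real Firestore calls: posts is assumed to be the result of a query ordered by points descending, viewed_ids stands for the document ids in the user's viewed collection, and next_unseen_page is a hypothetical helper name.

```python
from typing import Dict, List, Set, Tuple


def next_unseen_page(
    posts: List[Dict],
    viewed_ids: Set[str],
    page_size: int,
    start: int = 0,
) -> Tuple[List[Dict], int]:
    """Walk posts (already sorted by points, descending) from index `start`,
    skipping viewed ones, until `page_size` unseen posts are collected.
    Returns (page, next_start) so the caller can ask for the next page."""
    page: List[Dict] = []
    i = start
    while i < len(posts) and len(page) < page_size:
        post = posts[i]
        if post["postId"] not in viewed_ids:
            page.append(post)
        i += 1
    return page, i


posts = [
    {"postId": "p1", "points": 190},
    {"postId": "p2", "points": 150},
    {"postId": "p3", "points": 90},
]
page, next_start = next_unseen_page(posts, viewed_ids={"p2"}, page_size=2)
print([p["postId"] for p in page])  # ['p1', 'p3']
```

Returning the index where the scan stopped lets the next page continue from there instead of rescanning posts that were already checked.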
Edit: According to your comments:
So, for the first post to appear, it needs two Server calls.
Yes, for the first post to appear, two database calls are needed: one to get the post and a second to see whether it has been seen.
large number of server calls to get the first post.
No, only two calls, as explained above.
Am I seeing it the wrong way
No, this is how NoSQL databases work.
or there is no other efficient way?
Not that I'm aware of. There is another option that will work, but only for apps that have a limited number of users and a limited number of post views. This option is to store the user ids in an array in each post object; every time you want to display a post, you only need to check whether the user id exists in that array.
But if a post can be viewed by millions of users, storing millions of ids in an array is not a good option, because documents have limits on how much data you can put into them. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB total of data in a single document. So you cannot store pretty much everything in a document.
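To get a rough feeling for what the 1 MiB limit means for an array of user ids, here is a back-of-the-envelope estimate. The assumptions are mine and not from the quoted documentation: each id is an ASCII string of 28 characters, each string costs its UTF-8 length plus one byte, and the other fields of the post are ignored unless other_fields_bytes is passed. The helper name max_ids_in_document is hypothetical.

```python
MAX_DOC_BYTES = 1_048_576  # the 1 MiB limit quoted above


def max_ids_in_document(id_length: int, other_fields_bytes: int = 0) -> int:
    """Upper bound on how many ids fit in one document (rough estimate)."""
    bytes_per_id = id_length + 1  # assumption: UTF-8 bytes plus 1 per string
    return (MAX_DOC_BYTES - other_fields_bytes) // bytes_per_id


print(max_ids_in_document(28))  # 36157
```

Under these assumptions a single post document tops out at a few tens of thousands of viewer ids, far short of millions.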
I have been using Google Vision API Product Search for a while now and noticed that some entire product sets get unindexed after some time.
The same product sets get indexed without error, but after some time, if not queried, they get unindexed (their index time is reset to 1970-01-01T00:00:00Z).
If I query them after some days, I get a "no product found" response.
After being queried, the product set gets reindexed after an hour (1 or 2 cycles).
Is this a normal feature of the API? If so, where can I read more about it?
The indexing is automatic. If your set has the 0 timestamp, it hasn't been indexed yet.
You could verify that you have added the products by listing all of them. If the product set has no images, or is missing an image, it won't be indexed.
The Product Search index of products is updated approximately every 30
minutes. When images are added or deleted, the change won't be
reflected in your Product Search responses until the index is next
updated.
Another way to make sure that indexing completed successfully is to check the index time field of the product set.
The index time indicates when the product set was last indexed. If this value keeps increasing, the index is still up to date.
You can see this document and more information about productSet.
Another possibility is that you're hitting some type of limitation that affects indexing, such as quota limits.
I don't have access to your project. If this issue is affecting you drastically and persists, you could open a support case, and a colleague will look more deeply into your project to see if something is wrong.
I have authenticated users and would like to protect myself from users spamming reads on a particular ref (thus driving up costs). How do you do this? I see the question here:
Firebase rate limiting in security rules?
That involves rate limiting writes by:
"The trick is to keep an audit of the last time a user posted a
message" - Kato
Is there a way to determine the last time a user read, and then limit their next read to some time interval after their last read? Probably better would be limiting the number of reads in a certain timeframe (say n reads per hour)?
Thanks
I just read that Firebase uses a burstable billing plan, as described here:
https://en.wikipedia.org/wiki/Burstable_billing
so you are not charged for spikes from a malicious user doing what I describe here, or from a DDoS attack.
How about this:
A structure:
posts
  post_id_0
    msg: "a post about posting"
  post_id_1
    msg: "a post about pizza"
and a users node
users
  uid_0
    name: "biff"
    post_activity
      post_id_0
        last_activity: "20170128100200"
Pseudo-code, since we don't know the platform:
display a list of posts in a table
  a post about posting
  a post about pizza
user taps or clicks 'a post about posting'
in code, get the last activity, which was today at 10:02 am
  lastActivity = uid_0/post_activity/post_id_0/last_activity
and then compare the last activity to the current time; if the post was accessed less than a minute ago, don't allow them to read it again
  let currentTimestamp = current time (say it's 10:04 am)
  if currentTimestamp - lastActivity > 1 minute then
    show post details
    update the lastActivity node to current timestamp
  else
    print("Posts can only be reviewed every minute")
In this case the post was last read 2 minutes ago, so it can be read again. If it had been less than a minute, the read would be denied.
Also, if the user would tap/click post_id_1 the post activity would not be found which means the user has never viewed it before; in that case add it to the uid_0/post_activity node.
The same technique could be used with a counter instead of a time to limit the number of times the user reads a post.
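For readers who prefer something runnable, the pseudo-code above and the counter variant can be sketched in Python. This is an in-memory stand-in for the users/uid/post_activity nodes, not Firebase code. The names may_read and may_read_counted are hypothetical, and the limits of 60 seconds and 3 reads are example values.

```python
from typing import Dict, Tuple

MIN_INTERVAL_SECONDS = 60  # example: one read per post per minute
MAX_READS = 3              # example limit for the counter variant

# (uid, post_id) -> time of the last allowed read, in seconds
last_activity: Dict[Tuple[str, str], float] = {}
# (uid, post_id) -> number of reads so far
read_counts: Dict[Tuple[str, str], int] = {}


def may_read(uid: str, post_id: str, now: float) -> bool:
    """Time-based check: allow the read if the post was never read
    before or the last read is more than a minute old."""
    key = (uid, post_id)
    last = last_activity.get(key)
    if last is None or now - last > MIN_INTERVAL_SECONDS:
        last_activity[key] = now  # record this read as the latest activity
        return True
    return False


def may_read_counted(uid: str, post_id: str) -> bool:
    """Counter-based check: allow at most MAX_READS reads per post."""
    key = (uid, post_id)
    count = read_counts.get(key, 0)
    if count >= MAX_READS:
        return False
    read_counts[key] = count + 1
    return True
```

A read at t=1000 is allowed, a second one at t=1030 is refused, and one at t=1120 is allowed again. A post with no activity entry is treated as never viewed and is allowed.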
I'm coming from a SCORM background and trying to figure out two related issues: how to update data and how to find the most recent data (i.e., I'm looking for best practices).
In SCORM I'd have a set of activities that would all store their answers and scores (easily understandable from the docs etc). The "how" I'm after is specifically related to resuming the set of activities multiple times, and hitting "reset" and submitting a different answer to a single activity after a statement has been sent in.
From what I read, xAPI states that statements are immutable, so how would I go about this?
My first thought was to generate the statement id from the activity id and void the old answer when it changes, but that sounds wrong (not least because it reads like you can't reuse the id even with voiding).
So it looks like the Statement id needs to be unique, which would mean that multiple identical Objects would be found - so would I have to look through every attempt and check for the latest one?
I'm currently looking at using xAPIWrapper in the middle.
Moving from SCORM to xAPI requires a change of mindset. SCORM deals with statuses which get updated; xAPI logs events like a journal.
You can think of it like Facebook. You post a photo of your new cat; a month later you post a photo of your cat 1 month older. There's no need to go back and delete the old post. If you want the latest photo of your cat you just go and get the most recent photo tagged "Ryochet's cat". You can also look at older photos to see how your cat developed. xAPI is like that activity stream on Facebook.
So, if somebody scores 10 points on their first attempt, then 20 points on their second attempt, you'd simply send a second set of statements about the 2nd attempt. There's no need to get rid of the statements about the old attempt, that happened and is useful data to see how the learner developed.
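Finding "the most recent photo" then comes down to picking the statement with the latest timestamp for each activity. A minimal sketch, assuming the statements for one learner have already been fetched as plain dictionaries carrying the standard timestamp and object.id properties; the activity URL and the helper names are made up for illustration.

```python
from datetime import datetime
from typing import Dict, List


def _timestamp(statement: Dict) -> datetime:
    # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
    return datetime.fromisoformat(statement["timestamp"].replace("Z", "+00:00"))


def latest_per_activity(statements: List[Dict]) -> Dict[str, Dict]:
    """Map each activity id to its most recent statement."""
    latest: Dict[str, Dict] = {}
    for statement in statements:
        activity_id = statement["object"]["id"]
        current = latest.get(activity_id)
        if current is None or _timestamp(statement) > _timestamp(current):
            latest[activity_id] = statement
    return latest


attempts = [
    {"timestamp": "2020-01-01T10:00:00Z",
     "object": {"id": "http://example.com/activities/quiz-1"},
     "result": {"score": {"raw": 10}}},
    {"timestamp": "2020-01-02T10:00:00Z",
     "object": {"id": "http://example.com/activities/quiz-1"},
     "result": {"score": {"raw": 20}}},
]
newest = latest_per_activity(attempts)["http://example.com/activities/quiz-1"]
print(newest["result"]["score"]["raw"])  # 20
```

Both attempts stay in the record; only the lookup decides which one counts as current.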
I am building an RSS feed for the first time and I have some simple, direct questions that I was unable to find answers to on the web, at least not in a form that was clear to me. Can you help me understand the following?
Which items should I include in RSS generation? Should I always put in all the articles, or what are the criteria when I query my articles for the feed?
What value should I set for pubDate? The specification says "The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes.". I do not quite understand how to apply this to my feed. I have new articles daily; should I set the pubDate to, let's say, 06:00 AM today and update it every day?
lastBuildDate: if I understand this right is the date of the latest updated item?
Which items should I include in RSS generation?
You should have one generic feed with all the new articles you post (for example: news). Additionally, if your website is split into categories, or you have some specific feeds (e.g. a calendar of events), it's good to create an additional separate RSS feed for each of them.
What value should I set for pubDate? I do not quite understand how to apply this to my feed. I have new articles daily; should I set the pubDate to, let's say, 06:00 AM today and update it every day?
Always set pubDate to the time when your news/articles went online. So if you have new articles daily, pubDate should be the date when they were released to the public. Not a random hour in the morning. Not the moment when you started writing them.
lastBuildDate: if I understand this right is the date of the latest updated item?
lastBuildDate is the most recent date when any of the results was posted or modified. Usually you should skip it, especially if your lastBuildDate would simply be the most recent pubDate. It's an optional parameter.
I use lastBuildDate only for calendar RSS feeds to show when the calendar was updated (as in calendars you not only add new entries but also often edit existing ones).
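As a small illustration of the date handling, here is one way to format the moment an article went online as an RSS date string using only the Python standard library. The rss_date helper and the sample dates are hypothetical. Naive datetimes are interpreted as local time by astimezone, so passing timezone-aware values is safer.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(published_at: datetime) -> str:
    """Format a publication time in the RFC 822 style used by RSS."""
    return format_datetime(published_at.astimezone(timezone.utc), usegmt=True)


# Times at which two articles actually went online (example values).
published = [
    datetime(2024, 1, 14, 17, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc),
]

print(rss_date(published[1]))    # Mon, 15 Jan 2024 06:00:00 GMT
# If lastBuildDate is emitted at all, the latest change is a natural value.
print(rss_date(max(published)))  # Mon, 15 Jan 2024 06:00:00 GMT
```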
1. Items: you should put in every article, but the best approach is to provide different feeds for different categories, or even for search keywords. You can build the feed like any dynamic page, with a query string.
2. pubDate: that's not super important, you can put whatever. I don't think many feed readers use it.
3. lastBuildDate: theoretically it's the date the content changed, so the date of the latest updated item should work.
Something super important, since people are going to poll this page (meaning a lot of requests):
- Cache it on your server
- Serve an ETag header and/or a Last-Modified header. That way your server can respond with just a "not modified" if the client already has it in cache.
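The ETag part can be sketched without any web framework. The respond function below is a hypothetical helper that returns a (status, headers, body) tuple; deriving the ETag from a SHA-256 hash of the feed body is one possible choice, not the only one.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(feed_xml: bytes) -> str:
    """Derive a quoted ETag value from the feed content."""
    return '"' + hashlib.sha256(feed_xml).hexdigest() + '"'


def respond(
    feed_xml: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 with an empty body when the client's cached copy is
    still current, otherwise 200 with the full feed."""
    etag = make_etag(feed_xml)
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""
    headers["Content-Type"] = "application/rss+xml"
    return 200, headers, feed_xml
```

A client that sends back the ETag it received in an If-None-Match header gets a 304 until the feed content changes, so repeated polling costs almost no bandwidth.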