How to retrieve a user's Medium posts in real time? - web-scraping

I am trying to get a Medium user's posts in real time. I tried:
https://medium.com/feed/#your_profile
https://medium.com/#yourhandle/latest?format=json
However, judging by the firstPublishedAt field in the JSON response, the replies I get back are cached and several minutes old. I need to be able to get a user's posts as soon as they are published.
Any advice on how to achieve that?
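To quantify the caching delay, you can compare each post's firstPublishedAt with the current time. This is a minimal sketch under stated assumptions: that the `?format=json` response carries a non-JSON guard prefix (such as `])}while(1);</x>`) before the JSON object, and that firstPublishedAt is milliseconds since the Unix epoch. The helper names and the `latestPost` key in the sample payload are hypothetical, because the real response layout is undocumented.

```python
import json
import time
from typing import Optional


def strip_json_guard(raw: str) -> dict:
    """Drop any prefix before the first '{' (assumed guard) and parse the rest."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(raw[start:])


def publish_lag_seconds(first_published_at_ms: int, now_ms: Optional[int] = None) -> float:
    """Seconds elapsed between firstPublishedAt (ms since epoch) and now."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (now_ms - first_published_at_ms) / 1000.0


# Hypothetical, simplified payload; the real structure may differ.
sample = '])}while(1);</x>{"success": true, "payload": {"latestPost": {"firstPublishedAt": 1600000000000}}}'
data = strip_json_guard(sample)
published = data["payload"]["latestPost"]["firstPublishedAt"]
print(publish_lag_seconds(published, now_ms=1600000300000))  # 300.0
```

Logging this lag for every poll shows whether the delay comes from the endpoint's cache or from your polling interval.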

Related

Firestore data model that can filter by 'not contains' or similar [duplicate]

This is the current sample structure:
Posts(Collection)
  - post1Id : {
      viewCount : 100,
      likes : 45,
      points : 190,
      title : "Title",
      postType : image/video,
      url : FileUrl,
      createdOn : Timestamp,
      createdBy : user20Id,
      userName : name,
      profilePic : url
    }
Users(Collection)
  - user1Id(Document) : {
      postsCount : 10,
      userName : name,
      profilePic : url
    }
      viewed(Collection)
        - post1Id(Document) : {
            viewedTime : ""
          }
  - user2Id(Document)
The end goal:
I need to get the posts that the current user has not viewed, ordered by the points field in descending order, with paging.
What are the possible optimal solutions (e.g. changing the structure, Cloud Functions, multiple queries from the client side)?
I'm working on a solution to show trending posts and to eliminate posts that users have already seen, as well as poor content. It's really painful to deal with two queries, especially as the user base grows.
It's difficult to maintain the "viewed" collection and filter the new posts against it. Imagine having 1 million viewed posts and then filtering for the unseen ones.
So I came up with a solution, which is not that great, but still cool.
So here is our data structure:
posts(Collection)
  --postId(document)
      Title
      Description
      Image
      timestamp
      priority
This is a simple post structure with basic details. You can see I have added a priority field. This field will do the magic.
How to use priority:
Query the posts ordered from the highest priority down to the lowest.
When a user creates a new post, assign the current timestamp as its default priority.
When a user upvotes (likes) a post, increase its priority by 1 minute (60000 milliseconds).
When a user downvotes (dislikes) a post, decrease its priority by 1 minute (60000 ms).
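The priority rules above can be sketched in memory as follows. The `Post` class and the function names are hypothetical, and a plain list stands in for the posts collection. In Firestore, the vote updates would typically be atomic increments (`FieldValue.increment`) and the final sort would be a descending order-by on `priority`, not a client-side sort.

```python
from dataclasses import dataclass
from typing import List

VOTE_STEP_MS = 60_000  # each vote moves a post by 1 minute (60000 ms)


@dataclass
class Post:
    post_id: str
    priority: int  # starts as the creation timestamp in milliseconds


def create_post(post_id: str, now_ms: int) -> Post:
    # A new post's default priority is the current timestamp.
    return Post(post_id, priority=now_ms)


def upvote(post: Post) -> None:
    post.priority += VOTE_STEP_MS


def downvote(post: Post) -> None:
    post.priority -= VOTE_STEP_MS


def feed(posts: List[Post]) -> List[str]:
    # Highest priority first, lowest priority last.
    return [p.post_id for p in sorted(posts, key=lambda p: p.priority, reverse=True)]


t0 = 1_000_000_000                           # arbitrary time in ms
older = create_post("older", t0)
newer = create_post("newer", t0 + 120_000)   # published 2 minutes later
for _ in range(3):                           # 3 upvotes = +3 minutes
    upvote(older)
print(feed([older, newer]))  # ['older', 'newer']
```

Three upvotes are enough for the older post to outrank one published two minutes after it, which is exactly the "votes buy time" effect the scheme relies on.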
You can reset the priority every 24 hours. If you start browsing the feed this morning, you will see posts going back through the past 24 hours. Once the 24-hour window has been reached, you can reset the priority to the present time. The 24-hour limit can be changed according to your needs; you may want to reset it every 15 minutes, because hundreds of new posts might be added every 15 minutes. This limit controls how soon content repeats in the feed.
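Here is a minimal sketch of paging with such a window. It interprets the reset as "once paging has gone more than one window into the past, start over from the top of the feed", so posts boosted into the future are not skipped. `next_page`, `WINDOW_MS`, and the list of plain integer priorities are hypothetical stand-ins; against Firestore, the filter-and-sort line would roughly correspond to a query ordered by `priority` descending, starting after the cursor, with a limit.

```python
from typing import List, Optional, Tuple

WINDOW_MS = 24 * 60 * 60 * 1000  # 24-hour window; 15 minutes would also work


def next_page(priorities: List[int], cursor: Optional[int], now_ms: int,
              page_size: int) -> Tuple[List[int], Optional[int]]:
    """Return the next page of priorities (highest first) and the new cursor."""
    # Once paging has reached posts older than the window, start over from the top.
    if cursor is not None and cursor < now_ms - WINDOW_MS:
        cursor = None
    # Roughly: order by priority descending, start after the cursor, limit page_size.
    candidates = priorities if cursor is None else [p for p in priorities if p < cursor]
    page = sorted(candidates, reverse=True)[:page_size]
    return page, (page[-1] if page else cursor)


now = 10 * WINDOW_MS
prios = [now + 5, now - 10, now - WINDOW_MS - 1, now - 2 * WINDOW_MS]
page1, cur = next_page(prios, None, now, page_size=2)  # [now + 5, now - 10]
page2, cur = next_page(prios, cur, now, page_size=2)   # the two oldest posts
page3, cur = next_page(prios, cur, now, page_size=2)   # window exceeded: same as page1
```

A shorter window makes the feed wrap around sooner, so users see repeats earlier but also reach freshly boosted posts faster.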
So when you start scrolling the feed, you get all the trending posts first and lower-priority posts later. If you publish a post today and people start upvoting it, it gets an extended lifetime and so outranks poor content; downvotes push the post down until users no longer reach it.
The timestamp is used as the priority because old posts should lose priority over time. Even today's trending posts should lose their priority tomorrow.
Things to consider:
The lifetime can vary according to your needs.
The bigger the user base, the lower the lifetime value should be: if a post published today is upvoted by 10,000 users, it trends 6.9 days into the future. And if more than 100 posts have each been upvoted by more than 10,000 users, you will never get to see a new post during those 6.9 days.
So a trending post should last a day or two at most.
In this case you can use a 10-second lifetime per vote, which gives about 1.16 days of lifetime for 10,000 upvotes.
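The lifetimes above are just net upvotes multiplied by the boost per vote. This quick check uses a hypothetical helper name:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def boost_days(upvotes: int, boost_per_vote_ms: int) -> float:
    """How many days into the future net upvotes push a post's priority."""
    return upvotes * boost_per_vote_ms / MS_PER_DAY


print(round(boost_days(10_000, 60_000), 2))  # 6.94 (1 minute per vote)
print(round(boost_days(10_000, 10_000), 2))  # 1.16 (10 seconds per vote)
```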
This is not a perfect solution, but it may help you get started.
Edit: 11th June 2021
Nowadays, there are two more options that can help you solve such a problem: the whereNotEqualTo method and the whereNotIn method. You might choose one or the other according to your needs.
Seeing your database structure, I can say you're almost there. According to your comment, you are storing all the posts a user has seen as documents under the following reference:
Users(Collection) -> userId(Document) -> viewed(Collection)
and you want to get all the posts that the user hasn't seen. Because, at the time of writing, Firestore had no != (not equal to) operator, and there is no arrayNotContains() function, the only option you have is to make an extra database call for each post you want to display and check whether that particular post has already been seen.
To achieve this, first add another property to your post object named postId, which holds the actual post id as a String. Then, every time you want to display new posts, check whether the post id already exists in the viewed collection. If it doesn't exist, display that post in your desired view; otherwise, don't. That's it.
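Here is a minimal sketch of that check. In-memory dicts stand in for the Posts collection and the user's viewed subcollection; `posts_db`, `viewed_db`, and `fetch_unseen` are hypothetical names. Each page read is the first call, and each membership test corresponds to the extra lookup in Users/{userId}/viewed. The loop keeps reading pages until it has enough unseen posts.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for Posts/{postId} and Users/{userId}/viewed/{postId}.
posts_db: Dict[str, Dict] = {
    "p1": {"postId": "p1", "points": 190},
    "p2": {"postId": "p2", "points": 150},
    "p3": {"postId": "p3", "points": 120},
    "p4": {"postId": "p4", "points": 90},
}
viewed_db: Dict[str, Set[str]] = {"user1": {"p1", "p3"}}


def fetch_unseen(user_id: str, wanted: int, page_size: int = 2) -> List[str]:
    """Collect up to `wanted` unseen post ids, highest points first."""
    ordered = sorted(posts_db.values(), key=lambda p: p["points"], reverse=True)
    seen = viewed_db.get(user_id, set())
    unseen: List[str] = []
    for start in range(0, len(ordered), page_size):
        page = ordered[start:start + page_size]  # first call: one page of posts
        for post in page:
            # Second call: does Users/{user_id}/viewed/{postId} exist?
            if post["postId"] not in seen:
                unseen.append(post["postId"])
                if len(unseen) == wanted:
                    return unseen
    return unseen


print(fetch_unseen("user1", wanted=2))  # ['p2', 'p4']
```

When a user has already seen most of the top posts, several pages may be read before one unseen post turns up, which is the cost the question is worried about.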
Edit: According to your comments:
So, for the first post to appear, it needs two Server calls.
Yes, for the first post to appear, two database calls are needed: one to get the post and a second one to check whether it has been seen.
large number of server calls to get the first post.
No, only two calls, as explained above.
Am I seeing it the wrong way
No, this is how NoSQL databases work.
or there is no other efficient way?
Not that I'm aware of. There is another option, but it only works for apps with a limited number of users and a limited number of post views: store the user ids in an array in each post object, and every time you want to display a post, check whether the current user's id exists in that array.
But if a post can be viewed by millions of users, storing millions of ids in an array is not a good option, because documents have limits on how much data you can put into them. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB of data in total in a single document, so you cannot store an unbounded list of ids there.
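Here is a rough estimate of what that limit means, based on Firestore's storage-size rules: a string counts as its UTF-8 byte length plus 1, an array as the sum of its elements, a document name as its collection and document IDs plus 16 bytes, and each document adds 32 bytes. The 28-character user ids, the `viewedBy` field name, and the `posts/post1Id` path are assumptions for illustration. Other post fields are ignored, so the real ceiling is lower.

```python
MAX_DOC_BYTES = 1_048_576  # 1 MiB maximum document size


def string_size(s: str) -> int:
    # A string counts as its UTF-8 byte length plus 1.
    return len(s.encode("utf-8")) + 1


def doc_name_size(path: str) -> int:
    # Each collection ID and document ID in the path, plus 16 bytes.
    return sum(string_size(part) for part in path.split("/")) + 16


def max_viewer_ids(path: str, field_name: str, id_length: int) -> int:
    """How many ASCII ids of id_length chars fit in one array field of an otherwise empty doc."""
    fixed = doc_name_size(path) + string_size(field_name) + 32  # +32 per document
    per_id = id_length + 1  # each array element is a string
    return (MAX_DOC_BYTES - fixed) // per_id


print(max_viewer_ids("posts/post1Id", "viewedBy", 28))  # 36155
```

Even with nothing else in the post document, the array tops out at tens of thousands of ids, far short of millions.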

How to get only CHANGED data from Google Analytics API?

I'm using the Google Analytics API to get the number of page views for each page of my website. To reduce the number of API calls, I fetch the data at a set interval and cache it on my server. For each API call, I try to get the page views of every page on my site and update them in my database.
Is there a way to get only CHANGED DATA since a specific timestamp? For example, only page views that changed within the last 2 hours.
I think it would be some kind of filter (if any), but I could not find it in the documentation here https://developers.google.com/analytics/devguides/reporting/core/v3/reference#filters
You could add a filter on ga:dateHour so that only the last two hours come back. The problem is that it takes Google around 4 hours to process the data, so you wouldn't get anything back for two hours ago.
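In the Core Reporting API v3, dimension filters only support the ==, !=, =@, !@, =~ and !~ operators, so "the last two hours" can be expressed as an OR (comma-separated) list of exact ga:dateHour values in YYYYMMDDHH format. This is a minimal sketch; `last_hours_filter` is a hypothetical helper, and it assumes the datetime you pass in is already in the reporting view's time zone.

```python
from datetime import datetime, timedelta


def last_hours_filter(now: datetime, hours: int) -> str:
    """Build a ga:dateHour filter for the last `hours` hours, current hour included."""
    values = [(now - timedelta(hours=h)).strftime("%Y%m%d%H") for h in range(hours)]
    # A comma means OR in Core Reporting API v3 filter expressions.
    return ",".join(f"ga:dateHour=={v}" for v in sorted(values))


print(last_hours_filter(datetime(2021, 6, 11, 1, 30), 2))
# ga:dateHour==2021061100,ga:dateHour==2021061101
```

The request's start-date and end-date still have to cover the days those hours fall on. Because of the processing delay described above, these most recent hours are likely to come back empty.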
If you want to see data that recent, you have to use the Real Time Reporting API https://developers.google.com/analytics/devguides/reporting/realtime/v3/
What exactly is your query currently? If you request ga:date, ga:dateHour and ga:pagePath with ga:pageviews, the results will all be returned in one query (not counting additional pages), which keeps you a long way from the 10,000 queries per day limit.
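Here is how that single query could be built, including the extra pages. This is a minimal sketch that assumes the Core Reporting API v3 endpoint https://www.googleapis.com/analytics/v3/data/ga. The view id `ga:12345678` is a placeholder, authentication is omitted, and `report_page_urls` is a hypothetical helper. start-index is 1-based and max-results is capped at 10,000 rows, so each additional page costs one more query. In practice, the totalResults value in the first response tells you how many rows there are.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"
MAX_RESULTS = 10_000  # assumed per-request row cap


def report_page_urls(view_id: str, start_date: str, end_date: str, total_rows: int) -> List[str]:
    """One request URL per page of date/hour/page-path pageview rows."""
    urls = []
    for start_index in range(1, max(total_rows, 1) + 1, MAX_RESULTS):
        params = {
            "ids": view_id,
            "start-date": start_date,
            "end-date": end_date,
            "metrics": "ga:pageviews",
            "dimensions": "ga:date,ga:dateHour,ga:pagePath",
            "start-index": start_index,
            "max-results": MAX_RESULTS,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = report_page_urls("ga:12345678", "yesterday", "today", total_rows=25_000)
print(len(urls))  # 3 requests for 25,000 rows
```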
On a side note, what do you mean by changed? Data that has already been processed is not going to change.

Need some hints for my own WP Theme

After taking some online tutorials, I want to create my own custom theme. This is going to be an online contact lens store! So far I have learned how to create and use Custom Post Types, Custom Taxonomies, Metaboxes, and Options pages, but there is still one confusing part left for me (hopefully no more! :-))
I need to get some user inputs through HTML select options, like in the following image, to finalize users' orders:
Now my questions are:
1- Do I have to create something like Metaboxes to handle this data from users?
2- Can I handle this kind of data through a simple form and POST handling in PHP? If so, where should I store this data? Do I have to create my own table to handle these things?
I really appreciate your time regarding this post.
What you're asking for carries a little more complexity than you think!
Let's break this down into its meaningful steps:
A user visits your shop, and decides that they like what they see and wants to make an order
The user fills out a form defining their exact eye requirements, quantity, as well as their contact information
Upon completing this form, a new order has been created
But wait.... how will you get paid? What happens if the user's computer explodes before the payment goes through? How will you know to send them their contacts without first knowing the payment even succeeded?
This is where things start to get tricky. You need to be able to keep a record of orders for the sake of your users, but you also need to look out for your own interests too. Your business is doomed to fail if you're sending out expensive products to people without the proper assurance that you're getting paid.
This is where you'll need to set up a Merchant Account with a service like PayPal or Google Checkout. As much as I despise PayPal, their Instant Payment Notification (IPN) System has been very reliable for me. What this does is automatically send a POST request to your server with all of the information you need to finalize the checkout process and alert your user that their payment has either succeeded or failed.
So with this in mind, how does this affect our step-by-step process?
A user visits your shop, and decides that they like what they see and wants to make an order
The user fills out a form defining their exact eye requirements, quantity, as well as their contact information
Upon completing this form, a new order has been created with a status of pending
The user is then sent to PayPal/Google Checkout to enter their Credit Card information to complete their purchase
PayPal/Google processes the payment
PayPal/Google sends your server the results of the processed payment
The corresponding order is updated with a status of Payment Received or Payment Failed for your own records
You send out the product to a very satisfied customer
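The order lifecycle above is essentially a small state machine. Here is a minimal, language-neutral sketch in Python. The status names come from the steps above; the `Order` class and the helper functions are hypothetical. Only a pending order may move to Payment Received or Payment Failed, and only a paid order should be shipped.

```python
from dataclasses import dataclass

PENDING = "pending"
PAYMENT_RECEIVED = "Payment Received"
PAYMENT_FAILED = "Payment Failed"


@dataclass
class Order:
    order_id: int
    status: str = PENDING


def apply_payment_result(order: Order, succeeded: bool) -> None:
    """Record the processor's result once; later or duplicate results are ignored."""
    if order.status != PENDING:
        return  # already settled: never flip a paid order back to failed
    order.status = PAYMENT_RECEIVED if succeeded else PAYMENT_FAILED


def can_ship(order: Order) -> bool:
    # Only ship once the payment is confirmed.
    return order.status == PAYMENT_RECEIVED


order = Order(order_id=42)
apply_payment_result(order, succeeded=True)
apply_payment_result(order, succeeded=False)  # late/duplicate message: ignored
print(order.status, can_ship(order))  # Payment Received True
```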
So what will this mean from a WordPress standpoint?
My first suggestion:
Check if a Plugin already exists that can handle this for you!!!
Seriously, this will make your life much easier. Handling people's money as well as your own stock is a nightmare in itself; you don't want to be responsible for the code that drives it, or for security holes you might not know about (that other plugins may have already addressed). WooCommerce is a popular one. See if it can handle what you need.
If a Plugin can't do it for you, then you'll need to:
Register a Custom Post Type for Orders
Create a new Order Post using wp_insert_post when a user submits the form with their POST data
Save the relevant POST data you need as metadata using update_post_meta
Send PayPal/Google/Whatever some Custom Information it needs to hang on to - in this case, the newly created Order Post ID - so that it can send it back to your own server
Set up a side-script to process the data sent by PayPal/Google Checkout/Whatever and send an email to the user detailing the status of their purchase and update the corresponding Order Post ID that was sent back by PayPal/Google Checkout/Whatever
(Optional) Set up a CRON Job to periodically scan all Pending orders in case a user's session was interrupted, or they bailed at the last second during checkout and send them an email notifying them about this and provide them a link to your website to reopen, reevaluate, and resend the order, or cancel and clear it from your database
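The last three steps can be sketched in the same language-neutral way. This sketch assumes PayPal's IPN variables `custom` (a pass-through value, here the Order Post ID) and `payment_status` ("Completed" on success). It deliberately skips verifying the message with PayPal before trusting it, which a real handler must do. `orders`, `handle_notification`, and `stale_pending_orders` are hypothetical, and a dict stands in for the Order posts and their metadata.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical stand-in for Order posts: post ID -> stored metadata.
orders: Dict[int, Dict] = {
    101: {"status": "pending", "created": datetime(2024, 1, 1, 9, 0)},
    102: {"status": "pending", "created": datetime(2024, 1, 1, 11, 30)},
}


def handle_notification(fields: Dict[str, str]) -> None:
    """Update the order whose ID came back in the pass-through `custom` field."""
    # A real handler must first verify the message and then email the customer.
    order = orders.get(int(fields.get("custom", "0")))
    if order is None or order["status"] != "pending":
        return  # unknown order, or already settled
    payment_status = fields.get("payment_status")
    if payment_status == "Completed":
        order["status"] = "Payment Received"
    elif payment_status in ("Failed", "Denied"):
        order["status"] = "Payment Failed"
    # Any other status (e.g. "Pending") leaves the order pending.


def stale_pending_orders(now: datetime, max_age: timedelta) -> List[int]:
    """Order IDs still pending after max_age: what the optional CRON job would look for."""
    return sorted(order_id for order_id, o in orders.items()
                  if o["status"] == "pending" and now - o["created"] > max_age)


handle_notification({"custom": "102", "payment_status": "Completed"})
print(orders[102]["status"])  # Payment Received
print(stale_pending_orders(datetime(2024, 1, 1, 12, 0), timedelta(hours=2)))  # [101]
```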
Quite honestly, this would take even a seasoned developer at least a few weeks' worth of work just to get it into working condition. Presentation is a whole different animal.
Hopefully this will give you a step in the right direction. I doubt anybody here will give you the code to do what you need, because there's just too much to post. Entire libraries are built just for these kinds of things.
Good luck!

How to get FB like count for each post while pulling the feed via Open Graph API?

I am trying to get all the posts for a page by using
https://graph.facebook.com/PAGE_ID/feed
And it works like a charm. I can get all the info for each post except the like count.
The feed does return "likes" for each post, but only for the first 25 likes, so I cannot tell the total like count of a post.
The closest solution I found on the net is to set "summary=1" when requesting info of a post, e.g.
https://graph.facebook.com/POST_ID/likes?summary=1
This will return a summary field that shows the like count of this post, which is exactly what I need.
However, if this is the only way to solve the problem, I have to make an additional network request for each post just to get the like count. I could originally finish the job with only ONE network request, but now I have to make 1+N network requests (N being the number of posts in the page feed).
I think I must be missing something. FB must have some way to embed the like count in the feed info. In the FB app and website, all posts show their like counts immediately; there is no way they make N additional network requests to get the like count for each post.
Hope someone can help. Thanks a lot in advance.
Finally, I found a way to get the like/comment counts for each post while pulling the feed, without making further network requests:
/url/feed?fields=likes.summary(1).limit(0)
Isn't it great?
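The same field-expansion syntax also works for comments, so one feed request can carry both counts. This is a minimal sketch that builds such a request and reads each post's `summary.total_count`. `PAGE_ID`, the sample response, and the helper names are placeholders, access-token handling is omitted, and which fields are available depends on the Graph API version and permissions.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

GRAPH = "https://graph.facebook.com"


def feed_url(page_id: str) -> str:
    # limit(0) skips the individual likes/comments; summary(1) adds total_count.
    fields = "message,likes.summary(1).limit(0),comments.summary(1).limit(0)"
    return f"{GRAPH}/{page_id}/feed?{urlencode({'fields': fields})}"


def counts(feed_response: Dict) -> List[Tuple[str, int, int]]:
    """(post id, like count, comment count) for every post in one feed response."""
    rows = []
    for post in feed_response.get("data", []):
        likes = post.get("likes", {}).get("summary", {}).get("total_count", 0)
        comments = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        rows.append((post["id"], likes, comments))
    return rows


sample = {"data": [{"id": "1_2",
                    "likes": {"data": [], "summary": {"total_count": 37}},
                    "comments": {"data": [], "summary": {"total_count": 4}}}]}
print(counts(sample))  # [('1_2', 37, 4)]
```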
