Bulk edit-tag for Google Reader

How do I bulk-edit tags on Google Reader items?
Right now I'm using /reader/api/0/edit-tag to edit tags, but updating the tags for every item one at a time (in a loop) is very slow.
Do you know of any way to send tags for many items at once?
A possible workaround would be to use several threads to send these requests to the Google Reader server.

You can include multiple i= and s= parameters in the same POST. Just make sure that as you add a new i= you also add the corresponding s= for that item, even if you've already included the s= for that exact same stream previously (this is really important, or you'll get a 400 error when making the call). I was doing batches of 10 with my code; I'm sure you can do more, but I don't know the limit.
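As an illustration, here is a minimal sketch of such a batched call in Python with the requests library. The item IDs, auth token, and edit token below are placeholders, and the parameter layout follows the description above rather than any official documentation:

import requests

EDIT_TAG_URL = 'http://www.google.com/reader/api/0/edit-tag'

# placeholder items: each entry is (item id, source stream)
items = [
    ('tag:google.com,2005:reader/item/0000000000000001', 'feed/http://feeds.feedburner.com/filehippo'),
    ('tag:google.com,2005:reader/item/0000000000000002', 'feed/http://feeds.feedburner.com/filehippo'),
]

payload = []
for item_id, stream in items:
    payload.append(('i', item_id))
    payload.append(('s', stream))  # repeat s= for every i=, even for the same stream

payload.append(('a', 'user/-/state/com.google/read'))  # the tag to add
payload.append(('T', 'YOUR_EDIT_TOKEN'))               # placeholder edit token

# requests encodes a list of tuples as repeated form fields in one POST
response = requests.post(EDIT_TAG_URL, data=payload,
                         headers={'Authorization': 'GoogleLogin auth=YOUR_AUTH_TOKEN'})
print(response.status_code, response.text)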

Could a working URL for marking all items as read be posted?
I have tried:
<?php
echo 'http://www.google.com/reader/api/0/edit-tag?'
    . 's=feed%2F' . urlencode('http://feeds.feedburner.com/filehippo')
    . '&i=' . urlencode('tag:google.com,2005:reader/item/c7701cf414f3539e')
    . '&a=user%2F-%2Flabel%2Fread'
    . '&T=bbi44C5CQzjzM43yKUPwnA';
?>
but I just keep getting a 400 error back.
Many thanks.

Related

How to scrape/download all tumblr images with a particular tag

I am trying to download many (thousands of) images from Tumblr with a particular tag (e.g. #art). I am trying to figure out the fastest and easiest way to do this. I have considered both Scrapy and Puppeteer as options, and I read a little bit about the Tumblr API, but I'm not sure how to use the API to locally download the images I want.
Currently, Puppeteer seems like the best way, but I'm not sure how to deal with the fact that Tumblr uses lazy loading (e.g. what is the code for getting all the images, scrolling down, waiting for the images to load, and getting those too).
Would appreciate any tips!
I recommend you use the Tumblr API, so here are some instructions on how to go about that.
Read up on the What You Need section of the documentation.
Read up on the Get Posts With Tag section.
Consider using a library like PyTumblr:
import pytumblr

list_of_all_posts = []

# Authenticate via OAuth
client = pytumblr.TumblrRestClient(
    'YOUR KEY HERE'
)

def get_art_posts():
    # use params (shown in the Tumblr documentation) to change the timestamp
    # limit of the posts, i.e. to only get posts before a certain time
    params = {}
    posts = client.tagged('art', **params)  # returns the 20 most recent posts in the tag
    return posts

list_of_all_posts.append(get_art_posts())
I'm pretty rusty with the Tumblr API, not gonna lie, but the documentation is kept well up to date. Once you have the HTML of a post, the links to the images will be in there. There are plenty of libraries out there, like Beautiful Soup, that can extract the images from the HTML by their CSS selectors; a small sketch follows. Hope this helped!
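As a small illustration of that last step, here is a sketch that pulls image URLs out of a post's HTML body with Beautiful Soup (the HTML string is invented for the example):

from bs4 import BeautifulSoup

# invented example of the HTML body a tagged post might contain
html = '<div><img src="https://64.media.tumblr.com/abc/example.png"></div>'

soup = BeautifulSoup(html, 'html.parser')
image_urls = [img['src'] for img in soup.select('img')]
print(image_urls)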
My solution is below. Since I couldn't use offset, I used the timestamp of each post as an offset instead. Since I was specifically trying to get the links of the images in the posts, I did a little processing of the output as well. I then used a simple Python script to download every image from my list of links (a sketch of it follows the code). I have included a website and an additional Stack Overflow post which I found helpful.
import pytumblr

def get_all_posts(client, blog):
    offset = None
    for i in range(48):
        # response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        response = client.tagged('YOUR TAG HERE', limit=20, before=offset)
        for post in response:
            if 'photos' not in post:
                # print(post)
                if 'body' in post:
                    body = post['body']
                    body = body.split('<')
                    body = [b for b in body if 'img src=' in b]
                    if body:
                        body = body[0].split('"')
                        print(body[1])
                        yield body[1]
                    else:
                        yield
            else:
                print(post['photos'][0]['original_size']['url'])
                yield post['photos'][0]['original_size']['url']
        # move to the next offset
        offset = response[-1]['timestamp']
        print(offset)

client = pytumblr.TumblrRestClient('USE YOUR API KEY HERE')
blog = 'staff'

# use our function
with open('{}-posts.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        print(post, file=out_file)
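For completeness, here is a minimal sketch of the kind of download script mentioned above. It assumes the links file written by the code above (the images/ directory name is an invented detail) and skips the "None" lines produced by the bare yields:

import os
import requests

blog = 'staff'
os.makedirs('images', exist_ok=True)

with open('{}-posts.txt'.format(blog)) as in_file:
    links = [line.strip() for line in in_file]

for i, link in enumerate(links):
    if not link or link == 'None':  # bare yields are written out as "None"
        continue
    resp = requests.get(link)
    if resp.status_code == 200:
        ext = os.path.splitext(link)[1] or '.png'  # keep the original extension if present
        with open(os.path.join('images', '{}{}'.format(i, ext)), 'wb') as out:
            out.write(resp.content)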
Links:
https://64.media.tumblr.com/9f6b4d8d15caffe88c5877cd2fb31726/8882b6bec4975045-23/s540x810/49586f5b05e8661d77e370845d01b34f0f5f2ca6.png
Print more than 20 posts from Tumblr API
Also thank you very much to Harada, whose advice helped a lot!

Firebase Storage: Get the token of the URL

I currently have an application that works with Firebase.
I repeatedly load profile pictures, but the download URL is quite long, so it consumes a certain amount of data. To reduce this load, I would like to store the fixed part of the link and only fetch the token that is appended to it.
To explain, a link looks like this: "https://firebasestorage.googleapis.com/v0/b/fir-development.appspot.com/o/9pGveKDGphYVNTzRE5U3KTpSdpl2?alt=media&token=f408c3be-07d2-4ec2-bad7-acafedf59708"
So I would like to hard-code this part: https://firebasestorage.googleapis.com/v0/b/fir-developpement.appspot.com/o/
followed by "9pGveKDGphYVNTzRE5U3KTpSdpl2", which is the user's UID and which I already retrieve. What poses my problem is the final part, "alt=media&token=f408c3be-07d2-4ec2-bad7-acafedf59708", which is added randomly for each photo.
I would like to retrieve only this last random piece…
Is this possible?
Thank you.
UPDATE 01/11: still no solution.
Breaking apart and reassembling download URLs is not supported. You should treat these URLs as opaque strings whose implementation details might change without warning.
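If you control a server, one alternative (a sketch under assumptions, not the method prescribed by Firebase) is to skip stored tokens entirely and mint short-lived URLs on demand with the Firebase Admin SDK. The service-account path and the assumption that each picture is stored directly under the user's UID are both hypothetical:

import datetime
import firebase_admin
from firebase_admin import credentials, storage

# hypothetical service-account file; bucket name taken from the question's URL
cred = credentials.Certificate('service-account.json')
firebase_admin.initialize_app(cred, {'storageBucket': 'fir-development.appspot.com'})

def profile_picture_url(uid):
    # assumes the profile picture object is stored directly under the UID
    blob = storage.bucket().blob(uid)
    # signed URL valid for one hour; no download token involved
    return blob.generate_signed_url(expiration=datetime.timedelta(hours=1))

print(profile_picture_url('9pGveKDGphYVNTzRE5U3KTpSdpl2'))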

How can subscribers be informed that an already published item has changed?

I want to implement a very basic RSS feed for a website that has an FAQ.
Subscribers will be informed about new questions/answers. That's great.
But questions will have essential changes in content from time to time (e.g. when a better answer for a question is found). Is RSS able to inform subscribers that such an answer has changed?
If not, what could be a good workaround? I'm thinking of offering another RSS feed which only announces changes to existing questions. Is this "the right way to go"?
The answer depends on what kind of client you want to implement your feed for.
If you are talking about simple RSS readers, there is no real standard for updating only the one entry that has changed; most of them simply poll, fetching the whole feed again and picking up the update along the way.
But there are protocols called light and fat pinging that are specifically designed to handle what you want to do. The difference between the two:
Light pinging means that the URL of the feed that changed will be sent to the subscriber.
Fat pinging means that the updated content of the feed that changed will be sent to the subscriber.
One pretty popular protocol (fat ping), backed by Google, is called PubSubHubbub. There are a few services using it already, like Blogger, YouTube, MySpace, Tumblr, and WordPress. They have open-source clients for a lot of languages available on their GitHub; if you are interested, I recommend taking a look at their wiki, which is pretty complete and informative. (A minimal subscription sketch follows this answer.)
Using a second feed for updates does not seem like a good idea, since that's not what feeds are designed for, and it would mean implementing a client that people have to install to actually get the updates.
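To make the PubSubHubbub flow concrete, here is a minimal subscriber-side sketch in Python. The hub, topic, and callback URLs are placeholders, and this only sends the subscription request; your callback endpoint still has to answer the hub's verification challenge:

import requests

HUB_URL = 'https://pubsubhubbub.appspot.com/'       # Google's public hub
TOPIC_URL = 'https://example.com/faq/feed.xml'      # placeholder: the feed to watch
CALLBACK_URL = 'https://example.com/push-callback'  # placeholder: your endpoint

# standard PubSubHubbub subscription request; with a fat ping the hub
# later POSTs the updated feed content straight to the callback URL
resp = requests.post(HUB_URL, data={
    'hub.mode': 'subscribe',
    'hub.topic': TOPIC_URL,
    'hub.callback': CALLBACK_URL,
})
print(resp.status_code)  # 202 means the hub accepted the request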
I would do something with edit or update dates, titles, and other XML node values: compare the old versions with the new versions, and see if a node has been changed to another value. For example:
If you have a variable for each of your XML nodes that can be changed to another value, you'll be able to check for a change with if statements. For this example I'll check the value of $title.
<?php
$title = 'My awesome title';

/** Here your PHP code to get the content and assign variables to all nodes **/
$title_old = $title; /** Get the info from your XML script or cached XML data **/
$title_new = 'My awesome title'; /** Use YOUR new value! Request the content again after someone clicks the save button, or use a crontab to check for changes every x minutes, hours or days **/

if ($title_new == $title_old) {
    $title_changed = 'No changes detected to the title node';
    echo $title_changed . ': The title is "' . $title . '"';
} else {
    $title_changed = 'The title node has been changed';
    echo $title_changed . ': The new title is "' . $title . '"';
}
?>
I hope this points you in the right direction.
You could format the links in such a way that they include a hash or an ID of the last answer (a sketch follows below). That way the links will get picked up as fresh, and if you have a custom feed reader it can track whether a link is new or has already been read by the user.
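Here is a minimal sketch of that idea in Python; the URL scheme and the FAQ entry are invented for illustration. Because the link embeds a hash of the current answer, an edited answer produces a new link that readers treat as a fresh item:

import hashlib

def item_link(question_id, answer_text):
    # hash the answer so any essential change produces a new identifier
    digest = hashlib.sha1(answer_text.encode('utf-8')).hexdigest()[:12]
    return 'https://example.com/faq/{}?v={}'.format(question_id, digest)

# re-running this after the answer changes yields a different link/GUID
print(item_link('reset-password', 'Click the reset link in the email.'))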

What keeps caching from working in WebMatrix?

I have a number of pages in a WebMatrix Razor ASP.Net site where I have added one line of code:
Response.OutputCache(600);
From reading about it I had assumed this meant that IIS would create a cache of the HTML produced by the page, serve that HTML for the next 10 minutes, and after 10 minutes, when the next request came in, run the code again.
Now, the page is being fetched as part of a timed jQuery call. The timer code on the client runs every minute. The code there is very simple:
function wknTimer4() {
    $.get('PerfPanel', function(data) {
        $('#perfPanel').html(data);
    });
}
It occasionally appears to cache, but when I look at the number of database queries done during the 10-minute period, I may well see over 100 database queries. I know the caching isn't working the way I expect. Does the cache only work for a single session? Is there some other limitation?
Update: it really shouldn't matter what the client does, whether it fetches the page through a jQuery call or as straight HTML. If the server is caching, it doesn't matter what the client does.
Update 2: complete code dumped here. Boring stuff:
@{
    var db = Database.Open("LOS");
    var selectQueryString = "SELECT * FROM LXD_funding ORDER BY LXDOrder";

    // cache the results of this page for 600 seconds
    Response.OutputCache(600);
}
@foreach (var row in db.Query(selectQueryString)) {
    <h1>
        @row.quotes Loans @row.NALStatus, oldest @(NALWorkTime.WorkDays(row.StatusChange, DateTime.Now)) days
    </h1>
}
Your assumptions about how OutputCache works are correct. Can you check Firebug or the Chrome dev tools to look at the outgoing requests hitting your page? If you're using jQuery, sometimes people set the cache property on $.get or $.ajax to false, which causes each request to the page to carry a cache-busting trailing querystring. I've made the mistake of setting this up globally to fix some issues with jQuery and IE:
http://api.jquery.com/jQuery.ajaxSetup/
The other thing to look at here is the grouping of DB calls. Are you just making a lot of calls within one request? Are you executing a DB command in a loop, within another reader? Code in this case would be helpful.
Good luck, I hope this helps!

WordPress Write Cache Issue with Multiple Sessions

I'm working on a content-dripper custom plugin in WordPress that my client asked me to build. He wants it to catch a page-view event and, if it's the right time of day (24 hours since the last post), pull from a resource file and output another post. He also needs it to raise a flag and prevent other sessions from firing that same snippet of code. So: raise some kind of flag saying, "I'm posting that post, go away, other process," then make the post, then release the flag again.
However, the strangest thing occurs when the site is placed under load with multiple sessions hitting it with page views. Instead of firing one post, it randomly creates 1, 2, or 3 extra posts, with each one thinking it was the right time to post because 24 hours had passed since the last post. Because it's somewhat random, I'm guessing the problem is some kind of write caching where the other sessions don't see the raised flag until a couple of microseconds have passed.
The plugin was raising the "flag" by simply writing to the wp_options table with the update_option() API in WordPress. The other user sessions were supposed to read that value with get_option(), see the flag, and skip the piece of code that creates the post because a given session was already doing it. Then, when done, I lower the flag and the other sessions continue as normal.
But what it's doing is letting those other sessions in.
To make this work, I was using add_action('loop_start','checkToAddContent'). The odd thing about that hook, though, is that it's called more than once on a page, and in fact some plugins may call it. And even if I find an event that only runs once on a page view, I still have multiple sessions to contend with (different users who may view the page at the same time), and I want only one given session to trigger the content post when the post is due on the schedule.
I'm wondering if there are any WordPress plugin devs out there who could suggest another event hook to latch on to, and another way to raise a flag that all sessions would see. I mean, I could use the shared memory API in PHP, but many hosting plans have that disabled. I can't use a cookie or a session variable because those cover only a single session. About the only thing that might work across hosting plans would be to drop a file as a flag instead: if the file is present, one session has the flag; if the file is not present, other sessions can attempt to grab it. Sure, I could use the file route, but it feels kind of immature in my opinion, and I was wondering if there's something in WordPress I could do.
The key may be to create a semaphore record in the database for the "drip" event.
Warning: treat the following as pseudocode; I'm not looking up the exact functions.
When the post is queried, use a SQL statement like:

$ts = get_time_now(); // or whatever the function is
$sid = session_id();

INSERT INTO table (postcategory, timestamp, sessionid)
SELECT "$category", $ts, "$sid"
WHERE NOT EXISTS (SELECT 1 FROM table
                  WHERE postcategory = "$category"
                    AND timestamp > $ts - 24 hours)

Database integrity makes this atomic: the INSERT ... SELECT only inserts a record when no record for the category exists within the last 24 hours, so only one session can claim the drip, and the insertion only takes place once the timespan has been exceeded.
Then immediately check to see whether the current session_id() and timestamp are yours. If they are, drip.

SELECT sessionid FROM table
WHERE postcategory = "$postcategory"
  AND timestamp = $ts
  AND sessionid = "$sid"
The problem occurs with page requests even from the same session (same visitor), but it can also occur with page requests from separate visitors. It works like this:
If you are doing content dripping, then a page request is probably what you intercept with add_action('wp','myPageRequest'). From there, if a scheduled post is due, you create the new post.
The post takes a little bit of time to write to the database. In that time, a query via get_posts() may not see the new record yet, and may actually trigger your piece of code to create a new post when one has already been placed.
The fix, to force WordPress to flush the write cache, appears to be this:
try {
    $asPosts = array();
    $asPosts = wp_get_recent_posts(1);
    foreach ($asPosts as $asPost) { break; }
    delete_post_meta($asPost['ID'], '_thwart');
    add_post_meta($asPost['ID'], '_thwart', '' . date('Y-m-d H:i:s'));
} catch (Exception $e) {}

$asPosts = array();
$asPosts = wp_get_recent_posts(1);
foreach ($asPosts as $asPost) { break; }
$sLastPostDate = '';
$sLastPostDate = $asPost['post_date'];
$sLastPostDate = substr($sLastPostDate, 0, strpos($sLastPostDate, ' '));
$sNow = date('Y-m-d H:i:s');
$sNow = substr($sNow, 0, strpos($sNow, ' '));
if ($sLastPostDate != $sNow) {
    // No post today, so go ahead and post your new blog post.
    // Place that code here.
}
The first thing we do is get the most recent post. We don't really care whether it's actually the most recent or not; all we're getting it for is a single post ID. We then add a hidden custom field (thus the underscore it begins with) called
_thwart
...as in, thwart the write cache by posting some data to the database that's not too CPU-heavy.
Once that is in place, we use wp_get_recent_posts(1) yet again so that we can see whether the most recent post's date is today's date. If not, then we are clear to drip some content in. (Or, if you want to drip only every 72 hours, etc., you can adjust this a little here.)
