When I put a Twitter feed (https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=chulian1819) into Yahoo Pipes, I get an error 400, and when I use the YQL console it says "Redirected to a robots.txt restricted URL: https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=chulian1819"
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Fapi.twitter.com%2F1%2Fstatuses%2Fuser_timeline.rss%3Fscreen_name%3Dchulian1819%22&diagnostics=true
How do I get the Twitter feed of a user into Yahoo Pipes?
Thanks!
PS: my Twitter posts are not protected; I can see the RSS feed in my browser without being logged into Twitter.
Hi there, I was able to make a Twitter feed mix using Yahoo! Pipes.
I tried a lot of different other "programs", but Yahoo! Pipes just rules this one ;)
I used Fetch Feed, Sort and Regex to do my thing.
The following details are maybe interesting for other people.
The URLs you can fetch from:
http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=REPLACEWITHNAME
http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=REPLACEWITHOTHERNAME
...
Sort by item.pubDate to get a mix of feeds by date,
and I use Regex to remove URLs in the text:
(https?://([-\w.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)
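For reference, the same pattern works outside Pipes too; here's a quick PHP sketch (the tweet text is made up):

<?php
// Strip URLs from tweet text with the same pattern used in the Regex module.
$text  = 'check this out http://example.com/some/page?a=1 pretty cool';
$clean = preg_replace('~https?://([-\w.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?~', '', $text);
echo $clean; // prints the text with the URL stripped out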
There are probably pre-made Yahoo Pipes that are public and that you can simply clone and adapt, but I haven't looked into that, so maybe someone else can post about that.
Anyway, hope it helps.
When Yahoo Pipes retrieves content from an RSS feed or even a web page, it identifies itself using the User-Agent string in the request header. This string is fixed by Yahoo and cannot be changed, so if the site being scraped has blocked Yahoo Pipes, you are out of luck and it cannot be done.
The only workaround is to switch to cURL, which can mimic a web browser's User-Agent string and bypass the robots.txt restriction. However, this means using a PHP-enabled web server or Google App Engine to grab the feed.
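Something like this untested PHP sketch (the feed URL is the one from the question; the User-Agent string is just an example):

<?php
// Fetch the feed while pretending to be a regular desktop browser,
// so the robots.txt block on Pipes' own User-Agent no longer applies.
$url = 'http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=chulian1819';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0');
$feed = curl_exec($ch);
curl_close($ch);

// Re-serve the feed so Pipes can fetch it from your own server instead.
header('Content-Type: application/rss+xml');
echo $feed;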
As far as I know, in order to get RSS in real time (i.e., to be a PubSubHubbub subscriber), whoever generates the RSS feed must be a PubSubHubbub publisher, which means the RSS feed must include a tag with the hub address.
However, there are lots of RSS feeds (published using RSS 2.0 only, without PubSubHubbub) which I can subscribe to via Feedly.
How is it possible?
Thanks,
Qwerty
So, Feedly does use PubSubHubbub, through Superfeedr (and other hubs, such as Google's or WordPress's).
For the feeds which do not support PubSubHubbub, Feedly polls at regular intervals. You may want to check this other question for more details.
Also, please note that Superfeedr can also be used as a "default hub" which works even for feeds which do not support PubSubHubbub.
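For context, a feed that does support PubSubHubbub advertises its hub with a link element in the feed itself, something like this (the hub URL here is just an example):

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example feed</title>
    <!-- subscribers discover the hub from this element -->
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
    <atom:link rel="self" href="http://example.com/feed.rss"/>
    <!-- items... -->
  </channel>
</rss>

If an element like that is missing, a reader such as Feedly has to fall back to polling (or to a default hub like Superfeedr).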
We utilize the LinkedIn "Customized URL" to allow sharing of articles to LinkedIn users' feeds. (https://developer.linkedin.com/docs/share-on-linkedin) The encoded URL that is passed into the "url" parameter contains our own tracking parameters.
https://www.linkedin.com/shareArticle?url=http%3A%2F%2Fwww.thedailybeast.com%2Farticles%2F2015%2F03%2F26%2Famerica-loses-no-matter-who-wins-the-next-great-middle-east-war.html%3Fvia%3Ddesktop%26social%3DLinkedin
Unfortunately, the URLs to our articles shared in this way have been stripped of the query parameters.
The presumed reason is that LinkedIn ingests our og:url metadata and uses that canonical URL for the link that is shared. We'd prefer to override this, but the docs seem to indicate that that is only possible if you use the REST API. We'd prefer to avoid that since we are only trying to share articles to LinkedIn. Can someone tell me if there is any other way to incorporate our tracking query params using the "shareArticle" URL?
If you utilize the API for sharing, rather than via URL as you are doing, you can specifically provide values that LinkedIn will use, which avoids the crawler picking up the page's meta-data and "overriding" the URL you are giving it.
More information on making the API call here: https://developer.linkedin.com/docs/share-on-linkedin
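For what it's worth, here is a rough PHP sketch of that call, based on the v1 Share API those docs describe (the token and article URL are placeholders; verify the exact field names against the docs):

<?php
$accessToken = 'YOUR_OAUTH2_ACCESS_TOKEN';
$body = json_encode(array(
    'comment' => 'Worth a read',
    'content' => array(
        // Providing the URL here (tracking params included) is what keeps
        // the crawled og:url from overriding it.
        'submitted-url' => 'http://www.thedailybeast.com/articles/example?via=desktop&social=Linkedin',
    ),
    'visibility' => array('code' => 'anyone'),
));

$ch = curl_init('https://api.linkedin.com/v1/people/~/shares?format=json');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Authorization: Bearer ' . $accessToken,
    'Content-Type: application/json',
    'x-li-format: json',
));
$response = curl_exec($ch);
curl_close($ch);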
Hey guys, I'm trying to sync two sites on Drupal. Ideally the route to go would be to use Feeds; however, the content I'm trying to sync is restricted to users with a role they pay to have.
That means the other site can't see the feed because it gets denied the content. How would I go about exposing this feed so Feeds can crawl it on the other site?
I ended up writing a module that outputs RSS. I used an access callback to validate the host and a key I appended to the URL of the feed. Works pretty well.
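In case it helps anyone, here's a minimal Drupal 7 sketch of that idea (all function names, the key and the allowed IP are hypothetical):

<?php
function myfeed_menu() {
  $items['private/feed.rss'] = array(
    'page callback'   => 'myfeed_output_rss',
    'access callback' => 'myfeed_access',
    'type'            => MENU_CALLBACK,
  );
  return $items;
}

// Only the known consumer host presenting the shared key gets through.
function myfeed_access() {
  $key_ok  = isset($_GET['key']) && $_GET['key'] === variable_get('myfeed_key', '');
  $host_ok = ip_address() === variable_get('myfeed_allowed_ip', '');
  return $key_ok && $host_ok;
}

function myfeed_output_rss() {
  drupal_add_http_header('Content-Type', 'application/rss+xml; charset=utf-8');
  // Build and print the RSS document for the restricted content here.
  print '<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel></channel></rss>';
  drupal_exit();
}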
Here's the scenario:
I have a mailing list that contains a PDF download link. The PDF contains ads with clickable links. I need to get analytic data on the link clicks - preferably via Google Analytics (due to the richness of information available).
The solution I have in mind is for the link to go to a web page that I host with some sort of ad-specific token. GA records the request and then I use a client-side technique to redirect to the actual target URL. The redirect page serves no purpose other than to track the click and so I'm not worried about it being perceived as cloaking by search engines.
What I want to know is:
Are there any alternative ways to achieve the tracking without using an intermediate redirect page (could I perhaps call GA server-side somehow)?
If I do use the redirect page approach, what potential pitfalls could I encounter?
Thanks in advance for any advice.
I don't know what server-side environment/language you use, but for instance in PHP you can use cURL to send an image request to Google, with the custom code appended to the URL. The easiest way to do it is to output the tracking code with JavaScript with your custom code and then capture the image request URL with a sniffer, so you can replicate the format for your cURL request. Make sure to send header info, including fake browser info, so GA doesn't weed it out as a bot. Then forward to the ad URL. That way you don't need to output a page.
Yeah, you still have a 'redirect' happening, but you cut out having the client download a page or worry about JavaScript being disabled, etc...
Unfortunately, there really isn't anything better you can do.
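A rough PHP sketch of that flow using the legacy __utm.gif endpoint (all values are illustrative; sniff a real ga.js request to confirm the exact parameter format):

<?php
$params = http_build_query(array(
    'utmwv' => '5.6.7',               // tracker version, copy from a sniffed request
    'utmn'  => mt_rand(),             // random number to bust caches
    'utmhn' => 'www.example.com',     // your hostname
    'utmp'  => '/ads/campaign-123',   // virtual pageview path carrying your ad token
    'utmac' => 'UA-XXXXXX-1',         // your GA account id
    'utmcc' => '__utma=placeholder;', // visitor cookie blob, copy format from a sniffed request
));

$ch = curl_init('http://www.google-analytics.com/__utm.gif?' . $params);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Fake browser info so GA doesn't weed the hit out as a bot.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36');
curl_exec($ch);
curl_close($ch);

// Then forward to the ad URL.
header('Location: http://ads.example.com/target');
exit;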
What's the best way to track how many times items in your RSS have been accessed?
Assuming your RSS is served from a webserver, the server logs would be the obvious place to gather statistics from. There are numerous packages for parsing and interpreting webserver logs.
AWStats is a popular (free) package, and Wikipedia keeps a fairly comprehensive list.
If you serve your feeds through something like FeedBurner, then you can also get stats from there, including clicks.
You could use Google Analytics, but you would need a service to make the correct requests to the Google Analytics API or redirect to it. There are two APIs you can use:
the __utm.gif "API"
the Measurement Protocol API
To use the latter (you need Universal Analytics), which is way better in my opinion, you would need to make a request or redirect to something like:
http://www.google-analytics.com/collect?z=<randomnumber>&t=pageview&dh=<domainname>&cid=<unique-client-uuid>&tid=<propertyid>&v=1&dp=<path>
Where:
<randomnumber> is a random number to avoid caches (especially if you do redirects)
<domainname> is the domain name you see in your tracking code
<unique-client-uuid> is an anonymous identifier (ideally a UUID) for the client, so hits can be grouped per visitor
<propertyid> is the property ID you see in your tracking code (e.g. UA-123456)
<path> is the path of the page you want to register the pageview for. Note that it must be URL-encoded. E.g., for /path/to/page you would need to send %2Fpath%2Fto%2Fpage
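A minimal PHP sketch of such a redirector (the property ID and target URL are placeholders; requires allow_url_fopen, or swap in cURL):

<?php
$propertyId = 'UA-123456-1';
$targetUrl  = 'http://example.com/some/article'; // the item the feed links to

$hit = http_build_query(array(
    'v'   => 1,                // protocol version
    'tid' => $propertyId,
    'cid' => uniqid('', true), // anonymous client id (ideally a stable UUID per reader)
    't'   => 'pageview',
    'dh'  => 'example.com',
    'dp'  => parse_url($targetUrl, PHP_URL_PATH), // http_build_query URL-encodes this
    'z'   => mt_rand(),        // cache buster
));

// Register the hit, then send the reader on to the real URL.
file_get_contents('http://www.google-analytics.com/collect?' . $hit);
header('Location: ' . $targetUrl);
exit;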
I've implemented a simple redirector service that does exactly that here (explained at length here).
If you're stuck with the Classic Analytics, then you would need to use nojsstats or the older implementation.