How to retrieve delicious related tags - similarity

I have found this example here, which uses Delicious related tags to create a graph, but I don't know how they implemented it. I don't know how to get a list of related tags from the Delicious API, because the documentation does not mention it at all. On the Delicious website, however, searching for a tag shows related tags on the right-hand side.
Does anybody know how to get related tags using the API?
Thank you

You might want to refer to Delicious' API page. There is a specific section on getting tags.
Not knowing what language you're using (I didn't see any examples in the link you provided; admittedly I didn't dig too deep), here is some Python that uses urllib.FancyURLopener:
import urllib
u = urllib.FancyURLopener({})
f = u.open("https://api.del.icio.us/v1/tags/get")
tags = f.readlines()
for tag_line in tags:
    print tag_line
Notes about this code:
The urllib doc page contains this caveat about using the module with https:
Warning - When opening HTTPS URLs, it does not attempt to validate the server certificate. Use at your own risk!
As coded above, you will be prompted for your Delicious username & password. To work around this, you need to override the prompt_user_password method.
As you may have guessed by the need for authentication, this only gets tags for the user whose credentials you provide. I did not see how to get tags for all of Delicious.
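For readers on Python 3, the same request can be sketched without the interactive prompt by supplying the credentials through urllib.request's basic-auth handler, which also verifies HTTPS certificates by default. This is only a sketch under assumptions: the helper names make_opener and parse_tags are made up for illustration, the endpoint is assumed to still respond, and the response is assumed to be XML of the form <tags><tag tag="name" count="N"/></tags>.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint taken from the answer above; it is assumed to still respond.
API_URL = "https://api.del.icio.us/v1/tags/get"


def make_opener(username, password):
    """Build an opener that answers HTTP Basic auth challenges without prompting."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, API_URL, username, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))


def parse_tags(xml_text):
    """Map tag name to use count (assumed shape: <tags><tag tag=".." count=".."/></tags>)."""
    root = ET.fromstring(xml_text)
    return {el.get("tag"): int(el.get("count", "0")) for el in root.iter("tag")}


# Usage (needs network access and valid credentials, so it is left commented out):
# xml_text = make_opener("user", "secret").open(API_URL).read().decode("utf-8")
# print(parse_tags(xml_text))
```

As with the original code, this would only return the tags of the authenticated user, not related tags across all of Delicious.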

Related

Is there a way to be absolutely sure that access came from a QR code scan? [duplicate]

I have a project where I need to know whether a visitor legitimately arrived from a QR code. The document.referrer value for a visit from a QR code is blank. I have looked at some answers suggesting putting a parameter in the query string (e.g. ?source=qr), but anyone could easily add the parameter to the URL and my code would believe the visit came from a QR code (e.g. www.project.com/check.page?source=qr). I have thought of adding code to check that the visitor is on a mobile phone or tablet as a secondary way to authenticate, but many browsers have add-ons that can fool websites.
Any suggestions would be greatly appreciated.
Thanks in advance.
I think the best solution for you is creating your regional QR Codes pointing to:
Region 1) http://example.com/?qr=f61060194c9c6763bb63385782aa216f
Region 2) http://example.com/?qr=731417b947aa548528344fab8e0f29b6
Region 3) http://example.com/?qr=df189e7f7c8b89edd05ccc6aec36c36d
If the value of the qr parameter is anything other than f61060194c9c6763bb63385782aa216f, 731417b947aa548528344fab8e0f29b6 or df189e7f7c8b89edd05ccc6aec36c36d, then you can ignore it and assume the user didn't come from any QR code.
Of course, any user can remove the qr parameter. But at least they can't add a valid one unless they really had access to the code.
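A minimal sketch of that check in Python, assuming the three tokens above are the only valid ones. The names QR_TOKENS and region_from_url are hypothetical, and the region labels simply mirror the list above.

```python
from urllib.parse import urlparse, parse_qs

# Tokens printed in each region's QR code (values copied from the URLs above).
QR_TOKENS = {
    "f61060194c9c6763bb63385782aa216f": "Region 1",
    "731417b947aa548528344fab8e0f29b6": "Region 2",
    "df189e7f7c8b89edd05ccc6aec36c36d": "Region 3",
}


def region_from_url(url):
    """Return the region for a known qr token, or None if it is missing or unknown."""
    values = parse_qs(urlparse(url).query).get("qr", [])
    if len(values) != 1:
        # No qr parameter, or several of them: treat as not coming from a QR code.
        return None
    return QR_TOKENS.get(values[0])
```

A request handler could call region_from_url(request_url) and treat a None result as ordinary, non-QR traffic. This only shows that the visitor knew a valid token, not that they physically scanned the code.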
...but anyone could easily add the parameter into the URL and my code would believe it is from a QR code
Well, anyone could also scan the QR code, view the link, and remove the source=qr from it.
Data collection is never 100% reliable. Users can change their browser's user agent, inject cookies with some strange values, open your page through a proxy server, and so on.
You could create your own device or App for scanning the QR-code. If you read the post I've linked, you will see that this is a waste of time and resources.
So, what is left is to make a solution which will work for most of the users. Appending a source=qr parameter to your URL seems to be the simplest solution. You could also link to an entirely different domain and redirect the request, so it would be more fraud-safe. But it will never be 100% accurate.

Nutch possibilities

I am new to Nutch and am using Nutch 1.9. Right now I am doing a proof of concept on a sample site (shaadi.com). I have a few questions; can somebody help me out?
I can't access URLs that require form-based login authentication, even though I set up the configuration in httpclient-auth.xml, nutch-site.xml and so on.
I know Nutch fetches only the whole content of a website. Is it possible to get just a piece of information, such as first name or address, from a page using Nutch? (I think that is more like scraping, which is what Python's Scrapy does.)
Thanks in advance.
You will need to use a plugin to extract the specific data and add that data to the Nutch document while indexing.
This plugin can be used to extract the data:
www.atlantbh.com/precise-data-extraction-with-apache-nutch/

How to crawl publicly shared Secret (secret.ly) posts

Secret (secret.ly) is an anonymous social network where people share their thoughts in the form of short messages. From time to time people share their "secrets" on social media like this, this and this
I am trying to create a stream of publicly available secrets, and I was wondering if there is a way to crawl the secret.ly domain to extract all those public secrets even though the URLs are random strings. I could just search on Twitter, but I am wondering if there is a way to do it directly on secret.ly.
Here is a start using Perl. It appears the secrets are also linked on social media. This script only fetches the site and dumps its links. I couldn't do much more without knowing more about what you want.
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get('http://www.secret.ly');
# dump_links prints every link on the fetched page to STDOUT by itself.
$mech->dump_links;
Update: There is also a find_all_links method of WWW::Mechanize which you may find helpful, too.
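For readers not using Perl, the same "fetch a page and list its links" idea can be sketched in Python with only the standard library. This is not a port of WWW::Mechanize. The names LinkCollector and extract_links are made up for illustration, and the sketch only collects href values from <a> tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Usage (needs network access, so it is left commented out):
# from urllib.request import urlopen
# page = urlopen("http://www.secret.ly").read().decode("utf-8", "replace")
# print("\n".join(extract_links(page, "http://www.secret.ly")))
```

Like the Perl script, this only sees links present in the HTML of the page it fetches. It cannot discover random-string URLs that are not linked from anywhere.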

Can anyone provide a good info on the various uses of hash(#) in urls?

I'm developing software that will provide in-depth information about URLs.
While the get-params are simple, I'm having trouble with the hash.
At first it was used to mark places in the document to navigate to, but we're past that now. I've seen JS engines using it to store params similar to the get strings.
So, here's my question: is everything that comes after a hash fair game, or are there any conventions about what it should look like?
Try these sites; they could help: Fragment Identifier (Wikipedia) or Pound Sign (Google).
They have a list of examples you could use.
It all depends on what you need. Hashes are used in modern web applications that make asynchronous calls to the server using Ajax. For example, this allows the user to copy the link and get the same content after pasting it: the actions taken are stored in the hash, which changes a URL that would otherwise remain static.
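As a rough illustration of how a URL-inspection tool might treat the hash, here is a heuristic sketch in Python. The function name describe_fragment and its four categories (none, params, route, anchor) are assumptions for illustration, not an established convention. Browsers do not send the fragment to the server in the HTTP request, so its meaning is up to the page's own scripts.

```python
from urllib.parse import urlsplit, parse_qs


def describe_fragment(url):
    """Classify the part of a URL after '#' using a few simple heuristics."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return {"kind": "none"}
    # Drop the "!" of a "#!" (hashbang) prefix before looking at the rest.
    body = fragment[1:] if fragment.startswith("!") else fragment
    if "=" in body:
        # Looks like key=value pairs, so parse it like a query string.
        return {"kind": "params", "params": parse_qs(body)}
    if body.startswith("/"):
        # Looks like a client-side route handled by JavaScript.
        return {"kind": "route", "route": body}
    # Otherwise assume the classic use: an anchor inside the document.
    return {"kind": "anchor", "anchor": body}
```

For example, describe_fragment("http://example.com/app#tab=inbox&page=3") reports the kind "params" with tab and page parsed out, while "http://example.com/page#section-2" is reported as a plain anchor.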
You want to read http://www.jenitennison.com/blog/node/154

Is it old-fashioned to use a query string for an id?

I am curious whether it is out of date to use a query string for an id. We have a web app running on .NET 2.0. When we display the detail of something (for example a product), we use a query string like this: http://www.somesite.com/Shop/Product/Detail.aspx?ProductId=100
We use the query string so that the user can save the link somewhere and come back any time later. I suppose we will use URL rewriting sooner or later, but in the meantime I would like to know your opinion. Thanks. Cheers, X.
A common strategy is to use an item ID in the URL, coupled with some keywords that describe the item. This is good from a user's perspective, because they can easily see what a URL refers to if they save it somewhere. More importantly, it's useful from an SEO (Search Engine Optimisation) point of view, as search engines will, it is said, rate a given URL more highly if it contains the keywords someone is searching for.
You can see this approach on this very site, where the ID after 'questions' is used for the database query and the text is purely for the benefit of users and search engines.
Whether you use a straightforward query string, or a more advanced approach that makes the ID look like part of the folder path, is up to you. It's largely a matter of personal taste.
Yes, it is old fashioned!
However, if you are thinking about changing it to a RESTful implementation as others have suggested, then you should continue to support the old query-string URLs by implementing an HTTP 301 redirect from them to the new RESTful URLs. This ensures that any users' old links and bookmarks continue to work, while telling search engine bots that the URL has changed.
Since your post is tagged ASP.Net, there is a good write-up on how you can support both, using the new ASP.Net routing mechanism here: http://msdn.microsoft.com/en-us/magazine/dd347546.aspx
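Independent of the ASP.NET routing article, the redirect idea can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the PRODUCT_SLUGS lookup stands in for a database query, the new /products/<id>/<slug> layout is hypothetical, and redirect_for_legacy_url only computes the status and target path rather than sending a response.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup standing in for a database query: product ID -> slug.
PRODUCT_SLUGS = {100: "usb-coffee-maker"}


def redirect_for_legacy_url(url):
    """Return (301, new_path) for an old Detail.aspx?ProductId=N URL, else None."""
    parts = urlsplit(url)
    if parts.path.lower() != "/shop/product/detail.aspx":
        return None
    values = parse_qs(parts.query).get("ProductId", [])
    # Require exactly one value made of ASCII digits only.
    if len(values) != 1 or not re.fullmatch(r"[0-9]+", values[0]):
        return None
    product_id = int(values[0])
    slug = PRODUCT_SLUGS.get(product_id)
    if slug is None:
        # Unknown product: let the normal 404 handling take over.
        return None
    return 301, "/products/{}/{}".format(product_id, slug)
```

A web framework would turn the returned pair into a response with status 301 and a Location header, so old bookmarks keep working.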
There is nothing wrong with query string parameters. They are simple to create and understand. A lot of sites use fancy URLs like 'www.somesite.com/Shop/Product/white_sox_t_shirt', which is cool and sort of user-friendly, but more work for us poor developers.
Using query strings is not outdated at all; they just have to be used in the right places. However, never place anything in the query string that could be a security issue, and remember that anything you read from the query string could have been modified, so you should validate all input.
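A minimal sketch of that kind of validation, assuming the ProductId parameter from the question must be a positive integer. The function name parse_product_id and the nine-digit limit are arbitrary choices for illustration.

```python
import re
from urllib.parse import parse_qs


def parse_product_id(query_string):
    """Return ProductId as a positive int, or raise ValueError if it is not valid."""
    values = parse_qs(query_string).get("ProductId", [])
    if len(values) != 1:
        raise ValueError("expected exactly one ProductId parameter")
    raw = values[0]
    # Accept only a short run of ASCII digits; this rejects signs, spaces and injected text.
    if not re.fullmatch(r"[0-9]{1,9}", raw):
        raise ValueError("ProductId must be a positive integer")
    product_id = int(raw)
    if product_id == 0:
        raise ValueError("ProductId must be a positive integer")
    return product_id
```

The validated integer can then be passed to a parameterised database query instead of concatenating the raw string into SQL.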
It's not outdated, but another alternative is a more RESTful approach:
yourwebsite.com/products/100/usb-coffee-maker
The reason is that a) search engines reportedly tend to ignore URLs with a query string (so the product.aspx?id=100 page may never get indexed) and b) having the name in the URL, purely for display purposes, supposedly helps SEO as well.
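A small sketch of how such URLs could be built and read back, assuming the /products/<id>/<name> shape shown above. The helper names slugify, product_path and product_id_from_path are made up for illustration. Only the numeric ID is used for the lookup; the name part is ignored, so a renamed product keeps working under its old links.

```python
import re


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def product_path(product_id, title):
    """Build a path such as /products/100/usb-coffee-maker."""
    return "/products/{}/{}".format(product_id, slugify(title))


def product_id_from_path(path):
    """Return the numeric ID from a /products/<id>/<name> path, or None."""
    # The name segment is optional and its content is never checked.
    match = re.match(r"^/products/(\d+)(?:/[^/]*)?/?$", path)
    return int(match.group(1)) if match else None
```

For example, product_path(100, "USB Coffee Maker") gives "/products/100/usb-coffee-maker", and product_id_from_path returns 100 for that path whatever text follows the ID.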
Permanent links are best for SEO. Also, what if your product is moved to another database and the ID of the product needs to change?
I don't think a product's name or manufacturer is likely to change.
E.g. Apple/iPhone won't change :) That seems like a good permalink to me.
