Bing Custom Search - sort/order by file modification date - bing-custom-search

I have been asked by a client to see if Bing Custom Search can order results containing links to PDFs by file modification date.
I know results can be ordered by the date the content was indexed (or re-indexed), but they are concerned with the actual age of PDF files as determined by the filesystem timestamp, and want to order the results by that criteria.
I could not find anything in the Azure documentation, and personally I don't believe it is possible, but I wanted to check in with SO first.

I don't believe this specific scenario is currently supported at the time. But if this is a feature you would like to see supported in the future, you may leave your feedback on Uservoice.

Related

How to analyze multiple query parameters in Google Analytics

I'm setting up Google Analytics for a website where a user can find an event to attend (concerts, plays, etc.). The results can be filtered by 5 different parameters.
So, unfiltered results would look like: example.com/event-finder/
And filtered results showing concerts in January or February would look like: example.com/event-finder?type=concert&month=jan,feb
I'm struggling to figure out the best way to use the query parameters in Google Analytics to analyze filtering behavior.
Example questions I'd want to be able to pull answers for:
What percentage of results were filtered by type?
What percentage of results were also sorted by month?
What is the most common type filtered by?
I have full access to both Google Analytics and Tag Manager but I suspect I shouldn't do this with events or custom dimensions and that there's got to be a way to use the query parameters to do this in a clean way.
I've tried to use a new view and site search to group the types of filters. Seems like it could work, but seems hacky and limited.
I've considered pushing those values into custom dimensions, but that too seems like overkill.
I've considered pulling content reports into Google Sheets and sorting through things there, but I'm 1) not entirely sure how I'd do that and 2) suspect there may be an easier approach.
Let me know if you have any questions or need more clarification. Thanks!
Have you tried to use "category parameters" when configuring site search (admin -> view settings)? You could set the "type" as a category parameter. You can also enter multiple parameters in there.
Check this screenshot of site search configuration

How to remove/hide all Google Analytics data associated with a specific page?

For about a week, Google Analytics was erroneously reporting page views for a few request URIs, severely skewing my data. I have read that there is no way to remove data once it is reported. If this is the case, is there a way to simply hide this data from the view?
I have tried a number of things (such as creating global filters, view filters, etc.) to no avail. Using segments also doesn't work, because apparently you can only filter out visits/users (whereas my goal is to filter out page views associated with a specific page). At this point, I feel like I must be going about it the totally wrong way...
Below is a screenshot of the Behavior > Overview section. The page views I want to move are #1, #2, and #5.
Alex, unfortunately, there is nothing you can do about the historical data.
However, you can use simple filter to exclude pages you don't want to see (the filter field above the report table, not filters related to account/profiles) -- see the attached screen below.
Make sure you select exclude and then pick Page dimension. The easiest way would be to use regular expressions, like:
(a|b|c)
This one would remove any pages that contain either "a", or "b" or "c".
The expression would be probably a bit more complicated in your case and I suggest using tools like RegEx Hero (free, online). I am not sure if there is anything common for the pages you would like to remove from the reports, but regular expression can do quite a lot :).
One last thing -- be aware there is a slight difference in segments and (table) filters. If you use segments for page dimension, you would end up with ALL the pages that were seen during a visit, which includes the page you set in the segment. Might be a bit confusing, but see this article for detailed explanation.

Scrape all google search result for a specific name

I think the question has been answered here before,but i could not find the desired topic.I am a newbie in web scraping.I have to develop a script that will take all the google search result for a specific name.Then it will grab the related data against that name and if there is found more than one,the data will be grouped according to their names.
All I know is that,google has some kind of restriction on scraping.They provide a custom search api.I still did not use that api,but hoping to get all the resulted links corresponding to a query from that api. But, could not understand what will be the ideal process to do the scraping of the information from that links.Any tutorial link or suggestion is very much appreciated.
You should have provided a bit more what you have been doing, it does not sound like you even tried to solve it yourself.
Anyway, if you are still on it:
You can scrape Google through two ways, one is allowed one is not allowed.
a) Use their API, you can get around 2k results a day.
You can up it to around 3k a day for 2000 USD/year. You can up it more by getting in contact with them directly.
You will not be able to get accurate ranking positions from this method, if you only need a lower number of requests and are mainly interested in getting some websites according to a keyword that's the choice.
Starting point would be here: https://code.google.com/apis/console/
b) You can scrape the real search results
That's the only way to get the true ranking positions, for SEO purposes or to track website positions. Also it allows to get a large amount of results, if done right.
You can Google for code, the most advanced free (PHP) code I know is at http://scraping.compunect.com
However, there are other projects and code snippets.
You can start off at 300-500 requests per day and this can be multiplied by multiple IPs. Look at the linked article if you want to go that route, it explains it in more details and is quite accurate.
That said, if you choose route b) you break Googles terms, so either do not accept them or make sure you are not detected. If Google detects you, your script will be banned by IP/captcha. Not getting detected should be a priority.

Advice needed on REST URL to be given to 3rd parties to access my site

Important: This question isn't actually really an ASP.NET question. Anyone who knows anything about URLS can answer it. I just happen to be using ASP.NET routing so included that detail.
In a nutshell my question is :
"What URL format should I design that i can give to external parties to get to a specific place on my site that will be future proof. [I'm new to creating these 'REST' URLs]."
I need an ASP.NET routing URL that will be given to a third party for tracking marketing campaigns. It is essentially a 'gateway' URL that redirects the user to a specific page on our site which may be the homepage, a special contest or a particular product.
In addition to trying to capture the referrer I will need to receive a partnerId, a campaign number and possibly other parameters. I want to provide a route to do this BUT I want to get it right first time because obviously I cant easily change it once its being used externally.
How does something like this look?
routes.MapRoute(
"3rd-party-campaign-route",
"campaign/{destination}/{partnerid}/{campaignid}/{custom}",
new
{
controller = "Campaign",
action = "Redirect",
custom = (string)null // optional so we need to set it null
}
);
campaign : possibly don't want the word 'campaign' in the actual link -- since users will see it in the URL bar. i might change this to just something cryptic like 'c'.
destination : dictates which page on our site the link will take the user to. For instance PR to direct the user to products page.
partnerid : the ID for the company that we've assigned - such as SO for Stack overflow.
campaignid : campaign id such as 123 - unique to each partner. I have realized that I think I'd prefer for the 3rd party company to be able to manage the campaign ids themselves rather than us providing a website to 'create a campaign'. I'm not
completely sure about this yet though.
custom : custom data (optional). i can add further custom data parameters without breaking existing URLS
Note: the reason i have 'destination' is because the campaign ID is decided upon by the client so they need to also tell us where the destination of that campaign is. Alternatively they could 'register' a campaign with us. This may be a better solution to avoid people putting in random campaign IDs but I'm not overly concerned about that and i think this system gives more flexibility.
In addition we want to know perhaps which image they used to link to us (so we can track which banner works the best). I THINK this is a candiate for a new campaignid as opposed to a custom data field but i'm not sure.
Currently I am using a very primitive URL such as http://example.com?cid=123. In this case the campaign ID needs to be issued to the third party and it just isn't a very flexible system. I want to move immediately to a new system for new clients.
Any thoughts on future proofing this system? What may I have missed? I know i can always add new formats but I want to use this format as much as possible if that is a good idea.
This URL:
"campaign/{destination}/{partnerid}/{campaignid}/{custom}",
...doesn't look like a resource to me, it looks like a remote method call. There is a lot of business logic here which is likely to change in the future. Also, it's complicated. My gut instinct when designing URLs is that simpler is generally better. This goes double when you are handing the URL to an external partner.
Uniform Resource Locators are supposed to specify, well, resources. The destination is certainly a resource (but more on this in a moment), and I think you could consider the campaign a resource. The partner is not a resource you serve. Custom is certainly not a resource, as it's entirely undefined.
I hear what you're saying about not wanting to have to tell the partners to "create a campaign," but consider that you're likely to eventually have to go down this road anyway. As soon as the campaign has any properties other than the partner identifier, you pretty much have to do this.
So my first to conclusions are that you should probably get rid of the partner ID, and derive it from the campaign. Get rid of custom, too, and use query string parameters instead, should it be necessary. It is appropriate to use query string parameters to specify how to return a resource (as opposed to the identity of the resource).
Removing those yields:
"campaign/{destination}/{campaignid}",
OK, that's simpler, but it still doesn't look right. What's destination doing in between campaign and campaign ID? One approach would be to rearrange things:
"campaign/{campaignid}/{destination}",
Another would be to use Astoria-style indexing:
"campaign({campaignid})/{destination}",
For some reason, this looks odd to a lot of people, but it's entirely legal. Feel free to use other legal characters to separate campaign from the ID; the point here is that a / is not the only choice, and may not be the appropriate choice.
However...
One question we haven't covered yet is what should happen if/when the user submits a valid destination, but an invalid campaign or partner ID. If the correct response is that the user should see an error, then all of the above is still valid. If, on the other hand, the correct response is that the user should be silently taken to the destination page anyway, then the campaign ID is really a query string parameter, not a part of the resource. Perhaps some partners wouldn't like being given a URL with a question mark in it, but from a purely REST point of view, I think that's the right approach, if the campaign ID's validity does not determine where the user ends up. In this case, the URL would be:
"campaign/{destination}",
...and you would add a query string parameter with the campaign ID.
I realize that I haven't given you a definite answer to your question. The trouble is that most of this rests on business considerations which you are probably aware of, but I'm certainly not. So I'm more trying to cover the philosophy of a REST-ful URL, rather than attempting to explain your business to you. :)
I think the URL rewriting is getting out of hand a little bit lately. Not everything belongs to the URL. After all, a URL is supposed to describe a resource that can be searched for, discovered or manipulated and it seems to me that at least the partner ID and the custom fields from above are not part of the resource.
Not to mention that that at some point you would like to actually keep the partner ID constant across multiple campaigns and that means that it is now orthogonal to the particular places they need to visit. If you keep these as parameters, you will allow your partners to access uniformly multiple resources on your website, while still reliably identifying themselves, so you can track their participation in any of your campaigns.
It looks like you've covered all of your bases. The only suggestion I have is to change
{custom}
to
{*custom}
That way, if you ever need to accept further parameters, you don't have to take the chance that old URLs will get a 404. For example:
If you have a URL that looks like:
campaign/PR/SO/123
and you decide in the future that you would like to accept a fourth and fifth parameter:
campaign/PR/SO/123/blah/foo
then the first URL will still be valid, because you're using a wildcard character in {*custom}. "blah/foo" would be passed as a string to your action. To get those extra two parameters, you would simply split the custom argument in your action by '/'. Add some friendly error handling if they don't exist and you've successfully changed the amount of information you can receive with a campaign URL without completely breaking URLs already in the wild.
Why not use URL encoded variables instead of routes? They're a lot more flexible - you can add any new features in the future while still maintaining 100% backwards compatibility. Admittedly, it's a little more trouble to type manually, but if there's all those parameters anyway, it's already no picnic.
http://mysite.com/page?campaign=1&dest=products&pid=15&cid=25
To me, this is much more indicative of what is really going on. Using paths implies a that a resource exists at that location. But really you're just providing a web service with various parameters, and this model captures that much more clearly. And in the future, you can add more parameters effortlessly. You can also default parameters if they are missing without messing anything up.
Not sure of the code in ASP, but it should be trivial to implement.
I think I'd look at doing it the way that SO does it's questions.
"campaign/{campaign-id}/friendly-name-of-campaign"
Create a mapping in your database when the campaign is created that associates all the data you need with an automatically generated id. The friendly name could be assigned basically the same way as a question is on SO -- by the user -- but you could also have an approval process that makes sure that it meets your requirements and is distinct from any existing campaign names. Your tracking company can track by the id and you can correlate that with your associated data with a simple look up.
What you have looks good for your needs. The other posts here have good points. But may not be suitable for you. One thing that you could consider with future proofing your links is to put a version number somewhere in there.
"campaign/{version}/{destination}/{partnerid}/{campaignid}/{custom}"
This way if you decide to completely change your format you can up the version to 2.0 (or whatever) and still keep track of the old links coming in.
I would do
/c/{destination}/{partnerid}/{campaignid}/?customvar=s
You should think about the hierarchy of the first parameters, you already got that managed quite well. Only if there's a hierarchy path segments should be used.
From your description, destination seems to be the broadest parameter, partnerid only works with destination, and campaingid is specific to a partner.
When you really need to add custom parameters I would go for query variables (they are not forbidden in REST), because these are not part of the hierarchy.
You also shouldn't try to be too RESTful here. After all, it's for a campaign and for redirecting to a final resource. So the URL you want to design here is not really a specific resource in the terms of REST.
Create an URL called http://mysite.com/gateway
Return an HTML form, tell your partners to fill in the form and POST it. Redirect based on the form values.
You could easily provide your partners with the javascript to do the GET and POST. Should be trivial.
The most important thing i have learned about REST URL´s thats usually burried deep in some book or article:
The URL should point to a resource and the following ?querystring should have all the scoping information needed. DONT mix those two or you will have a design thats very hard to work with.
Other then that i fully agree with Craig Stuntz

Using Yahoo! Pipes

Have you used pipes.yahoo.com to quickly and easily do... anything? I've recently created a quick mashup of StackOverflow tags (via rss) so that I can browse through new questions in fields I like to follow.
This has been around for some time, but I've just recently revisited it and I'm completely impressed with it's ease of use. It's almost to the point where I could set up a pipe and then give a client privileges to go in and edit feed sources... and I didn't have to write more than a few lines of code.
So, what other practical uses can you think of for pipes?
It's nice for aggregating feeds, yes, but the other handy thing to do is filtering the feeds. A while back, I created a feed for Digg (before Digg fell into the Fark pit of dispair). I didn't care about the overwhelming Apple and Ubuntu news, so I filtered those keywords out of Technology, which I then combined with Science and World & Business feeds.
Anyway, you can do a lot more than just combine things. If you wanted to be smart about it, you could set up per-subfeed and whole-feed filters to give granular or over-arching filtering abilities as the news changes and you get bored with one topic or another.
The one thing I have really used Y! Pipes for (rather than just playing around with it) is to clean up item titles, merge and finally de-dupe the feeds I got from querying multiple blog search engines with the same search term. This is something I’ve done in several very different contexts, eg. for my own ego surfing, in another case for the planet site set up by some conference’s organisers to keep an eye on their conference’s buzz, etc. Highly recommended.
You can do tons of things with pipes. For example for sites like digg or reddit, you can make one to bypass the site and go directly to the linked article (rewriting the RSS).
I like also to filter webcomics' feeds to keep just the comics, and then mix them all in only one feed
I've taken the liberty of copying your pipe and rearranging it a bit so that it's easier to add and remove tags:
Yahoo Pipe: StackOverflow Merge Tags
Tags are now listed in a string builder, so to add a tag you just have to hit the + button on the string builder and type in the tag preceded by a slash.
Well, pipes are real fast and useful.
Other effective uses might be:
1) combine many feeds into one, then sort, filter and translate it.
2) geocode your favorite feeds and browse the items on an interactive map.
3) power widgets/badges on your web site.
4) grab the output of any Pipes as RSS, JSON, KML, and other formats.
This is by no means a comprehensive list.
One of my favorite things to do with Yahoo! Pipes is to aggregate multiple craigslist feeds into a single feed. You can make a feed out of any category or search criteria on craigslist. I live in a university town and am always on the lookout for tickets to sporting events, for example. I have a half-dozen craigslist searches all being combined into a single feed via Yahoo! Pipes. This works a lot better for me than simply monitoring the entire "Tickets" category; filters out most of the tickets I am not interested in. Yes, this is another aggregating feeds example, but the craigslist usage is quite valuable with the ability to aggregate feeds that are themselves based upon searches.
I've used Pipes to translate blogs into English. I would have liked to use it to fetch the full text for blogs which only provide a summary of the content in the feed, but unfortunately they don't provide any input which fetches the content from a parameterizable source :-(.
Just stumbled on this while looking for ways to connect Excel to Pipes. A bit necromancer-ish, but here goes.
One thing I've done, is take an HTML page (science data) which has links to tons of CSV files for a bunch of Army Corps measurement stations. Each station has a big table of datafiles, all organized individually by month and year. I use YQL to parse out and organize the links to the individual CSV files in a way that Pipes can read them. Then, I use that as input into a Pipe, which has a user input for "Station" and "Date."
Using this, I can go to the Pipes page, type in those values and get the values only for a specific station and date, rather than have to find the station on a website, find the year and month in a big table, click the link, open the CSV file, and find the values for a day within that month's worth of data. I can even change the pipe to specify the hour, and the parameter, and then get a single value returned.
Now, I wish I could figure out how to program Excel so that I can use "=yahoo_function(station, datetime)" to place that value automatically into a cell give the values of other columns!

Resources