Use Google Search Appliance Test Center as my own .aspx page - asp.net

How would I use the view Google provides in the Test Center (where I test my front end)?
When a user browses to site/search.aspx, I want them to get the view the Test Center shows, search boxes and everything. I would also like to add my own JavaScript and CSS to the page.
Is this possible?
Right now I have created a search box with an UpdatePanel to show the results, but this approach forces me to do a lot of parsing and variable setting for the dynamic navigation, i.e. a lot of logic Google already provides in the Test Center.
By the way, I don't want to use the McA+ library that supports GSA 6.14.

I serialized the XML result from the GSA into C# objects and then fed them to my front end, where I could handle them.

An example of converting XML to HTML using XSL in C# ASP.NET can be found at: http://www.codeproject.com/Articles/469723/Rendering-XML-Data-as-HTML-using-XSL-Transformatio

Related

Dynamic Open Graph tags in a Single Page Application

I am trying to inject og: tags in my React app. I came across https://github.com/nfl/react-helmet and it dynamically injects the tags in my index.html header just like I wanted. The problem is that it injects the tags at the end of the head, and thus they were not recognised by the Facebook debugger. It works when the Open Graph tags appear right at the beginning of the head, before the script tags. With react-helmet, however, they are injected at the extreme end. How do I best fix this? I am trying to have article previews on social media, and this is failing just because of the arrangement. Any help would be appreciated.
Well, I don't think it is because of the arrangement.
As far as I remember, Facebook doesn't execute JavaScript code at the provided URL.
Facebook's scraper just looks at the HTML code of your page; it's not a full-fledged "browser" that would execute any client-side code.
With that being said, whatever meta tags you need there can't be added via JS on the client side; they must be server-side rendered.
I am not sure what technology you are using to serve this app, but I assume it is a React app, and it would be easy to handle this via a small Express server that serves the app with the right meta tags already in place.
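A minimal sketch of that idea, assuming an Express server in front of a built React app; the route, the build/index.html path and the getArticle() lookup are illustrative placeholders, not from the thread. The point is simply that the tags are in the HTML before it leaves the server, so the scraper sees them without running any JavaScript:

```javascript
// server.js — hedged sketch: serve the SPA with Open Graph tags rendered server-side.
const express = require("express");
const fs = require("fs");
const path = require("path");

const app = express();
const template = fs.readFileSync(path.join(__dirname, "build", "index.html"), "utf8");

// Hypothetical lookup; in a real app this would hit your database or CMS.
function getArticle(slug) {
  return {
    title: "Example article",
    description: "Preview text for social media",
    image: "https://example.com/cover.png",
  };
}

app.get("/article/:slug", (req, res) => {
  const article = getArticle(req.params.slug);
  const ogTags = `
    <meta property="og:title" content="${article.title}" />
    <meta property="og:description" content="${article.description}" />
    <meta property="og:image" content="${article.image}" />`;

  // Inject the tags right after <head> so they sit before any script tags.
  res.send(template.replace("<head>", "<head>" + ogTags));
});

// Everything else (static assets, the SPA bundle) is served as-is.
app.use(express.static(path.join(__dirname, "build")));

app.listen(3000);
```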

Extracting content data from webpages

I'm looking to get structured article data from webpage URLs. So far I've found these two services: http://www.diffbot.com/ and http://embed.ly/extract/demos/nlp. Are there better alternatives, or is it worthwhile to write the code to do this myself?
If you'd like to skip the code, and are looking for a simple software for web scraping / ETL applications, I'd suggest Foxtrot. It's easy enough to use and doesn't require coding. I use it to scrape data from certain gov't websites and dump it into an Excel spreadsheet for reporting purposes.
I have done web scraping / content extraction for quite some time now.
For me the best approach is to write a Chrome extension and automate the browser with its API. This requires that you know JavaScript and HTML. In one of my recent projects I used a background page with a couple of editable divs to configure the scraping session. I have some buttons on the background page to start the process. The background page loads a JS script which listens for the buttons' click events.
When one of the buttons is clicked, I add a new tab for the scraping session with chrome.tabs.create. The background JS also registers chrome.tabs.onUpdated.addListener handlers to inject content scripts when the tab URL contains a specific page/domain name.
The content script then does the scraping job, for example selecting elements with jQuery, regular expressions, etc., and finally sends a message with an object back to the background JS using chrome.runtime.sendMessage. The background JS script listens for messages with chrome.runtime.onMessage.addListener and acts based on the content being extracted.
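A hedged sketch of that background/content-script message flow, assuming a Manifest V2 style background page as described above; the file names, the example.com domain, the button ID and the CSS selector are all illustrative:

```javascript
// background.js — runs in the background page (Manifest V2 style sketch).
document.getElementById("start").addEventListener("click", () => {
  // Open a new tab for the scraping session.
  chrome.tabs.create({ url: "https://example.com/listing" });
});

// Inject the content script once a tab on the target domain has finished loading.
chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
  if (changeInfo.status === "complete" && tab.url && tab.url.includes("example.com")) {
    chrome.tabs.executeScript(tabId, { file: "content.js" });
  }
});

// Collect whatever the content scripts send back.
chrome.runtime.onMessage.addListener((message, sender) => {
  if (message.type === "scraped") {
    console.log("Extracted from", sender.tab.url, message.items);
    // ...store, de-duplicate, or upload the items here.
  }
});
```

```javascript
// content.js — runs inside the scraped page and reports back to the background JS.
const items = Array.from(document.querySelectorAll(".result-title"))
  .map(el => el.textContent.trim());

chrome.runtime.sendMessage({ type: "scraped", items });
```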
The extension also automates paging through web databases by clicking, for example, the next-page links.
I have added a timing setting to control the number of links clicked / tabs opened per minute, so that access is slowed down on purpose and excessive crawling is avoided.
Finally, the results are uploaded to a database with an AJAX call and inserted into MySQL by a PHP page.
When the extension runs the next time, it compares (with another AJAX call) the keys/links that already exist in the database and ensures that only new information is extracted.
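A rough sketch of that upload and de-duplication step; the existing-keys.php and insert.php endpoints are placeholders, not from the answer, with the PHP/MySQL side left out:

```javascript
// Hedged sketch: upload only the items whose keys the server doesn't already have.
async function uploadNewItems(items) {
  // Ask the server which keys already exist in MySQL.
  const existing = await fetch("/existing-keys.php").then(r => r.json());
  const fresh = items.filter(item => !existing.includes(item.key));

  if (fresh.length > 0) {
    await fetch("/insert.php", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(fresh),
    });
  }
}
```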
I have also built extensions like the above for Firefox, but the best and easiest solution for me is a Chrome/Chromium extension.

Google Analytics event tracking for PDF downloads, generic approach

I need to implement GA event tracking for PDF file downloads. I have searched a lot and found plenty of code where I can add something to each link and track it from GA's content section, but the problem is that I have a lot of PDF links on the page and don't want to edit every link; I also want the code to be generic for links uploaded in the future.
So what would be the best approach for this task? Any reference links or code would be highly appreciated.
Thanks in advance.
You can explore the use of Google Tag Manager, where you can create a generic tag that will return information for each individual link. GTM uses things called "macros", which are like templates that return useful information, including the clicked element's ID or pathname (which in your case, for the PDF files, would all be different). So you would only need to call this macro each time a PDF file is clicked. No coding is involved using this standard approach through GTM. Here's a link to a descriptive explanation: http://porcelainduck.com/2014/03/track-pdf-downloads-google-tag-manager/. You can see that it uses the {{element URL}} macro, which returns the PDF's unique URL. GTM applies not only to current links but also to all future links.
Based on the tags you've used to mark your question, you're using C# and ASP.NET.
If that's the case, can't you create a base page that, on rendering, rewrites all the PDF links to include the tracking code?
I would recommend adding a click event with jQuery to all links, or to all links inside the download widget class. Inside the click handler I would then grab the link text and use that as part of the Google Analytics event you fire.
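A minimal sketch of that, assuming Universal Analytics' ga() is already loaded on the page; the .pdf selector and the category/action names are illustrative. Using a delegated handler means PDF links added to the page later are covered as well, which addresses the "future uploaded links" requirement:

```javascript
// Hedged sketch: fire a GA event for every click on a link ending in .pdf.
// Delegating to document means links added after page load are tracked too.
$(document).on("click", "a[href$='.pdf']", function () {
  var href = $(this).attr("href");
  var label = $.trim($(this).text()) || href;

  // Universal Analytics; with gtag.js you would call gtag('event', ...) instead.
  ga("send", "event", "PDF", "download", label);
});
```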

Custom Parser for Nutch (or open source .NET Crawler)

I have been using Nutch/Solr/SolrNet for my search solutions, and I must say it works a treat. On a new site I'm working on I am using master pages; as a result, content in the header and footer is getting indexed and distorts the results. For example, I have a link to the Contact Us page in the header. Now, when I search for 'Contact', the results include every page on the site.
Is there a customizable Nutch parser to which I can pass, say, a div id so that it only indexes content inside that div?
Or are there .NET-based crawlers that I can customize?
See https://issues.apache.org/jira/browse/NUTCH-585
and https://issues.apache.org/jira/browse/NUTCH-961
BTW, you'd get a more relevant audience by posting to the Nutch user list.
You can implement a Nutch filter (I like Jericho HTML Parser) to extract only the parts of the page you need to index using DOM manipulation. You can use the TextExtractor class to grab clean text (sans HTML tags) to be used in your index. I usually save that data in custom fields.

re-rendering a site within an iframe?

I want to make a site where the user can basically navigate the web from within an iframe. The catch is that I'd like to be able to have more control over what is rendered within the iframe. Specifically,
I'd like to be able to filter out images or text, disable forms etc.
I'd also like to be able to gather feedback such as what links the users clicked on.
Question 1:
Is this even possible using a standard back-end scripting language (like PHP), with HTML and JavaScript on the front end?
Question 2:
Would I first need to grab the source of the site before it is rendered, then do whatever manipulation is necessary, and finally re-render it somehow?
Question 3:
Could somebody please explain the programming flow that would occur here (assuming it's possible)?
I think you would probably want to grab the source of the site (with server-side code) before rendering it. You might run into cross-site scripting issues if you try to use JavaScript. Your iframe would load a page like render.php and pass the address of the page to render as a querystring parameter. Then use regular expressions to find elements in the HTML that render.php downloads from the address. Rewrite the HTML as necessary and then write it all out to the iframe.
Rewrite links so that the user is taken to a page you control and then redirected on to the target site if you want to track where people are going. Example: a link in the page needs to go to google.com. You would send them to tracker.php?target=http://google.com. You control tracker.php, so you can log each load of this page and then redirect the user to the target site.
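A hedged sketch of that render/tracker flow. The answer describes it in PHP (render.php, tracker.php); this sketch uses Node/Express instead, purely to keep the examples in this write-up in one language, and the routes, regexes and logging are illustrative only:

```javascript
// server.js — Node/Express analogue of the render.php + tracker.php flow above.
const express = require("express");
const app = express();

// GET /render?target=http://example.com — fetch server-side, rewrite, return to the iframe.
app.get("/render", async (req, res) => {
  const target = req.query.target;
  if (!target) return res.status(400).send("missing ?target=");

  const html = await (await fetch(target)).text(); // Node 18+ global fetch

  const rewritten = html
    // Strip <img> tags and <form> blocks, as the question asks.
    .replace(/<img[^>]*>/gi, "")
    .replace(/<form[\s\S]*?<\/form>/gi, "")
    // Send every absolute link through the tracker so clicks can be logged.
    .replace(/href="(https?:\/\/[^"]+)"/gi,
             (match, url) => `href="/tracker?target=${encodeURIComponent(url)}"`);

  res.send(rewritten);
});

// GET /tracker?target=... — log the click, then redirect to the real site.
app.get("/tracker", (req, res) => {
  console.log("clicked:", req.query.target); // swap for a real log/DB write
  res.redirect(req.query.target);
});

app.listen(3000);
```

As noted below, this kind of rewriting is brittle against pages you don't control, so real code would need far more error handling than this sketch shows.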
Update:
Another possible solution is to use Apache or another server to proxy the target website. There are modules like mod_proxy for this. There may also be modules that let you parse the HTML, or you could roll your own.
I should point out that even the best solutions offered to your question will be somewhat brittle if you do not have full control over the target site. You will want to have lots of error handling or alerting.
You can have a look at this. It uses iframes really well, and you could maybe even use the library it provides.
