capture details from external web page - http

I'm wondering if it's possible to capture details from the web page that a user previously visited, if my page was not linked from it?
What I am trying to achieve is this: users of my site find a page they like while browsing the web, then navigate to a page on my site via a bookmark. That page adds the URL (and possibly some other details, like the page title) to a form, which they can then submit to add the page to a list of favourites on my site.
I am not really sure where to start looking for this. I wondered if I could use the HTTP referrer, but I think this may only work if there is a link to my page?
Alternatively, I am open to other suggestions as to how I could capture this data - a Firefox plugin? A page which users browse other sites in an iframe, with a skinny frame on top?
Thanks in advance for your suggestions.

Browsers typically do not allow features like this, for security and privacy reasons. An iframe would work, but framing other sites is a common attack technique, so it is likely to break or be blocked in the future.
A Firefox add-on is the best solution, but it requires users to install it manually.
A bookmarklet could also be used. While the user is on the target page, the bookmarklet can send you its URL.
This example bookmarklet would create a tinyURL for the destination page. You could add it to your database or whatnot.
javascript:void(window.open('http://tinyurl.com/create.php?url='+encodeURIComponent(document.location.href)));
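For the use case in the question, a bookmarklet along the same lines could pass both the URL and the page title to a form on your own site. The following is only a sketch: the endpoint https://example.com/favourites/add and the parameter names url and title are hypothetical placeholders, and the form page would still need to read them and prefill its fields.

```javascript
// Build the address of the (hypothetical) "add favourite" form,
// carrying a page's URL and title as query parameters.
function buildSaveUrl(formUrl, pageUrl, pageTitle) {
  const target = new URL(formUrl);
  target.searchParams.set("url", pageUrl); // searchParams escapes the values
  target.searchParams.set("title", pageTitle);
  return target.toString();
}

// Produce a javascript: URL that can be saved as a bookmark.
// Assumes formUrl has no query string of its own.
function makeBookmarklet(formUrl) {
  return "javascript:void(window.open(" + JSON.stringify(formUrl) +
    "+'?url='+encodeURIComponent(location.href)" +
    "+'&title='+encodeURIComponent(document.title)))";
}
```

The string returned by makeBookmarklet is what the user would drag to their bookmarks bar; buildSaveUrl shows the same encoding as an ordinary function.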

If some other site links to yours, and the user clicked that link to reach your site, you can read the "referrer" from the HTTP headers. How you get hold of the HTTP headers is language and framework specific. In .NET you would use Request.UrlReferrer; other frameworks would probably handle it differently.
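As a rough illustration in JavaScript, a Node.js server could read the same header from the incoming request. This is a sketch, not part of the original answer: the helper name referrerHost is made up, and it assumes a headers object shaped like Node's req.headers, where header names are lowercased.

```javascript
// Return the host of the referring page, or null when no usable
// Referer header was sent (typed URL, bookmark, or stripped by the browser).
function referrerHost(headers) {
  const raw = headers["referer"]; // the header name is spelled "referer" in HTTP
  if (!raw) return null;
  try {
    return new URL(raw).host;
  } catch {
    return null; // malformed header value
  }
}
```

When the user arrives from a bookmark, as in the question, the header is normally absent and this returns null, which is why the referrer alone cannot solve the problem.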
EDIT: After reading your question again, my guess would be what you're looking for is some sort of browser plugin. If I understand correctly, you want to give your clients the ability to bookmark a site, while they are on that site, which would somehow notify your site about the page they're viewing. The cleanest way to achieve this would be a browser plugin. You can also do FRAME tricks, like the Digg bar.

Related

How to check search results of website on google?

I am working on a website, "https://datasiplus.com".
When I type datasiplus into Google, the third result is this URL: "https://datasiplus.com.cutestat.com/".
Is this normal?
Could it be the cause of my website showing unwanted popup ads?
How to check search results of website on google?
You can see all indexed pages from your website (domain) if you go to Google Search and type the following:
site:datasiplus.com
cutestat.com is it normal?
This page is a tool for getting information about a specific domain. It estimates the value, the traffic and a lot more. Either the tool crawled your page automatically or someone searched for your domain with it.
There is a form on their site where you can request removal of your domain from cutestat.com.
So yes, it's normal for this to be in Google's index, because it's like a subpage of their tool, and datasiplus is a keyword for both sites: yours and datasiplus.com.cutestat.com.
If you go to Google now and search for datasiplus, you can already see your own question there.
Can it be the cause for my website having unwanted popup ads?
No, this page will not cause unwanted popup ads on your page (or any other page).
Popups like this are most probably caused by malware on your page. It may have been introduced through security holes in WordPress and/or in one of the plugins you are using.
To get started finding and removing such malware, you can start at this SO question

Logged-in Users need to Refresh page to see content

Hi, I'm having an issue with a site where visitors need to be members to access certain pages. Once logged in, they go to these pages but still see the 'not logged in' page, and need to refresh to view the actual content.
This obviously leads to a lot of bounces and I'd like to fix so that they see the content right away.
The root issue comes from some cache settings or something from the host - unfortunately we can't change host (and it's not a regular hosting company with a website but a design company reseller) for the time being. This issue does not occur in our offline environment of the same site.
I've already had to add a ?randomnumber to the stylesheet so it loads new versions properly. I was wondering if something like this would work - but dynamically as pages are being added all the time by different admins.
Or any other solutions also appreciated!
Thanks
Like you said, tweaking the caching settings would be ideal. But since that's not an option, I'd suggest adding a random, meaningless query string to the URL of the member pages, so that each request is seen as a 'new page' and (likely) won't be served from cache.
So instead of /member-page
Direct them to /member-page?cache-buster=randomlyGeneratedStringHere
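One way to generate such links dynamically, so that pages added by different admins need no manual changes, is a small helper applied wherever member links are built. This is a minimal sketch: the function name addCacheBuster and the token format are assumptions, and it is meant for same-site paths only.

```javascript
// Append a throwaway "cache-buster" parameter to a same-site path.
// The token defaults to a timestamp plus random characters.
function addCacheBuster(
  path,
  token = Date.now().toString(36) + Math.random().toString(36).slice(2)
) {
  // A placeholder base lets the URL parser handle relative paths.
  const url = new URL(path, "https://placeholder.invalid");
  url.searchParams.set("cache-buster", token);
  return url.pathname + url.search + url.hash;
}

// Example: rewrite the post-login redirect target.
// const target = addCacheBuster("/member-page");
```

Existing query parameters are preserved, so a path like /member-page?tab=2 simply gains one extra parameter.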

Does Wordpress list all pages for crawlers?

I created a page on a WordPress site that was for internal use only and triggers some backend code. Within a few days I started seeing hits on that page from "bingbot".
I'm not using any kind of sitemap plugin. How are crawlers finding this page?
I know the robots.txt file can block them, but I want to make sure the page doesn't show up for crawlers that don't respect it. I still want the page to be publicly accessible if someone types in the URL.
What needs to be done in WordPress to make sure a page can't be discovered except by typing in the URL?
Any given URL is potentially "discovered" once the post is published and if there's a link to it from elsewhere on your site. There's no guaranteed way to prevent search engines from indexing a URL.

LinkedIn sharing doesn't work as expected

I'm looking for an easy way to share through LinkedIn without the hassle of OAuth 2.0, which doesn't seem to be required on other pages that offer this kind of sharing (they didn't require anything from me; I could share straight away).
Straight to the issue:
this one works: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me
this one doesn't: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me%2Fjob%2F494
It seems I can't get sharing to work for anything beyond the main domain. For comparison, here is a link from another site that goes deeper and is still shareable: https://www.linkedin.com/shareArticle?mini=true&url=https://bulldogjob.pl/companies/jobs/2043-programista-java-warszawa-bms-sp-z-o-o&title=Programista+Java&summary=&source=https://bulldogjob.pl
I also tested with and without the source and summary query params. Has anyone had this issue?
LinkedIn uses the Open Graph protocol (http://ogp.me/) to determine how pages are shared in LinkedIn.
You may also use the LinkedIn Post Inspector (https://www.linkedin.com/post-inspector/) tool to debug how various pages would be shared in LinkedIn.
I decoded your URL so I could get a cleaner look...
https://www.linkedin.com/sharing/share-offsite/?url=https://refair.me/job/494
So, let's try to visit your URL: https://refair.me/job/494 . The webpage you are sharing DOES NOT LOAD.
Is your site down for everyone? Yes, your site is down for everyone.
In order to share a URL on LinkedIn, you must fulfill the following minimum requirements:
The URL must load.
If you just want to test out the API, try using wikipedia.org or google.com as test pages.
Surprisingly, the old refair.me URL by itself works fine in LinkedIn, but that could be from some internal cache, from way back in the day when the page once did work. It certainly does not do so anymore.
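For reference, the share links from the question can be built by percent-encoding the target page and appending it as the url parameter. This sketch only reproduces the URL format shown in the question; the function name is made up, and it does not check that the target page actually loads.

```javascript
// Build a LinkedIn share-offsite link for a page, using the URL
// format that appears in the question.
function buildLinkedInShareUrl(pageUrl) {
  return "https://www.linkedin.com/sharing/share-offsite/?url=" +
    encodeURIComponent(pageUrl);
}

// buildLinkedInShareUrl("https://refair.me/job/494") gives
// https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me%2Fjob%2F494
```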

Site Hijacking RSS Feed and Entire Site in iFrame

The following site appears to be hijacking a client's content.
http://mothernova2.rssing.com/chan-24556607/latest.php
This is my client's site.
http://www.mothernova.com/
How would I go about blocking that domain from accessing the site? It also appears they are pulling the site into an iframe allowing full browsing.
FYI, the site is using WordPress, WordFence and iThemes Security (if there are any settings I should add for blocking).
You need to use a framekilling script, which uses JavaScript to check whether your page is the top-level window and, if not, replaces the framing page with yours. Here's a simple version:
<script type="text/javascript">
if(top != self) top.location.replace(location);
</script>
One drawback to this approach: if there is a legitimate site iframing your code, you need to check the referrer and start adding exceptions.
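A version with exceptions might look something like the sketch below. The allow-list and the host partner.example are hypothetical, and it assumes document.referrer inside the frame still carries the framing page's address, which a referrer policy can trim or remove.

```javascript
// Decide whether a framed page should break out of its frame.
// allowedHosts is a hypothetical allow-list of sites permitted to embed the page.
function shouldBreakOut(isFramed, referrer, allowedHosts) {
  if (!isFramed) return false;
  try {
    return !allowedHosts.includes(new URL(referrer).hostname);
  } catch {
    return true; // empty or malformed referrer: treat the framing site as unknown
  }
}

// In the browser this could be wired up as:
// if (shouldBreakOut(top !== self, document.referrer, ["partner.example"])) {
//   top.location.replace(location.href);
// }
```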
And a question to answer before you do it: you're getting a pageview and ad impression from the annoying framing site; is there any reason why you need to go to the bother, when they're sending a few viewers to your client's content?
The owners of rssing.com are well-known scrapers. They are grabbing your content via RSS, hence the name rssing.com.
You can use the contact form to ask that they take your content down. Tell them they are clearly violating your TOS and copyright for your content.
(I had to do this in the past for my own content scraped from my site; they did remove my site at my request.)
Maybe I wasn't implementing the above suggestions correctly (I was adding them at the page level), but they weren't working for me. I did find this post and it seems to work as outlined.
http://forum.ait-pro.com/forums/topic/rssing-com-good-or-bad/
I updated my .htaccess file with the suggested code.
Brett
