We'd like to implement Open Graph on an intranet application, so that when people share a URL from the application on a social network (Yammer, Jive, Chatter, ...), it shows a nice thumbnail, description, and so forth.
The problem: because Yammer has no access to the intranet, its scraper follows the redirect and ends up serving Open Graph data taken from the login page...
Is there a way to behave properly in such a case?
We've come up with 3 possible solutions:
Implement some (unknown to us, but possibly existing) part of the Open Graph protocol for serving private pages, working around the redirects as well as possible
Do some kind of cloaking: detect that the agent is Yammer or Chatter and serve a dedicated page
Keep the Open Graph metadata in some kind of session and serve it from the login page (where the social network's scraper eventually ends up...)
Thanks for your input if you've been confronted with this problem too!
The third solution sounds like the best one. Since your rules allow part of the data to be shown outside the intranet, you can add an individual thumbnail and description to the meta tags of the login page.
If a user is logged in, they see all the data of the page yoursite.com/username/post123/ as usual.
But if the user is not logged in (as any bot is), they see the login form (with the thumbnail and description in the meta tags) at the same address, yoursite.com/username/post123/.
So all bots will see the proper OG data, and all users will be able to log in as usual.
(In other words, you shouldn't redirect not-logged-in visitors to yoursite.com/loginpage; you have to show a login form at every such address instead.)
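To make the idea concrete, here is a minimal sketch of how it could look in ASP.NET Web Forms (the stack used elsewhere in this thread). The PostPage class and the PostSummary/LoadPostSummary helpers are hypothetical names; the key points are that the protected URL must be reachable anonymously (no framework-level redirect to a login page) and that the og: tags are emitted into the head for unauthenticated requests:

```csharp
using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

// Sketch of the code-behind for a protected page such as /username/post123/.
// PostSummary and LoadPostSummary are hypothetical stand-ins for your own lookup;
// the page must be reachable anonymously (no web.config deny rule forcing a
// redirect) so it can render the login form itself.
public partial class PostPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!Request.IsAuthenticated)
        {
            // Only data you're allowed to expose outside the intranet.
            PostSummary summary = LoadPostSummary(Request.RawUrl);

            AddOpenGraphTag("og:title", summary.Title);
            AddOpenGraphTag("og:description", summary.Description);
            AddOpenGraphTag("og:image", summary.ThumbnailUrl);
            AddOpenGraphTag("og:url", Request.Url.AbsoluteUri);

            // Render the login form here, on this same URL,
            // instead of redirecting to /loginpage.
        }
    }

    // Requires <head runat="server"> in the markup so Page.Header is available.
    private void AddOpenGraphTag(string property, string content)
    {
        HtmlMeta meta = new HtmlMeta { Content = content };
        meta.Attributes["property"] = property; // og: tags use "property", not "name"
        Page.Header.Controls.Add(meta);
    }

    // Hypothetical lookup returning only the public-safe summary of the post.
    private PostSummary LoadPostSummary(string rawUrl)
    {
        return new PostSummary
        {
            Title = "Post 123",
            Description = "A short public-safe description of the post.",
            ThumbnailUrl = "https://yoursite.com/thumbs/post123.png"
        };
    }

    private class PostSummary
    {
        public string Title { get; set; }
        public string Description { get; set; }
        public string ThumbnailUrl { get; set; }
    }
}
```

The scraper then reads real metadata at yoursite.com/username/post123/, while a human visitor gets the login form at the very same address.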
Related
From my app I want to share content on LinkedIn using a redirect URL. When the request comes from LinkedInBot, I want it to redirect to a different page from which the bot can get all the meta tags; but if the request comes from a browser, I want to show a different page (an iframe) to users.
For example:
I want to share the link "https://example.com/something/1000/redirect". After sharing it on LinkedIn, the posted content has "https://example.com/something/1000/social" as its href, which I don't want; I want the original redirect URL as the href on the content instead.
Everything works fine for Facebook and Twitter.
Typically, you do not want to do any type of redirecting for search engines, or they may misinterpret your redirection as abuse. Take a look at the Google Ads Policy...
The following is not allowed:
Engaging in practices that attempt to circumvent or interfere with Google's advertising systems and processes
Example 1: Cloaking (showing different content to certain users, including Google, than to other users)...
So, you may have metadata stored in og: tags, oEmbed pages, RDF format, dc: (Dublin Core) tags, or just plain simple HTML meta tags.
Why would you want to show one set of this information to the user and another set to the search engine? What harm does it do to have the user see what the search engine sees, and vice versa?
I have an iframe canvas app that requires accepting an auth dialog before use, i.e. from a browser's point of view, all requests are immediately redirected to the auth dialog if the user has not already accepted it.
This means that the Facebook parser cannot read the OpenGraph tags on my page because it can never get past the authorisation point.
On a broader scale, this problem would also occur on any page that requires a login before viewing.
What would be the best way to work around this issue?
One solution I have is to check whether the Organization that the client IP address belongs to is Facebook and, if so, let the request in without authorisation and display a page with no content except the OG tags. This is fine for robot access from Facebook, but what happens if an employee tries? I don't want my app to be disabled because someone from the policy team cannot properly verify it.
As to why I need OG tags - users regularly share the link to the app and it looks bad with just a URL as the title and description and no image.
You need to allow Facebook's URL scraper to view your objects.
The best way to do this is to check for the string 'facebookexternalhit' in the user-agent string. If it is present, return a stripped-down HTML page containing the OG tags for Facebook's scraper to read.
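As a rough illustration (not Facebook-official code), an ASP.NET code-behind could short-circuit requests from the scraper like this; the og: values below are placeholders for your app's real metadata:

```csharp
using System;
using System.Web.UI;

// Sketch: serve a bare OG-only page to Facebook's scraper, which identifies
// itself with a user agent containing "facebookexternalhit".
public partial class CanvasPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string userAgent = Request.UserAgent ?? string.Empty;

        if (userAgent.IndexOf("facebookexternalhit", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            Response.ContentType = "text/html";
            Response.Write(
                "<!DOCTYPE html><html><head>" +
                "<meta property=\"og:title\" content=\"My Canvas App\" />" +
                "<meta property=\"og:description\" content=\"Short description of the app.\" />" +
                "<meta property=\"og:image\" content=\"https://example.com/app-thumb.png\" />" +
                "<meta property=\"og:url\" content=\"https://apps.facebook.com/mycanvasapp/\" />" +
                "</head><body></body></html>");
            Response.End();   // the scraper never reaches the auth-dialog redirect below
        }

        // Normal browsers continue to the usual auth-dialog redirect here.
    }
}
```

Keep in mind that user-agent strings can be spoofed, so only metadata you are happy to make public should be exposed this way.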
As part of a webapp I'm building, there is an iframe that allows the currently logged in user to edit some content that will only be displayed in their own logged-in profile, or on a public page with no logged in users.
As the content will only be viewable by the user who entered it, or by a user on a public site, does this mean the risk of XSS is moot? If they can only inject JavaScript into their own page, then they can only access their own cookies, right? And if we then display that content on a public page that has no concept of a logged-in user (on a different subdomain), then there are no cookies to access, correct?
Or is my simplistic view of the dangers of XSS incorrect?
Anthony
Stealing authorization cookie information is actually not the only harm JavaScript injection can bring to other users. Redirects, form submits, annoying alerts and countless other bad things can happen. You should never trust HTML content provided by a user, nor display it to others as-is.
To avoid HTML injection while still allowing users to provide HTML, the general idea is to define a set of HTML tags that can bring no harm to other users, for example text, paragraph, or div tags, but not unchecked images or JavaScript. You parse the provided HTML and delete all but those tags.
You can use HtmlAgilityPack or any other library that helps you parse the HTML provided by the user. Then you can filter out and delete any unwanted markup and leave only the safe parts.
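A rough sketch of that whitelist approach with HtmlAgilityPack might look like the following; the tag list and the choice to strip every attribute are only illustrative, and for production use a maintained sanitizer library is preferable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Whitelist sanitizer sketch: keep a few harmless tags, strip everything else
// (including all attributes, so no onclick/javascript: tricks survive).
public static class HtmlSanitizer
{
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "p", "div", "b", "i", "strong", "em", "ul", "ol", "li", "br" };

    public static string Sanitize(string userHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(userHtml ?? string.Empty);

        // Snapshot the node list first, because we mutate the tree while iterating.
        var nodes = doc.DocumentNode.Descendants().ToList();

        foreach (var node in nodes)
        {
            if (node.NodeType == HtmlNodeType.Comment)
            {
                node.Remove();                      // drop comments entirely
            }
            else if (node.NodeType == HtmlNodeType.Element)
            {
                if (!AllowedTags.Contains(node.Name))
                {
                    // Drop the tag but keep its inner text so content isn't lost.
                    var text = doc.CreateTextNode(HtmlEntity.Entitize(node.InnerText));
                    node.ParentNode.ReplaceChild(text, node);
                }
                else
                {
                    node.Attributes.RemoveAll();    // no attributes at all
                }
            }
        }

        return doc.DocumentNode.InnerHtml;
    }
}
```

For example, Sanitize("<p onclick=\"evil()\">hi <script>steal()</script></p>") would come back as a plain paragraph with the script tag reduced to harmless text.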
Often an attacker will use multiple vulnerabilities when attacking a site. There are a couple of problems with allowing a user to XSS him/herself.
CSRF - A user can visit a malicious site which posts malicious data to his profile and is thus XSSed.
Clickjacking with content - See http://blog.kotowicz.net/2011/07/cross-domain-content-extraction-with.html
Next, if that content is displayed on the public page, it could redirect users to sites containing exploits that automatically take over the user's computer, or it could simply redirect them to porn.
Let's say I have an ASP.NET web application. I create an aspx page that shows a table containing users and email addresses. The user data is stored in a database, and when the page is requested by a logged-in user, html is generated to display the data. If the users requesting the page are not logged in, they are redirected to a sign-in page.
All of this is very standard.
My question is, is there any way the personal data could end up being indexed by a search engine (besides someone hacking into the site or an evil user publishing the data somewhere public)?
What if there was no requirement that users log in? Would the data then be indexed?
In general, search engines should index exactly what's visible to public visitors; Google will be unhappy with you if you expose something different to its spiders.
If you want to control which pages on your server are indexed, check out http://www.robotstxt.org.
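For instance, a robots.txt at the site root could look like this (the paths are placeholders); note that it only instructs well-behaved crawlers and does nothing to actually protect the data:

```
User-agent: *
Disallow: /members/
Disallow: /admin/
```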
If the users don't have to log in to access the data, then I see no reason why a search engine could not get access to it. Your data will be indexed if it's not protected by a login.
If there's a login mechanism, it will not be indexed.
IMO you should remove the login requirement from the profile page and also make a sitemap to give the search engines a list of users. You should only prevent guests from viewing users' extra information.
I've set up forms authentication on my Google Search Appliance. Is there a way to have a title and summary come back for protected pages? Currently, since they are all redirected to the login page, all search results are titled "Login." I'm using ASP.NET with the .NET Framework 3.5.
You need to either:
Configure the Search Appliance to authenticate against your server.
Allow the search engine through to your protected pages.
On some of our client sites we've gone with option 2, partly because of the dynamic nature of the protection (e.g. articles published in the last 30 days are open, but you need a subscription to see the archive), which didn't lend itself to using web.config settings.
We have a "Base Page" class that inherits from System.Web.UI.Page, and that all our pages inherit from.
In that class, we check a number of things, including the IP address and user agent of the calling client. If these match our search engine, we display a custom page layout that removes things like navigation, header, and footer (using a master page) and adds some additional metadata that we use for filtering; this way the search engine sees and indexes the entire content.
If these checks fail, then we check whether the user is authenticated and has a valid subscription.
If they don't have a valid subscription or aren't authenticated, we display a summary of the page, in place, along with a call to log in or register (using standard ASP.NET controls).
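A stripped-down sketch of that base page, with the crawler's IP address, user agent, master page name, and subscription check all as placeholders for whatever your site actually uses:

```csharp
using System;
using System.Web.UI;

// Sketch of the "base page" described above; every page inherits from it instead
// of System.Web.UI.Page. The crawler IP/user agent, Crawler.master, and
// HasValidSubscription are placeholders for your own implementation.
public class BasePage : Page
{
    private const string CrawlerIp = "10.0.0.50";           // your GSA's address
    private const string CrawlerUserAgent = "gsa-crawler";   // your GSA's user agent

    protected bool IsSearchEngineRequest { get; private set; }
    protected bool ShowSummaryOnly { get; private set; }

    protected override void OnPreInit(EventArgs e)
    {
        base.OnPreInit(e);

        string userAgent = Request.UserAgent ?? string.Empty;
        bool ipMatches = Request.UserHostAddress == CrawlerIp;
        bool agentMatches = userAgent.IndexOf(CrawlerUserAgent, StringComparison.OrdinalIgnoreCase) >= 0;

        if (ipMatches && agentMatches)
        {
            // Crawler: swap in a bare master page (no navigation/header/footer)
            // so the full content, plus the extra filtering metadata, is indexed.
            IsSearchEngineRequest = true;
            MasterPageFile = "~/Crawler.master";
        }
        else if (!Request.IsAuthenticated || !HasValidSubscription())
        {
            // Ordinary visitor without access: pages check this flag and render
            // a summary plus a log in / register prompt instead of the full body.
            ShowSummaryOnly = true;
        }
    }

    // Placeholder for the real subscription lookup.
    protected virtual bool HasValidSubscription()
    {
        return false;
    }
}
```

The master page swap has to happen in PreInit, which is why the check lives in OnPreInit rather than Page_Load.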
If the real title of your pages is something other than "Login", you probably haven't set the authentication up correctly: the title shown in results is whatever the GSA indexed during the crawl, so "Login" means the appliance was served the login page. I posted previously some tips on completing the SSO wizard here: http://www.mcplusa.com/blog/2009/02/completing-the-sso-wizard-on-the-google-search-appliance/