How to identify if a page request is for a preview - http

When I paste my website link into a social network (Facebook or Twitter, for example), the social network accesses my site to show a preview to the user.
I want to separate these accesses from real visits in my reports, but to do this I need to identify these cases.
Do these requests send any standard information, common to every site, that lets me identify that the request comes from a robot rather than a real user?

You should be able to identify those robots from their user-agent string.
For example, Twitter uses the User-Agent of Twitterbot. And Facebook crawler identification is documented here.

You can use the User-Agent to do that.
// The header may be absent, so fall back to an empty string.
$ua = $_SERVER["HTTP_USER_AGENT"] ?? "";
if (strpos($ua, "Twitterbot") !== false) {
    echo "TwitterBot";
} elseif (strpos($ua, "facebookexternalhit") !== false) {
    echo "Facebook";
} else {
    echo "regular user";
}
https://developers.facebook.com/docs/sharing/webmasters/crawler
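As a rough sketch of the same idea in JavaScript (for example on a Node.js server), the check can be wrapped in a small function that a reporting filter could call. The function name `classifyUserAgent` and the returned labels are made up for illustration. `Twitterbot` and `facebookexternalhit` are the tokens from the answers above; `Facebot` is a second Facebook crawler name that I am assuming you would also want to catch. Keep in mind that the User-Agent header can be missing or spoofed, so this is a heuristic, not proof.

```javascript
// Hypothetical helper: maps a User-Agent string to a coarse label.
// Tokens are matched as case-insensitive substrings.
const PREVIEW_BOTS = [
  { token: "twitterbot", label: "twitter" },
  { token: "facebookexternalhit", label: "facebook" },
  { token: "facebot", label: "facebook" }, // assumption: also used by Facebook
];

function classifyUserAgent(userAgent) {
  // The header may be absent, so treat null/undefined as an empty string.
  const ua = (userAgent || "").toLowerCase();
  const hit = PREVIEW_BOTS.find((bot) => ua.includes(bot.token));
  return hit ? hit.label : "regular user";
}
```

Requests labelled anything other than "regular user" could then be excluded from the visit reports.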

Related

Redirecting a URL to QR code data (.vcf / VCARD) - i.e., not a URL

I've accidentally made (and distributed) a QR code to a URL (important: not a VCARD data set like below). So now I need to redirect visits from the URL "directly" to the VCARD data:
BEGIN:VCARD
VERSION:3.0
N:Doe;John;
FN:John Doe
TEL;TYPE=CELL:54321
END:VCARD
I say "directly" because iPhones happily offer to save the contact when the URL points to a .vcf file, whereas, based on my tests, Android phones do not: they seem to need the QR code to contain the data set itself (I don't know of another way to get an Android phone to prompt to save the contact).
I studied some related posts, but they talk about getting the Android user to first download a VCARD file or an app, or about generating a .vcf file, which is not my situation: my URL already goes straight to a .vcf file.
I don't know for sure whether an Android phone can be made to prompt to save a contact if I return the VCARD data set by redirecting to a page with some magical PHP functions.
Because Android phones don't prompt to save a contact upon visiting xyz.com/jd.vcf, I need to "redirect" that URL to the VCARD data set, but since the data set isn't a URL I can't redirect to it.
I did it in PHP! The code below works for both iPhone and Android, so there is no need to split the URL visits by device either!
# Send correct headers
header("Content-type: text/x-vcard; charset=utf-8");
# Set variables for contact information
$family_name = "DOE";
$given_name = "JOHN";
$additional_names = "";
$prefix = "Mr";
$suffix = "";
$formatted_name = "$prefix $given_name $family_name";
# Output vCard data
echo "BEGIN:VCARD\r\n";
echo "VERSION:3.0\r\n";
echo "N:$family_name;$given_name;$additional_names;$prefix;$suffix\r\n";
echo "FN:$formatted_name\r\n";
echo "END:VCARD\r\n";

How to remove _ga query string from URL

I have a multidomain website with GA tracking. We recently moved to Universal Analytics and noticed that whenever the domain changes (from the US site to the Korean or Japanese one), a _ga=[random number] parameter is appended to the URL.
For example, starting from
abc.com
when I click through to the Japanese site, the URL becomes
japanese.abc.com/?_ga=1.3892897.20937502.9237834
Why does this happen?
How can I remove the _ga part of the URL?
Appreciate your help.
This is needed for cross-domain tracking (i.e. tracking people who cross domain boundaries as one visitor rather than as one visitor per domain). If you want cross-domain tracking you cannot remove it. The _ga part is the client ID, which identifies the visitor; since it cannot be shared via cookies (which are domain-specific), it has to be passed via the URL when the domain changes.
Since somebody set your site up for cross-domain tracking, I guess you actually want this (it does not happen by default). The parameter is a necessary side effect of cross-domain tracking with Universal Analytics. If you do not want this, look in the tracking code for any of the linker functions mentioned in the documentation and remove them.
Updated to answer the questions from the comment.
Is there no way to remove the _ga string and still have the cross-domain facility?
No, currently not. Browser vendors are working on better ways of cross-domain communication, so there might be something in the future, but at the moment the parameter is the best way.
Also, what if some user randomly changes the _ga value and presses enter? How will GA record that?
If the user happens to create a client ID that has been used before (highly unlikely), their visit would be attributed to another user. Realistically, Google Analytics will just record them as a new user.
Updated
For those who like to play I did a proof of concept for cross domain tracking without the _ga parameter. Something along those lines could be developed further, as-is it is not suitable for production use.
Update: David Vallejo has a Javascript solution where the _ga parameter is removed via the history API (so while it is still added it is for all intents and purposes invisible to the end user). This is a more elaborate version of Michael Hampton's answer below.
I'm using HTML5 history.replaceState() to hide the GA query string in the browser's address bar.
This requires constructing a new URL with the _ga= value removed (you can do this in your favorite language) and then simply passing it to replaceState().
This only alters the URL in the address bar (and in the browser's history). Google Analytics still gets the information passed in via the query string, so your tracking still works.
I do this in a Go html/template:
{{if .URL.RawQuery}}
<script>
window.history.replaceState({}, document.title, '{{.ReplacedURL}}');
</script>
{{end}}
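The "construct a new URL with the _ga= value removed" step can also be done in the browser itself. This is a minimal sketch, assuming an environment with the standard `URL` API (current browsers, Node.js); the function name `withoutGaParam` is made up for this example. As the other answers note, calling it before Google Analytics has read the parameter would break the cross-domain link.

```javascript
// Hypothetical helper: returns the path, query and hash of `href`
// with only the _ga parameter removed; other parameters are kept.
function withoutGaParam(href) {
  const url = new URL(href);
  url.searchParams.delete("_ga");
  return url.pathname + url.search + url.hash;
}

// In a page, once Google Analytics has run:
//   window.history.replaceState({}, document.title, withoutGaParam(window.location.href));
```

Unlike replacing the URL with the bare path, this keeps any unrelated query parameters and the hash fragment intact.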
I was asked to remove this tag after it started showing up when we split our website between two domain names. With Apache Rewrite Rules:
RewriteCond %{QUERY_STRING} _ga
RewriteRule ^(.*)$ $1? [R=301,NC,L]
This will remove the tag, but will not be able to pass the _ga params to Google Analytics.
If the user doesn't mind a short refresh, then adding this code to every page
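<?php
// Split the request URI into path and query string (the query may be absent).
list($url, $qs) = array_pad(explode('?', $_SERVER['REQUEST_URI'], 2), 2, '');
// Refresh to the bare path after one second so Google Analytics can run first.
if (strpos($qs, '_ga=') !== false) header("refresh:1;url={$url}");
?>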
will refresh after a second, removing the query string, but allowing the Google Analytics action to take place. This means that by the time your user has bookmarked or copied your URL, the pesky _ga stuff has long gone.
The above code will throw away ANY query string. This version will just strip out the '_ga' argument.
$urlA = parse_url($_SERVER['REQUEST_URI']);
$qs = $urlA['query'] ?? '';                      // the query string may be absent
if (preg_match('/(^|&)_ga=/', $qs)) {
    $url = $urlA['path'];
    $kept = array();
    foreach (explode('&', $qs) as $e) {
        // Keep every raw "name=value" pair except the _ga one.
        if (strpos($e, '_ga=') === 0) continue;  // get rid of this one
        $kept[] = $e;
    }
    $nqs = implode('&', $kept);
    // Only append "?" when other arguments remain.
    header("refresh:1;url={$url}" . ($nqs !== '' ? "?{$nqs}" : ''));
}
You can't stop Google from adding the tag, but you can tell Analytics to ignore it in your reports. Thanks to Russ Henneberry for this: http://blog.crazyegg.com/2013/03/29/remove-url-parameters-from-google-analytics-reports/
It was written before Universal Analytics was released, so the language is outdated: you now create a new "view" (rather than a "profile"). Creating a new view ensures that you still have the raw data in your default view (just in case you ever need it), so it's really the best solution (keeping in mind that you can never apply new settings retroactively in Google Analytics). Good luck!
You can't remove the _ga parameter from the URL on the website...BUT you can use an Advanced filter in Google Analytics to remove the query parameter from the reports!
Like this:
1) Field A: Request URI
Pattern: ^(.+)\?_ga
2) Field B: not needed
3) Output To -> Constructor
Field: Request URI
Pattern: $A1
This filter will strip off all query parameters when _ga is the first parameter shown. You can get a lot fancier with the regex, but this approach should work for most websites.
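To see what this filter would do to a given Request URI, the extract-and-rebuild step can be simulated outside Google Analytics. This is only an illustration in JavaScript, not how Google Analytics evaluates filters internally; `applyGaFilter` is a made-up name, and the pattern is the one given for Field A above.

```javascript
// Field A pattern from the filter above: capture everything before "?_ga".
const FIELD_A = /^(.+)\?_ga/;

// Hypothetical simulation of the Advanced filter: when Field A matches,
// the Request URI becomes capture group 1 ($A1); otherwise it is unchanged.
function applyGaFilter(requestUri) {
  const match = FIELD_A.exec(requestUri);
  return match ? match[1] : requestUri;
}
```

With this sketch, `applyGaFilter("/pricing?_ga=1.2.3.4")` yields `/pricing`, while a URI where `_ga` is not the first parameter, such as `/pricing?x=1&_ga=1.2.3.4`, is left untouched. That is the limitation mentioned above.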
See this page: https://support.google.com/tagmanager/answer/6107124?hl=en
and search for "use hash as delimiter".
Setting this value to true passes the value through the URL hash fragment instead of through a query parameter,
which should fix it.
One way to handle this is to use the history.replaceState JavaScript function to remove the query string from the URL after the page has finished loading and Google Analytics has done its thing. However, if you remove it too soon, it'll affect GA functionality (one visitor will show as multiple visitors). I've found that the following JavaScript (with a 3-second delay) works:
<script defer src="data:text/javascript,async function main() {await new Promise(r => setTimeout(r, 3000));window.history.replaceState({}, document.title, window.location.pathname);}main();"></script>
I used "window.location.pathname" for convenience so that you can use the same script on many pages. However, you can also do it like this (for the top page of the site):
<script defer src="data:text/javascript,async function main() {await new Promise(r => setTimeout(r, 3000));window.history.replaceState({}, document.title, '/');}main();"></script>
Or for a sub-page:
<script defer src="data:text/javascript,async function main() {await new Promise(r => setTimeout(r, 3000));window.history.replaceState({}, document.title, '/something/something.html');}main();"></script>
I did the "data:text/javascript" thing instead of a true in-line script so I could apply "defer" to it, although this probably isn't necessary if you're using a sufficiently long delay value.
You can filter out all (or only include) "?_ga=" parameters in Google Analytics for reporting purposes. I would also highly recommend adding a canonical to the base URL -- or adding the parameters to Google Webmaster Tools -- to avoid duplicate content.

How can a WordPress website show the visitor's name when they visit the website? Like "welcome <visitor name>"

If someone visits my website, how can I show them their name along with a welcome message like "welcome Atul"? My website is built with WordPress.
Your visitors need to be registered on your website; this is the only way WordPress can find their information. To learn more about displaying user information, read this article: http://codex.wordpress.org/Function_Reference/get_currentuserinfo
For this to work, the user of the site has to be registered, as otherwise there is no way to find their name. When they are logged in, by default it will say something like "Welcome, Woolnut". There are some small plugins that can change the welcome message for you, but the user has to log in before you have access to their name. If they are already logged in and you want to display their username or name, then take a look at this link; it may be of use!
Edit
Turns out my link is the same one as the other answer! (Is probably the best link however...)
A slightly better way is to call wp_get_current_user instead, like so:
$user = wp_get_current_user();
if ( 0 !== $user->ID ) {
echo $user->display_name;
}
This is a wrapper for the get_currentuserinfo that actually returns the user to you directly instead of just setting a global variable. It returns a WP_User object with the information of the current user in it.
If the user is unknown or not logged in, then the function will return a WP_User with the ID set to zero, so you can check for that and also handle unknown users.

How to get all client info from website visitors?

I want to collect all the information that we could when someone is visiting a webpage: e.g.:
clients screen resolution: <script type='text/javascript'>document.write(screen.width+'x'+screen.height); </script>
referer: <?php print ($_SERVER['HTTP_REFERER']); ?>
client ip: <?php print ($_SERVER['REMOTE_ADDR']); ?>
user agent: <?php print ($_SERVER['HTTP_USER_AGENT']); ?>
what else is there?
Those are the basic pieces of information. Anything beyond that could be viewed as SpyWare-like and privacy advocates will [justifiably] frown upon it.
The best way to obtain more information from your users is to ask them, make the fields optional, and inform your user of exactly what you will be using the information for. Will you be mailing them a newsletter?
If you plan to eMail them, then you MUST use the "confirmed opt-in" approach -- get their consent (by having them respond to an eMail, keyed with a special-secret-unique number, confirming that they are granting permission for you to send them that newsletter or whatever notifications you plan to send to them) first.
As long as you're up-front about how you plan to use the information, and give the users options to decide how you can use it (these options should all be "you do NOT have permission" by default), you're likely to get more users who are willing to trust you and provide you with better quality information. For those who don't wish to reveal any personal information about themselves, don't waste your time trying to get it because many of them take steps to prevent that and hide anyway (and that is their right).
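Dump everything PHP knows about the request (not only about the client's machine) with this small script:
<?php
foreach($_SERVER as $key => $value){
// Values can be arrays (e.g. argv) and may contain client-supplied text, so stringify and escape them.
echo '$_SERVER["'.$key.'"] = '.htmlspecialchars(print_r($value, true))."<br />";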
}
?>
The list that is available to PHP is found here.
If you need more details than that, you might want to consider using Browserhawk.
For what end?
Remember that client IP is close to meaningless now. All users coming from the same proxy or same NAT point would have the same client IP. Years ago, all of AOL traffic came from just a few proxies, though now actual AOL users may be outnumbered by the proxies :).
If you want to uniquely identify a user, it's easy to create a cookie in Apache (mod_usertrack) or whatever framework you use. If the person blocks cookies, please respect that and don't try tricks to track them anyway. Or take the lesson of Google: make it so useful that people will choose the utility over cookie worries.
Remember that JavaScript runs on the client. Your document.write() will show the info on their webpage, not do anything for your server. You'd want to use JavaScript to put this info in a cookie, or send it along with a form submission if you have any forms.
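One way to follow this advice is to have the script store the value in a cookie, so that it reaches the server with the next request. This is a minimal sketch; the cookie name `screen_res` and the helper name `screenResolutionCookie` are assumptions made for this example.

```javascript
// Hypothetical helper: builds a cookie string carrying the screen resolution.
function screenResolutionCookie(width, height, maxAgeSeconds) {
  const value = encodeURIComponent(width + "x" + height);
  return "screen_res=" + value + "; path=/; max-age=" + maxAgeSeconds + "; SameSite=Lax";
}

// In the browser (screen and document do not exist on the server):
//   document.cookie = screenResolutionCookie(screen.width, screen.height, 86400);
```

On the server the value would then be readable from the second page view onward, for example as `$_COOKIE['screen_res']` in PHP, provided the visitor accepts cookies.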
I like to use something like this:
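$log = array(
'ip' => $_SERVER['REMOTE_ADDR'],
're' => $_SERVER['HTTP_REFERER'] ?? '',      // not sent by every client
'ag' => $_SERVER['HTTP_USER_AGENT'] ?? '',   // may be missing too
'ts' => date("Y-m-d H:i:s")                  // "H" is the 24-hour clock; "h" would be ambiguous without am/pm
);
echo json_encode($log);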
You can save that string in a file, the JSON is pretty small and is just one line.
phpinfo(INFO_VARIABLES); // INFO_VARIABLES has the value 32
Prints a table of all the predefined variables that are available. You can copy the variable names from it directly into your PHP code.
e.g:
_SERVER["GEOIP_COUNTRY_CODE"] AT
would be in php code:
echo $_SERVER["GEOIP_COUNTRY_CODE"];
Get the values of a fixed list of $_SERVER variables:
<?php
$test_HTTP_proxy_headers = array('GATEWAY_INTERFACE','SERVER_ADDR','SERVER_NAME','SERVER_SOFTWARE','SERVER_PROTOCOL','REQUEST_METHOD','REQUEST_TIME','REQUEST_TIME_FLOAT','QUERY_STRING','DOCUMENT_ROOT','HTTP_ACCEPT','HTTP_ACCEPT_CHARSET','HTTP_ACCEPT_ENCODING','HTTP_ACCEPT_LANGUAGE','HTTP_CONNECTION','HTTP_HOST','HTTP_REFERER','HTTP_USER_AGENT','HTTPS','REMOTE_ADDR','REMOTE_HOST','REMOTE_PORT','REMOTE_USER','REDIRECT_REMOTE_USER','SCRIPT_FILENAME','SERVER_ADMIN','SERVER_PORT','SERVER_SIGNATURE','PATH_TRANSLATED','SCRIPT_NAME','REQUEST_URI','PHP_AUTH_DIGEST','PHP_AUTH_USER','PHP_AUTH_PW','AUTH_TYPE','PATH_INFO','ORIG_PATH_INFO','GEOIP_COUNTRY_CODE');
foreach($test_HTTP_proxy_headers as $header){
// Many of these keys are optional, so print a placeholder when one is not set.
echo $header . ": " . ($_SERVER[$header] ?? '(not set)') . "<br/>";
}
?>

I want to port my delicious bookmarks to my website

I started building an app that will automatically download my Delicious bookmarks and save them to a database, so that I can view them on my own website in my favoured format.
I am forced to use OAuth, as I have a Yahoo ID to log in to Delicious. The problem is that I am stuck at the point where OAuth requires a user to go and authenticate manually.
Is there any code or a set of guidelines available anywhere that I can follow? All I want is a way to automatically save my bookmarks to my database.
Any help is appreciated. I can work in Java, .NET and PHP. Thanks.
Delicious provides an API for this already:
https://api.del.icio.us/v1/posts/all?
Returns all posts. Please use sparingly. Call the update function to see if you need to fetch this at all.
Arguments
&tag={TAG}
(optional) Filter by this tag.
&start={#}
(optional) Start returning posts this many results into the set.
&results={#}
(optional) Return this many results.
&fromdt={CCYY-MM-DDThh:mm:ssZ}
(optional) Filter for posts on this date or later
&todt={CCYY-MM-DDThh:mm:ssZ}
(optional) Filter for posts on this date or earlier
&meta=yes
(optional) Include change detection signatures on each item in a 'meta' attribute. Clients wishing to maintain a synchronized local store of bookmarks should retain the value of this attribute - its value will change when any significant field of the bookmark changes.
Example
$ curl https://user:passwd@api.del.icio.us/v1/posts/all
<posts tag="" user="user">
<post href="http://www.weather.com/" description="weather.com"
hash="6cfedbe75f413c56b6ce79e6fa102aba" tag="weather reference"
time="2005-11-29T20:30:47Z" />
...
<post href="http://www.nytimes.com/"
description="The New York Times - Breaking News, World News &amp; Multimedia"
extended="requires login" hash="ca1e6357399774951eed4628d69eb84b"
tag="news media" time="2005-11-29T20:30:05Z" />
</posts>
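The optional arguments listed above are ordinary query parameters, so a request URL can be assembled along these lines. This is a sketch only: `buildPostsAllUrl` is a made-up helper, it builds the URL without performing any request or authentication, and it assumes the endpoint and parameter names exactly as quoted above.

```javascript
// Endpoint as quoted in the API description above.
const POSTS_ALL = "https://api.del.icio.us/v1/posts/all";

// Hypothetical helper: appends only the arguments that were supplied.
function buildPostsAllUrl(args = {}) {
  const params = new URLSearchParams();
  for (const name of ["tag", "start", "results", "fromdt", "todt", "meta"]) {
    if (args[name] !== undefined) params.set(name, String(args[name]));
  }
  const query = params.toString();
  return query ? POSTS_ALL + "?" + query : POSTS_ALL;
}
```

For example, `buildPostsAllUrl({ tag: "news", results: 50 })` produces the endpoint followed by `?tag=news&results=50`.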
There are also public and private RSS feeds for bookmarks, so if you can read and parse XML you don't necessarily need to use the API.
Note however that if you registered with Delicious after December, and therefore use your Yahoo account, the above will not work and you'll need to use OAuth.
There are a number of full examples on the Delicious support site, see for example: http://support.delicious.com/forum/comments.php?DiscussionID=3698

Resources