Geolocation of BGP Autonomous Systems - networking

Hi friends, I've been looking around for the past few days for a way to find the geolocation of BGP ASes, preferably through some API. I've been using the RIPEstat API for the majority of my work on this, but it comes up inconclusive for some ASes, for example AS10000: RIPE tells me the location is in JP, which is sort of fine, but I'd like to narrow it down further, to a city / postal code / etc. if possible. Is there another API suited for this, or is it just a manual task of fixing all the information once gathered?
Alternatively, if it's possible to grab an IP address for the AS itself, rather than its announced ranges, that would likely work as well.

IP geolocation isn't nearly accurate enough to pinpoint an IP to a specific city/ZIP code. In many cases, IPs from the same block are used across the large area an ISP controls, so it's not possible to be very accurate. Autonomous Systems don't really have "an IP", as there's no one specific location for them.
If you're looking for the locations where they peer to other providers, you might want to check out PeeringDB.
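As a starting point, here's a minimal sketch of the PeeringDB route in Python. The /api/net and /api/netfac endpoints come from PeeringDB's public REST API; the exact field names used here (id, city, country) are assumptions taken from its docs, so verify them against a live response.

```python
import json
import urllib.request


def facility_cities(netfac_payload):
    """Extract distinct (city, country) pairs from a /api/netfac response."""
    return sorted({(rec.get("city", ""), rec.get("country", ""))
                   for rec in netfac_payload.get("data", [])})


def fetch_netfac(asn):
    # First resolve the ASN to a PeeringDB net id, then list the
    # facilities where that network is present.
    with urllib.request.urlopen(
            f"https://www.peeringdb.com/api/net?asn={asn}") as r:
        net = json.load(r)["data"][0]
    with urllib.request.urlopen(
            f"https://www.peeringdb.com/api/netfac?net_id={net['id']}") as r:
        return json.load(r)


if __name__ == "__main__":
    # e.g. the AS from the question
    print(facility_cities(fetch_netfac(10000)))
```

Note this gives you the cities where the network peers, which is often the closest you can get to "the location of an AS".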

Related

How can I test website content from different US states

I am working on a website that has a set of regular content for pretty much everyone. A few US state governments do not like the phrasing/implications of some of the content and would like to make sure that their states' residents are not able to see the regular content at all. The trouble is that these states are not my home state.
Most VPN solutions that I've found are about spoofing your location to make it appear that you are in a different country, and while many might have server locations in more than one US state, they are not the state(s) that I'm targeting.
Other than traveling to those states, or knowing someone in those states, what kind of options exist to test content as if I'm a resident of another state? I'm essentially looking for a VPN that is (or can be) US state specific, or some equivalent process.
For example, I have a normal corporate VPN, but is it possible to have IT set up alternate VPNs based in those states, such that connecting to "VPN 1" or "VPN 2" would make my traffic appear to be from those places instead of my home state? Would AWS have any kind of service/product that could assist?
Since your server is using Geo-IP to determine location, I would simply add some testing IPs to the database and associate them with the locations that you want to test. That way, you can use IPs that you control and test your system at your convenience.
No VPNs or fancy routing needed.
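That suggestion can be sketched as a tiny override layer, assuming the site already has some Geo-IP lookup function (real_geoip_lookup below is a stand-in for it, and the test addresses are illustrative):

```python
# Check a hand-maintained table of test IPs before falling through to the
# real Geo-IP database. Addresses from 203.0.113.0/24 (TEST-NET-3) are
# used here purely as placeholders.
TEST_IPS = {
    "203.0.113.10": {"country": "US", "region": "TX"},
    "203.0.113.11": {"country": "US", "region": "FL"},
}


def real_geoip_lookup(ip):
    # Placeholder for whatever Geo-IP database call the site already makes.
    return {"country": "US", "region": "UNKNOWN"}


def lookup(ip):
    # Test IPs win; everything else goes through the normal lookup.
    return TEST_IPS.get(ip) or real_geoip_lookup(ip)
```

Connecting from one of the test IPs (or injecting it in a test request) then exercises the state-specific content path without any VPN.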

Google Analytics View for a large list of IP Addresses

We have a large amount of "internal traffic" that we want to filter to a separate view in Google Analytics. These are people that work for us but are in multiple locations. To be specific, I have over 2,000 ip addresses for this group of people.
When I try to set up a filter for this traffic, using regex, the character limit on the text box doesn't allow this many IP addresses.
The Filter Pattern field just isn't big enough to hold more than a few addresses. Any ideas how else I can import these addresses to set up a separate view or segment in Google Analytics?
Expanding on Michele's and Eike's answer and trying to sum this up into 1 comprehensive answer. Your options:
Multiple filters: break down the rule into several smaller filters
Subnetting: define the rule as a collection of subnets instead of individual IP addresses. Tools like this one http://wintelguy.com/subnetcalc.pl might help you.
Custom Dimension Filter: for instance by providing a mechanism (e.g. ?internal) in the URL for people to tag themselves as internal traffic. Example: https://www.simoahava.com/analytics/block-internal-traffic-gtm/
ISP Filter: if some of those 2K people work in the same offices and those offices are serviced by corporate ISPs, you can use the ISP/Network Location built-in dimension to exclude them. When I work with large corporates with multiple offices around the world, it's very common that most traffic comes from ISPs named {company} ltd, {company} germany gmbh, {company} italia spa, etc., so I can filter with the company name instead of using IPs, which is very useful. To find out if you can use that method, have a look at the Audience -> Network -> Service Provider report to see which source ISPs are being used.
Test/QA Server: if those 2K people work for you to do testing etc..., you could have them access a test/qa/acceptance version of your site and simply use a different tracker for that one.
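The subnetting option above can be automated: Python's stdlib ipaddress module will merge a long list of individual addresses into the smallest equivalent set of CIDR blocks, which keeps the filter pattern short. A minimal sketch:

```python
import ipaddress


def collapse_ips(ips):
    """Merge individual IPs into the fewest covering CIDR blocks."""
    nets = [ipaddress.ip_network(ip) for ip in ips]  # each IP becomes a /32
    return [str(n) for n in ipaddress.collapse_addresses(nets)]
```

Contiguous office ranges collapse dramatically, e.g. four consecutive addresses become a single /30; scattered one-off addresses stay as /32s, so the payoff depends on how clustered your 2,000 IPs are.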
Just split the list of IPs across multiple filters (the number of filters you'll need will depend on how well your regex is optimized).
At this point I would suggest you move the logic to your website: set a custom dimension in your tracking code depending on whether the user's IP is on a list of "internal" addresses or not, and then use that dimension in your filter. With that many addresses it seems like the more maintainable solution, especially if you have multiple views.
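The server-side half of that custom-dimension approach can be sketched with the stdlib ipaddress module; the range list and the "internal"/"external" labels below are illustrative, and the returned value would be passed to the page for the tracking code to send as the dimension:

```python
import ipaddress

# Hypothetical internal ranges; 203.0.113.0/24 is a documentation net.
INTERNAL_NETS = [ipaddress.ip_network(n)
                 for n in ("10.0.0.0/8", "203.0.113.0/24")]


def traffic_type(ip):
    """Classify a visitor IP for use as a GA custom dimension value."""
    addr = ipaddress.ip_address(ip)
    return "internal" if any(addr in net for net in INTERNAL_NETS) else "external"
```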

Finding the number of common users between two websites

There are two Swiss (.ch) websites, let's call them A and B. A is owned by me and B by a customer.
Because of legal data protection issues B is hosted in Switzerland and not allowed to store any user information abroad. Which means that software like Google Analytics is not available on B. A is a Swiss website but hosted in a (European) cloud.
Now we would like to find out how many common users we both have over the duration of 30 days. In short:
|usersA ∩ usersB|
For the sake of simplicity: Instead of users we are perfectly happy to measure common browsers.
What would you suggest is the simplest way to solve this problem?
First of all, best regards from Zurich/Zug :) Swiss people are everywhere...
I don't think you're correct that it's not legal to store any data abroad. As I work in the financial industry I know this topic very well, and we also had to do a lot of research to be able to use GA at all.
It always comes down to what data you collect and how. What you can't do, unless you get the user's permission up front, is store personally identifiable information. That's not allowed by GA anyway: you can't import/save email addresses in custom dimensions/metrics, for example.
Please check https://support.google.com/adsense/answer/6156630?hl=en as general basic information about this topic.
If you collect IP addresses with IP anonymization enabled, you shouldn't run into problems, provided you declare this in your data-privacy statement. Take this approach: https://support.google.com/analytics/answer/2763052?hl=en
I'm not a lawyer and don't want to give you legal advice, but ours told us that's fine. If you are really paranoid about sending data to the USA, like we have to be, you can exclude very sensitive forms from your tracking.
To go back to your basic question, if you want to find this out via Google Analytics, your key is "cross domain tracking". Check https://support.google.com/analytics/answer/1034342?hl=en for more information in this direction.
The only workaround I have in mind beside this is to start collecting browser fingerprints yourself and then join both collections on the fingerprints (that's not safe, as your visitors will use more than one device/configuration). I would personally go for IP anonymization, exclude very sensitive forms, make sure your data-privacy declaration contains all the necessary parts, and offer an opt-out option; then you should be on the safe side.
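For completeness, the fingerprint-intersection idea can be sketched like this: each site stores only a salted hash of a coarse browser fingerprint, and the two hash sets are intersected offline. The fingerprint fields and the shared salt are illustrative assumptions, not a vetted privacy design, and the caveat above still applies (one person, several devices).

```python
import hashlib

# Both sites must agree on the same salt out of band, otherwise the
# hashes will never match across collections.
SHARED_SALT = b"agree-on-this-out-of-band"


def fingerprint(user_agent, accept_language, anonymized_ip):
    """Salted hash of a coarse browser fingerprint; stores no raw PII."""
    raw = "|".join([user_agent, accept_language, anonymized_ip]).encode()
    return hashlib.sha256(SHARED_SALT + raw).hexdigest()


def common_browsers(hashes_a, hashes_b):
    """Size of the intersection of the two 30-day hash collections."""
    return len(set(hashes_a) & set(hashes_b))
```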
All the best and TGIF :)

how to spoof location so google autocomplete API will provide local results, ideally with R

Google has an API for downloading search suggestions:
https://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/xml_reference/query_suggestion.html
Unfortunately, as far as I can tell, these results are specific to your location. For an analysis, I would like to be able to define the city/location that Google thinks it is making the suggestion to. Here's what happens when I scrape from Dar es Salaam, Tanzania:
http://suggestqueries.google.com/complete/search?client=firefox&q=insurance
["insurance",["insurance","insurance companies in tanzania","insurance group of tanzania","insurance principles","insurance act","insurance policy","insurance act tanzania","insurance act 2009","insurance definition","insurance industry in tanzania"]]
I understand that a VPN would partially solve this issue, but only by giving me a different location, not lots of locations. Is there a reasonable way to replicate this sort of thing quickly and easily from, say, the 100 largest cities in the United States?
Confirmation that results differ within the USA -
thanks!
Google will use your IP and your location history (if turned on) to determine your location.
To get around it, you can spoof your IP while logged out of your Google account (though I don't know whether Google will consider it an attempt at abuse, no matter what your intentions are).
Another way is to use the Tor Browser (even though that's not its original purpose). You can configure Tor to exit from a certain country using the ExitNodes option in the torrc config file.
As found in the docs:
ExitNodes node,node,…
A list of identity fingerprints, country codes, and address patterns of nodes to use as exit node
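A minimal torrc sketch of that option, assuming a country-level pin is enough (StrictNodes stops Tor from falling back to other exits; both options are in the Tor manual):

```
# torrc: force exits in a given country (two-letter code in braces)
ExitNodes {us}
StrictNodes 1
```

Note this pins the country, not a specific US city, so it only partially answers the question.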
But if you want a fast way to do it, I don't think that's possible, since Google wants to know the real location of its users and has put a lot of effort into making such tricks fail.
The hl param for interface language changes the search results, but I can't tell if it's actually changing the location. For example:
http://suggestqueries.google.com/complete/search?client=chrome&q=why&hl=FR
Here's an example with 5 different values of hl:
http://jsbin.com/tusacufaza/edit?js,output

Standard and reliable way to track RSS subscribers?

What's the best way to track RSS subscribers reliably without using Feedburner? Some of the obvious approaches, like tracking by IP or by the number of hits, have some fatal flaws. IP addresses can change with each request, or multiple users can share the same IP. Also, feed readers can request a feed multiple times per day or even per hour. Both problems make it really hard to get reliable stats on unique subscribers.
I've read articles by both Leo Notenboom and Tim Bray on the topic, but none of their suggestions seems to really solve how to track subscribers in an accurate and reliable way. Leo suggests appending a programmatically generated unique ID to the RSS feed URL each time the referring page is loaded. Tim advocates having RSS readers generate a unique hash, and also has suggestions ranging from tracking referrers to using cookies. A unique URL would be reliable, but it has two flaws: it's not a user-friendly URL, and it creates duplicate content for SEO. Are there any other reliable methods of tracking RSS subscribers? How does Feedburner estimate subscribers?
There isn't really a standard way to do this. Subscriber counting is always unreliable but you can get good estimates with it.
Here's how Google does it (source):
Subscribers counts are calculated by matching IP address and feed reader
combinations, then using our detailed understanding of the multitude of
readers, aggregators, and bots on the market to make additional inferences.
Of course part of this is easy for Google, as they can first calculate how many Google Reader users are subscribed to the feed in question. After that they use IP address matching also, and that's what you should use too.
You could count individual (i.e. unique) IP addresses from the web server's logs, but that would count 10 people as 1 if they all use the same address. That's why you should inspect the HTTP headers sent by the client, specifically the HTTP_X_FORWARDED_FOR and HTTP_VIA fields. You could use the HTTP_VIA address as the "main" address, and then count how many unique HTTP_X_FORWARDED_FOR addresses are subscribed to the feed. If the subscriber doesn't have these proxy-added fields, it's counted as a unique IP address. These should be handled in the code that generates the feed. You could also add a GeoIP lookup for the IPs and store everything in a database, which would let you see which country has the most subscribers to your feed.
This has its problems too. Not all proxies use these fields, and it doesn't solve the problem of counting subscribers behind NAT gateways. It is, however, a good estimate. Besides, you're probably more interested in the order of magnitude than the exact count of subscribers, aren't you? If the counter says you have 5989 subscribers, you probably have more, as the counter gives you a lower bound.
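That counting rule can be sketched in a few lines; the input format (one tuple of remote IP, Via, X-Forwarded-For per feed request, with None for absent headers) is an assumption about how you'd extract the fields from your logs or feed-generating code:

```python
def count_subscribers(requests_seen):
    """Estimate unique subscribers from (remote_ip, via, xff) tuples."""
    unique = set()
    for remote_ip, via, xff in requests_seen:
        if via and xff:
            # One distinct reader per forwarded address behind each proxy.
            unique.add((via, xff))
        else:
            # No proxy headers: fall back to the remote IP itself.
            unique.add((remote_ip, None))
    return len(unique)
```

As the answer notes, this is still a lower bound: NAT gateways and proxies that strip these headers will collapse several readers into one.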
Standard and reliable are not exactly words in the RSS dictionary :-) Remember that the thing doesn't even have a standard XSD after all these years. If by tracking you mean the "count", there are a few things you can do, and the tactics depend on the purpose, i.e. are you demonstrating a big number or a small number? It's a marketing thing, so you have to define your goals :-)
You may have to classify IP numbers for a start, to build a basic collection of big / corporate / umbrella IP numbers. For those, you can use the referrer as a reasonable filtering criterion and count everything else as unique unless proven otherwise. The vast majority of IP numbers remain stable for about 2 days, but again it's always good to use basic referrer logic as a filter for people who just keep "clicking", so to speak.
Then you need a decent list of aggregators and a classification on how they process URLs and if they obscure end readers completely then you need either published or inferred averages - it's always fair game to use equitable distribution of an average count. Using cookies may help to collect aggregator IPs and differentiate between automated agents and individuals.
One very important thing is to keep in mind that you can't use just one method and expect it to be a silver bullet - you need to use these 3-4 aspects at the same time plus basic statistical reasoning.
You could query your web server logs for traffic to your RSS feed, perhaps filter it by IP to get the number of uniques.
The problem is, that would rely on folks checking the feed daily. The frequency of hits to your RSS feed by one individual could vary day to day, and the number could be lower.
If you configure your RSS feed to require some kind of authentication, you can do user-based metrics instead of ip-based metrics. Although this would be a technically-correct solution, getting people to opt into an authenticated blog in anything other than an Intranet scenario is a stretch.
