JSON RouteLinks response to identify traffic light - here-api

I am sending an RME request to obtain speed limit and traffic light information and get back a JSON response. Since the HERE API provides a lot of different traffic sign types I don't care about (e.g. overtaking, etc.), I cannot figure out how the types I am interested in are numerically encoded.
Browsing through the online docs provided by HERE, I could not find the information I'm looking for, i.e. the enumeration codes assigned to traffic lights.
The request I send out looks something like this:
https://rme.api.here.com/2/matchroute.json?
app_id=<my-app_id>
&app_code=<my-app-code>
&routemode=car
&file=<zip and base64 encoded route info>
&attributes=BASIC_HEIGHT_FCn(*),ROAD_GEOM_FCn(*),ADAS_ATTRIB_FCn(*)
&attributes=ADAS_ATTRIB_FCn(*),SPEED_LIMITS_FCn(*),TRAFFIC_SIGN_FCn(*)

Please see the documentation page below, go to the layers section, and check the currently supported layers with the Layers resource.
https://developer.here.com/documentation/fleet-telematics/dev_guide/topics/here-map-content.html
For example, this link shows the TRAFFIC_SIGN_FC1 layer in detail:
https://fleet.api.here.com/1/doc/layer.html?region=WEU&layer=TRAFFIC_SIGN_FC1&app_id={{app_id}}&app_code={{app_code}}
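Once you have looked up the numeric code for traffic lights in that layer description, filtering the matched-route response is straightforward. Below is a minimal Python sketch; the response layout (a RouteLinks array with an attributes map) follows the RME docs, but the exact field names and the TRAFFIC_LIGHT_CODE value are assumptions you must verify against the layer description:

import json

TRAFFIC_LIGHT_CODE = "0"  # hypothetical placeholder; look up the real enumeration value in the layer doc

def traffic_light_links(response_text):
    # Collect the IDs of matched links whose traffic sign attributes include a traffic light.
    response = json.loads(response_text)
    hits = []
    for link in response.get("RouteLinks", []):
        # Key name is illustrative; the actual key matches the layer requested, e.g. TRAFFIC_SIGN_FC1.
        for sign in link.get("attributes", {}).get("TRAFFIC_SIGN_FCN", []):
            if sign.get("SIGN_TYPE") == TRAFFIC_LIGHT_CODE:
                hits.append(link.get("linkId"))
    return hits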


speed limits on highway

I would like to do a project on speed limits on highways in Germany. I want to know the distance between changes of the speed limit. To do this I need to get a dataset which includes the speed limit traffic signs, or the areas where a speed limit is set, along ONE highway.
I haven't worked with HERE yet, and before I dig into the details I would like to know if HERE is the right tool for this project. And of course it would be nice if you could also tell me briefly how to do it, since I don't even know where to start :)
Thanks a lot!
I tried OpenStreetMap before, but the data is too outdated. For example, you cannot see speed limits due to construction work.
I found this link on other posts https://github.com/seaBass3/here-pde-speed-limit
but it does not seem to be valid any more.
This can be solved with different approaches, but one of the most feasible is the following:
By using the HERE Traffic API v7 you can get real-time traffic flow and incident information through its query parameters, response structures, and data types.
It returns real-time traffic flow data in JSON, including speed and jam factor for the region(s) defined in each request, and can also deliver additional data such as the geometry of the road segments in relation to the flow.
It also provides aggregated information about traffic incidents in JSON, including the type and location of each incident, its status, start and end times, and other relevant data. This data is useful for dynamically optimizing route calculations.
And if you also need historical information from past dates, you can use HERE Probe Data, which can be compared with different datasets.
Here is an example request you can use to get the information you need:
curl -H "Authorization: Bearer $TOKEN" "https://data.traffic.hereapi.com/v7/flow?locationReferencing=shape&in=bbox:13.400,52.500,13.405,52.505"
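In Python, a sketch of the same request and of reading the response; field names such as results, currentFlow, speed and jamFactor follow HERE's documented v7 flow response, but verify them against your own output:

import requests  # third-party: pip install requests

TOKEN = "<your OAuth token>"
params = {
    "locationReferencing": "shape",
    "in": "bbox:13.400,52.500,13.405,52.505",
}
resp = requests.get(
    "https://data.traffic.hereapi.com/v7/flow",
    params=params,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
for result in resp.json().get("results", []):
    flow = result.get("currentFlow", {})
    # Print each segment's description with its measured speed and jam factor.
    print(result.get("location", {}).get("description"), flow.get("speed"), flow.get("jamFactor"))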

HERE Traffic Incident API Call Not Displaying Accidents

I am using a HERE API call to request traffic incident data from a particular start time. Whenever I include the "type" key and specify "Accident" as the value, no response is returned. However, switching the value to "Construction" does provide a response.
Does anyone have information on how to make the API call return accident data specifically?
Here is the exact call I am using:
https://traffic.api.here.com/traffic/6.3/incidents.json?app_id={{app_id}}&app_code={{app_code}}&startTime=2017-01-01T00:00:00-05:00&type=Accident&bbox=52.5233,13.4035;52.5181,13.4159
There is no data returned as there are no “Accident” type incidents in that location. This can be seen when skipping the “type” parameter:
https://traffic.api.here.com/traffic/6.3/incidents.xml?app_id=APP_ID&app_code=APP_CODE&startTime=2017-01-01T00:00:00-05:00&bbox=52.5233,13.4035;52.5181,13.4159
Traffic data is dynamic. An accident can appear somewhere and disappear in a matter of minutes. We recommend using wego.here.com to locate an accident, or Bing Maps (they all use our services and traffic data).
We were able to find one in Germany right now (should be there until 13:34 German time):
https://traffic.api.here.com/traffic/6.3/incidents.xml?app_id=app_id&app_code=app_code&startTime=2017-01-01T00:00:00-05:00&prox=51.52427,11.85887,15&type=Accident
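The same proximity query issued from code rather than the browser, as a sketch; the app_id/app_code values are placeholders, exactly as in the URLs above:

import requests  # third-party: pip install requests

params = {
    "app_id": "APP_ID",        # placeholder
    "app_code": "APP_CODE",    # placeholder
    "startTime": "2017-01-01T00:00:00-05:00",
    "prox": "51.52427,11.85887,15",
    "type": "Accident",
}
resp = requests.get("https://traffic.api.here.com/traffic/6.3/incidents.json", params=params)
# An empty result set simply means there is no accident at that location right now.
print(resp.status_code, resp.text[:500])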
Hope this helps! Happy Coding!

How to know if a user clicked a link using its network traffic

I have large traffic files that I'm trying to analyze in order to get statistical features of users.
One of the features I would like to extract is link clicking on specific sites (for example, clicking on popups and more).
My first idea was to look in the packets' content and search for hrefs and links, save them all in some kind of data structure with their timestamps, and then iterate again over the packets to search for requests at any time close to the time the links appeared.
Something like the following pseudocode (here, the packets are grouped by flow, where a flow is IP1 <=> IP2):
for each packet in each flow:
    search for "href" or "http://" or "https://"
    save the links with their timestamp
for each packet in each flow:
    if it's an HTTP request and its URL matches any URL in the list
    and the time is close enough, record it
The problem with this code is that some links are dynamically generated while the page is loading (using JavaScript or similar), and cannot be found using the above method.
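For reference, here is roughly how that pseudocode could be implemented in Python with scapy, assuming an unencrypted HTTP capture; the 30-second matching window is an arbitrary choice, and the sketch inherits the same limitation with dynamically generated links:

import re
from scapy.all import rdpcap, Raw  # third-party: pip install scapy

packets = rdpcap("traffic.pcap")

# Pass 1: collect (timestamp, URL) pairs for every href seen in a payload.
links = []
for pkt in packets:
    if pkt.haslayer(Raw):
        for m in re.finditer(rb'href="(https?://[^"]+)"', bytes(pkt[Raw].load)):
            links.append((pkt.time, m.group(1).decode(errors="ignore")))

# Pass 2: treat a GET for a previously seen URL shortly afterwards as a click.
clicks = []
for pkt in packets:
    if pkt.haslayer(Raw) and bytes(pkt[Raw].load).startswith(b"GET "):
        path = bytes(pkt[Raw].load).split(b" ")[1].decode(errors="ignore")
        for ts, link in links:
            if path in link and 0 < pkt.time - ts < 30:  # 30 s window, arbitrary
                clicks.append((float(pkt.time), link))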
I have also tried to check the referrer field in the HTTP header and look for packets that were referred by the relevant sites. This method generates a lot of false positives because of iframes and embedded objects.
It is important to mention that this is not my server, and my intention is to make a tool for statistical analysis of users' behavior (thus, I can't add some kind of click tracker to the site).
Does anyone have an idea of what I can do to check whether users clicked on links, based on their network traffic?
Any help will be appreciated!
Thank you

Standard and reliable way to track RSS subscribers?

What's the best way to track RSS subscribers reliably without using Feedburner? Some of the obvious approaches, like tracking by IP or by the number of hits, have some fatal flaws. IP addresses can change with each request, or multiple users can use the same IP. Also, feed readers can request a feed multiple times per day or even per hour. Both problems make it really hard to get reliable stats on unique subscribers.
I've read articles by both Leo Notenboom and Tim Bray on the topic, but none of their suggestions seems to really solve how to track subscribers in an accurate and reliable way. Leo suggests having a unique ID generated programmatically and appended to the RSS feed URL each time the referring page is loaded. Tim advocates having RSS readers generate a unique hash, and also has suggestions ranging from tracking the referrers to using cookies. A unique URL would be reliable, but it has two flaws: it's not a user-friendly URL, and it creates duplicate content for SEO. Are there any other reliable methods of tracking RSS subscribers? How does Feedburner estimate subscribers?
There isn't really a standard way to do this. Subscriber counting is always unreliable but you can get good estimates with it.
Here's how Google does it (source):
Subscribers counts are calculated by matching IP address and feed reader
combinations, then using our detailed understanding of the multitude of
readers, aggregators, and bots on the market to make additional inferences.
Of course part of this is easy for Google, as they can first calculate how many Google Reader users are subscribed to the feed in question. After that they use IP address matching also, and that's what you should use too.
You could count individual (i.e. unique) IP addresses from the web server's logs, but that would count 10 people as 1 if they all use the same address. That's why you should inspect the HTTP headers sent by the client, more specifically the fields HTTP_X_FORWARDED_FOR and HTTP_VIA. You could use the HTTP_VIA address as the "main" address, and then count how many unique HTTP_X_FORWARDED_FOR addresses are subscribed to the feed. If the subscriber doesn't have these proxy-added fields, it's counted as a unique IP address. This should be handled in the code that generates the feed. You could also add a GeoIP lookup for the IPs and store everything in a database; that would let you see which country has the most subscribers to your feed.
This has its problems too. Not all proxies use these fields, and it doesn't fix the problem of counting subscribers behind NAT gateways. It is, however, a good estimate. Besides, you are probably more interested in the order of magnitude than in the exact count of subscribers, aren't you? If the counter says you have 5989 subscribers, you probably have more, as the counter gives you a lower bound.
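To make the IP-matching idea concrete, here is a minimal Python sketch of counting unique subscribers from proxy-aware request data; the header names are the standard X-Forwarded-For / Via pair, but the input layout here is an assumption:

def unique_subscribers(feed_requests):
    # feed_requests: iterable of dicts, one per hit on the feed URL, with
    # 'remote_addr' plus optional 'x_forwarded_for' and 'via' header values.
    uniques = set()
    for req in feed_requests:
        if req.get("via") and req.get("x_forwarded_for"):
            # Proxy hit: count each forwarded client behind the "main" Via address.
            for client in req["x_forwarded_for"].split(","):
                uniques.add((req["via"], client.strip()))
        else:
            # Direct hit: the remote IP itself is the best identifier we have.
            uniques.add(req["remote_addr"])
    return len(uniques)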
"Standard" and "reliable" are not exactly words in the RSS dictionary :-) Got to remember that the thing doesn't even have a standard XSD after how many years? If by tracking you mean the "count", there are a few things you can do, and the tactics depend on the purpose, i.e. are you demonstrating a big number or a small number? It is a marketing thing, so you have to define your goals :-)
You may have to classify IP numbers for a start, to build a basic collection of big / corporate / umbrella IP numbers. For those, you can use the referrer as a reasonable filtering criterion, and count everything else as unique unless proven otherwise. The vast majority of IP numbers remain stable for about 2 days, but again, it is always good to use basic referrer logic as a filter for people who just keep "clicking", so to speak.
Then you need a decent list of aggregators and a classification of how they process URLs. If they obscure end readers completely, you need either published or inferred averages; it's always fair game to use an equitable distribution of an average count. Using cookies may help to collect aggregator IPs and differentiate between automated agents and individuals.
One very important thing to keep in mind is that you can't use just one method and expect it to be a silver bullet; you need to use these three or four aspects at the same time, plus basic statistical reasoning.
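On the aggregator point: many feed fetchers report a subscriber count directly in their User-Agent string (Google's Feedfetcher, for instance, includes "N subscribers"), so a log pass can pick those up. A sketch, with the regex and input format as assumptions:

import re

# Many aggregator User-Agent strings embed a count, e.g.
# "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 3 subscribers; ...)".
SUBSCRIBER_RE = re.compile(r"(\d+)\s+subscribers", re.IGNORECASE)

def aggregator_subscribers(user_agents):
    # Keep the highest count reported per distinct agent string, then sum.
    counts = {}
    for ua in user_agents:
        m = SUBSCRIBER_RE.search(ua)
        if m:
            counts[ua] = max(counts.get(ua, 0), int(m.group(1)))
    return sum(counts.values())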
You could query your web server logs for traffic to your RSS feed, perhaps filter it by IP to get the number of uniques.
The problem is, that would rely on folks checking the feed daily. The frequency of hits to your RSS feed by one individual could vary from day to day, and the number could be lower.
If you configure your RSS feed to require some kind of authentication, you can do user-based metrics instead of IP-based metrics. Although this would be a technically correct solution, getting people to opt into an authenticated blog in anything other than an intranet scenario is a stretch.

How to decode google gclids

Now, I realise the initial response to this is likely to be "you can't" or "use analytics", but I'll continue in the hope that someone has more insight than that.
Google AdWords with "autotagging" appends a "gclid" (presumably "google click id") to the link that sends you to the advertised site. It appears in the web log since it's a query parameter, and it's used by Analytics to tie that visit to the ad/campaign.
What I would like to do is to extract any useful information from the gclid in order to do our own analysis on our traffic. The reasons for this are:
Stats are imperfect, but if we are collating them, we know exactly what assumptions we have made, and how they were calculated.
We can tie the data to the rest of our data and produce far more accurate stats wrt conversion rate.
We don't have to rely on javascript for conversions.
Now it is clear that the gclid is base64 encoded (or some close variant), and some parts of it vary more than others. Beyond that, I haven't been able to determine what any of it relates to.
Does anybody have any insight into how I might approach decoding this, or has anybody already related gclids back to campaigns or even accounts?
I have spoken to a couple of people at google, and despite their "don't be evil" motto, they were completely unwilling to discuss the possibility of divulging this information, even under an NDA. It seems they like the monopoly they have over our web stats.
By far the easiest solution is to manually tag your links with Google Analytics campaign tracking parameters (utm_source, utm_campaign, utm_medium, etc.) and then pull out that data.
The gclid is dependent on more than just the adwords account/campaign/etc. If you click on the same adwords ad twice, it could give you different gclids, because there's all sorts of session and cost data associated with that particular click as well.
Gclid is probably not 100% random, true, but I'd be very surprised and concerned if it were possible to extract all your Adwords data from that number. That would be a HUGE security flaw (i.e. an arbitrary user could view your Adwords data). More likely, a pseudo-random gclid is generated with every impression, and if that ad is clicked on, the gclid is logged in Adwords (otherwise it's thrown out). Analytics then uses that number to reconcile the data with Adwords after the fact. Other than that, there's no intrinsic value in the gclid number itself.
In regards to your last point, attempting to crack or reverse-engineer this information is explicitly forbidden in both the Google Analytics and Google Adwords Terms of Service, and is grounds for a permanent ban. Additionally, the TOS that you agreed to when signing up for these services says that it is not your data to use in any way you feel like. Google is providing a free service, so there are strings attached. If you don't like not having complete control over your data, then there are plenty of other solutions out there. However, you will pay a premium for that kind of control.
Google makes nearly all their money from selling ads. Adwords is their biggest money-making product. They're not going to give you confidential information about how it works. They don't know who you are, or what you're going to do with that information. It doesn't matter if you sign an NDA and they have legal recourse to sue you; if you give away that information to a competitor, your life isn't worth enough to pay back the money you will have lost them.
Sorry to break it to you, but "Don't be Evil" or not, Google is a business, not a charity. They didn't become one of the most successful companies in the world by giving away their search algorithm to the first guy who asked for it.
The gclid parameter is encoded in Protocol Buffers, and then in a variant of Base64.
See this guide to decoding the gclid and interpreting it, including an (Apache-licensed) PHP function you can use.
There are basically 3 parameters encoded inside it, one of which is a timestamp. The other 2 as yet are not known.
As far as understanding what these other parameters mean—it may be helpful to compare it to the ei parameter, which is encoded in an extremely similar way (basically Protocol Buffers with the keys stripped out). The ei parameter also has a timestamp, with what seem to be microseconds, and 2 other integers.
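Based on that description, a short Python sketch of the decoding; the varint walk below is standard protobuf wire format, but the meaning of the fields (beyond the timestamp) is not publicly documented, so treat the output as raw integers:

import base64

def parse_varint(data, i):
    # Decode one protobuf varint starting at offset i; return (value, next offset).
    value, shift = 0, 0
    while True:
        b = data[i]
        value |= (b & 0x7F) << shift
        i += 1
        if not b & 0x80:
            return value, i
        shift += 7

def decode_gclid(gclid):
    # gclids use the URL-safe base64 alphabet, unpadded; restore the padding.
    data = base64.urlsafe_b64decode(gclid + "=" * (-len(gclid) % 4))
    i, fields = 0, []
    while i < len(data):
        key, i = parse_varint(data, i)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type != 0:  # only varint-typed fields are expected here
            break
        value, i = parse_varint(data, i)
        fields.append((field_no, value))
    return fields  # one field should look like a timestamp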
FYI, I just posted a quick analysis of some gclid data from my sites on this post. There definitely is some structure to the gclid, but it is difficult to decipher.
I think you can get all the goodies linked to the gclid via Google's AdWords API. Specifically, you can query the Click Performance Report.
https://developers.google.com/adwords/api/docs/appendix/reports#click
I've been working on this problem at our company as well. We'd like to be able to get a better sense of what our AdWords are doing but we're frustrated with limitations in Analytics.
Our current solution is to look in the Apache access logs for GET requests using the regex:
.*[?&]gclid=([^$&]*)
If that exists, then we look at the referer string to get the keyword:
.*[?&]q=([^$&]*).*
An alternative option is to change your Apache web log to start logging the __utmz cookie that Google sets, which should have a piece for the keyword in utmctr. Google the __utmz cookie and you should be able to find plenty of information.
How accurate is the referer string? Not 100%. Firewalls and security appliances will strip it out. But parsing it out yourself does give you more flexibility than Google Analytics. It would be a great feature to send the gclid to AdWords and get data back, but that feature does not look like it's available.
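To illustrate the log-scraping approach, a rough Python sketch using a lightly adapted version of those two regexes; the log path and the assumption that the Referer field appears on the same line (Apache combined log format) are ours:

import re

GCLID_RE = re.compile(r"[?&]gclid=([^ &\"]*)")
KEYWORD_RE = re.compile(r"[?&]q=([^ &\"]*)")  # matched against the Referer, when present

with open("/var/log/apache2/access.log") as log:  # path is an assumption
    for line in log:
        gclid = GCLID_RE.search(line)
        if gclid:
            keyword = KEYWORD_RE.search(line)
            print(gclid.group(1), keyword.group(1) if keyword else "-")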
EDIT: Since I wrote this we've also created our own tags that are appended to each destination url as a request parameter. Each tag is just an md5 hash of the text, ad group, and campaign name. We grab it using regex from the access log and look it up in a SQL database.
This is a non-programmatic way to decode the GCLID parameter. Chances are you are simply trying to figure out the campaign, ad group, keyword, placement, or ad that drove the click and conversion. To do this, you can upload the GCLID into AdWords as a separate conversion type and then segment by conversion type to drill down to the criteria that triggered the conversion. The steps:
1. In the AdWords UI, go to Tools -> Conversions -> Add conversion with source "Import from clicks".
2. Visit the AdWords help topic about importing conversions (https://support.google.com/adwords/answer/7014069) and create a bulk upload file with your GCLID values, assigning the conversions to your new "Import from clicks" conversion type.
3. Upload the conversions into AdWords under Tools -> Conversions -> Conversion actions (Uploads) in the left navigation.
4. Go to the campaigns tab and choose Segment -> Conversions -> Conversion name.
5. Find your new conversion name in the segment list; this is where the conversion came from. Continue this same process on the ad groups and keywords tabs until you know the GCLID's originating criteria.
Well, this is no answer, but the approach is similar to how you'd tackle any cryptography problem.
Possibility 1: They're just random, in which case, you're screwed. This is analogous to a one-time pad.
Possibility 2: They "mean" something. In that case, you have to control the environment.
Get a good database of them. Find gclids for your site, and others. Record all the times that clicks occur, and any other potentially useful data.
Get cracking! As you have already started, regress your collected data against what you know, and see if you can find patterns using decrypting techniques.
Start scraping random gclids, and see where they take you.
I wouldn't hold out much hope for this being successful, but I do wish you luck!
Looks like my rep is weak, so I'll just post another answer rather than a comment.
This is not an answer, clearly. Just voicing some thoughts.
When you enable auto tagging in Adwords, the gclid params are not added to the destination URLs. Rather they are appended to the destination URLs at run time by the Google click tracking servers. So, one of two things is happening:
The click servers are storing the gclid along with Adwords entity identifiers so that Analytics can later look them up.
The gclid has the entity identifiers encoded in some way so that Analytics can decode them.
From a performance perspective it seems unlikely that Google would implement anything like option 1. Forcing Analytics to "join" the gclid to Adwords IDs seems exceptionally inefficient at scale.
A different approach is to simply look at the referrer data which will at least provide the keyword which was searched.
Here's a thought: is there a chance the gclid is simply a cryptographic hash, à la bit.ly or some other URL shortener?
In which case the contents of the hashed text would be written to a database, and replaced with a unique id.
After all, the gclid is shortening a bunch of otherwise long text.
Take this example:
www.example.com?utm_source=google&utm_medium=cpc
Is converted to this:
www.example.com?gclid=XDF
just like a URL shortener.
One would need a substitution cipher in order to reverse engineer the cryptographic hash... not an easy task: https://crypto.stackexchange.com/questions/300/reverse-engineering-a-hash
Maybe some deep digging into logs, looking for patterns, etc...
I agree with Ophir and Chris. My feeling is that it is purely a serial number / unique click ID, which only opens up its secrets when the Analytics and Adwords systems talk to each other behind the scenes.
Knowing this, I'd recommend looking at the referring URL and pulling as much as possible from this to use in your back end click tracking setup.
For example, I live in NZ, and am using Firefox. This is a search from the Firefox Google toolbar for "stack overflow":
http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB
You can see that: a) I'm using the .nz domain, b) my keyword is "stack+overflow", c) I'm running Firefox.
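Pulling those pieces out of the referrer is a one-liner with the standard library; a minimal sketch (q and client are standard Google query parameters, though coverage varies by browser and Google domain):

from urllib.parse import urlparse, parse_qs

ref = ("http://www.google.co.nz/search?q=stack+overflow"
       "&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB")
parsed = urlparse(ref)
params = parse_qs(parsed.query)
print(parsed.netloc)         # www.google.co.nz -> country-specific domain
print(params.get("q"))       # ['stack overflow'] -> the search keyword
print(params.get("client"))  # ['firefox-a'] -> browser hint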
Finally, if you also stash the full landing page URL, you can store the GCLID, which will tell you the visitor came from paid search, whereas if it doesn't have a GCLID, the user must have come from natural search (if URL tagging is enabled, of course).
This would theoretically allow you to then search for the keyword in your campaign and figure out which ad group they came from. Knowing the creative would probably be impossible, though, unless you split test your landing URLs or tag them somehow.
