How to check if any URLs within my website contain specific Google Analytics UTM codes? - google-analytics

The website I manage uses Google Analytics to track URLs. Recently I found out that some of the URLs contain UTM codes when they should not. I need a way to determine whether URLs containing the UTM codes utm_source=redirect or utm_source=redirectfolder are currently on the website and redirect within the same website. If so, I will need to remove the UTM codes from those URLs: Google Analytics automatically tracks URLs that redirect within the same domain, so they do not require UTM codes (and the codes actually hurt the analytics).
My apologies if I sound a little broken here, I am still trying to understand it all myself, as I am a new graduate with a CS degree and I am now the only web developer. I am not asking for anyone to write this for me, just if I could be pointed in the right direction to writing a ColdFusion script that may help with this.

So, if I understand correctly, your codebase is riddled with problematic URLs. To clean them up programmatically, you'll need to do a few things up front:
1. Identify the query-string parameter name/value pair that needs to be eliminated.
2. Create a worker file to access all your .cfm and .cfc files (of interest).
3. Create a loop that goes through the directories and reads, edits and saves your files. Be careful here: unless you are sure, do not overwrite the existing files; write the result under a unique name instead.
4. Create a find/replace function or regular expression to target and remove your troublesome parameters.
5. Save your file and move on in the loop.
OR:
You can use an IDE or editor like Dreamweaver or Sublime Text to locate these via a regex search, then spot-check and remove them.
I would selectively remove the URL parameters, but if you have so many pages that it makes no sense, then programmatic removal would be the way to go.
You will be using cfdirectory, cffile, and either reMatch() (building an array and then rebuilding the string) or a find/replace with replaceNoCase().
Your cfdirectory call returns a query variable, which you loop through with cfoutput as you would any normal query.
Pull one or two files out of your repo to develop your code against until you are comfortable. I would code in exit strategies (fail gracefully), like adding a locatable comment at each change so you can check it manually later, or bailing out if a file won't write, along with the many other try/catch opportunities.
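To make the loop above concrete, here is a minimal sketch of the same idea in Python rather than ColdFusion (the question asks for ColdFusion, so treat this purely as an illustration of the logic). The names strip_utm and clean_tree and the ".cleaned" suffix are made up for this example. It only handles plain "?" and "&" separators, not HTML-escaped "&amp;", and it defaults to a dry run so nothing is written until you have spot-checked the list.

```python
import re
from pathlib import Path

# Only the two values named in the question; any other utm_source is left alone.
_PARAM = r"utm_source=(?:redirectfolder|redirect)(?![\w.-])"


def strip_utm(text):
    """Remove the unwanted utm_source parameter from every URL in text."""
    # Parameter followed by others: keep the leading "?" or "&", drop "param&".
    text = re.sub(r"([?&])" + _PARAM + r"&", r"\1", text)
    # Parameter is last (or alone): drop it together with its "?" or "&".
    return re.sub(r"[?&]" + _PARAM, "", text)


def clean_tree(root, extensions=(".cfm", ".cfc"), dry_run=True):
    """Return the files under root whose content contains the parameter.

    With dry_run=False the cleaned text is written next to the original
    as "<name>.cleaned", so no existing file is overwritten.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        try:
            # latin-1 maps every byte to one character, so the content
            # round-trips unchanged whatever the file's real encoding is.
            original = path.read_bytes().decode("latin-1")
        except OSError:
            continue  # unreadable file: skip it rather than abort the run
        cleaned = strip_utm(original)
        if cleaned != original:
            changed.append(path)
            if not dry_run:
                out = path.with_name(path.name + ".cleaned")
                out.write_bytes(cleaned.encode("latin-1"))
    return changed
```

Calling clean_tree("path/to/site") first gives you the list of affected files to review; run it again with dry_run=False once you trust the result.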
I hope this helps.

Related

How to get github files information with webscraping

A few weeks back I was faced with a challenge that was basically to use web scraping to get all the files of a GitHub repository, group them by extension, and sum the size of the files for each extension. The important thing about this is that we SHOULD NOT use GitHub's API or any web-scraping tool.
My solution was to get the main HTML page as a string and apply a regex to extract all URLs that had <repo_owner>/<repo_name>/blob and <repo_owner>/<repo_name>/tree. For the blob URLs, we could make another request and apply another regex to extract the file size and line count; for the tree URLs, we'd make another request to extract more blob URLs. I did this until there were no tree URLs left.
It solved the problem, but it was a pretty bad solution because it makes too many requests to GitHub, and we always got blocked at some point while analyzing a repository. I applied a delay between requests, but then it takes a "LOOOT" of time to process one repository; and even making something like 10 requests simultaneously still runs into the too-many-requests problem.
Until today this bothers me because I couldn't find a better solution for this. As the challenge isn't valid anymore I'd like to know someone else's ideas of how this could be solved!
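For reference, the extraction and grouping steps described above might look roughly like this. It is only a sketch of the approach in the question, not a fix for the rate limiting: the function names are invented, the href pattern is an assumption about how the links appear in the page, and the file sizes are passed in as a plain dict because fetching them is the part that needs the extra requests.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def extract_links(html, owner, repo):
    """Split the repository links found in html into blob and tree paths."""
    pattern = re.compile(
        r'href="(/%s/%s/(blob|tree)/[^"#?]+)"'
        % (re.escape(owner), re.escape(repo))
    )
    blobs, trees = set(), set()
    for path, kind in pattern.findall(html):
        (blobs if kind == "blob" else trees).add(path)
    return sorted(blobs), sorted(trees)


def total_size_by_extension(sizes):
    """Sum file sizes per extension; sizes maps a file path to its byte count."""
    totals = defaultdict(int)
    for path, size in sizes.items():
        extension = PurePosixPath(path).suffix or "(none)"
        totals[extension] += size
    return dict(totals)
```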

A problem with detecting data

The task is to download the table with the names of bookmakers and their odds (here).
I cannot find the part of the source code that corresponds to this data. I tried to use a Chrome extension named SelectorGadget, unsuccessfully.
Similarly, when I want to open matches (matches) I run into the same problem. Thank you for any advice.
The data is not in the HTML, it is dynamically loaded via JavaScript.
From the Terms of Service:
Without prior authorisation in writing from the Provider, Visitors are not authorised to copy, modify, tamper with, distribute, transmit, display, reproduce, transfer, upload, download or otherwise use or alter any of the content of the Website.
Therefore, do not expect us to assist you with breaching their terms of use.

Drupal WebForms Data Structure Reference?

I am relatively new to Drupal. We have a site and I've been asked to jump in and make some changes. I'm working on customizing the output of the Webforms module. I'm having trouble doing so because I can't seem to find a reference to the various data structures Webforms uses.
For example, I need to change something in a preprocess hook. Passed into the hook is a structure called $variables. I can see that attributes are being added to the piece I want to change, so I know I'm in the right hook. What I want to do is add something to the text. But I can't figure out where in $variables the text is so I can change it.
I'm sure what I need to change is in there, but I can't seem to get at it. All the documentation I've found on the web is either "paste this code in" or assumes you know the data structures.
So:
1. Is there a reference anywhere to these structures? $variables is one. $submission, $components are others. There are probably more. I know their contents vary widely with the specific webform, but looking for a general reference.
2. How can I see the contents of one of the structures from inside a hook? I've tried a lot of things, but no luck. Would be great to either have it output to the Apache log, or show up on the screen, something...
Any help would be greatly appreciated. It feels like there's real power here, but I can't get at it because I'm missing some basics.
I would say you need to install 2 modules to figure out what is going on...
First Devel, which lets you use the dpm function. This will output a whole array to the message area.
And then my new favorite module, Search Krumo.
A webform is generated from a large array of data, and finding the bit that is relevant to you can often be difficult just looking through the dpm output. Search Krumo puts a search box in the message area, allowing you to search for any instances of a string in the whole array structure. When you've found the bit that is relevant, it also lets you copy the path to that array element so you can easily modify values buried deep in multi-dimensional arrays.
EDIT:
If you don't want the output on the screen but would rather log it then use Devel Debug Log. Very useful for debugging ajax requests etc.
If you just need to log simple strings not whole arrays then the dd function is useful combined with: tail -f /tmp/drupal_debug.txt assuming you have SSH access.

Is static URL better than dynamic URL in terms of SERP?

I've been reading up on SEO and how to construct my links in terms of getting better SERP.
I'm using WordPress as the framework for my site and have custom templates retrieving data from my DB.
What makes a URL dynamic, is the usage of ? and &. Nothing more, nothing less. Google recommends that I should not have too many attributes in my URL - and that's understandable.
Dynamic: www.mysite.com/?id=123&name=some+store+name&city=london
Static: www.mysite.com/london/some+store+name/123
Q1: I don't feel that adding the store ID in this static URL looks nice. But I do need it in order to fetch data from the DB, right?
Reading various blogs, I see many SEO (experts) saying different things, but I feel most of it is just talk without actually proving their statements. We can all agree that static URLs are good in terms of usability (and readability).
Q2: But many claim that static URLs prevent duplicate content. I don't agree on that as all my contents have unique ID. Can anyone comment on this?
Q3: In the end, for the Google search engine (and others) it really doesn't matter if the URL is static or dynamic. But since Google is working towards user friendly content, is that the only argument for having static URLs?
1) There's no problem using DB ids alongside static URLs. Many huge e-commerce and other commercial sites do this (Amazon, eBay... hell, everyone really.)
2) A static URL in and of itself does not prevent duplicate content. There are hundreds of ways duplicates can happen (child pages, external copy, JavaScript, form fields, AJAX, archive sections... the list goes on.)
3) It doesn't matter if it's static or dynamic for indexing. But in terms of ranking well, static URLs containing informative words (relevant to the targeted keywords) are hugely beneficial. Multivariate testing I've done shows users are also generally reassured by clean-looking URLs in terms of usability.
If you give me some more examples, I can probably help out a bit more.
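As a small illustration of point 1, the ID can simply ride along as the last path segment and be read back when the page is requested. This is a sketch only: store_url and store_id_from_path are hypothetical helpers, and the /city/name/id layout is the one from the question.

```python
import re
from urllib.parse import quote_plus


def store_url(city, name, store_id):
    """Build a static-looking path that still carries the database ID."""
    return "/%s/%s/%d" % (quote_plus(city.lower()), quote_plus(name.lower()), store_id)


def store_id_from_path(path):
    """Return the trailing numeric ID of a path, or None if there is none."""
    match = re.search(r"/(\d+)/?$", path)
    return int(match.group(1)) if match else None
```

Only the ID is needed for the database lookup; the city and name segments are there for readers and search engines.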
URLs without parameters are always better. Parameters won't absolutely kill SEO, but it is better not to have them.
10 years ago Google would ignore parameters and would penalize you for URLs with parameters. Today they are really good at figuring out these DB parameters, but not perfect. Among other things, Google has to try to figure out which URL parameters matter, which don't, and whether parameter order matters.
E.g. you may have URL parameters that store user preferences, navigation state etc. This will just proliferate URLs that Google has to try to decode. So what you should do is:
Right before generating a URL, at least sort your parameters.
Convert parameters that matter into things that don't look like parameters. So if I had a shoe store with a URL like http://mystore.com/mypage?category=boots&brand=great&color=red I'd rewrite that to something like http://mystore.com/mypage/category/boots/brand/great/color/red or even better:
http://mystore.com/mypage/boots/great/red
Then you can add the parameters that don't matter for the page content at the end. Google will figure out they don't matter.
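Both suggestions can be sketched with Python's standard urllib.parse. The function names sort_params and params_to_path are invented for this example, and which keys "matter" is something you would decide per site.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def sort_params(url):
    """Return url with its query parameters in a stable, sorted order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def params_to_path(url, keys):
    """Move the parameters named in keys into path segments, in that order.

    Parameters not listed in keys stay in the query string, sorted.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    values = dict(pairs)
    segments = [quote(values[key], safe="") for key in keys if key in values]
    rest = sorted((key, value) for key, value in pairs if key not in keys)
    path = parts.path.rstrip("/") + "".join("/" + segment for segment in segments)
    return urlunsplit(parts._replace(path=path, query=urlencode(rest)))
```

For the shoe-store URL, params_to_path(url, ["category", "brand", "color"]) produces the /mypage/boots/great/red form, and any leftover parameter such as a sort order stays at the end as a normal query string.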
The other reason to fix your URLs is that Google displays them to users in the SERP, and people are more likely to click on readable URLs than database URLs.
Why do big stores like Amazon use database URLs? Because they are giant, bad URLs don't hurt them, and their systems are so large and complex that it is the only way to manage them. But for smaller sites with fewer products, readable URLs are achievable and are one of the few advantages a small site can have over a big one.
Anyone who looks closely at Google's SERP results will notice that parts of each result are highlighted in bold. Looking further, it is easy to see that the search query is what gets highlighted, in the title, description and URL of results that use the same search query in those places.
Now, if a website's URLs are dynamic and carry only a parameter ID, the site is losing that keyword match in the URL.
Ex:
http://www.johnzaccheofineart.com/catagory-2/?id=4
http://www.johnzaccheofineart.com/painting/johnzaccheo
Sample Search : Painting for Sale
Now we can easily see the difference between static and dynamic URL performance. One URL contains words with no search value; the other contains the category name as well as the painter's name.
So, as a user, I would prefer the second one, which is understandable from the URL itself.

Can anyone provide a good info on the various uses of hash(#) in urls?

I'm developing software that is going to provide in-depth information about URLs.
While the GET params are simple, I'm having trouble with the hash.
At first it was used to mark places in the document to navigate to, but we're past that now. I've seen JS engines using it to store params similar to the get strings.
So, here's my question: is everything that comes after a hash free game, or are there any conventions about what it should look like?
Try these sites; they could help: Fragment Identifier (Wikipedia) or Pound Sign (Google).
They have a list of examples you could use.
It all depends on what you need. Hashes are used in modern web applications that make asynchronous calls to the server using AJAX. This allows the user, for example, to copy the link and get the same content after pasting it (the actions taken are stored in the hash, which changes a URL that would otherwise remain static).
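For a tool that inspects URLs, one way to handle this is to pull the fragment out with the standard library and then guess how it is being used. The sketch below is a heuristic, not a standard: the function name describe_fragment and the categories ("route", "params", "anchor") are invented for this example, and an application is free to put anything else after the hash.

```python
from urllib.parse import parse_qs, urlsplit


def describe_fragment(url):
    """Classify the part of url after "#" using simple heuristics."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return {"kind": "none", "value": ""}
    if fragment.startswith(("!", "/")):
        # "#!/users/42" or "#/inbox": client-side routing style
        return {"kind": "route", "value": fragment.lstrip("!")}
    if "=" in fragment:
        # "#tab=2&sort=asc": state stored like a query string
        return {"kind": "params", "value": parse_qs(fragment)}
    # "#section-2": the classic in-page anchor
    return {"kind": "anchor", "value": fragment}
```

Keep in mind that the fragment is handled by the browser and is not sent to the server with the request, so a tool that only sees server logs will never see it.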
You want to read http://www.jenitennison.com/blog/node/154

Resources