A problem with detecting data - web scraping

The task is to download the table with the names of bookmakers and odds (here).
I cannot find the part of the source code that corresponds to this data. I tried using the Chrome extension SelectorGadget, unsuccessfully.
Similarly, when I try to open the matches (matches) I run into the same problem. Thank you for any advice.

The data is not in the HTML; it is loaded dynamically via JavaScript.
From the Terms of Service:
Without prior authorisation in writing from the Provider, Visitors are not authorised to copy, modify, tamper with, distribute, transmit, display, reproduce, transfer, upload, download or otherwise use or alter any of the content of the Website.
Therefore, do not expect us to assist you with breaching their terms of use.

Related

How to check if any URLs within my website contain specific Google Analytics UTM codes?

The website I manage uses Google Analytics to track URLs. Recently I found out that some of the URLs contain UTM codes and should not. I need some way of determining whether or not URLs that contain the following UTM codes utm_source=redirect or utm_source=redirectfolder are currently on the website and being redirected within the same website. If so, I will need to remove the UTM codes on those URLs, because Google Analytics automatically tracks URLs that redirect within the same domain. So it does not require UTM codes (and this actually hurts the analytics).
My apologies if I sound a little broken here, I am still trying to understand it all myself, as I am a new graduate with a CS degree and I am now the only web developer. I am not asking for anyone to write this for me, just if I could be pointed in the right direction to writing a ColdFusion script that may help with this.
So, if I understand correctly, your codebase is riddled with problematic URLs. To clean up the URLs programmatically, you'll need to do a few things up front.
Identify the query string parameter name/value pair that needs to be eliminated.
Create a worker file to access all your .cfm and .cfc files (of interest).
Create a loop that goes through the directories and reads, edits and saves your files (be careful here: consider not overwriting existing files, e.g. write each result under a unique name, unless you are sure).
Create a find/replace function or regular expression to target and remove your troublesome parameters.
Save your file and move on in the loop.
OR:
You can use an IDE or editor like Dreamweaver or Sublime Text to locate these via a regex search, then spot check and remove them.
I would remove the URL parameters selectively by hand, but if you have so many pages that this makes no sense, then programmatic removal would be the way to go.
You will be using cfdirectory, cffile, and either reMatch() (building an array and rebuilding the content) or a find/replace with replaceNoCase().
Your cfdirectory call will return a variable that behaves like a query, so you loop through it as you would a normal query with cfoutput.
Pull one or two files out of your repo to develop your code against until you are comfortable. I would code in exit strategies (fail gracefully), such as adding a locatable comment at each change spot so you can check it manually later, or bailing out if a file won't write, along with other try/catch opportunities.
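The loop described above can be sketched as follows. This is a minimal illustration in Python rather than ColdFusion, so it shows the shape of the logic, not the cfdirectory/cffile calls themselves. The function names, the regular expression and the ".bak" backup convention are assumptions made for this example; the two utm_source values are the ones from the question.

```python
import re
from pathlib import Path

# Matches utm_source=redirect or utm_source=redirectfolder as a whole parameter.
# The lookahead leaves longer values such as "redirects" untouched.
UTM_PARAM = r"utm_source=redirect(?:folder)?(?![\w-])"


def strip_utm(text: str) -> str:
    """Remove the unwanted utm_source parameter from every URL in text."""
    # Parameter is first and others follow: "?utm_source=redirect&id=3" -> "?id=3"
    text = re.sub(r"\?" + UTM_PARAM + r"&(?:amp;)?", "?", text, flags=re.I)
    # Parameter is the only one, or follows another one: drop it with its separator
    text = re.sub(r"(?:\?|&amp;|&)" + UTM_PARAM, "", text, flags=re.I)
    return text


def clean_tree(root: Path, dry_run: bool = True) -> list:
    """Return the .cfm/.cfc files that contain the parameter; rewrite them unless dry_run."""
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".cfm", ".cfc"}:
            continue
        raw = path.read_bytes()
        # surrogateescape lets bytes that are not valid UTF-8 round-trip unchanged
        original = raw.decode("utf-8", errors="surrogateescape")
        cleaned = strip_utm(original)
        if cleaned != original:
            changed.append(path)
            if not dry_run:
                # Keep a copy of the original next to the file before overwriting
                path.with_name(path.name + ".bak").write_bytes(raw)
                path.write_bytes(cleaned.encode("utf-8", errors="surrogateescape"))
    return changed
```

Running it with dry_run=True first only lists the affected files, which fits the advice to try the code on one or two files before letting it write anything.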
I hope this helps.

Application to list the page elements of a URL

I need to make an application that accesses a URL (like http://google.com), returns the time spent loading all of its elements (images, CSS, JS...), and compares these results with previous results.
This application needs to be a desktop app. I will save the information in a text or XML file and use this file to compare with previous results.
I have searched for a similar application, but found nothing.
There are some Firefox plugins that list these elements, like YSlow or Firebug, but they are not what I need.
So I'm totally lost and don't know how to start this work.
Is it possible to build this application? Which language is best for this type of application?
Thanks!
This is a very subjective question, so unless you elaborate on your requirements, you may not get any useful answers.
Some things you would need to answer are: how many URLs do you want to check, where do you want to store the results (database, files, etc.), and does it need to run on the desktop or on a server?
Personally, I like the statistics that cURL gives you (DNS time, connect time, receive time, etc.), so you could write something in PHP, but I stress that this is personal preference and may not suit your situation.
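As a rough illustration of the measure-and-compare idea, here is a minimal sketch in Python rather than PHP. It assumes the element URLs are already known (it does not parse the page to discover them), it measures only the total download time per URL rather than the DNS/connect/receive breakdown that cURL reports, and it keeps the previous results in a JSON text file. All function names and the file format are assumptions made for this example.

```python
import json
import time
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url: str) -> int:
    """Download a URL completely and return the number of bytes received."""
    with urlopen(url, timeout=30) as response:
        return len(response.read())


def time_urls(urls, fetch=fetch_bytes) -> dict:
    """Return {url: seconds taken to download it} for every URL given."""
    timings = {}
    for url in urls:
        start = time.perf_counter()
        fetch(url)
        timings[url] = time.perf_counter() - start
    return timings


def compare_with_previous(current: dict, history_file: Path) -> dict:
    """Return {url: change in seconds} against the last saved run, then save this run."""
    previous = json.loads(history_file.read_text()) if history_file.exists() else {}
    diff = {url: secs - previous[url] for url, secs in current.items() if url in previous}
    history_file.write_text(json.dumps(current, indent=2))
    return diff
```

A run would look like compare_with_previous(time_urls(["http://google.com"]), Path("timings.json")); a positive value in the result means the URL loaded more slowly than in the previous run.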

Developing an LMS and SCORM sequencing engine

We want an LMS (coded in ASP.NET/VB.NET) that can import SCORM packages and display them to learners for viewing content. I am totally new to SCORM and have been moved onto this project. I want to know how I can access the result of a SCORM assessment object (test), such as learner ID, pass/fail, and time.
Can you please guide me on what I will need to implement in ASP.NET code to accomplish my goal?
What I have done so far:
Reading a manifest zip file, unzipping it and getting all the information from it (content name, description, items and launch page); when the user clicks on a particular course, a pop-up window launches the page.
I am eager to know what I can do next to communicate with the LMS through the APIs. Do I need to develop my own LMS to get the result? If a quiz is running, all I need to know is the number of questions attempted by the user and whether the user passed or failed, and I need to store all of this information in the database for each user so that I can review the results afterwards.
So the remaining tasks are:
A tracking mechanism for delivering the content.
A SCORM/LMS sequencing engine that controls navigation between the parts of a SCORM-conformant course.
Please help.
SLK at CodePlex provides a good starting point. However, if you truly want to provide an in-house SCORM player that is fully compliant, you have a major task ahead of you. In essence there are three parts you need to fully develop:
CAM - the unzipping process, which it sounds like you have already achieved.
RTE - the JavaScript host for SCORM, providing the 8 specified methods. Behind this you also need to implement the SCORM object model, which SLK does help with. Once you have implemented all of this, there should be entries in the data model that indicate completion etc.
SN - the sequencing and navigation processing. This is by far the most complex part. I am still in the process of trying to implement this using SLK, and it is hard. Completing this part is what will potentially give you more information about what the learner has done.
It is also worth looking at scorm.com, who are a consultancy but provide a lot of useful information about the SCORM standard.
That is true. SCORM is one of those standards where you can implement as little as possible. But you will need some JavaScript together with a backend script (JSON to the rescue) so you can track the SCORM data and save it to your database.
But let me tell you this: this is the easiest task! Making your own course creator is a whole other beast.
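The backend half of that tracking might look roughly like the following. This is a minimal sketch in Python rather than ASP.NET, assuming the browser-side API adapter sends the values the course has set as a JSON object whenever the course commits its data. The element names are SCORM 2004 data model elements (SCORM 1.2 uses different names, such as cmi.core.lesson_status); the table layout, the function names and the payload shape are hypothetical.

```python
import json
import sqlite3

# SCORM 2004 data model elements this sketch chooses to keep (an assumption;
# a real LMS would track more of the data model)
TRACKED = (
    "cmi.completion_status",
    "cmi.success_status",
    "cmi.score.scaled",
    "cmi.session_time",
)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) tracking table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS attempt_data (
               learner_id TEXT NOT NULL,
               sco_id     TEXT NOT NULL,
               element    TEXT NOT NULL,
               value      TEXT NOT NULL,
               PRIMARY KEY (learner_id, sco_id, element))"""
    )
    return db


def save_commit(db, learner_id: str, sco_id: str, payload_json: str) -> int:
    """Store the tracked elements from one JSON payload; return how many were saved."""
    values = json.loads(payload_json)
    rows = [(learner_id, sco_id, name, str(values[name])) for name in TRACKED if name in values]
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT OR REPLACE INTO attempt_data VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def result_for(db, learner_id: str, sco_id: str) -> dict:
    """Return the stored elements for one learner and one SCO as a dict."""
    cur = db.execute(
        "SELECT element, value FROM attempt_data WHERE learner_id = ? AND sco_id = ?",
        (learner_id, sco_id),
    )
    return dict(cur.fetchall())
```

Storing one row per data model element keeps the table generic, so reviewing a learner's pass/fail result afterwards is a lookup of cmi.success_status for that learner and SCO.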

Excel Upload to database table

I'm looking for the best solution to allow our users to upload XLS spreadsheets so that they can be used to populate tables in our data warehouse (DW).
Our users are heavy Business Object (BO) users, and BO lets you export to XLS. When they have data in a spreadsheet that needs to be loaded to the DW, they need a process to upload the data in the XLS to the DW's db. As a result, we end up with many of these "interfaces" when I think that what we really need is a programmatic automated feed. Using Excel as a data source for inter-system feeds, in my gut, just seems like a bad idea to me.
Question #1: I'd like to see if you agree and why or why not.
OK, there is no swimming against that tide, so I now take as a given that XLS uploads are here to stay for us. Now I need to find the best solution. First, I'll explain what we do now and then what I don't like about it:
Via web pages, we provide empty XLS files (no rows) with a defined set of columns. Each file is intended to be used to update a different target table. In each spreadsheet is an "Upload" button. Pushing the Upload button results in the macro in the spreadsheet serializing the contents of the file to CSV and FTPing the data to a server folder. Periodically, a scheduler fires off an Informatica ETL job that uses the CSV file as input and loads the data into a custom XLS-specific staging table and then, if the records pass edits, into the appropriate target table. Any errors encountered are logged to an error table. For each XLS file uploaded, the data ends up in a separate staging and error table that is specific to the file.
Some of the things I don't like about our process are:
1) The macro code in the XLS is too exposed, includes passwords for example, can be tampered with and there are issues ensuring that the users are using the latest XLS templates.
2) Business Rule edits are placed in the ETL program, where they should probably be, but because we would like to catch the errors ASAP, i.e, in the spreadsheet, edits are also added to the macro code. This results in duplication of business edits. I want these rules in one place and centrally controlled. IMHO, I think putting any macro code in the XLS introduces a maintenance issue, even calls to stored procedures (some of which we have) or calls to web services (we haven't yet tried to call .NET Web Services from XLS macros.)
3) Every XLS file upload template has its own process with distinct set of staging and error tables and a custom screen for reporting errors encountered. It seems like we need a more generalized re-usable solution.
Besides often getting data exported to XLS from BO, the users also like Excel because it makes editing a large number of records easier and is less clunky than editing individual records via a web interface.
This is the general direction that I am thinking:
First, I want the users to have the ease of editing of Excel, but without embedding macros in the spreadsheet. I experimented with Farpoint's Grid with Excel compatibility...
http://www.fpoint.com/netproducts/spreadweb/tour/excel.aspx
...and I found that it was quite easy to let a user open an XLS file that resides on their PC in a browser and to access the data from server-side .NET web code. Excel isn't running locally in their browser, but the functionality of Excel is reproduced, presumably through a lot of client-side scripting that I expect would be a real pain to duplicate myself. You can even cut and paste from a local spreadsheet into the web spreadsheet. This sounds great, but my biggest problem is cost. Our company is near death and won't allow us to purchase any new software.
Next, I want to identify the common components across all spreadsheet upload processing and come up with generic processing code. For example, I imagine a table that defines each of our spreadsheets and its format, including the column names and data type definitions, perhaps in terms of their destination columns, instead of hard coding. From this table definition, I can generate XLS templates for download. I can also perform simple generic edits to ensure that the data entered matches the table definition. One common web page can then be used to present the data, report data type mismatch errors and allow the user to correct them. I would also define a common "staging" table for storing the data, perhaps with four columns: submission #, row num, name and value. No more "custom everything" is the goal.
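To make the idea concrete, here is a minimal sketch in Python (not .NET or Informatica) of a definition-driven check that also produces generic staging rows. The template name, column names, type names and function name are all hypothetical, and the cell values are assumed to arrive as strings, as they would from a CSV export.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# One converter per supported data type; a conversion failure is a type mismatch
CONVERTERS = {
    "text": str,
    "integer": int,
    "decimal": Decimal,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
}

# Hypothetical template definition: spreadsheet name -> ordered (column, type) pairs.
# In the design described above this would live in a database table.
TEMPLATES = {
    "customer_upload": [
        ("customer_id", "integer"),
        ("name", "text"),
        ("credit_limit", "decimal"),
        ("start_date", "date"),
    ],
}


def validate_and_stage(template: str, submission_no: int, rows: list):
    """Return (staging_rows, errors) for the uploaded rows of one submission.

    Every cell becomes a generic staging row (submission #, row num, name, value);
    cells that do not match the declared type also produce an error row.
    """
    staging, errors = [], []
    for row_num, row in enumerate(rows, start=1):
        for name, type_name in TEMPLATES[template]:
            raw = row.get(name, "")
            try:
                CONVERTERS[type_name](raw)
            except (ValueError, InvalidOperation):
                errors.append((submission_no, row_num, name, f"expected {type_name}, got {raw!r}"))
            staging.append((submission_no, row_num, name, raw))
    return staging, errors
```

Because the checks are driven entirely by the definition, adding a new spreadsheet means adding one entry to the definition table rather than a new set of staging tables, error tables and screens.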
Next I need to decide where to put the business rules. My dept's mgt firmly believes that all loading of data should be done by Informatica ETL batch processes and therefore the rules/edits belong "in Informatica". I have zero experience with Informatica tools; I am more of a .NET guy. I am therefore unsure how these rules are implemented, but I suspect that they are not reusable in the sense that a .NET web page could validate a particular record against them. You see, in some cases, when the user is not performing a bulk upload, they do have the ability to edit a specific record, and I would like the same edits that are applied by the ETL bulk insert process to be applied to an individual update of a single record via a web page. Is the solution to write a single web service or stored procedure that can be called either from the web page doing an update of a single record or thousands of times, once for each record, in a bulk upload? The latter sounds inefficient.
Your thoughts on anything above would be very much welcomed.
From a cost perspective, the efforts you'll need to go through to re-create spreadsheet functionality on the web will exceed the cost of Farpoint or other controls. Even if you made $20 an hour, do you think you could complete a working product in under 2 weeks? I think you have the facts on your side when you discussed maintenance issues if you allow ETL functionality to exist in Excel - you have twice the amount of work to maintain the transformation rules. I think you need to convince management that in order to create a maintainable, robust solution you need some flexible utilities.
Farpoint is a good choice. There is also SpreadsheetGear, a .NET engine that interprets Excel macros and can run on a web server. It has a Win32 control that allows you to create a WinForms solution with very Excel-like interface functionality. Last time I checked there was no web control for the product. It does an excellent job of providing Excel capability for processing large amounts of data.
Good luck. I think you will find a good solution since you seem to have a good grasp of the pros and cons of all the different potential solutions.

How do you handle attachments in your web application?

My original question got little response, probably due to poor wording on my part. Since then, I have thought about it and decided to reword it, hopefully for the better! :)
We create custom business software for our customers, and quite often they want attachments to be added to certain business entities. For example, they want to attach a Word document to a customer, or an image to a job. I'm curious as to how others are handling the following:
How does the user attach documents? Single attachment? Batch attachment?
How do you display the attached documents? Simple list? Detailed list?
And the killer question: how does the user then edit attached documents? Is this even possible in a web environment? Granted, the user can just view the attachment.
Is there a good control library to help manage this process?
Our current development environment is ASP.NET and C#, but I think this is a pretty agnostic question when it comes to development tools, save for the fact that I need to work in a web environment.
It seems we always run into problems with the customer and working with attachments in a web environment so I am looking for some successes that other programmers have had with their user base on how best to interact with attachments.
Start with one file upload control ("Browse button"), and use JavaScript to dynamically add more upload controls if they want to attach multiple files in a single batch.
Display them in a simple list format (Filename, type, size, date), but provide full details somewhere else if they want them.
If they want to edit the files, they have to download them, then re-upload them. Hence, you need a way that they can say "this attachment overrides that old attachment".
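One way to record "this attachment overrides that old attachment" is to keep every upload and store, on the new one, the id of the upload it replaces. The following is a minimal in-memory sketch in Python; the class, field and method names are hypothetical, and a real application would keep these records in its database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Optional


@dataclass
class Attachment:
    id: int
    filename: str
    size: int
    uploaded_at: datetime
    replaces: Optional[int] = None  # id of the attachment this upload overrides


class AttachmentList:
    """Attachments of one business entity, with old versions kept but hidden."""

    def __init__(self):
        self._items = {}
        self._ids = count(1)

    def upload(self, filename: str, size: int, replaces: Optional[int] = None) -> Attachment:
        if replaces is not None and replaces not in self._items:
            raise KeyError(f"no attachment with id {replaces}")
        att = Attachment(next(self._ids), filename, size, datetime.now(timezone.utc), replaces)
        self._items[att.id] = att
        return att

    def current(self) -> list:
        """Attachments that have not been overridden, in upload order."""
        superseded = {a.replaces for a in self._items.values() if a.replaces is not None}
        return [a for a in self._items.values() if a.id not in superseded]
```

The simple list shown to the user would come from current(), while the superseded versions remain available for an audit trail or an undo.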
I'm not familiar with C# and ASP.NET, so I can't recommend any libraries that will help.
http://developer.yahoo.com/yui/uploader/
