Web scraping client abstraction - compatibility with future web API - web-scraping

I'm creating a client for a website that will scrape the site for data.
I would like to design the client's API so that it could be used without modification if a web API were created in the future.
Currently the website does not provide any web API. It does use AJAX, so parts of its functionality can easily be reused within the client.
The biggest issue I'm dealing with now is that some data is not identified by integers. Instead, a string describing the name of the object is used. So if I were to use an integer in the abstraction and a string in the web scraping implementation, I would need some sort of mapping between integers and strings.
So my question is: should I keep trying to create a "perfect" abstraction for the client? Or should I just create a web scraping client and, if and when a web API becomes available, create a new client?

If I understand what you are asking, you are wondering whether it is worthwhile to create an intermediate API which your client talks to, and which in turn does the web scraping:
client-->API-->Web Site
Then when the Web site creates an API, your API would talk to it without modifications to the client:
client-->API-->Web Site API
versus just continuing to have the client scrape the web site directly until the web site provides an API:
client-->Web Site
And then have the client talk to the API:
client-->Web Site API
It's difficult to give you an answer without understanding your situation, but here are some considerations that can help you make a decision:
How difficult will it be to update the client? If there are many clients, or it's difficult to update them, then hiding some logic in your own API makes sense.
How likely is it that the Web Site API will map directly onto your API? You may need to change your client anyway if your API doesn't fit the web site API.
Will some other website provide a better or cheaper service? If so, putting your own API in between lets you switch to that other website with less impact on the client.
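To make the trade-off concrete, here is a minimal sketch of the abstraction in Python. Everything in it is hypothetical: the `Item` type, the `/items/<key>` and `/api/items/<key>` paths, the `<h1>` page layout and the JSON shape are assumptions for illustration, not anything the real site provides. It also assumes one possible answer to the integer-versus-string problem, which is to treat identifiers as opaque strings in the abstraction so that no mapping table is needed.

```python
import json
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Item:
    key: str    # opaque identifier; today it is the name the site uses
    title: str


class SiteClient(ABC):
    """The interface the rest of the application depends on."""

    @abstractmethod
    def get_item(self, key: str) -> Item:
        ...


class ScrapingClient(SiteClient):
    """Backed by HTML scraping (hypothetical page layout)."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # injected so the sketch runs without a network

    def get_item(self, key: str) -> Item:
        html = self._fetch(f"/items/{key}")
        match = re.search(r"<h1>(.*?)</h1>", html)
        if match is None:
            raise LookupError(f"no item named {key!r}")
        return Item(key=key, title=match.group(1))


class ApiClient(SiteClient):
    """Backed by a future web API (hypothetical JSON shape)."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch

    def get_item(self, key: str) -> Item:
        data = json.loads(self._fetch(f"/api/items/{key}"))
        # Even if the API uses integer ids, callers only ever see a string.
        return Item(key=str(data["id"]), title=data["title"])
```

Calling code only sees `SiteClient`, so swapping `ScrapingClient` for `ApiClient` later is a one-line change where the client is constructed. Whether the future API will actually fit this interface is exactly the second consideration above.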

Related

Frontend-backend communication for a mobile app

I am pretty new to servers and backend services, and I want to develop a mobile app with a backend part. I want this backend to serve an iOS app, an Android app and a website.
My concern today is how the frontend part communicates with the backend part:
does it work the same way a website works? (HTTP requests to the server?)
how does the exchange of data between the frontend and the backend happen?
what are the common solutions to my problem?
is there an efficient way to design this backend to serve mobile apps as well as a website?
is Parse (https://parse.com/) a good starting point?
Thanks
Looking at your questions in turn:
does it work the same way a website works? (HTTP requests to the server?)
There are many options, but probably the most common, or fashionable, at the moment is to use a RESTful interface:
http://en.wikipedia.org/wiki/Representational_state_transfer
Previously, a SOAP based web service might have been the most common choice:
http://en.wikipedia.org/wiki/SOAP
See here for some discussion on why you might now use REST rather than SOAP:
Why would one use REST instead of SOAP based services?
how does the exchange of data between the frontend and the backend happen?
Assuming REST, HTTP is used to transport the messages, and the application data is typically encoded as XML or JSON.
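As a rough illustration of that exchange, here is a minimal sketch in Python using only the standard library. The `/users/<id>` route and the sample data are made up for the example. The handler is a plain WSGI application, and `call` invokes it in-process so the sketch runs without opening a socket; a real deployment would sit behind an HTTP server and be called from the mobile app or the browser.

```python
import json
from wsgiref.util import setup_testing_defaults

USERS = {"1": {"id": 1, "name": "Ada"}}  # made-up sample data


def app(environ, start_response):
    """Tiny WSGI app: GET /users/<id> returns that user as JSON."""
    parts = environ["PATH_INFO"].strip("/").split("/")
    is_get = environ["REQUEST_METHOD"] == "GET"
    if is_get and len(parts) == 2 and parts[0] == "users":
        user = USERS.get(parts[1])
        if user is not None:
            body = json.dumps(user).encode("utf-8")
            start_response("200 OK", [("Content-Type", "application/json")])
            return [body]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [b'{"error": "not found"}']


def call(path):
    """Invoke the app in-process, as a server would for one GET request."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a default GET request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

The client side is just the reverse: send an HTTP request to a URL that names the resource, then decode the JSON body into native objects.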
what are the common solutions to my problem?
I think this is covered by the other parts of the question/answer.
is there an efficient way to design this backend to serve mobile apps as well as a website?
That's very dependent on your particular server application, especially its size and architecture. Suppose the server application is broken down into components, and the parts that generate the 'views' or HTML pages for the web app are distinct and well separated from the 'backend' parts. Suppose also that the functionality is largely the same whether the end user is on a website or a mobile device, so that only the way the views are generated differs between devices. Then an efficient design is one that keeps as much of the backend common as possible. If the application is used very differently by a mobile client, this may not make sense. More generally, an efficient design keeps as much functionality as possible common between the mobile and web applications.
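A minimal sketch of that separation, in Python, might look like the following. The function names and the product data are invented for the example; the point is only that the backend function is shared and the per-client difference is confined to the rendering step.

```python
import json
from html import escape


def list_products():
    """Backend logic shared by every client (sample data, not a real store)."""
    return [{"name": "Tea", "price": 3.5}, {"name": "Mug", "price": 8.0}]


def render_html(products):
    """View for the website: server-generated HTML."""
    rows = "".join(
        f"<li>{escape(p['name'])}: {p['price']:.2f}</li>" for p in products
    )
    return f"<ul>{rows}</ul>"


def render_json(products):
    """View for mobile apps: raw data, laid out by the app itself."""
    return json.dumps({"products": products})


def handle(client_kind):
    """Same backend call for both clients; only the view differs."""
    products = list_products()
    if client_kind == "web":
        return render_html(products)
    return render_json(products)
```

If the mobile app needed substantially different behaviour, `list_products` itself would start to fork, which is the case where sharing the backend may stop paying off.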
It would definitely be worth becoming familiar with the 'Model View Controller' architectural pattern, as most of the server-side frameworks, as well as many of the JavaScript web client frameworks and even the iOS and (to a lesser extent) Android frameworks, use these concepts:
http://en.wikipedia.org/wiki/Model–view–controller
One important consideration is whether you need 'push' or notification-like functionality in your mobile app. If so, you may want to look at some of the common solutions to see whether they meet your needs. It is probably easiest to start with Apple's and Google's offerings to get an understanding, but there are lots of other solutions available as well:
https://developer.apple.com/library/ios/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/Chapters/ApplePushService.html
http://developer.android.com/google/gcm/index.html
is Parse (https://parse.com/) a good starting point?
I am not familiar with this service, but you might be better off looking at a simple REST-based approach first to see if it meets your needs.
To answer your question
is Parse (https://parse.com/) a good starting point?
Yes it is.
But I would recommend reading up on topics such as
REST services
RESTful services vs SOAP - a good article
REST/JSON vs REST/JSON
Services such as Parse are called Mobile Backend as a Service (MBaaS). They are ideal for quickly creating web services for mobile developers who have little experience with backend development.
A quick Google search for 'MBaaS' will return many services similar to Parse, and most offer free developer accounts (with a certain number of free API calls per second/app).
I have used Apigee in a similar way, and the open source equivalent is Usergrid.
These services provide a GUI for the developer to create and deploy services, and the services are immediately available.
Separate test and production endpoints will be available.
In addition to basic CRUD operations, these services also enable easy social network integration, caching and analytics (depending on the service provider).
Features such as security and scalability are built in by the MBaaS provider (like Parse).

Using WebAPI for everything including web site?

I am building a web site, and have created an API for it using WebAPI. The API is secured with OAuth v1 using DotNetOpenAuth, and all is working fine with an iPhone app calling into the API. I would like to go back and make the pertinent parts of the web site use the API too, so that everything always goes through the API.
The part I am slightly confused about is this: if I make my web site login go through the API, set up the web site as an OAuth Consumer and get an OAuth token for the current user, should the web site code then make an HTTP call into WebAPI on the same box, passing my OAuth token in the HTTP Auth header?
It seems like quite an inefficient way to get the web site to call the API, as every call requires the server side to make an HTTP call as well, which doesn't sound particularly scalable to me. I am not sure of the alternatives, though, given that I want to use OAuth to secure the API.
This is a good question and keeps coming up (since you clearly realise the overhead of having another network-bound hop):
Do I need to consume my own API in my ASP.NET MVC or bypass API and go straight to the business logic?
I have tried to explain this in a blog post (towards the end of the post). However, in short: it depends. If your API and MVC site are part of the same application, then they sit next to each other as they are both the Presentation Layer - as I explain in the post. If, however, your API is the presentation layer of an SOA service and used by several clients including your MVC site, then yes it has to be separate.
In your case, I am inclined to put the MVC site side by side with your Web API, both accessing the same business layer. I believe this also fixes the OAuth issue you are having.

WCF and ASP.NET Web API: Benefits of both?

I'm about to start a project where we have a back-end service to do long-running processing so that our ASP.NET website is free to handle quicker requests. As a result I have been reading up on service technologies such as WCF and Web API to get a feel for what they do. Since this back-end service will actually be made up of several services communicating with each other and will not be publicly available to our customers, it seems that WCF is the ideal technology for this kind of scenario.
But after doing a lot of research I am still confused as to the benefits and differences between WCF and Web API. In general it seems that:
If you want a public and/or a RESTful API then Web API is best
WCF can support far more transports than just HTTP so you can have far more control over them
Web API development seems easier than WCF due to the additional features/complexity of WCF
But perhaps my question boils down to the following:
Why would a REST service be more beneficial anyway? Would a full blown WCF service ever be a good idea for a public API? Or is there anything that a WCF service could provide that Web API cannot?
Conversely, if I have a number of internal services that need to communicate with each other and would be happy to just use HTTP as the transport, does Web API suddenly become a viable option?
I answered a couple of related questions:
What is the future of ASP.NET MVC framework after releasing the asp.net Web API
Should it be a WebAPI or asmx
As an additional resource, I would like to recommend you to read:
http://www.codeproject.com/Articles/341414/WCF-or-ASP-NET-Web-APIs-My-two-cents-on-the-subjec
If you want to learn more about REST, check this Martin Fowler article
Summing up:
As far as I know, both technologies are being developed by the same team at Microsoft. WCF won't be discontinued; it will still be an option (for example, if you want to increase the performance of your services, you could expose them through TCP or Named Pipes). The future is clearly Web API.
WCF is built to work with SOAP
Web API is built to work with HTTP
To make the right choice:
If your intention is to create services that support special scenarios – one way messaging, message queues, duplex communication etc, then you’re better off picking WCF
If you want to create services that can use fast transport channels when available, such as TCP, Named Pipes, or maybe even UDP (in WCF 4.5), and you also want to support HTTP when all other transports are unavailable, then you’re better off with WCF and using both SOAP-based bindings and the WebHttp binding.
If you want to create resource-oriented services over HTTP that can use the full features of HTTP – define cache control for browsers, versioning and concurrency using ETags, pass various content types such as images, documents, HTML pages etc., use URI templates to include Task URIs in your responses, then the new Web APIs are the best choice for you.
If you want to create a multi-target service that can be used as both resource-oriented service over HTTP and as RPC-style SOAP service over TCP – talk to me first, so I’ll give you some pointers.
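To show what "versioning and concurrency using ETags" means in practice, here is a language-neutral sketch written in Python. It is not Web API code; the function names and the choice of a truncated SHA-256 hash as the ETag are assumptions for illustration. It models a conditional GET (the server answers 304 when the client's cached copy is still current) and a conditional PUT (the server answers 412 when the client edited a stale version).

```python
import hashlib
import json


def make_etag(resource: dict) -> str:
    """Derive a validator from the resource's current content."""
    payload = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def conditional_get(resource: dict, if_none_match=None):
    """Return (status, headers, body); 304 means the cached copy is valid."""
    etag = make_etag(resource)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    body = json.dumps(resource).encode("utf-8")
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, body


def conditional_put(current: dict, new: dict, if_match: str):
    """Optimistic concurrency: reject the write if the client's copy is stale."""
    if if_match != make_etag(current):
        return 412, current  # Precondition Failed, nothing is overwritten
    return 200, new
```

These are plain HTTP mechanisms, which is why a framework built directly around HTTP exposes them more naturally than a SOAP-oriented one.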
One cumbersome bit of WCF is the need to generate new client proxies when the input and/or output models change in the service. REST services don't require proxies; the client simply changes the query string it sends, or changes how it parses and uses the different output.
I found the default JSON serializers in .NET to be a bit slow, so I used http://json.codeplex.com/ for inbound and outbound serialization.
WCF services are not that complex, and REST services can be equally challenging, as you're working within the confines of HTTP.
ASP.NET Web API is all about HTTP and REST: GET, POST, PUT and DELETE, with the well-known ASP.NET MVC style of programming and JSON responses. Web API is for lightweight, purely HTTP-based components. Going with WCF for even the simplest single web service brings along all the extra baggage. For a simple, lightweight service handling AJAX or dynamic calls, Web API is enough. It neatly complements ASP.NET MVC and works alongside it.
Check out the podcast : Hanselminutes Podcast 264 - This is not your father's WCF - All about the WebAPI with Glenn Block by Scott Hanselman for more information.

Recommended way to create Public API for ASP.net website that uses Entity Framework

I currently have an ASP.NET Web Forms site that uses Entity Framework for all the CRUD operations.
I need to create a public facing API for my website.
I need the following from an API:
Authentication of clients consuming the API
Usage logging, to make sure there is no abuse, etc.
Throttling as an added extra to make sure one person doesn't overload the API.
Preferably the return data should be able to return in either JSON or XML, based on a flag the calling client uses.
I am looking for any guidance as to the most efficient way to create a public API that caters for these requirements. Suggested books, links and any other suggestions are all welcome.
Doing this in code is definitely doable, but it's fairly involved for all those functions. An easier way is to use something like 3scale (http://www.3scale.net), which does all of this out of the box: you can issue API keys, rate limit them, get analytics for the API, and create a developer portal. Setup is generally via a code library you drop into your system (libraries are here: https://support.3scale.net/libraries). Alternatively, there's an API, or you can set up Varnish as an API proxy in front of your application using this mod: https://github.com/3scale/libvmod-3scale/.
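To give a feel for what the hand-rolled version involves, here is a minimal sketch in Python of per-key throttling with usage logging. It is not how 3scale works internally; the `RateLimiter` class and its fixed-window policy are assumptions chosen for brevity, and a production limiter would also need shared storage across servers and real authentication of the key.

```python
import time


class RateLimiter:
    """Fixed-window limiter: at most `limit` calls per `window` seconds per key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the behaviour can be tested
        self._windows = {}   # api_key -> (window_start, calls_in_window)
        self.log = []        # (timestamp, api_key, allowed) usage records

    def allow(self, api_key):
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window expired, start afresh
        allowed = count < self.limit
        if allowed:
            count += 1
        self._windows[api_key] = (start, count)
        self.log.append((now, api_key, allowed))
        return allowed
```

Each incoming request would call `allow(api_key)` before doing any work and return an HTTP 429 response when it comes back false. The `log` list stands in for whatever usage store you would query to spot abuse.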
For the data return type, you would typically switch on a .json or .xml suffix in the API request and handle it as a content type within the code.
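A minimal sketch of that switch, in Python with only the standard library, could look like this. The `serialize` function, the `<record>` root element and the choice of JSON as the default are assumptions for the example, and it only handles flat records whose keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def serialize(record: dict, path: str):
    """Pick the response format from the request path's suffix.

    Returns a (content_type, body) pair.
    """
    if path.endswith(".xml"):
        root = ET.Element("record")
        for key, value in record.items():
            # Each field becomes a child element; keys must be valid XML names.
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    # Anything else, including an explicit .json suffix, gets JSON.
    return "application/json", json.dumps(record)
```

For example, a request for `/users/1.xml` and one for `/users/1.json` would run the same lookup and differ only in this final step.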
Use WebAPI:
ASP.NET Web API is a framework that makes it easy to build HTTP
services that reach a broad range of clients, including browsers and
mobile devices. ASP.NET Web API is an ideal platform for building
RESTful applications on the .NET Framework.
http://blogs.msdn.com/b/henrikn/archive/2012/02/23/using-asp-net-web-api-with-asp-net-web-forms.aspx
http://www.asp.net/web-api/overview/hosting-aspnet-web-api/using-web-api-with-aspnet-web-forms
http://www.beletsky.net/2011/10/integrating-aspnet-mvc-into-legacy-web.html

Web Database or SOAP?

We’ve got a back office CRM application that exposes some of its data in a public ASP.NET site. Currently the ASP.NET site sits on top of a separate, cut-down version of the back office database (we call this the web database). Daily synchronisation routines, hosted in the back office, keep the databases up to date. The problem is that the synchronisation logic is very complex and time-consuming to change. I was wondering whether using a SOAP service could simplify things. The ASP.NET web pages would call the SOAP service, which in turn would make the database calls. There would be no need for a separate web database or synchronisation routines. My main concern with the SOAP approach is security, because the SOAP service would be exposed to the internet.
Should we stick with our current architecture? Or would the SOAP approach be an improvement?
The short answer is yes, web service calls would be better and would remove the need for synchronization.
The long answer is that you need to understand the technology available to you in terms of web services. I would highly recommend looking into WCF, which will allow you to do exactly what you want and will also let you expose your services only to the ASP.NET web server rather than to the entire internet.
There would be no security problem. Simply use one of the secure bindings, like wsHttpBinding.
I'd look at making the web database build process more maintainable.
Since security is obviously a concern, you need to add logic to limit the types of data and requests, and that logic has to live SOMEWHERE.

Resources