I'm using Node.js and ws for my WebSocket servers and want to know the best practice methods of tracking connections and incoming and outgoing messages with Azure Azure Application Insights.
It appears as though this service is really only designed for HTTP requests and responses so would I be fine if I tracked everything as an event? I'm currently passing the JSON.parse'd connection message values.
What to do here really depends on the semantics of your websocket operations. You will have to track these manually since the Application Insights SDK can't infer the semantics to map to Request/Dependency/Event/Trace the same way it can for HTTP. The method names in the API do indeed make this unclear for non-HTTP, but it becomes clearer if you map the methods to the telemetry schema generated and what those item types actually represent.
If you would consider a receiving a socket message to be semantically beginning an "operation" that would trigger dependencies in your code, you should use trackRequest to record this information. This will populate the information in the most useful way for you to take advantage of the UI in the Azure Portal (eg. response time analysis in the Performance blade or failure rate analysis in the Failures blade). Because this request isn't HTTP, you'll have to mend your data to fit the schema a bit. An example:
client.trackRequest({name:"WS Event (low cardinality name)", url:"WS Event (high cardinality name)", duration:309, resultCode:200, success:true});
In this example, use the name field to describe that items that share this name are related and should be grouped in the UI. Use the url field as information that more completely describes the operation (like GET parameters would in HTTP). For example, name might be "SendInstantMessage" and url might be "SendInstantMessage/user:Bob".
In the same way, if you consider sending a socket message to be request for information from your app, and has meaningful impact how your "operation" acts, you should use trackDependency to record this information. Much like above, doing this will populate the data in most useful way to take advantage of the Portal UI (Application Map in this case would then be able to show you % of failed Websocket calls)
If you find you're using websockets in a way that doesn't really fit into these, tracking as an event as you are now would be the correct use of the API.
How to effectively implement subscription mechanism in G-Wan? Suppose, I want to make g-wan aggregate data from various tickers and farther process it. And, obviously, every feed provides the data in its unique format.
The straightforward way would be to create connections and subscribe to data in the init() function of the connection handler, then parse source info from the responses and dispatch data from the main() function to dedicated queues. But this approach doesn't seem to make any use of the effective task scheduling engine of G-Wan. So, may be a dedicated software would solve the problem faster?
Another approach would be creating dedicated servlets for every subscription. For that, in the main() func of the connection handler, I would need rewriting headers and including names of corresponding servlets. In this case I would employ the whole g-wan machinery. But doesn't the rewriting headers negate all performance advantage of g-wan?
G-WAN already provides a simple publisher/subscriber engine, see the Comet servlet example.
This works fine with slow (typically 1 update per second) feeds.
For real-time and BigData feeds, there's no alternative to using G-WAN protocol handlers (to bypass connection handler rewrites and to precisely define the desired latency).
That's what happened for this project distributing 150 million messages per second via 75,000 channels to 1.5 million subscribers.
We have also made a (now famous) demo for the ORACLE OpenWorld expo in SFO that processed 1.2 billion TPS (transactions per second) on one single server, by using G-WAN as a cache for the ORACLE noSQL database (a Java KV store).
So the limits are more a question of precise tuning than G-WAN's core engine limitations.
I'm having difficulty conceptualising a requirement I have into something that will fit into our nascent SOA/EDA
We have a component I'll call the Data Downloader. This is a facade for an external data provider that has both high latency and a cost associated with every request. I want to take this component and create a re-usable service out of it with a clear contract definition. It is up to me to decide how that contract should work, however its responsibilities are two-fold:
Maintain the parameter list (called a Download Definition) for an upcoming scheduled download
Manage the technical details of the communication to the external service
Basically, it manages the 'how' of the communication. The 'what' and the 'when' are the responsibilities of two other components:
The 'what' is managed by 'Clients' who are responsible for
determining the parameters for the download.
The 'when' is managed by a dedicated scheduling component. Because of the cost associated with the downloads we'd like to batch the requests intraday.
Hopefully this sequence diagram explains the responsibilities of the services:
Because each of the responsibilities are split out in three different components, we get all sorts of potential race conditions with async messaging. For instance when the Scheduler tells the Downloader to do its work, because the 'Append to Download Definition' command is asynchronous, there is no guarantee that the pending requests from Client A have actually been serviced. But this all screams high-coupling to me; why should the Scheduler necessarily know about any 'prerequisite' client requests that need to have been actioned before it can invoke a download?
Some potential solutions we've toyed with:
Make the 'Append to Download Definition' command a blocking request/response operation. But this then breaks the perf. and scalability benefits of having an EDA
Build something in the Downloader to ensure that it only runs when there are no pending commands in its incoming request queue. But that then introduces a dependency on the underlying messaging infrastructure which I don't like either.
Makes me think I'm thinking about this problem in a completely backward way. Or is this just a classic case of someone trying to fit a synchronous RPC requirement into an async event-driven architecture?
The thing I like most about EDA and SOA, is that it almost completely eliminates the notion of race condition. As long as your events are associated with some association key (e.g. downloadId), the problem you describe can be addressed with several solutions of different complexities - depending on your needs. I'm not sure I totally understand the described use-case but I will try my best
Out of the top of my head:
DataDownloader maintains a list of received Download Definitions and a list of triggered downloads. When a definition is received it is checked against the triggers list to see if the associated download has already been triggered, and if it was, execute the download. When a TriggerDownloadCommand is recieved, the definitions list is checked against a definition with the associated downloadId.
For more complex situation, consider using the Saga pattern, which is implemented by some 3rd party messaging infrastructures. With some simple configuration, it will handle both messages, and initiate the actual download when the required condition is satisfied. This is more appropriate for distributed systems, where an in-memory collection is out of the question.
You can also configure your scheduler (or the trigger command handler) to retry when an error is signaled (e.g. by an exception), in order to avoid that race condition, and ultimately give up after a specified timeout.
Does this help?
What is the REST protocol and what does it differ from HTTP protocol ?
REST is a design style for protocols, it was developed by Roy Fielding in his PhD dissertation and formalised the approach behind HTTP/1.0, finding what worked well with it, and then using this more structured understanding of it to influence the design of HTTP/1.1. So, while it was after-the-fact in a lot of ways, REST is the design style behind HTTP.
Fielding's dissertation can be found at http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm and is very much worth reading, and also very readable. PhD dissertations can be pretty hard-going, but this one is wonderfully well-described and very readable to those of us without a comparable level of Computer Science. It helps that REST itself is pretty simple; it's one of those things that are obvious after someone else has come up with it. (It also for that matter encapsulates a lot of things that older web developers learnt themselves the hard way in one simple style, which made reading it a major "a ha!" moment for many).
Other application-level protocols as well as HTTP can also use REST, but HTTP is the classic example.
Because HTTP uses REST, all uses of HTTP are using a REST system. The description of a web application or service as RESTful or non-RESTful relates to whether it takes advantage of REST or works against it.
The classic example of a RESTful system is a "plain" website without cookies (cookies aren't always counter to REST, but they can be): Client state is changed by the user clicking a link which loads another page, or doing GET form queries which brings results. POST form queries can change both server and client state (the server does something on the basis of the POST, and then sends a hypertext document that describes the new state). URIs describe resources, but the entity (document) describing it may differ according to content-type or language preferred by the user. Finally, it's always been possible for browsers to update the page itself through PUT and DELETE though this has never been very common and if anything is less so now.
The classic example of a non-RESTful system using HTTP is something which treats HTTP as if it was a transport protocol, and with every request sends a POST of data to the same URI which is then acted upon in an RPC-like manner, possibly with the connection itself having shared state.
A RESTful computer-readable (i.e. not a website in a browser, but something used programmatically) system would obtain information about the resources concerned by GETting URI which would then return a document (e.g. in XML, but not necessarily) which would describe the state of the resource, including URIs to related resources (hypermedia therefore), change their state through PUTting entities describing the new state or DELETEing them, and have other actions performed by POSTing.
Key advantages are:
Scalability: The lack of shared state makes for a much more scalable system (demonstrated to me massively when I removed all use of session state from a heavily hit website, while I was expecting it to give a bit of extra performance, even a long-time anti-session advocate like myself was blown-away by the massive gain from removing what had been pretty slim use of sessions, it wasn't even why I had been removing them!)
Simplicity: There are a few different ways in which REST is simpler than more RPC-like models, in particular there are only a few "verbs" that are ever possible, and each type of resource can be reasoned about in reasonable isolation to the others.
Lightweight Entities: More RPC-like models tend to end up with a lot of data in the entities sent both ways just to reflect the RPC-like model. This isn't needed. Indeed, sometimes a simple plain-text document is all that is really needed in a given case, in which case with REST, that's all we would need to send (though this would be an "end-result" case only, since plain-text doesn't link to related resources). Another classic example is a request to obtain an image file, RPC-like models generally have to wrap it in another format, and perhaps encode it in some way to let it sit within the parent format (e.g. if the RPC-like model uses XML, the image will need to be base-64'd or similar to fit into valid XML). A RESTful model would just transmit the file the same as it does to a browser.
Human Readable Results: Not necessarily so, but it is often easy to build a RESTful webservice where the results are relatively easy to read, which aids debugging and development no end. I've even built one where an XSLT meant that the entire thing could be used by humans as a (relatively crude) website, though it wasn't primarily for human-use (essentially, the XSLT served as a client to present it to users, it wasn't even in the spec, just done to make my own development easier!).
Looser binding between server and client: Leads to easier later development or moves in how the system is hosted. Indeed, if you keep to the hypertext model, you can change the entire structure, including moving from single-host to multiple hosts for different services, without changing client code at all.
Caching: For the GET operations where the client obtains information about the state of a resource, standard HTTP caching mechanisms allow both for statements that the resource won't meaningfully change until a certain date at the earliest (no need to query at all until then) or that it hasn't changed since the last query (send a couple hundred bytes of headers saying this rather than several kilobytes of data). The improvement in performance can be immense (big enough to move the performance of something from the point where it is impractical to use to the point where performance is no longer a concern, in some cases).
Availability of toolkits: Because it works at a relatively simple level, if you have a webserver you can build a RESTful system's server and if you have any sort of HTTP client API (XHR in browser javascript, HttpWebRequest in .NET, etc) you can build a RESTful system's client.
Resiliance: In particular, the lack of shared state means that a client can die and come back into use without the server knowing, and even the server can die and come back into use without the client knowing. Obviously communications during that period will fail, but once the server is back online things can just continue as they were. This also really simplifies the use of web-farms for redundancy and performance - each server acts like it's the only server there is, and it doesn't matter that its actually only dealing with a fraction of the requests from a given client.
REST is an approach that leverages the HTTP protocol, and is not an alternative to it.
http://en.wikipedia.org/wiki/Representational_State_Transfer
Data is uniquely referenced by URL and can be acted upon using HTTP operations (GET, PUT, POST, DELETE, etc). A wide variety of mime types are supported for the message/response but XML and JSON are the most common.
For example to read data about a customer you could use an HTTP get operation with the URL http://www.example.com/customers/1. If you want to delete that customer, simply use the HTTP delete operation with the same URL.
The Java code below demonstrates how to make a REST call over the HTTP protocol:
String uri =
"http://www.example.com/customers/1";
URL url = new URL(uri);
HttpURLConnection connection =
(HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");
JAXBContext jc = JAXBContext.newInstance(Customer.class);
InputStream xml = connection.getInputStream();
Customer customer =
(Customer) jc.createUnmarshaller().unmarshal(xml);
connection.disconnect();
For a Java (JAX-RS) example see:
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-45.html
REST is not a protocol, it is a generalized architecture for describing a stateless, caching client-server distributed-media platform. A REST architecture can be implemented using a number of different communication protocols, though HTTP is by far the most common.
REST is not a protocol, it is a way of exposing your application, mostly done over HTTP.
for example, you want to expose an api of your application that does getClientById
instead of creating a URL
yourapi.com/getClientById?id=4
you can do
yourapi.com/clients/id/4
since you are using a GET method it means that you want to GET data
You take advantage over the HTTP methods: GET/DELETE/PUT
yourapi.com/clients/id/4 can also deal with delete, if you send a delete method and not GET, meaning that you want to dekete the record
All the answers are good.
I hereby add a detailed description of REST and how it uses HTTP.
REST = Representational State Transfer
REST is a set of rules, that when followed, enable you to build a distributed application that has a specific set of desirable constraints.
It is stateless, which means that ideally no connection should be maintained between the client and server.
It is the responsibility of the client to pass its context to the server and then the server can store this context to process the client's further request. For example, session maintained by server is identified by session identifier passed by the client.
Advantages of Statelessness:
Web Services can treat each method calls separately.
Web Services need not maintain the client's previous interaction.
This in turn simplifies application design.
HTTP is itself a stateless protocol unlike TCP and thus RESTful Web Services work seamlessly with the HTTP protocols.
Disadvantages of Statelessness:
One extra layer in the form of heading needs to be added to every request to preserve the client's state.
For security we may need to add a header info to every request.
HTTP Methods supported by REST:
GET: /string/someotherstring:
It is idempotent(means multiple calls should return the same results every time) and should ideally return the same results every time a call is made
PUT:
Same like GET. Idempotent and is used to update resources.
POST: should contain a url and body
Used for creating resources. Multiple calls should ideally return different results and should create multiple products.
DELETE:
Used to delete resources on the server.
HEAD:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The meta information contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
OPTIONS:
This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.
HTTP Responses
Go here for all the responses.
Here are a few important ones:
200 - OK
3XX - Additional information needed from the client and url redirection
400 - Bad request
401 - Unauthorized to access
403 - Forbidden
The request was valid, but the server is refusing action. The user might not have the necessary permissions for a resource, or may need an account of some sort.
404 - Not Found
The requested resource could not be found but may be available in the future. Subsequent requests by the client are permissible.
405 - Method Not Allowed
A request method is not supported for the requested resource; for example, a GET request on a form that requires data to be presented via POST, or a PUT request on a read-only resource.
404 - Request not found
500 - Internal Server Failure
502 - Bad Gateway Error
I am soon to embark on a medium scale project. Although this isn't a very high priority in my large list of things to do but I have been trying of how I could affectively handle data concurrency.
I will be using a stateless EJB backend to my flex application.
Ideally I am looking for a simple method to deal with data concurrency. e.g. if data is saved on one interface it is refreshed in another. Or it warns that the data has been changed before saving a new version of the data.
Has anyone any ideas as I am at a loss at the moment. As I mentioned its not a high priority but I would feel a lot better if I had some mechanism to improve the process.
If you are planning on using AMF channels for communication you can use the long polling feature to effectively give your application "push message" type support. Both the BlazeDS and/or GraniteDS data services support this capability for exactly the reasons you mentioned.
Version control systems store user_id and datetime for every revision. You can use same method. Client app get current datetime for requested data and save it. App send on changed data with saved datetime. Server checks datetime of last revision and received datetime. And reply to app accordingly.
Second method is using broadcast messages from server to clients. But I don't think it's applicable in your case. This method put into practice in LAN (environment with stable connect) usually.