I'm hoping this will be a quick answer (probably a 'No').
I have set up a web service on Server B to receive HTTP POST data in JSON format from Server A. I don't have code level access to Server A, but I can manually trigger it to send data to my web service.
My current problem is that I have asked the Server A guys to send me a sample of what is being sent so I can program for the formats etc, but they are taking their sweet time responding.
I know the sending is working, and my WS is responding with my default return string (though Server A is seeing it as an error rather than a success; I don't yet know what they expect back for a successful transmission).
I am wondering if it is possible to receive and analyse the data without knowing exactly what is being sent? This way I can start my next phase of coding without needing to wait for them to provide a sample. Plus, I'm not sure how much the format will change for different jobs, so would be good to be able to accept whatever is sent and be able to look at it.
EDIT: To add more background.
Server A is a production application that we use. We have just found out that they have an API that can send data to us (HTTP POST in JSON format) each time one of our users completes whatever they are doing. We want to then store this data to build tables/stats for our clients to view (but that is another story).
You... can try putting together some dummy data, if you know enough about the type of data you'll be getting... But if you don't even know what shape it'll be, I don't know how in the world you would "analyze" it. Unless by "analyze," you mean get the size or something generic like that...
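If the body really is JSON, one way to "look at whatever is sent" is to decode it generically and log a summary of its shape before writing any format-specific code. Here is a minimal sketch in Python, assuming the web service can hand you the raw POST body as bytes; the function names and the sample payload are made up for illustration and say nothing about what Server A actually sends.

```python
import json

def describe(value):
    """Return a nested summary of the types found in a decoded JSON value."""
    if isinstance(value, dict):
        return {key: describe(item) for key, item in value.items()}
    if isinstance(value, list):
        # Summarise a list by the shape of its first element only.
        return [describe(value[0])] if value else []
    return type(value).__name__

def inspect_body(raw_body):
    """Decode a raw POST body and summarise it; keep the raw text if it is not JSON."""
    text = raw_body.decode("utf-8", errors="replace")
    try:
        return describe(json.loads(text))
    except json.JSONDecodeError:
        return {"unparsed": text}

# Hypothetical payload, only to show the kind of summary you would log.
sample = b'{"user": "u1", "score": 9.5, "tags": ["a"], "done": true}'
print(inspect_body(sample))
```

Logging the raw body alongside the summary for a few manually triggered sends should give you real samples to code against while you wait.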
I am writing a web application for an academic research group. The researchers need to be able to upload large data sets (100MB - 1GB) in CSV format. I've written the server to process the data as it comes in. This means that if there is an error in the first row of the CSV, we can return an error straight away.
However, when this happens, the browser reports that "The connection was reset" or similar. Clearly, my web server is responding in a way that doesn't make sense.
If I explicitly close the HTTP request stream (this is Kotlin on the JVM by the way) before returning the error to the browser, then the problem goes away. However, it turns out that the close implementation of the request stream first goes and reads the whole stream to its end. So at that point the user still has to wait 30mins+ to find out that there is an error in the first row of their CSV.
Is what I am trying to do possible? Does the HTTP protocol permit a web server, in any circumstances, to begin responding before the full request body has been sent? If not, can you suggest a workaround that would allow me to deliver a user experience where the user doesn't have to wait for the whole file to be uploaded before finding out if there are any problems?
The answer is yes. According to the HTTP spec, servers should be able to send responses early, and the client should then stop sending the request body. Most browsers, however, don't implement this correctly.
In theory, your HTTP server needs to return a 4xx error code with a response body, then reset the connection to prevent the upload continuing in the background. See the answers below for a more detailed description of the issue. There are a couple of browser versions that do support this, so if you're doing this in lab conditions where you can control the client being used, the links below will help.
https://stackoverflow.com/a/14483857/2274303
https://stackoverflow.com/a/18370751/2274303
[edit]
To answer your question about using a workaround: chunking the uploads using JavaScript is a good way to mitigate internet connectivity issues, but if you want to parse it in real time it's not as simple as arbitrarily breaking up the file into pieces. You need to make sure you're not splitting the file in the middle of a line, otherwise it will fail even if the data is valid. That brings up the issue of parsing a 1GB file in JavaScript, which isn't a good idea imo.
If you want to use JavaScript, continue uploading the entire file at once via an ajax request, so you can get the response outside of the main DOM and force a redirect or cancel the upload. Depending on which JS libraries you're using, there are different ways of doing this.
None of this solves the reverse scenario. What if the file is 95% uploaded before there's an error? The researcher will need to either upload the whole thing again or edit the file to only include the rows from the error going forward. That means your application needs to support partial uploads and know to pick up where it left off. All these things are possible, but you're probably not going to find a simple workaround to get this working well.
Without understanding the dataset and what kind of validation you are doing it's hard to come up with a full solution. If parsing each row doesn't depend on the previous rows being valid, you could always upload the whole file, then display the rows with errors at the end and ask them to upload a second file with just the corrections.
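As a rough illustration of the "report all bad rows at the end" idea, here is a minimal Python sketch. It assumes the only validation rule is a fixed column count, which is a stand-in for whatever checks the real application does; `find_row_errors` and the sample data are hypothetical.

```python
import csv
import io

def find_row_errors(lines, expected_columns):
    """Yield (row_number, message) for each row with the wrong column count."""
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != expected_columns:
            yield row_number, f"expected {expected_columns} columns, got {len(row)}"

# Stand-in for an uploaded file; any iterable of text lines works.
upload = io.StringIO("id,value\n1,10\n2\n3,30,extra\n")
errors = list(find_row_errors(upload, expected_columns=2))
print(errors)
```

Because it is a generator over lines, the same function can run while the upload streams in, and the collected row numbers tell the researcher exactly which rows to correct.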
The normal process of an HTTP web server happens like this:
1. Server listens for request
2. Client creates request
3. Client sends request to server
4. Server processes request
5. Server creates response
6. Server sends response to client
7. Client processes response
The client opens the connection, and the server can respond on that connection. However, if you close the connection, the server will need to send its response on another connection, and the browser may not allow the server to start a new connection that the client didn't request.
You may be able to respond by reading the first line and creating an error quickly, but the client will not read the response until it is done sending the request.
By sending the file in chunks or asynchronously sending lines of the file, you will be able to give feedback more immediately. You will be sending many smaller requests with the ability to respond in between.
The question was about the HTTP protocol. I feel like this would be allowed by the protocol if you wrote a custom app and web app; however, if you are using browsers then you must use HTTP as the companies have implemented it. In a custom app you could check for interruptions, but most browsers will probably fire a full request before listening for a response, which is also a reason AJAX took off 20 years ago.
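If you do go the chunked route, the chunks have to end on line boundaries so that each request carries only whole CSV rows. A minimal sketch of that grouping step, written in Python for brevity (in a browser the same logic would be in JavaScript); `line_aligned_chunks` and the byte limit are assumptions for illustration, and it ignores quoted CSV fields that contain newlines.

```python
def line_aligned_chunks(lines, max_bytes):
    """Group whole lines into chunks of at most max_bytes where possible."""
    chunk, size = [], 0
    for line in lines:
        encoded = line.encode("utf-8")
        # Close the current chunk before it would exceed the limit.
        if chunk and size + len(encoded) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(encoded)
        size += len(encoded)
    if chunk:
        yield b"".join(chunk)

rows = ["a,1\n", "b,2\n", "c,3\n"]
for body in line_aligned_chunks(rows, max_bytes=8):
    print(body)
```

Each yielded chunk would be sent as its own request, so the server can answer with an error after any chunk. A single line longer than the limit is sent as its own oversized chunk rather than being split.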
This probably could not possibly be a more basic HTTP question, but I am very new to web development and I do not even know the right question to ask (evidenced by the fact that googling has not helped).
What I have: an AWS server with an Elastic Beanstalk environment set up. I have successfully compiled, uploaded, and run a simple "Hello World" program to the environment using Eclipse.
What I want to do: pass the server a number via HTTP request and have the server give me back an HTTP response containing the square of that number. On the back end, I want a simple Java class to do the squaring. (Of course, the goal is to be able to pass more complicated data to the server and have more sophisticated Java code on the back end for processing.)
What I think I need to do: create a Java Servlet to listen for and process the request. I think (hope) the documentation is good enough that I can figure out the HTTPServlet API, but I can't answer a more basic question: how do you pass an HTTP request containing some elementary data, like a number?
Thanks in advance!
You need to either GET, or POST (or PUT) your data. GET provides the data in the URL of the request, and will be displayed in the browser's address bar. POST data is provided as a separate request body.
http://www.w3schools.com/tags/ref_httpmethods.asp
A simple GET would look like this:
http://example.com/server?number=4
You can make a POST using a browser extension such as Postman:
https://chrome.google.com/webstore/detail/postman-rest-client/fdmmgilgnpjigdojojpjoooidkmcomcm?hl=en
Or you can do it from the command line using curl:
curl -X POST http://example.com/server -d'data'
Once the data is more complicated than a few variables, you probably want to use POST rather than GET. Also, you can start to think about what your requests are doing. GETs should only retrieve data from the server. If you modify or create data, then POST (or PUT) requests are the methods to use.
As your server becomes more complex, you probably want to start reading about REST.
http://en.wikipedia.org/wiki/Representational_state_transfer
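To make the GET case concrete, here is a minimal sketch of a server that reads `number` from the query string and returns its square. It uses Python's standard library instead of a Java servlet, purely to keep the example short; the handler name, port and error message are assumptions, and a servlet would do the equivalent with the request's query parameter.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def square_from_url(url):
    """Extract the 'number' query parameter from a URL and return its square."""
    query = parse_qs(urlparse(url).query)
    return int(query["number"][0]) ** 2

class SquareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            body = str(square_from_url(self.path)).encode("utf-8")
            self.send_response(200)
        except (KeyError, ValueError):
            # Missing parameter or a value that is not an integer.
            body = b"expected ?number=<integer>"
            self.send_response(400)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

print(square_from_url("http://example.com/server?number=4"))

# To serve real requests locally, uncomment the next line (port 8000 is arbitrary):
# HTTPServer(("localhost", 8000), SquareHandler).serve_forever()
```

With the server running, opening `http://localhost:8000/server?number=4` in a browser should show the squared value.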
I want to grab some financial data from sites like http://www.fxstreet.com/rates-charts/currency-rates/
Up to now I've been using liburl to grab the source code and a regexp search to get the data, which I then store in a file.
Yet there is a little problem:
On the page as I see it in the browser, the data is updated almost each second. When I open the source code however the data I'm looking for changes only every two minutes.
So my program only gets the data with a much lower time-resolution than possible.
I have two questions:
(i) How is it possible that a source-code which remains static over two minutes produces a table that changes every second? What is the mechanism?
(ii) How do I get the data with second time-resolution, i.e. how do I read out such a changing table that's not shown in the source code?
thanks in advance,
David
You can use the network panel in Firebug to examine the HTTP requests being sent out (typically to fetch data) while the page is open. This particular page you've referenced appears to be sending POST requests to http://ttpush.fxstreet.com/http_push/, then receiving and parsing a JSON response.
Try sending a POST request to http://ttpush.fxstreet.com/http_push/connect and see what you get.
It will continuously load new data.
EDIT:
You can use liburl or Python, it doesn't really matter. Under HTTP, when you browse the web, you send GET or POST requests.
Go to the website, open the Developer Tools (Chrome) or Firebug (Firefox plugin), and you will see that after all the data is loaded, there's a request that doesn't close. It stays open.
When you have a website and you want to fetch data continuously, there are a few techniques you can use:
1. Make separate requests (using ajax) every few seconds. This opens a connection for each request, and if you want frequent data updates it's wasteful.
2. Use long polling or server polling. Make one request that fetches the data; it stays open, and the server flushes data to the socket (to your browser) whenever it needs to. The TCP connection remains open, and when it times out you can reopen it. It's normally more efficient than the above, but the connection remains open.
3. Use XMPP or some other protocol (not HTTP). This is used mainly for chats, like Facebook/MSN I think, and probably Google's and some others.
The website you posted uses the second method: when it detects a POST request to that page, it keeps the connection open and dumps data continuously.
What you need to do is make a POST request to that page, and you need to see which parameters (if any) have to be sent. It doesn't matter how you make the request, as long as you send the right parameters.
You need to read the response with a delimiter. Probably every time they want the client to process data, they send \n or some other delimiter.
Hope this helps. If you see that you still can't get around this, let me know and I'll get into more technical details.
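Reading "with a delimiter" mostly means buffering whatever bytes have arrived and cutting out complete messages as the delimiter shows up. A minimal Python sketch of that step, assuming `\n` is the delimiter (which is only a guess about this site) and using made-up byte chunks in place of the open HTTP response:

```python
def split_messages(chunks, delimiter=b"\n"):
    """Yield complete delimiter-terminated messages from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may complete zero, one or several messages.
        while delimiter in buffer:
            message, buffer = buffer.split(delimiter, 1)
            if message:
                yield message

# Hypothetical data: messages arrive split at arbitrary points.
incoming = [b'{"rate": 1.', b'1}\n{"rate"', b': 1.2}\n{"ra']
print(list(split_messages(incoming)))
```

The trailing partial message stays in the buffer until the rest of it arrives, so in real use `chunks` would be whatever your HTTP library yields as it reads the open response.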
It is quite easy to update the interface by sending jQuery ajax request and updating with new content. But I need something more specific.
I want the server to send a response to the client without the client having requested it, and to update the content when something new appears on the server. There should be no need to send an ajax request every time; when the server has new data, it sends a response to every client.
Is there any way to do this using HTTP or some specific functionality inside the browser?
Websockets, Comet, HTTP long polling.
This is called server push (you can also find it under the name Comet). Search using these keywords and you will find a bunch of examples, tools and so on. No special protocol is required for that.
Aaah! You are trying to break the principles of the web :) You see if the web was pure MVC (model-view-controller) the 'server' could actually send messages to the client(s) and ask them to update. The issue is that the server could be load balanced and the same request could be sent to different servers. Now if you were to send a message back to the client you'll have to know who all are connected to the server. Let's say the site is quite popular and you have about 100,000 people connecting to it every day. You'll actually have to store the IPs of each of them to know where on the internet they are located and to be able to "push" them a message.
Caveats:
What if they are no longer browsing your website? You see currently there is no way to log out automatically if you close your browser. The server needs to check after a fixed timeout if you have logged out (or you send a new nonce with every response to prevent the server from doing that check)
What about a system restart/crash etc? You'd lose all the IPs that you were keeping track of and you are back to square one - you have people connected to you but until you receive new requests you can't really "send" them data when they may be expecting it as per your model.
Let's take an example of facebook's news feeds or "Most recent" link close to the top right - sometimes while you are browsing your wall you see the number next to most recent has gone up or a new 'feed' has come to the top of your wall post! It's the client sending periodic requests to the server to find out what was updated rather than the other way round
You see, it keeps it simple and restful. You may feel it's inefficient for the client to "poll" the server to pull the data and you'd prefer push, but the design of the server gets simplified :)
I suggest ajax-pulling is the best way to go - you are distributing computation to the client and keeping it simple (KIS principle :)
Of course you can get around it, the question is, is it worth it?
Hope this helps :)
RFC 6202 might be a good read.
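For what long polling looks like on the server side, here is a minimal in-process sketch in Python: each waiting request blocks until something newer than what the client has already seen is published, or until a timeout. `UpdateHub` and its methods are hypothetical names, and a real deployment behind several servers would need shared state rather than a single object.

```python
import threading

class UpdateHub:
    """Holds the latest value and lets callers block until a newer one arrives."""

    def __init__(self):
        self._condition = threading.Condition()
        self._version = 0
        self._value = None

    def publish(self, value):
        """Store a new value and wake every waiting caller."""
        with self._condition:
            self._version += 1
            self._value = value
            self._condition.notify_all()

    def wait_for_update(self, seen_version, timeout):
        """Block until the version exceeds seen_version or the timeout expires."""
        with self._condition:
            self._condition.wait_for(lambda: self._version > seen_version, timeout)
            return self._version, self._value

hub = UpdateHub()
# Simulate the server learning about new data shortly after a client starts waiting.
threading.Timer(0.1, hub.publish, args=["new data"]).start()
print(hub.wait_for_update(seen_version=0, timeout=2.0))
```

A long-polling request handler would call `wait_for_update` with the version the client last saw, return the result, and the client would immediately issue the next request.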
I have to create a Java EE application which converts large documents into different formats. Each conversion takes between 10 seconds and 2 minutes.
The SOAP requests will be made from a client application which I also have to create.
What's the best way to handle these long running requests? Clearly the process takes too much time to run without any feedback to the user.
I can think of the following ways to provide some kind of feedback, but I'm not sure if there isn't a better way, perhaps something standardized.
1. The client performs the request from a thread and the server sends the document in the response, which can take a few minutes. Until then the client shows a "Please wait" message, progress spinner, etc. (This seems to be simple to implement.)
2. The client sends a "Start conversion" command. The server returns some kind of job ID which the client can use to frequently poll for a status update or the final document. (This seems to be user friendly, because I can display a progress, but also requires the server to be stateful.)
3. The client sends a "Start conversion" command. The server somehow notifies the client when it is done. (Here I don't even know how to do this)
Are there other approaches? Which one is the best in terms of performance, stability, fault tolerance, user-friendliness, etc.?
Thank you for your answers.
Since this is almost all done server-side, there isn't much a client can do besides poll the server somehow for updates on the status.
#1 is OK, but users get impatient really fast. "A few minutes" is a bit too long for most people. You'd need HTTP Streaming to implement #3, but I think that's overkill.
I would just go with #2.
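The job-ID approach boils down to a small piece of server state: start the work in the background, hand back an ID, and let the client ask for the status. A minimal in-memory sketch in Python (not Java EE or SOAP, just the idea); `JobRegistry`, `fake_conversion` and the status strings are all assumptions for illustration.

```python
import threading
import time
import uuid

class JobRegistry:
    """Tracks background jobs by ID so a client can poll for their status."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def start(self, work):
        """Run work() in a background thread and return a job ID immediately."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "running", "result": None}
        threading.Thread(target=self._run, args=(job_id, work), daemon=True).start()
        return job_id

    def _run(self, job_id, work):
        try:
            outcome = {"status": "done", "result": work()}
        except Exception as error:
            outcome = {"status": "failed", "result": str(error)}
        with self._lock:
            self._jobs[job_id] = outcome

    def poll(self, job_id):
        """Return a copy of the job's current status and result."""
        with self._lock:
            return dict(self._jobs[job_id])

def fake_conversion():
    time.sleep(0.2)  # Stand-in for a conversion that takes a while.
    return "converted document"

registry = JobRegistry()
job_id = registry.start(fake_conversion)
while registry.poll(job_id)["status"] == "running":
    time.sleep(0.05)  # The client would poll over the network instead.
print(registry.poll(job_id))
```

In a real service the registry would need to survive restarts and expire old jobs, which is the "stateful server" cost mentioned in the question.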
For #3, the server should return a unique ID to the client, and the client then uses that ID to ask the server for the result at a later time.
Option 4, for those who want to use WebSockets:
your request is answered with a jobId,
and you get the progress state over the WebSocket.