HTTP versus FTP over distributed data

I have data files, mostly very large text files, and they are spread across several systems.
Now I want to perform various operations, such as sorting and searching, over these systems.
So far I have used Parallel Python to access the files on the other systems and perform operations like search, but I want to provide a web interface so that clients can send requests through it rather than having to write programs for that.
I was wondering whether to use HTTP or FTP requests in order to fetch the data and process the requests.
I apologise if the question sounds ridiculous; I am still figuring this out myself.

That pretty much depends on what you're trying to do, IMHO. If you're fetching the files and sorting/searching/whatever them client-side, FTP would be the appropriate protocol (since you're just transferring files). On the other hand, if the files are being processed (sorted/searched/etc.) server-side, an HTTP POST request would be more appropriate.
So, judging from your post and what I think you're trying to do, I'd go with an HTTP POST request.
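For illustration, here is a minimal sketch of the server-side-processing approach using Flask; the /search endpoint and the search_files helper are hypothetical placeholders for whatever the existing Parallel Python code already does:

    # Sketch of a server-side search endpoint (assumes Flask is installed).
    # The client POSTs a JSON query; the server runs the search and returns
    # results, so no raw files ever need to be transferred to the client.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def search_files(pattern):
        # Hypothetical placeholder: call into the existing distributed
        # search code here and return the matching lines.
        return [f"dummy match for {pattern}"]

    @app.route("/search", methods=["POST"])
    def search():
        query = request.get_json(force=True)
        pattern = query.get("pattern", "")
        return jsonify(results=search_files(pattern))

    if __name__ == "__main__":
        app.run(port=8080)

A client could then query it with something like: curl -X POST -H "Content-Type: application/json" -d '{"pattern": "error"}' http://localhost:8080/search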

Related

LDAP Proxy with inspection/modification of requests and responses

I need to build an LDAP proxy that I can program to inspect and modify the LDAP requests and responses - some of the LDAP requests/responses will simply be passed through, but for others I might want to send two different requests to the server and then combine the results (that's just one example - there will be other use cases).
I've looked at the proxying options documented for OpenLDAP's slapd, and I see that it has quite flexible configuration and 'overlays', but no capability to insert custom code.
So I think that's not a solution, unless slapd's source code is easy enough to modify that I can insert my own modules plus hooks to/from the existing code (?)
An alternative would be to start with a friendly TCP/IP framework library (or even a complete TCP/IP proxy). Then I can link to an ASN.1 decoding/encoding library, and write the rest myself.
I'd prefer to avoid having to write (& learn) all the TCP/IP connection/message handling and event loop myself.
So I'm looking for the most complete starting point that does the hard work and gives me the flexibility to write what I need. Typical lazy/greedy approach :-)
Must be open source, ideally in C or C++, and I'll probably be targeting RHEL/CentOS 8 in a container.
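As a rough illustration of the "TCP framework plus your own inspection code" route, here is a pass-through proxy sketch in Python's asyncio with a single hook where decoded LDAP PDUs could be inspected or rewritten; the inspect_pdu function, ports and host are assumptions, and the same structure maps onto a C/C++ event-loop library such as Boost.Asio or libevent:

    # Sketch of a transparent TCP proxy with an inspection hook (asyncio).
    import asyncio

    LISTEN_PORT = 10389                       # assumed local port for clients
    UPSTREAM = ("ldap.example.com", 389)      # assumed real LDAP server

    def inspect_pdu(direction, data):
        # Hypothetical hook: decode the BER/ASN.1 LDAP message here and decide
        # whether to pass it through, rewrite it, or fan out extra requests.
        # A real proxy must buffer until a complete PDU is available, since
        # TCP chunks do not align with LDAP message boundaries.
        return data

    async def pump(reader, writer, direction):
        while not reader.at_eof():
            chunk = await reader.read(4096)
            if not chunk:
                break
            writer.write(inspect_pdu(direction, chunk))
            await writer.drain()
        writer.close()

    async def handle_client(client_reader, client_writer):
        server_reader, server_writer = await asyncio.open_connection(*UPSTREAM)
        await asyncio.gather(
            pump(client_reader, server_writer, "request"),
            pump(server_reader, client_writer, "response"),
        )

    async def main():
        server = await asyncio.start_server(handle_client, "0.0.0.0", LISTEN_PORT)
        async with server:
            await server.serve_forever()

    asyncio.run(main())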

How to serve multiple connections with a single stream?

I'm just at the high-level of forming a concept here, so I'm really just looking for thoughts/ideas/suggestions about how this might be done -- if there's already something that does this, if I need to roll my own, or if maybe there's a couple separate projects that could be cobbled together to achieve this. Any and all input is appreciated :)
What I'd like to be able to do is have a "stream" (not sure what else to call it; a broadcast?) of data that will be served over an HTTP connection. This stream will serve events/updates/other data to clients that have subscribed to it (it doesn't really matter what the data is, just that the client has subscribed in some fashion). This is nothing even remotely new, but what I'm having trouble tracking down is a way for multiple users/clients/connections to basically share the same stream/connection. In other words, I'm looking for a way for a web server to basically send data once and have all subscribed clients receive it. This way, for high-traffic applications, the web server doesn't have to send data explicitly to each and every listening connection.
I really hope that made sense.
Are there webservers/webserver-plugins that can already do this?
Would it be possible/feasible to adapt some form of video streaming library to achieve this?
Is this something I'll probably have to manage on my own (code that tracks subscribed connections, receives new data from some other service, and transparently replicates said data to each individual connection)?
Any other ideas, thoughts, concerns, caveats, etc?
Take a look at http://wiki.nginx.org/HttpPushStreamModule - it seems to answer what you're after in terms of existing functionality for tracking subscribed connections and relaying data to each client. It looks like it would take a bit to set up your data source as a channel, but beyond that it handles the rest.
I haven't used that module before, but have used Nginx itself -- it's quite nice and handles concurrent connections well.
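For a rough picture of what such a module does internally, here is a minimal publish/subscribe fan-out sketch over Server-Sent Events using Python's aiohttp; the /events and /publish endpoints are made up for illustration, and the Nginx module does the equivalent work in C at the server level so the application only publishes once per channel:

    # Sketch of one-to-many fan-out over Server-Sent Events (assumes aiohttp).
    # Each published message is handed to every subscriber's queue once; the
    # event loop then writes it out to each open connection.
    import asyncio
    from aiohttp import web

    subscribers = set()          # one asyncio.Queue per connected client

    async def events(request):
        resp = web.StreamResponse()
        resp.headers["Content-Type"] = "text/event-stream"
        await resp.prepare(request)
        queue = asyncio.Queue()
        subscribers.add(queue)
        try:
            while True:
                msg = await queue.get()
                await resp.write(f"data: {msg}\n\n".encode())
        finally:
            subscribers.discard(queue)

    async def publish(request):
        msg = await request.text()
        for queue in subscribers:
            queue.put_nowait(msg)            # enqueue once per subscriber
        return web.Response(text=f"sent to {len(subscribers)} subscribers\n")

    app = web.Application()
    app.add_routes([web.get("/events", events), web.post("/publish", publish)])
    web.run_app(app, port=8080)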

When using HTTP, which encoding is better, base64, yEnc, or uuencode?

My requirement is to encode data and send it across the network via HTTP, but I am stuck trying to choose the best encoding technique. Which of the three above is best? Is there anything better?
The criteria for "best" should be small size and fast encoding/decoding.
yEnc has less overhead but has other problems, mentioned here: http://en.wikipedia.org/wiki/YEnc#Criticisms.
What is "best" depends on the criteria you have and how you plan to send the data over the network. Are you using a web service, email, or other means? You need to provide more details.
Edit:
Since you are uploading data via HTTP, you do not need to use any of Base64, yEnc or uuencode. You can just use the standard HTTP file upload facility built into both browsers and web servers. See this question as a reference:
How does HTTP file upload work?
Also this reference:
http://www.hanselman.com/blog/ABackToBasicsCaseStudyImplementingHTTPFileUploadWithASPNETMVCIncludingTestsAndMocks.aspx
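As an illustration of the "no extra encoding needed" point, a multipart/form-data upload with the Python requests library looks roughly like the sketch below (the URL and file name are placeholders); the file bytes travel as-is, so there is none of the roughly 33% size penalty of Base64:

    # Sketch: uploading a file over HTTP without Base64/yEnc/uuencode.
    # multipart/form-data carries the raw bytes, so the only overhead is the
    # small multipart boundary and headers.
    import requests  # any HTTP client with multipart support works

    with open("data.bin", "rb") as f:
        response = requests.post(
            "http://example.com/upload",      # placeholder endpoint
            files={"file": ("data.bin", f, "application/octet-stream")},
        )
    print(response.status_code)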

.Net Scenario Based Opinion

I am facing a situation where I have to handle a very heavy traffic load while keeping performance high at the same time. Here is my scenario; please read it and advise me with your valuable opinion.
I am going to have three-way communication between my server, my client and a visitor. When a visitor visits my client's website, he will be detected and sent to an intermediate Rule Engine to perform some tasks and output a filtered list of the different visitors on my server. On the other side, I have a client who will access those lists. My initial idea was to have a Web Service on my server act as the Rule Engine and output the resultant lists on an ASPX page. But this seems inefficient, because there will be huge traffic coming in and the clients will continuously be requesting data from those lists, so it will be a performance overhead. Kindly suggest what approach I should take to achieve this scenario so that no deadlock happens and things work smoothly. I also considered writing to and fetching from an XML file, but that is also not a very good approach in my case.
NOTE: Please remember that no DB will involve initially, all work will remain outside DB.
Wow, storing data efficiently without a database will be tricky. What you can possibly consider is the following:
Store the visitor data in an object list of some sort and keep it in the application cache on the server.
Periodically flush this list (say after 100 items in the list) to a file on the server - possibly storing it in XML for ease of access (you can associate a schema with it as well to make sure you always get the same structure you need). You can perform this file-writing asynchronously as to avoid keeping the thread locked while writing the file.
The Web Service sounds like a good idea - make it feed off the XML file. Possibly consider breaking the XML file up into several files as well. You can even cache the contents of this file separately so the service feeds off the cached data for added performance benefits...
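The question is about .NET, but the cache-then-flush idea is language-agnostic; here is a rough sketch of the pattern, where the threshold, file name and field names are made up and the in-memory list stands in for the ASP.NET application cache:

    # Sketch of "keep visitors in memory, periodically flush a batch to XML".
    import threading
    import xml.etree.ElementTree as ET

    FLUSH_THRESHOLD = 100          # assumed batch size
    visitors = []                  # stands in for the application cache
    lock = threading.Lock()

    def flush_to_xml(batch, path="visitors.xml"):
        root = ET.Element("visitors")
        for v in batch:
            ET.SubElement(root, "visitor", id=str(v["id"]), source=v["source"])
        ET.ElementTree(root).write(path)

    def record_visitor(visitor):
        with lock:
            visitors.append(visitor)
            if len(visitors) >= FLUSH_THRESHOLD:
                batch = list(visitors)
                visitors.clear()
                # write the file off the request path, as the answer suggests
                threading.Thread(target=flush_to_xml, args=(batch,)).start()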

How do I handle/use 100 Continue in a REST web service?

Some background
I am planning to write a REST service which helps facilitate collaboration between multiple client systems. Similar to how git or hg handle things, I want the client to perform all merging locally and the server to reject new changes unless they have been merged with existing changes.
How I want to handle it
I don't want clients to have to upload all of their change sets before being told they need to merge first. I would like to do this by performing a POST with the Expect: 100-continue header. The server can then verify that it can accept the change sets based on the header information (not hard for me in this case) and either reject the request or send the 100 Continue status through to the client, which will then upload the changes.
My problem
As far as I have been able to figure out so far, ASP.NET doesn't support this scenario: by the time you see the request in your controller actions, the POST body has normally already been completely uploaded. I've had a brief look at WCF REST, but I haven't been able to see a way to do it there either; their conditional PUT example has the full request body before rejecting the request.
I'm happy to use any alternative framework that runs on .net or can easily be made to run on Windows Azure.
I can't recommend WcfRestContrib enough. It's free, and it has a lot of abilities.
But I think you need to use OpenRasta instead of WCF in order to do what you're wanting. There's a lot of stuff out there on it, like its wiki, blog post 1, blog post 2. It might be a lot to take in, but it's a .NET framework that's truly focused on being RESTful, and not RPC-like as WCF is. And it has the ability to work with headers, like you asked about. It even has PipelineContributors, which have access to the whole context of a call and can halt execution, handle redirections, or even render something different than what was expected.
EDIT:
As far as I can tell, this isn't possible in OpenRasta after all, because "100 continue is usually handled by the hosting environment, not by OR, so there’s no support for it as such, because we don’t get a chance to respond in the asp.net pipeline"
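For reference, the Expect: 100-continue handshake itself is simple at the protocol level. The sketch below (plain sockets, made-up host, path and payload, no framework) shows the client side: headers are sent first and the body is only uploaded if the server answers 100 Continue, which is exactly the early-rejection opportunity the question wants to exploit on the server:

    # Sketch of the client side of an Expect: 100-continue exchange.
    # A server that wants to reject early would read the headers, validate
    # them, and reply with a 4xx (e.g. 417 Expectation Failed) instead of
    # "100 Continue", so the body is never transferred.
    import socket

    HOST, PATH = "example.com", "/changesets"        # placeholders
    body = b'{"parent": "abc123", "changes": []}'    # hypothetical change set

    headers = (
        f"POST {PATH} HTTP/1.1\r\n"
        f"Host: {HOST}\r\n"
        f"Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"Expect: 100-continue\r\n"
        f"\r\n"
    ).encode()

    with socket.create_connection((HOST, 80)) as sock:
        sock.sendall(headers)                    # send headers only, then wait
        interim = sock.recv(4096).decode(errors="replace")
        if interim.startswith("HTTP/1.1 100"):
            sock.sendall(body)                   # server accepted: upload body
            print(sock.recv(4096).decode(errors="replace"))
        else:
            print("server rejected before upload:", interim.splitlines()[0])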
