Proxies for WebDAV

I'd like to set up a reverse proxy for my WebDAV server. The main reason for this is so that I can better control which files are being uploaded to the WebDAV server. I cannot do this at the WebDAV server itself; it's a service provided by Alfresco, and I have no idea whether the WebDAV service can be configured at all.
In particular I'd like to prevent my Mac from doing the AppleDouble thing on the WebDAV server, i.e. stop it from uploading ._* files for every real file I upload. As far as I know there is no way to stop my Mac from attempting this.
Does the proxy server need to do more than merely relay HTTP requests back and forth, or does it also need to know something about WebDAV for this to work?
Which proxy servers could you recommend for this?
Günther

Unless I'm missing something, a reverse proxy will have to rewrite header fields (such as Destination: and If:) and potentially even request/response bodies to work properly, and thus is unlikely to work well.
A "proper" proxy shouldn't get in the way, though.

You could do this with SabreDAV. It has a TemporaryFileFilter plugin that does exactly what you need. Not only does it intercept these resource forks, it also places them in a temporary 'quarantine'. This is important, because OS X will check whether the file was successfully written and fail horribly otherwise.
There are two things you still need to do to make this work, though:
Automatic cleanup of these files (a script suitable for cron is also supplied).
The actual proxy bit. This means you'll have to implement a Collection and a File class that perform the HTTP requests.
Disclaimer: I authored SabreDAV
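For reference, here is roughly what wiring the plugin up looks like. This is a minimal sketch assuming a recent, namespaced SabreDAV installed via Composer; the directory paths are placeholders, and for the proxy scenario the FS\Directory root would be replaced by your own Collection/File classes that forward requests to Alfresco.

    <?php
    // server.php - minimal SabreDAV endpoint with the temporary file filter
    require 'vendor/autoload.php';

    // Serve a local directory for illustration; in the proxy setup this root
    // would be a custom Collection class that talks to the Alfresco server.
    $root   = new \Sabre\DAV\FS\Directory('/path/to/data');
    $server = new \Sabre\DAV\Server($root);
    $server->setBaseUri('/dav/');

    // Intercept OS X resource forks (._*), .DS_Store and similar temporary
    // files and quarantine them in a separate directory instead of passing
    // them through to the backend.
    $server->addPlugin(new \Sabre\DAV\TemporaryFileFilterPlugin('/path/to/tempfiles'));

    $server->exec();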

Related

How to configure GWAN as a reverse proxy?

I saw some performance figures for G-WAN and am interested in testing it as a reverse proxy for static content in front of Apache with APC for PHP opcode caching, to run a WordPress multisite. I can get G-WAN up and running, but I have no idea how to configure it as a reverse proxy, as there seems to be almost no information on it. Has anyone used G-WAN as a reverse proxy?
It's still an undocumented feature... maybe in a future release? #gil?
Right now there's no easy way for you to do that. That will change with the next release.
We first hard-coded the reverse-proxy feature in G-WAN along with the load balancer. Then, as we needed to customize reverse-proxying, we implemented it as a protocol handler script.
Protocol handler scripts allow users to implement any protocol (like SMTP, LDAP, etc.) without having to deal with multi-threading or socket events.
But finally, to reduce complexity for users, we might revert to the hard-coded implementation, with connection handler scripts to let people customize the reverse proxy.
It's maturing under different use cases, hence the delay in publicly releasing this feature and a few others.
Rushing to implement features and interfaces is not always optimal, if the goal is to stay flexible and easy to use.

Simulating a remote website locally for testing

I am developing a browser extension. The extension works on external websites we have no control over.
I would like to be able to test the extension. One of the major problems I'm facing is displaying a website 'as-is' locally.
Is it possible to display a website 'as-is' locally?
I want to be able to serve the website exactly as-is locally for testing. This means I want to simulate the exact same HTTP data, including iframe ads, etc.
Is there an easy way to do this?
More info:
I'd like my system to act as closely to the remote website as possible. For example, I'd like to run a fetch command which would then let me go to the site in my browser (with the internet off) and get exactly the same thing I would otherwise (including content that does not come from a single domain, Google ads, etc.).
I don't mind using a virtual machine if this helps.
I figured this would be quite a useful thing for testing, especially when I have a bug I need to reliably reproduce on sites that have many random factors (which ads show, etc.).
As was already mentioned, caching proxies should do the trick for you (and this is the simplest solution). There are quite a lot of different implementations, so you just need to spend some time selecting a proper one (in my experience Squid is a good choice). Anyway, I would like to highlight two other interesting options:
Option 1: Betamax
Betamax is a tool for mocking external HTTP resources such as web services and REST APIs in your tests. The project was inspired by the VCR library for Ruby. Betamax works by intercepting HTTP connections initiated by your application and replaying previously recorded responses.
Betamax comes in two flavors. The first is an HTTP and HTTPS proxy that can intercept traffic made in any way that respects Java’s http.proxyHost and http.proxyPort system properties. The second is a simple wrapper for Apache HttpClient.
BTW, Betamax has a very interesting feature for you:
Betamax is a testing tool and not a spec-compliant HTTP proxy. It ignores any and all headers that would normally be used to prevent a proxy caching or storing HTTP traffic.
Option 2: Wireshark and replay proxy
Grab all the traffic you are interested in using Wireshark and replay it. I would say it is not that hard to implement the required replay tool yourself, but you can also use an available solution called replayproxy, which:
parses HTTP streams from .pcap files
opens a TCP socket on port 3128 and listens as an HTTP proxy, using the extracted HTTP responses as a cache while refusing all requests for unknown URLs.
This approach gives you full control and a bit-for-bit precise simulation.
I don't know if there is an easy way, but there is a way.
You can set up a local webserver, something like IIS, Apache, or minihttpd.
Then you can grab the website contents using wget (it has an option for mirroring), and many browsers have a "save complete web page" option that will grab everything, like images.
Ads will most likely come from remote sites, so you may have to manually edit those lines in the HTML to either not reference the actual ad-servers, or set up a mock ad yourself (like a banner image).
Then you can navigate your browser to http://localhost to visit your local website, assuming port 80 which is the default.
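For the wget step, something along these lines is a reasonable starting point (standard GNU wget options; the URL and domain list are placeholders you would adapt):

    # Mirror the page plus everything needed to render it (images, CSS, JS),
    # rewriting links so the copy works when served locally without internet.
    wget --mirror --page-requisites --convert-links --adjust-extension \
         --span-hosts --domains=example.com,adserver.example.net \
         http://example.com/page-to-test/

--span-hosts together with an explicit --domains list is what lets you pull in third-party assets such as ad content while still keeping the crawl bounded.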
Hope this helps!
I assume you want to serve a remote site that's not under your control. In that case you can use a proxy server and have that server cache every response aggressively. However, this has its limits: first of all, you will have to visit every page you intend to use through this proxy (with a browser, for example); second, you will not be able to emulate form processing.
Alternatively you could use a spider to download all content of a certain website. Depending on the spider software, it may even be able to download JavaScript-built links. You then can use a webserver to serve that content.
The service http://www.json-gen.com provides mocks for HTML, JSON and XML via REST. This way, you can test your frontend separately from the backend.

Using NGINX to forward tracking data to Flume

I am working on providing analytics for our web property based on instrumentation data we collect via a simple image beacon. Our data pipeline starts with Flume, and I need the fastest possible way to parse query string parameters, form a simple text message and shove it into Flume.
For performance reasons, I am leaning towards nginx. Since serving a static image from memory is already supported, my task is reduced to handling the query string and forwarding a message to Flume. Hence, the question:
What is the simplest reliable way to integrate nginx with Flume? I am thinking about using syslog (Flume supports syslog listeners), but I struggle with how to configure nginx to forward custom log messages to a syslog (or just TCP) listener running on a remote server and on a custom port. Is it possible with existing 3rd party modules for nginx or would I have to write my own?
Separately, anything existing you can recommend for writing a fast $args parser would be much appreciated.
If you think I am on a completely wrong path and can recommend something better performance-wise, feel free to let me know.
Thanks in advance!
You should parse the nginx log file the way tail -f does and then pass the results to Flume. That will be the simplest and most reliable way. The problem with syslog is that it blocks nginx and may get completely stuck under high load or if something goes wrong (which is why nginx doesn't support it).
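If you go the log-tailing route, a hedged sketch of the Flume side could look like the following agent configuration (the agent/source/sink names and the log path are made up; the logger sink is just a stand-in for your real downstream sink):

    # flume.conf - tail the nginx beacon access log and feed it into Flume
    agent1.sources  = beacon
    agent1.channels = mem
    agent1.sinks    = out

    agent1.sources.beacon.type     = exec
    agent1.sources.beacon.command  = tail -F /var/log/nginx/beacon.access.log
    agent1.sources.beacon.channels = mem

    agent1.channels.mem.type     = memory
    agent1.channels.mem.capacity = 10000

    agent1.sinks.out.type    = logger
    agent1.sinks.out.channel = mem

Note that the Flume documentation warns that the exec source gives no delivery guarantees if the agent dies mid-tail; a spooling-directory source over rotated log files is the more robust variant if you can tolerate the extra latency.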

What is the best method to send data from a device to a server

I am currently developing a website for an energy-monitoring company. We are trying to send high volumes of data from the devices that record it to a server, so that the data can be processed and stored in a database. The guy developing the firmware seems to think that the best way to send the data is to produce CSV files and send them via FTP. A program on the server then needs to monitor the files received via FTP and run a PHP script to process them. I, however, feel that the best way of sending the data is via HTTP POST.
We had HTTP POST working, and then I began trying to work with the CSVs, which became a pain: reliably monitoring the files received via FTP meant editing the ProFTPD configuration file (which I found to be a near-impossible task) and installing a package called mod_exec (which comes with security risks) so that ProFTPD could run a PHP script. These issues, and the fact that I am unfamiliar with the Linux console which I am required to use extensively to set this up, make the CSV method very difficult to set up. HTTP POST seems to me like a more direct way of sending the data, without having to worry about files or rely on ProFTPD. It would also allow us to use identifiers to give the data being passed meaning, as opposed to a string of values whose meaning is not immediately apparent. In addition, the query string could be URL-encoded to pass a multidimensional array, which would work well given the type of data being passed.
Nevertheless, just because the HTTP POST method would be easier doesn't mean that the CSV method doesn't have advantages. Furthermore, the firmware guy has far more experience than me with computers so I trust his opinion.
Can you please help me to understand his point of view on the advantages of the CSV method and explain what the best method is?
You're right. FTP has major issues with firewalls, and especially doesn't work well on mobile (NAT'ted) IPv4. HTTP POST works far, far better under such circumstances, if only because nobody accepts an "internet" connection that breaks HTTP.
Furthermore, HTTP is a lot easier on the device as well. It's just a single-socket protocol, with trivial read/write semantics on that socket.
Some more benefits? HTTP has almost-native support for compression (gzip). HTTP transmission can start before the input is complete. HTTP is easier to secure (HTTPS)...
No, there really is little reason to use FTP.
The 'CSV method' (I'd call it the 'FTP method', though) has the advantage of being known to the embedded developer. However, the receiving side will have to create some way of checking whether a file has arrived, which adds complexity.
The 'HTTP method' has several advantages:
HTTP is easy to implement on the sending side
No need to create a file-checker
You can reply to the embedded device if everything went OK
I actually implemented a system just like that recently (not too much data, but still) and used HTTP POST to send the data. I implemented the HTTP POST myself.
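For what it's worth, the receiving end of the HTTP POST approach can be a single PHP script. A rough sketch, with made-up field names, table and credentials (the devices would send something like device_id=42&readings[0][ts]=1370000000&readings[0][kwh]=1.25, which PHP parses into a nested array):

    <?php
    // receive.php - hypothetical endpoint the devices POST their readings to
    if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
        http_response_code(405);
        exit;
    }

    $deviceId = isset($_POST['device_id']) ? (int) $_POST['device_id'] : 0;
    $readings = isset($_POST['readings'])  ? $_POST['readings']        : array();

    $db   = new PDO('mysql:host=localhost;dbname=energy', 'user', 'secret');
    $stmt = $db->prepare(
        'INSERT INTO readings (device_id, recorded_at, kwh) VALUES (?, ?, ?)'
    );

    foreach ($readings as $r) {
        $stmt->execute(array($deviceId, (int) $r['ts'], (float) $r['kwh']));
    }

    // Tell the device everything went OK - one of the advantages over FTP.
    http_response_code(200);
    echo 'OK';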

Best way to let users download a file from my website: http or ftp

We have some files on our website that users of our software can download. Some of the files are in virtual folders on the website while others are on our FTP server. The files on the FTP server are generally accessed by clicking on an ftp:// link in a browser - most of our customers do not have an FTP client. The other files are accessed by clicking an http:// link in a browser.
Should I move all the files to the FTP server, or does it not matter? What's the difference?
HTTP has many advantages over FTP:
it is available in more places (think workplaces which block anything other than HTTP/S)
it works nicely with proxies (FTP requires extra settings for the proxy - like making sure that it allows the CONNECT method)
it provides built-in compression (with GZIP) which almost all browsers can handle (as opposed to FTP which has a non-official "MODE Z" extension)
NAT gateways must be configured in a special mode to support active FTP connections, while passive FTP connections require them to allow access to all ports (if the gateway doesn't have connection tracking)
some FTP clients insist on opening a new data connection for each data transfer, which can leave you with a lot of "TIME_WAIT" sockets
If speed matters to your users and they are technically inclined, HTTP allows multiple connections for one file (if the client supports it; I use DownThemAll). Most browsers should handle FTP links just fine, though.
I think most users, even today, are more familiar with http than ftp and for that reason you should stick with http by default unless there's a compelling reason to use ftp. It's nit-picking, though.
I don't think it really matters, because FTP is also transparent nowadays. You don't have to know anything special; the browser handles it all.
I suggest that if they are downloading one file at a time, you can go with HTTP.
However, if they have to download several files in one go, I prefer FTP, because it's much easier to manage.
There are some nice browser extensions, as _l0ser mentioned, but I prefer FTP for mass file transfer.
Both FTP and HTTP seem sufficient for your needs, so I would definitely recommend choosing the simplest approach, which is either to leave things as they currently are or consolidate on HTTP.
Personally, I would put everything on HTTP. If nothing else, it eliminates an extra server. There is no compelling reason to choose FTP over HTTP anymore, and there are a few small advantages to HTTP (as others have pointed out).
