I am looking for an S3 alternative with a RESTful API, so that I can simply hand out links such as http://datastore1.example.com/ID that are directly downloadable.
I have looked at Riak and Bitcache. They both seem very nice (http://bitcache.org/api/rest), but they have one problem: I want to be the only one who can upload data. Otherwise anyone could use our datastore by sending a PUT request.
Is there a way to configure Riak so that everyone can GET, but nobody except me can PUT or DELETE files? Are there other services you can recommend?
Also adding a bounty :)
Requirements:
RESTful API
Guests GET only
Runs on Debian
Very nice to have:
auto distributed
EDIT: To clarify, I don't want any connection to S3. I have great servers just lying around with hard drives and a very good network connection (3 Gbps), so I don't need S3.
The Riak authors recommend putting an HTTP proxy in front of Riak to provide access control. You can choose any proxy server you like (such as nginx or Apache) and any access control policy you like (such as authorization based on IP addresses, HTTP basic auth, or cookies, assuming your proxy server can handle it). For example, in nginx you might use limit_except (likewise LimitExcept in Apache).
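For illustration, here is a minimal sketch of that "guests GET only" policy written as a Python WSGI middleware rather than an nginx config; the admin credential is a hypothetical placeholder, and in a real deployment the proxy itself would enforce this:

```python
import base64

# Hypothetical admin credential; in a real deployment the proxy (nginx,
# Apache) would hold this, not the application.
ADMIN_CREDENTIAL = "Basic " + base64.b64encode(b"admin:secret").decode()

def read_only_for_guests(app):
    """Allow anyone to GET/HEAD, but require basic auth for writes."""
    def middleware(environ, start_response):
        method = environ["REQUEST_METHOD"]
        auth = environ.get("HTTP_AUTHORIZATION", "")
        if method not in ("GET", "HEAD") and auth != ADMIN_CREDENTIAL:
            # Same effect as nginx's limit_except GET: guests cannot PUT/DELETE.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Writes require authentication\n"]
        return app(environ, start_response)  # hand through to the data store
    return middleware
```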
Alternatively, you could also add access control to Riak directly. It's based on Webmachine, so one approach would be to implement is_authorized.
Based on the information you have given, I would suggest Eucalyptus (http://open.eucalyptus.com/). It has an S3-compatible storage system.
The reliable, distributed object store RADOS, which is part of the Ceph file system, provides an S3 gateway.
We used the Eucalyptus storage system, Walrus, but we had reliability problems.
If you are looking for a distributed file system, why don't you try Hadoop HDFS?
http://hadoop.apache.org/common/docs/r0.17.0/hdfs_design.html
There is a Java API available:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/fs/FileSystem.html
Currently, security is an issue - at least if you have access to a terminal:
http://developer.yahoo.com/hadoop/tutorial/module2.html#perms
But you could deploy HDFS, put an application server such as GlassFish in front of it (using the Java API), and use Jersey to build the RESTful API:
http://jersey.java.net/
If you're interested in building such a thing, please let me know, for I may be building something similar quite soon.
You can use the Cloudera Hadoop Distribution to make life a bit easier:
http://www.cloudera.com/hadoop/
Greetz,
J.
I guess you should ask your question on serverfault.com, as it's more system-related.
Anyway, I can suggest MogileFS, which scales very well: http://danga.com/mogilefs/
WebDAV is about as RESTful as it gets and there are many implementations that scale to various uses. In any case, if it is REST and it is HTTP then whatever authentication scheme that the server supports should allow you to control who can upload.
You can develop it yourself as a web app or as part of your existing application. It will consume HTTP requests, retrieve their URI component, convert it to an S3 object name, and use getObject() to fetch the content (using one of the available S3 SDKs, for example the AWS Java SDK).
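A rough sketch of that idea in Python, using boto3 rather than the Java SDK the answer mentions; the bucket name and WSGI wiring are illustrative assumptions, and a real app would add error handling:

```python
import boto3

s3 = boto3.client("s3")  # credentials come from the usual AWS config

def application(environ, start_response):
    # Map the request URI straight onto an S3 object name, e.g. /ID -> "ID".
    key = environ["PATH_INFO"].lstrip("/")
    obj = s3.get_object(Bucket="my-datastore-bucket", Key=key)  # hypothetical bucket
    start_response("200 OK", [("Content-Type", obj["ContentType"])])
    return [obj["Body"].read()]
```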
You can try a hosted solution, s3auth.com (I'm a developer). It's an open source project, and you can see how this mechanism is implemented internally in one of its core classes. The HTTP request is processed by the service and then re-translated to Amazon S3's internal authentication scheme.
I am new to screen scraping. When I use a proxy server and track the HTTP transactions, I can see my POST data revealed to me. So my doubts/problems here are:
1) Will it get stored on the server side, or will it be revealed only on the client side?
2) Do we have an option of encrypting the POST data in screen scraping?
3) Is it advisable to use screen scraping for banking applications?
I am using the screen-scraper tool (Enterprise version), which I downloaded from http://www.screen-scraper.com/download/choose_version.php.
Thanks in advance.
My experience with scraping is that if you aren't doing anything super complex (like logging into a secure website like an online banking website, etc.) then Python has some great libraries that will help you out a lot.
To answer your questions:
1) You may need to be more clear, but this really depends on your server/client architecture.
2) As a matter of fact you do. urllib and urllib2 (built-in Python libraries) both have functions that encode your data before you make a POST; the actual encryption comes from sending the request over HTTPS, which for most applications will suffice (see the sketch at the end of this answer).
3) I actually have done scraping on online banking sites! I'm not exactly familiar with that tool, but I would recommend using something a little different from a scraper. Selenium, which is a "web driver", allows you to simulate the use of a browser, meaning anything the browser does in the background to validate the session is automatically taken care of. The main problem I ran into while trying to scrape the banking site was the loss of important session data.
Selenium - https://pypi.python.org/pypi/selenium
Other libraries you may find useful are: urllib, urllib2, and Mechanize
I hope I was somewhat helpful!
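Here's a minimal sketch of point 2 in today's Python 3 (urllib.request replaces the urllib2 mentioned above); the endpoint and form fields are hypothetical placeholders, and the encryption comes entirely from the https:// URL:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# urlencode() only *encodes* the form fields; the encryption itself comes
# from the https:// URL. Endpoint and fields are hypothetical.
data = urlencode({"account": "12345", "amount": "100"}).encode("utf-8")
with urlopen("https://example.com/transfer", data=data) as response:
    print(response.status, response.read()[:200])
```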
I've used screen-scraper to scrape banking sites before. It hits the site just like your browser does: if the site uses encryption, the connection from screen-scraper to the site will be encrypted too.
If you have a client page sending data to screen-scraper, you probably should encrypt that. I generally just make the connection via SSH.
1) What do you mean by server side: your proxy server or the screen-scraper software? Either of them can read and store your information.
2) If you are connecting through HTTPS, then your software should warn you about a malicious proxy server: https://security.stackexchange.com/questions/8145/does-https-prevent-man-in-the-middle-attacks-by-proxy-server
3) I don't think they have a logger they can read. But if you are concerned, you can try to write your own. There are some APIs that let you read HTML easily with jQuery syntax:
https://pypi.python.org/pypi/pyquery or XPath: http://net.tutsplus.com/tutorials/javascript-ajax/web-scraping-with-node-js/
I'm currently developing an ASP.NET SessionState custom provider that is backed by Redis using Booksleeve. Redis seemed like a perfect fit for SessionState (if you must use it) because:
Redis can store data durably like an RDBMS, yet it is much faster.
A Key/Value datastore better fits the interface of SessionState.
Since data is not stored in-process (like the default Session provider), SessionState can live out web server restarts, crashes, etc.
Redis is easy to shard horizontally if that becomes a need.
So, I'm wondering if this will be useful to anyone since we (my company) are considering open sourcing it on GitHub. Thoughts?
UPDATE:
I did release a first version of this yesterday: https://github.com/angieslist/AL-Redis/blob/master/AngiesList.Redis/RedisSessionStateStore.cs
I've created a Redis-based SessionStateStoreProvider that can be found on GitHub, using ServiceStack.Redis as the client (rather than Booksleeve).
It can be installed via NuGet with Install-Package Harbour.RedisSessionStateStore.
I found a few quirks with @NathanD's approach. In my implementation, locks are stored with the session value rather than in a separate key (fewer round trips to Redis). Additionally, because it uses ServiceStack.Redis, it can use pooled connections.
Finally, it's tested. This was my biggest turn-off with @NathanD's approach: there was no way of actually knowing whether it worked without running through every use case manually.
Not only would it be useful, but I strongly suggest you look closely at Redis' Hash datatype if you plan to go down this road. In our application the session is basically a small collection of keys and values (e.g. {user_id: 7, default_timezone: 'America/Chicago', ...}), with the entire user session stored in a single Redis hash.
Not only does using Hash simplify mapping the data if your session data is similar, but Redis uses space much more efficiently with this approach.
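For example, a minimal sketch with the Python redis-py client, assuming a local Redis on the default port; the key name, fields, and TTL are illustrative:

```python
import redis

r = redis.Redis()  # assumes a local Redis on the default port

# The whole user session lives in one hash, one field per session key.
session_key = "session:7"  # illustrative key naming
r.hset(session_key, mapping={"user_id": 7, "default_timezone": "America/Chicago"})
r.expire(session_key, 1800)  # sessions should time out like any session store

print(r.hgetall(session_key))
# {b'user_id': b'7', b'default_timezone': b'America/Chicago'}
```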
Our app is in Ruby, but you might still find some use from what we wrote.
I'm in the market for a good open source network based Pub/Sub (observer pattern) library. I haven't found any I like:
JMS - tied to Java, treats message contents as dumb binary blobs
NDDS - $$, use of IDL
CORBA/ICE - Pub/Sub is built on-top of RPC, CORBA API is non-intuitive
JBOSS/ESB - not too familiar with
It would be nice if such a package could do the following:
Network based
Aware of payload data, users should not have to worry about endian/serialization issues
Multiple language support (C++, ruby, Java, python would be nice)
No auto-generated code (no IDLs!)
Intuitive subscription/topic management
For fun, I've created my own. Thoughts?
You might want to look into RabbitMQ.
As pointed out by an earlier post in this thread, one of your options is OpenSplice DDS, which is an Open Source implementation of the OMG DDS standard (the same standard implemented by NDDS).
The main advantages of OpenSplice DDS over the other middleware you are considering can be summarized as:
Performance
Rich support for QoS (Persistence, Fault-Tolerance, Timeliness, etc.)
Data Centricity (e.g. possibility of querying and filtering data streams)
Something I'd like to understand is what your issues with IDL are. DDS uses IDL as a language-independent way of specifying user data types. However, DDS is not limited to IDL; you can use XML if you prefer. The advantage of specifying your data types, and decoupling their representation from a specific programming language, is that the middleware can:
(1) take away from you the burden of serializing data,
(2) generate very time/space efficient serialization,
(3) ensure end-to-end type safety,
(4) allow content filtering on the whole data type (not just the header like in JMS), and
(5) enable on-the-wire interoperability across programming languages (e.g. Java, C/C++, C#, etc.)
Depending on the system or application you are designing, some of the properties above might not be useful or relevant. In that case, you can simply define one, or a few, generic "DDS types" which act as holders of your serialized data.
If you think about JMS, it provides you with five different topic types you can use to send your data. With DDS you can do the same, but you also have the flexibility to define exactly the topic types you need.
Finally, you might want to check out this blog entry on Scala and DDS for a longer discussion on why types and static-typing are good especially in distributed systems.
-AC
We use the RTI DDS implementation. It costs $$, but it supports many quality of service parameters.
There is a free DDS implementation called OpenDDS, but I've not used it.
I don't see how you can get around the need to predefine your data types if the target language is statically typed.
Look a bit deeper into the various JMS implementations.
Most of them are not Java-only; they provide client libraries for other languages too.
Sun's OpenMQ has at least a C++ interface, and Apache ActiveMQ provides client-side libraries for many common languages.
When it comes to message formats, they're usually decoupled from the messaging middleware itself. You could define your own message format. You could define your own XML schema and send XML messages. You could send BER-encoded ASN.1 using some third-party library if you want.
Or format and parse the data with a JSON library.
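For example, here is a minimal sketch of a self-describing JSON envelope in Python; the field names are illustrative rather than any standard, and the broker just carries the bytes:

```python
import json, time, uuid

# A minimal self-describing envelope: the broker carries it as opaque bytes,
# and any language with a JSON parser can decode it. Field names are
# illustrative, not part of any standard.
def encode_message(topic, payload):
    envelope = {
        "id": str(uuid.uuid4()),
        "topic": topic,
        "sent_at": time.time(),
        "payload": payload,
    }
    return json.dumps(envelope).encode("utf-8")

def decode_message(raw_bytes):
    return json.loads(raw_bytes.decode("utf-8"))

wire = encode_message("stock.quotes", {"symbol": "IBM", "price": 125.07})
print(decode_message(wire)["payload"]["symbol"])  # IBM
```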
You might be interested in the MUSCLE library (disclaimer: I wrote it, so I may be biased). I think it meets all of the criteria you specified.
https://public.msli.com/lcs/muscle/
Three I've used:
IBM MQ Series - too expensive, hard to work with.
Tibco Rendezvous - (renamed to EMS now?) It was very fast, used UDP, and could also be used with no central server. My favorite, but expensive and requiring a maintenance fee.
ActiveMQ - I'm using this currently but finding it crashes frequently. It also requires some projects ported from Java, like Spring.NET. It works, but I can't recommend it due to stability issues.
I also used MSMQ in an attempt to build my own Pub/Sub, but since it doesn't handle that out of the box you're stuck writing a considerable amount of code.
There is also OpenSplice DDS. This one is similar to RTI's DDS, except that it's LGPL!
Check it out:
IBM WebSphere MQ - the license is not too expensive if you work at a corporate level.
You might take a look at PubSubHubbub. It's an extension to Atom/RSS that allows pub/sub through webhooks. The interface is HTTP and XML, so it's language-agnostic. It's gaining increasing adoption now that Google Reader, FriendFeed and FeedBurner are using it. The main use case is blogs and such, but of course you can have any sort of payload.
The only open source implementation I know of so far is this one for Google App Engine. They say support for self-hosting is coming.
So... I set up IE to use WebScarab as a proxy, and then logged into Quality Center. Lo and behold, the program uses HTTP for all its communication with the server, and all the commands and responses are human-readable text. It ain't XML, it ain't JSON, but it's human-readable, and I'm pretty sure I could write it if I had to.
So.. is this protocol documented anywhere? Are you "supposed" to be able to use this? Anybody have any experience using it anyway?
And yes I am aware that they have a COM api, but I have a feeling that the crashy behavior I normally experience from QC is probably in the COM objects, so any software I might write that uses them would exhibit the same behavior.
The officially supported method for communicating with QC is via the published Open Test Architecture (OTA) API, which is very well documented. I think you would have your work cut out trying to re-implement the API at a lower HTTP level. Lots of people are using the OTA API successfully to customise QC and write third-party extensions, and many of the COM idiosyncrasies are now documented on the net. Maybe you can elaborate on the sorts of problems you are having with the COM API?
The page below can help:
http://technologicaguru.blogspot.com/2009/06/connect-to-quality-center-ota-client.html
For a current project, I was thinking of implementing WebDAV to present a virtual file store that clients can access. I have only done Google research so far but it looks like I can get away with only implementing two methods:
GET, PROPFIND
I think that this is great. I was just curious though. If I wanted to implement file uploading via:
PUT
I haven't implemented it, but it seems simple enough. My only concern is whether a progress meter will be displayed for the user if they are using standard Vista Explorer or OSX Finder.
I guess I'm looking for some stories from people experienced with WebDAV.
For many WebDAV clients and even for read only access, you will also need to support OPTIONS. If you want to support upload, PUT obviously is required, and some clients (MacOS X?) will require locking support.
(btw, RFC 4918 is the authoritative source of information).
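To make the OPTIONS requirement concrete, here is a minimal Python WSGI sketch of the handshake many clients perform before anything else; it is an illustration under stated assumptions, not a complete server (PROPFIND, GET, LOCK, etc. are omitted):

```python
# Minimal sketch of the OPTIONS handshake many WebDAV clients perform before
# anything else; a real server would also implement PROPFIND, GET, and so on.
def application(environ, start_response):
    if environ["REQUEST_METHOD"] == "OPTIONS":
        start_response("200 OK", [
            ("DAV", "1"),  # class 1 compliance; class 2 would add LOCK/UNLOCK
            ("Allow", "OPTIONS, GET, HEAD, PROPFIND"),
            ("Content-Length", "0"),
        ])
        return [b""]
    start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
    return [b"only the OPTIONS handshake is sketched here\n"]
```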
I implemented most of the WebDAV protocol in about a day's work: http://github.com/nfarina/simpledav
I wrote it in Python to run on Google App Engine, and I expect any other language would be a similar effort. All in all, it's about two pages of code.
I implemented following methods: OPTIONS, PROPFIND, MKCOL, DELETE, MOVE, PUT, GET. So far I've tested Transmit and Cyberduck and both work great with it.
Hopefully this can provide some guidance for the next person out there interested in implementing a WebDAV server. It's not a difficult protocol, it's just very dense with abstracted language like 'depth' and 'collections' and blah.
Here's the spec: http://www.webdav.org/specs/rfc4918.html
But the best way to understand the protocol is to watch a client interacting with a working server. I used Transmit to connect to Box.net's WebDAV server and monitored traffic with Charles Proxy.
A bit late to the party, but I've implemented most of the WebDAV protocol and I can tell you with confidence that you'll need to implement most of the protocol.
For OS X you'll need class 2 WebDAV support, which includes LOCK and UNLOCK. (I found it particularly difficult to fully implement the HTTP If: header, but for Finder you'll only need a bit of that.)
These are some of my personal findings:
http://sabre.io/dav/clients/windows/
http://sabre.io/dav/clients/finder/
Hope this helps
If you run Apache Jackrabbit under, say, Tomcat, it can be configured to offer WebDAV and store uploaded files. Perhaps that will be a useful model, or even a good enough replacement for the planned implementation.
Apache Jackrabbit Support for WebDAV
Also, you may want to be aware of the BitKinex client (free 30 day trial), which I have found to be a useful tool for testing a WebDAV server.
BitKinex Home Page
We use WebDAV internally to provide a folder-based view of some file shares to clients outside of our firewall. We're using IIS6 for this.
Basically, it boils down to creating a Virtual Directory in IIS that maps to each network file system that you want to make available via WebDAV:
Set it up with the content coming from "A share located on another computer", using the UNC path to the share for the Network Directory value.
Turn on all options except "Index this resource", and disable all default content pages.
Turn on Windows Integrated Authentication (ours is set up using SSL as well). I have the root set up to deny access to anonymous users and allow access to any authenticated user.
Add a wildcard MIME mapping (.* to application/octet-stream).
Enable the WebDAV web service extension in IIS.
You also need to set up the web server to delegate permissions to all the file servers you may be accessing, so it can pass on the user's credentials.
If you have Macintosh clients you may also need an ISAPI filter that maps 401 to 403 errors for Darwin clients. Microsoft and Apple disagree on how to handle the situation where you don't have permission to write to a directory: Apple keeps resending the credentials on a 401 (Access Denied) error, and translating it to a 403 (Forbidden) error keeps this from happening. By default Apple likes to write a "dot" file to every directory it accesses, so navigating through directories where you don't have write access will end up crashing the Finder if you don't have the filter. I have source code for this if needed.
This is all off the top of my head. It's possible (probable?) that I may have missed something. Feel free to contact me via the contact information on my web site if you have problems.
We have a WebDAV servlet in our web-based product.
I've found Apache Jackrabbit a good help for implementing it. However, WebDAV is a serious P.I.T.A. when it comes to client-side support.
Many client implementations differ widely in their behavior, and you will most likely have to support several different kinds of buggy implementations.
Some examples:
MS Vista only supports authentication over SSL.
Most Windows-based WebDAV clients assume your WebDAV server/servlet is a SharePoint server and will act accordingly (thus not according to the WebDAV protocol). One example of this is that you NEED to allow an unauthenticated LOCK request on the root of your server (i.e. yourdomain.com/, not yourdomain.com/where/webdav/should/live), or else you won't be able to get write access in MS Windows.
(This is a serious P.I.T.A. on a Tomcat machine, where your stuff usually lives at server.com/servlets/paths/thelocation.)
Most (all?) versions of MS Office respond differently to WebDAV links.
I guess my point is that integrating WebDAV support into an existing product can be a LOT harder than you would expect, and if possible I would advise using a (semi-)standalone WebDAV server such as Jackrabbit's WebDAV server or Apache mod_dav.
I've found OS X's Finder WebDAV support to be really finicky. In order to get read-write support, you have to implement LOCK, in addition to other bits.
I wrote a WebDAV interface to a Postgres database, where Python modules were stored in the database in a hierarchical folder-like structure. Accessing it with cadaver worked fine, and IIRC a GUI Windows browser worked too, but Finder refused to mount the share as anything other than read-only.
So, I don't know if it would give a progress bar. The files I was dealing with were small enough that a read/copy from them was virtually instantaneous. I think a copy of a large file using the Finder would probably give a progress bar; it does for any other type of mounted share.
Here is another open source project for WSGI WebDAV
http://code.google.com/p/wsgidav/
where I picked up the PyFileServer project.