I've seen questions about how to find out the UUID, given the file path.
But what about the opposite? Can I find out the file path if I have the UUID? Do UUIDs reference the address location somehow?
Why am I asking this?
Because I've read this snippet about Openstack's Swift and am wondering if this concept applies to UUIDs in general:
Rather than the traditional idea of referring to files by their
location on a disk drive, developers can instead refer to a unique
identifier referring to the file or piece of information
In which context, then, would I be able to use the identifier (e.g. in SQL queries)?
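For what it's worth, a UUID by itself carries no location information; if you want to go from a UUID back to a file path, you have to maintain that mapping yourself, typically in a database. A minimal sketch (the table layout and the example path are invented for illustration):

```python
# Sketch: a UUID does not encode a file's location on disk.
# You keep the UUID -> path mapping yourself, e.g. in a database table.
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (uuid TEXT PRIMARY KEY, path TEXT)")

file_uuid = str(uuid.uuid4())
conn.execute("INSERT INTO files VALUES (?, ?)",
             (file_uuid, "/var/data/report.pdf"))

# Resolving the path from the UUID is then just an ordinary SQL lookup:
row = conn.execute("SELECT path FROM files WHERE uuid = ?",
                   (file_uuid,)).fetchone()
print(row[0])  # /var/data/report.pdf
```

This is essentially what object stores like Swift do internally: the identifier is a key into their own metadata, not an address on a disk.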
Related
I have a DB with the original file names, the locations of the files on disk, and metadata such as the user that owns each file. The files on disk have scrambled names. When a user requests a file, the servlet checks whether he's authorized, then sends the file under its original name.
While researching the subject I've found several cases that touch on the issue, but nothing specific to mine.
Essentially there are 2 solutions:
A custom servlet that handles headers and other things the container's default servlet doesn't: http://balusc.omnifaces.org/2009/02/fileservlet-supporting-resume-and.html
Then there is the quick and easy option of just using the default servlet and doing some path remapping. For example, in Undertow you configure the Undertow subsystem and add file handlers in standalone.xml that map http://example.com/content/ to /some/path/on/disk/with/files .
So I am leaning towards solution 1, since solution 2 is a straight path remap and I need to change file names on the fly.
I don't want to reinvent the wheel, and both solutions are non-standard, so if I decide to migrate to an app server other than WildFly, it will be problematic. Is there a better way? How would you approach this problem?
While your problem is a fairly common one, there isn't necessarily a standards-based solution for every possible design challenge.
I don't think solution #2 will be sufficient: what if two threads try to manipulate the file at the same time? And if someone got hold of the link to the file, could they share it?
I've implemented something very similar to your solution #1. The key there is that even if the link to the file got out, no one could reuse it, because it requires authentication; you would just return a 401 or 403 for the resource.
Another possibility depends on how you're hosted. Amazon S3 allows you to generate a signed URL with a limited time to live. That way your server isn't sending the file directly; it sends either a redirect or a URL for the front end to use. Keep the lifetime short, say 15 seconds (depending on your needs), and after that the URL is no longer valid.
I believe that the other cloud providers have a similar capability too.
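The same idea can be sketched by hand if you're not on a cloud provider: embed an expiry timestamp in the URL and sign path + expiry with a server-side secret, then verify both on every request. This is only a rough stdlib sketch of the concept (the secret, the path, and the query parameter names are invented here; on S3 the SDK's presigned-URL facility does the equivalent for you):

```python
import hashlib
import hmac
import time

SECRET = b"server-side secret"  # invented for this sketch; keep it private

def sign_url(path: str, ttl_seconds: int = 15) -> str:
    """Append an expiry timestamp and an HMAC over path+expiry."""
    expires = int(time.time()) + ttl_seconds
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_url(path: str, expires: int, sig: str) -> bool:
    """Reject if the timestamp has passed or the signature doesn't match."""
    if time.time() > expires:
        return False
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

url = sign_url("/files/42")
path, query = url.split("?")
params = dict(p.split("=") for p in query.split("&"))
print(verify_url(path, int(params["expires"]), params["sig"]))  # True
```

Since the signature covers the path, a leaked link cannot be rewritten to point at a different file, and it dies on its own once the expiry passes.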
In DICOM, the following SOP classes are defined for C-FIND and C-MOVE at the Study Root level.
Study Root Query/Retrieve Information Model - FIND: 1.2.840.10008.5.1.4.1.2.2.1
Study Root Query/Retrieve Information Model - MOVE: 1.2.840.10008.5.1.4.1.2.2.2
I have implemented Query/Retrieve SCP and SCU in multiple applications. In all those cases I implemented both classes: I do a C-FIND first to get the list of matching data, and then, based on the result, I do a C-MOVE (automatically or manually) to get the instances. All those implementations work fine.
Recently I have been working on an application that combines DICOM with a private protocol to fulfill some specific requirements. It occurred to me: is it possible, as an SCU, to do a C-MOVE directly without doing a C-FIND?
I already know the identifier (StudyInstanceUID) to retrieve, and I also know that it is present on the SCP.
I looked into the specifications but could not find anything conclusive. I am aware that C-FIND and C-MOVE can be issued by an SCU to an SCP on different connections/associations, so at first glance what I have in mind looks possible and legal.
I have worked with many third-party DICOM applications; none of them implements an SCU the way I am describing. All of them implement both C-FIND and C-MOVE.
Question:
Is it DICOM-legal and practical to implement a Query/Retrieve SCU that issues a C-MOVE command without a C-FIND command? Please point me to the relevant part of the specifications if possible.
Short answer: yes, this is perfectly legal per the DICOM specification.
Long answer: consider the DCMTK reference DICOM Q/R implementation. It provides a set of basic SCU command line tools, namely findscu and movescu. The idea is to pipe the output of findscu into movescu to construct a valid C-MOVE (SCU) request.
In your case you are simply replacing the findscu step with a private implementation that does not rely on the publicly defined C-FIND (SCU) protocol but on another mechanism (an extension to DICOM).
So yes, your C-MOVE (SCU) implementation is perfectly valid, since there is no requirement to perform a C-FIND (SCU) as part of this retrieve.
I understand you are not trying to back up an entire database using C-MOVE (SCU); that was just a possible scenario where someone would be trying to use C-MOVE (SCU) without first querying with a valid C-FIND (SCU) result.
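To make the point concrete: the identifier dataset in a Study Root C-MOVE request needs only the Query/Retrieve level and the StudyInstanceUID; nothing in it refers back to a prior C-FIND. A purely schematic sketch, using a plain dict as a stand-in for a real DICOM dataset (with a toolkit such as pynetdicom or DCMTK you would populate an actual dataset the same way):

```python
# Schematic only: a plain dict stands in for a real DICOM dataset.
# At Study Root / STUDY level, the C-MOVE identifier needs just the
# QueryRetrieveLevel and the StudyInstanceUID -- both of which you can
# know in advance, without ever issuing a C-FIND.
STUDY_ROOT_MOVE = "1.2.840.10008.5.1.4.1.2.2.2"  # SOP Class UID quoted above

def build_move_identifier(study_instance_uid: str) -> dict:
    """Build the minimal identifier for a Study Root C-MOVE request."""
    return {
        "QueryRetrieveLevel": "STUDY",
        "StudyInstanceUID": study_instance_uid,
    }

identifier = build_move_identifier("1.2.3.4.5.6.7.8.9")  # example UID
print(identifier)
```

If the UID comes from your private protocol instead of a C-FIND response, the C-MOVE request that results is indistinguishable, on the wire, from one built the traditional way.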
I'm developing an online file storage service, mainly in PHP and MySQL, where users will be able to upload files up to 10-20 GB in size.
Unregistered users will be able to upload files but not in a personal storage space, just a directory where all file uploads of unregistered users will be stored.
Registered users will get a fixed amount (that might increase in the future) of personal storage space and access to a file manager to easily manage and organize all their files. They'll also be able to set their files private (not downloadable by anyone but themselves) or public.
What would be a good possible directory set-up?
I'm thinking of a "personal" directory that contains one folder per registered user, named after the user's ID.
Alongside the personal directory, there will be an "other" folder which simply contains every file uploaded by unregistered users.
Both will contain the uploaded files, each named after its corresponding row ID (from the files table in the database).
ROOT
  FOLDER uploads
    FOLDER personal
      FOLDER 1
        FILE file_id1
        FILE file_id2
        (...)
      FOLDER 2
        FILE file_id3
        FILE file_id4
        (...)
      (...)
    FOLDER other
      FILE file_id5
      FILE file_id6
      (...)
This is the first time I'm dealing with a situation like this, and this concept is what I've come up with so far. Any suggestions are welcome!
Basically you need to address the following topics:
Security: From what you've described it is pretty unclear who is allowed to read the files. If the answer is always "everybody reads everything", you can set up the file structure inside the web server's document root. Otherwise, set up the folder structure in a "hidden" area and access it only via server-side scripts (e.g. copy on demand). The secure approach consumes more resources, but leaves room to set up a technically optimized folder structure.
OS constraints: Each OS limits the number of items and/or files per folder. The actual limits depend on the OS-specific configuration of the file system; if I remember correctly, there are Linux setups that support 32,000 items per folder. The exact figure is not the point, though: what matters is that your capacity planning does not exceed the limits on your servers. If you plan to provide your service to 10 users, you may well get away with a single "other" folder; if you target a million users, you will probably need many "other" folders. If you also do not want to restrict the number of files your users can upload, you will probably need the option to extend the per-user folders. Personally, I apply a policy of never having more than 1000 items in a folder.
SEO requirements: If your service needs to be SEO-compliant, it needs to present human-readable names to users, ideally without a generic categorization such as "personal"/"other". Your proposed structure may meet this requirement. However, the OS constraints may force you into a more technical physical structure (e.g. chunking the item ID into 3-digit groups and using those to build the folder and file structure). On top of that you can implement a logical structure which converts IDs into names, but such an implementation means file access via server-side scripts and therefore demands more resources. Alternatively you could play with web server URL rewrites.
Consistency + availability + partition tolerance: Making your service a real service likely requires a setup balanced across these three. Separating the beast into a physical and a logical layer helps a lot here; consistency, availability and partition tolerance would be dealt with at the logical layer. http://en.wikipedia.org/wiki/NoSQL might be your way forward; see http://en.wikipedia.org/wiki/CAP_theorem for details on the topic.
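The ID-chunking idea mentioned under the OS constraints and SEO points can be sketched as follows; the 3-digit grouping, 9-digit padding and base path are arbitrary choices for illustration:

```python
import os

def storage_path(file_id: int, base: str = "uploads",
                 digits: int = 9, group: int = 3) -> str:
    """Map a numeric file ID to a nested folder path.

    Zero-pads the ID to a fixed width, then splits it into fixed-size
    digit groups, so with 3-digit groups no folder ever holds more
    than 1000 entries -- comfortably under typical OS limits.
    """
    s = str(file_id).zfill(digits)
    parts = [s[i:i + group] for i in range(0, len(s), group)]
    return os.path.join(base, *parts)

print(storage_path(1234567))  # e.g. uploads/001/234/567 on POSIX
print(storage_path(5))        # e.g. uploads/000/000/005 on POSIX
```

Because the path is derived purely from the database ID, adding capacity never requires renaming anything: the mapping function is the single source of truth for where a file lives.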
====================== UPDATE
From the comments we now know that you store metadata in a relational database, that you have a physical layer (files on disk) and a logical layer (access via PHP scripts), and that you base your physical file/folder layout on IDs.
This opens up room to move all structural considerations into the relational database, and perhaps to improve the physical layer from the very beginning. Here are the tables of the SQL database I would create:
======
users
======
id (unsigned INT, primary key)
username
password
isregisteredflag
...any other not relevant for the topic...
======
files
======
id (unsigned INT,primary key)
filename
_userid (foreign key to users.id)
createddate
fileattributes
...any other not relevant for the topic...
======
tag2file
======
_fileid (foreign key to files.id)
_tagid (foreign key to tag.id)
======
tags
======
id (unsigned INT,primary key)
tagname
Since this structure lets you derive a user's files from the user ID, and equally derive the user ID from a file, you do not need to encode that relation in your folder structure. On the physical layer you just name each file after files.id, a numeric value generated by the database; because the database generates the ID, it is guaranteed unique. You can now also have tags, which give your users a richer categorization experience (if you do not like tags, you could model folders in the database instead).
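The tables above can be sketched directly; this uses SQLite syntax with simplified types purely to demonstrate the derivation (column names follow the answer, the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT,
    password TEXT,
    isregisteredflag INTEGER
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    filename TEXT,
    _userid INTEGER REFERENCES users(id),
    createddate TEXT
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    tagname TEXT
);
CREATE TABLE tag2file (
    _fileid INTEGER REFERENCES files(id),
    _tagid INTEGER REFERENCES tags(id)
);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice', 'pwhash', 1)")
conn.execute("INSERT INTO files VALUES (10, 'report.pdf', 1, '2015-01-01')")

# Derive a user's files from the database, not from the folder tree;
# on disk the file is simply named after files.id (here: "10").
rows = conn.execute(
    "SELECT id, filename FROM files WHERE _userid = ?", (1,)).fetchall()
print(rows)  # [(10, 'report.pdf')]
```

The physical layer then only ever sees opaque numeric names, and every "where does this belong" question is answered by a query.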
Taking care of point 4 very much impacts your design; if you deal with it only after you have set everything up, you potentially double your effort. Since everything is arranged to build files from numeric IDs, it is a very small step to store your physical files in a key-value store in a NoSQL database (rather than on the file system), which makes your system extremely scalable. You would then employ an SQL database for meta and structural data and a NoSQL database for file content.
Btw., to cover your public files I would suggest a user "public" with ID = 1. This results in some hardcoded data, which is admittedly ugly; however, as "public" is such a central element of your application, you can make it acceptable by documenting it properly. Alternatively you can add some more tables and blow up your code to cover the two cases in a "clean" way.
In my opinion, it shouldn't actually matter which folder structure you have. Of course (as already mentioned), there are OS and FS restrictions, and you may want to spend a thought or two on scaling.
But in the end, I would recommend a more flexible approach to storage and retrieval:
Ok, files are physically stored somewhere in a file system.
But: There should be a database with meta information about the file like categories, tags, descriptions, modification dates, maybe even change revisions. Of course, it will also store the physical position of the file, which may or may not be on the same machine.
This database would be optimized for searching by those criteria. (There are a couple of libraries for semantical indexing/searching, depending on your language/framework.)
This way, you separate the physical concerns from the logical/semantic ones. And if you or your users still want the hierarchical approach, you can always fall back on the category logic.
Finally, you will have a much more flexible and appealing file hosting service.
Can I physically move documents from one folder to another with XQuery/MarkLogic? If yes, please explain in detail.
I'm not sure I understand the question. Why is "physical" movement important? The database abstracts the physical storage of documents away from the developer. If you're administering a MarkLogic database you can put the forests, the physical partitions where the data and indexes live, in different locations. I suspect that's not what you're asking, though. Can you please provide more details about the problem you're trying to solve?
No.
To my knowledge the only XQuery functions in MarkLogic that access the filesystem directly are:
xdmp:filesystem-directory,
xdmp:filesystem-file,
xdmp:filesystem-file-exists,
xdmp:filesystem-file-length,
xdmp:document-load,
xdmp:document-save
MarkLogic specifically does not let you exec commands or directly modify the host operating system's files.
I too am wondering what you mean by "move documents": are these MarkLogic documents or filesystem documents? And what is a "folder" in this context? If this is a MarkLogic document, do you mean to put the URI in a different "directory"?
The closest thing to a "physical move" of a MarkLogic document is to change its URI.
There is no builtin to do this, but xmlsh supports it, based on code posted to a mailing list long ago. You can see the strategy here:
http://xmlsh.svn.sourceforge.net/viewvc/xmlsh/extensions/marklogic/src/org/xmlsh/marklogic/resources/rename.xquery?revision=730&view=markup
I need an SQLite implementation that lets me keep the db file encrypted on disk, for security reasons. I noticed that SQLite only works with regular files, and that there is no implementation available that supports streams (oddly enough, as many people seem to want one). If I had such an implementation, I could easily pass it a stream that encrypts/decrypts the file on the fly.
After googling and reading about the matter, it seems a custom VFS might solve the problem, implementing only the file methods (open, read, write, etc.) against a stream instead of a regular file (the other methods can keep their default behavior).
My question then is as follows:
1. Does that sound like the correct approach?
2. Is there really no such implementation available?
Thanks.
If you just need an encrypted SQLite database, there is the SQLite Encryption Extension. If not, ignore my answer.