I would like to know the maximum size of the value length field for the VRs OB and OW. I know that it is currently 2^32 (a 32-bit field). Will it become 64 bits in a 64-bit application? I referred to the DICOM standard (DICOM PS3.5 2014c - Data Structures and Encoding) but did not find a clue. Since we want to store huge non-image data (more than 4 GB), I would like to know whether that is possible.
Thanks in advance.
Although the maximum size of an attribute is 0xfffffffe, larger data can be stored in the Pixel Data attribute (7FE0,0010) by using an encapsulated transfer syntax. This effectively lets you split up your image data into multiple "items" called fragments. Each fragment also has a maximum size of 0xfffffffe, but there is no limit on the number of fragments in the Pixel Data attribute.
Refer to PS3.5, Annex A.4 "Transfer Syntaxes For Encapsulation of Encoded Pixel Data" of the DICOM Standard for a detailed explanation.
If you use a library, also take a look at its documentation; many libraries, for example DCMTK, support splitting an image into multiple fragments or frames. Just look for keywords like fragment or encapsulation.
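To make the layout concrete, here is a minimal sketch in Python (using only struct) of how fragments are packed inside an encapsulated Pixel Data element, as I read PS3.5 Annex A.4: an empty Basic Offset Table item, one item per fragment, then a Sequence Delimitation Item. It only illustrates the byte layout; in practice you would let a library such as DCMTK, GDCM or pydicom do the actual writing.

```python
import struct

def encapsulated_pixel_data(frame: bytes, max_fragment: int = 0xFFFFFFFE & ~1) -> bytes:
    """Pack one encoded frame into encapsulated Pixel Data fragments.

    Sketch of the byte layout from PS3.5 Annex A.4 (explicit little endian):
    Basic Offset Table item, fragment items, Sequence Delimitation Item.
    """
    out = bytearray()
    out += struct.pack('<HHI', 0xFFFE, 0xE000, 0)                # empty Basic Offset Table
    for start in range(0, len(frame), max_fragment):
        chunk = frame[start:start + max_fragment]
        if len(chunk) % 2:
            chunk += b'\x00'                                     # value lengths must be even
        out += struct.pack('<HHI', 0xFFFE, 0xE000, len(chunk))   # Item tag + fragment length
        out += chunk
    out += struct.pack('<HHI', 0xFFFE, 0xE0DD, 0)                # Sequence Delimitation Item
    return bytes(out)
```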
The maximum size of the tag is dictated by the DICOM standard, not by the CPU architecture on which the DICOM library is compiled or used.
At the moment the maximum size (in bytes) of an OB or OW value is represented by a 32-bit wide field (minus 1 or 2, because 0xFFFFFFFF is reserved).
The BitTorrent protocol doesn't specify block (piece) size. This is left to the user. (I've seen different torrents for the same content with 3 or more different choices.)
I'm thinking of filing a BitTorrent Enhancement Proposal which needs to make a specific block size mandatory — both for the whole torrent, and also for individual files (for which BTv2 (BEP 52) specifies bs=16KiB).
The only thing I've found that's close is the rsync block size algorithm in Tridgell & Mackerras' technical paper. Their bs=300-1100 B (# bytes aren't powers of 2).
Torrents, however, usually use bs=64kB–16MB (# bytes are powers of 2, and much larger than rsync's) for the whole torrent (and, for BTv2, 16KiB for files).
The specified block size doesn't need to be a constant. It could be a function of the size of the thing being hashed, of course (like it is in rsync). It could also be a function of file type; e.g. there might be some block sizes which are better for making partial video/archive/etc. files more usable.
See also this analysis of BitTorrent as a block-aligned file system.
So…
What are optimal block sizes for a torrent, generic file, or partial usefulness of specific file types?
Where did the 16KiB bs in BEP 52 come from?
Block and piece size are not the same thing.
A piece is the unit that is hashed into the pieces string in v1 torrents, one hash per piece.
A block is a part of a piece that is requested via request (ID 6) and delivered via piece (ID 7) messages. These messages basically consist of a (piece number, offset, length) tuple where the length is the block size. In this sense blocks are very ephemeral constructs in v1 torrents, but they are still important since downloading clients have to keep a lot of state about them in memory. Since the downloading client is in control of the request size, clients customarily use fixed 16KiB blocks, even though they could do this more flexibly. For an uploading client it does not really matter complexity-wise, as it simply has to serve the bytes covered by (piece, offset, length) and keep no further state.
Since clients generally implement an upper message size limit to avoid DoS attacks, 16KiB is also the recommended upper bound. Specialized implementations could use larger blocks, but for public torrents that doesn't really happen.
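To make the terminology concrete, here is a minimal Python sketch of the v1 wire format for requests; the 16 KiB constant and the helper names are just illustrative, but the message layout (big-endian length prefix, message ID 6, then index/begin/length) is the standard one.

```python
import struct

BLOCK_SIZE = 16 * 1024  # the customary 16 KiB request size

def request_message(piece_index: int, offset: int, length: int = BLOCK_SIZE) -> bytes:
    """Build a 'request' message: <len=13><id=6><index><begin><length>, all big-endian."""
    return struct.pack('>IBIII', 13, 6, piece_index, offset, length)

def block_requests_for_piece(piece_index: int, piece_length: int):
    """Yield the request messages that cover one piece in 16 KiB blocks."""
    for offset in range(0, piece_length, BLOCK_SIZE):
        yield request_message(piece_index, offset,
                              min(BLOCK_SIZE, piece_length - offset))
```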
For v2 torrents the picture changes a bit. There are now three concepts:
the ephemeral blocks sent via messages
the pieces (now representing some layer in the merkle tree), needed for v1 compatibility in hybrid torrents and also stored as piece layers outside the info dictionary to allow partial file resume
the leaf blocks of the merkle tree
The first type is essentially unchanged compared to v1 torrents, but the incentive to use 16KiB-sized blocks is much stronger now because that is also the block size covered by each leaf hash.
The piece size must now be a power of two and a multiple of 16KiB; this constraint did not exist in v1 torrents.
The leaf block size is fixed to 16KiB; it is relevant when constructing the merkle tree and when exchanging message IDs 21 (hash request) and 22 (hashes).
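A small Python sketch of how those 16 KiB leaves feed the merkle tree, SHA2-256 throughout. The zero-hash padding up to a power-of-two leaf count follows my reading of BEP 52; check the BEP for the authoritative padding rules before relying on this.

```python
import hashlib

LEAF_SIZE = 16 * 1024  # the 16 KiB leaf block size fixed by BEP 52

def merkle_root(data: bytes) -> bytes:
    """Compute a merkle root over SHA2-256 hashes of 16 KiB leaves."""
    leaves = [hashlib.sha256(data[i:i + LEAF_SIZE]).digest()
              for i in range(0, len(data), LEAF_SIZE)] or [bytes(32)]
    while len(leaves) & (len(leaves) - 1):   # pad the leaf count to a power of two
        leaves.append(bytes(32))             # zero hashes for the padding leaves
    while len(leaves) > 1:
        leaves = [hashlib.sha256(leaves[i] + leaves[i + 1]).digest()
                  for i in range(0, len(leaves), 2)]
    return leaves[0]
```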
What are optimal block sizes for a torrent, generic file, or partial usefulness of specific file types?
For a v1 torrent the piece size combined with the file sizes determines a lower bound on the metadata (aka .torrent file) size. Each piece must be stored as a 20-byte hash in pieces, thus larger pieces result in fewer hashes and smaller .torrent files. For terabyte-scale torrents a 16KiB piece size results in a ~1GB torrent file, which is unacceptable for most use-cases.
For a v2 torrent it would result in a similarly sized piece layers dictionary in the root of the metainfo. Or, if a client does not have the piece layers data available (e.g. because it started a download via infohash), it will have to retrieve the data via hash request messages instead, ultimately resulting in the same overhead, albeit more spread out over the course of the download.
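A quick back-of-the-envelope helper (my own, not from any spec) shows why: the metadata grows by 20 bytes of SHA-1 per piece, so terabyte-scale content needs large pieces.

```python
def v1_metadata_bytes(total_size: int, piece_size: int) -> int:
    """Lower bound on the 'pieces' string: 20 bytes of SHA-1 per piece."""
    num_pieces = -(-total_size // piece_size)   # ceiling division
    return num_pieces * 20

print(v1_metadata_bytes(2**40, 16 * 1024))      # 1 TiB, 16 KiB pieces: ~1.25 GiB of hashes
print(v1_metadata_bytes(2**40, 16 * 1024**2))   # 1 TiB, 16 MiB pieces: ~1.25 MiB of hashes
```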
Where did the 16KiB bs in BEP 52 come from?
16KiB was already the de-facto block size for most clients. Since a merkle tree must be calculated from some leaf hashes, a fixed block size for those leaves had to be defined. Hence the established messaging block size was also chosen for the merkle tree blocks.
The only thing I've found that's close is the rsync block size algorithm in Tridgell & Mackerras' technical paper. Their bs=300-1100 B (# bytes aren't powers of 2).
rsync uses a rolling hash to find matching blocks at arbitrary offsets (content-aware matching rather than fixed alignment), and that is the primary driver for its block-size choices. So rsync considerations do not apply to BitTorrent.
We are writing an importer for DICOM files.
How does one generally decide whether a series of images forms a 3D volume or is just a series of 2D images?
Is there a universal way to decide this for most vendors? I looked at the DICOM tags and could not find an apparent solution.
The DICOM standard defines UIDs for describing the hierarchy. These are from top to bottom:
Study UID - Identifier of the study or scanning session.
Series UID - The same within a series acquired in one scan.
Image UID - Should be unique for any image.
A DICOM image saved by a standard-conforming implementation should have all these IDs. If multiple images have the same SeriesUID, they are a volume (or time-series) as defined in the standard. Some software of course is not standard-conforming and you'll have to look at other things like timestamps and patient position, but it is usually best to start by following the standard.
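As a concrete sketch of that grouping step, assuming the pydicom package (GDCM and DCMTK expose the same attributes):

```python
from collections import defaultdict
from pathlib import Path

import pydicom

def group_by_series(dicom_dir: str):
    """Group DICOM files by SeriesInstanceUID (0020,000E).

    Files sharing a SeriesInstanceUID are candidates for one volume or
    time-series; orientation/position checks come afterwards.
    """
    series = defaultdict(list)
    for path in Path(dicom_dir).rglob('*'):
        if not path.is_file():
            continue
        try:
            ds = pydicom.dcmread(path, stop_before_pixels=True)
        except Exception:
            continue                      # not a readable DICOM file, skip it
        series[getattr(ds, 'SeriesInstanceUID', None)].append(path)
    return series
```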
For ordering the series after identifying it, GDCM (as malat suggested) or DCMTK are pretty well-established libraries.
In MR, you'll want to look for:
MR Acquisition Type (0018,0023). It has two enumerated values:
2D = frequency x phase
3D = frequency x phase x phase
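A one-line check for this, assuming a pydicom dataset (pydicom's keyword MRAcquisitionType corresponds to (0018,0023)):

```python
def is_3d_acquisition(ds) -> bool:
    """True when MR Acquisition Type (0018,0023) declares a volumetric (3D) scan."""
    return getattr(ds, 'MRAcquisitionType', '').strip() == '3D'
```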
I'm not as sure about CT.
Most of the time, malat's answer is what you'll want to do (i.e. organize the slices by position and orientation and treat them in a 3D fashion through multi-planar reconstruction).
I think what you are searching for is the algorithm to organise DICOM dataset using Image Position (Patient) and Image Orientation (Patient).
A typical implementation can be found in GDCM
Please note that my answer may be totally unrelated to your specific DICOM instances, but since you did not specify which SOP Class UID you were dealing with, I simply assumed you were dealing with old CT or MR Image Storage.
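The core of that sort is small enough to sketch here; this assumes pydicom datasets and numpy, and GDCM implements the equivalent (and more robust) logic internally:

```python
import numpy as np

def sort_slices(datasets):
    """Order slices along the scan axis using IPP/IOP (assumes pydicom datasets).

    The slice normal is the cross product of the row and column direction
    cosines in Image Orientation (Patient) (0020,0037); each slice is ranked
    by projecting Image Position (Patient) (0020,0032) onto that normal.
    """
    iop = np.array(datasets[0].ImageOrientationPatient, dtype=float).reshape(2, 3)
    normal = np.cross(iop[0], iop[1])
    return sorted(
        datasets,
        key=lambda ds: float(np.dot(normal, np.array(ds.ImagePositionPatient, dtype=float))),
    )
```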
Patient Position (0018, 5100) is a type 1 required attribute for both the CT and MR modalities. This attribute is VERY IMPORTANT for accurately interpreting the patient's orientation.
A projection radiograph will typically have the Patient Orientation (0020, 0020) attribute, and a cross-sectional image should have Image Position (0020, 0032) and Image Orientation (0020, 0037) attributes, as they are type 1 required elements of the Image Plane module (see PS 3.3 section C.7.6.2.1.1).
However, a localizer or scout image included with a CT study is not really a cross-sectional image but a projection image, and it may still contain Image Position and Image Orientation attributes. The same is the case for an MR study, where one or more sagittal or coronal images are usually captured, from which the axial images are prescribed. In this case different logic is needed to identify the localizer image. For example, a CT localizer may use the string "LOCALIZER" for value 3 of the Image Type attribute.
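A simple heuristic for that last point, again assuming pydicom; Image Type (0008,0008) is multi-valued and not every vendor fills in LOCALIZER, so treat this as a hint rather than a guarantee:

```python
def is_localizer(ds) -> bool:
    """Heuristic: scout/localizer slices advertise 'LOCALIZER' in Image Type (0008,0008)."""
    return 'LOCALIZER' in [str(v).upper() for v in getattr(ds, 'ImageType', [])]
```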
If someone hasn't found the answer yet: I looked through the tags in the RadiAnt DICOM viewer, where I compared different files, and I think the Scan Options (0018, 0022) tag contains the information. If the tag exists (on some files it was not there) and the value is equal to HELICAL MODE or HELIX, then a 3D image can be constructed from that.
I'm increasingly looking at using QR codes to transmit binary information, such as images, since it seems whenever I demo my app, it's happening in situations where the WiFi or 3G/4G just doesn't work.
I'm wondering if it's possible to split a binary file up into multiple parts to be encoded by a series of QR codes?
Would this be as simple as splitting up a text file, or would some sort of complex data coherency check be required?
Yes, you could convert any arbitrary file into a series of QR codes,
something like Books2Barcodes.
The standard way of encoding data too big to fit in one QR code is with the "Structured Append Feature" of the QR code standard.
Alas, I hear that most QR encoders or decoders -- such as zxing -- currently do not (yet) support generating or reading such a series of barcodes that use the structured append feature.
QR codes already have a pretty strong internal error correction.
If you are lucky, perhaps splitting up your file with the "split" utility into pieces small enough to fit into an easily-readable QR code, then later scanning them in (hopefully) the right order and using "cat" to re-assemble them, might be adequate for your application.
You surely can store a lot of data in a QR code; it can store 2953 bytes of data, which is nearly twice the size of a standard TCP/IP packet originating on an Ethernet network, so it's pretty powerful.
You will need to define some header for each QR code that describes its position in the stream required to rebuild the data. It'll be something like filename chunk 12 of 96, though encoded in something better than plain text. (Eight bytes for the filename, one byte each for the chunk number and the total number of chunks -- a maximum of 256 QR codes, one simple ten-byte header, still leaving 2943 bytes per code.)
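A sketch of that header scheme in Python; the field sizes are just the ones suggested above, not any established format, and the single count byte caps this particular version at 255 chunks:

```python
import struct

QR_CAPACITY = 2953                      # binary capacity of a version 40-L QR code
HEADER = struct.Struct('>8sBB')         # 8-byte name, chunk index, chunk count (10 bytes)
CHUNK_DATA = QR_CAPACITY - HEADER.size  # 2943 payload bytes per code

def make_chunks(name: bytes, data: bytes):
    """Split data into QR-sized chunks, each prefixed with the ten-byte header."""
    pieces = [data[i:i + CHUNK_DATA] for i in range(0, len(data), CHUNK_DATA)]
    # one byte for the count caps this sketch at 255 chunks
    assert len(pieces) <= 255, "file too large for a single-byte chunk count"
    return [HEADER.pack(name[:8].ljust(8, b'\0'), i, len(pieces)) + piece
            for i, piece in enumerate(pieces)]
```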
You will probably also want to use some form of forward error correction, such as erasure codes, to encode sufficient redundant data to allow mis-reads of individual QR codes, or entirely missing QR codes, to be handled transparently. While you may be able to take an existing library, such as one for Reed-Solomon codes, to fix mis-reads within a QR code, handling entirely missing QR codes may take significantly more effort on your part.
Using erasure codes will of course reduce the amount of data you can transmit -- instead of all 753,408 bytes (256 * 2943), you will only have 512k or 384k or even less available to your final images -- depending upon what code rate you choose.
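To give a flavour of the simplest possible erasure scheme, here is a single XOR parity chunk (RAID-5 style) that survives exactly one lost QR code; anything stronger really does call for Reed-Solomon or a fountain code as described above. This is my own illustration, not a recommended format.

```python
from functools import reduce

def add_parity(chunks):
    """Append one XOR parity chunk; any single lost chunk becomes recoverable."""
    size = max(len(c) for c in chunks)
    padded = [c.ljust(size, b'\0') for c in chunks]          # equal-length chunks
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), padded)
    return padded + [parity]

def recover_missing(received):
    """Rebuild the single chunk marked None by XORing all chunks that did arrive."""
    present = [c for c in received if c is not None]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), present)
```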
I think it is theoretically possible and as simple as splitting up a text file. However, you probably need to design some kind of header to know that the data is multi-part and to make sure the different parts can be merged together correctly regardless of the order of scanning.
I am assuming that the QR reader library returns raw binary data, and that it is your job to convert it to whatever form you want.
If you want automated creation and transmission, see
gre/qrloop: Encode a big binary blob to a loop of QR codes
maxg0/displaysocket.js: DisplaySocket.js - a JavaScript library for sending data from one device to another via QR codes using only a display and a camera
Note - I haven't used either.
See also: How can I publish data from a private network without adding a bidirectional link to another network - Security StackExchange
Goal (General)
My ultimate (long-term) goal is to write an importer for a binary file format into another application.
Question Background
I am interested in two fields within a binary file format. One is
encrypted, and the other is compressed and possibly also encrypted
(See how I arrived at this conclusion here).
I have a viewer program (I'll call it viewer.exe) which can open these files for viewing. I'm hoping this can offer up some clues.
I will (soon) have a correlated deciphered output to compare and have values to search for.
This is the most relevant stackoverflow Q/A I have found
Question Specific
What is the best strategy given the resources I have to identify the algorithm being used?
Current Ideas
I realize that without the key, identifying the algo from just data is practically impossible
Having a file and a viewer.exe, I must have the key somewhere. Whether it's public, private, symmetric etc...that would be nice to figure out.
I would like to disassemble the viewer.exe using OllyDbg with the findcrypt plugin as a first step. I'm just not proficient enough in this kind of thing to accomplish it yet.
Resources
full example file
extracted binary from the field I am interested in
decrypted data - In this zip archive there is a binary list of floats representing x,y,z (model2.vertices) and a binary list of integers (model2.faces). I have also included an "stl" file which you can view with many free programs, but because of the weird way the data is stored in STLs, this is not what we expect to come out of the original file.
Progress
1. I disassembled the program with Olly, then did the only thing I know how to do at this point and "searched for all referenced text" after pausing the program right before it imports one of the files. Then I searched for strings like "crypt, hash, AES, encrypt, SHA, etc." I came up with a bunch of things, most notably "Blowfish64", which seems to go nicely with the fact that my data occasionally is 4 bytes too long (and since it is guaranteed to be mod 12 = 0), this to me looks like padding for a 64-bit block size (odd amounts of vertices result in non mod 8 amounts of bytes). I also found error messages like...
"Invalid data size, (Size-4) mod 8 must be 0"
After reading Igor's response below, here is the output from signsrch. I've updated this image with green dots, which cause no problems when replaced by int3; red, if the program can't start; and orange, if it fails when loading a file of interest. No dot means I haven't tested it yet.
Accessory Info
I'm using Windows 7 64-bit
viewer.exe is a win32 x86 application
The data is base64 encoded as well as encrypted
The deciphered data is groups of 12 bytes representing 3 floats (x,y,z coordinates)
I have OllyDbg v1.1 with the findcrypt plugin, but my usage is limited to following along with this guy's YouTube videos
Many encryption algorithms use very specific constants to initialize the encryption state. You can check if the binary has them with a program like signsrch. If you get any plausible hits, open the file in IDA and search for the constants (Alt-B (binary search) would help here), then follow cross-references to try and identify the key(s) used.
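The same idea fits in a few lines of Python; the constants below are ones I am reasonably confident about (the first Blowfish P-array word comes from the digits of pi, the AES S-box has a distinctive start, and 0x67452301 appears in both MD5 and SHA-1 initialization), but signsrch's database is far larger and better curated.

```python
import sys

# well-known constants as they appear in a little-endian x86 binary
SIGNATURES = {
    'Blowfish P-array start (0x243F6A88, digits of pi)': bytes.fromhex('886a3f24'),
    'AES S-box start (63 7c 77 7b f2 6b 6f c5)':         bytes.fromhex('637c777bf26b6fc5'),
    'MD5/SHA-1 init constant (0x67452301)':              bytes.fromhex('01234567'),
}

def scan(path):
    """Print the offsets of known crypto constants found in a binary."""
    blob = open(path, 'rb').read()
    for name, sig in SIGNATURES.items():
        pos = blob.find(sig)
        while pos != -1:
            print(f'{name} at offset 0x{pos:x}')
            pos = blob.find(sig, pos + 1)

if __name__ == '__main__':
    scan(sys.argv[1])   # e.g. python scan.py viewer.exe
```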
You can't differentiate good encryption (AES with XTS mode for example) from random data. It's not possible. Try using ent to compare /dev/urandom data and TrueCrypt volumes. There's no way to distinguish them from each other.
Edit: Re-reading your question. The best way to determine which symmetric algorithm, hash and mode is being used (when you have a decryption key) is to try them all. Brute-force the possible combinations and have some test to determine if you do successfully decrypt. This is how TrueCrypt mounts a volume. It does not know the algo beforehand so it tries all the possibilities and tests that the first few bytes decrypt to TRUE.
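A hedged sketch of that brute-force loop using pycryptodome; the cipher list and the plausibility test are my own choices, and since your deciphered data is 12-byte x,y,z float triplets, a "does this decode to sane floats" check stands in for TrueCrypt's check for the string TRUE.

```python
import math
import struct

from Crypto.Cipher import AES, Blowfish, DES3   # pycryptodome

def looks_plausible(plaintext: bytes) -> bool:
    """Placeholder test: do the first 12 bytes decode to three sane little-endian floats?
    Replace with a comparison against your known decrypted vertex data."""
    if len(plaintext) < 12:
        return False
    x, y, z = struct.unpack('<3f', plaintext[:12])
    return all(math.isfinite(v) and abs(v) < 1e6 for v in (x, y, z))

def try_everything(key: bytes, iv8: bytes, iv16: bytes, ciphertext: bytes):
    """Try a handful of cipher/mode combinations and report the plausible ones."""
    candidates = [
        ('Blowfish/ECB', lambda: Blowfish.new(key, Blowfish.MODE_ECB)),
        ('Blowfish/CBC', lambda: Blowfish.new(key, Blowfish.MODE_CBC, iv8)),
        ('AES/ECB',      lambda: AES.new(key.ljust(16, b'\0')[:16], AES.MODE_ECB)),
        ('AES/CBC',      lambda: AES.new(key.ljust(16, b'\0')[:16], AES.MODE_CBC, iv16)),
        ('3DES/CBC',     lambda: DES3.new(key.ljust(24, b'\0')[:24], DES3.MODE_CBC, iv8)),
    ]
    for name, make_cipher in candidates:
        try:
            plaintext = make_cipher().decrypt(ciphertext)
        except Exception:
            continue                    # wrong key/IV length, bad ciphertext size, etc.
        if looks_plausible(plaintext):
            print('plausible candidate:', name)
```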
Imagine that you had all the supercomputers in the world at your disposal for the next 10 years. Your task was to compress 10 full-length movies losslessly as much as possible. Another criterion was that a normal computer should be able to decompress them on the fly and should not need to spend much of its HD space to install the decompression software.
My question is, how much more compression could you achieve than the best alternatives today? 1%, 5%, 50%? More specifically: is there a theoretical limit to compression, given a fixed dictionary size (if it is called that for video compression as well)?
The limits of compression are dictated by the randomness of the source. Welcome to the study of information theory! See data compression.
There is a theoretical limit: I suggest reading this article on information theory and the pigeonhole principle. It seems to sum up the issue in a very easy-to-understand way.
If you have a fixed catalogue of all the movies you were ever going to compress, you could just send an id for the movie and have the "decompression" look up the data with that index. So compression could be to a fixed size of log2(N) bits, where N is the number of movies; for example, with N = 10 movies the id fits in ceil(log2(10)) = 4 bits.
I suspect the practical lower bound is rather higher than this.
Do you really mean lossless? Most of today's video compression is lossy, I thought.
With the latest developments in information theory it is important to restate these limits. Therefore, it is essential to state the hypotheses under which each limit is valid.
In information theory, three fundamental hypotheses are used, which are the following:
the information is defined by the entropy function H(X).
the information that identifies the source is known both by the encoder and by the decoder.
the source and its isomorphisms are considered, which means that we can decode one symbol at a time.
The first limit, the most famous, was defined by Shannon; for it, all three hypotheses hold.
N·H(X)
where H(X) is the entropy of the source X and N is the number of symbols emitted.
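As a quick numerical illustration of this bound (my example): a binary source with P(0)=0.9 and P(1)=0.1 has H(X) ≈ 0.469 bits per symbol, so N = 1000 symbols cannot be compressed below about 469 bits on average.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H = entropy([0.9, 0.1])
print(H)          # ~0.469 bits/symbol
print(1000 * H)   # ~469 bits: the N*H(X) lower bound for N = 1000 symbols
```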
The second limit: we remove the second hypothesis, so the decoder does not know the source.
N·H(X) + source information
The third limit: we remove the third hypothesis. In this case Set Shaping Theory (SST) is used, a new method that is revolutionizing information theory. This theory studies the one-to-one functions f that transform a set of strings into a set of equal size made up of strings of greater length. With this method, we get the following limit:
N₂·H(Y) + source information ≈ N·H(X)
with f(X) = Y and N₂ > N.
In practice, we obtain a gain in compression equivalent to the information necessary to describe the source. The information needed to describe the source represents the inefficiency of the entropy coding.
In this case, however, it is not possible to decode one symbol at a time (the code is not instantaneous); the message must be decoded in full before the original message is obtained.
Important progress has been made in this area. It was possible to apply this theory to a concrete case of data compression: "Practical applications of Set Shaping Theory in Huffman coding".
Another interesting aspect is that the authors shared the code and the function that performs the transform described in the set shaping theory. The file is shared on Matlab file exchange: https://www.mathworks.com/matlabcentral/fileexchange/115590-test-sst-huffman-coding?s_tid=FX_rc1_behav