Sorry if this is quite basic, I am new to DICOM.
I know a DICOM file has multiple parts like: Patient, Study, Series and Instance (Image).
Now, to communicate with a device, a Transfer Syntax is needed, which specifies how the data is encoded, e.g. Little Endian, Big Endian, JPEG Lossless, JPEG lossy, etc.
So, does each of the DICOM file parts (Patient, Study, Series and Instance (Image)) have its own transfer syntax? For example, could the Patient part be encoded as Little Endian while the Study uses JPEG Lossless or MPEG-4 (if it is a video)?
OR does the entire DICOM file just use one transfer syntax?
A single transfer syntax is used throughout the entire DICOM file (except for the first group, with ID 0002, which is always written with the Explicit VR Little Endian transfer syntax).
When sending DICOM messages over a network, you can have a different transfer syntax for each message: you define different Presentation Contexts during the association negotiation, and each Presentation Context can have a different Transfer Syntax.
After the association negotiation, you can transmit messages with different transfer syntaxes by selecting the proper presentation context identifier in the message header.
Your question does not entirely make sense with how DICOM is organized.
DICOM is composed of various SOP Classes. A SOP Class is a Service-Object Pair. Example services are the Storage Service Class (a service for network storage of objects, typically modality images) or the Media Storage Service Class (for writing files to media or just saving them to disk).
The Object portion of the SOP Class is defined by an IOD (Information Object Definition). IODs are defined by multiple Modules, and the Modules in turn are composed of DICOM tags. Each Module groups related tags together and is typically associated with an "Entity" in the DICOM model; a module might belong to the Patient, Series, or Image level of that model. The IOD consists of all the tags defined in the various Modules. When encoding the IOD, the context of the module the tags are defined in doesn't matter.
The DICOM Service defines how the tags within an IOD are encoded. Both a DICOM message for network transfer services (in its Group 0x0000 elements) and a DICOM file for media (in its Group 0x0002 elements) contain metadata that describes the encoding, plus a data set which contains the IOD tags. The Group 0x0000 elements in a DICOM message are always encoded in Implicit VR Little Endian, and the Group 0x0002 elements in a DICOM file are always encoded in Explicit VR Little Endian. The data set itself is always encoded in a single transfer syntax.
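For illustration, here is a rough sketch of my own (error handling omitted, not production code) of pulling the Transfer Syntax UID element (0002,0010) out of the file meta group, relying on the fact noted above that group 0x0002 is always Explicit VR Little Endian:

use std::fs;

// Reads the Transfer Syntax UID (0002,0010) from a DICOM Part 10 file.
fn transfer_syntax_uid(path: &str) -> Option<String> {
    let bytes = fs::read(path).ok()?;
    // 128-byte preamble followed by the "DICM" magic marker.
    if bytes.len() < 132 || &bytes[128..132] != b"DICM" {
        return None;
    }
    let mut pos = 132;
    while pos + 8 <= bytes.len() {
        let group = u16::from_le_bytes([bytes[pos], bytes[pos + 1]]);
        if group != 0x0002 {
            break; // end of the file meta group; the data set starts here
        }
        let element = u16::from_le_bytes([bytes[pos + 2], bytes[pos + 3]]);
        let vr = [bytes[pos + 4], bytes[pos + 5]];
        // Explicit VR: OB/OW/OF/SQ/UT/UN have a 2-byte reserved field and a
        // 4-byte length; all other VRs use a 2-byte length.
        let long_form = matches!(&vr, b"OB" | b"OW" | b"OF" | b"SQ" | b"UT" | b"UN");
        let (value_len, header_len) = if long_form {
            (u32::from_le_bytes([bytes[pos + 8], bytes[pos + 9], bytes[pos + 10], bytes[pos + 11]]) as usize, 12)
        } else {
            (u16::from_le_bytes([bytes[pos + 6], bytes[pos + 7]]) as usize, 8)
        };
        if element == 0x0010 {
            let value = &bytes[pos + header_len..pos + header_len + value_len];
            // UID values are ASCII, padded to even length with a trailing NUL.
            return Some(String::from_utf8_lossy(value).trim_end_matches('\0').to_string());
        }
        pos += header_len + value_len;
    }
    None
}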
Hope this helps a bit.
The BitTorrent protocol doesn't specify block (piece) size. This is left to the user. (I've seen different torrents for the same content with 3 or more different choices.)
I'm thinking of filing a BitTorrent Enhancement Proposal which needs to make a specific block size mandatory — both for the whole torrent, and also for individual files (for which BTv2 (BEP 52) specifies bs=16KiB).
The only thing I've found that's close is the rsync block size algorithm in Tridgell & Mackerras' technical paper. Their bs=300-1100 B (# bytes aren't powers of 2).
Torrents, however, usually use bs=64kB–16MB (# bytes are powers of 2, and much larger than rsync's) for the whole torrent (and, for BTv2, 16KiB for files).
The specified block size doesn't need to be a constant. It could be a function of thing-hashed size, of course (like it is in rsync). It could also be a function of file type; e.g. there might be some block sizes which are better for making partial video/archive/etc files more usable.
See also this analysis of BitTorrent as a block-aligned file system.
So…
What are optimal block sizes for a torrent, generic file, or partial usefulness of specific file types?
Where did the 16KiB bs in BEP 52 come from?
Block and piece size are not the same thing.
A piece is the unit that is hashed into the pieces string in v1 torrents, one hash per piece.
A block is a part of a piece that is requested via request (ID 6) and delivered via piece (ID 7) messages. These messages basically consist of a (piece number, offset, length) tuple, where the length is the block size. In this sense blocks are very ephemeral constructs in v1 torrents, but they are still important since downloading clients have to keep a lot of state about them in memory. Since the downloading client is in control of the request size, clients customarily use fixed 16KiB blocks, even though they could do this more flexibly. For an uploading client it does not really matter complexity-wise, as it simply has to serve the bytes covered by (piece, offset, length) and keep no further state.
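For reference, here is a minimal sketch of what such a request message looks like on the wire in the v1 peer protocol: a 4-byte big-endian length prefix, the message ID 6, then the piece index, the offset within the piece, and the block length, each as a 32-bit big-endian integer.

const BLOCK_SIZE: u32 = 16 * 1024; // the customary 16 KiB block

// Builds a v1 "request" message: <len=0013><id=6><index><begin><length>.
fn request_message(piece_index: u32, offset_in_piece: u32, block_len: u32) -> [u8; 17] {
    let mut msg = [0u8; 17];
    msg[0..4].copy_from_slice(&13u32.to_be_bytes()); // payload length: 1 + 4 + 4 + 4
    msg[4] = 6; // message ID 6 = request
    msg[5..9].copy_from_slice(&piece_index.to_be_bytes());
    msg[9..13].copy_from_slice(&offset_in_piece.to_be_bytes());
    msg[13..17].copy_from_slice(&block_len.to_be_bytes());
    msg
}

fn main() {
    // Ask for the third 16 KiB block of piece 42.
    let msg = request_message(42, 2 * BLOCK_SIZE, BLOCK_SIZE);
    println!("{msg:02x?}");
}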
Since clients generally implement an upper message size limit to avoid DoS attacks, 16KiB is also the recommended upper bound. Specialized implementations could use larger blocks, but for public torrents that doesn't really happen.
For v2 torrents the picture changes a bit. There are now three concepts:
the ephemeral blocks sent via messages
the pieces (now representing some layer in the merkle tree), needed for v1 compatibility in hybrid torrents and also stored as piece layers outside the info dictionary to allow partial file resume
the leaf blocks of the merkle tree
The first type is essentially unchanged compared to v1 torrents, but the incentive to use 16KiB-sized blocks is much stronger now because that is also the leaf block size of the merkle tree.
The piece size must now be a power of two and a multiple of 16KiB; this constraint did not exist in v1 torrents.
The leaf block size is fixed at 16KiB. It is relevant when constructing the merkle tree and when exchanging message IDs 21 (hash request) and 22 (hashes).
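To make the leaf-block idea concrete, here is a rough sketch of hashing a buffer in 16 KiB leaves and folding them pairwise into a root. This is my own simplified illustration (assuming the sha2 crate), not the exact BEP 52 construction, which defines its own padding rules for the final piece.

use sha2::{Digest, Sha256}; // assumes the sha2 crate is available

const LEAF_SIZE: usize = 16 * 1024; // 16 KiB leaf blocks

// SHA-256 of every 16 KiB leaf block of the buffer.
fn leaf_hashes(data: &[u8]) -> Vec<Vec<u8>> {
    data.chunks(LEAF_SIZE)
        .map(|chunk| Sha256::digest(chunk).to_vec())
        .collect()
}

// Fold a layer of hashes pairwise until a single root remains. Simplified:
// odd layers duplicate their last hash instead of using BEP 52's zero padding.
fn merkle_root(mut layer: Vec<Vec<u8>>) -> Vec<u8> {
    assert!(!layer.is_empty());
    while layer.len() > 1 {
        if layer.len() % 2 == 1 {
            layer.push(layer.last().unwrap().clone());
        }
        layer = layer
            .chunks(2)
            .map(|pair| {
                let mut h = Sha256::new();
                h.update(&pair[0]);
                h.update(&pair[1]);
                h.finalize().to_vec()
            })
            .collect();
    }
    layer.remove(0)
}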
What are optimal block sizes for a torrent, generic file, or partial usefulness of specific file types?
For a v1 torrent the piece size combined with the file sizes determines a lower bound on the metadata (aka .torrent file) size. Each piece must be stored as a 20-byte hash in pieces, thus larger pieces result in fewer hashes and smaller .torrent files. For terabyte-scale torrents a 16KiB piece size results in a roughly 1.3GB torrent file (1 TiB at 16 KiB per piece is 2^26 pieces, each contributing a 20-byte hash), which is unacceptable for most use-cases.
For a v2 torrent it would result in similarly sized piece layers in the root dictionary. Or, if a client does not have the piece layers data available (e.g. because it started the download via infohash), it will have to retrieve the data via hash request messages instead, ultimately resulting in the same overhead, albeit more spread out over the course of the download.
Where did the 16KiB bs in BEP 52 come from?
16KiB was already the de-facto block size for most clients. Since a merkle tree must be calculated from some leaf hashes, a fixed block size for those leaves had to be defined. Hence the established messaging block size was also chosen for the merkle tree blocks.
The only thing I've found that's close is the rsync block size algorithm in Tridgell & Mackerras' technical paper. Their bs=300-1100 B (# bytes aren't powers of 2).
rsync uses a rolling hash for content-aware chunking rather than fixed-size blocks, and that is the primary driver for its chunk-size choices. So rsync considerations do not apply to BitTorrent.
In game development, it is common to write "baked" assets that are ready to be used by reading their data into memory and using a relocation table to adjust all pointers by the base offset of the buffer used to cache the asset in memory, then directly dereferencing a pointer into that buffer as a particular type. It is similar to relocatable addresses for executable code. This allows for zero-copy reads at runtime, which is significantly faster than parsing the asset into an in-memory format (e.g. parsing JSON to a struct with serde). For example, Halo: Combat Evolved loads all .map data in retail builds to the virtual address 0x40440000 and all pointers in the data on disk are already offset by that base address.
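As a minimal sketch of that relocation pass (the asset layout, field names and helper below are hypothetical, not taken from any particular engine), ignoring for a moment the safety questions discussed next: on disk every pointer field holds an offset from the start of the blob, a relocation table lists where those pointer fields live, and loading means adding the blob's base address to each one.

#[repr(C)]
struct AssetHeader {
    name: u64, // on disk: offset into the blob; after relocation: absolute address
    data: u64, // same
}

/// `reloc_table` holds the byte offset of every pointer field inside `blob`.
unsafe fn relocate(blob: &mut [u8], reloc_table: &[usize]) {
    let base = blob.as_ptr() as u64;
    for &field_offset in reloc_table {
        // The 8-byte pointer field must lie entirely inside the allocation.
        assert!(field_offset + 8 <= blob.len());
        let field = blob.as_mut_ptr().add(field_offset) as *mut u64;
        let stored_offset = field.read_unaligned();
        // The target the pointer refers to must also be inside the allocation.
        assert!((stored_offset as usize) < blob.len());
        field.write_unaligned(base + stored_offset);
    }
}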
While this is somewhat straightforward in C and C++, Rust's safety rules present a challenge: not every bit pattern is a valid value for every type, and there are stricter alignment requirements for structures. Given the following constraints:
All structures represented in the buffer are repr(C), all bit patterns for their non-pointer fields are valid representations, and they do not necessarily have alignment requirements (à la the zerocopy crate's FromBytes and Unaligned traits)
Magic fields can be used and checked to ensure the validity of the referenced structures from a higher struct
The pointer to the data is wrapped with std::pin::Pin to prevent the data from being moved
Access to the relocated data does not necessarily have to be safe Rust (e.g. dereferencing unchecked pointers), but the pointers themselves can be verified via the relocatable offsets accompanying the asset data
How could I implement offset relocation within the loaded asset's buffer, to ensure that the internal pointers of the root structure and all its values are within the bounds of the allocation?
I'm very new to the DICOM protocol, and I have a question about the "Transfer Syntax" that needs to be chosen before sending images.
I have a list of images that I want to send to a remote server. Images in that list can be in one of the following formats: CR, CT, DOC, DX, ES, KO, MG, MR, NM, OT, PR, PT, RF, SC, US, XA.
So I was wondering if there is some list where I can see which transfer syntax corresponds to which DICOM format? I can take my DICOM images and determine their format from the list above, but I'm not sure what transfer syntax to use for each of them.
This is an example, when I'm hard-coding for one image:
DicomDataSet ct = new DicomDataSet("CT.dcm");
DicomDataSetCollection instancesToSend = new DicomDataSetCollection();
instancesToSend.Add(ct);
DicomAssociation connection = new DicomAssociation();
// "Send CT in Implicit VR Little endian format"
connection.RequestedContexts.Add(ct.SOPClass, "1.2.840.10008.1.2");
connection.Open("remote host", 104, "client", "server");
connection.SendInstances(instancesToSend);
connection.Close();
As I said, I have list of images. I can take each of them in a loop, but how can I know which transfer syntax to use for each DICOM image?
As long as you only want to create and send images, a reasonable decision is to support Implicit VR Little Endian only. It is the default Transfer Syntax in DICOM - each system that claims to be DICOM conformant must support it.
It is going to become much more complicated when you want to apply lossy compression or need to receive objects.
I have a DICOM C-Store SCP application which receives DICOM images from my other C-Store SCU application. My SCU always sends one (and only one) complete study (all images from a given study) on one association. So the SCP always knows that all images received from the SCU belong to a single study. I know I can also check the StudyIUID, but that is not my point of interest here.
I want to know the total number of images in the study being transferred. Using this data, I want to display a status like "Received 3 of 10 images..." on screen. I can count the images received (3 in this case), but how can I know the total number of images (10 in this case) in the study being transferred?
Workaround:
On receiving the first C-Store request on the SCP, I should read the StudyIUID and establish a new association with the SCU (the SCU would also have to support Q\R SCP capabilities in this case) for Q\R, and get the total count of images in the study using C-FIND.
Limitations:
The SCU should also support Q\R SCP features.
The SCU must include the image count in its C-FIND response.
The SCU should always send all images from only one study on one association.
I can easily overcome these limitations if I write the SCU (with Q\R SCP capabilities) myself. But my SCP also receives images from third-party SCUs that may not implement the necessary features.
Please suggest if there is any DICOM-compatible solution.
Is this possible using MPPS? I have not worked on the MPPS part of DICOM yet.
Conclusion:
The accepted answer (kritzel_sw) suggests a very good solution (using MPPS) with only one drawback: MPPS is not a mandatory service for every SCU. MPPS applies only to SCUs that actually acquire the images, i.e. modalities. Even then, not all modalities support MPPS out of the box; the feature may need to be unlocked at additional license cost and configuration effort. Also, there are lots of scenarios where modalities push instances to some intermediate workstation, and the workstation further pushes them to the SCP.
Maybe I need to look into a combined DICOM + non-DICOM solution.
Good question, but no simple answer.
Expecting a Storage SCU to support the C-FIND-SCP as well is not going to work well in practice unless you are referring to archive servers / VNAs.
MPPS is not a bad idea. All attributes (Study, Series, SOP Instance UID) you need are mandatory, so it should be valid to rely on them. "Should" because I have seen vendors violating these constraints.
However, how can you be sure that the SCU has received the complete study? Maybe the study consists of CT and MR series, but the SCU sending the images to you only conforms to CT and refuses to receive the MR series.
You might want to consider the Instance Availability Notification service which is another service class with which information about "who has got which image" can be made available to other systems. Actually this would exactly do what you need, because you know in advance for each AET ("device") which images are available there. But this service is not widely supported in practice.
Even if you really know which images are available on the system that is sending the study to you - how can you be sure that there is no user sitting in front of it who has just selected a subset of the study for sending?
Sorry, that I cannot provide a "real solution" to you but for the reasons I have mentioned above, I am not aware of any real-world system which supports the functionality (progress bar) you are describing.
I am using XBee Digimesh Modules in API-Mode to send data between different industrial machines allowing them to share data, information and commands.
The API-Mode offers some basic commands, mainly to perform addressing and talk with the XBee Module itself in order to do configuration, etc.
Sending user data is done via a corresponding XBee API command which allows sending user-defined data with a maximum payload of 72 bytes.
Since I want to expand this communication to allow integration of more machines, etc., I am thinking about how to implement a basic communication system that's tailored to the super small payload of just 72 bytes.
Coming from the web, I normally would use some sort of JSON here but that would fill up the payload very quickly.
Also it's not possible to send a frame with lots of information, since this also fills up the payload very quickly.
So I came up with a different way of communicating. Instead of transmitting frames packed with information, what about sending short messages like this:
Machine-A Broadcasts: Who's there?
Machine-B Answers: It's me I am a xxx-Machine
Machine-C Answers: It's me I am a xxx-Machine
Machine-A now evaluates the replies and decides to work with Machine-B (because Machine-C does not match A's interface):
Machine-A to B: Hello B, Give me some Value, please!
Machine-B to A: There you go: 2.349590
This can be extended to different short messages. After each message, the sender keeps the type of message it sent as state, and the reply is evaluated in relation to that state / context.
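A rough sketch of that idea (all names and message types below are made up for illustration): each request is a small enum variant, and the sender remembers what it last asked so the next reply can be interpreted in that context.

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Message {
    WhoIsThere,                         // broadcast discovery
    IAm { machine_type: u8 },           // discovery reply
    GiveValue { value_id: u8 },         // request a single reading
    Value { value_id: u8, value: f32 }, // reading reply
}

#[allow(dead_code)]
enum AwaitingReply {
    Nothing,
    Discovery,
    Value { value_id: u8 },
}

fn handle_reply(state: &AwaitingReply, msg: Message) {
    match (state, msg) {
        (AwaitingReply::Discovery, Message::IAm { machine_type }) => {
            println!("discovered a machine of type {machine_type}");
        }
        (AwaitingReply::Value { value_id }, Message::Value { value_id: id, value })
            if *value_id == id =>
        {
            println!("value {id} = {value}");
        }
        _ => println!("reply does not match what we are waiting for"),
    }
}

fn main() {
    // Machine-A asked B for value 7 and is now waiting for the answer.
    let state = AwaitingReply::Value { value_id: 7 };
    handle_reply(&state, Message::Value { value_id: 7, value: 2.349590 });
}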
What I was trying to avoid was defining a bit-based protocol (like MIDI) which defines all events as bit-based flags. Since we do not know what type of hardware will be added in the future, I want a communication protocol that's very flexible and does not need a coordinator or message broker, etc.
But since this is the first time I am thinking about communication protocols, I am curious to know if there might be some existing frameworks that can handle complex communication on a light payload.
You might want to read through the ZigBee Cluster Library specification with a focus on the general commands. It describes a system of attribute discovery and retrieval. Each attribute has a 16-bit ID and a datatype (integers of various sizes, enumerated types, bitmaps) that determines its size.
It's a protocol designed for the small payloads of an 802.15.4 network, and you could potentially base your protocol on a subset of it. Other ZigBee specifications are simply a list of defined attributes (and commands) for a given 16-bit cluster ID.
Your master device can go through a discovery process to get a list of attribute IDs, and then send a request to get values for multiple IDs in one shot. The response will be packed tight with a 16-bit ID, 8-bit attribute type and then variable length data. Even if your master device doesn't know what the ID corresponds to, it can pass the data along to other systems (like a web server) that do know.
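As a rough sketch of how tightly such attribute records pack (simplified, not the full ZCL frame format; the attribute IDs and type codes below are made up for illustration), each record is just a 16-bit ID, an 8-bit type code, and the value bytes:

// Hypothetical type codes for this sketch only.
const TYPE_U16: u8 = 0x21;
const TYPE_F32: u8 = 0x39;

fn push_u16_attr(buf: &mut Vec<u8>, id: u16, value: u16) {
    buf.extend_from_slice(&id.to_le_bytes()); // 16-bit attribute ID
    buf.push(TYPE_U16);                       // 8-bit type code
    buf.extend_from_slice(&value.to_le_bytes());
}

fn push_f32_attr(buf: &mut Vec<u8>, id: u16, value: f32) {
    buf.extend_from_slice(&id.to_le_bytes());
    buf.push(TYPE_F32);
    buf.extend_from_slice(&value.to_le_bytes());
}

fn main() {
    let mut payload = Vec::new();
    push_u16_attr(&mut payload, 0x0001, 1500);     // e.g. a spindle speed
    push_f32_attr(&mut payload, 0x0002, 2.349590); // e.g. the value Machine-B reported
    // Two attributes in 12 bytes, far below the 72-byte payload limit.
    assert_eq!(payload.len(), 5 + 7);
    println!("{payload:02x?}");
}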