What is the structure of the Presentation-Data-Value in P-Data-TF?

I found this for the Presentation Data Value:
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-dataset | **Some Message**
But I couldn't find the **Some Message** part. I suppose this part includes C-FIND, C-GET, etc. How can I learn its structure?

Where did you get this from? In fact, it is a bit different.
Your example should read
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-data*set* | **Some Message**
So the "command-or-dataset" flag indicates whether the following bytes encode a command (as defined in PS3.7) or a dataset (as defined in PS3.3 and PS3.4, respectively).
E.g. for DICOM queries, there is a C-FIND command defined in PS3.7, Chapter 9.1.2.1. In C-FIND, the query criteria are part of the command (the "Identifier" in table 9.1-2). How the Identifier is formed and what its semantics are is the subject of the Query/Retrieve Service Class as defined in PS3.4, C.4.1.
For transferring objects, there is a C-STORE command, also defined in PS3.7 (Chapter 9.1.1.1). A Data Set is also part of the C-STORE message, and its contents depend on the type of data (SOP Class). This is referred to as an Information Object Definition (IOD) and defined in PS3.3. The protocol for Storage is also defined in PS3.4 (Annex B).
However, the length limitation of the PDV will usually not allow the whole object to be encoded in a single PDV, so it needs to be split. For the following PDVs, no command set will be present, only a fragment of the dataset. In this case, the "command-or-dataset" bit must be set to 0.
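To make that header byte concrete, here is a small Python sketch (the function name is my own, not from any DICOM toolkit) of how the one-byte message control header at the start of each PDV can be read:

def parse_message_control_header(header_byte):
    # bit 0: 1 = the fragment belongs to a command set, 0 = to a data set
    is_command = bool(header_byte & 0x01)
    # bit 1: 1 = this is the last fragment, 0 = more fragments follow
    is_last = bool(header_byte & 0x02)
    return {"command": is_command, "last_fragment": is_last}

# 0x03 -> last fragment of a command set; 0x00 -> non-final data set fragment
print(parse_message_control_header(0x03))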
I hope this makes it a bit clearer. When starting out with DICOM, it is difficult to keep track of all the terms and their interrelationships.
Encoding
Logically, command sets and datasets are encoded in the same way. The data dictionary (Part 6) is a complete list of all possible attributes, and the major difference between command and dataset attributes is that command attributes have group number 0, while dataset attributes have any even group number other than 0.
For each attribute, the data dictionary gives you the Value Representation (VR) which needs to be considered for encoding the value, e.g. "PN" for Patient Name, "UI" for Unique Identifier, and so forth. The VRs are defined in PS3.5, Chapter 6.2.
The encoding of attributes is then
group | element | (VR) | length (always even) | value
How this is transformed to the binary level depends on the Transfer Syntax (TS) that was agreed for the service during association negotiation. For this reason, "VR" is enclosed in brackets above: whether it must or must not be present depends on whether the negotiated TS is implicit or explicit VR.
There are some more things to consider (endianness, sequence encoding) when encoding command sets or datasets in binary form. Basically, everything about this is described in various chapters of PS3.5.
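As a rough illustration only (a real toolkit handles this for you), a single attribute could be encoded in Explicit VR Little Endian like this in Python; the helper name is made up:

import struct

def encode_attribute(group, element, vr, value):
    # pad to even length; the padding byte depends on the VR (space for PN, NUL for UI)
    if len(value) % 2:
        value += b" " if vr == "PN" else b"\x00"
    # group | element | VR | 2-byte length | value (short-form VRs only)
    return struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value)) + value

# Patient's Name (0010,0010), VR "PN"
print(encode_attribute(0x0010, 0x0010, "PN", b"DOE^JOHN").hex())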

Related

Issue with Setting the Command Set in N_CREATE Response, pydicom

I'm working on an MPPS SCP as described here: MPPS SCP as a basic framework.
I've been able to test it a bit with DVTk, with some tools that are available here: DVTk
Most of it seems to be working correctly, but the issue I seem to be having is that the response is supposed to have tags with group 0000 returned in the "Command Set" and not in the returned DataSet itself. I actually did set them in the DataSet just to verify that I am getting the correct values, e.g.:
python_mpps_1 | (0000, 0000) Command Group Length ????
python_mpps_1 | (0000, 0002) Affected SOP Class UID UI: Modality Performed Procedure Step SOP Class
python_mpps_1 | (0000, 0100) Command Field US: 33088
python_mpps_1 | (0000, 0120) Message ID Being Responded To US: 2
python_mpps_1 | (0000, 0800) Command Data Set Type US: 0
python_mpps_1 | (0000, 0900) Status US: 0
python_mpps_1 | (0008, 0016) SOP Class UID UI: Modality Performed Procedure Step SOP Class
I'm not exactly sure what Command Group Length, Command Field and Command Data Set Type should be, but more importantly, I do not know how to set them appropriately. I do not think they should be set in the Dataset, but rather be part of the Command Set object for the N_CREATE response:
# 'N-CREATE-RSP': (
# 'CommandGroupLength', 'AffectedSOPClassUID', 'CommandField',
# 'MessageIDBeingRespondedTo', 'CommandDataSetType', 'Status',
# 'AffectedSOPInstanceUID',
# 'ErrorID', 'ErrorComment'
# ),
Using DVTk as a testing tool, with the MPPS.SCU script from their example scripts, everything seems to work except for the Command Set values not being sent in the response. After a little digging, I think those have to be set in another way, but I am not sure how.
The pynetdicom documentation might have some further info about that (first link), but I've been unable to find it.
The Command Group Length (0000,0000) is the number of bytes of the binary-encoded command elements that follow it in the Command Set. This should usually be set by the toolkit that you are using (see the comment from Scaramillion).
Your command type is an N-CREATE response, and usually this comes without any dataset. Not knowing the DVT script, I assume that it does not expect a dataset to be attached to the command set.
I.e. SOP Class UID (0008,0016) should not be present (it is already part of the Command Set as Affected SOP Class UID (0000,0002)), and Command Data Set Type (0000,0800) should be set to 0x0101 to indicate that no data set follows the command set.
At least this applies to a successful N-CREATE operation.
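For orientation, here is roughly what those Command Set elements look like when built by hand with pydicom; the instance UID is a placeholder, and with pynetdicom you would normally just return the status from your N-CREATE handler and let the library assemble the Command Set (including the group length) for you:

from pydicom.dataset import Dataset

cmd = Dataset()
cmd.AffectedSOPClassUID = "1.2.840.10008.3.1.2.3.3"  # Modality Performed Procedure Step SOP Class
cmd.CommandField = 0x8140                            # N-CREATE-RSP (shows up as 33088 decimal)
cmd.MessageIDBeingRespondedTo = 2
cmd.CommandDataSetType = 0x0101                      # no data set follows the command set
cmd.Status = 0x0000                                  # success
cmd.AffectedSOPInstanceUID = "1.2.3.4"               # placeholder: UID of the created MPPS instance
# Command Group Length (0000,0000) is computed when the command set is encoded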

Interpreting and converting zone file data

I have a zone for a single TLD. I am trying to process the file data and convert it into JSON for other services that use this data. Here are the first few lines of the file I have:
com. 900 in soa a.gtld-servers.net. nstld.verisign-grs.com. 1612915221 1800 900 604800 86400
0-------------------------------------------------------------0.com. 172800 in ns ns1.domainit.com.
0-------------------------------------------------------------0.com. 172800 in ns ns2.domainit.com.
0-------------------------------------------------------------5.com. 172800 in ns fns.frogsmart.net.
0-------------------------------------------------------------5.com. 172800 in ns sns.frogsmart.net.
0-------------------------------------------------------------5.com. 172800 in ns tns.frogsmart.net.
Now I am not sure how to interpret this file's data. I have looked at reference and example zone files in multiple places, but they do not resemble this format. One of the references can be found here. I just need some pointers on how to interpret each line. My understanding is the following:
The first value is the domain name
The next value is a number which, if I use the first line as a header, seems to be 900 (not sure what it is)
The next value is in (not sure what this is)
The next value is soa, or ns on the other lines (I think this means the Start of Authority for the domain is with the name server)
Lastly, the name server which, if I use the first line as a header, seems to be a.gtld-servers.net (I think this is the primary SOA address)
Then there are the other properties (the first line seems to indicate 10 properties), but these are not present on the remaining lines of the file I am trying to process. That's all I could figure out so far, and some help will be greatly appreciated.
First, a warning: zone files can be big, especially the .com one, and when converting that to JSON, especially if you intend to fully build the object in memory before using it, you might run into trouble.
So you should start by asking yourself whether you really need all the data (for example, as seen below, what will you do with the SOA content?) and whether JSON is the most adequate representation, especially if not produced in a streaming way.
DNS data is explained in RFC 1034+1035.
More specifically, §3.3.13 in RFC 1035:
3.3.13. SOA RDATA format
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/                     MNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/                     RNAME                     /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    SERIAL                     |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    REFRESH                    |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     RETRY                     |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    EXPIRE                     |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    MINIMUM                    |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where:
MNAME    The <domain-name> of the name server that was the original or primary source of data for this zone.
RNAME    A <domain-name> which specifies the mailbox of the person responsible for this zone.
SERIAL   The unsigned 32 bit version number of the original copy of the zone. Zone transfers preserve this value. This value wraps and should be compared using sequence space arithmetic.
REFRESH  A 32 bit time interval before the zone should be refreshed.
RETRY    A 32 bit time interval that should elapse before a failed refresh should be retried.
EXPIRE   A 32 bit time value that specifies the upper limit on the time interval that can elapse before the zone is no longer authoritative.
MINIMUM  The unsigned 32 bit minimum TTL field that should be exported with any RR from this zone.
But do note that the semantics have changed in later RFCs: the MINIMUM is now called the NEGATIVE TTL.
Also, IN (case not significant) means INternet, but all records will have that; consider it a leftover of past DNS experiments around classes that never worked.
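If your file really is in the simple one-record-per-line form shown above (owner, TTL, class, type, rdata), a streaming sketch along these lines is one way to emit JSON without holding everything in memory. This is Python, the field names are my own choice, and it deliberately ignores full zone-file syntax ($ORIGIN/$TTL directives, parentheses, omitted fields):

import json, sys

def parse_line(line):
    owner, ttl, rrclass, rrtype, *rdata = line.split()
    rec = {"owner": owner, "ttl": int(ttl), "class": rrclass.upper(), "type": rrtype.upper()}
    if rec["type"] == "SOA":
        mname, rname, serial, refresh, retry, expire, minimum = rdata
        rec["rdata"] = {"mname": mname, "rname": rname, "serial": int(serial),
                        "refresh": int(refresh), "retry": int(retry),
                        "expire": int(expire), "minimum": int(minimum)}
    else:  # e.g. NS records: the rdata is just the target name
        rec["rdata"] = " ".join(rdata)
    return rec

for line in sys.stdin:
    if line.strip():
        print(json.dumps(parse_line(line)))  # one JSON object per input line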

How do I flatten a large, complex, deeply nested JSON file into multiple CSV files with a linking identifier?

I have a complex JSON file (~8GB) containing publicly available data for businesses. We have decided to split the file up into multiple CSV files (or tabs in a .xlsx) so clients can easily consume the data. These files will be linked by the NZBN column/key.
I'm using R and jsonlite to read a small sample in (before scaling up to the full file). I'm guessing I need some way to specify which keys/columns go into each file (i.e., the first file will have the headers australianBusinessNumber, australianCompanyNumber, australianServiceAddress; the second file will have the headers annualReturnFilingMonth, annualReturnLastFiled, countryOfOrigin...)
Here's a sample of two businesses/entities (I've bunged some of the data as well so ignore the actual values): test file
I've read almost every post on SO with similar questions and none seem to give me any luck. I've tried variations of purrr, *apply commands, custom flattening functions and jqr (an R version of 'jq'; looks promising but I can't seem to get it running).
Here's an attempt at creating my separate files, but I'm unsure how to include the linking identifier (NZBN), and I keep running into further nested lists (I'm unsure how many levels of nesting there are):
bulk <- jsonlite::fromJSON("bd_test.json")
coreEntity <- data.frame(bulk$companies)
coreEntity <- coreEntity[,sapply(coreEntity, is.list)==FALSE]
company <- bulk$companies$entity$company
company <- purrr::reduce(company, dplyr::bind_rows)
shareholding <- company$shareholding
shareholding <- purrr::reduce(shareholding, dplyr::bind_rows)
shareAllocation <- shareholding$shareAllocation
shareAllocation <- purrr::reduce(shareAllocation, dplyr::bind_rows)
I'm not sure if it's easier to split the files up during the flattening/wrangling process, or just completely flatten the whole file so I have one line per business/entity (and then gather columns as needed) - my only concern is that I need to scale this up to ~1.3 million nodes (an 8GB JSON file).
Ideally I would want the csv files split every time there is a new collection, and the values in the collection would become the columns for the new csv/tab.
Any help or tips would be much appreciated.
------- UPDATE ------
Updated, as my question was a little vague. I think all I need is some code to produce one of the CSVs/tabs, and I can replicate it for the other collections.
Say for example, I wanted to create a csv of the following elements:
entityName (unique linking identifier)
nzbn (unique linking identifier)
emailAddress__uniqueIdentifier
emailAddress__emailAddress
emailAddress__emailPurpose
emailAddress__emailPurposeDescription
emailAddress__startDate
How would I go about that?
i'm unsure how many levels of nesting there are
This will provide an answer to that quite efficiently:
jq '
def max(s): reduce s as $s (null;
if . == null then $s elif $s > . then $s else . end);
max(paths|length)' input.json
(With the test file, the answer is 14.)
To get an overall view (schema) of the data, you could run:
jq 'include "schema"; schema' input.json
where schema.jq is available at this gist. This will produce a structural schema.
"Say for example, I wanted to create a csv of the following elements:"
Here's a jq solution, apart from the headers:
.companies.entity[]
| [.entityName, .nzbn]
+ (.emailAddress[] | [.uniqueIdentifier, .emailAddress, .emailPurpose, .emailPurposeDescription, .startDate])
| @csv
shareholding
The shareholding data is complex, so in the following I've used the to_table function defined elsewhere on this page.
The sample data does not include a "company name" field so in the following, I've added a 0-based "company index" field:
.companies.entity[]
| [.entityName, .nzbn] as $ix
| .company
| range(0;length) as $cix
| .[$cix]
| $ix + [$cix] + (.shareholding[] | to_table(false))
jqr
The above solutions use the standalone jq executable, but all going well, it should be trivial to use the same filters with jqr, though to use jq's include, it might be simplest to specify the path explicitly, as for example:
include "schema" {search: "~/.jq"};
If the input JSON is sufficiently regular, you might find the following flattening function helpful, especially as it can emit a header in the form of an array of strings based on the "paths" to the leaf elements of the input, which can be arbitrarily nested:
# to_table produces a flat array.
# If hdr == true, then ONLY emit a header line (in prettified form, i.e. as an array of strings);
# if hdr is an array, it should be the prettified form and is used to check consistency.
def to_table(hdr):
  def prettify: map( (map(tostring)|join(":") ));
  def composite: type == "object" or type == "array";
  def check:
    select(hdr|type == "array")
    | if prettify == hdr then empty
      else error("expected head is \(hdr) but imputed header is \(.)")
      end;
  . as $in
  | [paths(composite|not)]   # the paths in array-of-array form
  | if hdr==true then prettify
    else check, map(. as $p | $in | getpath($p))
    end;
For example, to produce the desired table (without headers) for .emailAddress, one could write:
.companies.entity[]
| [.entityName, .nzbn] as $ix
| $ix + (.emailAddress[] | to_table(false))
| @tsv
(Adding the headers and checking for consistency are left as an exercise for now, but are dealt with below.)
Generating multiple files
More interestingly, you could select the level you want, and produce multiple tables automagically. One way to partition the output into separate files efficiently would be to use awk. For example, you could pipe the output obtained using this jq filter:
["entityName", "nzbn"] as $common
| .companies.entity[]
| [.entityName, .nzbn] as $ix
| (to_entries[] | select(.value | type == "array") | .key) as $key
| ($ix + [$key] | join("-")) as $filename
| (.[$key][0]|to_table(true)) as $header
# First emit the line giving all the headers:
| $filename, ($common + $header | @tsv),
# Then emit the rows of the table:
(.[$key][]
| ($filename, ($ix + to_table(false) | @tsv)))
to
awk -F\\t 'fn {print >> fn; fn=0;next} {fn=$1".tsv"}'
This will produce headers in each file; if you want consistency checking, change to_table(false) to to_table($header).
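Purely as an illustration of the flattening logic (outside both R and jq), here is a short Python sketch for the emailAddress table only. Note that json.load pulls the whole file into memory, so for the full 8 GB file you would want a streaming parser such as ijson, or the jq approach above; the key names are the ones used in the question:

import csv, json

with open("bd_test.json") as f:
    bulk = json.load(f)

cols = ["uniqueIdentifier", "emailAddress", "emailPurpose",
        "emailPurposeDescription", "startDate"]
with open("emailAddress.csv", "w", newline="") as out:
    w = csv.writer(out)
    w.writerow(["entityName", "nzbn"] + cols)          # header row
    for entity in bulk["companies"]["entity"]:
        for email in entity.get("emailAddress", []):    # one row per email address
            w.writerow([entity.get("entityName"), entity.get("nzbn")] +
                       [email.get(c) for c in cols])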

Google Sheets FILTER() and QUERY() not working with SUM()

I'm trying to pull and sum data from one sheet on another. This is GA data being built into a report, so I have sessions split up by landing page and device type, and would like to group them in different ways.
I usually use FILTER() for this sort of thing, but it keeps returning a 0 sum. Thinking this may be an odd edge case with FILTER(), I switched to using QUERY() instead. That gave me an error, but a Google search doesn't offer much documentation about what the error actually means. Taking a guess that it could be indicating an issue with the data type (i.e. not numeric), I changed the format of the source from "Automatic" to "Number", but to no avail.
Maybe it's a lack of coffee, but I'm at a loss as to why neither function works for a simple lookup and sum by criteria.
FILTER() function
SUM(FILTER(AllData!C:C,AllData!A:A="/chestnut/",AllData!B:B="desktop"))
No error, but returns 0 regardless of filter parameters.
QUERY() function
QUERY(AllData!A:G, "SELECT SUM(C) WHERE A='/chestnut/' AND B='desktop'",1)
Error returned:
Unable to parse query string for Function QUERY parameter 2: AVG_SUM_ONLY_NUMERIC
Sample data:
landingPage | deviceCategory | sessions
-------------|----------------|----------
/chestnut/ | desktop | 4
/chestnut/ | desktop | 2
/chestnut/ | tablet | 5
/chestnut/ | tablet | 1
/maple/ | desktop | 1
/maple/ | desktop | 2
/maple/ | mobile | 3
/maple/ | mobile | 1
I think the summing doesn't work because your numbers are text formatted.
See if any of these work (change the ranges to suit):
using FILTER()
=SUM(FILTER(VALUE(AllData!C:C),AllData!A:A="/chestnut/",AllData!B:B="desktop"))
using QUERY()
=ArrayFormula(QUERY({AllData!A:B, VALUE(AllData!C:C)}, "SELECT SUM(Col3) WHERE Col1='/chestnut/' AND Col2='desktop' label SUM(Col3)''",1))
using SUMPRODUCT()
=SUMPRODUCT(VALUE(AllData!C2:C),AllData!A2:A="/chestnut/",AllData!B2:B="desktop")

TCP/IP communication from a Unix server to Pure Data

I am interested in TCP/IP communication from a Unix server to Pure Data. I have realized it using sockets on the Unix server side and [netclient] on the Pure Data side. I adapted the chat client tutorial for this (3.Networking > 10.chat_client.pd).
Now the problem is that the server streams the data out as a "string" message delimited with ";".
My question is, is there a way to send something other than string message to Pure Data, like byte-stream or serialized number stream? Can Pure Data receive such messages?
A string simply takes too many bytes to transfer; for example, the number "1024;" is already 5 bytes, while the same value as an integer is just 4 bytes.
UPDATE: For everyone who stumbles upon this post in search of the answer.
Apparently [netclient] on the Pure Data side cannot receive anything other than ;-delimited messages.
So the solution for the problem posed above:
My question is, is there a way to send something other than string message to Pure Data, like byte-stream or serialized number stream? Can Pure Data receive such messages?
The solution is to use [tcpclient]; it can receive byte-stream data.
Now my question is, how do I get four compact numbers to work with?
Now I have a series of bytes, at least in the correct order.
From my UNIX server I am sending a structure
typedef struct {
    int var_code;
    int sample_time;
    int hr;
    float hs;
} phy_data;
Sample data might be 2 1000000 51 2000.56
When received and printed in Pure Data I get output like this:
: 0 0 0 2 0 10 114 26 0 0 0 51 0 16 242 78
You can clearly spot the numbers 2 and 51; I guess the others are correct as well.
How can I get these numbers back into a usable format?
Maybe with some manipulation using [bytes2any] and [route], but I haven't been able to extract the data that way.
Here's an outline of what you have to do:
Repackage the byte list into small messages of the correct size for the various types.
Since all your elements are 4 bytes long, you simply repackage your list (or byte stream, as TCP/IP doesn't guarantee to deliver your 16 bytes as a single list, but could also decide to break it into lists of arbitrary length) into a number of 4-atom lists.
The most stable way would probably be to first serialize the list (check the "serializer" example in the [list] help) and then reassemble that list into 4-element chunks.
If you can use externals like zexy, you could use [repack 4] for that.
If you trust [netclient] to output your messages as complete lists, you could simply use a large [unpack ....] and four [pack]s.
Interpret the raw data for each sublist.
Integers are rather simple; floats are way more complicated.
integers:
|
[unpack 0 0 0 0]
| | | |
[<< 8] | | |
| | | |
[+ ] | |
| | |
[<< 8] | |
| | |
[+ ] |
| |
[<< 8] |
| |
[+ ]
|
floats are left as an exercise to the user :-)
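If you want to sanity-check the bytes outside of Pd, a quick Python sketch with the struct module does the same reassembly. This assumes 4-byte fields in network (big-endian) byte order; if your server sends native little-endian or adds struct padding, adjust the format string accordingly:

import struct

raw = bytes([0, 0, 0, 2, 0, 10, 114, 26, 0, 0, 0, 51, 0, 16, 242, 78])  # the bytes printed above
var_code, sample_time, hr, hs = struct.unpack(">iiif", raw)  # three big-endian ints + one float
print(var_code, sample_time, hr, hs)
# If hs looks wrong, check the sender's byte order and struct packing/padding.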
The real solution to your problem would be to use a well-defined application-layer protocol rather than brewing your own.
The most widespread protocol in use for applications like Pd is certainly OSC.
In order to decode the raw OSC bytes into Pd messages, use [unpackOSC] (part of the "mrpeach" library; on Debian, you install it via the pd-osc package).
On the "server" side, you can use liblo for encoding and sending the data.
Note
Be aware that since OSC is packet-based, you will need a packetizing mechanism for stream-based transports like TCP/IP. As with OSC-1.2, this should be SLIP. liblo should already take care of this. Check the patches accompanying [unpackOSC] for how to do this within Pd.
None of this is needed if you are using UDP as a transport.
