Pydicom to numpy and back to pydicom

This is my first post, I hope I have followed convention.
I've found a lot of success with pydicom, but am stuck on one particular application. I would like to do the following:
Read in dicom to numpy array
Reshape to (frames, rows, columns, samples per pixel)
Do some processing including cropping and converting to grayscale
Output as new dicom file
I use
r = ds.Rows
c = ds.Columns
f = ds.NumberOfFrames
s = ds.SamplesPerPixel
imageC = np.reshape(img,(f,r,c,s), order='C')
to get the initial numpy matrix I want and do the processing. I have confirmed that these steps look reasonable.
Prior to saving the new dicom, I update the ds Rows and Columns with the new correct dimensions and set SamplesPerPixel to 1. I then reshape the numpy matrix before reassigning it to PixelData with .tostring().
np.reshape(mat, (p, f, r, c), order='C')
The resulting image is nonsensical (green) in my dicom viewer. Are there any obvious logical mistakes? I can provide more code if it would be of use.

I am mostly guessing, as I have not used pydicom for writing files. Anyway, if the original image is an RGB one and you convert it to grayscale, then you should change the Media Storage SOP Class UID of the image so that the viewer can interpret it properly. Can you check the value? It is under tag (0002,0002).
It is possible that there are more tags to change. Can you dump both files and show us differences?
By the way, from your post it seems that you import the image by ds.PixelData. Why don't you use ds.pixel_array? Then you wouldn't need to reshape.
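To make the header side of this concrete, here is a minimal sketch, not a tested fix for the file in the question. It assumes 8-bit unsigned, uncompressed RGB data already shaped (frames, rows, columns, 3). The helper names rgb_frames_to_gray and store_gray are hypothetical, and the luma weights are just one common RGB-to-gray rule. The point is that once there is a single sample per pixel, the array only needs the shape (frames, rows, columns), and the header attributes that describe color have to change along with SamplesPerPixel.

```python
import numpy as np


def rgb_frames_to_gray(frames, top, bottom, left, right):
    """Crop (frames, rows, cols, 3) uint8 RGB data; return (frames, rows, cols) uint8 gray."""
    cropped = frames[:, top:bottom, left:right, :].astype(np.float64)
    # ITU-R BT.601 luma weights; any other RGB-to-gray rule could be swapped in.
    gray = cropped @ np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def store_gray(ds, gray):
    """Update the pixel-related attributes of ds so they describe the gray array."""
    ds.NumberOfFrames, ds.Rows, ds.Columns = gray.shape
    ds.SamplesPerPixel = 1
    ds.PhotometricInterpretation = "MONOCHROME2"
    if hasattr(ds, "PlanarConfiguration"):
        del ds.PlanarConfiguration  # only meaningful when SamplesPerPixel > 1
    # C-ordered bytes: frame by frame, row by row, one byte per pixel.
    ds.PixelData = np.ascontiguousarray(gray).tobytes()
    return ds
```

With a dataset read by pydicom, store_gray(ds, rgb_frames_to_gray(ds.pixel_array, ...)) would be the intended use, assuming pixel_array comes back as (frames, rows, columns, 3). Other tags may still need attention, as noted above.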


How to convert .dicom (slices) to a single (volume) image?

I have 'n' number of slices, is it possible to convert them to a single file, (that has correct slice arrangement), and parse them using ImageIO or any other python package ?
I'm not sure what ImageIO is; however, for parsing a set of slices (which I assume means a single CT or MR type series that's meant to be a single 3D volume), check out SimpleITK.
I think it will do exactly what you want: it's a very complete "3D-aware" DICOM library (and very fast, as it wraps compiled C++ libraries). In particular, it will read a complete multi-file series and create a single 3D representation of it.
Its image object converts to numpy, so you get a 3D numpy array for the series, and in addition the image knows the 3D location/orientation of the series relative to the DICOM patient coordinate system.
So once you have that, you've got all the spatial/3D info you need to be able to use with any other python libraries.
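As a rough sketch of what this looks like in practice: read_series_with_simpleitk uses the SimpleITK series reader and needs that package plus a directory holding one series. stack_slices is a hypothetical plain-numpy illustration of the "correct slice arrangement" idea, assuming you already have each slice's position along the scan axis. Neither function is tuned for edge cases such as several series in one folder.

```python
import numpy as np


def stack_slices(slices):
    """Sort (z_position, 2D array) pairs by position and stack into (slices, rows, cols)."""
    ordered = sorted(slices, key=lambda item: item[0])
    return np.stack([pixels for _, pixels in ordered], axis=0)


def read_series_with_simpleitk(directory):
    """Read one DICOM series from a directory; requires the SimpleITK package."""
    import SimpleITK as sitk  # imported lazily so stack_slices works without it

    reader = sitk.ImageSeriesReader()
    # GDCM orders the file names of the series found in the directory.
    reader.SetFileNames(reader.GetGDCMSeriesFileNames(directory))
    image = reader.Execute()
    # The array is indexed (z, y, x); spacing and origin stay on the image object.
    return sitk.GetArrayFromImage(image), image.GetSpacing(), image.GetOrigin()
```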

Weka Apriori No Large Itemset and Rules Found

I am trying to do Apriori association mining with WEKA (I use 3.7) on a given database table.
I exported two columns (orderLineNumber and productCode) and loaded them into WEKA. So far every attempt has ended with "No large itemsets and rules found!"
I also tried converting the CSV into an ARFF file first using the ARFF converter, and still got the same message.
I also tried the database loader in WEKA; the data loaded just fine but still gave the same result.
The only filter I applied in preprocessing is the NumericToNominal filter.
What have I done wrong here? I suspect it is my ARFF format. Thank you.
After further trials, I found out that I had exported the wrong column and was missing one filter step, "denormalize". I installed the plugin via the package manager and denormalized my data after first converting it to nominal.
I then compared the results with those of the "Supermarket" sample. The only differences are that my output comes with 'f' instead of 't' (as shown below) and the confidence value always seems to be 100%.
First of all, OrderLine is the wrong column.
Obviously, the position on the printed bill is not very important.
Secondly, the file format is not appropriate.
You want one line for every order and one column for every possible item in the @data section. To save memory, it may be helpful to use the sparse format (do not forget to set flags appropriately).
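The reshaping described here (one line per order, one column per item) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and it assumes the export is a list of (order number, product code) pairs. Marking a purchased item with 't' and leaving everything else as a missing value '?' follows, as far as I recall, the convention of Weka's supermarket sample.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group (order_number, product_code) pairs into one set of products per order."""
    baskets = defaultdict(set)
    for order_number, product_code in rows:
        baskets[order_number].add(product_code)
    return dict(baskets)


def to_indicator_table(baskets):
    """Return (sorted item names, one row per order holding 't' or '?' per item)."""
    items = sorted({item for basket in baskets.values() for item in basket})
    table = [["t" if item in baskets[order] else "?" for item in items]
             for order in sorted(baskets)]
    return items, table
```

Each entry of items would become one nominal attribute, and each row of table one line of the data section.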
Other tools such as ELKI can process input formats like this, which may be easier to use (it was also a lot faster than Weka):
apple banana
milk diapers beer
but last I checked, ELKI would "only" find frequent itemsets (the harder part), not compute association rules. I then used a tiny Python script to produce the actual association rules as desired.
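The script itself is not reproduced here, but the idea is simple enough to sketch. The following is a reconstruction under assumptions, not the original script: it assumes the frequent itemsets are available as a dict from frozenset to support count, and that every subset of a frequent itemset is also in that dict (which Apriori-style miners guarantee). The confidence of a rule A -> B is support(A and B together) divided by support(A).

```python
from itertools import combinations


def association_rules(supports, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets.

    supports maps frozenset(items) -> support count and must contain every
    non-empty subset of every itemset it holds.
    """
    rules = []
    for itemset, count in supports.items():
        # Try every non-empty proper subset as the left-hand side of a rule.
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                confidence = count / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

For example, with milk in 4 baskets, beer in 2, and both together in 2, a threshold of 0.8 keeps only beer -> milk (confidence 2/2) and drops milk -> beer (confidence 2/4).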

Read HDF5 data with numpy axis order with Julia HDF5

I have an HDF5 file containing arrays that are saved with Python/numpy. When I read them into Julia using HDF5.jl, the axes are in the reverse of the order in which they appear in Python. To reduce the mental gymnastics involved in moving between the Python and Julia codebases, I reverse the axis order when I read the data into Julia. I have written my own function to do this:
function reversedims(ary::Array)
    permutedims(ary, collect(ndims(ary):-1:1))
end
data =, somekey) |> reversedims
This is not ideal because (1) I always have to import reversedims to use this; (2) I have to remember to do this for each Array I read. I am wondering if it is possible to either:
instruct HDF5.jl to read in the arrays with a numpy-style axis order, either through a keyword argument or some kind of global configuration parameter
use a builtin single argument function to reverse the axes
The best approach would be to create an H5py.jl package, modeled on MAT.jl (which reads and writes .mat files created by Matlab).
It looks to me like permutedims! does what you're looking for; however, it does copy the array. If you can rewrite the HDF5 files in Python, numpy.asfortranarray claims to return your data stored in column-major format, though the numpy internals docs seem to suggest that the data isn't altered, only the strides are, so I don't know whether the HDF5 file output would be any different.
Edit: Sorry, I just saw you are already using permutedims in your function. I couldn't find anything else on the Julia side, but I would still try the numpy.asfortranarray and see if that helps.
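For anyone who wants to try the Python side, here is a small numpy-only check. It shows that asfortranarray leaves the logical shape alone, while transposing reverses the axes. Whether the HDF5 writer looks at memory layout at all is an assumption I have not verified; if it only stores the logical shape, then writing the transposed array is the variant that would make the axes line up on the Julia side.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)  # C-ordered, numpy's default layout

# asfortranarray changes the memory layout but not the logical shape.
fortran = np.asfortranarray(arr)
same_shape = fortran.shape == arr.shape
layout_changed = fortran.flags["F_CONTIGUOUS"] and not arr.flags["F_CONTIGUOUS"]

# Reversing the axes before saving: a reader that reports the dimensions
# in reverse order would then see the original (2, 3, 4) again.
reversed_axes = np.ascontiguousarray(arr.T)
```

Here reversed_axes[k, j, i] holds the same value as arr[i, j, k], which is the same reordering the reversedims function performs after reading.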

CSV Import in Gephi

I've created my network using R from a large dataset. I've used a smaller one to test and wrote my own plotter to show how I'd like it displayed, I just can't seem to get it right....
This Image shows how my network should look. I've tried square matrices of data (36x36) and a 1x36 exported as CSV, neither of which give the result I desire.
Ignoring the bigger circles, I'd like the network displayed in the image above.
Version 1 - 1x36 -
Version 2 - 36x36 -
The structure is as follows. Row 1 & Column 1 - node names. All numbers decide if an edge exists or not (0 or 1).
When I try to import these files, Gephi interprets them in an unusual way.
Is there something I'm doing wrong?
I suggest using the rgexf package for R.
I assume that you already have an edge list. Let me call it x.
data <- edge.list(x) # It creates two objects from your edgelist: data$nodes and data$edges
g <- write.gexf(nodes=data$nodes,edges=data$edges,...) # It creates a graph in gexf format, here you can add nodes' attributes, edges' attributes, etc...
print(g, file="mygraph.gexf") # It saves the graph
For more details, see the rgexf manual.
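If all you have is the 36x36 matrix, you first need to turn it into an edge list. Here is a minimal sketch of that step in Python (the same logic is easy to write in R). It assumes the layout described in the question: first row and first column hold node names, and a cell of 1 means an edge. The function names are hypothetical. The Source,Target header is the one Gephi's spreadsheet import looks for in an edge table, to the best of my knowledge.

```python
import csv
import io


def adjacency_to_edges(matrix_csv):
    """Convert a labelled 0/1 adjacency matrix (CSV text) into (source, target) pairs."""
    rows = list(csv.reader(io.StringIO(matrix_csv)))
    targets = rows[0][1:]              # first row: column node names
    edges = []
    for row in rows[1:]:
        source = row[0]                # first column: row node name
        for target, cell in zip(targets, row[1:]):
            if cell.strip() == "1":
                edges.append((source, target))
    return edges


def edges_to_csv(edges):
    """Render the pairs as edge-list CSV text with a Source,Target header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return out.getvalue()
```

For a symmetric matrix every edge shows up twice (A,B and B,A); importing the result as an undirected graph, or keeping only one triangle of the matrix, avoids the duplicates.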

Generating a SequenceFile

Given data in the following format (tag_uri image_uri image_uri image_uri ...), I need to turn them into Hadoop SequenceFile format for further processing by Mahout (e.g. clustering)
Before this I would turn the input into csv (or arff) as follows,,...
with each row describing one tag. The ARFF file is then converted into a vector file used by Mahout for further processing. I am trying to skip the ARFF generation step and generate a SequenceFile instead. If I am not mistaken, to represent my data as a SequenceFile, I would need to store each row of the data with $tag_uri as the key and $image_vector as the value. What is the proper way of doing this (if possible, can the tag_uri for each row be included in the SequenceFile somewhere)?
Some references that I found, but not sure if they are relevant:
Writing a SequenceFile
Formatting input matrix for svd matrix factorization (can I store my matrix in this form?)
RandomAccessSparseVector (considering I only list images that are assigned with a given tag instead of all the images in a line, is it possible to represent it using this vector?)
SequenceFile write
SequenceFile explanation
You just need a SequenceFile.Writer, which is explained in your link #4. This lets you write key-value pairs to the file. What the key and value are depends on your use case, of course. It's not at all the same for clustering versus matrix decomposition versus collaborative filtering. There's not one SequenceFile format.
Chances are that the key or value will be a Mahout Vector. The thing that knows how to write a Vector is VectorWritable. This is the class you would use to wrap a Vector and write it with SequenceFile.Writer.
You would need to look at the job that will consume it to make sure you're passing what it expects. For clustering, for example, I think the key is ignored and the value is a Vector.
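Before anything is written, each input line has to become a key plus a sparse vector. Here is a language-neutral sketch of that preparation step in Python, under the assumption that the lines are whitespace-separated as in the question; build_sparse_rows is a hypothetical helper, not part of Mahout. In the actual Java writer, each row would presumably become one append call on the SequenceFile.Writer, with the tag_uri as the key (for instance a Hadoop Text) and a VectorWritable wrapping a RandomAccessSparseVector whose non-zero positions are the listed columns.

```python
def build_sparse_rows(lines):
    """Parse 'tag_uri image_uri image_uri ...' lines into sparse rows.

    Returns (image_index, rows): image_index maps image_uri -> column number,
    rows maps tag_uri -> sorted list of column numbers whose value would be 1.
    """
    image_index = {}
    rows = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        tag_uri, image_uris = parts[0], parts[1:]
        # Assign the next free column number to every image seen for the first time.
        columns = {image_index.setdefault(uri, len(image_index)) for uri in image_uris}
        rows[tag_uri] = sorted(columns)
    return image_index, rows
```

The vector cardinality would be len(image_index), so the index has to be built over the whole input before the first vector is written.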
