Conv3D model input tensor

I am new to PyTorch and I want to build a classifier for 3D DICOM MRIs. I want to use the pretrained resnet18 from the MONAI library, but I am confused about the input dimensions of the tensor. The shape of the images in my dataloader is [2, 160, 256, 256], where 2 is the batch_size, 160 is the number of DICOM slices for each patient, and 256x256 is the image size.
When I try to run the model I get this error:
Expected 5-dimensional input for 5-dimensional weight [64, 3, 7, 7, 7], but got 4-dimensional input of size [2, 160, 256, 256] instead
If I unsqueeze the tensor before feeding it to the model I get:
Given groups=1, weight of size [64, 3, 7, 7, 7], expected input[1, 2, 160, 256, 256] to have 3 channels, but got 2 channels instead
Can anybody help me figure this out?

You need to add the channel dimension (which is 1 for MRIs).
E.g. your input should be of shape (2, 1, 160, 256, 256).
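A minimal sketch of that fix, assuming x is a batch from your dataloader. Note that the pretrained weights in your error expect 3 input channels, so you may also need to repeat the channel, or (if I read MONAI's resnet API correctly) construct the network with n_input_channels=1:
import torch

x = torch.randn(2, 160, 256, 256)  # stand-in for a batch from the dataloader

# insert the channel axis at dim=1: [batch, channels, depth, height, width]
x = x.unsqueeze(1)
print(x.shape)  # torch.Size([2, 1, 160, 256, 256])

# one possible workaround if the pretrained first conv expects 3 channels
# (an assumption, not the only option):
x3 = x.repeat(1, 3, 1, 1, 1)
print(x3.shape)  # torch.Size([2, 3, 160, 256, 256])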

Related

umap for dictionary in Julia

I'm given a dictionary with keys (ids) and values.
Dict{Int64, Vector{Float64}} with 122 entries:
  3828 => [1, 2, 3, 4...
  2672 => [6, 7, 5, 8...
  ...
Now I need to apply umap to it. I have the following code:
embedding = umap(mat, 2; n_neighbors=15, min_dist=0.001, n_epochs=200)
println(size(embedding))
Plots.scatter(embedding[1,:],embedding[2,:])
Here mat is the matrix
1, 2, 3, 4
6, 7, 5, 8
....
So I got the embedding matrix and the umap plot. But in the plot all the points are the same color and there are no labels. How can I make the plot show points labeled with the keys of the dictionary?
Looking at UMAP.jl, the input matrix should have the shape (n_features x n_samples). If each entry in your dictionary is a sample and I’m interpreting your matrix notation correctly, it appears you have this reversed.
You should be able to add the keys of the dictionary as annotations to the plot as follows (optionally with a small offset added to each coordinate):
Plots.annotate!(
    embedding[1,:] .+ x_offset,
    embedding[2,:] .+ y_offset,
    string.(collect(keys(yourdict)))
)
Finally, I'm not sure which variable you actually want to map to the color of the markers in the scatterplot. If it's the integer value of the keys, pass them to the scatter function just as above, except without turning them into strings.

Creating subgraphs with overlapping vertices

I've been looking for packages with which I could create subgraphs with overlapping vertices.
From what I understand, NetworkX and METIS can partition a graph into two or more parts, but I couldn't find how to partition a graph into subgraphs with overlapping nodes.
Suggestions on libraries that support partitioning with overlapping vertices would be really helpful.
EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.
import networkx as nx
from cdlib import algorithms

if __name__ == '__main__':
    g = nx.karate_club_graph()
    coms = algorithms.angel(g, threshold=4, min_community_size=10)
    print(coms.method_name)
    print(coms.method_parameters)  # clustering parameters
    print(coms.communities)
    print(coms.overlap)
    print(coms.node_coverage)
Output:
ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118
From the communities returned, I understand that 1 and 3 have an overlap of 4 nodes, but 2 and 3 or 1 and 2 don't have an overlap size of 4 nodes.
It is not clear to me how the overlap threshold (4 overlapping nodes) has to be specified in algorithms.angel(g, threshold=4, min_community_size=10). I tried setting threshold=4 to define an overlap size of 4 nodes. However, the documentation available for angel says:
:param threshold: merging threshold in [0,1].
I am not sure how to translate the 4 overlaps into a value within the bounds [0, 1]. Suggestions would be really helpful.
You can check out CDLIB:
It has a large number of community detection algorithms applicable to NetworkX graphs, including several overlapping community detection algorithms.
On a side note:
The return type of the functions is called NodeClustering, which might be a little confusing at first; see the methods applicable to it in the documentation. Usually you simply want to convert the result to a Python dictionary.
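For example, a minimal sketch of that conversion (louvain is used here only as a stand-in for any cdlib algorithm; to_node_community_map maps each node to the ids of the communities containing it):
import networkx as nx
from cdlib import algorithms

g = nx.karate_club_graph()
coms = algorithms.louvain(g)

# NodeClustering -> plain dict: node -> list of community ids
node_to_coms = dict(coms.to_node_community_map())
print(node_to_coms)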
Specifically about the angel algorithm in CDLIB:
According to ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks, the threshold is not the overlapping threshold, but used as follows:
If the ratio is greater than (or equal to) a given threshold, the merge is applied and the node label updated.
Basically, this value determines whether to further merge the nodes into bigger communities, and is not equivalent to the number of overlapping nodes.
Also, don't confuse these "labels" with node labels (as in nx.relabel_nodes(G, labels)). The "labels" referred to here relate to the Label Propagation Algorithm that ANGEL uses.
As for the effects of varying this threshold:
[...] Increasing the threshold, we obtain a higher number of communities since lower quality merges cannot take place.
[Based on the comment by @J. M. Arnold]
From ANGEL's GitHub repository you can see that when threshold >= 1 only the min_comsize value is used:
self.threshold = threshold
if self.threshold < 1:
    self.min_community_size = max([3, min_comsize, int(1. / (1 - self.threshold))])
else:
    self.min_community_size = min_comsize
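To see the effect of the threshold in practice, a minimal sketch sweeping the [0, 1) range (the parameter values are chosen only for illustration):
import networkx as nx
from cdlib import algorithms

g = nx.karate_club_graph()

# higher thresholds block lower-quality merges, so expect more, smaller communities
for t in (0.25, 0.5, 0.75):
    coms = algorithms.angel(g, threshold=t, min_community_size=4)
    print(t, len(coms.communities))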

Julia LightGraphs weakly_connected_components

Isn't it true that weakly_connected_components in Julia's LightGraphs should return components such that, if the DiGraph is turned into an undirected graph, each component is connected?
I have tried this, and I do not seem to receive such components. As an example, I tried it on the political blogs data as an undirected network:
data = readdlm(path, ',', Int64)  # contains edges in each row
N_ = length(unique(vcat(data[:,1], data[:,2])))  # to get the number of vertices
network = LightGraphs.DiGraph(N_)
# construct the network
for i in 1:size(data,1)
    add_edge!(network, Edge(data[i,1], data[i,2]))
end
# largest weakly connected component
net = weakly_connected_components(network)[1]
temp_net, vmap = induced_subgraph(network, net)
and after getting the largest weakly connected component, I see the following:
julia> isempty([i for i in vertices(temp_net) if isempty(edges(temp_net).adj[i])])
false
which signifies that some nodes have no incoming or outgoing edges.
What can be the problem? I am using the latest release 6, but the LightGraphs package tests seem to be working.
In addition to what @dan-getz said, I must implore you not to access any internal data fields of structures - we have accessors for everything that's "public". Specifically, edges(temp_net).adj is not guaranteed to be available. It's currently the same as fadj(g), the forward adjacency list of g, for both directed and undirected graphs, but it's not intended to be used except to help keep edge iteration state.
If you use .adj, your code will break on you without warning at some point.
The TL;DR answer is that edges(temp_net).adj[i] contains only the vertices i connects to, and not those connecting to i. And some vertices have no incoming edges.
The longer version is the following, which shows that temp_net, built from a randomly generated network and assigned as in the question, is indeed weakly connected. First, build a random network:
julia> using LightGraphs
julia> N_ = 1000 ;
julia> network = LightGraphs.DiGraph(N_)
{1000, 0} directed simple Int64 graph
julia> using Distributions
julia> for i in 1:N_
           add_edge!(network, sample(1:N_,2)...)
       end
julia> net = weakly_connected_components(network)[1]
julia> temp_net,vmap = induced_subgraph(network, net)
({814, 978} directed simple Int64 graph, [1, 3, 4, 5, 6, 9, 10, 11, 12, 13 … 989, 990, 991, 993, 995, 996, 997, 998, 999, 1000])
And, now, we have:
julia> length(vertices(temp_net))
814
julia> invertices = union((edges(temp_net).adj[i] for i in vertices(temp_net))...);
julia> outvertices = [i for i in vertices(temp_net) if !isempty(edges(temp_net).adj[i])] ;
julia> length(union(invertices,outvertices))
814
Thus all 814 vertices have edges from them, to them, or both, and are part of the weakly connected component.

Node labels are not in sequence in trees using fancyRpartPlot

I applied the decision tree algorithm to the famous breast cancer data set from UCI after removing 16 records with missing values. The tree I obtained is given below. As can be seen, the small numbers appearing on top of the nodes are not in order. The numbers are 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 28 and 20. Are the missing nodes deleted due to pruning? Can I enable an option to see them? Can I correct the sequence according to the nodes appearing, i.e. 1 to 13?

predictions using pylearn2 models

I have trained the following CNN model using pylearn2.
h1
Input space: Conv2DSpace(shape=(25, 150), num_channels=1, axes=('b', 0, 1, 'c'), dtype=float64)
Total input dimension: 3750
h2
Input space: Conv2DSpace(shape=(11, 73), num_channels=8, axes=('b', 'c', 0, 1), dtype=float64)
Total input dimension: 6424
h3
Input space: VectorSpace(dim=1024, dtype=float64)
Total input dimension: 1024
h4
Input space: VectorSpace(dim=1024, dtype=float64)
Total input dimension: 1024
y
Input space: VectorSpace(dim=1024, dtype=float64)
Total input dimension: 1024
You can observe that the input examples to this CNN are gray images of size 25 x 150. The final number of outputs is 10, that is, the layer 'y' has an output dimension of 10.
My training dataset is created using the CSVDataset in pylearn2, and I'm able to train the model.
However, I have a problem making predictions with this model, which I'm trying to do using the predict_csv.py file in the scripts/mlp folder.
The problem is that predict_csv.py directly loads the test.csv file into a 2D matrix of 1000 x 3750, representing 1000 test examples of 3750 pixels each. However, when predicting, Theano expects the input to be in the same format as the input of layer 'h1'. The following error occurs:
TypeError: ('Bad input argument to theano function with name "../mlp/predict_csv.py:111" at index 0(0-based)', 'Wrong number of dimensions: expected 4, got 2 with shape (1000, 3750).')
I guess the required format is the ('b', 0, 1, 'c') format of pylearn2.
I would really like to know how to make this transformation from the 2D array to the required format above, or any other way this problem could be dealt with.
To solve my problem, I ended up manually converting the 2D set of images (1000 x 3750) to a 4D array whose axes are number of examples, image rows, image columns, and number of channels (1000 x 25 x 150 x 1). It worked fine after this transformation.
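A minimal sketch of that conversion, assuming X is the 1000 x 3750 matrix loaded from test.csv (the file name is illustrative) and the required axes order is ('b', 0, 1, 'c'):
import numpy as np

X = np.loadtxt('test.csv', delimiter=',')  # shape (1000, 3750)
assert X.shape[1] == 25 * 150

# reshape to ('b', 0, 1, 'c') = (examples, rows, columns, channels)
X4d = X.reshape(X.shape[0], 25, 150, 1)
print(X4d.shape)  # (1000, 25, 150, 1)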
I was hoping to find a pylearn2 class or function that directly served my purpose, because while training, pylearn2 is obviously making this change in space itself.
