cycleGAN with blank background - generative-adversarial-network

I am building a cycleGAN with a U-Net structure to improve the image quality of Cone Beam Computed Tomography (CBCT), with Fan Beam Computed Tomography (FBCT) as the target images. Because I mainly want to enhance the quality in the lung region, I crop the lung volumes out of the original images and assign the value 0 to the regions other than the lung (given that lung pixel values range from -1000 to 0). Scaling and normalization are applied to the array over the range -1000 to 0. Please see the following images as an example from my dataset:
In the example above, the leftmost image is the CBCT and the rightmost image is the FBCT. The middle one is the image generated from the CBCT by the cycleGAN model. This is an example from a very early epoch of the training.
But as training goes on, the model gradually loses its ability to capture the anatomy of the images and eventually generates a blank image with all values 0 (see image below).
The loss climbs back up after several epochs, which is when it starts to lose the anatomical information:
What makes me curious is that this loss of anatomy does not occur when I simply feed the whole CBCT and FBCT images in as the dataset, without lung segmentation or value assignment to the regions outside the lung. With un-segmented images, the model successfully translates the CBCT to mimic FBCT quality. I do the segmentation because I want the model to concentrate only on the lung region, to see if it performs better.
I wonder if this is a consequence of the background having a much higher value than the region of interest (i.e. background value: 0; lung values: -1000 to 0). Is there any published work on cycleGAN training with images containing a blank background? If so, are there any special measures to take when assigning a value to the background, or when doing normalization and scaling? I can't really find any so far.
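For reference, the preprocessing is roughly like the sketch below (simplified; the function name is a placeholder, and the final [-1, 1] target range is only an assumption for illustration). It also shows why the background ends up at the top of the scaled range:

```python
import numpy as np

def preprocess_lung_volume(ct_volume, lung_mask, hu_min=-1000.0, hu_max=0.0):
    """Illustrative sketch only: crop to the lung and normalize as described above."""
    vol = np.clip(ct_volume, hu_min, hu_max).astype(np.float32)
    # Everything outside the lung mask is set to 0, which sits *above*
    # every lung value (-1000 to 0).
    vol = np.where(lung_mask, vol, 0.0)
    # Scale/normalize using the lung range: [-1000, 0] -> [-1, 1]
    # (so the background maps to the maximum value, +1).
    return 2.0 * (vol - hu_min) / (hu_max - hu_min) - 1.0
```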
Any insight is appreciated. Thank you.

Related

Rendering highly granular and "zoomed out" data

There was a GIF on the internet where someone used some sort of CAD program and drew multiple vector pictures in it. On the first frame they zoom in on a tiny dot, revealing a whole new vector picture at a different scale, and then they proceed to zoom in further on another tiny dot, revealing another detailed picture, repeating several times. Here is the link to the GIF.
Or another similar example: imagine you have a time-series with a granularity of a millisecond per sample and you zoom out to reveal years-worth of data.
My question is: how does such finely detailed data get rendered in the end, when a huge amount of data ends up being aliased into a single pixel?
Do you have to go through the whole dataset to render that pixel (i.e. in the case of a time series, go through a million records just to average them into one line, or in the case of CAD, render the whole vector picture and blur it into a tiny dot), or are there certain level-of-detail optimizations that can be applied so that you don't have to do this?
If so, how do they work and where can one learn about them?
This is a very well known problem in games development. In the following I am assuming you are using a scene graph, a node-based tree of objects.
Typical solutions involve a mix of these techniques:
Level Of Detail (LOD): multiple resolutions of the same model, which are shown or hidden so that only one is "visible" at any time. When to hide and show is usually determined by the distance between camera and object, but you could also include the scale of the object as a factor. Modern 3d/CAD software will sometimes offer you automatic "simplification" of models, which can be used as the low res LOD models.
At the lowest level, you could even just use the object's bounding box. Checking whether a bounding box is in view is only around 1-7 point checks depending on how you check. And you can utilise object parenting for transitive bounding boxes.
Clipping: if a polygon is not rendered in the view port at all, no need to render it. In the GIF you posted, when the camera zooms in on a new scene, what is left from the larger model is a single polygon in the background.
Re-scaling of world coordinates: as you zoom in, the coordinates for vertices become tiny fractional floating point numbers. Given that you want all coordinates as precise as possible and that modern CPUs can only handle floats with 64-bit precision (and often use only 32 for better performance), it's a good idea to reset the scaling of the visible objects. What I mean by that is that as your camera zooms in to, say, 1/1000 of the previous view, you can scale up the bigger objects by a factor of 1000 and at the same time adjust the camera position and focal length. Any newly attached small model would use its original scale, thus preserving its precision.
This transition would be invisible to the viewer, but allows you to stay within well-defined 3d coordinates while being able to zoom in infinitely.
On a higher level: As you zoom into something and the camera gets closer to an object, it appears as if the world grows bigger relative to the view. While normally the camera space is moving and the world gets multiplied by the camera's matrix, the same effect can be achieved by changing the world coordinates instead of the camera.
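A minimal sketch of two of the ideas above, LOD selection from projected screen size and re-basing the world scale while zooming (Python for brevity; all names and thresholds here are made up, not taken from any particular engine):

```python
import math

def select_lod(distance_to_camera, object_radius, fov_y_rad, viewport_height_px,
               lod_thresholds_px=(200, 60, 10)):
    """Estimate how many pixels the object's bounding sphere covers on screen
    and pick a detail level: 0 = full detail, higher = coarser, None = skip."""
    angular_size = 2.0 * math.atan2(object_radius, max(distance_to_camera, 1e-9))
    projected_px = angular_size / fov_y_rad * viewport_height_px
    for lod_level, threshold in enumerate(lod_thresholds_px):
        if projected_px >= threshold:
            return lod_level
    return None  # too small to draw, or just render the bounding box / a dot

def rebase_zoom(object_scales, camera_distance, zoom_factor, rebase_below=1e-3):
    """Once the accumulated zoom factor gets very small, fold it back into the
    world: scale the objects up and move the camera out by the same factor, so
    the on-screen image is unchanged but coordinates stay in a comfortable range."""
    if zoom_factor < rebase_below:
        rebase = 1.0 / zoom_factor
        object_scales = [s * rebase for s in object_scales]
        camera_distance *= rebase
        zoom_factor = 1.0
    return object_scales, camera_distance, zoom_factor
```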
First, you can use caching, with tiles, like it's done in cartography. You'll still need to go over all the points, but after that you'll be able to zoom in and out quite rapidly.
But if you don't have the extra memory for a cache (not that much, actually, much less than the data itself), or don't have time to go over all the points, you can use a probabilistic approach.
It can be as simple as sampling only every other point (or every 10th point, or whatever suits you). It yields decent results for some data. Again, in cartography it works quite well for shorelines, but not so well for houses or administrative borders: anything with a lot of straight lines.
Or you can take a more hardcore probabilistic approach: randomly sample some points, and if, for example, 100 data points hit pixel one and only 50 hit pixel two, you can more or less safely assume that if you keep sampling, pixel one will still be twice as likely to be hit as pixel two. So you can just stop there and draw pixel one with a color twice as heavy.
Also consider how much data you can and want to put into a pixel. If you draw a pixel in grayscale, there are only 256 possible values, so you don't need to be any more precise than that. And if you draw a pixel in full color, you still need to ask yourself: will anyone notice the difference between something like rgb(123,12,54) and rgb(123,11,54)?
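To make the binning/sampling idea above concrete, here is a rough sketch (numpy; the function name and defaults are arbitrary):

```python
import numpy as np

def downsample_to_pixels(samples, width_px, stride=1):
    """Reduce a long 1-D series to one value per output pixel.
    stride=1 averages every sample in each bin (exact but slow);
    a larger stride only looks at every stride-th sample (the cheap,
    approximate variant described above)."""
    samples = np.asarray(samples, dtype=np.float64)
    edges = np.linspace(0, len(samples), width_px + 1, dtype=int)
    pixels = np.empty(width_px)
    for i in range(width_px):
        chunk = samples[edges[i]:edges[i + 1]:stride]
        pixels[i] = chunk.mean() if chunk.size else np.nan
    return pixels
```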

breaking and stitching of an image after image prediction

I have a model trained to test images of size 256x256 only, but I have to test very big images of higher resolution, for example 2048x2048.
This can be done by virtually dividing the image into non-overlapping patches of 256x256, predicting them one by one, stitching them back into the full image, and then calculating the metrics by comparing the original and final images ...
I am not able to write the Python code for this. Can somebody help?
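A minimal sketch of that divide-predict-stitch procedure (assuming the image height and width are exact multiples of 256 and that `model_predict` is whatever wrapper your model exposes for a single 256x256 patch; otherwise pad first):

```python
import numpy as np

def predict_large_image(image, model_predict, patch=256):
    """Split the image into non-overlapping patch x patch tiles, predict each
    tile, and stitch the results back into a full-size output image."""
    h, w = image.shape[:2]
    assert h % patch == 0 and w % patch == 0, "pad the image to a multiple of the patch size first"
    output = np.zeros_like(image, dtype=np.float32)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tile = image[y:y + patch, x:x + patch]
            output[y:y + patch, x:x + patch] = model_predict(tile)
    return output
```

After stitching, the metrics (PSNR, SSIM, etc.) can be computed between the original and the stitched prediction as usual.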

Calculate a dynamic iteration value when zooming into a Mandelbrot

I'm trying to figure out how to automatically adjust the maximum iteration value when moving around in the Mandelbrot fractal.
All the examples I've found use a constant of 1000 or less, but that's not enough when zooming into the fractal set.
Is there a way to determine max_iterations based on, for example, where you are in the Mandelbrot space (x_start,x_end,y_start,y_end)?
One method I tried was to repeatedly pre-process a small area in the region of the Mset boundary with increasing iterations until the percentage change in status from one repetition to the next was small. The problem was that this would vary in different places on the current map, since the "depth" varies across it. How do you find the right place to do it? By logging the "deepest" boundary area during the previous generation (one that will still be within the next zoom area).
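Roughly, the probing loop looked like the sketch below (simplified; `escape_iter(c, limit)` is a stand-in for your per-point iteration routine, returning the escape iteration or None if the point has not escaped within the limit):

```python
def estimate_max_iter(probe_points, escape_iter, start=256, growth=2.0, tol=0.01):
    """Raise the iteration limit until the share of probe points still
    classified as non-escaping barely changes between two passes."""
    limit = start
    prev_inside = sum(escape_iter(c, limit) is None for c in probe_points)
    while limit < 10_000_000:  # hard safety cap
        new_limit = int(limit * growth)
        inside = sum(escape_iter(c, new_limit) is None for c in probe_points)
        if prev_inside == 0 or (prev_inside - inside) / prev_inside <= tol:
            return new_limit  # the classification has settled
        limit, prev_inside = new_limit, inside
    return limit
```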
But my best strategy was to avoid iterating wherever possible:
Away from the boundary of the Mset, areas of equal depth can be "contoured" and then filled with that depth. It was not an easy algorithm. Basically I followed a raster scan, but when I detected a boundary of iteration change (examining all the neighbours to ensure I wasn't close to the edge of the Mset), I would switch to a curve-stitching method to iterate around a contour back to where it started (obviously not recalculating spots I had already done), and then make a second pass filling in the raster lines within the contour with the iteration level. It was fraught with leaks, but eventually I cracked it.
Within the Mset, I followed the same approach, because the very last thing you want to do is to plough across vast areas and hit the iteration limit.
The difficult area is close to the boundary, where the iteration results can't be related to smooth contours shared with the neighbours. The contour-stitching method won't work here, since there is only ever one pixel of a particular depth.
Using the contour method will also produce faults on the lower or Mset sides of this region, but since this area looks chaotic until you zoom deeper, I lived with that.
So having said all that, I simply set the iteration depth as high as I can tolerate, but perhaps you can combine my first paragraph with the area-filling techniques.
By the way, colouring the region adjacent to the Mset looks terrible when an animated smooth playback of the zoom is attempted. For that reason I coloured this area in grey scale, by comparing with neighbours. If there was too much difference, I coloured it 0x808080 at first, then adapted that depending on the predominance of the neighbours' depths. All requiring fine tuning!

How to avoid strange structure artifacts in scaled images?

I create a big image stitched out of many single microscope images.
Suddenly (after several months of working properly), the stitched overview images became blurry and contain strange structural artefacts like skewed lines (not the rectangles; those are due to imperfect stitching).
If I open any particular tile at full size, it is not blurry and the artefacts are hardly observable. (Note that the image below is already scaled 4x.)
The overview image is created manually by scaling each tile using QImage::scaled and copying all of them into the corresponding region of the big image. I'm not using OpenCV's stitching.
I assume this happens because of the image contents, because most of the overview images are OK.
The question is: how can I prevent such hardly observable artefacts from becoming very clearly visible after scaling? Is there some means of doing so in OpenCV or QImage?
Is there any algorithm to find out whether image content could lead to such effects for a given scale factor?
Many thanks in advance!
Are you sure the camera is calibrated properly? That the lighting is uniform? Is the lens clear? Do you have electrical components that interfere with the camera connection?
If you add up image frames of photos of a uniform material (or of a non-uniform material moved randomly for a significant time), the resulting integrated image should be completely uniform.
If your produced image is not uniform, especially if you get systematic noise (like the apparent sinusoidal noise in the provided pictures), write a calibration function that transforms image -> calibrated image.
Filtering in Fourier space is another way to filter out the noise, but considering that the image is rotated you will lose precision, and you'll be cutting off components of the real signal too. The following empirical method will reduce the noise in your particular case significantly:
1. ground_output: a composite image with the per-pixel sum of >10 frames (more is better) over a uniform material (e.g. an excited slab of phosphorus).
2. ground_input: the average (or sqrt(sum of px^2)) of ground_output.
3. calib_image: ground_input /(per px) ground_output. Saved for the session, or persisted in a file (important: ensure no lossy compression! (jpeg)).
4. work_input: the images to work on.
5. work_output = work_input *(per px) calib_image: images calibrated for systematic noise.
If you can't create a perfect ground_input target, such as by having a uniform material on hand, do not worry too much. If you move any material uniformly (or randomly) for enough time, it will act as a uniform material in this case (think of a blurred photo).
This method has the added advantage of calibrating the solitary faulty pixels that CCD cameras have (e.g. NormalPixel.value(signal)).
If you want to have more fun, you can always fit the calibration function to something more complex than a zero-intercept line (steps 3 and 5).
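A rough numpy version of steps 1-5 (assuming the frames are already loaded as 2-D float arrays; keep calib_image in something lossless such as .npy or TIFF):

```python
import numpy as np

def build_calibration(ground_frames):
    """Steps 1-3: sum the frames of the uniform target, take the mean intensity
    as ground_input, and form calib_image = ground_input / ground_output per pixel."""
    ground_output = np.sum(np.stack(ground_frames).astype(np.float64), axis=0)
    ground_input = ground_output.mean()
    calib_image = ground_input / np.maximum(ground_output, 1e-12)  # avoid division by zero
    return calib_image

def apply_calibration(work_input, calib_image):
    """Steps 4-5: multiply each working image per pixel by the calibration map."""
    return work_input.astype(np.float64) * calib_image
```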
I suggest scaling the image with some other software to verify if the artifacts are in fact caused by Qt or are inherent in the image you've captured.
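For example, a quick cross-check with OpenCV's area-averaging resize, which low-pass filters while shrinking and so should not introduce aliasing of its own (file names and scale here are placeholders):

```python
import cv2

def downscale_tile(tile_path, scale=0.25):
    """Downscale one tile outside Qt, using area averaging (INTER_AREA)."""
    tile = cv2.imread(tile_path, cv2.IMREAD_UNCHANGED)
    small = cv2.resize(tile, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    cv2.imwrite("downscaled_check.png", small)
    return small
```

If the artefacts disappear in this version, the scaling step is the likely culprit; if they persist, they are already present in the captured tiles.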
The skewed lines look a lot like analog TV interference, or CCTV noise induced by 50 or 60 Hz power lines running alongside the signal cable, or some other electrical interference on the signal.
If the image distortion is caused by signal interference then you can try to mitigate it by moving the signal lines away from whatever could be the source of the problem, or fit something to try to filter the noise (baluns for example).

computer vision: segmentation setup. Graph cut potentials

I have been trying to teach myself some simple computer vision algorithms, and am trying to solve a problem where I have a noise-corrupted image and all I am trying to do is separate the black background from the foreground, which contains some signal. Now, the background RGB channels are not all completely zero, as they can contain some noise. However, the human eye can easily discern the foreground from the background.
So, what I did was use the SLIC algorithm to break the image down into super pixels. The idea being that since the image is noise corrupted, doing statistics on the patches might result in better classification of background and foreground because of higher SNR.
After this, I get around 100 patches, which should have similar profiles, and the result of SLIC seems reasonable. I have been reading about graph cuts (the Kolmogorov paper), and it seemed like something worth trying for the binary problem I have. So I constructed a graph which is a first-order MRF, and I have edges between immediate neighbours (a 4-connected graph).
Now, I was wondering what unary and binary terms I could use here to do my segmentation. For the unary term, I was thinking I could model it as a simple Gaussian, where the background should have zero mean intensity and the foreground some non-zero mean. However, I am struggling to figure out how to encode this. Should I just assume some noise variance and compute probabilities directly using the patch statistics?
Similarly, for neighbouring patches I do want to encourage them to take the same label, but I am not sure what binary term I can design to reflect that. Just using the difference between the labels (1 or 0) seems weird...
Sorry for the long-winded question. Hoping someone can give some helpful hint on how to start.
You could build your CRF model over superpixels, such that a superpixel has a connection to another superpixel if it is a neighbour of it.
For your statistical model Pixel Wise Posteriors are simple and cheap to compute.
So, I suggest the following for the unary terms of the CRF:
Build foreground and background histograms over texture per pixel (assuming you have a mask, or a reasonable number of marked foreground pixels (note: pixels, not superpixels)).
For each superpixel, make an independence assumption over the pixels within it, such that a superpixel's likelihood of being either foreground or background is the product over each observation in the superpixel (in practice, we sum logs). The individual likelihood terms come from the histograms you generated.
Compute the posterior for foreground as the cumulative likelihood described above for foreground, divided by the sum of the cumulative likelihoods of both. Similarly for background.
The pairwise terms between superpixels can be as simple as the difference between the mean observed textures (pixel-wise) of each, passed through a kernel such as the radial basis function.
Alternatively, you could compute a histogram over each superpixel's observed texture (again, pixel-wise) and compute the Bhattacharyya distance between each neighbouring pair of superpixels.
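A small sketch of how those unary and pairwise terms could look (numpy only; fg_hist/bg_hist are assumed to be normalised numpy histograms over the same bin_edges, and the function names, bin edges and sigma are illustrative assumptions, not a prescription):

```python
import numpy as np

def unary_terms(superpixel_values, fg_hist, bg_hist, bin_edges):
    """Sum per-pixel log-likelihoods under normalised foreground/background
    histograms (independence assumption), then normalise to a posterior."""
    eps = 1e-12
    idx = np.clip(np.digitize(superpixel_values, bin_edges) - 1, 0, len(fg_hist) - 1)
    log_fg = np.sum(np.log(fg_hist[idx] + eps))
    log_bg = np.sum(np.log(bg_hist[idx] + eps))
    m = max(log_fg, log_bg)  # log-sum-exp trick for numerical stability
    p_fg = np.exp(log_fg - m) / (np.exp(log_fg - m) + np.exp(log_bg - m))
    return p_fg, 1.0 - p_fg  # posterior for foreground, background

def pairwise_term(values_a, values_b, sigma=10.0):
    """RBF kernel on the difference between the mean observed intensities of
    two neighbouring superpixels: similar superpixels get a strong link."""
    diff = np.mean(values_a) - np.mean(values_b)
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
```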
