Why does Vision give me this text relevance on the picture? - google-cloud-vision

This is the image that I'm scanning.
The first result is Heb; why isn't it 1350, which is in second place?

I think the ordering is based on the position in the image, not size or score, since there is no 'Score' response for text. Presumably the response is sorted by the vertices of the bounding polygon of the text.
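If the default order doesn't suit you, you can always re-sort the results yourself using the bounding polygon vertices. Below is a minimal C++ sketch under that assumption; DetectedText and sortByPosition are made-up names, not part of the Vision client library, and the struct just holds the detected string plus the top-left vertex of its boundingPoly:

#include <algorithm>
#include <string>
#include <vector>

// Hypothetical container for one detected text block: the string plus the
// top-left vertex of its bounding polygon (Vision returns four vertices;
// only the first one is kept here for sorting purposes).
struct DetectedText {
    std::string text;
    int x;  // top-left vertex, pixels from the left edge
    int y;  // top-left vertex, pixels from the top edge
};

// Re-sort detections top-to-bottom, then left-to-right, instead of relying
// on the order the API happened to return them in.
void sortByPosition(std::vector<DetectedText>& items) {
    std::sort(items.begin(), items.end(),
              [](const DetectedText& a, const DetectedText& b) {
                  if (a.y != b.y) return a.y < b.y;
                  return a.x < b.x;
              });
}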

Related

DICOM protocol and CT scan and sorting according to z position of axial slices?

I have a CT scan of the chest where I can't seem to figure out how to sort the axial slices such that the first slices are the ones closest to the head.
The scan has the following DICOM parameters:
Patient Position Attribute (0018,5100): FFS (Feet First Supine)
Image Position (Patient) (0020,0032): -174\-184\-15 (one slice)
Image Orientation (Patient) (0020,0037): 1\0\0\0\1\0
The most cranial slice (anatomically, closest to the head) has z position 13 and the most caudal (lower) one -188.
However, when the Patient Position is FFS, shouldn't the slice with the lowest z position (e.g. -188) be the one located most cranially (anatomically, i.e. closest to the head)?
Can anyone enlighten me?
Kind regards
DICOM very clearly defines that the x-axis has to go from the patient's right to left, the y-axis from front to back, and the z-axis from foot to head.
So the lower z-position of -188 has to be closer to the feet than the higher position of 13. You should always rely on this.
The Patient Position attribute is rather an informational annotation. If you do all the math yourself, you can ignore it.
If a viewer does not do the math (there are a lot of them) and just loads the images and shows them sorted by ImageNumber, then the Patient Position attribute is the information that tells you whether the image with ImageNumber 1 is the one closer to the head or the one closer to the feet. Meaning: when the patient went through the CT scanner, which image was acquired first: the one of the head or the one of the feet.
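To make the convention concrete, here is a minimal C++ sketch (the Slice struct is hypothetical and not tied to any particular DICOM toolkit) that orders slices from feet to head by the z component of Image Position (Patient); reverse the comparison for head-first order:

#include <algorithm>
#include <vector>

// Hypothetical per-slice record: the three values of Image Position (Patient)
// (0020,0032), i.e. the x/y/z coordinates of the slice origin in mm.
struct Slice {
    double x, y, z;
    // ... pixel data, instance number, etc.
};

// DICOM's patient coordinate system runs foot -> head along +z, so sorting by
// ascending z orders the slices from the most caudal (feet) to the most
// cranial (head), regardless of whether the patient was scanned FFS or HFS.
void sortFeetToHead(std::vector<Slice>& slices) {
    std::sort(slices.begin(), slices.end(),
              [](const Slice& a, const Slice& b) { return a.z < b.z; });
}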

Add text next to a Point in PCL visualizer

I have an application where I successfully plot 2D laser range data from a LiDAR in real time and run PCL's Euclidean clustering algorithm to paint those cluster points in a different color. I would, however, like to add text next to each detected cluster showing its distance from the sensor. I do have the coordinates of the centroid point of each detected cluster, but when I try to use addText:
bool pcl::visualization::PCLVisualizer::addText(const std::string &text, int xpos, int ypos, double r, double g, double b, const std::string &id = "")
text: Text to be printed in window
xpos: position in x
ypos: position in y
r: red
g: green
b: blue
id: Text ID tag
It seems like the function addText() places the text at pixel x and y values instead of real-world values (meters). However, PCL's other methods such as addPoint(), addCircle(), etc. do place the data based on real-world measurements.
Does anyone have experience with transforming spatial coordinates to pixels in PCL visualizer, or have successfully plotted text in other ways?
Below is a screenshot of my application. Clusters are drawn in red with a white circle around the centroid. At the bottom left I'm printing the distance of each cluster. As can be seen, they are just stacked on top of each other instead of each being placed on top of its own white circle.
Thankful for any help
regards
Screenshot
Okay, I got it to work with a function called pcl::visualization::PCLVisualizer::addText3D.
There is no support for erasing/updating all text fields that have been added over a period of time though, so one always needs to know the ID tags of each respective text and iterate through them to erase/update them.
You can delete texts with the function pcl::visualization::PCLVisualizer::removeText3D.
Do, however, keep in mind that text ID tags share the same ID namespace as other shape name tags (e.g. names you have given circles, clouds, cylinders, etc.). This means that if you try to add a text named "abc", the command will fail if there is already a circle present in your window named "abc".
Below is a visual example of how it looks now: Obstacle distance plotting
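For reference, here is a rough sketch of that approach; drawClusterLabels and the "cluster_label_" ID prefix are invented for the example, and the sensor is assumed to sit at the origin:

#include <cmath>
#include <string>
#include <vector>
#include <pcl/point_types.h>
#include <pcl/visualization/pcl_visualizer.h>

// Draw a distance label at each cluster centroid. The IDs are remembered so
// the labels can be removed again on the next frame; "cluster_label_" is just
// an arbitrary prefix chosen to avoid clashing with other shape IDs.
void drawClusterLabels(pcl::visualization::PCLVisualizer& viewer,
                       const std::vector<pcl::PointXYZ>& centroids,
                       std::vector<std::string>& label_ids)  // filled for later removal
{
    // Remove the labels from the previous update first.
    for (const auto& id : label_ids)
        viewer.removeText3D(id);
    label_ids.clear();

    for (std::size_t i = 0; i < centroids.size(); ++i) {
        const auto& c = centroids[i];
        const double dist = std::sqrt(c.x * c.x + c.y * c.y + c.z * c.z);  // sensor at origin
        const std::string id = "cluster_label_" + std::to_string(i);
        // Place the text at the centroid, in world coordinates (meters).
        viewer.addText3D(std::to_string(dist) + " m", c,
                         0.1,            // text scale in world units
                         1.0, 1.0, 1.0,  // white
                         id);
        label_ids.push_back(id);
    }
}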

Get DICOM image position in a sequence

A simple question, as I am developing a Java application based on dcm4che ...
I want to calculate/find the "position" of a DICOM image in its sequence (series). By position I mean finding whether this image is first, second, etc. in its series. More specifically, I would like to calculate/find:
Number of slices in a sequence
Position of each slice (DICOM image) in the sequence
For the first question I know I can use tag 0020,1002 (however, it is not always populated) ... For the second one?
If you are dealing with volumetric image series, the best way to order your series is to use Image Position (Patient) (0020,0032). This is a required Type 1 tag (it should always have a value) and it is part of the image plane module. It contains the X, Y and Z coordinates of the upper-left corner of the image in mm. If the slices are parallel to each other, only one value should change between slices.
Please note that the Slice Location (0020, 1041) is an optional (Type 3) element and it may not exist in the DICOM file.
We use the InstanceNumber tag (0x0020, 0x0013) as our first choice for the slice position. If there is no InstanceNumber, or if they are all the same, then we use the SliceLocation tag (0x0020, 0x1041). If neither tag is available, then we give up.
We check the InstanceNumber tag such that the Max(InstanceNumber) - Min(InstanceNumber) + 1 is equal to the number of slices we have in the sequence (just in case some manufacturers start counting at 0 or 1, or even some other number). We check the SliceLocation the same way.
This max - min + 1 is then the number of slices in the sequence (substitute for tag ImagesInAcquisition 0x0020, 0x1002).
Without the ImagesInAcquisition tag, we have no way of knowing in advance how many slices to expect...
I would argue that if the slice location is available, use that. It will be more consistent with the image acquisition. If it is not available, then you'll have to use or compute from the image position (patient) attribute. Part 3 section C.7.6.2.1 has details on these attributes.
The main issue comes when you have a series that is oblique. If you just use the z-value of Image Position (Patient), it may not change by the Slice Thickness / Spacing Between Slices attributes, while the Slice Location typically will. That can cause confusion for end users.
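For oblique series, a common trick is to project Image Position (Patient) onto the slice normal computed from Image Orientation (Patient) and sort by that distance. Here is a toolkit-agnostic C++ sketch of the idea (the Slice struct and function names are illustrative only); the same arithmetic carries over to dcm4che once the attribute values have been read:

#include <algorithm>
#include <array>
#include <vector>

// Hypothetical per-slice record holding Image Position (Patient) (0020,0032)
// and Image Orientation (Patient) (0020,0037): row direction cosines followed
// by column direction cosines.
struct Slice {
    std::array<double, 3> position;    // x, y, z in mm
    std::array<double, 3> rowCosines;  // first three values of (0020,0037)
    std::array<double, 3> colCosines;  // last three values of (0020,0037)
};

// Project Image Position (Patient) onto the slice normal (row x col). For an
// oblique series this distance changes uniformly from slice to slice, unlike
// the raw z component, so it makes a robust sort key.
static double distanceAlongNormal(const Slice& s) {
    const auto& r = s.rowCosines;
    const auto& c = s.colCosines;
    const std::array<double, 3> n = {r[1] * c[2] - r[2] * c[1],
                                     r[2] * c[0] - r[0] * c[2],
                                     r[0] * c[1] - r[1] * c[0]};
    return n[0] * s.position[0] + n[1] * s.position[1] + n[2] * s.position[2];
}

void sortByNormalDistance(std::vector<Slice>& slices) {
    std::sort(slices.begin(), slices.end(),
              [](const Slice& a, const Slice& b) {
                  return distanceAlongNormal(a) < distanceAlongNormal(b);
              });
}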

How to deal with arbitrary size for Laplacian Pyramid?

Recently I had much fun with the Laplacian Pyramid algorithm (http://persci.mit.edu/pub_pdfs/pyramid83.pdf). But one big problem is that the original paper is limited to (2^m+1) x (2^n+1) images. My question is: what is the best way to deal with an arbitrary w x h instead? I can think of a couple of options:
Upsample the input to the next (2^m+1, 2^n+1) size up front
Pad even lines. How exactly? Wouldn't it shift the signal?
Shift even lines by half a sample? Wouldn't it lose half a sample?
Does anybody have experience with this? What is the most practical and efficient approach? Also any pointers to papers dealing with this would be very welcome.
One approach is to create an image with a width and height equal to the next (2^m+1, 2^n+1), but instead of up-sampling the image to fill the expanded dimensions, just place it in the top-left corner and fill the empty space to the right and below with a constant value (the average value for the image is a good choice for this). Then encode in the normal way, storing the original image dimensions along with the pyramid. When decoding, decode and then crop to the original size.
This won't introduce any visual artifacts or degradation because you aren't stretching or offsetting the image in any way.
Because the empty space to the right and below the original image is a constant value, the high-pass bands at each level in the image pyramid will be all zero in this area. So if you are using a compression scheme like run-length encoding to store each level, this will be automatically taken care of and these areas will be compressed to almost nothing. If not, then you can simply store the top-left (potentially non-zero) area of each level and then fill out the rest with zeros when decoding.
You could find the min and max x and y bounding rectangle of the non-zero values for each level and store this along with the level, cropped to include only non-zero values. The decoder could also be optimized so that areas of the image that are going to be cropped away are not actually decoded in the first place, by only processing the top-left of each level.
Here's an illustration of the technique:
Instead of just filling the lower-right area with a flat color, you could fill it with horizontally and vertically mirrored copies of the image to the right and below, and a copy mirrored in both directions to the bottom-right, like this:
This will avoid the discontinuities of the first technique, although there will be a discontinuity in dx (e.g. if the value was gradually increasing from left to right it will suddenly be decreasing). Choosing a mirror that keeps dx constant and ddx zero will avoid this second-order discontinuity by linearly extrapolating the values.
Another technique, which is similar to what some JPEG encoders do to pad out an image to a whole number of MCU blocks, is to take the last pixel value of each row and repeat it, and likewise for columns, with the bottom-right-most pixel of the image used to fill the bottom-right area:
This last technique could easily be modified to extrapolate the gradient of values or even the gradient of gradients instead of just repeating the same value for the remainder of the row or column.
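As a rough illustration of the last technique, here is a C++ sketch (the Image struct and function names are made up for the example) that pads a grayscale image to the next (2^m+1) x (2^n+1) size by edge replication; the decoder would then crop back to the stored original dimensions:

#include <cstdint>
#include <vector>

// Smallest value of the form 2^k + 1 that is >= n (the dimension the
// Burt/Adelson pyramid expects).
static int nextPyramidSize(int n) {
    int s = 2;  // 2^0 + 1
    while (s < n) s = (s - 1) * 2 + 1;
    return s;
}

// Grayscale image stored row-major.
struct Image {
    int width = 0, height = 0;
    std::vector<std::uint8_t> pixels;  // width * height values
};

// Pad an image by repeating the last pixel of each row to the right and the
// last row downwards (the JPEG-style edge replication described above). The
// original dimensions must be stored alongside the pyramid for cropping later.
Image padForPyramid(const Image& src) {
    Image dst;
    dst.width = nextPyramidSize(src.width);
    dst.height = nextPyramidSize(src.height);
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    for (int y = 0; y < dst.height; ++y) {
        const int sy = y < src.height ? y : src.height - 1;  // clamp row
        for (int x = 0; x < dst.width; ++x) {
            const int sx = x < src.width ? x : src.width - 1;  // clamp column
            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                src.pixels[static_cast<std::size_t>(sy) * src.width + sx];
        }
    }
    return dst;
}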

How do you calculate the Angle of Incidence?

I'm working on a raytracer for a large side project, with the goal being to produce realistic renders without worrying about CPU time. Basically pre-rendering, so I'm going for accuracy over speed.
I'm having some trouble wrapping my head around some of the more advanced math going on in the lighting aspects of things. Basically, I have a point for my light. Assuming no distance falloff, I should be able to use the point on the polygon I've found, and compare the normal at that point to the angle of incidence on the light to figure out my illumination value. So given a point on a plane, the normal for that plane, and the point light, how would I go about figuring out that angle?
The reason I ask is that I can't seem to find any reference on finding the angle of incidence. I can find lots of references detailing what to do once you've got it, but nothing telling me how to get it in the first place. I imagine it's something simple, but I just can't logic it out.
Thanks
The dot product of the surface normal vector and the incident light vector will give you the cosine of the angle of incidence, if you've normalised your vectors.
It sounds to me like you are trying to calculate diffuse illumination. Assuming you have the surface point p_o, the light position p_L, and the normal vector n, you can calculate the diffuse illumination like this:
L = p_L - p_o
I_d = k * (L · n) / (||L|| * ||n||)
You technically don't need to calculate the actual angle of incidence, because you only need its cosine, which the dot product conveniently gives you.
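Here is a minimal C++ sketch of that formula, with illustrative names only (the clamp to zero for lights behind the surface is an extra safeguard, not part of the formula above):

#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double length(const Vec3& v) { return std::sqrt(dot(v, v)); }

// Diffuse (Lambertian) term at a surface point: k * cos(theta), where theta is
// the angle of incidence between the surface normal and the direction to the
// light. The cosine comes straight from the normalized dot product, so the
// angle itself is never computed.
double diffuse(const Vec3& surfacePoint, const Vec3& lightPos,
               const Vec3& normal, double k) {
    const Vec3 L = lightPos - surfacePoint;  // vector toward the light
    const double cosTheta = dot(L, normal) / (length(L) * length(normal));
    return k * std::max(0.0, cosTheta);      // clamp: light behind the surface
}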
NOTE: From where I'm sitting right now, I can't upload a picture for you. I'll try to lay it out for you in words, though.
Here's how you can imagine this process:
Define n̂ as your normalized normal (the vector perpendicular to your planar polygon, of unit length, making the math easier).
Define p_0 as your eyeball point.
Define p_1 as the impact point of your "eyeball ray" on the polygon.
Define v̂ as the normalized vector pointing from p_1 back to p_0. You can write this like so:
v̂ = (p_0 - p_1) / ||p_0 - p_1||
So, you have created a vector that points from p_1 to p_0 and then divided that vector by its own length, giving you a vector of length 1 that points from p_1 to p_0.
The reason we went to all this trouble is that we would really like the angle θ between the normal n̂ and the vector v̂ that you just created. Another name for θ is the angle of incidence.
An easy way to calculate this angle of incidence is to use the dot product. Using the terms defined above, you take the x, y and z components of each of those unit-length vectors, multiply them component-wise and add the products to get the dot product:
n̂ · v̂ = cos(θ) = n_x*v_x + n_y*v_y + n_z*v_z
To calculate θ, therefore, you simply take the inverse cosine of the dot product:
θ = arccos(n̂ · v̂)
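As a tiny sketch of that last step (assuming both vectors are already unit length; the clamp just guards against floating-point rounding pushing the cosine slightly outside [-1, 1]):

#include <algorithm>
#include <cmath>

// Given two unit-length vectors (the normalized normal and the normalized
// direction from the impact point back to the eye, as defined above), recover
// the angle of incidence in radians.
double angleOfIncidence(double nx, double ny, double nz,
                        double vx, double vy, double vz) {
    double cosTheta = nx * vx + ny * vy + nz * vz;       // dot product
    cosTheta = std::max(-1.0, std::min(1.0, cosTheta));  // guard against rounding
    return std::acos(cosTheta);
}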
