I'm curious what would be a good approach for getting the masks of objects in an image that contains many instances of a single object (see the image), but only for the instances in which the whole shape is visible.
A box full of similar Levers
I've already tried Mask-RCNN and annotated the fully visible objects for a handful of images.
The annotated image
However, Mask-RCNN apparently does not care that I'm only interested in the masks of the fully visible items. It tries to find all the objects, even those which are partially visible, and gives me all of the masks.
After weeks of trial and error, I got a proper result by doing the following:
annotate both of the holes on the levers (separately)
annotate as many levers as makes sense in an image (even the partially occluded ones whose holes are still visible)
filter the results of Mask-RCNN by checking whether a lever mask also contains two hole masks or not
This gives a roughly accurate answer. However, I'm still curious to know whether there's a better approach or not.
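In case it's useful, here is roughly what that filtering step looks like for me in Python. The helper names and the overlap threshold are just my own sketch (not tied to any particular Mask-RCNN implementation), but most implementations do return one boolean H x W mask per detected instance:

```python
import numpy as np

# Rough sketch of the filtering step: keep only lever masks that fully contain
# at least two detected hole masks. Assumes each mask is a boolean HxW array,
# one per detected instance; the helper names and the 0.9 threshold are my own.

def contains_mask(outer, inner, min_overlap=0.9):
    """True if at least `min_overlap` of the inner mask lies inside the outer mask."""
    inner_area = inner.sum()
    if inner_area == 0:
        return False
    return (outer & inner).sum() / inner_area >= min_overlap

def fully_visible_levers(lever_masks, hole_masks, required_holes=2):
    """Return only the lever masks that contain at least `required_holes` hole masks."""
    keep = []
    for lever in lever_masks:
        holes_inside = sum(contains_mask(lever, hole) for hole in hole_masks)
        if holes_inside >= required_holes:
            keep.append(lever)
    return keep

# Tiny example with dummy data: one lever containing two holes, one lever without.
lever_a = np.zeros((100, 100), bool); lever_a[10:60, 10:60] = True
lever_b = np.zeros((100, 100), bool); lever_b[70:90, 70:90] = True
hole_1 = np.zeros((100, 100), bool); hole_1[20:25, 20:25] = True
hole_2 = np.zeros((100, 100), bool); hole_2[40:45, 40:45] = True
print(len(fully_visible_levers([lever_a, lever_b], [hole_1, hole_2])))  # -> 1
```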
I've been looking for a robust method of pathfinding for a platformer-based game I'm developing, and A* looks like it's the best method available. I noticed there is a demo for the AStar implementation in Godot. However, it is written for a grid/tile based game and I'm having trouble adapting that to a platformer where the Y axis is limited by gravity.
I found a really good answer that describes how A* can be applied to platformers in Unity. My question is... Is it possible to use AStar in Godot to achieve the same thing described in the above answer? Is it possible this could be done better without using the built in AStar framework? What is a really simple example of how it would work (with or without AStar) in GDscript?
Though I have already posted a 100 point bounty (and it has expired), I would still be willing to post another 100 point bounty and award it, pending an answer to this question.
You could repurpose the Navigation2D node for platformer purposes. The picture below shows an example usage. The Navigation2D node makes it possible to navigate the shortest path between two points that lie within the combined navigation polygon (this is the union of all NavigationPolygonInstances).
You can use the get_simple_path method to get a vector2 array that describes the points your agent/character should try to reach (or get close to, by using some predefined margin) in sequence. Place each point in a queue, and move the character towards the different points by moving it horizontally. Whenever your agent's next point in the queue is too high up to reach, then you can make the agent jump.
I hope this makes sense!
The grey/dark-blue rectangles are platforms with collision whereas the green shapes are NavigationPolygonInstance nodes
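Here is a rough sketch of that queue-and-jump loop. I've written it in Python just to keep it compact; in Godot the same structure would live in GDScript (for instance inside _physics_process), and the points would come from Navigation2D.get_simple_path. Every name and number below is made up for illustration:

```python
# Rough sketch of the queue-and-jump loop described above (Python for brevity;
# in Godot this would be GDScript, with the path coming from get_simple_path()).

from collections import deque

class Agent:
    def __init__(self, x, y, speed=4.0, reach_margin=8.0):
        self.x, self.y = x, y
        self.speed = speed                # horizontal movement per tick
        self.reach_margin = reach_margin  # how close counts as "reached"
        self.path = deque()               # queue of (x, y) points to visit in order

    def step(self):
        """One tick: walk towards the next path point; jump when it is too high above us."""
        if not self.path:
            return "idle"
        target_x, target_y = self.path[0]
        if abs(target_x - self.x) <= self.reach_margin and abs(target_y - self.y) <= self.reach_margin:
            self.path.popleft()           # point reached (or close enough), take the next one
            return "next_point"
        if self.y - target_y > self.reach_margin:
            # Screen coordinates: smaller y is higher up, so in a real game
            # you'd set the jump velocity here and let gravity do the rest.
            return "jump"
        self.x += self.speed if target_x > self.x else -self.speed
        return "walk"

# Usage idea: fill agent.path from get_simple_path() whenever the target changes,
# then call agent.step() every physics frame and feed the result into your movement code.
agent = Agent(0.0, 100.0)
agent.path.extend([(40.0, 100.0), (80.0, 40.0)])
print([agent.step() for _ in range(12)])
```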
This approach is by no means perfect. If you were to implement slopes in your game, the agent might jump up a slope instead of simply walking up it. It is also pretty tedious to create all the shapes needed.
A more robust solution would be to have a custom graph system that you could place in the scene and whose vertices you could position. This opens up the possibility of making one-way paths and marking certain edges/connections between vertices as "jumpable" only. This is a lot more work, though, if you cannot find any such solution online.
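Sketched as a data structure, such a hand-placed graph might look something like this (Python again, purely to illustrate; every name here is invented):

```python
# Hypothetical sketch of a hand-placed graph: vertices positioned in the scene,
# directed edges (which makes one-way paths possible), and a per-edge flag for
# connections that can only be traversed by jumping.

from dataclasses import dataclass, field

@dataclass
class Edge:
    target: "Vertex"
    jump_only: bool = False        # the agent must jump to traverse this edge

@dataclass
class Vertex:
    x: float
    y: float
    edges: list = field(default_factory=list)   # outgoing edges only -> one-way paths work

ledge_a = Vertex(0.0, 100.0)
ledge_b = Vertex(120.0, 40.0)
ledge_a.edges.append(Edge(ledge_b, jump_only=True))   # up onto the ledge: jump required
ledge_b.edges.append(Edge(ledge_a))                    # dropping back down: a plain walk/fall

# A pathfinder (A*, Dijkstra, or even BFS) can then search this graph, and the
# agent jumps exactly on the edges marked jump_only.
```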
I'm working on an R Shiny app that plots (with ggplot2) information about different chromosomes, and there is also an option to show all the chromosomes together.
I have a really bad performance problem, especially with the 'all' view. I looked for some caching solutions, but didn't really find anything helpful.
I thought of putting two tabs in my app: the first for the single chromosomes, and the second for all of them together, so the second plot would run in the background and load while the user is still on the first tab. But when I read about Tabsets I found out this:
Notice that outputs that are not visible are not re-evaluated until they become visible.
So my question is: is there a way to bypass this? Or is there a way to cache a plot, so that when I want to plot it the only thing left to do is to draw it, skipping the construct, build, and render stages? (There is a good chance that I don't really understand the plotting process and that it's not possible.)
One last thing: I am aware of the (much faster) ggvis and ggobi, but these are not an option for now.
I have seen that there is a b2Manifold. What I want to accomplish is to detect whether or not a collision was on the top part of one of the objects that were colliding.
I have already set up a b2ContactListener and it works fine. I would just like to provide more accurate collisions by setting up the manifold to detect if one b2Body is on top of the other b2Body that it collided with.
How would I do this?
Thanks!
http://postimage.org/image/kbfr7c5db/
I'm a (theoretical) computer science student, and as such the semantics of programming languages is one of the subjects of my study (wikipedia).
I've played around a lot with CSS and have a reasonable understanding of the box positioning rules. (If you tell me to create a page with certain layout, I can often think of the correct box approach and applicable CSS rules.)
It would be cool to have some sort of formal semantics for the CSS box positioning rules, but after searching the net for a while, I couldn't quite find anything useful.
I mostly just end up at the CSS specifications, which are formatted as long texts with pseudo-algorithms (not the greatest reading matter; I haven't read any of these specifications with much effort just yet).
Has no one tried to formalize this “theory” into some mathematical model, more rigorous than what the specifications have to offer? I'm not looking for something complete or definitive, but it sure would be neat (and useful!) if, at least, the way boxes should be positioned could be modeled in a formal manner.
Does anyone know of such research?
Not an answer! This is an example of a possible formalization of a very simplified case (see my comment above).
Say, for instance, we're working in a world featuring (1) a known screen width $W$, and (2) an ordered list of boxes $b_1, \ldots, b_n$ which aren't nested, have no margins/paddings/borders, are floated left, and of which we know their (2.1) height and (2.2) width via the mathematical functions $h$ and $w$.
We'll be defining the functions $x$ and $y$, which state the coordinates of the left upper corner of each box.
We'll be defining/using the relations "$b_i$ starts line $l$" and "line $l$ has height $H$".
First of all, $b_1$ starts line $0$.
Then, if $b_i$ starts line $l$, and furthermore if for certain $j \geq i$ we have $\sum_{k=i}^{j} w(b_k) \leq W$ while $j = n$ or $\sum_{k=i}^{j+1} w(b_k) > W$...
...we conclude that:
line $l$ has height $\max\{\, h(b_k) \mid i \leq k \leq j \,\}$
$b_{j+1}$ starts line $l+1$ iff $j < n$
From these relations the coordinate functions follow: for a box $b_k$ lying on the line $l$ started by $b_i$, we get $x(b_k) = \sum_{m=i}^{k-1} w(b_m)$ and $y(b_k) = \sum_{l' < l} H_{l'}$, where $H_{l'}$ is the height of line $l'$. These rules define the positions of the given boxes in a formal manner. It's only one way to do so, of course, and probably not the smartest (just thought it up quickly), but it does correctly formalize the way floats work (modulo typos, I haven't checked it over well enough).
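To make the rules concrete, here is a small Python transcription of them under exactly the same simplifying assumptions (left floats only, a known screen width, no margins/paddings/borders); the function name is of course made up:

```python
# A direct (hypothetical) transcription of the rules above: boxes are placed left to
# right on a line until the next one would overflow the screen width W, then a new
# line starts; each line's height is the maximum box height on that line.

def layout_left_floats(boxes, W):
    """boxes: list of (width, height); returns the (x, y) upper-left corner of each box."""
    positions = []
    x = 0.0            # horizontal offset within the current line
    y = 0.0            # top of the current line
    line_height = 0.0  # height of the current line so far
    for w, h in boxes:
        if x > 0 and x + w > W:   # the box would not fit: start a new line
            y += line_height
            x = 0.0
            line_height = 0.0
        positions.append((x, y))
        x += w
        line_height = max(line_height, h)
    return positions

print(layout_left_floats([(40, 10), (40, 20), (40, 15)], W=100))
# -> [(0.0, 0.0), (40.0, 0.0), (0.0, 20.0)]  (the third box wraps; the first line is 20 high)
```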
When dealing with programming languages, one can choose from many of these formalisms, each invented for particular purposes (see wikipedia).
I'm just interested if anyone has ever tried to come up with some formalization for CSS box positioning. Of course the specifications go a long way, but they're just not quite as rigorous as the mathematical way forces you to be.
What are the parameters/factors that a QR detector needs to detect/check before (or during) decoding the QR code itself?
From what I know:
1. it needs to find/locate the three finder patterns
2. it needs to locate the alignment patterns (if there are any)
3. it needs to check luminance
Is there anything else that needs to be determined/checked?
I suppose that there are many ways to detect a QR code, and it's not required to do it one particular way or another as long as the detection succeeds. There is a reference algorithm in the QR code specification; in my opinion it is too slow to be practical, though it's quite thorough.
I can tell you how zxing does it. Yes, it first locates the three finder patterns. This is done by looking for 1:1:3:1:1 black/white/black/white/black crossings horizontally and vertically. It figures out which one is which by looking at the vectors between them.
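To make the ratio test concrete, here's a small sketch of checking one run of five black/white/black/white/black segments against 1:1:3:1:1 (just the idea with a made-up tolerance, not zxing's actual code):

```python
# Sketch of the 1:1:3:1:1 check on one scanline. `runs` holds the lengths of five
# consecutive black/white/black/white/black pixel runs; the tolerance is made up.

def looks_like_finder_pattern(runs, tolerance=0.5):
    if len(runs) != 5 or any(r <= 0 for r in runs):
        return False
    module = sum(runs) / 7.0            # a finder pattern is 7 modules wide in total
    expected = [1, 1, 3, 1, 1]
    return all(abs(run - ratio * module) <= ratio * module * tolerance
               for run, ratio in zip(runs, expected))

print(looks_like_finder_pattern([4, 4, 12, 4, 4]))   # clean 1:1:3:1:1       -> True
print(looks_like_finder_pattern([4, 4, 4, 4, 4]))    # 1:1:1:1:1 (alignment) -> False
```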
Then it needs a fourth point, since four points are needed to correct for perspective distortion. It uses the locations of the 3 finder patterns to guess where that fourth point is and scans for it similarly (looking for a 1:1:1:1:1 pattern). You don't need to find all alignment patterns, though doing so would allow you to correct for warping in the QR code, which is very rare.
Then you can sample the image to get the black/white modules by computing the perspective transform and reversing it. From there the decoding proceeds: processing those black/white modules, which is a fair bit of work too, but it has nothing to do with detection or image processing anymore.
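As a rough illustration of that sampling step (not zxing's implementation; the helper names and the toy image are made up), one can estimate the perspective transform from four point correspondences and then read the pixel under each module centre:

```python
import numpy as np

# Estimate a perspective transform (homography) from four correspondences, e.g.
# the three finder patterns plus the fourth point, then map the centre of every
# module from "grid space" into image space and sample the pixel there.

def homography(src_pts, dst_pts):
    """3x3 transform H with dst ~ H @ src, from four (x, y) point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def sample_modules(image, H, dimension):
    """Return a dimension x dimension boolean grid (True = dark module)."""
    grid = np.zeros((dimension, dimension), bool)
    for j in range(dimension):
        for i in range(dimension):
            u, v, w = H @ np.array([i + 0.5, j + 0.5, 1.0])  # centre of module (i, j)
            px, py = int(round(u / w)), int(round(v / w))
            if 0 <= py < image.shape[0] and 0 <= px < image.shape[1]:
                grid[j, i] = image[py, px] < 128             # dark pixel -> black module
    return grid

# Toy usage: a synthetic 21x21 "image" in which every module is exactly one pixel,
# so the transform is simply the identity.
img = np.full((21, 21), 255, np.uint8)
img[0:7, 0:7] = 0                                            # a dark corner
H = homography([(0, 0), (21, 0), (0, 21), (21, 21)],
               [(0, 0), (21, 0), (0, 21), (21, 21)])
print(sample_modules(img, H, 21)[0, 0])                      # -> True (dark module)
```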
Looking at luminance is really a step before all this, so that you even have a notion of black and white in the image to begin with. That's a different matter.