I am building an ASP.NET web site where the users may upload photos of themselves. There could be thousands of photos uploaded every day. One thing my boss has asked a few times is whether there is any way we could detect if any of the photos are showing too much 'skin' and automatically flag these as 'Adults Only' before the editors make the final decision.
Your best bet is to deal with the image in the HSV colour space (see here for RGB to HSV conversion). The hue of skin is pretty much the same across all races; it's just the saturation that changes. By dealing with the image in HSV you can simply search for the colour of skin.
You might do this by simply counting the number of pixels within a colour range, or you could perform region growing around a pixel to calculate the size of the areas of that colour.
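A minimal sketch of the region-growing idea in Python. It assumes the image has already been reduced to a 2D grid of booleans marking skin-coloured pixels; how that mask is produced, and the function name `largest_region`, are assumptions of this example rather than part of the answer.

```python
from collections import deque

def largest_region(mask):
    """Return the pixel count of the largest 4-connected region of True cells."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Grow a region outward from this seed pixel.
            size = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best
```

Comparing the largest region with the total image area gives a rough "how much contiguous skin" score, which may separate a portrait from a swimsuit shot better than a raw pixel count.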
Edit: For grainy images, you might want to run a median filter over the image first and then reduce the number of colours to segment it. You will have to play around with the settings on a large set of pre-classified (adult or not) images and see how the values behave to reach a satisfactory level of detection.
EDIT: Here's some code that should do a simple count (I haven't tested it; it's a quick mashup of some code from here and an RGB-to-HSL conversion from here).
// Requires an unsafe context; assumes a 24 or 32 bpp format with B, G, R byte order.
Bitmap bmp = new Bitmap(_image);
BitmapData bData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, bmp.PixelFormat);
byte bitsPerPixel = GetBitsPerPixel(bData.PixelFormat); // helper from the linked code, not shown
byte* scan0 = (byte*)bData.Scan0.ToPointer();
int count = 0; // number of pixels whose hue falls inside the skin range
for (int i = 0; i < bData.Height; ++i)
{
    for (int j = 0; j < bData.Width; ++j)
    {
        byte* data = scan0 + i * bData.Stride + j * bitsPerPixel / 8;
        byte r = data[2];
        byte g = data[1];
        byte b = data[0];
        int max = Math.Max(r, Math.Max(g, b));
        int min = Math.Min(r, Math.Min(g, b));
        double delta = max - min;
        double h; // hue in degrees, 0..360
        if (max == min)
            h = 0;
        else if (max == r)
            h = (60.0 * (g - b) / delta + 360.0) % 360.0;
        else if (max == g)
            h = 60.0 * (b - r) / delta + 120.0;
        else
            h = 60.0 * (r - g) / delta + 240.0;
        if (h > _lowerThresh && h < _upperThresh)
            count++;
    }
}
bmp.UnlockBits(bData);
Of course, this will fail for the first user who posts a close-up of someone's face (or hand, or foot, or whatnot). Ultimately, all these forms of automated censorship will fail until there's a real paradigm-shift in the way computers do object recognition.
I'm not saying that you shouldn't attempt it nonetheless, but I want to point out these problems. Do not expect a perfect (or even good) solution. It doesn't exist.
I doubt that there exists any off-the-shelf software that can determine if the user uploads a naughty picture. Your best bet is to let users flag images as 'Adults Only' with a button next to the picture. (Clarification: I mean users other than the one who uploaded the picture--similar to how posts can be marked offensive here on StackOverflow.)
Also, consider this review of an attempt to do the same thing in a dedicated product: http://www.dansdata.com/pornsweeper.htm.
Link stolen from today's StackOverflow podcast, of course :).
We can't even write filters that detect dirty words accurately in blog posts, and your boss is asking for a porno detector? CLBUTTIC!
I would say your answer lies in crowdsourcing the task. This almost always works and tends to scale very well.
It doesn't have to involve making some users into "admins" and coming up with different permissions; it can be as simple as adding an "inappropriate" link near each image and keeping a count.
See the seminal paper "Finding Naked People" by Fleck/Forsyth published in ECCV. (Advanced).
http://www.cs.hmc.edu/~fleck/naked.html
Interesting question from a theoretical / algorithmic standpoint. One approach to the problem would be to flag images that contain large skin-colored regions (as explained by Trull).
However, the amount of skin shown is not what determines whether an image is offensive; it's rather the location of the skin shown. Perhaps you can use face detection (search for algorithms) to refine the results: determine how large the skin regions are relative to the face, and whether they belong to the face (perhaps how far below it they are).
I know either Flickr or Picasa has implemented this. I believe the routine was called FleshFinder.
A tip on the architecture of doing this:
Run this as a Windows service, separate from the ASP.NET pipeline. Instead of analyzing images in real time, create a queue of newly uploaded images for the service to work through.
You can use the normal System.Drawing stuff if you want, but if you really need to process a lot of images, it would be better to use native code and a high-performance graphics library, and P/Invoke the routine from your service.
As resources become available, process images in the background and flag the suspicious ones for the editors' review. This should prune the number of images to review significantly, while not annoying people who upload pictures of skin-colored houses.
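The queue-plus-background-worker shape described above can be sketched in a few lines of Python. This is an in-process stand-in for the separate Windows service, not a deployment recipe; `looks_suspicious`, `worker` and `process_all` are hypothetical names, and the "analysis" is a placeholder rule.

```python
import queue
import threading

def looks_suspicious(path):
    # Placeholder for the real image analysis; hypothetical rule for the demo.
    return "beach" in path

def worker(uploads, flagged):
    """Drain the upload queue in the background and record suspicious images."""
    while True:
        path = uploads.get()
        if path is None:          # sentinel: no more work
            uploads.task_done()
            break
        if looks_suspicious(path):
            flagged.append(path)
        uploads.task_done()

def process_all(paths):
    uploads = queue.Queue()
    flagged = []
    thread = threading.Thread(target=worker, args=(uploads, flagged))
    thread.start()
    for path in paths:            # the web request only enqueues and returns
        uploads.put(path)
    uploads.put(None)
    thread.join()
    return flagged
```

The point of the design is that the upload request never waits for image analysis; the editors simply see a shorter review list once the worker has caught up.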
I would approach the problem from a statistical standpoint. Get a bunch of pictures that you consider safe, and a bunch that you don't (that will make for a fun day of research), and see what they have in common. Analyze them all for color range and saturation to see if you can pick out characteristics that all of the naughty photos, and few of the safe ones have.
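As a rough illustration of that statistical approach, suppose each picture has been boiled down to a single score (for example, the fraction of skin-coloured pixels). The scores, the labels and the helper `best_threshold` below are all made up for the example; the sketch just picks the cut-off that classifies the most labelled pictures correctly.

```python
def best_threshold(safe_scores, unsafe_scores):
    """Pick the cut-off that classifies the most labelled examples correctly.

    Scores at or above the threshold are treated as 'unsafe'.
    """
    candidates = sorted(set(safe_scores) | set(unsafe_scores))
    best_t, best_correct = None, -1
    for t in candidates:
        correct = (sum(s < t for s in safe_scores)
                   + sum(u >= t for u in unsafe_scores))
        if correct > best_correct:
            best_t, best_correct = t, correct
    total = len(safe_scores) + len(unsafe_scores)
    return best_t, best_correct / total

safe = [0.05, 0.10, 0.20, 0.45]     # invented scores for pictures judged safe
unsafe = [0.40, 0.60, 0.70, 0.90]   # invented scores for pictures judged unsafe
threshold, accuracy = best_threshold(safe, unsafe)
```

With a single feature the classes will overlap, as they do here, which is why the remaining misclassifications still need a human editor.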
Perhaps the Porn Breath Test would be helpful - as reported on Slashdot.
Rigan Ap-apid presented a paper at WorldComp '08 on just this problem space. The paper is allegedly here, but the server was timing out for me. I attended the presentation of the paper and he covered comparable systems and their effectiveness as well as his own approach. You might contact him directly.
I'm afraid I can't help point you in the right direction, but I do remember reading about this being done before. It was in the context of people complaining about baby pictures being caught and flagged mistakenly. If nothing else, I can give you the hope that you don't have to invent the wheel all by yourself... Someone else has been down this road!
CrowdSifter by Dolores Labs might do the trick for you. I read their blog all the time, as they seem to love statistics and crowdsourcing and like to talk about it. They use Amazon's Mechanical Turk for a lot of their processing and know how to process the results to get the right answers out of things. Check out their blog at the very least to see some cool statistical experiments.
As mentioned above by Bill (and Craig's google quote) statistical methods can be highly effective.
Two approaches you might want to look into are:
Neural Networks
Multi Variate Analysis (MVA)
The MVA approach would be to get a "representative sample" of acceptable pictures and of unacceptable pictures. The X data would be an array of bytes from each picture, the Y would be assigned by you as a 1 for unacceptable and a 0 for acceptable. Create a PLS model using this data. Run new data against the model and see how well it predicts the Y.
Rather than this binary approach you could have multiple Y's (e.g. 0=acceptable, 1=swimsuit/underwear, 2=pornographic)
To build the model you can look at open source software or there are a number of commercial packages available (although they are typically not cheap)
Because even the best statistical approaches are not perfect, also including user feedback would probably be a good idea.
Good luck (and worst case you get to spend time collecting naughty pictures as an approved and paid activity!)
I am suddenly in a recursive language class (sml) and recursion is not yet physically sensible for me. I'm thinking about the way a floor of square tiles is sometimes a model or metaphor for integer multiplication, or Cuisenaire Rods are a model or analogue for addition and subtraction. Does anyone have any such models you could share?
Imagine you're a real life magician, and can make a copy of yourself. You create your double a step closer to the goal and give him (or her) the same orders as you were given.
Your double does the same to his copy. He's a magician too, you see.
When the final copy finds itself created at the goal, it has nowhere more to go, so it reports back to its creator. Which does the same.
Eventually, you get your answer back – without having moved an inch – and can now create the final result from it, easily. You get to pretend not knowing about all those doubles doing the actual hard work for you. "Hmm," you're saying to yourself, "what if I were one step closer to the goal and already knew the result? Wouldn't it be easy to find the final answer then ?" (*)
Of course, if you were a double, you'd have to report your findings to your creator.
More here.
(also, I think I saw this "doubles" creation chain event here, though I'm not entirely sure).
(*) and that is the essence of the recursion method of problem solving.
How do I know my procedure is right? If my simple little combination step produces a valid solution, under the assumption that it produced the correct solution for the smaller case, all I need is to make sure it works for the smallest case – the base case – and then by induction the validity is proven!
Another possibility is divide-and-conquer, where we split our problem in two halves, so will get to the base case much much faster. As long as the combination step is simple (and preserves validity of solution of course), it works. In our magician metaphor, I get to create two copies of myself, and combine their two answers into one when they are finished. Each of them creates two copies of themselves as well, so this creates a branching tree of magicians, instead of a simple line as before.
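Both shapes can be sketched in Python with the simplest possible task, adding up a list. The function names are invented for the illustration: `total_linear` is the single line of doubles, `total_halves` is the branching tree.

```python
def total_linear(items):
    """One double per step: solve the problem for everything but the first item."""
    if not items:                 # base case: nothing left to add
        return 0
    return items[0] + total_linear(items[1:])

def total_halves(items):
    """Two doubles per step: each handles half, and we combine their answers."""
    if len(items) <= 1:           # base case: zero or one item
        return items[0] if items else 0
    mid = len(items) // 2
    return total_halves(items[:mid]) + total_halves(items[mid:])
```

In both versions the combination step is a single addition, and the induction argument is the same: if the smaller calls are right and the base case is right, the whole answer is right. The halving version just reaches the base case after about log2(n) levels instead of n.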
A good example is the Sierpinski triangle, a figure that is built from three quarter-sized Sierpinski triangles simply by stacking them up at their corners.
Each of the three component triangles is built according to the same recipe.
Although it doesn't have the base case, and so the recursion is unbounded (bottomless; infinite), any finite representation of S.T. will presumably draw just a dot in place of the S.T. which is too small (serving as the base case, stopping the recursion).
There's a nice picture of it in the linked Wikipedia article.
Recursively drawing an S.T. without the size limit will never draw anything on screen! For mathematicians recursion may be great, engineers though should be more cautious about it. :)
Switching to corecursion ⁄ iteration (see the linked answer for that), we would first draw the outlines, and the interiors after that; so even without the size limit the picture would appear pretty quickly. The program would then be busy without any noticeable effect, but that's better than the empty screen.
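A small Python sketch of the two drawing strategies, with "drawing" replaced by collecting corner coordinates. The depth limit stands in for the "too small to draw" base case, and all names here (`sierpinski`, `sierpinski_levels`, `midpoint`) are invented for the example.

```python
from collections import deque
from itertools import islice

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def sierpinski(a, b, c, depth):
    """Recursive version: the depth limit plays the role of the base case."""
    if depth == 0:
        return [(a, b, c)]        # too small to subdivide: draw it as is
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return (sierpinski(a, ab, ca, depth - 1)
            + sierpinski(ab, b, bc, depth - 1)
            + sierpinski(ca, bc, c, depth - 1))

def sierpinski_levels(a, b, c):
    """Iterative version: yield outlines largest first, without ever stopping."""
    pending = deque([(a, b, c)])
    while pending:
        a, b, c = pending.popleft()
        yield (a, b, c)           # "draw" this outline now
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # The three corner triangles wait until the current level is done.
        pending.extend([(a, ab, ca), (ab, b, bc), (ca, bc, c)])

# The generator has no size limit, so the caller decides when to stop.
first = list(islice(sierpinski_levels((0, 0), (1, 0), (0.5, 1)), 4))
```

Without the `depth` argument the recursive version would never return anything, while the generator produces the big outlines immediately and only then keeps refining.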
I came across this piece from Edsger W. Dijkstra, in which he tells how his child grasped recursion:
A few years later a five-year old son would show me how smoothly the idea of recursion comes to the unspoilt mind. Walking with me in the middle of town he suddenly remarked to me, Daddy, not every boat has a lifeboat, has it? I said How come? Well, the lifeboat could have a smaller lifeboat, but then that would be without one.
I love this question and couldn't resist adding an answer...
Recursion is the Russian doll of programming. The first example that comes to my mind is closer to an example of mutual recursion:
Mutual recursion everyday example
Mutual recursion is a particular case of recursion (and sometimes it's easier to understand something from a particular case than from the generic one) in which two functions A and B are defined such that A calls B and B calls A. You can experiment with this very easily using a webcam (it also works with two mirrors):
Display the webcam output on your screen with VLC, or any software that can do it.
Point your webcam at the screen.
The screen will progressively display an infinite "vortex" of screens.
What happens?
The webcam (A) captures the screen (B).
The screen displays the image captured by the webcam (the screen itself).
The webcam captures the screen with a screen displayed on it.
The screen displays that image (now there are two screens displayed).
And so on.
You finally end up with such an image (yes, my webcam is total crap):
"Simple" recursion is more or less the same except that there is only one actor (function) that calls itself (A calls A)
"Simple" Recursion
That's more or less the same answer as @WillNess's, but with a little code and some interactivity (using the JS snippets of SO).
Let's say you are a very motivated gold-miner looking for gold, with a very tiny mine, so tiny that you can only look for gold vertically. And so you dig, and you check for gold. If you find some, you don't have to dig anymore, just take the gold and go. But if you don't, that means you have to dig deeper. So there are only two things that can stop you:
Finding some gold nugget.
The Earth's boiling core of molten iron.
So if you want to write this programmatically -using recursion-, that could be something like this :
// This function only generates a probability of 1/10
function checkForGold() {
  let rnd = Math.round(Math.random() * 10);
  return rnd === 1;
}

function digUntilYouFind() {
  if (checkForGold()) {
    return 1; // he found something, no need to dig deeper
  }
  // gold not found, digging deeper
  return digUntilYouFind();
}

let gold = digUntilYouFind();
console.log(`${gold} nugget found`);
Or with a little more interactivity :
// This function only generates a probability of 1/10
function checkForGold() {
  console.log("checking...");
  let rnd = Math.round(Math.random() * 10);
  return rnd === 1;
}

function digUntilYouFind() {
  if (checkForGold()) {
    console.log("OMG, I found something !");
    return 1;
  }
  try {
    console.log("digging...");
    return digUntilYouFind();
  } finally {
    console.log("climbing back...");
  }
}

let gold = digUntilYouFind();
console.log(`${gold} nugget found`);
If we don't find any gold, the digUntilYouFind function calls itself. When the miner "climbs back" out of his mine, that is actually the deepest child call returning the gold nugget up through all its parents (the call stack) until the value can be assigned to the gold variable.
Here the probability is high enough to keep the miner from digging all the way to the Earth's core. The Earth's core is to the miner what the stack size is to a program. When the miner reaches the core, he dies in terrible pain; when the program exceeds the stack size (a stack overflow), it crashes.
There are optimizations, such as tail-call optimization, that a compiler/interpreter can apply to allow unbounded levels of recursion.
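To make the stack limit concrete, here is a hedged Python sketch. CPython does not perform tail-call optimization, so an unlucky miner hits a `RecursionError`; rewriting the tail call as a loop by hand gives the same effect the optimization would. The function names are invented, and the "gold" is placed at a fixed depth so the example is deterministic.

```python
import sys

def dig_recursive(depth=0):
    """Never finds gold, so it digs until the interpreter's stack limit."""
    return dig_recursive(depth + 1)

def dig_loop(found_gold_at):
    """The same tail call rewritten as a loop: constant stack use."""
    depth = 0
    while depth != found_gold_at:
        depth += 1
    return depth

def hits_the_core():
    """Report whether unbounded recursive digging overflows the stack."""
    try:
        dig_recursive()
    except RecursionError:
        return True
    return False

# The loop happily goes ten times deeper than the recursion limit allows.
deep = dig_loop(sys.getrecursionlimit() * 10)
```

A language with guaranteed tail-call optimization would run the recursive form with the same constant stack use as `dig_loop`.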
Take fractals as being recursive: the same pattern gets applied each time, yet each figure differs from another.
As natural phenomena with fractal features, Wikipedia presents:
Mountain ranges
Frost crystals
DNA
and, even, proteins.
This is odd, and not quite a physical example except insofar as dance-movement is physical. It occurred to me the other morning. I call it "Written in Latin, solved in Hebrew." Huh? Surely you are saying "Huh?"
By it I mean that encoding a recursion is usually done left-to-right, in the Latin alphabet style: "Def fac(n) = n*(fac(n-1))." The movement style is "outermost case to base case."
But (please check me on this) at least in this simple case, it seems the easiest way to evaluate it is right-to-left, in the Hebrew alphabet style: Start from the base case and move outward to the outermost case:
(fac(0) = 1)
(fac(1) = 1)*(fac(0) = 1)
(fac(2))*(fac(1) = 1)*(fac(0) = 1)
(fac(n))*(fac(n-1))*...*(fac(2))*(fac(1) = 1)*(fac(0) = 1)
(* Easier order to calculate <<<<<<<<<<< is leftwards,
base outwards to outermost case;
more difficult order to calculate >>>>>> is rightwards,
outermost case to base *)
Then you do not have to suspend items on the left while awaiting the results of calculations further right. "Dance Leftwards" instead of "Dance rightwards"?
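The two directions can be put side by side in Python (the function names are invented for the comparison). The first mirrors the written definition and leaves a multiplication pending at every level; the second starts at the base case and never has anything waiting.

```python
def fac_outer_to_base(n):
    """'Latin' direction: each call waits for the result to its right."""
    return 1 if n == 0 else n * fac_outer_to_base(n - 1)

def fac_base_outward(n):
    """'Hebrew' direction: start at fac(0) and build up, nothing left waiting."""
    result = 1                    # fac(0)
    for k in range(1, n + 1):
        result *= k               # fac(k) from fac(k - 1)
    return result
```

Both compute the same values; the difference is only in which end of the chain does the work first.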
Okay, so I am trying to drive a 7-segment display in order to display temperature in degrees Celsius. So, I have two displays, plus one extra LED to indicate positive and negative numbers.
My problem lies in the software. I have to find some way of driving these displays, which means converting a given integer into the relevant voltages on the pins, which means that for each of the two displays I need to know the number of tens and number of 1s in the integer.
So far, what I have come up with will not be very nice for an Arduino, as it relies on division.
tens = numberToDisplay / 10;
ones = numberToDisplay % 10;
I have admittedly not tested this yet, but I think I can assume that for a microcontroller with limited division capabilities this is not an optimal solution.
I have racked my brain and looked around for a solution using addition/subtraction/bitwise operations, but I cannot think of one at all. This division is the only one I can see.
For this application it's fine. You don't need to get bothered with performance in a simple thermometer.
If, however, you do need something quicker than division and modulo, bitwise operations can help. Basically, you would use the bitwise & operator to compare the value to display against the bit patterns that describe each digit's segments.
See the project here for example: http://fritzing.org/projects/2-digit-7-segment-0-99-counting-with-arduino/
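A hedged sketch of both ideas, written in Python for readability (the same logic carries over to Arduino C almost line for line). The tens/ones split uses only comparison and subtraction, and the segment lookup uses `&` against a pattern table. The bit order (bit 0 = segment a through bit 6 = segment g, 1 = lit) and the function names are assumptions of this example; the actual wiring decides the real mapping.

```python
# Segment bit order assumed here: bit 0 = a, bit 1 = b, ... bit 6 = g.
SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]

def split_digits(value):
    """Return (tens, ones) for 0..99 using only comparison and subtraction."""
    tens = 0
    while value >= 10:
        value -= 10
        tens += 1
    return tens, value

def segment_states(digit):
    """Return seven booleans (a..g): which segments to switch on."""
    pattern = SEGMENTS[digit]
    return [bool(pattern & (1 << bit)) for bit in range(7)]

tens, ones = split_digits(42)
```

For a two-digit value the subtraction loop runs at most nine times, so it is cheap even on a small microcontroller.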
You might also try using a 7-seg display driver chip to simplify your output and save pins. The MC14511BCP (a "4511") is a good one. It'll translate binary coded decimal (BCD) to the appropriate 7-seg configuration. Spec sheets are available here and they can be commonly found at electronics parts stores online.
The typical FFT for audio looks pretty similar to this, with most of the action happening on the far left side
http://www.flight404.com/blog/images/fft.jpg
He multiplied it by a partial sine wave to get it to the bottom, but the article isn't too specific on this part of it. It also seems like a "good enough" modification of the dataset, rather than one based on some property. I understand that human hearing is better suited to the higher frequencies, thus, most music will have amplified bass and attenuated treble so that both sound to us as being of relatively equal strength.
My question is what modification needs to be done to the FFT to compensate for this standard falloff?
for(i = 0; i < fft.length; i++){
fft[i] = fft[i] * Math.log(i + 1); // does, eh, ok but the high
// end is still not really "loud"
// enough
}
EDIT ::
http://en.wikipedia.org/wiki/Equal-loudness_contour
I came across this article; I think it might be the direction to head in, but there still might be some property of an FFT that needs to be counteracted.
First, are you sure you want to do this? It makes sense to compensate for some things, like the microphone response not being flat, but not human perception. People are used to hearing sounds with the spectral content that the sounds have in the real world, not along perceptual equal loudness curves. If you play a sound that you've modified in the way you suggest it would sound strange. Maybe some people like the music to have enhanced low frequencies, but this is a matter of taste, not psychophysics.
Or maybe you are compensating for some other reason, for example, taking into account the poorer sensitivity to lower frequencies might enhance a compression algorithm. Is this the idea?
If you do want to normalize by the equal loudness curves, one should note that most of the curves and equations are in terms of sound pressure level (SPL). SPL is the log of the square of the waveform amplitude, so when you work with the FFTs, it's probably easiest to work with their square (the power spectra). (Or, of course, you could compensate in other ways by, say, multiplying by sqrt(log(i+1)) in your equation above, assuming that the log was an approximation of the inverse equal-loudness curve.)
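As a sketch of working in the power/dB domain, the Python below converts FFT magnitudes to power levels and adds a per-bin weighting in dB. The weighting used is the standard A-weighting curve, swapped in here as a stand-in because it has a simple closed form; it is not the equal-loudness contour data itself, and whether it suits a visualizer is a judgment call. The function names and the assumption that `magnitudes` holds bins 0..N/2 of a real FFT are mine.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB (standard analytic form; about 0 dB at 1 kHz)."""
    f2 = freq_hz * freq_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0

def weighted_power_db(magnitudes, sample_rate):
    """Convert FFT magnitudes (bins 0..N/2) to weighted power levels in dB."""
    n_fft = 2 * (len(magnitudes) - 1)
    levels = []
    for i, mag in enumerate(magnitudes):
        if i == 0 or mag == 0:
            levels.append(float("-inf"))   # DC or silent bin: nothing to weight
            continue
        freq = i * sample_rate / n_fft
        power_db = 10.0 * math.log10(mag * mag)
        levels.append(power_db + a_weighting_db(freq))
    return levels
```

Because the levels are logarithmic, the weighting is an addition rather than a multiplication, which is the practical upside of working with the power spectrum.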
I think the equal loudness contour is exactly the right direction.
However, its shape depends on the absolute pressure level.
In other words the sensitivity curve of our hearing changes with sound pressure.
There is no "correct normalization" if you have no information about absolute levels.
Whether this is a problem depends on what you want to do with the data.
The loudness contour is standardized in ISO 226 but this document is not freely available for download. It should be in a decent university library though.
Here is another source for loudness contours.
So you are trying to raise the level of the high end frequencies? Sounds like a high pass filter with a minimum multiplier might work, so that you don't attenuate the low frequency signals too much. Pick up a good book on filter design, maybe monkey around with this applet
In the old days of the first samplers (this is before MOTU Boost, people :) it wasn't FFT but simple normalisation (Fairlight or Roland did it first, I think), done on the original or resulting time-domain signal (if you are doing beat slicing, ReCycle-style). Can't you do that? Or only go for the FFT after you have compensated for it?
Seems like a two phase procedure otherwise, I'd personally leave FFT as is for the task..
We have a particle detector hard-wired to use 16-bit and 8-bit buffers. Every now and then, there are certain [predicted] peaks of particle fluxes passing through it; that's okay. What is not okay is that these fluxes usually reach magnitudes above the capacity of the buffers to store them; thus, overflows occur. On a chart, they look like the flux suddenly drops and begins growing again. Can you propose a [mostly] accurate method of detecting points of data suffering from an overflow?
P.S. The detector is physically inaccessible, so fixing it the 'right way' by replacing the buffers doesn't seem to be an option.
Update: Some clarifications as requested. We use python at the data processing facility; the technology used in the detector itself is pretty obscure (treat it as if it was developed by a completely unrelated third party), but it is definitely unsophisticated, i.e. not running a 'real' OS, just some low-level stuff to record the detector readings and to respond to remote commands like power cycle. Memory corruption and other problems are not an issue right now. The overflows occur simply because the designer of the detector used 16-bit buffers for counting the particle flux, and sometimes the flux exceeds 65535 particles per second.
Update 2: As several readers have pointed out, the intended solution would have something to do with analyzing the flux profile to detect sharp declines (e.g. by an order of magnitude) in an attempt to separate them from normal fluctuations. Another problem arises: can restorations (points where the original flux drops below the overflowing level) be detected by simply running the correction program against the reverted (by the x axis) flux profile?
// Java sketch of the original pseudocode: unwrap 16-bit samples into 32-bit values
static int[] unwrap(short[] x)
{
    int[] y = new int[x.length];
    y[0] = x[0]; // assumes the first sample has not wrapped
    for (int i = 1; i < x.length; i++)
    {
        // the cast to short wraps the difference to 16 bits before sign extension
        y[i] = y[i-1] + signExtend((short)(x[i] - x[i-1]));
        // works fine as long as the "real" values of x[i] and x[i-1]
        // differ by less than 1/2 of the span of allowable values
        // of x's storage type (=32768 in the case of int16)
        // Otherwise there is ambiguity.
    }
    return y;
}
static int signExtend(short x)
{
    return x; // widening a short to an int sign-extends in Java
}
// exercise for the reader to write similar code to unwrap 8-bit arrays
// to a 16-bit or 32-bit array
Of course, ideally you'd fix the detector software to max out at 65535 to prevent wraparound of the sort that is causing your grief. I understand that this isn't always possible, or at least isn't always possible to do quickly.
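Since the data processing is done in Python, here is a hedged Python sketch of the same unwrapping idea for unsigned counter readings. `unwrap_counts` is an invented name, the sample readings are made up, and the method relies on the same assumption as above: the true value changes by less than half the counter span between consecutive samples.

```python
def unwrap_counts(raw, bits=16):
    """Undo wraparound in a sequence of unsigned counter readings."""
    span = 1 << bits              # 65536 for 16-bit buffers
    half = span // 2
    if not raw:
        return []
    result = [raw[0]]             # assumes the first reading has not wrapped
    for prev, cur in zip(raw, raw[1:]):
        delta = (cur - prev) % span
        if delta >= half:         # interpret large steps as negative ones
            delta -= span
        result.append(result[-1] + delta)
    return result

readings = [5000, 10000, 30000, 50000, 4464, 24464, 60000, 30000, 10000]
restored = unwrap_counts(readings)
```

Passing `bits=8` handles the 8-bit buffers the same way, with a correspondingly tighter limit (128) on how far consecutive true values may differ.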
When the particle flux exceeds 65535, does it do so quickly, or does the flux gradually increase and then gradually decrease? This makes a difference in what algorithm you might use to detect this. For example, if the flux goes up slowly enough:
true flux    measurement
     5000           5000
    10000          10000
    30000          30000
    50000          50000
    70000           4464
    90000          24464
    60000          60000
    30000          30000
    10000          10000
then you'll tend to have a large negative drop at times when you have overflowed. A much larger negative drop than you'll have at any other time. This can serve as a signal that you've overflowed. To find the end of the overflow time period, you could look for a large jump to a value not too far from 65535.
All of this depends on the maximum true flux that is possible and on how rapidly the flux rises and falls. For example, is it possible to get more than 128k counts in one measurement period? Is it possible for one measurement to be 5000 and the next measurement to be 50000? If the data is not well-behaved enough, you may be able to make only statistical judgment about when you have overflowed.
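The "large drop starts an overflow, large jump ends it" rule can be sketched in Python as follows. The threshold of half the counter span is an assumption (it should really come from what is known about how fast the flux can change), `flag_overflows` is an invented name, and the sketch only handles a single wrap, not fluxes above 128k.

```python
def flag_overflows(measured, span=65536, threshold=None):
    """Mark samples that appear to lie inside an overflow period."""
    if threshold is None:
        threshold = span // 2     # assumed: real steps are smaller than this
    flags = []
    overflowed = False
    previous = None
    for value in measured:
        if previous is not None:
            step = value - previous
            if step < -threshold:
                overflowed = True      # sharp drop: counter wrapped
            elif step > threshold:
                overflowed = False     # sharp jump back up: counter recovered
        flags.append(overflowed)
        previous = value
    return flags

measured = [5000, 10000, 30000, 50000, 4464, 24464, 60000, 30000, 10000]
flags = flag_overflows(measured)
corrected = [v + 65536 if f else v for v, f in zip(measured, flags)]
```

If the real data contains legitimate steps larger than the threshold, the flags become guesses, which is the point made above about only being able to make statistical judgments.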
Your question needs to provide more information about your implementation - what language/framework are you using?
Data overflows in software (which is what I think you're talking about) are bad practice and should be avoided. What you are seeing (strange data output) is only one possible side effect of data overflows; it is merely the tip of the iceberg of the sorts of issues you can see.
You could quite easily experience more serious issues like memory corruption, which can cause programs to crash loudly, or worse, obscurely.
Is there any validation you can do to prevent the overflows from occurring in the first place?
I really don't think you can fix it without fixing the underlying buffers. How are you supposed to tell the difference between the sequences of values (0, 1, 2, 1, 0) and (0, 1, 65538, 1, 0)? You can't.
How about using an HMM where the hidden state is whether you are in an overflow and the emissions are observed particle flux?
The tricky part would be coming up with the probability models for the transitions (which will basically encode the time-scale of peaks) and for the emissions (which you can build if you know how the flux behaves and how overflow affects measurement). These are domain-specific questions, so there probably aren't ready-made solutions out there.
But once you have the model, everything else (fitting your data, quantifying uncertainty, simulation, etc.) is routine.
You can only do this if the actual jumps between successive values are much smaller than 65536. Otherwise, an overflow-induced valley artifact is indistinguishable from a real valley, and you can only guess. You can try to match overflows to their corresponding restorations by analysing the signal from the right and from the left simultaneously (assuming that there is a recognizable baseline).
Other than that, all you can do is to adjust your experiment by repeating it with different original particle flows, so that real valleys will not move, but artifact ones move to the point of overflow.
I have 3 Bitmap pointers.
Bitmap* totalCanvas = new Bitmap(400, 300, PixelFormat32bppARGB); // final canvas
Bitmap* bottomLayer = new Bitmap(400, 300,PixelFormat32bppARGB); // background
Bitmap* topLayer = new Bitmap(XXX); // always changed.
I will draw complex background on bottomLayer. I don't want to redraw complex background on totalCanvas again and again, so I stored it in bottomLayer.
TopLayer changed frequently.
I want to draw bottomLayer to totalCanvas. Which is the fastest way?
Graphics canvas(totalCanvas);
canvas.DrawImage(bottomLayer, 0, 0); // step 1
canvas.DrawImage(topLayer, XXXXX); // step 2
I want step1 to be as fast as possible. Can anyone give me some sample?
Thanks very much!
Thanks for unwind's answer. I wrote the following code:
Graphics canvas(totalCanvas);
for (int i = 0; i < 100; ++i)
{
canvas.DrawImage(bottomLayer, 0,0);
}
This part takes 968 ms, which is too slow.
Almost all GDI+ operations should be implemented by the driver to run as much as possible on the GPU. This should mean that a simple 2D bitmap copy operation is going to be "fast enough", for even quite large values of "enough".
My recommendation is the obvious one: don't sweat it by spending time hunting for a "fastest" way of doing this. You have formulated the problem very clearly, so just try implementing it that clearly, by doing it as you've outlined in the question. Then you can of course go ahead and benchmark it and decide to continue the hunt.
A simple illustration:
A 32 bpp 400x300 bitmap is about 469 KB in size. According to this handy table, an Nvidia GeForce 4 MX from 2002 has a theoretical memory bandwidth of 2.6 GB/s. Assuming the copy is done in pure "overwrite" mode, i.e. no blending of the existing surface (which sounds right, as your copy is basically a way of "clearing" the frame to the copy's source data), and an overhead factor of four just to be safe, we get:
(2.6 * 2^30 / (4 * 469 * 2^10)) = 1453
This means your copy should run at 1453 FPS, which I happily assume to be "good enough".
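The back-of-the-envelope figure can be reproduced with a few lines of Python. `estimated_fps` is just a name for the arithmetic, and the inputs are the ones quoted above; using the exact frame size of 480,000 bytes instead of the rounded 469 KB gives about 1454 rather than 1453.

```python
def estimated_fps(width, height, bytes_per_pixel, bandwidth_gib_s, overhead):
    """Theoretical full-frame copies per second at a given memory bandwidth."""
    frame_bytes = width * height * bytes_per_pixel
    bandwidth_bytes = bandwidth_gib_s * 2 ** 30
    return bandwidth_bytes / (overhead * frame_bytes)

fps = estimated_fps(400, 300, 4, 2.6, 4)
```

This is an upper bound on raw memory traffic only; it says nothing about per-call overhead or about whether the copy actually runs on the GPU.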
If at all possible (and it looks like it is, from your code), using DrawImageUnscaled will be significantly faster than DrawImage. Or, if you are using the same image over and over again, create a TextureBrush and use that.
The problem with GDI+ is that, for the most part, it is unaccelerated. To get lightning-fast drawing speeds you really need GDI and BitBlt, which is a serious pain in the butt to use with GDI+, especially if you are in managed code (it's hard to tell whether you are using managed C++ or straight C++).
See this post for more information about graphics quickly in .net.