I calculated an average in crossfilter, but it's wrong. I can't understand why. Can you point out my error? - crossfilter

You can see my gistup here: http://bl.ocks.org/markarios/058f85800d598fc9f2b6
While checking out Reductio, I calculated the average PPI per device type, and the following code is producing the wrong result. The only thing I can think of is that I somehow need to use the index of ppi_device_sum[i].key, but I'm not sure how to reference that.
Thanks in advance for your time!
// What's the average PPI per device?
write("");
write("Average PPI By Type");
for (var i = 0; i < type_device_count.length; i++) {
    write(ppi_device_sum[i].key + "(s): " + ppi_device_sum[i].value / type_device_count[i].value);
}
Product Types
tablet(s): 7
desktop monitor(s): 4
laptop(s): 2
smartphone(s): 2
desktop(s): 1
Total PPI by Device Type
tablet(s): 1997
smartphone(s): 770
desktop monitor(s): 444
laptop(s): 350
desktop(s): 108
Average PPI By Type
tablet(s): 285.2857142857143 (correct)
smartphone(s): 192.5 (incorrect, should be 385)
desktop monitor(s): 222 (incorrect, should be 111)
laptop(s): 175 (correct)
desktop(s): 108 (correct)

Probably best to sort your arrays by key before you iterate through them so that their keys are in the same order (JavaScript Array.prototype.sort() method is fine for this).
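For example, something like this (a sketch reusing the arrays and the write helper from your question; it sorts copies so the value-ordered output from top() is left untouched):
// Sort copies of both group arrays by key so index i refers to the same device type in each.
var byKey = function(a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; };
var sums = ppi_device_sum.slice().sort(byKey);
var counts = type_device_count.slice().sort(byKey);

write("");
write("Average PPI By Type");
for (var i = 0; i < counts.length; i++) {
    write(sums[i].key + "(s): " + sums[i].value / counts[i].value);
}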
If you find any problems with the calculations in Reductio, please file an issue on Github. It is very raw at the moment. I will be integrating it into a larger application in the next couple of weeks, so it will be getting more use and eyes on it at that point.
One other note: In your gist you are doing something that makes me think you are working under a very common misconception about how Crossfilter works. It's not exactly intuitive, but this
// calculate the number of device types
var type_count = type.group().reduceCount().size();
// how many of each device are there?
var type_device_count = type.group()
    .reduceCount()
    .top(type_count);
is doing the same thing as this
// Build the Crossfilter group.
var typeGroup = type.group(); // .reduceCount() is the default
// calculate the number of device types
var type_count = typeGroup.size(); // Now redundant
// how many of each device are there?
var type_device_count = typeGroup.top(Infinity); // Returns all groups
The latter is the better way to do things because once you've created a Crossfilter group, that group will be updated when new data is added to the Crossfilter and when you filter on other dimensions. So typeGroup.size() and typeGroup.top(Infinity) will return different results as the contents and filters on your Crossfilter change. Keeping these groups updated uses resources, so you want to create as few dimensions and groups as possible to accomplish your task.
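Another way to sidestep the key-alignment problem entirely, while keeping a single live group, is to track count and sum together with Crossfilter's plain group.reduce(add, remove, initial) and compute the average when you read the group out. A sketch (plain Crossfilter, no Reductio), assuming each record has a ppi property; rename to match your data:
var typeAvgGroup = type.group().reduce(
    function (p, v) { p.count++; p.sum += v.ppi; return p; },  // add a record
    function (p, v) { p.count--; p.sum -= v.ppi; return p; },  // remove a record
    function ()     { return { count: 0, sum: 0 }; }           // initial value per key
);

// all() returns the groups in key order, so nothing can get misaligned.
typeAvgGroup.all().forEach(function (g) {
    write(g.key + "(s): " + (g.value.count ? g.value.sum / g.value.count : 0));
});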

Related

Maximum number of elements in map

What is the maximum number of elements that can be stored in a map in Go? If I need to access data from the map frequently, is it a good idea to keep adding items to the map and retrieving them from it in a long-running program?
There is no theoretical limit to the number of elements in a map except the maximum value of the map-length type, which is int. The maximum value of int depends on the target architecture you compile for: it is 1<<31 - 1 = 2147483647 on 32-bit and 1<<63 - 1 = 9223372036854775807 on 64-bit.
Note that as an implementation restriction you may not be able to add exactly max-int elements, but the order of magnitude will be the same.
Since the builtin map type uses a hashmap implementation, access time complexity is usually O(1), so it is perfectly fine to add many elements to a map; you can still access elements very fast. Note, however, that adding many elements causes rehashing and rebuilding of the internals, which requires some additional computation; this may happen occasionally as new keys are added to the map.
If you can "guess" or estimate the size of your map, you can create your map with a big capacity to avoid rehashing. E.g. you can create a map with space for a million elements like this:
m := make(map[string]int, 1e6)
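For example, a minimal sketch of the pattern, preallocating once and then adding and reading in the long-running part of the program:
package main

import "fmt"

func main() {
    // Preallocate room for roughly a million entries so early inserts
    // don't repeatedly trigger rehashing.
    counts := make(map[string]int, 1e6)

    // Adding and reading stay amortized O(1) no matter how large the map grows.
    counts["requests"]++
    if v, ok := counts["requests"]; ok {
        fmt.Println("requests:", v)
    }
}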
"A maximum number"? Practically no.
"A good idea"? Measure, there cannot be a general answer.

How to develop a demo application with an FPS counter using kick.js?

I'm very interested in Kick.js. To convince my professor to use this framework, I want to develop an application in which I can load custom 3D models using kick.js and keep adding more objects. I should also be able to print the FPS so I can check how it varies as I add more 3D objects to the canvas. I'm new to graphics programming; I have no knowledge of shader programming or OpenGL. Being a newbie, how can I start diving into this framework?
These are the steps I want to implement (please correct me if I'm going wrong):
Develop a simple demo using kick.js that loads a single cube, sphere, or teapot on the canvas.
Be able to see the FPS as I change the camera angles.
Later, be able to add more models of the same type (e.g. teapots) to the canvas and compare the FPS against the single-teapot case.
Am I approaching this the right way? Suggestions are welcome. None of the provided tutorials includes an FPS demo. Please, someone help me. I really liked the features stated on the homepage, but I don't know how to implement them in my demo.
Assuming that Kick.js has a "render" callback or something similar that's invoked for each frame you want to render (and you know the time between frames, or the absolute time since program start), it's fairly simple to calculate your frame rate.
The method I've used before is: pick a sample rate (I like 250ms so it updates 4 times a second), and count how many frames have executed every 250ms. When you hit 250ms, update the on-screen frame rate counter variable and start counting again.
// Run this once per frame; the counters persist across frames.
timeSinceLastFPSUpdate += millisecondsSinceLastFrame;
framesSinceLastFPSUpdate++;
if (timeSinceLastFPSUpdate > 250) {
    timeSinceLastFPSUpdate = 0;
    fps = framesSinceLastFPSUpdate * (1000 / 250); // convert "frames per 250ms" to "frames per second"
    framesSinceLastFPSUpdate = 0;
    // update the on-screen FPS display here
}
You can play around with different sample rates or use a different frame rate calculation method to get the timer to be more "accurate" (to better find frame rate dips) but it sounds like you're looking for something that's less accurate and is just giving you a reasonable idea of the overall complexity of rendering rather than frame to frame dips.
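If Kick.js doesn't hand you the frame time directly, the browser's requestAnimationFrame callback gives you a high-resolution timestamp you can use to derive millisecondsSinceLastFrame yourself. A rough sketch, independent of Kick.js internals (the "fps" element is just an assumed placeholder for wherever you display the number):
var lastFrameTime = null;
var timeSinceLastFPSUpdate = 0;
var framesSinceLastFPSUpdate = 0;

function frame(now) {                 // "now" is a timestamp in milliseconds
    if (lastFrameTime !== null) {
        timeSinceLastFPSUpdate += now - lastFrameTime;
        framesSinceLastFPSUpdate++;
        if (timeSinceLastFPSUpdate > 250) {
            var fps = framesSinceLastFPSUpdate * (1000 / 250);
            document.getElementById("fps").textContent = fps.toFixed(0); // assumes a <span id="fps"> in the page
            timeSinceLastFPSUpdate = 0;
            framesSinceLastFPSUpdate = 0;
        }
    }
    lastFrameTime = now;
    // ...update/render your Kick.js scene here...
    requestAnimationFrame(frame);
}
requestAnimationFrame(frame);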

Why is the difference between Bitmap.Save with ImageFormat and with ImageCodecInfo so big in .NET?

I'm experimenting with image resizing in ASP.NET. Actual resizing code aside, I am wondering why there is such a big difference between Bitmap's Save overloads.
method 1
ImageCodecInfo jpgEncoder = ImageCodecInfo.GetImageDecoders()
    .First(c => c.FormatID == ImageFormat.Jpeg.Guid);
Encoder encoder = Encoder.Quality;
EncoderParameters encoderParameters = new EncoderParameters(1);
encoderParameters.Param[0] = new EncoderParameter(encoder, (long)quality);
bitmap.Save(_current_context.Response.OutputStream, jpgEncoder, encoderParameters);
method 2
bitmap.Save(_current_context.Response.OutputStream, ImageFormat.Jpeg);
With Method 1 at quality 100, this particular JPEG image comes out at about 250 KB; at quality 90 it drops to about 100 KB.
Method 2, however, drops the image to about 60 KB, which is a huge difference in size with no visible difference in quality.
I can't seem to find anywhere why the difference is so big; MSDN has zero details on these two overloads.
Any insight is appreciated. Thanks.
Looking at the ImageCodecInfo / Encoder objects, they don't seem to provide a way to extract the default settings. I would assume that by default it's setting the Quality to 100 on the save.
Without looking more into the Windows Imaging internals it's really hard to say.
You could try saving with the default overload (Method 2) and with Method 1 at quality 100 and see if the results are the same; it's most likely that way.
http://msdn.microsoft.com/en-us/library/system.drawing.imaging.encoder.quality.aspx#Y800
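If you want to verify that, one quick check (a rough sketch reusing the encoder setup from the question; needs System.Drawing.Imaging, System.IO and System.Linq) is to save the same bitmap both ways into memory streams and compare the sizes:
using (var explicitQuality = new MemoryStream())
using (var defaultSave = new MemoryStream())
{
    ImageCodecInfo jpgEncoder = ImageCodecInfo.GetImageDecoders()
        .First(c => c.FormatID == ImageFormat.Jpeg.Guid);
    EncoderParameters parameters = new EncoderParameters(1);
    parameters.Param[0] = new EncoderParameter(Encoder.Quality, 100L);

    bitmap.Save(explicitQuality, jpgEncoder, parameters);   // method 1 at quality 100
    bitmap.Save(defaultSave, ImageFormat.Jpeg);             // method 2, encoder defaults

    Console.WriteLine("Explicit quality 100: {0} bytes", explicitQuality.Length);
    Console.WriteLine("Default save:         {0} bytes", defaultSave.Length);
}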

What is the fastest way to draw an Image on another image?

I have 3 Bitmap pointers:
Bitmap* totalCanvas = new Bitmap(400, 300, PixelFormat32bppARGB); // final canvas
Bitmap* bottomLayer = new Bitmap(400, 300,PixelFormat32bppARGB); // background
Bitmap* topLayer = new Bitmap(XXX); // always changed.
I will draw a complex background on bottomLayer. I don't want to redraw the complex background on totalCanvas again and again, so I stored it in bottomLayer.
topLayer changes frequently.
I want to draw bottomLayer to totalCanvas. Which is the fastest way?
Graphics canvas(totalCanvas);
canvas.DrawImage(bottomLayer, 0, 0);   // step 1
canvas.DrawImage(topLayer, XXXXX);     // step 2
I want step1 to be as fast as possible. Can anyone give me some sample?
Thanks very much!
Thanks for unwind's answer. I wrote the following code:
Graphics canvas(totalCanvas);
for (int i = 0; i < 100; ++i)
{
    canvas.DrawImage(bottomLayer, 0, 0);
}
this part takes 968ms... it is too slow...
Almost all GDI+ operations should be implemented by the driver to run as much as possible on the GPU. This should mean that a simple 2D bitmap copy operation is going to be "fast enough", for even quite large values of "enough".
My recommendation is the obvious one: don't sweat it by spending time hunting for a "fastest" way of doing this. You have formulated the problem very clearly, so just try implementing it that clearly, by doing it as you've outlined in the question. Then you can of course go ahead and benchmark it and decide to continue the hunt.
A simple illustration:
A 32 bpp 400x300 bitmap is about 469 KB in size. According to this handy table, an Nvidia GeForce 4 MX from 2002 has a theoretical memory bandwidth of 2.6 GB/s. Assuming the copy is done in pure "overwrite" mode, i.e. no blending of the existing surface (which sounds right, as your copy is basically a way of "clearing" the frame to the copy's source data), and an overhead factor of four just to be safe, we get:
(2.6 * 2^30) / (4 * 469 * 2^10) ≈ 1453
This means your copy should run at 1453 FPS, which I happily assume to be "good enough".
If at all possible (and it looks like it from your code), using DrawImageUnscaled will be significantly faster than DrawImage. Or, if you are using the same image over and over again, create a TextureBrush and use that.
The problem with GDI+ is that, for the most part, it is unaccelerated. To get the lightning-fast drawing speeds you really need GDI and BitBlt, which is a serious pain in the butt to use with GDI+, especially if you are in managed code (hard to tell whether you are using managed C++ or straight C++).
See this post for more information about doing graphics quickly in .NET.
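To make the TextureBrush suggestion concrete in the question's native GDI+ C++ (a sketch only; DrawImageUnscaled itself exists only in managed System.Drawing, where it roughly amounts to calling DrawImage with the image's pixel size so no implicit DPI-based scaling kicks in):
Graphics canvas(totalCanvas);

// Build the brush once and reuse it every frame instead of re-drawing the background.
TextureBrush background(bottomLayer);
canvas.FillRectangle(&background, 0, 0,
                     (INT)bottomLayer->GetWidth(), (INT)bottomLayer->GetHeight());

// Passing the explicit pixel size to DrawImage avoids the implicit scaling path,
// which is what the managed DrawImageUnscaled does under the covers.
canvas.DrawImage(topLayer, 0, 0,
                 (INT)topLayer->GetWidth(), (INT)topLayer->GetHeight());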

Showing too much 'skin' detection in software

I am building an ASP.NET web site where users may upload photos of themselves. There could be thousands of photos uploaded every day. One thing my boss has asked a few times is whether there is any way we could detect if any of the photos are showing too much 'skin' and automatically flag these as 'Adults Only' before the editors make the final decision.
Your best bet is to deal with the image in the HSV colour space (see here for RGB-to-HSV conversion). The colour of skin is pretty much the same across all races; it's just the saturation that changes. By dealing with the image in HSV you can simply search for the colour of skin.
You might do this by simply counting the number of pixels within a colour range, or you could perform region growing around pixels to calculate the size of the areas of that colour.
Edit: for dealing with grainy images, you might want to run a median filter on the image first and then reduce the number of colours to segment the image. You will have to play around with the settings on a large set of pre-classified (adult or not) images and see how the values behave to get a satisfactory level of detection.
EDIT: Here's some code that should do a simple count (I haven't tested it; it's a quick mashup of some code from here and the RGB-to-HSL conversion here):
Bitmap bmp = new Bitmap(_image);
BitmapData bData = bmp.LockBits(new Rectangle(0, 0, _image.Width, _image.Height), ImageLockMode.ReadWrite, bmp.PixelFormat);
byte bitsPerPixel = GetBitsPerPixel(bData.PixelFormat);
byte* scan0 = (byte*)bData.Scan0.ToPointer();   // requires an unsafe block / "Allow unsafe code"
int count = 0;
for (int i = 0; i < bData.Height; ++i)
{
    for (int j = 0; j < bData.Width; ++j)
    {
        byte* data = scan0 + i * bData.Stride + j * bitsPerPixel / 8;
        // Pixel layout is BGR(A), so blue comes first in memory.
        byte r = data[2];
        byte g = data[1];
        byte b = data[0];
        int max = Math.Max(r, Math.Max(g, b));
        int min = Math.Min(r, Math.Min(g, b));
        float h;
        if (max == min)
            h = 0;
        else if (max == r)
            h = (60f * (g - b) / (max - min) + 360f) % 360f;
        else if (max == g)
            h = 60f * (b - r) / (max - min) + 120f;
        else // max == b
            h = 60f * (r - g) / (max - min) + 240f;
        if (h > _lowerThresh && h < _upperThresh)
            count++;
    }
}
bmp.UnlockBits(bData);
Of course, this will fail for the first user who posts a close-up of someone's face (or hand, or foot, or whatnot). Ultimately, all these forms of automated censorship will fail until there's a real paradigm-shift in the way computers do object recognition.
I'm not saying that you shouldn't attempt it nonetheless, but I want to point out these problems. Do not expect a perfect (or even good) solution. It doesn't exist.
I doubt that there exists any off-the-shelf software that can determine if the user uploads a naughty picture. Your best bet is to let users flag images as 'Adults Only' with a button next to the picture. (Clarification: I mean users other than the one who uploaded the picture--similar to how posts can be marked offensive here on StackOverflow.)
Also, consider this review of an attempt to do the same thing in a dedicated product: http://www.dansdata.com/pornsweeper.htm.
Link stolen from today's StackOverflow podcast, of course :).
We can't even write filters that detect dirty words accurately in blog posts, and your boss is asking for a porno detector? CLBUTTIC!
I would say your answer lies in crowdsourcing the task. This almost always works and tends to scale very well.
It doesn't have to involve making some users into "admins" and coming up with different permissions: it can be as simple as adding an "inappropriate" link near each image and keeping a count.
See the seminal paper "Finding Naked People" by Fleck/Forsyth published in ECCV. (Advanced).
http://www.cs.hmc.edu/~fleck/naked.html
Interesting question from a theoretical / algorithmic standpoint. One approach to the problem would be to flag images that contain large skin-colored regions (as explained by Trull).
However, the amount of skin shown is not a determinant of an offensive image; it's rather the location of the skin shown. Perhaps you can use face detection (search for algorithms) to refine the results: determine how large the skin regions are relative to the face, and whether they belong to the face (perhaps how far below it they are).
I know either Flickr or Picasa has implemented this. I believe the routine was called FleshFinder.
A tip on the architecture of doing this:
Run this as a Windows service separate from the ASP.NET pipeline; instead of analyzing images in real time, create a queue of newly uploaded images for the service to work through (see the sketch after these tips).
You can use the normal System.Drawing stuff if you want, but if you really need to process a lot of images, it would be better to use native code and a high-performance graphics library, and P/Invoke the routine from your service.
As resources become available, process images in the background and flag suspicious ones for editor review. This should prune down the number of images to review significantly, while not annoying people who upload pictures of skin-colored houses.
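A minimal sketch of the service-side loop in C# (DequeueNextUpload, EstimateSkinRatio and FlagForEditorReview are hypothetical placeholders for your own queue, analysis and flagging code; the queue itself would live somewhere both the web app and the service can see, e.g. a database table or MSMQ):
static volatile bool _stopRequested;

static void ProcessUploadQueue()
{
    while (!_stopRequested)
    {
        string path = DequeueNextUpload();            // hypothetical: next queued image, or null
        if (path == null)
        {
            System.Threading.Thread.Sleep(5000);      // nothing queued; idle briefly
            continue;
        }

        double skinRatio = EstimateSkinRatio(path);   // hypothetical analysis routine
        if (skinRatio > 0.4)                          // hypothetical threshold, tune on real data
            FlagForEditorReview(path);                // hypothetical: mark for editor review
    }
}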
I would approach the problem from a statistical standpoint. Get a bunch of pictures that you consider safe, and a bunch that you don't (that will make for a fun day of research), and see what they have in common. Analyze them all for color range and saturation to see if you can pick out characteristics that all of the naughty photos, and few of the safe ones have.
Perhaps the Porn Breath Test would be helpful - as reported on Slashdot.
Rigan Ap-apid presented a paper at WorldComp '08 on just this problem space. The paper is allegedly here, but the server was timing out for me. I attended the presentation of the paper and he covered comparable systems and their effectiveness as well as his own approach. You might contact him directly.
I'm afraid I can't help point you in the right direction, but I do remember reading about this being done before. It was in the context of people complaining about baby pictures being caught and flagged mistakenly. If nothing else, I can give you the hope that you don't have to invent the wheel all by yourself... Someone else has been down this road!
CrowdSifter by Dolores Labs might do the trick for you. I read their blog all the time, as they seem to love statistics and crowdsourcing and like to talk about it. They use Amazon's Mechanical Turk for a lot of their processing and know how to process the results to get the right answers out of things. Check out their blog at the very least to see some cool statistical experiments.
As mentioned above by Bill (and Craig's google quote) statistical methods can be highly effective.
Two approaches you might want to look into are:
Neural Networks
Multi Variate Analysis (MVA)
The MVA approach would be to get a "representative sample" of acceptable pictures and of unacceptable pictures. The X data would be an array of bytes from each picture; the Y would be assigned by you as 1 for unacceptable and 0 for acceptable. Create a PLS model using this data. Run new data against the model and see how well it predicts the Y.
Rather than this binary approach you could have multiple Y's (e.g. 0=acceptable, 1=swimsuit/underwear, 2=pornographic)
To build the model you can look at open-source software (a rough sketch with one such package follows below), or there are a number of commercial packages available (although they are typically not cheap).
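For a feel of what the modelling step looks like, here is a rough sketch with one open-source option, scikit-learn in Python; the file names, labels, component count and 0.5 cutoff are all placeholders to be replaced and tuned on your own sample set:
import numpy as np
from PIL import Image
from sklearn.cross_decomposition import PLSRegression

def to_row(path, size=(64, 64)):
    # Downscale so every picture yields a fixed-length vector of pixel values.
    return np.asarray(Image.open(path).convert("RGB").resize(size), dtype=float).ravel()

training_paths = ["ok_1.jpg", "ok_2.jpg", "bad_1.jpg", "bad_2.jpg"]  # placeholder sample set
training_labels = [0, 0, 1, 1]                                       # 0 = acceptable, 1 = unacceptable

X = np.stack([to_row(p) for p in training_paths])
y = np.array(training_labels, dtype=float)

model = PLSRegression(n_components=2)   # placeholder; pick by cross-validation
model.fit(X, y)

score = model.predict(to_row("new_upload.jpg").reshape(1, -1))[0, 0]
flag_for_review = score > 0.5           # placeholder cutoff, tune on held-out pictures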
Because even the best statistical approaches are not perfect, including user feedback as well would probably be a good idea.
Good luck (and worst case you get to spend time collecting naughty pictures as an approved and paid activity!)
