How to modify a GAN to work well on larger image sizes - generative-adversarial-network

I am working with the image-to-image GAN architecture presented by Isola et al. (https://arxiv.org/abs/1611.07004).
Their architecture was used on images of 256x256x3. I tried training with 512x512x3 images, but my results are not as good as in my test at their smaller resolution. In particular, small details seem to be missing from what the GAN generates. It also quickly reaches a failure mode where the generator starts degrading (from my reading, this is probably caused by the discriminator not being powerful enough to discern a difference, so the generator starts learning random effects). Are there any general rules for adapting the architecture to larger-resolution input? And how should the architecture be changed to better generate small (high-frequency) details? Links to papers showing GANs generating lots of detail would also be welcome.

Use progressive GANs (ProGAN, sometimes written PGGAN) to output the larger-resolution images.
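If full progressive growing is more than you want to take on, a simpler first adjustment is to scale the networks with the input: a PatchGAN discriminator tuned for 256x256 judges relatively smaller patches of a 512x512 image, which fits the failure mode you describe. Below is a minimal PyTorch sketch of a PatchGAN-style discriminator with one extra stride-2 block for 512x512 pairs; the class name, channel widths, and normalization choice are illustrative assumptions, not the authors' code.

# Minimal PyTorch sketch (assumed names and channel sizes): a PatchGAN-style
# discriminator with one extra stride-2 block so the receptive field grows
# with the 512x512 input.
import torch
import torch.nn as nn

def block(in_ch, out_ch, stride=2, norm=True):
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.InstanceNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class PatchDiscriminator512(nn.Module):
    def __init__(self, in_channels=6):            # input image and target image concatenated
        super().__init__()
        self.net = nn.Sequential(
            *block(in_channels, 64, norm=False),   # 512 -> 256
            *block(64, 128),                       # 256 -> 128
            *block(128, 256),                      # 128 -> 64
            *block(256, 512),                      # 64 -> 32, extra block vs. the 256x256 setup
            *block(512, 512, stride=1),            # widens the receptive field without shrinking further
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # per-patch real/fake logits
        )

    def forward(self, x):
        return self.net(x)

d = PatchDiscriminator512()
print(d(torch.randn(1, 6, 512, 512)).shape)        # torch.Size([1, 1, 30, 30])

The same rule applies to the U-Net generator: the 256x256 configuration downsamples to a 1x1 bottleneck through eight stride-2 convolutions, so a 512x512 input needs one more downsampling/upsampling pair to reach the same bottleneck size.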

Related

QR Code Recognition in AGV (Auto Guided Vehicle)

I have some questions.
The first question is which equipment should be used to recognize the QR code.
I'm thinking of two options.
The first is a QR code scanner of the kind used in industrial settings.
The second is a camera module (OpenCV would be used).
However, the constraint to consider is that the code should be recognized while the vehicle moves at 50 cm/s.
What do you think?
And if I use a camera, is there a library you can recommend for recognizing QR codes? (C/C++ only)
Always start with the simplest solution and then go more complex if needed. If you're using ROS/OpenCV, OpenCV has a built-in QR code detector (cv::QRCodeDetector), for example. Other options include ZBar, quirc, and more, which you can find by searching GitHub or the internet.
As for the camera, if you don't need the intrinsic matrix, then you only need to decide on the resolution: more resolution takes (non-linearly) longer to process, but less resolution keeps you from seeing the objects well.
Your comment about "recognize at 50 cm/s" doesn't make much sense on its own. I assume you mean that you want to be able to decode a QR code that's up to 50 cm away, and do it in less than a second (to have time to stop). First you'll have to check whether the algorithm, running on your hardware, can detect the QR code at the desired distances, and how that changes when you scale the image up or down in OpenCV. Then you'll have to time how long it takes to detect/decode at those distances/resolutions/scales. If that isn't good enough, you can try another algorithm, try different compilation settings, perhaps give it its own thread, change the scaling of the image, accept the limitations, or change the hardware.
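If you do go the camera route, a quick way to gauge feasibility is to time the built-in OpenCV detector on live frames. Here is a rough sketch using the Python bindings for brevity (the C++ cv::QRCodeDetector API is equivalent; the camera index and frame count are placeholders):

# Rough timing sketch with OpenCV's built-in QR detector.
import time
import cv2

cap = cv2.VideoCapture(0)            # camera index is a placeholder
detector = cv2.QRCodeDetector()

for _ in range(300):                 # sample a few hundred frames
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()
    data, points, _ = detector.detectAndDecode(frame)
    dt_ms = (time.perf_counter() - t0) * 1000.0
    if data:
        print("decoded %r in %.1f ms" % (data, dt_ms))

cap.release()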

Frequency analysis of sound

I record bird cries with two microphones. The recordings can be up to 3 hours long, and it is time-consuming to listen to the whole file in Audacity each day. What I want is a script that takes my original file and gives me a bunch of short audio files, each containing a bird cry. With my microphones I am able to record in mp3 or wav. The script should keep only cries with a frequency higher than n Hz; below that is the fixed background sound, which should not be saved. I don't know which language is best for this, and I have absolutely no idea how to do it.
Thank you all,
Thomas
This should be pretty easily doable in a variety of languages but Python is a decent place to start. I'll link you some relevant resources to get you started and then you can narrow your question if you run into problems.
To read your audio file in .wav format look at this documentation.
To take the data from your audio file and put it into a numpy array see this question and answer.
Here is the documentation for computing the Fourier transform of your data (to get the frequency content).
I would suggest taking a moving window, computing the Fourier transform of the data within that window, and then saving the result to a file if there is significant content above your threshold frequency (a sketch of this is given below). The first link should have info on saving the audio file.
You can get some background on using the Fourier transform for this type of application from this Q&A, and if it turns out that your problem is really difficult, I would suggest looking into some of the methods used for speech detection.
For a more out-there suggestion, you could try frequency-shifting your recording (by adjusting the sample rate) so the bird sounds resemble human speech, and then use a black-box tool like Google's VAD to pick out the bird calls. I'm not sure how well that would work, though.
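Here is a rough sketch of that moving-window idea in Python; the file names, the 1-second window, the 2 kHz cutoff, and the ratio test are all assumptions you would tune for your birds and background:

# Keep 1-second windows whose spectral energy above the cutoff frequency
# clearly exceeds the energy below it, and write each one out as a clip.
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("recording.wav")
if data.ndim > 1:                      # two microphones -> mix down to mono
    data = data.mean(axis=1)

win = rate                             # 1-second windows (in samples)
threshold_hz = 2000                    # the "n Hz" background cutoff, assumed value
keep = []

for start in range(0, len(data) - win, win):
    chunk = data[start:start + win]
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / rate)
    high = spectrum[freqs >= threshold_hz].sum()
    low = spectrum[freqs < threshold_hz].sum() + 1e-9
    if high / low > 0.5:               # crude ratio test, tune on your data
        keep.append((start, start + win))

for i, (a, b) in enumerate(keep):
    wavfile.write("cry_%04d.wav" % i, rate, data[a:b].astype(np.int16))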
The problem of cutting up a long file into sections of interest is usually referred to as (automatic) Audio Segmentation. If you are willing to output fixed-length audio clips (say 10 seconds), you can also treat it as an Audio Classification problem.
The latter is a very well-studied problem that has also been applied to birds.
The DCASE2018 challenge had a task on Bird Detection, and the submissions cover lots of advanced methods. Basically all of the best-performing systems use a Convolutional Neural Network on log-scaled mel-spectrograms. A mel-spectrogram is 2D, so the task essentially becomes image classification. Many of the submissions are open source, so you can look at the code and play with them. Do note that they are mostly focused on scoring well in a research competition, not on being practical tools for splitting a few files.
If you want to build your own model for this, I would recommend starting with a Convolutional Neural Network pretrained on images, training it further on the DCASE2018 data, and then testing it on your own data. That should give a very accurate system, though it will take a while to set up.
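On the feature side, a log-scaled mel-spectrogram takes only a few lines to compute. The librosa call and parameter values below are illustrative defaults, not taken from any particular DCASE2018 submission:

# Compute a log-scaled mel-spectrogram, the 2D "image" the CNN classifies.
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=22050, mono=True)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)                   # (n_mels, n_frames)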

Depth Of Field in OpenCL

This might be a "homework" question, but I think I've done enough that I can ask for help here.
In my assignment, we have a working OpenGL/OpenCL application. The OpenGL part renders a scene, and the OpenCL part should apply a depth-of-field-like effect. The OpenCL kernel gets a texture in which each pixel holds the original color and depth, and it should output the color for a given pixel. I'm only supposed to change the per-pixel function that is part of the OpenCL code.
I already have a working solution using a variable-size Gaussian filter that samples the area around the computed pixel. But it gets laggy at higher resolutions, even on my dedicated NVIDIA graphics card. I tried optimizing out most of the redundant operations, but I haven't gained much performance.
I also tried searching the web, but all the algorithms I'm finding are closely tied to the graphics pipeline of OpenGL or DirectX; nothing that can be used in my scenario.
Are there any algorithms that could work in my situation?
AMD APP SDK has a sample called URNGGL (Uniform Random Noise Generator with OpenGL/OpenCL interoperability).
Have a look at https://github.com/clockfort/amd-app-sdk-fixes/tree/master/samples/opencl/cl/app/URNG.
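For what it's worth, the brute-force gather described in the question looks roughly like the CPU sketch below (hypothetical numpy code, not the assignment's kernel). The per-pixel work grows with the square of the blur radius, which is why it becomes expensive at high resolutions; most real-time approaches avoid this by using separable or downsampled blurs.

# Naive per-pixel variable-radius Gaussian gather driven by depth.
# color: float array (h, w, 3); depth: float array (h, w) in [0, 1].
import numpy as np

def depth_of_field(color, depth, focal_depth=0.5, max_radius=8):
    h, w, _ = color.shape
    out = np.zeros_like(color)
    for y in range(h):
        for x in range(w):
            # circle of confusion: blur more the further the pixel is from focus
            radius = int(abs(depth[y, x] - focal_depth) * max_radius)
            if radius == 0:
                out[y, x] = color[y, x]
                continue
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            weights = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * radius ** 2))
            weights /= weights.sum()
            out[y, x] = (color[y0:y1, x0:x1] * weights[..., None]).sum(axis=(0, 1))
    return out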

What's the fastest force-directed network graph engine for large data sets?

We currently have a dynamically updated network graph with around 1,500 nodes and 2,000 edges, and it's ever-growing. Our current layout engine uses Prefuse - the force-directed layout in particular - and it takes about 10 minutes on a hefty server to get a nice, stable layout.
I've looked a little at GraphViz's sfdp algorithm, but haven't tested it yet...
Are there faster alternatives I should look at?
I don't care about the visual appearance of the nodes and edges - we process that separately - just putting x, y on the nodes.
We do need to be able to tinker with the layout properties for specific parts of the graph, for instance, applying special tighter or looser springs for certain nodes.
Thanks in advance, and please comment if you need more specific information to answer!
EDIT: I'm particularly looking for speed comparisons between the layout engine options. Benchmarks, specific examples, or just personal experience would suffice!
I wrote a JavaScript-based graph-drawing library, VivaGraph.js.
It calculates the layout and renders a graph with 2K+ vertices and 8.5K edges in ~10-15 seconds. If you don't need the rendering part, it should be even faster.
Here is a video demonstrating it in action: WebGL Graph Rendering With VivaGraphJS.
An online demo is available here. WebGL is required to view the demo, but it is not needed to calculate graph layouts. The library also works under node.js, so it could be used as a service.
Example of API usage (layout only):
var graph = Viva.Graph.graph(),
    layout = Viva.Graph.Layout.forceDirected(graph);
graph.addLink(1, 2);
layout.run(50); // runs 50 iterations of graph layout
// print results:
graph.forEachNode(function(node) { console.log(node.position); });
Hope this helps :)
I would have a look at OGDF, specifically http://www.ogdf.net/doku.php/tech:howto:frcl
I have not used OGDF, but I do know that FMMM (Fast Multipole Multilevel Method) is a well-performing algorithm, and with the runtimes involved in force-directed layout at the number of nodes you have, that matters a lot.
Why, among other reasons, that algorithm is fast: the fast multipole method. The fast multipole method approximates the pairwise (n-body) force computation, reducing the O(n^2) cost of summing every node's repulsion against every other node to roughly O(n log n) with only a small loss of accuracy. Ideally, you'd have code from something like this: http://mgarland.org/files/papers/layoutgpu.pdf but I can't find it anywhere; maybe a CUDA solution isn't up your alley anyway.
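To make concrete what the multipole machinery is approximating, here is the naive all-pairs repulsion step written out in Python (a toy illustration, not OGDF code); FMMM-style methods replace exactly this sum with a hierarchical far-field approximation:

# The O(n^2) repulsive-force step that fast-multipole / Barnes-Hut layouts approximate:
# every node pushes every other node, so both work and memory grow quadratically.
import numpy as np

def repulsive_forces(pos, k=1.0):
    # pos: (n, 2) array of node positions
    delta = pos[:, None, :] - pos[None, :, :]            # (n, n, 2) pairwise differences
    dist2 = (delta ** 2).sum(axis=-1) + 1e-9             # small epsilon avoids division by zero
    return (k * delta / dist2[..., None]).sum(axis=1)    # repulsion falling off with distance, summed per node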
Good luck.
The Gephi Toolkit might be what you need: some of its layouts are very fast yet of good quality: http://gephi.org/toolkit/
30 seconds to 2 minutes is enough to lay out such a graph, depending on your machine.
You can use the ForceAtlas layout or the Yifan Hu Multilevel layout.
For very large graphs (50K+ nodes and 500K+ links), the OpenOrd layout will scale better.
In a commercial scenario, you might also want to look at the family of yFiles graph layout and visualization libraries.
Even the JavaScript version of it can perform layouts for thousands of nodes and edges using different arrangement styles. The "organic" layout style is an implementation of a force directed layout algorithm similar in nature to the one used in Neo4j's browser application. But there are a lot more layout algorithms available that can give better visualizations for certain types of graph structures and diagrams. Depending on the settings and structure of the problem, some of the algorithms take only seconds, while more complex implementations can also bring your JavaScript engine to its knees. The Java and .net based variants still perform quite a bit better, as of today, but the JavaScript engines are catching up.
You can play with these algorithms and settings in this online demo.
Disclaimer: I work for yWorks, which is the maker of these libraries, but I do not represent my employer on SO.
I would take a look at http://neo4j.org/; it's open source, which is beneficial in your case since you can customize it to your needs. The GitHub account can be found here.

What are the options and best practices for PV3D inspired modeling

The studio I work at is currently developing the Tony Hawk XI website, and I am responsible for the Flash/AS3 development. As part of the pitch, I entered an augmented-reality skateboard example to be shown, which impressed the client very much.
After a few weeks of getting stronger with Papervision3D and getting to know the FLARToolKit, I have successfully imported md2 and dae files that load and interact with my custom marker.
Now it has come time to develop some of my own models; I will be using 3ds Max. I want to know what the limitations are on things like poly count, character rigging and animation, texturing, tricks for exporting and creating the proper file format, and any other bits of information that may save me some serious headaches down the road.
Currently I have a Quake 2 MD2 model, Ernie, pulled into a FLARToolKit demo here.
This is very low-poly, and I was wondering how many polys I could expect to get away with, given that today's machines are so much faster.
Brian Hodge (blog.hodgedev.com / hodgedev.com)
I've heard that 2000 polys is about the threshold for good performance. In practice, though, it's been hit or miss, and a lot of things can have an impact. So far I've run into performance hits when using animated MovieClip materials, animated materials with an alpha channel, and precise materials.
Having to clip objects seems to be a double-edged sword. In some cases it will increase performance by a good deal, and in others (it seems to be primarily when there are a lot of polys on the edge of the viewport) it'll drop the framerate by a good 10-15 fps. So I'd say the view you set up is something to think about as well.
For example, we have a model of the interior of a store with some shelves, products, and customers walking around. In total we have just under 600 triangles (according to the StatsView, which you should check out if you haven't yet: org.papervision3d.view.stats.StatsView). On my computer, which is a new quad-core machine, it runs at a steady 30 fps (which is where we want it), but on an old Dell XPS (Pentium 4) it runs between 20 and 30 fps depending on which objects are being clipped, etc.
We try to reduce the poly count and texture creatively to fix as many of the performance issues as possible. Unfortunately, our minimum specs are really low, so we need to do a lot to get it to run well.
Edit:
Another thing we're doing is swapping out less detailed models for higher-detail ones when zoomed in. If you aren't zooming at all, then this probably won't help.
Hope that helps a bit.
