Create clusters in R

I have a df that looks like:
selection.body selection.hair selection.eyes selection.breasts selection.butt selection.skin
normal blonde other large medium tanned
normal blonde other xl medium tanned
normal blonde other large medium tanned
chubby blonde blue xl large tanned
slim blonde other medium small white
Let's imagine this dataset as the answer to a survey:
each row represents the choices of a single respondent, who selects their preference from a closed set of options.
What I have already done is check the frequencies of each choice, but I want to go further than that.
My goal is to:
identify the most common combinations of choices;
group the users on the basis of these combinations;
find the correlations between the choices.
Thanks for your hints.

Finding the most common combinations isn't clustering, but frequent itemset mining.
Have you tried apriori?
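A minimal sketch of that approach with the arules package, assuming the data frame is called df and that the support/confidence thresholds (0.05, 0.6) are placeholders to tune:

library(arules)

df[] <- lapply(df, factor)      # itemset mining works on categorical data
trans <- as(df, "transactions") # each row becomes a set of "column=value" items

# 1. Most common combinations of choices (frequent itemsets)
itemsets <- apriori(trans,
                    parameter = list(supp = 0.05, target = "frequent itemsets"))
inspect(head(sort(itemsets, by = "support"), 10))

# 2. Group users by their exact combination of answers
combo <- interaction(df, drop = TRUE)
sort(table(combo), decreasing = TRUE)

# 3. Associations between choices, expressed as rules
rules <- apriori(trans, parameter = list(supp = 0.05, conf = 0.6))
inspect(head(sort(rules, by = "lift"), 10))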

Related

cycleGAN with blank background

I am building a cycleGAN with a U-Net structure to improve the image quality of Cone Beam Computed Tomography (CBCT), with Fan Beam Computed Tomography (FBCT) as the target images. Because I mainly want to enhance the quality in the lung region, I crop the lung volumes out of the original images and assign the value 0 to the regions other than the lung (given that the lung pixel values range from -1000 to 0). Scaling and normalization are done on the array over the range -1000 to 0. Please see the following images as an example from my dataset:
In the example above, the leftmost image is the CBCT and the rightmost image is the FBCT. The middle one is the image generated from the CBCT by the cycleGAN model. This is an example from a very early epoch of training.
But as training goes on, the model somehow gradually loses its ability to capture the anatomy of the images, and eventually generates a blank image with all values 0 (see image below).
The training loss climbs back up after several epochs, which is when the model starts to lose the anatomy information:
What makes me curious is that such loss of anatomy does not occur when I simply input the whole CBCT and FBCT images as the dataset, without lung segmentation or value assignment to the regions outside the lung. If unsegmented images are given, the model actually translates the CBCT successfully into mimicking the FBCT quality. I do the segmentation because I want the model to concentrate only on the lung region, to see if it performs better.
I wonder if this is a consequence of the background having a much higher value than the region of interest (i.e. background value: 0; lung values: -1000 to 0). Is there any published work on cycleGAN training with images containing a blank background? If so, are there any special measures to take when assigning a value to the background, or when doing normalization and scaling? I can't really find any so far.
Any insight is appreciated. Thank you.

How to analyse spatial data using grid codes from a map

I would like to analyse movement data from a semi-captive animal population. We record their location every 5 minutes using a location code which corresponds to a map of the reserve that we have made ourselves. Each grid square represents 100 square metres and has a letter and number identifying it, e.g. H5 or L6 (letters correspond to columns, numbers to rows).
I would like to analyse differences in space use between three different periods of time, to answer questions such as: do the animals move around more in certain periods, or is their space use more restricted in other periods?
Please can someone give me any indication of how to go about this? I have looked into spatial analysis in RStudio but haven't come across anything that doesn't use official maps or location coordinates. I've not done this type of analysis before, so any help would be greatly appreciated! Thanks so much.
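One possible starting point, sketched in R below: convert the grid codes into approximate x/y coordinates and summarise space use per period. The data frame obs, its column names grid_code and period, and the example values are all made up for illustration; only the letter-column / number-row convention comes from the description above.

obs <- data.frame(
  grid_code = c("H5", "H6", "L6", "H5", "M2"),
  period    = c("A", "A", "B", "B", "B")
)

# Letters index columns, numbers index rows; each cell is 100 m^2 (10 m x 10 m),
# so multiplying by 10 gives rough metric coordinates of each cell centre.
letter <- substr(obs$grid_code, 1, 1)
number <- as.numeric(substring(obs$grid_code, 2))
obs$x <- (match(letter, LETTERS) - 0.5) * 10
obs$y <- (number - 0.5) * 10

# Simple per-period summaries of space use:
aggregate(grid_code ~ period, data = obs,
          FUN = function(g) length(unique(g)))       # distinct cells visited
aggregate(cbind(x, y) ~ period, data = obs, FUN = mean)  # period centroids

Once you have x/y coordinates, standard home-range tools such as the adehabitatHR package (minimum convex polygons or kernel density estimates per period) can be used to quantify how restricted the space use is in each period.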

HX711 and weight cell, throws large negative value

I have successfully installed 3 weight cells on a Particle Photon, using an HX711. The fourth weight cell shows large negative values, e.g. -69798, when a certain amount of weight is added. The weight cell should be able to measure up to 10 kg, as do the other three weight cells.
Is there a possible explanation for this outcome? All four weight cells are connected to a single plate. When lowering the weight on the plate, the values from all four look good. When adding about 3-4 kg to the plate, three of the weight cells show good values while the fourth shows large negative values.
Simple solution: just rotate the load cell 180 degrees about its vertical axis. Sometimes the manufacturer puts the sticker on the wrong way around.
This should solve it.

What kind of QR-like code is divided into 4 quarters

I was wondering what kind of QR-like code does not have squares in the corners and is divided into 4 quarters by a solid black line. I would like to replicate this, since I think they look more professional than the variety I have seen before, but I cannot find out what kind of code it is.
It's a data matrix 2D barcode.
http://en.wikipedia.org/wiki/Data_Matrix
http://en.wikipedia.org/wiki/Barcode
It only looks divided into 4 blocks when you get above a certain size (such as 20x20 shown below)
http://jpgraph.net/download/manuals/chunkhtml/images/datamatrix-structure-details.png
This article talks more about these blocks (or more officially, 'Data Regions' or 'sub matrices'). See page 12 for a table of common sizes and data region breakdowns:
http://www.gs1.org/docs/barcodes/GS1_DataMatrix_Introduction_and_technical_overview.pdf
The matrix symbol (square or rectangle) will be composed of several areas of data (or: Data Regions), which together encode the data.

Expressing order or disorder mathematically

I work for a game development company which makes casual games. One of the main casual genres is match-3: there is a field with chips of different colors, and the player moves chips so that they form lines of at least three chips of the same color. If the move produces such a line, the chips in that line disappear.
Chips on the field can be arranged very differently: there may be a lot of chips of the same color grouped in one place, or there may be a situation where the player can't make a move at all because all the neighbouring chips are of different colors.
So, I want to express the situation on the field mathematically with a factor of order (or disorder). If the factor is high, a player can make a lot of matches and the lines made by the player are long. If the factor is low, the field is in complete disorder and one can't make a single match. This could be helpful for generating fields of different difficulty.
The question is: what branch of math can help me do this? Where should I start my research? Any suggestions for keywords to google?
Thanks in advance.
Entropy.
I would look into graph theory. You can, for example, make a graph where the nodes are positions on the board, and two nodes are connected by an edge if they are neighbours and hold chips of the same color. If you have large components with nodes of high degree, you have less disorder; if all your components are small, you have high disorder.
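A small sketch of this idea in R with the igraph package; the 8x8 board with 5 colors is just a random example, not anything specific from the question:

library(igraph)

set.seed(1)
n_row <- 8; n_col <- 8; n_colors <- 5
board <- matrix(sample(n_colors, n_row * n_col, replace = TRUE), n_row, n_col)

# Connect horizontally/vertically adjacent cells that hold the same color
id <- function(r, c) (c - 1) * n_row + r
edges <- c()
for (r in 1:n_row) for (c in 1:n_col) {
  if (r < n_row && board[r, c] == board[r + 1, c]) edges <- c(edges, id(r, c), id(r + 1, c))
  if (c < n_col && board[r, c] == board[r, c + 1]) edges <- c(edges, id(r, c), id(r, c + 1))
}
g <- make_graph(edges, n = n_row * n_col, directed = FALSE)

comp <- components(g)
max(comp$csize)   # size of the largest same-color component
mean(degree(g))   # average degree: higher = more order, lower = more disorder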
The first thing that comes to mind is that you're looking at the distribution of n populations (one for each color), which I would approach with Poisson sampling. You can use that to calculate the probability of finding two adjacent units of the same population (color), which will give you a measure of the difficulty of your puzzle.
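For the simplest baseline (each of k colors equally likely and independent across cells), the probability that two fixed adjacent cells share a color is 1/k; a quick simulation check in R, with k = 5 chosen arbitrarily:

k <- 5
n_sim <- 1e5
# Draw many independent pairs of cells and see how often the colors match
mean(sample(k, n_sim, replace = TRUE) == sample(k, n_sim, replace = TRUE))  # ~ 1/k = 0.2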
