How can I convert a scatterplot into a hexagonal/honeycomb chart? - math

I have data in a scatterplot that positions colors according to their lightness (x-axis) and saturation (y-axis):
Here's the data set in a spreadsheet.
I would like to transform this into a hexagonal/honeycomb chart. I did this by hand, finding some "lines" in the data for the edges and then filling in the middle based on intuition:
I'm not sure that's the "best" honeycomb representation of the data, but it looks okay to me.
Does anyone have a suggestion on how I could make this process into an algorithm? I have a feeling there's some math or algorithms that would fit this problem which I am unaware of.
Thanks!
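One common way to turn this into an algorithm is hexagonal binning: snap every point to the nearest centre of a regular hexagonal lattice and then summarise the points that land in each cell. The following is a minimal base R sketch under assumptions, not a definitive answer. The function name `hex_bin`, the `size` parameter, and the random `lightness`/`saturation` values are all made up for illustration and stand in for the real spreadsheet columns.

```r
# Snap each (x, y) point to the nearest centre of a hexagonal lattice.
# `size` is the distance between neighbouring centres (hypothetical parameter).
# A hexagonal lattice is the union of two rectangular lattices, so we find the
# nearest centre in each one and keep whichever is closer.
hex_bin <- function(x, y, size) {
  w <- size            # horizontal spacing inside one rectangular lattice
  h <- size * sqrt(3)  # vertical spacing inside one rectangular lattice
  # Lattice A: centres at (i * w, j * h)
  a_x <- round(x / w) * w
  a_y <- round(y / h) * h
  # Lattice B: lattice A shifted by half a cell in both directions
  b_x <- (round(x / w - 0.5) + 0.5) * w
  b_y <- (round(y / h - 0.5) + 0.5) * h
  use_a <- (x - a_x)^2 + (y - a_y)^2 <= (x - b_x)^2 + (y - b_y)^2
  data.frame(cx = ifelse(use_a, a_x, b_x), cy = ifelse(use_a, a_y, b_y))
}

# Made-up data standing in for the lightness/saturation columns
set.seed(1)
lightness  <- runif(200, 0, 100)
saturation <- runif(200, 0, 100)

cells  <- hex_bin(lightness, saturation, size = 10)
# Number of points that fall in each occupied hexagon
counts <- aggregate(n ~ cx + cy, data = cbind(cells, n = 1), FUN = sum)
```

For a colour chart, each occupied cell could then show one representative colour, for example the point closest to the cell centre. That choice is a design decision and is not something the question specifies.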

Related

heatmap.2 color legend custom bins

Hi there stackoverflow community!
I am a graduate student looking for advice on an aesthetics problem I am encountering in R.
The data I am working with is in the form of a VERY large matrix (49x51).
My problem is that my data ranges from very small to very large, with the bulk of my data falling within the "very large" end of the spectrum, so unless I convert my data to log10, the heatmap is rather boring and almost entirely the same color.
The spectrum of my data is totally within the range I am expecting, but I am hoping to display it in a more aesthetic way.
Proposed solution: I think I need to bin my data in a non-uniform way. If you look at the attached image, you will see that their heatmap looks nice and the color key shows the heat spectrum in a non-fixed bin format. I would like to do something like that, however, I am not sure how to declare cutoffs for each bin. I would ideally like to declare the cutoffs.
For example, bin 1 (0-1), bin 2 (2-50), bin 3 (51-5000). As you can see, my bins would not be fixed in equal increments.
I have been using heatmap.2 for this. Thanks so much in advance!
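The non-uniform cutoffs described above can be declared as a vector of break points, with one colour per bin. Here is a minimal sketch in base R, assuming the example cutoffs from the question. The matrix values, the colours, and the helper name `plot_binned` are made up for illustration. As far as I know, heatmap.2 takes `breaks` and `col` arguments of the same shape (one more break than colours), but check its documentation before relying on that.

```r
# Cutoffs from the question: (0-1), (1-50), (50-5000)
breaks <- c(0, 1, 50, 5000)
# One colour per bin, so length(cols) == length(breaks) - 1
cols <- c("#FFF7BC", "#FEC44F", "#D95F0E")

# Small made-up matrix standing in for the real 49x51 data
m <- matrix(c(0.5, 10, 1000, 0.2, 45, 3000), nrow = 2)

# Bin index (1, 2 or 3) for every cell; include.lowest keeps exact zeros in bin 1
bins <- matrix(cut(m, breaks = breaks, include.lowest = TRUE, labels = FALSE),
               nrow = nrow(m))

# Hypothetical helper: draw the matrix with the same non-uniform bins
plot_binned <- function(mat) {
  image(t(mat), breaks = breaks, col = cols, axes = FALSE)
}
```

Calling `plot_binned(m)` draws the heatmap. Every value inside a bin gets the same colour regardless of how wide that bin is.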
heatmap with color legend in non-uniform bins:
Hey @Punintended and @S Rivero,
I think I have reached the point where my heatmap will only improve marginally. Both of you contributed greatly to this success, so thanks! First, to condense the matrix values as much as possible, I normalized by column. I was then able to assign gradients. This turned out much better than I had hoped. As you can see, most of my data is clustered at very low values (check out the density in the key). This is okay, though, because I am interested in the higher values. I had to use custom color gradients to account for colorblind attendees who might look at my poster. Anyway, if you have comments or recommendations, they will be much appreciated :). Again, thanks a bunch!

Decreasing the range of a dataset

I have a dataset which ranges from 0.00000787 to 1.39151821, quite a large disparity when it comes to plotting the data. I'd like to try to decrease the range of the data so the plot (I'm using a colour-coded plot, and right now it's pretty monotonous) is easier to read. I tried using log(dataset), but this creates some negative numbers, which my software doesn't like.
Mathematics is not my strong point. If someone could recommend a method of fitting my data into a smaller range, it would be much appreciated.
Thanks.
Try log(x + 1), like this:
# Adding 1 before taking the log keeps every result non-negative
x <- seq(0.00000787, 1.39151821, 0.01)
plot(log1p(x))  # log1p(x) computes log(1 + x)

R - Heatmap from sparse 2d data

I'd like to achieve what this person has achieved without using ggplot. Any ideas?
How do I create a continuous density heatmap of 2D scatter data in R?
You can see what I get when using the solution detailed in that question.
ggplot(df,aes(x=x,y=y))+
stat_density2d(aes(alpha=..level..), geom="polygon") +
scale_alpha_continuous(limits=c(0,1),breaks=seq(0,1,by=0.1))+
geom_point(colour="red",alpha=0.2)+
theme_bw()
The heatmap is so sparse. I want it to cover much more than what it is covering now. It's terribly hard to see anything about the density. Any ideas of different ways to make density heatmaps from 2D data besides this ggplot solution?
One idea I had was instead of using linear color labeling (see the black to white spectrum on the left, which is linear), using logarithmic scale for the density labeling. Any ideas how I could do this?
"The heatmap is so sparse. I want it to cover much more than what it is covering now. It's terribly hard to see anything about the density."
Please be specific: what do you want to see in areas with most or all NAs?
If you use geom_point with alpha-blending and position_jitter, the current plot is as good as it gets.
If you want some solid color, then use geom_hex(); see http://mfcovington.github.io/r_club/solutions/2013/02/28/peer-produced-plots-solutions/ for code. Then play with the continuous color scale; you probably want a nonlinear transform. Post your revised attempt if you want a critique.
I actually ended up using smoothScatter, which works well and uses classic R plotting.
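For the logarithmic colour scale idea raised in the question, here is a minimal sketch in classic R plotting, offered as one possible approach rather than what was actually used. It counts points on a regular grid and colours the cells by log(1 + count), so sparse cells stay visible next to dense ones. The names `count_grid` and `plot_log_density`, the bin count, and the random data are all made up for illustration.

```r
# Count how many points fall in each cell of an nbins x nbins grid
count_grid <- function(x, y, nbins = 30) {
  xb <- cut(x, breaks = seq(min(x), max(x), length.out = nbins + 1),
            include.lowest = TRUE)
  yb <- cut(y, breaks = seq(min(y), max(y), length.out = nbins + 1),
            include.lowest = TRUE)
  unclass(table(xb, yb))  # plain integer matrix, empty cells are 0
}

# Hypothetical helper: white-to-black colours on a log(1 + count) scale
plot_log_density <- function(counts) {
  image(log1p(counts), col = gray(seq(1, 0, length.out = 64)), axes = FALSE)
}

# Made-up scatter data standing in for df$x and df$y
set.seed(1)
x <- rnorm(500)
y <- rnorm(500)
grid_counts <- count_grid(x, y, nbins = 20)
```

Calling `plot_log_density(grid_counts)` draws the heatmap. Swapping `log1p` for another monotone transform, such as `sqrt`, changes how strongly the low-density cells are emphasised.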

Labeling plot maximum peaks in GNUPlot

I have some data that I'm plotting with GNUPlot. I have three different data sets for different energies. What I need to do is label the maxima on the plot. For example, I need something like (20, 4.5) for the red plot. The labels do not need to sit above the maxima; it only needs to be clear which label belongs to which curve. Is there any easy way to do this in GNUPlot? I haven't been able to find anything online.
Thanks in advance. Below is an example plot that I'm trying to work with. It wouldn't let me post images, so I'm posting the link below.
http://i.imgur.com/xA3q52I.png
I think this example can help
http://www.gnuplot.info/demo/stats.html

Applying functions from histograms - in R

I have a very basic grasp of stats, and a very basic grasp of R so please bear with me.
I have survey data which shows the weekly expenditure of a number of respondents. I have put this into a histogram, and have plotted a density function as well. So far so good.
How do I then apply this curve to a larger population? Say that I know that the population of my town is 25000. How can I apply that to the density curve to arrive at a new histogram and the data table behind it?
I hope this is an appropriate question, thank you.
It is not exactly clear what you want to do.
If you only have data on the sample then the best estimate that you have of the histogram/density for the population is the histogram/density of the sample, the only difference would be the scale on the y-axis. Personally I think the tick marks on the y axis should be ignored (and my preference would be that the tick labels were never plotted) since it is really the shape of the histogram/density that is important and the tick labels can change based on things that don't change the meaning. If you really feel the need to have the tick labels represent population values then see the axis function.
If you want something more than this then give us a better description of what you are trying to accomplish.
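To make the rescaling point concrete, here is a minimal sketch assuming the sample is representative of the town. Each histogram bin keeps its share of respondents, and that share is multiplied by the population size. The survey values below are randomly generated for illustration, and the names `spend`, `pop_size`, and `pop_table` are made up.

```r
# Made-up weekly expenditure for 200 survey respondents
set.seed(42)
spend <- rgamma(200, shape = 2, scale = 30)

# Compute the histogram bins without drawing anything
h <- hist(spend, plot = FALSE)

pop_size <- 25000  # town population from the question

# Same shape as the sample histogram, only the y-axis scale changes
pop_table <- data.frame(
  from           = head(h$breaks, -1),
  to             = tail(h$breaks, -1),
  sample_count   = h$counts,
  est_population = h$counts / sum(h$counts) * pop_size
)
```

`pop_table` is the data table behind the rescaled histogram. Plotting `est_population` against the bins, for example with `barplot(pop_table$est_population)`, gives a histogram with the same shape as the sample one.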

Resources