Create heatmap in R using stat_density2d - r

I have several (x,y) coordinates, and each one is associated with a binary value (either 1 or 0). I want to create a heatmap showing, for each location, the probability that a point there has a value of 1.
Sample data:
data = read.table(header=TRUE,
text="x y value
7 3 0
4 5 0
3 7 1
3 6 0
4 5 1
5 6 0")
And so on. I can create a plot showing where the points are concentrated using the following:
ggplot(data, aes(x=x,y=y)) + stat_density2d(aes(fill=..level..), geom="polygon")
But when I try to set fill = value, I get the following error:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
How do I do this?
Edit: I should add that I can easily accomplish this using stat_summary2d or even geom_tile, but the result looks boxy, and I want something smoother.
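One possible approach, offered as a sketch rather than a definitive answer: estimate the probability as a ratio of two kernel densities on the same grid (a Nadaraya-Watson style smoother) and draw it with geom_raster. The simulated data, the bandwidth h, the grid size, and the names pts, ones, d_all, d_one and grid are all assumptions made for illustration; none of them come from the question.

```r
library(MASS)     # kde2d
library(ggplot2)

# Simulated stand-in for the real data (assumption): value is more often 1 at large x
set.seed(1)
n <- 200
pts <- data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10))
pts$value <- rbinom(n, 1, plogis(pts$x - 5))

# Both densities must share the same grid limits and bandwidth (h is a guess)
lims <- c(range(pts$x), range(pts$y))
h <- c(3, 3)
ones <- pts[pts$value == 1, ]
d_all <- kde2d(pts$x, pts$y, h = h, n = 50, lims = lims)
d_one <- kde2d(ones$x, ones$y, h = h, n = 50, lims = lims)

# P(value = 1 | x, y) is approximated by (n1 / n) * f1(x, y) / f(x, y)
prob <- (nrow(ones) / nrow(pts)) * d_one$z / d_all$z

# kde2d returns z with rows indexed by x, which matches expand.grid's ordering
grid <- expand.grid(x = d_all$x, y = d_all$y)
grid$prob <- as.vector(prob)

p <- ggplot(grid, aes(x, y, fill = prob)) +
  geom_raster(interpolate = TRUE) +
  scale_fill_gradient(limits = c(0, 1))
p
```

The estimate is least reliable where there are few points of either kind, so a larger bandwidth, or masking sparse regions, may be needed with real data.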

Related

Grouping Set of Points to a Pre Defined Point

I'm looking to create a model that classifies a set of points that are near a pre-defined point.
For example, let's say I have points:
X    Y
1    1
1    2
1    3
2    1
2    3
3    1
3    2
3    3
6    6
8    7
8    5
9    3
10   7
My goal is to identify which points are closest to predefined point (2,2) and ideally output which points those are.
I tried using KNN, but I could not figure out how to get the KNN model to return the points near (2,2). Any guidance on how I might accomplish this would be awesome. :)
Plot of Points
df <- data.frame( x = c(1,1,1,2,2,2,3,3,3,6,8,8,9,10), y = c(1,2,3,1,2,3,1,2,3,6,7,5,3,7))
df
goal_point <- c(x=2,y=2)
goal_point
You might approach this by calculating distance from goal as a feature.
df$dist = sqrt((df$x - goal_point["x"])^2 +
(df$y - goal_point["y"])^2)
df$clust = kmeans(df, 2)$cluster
library(ggplot2)
ggplot(df, aes(x, y, color = clust)) +
geom_point()
In this case kmeans is using x, y, and distance from goal. You could also use just distance from goal by using df$clust = kmeans(df[,3], 2)$cluster, which would lead here to the same clustering.
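If the goal is only to list the points nearest the goal, the distance column can also be used directly, without clustering. The following is a minimal sketch; the radius of 1.5, the choice of five neighbours, and the names near and nearest5 are assumptions picked for illustration.

```r
df <- data.frame(x = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 6, 8, 8, 9, 10),
                 y = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 6, 7, 5, 3, 7))
goal_point <- c(x = 2, y = 2)

# Euclidean distance from every point to the goal
df$dist <- sqrt((df$x - goal_point[["x"]])^2 + (df$y - goal_point[["y"]])^2)

# Option 1: every point within a chosen radius (1.5 is an arbitrary cutoff)
near <- df[df$dist <= 1.5, ]

# Option 2: the k closest points (k = 5 here), sorted by distance
nearest5 <- head(df[order(df$dist), ], 5)

near
nearest5
```

With this data the radius captures the nine points of the 3 x 3 block around (2,2), including (2,2) itself.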

Connect observations with lines to a common point (i.e. (0,0))

I would like to connect observations from my df with a common point, i.e. the centerpoint (0,0) using ggplot2.
x y
1 5 4
2 -4 -2
3 -1 5
4 2 -8
Using geom_point(), I get the following.
Now, I would like to have lines connecting the four observations with the centerpoint at (0,0), like in the following (not made with R):
Is this possible at all using ggplot2?
I found a solution:
ggplot(df, aes(x, y)) + geom_point() + geom_segment(aes(xend=0, yend=0))
Answer based on @roland's comments on a question.
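For reference, a self-contained sketch using the four observations from the question. geom_segment needs x, y, xend and yend, so x and y are mapped in the ggplot() call where both layers can inherit them; the added geom_hline/geom_vline guides are an optional extra, not part of the original solution.

```r
library(ggplot2)

# The four observations from the question
df <- data.frame(x = c(5, -4, -1, 2),
                 y = c(4, -2, 5, -8))

p <- ggplot(df, aes(x, y)) +
  geom_hline(yintercept = 0, colour = "grey80") +  # optional axis guides
  geom_vline(xintercept = 0, colour = "grey80") +
  geom_segment(aes(xend = 0, yend = 0)) +          # one line per point, ending at (0,0)
  geom_point()
p
```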

How do I change the size of the points based on the value of the column using ggplot?

Using ggplot in RStudio, I am trying to change the size of the points in my scatter plot based on the log of the pvalue column. This is what my data looks like:
head(BDpvalue)
id t-value pvalue mean.f mean.m Gene Chromosome
1 ILMN_1212619 3.0512842692996 0.00938046962249251 85.40076 80.02744 Mfap3l 8
2 ILMN_1212693 3.40887110529531 0.00452088152864021 87.28189 82.89533 Snx33 9
3 ILMN_1213324 -4.54750670298688 0.000414140589714275 82.68924 88.81421 Zfp961 8
4 ILMN_1213848 -3.63180275429357 0.00246745595956587 421.61780 469.51845 Itgb1bp1 12
5 ILMN_1213961 2.97573716869553 0.00960659647288939 82.01748 78.44721 Copg2 6
6 ILMN_1214482 -4.23666060706341 0.000813240203181102 136.55021 153.34681 2700081O15Rik 19
The code to change the size based on the log of the pvalue column seems correct to me, but for some reason I am not seeing a change in the graph. This is the code that I used:
ggplot(BDpvalue, aes(x=(log(mean.m,10)+log(mean.f,10))/2,
y= log(mean.f/mean.m,10),color=Chromosome)) + geom_point(aes(size=(-log(pvalue,10))))
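A sketch of one way to make the size differences easier to see, assuming the issue is that the default size range is too narrow for the spread of p-values: compute the transformed value as its own column and widen the range with scale_size_continuous. The column name neg_log_p, the range c(1, 8), and the conversion of Chromosome to a factor are assumptions; only the first three rows of the data are reproduced here.

```r
library(ggplot2)

# First three rows of the data shown above (only the columns that are used)
BDpvalue <- data.frame(
  pvalue     = c(0.00938046962249251, 0.00452088152864021, 0.000414140589714275),
  mean.f     = c(85.40076, 87.28189, 82.68924),
  mean.m     = c(80.02744, 82.89533, 88.81421),
  Chromosome = factor(c(8, 9, 8))   # factor gives discrete colours (assumption)
)

# Precompute -log10(p) so it can be inspected before plotting
BDpvalue$neg_log_p <- -log10(BDpvalue$pvalue)

p <- ggplot(BDpvalue,
            aes(x = (log10(mean.m) + log10(mean.f)) / 2,
                y = log10(mean.f / mean.m),
                color = Chromosome)) +
  geom_point(aes(size = neg_log_p)) +
  scale_size_continuous(range = c(1, 8))   # widen the min/max point size
p
```

Checking range(BDpvalue$neg_log_p) first shows how much variation there is to map onto point size.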

Color the individuals of a R PCoA plot by groups

Should be a simple question, but I haven't found exactly how to do it so far.
I have a matrix as follows:
sample var1 var2 var3 etc.
1 5 7 3 1
2 0 1 6 8
3 7 6 8 9
4 5 3 2 4
I performed a PCoA using vegan and plotted the results. Now my problem is that I want to color the samples according to a pre-defined group:
group sample
1 1
1 2
2 3
2 4
How can I import the groups and then plot the points colored according to the group they belong to? It looks simple but I have been scratching my head over this.
Thanks!
Seb
You said you used vegan for the PCoA, which I assume means the wcmdscale function. By default, vegan::wcmdscale returns only a scores matrix, like the standard stats::cmdscale, but if you add certain arguments (such as eig = TRUE) you get a full wcmdscale result object with dedicated plot and points methods, and you can do:
plot(<pcoa-result>, type="n") # no reproducible example: edit as needed
points(<pcoa-result>, col = group) # no reproducible example: group must be visible
If you have a modern vegan (2.5.x) the following also works:
library(magrittr)
plot(<full-pcoa-result>, type = "n") %>% points("sites", col = group)
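Since the question has no reproducible example, here is a hedged sketch built from the four samples shown in it. The object names comm, group, d, pcoa and cols are hypothetical, and the Euclidean distance is an assumption chosen so that all PCoA axes are real; substitute whatever dissimilarity was actually used. It plots the score matrix in pcoa$points with base graphics rather than relying on the dedicated methods.

```r
library(vegan)

# The four samples and their groups from the question
comm <- matrix(c(5, 7, 3, 1,
                 0, 1, 6, 8,
                 7, 6, 8, 9,
                 5, 3, 2, 4),
               nrow = 4, byrow = TRUE,
               dimnames = list(1:4, paste0("var", 1:4)))
group <- factor(c(1, 1, 2, 2))

# Euclidean distances are an assumption; use your own dissimilarity here
d <- vegdist(comm, method = "euclidean")

# eig = TRUE returns the full result object, whose $points holds the site scores
pcoa <- wcmdscale(d, eig = TRUE)

# One colour per group, looked up by the integer code of the factor
cols <- c("steelblue", "firebrick")[as.integer(group)]
plot(pcoa$points[, 1:2], col = cols, pch = 19,
     xlab = "PCoA 1", ylab = "PCoA 2")
legend("topright", legend = levels(group), pch = 19,
       col = c("steelblue", "firebrick"), title = "group")
```

If the groups live in a separate file, they can be read with read.table and matched to the row names of the community matrix before building the colour vector.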

Visualize igraph degree distribution with ggplot2

I want to visualise the degree distribution of an igraph object with ggplot2. Because ggplot2 doesn't take the simple numeric vector generated by degree(), I convert it to a frequency table. Then I pass it to ggplot(). Still I get: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic? I can't set the table column degree to factors since I also need to plot it on a log scale.
library(igraph)
library(ggplot2)
g <- ba.game(20)
degree <- degree(g, V(g), mode="in")
degree
# [1] 6 2 7 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
degree <- data.frame(table(degree))
degree
# degree Freq
# 1 0 13
# 2 1 4
# 3 2 1
# 4 6 1
# 5 7 1
ggplot(degree, aes(x=degree, y=Freq)) +
geom_line()
# geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
The problem is that using table has turned degree$degree into a factor. Two things fix this:
Make it a factor with all possible values (from 0 up to the largest degree) so that degrees with a count of zero are not skipped.
Convert the labels back to numbers before plotting.
Implementing those (I used degree.df instead of overwriting degree to keep the different steps distinct):
degree.df <- data.frame(table(degree=factor(degree, levels=0:max(degree))))
degree.df$degree <- as.numeric(as.character(degree.df$degree))
Then the plotting code is what you had:
ggplot(degree.df, aes(x=degree, y=Freq)) +
geom_line()
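For the log-scale version mentioned in the question, zeros cannot be drawn on a log axis, so one option is to drop rows where either the degree or the count is zero before plotting. This is a sketch under that assumption; it hard-codes the degree vector printed in the question so it runs without igraph, starts the factor levels at 0 so the degree-0 nodes are counted, and the name pos is hypothetical.

```r
library(ggplot2)

# Degree vector as printed in the question (avoids needing igraph here)
degree <- c(6, 2, 7, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Frequency table over every degree from 0 to the maximum
degree.df <- data.frame(table(degree = factor(degree, levels = 0:max(degree))))
degree.df$degree <- as.numeric(as.character(degree.df$degree))

# Keep only rows that can be shown on log axes
pos <- subset(degree.df, degree > 0 & Freq > 0)

p <- ggplot(pos, aes(x = degree, y = Freq)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p
```

Dropping the zero rows changes what the line connects, so the linear-scale plot above remains the better view of the full distribution.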
