Sequence labeller - r

I am analysing a time series signal. I set a threshold to separate the noise from the baseline noise. In order to identify the properties of each signal sequence (duration, amplitude, maximum signal...), I built a function to aggregate all the signal points that are continuous as different "peaks". Despite this function does what I want, I was wondering if anyone can help me to make it more efficient -e. g. vectorization, because I aim to run the function on a data.table of more than 1M rows. Here is a sample data with the function:
# Generate dummy data
x <- sin(seq(from = 0, to = 20, length.out = 200)) + rnorm(200, 0,0.1)
x <- zoo(x)
plot(x)
# Label each point as signal (== )1) or noise (0)
y <- ifelse(x > 0.5, 1, 0)
# Function to label each peak
peak_labeler <- function(x) {
tmp <- NULL
for (i in seq_along(x)) {
if (x[i] == 0) { tmp[i] <- 0 } # If baseline, mark as 0
if (x[i] == 1) {
# If x[n] belongs to a peak
if (i == 1) {tmp[i] <- 1} # Label as 1 at t0
else{
if (!exists("Peak")) {Peak <- 0}
if (x[i - 1] == 0) {
# if previous point is no peak, add as peak
Peak <- Peak + 1
tmp[i] <- Peak
}
if (x[i - 1] == 1) {
tmp[i] <- Peak
}
}
}
}
return(tmp)
rm(tmp, Peak, i) # Garbage collection
}
# Label peaks
dummy <- data.frame(t = 1:200, x,y,tmp = peak_labeler(y))
# Show data
ggplot(dummy, aes(x = t, y = x)) +
geom_point(aes(col = as.factor(tmp), group = 1))

Here's an approach using dplyr.
The test in the cross_threshold line works by evaluating whether y is on a different side of 0.5 than the prior y. If so, the sign of the two terms y - threshold and lag(y) - threshold will be different, leading to a TRUE, which is multiplied by 1 to become 1. If they're on the same side of 0.5, you'll get a FALSE and a 0. The default = 0 part deals with the first line, where lag(y) is undefined. Then we add up how many cumulative crosses there have been to define the tmp group.
library(dplyr)
threshold = 0.5
dummy <- data.frame(t = 1:200, x, y) %>%
mutate(cross_threshold = 1 * (sign(y - threshold) != sign(lag(y, default = 0) - threshold)),
# Line above now optional, just if we want to label all crossings
up = 1 * ((y > threshold) & (lag(y) < threshold)),
tmp = if_else(y > threshold, cumsum(up), 0))
ggplot(dummy, aes(x = t, y = x)) +
geom_point(aes(col = as.factor(tmp), group = 1)) +
geom_point(data = filter(dummy, cross_threshold == 1), shape = 21, size = 5)

Related

Coin flip probability

I'm wondering what I should be doing here (please refer to image). I have already defined two vectors which are k=c(0,1) and v=c(runif(2,0.3,0.7)) where alpha=v[1] and beta=v[2].
Afterwards, I used an if statement, if(Xn==k[1]){...} However this is where I am stuck at. According to the question, I have to assign Xn+1=k[1] with probability (alpha) at the same time Xn+1=k[2] with probability (1-alpha) and if(Xn==k[2]){...} then Xn+1=k[1] has probability (beta) and Xn+1=k[2] will have probability (1-beta).
So my question is how do you assign the values to the respective Xn+1 values of 0 and 1 with probabilities [(alpha), (1-alpha)] and [(beta),(1-beta)]. After assigning it, how do you then run a simulation of 500 observations from X1 to X500 of the random variable by using a for loop This is similar to the coin toss experiment with the exception being that probability of Heads and Tails are decided by [alpha,beta] = runif(2,0.3,0.7)`.
Here is a base R solution.
toss <- function(n = 500L){
a <- runif(2, min = 0.3, max = 0.7)
alpha <- a[1]
beta <- a[2]
x <- integer(n)
x[1] <- rbinom(1, size = 1, prob = alpha)
for(i in seq_len(n - 1)){
if(x[i] == 0)
x[i + 1L] <- rbinom(1, size = 1, prob = 1 - alpha)
else
x[i + 1L] <- rbinom(1, size = 1, prob = 1 - beta)
}
list(x = x, alpha = alpha, beta = beta)
}
set.seed(2021)
X <- toss()
table(X$x)
#
# 0 1
#277 223
mean(X$x)
#[1] 0.446
X$alpha
#[1] 0.4805069
X$beta
#[1] 0.6135119
Histogram of 1000 runs.
To run the function repeatedly, use replicate.
Y <- replicate(1000, mean(toss()$x))
hist(Y, xlab = "Proportion of successes")

Infinite loop in coin-flipping game

Consider the following coin-flipping game:
A single play of the game consists of repeatedly flipping a fair coin until the difference between the number of heads tossed and the number of tails is 4.
You are required to pay 1 dollar for each flip of the coin, and you may not quit during the play of the game.
You receive 10 dollars at the end of each play of the game. The “winnings” from the game is defined as the 10 received at the end minus the amount paid. a. Simulate this game to estimate the expected winnings from many plays of the game. b. Suppose we use a biased coin. Find value(s) of P(tail) that make the game fair, meaning the expected winnings is 0 dollar.
This is the question that I'm supposed to answer and here is my try
h <- function() {
A <- c("H", "T")
s <- sample(A,4, replace = T)
heads <- length(which(s=="H"))
tails <- length(which(s =="T"))
w <- heads - tails
counter <- 4
while (w != 4) {
s <- sample(A,1)
w <- heads - tails
heads <- length(which(s=="H"))
tails <- length(which(s =="T"))
counter <- counter +1
}
return(counter)
}
h()
But I think this gave me a infinite loop, can anyone help please?
You are recomputing w in ever iteration of the loop based on the current value of heads and tails. But these values will always be 1 and 0 (or 0 and 1). So w is always either -1 or 1, never any other value.
Another error in your code is that you only stop when heads is 4 ahead. But according to the rules, the game should also stop when tails is 4 ahead: only the absolute difference matters.
The logic of your code could be fixed, but a much simpler logic would work (note that the following code uses self-explanatory variable names, which makes the resulting code much more readable):
h = function () {
sides = c('H', 'T')
diff = 0L
cost = 0L
repeat {
cost = cost + 1L
flip = sample(sides, 1L)
if (flip == 'H') diff = diff + 1L
else diff = diff - 1L
if (abs(diff) == 4L) return(cost)
}
}
You can simplify this further because the labels of the coin sides don’t actually matter. All you care about is a coin toss that returns one of two results.
We can implement that as a separate function. The return value of the function isn’t very important, as long as we have a fixed convention: it could be in c('H', 'T'), or c(FALSE, TRUE), or c(0L, 1L), etc. For our purposes, it would be convenient to return either -1L or 1L, so that our function h could directly add that value to diff:
coin_toss = function () {
sample(c(-1L, 1L), 1L)
}
But there’s a different way of obtaining a coin toss: a Bernoulli trial of size 1. And using a Bernoulli trial has a nice property: we can trivially extend our function to allow unfair (biased) coin tosses. So here’s the same function, but with an optional bias parameter (by default the coin toss is fair):
coin_toss = function (bias = 0.5) {
rbinom(1L, 1L, prob = bias) * 2L - 1L
}
(rbinom(…) returns either 0L or 1L. To transform the domain of values into c(-1L, 1L), we multiply by 2 and subtract 1.)
Now let’s change h to use this function:
h = function (bias = 0.5) {
cost = 0L
diff = 0L
repeat {
cost = cost + 1L
diff = diff + coin_toss(bias)
if (abs(diff) == 4L) return(cost)
}
}
coin_toss() is either 0 or 1 but, depending on its value, we either
I'd like to answer your questions, both a) and b) part. I'll use my codes to save my time.
It's a cool game, where software simulation could prove to be very helpful.
The bare bones of the game is "never ending loop", which eventually ends when absolute difference of the number of heads and tails is equal 4. The payoff is then recorded. As Konrad Rudolph mentioned, the game is of Bernoulli type. The game is simulated with the code below:
n_games <- 1000 # number of games to play
bias <- 0.5
game_payoff <- c()
for (i in seq_len(n_games)) {
cost <- 0
flip_record <- c()
payoff <- c()
repeat{
cost <- cost + 1
flip <- rbinom(1, 1, prob = bias)
flip_record <- c(flip_record, flip)
n_tails <- length(flip_record) - sum(flip_record) # number of 0s/tails
n_heads <- sum(flip_record) # number of 1s/heads
if (abs(n_tails - n_heads) == 4) {
game_payoff <- c(game_payoff, 10 - cost) # record game payoff
print(paste0("single game payoff: ", 10 - cost)) # print game payoff
break
}
}
}
With a large number of runs, e.g. another loop over this loop, we learn, that the expected value is very close to -6. Thus, the game has negative expected value. It follows from this code:
library(ggplot2)
seed <- 122334
# simulation
n_runs <- 100
n_games <- 10000
bias <- 0.5
game_payoff <- c()
expected_value_record <- c()
for (j in seq_len(n_runs)) {
for (i in seq_len(n_games)) {
cost <- 0
flip_record <- c()
payoff <- c()
repeat{
cost <- cost + 1
flip <- rbinom(1, 1, prob = bias)
flip_record <- c(flip_record, flip)
# print(flip_record)
n_tails <- length(flip_record) - sum(flip_record) # number of 0s/tails
n_heads <- sum(flip_record) # number of 1s/heads
if (abs(n_tails - n_heads) == 4) {
game_payoff <- c(game_payoff, 10 - cost) # record game payoff
print(paste0("single game payoff: ", 10 - cost))
break
}
}
}
expected_value_record <- c(expected_value_record, mean(game_payoff))
game_payoff <- c()
}
# plot expected value
expected_value_record <- cbind.data.frame("run" = seq_len(length(expected_value_record)), expected_value_record)
ggplot(data = expected_value_record) +
geom_line(aes(x = run, y = expected_value_record)) +
scale_x_continuous(breaks = c(seq(1, max(expected_value_record$run), by = 3), max(expected_value_record$run))) +
labs(
title = "Coin flip experiment: expected value in each run. ",
caption = paste0("Number of runs: ", n_runs, ". ", "Number of games in each run: ", n_games, "."),
x = "Run",
y = "Expected value") +
geom_hline(yintercept = mean(expected_value_record$expected_value_record), size = 1.4, color = "red") +
annotate(
geom = "text",
x = 0.85 * n_runs,
y = max(expected_value_record$expected_value_record),
label = paste0("Mean across runs: ", mean(expected_value_record$expected_value_record)),
color = "red") +
theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 0.5))
Graphics:
Let's now look at part b) of the question with another simulation. The loop has been wrapped into a function, which with the help of sapply we run over a sequence of probabilities:
library(ggplot2)
seed <- 122334
# simulation function
coin_game <- function(n_runs, n_games, bias = 0.5){
game_payoff <- c()
expected_value_record <- c()
for (j in seq_len(n_runs)) {
for (i in seq_len(n_games)) {
cost <- 0
flip_record <- c()
payoff <- c()
repeat{
cost <- cost + 1
flip <- rbinom(1, 1, prob = bias)
flip_record <- c(flip_record, flip)
# print(flip_record)
n_tails <- length(flip_record) - sum(flip_record) # number of 0s/tails
n_heads <- sum(flip_record) # number of 1s/heads
if (abs(n_tails - n_heads) == 4) {
game_payoff <- c(game_payoff, 10 - cost) # record game payoff
break
}
}
}
expected_value_record <- c(expected_value_record, mean(game_payoff))
game_payoff <- c()
}
return(expected_value_record)
}
# run coin_game() on a vector of probabilities - introduce bias to find fair game conditions
n_runs = 1
n_games = 1000
expected_value_record <- sapply(seq(0.01, 0.99, by = 0.01), coin_game, n_runs = n_runs, n_games = n_games)
# plot expected value
expected_value_record <- cbind.data.frame("run" = seq_len(length(expected_value_record)), "bias" = c(seq(0.01, 0.99, by = 0.01)), expected_value_record)
ggplot(data = expected_value_record) +
geom_line(aes(x = bias, y = expected_value_record)) +
scale_x_continuous(breaks = c(seq(min(expected_value_record$bias), max(expected_value_record$bias), by = 0.1), max(expected_value_record$bias))) +
scale_y_continuous(breaks = round(c(0, seq(min(expected_value_record$expected_value_record), max(expected_value_record$expected_value_record), length.out = 10)), digits = 4)) +
labs(
title = "Coin flip experiment: expected value for each probability level",
caption = paste0("Number of runs per probability level: ", n_runs, ". ", "Number of games in each run: ", n_games, "."),
x = "Probability of success in Bernoulli trial",
y = "Expected value") +
geom_hline(yintercept = 0, size = 1.4, color = "red") +
geom_text(aes(x = 0.1, y = 0, label = "Fair game", hjust = 1, vjust = -1), size = 4, color = "red") +
theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 0.5))
Graphics:
Examination of the expected_value_record dataframe suggests, the game is fair when probability values are within ranges: 0.32-0.33 or 0.68-0.69.
It's easy to tweak the last code to squeeze more robust numbers out of it.

Why is my Monte Carlo Integration wrong by a factor of 2?

I am trying to integrate the following function using a Monte Carlo Integration. The interval I want to integrate is x <- seq(0, 1, by = 0.01) and y <- seq(0, 1, by = 0.01).
my.f <- function(x, y){
result = x^2 + sin(x) + exp(cos(y))
return(result)
}
I calculated the integral using the cubature package.
library(cubature)
library(plotly)
# Rewriting the function, so it can be integrated
cub.function <- function(x){
result = x[1]^2 + sin(x[1]) + exp(cos(x[2]))
return(result)
}
cub.integral <- adaptIntegrate(f = cub.function, lowerLimit = c(0,0), upperLimit = c(1,1))
The result is 3.134606. But when I use my Monte Carlo Integration Code, see below, my result is about 1.396652. My code is wrong by more than a factor of 2!
What I did:
Since I need a volume to conduct a Monte Carlo Integration, I calculated the function values on the mentioned interval. This will give me an estimation of the maximum and minimum of the function.
# My data range
x <- seq(0, 1, by = 0.01)
y <- seq(0, 1, by = 0.01)
# The matrix, where I save the results
my.f.values <- matrix(0, nrow = length(x), ncol = length(y))
# Calculation of the function values
for(i in 1:length(x)){
for(j in 1:length(y)){
my.f.values[i,j] <- my.f(x = x[i], y = y[j])
}
}
# The maximum and minimum of the function values
max(my.f.values)
min(my.f.values)
# Plotting the surface, but this is not necessary
plot_ly(y = x, x = y, z = my.f.values) %>% add_surface()
So, the volume that we need is simply the maximum of the function values, since 1 * 1 * 4.559753 is simply 4.559753.
# Now, the Monte Carlo Integration
# I found the code online and modified it a bit.
monte = function(x){
tests = rep(0,x)
hits = 0
for(i in 1:x){
y = c(runif(2, min = 0, max = 1), # y[1] is y; y[2] is y
runif(1, min = 0, max = max(my.f.values))) # y[3] is z
if(y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))){
hits = hits + 1
}
prop = hits / i
est = prop * max(my.f.values)
tests[i] = est
}
return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
So, the result is completely wrong. But if I change the function a bit, suddenly is works.
monte = function(x){
tests = rep(0,x)
hits = 0
for(i in 1:x){
x = runif(1)
y = runif(1)
z = runif(1, min = 0, max = max(my.f.values))
if(z < my.f(x = x, y = y)){
hits = hits + 1
}
prop = hits / i
est = prop * max(my.f.values)
tests[i] = est
}
return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
Can somebody explain why the result suddenly changes? To me, both functions seem to do the exact same thing.
In your (first) code for monte, this line is in error:
y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))
Given your definition of my.f, it should surely be
y[3] < y[1]**2 + sin(y[1]) + exp(cos(y[2]))
Or..., given that you shouldn't be repeating yourself unnecessarily:
y[3] < my.f(y[1], y[2])

My variogram code result different from variog() result

I am writing code for producing a variogram. For validating my result, I checked with geoR::variog() but both variograms are different.
I tried to understand the code of variog() to see what happens under the hood but there are so many things happening that I can't seem to understand it. I, in my code, am using the parameters X-coordinate, Y-coordiante, data value, number of lags, minimum lag value, lag interval, azimuth (angle in degrees; 90 corresponds to vertical direction), angle tolerance (in degrees) and maximum bandwidth.
variogram = function(xcor, ycor, data, nlag, minlag, laginv, azm, atol, maxbandw){
dl <- length(data)
lowangle <- azm - atol
upangle <- azm + atol
gamlag <- integer(nlag)
n <- integer(nlag)
dist <- pairdist(xcor, ycor)
maxd <- max(dist)
llag <- seq(minlag, minlag + (nlag-1) * laginv, by = laginv)
hlag <- llag + laginv
for(i in 1:dl){
for(j in i:dl){
if(i != j){
if(xcor[j]- xcor[i] == 0)
theta <- 90
else
theta <- 180/pi * atan((ycor[j] - ycor[i])/(xcor[j] - xcor[i]))
for(k in 1:nlag){
d <- dist[j, i]
b <- abs(d * sin(theta - azm))
if((llag[k] <= d & d < hlag[k]) & (lowangle <= theta & theta < upangle) & (b <= maxbandw)){
gamlag[k] <- gamlag[k] + (data[i] - data[j])^2;
n[k] <- n[k] + 1
}
}
}
}
}
gamlag <- ifelse(n == 0, NA, gamlag/(2*n))
tmp <- data.frame("lag" = llag, "gamma" = gamlag)
return(tmp)
}
function call for the above code
ideal_variogram_2 <- variogram(data3[,1], data3[,2], data3[,3], 18, 0, 0.025, 90, 45, 1000000)
ideal_variogram_2 <- na.omit(ideal_variogram_2)
plot(ideal_variogram_2$lag, ideal_variogram_2$gamma, main = "Using my code")
function call for variog()
geodata1 <- as.geodata(data3, coords.col = 1:2, data.col = 3)
ideal_variogram_1 <- variog(geodata1, coords = geodata1$coords, data = geodata1$data, option = "bin", uvec = seq(0, 0.45, by = 0.025), direction = pi/2, tolerance = pi/4)
df <- data.frame(u = ideal_variogram_1$u, v = ideal_variogram_1$v)
plot(df$u, df$v, main = "Using variog()")
The 2 variograms that I got are at the following link:
Variogram

Ternary plots with binned means/medians

I am looking to generate a ternary plot with binned polygons (either triangle or hex, preferably in a ggplot framework) where the color of the polygon is a binned mean or median of selected values.
This script gets very close, but triangle cell color is representative of a number of observations, rather than a mean value of observations contained within the triangle cell.
So rather than soley providing X,Y, and Z; I would provide a fourth fill/value variable is provided from which binned means or medians are calculated and represented as a color on a gradient.
Akin to the below image, though in a ternary framework with an additional axis.
Image of stat_summary_hex() plot with color as binned mean value
I appreciate the help. Thank you.
Dummy data to begin with:
#load libraries
devtools::install_git('https://bitbucket.org/nicholasehamilton/ggtern')
library(ggtern)
library(ggplot)
# example data
sig <- matrix(c(3,0,0,2),2,2)
data <- data.frame(mvrnorm(n=10000, rep(2, 2), sig))
data$X1 <- data$X1/max(data$X1)
data$X2 <- data$X2/max(data$X2)
data$X1[which(data$X1<0)] <- runif(length(data$X1[which(data$X1<0)]))
data$X2[which(data$X2<0)] <- runif(length(data$X2[which(data$X2<0)]))
data$X3 <- with(data, 1-X1-X2)
data <- data[data$X3 >= 0,]
data$X4 <- rnorm(dim(data)[1])
data <- data.frame(X = data$X1, Y = data$X2, Z = data$X3, fill_variable = data$X4)
str(data)
# simple ternary plot where color of point is the fill variable value
ggtern(data,aes(X,Y,Z, color = fill_variable))+geom_point()
# 2D example, not a ternary though. Keep in mind in geom_hex Z is the fill, not the additional axis like ggtern
ggplot(data,aes(X,Y))+stat_summary_hex(aes(z = fill_variable))
This code isn't cleaned up, but it's a good jumping off point. Credit for original goes the OP referenced in the first question.
I made some minor adjustments to the count_bin function to instead of doing bin counts, it does bin medians. Use at your own risk and please point out any bugs. For my implementation this reports 0 for NA bins.
Example:
Function for binned median (pardon the name, just saves time):
count_bin <- function(data, minT, maxT, minR, maxR, minL, maxL) {
ret <- data
ret <- with(ret, ret[minT <= X1 & X1 < maxT,])
ret <- with(ret, ret[minL <= X2 & X2 < maxL,])
ret <- with(ret, ret[minR <= X3 & X3 < maxR,])
if(is.na(median(ret$VAR))) {
ret <- 0
} else {
ret <- median(ret$VAR)
}
ret
}
Modified heatmap function:
heatmap3d <- function(data, inc, logscale=FALSE, text=FALSE, plot_corner=TRUE) {
# When plot_corner is FALSE, corner_cutoff determines where to stop plotting
corner_cutoff = 1
# When plot_corner is FALSE, corner_number toggles display of obervations in the corners
# This only has an effect when text==FALSE
corner_numbers = TRUE
count <- 1
points <- data.frame()
for (z in seq(0,1,inc)) {
x <- 1- z
y <- 0
while (x>0) {
points <- rbind(points, c(count, x, y, z))
x <- round(x - inc, digits=2)
y <- round(y + inc, digits=2)
count <- count + 1
}
points <- rbind(points, c(count, x, y, z))
count <- count + 1
}
colnames(points) = c("IDPoint","T","L","R")
#str(points)
#str(count)
# base <- ggtern(data=points,aes(L,T,R)) +
# theme_bw() + theme_hidetitles() + theme_hidearrows() +
# geom_point(shape=21,size=10,color="blue",fill="white") +
# geom_text(aes(label=IDPoint),color="blue")
# print(base)
polygons <- data.frame()
c <- 1
# Normal triangles
for (p in points$IDPoint) {
if (is.element(p, points$IDPoint[points$T==0])) {
next
} else {
pL <- points$L[points$IDPoint==p]
pT <- points$T[points$IDPoint==p]
pR <- points$R[points$IDPoint==p]
polygons <- rbind(polygons,
c(c,p),
c(c,points$IDPoint[abs(points$L-pL) < inc/2 & abs(points$R-pR-inc) < inc/2]),
c(c,points$IDPoint[abs(points$L-pL-inc) < inc/2 & abs(points$R-pR) < inc/2]))
c <- c + 1
}
}
#str(c)
# Upside down triangles
for (p in points$IDPoint) {
if (!is.element(p, points$IDPoint[points$T==0])) {
if (!is.element(p, points$IDPoint[points$L==0])) {
pL <- points$L[points$IDPoint==p]
pT <- points$T[points$IDPoint==p]
pR <- points$R[points$IDPoint==p]
polygons <- rbind(polygons,
c(c,p),
c(c,points$IDPoint[abs(points$T-pT) < inc/2 & abs(points$R-pR-inc) < inc/2]),
c(c,points$IDPoint[abs(points$L-pL) < inc/2 & abs(points$R-pR-inc) < inc/2]))
c <- c + 1
}
}
}
#str(c)
# IMPORTANT FOR CORRECT ORDERING.
polygons$PointOrder <- 1:nrow(polygons)
colnames(polygons) = c("IDLabel","IDPoint","PointOrder")
df.tr <- merge(polygons,points)
Labs = ddply(df.tr,"IDLabel",function(x){c(c(mean(x$T),mean(x$L),mean(x$R)))})
colnames(Labs) = c("Label","T","L","R")
#str(Labs)
#triangles <- ggtern(data=df.tr,aes(L,T,R)) +
# geom_polygon(aes(group=IDLabel),color="black",alpha=0.25) +
# geom_text(data=Labs,aes(label=Label),size=4,color="black") +
# theme_bw()
# print(triangles)
bins <- ddply(df.tr, .(IDLabel), summarize,
maxT=max(T),
maxL=max(L),
maxR=max(R),
minT=min(T),
minL=min(L),
minR=min(R))
#str(bins)
count <- ddply(bins, .(IDLabel), summarize,
N=count_bin(data, minT, maxT, minR, maxR, minL, maxL)
#N=mean(data)
)
df <- join(df.tr, count, by="IDLabel")
str(count)
Labs = ddply(df,.(IDLabel,N),function(x){c(c(mean(x$T),mean(x$L),mean(x$R)))})
colnames(Labs) = c("Label","N","T","L","R")
if (plot_corner==FALSE){
corner <- ddply(df, .(IDPoint, IDLabel), summarize, maxperc=max(T,L,R))
corner <- corner$IDLabel[corner$maxperc>=corner_cutoff]
df$N[is.element(df$IDLabel, corner)] <- 0
if (text==FALSE & corner_numbers==TRUE) {
Labs$N[!is.element(Labs$Label, corner)] <- ""
text=TRUE
}
}
heat <- ggtern(data=df,aes(L,T,R)) +
geom_polygon(aes(fill=N,group=IDLabel),color="black",alpha=1, size = 0.1,show.legend = F)
if (logscale == TRUE) {
heat <- heat + scale_fill_gradient(name="Observations", trans = "log",
low=palette[2], high=palette[4])
} else {
heat <- heat + scale_fill_distiller(name="Median Value",
palette = "Spectral")
}
heat <<- heat +
Tlab("x") +
Rlab("y") +
Llab("z") +
theme_bw() +
theme(axis.tern.arrowsep=unit(0.02,"npc"), #0.01npc away from ticks ticklength
axis.tern.arrowstart=0.25,axis.tern.arrowfinish=0.75,
axis.tern.text=element_text(size=12),
axis.tern.arrow.text.T=element_text(vjust=-1),validate = F,
axis.tern.arrow.text.R=element_text(vjust=2),
axis.tern.arrow.text.L=element_text(vjust=-1),
#axis.tern.arrow.text=element_text(size=12),
axis.tern.title=element_text(size=15),
axis.tern.text=element_blank(),
axis.tern.arrow.text=element_blank())
if (text==FALSE) {
print(heat)
} else {
print(heat + geom_text(data=Labs,aes(label=N),size=3,color="white"))
}
}
Dummy example:
# dummy example
sig <- matrix(c(3,3,3,3),3,3)
data <- data.frame(mvrnorm(n=10000, rep(2, 2), sig))
data$X1[which(data$X1<0)] <- runif(length(data$X1[which(data$X1<0)]))
data$X2[which(data$X2<0)] <- runif(length(data$X2[which(data$X2<0)]))
data$X3 <- with(data, 1-X1-X2)
data <- data[data$X3 >= 0,]
data$VAR <- rnorm(dim(data)[1])
data <- data.frame(X = data$X1, Y = data$X2, Z = data$X3, fill_variable = data$X4)
str(data)
ggtern(data,aes(X1,
X2,
X3, color = VAR))+geom_point(size = 5)+scale_color_distiller(palette = "Spectral")
heatmap3d(data,.05)

Resources