I'm trying to calculate the volume under a 3D surface in R.
My data, dat, looks like:
0.003 0.019 0.083 0.25 0.5 1
0 1.0000000 0.8884265 0.8603268 0.7719994 0.7443621 0.6571405
0.111 0.6909722 0.6775000 0.6443750 0.6243750 0.5914730 0.5698242
0.25 0.5847205 0.6022367 0.5572917 0.5432991 0.5170673 0.4835819
0.429 0.5210938 0.5139063 0.4995312 0.4864062 0.4648636 0.4163698
0.667 0.4363103 0.4526562 0.4321859 0.4027519 0.4046011 0.3661616
1 0.3958333 0.4167468 0.3964428 0.3810459 0.3486328 0.3487930
where x = rownames(dat), y = colnames(dat) and z = dat.
I've looked here, here, and here, but can't seem to figure out how to apply those answers to my use case.
Here's a reproducible version of my data:
dat = structure(c(1,0.690972222222222,0.584720477386935,0.52109375,0.436310279187817,0.395833333333333,0.888426507537688,0.6775,0.602236675126904,0.51390625,0.45265625,0.416746794871795,0.860326776649746, 0.644375, 0.557291666666667,0.49953125,0.432185913705584,0.396442819148936,0.771999378109453,0.624375,0.543299129353234,0.48640625,0.402751865671642,0.381045854271357,0.744362113402062,0.591472989949749,0.517067307692308,0.464863578680203,0.404601130653266,0.3486328125,0.657140544041451,0.56982421875,0.483581852791878,0.41636981865285,0.366161616161616,0.348792989417989),.Dim = c(6L, 6L), .Dimnames = list(c("0","0.111","0.25","0.429","0.667","1"),c("0.003","0.019","0.083","0.25","0.5","1")))
You could use the getVolume() function provided in the answer you linked, provided that your matrix is first converted to the required data frame format (one row per (x, y, z) point).
Here is some code to make that dataframe:
df <- expand.grid(x = as.numeric(rownames(dat)), y = as.numeric(colnames(dat)))
df$z = as.vector(dat)
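As a quick sanity check (my addition, not from the linked answer): expand.grid() varies its first argument fastest, which matches the column-major order of as.vector(dat), so each z lands on the right (x, y) pair.
# spot-check one grid point against the original matrix
stopifnot(df$z[df$x == 0.25 & df$y == 0.083] == dat["0.25", "0.083"])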
Then define the function and apply:
library(geometry)
getVolume <- function(df) {
  # find a triangular tessellation of the (x, y) grid
  res <- delaunayn(as.matrix(df[, -3]), full = TRUE, options = "Qz")
  # calculate the sum of the truncated-prism volumes: each triangle contributes
  # its area times the mean of the z values at its three vertices
  sum(mapply(function(triPoints, A) A/3 * sum(df[triPoints, "z"]),
             split.data.frame(res$tri, seq_along(res$areas)),
             res$areas))
}
getVolume(df)
[1] 0.4714882
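As a rough cross-check (my own sketch, not part of the linked answer), you can approximate the same volume with a trapezoidal rule over each rectangular grid cell; the two estimates should agree closely, though not exactly, since the Delaunay method splits each cell into triangles:
x <- as.numeric(rownames(dat))
y <- as.numeric(colnames(dat))
vol <- 0
for (i in seq_len(nrow(dat) - 1)) {
  for (j in seq_len(ncol(dat) - 1)) {
    cell_area <- (x[i + 1] - x[i]) * (y[j + 1] - y[j])
    # average the four corner heights of the cell
    vol <- vol + cell_area * mean(dat[i:(i + 1), j:(j + 1)])
  }
}
vol  # should land near getVolume(df)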
I'm working on a one-species, two-resource phytoplankton competition model based on Tilman's work in the 70s and 80s. I have a data frame set up for the analytical solution but am really struggling with the syntax to plot the graphs I need. Here is my code so far:
library(dplyr)
r <- 0.1
g1 <- 0.001
g2 <- 0.01
v1 <- 0.1
v2 <- 1
k1 <- 0.01
k2 <- 0.1
d <- 0.15
s1_star = (r*g1*k1*d)-((v1*(r-d))-r*g1*d)
s2_star = (r*g2*k2*d)-((v2*(r-d))-r*g2*d)
s01 = s1_star+((s02-s2_star)*(g1/g2))
params <- list(r = 0.1,
               g1 = 0.001,
               g2 = 0.01,
               d = 0.5,
               v1 = 0.1,
               v2 = 1,
               k1 = 0.01,
               k2 = 0.1)
df <- data.frame(s02 = seq(10, 1, -1)) |>
  mutate(
    s1_star = (r*g1*k1*d) - ((v1*(r - d)) - r*g1*d),
    s2_star = (r*g2*k2*d) - ((v2*(r - d)) - r*g2*d),
    ## Tilman eq 17: the supply concentration of resource 1 in the reservoir that
    ## would result in co-limitation given some concentration of resource 2 (s02)
    ## in the reservoir
    s01 = s1_star + ((s02 - s2_star)*(g1/g2)),
    ## ratio of supply points that result in co-limitation
    s1_limiting_ratio = s02/s01
  )
cbind(params, df) |> as.data.frame() -> limiting_ratio
library(ggplot2)
limiting_ratio |> ggplot(aes(x = s1_star, y = s2_star)) + geom_line()
I want to plot s1_star and s2_star as the axes (which I did), but I'm trying to add s1_limiting_ratio as a line on the graph; it's the ratio s02/s01, which represents when resource 1 (S1) and resource 2 (S2) are co-limited. Then I want to plot various values of s01 and s02 on the graph to see where they fall, to determine which resource is limiting and hence which resource equation (S1 or S2) to use in the analytical solution.
I've tried googling ggplot help, but I'm struggling to apply it to the graph I need. I'm still fairly new to R and definitely pretty new to ggplot, so I really appreciate any help and advice!
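Not a full answer to the model itself, but a sketch of the ggplot mechanics being asked about, assuming the supply concentrations go on the axes (note that s1_star and s2_star are constants in limiting_ratio, so they would plot as a single point rather than a line); the supply_points values below are hypothetical:
# hypothetical supply points; replace with the (s01, s02) values you care about
supply_points <- data.frame(s01 = c(0.2, 0.5, 0.8), s02 = c(2, 5, 8))
ggplot(limiting_ratio, aes(x = s01, y = s02)) +
  geom_line() +                                    # the co-limitation line
  geom_point(data = supply_points, colour = "red") +
  labs(x = "resource 1 supply (s01)", y = "resource 2 supply (s02)")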
I am trying to create or find a function that calculates the distance and angle between two points. The idea is that I can have two data.frames with x, y coordinates as follows:
Example dataset
From <- data.frame(x = c(0.5,1, 4, 0), y = c(1.5,1, 1, 0))
To <- data.frame(x =c(3, 0, 5, 1), y =c(3, 0, 6, 1))
Current function
For now, I've managed to develop the distance part using Pythagoras:
distance <- function(from, to){
  D <- sqrt((abs(from[,1] - to[,1])^2) + (abs(from[,2] - to[,2])^2))
  return(D)
}
Which works fine:
distance(from = From, to = To)
[1] 2.915476 1.414214 5.099020 1.414214
but I can't figure out how to get the angle part.
What I tried so far:
I tried adapting the second solution of this question
angle <- function(x, y){
  dot.prod <- x %*% y
  norm.x <- norm(x, type = "2")
  norm.y <- norm(y, type = "2")
  theta <- acos(dot.prod / (norm.x * norm.y))
  as.numeric(theta)
}
x <- as.matrix(c(From[,1],To[,1]))
y <- as.matrix(c(From[,2],To[,2]))
angle(t(x),y)
But I am clearly making a mess of it.
Desired output
I would like to add the angle part to my first function, so that I get both the distance and the angle between the from and to data frames.
By the angle between two points, I am assuming you mean the angle between the two position vectors defined by those points (taking the origin as the start of each vector).
The example you adapted was designed around a single pair of points, which is why it uses the transpose; it does, however, work in more than two dimensions.
Your function should be vectorised, as your distance function is, since it will receive a number of pairs of points (and here we are only considering two-dimensional points).
angle <- function(from, to){
  dot.prods <- from$x*to$x + from$y*to$y
  # each norm is just the distance from the origin, so reuse distance()
  # with a zeroed-out copy of the data frame as the starting point
  norms.x <- distance(from = `[<-`(from, , , 0), to = from)
  norms.y <- distance(from = `[<-`(to, , , 0), to = to)
  thetas <- acos(dot.prods / (norms.x * norms.y))
  as.numeric(thetas)
}
angle(from=From,to=To)
[1] 0.4636476 NaN 0.6310794 NaN
The NaNs are due to you having zero-length vectors.
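If you would rather get NA than NaN for those degenerate cases, a small guard (my addition, not part of the original function) works:
angle_safe <- function(from, to){
  thetas <- angle(from, to)
  # a zero-length vector has no direction, so report NA rather than NaN
  thetas[is.nan(thetas)] <- NA
  thetas
}
angle_safe(from = From, to = To)
[1] 0.4636476        NA 0.6310794        NA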
How about:
library(useful)
df=To-From
cart2pol(df$x, df$y, degrees = F)
which returns:
# A tibble: 4 x 4
r theta x y
<dbl> <dbl> <dbl> <dbl>
1 2.92 0.540 2.50 1.50
2 1.41 3.93 -1.00 -1.00
3 5.10 1.37 1.00 5.00
4 1.41 0.785 1.00 1.00
where r is the distance and theta is the angle.
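If you'd rather avoid the extra dependency, cart2pol()'s theta is just atan2() under the hood, so a base-R sketch of the same displacement-vector calculation would be:
d <- To - From
theta <- atan2(d$y, d$x)                           # angle in (-pi, pi]
theta <- ifelse(theta < 0, theta + 2*pi, theta)    # shift to [0, 2*pi) to match cart2pol
data.frame(r = sqrt(d$x^2 + d$y^2), theta = theta)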
I am a beginner in R and would like to get some help from experts.
I want to create a function to calculate risk under three conditions:
1st (control): Xhh = 0, Xmi = 0
2nd (hh): Xhh = 1, Xmi = 0
3rd (hh + mi): Xhh = 1, Xmi = 1
and compare between two groups:
group 1: Xenv = 50
group 2: Xenv = 90
My parameters:
thi   lambda1  lambda2  lambda3  Beta  Z2    Z1    Z4    Z3    Z6    Z5    theta
1.38  0.34     0.25     0.49     0.5   0.58  0.55  0.59  0.56  0.44  0.61  0.88
I want to plug all of these parameters into this equation:
http://i.stack.imgur.com/DYR81.png
and calculate the value at different time points, Ti = 1 to 10. For Ti = 0, set the value to 0.
Then plot the value at the different time points, with three curves for the three conditions, compared between the two groups, so the final graph has six curves.
Can anyone please offer some help?
This is probably a little more interpretation of your needs than you will tend to get here, but I was looking for a reason to procrastinate at other tasks this morning.
I think this meets your needs as stated:
# define a function called myfunction
myfunction <- function(lambda1 = 0.34,
                       lambda2 = 0.25,
                       lambda3 = 0.49,
                       Beta = 0.5,
                       Z2 = 0.58,
                       Z1 = 0.55,
                       Z4 = 0.59,
                       Z3 = 0.56,
                       Z6 = 0.44,
                       Z5 = 0.61,
                       theta = 0.88,
                       Xenv,
                       Xhi,
                       Xmi,
                       Tmin,
                       Tmax){
  # this should make this function somewhat generalizable to values of T
  # create an empty vector to hold values of our function as defined
  f <- rep(NA, length(Tmin:Tmax))
  # loop through values of T in your function
  # check parentheses here - I make no promises
  # I'm also unclear what your value of thi is. I may be missing something,
  # but I don't see it in the function you have written
  i <- Tmin:Tmax
  f <- -log(exp((-(lambda1*i)^theta)*exp(log10(1-Beta*Xhi)+log10((Xenv/100)*(Z2-Z1)+Z2)) -
            ((lambda2*i)^theta)*exp(log10(1-Beta*Xmi))*log10((Xenv/100)*(Z4-Z3)+Z4) -
            ((lambda3*i)^theta)*exp(log10((Xenv/100)*(Z6-Z5)+Z6)))*(1-theta)+theta)
  # set f = 0 at T = 0 (I think this is what you want)
  if (Tmin == 0) f[1] <- 0
  return(f)
}
# you didn't specify how to plot, but this seems to lend itself to a faceted ggplot viz.
require(ggplot2)
require(reshape2)
# calculate for group 1
datg1 <- data.frame(t = 0:10,
                    group = 1,
                    condition1 = myfunction(Xenv = 50, Xhi = 0, Xmi = 0, Tmin = 0, Tmax = 10),
                    condition2 = myfunction(Xenv = 50, Xhi = 1, Xmi = 0, Tmin = 0, Tmax = 10),
                    condition3 = myfunction(Xenv = 50, Xhi = 1, Xmi = 1, Tmin = 0, Tmax = 10))
# calculate for group 2
datg2 <- data.frame(t = 0:10,
                    group = 2,
                    condition1 = myfunction(Xenv = 90, Xhi = 0, Xmi = 0, Tmin = 0, Tmax = 10),
                    condition2 = myfunction(Xenv = 90, Xhi = 1, Xmi = 0, Tmin = 0, Tmax = 10),
                    condition3 = myfunction(Xenv = 90, Xhi = 1, Xmi = 1, Tmin = 0, Tmax = 10))
# bind values together
dat <- rbind(datg1, datg2)
# melt your data into long format
datm <- melt(dat, id.vars = c("t", "group"))
# plot and facet
ggplot(datm, aes(x = t, y = value, colour = variable)) +
  geom_line() +
  facet_grid(. ~ group)
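As an aside, reshape2 is superseded these days; if you prefer tidyr, the same reshape (a sketch under that substitution) would be:
library(tidyr)
datm <- pivot_longer(dat, cols = starts_with("condition"),
                     names_to = "variable", values_to = "value")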
I'm running k-means clustering on a data frame df1, and I'm looking for a simple approach to computing the closest cluster center for each observation in a new data frame df2 (with the same variable names). Think of df1 as the training set and df2 as the testing set; I want to cluster on the training set and assign each test point to the correct cluster.
I know how to do this with the apply function and a few simple user-defined functions (previous posts on the topic have usually proposed something similar):
df1 <- data.frame(x=runif(100), y=runif(100))
df2 <- data.frame(x=runif(100), y=runif(100))
km <- kmeans(df1, centers=3)
closest.cluster <- function(x) {
  cluster.dist <- apply(km$centers, 1, function(y) sqrt(sum((x - y)^2)))
  return(which.min(cluster.dist)[1])
}
clusters2 <- apply(df2, 1, closest.cluster)
However, I'm preparing this clustering example for a course in which students will be unfamiliar with the apply function, so I would much prefer if I could assign the clusters to df2 with a built-in function. Are there any convenient built-in functions to find the closest cluster?
You could use the flexclust package, which has an implemented predict method for k-means:
library("flexclust")
data("Nclus")
set.seed(1)
dat <- as.data.frame(Nclus)
ind <- sample(nrow(dat), 50)
dat[["train"]] <- TRUE
dat[["train"]][ind] <- FALSE
cl1 = kcca(dat[dat[["train"]]==TRUE, 1:2], k=4, kccaFamily("kmeans"))
cl1
#
# call:
# kcca(x = dat[dat[["train"]] == TRUE, 1:2], k = 4)
#
# cluster sizes:
#
# 1 2 3 4
#130 181 98 91
pred_train <- predict(cl1)
pred_test <- predict(cl1, newdata=dat[dat[["train"]]==FALSE, 1:2])
image(cl1)
points(dat[dat[["train"]]==TRUE, 1:2], col=pred_train, pch=19, cex=0.3)
points(dat[dat[["train"]]==FALSE, 1:2], col=pred_test, pch=22, bg="orange")
There are also conversion methods to convert results from cluster functions like stats::kmeans or cluster::pam to objects of class kcca and vice versa (here cl is an existing kmeans fit and x the data it was fitted on):
as.kcca(cl, data=x)
# kcca object of family ‘kmeans’
#
# call:
# as.kcca(object = cl, data = x)
#
# cluster sizes:
#
# 1 2
# 50 50
Something I noticed about both the approach in the question and the flexclust approach is that they are rather slow (benchmarked here on a training set and a testing set with 1 million observations and 2 features each).
Fitting the original model is reasonably fast:
set.seed(144)
df1 <- data.frame(x=runif(1e6), y=runif(1e6))
df2 <- data.frame(x=runif(1e6), y=runif(1e6))
system.time(km <- kmeans(df1, centers=3))
# user system elapsed
# 1.204 0.077 1.295
The solution I posted in the question is slow at calculating the test-set cluster assignments, since it separately calls closest.cluster for each test-set point:
system.time(pred.test <- apply(df2, 1, closest.cluster))
# user system elapsed
# 42.064 0.251 42.586
Meanwhile, the flexclust package seems to add a lot of overhead regardless of whether we convert the fitted model with as.kcca or fit a new one ourselves with kcca (though the prediction at the end is much faster):
# APPROACH #1: Convert from the kmeans() output
system.time(km.flexclust <- as.kcca(km, data=df1))
# user system elapsed
# 87.562 1.216 89.495
system.time(pred.flexclust <- predict(km.flexclust, newdata=df2))
# user system elapsed
# 0.182 0.065 0.250
# Approach #2: Fit the k-means clustering model in the flexclust package
system.time(km.flexclust2 <- kcca(df1, k=3, kccaFamily("kmeans")))
# user system elapsed
# 125.193 7.182 133.519
system.time(pred.flexclust2 <- predict(km.flexclust2, newdata=df2))
# user system elapsed
# 0.198 0.084 0.302
It seems that there is another sensible approach here: using a fast k-nearest neighbors solution like a k-d tree to find the nearest neighbor of each test-set observation within the set of cluster centroids. This can be written compactly and is relatively speedy:
library(FNN)
system.time(pred.knn <- get.knnx(km$centers, df2, 1)$nn.index[,1])
# user system elapsed
# 0.315 0.013 0.345
all(pred.test == pred.knn)
# [1] TRUE
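For completeness, a dependency-free vectorised sketch (my own, not from any answer above): expanding the squared distance shows that the argmin over centres needs only the cross term and the centre norms, which a single matrix multiply provides:
# ||x - c||^2 = ||x||^2 - 2*x.c + ||c||^2; the ||x||^2 term is constant per row,
# so it does not change which centre is closest
closest_centres <- function(X, centres) {
  cross <- as.matrix(X) %*% t(centres)                     # n x k dot products
  scores <- sweep(2 * cross, 2, rowSums(centres^2), "-")   # 2*x.c - ||c||^2
  max.col(scores)                                          # row-wise nearest centre
}
pred.base <- closest_centres(df2, km$centers)
all(pred.base == pred.test)  # should be TRUE (up to random tie-breaking)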
You can use the ClusterR::KMeans_rcpp() function, which uses RcppArmadillo. It allows multiple initializations (which can be parallelized if OpenMP is available). Besides the optimal_init, quantile_init, random, and kmeans++ initializations, one can specify the centroids using the CENTROIDS parameter. The running time and convergence of the algorithm can be adjusted using the num_init, max_iters, and tol parameters.
library(scorecard)
library(ClusterR)
library(dplyr)
library(ggplot2)
## Generate data
set.seed(2019)
x = c(rnorm(200000, 0,1), rnorm(150000, 5,1), rnorm(150000,-5,1))
y = c(rnorm(200000,-1,1), rnorm(150000, 6,1), rnorm(150000, 6,1))
df <- split_df(data.frame(x,y), ratio = 0.5, seed = 123)
system.time(
  kmrcpp <- KMeans_rcpp(df$train, clusters = 3, num_init = 4,
                        max_iters = 100, initializer = 'kmeans++'))
# user system elapsed
# 0.64 0.05 0.82
system.time(pr <- predict_KMeans(df$test, kmrcpp$centroids))
# user system elapsed
# 0.01 0.00 0.02
p1 <- df$train %>% mutate(cluster = as.factor(kmrcpp$clusters)) %>%
  ggplot(., aes(x, y, color = cluster)) + geom_point() +
  ggtitle("train data")
p2 <- df$test %>% mutate(cluster = as.factor(pr)) %>%
  ggplot(., aes(x, y, color = cluster)) + geom_point() +
  ggtitle("test data")
gridExtra::grid.arrange(p1,p2,ncol = 2)
Here's what I can use to list the weights for all terminal nodes, but how can I add some code to get the response prediction as well as the weight for each terminal node ID? Say I want my output to look like this:

Here below is what I have so far to get the weights:
nodes(airct, unique(where(airct)))
Thank you
The BinaryTree is a big S4 object, so it can be difficult to extract its data.
But the plot method for BinaryTree objects has an optional panel function of the form function(node) for plotting the terminal nodes, so when you plot the tree you can get at the node information.
Here I use the plot function to extract the information and, even better, use the gridExtra package to render each terminal node as a table.
library(party)
library(gridExtra)
set.seed(100)
lls <- data.frame(N = gl(3, 50, labels = c("A", "B", "C")),
                  a = rnorm(150) + rep(c(1, 0, 150)),
                  b = runif(150))
pond <- sample(1:5, 150, replace = TRUE)
tt <- ctree(formula = N ~ a + b, data = lls, weights = pond)
output.df <- data.frame()
innerWeights <- function(node){
  dat <- data.frame(x = node$nodeID,
                    y = sum(node$weights),
                    z = paste(round(node$prediction, 2), collapse = ' '))
  grid.table(dat,
             cols = c('ID', 'Weights', 'Prediction'),
             h.even.alpha = 1,
             h.odd.alpha = 1,
             v.even.alpha = 0.5,
             v.odd.alpha = 1)
  output.df <<- rbind(output.df, dat)  # note the use of <<- to collect rows globally
}
plot(tt, type = 'simple', terminal_panel = innerWeights)
The collected output.df:
ID Weights Prediction
1 4 24 0.42 0.5 0.08
2 5 17 0.06 0.24 0.71
3 6 24 0.08 0 0.92
4 7 388 0.37 0.37 0.26
Here's what I found; it works fine, with a bit of extra information. I just want to post it here in case someone needs it in the future.
y <- do.call(rbind, nodes(tt, unique(where(tt))))
write.table(y, 'clipboard', sep='\t')
@agstudy, let me know what you think.
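For reference, the same summary can be assembled without the plotting side effect, reusing only the node accessors from the answer above (a sketch; the column names are my own):
term_ids <- unique(where(tt))
node_info <- lapply(nodes(tt, term_ids), function(node) {
  data.frame(ID = node$nodeID,
             Weights = sum(node$weights),
             Prediction = paste(round(node$prediction, 2), collapse = " "))
})
do.call(rbind, node_info)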