Truncated cone volume in R - r

I'm looking for an efficient way to estimate the volume of a cone with irregular taper. The cone diameters and section heights are in these vectors:
D = c(30, 29, 29, 27) #diameter (cm) vector of the cone
deltah = c(10, 10, 10) #delta height, cm, may vary
My current solution uses a for loop (in R) that applies the truncated cone formula to each cone section:
conevol = NULL
for (i in 2:length(D)) {
  conevol[i - 1] = (D[i]^2 + D[i - 1]^2 + D[i] * D[i - 1]) * deltah[i - 1] * pi / 3
}
sum(conevol)
#[1] 78403.68
So: any idea for a vectorized approach?

There is no need for the for loop at all. Create an index vector and apply the operation to it:
> D = c(30, 29, 29, 27)
> deltah = c(10, 10, 10)
> i=2:length(D)
> i
[1] 2 3 4
> i-1
[1] 1 2 3
> conevol = (D[i]^2 + D[(i-1)]^2 + D[i]*D[(i-1)]) *deltah[(i-1)]*pi/3
> conevol
[1] 27342.33 26420.79 24640.56
> sum(conevol)
[1] 78403.68
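One caveat worth checking: the textbook truncated cone formula, V = pi * h / 3 * (R^2 + R*r + r^2), is written in terms of radii, whereas D holds diameters, so the numbers above appear to be 4 times too large if D really contains diameters. Below is a minimal sketch of the same vectorized idea with that correction, using head() and tail() instead of explicit indices. The names d_top, d_bottom, frustum_vol and total_vol are made up for illustration.

```r
D <- c(30, 29, 29, 27)      # diameters (cm) at each measurement point
deltah <- c(10, 10, 10)     # height (cm) of each section

d_top    <- head(D, -1)     # diameter at one end of each section
d_bottom <- tail(D, -1)     # diameter at the other end of each section

# Truncated cone volume in terms of diameters:
# pi * h / 12 * (D1^2 + D1 * D2 + D2^2)
frustum_vol <- pi * deltah / 12 * (d_top^2 + d_top * d_bottom + d_bottom^2)
total_vol <- sum(frustum_vol)
total_vol                   # cubic centimetres
```

If D actually holds radii, replace the 12 with 3 and the result matches the 78403.68 shown above.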

Related

How to calculate a mean value from multiple maximal values

I have a variable, e.g. c(0, 8, 7, 15, 85, 12, 46, 12, 10, 15, 15).
How can I calculate a mean value from random maximal values in R?
For example, I would like to calculate a mean value from three maximal values.
First step: draw a sample of 3 from your data and store it in x.
Second step: calculate the mean of the sample.
Try:
dat <- c(0,8,7,15, 85, 12, 46, 12, 10, 15,15)
x <- sample(dat,3)
x
mean(x)
possible output:
> x <- sample(dat,3)
> x
[1] 85 15 0
> mean(x)
[1] 33.33333
If you mean the three highest values, just sort your vector and subset:
> mean(sort(c(0,8,7,15, 85, 12, 46, 12, 10, 15,15), decreasing=T)[1:3])
[1] 48.66667
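The sort-and-subset idea can be wrapped in a small helper when the number of top values varies. This is only a sketch: top_mean is a hypothetical name, not a base R function.

```r
# top_mean is a hypothetical helper: mean of the k largest values of x
top_mean <- function(x, k) {
  # sort() drops NA values by default; head() copes with k > length(x)
  mean(head(sort(x, decreasing = TRUE), k))
}

dat <- c(0, 8, 7, 15, 85, 12, 46, 12, 10, 15, 15)
top_mean(dat, 3)   # (85 + 46 + 15) / 3 = 48.66667
```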

Calculate quantiles in R without interpolation - round up or down to actual value

It's my understanding that when calculating quantiles in R, the entire dataset is scanned and the value for each quantile is determined.
If you ask for .8, for example, it will give you the value that would occur at that quantile. Even if no such value exists in the data, R will still give you the value that would have occurred at that quantile. It does this through linear interpolation.
However, what if one wishes to calculate quantiles and then proceed to round up/down to the nearest actual value?
For example, if the quantile at .80 gives a value of 53, when the real dataset only has a 50 and a 54, then how could one get R to list either of these values?
Try this:
#dummy data
x <- c(1,1,1,1,10,20,30,30,40,50,55,70,80)
#get quantile at 0.8
q <- quantile(x, 0.8)
q
# 80%
# 53
#closest match - "round up"
min(x[ x >= q ])
#[1] 55
#closest match - "round down"
max(x[ x <= q ])
#[1] 50
There are many estimation methods implemented in R's quantile function. You can choose which type to use with the type argument as documented in https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html.
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)
quantile(x, c(.8)) # default, type = 7
# 80%
# 53
quantile(x, c(.8), FALSE, TRUE, 7) # equivalent to the previous invocation
# 80%
# 53
quantile(x, c(.8), FALSE, TRUE, 3) # type = 3, nearest sample
# 80%
# 50
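As a small illustration of the type argument: types 1 to 3 are documented as discontinuous estimators, so they return values that actually occur in the sample rather than interpolated ones. In the sketch below, q_up and q_nearest are illustrative names. For this particular data, type = 1 happens to agree with the "round up" result from the other answer; that is an observation about this example, not a general equivalence.

```r
x <- c(1, 1, 1, 1, 10, 20, 30, 30, 40, 50, 55, 70, 80)

q_up      <- quantile(x, 0.8, type = 1)  # inverse of the empirical CDF
q_nearest <- quantile(x, 0.8, type = 3)  # nearest even order statistic

q_up                          # 55 for this data
q_nearest                     # 50 for this data
c(q_up, q_nearest) %in% x     # both are observed values
```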

rounding in R cut function

Does anyone know how R chooses the number of significant digits in the cut function?
y<-c(61, 64, 64, 65, 66)
table(cut(y, breaks=c(60.555, 67.123, 75.055)))
produces the result
(60.6,67.1] (67.1,75.1]
5 0
but
table(cut(y, breaks=c(60.958, 67.958, 74.958)))
produces the result
(61,68] (68,75]
5 0
I would prefer that R use the exact boundaries that I provide in the cut function, but it seems to be rounding them. I'm not clear on how it chooses the precision of the rounding (see the examples above). Is it possible to force R to use my exact boundaries?
How about using nchar to find the number of digits per cut? Here are three examples.
> y <- c(61, 64, 64, 65, 66)
> breaks1 <- c(60.555, 67.123, 75.055)
> table(cut(y, breaks = breaks1, dig.lab = min(nchar(breaks1))))
## (60.555,67.123] (67.123,75.055]
## 5 0
> breaks2 <- c(60.5, 67.1, 75.4)
> table(cut(y, breaks = breaks2, dig.lab = min(nchar(breaks2))))
## (60.5,67.1] (67.1,75.4]
## 5 0
> breaks3 <- c(60, 67, 75)
> table(cut(y, breaks = breaks3, dig.lab = min(nchar(breaks3))))
## (60,67] (67,75]
## 5 0
Note that min is used only to pass a single value to dig.lab and avoid the warning messages that would occur if the breaks did not all have the same number of characters.
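On the original question of how the precision is chosen: per the cut documentation, dig.lab defaults to 3 significant digits for the labels, and more digits are used only when needed to keep the labels distinct. Only the labels are rounded; the binning itself uses the breaks exactly as supplied. The sketch below illustrates both points and uses max rather than min, under the assumption that the breaks may have different numbers of digits (digits_needed and binned are illustrative names).

```r
y <- c(61, 64, 64, 65, 66)

# Labels are rounded, binning is not: 67.11 is above the label "67.1"
# but below the real break 67.123, so it lands in the first interval
as.character(cut(67.11, breaks = c(60.555, 67.123, 75.055)))

# Breaks with differing numbers of digits
breaks <- c(60.5, 67.123, 75)

# nchar() also counts the decimal point, so this slightly overestimates
# the significant digits needed, which is harmless for labelling
digits_needed <- max(nchar(breaks))
binned <- cut(y, breaks = breaks, dig.lab = digits_needed)
levels(binned)
table(binned)
```

With min(nchar(breaks)) the second break would be labelled 67.12 here, which is why max seems the safer choice when the breaks differ in length.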

How do I split a number randomly into multiple numbers, given the number and n groups?

For instance, if the number is 100 and the number of groups is 4, it should give any random list of 4 numbers that add up to 100:
input number = 100
number of groups = 4
Possible outputs:
25, 25, 25, 25
10, 20, 30, 40
15, 35, 2, 48
Only one list should be generated. A more application-oriented example would be: how would I split a probability of 1 into multiple groups, given the number of groups, using R?
rmultinom might be handy here:
x <- rmultinom(n = 1, size = 100, prob = rep(1/4, 4))
x
colSums(x)
Here I draw one vector with a total size of 100, which is split into 4 groups.
You can try the following:
total <- 100
n <- 4
as.vector(table(sample(1:n, size = total, replace = TRUE)))
## [1] 23 27 24 26
as.vector(table(sample(1:n, size = total, replace = TRUE)))
## [1] 25 26 28 21
as.vector(table(sample(1:n, size = total, replace = TRUE)))
## [1] 24 20 28 28
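One thing to watch with the table() approach: table() only lists values that actually occur, so a group that receives nothing is silently dropped and the result can have fewer than n entries. That is unlikely with 100 items in 4 groups but easy to hit with small totals. A minimal sketch using tabulate() with nbins, which keeps empty groups as zeros (the seed is arbitrary and only there for reproducibility):

```r
set.seed(1)    # arbitrary seed, only for reproducibility
total <- 3     # deliberately smaller than n, so some group must be empty
n <- 4

draws  <- sample(n, size = total, replace = TRUE)  # group label per item
counts <- tabulate(draws, nbins = n)               # always length n

counts
length(counts) == n
sum(counts) == total
```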
When it comes to probabilities, I think this is a good idea:
generate.probabilities <- function(n){
  bordersR <- c(sort(runif(n-1)), 1)
  bordersL <- c(0, bordersR[1:(n-1)])
  bordersR - bordersL
}
It gives you n random numbers that sum to 1.
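The same border idea can be written more compactly with diff(), which takes the gaps between consecutive sorted cut points on [0, 1]. This is a sketch of an equivalent formulation; random_probs is an illustrative name, and the seed is arbitrary.

```r
# random_probs is an illustrative name: n - 1 uniform cut points on [0, 1],
# and the gaps between consecutive cut points are the n probabilities
random_probs <- function(n) {
  diff(c(0, sort(runif(n - 1)), 1))
}

set.seed(42)   # arbitrary seed, only for reproducibility
p <- random_probs(4)
p
sum(p)         # equals 1 up to floating point error
```

A side effect of this form is that n = 1 returns the single value 1.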
Define the parameters for generality
inN <- 100 # input number
nG <- 4 # number of groups
Following storaged's idea that we only need 3 random numbers to split the space into 4 regions, but requiring integers, the inner borders can be found as:
sort(sample(inN,nG-1, replace = TRUE))
The OP wanted the count in each group which we can find by
diff(c(0,sort(sample(inN,nG-1, replace = TRUE)), inN))

Optimize value with linear (or non-linear?) constraints in R

I am trying to pick the best possible fantasy football team given different constraints. My goal is to pick the players that maximize the sum of their projected points.
The constraints are:
1) The team must include:
-1 QB
-2 RBs
-2 WRs
-1 TE
2) A player's risk must not exceed 6
3) The sum of the players' costs must not exceed 300.
How can I do this? What is the best package/function in R to optimize these constraints? What would the function call look like to maximize the projected points given these constraints? FYI, I'll be searching through 100-300 players.
Thanks in advance! Here is a small example data set:
name <- c("Aaron Rodgers","Tom Brady","Arian Foster","Ray Rice","LeSean McCoy","Calvin Johnson","Larry Fitzgerald","Wes Welker","Rob Gronkowski","Jimmy Graham")
pos <- c("QB","QB","RB","RB","RB","WR","WR","WR","TE","TE")
pts <- c(167, 136, 195, 174, 144, 135, 89, 81, 114, 111)
risk <- c(2.9, 3.4, 0.7, 1.1, 3.5, 5.0, 6.7, 4.7, 3.7, 8.8)
cost <- c(60, 47, 63, 62, 40, 60, 50, 35, 40, 40)
mydata <- data.frame(name, pos, pts, risk, cost)
Your constraints and objective are linear, but your variables are binary: whether each player should be picked or not. So your problem is a little more general than a Linear Program (LP); it is a Mixed-Integer Program (MIP). On CRAN's Optimization Task View, look for the MIP section.
CPLEX is a commercial solver you probably do not have access to, but GLPK is free. If I were you, I would probably go with the high-level interface Rglpk.
It will require you to put your problem in matrix form; I suggest you look at the documentation and examples.
Edit: Here is an implementation:
# We are going to solve:
# maximize f'x subject to A*x <dir> b
# where:
# x is the variable to solve for: a vector of 0 or 1:
# 1 when the player is selected, 0 otherwise,
# f is your objective vector,
# A is a matrix, b a vector, and <dir> a vector of "<=", "==", or ">=",
# defining your linear constraints.
# number of variables
num.players <- length(name)
# objective:
f <- pts
# the variables are binary (0 or 1)
var.types <- rep("B", num.players)
# the constraints
A <- rbind(as.numeric(pos == "QB"), # num QB
           as.numeric(pos == "RB"), # num RB
           as.numeric(pos == "WR"), # num WR
           as.numeric(pos == "TE"), # num TE
           diag(risk),              # player's risk
           cost)                    # total cost
dir <- c("==",
         "==",
         "==",
         "==",
         rep("<=", num.players),
         "<=")
b <- c(1,
       2,
       2,
       1,
       rep(6, num.players),
       300)
library(Rglpk)
sol <- Rglpk_solve_LP(obj = f, mat = A, dir = dir, rhs = b,
                      types = var.types, max = TRUE)
sol
# $optimum
# [1] 836 ### <- the optimal total points
# $solution
# [1] 1 0 1 0 1 1 0 1 1 0 ### <- a `1` for the selected players
# $status
# [1] 0 ### <- an optimal solution has been found
# your dream team
name[sol$solution == 1]
# [1] "Aaron Rodgers" "Arian Foster" "LeSean McCoy"
# [4] "Calvin Johnson" "Wes Welker" "Rob Gronkowski"
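As a sanity check that needs no solver, the small example can also be enumerated by brute force in base R. This is only a sketch for verifying the 10-player example; it would not scale to 100-300 players and is not a replacement for the MIP formulation. The names eligible, rb_pairs, wr_pairs, best_pts and best_team are illustrative, and the code assumes at least two eligible players at RB and WR.

```r
name <- c("Aaron Rodgers", "Tom Brady", "Arian Foster", "Ray Rice",
          "LeSean McCoy", "Calvin Johnson", "Larry Fitzgerald",
          "Wes Welker", "Rob Gronkowski", "Jimmy Graham")
pos  <- c("QB", "QB", "RB", "RB", "RB", "WR", "WR", "WR", "TE", "TE")
pts  <- c(167, 136, 195, 174, 144, 135, 89, 81, 114, 111)
risk <- c(2.9, 3.4, 0.7, 1.1, 3.5, 5.0, 6.7, 4.7, 3.7, 8.8)
cost <- c(60, 47, 63, 62, 40, 60, 50, 35, 40, 40)

eligible <- risk <= 6                      # per-player risk constraint
qb <- which(pos == "QB" & eligible)
te <- which(pos == "TE" & eligible)
# combn() returns one candidate pair per column
# (assumes at least two eligible players at each of these positions)
rb_pairs <- combn(which(pos == "RB" & eligible), 2)
wr_pairs <- combn(which(pos == "WR" & eligible), 2)

best_pts  <- -Inf
best_team <- integer(0)
for (q in qb) {
  for (e in te) {
    for (r in seq_len(ncol(rb_pairs))) {
      for (w in seq_len(ncol(wr_pairs))) {
        team <- c(q, rb_pairs[, r], wr_pairs[, w], e)
        # keep the team only if it fits the budget and beats the best so far
        if (sum(cost[team]) <= 300 && sum(pts[team]) > best_pts) {
          best_pts  <- sum(pts[team])
          best_team <- sort(team)
        }
      }
    }
  }
}

best_pts          # 836, the same total as sol$optimum above
name[best_team]   # the same six players as the solver's selection
```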
