Does anyone know how R chooses the number of significant digits in the cut function?
y<-c(61, 64, 64, 65, 66)
table(cut(y, breaks=c(60.555, 67.123, 75.055)))
produces the result
(60.6,67.1] (67.1,75.1]
5 0
but
table(cut(y, breaks=c(60.958, 67.958, 74.958)))
produces the result
(61,68] (68,75]
5 0
I would prefer that R use the exact boundaries that I provide in the cut function, but it seems to be rounding them, and I'm not clear on how it chooses the precision of the rounding. See the examples above. Is it possible to force R to use my exact boundaries?
cut formats the labels with dig.lab significant digits (3 by default), which is why 60.555 is shown as 60.6 and 60.958 as 61. How about using nchar to count the characters in each break and passing that to dig.lab? Here are three examples.
> y <- c(61, 64, 64, 65, 66)
> breaks1 <- c(60.555, 67.123, 75.055)
> table(cut(y, breaks = breaks1, dig.lab = min(nchar(breaks1))))
## (60.555,67.123] (67.123,75.055]
## 5 0
> breaks2 <- c(60.5, 67.1, 75.4)
> table(cut(y, breaks = breaks2, dig.lab = min(nchar(breaks2))))
## (60.5,67.1] (67.1,75.4]
## 5 0
> breaks3 <- c(60, 67, 75)
> table(cut(y, breaks = breaks3, dig.lab = min(nchar(breaks3))))
## (60,67] (67,75]
## 5 0
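Note that min is only there to collapse the nchar result to a single value, which avoids the warning that would occur if the breaks did not all have the same number of characters. In that case max is the safer choice, because min would still round the longer breaks.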
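If you want the labels to show the breaks exactly as typed, another option is to build them yourself and pass them through the labels argument of cut. A minimal sketch, assuming the default right-closed intervals (the name exact_labels is just for illustration):

```r
y <- c(61, 64, 64, 65, 66)
breaks <- c(60.555, 67.123, 75.055)

# Pair each break with the next one to build "(lower,upper]" labels
exact_labels <- paste0("(", head(breaks, -1), ",", tail(breaks, -1), "]")

# cut() uses the supplied labels instead of formatting the breaks itself
result <- table(cut(y, breaks = breaks, labels = exact_labels))
result
```

paste0 converts the numbers with up to 15 significant digits, so breaks typed with a few decimals should come out unchanged.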
X<-c(-95, 67, -92, 65, 20, 25, -28, 55, 53, -77)
That is my vector. I would like to get the mean of the positive numbers, and then the mean of the negative numbers.
How could I do this? I have looked everywhere and nothing pops up.
Thank you.
Try this:
# Mean of positive numbers
mean(X[X > 0])
# Mean of negative numbers
mean(X[X < 0])
You can use which, since it returns only the positions where the condition is TRUE and so drops any NA values:
Mean of positive numbers:
mean(X[which(X > 0)])
[1] 47.5
Mean of negative numbers:
mean(X[which(X<0)])
[1] -73
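To see what difference the NA handling makes, here is a small sketch with a made-up vector X_na (not the vector from the question) that contains a missing value:

```r
# Hypothetical vector with one missing value
X_na <- c(-95, 67, NA, 65)

# Logical indexing keeps an NA element, so the mean is NA
plain_mean <- mean(X_na[X_na > 0])

# which() returns only the positions that are TRUE, so the NA is dropped
which_mean <- mean(X_na[which(X_na > 0)])

# na.rm = TRUE is another way to get the same result
narm_mean <- mean(X_na[X_na > 0], na.rm = TRUE)
```

Here plain_mean is NA, while which_mean and narm_mean are both the mean of 67 and 65.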
You can use tapply :
tapply(X, X > 0, mean)
#FALSE TRUE
#-73.0 47.5
If you want more descriptive labels for the values:
tapply(X, ifelse(X > 0, 'positive', 'negative'), mean)
#negative positive
# -73.0 47.5
R programming: Using the ISLR library, I want to predict someone's wage at age 35, with a model that uses the function cut() with the values 0, 35, 45, 55, 65, 80 to split the variable "age" into brackets. With that in mind, what should the predict() call look like, given cut() and my model?
Here is my code so far prior to predict():
table(cut(age, breaks = c(0, 35, 45, 55, 65, 80))) # cut()
getfit.1 = lm(wage~education+cut(age, breaks = c(0,25,35,45,55,80)),data=Wage) # model with cut()
You will make your life easier if you create the categorical variable and then use it to fit the model:
library(ISLR)
agecat <- cut(Wage$age, breaks = c(0,25,35,45,55,80))
getfit.1 <- lm(wage~education+agecat,data=Wage)
predict(getfit.1, data.frame(education="2. HS Grad", agecat="(25,35]"))
# 1
# 88.56445
Note that you must specify the education category as well to get a prediction. As a result, it may be useful to get all the combinations:
cross <- expand.grid(agecat=levels(agecat), education=levels(Wage$education))
predictions <- data.frame(cross, pwage=predict(getfit.1, cross))
head(predictions)
# agecat education pwage
# 1 (0,25] 1. < HS Grad 59.12711
# 2 (25,35] 1. < HS Grad 77.65516
# 3 (35,45] 1. < HS Grad 91.86200
# 4 (45,55] 1. < HS Grad 90.84853
# 5 (55,80] 1. < HS Grad 88.53072
# 6 (0,25] 2. HS Grad 70.03640
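If you would rather not type the bracket label by hand, you can let cut work it out from the age itself. A minimal sketch, assuming the same breaks as in the model above (the names age_breaks, new_age and new_agecat are just for illustration):

```r
# Same breaks that were used to build agecat for the model
age_breaks <- c(0, 25, 35, 45, 55, 80)

# The age we want a prediction for
new_age <- 35

# cut() assigns the age to its bracket; intervals are right-closed by default
new_agecat <- cut(new_age, breaks = age_breaks)
as.character(new_agecat)
```

The resulting factor can then be placed in the agecat column of the data frame passed to predict(), alongside an education level.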
I have a (for me) pretty complex problem. I have got two vectors:
vectora <- c(111, 245, 379, 516, 671)
vectorb <- c(38, 54, 62, 67, 108)
Furthermore, I have two variables:
x = 80
y = 0.8
The third vector is based on the variables x and y in the following way:
vectorc <- vectora^y/(1+(vectora^y-1)/x)
The goal is to minimize the deviation between vectorb and vectorc by changing x and y. The deviation is defined by the following expression:
deviation <- (abs(vectorb[1]-vectorc[1])) + (abs(vectorb[2]-vectorc[2])) + (abs(vectorb[3]-vectorc[3])) + (abs(vectorb[4]-vectorc[4])) + (abs(vectorb[5]-vectorc[5]))
How can I do this in R?
You can use the optim procedure!
Here's how it'd work:
vectora <- c(111, 245, 379, 516, 671)
vectorb <- c(38, 54, 62, 67, 108)
fn <- function(v) {
  x <- v[1]
  y <- v[2]
  vectorc <- vectora^y / (1 + (vectora^y - 1) / x)
  # Total absolute deviation between vectorb and vectorc
  sum(abs(vectorb - vectorc))
}
optim(c(80, 0.8), fn)
The output of that is:
$par
[1] 91.4452617 0.8840952
$value
[1] 37.2487
$counts
function gradient
151 NA
$convergence
[1] 0
$message
NULL
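Once optim has finished, the fitted values can be pulled out of the list it returns. A small sketch, assuming the same vectors and starting values as above; the names model, deviation, start and fit are just for illustration. Keep in mind that optim uses the Nelder-Mead method by default, which only searches for a local minimum, so the result can depend on the starting values:

```r
vectora <- c(111, 245, 379, 516, 671)
vectorb <- c(38, 54, 62, 67, 108)

# vectorc as a function of the parameters v = c(x, y)
model <- function(v) vectora^v[2] / (1 + (vectora^v[2] - 1) / v[1])

# Sum of absolute deviations between vectorb and vectorc
deviation <- function(v) sum(abs(vectorb - model(v)))

start <- c(80, 0.8)
fit <- optim(start, deviation)

x_hat <- fit$par[1]            # fitted x
y_hat <- fit$par[2]            # fitted y
vectorc_hat <- model(fit$par)  # vectorc at the fitted parameters

# The fitted deviation should be no worse than at the starting values
improved <- fit$value <= deviation(start)
```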
I'd like to see an effective way of estimating the volume of a cone having irregular tapering. We have cone diameters and height in these vectors:
D = c(30, 29, 29, 27) #diameter (cm) vector of the cone
deltah = c(10, 10, 10) #delta height, cm, may vary
Current solution involves a for-loop (in R) using the truncated cone formula for each cone section:
conevol=NULL
for(i in 2:length(D)){
conevol[(i-1)] = (D[i]^2 + D[(i-1)]^2 + D[i]*D[(i-1)]) *deltah[(i-1)]*pi/3
}
sum(conevol)
#[1] 78403.68
So: any idea for a vectorized approach?
There is no need for the for loop at all. Just create an index vector and apply your operation over it:
> D = c(30, 29, 29, 27)
> deltah = c(10, 10, 10)
> i=2:length(D)
> i
[1] 2 3 4
> i-1
[1] 1 2 3
> conevol = (D[i]^2 + D[(i-1)]^2 + D[i]*D[(i-1)]) *deltah[(i-1)]*pi/3
> conevol
[1] 27342.33 26420.79 24640.56
> sum(conevol)
[1] 78403.68
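A variant of the same idea, as a sketch: head and tail can pair each diameter with the next one without an explicit index. One thing to check against your own data: the usual truncated cone formula, V = pi*h/3 * (R^2 + R*r + r^2), is written in terms of radii, so when D holds diameters the constant becomes pi/12 rather than pi/3. The code below assumes D holds diameters; the names d_lower, d_upper and section_vol are just for illustration:

```r
D <- c(30, 29, 29, 27)    # diameters (cm)
deltah <- c(10, 10, 10)   # section heights (cm)

d_lower <- head(D, -1)    # diameter at the bottom of each section
d_upper <- tail(D, -1)    # diameter at the top of each section

# Truncated cone volume per section, expressed in diameters (hence pi/12)
section_vol <- pi * deltah / 12 * (d_lower^2 + d_lower * d_upper + d_upper^2)
total_vol <- sum(section_vol)
```

With these inputs the total is a quarter of the figure obtained with pi/3, since each diameter is twice the radius.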
I am using pbinom in R to determine the p-values for multiple outcome values and probabilities of success:
1 - pbinom(1:2, 21, c(0.02, 0.05))
1:2 represents the number of observed counts, 21 represents the sample size, and 0.02 and 0.05 represent the probability of success. However, the output of the above command is:
[1] 0.06534884 0.08491751
These values represent the probabilities of:
1 - pbinom(1, 21, 0.02) & 1 - pbinom(2, 21, 0.05)
respectively.
I wish to obtain the outputs of 1 - pbinom(1:2, 21, 0.02) and 1 - pbinom(1:2, 21, 0.05)
such that I obtain the output:
[1] 0.065348840 0.008125299 ## pvalues for 1 - pbinom(1:2, 21, 0.02)
[1] 0.28302816 0.08491751 ## pvalues for 1 - pbinom(1:2, 21, 0.05)
My actual data set is very lengthy, so I can't type code for every probability of success.
I also tried this using a for loop:
output=c()
for (i in 1:2) {
output[i]=(1 - pbinom(i, 21, c(0.02, 0.05)))
}
But I get the following warning message:
1: In output[i] = (1 - pbinom(i, 21, c(0.02, 0.05))) :
number of items to replace is not a multiple of replacement length
2: In output[i] = (1 - pbinom(i, 21, c(0.02, 0.05))) :
number of items to replace is not a multiple of replacement length
I realize this question may be difficult to interpret, but any help will be greatly appreciated.
Thank you.
Using sapply:
t(sapply( c(0.02, 0.05), function(x) 1 - pbinom(1:2, 21, x)))
# [,1] [,2]
# [1,] 0.06534884 0.008125299
# [2,] 0.28302816 0.084917514
You can also try this:
matrix(1-pbinom(c(1:2,1:2), size=21, prob = rep(c(0.02,0.05), each=2)), ncol=2, byrow=TRUE)
PS: The warning means that the replacement is longer than the element it is assigned to: pbinom(i, 21, c(0.02, 0.05)) returns two values, but output[i] can hold only one.
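Yet another way to get every combination is outer, which evaluates a vectorized function over all pairs of its two inputs. A minimal sketch, assuming the same counts, sample size and probabilities as in the question (the names probs, counts and pvals are just for illustration); lower.tail = FALSE is used in place of 1 - pbinom(...):

```r
probs <- c(0.02, 0.05)   # probabilities of success
counts <- 1:2            # observed counts

# One row per probability, one column per count
pvals <- outer(probs, counts,
               function(p, q) pbinom(q, size = 21, prob = p, lower.tail = FALSE))
dimnames(pvals) <- list(prob = probs, count = counts)
pvals
```

This scales to long vectors of probabilities without typing each one, since outer builds all the pairs for you.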