I'm new to R and to probability and statistics. I have a question about histograms.
hist(rbinom(10000, 10, 0.1), freq=FALSE)
It shows the following histogram, which is not clear to me:
If the y-axis is density, shouldn't the total add up to 100%? Am I wrong?
But in the histogram I can see bars higher than 100% (i.e. density greater than 1).
Function hist returns a list object with all information necessary to answer the question.
I will set the RNG seed to make the example reproducible.
set.seed(1234)
h <- hist(rbinom(10000, 10, 0.1), freq=FALSE)
str(h)
#List of 6
# $ breaks : num [1:11] 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
# $ counts : int [1:10] 3448 3930 0 1910 0 588 0 112 0 12
# $ density : num [1:10] 0.69 0.786 0 0.382 0 ...
# $ mids : num [1:10] 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75
# $ xname : chr "rbinom(10000, 10, 0.1)"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"
The relevant list members are breaks and density.
breaks is a vector of length 11, so there are 10 bins.
density is a vector of length 10, each corresponding to one of the bins.
Now compute the area of each bar by multiplying the bin widths by the corresponding densities.
diff(h$breaks) # bin widths
# [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
h$density*diff(h$breaks)
# [1] 0.3448 0.3930 0.0000 0.1910 0.0000 0.0588 0.0000 0.0112 0.0000 0.0012
Total area:
sum(h$density*diff(h$breaks))
#[1] 1
The area under the curve should be 1. Since your bars have width 1/2, the sum of the heights should be 2. To make the plot easier to interpret, use the breaks parameter of hist:
hist(rbinom(10000, 10, 0.1), freq=FALSE, breaks = 5)
Or maybe even better
hist(rbinom(10000, 10, 0.1), freq=FALSE, breaks=seq(-0.5,5.5,1))
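With unit-width bins centred on the integers, each bar's height equals an estimated binomial probability, so you can check the histogram directly against dbinom. A quick sketch (the seed is arbitrary):

```r
set.seed(1234)
# Unit-width bins centred on the integers 0..10: each bar's density
# is the estimated probability P(X = k)
h <- hist(rbinom(10000, 10, 0.1), freq = FALSE,
          breaks = seq(-0.5, 10.5, 1), plot = FALSE)
round(h$density[1:4], 3)        # empirical P(X = 0), ..., P(X = 3)
round(dbinom(0:3, 10, 0.1), 3)  # theoretical: 0.349 0.387 0.194 0.057
```

The bar areas (here equal to the heights, since the widths are 1) still sum to 1, as they must for any choice of breaks.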
You can also sum the empirical probability mass function estimated from your sample; it comes to exactly 1, so there is no contradiction. (Note that the last line below computes the sample mean, which for a Binomial(10, 0.1) is approximately n*p = 1 as well.)
set.seed(444)
s <- rbinom(10000, 10, 0.1)
dens_s <- table(s)/sum(table(s))       # empirical pmf
sum(dens_s)                            # exactly 1
sum(as.numeric(names(dens_s))*dens_s)  # sample mean, close to n*p = 1
I have a dataset with 400 variables that I am using to produce a correlation matrix (i.e. comparing each variable against one another). The resulting matrix has the following structure:
        var_1  var_2  var_3  var_4  var_5  var_6  ... (to var_400)
var_1   1      0.1    0.2    0.2    0.4    0.8
var_2   0.1    1      0.15   0.3    0.11   0.6
var_3   0.2    0.15   1      0.47   0.05   0.72
var_4   0.2    0.3    0.47   1      0.25   0.54
var_5   0.4    0.11   0.05   0.25   1      0.84
var_6   0.8    0.6    0.72   0.54   0.84   1
...     (rows continue to var_400)
I am then generating a figure of the correlation matrix with the corrplot package using the following command:
library(corrplot)
corrplot(df, order = "hclust",
tl.col = "black", tl.srt = 45)
Unsurprisingly, this results in a very large figure that is not interpretable. I was hoping to split up my matrix and then create separate correlation-matrix figures (realistically 20 pairwise comparisons of variables at a time). I am struggling to find code that will help me split up my matrix and plug it back into corrplot. Any help would be hugely appreciated!
Thank you.
One approach would be to split your correlation matrix up and then plot all submatrices. The challenge with that is that you are setting order=, which you (apparently) do not know a priori. Assuming you want to allow corrplot to determine the order, then here's a method: plot the whole thing first capturing the function's return value (which contains order information), then split the matrix and plot the components.
Helpful: while most plotting functions operate by side-effect (creating a plot, not necessarily returning values), some return information useful for working with or around its plot components. corrplot is no different; from ?corrplot:
Value:
(Invisibly) returns a 'list(corr, corrPos, arg)'. 'corr' is a
reordered correlation matrix for plotting. 'corrPos' is a data
frame with 'xName, yName, x, y, corr' and 'p.value'(if p.mat is
not NULL) column, which x and y are the position on the
correlation matrix plot. 'arg' is a list of some corrplot() input
parameters' value. Now 'type' is in.
With this, let's get started. I'll be using M, the 6-variable correlation matrix from the question.
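For completeness, here is one way to build M from the values shown in the question (with your real data, M would of course come from cor() on your 400 variables):

```r
# Rebuild the question's 6x6 correlation matrix as M
vals <- c(1,   0.1,  0.2,  0.2,  0.4,  0.8,
          0.1, 1,    0.15, 0.3,  0.11, 0.6,
          0.2, 0.15, 1,    0.47, 0.05, 0.72,
          0.2, 0.3,  0.47, 1,    0.25, 0.54,
          0.4, 0.11, 0.05, 0.25, 1,    0.84,
          0.8, 0.6,  0.72, 0.54, 0.84, 1)
nms <- paste0("var_", 1:6)
M <- matrix(vals, nrow = 6, byrow = TRUE, dimnames = list(nms, nms))
```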
Plot the whole thing. If this takes a long time, or you don't want it to plot in R's graphics pane, uncomment the png and dev.off lines, which dump the plot to "nothing". ("NUL" is a Windows thing; I suspect "/dev/null" should work on most other OSes, though that's untested.)
# png("NUL")
CP <- corrplot::corrplot(M, order="hclust", tl.col="black", tl.srt=45)
# dev.off()
str(CP)
# List of 3
# $ corr : num [1:6, 1:6] 1 0.4 0.8 0.1 0.2 0.2 0.4 1 0.84 0.11 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:6] "var_1" "var_5" "var_6" "var_2" ...
# .. ..$ : chr [1:6] "var_1" "var_5" "var_6" "var_2" ...
# $ corrPos:'data.frame': 36 obs. of 5 variables:
# ..$ xName: chr [1:36] "var_1" "var_1" "var_1" "var_1" ...
# ..$ yName: chr [1:36] "var_1" "var_5" "var_6" "var_2" ...
# ..$ x : num [1:36] 1 1 1 1 1 1 2 2 2 2 ...
# ..$ y : num [1:36] 6 5 4 3 2 1 6 5 4 3 ...
# ..$ corr : num [1:36] 1 0.4 0.8 0.1 0.2 0.2 0.4 1 0.84 0.11 ...
# $ arg :List of 1
# ..$ type: chr "full"
rownames(CP$corr) (which equals colnames(CP$corr)) provides the column/row order resulting from order="hclust", so we can use it to reorder M.
M_reord <- M[rownames(CP$corr), colnames(CP$corr)]
M_reord
# var_1 var_5 var_6 var_2 var_3 var_4
# var_1 1.0 0.40 0.80 0.10 0.20 0.20
# var_5 0.4 1.00 0.84 0.11 0.05 0.25
# var_6 0.8 0.84 1.00 0.60 0.72 0.54
# var_2 0.1 0.11 0.60 1.00 0.15 0.30
# var_3 0.2 0.05 0.72 0.15 1.00 0.47
# var_4 0.2 0.25 0.54 0.30 0.47 1.00
Now we split the matrix up. For this example, I'll assume your "20" is really "3". Some helper objects:
grps <- (seq_len(nrow(M)) - 1) %/% 3
grps
# [1] 0 0 0 1 1 1
eg <- expand.grid(row = unique(grps), col = unique(grps))
eg
# row col
# 1 0 0
# 2 1 0
# 3 0 1
# 4 1 1
where the rows of eg count column-first (top-to-bottom, then left-to-right).
Given a specific submatrix number (row of eg), plot it. Let's try "2":
subplt <- 2
rows <- which(grps == eg[subplt, "row"])
cols <- which(grps == eg[subplt, "col"])
corrplot::corrplot(M[rows, cols], tl.col="black", tl.srt=45)
If you want to automate plotting these (e.g., in an rmarkdown document, in shiny), then you can loop over them with:
for (subplt in seq_len(nrow(eg))) {
rows <- which(grps == eg[subplt, "row"])
cols <- which(grps == eg[subplt, "col"])
corrplot::corrplot(M[rows, cols], tl.col="black", tl.srt=45)
}
I have two survival functions, one is not truncated so I have experience for all time periods. The other is left-truncated until t = 4, so it has no experience until t > 4. I can plot the two together in the following code in R using the survival package.
library(tidyverse)
library(survival)
library(ggfortify)
# create two survival functions
set1 <- tibble(start0 = rep(0,10), end0 = 1:10, event0 = rep(1,10))
set2 <- tibble(start0 = rep(4,10), end0 = c(5, 5, 7, 9, rep(10, 6)), event0 = rep(1,10))
combined_set <- bind_rows(set1, set2)
survival_fn <- survfit(Surv(start0, end0, event0) ~ start0, data = combined_set)
# plot the survival function:
autoplot(survival_fn, conf.int = FALSE)
I would like to show the difference in survival between the two functions if they had both experienced the same survival experience during the truncation period - i.e. up to t = 4. I've manually sketched the approximate graph I am trying to achieve (size of steps not to scale).
This is a simplified example - in practice I have eight different sets of data with different truncation periods, and around 2000 data-points in each set.
If you look at the structure of the survival_fn object (which is not a function but rather a list), you see:
str(survival_fn)
List of 17
$ n : int [1:2] 10 10
$ time : num [1:14] 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk : num [1:14] 10 9 8 7 6 5 4 3 2 1 ...
$ n.event : num [1:14] 1 1 1 1 1 1 1 1 1 1 ...
$ n.censor : num [1:14] 0 0 0 0 0 0 0 0 0 0 ...
$ surv : num [1:14] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 ...
$ std.err : num [1:14] 0.105 0.158 0.207 0.258 0.316 ...
$ cumhaz : num [1:14] 0.1 0.211 0.336 0.479 0.646 ...
$ std.chaz : num [1:14] 0.1 0.149 0.195 0.242 0.294 ...
$ strata : Named int [1:2] 10 4
..- attr(*, "names")= chr [1:2] "start0=0" "start0=4"
$ type : chr "counting"
$ logse : logi TRUE
$ conf.int : num 0.95
$ conf.type: chr "log"
$ lower : num [1:14] 0.732 0.587 0.467 0.362 0.269 ...
$ upper : num [1:14] 1 1 1 0.995 0.929 ...
$ call : language survfit(formula = Surv(start0, end0, event0) ~ start0, data = combined_set)
- attr(*, "class")= chr "survfit"
So one way of getting something like your goal (although still with an automatic start of the survival function at (t=0, S=1)) would be to multiply all the $surv items in the 'start0=4' stratum by the surv value at t=4, and then redo the plot:
survival_fn[['surv']][11:14] <- survival_fn[['surv']][11:14]*survival_fn[['surv']][4]
I can see why this might not be a totally conforming answer since there is still a blue line from 1 out to t=5 and it doesn't actually start at the surv value for stratum 1 at t=4. That is however a limitation of using a "high-level" abstraction plotting paradigm. The customizability is inhibited by the many "helpful" assumptions built into the plotting grammar. It would not be as difficult to do this in base plotting since you could "move things around" without as many constraints.
If you do need to build a step function from estimated survival proportions and times, you might look at this answer and then build an augmented dataset with a y adjustment at time=4 for the later stratum. You would need to add a time=0 value for the main stratum, and give the second stratum the first stratum's values up to time=4, as well as doing the adjustment shown above. See this question and answer: Reconstruct survival curve from coordinates.
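To sketch that augmented-dataset idea in code (the stratum names come from the str() output above; the geom_step reconstruction itself is my assumption about how to realise it):

```r
library(tibble)
library(dplyr)
library(ggplot2)
library(survival)

# Same data as in the question
set1 <- tibble(start0 = rep(0, 10), end0 = 1:10, event0 = rep(1, 10))
set2 <- tibble(start0 = rep(4, 10), end0 = c(5, 5, 7, 9, rep(10, 6)),
               event0 = rep(1, 10))
fit <- survfit(Surv(start0, end0, event0) ~ start0,
               data = bind_rows(set1, set2))

# Flatten the survfit object into one row per (stratum, time)
d <- tibble(time    = fit$time,
            surv    = fit$surv,
            stratum = rep(names(fit$strata), fit$strata))

# Rescale the truncated stratum by S(4) from the untruncated stratum,
# then prepend the shared experience on [0, 4] to it
s4 <- d$surv[d$stratum == "start0=0" & d$time == 4]
d2 <- bind_rows(
  d %>% mutate(surv = ifelse(stratum == "start0=4", surv * s4, surv)),
  d %>% filter(stratum == "start0=0", time <= 4) %>%
    mutate(stratum = "start0=4")
)

ggplot(d2, aes(time, surv, colour = stratum)) +
  geom_step() +
  coord_cartesian(ylim = c(0, 1))
```

geom_step draws in order of x, so the copied early steps and the rescaled later steps join into one curve for the truncated stratum.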
I am trying to fit Logistic Ridge Regression and developed the model as follows; I need help with the coding for testing it for accuracy and ROC/AUC curve with threshold value.
My coding is as follows:
Fitting the model
library(glmnet)
library(caret)
data1<-read.csv("D:\\Research\\Final2.csv",header=T,sep=",")
str(data1)
'data.frame': 154 obs. of 12 variables:
$ Earningspershare : num 12 2.69 8.18 -0.91 3.04 ...
$ NetAssetsPerShare: num 167.1 17.2 41.1 14.2 33 ...
$ Dividendpershare : num 3 1.5 1.5 0 1.25 0 0 0 0 0.5 ...
$ PE : num 7.35 8.85 6.66 -5.27 18.49 ...
$ PB : num 0.53 1.38 1.33 0.34 1.7 0.23 0.5 3.1 0.5 0.3 ...
$ ROE : num 0.08 0.16 0.27 -0.06 0.09 -0.06 -0.06 0.15 0.09 0.
$ ROA : num 0.02 0.09 0.14 -0.03 0.05 -0.04 -0.05 0.09 0.03 0
$ Log_MV : num 8.65 10.38 9.81 8.3 10.36 ..
$ Return_yearly : int 0 1 0 0 0 0 0 0 0 0 ...
$ L3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ L6 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Sector : int 2 2 2 2 2 1 2 2 4 1 ...
smp_size <- floor(0.8 * nrow(data1))
set.seed(123)
train_ind <- sample(seq_len(nrow(data1)), size = smp_size)
train <- data1[train_ind, ]
test <- data1[-train_ind, ]
train$Return_yearly <-as.factor(train$Return_yearly)
train$L3 <-as.factor(train$L3)
train$L6 <-as.factor(train$L6)
train$Sector <-as.factor(train$Sector)
train$L3 <-model.matrix( ~ L3 - 1, data=train)
train$L6 <-model.matrix( ~ L6 - 1, data=train)
train$Sector<-model.matrix( ~ Sector - 1, data=train)
x <- model.matrix(Return_yearly ~., train)
y <- train$Return_yearly
ridge.mod <- glmnet(x, y=as.factor(train$Return_yearly), family='binomial', alpha=0, nlambda=100, lambda.min.ratio=0.0001)
set.seed(1)
cv.out <- cv.glmnet(x, y=as.factor(train$Return_yearly), family='binomial', alpha=0, nfolds = 5, type.measure = "auc", nlambda=100, lambda.min.ratio=0.0001)
plot(cv.out)
best.lambda <- cv.out$lambda.min
best.lambda
[1] 5.109392
Testing the model
test$L3 <-as.factor(test$L3)
test$L6 <-as.factor(test$L6)
test$Sector <-as.factor(test$Sector)
test$Return_yearly <-as.factor(test$Return_yearly)
test$L3 <-model.matrix( ~ L3 - 1, data=test)
test$L6 <-model.matrix( ~ L6 - 1, data=test)
test$Sector<-model.matrix( ~ Sector - 1, data=test)
newx <- model.matrix(Return_yearly ~., test)
y.pred <- as.matrix(ridge.mod,newx=newx, type="class",data=test)
Comparing for accuracy testing; a warning pops up and I am unable to continue:
compare <- cbind (actual=test$Return_yearly, y.pred)
Warning message:
In cbind(actual = test$Return_yearly, y.pred) :
number of rows of result is not a multiple of vector length (arg 1)
Without a reproducible dataset here's a guess:
The train and test matrices have different columns as the result of converting L3 and L6 to factors. By default, as.factor() creates as many levels in a factor as there are unique values, so if by chance the train/test split has different unique values of L3 or L6, the number of dummy variables created by model.matrix() will be different as well.
Possible solution: do the factor conversion before the train/test split, or supply the complete set of levels explicitly (note that it is factor(), not as.factor(), that accepts a levels argument), like
train$L3 <- factor(train$L3, levels = sort(unique(data1$L3)))
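A small self-contained demonstration of why the level sets matter (toy data, not the question's Final2.csv):

```r
# Toy demonstration: levels inferred separately give mismatched dummies
data1 <- data.frame(L3 = c(0, 0, 1, 2, 0, 1))
train <- data1[1:3, , drop = FALSE]  # contains only values 0 and 1
test  <- data1[4:6, , drop = FALSE]  # contains values 0, 1 and 2

ncol(model.matrix(~ factor(L3) - 1, train))  # 2 dummy columns
ncol(model.matrix(~ factor(L3) - 1, test))   # 3 dummy columns -> mismatch

# Fixing the levels from the full data makes the designs line up
levs <- sort(unique(data1$L3))
train$L3 <- factor(train$L3, levels = levs)
test$L3  <- factor(test$L3,  levels = levs)
ncol(model.matrix(~ L3 - 1, train))          # 3 dummy columns
ncol(model.matrix(~ L3 - 1, test))           # 3 dummy columns
```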
Use the following code (the prediction and performance functions are from the ROCR package) to plot the ROC curve, i.e. the true positive rate against the false positive rate:
library(ROCR)
# ROC_Pre: predicted probabilities; data$LSD: observed class labels
ROC_Pre <- prediction(ROC_Pre, data$LSD)
ROC <- performance(ROC_Pre, "tpr", "fpr")
plot(ROC)
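As a side note (this is my assumption about the remaining step, since the question's as.matrix() call does not produce predictions): predictions from a glmnet fit come from predict(). A self-contained sketch on synthetic data:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(200 * 5), 200, 5)
y <- rbinom(200, 1, plogis(x[, 1]))  # outcome driven by column 1
train_idx <- 1:150

# Ridge (alpha = 0) logistic regression with CV-chosen lambda
cvfit <- cv.glmnet(x[train_idx, ], y[train_idx], family = "binomial",
                   alpha = 0, nfolds = 5, type.measure = "auc")

# Class predictions on the held-out rows at lambda.min
pred <- predict(cvfit, newx = x[-train_idx, ], s = "lambda.min",
                type = "class")
mean(pred == y[-train_idx])          # test-set accuracy
```

type = "response" instead of "class" would give probabilities, which is what you would feed into ROCR's prediction() for the ROC curve.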
I am using ggplot 2.1.0 to plot histograms, and I have an unexpected behaviour concerning the histogram bins.
I put here an example with left-closed bins (i.e. [ 0, 0.1 [ ) with a binwidth of 0.1.
mydf <- data.frame(myvar=c(-1,-0.5,-0.4,-0.1,-0.1,0.05,0.1,0.1,0.25,0.5,1))
myplot <- ggplot(mydf, aes(myvar)) + geom_histogram(aes(y=..count..),binwidth = 0.1, boundary=0.1,closed="left")
myplot
ggplot_build(myplot)$data[[1]]
In this example, one would expect the value -0.4 to fall within the bin [-0.4, -0.3[, but it falls instead (mysteriously) in the bin [-0.5, -0.4[. The same happens for the value -0.1, which falls in [-0.2, -0.1[ instead of [-0.1, 0[, etc.
Is there something here I do not fully understand (especially with the new "center" and "boundary" params)? Or is ggplot2 doing weird things there?
Thanks in advance,
Best regards,
Arnaud
PS: Also asked here: https://github.com/hadley/ggplot2/issues/1651
Edit: The problem described below was fixed in a recent release of ggplot2.
Your issue is reproducible and appears to be caused by rounding errors, as suggested in the comments by Roland. At this point, this looks to me like a bug introduced in version ggplot2_2.0.0. I speculate below about its origin, but first let me present a workaround based on the boundary option.
PROBLEM:
df <- data.frame(var = seq(-100,100,10)/100)
as.list(df) # check the data
$var
[1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2
[10] -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
[19] 0.8 0.9 1.0
library("ggplot2")
p <- ggplot(data = df, aes(x = var)) +
geom_histogram(aes(y = ..count..),
binwidth = 0.1,
boundary = 0.1,
closed = "left")
p
SOLUTION
Tweak the boundary parameter. In this example, setting it just below 1, say 0.99, works. Your use case should be amenable to similar tweaking.
ggplot(data = df, aes(x = var)) +
geom_histogram(aes(y = ..count..),
binwidth = 0.05,
boundary = 0.99,
closed = "left")
(I have made the binwidth narrower for a better visual.)
Another workaround is to introduce your own fuzziness, e.g. multiply the data by 1 plus a few multiples of the machine epsilon (see eps below). In ggplot2 the fuzziness multiplies the breaks by 1e-7 (earlier versions) or 1e-8 (later versions).
CAUSE:
The problem appears clearly in ncount:
str(ggplot_build(p)$data[[1]])
## 'data.frame': 20 obs. of 17 variables:
## $ y : num 1 1 1 1 1 2 1 1 1 0 ...
## $ count : num 1 1 1 1 1 2 1 1 1 0 ...
## $ x : num -0.95 -0.85 -0.75 -0.65 -0.55 -0.45 -0.35 -0.25 -0.15 -0.05 ...
## $ xmin : num -1 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 ...
## $ xmax : num -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 ...
## $ density : num 0.476 0.476 0.476 0.476 0.476 ...
## $ ncount : num 0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0 ...
## $ ndensity: num 1.05 1.05 1.05 1.05 1.05 2.1 1.05 1.05 1.05 0 ...
## $ PANEL : int 1 1 1 1 1 1 1 1 1 1 ...
## $ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ymax : num 1 1 1 1 1 2 1 1 1 0 ...
## $ colour : logi NA NA NA NA NA NA ...
## $ fill : chr "grey35" "grey35" "grey35" "grey35" ...
## $ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
## $ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
## $ alpha : logi NA NA NA NA NA NA ...
ggplot_build(p)$data[[1]]$ncount
## [1] 0.5 0.5 0.5 0.5 0.5 1.0 0.5 0.5 0.5 0.0 1.0 0.5
## [13] 0.5 0.5 0.0 1.0 0.5 0.0 1.0 0.5
ROUNDING ERRORS?
Looks like:
df <- data.frame(var = as.integer(seq(-100,100,10)))
# eps <- 1.000000000000001 # on my system
eps <- 1+10*.Machine$double.eps
p <- ggplot(data = df, aes(x = eps*var/100)) +
geom_histogram(aes(y = ..count..),
binwidth = 0.05,
closed = "left")
p
(I have removed the boundary option altogether)
This behaviour appears some time after ggplot2_1.0.1. Looking at the source code, e.g. bin.R and stat-bin.r in https://github.com/hadley/ggplot2/blob/master/R, and tracing the computations of count leads to function bin_vector(), which contains the following lines:
bin_vector <- function(x, bins, weight = NULL, pad = FALSE) {
... STUFF HERE I HAVE DELETED FOR CLARITY ...
cut(x, bins$breaks, right = bins$right_closed,
include.lowest = TRUE)
... STUFF HERE I HAVE DELETED FOR CLARITY ...
}
By comparing the current versions of these functions with older ones, you should be able to find the reason for the different behaviour... to be continued...
SUMMING UP DEBUGGING
By "patching" the bin_vector function and printing the output to screen, it appears that:
bins$fuzzy correctly stores the fuzzy parameters
The non-fuzzy bins$breaks are used in the computations, but as far as I can see (and correct me if I'm wrong) the bins$fuzzy are not.
If I simply replace bins$breaks with bins$fuzzy at the top of bin_vector, the correct plot is returned. Not a proof of a bug, but a suggestion that perhaps more could be done to emulate the behaviour of previous versions of ggplot2.
At the top of bin_vector I expected to find a condition upon which to return either bins$breaks or bins$fuzzy. I think that's missing now.
PATCHING
To "patch" the bin_vector function, copy the function definition from the github source or, more conveniently, from the terminal, with:
ggplot2:::bin_vector
Modify it (patch it) and assign it into the namespace:
library("ggplot2")
bin_vector <- function (x, bins, weight = NULL, pad = FALSE)
{
... STUFF HERE I HAVE DELETED FOR CLARITY ...
## MY PATCH: Replace bins$breaks with bins$fuzzy
bin_idx <- cut(x, bins$fuzzy, right = bins$right_closed,
include.lowest = TRUE)
... STUFF HERE I HAVE DELETED FOR CLARITY ...
ggplot2:::bin_out(bin_count, bin_x, bin_widths)
## THIS IS THE PATCHED FUNCTION
}
assignInNamespace("bin_vector", bin_vector, ns = "ggplot2")
df <- data.frame(var = seq(-100,100,10)/100)
ggplot(data = df, aes(x = var)) + geom_histogram(aes(y = ..count..), binwidth = 0.05, boundary = 1, closed = "left")
Just to be clear, the code above is edited for clarity: the function has a lot of type-checking and other calculations which I have removed, but which you would need to patch the function. Before you run the patch, restart your R session or detach your currently loaded ggplot2.
OLD VERSIONS
The unexpected behaviour is NOT observed in versions 0.9.3 or 1.0.1 and appears to originate in the current release 2.1.0 (or perhaps the earlier 2.0.0, which gave me an error when I tried it).
To install and load an old version, say ggplot2_0.9.3, create a separate directory (no point in overwriting the current version), say ggplot2093:
URL <- "http://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.9.3.tar.gz"
install.packages(URL, repos = NULL, type = "source",
lib = "~/R/testing/ggplot2093")
To load the old version, call it from your local directory:
library("ggplot2", lib.loc = "~/R/testing/ggplot2093")
I created a histogram for a simulation, and now I need to find the total number of instances where the x-variable is greater than a given value. Specifically, my data is correlation (ranging from -1 to 1, with bin size 0.05), and I want to find the percentage of events where the correlation is greater than 0.1. Finding the total number of events greater than 0.1 would be enough, because from that the percentage is easy to compute.
library(psych)
library(lessR)
corrData=NULL
for (i in 1:1000){
x1 <- rnorm(mean=0, sd = 1, n=20)
x2 <- rnorm(mean=0, sd = 1, n=20)
data <- data.frame(x1,x2)
r <- with(data, cor(x1, x2))
corrData <- append(corrData,r)
}
describe(corrData)
hist <- hist(corrData, breaks=seq(-1,1,by=.05), main="N=20")
describe(hist)
count(0.1, "N=20")
Try something like this:
N=500
bh=hist(runif(N,-1,1))
#str(bh)
sum(bh$counts[bh$mids>=.1])/N
Look at what hist is actually giving you (see ?hist):
set.seed(10230)
x<-hist(2*runif(1000)-1)
> str(x)
List of 6
$ breaks : num [1:11] -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 ...
$ counts : int [1:10] 92 99 100 105 92 116 95 102 100 99
$ density : num [1:10] 0.46 0.495 0.5 0.525 0.46 0.58 0.475 0.51 0.5 0.495
$ mids : num [1:10] -0.9 -0.7 -0.5 -0.3 -0.1 0.1 0.3 0.5 0.7 0.9
$ xname : chr "2 * runif(1000) - 1"
$ equidist: logi TRUE
- attr(*, "class")= chr "histogram"
The breaks list item tells you the endpoints of the "catching" intervals. The counts item tells you the counts in the (one fewer) bins defined by these breaks.
So, to get as close as you can to what you want using only your hist object, you could do:
sum(x$counts[which(x$breaks>=.1)-1L])/sum(x$counts)
But, as @Frank said, this may be incorrect, particularly if the bin containing 0.1 does not have an endpoint at 0.1.
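If you still have the raw vector (as in the simulation that produced the histogram), a sketch that sidesteps bin edges entirely is to count exceedances directly; corrData here is rebuilt the same way as in the question:

```r
# Proportion of simulated correlations above 0.1, from the raw values
# (no dependence on where the histogram bin edges fall)
set.seed(1)
corrData <- replicate(1000, cor(rnorm(20), rnorm(20)))
mean(corrData > 0.1)  # fraction of events with correlation > 0.1
```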