Splitting up a correlation matrix in R

I have a dataset with 400 variables that I am using to produce a correlation matrix (i.e. comparing each variable against one another). The resulting matrix has the following structure:
        var_1  var_2  var_3  var_4  var_5  var_6  ... (to 400 vars)
var_1   1      0.1    0.2    0.2    0.4    0.8
var_2   0.1    1      0.15   0.3    0.11   0.6
var_3   0.2    0.15   1      0.47   0.05   0.72
var_4   0.2    0.3    0.47   1      0.25   0.54
var_5   0.4    0.11   0.05   0.25   1      0.84
var_6   0.8    0.6    0.72   0.54   0.84   1
...     (to 400 vars)
I am then generating a figure of the correlation matrix with the corrplot package using the following command:
library(corrplot)
corrplot(df, order = "hclust", tl.col = "black", tl.srt = 45)
Unsurprisingly, this results in a very large figure that is uninterpretable. I was hoping to split up my matrix and then create separate correlation matrix figures (realistically pairwise comparisons of 20 variables at a time). I am struggling to find code that will help me split up my matrix and plug the pieces back into corrplot. Any help would be hugely appreciated!
Thank you.

One approach would be to split your correlation matrix up and then plot all submatrices. The challenge with that is that you are setting order=, which you (apparently) do not know a priori. Assuming you want to allow corrplot to determine the order, then here's a method: plot the whole thing first capturing the function's return value (which contains order information), then split the matrix and plot the components.
Helpful: while most plotting functions operate by side-effect (creating a plot, not necessarily returning values), some return information useful for working with or around their plot components. corrplot is one of them; from ?corrplot:
Value:
(Invisibly) returns a 'list(corr, corrPos, arg)'. 'corr' is a
reordered correlation matrix for plotting. 'corrPos' is a data
frame with 'xName, yName, x, y, corr' and 'p.value'(if p.mat is
not NULL) column, which x and y are the position on the
correlation matrix plot. 'arg' is a list of some corrplot() input
parameters' value. Now 'type' is in.
With this, let's get started. I'll be using a 6x6 matrix M built from the values shown in the question.
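For reproducibility, here is one way to rebuild that matrix as M (values copied from the question's table):
M <- matrix(
  c(1,    0.1,  0.2,  0.2,  0.4,  0.8,
    0.1,  1,    0.15, 0.3,  0.11, 0.6,
    0.2,  0.15, 1,    0.47, 0.05, 0.72,
    0.2,  0.3,  0.47, 1,    0.25, 0.54,
    0.4,  0.11, 0.05, 0.25, 1,    0.84,
    0.8,  0.6,  0.72, 0.54, 0.84, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("var_", 1:6), paste0("var_", 1:6))
)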
Plot the whole thing. If this takes a long time or you don't want it to plot in R's graphics pane, then uncomment the png and dev.off lines, which are intended just to dump the plot itself to "nothing". ("NUL" is a Windows thing; I suspect "/dev/null" should work on most other OSes, untested.)
# png("NUL")
CP <- corrplot::corrplot(M, order="hclust", tl.col="black", tl.srt=45)
# dev.off()
str(CP)
# List of 3
# $ corr : num [1:6, 1:6] 1 0.4 0.8 0.1 0.2 0.2 0.4 1 0.84 0.11 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:6] "var_1" "var_5" "var_6" "var_2" ...
# .. ..$ : chr [1:6] "var_1" "var_5" "var_6" "var_2" ...
# $ corrPos:'data.frame': 36 obs. of 5 variables:
# ..$ xName: chr [1:36] "var_1" "var_1" "var_1" "var_1" ...
# ..$ yName: chr [1:36] "var_1" "var_5" "var_6" "var_2" ...
# ..$ x : num [1:36] 1 1 1 1 1 1 2 2 2 2 ...
# ..$ y : num [1:36] 6 5 4 3 2 1 6 5 4 3 ...
# ..$ corr : num [1:36] 1 0.4 0.8 0.1 0.2 0.2 0.4 1 0.84 0.11 ...
# $ arg :List of 1
# ..$ type: chr "full"
rownames(CP$corr) (which equals colnames(CP$corr)) provides the row/column order resulting from order="hclust", so we can use it to reorder the original matrix:
M_reord <- M[rownames(CP$corr), colnames(CP$corr)]
M_reord
# var_1 var_5 var_6 var_2 var_3 var_4
# var_1 1.0 0.40 0.80 0.10 0.20 0.20
# var_5 0.4 1.00 0.84 0.11 0.05 0.25
# var_6 0.8 0.84 1.00 0.60 0.72 0.54
# var_2 0.1 0.11 0.60 1.00 0.15 0.30
# var_3 0.2 0.05 0.72 0.15 1.00 0.47
# var_4 0.2 0.25 0.54 0.30 0.47 1.00
Now we split the matrix up. For this example, I'll assume your "20" is really "3". Some helper objects:
grps <- (seq_len(nrow(M_reord)) - 1) %/% 3
grps
# [1] 0 0 0 1 1 1
eg <- expand.grid(row = unique(grps), col = unique(grps))
eg
# row col
# 1 0 0
# 2 1 0
# 3 0 1
# 4 1 1
where the row of eg counts columns-first (top-to-bottom, then left-to-right), as shown below:
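One way to see that numbering laid over the block grid (a small illustration):
# each cell shows which row of eg covers that block of the full matrix
matrix(seq_len(nrow(eg)), nrow = length(unique(grps)))
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4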
Given a specific submatrix number (row of eg), plot it. Let's try "2":
subplt <- 2
rows <- which(grps == eg[subplt, "row"])
cols <- which(grps == eg[subplt, "col"])
corrplot::corrplot(M_reord[rows, cols], tl.col="black", tl.srt=45)
If you want to automate plotting these (e.g., in an R Markdown document or in Shiny), then you can loop over them with:
for (subplt in seq_len(nrow(eg))) {
rows <- which(grps == eg[subplt, "row"])
cols <- which(grps == eg[subplt, "col"])
corrplot::corrplot(M_reord[rows, cols], tl.col="black", tl.srt=45)
}
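For the real 400-variable case, the same recipe should apply with a block size of 20 (a sketch, assuming M_reord is your reordered 400x400 matrix):
blk  <- 20
grps <- (seq_len(nrow(M_reord)) - 1) %/% blk
eg   <- expand.grid(row = unique(grps), col = unique(grps))
for (subplt in seq_len(nrow(eg))) {
  rows <- which(grps == eg[subplt, "row"])
  cols <- which(grps == eg[subplt, "col"])
  corrplot::corrplot(M_reord[rows, cols], tl.col = "black", tl.srt = 45)
}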

Related

Error in quantmod::Lag when adding columns to a dataframe

I have the following dataframe df:
tickers <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL')
returns <- c(0.1, 0.2, 0.3, -0.15, -0.25, .09, 0.4, -0.2)
df <- data.frame(tickers, returns)
df
tickers returns
1 AAPL 0.10
2 AAPL 0.20
3 AAPL 0.30
4 AAPL -0.15
5 AAPL -0.25
6 AAPL 0.09
7 AAPL 0.40
8 AAPL -0.20
I would like to add a column with the lagged returns. To do so, I use:
df$lag_1 <- Lag(df$returns, k=1)
Which produces:
tickers returns Lag.1
1 AAPL 0.10 NA
2 AAPL 0.20 0.10
3 AAPL 0.30 0.20
4 AAPL -0.15 0.30
5 AAPL -0.25 -0.15
6 AAPL 0.09 -0.25
7 AAPL 0.40 0.09
8 AAPL -0.20 0.40
So far, so good. But, when I try to use a variable to define the 2-day lag, I get an error message:
lookup <- 'returns'
df$lag_2 <- Lag(paste('df$', lookup) , k=2)
Error in Lag.default(paste("df$", lookup), k = 2) :
x must be a time series or numeric vector
Use [[ instead of $
library(quantmod)
df$lag_2 <- Lag(df[[lookup]], k = 2)[,1]
Output:
> df
tickers returns lag_2
1 AAPL 0.10 NA
2 AAPL 0.20 NA
3 AAPL 0.30 0.10
4 AAPL -0.15 0.20
5 AAPL -0.25 0.30
6 AAPL 0.09 -0.15
7 AAPL 0.40 -0.25
8 AAPL -0.20 0.09
The stats::lag function is designed for application to time series objects. It is not designed to "lag" ordinary vectors. The lagging of a time series object is accomplished by altering its time base. The quantmod package's help page for its Lag function describes the differences succinctly:
This function differs from lag by returning the original series modified, as opposed to simply changing the time series properties. It differs from the like named Lag in the Hmisc package as it deals primarily with time-series like objects.
It is important to realize that if there is no applicable method for Lag, the value returned will be from lag in base. That is, coerced to 'ts' if necessary, and subsequently shifted.
Neither the question nor the current answer included the code needed to load the quantmod package:
library(quantmod)
The other learning opportunity is that the expression paste('df$', lookup) will never be effective. That attempt probably comes from experience with what are called "macro" languages. R does not parse and interpret constructed strings like that: unquoted expressions typed at the console are handled differently than strings built with paste or paste0. As #akrun demonstrated, it is possible to use the extraction and assignment operators, [[ and [[<-, with character-string column names.
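A minimal sketch of the difference, using the df from the question:
lookup <- "returns"
df[[lookup]]      # string-based extraction: returns the 'returns' column
# df$lookup       # would look for a column literally named "lookup" -> NULL
# string-based assignment works the same way, via [[<- :
df[["lag_2"]] <- Lag(df[[lookup]], k = 2)[, 1]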
And a third learning opportunity comes from noticing that the name that appears at the top of your new column is not the same one that you assigned to it. What happened is that the result from quantmod::Lag was a matrix with the column name "Lag.1" rather than a vector. The quantmod package is designed to work with zoo-like objects, which are matrices rather than dataframes. Note further that trying to access that column with the name that appears in the printed representation will not succeed:
> str(df)
'data.frame': 8 obs. of 3 variables:
$ tickers: chr "AAPL" "AAPL" "AAPL" "AAPL" ...
$ returns: num 0.1 0.2 0.3 -0.15 -0.25 0.09 0.4 -0.2
$ lag_1 : num [1:8, 1] NA 0.1 0.2 0.3 -0.15 -0.25 0.09 0.4
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Lag.1"
> df$Lag.1 # FAIL
NULL
> df$lag_1 # Success
Lag.1
[1,] NA
[2,] 0.10
[3,] 0.20
[4,] 0.30
[5,] -0.15
[6,] -0.25
[7,] 0.09
[8,] 0.40
If you will be using "quantmod" or "tidyquant", you will definitely need to understand the differences between accessing values in matrices and accessing values in dataframes.
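One way to sidestep the matrix column entirely is to drop the dimensions at assignment time, so a plain vector is stored under the name you assigned (a small sketch):
library(quantmod)
# as.vector() strips the dim and dimnames, so df$lag_1 becomes an ordinary
# numeric vector and prints without the "Lag.1" header
df$lag_1 <- as.vector(Lag(df$returns, k = 1))
str(df$lag_1)
# num [1:8] NA 0.1 0.2 0.3 -0.15 -0.25 0.09 0.4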

Calculating and Appending Column Totals of Select Columns in a Data Frame in R

I have the following code for calculating certain quantities of interest, specifically the sum of the two right-most columns.
library(dplyr)
library(janitor)
m = c(0, 0.8, 2.3, 4.1, 2.1)
l = c(0.3, 0.8, 0.9, 0.75, 0.25)
mytable = data.frame(l, m)
rownames(mytable) = paste("Group", 1:5)
# Initial population
n0 = c(1,1,1,1,1)
mytable = mytable %>%
mutate(lm = l * m) %>%
mutate(n = n0) %>%
mutate(offspring = lm * n) %>%
adorn_totals("row")
This gives the following output:
> mytable
l m lm n offspring
0.3 0.0 0.000 1 0.000
0.8 0.8 0.640 1 0.640
0.9 2.3 2.070 1 2.070
0.75 4.1 3.075 1 3.075
0.25 2.1 0.525 1 0.525
Total 9.3 6.310 5 6.310
I have the following issues:
How to isolate the column totals for specific columns? In my case, I would like the column totals for just columns n and offspring. I read the documentation for the adorn_totals() function but I could not figure out how to do this.
The row names assigned are missing. How can I make the row names appear and have the word "Total" as the row name for the new row of column totals?
The row total does not appear for the first column, which is strange.
To your first and third points: you can control which columns are totaled by specifying column names to the ... argument of adorn_totals(). Using ... requires specifying values for the other arguments, even if they're empty, thus the ,,,, below to accept the default values for those arguments.
The first column is skipped by default, as this is usually a group ID (like your rownames), but you can specify that it should be totaled.
Here is how you'd total the columns l, n, and offspring:
mytable %>%
mutate(lm = l * m) %>%
mutate(n = n0) %>%
mutate(offspring = lm * n) %>%
adorn_totals("row",,,,l, n, offspring)
Returns:
l m lm n offspring
0.30 0 0 1 0.000
0.80 0.8 0.64 1 0.640
0.90 2.3 2.07 1 2.070
0.75 4.1 3.075 1 3.075
0.25 2.1 0.525 1 0.525
3.00 - - 5 6.310
Along with the warning:
Because the first column was specified to be totaled, it does not contain the label 'Total' (or user-specified name) in the totals row
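If the positional ,,,, feels fragile, the same call can be written with named arguments (assuming janitor's argument names where, fill, na.rm, and name):
mytable %>%
  mutate(lm = l * m, n = n0, offspring = lm * n) %>%
  adorn_totals(where = "row", fill = "-", na.rm = TRUE, name = "Total",
               l, n, offspring)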
An option is to convert the columns other than the required ones to character class and then convert them back afterwards. Regarding the row names, a tibble doesn't allow row names, so we first need to move them into a column with rownames_to_column.
library(dplyr)
library(tibble)
library(janitor)
out <- mytable %>%
rownames_to_column('rn') %>%
mutate(lm = l *m, n = n0, offspring = lm * n) %>%
mutate(across(-c(n, offspring), as.character)) %>%
adorn_totals('row', fill = NA) %>%
type.convert(as.is = TRUE)
Output:
> out
rn l m lm n offspring
Group 1 0.30 0.0 0.000 1 0.000
Group 2 0.80 0.8 0.640 1 0.640
Group 3 0.90 2.3 2.070 1 2.070
Group 4 0.75 4.1 3.075 1 3.075
Group 5 0.25 2.1 0.525 1 0.525
Total NA NA NA 5 6.310
> str(out)
Classes ‘tabyl’ and 'data.frame': 6 obs. of 6 variables:
$ rn : chr "Group 1" "Group 2" "Group 3" "Group 4" ...
$ l : num 0.3 0.8 0.9 0.75 0.25 NA
$ m : num 0 0.8 2.3 4.1 2.1 NA
$ lm : num 0 0.64 2.07 3.075 0.525 ...
$ n : int 1 1 1 1 1 5
$ offspring: num 0 0.64 2.07 3.075 0.525 ...
- attr(*, "core")='data.frame': 5 obs. of 6 variables:
..$ rn : chr [1:5] "Group 1" "Group 2" "Group 3" "Group 4" ...
..$ l : chr [1:5] "0.3" "0.8" "0.9" "0.75" ...
..$ m : chr [1:5] "0" "0.8" "2.3" "4.1" ...
..$ lm : chr [1:5] "0" "0.64" "2.07" "3.075" ...
..$ n : num [1:5] 1 1 1 1 1
..$ offspring: num [1:5] 0 0.64 2.07 3.075 0.525
- attr(*, "tabyl_type")= chr "two_way"
- attr(*, "totals")= chr "row"

mlogit "row names supplied are of the wrong length", R

I am implementing a multinomial logit model using the mlogit package in R. The data includes three different "choices" and three variables (A, B, C) which contain information for the independent variable. I have transformed the data into a "wide" format using the mlogit.data function, which makes it look like this:
Observation Choice VariableA VariableB VariableC
1 1 1.27 0.2 0.81
1 0 1.27 0.2 0.81
1 -1 1.27 0.2 0.81
2 1 0.20 0.45 0.70
2 0 0.20 0.45 0.70
2 -1 0.20 0.45 0.70
The thing is that I want the independent variable to be choice-specific and therefore being constructed as Variable D below:
Observation Choice VariableA VariableB VariableC VariableD
1 1 1.27 0.2 0.81 1.27
1 0 1.27 0.2 0.81 0.2
1 -1 1.27 0.2 0.81 0.81
2 1 0.20 0.45 0.70 0.20
2 0 0.20 0.45 0.70 0.45
2 -1 0.20 0.45 0.70 0.70
Variable D was constructed using the following code:
choice_map <- data.frame(choice = c(1, 0, -1), var = grep('Variable[A-C]', names(df)))
df$VariableD <- df[cbind(seq_len(nrow(df)), with(choice_map, var[match(df$Choice, choice)]))]
However, when I try to run the multinomial logit model,
mlog <- mlogit(Choice ~ 1 | VariableD, data=df, reflevel = "0")
the error message "row names supplied are of the wrong length" is returned. When I use any of the other variables A-C separately the regression is run without any problems, so my questions are therefore: why can't Variable D be used and how can this problem be solved?
Thanks!
I got this error when I passed my original dataframe into the model, rather than the "wide" dataframe created by mlogit.data.
So make sure to create your "wide" dataframe first and pass that into your mlogit() call, as in the sketch below.
(source: Andy Field, Discovering statistics using R, page 348)
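A minimal sketch of that workflow, using the column names from the question; the exact mlogit.data arguments (chid.var, alt.levels) are assumptions that depend on how your alternatives and observation IDs are coded:
library(mlogit)
# index the data first: 'Choice' is the response column, 'Observation'
# groups the three alternatives belonging to one individual
df_idx <- mlogit.data(df, choice = "Choice", shape = "long",
                      chid.var = "Observation", alt.levels = c("1", "0", "-1"))
# then fit on the indexed data, not on the original df
mlog <- mlogit(Choice ~ 1 | VariableD, data = df_idx, reflevel = "0")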

geom_histogram: wrong bins?

I am using ggplot2 2.1.0 to plot histograms, and I am seeing unexpected behaviour concerning the histogram bins.
Here is an example with left-closed bins (i.e. [0, 0.1[) and a binwidth of 0.1.
mydf <- data.frame(myvar=c(-1,-0.5,-0.4,-0.1,-0.1,0.05,0.1,0.1,0.25,0.5,1))
myplot <- ggplot(mydf, aes(myvar)) + geom_histogram(aes(y=..count..),binwidth = 0.1, boundary=0.1,closed="left")
myplot
ggplot_build(myplot)$data[[1]]
In this example, one may expect the value -0.4 to be within the bin [-0.4, -0.3[, but it falls instead (mysteriously) in the bin [-0.5, -0.4[. The same goes for the value -0.1, which falls in [-0.2, -0.1[ instead of [-0.1, 0[, etc.
Is there something here I do not fully understand (especially with the new "center" and "boundary" params)? Or is ggplot2 doing weird things there?
Thanks in advance,
Best regards,
Arnaud
PS: Also asked here: https://github.com/hadley/ggplot2/issues/1651
Edit: The problem described below was fixed in a recent release of ggplot2.
Your issue is reproducible and appears to be caused by rounding errors, as suggested in the comments by Roland. At this point, this looks to me like a bug introduced in version ggplot2_2.0.0. I speculate below about its origin, but first let me present a workaround based on the boundary option.
PROBLEM:
df <- data.frame(var = seq(-100,100,10)/100)
as.list(df) # check the data
$var
[1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2
[10] -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
[19] 0.8 0.9 1.0
library("ggplot2")
p <- ggplot(data = df, aes(x = var)) +
geom_histogram(aes(y = ..count..),
binwidth = 0.1,
boundary = 0.1,
closed = "left")
p
SOLUTION
Tweak the boundary parameter. In this example, setting boundary just below 1, say 0.99, works. Your use case should be amenable to similar tweaking.
ggplot(data = df, aes(x = var)) +
geom_histogram(aes(y = ..count..),
binwidth = 0.05,
boundary = 0.99,
closed = "left")
(I have made the binwidth narrower for a better visual.)
Another workaround is to introduce your own fuzziness, e.g. multiply the data by 1 plus a few multiples of the machine epsilon (see eps below). In ggplot2 the built-in fuzz is on the order of 1e-7 (earlier versions) or 1e-8 (later versions).
CAUSE:
The problem appears clearly in ncount:
str(ggplot_build(p)$data[[1]])
## 'data.frame': 20 obs. of 17 variables:
## $ y : num 1 1 1 1 1 2 1 1 1 0 ...
## $ count : num 1 1 1 1 1 2 1 1 1 0 ...
## $ x : num -0.95 -0.85 -0.75 -0.65 -0.55 -0.45 -0.35 -0.25 -0.15 -0.05 ...
## $ xmin : num -1 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 ...
## $ xmax : num -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 ...
## $ density : num 0.476 0.476 0.476 0.476 0.476 ...
## $ ncount : num 0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0 ...
## $ ndensity: num 1.05 1.05 1.05 1.05 1.05 2.1 1.05 1.05 1.05 0 ...
## $ PANEL : int 1 1 1 1 1 1 1 1 1 1 ...
## $ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ymax : num 1 1 1 1 1 2 1 1 1 0 ...
## $ colour : logi NA NA NA NA NA NA ...
## $ fill : chr "grey35" "grey35" "grey35" "grey35" ...
## $ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
## $ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
## $ alpha : logi NA NA NA NA NA NA ...
ggplot_build(p)$data[[1]]$ncount
## [1] 0.5 0.5 0.5 0.5 0.5 1.0 0.5 0.5 0.5 0.0 1.0 0.5
## [13] 0.5 0.5 0.0 1.0 0.5 0.0 1.0 0.5
ROUNDING ERRORS?
Looks like:
df <- data.frame(var = as.integer(seq(-100,100,10)))
# eps <- 1.000000000000001 # on my system
eps <- 1+10*.Machine$double.eps
p <- ggplot(data = df, aes(x = eps*var/100)) +
geom_histogram(aes(y = ..count..),
binwidth = 0.05,
closed = "left")
p
(I have removed the boundary option altogether)
This behaviour appears some time after ggplot2_1.0.1. Looking at the source code, e.g. bin.R and stat-bin.r in https://github.com/hadley/ggplot2/blob/master/R, and tracing the computations of count leads to function bin_vector(), which contains the following lines:
bin_vector <- function(x, bins, weight = NULL, pad = FALSE) {
... STUFF HERE I HAVE DELETED FOR CLARITY ...
cut(x, bins$breaks, right = bins$right_closed,
include.lowest = TRUE)
... STUFF HERE I HAVE DELETED FOR CLARITY ...
}
By comparing the current versions of these functions with older ones, you should be able to find the reason for the different behaviour... to be continued...
SUMMING UP DEBUGGING
By "patching" the bin_vector function and printing the output to screen, it appears that:
bins$fuzzy correctly stores the fuzzy parameters
The non-fuzzy bins$breaks are used in the computations, but as far as I can see (and correct me if I'm wrong) the bins$fuzzy are not.
If I simply replace bins$breaks with bins$fuzzy at the top of bin_vector, the correct plot is returned. Not a proof of a bug, but a suggestion that perhaps more could be done to emulate the behaviour of previous versions of ggplot2.
At the top of bin_vector I expected to find a condition upon which to return either bins$breaks or bins$fuzzy. I think that's missing now.
PATCHING
To "patch" the bin_vector function, copy the function definition from the github source or, more conveniently, from the terminal, with:
ggplot2:::bin_vector
Modify it (patch it) and assign it into the namespace:
library("ggplot2")
bin_vector <- function (x, bins, weight = NULL, pad = FALSE)
{
... STUFF HERE I HAVE DELETED FOR CLARITY ...
## MY PATCH: Replace bins$breaks with bins$fuzzy
bin_idx <- cut(x, bins$fuzzy, right = bins$right_closed,
include.lowest = TRUE)
... STUFF HERE I HAVE DELETED FOR CLARITY ...
ggplot2:::bin_out(bin_count, bin_x, bin_widths)
## THIS IS THE PATCHED FUNCTION
}
assignInNamespace("bin_vector", bin_vector, ns = "ggplot2")
df <- data.frame(var = seq(-100,100,10)/100)
ggplot(data = df, aes(x = var)) + geom_histogram(aes(y = ..count..), binwidth = 0.05, boundary = 1, closed = "left")
Just to be clear, the code above is edited for clarity: the function has a lot of type-checking and other calculations which I have removed, but which you would need to patch the function. Before you run the patch, restart your R session or detach your currently loaded ggplot2.
OLD VERSIONS
The unexpected behaviour is NOT observed in versions 0.9.3 or 1.0.1 and appears to originate in the current release 2.1.0 (or perhaps the earlier 2.0.0, which gave me an error when I tried to call it).
To install and load an old version, say ggplot2_0.9.3, create a separate directory (no point in overwriting the current version), say ggplot2093:
URL <- "http://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.9.3.tar.gz"
install.packages(URL, repos = NULL, type = "source",
lib = "~/R/testing/ggplot2093")
To load the old version, call it from your local directory:
library("ggplot2", lib.loc = "~/R/testing/ggplot2093")

How can I find the total frequency of a given range in a histogram?

I created a histogram for a simulation, and now I need to find the total number of instances where the x-variable is greater than a given value. Specifically, my data is correlations (ranging from -1 to 1, with bin size 0.05), and I want to find the percentage of events where the correlation is greater than 0.1. Finding the total number of events greater than 0.1 is enough, because from that the percentage is easy to compute.
library(psych)
library(lessR)
corrData=NULL
for (i in 1:1000){
x1 <- rnorm(mean=0, sd = 1, n=20)
x2 <- rnorm(mean=0, sd = 1, n=20)
data <- data.frame(x1,x2)
r <- with(data, cor(x1, x2))
corrData <- append(corrData,r)
}
describe(corrData)
hist <- hist(corrData, breaks=seq(-1,1,by=.05), main="N=20")
describe(hist)
count(0.1, "N=20")
Try something like this:
N <- 500
bh <- hist(runif(N, -1, 1))
# str(bh)
# sum the counts of every bin whose midpoint is at or above 0.1,
# then divide by the total number of events to get a proportion
sum(bh$counts[bh$mids >= .1]) / N
Look at what hist is actually giving you (see ?hist):
set.seed(10230)
x<-hist(2*runif(1000)-1)
> str(x)
List of 6
$ breaks : num [1:11] -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 ...
$ counts : int [1:10] 92 99 100 105 92 116 95 102 100 99
$ density : num [1:10] 0.46 0.495 0.5 0.525 0.46 0.58 0.475 0.51 0.5 0.495
$ mids : num [1:10] -0.9 -0.7 -0.5 -0.3 -0.1 0.1 0.3 0.5 0.7 0.9
$ xname : chr "2 * runif(1000) - 1"
$ equidist: logi TRUE
- attr(*, "class")= chr "histogram"
The breaks list item tells you the endpoints of the "catching" intervals. The counts item tells you the counts in the (one fewer) bins defined by these breaks.
So, to get as close as you can to what you want using only your hist object, you could do:
sum(x$counts[which(x$breaks>=.1)-1L])/sum(x$counts)
But, as #Frank said, this may be incorrect, particularly if the bin containing .1 does not have an endpoint at .1.
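And since you still have the raw simulated correlations from your loop, the exact percentage doesn't need the histogram at all:
# proportion of simulated correlations strictly greater than 0.1, as a percent
mean(corrData > 0.1) * 100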
