Rolling calculation of beta (linear regression slope) - r

I have a dataframe
> df
date comp ret mret
1 1/1/75 A 0.07 0.06
2 1/2/75 A 0.04 0.05
3 1/3/75 A 0.01 0.01
4 1/4/75 A -0.05 -0.04
5 1/5/75 A 0.05 0.05
6 1/6/75 A 0.04 0.04
7 1/7/75 A 0.07 0.08
8 1/8/75 A 0.01 0.00
9 1/9/75 A -0.02 -0.01
10 1/10/75 A -0.03 -0.01
11 1/11/75 A 0.01 0.02
12 1/12/75 A 0.03 0.04
13 1/1/75 B 0.09 0.06
14 1/2/75 B 0.07 0.05
15 1/3/75 B 0.04 0.01
16 1/4/75 B -0.02 -0.04
17 1/5/75 B 0.06 0.05
18 1/6/75 B 0.08 0.04
19 1/7/75 B 0.10 0.08
20 1/8/75 B 0.02 0.00
21 1/9/75 B -0.01 -0.01
22 1/10/75 B 0.01 -0.01
23 1/11/75 B -0.01 0.02
24 1/12/75 B 0.07 0.04
I want to calculate beta based on CAPM which is the slope between ret and mret (y-variable = ret, x-variable = mret). This means that I need to do a linear regression to calculate this beta.
The twist is that I want to calculate a rolling beta over the past 5 months, with a minimum of 3 months of data, for each company. To break it down:
I need to make the first beta calculation at line number 3 since this has 3 months of data. At line 4 I want to use the past 4 months of data when calculating beta, at line 5 I want the past 5 months of data, at line 6 I want the past 5 months of data again etc.
I want to group the calculation by the variable 'comp', meaning that at line 13 everything resets and the first calculation starts at line 15 and then follows the method mentioned above.
The results should end up looking like this:
date comp ret mret beta
1 1/1/75 A 0.07 0.06 NA
2 1/2/75 A 0.04 0.05 NA
3 1/3/75 A 0.01 0.01 1.0714
4 1/4/75 A -0.05 -0.04 1.1129
5 1/5/75 A 0.05 0.05 1.1098
6 1/6/75 A 0.04 0.04 1.0578
7 1/7/75 A 0.07 0.08 1.0193
8 1/8/75 A 0.01 0.00 0.9839
9 1/9/75 A -0.02 -0.01 0.9307
10 1/10/75 A -0.03 -0.01 1.0161
11 1/11/75 A 0.01 0.02 0.9895
12 1/12/75 A 0.03 0.04 1.0106
13 1/1/75 B 0.09 0.06 NA
14 1/2/75 B 0.07 0.05 NA
15 1/3/75 B 0.04 0.01 0.9286
16 1/4/75 B -0.02 -0.04 1.0484
17 1/5/75 B 0.06 0.05 0.9913
18 1/6/75 B 0.08 0.04 0.9932
19 1/7/75 B 0.10 0.08 0.9807
20 1/8/75 B 0.02 0.00 1.0046
21 1/9/75 B -0.01 -0.01 1.1496
22 1/10/75 B 0.01 -0.01 1.1613
23 1/11/75 B -0.01 0.02 1.0559
24 1/12/75 B 0.07 0.04 1.0426
Is there a way to do this in R?

Using df from the Note at the end, define a slope function and use rollapplyr to apply it over a moving window. partial = 3 tells rollapplyr to allow partial windows at the start of each group, as long as they contain at least 3 rows.
library(dplyr)
library(zoo)
slope <- function(m) {
  ret <- m[, 1]   # company return (y)
  mret <- m[, 2]  # market return (x)
  cov(ret, mret) / var(mret)  # regression slope of ret on mret
}
df %>%
  group_by(comp) %>%
  mutate(beta = rollapplyr(cbind(ret, mret), 5, slope, partial = 3, fill = NA,
                           by.column = FALSE)) %>%
  ungroup
giving:
# A tibble: 24 x 5
date comp ret mret beta
<chr> <chr> <dbl> <dbl> <dbl>
1 1/1/75 A 0.07 0.06 NA
2 1/2/75 A 0.04 0.05 NA
3 1/3/75 A 0.01 0.01 1.07
4 1/4/75 A -0.05 -0.04 1.11
5 1/5/75 A 0.05 0.05 1.11
6 1/6/75 A 0.04 0.04 1.06
7 1/7/75 A 0.07 0.08 1.02
8 1/8/75 A 0.01 0 0.984
9 1/9/75 A -0.02 -0.01 0.931
10 1/10/75 A -0.03 -0.01 1.02
# ... with 14 more rows
Note
Input in reproducible form:
Lines <- "date comp ret mret
1 1/1/75 A 0.07 0.06
2 1/2/75 A 0.04 0.05
3 1/3/75 A 0.01 0.01
4 1/4/75 A -0.05 -0.04
5 1/5/75 A 0.05 0.05
6 1/6/75 A 0.04 0.04
7 1/7/75 A 0.07 0.08
8 1/8/75 A 0.01 0.00
9 1/9/75 A -0.02 -0.01
10 1/10/75 A -0.03 -0.01
11 1/11/75 A 0.01 0.02
12 1/12/75 A 0.03 0.04
13 1/1/75 B 0.09 0.06
14 1/2/75 B 0.07 0.05
15 1/3/75 B 0.04 0.01
16 1/4/75 B -0.02 -0.04
17 1/5/75 B 0.06 0.05
18 1/6/75 B 0.08 0.04
19 1/7/75 B 0.10 0.08
20 1/8/75 B 0.02 0.00
21 1/9/75 B -0.01 -0.01
22 1/10/75 B 0.01 -0.01
23 1/11/75 B -0.01 0.02
24 1/12/75 B 0.07 0.04"
df <- read.table(text = Lines)
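Incidentally, the cov/var ratio used in slope() is just the closed-form ordinary-least-squares slope, so it agrees with lm(). A quick check on made-up data (not from the question):

```r
# Sanity check: cov(y, x) / var(x) equals the slope coefficient from lm(),
# since both are the OLS estimate for a simple regression of y on x.
set.seed(1)
x <- rnorm(20)
y <- 1.2 * x + rnorm(20, sd = 0.1)
slope_cov <- cov(y, x) / var(x)
slope_lm  <- unname(coef(lm(y ~ x))[2])
all.equal(slope_cov, slope_lm)  # TRUE
```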


R: need help matching up table rows and getting differences

I have chromatographic data in a table, organized by peak position and integration value for various samples. Every sample in the table also has a repeated measurement, recorded under a different sample log number.
What I'm interested in is the repeatability of the measurements of the various peaks: ideally, the difference in peak integration between the two measurements of a sample would be 0.
The data
Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03
with Log1 is the original sample log number, and Log2 is the repeat log number.
How can I construct a new variable for every peak (being the difference PeakX_Log1 - PeakX_Log2)?
Mind that in my example I only have 5 peaks. The real-life situation is a complex mixture involving >20 peaks, so very hard to do it by hand.
If you will only have two values for each sample, something like this could work:
df <- data.table::fread(
"Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03"
)
library(tidyverse)
new_df <- df %>%
  mutate(Log = ifelse(Log1 < Log2, "Log1", "Log2")) %>%
  select(-Log1, -Log2) %>%
  pivot_longer(cols = starts_with("Peak"), names_to = "Peak") %>%
  pivot_wider(values_from = value, names_from = Log) %>%
  mutate(Variation = Log1 - Log2)
new_df
# A tibble: 15 × 5
Sample Peak Log1 Log2 Variation
<chr> <chr> <dbl> <dbl> <dbl>
1 A Peak1 0.2 0.21 -0.0100
2 A Peak2 0.8 0.7 0.100
3 A Peak3 0.3 0.33 -0.0300
4 A Peak4 0 0 0
5 A Peak5 0 0.1 -0.1
6 B Peak1 0.25 0.2 0.05
7 B Peak2 0.73 0.73 0
8 B Peak3 0.29 0.37 -0.08
9 B Peak4 0.01 0 0.01
10 B Peak5 0.04 0.03 0.01
11 C Peak1 0.2 0.22 -0.0200
12 C Peak2 0.8 0.81 -0.0100
13 C Peak3 0.3 0.31 -0.0100
14 C Peak4 0 0.04 -0.04
15 C Peak5 0.07 0 0.07
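For the common case of exactly two rows per sample, a base-R alternative (just a sketch, shown on the question's data trimmed to two peaks for brevity) avoids the reshape entirely:

```r
# Split the rows into original (Log1 < Log2) and repeat (Log1 > Log2)
# measurements, align them by Sample, and subtract the peak columns.
df <- read.table(text = "Sample Log1 Log2 Peak1 Peak2
A 100 104 0.20 0.80
B 101 106 0.25 0.73
A 104 100 0.21 0.70
B 106 101 0.20 0.73", header = TRUE)
peaks  <- grep("^Peak", names(df), value = TRUE)
first  <- df[df$Log1 < df$Log2, ]                      # original measurements
second <- df[df$Log1 > df$Log2, ]                      # repeat measurements
second <- second[match(first$Sample, second$Sample), ] # align by sample
cbind(first["Sample"], first[peaks] - second[peaks])
```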

How do i create a 3d surface plot in R If I have a dataframe of 3 columns?

Here are the first 20 rows of my dataframe:
x y z
1 0.50 0.50 48530.98
2 0.50 0.51 49029.34
3 0.50 0.52 49576.12
4 0.50 0.53 50161.22
5 0.50 0.54 50752.05
6 0.50 0.55 51354.43
7 0.50 0.56 51965.09
8 0.50 0.57 38756.51
9 0.50 0.58 39262.34
10 0.50 0.59 39783.68
11 0.51 0.60 41052.09
12 0.51 0.61 41447.51
13 0.51 0.62 26972.85
14 0.51 0.63 27134.74
15 0.51 0.64 27297.85
16 0.51 0.65 27462.82
17 0.51 0.66 27632.45
18 0.51 0.67 27806.77
19 0.51 0.68 27988.12
20 0.51 0.69 25514.42
I need to create a 3D surface plot to view it, ideally one that I can rotate to view from all perspectives.
Thanks.
You can use plotly to create a 3D surface plot. Use xtabs to turn your data into a matrix suitable for add_surface().
library(plotly)
plot_ly(z = ~xtabs(z ~ x + y, data = df)) %>% add_surface()
Sample data
df <- read.table(text =
" x y z
1 0.50 0.50 48530.98
2 0.50 0.51 49029.34
3 0.50 0.52 49576.12
4 0.50 0.53 50161.22
5 0.50 0.54 50752.05
6 0.50 0.55 51354.43
7 0.50 0.56 51965.09
8 0.50 0.57 38756.51
9 0.50 0.58 39262.34
10 0.50 0.59 39783.68
11 0.51 0.60 41052.09
12 0.51 0.61 41447.51
13 0.51 0.62 26972.85
14 0.51 0.63 27134.74
15 0.51 0.64 27297.85
16 0.51 0.65 27462.82
17 0.51 0.66 27632.45
18 0.51 0.67 27806.77
19 0.51 0.68 27988.12
20 0.51 0.69 25514.42", header = T)
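One caveat worth knowing, sketched here on synthetic data (not the question's): xtabs() fills any x/y combination that is absent from the data with 0, which shows up as spurious dips in the surface. With a complete grid, base persp() also gives a static view at a chosen angle if plotly is not available.

```r
# Build a complete x/y grid, tabulate z into a matrix with xtabs(), and
# draw a static 3D surface with base persp().
grid <- expand.grid(x = seq(0, 1, 0.25), y = seq(0, 1, 0.25))
grid$z <- with(grid, sin(pi * grid$x) * cos(pi * grid$y))
m <- xtabs(z ~ x + y, data = grid)           # 5 x 5 matrix of z values
persp(sort(unique(grid$x)), sort(unique(grid$y)), m,
      theta = 30, phi = 25, expand = 0.6,
      xlab = "x", ylab = "y", zlab = "z")
```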

Applying a custom function repeatedly to same dataframe using purrr

Suppose I have a dataframe as follows:
df <- data.frame(
alpha = 0:20,
beta = 30:50,
gamma = 100:120
)
I have a custom function that makes new columns. (Note, my actual function is a lot more complex and can't be vectorized without a custom function, so please ignore the substance of the transformation here.) For example:
newfun <- function(var = NULL) {
  newname <- paste0(var, "NEW")
  df[[newname]] <- df[[var]] / 100
  return(df)
}
I want to apply this over many columns of the dataset repeatedly and have the dataset "build up." This happens just fine when I do the following:
df <- newfun("alpha")
df <- newfun("beta")
df <- newfun("gamma")
Obviously this is redundant and a case for map. But when I do the following I get back a list of dataframes, which is not what I want:
df <- data.frame(
alpha = 0:20,
beta = 30:50,
gamma = 100:120
)
out <- c("alpha", "beta", "gamma") %>%
map(function(x) newfun(x))
How can I iterate over a vector of column names AND see the changes repeatedly applied to the same dataframe?
Writing the function so that it reaches outside of its own scope to find some df is risky and will bite you, especially when you see something like:
df[['a']] <- 2
# Error in df[["a"]] <- 2 : object of type 'closure' is not subsettable
You will get this error when it doesn't find your variable named df, and instead finds the base function named df. Two morals from this discovery:
While I admit to using df myself, it's generally bad practice to name variables the same as R functions (especially from base); and
Breaching scope is sloppy: it makes a workflow unreproducible and often makes problems or changes difficult to troubleshoot.
To remedy this, and since your function relies on knowing what the old/new variable names are or should be, I think pmap or base R Map may work better. Further, I suggest that you name the new variables outside of the function, making it "data-only".
myfunc <- function(x) x / 100
cols <- c("alpha", "beta", "gamma")  # columns to transform
dat <- df                            # the question's data frame, renamed
setNames(lapply(dat[, cols], myfunc), paste0("new", cols))
# $newalpha
# [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17
# [19] 0.18 0.19 0.20
# $newbeta
# [1] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45 0.46 0.47
# [19] 0.48 0.49 0.50
# $newgamma
# [1] 1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17
# [19] 1.18 1.19 1.20
From here, we just need to column-bind (cbind) it:
cbind(dat, setNames(lapply(dat[,cols], myfunc), paste0("new", cols)))
# alpha beta gamma newalpha newbeta newgamma
# 1 0 30 100 0.00 0.30 1.00
# 2 1 31 101 0.01 0.31 1.01
# 3 2 32 102 0.02 0.32 1.02
# 4 3 33 103 0.03 0.33 1.03
# 5 4 34 104 0.04 0.34 1.04
# ...
Special note: if you plan on doing this iteratively (repeatedly), it is generally bad to iteratively add rows to frames; while I know this is a bad idea for adding rows, I suspect (without proof at the moment) that doing the same with columns is also bad. For that reason, if you do this a lot, consider using do.call(cbind, c(list(dat), ...)) where ... is the list of things to add. This results in a single call to cbind and therefore only a single memory-copy of the original dat. (Contrast that with iteratively calling the *bind functions which make a complete copy with each pass, scaling poorly.)
additions <- lapply(1:3, function(i) setNames(lapply(dat[,cols], myfunc), paste0("new", i, cols)))
str(additions)
# List of 3
# $ :List of 3
# ..$ new1alpha: num [1:21] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
# ..$ new1beta : num [1:21] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 ...
# ..$ new1gamma: num [1:21] 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 ...
# $ :List of 3
# ..$ new2alpha: num [1:21] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
# ..$ new2beta : num [1:21] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 ...
# ..$ new2gamma: num [1:21] 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 ...
# $ :List of 3
# ..$ new3alpha: num [1:21] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
# ..$ new3beta : num [1:21] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 ...
# ..$ new3gamma: num [1:21] 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 ...
do.call(cbind, c(list(dat), additions))
# alpha beta gamma new1alpha new1beta new1gamma new2alpha new2beta new2gamma new3alpha new3beta new3gamma
# 1 0 30 100 0.00 0.30 1.00 0.00 0.30 1.00 0.00 0.30 1.00
# 2 1 31 101 0.01 0.31 1.01 0.01 0.31 1.01 0.01 0.31 1.01
# 3 2 32 102 0.02 0.32 1.02 0.02 0.32 1.02 0.02 0.32 1.02
# 4 3 33 103 0.03 0.33 1.03 0.03 0.33 1.03 0.03 0.33 1.03
# 5 4 34 104 0.04 0.34 1.04 0.04 0.34 1.04 0.04 0.34 1.04
# 6 5 35 105 0.05 0.35 1.05 0.05 0.35 1.05 0.05 0.35 1.05
# ...
An alternative approach is to change your function to only return a vector:
newfun2 <- function(var = NULL) {
  df[[var]] / 100
}
newfun2('alpha')
# [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13
#[15] 0.14 0.15 0.16 0.17 0.18 0.19 0.20
Then, using base R, you can use lapply() to loop over the column names:
cols <- c("alpha", "beta", "gamma")
df[, paste0(cols, 'NEW')] <- lapply(cols, newfun2)
#or
#df[, paste0(cols, 'NEW')] <- purrr::map(cols, newfun2)
df
alpha beta gamma alphaNEW betaNEW gammaNEW
1 0 30 100 0.00 0.30 1.00
2 1 31 101 0.01 0.31 1.01
3 2 32 102 0.02 0.32 1.02
4 3 33 103 0.03 0.33 1.03
5 4 34 104 0.04 0.34 1.04
6 5 35 105 0.05 0.35 1.05
7 6 36 106 0.06 0.36 1.06
8 7 37 107 0.07 0.37 1.07
9 8 38 108 0.08 0.38 1.08
10 9 39 109 0.09 0.39 1.09
11 10 40 110 0.10 0.40 1.10
12 11 41 111 0.11 0.41 1.11
13 12 42 112 0.12 0.42 1.12
14 13 43 113 0.13 0.43 1.13
15 14 44 114 0.14 0.44 1.14
16 15 45 115 0.15 0.45 1.15
17 16 46 116 0.16 0.46 1.16
18 17 47 117 0.17 0.47 1.17
19 18 48 118 0.18 0.48 1.18
20 19 49 119 0.19 0.49 1.19
21 20 50 120 0.20 0.50 1.20
Based on the way you wrote your function, a for loop that repeatedly assigns the result of newfun() back to df works pretty well.
vars <- names(df)
for (i in vars) {
  df <- newfun(i)
}
df
# alpha beta gamma alphaNEW betaNEW gammaNEW
# 1 0 30 100 0.00 0.30 1.00
# 2 1 31 101 0.01 0.31 1.01
# 3 2 32 102 0.02 0.32 1.02
# 4 3 33 103 0.03 0.33 1.03
# 5 4 34 104 0.04 0.34 1.04
# 6 5 35 105 0.05 0.35 1.05
# 7 6 36 106 0.06 0.36 1.06
# 8 7 37 107 0.07 0.37 1.07
# 9 8 38 108 0.08 0.38 1.08
# 10 9 39 109 0.09 0.39 1.09
# 11 10 40 110 0.10 0.40 1.10
# 12 11 41 111 0.11 0.41 1.11
# 13 12 42 112 0.12 0.42 1.12
# 14 13 43 113 0.13 0.43 1.13
# 15 14 44 114 0.14 0.44 1.14
# 16 15 45 115 0.15 0.45 1.15
# 17 16 46 116 0.16 0.46 1.16
# 18 17 47 117 0.17 0.47 1.17
# 19 18 48 118 0.18 0.48 1.18
# 20 19 49 119 0.19 0.49 1.19
# 21 20 50 120 0.20 0.50 1.20

Repeat data.frame N times with adding column

I have the following data frame and I want to repeat it N times
dc <- read.table(text = "from 1 2 3 4 5
1 0.01 0.02 0.03 0.04 0.05
2 0.06 0.07 0.08 0.09 0.10
3 0.11 0.12 0.13 0.14 0.15
4 0.16 0.17 0.18 0.19 0.20
5 0.21 0.22 0.23 0.24 0.25", header = TRUE)
n <- 20
ddr <- NA
for (i in 1:n) {
  ddr <- rbind(ddr, cbind(dc, i))
}
As a result, I would like to receive:
from X1 X2 X3 X4 X5 i
1 0.01 0.02 0.03 0.04 0.05 1
2 0.06 0.07 0.08 0.09 0.10 1
3 0.11 0.12 0.13 0.14 0.15 1
4 0.16 0.17 0.18 0.19 0.20 1
5 0.21 0.22 0.23 0.24 0.25 1
1 0.01 0.02 0.03 0.04 0.05 2
2 0.06 0.07 0.08 0.09 0.10 2
3 0.11 0.12 0.13 0.14 0.15 2
4 0.16 0.17 0.18 0.19 0.20 2
5 0.21 0.22 0.23 0.24 0.25 2
.............................
1 0.01 0.02 0.03 0.04 0.05 20
2 0.06 0.07 0.08 0.09 0.10 20
3 0.11 0.12 0.13 0.14 0.15 20
4 0.16 0.17 0.18 0.19 0.20 20
5 0.21 0.22 0.23 0.24 0.25 20
The data frame must be repeated N times, with the repeat number added as a column.
Is there a cleaner way (an existing function) to do this in R? In my case, if ddr is not declared first (ddr <- NA), the script does not work. Thanks!
You can use rep() to replicate the row indexes, and also to create the repeat number column.
cbind(dc[rep(1:nrow(dc), n), ], i = rep(1:n, each = nrow(dc)))
Let's break it down:
dc[rep(1:nrow(dc), n), ] uses replicated row indexes in the i value of row indexing of [ for data frames
rep(1:n, each = nrow(dc)) replicates a sequence the length of the n value nrow(dc) times each
cbind(...) combines the two into a single data frame
As #HubertL points out in the comments, this can be further simplified to
cbind(dc, i = rep(1:n, each = nrow(dc)))
thanks to the magic of recycling. Please go give him a vote.
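To see the recycling at work on a toy frame (not the question's data): data-frame cbind() repeats the 3-row frame to match the 6-element vector, because 6 is a multiple of 3.

```r
# cbind() on a data frame dispatches to data.frame(), which recycles the
# shorter argument: the 3 rows of dc repeat to align with the 6 values of i.
dc <- data.frame(from = 1:3)
out <- cbind(dc, i = rep(1:2, each = 3))
out  # 6 rows: from = 1,2,3,1,2,3 paired with i = 1,1,1,2,2,2
```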
Here is another intuitive way, nearly identical in speed to the other top answer:
n <- 3
data.frame(df, i = rep(1:n, each = NROW(df)))
Output (repeated 3x):
from X1 X2 X3 X4 X5 i
1 1 0.01 0.02 0.03 0.04 0.05 1
2 2 0.06 0.07 0.08 0.09 0.10 1
3 3 0.11 0.12 0.13 0.14 0.15 1
4 4 0.16 0.17 0.18 0.19 0.20 1
5 5 0.21 0.22 0.23 0.24 0.25 1
6 1 0.01 0.02 0.03 0.04 0.05 2
7 2 0.06 0.07 0.08 0.09 0.10 2
8 3 0.11 0.12 0.13 0.14 0.15 2
9 4 0.16 0.17 0.18 0.19 0.20 2
10 5 0.21 0.22 0.23 0.24 0.25 2
11 1 0.01 0.02 0.03 0.04 0.05 3
12 2 0.06 0.07 0.08 0.09 0.10 3
13 3 0.11 0.12 0.13 0.14 0.15 3
14 4 0.16 0.17 0.18 0.19 0.20 3
15 5 0.21 0.22 0.23 0.24 0.25 3
EDIT: Top Answer Speed Test
This test was scaled up to n=1e+05, iterations=100:
func1 <- function() {
  data.frame(df, i = rep(1:n, each = NROW(df)))
}
func2 <- function() {
  cbind(dc, i = rep(1:n, each = nrow(dc)))
}
func3 <- function() {
  cbind(dc[rep(1:nrow(dc), n), ], i = rep(1:n, each = nrow(dc)))
}
microbenchmark::microbenchmark(func1(), func2(), func3())
Unit: milliseconds
expr min lq mean median uq max neval cld
func1() 15.58709 21.69143 28.62695 22.01692 23.85648 117.9012 100 a
func2() 15.99023 21.59375 28.37328 22.18298 23.99953 136.1209 100 a
func3() 414.18741 436.51732 473.14571 453.26099 498.21576 666.8515 100 b
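If you prefer to keep the loop structure from the question, building a list and binding once (a sketch, shown on a trimmed version of dc) avoids both the ddr <- NA seed row and the repeated copying that growing a frame with rbind() inside a loop causes:

```r
# Build one copy of dc per repeat number in a list, then bind all of them
# in a single rbind() call.
dc <- read.table(text = "from 1 2 3 4 5
1 0.01 0.02 0.03 0.04 0.05
2 0.06 0.07 0.08 0.09 0.10", header = TRUE)
n <- 3
ddr <- do.call(rbind, lapply(1:n, function(i) cbind(dc, i)))
```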

R dividing dataset into ranged bins?

I am having some problems sorting my dataset into bins based on the numeric value of each data point. I tried doing it with the shingle function from the lattice package, which seems to split the data accurately.
However, I can't seem to extract the desired output, namely how the data is divided into the predefined bins; I can only print it.
bin_interval = matrix(c(0.38,0.42,0.46,0.50,0.54,0.58,0.62,0.66,0.70,0.74,0.78,0.82,0.86,0.90,0.94,0.98,
0.40,0.44,0.48,0.52,0.56,0.60,0.64,0.68,0.72,0.76,0.80,0.84,0.88,0.92,0.96,1.0),
ncol = 2, nrow = 16)
bin_1 = shingle(data_1,intervals = bin_interval)
How do I extract the intervals output by the shingle function, rather than only printing them?
the intervals being the output:
Intervals:
min max count
1 0.38 0.40 0
2 0.42 0.44 6
3 0.46 0.48 46
4 0.50 0.52 251
5 0.54 0.56 697
6 0.58 0.60 1062
7 0.62 0.64 1215
8 0.66 0.68 1227
9 0.70 0.72 1231
10 0.74 0.76 1293
11 0.78 0.80 1330
12 0.82 0.84 1739
13 0.86 0.88 2454
14 0.90 0.92 3048
15 0.94 0.96 8936
16 0.98 1.00 71446
As a variable that can be fed to another function.
The shingle() function stores this information as attributes, which you can inspect with attributes().
The levels specifically are given by attr(bin_1, "levels").
So:
set.seed(1337)
data_1 = runif(100)
bin_interval = matrix(c(0.38,0.42,0.46,0.50,0.54,0.58,0.62,0.66,0.70,0.74,0.78,0.82,0.86,0.90,0.94,0.98,
0.40,0.44,0.48,0.52,0.56,0.60,0.64,0.68,0.72,0.76,0.80,0.84,0.88,0.92,0.96,1.0),
ncol = 2, nrow = 16)
bin_1 = shingle(data_1,intervals = bin_interval)
attr(bin_1,"levels")
This gives:
[,1] [,2]
[1,] 0.38 0.40
[2,] 0.42 0.44
[3,] 0.46 0.48
[4,] 0.50 0.52
[5,] 0.54 0.56
[6,] 0.58 0.60
[7,] 0.62 0.64
[8,] 0.66 0.68
[9,] 0.70 0.72
[10,] 0.74 0.76
[11,] 0.78 0.80
[12,] 0.82 0.84
[13,] 0.86 0.88
[14,] 0.90 0.92
[15,] 0.94 0.96
[16,] 0.98 1.00
Edit
The count information for each interval is only computed within the print.shingle method. Thus, you would need to run the following code:
count.shingle <- function(x) {
  l <- levels(x)
  n <- nlevels(x)
  int <- data.frame(min = numeric(n), max = numeric(n), count = numeric(n))
  for (i in 1:n) {
    int$min[i]   <- l[[i]][1]
    int$max[i]   <- l[[i]][2]
    int$count[i] <- length(x[x >= l[[i]][1] & x <= l[[i]][2]])
  }
  int
}
a <- count.shingle(bin_1)
This gives:
> a
min max count
1 0.38 0.40 0
2 0.42 0.44 1
3 0.46 0.48 3
4 0.50 0.52 1
5 0.54 0.56 2
6 0.58 0.60 2
7 0.62 0.64 2
8 0.66 0.68 4
9 0.70 0.72 1
10 0.74 0.76 3
11 0.78 0.80 2
12 0.82 0.84 2
13 0.86 0.88 5
14 0.90 0.92 1
15 0.94 0.96 1
16 0.98 1.00 2
where a$min is lower range, a$max is upper range, and a$count is the number within the bins.
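If the bins do not actually need to overlap or have gaps, base cut() plus table() yields the counts directly as an ordinary object, with no shingle printing involved. A sketch with hypothetical uniform-width bins covering the same range:

```r
# Bin data into contiguous 0.02-wide intervals and count per bin; values
# below the lowest break fall outside all bins and are dropped by table().
set.seed(1337)
data_1 <- runif(100)
breaks <- seq(0.38, 1.00, by = 0.02)
counts <- table(cut(data_1, breaks = breaks, include.lowest = TRUE))
head(counts, 3)
```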
