Merging two columns in R considering other corresponding columns?

I am using R and have imported data from two Excel sheets containing three columns each. The first matrix has 3 columns and 380 rows; the second has 3 columns and 365 rows. In each matrix, columns 2 and 3 hold values that belong to the entry in column 1. I would like to merge the first columns of the two matrices into a single column, sorted in ascending order, so that a value present in both matrices appears only once (not in two consecutive rows). The main condition is that columns 2 and 3 of each matrix are rearranged to follow their column 1 values but are not merged with each other. Where a value in the merged first column has no corresponding entry in one of the matrices, the missing cells should be filled with zero. I have managed the merging and sorting of the first column, but I cannot make the corresponding changes in the other columns. How should I go about this?
Here are the two matrices:
Matrix A
92.6691 1076.5 0.48
93.324 1110.1 0.5
96.9597 1123.3 0.5
97.7539 968.4 0.43
98.992 1006.1 0.45
99.0061 5584.6 2.49
101.0243 1555.7 0.69
101.0606 12821.2 5.72
102.1221 972 0.43
Matrix B
95.4466 974.2 0.43
99.0062 4721.9 2.06
100.0321 1040.1 0.45
101.0241 2115.8 0.92
101.0606 15202.8 6.64
102.2736 945.3 0.41
108.4273 1059.7 0.46
115.0397 25106.3 10.96
115.0761 54740 23.9
After merging, the results should be a single matrix:
Column 1 - Merged 1st columns of matrices A and B (ascending order)
Column 2 - Rearranged based on change in row positions of column 1 in matrix A
Column 3 - Rearranged based on change in row positions of column 1 in matrix A
Column 4 - Rearranged based on change in row positions of column 1 in matrix B
Column 5 - Rearranged based on change in row positions of column 1 in matrix B
Here is the resulting matrix:
92.6691 1076.5 0.48 0 0
93.324 1110.1 0.5 0 0
95.4466 0 0 974.2 0.43
96.9597 1123.3 0.5 0 0
97.7539 968.4 0.43 0 0
98.992 1006.1 0.45 0 0
99.0061 5584.6 2.49 0 0
99.0062 0 0 4721.9 2.06
100.0321 0 0 1040.1 0.45
101.0241 0 0 2115.8 0.92
101.0243 1555.7 0.69 0 0
101.0606 12821.2 5.72 15202.8 6.64
102.1221 972 0.43 0 0
102.2736 0 0 945.3 0.41
108.4273 0 0 1059.7 0.46
115.0397 0 0 25106.3 10.96
115.0761 0 0 54740 23.9
Note that in matrices A and B, the value 101.0606 is common.

This can be done easily with merge().
# read your data:
read.table(
text = "92.6691 1076.5 0.48
93.324 1110.1 0.5
96.9597 1123.3 0.5
97.7539 968.4 0.43
98.992 1006.1 0.45
99.0061 5584.6 2.49
101.0243 1555.7 0.69
101.0606 12821.2 5.72
102.1221 972 0.43") -> M1
read.table(
text = "95.4466 974.2 0.43
99.0062 4721.9 2.06
100.0321 1040.1 0.45
101.0241 2115.8 0.92
101.0606 15202.8 6.64
102.2736 945.3 0.41
108.4273 1059.7 0.46
115.0397 25106.3 10.96
115.0761 54740 23.90") -> M2
# merge data -- note `all = TRUE`
result <- merge(M1,M2,by = "V1", all = TRUE)
# replace na with 0
result[is.na(result)] <- 0
result
# V1 V2.x V3.x V2.y V3.y
# 1 92.67 1076.5 0.48 0.0 0.00
# 2 93.32 1110.1 0.50 0.0 0.00
# 3 95.45 0.0 0.00 974.2 0.43
# 4 96.96 1123.3 0.50 0.0 0.00
# 5 97.75 968.4 0.43 0.0 0.00
# 6 98.99 1006.1 0.45 0.0 0.00
# 7 99.01 5584.6 2.49 0.0 0.00
# 8 99.01 0.0 0.00 4721.9 2.06
# 9 100.03 0.0 0.00 1040.1 0.45
# 10 101.02 0.0 0.00 2115.8 0.92
# 11 101.02 1555.7 0.69 0.0 0.00
# 12 101.06 12821.2 5.72 15202.8 6.64
# 13 102.12 972.0 0.43 0.0 0.00
# 14 102.27 0.0 0.00 945.3 0.41
# 15 108.43 0.0 0.00 1059.7 0.46
# 16 115.04 0.0 0.00 25106.3 10.96
# 17 115.08 0.0 0.00 54740.0 23.90
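As a smaller variation on the same idea, here is a sketch on a five-row excerpt of the question's data. The names A, B and res and the suffixes ".A" and ".B" are my own choices for illustration; suffixes is a regular merge() argument that labels which matrix each value column came from. One assumption to be aware of: merge() matches numeric keys on exact equality, so 99.0061 and 99.0062 stay in separate rows, and values imported from Excel that differ in hidden digits will not be matched unless you round the key first.

```r
# Excerpt of the question's matrices A and B (three rows each)
A <- data.frame(V1 = c(99.0061, 101.0243, 101.0606),
                V2 = c(5584.6, 1555.7, 12821.2),
                V3 = c(2.49, 0.69, 5.72))
B <- data.frame(V1 = c(99.0062, 101.0241, 101.0606),
                V2 = c(4721.9, 2115.8, 15202.8),
                V3 = c(2.06, 0.92, 6.64))

# Full outer join on the first column; the result is sorted by the key.
# The suffixes mark the origin of the non-key columns.
res <- merge(A, B, by = "V1", all = TRUE, suffixes = c(".A", ".B"))

# Cells with no counterpart in the other matrix come back as NA; set them to 0
res[is.na(res)] <- 0
res
```

Only 101.0606 occurs in both excerpts, so it is the only row with non-zero values on both sides.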

df3 <- merge(df1, df2, by = "x", all = TRUE)  # full outer join on x
df3[is.na(df3)] <- 0
x a b c d
1 92.6691 1076.5 0.48 0.0 0.00
2 93.3240 1110.1 0.50 0.0 0.00
3 95.4466 0.0 0.00 974.2 0.43
4 96.9597 1123.3 0.50 0.0 0.00
5 97.7539 968.4 0.43 0.0 0.00
6 98.9920 1006.1 0.45 0.0 0.00
7 99.0061 5584.6 2.49 0.0 0.00
8 99.0062 0.0 0.00 4721.9 2.06
9 100.0321 0.0 0.00 1040.1 0.45
10 101.0241 0.0 0.00 2115.8 0.92
11 101.0243 1555.7 0.69 0.0 0.00
12 101.0606 12821.2 5.72 15202.8 6.64
13 102.1221 972.0 0.43 0.0 0.00
14 102.2736 0.0 0.00 945.3 0.41
15 108.4273 0.0 0.00 1059.7 0.46
16 115.0397 0.0 0.00 25106.3 10.96
17 115.0761 0.0 0.00 54740.0 23.90
data
df1
x a b
92.6691 1076.5 0.48
93.324 1110.1 0.5
96.9597 1123.3 0.5
97.7539 968.4 0.43
98.992 1006.1 0.45
99.0061 5584.6 2.49
101.0243 1555.7 0.69
101.0606 12821.2 5.72
102.1221 972 0.43
df2
x c d
95.4466 974.2 0.43
99.0062 4721.9 2.06
100.0321 1040.1 0.45
101.0241 2115.8 0.92
101.0606 15202.8 6.64
102.2736 945.3 0.41
108.4273 1059.7 0.46
115.0397 25106.3 10.96
115.0761 54740 23.9

I generated some data myself; you can replace it with yours. Here you need to combine the two data sets twice: first vertically (row-bind), then horizontally (merge on the first column). Finally, order the result by the first column.
set.seed(42)
# Load data 1
dat1<- as.data.frame(matrix(rexp(30), 10))
# Only keep unique rows
dat1 <- unique(dat1)
set.seed(24)
# Load data 2
dat2 <-as.data.frame(matrix(rexp(30), 10))
# Only keep unique rows
dat2 <- unique(dat2)
# Copy it in temp
dat2n <-dat2
# Set second and third columns to 0
dat2n[,2:3] <- 0
# Concatenate them row-wise
dat <- rbind(dat1,dat2n)
# Merge dat and dat2 with respect to column 1 and keep everything in dat
fin.dat <- merge(dat, dat2, by="V1", all.x = TRUE)
# Finally order the dataframe
fin.dat <- fin.dat[order(fin.dat[,1], decreasing = FALSE),]
# Replace NA with zeros
fin.dat[is.na(fin.dat)] <- 0

Related

R: need help matching up table rows and getting differences

I have chromatographic data in a table organized by peak position and integration value of various samples. All samples in the table have a repeated measurement as well with a different sample log number.
What I'm interested in is the repeatability of the measurements of the various peaks. The measure for that would be the difference in peak integration between the original and the repeat measurement, which ideally is 0 for each sample.
The data
Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03
where Log1 is the original sample log number and Log2 is the repeat log number.
How can I construct a new variable for every peak (being the difference PeakX_Log1 - PeakX_Log2)?
Mind that in my example I only have 5 peaks. The real-life situation is a complex mixture involving >20 peaks, so very hard to do it by hand.
If you will only have two values for each sample, something like this could work:
df <- data.table::fread(
"Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03"
)
library(tidyverse)
new_df <- df %>%
mutate(Log = ifelse(Log1 < Log2,"Log1","Log2")) %>%
select(-Log1,-Log2) %>%
pivot_longer(cols = starts_with("Peak"),names_to = "Peak") %>%
pivot_wider(values_from = value, names_from = Log) %>%
mutate(Variation = Log1 - Log2)
new_df
# A tibble: 15 × 5
Sample Peak Log1 Log2 Variation
<chr> <chr> <dbl> <dbl> <dbl>
1 A Peak1 0.2 0.21 -0.0100
2 A Peak2 0.8 0.7 0.100
3 A Peak3 0.3 0.33 -0.0300
4 A Peak4 0 0 0
5 A Peak5 0 0.1 -0.1
6 B Peak1 0.25 0.2 0.05
7 B Peak2 0.73 0.73 0
8 B Peak3 0.29 0.37 -0.08
9 B Peak4 0.01 0 0.01
10 B Peak5 0.04 0.03 0.01
11 C Peak1 0.2 0.22 -0.0200
12 C Peak2 0.8 0.81 -0.0100
13 C Peak3 0.3 0.31 -0.0100
14 C Peak4 0 0.04 -0.04
15 C Peak5 0.07 0 0.07
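If you would rather stay in base R and keep one row per sample, a sketch along the following lines might also work. It assumes, as above, that every sample has exactly one repeat and that each Log2 value appears exactly once in Log1. The names peaks, orig, rep_idx, diffs and wide are mine, not from the question.

```r
df <- read.table(header = TRUE, text = "Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03")

# All peak columns, however many there are
peaks <- grep("^Peak", names(df), value = TRUE)

# Treat the row with the lower log number as the original measurement
orig <- df[df$Log1 < df$Log2, ]

# For each original, find the row whose Log1 equals the original's Log2
rep_idx <- match(orig$Log2, df$Log1)

# Original minus repeat, for every peak at once
diffs <- orig[peaks] - df[rep_idx, peaks]
names(diffs) <- paste0(peaks, "_diff")

wide <- cbind(orig["Sample"], diffs)
wide
```

This gives one difference column per peak, so it scales to 20 or more peaks without changes.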

Applying a custom function repeatedly to same dataframe using purrr

Suppose I have a dataframe as follows:
df <- data.frame(
alpha = 0:20,
beta = 30:50,
gamma = 100:120
)
I have a custom function that makes new columns. (Note, my actual function is a lot more complex and can't be vectorized without a custom function, so please ignore the substance of the transformation here.) For example:
newfun <- function(var = NULL) {
newname <- paste0(var, "NEW")
df[[newname]] <- df[[var]]/100
return(df)
}
I want to apply this over many columns of the dataset repeatedly and have the dataset "build up." This happens just fine when I do the following:
df <- newfun("alpha")
df <- newfun("beta")
df <- newfun("gamma")
Obviously this is redundant and a case for map. But when I do the following I get back a list of dataframes, which is not what I want:
df <- data.frame(
alpha = 0:20,
beta = 30:50,
gamma = 100:120
)
out <- c("alpha", "beta", "gamma") %>%
map(function(x) newfun(x))
How can I iterate over a vector of column names AND see the changes repeatedly applied to the same dataframe?
Writing the function to reach outside of its scope to find some df is risky and will eventually bite you, especially when you see something like:
df[['a']] <- 2
# Error in df[["a"]] <- 2 : object of type 'closure' is not subsettable
You get this error when R does not find your variable named df and instead finds the function named df (the F-distribution density from the stats package). Two morals from this discovery:
While I admit to using df myself, it's generally bad practice to give variables the same names as R functions (especially ones that ship with R); and
Scope-breach is sloppy: it makes a workflow hard to reproduce and makes problems or changes hard to troubleshoot.
To remedy this, and since your function relies on knowing what the old/new variable names are or should be, I think pmap or base R Map may work better. Further, I suggest that you name the new variables outside of the function, making it "data-only".
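# dat is the question's data frame (renamed so it does not mask the function df)
cols <- c("alpha", "beta", "gamma")
myfunc <- function(x) x/100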
setNames(lapply(dat[,cols], myfunc), paste0("new", cols))
# $newalpha
# [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17
# [19] 0.18 0.19 0.20
# $newbeta
# [1] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45 0.46 0.47
# [19] 0.48 0.49 0.50
# $newgamma
# [1] 1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17
# [19] 1.18 1.19 1.20
From here, we just need to column-bind (cbind) it:
cbind(dat, setNames(lapply(dat[,cols], myfunc), paste0("new", cols)))
# alpha beta gamma newalpha newbeta newgamma
# 1 0 30 100 0.00 0.30 1.00
# 2 1 31 101 0.01 0.31 1.01
# 3 2 32 102 0.02 0.32 1.02
# 4 3 33 103 0.03 0.33 1.03
# 5 4 34 104 0.04 0.34 1.04
# ...
Special note: if you plan on doing this iteratively (repeatedly), be aware that growing a frame piece by piece is generally a bad idea. I know this is true for adding rows, and I suspect (without proof at the moment) that the same holds for columns. For that reason, if you do this a lot, consider using do.call(cbind, c(list(dat), ...)) where ... is the list of things to add. This results in a single call to cbind and therefore only a single memory-copy of the original dat. (Contrast that with iteratively calling the *bind functions, which make a complete copy with each pass and scale poorly.)
additions <- lapply(1:3, function(i) setNames(lapply(dat[,cols], myfunc), paste0("new", i, cols)))
str(additions)
# List of 3
# $ :List of 3
# ..$ new1alpha: num [1:21] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
# ..$ new1beta : num [1:21] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 ...
# ..$ new1gamma: num [1:21] 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 ...
# $ :List of 3
# ..$ new2alpha: num [1:21] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
# ..$ new2beta : num [1:21] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 ...
# ..$ new2gamma: num [1:21] 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 ...
# $ :List of 3
# ..$ new3alpha: num [1:21] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
# ..$ new3beta : num [1:21] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 ...
# ..$ new3gamma: num [1:21] 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 ...
do.call(cbind, c(list(dat), additions))
# alpha beta gamma new1alpha new1beta new1gamma new2alpha new2beta new2gamma new3alpha new3beta new3gamma
# 1 0 30 100 0.00 0.30 1.00 0.00 0.30 1.00 0.00 0.30 1.00
# 2 1 31 101 0.01 0.31 1.01 0.01 0.31 1.01 0.01 0.31 1.01
# 3 2 32 102 0.02 0.32 1.02 0.02 0.32 1.02 0.02 0.32 1.02
# 4 3 33 103 0.03 0.33 1.03 0.03 0.33 1.03 0.03 0.33 1.03
# 5 4 34 104 0.04 0.34 1.04 0.04 0.34 1.04 0.04 0.34 1.04
# 6 5 35 105 0.05 0.35 1.05 0.05 0.35 1.05 0.05 0.35 1.05
# ...
An alternative approach is to change your function to only return a vector:
newfun2 <- function(var = NULL) {
df[[var]] / 100
}
newfun2('alpha')
# [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13
#[15] 0.14 0.15 0.16 0.17 0.18 0.19 0.20
Then, using base R, you can use lapply() to loop over your vector of column names and assign all the new columns at once:
cols <- c("alpha", "beta", "gamma")
df[, paste0(cols, 'NEW')] <- lapply(cols, newfun2)
#or
#df[, paste0(cols, 'NEW')] <- purrr::map(cols, newfun2)
df
alpha beta gamma alphaNEW betaNEW gammaNEW
1 0 30 100 0.00 0.30 1.00
2 1 31 101 0.01 0.31 1.01
3 2 32 102 0.02 0.32 1.02
4 3 33 103 0.03 0.33 1.03
5 4 34 104 0.04 0.34 1.04
6 5 35 105 0.05 0.35 1.05
7 6 36 106 0.06 0.36 1.06
8 7 37 107 0.07 0.37 1.07
9 8 38 108 0.08 0.38 1.08
10 9 39 109 0.09 0.39 1.09
11 10 40 110 0.10 0.40 1.10
12 11 41 111 0.11 0.41 1.11
13 12 42 112 0.12 0.42 1.12
14 13 43 113 0.13 0.43 1.13
15 14 44 114 0.14 0.44 1.14
16 15 45 115 0.15 0.45 1.15
17 16 46 116 0.16 0.46 1.16
18 17 47 117 0.17 0.47 1.17
19 18 48 118 0.18 0.48 1.18
20 19 49 119 0.19 0.49 1.19
21 20 50 120 0.20 0.50 1.20
Based on the way you wrote your function, a for loop that repeatedly assigns the result of newfun back to df works pretty well.
vars <- names(df)
for (i in vars){
df <- newfun(i)
}
df
# alpha beta gamma alphaNEW betaNEW gammaNEW
# 1 0 30 100 0.00 0.30 1.00
# 2 1 31 101 0.01 0.31 1.01
# 3 2 32 102 0.02 0.32 1.02
# 4 3 33 103 0.03 0.33 1.03
# 5 4 34 104 0.04 0.34 1.04
# 6 5 35 105 0.05 0.35 1.05
# 7 6 36 106 0.06 0.36 1.06
# 8 7 37 107 0.07 0.37 1.07
# 9 8 38 108 0.08 0.38 1.08
# 10 9 39 109 0.09 0.39 1.09
# 11 10 40 110 0.10 0.40 1.10
# 12 11 41 111 0.11 0.41 1.11
# 13 12 42 112 0.12 0.42 1.12
# 14 13 43 113 0.13 0.43 1.13
# 15 14 44 114 0.14 0.44 1.14
# 16 15 45 115 0.15 0.45 1.15
# 17 16 46 116 0.16 0.46 1.16
# 18 17 47 117 0.17 0.47 1.17
# 19 18 48 118 0.18 0.48 1.18
# 20 19 49 119 0.19 0.49 1.19
# 21 20 50 120 0.20 0.50 1.20
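Another way to get the "build up" behaviour is a fold: start from the data frame and feed the result of each call into the next one. Below is a minimal sketch with base R's Reduce(). It assumes you are free to change the function so that it takes the data frame as its first argument; add_scaled and dat are hypothetical names of mine, and the division by 100 is just the placeholder transformation from the question.

```r
dat <- data.frame(
  alpha = 0:20,
  beta = 30:50,
  gamma = 100:120
)

# Takes the data frame explicitly instead of looking it up in the global environment
add_scaled <- function(data, var) {
  data[[paste0(var, "NEW")]] <- data[[var]] / 100
  data
}

cols <- c("alpha", "beta", "gamma")

# Reduce() calls add_scaled(dat, "alpha"), then passes that result on with
# "beta", and so on, returning the final data frame
out <- Reduce(add_scaled, cols, dat)
head(out)
```

If you prefer purrr, purrr::reduce(cols, add_scaled, .init = dat) should express the same fold. Either way the original dat is left untouched.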

Cumulative sum based on factor on R

I have the following dataset. I need to accumulate the values while the factor is 0, and then put the accumulated sum in the row where I find a factor != 0.
I've tried the loop below, but it didn't work at all.
for(i in dataset$Variable.1) {
ifelse(dataset$Factor == 0,
dataset$teste <- dataset$Variable.1 + i,
dataset$teste <- dataset$Variable.1)
i<- dataset$Variable.1
print(i)
}
Any ideas?
Below is an example of the dataset. I wish to get the "Result" column.
On the real one, I also have a negative factor (-1).
Date Factor Variable.1 Result
1 03/02/2018 0 0.75 0.75
2 04/02/2018 0 0.75 1.50
3 05/02/2018 1 0.96 2.46
4 06/02/2018 1 0.76 0.76
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.70 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0.00 0.00
9 11/02/2018 0 0.00 0.00
10 12/02/2018 0 0.20 0.20
11 13/02/2018 0 0.13 0.33
12 14/02/2018 0 1.64 1.97
13 15/02/2018 0 0.03 2.00
14 16/02/2018 1 0.51 2.51
15 17/02/2018 1 0.00 0.00
16 18/02/2018 0 0.00 0.00
17 19/02/2018 0 0.83 0.83
18 20/02/2018 1 0.42 1.25
19 21/02/2018 1 0.17 0.17
20 22/02/2018 1 0.97 0.97
21 23/02/2018 0 0.92 0.92
22 24/02/2018 0 0.00 0.92
23 25/02/2018 0 0.00 0.92
24 26/02/2018 1 0.19 1.11
25 27/02/2018 1 0.87 0.87
26 28/02/2018 1 0.85 0.85
27 01/03/2018 1 1.95 1.95
28 02/03/2018 1 0.54 0.54
29 03/03/2018 1 0.00 0.00
30 04/03/2018 0 0.00 0.00
31 05/03/2018 0 1.17 1.17
32 06/03/2018 1 0.25 1.42
33 07/03/2018 1 1.45 1.45
Thanks In advance.
If you want to stick with the for-loop, you can try this code :
DF$Result <- NA
prev <- 0
for(i in seq_len(nrow(DF))){
DF$Result[i] <- DF$Variable.1[i] + prev
if(DF$Factor[i] == 1)
prev <- 0
else
prev <- DF$Result[i]
}
Iteratively, try something like:
a=as.data.frame(cbind(Factor=c(0,0,1,1,0,1,1,
rep(0,3),1),Variable.1=c(0.75,0.75,0.96,0.71,1.35,0.7,
0.75,0.96,0.71,1.35,0.7)))
Result=0
aux=NULL
for (i in 1:nrow(a)){
if (a$Factor[i]==0){
Result=Result+a$Variable.1[i]
aux=c(aux,Result)
} else{
Result=Result+a$Variable.1[i]
aux=c(aux,Result)
Result=0
}
}
a$Results=aux
a
Factor Variable.1 Results
1 0 0.75 0.75
2 0 0.75 1.50
3 1 0.96 2.46
4 1 0.71 0.71
5 0 1.35 1.35
6 1 0.70 2.05
7 1 0.75 0.75
8 0 0.96 0.96
9 0 0.71 1.67
10 0 1.35 3.02
11 1 0.70 3.72
A possibility using tidyverse and data.table:
library(dplyr)
library(data.table)  # for rleid()
df %>%
mutate(temp = ifelse(Factor == 1 & lag(Factor) == 1, NA, 1), #Marking the rows after the first 1 in "Factor" as NA
temp = ifelse(!is.na(temp), rleid(temp), NA)) %>% #Run length along non-NA values
group_by(temp) %>% #Grouping by run length
mutate(Result = ifelse(!is.na(temp), cumsum(Variable.1), Variable.1)) %>% #Cumulative sum of desired rows
ungroup() %>%
select(-temp) #Removing the redundant variable
Date Factor Variable.1 Result
<chr> <int> <dbl> <dbl>
1 03/02/2018 0 0.750 0.750
2 04/02/2018 0 0.750 1.50
3 05/02/2018 1 0.960 2.46
4 06/02/2018 1 0.760 0.760
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.700 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0. 0.
9 11/02/2018 0 0. 0.
10 12/02/2018 0 0.200 0.200
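For completeness, the same reset logic can probably be written without a loop in base R. The sketch below assumes that every row with Factor != 0 closes the current run (so it would treat a factor of -1 like a factor of 1, which is my reading of the question, not something the question confirms). It uses the first seven rows of the example; closes and grp are names I made up.

```r
DF <- data.frame(
  Factor     = c(0, 0, 1, 1, 0, 1, 1),
  Variable.1 = c(0.75, 0.75, 0.96, 0.76, 1.35, 0.70, 2.02)
)

# TRUE where a row ends the current run
closes <- DF$Factor != 0

# A new group starts on the row after each closing row
grp <- cumsum(c(FALSE, head(closes, -1)))

# Cumulative sum within each group
DF$Result <- ave(DF$Variable.1, grp, FUN = cumsum)
DF
```

On these seven rows this reproduces the Result column from the question (0.75, 1.50, 2.46, 0.76, 1.35, 2.05, 2.02).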

R dividing dataset into ranged bins?

I am having some problems sorting my dataset into bins based on the numeric value of the data. I tried doing it with the function shingle from the lattice package, which seems to split it accurately.
However, I can't extract the desired output, namely how the data is divided into the predefined bins. I only seem to be able to print it.
bin_interval = matrix(c(0.38,0.42,0.46,0.50,0.54,0.58,0.62,0.66,0.70,0.74,0.78,0.82,0.86,0.90,0.94,0.98,
0.40,0.44,0.48,0.52,0.56,0.60,0.64,0.68,0.72,0.76,0.80,0.84,0.88,0.92,0.96,1.0),
ncol = 2, nrow = 16)
bin_1 = shingle(data_1,intervals = bin_interval)
How do I extract the intervals that the shingle function outputs, rather than only printing them?
The intervals are this output:
Intervals:
min max count
1 0.38 0.40 0
2 0.42 0.44 6
3 0.46 0.48 46
4 0.50 0.52 251
5 0.54 0.56 697
6 0.58 0.60 1062
7 0.62 0.64 1215
8 0.66 0.68 1227
9 0.70 0.72 1231
10 0.74 0.76 1293
11 0.78 0.80 1330
12 0.82 0.84 1739
13 0.86 0.88 2454
14 0.90 0.92 3048
15 0.94 0.96 8936
16 0.98 1.00 71446
I would like them as a variable that can be fed to another function.
The shingle() function stores the intervals as an attribute of the object it returns.
Specifically, they can be retrieved with attr(bin_1,"levels").
So:
library(lattice)  # provides shingle()
set.seed(1337)
data_1 = runif(100)
bin_interval = matrix(c(0.38,0.42,0.46,0.50,0.54,0.58,0.62,0.66,0.70,0.74,0.78,0.82,0.86,0.90,0.94,0.98,
0.40,0.44,0.48,0.52,0.56,0.60,0.64,0.68,0.72,0.76,0.80,0.84,0.88,0.92,0.96,1.0),
ncol = 2, nrow = 16)
bin_1 = shingle(data_1,intervals = bin_interval)
attr(bin_1,"levels")
This gives:
[,1] [,2]
[1,] 0.38 0.40
[2,] 0.42 0.44
[3,] 0.46 0.48
[4,] 0.50 0.52
[5,] 0.54 0.56
[6,] 0.58 0.60
[7,] 0.62 0.64
[8,] 0.66 0.68
[9,] 0.70 0.72
[10,] 0.74 0.76
[11,] 0.78 0.80
[12,] 0.82 0.84
[13,] 0.86 0.88
[14,] 0.90 0.92
[15,] 0.94 0.96
[16,] 0.98 1.00
Edit
The count information for each interval is only computed within the print.shingle method. Thus, you would need to run the following code:
count.shingle = function(x){
l <- levels(x)
n <- nlevels(x)
int <- data.frame(min = numeric(n), max = numeric(n),
count = numeric(n))
for (i in 1:n) {
int$min[i] <- l[[i]][1]
int$max[i] <- l[[i]][2]
int$count[i] <- length(x[x >= l[[i]][1] & x <= l[[i]][2]])
}
int
}
a = count.shingle(bin_1)
This gives:
> a
min max count
1 0.38 0.40 0
2 0.42 0.44 1
3 0.46 0.48 3
4 0.50 0.52 1
5 0.54 0.56 2
6 0.58 0.60 2
7 0.62 0.64 2
8 0.66 0.68 4
9 0.70 0.72 1
10 0.74 0.76 3
11 0.78 0.80 2
12 0.82 0.84 2
13 0.86 0.88 5
14 0.90 0.92 1
15 0.94 0.96 1
16 0.98 1.00 2
where a$min is lower range, a$max is upper range, and a$count is the number within the bins.
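If all you need is the table of counts, you may not need a shingle object at all. The following is a rough sketch in plain base R that counts the values falling into each closed interval [min, max]; lower, upper and bins are names I chose, and runif() data stands in for the real data_1. I have not checked it against print.shingle, so treat agreement of the counts at interval boundaries as an assumption.

```r
set.seed(1337)
x <- runif(100)  # stand-in for data_1

# The 16 intervals from the question: 0.38-0.40, 0.42-0.44, ..., 0.98-1.00
lower <- 0.38 + 0.04 * (0:15)
upper <- lower + 0.02

# One row per interval, with the number of values inside it
bins <- data.frame(
  min   = lower,
  max   = upper,
  count = vapply(
    seq_along(lower),
    function(i) sum(x >= lower[i] & x <= upper[i]),
    integer(1)
  )
)
bins
```

Note that the intervals leave gaps (for example between 0.40 and 0.42), so the counts do not have to add up to the number of observations.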

How to count these transitions - in R

Given a table of values, where A = state of system, B = length of state, and C = cumulative length of states:
A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46
How would I use R to calculate the number of transitions (where A gives the state) per constant interval length - where the intervals are consecutive and could be any arbitrary number (I chose a value of 2 in my image example)? For example, using the table values or the image included we count 2 transitions from 0-2, 3 transitions from greater than 2-4, 3 transitions from >4-6, etc.
This is straightforward in R. All you need is column C and ?cut. Consider:
d <- read.table(text="A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46", header=TRUE)
fi <- cut(d$C, breaks=seq(from=0, to=14, by=2))
table(fi)
# fi
# (0,2] (2,4] (4,6] (6,8] (8,10] (10,12] (12,14]
# 2 3 3 5 3 3 1
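Since the interval length is meant to be arbitrary, the same idea can be wrapped in a small helper. This is only a sketch: count_per_interval is a hypothetical name, and it assumes the intervals start at 0 and are closed on the right, as in the cut() call above.

```r
# Count how many state changes (cumulative end points) fall into
# consecutive intervals of a given width, starting at 0
count_per_interval <- function(cum_len, width) {
  # Round the upper limit up to the next multiple of width
  upper <- ceiling(max(cum_len) / width) * width
  breaks <- seq(from = 0, to = upper, by = width)
  table(cut(cum_len, breaks = breaks))
}

# Column C from the question
C <- c(1.16, 1.67, 2.84, 3.10, 3.69, 4.08, 4.85, 5.75, 6.53, 6.79,
       6.91, 7.42, 7.69, 8.20, 8.59, 9.10, 10.26, 11.36, 11.95, 12.46)

counts <- count_per_interval(C, width = 2)
counts
```

With width = 2 this gives the same counts as the table above; other widths only require changing the second argument.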
