I have a square matrix (stored here as a data frame) that looks like this:
A <- c("111","111","111","112","112","113")
B <- c(100,10,20,NA,NA,10)
C <- c(10,20,40,NA,10,20)
D <- c(10,20,NA,NA,40,200)
E <- c(20,20,40,10,10,20)
F <- c(NA,NA,40,100,10,20)
G <- c(10,20,NA,30,10,20)
df <- data.frame(A,B,C,D,E,F,G)
names(df) <- c("Codes","111","111","111","112","112","113")
# Codes 111 111 111 112 112 113
# 1 111 100 10 10 20 NA 10
# 2 111 10 20 20 20 NA 20
# 3 111 20 40 NA 40 40 NA
# 4 112 NA NA NA 10 100 30
# 5 112 NA 10 40 10 10 10
# 6 113 10 20 200 20 20 20
I want to reduce it so that observations with the same row and column names are summed up.
So I want to end up with:
# Codes 111 112 113
# 1 111 230 120 30
# 2 112 50 130 40
# 3 113 230 40 20
I tried to first combine the rows with the same "Codes" number, but I was having a lot of trouble.
In tidyverse
library(tidyverse)
df %>%
pivot_longer(-Codes, values_drop_na = TRUE) %>%
group_by(Codes, name) %>%
summarise(value = sum(value), .groups = 'drop')%>%
pivot_wider()
# A tibble: 3 x 4
Codes `111` `112` `113`
<chr> <dbl> <dbl> <dbl>
1 111 230 120 30
2 112 50 130 40
3 113 230 40 20
One way in base R:
tapply(unlist(df[-1]), list(names(df)[-1][col(df[-1])], df[,1][row(df[-1])]), sum, na.rm = TRUE)
111 112 113
111 230 50 230
112 120 130 40
113 30 40 20
Note that this result is the transpose of the desired table (wrap the call in t() to flip it). As noted by @thelatemail, the call can also be simplified to the following, which returns the codes as rows directly:
grp <- expand.grid(df$Codes, names(df)[-1])
tapply(unlist(df[-1]), grp, FUN=sum, na.rm=TRUE)
You can also use `xtabs()`:
xtabs(vals~., na.omit(cbind(grp, vals = unlist(df[-1]))))
Var2
Var1 111 112 113
111 230 120 30
112 50 130 40
113 230 40 20
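xtabs() returns a contingency table rather than a data frame; if you want a plain data frame like the desired output, one option (a sketch using as.data.frame.matrix) is:
tab <- xtabs(vals ~ ., na.omit(cbind(grp, vals = unlist(df[-1]))))
data.frame(Codes = rownames(tab), as.data.frame.matrix(tab), check.names = FALSE)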
When dealing with actual matrices, especially large ones, expressing the operation as (sparse) linear algebra should be the most efficient approach.
library(Matrix) ## for sparse matrix operations
idx <- c("111","111","111","112","112","113")
mat <- matrix(c(100,10,20,NA,NA,10,
10,20,40,NA,10,20,
10,20,NA,NA,40,200,
20,20,40,10,10,20,
NA,NA,40,100,10,20,
10,20,NA,30,10,20),
nrow=length(idx),
byrow=TRUE, dimnames=list(idx, idx))
## convert NA's to zero
mat[is.na(mat)] <- 0
## examine matrix
mat
## 111 111 111 112 112 113
## 111 100 10 20 0 0 10
## 111 10 20 40 0 10 20
## 111 10 20 0 0 40 200
## 112 20 20 40 10 10 20
## 112 0 0 40 100 10 20
## 113 10 20 0 30 10 20
## indicator matrix
## converts between "code" and "idx" spaces
M_code_idx <- fac2sparse(idx)
## project to "code_code" space
M_code_idx %*% mat %*% t(M_code_idx)
## 3 x 3 Matrix of class "dgeMatrix"
## 111 112 113
## 111 230 50 230
## 112 120 130 40
## 113 30 40 20
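Note that mat above was entered with the question's columns as rows, so this result is the transpose of the desired table. A minimal sketch to recover the target orientation:
t(as.matrix(M_code_idx %*% mat %*% t(M_code_idx)))
##     111 112 113
## 111 230 120  30
## 112  50 130  40
## 113 230  40  20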
I have two dataframes: one containing names and ranges of limits (only a few hundred rows, 1,000 at most), which needs to be joined onto a "measurements" dataframe that can consist of millions of rows (or tens of millions).
Currently I am doing a left_join and filtering on value to get the specific limit assigned to each measurement. This, however, is quite inefficient and costs a lot of resources; for larger dataframes, the code is unable to run at all.
Any ideas for a more efficient solution would be helpful.
library(dplyr)
## this one has only a few hundred rows
df_limits <- read.table(text="Title station_id limit_from limit_to
Level_3_Low 1 0 70
Level_2_Low 1 70 90
Level_1_Low 1 90 100
Optimal 1 100 110
Level_1_High 1 110 130
Level_2_High 1 130 150
Level_3_High 1 150 180
Level_3_Low 2 0 70
Level_2_Low 2 70 90
Level_1_Low 2 90 100
Optimal 2 100 110
Level_1_High 2 110 130
Level_2_High 2 130 150
Level_3_High 2 150 180
Level_3_Low 3 0 70
Level_2_Low 3 70 90
Level_1_Low 3 90 100
Optimal 3 100 110
Level_1_High 3 110 130
Level_2_High 3 130 150
Level_3_High 3 150 180
",header = TRUE, stringsAsFactors = TRUE)
# this one has millions of rows
df_measurements <- read.table(text="measurement_id station_id value
12121534 1 172
12121618 1 87
12121703 1 9
12121709 2 80
12121760 2 80
12121813 2 115
12121881 3 67
12121907 3 100
12121920 3 108
12121979 1 102
12121995 1 53
12122022 1 77
12122065 2 158
12122107 2 144
12122113 2 5
12122135 3 100
12122187 3 136
12122267 3 130
12122359 1 105
12122366 1 126
12122398 1 143
",header = TRUE, stringsAsFactors = TRUE)
df_results <- left_join(df_measurements,df_limits, by = "station_id") %>%
filter ((value >= limit_from & value < limit_to) | is.na(Title)) %>%
select(names(df_measurements), Title)
Another data.table solution, using a non-equi join (each measurement is matched to the limits row whose interval (limit_from, limit_to] contains its value):
library(data.table)
setDT(df_measurements)
setDT(df_limits)
df_limits[df_measurements, .(station_id, measurement_id, value, Title),
on=.(station_id = station_id, limit_from < value, limit_to >= value)]
station_id measurement_id value Title
1: 1 12121534 172 Level_3_High
2: 1 12121618 87 Level_2_Low
3: 1 12121703 9 Level_3_Low
4: 2 12121709 80 Level_2_Low
5: 2 12121760 80 Level_2_Low
6: 2 12121813 115 Level_1_High
7: 3 12121881 67 Level_3_Low
8: 3 12121907 100 Level_1_Low
9: 3 12121920 108 Optimal
10: 1 12121979 102 Optimal
11: 1 12121995 53 Level_3_Low
12: 1 12122022 77 Level_2_Low
13: 2 12122065 158 Level_3_High
14: 2 12122107 144 Level_2_High
15: 2 12122113 5 Level_3_Low
16: 3 12122135 100 Level_1_Low
17: 3 12122187 136 Level_2_High
18: 3 12122267 130 Level_1_High
19: 1 12122359 105 Optimal
20: 1 12122366 126 Level_1_High
21: 1 12122398 143 Level_2_High
A simple base R option (no additional packages needed) using subset + merge:
subset(
merge(
df_measurements,
df_limits,
all = TRUE
),
limit_from < value & limit_to >= value
)
gives
station_id measurement_id value Title limit_from limit_to
7 1 12121534 172 Level_3_High 150 180
9 1 12121618 87 Level_2_Low 70 90
15 1 12121703 9 Level_3_Low 0 70
23 1 12122022 77 Level_2_Low 70 90
34 1 12122398 143 Level_2_High 130 150
39 1 12121979 102 Optimal 100 110
43 1 12121995 53 Level_3_Low 0 70
54 1 12122366 126 Level_1_High 110 130
60 1 12122359 105 Optimal 100 110
65 2 12121760 80 Level_2_Low 70 90
75 2 12121813 115 Level_1_High 110 130
79 2 12121709 80 Level_2_Low 70 90
91 2 12122065 158 Level_3_High 150 180
97 2 12122107 144 Level_2_High 130 150
99 2 12122113 5 Level_3_Low 0 70
108 3 12121907 100 Level_1_Low 90 100
116 3 12121920 108 Optimal 100 110
124 3 12122267 130 Level_1_High 110 130
127 3 12121881 67 Level_3_Low 0 70
136 3 12122135 100 Level_1_Low 90 100
146 3 12122187 136 Level_2_High 130 150
Another option is using dplyr
df_measurements %>%
group_by(station_id) %>%
mutate(Title = with(
df_limits,
Title[
findInterval(
value,
unique(unlist(cbind(limit_from, limit_to)[station_id == first(.$station_id)])),
left.open = TRUE
)
]
)) %>%
ungroup()
which gives
# A tibble: 21 x 4
measurement_id station_id value Title
<int> <int> <int> <fct>
1 12121534 1 172 Level_3_High
2 12121618 1 87 Level_2_Low
3 12121703 1 9 Level_3_Low
4 12121709 2 80 Level_2_Low
5 12121760 2 80 Level_2_Low
6 12121813 2 115 Level_1_High
7 12121881 3 67 Level_3_Low
8 12121907 3 100 Level_1_Low
9 12121920 3 108 Optimal
10 12121979 1 102 Optimal
# ... with 11 more rows
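For intuition: findInterval() returns, for each value, the index of the half-open interval it falls into among the sorted break points, and that index is used to look up the matching Title. A small illustration using station 1's breaks (the values 9, 87 and 172 are taken from the data above):
findInterval(c(9, 87, 172), c(0, 70, 90, 100, 110, 130, 150, 180), left.open = TRUE)
# [1] 1 2 7
# i.e. Level_3_Low, Level_2_Low, Level_3_High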
Benchmarking
f_TIC1 <- function() {
subset(
merge(
df_measurements,
df_limits,
all = TRUE
),
limit_from < value & limit_to >= value
)
}
f_TIC2 <- function() {
df_measurements %>%
group_by(station_id) %>%
mutate(Title = with(
df_limits,
Title[
findInterval(
value,
unique(unlist(cbind(limit_from, limit_to)[station_id == first(station_id)])),
left.open = TRUE
)
]
)) %>%
ungroup()
}
dt_limits <- as.data.table(df_limits)
dt_measurements <- as.data.table(df_measurements)
f_Waldi <- function() {
dt_limits[
dt_measurements,
.(station_id, measurement_id, value, Title),
on = .(station_id, limit_from < value, limit_to >= value)
]
}
f_TimTeaFan <- function() {
setkey(dt_limits, station_id, limit_from, limit_to)
foverlaps(dt_measurements[, value2 := value],
dt_limits,
by.x = c("station_id", "value", "value2"),
type = "within",
)[
value < limit_to,
.(measurement_id, station_id, value, Title)
]
}
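The call that produced the timings below is omitted above; a minimal sketch, assuming the microbenchmark package:
library(microbenchmark)
microbenchmark(f_TIC1(), f_TIC2(), f_Waldi(), f_TimTeaFan(), times = 100, unit = "relative")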
Running it, you will see:
Unit: relative
expr min lq mean median uq max neval
f_TIC1() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
f_TIC2() 4.848639 4.909985 4.895588 4.942616 5.124704 2.580819 100
f_Waldi() 3.182027 3.010615 3.069916 3.114160 3.397845 1.698386 100
f_TimTeaFan() 5.523778 5.112872 5.226145 5.112407 5.745671 2.446987 100
Here is one way to do it. The problematic part was the condition value < limit_to: foverlaps checks value <= limit_to, which produces double matches, so we apply the filter condition after the overlap join and then select the desired columns. (The value2 column is just a copy of value, since foverlaps needs both a start and an end column on each side.) Note that the boundary handling here is limit_from <= value < limit_to, so a value of exactly 100 is classified as Optimal rather than Level_1_Low, and the result is not in the same order as the df_results generated with dplyr.
library(data.table)
dt_limits <- as.data.table(df_limits)
dt_measurements <- as.data.table(df_measurements)
setkey(dt_limits, station_id, limit_from, limit_to)
dt_results <- foverlaps(dt_measurements[, value2 := value],
dt_limits,
by.x = c("station_id", "value", "value2"),
type = "within",
)[value < limit_to,
.(measurement_id , station_id, value, Title)]
dt_results[]
#> measurement_id station_id value Title
#> 1: 12121534 1 172 Level_3_High
#> 2: 12121618 1 87 Level_2_Low
#> 3: 12121703 1 9 Level_3_Low
#> 4: 12121709 2 80 Level_2_Low
#> 5: 12121760 2 80 Level_2_Low
#> 6: 12121813 2 115 Level_1_High
#> 7: 12121881 3 67 Level_3_Low
#> 8: 12121907 3 100 Optimal
#> 9: 12121920 3 108 Optimal
#> 10: 12121979 1 102 Optimal
#> 11: 12121995 1 53 Level_3_Low
#> 12: 12122022 1 77 Level_2_Low
#> 13: 12122065 2 158 Level_3_High
#> 14: 12122107 2 144 Level_2_High
#> 15: 12122113 2 5 Level_3_Low
#> 16: 12122135 3 100 Optimal
#> 17: 12122187 3 136 Level_2_High
#> 18: 12122267 3 130 Level_2_High
#> 19: 12122359 1 105 Optimal
#> 20: 12122366 1 126 Level_1_High
#> 21: 12122398 1 143 Level_2_High
#> measurement_id station_id value Title
Created on 2021-08-09 by the reprex package (v0.3.0)
I have a dataframe with paired columns like 'x1' and 'x1_fit', with the numbers going up to 5 in some cases.
date <- seq(as.Date('2019-11-04'), by = "days", length.out = 7)
x1 <- c(100,120,111,152,110,112,111)
x1_fit <- c(150,142,146,148,123,120,145)
x2 <- c(110,130,151,152,150,142,161)
x2_fit <- c(170,172,176,178,173,170,175)
df <- data.frame(date,x1,x1_fit,x2,x2_fit)
How can I compute x1_fit - x1, and so on? The number of x's will change every time.
You can select those columns with regular expressions (assuming the columns are in the appropriate order):
> df[, grep('^x\\d+_fit$', colnames(df))] - df[, grep('^x\\d+$', colnames(df))]
x1_fit x2_fit
1 50 60
2 22 42
3 35 25
4 -4 26
5 13 23
6 8 28
7 34 14
If you want to assign the differences to the original df:
df[, paste0(grep('^x\\d+$', colnames(df), value = TRUE), '_diff')] <-
df[, grep('^x\\d+_fit$', colnames(df))] - df[, grep('^x\\d+$', colnames(df))]
# > df
# date x1 x1_fit x2 x2_fit x1_diff x2_diff
# 1 2019-11-04 100 150 110 170 50 60
# 2 2019-11-05 120 142 130 172 22 42
# 3 2019-11-06 111 146 151 176 35 25
# 4 2019-11-07 152 148 152 178 -4 26
# 5 2019-11-08 110 123 150 173 13 23
# 6 2019-11-09 112 120 142 170 8 28
# 7 2019-11-10 111 145 161 175 34 14
The solution from @mt1022 is straightforward; however, since you have tagged this as dplyr, here is an approach where we convert the data to long format, subtract the corresponding values, and reshape the data back to wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -date) %>%
mutate(name = sub('_.*', '', name)) %>%
group_by(date, name) %>%
summarise(diff = diff(value)) %>%
pivot_wider(names_from = name, values_from = diff) %>%
rename_at(-1, ~paste0(., "_diff")) %>%
left_join(df, by = "date")
# date x1_diff x2_diff x1 x1_fit x2 x2_fit
# <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2019-11-04 50 60 100 150 110 170
#2 2019-11-05 22 42 120 142 130 172
#3 2019-11-06 35 25 111 146 151 176
#4 2019-11-07 -4 26 152 148 152 178
#5 2019-11-08 13 23 110 123 150 173
#6 2019-11-09 8 28 112 120 142 170
#7 2019-11-10 34 14 111 145 161 175
In base R, you could loop over the unique column names and subtract the fitted column from each one using
> lapply(setNames(nm = unique(gsub("_.*", "", names(df)))), function(nm) {
fit <- paste0(nm, "_fit")
diff <- df[, nm] - df[, fit]
})
# $x1
# [1] -50 -22 -35 4 -13 -8 -34
#
# $x2
# [1] -60 -42 -25 -26 -23 -28 -14
Here, I set the Date column as the row names and removed the column using
df <- data.frame(date,x1,x1_fit,x2,x2_fit)
row.names(df) <- df$date
df$date <- NULL
but you could just loop over the column names without the Date column.
We can also do this with a split in base R:
lst1 <- split.default(df[-1], sub("_.*", "", names(df)[-1]))
out <- sapply(lst1, function(x) x[, 2] - x[, 1])
df[paste0(names(lst1), "_diff")] <- out
df
# date x1 x1_fit x2 x2_fit x1_diff x2_diff
#1 2019-11-04 100 150 110 170 50 60
#2 2019-11-05 120 142 130 172 22 42
#3 2019-11-06 111 146 151 176 35 25
#4 2019-11-07 152 148 152 178 -4 26
#5 2019-11-08 110 123 150 173 13 23
#6 2019-11-09 112 120 142 170 8 28
#7 2019-11-10 111 145 161 175 34 14
I have a data frame with a lot of company information separated by an id variable. I want to sort by one of the variables within each id group. Let's take this example:
df <- structure(list(id = c(110, 110, 110, 90, 90, 90, 90, 252, 252
), var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39), var2 = c(234,
12, 43, 32, 21, 19, 16, 34, 44)), .Names = c("id", "var1", "var2"
), row.names = c(NA, -9L), class = "data.frame")
Which looks like this
df
id var1 var2
1 110 26 234
2 110 21 12
3 110 54 43
4 90 10 32
5 90 18 21
6 90 9 19
7 90 16 16
8 252 54 34
9 252 39 44
Now, I want to sort the data frame according to var1 within each id. The easiest solution I can think of is using the apply function, like this:
> apply(df, 2, sort)
id var1 var2
[1,] 90 9 12
[2,] 90 10 16
[3,] 90 16 19
[4,] 90 18 21
[5,] 110 21 32
[6,] 110 26 34
[7,] 110 39 43
[8,] 252 54 44
[9,] 252 54 234
However, this is not the output I am seeking. The correct output should be:
id var1 var2
1 110 21 12
2 110 26 234
3 110 54 43
4 90 9 19
5 90 10 32
6 90 16 16
7 90 18 21
8 252 39 44
9 252 54 34
That is: group by id, sort by the var1 column, and keep the original order of the id groups.
Any idea how to sort like this?
Note: as mentioned by Moody_Mudskipper, there is no need to use the tidyverse; this can also be done easily in base R:
df[order(ordered(df$id, unique(df$id)), df$var1), ]
A one-liner tidyverse solution w/o any temp vars:
library(tidyverse)
df %>% arrange(ordered(id, unique(id)), var1)
# id var1 var2
# 1 110 21 12
# 2 110 26 234
# 3 110 54 43
# 4 90 9 19
# 5 90 10 32
# 6 90 16 16
# 7 90 18 21
# 8 252 39 44
# 9 252 54 34
Explanation of why apply(df, 2, sort) does not work
What you were trying to do was sort each column independently. apply runs over the specified dimension (2 in this case, which corresponds to columns) and applies the function (sort in this case).
apply also tries to simplify the results, in this case to a matrix. So you are getting back a matrix (not a data.frame) in which each column is sorted independently. For example, this row from the apply call:
# [1,] 90 9 12
does not even exist in the original data.frame.
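You can verify the type change directly; a quick check:
class(apply(df, 2, sort))
# [1] "matrix" "array"   (in R >= 4.0; older versions report just "matrix")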
Another base R option using order and match
df[with(df, order(match(id, unique(id)), var1, var2)), ]
# id var1 var2
#2 110 21 12
#1 110 26 234
#3 110 54 43
#6 90 9 19
#4 90 10 32
#7 90 16 16
#5 90 18 21
#9 252 39 44
#8 252 54 34
We can convert id to a factor in order to split while preserving the original order. We can then loop over the list, order each piece by var1, and rbind it back together, i.e.
df$id <- factor(df$id, levels = unique(df$id))
do.call(rbind, lapply(split(df, df$id), function(i)i[order(i$var1),]))
# id var1 var2
#110.2 110 21 12
#110.1 110 26 234
#110.3 110 54 43
#90.6 90 9 19
#90.4 90 10 32
#90.7 90 16 16
#90.5 90 18 21
#252.9 252 39 44
#252.8 252 54 34
NOTE: You can reset the rownames by rownames(new_df) <- NULL
In base R we could use split<- :
split(df,df$id) <- lapply(split(df,df$id), function(x) x[order(x$var1),] )
or, as @Markus suggests:
split(df, df$id) <- by(df, df$id, function(x) x[order(x$var1),])
The output in either case:
df
# id var1 var2
# 1 110 21 12
# 2 110 26 234
# 3 110 54 43
# 4 90 9 19
# 5 90 10 32
# 6 90 16 16
# 7 90 18 21
# 8 252 39 44
# 9 252 54 34
With the following tidyverse pipe, the question's output is reproduced.
library(tidyverse)
df %>%
mutate(tmp = cumsum(c(0, diff(id) != 0))) %>%
group_by(id) %>%
arrange(tmp, var1) %>%
select(-tmp)
## A tibble: 9 x 3
## Groups: id [3]
# id var1 var2
# <dbl> <dbl> <dbl>
#1 110 21 12
#2 110 26 234
#3 110 54 43
#4 90 9 19
#5 90 10 32
#6 90 16 16
#7 90 18 21
#8 252 39 44
#9 252 54 34
I have a dataset new with variables a, b and c:
a b c
hdjfh 434 876
sdfdsf 34 98
gfdsdfdsf 534 672
rsdfdsf 65 87
gsdfdsf 67 54
vbvnn 98 09
gkhjgfk 100 768
rknfg 78 3546
I want to create two datasets: new1 should contain the records that satisfy the condition b > 110 or c > 110, and new2 should contain the records that do not satisfy it.
If you want to assign the two data sets to new variables, you can do this:
df <- data.frame(a=c('hdjfh','sdfdsf','gfdsdfdsf','rsdfdsf','gsdfdsf','vbvnn','gkhjgfk','rknfg'),b=c(434L,34L,534L,65L,67L,98L,100L,78L),c=c(876L,98L,672L,87L,54L,9L,768L,3546L),stringsAsFactors=F);
cond <- df$b>110|df$c>110;
new1 <- df[cond,];
new2 <- df[!cond,];
new1;
## a b c
## 1 hdjfh 434 876
## 3 gfdsdfdsf 534 672
## 7 gkhjgfk 100 768
## 8 rknfg 78 3546
new2;
## a b c
## 2 sdfdsf 34 98
## 4 rsdfdsf 65 87
## 5 gsdfdsf 67 54
## 6 vbvnn 98 9
Another option is to use split() to get a list:
split(df,df$b>110|df$c>110);
## $`FALSE`
## a b c
## 2 sdfdsf 34 98
## 4 rsdfdsf 65 87
## 5 gsdfdsf 67 54
## 6 vbvnn 98 9
##
## $`TRUE`
## a b c
## 1 hdjfh 434 876
## 3 gfdsdfdsf 534 672
## 7 gkhjgfk 100 768
## 8 rknfg 78 3546
##
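If you then want the two pieces as separate variables, you can pull them out of the list by name; a short sketch:
res <- split(df,df$b>110|df$c>110);
new1 <- res[["TRUE"]];   ## records satisfying the condition
new2 <- res[["FALSE"]];  ## records that do not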
I would like to transform my data frame df from region-based rows into point-by-point (number-by-number, or nucleotide-by-nucleotide) information.
My input df:
start end state freq
100 103 1nT 22
100 103 3nT 34
104 106 1nT 12
104 106 3nT 16
My expected output:
position state freq
100 1nT 22
101 1nT 22
102 1nT 22
103 1nT 22
100 3nT 34
101 3nT 34
102 3nT 34
103 3nT 34
104 1nT 12
105 1nT 12
106 1nT 12
104 3nT 16
105 3nT 16
106 3nT 16
Any ideas? Thank you very much.
Here is a vectorized approach:
# load your data
df <- read.table(textConnection("start end state freq
100 103 1nT 22
100 103 3nT 34
104 106 1nT 12
104 106 3nT 16"), header=TRUE)
# extract number of needed replications
n <- df$end - df$start + 1
# calculate position and replicate state/freq
res <- data.frame(position = rep(df$start - 1, n) + sequence(n),
state = rep(df$state, n),
freq = rep(df$freq, n))
res
# position state freq
# 1 100 1nT 22
# 2 101 1nT 22
# 3 102 1nT 22
# 4 103 1nT 22
# 5 100 3nT 34
# 6 101 3nT 34
# 7 102 3nT 34
# 8 103 3nT 34
# 9 104 1nT 12
# 10 105 1nT 12
# 11 106 1nT 12
# 12 104 3nT 16
# 13 105 3nT 16
# 14 106 3nT 16
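The workhorse here is sequence(), which concatenates the vectors 1:n for each element of its argument; a quick illustration:
sequence(c(4, 3))
# [1] 1 2 3 4 1 2 3
Adding these to the repeated start - 1 offsets yields the positions.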
Here is one approach.
Build your data:
require(data.table)
fakedata <- data.table(start=c(100,100,104,104),
end=c(103,103,106,106),
state=c("1nT","3nT","1nT","3nT"),
freq=c(22,34,12,16))
Perform calculation
fakedata[ , dur := (end-start+1)]
outdata <- fakedata[ , lapply(.SD,function(x) rep(x,dur))]
outdata[ , position := (start-1)+1:.N, by=list(start,end,state)]
And the output
start end state freq dur position
1: 100 103 1nT 22 4 100
2: 100 103 1nT 22 4 101
3: 100 103 1nT 22 4 102
4: 100 103 1nT 22 4 103
5: 100 103 3nT 34 4 100
6: 100 103 3nT 34 4 101
7: 100 103 3nT 34 4 102
8: 100 103 3nT 34 4 103
9: 104 106 1nT 12 3 104
10: 104 106 1nT 12 3 105
11: 104 106 1nT 12 3 106
12: 104 106 3nT 16 3 104
13: 104 106 3nT 16 3 105
14: 104 106 3nT 16 3 106
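To match the expected output exactly, keep just the relevant columns, e.g.:
outdata[, .(position, state, freq)]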
This can be accomplished with a simple apply command.
Let's build this in sequence:
You want to perform an operation based on every row, so apply by row should be your first thought (or a for loop). So we know we want to use apply(data, 1, row.function).
Think of what you would want to do for a single row. You want to repeat state and freq for every number between start and stop.
To get the range of numbers between start and stop we can use the colon operator start:stop.
Now, R will automatically repeat the values in a vector to match the longest vector length when creating a data.frame. So, we can create the piece from a single row like this:
data.frame(position=(row['start']:row['end']), state=row['state'], freq=row['freq'])
Then we want to bind it all together, so we use do.call('rbind', result).
Putting this all together now, we have:
do.call('rbind',
apply(data, 1, function(row) {
data.frame(position=(row['start']:row['end']),
state=row['state'], freq=row['freq'])
}))
This will give you what you want. One caveat: apply first converts the data frame to a character matrix here (because state is a character column), so freq comes back as character rather than numeric. Hopefully this helps teach you how to approach problems like this in the future too!
Here's a rough implementation using a for loop.
a = t(matrix(c(100, 103, "1nT" , 22,
100, 103 , "3nT" , 34,
104, 106 , "1nT" , 12,
104, 106 , "3nT" , 16), nrow = 4))
a = data.frame(a, stringsAsFactors = FALSE)
colnames(a) = c("start", "end" , "state", "freq")
a$start = as.numeric(as.character(a$start))
a$end = as.numeric(as.character(a$end))
n = dim(a)[1]
res = NULL
for (i in 1:n) {
position = a$start[i]:a$end[i]
state = rep(a$state[i], length(position))
freq = rep(a$freq[i], length(position))
temp = cbind.data.frame(position, state, freq)
res = rbind(res, temp)
}