Preserve names when coercing vector from binary to `as.numeric`?


In R, when you coerce a logical (TRUE/FALSE) vector to numeric with as.numeric(), the names are stripped away.
There are a few possible workarounds, which I've outlined below. It seems dangerous to rely on implicit conversion by adding 0 to all the values, and sapply() adds an extra loop to my operations (which seems inefficient). Is there any other way to preserve the names when converting a vector using as.numeric?
# Set the seed
set.seed(1045)
# Create a small sample vector and give it names
example_vec <- sample(x = c(TRUE,FALSE),size = 10,replace = TRUE)
names(example_vec) <- sample(x = LETTERS,size = 10,replace = FALSE)
example_vec
#     Y     N     M     P     L     J     H     O     F     D
# FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
as.numeric(x = example_vec)
# [1] 0 1 0 0 1 1 1 1 1 1
example_vec + 0
# Y N M P L J H O F D
# 0 1 0 0 1 1 1 1 1 1
sapply(X = example_vec,FUN = as.numeric)
# Y N M P L J H O F D
# 0 1 0 0 1 1 1 1 1 1

One possibility is to use the mode<- replacement function to change the internal storage mode (type) of the object. Also, integers are more appropriate than doubles (i.e. numerics) for this case of logical coercion.
mode(example_vec) <- "integer"
example_vec
# Y N M P L J H O F D
# 0 1 0 0 1 1 1 1 1 1
From help(mode):
mode(x) <- "newmode" changes the mode of object x to newmode. This is only supported if there is an appropriate as.newmode function, for example "logical", "integer", "double", "complex", "raw", "character", "list", "expression", "name", "symbol" and "function". Attributes are preserved.
The documentation also notes that storage.mode<- is a more efficient primitive version of mode<-. So the following could also be used.
storage.mode(example_vec) <- "integer"
But as @joran pointed out in the comments, class<- appears to do the same thing.
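To see what "attributes are preserved" means in practice, here is a minimal sketch on a small hand-made vector. The vector flags is invented for illustration and is not from the question; the sketch assumes a double result is wanted in one case (as with as.numeric) and an integer result in the other.

```r
# Hypothetical named logical vector, for illustration only
flags <- c(a = TRUE, b = FALSE, c = TRUE)

# as.numeric() drops the names attribute
plain <- as.numeric(flags)
names(plain)   # NULL

# storage.mode<- changes the type and keeps the names
as_double <- flags
storage.mode(as_double) <- "double"

# class<- with a basic type name coerces in the same way
as_int <- flags
class(as_int) <- "integer"

as_double
as_int
```

Both replacement functions modify a copy here, so flags itself stays logical.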

Just to throw another option out there, since your input is a logical vector, you can use ifelse(). And one could argue this approach is more explicit and straightforward:
ifelse(example_vec,1L,0L);
## Y N M P L J H O F D
## 0 1 0 0 1 1 1 1 1 1
Benchmarking
library(microbenchmark);
ifelse. <- function(x) ifelse(x,1L,0L);
sapply. <- function(x) sapply(x,as.integer);
setstoragemode <- function(x) { storage.mode(x) <- 'integer'; x; };
setmode <- function(x) { mode(x) <- 'integer'; x; };
setclass <- function(x) { class(x) <- 'integer'; x; };
as.and.setnames <- function(x) setNames(as.integer(x),names(x));
plus <- function(x) +x;
addzero <- function(x) x+0L;
## small scale (OP's example input)
set.seed(1045L);
x <- sample(c(T,F),10L,T);
names(x) <- sample(LETTERS,10L);
ex <- ifelse.(x);
identical(ex,sapply.(x));
## [1] TRUE
identical(ex,setstoragemode(x));
## [1] TRUE
identical(ex,setmode(x));
## [1] TRUE
identical(ex,setclass(x));
## [1] TRUE
identical(ex,as.and.setnames(x));
## [1] TRUE
identical(ex,plus(x));
## [1] TRUE
identical(ex,addzero(x));
## [1] TRUE
microbenchmark(ifelse.(x),sapply.(x),setstoragemode(x),setmode(x),setclass(x),as.and.setnames(x),plus(x),addzero(x));
## Unit: nanoseconds
## expr min lq mean median uq max neval
## ifelse.(x) 6843 8126.0 9627.13 8981 9837.0 21810 100
## sapply.(x) 18817 20100.5 23234.93 21383 22666.5 71418 100
## setstoragemode(x) 856 1283.0 1745.54 1284 1711.0 15396 100
## setmode(x) 7270 8126.0 9862.36 8982 10264.0 32074 100
## setclass(x) 429 1283.0 2138.97 1284 1712.0 32075 100
## as.and.setnames(x) 1283 1711.0 1997.78 1712 2139.0 7271 100
## plus(x) 0 428.0 492.39 428 428.5 9837 100
## addzero(x) 0 428.0 539.39 428 856.0 2566 100
## large scale
set.seed(1L);
N <- 1e5L;
x <- sample(c(T,F),N,T);
names(x) <- make.unique(rep_len(LETTERS,N));
ex <- ifelse.(x);
identical(ex,sapply.(x));
## [1] TRUE
identical(ex,setstoragemode(x));
## [1] TRUE
identical(ex,setmode(x));
## [1] TRUE
identical(ex,setclass(x));
## [1] TRUE
identical(ex,as.and.setnames(x));
## [1] TRUE
identical(ex,plus(x));
## [1] TRUE
identical(ex,addzero(x));
## [1] TRUE
microbenchmark(ifelse.(x),sapply.(x),setstoragemode(x),setmode(x),setclass(x),as.and.setnames(x),plus(x),addzero(x));
## Unit: microseconds
## expr min lq mean median uq max neval
## ifelse.(x) 7633.598 7757.1900 16615.71251 7897.4600 29401.112 96503.642 100
## sapply.(x) 86353.737 102576.0945 125547.32957 123909.1120 137900.406 264442.788 100
## setstoragemode(x) 84.676 92.8015 343.46124 98.3605 113.543 23939.133 100
## setmode(x) 124.020 155.0245 603.15744 167.2125 181.111 22395.736 100
## setclass(x) 85.104 92.3740 328.25393 100.2850 118.460 21807.713 100
## as.and.setnames(x) 70.991 78.2610 656.98177 82.3235 88.953 35710.697 100
## plus(x) 40.200 42.9795 48.68026 44.9040 49.608 88.953 100
## addzero(x) 181.326 186.4580 196.34882 189.6650 201.211 282.679 100
## very large scale
set.seed(1L);
N <- 1e7L;
x <- sample(c(T,F),N,T);
names(x) <- make.unique(rep_len(LETTERS,N));
ex <- ifelse.(x);
identical(ex,sapply.(x));
## [1] TRUE
identical(ex,setstoragemode(x));
## [1] TRUE
identical(ex,setmode(x));
## [1] TRUE
identical(ex,setclass(x));
## [1] TRUE
identical(ex,as.and.setnames(x));
## [1] TRUE
identical(ex,plus(x));
## [1] TRUE
identical(ex,addzero(x));
## [1] TRUE
microbenchmark(ifelse.(x),sapply.(x),setstoragemode(x),setmode(x),setclass(x),as.and.setnames(x),plus(x),addzero(x),times=5L);
## Unit: milliseconds
## expr min lq mean median uq max neval
## ifelse.(x) 1082.220903 1308.106967 3452.639836 1473.723533 6306.320235 7092.82754 5
## sapply.(x) 16766.199371 17431.458634 18401.672635 18398.345499 18843.890150 20568.46952 5
## setstoragemode(x) 13.298283 13.648103 173.574496 19.661753 24.736278 796.52806 5
## setmode(x) 19.043796 19.878573 75.669779 19.969235 39.683589 279.77370 5
## setclass(x) 14.025292 14.119804 259.627934 14.414457 26.838618 1228.74150 5
## as.and.setnames(x) 12.889875 24.241484 178.243948 24.962934 25.103631 804.02182 5
## plus(x) 7.577576 7.676364 9.047674 8.245142 8.253266 13.48602 5
## addzero(x) 18.861615 18.960403 71.284716 26.622226 26.950662 265.02867 5
Looks like the unary plus takes the cake. (And my ifelse() idea kinda sucks.)
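A short sketch of how the arithmetic variants differ in result type, in case the distinction between integer and double matters downstream. The vector flags is made up for this illustration; the names survive in every case.

```r
# Hypothetical named logical vector, for illustration only
flags <- c(a = TRUE, b = FALSE, c = TRUE)

plus_int <- +flags       # unary plus: integer result
add_int  <- flags + 0L   # adding an integer zero: integer result
add_dbl  <- flags + 0    # adding a double zero: double result

typeof(plus_int)
typeof(add_int)
typeof(add_dbl)
```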

Related

Generating Coin flips using purrr

I'm learning how to use the purrr package in R, and I want to generate coin-flip samples so that element n of the result (n = 1, 2, ..., 99, 100) holds n samples of 5 coin flips each.
What I have in mind is a list that looks like this:
[[1]]
[[1]]
[1] 1 0 1 0 0
[[2]]
[[1]]
[1] 1 0 0 0 1
[[2]]
[1] 0 1 0 1 1
[[3]]
[[1]]
[1] 0 1 1 1 0
[[2]]
[1] 1 0 0 0 1
[[3]]
[1] 0 1 1 1 1
..
Can anyone help me build this?

You want the function rerun applied to each element of the vector 1:100 using the map function, as follows (note that rerun() is deprecated as of purrr 1.0.0):
library(purrr)
1:100 %>% map(function(x) rerun(x, rbinom(5,1,.5)))
However, it is just as easy to use replicate, which by default simplifies its result to a column-wise array:
lapply(1:100, function(x) replicate(x,rbinom(5,1,0.5)))
Note that the base R expression is much faster in this case.
a <- function() 1:100 %>% map(function(x) rerun(x, rbinom(5,1,.5)))
b <- function() lapply(1:100, function(x) replicate(x,rbinom(5,1,0.5)))
library(microbenchmark)
microbenchmark(a(),b())
Unit: milliseconds
expr min lq mean median uq max neval cld
a() 96.89941 104.83822 117.10245 111.48309 120.28554 391.9411 100 b
b() 16.88232 18.47104 23.22976 22.20549 26.31445 49.0042 100 a
Edit: Regarding your question in the comments, if you are only interested in illustrating the law of large numbers, you could do the following:
plot(1:100, do.call("c", lapply(b(), mean)),
type= "l", xlab = "replications",
ylab = "proportion of heads")
abline(h = .5)
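If only the numbers behind that plot are needed, a sketch along these lines computes the proportion of heads for each sample size. It is self-contained rather than reusing b(), and the seed and the name prop_heads are assumptions made for the illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Element n holds n samples of 5 flips (a 5 x n matrix by default)
flips <- lapply(1:100, function(n) replicate(n, rbinom(5, 1, 0.5)))

# One proportion of heads per sample size; vapply guarantees a numeric vector
prop_heads <- vapply(flips, mean, numeric(1))
head(prop_heads)
```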

If I understand you correctly, this is what you're after:
lapply(1:100, function(x) replicate(x,rbinom(5,1,0.5),simplify = FALSE))
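A scaled-down sketch (3 sample sizes instead of 100, with an arbitrary seed) makes it easy to check that simplify = FALSE gives the nested list shape shown in the question:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Element n is a list of n integer vectors, each holding 5 flips
flips <- lapply(1:3, function(n) replicate(n, rbinom(5, 1, 0.5), simplify = FALSE))

lengths(flips)   # number of samples per element
str(flips[[3]])  # a list of 3 vectors of length 5
```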

Return vector position in list r

I am trying to determine which vector an element comes from in a list I have created. Here is a reproducible example:
set.seed(101)
a <- runif(10, min=0, max=100)
b <- runif(10, min=0, max=100)
c <- runif(10, min=0, max=100)
d <- runif(10, min=0, max=100)
information <- list(a, b, c, d)
information.wanted <- do.call(pmax, information)
The code to get information.wanted works just fine. What I am now trying to find is which vector in the list each of the maximum values comes from. For example, value 1 in information.wanted (87.97...) comes from vector b in the information list. I would like a piece of code that returns, for each value in information.wanted, the vector it comes from.
> information.wanted
[1] 87.97957 95.68375 73.19726 93.16344 92.33189 91.34787 82.04361 81.42830 62.20120
[10] 92.48044
I have no idea how to do this though. None of the code that I've tried has gotten me anywhere close.
position.of.information.wanted <- ??
I'm looking to get something like this. A numeric vector is fine; I can substitute the names in later.
> position.of.information.wanted
[1] 2 3 ...
Any help would be greatly appreciated. Thanks.

You need to apply which.max to each "i" index of each element in "information":
f1 = function(x)
sapply(seq_along(x[[1]]), function(i) which.max(sapply(x, "[[", i)))
f1(information)
# [1] 2 3 2 2 3 4 2 4 1 4
mapply already provides that kind of "parallel" functionality:
f2 = function(x)
unlist(.mapply(function(...) which.max(c(...)), x, NULL))
f2(information)
# [1] 2 3 2 2 3 4 2 4 1 4
Or, instead of concatenating "information" in chunks, convert it to a "matrix" at the start (as David Arenburg notes in the comments) and apply which.max to its rows:
f3a = function(x)
apply(do.call(cbind, x), 1, which.max)
f3a(information)
# [1] 2 3 2 2 3 4 2 4 1 4
or its columns:
f3b = function(x)
apply(do.call(rbind, x), 2, which.max)
f3b(information)
# [1] 2 3 2 2 3 4 2 4 1 4
Also, max.col is convenient for a "matrix":
f4 = function(x)
max.col(do.call(cbind, x), "first")
f4(information)
# [1] 2 3 2 2 3 4 2 4 1 4
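As a sketch of how the max.col result can be used to recover the maxima as well, the indices can be paired with row numbers for matrix indexing. The names m, idx and vals are chosen here for illustration, and the data are rebuilt inline so the block runs on its own.

```r
set.seed(101)
information <- list(runif(10, 0, 100), runif(10, 0, 100),
                    runif(10, 0, 100), runif(10, 0, 100))

# One column per vector, one row per position
m <- do.call(cbind, information)

# Column (i.e. list element) holding the maximum of each row
idx <- max.col(m, ties.method = "first")

# Index with a two-column (row, column) matrix to pull out the maxima
vals <- m[cbind(seq_len(nrow(m)), idx)]

data.frame(i = idx, val = vals)
```

The vals column should agree with do.call(pmax, information), which is a quick way to sanity-check the indices.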
In another language, a simple loop over the elements would provide both which.max and max; in R, the same loop can work on whole vectors at a time:
f5 = function(x)
{
    ans = rep_len(1L, length(x[[1]]))
    maxs = x[[1]]
    for(i in seq_along(x)[-1]) {
        wh = x[[i]] > maxs
        maxs[wh] = x[[i]][wh]
        ans[wh] = i
    }
    ans  # or 'data.frame(i = ans, val = maxs)' for both
}
f5(information)
# [1] 2 3 2 2 3 4 2 4 1 4
It had to end with a benchmark:
set.seed(007)
dat = replicate(13, runif(1e4), FALSE)
identical(f1(dat), f2(dat))
#[1] TRUE
identical(f2(dat), f3a(dat))
#[1] TRUE
identical(f3a(dat), f3b(dat))
#[1] TRUE
identical(f3b(dat), f4(dat))
#[1] TRUE
identical(f4(dat), f5(dat))
#[1] TRUE
microbenchmark::microbenchmark(f1(dat), f2(dat), f3a(dat), f3b(dat), f4(dat), f5(dat), do.call(pmax, dat), times = 50)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# f1(dat) 274995.963 298662.210 339279.948 318937.172 350822.539 723673.972 50 d
# f2(dat) 94619.397 100079.205 114664.776 107479.127 114619.439 226733.260 50 c
# f3a(dat) 19767.925 23423.688 26382.919 25795.499 29215.839 40100.656 50 b
# f3b(dat) 20351.872 22829.997 28889.845 25090.446 30503.100 140311.058 50 b
# f4(dat) 975.102 1109.431 1546.571 1169.462 1361.733 8954.100 50 a
# f5(dat) 2427.665 2470.816 5299.386 2520.755 3197.793 112986.612 50 a
# do.call(pmax, dat) 1477.618 1530.166 1627.934 1551.046 1602.898 2814.295 50 a

Most efficient way to turn factor matrix into binary (indicator) matrix in R

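I can think of several ways to turn a matrix (data frame) of this type:
dat = data.frame(
x1 = rep(c('a', 'b'), 100),
x2 = rep(c('x', 'y'), 100),
stringsAsFactors = TRUE # needed since R 4.0.0, where strings are no longer converted to factors by default
)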
head(dat)
x1 x2
1 a x
2 b y
3 a x
4 b y
5 a x
6 b y
Into a binary (indicator) matrix (or data frame) like this:
a b x y
1 0 1 0
0 1 0 1
...
(This structure is, of course, trivial and only for illustrative purpose!)
Many thanks!

We can use table
tbl <- table(rep(1:nrow(dat),2),unlist(dat))
head(tbl, 2)
# a b x y
# 1 1 0 1 0
# 2 0 1 0 1
Or a possibly more efficient option would be
library(Matrix)
sM <- sparse.model.matrix(~ -1 + x1 +x2, dat,
contrasts.arg = lapply(dat, contrasts, contrasts = FALSE))
colnames(sM) <- sub(".*\\d", "", colnames(sM))
head(sM, 2)
# 2 x 4 sparse Matrix of class "dgCMatrix"
# a b x y
#1 1 . 1 .
#2 . 1 . 1
The sparse result can be turned into a dense 0/1 matrix with as.matrix
head(as.matrix(sM),2)
# a b x y
#1 1 0 1 0
#2 0 1 0 1

There are some good solutions posted already, but none are optimal for performance. We can optimize performance by looping over each input column, and then looping over each factor level index within each input column and doing a straight integer comparison of the factor indexes. It's not the most concise or elegant piece of code, but it's fairly straightforward and fast:
do.call(cbind,lapply(dat,function(col)
`colnames<-`(do.call(cbind,lapply(seq_along(levels(col)),function(i)
as.integer(as.integer(col)==i)
)),levels(col))
));
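The same idea (compare each column's integer factor codes against each level index) can be sketched with outer() on a tiny data frame. This is an illustrative variant, not the answer's exact code: to_indicator and the three-row data are made up here, and the columns are assumed to be factors.

```r
# Hypothetical small input; stringsAsFactors = TRUE so the columns are factors
dat <- data.frame(x1 = c("a", "b", "a"),
                  x2 = c("x", "y", "y"),
                  stringsAsFactors = TRUE)

to_indicator <- function(df) {
  blocks <- lapply(df, function(col) {
    # TRUE where the factor code of a row equals the level index of a column
    m <- outer(as.integer(col), seq_along(levels(col)), "==")
    storage.mode(m) <- "integer"   # logical -> 0/1, dimensions are kept
    colnames(m) <- levels(col)
    m
  })
  do.call(cbind, blocks)           # one block of columns per input column
}

to_indicator(dat)
#      a b x y
# [1,] 1 0 1 0
# [2,] 0 1 0 1
# [3,] 1 0 0 1
```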
Performance:
library(Matrix);
library(data.table);
library(microbenchmark);
bgoldst <- function(dat) do.call(cbind,lapply(dat,function(col) `colnames<-`(do.call(cbind,lapply(seq_along(levels(col)),function(i) as.integer(as.integer(col)==i))),levels(col))));
akrun1 <- function(dat) table(rep(1:nrow(dat),2),unlist(dat));
akrun2 <- function(dat) sparse.model.matrix(~-1+x1+x2,dat,contrasts.arg=lapply(dat,contrasts,contrasts=FALSE));
davidar <- function(dat) { dat[,rowid:=.I]; dcast(melt(dat,id='rowid'),rowid~value,length); }; ## requires a data.table
dataminer <- function(dat) t(apply(dat,1,function(x) as.numeric(unique(unlist(dat))%in%x)));
N <- 100L; dat <- data.frame(x1=rep(c('a','b'),N),x2=rep(c('x','y'),N),stringsAsFactors=TRUE); datDT <- setDT(copy(dat));
identical(unname(bgoldst(dat)),matrix(as.vector(akrun1(dat)),ncol=4L));
## [1] TRUE
identical(unname(bgoldst(dat)),unname(matrix(as.integer(as.matrix(akrun2(dat))),ncol=4L)));
## [1] TRUE
identical(bgoldst(dat),as.matrix(davidar(datDT)[,rowid:=NULL]));
## [1] TRUE
identical(unname(bgoldst(dat)),matrix(as.integer(dataminer(dat)),ncol=4L));
## [1] TRUE
N <- 100L;
dat <- data.frame(x1=rep(c('a','b'),N),x2=rep(c('x','y'),N),stringsAsFactors=TRUE); datDT <- setDT(copy(dat));
microbenchmark(bgoldst(dat),akrun1(dat),akrun2(dat),davidar(datDT),dataminer(dat));
## Unit: microseconds
## expr min lq mean median uq max neval
## bgoldst(dat) 67.570 92.374 106.2853 99.6440 121.2405 188.596 100
## akrun1(dat) 581.182 652.386 773.6300 690.6605 916.4625 1192.299 100
## akrun2(dat) 4429.208 4836.119 5554.5902 5145.3135 5977.0990 11263.537 100
## davidar(datDT) 5064.273 5498.555 6104.7621 5664.9115 6203.9695 11713.856 100
## dataminer(dat) 47577.729 49529.753 55217.3726 53190.8940 60041.9020 74346.268 100
N <- 1e4L;
dat <- data.frame(x1=rep(c('a','b'),N),x2=rep(c('x','y'),N),stringsAsFactors=TRUE); datDT <- setDT(copy(dat));
microbenchmark(bgoldst(dat),akrun1(dat),akrun2(dat),davidar(datDT));
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst(dat) 1.775617 1.820949 2.299493 1.84725 1.972124 8.362336 100
## akrun1(dat) 38.954524 41.109257 48.409613 45.60304 52.147633 162.365472 100
## akrun2(dat) 16.915832 17.762799 21.288200 19.20164 23.775180 46.494055 100
## davidar(datDT) 36.151684 38.366715 42.875940 42.38794 45.916937 58.695008 100
N <- 1e5L;
dat <- data.frame(x1=rep(c('a','b'),N),x2=rep(c('x','y'),N),stringsAsFactors=TRUE); datDT <- setDT(copy(dat));
microbenchmark(bgoldst(dat),akrun1(dat),akrun2(dat),davidar(datDT));
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst(dat) 17.16473 22.97654 35.01815 26.76662 31.75562 152.6188 100
## akrun1(dat) 501.72644 626.14494 671.98315 680.91152 727.88262 828.8313 100
## akrun2(dat) 212.12381 242.65505 298.90254 272.28203 357.65106 429.6023 100
## davidar(datDT) 368.04924 461.60078 500.99431 511.54921 540.39358 638.3840 100

If you have a data.frame as you are showing (not a matrix), you could also reshape the data:
library(data.table)
setDT(dat)[, rowid := .I] # Creates a row index
res <- dcast(melt(dat, id = "rowid"), rowid ~ value, length) # long/wide format
head(res)
# rowid a b x y
# 1 1 1 0 1 0
# 2 2 0 1 0 1
# 3 3 1 0 1 0
# 4 4 0 1 0 1
# 5 5 1 0 1 0
# 6 6 0 1 0 1
Some benchmarks
dat = data.frame(
x1 = rep(c('a', 'b'), 1e3),
x2 = rep(c('x', 'y'), 1e3),
stringsAsFactors = TRUE
)
library(data.table)
library(Matrix)
library(microbenchmark)
dat2 <- copy(dat)
microbenchmark("akrun1 : " = table(rep(1:nrow(dat),2),unlist(dat)),
"akrun2 : " = sparse.model.matrix(~ -1 + x1 +x2, dat, contrasts.arg = lapply(dat, contrasts, contrasts = FALSE)),
"DatamineR : " = t(apply(dat,1, function(x) as.numeric(unique(unlist(dat)) %in% x))),
"David Ar : " = {setDT(dat2)[, rowid := .I] ; dcast(melt(dat2, id = "rowid"), rowid ~ value, length)},
times = 10L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# akrun1 : 3.826075 4.061904 6.654399 5.165376 11.26959 11.82029 10 a
# akrun2 : 5.269531 5.713672 8.794434 5.943422 13.34118 20.01961 10 a
# DatamineR : 3199.336286 3343.774160 3410.618547 3385.756972 3517.22133 3625.70909 10 b
# David Ar : 8.092769 8.254682 11.030785 8.465232 15.44893 19.83914 10 a
The apply solution is highly inefficient and will take forever on a bigger data set. Here is the comparison on a bigger data set, excluding the apply solution:
dat = data.frame(
x1 = rep(c('a', 'b'), 1e4),
x2 = rep(c('x', 'y'), 1e4),
stringsAsFactors = TRUE
)
dat2 <- copy(dat)
microbenchmark("akrun1 : " = table(rep(1:nrow(dat),2),unlist(dat)),
"akrun2 : " = sparse.model.matrix(~ -1 + x1 +x2, dat, contrasts.arg = lapply(dat, contrasts, contrasts = FALSE)),
#"DatamineR : " = t(apply(dat,1, function(x) as.numeric(unique(unlist(dat)) %in% x))),
"David Ar : " = {setDT(dat2)[, rowid := .I] ; dcast(melt(dat2, id = "rowid"), rowid ~ value, length)},
times = 100L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# akrun1 : 38.66744 41.27116 52.97982 42.72534 47.17203 161.0420 100 b
# akrun2 : 17.02006 18.93534 27.27582 19.35580 20.72022 153.2397 100 a
# David Ar : 34.15915 37.91659 46.11050 38.58536 41.40412 149.0038 100 b
It seems the Matrix package shines on bigger data sets.
It is probably worth comparing scenarios with more columns and more unique values too.

One alternative using apply
head(t(apply(dat,1, function(x) as.numeric(unique(unlist(dat)) %in% x))))
[,1] [,2] [,3] [,4]
[1,] 1 0 1 0
[2,] 0 1 0 1
[3,] 1 0 1 0
[4,] 0 1 0 1
[5,] 1 0 1 0
[6,] 0 1 0 1

Check lengths of elements in list of equal length

How can I check whether all the elements in a list are of equal length?
E.g.:
l <- list(c(1:3),c(2:7),c(12:13))
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3 4 5 6 7
[[3]]
[1] 12 13
I have a long list with many entries and want a way to check if each element is of the same length.
Above it should return FALSE as the lengths differ (3,6,2).

Try this:
length(unique(sapply(l, length))) == 1
# [1] FALSE
Or @PierreLafortune's way:
length(unique(lengths(l))) == 1L
Or @CathG's way:
all(sapply(l, length) == length(l[[1]]))
#or
all(lengths(l) == length(l[[1]]))
Some benchmarking:
#data
set.seed(123)
l <- lapply(round(runif(1000,1,100)), runif)
library(microbenchmark)
library(ggplot2)
#benchmark
bm <- microbenchmark(
zx8754 = length(unique(sapply(l, length))) == 1,
PierreLafortune=length(unique(lengths(l))) == 1L,
CathG_1 = all(lengths(l) == length(l[[1]])),
CathG_2 = all(sapply(l, length) == length(l[[1]])),
times = 10000)
# result
bm
Unit: microseconds
expr min lq mean median uq max neval cld
zx8754 326.605 355.281 392.39741 364.034 377.618 84109.597 10000 d
PierreLafortune 23.545 25.960 30.24049 27.168 28.375 3312.829 10000 b
CathG_1 9.056 11.471 13.49464 12.679 13.584 1832.847 10000 a
CathG_2 319.965 343.207 371.50327 351.659 364.940 3531.068 10000 c
#plot benchmark
autoplot(bm)

I would use:
length(unique(lengths(l))) == 1L
[1] FALSE
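Wrapped as a small helper, the check might look like the sketch below. The name all_same_length is invented here, and treating an empty list as "all equal" (hence <= rather than ==) is a design assumption, not something the question specifies.

```r
# Hypothetical helper: TRUE when every element of the list has the same length
all_same_length <- function(l) length(unique(lengths(l))) <= 1L

all_same_length(list(1:3, 2:7, 12:13))  # FALSE: lengths are 3, 6, 2
all_same_length(list(1:3, 4:6))         # TRUE
all_same_length(list())                 # TRUE by the convention chosen above
```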

Piecewise linear transformation without for loop or nested ifelse

I'm trying to perform a piecewise linear transformation of my data. Here's an example table describing a transformation:
dat <- data.frame(x.low = 0:2, x.high = 1:3, y.low=c(0, 2, 3), y.high=c(2, 3, 10))
dat
# x.low x.high y.low y.high
# 1 0 1 0 2
# 2 1 2 2 3
# 3 2 3 3 10
If I defined x <- c(1.75, 2.5), I would expect transformed values 2.75 and 6.5 (my elements would be matched by rows 2 and 3 of dat, respectively).
I know how to solve this problem with a for loop, iterating through the rows of dat and transforming the corresponding values:
pw.lin.trans <- function(x, m) {
out <- rep(NA, length(x))
for (i in seq(nrow(m))) {
matching <- x >= m$x.low[i] & x <= m$x.high[i]
out[matching] <- m$y.low[i] + (x[matching] - m$x.low[i]) /
(m$x.high[i] - m$x.low[i]) * (m$y.high[i] - m$y.low[i])
}
out
}
pw.lin.trans(x, dat)
# [1] 2.75 6.50
While this works, it strikes me there should be a better approach that matches x values to rows of dat and then performs all the interpolations in a single computation. Could somebody point me to a non-for-loop solution for this problem?

Try approx:
(xp <- unique(c(dat$x.low, dat$x.high)))
## [1] 0 1 2 3
(yp <- unique(c(dat$y.low, dat$y.high)))
## [1] 0 2 3 10
x <- c(1.75, 2.5)
approx(xp, yp, x)
## $x
## [1] 1.75 2.50
##
## $y
## [1] 2.75 6.50
or approxfun (which returns a new function):
f <- approxfun(xp, yp)
f(x)
## [1] 2.75 6.50
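For a version that keeps the question's table and matches each x to a row explicitly, findInterval can do the lookup in one vectorised call. This is only a sketch: pw_lin_vec is a made-up name, and it assumes the rows of the table are sorted and contiguous (each x.high equals the next x.low). Values outside the table's range come back as NA.

```r
dat <- data.frame(x.low = 0:2, x.high = 1:3,
                  y.low = c(0, 2, 3), y.high = c(2, 3, 10))

pw_lin_vec <- function(x, m) {
  # Interval boundaries: every lower bound plus the last upper bound
  breaks <- c(m$x.low, m$x.high[nrow(m)])
  # Row of m that each x falls into; the last interval is closed on the right
  i <- findInterval(x, breaks, rightmost.closed = TRUE)
  i[i < 1L | i > nrow(m)] <- NA   # outside the table: no matching row
  # Linear interpolation within the matched row, for all x at once
  m$y.low[i] + (x - m$x.low[i]) / (m$x.high[i] - m$x.low[i]) *
    (m$y.high[i] - m$y.low[i])
}

pw_lin_vec(c(1.75, 2.5), dat)
# [1] 2.75 6.50
```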
Some benchmarks:
set.seed(123L)
x <- runif(10000, min(xp), max(yp))
library(microbenchmark)
microbenchmark(
pw.lin.trans(x, dat),
approx(xp, yp, x)$y,
f(x)
)
## Unit: microseconds
## expr min lq median uq max neval
## pw.lin.trans(x, dat) 3364.241 3395.244 3614.0375 3641.7365 6170.268 100
## approx(xp, yp, x)$y 359.080 379.669 424.0895 453.6800 522.756 100
## f(x) 202.899 209.168 217.8715 232.3555 293.499 100
