Create a sequence from vectors with start and end positions

Given two separate vectors of equal length: f.start and f.end, I would like to construct a sequence (by 1), going from f.start[1]:f.end[1] to f.start[2]:f.end[2], ..., to f.start[n]:f.end[n].
Here is an example with just 6 rows.
f.start f.end
[1,] 45739 122538
[2,] 125469 202268
[3,] 203563 280362
[4,] 281657 358456
[5,] 359751 436550
[6,] 437845 514644
Crudely, a loop can do it, but is extremely slow for larger datasets (rows>2000).
f.start<-c(45739,125469,203563,281657,359751,437845)
f.end<-c(122538,202268,280362,358456,436550,514644)
f.ind<-f.start[1]:f.end[1]
for (i in 2:length(f.start))
{
f.ind.temp<-f.start[i]:f.end[i]
f.ind<-c(f.ind,f.ind.temp)
}
I suspect this can be done with apply(), but I have not worked out how to include two separate arguments in apply, and would appreciate some guidance.

You can try mapply or Map, which iterate over your two vectors simultaneously. You need to provide the function as the first argument:
vec1 = c(1,33,50)
vec2 = c(10,34,56)
unlist(Map(':',vec1, vec2))
# [1] 1 2 3 4 5 6 7 8 9 10 33 34 50 51 52 53 54 55 56
Just replace vec1 and vec2 with f.start and f.end, provided all(f.start <= f.end).
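Applied to the vectors from the question, a minimal sketch:
f.start <- c(45739, 125469, 203563, 281657, 359751, 437845)
f.end   <- c(122538, 202268, 280362, 358456, 436550, 514644)
f.ind   <- unlist(Map(':', f.start, f.end))  # all the ranges, concatenated
length(f.ind)
# [1] 460800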

Your loop is going to be slow as you are growing the vector
f.ind. You will also get an increase in speed if you pre-allocate
the length of the output vector.
# Some data (of length 3000)
set.seed(1)
f.start <- sample(1:10000, 3000)
f.end <- f.start + sample(1:200, 3000, TRUE)
# Functions
op <- function(L=1) {
  f.ind <- vector("list", L)
  for (i in 1:length(f.start)) {
    f.ind[[i]] <- f.start[i]:f.end[i]
  }
  unlist(f.ind)
}
op2 <- function() unlist(lapply(seq(f.start), function(x) f.start[x]:f.end[x]))
col <- function() unlist(mapply(':',f.start, f.end))
# check output
all.equal(op(), op2())
all.equal(op(), col())
A few benchmarks
library(microbenchmark)
# Look at the effect of pre-allocating
microbenchmark(op(L=1), op(L=1000), op(L=3000), times=500)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# op(L = 1) 46.760416 48.741080 52.29038 49.636864 50.661506 113.08303 500 c
# op(L = 1000) 41.644123 43.965891 46.20380 44.633016 45.739895 94.88560 500 b
# op(L = 3000) 7.629882 8.098691 10.10698 8.338387 9.963558 60.74152 500 a
# Compare methods - the loop actually performs okay
# I left the original loop out
microbenchmark(op(L=3000), op2(), col(), times=500)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# op(L = 3000) 7.778643 8.123136 10.119464 8.367720 11.402463 62.35632 500 b
# op2() 6.461926 6.762977 8.619154 6.995233 10.028825 57.55236 500 a
# col() 6.656154 6.910272 8.735241 7.137500 9.935935 58.37279 500 a
So a loop should perform okay speed-wise, but of course the Colonel's code is a lot cleaner. The *apply functions here won't really speed up the calculation, but they do offer tidier code and remove the need for pre-allocation.
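A further option, not covered in the answers above: on R 4.0.0 or later, base sequence() accepts from and by arguments, which gives a fully vectorized one-liner; a minimal sketch:
# Requires R >= 4.0.0, where sequence() gained the `from` and `by` arguments
f.ind <- sequence(f.end - f.start + 1, from = f.start)
all(f.ind == unlist(Map(':', f.start, f.end)))
# [1] TRUE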


Subset list of vectors by position in a vectorized way

I have a list of vectors and I'm trying to select (for example) the 2nd and 4th element in each vector. I can do this using lapply:
list_of_vec <- list(c(1:10), c(10:1), c(1:10), c(10:1), c(1:10))
lapply(1:length(list_of_vec), function(i) list_of_vec[[i]][c(2,4)])
[[1]]
[1] 2 4
[[2]]
[1] 9 7
[[3]]
[1] 2 4
[[4]]
[1] 9 7
[[5]]
[1] 2 4
But is there a way to do this in a vectorized way -- avoiding one of the apply functions? My problem is that my actual list_of_vec is fairly long, so lapply takes a while.
Solutions:
Option 1 @Athe's clever solution using do.call:
do.call(rbind, list_of_vec)[ ,c(2,4)]
Option 2 Using lapply more efficiently:
lapply(list_of_vec, `[`, c(2, 4))
Option 3 A vectorized solution:
starts <- cumsum(c(0, head(lengths(list_of_vec), -1)))  # offset of each vector within unlist(list_of_vec)
matrix(unlist(list_of_vec)[c(starts + 2, starts + 4)], ncol = 2)
Option 4 the lapply solution you wanted to improve:
lapply(1:length(list_of_vec), function(i) list_of_vec[[i]][c(2,4)])
Data:
And a few datasets I will test them on:
# The original data
list_of_vec <- list(c(1:10), c(10:1), c(1:10), c(10:1), c(1:10))
# A long list with short elements
list_of_vec2 <- rep(list_of_vec, 1e5)
# A long list with long elements
list_of_vec3 <- lapply(list_of_vec, rep, 1e3)
list_of_vec3 <- rep(list_of_vec3, 1e4)
Benchmarking:
Original list:
Unit: microseconds
expr min lq mean median uq max neval cld
o1 2.276 2.8450 3.00417 2.845 3.129 10.809 100 a
o2 2.845 3.1300 3.59018 3.414 3.414 23.325 100 a
o3 3.698 4.1250 4.60558 4.267 4.552 20.480 100 a
o4 5.689 5.9735 17.52222 5.974 6.258 1144.606 100 a
Longer list, short elements:
Unit: milliseconds
expr min lq mean median uq max neval cld
o1 146.30778 146.88037 155.04077 149.89164 159.52194 184.92028 10 b
o2 185.40526 187.85717 192.83834 188.42749 190.32103 213.79226 10 c
o3 26.55091 27.27596 28.46781 27.48915 28.84041 32.19998 10 a
o4 407.66430 411.58054 426.87020 415.82161 437.19193 473.64265 10 d
Longer list, long elements:
Unit: milliseconds
expr min lq mean median uq max neval cld
o1 4855.59146 4978.31167 5012.0429 5025.97619 5072.9350 5095.7566 10 c
o2 17.88133 18.60524 103.2154 21.28613 195.0087 311.4122 10 a
o3 855.63128 872.15011 953.8423 892.96193 1069.7526 1106.1980 10 b
o4 37.92927 38.87704 135.6707 124.05127 214.6217 276.5814 10 a
Summary:
Looks like the vectorized solution wins out if the list is long and the elements are short, but lapply is the clear winner for a long list with longer elements. Some of the options output a list, others a matrix. So keep in mind what you want your output to be. Good luck!!!
If your list is composed of vectors of the same length, you could first transform it into a matrix and then get the columns you want.
matrix_of_vec <- do.call(rbind,list_of_vec)
matrix_of_vec[ ,c(2,4)]
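If you then need list output rather than a matrix, the rows can be split back out; a small sketch (asplit() requires R >= 3.6.0):
asplit(matrix_of_vec[ ,c(2,4)], 1)  # one list element per original vector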
Otherwise I'm afraid you'll have to stick to the apply family. The most efficient way to do it is to use the parallel package to compute in parallel (surprisingly).
corenum <- parallel::detectCores() - 1
cl <- parallel::makeCluster(corenum)
parallel::clusterExport(cl, "list_of_vec")
parallel::parSapply(cl, list_of_vec, '[', c(2,4))
parallel::stopCluster(cl)
In this piece of code '[' is the name of the subsetting function and c(2,4) is the argument you pass to it.
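To see why passing '[' works, note that it can be called like any other R function; a tiny illustration:
x <- c(10, 20, 30, 40)
`[`(x, c(2,4))  # equivalent to x[c(2,4)]
# [1] 20 40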

Generating 100,000 samples of size three from numbers 1 to 20, without replacement

I am trying to generate 100,000 samples of size three from the numbers 1 to 20, without replacement, and used the following code in R:
s <- sample(N,3,pi<-n*x/sum(x),replace=FALSE)
[1] 12 6 17
Now this gave me one sample of size three, but how do I generate 100,000 of them? We also used
N<-20 #size of the population we could choose from
n<- 3
x <- runif(N)
pi<-n*x/sum(x)
but I do not know what went wrong. Any advice will be greatly appreciated, thank you.
Your question inspired me to try to write an implementation of multiple sampling-without-replacement using recursion on sampling-with-replacement.
Letting NS represent the number of desired samples and NE the number of elements to select from the input set for each sample, my idea was that it might be beneficial to try to avoid looping over NS sample() calls, which would be time-consuming for large NS. Instead, we can start by running a single sample call taking NS values with replacement, and consider that to represent the "first selection" of each sample. Then, for each unique selection, we can reduce the input set (and the probability weighting vector) by the selected element, and recurse until we've reached NE levels. By combining each (sub)sample, we can produce a matrix whose rows will each consist of a sample-without-replacement of NE values from the input set.
samplesNoReplace <- function(NS, set, NE = length(set), prob = NULL) {
  if (NE > 1L) {
    inds <- sample(seq_along(set), NS, T, prob);
    uris <- split(seq_len(NS), inds);
    us <- as.integer(names(uris));
    res <- base::matrix(set[inds], NS, NE);
    for (ui in seq_along(uris)) {
      u <- us[ui];
      ris <- uris[[ui]];
      res[ris, -1L] <- samplesNoReplace(length(ris), set[-u], NE-1L, prob[-u]);
    }; ## end for
  } else {
    res <- base::matrix(sample(set, NS, T, if (length(set)==1L) NULL else prob), ncol = 1L);
  }; ## end if
  res;
}; ## end samplesNoReplace()
Demo:
set.seed(10L); samplesNoReplace(10L,1:5,3L,c(10,2,2,2,1));
## [,1] [,2] [,3]
## [1,] 1 3 2
## [2,] 1 4 3
## [3,] 1 2 4
## [4,] 3 2 1
## [5,] 1 3 2
## [6,] 1 4 2
## [7,] 1 4 2
## [8,] 1 2 5
## [9,] 3 1 2
## [10,] 1 2 5
Benchmarking
library(microbenchmark);
bgoldst <- function() samplesNoReplace(NS,set,NE,prob);
akrun <- function() { N1 <- seq_len(NS); N <- length(set); lapply(N1, function(i) sample(set, size =NE, replace=FALSE,prob)); };
khashaa <- function() { replicate(NS, sample(set, NE,prob=prob), simplify = FALSE); };
## OP's case (100k samples, smallish set, smaller subset)
set.seed(1L);
NS <- 1e5L; set <- 1:20; NE <- 3L; prob <- runif(length(set));
microbenchmark(times=5L,bgoldst(),akrun(),khashaa());
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst() 40.9888 42.69257 46.33044 46.68856 47.40488 53.8774 5
## akrun() 547.3142 564.94249 599.96134 625.07602 631.19658 631.2774 5
## khashaa() 501.1226 521.14871 531.50227 524.65247 549.47600 561.1116 5
## 10k samples, large set, small subset
set.seed(1L);
NS <- 1e4L; set <- 1:1000; NE <- 5L; prob <- runif(length(set));
microbenchmark(times=5L,bgoldst(),akrun(),khashaa());
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst() 2716.1904 2722.8242 2756.9302 2731.2763 2753.5668 2860.7935 5
## akrun() 682.0505 688.3639 691.3169 689.6165 693.9692 702.5842 5
## khashaa() 684.5865 689.2030 698.8313 693.0822 696.1211 731.1638 5
## 1k samples, large set, large subset
set.seed(1L);
NS <- 1e3L; set <- 1:1000; NE <- 500L; prob <- runif(length(set));
microbenchmark(times=1L,bgoldst(),akrun(),khashaa());
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst() 74478.4313 74478.4313 74478.4313 74478.4313 74478.4313 74478.4313 1
## akrun() 350.7270 350.7270 350.7270 350.7270 350.7270 350.7270 1
## khashaa() 353.2574 353.2574 353.2574 353.2574 353.2574 353.2574 1
## 1M samples, small set, necessarily small subset
set.seed(1L);
NS <- 1e6L; set <- 1:4; NE <- 4L; prob <- runif(length(set));
microbenchmark(times=5L,bgoldst(),akrun(),khashaa());
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst() 502.0865 519.1875 602.5631 627.6124 648.3831 715.5459 5
## akrun() 5450.3987 5653.0774 5817.0921 5799.4497 5987.0575 6195.4771 5
## khashaa() 5301.3673 5667.8592 5683.3805 5744.1461 5824.8801 5878.6497 5
## 10M samples, small set, necessarily small subset
set.seed(1L);
NS <- 1e7L; set <- 1:4; NE <- 4L; prob <- runif(length(set));
microbenchmark(times=1L,bgoldst(),akrun(),khashaa());
## Unit: seconds
## expr min lq mean median uq max neval
## bgoldst() 5.023389 5.023389 5.023389 5.023389 5.023389 5.023389 1
## akrun() 75.891354 75.891354 75.891354 75.891354 75.891354 75.891354 1
## khashaa() 69.422056 69.422056 69.422056 69.422056 69.422056 69.422056 1
The pattern is very interesting and, I think, easily explicable. My function outperforms for many samples, small sets, and small subsets, because there are very few recursions required to cover all possible (sub)sample branches, while the looping solutions must iterate and make a sample() call for every sample. But my function severely underperforms for fewer samples, large sets, and large subsets, because the looping solutions don't have very many iterations to complete, and the tree of (sub)sample branches grows somewhat exponentially with each new selection. Hence, my function is only appropriate for the case of many samples, small sets, and small subsets, which, incidentally, pretty accurately describes your example use-case.
Of course, even for their most unfavorable timings, the looping solutions still perform decently, within approximately an order of magnitude of my function. Furthermore, many millions of samples of a small subset of a small set is unlikely to be required under any circumstances. So, for the sake of simplicity, I wouldn't consider it unreasonable to ignore this solution entirely, and always use the looping approach.
We can use lapply by looping over a sequence
N1 <- seq_len(100000)
N <- 20
lapply(N1, function(i) sample(N, size =3, replace=FALSE))
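For reference, the replicate() variant mentioned below (and wrapped as khashaa() in the benchmarks above) looks roughly like this:
s <- replicate(100000, sample(20, size = 3, replace = FALSE), simplify = FALSE)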
I have tried both the replicate command and the lapply, and both give me 100,000 samples of size three from the numbers 1 to 20, which is good, but now I would like to be able to count how often each number appears. I understand that 9, for example, could potentially turn up 100,000 times, being in all 100,000 3-samples, but more likely it will occur about one twentieth of the time. Since I have 100,000 samples of 3 digits each, the total count across all numbers should be 300,000: if, say, R gave me 100,000 nines, with 9 in every sample, there would be two hundred thousand places left for all the other numbers. I referred to the result as s, and tried
count1 <- length(which(s == 2)); count1 , but this said
Error in which(s == 1) : (list) object cannot be coerced to type 'double',
but I do not get what that means. How do I ask R to give me an accurate count of all ones, all twos, etc., where I assume their total should sum to 300,000, because we end up with 300,000 numbers in the run? Thanks. Chris Lilly.
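The error arises because s is a list, so == cannot be applied to it directly; the list needs to be flattened first. A minimal sketch of one way to get those counts, assuming s holds the 100,000 samples as a list (as produced by the lapply or replicate calls above):
counts <- table(unlist(s))  # how often each of 1 to 20 was drawn
counts
sum(counts)                 # 300,000 in total: 100,000 samples of 3 values each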

How to get row wise standard deviation over specific columns [duplicate]

I'd like to compute the variance for each row in a matrix. For the following matrix A
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 5 6 10
[3,] 50 7 11
[4,] 4 8 12
I would like to get
[1] 16.0000 7.0000 564.3333 16.0000
I know I can achieve this with apply(A,1,var), but is there a faster or better way? From octave, I can do this with var(A,0,2), but I don't get how the Y argument of the var() function in R is to be used.
Edit: The actual dataset of a typical chunk has around 100 rows and 500 columns. The total amount of data is around 50GB though.
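As an aside on the Y argument the question mentions: in R, var()'s second argument is a second data set (var() then returns covariances), not a normalization or dimension flag as in Octave's var(A, 0, 2). A minimal illustration:
A <- matrix(c(1, 5, 50, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 4)
var(A)             # covariance matrix of the columns, same as cov(A)
var(A[,1], A[,2])  # covariance of two vectors, same as cov(A[,1], A[,2])
apply(A, 1, var)   # row-wise variances, as in the question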
You could potentially vectorize var over rows (or columns) using rowSums and rowMeans
RowVar <- function(x, ...) {
rowSums((x - rowMeans(x, ...))^2, ...)/(dim(x)[2] - 1)
}
RowVar(A)
#[1] 16.0000 7.0000 564.3333 16.0000
Using @Richard's data yields:
microbenchmark(apply(m, 1, var), RowVar(m))
## Unit: milliseconds
## expr min lq median uq max neval
## apply(m, 1, var) 343.369091 400.924652 424.991017 478.097573 746.483601 100
## RowVar(m) 1.766668 1.916543 2.010471 2.412872 4.834471 100
You can also create a more general function that accepts a syntax similar to apply but remains vectorized (the column-wise variance will be slower, as the matrix needs to be transposed first):
MatVar <- function(x, dim = 1, ...) {
  if (dim == 1) {
    rowSums((x - rowMeans(x, ...))^2, ...)/(dim(x)[2] - 1)
  } else if (dim == 2) {
    rowSums((t(x) - colMeans(x, ...))^2, ...)/(dim(x)[1] - 1)
  } else stop("Please enter valid dimension")
}
MatVar(A, 1)
## [1] 16.0000 7.0000 564.3333 16.0000
MatVar(A, 2)
##         V1       V2       V3
## 547.333333 1.666667 1.666667
This is one of the main reasons why apply() is useful. It is meant to operate on the margins of an array or matrix.
set.seed(100)
m <- matrix(sample(1e5L), 1e4L)
library(microbenchmark)
microbenchmark(apply(m, 1, var))
# Unit: milliseconds
# expr min lq median uq max neval
# apply(m, 1, var) 270.3746 283.9009 292.2933 298.1297 343.9531 100
Is 300 milliseconds too long to make 10,000 calculations?

Efficient dataframe iteration in R

Suppose I have a 5 million row data frame with two columns, as such (this example data frame has only eight rows for simplicity):
df <- data.frame(start=c(11,21,31,41,42,54,61,63), end=c(20,30,40,50,51,63,70,72))
I want to be able to produce the following numbers in a numeric vector:
11 to 20, 21 to 30, 31 to 40, 41 to 50, 51, 54 to 63, 64 to 70, 71 to 72
And then take the length of the new vector (in this case, 10+10+10+10+1+10+7+2 = 60).
*NOTE, I do not need the vector itself; just its length will suffice. So if someone has a more intelligent logical approach to obtain the length, that is welcomed.
Essentially, for each row in the dataframe, the sequence from start to end was taken, all these sequences were combined, and the result was filtered for unique values.
So I used an approach as such:
length(unique(c(apply(df, 1, function(x) {
return(as.numeric(x[1]):as.numeric(x[2]))
}))))
which proves incredibly slow on my five million row data frame.
Any quicker more efficient solutions? Bonus, please try to add system time.
user system elapsed
19.946 0.620 20.477
This should work, assuming your data is sorted.
library(dplyr) # for the lag function
with(df, sum(end - pmax(start, lag(end, 1, default = 0)+1) + 1))
#[1] 60
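If you would rather not load dplyr just for lag(), the same shifted-end idea can be written in base R; a sketch, equivalent to the matrix approach shown further down:
with(df, sum(end - pmax(start, c(0, head(end, -1)) + 1) + 1))
#[1] 60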
library(microbenchmark)
microbenchmark(
beginneR={with(df, sum(end - pmax(start, lag(end, 1, default = 0)+1) + 1))},
r2evans={vec <- pmax(mm[,1], c(0,1+head(mm[,2],n=-1))); sum(mm[,2]-vec+1);},
times = 1000
)
Unit: microseconds
expr min lq median uq max neval
beginneR 37.398 41.4455 42.731 44.0795 74.349 1000
r2evans 31.788 35.2470 36.827 38.3925 9298.669 1000
So matrix is still faster, but not much (and the conversion step is still not included here). And I wonder why the max duration in @r2evans's answer is so high compared to all other values (which are really fast).
Another method:
mm <- as.matrix(df) ## critical for performance/scalability
(vec <- pmax(mm[,1], c(0,1+head(mm[,2],n=-1))))
## [1] 11 21 31 41 51 54 64 71
sum(mm[,2] - vec + 1)
## [1] 60
(This should scale reasonably well, certainly better than data.frames.)
Edit: after I updated my code to use matrices and no apply calls, I did a quick benchmark of my implementation compared with the other answer (which is also correct):
library(microbenchmark)
library(dplyr)
microbenchmark(
beginneR={
df <- data.frame(start=c(11,21,31,41,42,54,61,63),
end=c(20,30,40,50,51,63,70,72))
with(df, sum(end - pmax(start, lag(end, 1, default = 0)+1) + 1))
},
r2evans={
mm <- matrix(c(11,21,31,41,42,54,61,63,
20,30,40,50,51,63,70,72), nc=2)
vec <- pmax(mm[,1], c(0,1+head(mm[,2],n=-1)))
sum(mm[,2]-vec+1)
}
)
## Unit: microseconds
## expr min lq median uq max neval
## beginneR 230.410 238.297 244.9015 261.228 443.574 100
## r2evans 37.791 40.725 44.7620 47.880 147.124 100
This benefits greatly from the use of matrices instead of data.frames.
Oh, and system time is not that helpful here :-)
system.time({
mm <- matrix(c(11,21,31,41,42,54,61,63,
20,30,40,50,51,63,70,72), nc=2)
vec <- pmax(mm[,1], c(0,1+head(mm[,2],n=-1)))
sum(mm[,2]-vec+1)
})
## user system elapsed
## 0 0 0

Vectorize comparison of a row vector with every row of a dataframe in R?

Suppose I have a data frame that comes from reading in the following file Foo.csv
A,B,C
1,2,3
2,2,4
1,7,3
I would like to count the number of matching elements between the first row and subsequent rows. For example, the first row matches with the second row in one position, and matches with the third row in two positions. Here is some code that will achieve the desired effect.
foo = read.csv("Foo.csv")
numDiffs = rep(0,dim(foo)[1])
for (i in 2:dim(foo)[1]) {
numDiffs[i] = sum(foo[i,] == foo[1,])
}
print(numDiffs)
My question is, can this be vectorized to kill the loop and possibly reduce the running time? My first attempt is below, but it leaves an error because == is not defined for this type of comparison.
colSums(foo == foo[1,])
> rowSums(sapply(foo, function(x) c(0,x[1] == x[2:nrow(foo)])))
[1] 0 1 2
Or using the automatic recycling of matrix comparisons:
bar <- as.matrix(foo)
c(0, rowSums(t(t(bar[-1, ]) == bar[1, ])))
# [1] 0 1 2
t() is there twice because the recycling is column- rather than row-wise.
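A small illustration of that column-wise recycling, unrelated to the specific data here:
m <- matrix(1:6, nrow = 2)
m == c(1, 2)  # c(1, 2) is recycled down the columns, not across the rows
#       [,1]  [,2]  [,3]
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE FALSE FALSE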
As your dataset grows larger, you might get a bit more speed with something like this:
as.vector(c(0, rowSums(foo[rep(1, nrow(foo) - 1), ] == foo[-1, ])))
# [1] 0 1 2
The basic idea is to create a data.frame of the first row the same dimensions of the overall dataset less one row, and use that to check for equivalence with the remaining rows.
Deleting my original update, here are some benchmarks instead. Change "N" to see the effect on different data.frame sizes. The solution from @nacnudus scales best.
set.seed(1)
N <- 10000000
mydf <- data.frame(matrix(sample(10, N, replace = TRUE), ncol = 10))
dim(mydf)
# [1] 1000000 10
fun1 <- function(data) rowSums(sapply(data, function(x) c(0,x[1] == x[2:nrow(data)])))
fun2 <- function(data) as.vector(c(0, rowSums(data[rep(1, nrow(data) - 1), ] == data[-1, ])))
fun3 <- function(data) {
bar <- as.matrix(data)
c(0, rowSums(t(t(bar[-1, ]) == bar[1, ])))
}
library(microbenchmark)
## On your original sample data
microbenchmark(fun1(foo), fun2(foo), fun3(foo))
# Unit: microseconds
# expr min lq median uq max neval
# fun1(foo) 109.903 119.0975 122.5185 127.0085 228.785 100
# fun2(foo) 333.984 354.5110 367.1260 375.0370 486.650 100
# fun3(foo) 233.490 250.8090 264.7070 269.8390 518.295 100
## On the sample data created above--I don't want to run this 100 times!
system.time(fun1(mydf))
# user system elapsed
# 15.53 0.06 15.60
system.time(fun2(mydf))
# user system elapsed
# 2.05 0.01 2.06
system.time(fun3(mydf))
# user system elapsed
# 0.32 0.00 0.33
HOWEVER, if Codoremifa were to change their code to vapply instead of sapply, that answer wins! From 15 seconds down to 0.24 seconds on 1 million rows.
fun4 <- function(data) {
rowSums(vapply(data, function(x) c(0, x[1] == x[2:nrow(data)]),
vector("numeric", length=nrow(data))))
}
microbenchmark(fun3(mydf), fun4(mydf), times = 20)
# Unit: milliseconds
# expr min lq median uq max neval
# fun3(mydf) 369.5957 422.9507 438.8742 462.6958 486.3757 20
# fun4(mydf) 238.1093 316.9685 323.0659 328.0969 341.5154 20
eh, I don't see why you can't just do..
c(foo[1,]) == foo
# A B C
#[1,] TRUE TRUE TRUE
#[2,] FALSE TRUE FALSE
#[3,] TRUE FALSE TRUE
.. or even better foo[1,,drop=TRUE] == foo...
Thus the result becomes...
rowSums( c( foo[1,] ) == foo )
#[1] 3 1 2
Remember, foo[1,] is still a data.frame. Coerce it to a vector and == is defined for what you need. This seems to be a little quicker than the vapply answer suggested by @AnandaMahto on a big dataframe.
Benchmarking
Comparing this against fun3 and fun4 from @AnandaMahto's answer above, I see a small speed improvement when using the larger data.frame, mydf...
microbenchmark(fun3(mydf), fun4(mydf), fun6(mydf) , times = 20)
#Unit: milliseconds
# expr min lq median uq max neval
# fun3(mydf) 320.7485 344.9249 356.1657 365.7576 399.5334 20
# fun4(mydf) 299.6660 313.7105 319.1700 327.8196 555.4625 20
# fun6(mydf) 196.8244 241.4866 252.6311 258.8501 262.7968 20
fun6 is defined as...
fun6 <- function(data) rowSums( c( data[1,] ) == data )
