Repeat sequence every nth element x times - r

I have a vector, e.g.:
v <- c(1, 2, 3, 4)
I want to repeat each consecutive block of n elements x times, e.g.:
x=2
n= 2
[1] 1, 2, 1, 2, 3, 4, 3, 4
I know that
rep(v, times=n)
[1] 1, 2, 3, 4, 1, 2, 3, 4
rep(v, each=n)
[1] 1, 1, 2, 2, 3, 3, 4, 4
Thanks!

You could split the vector and then repeat:
fun <- function(v, m, n) {
unlist(by(v, ceiling(seq_along(v) / m), rep, n), use.names = FALSE)
}
v <- c(1, 2, 3, 4)
fun(v, 2, 2)
# [1] 1 2 1 2 3 4 3 4
fun(v, 3, 3)
# [1] 1 2 3 1 2 3 1 2 3 4 4 4

Another split option:
unlist(rep(split(v,(seq_along(v)-1) %/% n), each = x), use.names = FALSE)
#[1] 1 2 1 2 3 4 3 4
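To see why the split option works, it can help to look at the intermediate values. The following is a minimal sketch of the same one-liner broken into steps; the names grp, chunks and res are only illustrative.

```r
v <- c(1, 2, 3, 4)
n <- 2  # chunk length
x <- 2  # number of times each chunk is repeated

# Group label for each element: 0 0 1 1
grp <- (seq_along(v) - 1) %/% n

# List of chunks, one per group: c(1, 2) and c(3, 4)
chunks <- split(v, grp)

# rep() on a list with each = x duplicates every chunk in place,
# so the chunk order is preserved before flattening
res <- unlist(rep(chunks, each = x), use.names = FALSE)
res
#> [1] 1 2 1 2 3 4 3 4
```

If length(v) is not a multiple of n, the last chunk is simply shorter and is repeated like the others.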

We may create a grouping index with gl and use tapply
f_rep <- function(v, x, n)
{
unname(unlist(tapply(v, as.integer(gl(length(v), x,
length(v))), rep, times = n)))
}
Testing:
> x <- 2
> n <- 2
> f_rep(v, x, n)
[1] 1 2 1 2 3 4 3 4
> f_rep(v, 3, 3)
[1] 1 2 3 1 2 3 1 2 3 4 4 4

With sequence:
v <- 9:1
x <- 2L
n <- 3L
v[sequence(rep(n, x*(length(v) %/% n)), rep(seq(1, length(v), n), each = x))]
#> [1] 9 8 7 9 8 7 6 5 4 6 5 4 3 2 1 3 2 1
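For readers unfamiliar with the two-argument form of sequence(), here is a small sketch of what it produces. It assumes R 4.0.0 or later, where sequence() accepts a from argument; idx is just an illustrative name.

```r
# sequence(nvec, from): for each i, generate nvec[i] consecutive integers
# starting at from[i], then concatenate the pieces (assumes R >= 4.0.0).
idx <- sequence(c(3, 3, 2), from = c(1, 1, 4))
idx
#> [1] 1 2 3 1 2 3 4 5

# Used as an index, this repeats the first block of 3 twice,
# then takes a block of 2 starting at position 4
letters[idx]
#> [1] "a" "b" "c" "a" "b" "c" "d" "e"
```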
If it needs to handle vectors whose lengths are not multiples of n:
v <- 1:5
x <- 3L
n <- 3L
v[
sequence(
c(
rep(n, x*(length(v) %/% n)),
rep(length(v) %% n, x)
),
c(
rep(seq(1, length(v), n), each = x),
length(v) - length(v) %% n + 1L
)
)
]
#> [1] 1 2 3 1 2 3 1 2 3 4 5 4 5 4 5

Related

How do I create a simple table in R using for loops?

I was asked to create a table with three columns, A, B and C and eight rows. Column A must go 1, 1, 1, 1, 2, 2, 2, 2. Column B must alternate 1, 2, 1, 2, 1, 2, 1, 2. And column C must go 1, 1, 2, 2, 1, 1, 2, 2. I am able to produce the A column data fine, but don't know how to get B or C. This is the code I have so far:
dataSheet <- matrix(nrow = 0, ncol = 3)
colnames(dataSheet) <- c('A', 'B', 'C')
A <- 1
B <- 1
C <- 1
for (A in 1:4){
A=1
dataSheet <- rbind(dataSheet, c(A, B, C))
}
for (A in 5:8){
A=2
dataSheet <- rbind(dataSheet, c(A, B, C))
}
This seems like a good excuse to get familiar with the rep() function, which handles this question easily and, with a little creativity, many more complicated ones:
dt <- data.frame(A = rep(1:2, each = 4),
B = rep(1:2, times = 4),
C = rep(1:2, each = 2))
dt
#> A B C
#> 1 1 1 1
#> 2 1 2 1
#> 3 1 1 2
#> 4 1 2 2
#> 5 2 1 1
#> 6 2 2 1
#> 7 2 1 2
#> 8 2 2 2
Created on 2019-01-26 by the reprex package (v0.2.1)
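In the data.frame() call above, column C has only four values and relies on recycling to fill eight rows. As a small sketch, the same three patterns can be written out at full length by combining the each and times arguments of rep():

```r
A <- rep(1:2, each = 4)             # 1 1 1 1 2 2 2 2
B <- rep(1:2, times = 4)            # 1 2 1 2 1 2 1 2
C <- rep(1:2, each = 2, times = 2)  # 1 1 2 2 1 1 2 2

# each is applied first, then the result is repeated `times` times
data.frame(A, B, C)
```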
Simply use R's vectorization for this task, i.e.
A <- c(1, 1, 1, 1, 2, 2, 2, 2)
B <- c(1, 2, 1, 2, 1, 2, 1, 2) # or rep(1:2, 4)
C <- c(1, 1, 2, 2, 1, 1, 2, 2)
cbind(A,B,C)
Maybe something along the lines of the following will be acceptable to your professor. It starts from the empty dataSheet matrix defined in your question.
for (i in 1:8){
A <- if(i <= 4) 1 else 2
B <- if(i %% 2) 1 else 2
# C is 1 for rows 1, 2, 5, 6 (i %% 4 is 1 or 2) and 2 otherwise
C <- if(i %% 4 %in% c(1, 2)) 1 else 2
dataSheet <- rbind(dataSheet, c(A, B, C))
}
dataSheet
# A B C
#[1,] 1 1 1
#[2,] 1 2 1
#[3,] 1 1 2
#[4,] 1 2 2
#[5,] 2 1 1
#[6,] 2 2 1
#[7,] 2 1 2
#[8,] 2 2 2
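If a loop is required, the three columns can also be computed from the row number with integer arithmetic. This is only a sketch: the names dataSheet2, n_rows and k are illustrative, and the matrix is preallocated rather than grown with rbind().

```r
n_rows <- 8

# Preallocate the result instead of growing it row by row
dataSheet2 <- matrix(NA_integer_, nrow = n_rows, ncol = 3,
                     dimnames = list(NULL, c("A", "B", "C")))

for (i in seq_len(n_rows)) {
  k <- i - 1L                                   # zero-based row number
  dataSheet2[i, "A"] <- k %/% 4L + 1L           # changes every 4 rows
  dataSheet2[i, "B"] <- k %% 2L + 1L            # alternates every row
  dataSheet2[i, "C"] <- (k %/% 2L) %% 2L + 1L   # changes every 2 rows
}
dataSheet2
```

Column C comes out as 1, 1, 2, 2, 1, 1, 2, 2, which is the pattern the question asks for.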

Extract multiple ranges from a numeric vector

First, let me simplify my question. I want to extract certain ranges from a numeric vector. For example, extracting 3 ranges from 1:20 at the same time:
1 < x < 5
8 < x < 12
17 < x < 20
Therefore, the expected output is 2, 3, 4, 9, 10, 11, 18, 19.
I tried to use findInterval() with its rightmost.closed and left.open arguments, but no combination of them gives the expected result.
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
x[findInterval(x, v) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19
x[findInterval(x, v, rightmost.closed = T) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19 20
x[findInterval(x, v, left.open = T) %% 2 == 1]
# [1] 2 3 4 5 9 10 11 12 18 19 20
By the way, the conditions can also be given as a matrix like this:
[,1] [,2]
[1,] 1 5
[2,] 8 12
[3,] 17 20
I don't want to use a for loop unless it's necessary.
I would be grateful for any help.
I'd probably do it using purrr::map2 or Map, passing your lower and upper bounds as arguments and filtering the vector with a custom function. Note that this returns a list with one element per range:
library(purrr)
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)
map2(
lower_bounds, upper_bounds, function(lower, upper) {
x[x > lower & x < upper]
}
)
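The Map variant mentioned above could look like the following sketch in base R. The names pieces and result are illustrative; unlist() flattens the per-range list into the single vector the question expects.

```r
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)

# One list element per (lower, upper) pair, using strict inequalities
pieces <- Map(function(lower, upper) x[x > lower & x < upper],
              lower_bounds, upper_bounds)

# Flatten the list into one vector
result <- unlist(pieces, use.names = FALSE)
result
#> [1]  2  3  4  9 10 11 18 19
```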
You may use data.table::inrange and its incbounds argument. Assuming the ranges are in a matrix 'm', as shown in your question:
m <- matrix(v, ncol = 2, byrow = TRUE)
x[data.table::inrange(x, m[ , 1], m[ , 2], incbounds = FALSE)]
# [1] 2 3 4 9 10 11 18 19
You were on the right path, and left.open does help, but rightmost.closed concerns only the last interval rather than the right end of every interval. Hence, we need to call findInterval twice, once with the default left-closed intervals and once with left.open = TRUE, and keep the values that land in an odd-numbered interval both times. As you figured out yourself, a good way to do that is
x[findInterval(x, v) %% 2 == 1 & findInterval(x, v, left.open = TRUE) %% 2 == 1]
# [1] 2 3 4 9 10 11 18 19
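A short sketch of why intersecting the two calls gives open intervals; the names closed_left and closed_right are only illustrative.

```r
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)

# Default: intervals are [lo, hi), so the lower bounds 1, 8, 17 are kept
closed_left <- findInterval(x, v) %% 2 == 1
x[closed_left]
#> [1]  1  2  3  4  8  9 10 11 17 18 19

# left.open = TRUE: intervals are (lo, hi], so the upper bounds 5, 12, 20 are kept
closed_right <- findInterval(x, v, left.open = TRUE) %% 2 == 1
x[closed_right]
#> [1]  2  3  4  5  9 10 11 12 18 19 20

# Only values strictly inside a range pass both tests
res <- x[closed_left & closed_right]
res
#> [1]  2  3  4  9 10 11 18 19
```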
Clearly there are alternatives. E.g.,
fun <- function(x, v)
if(length(v) > 1) v[1] < x & x < v[2] | fun(x, v[-1:-2]) else FALSE
x[fun(x, v)]
# [1] 2 3 4 9 10 11 18 19
I found an easy way just with sapply() :
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
(v.df <- as.data.frame(matrix(v, 3, 2, byrow = T)))
# V1 V2
# 1 1 5
# 2 8 12
# 3 17 20
y <- sapply(x, function(x){
ind <- (x > v.df$V1 & x < v.df$V2)
if(any(ind)) x else NA
})
y[!is.na(y)]
# [1] 2 3 4 9 10 11 18 19
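When the ranges are given as a matrix, the same test can also be written without sapply() by comparing every value against every bound with outer(). This is a sketch of a different technique from the one above, and the names m, inside and res are illustrative.

```r
x <- 1:20
m <- matrix(c(1, 5, 8, 12, 17, 20), ncol = 2, byrow = TRUE)

# inside[i, j] is TRUE when x[i] lies strictly within range j
inside <- outer(x, m[, 1], ">") & outer(x, m[, 2], "<")

# Keep values that fall in at least one range
res <- x[rowSums(inside) > 0]
res
#> [1]  2  3  4  9 10 11 18 19
```

This builds a length(x) by nrow(m) logical matrix, so it trades memory for the absence of an explicit loop.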

How to get indices of top k values for each (selected) column in data.table

How can I find the indices of the top k (say k=3) values for each column?
> dt <- data.table( x = c(1, 1, 3, 1, 3, 1, 1), y = c(1, 2, 1, 2, 2, 1, 1) )
> dt
x y
1: 1 1
2: 1 2
3: 3 1
4: 1 2
5: 3 2
6: 1 1
7: 1 1
Required output:
> output.1
x y
1: 1 2
2: 3 4
3: 5 5
Or even better (notice the additional helpful descending sort in x):
> output.2
var top1 top2 top3
1: x 3 5 1
2: y 2 4 5
Having the output would already be a great help.
We can use sort (with index.return=TRUE) after looping over the columns of the dataset with lapply
dt[, lapply(.SD, function(x) sort(head(sort(x,
decreasing=TRUE, index.return=TRUE)$ix,3)))]
# x y
#1: 1 2
#2: 3 4
#3: 5 5
Or use order
dt[, lapply(.SD, function(x) sort(head(order(-x),3)))]
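For the second layout in the question (one row per variable, indices ordered by descending value), a base R sketch along the same lines might look like this. It uses a plain data.frame to stay self-contained, and df, k and top_idx are illustrative names.

```r
df <- data.frame(x = c(1, 1, 3, 1, 3, 1, 1), y = c(1, 2, 1, 2, 2, 1, 1))
k <- 3

# order(-col) lists row indices by descending value; ties keep their
# original order, so head() picks the first k of them
top_idx <- t(sapply(df, function(col) head(order(-col), k)))
colnames(top_idx) <- paste0("top", seq_len(k))
top_idx
#>   top1 top2 top3
#> x    3    5    1
#> y    2    4    5
```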
If the order of elements with the same rank doesn't matter, then this answer is also valid.
The order information can be extracted from data.table index.
library(data.table)
dt = data.table(x = c(1, 1, 3, 1, 3, 1, 1), y = c(1, 2, 1, 2, 2, 1, 1))
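# set2key() is no longer available in current data.table releases;
# setindex() creates the same kind of secondary index
setindex(dt, x)
setindex(dt, y)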
tail.index = function(dt, index, n){
idx = attr(attr(dt, "index"), index)
rev(tail(idx, n))
}
tail.index(dt, "__x", 3L)
#[1] 5 3 7
tail.index(dt, "__y", 3L)
#[1] 5 4 2
Here's a verbose solution which I'm sure undermines the slickness of the data.table package:
dt$idx <- seq_len(nrow(dt))
k <- 3
top_x <- dt[order(-x), idx[1:k]]
top_y <- dt[order(-y), idx[1:k]]
dt_top <- data.table(top_x, top_y)
dt_top
# top_x top_y
# 1: 3 2
# 2: 5 4
# 3: 1 5

Reverse score a vector

I have numeric vectors, such as c(1, 2, 3, 3, 2, 1, 3) or c(1, 4, 1, 4, 4, 1), and I would like to keep individual element's position, but swap/reverse the value, so that we get c(3, 2, 1, 1, 2, 3, 1), c(4, 1, 4, 1, 1, 4) respectively.
To achieve that, I came up with a rather rough and ugly code below with lots of debugging and patching...
blah <- c(1, 4, 1, 4, 4, 1, 3)
blah.uniq <- sort(unique(blah))
blah.uniq.len <- length(blah.uniq)
j <- 1
end <- ceiling(blah.uniq.len / 2)
if(end == 1) {end <- 2} # special case like c(1,4,1), should get c(4,1,4)
for(i in blah.uniq.len:end) {
x <- blah == blah.uniq[i]
y <- blah == blah.uniq[j]
blah[x] <- blah.uniq[j]
blah[y] <- blah.uniq[i]
j = j + 1
}
blah
Is there an easier way to do this?
I think you're trying to reverse score. The algorithm is (1 + max(x_i)) - x_i
so...
x <- c(1, 2, 3, 3, 2, 1, 3)
y <- c(1, 4, 1, 4, 4, 1)
(max(x, na.rm=T) + 1) - x
(max(y, na.rm=T) + 1) - y
yielding:
> (max(x, na.rm=T) + 1) - x
[1] 3 2 1 1 2 3 1
> (max(y, na.rm=T) + 1) - y
[1] 4 1 4 1 1 4
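The formula above assumes the scale starts at 1. As a hedged sketch, mirroring around the observed range, (min + max) - x, gives the same result for those examples and also handles scales with other starting points. The function name reverse_score is illustrative, and the approach assumes the scores come from an evenly spaced scale.

```r
# Mirror each score around the midpoint of the observed range
reverse_score <- function(x) {
  (min(x, na.rm = TRUE) + max(x, na.rm = TRUE)) - x
}

reverse_score(c(1, 2, 3, 3, 2, 1, 3))
#> [1] 3 2 1 1 2 3 1
reverse_score(c(1, 4, 1, 4, 4, 1))
#> [1] 4 1 4 1 1 4
reverse_score(c(-3, -1, 0, -2, 3, 2, 1))
#> [1]  3  1  0  2 -3 -2 -1
```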
Per the OP's comment:
rev.score <- function(x) {
h <- unique(x)
a <- seq(min(h, na.rm=T), max(h, na.rm=T))
b <- rev(a)
dat <- data.frame(a, b)
dat[match(x, dat[, 'a']), 2]
}
x <- c(1, 2, 3, 3, 2, 1, 3)
rev.score(x)
y <- c(1, 4, 1, 4, 4, 1)
rev.score(y)
z <- c(1, 5, 10, -3, -5, 2)
rev.score(z)
Congratulations! You might have finally found a use for factors; I was still looking for one :-)
x <- c(1, 2, 3, 3, 2, 1, 3)
# [1] 1 2 3 3 2 1 3
y <- factor(x)
# [1] 1 2 3 3 2 1 3
# Levels: 1 2 3
levels(y) <- rev(levels(y))
# [1] 3 2 1 1 2 3 1
# Levels: 3 2 1
Built on that idea, here is a function that returns an object with the same class as the input:
swap <- function(x) {
f <- factor(x)
y <- rev(levels(f))[f]
class(y) <- class(x)
return(y)
}
swap(c(1, 2, 3, 3, 2, 1, 3))
# [1] 3 2 1 1 2 3 1
swap(c(1, 4, 1, 4, 4, 1))
# [1] 4 1 4 1 1 4
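The same rank-swapping idea can be sketched without factors by matching against the sorted unique values. The name swap_by_rank is illustrative. Unlike an arithmetic reversal, this only ever returns values that occur in the input, so the k-th smallest value is exchanged with the k-th largest.

```r
swap_by_rank <- function(x) {
  u <- sort(unique(x))   # distinct values in ascending order
  rev(u)[match(x, u)]    # k-th smallest becomes k-th largest
}

swap_by_rank(c(1, 2, 3, 3, 2, 1, 3))
#> [1] 3 2 1 1 2 3 1
swap_by_rank(c(1, 4, 1, 4, 4, 1, 3))
#> [1] 4 1 4 1 1 4 3
```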
A possible generalisable function.
revscore <- function(x) {
rx <- min(x):max(x)
rev(rx)[sapply(1:length(x), function(y) match(x[y],rx))]
}
x1 <- c(-3,-1,0,-2,3,2,1)
x2 <- c(-1,0,1,2)
x3 <- 1:7
Some testing:
> x1
[1] -3 -1 0 -2 3 2 1
> revscore(x1)
[1] 3 1 0 2 -3 -2 -1
> x2
[1] -1 0 1 2
> revscore(x2)
[1] 2 1 0 -1
> x3
[1] 1 2 3 4 5 6 7
> revscore(x3)
[1] 7 6 5 4 3 2 1

Sort list of vectors into single frequency table with a factor column

I have a data frame containing a list vector with jagged entries:
df = data.frame(x = rep(c(1,2), 2), y = rep(c("a", "b"), each = 2))
L = list()
for (each in round(runif(4, 1,5))) L = c(L, list(1:each))
df$L = L
For example,
x y L
1 a 1
2 a 1, 2, 3, 4
1 b 1, 2, 3
2 b 1, 2, 3
How could I create a table which counts the values of L for each x, across the values of y? So, in this example it would output something like,
x  1 2 3 4
1  2 1 1 0
2  2 2 2 1
I had some luck using
tablist = function(L) table(unlist(L))
tapply(df$L, df$x, tablist)
which produces,
$`1`
1 2 3
2 1 1
$`2`
1 2 3 4
2 2 2 1
However, I'm not sure how to go from here to a single table. Also, I'm beginning to suspect that this approach might start taking an unruly amount of time for large data frames. Any thoughts or suggestions would be greatly appreciated!
Using plyr:
library(plyr)
df = data.frame(x = rep(c(1,2), 2), y = rep(c("a", "b"), each = 2))
L = list()
set.seed(2)
for (each in round(runif(4, 1,5))) L = c(L, list(1:each))
df$L = L
> df
x y L
1 1 a 1, 2
2 2 a 1, 2, 3, 4
3 1 b 1, 2, 3
4 2 b 1, 2
> table(ddply(df,.(x),summarize,unlist(L)))
..1
x 1 2 3 4
1 2 2 1 0
2 2 2 1 1
If you're not into plyr...
vals <- unique(unlist(df$L))
names(vals) <- vals
do.call("rbind",
lapply(split(df,df$x),function(byx){
sapply(vals, function(i){
sum(unlist(sapply(byx$L,"==",i)))
})
})
)
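Another base R possibility, sketched here with the fixed example from the question rather than random data: repeat each x once per element of its list entry, then cross-tabulate against the flattened values. The name tab is illustrative, and lengths() assumes R 3.2.0 or later.

```r
df <- data.frame(x = rep(c(1, 2), 2), y = rep(c("a", "b"), each = 2))
df$L <- list(1L, 1:4, 1:3, 1:3)

# rep() stretches x to line up with unlist(df$L), element by element
tab <- table(x = rep(df$x, lengths(df$L)), value = unlist(df$L))
tab
#>    value
#> x   1 2 3 4
#>   1 2 1 1 0
#>   2 2 2 2 1
```

Because this makes a single call to table() on two flat vectors, it avoids building one table per group and combining them afterwards.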
