I have two values, 3 and 12, and I make a vector from them:
num1 <- 3
num2 <- 12
a <- c(num1, num2)
I want to add a number (12) to vector "a" repeatedly,
and build a new vector by appending each result,
like this:
3,12, 15,24, 27,36, 39,48 ....
The repeat count "n" is 6.
I don't have any idea how to do this.
Here are two methods in base R.
With outer, you could do
c(outer(c(3, 12), (12 * 0:4), "+"))
[1] 3 12 15 24 27 36 39 48 51 60
Or, with sapply, you can explicitly loop over the multipliers and calculate each pair of sums.
c(sapply(0:4, function(i) c(3, 12) + (12 * i)))
[1] 3 12 15 24 27 36 39 48 51 60
outer returns a matrix holding the sum of every pair of elements from the two vectors; c then flattens it, column by column, into a vector. sapply loops through 0:4 and adds 12 * i to c(3, 12) at each step. It also returns a matrix in this instance, so c is again used to flatten it into a vector.
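To make the intermediate matrix visible, here is a minimal sketch of the same outer idea with the inputs given names. The names step, n_pairs and offsets are illustrative assumptions, not part of the question:

```r
# Sketch: same outer() idea, with named inputs (names are illustrative)
a <- c(3, 12)      # starting pair from the question
step <- 12         # amount added at each repetition
n_pairs <- 4       # assumed number of pairs to generate

offsets <- step * (seq_len(n_pairs) - 1)   # 0, 12, 24, 36
m <- outer(a, offsets, "+")                # 2 x n_pairs matrix, one column per pair
res <- c(m)                                # flatten column by column
print(m)
print(res)
```

Each column of m is one shifted copy of a, which is why flattening column by column gives the pairs in the desired order.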
Here is a somewhat generic function that takes as input your original vector a, the number to add (12), and n:
f1 <- function(vec, x, n){
  len1 <- length(vec)
  v1 <- sapply(seq(n/len1), function(i) x*i)
  v2 <- rep(v1, each = n/length(v1))
  v3 <- rep(vec, n/len1)
  return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48
f1(a, 11, 12)
#[1] 3 12 14 23 25 34 36 45 47 56 58 67 69 78
f1(a, 3, 2)
#[1] 3 12 6 15
EDIT
If by n = 6 you mean repeating the whole vector 6 times, then:
f1 <- function(vec, x, n){
  len1 <- length(vec)
  v1 <- sapply(seq(n), function(i) x*i)
  v2 <- rep(v1, each = len1)
  v3 <- rep(vec, n)
  return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48 51 60 63 72 75 84
Using rep for repeating and cumsum for the addition:
n = 6
rep(a, n) + cumsum(rep(c(12, 0), n))
# [1] 15 24 27 36 39 48 51 60 63 72 75 84
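Note that the rep/cumsum result above starts at 15, so the original pair is not part of it. A minimal sketch, assuming the original pair should lead the output, either prepends a or builds the offsets directly (the names with_start and no_cumsum are illustrative):

```r
a <- c(3, 12)
n <- 6

# Prepend the original pair to the rep/cumsum result
with_start <- c(a, rep(a, n) + cumsum(rep(c(12, 0), n)))

# Equivalent without cumsum: repeat a (n + 1) times and add 0, 12, 24, ... per pair
no_cumsum <- rep(a, times = n + 1) + rep(12 * 0:n, each = length(a))

print(with_start)
print(identical(with_start, no_cumsum))
```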
I was wondering how I could eliminate, from the second element of list x onward (in this case x[[2]], i.e. 0:90), the values whose corresponding y is 0?
x = list(0:5, 0:90) # from the second variable on, in this list, eliminate elements whose
# corresponding `y` is `0` ?
y = lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)
P.S. My goal is to possibly do this using lapply for any larger list.
In this case, you could do
x[[2]][y[[2]] != 0]
to get your expected output.
However, as mentioned, you have a larger list and want to do this for each of its elements. In that case, we could use mapply:
mapply(function(p, q) p[q != 0], x[2:length(x)], y[2:length(y)], SIMPLIFY = FALSE)
Or, if we want to use lapply, we could do
lapply(2:length(x), function(i) x[[i]][y[[i]] != 0])
If we want to keep the first element as it is, we could do
c(list(x[[1]]), lapply(2:length(x), function(i) x[[i]][y[[i]] != 0]))
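One caveat with 2:length(x) is that it counts down (2:1) when the list has a single element. The following is a minimal sketch of a variant that avoids this by indexing with seq_along(x)[-1]; the helper name keep_nonzero is hypothetical:

```r
x <- list(0:5, 0:90)
y <- lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)

# Hypothetical helper: keep the first element untouched, and filter every
# later element by its matching y
keep_nonzero <- function(x, y) {
  idx <- seq_along(x)[-1]                  # empty when length(x) == 1
  x[idx] <- Map(function(p, q) p[q != 0], x[idx], y[idx])
  x
}

res <- keep_nonzero(x, y)
str(res)
```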
EDIT
To maintain the order, we can rearrange both x and y so that the element with the smallest maximum (smallest_max) comes first:
get_new_list <- function(x, y) {
  smallest_max <- which.min(sapply(x, max))
  new_x <- c(x[smallest_max], x[-smallest_max])
  new_y <- c(y[smallest_max], y[-smallest_max])
  c(new_x[1], lapply(2:length(new_x), function(i) new_x[[i]][new_y[[i]] != 0]))
}
x = list(0:5, 0:40)
y = lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)
get_new_list(x, y)
#[[1]]
#[1] 0 1 2 3 4 5
#[[2]]
#[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
x = list(0:40, 0:5)
y = lapply(list(dpois(x[[1]], 50), dbinom(x[[2]], 5, .9)), round, digits = 4)
get_new_list(x, y)
#[[1]]
#[1] 0 1 2 3 4 5
#[[2]]
#[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
I have a dataframe that looks something like -
test A B C
28 67 4 23
45 82 43 56
34 8 24 42
I need to compare test to the other three columns: for each row, I just need the number of values in the other columns that are less than the corresponding value in the test column.
So the desired output is -
test A B C result
28 67 4 23 2
45 82 43 56 1
34 8 24 42 2
When I tried -
comp_vec = "test"
name_vec = c("A", "B", "C")
rowSums(df[, comp_vec] > df[, name_vec])
I get the error -
Error in Ops.data.frame(df[, comp_vec], df[, name_vec]) :
‘>’ only defined for equally-sized data frames
I am looking for a way without replicating test to match size of dataframe.
You can use sapply to compare the df$test column against each of the other three columns. That returns a TRUE/FALSE matrix, which you can pass to rowSums and assign as your result column.
df <- data.frame(test = c(28, 45, 34), A = c(67, 82, 8), B = c(4, 43, 24), C = c(23, 56, 42))
df$result <- rowSums(sapply(df[,2:4], function(x) df$test > x))
> df
test A B C result
1 28 67 4 23 2
2 45 82 43 56 1
3 34 8 24 42 2
I noticed your expected result has 82 for the second row of A, whereas it's 5 in your starting example.
df$result <- apply(df, 1, function(x) sum(x < x[1]))
Use apply with 1 as the margin to work by row. x < x[1] gives a TRUE/FALSE vector indicating whether the value at each position in the row is smaller than the first column's value. sum then counts the TRUE values.
# test A B C result
# 1 28 67 4 23 2
# 2 45 82 43 56 1
# 3 34 8 24 42 2
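For comparison, here is a minimal sketch that avoids both sapply and apply, assuming df is a plain base data.frame (not a tibble or data.table). Comparing a data.frame with a vector of length nrow(df) compares every column against that vector, so test does not need to be replicated by hand:

```r
df <- data.frame(test = c(28, 45, 34), A = c(67, 82, 8),
                 B = c(4, 43, 24), C = c(23, 56, 42))
comp_vec <- "test"
name_vec <- c("A", "B", "C")

# df[[comp_vec]] is a plain vector; each column of df[name_vec] is compared
# against it element by element, giving a logical matrix
res <- rowSums(df[name_vec] < df[[comp_vec]])
df$result <- res
print(df)
```

Selecting the columns by name also keeps an already existing result column out of the comparison, which matters if the code is run twice on the same data frame.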
The seq function in R would give me a sequence from x to y with a constant step m:
seq(x, y, m)
E.g. seq(1,9,2) = c(1,3,5,7,9).
What would be the most elegant way to get a sequence from x to y with alternating steps m1 and m2, such that something like "seq(x, y, c(m1, m2))" would give me c(x, x + m1, (x + m1) + m2, (x + m1 + m2) + m1, ..., y), each time adding one of the steps (not necessarily reaching up to y, of course, as in seq)?
Example: x = 1; y = 19; m1 = 2; m2 = 4 and I get c(1,3,7,9,13,15,19).
I arrived at the solution as follows:
1. Use cumsum on the vector c(from, rep(by, times)), where times = ceiling((to - from) / sum(by)), so the steps in by are repeated often enough to reach to.
2. Truncate the sequence with !(seq > to), dropping any values that overshoot.
seq_alt <- function(from, to, by) {
  seq <- cumsum(c(from, rep(by, ceiling((to-from)/sum(by)))))
  return(seq[! seq > to])
}
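As a usage sketch of the two steps above, here is the same logic under a different name (seq_alt2 is hypothetical, and the local variable is renamed to avoid shadowing base seq). It assumes to >= from and positive steps; because by is just a vector, it also handles more than two alternating steps:

```r
# Same cumsum-and-truncate logic; seq_alt2 and out are illustrative names
seq_alt2 <- function(from, to, by) {
  reps <- ceiling((to - from) / sum(by))   # full cycles of `by` needed to reach `to`
  out <- cumsum(c(from, rep(by, reps)))    # running total of the alternating steps
  out[out <= to]                           # drop values that overshoot
}

print(seq_alt2(1, 19, c(2, 4)))     # the example from the question
print(seq_alt2(0, 10, c(1, 2, 3)))  # three alternating steps
```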
You can generate the first n terms of this sequence with
x = 1; m1 = 2; m2 = 4
n <- 0:10 # first 11 terms
x + ceiling(n/2)*m1 + ceiling((n-1)/2)*m2
# [1] 1 3 7 9 13 15 19 21 25 27 31
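The closed form works because term n (counting from 0) has taken ceiling(n/2) steps of m1 and floor(n/2) steps of m2, and for integer n, ceiling((n-1)/2) equals floor(n/2). A small sketch that checks the formula against a running sum (the names closed and iter are illustrative):

```r
x <- 1; m1 <- 2; m2 <- 4
n <- 0:10

# Closed form from above
closed <- x + ceiling(n / 2) * m1 + ceiling((n - 1) / 2) * m2

# Iterative reference: start at x and add m1, m2, m1, m2, ...
iter <- cumsum(c(x, rep(c(m1, m2), length.out = max(n))))

print(all.equal(closed, iter))
print(all(ceiling((n - 1) / 2) == floor(n / 2)))
```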
Here is another idea,
fun1 <- function(x, y, j, z){
  # if even the first step overshoots y, the sequence is just x
  if(x + j > y) {return(x)}else{
    s1 <- seq(x, y, j+z)
    s2 <- seq(x+j, y, j+z)
    return(sort(c(s1, s2)))
  }
}
fun1(1, 19, 2, 4)
#[1] 1 3 7 9 13 15 19
fun1(1, 40, 4, 3)
#[1] 1 5 8 12 15 19 22 26 29 33 36 40
fun1(3, 56, 7, 10)
#[1] 3 10 20 27 37 44 54
fun1(1, 2, 2, 4)
#[1] 1
Here is an alternative that uses diffinv. This method over-allocates the values, so as a stopping rule, I keep only the elements that are less than or equal to the stopping value.
seqAlt <- function(start, stop, by1, by2) {
  out <- diffinv(rep(c(by1, by2), ceiling(stop / (by1 + by2))), xi=start)
  return(out[out <= stop])
}
seqAlt(1, 19, 2, 4)
[1] 1 3 7 9 13 15 19
You could use Reduce with accumulate = TRUE to iteratively add either 2 or 4:
Reduce(`+`, rep(c(2,4), 10), init = 1, accumulate = TRUE)
# [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
The number of times you repeat c(2,4) determines the sequence length; since it is 10 above, the sequence has 21 elements (the initial value plus 20 increments).
The purrr package has an accumulate wrapper, if you prefer the syntax:
purrr::accumulate(rep(c(2,4), 10), `+`, .init = 1)
## [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
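Since the accumulate approach produces a fixed number of terms rather than stopping at y, a minimal sketch for getting a seq-like upper bound is to generate enough terms and then filter. The variable names k, y and s are illustrative, and k is assumed to be large enough for the sequence to pass y:

```r
k <- 10   # number of times c(2, 4) is repeated (assumed large enough to pass y)
y <- 19   # upper bound, as in the question

s <- Reduce(`+`, rep(c(2, 4), k), init = 1, accumulate = TRUE)
print(length(s))   # 2 * k increments plus the initial value
print(s[s <= y])   # keep only terms up to y
```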
This is a perfect example of recycling vectors in R:
# 1.
x = 1; y = 19; m1 = 2; m2 = 4
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 1 3 7 9 13 15 19
# 2.
x = 3; y = 56; m1 = 7; m2 = 10
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 3 10 20 27 37 44 54
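The recycling trick can be wrapped in a function. This is a sketch under the assumption that x, y, m1 and m2 are whole numbers with m1, m2 >= 1; the name seq_recycle is hypothetical. The logical pattern is cut or extended to the length of x:y with rep_len, because a logical index longer than the vector would otherwise produce NA values:

```r
# Hypothetical wrapper around the logical-recycling approach
seq_recycle <- function(x, y, m1, m2) {
  s <- x:y
  # TRUE marks a kept position; the pattern repeats every m1 + m2 elements
  pattern <- c(TRUE, rep(FALSE, m1 - 1), TRUE, rep(FALSE, m2 - 1))
  s[rep_len(pattern, length(s))]
}

print(seq_recycle(1, 19, 2, 4))
print(seq_recycle(3, 56, 7, 10))
print(seq_recycle(1, 2, 2, 4))   # range shorter than one full pattern
```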
I have loaded a table of integer data with 2,200 columns. What I'd like to do is condense the data down by averaging the values in every 5 columns and placing that in a new column in a new table.
For example, if I had:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10
2 4 6 8 10 12 14 16 18 20
I would get:
Col1 | Col2
6 16
Which is just the average of the values in columns 1-5 from the original table in Col1 and the average of the values in columns 6-10 in Col2.
I haven't quite wrapped my head around R syntax, so any help would be appreciated.
Here's one approach that's applicable if the number of elements to be grouped is divisible by n (5, in your case):
x <- 1:100
n <- 5
tapply(x, rep(seq(1, length(x), n), each=n), mean)
# 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96
# 3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98
The first row of output contains element names, and the second row contains means of successive groups of n elements.
To apply this to all rows of a matrix or data.frame, you can do, e.g.:
m <- matrix(1:1000, ncol=100)
apply(m, 1, function(x) tapply(x, rep(seq(1, length(x), n), each=n), mean))
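The tapply grouping above relies on the length being divisible by n. As a sketch of one way to relax that, the group index can be built with ceiling(seq_along(x) / n), so that a shorter final group is averaged on its own. The example vector here is made up for illustration:

```r
x <- 1:12   # illustrative data: length is not a multiple of n
n <- 5

grp <- ceiling(seq_along(x) / n)   # 1 1 1 1 1 2 2 2 2 2 3 3
res <- tapply(x, grp, mean)
print(res)
```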
EDIT
This alternative approach will give you some performance gains due to vectorisation with rowMeans:
t(mapply(function(x, y) rowMeans(m[, x:y]),
seq(1, ncol(m), n), seq(n, ncol(m), n)))
Oops, I see this is the comment of @user20650 on @jbaums's answer. The rowsum function splits the rows of a matrix by a factor, and sums the columns of each split. So for
m <- matrix(1:1000, ncol=100)
n <- 5
we have
rowsum(t(m), rep(seq_len(ncol(m) / n), each=n)) / n
This is fast, if that's important
library(microbenchmark)
f0 = function(m, n) rowsum(t(m), rep(seq_len(ncol(m) / n), each=n)) / n
f1 = function(m, n)
apply(m, 1, function(x) tapply(x, rep(seq(1, length(x), n), each=n), mean))
f2 = function(m, n)
t(mapply(function(x, y) rowMeans(m[, x:y]),
seq(1, ncol(m), n), seq(n, ncol(m), n)))
all.equal(f0(m, n), f1(m, n), check.attributes=FALSE)
## [1] TRUE
all.equal(f0(m, n), f2(m, n), check.attributes=FALSE)
## [1] TRUE
microbenchmark(f0(m, n), f1(m, n), f2(m, n))
## Unit: microseconds
## expr min lq median uq max neval
## f0(m, n) 164.351 170.1675 176.730 187.8570 237.419 100
## f1(m, n) 8060.639 8513.3035 8696.742 8908.5190 9771.019 100
## f2(m, n) 540.894 588.3820 603.787 634.1615 732.209 100
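To see what the rowsum approach does on the question's own example (one row holding 2, 4, ..., 20, averaged in blocks of 5), here is a minimal sketch. The variable names grp and res are illustrative, and ncol(m) is assumed to be a multiple of n:

```r
m <- matrix(seq(2, 20, by = 2), nrow = 1)   # the example row from the question
n <- 5

grp <- rep(seq_len(ncol(m) / n), each = n)  # 1 1 1 1 1 2 2 2 2 2
# t(m) puts the original columns in rows, rowsum adds them up per group,
# dividing by n turns the sums into means, and t() restores the orientation
res <- t(rowsum(t(m), grp) / n)
print(res)
```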
Here's another approach using a loop and rowMeans, in case you prefer a loop here. It will work for matrices, but needs adjustment for vectors.
# example data
dat <- as.data.frame( matrix(1:20,ncol=10,byrow=TRUE) )
# pick range
range <- 5
ind <- seq(1,ncol(dat),range)
newdat <- NULL
for(i in ind){
  newcol <- rowMeans(dat[,i:(i+range-1)])
  newdat <- cbind(newdat, newcol)
}
This results in:
> newdat
newcol newcol
[1,] 3 8
[2,] 13 18
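The loop above grows newdat with cbind and gives every column the same name. A sketch of an alternative that builds the result in one call and labels each column by its range follows; the names width and starts and the "mean_" label format are assumptions for illustration, and the data are assumed to have at least two rows so that sapply returns a matrix:

```r
dat <- as.data.frame(matrix(1:20, ncol = 10, byrow = TRUE))
width <- 5
starts <- seq(1, ncol(dat), by = width)

# One block of columns per start index; min() keeps a short final block in range
newdat <- sapply(starts, function(i) {
  rowMeans(dat[, i:min(i + width - 1, ncol(dat)), drop = FALSE])
})
colnames(newdat) <- paste0("mean_", starts, "_", pmin(starts + width - 1, ncol(dat)))
print(newdat)
```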
@jbaums's answer looks pretty good. Since I had already started this answer, I thought I would post my solution as well.
#Make some fake data
require(data.table)
data <- data.table(t(iris[,1:4]))
#Transpose since rows are easier to deal with than columns
data <- data.table(t(data))
data[ , row := .I]
#Average every 5 rows (the question asks for means, not sums)
data <- data[ , lapply(.SD,mean), by=cut(row,seq(0,nrow(data),5))]
#Transpose back to the original orientation
result <- data.table(t(data))
If you want the means of all the elements in col1-col5, col6-col10, etc.:
m1 <- matrix(c(rep(1:100, 2), 1:20), ncol=22)
n <- 5
p1 <- prod(dim(m1))
n1 <- nrow(m1)*n
n2 <- p1-p1%%n1
c(rowMeans(matrix(m1[1:n2], nrow=p1%/%n1, byrow=TRUE)), mean(m1[(n2+1):p1]))
#[1] 25.5 75.5 25.5 75.5 10.5
Or
sapply(seq(1,ncol(m1), by=n), function(i) mean(m1[,i:(min(c(i+n-1), ncol(m1)))]) )
#[1] 25.5 75.5 25.5 75.5 10.5
With some labels
indx <- seq(1,n2/nrow(m1), by=n)
indx1 <- paste("Col",paste(indx, indx+4, sep="-"),sep="_")
indx2 <- paste("Col", paste(seq(p1%%n1+1, ncol(m1)),collapse="-"), sep="_")
c(rowMeans(matrix(m1[1:n2], nrow=p1%/%n1, byrow=TRUE, dimnames=list(indx1, NULL))), setNames(mean(m1[(n2+1):p1]), indx2))
# Col_1-5 Col_6-10 Col_11-15 Col_16-20 Col_21-22
# 25.5 75.5 25.5 75.5 10.5
Update
I realized that you wanted the rowMeans by splitting up columns 1:5, 6:10, 11:15 etc. If that is the case:
res1 <- cbind( colMeans(aperm(array(m1[1:n2], dim=c(nrow(m1), n, p1%/%n1)), c(2,1,3))),
rowMeans(m1[,(ncol(m1)-ncol(m1)%%n+1):ncol(m1)]))
which is equal to manually splitting the columns:
res2 <- cbind(rowMeans(m1[,1:5]), rowMeans(m1[,6:10]), rowMeans(m1[,11:15]),
rowMeans(m1[,16:20]), rowMeans(m1[,21:22]))
identical(res1,res2)
#[1] TRUE
colnames(res1) <- c(indx1,indx2)
res1
# Col_1-5 Col_6-10 Col_11-15 Col_16-20 Col_21-22
#[1,] 21 71 21 71 6
#[2,] 22 72 22 72 7
#[3,] 23 73 23 73 8
#[4,] 24 74 24 74 9
#[5,] 25 75 25 75 10
#[6,] 26 76 26 76 11
#[7,] 27 77 27 77 12
#[8,] 28 78 28 78 13
#[9,] 29 79 29 79 14
#[10,] 30 80 30 80 15
Please run the following code:
A <- matrix(11, nrow = 4, ncol = 3)
A[,2] <- seq(119, 122, 1)
A[,3] <- seq(45, 42)
B <- matrix(39, nrow = 4, ncol = 3)
B[,2] <- seq(119, 122, 1)
B[,3] <- seq(35, 32)
C <- matrix(67, nrow = 4, ncol = 3)
C[,2] <- seq(119, 122, 1)
C[,3] <- seq(27, 24)
D <- rbind(A, B, C)
You will get D which is a 12 x 3 matrix; I would like to know the most efficient way to obtain Mat starting from D.
> Mat
11 39 67
119 45 35 27
120 44 34 26
121 43 33 25
122 42 32 24
In fact, Mat is the last column of D indexed by the first and the second column of D; e.g. consider Mat[1,1] which is equal to 45: it comes from the only row of D which is identified by 11 and 119.
How may I obtain it?
Thanks,
You can use xtabs:
xtabs(D[,3]~D[,2]+D[,1])
D[, 1]
D[, 2] 11 39 67
119 45 35 27
120 44 34 26
121 43 33 25
122 42 32 24
Or, with dcast from the reshape2 package, naming the value column explicitly:
library(reshape2)
dcast(data.frame(D), X2 ~ X1, value.var = "X3")
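For a base R result that is a plain matrix rather than an xtabs table or a data frame, tapply can cross-tabulate the third column by the other two. This is a sketch, not necessarily the most efficient option; it assumes every combination of the first two columns occurs exactly once, so sum simply returns the single value in each cell. D is rebuilt compactly here so the example is self-contained:

```r
# Same D as in the question, built in one step
D <- cbind(rep(c(11, 39, 67), each = 4),
           rep(119:122, times = 3),
           c(45:42, 35:32, 27:24))

# Rows are indexed by column 2 of D, columns by column 1 of D
Mat <- tapply(D[, 3], list(D[, 2], D[, 1]), sum)
print(Mat)
```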