Why use c() to define a vector in R?

c is not an abbreviation of "vector" in English, so why is c() used to define a vector in R?
v1<- c(1,2,3,4,5)

This is a good question, and the answer is kind of odd. "c", believe it or not, stands for "combine", which is what it normally does:
> c(c(1, 2), c(3))
[1] 1 2 3
But it happens that in R, a number is just a vector of length 1:
> 1
[1] 1
So, when you use c() to create a vector, what you are actually doing is combining together a series of 1-length vectors.
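A minimal sketch of the point above. It uses only base R (length(), is.vector(), identical()), and the names x and v1 are just for illustration:

```r
# A single number is already a numeric vector of length 1
x <- 1
print(length(x))     # 1
print(is.vector(x))  # TRUE

# c() combines length-1 (or longer) vectors into one vector
v1 <- c(1, 2, 3, 4, 5)
print(identical(v1, c(c(1, 2), c(3, 4, 5))))  # TRUE
```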

Owen's answer is perfect, but one other thing to note is that c() can concatenate more than just atomic vectors: it also combines lists, and even objects such as a fitted lm model.
> x = list(a = rnorm(5), b = rnorm(7))
> y = list(j = rpois(3, 5), k = rpois(4, 2), l = rbinom(9, 1, .43))
> foo = c(x,y)
> foo
$a
[1] 0.280503895 -0.853393705 0.323137905 1.232253725 -0.007638861
$b
[1] -2.0880857 0.2553389 0.9434817 -1.2318130 -0.7011867 0.3931802 -1.6820880
$j
[1] 5 12 5
$k
[1] 3 1 2 1
$l
[1] 1 0 0 1 0 0 1 1 0
> class(foo)
[1] "list"
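The rule behind this can be sketched as follows. This is a minimal base R example, and the names mixed and atomic are only illustrative. If any argument to c() is a list, the result is a list. Otherwise all arguments are coerced to a common atomic type.

```r
# If any argument to c() is a list, the result is a list
mixed <- c(1:2, list(a = "x"))
print(class(mixed))   # "list"
print(length(mixed))  # 3
print(names(mixed))   # "" "" "a"

# Otherwise the arguments are coerced to a common atomic type
atomic <- c(1L, 2.5, "three")
print(class(atomic))  # "character"
print(atomic)         # "1" "2.5" "three"
```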
Second Example:
> x = 1:10
> y = 3*x+rnorm(length(x))
> z = lm(y ~ x)
> is.vector(z)
[1] FALSE
> foo = c(x, z)
> foo
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] 6
[[7]]
[1] 7
[[8]]
[1] 8
[[9]]
[1] 9
[[10]]
[1] 10
$coefficients
(Intercept) x
0.814087 2.813492
$residuals
1 2 3 4 5 6 7
-0.2477695 -0.3375283 -0.1475338 0.5962695 0.5670256 -0.5226752 0.6265995
8 9 10
0.1017986 -0.4425523 -0.1936342
$effects
(Intercept) x
-51.50810097 25.55480795 -0.05371226 0.66592081 0.61250676 -0.50136423
0.62374031 0.07476915 -0.49375185 -0.26900403
$rank
[1] 2
$fitted.values
1 2 3 4 5 6 7 8
3.627579 6.441071 9.254562 12.068054 14.881546 17.695038 20.508529 23.322021
9 10
26.135513 28.949005
$assign
[1] 0 1
$qr
$qr
(Intercept) x
1 -3.1622777 -17.39252713
2 0.3162278 9.08295106
3 0.3162278 0.15621147
4 0.3162278 0.04611510
5 0.3162278 -0.06398128
6 0.3162278 -0.17407766
7 0.3162278 -0.28417403
8 0.3162278 -0.39427041
9 0.3162278 -0.50436679
10 0.3162278 -0.61446316
attr(,"assign")
[1] 0 1
$qraux
[1] 1.316228 1.266308
$pivot
[1] 1 2
$tol
[1] 1e-07
$rank
[1] 2
attr(,"class")
[1] "qr"
$df.residual
[1] 8
$xlevels
named list()
$call
lm(formula = y ~ x)
$terms
y ~ x
attr(,"variables")
list(y, x)
attr(,"factors")
x
y 0
x 1
attr(,"term.labels")
[1] "x"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(y, x)
attr(,"dataClasses")
y x
"numeric" "numeric"
$model
y x
1 3.379809 1
2 6.103542 2
3 9.107029 3
4 12.664324 4
5 15.448571 5
6 17.172362 6
7 21.135129 7
8 23.423820 8
9 25.692961 9
10 28.755370 10
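The long output above is just a plain list. c() keeps every component of the lm object but drops its "lm" class, so the result prints like any other list. A minimal sketch to check this follows. The set.seed(42) call is added here only for reproducibility, so the random numbers differ from the output above.

```r
set.seed(42)  # only for reproducibility
x <- 1:10
y <- 3 * x + rnorm(length(x))
z <- lm(y ~ x)

foo <- c(x, z)
print(class(foo))                            # "list": the "lm" class is dropped
print(length(foo) == length(x) + length(z))  # TRUE
print(names(foo)[10:12])                     # "" "coefficients" "residuals"
```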

Related

purrr - how to apply a function recursively with changing arguments

Ideally I would like to make use of purrr's accumulate function or similar.
Let's say I want to make use of utils::combn function iteratively, and get all the intermediate results (ideally put inside a list of lists).
In the example below, the parameter x is initially 4, so m is also 4 (but (x, m) could equally be (5, 5), (6, 6), ...). After the first iteration, x becomes the previous result while m decreases by one, repeating until m = 2.
library(purrr)  # provides map()
n1 <- combn(x = 4, m = 4, simplify = FALSE)
n2 <- map(n1, ~ combn(.x, 3, simplify = FALSE))
n3 <- map(n2, ~ map(., ~ combn(.x, 2, simplify = FALSE)))
> n1
[[1]]
[1] 1 2 3 4
> n2
[[1]]
[[1]][[1]]
[1] 1 2 3
[[1]][[2]]
[1] 1 2 4
[[1]][[3]]
[1] 1 3 4
[[1]][[4]]
[1] 2 3 4
> n3
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] 1 2
[[1]][[1]][[2]]
[1] 1 3
[[1]][[1]][[3]]
[1] 2 3
[[1]][[2]]
[[1]][[2]][[1]]
[1] 1 2
[[1]][[2]][[2]]
[1] 1 4
[[1]][[2]][[3]]
[1] 2 4
[[1]][[3]]
[[1]][[3]][[1]]
[1] 1 3
[[1]][[3]][[2]]
[1] 1 4
[[1]][[3]][[3]]
[1] 3 4
[[1]][[4]]
[[1]][[4]][[1]]
[1] 2 3
[[1]][[4]][[2]]
[1] 2 4
[[1]][[4]][[3]]
[1] 3 4
As you can imagine, I want to get all possible combinations, e.g.:
choose(4, 4) -> choose(result, 3) -> choose(result, 2).
Any help or ideas would be much appreciated.
You can use accumulate + map_depth:
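library(purrr)  # provides accumulate() and map_depth()
combn_recur <- function(n) {
  # Step 1 (.y = 0) computes combn(n, n); each later step applies combn() one list
  # level deeper, with m = n - .y running down to 2. [-1] drops the starting value n.
  accumulate(c(n, 0:(n - 2)),
             ~ map_depth(.x, .y, combn, m = n - .y, simplify = FALSE))[-1]
}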
all.equal(combn_recur(4), c(n1, n2, n3))
# TRUE
combn_recur(3)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [[2]][[1]]
# [1] 1 2
#
# [[2]][[2]]
# [1] 1 3
#
# [[2]][[3]]
# [1] 2 3
combn_recur(2)
# [[1]]
# [1] 1 2
combn_recur(1)
# Error in .f(.x[[i]], ...) : n < m
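If you would rather avoid purrr, here is a minimal base R sketch of the same recursion. The helper names combn_leaves() and combn_recur_base() are hypothetical. It assumes n >= 2. It applies combn() to every leaf vector of the previous level, with m going from n - 1 down to 2. The levels are then concatenated with c(), so the result has the same shape as c(n1, n2, n3) in the check above.

```r
# Apply combn(., m) to every leaf vector of an arbitrarily nested list
combn_leaves <- function(node, m) {
  if (is.list(node)) {
    lapply(node, combn_leaves, m = m)
  } else {
    combn(node, m, simplify = FALSE)
  }
}

combn_recur_base <- function(n) {
  # First level: choose(n, n), i.e. list(1:n)
  steps <- list(combn(n, n, simplify = FALSE))
  if (n >= 3) {
    for (m in (n - 1):2) {
      # Each new level applies combn() to the leaves of the previous level
      steps[[length(steps) + 1]] <- combn_leaves(steps[[length(steps)]], m)
    }
  }
  # Concatenate the levels, giving the same shape as c(n1, n2, n3)
  do.call(c, steps)
}

combn_recur_base(3)
```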

Get subsets between one element and the previous same element

Consider a vector:
vec <- c(1, 3, 4, 3, 3, 1, 1)
I'd like to get, for each element of the vector, the subset of values between that element and its previous occurrence (or just the element itself if it has not occurred before).
The expected output is:
f(vec)
# [[1]]
# [1] 1
#
# [[2]]
# [1] 3
#
# [[3]]
# [1] 4
#
# [[4]]
# [1] 3 4 3
#
# [[5]]
# [1] 3 3
#
# [[6]]
# [1] 1 3 4 3 3 1
#
# [[7]]
# [1] 1 1
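We may loop over the positions of the vector. For each position i, find the index of the last earlier occurrence of the same value ('i1') among the previous elements, then use the sequence i1:i to subset the vector. If there is no earlier occurrence, i1 falls back to i.
lapply(seq_along(vec), function(i) {
  # seq_len(i - 1) is empty for i = 1, so no earlier match is found there
  i1 <- tail(which(vec[seq_len(i - 1)] == vec[i]), 1)[1]
  i1[is.na(i1)] <- i  # no earlier occurrence: start at i itself
  vec[i1:i]
})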
-output
[[1]]
[1] 1
[[2]]
[1] 3
[[3]]
[1] 4
[[4]]
[1] 3 4 3
[[5]]
[1] 3 3
[[6]]
[1] 1 3 4 3 3 1
[[7]]
[1] 1 1
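The loop above rescans all previous elements for every i. A single-pass alternative is sketched below, and the function name subset_since_last() is hypothetical. It records the latest index of each value in an environment keyed by as.character(value). This assumes that distinct values convert to distinct strings.

```r
subset_since_last <- function(vec) {
  last_seen <- new.env()              # value (as a string) -> index of its latest occurrence
  out <- vector("list", length(vec))
  for (i in seq_along(vec)) {
    key <- as.character(vec[i])
    prev <- last_seen[[key]]          # NULL if the value has not occurred yet
    start <- if (is.null(prev)) i else prev
    out[[i]] <- vec[start:i]
    last_seen[[key]] <- i
  }
  out
}

subset_since_last(c(1, 3, 4, 3, 3, 1, 1))
```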

Conditional merging of two lists in R

I am trying to combine two lists that complement each other, where one contains half the set of values and the second the other half:
v1 <- c(1,2,2,4)
v2 <- c(NULL)
v3 <- c(1,2,2,4)
l1 <- list(v1,v2,v3)
v1b <- c(NULL)
v2b <- c(1,2,2,4)
v3b <- c(NULL)
l2 <- list(v1b,v2b,v3b)
> l1
[[1]]
[1] 1 2 2 4
[[2]]
NULL
[[3]]
[1] 1 2 2 4
> l2
[[1]]
NULL
[[2]]
[1] 1 2 2 4
[[3]]
NULL
The desired result is:
[[1]]
[1] 1 2 2 4
[[2]]
[1] 1 2 2 4
[[3]]
[1] 1 2 2 4
I tried several ways. This is the closest I got:
> sapply(l1, function(x) ifelse(x == "NULL", l2[[x]], x))
[[1]]
[1] 1 2 2 4
[[2]]
logical(0)
[[3]]
[1] 1 2 2 4
Any help is appreciated.
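One possible approach is to walk both lists in parallel with Map() and take whichever element is not NULL. This sketch assumes both lists have the same length and that at each position at most one of the two elements is non-NULL. If both are non-NULL, it keeps the element from l1.

```r
v1 <- c(1, 2, 2, 4)
l1 <- list(v1, NULL, v1)
l2 <- list(NULL, v1, NULL)

# Walk both lists in parallel; keep l1's element unless it is NULL, else take l2's
merged <- Map(function(a, b) if (is.null(a)) b else a, l1, l2)
merged
```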

Getting all splits of numeric sequence in R

I'm trying to get all the possible splits of a sequence [1:n] in R. E.g.:
getSplits(0,3)
Should return all possible splits of the sequence 123, in other words (in a list of vectors):
[1] 1
[2] 1 2
[3] 1 2 3
[4] 1 3
[5] 2
[6] 2 3
[7] 3
I've written a function that reaches these vectors recursively, but I'm having trouble combining them into a single list as above. My function is:
getSplits <- function(currentDigit, lastDigit, split) {
splits=list();
for (nextDigit in currentDigit: lastDigit)
{
currentSplit <- c(split, c(nextDigit));
print(currentSplit);
if(nextDigit < lastDigit) {
possibleSplits = c(list(currentSplit), getSplits(nextDigit+1, lastDigit, currentSplit));
}else{
possibleSplits = currentSplit;
}
splits <- c(splits, list(possibleSplits));
}
return(splits);
}
Printing each currentSplit shows all the right vectors I need, but the final returned list (splits) nests them into deeper levels of lists, returning:
[1] 1
[[1]][[2]]
[[1]][[2]][[1]]
[1] 1 2
[[1]][[2]][[2]]
[1] 1 2 3
[[1]][[3]]
[1] 1 3
[[2]]
[[2]][[1]]
[1] 2
[[2]][[2]]
[1] 2 3
[[3]]
[1] 3
For the corresponding function call getSplits(1, 3, c()).
If anyone could help me out on getting this to work the way I described above, it'd be much appreciated!
character vector output
Try combn:
k <- 3
s <- unlist(lapply(1:k, combn, x = k, toString))
s
## [1] "1" "2" "3" "1, 2" "1, 3" "2, 3" "1, 2, 3"
data frame output
If you would prefer that the output be in the form of a data frame:
read.table(text = s, header = FALSE, sep = ",", fill = TRUE, col.names = 1:k)
giving:
X1 X2 X3
1 1 NA NA
2 2 NA NA
3 3 NA NA
4 1 2 NA
5 1 3 NA
6 2 3 NA
7 1 2 3
list output
or a list:
lapply(s, function(x) scan(textConnection(x), quiet = TRUE, sep = ","))
giving:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 1 2
[[5]]
[1] 1 3
[[6]]
[1] 2 3
[[7]]
[1] 1 2 3
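If you only need the list output, you can skip the round trip through strings. Ask combn() for a list directly with simplify = FALSE, then flatten one level with unlist(recursive = FALSE). A minimal sketch follows, where the name splits is only illustrative:

```r
k <- 3
# combn(k, m, simplify = FALSE) returns a list of integer vectors for each size m;
# unlist(recursive = FALSE) flattens the resulting list of lists by one level
splits <- unlist(lapply(1:k, function(m) combn(k, m, simplify = FALSE)), recursive = FALSE)
splits
```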
Update: I have incorporated the improvement mentioned in the comments, made one further simplification, and added data frame and list output.
Here is another approach:
# lapply() always returns a list; sapply() could simplify the result into a matrix
f <- function(nums) lapply(seq_along(nums), function(x) t(combn(nums, m = x)))
f(1:3)
This yields
[[1]]
[,1]
[1,] 1
[2,] 2
[3,] 3
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 2 3
The OP is looking for the power set of c(1,2,3), minus the empty set (which powerSet() returns as numeric(0)). There are several packages that will quickly get you this in one line. Using the package rje, we have:
library(rje)
powerSet(c(1,2,3))
[[1]]
numeric(0)
[[2]]
[1] 1
[[3]]
[1] 2
[[4]]
[1] 1 2
[[5]]
[1] 3
[[6]]
[1] 1 3
[[7]]
[1] 2 3
[[8]]
[1] 1 2 3
... and with iterpc:
library(iterpc)
getall(iterpc(c(2,1,1,1), 3, labels = 0:3))
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 0 2
[3,] 0 0 3
[4,] 0 1 2
[5,] 0 1 3
[6,] 0 2 3
[7,] 1 2 3
More generally,
n <- 3
getall(iterpc(c(n-1,rep(1, n)), n, labels = 0:n)) ## same as above
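For a base R alternative without extra packages, one option is to enumerate bitmasks. Each integer from 0 to 2^n - 1 encodes one subset through its binary digits. In this minimal sketch, power_set() is a hypothetical name. Because intToBits() works on 32-bit integers, the sketch is only meant for small n.

```r
power_set <- function(s) {
  n <- length(s)
  masks <- seq_len(2^n) - 1L  # 0, 1, ..., 2^n - 1 (integer)
  lapply(masks, function(mask) {
    # Bit j of the mask (least significant first) says whether s[j] is included
    bits <- as.integer(intToBits(mask))[seq_len(n)]
    s[bits == 1L]
  })
}

power_set(c(1, 2, 3))
```

For c(1, 2, 3) this lists the subsets in the same order as the rje output above, starting with the empty set. Drop the first element if you don't want it.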

Remove outliers based on a preceding value

How can I remove outliers using the criterion that a value cannot be more than twice its preceding value?
Here is my try:
x<-c(1,2,6,4,10,20,50,10,2,1)
remove_outliers <- function(x, na.rm = TRUE, ...) {
for(i in 1:length(x))
x < (x[i-1] + 2*x)
x
}
remove_outliers(y)
expected outcome: 1,2,4,10,20,2,1
Thanks!
I think the first 10 should be removed in your data because 10>2*4. Here's a way to do what you want without loops. I'm using the dplyr version of lag.
library(dplyr)
x<-c(1,2,6,4,10,20,50,10,2,1)
x[c(TRUE,na.omit(x<=dplyr::lag(x)*2))]
[1] 1 2 4 20 10 2 1
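The same rule can also be sketched in base R without dplyr, by comparing x[-1] with the shifted vector x[-length(x)]. Like the answer above, this compares each value with its original predecessor, not with the last value that was kept. The name keep is only illustrative.

```r
x <- c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1)
# Keep the first value; keep every later value that is at most twice its predecessor
keep <- c(TRUE, x[-1] <= 2 * x[-length(x)])
x[keep]
```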
EDIT
To use this with a data.frame:
df <- data.frame(id=1:10, x=c(1,2,6,4,10,20,50,10,2,1))
df[c(TRUE,na.omit(df$x<=dplyr::lag(df$x,1)*2)),]
id x
1 1 1
2 2 2
4 4 4
6 6 20
8 8 10
9 9 2
10 10 1
A simple sapply:
# keep the first value; otherwise keep x[i] only if it is at most twice x[i - 1]
bool <- sapply(seq_along(x), function(i) i == 1 || x[i] <= 2 * x[i - 1])
bool
[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
resulting in:
x[bool]
[1] 1 2 4 20 10 2 1
