Convert values outside a range to the range's bounds

If I have a series of values
set.seed(123)
x <- rnorm(100)
and a given range (a, b), e.g.
a <- -1; b <- 2
How could I move those values less than a to a and those greater than b to b?
The following basic method works, but I'm looking for a function or a one-line command.
x[x < a] <- a
x[x > b] <- b

If we need a single line, use pmin/pmax
out <- pmin(pmax(x, a), b)
Checking against the two-step approach:
x[x < a] <- a
x[x > b] <- b
identical(out, x)
[1] TRUE
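If this comes up often, the one-liner can be wrapped in a small helper (just a sketch; clamp is an illustrative name, not a base R function):
clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)
out <- clamp(x, a, b)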

Related

Specifying R to take one argument at a time when passing multiple arguments using '...'

I am a novice in R, required by my superior to do things a certain way. I am interested in determining the values of two descriptive statistics: setup count and heavy-dominance setup count. Setup count is simply the number of setups found within a location, while heavy-dominance setup count is the number of setups within that location that have dominance values (of population x) ≥ 50%. This is how I would normally calculate these statistics:
##Normal Approach
#Sample Data 1
v <- c(53, 2, 97) #let vector "v" represent Location 1
w <- c(7, 16, 31, 44, 16) #let vector "w" represent Location 2
#Setup Count
sc_v <- length(v)
sc_w <- length(w)
sc <- c(sc_v, sc_w)
sc
#Heavy-Dominance Setup Count
hd_v <- length(which(v >= 50))
hd_w <- length(which(w >= 50))
hd <- c(hd_v, hd_w)
hd
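For reference, with Sample Data 1 these print the following (two setups in Location 1 have dominance ≥ 50%, none in Location 2):
sc
# [1] 3 5
hd
# [1] 2 0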
I am tasked with developing a function that can both determine said statistical values from raw data and concatenate the outputs into a single vector. Here are the working functions I developed:
#Setup Count (2 vectors at a time only)
setup.count <- function(x, y){
  a <- length(x)
  b <- length(y)
  d <- c(a, b)
  d
}
#Heavy-Dominance Setup Count (2 vectors at a time only)
heavy.dominance <- function(x, y){
  a <- length(which(x >= 50))
  b <- length(which(y >= 50))
  d <- c(a, b)
  d
}
y <- setup.count(v, w)
y
z <- heavy.dominance(v, w)
z
Suppose there are more than two locations:
#Sample Data 2
v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74) #let vector "x" represent the additional Location 3
How can I specify R to take one argument at a time when passing multiple arguments using '...'? Here are my failed attempts at revising the functions above, to give an idea:
##Attempt 1
#Setup Count (incorrect v1)
setup.count <- function(x, ...){
  data <- list(...)
  a <- length(x)
  b <- length(data) #will return the number of locations other than x, not the separate number of setups within each of these locations
  d <- c(a, b)
  d
}
#Heavy-Dominance Setup Count (incorrect v1)
heavy.dominance <- function(x, ...){
  data <- list(...)
  a <- length(which(x >= 50))
  b <- length(which(data >= 50)) #will return the error "'list' object cannot be coerced to type 'double'"
  d <- c(a, b)
  d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
##Attempt 2
#Setup Count (incorrect v2)
setup.count <- function(x, ...){
  data <- list(...)
  a <- length(x)
  b <- length(unlist(data)) #will return the total number of setups in all locations other than x, not as separate values
  d <- c(a, b)
  d
}
#Heavy-Dominance Setup Count (incorrect v2)
heavy.dominance <- function(x, ...){
  data <- list(...)
  a <- length(which(x >= 50))
  b <- length(which(unlist(data) >= 50)) #will return the total number of setups with dominance ≥ 50% in all locations other than x, not as separate values
  d <- c(a, b)
  d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
You can just list() the elements passed via the ellipsis and use sapply() to loop over the list elements. Adding a type= argument gives one function for both purposes, and a thresh= argument makes the dominance cutoff adjustable.
setup.fun <- function(..., type=c('count', 'dominance'), thresh=50) {
  x <- list(...)
  type <- match.arg(type)
  if (type == 'count') sapply(x, length)
  else sapply(x, function(x) length(which(x >= thresh)))
}
setup.fun(v, w, x)
# [1] 3 5 4
setup.fun(v, w, x, type='count')
# [1] 3 5 4
setup.fun(v, w, x, type='dominance')
# [1] 2 0 2
setup.fun(v, w, x, type='d')
# [1] 2 0 2
setup.fun(v, w, x, v)
# [1] 3 5 4 3
setup.fun(v)
# [1] 3
setup.fun(v, w, x, type='dominance', thresh=40)
# [1] 2 1 3
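For a one-off calculation you could also skip the wrapper and apply the same pattern directly to an explicit list (a minimal sketch of the same idea; sum(z >= 50) is equivalent to length(which(z >= 50))):
sapply(list(v, w, x), length)                    # setup count per location
sapply(list(v, w, x), function(z) sum(z >= 50))  # heavy-dominance count per location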

Create indicator variables within a list

I have a list containing sequences of numbers. I want to create a list that indicates all non-zero elements up to the first element that matches a defined limit. I also want to create a list that indicates all non-zero elements after the first element to match the defined limit.
I prefer a base R solution. Presumably the solution will use lapply, but I have not been able to come up with a simple solution.
Below is a minimal reproducible example in which the limit is 2:
my.limit <- 2
my.samples <- list(0,c(1,2),0,c(0,1,1),0,0,0,0,0,c(1,1,2,2,3,4),c(0,1,2),0,c(0,0,1,1,2,2,3))
Here are the two desired lists:
within.limit <- list(0,c(1,1),0,c(0,1,1),0,0,0,0,0,c(1,1,1,0,0,0),c(0,1,1),0,c(0,0,1,1,1,0,0))
outside.limit <- list(0,c(0,0),0,c(0,0,0),0,0,0,0,0,c(0,0,0,1,1,1),c(0,0,0),0,c(0,0,0,0,0,1,1))
We can use match with the nomatch argument set to a very large number (it should be greater than the length of any element of the list; Inf can't be used here because nomatch is coerced to integer).
within.limit1 <- lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= match(my.limit, x, nomatch = 1000)))
outside.limit1 <- lapply(my.samples, function(x)
+(seq_along(x) > match(my.limit, x, nomatch = 1000)))
Checking that the output matches the desired lists shown above:
all(mapply(function(x, y) all(x == y), within.limit, within.limit1))
#[1] TRUE
all(mapply(function(x, y) all(x == y), outside.limit, outside.limit1))
#[1] TRUE
I would do:
within.limit <- lapply(my.samples, function(x)
+(x != 0 & (x < my.limit | cumsum(x == my.limit) == 1)))
outside.limit <- lapply(my.samples, function(x)
+(x != 0 & (x > my.limit | cumsum(x == my.limit) > 1)))
foo <- function(samples, limit, within = TRUE) {
  `%cp%` <- if (within) `<=` else `>`
  lapply(samples, function(x) pmin(x, seq_along(x) %cp% match(limit, x, nomatch = 1e8)))
}
> all.equal(foo(my.samples, my.limit, FALSE), outside.limit)
# [1] TRUE
> all.equal(foo(my.samples, my.limit, TRUE), within.limit)
# [1] TRUE
We can use findInterval (this relies on each vector being sorted non-decreasingly, as the sample vectors are):
lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= findInterval(my.limit, x, left.open = TRUE) + 1))
and
lapply(my.samples, function(x) +(seq_along(x) > findInterval(my.limit, x, left.open = TRUE) + 1))
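If you want both indicator lists from a single call, the match-based idea above can be wrapped in a small helper (a sketch only; the function name is illustrative, and nomatch is set just past the end of each vector instead of an arbitrary large number):
split.by.limit <- function(samples, limit) {
  cut <- lapply(samples, function(x) match(limit, x, nomatch = length(x) + 1))
  list(
    within  = mapply(function(x, k) +(x > 0 & seq_along(x) <= k), samples, cut, SIMPLIFY = FALSE),
    outside = mapply(function(x, k) +(seq_along(x) > k), samples, cut, SIMPLIFY = FALSE)
  )
}
res <- split.by.limit(my.samples, my.limit)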

Returning absent values without inducing integer (0)

I want to identify which values in one vector are present in another vector. Sometimes, in my application, none of the values of the first vector are present; in such cases I would like NA. My current approach returns integer(0) when this occurs:
l <- 1:3
m <- 2:5
n <- 4:6
l[l %in% m]
[1] 2 3
l[l %in% n]
integer(0)
This post discusses how to capture integer(0) using length, but is there a way to avoid integer(0) in the first place, and do this operation in just one step? Answers to the previous question suggest that any could be used but I fail to see how that would work in this example.
You could catch the integer(0) with a custom function:
l <- 1:3
m <- 2:5
n <- 4:6
returnsafe <- function(a, b) {
  result <- a[a %in% b]
  if (is.integer(result) && length(result) == 0L) {
    return(NA)
  } else {
    return(result)
  }
}
> returnsafe(l, n)
[1] NA
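And when there are matches, the helper simply passes them through (usage example):
> returnsafe(l, m)
[1] 2 3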
You can do:
l[match(l, n)]
[1] NA NA NA
Or:
any(l[match(l, n)])
[1] NA
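For completeness, the any() hint mentioned in the question can be used like this, although it still amounts to checking whether anything matched (a minimal sketch):
res <- if (any(l %in% n)) l[l %in% n] else NA
res
# [1] NA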

Find components of a vector which increase continually by k-times

I want to create a function which finds components of a vector which increase continually by k-times.
That is, if the contrived function is f(x,k) and x=c(2,3,4,3,5,6,5,7), then
the value of f(x,1) is 2,3,3,5,5 since only these components of x increase by 1 time.
In addition, if k=2, then the value of f(x,2) is 2,3 since only these components increase continually 2 times (2→3→4 and 3→5→6).
I guess I ought to use a repetitive construct such as a for loop for this purpose.
1) Use rollapply from the zoo package:
library(zoo)
f <- function(x, k)
x[rollapply(x, k+1, function(x) all(diff(x) > 0), align = "left", fill = FALSE)]
Now test out f:
x <- c(2,3,4,3,5,6,5,7)
f(x, 1)
## [1] 2 3 3 5 5
f(x, 2)
## [1] 2 3
f(x, 3)
## numeric(0)
1a) This variation is slightly shorter and also works:
f2 <- function(x, k) head(x, -k)[ rollapply(diff(x) > 0, k, all) ]
2) Here is a version of 1a that uses no packages:
f3 <- function(x, k) head(x, -k)[ apply(embed(diff(x) > 0, k), 1, all) ]
A fully vectorized solution:
f <- function(x, k = 1) {
  # cumsum with resetting
  # http://stackoverflow.com/a/32524260/1412059
  rlecumsum <- function(x) {
    cs <- cumsum(x)
    cs - cummax((x == 0) * cs)
  }
  x[rev(rlecumsum(rev(c(diff(x) > 0, FALSE)))) >= k]
}
f(x, 1)
#[1] 2 3 3 5 5
f(x, 2)
#[1] 2 3
f(x, 3)
#numeric(0)
I don't quite understand the second part of your question (that with k=2) but for the first part you can use something like this:
test<-c(2,3,4,3,5,6,5,7) #Your vector
diff(test) #Computes the successive differences of the vector
diff(test)>0 #Turns the differences into a logical vector with criterion > 0
test[which(diff(test)>0)] #Returns only the elements of test whose difference to the next element is positive (which() avoids recycling the shorter logical index)
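To extend this diff() idea to a general k without any packages, close to the loop-style approach the question hints at (a sketch only; f.simple is an illustrative name and essentially mirrors solution 2 above):
f.simple <- function(x, k) {
  up <- diff(x) > 0  # TRUE where the next element is larger
  keep <- sapply(seq_len(length(x) - k), function(i) all(up[i:(i + k - 1)]))
  head(x, -k)[keep]
}
f.simple(test, 1)
## [1] 2 3 3 5 5
f.simple(test, 2)
## [1] 2 3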

Selecting one column of a data frame returns a factor, instead of another data frame

I have the code below. If m == 3, cd remains a data frame and names(cd) are those of the original data frame d.
However, if m == 2 only one column passes the filter, cd is coerced to a factor, and I lose the names...
samplesize <-100
g1 <- gl(2,samplesize/2,labels=c("V","M"))
g2 <- gl(3,samplesize/3,labels=c("V","M","U"))
m <- 2
d <- data.frame(g1,g2)
l <- sapply(d,nlevels)
cd <- d[,l <= m]
names(cd)
How can I keep the names of d even if the filter leaves only one column?
Use drop=FALSE to avoid coercing to the lowest dimension.
cd <- d[,l <= m, drop=FALSE]
names(cd)
[1] "g1"
The classic problem with [, namely that its default is drop=TRUE:
> args(`[.data.frame`)
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1)
NULL
So (as said by P Lapointe), use drop=FALSE:
> cd <- d[,l <= m, drop = FALSE]
> names(cd)
[1] "g1"
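A related base R idiom, mentioned here only as an alternative: single-bracket subsetting without the comma treats the data frame as a list of columns, so a single column is never dropped to a factor:
cd <- d[l <= m]
names(cd)
# [1] "g1"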
