I found this code at: http://onertipaday.blogspot.co.il/search/label/descriptive%20statistic
The code describes a workaround for finding the min and max of a vector that contains Inf and -Inf. However, I don't understand the purpose of the [1] and [2] in the last two lines of code.
data <- c(-Inf, 1,2,3,4,5,6,7,8,9,10, Inf)
max(data)
# Returns Inf
min(data)
# Returns -Inf
# To solve the problem I went to:
range(data, finite=TRUE)
# Then you can do
myMinimum <- range(data, finite=TRUE)[1]
myMaximum <- range(data, finite=TRUE)[2]
The range function returns a vector of length 2, with the first being the minimum and the second being the maximum.
For instance:
> a <- 15:30
> range(a)
[1] 15 30
Using the [] operator, you can extract the desired element:
> range(a)[1]
[1] 15
> range(a)[2]
[1] 30
Or you can also do:
r <- range(a)
my.min <- r[1]
my.max <- r[2]
For more information read ?range.
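Also, you can use the min and max functions directly; for a vector such as data above, you would first drop the infinite values (for example with is.finite).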
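As a minimal sketch of the indexing described above (the names finite_range and finite_vals are illustrative and not from the original post), the two elements of range() can be stored once and pulled out by position. The same values come out of min and max after the non-finite entries are filtered with is.finite:

```r
data <- c(-Inf, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Inf)

# range() returns c(minimum, maximum); finite = TRUE omits non-finite elements
finite_range <- range(data, finite = TRUE)
myMinimum <- finite_range[1]   # first element: the minimum
myMaximum <- finite_range[2]   # second element: the maximum

# Equivalent: drop non-finite values, then call min() and max() directly
finite_vals <- data[is.finite(data)]
min(finite_vals)
# [1] 1
max(finite_vals)
# [1] 10
```

Storing the result of range() once avoids computing it twice, which only matters for long vectors.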
Given a numeric vector, I'd like to find the smallest absolute difference in combinations of size 2. However, the point of friction comes with the use of combn to create the matrix holding the pairs. How would one handle issues when a matrix/vector is too large?
When the number of resulting pairs (number of columns) using combn is too large, I get the following error:
Error in matrix(r, nrow = len.r, ncol = count) :
invalid 'ncol' value (too large or NA)
This post states that the size limit of a matrix is roughly one billion rows and two columns.
Here is the code I've used. Apologies for the use of cat in my function output -- I'm solving the Minimum Absolute Difference in an Array Greedy Algorithm problem in HackerRank and R outputs are only counted as correct if they're given using cat:
minimumAbsoluteDifference <- function(arr) {
combos <- combn(arr, 2)
cat(min(abs(combos[1,] - combos[2,])))
}
# This works fine
input0 <- c(3, -7, 0)
minimumAbsoluteDifference(input0) #returns 3
# This fails
inputFail <- rpois(10e4, 1)
minimumAbsoluteDifference(inputFail)
#Error in matrix(r, nrow = len.r, ncol = count) :
# invalid 'ncol' value (too large or NA)
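A rough check of why the call fails, offered as a sketch rather than a description of combn's internals: the number of pairs is choose(n, 2), and for n = 1e5 that count is larger than .Machine$integer.max, the largest value R accepts for a matrix dimension. The variable names are illustrative.

```r
n <- 1e5

# Number of columns that combn(arr, 2) would need for a vector of length n
n_pairs <- choose(n, 2)

# Matrix dimensions are stored as integers, so this count cannot be used as ncol
too_many <- n_pairs > .Machine$integer.max
too_many
# [1] TRUE

# Even if it could, a 2-row double matrix of that width would need about 80 GB
gigabytes <- 2 * n_pairs * 8 / 1e9
```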
TL;DR
No need for combn or the like, simply:
min(abs(diff(sort(v))))
The Nitty Gritty
Finding the difference between every possible pair is O(n^2). So when we get to vectors of length 1e5, the task is burdensome both computationally and memory-wise.
We need a different approach.
How about sorting and taking the difference only with its neighbor?
By first sorting, for any element $v_j$, the difference $\min |v_j - v_{j \pm 1}|$ will be the smallest such difference involving $v_j$. For example, given the sorted vector v:
v = -9 -8 -6 -4 -2 3 8
The smallest distance from -2 is given by:
|-2 - 3| = 5
|-4 - -2| = 2
There is no need to check any other elements.
This is easily implemented in base R as follows:
getAbsMin <- function(v) min(abs(diff(sort(v))))
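To make the neighbour argument concrete, here is a small check on the example vector v from above. bruteForce is a hypothetical helper written only for this comparison; it builds every pair with combn, so it is practical only for short vectors.

```r
getAbsMin <- function(v) min(abs(diff(sort(v))))

v <- c(-9, -8, -6, -4, -2, 3, 8)

# Gaps between sorted neighbours
diff(sort(v))
# [1] 1 2 2 2 5 5

# Hypothetical brute-force reference: every pair via combn (small inputs only)
bruteForce <- function(v) {
  pairs <- combn(v, 2)
  min(abs(pairs[1, ] - pairs[2, ]))
}

getAbsMin(v)
# [1] 1
identical(getAbsMin(v), bruteForce(v))
# [1] TRUE
```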
I'm not going to use rpois because, with any reasonably sized vector, duplicates will be produced, which trivially gives 0 as the answer. A more sensible test would be with runif or sample (minimumAbsoluteDifference2 is from the answer provided by @RuiBarradas):
set.seed(1729)
randUnif100 <- lapply(1:100, function(x) {
runif(1e3, -100, 100)
})
randInts100 <- lapply(1:100, function(x) {
sample(-(1e9):(1e9), 1e3)
})
head(sapply(randInts100, getAbsMin))
[1] 586 3860 2243 2511 5186 3047
identical(sapply(randInts100, minimumAbsoluteDifference2),
sapply(randInts100, getAbsMin))
[1] TRUE
options(scipen = 99)
head(sapply(randUnif100, getAbsMin))
[1] 0.00018277206 0.00020549633 0.00009834766 0.00008395873 0.00005299225 0.00009313226
identical(sapply(randUnif100, minimumAbsoluteDifference2),
sapply(randUnif100, getAbsMin))
[1] TRUE
It's very fast as well:
library(microbenchmark)
microbenchmark(a = getAbsMin(randInts100[[50]]),
b = minimumAbsoluteDifference2(randInts100[[50]]),
times = 25, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
a 1.0000 1.0000 1.0000 1.0000 1.00000 1.00000 25
b 117.9799 113.2221 105.5144 107.6901 98.55391 81.05468 25
Even for very large vectors, the result is instantaneous:
set.seed(321)
largeTest <- sample(-(1e12):(1e12), 1e6)
system.time(print(getAbsMin(largeTest)))
[1] 3
user system elapsed
0.083 0.003 0.087
Something like this?
minimumAbsoluteDifference2 <- function(x){
stopifnot(length(x) >= 2)
n <- length(x)
inx <- rep(TRUE, n)
m <- NULL
for(i in seq_along(x)[-n]){
inx[i] <- FALSE
curr <- abs(x[i] - x[which(inx)])
m <- min(c(m, curr))
}
m
}
# This works fine
input0 <- c(3, -7, 0)
minimumAbsoluteDifference(input0) #returns 3
minimumAbsoluteDifference2(input0) #returns 3
set.seed(2020)
input1 <- rpois(1e3, 1)
minimumAbsoluteDifference(input1) #returns 0
minimumAbsoluteDifference2(input1) #returns 0
inputFail <- rpois(1e5, 1)
minimumAbsoluteDifference(inputFail) # This fails
minimumAbsoluteDifference2(inputFail) # This does not fail
For a given matrix F, I want to calculate the sum of the 2-norms of its rows (each raised to the fourth power), so I use the function sum(), but it doesn't work as I expect. Here is an example:
# The matrix F
> F <- matrix(c(9,1,1,1,4,1),nrow=3)
# index of the sum i
> i=1:NROW(F)
#And here is the result
> sum(norm(F[i,], type = "2")^4)
[1] 7376.60160040254
# and if I calculate each term of the sum I get
> norm(F[1,], type = "2")^4
[1] 6724
> norm(F[2,], type = "2")^4
[1] 289
> norm(F[3,], type = "2")^4
[1] 4
I think you're looking for the apply function. It applies a function along the dimensions of a matrix.
sum(apply(F,MARGIN = 1,function(x){norm(x,type = "2")^4}))
#[1] 7017
The reason yours doesn't work is that you assigned c(1,2,3) to i. Then, when you subset F with F[i,], you just get the whole matrix back, so norm is computed once for the entire matrix rather than once per row.
i=1:NROW(F)
i
#[1] 1 2 3
norm(F,type="2")^4
#[1] 7376.602
norm(F[1:3,],type="2")^4
#[1] 7376.602
norm(F[i,],type="2")^4
#[1] 7376.602
Disclaimer: I have not assessed the mathematical validity of this approach, only programmatically recreated the OP's desired behavior.
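As a hedged alternative sketch, not taken from the original answer: the 2-norm of a row is the square root of its sum of squares, so the fourth power of the norm equals the squared sum of squares. Under that identity, the same total can be computed with rowSums instead of apply. The names by_apply and by_rowsums are illustrative.

```r
F <- matrix(c(9, 1, 1, 1, 4, 1), nrow = 3)

# Row-wise version from the answer above
by_apply <- sum(apply(F, MARGIN = 1, function(x) norm(x, type = "2")^4))

# Vectorized version: ||row||_2^4 == (sum of squared entries)^2
by_rowsums <- sum(rowSums(F^2)^2)

by_rowsums
# [1] 7017
all.equal(by_apply, by_rowsums)
# [1] TRUE
```

all.equal is used rather than identical because norm(type = "2") goes through a singular value decomposition and may differ from the exact integer result by floating-point rounding.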
I have an expression
qbinom(0.05, n, .47) - 1
and I want to create a loop which evaluates this expression for n = 20, ..., 200. Each iteration of the loop will produce a number, and I want to take the maximum of the 181 numbers it will produce. So, something like:
for (n in 20:200) {
max(qbinom(0.05, n, .47)-1)
}
But I'm not sure how exactly to do this.
Thanks!
First, I will show you how to do this with a loop.
n <- 20:200
MAX = -Inf ## initialize maximum
for (i in 1:length(n)) {
x <- qbinom(0.05, n[i], 0.47) - 1
if (x > MAX) MAX <- x
}
MAX
# [1] 81
Note, I am not keeping a record of all 181 values generated. Each value is treated as a temporary value and will be overwritten in the next iteration. In the end, we only have a single value MAX.
If you also want to keep all the values, first initialize a vector to hold them.
n <- 20:200
MAX = -Inf ## initialize maximum
x <- numeric(length(n)) ## vector to hold record
for (i in 1:length(n)) {
x[i] <- qbinom(0.05, n[i], 0.47) - 1
if (x[i] > MAX) MAX <- x[i]
}
## check the first few values of `x`
head(x)
# [1] 5 5 6 6 6 7
MAX
# [1] 81
Now for the vectorized solution.
max(qbinom(0.05, 20:200, 0.47) - 1)
# [1] 81
R functions related to probability distributions are vectorized in the same fashion. For those related to binomial distributions, you can read ?rbinom for details.
Note, the vectorization is achieved via R's recycling rule. For example, by specifying:
qbinom(0.05, 1:4, 0.47)
R will first do recycling:
p: 0.05 0.05 0.05 0.05
size: 1 2 3 4
prob: 0.47 0.47 0.47 0.47
then evaluate
qbinom(p[i], size[i], prob[i])
via a C-level loop.
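The recycling described above can be checked with a short sketch that compares the vectorized call against an explicit element-by-element version. The names size_values, vectorized and looped are illustrative.

```r
size_values <- 1:4

# Vectorized call: the length-1 arguments (0.05 and 0.47) are recycled
# to the length of size_values
vectorized <- qbinom(0.05, size_values, 0.47)

# Explicit equivalent: one qbinom call per element
looped <- vapply(size_values, function(s) qbinom(0.05, s, 0.47), numeric(1))

identical(vectorized, looped)
# [1] TRUE
```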
Follow-up
How would I be able to know which of the 20:200 corresponds to the maximum using the vectorization solution?
We can use
x <- qbinom(0.05, 20:200, 0.47) - 1
i <- which.max(x)
i
# [1] 179
Note, i is the position in the vector 20:200. To get the n you want, you need:
(20:200)[i]
# [1] 198
The maximum is
x[i]
# [1] 81
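One caveat worth sketching: which.max returns only the first position of the maximum. If several values of n give the same maximum, which(x == max(x)) lists all of them. The variable names first_pos and all_pos are illustrative.

```r
n <- 20:200
x <- qbinom(0.05, n, 0.47) - 1

first_pos <- which.max(x)        # position of the first maximum only
all_pos   <- which(x == max(x))  # positions of every maximum, ties included

n[first_pos]   # smallest n that reaches the maximum
n[all_pos]     # every n that reaches the maximum
```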
I want to get the column means for the last list element, which is a sparse matrix multiplied by a regular matrix. Whenever I use colMeans, however, I get an error. For example:
# Use the igraph package to create a sparse matrix
library(igraph)
my.lattice <- get.adjacency(graph.lattice(length = 5, dim = 2))
# Create a conformable matrix of TRUE and FALSE values
start <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), ncol = 2)
# Multiply the sparse matrix by the regular matrix, and save the results to a list
out <- list()
out[[1]] <- my.lattice %*% start
out[[2]] <- my.lattice %*% out[[1]]
# Try to get column means of the last element
colMeans(tail(out, 1)[[1]]) # Selecting first element because tail creates a list
# Error in colMeans(tail(out, 1)[[1]]) :
# 'x' must be an array of at least two dimensions
# But tail(out, 1)[[1]] seems to have two dimensions
dim(tail(out, 1)[[1]])
# [1] 25 2
Any idea what's causing this error, or what I can do about it?
It looks like explicitly calling the colMeans function from the Matrix package works:
> Matrix::colMeans(tail(out, 1)[[1]])
# [1] 4.48 5.48
Thanks to user20650 for this suggestion.
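A minimal sketch of why the explicit call helps, assuming the Matrix package is installed but not attached with library(Matrix). The product of a sparse matrix is an S4 object, which base colMeans does not treat as an array, whereas Matrix::colMeans has a method for it. The small matrix m is made up for illustration and is not the lattice from the question.

```r
# A small sparse matrix built with Matrix (hypothetical example data)
m <- Matrix::Matrix(c(0, 1, 1, 0, 0, 2), nrow = 3, sparse = TRUE)

is.array(m)   # FALSE: base colMeans() requires a regular array
isS4(m)       # TRUE: it is an S4 object from the Matrix package

# Uses the method defined for Matrix classes
sparse_means <- Matrix::colMeans(m)

# Alternative: convert to a base matrix first (fine for small matrices)
dense_means <- colMeans(as.matrix(m))

all.equal(as.numeric(sparse_means), as.numeric(dense_means))
# [1] TRUE
```

Converting with as.matrix gives up the memory savings of the sparse format, so the Matrix::colMeans call is likely the better fit for large matrices.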
I am filling a 10x10 matrix (mat) randomly until sum(mat) == 100.
I wrote the following (i starts at 2 for a reason not relevant here, but I kept it at 2 to be consistent with my actual code):
mat <- matrix(rep(0, 100), nrow = 10)
mat[1,] <- c(0,0,0,0,0,0,0,0,0,1)
mat[2,] <- c(0,0,0,0,0,0,0,0,1,0)
mat[3,] <- c(0,0,0,0,0,0,0,1,0,0)
mat[4,] <- c(0,0,0,0,0,0,1,0,0,0)
mat[5,] <- c(0,0,0,0,0,1,0,0,0,0)
mat[6,] <- c(0,0,0,0,1,0,0,0,0,0)
mat[7,] <- c(0,0,0,1,0,0,0,0,0,0)
mat[8,] <- c(0,0,1,0,0,0,0,0,0,0)
mat[9,] <- c(0,1,0,0,0,0,0,0,0,0)
mat[10,] <- c(1,0,0,0,0,0,0,0,0,0)
i <- 2
set.seed(129)
while( sum(mat) < 100 ) {
# pick random cell
rnum <- sample( which(mat < 1), 1 )
mat[rnum] <- 1
##
print(paste0("i =", i))
print(paste0("rnum =", rnum))
print(sum(mat))
i = i + 1
}
For some reason, when sum(mat) == 99 there are several extra steps. I would assume that once i = 91 the while loop would stop, but it continues past this. Can someone explain what I have done wrong?
If I change the while condition to
while( sum(mat) < 100 & length(which(mat < 1)) > 0 )
the issue remains..
Your problem is equivalent to randomly ordering the indices of the matrix cells that are equal to 0. You can do this in one line with sample(which(mat < 1)). I suppose if you wanted to get exactly the same sort of output, you might try something like:
set.seed(144)
idx <- sample(which(mat < 1))
for (i in seq_along(idx)) {
print(paste0("i =", i))
print(paste0("rnum =", idx[i]))
print(sum(mat)+i)
}
# [1] "i =1"
# [1] "rnum =5"
# [1] 11
# [1] "i =2"
# [1] "rnum =70"
# [1] 12
# ...
See ?sample
Arguments:
x: Either a vector of one or more elements from which to choose,
or a positive integer. See ‘Details.’
...
If ‘x’ has length 1, is numeric (in the sense of ‘is.numeric’) and
‘x >= 1’, sampling _via_ ‘sample’ takes place from ‘1:x’. _Note_
that this convenience feature may lead to undesired behaviour when
‘x’ is of varying length in calls such as ‘sample(x)’. See the
examples.
In other words, if x in sample(x) is of length 1, sample returns a random number from 1:x. This happens towards the end of your loop, where there is just one 0 left in your matrix and one index is returned by which(mat < 1).
The iteration repeats at sum 99 because sample() behaves very differently when its first argument is a vector of length 1 than when it is longer. When it has length 1, sample() assumes you want a random number from 1 to that number. When it has length > 1, you get a random element of that vector.
Compare
sample(c(99,100),1)
and
sample(c(100),1)
Of course, this is an inefficient way of filling your matrix. As @josilber pointed out, a single call to sample could do everything you need.
The issue comes from how sample and which interact when you have only a single 0 value left.
For example, do this:
mat <- matrix(rep(1, 100), nrow = 10)
Now you have a matrix of all 1's. Now let's set two cells to 0:
mat[15]<-0
mat[18]<-0
and then sample:
sample(which(mat<1))
[1] 18 15
By adding a size=1 argument you get one or the other.
Now let's try this:
mat[18]<-1
sample(which(mat<1))
[1] 3 13 8 2 4 14 11 9 10 5 15 7 1 12 6
Oops, you did not get [1] 15. Instead, what happens is that only a single integer (15 in this case) is passed to sample. When you do sample(x) and x is a single integer, it gives you a sample from 1:x, that is, the integers 1 to x in random order.
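One way to sidestep this, sketched along the lines of the resample example in ?sample: index the candidate vector with sample.int(length(x), ...), so a length-one vector is never reinterpreted as 1:x. The small test matrix and the steps counter are illustrative.

```r
# Sample from the elements of x itself, even when length(x) == 1
resample <- function(x, ...) x[sample.int(length(x), ...)]

mat <- matrix(1, nrow = 10, ncol = 10)
mat[15] <- 0                      # exactly one zero cell left

resample(which(mat < 1), 1)
# [1] 15

# The fill loop from the question now stops as soon as the matrix is full
steps <- 0
while (sum(mat) < 100) {
  rnum <- resample(which(mat < 1), 1)
  mat[rnum] <- 1
  steps <- steps + 1
}
steps
# [1] 1
```

With plain sample(which(mat < 1), 1) the same loop could take many iterations, because each draw would come from 1:15 and would usually hit a cell that is already 1.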