I'm trying to run a simple custom function test that takes a vector x and rounds the contents if the entry is greater than 1, returning a new vector. For instance, for input x <- c(.5, .2, 1.6, 7.9) I would expect test(x) to output [1] 0.5 0.2 2 8.
However, my code either returns 1 entry or only alters 1 entry and leaves the rest of the output blank.
test <- function(x) {
nr <- nrow(x)
e <- numeric(nr)
for (i in x) {
if(i > 1) {
e <- round(i, 1)
}
else {
e <- i
}
}
return(e)
}
How do I iterate through a predetermined list instead of a range of integers?
Instead of performing a for loop with an embedded if statement, it would be more efficient (both in terms of runtime and amount of code written) to use ifelse. You would transform your entire function to a single line of code:
test <- function(x) ifelse(x > 1, round(x, 0), x)
Here it is in action:
x <- c(.5, .2, 1.6, 7.9)
test(x)
# [1] 0.5 0.2 2.0 8.0
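If you would rather keep an explicit loop, here is a minimal sketch (test_loop is a hypothetical name, not from the question). It assumes x is a plain numeric vector, so it sizes the result with length(x) instead of nrow(x), which returns NULL for vectors. It loops over positions with seq_along(x) so that each result is stored in its own slot e[i] instead of overwriting e:

```r
test_loop <- function(x) {
  e <- numeric(length(x))            # pre-allocate one slot per element
  for (i in seq_along(x)) {          # iterate over positions 1..length(x)
    e[i] <- if (x[i] > 1) round(x[i]) else x[i]
  }
  e
}

test_loop(c(.5, .2, 1.6, 7.9))
# [1] 0.5 0.2 2.0 8.0
```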
I just saw a YouTube video from Numberphile on the Yellowstone sequence (A098550). It's based on a sequence starting with 1, 2 and 3, with subsequent terms generated by the rules:
no repeated terms
always pick the lowest integer
gcd(a_n, a_(n-1)) = 1
gcd(a_n, a_(n-2)) > 1
The first 15 terms would be: 1 2 3 4 9 8 15 14 5 6 25 12 35 16 7
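As a sanity check of the rules, the sketch below verifies that the 15 terms listed above contain no repeats and satisfy both gcd conditions from the fourth term onward (the first three terms are treated as seeds). The helper gcd2 is a hypothetical name for a plain scalar Euclidean gcd. The check does not test the "lowest integer" rule.

```r
# Scalar Euclidean gcd (hypothetical helper for this check only)
gcd2 <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

a <- c(1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7)
idx <- 4:length(a)

coprime_prev <- mapply(gcd2, a[idx], a[idx - 1]) == 1   # gcd(a_n, a_(n-1)) = 1
shared_prev2 <- mapply(gcd2, a[idx], a[idx - 2]) > 1    # gcd(a_n, a_(n-2)) > 1

all(coprime_prev, shared_prev2, !duplicated(a))
# [1] TRUE
```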
A quick-and-dirty approach in R could look something like this, but understandably, it becomes very slow when generating longer sequences. It also makes an assumption about the highest number that can occur within the sequence (as info: the sequence of 10,000 items never goes higher than 5000).
What can we do to make this faster?
library(DescTools)
a <- c(1, 2, 3)
p <- length(a)
# all natural numbers
all_ints <- 1:5000
for (n in p:1000) {
# rule 1 - remove all number that are in sequence already
next_a_set <- all_ints[which(!all_ints %in% a)]
# rule 3 - search the remaining set for numbers that have gcd == 1
next_a_option <- next_a_set[which(
sapply(
next_a_set,
function(x) GCD(a[n], x)
) == 1
)]
# rule 4 - search the remaining number for gcd > 1
next_a <- next_a_option[which(
sapply(
next_a_option,
function(x) GCD(a[n - 1], x)
) > 1
)]
# select the lowest
a <- c(a, min(next_a))
n <- n + 1
}
Here's a version that's about 20 times faster than yours, with comments about the changes:
# Set a to the final length from the start.
a <- c(1, 2, 3, rep(NA, 997))
p <- 3
# Define a vectorized gcd() function. We'll be testing
# lots of gcds at once. This uses the Euclidean algorithm.
gcd <- function(x, y) { # vectorized gcd
while (any(y != 0)) {
x1 <- ifelse(y == 0, x, y)
y <- ifelse(y == 0, 0, x %% y)
x <- x1
}
x
}
# Guess at a reasonably large vector to work from,
# but we'll grow it later if not big enough.
allnum <- 1:1000
# Keep a logical record of what has been used
used <- c(rep(TRUE, 3), rep(FALSE, length(allnum) - 3))
for (n in p:1000) {
# rule 1 - remove all number that are in sequence already
# nothing to do -- used already records that.
repeat {
# rule 3 - search the remaining set for numbers that have gcd == 1
keep <- !used & gcd(a[n], allnum) == 1
# rule 4 - search the remaining number for gcd > 1
keep <- keep & gcd(a[n-1], allnum) > 1
# If we found anything, break out of this loop
if (any(keep))
break
# Otherwise, make the set of possible values twice as big,
# and try again
allnum <- seq_len(2*length(allnum))
used <- c(used, rep(FALSE, length(used)))
}
# select the lowest
newval <- which.max(keep)
# Assign into the appropriate place
a[n+1] <- newval
# Record that it has been used
used[newval] <- TRUE
}
If you profile it, you'll see it spends most of its time in the gcd() function. You could probably make that a lot faster by redoing it in C or C++.
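To see what "vectorized" means for the gcd() defined above, here is that same function called on its own with one fixed number and a whole vector of candidates. The value 12 and the range 1:12 are arbitrary illustration inputs.

```r
# Same vectorized Euclidean gcd as in the answer above
gcd <- function(x, y) {
  while (any(y != 0)) {
    x1 <- ifelse(y == 0, x, y)
    y <- ifelse(y == 0, 0, x %% y)
    x <- x1
  }
  x
}

g <- gcd(12, 1:12)   # gcd of 12 with every candidate in one call
g
# [1]  1  2  3  4  1  6  1  4  3  2  1 12

which(g == 1)        # candidates that are coprime to 12
# [1]  1  5  7 11
```

One call replaces a whole sapply() over the candidates, which is where most of the speed-up comes from.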
The biggest change here is pre-allocation and restricting the search to numbers that have not yet been used.
library(numbers)
N <- 5e3
a <- integer(N)
a[1:3] <- 1:3
b <- logical(N) # which numbers have been used already?
b[1:3] <- TRUE
NN <- 1:N
system.time({
for (n in 4:N) {
a1 <- a[n - 1L]
a2 <- a[n - 2L]
for (k in NN[!b]) {
if (GCD(k, a1) == 1L && GCD(k, a2) > 1L) {
a[n] <- k
b[k] <- TRUE
break
}
}
if (!a[n]) {
a <- a[1:(n - 1L)]
break
}
}
})
#> user system elapsed
#> 1.28 0.00 1.28
length(a)
#> [1] 1137
For a fast C++ algorithm, see here.
x is a double precision vector containing numbers both smaller and bigger than 1:
x <- c(1, 2, 3, 0.1, 0.02, 0.003)
If an element in x is smaller than one, I would like to multiply that element by 10 until it is bigger than one; if the element is already bigger than one, it should remain unchanged. The result should be a double precision vector again:
(1, 2, 3, 1, 2, 3)
I feel like this should be very simple, but the solution eludes me. Here is what I did so far. I defined the function times_ten that does the operation described above:
times_ten <- function(x) {
while (x < 1) {
x <- x * 10
}
}
Then I apply this function to the vector x:
y <- sapply(c(1, 2, 3, 0.1, 0.02, 0.003), function(x) times_ten(x))
But now y is a list and no longer a double. Even worse, all values in y are now NULL.
Do I have to change the function, and if so, how? Or is there a better approach entirely?
As @Neel said, you need return(x) at the end of your function times_ten.
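Below is a simpler, vectorized method for the conversion that uses log10(). The exponent is the number of tenfold steps needed to reach at least 1, and pmax() clamps it at 0 so that values already at or above 1 stay unchanged (this also covers values between 0.1 and 1, such as 0.5):
y <- x * 10^pmax(ceiling(-log10(x)), 0)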
such that
> y
[1] 1 2 3 1 2 3
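As a quick cross-check of the log10() idea against the loop from the question, the sketch below compares the two on the question's data plus one extra value, 0.5, which lies between 0.1 and 1. The name times_ten_vec is hypothetical, and both versions assume strictly positive input.

```r
# Loop version from the question, with the result returned
times_ten <- function(x) {
  while (x < 1) x <- x * 10
  x
}

# Hypothetical vectorized variant: multiply by 10^(number of steps needed),
# never by less than 10^0
times_ten_vec <- function(x) x * 10^pmax(ceiling(-log10(x)), 0)

x <- c(1, 2, 3, 0.1, 0.02, 0.003, 0.5)
res_loop <- sapply(x, times_ten)
res_vec  <- times_ten_vec(x)

# Compare with a tolerance, since repeated multiplication by 10
# can differ from a single multiplication in the last bits
all.equal(res_vec, res_loop)
```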
I guess you missed the return statement in your function.
times_ten <- function(x){
while (x < 1) {
x <- x * 10
}
return(x)
}
With that, your solution is perfectly right.
y <- sapply(c(1, 2, 3, 0.1, 0.02, 0.003), function(x) times_ten(x))
Here is another solution.
library(tidyverse)
x %>% map_dbl(~times_ten(.x))
which gives the output
[1] 1 2 3 1 2 3
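If you prefer to stay in base R, vapply() is a possible alternative to map_dbl(): like map_dbl(), it insists that every call returns a single double, so the result is a double vector and never a list. A minimal sketch, reusing the corrected times_ten from above:

```r
times_ten <- function(x) {
  while (x < 1) {
    x <- x * 10
  }
  x
}

x <- c(1, 2, 3, 0.1, 0.02, 0.003)
y <- vapply(x, times_ten, numeric(1))   # numeric(1): each result is one double

is.double(y)
# [1] TRUE
```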
I'm given arrays of numbers between 1 and 4, but usually they don't differ more than .5 between the min and max. The difference between each element is no smaller than .1. I want to find the smallest margin that contains at least 90% (or some other specified rate) of the elements.
That is, given the array
c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
I want my function to return .4 because 2.3 - 1.9 = .4 < 2.3 - 1 = 1.3. Details:
2.3 - 1.9 comes from the 90%-length subvector starting at 1.9 and running to the end
2.3 - 1 comes from the 90%-length subvector starting at 1 and ending at the first 2.3
I tried to build the function a few times, but it keeps growing overly complicated, and I'm wondering if there's a simple way to do this that I haven't considered.
Edit: it has to be able to satisfy skewed distributions. I don't have any completed examples of code I produced since I keep reconstructing it, but I'll make something and post it.
Edit2: I can't provide any examples of the arrays I want to feed into the function, but here's a way of generating similar values. It's not important that they don't fall in the 1 to 4 range, as long as it works.
x = round(rbeta(20,5,2)*100)/10
The easiest way will be to brute force by testing all possible ranges that include 90%. To do this, we figure out how many terms that is, and what indices the ranges therefore can start at, and compute the difference for each, and then the minimum of those.
x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
n <- ceiling(length(x)*0.9) # get the number of terms needed to include 90%
k <- 1 : (length(x) - n + 1) # get the possible indices the range can start at
x <- sort(x) # need them sorted...
d <- x[k + n - 1] - x[k] # get the difference starting at each range
min(d) # get the smallest difference
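The same steps can be wrapped in a function so that the rate is a parameter. This is only a sketch: min_margin and its rate argument are hypothetical names, and it assumes x is a non-empty numeric vector.

```r
min_margin <- function(x, rate = 0.9) {
  x <- sort(x)
  n <- ceiling(length(x) * rate)            # number of elements each window must hold
  starts <- seq_len(length(x) - n + 1)      # possible starting positions
  min(x[starts + n - 1] - x[starts])        # narrowest window of n sorted values
}

min_margin(c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3))
# [1] 0.4
```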
Here's one way (same as @Aaron's except head/tail instead of x[i]):
x = c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
xn= length(x)
# number of elements to drop
n = round(0.1*xn)
# achievable ranges
v = tail(x, n+1) - head(x, n+1)
min(v)
# [1] 0.4
Confirmation that a subvector of x dropping n elements really has this range:
n_up = which.min(v) - 1
n_dn = n-n_up
xs = x[(1 + n_up):(xn - n_dn)]
diff(range(xs))
# [1] 0.4
length(x) - length(xs) == n
# [1] TRUE
Testing on new example:
set.seed(1)
x0 = round(rbeta(20,5,2)*100)/10
x = sort(x0)
xn= length(x)
n = round(0.1*xn)
v = tail(x, n+1) - head(x, n+1)
min(v)
# [1] 4.1
# confirm...
n_up = which.min(v) - 1
n_dn = n-n_up
xs = x[(1 + n_up):(xn - n_dn)]
diff(range(xs))
# [1] 4.1
length(x) - length(xs) == n
# [1] TRUE
Partial sorting might be sufficient (just to get the top and bottom values on the ends); see ?sort.
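A sketch of the partial-sorting idea, using the question's data in shuffled order: only the n + 1 smallest and n + 1 largest positions are requested via the partial argument of sort(), so those positions hold the values they would have in a full sort while the middle is left in arbitrary order. The variable names ends and xp are made up for this example.

```r
x0 <- c(2, 2.3, 1, 2.1, 2, 1.9, 2.3, 2, 2.2, 2)   # question's data, shuffled
xn <- length(x0)
n  <- round(0.1 * xn)                              # number of elements to drop

# positions that must be correct: the first n+1 and the last n+1
ends <- unique(c(seq_len(n + 1), (xn - n):xn))
xp <- sort(x0, partial = ends)

v <- tail(xp, n + 1) - head(xp, n + 1)             # achievable ranges
min(v)
# [1] 0.4
```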
This can be solved with quantile.
Compute the 0.05 and 0.95 quantiles.
Get the values of x that are within those limits. Call this vector in_90.
Return the difference between the minimum and the maximum of those values of in_90.
The sequence of instructions would be this.
qq <- quantile(x, c(0.05, 0.95))
in_90 <- x[qq[1] <= x & x <= qq[2]]
diff(range(in_90))
#[1] 0.4
As a function:
amplitude <- function(x, conf = 0.9){
quants <- c((1 - conf)/2, 1 - (1 - conf)/2)
qq <- quantile(x, quants)
inside <- x[qq[1] <= x & x <= qq[2]]
diff(range(inside))
}
amplitude(x)
#[1] 0.4
Data.
x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
For tutorial purposes, I'd like to be able to print or display matrices and vectors side-by-side, often to illustrate the result of a matrix equation, like $A x = b$.
I could do this using SAS/IML, where the print statement takes an arbitrary collection of (space separated) expressions, evaluates them and prints the result, e.g.,
print A ' * ' x '=' (A * x) '=' b;
        A              X      #TEM1001      B
  1   1  -4   *    0.733   =      2     =   2
  1  -2   1        -0.33          1         1
  1   1   1        -0.4           0         0
Note that quoted strings are printed as is.
I've searched, but can find nothing like this in R. I imagine something like this could be done by a function showObj(object, ...) taking its list of arguments, formatting each to a block of characters, and joining them side-by-side.
Another use of this would be a compact way of displaying a 3D array as the side-by-side collection of its slices.
Does this ring a bell or does anyone have a suggestion for getting started?
I have created a very simple function that can print matrices and vectors with arbitrary character strings (typically operators) in between. It allows for matrices with different numbers of rows and treats vectors as column matrices. It is not very elaborate, so I fear there are many cases where it fails. But for an example as simple as the one in your question, it should be enough.
format() is used to convert the numbers to characters. This has the advantage that all the rows of the matrix have the same width and are thus nicely aligned when printed. If needed, you could expose more of the arguments of format() as arguments of mat_op_print() to make them configurable. As an example, I have added the argument width, which controls the minimal width of the columns.
If the matrices and vectors are named in the function call, these names are printed as headers in the first line. Otherwise, only the numbers are printed.
So, this is the function:
mat_op_print <- function(..., width = 0) {
# get arguments
args <- list(...)
chars <- sapply(args, is.character)
# auxiliary function to create a string of n spaces
spaces <- function(n) paste(rep(" ", n), collapse = "")
# convert vectors to column matrices
vecs <- sapply(args, is.vector)
args[vecs & !chars] <- lapply(args[vecs & !chars], function(v) matrix(v, ncol = 1))
# convert all non-characters to character with format
args[!chars] <- lapply(args[!chars], format, width = width)
# print names as the first line, if present
arg_names <- names(args)
if (!is.null(arg_names)) {
get_title <- function(x, name) {
if (is.matrix(x)) {
paste0(name, spaces(sum(nchar(x[1, ])) + ncol(x) - 1 - nchar(name)))
} else {
spaces(nchar(x))
}
}
cat(mapply(get_title, args, arg_names), "\n")
}
# auxiliary function to create the lines
get_line <- function(x, n) {
if (is.matrix(x)) {
if (nrow(x) < n) {
spaces(sum(nchar(x[1, ])) + ncol(x) - 1)
} else {
paste(x[n, ], collapse = " ")
}
} else if (n == 1) {
x
} else {
spaces(nchar(x))
}
}
# print as many lines as needed for the matrix with most rows
N <- max(sapply(args[!chars], nrow))
for (n in 1:N) {
cat(sapply(args, get_line, n), "\n")
}
}
And this is an example of how it works:
A = matrix(c(0.5, 1, 3, 0.75, 2.8, 4), nrow = 2)
x = c(0.5, 3.7, 2.3)
y = c(0.7, -1.2)
b = A %*% x - y
mat_op_print(A = A, " * ", x = x, " - ", y = y, " = ", b = b, width = 6)
## A                        x          y          b
##   0.50   3.00   2.80  *     0.5  -     0.7  =  17.090
##   1.00   0.75   4.00        3.7       -1.2     13.675
##                             2.3
Also printing the slices of a 3-dimensional array side-by-side is possible:
A <- array(1:12, dim = c(2, 2, 3))
mat_op_print(A1 = A[, , 1], " | ", A2 = A[, , 2], " | ", A3 = A[, , 3])
## A1      A2      A3
## 1 3  |  5 7  |   9 11
## 2 4     6 8     10 12
I want to create a function that finds the elements of a vector which are followed by k consecutive increases.
That is, if the function is f(x,k) and x=c(2,3,4,3,5,6,5,7), then
the value of f(x,1) is 2,3,3,5,5, since only these elements of x are followed by one increase.
In addition, if k=2, then the value of f(x,2) is 2,3, since only these elements are followed by two consecutive increases (2→3→4 and 3→5→6).
I guess that I ought to use a looping construct such as for for this purpose.
1) Use rollapply from the zoo package:
library(zoo)
f <- function(x, k)
x[rollapply(x, k+1, function(x) all(diff(x) > 0), align = "left", fill = FALSE)]
Now test out f:
x <- c(2,3,4,3,5,6,5,7)
f(x, 1)
## [1] 2 3 3 5 5
f(x, 2)
## [1] 2 3
f(x, 3)
## numeric(0)
1a) This variation is slightly shorter and also works:
f2 <- function(x, k) head(x, -k)[ rollapply(diff(x) > 0, k, all) ]
2) Here is a version of 1a that uses no packages:
f3 <- function(x, k) head(x, -k)[ apply(embed(diff(x) > 0, k), 1, all) ]
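To make the embed() step less opaque, here are the intermediate values of the package-free version for the example vector with k = 2. The variable names inc, windows and runs are made up for this illustration.

```r
x <- c(2, 3, 4, 3, 5, 6, 5, 7)
k <- 2

inc <- diff(x) > 0               # TRUE where the next element is larger
windows <- embed(inc, k)         # one row per window of k consecutive steps
runs <- apply(windows, 1, all)   # TRUE where all k steps in the window increase

runs
# [1]  TRUE FALSE FALSE  TRUE FALSE FALSE

head(x, -k)[runs]                # starting elements of those windows
# [1] 2 3
```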
A fully vectorized solution:
f <- function(x, k = 1) {
rlecumsum = function(x)
{ #cumsum with resetting
#http://stackoverflow.com/a/32524260/1412059
cs = cumsum(x)
cs - cummax((x == 0) * cs)
}
x[rev(rlecumsum(rev(c(diff(x) > 0, FALSE) ))) >= k]
}
f(x, 1)
#[1] 2 3 3 5 5
f(x, 2)
#[1] 2 3
f(x, 3)
#numeric(0)
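The key piece is the resetting cumulative sum. A small sketch of what it computes, first on an arbitrary 0/1 vector and then on the reversed "is followed by an increase" flags of the example, where it yields the number of consecutive increases that follow each element:

```r
rlecumsum <- function(x) {
  cs <- cumsum(x)
  cs - cummax((x == 0) * cs)     # subtract the running total at the last zero
}

rlecumsum(c(1, 1, 0, 1, 1, 1, 0, 1))
# [1] 1 2 0 1 2 3 0 1

x <- c(2, 3, 4, 3, 5, 6, 5, 7)
inc <- c(diff(x) > 0, FALSE)     # last element has no successor
n_inc <- rev(rlecumsum(rev(inc)))
n_inc
# [1] 2 1 0 2 1 0 1 0

x[n_inc >= 2]
# [1] 2 3
```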
I don't quite understand the second part of your question (the case k=2), but for the first part you can use something like this:
test<-c(2,3,4,3,5,6,5,7) #Your vector
diff(test) #Computes the differences between consecutive elements
diff(test)>0 #Turns the differences into a logical vector with criterion >0
#diff() returns one element fewer than test, so append FALSE for the last element; otherwise the logical index is recycled and the final 7 is returned as well
test[c(diff(test)>0, FALSE)] #Returns only the elements of test that are followed by an increase