R convert vector of numbers to skipping indexes - r

I have a vector of widths,
ws = c(1,1,2,1,3,1)
From this vector I'd like to have another vector of this form:
indexes = c(1,2,3,5,6,7,9,11,12)
In order to create such vector I did the following for loop in R:
ws = c(1,1,2,1,3,1)
indexes = rep(0, sum(ws))
counter = 1
counter2 = 1
last = 0
for(i in 1:length(ws))
{
  if (ws[i] == 1)
  {
    indexes[counter] = counter2
    counter = counter + 1
  } else {
    for(j in 1:ws[i])
    {
      indexes[counter] = counter2
      counter = counter + 1
      counter2 = counter2+2
    }
    counter2 = counter2 - 2
  }
  counter2 = counter2+1
}
The logic is as follows: each element of ws specifies how many elements it contributes to indexes. If an element of ws is 1, it contributes a single index. If it is greater than 1, say 3, it contributes 3 indexes that skip every other value, for example 3, 5, 7.
However, I'd like to avoid for loops since they tend to be slow in R. Do you have any suggestions on how to achieve this result using only vector operations, or some more crantastic solution?
Thanks!

Here's a vectorized one-liner for you:
ws <- c(1,1,2,1,3,1)
cumsum((unlist(sapply(ws, seq_len)) > 1) + 1)
# [1] 1 2 3 5 6 7 9 11 12
You can pick it apart piece by piece, working from the inside out, to see how it works.
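To see why the one-liner works, it can help to replay it step by step in another language. The following is a minimal Python sketch of the same idea, not a drop-in replacement for the R code; the function name skipping_indexes is made up for illustration. Each width w expands to positions 1..w, every position after the first in a run costs a step of 2 instead of 1, and a running sum turns the steps into indexes.

```python
from itertools import accumulate


def skipping_indexes(ws):
    """Hypothetical helper mirroring cumsum((unlist(sapply(ws, seq_len)) > 1) + 1)."""
    # Mirror of unlist(sapply(ws, seq_len)): positions 1..w within each run.
    positions = [pos for w in ws for pos in range(1, w + 1)]
    # Mirror of (positions > 1) + 1: step 2 inside a run, step 1 otherwise.
    steps = [2 if pos > 1 else 1 for pos in positions]
    # Mirror of cumsum(): running total of the steps.
    return list(accumulate(steps))


print(skipping_indexes([1, 1, 2, 1, 3, 1]))
# [1, 2, 3, 5, 6, 7, 9, 11, 12]
```

Printing positions and steps separately shows the intermediate vectors that the R expression builds from the inside out.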

Related

Rewriting a loop with Rcpp

I am a novice Rcpp user. I want to speed up my for loop, which uses several objects from the R environment and updates two vectors across iterations.
The problem is that this is my first time working with C or C++, so I do not understand how to write Rcpp code with the inline package.
Here is the reproducible loop that I want to rewrite.
rsi <- c(NaN, 0, 0, 9.2, 28, 11, 9, 8, 38, 27, 62, 57,59,67, 76, 68, 69, 49)
L <- 2
o <- 2
T_min <-100
T_m <- 0
# Predefine two vectors for results to be written in
rsi_u <- rep(0, length(rsi))
rsi_d <- rep(0, length(rsi))
# Set the range the for loop is applied to
st <- L + 1 # L and o are parameters from the environment
en <- length(rsi) - o - 2
for (i in st:en) {
k <- i - o + 1
k1 <- i - L + 1
if (sum(rsi_u[k:i]) == 0 & sum(rsi_d[k:i]) == 0) {
if (min(rsi[k1:i]) == rsi[i] & rsi[i] < T_min) {
rsi_d[i] <- 1
}
if (max(rsi[k1:i]) == rsi[i] & rsi[i] > T_m) {
rsi_u[i] <- 1
}
}
}
As you can see, the loop first checks the condition
if (sum(rsi_u[k:i]) == 0 & sum(rsi_d[k:i]) == 0)
and then checks two other conditions. If one of these conditions is TRUE, it writes 1 to the ith element of one of the two predefined vectors. In addition, each iteration relies on the results of previous iterations.
The result of this loop is two vectors: rsi_u and rsi_d.
In order to speed up this loop I decided to rewrite it with Rcpp and inline.
This is what I ended up with:
library("Rcpp")
library("inline")
loop_c <- cxxfunction(signature(k = "numeric", L = "numeric",
en = "numeric", rsi = "numeric", o = "numeric", T_min = "numeric", T_m ="numeric"),
plugin = "Rcpp", body = "
for (int i = L + 1; i <= en; i++) {
k = i - o + 1
k1 = i - L + 1
if (accumulate(rsi_u.k(), rsi_u.i(), 0)=0 &&
accumulate(rsi_d.k(), rsi_d.i(), 0)=0) {
if (min_element(rsi.k1(), rsi.i()) = rsi.i() && rsi.i < T_min) {
rsi_u.i = 1
}
if (max_element(rsi.k1(), rsi.i()) = rsi.i() && rsi.i > T_m) {
rsi_d.i = 1
}
}
}
return ?")
So here are my questions:
How can I return to R environment vecs rsi_u and rsi_d in form of data.frame or matrix with 2 cols and length(rsi) rows?
Maybe this loop can be sped up with other tools? I tried the apply family, but it was slower.
How can I return to R environment vecs rsi_u and rsi_d in form of data.frame or matrix with 2 cols and length(rsi) rows?
Not entirely sure what you're trying to achieve, but regardless you can rewrite your code in C++ using Rcpp and the sugar functions sum, max and min. The code is very similar to the equivalent R code. Two important things to be aware of: C++ is strongly typed, so 2 (an int) and 2.0 (a double) are different types, much like 2L and 2 in R; and vectors are 0-indexed rather than 1-indexed as in R (e.g. for NumericVector F(3) the first index is 0 and the last is 2, where in R they would be 1 and 3). This can lead to some confusion, but the rest of the code is the same.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List fun(NumericVector rsi,
double T_min, double T_m,
R_xlen_t L, R_xlen_t o) {
R_xlen_t n = rsi.size(),
st = L + 1,
en = n - o - 2;
NumericVector rsi_u(n), rsi_d(n);
// Note subsets are 0 indexed, so add -1 to indices
for(R_xlen_t i = st - 1; i < en; i++) {
R_xlen_t k = i - o + 1;
R_xlen_t k1 = i - L + 1;
Range sr(k, i), mr(k1, i);
//LogicalVector rsub = sum(rsi_u[sr]) == 0, rsdb = sum(rsi_d[sr]) == 0;
if(sum(rsi_u[sr]) == 0 && sum(rsi_d[sr]) == 0){
if(min(rsi[mr]) == rsi[i] && rsi[i] < T_min){
rsi_d[i] = 1.0;
}
if(max(rsi[mr]) == rsi[i] && rsi[i] > T_m){
rsi_u[i] = 1.0;
}
}
}
return DataFrame::create(Named("rsi_d") = rsi_d, Named("rsi_u") = rsi_u);
}
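For readers who want to check the 1-based to 0-based index shift without compiling anything, here is a plain Python sketch of the same loop. It is only an illustration under the assumption that the R loop in the question is the reference behaviour; the function name rsi_flags is hypothetical, and it returns a pair of lists rather than a data frame.

```python
def rsi_flags(rsi, L, o, t_min, t_m):
    """Hypothetical Python port of the question's R loop, using 0-based indices."""
    n = len(rsi)
    rsi_u = [0] * n
    rsi_d = [0] * n
    st = L + 1          # first 1-based index visited by the R loop
    en = n - o - 2      # last 1-based index visited by the R loop
    # R's st:en (1-based, inclusive) becomes range(st - 1, en) in 0-based terms.
    for i in range(st - 1, en):
        k = i - o + 1   # start of the "recent flags" window
        k1 = i - L + 1  # start of the min/max window
        # R's x[k:i] is inclusive on both ends, so the Python slice is [k:i + 1].
        if sum(rsi_u[k:i + 1]) == 0 and sum(rsi_d[k:i + 1]) == 0:
            window = rsi[k1:i + 1]
            if min(window) == rsi[i] and rsi[i] < t_min:
                rsi_d[i] = 1
            if max(window) == rsi[i] and rsi[i] > t_m:
                rsi_u[i] = 1
    return rsi_u, rsi_d


rsi = [float("nan"), 0, 0, 9.2, 28, 11, 9, 8, 38, 27,
       62, 57, 59, 67, 76, 68, 69, 49]
u, d = rsi_flags(rsi, L=2, o=2, t_min=100, t_m=0)
print([i for i, flag in enumerate(u) if flag])  # 0-based positions flagged in rsi_u
print([i for i, flag in enumerate(d) if flag])  # 0-based positions flagged in rsi_d
```

With the sample data the NaN in the first slot is never read, because the smallest window starts at the second element.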
As a side note, the inline package is nowadays completely redundant. Most (if not all?) of its functionality is covered by the Rcpp::cppFunction and Rcpp::sourceCpp functions. The code above can be imported using either of the commands below:
library(Rcpp)
cppFunction(
'
// copy code to here. Note the single " ' "! Needed if there are double quotes in your C++ code
')
# Alternative
sourceCpp(
file = # Insert file path to file with code here
# Alt:
# code = '
# // copy code to here. Note the single " ' "! Needed if there are double quotes in your C++ code
# '
)
And that's it.
Maybe this loop can be sped up with other tools? I tried the apply family, but it was slower.
As for this part of your question, the main idea you should be looking toward is vectorizing your code. In your example that is not immediately possible, as the loop overwrites parts of the rsi_d and rsi_u vectors that later iterations read in their conditions. Using *apply is equivalent to using a for loop and will not improve performance significantly.

Error in for loop - attempt to select less than one element in integerOneIndex

I'm trying to translate a C routine from an old sound synthesis program into R, but have indexing issues which I'm struggling to understand (I'm a beginner when it comes to using loops).
The routine creates an exponential lookup table - the vector exptab:
# Define parameters
sinetabsize <- 8192
prop <- 0.8
BP <- 10
BD <- -5
BA <- -1
# Create output vector
exptab <- vector("double", sinetabsize)
# Loop
while(abs(BD) > 0.00001){
BY = (exp(BP) -1) / (exp(BP*prop)-1)
if (BY > 2){
BS = -1
}
else{
BS = 1
}
if (BA != BS){
BD = BD * -0.5
BA = BS
BP = BP + BD
}
if (BP <= 0){
BP = 0.001
}
BQ = 1 / (exp(BP) - 1)
incr = 1 / sinetabsize
x = 0
stabsize = sinetabsize + 1
for (i in (1:(stabsize-1))){
x = x + incr
exptab [[sinetabsize-i]] = 1 - (BQ * (exp(BP * x) - 1))
}
}
Running the code gives the error:
Error in exptab[[sinetabsize - i]] <- 1 - (BQ * (exp(BP * x) - 1)) :
attempt to select less than one element in integerOneIndex
Which, I understand from looking at other posts, indicates an indexing problem. But, I'm finding it difficult to work out the exact issue.
I suspect the error may lie in my translation. The original C code for the last few lines is:
for (i=1; i < stabsize;i++){
x += incr;
exptab[sinetabsize-i] = 1.0 - (float) (BQ*(exp(BP*x) - 1.0));
}
I had thought the R code for (i in (1:(stabsize-1))) was equivalent to the C code for (i=1; i< stabsize;i++) (i.e. the initial value of i is i = 1, the test is whether i < stabsize, and the increment is +1). But now I'm not so sure.
Any suggestions as to where I'm going wrong would be greatly appreciated!
As you say, array indexing in R starts at 1. In C it starts at zero. I reckon that's your problem. Can sinetabsize-i ever get to zero?
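A quick way to answer that question is to list the indices the loop actually produces. The sketch below does this in Python with the table size shrunk to 8 purely for readability (the question uses 8192); the variable names c_indices and r_indices are illustrative only.

```python
sinetabsize = 8  # shrunk from 8192 so the indices are easy to read
stabsize = sinetabsize + 1

# Indices written by the C loop: for (i = 1; i < stabsize; i++) exptab[sinetabsize - i]
c_indices = [sinetabsize - i for i in range(1, stabsize)]
print(c_indices)  # [7, 6, 5, 4, 3, 2, 1, 0]

# The last index is 0: valid in C, but not a valid element position in R.
# Shifting every index by one gives positions that exist in a 1-based vector.
r_indices = [idx + 1 for idx in c_indices]
print(r_indices)  # [8, 7, 6, 5, 4, 3, 2, 1]
```

So one plausible fix, assuming the table should keep its size, is to index with sinetabsize - i + 1 on the R side.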

Sherlock and Cost on Hackerrank

It's about this dynamic programming challenge.
If you have a hard time understanding the problem, see also AbhishekVermaIIT's post.
Basically, you get an array B as input and you construct an array A. For this array A you need the maximum possible sum of absolute(A[i] - A[i-1]), for i = 1 to N. How to construct array A? --> For every element A[i] you can choose either the value 1 or B[i]. (As you will deduce from the problem description, any other value between these two doesn't make any sense.)
I came up with this recursive Java solution (without memoization):
static int costHelper(int[] arr, int i) {
if (i < 1) return 0;
int q = max(abs(1 - arr[i-1]) + costHelper(arr, i-1) , abs(arr[i] - arr[i-1]) + costHelper(arr, i-1));
int[] arr1 = new int[i];
for (int j = 0; j < arr1.length-1; j++) {
arr1[j] = arr[j];
}
arr1[i-1] = 1;
int r = max(abs(1 - 1) + costHelper(arr1, i-1) , abs(arr[i] - 1) + costHelper(arr1, i-1));
return max(q , r);
}
static int cost(int[] arr) {
return costHelper(arr, arr.length-1);
}
public static void main(String[] args) {
int[] arr = {55, 68, 31, 80, 57, 18, 34, 28, 76, 55};
int result = cost(arr);
System.out.println(result);
}
Basically, I start at the end of the array and check which choice maximizes the absolute difference between the last element and the one before it. But I have 4 cases:
(1 - arr[i-1])
(arr[i] - arr[i-1])
(1 - 1) // I know, it is not necessary.
(arr[i] -1)
For the 3rd or 4th case I construct a new array one element smaller in size than the input array and with a 1 as the last element.
Now, the result of arr = 55 68 31 80 57 18 34 28 76 55 according to Hackerrank should be 508. But I get 564.
Since it has to be 508 I guess the array should be 1 68 1 80 1 1 34 1 76 1.
For other arrays I get the right answer. For example:
79 6 40 68 68 16 40 63 93 49 91 --> 642 (OK)
100 2 100 2 100 --> 396 (OK)
I don't understand what is wrong with this algorithm.
I'm not sure exactly what's happening with your particular solution, but I suspect the issue is that the recursive function only has one dimension, i. We need a way to identify the best previous solution, f(i-1), both for the case where B_(i-1) was chosen and for the case where 1 was chosen at that point, so that we can choose the best among them vis-a-vis f(i). (It might help if you could add a description of your algorithm in words.)
Let's look at the brute-force dynamic program: let m[i][j1] represent the best sum-of-abs-diff in A[0..i] when A_i is j1. Then, generally:
m[i][j1] = max(abs(j1 - j0) + m[i-1][j0])
for j0 in [1..B_(i-1)] and j1 in [1..B_i]
Python code:
def cost(arr):
    if len(arr) == 1:
        return 0
    # m[i][j1]: best sum for A[0..i] when A[i] == j1
    m = [[float('-inf')]*101 for i in range(len(arr))]
    for i in range(1, len(arr)):
        for j0 in range(1, arr[i-1] + 1):
            for j1 in range(1, arr[i] + 1):
                m[i][j1] = max(m[i][j1], abs(j1 - j0) + (m[i-1][j0] if i > 1 else 0))
    return max(m[len(arr) - 1])
That works but times out since we are looping potentially 100*100*10^5 iterations.
I haven't thought through the proof for it, but, as you suggest, apparently we can choose only from either 1 or B_i for each A_i for an optimal solution. This allows us to choose between those directly in a significantly more efficient solution that won't time out:
def cost(arr):
    if len(arr) == 1:
        return 0
    # m[i][0]: best sum when A[i] == 1; m[i][1]: best sum when A[i] == arr[i]
    m = [[float('-inf')]*2 for i in range(len(arr))]
    for i in range(1, len(arr)):
        for j0 in [1, arr[i-1]]:
            for j1 in [1, arr[i]]:
                a_i = 0 if j1 == 1 else 1
                b_i = 0 if j0 == 1 else 1
                m[i][a_i] = max(m[i][a_i], abs(j1 - j0) + (m[i-1][b_i] if i > 1 else 0))
    return max(m[len(arr) - 1])
This is a bottom-up tabulation but we could easily convert it to a recursive one using the same idea.
Here is JavaScript code with memoization:
function cost(B,n,val) {
if(n==-1){
return 0;
}
let prev1=0,prev2=0;
if(n!=0){
if(dp[n-1][0]==-1)
dp[n-1][0] = cost(B,n-1,1);
if(dp[n-1][1]==-1)
dp[n-1][1] = cost(B,n-1,B[n]);
prev1=dp[n-1][0];
prev2=dp[n-1][1];
}
prev1 = prev1 + Math.abs(val-1);
prev2 = prev2+ Math.abs(val-B[n]);
return Math.max(prev1,prev2);
}
where B->given array,n->total length,val-> 1 or B[n], value considered by the calling function.
Initial call -> Math.max(cost(B,n-2,1),cost(B,n-2,B[n-1]));
BTW, this took me around 3hrs, rather could have easily done with iteration method. :p
//dp[][0] is when a[i]=b[i]
dp[i][0]=max((dp[i-1][0]+abs(b[i]-b[i-1])),(dp[i-1][1]+abs(b[i]-1)));
dp[i][1]=max((dp[i-1][1]+abs(1-1)),(dp[i-1][0]+abs(b[i-1]-1)));
Initially all the elements in dp have the value of 0.
We know that we will get the answer if at any i the value is b[i] or 1. So the final answer is :
max(dp[n-1][0],dp[n-1][1])
dp[i][0] signifies a[i]=b[i] and dp[i][1] signifies a[i]=1.
So at every i we want the maximum of [i-1][0] (previous element is b[i-1]) or [i-1][1] (previous element is 1)
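The recurrence above only ever looks one step back, so it can be run with two running values instead of a full table. The following Python sketch does that; it assumes, as the thread does, that each A[i] is either 1 or B[i], and the names max_cost, at_b and at_one are made up for illustration.

```python
def max_cost(b):
    """Two-state DP sketch: at_b means A[i] == b[i], at_one means A[i] == 1."""
    at_b = 0    # best sum so far if the current element is b[i]
    at_one = 0  # best sum so far if the current element is 1
    for i in range(1, len(b)):
        # Current element is b[i]: previous was b[i-1] or 1.
        new_at_b = max(at_b + abs(b[i] - b[i - 1]), at_one + abs(b[i] - 1))
        # Current element is 1: previous was 1 (adds 0) or b[i-1].
        new_at_one = max(at_one, at_b + abs(b[i - 1] - 1))
        at_b, at_one = new_at_b, new_at_one
    return max(at_b, at_one)


print(max_cost([100, 2, 100, 2, 100]))                     # 396
print(max_cost([55, 68, 31, 80, 57, 18, 34, 28, 76, 55]))  # 508
```

Both printed values match the expected results quoted in the question, including the 508 case that the recursive Java attempt gets wrong.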

Sum of combinations of numbers

I want to solve a mathematical problem in the fastest possible way.
I have a set of natural numbers from 1 to n, for example {1,2,3,4,5} with n=5, and I want to calculate a formula like this:
s = 1*2*3*4+1*2*3*5+1*2*4*5+1*3*4*5+2*3*4*5
As you can see, each term in the sum is a product of n-1 numbers from the set. For example, in (1*2*3*4), 5 is excluded and in (1*2*3*5), 4 is excluded. I know some of the multiplications are repeated, for example (1*2) appears in 3 of the products. How can I solve this problem with the least number of multiplications?
Sorry for bad English.
Thanks.
Here is a way that does not "cheat" by replacing multiplication with repeated addition or by using division. The idea is to replace your expression with
1*2*3*4 + 5*(1*2*3 + 4*(1*2 + 3*(1 + 2)))
This used 9 multiplications for the numbers 1 through 5. In general I think the multiplication count would be one less than the (n-1)th triangular number, n * (n - 1) / 2 - 1. Here is Python code that stores intermediate factorial values to reduce the number of multiplications to just 6, or in general 2 * n - 4, and the addition count to the same (but half of them are just adding 1):
def f(n):
    fact = 1
    term = 2
    sum = 3
    for j in range(2, n):
        fact *= j
        term = (j + 1) * sum
        sum = fact + term
    return sum
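As a sanity check on the multiplication count claimed above, here is a small Python sketch that compares the running-factorial idea with a brute-force sum and counts the multiplications it performs. The function names brute_force and running_factorial are hypothetical, and the sketch assumes n is at least 2, like the function above.

```python
from math import prod


def brute_force(n):
    """Sum over i of the product of all numbers in 1..n except i."""
    return sum(prod(j for j in range(1, n + 1) if j != i)
               for i in range(1, n + 1))


def running_factorial(n):
    """Same recurrence as above, also returning the multiplications used (n >= 2)."""
    fact = 1    # (j)! after each step
    total = 3   # value for n = 2, i.e. 1 + 2
    mults = 0
    for j in range(2, n):
        fact *= j                       # one multiplication
        total = fact + (j + 1) * total  # one multiplication
        mults += 2
    return total, mults


print(brute_force(5))        # 274
print(running_factorial(5))  # (274, 6)
```

The count of 6 for n = 5 agrees with the 2 * n - 4 figure given in the answer.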
The only way to find which algorithm is the fastest is to code all of them in one language, and run each using a timer.
The following would be the most straightforward answer.
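from functools import reduce  # reduce is not a builtin in Python 3

def f(n):
    result = 0
    nList = [i+1 for i in range(n)]
    for i in range(len(nList)):
        # product of every number except nList[i]; the initial 1 covers an empty list
        result += reduce(lambda x, y: x*y, (nList[:i]+nList[i+1:]), 1)
    return result
Walkthrough: use the reduce function to multiply the elements of each list of length n-1 and add the product to the variable result.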
If you just want to minimise the number of multiplications, you can replace all the multiplications by additions, like this:
// Compute 1*2*…*n
mult_all(n):
    if n = 1
        return 1
    res = 0
    // by adding 1*2*…*(n-1) a total of n times
    for i = 1 to n do
        res += mult_all(n-1)
    return res
// Compute sum of 1*2*…*(i-1)*(i+1)*…*n
sum_of_mult_all_but_one(n):
    if n = 1
        // the only term is the empty product, which is 1
        return 1
    // by computing 1*2*…*(n-1) + (sum 1*2*…*(i-1)*(i+1)*…*(n-1))*n
    res = mult_all(n-1)
    for i = 1 to n do
        res += sum_of_mult_all_but_one(n-1)
    return res
Here is an answer that would work with javascript. It is not the fastest way because it is not optimized, but it should work if you want to just find the answer.
function combo(n){
var mult = 1;
var sum = 0;
for (var i = 1; i <= n; i++){
mult = 1;
for (var j = 1; j<= n; j++){
if(j != i){
mult = mult*j;
}
}
sum += mult;
}
return (sum);
}
alert(combo(n));

Project Euler #2 in R

I'm a novice at programming and I've been trying Project Euler to learn more about R.
The question that's stumping me is as follows:
Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89
By considering the terms in the Fibonacci sequence whose values do not exceed 4 million, find the sum of the even valued terms.
I've started to tackle this problem by setting the first couple numbers in the sequence:
num <- c(1, 2)
Next I'm trying to create a function fib() that will sum the last two numbers in num and append them to the vector.
My first attempt has been to do the following:
num <- c(1, 2)
fib <- function(x) {
sum <- sum(tail(x,2))
while (sum <= 4e6) {
x <- append(x, sum)
return(x)
}
}
But when I run this I only get one additional Fibonacci number:
> fib(num)
[1] 1 2 3
In turn, I tried moving return(x):
num <- c(1, 2)
fib <- function(x) {
sum <- sum(tail(x,2))
while (sum <= 4e6) {
x <- append(x, sum)
}
return(x)
}
But this seems to only create an infinite loop.
As a caveat, I'm comfortable summing the even-valued terms with a modulo and I'll do this after I work out this particular issue.
Can you point out where I'm going wrong with my code? (If you could give me a bump in the right direction without providing an explicit solution that would be even more appreciated.)
Thanks to @Minnow and @blakeoft for helping me to get a handle on this. Here's what I ended up doing to solve this problem:
SPOILER ALERT
num <- c(1, 2)
fib <- function(x) {
last2 <- sum(tail(x,2))
while (last2 <= 4000000) {
x <- append(x, last2)
last2 <- sum(tail(x,2))
}
return(x)
}
I won't include the last bit with the modulo since I want to keep the answer specific to the initial question.
Is this a language-specific problem? You don't really need a function, just three variables say prev, curr and next. Preload the first two and prime the even values' sum. You know how to set the value of next, then process it and shove the values back through your history.
You didn't want a specific answer, but now that you have accepted one, this is what I meant. It's not R, but it shows how simple it can be.
int sum = 2;
int prev = 1;
int curr = 2;
int next = 0;
while (next < 4000000) {
next = prev + curr;
if (next % 2 == 0)
sum += next;
prev = curr;
curr = next;
}
printf ("%d", sum);
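The same three-variable idea can be sketched in Python. This version differs slightly from the C snippet in that it checks the limit before adding a term, so a term above the limit is never summed; the function name even_fib_sum and its limit parameter are illustrative assumptions.

```python
def even_fib_sum(limit):
    """Sum the even Fibonacci terms (sequence starting 1, 2) that do not exceed limit."""
    prev, curr = 1, 2
    total = 0
    while curr <= limit:
        if curr % 2 == 0:
            total += curr
        # Shift the two-term history forward by one step.
        prev, curr = curr, prev + curr
    return total


print(even_fib_sum(100))        # 44, from 2 + 8 + 34
print(even_fib_sum(4_000_000))  # 4613732
```

No vector of terms is kept, so memory use stays constant regardless of the limit.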
Without giving it away:
Walk through your loop and see if you can figure out the expected output. You're close, but the sequence of events is not going to yield what you expect.
