I am a novice Rcpp user. I want to speed up a for loop that uses several objects from the R environment and updates two vectors across iterations.
The problem is that this is my first time working with C or C++, so I do not understand how to write Rcpp code with the inline package.
Here is the reproducible loop that I want to rewrite.
rsi <- c(NaN, 0, 0, 9.2, 28, 11, 9, 8, 38, 27, 62, 57,59,67, 76, 68, 69, 49)
L <- 2
o <- 2
T_min <-100
T_m <- 0
# Predefine two vectors for results to be written in
rsi_u <- rep(0, length(rsi))
rsi_d <- rep(0, length(rsi))
# Set the range the for loop is applied to
st <- L + 1 # L and o are parameters from the environment
en <- length(rsi) - o - 2
for (i in st:en) {
  k <- i - o + 1
  k1 <- i - L + 1
  if (sum(rsi_u[k:i]) == 0 & sum(rsi_d[k:i]) == 0) {
    if (min(rsi[k1:i]) == rsi[i] & rsi[i] < T_min) {
      rsi_d[i] <- 1
    }
    if (max(rsi[k1:i]) == rsi[i] & rsi[i] > T_m) {
      rsi_u[i] <- 1
    }
  }
}
As you can see, the loop first checks the condition
if (sum(rsi_u[k:i]) == 0 & sum(rsi_d[k:i]) == 0)
and then checks two further conditions. If one of them is TRUE, it writes 1 to the i-th element of one of the two predefined vectors. In addition, each iteration relies on the results of previous iterations.
The result of this loop is two vectors: rsi_u and rsi_d.
To speed up this loop I decided to rewrite it with Rcpp and inline.
This is what I ended up with:
library("Rcpp")
library("inline")
loop_c <- cxxfunction(signature(k = "numeric", L = "numeric",
en = "numeric", rsi = "numeric", o = "numeric", T_min = "numeric", T_m ="numeric"),
plugin = "Rcpp", body = "
for (int i = L + 1; i <= en; i++) {
k = i - o + 1
k1 = i - L + 1
if (accumulate(rsi_u.k(), rsi_u.i(), 0)=0 &&
accumulate(rsi_d.k(), rsi_d.i(), 0)=0) {
if (min_element(rsi.k1(), rsi.i()) = rsi.i() && rsi.i < T_min) {
rsi_u.i = 1
}
if (max_element(rsi.k1(), rsi.i()) = rsi.i() && rsi.i > T_m) {
rsi_d.i = 1
}
}
}
return ?")
So here are my questions:
How can I return the vectors rsi_u and rsi_d to the R environment as a data.frame or matrix with 2 columns and length(rsi) rows?
Maybe this loop can be sped up with other tools? I tried the apply family, but it was slower.
How can I return the vectors rsi_u and rsi_d to the R environment as a data.frame or matrix with 2 columns and length(rsi) rows?
I am not entirely sure what you're trying to achieve, but regardless, you can rewrite your code in C++ using Rcpp and the sugar functions sum, max and min. The code is very similar to the equivalent R code. Two important things to be aware of: C++ is statically typed, so 2 (an int) and 2.0 (a double) are different types (equivalent to 2L and 2 in R), and vectors are 0-indexed rather than 1-indexed as in R (e.g. for NumericVector F(3) the first index is 0 and the last is 2, whereas in R they would be 1 and 3). This can lead to some confusion, but the remaining code is the same.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List fun(NumericVector rsi,
         double T_min, double T_m,
         R_xlen_t L, R_xlen_t o) {
  R_xlen_t n = rsi.size(),
           st = L + 1,
           en = n - o - 2;
  NumericVector rsi_u(n), rsi_d(n);
  // Note subsets are 0 indexed, so add -1 to indices
  for(R_xlen_t i = st - 1; i < en; i++) {
    R_xlen_t k = i - o + 1;
    R_xlen_t k1 = i - L + 1;
    Range sr(k, i), mr(k1, i);
    //LogicalVector rsub = sum(rsi_u[sr]) == 0, rsdb = sum(rsi_d[sr]) == 0;
    if(sum(rsi_u[sr]) == 0 && sum(rsi_d[sr]) == 0){
      if(min(rsi[mr]) == rsi[i] && rsi[i] < T_min){
        rsi_d[i] = 1.0;
      }
      if(max(rsi[mr]) == rsi[i] && rsi[i] > T_m){
        rsi_u[i] = 1.0;
      }
    }
  }
  return DataFrame::create(Named("rsi_d") = rsi_d, Named("rsi_u") = rsi_u);
}
As a side note, the inline package is nowadays largely redundant. Most (if not all) of its functionality is covered by the Rcpp::cppFunction and Rcpp::sourceCpp functions. The code above can be compiled and loaded using either of the commands below:
library(Rcpp)
cppFunction(
'
// copy code to here. Note the single " ' "! Needed if there are double quotes in your C++ code
')
# Alternative
sourceCpp(
file = # Insert file path to file with code here
# Alt:
# code = '
# // copy code to here. Note the single " ' "! Needed if there are double quotes in your C++ code
# '
)
And that's it.
Maybe this loop can be sped up with other tools? I tried the apply family, but it was slower.
As for this part of your question, the main idea to look toward is vectorizing your code. In your example that is not immediately possible, because the loop overwrites parts of rsi_d and rsi_u that later iterations read in their conditions. Using *apply is essentially equivalent to using a for loop and will not improve performance significantly.
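To make the dependency concrete, here is a plain-Python sketch of the same loop. Python is used purely for illustration, the function name find_extrema is hypothetical, and indices are 0-based as in the C++ version. Each iteration reads flags written by earlier iterations, which is why the loop cannot simply be replaced by one vectorized expression.

```python
def find_extrema(rsi, L, o, T_min, T_m):
    """Flag local minima (rsi_d) and maxima (rsi_u), mirroring the R loop."""
    n = len(rsi)
    rsi_u = [0] * n
    rsi_d = [0] * n
    st = L + 1          # 1-based start, as in the R code
    en = n - o - 2      # 1-based inclusive end, as in the R code
    for i in range(st - 1, en):          # 0-based equivalent of st:en
        k = i - o + 1                    # start of the flag window
        k1 = i - L + 1                   # start of the value window
        # Loop-carried dependency: these sums read flags set by earlier iterations.
        if sum(rsi_u[k:i + 1]) == 0 and sum(rsi_d[k:i + 1]) == 0:
            window = rsi[k1:i + 1]
            if min(window) == rsi[i] and rsi[i] < T_min:
                rsi_d[i] = 1
            if max(window) == rsi[i] and rsi[i] > T_m:
                rsi_u[i] = 1
    return rsi_d, rsi_u


rsi = [float("nan"), 0, 0, 9.2, 28, 11, 9, 8, 38, 27, 62, 57, 59, 67, 76, 68, 69, 49]
rsi_d, rsi_u = find_extrema(rsi, L=2, o=2, T_min=100, T_m=0)
print([i for i, flag in enumerate(rsi_d) if flag])  # 0-based positions flagged as minima
print([i for i, flag in enumerate(rsi_u) if flag])  # 0-based positions flagged as maxima
```

Because a flag set at position i suppresses the checks at the following positions, the result at each step depends on everything decided before it.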
Engineering notation differs from scientific notation in that:
The exponent is always a multiple of 3, and
The digits to the left of the decimal point are scaled to range from 1 to 999.
My use case calls for specifying 0 to 13 digits to the right of the decimal point. The default is 4.
Here are desired examples:
const Avogadro = 6.022140857e23
str = eng_notation(Avogadro, digits=0)
# str = "602E+21"
str = eng_notation(Avogadro, digits=1)
# str = "602.2E+21"
# Default 4 digits to right of decimal point.
str = eng_notation(Avogadro)
# str = "602.2141E+21"
str = eng_notation(Avogadro, digits=10)
# str = "602.2140857000E+21"
# Negative and fractional numbers should also work.
str = eng_notation(-0.01234567, digits=7)
# str = "-12.3456700E-03"
Any suggestions?
Edit: I updated the requirements to 0 to 13 digits to the right of the decimal point (from 0 to 15 previously).
Use the NumericIO.jl package
julia> using NumericIO
julia> const Avogadro = 6.022140857e23;
julia> formatted(Avogadro, :ENG, ndigits=4, charset=:ASCII)
"602.2E21"
julia> formatted(Avogadro, :ENG, ndigits=4)
"602.2×10²¹"
The updated eng_notation() function below appears to solve the problem.
The number of digits to the right of the decimal is now limited to 0 to 13 digits instead of 0 to 15 digits.
Here are some examples:
julia> const Avogadro = 6.022140857e23
6.022140857e23
julia> eng_notation(Avogadro, digits=0)
"602E+21"
julia> eng_notation(Avogadro, digits=1)
"602.2E+21"
julia> eng_notation(Avogadro)
"602.2141E+21"
julia> eng_notation(Avogadro, digits=10)
"602.2140857000E+21"
julia> eng_notation(-0.01234567, digits=7)
"-12.3456700E-03"
julia> eng_notation(Avogadro, digits=13, plus_sign=true)
"+602.2140857000000E+21"
julia> eng_notation(floatmax(Float64), digits=13)
"179.7693134862316E+306"
julia> eng_notation(floatmin(Float64), digits=13)
"22.2507385850720E-309"
Here is the updated code:
"""
eng_notation(num, digits=4, spec="E", plus_sign=false)
Return `num` in engineering notation where the exponent is a multiple of 3 and the
number before the decimal point ranges from 1 to 999.
# Arguments
- `num`: any subtype of `Number`. `Complex` subtypes are passed through unchanged.
Numbers greater than (in absolute value) `floatmax(Float64)`=1.7976931348623157e308
are passed through unchanged.
Numbers less than (in absolute value) `floatmin(Float64)`=2.2250738585072014e-308 and > 0.0
are passed through unchanged.
- `digits`: the number of digits to the right of the decimal point. `digits` is clipped from 0 to 13.
- `spec`: "E", 'E', "e", or 'e' sets the case of the exponent letter.
- `plus_sign`: when `true` includes a plus sign, "+", in front of numbers that are >= 0.0.
# Examples
```julia_repl
julia> const Avogadro = 6.022140857e23
6.022140857e23
julia> eng_notation(Avogadro, digits=0)
"602E+21"
julia> eng_notation(Avogadro, digits=1)
"602.2E+21"
julia> eng_notation(Avogadro)
"602.2141E+21"
julia> eng_notation(Avogadro, digits=10)
"602.2140857000E+21"
julia> eng_notation(-0.01234567, spec="e", digits=7)
"-12.3456700e-03"
julia> eng_notation(Avogadro, digits=13, plus_sign=true)
"+602.2140857000000E+21"
julia> eng_notation(floatmax(Float64), digits=13)
"179.7693134862316E+306"
julia> eng_notation(floatmin(Float64), digits=13)
"22.2507385850720E-309"
```
"""
function eng_notation(num::Number; digits=4, spec="E", plus_sign=false)
# Complex subtypes are just passed through unchanged.
if typeof(num) <: Complex; return num; end
# Values larger/smaller than Float64 limits just pass through unchanged.
if abs(num) > floatmax(Float64); return num; end # max=1.7976931348623157e308
if abs(num) < floatmin(Float64) && num != 0; return num; end # min=2.2250738585072014e-308
# Min of 0 and max of 13 digits after the decimal point (dp).
digits = digits < 0 ? 0 : digits
digits = digits > 13 ? 13 : digits
# Don't add a dp when 0 digits after dp.
dec_pt = digits == 0 ? "" : "."
spec_char = spec[1] == 'E' ? 'E' : 'e'
sign = ifelse(num < 0, "-", ifelse(plus_sign, "+", ""))
# This Julia code is modified from Java code at:
# http://www.labbookpages.co.uk/software/java/engNotation.html
# If the value is zero, then simply return 0 with the correct number of digits.
if num == 0; return string(sign, 0, dec_pt, "0"^digits, spec_char, "+00"); end
# If the value is negative, make it positive so the log10 works
pos_num = num < 0 ? -num : num
log10_num = log10(pos_num);
# Determine how many orders of 3 magnitudes the value is.
count = floor(log10_num/3);
# Scale num into the range 1 <= num < 1000.
val = num/10.0^(3count)
if digits == 0
val_int = Int(round(val, digits=0))
else
val_int = Int(trunc(val))
end
n_val_digits = length(string(val_int))
n_val_digits = ifelse(val_int < 0, n_val_digits-1, n_val_digits) # Account for - sign
# Determine fractional digits to requested number of digits.
# Use 15 below because 1 + 15 = 16, and 16 sigdigits is around the limit of Float64.
num_str = @sprintf "%+.15e" num  # @sprintf requires `using Printf` at top level
# Remove sign and decimal pt.
digits_str = replace(num_str[2:end], "." => "")
e_index = findlast("e", digits_str).start
# Remove exponent.
digits_str = digits_str[1:e_index-1]
# Jump over leading digits to get digits to right of dec pt.
frac_digits = digits_str[n_val_digits+1:end]
if digits == 0
frac_digits = ""
else
frac_digits = string(Int(round(parse(Int, frac_digits), sigdigits=digits)))
# Round may not give us digits zeros, so we just pad to the right.
frac_digits = rpad(frac_digits, digits, "0")
frac_digits = frac_digits[1:digits]
end
# Determine the scaled exponent and pad with zeros for small exponents.
exp = Int(3count)
exp_sign = exp >= 0 ? "+" : "-"
exp_digits = lpad(abs(exp), 2, "0")
return string(sign, abs(val_int), dec_pt, frac_digits, spec_char, exp_sign, exp_digits)
end # eng_notation()
Here are a few tests:
using Test  # provides @testset and @test
function test_eng_notation()
    @testset "Test eng_notation() function" begin
        Avogadro = 6.022140857e23
        @test eng_notation(Avogadro, digits=0) == "602E+21"
        @test eng_notation(Avogadro, digits=1) == "602.2E+21"
        @test eng_notation(Avogadro) == "602.2141E+21"
        @test eng_notation(Avogadro, digits=10) == "602.2140857000E+21"
        @test eng_notation(-0.01234567, spec="e", digits=7) == "-12.3456700e-03"
        @test eng_notation(Avogadro, digits=13, plus_sign=true) == "+602.2140857000000E+21"
        @test eng_notation(floatmax(Float64), digits=13) == "179.7693134862316E+306"
        @test eng_notation(floatmin(Float64), digits=13) == "22.2507385850720E-309"
    end
    return nothing
end
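For comparison, the core scaling step (an exponent that is a multiple of 3 and a mantissa between 1 and 999) can be sketched compactly in Python. This is only an illustration under stated assumptions: the name eng_notation_py is hypothetical, it is not a port of the Julia function, and it rounds the last digit using ordinary float formatting instead of handling digit strings.

```python
import math


def eng_notation_py(num, digits=4):
    """Format num with an exponent that is a multiple of 3 (hypothetical helper)."""
    digits = max(0, min(13, digits))        # clip to 0..13 digits after the point
    if num == 0:
        return f"{0:.{digits}f}E+00"
    # Largest multiple of 3 that does not exceed the decimal exponent.
    exp = 3 * math.floor(math.log10(abs(num)) / 3)
    mantissa = num / 10.0 ** exp
    # Rounding can push the mantissa up to 1000, so renormalise in that case.
    if abs(round(mantissa, digits)) >= 1000:
        mantissa /= 1000
        exp += 3
    exp_sign = "+" if exp >= 0 else "-"
    return f"{mantissa:.{digits}f}E{exp_sign}{abs(exp):02d}"


print(eng_notation_py(6.022140857e23))            # 602.2141E+21
print(eng_notation_py(-0.01234567, digits=7))     # -12.3456700E-03
```

Formatting the scaled value directly keeps the sketch short, at the cost of relying on floating-point rounding for the final digit.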
So here is my situation. I've been trying to make an advanced calculator in Python 3.4, one where you can just type something like '1 + 1' and it gives you the answer '2'. Here is how my calculator is supposed to work. You start by entering a maths equation; it then counts the words you entered, based on the spaces, so that it knows how long some later loops need to be. Then it splits up everything you entered into strs and ints, all kept in the same variable and still in order. The thing I'm having trouble with is the part where it is meant to actually do the calculations.
Here is all of my code:
# This is the part where they enter the maths equation
print("-------------------------")
print("Enter the maths equation")
user_input = input("Equation: ")
# This is where it counts all of the words
data_count = user_input.split(" ")
count = data_count.__len__()
# Here is where it splits it into str's and int's
n1 = 0
data = []
if n1 <= count:
    for x in user_input.split():
        try:
            data.append(int(x))
        except ValueError:
            data.append(x)
        n1 += 1
# And this is where it actually calculates everything
number1 = 0
number2 = 0
n1 = 0
x = 0
answer = 0
while n1 <= count:
    # The code below checks if it is a number
    if data[n1] < 0 or data[n1] > 0:
        if x == 0:
            number1 = data[n1]
        elif x == 1:
            number2 = data[n1]
    elif data[n1] is "+":
        if x == 0:
            answer += number1
        elif x == 1:
            answer += number2
    n1 += 1
    x += 1
    if x > 1:
        x = 0
print("Answer =", answer)
But during the calculation it fails and gives me this error:
if data[n1] < 0 or data[n1] > 0:
TypeError: unorderable types: str() < int()
Can anyone see what I am doing wrong here?
Thanks
This error occurs when you compare a string with an integer.
Python 3 doesn't guess how to order values of different types; it raises a TypeError.
If a value is always numeric, you can convert it with int() when reading it:
int(input(...))
Here, however, data also holds operator strings such as "+", and int("+") raises a ValueError, so test the type instead of comparing:
if isinstance(data[n1], int):
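A minimal sketch of a calculator along these lines, assuming space-separated integer tokens and only the + and - operators (the function name evaluate and the left-to-right evaluation order are assumptions for illustration, not part of the original code). It checks token types with isinstance, so a string is never ordered against an int, and it compares operators with == instead of `is`.

```python
def evaluate(expression):
    """Evaluate a space-separated expression of integers with + and -, left to right."""
    data = []
    for token in expression.split():
        try:
            data.append(int(token))
        except ValueError:
            data.append(token)          # keep operators as strings

    # A valid expression has the shape: number (operator number)*
    if len(data) % 2 == 0 or not isinstance(data[0], int):
        raise ValueError("expected: number (operator number)*")

    answer = data[0]
    for operator, operand in zip(data[1::2], data[2::2]):
        if not isinstance(operand, int):
            raise ValueError(f"expected a number, got {operand!r}")
        if operator == "+":             # compare strings with ==, not `is`
            answer += operand
        elif operator == "-":
            answer -= operand
        else:
            raise ValueError(f"unsupported operator: {operator!r}")
    return answer


print("Answer =", evaluate("1 + 1"))  # Answer = 2
```

Iterating over operator/operand pairs also removes the need for the manual counters, which is where the original loop could run past the end of the list.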
How can the Kendall tau distance (a.k.a. bubble-sort distance) between two permutations be calculated in R without loading additional libraries?
Here is an O(n log n) implementation pieced together after reading around, but I suspect there may be better R solutions.
inversionNumber <- function(x){
mergeSort <- function(x){
if(length(x) == 1){
inv <- 0
#printind(' base case')
} else {
n <- length(x)
n1 <- ceiling(n/2)
n2 <- n-n1
y1 <- mergeSort(x[1:n1])
y2 <- mergeSort(x[n1+1:n2])
inv <- y1$inversions + y2$inversions
x1 <- y1$sortedVector
x2 <- y2$sortedVector
i1 <- 1
i2 <- 1
while(i1+i2 <= n1+n2+1){
if(i2 > n2 || (i1 <= n1 && x1[i1] <= x2[i2])){ # ***
x[i1+i2-1] <- x1[i1]
i1 <- i1 + 1
} else {
inv <- inv + n1 + 1 - i1
x[i1+i2-1] <- x2[i2]
i2 <- i2 + 1
}
}
}
return (list(inversions=inv,sortedVector=x))
}
r <- mergeSort(x)
return (r$inversions)
}
kendallTauDistance <- function(x,y){
return(inversionNumber(order(x)[rank(y)]))
}
If you need custom tie-breaking, you would have to adjust the last condition on the line marked # ***
Usage:
> kendallTauDistance(c(1,2,4,3),c(2,3,1,4))
[1] 3
You could use
(choose(length(x),2) - cov(x,y,method='kendall')/2)/2
if you know that both of the input lists x and y do not contain duplicates.
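As a cross-check for either approach, the distance can also be computed straight from its definition: count the pairs of positions that the two sequences order differently. The sketch below is a brute-force O(n²) version in Python, used only for illustration; the function name is hypothetical and ties are assumed not to occur.

```python
from itertools import combinations


def kendall_tau_distance(x, y):
    """Count pairs (i, j) that x and y order differently (no ties assumed)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(
        1
        for i, j in combinations(range(len(x)), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0   # discordant pair
    )


print(kendall_tau_distance([1, 2, 4, 3], [2, 3, 1, 4]))  # 3
```

It is too slow for long vectors, but it is handy for validating a faster implementation on small inputs.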
Hmm, somebody is interested in exactly the same thing I have been working on.
Below is my code in Python 2.
from collections import OrderedDict
def invert(u):
    identity = sorted(u)
    ui = []
    for x in identity:
        index = u.index(x)
        ui.append(identity[index])
    print "Given U is:\n",u
    print "Inverse of U is:\n",ui
    return identity,ui
def r_vector(x,y,id):
    # from collections import OrderedDict
    id_x_Map = OrderedDict(zip(id,x))
    id_y_Map = OrderedDict(zip(id,y))
    r = []
    for x_index,x_value in id_x_Map.items():
        for y_index,y_value in id_y_Map.items():
            if (x_value == y_index):
                r.append(y_value)
    print r
    return r
def xr_vector(x):
    # from collections import OrderedDict
    values_checked = []
    unorderd_xr = []
    ordered_xr = []
    for value in x:
        values_to_right = []
        for n in x[x.index(value)+1:]:
            values_to_right.append(n)
        result = [i for i in values_to_right if i < value]
        if(len(result)!=0):
            values_checked.append(value)
            unorderd_xr.append(len(result))
    value_ltValuePair = OrderedDict(zip(values_checked,unorderd_xr))
    for key in sorted(value_ltValuePair):
        # print key,value_ltValuePair[key]
        ordered_xr.append(value_ltValuePair[key])
    print "Xr= ",ordered_xr
    print "Kendall tau distance = ",sum(ordered_xr)
if __name__ == '__main__':
    print "***********************************************************"
    print "Enter the first string (U):"
    u = raw_input().split()
    print "Enter the second string (V):"
    v = raw_input().split()
    print "***********************************************************"
    print "Step 1: Find U Inverse"
    identity,uinverse = invert(u)
    print "***********************************************************"
    print "Step 2: Find R = V.UInverse"
    r = r_vector(v,uinverse,identity)
    print "***********************************************************"
    print "Step 3: Finding XR and Kendall tau"
    xr_vector(r)
As for the approach/algorithm for finding the Kendall tau distance this way, I will either leave it to you or point you to the research paper "Optimal Permutation Codes and the Kendall's τ-Metric".
You can implement the same approach in R.
I have a vector of widths,
ws = c(1,1,2,1,3,1)
From this vector I'd like to have another vector of this form:
indexes = c(1,2,3,5,6,7,9,11,12)
In order to create such vector I did the following for loop in R:
ws = c(1,1,2,1,3,1)
indexes = rep(0, sum(ws))
counter = 1
counter2 = 1
last = 0
for(i in 1:length(ws))
{
if (ws[i] == 1)
{
indexes[counter] = counter2
counter = counter + 1
} else {
for(j in 1:ws[i])
{
indexes[counter] = counter2
counter = counter + 1
counter2 = counter2+2
}
counter2 = counter2 - 2
}
counter2 = counter2+1
}
The logic is as follows: each element of ws specifies how many elements it contributes to indexes. For example, if the element of ws is 1, it contributes 1 element to indexes; if it is > 1, say 3, it contributes 3 elements, and those elements skip every other value, e.g. 3, 5, 7.
However, I'd like to avoid for loops since they tend to be very slow in R. Do you have any suggestions on how to achieve such results only with vector operations? or some more crantastic solution?
Thanks!
Here's a vectorized one-liner for you:
ws <- c(1,1,2,1,3,1)
cumsum((unlist(sapply(ws, seq_len)) > 1) + 1)
# [1] 1 2 3 5 6 7 9 11 12
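You can pick it apart piece by piece, working from the inside out, to see how it works: sapply(ws, seq_len) expands each width w into 1, ..., w; > 1 marks every element after the first of its group; adding 1 turns the marks into steps of 2 and everything else into steps of 1; and cumsum adds the steps up.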
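The same idea, written out step by step as a small Python sketch (Python is used only for illustration, and width_indexes is a hypothetical name): build the position of each element within its group, turn positions into step sizes, then take a running sum.

```python
from itertools import accumulate


def width_indexes(ws):
    """Expand widths into indexes: step 1 between groups, step 2 inside a group."""
    # Position of each element within its group: 1..w for every width w.
    positions = [p for w in ws for p in range(1, w + 1)]
    # The first element of a group advances by 1, later elements by 2.
    steps = [2 if p > 1 else 1 for p in positions]
    # The running sum of the steps gives the final indexes.
    return list(accumulate(steps))


print(width_indexes([1, 1, 2, 1, 3, 1]))  # [1, 2, 3, 5, 6, 7, 9, 11, 12]
```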