How to rewrite Erlang combinations algorithm in Elixir?

I've been tinkering with Elixir for the last few weeks. I just came across this succinct combinations algorithm in Erlang, which I tried rewriting in Elixir but got stuck.
Erlang version:
comb(0, _) ->
    [[]];
comb(_, []) ->
    [];
comb(N, [H|T]) ->
    [[H|L] || L <- comb(N-1, T)] ++ comb(N, T).
Here's the Elixir version I came up with, but it's not correct:
def combination(0, _), do: [[]]
def combination(_, []), do: []
def combination(n, [x|xs]) do
  for y <- combination(n - 1, xs), do: [x|y] ++ combination(n, xs)
end
Example usage, with incorrect results:
iex> combination(2, [1,2,3])
[[1, 2, [3], [2, 3]]]
Any pointers on what I'm doing wrong?
Thanks!
Sean

You need to wrap the for expression in parentheses so that ++ applies to the result of the whole comprehension, as it does in the Erlang code. As written, your version evaluates [x|y] ++ combination(n, xs) once for every y, inside the comprehension.
def combination(n, [x|xs]) do
  (for y <- combination(n - 1, xs), do: [x|y]) ++ combination(n, xs)
end
Demo:
iex(1)> defmodule Foo do
...(1)> def combination(0, _), do: [[]]
...(1)> def combination(_, []), do: []
...(1)> def combination(n, [x|xs]) do
...(1)> (for y <- combination(n - 1, xs), do: [x|y]) ++ combination(n, xs)
...(1)> end
...(1)> end
{:module, Foo,
<<70, 79, 82, 49, 0, 0, 6, 100, 66, 69, 65, 77, 69, 120, 68, 99, 0, 0, 0, 137, 131, 104, 2, 100, 0, 14, 101, 108, 105, 120, 105, 114, 95, 100, 111, 99, 115, 95, 118, 49, 108, 0, 0, 0, 2, 104, 2, ...>>,
{:combination, 2}}
iex(2)> Foo.combination 2, [1, 2, 3]
[[1, 2], [1, 3], [2, 3]]
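The same precedence pitfall is easy to reproduce elsewhere. For comparison, here is a Python sketch of the identical recursion, where the concatenation likewise has to apply to the whole comprehension rather than to each element:

```python
def combination(n, xs):
    """All combinations of size n drawn from the list xs,
    mirroring the Erlang recursion."""
    if n == 0:
        return [[]]          # one way to choose nothing
    if not xs:
        return []            # no way to choose n > 0 items from nothing
    x, rest = xs[0], xs[1:]
    # The + must apply to the whole comprehension, not to each [x] + y.
    return [[x] + y for y in combination(n - 1, rest)] + combination(n, rest)
```

Calling `combination(2, [1, 2, 3])` yields `[[1, 2], [1, 3], [2, 3]]`, matching the fixed Elixir version.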

Related

Multidimensional random draw without replacement with 'predrawn' samples in pytorch

I have an (N, I) tensor of N rows with I indices between 0 and Z, e.g.,
N=5, I=3, Z=100:
foo = tensor([[83, 5, 85],
[ 7, 60, 66],
[89, 25, 63],
[58, 67, 47],
[12, 46, 40]], device='cuda:0')
Now I want to efficiently add X random additional new indices (i.e., not yet included in the tensor!) between 0 and Z to the tensor, e.g.:
foo_new = tensor([[83, 5, 85, 9, 43, 53, 42],
[ 7, 60, 66, 85, 64, 22, 1],
[89, 25, 63, 38, 24, 4, 75],
[58, 67, 47, 83, 43, 29, 55],
[12, 46, 40, 74, 21, 11, 52]], device='cuda:0')
The tensor would in the end have in each row I+X unique indices between 0 and Z, where I indices are the ones from the initial tensor, and X indices are uniform randomly drawn without replacement from the remaining indices {0...Z}\{I(n)}, where {I(n)} are the indices of the nth row.
So it's like a multidimensional random draw without replacement from indices 0 to Z, where the first I draws (in each row) are enforced to result in the indices given by the initial tensor.
How would I do this efficiently, especially with potentially large Z?
What I tried so far (which was quite slow):
device = torch.cuda.current_device()
notinfoo = torch.ones((N, Z), device=device).bool()
N_row = torch.arange(N, device=device).unsqueeze(dim=-1)
notinfoo[N_row, foo] = 0
foo_new = torch.stack([torch.cat((f, torch.arange(Z, device=device)[nf][torch.randperm(Z-I, device=device)][:X])) for f,nf in zip(foo,notinfoo)])
You can first use NumPy's numpy.random.choice with replace=False to sample without replacement, and then concatenate the result with the original tensor using torch.cat:
import numpy as np
foo_new = torch.tensor(np.random.choice(100 , (5,4), replace=False)) # Z = 100
foo_new = torch.cat((foo, foo_new), 1)
foo_new
tensor([[83, 5, 85, 56, 83, 16, 20],
[ 7, 60, 66, 43, 31, 75, 67],
[89, 25, 63, 96, 3, 13, 11],
[58, 67, 47, 55, 92, 70, 35],
[12, 46, 40, 79, 61, 58, 76]])
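Note that the sampling above draws from the full range 0..Z-1, so it can repeat indices already present in foo (the first row of the output above contains 83 twice). A NumPy sketch that excludes each row's existing indices before drawing, assuming the same foo with Z = 100 and X = 4 (the per-row loop keeps it simple, but it is not vectorized):

```python
import numpy as np

Z, X = 100, 4  # index range and number of new draws per row
foo = np.array([[83,  5, 85],
                [ 7, 60, 66],
                [89, 25, 63],
                [58, 67, 47],
                [12, 46, 40]])

rng = np.random.default_rng(0)
rows = []
for row in foo:
    pool = np.setdiff1d(np.arange(Z), row)           # indices not yet in this row
    extra = rng.choice(pool, size=X, replace=False)  # draw without replacement
    rows.append(np.concatenate([row, extra]))
foo_new = np.vstack(rows)  # shape (N, I + X), unique indices per row
```

If needed, the result can be moved back to the GPU afterwards with torch.as_tensor(foo_new, device='cuda').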

In R, how do I find the x value that will equate to a certain y value on a cdplot? And vice-versa?

I have a cdplot where I'm trying to find the x value where the distribution (the y value) equals .5, and I couldn't find a method that works. Additionally, I want to find the y value when my x value is 0, and would like help finding that equation too, if it's different.
I can't really provide my code as it relies on a saved workspace with a large dataframe. I'll give this as an example:
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,1, 2, 1, 1, 1, 1, 1),levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
cdplot(fail ~ temperature)
So I don't just need a quick-and-dirty way to solve this specific example; I need code I can apply to my own workspace.
If you capture the return value of cdplot, you get a function that you can use to find these values.
CDP <- cdplot(fail ~ temperature)
uniroot(function(x) { CDP$no(x) - 0.5}, c(55,80))
$root
[1] 62.34963
$f.root
[1] 3.330669e-16

Improve curve fit to data in R

Having trouble fitting an appropriate curve to this data.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71,
81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100,
3100, 4100, 5100, 6100, 7100, 8100, 9100)
y <- c(75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95,
95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99)
Tried so far:
fit1 <- lm(y~log(x)+I(1/x))
fit2 <- lm(y~log(x)+I(1/x)+x)
plot(x,y, log="x")
lines(0.01:10000, predict(fit1, newdata = data.frame(x=0.01:10000)))
lines(0.01:10000, predict(fit2, newdata = data.frame(x=0.01:10000)), col='red')
The fits are OK but were arrived at entirely empirically, and there is room for improvement. I could not get loess or splines to fit any better.
The concrete goal is to increase the R^2 of the fit and improve regression diagnostics (e.g. Q-Q plots of residuals).
Edit: Expected Model: this is sampling data, where more samples (x) improve the accuracy of the estimate (y); it would saturate at 100%.
This would be my guess for the function, and the corresponding fit in Python:
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as so
def f( x, a, b, s, p ):
    return a + b * s * ( x - 1 ) / ( 1 + ( s * ( x - 1 ) )**( abs( 1 / p ) ) )**abs( p )
def g( x, a, s, p ):
    return a * s * x / ( 1 + ( s * x )**( abs( 1 / p ) ) )**abs( p )
def h( x, s, p ):
    return 100 * s * x / ( 1 + ( s * x )**( abs( 1 / p ) ) )**abs( p )
xData = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71,
81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100,
3100, 4100, 5100, 6100, 7100, 8100, 9100 ]
yData = [ 75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95,
95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99 ]
xList = np.logspace( 0, 5, 100 )
bestFitF, err = so.curve_fit( f , xData, yData, p0=[ 75, 25, 1, 1])
bestFitG, err = so.curve_fit( g , xData, yData)
bestFitH, err = so.curve_fit( h , xData, yData)
fList = np.fromiter( ( f( x, *bestFitF ) for x in xList ), float )
gList = np.fromiter( ( g( x, *bestFitG ) for x in xList ), float )
hList = np.fromiter( ( h( x, *bestFitH ) for x in xList ), float )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xData, yData, marker='o', linestyle='')
ax.plot( xList, fList, linestyle='-.', label='f')
ax.plot( xList, gList, linestyle='-.', label='g')
ax.plot( xList, hList, linestyle='-.', label='h')
ax.set_xscale( 'log' )
ax.legend( loc=0 )
plt.show()
Function f requires start values; g and h don't. It should be possible to write code to guess the parameters (basically the first one is yData[0], the second is yData[-1] - yData[0], and the others don't matter much and can just be set to 1), but I did it manually here.
Both g and h have the property that they pass through ( 0, 0 ).
Additionally, h will saturate at 100.
Note: sure, the more parameters the better the fit, but if the data is, e.g., a CDF, you probably want a fixed saturation value and maybe passage through ( 0, 0 ) as well.
This might be an acceptable fit to the Gunary equation, with an R-squared value of 0.976:
y = x / (a + bx + cx^0.5)
Fitting target of lowest sum of squared absolute error = 2.4509677507601545E+01
a = 1.2327255760994933E-03
b = 1.0083740273268828E-02
c = 1.9179200839782879E-03
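As a cross-check, the Gunary fit can be sketched in Python with scipy.optimize.curve_fit; the starting values below are rough guesses of the same magnitude as the reported parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71,
              81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100,
              2100, 3100, 4100, 5100, 6100, 7100, 8100, 9100], dtype=float)
y = np.array([75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95,
              95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99,
              99, 99, 99, 99, 99, 99, 99], dtype=float)

def gunary(x, a, b, c):
    # Gunary equation: y = x / (a + b*x + c*x^0.5)
    return x / (a + b * x + c * np.sqrt(x))

popt, _ = curve_fit(gunary, x, y, p0=[1e-3, 1e-2, 2e-3])
resid = y - gunary(x, *popt)
r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
```

With these starting values the fit converges to parameters close to those quoted above, with R-squared around 0.976.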
R package drc has many options.
Here is a 5-parameter log-logistic model, which yields residuals lower than the fits in the question.
BONUS: It has a self-starter function, so you avoid the challenge of finding initial values for non-linear regression.
library(drc)
dosefit <- drm(y ~ x, fct = LL2.5())

Find smallest value with tolerance

How would you find the last lowest value in this vector with a certain tolerance without changing the order?
Example:
c(0, 785, 10273, 6231, 5417, 2328, 5249, 1725, 2656, 6258, 2687,
2651, 1063, 325, 2556, 738, 631, 140, 57, 1173, 407, 225, 135,
69, 81, 21, 16, 3, 0, 26, 1, 2, 0, 1, 2, 1, 1, 0, 0, 3, 1, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0)
Assume a tolerance of 26. Working backwards from the last element (0), I would like to return the position of the first number that differs from the previous number by at least the tolerance. In this example it would be position 30, i.e. the number 26.
You can use Position with the right=TRUE argument to avoid having to search forwards and then take the last result:
Position(identity, diff(x) >= 26, right=TRUE) + 1
#[1] 30
If I understand your question correctly, you can do this:
tail(which(diff(x)>=26),1L)+1L;
## [1] 30
Data
x <- c(0,785,10273,6231,5417,2328,5249,1725,2656,6258,2687,2651,1063,325,2556,738,631,140,57,
1173,407,225,135,69,81,21,16,3,0,26,1,2,0,1,2,1,1,0,0,3,1,0,0,0,1,0,0,0,0,1,0);
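The same diff-and-last-hit idea can be sketched in NumPy for anyone working in Python (the 1-based position is kept to match the R answers):

```python
import numpy as np

x = np.array([0, 785, 10273, 6231, 5417, 2328, 5249, 1725, 2656, 6258, 2687,
              2651, 1063, 325, 2556, 738, 631, 140, 57, 1173, 407, 225, 135,
              69, 81, 21, 16, 3, 0, 26, 1, 2, 0, 1, 2, 1, 1, 0, 0, 3, 1, 0,
              0, 0, 1, 0, 0, 0, 0, 1, 0])

tol = 26
jumps = np.flatnonzero(np.diff(x) >= tol)  # 0-based indices of qualifying steps
pos = jumps[-1] + 2  # +1 for the element after the step, +1 to make it 1-based
```

Here pos is 30, matching the R results.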

How to combine 2 vectors in a particular order?

I have the following variables:
loc.dir <- c(1, -1, 1, -1, 1, -1, 1)
max.index <- c(40, 46, 56, 71, 96, 113, 156)
min.index <- c(38, 48, 54, 69, 98, 112, 155)
My goal is to produce the following:
data.loc <- c(40, 48, 56, 69, 96, 112, 156)
In words: I look at each element of loc.dir. If the ith element is 1, then I take the ith element of max.index. On the other hand, if the ith element is -1, then I take the ith element of min.index.
I am able to get the elements that should be in data.loc by using:
plus.1 <- max.index[which(loc.dir == 1)]
minus.1 <- min.index[which(loc.dir == -1)]
But now I don't know how to combine plus.1 and minus.1 so that the result is identical to data.loc.
ifelse was designed for this:
ifelse(loc.dir == 1, max.index, min.index)
#[1] 40 48 56 69 96 112 156
It does something similar to this:
res <- min.index
res[loc.dir == 1] <- max.index[loc.dir == 1]
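For comparison, NumPy's np.where performs the same elementwise selection as R's ifelse:

```python
import numpy as np

loc_dir   = np.array([1, -1, 1, -1, 1, -1, 1])
max_index = np.array([40, 46, 56, 71, 96, 113, 156])
min_index = np.array([38, 48, 54, 69, 98, 112, 155])

# Where loc_dir is 1 take max_index, otherwise min_index
data_loc = np.where(loc_dir == 1, max_index, min_index)
```

This produces array([40, 48, 56, 69, 96, 112, 156]), matching data.loc from the question.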
