Suppose I wanted to prove empirically that a 1d12 (twelve-sided die) follows a rectangular distribution and that 2d6 follows a triangular one.
The quick and dirty way would be to tally about 1000 randomly generated rolls, store them in an array, and then calculate the mean and compare it with the expected value.
But what if I wanted to save memory by using a running total instead of the 1000-member array?
Could I do something like this:
int runningTotal = 0;
for (int i = 0; i < 1000; i++) {
    int x = 1 + (int) (Math.random() * 6);  // one d6 roll in 1..6
    runningTotal += x;
}
double mean = runningTotal / 1000.0;
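For what it's worth, the running-total idea generalizes: a fixed-size tally of counts per outcome also captures the distribution's shape, so the 1000-member array is never needed. A minimal sketch in Python, just to illustrate the constant-memory approach:

import random

N = 1000
counts = [0] * 11       # one counter per 2d6 outcome, 2 through 12
running_total = 0

for _ in range(N):
    roll = random.randint(1, 6) + random.randint(1, 6)
    counts[roll - 2] += 1
    running_total += roll

print(running_total / N)  # sample mean; the expected value for 2d6 is 7.0
print(counts)             # counts rise to a peak at 7, then fall (triangular)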
Another approach, using a deque, is to maintain a sequence of recently added elements by appending on the right and popping from the left. One idea would be something along the lines of this simple moving-average recipe built on Python's double-ended collections.deque:
from collections import deque
import itertools

def moving_average(iterable, n=3):
    # moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0
    # https://en.wikipedia.org/wiki/Moving_average
    it = iter(iterable)
    d = deque(itertools.islice(it, n - 1))
    d.appendleft(0)
    s = sum(d)
    for elem in it:
        s += elem - d.popleft()
        d.append(elem)
        yield s / n
Source: https://docs.python.org/3/library/collections.html#deque-recipes
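A quick sanity check of the recipe against the expected output in its comment:

print(list(moving_average([40, 30, 50, 46, 39, 44], n=3)))
# [40.0, 42.0, 45.0, 43.0]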
In the DVB-S2 standard, the SRRC filter is defined as:
How can I find the filter's time-domain coefficients for implementation? The inverse Fourier transform of this is not clear to me.
For a DVB-S2 signal you can use an RRC matched filter before timing recovery. For the matched filter, you can use this expression:
For example, for n_ISI = 32 and roll-off factor 0.25, with any number of samples per symbol, you can use this Matlab code:
SPS = 4;        % samples per symbol (for example)
n_ISI = 32;     % filter span in symbols
rolloff = 0.25;
n = linspace(-n_ISI/2, n_ISI/2, n_ISI*SPS+1);
rrcFilt = zeros(size(n));
for iter = 1:length(n)
    if n(iter) == 0
        % Singular point at t = 0
        rrcFilt(iter) = 1 - rolloff + 4*rolloff/pi;
    elseif abs(n(iter)) == 1/4/rolloff
        % Singular points at |t| = 1/(4*rolloff)
        rrcFilt(iter) = rolloff/sqrt(2)*((1+2/pi)*sin(pi/4/rolloff) + (1-2/pi)*cos(pi/4/rolloff));
    else
        rrcFilt(iter) = (4*rolloff/pi)/(1-(4*rolloff*n(iter)).^2) * (cos((1+rolloff)*pi*n(iter)) + sin((1-rolloff)*pi*n(iter))/(4*rolloff*n(iter)));
    end
end
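If Matlab isn't at hand, here is a rough NumPy port of the same expression (a sketch only; the exact equality tests at the two singular points carry over from the Matlab code and rely on the linspace grid hitting those points exactly):

import numpy as np

def rrc_taps(sps=4, n_isi=32, rolloff=0.25):
    # Time axis in symbol periods, matching the Matlab code above.
    t = np.linspace(-n_isi / 2, n_isi / 2, n_isi * sps + 1)
    h = np.empty_like(t)
    for i, x in enumerate(t):
        if x == 0:
            h[i] = 1 - rolloff + 4 * rolloff / np.pi
        elif abs(x) == 1 / (4 * rolloff):
            h[i] = (rolloff / np.sqrt(2)) * (
                (1 + 2 / np.pi) * np.sin(np.pi / (4 * rolloff))
                + (1 - 2 / np.pi) * np.cos(np.pi / (4 * rolloff)))
        else:
            h[i] = ((4 * rolloff / np.pi) / (1 - (4 * rolloff * x) ** 2)
                    * (np.cos((1 + rolloff) * np.pi * x)
                       + np.sin((1 - rolloff) * np.pi * x) / (4 * rolloff * x)))
    return h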
But if you want to use SRRC, there are two ways: 1. You can use its frequency-domain representation if you do the filtering in the frequency domain; for that, you can use the expression you've quoted. 2. For time-domain filtering, you should define an FIR filter by its time-domain sequence. The time representation of such SRRC pulses takes the following form:
The problem is as follows:
Given an array of N numbers, find two numbers in the array whose range (max - min) equals K.
for example:
input:
5 3
25 9 1 6 8
output:
9 6
So far, what I've tried is first sorting the array and then finding two complementary numbers using a nested loop. However, because this is a sort of brute-force method, I don't think it is as efficient as other possible approaches.
import java.util.*;

public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt(), k = sc.nextInt();
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = sc.nextInt();
        }
        Arrays.sort(arr);
        int a = 0, b = 0;  // must be initialized, or the code won't compile
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                // arr is sorted, so arr[j] >= arr[i] and no max/min is needed
                if (arr[j] - arr[i] == k) {
                    a = arr[i];
                    b = arr[j];
                }
            }
        }
        System.out.println(a + " " + b);
    }
}
It would be much appreciated if the solution were given in code (any language).
Here is code in Python 3 that solves your problem. This should be easy to understand, even if you do not know Python.
This routine uses your idea of sorting the array, but with two variables, left and right, that mark two positions in the array; each makes just one pass through it. So, other than the sort, the time efficiency of my code is O(N); the sort makes the entire routine O(N log N). This is better than your code, which is O(N^2).
I never use the input value of N, since Python can easily handle the actual size of the array.

I add a sentinel value to the end of the array to make the inner short loops simpler and quicker. This involves another pass through the array to calculate the sentinel value, but it adds little to the running time. It is possible to reduce the number of array accesses, at the cost of a few more lines of code--I'll leave that to you.

I added input prompts to aid my testing--you can remove those to make my results closer to what you seem to want.

My code prints the larger of the two numbers first, then the smaller, which matches your sample output. But you may have wanted the order of the two numbers to match the order in the original, unsorted array--if that is the case, I'll let you handle that as well (I see multiple ways to do it).
# Get input
N, K = [int(s) for s in input('Input N and K: ').split()]
arr = [int(s) for s in input('Input the array: ').split()]

arr.sort()
sentinel = max(arr) + K + 2
arr.append(sentinel)

left = right = 0
while arr[right] < sentinel:
    # Move the right index until the difference is too large
    while arr[right] - arr[left] < K:
        right += 1
    # Move the left index until the difference is too small
    while arr[right] - arr[left] > K:
        left += 1
    # Check if we are done
    if arr[right] - arr[left] == K:
        print(arr[right], arr[left])
        break
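As an aside (this isn't part of the answer above): if the sorted order isn't otherwise needed, a hash set gives an expected O(N) alternative, at the cost of extra memory. A minimal sketch:

def find_pair_with_difference(arr, k):
    seen = set()
    for v in arr:
        # v can pair with a previously seen v - k or v + k.
        if v - k in seen:
            return v, v - k
        if v + k in seen:
            return v + k, v
        seen.add(v)
    return None

print(find_pair_with_difference([25, 9, 1, 6, 8], 3))  # (9, 6)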
Some cases of periodic boundary conditions (PBC) can be imposed very efficiently on integers by simply doing:
myWrappedWithinPeriodicBoundary = myUIntValue & mask
This works when the boundary is the half open range [0, upperBound), where the (exclusive) upperBound is 2^exp so that
mask = (1 << exp) - 1
For example:
let pbcUpperBoundExp = 2 // so the periodic boundary will be [0, 4)
let mask = (1 << pbcUpperBoundExp) - 1
for x in -7 ... 7 { print(x & mask, terminator: " ") }
(in Swift) will print:
1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
Question: Is there any (roughly similar) efficient method for imposing (some cases of) PBCs on floating point-numbers (32 or 64-bit IEEE-754)?
There are several reasonable approaches:
fmod(x,1)
modf(x,&dummy) — has the advantage of knowing its divisor statically, but in my testing comes from libc.so.6 even with -ffast-math
x-floor(x) (suggested by Jens in a comment) — supports negative inputs directly
Manual bit-twiddling direct implementation
Manual bit-twiddling implementation of floor
The first two preserve the sign of their input; you can add 1 if it's negative.
The two bit manipulations are very similar: you identify which significand bits correspond to the integer portion, and mask them (for the direct implementation) or the rest (to implement floor) off. The direct implementation can be completed either with a floating-point division or with a shift to reassemble the double manually; the former is 28% faster even given hardware CLZ. The floor implementation can immediately reconstitute a double: floor never changes the exponent of its argument unless it returns 0. About 20 lines of C are required.
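For illustration only, here is a rough Python sketch of the floor-style bit manipulation (the timed implementations are in C; struct is used here just to expose the double's bit pattern, and negative inputs are handled by adjusting after truncation):

import struct

def bits_floor(x: float) -> float:
    # Reinterpret the double as its raw 64-bit pattern.
    b = struct.unpack('<Q', struct.pack('<d', x))[0]
    exp = ((b >> 52) & 0x7FF) - 1023      # unbiased exponent
    if exp < 0:                           # |x| < 1
        return 0.0 if x >= 0 else -1.0
    if exp >= 52:                         # no fractional bits remain
        return x
    frac_mask = (1 << (52 - exp)) - 1     # significand bits below the binary point
    if b & frac_mask == 0:                # already an integer
        return x
    truncated = struct.unpack('<d', struct.pack('<Q', b & ~frac_mask))[0]
    return truncated if x >= 0 else truncated - 1.0

The masking is the whole trick; in C the same thing is a couple of integer operations on the bit pattern, which is why it competes with the library floor.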
The following timing is with double and gcc -O3, with timing loops over representative inputs into which the operative code was inlined.
fmod: 41.8 ns
modf: 19.6 ns
floor: 10.6 ns
With -ffast-math:
fmod: 26.2 ns
modf: 30.0 ns
floor: 21.9 ns
Bit manipulation:
direct: 18.0 ns
floor: 20.6 ns
The manual implementations are competitive, but the floor technique is the best. Oddly, two of the three library functions perform better without -ffast-math: that is, better as a PLT function call than as an inlined builtin function.
I'm adding this answer to my own question since it describes the best solution I have found at the time of writing. It's in Swift 4.1 (it should be straightforward to translate into C) and has been tested in various use cases:
extension BinaryFloatingPoint {
    /// Returns the value after restricting it to the periodic boundary
    /// condition [0, 1).
    /// See https://forums.swift.org/t/why-no-fraction-in-floatingpoint/10337
    @_transparent
    func wrappedToUnitRange() -> Self {
        let fract = self - self.rounded(.down)
        // Have to clamp to just below 1 because very small negative values
        // will otherwise return an out of range result of 1.0.
        // Turns out this:
        if fract >= 1.0 { return Self(1).nextDown } else { return fract }
        // is faster than this:
        //return min(fract, Self(1).nextDown)
    }

    @_transparent
    func wrapped(to range: Range<Self>) -> Self {
        let measure = range.upperBound - range.lowerBound
        let recipMeasure = Self(1) / measure
        let scaled = (self - range.lowerBound) * recipMeasure
        return scaled.wrappedToUnitRange() * measure + range.lowerBound
    }

    @_transparent
    func wrappedIteratively(to range: Range<Self>) -> Self {
        var v = self
        let measure = range.upperBound - range.lowerBound
        while v >= range.upperBound { v = v - measure }
        while v < range.lowerBound { v = v + measure }
        return v
    }
}
On my MacBook Pro with a 2 GHz Intel Core i7, a hundred million (probably inlined) calls to wrapped(to range:) on random (finite) Double values take 0.6 seconds, which is about 166 million calls per second (not multi-threaded). Whether the range is statically known, or has bounds or a measure that is a power of two, etc., makes some difference, but not as much as one might have thought.
wrappedToUnitRange() takes about 0.2 seconds, meaning 500 million calls per second on my system.
Given the right scenario, wrappedIteratively(to range:) is as fast as wrappedToUnitRange().
The timings were made by comparing a baseline test (which does not wrap the value, but still uses it to compute e.g. a simple xor checksum) to the same test where the value is wrapped. The difference in time between these is the time I have given for the wrapping calls.
I used the Swift development toolchain 2018-02-21, compiling with -O -whole-module-optimization -static-stdlib -gnone. Care was taken to make the tests relevant, i.e. preventing dead-code removal, using truly random input of different distributions, etc. Writing the wrapping functions generically, as in this extension on BinaryFloatingPoint, turned out to be optimized into code equivalent to separate specialized versions written for e.g. Float and Double.
It would be interesting to see someone more skilled than me investigating this further (C or Swift or any other language doesn't matter).
EDIT:
For anyone interested, here are some versions for simd float2:
extension float2 {
    @_transparent
    func wrappedInUnitRange() -> float2 {
        return simd.fract(self)
    }

    @_transparent
    func wrappedToMinusOneToOne() -> float2 {
        let scaled = (self + float2(1, 1)) * float2(0.5, 0.5)
        let scaledFract = scaled - floor(scaled)
        let wrapped = simd_muladd(scaledFract, float2(2, 2), float2(-1, -1))
        // Note that we have to make sure the result is not out of bounds, like
        // simd fract does:
        let oneNextDown = Float(bitPattern:
            0b0_01111110_11111111111111111111111)
        let oneNextDownFloat2 = float2(oneNextDown, oneNextDown)
        return simd.min(wrapped, oneNextDownFloat2)
    }

    @_transparent
    func wrapped(toLowerBound lowerBound: float2,
                 upperBound: float2) -> float2
    {
        let measure = upperBound - lowerBound
        let recipMeasure = simd_precise_recip(measure)
        let scaled = (self - lowerBound) * recipMeasure
        let scaledFract = scaled - floor(scaled)
        // Note that we have to make sure the result is not out of bounds, like
        // simd fract does:
        let wrapped = simd_muladd(scaledFract, measure, lowerBound)
        let maxX = upperBound.x.nextDown // For some reason, this won't be
        let maxY = upperBound.y.nextDown // optimized even when upperBound is
        // statically known, and there is no similar simd function available.
        let maxValue = float2(maxX, maxY)
        return simd.min(wrapped, maxValue)
    }
}
I asked some related simd questions here which might be of interest.
EDIT2:
As can be seen in the above Swift Forums thread:
// Note that tiny negative values like:
let x: Float = -1e-08
// May produce results outside the [0, 1) range:
let wrapped = x - floor(x)
print(wrapped < 1.0) // false
// which may result in out-of-bounds table accesses
// in common usage, so it's probably better to use:
let correctlyWrapped = simd_fract(x)
print(correctlyWrapped < 1.0) // true
I have since updated the code to account for this.
Following up on How to add vectors to the columns of some array in Julia?, I would like some analogous clarifications for DataArrays.
Let y = randn(100, 2). I would like to create a matrix x with the lagged values (with lags > 0) of y. I have already written code that seems to work properly (see below). I was wondering if there is a better way to concatenate a DataArray than the one I have used.
T, n = size(y);
x = @data(zeros(T-lags, 0));
for lag in 1:lags
    x = hcat(x, y[lags-lag+1:end-lag, :]);
end
Unless there is a specific reason to do otherwise, my recommendation would be to start with your DataArray x being the size that you want it to be and then fill in the column values you want.
This will give you better performance than if you need to recreate the DataArray for each new column, which is what any method for "adding" columns will actually be doing. It's conceivable that the DataArrays package might have some prettier syntax for it than what you have in your question, but fundamentally, that's what it would still be doing.
Thus, in a simplified version of your example, I would recommend:
using DataArrays
N = 5; T = 10;
X = @data(zeros(T, N));
initial_data_cols = 2;  ## specify how much of the initial data is filled in
lags = size(X, 2) - initial_data_cols
X[:, 1:initial_data_cols] = rand(size(X, 1), initial_data_cols)  ## First two columns of X are fixed in advance
for lag in 1:lags
    X[:, (lag + initial_data_cols)] = rand(size(X, 1))
end
If you did find yourself in a situation where you need to add columns to an already created object, you could improve somewhat upon the code that you have by first creating all of the new objects together and then doing a single addition of them to your initial DataArray. E.g.
X = @data(zeros(10, 2))
X = [X rand(10, 3)]
For instance, consider the difference in execution time and in the number and size of memory allocations between the two examples below:
n = 10^5; m = 10;
A = @data rand(n, m);
n_newcol = 10;

function t1(A::AbstractArray, n_newcol)  ## AbstractArray so it accepts a DataArray
    n = size(A, 1)
    for idx = 1:n_newcol
        A = hcat(A, zeros(n))
    end
    return A
end

function t2(A::AbstractArray, n_newcol)
    n = size(A, 1)
    [A zeros(n, n_newcol)]
end

# Stats after running each function once to compile
@time r1 = t1(A, n_newcol);  ## 0.154082 seconds (124 allocations: 125.888 MB, 75.33% gc time)
@time r2 = t2(A, n_newcol);  ## 0.007981 seconds (9 allocations: 22.889 MB, 31.73% gc time)
I have tried to use a while loop and struct.unpack to solve the problem, like this:

import struct

def func(inputBytes):
    inputBytes = bytearray(inputBytes)
    while len(inputBytes) < 8:
        inputBytes.append(0)
    inputBytes.reverse()
    return struct.unpack("!q", inputBytes)[0]

print(func(b'\x00\x01'))

But it is too slow. How can I make it faster?
I was overcomplicating it before; direct calculation is faster:

def func(rawData):
    # Interpret rawData as a little-endian unsigned integer.
    total = 0
    for byte in rawData[::-1]:
        total += byte
        total <<= 8
    total >>= 8
    return total
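As a further note (not in the original answer): in Python 3, the built-in int.from_bytes performs the same little-endian conversion directly and is typically faster than either version above:

def func(rawData):
    # Interpret the bytes as a little-endian unsigned integer.
    return int.from_bytes(rawData, byteorder='little')

print(func(b'\x00\x01'))  # 256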