Euler Method recursive - recursion

I am quite new to coding in C.
I am trying to apply Eulers Method for an first order ODE in a quite simple way, both as an iterative as well as an recursive function. I can't put the recursive implementation together.
#include <stdio.h>
#include <stdlib.h>
float a,b,x,y,h,b;
float fun(float x,float y)
{
float f;
f=x+y;
return f;
}
float euler (float x, float y, float h, float t){
float k;
while(x<=b) {
y=y+h*fun(x,y);;
x=x+h;
printf("%0.3f\t%0.3f\n",x,y);
}
};
float euler_rec (float x, float y, float h, float b){
if (x<=b) {
y=euler_rec(x, y, h, b)+h*fun(x,y);
}
else {
printf("%0.3f\t%0.3f\n",x,y);
}
};
int main()
{
printf("\nEnter x0,y0,h,xn: ");
scanf("%f%f%f%f",&x,&y,&h,&b);
printf("\n x\t y\n");
euler(x, y, h, b);
printf ("rec\n");
euler_rec(x, y, h, b);
return 0;
}

I would not recommend using recursion depths of thousand or multiple thousands in serious applications. But for the sake of the principle, what you are trying to implement is that euler_rec(x0,y0,h,x) returns the solution approximation at time x for initial point (x0,y0). Thus the recursive call should be
yprev = euler_rec(x0,y0,h,x-h);
y = yprev + h*f(x-h,yprev);
and around that you have to construct the body of the recursion function.
It should amount to something like
float euler_rec (float x0, float y0, float h, float x) {
float y;
if(x > x0+h) {
y = euler_rec(x0 ,y0 ,h , x-h);
} else {
y = y0;
h = x-x0;
}
return y + h * f(x-h, y);
}
The printing has to occur outside of the recursion, or you need still another parameter that is constant over the recursion to indicate the x values for which some print-out should happen.
You can of course also use the forward recursion that is nearly not a recursion as any optimizing compiler will transform it into an iteration. The previous variant was recursion call first, computation second. If you put the computation first, recursion call second then the method reads as
float euler_rec (float x, float y, float h, float b){
if (x+1.01*h>=b) {
h = b-x;
return y + h * f(x,y); // insert print statement if you like
}
return euler_rec(x+h, y+h*f(x,y),h,b)
}

Related

Like Bezout's identity but with Natural numbers and an arbitrary constant. Can it be solved?

I'm an amateur playing with discrete math. This isn't a
homework problem though I am doing it at home.
I want to solve ax + by = c for natural numbers, with a, b and c
given and x and y to be computed. I want to find all x, y pairs
that will satisfy the equation.
This has a similar structure to Bezout's identity for integers
where there are multiple (infinite?) solution pairs. I thought
the similarity might mean that the extended Euclidian algorithm
could help here. Below are two implementations of the EEA that
seem to work; they're both adapted from code found on the net.
Could these be adapted to the task, or perhaps can someone
find a more promising avenue?
typedef long int Int;
#ifdef RECURSIVE_EEA
Int // returns the GCD of a and b and finds x and y
// such that ax + by == GCD(a,b), recursively
eea(Int a, Int b, Int &x, Int &y) {
if (0==a) {
x = 0;
y = 1;
return b;
}
Int x1; x1=0;
Int y1; y1=0;
Int gcd = eea(b%a, a, x1, y1);
x = y1 - b/a*x1;
y = x1;
return gcd;
}
#endif
#ifdef ITERATIVE_EEA
Int // returns the GCD of a and b and finds x and y
// such that ax + by == GCD(a,b), iteratively
eea(Int a, Int b, Int &x, Int &y) {
x = 0;
y = 1;
Int u; u=1;
Int v; v=0; // does this need initialising?
Int q; // quotient
Int r; // remainder
Int m;
Int n;
while (0!=a) {
q = b/a; // quotient
r = b%a; // remainder
m = x - u*q; // ?? what are the invariants?
n = y - v*q; // ?? When does this overflow?
b = a; // A candidate for the gcd - a's last nonzero value.
a = r; // a becomes the remainder - it shrinks each time.
// When a hits zero, the u and v that are written out
// are final values and the gcd is a's previous value.
x = u; // Here we have u and v shuffling values out
y = v; // via x and y. If a has gone to zero, they're final.
u = m; // ... and getting new values
v = n; // from m and n
}
return b;
}
#endif
If we slightly change the equation form:
ax + by = c
by = c - ax
y = (c - ax)/b
Then we can loop x through all numbers in its range (a*x <= c) and compute if viable natural y exists. So no there is not infinite number of solutions the limit is min(c/a,c/b) ... Here small C++ example of naive solution:
int a=123,b=321,c=987654321;
int x,y,ax;
for (x=1,ax=a;ax<=c;x++,ax+=a)
{
y = (c-ax)/b;
if (ax+(b*y)==c) here output x,y solution somewhere;
}
If you want to speed this up then just iterate y too and just check if c-ax is divisible by b Something like this:
int a=123,b=321,c=987654321;
int x,y,ax,cax,by;
for (x=1,ax=a,y=(c/b),by=b*y;ax<=c;x++,ax+=a)
{
cax=c-ax;
while (by>cax){ by-=b; y--; if (!y) break; }
if (by==cax) here output x,y solution somewhere;
}
As you can see now both x,y are iterated in opposite directions in the same loop and no division or multiplication is present inside loop anymore so its much faster here first few results:
method1 method2
[ 78.707 ms] | [ 21.277 ms] // time needed for computation
75044 | 75044 // found solutions
-------------------------------
75,3076776 | 75,3076776 // first few solutions in x,y order
182,3076735 | 182,3076735
289,3076694 | 289,3076694
396,3076653 | 396,3076653
503,3076612 | 503,3076612
610,3076571 | 610,3076571
717,3076530 | 717,3076530
824,3076489 | 824,3076489
931,3076448 | 931,3076448
1038,3076407 | 1038,3076407
1145,3076366 | 1145,3076366
I expect that for really huge c and small a,b numbers this
while (by>cax){ by-=b; y--; if (!y) break; }
might be slower than actual division using GCD ...

Recursive call for x power y power z

Recently i attended an interview where i was asked to write a recursive java code for (x^y)^z.
function power(x,y){
if(y==0){
return 1;
}else{
x*=power(x,y-1);
}
}
I could manage doing it for x^y but was not getting a solution for including the z also in the recursive call.
On asking for a hint, they told me instead of having 2 parameters in call u can have a array with 2 values. But even then i dint get the solution. can u suggest a solution both ways.
This is the solution I would use in python, but you could easily have done it in javascipt or any other language too:
def power(x, y):
if y == 0:
return 1
if y == 1:
return x
return x * power(x, y - 1)
def power2(x, y, z):
return power(power(x, y), z)
You can then use power2 to return your result. In another language you could probably overload the same function but I do not think this is possible in Python for this scenario.
For your javascript code, all you really needed to add to your solution was a second function along the lines of:
function power2(x,y,z)
{
return power(power(x, y), z);
}
As you can see, the solution itself is also recursive despite defining a new function (or overloading your previous one).
Michael's solution in Java Language
public void testPower()
{
int val = power(2, 3, 2);
System.out.println(val);
}
private int power(int x, int y, int z)
{
return power(power(x, y), z);
}
private int power(int x, int y)
{
if (y == 0)
{
return 1;
}
if (y == 1)
{
return x;
}
return x * power(x, y - 1);
}
output is 64

Higher radix (or better) formulation for Stockham FFT

Background
I've implemented this algorithm from Microsoft Research for a radix-2 FFT (Stockham auto sort) using OpenCL.
I use floating point textures (256 cols X N rows) for input and output in the kernel, because I will need to sample at non-integral points and I thought it better to delegate that to the texture sampling hardware. Note that my FFTs are always of 256-point sequences (every row in my texture). At this point, my N is 16384 or 32768 depending on the GPU i'm using and the max 2D texture size allowed.
I also need to perform the FFT of 4 real-valued sequences at once, so the kernel performs the FFT(a, b, c, d) as FFT(a + ib, c + id) from which I can extract the 4 complex sequences out later using an O(n) algorithm. I can elaborate on this if someone wishes - but I don't believe it falls in the scope of this question.
Kernel Source
const sampler_t fftSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
__kernel void FFT_Stockham(read_only image2d_t input, write_only image2d_t output, int fftSize, int size)
{
int x = get_global_id(0);
int y = get_global_id(1);
int b = floor(x / convert_float(fftSize)) * (fftSize / 2);
int offset = x % (fftSize / 2);
int x0 = b + offset;
int x1 = x0 + (size / 2);
float4 val0 = read_imagef(input, fftSampler, (int2)(x0, y));
float4 val1 = read_imagef(input, fftSampler, (int2)(x1, y));
float angle = -6.283185f * (convert_float(x) / convert_float(fftSize));
// TODO: Convert the two calculations below into lookups from a __constant buffer
float tA = native_cos(angle);
float tB = native_sin(angle);
float4 coeffs1 = (float4)(tA, tB, tA, tB);
float4 coeffs2 = (float4)(-tB, tA, -tB, tA);
float4 result = val0 + coeffs1 * val1.xxzz + coeffs2 * val1.yyww;
write_imagef(output, (int2)(x, y), result);
}
The host code simply invokes this kernel log2(256) times, ping-ponging the input and output textures.
Note: I tried removing the native_cos and native_sin to see if that impacted timing, but it doesn't seem to change things by very much. Not the factor I'm looking for, in any case.
Access pattern
Knowing that I am probably memory-bandwidth bound, here is the memory access pattern (per-row) for my radix-2 FFT.
X0 - element 1 to combine (read)
X1 - element 2 to combine (read)
X - element to write to (write)
Question
So my question is - can someone help me with/point me toward a higher-radix formulation for this algorithm? I ask because most FFTs are optimized for large cases and single real/complex valued sequences. Their kernel generators are also very case dependent and break down quickly when I try to muck with their internals.
Are there other options better than simply going to a radix-8 or 16 kernel?
Some of my constraints are - I have to use OpenCL (no cuFFT). I also cannot use clAmdFft from ACML for this purpose. It would be nice to also talk about CPU optimizations (this kernel SUCKS big time on the CPU) - but getting it to run in fewer iterations on the GPU is my main use-case.
Thanks in advance for reading through all this and trying to help!
I tried several versions, but the one with the best performance on CPU and GPU was a radix-16 kernel for my specific case.
Here is the kernel for reference. It was taken from Eric Bainville's (most excellent) website and used with full attribution.
// #define M_PI 3.14159265358979f
//Global size is x.Length/2, Scale = 1 for direct, 1/N to inverse (iFFT)
__kernel void ConjugateAndScale(__global float4* x, const float Scale)
{
int i = get_global_id(0);
float temp = Scale;
float4 t = (float4)(temp, -temp, temp, -temp);
x[i] *= t;
}
// Return a*EXP(-I*PI*1/2) = a*(-I)
float2 mul_p1q2(float2 a) { return (float2)(a.y,-a.x); }
// Return a^2
float2 sqr_1(float2 a)
{ return (float2)(a.x*a.x-a.y*a.y,2.0f*a.x*a.y); }
// Return the 2x DFT2 of the four complex numbers in A
// If A=(a,b,c,d) then return (a',b',c',d') where (a',c')=DFT2(a,c)
// and (b',d')=DFT2(b,d).
float8 dft2_4(float8 a) { return (float8)(a.lo+a.hi,a.lo-a.hi); }
// Return the DFT of 4 complex numbers in A
float8 dft4_4(float8 a)
{
// 2x DFT2
float8 x = dft2_4(a);
// Shuffle, twiddle, and 2x DFT2
return dft2_4((float8)(x.lo.lo,x.hi.lo,x.lo.hi,mul_p1q2(x.hi.hi)));
}
// Complex product, multiply vectors of complex numbers
#define MUL_RE(a,b) (a.even*b.even - a.odd*b.odd)
#define MUL_IM(a,b) (a.even*b.odd + a.odd*b.even)
float2 mul_1(float2 a, float2 b)
{ float2 x; x.even = MUL_RE(a,b); x.odd = MUL_IM(a,b); return x; }
float4 mul_1_F4(float4 a, float4 b)
{ float4 x; x.even = MUL_RE(a,b); x.odd = MUL_IM(a,b); return x; }
float4 mul_2(float4 a, float4 b)
{ float4 x; x.even = MUL_RE(a,b); x.odd = MUL_IM(a,b); return x; }
// Return the DFT2 of the two complex numbers in vector A
float4 dft2_2(float4 a) { return (float4)(a.lo+a.hi,a.lo-a.hi); }
// Return cos(alpha)+I*sin(alpha) (3 variants)
float2 exp_alpha_1(float alpha)
{
float cs,sn;
// sn = sincos(alpha,&cs); // sincos
//cs = native_cos(alpha); sn = native_sin(alpha); // native sin+cos
cs = cos(alpha); sn = sin(alpha); // sin+cos
return (float2)(cs,sn);
}
// Return cos(alpha)+I*sin(alpha) (3 variants)
float4 exp_alpha_1_F4(float alpha)
{
float cs,sn;
// sn = sincos(alpha,&cs); // sincos
// cs = native_cos(alpha); sn = native_sin(alpha); // native sin+cos
cs = cos(alpha); sn = sin(alpha); // sin+cos
return (float4)(cs,sn,cs,sn);
}
// mul_p*q*(a) returns a*EXP(-I*PI*P/Q)
#define mul_p0q1(a) (a)
#define mul_p0q2 mul_p0q1
//float2 mul_p1q2(float2 a) { return (float2)(a.y,-a.x); }
__constant float SQRT_1_2 = 0.707106781186548; // cos(Pi/4)
#define mul_p0q4 mul_p0q2
float2 mul_p1q4(float2 a) { return (float2)(SQRT_1_2)*(float2)(a.x+a.y,-a.x+a.y); }
#define mul_p2q4 mul_p1q2
float2 mul_p3q4(float2 a) { return (float2)(SQRT_1_2)*(float2)(-a.x+a.y,-a.x-a.y); }
__constant float COS_8 = 0.923879532511287; // cos(Pi/8)
__constant float SIN_8 = 0.382683432365089; // sin(Pi/8)
#define mul_p0q8 mul_p0q4
float2 mul_p1q8(float2 a) { return mul_1((float2)(COS_8,-SIN_8),a); }
#define mul_p2q8 mul_p1q4
float2 mul_p3q8(float2 a) { return mul_1((float2)(SIN_8,-COS_8),a); }
#define mul_p4q8 mul_p2q4
float2 mul_p5q8(float2 a) { return mul_1((float2)(-SIN_8,-COS_8),a); }
#define mul_p6q8 mul_p3q4
float2 mul_p7q8(float2 a) { return mul_1((float2)(-COS_8,-SIN_8),a); }
// Compute in-place DFT2 and twiddle
#define DFT2_TWIDDLE(a,b,t) { float2 tmp = t(a-b); a += b; b = tmp; }
// T = N/16 = number of threads.
// P is the length of input sub-sequences, 1,16,256,...,N/16.
__kernel void FFT_Radix16(__global const float4 * x, __global float4 * y, int pp)
{
int p = pp;
int t = get_global_size(0); // number of threads
int i = get_global_id(0); // current thread
////// y[i] = 2*x[i];
////// return;
int k = i & (p-1); // index in input sequence, in 0..P-1
// Inputs indices are I+{0,..,15}*T
x += i;
// Output indices are J+{0,..,15}*P, where
// J is I with four 0 bits inserted at bit log2(P)
y += ((i-k)<<4) + k;
// Load
float4 u[16];
for (int m=0;m<16;m++) u[m] = x[m*t];
// Twiddle, twiddling factors are exp(_I*PI*{0,..,15}*K/4P)
float alpha = -M_PI*(float)k/(float)(8*p);
for (int m=1;m<16;m++) u[m] = mul_1_F4(exp_alpha_1_F4(m * alpha), u[m]);
// 8x in-place DFT2 and twiddle (1)
DFT2_TWIDDLE(u[0].lo,u[8].lo,mul_p0q8);
DFT2_TWIDDLE(u[0].hi,u[8].hi,mul_p0q8);
DFT2_TWIDDLE(u[1].lo,u[9].lo,mul_p1q8);
DFT2_TWIDDLE(u[1].hi,u[9].hi,mul_p1q8);
DFT2_TWIDDLE(u[2].lo,u[10].lo,mul_p2q8);
DFT2_TWIDDLE(u[2].hi,u[10].hi,mul_p2q8);
DFT2_TWIDDLE(u[3].lo,u[11].lo,mul_p3q8);
DFT2_TWIDDLE(u[3].hi,u[11].hi,mul_p3q8);
DFT2_TWIDDLE(u[4].lo,u[12].lo,mul_p4q8);
DFT2_TWIDDLE(u[4].hi,u[12].hi,mul_p4q8);
DFT2_TWIDDLE(u[5].lo,u[13].lo,mul_p5q8);
DFT2_TWIDDLE(u[5].hi,u[13].hi,mul_p5q8);
DFT2_TWIDDLE(u[6].lo,u[14].lo,mul_p6q8);
DFT2_TWIDDLE(u[6].hi,u[14].hi,mul_p6q8);
DFT2_TWIDDLE(u[7].lo,u[15].lo,mul_p7q8);
DFT2_TWIDDLE(u[7].hi,u[15].hi,mul_p7q8);
// 8x in-place DFT2 and twiddle (2)
DFT2_TWIDDLE(u[0].lo,u[4].lo,mul_p0q4);
DFT2_TWIDDLE(u[0].hi,u[4].hi,mul_p0q4);
DFT2_TWIDDLE(u[1].lo,u[5].lo,mul_p1q4);
DFT2_TWIDDLE(u[1].hi,u[5].hi,mul_p1q4);
DFT2_TWIDDLE(u[2].lo,u[6].lo,mul_p2q4);
DFT2_TWIDDLE(u[2].hi,u[6].hi,mul_p2q4);
DFT2_TWIDDLE(u[3].lo,u[7].lo,mul_p3q4);
DFT2_TWIDDLE(u[3].hi,u[7].hi,mul_p3q4);
DFT2_TWIDDLE(u[8].lo,u[12].lo,mul_p0q4);
DFT2_TWIDDLE(u[8].hi,u[12].hi,mul_p0q4);
DFT2_TWIDDLE(u[9].lo,u[13].lo,mul_p1q4);
DFT2_TWIDDLE(u[9].hi,u[13].hi,mul_p1q4);
DFT2_TWIDDLE(u[10].lo,u[14].lo,mul_p2q4);
DFT2_TWIDDLE(u[10].hi,u[14].hi,mul_p2q4);
DFT2_TWIDDLE(u[11].lo,u[15].lo,mul_p3q4);
DFT2_TWIDDLE(u[11].hi,u[15].hi,mul_p3q4);
// 8x in-place DFT2 and twiddle (3)
DFT2_TWIDDLE(u[0].lo,u[2].lo,mul_p0q2);
DFT2_TWIDDLE(u[0].hi,u[2].hi,mul_p0q2);
DFT2_TWIDDLE(u[1].lo,u[3].lo,mul_p1q2);
DFT2_TWIDDLE(u[1].hi,u[3].hi,mul_p1q2);
DFT2_TWIDDLE(u[4].lo,u[6].lo,mul_p0q2);
DFT2_TWIDDLE(u[4].hi,u[6].hi,mul_p0q2);
DFT2_TWIDDLE(u[5].lo,u[7].lo,mul_p1q2);
DFT2_TWIDDLE(u[5].hi,u[7].hi,mul_p1q2);
DFT2_TWIDDLE(u[8].lo,u[10].lo,mul_p0q2);
DFT2_TWIDDLE(u[8].hi,u[10].hi,mul_p0q2);
DFT2_TWIDDLE(u[9].lo,u[11].lo,mul_p1q2);
DFT2_TWIDDLE(u[9].hi,u[11].hi,mul_p1q2);
DFT2_TWIDDLE(u[12].lo,u[14].lo,mul_p0q2);
DFT2_TWIDDLE(u[12].hi,u[14].hi,mul_p0q2);
DFT2_TWIDDLE(u[13].lo,u[15].lo,mul_p1q2);
DFT2_TWIDDLE(u[13].hi,u[15].hi,mul_p1q2);
// 8x DFT2 and store (reverse binary permutation)
y[0] = u[0] + u[1];
y[p] = u[8] + u[9];
y[2*p] = u[4] + u[5];
y[3*p] = u[12] + u[13];
y[4*p] = u[2] + u[3];
y[5*p] = u[10] + u[11];
y[6*p] = u[6] + u[7];
y[7*p] = u[14] + u[15];
y[8*p] = u[0] - u[1];
y[9*p] = u[8] - u[9];
y[10*p] = u[4] - u[5];
y[11*p] = u[12] - u[13];
y[12*p] = u[2] - u[3];
y[13*p] = u[10] - u[11];
y[14*p] = u[6] - u[7];
y[15*p] = u[14] - u[15];
}
Note that I have modified the kernel to perform the FFT of 2 complex-valued sequences at once instead of one. Also, since I only need the FFT of 256 elements at a time in a much larger sequence, I perform only 2 runs of this kernel, which leaves me with 256-length DFTs in the larger array.
Here's some of the relevant host code as well.
var ev = new[] { new Cl.Event() };
var pEv = new[] { new Cl.Event() };
int fftSize = 1;
int iter = 0;
int n = distributionSize >> 5;
while (fftSize <= n)
{
Cl.SetKernelArg(fftKernel, 0, memA);
Cl.SetKernelArg(fftKernel, 1, memB);
Cl.SetKernelArg(fftKernel, 2, fftSize);
Cl.EnqueueNDRangeKernel(commandQueue, fftKernel, 1, null, globalWorkgroupSize, localWorkgroupSize,
(uint)(iter == 0 ? 0 : 1),
iter == 0 ? null : pEv,
out ev[0]).Check();
if (iter > 0)
pEv[0].Dispose();
Swap(ref ev, ref pEv);
Swap(ref memA, ref memB); // ping-pong
fftSize = fftSize << 4;
iter++;
Cl.Finish(commandQueue);
}
Swap(ref memA, ref memB);
Hope this helps someone!

How do I optimize displaying a large number of quads in OpenGL?

I am trying to display a mathematical surface f(x,y) defined on a XY regular mesh using OpenGL and C++ in an effective manner:
struct XYRegularSurface {
double x0, y0;
double dx, dy;
int nx, ny;
XYRegularSurface(int nx_, int ny_) : nx(nx_), ny(ny_) {
z = new float[nx*ny];
}
~XYRegularSurface() {
delete [] z;
}
float& operator()(int ix, int iy) {
return z[ix*ny + iy];
}
float x(int ix, int iy) {
return x0 + ix*dx;
}
float y(int ix, int iy) {
return y0 + iy*dy;
}
float zmin();
float zmax();
float* z;
};
Here is my OpenGL paint code so far:
void color(QColor & col) {
float r = col.red()/255.0f;
float g = col.green()/255.0f;
float b = col.blue()/255.0f;
glColor3f(r,g,b);
}
void paintGL_XYRegularSurface(XYRegularSurface &surface, float zmin, float zmax) {
float x, y, z;
QColor col;
glBegin(GL_QUADS);
for(int ix = 0; ix < surface.nx - 1; ix++) {
for(int iy = 0; iy < surface.ny - 1; iy++) {
x = surface.x(ix,iy);
y = surface.y(ix,iy);
z = surface(ix,iy);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
x = surface.x(ix + 1, iy);
y = surface.y(ix + 1, iy);
z = surface(ix + 1,iy);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
x = surface.x(ix + 1, iy + 1);
y = surface.y(ix + 1, iy + 1);
z = surface(ix + 1,iy + 1);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
x = surface.x(ix, iy + 1);
y = surface.y(ix, iy + 1);
z = surface(ix,iy + 1);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
}
}
glEnd();
}
The problem is that this is slow, nx=ny=1000 and fps ~= 1.
How do I optimize this to be faster?
EDIT: following your suggestion (thanks!) regarding VBO
I added:
float* XYRegularSurface::xyz() {
float* data = new float[3*nx*ny];
long i = 0;
for(int ix = 0; ix < nx; ix++) {
for(int iy = 0; iy < ny; iy++) {
data[i++] = x(ix,iy);
data[i++] = y(ix,iy);
data[i] = z[i]; i++;
}
}
return data;
}
I think I understand how I can create a VBO, initialize it to xyz() and send it to the GPU in one go, but how do I use the VBO when drawing. I understand that this can either be done in the vertex shader or by glDrawElements? I assume the latter is easier? If so: I do not see any QUAD mode in the documentation for glDrawElements!?
Edit2:
So I can loop trough all nx*ny quads and draw each by:
GL_UNSIGNED_INT indices[4];
// ... set indices
glDrawElements(GL_QUADS, 1, GL_UNSIGNED_INT, indices);
?
1/. Use display lists, to cache GL commands - avoiding recalculation of the vertices and the expensive per-vertex call overhead. If the data is updated, you need to look at client-side vertex arrays (not to be confused with VAOs). Now ignore this option...
2/. Use vertex buffer objects. Available as of GL 1.5.
Since you need VBOs for core profile anyway (i.e., modern GL), you can at least get to grips with this first.
Well, you've asked a rather open ended question. I'd suggest using modern (3.0+) OpenGL for everything. The point of just about any new OpenGL feature is to provide a faster way to do things. Like everyone else is suggesting, use array (vertex) buffer objects and vertex array objects. Use an element array (index) buffer object too. Most GPUs have a 'post-transform cache', which stores the last few transformed vertices, but this can only be used when you call the glDraw*Elements family of functions. I also suggest you store a flat mesh in your VBO, where y=0 for each vertex. Sample the y from a heightmap texture in your vertex shader. If you do this, whenever the surface changes you will only need to update the heightmap texture, which is easier than updating the VBO. Use one of the floating point or integer texture formats for a heightmap, so you aren't restricted to having your values be between 0 and 1.
If so: I do not see any QUAD mode in the documentation for glDrawElements!?
If you want quads make sure you're looking at the GL 2.1-era docs, not the new stuff.

is it possible to return two vectors from a function?

Im trying to do a merge sort in cpp on a vector called x, which contains x coordinates. As the mergesort sorts the x coordinates, its supposed to move the corresponding elements in a vector called y, containing the y coordinates. the only problem is that i dont know how to (or if i can) return both resulting vectors from the merge function.
alternatively if its easier to implement i could use a slower sort method.
No, you cannot return 2 results from a method like in this example.
vector<int>, vector<int> merge_sort();
What you can do is pass 2 vectors by reference to a function and the resultant mergesorted vector affects the 2 vectors...e.g
void merge_sort(vector<int>& x, vector<int>& y);
Ultimately, you can do what #JoshD mentioned and create a struct called point and merge sort the vector of the point struct instead.
Try something like this:
struct Point {
int x;
int y;
operator <(const Point &rhs) {return x < rhs.x;}
};
vector<Point> my_points.
mergesort(my_points);
Or if you want to sort Points with equal x value by the y cordinate:
Also, I thought I'd add, if you really ever need to, you can alway return a std::pair. A better choice is usually to return through the function parameters.
operator <(const Point &rhs) {return (x < rhs.x || x == rhs.x && y < rhs.y);}
Yes, you can return a tuple, then use structured binding (since C++17).
Here's a full example:
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <tuple>
#include <vector>
using namespace std::string_literals;
auto twoVectors() -> std::tuple<std::vector<int>, std::vector<int>>
{
const std::vector<int> a = { 1, 2, 3 };
const std::vector<int> b = { 4, 5, 6 };
return { a, b };
}
auto main() -> int
{
auto [a, b] = twoVectors();
auto const sum = std::accumulate(a.begin(), a.end(), std::accumulate(b.begin(), b.end(), 0));
std::cout << "sum: "s << sum << std::endl;
return EXIT_SUCCESS;
}
You can have a vector of vectors
=> vector<vector > points = {{a, b}, {c, d}};
now you can return points.
Returning vectors is most probably not what you want, as they are copied for this purpose (which is slow). Have a look at this implementation, for example.

Resources