What is the default value for the output color in GLSL in case you don't set it?
#version 330
uniform sampler2DRect colorTex;
uniform vec3 backgroundColor;
out vec4 outputColor;
void main(void)
{
    vec4 frontColor = texture(colorTex, gl_FragCoord.xy);
    outputColor.rgb = frontColor + backgroundColor * frontColor.a;
}
Is it (0, 0, 0, 1)?
P.S.: that code comes from the old GL; trying to use it with GL3, I get the following error:
error C7011: implicit cast from "vec4" to "vec3"
Am I right to suppose that in the old GL the implicit cast was allowed?
This is the fragment shader of Nvidia's front-to-back depth peeling example; you can find the code here: org.jogl.demos.dualdepthpeeling\org.jogl.demos.dualdepthpeeling\src\demos\dualDepthPeeling\shaders\front_peeling_final_fragment.glsl
With regard to the default fragment shader output:
There is no default and the result is undefined if you don't set one. See other answers.
I notice you have a texture, and sampling an 'unbound'/incomplete texture is a different matter [1][2]. There is a default, but in practice I would not rely on it across all drivers:
If a fragment shader uses a sampler whose associated texture object is
not complete, as defined in section 3.8.10, the texture image unit will
return (R, G, B, A) = (0, 0, 0, 1). [↱]
There's also a uniform (backgroundColor) which may not be set either. Uniforms remain constant for the entire draw call and, like samplers, also have a default (although this is more of an "initial" value).
As a result of a successful link operation, all active user-defined uniform variables belonging to program will be initialized to 0 [↱]
But what actually happens when you don't set a fragment colour?
Being undefined in the spec means (0, 0, 0, 1) is a valid value and may in fact be the one you get. If a GL implementation (such as the one Nvidia or ATI provide with their driver + hardware) were to make this the consistent returned value, the generated shader code would need to set a default value or catch the case where you don't set one. That just adds overhead; it's faster to do nothing instead.
The shader must still return a value though, and whatever value is in the register(s) for your fragment shader thread gets returned. This is uninitialized data (from the point of view of your shader program). Just like uninitialized memory in a CPU program, the value could be anything, generally depending on what was there beforehand. It's not uncommon for this to be all zeroes, or even something fairly consistent depending on the previous task the GPU was doing (i.e. another shader you ran). In my experience, though, uninitialized values in the fragment shader quite commonly present as flickering, noise-like patterns.
If you don't assign values to fragment shader outputs, the result is undefined. From the OpenGL 3.3 spec, section 3.9.2 "Shader Execution", page 190:
Any colors, or color components, associated with a fragment that are not written by the fragment shader are undefined.
The corresponding GLSL spec confirms this. In section 4.3 "Storage Qualifiers", page 28:
Global variables without storage qualifiers that are not initialized in their declaration or by the application will not be initialized by OpenGL, but rather will enter main() with undefined values.
And then in section 4.6.3 "Outputs", page 31:
Output variables must be declared at global scope. During shader execution they will behave as normal unqualified global variables.
On the second part of the question, I don't believe there was ever a GLSL version where an implicit cast from vec4 to vec3 was legal; you have to select the components explicitly (e.g. frontColor.rgb). Some compilers may not have given an error, but they should have.
Related
How can I reliably render bit accurate outputs to a floating-point texture?
I am trying to encode uint values with the intBitsToFloat function within a shader and write the result to an RGBA32F texture. This is meant as a workaround for the inability to alpha blend integer textures, and I later want to decode the float values back to their initial uint values.
This is the shader code used:
output_id.r = intBitsToFloat(input_symbol_id);
output_id.g = intBitsToFloat(input_instance_id);
output_id.b = 0.0;
output_id.a = alpha > 0.5 ? 1.0 : 0.0;
where input_symbol_id and input_instance_id are the int values I want to encode.
This doesn't seem to work, though. The output for very small values (e.g., intBitsToFloat(1)) always gets truncated to 0.0 when later reading from the output texture via readPixels. Larger values (e.g., 1.0) seem to get passed through just fine.
This is using WebGL 2.0.
I'm aware of this question, which describes a similar problem. I am, however, already employing the fix described there, and I still only get zero values back.
Or the question can be paraphrased like this:
Why may one need a datatype with a non-zero lower bound?
Consider the following example:
struct S {
    int a;
    int b;
    float c;
    float d;
} array[N];
If I had an array of type S[] and I wanted to send only values of fields b and
d, I would create a datatype with the type map { (4, MPI_INT), (12, MPI_FLOAT) }.
At first, it seems that such a type could be used to correctly send an array of
struct S:
MPI_Send(array, N, datatype, ...);
But this doesn't work if N > 1.
Such a type would have lb = 4, ub = 16 and extent = ub - lb = 12. That is,
MPI would consider that the second element of the array starts 12 bytes from the
first one, which is not true.
Well, that may not be a big deal. After all, for such partially sent structures we generally have to specify the exact size of the structure anyway:
MPI_Type_create_resized(datatype, 0, sizeof(struct S), &resized);
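For concreteness, a minimal sketch of how that datatype could be built and used (dest, tag and the MPI setup are assumed; offsetof needs <stddef.h>):

/* Sketch: send only fields b and d of each struct S, N elements total. */
MPI_Datatype datatype, resized;
int          blocklens[2] = {1, 1};
MPI_Aint     disps[2]     = {offsetof(struct S, b), offsetof(struct S, d)}; /* 4 and 12 here */
MPI_Datatype types[2]     = {MPI_INT, MPI_FLOAT};

MPI_Type_create_struct(2, blocklens, disps, types, &datatype);
/* Force the extent to sizeof(struct S) so element i starts at array + i. */
MPI_Type_create_resized(datatype, 0, sizeof(struct S), &resized);
MPI_Type_commit(&resized);

MPI_Send(array, N, resized, dest, tag, MPI_COMM_WORLD);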
But I wonder why we always need to specify a zero lower bound. Why would someone need a non-zero lower bound? Datatypes with non-zero lower bounds look extremely confusing to me, and I cannot make any sense of them.
If I were to design a type system for MPI, I would describe a type with a single
parameter - its size (extent), which is the stride between two adjacent elements of an array. In terms of MPI, I would always set lb = 0 and extent = ub. Such a system looks much clearer to me, and it would work correctly in the example described above.
But MPI has chosen a different way: we have two independent parameters instead, the lower and the upper bounds. Why is that? What's the use of this additional flexibility? When should one use datatypes with a non-zero lower bound?
You have no idea what kind of weird and complex structures one finds in scientific and engineering codes. The standard is designed to be as general as possible and to provide maximum flexibility. Section 4.1.6 Lower-Bound and Upper-Bound Markers begins like this:
It is often convenient to define explicitly the lower bound and upper bound of a type map, and override the definition given on page 105. This allows one to define a datatype that has "holes" at its beginning or its end, or a datatype with entries that extend above the upper bound or below the lower bound. Examples of such usage are provided in Section 4.1.14.
Also, the user may want to overide [sic] the alignment rules that are used to compute upper bounds and extents. E.g., a C compiler may allow the user to overide [sic] default alignment rules for some of the structures within a program. The user has to specify explicitly the bounds of the datatypes that match these structures.
The simplest example of a datatype with a non-zero lower bound is a structure with absolute addresses used as offsets, useful when, e.g., sending structures with pointers to data scattered in memory. Such a datatype is used with MPI_BOTTOM specified as the buffer address, which corresponds to the bottom of the memory space (0 on most systems). If the lower bound were fixed to 0, you would have to find the data item with the lowest address first and compute all offsets relative to it.
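A rough sketch of that pattern (the struct layout, variable names, dest and tag here are hypothetical, just to show the mechanics):

/* Two nodes allocated separately, i.e. scattered in memory. */
struct Node { double payload[8]; };
struct Node *a = malloc(sizeof *a), *b = malloc(sizeof *b);

MPI_Aint     disps[2];
MPI_Get_address(a, &disps[0]);   /* absolute addresses used as displacements */
MPI_Get_address(b, &disps[1]);
int          blocklens[2] = {8, 8};
MPI_Datatype scattered;
MPI_Type_create_hindexed(2, blocklens, disps, MPI_DOUBLE, &scattered);
MPI_Type_commit(&scattered);

/* The buffer argument is MPI_BOTTOM ("address zero"); the datatype's lower
   bound is the (large, non-zero) address of whichever node comes first. */
MPI_Send(MPI_BOTTOM, 1, scattered, dest, tag, MPI_COMM_WORLD);
MPI_Type_free(&scattered);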
Another example is the use of MPI_Type_create_subarray to create a datatype that describes a subarray of an n-dimensional array. With zero lower bounds you would have to provide a pointer to the beginning of the subarray. With non-zero lower bounds you just give a pointer to the beginning of the whole array instead. And you can also create a contiguous datatype of such subarray datatypes in order to send such n-dimensional "slices" from an (n+1)-dimensional array.
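For example, a sketch along these lines (the array dimensions, dest and tag are chosen arbitrarily for illustration):

/* Describe a 2x2 block starting at (1,1) inside a 4x4 array of doubles. */
double array[4][4];
int sizes[2]    = {4, 4};
int subsizes[2] = {2, 2};
int starts[2]   = {1, 1};
MPI_Datatype block;
MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_DOUBLE, &block);
MPI_Type_commit(&block);

/* Pass the base of the whole array; the datatype's markers and displacements
   locate the block within it, and its extent spans the full 4x4 array. */
MPI_Send(array, 1, block, dest, tag, MPI_COMM_WORLD);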
IMO the only reason to have both LB and UB markers is that it simplifies the description of datatype construction. MPI datatypes are described by a type map (a list of offsets and types, including possible LB/UB markers), and all the datatype construction calls define the new type map in terms of the old type map.
When you have LB/UB markers in the old type map and you follow the rules for constructing the new type map from the old one, you get a natural definition of the LB/UB markers in the new type, which defines the extent of the new type. If extent were a separate property on the side of the type map, you'd have to define what the new extent is for every datatype construction call.
Other than that, I fundamentally agree with you on the meaninglessness of having LB/UB as two separate pieces of data, when the only thing they're used for is to define the extent. Once you add LB/UB markers, their meaning is completely disconnected from any notion of actual data offsets.
If you wanted to put an int at displacement 4 and have its extent be 8, it would be fine to construct
[(LB,4), (int,4), (UB,12)]
but it would be equally fine to construct any of
[(LB,0),(int,4),(UB,8)]
[(LB,1000000),(int,4),(UB,1000008)]
[(LB,-1000),(int,4),(UB,-992)]
The above are all completely equivalent in behavior because they have the same extent.
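In terms of the actual API, any of those type maps could be produced by resizing a base type that has a single int at displacement 4; a sketch (only the lb argument differs between the variants):

/* Base type: a single int at displacement 4. */
MPI_Datatype base, t;
int      blocklen = 1;
MPI_Aint disp     = 4;
MPI_Type_create_hindexed(1, &blocklen, &disp, MPI_INT, &base);

/* Any of these yield the same communication behaviour, since the data
   displacement (4) and the extent (8) are identical in all of them. */
MPI_Type_create_resized(base, 4, 8, &t);        /* [(LB,4), (int,4), (UB,12)] */
/* MPI_Type_create_resized(base, 0, 8, &t);        [(LB,0), (int,4), (UB,8)]  */
/* MPI_Type_create_resized(base, -1000, 8, &t);    [(LB,-1000), (int,4), (UB,-992)] */
MPI_Type_commit(&t);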
When explanations of LB/UB markers talk about how you need to have datatypes where the first data displacement is non-0, I think that's misleading. It's true you need to be able to make types like that, but the LB/UB markers aren't fundamentally connected to the data displacements. I'm concerned that suggesting they are connected will lead an MPI user to write invalid code if they think the LB is intrinsically related to the data offsets.
I want to analyze the pointer values in LLVM IR.
As described in the documentation of the LLVM Value class,
Value is a very important LLVM class. It is the base class of all
values computed by a program that may be used as operands to other
values. Value is the super class of other important classes such as
Instruction and Function. All Values have a Type. Type is not a
subclass of Value. Some values can have a name and they belong to some
Module. Setting the name on the Value automatically updates the
module's symbol table.
To test whether a Value is a pointer, there is a->getType()->isPointerTy(). LLVM also provides a PointerType class; however, there are no direct APIs to compare the values of pointers.
So I wonder how to compare these pointer values, to test whether they are equal or not. I know there is AliasAnalysis, but I have doubts about the AliasAnalysis results, so I want to validate them myself.
The quick solution is to use IRBuilder::CreatePtrDiff. This will compute the difference between the two pointers, and return an i64 result. If the pointers are equal, this will be zero, and otherwise, it will be nonzero.
It might seem excessive, seeing as CreatePtrDiff will make an extra effort to compute the result in terms of number of elements rather than number of bytes, but in all likelihood that extra division will get optimized out.
The other option is to use a ptrtoint instruction, with a reasonably large result type such as i64, and then do an integer comparison.
From the online reference:
Value * CreatePtrDiff (Value *LHS, Value *RHS, const Twine &Name="")
Return the i64 difference between two pointer values, dividing out the size of the pointed-to objects.
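A minimal sketch of the ptrtoint route (assuming an existing IRBuilder and two pointer-typed Values; note that in recent LLVM versions CreatePtrDiff also takes the pointee element type as its first argument, unlike the signature quoted above):

#include "llvm/IR/IRBuilder.h"

// Emit an i1 that is true when the two pointers hold the same address.
llvm::Value *emitPointerEquality(llvm::IRBuilder<> &Builder,
                                 llvm::Value *LHS, llvm::Value *RHS) {
  llvm::Type *I64 = Builder.getInt64Ty();
  llvm::Value *L = Builder.CreatePtrToInt(LHS, I64, "lhs.addr");
  llvm::Value *R = Builder.CreatePtrToInt(RHS, I64, "rhs.addr");
  return Builder.CreateICmpEQ(L, R, "ptr.eq");
}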
Unlike OpenGL ES 3, WebGL 2 has no gl.mapBufferRange (though gl.bufferSubData does exist), so what is the efficient way to update uniform buffer data?
For example, a PerDraw Uniform block
uniform PerDraw
{
    mat4 P;
    mat4 MV;
    mat3 MNormal;
} u_perDraw;
Since gl.bufferSubData exists, it would seem the approach is to create a buffer and a parallel typed array, update the typed array, then call gl.bufferSubData to copy it into the buffer and gl.bindBufferRange to use it.
That's probably still very fast. First of all, value manipulation stays in JavaScript, so there's less overhead from calling into WebGL. If you have 10 uniforms to update it means you're making 2 calls into WebGL instead of 10.
In TWGL.js I generate ArrayBufferViews for all uniforms into a single typed array, so for example, given your uniform block above, you can do
ubo.MV[12] = tx;
ubo.MV[13] = ty;
ubo.MV[14] = tz;
Or, as another example, if you have a math library that takes an array/typed array as a destination parameter, you can do stuff like
var dest = ubo.P;
m4.perspective(fov, aspect, zNear, zFar, dest);
The one issue I have is dealing with uniform optimization. If I edit a shader, say I'm debugging and I just insert output = vec4(1,0,0,1); return; at the top of a fragment shader, and some uniform block gets optimized out, the code is going to break. I don't know what the standard way of dealing with this is in C/C++ projects. I guess in C++ you'd declare a structure
struct PerDraw {
    float P[16];
    float MV[16];
    float MNormal[9];
};
So the problem kind of goes away. In twgl.js I'm effectively generating that structure at runtime, which means that if your code expects it to exist but it doesn't get generated because it was optimized out, the code breaks.
In twgl I made a function that copies from a JavaScript object to the typed array so I can skip any optimized-out uniform blocks, which unfortunately adds some overhead. You're free to modify the typed array views directly and deal with the breakage when debugging, or to use the structured copy function (twgl.setBlockUniforms).
Maybe I should let you specify a structure from JavaScript in twgl, generate it, and leave it up to you to make it match the uniform block. That would make it more like C++, remove one copy, and be easier to deal with when a debugging edit causes a block to be optimized out.
Here is the problem code:
int* m_A = new int[4];
int* reAlloc = new int[10];
memcpy(reAlloc, m_A, 10 * sizeof(int));
When I compile it, it seems okay.
Is it okay when the third argument of memcpy is greater than the size of the
buffer pointed to by the second argument?
It's not okay. m_A points to an allocation of only 4 ints, but the call reads 10 * sizeof(int) bytes starting at m_A, so it reads past the end of that allocation. That alone is undefined behavior; on top of that, the out-of-bounds read range might even run into reAlloc's storage, so the regions passed to memcpy could overlap, which is undefined behavior for memcpy as well.
Even if you tried to address the overlap concern by using memmove(), you would still be in the grey zone, because you don't know what is at *(m_A + 4) through *(m_A + 9).
It might happen to be memory your process owns, in which case the copy appears to work (but fills reAlloc with garbage); it might not be, in which case you'll get a SIGSEGV.
memcpy() and memmove() are low-level memory manipulation functions; the compiler assumes you know what you are doing and will not emit warnings.
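A minimal sketch of a correct version of that "grow the array" pattern (variable names kept from the question; filling the extra slots with zeros is just one choice):

#include <algorithm>
#include <cstring>

int* m_A = new int[4]{1, 2, 3, 4};
int* reAlloc = new int[10];

std::memcpy(reAlloc, m_A, 4 * sizeof(int));   // copy only the 4 ints that actually exist
std::fill(reAlloc + 4, reAlloc + 10, 0);      // initialize the remaining slots explicitly

delete[] m_A;                                 // the old buffer is no longer needed
m_A = reAlloc;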