IDE - Visual Studio 2008, Visual C++
I have a custom class Class1 with a copy constructor.
I also have a vector of Class1 objects.
Data is inserted using the following code:
Class1* objClass1;
vector<Class1> vClass1;
for (int i = 0; i < 1000; i++) {
    objClass1 = new Class1();
    vClass1.push_back(*objClass1);
    delete objClass1;
}
Now on every insert, the vector gets reallocated and all the existing contents are copied to new locations. For example, if the vector has 5 elements and I insert the 6th one, the previous 5 elements along with the new one get copied to a new location (I figured this out by adding log statements to the copy constructor).
When I use reserve(), however, this does not happen, as expected. I have the following questions:
Is it mandatory to always use the reserve statement?
Does the vector do a reallocation every time I push_back, or does it only happen because I am debugging?
It's not mandatory, it's an optimization because reallocating is expensive.
I think it's an implementation detail how often it reallocates. I think it's normal for the vector to double its storage every time it reallocates, but, as I said, this can vary by implementation. (It might be the case that because you are in a debug build it's reallocating more often than normal.)
Find out by putting your copy constructor test into non-debug code, and let us know what you get for your platform! IMO the vector shouldn't reallocate on every push_back. There are smarter ways to manage memory, and I'd bet money that the implementers didn't do that.
I have 2 questions on the case recorder.
1- I am not sure how to restart an optimization from where the recorder left off. I can read in the case recorder sql file etc., but cannot see how this can be fed into the problem() to restart.
2- This question is maybe due to my lack of knowledge in Python, but how can one access the iteration number from within an OpenMDAO component? (One way is to read the sql file that is constantly being updated, but there should be a more efficient way.)
You can re-load a case back via the load_case method on the problem.
See the docs for it here.
I'm not completely sure what you mean by accessing the iteration count, but if you just want to know the number of times your components are called, you can add a counter to them yourself.
There is no programmatic API for accessing the iteration count in OpenMDAO as of version 2.3.
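A minimal sketch of the do-it-yourself counter. The ExplicitComponent class below is a stand-in so the snippet runs without OpenMDAO installed; the same pattern applies unchanged to a real openmdao.api.ExplicitComponent, and the Paraboloid name and "f"/"x" variables are made up for illustration:

```python
class ExplicitComponent:
    """Stand-in for openmdao.api.ExplicitComponent (illustration only)."""
    pass


class Paraboloid(ExplicitComponent):
    def __init__(self):
        super().__init__()
        self.exec_count = 0  # our own iteration counter

    def compute(self, inputs, outputs):
        self.exec_count += 1  # increment on every evaluation
        outputs["f"] = (inputs["x"] - 3.0) ** 2


comp = Paraboloid()
for x in (0.0, 1.0, 2.0):
    out = {}
    comp.compute({"x": x}, out)

print(comp.exec_count)  # 3
```

The counter lives on the component instance, so any part of compute() (or a solver callback) can read it without touching the sql file.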
I'm starting to use RWMutex in my Go project with map since now I have more than one routine running at the same time and while making all of the changes for that a doubt came to my mind.
The thing is that I know we must use RLock when only reading, to allow other routines to do the same task, and Lock when writing, to fully block the map. But what are we supposed to do when editing a previously created element in the map?
For example... Let's say I have a map[int]string where I do Lock, put inside "hello " and then Unlock. What if I want to add "world" to it? Should I do Lock or can I do RLock?
You should approach the problem from another angle.
A simple rule of thumb you seem to understand just fine is
You need to protect the map from concurrent accesses when at least one of them is a modification.
Now the real question is what constitutes a modification of a map.
To answer it properly, it helps to notice that values stored in maps are not addressable, by design. This was engineered that way simply because maps internally have an intricate implementation which might move the values they contain around in memory, to provide (amortized) fast access time when the map's structure changes due to insertions and/or deletions of its elements.
The fact that map values are not addressable means you cannot do something like
m := make(map[int]string)
m[42] = "hello"
go mutate(&m[42]) // take a single element and go modifying it...
// ...while other parts of the program change _other_ values
m[123] = "blah blah"
The reason you are not allowed to do this is that the insertion operation m[123] = ... might trigger moving the storage of the map's elements around, and that might involve moving the storage of the element keyed by 42 to some other place in memory, pulling the rug out from under the feet of the goroutine running the mutate function.
So, in Go, maps really only support three operations:
Insert — or replace — an element;
Read an element;
Delete an element.
You cannot modify an element "in place"; you can only go in three steps:
Read the element;
Modify the variable containing the (read) copy;
Replace the element by the modified copy.
As you can now see, steps (1) and (3) are mere map accesses, and so the answer to your question is (hopefully) apparent: step (1) must be done under at least a read lock, and step (3) must be done under a write (exclusive) lock.
In contrast, elements of other compound types (arrays, slices, and fields of struct types) do not have the restriction maps have: provided the storage of the "enclosing" variable is not relocated, it is fine to change its different elements concurrently from different goroutines.
Since the only way to change the value associated with a key in a map is to reassign the changed value to the same key, that is a write/modification, so you have to obtain the write lock; simply using the read lock will not be sufficient.
I just started using STL, say I have a rabbit class, now I'm creating a rabbit army...
#include <vector>
vector<rabbit> rabbitArmy (numOfRabbits,rabbit());
//Q1: these rabbits are on the heap right?
rabbit* rabbitOnHeap = new rabbit();
//Q2: rabbitOnHeap is on the heap right?
rabbit rabbitOnStack;
//Q3: this rabbit is on the stack right?
rabbitArmy.push_back(rabbitOnStack);
//Q4: rabbitOnStack will remain stored on the stack?
//And it will be deleted automatically, though it's put in the rabbitArmy now?
Q4 is the one I'm most concerned with, should I always use new keyword to add rabbit to my army?
Q5: Is there better way to add rabbits to the army than:
rabbitArmy.push_back(*rabbitOnHeap);
Since you haven't specified otherwise, the objects you put in the vector will be allocated with std::allocator<rabbit>, which uses new. For what it's worth, that's usually called the "free store" rather than the heap.[1]
Again, the usual term is the free store.
Officially, that's "automatic storage", but yes, on your typical implementation that'll be the stack, and on an implementation that doesn't support a stack in hardware, it'll still be a stack-like (LIFO) data structure of some sort.
When you add an item to a vector (or other standard container) what's actually added to the container is a copy of the item you pass as a parameter. The item you pass as a parameter remains yours to do with as you please. In the case of something with automatic storage class, it'll be destroyed when it goes out of scope -- but the copy of it in the collection will remain valid until it's erased or the collection destroyed, etc.
No. In fact, you should only rarely use new to allocate items you're going to put in a standard collection. Since the item in the collection will be a copy of what you pass, you don't normally need to use new to allocate it.
Usually you just push back a local object.
For example:
for (int i = 0; i < 10; i++)
    rabbitArmy.push_back(rabbit());
This creates 10 temporary rabbit objects (the rabbit() part), adds a copy of each to the rabbitArmy. Then each of the temporaries is destroyed, but the copies of them in the rabbitArmy remain.
[1] In typical usage, "the heap" refers to memory managed by calloc, malloc, realloc, and free. What new and delete manage is the free store. A new expression, in turn, obtains memory from an operator new (either global or inside a class). operator new and operator delete are specified so they could be almost a direct pass-through to malloc and free respectively, but even when that's the case the heap and free store are normally thought of separately.
We are in the process of the optimization of a Flex AS3 Application.
One of my team members suggested we make the variable name lengths smaller to optimize the application performance.
I.e.:
var IsRegionSelected:Boolean = false; //Slower
var IsRS:Boolean = false; //faster
Is this true?
No, the only gain you will get is in the size of the swf.
Strings are put into a constant pool, and instructions referring to a string use an index into that pool.
It can be seen (very schematically) as:
constant pool:
[0] IsRegionSelected
[1] IsRS
usage:
value at 0 = false
value at 1 = false
Your code will probably be translated as (for local variables):
push false
setlocal x
push false
setlocal y
where x and y are register indices assigned by the compiler, so there is no difference whether it's register 2 or register 4.
For more detail, read the AVM specification.
Yep, I second that: changing the name length is not going to help you. Concentrate on item renderers, effects, states, and transitions; those may be killing your resources. Also check for any embedded images, embedded fonts, etc., since those will increase your final swf file size and the initial loading time.
Cheers, PK
I don't think so; how you use your variables matters more than how long their names are.
Good code should be consistent. Whether that means setting rules for the names of variables and functions, adopting standard approaches, or simply making sure all of your code is indented the same way, consistency makes your code easier for others to read.
What should matter is whether someone can later tell from the name what your variable holds.
var g:String;
var gang:String;
Both perform the same operation, but one is more readable: someone going through your code will understand it at a glance.
There's a very small performance gain, but if you plan to use this application again later, it's not worth your sanity. Do absolutely any other optimization you can before this one - and if it's really slow enough to need optimizing, then there are definitely other factors that you'll need to take care of first before variable names.
Cut anything else you can before resorting to 1-2 millisecond boosts.
As Matchu says, there is a difference but a small one.
You should consider assigning meaningful names to your variables instead of just using single characters that carry no meaning.
I have a rather simple hadoop question which I'll try to present with an example
say you have a list of strings and a large file and you want each mapper to process a piece of the file and one of the strings in a grep like program.
how are you supposed to do that? I am under the impression that the number of mappers is a result of the inputSplits produced. I could run subsequent jobs, one for each string, but it seems kinda... messy?
edit: I am not actually trying to build a grep map-reduce version. I used it as an example of having 2 different inputs to a mapper. Let's just say that I have lists A and B and would like a mapper to work on 1 element from list A and 1 element from list B.
So given that the problem experiences no data dependency that would result in the need for chaining jobs, is my only option to somehow share all of list A on all mappers and then input 1 element of list B to each mapper?
What I am trying to do is build some type of prefix look-up structure for my data. So I have a giant text and a set of strings. This process has a strong memory bottleneck, therefore I was after 1 chunk of text / 1 string per mapper.
Mappers should be able to work independently and without side effects. The parallelism can be that a mapper tries to match a line against all patterns. Each input is only processed once!
Otherwise you could multiply each input line by the number of patterns, process each line with a single pattern, and run the reducer afterwards. A ChainMapper is the solution of choice here. But remember: a line will appear twice if it matches two patterns. Is that what you want?
In my opinion you should prefer the first scenario: Each mapper processes a line independently and checks it against all known patterns.
Hint: you can distribute the patterns to all mappers with the DistributedCache feature! ;-) The input should be split with the InputLineFormat.
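The core of the first scenario is small. Below is a sketch of the per-line matching logic in plain Java, with no Hadoop dependency so it stands alone; in a real job this would live inside Mapper.map(), with the patterns loaded from the DistributedCache in setup() and context.write() emitting each hit. The GrepLogic name is invented for illustration:

```java
import java.util.List;

public class GrepLogic {

    // Check one input line against every pattern; returns how many
    // (pattern, line) pairs a real mapper would emit via context.write().
    static int matchLine(String line, List<String> patterns) {
        int emitted = 0;
        for (String p : patterns) {
            if (line.contains(p)) {
                emitted++; // real mapper: context.write(new Text(p), new Text(line));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("foo", "bar");
        System.out.println(matchLine("a foo and a bar walk in", patterns)); // 2
        System.out.println(matchLine("nothing here", patterns));            // 0
    }
}
```

Note this reproduces the caveat from the answer above: a line matching two patterns is emitted twice, once per pattern.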
A good friend had a great epiphany: what about chaining 2 mappers?
in the main, run a job that fires up a mapper (no reducer). The input is the list of strings, and we can arrange things so that each mapper gets one string only.
in turn, the first mapper starts a new job, where the input is the text. It can communicate the string by setting a variable in the context.
Regarding your edit:
In general a mapper is not used to process 2 elements at once; it should only process one element at a time. The job should be designed so that there could be a mapper for each input record and it would still run correctly!
Of course it is fine for a mapper to need some supporting information to process its input. This information can be passed via the job configuration (Configuration.set(), for example). A larger set of data should be passed via the distributed cache.
Did you have a look on one of these options?
I'm not sure if I fully understood your problem, so please check by yourself if that would work ;-)
BTW: an appreciative vote for my well-investigated previous answer would be nice ;-)