Can someone help me with the question below?
How can I calculate the total usable Ceph storage space?
Let's say I have 3 nodes and each node has 6 OSDs with a 1 TB disk each. That is a total of 18 TB of storage (3 * 6 TB). Is all 18 TB usable, or will some space go toward redundancy?
Ceph has two important values: the full and near-full ratios. The default for full is 95% and for near-full is 85%. (http://docs.ceph.com/docs/jewel/rados/configuration/mon-config-ref/)
If any OSD hits the full ratio, it will stop accepting new write requests (read: your cluster gets stuck). You can raise this value, but be careful, because if an OSD stops because there is no space left at the filesystem level, you may experience data loss.
That means you can't get more than the full ratio out of your cluster, and for normal operation it's wise not to reach the near-full value.
For your case, with a replication factor of 3, you have 18 TB (3 * 6 TB) of raw space, which translates to 6 TB of protected space; after multiplying by 0.85 you have 5.1 TB of normally usable space.
Two more pieces of unsolicited advice: use at least 4 nodes (3 is the bare minimum to work; if one node is down, you are in trouble), and use a lower value for near-full. I'd advise keeping it around 0.7. In that case you will have (4 nodes, 6 * 1 TB OSDs, / 3, * 0.7) 5.6 TB of usable space.
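For reference, a minimal back-of-the-envelope sketch of that arithmetic in Python; the node counts, replication factor, and ratios are the ones discussed in this thread, not values queried from a live cluster, and the helper name is made up for the example.

    def usable_tb(nodes, osds_per_node, osd_tb, replicas, nearfull_ratio):
        """Rough usable capacity: raw space, divided by the replication
        factor, scaled down to stay below the near-full ratio."""
        raw = nodes * osds_per_node * osd_tb
        protected = raw / float(replicas)
        return protected * nearfull_ratio

    print("%.1f TB" % usable_tb(3, 6, 1, 3, 0.85))  # original setup  -> 5.1 TB
    print("%.1f TB" % usable_tb(4, 6, 1, 3, 0.70))  # suggested setup -> 5.6 TB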
I'd like to ask what the maximum amount of data the R igraph package can handle is. Is it possible to work with hundreds of millions of rows of data? I could not find anything specific in the documentation; it only says that igraph can handle huge amounts of data.
Thanks
At the moment, R/igraph does not check if graph size limits are exceeded, however, if you do exceed certain values, it may misbehave. Checks will hopefully be added for version 2.0.
To be on the safe side, follow these guidelines:
Avoid vectors or matrices with more than 2^31 - 1 ~ 2.1 billion elements. That refers to the total number of elements in the matrix, i.e. a square matrix should not be larger than 46340 by 46340.
Graphs should have fewer than 2^31 - 1 ~ 2.1 billion vertices.
Graphs should have fewer than 2^30 - 1 ~ 1 billion edges.
These are likely to remain the size limits for version 2.0 of R/igraph, except that when they are exceeded, a user-friendly error will be shown. The reason for these limits is that R still does not support 64-bit integers.
The C, Python and Mathematica interfaces of igraph will be able to handle much larger graphs as they will be using 64-bit integers on systems that support it.
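As a rough illustration, here is a small Python sketch that checks a proposed graph against the limits quoted above; the constants are the thresholds from this answer, not values queried from the library, and the helper function name is made up for the example.

    R_IGRAPH_MAX_VERTICES = 2**31 - 1   # ~2.1 billion vertices
    R_IGRAPH_MAX_EDGES    = 2**30 - 1   # ~1 billion edges
    R_MAX_VECTOR_ELEMENTS = 2**31 - 1   # also caps dense matrices

    def fits_r_igraph(n_vertices, n_edges, dense_adjacency=False):
        """Return True if a graph of this size stays within the limits above."""
        if n_vertices >= R_IGRAPH_MAX_VERTICES or n_edges >= R_IGRAPH_MAX_EDGES:
            return False
        if dense_adjacency and n_vertices * n_vertices > R_MAX_VECTOR_ELEMENTS:
            return False  # e.g. a square matrix larger than 46340 x 46340
        return True

    print(fits_r_igraph(300 * 10**6, 900 * 10**6))              # True: within limits
    print(fits_r_igraph(50000, 10000, dense_adjacency=True))    # False: 50000^2 > 2^31 - 1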
I am making a roguelike where the setting is open world on a procedurally generated planet. I want the distribution of each biome to be organic. There are 5 different biomes. Is there a way to organically distribute them without a huge complicated algorithm? I want the amount of space each biome takes up to be nearly equal.
I have worked with cellular automata before when I was making the terrain generators for each biome. There were 2 different states for each tile there. Is there an efficient way to do 5?
I'm using Python 2.5, although specific code isn't necessary. Programming theory on it is fine.
If the question is too open ended, are there any resources out there that I could look at for this kind of problem?
You can define a cellular automaton on any cell state space. Just formulate the cell update function as F:Q^n->Q where Q is your state space (here Q={0,1,2,3,4,5}) and n is the size of your neighborhood.
As a start, just write F as a majority rule; that is, with 0 being the neutral state, F(c) should return the value in 1-5 with the highest count in the neighborhood, and 0 if none is present. In case of a tie, you may pick one of the maxima at random.
As an initial state, start with a configuration of 5 relatively equidistant cells with the states 1-5 (you may place them deterministically at fixed positions that can be shifted/mirrored, or generate these points randomly).
When all cells have a value different than 0, you have your map.
Feel free to improve on the update function, for example by applying the rule with a given probability.
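Here is a rough sketch of that idea in Python; the grid size, seed positions, and helper names are arbitrary choices for illustration, and it targets a modern Python 3 interpreter rather than Python 2.5.

    import random

    WIDTH, HEIGHT = 64, 64

    def make_grid():
        """Empty grid (state 0) seeded with 5 roughly equidistant biome cells."""
        grid = [[0] * WIDTH for _ in range(HEIGHT)]
        seeds = [(HEIGHT // 4, WIDTH // 4), (HEIGHT // 4, 3 * WIDTH // 4),
                 (3 * HEIGHT // 4, WIDTH // 4), (3 * HEIGHT // 4, 3 * WIDTH // 4),
                 (HEIGHT // 2, WIDTH // 2)]
        for biome, (y, x) in enumerate(seeds, start=1):
            grid[y][x] = biome
        return grid

    def step(grid):
        """One synchronous update: each empty cell takes the most common
        non-zero state in its 8-cell neighborhood, ties broken at random."""
        new = [row[:] for row in grid]
        for y in range(HEIGHT):
            for x in range(WIDTH):
                if grid[y][x] != 0:
                    continue  # already assigned; the majority rule keeps it
                counts = {}
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dy == 0 and dx == 0:
                            continue
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < HEIGHT and 0 <= nx < WIDTH and grid[ny][nx] != 0:
                            counts[grid[ny][nx]] = counts.get(grid[ny][nx], 0) + 1
                if counts:
                    best = max(counts.values())
                    new[y][x] = random.choice(
                        [b for b, c in counts.items() if c == best])
        return new

    grid = make_grid()
    while any(0 in row for row in grid):
        grid = step(grid)

Applying the rule only with some probability, as suggested above, would give rougher, more organic borders between the biomes.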
Sometimes in code, I see the developer chooses a number like 32 for a package of data. Or in a game, the loaded terrain of a map has the size of 128*128 points.
I know it has something to do with the maximum size of datatypes, like a char having 8 bits, etc.
But why don't they just use numbers like 100*100 for a map, a list, or a Minecraft chunk?
If I have 8 bits to store a (positive) number, I can count to 2^8 = 256.
When I choose the size of a map chunk, I could choose a width of 250 instead of 256. But it seems that is not a good idea. Why?
Sometimes developers do use numbers like 250 or 100. It's not at all uncommon. (1920 appears in a lot of screen resolutions for example.)
But numbers like 8, 32, and 256 are special because they're powers of 2. For datatypes, like 8-bit integers, the number of possible elements of this type is a power of 2, namely, 2^8 = 256. The sizes of various memory boundaries, disk pages, etc. work nicely with these numbers because they're also powers of two. For example, a 16,384-byte page can hold 2048 8-byte numbers, or 256 64-byte structures, etc. It's easy for a developer to count how many items of a certain size fit in a container of another size if both sizes are powers of two, because they've got many of the numbers memorized.
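A quick illustration of that even-fit arithmetic in Python (the 16,384-byte page size is just the one mentioned above):

    page_size = 16384            # 2**14 bytes

    print(page_size // 8)        # 2048 eight-byte numbers per page
    print(page_size // 64)       # 256 sixty-four-byte structures per page
    print(page_size >> 6)        # same as dividing by 64 (2**6), using a shift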
The previous answer emphasizes that data with these sizes fits well into memory blocks, which is of course true. However, it does not really explain why the memory blocks themselves have these sizes:
Memory has to be addressed. This means that the location of a given datum has to be calculated and stored somewhere in memory, often in a CPU register. To save space and calculation cost, these addresses should be as small as possible while still allowing as much memory as possible to be addressed. On a binary computer this leads to powers of 2 as optimal memory or memory block sizes.
There is another related reason: Calculations like multiplication and division by powers of 2 can be implemented by shifting and masking bits. This is much more performant than doing general multiplications or divisions.
An example: Say you have a 16 x 16 array of bytes stored in a contiguous block of memory starting at address 0. To calculate the row and column indices from the address, generally you need to calculate row=address / num_columns and column=address % num_columns (% stands for remainder of integer division).
In this special case it is much easier for a binary computer, e.g.:
address: 01011101
mask last 4 bits: 00001101 => column index
shift right by 4: 00000101 => row index
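The same example in Python, to make the shift/mask version and the general division version comparable (the 16 x 16 layout is the one assumed above):

    address = 0b01011101          # the example address above (decimal 93)

    column = address & 0b1111     # mask the last 4 bits  -> 0b1101 (13)
    row    = address >> 4         # shift right by 4 bits -> 0b0101 (5)

    # The general-purpose arithmetic gives the same result, but needs a
    # real division when the row width is not a power of two.
    num_columns = 16
    assert column == address % num_columns
    assert row == address // num_columns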
I am using the following parameters for my simulation on Geforce GT 220 card -
number of compute units = 6
local size = 32
global size = 32*6*256 = 49152
(everything is one dimensional)
But in the Visual Profiler, I see that Number of work groups per Compute Unit = 768. Which means it is utilizing only 2 compute units. Why is that? How can I make sure all the compute units are busy? I mean, ideally, I would expect 49152/(32*6) = 256 work groups per compute unit. I am confused at this behavior.
You should not care about compute units; that is only HW-specific.
Just care about the local size and global size, and try to use the largest local size you can.
What is probably happening is that you are specifying a very small local size. Each group of local-size threads is loaded onto a compute unit, and it is not efficient to run only 32 threads there. The resulting load thrashing slows performance and probably leaves the compute units idle much of the time.
My recommendation: use a very high local size, or do NOT specify a local size at all (OpenCL will select the highest one possible).
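As a quick sanity check of the launch geometry in plain Python (no OpenCL calls; the sizes are the ones from the question, and the per-compute-unit figure assumes work groups are spread evenly, which a real scheduler is not obliged to do):

    compute_units = 6
    global_size = 49152

    for local_size in (32, 256):
        work_groups = global_size // local_size
        print("local size %3d -> %4d work groups (~%d per compute unit)"
              % (local_size, work_groups, work_groups // compute_units))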
The image below was scanned (poorly) from Computer Systems: A Programmer's Perspective. (I apologize to the publisher). This appears on page 489.
Figure 6.26: Summary of cache parameters http://theopensourceu.com/wp-content/uploads/2009/07/Figure-6.26.jpg
I'm having a terribly difficult time understanding some of these calculations. At the moment, what is troubling me is the calculation for M, which is supposed to be the "maximum number of unique memory addresses." What is 2^m supposed to mean? I think m is calculated as log2(M). This seems circular...
For the sake of this post, assume the following in the event you want to draw up an example: 512 sets, 8 blocks per set, 32 words per block, 8 bits per word
Update: All of the answers posted thus far have been helpful, but I still think I'm missing something. cwrea's answer provides the biggest bridge for my understanding. I feel like the answer is on the tip of my mental tongue; I know it is there but I can't identify it.
Why does M = 2^m but then m = log2(M)?
Perhaps the detail I'm missing is that for a 32-bit machine, we'd assume M = 2^32. Does this single fact allow me to solve for m? m = log2(2^32)? But then this gets me back to 32... I have to be missing something...
m and M are related to each other, not defined in terms of each other. They call M a derived quantity, however, since usually the processor/controller is the limiting factor in terms of the word length it uses.
On a real system they are predefined. If you have an 8-bit processor, it generally can handle 8-bit memory addresses (m = 8). Since you can represent 256 values with 8 bits, you can have a total of 256 memory addresses (M = 2^8 = 256). As you can see, we start with the little m due to the processor constraints, but you could always decide you want a memory space of size M and use that to select a processor that can handle it, based on word size = log2(M).
Now if we take your assumptions for your example,
512 sets, 8 blocks per set, 32 words per block, 8 bits per word
I have to assume this is an 8-bit processor given the 8-bit words. At that point your described cache is larger than your address space (256 words) and therefore pretty meaningless.
You might want to check out Computer Architecture Animations & Java applets. I don't recall if any of the cache ones go into the cache structure (usually they focus on behavior), but it is a resource I saved in the past to tutor students in architecture.
Feel free to further refine your question if it still doesn't make sense.
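The arithmetic behind that "larger than your address space" remark, as a quick Python check (following this answer's assumption that the 8-bit words imply 8-bit addresses):

    sets, blocks_per_set, words_per_block, address_bits = 512, 8, 32, 8

    cache_words = sets * blocks_per_set * words_per_block
    addressable_words = 2 ** address_bits

    print(cache_words)        # 131072 words of cache
    print(addressable_words)  # only 256 addressable words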
The two equations for M are just a relationship. They are two ways of saying the same thing. They do not indicate causality, though. I think the assumption made by the author is that the number of unique address bits is defined by the CPU designer at the start via requirements. Then the M can vary per implementation.
m is the width in bits of a memory address in your system, e.g. 32 for x86, 64 for x86-64. Block size on x86, for example, is 4K, so b=12. Block size more or less refers to the smallest chunk of data you can read from durable storage -- you read it into memory, work on that copy, then write it back at some later time. I believe tag bits are the upper t bits that are used to look up data cached locally very close to the CPU (not even in RAM). I'm not sure about the set lines part, although I can make plausible guesses that wouldn't be especially reliable.
Circular ... yes, but I think it's just stating that the two variables m and M must obey the equation. M would likely be a given or assumed quantity.
Example 1: If you wanted to use the formulas for a main memory size of M = 4GB (4,294,967,296 bytes), then m would be 32, since M = 2^32, i.e. m = log2(M). That is, it would take 32 bits to address the entire main memory.
Example 2: If your main memory size assumed were smaller, e.g. M = 16MB (16,777,216 bytes), then m would be 24, which is log2(16,777,216).
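For anyone who wants to recompute those two examples, a couple of lines of Python:

    import math

    for M in (4 * 1024**3, 16 * 1024**2):   # 4 GB and 16 MB, in bytes
        m = int(math.log2(M))
        print("M = %d bytes -> m = %d address bits (2**%d = %d)" % (M, m, m, 2**m))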
It seems you're confused by the math rather than the architectural stuff.
2^m ("2 to the m'th power") is 2 * 2... with m 2's. 2^1 = 2, 2^2 = 2 * 2 = 4, 2^3 = 2 * 2 * 2 = 8, and so on. Notably, if you have an m bit binary number, you can only represent 2^m different numbers. (is this obvious? If not, it might help to replace the 2's with 10's and think about decimal digits)
log2(x) ("logarithm base 2 of x") is the inverse function of 2^x. That is, log2(2^x) = x for all x. (This is a definition!)
You need log2(M) bits to represent M different numbers.
Note that if you start with M=2^m and take log2 of both sides, you get log2(M)=m. The table is just being very explicit.
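A tiny Python illustration of that inverse relationship, using m = 4 as an arbitrary example:

    import math

    m = 4
    values = list(range(2**m))       # all 4-bit patterns: 0b0000 .. 0b1111
    print(len(values))               # 16, i.e. 2**4 distinct numbers
    print(math.log2(len(values)))    # 4.0, i.e. log2(2**m) == m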