RocksDB total memory usage

I'm looking at an example RocksDB option configuration:
import rocksdb  # python-rocksdb

opts = rocksdb.Options()
opts.create_if_missing = True
opts.max_open_files = 300000
opts.write_buffer_size = 67108864        # 64 MB per memtable
opts.max_write_buffer_number = 3
opts.target_file_size_base = 67108864
opts.table_factory = rocksdb.BlockBasedTableFactory(
    filter_policy=rocksdb.BloomFilterPolicy(10),
    block_cache=rocksdb.LRUCache(2 * (1024 ** 3)),               # 2 GB uncompressed block cache
    block_cache_compressed=rocksdb.LRUCache(500 * (1024 ** 2)))  # 500 MB compressed block cache
https://python-rocksdb.readthedocs.io/en/latest/tutorial/
It says:
It assigns a cache of 2.5 G, uses a bloom filter for faster lookups and
keeps more data (64 MB) in memory before writing a .sst file.
Does this mean it uses a maximum memory of 2.5GB or 64MB?
And why is the cache 2.5GB? (2 * (1024 ** 3)) is 2 billion, not 2.5 billion?

Does this mean it uses a maximum memory of 2.5GB or 64MB?
No. It means the block cache will cost 2.5 GB, and the in-memory write buffers (memtables) will cost up to 3 x 64 MB = 192 MB, since there are 3 buffers (opts.max_write_buffer_number), each of size 64 MB (opts.write_buffer_size). Besides that, RocksDB still needs some additional memory for index and bloom filter blocks. Check this for details.
And why is the cache 2.5GB? (2 * (1024 ** 3)) is 2 billion, not 2.5 billion?
Because there is a 2 GB uncompressed block cache (block_cache) plus a 0.5 GB compressed block cache (block_cache_compressed).
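To make the arithmetic concrete, here is a rough back-of-the-envelope sketch of the steady-state budget implied by those options (plain Python arithmetic only; the variable names are just for illustration):

GiB = 1024 ** 3
MiB = 1024 ** 2

block_cache             = 2 * GiB    # uncompressed block cache
block_cache_compressed  = 500 * MiB  # compressed block cache
write_buffer_size       = 64 * MiB   # one memtable
max_write_buffer_number = 3          # up to 3 memtables held in memory

total = (block_cache + block_cache_compressed
         + write_buffer_size * max_write_buffer_number)
print(total / GiB)  # ~2.68 GiB, before the index and bloom filter blocks mentioned above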

Error: cannot allocate vector of size 57.8 Gb [duplicate]

This question already has answers here: R memory management / cannot allocate vector of size n Mb
I'd like to run a model on RStudio Server, but I'm getting this error.
Error: cannot allocate vector of size 57.8 Gb
This is what my data looks like and it has 10,000 rows.
latitude longitude close_date close_price
1 1.501986 86.35068 2014-08-16 22:25:31.925431 1302246.3
2 36.367095 -98.66428 2014-08-05 06:34:00.165876 147504.5
3 36.599284 -97.92470 2014-08-12 23:48:00.887510 137400.6
4 67.994791 64.68859 2014-08-17 05:27:01.404296 -14112.0
This is my model.
library(caret)
library(dplyr)  # for the %>% pipe

training.samples <- data$close_price %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- data[training.samples, ]
test.data <- data[-training.samples, ]
model <- train(
  close_price ~ ., data = train.data, method = "knn",
  trControl = trainControl("cv", number = 1),
  preProcess = c("center", "scale"),
  tuneLength = 1
)
My EC2 instance has more than 57 GB available. This is the memory.
total used free shared buffers cached
Mem: 65951628 830424 65121204 64 23908 215484
-/+ buffers/cache: 591032 65360596
Swap: 0 0 0
And it has enough storage space, too. This is the hard drive space.
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 32965196 64 32965132 1% /dev
tmpfs 32975812 0 32975812 0% /dev/shm
/dev/xvda1 103079180 6135168 96843764 6% /
And these are details on the machine.
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03
Because there's always a temporary value "*tmp*" as well as a final value, you need about 2 to 3 times the projected object size to do anything useful with it. (The link talks about subset assignment, but this also applies to any use of the <- function.) Furthermore, to assign a new value to an object name there must be contiguous memory available, so even the supposedly "available" memory may not be contiguous. You either need to buy more memory or reduce the size of your model. Calculations are all done in RAM or a RAM equivalent; there's not usually any disk-swapping unless your OS provides virtual memory.
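To put numbers on that rule of thumb, a small sketch using the figures from the question (plain arithmetic, nothing specific to R's internals):

obj_gb = 57.8                     # size of the vector R tried to allocate
ram_gb = 65951628 / 1024 ** 2     # total RAM from the free output, ~62.9 GB
print(2 * obj_gb, 3 * obj_gb)     # ~116 to ~173 GB of headroom suggested by the 2-3x rule
print(ram_gb)                     # only ~62.9 GB on the instance

So even though the single 57.8 Gb allocation is close to the machine's total RAM, working with that object comfortably would need roughly two to three times as much.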

How is the object size in R calculated?

> print(object.size(runif(1e6)),unit="Mb")
7.6 Mb
This gives me 7.6 Mb for a vector with 1 million elements. But why? Is each element 32 bits or 64 bits? I cannot make these numbers add up.
They're 64-bit (8-byte) floating point values. One megabyte (Mb) is 2^20 bytes (not 10^6 - see below) ... so ...
8*1e6/(2^20)
[1] 7.629395
Lots of potential for confusion about what Mb means:
according to Wikipedia "MB" is the recommended abbreviation for "megabyte", but R uses "Mb"
there is plenty of confusion about whether "mega" means 10^6 or 2^20 in this context.
As usual, this is clearly documented, deep in the details of ?object.size ...
As illustrated by the tables below, the legacy and IEC standards use binary units (multiples of 1024), whereas the SI standard uses decimal units (multiples of 1000) ...
object size   legacy    IEC
1             1 bytes   1 B
1024          1 Kb      1 KiB
1024^2        1 Mb      1 MiB
Google's conversion appears to use SI units (1 MB = 10^6 bytes) instead.
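The same arithmetic written out as a tiny Python sketch (the numbers come straight from the answer above):

n = 1_000_000                  # vector length
bytes_per_double = 8           # 64-bit floating point
total_bytes = n * bytes_per_double

print(total_bytes / 2 ** 20)   # ~7.63 -> the "7.6 Mb" that object.size() reports (binary Mb)
print(total_bytes / 10 ** 6)   # 8.0   -> MB in SI units, as Google reports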

Unity conversions in transmission delay

I'm currently learning about transmission delay and propagation. I'm really having a tough time with the conversions. I understand how it all works, but I can't get through the converting. For example:
8000 bits / 5 Mbps (megabits per second). I have no idea how to do this conversion; I've tried looking online, but no one explains how the conversion happens. I'm supposed to get 1.6 ms, but I cannot see how the heck that happens. I tried doing it this way, 8000 b / 5x10^6 b/s, but that gives me 1600 s.
(because that would not fit in a comment):
8000 bits = 8000 / 1000 = 8 kbit, or 8000 / 1000 / 1000 = 0.008 Mbit.
(or 8000 / 1024 = 7.8 Kibit, or 8000 / 1024 / 1024 = 0.0076 Mibit,
see here: https://en.wikipedia.org/wiki/Data_rate_units)
Say you have a throughput of 5 Mbit/s (megabits per second); to transmit your 8000 bits, that's:
(0.008 Mbit) / (5 Mbit/s) = 0.0016 s = 1.6 ms
That is, unit-wise:
bit / (bit/s)
The bit units cancel out, and dividing by "per second" flips it into multiplying by seconds, so the result is not "something per second" but simply seconds.
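The same conversion as a short Python sketch (just the arithmetic from the answer):

bits = 8000
rate_bits_per_s = 5 * 10 ** 6      # 5 Mbit/s

delay_s = bits / rate_bits_per_s   # bit / (bit/s) = s
print(delay_s)                     # 0.0016 s
print(delay_s * 1000)              # transmission delay in milliseconds: 1.6 ms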

convert 56 kbps to monthly usage in GB

From my internet connection (SIM card) of 56kbps (unlimited data) what would be total gigabytes of data I can consume provided I was using it continuously?
My basic math:
30 days = 2592000 seconds
56 * 2592000 = 145152000 kb = 141750 MB = 141 GB
Does this calculation make sense?
Your basic maths is good; unfortunately, you were tricked by the notation, which is very confusing in this domain.
1) Lower-case b stands for a bit, while capital B is a byte, which is made of 8 bits. So when you get 56 kb/s, you actually get 56/8 = 7 kB/s.
This gives you 18144000 kB per month.
2) Now comes the second problem. The definition of a kB, an MB or a GB is not uniform. Normally you would expect them to be defined following powers of ten (as in any other science), in which case your 18144000 kB per month would convert into 18144 MB per month, or 18.1 GB per month.
However, for historical reasons, an MB is sometimes defined as 1024 kB and a GB as 1024 MB. In this case you would get 17719 MB per month, or 17.3 GB per month.
Which convention you should use depends on what you actually want to do with it. But such a small difference is probably irrelevant to you compared to potential fluctuations in the actual transfer rate of your connection.
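Putting both conventions side by side, a quick Python sketch of the arithmetic above:

seconds_per_month = 30 * 24 * 3600       # 2592000 s
kbit_per_month = 56 * seconds_per_month  # 145152000 kbit
kB_per_month = kbit_per_month / 8        # 18144000 kB

print(kB_per_month / 1000 / 1000)  # ~18.1 GB per month (powers of 1000)
print(kB_per_month / 1024 / 1024)  # ~17.3 GB per month (powers of 1024)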

Profiler shows OpenCL not using all available registers

Here is the copy of occupancy analysis of my kernel from the NVIDIA Compute Visual Profiler:
Kernel details : Grid size: 300 x 1, Block size: 224 x 1 x 1
Register Ratio = 0.75 ( 24576 / 32768 ) [48 registers per thread]
Shared Memory Ratio = 0 ( 0 / 49152 ) [0 bytes per Block]
Active Blocks per SM = 2 : 8
Active threads per SM = 448 : 1536
Occupancy = 0.291667 ( 14 / 48 )
Achieved occupancy = 0.291667 (on 14 SMs)
Occupancy limiting factor = Registers
Warning: Grid Size (300) is not a multiple of available SMs (14).
I am new to OpenCL and I did a lot of optimisation to bring down the number of registers used so that 3 concurrent blocks can be launched on an SM. However, the profiler shows that only 2 blocks can run concurrently and that the limiting factor is registers. But my kernel only uses 224 x 48 = 10752 registers per block and should therefore be capable of running 3 blocks (i.e. 224 x 48 x 3 = 32256 registers out of the 32768 available). The problem still exists when I reduce the number of threads per block to 208, which means it should only use 208 x 48 x 3 = 29952 of the 32768 registers for 3 blocks...
At first I thought it was because of local memory, but my calculation of local memory shows it should be able to launch 3 blocks per SM. And I don't know why the profiler shows a Shared Memory Ratio of 0 even though my kernel uses local memory.
Thanks for your help.
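For reference, the register arithmetic from the question written out as a small Python sketch (it only reproduces the numbers quoted above and does not model how the hardware actually allocates registers):

regs_per_thread = 48
regs_per_sm = 32768

for threads_per_block in (224, 208):
    regs_per_block = threads_per_block * regs_per_thread
    blocks_that_fit = regs_per_sm // regs_per_block
    print(threads_per_block, regs_per_block, blocks_that_fit)
    # 224 threads -> 10752 registers/block -> 3 blocks fit by this count
    # 208 threads ->  9984 registers/block -> 3 blocks fit by this count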
