In RocksDB, is there any way to know up to what size a file can grow in a level? - rocksdb

I am working with RocksDB, but I am unable to find an option that tells me the maximum size a file can reach inside a level. And once a file reaches that maximum size, how do files get split in RocksDB?

The options you are looking for are target_file_size_base and target_file_size_multiplier.
target_file_size_base - configures the size of SST files in level-1.
target_file_size_multiplier - configures how much larger SST files are in each further level.
For example, if target_file_size_base is set to 2MB and target_file_size_multiplier is 10:
Level-1 SST files will be 2MB,
Level-2 SST files will be 20MB,
Level-3 SST files will be 200MB, and so on.
You can also configure the total size of each level, and hence the number of such files per level, using max_bytes_for_level_base and max_bytes_for_level_multiplier.
For example, if max_bytes_for_level_base = 200MB and target_file_size_base = 2MB, then Level-1 will contain 100 files of 2MB each.
You can check these options in the options.h and advanced_options.h header files.
"And if once it reaches that maximum size, how do files get split in RocksDB?"
During compaction/flush, files are created at the configured size. If a level ends up holding more data than its configured capacity allows, compaction is triggered and files are pushed to higher levels.
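A minimal sketch of setting these options through the C++ API (the path and the exact sizes are illustrative, not from the question):

#include <cassert>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Target ~2MB SST files in level-1; each further level multiplies
  // the target by 10 (level-2 ~20MB, level-3 ~200MB, ...).
  options.target_file_size_base = 2 * 1024 * 1024;
  options.target_file_size_multiplier = 10;

  // Cap level-1 at ~200MB in total (about 100 files of 2MB each);
  // each further level may hold 10x the bytes of the previous one.
  options.max_bytes_for_level_base = 200 * 1024 * 1024;
  options.max_bytes_for_level_multiplier = 10;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());
  delete db;
  return 0;
}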

Related

Rascal MPL get line count of file without loading its contents

Is there a more efficient way than
int fileSize = size(readFileLines(fileLoc));
to get the total number of lines in a file? I presume this code has to read the entire file first, which could become costly for huge files.
I have looked into IO and Loc to see whether some of this info might be stored along with the file.
This is the way, unless you'd like to call wc -l via util::ShellExec 😁
Apart from streaming the file to save some memory, counting lines is always linear in the size of the file, so you won't win much time.
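For illustration only (the question is about Rascal, so this is just the same idea sketched in C++): streaming keeps memory use constant, but the time stays linear in the file size.

#include <fstream>
#include <iostream>

int main(int argc, char** argv) {
  if (argc < 2) { std::cerr << "usage: lc <file>\n"; return 1; }
  std::ifstream in(argv[1], std::ios::binary);

  // Count '\n' bytes chunk by chunk instead of loading the whole file.
  char buf[1 << 16];
  long long lines = 0;
  while (in.read(buf, sizeof buf) || in.gcount() > 0) {
    std::streamsize n = in.gcount();
    for (std::streamsize i = 0; i < n; ++i)
      if (buf[i] == '\n') ++lines;
  }
  std::cout << lines << '\n';
  return 0;
}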

How can I find the first clusters/blocks of a file?

I have a FAT16 drive that contains the following info:
Bytes per sector: 512 bytes (0x200)
Sectors per cluster: 64 (0x40)
Reserved sectors: 6 (0x06)
Number of FATs: 2 (0x02)
Number of root entries: 512 (0x0200)
Total number of sectors: 3805043 (0x3a0f73)
Sectors per file allocation table: 233 (0xE9)
Root directory is located at sector 472 (0x1d8)
I'm looking for a file with the following details:
File name: LOREMI~1
File extension: TXT
File size: 3284 bytes (0x0cd4)
First cluster: 660 (0x294)
However, I already know that the start of the file's data is located at sector 42616. My problem is: what equation should I use that would produce 42616?
I have trouble figuring this out since there is barely any information about this other than a tutorial made by Tavi Systems, and the part covering this is very hard to follow.
Actually, the FAT filesystem is fairly well documented. The official FAT documentation by Microsoft can be found under the filename fatgen103.
The directory entry LOREMI~1.TXT can be found in the root directory and is preceded by its long file name entries (these are stored in reverse order: "xt", then "lorem ipsum.t", together giving "lorem ipsum.txt"). The directory entry format is documented in the «FAT Directory Structure» chapter; in the case of FAT16 you are interested in the two bytes at offset 26 (DIR_FstClusLo), which hold the first cluster number: 0x0294, little endian, i.e. 660 decimal.
Based on the BPB header information you provided we can calculate the data sector like this:
data_sector = (cluster - 2) * sectors_per_cluster +
              reserved_sectors + (number_of_fats * fat_size) +
              root_dir_sectors
Why cluster - 2? Because cluster numbering starts at 2: the first two FAT entries are reserved, so cluster 2 is the first data cluster, see chapter «FAT Data Structure» in fatgen103.doc.
In order for us to solve this, we still need to determine how many sectors the root directory occupies. For FAT12/16 this can be determined like this:
root_dir_sectors = ((root_entries * directory_entry_size) +
                    (bytes_per_sector - 1)) // bytes_per_sector
The directory entry size is always 32 bytes as per the specification (see chapter «FAT Directory Structure» in fatgen103.doc); every other value is known by now:
root_dir_sectors = ((512*32)+(512-1)) // 512 → 32
data_sector = (660-2)*64+(6+(2*233)+32) → 42616
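As a quick cross-check, here is the same arithmetic in C++, with every constant taken from the BPB values in the question:

#include <cstdio>

int main() {
  // BPB values from the question
  const unsigned bytes_per_sector    = 512;
  const unsigned sectors_per_cluster = 64;
  const unsigned reserved_sectors    = 6;
  const unsigned number_of_fats      = 2;
  const unsigned root_entries        = 512;
  const unsigned fat_size            = 233;  // sectors per FAT
  const unsigned cluster             = 660;  // DIR_FstClusLo of LOREMI~1.TXT

  // Sectors occupied by the fixed-size FAT16 root directory
  const unsigned root_dir_sectors =
      (root_entries * 32 + bytes_per_sector - 1) / bytes_per_sector;

  // First sector of the file's data
  const unsigned data_sector =
      (cluster - 2) * sectors_per_cluster
      + reserved_sectors + number_of_fats * fat_size + root_dir_sectors;

  std::printf("root_dir_sectors = %u\n", root_dir_sectors);  // prints 32
  std::printf("data_sector = %u\n", data_sector);            // prints 42616
  return 0;
}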

Using Cat to merge Files, ignoring the last Byte of each file

I have written a rudimentary HTTP downloader that downloads parts of a file and stores them in temporary, partial files in a directory.
I can use
cat * > outputfilename
to concatenate the partial files together, as their order is given by the individual filenames.
However.
The range of each file is something like:
File 1: 0 - 1000
File 2: 1000 - 2000
File 3: 2000 - 3000
for a file that is 3000 bytes in size, i.e. the last byte of each part overlaps the first byte of the next.
The cat command duplicates those overlapping bytes in the output file. We can see this concretely with images that render incorrectly,
e.g. (just using an image from imgur):
https://i.imgur.com/XEvBCtp.jpg
renders with only 1/(the number of partial files) of the image correct.
To note: The original image is 250 KB.
The concatenated image is 287 KB.
I will be implementing this in C99 on Unix, as a method that calls exec.
I'm not sure where to upload the partial files to assist with this Stack Overflow question.
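A sketch of the merge the title describes: write every part in order, dropping the trailing byte of each part except the last. This is C++ rather than the asker's C99, and the part file names are hypothetical:

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

int main() {
  // Hypothetical part names in download order; a real tool would
  // collect and sort the directory listing, as cat * does.
  std::vector<std::string> parts = {"part000", "part001", "part002"};
  std::ofstream out("outputfilename", std::ios::binary);

  for (std::size_t i = 0; i < parts.size(); ++i) {
    std::ifstream in(parts[i], std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    // Every part except the last repeats its final byte as the
    // first byte of the next part, so drop that byte here.
    if (i + 1 < parts.size() && !data.empty())
      data.pop_back();
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
  }
  return 0;
}

With GNU coreutils, head -c -1 prints a file without its final byte, so a shell-only variant of the same idea is possible before handing the parts to cat.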

Why does the Flex compiler generate varying file sizes on successive compilations of the exact same source code?

I'm building a SWF using the command line compiler mxmlc.exe. The compiler writes the output file size as part of its stdout. If I run the compiler multiple times in succession without changing the actual source code, I see the file size bounce up and down a few bytes at a time.
C:\>mxmlc.exe Gallery.as
C:\Gallery.swf (28443 bytes)
C:\>mxmlc.exe Gallery.as
C:\Gallery.swf (28442 bytes)
C:\>mxmlc.exe Gallery.as
C:\Gallery.swf (28440 bytes)
C:\>mxmlc.exe Gallery.as
C:\Gallery.swf (28442 bytes)
I can't think why this would possibly be the case. Even if I delete the output file each time, the re-generated file size still varies in this way.
Any ideas why?
The Flex compiler includes some information in your SWF that changes from build to build, such as the date and time it was built. The SWF is then compressed. Sometimes the compression works a bit better than at other times on the varying metadata, hence the minor changes in file size.
http://livedocs.adobe.com/flex/3/html/help.html?content=compilers_16.html#145380

What do the numbers in rsync's output mean?

When I run rsync with the --progress flag, I get information about the transfers as follows.
path/to/file
16 100% 0.01kB/s 0:00:01 (xfer#10857, to-check=427700/441502)
What do the numbers in the second row mean? I know what some of them are, but what do the others mean (marked with ??? below)?
16 - ???
100% - amount of this file's transfer completed
0.01kB/s - speed of the current file transfer
0:00:01 - time elapsed in the current file transfer
10857 - count of files transferred
427700 - ???
441502 - ???
When the file transfer finishes, rsync replaces the progress line with a summary line that looks like this:
1238099 100% 146.38kB/s 0:00:08 (xfer#5, to-check=169/396)
In this example, the file was 1238099 bytes long in total, the average rate of transfer for the whole file was 146.38 kilobytes per second over the 8 seconds that it took to complete, it was the 5th transfer of a regular file during the current rsync session, and there are 169 more files for the receiver to check (to see if they are up-to-date or not) remaining out of the 396 total files in the file-list.
From http://samba.anu.edu.au/ftp/rsync/rsync.html, under the --progress switch.
path/to/file
16 100% 0.01kB/s 0:00:01 (xfer#10857, to-check=427700/441502)
The 16 is the number of bytes of this file transferred so far. The 100% is the percentage of the file transferred: 100% in this case. For very short files the kB/s number often comes out a bit weird: small measuring errors cause big differences in the calculated overall speed. Then there is the total time. Then the transfer number: this was the 10857th file actually transferred. Next come the number of files left to check and the total: of the 441502 files in the file list, 427700 still have to be checked, so 13802 have been checked so far; based on modification times rsync decided that most of those did not need to be transferred. Modern rsync implementations build the file list that counts towards the "total" on the fly, so the total can keep growing during the run: more entries are added to the list when the number of unchecked files drops below 1000.
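Purely as an illustration of the field layout, a small C++ sketch that pulls the numbers out of such a summary line (the sscanf format string is an assumption based on the samples above):

#include <cstdio>

int main() {
  // A summary line as shown above
  const char* line = "1238099 100% 146.38kB/s 0:00:08 (xfer#5, to-check=169/396)";

  long long bytes = 0;           // bytes of this file transferred
  int percent = 0;               // percentage of this file completed
  double rate = 0;               // transfer rate (kB/s in this sample)
  int h = 0, m = 0, s = 0;       // elapsed time for this file
  long long xfer = 0;            // count of files transferred so far
  long long left = 0, total = 0; // files still to check / files in the list

  if (std::sscanf(line,
                  "%lld %d%% %lfkB/s %d:%d:%d (xfer#%lld, to-check=%lld/%lld)",
                  &bytes, &percent, &rate, &h, &m, &s, &xfer, &left, &total) == 9) {
    std::printf("checked so far: %lld of %lld\n", total - left, total);
  }
  return 0;
}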
