I have a FAT16 drive that contains the following info:
Bytes per sector: 512 bytes (0x200)
Sectors per cluster: 64 (0x40)
Reserved sectors: 6 (0x06)
Number of FATs: 2 (0x02)
Number of root entries: 512 (0x0200)
Total number of sectors: 3805043 (0x3a0f73)
Sectors per file allocation table: 233 (0xE9)
Root directory is located at sector 472 (0x1d8)
I'm looking for a file with the following details:
File name: LOREMI~1
File extension: TXT
File size: 3284 bytes (0x0cd4)
First cluster: 660 (0x294)
However, I do know that the file's first cluster starts at sector 42616. My problem is: what equation should I use to arrive at 42616?
I have trouble figuring this out since there is barely any information about this, other than a tutorial made by Tavi Systems, and the part covering this is very hard to follow.
Actually, the FAT filesystem is fairly well documented. Microsoft's official FAT specification can be found under the filename fatgen103.doc.
The directory entry LOREMI~1.TXT can be found in the root directory and is preceded by the long file name entries ("xt", "lorem ipsum.t" → "lorem ipsum.txt"). The directory entry is documented in the «FAT Directory Structure» chapter; in the case of FAT16 you are interested in the two bytes at offsets 26 and 27 (DIR_FstClusLo) to get the cluster address, which is (little endian!) 0x0294, or 660₁₀.
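For illustration, here is a small C sketch (mine, not taken from the specification) that pulls DIR_FstClusLo out of a raw 32-byte directory entry:
#include <stdint.h>
#include <stdio.h>
/* Hedged sketch: extract the starting cluster from a raw 32-byte FAT16
   directory entry; DIR_FstClusLo sits at offset 26, little endian. */
static uint16_t first_cluster(const uint8_t entry[32])
{
    return (uint16_t)(entry[26] | (entry[27] << 8));
}
int main(void)
{
    uint8_t entry[32] = {0};
    entry[26] = 0x94;   /* low byte  of 0x0294 */
    entry[27] = 0x02;   /* high byte of 0x0294 */
    printf("first cluster: %u\n", first_cluster(entry));   /* prints 660 */
    return 0;
}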
Based on the BPB header information you provided, we can calculate the data sector like this:
data_sector = (cluster - 2) * sectors_per_cluster +
              (reserved_sectors + (number_of_fats * fat_size) +
               root_dir_sectors)
Why cluster - 2? Because the first two FAT entries are reserved (they hold the media descriptor and end-of-chain/dirty flags), so cluster numbering in the data area starts at 2; see chapter «FAT Data Structure» in fatgen103.doc.
To solve this, we still need to determine how many sectors the root directory occupies. For FAT12/16 this can be determined like this:
root_dir_sectors = ((root_entries * directory_entry_size) +
                    (bytes_per_sector - 1)) // bytes_per_sector
The directory entry size is always 32 bytes as per the specification (see chapter «FAT Directory Structure» in fatgen103.doc); every other value is known by now:
root_dir_sectors = ((512 * 32) + (512 - 1)) // 512 → 32
data_sector = (660 - 2) * 64 + (6 + (2 * 233) + 32) → 42616
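Putting the same arithmetic into a small standalone C sketch (the BPB values are hard-coded from your question; the variable names are mine, not taken from fatgen103.doc):
#include <stdio.h>
int main(void)
{
    unsigned bytes_per_sector    = 512;
    unsigned sectors_per_cluster = 64;
    unsigned reserved_sectors    = 6;
    unsigned number_of_fats      = 2;
    unsigned root_entries        = 512;
    unsigned fat_size            = 233;   /* sectors per FAT */
    unsigned cluster             = 660;   /* DIR_FstClusLo of LOREMI~1.TXT */
    /* sectors occupied by the root directory (32 bytes per entry) */
    unsigned root_dir_sectors =
        (root_entries * 32 + bytes_per_sector - 1) / bytes_per_sector;
    unsigned data_sector = (cluster - 2) * sectors_per_cluster
                         + reserved_sectors
                         + number_of_fats * fat_size
                         + root_dir_sectors;
    printf("root_dir_sectors = %u\n", root_dir_sectors);   /* 32    */
    printf("data_sector      = %u\n", data_sector);        /* 42616 */
    return 0;
}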
I have been obtaining .zip archives of genome annotation from NCBI (mainly gff files). In order to save disk space I prefer not to unzip the archive, but to read these files directly into R using unz(). However, it seems that unz() is unable to extract files from the end of 'large' zip files:
ncbi.zip <- "file_location/name.zip"
files <- unzip(ncbi.zip, list=TRUE)
gff.files <- files$Name[ grep("gff$", files$Name) ]
## this works
gff.128 <- readLines( unz(ncbi.zip, gff.files[128]) )
## this gives an empty data structure (read.table() stops
## with an error saying no lines or similar)
gff.129 <- readLines( unz(ncbi.zip, gff.files[129]) )
## there are 31 more gff files after the 129th one.
## no lines are read from any of these.
The zip file itself seems to be fine; I can unzip the specific files using unzip on the command line and unzip -t does not report any errors.
I've tried this with R versions 3.5 (openSUSE Leap 15.1), 3.6, and 4.2 (CentOS 7) and with more than one zip file, and get exactly the same result.
I attached strace to R whilst reading in the 128th and 129th files. In both cases I get a lot of lseek calls towards the end of the file (offset 2845892608, larger than 2^31) to start with. This is where I assume the zip central directory can be found. For the 128th file (the one that can be read), I eventually get an lseek to an offset slightly below 2^31, followed by a set of lseeks and reads (that extend beyond 2^31).
For the 129th file, I get the same reads towards the end of the file, but then rather than finding a position within the file I get:
lseek(3, 2845933568, SEEK_SET) = 2845933568
lseek(3, 4294963200, SEEK_SET) = 4294963200
read(3, "", 4096) = 0
lseek(3, 4095, SEEK_CUR) = 4294967295
read(3, "", 4096) = 0
Which is a bit weird since the file itself is only about 2.8 GB. 4294967295 is, of course, 2^32 - 1.
To me this feels like an integer overflow bug, and I am considering posting a bug report. But I am wondering if anyone has seen something similar before, or if I am doing something stupid.
Having done what I should have started with (reading the zip64 format specification), it's actually clear that this is not an integer overflow error.
Zip files contain a central directory at the end of the archive; this contains amongst other things the names of the compressed files and the offset of the compressed data in the zip archive. The offset (and file size fields) are only given 4 bytes each in the standard directory field; when the offset is larger than this it should instead be given in the extra fields section and the value in the standard field should be set to 0xFFFFFFFF. Since this is the offset that gets used when reading the file it seems clear that the problem lies in the parsing of the extra field.
I had a look at the source code for R 4.2.1 and it seems that the problem is due to the way the offset specified in the standard offset field is tested:
if(file_info.uncompressed_size == (ZPOS64_T)(unsigned long)-1)
Changing this to == 0xFFFFFFFF seems to fix the problem.
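To see why the original comparison misbehaves on a 64-bit Linux build (where unsigned long is 64 bits wide), here is a small standalone C sketch; it is not R's actual source, just a demonstration of the check:
#include <stdio.h>
#include <stdint.h>
typedef uint64_t ZPOS64_T;   /* the 64-bit type minizip uses for zip64 values */
int main(void)
{
    /* The central-directory size/offset fields are 32 bits wide; the zip64
       sentinel 0xFFFFFFFF ends up zero-extended into a 64-bit variable. */
    ZPOS64_T uncompressed_size = 0xFFFFFFFFu;
    /* On LP64 systems (unsigned long)-1 is 0xFFFFFFFFFFFFFFFF, so this
       comparison is never true and the zip64 extra field is never parsed. */
    printf("old check: %d\n", uncompressed_size == (ZPOS64_T)(unsigned long)-1);
    /* Comparing against the 32-bit sentinel behaves as intended. */
    printf("new check: %d\n", uncompressed_size == 0xFFFFFFFFu);
    return 0;
}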
I've submitted a bug report to R. Hopefully changing the check will not have any unintended consequences and the issue will be fixed.
Still, I'm curious as to whether anyone else has come across the same issue. Seems a bit unlikely that my experience is unique.
I have written a rudimentary HTTP downloader that downloads parts of a file and stores them in temporary, partial files in a directory.
I can use
cat * > outputfilename
to concatenate the partial files together, as their order is given by the individual filenames.
However.
The range of each file is something like:
File 1: 0 - 1000
File 2: 1000 - 2000
File 3: 2000 - 3000
for a file that is 3000 bytes in size, i.e. the last byte of each part is the same as the first byte of the next part.
The cat command therefore duplicates the overlapping bytes in the output file.
Specifically, we can see this when images render incorrectly, e.g. (just using an image from imgur):
https://i.imgur.com/XEvBCtp.jpg
renders only about 1/(number of partial files) of the image correctly.
To note: The original image is 250 KB.
The concatenated image is 287 KB.
I will be implementing this in C99 on Unix, as a method that calls exec.
I'm not sure where to upload the partial files to assist w/ stackoverflow.
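A minimal C99 sketch of the stitching step I have in mind (assuming the part filenames are passed in the right order, and dropping the 1-byte overlap at the start of every part after the first):
/* stitch.c - hedged sketch, not a finished solution */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s output part1 part2 ...\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *out = fopen(argv[1], "wb");
    if (!out) { perror(argv[1]); return EXIT_FAILURE; }
    for (int i = 2; i < argc; i++) {
        FILE *in = fopen(argv[i], "rb");
        if (!in) { perror(argv[i]); return EXIT_FAILURE; }
        if (i > 2)
            (void)fgetc(in);   /* skip the byte that overlaps the previous part */
        char buf[8192];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
        fclose(in);
    }
    fclose(out);
    return EXIT_SUCCESS;
}
The same effect could be had from the shell by cat-ing the first part and piping every later part through tail -c +2, which skips its first byte.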
I am working with RocksDB, but I am unable to find an option that tells me the maximum size limit of a file inside a level. And once a file reaches that maximum size, how do files get split in RocksDB?
The options you are looking for are target_file_size_base and target_file_size_multiplier.
target_file_size_base - configures the size of SST files in level-1.
target_file_size_multiplier - configures the size of SST files in further levels.
For example: if target_file_size_base is set to 2MB and target_file_size_multiplier is 10,
Level-1 SST files will be 2MB,
Level-2 SST files will be 20MB,
Level-3 SST files will be 200MB, and so on.
You can also configure the number of such files in each level using max_bytes_for_level_base and max_bytes_for_level_multiplier.
For example: if max_bytes_for_level_base = 200MB and target_file_size_base = 2MB, then Level-1 will contain 100 files of 2MB each.
You can check for these options in options.h and advanced_options.h files.
And once a file reaches that maximum size, how do files get split in RocksDB?
During compaction/flush, the files are created with the configured size. If there are more files than the configured number, compaction gets triggered and the files are pushed to higher levels.
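A minimal sketch of setting these options through RocksDB's C API (rocksdb/c.h); the same fields exist on the C++ rocksdb::Options struct, and the values simply mirror the example above:
#include <stdio.h>
#include <stdlib.h>
#include <rocksdb/c.h>
int main(void)
{
    rocksdb_options_t *opts = rocksdb_options_create();
    rocksdb_options_set_create_if_missing(opts, 1);
    /* level-1 SST files target 2MB, each further level 10x larger */
    rocksdb_options_set_target_file_size_base(opts, 2 * 1024 * 1024);
    rocksdb_options_set_target_file_size_multiplier(opts, 10);
    /* level-1 holds roughly 200MB in total, i.e. about 100 files of 2MB */
    rocksdb_options_set_max_bytes_for_level_base(opts, 200 * 1024 * 1024);
    rocksdb_options_set_max_bytes_for_level_multiplier(opts, 10.0);
    char *err = NULL;
    rocksdb_t *db = rocksdb_open(opts, "/tmp/example_db", &err);
    if (err != NULL) {
        fprintf(stderr, "open failed: %s\n", err);
        free(err);
        rocksdb_options_destroy(opts);
        return 1;
    }
    rocksdb_close(db);
    rocksdb_options_destroy(opts);
    return 0;
}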
I am trying to open a file in R, which is binary and written in Fortran. The file is called GlobalLakeDepth.dat and is available at: http://www.flake.igb-berlin.de/gldbv2.tar.gz
The instructions specify that to open GlobalLakeDepth.dat (in Fortran), one would need to do the following:
An example of opening the binary file in FORTRAN90:
-- open(1, file = 'GlobalLakeDepth.dat', form='unformatted', access='direct', recl=2)
An example of reading the binary file in FORTRAN90:
-- read(1,rec=n) LakeDepth
-- where: n - record number, INTEGER(8);
LakeDepth - mean lake depth in decimeters, INTEGER(2).
My question is: Given these instructions in Fortran, how can I open this file in R? That is, is there an 'R way' of doing this?
I've been following the instructions at http://www.ats.ucla.edu/stat/r/faq/read_binary.htm, but am still not any closer to getting anything from the data file. All I need is the information provided on the measured lake bathymetry for 36 large lakes.
You can use readBin to read a binary file. For this file, I think the correct command is
lk <- readBin("GlobalLakeDepth.dat", n = 43200 * 21600, what = "integer", endian = "little", size = 2)
This makes a very long vector that could be made into a 43200 * 21600 matrix.
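For comparison, a hedged C sketch of what the Fortran direct-access read is doing at the byte level (record n is simply the n-th little-endian 2-byte integer in the file; the record number below is an arbitrary example):
#include <stdio.h>
#include <stdint.h>
int main(void)
{
    FILE *f = fopen("GlobalLakeDepth.dat", "rb");
    if (!f) { perror("GlobalLakeDepth.dat"); return 1; }
    long n = 123456789;    /* 1-based record number, as in rec=n (example value) */
    int16_t depth;         /* mean lake depth in decimeters */
    fseek(f, (n - 1) * 2L, SEEK_SET);   /* record n starts at byte (n-1)*2 */
    fread(&depth, sizeof depth, 1, f);  /* assumes a little-endian host */
    printf("record %ld: depth = %d dm\n", n, (int)depth);
    fclose(f);
    return 0;
}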
I understand that a directory is just a file in unix that contains the inode numbers and names of the files within. How do I take a look at this? I can't use cat or less on a directory, and opening it in vi just shows me a listing of the files...no inode numbers.
Since this is a programming question (it is a programming question, isn't it?), you should check out the opendir, readdir and closedir functions. These are part of the Single UNIX Spec.
#include <sys/types.h>
#include <dirent.h>
DIR *opendir(const char *dirname);
struct dirent *readdir(DIR *dirp);
int closedir(DIR *dirp);
The dirent.h file should have the structure you need, containing at least:
char d_name[] name of entry
ino_t d_ino file serial number
See the readdir manpage; it contains links to the others.
Keep in mind that the amount of information about a file stored in the directory entries for it is minimal. The inode itself contains the stuff you get from the stat function, things like times, size, owner, permissions and so on, along with the all-important pointers to the actual file content.
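A minimal sketch putting those three calls together, printing the inode number and name of each entry (roughly what ls -ia shows):
#include <stdio.h>
#include <dirent.h>
int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : ".";
    DIR *dirp = opendir(path);
    if (dirp == NULL) { perror(path); return 1; }
    struct dirent *de;
    while ((de = readdir(dirp)) != NULL)
        printf("%10lu  %s\n", (unsigned long)de->d_ino, de->d_name);
    closedir(dirp);
    return 0;
}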
In the old days - Version 7, System III, early System V - you could indeed open a directory and read the contents into memory, especially for the old Unix file system with 2-byte inode numbers and a limit of 14 bytes on the file name.
As more exotic file systems became more prevalent, the opendir(), readdir(), closedir() family of function calls had to be used instead because parsing the contents of a directory became increasingly non-trivial.
Finally, in the last decade or so, it has reached the point where on most systems, you cannot read the directory; you can open it (primarily so operations such as fchdir() can work), and you can use the opendir() family of calls to read it.
It looks like the stat command might be in order. Example output:
stat /etc/passwd
File: `/etc/passwd'
Size: 2911 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 324438 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2008-08-11 05:24:17.000000000 -0400
Modify: 2008-08-03 05:11:05.000000000 -0400
Change: 2008-08-03 05:11:05.000000000 -0400