What's behind this GPT header value?

This dump is the output of dd if=/dev/sda bs=512 | hexdump -C on a 2 GiB hard disk (a .vdi in VirtualBox) onto which a GUID Partition Table was written using cfdisk. This is what LBA 1 (the GPT header logical block) looks like:
45 46 49 20 50 41 52 54 | EFI signature
00 00 01 00 | GPT version
5c 00 00 00 | GPT header size
f8 8f 25 0d | CRC32 (header)
00 00 00 00 | reserved
01 00 00 00 00 00 00 00 | current LBA (this is LBA 1)
ff ff 3f 00 00 00 00 00 | backup LBA (last LBA on disk)
00 08 00 00 00 00 00 00 | first LBA available for partitions
de ff 3f 00 00 00 00 00 | last LBA available for partitions
a1 4b 7c df ca 02 95 4c | disk's GUID [1/2]
98 16 bb f0 73 d3 c8 0c | disk's GUID [2/2]
02 00 00 00 00 00 00 00 | partition entries' first LBA
80 00 00 00 | total amount of partition entries
80 00 00 00 | size of a single partition entry
86 d2 54 ab | CRC32 (entries)
00 .. | zeroed out until next LBA
This header states there are 80h (128d) partition entries, each 128 bytes long, so the entries start at LBA 2 and span 16 KiB, or 32 sectors (512 B per sector on this disk), i.e. from LBA 02h to LBA 21h.
Why is LBA 800h reported as the first available LBA for partitions instead of LBA 22h, the next after partition entries? Aren't entries and actual partitions stored contiguously on disk?
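The arithmetic above can be checked by decoding the dumped header fields directly. This is a minimal Python sketch, assuming the 92-byte little-endian layout annotated in the dump; the names GPT_HEADER and parse_gpt_header are made up for illustration, and the CRC fields are read but not verified.

```python
import struct
import uuid

# The 92 header bytes, copied from the annotated dump above.
HEADER_HEX = """
45 46 49 20 50 41 52 54  00 00 01 00  5c 00 00 00
f8 8f 25 0d  00 00 00 00
01 00 00 00 00 00 00 00  ff ff 3f 00 00 00 00 00
00 08 00 00 00 00 00 00  de ff 3f 00 00 00 00 00
a1 4b 7c df ca 02 95 4c  98 16 bb f0 73 d3 c8 0c
02 00 00 00 00 00 00 00
80 00 00 00  80 00 00 00  86 d2 54 ab
"""

# Little-endian, no padding: 8+4+4+4+4+8+8+8+8+16+8+4+4+4 = 92 bytes (5Ch).
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")
SECTOR_SIZE = 512  # bytes per sector on this disk


def parse_gpt_header(raw):
    """Decode the fixed part of a GPT header and locate the entry array."""
    (signature, _revision, header_size, _header_crc, _reserved,
     current_lba, backup_lba, first_usable, last_usable, disk_guid,
     entries_lba, n_entries, entry_size, _entries_crc) = GPT_HEADER.unpack(
        raw[:GPT_HEADER.size])
    if signature != b"EFI PART":
        raise ValueError("not a GPT header")
    array_bytes = n_entries * entry_size
    array_sectors = -(-array_bytes // SECTOR_SIZE)  # round up
    return {
        "header_size": header_size,
        "current_lba": current_lba,
        "backup_lba": backup_lba,
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        "disk_guid": uuid.UUID(bytes_le=disk_guid),  # GPT GUIDs are mixed-endian
        "array_bytes": array_bytes,
        "entries_first_lba": entries_lba,
        "entries_last_lba": entries_lba + array_sectors - 1,
    }


hdr = parse_gpt_header(bytes.fromhex("".join(HEADER_HEX.split())))
print(f"entry array: LBA {hdr['entries_first_lba']:#x}..{hdr['entries_last_lba']:#x}")
print(f"first usable LBA: {hdr['first_usable_lba']:#x}")
```

On these bytes the entry array ends at LBA 21h while the first usable LBA is 800h, so the gap in question is everything from LBA 22h to LBA 7FFh.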

Well, it looks like this is cfdisk-specific behavior. I wiped out the GPT and wrote it back twice, using gdisk and parted, both of which set the first usable LBA to 22h, as I was expecting. Note, however, that having the actual partitions start further into the disk is perfectly acceptable, since the UEFI 2.6 standard only dictates that they don't start any sooner than LBA 22h.

Related

Is it possible to set X-axis time range when plotting binary data with gnuplot?

I have a binary data file containing a header followed by a bunch of five-byte data records. Each byte is the measurement of a different quantity, all taken at the same time [1]. I want to display a graph with five lines, showing how each quantity changes over time. The start time of the data file is encoded in some bytes at the beginning of the file, and I know that each record is taken 2 seconds after the previous one, but the records don't have a separate timestamp.
I can get this to plot more-or-less like I want by ignoring the actual time completely; if you set xdata time, then gnuplot does the Right Thing. However, it can't display the Right Time. On the other hand, if you set xrange ["*time1*":"*time2*"], then you get no graphs at all, because gnuplot doesn't know that the un-timestamped records correspond to those times.
Is there a way to tell gnuplot that binary records in a file start with a particular timestamp, so that they get matched up to the correct xrange?
My binary data (50-byte header = 0003 signature, uint16 year, uint8 month, uint8 day, uint8 hour, uint8 minute, uint8 second, other random stuff, then 5-byte records):
00000000 03 00 e3 07 0b 0f 16 22 13 63 97 00 00 fc 78 00 |.......".c....x.|
00000010 00 60 5c 04 00 00 00 00 00 63 00 00 00 00 00 00 |.`\......c......|
00000020 00 00 00 00 00 00 00 00 ff ff ff 00 00 ff ff ff |................|
00000030 00 00 61 3a 00 00 00 61 3a 00 00 00 61 3b 00 00 |..a:...a:...a;..|
00000040 00 61 3b 00 00 00 60 3c 00 00 00 60 3c 00 00 00 |.a;...`<...`<...|
00000050 60 3c 00 00 00 60 3c 00 00 00 60 3d 00 00 00 60 |`<...`<...`=...`|
00000060 3d 00 00 00 60 3c 00 00 00 60 3c 00 00 00 61 3e |=...`<...`<...a>|
00000070 00 00 00 61 3e 00 00 00 61 3f 00 00 00 61 3f 00 |...a>...a?...a?.|
00000080 00 00 61 3f 00 02 00 61 3f 00 02 00 61 3f 00 01 |..a?...a?...a?..|
00000090 00 61 3f 00 01 00 61 3f 00 00 00 61 3f 00 00 00 |.a?...a?...a?...|
000000a0 61 3f 00 00 00 61 3f 00 00 00 61 41 00 01 00 61 |a?...a?...aA...a|
000000b0 41 00 01 00 60 42 00 00 00 60 42 00 00 00 60 40 |A...`B...`B...`@|
000000c0 00 00 00 60 40 00 00 00 60 3d 00 00 00 60 3d 00 |...`@...`=...`=.|
000000d0 00 00 60 3c 00 00 00 60 3c 00 00 00 60 3c 00 00 |..`<...`<...`<..|
000000e0 00 60 3c 00 00 00 60 3c 00 00 00 60 3c 00 00 00 |.`<...`<...`<...|
000000f0 60 3d 00 00 00 60 3d 00 00 00 60 3e 00 00 00 60 |`=...`=...`>...`|
00000100 3e 00 00 00 60 3e 00 00 00 60 3e 00 00 00 60 3f |>...`>...`>...`?|
00000110 00 00 00 60 3f 00 00 00 60 40 00 00 00 60 40 00 |...`?...`@...`@.|
00000120 00 00 60 40 00 05 00 60 40 00 05 00 60 40 00 04 |..`@...`@...`@..|
00000130 00 60 40 00 04 00 60 3f 00 00 00 60 3f 00 00 00 |.`@...`?...`?...|
00000140 60 40 00 00 00 60 40 00 00 00 60 3f 00 00 00 60 |`@...`@...`?...`|
00000150 3f 00 00 00 60 3e 00 00 00 60 3e 00 00 00 60 3e |?...`>...`>...`>|
00000160 00 0f 00 60 3e 00 0f 00 60 3c 00 00 00 60 3c 00 |...`>...`<...`<.|
00000170 00 00 60 3d 00 00 00 60 3d 00 00 00 60 3e 00 01 |..`=...`=...`>..|
00000180 00 60 3e 00 01 00 60 3e 00 18 00 60 3e 00 18 00 |.`>...`>...`>...|
00000190 60 3e 00 02 00 60 3e 00 02 00 60 3e 00 03 00 60 |`>...`>...`>...`|
My gnuplot script:
#!/usr/bin/gnuplot
fn="20191115223419"
set title "Heart Rate, O2, Motion"
set xlabel "Time"
set xdata time
set timefmt "%Y%m%d%H%M%S"
set xrange ["20191115223419":"20191116071015"]
set format x "%H:%M"
set yrange [40:120] # Heart rate
set y2range [50:100] # O2 saturation
plot \
fn binary skip=0x32 format="%5uint8" using 0:2 with lines lt rgb "red" title "HR", \
fn binary skip=0x32 format="%5uint8" using 0:(($4)/4+40) with lines lt rgb "orange" title "Motion", \
fn binary skip=0x32 format="%5uint8" using 0:1 with lines lt rgb "gray60" title "O2" axes x1y2
pause -1 "Hit <ENTER> to continue..."
It plots as desired if I comment out the xrange line.
I know I could easily write a script to munge the data into better form, and perhaps that's the thing to do: Python it into a better string, then gnuplot that. But I'd be delighted if I could just tell gnuplot alone to do what I want.
[1] The 5 bytes are actually O2 saturation, pulse rate, dummy (always zero), motion sensor, and another dummy byte. My gnuplot script only plots the 3 non-dummy values.
Not actually tested since I don't have your data file...
You don't need set xdata time for this.
I'll leave extracting the start time as a separate problem and assume you have it in some usable form at the time you run the script. The incremental time for each record is 2 seconds, so the full time at each point is start + 2 * $0.
start = strptime("%Y%m%d%H%M%S", "20191115223419") # start time in seconds
set xtics time format "%H%M%S"
set xrange [start : start + whatever]
plot \
fn binary skip=0x32 format="%5uint8" using (start+2*$0):2 with lines lt rgb "red" title "HR", \
fn binary skip=0x32 format="%5uint8" using (start+2*$0):(($4)/4+40) with lines lt rgb "orange" title "Motion", \
fn binary skip=0x32 format="%5uint8" using (start+2*$0):1 with lines lt rgb "gray60" title "O2" axes x1y2
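For the start-time extraction that this answer sets aside, here is a minimal Python sketch, assuming the header layout described in the question (little-endian uint16 signature 0x0003, uint16 year, then one byte each for month, day, hour, minute, second). The function names are hypothetical, and only the first nine bytes of the dump are used.

```python
import struct
from datetime import datetime, timedelta

SAMPLE_PERIOD = timedelta(seconds=2)  # one record every 2 seconds, per the question


def parse_start_time(header):
    """Decode the recording start time from the first 9 header bytes."""
    signature, year, month, day, hour, minute, second = struct.unpack_from(
        "<HHBBBBB", header, 0)
    if signature != 0x0003:
        raise ValueError(f"unexpected signature {signature:#06x}")
    return datetime(year, month, day, hour, minute, second)


def record_time(start, index):
    """Timestamp of the index-th record (0-based), mirroring start + 2*$0."""
    return start + index * SAMPLE_PERIOD


# First nine bytes of the hex dump in the question.
first_bytes = bytes.fromhex("03 00 e3 07 0b 0f 16 22 13")
start = parse_start_time(first_bytes)
print(start.strftime("%Y%m%d%H%M%S"))  # string usable with gnuplot's strptime
print(record_time(start, 30))
```

The decoded value matches the file name used in the scripts (20191115223419), which suggests the assumed field order is right for this file.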
Here's a complete gnuplot script to extract the Y/M/D H:M:S data from the binary file and plot the rest of the data, plus a sample image.
#!/usr/bin/gnuplot
#
# This script plots data from a Viatom pulse oximeter (www.viatomtech.com). The
# Android app extracts recorded data and stores it in an Sqlite3 database, and also
# creates files under /sdcard/PlusebitO2CN with names based on the recording start
# date & time. These are binary files with a two-byte "signature" of 0x0003; 7 bytes
# of start date/time as uint16:year, uint8 month, day, hour, minute, second; 41 bytes
# of something else ??? and then 5-byte records of Oxygenation, Heart Rate, dummy-1,
# motion-sensor, dummy-2.
#
# Put file name here...everything else is automagic.
#
fn="20191115223419"
#
####################################################
set title "Heart Rate, O2, Motion"
set xlabel "Time"
#
# Get start time from binary file by fake-plotting. This will
# print a warning message; I don't know how to get rid of it.
#
set term unknown
plot fn binary skip=2 format="%uint16%5uint8" every 1:1:0:0:0:0 \
using (year=$1):(month=$2):(day=$3):(hour=$4):(minute=$5):(second=$6):1 with candlesticks
# Now put the values we extracted into a single 'YYYYmmddHHMMSS' string
sd=sprintf("%04d%02d%02d%02d%02d%02d", year, month, day, hour, minute, second ) # zero-pad so single-digit fields still parse
set term qt
set xdata time
start=strptime("%Y%m%d%H%M%S", sd ) # sd contains YYYYmmddHHMMSS extracted from file
set xtics time format "%H%M%S"
#set timefmt "%Y%m%d%H%M%S"
#set xrange ["20191115223419":"20191116071015"]
set format x "%H:%M"
set yrange [40:120] # Heart rate
set y2range [50:100] # O2 saturation
plot \
fn binary skip=0x32 format="%5uint8" using (start+2*$0):2 with lines lt rgb "red" title "HR", \
fn binary skip=0x32 format="%5uint8" using (start+2*$0):(($4)/4+40) with lines lt rgb "orange" title "Motion", \
fn binary skip=0x32 format="%5uint8" using (start+2*$0):1 with lines lt rgb "gray60" title "O2" axes x1y2
pause -1 "Hit <ENTER> to continue..."
And the result:

Serial UART interceptty capture. Can't make heads or tails of it

I'm working on understanding a mystery protocol in a DLP 3D printer. A Raspberry Pi is talking to a motor/LED controller via a serial bus. The device seems to be proprietary, but I'm guessing it uses some kind of open standard (like G-code). It may help to know the device was probably made and programmed in China. No idea if this factors in, but there may be some programming-culture thing I'm missing. I'm trying to figure out how to control this motor/LED control board via the serial port, so I captured data using interceptty.
This seems to be an idle state sent to the mystery device from the pi.
55 55 55 55 00 08 00 00 00 00 00 00 00 00 00 00 00 00 00 08 aa aa aa aa
This tends to be how the mystery device acknowledges
55 55 55 55 00 03 00 00 00 00 00 00 00 00 00 00 00 00 01 04 aa aa aa aa
55 55 55 55 00 03 00 00 00 00 00 00 00 00 00 00 00 00 01 04 aa aa aa aa
This seems to be noting an idle state always after acknowledgement from the mystery device.
55 55 55 55 00 03 00 00 00 00 00 00 00 00 00 00 00 00 55 58 aa aa aa aa
This is a command that started the print off. So this begins moving a motor.
55 55 55 55 00 03 e8 03 00 00 40 0d 03 00 01 00 00 00 00 3f aa aa aa aa
For reference 55555555 and aaaaaaaa are 01010101 etc in binary. They seem to be a way to clear coms for async serial transmission. It certainly LOOKS like I'm seeing extremely low level communication. As if I hooked a logic analyzer up to the circuit.
There are 16 hex bytes in between each of these clearing/syncing steps. I'm not sure if I'm just seeing VERY low level communication or if these 16 bytes contain all of the data in any given command or the data plus check bytes or something.
Finally, I'm seeing LOTS of repetition. This leads me to think that this isn't Gcode but that the pi is sending a command every cycle and the slave/mystery device is updating as quickly as possible.
For example the output below repeats over and over 1145 times after starting a print. This would be when the motor has descended fully into a vat and an LED is held on for an extended period of time. > denotes received transmissions < denotes outgoing transmissions from the pi.
> 55 55 55 55 00 03 00 00 00 00 | UUUU
> 00 00 00 00 00 00 00 00 01 04 |
> aa aa aa aa 55 55 55 55 00 03 | UUUU
> 00 00 00 00 00 00 00 00 00 00 |
> 00 00 01 04 aa aa aa aa |
< 55 55 55 55 00 03 20 03 00 00 | UUUU
< 78 5d 02 00 00 00 00 00 00 fd | x]
< aa aa aa aa |
I'm hoping to get some direction. None of this hex seems to translate well into ascii or utf. I don't think it's passing ints or chars. Maybe it's backwards bitwise? I'm not sure. I'm having lots of trouble making heads or tails of it.
What level is UUUU and aaaa at? It seems like something you'd see on a logic analyzer, not through a driver.
Anyway, any direction would be much appreciated.
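One hypothesis that can be tested on the frames quoted above is that the 16 bytes between the 55 and aa markers are 15 payload bytes followed by a one-byte checksum. The Python sketch below checks whether the last byte equals the sum of the preceding 15 bytes modulo 256. This is only a test against the five distinct frames in this post; it does not identify the protocol, and the helper names are invented for illustration.

```python
PREAMBLE = bytes.fromhex("55555555")
TRAILER = bytes.fromhex("aaaaaaaa")

# The distinct frames quoted in the question, in order of appearance.
FRAMES = [
    "55555555 0008000000000000000000000000 0008 aaaaaaaa",  # idle, Pi to device
    "55555555 0003000000000000000000000000 0104 aaaaaaaa",  # acknowledgement
    "55555555 0003000000000000000000000000 5558 aaaaaaaa",  # idle after ack
    "55555555 0003e8030000400d030001000000 003f aaaaaaaa",  # start-of-print command
    "55555555 00032003000078 5d020000000000 00fd aaaaaaaa",  # repeated during exposure
]


def split_frame(frame):
    """Strip the sync markers and return (payload, trailing check byte)."""
    if not (frame.startswith(PREAMBLE) and frame.endswith(TRAILER)):
        raise ValueError("missing 55.. / aa.. markers")
    body = frame[len(PREAMBLE):-len(TRAILER)]
    return body[:-1], body[-1]


def sum8(payload):
    """8-bit additive checksum: sum of all bytes, modulo 256."""
    return sum(payload) & 0xFF


for text in FRAMES:
    payload, check = split_frame(bytes.fromhex(text))
    verdict = "match" if sum8(payload) == check else "MISMATCH"
    print(f"{payload.hex(' ')}  check={check:02x}  sum8={sum8(payload):02x}  {verdict}")
```

All five frames match, which is at least consistent with a framed binary protocol carrying a simple additive checksum rather than raw line-level noise.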

Can anyone advise how I can get the TCP / IP packet checksums verified?

I have a packet that I have manually created for a SYN/ACK but I get no reply from the server.
This is all wireless/GSM stuff so I cannot use a sniffer.
I have calculated the TCP and the IP header checksums manually a few times and they seem correct but I really need a 3rd party method to be sure.
I had several endian issues but I think I have it right now. But who knows...
I only found an online parser but it does not test/verify the checksums.
Does anyone have an easy idea for me?
Just in case someone has suitable access to a test method, and feels like pasting it in for me, here is the packet:
45 10 00 3C 00 02 00 00 64 06 E8 1F 0A AA 61 43 51 8A B1 13
01 BB 01 BB 00 00 00 0A 00 00 00 00 50 02 00 00 3D D8 00 00
Regards
berntd
I've created a pcap from your hex data using Net::PcapWriter:
use strict;
use warnings;
use Net::PcapWriter;
my $w = Net::PcapWriter->new('test.pcap');
my $ip = pack('H*','4510003C000200006406E81F0AAA6143518AB11301BB01BB0000000A00000000500200003DD80000');
$w->packet($w->layer2prefix('1.1.1.1').$ip);
Loading it into Wireshark shows both the IP checksum and the TCP checksum as correct, so it is probably not a problem of the checksum calculation.
But tcpdump says that the length is wrong:
IP truncated-ip - 20 bytes missing! 10.170.97.67.443 > 81.138.177.19.443: Flags [S], seq 10:30, win 0, length 20
This is because you've set the total length in the IP header to 60 bytes (00 3C) but the IP header + TCP header is only 40 bytes in total and your packet does not have any payload, i.e. the total length should be 40 and not 60 bytes.
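Both findings can also be reproduced without Wireshark. The following Python sketch recomputes the Internet checksum over the IP header and over the TCP pseudo-header plus segment, and compares the declared total length with the bytes actually present. It assumes IPv4 carrying TCP; the function names are illustrative only.

```python
import struct

# The packet from the question (40 bytes).
PACKET = bytes.fromhex(
    "4510003C000200006406E81F0AAA6143518AB113"
    "01BB01BB0000000A00000000500200003DD80000"
)


def ones_complement_sum(data):
    """16-bit one's-complement sum with end-around carry (as in RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f">{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def check_packet(packet):
    """Return (ip_ok, tcp_ok, declared_total_length, actual_length)."""
    ihl = (packet[0] & 0x0F) * 4  # IP header length in bytes
    declared_len = struct.unpack_from(">H", packet, 2)[0]
    # Summing a header together with its stored checksum gives 0xFFFF if valid.
    ip_ok = ones_complement_sum(packet[:ihl]) == 0xFFFF
    segment = packet[ihl:]
    # Pseudo-header: source, destination, zero, protocol, TCP length.
    # The TCP length used here is the number of bytes actually present.
    pseudo = packet[12:20] + struct.pack(">BBH", 0, packet[9], len(segment))
    tcp_ok = ones_complement_sum(pseudo + segment) == 0xFFFF
    return ip_ok, tcp_ok, declared_len, len(packet)


ip_ok, tcp_ok, declared_len, actual_len = check_packet(PACKET)
print(f"IP checksum ok: {ip_ok}, TCP checksum ok: {tcp_ok}")
print(f"declared total length: {declared_len}, bytes present: {actual_len}")
```

Both checksums verify, and the declared length of 60 against 40 bytes present is the same mismatch tcpdump reports.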
Here is what I came up with to do it the manual way:
Put packet into a text file like so:
45 10 00 3C 00 02 00 00 64 06 E8 1F 0A AA 61 43 51 8A B1 13
01 BB 01 BB 00 00 00 0A 00 00 00 00 50 02 00 00 3D D8 00 00
Add address offsets and group into 16-byte lines, as in a hex dump:
000000 45 10 00 3C 00 02 00 00 64 06 E8 1F 0A AA 61 43
000010 51 8A B1 13 01 BB 01 BB 00 00 00 0A 00 00 00 00
000020 50 02 00 00 3D D8 00 00
Save it (source).
Now run text2pcap.exe -e 0x800 source dest
The dest file can now be imported as a PCAP file into Wireshark for decoding.
Multiple packets can be converted by starting the address offset for each new packet at 000000 again in the source file.
text2pcap.exe seems to come with Wireshark.
Tedious but works.
Cheers

quickly load a subset of rows from data.frame saved with `saveRDS()`

With a large file (1GB) created by saving a large data.frame (or data.table) is it possible to very quickly load a small subset of rows from that file?
(Extra for clarity: I mean something as fast as mmap, i.e. the runtime should be approximately proportional to the amount of memory extracted, but constant in the size of the total dataset. "Skipping data" should have essentially zero cost. This can be very easy, or impossible, or something in between, depending on the serialization format.)
I hope that the R serialization format makes it easy to skip forward through the file to the relevant portions of the file.
Am I right in assuming that this would be impossible with a compressed file, simply because gzip requires to uncompress everything from the beginning?
saveRDS(object, file = "", ascii = FALSE, version = NULL,
compress = TRUE, refhook = NULL)
But I'm hoping binary (ascii=F) uncompressed (compress=F) might allow something like this. Use mmap on the file, then quickly skip to the rows and columns of interest?
I'm hoping it has already been done, or there is another format (reasonably space efficient) that allows this and is well-supported in R.
I've used things like gdbm (from Python) and even implemented a custom system in Rcpp for a specific data structure, but I'm not satisfied with any of this.
After posting this, I worked a bit with the package ff (CRAN) and am very impressed with it (not much support for character vectors though).
Am I right in assuming that this would be impossible with a compressed
file, simply because gzip requires to uncompress everything from the
beginning?
Indeed. For a short explanation, let's take a dummy compression method as a starting point:
Given AAAAVVBABBBC, gzip would do something like: 4A2VBA3BC
Obviously you can't extract every A from the file without reading all of it, as you can't guess whether there's an A at the end or not.
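The dummy method in that example is plain run-length encoding, not the DEFLATE algorithm gzip really uses, but it is enough to illustrate the point. Here is a toy Python sketch (the function names are invented) that reproduces the example and shows that decoding has to walk the stream from the start, because the position of any symbol depends on every run before it.

```python
from itertools import groupby


def rle_encode(text):
    """Run-length encode: a count prefix only for runs longer than one."""
    parts = []
    for symbol, run in groupby(text):
        n = len(list(run))
        parts.append(f"{n}{symbol}" if n > 1 else symbol)
    return "".join(parts)


def rle_decode(encoded):
    """Decode sequentially; assumes the original text contained no digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # accumulate a (possibly multi-digit) run length
        else:
            out.append(ch * int(count or 1))
            count = ""
    return "".join(out)


packed = rle_encode("AAAAVVBABBBC")
print(packed)
print(rle_decode(packed))
```

Finding the n-th original character in the packed form requires summing all earlier run lengths, which is the same reason a compressed RDS file cannot be seeked into cheaply.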
For the other question, "Loading part of a saved file", I can't see a solution off the top of my head. Using write.csv and read.csv (or fwrite and fread from the data.table package) with the skip and nrows parameters could be an alternative.
In any case, applying a function to a file that has already been read means loading the whole file into memory before filtering, which takes no less time than reading the file and then subsetting from memory.
You may craft something in Rcpp, taking advantage of streams to read the data without loading it all into memory, but reading and parsing each entry before deciding whether it should be kept won't give you much better throughput.
saveRDS will save a serialized version of the data, for example:
> myvector <- c("1","2","3")
> serialize(myvector,NULL)
[1] 58 0a 00 00 00 02 00 03 02 03 00 02 03 00 00 00 00 10 00 00 00 03 00 04 00 09 00 00 00 01 31 00 04 00 09 00 00 00 01 32 00 04 00 09 00 00
[47] 00 01 33
It is of course parsable, but that means reading it byte by byte according to the format.
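To make "reading according to the format" concrete, here is a minimal Python sketch that walks the 49 bytes printed above. It assumes the version-2 XDR layout visible in that output (a two-byte "X\n" marker, three 4-byte version integers, then a character vector whose elements are each prefixed by a flags word and a length) and handles nothing else; NA strings, attributes, and other vector types are out of scope. The function names are hypothetical.

```python
import struct

# The serialize(myvector, NULL) output shown above.
RAW = bytes.fromhex(
    "580a 00000002 00030203 00020300"  # "X\n", format version, R versions
    "00000010 00000003"                # flags (type 16 = character vector), length 3
    "00040009 00000001 31"             # flags (type 9 = string), length 1, "1"
    "00040009 00000001 32"
    "00040009 00000001 33"
)

STRSXP, CHARSXP = 16, 9


def read_int(buf, pos):
    """Read one big-endian 32-bit integer and return (value, next position)."""
    return struct.unpack_from(">i", buf, pos)[0], pos + 4


def parse_character_vector(buf):
    """Parse a serialized plain character vector; returns (strings, end offset)."""
    if buf[:2] != b"X\n":
        raise ValueError("not an XDR-format serialization")
    pos = 2
    for _ in range(3):  # format version, writer version, minimum reader version
        _, pos = read_int(buf, pos)
    flags, pos = read_int(buf, pos)
    if flags & 0xFF != STRSXP:
        raise ValueError("top-level object is not a character vector")
    count, pos = read_int(buf, pos)
    strings = []
    for _ in range(count):
        flags, pos = read_int(buf, pos)
        if flags & 0xFF != CHARSXP:
            raise ValueError("expected a string element")
        length, pos = read_int(buf, pos)
        # The only way to find the next element is to consume this one.
        strings.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return strings, pos


values, end = parse_character_vector(RAW)
print(values, end)
```

Because every string carries its own length prefix, the offset of element k is only known after reading the prefixes of elements 1 to k-1.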
On the other hand, you could write as csv (or write.table for more complex data) and use an external tool before reading, something along the line:
z <- tempfile()
write.table(df, z, row.names = FALSE)
shortdf <- read.table(text = system(paste("awk 'NR > 5 && NR < 10 { print }'", z), intern = TRUE)) # intern = TRUE captures awk's output instead of the exit status
You'll need a Linux system with awk, which is able to parse millions of lines in a few milliseconds, or a Windows build of awk.
The main advantage is that awk can filter each line of data on a regex or some other condition.
As a complement for the data.frame case: a data.frame is more or less a list of vectors (in the simple case), and this list is saved sequentially, so if we have a data frame like:
> str(ex)
'data.frame': 3 obs. of 2 variables:
$ a: chr "one" "five" "Whatever"
$ b: num 1 2 3
Its serialization is:
> serialize(ex,NULL)
[1] 58 0a 00 00 00 02 00 03 02 03 00 02 03 00 00 00 03 13 00 00 00 02 00 00 00 10 00 00 00 03 00 04 00 09 00 00 00 03 6f 6e 65 00 04 00 09 00
[47] 00 00 04 66 69 76 65 00 04 00 09 00 00 00 08 57 68 61 74 65 76 65 72 00 00 00 0e 00 00 00 03 3f f0 00 00 00 00 00 00 40 00 00 00 00 00 00
[93] 00 40 08 00 00 00 00 00 00 00 00 04 02 00 00 00 01 00 04 00 09 00 00 00 05 6e 61 6d 65 73 00 00 00 10 00 00 00 02 00 04 00 09 00 00 00 01
[139] 61 00 04 00 09 00 00 00 01 62 00 00 04 02 00 00 00 01 00 04 00 09 00 00 00 09 72 6f 77 2e 6e 61 6d 65 73 00 00 00 0d 00 00 00 02 80 00 00
[185] 00 ff ff ff fd 00 00 04 02 00 00 00 01 00 04 00 09 00 00 00 05 63 6c 61 73 73 00 00 00 10 00 00 00 01 00 04 00 09 00 00 00 0a 64 61 74 61
[231] 2e 66 72 61 6d 65 00 00 00 fe
Translated to ascii for an idea:
X
one five Whatever?ð## names a b row.names
ÿÿÿý class
data.frameþ
We have the header of the file, then the header of the list, then each vector composing the list. As we have no clue how much space the character vector will take, we can't skip to arbitrary data; we have to parse each header (the bytes just before the text data give its length). Even worse, to get the corresponding numeric values we have to reach the numeric vector's header, whose position can't be determined without parsing each character header and summing the lengths.
So in my opinion, crafting something is possible, but it will probably not be much quicker than reading the whole object, and it will be brittle with respect to the save format (R already has 3 formats for saving objects).
Some reference here
Same view as the serialize output in ascii format (more readable to get how it is organized):
> write(rawToChar(serialize(ex,NULL,ascii=TRUE)),"")
A
2
197123
131840
787
2
16
3
262153
3
one
262153
4
five
262153
8
Whatever
14
3
1
2
3
1026
1
262153
5
names
16
2
262153
1
a
262153
1
b
1026
1
262153
9
row.names
13
2
NA
-3
1026
1
262153
5
class
16
1
262153
10
data.frame
254

Parsing SQLite Database Schema in sqlitedb file?

I wrote a program for parsing SQLite files. I can parse all the data, from b-tree pages down to records, columns, and values, but I also need to parse the table schemas. I found that the database schema is stored in page 1 (the root page), and I can see it with a hex editor. I also found the structure of sqlite_master, and I read it exactly as explained in http://sqlite.org/fileformat2.html
I want to know how I can find the first byte of the sqlite_master table in the db file. How can I detect the starting byte of the schema? Is there anything related to it in the SQLite database header?
Edit 1 (more info):
For example:
I opened the SQLite db with a hex editor (my page size is 4096 bytes, and I marked the page header in the image):
I marked the root page header. It starts with 05, which means the page is an interior table b-tree page (see B-tree Page Header Format in http://sqlite.org/fileformat2.html). It has 5 cells, as you can see from the cell pointer array that starts right after the header: 0FFB, 0FF6, 0FF1, 0FEC, 0FE7. Each cell is 5 bytes long, and the cell content starts at 0FE7. The schema that you can see in the picture (in the text part) starts at 232~240, and when I checked other dbs the schema was in a different place...
Edit 2:
You can download Example File from https://www.dropbox.com/s/lanky02kneyb74w/31bb7ba8914766d4ba40d6dfb6113c8b614be442
Edit 3:
In my file you can see
$ hexdump -C 31bb7ba8914766d4ba40d6dfb6113c8b614be442
00000000 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00 |SQLite format 3.|
00000010 10 00 02 02 00 40 20 20 00 00 00 02 00 00 00 3f |.....@  .......?|
00000020 00 00 00 00 00 00 00 00 00 00 00 47 00 00 00 04 |...........G....|
00000030 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 |................|
00000060 00 2d e2 25 05 00 00 00 05 0f e7 00 00 00 00 3d |.-.%...........=|
00000070 0f fb 0f f6 0f f1 0f ec 0f e7 08 7f 07 9d 08 3c |...............<|
00000080 07 01 06 22 05 92 04 fe 03 fc 04 c1 03 4d 02 b8 |...".........M..|
00000090 02 0a 02 75 01 32 01 c7 00 e9 00 e9 00 00 00 00 |...u.2..........|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000e0 00 00 00 00 00 00 00 00 00 47 18 06 17 5b 35 01 |.........G...[5.|
000000f0 00 69 6e 64 65 78 73 71 6c 69 74 65 5f 61 75 74 |.indexsqlite_aut|
00000100 6f 69 6e 64 65 78 5f 41 42 4d 75 6c 74 69 56 61 |oindex_ABMultiVa|
00000110 6c 75 65 45 6e 74 72 79 4b 65 79 5f 31 41 42 4d |lueEntryKey_1ABM|
00000120 75 6c 74 69 56 61 6c 75 65 45 6e 74 72 79 4b 65 |ultiValueEntryKe|
Page Header (offset 0x64)
05 <- interior table b-tree page
0000 <- Byte offset into the page of the first freeblock
0005 <- Number of cells on this page
0FE7 <- Offset to the first byte of the cell content area
00 <- Number of fragmented free bytes
0000003D (61) <- The right-most pointer
Cell Array Pointers & Cell Contents:
(Table Interior Cell Format)
Cell Pointer| Page number of left child | Rowid
------------|---------------------------|-------
0FFB | 0000001A (26) | 15
0FF6 | 0000001C (28) | 2D
0FF1 | 00000031 (49) | 3C
0FEC | 00000039 (57) | 48
0FE7 | 0000003C (60) | 4C <- equal to (Offset to the first byte of the cell content area) in page header
I realize your question was asked over a year ago and you probably resolved it, but I would like to submit an answer in case anyone else has this same question. I was in the same situation as you, Mehdi. I wanted to read a SQLite database file, and was looking for the master table / schema. It appeared to be in page 1, but the header was not pointing to it. There were two reasons for my confusion.
(1) There was a lot of "dead" data in my SQLite database file that was not being used. I believe as the database was created and grew, the location of the actual active data moved, and the old location was not overwritten with zeros. Doing a search for some of the "CREATE TABLE" statements found multiple results in different locations of the file. I later determined the actual schema was split up and located on pages 18, 10, and 8 (which the page 1 interior table pointed to). I would have detected this earlier, if not for reason #2.
(2) I had miscalculated the byte position of a page, which confused me. Where p = page # and s = page size, I thought it was [p * s] .... but actually it's [(p-1) * s]. (Page 1 also starts at byte 0, but its b-tree page header starts at byte 100, after the database file header.) In other words, I thought the page numbering started at zero instead of 1.
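The offset rule in (2) is small enough to write down as code. This is a minimal Python sketch with hypothetical helper names; the page sizes in the usage lines are the ones visible in the two hex dumps on this page (4096 and 1024 bytes).

```python
DB_HEADER_SIZE = 100  # the database file header occupies the first 100 bytes of page 1


def page_offset(page_number, page_size):
    """Byte offset of the start of a page; SQLite pages are numbered from 1."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    return (page_number - 1) * page_size


def btree_header_offset(page_number, page_size):
    """Byte offset of the b-tree page header within the file."""
    start = page_offset(page_number, page_size)
    return start + DB_HEADER_SIZE if page_number == 1 else start


print(btree_header_offset(1, 4096))   # root page of the schema table
print(btree_header_offset(18, 4096))  # one of the schema pages mentioned in (1)
print(btree_header_offset(2, 1024))
```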
As an additional note, I believe the http://sqlite.org/fileformat2.html page is missing some vital info. Specifically, it doesn't explain where the "root page" number is in the schema table (it's in field 4). I couldn't find this information on the sqlite.org page.
The documentation you linked to says in section 2.6:
Page 1 of a database file is the root page of a table b-tree that holds a special table named "sqlite_master"
and in section 1.5:
A b-tree page is divided into regions in the following order:
The 100-byte database file header (found on page 1 only)
The 8 or 12 byte b-tree page header …
For example, with this database:
$ sqlite3 test.db "create table hello(world);"
$ hexdump -C test.db
00000000 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00 |SQLite format 3.|
00000010 04 00 01 01 00 40 20 20 00 00 00 01 00 00 00 02 |.....@  ........|
00000020 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 04 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................|
00000060 00 2d e6 03 0d 00 00 00 01 03 cf 00 03 cf 00 00 |.-æ.......Ï..Ï..|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000003c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2f |.............../|
000003d0 01 06 17 17 17 01 3f 74 61 62 6c 65 68 65 6c 6c |......?tablehell|
000003e0 6f 68 65 6c 6c 6f 02 43 52 45 41 54 45 20 54 41 |ohello.CREATE TA|
000003f0 42 4c 45 20 68 65 6c 6c 6f 28 77 6f 72 6c 64 29 |BLE hello(world)|
00000400 0d 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00 |................|
00000410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
... the page header at offset 0x64 has these values:
0d: page is a leaf table b-tree page
0000: freeblock offset
0001: number of cells
03cf: offset of cell content
00: fragmented free bytes
03cf: first cell pointer
And at offset 3cf, you have a standard table b-tree leaf cell, containing the only row of the sqlite_master table:
sqlite> select * from sqlite_master;
type name tbl_name rootpage sql
---------- ---------- ---------- ---------- -------------------------
table hello hello 2 CREATE TABLE hello(world)
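The page header breakdowns in this thread can be reproduced with a few lines of code. Below is a minimal Python sketch, assuming the 8-byte (leaf) or 12-byte (interior) big-endian header layout from the file format document; it is fed the header bytes copied from the two hex dumps above, so offset 0 here corresponds to file offset 0x64. The function name is invented for illustration.

```python
import struct

INTERIOR_TYPES = {0x02, 0x05}  # interior index / interior table pages
LEAF_TYPES = {0x0A, 0x0D}      # leaf index / leaf table pages


def parse_btree_header(buf, offset=0):
    """Decode a b-tree page header and its cell pointer array."""
    page_type, freeblock, n_cells, content_start, fragmented = struct.unpack_from(
        ">BHHHB", buf, offset)
    pos = offset + 8
    right_most = None
    if page_type in INTERIOR_TYPES:
        # Interior pages carry an extra 4-byte right-most child pointer.
        (right_most,) = struct.unpack_from(">I", buf, pos)
        pos += 4
    elif page_type not in LEAF_TYPES:
        raise ValueError(f"unknown page type {page_type:#04x}")
    cell_pointers = list(struct.unpack_from(f">{n_cells}H", buf, pos))
    return {
        "type": page_type,
        "first_freeblock": freeblock,
        "cells": n_cells,
        "content_start": content_start,
        "fragmented_bytes": fragmented,
        "right_most_pointer": right_most,
        "cell_pointers": cell_pointers,
    }


# Header bytes at file offset 0x64 in the question's database (interior page).
interior = parse_btree_header(bytes.fromhex(
    "05 0000 0005 0fe7 00 0000003d 0ffb 0ff6 0ff1 0fec 0fe7"))
# Header bytes at file offset 0x64 in test.db (leaf page).
leaf = parse_btree_header(bytes.fromhex("0d 0000 0001 03cf 00 03cf"))

print(interior)
print(leaf)
```

For a leaf page 1, as in test.db, the cell pointers lead straight to sqlite_master rows. For an interior page 1, as in the question's file, they lead to child page numbers that have to be followed first, which is why the schema text appears elsewhere in that file.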

Resources