In a GUID Partition Table, how can I know how many partitions there are?

I have an image of a USB drive with 3 partitions:
Partition 1: FAT32
Partition 2: exFAT
Partition 3: NTFS
I am making a program that goes through the partitions, but I am unsure of how I can know how many partitions my program should look for. By looking at the raw data I can see that it has three partitions as expected, but of course my program doesn't know this.
I tried to look at the field at offset 80 (0x50), 4 bytes, "Number of partition entries in array", but in my example it gave me the value 128 (on-disk little-endian bytes 80 00 00 00).
Here are screenshots of hex from my example image.
Protective MBR
Partition table header (LBA 1)
signatureText="EFI PART" HexLe=4546492050415254 HexBe=5452415020494645
revision=1.0 HexLe=00000100 HexBe=00010000
headerSizeDec=92 HexLe=5C000000 HexBe=0000005C
crc32OfHeaderDec=82845332 HexLe=941EF004 HexBe=04F01E94
reservedADec=0 HexLe=00000000 HexBe=00000000
currentLBADec=1 HexLe=0100000000000000 HexBe=0000000000000001
backupLBADec=30277631 HexLe=FFFFCD0100000000 HexBe=0000000001CDFFFF
firstUsableLBAForPartitionsDec=34 HexLe=2200000000000000 HexBe=0000000000000022
lastUsableLBADec=30277598 HexLe=DEFFCD0100000000 HexBe=0000000001CDFFDE
diskGUIDHexMe=8B3F71C5AF9D744D9CA3EBFF7D1F9DC9 (canonical C5713F8B-9DAF-4D74-9CA3-EBFF7D1F9DC9)
startingLBAOfArrayOfPartitionEntriesDec=2 HexLe=0200000000000000 HexBe=0000000000000002
numberOfPartitionEntriesInArrayDec=128 HexLe=80000000 HexBe=00000080
sizeOfASinglePartitionEntryDec=128 HexLe=80000000 HexBe=00000080
crc32OfPartitionEntriesArrayDec=-2043475264 HexLe=C00A3386 HexBe=86330AC0
reservedBDec=0 HexLe=00000000 HexBe=00000000
We are going to look for partitions now at offset 1024 (LBA 2 with 512-byte sectors).
Partition entries (LBA 2–33)

Bit late, and you may have already figured it out by now.
Refer to the GPT layout figure on the Wikipedia page; the page itself will provide you further information.
It is not possible to determine the number of used partitions just by looking at the GUID partition table header at LBA 1; you have to examine the partition entries themselves and check whether the partition type GUID marks an entry as unused (all zeros) or not.
The "Number of partition entries" field at offset 80 (0x50) in the header is the total number of entry slots in the array (commonly 128), and the size of a single entry is given at offset 84 (0x54). Together they tell you how large the entry array is, not how many partitions are in use.
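For example, here is a minimal Python sketch of that check (assuming a raw image with 512-byte logical sectors; field offsets are per the UEFI spec; the image path is hypothetical):
import struct

SECTOR = 512  # assuming 512-byte logical sectors

def count_used_partitions(image_path):
    with open(image_path, "rb") as f:
        f.seek(1 * SECTOR)                                   # GPT header lives at LBA 1
        header = f.read(92)
        assert header[0:8] == b"EFI PART"                    # signature check
        entries_lba, = struct.unpack_from("<Q", header, 72)  # start of entry array (2 here)
        num_entries, = struct.unpack_from("<I", header, 80)  # total entry slots (128 here)
        entry_size, = struct.unpack_from("<I", header, 84)   # size of one entry (128 here)
        used = 0
        f.seek(entries_lba * SECTOR)
        for _ in range(num_entries):
            entry = f.read(entry_size)
            if entry[0:16] != b"\x00" * 16:                  # all-zero type GUID = unused slot
                used += 1
        return used

print(count_used_partitions("usb.img"))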


How can I optimize this query in Neo4j?

I have a unidirectional graph.
The structure is as follows:
There are about 20,000 nodes in the graph.
I make the simplest request: MATCH (b1)-[:NEXT_BAR*10]->(b2) RETURN b1.id, b2.id LIMIT 5
The request is processed quickly.
But if I increase the number of relationships, the query takes much longer to process. In other words, the speed depends on the number of relationships.
This request takes longer than 5 minutes to complete: MATCH (b1)-[:NEXT_BAR*10000]->(b2) RETURN b1.id, b2.id LIMIT 5
This is still a simplified version. The request can have more than two nodes and the number of relationships can still be a range.
How can I optimize a query with a large number of relationships?
Perhaps there are other graph DBMS where there is no such problem?
Variable-length relationship queries have exponential time and memory complexity.
If R is the average number of suitable relationships per node and D is the depth of the search, then the complexity is O(R^D). For example, with R = 2, a depth of 10 means on the order of 2^10 ≈ 1000 paths to examine, while a depth of 10000 is astronomically larger. This complexity will exist in any DBMS.
The theory is simple here, but there are a couple of intricacies in the query execution.
-[:NEXT_BAR*10000]-> matches a path that is precisely 10000 edges long, so the query engine spends some time finding those paths. Another thing to mention is that in (b1)-[...]->(b2), b1 and b2 are not specific, which means the query engine has to scan all nodes. If there is a LIMIT, yes, the scan should stop once a limited number of items has been returned. The whole execution also depends on the efficiency of the variable-length path implementation.
Some of the following might help:
Is it feasible to start from a specific node?
If there are branches, the only hope is aggressive filtering because of exponential complexity (as cybersam well explained).
Use a smaller number in the variable expand, or a range, e.g., [:NEXT_BAR*..10000]. In this case, the query engine will match any path up to 10000 edges long (different semantics, but maybe applicable).
* implies DFS-style execution. On the other hand, BFS might be the right approach. Memgraph (DISCLAIMER: I'm the co-founder and CTO) also supports BFS-style expansion with a filtering lambda.
Here is a Python script I've used to generate and import data into Memgraph. By using a small nodes_no you can quickly notice the execution patterns.
import mgclient

# Make a connection to the database.
connection = mgclient.connect(
    host='127.0.0.1',
    port=7687,
    sslmode=mgclient.MG_SSLMODE_REQUIRE)
connection.autocommit = True
cursor = connection.cursor()

# Clean and setup database instance.
cursor.execute("""MATCH (n) DETACH DELETE n;""")
cursor.execute("""CREATE INDEX ON :Node(id);""")

# Import dataset.
nodes_no = 10

# Create nodes.
for identifier in range(0, nodes_no):
    cursor.execute("""CREATE (:Node {id: "%s"});""" % identifier)

# Create edges.
for identifier in range(1, nodes_no):
    cursor.execute("""
        MATCH (start_node:Node {id: "%s"})
        MATCH (end_node:Node {id: "%s"})
        CREATE (start_node)-[:NEXT_BAR]->(end_node);
    """ % (identifier - 1, identifier))

How to predict the I/O count of a MySQL query?

InnoDB organizes its data in B+ trees. The height of the tree affects the number of I/O operations, which may be one of the main reasons a DB slows down.
So my question is how to predict or calculate the height of the B+ tree (e.g. based on the count of pages, which can be calculated from row size, page size, and row number), and thus make a decision on whether or not to partition the data to different masters.
https://www.percona.com/blog/2009/04/28/the_depth_of_a_b_tree/
Let N be the number of rows in the table.
Let B be the number of keys that fit in one B-tree node.
The depth of the tree is (log N) / (log B).
From the blog:
Let's put some numbers in there. Say you have a billion rows, and you can currently fit 64 keys in a node. Then the depth of the tree is (log 10^9) / (log 64) ≈ 30/6 = 5. Now you rebuild the tree with keys half the size and you get (log 10^9) / (log 128) ≈ 30/7 ≈ 4.3. Assuming the top 3 levels of the tree are in memory, then you go from 2 disk seeks on average to 1.3 disk seeks on average, for a 35% speedup.
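As a quick sanity check, a few lines of Python reproduce the blog's arithmetic (base-2 logs):
import math

def btree_depth(rows, keys_per_node):
    # depth = log(N) / log(B), rounded up to whole levels
    return math.ceil(math.log2(rows) / math.log2(keys_per_node))

print(btree_depth(10**9, 64))              # 5
print(math.log2(10**9) / math.log2(128))   # ~4.27 before rounding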
I would also add that usually you don't have to optimize for I/O cost, because the data you use frequently should be in the InnoDB buffer pool, therefore it won't incur any I/O cost to read it. You should size your buffer pool sufficiently to make this true for most reads.
Simpler computation
The quick and dirty answer is log base 100, rounded up. That is, each node in the BTree has about 100 children. In some circles, this is called the fanout.
1K rows: 2 levels
1M rows: 3 levels
billion: 5 levels
trillion: 6 levels
These numbers work for "average" rows or indexes. Of course, you could have extremes of about 2 or 1000 for the fanout.
Exact depth
You can find the actual depth from the stats tables:
For Oracle's MySQL:
$where = "WHERE ( ( database_name = ? AND table_name = ? )
OR ( database_name = LOWER(?) AND table_name = LOWER(?) ) )";
$sql = "SELECT last_update,
n_rows,
'Data & PK' AS 'Type',
clustered_index_size * 16384 AS Bytes,
ROUND(clustered_index_size * 16384 / n_rows) AS 'Bytes/row',
clustered_index_size AS Pages,
ROUND(n_rows / clustered_index_size) AS 'Rows/page'
FROM mysql.innodb_table_stats
$where
UNION
SELECT last_update,
n_rows,
'Secondary Indexes' AS 'BTrees',
sum_of_other_index_sizes * 16384 AS Bytes,
ROUND(sum_of_other_index_sizes * 16384 / n_rows) AS 'Bytes/row',
sum_of_other_index_sizes AS Pages,
ROUND(n_rows / sum_of_other_index_sizes) AS 'Rows/page'
FROM mysql.innodb_table_stats
$where
AND sum_of_other_index_sizes > 0
";
For Percona:
/* to enable stats:
   percona < 5.5: set global userstat_running = 1;
   5.5:           set global userstat = 1; */
$sql = "SELECT  i.INDEX_NAME as Index_Name,
                IF(ROWS_READ IS NULL, 'Unused',
                   IF(ROWS_READ > 2e9, 'Overflow', ROWS_READ)) as Rows_Read
            FROM (
                SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
                    FROM information_schema.STATISTICS
                 ) i
            LEFT JOIN information_schema.INDEX_STATISTICS s
                   ON i.TABLE_SCHEMA = s.TABLE_SCHEMA
                  AND i.TABLE_NAME = s.TABLE_NAME
                  AND i.INDEX_NAME = s.INDEX_NAME
            WHERE i.TABLE_SCHEMA = ?
              AND i.TABLE_NAME = ?
            ORDER BY IF(i.INDEX_NAME = 'PRIMARY', 0, 1), i.INDEX_NAME";
(Those give more than just the depth.)
PRIMARY refers to the data's BTree. Names like "n_diff_pfx03" (in mysql.innodb_index_stats) refer to the 3rd level of the BTree; the largest such number for a table indicates the total depth.
Row width
As for estimating the width of a row, see Bill's answer. Here's another approach:
Look up the size of each column (INT=4 bytes, use averages for VARs)
Sum those.
Multiply by between 2 and 3 (to allow for overhead of InnoDB)
Divide that into 16KB to get the average number of rows per leaf node.
Non-leaf nodes, plus index leaf nodes, are trickier because you need to understand exactly what represents a "row" in such nodes.
(Hence, my simplistic "100 rows per node".)
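As a rough illustration of that arithmetic (the table and its column sizes are hypothetical):
# Rough row-width estimate for a hypothetical table.
col_bytes = {"id": 4, "user_id": 4, "created": 5, "note_avg": 40}  # INT=4, DATETIME=5, avg VARCHAR=40
raw_row = sum(col_bytes.values())            # ~53 bytes of raw data per row
row_with_overhead = raw_row * 2.5            # x2..3 for InnoDB overhead
rows_per_leaf = (16 * 1024) // int(row_with_overhead)
print(rows_per_leaf)                         # ~124 rows per 16KB leaf page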
But who cares?
Here's another simplification that seems to work quite well. Since disk hits are the biggest performance item in queries, you need to "count the disk hits" as the first order of judging the performance of a query.
But look at the caching of blocks in the buffer_pool. A parent node is 100 times as likely to be recently touched as the child node.
So, the simplification is to "assume" that all non-leaf nodes are cached and all leaf nodes need to be fetched from disk. Hence the depth is not nearly as important as how many leaf-node blocks are touched. This shoots down the "35% speedup" above -- sure, a 35% speedup for CPU, but virtually no speedup for I/O. And I/O is the important component.
Note that if you are fetching the latest 20 rows of a table that is stored chronologically, they will be found in the last 1 (or maybe 2) blocks. If the rows are stored by a UUID, fetching them is more likely to take 20 blocks -- many more disk hits, hence much slower.
Secondary Indexes
The PRIMARY KEY is clustered with the data. That implies that a lookup by the PK needs to drill down one BTree. But a secondary index is implemented by a second BTree -- drill down it to find the PK, then drill down via the PK. When "counting the disk hits", you need to consider both BTrees. Also consider the randomness (eg, for UUIDs) or not (date-ordered) of the keys.
Writes
1. Find the block (possibly cached)
2. Update it
3. If necessary, deal with a block split
4. Flag the block as "dirty" in the buffer_pool
5. Eventually write it back to disk.
Step 1 may involve a read I/O; step 5 may involve a write I/O -- but you are not waiting for it to finish.
Index updates
UNIQUE indexes must be checked before finishing an INSERT. This involves a potentially-cached read I/O.
For a non-unique index, an entry in the "Change buffer" is made. (This lives in the buffer_pool.) Eventually that is merged with the appropriate block on disk. That is, no waiting for I/O when INSERTing a row (at least not waiting to update non-unique indexes).
Corollary: UNIQUE indexes are more costly. But is there really any need for more than 2 such indexes (including the PK)?

Finding AES-256 keys

I have a question about AES keys.
I have a binary file which contains an AES-256 key (32 bytes) at an unknown offset.
Would it somehow be possible to find this key in the file? Is it somehow possible to tell whether the next 32 bytes would be a valid AES key?
Thanks in advance
EDIT:
Thanks for all of your answers,
The key is stored in the file as normal bytes.
I finally managed to create a way to get it.
I basically filtered out all strings, which actually made it work.
Thanks again
Well, yes and no. AES-256 keys should consist of just 32 bytes that are indistinguishable from random. Most files do not consist of just random bytes, so it could be possible to find a sequence that is most likely random, and this could be the key you are looking for. However, it might very well be that there are other random sequences in the file, or sequences that look random but aren't random at all (such as the binary representation of the number Pi).
It may also be that you are unlucky and the AES key doesn't look all that random. Or the key may be stored in hexadecimals (text) rather than binary byte values. Then there is the issue of finding the exact offset: is that initial byte with value 0x20 a length prefix indicating the size of the AES key, a space character, or part of the key value itself?
Most files have a specific format, so you should have a look at that first. Just looking for random sequences may give you both false positives (rather likely) and false negatives (less likely). If you expect 64 bytes of randomness (two keys), then I suggest you search for that first, as it brings down the chance of false positives by a rather large amount.
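A minimal sketch of that idea in Python: slide a 32-byte window over the file and rank offsets by Shannon entropy (a crude randomness score, so expect false positives; the file name is hypothetical):
import math
from collections import Counter

def window_entropy(buf):
    # Shannon entropy of the window, in bits per byte.
    counts = Counter(buf)
    n = len(buf)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def key_candidates(path, size=32, top=10):
    data = open(path, "rb").read()
    scored = [(window_entropy(data[off:off + size]), off)
              for off in range(len(data) - size + 1)]
    scored.sort(reverse=True)          # highest-entropy windows first
    return scored[:top]

for score, off in key_candidates("dump.bin"):
    print(f"offset {off:#x}: {score:.2f} bits/byte")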
No - unless you have a way to verify the key against a known plaintext/ciphertext pair - an AES key is not distinguishable from random noise. Any set of 16, 24 or 32 bytes is a valid AES key.

What is the name for encoding/encrypting with noise padding?

I want code to render n bits with n + x bits, non-sequentially. I'd Google it, but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs and transform them in a way that they can still be used for persistent URLs, but can't be iterated/enumerated sequentially, and where a client cannot determine programmatically whether a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to ensure (and appear at first blush to have ensured) that there is only one correct decode, and accept the increase in size of the output (in the case above, the output will be double the size of the input, bit for bit).
I think you'd be better off encrypting the number with a cheap cipher plus a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply rejecting any responses that don't have a valid checksum.
<encrypt(secret)>
    <integer> + <random nonsense>
</encrypt>
+
<checksum()>
    <integer> + <random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate it against the checksum, throw away the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.
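A minimal Python sketch of this shape (all names are hypothetical; an HMAC-derived XOR keystream stands in for the "cheap cipher" and CRC-32 for the checksum -- a toy construction, not vetted cryptography):
import hmac, hashlib, secrets, struct, zlib

SECRET = b"server-side-secret"  # hypothetical key kept on the server

def encode_id(n: int) -> str:
    nonce = secrets.token_bytes(4)                       # the "random nonsense"
    body = struct.pack(">I", n)                          # the integer ID
    body += struct.pack(">I", zlib.crc32(nonce + body))  # cheap checksum
    pad = hmac.new(SECRET, nonce, hashlib.sha256).digest()[:len(body)]
    cipher = bytes(a ^ b for a, b in zip(body, pad))     # "cheap cipher": XOR keystream
    return (nonce + cipher).hex()

def decode_id(token: str) -> int:
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:4], raw[4:]
    pad = hmac.new(SECRET, nonce, hashlib.sha256).digest()[:len(cipher)]
    body = bytes(a ^ b for a, b in zip(cipher, pad))
    value, check = struct.unpack(">I", body[:4])[0], struct.unpack(">I", body[4:])[0]
    if zlib.crc32(nonce + body[:4]) != check:
        raise ValueError("bad checksum")                 # reject tampered/guessed URLs
    return value

token = encode_id(42)      # different every call, thanks to the nonce
print(token, decode_id(token))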

Maximizing Stored Information (Entropy?)

So I'm not sure if this question belongs here or maybe on MathOverflow. In any case, my question is about information theory.
Let's say I have a 16 bit word. There are 65,536 unique configurations of 1's and 0's in that number. What each one of those configurations represents is unimportant as depending on your notation (2's complement vs signed magnitude etc.) the same configuration can mean different things.
What I'm wondering is are there any techniques to store more information than that in a 16 bit word?
My original ideas were like odd/even parity or something but then I realized that's already determined by the configuration... i.e. there is no extra information encoded in that. I'm beginning to wonder if no such thing exists.
EDIT For example, let's say some magical computer (thinking quantum or something here) could understand 0, 1, a. Then obviously we have 3^16 configurations and can now store more than the 2^16 values 0 - 65,535. Are there any other properties of a 16 bit word that you can mess with in order to encode extra information in your bit stream?
EDIT2 I am really struggling to put this into words. Right now when I look at a 16 bit word in the computer, the property which conveys information to me is the relative ordering of individual 1's and 0's. Is there another property or way of looking at a 16 bit word which would allow more than 2^16 unique "configurations"? (Note it would no longer be a configuration, but 2^16 xxxx's where xxxx is a noun describing an instance of that property.) The only thing I can really think of is looking at something like the number of 1-to-0 transitions rather than whether each bit is actually a 1 or 0. Now, transitions do not yield more than 2^16 combinations because they are ultimately solely dependent on the configuration of 1's and 0's. I'm looking for properties that would derive from the configuration of 1's and 0's AND something else, thus resulting in MORE than 2^16. Does anyone even know what this would be called if it did exist?
EDIT3 Ok, I got it. My question boils down to this: how do we prove that the configuration of 1's and 0's in a word completely defines it? I.e., how do we prove that you need no other information besides the bitmap to show equality between two 16 bit words?
FINAL EDIT
I have an example... If instead of looking at the presence of 1's and 0's we look at the transitions between bits, we can store 2^16 alphabet characters. If the bit to the left is the same, treat it as a 1; if it transitions, treat it as a 0. Using the 16 bit word as a circularly-linked-list type structure where each link represents 0/1, we basically form a 16 bit word out of the transitions between bits. That is an exact example of what I was looking for, but it still results in 2^16, nothing better. I am convinced that you cannot do better and am marking the correct answer =(
The amount of information in a particular configuration of 16 0/1s is determined by the probability of this configuration (this is called self-information). This can be bigger than 16 bits if the configuration is less likely than 1/(2^16), but that means that some other configurations are more likely than 1/(2^16) and so will contain less information than 16 bits.
To take into account all the possible configurations, you have to use the expected value of self-information (called entropy) of individual configurations. This value will reach its maximum when the probabilities of all configurations are equal (that is 1/(2^16)) and then it will be exactly 16 bits.
So the answer is no, you cannot store more than 16 bits of information in 16 0/1s.
See
http://en.wikipedia.org/wiki/Information_theory
http://en.wikipedia.org/wiki/Self-information
EDIT It is important to realize that bit does not stand for 0 or 1, but it is a unit of information, that is -log_2 P(w) where P(w) is the probability of a particular configuration.
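A small Python illustration of both definitions (the uniform distribution maximizes the entropy at exactly 16 bits):
import math

def self_information(p):
    # Self-information of an outcome with probability p, in bits.
    return -math.log2(p)

def entropy(probs):
    # Expected self-information over all outcomes.
    return sum(p * self_information(p) for p in probs if p > 0)

n = 2 ** 16
print(entropy([1 / n] * n))                        # 16.0 bits -- the maximum
print(entropy([0.5] + [0.5 / (n - 1)] * (n - 1)))  # ~9 bits once configurations are unequal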
You cannot store more than 2 states in one digit of a semiconductor device. You answered it yourself: the only way more information can be fitted into 16 digits is if each digit can take more possible values.
