Converting virtual address to page table entry - unix

I am reading Modern Operating Systems, 3rd Edition, by A. S. Tanenbaum, and I've come to the chapter on virtual memory management. I've been stuck on one part for some time now and can't get my head around it. Either it's a typo in the book, or I have misunderstood something.
Suppose we have a multi-level page table with two levels, where we map 32-bit virtual addresses to physical memory frames.
The fields for the page tables and the offset are
10 | 10 | 12
meaning we have a top-level page table with 1024 entries and a page size of 4096 bytes (4 KB). Entry 0 of the top-level page table points to the text segment page table, entry 1 to the data segment page table, and entry 1023 to the stack page table.
This is quoted from the book:
As an example, consider the 32-bit virtual address 0x00403004
(4,206,596 decimal), which is 12,292 bytes into the data. This virtual
address corresponds to PT 1 = 1, PT2 = 2, and Offset = 4. The MMU
first uses PT1 to index into the top-level page table and obtain entry
1, which corresponds to addresses 4M to 8M. It then uses PT2 to index
into the second-level page table just found and extract entry 3, which
corresponds to addresses 12288 to 16383 within its 4M chunk (i.e.,
absolute addresses 4,206,592 to 4,210,687).
When I translate the address to binary, I get 0000000001|0000000011|000000000100 which for me corresponds to PT1 = 1, PT2 = 3, Offset = 4.
Am I missing something, or is this a typo in the book stating PT2 = 2, when it actually should be PT2 = 3? As the text later says, the MMU uses the PT2 index to extract entry 3.
Where does the "12,292 bytes into the data" come from? How is that derived from the virtual address? I understand it has something to do with the offset, but I can't figure out how it's done. As far as I have understood, the physical address is derived as a combination of the frame number from the second page table, and the offset. I see that the 12,292 is a result of 3*4096+4 (PT2 entry * page size + offset). Is this correct?

1) I think this is a typo indeed, as 0000000001|0000000011|000000000100 binary = 4,206,596 decimal
2) The answer is in the previous paragraph of the book:
Entry 0 of the top-level page table points to the page table for the program text, entry 1 points to the page table for the data, and entry 1023 points to the page table for the stack
So he is just saying that 0000000001|0000000011|000000000100 corresponds to 0000000011|000000000100 into the data. Indeed 0000000001 corresponds to the data in the top level page table, and 0000000011|000000000100 binary = 12,292 decimal.
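The split can also be checked mechanically. This is a minimal sketch, assuming the 10 | 10 | 12 layout from the question; the function name `split_virtual_address` is made up for illustration and does not come from the book.

```python
def split_virtual_address(addr, pt1_bits=10, pt2_bits=10, offset_bits=12):
    """Split a 32-bit virtual address into (PT1, PT2, offset)."""
    # Low 12 bits: byte offset within the page.
    offset = addr & ((1 << offset_bits) - 1)
    # Next 10 bits: index into the second-level page table.
    pt2 = (addr >> offset_bits) & ((1 << pt2_bits) - 1)
    # Top 10 bits: index into the top-level page table.
    pt1 = (addr >> (offset_bits + pt2_bits)) & ((1 << pt1_bits) - 1)
    return pt1, pt2, offset


pt1, pt2, offset = split_virtual_address(0x00403004)
# Distance from the start of the 4 MB chunk selected by PT1.
bytes_into_data = pt2 * 4096 + offset
print(pt1, pt2, offset, bytes_into_data)  # 1 3 4 12292
```

The result PT2 = 3 agrees with the binary expansion in the question and with the book's "extract entry 3", which supports reading "PT2 = 2" as a typo.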

I didn't finish converting to binary, out of despair and a little laziness, but it's clearer now thanks to you. Seeing it written out made things much clearer.
It is a typo, as the previous answerer stated, without a doubt (check the US 4th edition).
"12,292 bytes into the data" comes from subtracting 2^22 (4 MB, the range covered by one top-level page table entry) from 4,206,596.
TL;DR: 4,206,596 - 2^22 (4,194,304) = 12,292


Wiegand card numbers are seen differently on Hikvision and ZKTeco

I have a problem with two access control panels: one is a Hikvision and the other is a ZKTeco CCA-400. The two panels read the same Wiegand card differently, which is a big problem because I cannot import cards from ZKTeco to Hikvision or the other way around.
Currently I have a card that is physically labeled with the following:
0002821060 043,03012
Hikvision panel sees the card as: 2821060
ZKTeco panel sees the card as: 04303012
My final goal is to understand why this is happening and to build a custom Wiegand rule on the Hikvision so that both panels see identical card IDs.
I searched and couldn't figure it out, so to debug this I connected a Wiegand reader to an Arduino UNO just to see what is coming over the wire from the reader. The results made the problem even more confusing.
I tried two Wiegand libraries:
https://github.com/paulo-raca/YetAnotherArduinoWiegandLibrary
and
https://github.com/monkeyboard/Wiegand-Protocol-Library-for-Arduino
Surprise!
The first library sees the card as:
Read 26 bits. 0001010110000101111000100100000000
FC = 43, CC = 3012
This is exactly what the ZKTeco panel sees.
The second library sees the card as:
Card readed: 24bits / 2B0BC4
That in decimal is 2821060, exactly what the Hikvision is seeing.
Can anyone explain why this is happening? From reading the docs, the protocol is pretty straightforward and should not really produce two independent IDs.
Hopefully I managed to explain the issue in a good way.
Thanks!
It sounds like the difference in what you are seeing is the two parity bits. Each half of the number encoded into the card has its own parity bit, with one half odd parity, and the other half even parity. In addition to detecting read errors, these two bits allow detection of use of a Wiegand card normally vs. upside-down.
You might check that by determining the reaction of the two devices to running the card through with the front side toward the back. My guess would be the one that only reports 24-bits may ignore reversed reads, but the other might report a different number (with bits reversed from the first one.)
I worked for Kastle Systems on what was probably the first commercial application of Wiegand cards for security almost 40 years ago. The parity scheme was similar to that used on UPC barcode readers. I see there are still documents on the web describing the Kastle 32-bit Wiegand format, which may be helpful to you.
I managed to sort this out. It seems that ZKTeco and Hikvision handle the conversion from hex to decimal differently, which is why there are two different numbers on the card.
So it goes like this, we have a card that has physically printed the following sequence of numbers: 0002821060 043,03012
We convert 2821060 to HEX = 2B0BC4 ( This is what ZKTeco sees )
For Hikvision:
We convert 2B to DEC = 43
We convert 0BC4 to DEC = 3012
The resulting decimal number is 43 3012, pretty close to what the access panel sees. Now we have to pad it to 8 digits, like this:
04303012
The first part (the facility code) is padded with leading zeros to 3 digits, so a value below 100 gets a 0 in front.
The second part (the card number) is padded with leading zeros to 5 digits.
Conclusion:
Hikvision correctly converts the card to Wiegand 26 format (facility code + card ID), whereas ZKTeco converts the entire card number to decimal directly, without splitting it into facility code and card ID.
Hopefully this will be helpful to other people who have to deal with this type of access control panel.
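The relationship between the two numbers can be checked against the raw bits printed by the first library. The sketch below assumes the standard 26-bit layout (a leading even-parity bit over the first 12 data bits, 8 facility-code bits, 16 card-number bits, and a trailing odd-parity bit over the last 12 data bits). It also assumes that the 8 trailing zeros in the reader output quoted in the question are padding. The function name `decode_wiegand26` is hypothetical.

```python
def decode_wiegand26(bits):
    """Decode a 26-bit Wiegand frame given as a string of '0'/'1'.

    Returns (facility_code, card_number, raw_24_bit_value).
    """
    if len(bits) != 26 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 26 binary digits")
    # First 13 bits (parity bit + 12 data bits) must hold an even number of 1s.
    if bits[:13].count("1") % 2 != 0:
        raise ValueError("even parity check failed")
    # Last 13 bits (12 data bits + parity bit) must hold an odd number of 1s.
    if bits[13:].count("1") % 2 != 1:
        raise ValueError("odd parity check failed")
    facility_code = int(bits[1:9], 2)   # 8 bits after the leading parity bit
    card_number = int(bits[9:25], 2)    # next 16 bits
    raw_value = int(bits[1:25], 2)      # all 24 data bits as one number
    return facility_code, card_number, raw_value


# First 26 bits of the output quoted in the question.
frame = "0001010110000101111000100100000000"[:26]
facility_code, card_number, raw_value = decode_wiegand26(frame)
print(facility_code, card_number, raw_value)  # 43 3012 2821060
```

Both printed forms come from the same 24 data bits: read as one number they give 2821060, and split 8 + 16 they give 43 and 3012.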
I wrote a fragment of messy code that converts a ZK exported personnel file to Hikvision card format.
def convert(dec):
    # Split the 24-bit card number into the facility code (top 8 bits)
    # and the card number (low 16 bits).
    facility = (dec >> 16) & 0xFF
    card = dec & 0xFFFF
    # Zero-pad to 3 + 5 decimal digits, e.g. 2821060 -> "04303012".
    return "{:03d}{:05d}".format(facility, card)

pid = 1
with open("zk.csv") as f:
    rows = [line.rstrip("\n").split(",") for line in f]

for x in rows:
    if x[1] == "":
        name = "Fara nume"  # Romanian for "No name"
    else:
        name = "{} {}".format(x[1], x[2])
    cardid = convert(int(x[3]))
    # Prefix an apostrophe so a spreadsheet keeps the leading zero.
    if cardid[:1] == "0":
        cardid = "'{}".format(cardid)
    print("{},NAN,{},1,,,,,{},,".format(pid, name, cardid))
    pid += 1
This is the only post on the internet I can find on this topic, and I have a similar issue. Please do not delete this as was done previously, because it might help someone.
What I did was convert the serialized RFID number to site code + card ID (the Hikvision way) with Excel:
M88 is DEC2HEX(L88) (M88 is the Hex of the serialized card number)
first 3 decimal digits
=IF(LEN(HEX2DEC(MID(M88;1;2)))=1;"00"&HEX2DEC(MID(M88;1;2));IF(LEN(HEX2DEC(MID(M88;1;2)))=2;"0"&HEX2DEC(MID(M88;1;2));HEX2DEC(MID(M88;1;2))))
last 5 decimal digits
=IF(LEN(HEX2DEC(MID(M88;3;4)))<5;"0"&HEX2DEC(MID(M88;3;4));HEX2DEC(MID(M88;3;4)))
This is a good calculator I've found, but it is not suitable for hundreds of lines of RFIDs:
https://btrockford.com/security/card-access-control/proximity-card-calculator/
There is probably an option to use custom Wiegand rules to modify the Hikvision default behavior (I haven't had success with that so far).

In GUID Partition Table how can I know how many partitions there are?

I have an image of a USB drive with 3 partitions:
Partition 1: FAT32
Partition 2: exFAT
Partition 3: NTFS
I am making a program that goes through the partitions, but I am unsure how to know how many partitions my program should look for. By looking at the raw data I can see that it has three partitions as expected, but of course my program doesn't know this.
I tried to look at "80 (0x50) 4 bytes Number of partition entries in array", but in my example it gave me the value 128 (little-endian bytes 80 00 00 00).
Here are screenshots of hex from my example image.
Protective MBR
Partition table header (LBA 1)
signature=- HexLe=4546492050415254 HexBe=5452415020494645
revisionHexLe=000001 HexLe=4546492050415254 HexBe=5452415020494645
headerSizeDec=92 HexLe=5C000000 HexBe=0000005C
crc2OfHeaderDec=82845332 HexLe=941EF004 HexBe=04F01E94
reservedADec=0 HexLe=00000000 HexBe=00000000
currentLBADec=1 HexLe=0100000000000000 HexBe=0000000000000001
backupLBADec=30277631 HexLe=FFFFCD0100000000 HexBe=0000000001CDFFFF
firstUsableLBAForPartitionsDec=34 HexLe=2200000000000000 HexBe=0000000000000022
lastUsableLBADec=30277598 HexLe=DEFFCD0100000000 HexBe=0000000001CDFFDE
diskGUIDHexMe=8B3F71C5AF9D744D9CA3EBFF7D1F9DC9
startingLBAOfArrayOfPartitionEntriesDec=2 HexLe=0200000000000000 HexBe=0000000000000002
numberOfPartitionEntriesInArrayDec=128 HexLe=80000000 HexBe=00000080
sizeOfASinglePartitionEntryDec=128 HexLe=80000000 HexBe=00000080
crc2OfPartitionEntriesArrayDec=-2043475264 HexLe=C00A3386 HexBe=86330AC0
reservedBDec=00000000 HexLe=00000000 HexBe=00000000
We are going to look for partitions now at offset 1024
Partition entries (LBA 2–33)
A bit late, and you may have already figured it out by now.
Refer to the GUID Partition Table layout figure on the wiki page; the page itself gives further information.
It is not possible to determine the number of partitions just by looking at the GUID partition table header at LBA 1. You have to examine the partition entries and check whether each entry's partition type GUID marks it as unused (all zeros) or not.
The number of partition entries in the header at offset 80 (0x50) is the total number of entry slots in the array, used or not (128 in your image); the size of a single entry is stored at offset 84 (0x54).
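A minimal sketch of that scan, assuming 512-byte sectors and an image that has been read into memory as bytes. The function name `count_used_partitions` and the synthetic demo image are made up for illustration; a real program would read the image file and should also verify the header CRCs.

```python
import struct


def count_used_partitions(image, sector_size=512):
    """Count GPT entries whose partition type GUID is not all zeros."""
    header_start = sector_size  # the primary GPT header lives at LBA 1
    if image[header_start:header_start + 8] != b"EFI PART":
        raise ValueError("no GPT signature at LBA 1")
    # Offset 72: starting LBA of the entry array (8 bytes),
    # offset 80: number of entry slots (4 bytes),
    # offset 84: size of one entry (4 bytes); all little-endian.
    entries_lba, slot_count, entry_size = struct.unpack_from(
        "<QII", image, header_start + 72)
    used = 0
    for i in range(slot_count):
        start = entries_lba * sector_size + i * entry_size
        type_guid = image[start:start + 16]
        if len(type_guid) < 16:
            break  # image is truncated; stop scanning
        if any(type_guid):  # any non-zero byte means the slot is in use
            used += 1
    return used


# Synthetic 34-sector image with three non-empty slots (illustration only).
demo = bytearray(34 * 512)
demo[512:520] = b"EFI PART"
struct.pack_into("<QII", demo, 512 + 72, 2, 128, 128)
for slot in range(3):
    demo[1024 + slot * 128] = 0xEB  # any non-zero byte in the type GUID
demo_count = count_used_partitions(bytes(demo))
print(demo_count)  # 3
```

The loop checks every slot instead of stopping at the first empty one, because nothing in the header says the used entries must be contiguous.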

16 bit logical address to 16 bit physical address

I'm studying for finals. I am studying this question:
Convert the following 16-bit logical address into the 16-bit physical address given
the 6-bit page number and 10-bit offset. Use the supplied process page table.
Logical Address 0000010111011110.
How do I calculate the physical address?
My professor gave us the answer 0001100111011110, but I do not know how she calculated it.
Thanks.
Take the top 6 bits, and use their value as an index into your process page table. In this case, the top 6 bits evaluate to 1, so you replace those bits with the value in entry 1: 000110. The low 10 bits (the offset, 0111011110) are copied unchanged, which gives 000110 0111011110.
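The same steps as a short sketch. The page table itself is not shown in the question, so the single entry used here (page 1 maps to frame 000110) is taken from the answer above, and the function name `translate` is made up for illustration.

```python
def translate(logical, page_table, offset_bits=10):
    """Translate a logical address using a page-number -> frame-number table."""
    page = logical >> offset_bits                 # top 6 bits of a 16-bit address
    offset = logical & ((1 << offset_bits) - 1)   # low 10 bits, copied unchanged
    frame = page_table[page]                      # look up the frame number
    return (frame << offset_bits) | offset


# Only the entry needed for this example; the rest of the table is unknown.
page_table = {1: 0b000110}
physical = translate(0b0000010111011110, page_table)
result = format(physical, "016b")
print(result)  # 0001100111011110
```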

What is the name for encoding/encrypting with noise padding?

I want code to render n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs, and transform them in a way they can still be used for persistent URLs, but that can't be iterated/enumerated sequentially, and where a client cannot determine programmatically if a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to ensure that there is only one correct decoding (which, at first blush, your table appears to do), and accept the increase in output size (in the case above, the output is double the size of the input, bit for bit).
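A minimal sketch of the table from the question, to show what "only one correct decoding" means in practice: every output code must belong to exactly one input value. The names `TABLE`, `encode`, and `decode` are made up for illustration, and `secrets.choice` is just one way to pick among the candidates.

```python
import secrets

# Input value -> candidate output codes, copied from the question.
TABLE = {
    0: [1, 2, 7, 9],
    1: [3, 8, 12, 13],
    2: [0, 4, 6, 11],
    3: [5, 10, 14, 15],
}

# Reverse lookup: output code -> input value.
DECODE = {code: value for value, codes in TABLE.items() for code in codes}

# If a code appeared under two values, the reverse table would be smaller
# than the total number of codes and decoding would be ambiguous.
if len(DECODE) != sum(len(codes) for codes in TABLE.values()):
    raise ValueError("a code is assigned to more than one value")


def encode(value):
    """Pick one of the candidate codes for this value at random."""
    return secrets.choice(TABLE[value])


def decode(code):
    """Map any of the candidate codes back to the original value."""
    return DECODE[code]


print(decode(encode(2)))  # 2
```

A fixed table like this only hides the mapping for as long as the table stays secret, which is part of why the keyed approach described next scales better to real ID ranges.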
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the result using the checksum, throw away the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.

Hardware Cache Formulas (Parameter)

The image below was scanned (poorly) from Computer Systems: A Programmer's Perspective. (I apologize to the publisher). This appears on page 489.
Figure 6.26: Summary of cache parameters http://theopensourceu.com/wp-content/uploads/2009/07/Figure-6.26.jpg
I'm having a terribly difficult time understanding some of these calculations. At the moment, what is troubling me is the calculation for M, which is supposed to be the number of unique addresses: "Maximum number of unique memory addresses." What is 2^m supposed to mean? I think m is calculated as log2(M). This seems circular...
For the sake of this post, assume the following in the event you want to draw up an example: 512 sets, 8 blocks per set, 32 words per block, 8 bits per word
Update: All of the answers posted thus far have been helpful, but I still think I'm missing something. cwrea's answer provides the biggest bridge for my understanding. I feel like the answer is on the tip of my mental tongue. I know it is there but I can't identify it.
Why does M = 2^m but then m = log2(M)?
Perhaps the detail I'm missing is that for a 32-bit machine, we'd assume M = 2^32. Does this single fact allow me to solve for m? m = log2(2^32)? But then this gets me back to 32... I have to be missing something...
m & M are related to each other, not defined in terms of each other. They call M a derived quantity however since usually the processor/controller is the limiting factor in terms of the word length it uses.
On a real system they are predefined. If you have an 8-bit processor, it generally can handle 8-bit memory addresses (m = 8). Since you can represent 256 values with 8 bits, you can have a total of 256 memory addresses (M = 2^8 = 256). As you can see, we start with the little m due to the processor constraints, but you could always decide you want a memory space of size M and use that to select a processor that can handle it, based on word size = log2(M).
Now if we take your assumptions for your example,
512 sets, 8 blocks per set, 32 words
per block, 8 bits per word
I have to assume this is an 8-bit processor given the 8-bit words. At that point your described cache is larger than your address space (256 words) & therefore pretty meaningless.
You might want to check out Computer Architecture Animations & Java applets. I don't recall if any of the cache ones go into the cache structure (usually they focus on behavior), but it is a resource I saved in the past to tutor students in architecture.
Feel free to further refine your question if it still doesn't make sense.
The two equations for M are just a relationship. They are two ways of saying the same thing. They do not indicate causality, though. I think the assumption made by the author is that the number of unique address bits is defined by the CPU designer at the start via requirements. Then the M can vary per implementation.
m is the width in bits of a memory address in your system, e.g. 32 for x86, 64 for x86-64. Block size on x86, for example, is 4K, so b=12. Block size more or less refers to the smallest chunk of data you can read from durable storage -- you read it into memory, work on that copy, then write it back at some later time. I believe tag bits are the upper t bits that are used to look up data cached locally very close to the CPU (not even in RAM). I'm not sure about the set lines part, although I can make plausible guesses that wouldn't be especially reliable.
Circular ... yes, but I think it's just stating that the two variables m and M must obey the equation. M would likely be a given or assumed quantity.
Example 1: If you wanted to use the formulas for a main memory size of M = 4GB (4,294,967,296 bytes), then m would be 32, since M = 2^32, i.e. m = log2(M). That is, it would take 32 bits to address the entire main memory.
Example 2: If your main memory size assumed were smaller, e.g. M = 16MB (16,777,216 bytes), then m would be 24, which is log2(16,777,216).
It seems you're confused by the math rather than the architectural stuff.
2^m ("2 to the m'th power") is 2 * 2... with m 2's. 2^1 = 2, 2^2 = 2 * 2 = 4, 2^3 = 2 * 2 * 2 = 8, and so on. Notably, if you have an m bit binary number, you can only represent 2^m different numbers. (is this obvious? If not, it might help to replace the 2's with 10's and think about decimal digits)
log2(x) ("logarithm base 2 of x") is the inverse function of 2^x. That is, log2(2^x) = x for all x. (This is a definition!)
You need log2(M) bits to represent M different numbers.
Note that if you start with M=2^m and take log2 of both sides, you get log2(M)=m. The table is just being very explicit.
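To make the relationship concrete, here is a small sketch that applies it to the numbers from the question. It assumes a 32-bit address (m = 32), which the question only mentions as a possibility, and it reads "32 words per block, 8 bits per word" as 32 bytes per block. The remaining formulas (s = log2(S), b = log2(B), t = m - (s + b), C = S * E * B) are the usual cache parameter definitions. The helper name `log2_exact` is made up for illustration.

```python
def log2_exact(n):
    """Return log2(n) for an exact power of two, else raise ValueError."""
    exponent = n.bit_length() - 1
    if n <= 0 or (1 << exponent) != n:
        raise ValueError("expected a power of two")
    return exponent


# M and m are two views of the same fact.
M = 2 ** 32            # number of unique addresses (assumed 32-bit machine)
m = log2_exact(M)      # address width in bits

# Example from the question: 512 sets, 8 blocks per set, 32 bytes per block.
S, E, B = 512, 8, 32
s = log2_exact(S)      # set index bits
b = log2_exact(B)      # block offset bits
t = m - (s + b)        # tag bits: whatever is left of the address
C = S * E * B          # cache capacity in bytes

print(m, s, b, t, C)   # 32 9 5 18 131072
```

Starting from m and computing M = 2^m gives the same pair of values, so neither definition has to come first.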
