MSP430 MEMORY ADDRESS IN CCS6 - msp430

I wrote my very first LED on/off program for the MSP-EXP430F5529LP
and I wanted to analyze it, but I hit a problem at the very first step.
I extracted the LED program from the board and got data I don't understand (3 below).
That's my first question: what file format is my memory dump in?
My second question is: why doesn't CCS 6 display memory addresses the way I expect?
I know the MSP430 is a 16-bit MCU, so I assumed every memory address would be 16 bits wide, but the assembly code (2 below), copied from the CCS 6 Disassembly view, shows addresses in an 01XXXX format.
Relative data references and execution-flow branches work fine, so why does CCS 6 confuse me by displaying addresses 24 bits wide?
If anyone knows of a TI document that explains this, please let me know, but please don't just point me at the MSP430xxxx User's Guide.
sorry for my english :(
1. C code
#include <msp430f5529.h>

volatile unsigned int i;

void main(void) {
    WDTCTL = WDTPW | WDTHOLD;          // stop the watchdog timer
    P1DIR |= 0x01;                     // configure P1.0 as an output
    while (1) {
        P1OUT ^= 0x01;                 // toggle P1.0 (the LED)
        for (i = 20000; i > 0; i--);   // software delay
    }
}
2. Assembly code
0100c2: 40B2 5A80 015C MOV.W #0x5a80,&Watchdog_Timer_WDTCTL
0100c8: D3D2 0204 BIS.B #1,&Port_A_PADIR
0100cc: E3D2 0202 XOR.B #1,&Port_A_PAOUT
0100d0: 40B2 4E20 2400 MOV.W #0x4e20,&i
0100d6: 3C02 JMP (0x00dc)
0100d8: 8392 2400 DEC.W &i
0100dc: 9382 2400 TST.W &i
0100e0: 27F5 JEQ (0x00cc)
0100e2: 3FFA JMP (0x00d8)
0100e4: 4303 NOP
0100e6: D032 0010 BIS.W #0x0010,SR
0100ea: 3FFD JMP (0x00e6)
0100ec: 431C MOV.W #1,R12
0100ee: 0110 RETA
0100f0: 4303 NOP
0100f2: 3FFF JMP (0x00f2)
3. Memory dump (MAIN)
:1044000031400044b113ec000c930224b1130000be
:104410000c43b113c200b113f00000000200000011
:10442000840001001a44000000240000ffffffff89
:10443000ffffffffffffffffffffffffffffffff8c
:10444000ffffffffffffffffffffffffffffffff7c
...
...

If one reads the User's Guide (which is why it exists), one is informed that the Program Counter is 20 bits wide. So now you know why you see addresses that do not fit into 16 bits: CCS simply prints the 20-bit address with six hex digits.
Link to the MSP430 User Guide: http://www.ti.com/lit/ug/slau208n/slau208n.pdf
The 20-bit PC (PC/R0) points to the next instruction to be executed.
Each instruction uses an even number of bytes (2, 4, 6, or 8 bytes),
and the PC is incremented accordingly. Instruction accesses are
performed on word boundaries, and the PC is aligned to even addresses.
Figure 6-3 shows the PC.
The above is an excerpt from the User's Guide. I cannot emphasize this enough: you really need to read the User's Guide. Attempting to program microcontrollers without doing so is perilous to your mental health.
The memory dump seems to be in the Intel hex file format https://en.wikipedia.org/wiki/Intel_HEX
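For reference, each line of such a dump is one record: a ':' start code, a data byte count, a 16-bit load address, a record type, the data bytes, and a two's-complement checksum. Below is a minimal C sketch (not a TI tool) that splits the first record of the dump above into those fields and verifies its checksum; it prints len=16 addr=0x4400 type=0 checksum OK, i.e. 16 data bytes to be loaded at address 0x4400.
#include <stdio.h>
#include <string.h>
/* Read the i-th byte (two hex digits) after the leading ':'. */
static unsigned byte_at(const char *rec, size_t i)
{
    unsigned v = 0;
    sscanf(rec + 1 + 2 * i, "%2x", &v);
    return v;
}
int main(void)
{
    /* First record of the MAIN memory dump shown above. */
    const char *rec = ":1044000031400044b113ec000c930224b1130000be";
    unsigned len  = byte_at(rec, 0);                          /* data byte count          */
    unsigned addr = (byte_at(rec, 1) << 8) | byte_at(rec, 2); /* 16-bit load address      */
    unsigned type = byte_at(rec, 3);                          /* 00 = data record         */
    size_t   n    = (strlen(rec) - 1) / 2;                    /* all bytes incl. checksum */
    unsigned sum  = 0;
    for (size_t i = 0; i < n; i++)
        sum += byte_at(rec, i);                               /* must be 0 modulo 256     */
    printf("len=%u addr=0x%04X type=%u checksum %s\n",
           len, addr, type, (sum & 0xFFu) == 0 ? "OK" : "BAD");
    return 0;
}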

Related

How to simulate Real Time Interrupt in 68HC11 THRSim11 simulator

How do you simulate the RTI (Real Time Interrupt) in the 68HC11 THRSim11 simulator (see http://www.hc11.demon.nl/thrsim11/thrsim11.htm)? The following program works on the 68HC11 module but not in THRSim11. It's a test program that reads from the analog-to-digital converter and displays the result on the serial port using the RTI. I tried the RTI interrupt vector at both $00EB and $FFF0. My chip is the 68HC711E9 with the following memory map.
I expected THRSim11 to simulate the interrupt vector. While the program sits in the "again BRA again" loop after CLI (interrupts enabled), the routine that reads the ADC and writes to the serial port should be running. It works perfectly on my 68HC711E9 evaluation board with Buffalo.
REGBS EQU $1000 ;start of registers
BAUD EQU REGBS+$2B ;sci baud reg
SCCR1 EQU REGBS+$2C ;sci control1 reg
SCCR2 EQU REGBS+$2D ;sci control2 reg
SCSR EQU REGBS+$2E ;sci status reg
SCDR EQU REGBS+$2F ;sci data reg
TMSK2 EQU REGBS+$24 ;Timer Interrupt Mask Register 2
TFLG2 EQU REGBS+$25 ;Timer Interrupt Flag Register 2
ADR3 EQU $1033 ;ADC address 3
OPTION EQU $1039 ;ADC enable
SCS EQU $2E ;SCSR low bit
ADCTL EQU $1030 ;ADC setting
ADCT EQU $30 ;ADC setting low bit
PACTL EQU $1026 ;Pulse Accumulator control
***************************************************************
* Main program starts here *
***************************************************************
ORG $0110
* ORG $E000
start LDS #$01FF ;set stack pointer
JSR ONSCI ;initialize serial port
JSR t_init ;initialize timer
CLI ;enable interrupts
again BRA again
************************************************************
* t_init - Initialize the RTI timer
************************************************************
t_init LDAA #$01 ; set PTR1 and PTR0 to 0 and 1
STAA PACTL ;which leads to an RTI rate of 8.19 ms
LDAA #$40
STAA TFLG2 ;clears RTIF flag (write 1 in it!)
STAA TMSK2 ;sets RTII to allow interrupts
RTS
************************************************************
* ADC_SERIAL - timer overflow interrupt service routine
************************************************************
ADC_SERIAL
LDX #REGBS
LDAA #%00010010
STAA ADCTL
LDAB #6
ADF00 DECB
BNE ADF00
ldaa ADR3 ; read ADC value
ldab SCSR ; read first Status
staa SCDR ; save in TX Register
BUFFS BRCLR SCS,X #$80 BUFFS
CLRFLG LDAA #$40
STAA TFLG2 ;clear RTIF
RTI ;return from ISR
************************************************************
* ONSCI() - Initialize the SCI for 9600
* baud at 8 MHz
************************************************************
ONSCI LDAA #$30
STAA BAUD baud register
LDAA #$00
STAA SCCR1
LDAA #$0C
STAA SCCR2 enable
LDAA #%10011010 ; enable the ADC
STAA OPTION
RTS
* Interrupt Vectors for BUFALO monitor
* ORG $FFF0 ;RTI vector for microcontroller
*
ORG $00EB ;Real Time Interrupt under Buffalo monitor
JMP ADC_SERIAL ;this instruction is executed every
* time there is a timer overflow
Presumably you mixed up "vector table" and "jump table". The HC11 expects an address at $FFF0, not an instruction.
In contrast, the Buffalo monitor expects an instruction at $00EB.
ORG $FFF0 ;RTI vector for microcontroller
FDB ADC_SERIAL
ORG $FFFE ;Reset vector for microcontroller
FDB start
As you will note, the same holds true for the reset vector at $FFFE.
With these changes it works for me. Be aware that the simulation is really slow*, depending on the number and kind of views opened.
Another side note: you send the single byte of the conversion result without further processing. The serial receiver view of the simulator will try to interpret this byte as an ASCII character, and only if that fails will it show a decimal number in angle brackets. You might want to consider converting the conversion result into a human-readable value; the simplest solution is probably a hex representation.
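If it helps, here is a minimal C sketch (not HC11 code) of that hex conversion; the same two-nibble table lookup translates directly into HC11 assembly:
/* Map one nibble (0..15) to its ASCII hex digit. */
static char hex_digit(unsigned char nibble)
{
    return "0123456789ABCDEF"[nibble & 0x0F];
}
/* For an ADC byte 'value', send hex_digit(value >> 4) followed by
   hex_digit(value) instead of the raw byte. */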
EDIT:
*) A simulator host needs to be many times faster than the original machine, depending on how the simulation is implemented, and in this case a rather slow approach seems to have been used. The documentation has some words on this. To gain some speed, close any views you don't need and use the fastest PC you can get. To put it in perspective, think about how slow a simulation would be if it modelled the analog electronics of the chip down to each semiconductor; and even that would be just a model, since the "real" world currently bottoms out at quantum mechanics.
Without further measures, you cannot use Buffalo's jump table entries, because the Buffalo monitor is not included in the simulator.
If you want to use an unmodified version of your firmware, you will need to add at least the used parts of the Buffalo monitor. If you have the monitor as a file loadable by the simulator, you might want to load it before loading your application.
The least you could do is provide the jump table entry yourself and point the hardware vector at it ($00EB):
ORG $FFF0 ;RTI vector for microcontroller
FDB $00EB
The "problem" with the ASCII interpretation becomes visible, if values of printable characters are sent. Put the slider in the first third, and you will see some letter or digit or punctuation. Slide it minimally up and down for other characters. Yes, terminals can be dumb, and this one is no exception. Actually it is a little bit smart and shows the printable characters instead of their ASCII value. Additionally it knows at least CR (carriage return, $0D, decimal 13) and LF (line feed, $0A, decimal 10). You might want to write a little test program that sends "Hello, world", CR, LF. Or another experiment that sends all values from $00 to $FF.
The meaning of a value always depends on its interpretation. This terminal interprets values as ASCII characters, if possible.

Ada for I2C on the BBC Micro:Bit with the MCP23017

I am trying to set up a very simple example using the Ada I2C library with the MCP23017 I/O expander on the micro:bit (V1.5), but I can't figure it out. For now, all I want to do is turn on an LED that is connected to a GPIOA pin. I have it working in Python with the following:
from microbit import i2c

while True:
    # set pins to output
    # 0x20 is the address of the MCP23017
    # the first 0x00 is the IODIRA address for setting pin direction (input/output)
    # the second 0x00 sets all the pins to be outputs
    i2c.write(0x20, bytes([0x00, 0x00]))
    # set outputs to true to turn on led
    # 0x14 is the OLATA address for outputs
    # 0xFF sets all outputs to true
    i2c.write(0x20, bytes([0x14, 0xFF]))
And here is my attempt in Ada:
with MicroBit.Display; use MicroBit.Display;
with MicroBit.I2C;
with HAL.I2C; use HAL.I2C;

procedure Main is
   I2C_Controller    : constant HAL.I2C.Any_I2C_Port := MicroBit.I2C.Controller;
   I2C_Slave_Address : constant HAL.I2C.I2C_Address := 32;
   Pins_Out          : constant I2C_Data (0 .. 1) := (0, 0);
   Outputs_On        : constant I2C_Data (0 .. 1) := (20, 255);
   Status            : HAL.I2C.I2C_Status;
begin
   MicroBit.I2C.Initialize;
   Display ('I');
   loop
      I2C_Controller.Master_Transmit (Addr   => I2C_Slave_Address,
                                      Data   => Pins_Out,
                                      Status => Status);
      I2C_Controller.Master_Transmit (Addr   => I2C_Slave_Address,
                                      Data   => Outputs_On,
                                      Status => Status);
   end loop;
end Main;
I tried to translate the inputs from hex in the Python example to the decimal values the Ada library needs. So 0x20 in the Python corresponds to 32 decimal in the Ada. Similarly, 0x00 -> 0, 0x14 -> 20 and 0xFF -> 255. I must be missing something, because this isn't working. I display the 'I' to make sure the program gets that far, and it does, but nothing else happens. Any help would be greatly appreciated.
Thanks!
I had an MCP23017 in 2012; I can't find it now, so it's hard to help other than by suggesting lines of approach.
Have you tied the three settable address lines in the datasheet Fig 3.6 (pins 15, 16, 17 in the diagram on the first page) to ground?
You don’t check that the returned Status from the Master_Transmit calls is HAL.I2C.Ok.
Looking at line 135 of nrf-twi.adb
This.Periph.ADDRESS.ADDRESS := UInt7 (Addr / 2);
and comparing it to Fig 3.6 again, I strongly suspect that the chip address you pass to the Ada Drivers Library should be 16#40#. There’s controversy as to whether the address you specify to your support library is bits 1..7 or 0..6, i.e. whether the library expects to divide or multiply it by 2 before sending to the hardware. To confuse the picture further, Table 272 in the nRF51 RM suggests that only the 7 bits of the address are written to the ADDRESS register, with the read/write bit sent after the address bits by the nRF51 hardware; while for the STM32 range, it’s up to our software to put the address bits in the right place (bits 1 .. 7), and the library software twiddles the read/write bit (0).
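For what it's worth, here is a minimal illustration (plain C, not the Ada or Python APIs) of the 7-bit versus 8-bit address convention being discussed; the MCP23017 figures come from its datasheet, the rest is the generic I2C rule:
/* The MCP23017's 7-bit address with A2..A0 tied to ground is 0b0100000 = 0x20.
   The byte actually clocked onto the bus is that address shifted left one
   bit, with the R/W flag in bit 0. */
unsigned seven_bit_addr  = 0x20;
unsigned wire_write_byte = (seven_bit_addr << 1) | 0;   /* 0x40 = address + write */
unsigned wire_read_byte  = (seven_bit_addr << 1) | 1;   /* 0x41 = address + read  */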
Guessing here, but it looks as though the Python library writes the value you give it straight into the ADDRESS register.
See this Ada Drivers Library issue.

MSP430F5438A Clock Bringup

I'm new to the MSP430 and I am trying to better understand the clock bring-up process. For my current purpose I'm going to take PMMCOREV out of the equation by using a 4 MHz MCLK which is within the 0-8 MHz range for PMMCOREV = 0.
Will someone knowledgeable about these parts please check my logic and assumptions:
When the part boots XT1 is selected as the FLL reference and DCOCLKDIV is selected as the MCLK input. DIVM is 0 so the MCLK source is not divided.
When the system boots the crystal is not yet stable so I'm assuming the UCS moves in to fail-safe mode and uses REFO (internal trimmed 32K) as the FLL reference.
Already I'm a bit confused. If the divided DCO is used for MCLK how are we assured that the FLL is stable? So how is the core functioning at all?
It seems to me that MCLK should be either VLO or REFO until you can bring things up gracefully.
Can someone clarify these details and steer me in the right direction to properly initialize these clocks?
Thanks!
Per your comment, yes.
At startup DCO will be the clock - so you just need to modify the UCSCTL registers and wait for the oscillators to settle and you are good to go.
Here are the steps in general:
Change the VCORE level in steps (if necessary; in your case it's not)
Enable XT1
Configure the drive strength
Select your clock sources for MCLK, SMCLK, and ACLK and do any source division that you need to do
Allow XT1, XT2 and the DCO to stabilize by checking the fault flags.
Your external crystal is 4 MHz. Do you want to use it as MCLK directly? Or is your plan to use it as a reference to the FLL for the DCO, and use the DCO for MCLK (to achieve a higher MCLK frequency)? The core voltage you will need depends on your MCLK frequency, not on your external crystal's frequency. So if you want an MCLK rate higher than 8 MHz you will need to consider stepping PMMCOREV up to 01.
For convenience, here is a reference for UCS registers from SLAU208M.
http://www.ti.com/lit/ug/slau208m/slau208m.pdf#page=172
Based on your OP, I think you should do the following if you want to use XT1 as your MCLK:
//1) Enable XT1 - XT1 will be off by default. You may not need to explicitly
// perform this step. According to pg. 162 in SLAU208M, XT1 will be
// enabled when you select it as the source for one of the clocks. But I
// like being explicit!
UCSCTL6 &= ~XT1OFF; //XT1OFF = 0x0001u
//2) Clear the XT1DRIVE bits - it may not be necessary to clear these bits
// explicitly, but XT1's drive strength can be reduced to 0 w/ a 4MHz
// crystal. By default, this will be b11, full scale, which will consume
// more power, but result in a quicker settling time.
UCSCTL6 &= ~XT1DRIVE0; //XT1DRIVE0 = 0x0040u
UCSCTL6 &= ~XT1DRIVE1; //XT1DRIVE1 = 0x0080u
//3) Select XT1 as the clock source for MCLK. UCSCTL4 defaults to 0x44 at
//   power on - DCOCLKDIV (b100). SELM__XT1CLK = b000, so clear only the SELM
//   field (the low three bits) rather than ANDing the whole register with zero.
UCSCTL4 = (UCSCTL4 & ~0x0007) | SELM__XT1CLK; //SELM__XT1CLK = 0x0000u
//4) Wait for XT1 to stabilize
do
{
//Explicitly clear the XT2, XT1 low-frequency, XT1 high-frequency, and DCO
//fault flags: 0x0008u, 0x0002u, 0x0004u, 0x0001u respectively.
UCSCTL7 &= ~(XT2OFFG + XT1LFOFFG + XT1HFOFFG + DCOFFG);
//Clear the oscillator fault interrupt flag in the special function interrupt
//flags register.
SFRIFG1 &= ~OFIFG; //0X0002U
} while (SFRIFG1&OFIFG); //Test to see if any oscillator fault flags are asserted.
I am using IAR; I'm not sure whether those definitions are named differently if you are using CCS, so I went ahead and typed out the hex for each of the operands.
On the MSP430F5438A you do not have to do anything with bypass.
Does that answer your question?
Also, in your OP you mention wanting to use XT1 as the reference for the FLL. That is accomplished using UCSCTL3: SELREF is the field you want to set to b000 to use XT1CLK.
Here are the definitions from the MSP430F5438A header in IAR:
#define SELREF0 (0x0010u) /* FLL Reference Clock Select Bit : 0 */
#define SELREF1 (0x0020u) /* FLL Reference Clock Select Bit : 1 */
#define SELREF2 (0x0040u) /* FLL Reference Clock Select Bit : 2 */
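For example (an untested sketch, using only the definitions above): clearing all three SELREF bits selects XT1CLK (b000) as the FLL reference.
UCSCTL3 &= ~(SELREF2 | SELREF1 | SELREF0); //FLL reference = XT1CLK (b000)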

How to measure the amount of memory or RAM consumed by a code on Arduino Mega or Due

Can anybody tell me how to measure the RAM consumed by a particular piece of code running on an Arduino Mega or Due?
There are two kinds of numbers to this question:
global static usage and current runtime usage.
The estimated static usage can be determined by adding the following line (if it does not already exist) to
.\arduino-1.5.5\hardware\arduino\avr\boards.txt
uno.upload.maximum_ram_size=2048
This then makes the compiler output the additional second line shown in the following example in the IDE's result window:
Binary sketch size: 25,880 bytes (of a 32,256 byte maximum)
Estimated used SRAM memory: 990 bytes (of a 2048 byte maximum)
To see the amount of memory in use at any given point in time, including memory that only exists while inside functions and members (the stack and the heap), I use the MemoryFree library at specific points in the code to reveal the high-water mark. Its readme also explains how to avoid RAM being used unnecessarily/unintentionally by print strings.
Note: while the original Arduino IDE 1.0.5's boards.txt file does contain these ram_sizes, it does not actually display the usage. The Arduino IDE 1.5.5 does, as does Arduino ERW 1.0.5 (a non-supported fork).
In my Arduino IDE 2.1.0
I edited the file /usr/share/arduino/hardware/arduino/boards.txt,
but the second line doesn't appear.
After reading:
check-ram-memory-usage-arduino-optimization
measuring-free-memory
I tried:
"Show verbose output during compilation"
and ran avr-size /tmp/build4042914391435450796.tmp/XXXXXXX.cpp.elf,
which gives me the memory used.
Best Regards!
// Prints (and returns) the free SRAM: the gap between the top of the heap
// (__brkval, or __heap_start if nothing has been malloc'd yet) and the
// current stack position, approximated by the address of a local variable.
int freeRam () {
  extern int __heap_start, *__brkval;
  int v;
  int fr = (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
  Serial.print("Free ram: ");
  Serial.println(fr);
  return fr;
}

Using the extra 16 bits in 64-bit pointers

I read that a 64-bit machine actually uses only 48 bits of address (specifically, I'm using an Intel Core i7).
I would expect the extra 16 bits (bits 48-63) to be irrelevant for the address and to be ignored, but when I try to access such an address I get an EXC_BAD_ACCESS signal.
My code is:
int *p1 = &val;
int *p2 = (int *)((long)p1 | 1ll<<48);//set bit 48, which should be irrelevant
int v = *p2; //Here I receive a signal EXC_BAD_ACCESS.
Why is this so? Is there a way to use these 16 bits?
They could be used to build a more cache-friendly linked list: instead of using 8 bytes for the next pointer and 8 bytes for the key (due to alignment restrictions), the key could be embedded into the pointer.
The high-order bits are reserved in case the address space is extended in the future, so you can't simply use them like that:
The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (2^64 bytes). This is compared to just 4 GB (2^32 bytes) for the x86.
http://en.wikipedia.org/wiki/X86-64#Architectural_features
More importantly, according to the same article [Emphasis mine]:
... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."
As the CPU will check the high bits even if they're unused, they're not really "irrelevant". You need to make sure that the address is canonical before using the pointer. Some other 64-bit architectures like ARM64 have the option to ignore the high bits, therefore you can store data in pointers much more easily.
That said, in x86_64 you're still free to use the high 16 bits if needed (if the virtual address is not wider than 48 bits, see below), but you have to check and fix the pointer value by sign-extending it before dereferencing.
Note that casting the pointer value to long is not the correct way to do it, because long is not guaranteed to be wide enough to store a pointer. You need to use uintptr_t or intptr_t.
int *p1 = &val; // original pointer
uint8_t data = ...;
const uintptr_t MASK = (1ULL << 48) - 1;   // keep only the low 48 address bits
// === Store data into the pointer ===
// Note: To be on the safe side and future-proof (because future implementations
// can increase the number of significant bits in the pointer), we should
// store values from the most significant bits down to the lower ones
int *p2 = (int *)(((uintptr_t)p1 & MASK) | ((uintptr_t)data << 56));
// === Get the data stored in the pointer ===
data = (uintptr_t)p2 >> 56;
// === Dereference the pointer ===
// Sign extend first to make the pointer canonical
// Note: Technically this is implementation defined. You may want a more
// standard-compliant way to sign-extend the value
intptr_t p3 = ((intptr_t)p2 << 16) >> 16;
val = *(int*)p3;
WebKit's JavaScriptCore and Mozilla's SpiderMonkey engine, as well as LuaJIT, use this in the NaN-boxing technique. If the value is a NaN, the low 48 bits store the pointer to the object and the high 16 bits serve as tag bits; otherwise it's a double value.
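A minimal NaN-boxing sketch in plain C (an illustration of the general idea, not any engine's actual layout): pointers are stored in the low 48 bits under an all-ones tag that is itself a quiet-NaN bit pattern, and anything whose top 16 bits are not all set is an ordinary IEEE-754 double. Real engines canonicalize computed NaNs so they never collide with the tag.
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#define NAN_TAG 0xFFFF000000000000ull
static uint64_t box_ptr(void *p)         { return NAN_TAG | (uint64_t)(uintptr_t)p; }
static bool     is_boxed_ptr(uint64_t v) { return (v & NAN_TAG) == NAN_TAG; }
static void    *unbox_ptr(uint64_t v)    { return (void *)(uintptr_t)(v & ~NAN_TAG); }
static uint64_t box_double(double d)     { uint64_t v; memcpy(&v, &d, sizeof v); return v; }
static double   unbox_double(uint64_t v) { double d; memcpy(&d, &v, sizeof d); return d; }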
Previously Linux also used the 63rd bit of the GS base address to indicate whether the value was written by the kernel.
In reality you can usually use the 48th bit, too, because most modern 64-bit OSes split kernel and user space in half, so bit 47 is always zero and you have 17 top bits free for use.
You can also use the lower bits to store data. It's called a tagged pointer. If int is 4-byte aligned then the 2 low bits are always 0 and you can use them like in 32-bit architectures. For 64-bit values you can use the 3 low bits because they're already 8-byte aligned. Again you also need to clear those bits before dereferencing.
int *p1 = &val; // the pointer we want to store the value into
int tag = 1;
const uintptr_t MASK = ~0x03ULL;
// === Store the tag ===
int *p2 = (int *)(((uintptr_t)p1 & MASK) | tag);
// === Get the tag ===
tag = (uintptr_t)p2 & 0x03;
// === Get the referenced data ===
// Clear the 2 tag bits before using the pointer
intptr_t p3 = (uintptr_t)p2 & MASK;
val = *(int*)p3;
One famous user of this is the V8 engine with its SMI (small integer) optimization. The lowest bit of the value serves as a type tag:
if it's 1, the value is a pointer to the real data (objects, floats or bigger integers). The next higher bit (w) indicates whether the pointer is weak or strong. Just clear the tag bits and dereference it
if it's 0, it's a small integer. In 32-bit V8, or 64-bit V8 with pointer compression, it's a 31-bit int: do a signed right shift by 1 to restore the value (see the sketch after the layout diagrams below); in 64-bit V8 without pointer compression it's a 32-bit int in the upper half
32-bit V8
|----- 32 bits -----|
Pointer: |_____address_____w1|
Smi: |___int31_value____0|
64-bit V8
|----- 32 bits -----|----- 32 bits -----|
Pointer: |________________address______________w1|
Smi: |____int32_value____|0000000000000000000|
https://v8.dev/blog/pointer-compression
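For the 31-bit case, here is a minimal sketch in C of the SMI encoding described above (not V8's actual API):
#include <stdint.h>
/* Bit 0 = 0 marks a small integer; the value lives in the bits above it. */
static intptr_t smi_encode(int32_t v)  { return (intptr_t)((uintptr_t)(intptr_t)v << 1); } /* v must fit in 31 bits */
static int      is_smi(intptr_t p)     { return (p & 1) == 0; }
static int32_t  smi_decode(intptr_t p) { return (int32_t)(p >> 1); }  /* signed (arithmetic) right shift */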
So, as commented below, Intel has since published PML5 (5-level paging), which provides a 57-bit virtual address space; if you're on such a system you can only use the 7 high bits.
You can still use some workarounds to get more free bits, though. First, you can try to use a 32-bit pointer in a 64-bit OS. In Linux, if the x32 ABI is allowed, pointers are only 32 bits long. In Windows, just clear the /LARGEADDRESSAWARE flag and pointers then have only 32 significant bits, so you can use the upper 32 bits for your purpose. See How to detect X32 on Windows?. Another way is to use some pointer compression tricks: How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?
You can get even more bits by requesting that the OS allocate memory only in a low region (see the sketch after the links below). For example, if you can ensure that your application never uses more than 64 MB of memory then you need only a 26-bit address. And if all the allocations are 32-byte aligned then you have 5 more bits to use, which means you can store 64 - 21 = 43 bits of information in the pointer!
I guess ZGC is one example of this. It uses only 42 bits for addressing, which allows for 2^42 bytes = 4 × 2^40 bytes = 4 TB.
ZGC therefore just reserves 16TB of address space (but does not actually use all of this memory) starting at address 4TB.
A first look into ZGC
It uses the bits in the pointer like this:
6 4 4 4 4 4 0
3 7 6 5 2 1 0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
| | | |
| | | * 41-0 Object Offset (42-bits, 4TB address space)
| | |
| | * 45-42 Metadata Bits (4-bits) 0001 = Marked0
| | 0010 = Marked1
| | 0100 = Remapped
| | 1000 = Finalizable
| |
| * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
For more information on how to do that see
Allocating Memory Within A 2GB Range
How can I ensure that the virtual memory address allocated by VirtualAlloc is between 2-4GB
Allocate at low memory address
How to malloc in address range > 4 GiB
Custom heap/memory allocation ranges
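As a concrete example of the "allocate low" idea mentioned above, here is a minimal Linux/x86-64 sketch (MAP_32BIT is Linux-specific; this is only an illustration, not a general recipe): the mapping is placed in the first 2 GB of the address space, so the upper 33 bits of every pointer into it are known to be zero and can carry tag data.
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
int main(void)
{
    void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("low mapping at %p\n", p);   /* always below 0x80000000 */
    munmap(p, 1 << 20);
    return 0;
}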
Side note: using a linked list for cases where the keys are tiny compared to the pointers is a huge memory waste, and it's also slower due to bad cache locality. In fact, you shouldn't use linked lists in most real-life problems:
Bjarne Stroustrup says we must avoid linked lists
Why you should never, ever, EVER use linked-list in your code again
Number crunching: Why you should never, ever, EVER use linked-list in your code again
Bjarne Stroustrup: Why you should avoid Linked Lists
Are lists evil?—Bjarne Stroustrup
A standards-compliant way to canonicalize AMD/Intel x64 pointers (based on the current documentation of canonical pointers and 48-bit addressing) is
int *p2 = (int *)(((uintptr_t)p1 & ((1ull << 48) - 1)) |
~(((uintptr_t)p1 & (1ull << 47)) - 1));
This first clears the upper 16 bits of the pointer. Then, if bit 47 is 1, this sets bits 47 through 63, but if bit 47 is 0, this does a logical OR with the value 0 (no change).
I guess no one has mentioned the possible use of bit fields ( https://en.cppreference.com/w/cpp/language/bit_field ) in this context, e.g.
template<typename T>
struct My64Ptr
{
signed long long ptr : 48; // as per phuclv's comment, we need the type to be signed to be sign extended
unsigned long long ch : 8; // ...and, what's more, as Peter Cordes pointed out, it's better to mark signedness of bit field explicitly (before C++14)
unsigned long long b1 : 1; // Additionally, as Peter found out, types can differ by sign and it doesn't mean the beginning of another bit field (MSVC is particularly strict about it: other type == new bit field)
unsigned long long b2 : 1;
unsigned long long b3 : 1;
unsigned long long still5bitsLeft : 5;
inline My64Ptr(T* ptr) : ptr((long long) ptr)
{
}
inline operator T*()
{
return (T*) ptr;
}
inline T* operator->()
{
return (T*)ptr;
}
};
My64Ptr<const char> ptr ("abcdefg");
ptr.ch = 'Z';
ptr.b1 = true;
ptr.still5bitsLeft = 23;
std::cout << ptr << ", char=" << char(ptr.ch) << ", byte1=" << ptr.b1 <<
", 5bitsLeft=" << ptr.still5bitsLeft << " ...BTW: sizeof(ptr)=" << sizeof(ptr);
// The output is: abcdefg, char=Z, byte1=1, 5bitsLeft=23 ...BTW: sizeof(ptr)=8
// With all signed long long fields, the output would be: abcdefg, char=Z, byte1=-1, 5bitsLeft=-9 ...BTW: sizeof(ptr)=8
I think this may be quite a convenient way to make use of these 16 bits, if we really want to save some memory. All the bitwise operations (& and |) and the cast to a full 64-bit pointer are done by the compiler (though, of course, executed at run time).
According to the Intel manuals (volume 1, section 3.3.7.1), linear addresses have to be in canonical form. This means that indeed only 48 bits are used and the extra 16 bits are sign-extended. Moreover, the implementation is required to check whether an address is in that form and to generate an exception if it is not. That's why there is no way to use those additional 16 bits.
The reason it is done this way is quite simple. Currently a 48-bit virtual address space is more than enough (and, because of CPU production costs, there is no point in making it larger), but undoubtedly in the future the additional bits will be needed. If applications/kernels were to use them for their own purposes, compatibility problems would arise, and that's what CPU vendors want to avoid.
Only 48 bits of the address actually take part in translation, and that's enough to address a lot of RAM. However, between your program running on a CPU core and the RAM sits the memory management unit, part of the CPU. Your program addresses virtual memory, and the MMU is responsible for translating between virtual addresses and physical addresses. The virtual addresses are 64 bits wide.
The value of a virtual address tells you nothing about the corresponding physical address. Indeed, because of how virtual memory systems work, there's no guarantee that the corresponding physical address will be the same from moment to moment. And if you get creative with mmap() you can make two or more virtual addresses point at the same physical address (wherever that happens to be). If you then write through any of those virtual addresses you're actually writing to just one physical address. This sort of trick is quite useful in signal processing.
Thus, when you tamper with the 48th bit of your pointer (which points at a virtual address), the MMU can't find the new address in the table of memory allocated to your program by the OS (or by yourself using malloc()). It raises a fault in protest, the OS catches it and terminates your program with the signal you mention.
If you want to know more I suggest you Google "modern computer architecture" and do some reading about the hardware that underpins your program.
