This document provides information specific to VxWorks development on PowerPC targets. It includes the following topics:
Information on changes or additions to particular VxWorks features to support the PowerPC processors.
Special features and limitations of the PowerPC processors, including a figure showing the VxWorks memory layout for these processors.
For general information on the Tornado development environment's cross-development tools, see the Tornado User's Guide.
The Tornado project facility is correctly preconfigured for building BSPs supplied by Wind River. However, if you choose not to use the project facility or if you need to customize your build, you may need the information in the following sections. This includes a configuration constant, an environment variable, and compiler options that together specify the information the Diab or GNU toolkits require to compile correctly for PowerPC targets.
PowerPC devices use two preprocessor constants, CPU and TOOL, to define compiler options for a specific device. The CPU variable ensures that VxWorks and your applications are compiled with the appropriate architecture-specific features enabled. The TOOL variable defines the toolchain to use for compiling and linking modules. Specifying CPU and TOOL is usually sufficient to build a module.
The CPU and TOOL variables should be set to one combination of the values shown in Table 1, depending on the processor you are using.
NOTE:
For makefiles having command-line options that differ from those in Table 1, always rely on the options set in the makefile. The exact compiler options for each setting are listed in installDir/target/h/tool/$TOOL/make.$CPU$TOOL.
1: Motorola PowerPC MPC74xx CPUs are treated as a variation of the PowerPC 604 CPU type. AltiVec support in the MPC74xx processors is in addition to the existing PPC604 functionality. Modules that make use of AltiVec instructions must be compiled with compiler-specific options as described in Table 1, but can be linked with modules that do not use the AltiVec compile options. See AltiVec Support for details.
For example, to specify CPU for a PowerPC 603 on the compiler command line, use the following command-line option when you invoke the compiler:
-DCPU=PPC603
To provide the same information in a header or source file, include the following line in the file:
#define CPU PPC603
All VxWorks makefiles pass along the definition of this variable to the compiler. You can define CPU or TOOL on the make command line as follows:
% make CPU=PPC603 TOOL=diab
You can also set the definition directly in the makefile using the following line:
CPU=PPC603
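As an illustration of how source code can react to the CPU setting, the following sketch selects a code path with the preprocessor. The numeric codes given to PPC603 and PPC604 are placeholders so that the fragment compiles on its own; in a real build they come from the VxWorks headers. cpuName( ) is a hypothetical helper, not a VxWorks routine.

```c
/* Placeholder CPU codes (assumptions); a VxWorks build supplies its own. */
#ifndef PPC603
#define PPC603 1
#endif
#ifndef PPC604
#define PPC604 2
#endif

/* Fall back to PPC603 when CPU was not given with -DCPU=... */
#ifndef CPU
#define CPU PPC603
#endif

/* Hypothetical helper: reports which branch the preprocessor selected. */
const char *cpuName(void)
{
#if (CPU == PPC603)
    return "PPC603";
#elif (CPU == PPC604)
    return "PPC604";
#else
    return "unknown";
#endif
}
```

Compiling the fragment with -DCPU=PPC604 would select the second branch instead of the default.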
To compile C and C++ modules for debugging in GDB, you must use the -g flag to generate DWARF debug information. An example command line is as follows:
% ccppc -mcpu=603 -IinstallDir/target/h -fno-builtin \
    -DCPU=PPC603 -c -g test.cpp
In this example, installDir is the location of your Tornado tree and -DCPU specifies the CPU type.
This section describes particular routines and tools that are specific to PowerPC targets in any of the following ways:
The HI and HIADJ macros are used in PowerPC assembly code to facilitate the loading of immediate operands larger than 16 bits. The macro HI(x) is the simple high-order 16 bits of the value x. The macro HIADJ(x) is the high-order 16 bits adjusted by bit 15. If bit 15 is set, the value is adjusted by adding 1.
The macro HIADJ(x) must be used whenever the low-order 16 bits are to be used in an instruction that interprets them as a signed quantity (for instance, addi or lwz). If the low-order bits are used in an instruction that interprets them as an unsigned quantity (for instance, ori), the proper macro is HI, not HIADJ.
For example, addi uses a SIGNED quantity, so HIADJ is the proper macro:
lis rx, HIADJ(VALUE)
addi rx, rx, LO(VALUE)
However, ori uses an UNSIGNED quantity, so HI is the proper macro:
lis rx, HI(VALUE)
ori rx, rx, LO(VALUE)
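The arithmetic behind the two macros can be checked on a host machine. The sketch below models the two instruction pairs in C. The macro bodies are written from the prose description above and are not copied from the VxWorks headers, and lisAddi( ) and lisOri( ) are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative definitions that follow the description in the text. */
#define HI(x)    ((uint32_t)(((uint32_t)(x) >> 16) & 0xFFFFu))
#define LO(x)    ((uint32_t)((uint32_t)(x) & 0xFFFFu))
#define HIADJ(x) ((uint32_t)((((uint32_t)(x) >> 16) + \
                              (((uint32_t)(x) >> 15) & 1u)) & 0xFFFFu))

/* Models "lis rx, hi" then "addi rx, rx, lo": addi sign-extends lo. */
uint32_t lisAddi(uint32_t hi, uint32_t lo)
{
    uint32_t ext;

    assert(hi <= 0xFFFFu && lo <= 0xFFFFu);
    ext = (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
    return (hi << 16) + ext;      /* wraps modulo 2^32, like the CPU */
}

/* Models "lis rx, hi" then "ori rx, rx, lo": ori zero-extends lo. */
uint32_t lisOri(uint32_t hi, uint32_t lo)
{
    assert(hi <= 0xFFFFu && lo <= 0xFFFFu);
    return (hi << 16) | lo;
}
```

For VALUE = 0x12348000, bit 15 is set, so HIADJ yields 0x1235 and the sign-extended addi subtracts 0x8000 to arrive back at the intended value. Using HI with addi would instead produce 0x12338000.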
VxWorks provides two levels of virtual memory support: the basic level bundled with VxWorks, and the full level that requires the optional product VxVMI. Currently, VxVMI is supported on PowerPC 860 family processors only.
For detailed information on VxWorks MMU support, see the VxWorks Programmer's Guide: Virtual Memory Interface. The following subsections augment the information in that chapter.
The PowerPC MMU introduces a distinction between instruction and data MMU and allows them to be separately enabled or disabled. Two parameters, USER_I_MMU_ENABLE and USER_D_MMU_ENABLE, are enabled by default in the Params tab of the Properties window under SELECT_MMU. To disable one or both MMUs, select the corresponding parameter and set the value to FALSE.
In all PowerPC cores except Book E processors, the PowerPC MMU is temporarily disabled by hardware upon interrupts and exceptions. VxWorks also temporarily disables the MMU during certain processor family-specific register changes, cache updates, and page table updates. VxWorks saves the original MMU-enabled state and restores the state as quickly as possible, to maintain memory coherency. As a result, VxWorks requires that exception vectors, handlers, certain update code, and task stacks be mapped such that the virtual and real addresses are identical.
One consequence of requiring identical virtual and real addresses is that two copies of the VxWorks run-time (for example, two processors each running its own copy of the VxWorks image) cannot share the same physical address space.
The VxWorks PowerPC implementations share a common programming model for mapping 4 KB memory pages. The physical memory address space is described by the data structure sysPhysMemDesc[ ], defined in sysLib.c. This data structure is made up of configuration constants for each page or group of pages. All of the configuration constants defined in the VxWorks Programmer's Guide: Virtual Memory Interface are available for PowerPC virtual memory pages.
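A BSP author may want to sanity-check a memory description table against the identity-mapping requirement described earlier. The following host-side sketch does so conservatively, treating every entry as if it had to be identity mapped, although the text requires this only for certain regions. The structure is a stand-in modeled on the BSP descriptor, with assumed field names; descTableIsIdentityMapped( ) is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u   /* 4 KB pages, as stated in the text */

/* Stand-in for a sysPhysMemDesc[ ] entry (field names are assumptions). */
typedef struct {
    uint32_t virtualAddr;
    uint32_t physicalAddr;
    uint32_t len;
    uint32_t initialStateMask;   /* opaque here */
    uint32_t initialState;       /* opaque here */
} PhysMemDescSketch;

/*
 * Returns 1 when every entry is page aligned and maps each virtual
 * address onto the identical physical address, 0 otherwise.
 */
int descTableIsIdentityMapped(const PhysMemDescSketch *tbl, size_t n)
{
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (tbl[i].virtualAddr != tbl[i].physicalAddr)
            return 0;
        if ((tbl[i].virtualAddr % SKETCH_PAGE_SIZE) != 0 ||
            (tbl[i].len % SKETCH_PAGE_SIZE) != 0)
            return 0;
    }
    return 1;
}
```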
Use of the VM_STATE_CACHEABLE constant, listed in the VxWorks Programmer's Guide: Virtual Memory Interface for each page or group of pages, sets the cache to copy-back mode.
In addition to VM_STATE_CACHEABLE, the following additional constants are supported:
The first constant sets the page descriptor cache mode field in cacheable write-through mode. Cache coherency and guarded modes are controlled by the other constants. There is no default configuration, because each memory region may have specific requirements; see individual BSPs for examples.
For more information regarding cache modes, see PowerPC Microprocessor Family: The Programming Environments.
For more information on memory page states, state flags, and state masks, see the VxWorks Programmer's Guide: Virtual Memory Interface.
The PowerPC 603 (including PPC82xx) and 604 (including PPC7xx, PPC74xx, collectively the PPC604 family) MMU supports two models for memory mapping. The first, the Block Address Translation (BAT) model, allows mapping of a memory block ranging in size from 128 KB to 256 MB (or larger, depending on the CPU) into a BAT register. The second, the segment model, gives the ability to map the memory in pages of 4 KB. Tornado for PowerPC supports both memory models.
The Block Address Translation (BAT) model takes precedence over the segment model. However, the BAT model is not supported by the VxWorks vmLib or cache libraries. Therefore, functions provided by those libraries are not effective, and no errors are reported, in memory spaces mapped by BAT registers. Typically, in VxWorks, the BATs are only used to map large external regions, or PROM/Flash, where fine-grained control is unnecessary; this has the advantage of reducing the size of the PTE table used by the segment model.
All PPC603 and PPC604 family members include 4 Instruction BATs (IBATs) and 4 Data BATs (DBATs). The BAT registers are always active, and must be initialized during boot. Typically, romInit( ) initializes all (active) BATs to zero, so that they perform no translation. No further work is required if the BATs are not to be used for any address translation at all.
Motorola MPC7x5 and MPC74x5 CPUs have an additional 4 IBAT and 4 DBAT registers. These extra BATs can be enabled or disabled (HID0 or HID1, depending on the CPU); they are disabled by hardware reset. Configuring these additional BATs for VxWorks is optional.
The IBM PPC750FX also adds 4 IBAT and 4 DBAT registers, but these are always enabled. In this case, the additional BATs must be configured.
The data structure sysBatDesc[ ], defined in sysLib.c, handles the BAT register configuration. All of the configuration constants used to fill sysBatDesc[ ] are defined in installDir/target/h/arch/ppc/mmu603Lib.h for both the PowerPC 603 and the PowerPC 604. Providing the correct entries in sysBatDesc[ ] is sufficient to configure the basic 4 BATs; no additional software configuration is required. See below for configuration of all 8 BAT registers. If sysBatDesc[ ] is not defined by the BSP, the BATs are left alone after being configured by romInit( ).
When the additional BATs are used, the sysBatDesc[ ] array essentially doubles in size, and the order of the entries is fixed. The initial 16 entries are identical in meaning to the original array, so they may remain unchanged. For example (from the sp745x BSP):
UINT32 sysBatDesc [2 * (_MMU_NUM_IBAT + _MMU_NUM_DBAT +
_MMU_NUM_EXTRA_IBAT + _MMU_NUM_EXTRA_DBAT)] =
{
/* I BAT 0 */
((ROM_BASE_ADRS & _MMU_UBAT_BEPI_MASK) | _MMU_UBAT_BL_1M |
_MMU_UBAT_VS | _MMU_UBAT_VP),
((ROM_BASE_ADRS & _MMU_LBAT_BRPN_MASK) | _MMU_LBAT_PP_RW |
_MMU_LBAT_CACHE_INHIBIT),
0,0, /* I BAT 1 */
0,0, /* I BAT 2 */
0,0, /* I BAT 3 */
/* D BAT 0 */
((ROM_BASE_ADRS & _MMU_UBAT_BEPI_MASK) | _MMU_UBAT_BL_1M |
_MMU_UBAT_VS | _MMU_UBAT_VP),
((ROM_BASE_ADRS & _MMU_LBAT_BRPN_MASK) | _MMU_LBAT_PP_RW |
_MMU_LBAT_CACHE_INHIBIT),
0,0, /* D BAT 1 */
0,0, /* D BAT 2 */
0,0, /* D BAT 3 */
/*
* These entries are for the I/D BATs (4-7) on the MPC7455/755.
* They should be defined in the following order.
* IBAT4U,IBAT4L,IBAT5U,IBAT5L,IBAT6U,IBAT6L,IBAT7U,IBAT7L,
* DBAT4U,DBAT4L,DBAT5U,DBAT5L,DBAT6U,DBAT6L,DBAT7U,DBAT7L,
*/
0,0, /* I BAT 4 */
0,0, /* I BAT 5 */
0,0, /* I BAT 6 */
0,0, /* I BAT 7 */
0,0, /* D BAT 4 */
0,0, /* D BAT 5 */
0,0, /* D BAT 6 */
0,0 /* D BAT 7 */
};
The BAT initialization routine is declared as follows:
void myBatInitFunc (UINT32 *pSysBatDesc);   /* called with &sysBatDesc[0] */
This routine reads sysBatDesc[ ], initializes the BAT registers, and performs any other required setup, for example, configuring HID0 on the MPC74x5. See the CPU-specific reference manual for the additional BAT register numbers and configuration information. The following functions initialize the MPC74x5 and MPC7x5, respectively:
/*
 * mmuPpcBatInitMPC74x5 initializes the standard 4 (0-3) I/D BATs &
 * the additional 4 (4-7) I/D BATs present on the MPC74[45]5.
 */
IMPORT void mmuPpcBatInitMPC74x5 (UINT32 *pSysBatDesc);
/*
 * mmuPpcBatInitMPC7x5 initializes the standard 4 (0-3) I/D BATs &
 * the additional 4 (4-7) I/D BATs present on the MPC7[45]5.
 */
IMPORT void mmuPpcBatInitMPC7x5 (UINT32 *pSysBatDesc);
Finally, the BAT initialization routine must be connected to the MMU initialization hook, _pSysBatInitFunc, which is NULL by default:
IMPORT FUNCPTR _pSysBatInitFunc;
_pSysBatInitFunc = mmuPpcBatInitMPC7x5;
The assignment to _pSysBatInitFunc may be made conditional upon the value of the PVR, to allow the same kernel to run on different CPUs.
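A PVR-conditional selection might be structured as in the following host-side sketch. The two stub routines stand in for mmuPpcBatInitMPC7x5( ) and mmuPpcBatInitMPC74x5( ), selectBatInit( ) is a hypothetical helper, and the PVR version codes are assumptions that must be checked against the CPU reference manual. Other parts may share a version code, so a real BSP may also need to examine the revision field.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*BatInitFunc)(uint32_t *pSysBatDesc);

/* Stubs standing in for the real initialization routines. */
void batInit7x5Stub(uint32_t *pSysBatDesc)  { (void)pSysBatDesc; }
void batInit74x5Stub(uint32_t *pSysBatDesc) { (void)pSysBatDesc; }

/* Assumed PVR version codes (upper 16 bits of the PVR); verify before use. */
#define PVR_VER_7X5_ASSUMED   0x0008u
#define PVR_VER_74X5_ASSUMED  0x8001u

/*
 * Returns the BAT initialization routine for the given PVR value, or
 * NULL to leave the hook unset so that only the basic BATs are handled.
 */
BatInitFunc selectBatInit(uint32_t pvr)
{
    switch (pvr >> 16) {
    case PVR_VER_7X5_ASSUMED:
        return batInit7x5Stub;
    case PVR_VER_74X5_ASSUMED:
        return batInit74x5Stub;
    default:
        return NULL;
    }
}
```

In a BSP, the returned pointer would be assigned to _pSysBatInitFunc before the MMU library is initialized.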
The segment model allows memory to be mapped in pages of 4 KB. All mapping attributes are defined in the individual page descriptors (write-through/copy-back, cache-inhibited, memory coherent, guarded, execute, and write permissions).
The application programmer interface for the PowerPC 603/604 memory mapping unit is the same as that described previously for the MMU translation model.
The page table size depends on the total memory to be mapped. The larger the memory to be mapped, the bigger the page table. The VxWorks implementation of the segment model follows the recommendations given in PowerPC Microprocessor Family: The Programming Environments. During MMU library initialization, the total size of the memory to be mapped is computed, allowing dynamic determination of the page table size. Table 2 shows the correspondence between the total amount of memory to map and the page table size.
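The sizing rule can be sketched as follows, assuming the recommended minimum from the Programming Environments manual of one PTE group (PTEG) for every two physical pages, rounded up to a power of two within the 64 KB to 32 MB range that the 32-bit architecture allows. htabSizeSketch( ) is a hypothetical helper; the sizes VxWorks actually selects are those listed in Table 2.

```c
#include <stdint.h>

#define SKETCH_PAGE_BYTES  4096u         /* 4 KB page */
#define SKETCH_PTEG_BYTES  64u           /* 8 PTEs of 8 bytes each */
#define SKETCH_HTAB_MIN    0x00010000u   /* 64 KB */
#define SKETCH_HTAB_MAX    0x02000000u   /* 32 MB */

/* Returns a page table size in bytes for memBytes of mapped memory. */
uint32_t htabSizeSketch(uint64_t memBytes)
{
    uint64_t pages  = (memBytes + SKETCH_PAGE_BYTES - 1) / SKETCH_PAGE_BYTES;
    uint64_t wanted = ((pages + 1) / 2) * SKETCH_PTEG_BYTES;
    uint32_t size   = SKETCH_HTAB_MIN;

    /* Double the table until it holds the wanted number of PTEGs. */
    while (size < wanted && size < SKETCH_HTAB_MAX)
        size <<= 1;
    return size;
}
```

Under these assumptions, 8 MB of mapped memory needs 1,024 PTEGs, which is exactly the 64 KB minimum, and each doubling of memory doubles the table.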
Although some PPC403 family processors have hardware memory mapping units, VxWorks does not support this feature.
The PPC405 memory mapping model allows memory to be mapped in pages of 4 KB. The translation table is organized into two levels: the top level consists of an array of 1,024 Level 1 (L1) table descriptors; each of these descriptors can point to an array of 1,024 Level 2 (L2) table descriptors. All mapping attributes are defined in L2 descriptors (write-through/copy-back, cache-inhibited, guarded, execute, and write permissions).
The translation table size depends on the total memory to be mapped. The larger the memory to be mapped, the bigger the table.
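With 1,024 L1 descriptors, 1,024 L2 descriptors per L1 entry, and 4 KB pages, a 32-bit address divides naturally into 10, 10, and 12 bits. The sketch below shows that split; it assumes this natural division rather than quoting the library's internal layout, and VaSplit and splitAddress( ) are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t l1Index;   /* selects 1 of 1,024 L1 descriptors */
    uint32_t l2Index;   /* selects 1 of 1,024 L2 descriptors */
    uint32_t offset;    /* byte offset within the 4 KB page  */
} VaSplit;

/* Splits a 32-bit address into table indices and page offset. */
VaSplit splitAddress(uint32_t addr)
{
    VaSplit s;

    s.l1Index = addr >> 22;              /* top 10 bits    */
    s.l2Index = (addr >> 12) & 0x3FFu;   /* middle 10 bits */
    s.offset  = addr & 0xFFFu;           /* low 12 bits    */
    return s;
}
```

Each L1 descriptor therefore covers 1,024 x 4 KB = 4 MB, and an L2 table is needed only for 4 MB regions that actually contain mapped pages, which is why the table grows with the amount of memory mapped.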
The PPC440 core provides a 36-bit physical address space and a 32-bit program (virtual) address space. The mapping is accomplished with Translation Lookaside Buffers (TLBs) which are managed by software.
The PPC440 is an implementation of the Book E processor specification. The MMU is always active and all program addresses are translated by the TLBs. The MSR[IS] and MSR[DS] bits are used to extend the virtual address space so that TLB lookups can happen from two different address spaces for either instruction or data references. This allows a static map to be used for boot and basic operation when MSR(IS,DS) = (0,0) (VxWorks regards this as MMU "disabled"), and enables dynamic 4 KB page mapping (MMU "enabled") when MSR[IS] = 1 or MSR[DS] = 1.
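The VxWorks notion of an "enabled" or "disabled" MMU on this core reduces to a test of two MSR bits. The sketch below assumes the Book E bit positions for IS and DS (masks 0x20 and 0x10); the macro and function names are hypothetical and are not taken from the VxWorks headers.

```c
#include <stdint.h>

#define SKETCH_MSR_IS 0x00000020u   /* instruction address space (assumed mask) */
#define SKETCH_MSR_DS 0x00000010u   /* data address space (assumed mask)        */

/* Returns 1 when VxWorks would regard the MMU as "enabled", else 0. */
int mmuEnabledSketch(uint32_t msr)
{
    return (msr & (SKETCH_MSR_IS | SKETCH_MSR_DS)) != 0;
}
```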
After a processor reset, the board support package sets up a temporary static memory model. The following steps are included in the BSP romInit.s module:
This release of the VxWorks kernel provides support for the PowerPC 440 memory management unit (MMU). To include this support, configure INCLUDE_MMU_BASIC.
Tornado supports two cooperating models for memory mapping. The first, the static model, allows mapping of memory blocks ranging from 1 KB to 256 MB in size by dedicating an individual processor TLB entry to each block. The second, the dynamic model, provides the ability to map physical memory in 4 KB pages using the remaining available TLB entries in a round-robin fashion.
The data structure sysStaticTlbDesc[ ], defined in sysLib.c, describes the static TLB entry configuration. The number of static mappings is variable, depending on the size of the table, but should be kept to a minimum to allow the remaining TLB entries on the chip to be used for the dynamic model.
The static TLB entry registers are set by the initialization software in the MMU library.
Entry descriptions in sysStaticTlbDesc[ ] that set the _MMU_TLB_TS_0 attribute are used when VxWorks has the MMU "disabled" (that is, MSR(IS,DS) = (0,0)). Note that the VxWorks virtual memory library cannot represent physical addresses larger than the lowest 4 GB, and several of the PowerPC 440GP devices are located at higher physical addresses. To provide access to these devices when VxWorks has the MMU "enabled" (that is, MSR[IS] = 1 or MSR[DS] = 1), some entry descriptions in sysStaticTlbDesc[ ] set attribute _MMU_TLB_TS_1.
All of the configuration constants used to fill sysStaticTlbDesc[ ] are defined in installDir/target/h/arch/ppc/mmu440Lib.h.
The PPC440 dynamic mapping model allows memory to be mapped in pages of 4 KB. The translation table is organized into two levels: the top level consists of an array of 1,024 Level 1 (L1) table descriptors; each of these descriptors can point to an array of 1,024 Level 2 (L2) table descriptors. All mapping attributes are defined in L2 descriptors (write-through/copy-back, cache-inhibited, guarded, execute, and write permissions).
The translation table size depends on the total memory to be mapped. The larger the memory to be mapped, the bigger the table.
The PowerPC 8xx memory mapping model allows you to map memory in 4 KB pages. The translation table is organized into two levels: the top level consists of an array of 1,024 Level 1 (L1) table descriptors; each of these descriptors can point to an array of 1,024 Level 2 (L2) table descriptors. Three mapping attributes are defined in the L1 descriptors (copy-back, write-through, and guarded cache modes); the others (cache off and all access permission attributes) are defined in the L2 descriptors. This affects granularity. For example, if one 4 KB page is mapped in copy-back mode, all pages within the corresponding 4 MB block (1,024 x 4 KB pages) are mapped in copy-back mode, except for any pages having cache off defined. That is, the cache mode setting of a single page can affect the cache mode setting of all mapped pages in the block.
The application programmer interface for the PowerPC 8xx memory mapping unit is the same as that described previously for the MMU translation model. PowerPC 8xx processors that implement hardware memory coherency typically do not support the use of the VM_STATE_MEM_COHERENCY attribute; the state VM_STATE_CACHEABLE_NOT identifies a page as memory coherent.
AltiVec is a vector coprocessor and PowerPC instruction set extension introduced on the Motorola PowerPC 74xx family of processors. VxWorks treats AltiVec as an extension to the PowerPC 604 core: A PPC604 binary image can run without modification on any AltiVec part, but does not provide access to, or control of, the AltiVec unit itself. This section describes the VxWorks implementation of AltiVec support, including:
The AltiVec-specific functions shown in Table 3 have been added to VxWorks.
The stack frame for functions using the AltiVec registers adds the following areas to the standard EABI frame:
The stack frame layout for functions using the AltiVec registers is shown in Figure 1. Non-AltiVec stack frames are unchanged from prior VxWorks releases.
The required alignment for the SVR4 EABI specification is 16 bytes. This allows an extra 8-byte padding to be introduced in the AltiVec stack frame, and guarantees compatibility with the old 8-byte-aligned environment. AltiVec routines can be called from any other PowerPC EABI-compliant code that assumes an 8-byte alignment for stack boundary.
The AltiVec specification adds a new family of vector data types to the C language. Vector types are 128 bits long, and are used to manipulate values in AltiVec registers. Under control of a compiler option, vector is now a keyword in the C and C++ languages. The AltiVec programming model introduces five new keywords as simple type-specifiers: vector, __vector, pixel, __pixel, and bool.
The AltiVec Technology Programming Interface Manual also specifies vector conversions for formatted I/O. VxWorks supports the new formatted input and output of vector data types using the printf( ) and scanf( ) class functions shown in Table 4.
For a comprehensive discussion on the new format specifications, see the AltiVec Technology Programming Interface Manual. The following example program illustrates the input and output of sample vector values as well as several formatting variations.
void testFormattedIO()
{
__vector unsigned char s;
__vector signed int I;
__vector signed short SI;
__vector __pixel P;
__vector float F;
s = (__vector unsigned char)
('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');
I = (__vector signed int) (99, 88, -34, 0);
SI = (__vector signed short) (1, 2, -1, -2, 0, 3, 4, 5);
P = (__vector __pixel) (50, 51, 52, 53, 54, 55, 56, 57);
F = (__vector float) (-3.1415926, 3.1415926, 9.8, 0.000);
printf("s = (%vc), (%,vc)\n\n", s, s);
printf("I = (%,vd), (%,2vld), (%,_3lvi)\n\n", I, I, I);
printf("I = (%,#vd), (%,vlx), (%,_lvX), (%vo)\n\n", I, I, I, I);
printf("I = (%,#vd), (%,#vlp), (%,_lvp), (%#vo)\n\n", I, I, I, I);
printf("SI = (%_vhd), (%:hvd), (%;vhi)\n\n", SI, SI, SI);
printf("VECTOR STRING: (%vs)\n\n", "GOOD !!");
printf("VECTOR PIXEL (%+:5hvi)\n\n", P);
printf("VECTOR FLOAT *e5.6*: (%,5.6ve)\n", F);
printf("VECTOR FLOAT *E5.6*: (%:5.6vE)\n", F);
printf("VECTOR FLOAT *g5.6*: (%;5.6vg)\n", F);
printf("VECTOR FLOAT *G5.6*: (%5.6vG)\n", F);
printf("VECTOR FLOAT *f.7* : (%_.7vf)\n", F);
printf("VECTOR FLOAT *e* : (%ve)\n", F);
}
-> testFormattedIO
s = (0123456789ABCDEF), (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F)
I = (99,88,-34,0), (99,88,-34, 0), ( 99_ 88_-34_ 0)
I = (99,88,-34,0), (63,58,ffffffde,0), (63_58_FFFFFFDE_0), (143 130 37777777736 0)
I = (99,88,-34,0), (0x63,0x58,0xffffffde,0x0), (0x63_0x58_0xffffffde_0x0), (0143 0130 037777777736 0)
SI = (1_2_-1_-2_0_3_4_5), (1:2:-1:-2:0:3:4:5), (1;2;-1;-2;0;3;4;5)
VECTOR STRING: (GOOD !!)
VECTOR PIXEL ( +50: +51: +52: +53: +54: +55: +56: +57)
VECTOR FLOAT *e5.6*: (-3.141593e+00,3.141593e+00,9.800000e+00,0.000000e+00)
VECTOR FLOAT *E5.6*: (-3.141593E+00:3.141593E+00:9.800000E+00:0.000000E+00)
VECTOR FLOAT *g5.6*: (-3.14159;3.14159; 9.8; 0)
VECTOR FLOAT *G5.6*: (-3.14159 3.14159 9.8 0)
VECTOR FLOAT *f.7* : (-3.1415925_3.1415925_9.8000002_0.0000000)
VECTOR FLOAT *e* : (-3.141593e+00 3.141593e+00 9.800000e+00 0.000000e+00)
value = 76 = 0x4c = 'L'
->
Modules that use the AltiVec registers and instructions must be compiled with the Diab compiler option: -tPPC7400FV:vxworks55 (see Table 1). Use of this flag always enables the AltiVec keywords __vector, __pixel, and __bool.
Diab also enables the AltiVec keywords vector, pixel, bool (and vec_step) by default if the -tPPC7400FV option is used. However, each keyword can be individually enabled or disabled with the Diab compiler (dcc) option -Xkeywords=<mask>, where mask is a logical OR of the values in Table 5.
For example, the following command-line sequence enables bool and vec_step, but disables vector and pixel (and also all of the non-AltiVec keywords in Table 5). See the Diab Release Notes for details.
dcc -tPPC7400FV:vxworks55 -Xkeywords=0x180 -DCPU=PPC604 -DTOOL_FAMILY=diab -DTOOL=diab -c fioTest.c
With the GNU compiler, modules that use the AltiVec registers and instructions must be compiled with either the -fvec or the -fvec-eabi flag (see Table 1). The -fvec flag disables the use of the new vector and pixel keywords; __vector and __pixel are available for these new types. The -fvec-eabi flag enables all five keywords as a new family of types: bool, vector, __vector, pixel, and __pixel.
Code compiled with the GNU toolchain -fvec compiler option automatically adjusts its stack pointer dynamically at run-time to align itself on a 16-byte boundary. This feature enables AltiVec code to share the same run-time stack with regular non-AltiVec code which is aligned on 8-byte boundaries. However, this feature can also cause complications when calling AltiVec-enabled functions from non-AltiVec code, with either more than eight integer-class parameters or more than eight floating-point parameters. (Integer-class parameters are char, short, int, pointer types, and so forth.)
The initial eight parameters of each class are passed in registers. The remaining parameters are passed through a special area on the caller's stack called the parameter save area. The callee's code refers directly to the caller's stack frame to access these parameters. However, with AltiVec-enabled functions, there may be an 8-byte padding boundary between the two stack frames to fulfill the 16-byte AltiVec alignment constraint. Under these conditions, the callee cannot correctly access its parameters on the stack. The normal parameter passing machinery is broken for such cases.
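The origin of the padding can be modeled with simple address arithmetic. The sketch below is a host-side model, not the compiler's actual prologue: it assumes the callee subtracts its frame size from the caller's stack pointer and then rounds the result down to a 16-byte boundary. alignFrameTo16( ) is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the callee's 16-byte-aligned stack pointer. When pPad is not
 * NULL, *pPad receives the number of padding bytes introduced between
 * the caller's frame and the callee's frame.
 */
uint32_t alignFrameTo16(uint32_t callerSp, uint32_t frameSize, uint32_t *pPad)
{
    uint32_t unaligned = callerSp - frameSize;
    uint32_t aligned   = unaligned & ~(uint32_t)0xF;

    if (pPad != NULL)
        *pPad = unaligned - aligned;
    return aligned;
}
```

For a caller whose stack pointer is only 8-byte aligned (for example 0x1008) and a frame size that is a multiple of 16, the model yields 8 bytes of padding. Any offset the callee computes into the caller's parameter save area without accounting for that padding is then off by 8 bytes.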
In this release, GDB features a setaltivec command that allows users to set a particular value in a given vector register. Typical usage scenarios for the setaltivec command are as follows:
(gdb)help setaltivec
setaltivec <regname> 0x<hex>_<hex>_<hex>_<hex>
Sets the value of the specific Altivec register.
To set a given value in an AltiVec register using setaltivec, use the following:
(gdb)setaltivec v4 0x45454545_12345678_12_5A7
Vector register contents can be printed using the print command:
(gdb)print $v4
0x454545451234567800000012000005A7
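The relationship between the value typed and the value printed in this example is that each of the four underscore-separated fields is one 32-bit word, zero-padded on the left to eight hex digits. The following host-side sketch reproduces that formatting; formatVectorLiteral( ) is a hypothetical helper and is not part of GDB.

```c
#include <stdio.h>

/*
 * Parses "0x<hex>_<hex>_<hex>_<hex>" and writes "0x" followed by 32 hex
 * digits into out. Returns 0 on success, -1 on malformed input or when
 * the output buffer is too small.
 */
int formatVectorLiteral(const char *in, char *out, size_t outLen)
{
    unsigned int w[4];

    if (outLen < 35)   /* "0x" + 32 digits + terminating NUL */
        return -1;
    if (sscanf(in, "0x%8x_%8x_%8x_%8x", &w[0], &w[1], &w[2], &w[3]) != 4)
        return -1;
    snprintf(out, outLen, "0x%08X%08X%08X%08X", w[0], w[1], w[2], w[3]);
    return 0;
}
```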
Throwing C++ exceptions between modules compiled with different compiler flags may result in unexpected behavior. C++ exceptions save register state. Modules compiled with AltiVec support (using either -fvec or -fvec-eabi) save all non-volatile AltiVec registers, but modules compiled without AltiVec support do not save any AltiVec registers. If a C++ exception is thrown from an AltiVec-enabled module, caught by a non-AltiVec enabled handler, and then thrown from there to an AltiVec-enabled handler that alters the AltiVec registers, it is possible to corrupt the saved AltiVec state. In particular, the non-volatile vector registers (v20 through v31) may be corrupted.
The following example illustrates the above scenario. It consists of a program composed of two files, file1.cpp and file2.cpp. Because file2 is compiled with the -fvec option, we call it AltiVec code. file1 is compiled without the -fvec option, so it is considered non-AltiVec code.
The example takes program flow across the two modules. It is also contrived to make intelligent guesses about the compiler register allocation strategy. The output is incorrect when one of the files is compiled without the -fvec option.
// file1.cpp
extern "C" int printf (const char *fmp, ...);
extern void bar ();
void foo ()
{
try
{
bar ();
}
catch (...)
{
}
}
// file2.cpp
extern "C" int printf (const char *fmp, ...);
extern void foo ();
typedef __vector signed long T;
void bar ()
{
// use a non-volatile vector register
asm ( "vspltisw 24,0" ); // v24 <- (0,0,0,0)
}
void Start ()
{
// use a non-volatile vector register v24
T local = (__vector signed long) (-1, -1, -1, -1);
asm ( "vspltisw 24,15" ); // v24 <- (15, 15, 15, 15)
foo ();
// continue using the non-volatile vector registers
asm ( "addi 9, 31, 32" ); // local <- v24
asm ( "stvx 24, 0, 9" );
printf ("Finally, local = (%vld)\n", local);
}
To produce a partially linked object file2.o, compile the two files with the following commands:
% ccppc -mcpu=604 -c file1.cpp
% ccppc -mcpu=604 -nostdlib -fvec -r file1.o file2.cpp -o file2.o
Download file2.o to a target, and execute the Start function.
-> Start
Finally, local = (0,0,0,0)
->
Function foo in file1.cpp is non-AltiVec code. Therefore, the try...catch block in foo does not save and restore the AltiVec context. Within the try...catch block, the call to bar alters the value of vector register v24. Because file1.cpp does not save AltiVec context, the value 0 in v24 assigned by bar remains unchanged when program flow returns to Start. The original value 15, assigned before the call to bar, is now corrupted. Hence, the incorrect output, local = (0,0,0,0).
Compile both files with the -fvec option:
% ccppc -mcpu=604 -nostdlib -fvec -r file1.cpp file2.cpp -o file2.o
Download file2.o to a target and execute the Start function.
-> Start
Finally, local = (15,15,15,15)
->
Because both modules now have AltiVec code (compiled with the -fvec option), the try...catch block in foo now saves and restores the AltiVec context. The value 15 originally assigned in Start is faithfully restored by foo when it returns.
This section describes characteristics of the PowerPC architecture that you should keep in mind as you write a VxWorks application. The following topics are addressed:
Integer division by zero produces undefined results. Exception generation and handling are not provided by the compiler or run-time.
Floating-point exceptions are disabled by default during task initialization, causing zero-divide conditions to be ignored. On processors with hardware floating point (PPC603 and PPC604), individual tasks may modify their MSR in order to generate exceptions. On processors without hardware floating point (PPC403, PPC405, PPC440, and PPC860), neither the software floating-point library nor the compiler provides support for simulating a floating-point exception.
VxWorks uses bl or bla instructions by default for both exception/interrupt handling, and for dynamically downloaded module relocations. By using bl or bla, the PowerPC architecture is only capable of branching within the limits imposed by a signed 26-bit offset. This limits the available branch range to +/- 32 MB.
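The reach of the two instructions can be expressed as a pair of range checks. This sketch assumes the standard PowerPC I-form branch encoding, in which a 24-bit field is extended with two zero bits and sign-extended: a signed, word-aligned displacement for bl, and a sign-extended absolute address for bla. The function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_BRANCH_RANGE 0x02000000   /* 32 MB = 2^25 bytes */

/* Returns 1 when a relative bl at address 'from' can reach 'to'. */
int blCanReach(uint32_t from, uint32_t to)
{
    int64_t disp = (int64_t)to - (int64_t)from;

    return disp >= -(int64_t)SKETCH_BRANCH_RANGE &&
           disp <   (int64_t)SKETCH_BRANCH_RANGE &&
           (disp & 3) == 0;
}

/*
 * Returns 1 when an absolute bla can reach 'to': the lowest 32 MB of
 * the address space or, through sign extension, the highest 32 MB.
 */
int blaCanReach(uint32_t to)
{
    return (to < 0x02000000u || to >= 0xFE000000u) && (to & 3u) == 0;
}
```

A loader could apply a check like blCanReach( ) to each relocation to decide whether a direct branch is possible or a longer sequence through the LR is required.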
Branches across larger address ranges must be made to an absolute 32-bit address with the help of the LR register. Each absolute 32-bit jump is accomplished with a sequence of at least three instructions (more, if register state must be preserved) that is rarely needed and is expensive in terms of execution speed and code size. Such large branches are typically seen only in very large downloaded modules and very large (greater than 32 MB) system images.
One way of getting around this restriction for downloadable applications is to use the -mlongcall compiler option. However, this option may introduce an unacceptable performance penalty and code-size increase for some applications. For this reason, the VxWorks kernel is not compiled using -mlongcall.
Another way to get around this limitation is to increase the size of the WDB memory pool for host tools. By default, the WDB pool size is set to one-sixteenth of the amount of free memory. Memory allocations for host-based tools (such as WindSh and CrossWind) are done out of the WDB pool first, and then out of the general system memory pool. Requests larger than the available amount of WDB pool memory are done directly out of the system memory pool. If an application is anticipated to be located outside of the WDB pool, thus potentially crossing the 32 MB threshold, the size of the WDB memory pool can be increased to ensure the application fits into the required space.
To change the size of the WDB memory pool, redefine the macro WDB_POOL_SIZE in your BSP config.h file. This macro is defined in installDir/target/config/all/configAll.h as follows:
#define WDB_POOL_SIZE ((sysMemTop() - FREE_RAM_ADRS)/16)
Redefining WDB_POOL_SIZE in your BSP local config.h file alters the macro for that BSP only.
VxWorks 5.5 for PowerPC adds support for extended-call (32-bit addressable) exception vectors.
When exceptions and interrupts occur, PowerPC processors transfer control to a predetermined address, the exception vector, depending on the exception type. After saving volatile task state, the handler function installed for that exception vector is called. This call is made using bla instructions that, as described previously, require the handler function to be located within the first 32 MB of memory. Most systems are able to satisfy this 32 MB constraint. However, if a given handler function were to be located above 32 MB, the target address would be unreachable in previous VxWorks releases.
This release adds support for extended-call exception vectors, which can call handler functions located anywhere in the 4 GB address space. Extended-call exception vectors make calls to a 32-bit address in the Link Register (LR) using the blrl instruction. Extra work is required for an extended-call exception vector to load a 32-bit address into the LR and make a call to it, so using extended-call exception vectors adds eleven instructions of overhead and increases interrupt latency. It is therefore not advisable to use this feature unless absolutely necessary.
This release still maintains the earlier style 26-bit call vectors as the default. Using a single bl/bla instruction is much more efficient than the multiple-instruction sequence described previously. It is expected that most targets will continue to use the original relative branch (default) style exception handling.
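The decision between the two vector styles comes down to whether a handler address lies within the 32 MB region described above. The following host-side sketch shows that check; the function and macro names are hypothetical and are not part of VxWorks.

```c
#include <stdint.h>

/* Upper bound of the region the text describes as reachable by bla: 32 MB. */
/* (Sketch only: the sign-extended high address range is ignored here.)    */
#define SKETCH_BLA_REACH_LIMIT 0x02000000u

/* Returns 1 if a handler at handlerAddr lies at or above 32 MB and would   */
/* therefore need an extended-call vector; returns 0 otherwise.             */
int handlerNeedsExtendedVector(uint32_t handlerAddr)
{
    return handlerAddr >= SKETCH_BLA_REACH_LIMIT;
}
```

For example, a handler linked at 0x00100000 can use the default vectors, while one at 0x02000000 cannot.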
A new global boolean, excExtendedVectors, allows users to enable extended-call exception vectors. By default, excExtendedVectors is set to FALSE. When set to TRUE, extended-call vectors are enabled. excExtendedVectors must be set to TRUE before the exception vectors are initialized in the VxWorks boot sequence (that is, before the call to excVecInit( )). Setting excExtendedVectors after excVecInit( ) does not achieve the desired result, and results in unpredictable system behavior. Selection of extended-call exception vectors is done on a per-BSP basis in order to minimize the impact on those BSPs that do not require this feature.
Because excExtendedVectors must be set to TRUE before the call to excVecInit( ), users must define the preprocessor macro INCLUDE_SYS_HW_INIT_0, and also supply a sysHwInit0( ) function that sets excExtendedVectors to TRUE.
The following example is taken from the ads860 BSP.
#ifdef INCLUDE_SYS_HW_INIT_0
/*
 * Perform any BSP-specific initialisation that must be done before
 * cacheLibInit() is called and/or BSS is cleared.
 */
#ifndef _ASMLANGUAGE
IMPORT BOOL excExtendedVectors;
extern void sysHwInit0();
#endif /*_ASMLANGUAGE */
#define SYS_HW_INIT_0 sysHwInit0
#endif /* INCLUDE_SYS_HW_INIT_0 */
#ifdef INCLUDE_SYS_HW_INIT_0
/************************************************************************
* sysHwInit0 - Used here to enable extended exception vector support.
*
* RETURNS: None.
*/
void sysHwInit0 ()
{
excExtendedVectors = TRUE; /* enable extended-call exc. vectors */
}
#endif /*INCLUDE_SYS_HW_INIT_0 */
Not all target architectures support hardware breakpoints, and those that do accept different values for the access type passed to the bh( ) routine. The PowerPC family supports hardware breakpoints; however, the access types allowed depend on the specific processor.
For each processor family, the number of hardware breakpoints (a hardware limitation), address alignment constraints, and access types are detailed in the following tables. Both instruction and data access must be 4-byte aligned unless otherwise noted.
IBM PPC403 and PPC405 targets have two data breakpoints and two instruction breakpoints.
Data address parameters must be 1-byte aligned if the access width is 1 byte, 2-byte aligned if the access width is 2 bytes, 4-byte aligned if the access width is 4 bytes, and cache-line-size aligned if the access is a data cache line (16 bytes on PPC403, 32 bytes on PPC405). Instruction accesses are always 4-byte aligned.
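The alignment rule above can be expressed as a simple check. The following host-side sketch is illustrative only; the function name is hypothetical and is not part of the VxWorks or WDB API.

```c
#include <stdint.h>

/* Returns 1 if addr is suitably aligned for a data breakpoint of the given */
/* width, 0 otherwise. widthBytes is expected to be 1, 2, 4, or the data    */
/* cache line size (16 on PPC403, 32 on PPC405, per the text above).        */
int dataBpAddrIsAligned(uint32_t addr, uint32_t widthBytes)
{
    /* Reject zero and any width that is not a power of two. */
    if (widthBytes == 0 || (widthBytes & (widthBytes - 1)) != 0)
        return 0;

    /* For a power-of-two width, alignment means the low bits are clear. */
    return (addr & (widthBytes - 1)) == 0;
}
```

For instance, address 0x1010 is acceptable for a 16-byte cache-line breakpoint but not for a 32-byte one.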
IBM PPC40x processors allow the following access types for hardware breakpoints; note that the access type arguments are slightly different between PPC403 and PPC405. The byte width means break on all accesses between (addr) and (addr + x):
The Motorola PPC604/75x/74xx/8xx CPUs have one data and one instruction breakpoint. Data and instruction access must be 4-byte aligned.
The PPC8xx and PPC440 have 4 instruction and 2 data breakpoints. Data access is 1-byte aligned on PPC8xx and PPC440 CPUs.
All of these processors allow the following access types for hardware breakpoints:
The PowerPC conventions regarding register usage, stack frame formats, parameter passing between routines, and other factors involving code inter-operability, are defined by the ABI (Application Binary Interface) and the EABI (Embedded Application Binary Interface) protocols. The VxWorks implementation for PowerPC follows these protocols. Table 12 shows PowerPC register usage in VxWorks (note that only CPUs with hardware floating-point support have fpr0-31).
The following subsections augment the information in the VxWorks Programmer's Guide: I/O System.
PowerPC processors contain an instruction cache and a data cache. In the default configuration, VxWorks enables both caches. To disable the instruction cache, highlight the USER_I_CACHE_ENABLE macro in the Params tab under INCLUDE_CACHE_ENABLE and remove the TRUE; to disable the data cache, highlight the USER_D_CACHE_ENABLE macro and remove the TRUE.
For most boards, the cache capabilities must be used with the MMU to resolve cache coherency problems. The page descriptor for each page selects the cache mode. This page descriptor is configured by filling the data structure sysPhysMemDesc[ ] defined in sysLib.c. (For more information about cache coherency, see the reference entry for cacheLib. For information about the MMU and VxWorks virtual memory, see the VxWorks Programmer's Guide: Virtual Memory Interface. For MMU information specific to the PowerPC family, see Memory Management Unit.)
The state of both data and instruction caches is controlled by the WIMG information (see footnote 2) saved either in the BAT (Block Address Translation) registers or in the segment descriptors. Because a default cache state cannot be supplied, each cache can be enabled separately after the corresponding MMU is turned on. For more information on these cache control bits, see PowerPC Microprocessor Family: The Programming Environments, published jointly by Motorola and IBM.
The PPC403, and the PPC405 when not using the MMU, control the W, I, and G attributes using Special Purpose Registers (SPRs). (Because they do not provide any hardware support for memory coherency, these processors always consider the M attribute to be off.)
See the respective processor user's manual for detailed descriptions of the Data Cache Cacheability Register (DCCR), Data Cache Write-through Register (DCWR), Instruction Cache Cacheability Register (ICCR), and Storage Guarded Register (SGR).
The following describes exceptions to the above implementation for PowerPC 440 processors:
The Book E specification and the PPC440 core implementation do not provide a means to set a global cache enable/disable state, nor do they permit independently enabling or disabling the instruction and data caches.
In the default configuration, VxWorks enables both caches. If you disable one cache, you must disable the other. To disable both caches, highlight the USER_I_CACHE_ENABLE and USER_D_CACHE_ENABLE macros in the Params tab under INCLUDE_CACHE_ENABLE and remove the TRUE.
The state of both data and instruction caches is controlled by the WIMG information saved either in the static TLB entry registers or in the dynamic memory mapping descriptors. Because a default cache state cannot be supplied, both caches are enabled after the corresponding MMU is turned on.
If an application requires a different cache mode for instruction versus data access on the same region of memory, #undef USER_I_MMU_ENABLE, #define USER_D_MMU_ENABLE, use sysStaticTlbDesc[ ] to set up the instruction access mode, and sysPhysMemDesc[ ] to set up the data access mode.
The VxWorks cache library interface has changed for the following two calls:
STATUS cacheEnable(CACHE_TYPE cache);
STATUS cacheDisable(CACHE_TYPE cache);
The cache argument is ignored and the instruction and data caches are both enabled or disabled together. If called before the MMU library is initialized, cacheEnable returns OK and signals the MMU library to activate the cache after it has completed initialization. If the MMU library is active (that is, MSR[DS] = 1), cacheEnable returns ERROR.
The PowerPC 403, 405, 440, and 860 processors do not support hardware floating-point instructions. However, VxWorks provides a floating-point library that emulates these mathematical functions. All ANSI floating-point functions have been optimized using libraries from U. S. Software.
In addition, the following single-precision functions are also available:
The following floating-point functions are not available on PowerPC 403, 405, 440, and 860 processors:
The following floating-point functions are available for PowerPC 60x processors:
The following subset of ANSI functions is optimized using libraries from Motorola:
The following floating-point functions are not available on PowerPC 60x processors:
No single-precision functions are available for PPC60x processors.
Handling of floating-point exceptions is supported for PowerPC 60x processors. By default, the floating-point exceptions are disabled.
To change the default for a task spawned with the VX_FP_TASK option, modify the values of the Machine State Register (MSR) and the Floating-Point Status and Control Register (FPSCR) at the beginning of the task code.
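The following host-side sketch shows only the bit arithmetic involved in such a change. It assumes the classic PowerPC bit positions for MSR[FE0], MSR[FE1], and the FPSCR exception-enable bits; confirm them against your processor's user manual. All names are hypothetical, and the target-side step of reading and writing the registers themselves is not shown.

```c
#include <stdint.h>

/* Assumed classic PowerPC bit masks (verify against the processor manual). */
#define SKETCH_MSR_FE0   0x00000800u  /* floating-point exception mode 0 */
#define SKETCH_MSR_FE1   0x00000100u  /* floating-point exception mode 1 */
#define SKETCH_FPSCR_VE  0x00000080u  /* invalid operation enable        */
#define SKETCH_FPSCR_OE  0x00000040u  /* overflow enable                 */
#define SKETCH_FPSCR_UE  0x00000020u  /* underflow enable                */
#define SKETCH_FPSCR_ZE  0x00000010u  /* zero divide enable              */
#define SKETCH_FPSCR_XE  0x00000008u  /* inexact enable                  */
#define SKETCH_FPSCR_ALL (SKETCH_FPSCR_VE | SKETCH_FPSCR_OE | \
                          SKETCH_FPSCR_UE | SKETCH_FPSCR_ZE | SKETCH_FPSCR_XE)

/* Returns the MSR value with both FE0 and FE1 set (precise mode). */
uint32_t msrWithFpExcEnabled(uint32_t msr)
{
    return msr | SKETCH_MSR_FE0 | SKETCH_MSR_FE1;
}

/* Returns the FPSCR value with exactly the requested enable bits set; */
/* all other FPSCR bits are preserved.                                 */
uint32_t fpscrWithEnables(uint32_t fpscr, uint32_t enableMask)
{
    return (fpscr & ~SKETCH_FPSCR_ALL) | (enableMask & SKETCH_FPSCR_ALL);
}
```

A task would compute the new values this way at the start of its entry routine and then write them back to the registers using the facilities your BSP or architecture library provides.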
VxMP is an optional VxWorks component that provides shared-memory objects dedicated to high-speed synchronization and communication between tasks running on separate CPUs. For complete documentation of the optional component VxMP, see the VxWorks Programmer's Guide: Shared-Memory Objects.
Normally, boards that make use of VxMP must support hardware test-and-set (TAS: atomic read-modify-write cycle). Motorola PowerPC boards do not provide atomic (indivisible) TAS as a hardware function. VxMP for PowerPC provides special software routines which allow these Motorola boards to make use of VxMP.
The VxMP product for Motorola PowerPC boards has special software routines which compensate for the lack of atomic TAS operations in the PowerPC and the lack of atomic instruction propagation to and from these boards. This software consists of the routines sysBusTas( ) and sysBusTasClear( ).
The software implementation uses ownership of the VME bus as a semaphore; in other words, no TAS operation can be performed by a task until that task owns the VME bus. When the TAS operation completes, the VME bus is released. This method is similar to the special read-modify-write cycle on the VME bus in which the bus is owned implicitly by the task issuing a TAS instruction. (This is the hardware implementation employed, for example, with a 68K processor.) However, the software implementation comes at a price. Execution is slower because, unlike true atomic instructions, sysBusTas( ) and sysBusTasClear( ) require many clock cycles to complete.
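The idea of using bus ownership as the lock around a test-and-set can be modeled on a host as follows. This is a single-threaded sketch, not the BSP implementation: the bus acquire and release helpers are hypothetical stand-ins for whatever the board's VME bridge requires, and the routine names are prefixed to distinguish them from the real sysBusTas( ) and sysBusTasClear( ).

```c
/* Hypothetical stand-in for VME bus ownership; on real hardware, acquiring */
/* the bus would wait until the bus arbiter grants it.                      */
static int sketchBusOwned = 0;

static void sketchBusAcquire(void) { sketchBusOwned = 1; }
static void sketchBusRelease(void) { sketchBusOwned = 0; }

/* Test-and-set performed while holding the bus. Returns 1 if the location */
/* was clear and has now been set, 0 if it was already set.                */
int sketchBusTas(volatile char *adrs)
{
    int acquired = 0;

    sketchBusAcquire();          /* no other bus master can intervene now */
    if (*adrs == 0) {
        *adrs = 1;
        acquired = 1;
    }
    sketchBusRelease();
    return acquired;
}

/* Clears the location, again under bus ownership. */
void sketchBusTasClear(volatile char *adrs)
{
    sketchBusAcquire();
    *adrs = 0;
    sketchBusRelease();
}
```

The read and the write are separate operations; they appear atomic to other boards only because the bus is held for the duration, which is also why the sequence costs more cycles than a hardware TAS.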
To invoke this feature, set SM_TAS_TYPE to SM_TAS_HARD on the Params tab of the project facility under INCLUDE_SM_OBJ.
Systems using multiple VME boards where at least one board is a Motorola PowerPC board must have a Motorola PowerPC board as the board with a processor ID equal to 0 (the board whose memory is allocated and shared). This is because a TAS operation on local memory by, for example, a 68K processor does not involve VME bus ownership and is, therefore, not atomic as seen from a Motorola PowerPC board.
This restriction does not apply to systems that have globally shared memory boards which are used for shared memory operations. Specifying SM_OFF_BOARD as TRUE on the Params tab of the properties window for the processor with ID of 0 and setting the associated parameters enables you to assign processor IDs in any configuration. (For more information, see the VxWorks Programmer's Guide: Shared-Memory Objects.)
PowerPC 403, 405, and 440 processors support two classes of exceptions and interrupts: normal and critical. This release correctly attaches default handlers to both classes of exception handler. This release supplements the excConnect( ) and intConnect( ) functions by adding the excCrtConnect( ) and excIntCrtConnect( ) functions:
STATUS excCrtConnect (VOIDFUNCPTR *vectr, VOIDFUNCPTR routine);
STATUS excIntCrtConnect (VOIDFUNCPTR *vectr, VOIDFUNCPTR routine);
The excCrtConnect( ) function connects a C routine to a critical exception vector, in a manner analogous to excConnect( ). The excIntCrtConnect( ) routine performs a similar function for an interrupt (also see excVecGet( ) and excVecSet( )).
The excIntConnectTimer( ) function, required for PPC403 and PPC405 targets, is not needed for the PPC440.
Application code need not use the lower-level interfaces, excCrtConnect( ) and excIntCrtConnect( ) (see PowerPC 403, 405, and 440); instead, application code should use the simpler excVecGet( ) and excVecSet( ) routines, which automatically handle the different cases of critical and non-critical exceptions:
FUNCPTR excVecGet (FUNCPTR *vectr);
void excVecSet (FUNCPTR *vectr, FUNCPTR function);
The VxWorks memory layout is the same for all PowerPC processors. Figure 2 shows the memory layout, labeled as follows:
Anchor for the shared memory network and VxMP shared memory objects (if there is shared memory on the board).
The VxWorks system image itself (three sections: text, data, and bss). The entry point for VxWorks is at the start of this region, which is BSP dependent (see BSP-specific documentation).
Memory allocated by host tools. The size depends on the macro WDB_POOL_SIZE. Modify WDB_POOL_SIZE under INCLUDE_WDB.
Size depends on the size of the system image. The sysMemTop( ) routine returns the address of the end of the free memory pool.
All addresses shown in Figure 2 are relative to the start of memory for a particular target board. The start of memory (corresponding to 0x0 in the memory-layout diagram) is defined as LOCAL_MEM_LOCAL_ADRS under INCLUDE_MEMORY_CONFIG for each target.
1: The IBM PPC750FX was not available for test when this document was written, but is expected to be included in this Tornado release.
2:
W: the WRITETHROUGH or COPYBACK attribute.
I: the cache-inhibited attribute.
M: the memory coherency required attribute.
G: the guarded memory attribute.