Roughly, a section is a range of addresses, with no gaps; all data "in" those addresses is treated the same for some particular purpose. For example there may be a "read only" section.
The linker ld
reads many object files (partial programs) and
combines their contents to form a runnable program. When as
emits an object file, the partial program is assumed to start at address 0.
ld
assigns the final addresses for the partial program, so that
different partial programs do not overlap. This is actually an
oversimplification, but it suffices to explain how as
uses
sections.
ld
moves blocks of bytes of your program to their run-time
addresses. These blocks slide to their run-time addresses as rigid
units; their length does not change and neither does the order of bytes
within them. Such a rigid unit is called a section. Assigning
run-time addresses to sections is called relocation. It includes
the task of adjusting mentions of object-file addresses so they refer to
the proper run-time addresses.
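The sliding of rigid blocks can be pictured with a small Python model. This is only a sketch: the function names `relocate' and `runtime_address', the example blocks, and the base address 0x1000 are invented for illustration, and what ld really does is considerably more involved.

```python
def relocate(blocks, base):
    """Assign consecutive run-time addresses to rigid blocks of bytes.

    blocks: list of (name, bytes) pairs, each assumed to start at
            address 0 in its own object file.
    Returns a dict mapping name -> run-time start address.
    """
    starts = {}
    address = base
    for name, contents in blocks:
        starts[name] = address
        address += len(contents)  # length never changes during relocation
    return starts


def runtime_address(starts, name, object_file_address):
    """Adjust an object-file address (relative to 0) to its run-time address."""
    return starts[name] + object_file_address


blocks = [("prog1", b"\x01\x02\x03\x04"), ("prog2", b"\x05\x06")]
starts = relocate(blocks, base=0x1000)
print(starts)
print(hex(runtime_address(starts, "prog2", 1)))
```

The bytes themselves are never touched; only the start address of each block, and any mention of an address inside it, changes.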
For the H8/300 and H8/500,
and for the Hitachi SH,
as
pads sections if needed to
ensure they end on a word (sixteen bit) boundary.
An object file written by as
has at least three sections, any
of which may be empty. These are named text, data and
bss sections.
When it generates COFF output,
as
can also generate whatever other named sections you specify
using the `.section' directive (see section `.section name, subsection').
If you do not use any directives that place output in the `.text'
or `.data' sections, these sections still exist, but are empty.
When as
generates SOM or ELF output for the HPPA,
as
can also generate whatever other named sections you
specify using the `.space' and `.subspace' directives. See
HP9000 Series 800 Assembly Language Reference Manual
(HP 92432-90001) for details on the `.space' and `.subspace'
assembler directives.
Additionally, as
uses different names for the standard
text, data, and bss sections when generating SOM output. Program text
is placed into the `$CODE$' section, data into `$DATA$', and
BSS into `$BSS$'.
Within the object file, the text section starts at address 0, the
data section follows, and the bss section follows the data section.
When generating either SOM or ELF output files on the HPPA, the text
section starts at address 0, the data section at address 0x4000000,
and the bss section follows the data section.
To let ld
know which data changes when the sections are
relocated, and how to change that data, as
also writes to the
object file details of the relocation needed. To perform relocation
ld
must know, each time an address in the object
file is mentioned:
Which section does the address refer to? What is the numeric value of (address) - (start-address of section)?
In fact, every address as
ever uses is expressed as
(section) + (offset into section)
Further, most expressions as
computes have this section-relative
nature.
(For some object formats, such as SOM for the HPPA, some expressions are
symbol-relative instead.)
In this manual we use the notation {secname N} to mean "offset N into section secname."
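As a sketch of how section-relative addresses make relocation mechanical, the following Python model patches one address field inside a section image. The record layout (`where', `length', `section'), the little-endian byte order, and the start addresses are assumptions made for this illustration; they are not the format of any real object file.

```python
def apply_relocation(image, reloc, section_starts):
    """Patch one address field in a section image (a bytearray).

    reloc is a hypothetical record:
      where   - offset of the field within the image
      length  - size of the field in bytes
      section - name of the section the stored address refers to
    The field initially holds the offset into that section, i.e.
    (address) - (start-address of section).
    """
    where, length = reloc["where"], reloc["length"]
    offset = int.from_bytes(image[where:where + length], "little")
    final = section_starts[reloc["section"]] + offset
    image[where:where + length] = final.to_bytes(length, "little")
    return final


# A section image whose bytes 2..5 hold {data 8}, i.e. offset 8 into data.
image = bytearray(b"\xAA\xBB" + (8).to_bytes(4, "little") + b"\xCC")
reloc = {"where": 2, "length": 4, "section": "data"}
final = apply_relocation(image, reloc, {"text": 0x1000, "data": 0x2000})
print(hex(final))
```

Because the stored value is an offset into a known section, the patch is a single addition once the section's run-time start address is known.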
Apart from text, data and bss sections you need to know about the
absolute section. When ld
mixes partial programs,
addresses in the absolute section remain unchanged. For example, address
{absolute 0} is "relocated" to run-time address 0 by ld. Although the
linker never arranges two partial programs' data sections with
overlapping addresses after linking, by definition their absolute
sections must overlap. Address {absolute 239} in one part of a program
is always the same address when the program is running as address
{absolute 239} in any other part of the program.
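A minimal Python sketch of this rule follows, with made-up data-section placements for two partial programs. The function `resolve' and the addresses are hypothetical; the point is only that absolute addresses pass through unchanged while data addresses depend on where each data section was placed.

```python
def resolve(section, offset, section_starts):
    """Return the run-time address of {section offset}.

    The absolute section is treated as always starting at 0, so its
    addresses are the same no matter which partial program uses them.
    """
    if section == "absolute":
        return offset
    return section_starts[section] + offset


# Hypothetical data-section placements for two partial programs.
prog_a = {"data": 0x2000}
prog_b = {"data": 0x2400}

print(resolve("absolute", 239, prog_a), resolve("absolute", 239, prog_b))
print(hex(resolve("data", 16, prog_a)), hex(resolve("data", 16, prog_b)))
```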
The idea of sections is extended to the undefined section. Any address whose section is unknown at assembly time is by definition rendered {undefined U}---where U is filled in later. Since numbers are always defined, the only way to generate an undefined address is to mention an undefined symbol. A reference to a named common block would be such a symbol: its value is unknown at assembly time so it has section undefined.
By analogy the word section is used to describe groups of sections in
the linked program. ld
puts all partial programs' text
sections in contiguous addresses in the linked program. It is
customary to refer to the text section of a program, meaning all
the addresses of all partial programs' text sections. Likewise for
data and bss sections.
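The grouping can be sketched in Python as follows. The file names, section lengths, and the text/data/bss ordering starting at address 0 are assumptions for illustration; a real link also involves alignment and whatever layout the linker has been told to use.

```python
def link_layout(programs, order=("text", "data", "bss"), base=0):
    """Place every program's sections so that like sections are contiguous.

    programs: dict of program name -> dict of section name -> length.
    Returns dict of (program, section) -> run-time start address.
    """
    layout = {}
    address = base
    for section in order:
        for prog, sizes in programs.items():
            layout[(prog, section)] = address
            address += sizes.get(section, 0)
    return layout


programs = {
    "a.o": {"text": 0x20, "data": 0x08, "bss": 0x10},
    "b.o": {"text": 0x30, "data": 0x04, "bss": 0x00},
}
layout = link_layout(programs)
for key, start in layout.items():
    print(key, hex(start))
```

In this model "the text section of the program" is the range covered by both partial programs' text sections, and likewise for data and bss.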
Some sections are manipulated by ld; others are invented for
use of as and have no meaning except during assembly.
ld
deals with just four kinds of sections, summarized below.
as and ld treat the text and data sections as
separate but equal sections. Anything you can say of one section is
true of another.
When the program is running, however, it is
customary for the text section to be unalterable. The
text section is often shared among processes: it contains
instructions, constants and the like. The data section of a running
program is usually alterable: for example, C variables would be stored
in the data section.
Addresses in the absolute section are ones that ld must
not change when relocating. In this sense we speak of absolute
addresses being "unrelocatable": they do not change during relocation.
An idealized example of three relocatable sections follows. The example uses the traditional section names `.text' and `.data'. Memory addresses are on the horizontal axis.
The assembler's internal sections are meant only for the internal use
of as. They have no meaning at run-time. You do not really need to know
about these sections for most purposes, but they can be mentioned in as
warning messages, so it might be helpful to have an idea of their
meanings to as. These sections are used to permit the
value of every expression in your assembly language program to be a
section-relative address.
Assembled bytes conventionally fall into two sections: text and data.
You may have separate groups of text or data (or of data in named
sections) that you want to end up near each other in the object file,
even though they are not contiguous in the assembler source. as
allows you to
use subsections for this purpose. Within each section, there can be
numbered subsections with values from 0 to 8192. Objects assembled into the
same subsection go into the object file together with other objects in the same
subsection. For example, a compiler might want to store constants in the text
section, but might not want to have them interspersed with the program being
assembled. In this case, the compiler could issue a `.text 0' before each
section of code being output, and a `.text 1' before each group of
constants being output.
Subsections are optional. If you do not use subsections, everything goes in subsection number zero.
Each subsection is zero-padded up to a multiple of four bytes.
(Subsections may be padded a different amount on different flavors
of as.)
Subsections appear in your object file in numeric order, lowest numbered
to highest. (All this to be compatible with other people's assemblers.)
The object file contains no representation of subsections; ld
and
other programs that manipulate object files see no trace of them.
They just see all your text subsections as a text section, and all your
data subsections as a data section.
To specify which subsection you want subsequent statements assembled
into, use a numeric argument to specify it, in a `.text expression'
or a `.data expression' statement.
When generating COFF output, you can also use an extra subsection
argument with arbitrary named sections: `.section name, expression'.
Expression should be an absolute expression.
(See section Expressions.) If you just say `.text' then `.text 0'
is assumed. Likewise `.data' means `.data 0'. Assembly
begins in text 0. For instance:
    .text 0     # The default subsection is text 0 anyway.
    .ascii "This lives in the first text subsection. *"
    .text 1
    .ascii "But this lives in the second text subsection."
    .data 0
    .ascii "This lives in the data section,"
    .ascii "in the first data subsection."
    .text 0
    .ascii "This lives in the first text section,"
    .ascii "immediately following the asterisk (*)."
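To see how such statements end up in the object file, here is a small Python model of the rules described above. It is only a sketch: the `assemble' function, the tuple encoding of statements, and the shortened strings are invented for illustration, and the four-byte padding is the value quoted earlier, which may differ on other flavors of as.

```python
def assemble(statements, pad_to=4):
    """Collect bytes per (section, subsection), then emit each section with
    its subsections in numeric order, each zero-padded to pad_to bytes."""
    chunks = {}            # (section, subsection) -> bytearray
    current = ("text", 0)  # assembly begins in text 0
    for op, arg in statements:
        if op in ("text", "data"):
            current = (op, arg)
        elif op == "ascii":
            chunks.setdefault(current, bytearray()).extend(arg.encode("ascii"))
    sections = {}
    for (section, sub) in sorted(chunks):
        body = chunks[(section, sub)]
        body.extend(b"\0" * (-len(body) % pad_to))
        sections.setdefault(section, bytearray()).extend(body)
    return sections


statements = [
    ("text", 0), ("ascii", "first text subsection. *"),
    ("text", 1), ("ascii", "second text subsection."),
    ("data", 0), ("ascii", "data"),
    ("text", 0), ("ascii", "(follows the asterisk)"),
]
sections = assemble(statements)
print(bytes(sections["text"]))
```

In the resulting text section the second `text 0' string sits directly after the asterisk, and the `text 1' string comes only after all of subsection 0. The returned sections carry no trace of the subsection numbers.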
Each section has a location counter incremented by one for every byte
assembled into that section. Because subsections are merely a convenience
restricted to as, there is no concept of a subsection location
counter. There is no way to directly manipulate a location counter--but the
.align directive changes it, and any label definition captures its
current value. The location counter of the section where statements are being
assembled is said to be the active location counter.
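The behavior of a location counter can be mimicked with a toy Python class. The class and method names are hypothetical, and the `align' method assumes its argument is a byte boundary; the meaning of the real .align argument varies between targets.

```python
class Section:
    """A toy section with a location counter and a label table."""

    def __init__(self):
        self.counter = 0   # bytes assembled so far
        self.labels = {}

    def emit(self, data):
        self.counter += len(data)   # one increment per assembled byte

    def label(self, name):
        self.labels[name] = self.counter   # capture the current value

    def align(self, boundary):
        # Advance the counter to the next multiple of `boundary` bytes.
        self.counter += -self.counter % boundary


text = Section()
text.label("start")
text.emit(b"\x90\x90\x90")
text.align(4)
text.label("aligned")
text.emit(b"\x01")
print(text.labels, text.counter)
```

Nothing in the model sets the counter directly; it moves only as a side effect of emitting bytes or aligning, and labels merely read it.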
The bss section is used for local common variable storage. You may allocate address space in the bss section, but you may not dictate data to load into it before your program executes. When your program starts running, all the contents of the bss section are zeroed bytes.
Addresses in the bss section are allocated with special directives; you
may not assemble anything directly into the bss section. Hence there
are no bss subsections. See the sections on `.comm symbol, length'
and `.lcomm symbol, length'.
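The distinction between allocating space and supplying data can be sketched with a toy Python class, loosely modeled on what a directive such as `.lcomm symbol, length' asks for. The class and its methods are hypothetical, and the model ignores alignment and the way the linker handles common symbols.

```python
class Bss:
    """Toy bss section: space can be reserved, but no data can be stored."""

    def __init__(self):
        self.size = 0
        self.symbols = {}

    def reserve(self, symbol, length):
        """Allocate `length` bytes and return the symbol's offset."""
        self.symbols[symbol] = self.size
        self.size += length
        return self.symbols[symbol]

    def initial_contents(self):
        # At program start every byte of bss is zero.
        return bytes(self.size)


bss = Bss()
bss.reserve("buffer", 16)
bss.reserve("count", 4)
print(bss.symbols, bss.initial_contents())
```

The class deliberately has no method for writing bytes: only a size and a set of symbol offsets are recorded, which is why the object file need not store any contents for this section.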