Myricom GM myrinet software and documentation Copyright (c) 1994-2000 by Myricom, Inc. All rights reserved. Permission to use, copy, modify, and distribute this software and its documentation in source and binary forms for non-commercial purposes and without fee is hereby granted, provided that the modified software is returned to Myricom, Inc. for redistribution. The above copyright notice must appear in all copies and both the copyright notice and this permission notice must appear in supporting documentation, and any documentation, advertising materials, and other materials related to such distribution and use must acknowledge that the software was developed by Myricom, Inc. The name of Myricom, Inc. may not be used to endorse or promote products derived from this software without specific prior written permission. Myricom, Inc. makes no representations about the suitability of this software for any purpose. THIS FILE IS PROVIDED "AS-IS" WITHOUT WARRANTY OF ANY KIND, WHETHER EXPRESSED OR IMPLIED, INCLUDING THE WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. MYRICOM, INC. SHALL HAVE NO LIABILITY WITH RESPECT TO THE INFRINGEMENT OF COPYRIGHTS, TRADE SECRETS OR ANY PATENTS BY THIS FILE OR ANY PART THEREOF. In no event will Myricom, Inc. be liable for any lost revenue or profits or other special, indirect and consequential damages, even if Myricom has been advised of the possibility of such damages. Other copyrights might apply to parts of this software and are so noted when applicable. Myricom, Inc. Email: info@myri.com 325 N. Santa Anita Ave. World Wide Web: http://www.myri.com/ Arcadia, CA 91024
Portions of this program are subject to the following copyright:
Copyright (c) 1990 The Regents of the University of California. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the University of California, Berkeley and its contributors. 4. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
This document describes the GM message passing system. The document describes the GM-1.1 API, which is both simpler to use and more powerful than the GM-1.0 API. The 1.0 API will continue to be supported by the GM libraries for the foreseeable future, and GM-1.0 programs actually run significantly faster under GM-1.1 than under GM-1.0, but new programs should use the GM API as described in this document.
This document exists in the following formats:
The following typeface conventions are used in this document:
`this' represents code.
`this()' represents a function name, with the return type and parameters unspecified. Such references should not be interpreted to necessarily represent a function with no parameters and/or no return value. The absence of a return value or parameters is indicated only by the use of the keyword void, as in "void function(void)", which indicates that function() returns nothing and requires no parameter.
Numerical constants are represented in this document using the C language conventions.
GM is a message-based communication system for Myrinet. Like many messaging systems, GM's design objectives included low CPU overhead, portability, low latency, and high bandwidth. Additionally, GM has several distinguishing characteristics:
GM is a light-weight communication layer, and as such has limitations that can be addressed by layering a heavier-weight interface over GM. Some such limitations are the following:
From the client's point of view, GM consists of a library, `libgm.a', and a header file, `gm.h'. All externally visible GM identifiers in these files match the regular expression `^_*[Gg][Mm]_' to minimize name space pollution.
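As an illustration of the naming rule, the following minimal sketch tests identifiers against the expression quoted above using the POSIX regcomp() and regexec() functions. The helper name is_gm_identifier() is hypothetical and is not part of GM.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of GM): returns 1 if name matches the
   pattern ^_*[Gg][Mm]_ quoted in the text, and 0 otherwise.  A failure
   to compile the pattern is also reported as 0 in this sketch. */
static int
is_gm_identifier (const char *name)
{
  regex_t re;
  int matched;

  if (regcomp (&re, "^_*[Gg][Mm]_", REG_NOSUB) != 0)
    return 0;
  matched = (regexec (&re, name, 0, NULL, 0) == 0);
  regfree (&re);
  return matched;
}
```

A client that avoids names beginning with an optional run of underscores followed by gm_ or GM_ should therefore not collide with GM's externally visible identifiers.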
Additionally, GM has other parts that system administrators need to be concerned about:
This document attaches special meaning to a few commonly used words. The meaning of each of these words in the context of this document is defined here. In particular, please note the special meanings of the words "size" and "length." Understanding the special meaning of these terms is critical to understanding this document.
gm_mtu(port) (usually 4096 bytes) to bound the time any packet can monopolize network resources. Note that multiple packets are required to send large messages over the network, but the segmentation of messages into packets and the reassembly of packets into messages are performed automatically by GM.
The size of a message is any integer greater than or equal to log2(length + 8), where length is the length of the message. The size of a receive buffer is any positive integer less than or equal to log2(length + 8), where length is the length of the buffer. Consequently, a buffer of size size must have a length of at least (2**size) - 8. A buffer having a longer length serves no useful purpose in GM, but is allowed. The function gm_min_size_for_length(length) can be used to compute the minimum size for any length, and the function gm_max_length_for_size(size) can be used to compute the maximum length for any size.
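The relation between size and length can be made concrete with a short sketch. The two helpers below are hypothetical stand-ins that only mirror the formulas stated above; they are not the GM library's implementation of gm_min_size_for_length() and gm_max_length_for_size().

```c
/* Sketch only: mirrors the relations stated in the text. */

/* Smallest size such that 2**size >= length + 8. */
static unsigned int
sketch_min_size_for_length (unsigned long long length)
{
  unsigned int size = 0;

  while ((1ULL << size) < length + 8)
    size++;
  return size;
}

/* Largest length that fits a buffer of the given size: 2**size - 8.
   Assumes size >= 3, the smallest size the formula above can yield. */
static unsigned long long
sketch_max_length_for_size (unsigned int size)
{
  return (1ULL << size) - 8;
}
```

For example, a message of length 4088 has a minimum size of 12, while a message one byte longer needs size 13.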
The GM communication system provides reliable, ordered delivery between communication endpoints, called "ports," with two levels of priority. This model is "connectionless" in that there is no need for client software to establish a connection with a remote port in order to communicate with it: the client software simply builds a message and sends it to any port in the network. (This apparently paradoxical "connectionless reliability" is achieved by GM maintaining reliable connections between each pair of hosts in the network and multiplexing the traffic between ports over these reliable connections.)
Under operating systems that provide memory protection, GM provides memory protected network access. It should be impossible for any non-privileged GM client application to use GM to access any memory other than the application's own memory, except as explicitly allowed by the GM API. The unforgeable source of each received message is available to the receiver, allowing the receiver to discard messages from untrusted sources.
The largest message GM can send or receive is limited to
(2**31)-1
bytes. However, because send and receive buffers must reside in DMAable
memory, the maximum message size is limited by the amount of DMAable
memory the GM driver is allowed to allocate by the operating system.
Most GM applications obtain DMAable memory using the straightforward
gm_dma_malloc()
and gm_dma_free()
calls, but sophisticated
applications with large memory requirements may perform DMA memory
management using gm_register_memory()
and
gm_deregister_memory()
to pin and unpin memory on operating
systems that support memory registration.
Message order is preserved only for messages of the same priority, from the same sending port, and directed to the same receiving port. Messages with differing priority never block each other. Consequently, low priority messages may pass high priority messages, unlike in some other communication systems. Typical GM applications will either use only one GM priority, or use the high priority channel for control messages (such as client-to-client acks) or for single-hop message forwarding.
Both sends and receives in GM are regulated by implicit tokens,
representing space allocated to the client in various internal GM
queues, as depicted in the following figure. At initialization, the
client implicitly possesses gm_num_send_tokens()
send tokens, and
gm_num_receive_tokens()
receive tokens. The client may call
certain functions only when possessing an implicit send or receive
token, and in calling that function, the client implicitly relinquishes
the token(1). The client
program is responsible for tracking the number of tokens of each type
that it possesses, and must not call any GM function requiring a token
when the client does not possess the appropriate token. Calling a GM API
function without the required tokens has undefined results, but GM
usually reports such errors, and such errors will not cause system
security to be violated.
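Because the tokens are implicit, the client has to count them itself. The following is a minimal sketch of such bookkeeping, not part of GM: the names sketch_tokens, sketch_tokens_take(), and sketch_tokens_give_back() are hypothetical, and the initial count is assumed to be the value reported by gm_num_send_tokens() or gm_num_receive_tokens().

```c
/* Hypothetical client-side token counter; GM does not provide this. */
struct sketch_tokens
{
  unsigned int available;
};

/* Record the number of tokens the client possesses after opening a port. */
static void
sketch_tokens_init (struct sketch_tokens *t, unsigned int initial)
{
  t->available = initial;
}

/* Call before any GM function that requires a token.  Returns 1 and
   consumes one token if the client possesses one, or 0 if the call
   must be deferred. */
static int
sketch_tokens_take (struct sketch_tokens *t)
{
  if (t->available == 0)
    return 0;
  t->available--;
  return 1;
}

/* Call when GM implicitly hands a token back to the client. */
static void
sketch_tokens_give_back (struct sketch_tokens *t)
{
  t->available++;
}
```

A client would typically keep one such counter per token type and per port, and refuse to call a token-consuming function whenever sketch_tokens_take() returns 0.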
As stated above, sends are token regulated. A client of a port may send a message only when it possesses a send token for that port. By calling a GM API send function, the client implicitly relinquishes that send token. The client passes a callback and context pointer to the send function. When the send completes, GM calls callback, passing a pointer to the GM port, the client-supplied context pointer, and a status code indicating whether the send completed successfully or with an error. When GM calls the client's callback function, the send token is implicitly passed back to the client. Most GM programs, which rely on GM's fault tolerance to handle transient network faults, should consider a send completing with a status other than GM_SUCCESS to be a fatal error. However, more sophisticated programs may use the GM fault tolerance API extensions to handle such non-transient errors. These extensions are described in an appendix.
It is important to note that the client-supplied callback function will be called only within a client's call to gm_unknown(), the GM unknown event handler function that the client must call when it receives an unrecognized event. The gm_unknown() function is described in more detail below.
GM receives are also token regulated. After a port is opened, the client implicitly possesses gm_num_receive_tokens() receive tokens, allowing it to provide GM with up to this many receive buffers using gm_provide_receive_buffer(). With each call to gm_provide_receive_buffer(), the client implicitly relinquishes a receive token. With each buffer passed to gm_provide_receive_buffer(), the client passes a corresponding integer size indicating that the length of the receive buffer is at least gm_max_length_for_size(size) bytes.
Before a client of a port can receive a message of a particular size and priority, the client software must provide GM with a receive token of matching size and priority. The receive token specifies the buffer in which to store the matching receive. When a message of matching size and priority is received, that message will be transferred into the receive buffer specified in the receive token. Note that multiple receive tokens of the same size and priority may be provided to the port.
After providing receive buffers with sizes matching the sizes of all messages that potentially could be received, the client must poll for receive events using a gm_*receive*() function. (Most developers who think polling is unacceptable in their application find that polling is fine as long as they do it in a separate thread.) The gm_*receive*() function will return a pointer to a gm_recv_event_t. Events of type GM_RECV_EVENT and GM_HIGH_RECV_EVENT describe received messages of low and high priority, respectively. All other events should simply be passed to gm_unknown(). Such events are used internally by GM for sundry purposes, and the client need not be concerned with the contents of unrecognized receive events unless otherwise stated in this document.
To avoid deadlock of the port, the client software must ensure that the port is never without a receive token for any acceptable combination of size and priority for more than a bounded amount of time, that the port is informed which combinations of size and priority are not acceptable for receives, and that the client not send to any remote port that does not do likewise.
By convention, when a port runs out of low priority receive tokens for any combination of sizes, the client may defer replacing the receive tokens pending the completion of a bounded number of high priority sends, but must always replace exhausted types of high priority receive tokens without waiting for any sends to complete. Using this technique, reliable, deadlock-free, single-hop forwarding can be achieved.
Before calling any other GM function, gm_init()
should be called.
gm_finalize()
should be called after all other GM calls and
before your program exits. Each call to gm_init()
should be
balanced by a call to gm_finalize()
before the program exits.
Although GM automatically handles ungraceful program termination without
such balanced calls on operating systems with memory protection,
developers are strongly discouraged from relying on this feature because
on some systems, such as those using the VxWorks embedded runtime
system, the calls to gm_finalize()
are required for proper
shutdown of GM to allow ports to be reused without rebooting VxWorks.
A GM port is initialized by calling gm_open(struct gm_port **port, unsigned int unit, unsigned int port_id, char *port_name, enum gm_api_version version) to open port number port_id of Myrinet interface number unit. The pointer returned at *port must be passed to subsequent GM API calls. port_name is a character string of up to gm_max_port_name_length() bytes describing the client. The name is currently used for debugging purposes only, but this information will eventually be available to all GM clients on the network through a mechanism TBD. version should be GM_API_VERSION_1_1.
Note that while the GM API uses "struct gm_port *" pointers throughout, these pointers are opaque to the client. The client should not attempt to dereference these pointers.
After opening a port, the client implicitly possesses gm_num_send_tokens() send tokens and gm_num_receive_tokens() receive tokens. Most GM programs will use most or all of the receive tokens immediately after opening a port to pass receive buffers to GM using gm_provide_receive_buffer().
After the client has provided all receive buffers that it will provide during port initialization, the client should call gm_set_acceptable_sizes() for each priority (GM_LOW_PRIORITY and GM_HIGH_PRIORITY) to indicate what GM receive sizes the client expects to receive on the port. While this call is not strictly required, calling it allows GM to immediately reject any contradictory sends, immediately generating a send error at the sender. If these calls to gm_set_acceptable_sizes() are not made, then the error will not be reported until the sender experiences a GM long-period timeout, which takes about a minute to be generated by default. Therefore, calling gm_set_acceptable_sizes() can save much time during application development.
GM will only send messages from memory allocated with a gm_dma_*alloc() function, or memory that has been registered for DMA transfers using gm_register_memory(). If the client attempts to send data from nonDMAable memory, GM will send bytes of value 0xaa instead. If the client attempts to receive data into nonDMAable memory, the data will be silently discarded.
Note that some operating systems (e.g., Solaris) do not support gm_register_memory() due to operating system limitations, so the gm_dma_*alloc() functions must be used instead to obtain DMA memory.
Unless explicitly enabled using gm_allow_remote_memory_access(port), GM will not allow remote processes to use gm_directed_send() to modify the memory of the process. If remote memory access has been enabled, then this protection is disabled, and any remote GM port may modify the contents of any DMAable memory associated with that port. GM developers should be aware of this potential security risk, although it is usually not a concern.
In GM, message sends are regulated by a simple token-passing mechanism to prevent GM's bounded-size internal queues from overflowing. The client software must possess a send token before calling gm_send_with_callback(). After initialization, the client software implicitly possesses all gm_num_send_tokens() send tokens, and implicitly passes one token to the GM library with each call to gm_send_with_callback() or gm_send_to_peer_with_callback(). The token is retained by GM until the send completes, at which time GM calls the client-supplied callback, implicitly returning the send token to the client. The contents of the send message should not be modified in the interval between the call to the send function and the send completion, because doing so will cause undefined data to be delivered to the receiver.
The gm_send_with_callback() call requires the following parameters:
GM_HIGH_PRIORITY or GM_LOW_PRIORITY
The order of messages with different priorities or with different destination ports is not preserved. Only the order of messages with the same priority and to the same destination port is preserved.
In the special case that the target_port_id is the same as the sending port ID (as is often the case), the streamlined gm_send_to_peer_with_callback() function may be used instead of gm_send_with_callback(), allowing the target_port_id parameter to be omitted, and slightly improving small-message performance on 32-bit Myrinet interfaces.
Similarly to message sends, message receives in GM are regulated by a simple token-passing mechanism: Before a message can be received, the client software must provide GM a receive token that allows the message to be received and specifies a buffer to hold the received data.
After initialization, the client implicitly possesses all gm_num_receive_tokens() receive tokens. The client software grants receive tokens to GM by calling gm_provide_receive_buffer(port, buffer, size, priority), indicating that GM may receive any message into buffer as long as the size and priority fields of the received message exactly match the size and priority fields passed to gm_provide_receive_buffer(). Eventually, GM will use buffer to receive a message of the indicated size and priority. Unlike some messaging systems, GM requires that the size of the received message match the token size exactly. GM will not use the next larger sized receive buffer when a receive buffer of the correct size is not available. All receive buffers passed to gm_provide_receive_buffer() must be DMAable. They must also be aligned or be within memory allocated using gm_dma_*alloc() to ensure that messages can be DMAed into the buffer, and must be at least gm_max_length_for_size(size) bytes long.
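The exact-match rule can be illustrated with a small model. The sketch below is not GM code: it only models a table of provided receive tokens indexed by priority and size, with hypothetical names throughout and an arbitrary bound on the number of sizes. The point it shows is that a message is matched only against a token of exactly the same size and priority, and never against a larger one.

```c
#define SKETCH_NUM_SIZES      32   /* arbitrary bound chosen for this sketch */
#define SKETCH_NUM_PRIORITIES 2    /* low and high */

/* Hypothetical model of the receive tokens a client has handed to GM. */
struct sketch_recv_tokens
{
  unsigned int provided[SKETCH_NUM_PRIORITIES][SKETCH_NUM_SIZES];
};

/* Models providing one receive buffer of the given size and priority. */
static void
sketch_provide (struct sketch_recv_tokens *t, unsigned int size,
                unsigned int priority)
{
  if (size < SKETCH_NUM_SIZES && priority < SKETCH_NUM_PRIORITIES)
    t->provided[priority][size]++;
}

/* Models an arriving message.  Returns 1 and consumes a token only if a
   token with exactly this size and priority exists.  Larger buffers are
   deliberately not considered. */
static int
sketch_match (struct sketch_recv_tokens *t, unsigned int size,
              unsigned int priority)
{
  if (size >= SKETCH_NUM_SIZES || priority >= SKETCH_NUM_PRIORITIES)
    return 0;
  if (t->provided[priority][size] == 0)
    return 0;
  t->provided[priority][size]--;
  return 1;
}
```

In this model, a client that has provided only a size-12 buffer cannot receive a size-10 message, which is why the text asks the client to provide buffers for every size it expects.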
Typical GM clients will provide at least 2 receive buffers for each size and priority of message that might be received to maximize performance by allowing one buffer to be processed and replaced while the network is filling the other. However, 1 receive buffer for each size-priority combination is sufficient for correct operation. Additionally, it is almost always a good idea to provide additional buffers for the smallest sizes, so that many small messages may be received while the host is busy computing. There is no need to provide tokens for receives smaller than gm_min_message_size().
After providing receive tokens, code may poll for pending events using gm_receive_pending(port), which returns a nonzero value if a receive is pending or zero if no event has been received. gm_next_event_peek (struct gm_port *p, gm_u16_t *sender) can also be used to peek at the event at the head of the queue. The return value is the event type (zero if no event is pending). The sender parameter will be filled with the sender of the message if the event is a message receive event. The client may also poll for receives using gm_receive(port), which returns a pointer to an event structure of type gm_recv_event_t. If no receive event is in the receive queue, a pointer to a fake receive event of type GM_NO_RECV_EVENT will be returned. The event returned by gm_receive() is only guaranteed to be valid until the next call to gm_receive().
There are several variants of gm_receive() available, all of which can safely be used in the same program.
gm_receive()
    GM_NO_RECV_EVENT if none is pending.
gm_blocking_receive()
gm_blocking_receive_no_spin()
Once the client has obtained a receive event from a gm_*receive*() function, the client should either process the event if the client recognizes the event, or pass the event to gm_unknown() if the event is unrecognized. All fields in the receive event are in network byte order, and must be converted to host byte order as specified in the section on Endian Conversion. The client is not required to handle any receive events, and may simply pass all events to gm_unknown(), but any useful GM program will handle GM_RECV_EVENTs or GM_HIGH_RECV_EVENTs in order to access the received data. The receive event types that the client software may choose to recognize are as follows (GM internal events are not listed):
GM_ALARM_EVENT
    GM_ALARM_EVENTs should be treated as unknown events and passed to gm_unknown(). However, because client alarm handlers are called within gm_unknown() when gm_unknown() receives a GM_ALARM_EVENT, it can be useful for a program to perform alarm polling only after passing GM_ALARM_EVENTs to gm_unknown(), as in the `test/gm_allsize.c' example program. See the documentation for gm_set_alarm() for more information.
GM_RECV_EVENT
GM_HIGH_RECV_EVENT
    Fields of the event->recv structure:
    length
    size
    buffer: the buffer passed to gm_provide_receive_buffer(), which allowed this receive to occur
    sender_node_id
    sender_port_id
    tag: the tag passed to gm_provide_receive_buffer_with_tag(), or 0 if gm_provide_receive_buffer() was used instead
    type: GM_HIGH_RECV_EVENT indicates the receipt of a high-priority packet. GM_RECV_EVENT indicates the receipt of a low-priority packet.
GM_PEER_RECV_EVENT
GM_HIGH_PEER_RECV_EVENT
    These events may be treated as unknown events (passed to gm_unknown()), in which case the event will be converted to a normal GM_RECV_EVENT and passed to the client in the next call to a gm_*receive*() function.
    These events are just like the normal GM_RECV_EVENT and GM_HIGH_RECV_EVENT events, but indicate that the sender port id is the same as the receiver port id. Most GM programs should handle these events directly just like they handle normal receive events.
    Fields of the event->recv structure:
    length
    size
    buffer: the buffer passed to gm_provide_receive_buffer(), which allowed this receive to occur
    sender_node_id
    sender_port_id
    tag: the tag passed to gm_provide_receive_buffer_with_tag(), or 0 if gm_provide_receive_buffer() was used instead
    type: The PEER event types indicate that the sender port number is the same as the receiver port number. The HIGH event types indicate that the message was sent with high priority.
GM_FAST_RECV_EVENT
GM_FAST_HIGH_RECV_EVENT
GM_FAST_PEER_RECV_EVENT
GM_FAST_HIGH_PEER_RECV_EVENT
    These events may be treated as unknown events (passed to gm_unknown()), in which case the event will be converted to a normal GM_RECV_EVENT and passed to the client in the next call to a gm_*receive*() function. The conversion process will copy the received message from the receive queue into the receive buffer.
    These types indicate that a small-message receive occurred with the small message stored in the receive queue for improved small-message performance. The PEER event types indicate that the sender port number is the same as the receiver port number. The HIGH event types indicate that the message was sent with high priority.
    If your program uses any small messages that are immediately processed and discarded upon receipt, then your program can improve performance by processing these messages directly. If after examining the message your program determines that it needs the data copied into the buffer, it can either call gm_memorize_message() to do so or pass the event to gm_unknown().
    Fields of the event->recv structure:
    message: valid only until the next call to gm_receive()
    length
    size
    buffer: the buffer passed to gm_provide_receive_buffer(), which allowed this receive to occur
    sender_node_id
    sender_port_id
    tag: the tag passed to gm_provide_receive_buffer_with_tag(), or 0 if gm_provide_receive_buffer() was used instead
    type: The PEER types indicate that the sender port number is the same as the receiver port number. The HIGH types indicate that the message was sent with high priority.
    If the client needs to access *message past the next call to gm_receive(), then the client should copy *message into *buffer using gm_memorize_message(), which is simply a version of bcopy() optimized for copying aligned messages. After calling gm_memorize_message(), the fast receive event becomes equivalent to a normal receive event.
GM_NO_RECV_EVENT
GM_RAW_RECV_EVENT
    Fields of the event->recv structure:
    length
    buffer
GM_SENT_EVENT
    This event is associated with the gm_send() function, which is deprecated in favor of the superior gm_send_with_callback() functions.
    event->sent.message_list points to a null-terminated array of void pointers, which are message pointers from earlier gm_send() calls that have completed successfully. For each pointer in this array, a send token is implicitly returned to the client.
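For clients that still use the deprecated mechanism, the number of send tokens returned by one GM_SENT_EVENT is simply the number of entries in the null-terminated array. The sketch below illustrates the counting only. The helper name is hypothetical, and the array is assumed to have already been converted to host byte order.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GM): counts the entries of a
   null-terminated array of message pointers.  One send token is
   implicitly returned to the client for each entry. */
static unsigned int
sketch_count_returned_tokens (void **message_list)
{
  unsigned int n = 0;

  while (message_list[n] != NULL)
    n++;
  return n;
}
```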
Although the number of receive events may seem daunting at first glance, almost all of the event types can be ignored. The following receive dispatch loop is fully functional for a nontrivial application that accepts messages from all ports, accepts only small control messages sent with high priority, and accepts low priority messages of any size:
{
  struct gm_port *my_port;
  gm_recv_event_t *e;
  void *some_buffer;
  ...
  while (1)
    {
      e = gm_receive (my_port);
      /* Event fields arrive in network byte order. */
      switch (gm_ntohc (e->recv.type))
        {
        case GM_HIGH_RECV_EVENT:
          /* Handle high-priority control messages here in bounded time */
          /* Return the same buffer to GM to replace the receive token. */
          gm_provide_receive_buffer (my_port,
                                     gm_ntohp (e->recv.buffer),
                                     gm_ntohc (e->recv.size),
                                     GM_HIGH_PRIORITY);
          break;
        case GM_RECV_EVENT:
          /* Handle data messages here in bounded time */
          gm_provide_receive_buffer (my_port,
                                     some_buffer,
                                     gm_ntohc (e->recv.size),
                                     GM_LOW_PRIORITY);
          break;
        case GM_NO_RECV_EVENT:
          /* Do bounded-time processing here, if desired. */
          break;
        default:
          /* All other events must be handed back to GM. */
          gm_unknown (my_port, e);
        }
    }
}
However, the following implementation is slightly faster because it handles control messages without copying them into the receive buffer:
{
  struct gm_port *my_port;
  gm_recv_event_t *e;
  void *some_buffer;
  ...
  while (1)
    {
      e = gm_receive (my_port);
      switch (gm_ntohc (e->recv.type))
        {
        case GM_FAST_HIGH_PEER_RECV_EVENT:
        case GM_FAST_HIGH_RECV_EVENT:
          /* Handle high-priority control messages here in bounded time,
             reading them directly from the receive queue */
          gm_provide_receive_buffer (my_port,
                                     gm_ntohp (e->recv.buffer),
                                     gm_ntohc (e->recv.size),
                                     GM_HIGH_PRIORITY);
          break;
        case GM_FAST_PEER_RECV_EVENT:
        case GM_FAST_RECV_EVENT:
          /* Copy *message into *buffer (source first, as with bcopy()) */
          gm_memorize_message (gm_ntohp (e->recv.message),
                               gm_ntohp (e->recv.buffer),
                               gm_ntohl (e->recv.length));
          /* fall through: the event is now equivalent to a normal receive */
        case GM_PEER_RECV_EVENT:
        case GM_RECV_EVENT:
          /* Handle data messages here in bounded time */
          gm_provide_receive_buffer (my_port,
                                     some_buffer,
                                     gm_ntohc (e->recv.size),
                                     GM_LOW_PRIORITY);
          break;
        case GM_NO_RECV_EVENT:
          /* Do bounded-time processing here, if desired. */
          break;
        default:
          gm_unknown (my_port, e);
        }
    }
}
Any receive event not recognized by an application must be passed immediately to gm_unknown(), as in the example above. The function gm_unknown() will free any resources associated with the event that the client application would normally be expected to free if it recognized the type. Also, additional, undocumented event types will be received by an application and are handled by gm_unknown(). These events are used to support features such as GM alarms and blocking receives.
The motivation for putting small messages in the receive queue despite the fact that doing so might require a receive-side copy is the following set of observations:
gm_receive()
.
Therefore, placing small received messages in the receive command queue rather than in the more permanent receive buffer enhances performance and is worth the added complexity.
To prevent program deadlock, the client software must ensure that GM is never without a receive token (buffer) for any potentially received message for more than a bounded amount of time. Generally, except for the case of message `forwarding' described in the next chapter, this means that after each successful call to gm_receive() the client will call gm_provide_receive_buffer() to replace the receive token (buffer) with one of the same size and priority before the next call to gm_receive() or gm_send(). If such a deadlock condition exists for too long (on the order of a minute) or too often (a significant fraction of a one-minute interval), then remote sends directed at the receiving port will time out.
GM receive events are delivered to the user in network byte order. This enhances the performance of GM programs, but is a minor inconvenience to developers using the GM API. The client must call a special function to convert each field read from the gm_recv_event_t union to host byte order. Neglecting this conversion will result in undefined program behaviour in most cases.
In the absence of automatic checks, endian conversion is typically an error-prone programming task. Therefore, support has been added to GM-1.4 `gm.h' to ensure that no conversion is missing. Note, however, that the support is incompatible with the deprecated gm_send()/GM_SENT_EVENT mechanism in GM.
All you need to do to activate the checking is add the line
#define GM_STRONG_TYPES 1
before the line
#include "gm.h"
in your source code(2). Once the feature is activated, the compiler will report errors if any type conversion is missing. The error messages can be a bit cryptic and are platform specific, but they generally indicate some sort of type mismatch.
Endian conversion of fields in receive events from network to host order is achieved with the following functions:
gm_ntohc()
gm_ntohs()
gm_ntohl()
gm_ntohp()
These 4 functions should be sufficient to convert all the types you will encounter in gm receive events.
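To show what such a conversion does, the following sketch converts 16-bit and 32-bit values from network byte order (big-endian) to host order. It is independent of `gm.h': the helper names are hypothetical, and unlike the GM functions, which take typed event fields, these read raw bytes so that the result is the same on any host.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the GM conversion functions.  Each reads a
   big-endian (network order) value from memory and returns it in host
   byte order, whatever the host's endianness. */
static uint16_t
sketch_ntoh_u16 (const unsigned char *p)
{
  return (uint16_t) (((unsigned int) p[0] << 8) | (unsigned int) p[1]);
}

static uint32_t
sketch_ntoh_u32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}
```

A field that is read without such a conversion on a little-endian host would have its bytes reversed, which is the failure that the strong type checking is meant to catch at compile time.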
GM provides the following simple alarm API. The alarm API allows the GM client to schedule a callback function to be called after a delay, specified in microseconds. An unbounded number of alarms may be set, although alarm overhead increases linearly with the number of set alarms, and the client must provide storage for each set alarm.
gm_alarm_t structure for use with gm_set_alarm(). This function should be called after the structure is allocated but before a pointer to it is passed to gm_set_alarm() or gm_cancel_alarm().
callback(context) to be called after usec microseconds (or later), or reschedule the alarm if it has already been scheduled and has not yet triggered. callback must be non-NULL. context is treated as an opaque pointer by GM, and may be used to pass a pointer to the client-supplied callback function.
GM clients will also be able to take advantage of the fact that an application is guaranteed to receive a single GM_ALARM_EVENT for each call to a client-supplied callback, with the corresponding callback occurring during the call to gm_unknown() that processes that alarm. This means that a case statement like the following in the client's event loops can be used to significantly reduce the overhead of polling for any effect of a client-supplied alarm callback:
case GM_ALARM_EVENT:
  gm_unknown (port, event);
  /* poll for effect of alarm callbacks only here */
  break;
While GM automatically handles transient network errors such as dropped, corrupted, or misrouted packets, and while the GM mapper automatically reconfigures the network if links or nodes appear or disappear, GM cannot automatically handle catastrophic errors such as crashed hosts or loss of network connectivity without the cooperation of the client program.
When GM detects a catastrophic error, it temporarily disables the delivery of all messages with the same sender port, target port, and priority as the message that experienced the error, and GM informs the client of catastrophic network errors by passing a status other than GM_SUCCESS to the client's send completion callback routine. The client program is then expected to call either gm_resume_sending() or gm_drop_sends(), which reenable the delivery of messages with the same sender port, target port, and priority. This mechanism preserves the message order over the prioritized connection between the sending and receiving ports, while allowing the client to decide if the other packets that it has already enqueued over the same connection should be transmitted or dropped.
Simpler GM programs, such as MPI programs, will typically consider GM send errors to be fatal and will typically exit when they see a send error. This is reasonable for applications running on small or physically robust clusters where errors are rare and when users can tolerate restarting jobs in the rare event of a network error. Poorly written GM programs may simply ignore the error codes, which will cause the program to eventually hang with no error indication when catastrophic errors are encountered. This poor programming practice is strongly discouraged: Developers should always check the send completion status. More sophisticated applications, such as high availability database applications, will respond to the network faults, which appear to the client as send completion status codes other than GM_SUCCESS.
The send completion status codes are as follows:
GM_SUCCESS
GM_SEND_TIMED_OUT
GM_SEND_REJECTED
The receiving client indicated (using gm_set_acceptable_sizes()) that the size of the message was unacceptable. This error indicates a programming error in the client software.
GM_SEND_TARGET_PORT_CLOSED
GM_SEND_TARGET_NODE_UNREACHABLE
GM_SEND_DROPPED
The send was dropped at the client's request. (See gm_drop_sends().) This status code does not indicate an error.
GM_SEND_PORT_CLOSED
When the send completion status code indicates an error, a sophisticated client program may respond by calling gm_resume_sending() or gm_drop_sends(). Calling gm_resume_sending() causes GM to simply reenable delivery of subsequent messages over the connection, including those that have already been enqueued. This would be the typical response of a distributed database that assumes the underlying network is unreliable and layers its own reliability protocol over GM.
Calling gm_drop_sends() causes GM to drop all enqueued sends over the disabled connection, return them to the client with status GM_SEND_DROPPED, and reenable the connection. This would be the typical response of a program that wishes to reorder subsequent communication over the connection in response to the error.
Note that each of the fault response functions (gm_drop_sends() and gm_resume_sending()) requires a send token. This send token is implicitly returned to the caller when the callback function passed to gm_drop_sends() or gm_resume_sending() is called by GM.
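One way a client might organize its reaction to a send completion status is sketched below. The enum is a stand-in for the GM status codes listed above (the values are arbitrary and do not match gm.h), and the policy itself is an assumption made for illustration: GM leaves the choice between resuming and dropping to the client. All sk_* names are hypothetical.

```c
#include <assert.h>

/* Stand-ins for the GM send completion status codes named above.
   The numeric values are arbitrary and are NOT those of gm.h. */
typedef enum
{
  SK_SUCCESS,
  SK_SEND_TIMED_OUT,
  SK_SEND_REJECTED,
  SK_SEND_TARGET_PORT_CLOSED,
  SK_SEND_TARGET_NODE_UNREACHABLE,
  SK_SEND_DROPPED,
  SK_SEND_PORT_CLOSED
} sk_status_t;

/* What the client decides to do next (hypothetical). */
typedef enum
{
  SK_NO_ACTION,   /* nothing to recover from */
  SK_RESUME,      /* would call gm_resume_sending() */
  SK_DROP,        /* would call gm_drop_sends() */
  SK_GIVE_UP      /* treat as fatal */
} sk_action_t;

/* One possible policy, chosen only for illustration. */
static sk_action_t
sk_choose_action (sk_status_t status, int has_own_reliability)
{
  switch (status)
    {
    case SK_SUCCESS:
    case SK_SEND_DROPPED:          /* documented as not an error */
      return SK_NO_ACTION;
    case SK_SEND_REJECTED:         /* programming error in the client */
      return SK_GIVE_UP;
    case SK_SEND_TIMED_OUT:
    case SK_SEND_TARGET_PORT_CLOSED:
    case SK_SEND_TARGET_NODE_UNREACHABLE:
      /* A client with its own reliability layer resumes; others drop
         the queued sends so they can reorder communication. */
      return has_own_reliability ? SK_RESUME : SK_DROP;
    default:
      /* Codes whose meaning this sketch does not model. */
      return SK_GIVE_UP;
    }
}
```

In a real client this decision would be made inside the send completion callback, and either fault response call would itself consume a send token, as noted above.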
Some of GM's internal modules may be useful to GM developers, so their APIs are exposed. These modules include the following:
GM provides the following functions, which compute 32-bit CRCs on the contents of memory. These functions are not guaranteed to perform any particular variant of the CRC-32, but these functions are useful for creating robust hashing functions.
GM implements a generic hash table with a flexible interface. This module can automatically manage storage of fixed-size keys and/or data, or can allow the client to manage storage for keys and/or data. It allows the client to specify arbitrary hashing and comparison functions.
For example,
hash = gm_create_hash (gm_hash_compare_strings, gm_hash_hash_string, 0, 0, 0, 0);
creates a hash table that uses null-terminated character string keys residing in client-managed storage, and returns pointers to data in client-managed storage. In this case, all pointers to hash keys and data passed by GM to the client will be the same as the pointers passed by the client to GM.
As another example,
hash = gm_create_hash (gm_hash_compare_ints, gm_hash_hash_int, sizeof (int), sizeof (struct my_big_struct), 100, 0);
creates a hash table that uses ints as keys and returns pointers to copies of the inserted structures. All storage for the keys and data is automatically managed by the hash table. In this case, all pointers to hash keys and data passed by GM to the client will point to GM-managed buffers. This function also preallocates enough storage for 100 hash entries, guaranteeing that at least 100 key/data pairs can be inserted in the table if the hash table creation succeeds.
The automatic storage management option of GM not only is convenient, but also is extremely space efficient for keys and data no larger than a pointer, because when keys and data are no larger than a pointer, GM automatically stores them in the space reserved for the pointer to the key or data, rather than allocating a separate buffer.
Note that all keys and data buffers are referred to by pointers, not by value. This allows keys and data buffers of arbitrary size to be used. As a special (but common) case, however, one may wish to use pointers as keys directly, rather than use what they point to. In this special case, use the following initialization, and pass the keys (pointers) directly to the API, rather than the usual references to the keys.
hash = gm_create_hash (gm_hash_compare_ptrs, gm_hash_hash_ptr, 0, data_len, min_cnt, flags);
While it is possible to specify a key_len of sizeof (void *) during initialization and treat pointer keys just like any other keys, the API above is more efficient, more convenient, and completely architecture independent.
Some day the GM hash table API may be extended, but the current API is as follows:
gm_hash structure or 0 if the hash table could not be created. The parameters are as follows:
gm_hash_compare_ints
gm_hash_compare_longs
gm_hash_compare_ptrs
gm_hash_compare_strings
gm_hash_hash_int
gm_hash_hash_long
gm_hash_hash_ptr
gm_hash_hash_string
0 if the keys should not be copied into GM-managed buffers.
0 if the data should not be copied into GM-managed buffers.
0 because no flags are currently defined.
0 if no match exists. If the data resides in a GM-managed buffer, it is only guaranteed to be valid until the next operation on the hash table.
0 if no match exists.
*key (or data *data) is copied into the hash table unless the table was initialized with a key_len (or data_len) of 0.
GM implements a lookaside list, which may be used to manage small fixed-length blocks more efficiently than gm_malloc() and gm_free(). Lookaside lists can also be used to ensure that at least a minimum number of blocks are available for allocation at all times.
GM lookaside lists have the following API:
0 if the buffer could not be allocated.
gm_lookaside_alloc(). The contents of the block of memory are guaranteed to be unchanged until the next operation is performed on the lookaside list.
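The general idea of a lookaside list can be sketched as a free list of fixed-size blocks that is preallocated and never shrinks. The sketch below is an assumption about how such a list could work, not GM's implementation, and all sk_* names are hypothetical. A small header in front of each block holds the free-list link, so freeing a block leaves its contents untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of each block. The union members also keep
   the payload that follows suitably aligned for common types. */
typedef union sk_header
{
  union sk_header *next;   /* free-list link */
  long long ll;
  long double ld;
  void *p;
} sk_header_t;

typedef struct
{
  sk_header_t *free_list;  /* blocks ready for reuse */
  size_t entry_len;        /* payload size of every block */
} sk_lookaside_t;

/* Return every cached block to the system allocator. */
static void
sk_lookaside_destroy (sk_lookaside_t *l)
{
  while (l->free_list != NULL)
    {
      sk_header_t *h = l->free_list;
      l->free_list = h->next;
      free (h);
    }
}

/* Preallocate min_cnt blocks. Returns 1 on success, 0 on failure. */
static int
sk_lookaside_init (sk_lookaside_t *l, size_t entry_len, size_t min_cnt)
{
  size_t i;
  l->free_list = NULL;
  l->entry_len = entry_len;
  for (i = 0; i < min_cnt; i++)
    {
      sk_header_t *h = malloc (sizeof (*h) + entry_len);
      if (h == NULL)
        {
          sk_lookaside_destroy (l);
          return 0;
        }
      h->next = l->free_list;
      l->free_list = h;
    }
  return 1;
}

/* Reuse a cached block if possible; otherwise fall back to malloc(). */
static void *
sk_lookaside_alloc (sk_lookaside_t *l)
{
  sk_header_t *h = l->free_list;
  if (h != NULL)
    l->free_list = h->next;
  else
    h = malloc (sizeof (*h) + l->entry_len);
  return h != NULL ? (void *) (h + 1) : NULL;
}

/* Put a block back on the list; only the header is written. */
static void
sk_lookaside_free (sk_lookaside_t *l, void *block)
{
  sk_header_t *h = (sk_header_t *) block - 1;
  h->next = l->free_list;
  l->free_list = h;
}
```

Because freed blocks return to the list instead of the system allocator, at least min_cnt blocks can always be outstanding at once after a successful sk_lookaside_init().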
The GM "mark" API is new to GM-1.4. It allows the creation and destruction of mark sets, which allow mark addition, mark removal, and test for mark in mark set operations to be performed in constant time. Marks may be members of only one mark set at a time. Marks have the very unusual property that they need not be initialized before use.
All operations on marks are extremely efficient. Mark initialization requires zero time. Removing a mark from a mark set and testing for mark inclusion in a mark set take constant time. Addition of a mark to a mark set takes O(constant) time, assuming the mark set was created with support for a sufficient number of marks; otherwise, it requires O(constant) average time. Finally, creation and destruction of a mark set take time comparable to the time required for a single call to malloc() and free(), respectively.
Because marks need not be initialized before use, they can actually be used to determine if other objects have been initialized. This is done by putting a mark in the object, and adding the mark to a "mark set of marks in initialized objects" once the object has been initialized. This is similar to one common use of "magic numbers" for debugging purposes, except that it is immune to the possibility that the uninitialized magic number contained the magic number before initialization, so such marks can be used for non-debugging purposes. Therefore, marks can be used in ways that magic numbers cannot. For example, they may be used to solve the following exercise:
Marks have the useful property that each mark in a mark set has a unique value, and if this value is corrupted, the mark is implicitly removed from the mark set. This makes marks useful for detecting memory corruption, and makes them less prone to false negatives than magic numbers, which proliferate copies of a single value.
Finally, marks are location-dependent. This means that if a mark is copied, the copy will not be a member of the mark set.
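The properties described above (no initialization, constant-time operations, implicit removal on corruption, and location dependence) are characteristic of a well-known "sparse set" technique, sketched below. Whether GM uses exactly this scheme is an assumption; all sk_* names are hypothetical, and this sketch uses a fixed capacity instead of growing on demand. A mark stores an index into the set's table, and the table entry points back at the mark, so membership requires both to agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A mark holds only an index; any garbage value is acceptable. */
typedef struct { size_t idx; } sk_mark_t;

typedef struct
{
  sk_mark_t **members;   /* members[i] points back at the i-th mark */
  size_t count;          /* number of marks currently in the set */
  size_t capacity;       /* fixed in this sketch */
} sk_mark_set_t;

/* Returns 1 on success, 0 if the table could not be allocated. */
static int
sk_mark_set_init (sk_mark_set_t *s, size_t capacity)
{
  assert (capacity > 0);
  s->members = malloc (capacity * sizeof (*s->members));
  s->count = 0;
  s->capacity = capacity;
  return s->members != NULL;
}

/* A mark is a member only if its index is in range AND the table
   entry at that index points back at this exact mark address. */
static int
sk_mark_is_in_set (const sk_mark_set_t *s, const sk_mark_t *m)
{
  return m->idx < s->count && s->members[m->idx] == m;
}

/* Returns 1 if the mark is in the set afterwards, 0 if the set is full. */
static int
sk_mark_add (sk_mark_set_t *s, sk_mark_t *m)
{
  if (sk_mark_is_in_set (s, m))
    return 1;
  if (s->count == s->capacity)
    return 0;
  m->idx = s->count;
  s->members[s->count++] = m;
  return 1;
}

/* Remove by moving the last member into the vacated slot. */
static void
sk_mark_remove (sk_mark_set_t *s, sk_mark_t *m)
{
  sk_mark_t *last;
  if (!sk_mark_is_in_set (s, m))
    return;
  last = s->members[--s->count];
  s->members[m->idx] = last;
  last->idx = m->idx;
}
```

A copied mark fails the pointer comparison, and a mark whose index is overwritten fails either the range check or the back-pointer check, which matches the location dependence and corruption detection described above.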
GM_SUCCESS on success. Requires time comparable to malloc().
*set. Requires time comparable to free().
*mark to set. Requires O(constant) time if the mark set has preallocated resources for the mark. Otherwise, requires O(constant) average time.
*mark is in set. Requires O(constant) time.
*mark from set. Requires O(constant) time.
The following GM API allows pages to be allocated and freed.
GM_PAGE_LEN.
gm_page_alloc().
The function (char *)_gm_get_route(port, node_id, length_p) returns a pointer to the route to the network host with GM ID node_id and stores the length of the route in *length_p (4).
In the future, a standard mechanism for determining the type of a remote node will be specified. It will involve sending a request to the node to return the type to avoid storing types for all remote nodes in local memory.
The following miscellaneous library functions are provided. Several are simply cover functions for standard Unix library functions, but are provided to simplify the creation of portable GM programs, or to provide the ANSI functionality on non-ANSI systems, such as Windows NT.
abort(). Aborts the current process.
priority previously freed with gm_free_send_token() or returns 0 if no token is available. Clients may choose to maintain their own send token counts without using this utility function.
gm_directed_send() function. This is a significant security hole, but is very useful on tightly coupled clusters on trusted networks.
gm_blocking_receive(), only it sleeps the current thread immediately if no receive is pending. It is well suited to applications with more than one CPU-intensive thread per processor.
gm_mtu(port)
If any network error is encountered while sending the packet, the packet is silently and immediately dropped. After the packet has been DMA'd from host memory, callback(port,context,status) is called inside a user invocation of gm_unknown(), reporting the status of the attempted send.
gm_register_memory().
gm_directed_send_with_callback() instead.
Transfer the len bytes at source_buffer to target_port_id on target_node_id with priority priority and store the data at the remote virtual memory address target_buffer. The order of the transfer is preserved relative to messages of the same priority sent using gm_send() or gm_send_to_peer().
callback(port,context,status) when the send completes or fails, with status indicating the status of the send. The order of the transfer is preserved relative to messages of the same priority sent using gm_send() or gm_send_to_peer().
gm_dma_calloc() or gm_dma_malloc(). Note that the memory is not necessarily unlocked and returned to the operating system, but may be reused in future calls to gm_dma_calloc() and gm_dma_malloc().
GM_SEND_DROPPED. This function requires a send token, which will be returned to the client using a GM_FREE_SEND_TOKEN_EVENT or GM_FREE_HIGH_SEND_TOKEN_EVENT, as determined by priority.
status.
gm_finalize() should be matched by a call to gm_init().
gm_malloc(), or gm_calloc().
gm_alloc_pages().
priority for port so that it can later be allocated using gm_alloc_send_token(). Clients may choose to maintain their own count of send tokens in the client's possession instead of using this utility function.
gm_free_send_token(), but can be used to free zero or more tokens.
*n.
GM_DIRECTED_SEND_NOTIFICATION event is received, programs may optionally call this function instead of gm_unknown() as an optimization for handling send completions more efficiently. This function does not improve performance on newer hardware, which does not generate GM_DIRECTED_SEND_NOTIFICATION events.
GM_DIRECTED_SEND_NOTIFICATION event is received, programs may optionally call this function instead of gm_unknown() as an optimization for handling send completions more efficiently. This function does not improve performance on newer hardware, which does not generate GM_DIRECTED_SEND_NOTIFICATION events.
GM_NO_SUCH_NODE_ID in case of an error.
GM_PAGE_LEN). Each call to gm_init() should be matched by a call to gm_finalize(). (5)
gm_free().
*n.
*n.
memcmp() function.
"FAST" receive messages as described in section Receiving Messages. If message and buffer differ, gm_memorize_message(port, message, buffer) copies the message pointed to by message into the buffer pointed to by buffer. gm_memorize_message() returns buffer. This function is optimized for performing such aligned copies.
on_exit(), this function registers a callback so that callback(status,arg) is called when the program exits. Callbacks are called in the reverse of the order of registration. This function is also somewhat similar to BSD atexit().
*p. This pointer must be passed to all subsequent functions that operate on the opened port. port_name is a null-terminated ASCII string that is used to identify the port client for debugging (and potentially other) purposes; pass in the name of your program.
Note that unit and port numbers start at 0, and that ports 0 and 1 are reserved, so clients will usually open ports 2 and higher.
gm_page_alloc().
perror, but takes the error code as a parameter to allow thread safety in future implementations, and only supports GM error numbers. Prints message followed by a description of errno.
printf() function.
gm_provide_receive_buffer_with_tag(...,0), and no faster than doing so. It is included for backwards compatibility. Many new clients will want to use gm_provide_receive_buffer_with_tag() instead.
size and priority fields. It is the client software's responsibility to provide buffers of each size and priority that might be received; not doing so can cause program deadlock, which will eventually result in the port being closed after a timeout(6).
The client software may provide up to gm_num_receive_tokens() different receive buffers into which messages may be received.
Each buffer provided by the client software to GM via this function will be used only once to receive a message. In other words, calling gm_provide_receive_buffer(port, buffer, size, priority) provides GM a token to receive a single message of size size and priority priority into the receive buffer buffer. When a message is eventually received into this buffer, gm_receive(port) stores the buffer pointer buffer and tag in the returned event, returning control of the buffer (token) to the client software. If the client software wishes for the buffer to be reused for a similar receive, it must call gm_provide_receive_buffer() again with the same or similar parameters.
Once a buffer has been provided to GM, its content should not be changed until control of the buffer has been returned to the client software via gm_receive().
The tag parameter must be in the range [0,255], and is returned in the receive event describing a receive into the buffer. It may be used in any way the client desires, and need not be unique.
GM_NO_RECV_EVENT is immediately returned.
gm_*receive*() function will return the event immediately, although gm_receive() is preferred in this case for efficiency.
gm_*receive*() function will return the event immediately, although gm_receive() is preferred in this case for efficiency.
gm_deregister_memory(). Note that memory registration is an expensive operation relative to sending and receiving packets, so you should use persistent memory registrations wherever possible. Also note that memory registration is not supported on Solaris due to operating system limitations.
GM_SEND_DROPPED. This function requires a send token, which will be returned to the client using a GM_FREE_SEND_TOKEN_EVENT or GM_FREE_HIGH_SEND_TOKEN_EVENT, as determined by priority.
Queues the message of length len to be sent with priority priority to node target_node_id. Before calling gm_send(), client software must first possess a send token of the same priority, and by calling gm_send() the client implicitly relinquishes this send token. After a call to gm_send(..., message, len, ...), the memory specified by message and len must not be modified until the send completes. The buffer pointed to by message is not modified by GM between the time gm_send(port, message,...) is called and the time that the sent message pointer appears in a GM_SENT_EVENT.
gm_send(), client software must first possess a send token of the same priority, and by calling gm_send() the client implicitly relinquishes this send token. After a call to gm_send(..., message, len, ...), the memory specified by message and len must not be modified until the send completes. After the send completes, callback(port,context,status) will be called inside gm_unknown(), with status indicating the status of the completed send. The buffer pointed to by message should not be modified by GM between the time gm_send(port, message,...) is called and the time that the send completes.
Like gm_send(), only with the target_port_id implicitly set to the same ID as port. This function is marginally faster than gm_send().
gm_send_with_callback(), only with the target_port_id implicitly set to the same ID as port. This function is marginally faster than gm_send_with_callback().
gm_alloc_send_token(port, priority) would return, without actually allocating a send token. This function allows client software to test for the availability of a send token without actually allocating the send token.
sleep(), sleeping the entire process for seconds seconds.
The following functions require a send token:
gm_datagram_send()
gm_directed_send()
gm_directed_send_with_callback()
gm_drop_sends()
gm_resume_sending()
gm_send()
gm_send_to_peer()
gm_send_to_peer_with_callback()
gm_send_with_callback()
The send token is implicitly returned to the client when the function's callback is called or, for the GM-1.0 functions gm_send() and gm_send_to_peer(), a send token is implicitly passed to the client with each pointer returned in a GM_SENT_EVENT. (The legacy GM_SENT_EVENTs are generated if and only if the legacy gm_send() and gm_send_to_peer() functions are called.)
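Since clients may keep their own send token counts, the bookkeeping can be as small as one counter per priority. The sketch below is a hypothetical client-side model, not GM code: the sk_* names are invented, and it assumes two priority levels (low and high), as suggested by the separate GM_FREE_SEND_TOKEN_EVENT and GM_FREE_HIGH_SEND_TOKEN_EVENT events.

```c
#include <assert.h>

/* Assumed priority levels; the values are local to this sketch. */
enum { SK_LOW_PRIORITY = 0, SK_HIGH_PRIORITY = 1, SK_NUM_PRIORITIES = 2 };

/* Number of send tokens the client currently holds, per priority. */
typedef struct
{
  unsigned int free_tokens[SK_NUM_PRIORITIES];
} sk_token_pool_t;

/* Take one token before calling a function that requires a send
   token. Returns 1 if a token was taken, 0 if none is available. */
static int
sk_alloc_send_token (sk_token_pool_t *p, int priority)
{
  assert (priority >= 0 && priority < SK_NUM_PRIORITIES);
  if (p->free_tokens[priority] == 0)
    return 0;
  p->free_tokens[priority]--;
  return 1;
}

/* Give the token back, for example from the send completion callback
   or when a legacy GM_SENT_EVENT returns a message pointer. */
static void
sk_free_send_token (sk_token_pool_t *p, int priority)
{
  assert (priority >= 0 && priority < SK_NUM_PRIORITIES);
  p->free_tokens[priority]++;
}
```

A client would call sk_alloc_send_token() before each of the functions listed above and sk_free_send_token() when the corresponding callback or event returns the token.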
The following functions require a receive token:
gm_provide_receive_buffer()
gm_provide_receive_buffer_with_tag()
A single receive token is passed to the client with each of the following events:
GM_RAW_RECV_EVENT
GM_RECV_EVENT
GM_HIGH_RECV_EVENT
GM_HIGH_PEER_RECV_EVENT
GM_FAST_HIGH_RECV_EVENT
GM_FAST_HIGH_PEER_RECV_EVENT
(However, if the client passes these events to gm_unknown(), then the token is implicitly returned to GM.) Any of the GM receive functions can generate these types of events. These functions are:
gm_receive()
gm_blocking_receive()
gm_blocking_receive_no_spin()
The following terms and abbreviations are used in the GM documentation and source code. Some of these abbreviations are obvious to speakers of English, but are included for speakers of other languages. This section does not include architecture-specific abbreviations used in the architecture-specific GM driver code, as those are documented by the architecture's vendor and are not of interest to most GM developers.
section Token Reference for details.
On 64-bit Solaris machines, the GM_STRONG_TYPES feature can be used during compilation to check for missing conversions, but the resulting programs will not run and must be recompiled without this feature.
containing unknown values
The syntax of this call may change.
Currently, gm_open() implicitly calls gm_init() for the caller and gm_close() implicitly calls gm_finalize(), but developers should not depend on this.
For a future version of GM, an exception notification mechanism will report this exception, instead.
This document was generated on 20 November 2000 using texi2html 1.56k.