logo To Foot
© J R Stockton, ≥ 2003-03-05

Borland Pascal Optimisation 1.

No-Frame * Framed Index * Frame This
Links within this site :-

Viewing Code

To see the code generated, when using BP7, do Alt-C M to Make, then Alt-T D (Alt-T X for DPMI) to start Turbo Debugger, then Alt-V C; AFAIR, various BP Alt-O C options need to be chosen before the Make (I usually have D L y on). This is an exceedingly useful way of examining rival code fragments to see what the compiler makes of them. The asm fragments below have been so obtained.

N.B. The actual floating-point machine instructions are given as CDhh.. ; AIUI, the CD is a software interrupt op-code which, when using a real FPU, is replaced by the appropriate value for an FPU op-code (possibly with a {superfluous?} FWAIT too).

Pedt Scragg :- TP7 came without the standalone debugger. IIRC, that needed to be bought separately and came with TASM.

Those without a good debugger can use the following technique to see the code :-

uses Dos ;
var P : ^byte ; J : byte ;
begin
  P := @GetTime ;
  for J := 1 to 100 do begin Write(P^:4) ; Inc(longint(P)) end ;
  Readln ;
  end.

I believe that this (briefly tested in TP5) writes the first 100 bytes of GetTime; it certainly gives a 44 and a 33 in seemly places.

X86 Newsgroup

On 4 Dec 1997 in comp.lang.pascal.borland, Mike wrote :- ... the vote to moderate c.l.a.x86 worked. ... c.l.a.x is becoming the source for improving the speed of Pascal.

Sets

Sets are a valuable part of Pascal and Delphi; they are in effect compacted arrays of boolean, with efficiently-coded operators. Each element is represented by the state of the corresponding bit in the set. Examples of their use follow; others can be found in the programs directory.

In TP/BP/D, set types can have up to 256 elements, and the Ord of each possible element must be in 0..255 - but one can have an array of sets : for an example, see eratost3.pas.

They are useful for general Bit Operations; see below.

In Delphi, one can also use the TBits type.

A set is stored so that elements 0..255 occupy 32 bytes, except that the unnecessary bytes are not allocated; except that, in Delphi, if three bytes are needed then a fourth, presumably higher, is allocated. Therefore "set of 0..7" uses one byte, but "set of 1..8" needs two bytes. Successful typecasting to/from/between set types requires detailed understanding of storage.

Other implementations of Pascal might store sets in a different manner.

An empty set, of any type, is "[]"; I seek a good notation for a general full set.

Operators on sets returning a set value are :- +, union; -, difference; *, intersection (and, in effect, Include() and Exclude()). Operators on sets returning a boolean value are :- =, <>, <=, >= and in. For details, see the Language Manual and/or the on-line Help.

The bits which are allocated but not within the set proper are, as far as I can see, potentially undefined.

type T34 = set of 3..4 ;
procedure P2 ; var S : T34 ;
  begin Exclude(S, 3) ; Exclude(S, 4) ; Writeln(byte(S)) end ;
BEGIN P2 ; Readln END.

displayed 2, run in the BP7 IDE; and 4, run alone; and 0, with D3 DCC32 -cc. The comparison of two variables of type T34 is implemented (in BP7) as a byte comparison. Type-casting to or from a set which does not use its entire allocation must therefore be unsafe.

Be cautious with "fast" ways of counting elements present, or absent, in ill-fitting sets; they can see bits not in the set proper. Use :-

N := 0 ;
for Element := First to Last do if Element in MySet then Inc(N) ;

Bit Operations

Most operations involving individual bits can readily be expressed by using sets, and this approach often generates faster, shorter code.

Note that if other events can change the variable (e.g. keystate at $40:$17), one needs to disable interrupts during non-atomic operations.

To test whether the bit numbered BitNum (7..0) is set in ThisByte, traditionally,

AFAIR,   if (ThisByte and (1 shl BitNum))>0 then ...
   OR,   if ((ThisByte shr BitNum) and 1)>0 then ...
   OR,   if Odd(ThisByte shr BitNum) then ...

I believe that the first is customary; the second seems to code shorter, and the third shorter still; using sets is even better. The code above is untested, that below is from Debugger by copy'n'paste.

Bob Schor said : "I like using sets (it seems "natural" to me) ... it deals directly with the bits as named types, without doing shifting."

I also like its efficiency; see below (BP7.01, real mode, debugger) :-

program bits ;

type
bittype = (bit0, bit1, bit2, bit3, bit4, bit5, bit6, bit7);
bitsettype = set of bittype;

var byteregister : bitsettype;
ThisByte, BitNum : byte ; X : word ;

BEGIN ;

if (ThisByte and (1 shl BitNum))>0 then X := 0 ;
{ 32 bytes :-
  asm  mov al,[0052] ; xor ah,ah ; mov dx,ax ;
       mov ax,0001 ; mov cx,dx ; shl ax,cl ;
       mov dx,ax ;
       mov al,[0051] ; xor ah,ah ; and ax,dx ;
       or ax,ax ; jle BITS.11 ; xor ax,ax ; mov [0054],ax ; BITS.11: end }

if ((1 shl BitNum) and ThisByte)>0 then X := 0 ;
{ 32 bytes :-
  asm  mov al,[0051] ; xor ah,ah ; mov bx,ax ;
       mov al,[0052] ; xor ah,ah ; mov dx,ax ;
       mov ax,0001 ; mov cx,dx ; shl ax,cl ;
       and ax,bx ;
       or ax,ax ; jle BITS.12 ; xor ax,ax ; mov [0054],ax ; BITS.12: end }

if ((ThisByte shr BitNum) and 1)>0 then X := 0 ;
{ 28 bytes :-
  asm  mov al,[0052] ; xor ah,ah ; mov dx,ax ;
       mov al,[0051] ; xor ah,ah ; mov cx,dx ;
       shr ax,cl ; and ax,0001 ;
       or ax,ax ; jle BITS.13 ; xor ax,ax ; mov [0054],ax ; BITS.13: end }

if Odd(ThisByte shr BitNum) then X := 0 ;
{ 26 bytes :-
  asm  mov al,[0052] ; xor ah,ah ; mov dx,ax ;
       mov al,[0051] ; xor ah,ah ; mov cx,dx ;
       shr ax,cl ;
       shr al,1 ; jnb BITS.16 ; xor ax,ax ; mov [0054],ax ; BITS.16: end }

if (ThisByte and $10)>0 then X := 0 ;
{ 14 bytes, as next }
if boolean(ThisByte and $10) then X := 0 ;
{ 14 bytes :-
  asm  mov al,[0051] ; and al,10 ;
       or al,al ; je BITS.23 ; xor ax,ax ; mov [0054],ax ; BITS.23: end }

if bit4 in byteregister then X := 0 ;
{ 12 bytes :-
  asm  test byte ptr [0050],10 ;
                  je BITS.21 ; xor ax,ax ; mov [0054],ax ; BITS.21: end }

byteregister := byteregister + [bit3];
{ 8 bytes :-
  asm  mov al,[0050] ; or al,08 ; mov [0050],al ; end }

byteregister := byteregister - [bit2];
{ 8 bytes :-
  asm  mov al,[0050] ; and al,FB ; mov [0050],al ; end }

The advantage of Bob's approach is clear, both notationally and efficiency-wise. If bitsettype is to be matched to a byte, the approach does rely on the implementation of a set being what it is. Actually it seems that the implementation is undefined, but it is hard to see what else it might sensibly be and it is unlikely ever to change.

Another example : if you actually want FLAGS to be a bit-array, occupying a single 16-bit word, then it is easy enough to achieve, especially if, as ISTM, you don't need to index it in a general manner.

uses Dos ;
type Flags = (Carry,    b01,    Parity,      b03, AuxCar, b05, Zero, Sign,
               Trap, IntEnb, Direction, Overflow,    b12, b13,  b14,  b15) ;
FlagSet = Set of Flags ;
var Regs : registers ; Z : byte ;
begin
  if (Regs.Flags and 1) = 0 then Z := 0 ;
  if not (Carry in FlagSet(Regs.Flags)) then Z := 0 ;
  end.

PROGRAM.6: if (Regs.Flags and 1) = 0 then Z := 0 ;
  cs:000F A16200         mov    ax,[0062]
  cs:0012 250100         and    ax,0001
  cs:0015 09C0           or     ax,ax
  cs:0017 7505           jne    PROGRAM.7 (001E)
  cs:0019 C606640000     mov    byte ptr [0064],00
PROGRAM.7: if not (Carry in FlagSet(Regs.Flags)) then Z := 0 ;
  cs:001E F606620001     test   byte ptr [0062],01
  cs:0023 7505           jne    PROGRAM.8 (002A)
  cs:0025 C606640000     mov    byte ptr [0064],00
PROGRAM.8: end.
  cs:002A

Roger Donais said :-

In TP/BP 7.0 you can also use the following typecast to manipulate bits w/in a byte

TYPE bit = set of 0..7 ;
  Include( Bit(r.AL), 7) { to set bit 7 } ;
  Exclude( Bit(r.AL), 7) { to clear bit 7 } ;

Note that the Include and Exclude each compile to a single instruction. I see that this can be extended, e.g. to manipulate a word or a longint, with appropriately larger set types; also

if 7 in Bit(r.AL) then ... ;

is an efficient bit test, and likewise extendable.

Word-sized variables (and, I suppose, longints; and anything up to 256 bits) can be treated similarly.

To handle the Shift-State, consider

program X ;
type TS17 = (RSh, LSh, Ctrl, Alt, Sl, Nl, Cl, Il) ;
TStatByte = set of TS17 ;
var Statbyte : TStatByte absolute $40:$17 {use pointers in Win/DPMI} ;
const CSl : TStatByte = [Sl] ;
begin
  Readln ;
  Include(StatByte, Cl) ; Readln ;
  Exclude(StatByte, Cl) ; Readln ;
  byte(StatByte) := byte(StatByte) xor byte(CSl) ; Readln ;
end.

More Bit Testing

Consider, for example,

type Bits = (b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf) ;

BitSet = set of Bits; DigSet = set of 0..15 ;

var Flags : BitSet ; Fbits : DigSet absolute Flags ; J : byte ; W : word ;

begin
  if b2 in Flags then  ;
  if not (b2 in Flags) then  ;
  Include(Flags, b6) ;
  Exclude(Flags, b3) ;
  Flags := Flags + [b8, bb..be] ;
  Flags := Flags - [Bits(1 shl J)] ;
  Fbits := Fbits + [J] ;
  Flags := Flags * [b2, b4, b9] ;
  if Flags=[] then ;
  if Flags>=[b9] then ;
  if BitSet(W)>=[b1, b3, b13] then ;
  end.

(untested, but compiles) and so on. Where I have examined the code generated, it has always been at least as efficient as, and generally more efficient than, FORTRAN-like array-based techniques. Sets are seemingly most efficient if contained within the native word-length of the compiler and machine being used.

Brief tests suggest that the advantages of using sets, including in some cases shorter code generated, also occur in Delphi. But the situation can be confused by the more powerful optimiser.

Larger Sets

To access a bit in an arbitrary position of a variable which is no longer than 32 bytes, one can use

type BigSet = set of 0..Pred(8*SizeOf(Variable)) ;
 ...
Include(BigSet(Variable), BitNo) ;
if BitNo in BigSet(Variable) then ... ;

To access a bit in an arbitrary position of a variable longer than 32 bytes, one can use

type EightBits = set of 0..7 ;
ArrN = array [0..Pred(SizeOf(Variable))] of EightBits ;
 ...
Include(ArrN(Variable)[BitNo div 8], BitNo mod 8) ;
if BitNo mod 8 in ArrN(Variable)[BitNo div 8] then ... ;

Or, perhaps better, cast it to the largest possible array of sets

type ArrSet = array [0..2046] of set of 0..255 ;
 ...
Include(ArrSet(Variable)[BitNo div 256], BitNo mod 256) ;
if (BitNo mod 256) in ArrSet(Variable)[BitNo div 256] then ... ;

For sets of more than 256 elements, consider :

type SetArr = array [0..Pred((Elements+255) div 256)] of set of 0..255 ;
 ...
Include(SetArr[Index div 256], Index mod 256) ;
if (Index mod 256) in SetArr(Variable)[Index div 256] then ... ;

Note that div/mod by a fixed power of 2 should compile efficiently; and by 256 could be done by pure casting.

In Delphi, the TBits type is an elastic array of bits, without set operators.

Caveat

One should not cast a byte into a set (in general) unless one has control over the declaration of the set, or the number and order of the elements in the set is fixed by reliable external considerations.

It only needs the order of the elements in the set to be changed, and one then has a perfectly compileable program which does something other than what it did before.

Is Character in a Range?

It seems that code such as

if Ch in ['A'..'Z'] then Inc(Ch, 32) ;

is surprisingly efficient; the code used is not like that which would be needed for a general set variable, and could be improved only marginally in assembler. Remember the inverse can be any of

if not (Ch in ['A'..'Z']) then Inc(Ch, 32) ;
if Ch in ['A'..'Z'] then else Inc(Ch, 32) ;
if Ch in [#0..#255]-['A'..'Z'] then Inc(Ch, 32) ;
if Ch in [#0..Pred('A'), Succ('Z')..#255] then Inc(Ch, 32) ;

at least some of which should give identical code.

Another convenient test is as

if Pos(Ch, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')>0 then ...

which extends for strings to such as

if Pos(St+#32, '/h /H /? -h -H -? ')>0 then ...

in which the spaces tend to reduce false positives.

Multiply by Shift and Add

Discussing optimisation of direct screen access -
Scott : '... the y*80 often used can be expressed as "(y shl 4) + (y shl 6)". Two shifts together are more efficient than a single multiply.'
Osmo : '"y*16+y*64" is more readable and does the same thing.'

Me : 'That does 4+6=10 shifts; (y*4+y)*16 should be quicker, doing 2+4=6 shifts. Yes, and it also saves using dx.'

I have used Debugger to see the code - BP7.01, Real mode, {$R-,Q-} :-

 both z := (y*4+y)*16 ; and z := (y shl 2 + y) shl 4 ; give
  asm  mov ax,[0050] ; shl ax,02 ;
       add ax,[0050] ; shl ax,04 ;             mov [0052],ax end ;
 but z := y*16+y*64 ; and z := y shl 4 + y shl 6 ; give
  asm  mov ax,[0050] ; shl ax,06 ; mov dx,ax ;
       mov ax,[0050] ; shl ax,04 ; add ax,dx ; mov [0052],ax end ;
 and z := y*80 ; gives
  asm  imul ax,[0050],0050 ; mov [0052],ax end ;

The times taken, within a minimal FOR loop and with all debugging off, were approximately in the ratio 96:100:138 on my 486/33 PC.

Convert to/from Binary String

Write

procedure PIB2(J : word) ;
var B : byte ;
const OI : array [boolean] of char = '01' ;
begin
  for B := 0 to 15 do begin Write(OI[integer(J)<0]) ; J := J shl 1 end ;
  end {PIB2} ;

To String

function IB(J : word) : string ;
var B : byte ;
const OI : array [boolean] of char = '01' ;
S : string [16] = '0123456789abcdef' ;
begin
  for B := 1 to 16 do begin S[B] := OI[integer(J)<0] ; J := J shl 1 end ;
  {or for B := 16 downto 1 do begin S[B] := OI[Odd(J)] ; J := J shr 1 end ; }
  IB := S end {IB} ;

(* '15 in BitSet(J)' is as good as 'integer(J)<0' *)

Pedt Scragg says that the following is fast (but the "1" needs to be "longint(1)") :-

function LB(d : longint) : string ;
const BD : array[boolean] of char = '01' ;
var j : byte ; S : string ;
begin S[0] := #32 ;
  for j := 0 to 31 do S[32-j] := BD[d and (1 shl j) <> 0] ;
  LB := S end ;                         (* ^ longint(1) *)

? or   S[32-j] := char(Ord('0') + (d and (1 shl j))) ;  ?

This is faster than the previous (corrected) one :-

function LB3(d : longint) : string ;
const BD : array[boolean] of char = '01' ;
type Q = set of 0..31 ;
var j : byte ; S : string ;
begin S[0] := #32 ;
  for j := 0 to 31 do S[32-j] := BD[j in Q(d)];
  LB3 := S end ;

From String

function bb(Bin : string) : longint ;
var K : byte ; B : longint ;
begin B := 0 ;
  for K := 1 to Length(Bin) do begin B := B shl 1 ;
    Inc(B, Ord(Bin[K]='1')) { OR if Bin[K]='1' then Inc(B) } ;
    end ;
  bb := B end {bb} ;

Convert to/from Any-Base String

const C = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' ;
A : array [0..length(C)-1] of char = C ;

function LItoABS(const Base : byte ; X : longint) : string ;
var S : string [32] ;
begin
  if (Base<2) or (Base>length(A)) or (X<0) then RunError(222) ;
  if X=0 then begin LItoABS := A[0] ; EXIT end ;
  S := '' ;
  repeat S := A[X mod Base] + S ; X := X div Base until X=0 ;
  LItoABS := S end {LItoABS} ;

function ABStoLI(const Base : byte ; const S : string) : longint ;
var LI : longint ; B : byte ;
begin
  if (Base<2) or (Base>length(A)) or (S='') then RunError(223) ;
  LI := 0 ;
  for B := 1 to length(S) do LI := LI*Base + Pos(S[B], C)-1 ;
  ABStoLI := LI end {ABStoLI} ;
Other parts of Opts : Index * Part 2
Home Page
Mail: no HTML
© Dr J R Stockton, near London, UK.
All Rights Reserved.
These pages are tested mainly with Firefox 3.0 and W3's Tidy.
This site, http://www.merlyn.demon.co.uk/, is maintained by me.
Head.