Title: Multiple Nonzero-Rank Part References

Submitted By: Aleksandar Donev

Status: For Consideration

References: J3/03-253

Basic Functionality:
I propose deleting the constraint that prohibits multiple nonzero rank part-refs:
"In a data-ref, there shall be no more than one part-ref with nonzero rank." 
There is no justification for this constraint, and removing it would enable a very 
useful capability to which Fortran is uniquely suited, thanks to its ability to deal 
with non-contiguous arrays.

Rationale:
The proposed functionality gives two gains:

1) It allows for a kind of separation between the implementation of operations on data 
   and the way the data is actually stored that has no precedent in other languages. 
   This kind of separation is much more flexible and easier to use than inheritance-based 
   methods (but is more limited in that only data, not methods, are covered). An example 
   includes the ability to code a computational geometry package which operates on a 
   collection of points, without specifically indicating how the coordinates of the 
   points are stored - in a simple multidimensional array, or inside some complicated 
   hierarchy of derived types.

2) It allows the use of all the powerful array syntax and intrinsics for data stored 
   inside derived types.

Take the simple example:

TYPE Point3D
     ! A point in 3D
     REAL :: coordinates(3), data(2)
END TYPE Point3D

TYPE(point3D), DIMENSION(10) :: points
     ! A collection of points

The centroid of these points could then be computed with

WRITE(*,*) "The centroid is", SUM(points%coordinates, DIM=2) / SIZE(points)

which requires no loops.
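
For comparison, here is a minimal sketch of how the same centroid can be computed in 
standard-conforming Fortran today, where the rank-2 array has to be gathered by hand. 
The array name coords and the test values are assumptions made for illustration; coords 
stands in for what points%coordinates would denote under the proposal, with shape (/3,10/).

```fortran
program centroid_today
  implicit none

  type :: Point3D
     ! A point in 3D, as in the example above
     real :: coordinates(3), data(2)
  end type Point3D

  type(Point3D) :: points(10)
  real :: coords(3, 10)   ! hand-built stand-in for the proposed points%coordinates
  real :: centroid(3)
  integer :: i

  ! Illustrative test data: point i sits at (i, 2i, 3i).
  do i = 1, size(points)
     points(i)%coordinates = real(i) * [1.0, 2.0, 3.0]
     points(i)%data = 0.0
  end do

  ! Without the proposed feature, the coordinates are copied point by point.
  do i = 1, size(points)
     coords(:, i) = points(i)%coordinates
  end do

  ! Same expression as in the proposal, applied to the copy.
  centroid = sum(coords, dim=2) / size(points)
  write (*, *) "The centroid is", centroid
end program centroid_today
```

With this test data the centroid is (5.5, 11.0, 16.5). The explicit copy loop and the 
temporary array are exactly what the proposed syntax would make unnecessary.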

Even more useful would be the ability to pass the coordinates of these points to 
a procedure (note that this procedure need not know that the coordinates came from an 
array of derived type point3D).
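
As a rough sketch of this use, the procedure below accepts any 3 x n array of 
coordinates and knows nothing about Point3D. The module name geometry_sketch and the 
routine bounding_box are hypothetical, chosen only for illustration. The call shown is 
valid today because it passes a hand-made copy; the call the proposal would allow is 
given as a comment.

```fortran
module geometry_sketch
  implicit none
contains
  ! Hypothetical helper: works on any (ndim x n) array of coordinates.
  subroutine bounding_box(coords, lower, upper)
    real, intent(in)  :: coords(:, :)
    real, intent(out) :: lower(size(coords, 1)), upper(size(coords, 1))
    lower = minval(coords, dim=2)
    upper = maxval(coords, dim=2)
  end subroutine bounding_box
end module geometry_sketch

program pass_coordinates
  use geometry_sketch, only: bounding_box
  implicit none

  type :: Point3D
     real :: coordinates(3), data(2)
  end type Point3D

  type(Point3D) :: points(10)
  real :: coords(3, 10), lower(3), upper(3)
  integer :: i, k

  ! Illustrative test data: point i sits at (i, 2i, 3i).
  do i = 1, size(points)
     points(i)%coordinates = real(i) * [1.0, 2.0, 3.0]
     points(i)%data = 0.0
  end do

  ! Valid today: each points%coordinates(k) has only one nonzero rank
  ! part-ref, so the copy is assembled one coordinate at a time.
  do k = 1, 3
     coords(k, :) = points%coordinates(k)
  end do
  call bounding_box(coords, lower, upper)

  ! Under the proposal (not valid in the current standard):
  !   call bounding_box(points%coordinates, lower, upper)

  write (*, *) "Lower corner:", lower
  write (*, *) "Upper corner:", upper
end program pass_coordinates
```

The point of the design is that bounding_box is written once against a plain rank-2 
array, while the caller decides where the coordinates actually live.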

Estimated Impact:
The edits needed to implement this are small and localized to Section 6.1.2 (examples 
are given under Detailed Specification). References with multiple nonzero rank part-refs 
are treated in all respects like data-refs with just a single nonzero rank part-ref, 
namely, they 
are array sections. 
Therefore I estimate that no other part of the standard will need to be changed.

The implementation of this feature does require some nontrivial work.
However, the steps involved are very similar to the way current data-refs and array 
pointers/sections are handled.
I have implemented extensions for the three compilers I use that make such structure 
components usable, in only a hundred lines of Fortran + C code.
I essentially use low-level C code which manipulates the compiler's array descriptors 
to create a higher-rank array pointer to the data-refs I need, and then I can use the 
array pointer when I need to access the data as a multi-rank array (see my Fortran 
Forum article).
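
The descriptor-manipulating code itself is compiler specific and is not reproduced 
here. The sketch below is a different, simpler technique that gives a similar strided 
rank-2 view without touching descriptors, using C_LOC and C_F_POINTER from 
ISO_C_BINDING. It rests on assumptions that should be stated loudly: the type is 
declared BIND(C) and is assumed to be laid out as five consecutive reals with no 
padding, and viewing an array of that type as an array of reals is not sanctioned by 
the type rules for C_F_POINTER, so this relies on typical compiler behavior rather 
than on the standard. The names flat and coords are illustrative.

```fortran
program strided_view
  use, intrinsic :: iso_c_binding, only: c_float, c_loc, c_f_pointer
  implicit none

  type, bind(c) :: Point3D
     real(c_float) :: coordinates(3), data(2)
  end type Point3D

  type(Point3D), target :: points(10)
  real(c_float), pointer :: flat(:, :)     ! all 5 reals of every point
  real(c_float), pointer :: coords(:, :)   ! only the 3 coordinates
  integer :: i

  do i = 1, size(points)
     points(i)%coordinates = real(i, c_float) * [1.0, 2.0, 3.0]
     points(i)%data = -1.0
  end do

  ! ASSUMPTION: each Point3D occupies exactly 5 reals with no padding.
  call c_f_pointer(c_loc(points), flat, [5, size(points)])

  ! Non-contiguous rank-2 section: rows 1..3 are the coordinates.
  coords => flat(1:3, :)

  write (*, *) "shape:   ", shape(coords)
  write (*, *) "centroid:", sum(coords, dim=2) / size(points)

  ! The pointer is a view, not a copy: writing through it changes points.
  coords(:, 1) = 0.0
  write (*, *) "points(1)%coordinates is now", points(1)%coordinates
end program strided_view
```

The pointer coords plays the role that points%coordinates would play under the 
proposal: a non-contiguous array section of shape (/3,10/) that aliases the original 
storage.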

Detailed Specification:
The main edits needed are the following:

Delete "In a data-ref, there shall be no more than one part-ref with nonzero rank". 
Then add the constraint 

The rank of a data-ref is the sum of the ranks of the part-refs with nonzero rank, 
if any; otherwise, the rank is zero.
...
Cxxx: The maximum rank of a data-ref shall be 7.

and change the way the rank of data-refs is determined:

The rank and shape of a nonzero rank part-ref are determined as follows.
If the part-ref has no section-subscript-list, the rank and shape are those of 
part-name. Otherwise, the rank is the number of subscript triplets and vector 
subscripts in section-subscript-list, and the shape is the rank-1 array whose i-th 
element is the number of integer values in the sequence indicated by the i-th subscript 
triplet or vector subscript. If any of these sequences is empty, the corresponding 
element in the shape is zero.

In an array-section, the rank of the array is the sum of the ranks of the nonzero rank 
part-refs. The shape of the array is the rank-1 array obtained by concatenating the 
shapes of the nonzero rank part-refs, in backward order, i.e., starting from the last 
one. If the shape has an element with the value of zero, the array section has size zero.
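
To make the rule concrete, the following sketch computes the rank, shape, and size of 
a data-ref from the shapes of its nonzero rank part-refs, visiting them from the last 
part-ref to the first. The reference a(1:4,1:5)%b(1:2,1:3), the routine append_part, 
and the variable names are hypothetical and serve only to illustrate the wording above; 
the limit of 7 is taken from the proposed constraint.

```fortran
program data_ref_shape
  implicit none
  integer, parameter :: max_rank = 7   ! limit from the proposed constraint
  integer :: shp(max_rank)             ! shape of the data-ref, built up in place
  integer :: r                         ! rank accumulated so far

  r = 0
  ! Hypothetical reference a(1:4,1:5)%b(1:2,1:3): the part-refs are
  ! visited in backward order, starting from the last one.
  call append_part(shp, r, [2, 3])     ! b(1:2,1:3)
  call append_part(shp, r, [4, 5])     ! a(1:4,1:5)

  write (*, *) "rank  =", r
  write (*, *) "shape =", shp(1:r)
  write (*, *) "size  =", product(shp(1:r))

contains

  ! Append the shape of one nonzero rank part-ref to the data-ref shape.
  subroutine append_part(shp, r, part_shape)
    integer, intent(inout) :: shp(:), r
    integer, intent(in)    :: part_shape(:)
    if (r + size(part_shape) > size(shp)) error stop "rank of data-ref exceeds 7"
    shp(r + 1 : r + size(part_shape)) = part_shape
    r = r + size(part_shape)
  end subroutine append_part

end program data_ref_shape
```

For this example the result is rank 4, shape (/2,3,4,5/), and size 120. A zero in any 
part-ref shape makes the product, and hence the size of the array section, zero.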

There are some other edits that will be needed, mostly in Section 6.1.2.

The Shape of the data-ref

A problem with the proposal as described above is that the Fortran order of specifying 
components, structure%component (as opposed to the alternative component%structure), 
is the opposite of the order in which the shapes of the nonzero rank part-refs are 
concatenated.

For example, the reference:

level1(1:4,1:5,1:6)%level2(1:2,1:3)%level3(1:1)

represents an array section of shape (/1,2,3,4,5,6/), and not (/4,5,6,2,3,1/) as might 
be thought at first.
However, this is the best choice for the compiler, the standard, and the user, 
despite the extra cost of having to be careful with indices in certain situations. I 
believe the wrong choice was made when component references were chosen to follow the 
C-style ordering of object%component instead of component%object. This cannot be changed 
now without introducing a whole new syntax and the associated cost for users and 
implementors. Instead, we should choose the proposed shape for the data-ref that I 
describe here and accept the loss of simplicity in the syntax as unavoidable due to 
past mistakes.
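
The index correspondence implied by the shape (/1,2,3,4,5,6/) can be checked with 
standard-conforming code by gathering the section by hand. In the sketch below, the 
type names inner_t and outer_t and the array name section are assumptions made for 
illustration; section stands in for the proposed reference 
level1(1:4,1:5,1:6)%level2(1:2,1:3)%level3(1:1). The elements are numbered in the order 
obtained by varying the innermost component index fastest, which is also the order in 
which a typical implementation would be expected to store them.

```fortran
program index_order
  implicit none

  type :: inner_t
     integer :: level3(1)
  end type inner_t

  type :: outer_t
     type(inner_t) :: level2(2, 3)
  end type outer_t

  type(outer_t) :: level1(4, 5, 6)
  integer :: section(1, 2, 3, 4, 5, 6)   ! stand-in for the proposed data-ref
  integer :: i1, i2, i3, j1, j2, k1, n

  ! Number the elements with the innermost component index varying fastest.
  n = 0
  do i3 = 1, 6
    do i2 = 1, 5
      do i1 = 1, 4
        do j2 = 1, 3
          do j1 = 1, 2
            do k1 = 1, 1
              n = n + 1
              level1(i1, i2, i3)%level2(j1, j2)%level3(k1) = n
            end do
          end do
        end do
      end do
    end do
  end do

  ! Valid today: only level1 has nonzero rank in each reference.
  do j2 = 1, 3
    do j1 = 1, 2
      do k1 = 1, 1
        section(k1, j1, j2, :, :, :) = level1%level2(j1, j2)%level3(k1)
      end do
    end do
  end do

  write (*, *) "shape:", shape(section)
  write (*, *) "array element order matches numbering:", &
       all(reshape(section, [size(section)]) == [(n, n = 1, size(section))])
  write (*, *) "last element matches:", &
       section(1, 2, 3, 4, 5, 6) == level1(4, 5, 6)%level2(2, 3)%level3(1)
end program index_order
```

Both checks print T. With the backward concatenation, element 
level1(i1,i2,i3)%level2(j1,j2)%level3(k1) is element (k1,j1,j2,i1,i2,i3) of the 
section, and the array element order of the section follows the numbering used above.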

History:
Many debates during the design of F8x...

Comments:
John Reid, JKR Associates, Oxford:
I would like to suggest that we allow arrays of arrays, such as
       a(:,:)%comp(:,:)

They are not allowed because when such an array is passed to a dummy argument dum,
      a(i,j)%comp(k,l)
corresponds to
      dum(k,l,i,j)

and the more array parts there are, the more confusing it is seen to be. 
However, I think we could get used to the rule and it is not too hard to state.

Personally, I would prefer a shorter and simpler proposal and am prepared to work on it 
if we decide in favour.

I discussed this with Lawrie Schonfelder some time ago by e-mail and he wants it.

Malcolm Cohen, Nihon NAG, Tokyo:
   This seems semi-reasonable, HOWEVER
     (i) we still need to maintain that no pointer or allocatable component
         can occur after a nonzero rank part.
    (ii) it is seriously limited without expanding our current 7-dim limit.
         Expanding the 7-dim limit costs (it makes runtime library routines
         that traverse an array bigger - they all need rewriting).

   Overall, I'd definitely put this feature as being lower priority than
   expanding our current dimension limit.

   SUMMARY: (1) More important to have more dimensions; pick a number (15
                is the smallest number J3 came up with, and is the largest
                one I'd want to see).
            (2) I don't see this particular proposal as being terribly
                important.