A discussion between Nicolas Delcros and Roger Hui
Nicolas, Prologue: From a language point of view, thanks to Ken Iverson, it is obvious that you want grade rather than sort as a primitive. Yet from a performance point of view, sort is currently faster than grade.
Can one be “more fundamental” than the other? If so, who’s wrong…APL or our CPUs? In any case, what does “fundamental” mean?
Roger: Sorting is faster than grading due to reasons presented in the J Wiki essay Sorting versus Grading, in the paragraph which begins “Is sorting necessarily faster than grading?” I can not prove it in the mathematical sense, but I believe that to be the case on any CPU when the items to be sorted/graded are machine units.
Nicolas, Formulation 1: Now, parsing ⍵[⍋⍵]
(and not scanning the idiom) begs the question of how deeply an APL intepreter can “understand” what it’s doing to arrays.
How would an APL compiler resolve this conjunction in the parse tree? Do you simply have a bunch of state pointers such as “is the grade of” or “is sorted” or “is squozen” or “axis ordering” etc. walking along the tree? If so, do we have an idea of the number of such state pointers required to exhaustively describe what the APL language can do to arrays? If not, is there something more clever out there?
Roger: I don’t know of any general principles that can tell you what things can be faster. I do have two lists, one for J and another for Dyalog. A big part of the lists consists of compositions of functions, composition in the mathematical sense, that can be faster than doing the functions one after the other if you “recognize” what the composed function is doing and write a native implementation of it. Sort vs. grade is one example (sort is indexing composed with grade). Another one is (⍳∘1 >)
or (1 ⍳⍨ >)
. The function is “find the first 1” composed with >
. These compositions have native implementations and:
x←?1e6⍴1e6
cmpx '5e5(1 ⍳⍨ >)x' '(5e5>x)⍳1'
5e5(1 ⍳⍨ >)x → 0.00E0 | 0%
(5e5>x)⍳1 → 1.06E¯2 | +272100% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
cmpx '¯1(1 ⍳⍨ >)x' '(¯1>x)⍳1'
¯1(1 ⍳⍨ >)x → 2.41E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(¯1>x)⍳1 → 4.15E¯3 | +71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
If you get a “hit” near the beginning, as would be the case with 5e5
, you win big. Even if you have to go to the end (as with ¯1
), you still save the cost of explicitly generating the Boolean vector and then scanning it to the end.
Another one, introduced in 14.1, is:
cmpx '(≢∪)x' '≢∪x'
(≢∪)x → 4.43E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
≢∪x → 1.14E¯2 | +157% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
This is the tally nub composition, used often in a customer application. If you “know” that you just want the tally of the nub (uniques), you don’t actually have to materialise the array for the nub.
I am not conversant with compiler technology so I don’t know what all an APL compiler can do. I do know that there’s a thing call “loop fusion” where, for example, in a+b×c÷2
, it doesn’t have to go through a,b,c
in separate loops, but can instead do
:for i :in ⍳≢a ⋄ z[i]←a[i]+b[i]×c[i]÷2 ⋄ :endif
saving on the array temp on every step. You win some with this, but I think the function composition approach wins bigger. On the other hand, I don’t know that there is a general technique for function composition. I mean, what general statements can you make about what things can be faster (AKA algebraically simpler)?
Nicholas: I sort of see…so jot is a “direct” conjunction. An indirect conjunction could be ⍵[⍺⍴⍋⍵]
where the intermediate grade is reshaped. We “know” that grade and shape are “orthogonal” and can rewrite the formula to ⍴⍵[⍋⍵]
.
So if we can establish a list of flags, and establish how primitives touch these flags, and how these flags affect each other, then we can extend ∘
to any path of the parse tree, provided that the intermediate nodes don’t destroy the relationship between the two operations (here grade + indexing provides sort, independently of the reshape).
Of course we can spend our lives finding such tricks. Or we could try and systemise it.
Roger: What you are saying above is that reshape and indexing commute (i.e. reshape ∘
indexing ←→
indexing ∘
reshape). More generally, compositions of the so-called structural functions and perhaps of the selection functions are amenable to optimisations. This would especially be the case if arrays use the “strided representation” described in Nick Nickolov’s paper Compiling APL to JavaScript in Vector. I used strided representation implicitly to code a terse model of ⍺⍉⍵
in 1987.
Nicolas, Formulation 2: On which grounds did the guys in the ’50s manage to estimate the minimal list of operations that you needed to express data processing?
Roger: APL developed after Ken Iverson struggled with using conventional mathematical notation to teach various topics in data processing. You can get an idea of the process from the following papers:
- Notation as a Tool of Thought (Turing lecture)
- The Evolution of APL, especially the Transcript of Presentation
In our own humble way, we go through a similar process: We talk to customers to find out what problems they are faced with, what things are still awkward, and think about what if anything we can do to the language or the implementation to make things better. Sometimes we come up with a winner, for example ⌸
. You know, the idea for ⍋
(grade) is that often you don’t just use ⍋x
to order x
(sort) but you use ⍋x
to order something else. Similarly with ⌸
, you often don’t want just x⍳y
but you use it to apply a function to items with like indices. The J Wiki essay Key describes how the idea arose in applications, and then connected with something I read about, the Connection Machine, a machine with 64K processors (this was in the 1980s).
Nicolas, Formulation 3: Do we have something with a wider spectrum than “Turing complete or not” to categorise the “usefulness and/or efficiency” of a language?
Roger: Still no general principles, but I can say this:
- Study the languages of designers whom you respect, and “borrow” their primitives, or at least pay attention to the idea. For example,
=x
is a grouping primitive in k. Symbols are Arthur Whitney’s most precious resource and for him to use up a symbol is significant.
=x ←→ {⊂⍵}⌸x
old k definition
=x ←→ {⍺⍵}⌸x
current k definition
Both{⊂⍵}⌸x
(14.0) and{⍺⍵}⌸x
(14.1) are supported by special code.
- Study the design of machines. For a machine to make something “primitive” is significant. For example, the Connection Machine has an instruction “generalised beta” which is basically
{f⌿⍵}⌸
. Around the time the key operator was introduced, we (Ken Iverson and I) realised that it’s better not to build reduction into a partitioning, so that if you actually want to have a sum you have to say+⌿⌸
rather than just+⌸
. The efficiency can be recovered by recognising{f⌿⍵}⌸
and doing a native implementation for it. (Which Dyalog APL does, in 14.0.)
Roger, Epilogue: To repeat Nicolas’ question in the opening section, what does “fundamental” mean? (Is grade more fundamental than sort?)
In computability, we can use Turing completeness as the metric. But here the requirement is not computability but expressiveness. I don’t know that there is a definitive answer to the question. One way to judge the goodness of a notation, or of an addition to the existing notation such as dfns or key or rank, is to use it on problems from diverse domains and see how well it satisfies the important characteristics of notation set out in Ken Iverson’s Turing lecture:
- ease of expressing constructs arising in problems
- suggestivity
- subordination of detail
- economy
- amenability to formal proofs
I would add to these an additional characteristic: beauty.