Volume 24, N°2

Inner product – an old/new problem

by Roger Hui (rhui000@shaw.ca)

Abstract

Exploitation of an old algorithm for inner product leads to a greater than 10-fold improvement in speed to +.× in Dyalog v12.1. In the process dfns and dops were found to be excellent tools of thought for thinking about the problem. We also demonstrate the product of some large sparse matrices in J.

1. Row-by-column

The conventional method for inner product produces its result one element at a time, operating on one row of the left argument against one column of the right argument. That is, if z ← x f.g y, then

   z[i;j] = f/ x[i;] g y[;j]

Question: why is inner product defined in this computationally inconvenient way?

A is a linear transformation from n (dimensional) space to m space, where m n←⍴A, and B is a linear transformation from p space to n space (n p←⍴B). The transformations are effected by A+.×y and B+.×y1. For example:

   A←¯3+?3 3⍴10
   B←¯3+?3 3⍴10
   y←¯3+?3⍴10

   B+.×y
17 12 ¯12
   A+.×(B+.×y)
2 58 15

The last transformation is the result of transforming y first by B and then by A. The inner product A+.×B computes the matrix for this transformation, A composed with B .

   (A+.×B) +.× y
2 58 15

   A+.×B
¯9 ¯19 ¯2
¯6  16  8
 9 ¯21 21

So the answer to the question is, inner product is defined this way to effect the composition of two linear transformations.

2. Row-at-a-time

It turns out that there is an algorithm more efficient than ‘row-by-column’, namely ‘row-at-a-time’. For z←x f.g y:

   z[i;] = f⌿ x[i;] g[0] y

The phrase x[i;] g[0] y applies g to an element of x[i;] against the corresponding row of y. For example:

   10 100 1000 ×[0] 3 4⍴⍳12
   0   10    20    30
 400  500   600   700
8000 9000 10000 11000

In practice, the cumulation f⌿ is composed and interleaved with the dyadic function g[0] so that no additional temporary space is needed. The result for row-at-a-time is mathematically and numerically identical to that for the conventional row-by-column method.

The above description is stated in terms of matrices, but the arguments are applicable to arguments of any rank.

3. Advantages

Row-at-a-time has some efficiency advantages over row-by-column:

4. Models

The current row-by-column implementation in Dyalog v12.0 starts by transposing the right argument to move the leading axis to the back, followed by an outer product of the rows. The process can be modelled as follows:

   dotc ←{↑ (↓⍺) ∘.(⍺⍺{⍺⍺/⍺ ⍵⍵ ⍵}⍵⍵) ↓(¯1⌽⍳⍴⍴⍵)⍉⍵}

   x←¯500+?13 19 11⍴1000
   y←¯500+?11 23⍴1000

   (x+.×y) ≡ x +dotc× y
1
   (x-.≥y) ≡ x -dotc≥ y
1

The expression can be simplified and shortened with judicious use of components:

   dotc   ← {↑ (↓⍺) ∘.(⍺⍺{⍺⍺/⍺ ⍵⍵ ⍵}⍵⍵) ↓(¯1⌽⍳⍴⍴⍵)⍉⍵}
   dotc1  ← {↑ (↓⍺) ∘.(⍺⍺{⍺⍺/⍺ ⍵⍵ ⍵}⍵⍵) ↓flip ⍵}
   dotc2  ← {↑ (↓⍺) ∘.(⍺⍺/at ⍵⍵) ↓flip ⍵}
   dotc3  ← {↑ ⍺ ∘.(⍺⍺/at ⍵⍵)compose↓ flip ⍵}
   dotc4  ← {⍺ ∘.(⍺⍺/at ⍵⍵)under↓ flip ⍵}   ⍝ nonce error
   dotc5  ← {⍺ ∘.(⍺⍺/at ⍵⍵)under↓∘flip ⍵}
   dotc6  ← {∘.(⍺⍺/at ⍵⍵)under↓∘flip}       ⍝ nonce error

   flip   ← {(¯1⌽⍳⍴⍴⍵)⍉⍵}
   at     ← {⍺⍺(⍺ ⍵⍵ ⍵)}
   compose← {(⍵⍵ ⍺)⍺⍺(⍵⍵ ⍵)}
   under  ← {(⍵⍵⍣¯1)(⍵⍵ ⍺)⍺⍺(⍵⍵ ⍵)}

The transformation from dotc3 to dotc4 and from dotc5 to dotc6 do not work, but should. The later definitions are shorter than the earlier ones if the metric is word count; alternatively, they are shorter if the compositions were denoted by symbols such as and instead of at and compose.

Row-at-a-time can be modelled as follows:

   dotr  ← {↑ (↓⍺) ⍺⍺{⍺⍺⌿⍺ ⍵⍵[0] ⍵}⍵⍵¨ ⊂⍵}
   dotr1 ← {↑ (↓⍺) ⍺⍺⌿at(⍵⍵[0])¨ ⊂⍵}

In a C implementation of the algorithm, the functions ↑ ↓ ⊂ and the operator ¨ would not performed explicitly. Rather, they would be embodied as for loops and other control structures to mediate access to the arguments x and y.

In J, the dotc (row-by-column) and dotr (row-at-a-time) operators can be modelled as follows:

   flip  =: 0&|:
   dotcj =: 2 : 'u/@:v"1/ flip' 

   lr    =: 1 : '1{u b. 0'  NB. left rank
   dotrj =: 2 : 'u/@(v"(1+(v lr),_))'

In the important special case of rank-0 (scalar) functions, where the left rank is 0, dotrj simplifies to:

   dotrj0 =: 2 : 'u/@(v"1 _)'
   dotr1  ← {↑ (↓⍺) ⍺⍺⌿at(⍵⍵[0])¨ ⊂⍵}

dotr1 is repeated from above. The expression (↓⍺) f¨ ⊂⍵ means that rows of are applied against in toto, and that is what rank 1 _ does in J. The leading mix is not needed in J because, not having done the explicit split , there is no need to undo it.

5. Inner products on sparse arrays

The sparse datatypes in J represent an array by the indices and values of non-‘zero’ entries. For example:

   ] x=: (?.3 4$10) * ?.3 4$2
0 5 9 0
0 9 0 7
0 0 0 0
   ] y=: (?.4 5$10) * ?.4 5$2
0 5 9 0 0
9 0 7 0 0
0 0 3 0 1
0 0 0 0 0

   ] xs=: $. x  NB. convert to sparse
0 1 | 5
0 2 | 9
1 1 | 9
1 3 | 7
   ] ys=: $. y
0 1 | 5
0 2 | 9
1 0 | 9
1 2 | 7
2 2 | 3
2 4 | 1

   x -: xs      NB. match
1
   y -: ys
1

Functions apply to sparse arrays, including inner product.

   x +/ .* y
45 0 62 0 9
81 0 63 0 0
 0 0  0 0 0

   xs +/ .* ys
0 0 | 45
0 2 | 62
0 4 |  9
1 0 | 81
1 2 | 63

   (x +/ .* y) -: xs +/ .* ys
1

Sparse arrays make possible large inner products which are impractical in the dense domain. In J7.01 beta:

   load '\ijs\sparsemm.ijs'

   NB. d sa s - random sparse array with density d and shape s

   x=: 0.0001 sa 2$1e5
   y=: 0.0001 sa 2$1e5

   $ x             NB. shape of x
100000 100000
   */ $ x          NB. # of elements in x
1e10
   
   z=: x +/ .*y    NB. inner product of x and y
   $ z             NB. shape of z
100000 100000
   
   +/ +./"1 x~:0   NB. # of nonzero rows    in x
99995
   +/ +./   y~:0   NB. # of nonzero columns in y
99995
   +/@, z~:0       NB. # of nonzero entries in z
9989822

The preceding expressions show that the conventional row-by-column approach would be problematic. There are 99995 non-zero rows in x and 99995 non-zero columns in y. Consequently, there are roughly 1e10 row-by-column combinations (of which only 1e7 end up being non-zero). No matter how fast each combination is, there are still a lot of them.

Sparse inner product in J uses a row-at-a-time algorithm. The 1e5 by 1e5 inner product above took 3.6 seconds.

 

 

BAA logo

Next meetings

Vector is the journal of the British APL Association. The BAA promotes the APLs, terse programming languages derived from Iverson’s mathematical notation. (more…)

comp.lang.apl

Kenneth E. Iverson
Kenneth E. Iverson
1920-2004

Coming up

13-16 Sep
Technische Universität Berlin
APL 2010
Berlin

13-23 Sep
q training courses London

27 Sep–4 Oct
q training courses Singapore

18-28 Oct
q training courses New York

8-18 Nov
q training courses London

8-20 Dec
q training courses New York

YouTube

Valid XHTML 1.0 Strict