通常认为递归会为新层级创建新栈帧,层级过多则很可能造成爆栈问题。为此,对于一种特殊的递归“尾递归”,SBCL在编译时可将其优化为循环,从而大大提高运行效率,这就是“尾递归优化”(TCO, Tail Call Optimization)。
“尾递归”即“不在递归后执行任何操作”,在这种情况下,原栈帧中保存的现场将不再有什么用武之地,这就是尾递归优化的原理。
下面我们实操一下,看看SBCL如何优化尾递归。首先写一个函数:
(这是ANSI Common Lisp第5章第7题的a问)
(defun stepped-list-p-rcsv (ls)
(cond
((or (null ls) (null (cdr ls)))
t)
((/= 1 (abs (- (car ls) (cadr ls))))
nil)
(t
(stepped-list-p-rcsv (cdr ls)))))
我们首先求值,然后键入
M-x sly-disassemble-symbol RET stepped-list-p-rcsv RET
查看反汇编结果:
; disassembly for STEPPED-LIST-P-RCSV
; Size: 160 bytes. Origin: #xB800C1FA27 ; STEPPED-LIST-P-RCSV
; 27: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 2B: 488945F8 MOV [RBP-8], RAX
; 2F: 4D39E1 CMP R9, R12 ; NIL
; 32: 7508 JNE L2
; 34: L0: 498D542488 LEA RDX, [R12-120] ; T
; 39: L1: C9 LEAVE
; 3A: F8 CLC
; 3B: C3 RET
; 3C: L2: 418D71F9 LEA ESI, [R9-7]
; 40: 40F6C60F TEST SIL, 15
; 44: 757C JNE L5
; 46: 4D8B4101 MOV R8, [R9+1]
; 4A: 4D39E0 CMP R8, R12 ; NIL
; 4D: 74E5 JEQ L0
; 4F: 498B51F9 MOV RDX, [R9-7]
; 53: 4D8B4101 MOV R8, [R9+1]
; 57: 418D70F9 LEA ESI, [R8-7]
; 5B: 40F6C60F TEST SIL, 15
; 5F: 755E JNE L4
; 61: 498B78F9 MOV RDI, [R8-7]
; 65: 41FF942409FDFFFF CALL [R12-759] ; [#5200000FFCA0] = #B8000010C0 ; GENERIC--
; 6D: 4C8B4DF0 MOV R9, [RBP-16]
; 71: 4883EC10 SUB RSP, 16
; 75: B902000000 MOV ECX, 2
; 7A: 48892C24 MOV [RSP], RBP
; 7E: 488BEC MOV RBP, RSP
; 81: 498B442429 MOV RAX, [R12+41] ; LISP-LINKAGE-TABLE
; 86: FF90100A0000 CALL [RAX+2576] ; ABS
; 8C: 4C8B4DF0 MOV R9, [RBP-16]
; 90: BF02000000 MOV EDI, 2
; 95: 41FF942431FDFFFF CALL [R12-719] ; [#5200000FFCC8] = #B800001280 ; GENERIC-=
; 9D: 4C8B4DF0 MOV R9, [RBP-16]
; A1: 7405 JEQ L3
; A3: 498BD4 MOV RDX, R12 ; NIL
; A6: EB91 JMP L1
; A8: L3: 498B5101 MOV RDX, [R9+1]
; AC: B902000000 MOV ECX, 2
; B1: FF7508 PUSH QWORD PTR [RBP+8]
; B4: 498B442429 MOV RAX, [R12+41] ; LISP-LINKAGE-TABLE
; B9: FFA0A0000100 JMP [RAX+65696] ; STEPPED-LIST-P-RCSV
; BF: L4: CC56 INT3 86 ; OBJECT-NOT-LIST-ERROR
; C1: 20 BYTE #X20 ; R8(d)
; C2: L5: CC56 INT3 86 ; OBJECT-NOT-LIST-ERROR
; C4: 24 BYTE #X24 ; R9(d)
; C5: CC0F INT3 15 ; Invalid argument count trap
不难注意到,在递归时(B9)并没有用CALL创建新栈帧,而是直接JMP到函数开头,已经应用了TCO。
为了对比,我们故意写一个非尾递归的版本:
(defun stepped-list-p-nontail (ls)
(cond
((or (null ls) (null (cdr ls)))
t)
((/= 1 (abs (- (car ls) (cadr ls))))
nil)
(t
(and (stepped-list-p-nontail (cdr ls))
t))))
反汇编:
; disassembly for STEPPED-LIST-P-NONTAIL
; Size: 200 bytes. Origin: #xB800C1E124 ; STEPPED-LIST-P-NONTAIL
; 24: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 28: 488945F8 MOV [RBP-8], RAX
; 2C: 4C3965F0 CMP [RBP-16], R12 ; NIL
; 30: 7508 JNE L2
; 32: L0: 498D542488 LEA RDX, [R12-120] ; T
; 37: L1: C9 LEAVE
; 38: F8 CLC
; 39: C3 RET
; 3A: L2: 488B45F0 MOV RAX, [RBP-16]
; 3E: 8D70F9 LEA ESI, [RAX-7]
; 41: 40F6C60F TEST SIL, 15
; 45: 0F8598000000 JNE L5
; 4B: 488B45F0 MOV RAX, [RBP-16]
; 4F: 4C8B4001 MOV R8, [RAX+1]
; 53: 4D39E0 CMP R8, R12 ; NIL
; 56: 74DA JEQ L0
; 58: 488B45F0 MOV RAX, [RBP-16]
; 5C: 488B50F9 MOV RDX, [RAX-7]
; 60: 488B45F0 MOV RAX, [RBP-16]
; 64: 4C8B4001 MOV R8, [RAX+1]
; 68: 418D70F9 LEA ESI, [R8-7]
; 6C: 40F6C60F TEST SIL, 15
; 70: 756E JNE L4
; 72: 498B78F9 MOV RDI, [R8-7]
; 76: 41FF942409FDFFFF CALL [R12-759] ; [#5200000FFCA0] = #B8000010C0 ; GENERIC--
; 7E: 4883EC10 SUB RSP, 16
; 82: B902000000 MOV ECX, 2
; 87: 48892C24 MOV [RSP], RBP
; 8B: 488BEC MOV RBP, RSP
; 8E: 498B442429 MOV RAX, [R12+41] ; LISP-LINKAGE-TABLE
; 93: FF90100A0000 CALL [RAX+2576] ; ABS
; 99: BF02000000 MOV EDI, 2
; 9E: 41FF942431FDFFFF CALL [R12-719] ; [#5200000FFCC8] = #B800001280 ; GENERIC-=
; A6: 7530 JNE L3
; A8: 488B45F0 MOV RAX, [RBP-16]
; AC: 488B5001 MOV RDX, [RAX+1]
; B0: 4883EC10 SUB RSP, 16
; B4: B902000000 MOV ECX, 2
; B9: 48892C24 MOV [RSP], RBP
; BD: 488BEC MOV RBP, RSP
; C0: 498B442429 MOV RAX, [R12+41] ; LISP-LINKAGE-TABLE
; C5: FF90A8000100 CALL [RAX+65704] ; STEPPED-LIST-P-NONTAIL
; CB: 480F42E3 CMOVB RSP, RBX
; CF: 4C39E2 CMP RDX, R12 ; NIL
; D2: 0F855AFFFFFF JNE L0
; D8: L3: 498BD4 MOV RDX, R12 ; NIL
; DB: E957FFFFFF JMP L1
; E0: L4: CC56 INT3 86 ; OBJECT-NOT-LIST-ERROR
; E2: 20 BYTE #X20 ; R8(d)
; E3: L5: 488B45F0 MOV RAX, [RBP-16]
; E7: CC56 INT3 86 ; OBJECT-NOT-LIST-ERROR
; E9: 00 BYTE #X00 ; RAX(d)
; EA: CC0F INT3 15 ; Invalid argument count trap
清晰可见,在 C5 处有明显保护现场、创建栈帧的行为,与有TCO的版本产生鲜明对比。
尾递归优化使得函数式编程可以兼具递归的优雅与迭代的高效,是极为重要的特性。其中,Scheme、Haskell、Elixir和Erlang等语言在标准中提供了尾递归优化保证;Common Lisp、Clojure、F#、Scala等语言在实现或实践中常用尾递归优化。
本文使用的编译器:SBCL 2.5.7, amd64