lcomp

Overview

AMG: This page presents a list comprehension I initially developed on hat0's page in response to his musings about building a syntax for list comprehensions directly into Tcl. My intention was to demonstrate that list comprehensions work fine as a command without special language support.

[expr] is used to process each generated list element. More than one output element can be generated for each combination of input elements, which is useful for making dicts. A variety of iteration modes are supported, as shown in the table below.

Iteration type[foreach] example [lcomp] example
Simple foreach a $list {...} lcomp {...} for a in $list
Striding foreach {a b} $list {...} lcomp {...} for {a b} in $list
Unpacking foreach _ $list {lassign $_ a b; ...} lcomp {...} for {a b} inside $list
Combinatorial foreach a $list1 {foreach b $list2 {...}}lcomp {...} for a in $list1 for b in $list2
Parallel foreach a $list1 b $list2 {...} lcomp {...} for a in $list1 and b in $list2
Conditional foreach a $list {if {cond} {...}} lcomp {...} for a in $list if {cond}

All these different iterations types can be freely intermixed. For example, conditional iteration can be used to skip entire inner loops, not just individual elements.

The caller's variables can be brought into [lcomp]'s scope using the with opcode, whose operand is a list of variable names to bring in. This can be used both to import values and export iterators. Any number of with opcodes can be used, and they can be placed anywhere in the argument list.

The "{...}"s shown in the above table aren't exact copies of each other. Let's compare in more detail:

[lcomp] [foreach]
lcomp {$a * 2} for a in $list set _ {}; foreach a $list {lappend _ [expr {$a * 2}]}; set _
lcomp {$b} {$a} for {a b} in $list set _ {}; foreach {a b} $list {lappend _ $b $a}; set _

For brevity, the second [foreach] example says "$b $a" even though it's technically "[expr {$b}] [expr {$a}]".


Implementation

Here's the code:

proc lcomp {expression args} {
    set __0__ "lappend __1__ \[expr [list $expression]\]"
    while {[llength $args] && [lindex $args 0] ni {for if with}} {
        append __0__ " \[expr [list [lindex $args 0]]\]"
        set args [lrange $args 1 end]
    }
    set tmpvar 2
    set structure {}
    set upvars {}
    while {[llength $args]} {
        set prefix ""
        switch [lindex $args 0] {
        for {
            set nest [list foreach]
            while {[llength $nest] == 1 || [lindex $args 0] eq "and"} {
                if {[llength $args] < 4 || [lindex $args 2] ni {in inside}} {
                    error "wrong # operands: must be \"for\" vars \"in?side?\"\
                           vals ?\"and\" vars \"in?side?\" vals? ?...?"
                }
                switch [lindex $args 2] {
                in {
                    lappend nest [lindex $args 1] [lindex $args 3]
                } inside {
                    lappend nest __${tmpvar}__ [lindex $args 3]
                    append prefix "lassign \$__${tmpvar}__ [lindex $args 1]\n"
                    incr tmpvar
                }}
                set args [lrange $args 4 end]
            }
            lappend structure $nest $prefix
        } if {
            if {[llength $args] < 2} {
                error "wrong # operands: must be \"if\" condition"
            }
            lappend structure [list if [lindex $args 1]] $prefix
            set args [lrange $args 2 end]
        } with {
            if {[llength $args] < 2} {
                error "wrong # operands: must be \"with\" varlist"
            }
            foreach var [lindex $args 1] {
                lappend upvars $var $var
            }
            set args [lrange $args 2 end]
        } default {
            error "bad opcode \"[lindex $args 0]\": must be for, if, or with"
        }}
    }
    foreach {prefix nest} [lreverse $structure] {
        set __0__ [concat $nest [list \n$prefix$__0__]]
    }
    if {[llength $upvars]} {
        set __0__ "upvar 1 $upvars; $__0__"
    }
    unset -nocomplain expression args tmpvar prefix nest structure var upvars
    set __1__ ""
    eval $__0__
    return $__1__
}

Demonstration

To demonstrate its usage, I will first rework all the version-1 examples given on old versions of this page.

set l {1 2 3 4 5 6}
puts "A copy of the list: [lcomp {$i} for i in $l]"
puts "Double values from list: [lcomp {$n * 2} for n in $l]"
puts "Only even numbers: [lcomp {$i} for i in $l if {$i % 2 == 0}]"
proc digits {str} {
    lcomp {$d} for d in [split $str ""] if {[string is digit $d]}
}
puts "Just digits from (703)-999-0012= [digits (703)-999-0012]"
set names1 {Todd Coram Bob Jones Tim Druid}
puts "From ($names1): Last,first = [lcomp {"$l,$f"} for {f l} in $names1]"
puts "From ($names1): Only names starting with 't':\
    [lcomp {$f} for {f l} in $names1 if {[string match T* $f]}]"
puts "Create a matrix pairing {a b c} and {1 2 3}:\
    [lcomp {[list $n1 $n2]} for n1 in {a b c} for n2 in {1 2 3}]"
lcomp {$x} for x in {0 1 2 3}                         ;# 0 1 2 3
lcomp {[list $y $x]} for {x y} in {0 1 2 3}           ;# {1 0} {3 2}
lcomp {$x ** 2} for x in {0 1 2 3}                    ;# 0 1 4 9
lcomp {$x + $y} for x in {0 1 2 3} for y in {0 1 2 3} ;# 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6
lcomp {$x} for x in {0 1 2 3} if {$x % 2 == 0}        ;# 0 2
image delete {*}[lcomp {$val} for {key val} in [array get images]]

Now I will show off some new features:

set scale 2
lcomp {$x * $scale} with scale for x in {1 2 3 4}            ;# 2 4 6 8
lcomp {$key} {$val} for key in {a b c} and val in {1 2 3}    ;# a 1 b 2 c 3
lcomp {"$key=$val"} for {key val} inside {{a 1} {b 2} {c 3}} ;# a=1 b=2 c=3

Discussion

Bytecoding

AMG: It would be awesome if this was bytecoded. :^)

DKF: Just a hint: writing bytecode compilers is quite difficult. No, it's actually very difficult indeed for anything non-trivial; you're working with the parser and the bytecode and the compiler core and you're short of most of the nice tools available to you when Tcl coding. What's more, get it wrong and you might not notice for quite a long time, as different problems show up in different ways. (Try tcl::unsupported::assemble instead; it gives you 90% of the effect for about 1% of the effort.)

AMG: What objections would there be to me distributing Tcl code that uses assemble?

Reduction functions

AMG: I'm curious if there's a workable way to add customizable "reduction" behavior, or if list reduction should remain a separate operation performed on [lcomp]'s return value. (Reduce is also known as fold.)

reduce(f, l) = f(...(f(f(f(l[0], l[1]), l[2]), l[3])..., l[n-1])

At the moment, the "reduction function" is hard-coded to be [lappend], which builds a list given each element in sequence. There are other sensible options: [append] concatenates the elements into a string, [join] does the same but inserts a delimiter, [+] finds the sum, [*] finds the product, [max] finds the maximum, etc.

DKF: It's more usual to define reduce to take three arguments: a reduction function, a “zero”, and a list/sequence. (It's also best if you can have the function and zero being things can be applied without the list to get a partially-applied – i.e., curried – function.) That gives a definition like this:

reduce(f, z)(l) = f(...(f(f(f(z, l[0]), l[1]), l[2])..., l[n-1])

AMG: Subtle but cool: DKF's reduce is a function that returns a function which is applied to l.

Spiritual successor

AMG: The "syntax" provided by this [lcomp] command inspired the [loop] and [collect] commands proposed for Brush. The [loop] command takes the place of [foreach], [for], [while], and [dict for], plus provides many more modes of operation. The [collect] command is the list comprehension command, and its arguments are [expr]essions followed by the same looping specification arguments supported by [loop].

Here are the examples given on this page rewritten to use Brush and [collect]:

set &l (1 2 3 4 5 6)
puts "A copy of the list: [collect i for &i in $l]"
puts "Double values from list: [collect n*2 for &n in $l]"
puts "Only even numbers: [collect i for &i in $l if {i % 2 == 0}]"
proc digits {str} {
    collect d for &d in [split $str ""] if {[string is digit $d]}
}
puts "Just digits from (703)-999-0012= [digits {(703)-999-0012}]"
set &names1 (Todd Coram Bob Jones Tim Druid)
puts "From ($names1): Last,first = [collect {"$l,$f"} for (&f &l) in $names1]"
puts "From ($names1): Only names starting with 't':\
    [collect f for (&f &l) in $names1 if {[string match T* $f]}]"
puts "Create a matrix pairing {a b c} and {1 2 3}:\
    [collect {(n1, n2)} for &n1 in (a b c) for &n2 in (1 2 3)]"
collect x for &x in (0 1 2 3)                                       ;# 0 1 2 3
collect {(y, x)} for (&x &y) in (0 1 2 3)                           ;# {1 0} {3 2}
collect x**2 for &x in (0 1 2 3)                                    ;# 0 1 4 9
collect x+y for &x in (0 1 2 3) for &y in (0 1 2 3)                 ;# 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6
collect x for &x in (0 1 2 3) if {x % 2 == 0}                       ;# 0 2
image delete {*}[collect val for (&key &val) in [array get images]] ;# Note: Brush doesn't have [array]
set scale 2
collect x*scale for x in (1 2 3 4)                                  ;# 2 4 6 8      * no need for "with"
collect key val for &key in (a b c) and &val in (1 2 3)             ;# a 1 b 2 c 3
collect {"$key=$val"} for (&key &val) unpack ((a 1) (b 2) (c 3))    ;# a=1 b=2 c=3

AMG: I'm working on an update to the design of Brush's [loop] and [collect] commands that would eliminate "unpack" and change that last line to:

collect {"$key=$val"} for ((&key &val)) in ((a 1) (b 2) (c 3))

Difficulty processing nested lists

AMG: [lcomp] can't be used to process nested lists, not in the way I intended.

% lcomp {$b ** 2} for a in {{1 2 3} {4 5}} for b in $a
can't read "a": no such variable
% lcomp {[lcomp {$b ** 2} for b in $a]} for a in {{1 2 3} {4 5}}
{1 4 9} {16 25}
% lcomp {$b ** 2} for b inside {{1 2 3} {4 5}}
1 16

That really hurts.

Compare Python. Tcl insists on variable substitution being done before the command starts, provided $ notation is used, whereas Python variable substitution happens on demand throughout the execution of a "command" (using the term loosely).

>>> [b ** 2 for a in ((1, 2, 3), (4, 5)) for b in a]
[1, 4, 9, 16, 25]

I'm working on changes to the design of Brush to permit the following, which gives the same result as the Python code:

% collect b**2 for ((* &b)) in ((1 2 3) (4 5))
1 4 9 16 25

But this is still suboptimal because the implicit nesting expressed by ((* &b)) does not offer any room to inject an if clause. How to translate the following Python?

>>> [b ** 2 for a in ((1, 2, 3), (4, 5)) if len(a) > 2 for b in a]
[1, 4, 9]

As mentioned before, Tcl insists on variable substitution being done early. Brush provides a way to defer the substitution:

% collect {$b@ ** 2} for &a in ((1 2 3) (4 5)) for &b over &a
1 4 9 16 25

Here, the b variable doesn't contain an actual list element. Typing $b instead gives a reference to one of said list elements, so to obtain the elements themselves, it is necessary to dereference with $b@. (In Tcl, this could be done by b containing a script that provides the referent when [eval]'ed, in this case "lindex $a 123", where 123 is the current loop iteration counter.)

This makes room for if:

% collect {$b@ ** 2} for &a in ((1 2 3) (4 5)) if {[list size $a] > 2} for &b over &a
1 4 9

Changing gears a bit. Tcl has [lmap]:

% lmap a {{1 2 3} {4 5}} {lmap b $a {expr {$b ** 2}}}
{1 4 9} {16 25}

However, this doesn't offer an obvious way to flatten the list structure.

Back to Brush. How to not flatten the list?

% collect {[collect b**2 for &b in $a]} for &a in ((1 2 3) (4 5))
{1 4 9} {16 25}
% collect i for &a in ((1 2 3) (4 5)) {= &i [collect b**2 for &b in $a]}
{1 4 9} {16 25}

These do not make me happy...


See also