[explain importance of this notion of immutability to proper understanding of Tcl's semantics]
Those who have a little knowledge of C (or other similar languages) have learned to pass arguments to a function by value or by reference.
Tcl scripts have such functionalities : when you need to pass the variable by value, you put a $ before the variable name. And this does not modify the original variable.
But when you need to access uplevel variables, you can use upvar in the proc you call, passing just the name of the variable as argument. Then you can access the content of a variable defined in the upper stack level. This feature works especially well with arrays (arrays cannot be passed by value).
But when you come to Tcl C API (the internals that are required to write C extensions), you must know how Tcl passes variables by value, which makes use of the copy-on-write semantics.
Basically the copy-on-write semantics are:
A surprise
Here are 2 procs, the first gets a list passed by value, the second by reference via upvar.
# Proc 1 proc lmul-byval {list mult} { foreach i $list { lappend out [expr {$i*$mult}] } return $out } # Proc 2 proc lmul-byref {list mult} { upvar 1 $list l for {set i 0} {$i<[llength $l]} {incr i} { lset l $i [expr {[lindex $l $i] * $mult}] } return } # Timings proc timeit {lsize val mult} { for {set i 0} {$i<$lsize} {incr i} {lappend list1 $val;lappend list2 $val} puts "By value : [time {set list1 [lmul-byval $list1 $mult]} 10]" puts "By reference : [time {lmul-byref list2 $mult} 10]" } foreach size {100 1000 10000} { puts "TIME : $size elements" timeit $size 2 2 }
Results:
TIME : 100 elements By value : 41.1 microseconds per iteration By reference : 84.0 microseconds per iteration TIME : 1000 elements By value : 386.9 microseconds per iteration By reference : 764.8 microseconds per iteration TIME : 10000 elements By value : 4300.3 microseconds per iteration By reference : 7893.7 microseconds per iteration
As you can see, upvar is costly.
PR Don't believe so. The problem is the lset command for updating the values. Here are some modified procedures, showing another result.
# Proc byval proc lmul-byval {listv mult} { foreach i $listv { lappend out [expr {$i*$mult}] } return $out } # Proc byref proc lmul-byref {listname mult} { upvar 1 $listname listv for {set i 0} {$i<[llength $listv]} {incr i} { lset listv $i [expr {[lindex $listv $i] * $mult}] } return } # Proc byref2 - set value to list inside proc proc lmul-byref2 {listname mult} { upvar 1 $listname listv foreach i $listv { lappend out [expr {$i*$mult}] } set listv $out return } # Proc byref3 - return new values as list proc lmul-byref3 {listname mult} { upvar 1 $listname listv foreach i $listv { lappend out [expr {$i*$mult}] } return $out } # Timings proc timeit {lsize val mult} { for {set i 0} {$i<$lsize} {incr i} { lappend list1 $val lappend list2 $val lappend list3 $val lappend list4 $val }
puts "By reference : [time {lmul-byref list2 $mult} 10]" puts "By reference 2 : [time {lmul-byref2 list3 $mult} 10]" puts "By reference 3 : [time {set list4 [lmul-byref3 list4 $mult]} 10]" } foreach size {100 1000 10000} { puts "TIME : $size elements" timeit $size 2 2 }
Results: (Running on XP) % {TIME : 100 elements} % {By value : 73 microseconds per iteration} % {By reference : 133 microseconds per iteration} % {By reference 2 : 78 microseconds per iteration} % {By reference 3 : 75 microseconds per iteration} % {TIME : 1000 elements} % {By value : 613 microseconds per iteration} % {By reference : 1154 microseconds per iteration} % {By reference 2 : 619 microseconds per iteration} % {By reference 3 : 616 microseconds per iteration} % {TIME : 10000 elements} % {By value : 7664 microseconds per iteration} % {By reference : 11916 microseconds per iteration} % {By reference 2 : 8692 microseconds per iteration} % {By reference 3 : 7972 microseconds per iteration}
This is not like in C, in which large structures are faster passed by reference via pointers than by value. In C++, objects need to be passed by reference (&obj vs. obj) because the copy constructor can be waste CPU time when you work on large vectors of objects.
Another example showing this difference:
proc checka {l} { if {[llength $l] == 1} { if {![string is integer $l]} {error "non-numerical value (should be integer)"} return [list bignum $l] } return $l } proc checkb {v} { upvar 1 $v l if {[llength $l] == 1} { if {![string is integer $l]} {error "non-numerical value (should be integer)"} set l [list bignum $l] } }
In the above example, we check a list represents a bignum, and if it is a simple integer, makes a bignum of it.(assuming bignums are lists of integers with a "bignum" heading tag)
foreach i {1 {bignum 2} {bignum 3 3 3 3 3 3 3 3 3}} { set a $i puts "By value: [time {set a [checka $a]} 100]" set b $i puts "By reference: [time {checkb b} 100]" } By value: 2.61 microseconds per iteration By reference: 3.42 microseconds per iteration By value: 2.11 microseconds per iteration By reference: 2.62 microseconds per iteration By value: 2.1 microseconds per iteration By reference: 2.51 microseconds per iteration
This demonstrates that upvar is more time-wasting than passing arguments by value and returning it, for this perticular kind of tasks.
Example in C (taken from foriter - a loop command)
1 (just read the variable)
result = Tcl_GetIntFromObj(interp, objv[2], &start); if (result != TCL_OK) { return result; } ...
2 (check if the variable is shared)
if ( Tcl_IsShared( obj_counter ) ) {
3 (duplicate shared data)
obj_counter = Tcl_DuplicateObj( obj_counter );
Increment reference counter (see Tcl_Obj refCount HOWTO)
Tcl_IncrRefCount( obj_counter );
Remember that the Tcl_Obj is shared
isSharedObjCounter=1; } ...
When the Tcl_Obj becomes useless, we can decrement refcount to allow freeing memory. Without that, the whole interpreter may crash without alerting you :-( ...
/* ending of the loop */ if ( isSharedObjCounter ) { Tcl_DecrRefCount( obj_counter ); }
[and consequences for extension design]