Why doesn't length() on DBI/DBD::ODBC returned unicode data return the correct length in characters?

Today someone posted a problem using the SQL Server XML datatype with DBI/DBD::ODBC on the dbi-users mailing list. I sorted their problem pretty quickly but noticed his code using length() on scalars which were bound with bound_col was not reporting the correct length. The example and output are below:

use strict;
use DBI qw(:sql_types);
use Devel::Peek;

use bytes;
no bytes;

my ($txt_de, $txt_ru);
{
    use utf8;

    $txt_de = 'Käse';
    $txt_ru = 'Москва';
}

binmode STDOUT, ':utf8';

my @dsn = qw/DBI:ODBC:xx xx xx/;
my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1, ChopBlanks => 1);
my $dbh = DBI->connect( @dsn, \%opt );

# Table was created with
# CREATE TABLE T2 (a VARCHAR(99), u NVARCHAR(99), x XML);
my $sth_ins = $dbh->prepare(
  'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )' );

foreach my $row ([$txt_de, $txt_de, "<d>$txt_de</d>"],
                 [$txt_ru, $txt_ru, "<r>$txt_ru</r>"]) {
    $sth_ins->bind_param(1, $row->[0]);
    $sth_ins->bind_param(2, $row->[1]);
    $sth_ins->bind_param(3, $row->[2], {TYPE => SQL_WCHAR});
    $sth_ins->execute;
}

my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' );
$sth_sel->execute;
$sth_sel->bind_col(1, \my $txt, {TYPE => SQL_WCHAR});
$sth_sel->bind_col(2, \my $xml, {TYPE => SQL_WCHAR});

my $i = 0;
while ( $sth_sel->fetch ) {
  printf "%3u %3u %3u %s [%s] [%s]\n",
      ++$i, length($txt), bytes::length($txt),
          (utf8::is_utf8($txt) ? ' utf8' : '!utf8'), $txt, $xml;
  # NOTE, if I don't reset $txt each iteration the length() call returns
  # the wrong answer.
  #$txt = ''; #this line fixes it
  #Dump($txt);
}
$dbh->disconnect;

which outputs:

1   4   5  utf8 [Käse] [<d>Käse</d>]
2   4  12  utf8 [&#1052;&#1086;&#1089;&#1082;&#1074;&#1072;]

Notice the length (second column) is 4 for Käse and Москва even though the latter should obviously be 6. I noticed that by resetting $txt to '' in each fetch iteration it fixed the problem and then I added Devel::Peek and Dump for those two bound scalars I got:

1   4   5  utf8 [Käse] [<d>Käse</d>]
SV = PVMG(0x853b6fc) at 0x856d570
REFCNT = 2
FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x8674698 "K\303\244se"\0 [UTF8 "K\x{e4}se"]
CUR = 5
LEN = 8
MAGIC = 0x86746c0
MG_VIRTUAL = &PL_vtbl_utf8
MG_TYPE = PERL_MAGIC_utf8(w)
MG_LEN = 4

2   4  12  utf8 [&#1052;&#1086;&#1089;&#1082;&#1074;&#1072;] [<r>&#1052;&#1086;&#1089;&#1082;&#1074;&#1072;</r>]
SV = PVMG(0x853b6fc) at 0x856d570
REFCNT = 2
FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x8671098 "\320\234\320\276\321\201\320\272\320\262\320\260"\0 [UTF8 "\x{41c}\x{43e}\x{441}\x{43a}\x{432}\x{430}"]
CUR = 12
LEN = 16
MAGIC = 0x86746c0
MG_VIRTUAL = &PL_vtbl_utf8
MG_TYPE = PERL_MAGIC_utf8(w)
MG_LEN = 4

Those MG_LEN=4 lines were really suspicious but I was still stumped. The really great people at Perl Monks soon identified the problem as there was set magic and so probably SvSETMAGIC should have been called "somewhere" but hadn't been and ikegami even posted a worked example (see length() returns wrong result - suspicious magic). After this it was not too hard to track down a sv_setpvn that should have been a sv_setpvn_mg although I am still a little mystified why I need to know there is magic attached on the bound scalars when setting them.

Thanks to Perl Monks (again) and in particular ikegami.