execute_for_fetch experimental support now in Perl DBD::ODBC 1.34_1

I have uploaded DBD::ODBC 1.34_1 to CPAN. This release adds very experimental support for a native execute_for_fetch method to DBD::ODBC which means you can do multiple row inserts/updates/deletes much quicker than using DBI's default execute_for_fetch (so long as you are using execute_for_fetch or execute_array and using column-wise binding).

Native execute_for_fetch support can often be as much as 30 or 40 times faster in my experiments (depending on data sent and odbc_batch_size) when inserting rows and even more if you do it in a transaction and even faster if you disable index updates until after the commit. Read on for why.

By default DBD::ODBC will use the new execute_for_fetch and there are some differences between what happens when you use DBD::ODBC with DBI's default execute_for_fetch and DBD::ODBC's. To disable the ODBC one you can set the new odbc_disable_array_operations attribute. I've listed the potential difference below. I'd love people to test this. It may be missing some special workarounds for a few drivers but I'm working my way through those.

Native execute_for_fetch support is must faster than DBI's default method because multiple rows (odbc_batch_size) are sent in one go instead of one at a time. Mostly the time wasted with DBI's default method is sending a row at a time over the network and then waiting for the response. The slight negative is that with most ODBC drivers you can only find the affected rows in total for the batch instead of per row but this usually doesn't matter. To get the benefit you need to use execute_for_fetch or execute_array with column-wise binding (this change will make no difference to row-wise binding via execute_array as execute_for_fetch is not called in that case). I'd love to hear from anyone who can make use of this or is already using execute_for_fetch with DBD::ODBC as this is a very significant change but one I desperately need feedback on.

The differences you may see between DBI's default execute_for_fetch/execute_array and DBD::ODBCs are:

  • as DBI's execute_for_fetch does one row at a time the result from execute is for one row and just about all ODBC drivers can report the number of affected rows when SQLRowCount is called per execute. When batches of parameters are sent the driver can still return the number of affected rows but it is usually per batch rather than per row. As a result, the tuple_status array you may pass to execute_for_fetch (or execute_array) usually shows -1 (unknown) for each row although the total affected returned in array context is a correct total affected.
  • not all ODBC drivers have sufficient ODBC support (arguably a bug) for correct diagnostics support when using arrays. DBI dictates that if a row in the batch is in error the tuple_status will contain the
    state, native and error message text. However the batch may generate multiple errors per row (which DBI says nothing about) and more than one row may error. In ODBC we get a list of errors but to associate
    each one with a particular row we need to call SQLGetDiagField for SQL_DIAG_ROW_NUMBER and it should say which row in the batch the diagnostic is associated with. Some ODBC drivers do not support SQL_DIAG_ROW_NUMBER properly and then DBD::ODBC cannot know which row in the batch an error refers to. In this case DBD::ODBC will report an error saying "failed to retrieve diags", state of HY000 and a native of 1 so you'll still see an error but not necessarily the exact one. Also, when more than one diagnostic is found for a row DBD::ODBC picks the first one (which is usually most relevant) as there is no way to report more than one diagnostic per row in the tuple_status. If the first problem of SQL_DIAG_ROW_NUMBER proves to be a problem for you the DBD::ODBC tracing will show all errors and if that is still not sufficient I'd consider adding a method to show all diagnostics.
  • Binding parameters with execute_array and execute_for_fetch does not allow the parameter types to be set. However, as parameter types are sticky you can call bind_param(param_num, undef, {TYPE => sql_type}) before calling execute_for_fetch/execute_array and the TYPE should be sticky when the batch of parameters is bound.
  • Although you can insert very large columns execute_for_fetch will need odbc_batch_size * max length of parameter per parameter so you may hit memory limits. If you use DBI's execute_for_fetch DBD::ODBC uses the ODBC API SQLPutData (see odbc_putdata_start) which does not require large amounts of memory as large columns are sent in pieces.