Reading Corrupt Data

Created: February 3, 2015 Tagged As: .net, C#, General, Language Share:

I don't know how many times I have been given an old Microsoft Access database to import into a new SQL Server-based system I am writing only to run into SQL Data Import errors that say some cryptic message. Usually these are caused by corruption in the Access (or Excel) data source that the native program seems to know how to gloss over but everything else does not.

I usually then must resort to C# code to read and parse the data. The preferred mechanism for this is ADO.net and there is a ton of information about connecting it to various data sources, so I won't go into that here. What most articles fails to discuss is the DataAdapter.FillError event. Were it not for this event, we would be stuck with an all-or-nothing approach to data import: either you get the entire table, or, if even one tiny error occurs, nothing at all.

The FillError event is called for every ‚Äčrow that generates and error. When handling this event, you will notice that the event arguments passed in have a property named Continue. Setting this to true tells the DataAdapter that you've acknowledged the issue and it can keep going.

The code example below shows how this event can be used:

public class FillErrorTest
{
   public DataTable FillTable()
   {
      DataTable tableToFill = null; 
      System.Data.OleDb.OleDbDataAdapter daOLE = new System.Data.OleDb.OleDbDataAdapter(); 
      daOLE.FillError += daOLE_FillError;
      daOLE.SelectCommand = new System.Data.OleDb.OleDbCommand() { CommandText = "SELECT * FROM MyTable", CommandType = CommandType.Text };
      daOLE.Fill(tableToFill); return tableToFill;
   }

   private void daOLE_FillError(object sender, FillErrorEventArgs e)
   {
      Console.WriteLine("Fill error!!");
      e.Continue = true;
   }
} 

Now I know that this doesn't help you get data from the row(s) that are corrupt, but at least the rest of the data source can be read and my clients generally feel that 99% (or better) of data retrieval is better than none. And if you can use the FillError event to log which row(s) cannot be read, then manually entering the information would then be possible.