BulkInsert with the Entity Framework – DEVELOPPARADISE
03/05/2018

BulkInsert with the Entity Framework

Introduction

The current status of this project can be found at https://github.com/mtanneryd/ef6-bulk-operations. There is also a nuget package available at https://www.nuget.org/packages/Tanneryd.BulkOperations.EF6.

The entity framework (EF) is great in many ways but when inserting huge amounts of data, it is not really what you need, unless you have lots of time to spend waiting. In this article, I will present a way to use EF together with the bulk insert functionality without leaving the EF comfort zone. The code is based on an idea that I had some years ago, but I have added some more features to it since then. I figured it was time to write a complete article and provide some example code to go with it.

The latest version of this code can be found as a nuget package here.

There is also an updated version of this article here.

Background

The reason it takes so long is that every single insert statement is encapsulated in the stored procedure sp_executesql. So, if you have a million or so inserts to make, you get just as many roundtrips to the database each executing this stored procedure with your insert statement inside. No wonder it takes eons to finish.

Typically, it could look something like this:

exec sp_executesql N'INSERT [dbo].[Prime]([Number], [CompositeId]) VALUES (@0, @1) SELECT [Id] FROM [dbo].[Prime] WHERE @@ROWCOUNT > 0 AND [Id] = scope_identity()',N'@0 bigint,@1 bigint',@0=7,@1=198

I had the following requirements when I set out to find a way to speed things up a bit.

  • dead simple to add the functionality to any EF6 project
  • minimize impact on existing code
  • handle complete object graphs

Challenges

Extend the Functionality of the db Context

To integrate seamlessly with the derived DbContext class, I decided to add the BulkInsert-method in a separate partial class declaration. BulkInsertAll is the only public method but there are a couple of private methods too and it seemed like a practical solution to place them all in a separate file making use of the partial class pattern.

Table Mappings

I wanted to be able to deduct all relevant information such as table names, column names, foreign keys and whatever else I might need directly from the context and the entity objects themselves. But, as far as I know, there are no documented ways of accessing the entity framework mapping data so I had to resort to using reflection. It works but it also means that the code might stop working with a new version of EF. The following code is used to extract the initial mapping data using reflection.

var objectContext = ((IObjectContextAdapter)this).ObjectContext; var workspace = objectContext.MetadataWorkspace; var containerName = objectContext.DefaultContainerName; Type t = entities[0].GetType(); var entityName = t.Name; var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace); dynamic temp = storageMapping.GetType().InvokeMember(                 "EntitySetMappings",                 BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,                 null, storageMapping, null);

The temp variable now holds an enumerable of EntitySetMapping objects that we cast to a list, just so that we can Single() out the mapping for our entity type.

var entitySetMaps = new List<EntitySetMapping>(); foreach (var t in temp) {   entitySetMaps.Add((EntitySetMapping)t); } var entitySetMap = entitySetMaps.Single(m => m.EntitySet.ElementType.Name == entityName);

Our next step is to find the mapping between CLR property names and table column names.

var typeMappings = entitySetMap.EntityTypeMappings; EntityTypeMapping typeMapping = typeMappings[0]; var fragments = typeMapping.Fragments; var fragment = fragments[0]; var properties = fragment.PropertyMappings; foreach (var property in properties.Where(p => p is ScalarPropertyMapping).Cast<ScalarPropertyMapping>()) {    var clrProperty = property.Property;    var columnProperty = property.Column;    columnMappings.Add(clrProperty.Name, new CLR2ColumnMapping    {        CLRProperty = clrProperty,        ColumnProperty = columnProperty,    }); }

Finally, we look for any foreign key relationships.

var foreignKeyMappings = new List<ForeignKeyMapping>(); var navigationProperties =     typeMapping.EntityType.DeclaredMembers.Where(         m => m.BuiltInTypeKind == BuiltInTypeKind.NavigationProperty)         .Cast<NavigationProperty>()         .Where(p => p.RelationshipType is AssociationType)         .ToArray();  foreach (var navigationProperty in navigationProperties) {     var relType = (AssociationType)navigationProperty.RelationshipType;      if (foreignKeyMappings.All(m => m.Name != relType.Name))     {         var fkMapping = new ForeignKeyMapping         {             NavigationPropertyName = navigationProperty.Name,             BuiltInTypeKind = navigationProperty.TypeUsage.EdmType.BuiltInTypeKind,             Name = relType.Name,             FromType = relType.Constraint.FromProperties.Single().DeclaringType.Name,             FromProperty = relType.Constraint.FromProperties.Single().Name,             ToType = relType.Constraint.ToProperties.Single().DeclaringType.Name,             ToProperty = relType.Constraint.ToProperties.Single().Name,         };         foreignKeyMappings.Add(fkMapping);     } }

Identity Columns

The entity framework will set any generated identity columns when inserting new rows and as you can see in the sp_executesql code snippet above, it uses scope_identity() to do this. This works nicely as long as you insert one row at a time just like EF does but now that we want to use bulk insert instead, we need a slightly more sophisticated approach. So, the code below does this for us when our primary key column is an identity column.

var pkColumnName = pkColumn.Name; var pkColumnType = Type.GetType(pkColumn.PrimitiveType.ClrEquivalentType.FullName);  // Get the number of existing rows in the table. cmd.CommandText = $@"SELECT COUNT(*) FROM {tableName}"; var result = cmd.ExecuteScalar(); var count = Convert.ToInt64(result);  // Get the identity increment value cmd.CommandText = $"SELECT IDENT_INCR('{tableName}')"; result = cmd.ExecuteScalar(); dynamic identIncrement = Convert.ChangeType(result, pkColumnType);  // Get the last identity value generated for our table cmd.CommandText = $"SELECT IDENT_CURRENT('{tableName}')"; result = cmd.ExecuteScalar(); dynamic identcurrent = Convert.ChangeType(result, pkColumnType);  var nextId = identcurrent + (count > 0 ? identIncrement : 0);  bulkCopy.BulkCopyTimeout = 5 * 60; bulkCopy.WriteToServer(table);  cmd.CommandText = $"SELECT SCOPE_IDENTITY()"; result = cmd.ExecuteScalar(); dynamic lastId = Convert.ChangeType(result, pkColumnType);  cmd.CommandText = $"SELECT {pkColumnName} From {tableName}  WHERE {pkColumnName} >= {nextId} and {pkColumnName} <= {lastId}"; var reader = cmd.ExecuteReader(); var ids = (from IDataRecord r in reader             let pk = r[pkColumnName]             select pk)             .OrderBy(i => i)             .ToArray(); if (ids.Length != entities.Count) throw new ArgumentException ("More id values generated than we had entities. Something went wrong, try again.");   for (int i = 0; i < entities.Count; i++) {     SetProperty(pkColumnName, entities[i], ids[i]); }

Foreign Key Relationships

Our search through the entity mappings gave us information about foreign keys and with it, we can implement a recursive pattern that will allow the bulk insert method to save a complete entity graph and set identity columns for all saved objects. The BulkInsert-method takes three arguments.

public void BulkInsertAll(IList entities, SqlTransaction transaction = null, bool recursive = false)

The list of entities must all be of the same type. The transaction is optional and the recursive flag tells us if we should save only the entities in the list (false) or if we should also save entities in any navigation properties (true). We have two types of navigation properties to deal with.

  1. Single properties, the result of a foreign key in this entity pointing to another entity
  2. Collection properties, the result of another entity having a foreign key to this entity

We start by finding all the single navigation properties and saving them, if they are new, and we then update the entity with the foreign key value. This will have to be done one navigation property at a time. Having done this, we can safely save all our entities. Finally, we go through all collection navigation properties, setting the proper foreign key to our entity whose primary keys we now have and then we make a recursive call to the BulkInsert-method for each of these collection navigation properties.

Using the Code

All you have to do is to add the partial class below after renaming it properly and placing it in the same namespace as your derived DbContext class.

/* * Copyright ©  2017 Tånneryd IT AB *  * This file is part of the tutorial application BulkInsert.App. *  * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at *  *   http://www.apache.org/licenses/LICENSE-2.0 *  * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ using System; using System.Collections; using System.Collections.Generic; using System.Data; using System.Data.Entity.Core.Mapping; using System.Data.Entity.Infrastructure; using System.Data.Entity.Core.Metadata.Edm; using System.Data.SqlClient; using System.Linq; using System.Reflection; using System.Runtime.CompilerServices; using System.Text.RegularExpressions;  namespace BulkInsert.App {     public partial class NumberContext     {         public NumberContext(string nameOrConnectionString)             : base($"name={nameOrConnectionString}")         {         }          public void BulkInsertAll(IList entities, SqlTransaction transaction = null,                                    bool recursive = false)         {             BulkInsertAll(entities, transaction, recursive,              new Dictionary<object, object>(new IdentityEqualityComparer<object>()));         }          private void BulkInsertAll(IList entities, SqlTransaction transaction,          bool recursive, Dictionary<object, object> savedEntities)         {             if (entities.Count == 0) return;              var objectContext = ((IObjectContextAdapter)this).ObjectContext;             var workspace = objectContext.MetadataWorkspace;              Type t = entities[0].GetType();              var mappings = GetMappings(workspace, objectContext.DefaultContainerName, t.Name);             if (recursive)             {                 foreach (var fkMapping in mappings.ToForeignKeyMappings)                 {                     var navProperties = new HashSet<object>();                     var modifiedEntities = new List<object[]>();                     foreach (var entity in entities)                     {                         var navProperty = GetProperty(fkMapping.NavigationPropertyName, entity);                         var navPropertyKey = GetProperty(fkMapping.ToProperty, entity);                          if (navProperty != null && navPropertyKey == 0)                         {                             var currentValue = GetProperty(fkMapping.FromProperty, navProperty);                             if (currentValue > 0)                             {                                 SetProperty(fkMapping.ToProperty, entity, currentValue);                             }                             else                             {                                 navProperties.Add(navProperty);                                 modifiedEntities.Add(new object[] { entity, navProperty });                             }                         }                     }                     if (navProperties.Any())                     {                         BulkInsertAll(navProperties.ToArray(), transaction, true, savedEntities);                         foreach (var modifiedEntity in modifiedEntities)                         {                             var e = modifiedEntity[0];                             var p = modifiedEntity[1];                             SetProperty(fkMapping.ToProperty, e,                                          GetProperty(fkMapping.FromProperty, p));                         }                     }                 }             }              var validEntities = new ArrayList();             var ignoredEntities = new ArrayList();             foreach (dynamic entity in entities)             {                 if (savedEntities.ContainsKey(entity))                 {                     ignoredEntities.Add(entity);                     continue;                 }                 validEntities.Add(entity);                 savedEntities.Add(entity, entity);             }             BulkInsertAll(validEntities, t, mappings, transaction);              if (recursive)             {                 foreach (var fkMapping in mappings.FromForeignKeyMappings)                 {                     var navigationPropertyName = fkMapping.NavigationPropertyName;                      var navPropertyEntities = new List<dynamic>();                     foreach (var entity in entities)                     {                         if (fkMapping.BuiltInTypeKind == BuiltInTypeKind.CollectionType ||                             fkMapping.BuiltInTypeKind ==                             BuiltInTypeKind.CollectionKind)                         {                             var navProperties = GetProperty(navigationPropertyName, entity);                              foreach (var navProperty in navProperties)                             {                                 SetProperty(fkMapping.ToProperty, navProperty,                          GetProperty(fkMapping.FromProperty, entity));                                 navPropertyEntities.Add(navProperty);                             }                         }                         else                         {                             var navProperty = GetProperty(navigationPropertyName, entity);                             if (navProperty != null)                             {                                 SetProperty(fkMapping.ToProperty, navProperty,                                    GetProperty(fkMapping.FromProperty, entity));                                 navPropertyEntities.Add(navProperty);                             }                         }                     }                      if (navPropertyEntities.Any())                     {                         BulkInsertAll(navPropertyEntities.ToArray(), transaction, true, savedEntities);                     }                 }             }         }          private void BulkInsertAll(IList entities, Type t, Mappings mappings,                                lTransaction transaction = null)         {             Set(t).ToString();             var tableName = GetTableName(t);             var columnMappings = mappings.ColumnMappings;              var conn = (SqlConnection)Database.Connection;             if (conn.State == ConnectionState.Closed)                 conn.Open();             var bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, transaction)                   var properties = t.GetProperties().Where                             (p => columnMappings.ContainsKey(p.Name)).ToArray();             var table = new DataTable();             foreach (var property in properties)             {                 Type propertyType = property.PropertyType;                  // Nullable properties need special treatment.                 if (propertyType.IsGenericType &&                     propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))                 {                     propertyType = Nullable.GetUnderlyingType(propertyType);                 }                  // Ignore all properties that we have no mappings for.                 if (columnMappings.ContainsKey(property.Name))                 {                     // Since we cannot trust the CLR type properties to be in the same order as                     // the table columns we use the SqlBulkCopy column mappings.                     table.Columns.Add(new DataColumn(property.Name, propertyType));                     var clrPropertyName = property.Name;                     var tableColumnName = columnMappings[property.Name].ColumnProperty.Name;                     bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping                                   (clrPropertyName, tableColumnName));                 }             }              // Add all our entities to our data table             foreach (var entity in entities)             {                 var e = entity;                 table.Rows.Add(properties.Select(property => GetPropertyValue                                (property.GetValue(e, null))).ToArray());             }              var cmd = conn.CreateCommand();             cmd.Transaction = transaction;              // Check to see if the table has a primary key with auto identity set. If so             // set the generated primary key values on the entities.             var pkColumn = columnMappings.Values.Where(m =>               m.ColumnProperty.IsStoreGeneratedIdentity).Select(m => m.ColumnProperty).SingleOrDefault();              if (pkColumn != null)             {                  var pkColumnName = pkColumn.Name;                 var pkColumnType = Type.GetType(pkColumn.PrimitiveType.ClrEquivalentType.FullName);                  // Get the number of existing rows in the table.                 cmd.CommandText = $@"SELECT COUNT(*) FROM {tableName}";                 var result = cmd.ExecuteScalar();                 var count = Convert.ToInt64(result);                  // Get the identity increment value                 cmd.CommandText = $"SELECT IDENT_INCR('{tableName}')";                 result = cmd.ExecuteScalar();                 dynamic identIncrement = Convert.ChangeType(result, pkColumnType);                  // Get the last identity value generated for our table                 cmd.CommandText = $"SELECT IDENT_CURRENT('{tableName}')";                 result = cmd.ExecuteScalar();                 dynamic identcurrent = Convert.ChangeType(result, pkColumnType);                  var nextId = identcurrent + (count > 0 ? identIncrement : 0);                  bulkCopy.BulkCopyTimeout = 5 * 60;                 bulkCopy.WriteToServer(table);                  cmd.CommandText = $"SELECT SCOPE_IDENTITY()";                 result = cmd.ExecuteScalar();                 dynamic lastId = Convert.ChangeType(result, pkColumnType);                  cmd.CommandText = $"SELECT {pkColumnName} From {tableName}                         WHERE {pkColumnName} >= {nextId} and {pkColumnName} <= {lastId}";                 var reader = cmd.ExecuteReader();                 var ids = (from IDataRecord r in reader                            let pk = r[pkColumnName]                            select pk)                            .OrderBy(i => i)                            .ToArray();                 if (ids.Length != entities.Count) throw new ArgumentException                            SetProperty(pkColumnName, entities[i], ids[i]);                 }             }             else             {                 bulkCopy.BulkCopyTimeout = 5 * 60;                 bulkCopy.WriteToServer(table);             }         }          private string GetTableName(Type t)         {             var dbSet = Set(t);             var sql = dbSet.ToString();             var regex = new Regex(@"FROM (?<table>.*) AS");             var match = regex.Match(sql);             return match.Groups["table"].Value;         }          private object GetPropertyValue(object o)         {             if (o == null)                 return DBNull.Value;             return o;         }          private Mappings GetMappings(MetadataWorkspace workspace,                                  string containerName, string entityName)         {             var columnMappings = new Dictionary<string, CLR2ColumnMapping>();             var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);             dynamic temp = storageMapping.GetType().InvokeMember(                 "EntitySetMappings",                 BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,                 null, storageMapping, null);             var entitySetMaps = new List<EntitySetMapping>();             foreach (var t in temp)             {                 entitySetMaps.Add((EntitySetMapping)t);             }              var entitySetMap = entitySetMaps.Single(m => m.EntitySet.ElementType.Name == entityName);             var typeMappings = entitySetMap.EntityTypeMappings;             EntityTypeMapping typeMapping = typeMappings[0];             var fragments = typeMapping.Fragments;             var fragment = fragments[0];             var properties = fragment.PropertyMappings;             foreach (var property in properties.Where                    (p => p is ScalarPropertyMapping).Cast<ScalarPropertyMapping>())             {                 var clrProperty = property.Property;                 var columnProperty = property.Column;                 columnMappings.Add(clrProperty.Name, new CLR2ColumnMapping                 {                     CLRProperty = clrProperty,                     ColumnProperty = columnProperty,                 });             }               var foreignKeyMappings = new List<ForeignKeyMapping>();             var navigationProperties =                 typeMapping.EntityType.DeclaredMembers.Where                         (m => m.BuiltInTypeKind == BuiltInTypeKind.NavigationProperty)                     .Cast<NavigationProperty>()                     .Where(p => p.RelationshipType is AssociationType)                     .ToArray();              foreach (var navigationProperty in navigationProperties)             {                 var relType = (AssociationType)navigationProperty.RelationshipType;                  if (foreignKeyMappings.All(m => m.Name != relType.Name))                 {                     var fkMapping = new ForeignKeyMapping                     {                         NavigationPropertyName = navigationProperty.Name,                         BuiltInTypeKind = navigationProperty.TypeUsage.EdmType.BuiltInTypeKind,                         Name = relType.Name,                         FromType = relType.Constraint.FromProperties.Single().DeclaringType.Name,                         FromProperty = relType.Constraint.FromProperties.Single().Name,                         ToType = relType.Constraint.ToProperties.Single().DeclaringType.Name,                         ToProperty = relType.Constraint.ToProperties.Single().Name,                     };                     foreignKeyMappings.Add(fkMapping);                 }             }              return new Mappings             {                 ColumnMappings = columnMappings,                 ToForeignKeyMappings = foreignKeyMappings.Where(m => m.ToType == entityName).ToArray(),                 FromForeignKeyMappings = foreignKeyMappings.Where                                 (m => m.FromType == entityName).ToArray()             };         }          private dynamic GetProperty(string property, object instance)         {             var type = instance.GetType();             return type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.Public |                    BindingFlags.Instance, Type.DefaultBinder, instance, null);         }          private void SetProperty(string property, object instance, object value)         {             var type = instance.GetType();             type.InvokeMember(property, BindingFlags.SetProperty | BindingFlags.Public |                    BindingFlags.Instance, Type.DefaultBinder, instance, new[] { value });         }     }      class IdentityEqualityComparer<T> : IEqualityComparer<T> where T : class     {         public int GetHashCode(T value)         {             return RuntimeHelpers.GetHashCode(value);         }          public bool Equals(T left, T right)         {             return left == right; // Reference identity comparison         }     }      class Mappings     {         public Dictionary<string, CLR2ColumnMapping> ColumnMappings { get; set; }         public ForeignKeyMapping[] ToForeignKeyMappings { get; set; }         public ForeignKeyMapping[] FromForeignKeyMappings { get; set; }     }      class CLR2ColumnMapping     {         public EdmProperty CLRProperty { get; set; }         public EdmProperty ColumnProperty { get; set; }     }      class ForeignKeyMapping     {         public BuiltInTypeKind BuiltInTypeKind { get; set; }         public string NavigationPropertyName { get; set; }         public string Name { get; set; }         public string FromType { get; set; }         public string FromProperty { get; set; }         public string ToType { get; set; }         public string ToProperty { get; set; }     } }

You can also download the Bulkinsert.App tutorial application from this link.

History

  • 2017-04-15
    • Fixed yet another minor bug related to complex entity graphs
  • 2017-04-07
    • Changed the software license to Apache License 2.0
  • 2017-03-29
    • Fixed some minor bugs and added support for complex entity graphs where the same entity object exists in multiple places. Now making sure to save each entity only once!
  • 2017-03-17
    • Updated version of sample code uploaded. Significant performance improvements when saving complex entity graphs
  • 2017-03-10
    • Initial version uploaded