Data Migration

From CaisisWiki

Jump to: navigation, search


Before You Start


Import historical data (legacy system) into Caisis tables (SQL Server 2005) using staging tables.

  • The staging database is a test environment that mirrors the destination database (in this case, Caisis).
  • The source data can be in any file type, but we either import the data into our staging database or link to it (via linked tables).


  • These migrations are for a 1-time dump of data.
  • The legacy system should be retired after migration.
  • You have access to the source data.
  • Buy-in owners of the data (or head of group being migrated into Caisis).

Necessary skills

  • SQL Server Integration Services (SSIS)
  • T-SQL

Necessary knowledge

  • knowledge of terminology (i.e. disease-specific vocabulary) used in source system.
    • Ensures that the source vocabulary matches what you have been using in Caisis.
  • knowledge of source system model.
  • knowledge of Caisis model.

Map Source Data to Caisis

  • Field-by-field mapping of source data to specific Caisis table(s) and field(s).
Sample mapping
Source Field CommentsCaisisNotes
Med_rec_no Patients.PtMRN
Surgery_date OperatingRoomDetails.OpDate, Procedures.ProcDate
Patient_sex Patients.PtGender
Patient_last Patients.PtLastName
Patient_first Patients.PtFirstName
Resp_MD Procedures.ProcSurgeon ProcSurgeonType=Attending
Best practice: include owner(s) of data and/or stakeholders in this process.

Tips for Mapping Your Data

  1. How do you know where to put the data in Caisis?
    • you must understand Caisis structure
    • must work with expert of source system
  2. What if the source data point doesn’t fit?
    1. make sure it is a true data point and not derived (calculated)
    2. one option is to update Caisis data model (requires development skills to update application)
    3. virtual field (more on this in a later document)
    4. alternative option: migrate points that fit, store key to legacy data in RelatedRecords table and re-visit the gap(s) after data model is updated to accommodate.
    5. if you want to add a field, let us know – we may want to add it to the core structure.

Staging Tables

Goal: create staging tables in SQL that mirror Caisis contraints and structure. This is important to make sure that you data is in a format suitable for Caisis to consume.

Note: At this point, you will need some T-SQL language knowledge.  Gets technical.

Append source data to staging tables (based on mapping)

  • Best practice: populate Patients table first - even if you do not plan to import data to Patients table in Caisis (i.e. patients already exist in Caisis).

Tips for Staging Your Data

  • Don't forget required fields (i.e. EnteredBy, EnteredTime, etc.).
  • Be careful with Memo data types in MS Access. May truncate in Caisis (i.e. max notes column is 1000).
  • Caisis has a DateText field that goes along with each Date field. Make sure that all DateText fields are formatted as text.

Connect SQL Server to Staging DB

  • If your source data is in MS Access, you can create linked server between SQL Server and your MS Access database.
    • Provider: Microsoft Jet 4.0 OLE DB Provider

Source DB To SQL Server 2005 (T-SQL Script)

This section provides snippets of T-SQL that can be used to build a migration script for importing data from a source datababse into Caisis database.


*The following code is intended to serve as a sample and should not be considered a production-ready solution. Different systems and migrations have different requirements and your migration script should be written to suit your specific needs.

*These scripts should not be run during regular operating hours. You should schedule your migration at off-peak time.

*This script is intended to serve as a solution that does not modify production tables. Instead, it uses temporary staging tables to enforce referential integrity between source and production tables.

*This document outlines one approach. The Caisis Team is can not be responsible for any loss of data or any other occurrence while using this approach.

Step 1: Declare Variables

CREATE TABLE #vars (vname varchar(50), vvalue varchar(50), ivalue int)
--Set the institution Id assigned to the data set you’re migrating (in this example 10):
INSERT INTO #vars (vname, ivalue) values ('InstitutionId', 10)
INSERT INTO #vars (vname, vvalue) values ('EnteredBy', 'DataMigration')

Step 2: Fill in destination_PatientId for existing pts

  • In staging environment, store primary key PatientId of production.
-- Fill in Destination_PatientId for existing pts.
-- Destination_PatientId is the PatientId (primary key) in the destination Patients table.
-- This will provide a mapping between the source Patients table record and the destination Patients table record.
UPDATE Staging.dbo.Patients
SET Destination_PatientId = b.PatientId
FROM Staging.dbo.Patients a INNER JOIN Production.dbo.Patients b ON a.PtMRN = b.PtMRN

Step 3: Add new patients

INSERT INTO Production.dbo.Patients (PtMRN, EnteredBy, EnteredTime)
FROM Staging.dbo.Patients
WHERE Destination_PatientId IS NULL

Step 4: Fill in Destination_PatientId for new pts

Since MRN is unique per patient and also unique per record in the Patients table,
it can therefore be used as natural join between source Patients table 
and destination Patients table.
UPDATE Staging.dbo.Patients
SET Destination_PatientId = b.PatientId
FROM Staging.dbo.Patients a INNER JOIN Production.dbo.Patients b ON a.PtMRN = b.PtMRN
WHERE a.Destination_PatientId IS NULL

Step 5: PatientInstitutions

*Set the institution Id assigned to the data set you’re migrating
*Note: if this is a new institution (Caisis data set), you must first add it to the Institutions table.
INSERT INTO Production.dbo.PatientInstitutions (PatientId, InstitutionId, EnteredTime, UpdatedTime )
SELECT Destination_PatientId, (SELECT ivalue FROM #vars WHERE vname = 'InstitutionId'), 
      EnteredTime, EnteredTime
FROM Staging.dbo.Patients
WHERE Destination_PatientId NOT IN (SELECT PatientId
                                   FROM Production.dbo.PatientInstitutions
                                   WHERE InstitutionId = (SELECT ivalue FROM #vars 
                                                         WHERE vname = 'InstitutionId'))

Step 6: Migration Other Tables

-------FAMILY MEMBERS------------------------------------------------------------------
-- Add new FamilyMembers records based on this specific sorting. 
INSERT INTO Production.dbo.FamilyMembers
 (PatientId, FamMemNum, FamMemSide, FamMemRelation, FamMemDeathAge, 
  EnteredBy, EnteredTime, UpdatedBy, UpdatedTime, LockedBy, LockedTime)
SELECT b.Destination_PatientId, FamMemNum, FamMemSide, FamMemRelation, FamMemDeathAge, 
      a.EnteredBy, a.EnteredTime, a.UpdatedBy, a.UpdatedTime, a.LockedBy, a.LockedTime
FROM Staging.dbo.FamilyMembers a INNER JOIN Staging.dbo.Patients b ON a.PatientId = b.PatientId
-- NOTE: It is important ORDER BY the production table's identity column (in this case FamilyMemberId)
-- so that the records sort sequentially.
ORDER BY a.PatientId ASC, a.FamilyMemberId ASC
-- Fill in Dest_FamilyMemberId for new records added
SET Destination_FamilyMemberId = dest.FamilyMemberId
FROM Staging.dbo.FamilyMembers a 
-- NOTE: Use the same fields in ROW_NUMBER that you used in ORDER BY.  
-- The combination of these fields (in this example a.PatientId and a.FamilyMemberId) should be unique.
-- We are doing this to generate matching SequenceNum in both Staging and Production.  
-- They will be joined in the next statement.
SELECT b.Destination_PatientId, a.PatientId, a.FamilyMemberId,
      ROW_NUMBER() OVER (ORDER BY a.PatientId ASC, a.FamilyMemberId ASC) SequenceNum
FROM Staging.dbo.FamilyMembers a INNER JOIN Staging.dbo.Patients b ON a.PatientId = b.PatientId
) src ON a.PatientId = src.PatientId AND a.FamilyMemberId = src.FamilyMemberId
-- NOTE: SequenceNum in Staging will be joined to SequenceNum in Production:
-- NOTE: In the (ORDER BY FamilyMemberId ASC) below, make sure to use the table's primary key.
SELECT PatientId, FamilyMemberId, ROW_NUMBER() OVER (ORDER BY FamilyMemberId ASC) SequenceNum
FROM Production.dbo.FamilyMembers
WHERE EnteredBy = (SELECT vvalue FROM #vars WHERE vname = 'EnteredBy')
) dest ON src.SequenceNum = dest.SequenceNum AND src.Destination_PatientId = dest.PatientId
-------FAMILY MEMBERS DX---------------------------------------------------------------
-- Add new FamilyMembersDx records
INSERT INTO Production.dbo.FamilyMemberDiagnosis
 (FamilyMemberId, FamMemDiagnosis, FamMemDiagnosisAge, FamMemNotes, 
  EnteredBy, EnteredTime, UpdatedBy, UpdatedTime)
SELECT b.Dest_FamilyMemberId, FamMemDiagnosis, FamMemDiagnosisAge, FamMemNotes, 
      a.EnteredBy, a.EnteredTime, a.UpdatedBy, a.UpdatedTime
FROM Staging.dbo.FamilyMemberDiagnosis a INNER JOIN Staging.dbo.FamilyMembers b ON 
                                                   a.FamilyMemberId = b.FamilyMemberId

Step 7: Test the migration

  • Structure of Caisis may change during time spent on migration.
  • Source system (if active during migration) may have changed, requiring update of script.
  • Check to make sure data is in the right place after you run the script.
    • if possible, view data in UI using a test instance of Caisis.

Percent Effort Estimates

Tip: Overestimate your timeline when creating a roadmap for a migration as it is difficult to truly assess the work involved in a migration until you are at the staging database step.

  • Strategy and planning [10% of effort]
  • Map source data to Caisis (spreadsheet) [10% of effort]
  • Staging database [50% effort]:
    • Step 1: Create staging tables in SQL.
    • Step 2: Link to source data.
    • Step 3: Write append queries to insert source data into Caisis staging tables.
  • OPTIONAL: Connect SQL Server to staging database (via Linked Server). [<1% of effort]
  • Write T-SQL script to insert data into Caisis tables and test migration. [20% of effort]
  • Misc. unforeseen issues [10% of effort]: waiting for access to source data, gathering necessary knowledge, etc.
Personal tools