Data Migration

From CaisisWiki

Revision as of 16:43, 4 November 2009 by Nazemis (Talk | contribs)
Jump to: navigation, search
File:Wikipuzzlepiece50px.png Please help expand this article.


Contents

Before You Start

Goal

Import historical data (legacy system) into Caisis tables (SQL Server 2005) using a staging database (MS Access 2003/2007).

  • The staging database is a test environment that mirrors the destination database (in this case, Caisis).
  • The source data can be in any file type, but we either import the data into our staging database or link to it (via linked tables).

Assumptions

  • These migrations are for a 1-time dump of data.
  • The legacy system should be retired after migration.
  • You have access to the source data.
  • Buy-in owners of the data (or head of group being migrated into Caisis).

Necessary skills

  • SQL
  • T-SQL
  • MS Access

Necessary knowledge

  • knowledge of terminology (i.e. disease-specific vocabulary) used in source system.
    • Ensures that the source vocabulary matches what you have been using in Caisis.
  • knowledge of source system model.
  • knowledge of Caisis model.

Why MS Access?

  • Easy to use: Visual interface for building and organizing queries.
  • Distribution of workload
    • analyst with MS Access skills can stage the data while your DBA can maintain their day-to-day tasks and append the data to production Caisis.
    • It's possible that the analyst doesn't have access to your SQL Server database.
  • Maintains referential integrity between primary and foreign keys.
  • Maintains structural integrity such as required fields, data types, and max characters.
  • SQL Server can easily link to MS Access.

Map Source Data to Caisis

  • Field-by-field mapping of source data to specific Caisis table(s) and field(s).
Sample mapping
Source Field CommentsCaisisNotes
IDPrimary key DO NOT IMPORT
Med_rec_no Patients.PtMRN
Surgery_date OperatingRoomDetails.OpDate, Procedures.ProcDate
DOBPatients.PtBirthDate
AgeCALCULATED – DO NOT IMPORT
Patient_sex Patients.PtGender
Patient_last Patients.PtLastName
Patient_first Patients.PtFirstName
Resp_MD Procedures.ProcSurgeon ProcSurgeonType=Attending
Best practice: include owner(s) of data and/or stakeholders in this process.

Tips for Mapping Your Data

  1. How do you know where to put the data in Caisis?
    • you must understand Caisis structure
    • must work with expert of source system
  2. What if the source data point doesn’t fit?
    • make sure it is a true data point and not derived (calculated)
    • one option is to update Caisis data model (requires development skills to update application)
    • alternative option: migrate points that fit, store key to legacy data and re-visit the gap(s) after data model is updated to accommodate.
    • if you want to add a field, let us know – we may want to add it to the core structure.

Staging in MS Access

Goal: create a staging database that mirrors Caisis contraints and structure.

Note: At this point, you need someone with DBA/SQL Server skills.  Gets technical.

Step 1: Create "Caisis shell" in MS Access

Step 2: Connect to source data from within MS Access Staging Database

  • Best practice: link to the data source by creating linked tables rather than importing source data into staging database.
    • Why?: This helps with versioning issues (prevents branching caused by separate copies of the source data). Any changes made to the source data will be automatically available in the staging database.

Step 3: Append source data to staging tables (based on mapping)

  • Best practice: populate Patients table first - even if you do not plan to import data to Patients table in Caisis (i.e. patients already exist in Caisis).

Tips for Staging Your Data

  • Don't forget required fields (i.e. EnteredBy, EnteredTime, etc.).
  • Be careful with Memo data types in MS Access. May truncate in Caisis (i.e. max notes column is 1000).
    • When SQL Server Export Wizard generates Caisis staging tables in your MS Access staging database, it estimates the best option for data type. For instance, notes fields in Caisis (i.e. PathNotes) are varchar(1000) which translates to a Memo field in MS Access. If you stage notes data in this MS Access field, it may exceed the 1000 character limit of the Caisis notes field.
  • DateText fields should be formatted as text. In MS Access, use the format expression: Format([Date], "mm/dd/yyyy")

Connect SQL Server to Staging DB

  • Create linked server between SQL Server and your MS Access staging database.
    • Provider: Microsoft Jet 4.0 OLE DB Provider

Staging DB To SQL Server 2005 (T-SQL Script)

This section provides snippets of T-SQL that can be used to build a migration script for importing data from a staging datababse into Caisis database.

Disclaimer:

*The following code is intended to serve as a sample and should not be considered a production-ready solution. Different systems and migrations have different requirements and your migration script should be written to suit your specific needs.

*These scripts should not be run during regular operating hours. You should schedule your migration at off-peak time.

Step 1: Disable all trigers

EXEC spUtilityToggleTriggers 'DISABLE', '%'

Step 2: Add temporary columns to parent tables

  • These columns will store the primary keys from your staging database.
  • This helps maintain the relationships between parent/child tables that were established in your staging database.
  • This is only necassary for tables that have child tables associated with them (if you will insert data into those child tables).
EXEC spUtilityToggleColumn 'Patients', 'ADD', 'StagingPatientId varchar(50) null'
EXEC spUtilityToggleColumn 'Encounters', 'ADD', 'StagingPK int null'

Step 3: Add 'MatchedPatient' column to Patients table

  • In the next step, we will insert a bit into this column that identifies if a patient already exists in the Patients table.

EXEC spUtilityToggleColumn 'Patients', 'ADD', 'MatchedPatient bit null'

GO

Step 4: Identify existing patients

  • Identify patients that already exist in Caisis -- Matching records
UPDATE	Patients
SET	Patients.MatchedPatient = 1, Patients.SourcePatientId = b.PatientId
FROM	Patients INNER JOIN StagingDB.dbo.Patients b ON Patients.PtMRN = b.PtMRN

Step 5: Patients table

Insert records in Patients table for non-matching (don't already exist) patients

<source lang="tsql"> INSERT INTO Patients (PtMRN, LungPatientId, PtTitle, PtFirstName, PtMiddleName, PtLastName, PtSuffix, PtAlias, PtMaidenName,

            PtGender, PtCompany, PtBusinessCity, PtBusinessState, PtAddress1, PtAddress2, PtCity, PtState, 
            PtPostalCode, PtCountry, PtEmail, PtBirthDateText, PtBirthDate, PtBirthPlace, PtDeathDateText, 
            PtDeathDate, PtDeathType, PtDeathCause, PtContactPerson, PtContactAddress1, PtContactAddress2, 
            PtContactCity, PtContactState, PtContactPostalCode, PtContactPhone, PtSpouse, PtRace, PtEthnicity, 
            PtLanguage, PtContactStatus, PtNotes, EnteredBy, EnteredTime, UpdatedBy, UpdatedTime, LockedBy, LockedTime)

SELECT a.PtMRN, a.PatientId, a.PtTitle, a.PtFirstName, a.PtMiddleName, a.PtLastName, a.PtSuffix, a.PtAlias, a.PtMaidenName,a.PtGender, a.PtCompany, a.PtBusinessCity, a.PtBusinessState, a.PtAddress1, a.PtAddress2, a.PtCity, a.PtState,a.PtPostalCode, a.PtCountry, a.PtEmail, a.PtBirthDateText, a.PtBirthDate, a.PtBirthPlace, a.PtDeathDateText,a.PtDeathDate, a.PtDeathType, a.PtDeathCause, a.PtContactPerson, a.PtContactAddress1, a.PtContactAddress2,a.PtContactCity, a.PtContactState, a.PtContactPostalCode, a.PtContactPhone, a.PtSpouse, a.PtRace, a.PtEthnicity,a.PtLanguage, a.PtContactStatus, a.PtNotes, a.EnteredBy, a.EnteredTime,

       a.UpdatedBy, a.UpdatedTime, a.LockedBy, a.LockedTime 

FROM StagingDB.dbo.Patients a LEFT OUTER JOIN Patients ON a.PtMRN = Patients.PtMRN WHERE Patients.PtMRN IS NULL </source>

Step 6: PatientInstitutions

  • Set the institution Id assigned to the data set you’re migrating
  • Note: if this is a new institution (Caisis data set), you must first add it to the Institutions table. Use the InsitutionId value that is generated in the Institutions table as your @InstitutionId parameter below.

<source lang="tsql"> DECLARE @InstitutionId int SET @InstitutionId = 1

INSERT INTO PatientInstitutions (PatientId, InstitutionId, EnteredTime, UpdatedTime ) SELECT PatientId,@InstitutionId,GETDATE(), GETDATE() FROM Patients WHERE Patients.StagingPatientId IS NOT NULL AND

       PatientId NOT IN (SELECT PatientId 
                          FROM PatientInstitutions 
                          WHERE InstitutionId = @InstitutionId)

</source>

Step 7: Migrate other tables

The example below demonstrates how to join associated parent/child tables using the temporary StagingPK column.

<source lang="tsql"> --ENCOUNTER--------------------

INSERT INTO Encounters (PatientId, LungPK, EncPending, EncDateText, EncDate, EncType, EncChiefComplaint, EncPhysician, EncFellow, EncNurse, EncPresentation, EncStatus, EncBP, EncPulse, EncTemp, EncRespiration, EncHeight, EncWeight, EncBSA, EncBMI, EncKPS, EncECOG_Score, EncZubrod_Score, EncInstitution, EncNotes, EncDataSource, EncQuality, EnteredBy, EnteredTime, UpdatedBy, UpdatedTime, LockedBy, LockedTime ) SELECT b.PatientId, a.EncounterId, a.EncPending, a.EncDateText, a.EncDate, a.EncType, a.EncChiefComplaint, a.EncPhysician, a.EncFellow, a.EncNurse, a.EncPresentation, a.EncStatus, a.EncBP, a.EncPulse, a.EncTemp, a.EncRespiration, a.EncHeight, a.EncWeight, a.EncBSA, a.EncBMI, a.EncKPS, a.EncInstitution, a.EncECOG_Score, a.EncZubrod_Score, a.EncNotes, a.EncDataSource, a.EncQuality,a.EnteredBy, a.EnteredTime, a.UpdatedBy, a.UpdatedTime, a.LockedBy, a.LockedTime FROM StagingDB.dbo.Encounters a INNER JOIN Patients b ON a.PatientId = b.StagingPatientId

--EncExams---------------------

INSERT INTO EncExams EncounterID, ExamSystem, Exam, ExamResult, ExamAttendingPerformed, ExamNotes, EnteredBy, EnteredTime, UpdatedBy, UpdatedTime) SELECT b.EncounterID, a.ExamSystem, a.Exam, a.ExamResult, a.ExamAttendingPerformed, a.ExamNotes, a.EnteredBy, a.EnteredTime, a.UpdatedBy, a.UpdatedTime FROM StagingDB.dbo.EncExams a INNER JOIN Encounters b ON a.EncounterId = b.StagingPK </source>

Step 8: Drop temporary columns

<source lang="tsql"> EXEC spUtilityToggleColumn 'Patients', 'DROP', 'StagingPatientId' EXEC spUtilityToggleColumn 'Encounters', 'DROP', 'StagingPK' </source>

Step 9: Enable triggers

<source lang="tsql"> EXEC spUtilityToggleTriggers 'ENABLE','%' </source>

Step 10: Test the migration

  • Structure of Caisis may change during time spent on migration.
  • Source system (if active during migration) may have changed, requiring update of script.
  • Check to make sure data is in the right place after you run the script.
    • if possible, view data in UI using a test instance of Caisis.

Percent Effort Estimates

Tip: Overestimate your timeline when creating a roadmap for a migration as it is difficult to truly assess the work involved in a migration until you are at the staging database step.

  • Strategy and planning [10% of effort]
  • Map source data to Caisis (spreadsheet) [10% of effort]
  • Staging database [50% effort]:
    • Step 1: Use SQL Server Import and Export Wizard to create staging database in MS Access.
    • Step 2: Link to source data from within staging database.
    • Step 3: Write append queries to insert source data into Caisis staging tables.
  • Connect SQL Server to staging database (via Linked Server). [<1% of effort]
  • Write T-SQL script to insert data into Caisis tables and test migration. [20% of effort]
  • Misc. unforeseen issues [10% of effort]: waiting for access to source data, gathering necessary knowledge, etc.
Personal tools