Master Plan

Overview

Master plan for development of NCI-MEME.

https://wcinformatics.atlassian.net/secure/RapidBoard.jspa?rapidView=1&projectKey=NE&view=planning.nodetail&selectedIssue=NE-8&epics=visible

Questions

Timeline

  • Production deploy
    • 6/12/2017: currently editing about 900 RADLEX concepts on MEME4
    • Early July: will likely add the May and June NCI Thesaurus releases together into MEME4 (size and complexity unknown; this also depends on the new OWL2 inversion process)
    • Transfer to the new system for editing will happen in the second half of July or later
    • Will likely try to add the NCI Thesaurus at that point, and add MTH shortly after
    • Documented on NCI wiki on page "Loading MEME from NCI Meta"
  • Priorities
    • Documentation
      • IN PROGRESS: NE-325: Screen shots (JFW)
      • TODO: Text on wiki (BAC)
      • NE-308: 30-sec to 1 min training videos (DSS, RAW)
    • Mini run-throughs (stock dev build + running stock processes) - determine "correctness"
      • Pre-production, Release (DONE)
        • CORRECTNESS: compare output to input (REQUIRES work on MRSAT, MRREL, MRSAB, and MRHIER(MSH))
      • Pre-production, Prod-Mid Cleanup, Feedback (requires a "release directory" matching pre-production with MR files in it; these can be exactly the same as the input MR files) (DONE)
      • NCI Insertion, SNOMEDCT_US Insertion, MTH Insertion
        • CORRECTNESS: verify assumptions and volume of data related to each step of the insertion.
      • NCI Insertion, Pre-production, Release
      • NCI Insertion, Pre-production, Prod-Mid Cleanup, Feedback
    • Scale testing
      • DONE: ResetNciMetaDataDatabase - using 201610 data (input.dir = <dir with https://wci1.s3.amazonaws.com/NCI/NCIM_201610.zip>)
      • DONE: NCI Insertion
      • DONE: SNOMEDCT_US Insertion - IN PROGRESS
      • MTH Insertion
      • DONE: Pre-Production, Release
      • Prod-Mid Cleanup, Feedback
    • Feedback/Fixes - see Jira
      • NCI-337: Support the ability to edit the computeHierarchy flag for root terminologies in the UI
      • There is still some concern about management of retired project concepts
        • prod-mid cleanup should remove any old version atoms (as it does)
        • NE-322: It should then remove any "empty" concepts (no atoms) that were not assigned a CUI (e.g. id = terminologyId).
        • NE-323: Reload CUI history should ensure that there is not more than one project concept in the database with the same CUI assignment (as terminologyId). 
      • Ongoing work.
      • NCI-317: Turn the GenerateNciMetaData loads of bins into files like workflowQa.txt for all bin types.
      • NCI-336: Improve bin query editing: have the "Test" button also send back a count and some examples (or add a separate button for that).
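A minimal sketch of the NE-322 and NE-323 checks over project concepts, assuming a hypothetical in-memory `ProjectConcept` record (id, terminologyId, atom count) rather than the real JPA model. Following NE-322, a concept whose terminologyId still equals its own id is treated as never having been assigned a CUI.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the NE-322 / NE-323 checks; ProjectConcept is hypothetical. */
final class RetiredConceptChecks {

  /** Hypothetical stand-in for a project concept: id, terminologyId (CUI), atom count. */
  record ProjectConcept(long id, String terminologyId, int atomCount) {
    /** Assumption from NE-322: an unassigned concept uses its own id as terminologyId. */
    boolean hasCui() {
      return !terminologyId.equals(Long.toString(id));
    }
  }

  /** NE-322: concepts with no atoms that were never assigned a CUI (candidates for removal). */
  static List<ProjectConcept> emptyUnassigned(List<ProjectConcept> concepts) {
    List<ProjectConcept> result = new ArrayList<>();
    for (ProjectConcept c : concepts) {
      if (c.atomCount() == 0 && !c.hasCui()) {
        result.add(c);
      }
    }
    return result;
  }

  /** NE-323: CUIs claimed by more than one project concept, mapped to those concept ids. */
  static Map<String, List<Long>> duplicateCuis(List<ProjectConcept> concepts) {
    Map<String, List<Long>> byCui = new HashMap<>();
    for (ProjectConcept c : concepts) {
      if (c.hasCui()) {
        byCui.computeIfAbsent(c.terminologyId(), k -> new ArrayList<>()).add(c.id());
      }
    }
    // Keep only CUIs that violate the "at most one project concept" rule
    byCui.values().removeIf(ids -> ids.size() < 2);
    return byCui;
  }
}
```

In the real system these would presumably be database queries run by prod-mid cleanup and by Reload CUI history; the in-memory version only shows the selection logic.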

Diagrams

Processes/Algorithms

Insertion Process (algorithms in order)

  • DONE: NCI Insertion
  • DONE: SNOMED Insertion
  • DONE: UMLS Insertion
  • DONE: RemapComponentInfoRelationships (maint/ins)

Maintenance Algorithms

  • DONE: MatrixInitializerAlgorithm
  • DONE: StampingAlgorithm
  • DONE: LexicalClassAssignmentAlgorithm
  • DONE: ComputePreferredNamesAlgorithm (code, concept, descriptor)
    • Computes and sets preferred names where they don't match the stored value
    • Also computes and sets "publishable" where it doesn't match
  • DONE: ReindexAlgorithm
    • indexed objects parameter
  • DONE: ReplaceAttributesAlgorithm extends AttributeLoaderAlgorithm
    • attributes.src (in the inputPath)
    • Any (source, attribute_name) combination in the file is removed first, matching on terminology, version, and name
    • attributes.src gets loaded as normal.
    • ALTERNATE: implement as AttributeLoaderAlgorithm with a "replace" parameter (boolean)
    • ALSO: make a maintenance process that runs just this with replace turned on and a name of "Replace Attributes"
  • IN PROGRESS: ReplaceRelationshipsAlgorithm extends RelationshipsLoaderAlgorithm
    • relationships.src, contexts.src (in the inputPath)
    • Same as above
    • Match on terminology, version, relationshipType, additionalRelationshipType (and inverses)
    • ALSO: make a maintenance process that runs just this with replace turned on and a name of "Replace Relationships"
  • IN PROGRESS: ReplaceContextsAlgorithm extends ContextLoaderAlgorithm
    • Same as above
    • Match on terminology, version
    • ALSO: make a maintenance process that runs just this with replace turned on and a name of "Replace Contexts"
  • IN PROGRESS: RecomputeContextsAlgorithm - calls remove tree positions, then compute tree positions
  • DONE: use rootTerminology.hierarchyComputable to decide whether to load or compute (update Context Loader)
    • Root terminology editing in the interface needs to support changing this.
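The replace behavior described for ReplaceAttributesAlgorithm can be sketched as "remove every existing row whose match key appears in the file, then load the file as normal". The `AttributeKey` and `Attribute` records below are hypothetical stand-ins for the real model; the relationship and context variants would differ only in which fields make up the key.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of "replace" loading; the record types are hypothetical. */
final class ReplaceAttributesSketch {

  /** Match key: terminology, version, attribute name. */
  record AttributeKey(String terminology, String version, String name) {}

  /** Hypothetical attribute: its match key plus a value. */
  record Attribute(AttributeKey key, String value) {}

  /**
   * Removes every existing attribute whose key appears in the incoming file,
   * then appends the incoming attributes as a normal load would.
   */
  static List<Attribute> replace(List<Attribute> existing, List<Attribute> incoming) {
    // Collect the (terminology, version, name) combinations present in attributes.src
    Set<AttributeKey> keys = new LinkedHashSet<>();
    for (Attribute a : incoming) {
      keys.add(a.key());
    }
    List<Attribute> result = new ArrayList<>();
    for (Attribute a : existing) {
      if (!keys.contains(a.key())) {
        result.add(a); // untouched: this key is not mentioned in the file
      }
    }
    result.addAll(incoming); // load the file contents as normal
    return result;
  }
}
```

With the ALTERNATE design (a boolean "replace" parameter on AttributeLoaderAlgorithm), the removal loop would simply be skipped when the flag is false.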

Pre-Production Processes

  • DONE: CreateNewReleaseAlgorithm → can "reset" this if you don't like validation problems
    • Make release directories
    • Create "release info" for the current release
    • For each terminology, 
      • If a current terminology has a null firstRelease, set it to this release (thisRelease = releaseInfo.getVersion(); the key for the map is project.getTerminology())
      • If a non-current terminology has a null lastRelease, set it to this release
  • DONE: ComputePreferredNamesAlgorithm
    • Compute project concept preferred names
  • DONE: CreateNciPdqMapAlgorithm (concepts/atoms/etc need to exist here).
    • Compute map set for NCI/PDQ maps, including all infrastructure.
  • DONE: AssignReleaseIdentifiersAlgorithm
    • Assign CUIs
    • Assign STY ATUIs
    • Assign RUIs for concept level srels
  • DONE: ComputeContextTypeAlgorithm (also assigns SIB RUIs)
    • Compute includeSiblings
    • Compute polyhierarchy flag
    • Compute SIB RUIs (for release)
  • DONE: PrepareMetamorphoSysAlgorithm → may just send an email to Joanne to do this manually
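A sketch of the per-terminology release bookkeeping in CreateNewReleaseAlgorithm, using a hypothetical mutable `Terminology` holder. It reads the lastRelease rule as applying to non-current terminologies whose lastRelease is still null; that is an interpretation of the rule above, not a confirmed detail.

```java
import java.util.List;

/** Sketch of firstRelease/lastRelease stamping; Terminology here is hypothetical. */
final class ReleaseStampingSketch {

  /** Hypothetical mutable stand-in for a terminology version. */
  static final class Terminology {
    final String name;
    final boolean current;
    String firstRelease;
    String lastRelease;

    Terminology(String name, boolean current, String firstRelease, String lastRelease) {
      this.name = name;
      this.current = current;
      this.firstRelease = firstRelease;
      this.lastRelease = lastRelease;
    }
  }

  /** thisRelease corresponds to releaseInfo.getVersion() in the plan. */
  static void stamp(List<Terminology> terminologies, String thisRelease) {
    for (Terminology t : terminologies) {
      if (t.current && t.firstRelease == null) {
        // Current version appearing in a release for the first time
        t.firstRelease = thisRelease;
      } else if (!t.current && t.lastRelease == null) {
        // Assumed reading: a non-current version not yet closed off
        t.lastRelease = thisRelease;
      }
    }
  }
}
```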


Release Processes

  • DONE: WriteRrfMetadataFilesAlgorithm
    • Also write release.dat
  • DONE: WriteRrfContentFilesAlgorithm
    • Write AMBIG and CHANGE files.
  • DONE: WriteRrfHistoryFilesAlgorithm (including writing NCI files)
    • IN PROGRESS: refactoring
  • DONE: WriteRrfIndexesFilesAlgorithm
  • DONE: ValidateReleaseAlgorithm (referential CUI checking, etc.)
  • DONE: RunMetamorphoSysAlgorithm (this makes METASUBSET which is used for packaging)
    • This also uses make_config.csh (now in project) to build the final prop files
  • DONE: PackageReleaseAlgorithm
    • Create the final .zip file (see the wiki instructions)
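One example of referential CUI checking in ValidateReleaseAlgorithm: every CUI1 and CUI2 in MRREL.RRF should occur in MRCONSO.RRF. This sketch works on lines already read into memory and covers only MRREL; the field positions (CUI1 = 0, CUI2 = 4, RUI = 8) follow the standard RRF layout, while the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of one referential check: MRREL CUIs must exist in MRCONSO. */
final class ReferentialCuiCheck {

  /** RRF lines end with '|', so split with limit -1 to keep empty fields. */
  static String[] fields(String line) {
    return line.split("\\|", -1);
  }

  static List<String> danglingRelationshipCuis(List<String> mrconsoLines, List<String> mrrelLines) {
    Set<String> cuis = new HashSet<>();
    for (String line : mrconsoLines) {
      cuis.add(fields(line)[0]); // MRCONSO field 0 = CUI
    }
    List<String> errors = new ArrayList<>();
    for (String line : mrrelLines) {
      String[] f = fields(line);
      if (f.length < 9) {
        errors.add("Malformed MRREL line: " + line);
        continue;
      }
      // MRREL field 0 = CUI1, field 4 = CUI2, field 8 = RUI
      for (int i : new int[] {0, 4}) {
        if (!cuis.contains(f[i])) {
          errors.add(f[8] + ": unknown CUI " + f[i]);
        }
      }
    }
    return errors;
  }
}
```

For full-size files the same idea would stream the files rather than hold every line in memory.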

Release Feedback Algorithm

  • DONE: ComputePreferredNamesAlgorithm
  • DONE: ReloadConceptHistoryAlgorithm
    • Remove dead concepts and concept history
    • Reload from MRCUI (as RrfLoaderAlgorithm does)
  • DONE: FeedbackReleaseAlgorithm (needs the "mr directory") - e.g. getInputPath() + "/mr/" + getVersion()
    • For each atom
      • Load atom→cui map from MRCONSO, remove or replace alternate terminology id based on this
      • Replace the lastPublishedRank with the correct value based on MRCONSO (TS/STT computation; see RrfLoaderAlgorithm)
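The atom→CUI map used by FeedbackReleaseAlgorithm can be built directly from MRCONSO.RRF, where CUI is field 0 and AUI is field 7 (TS and STT, the inputs to the rank computation, are fields 2 and 4 of the same line). A minimal sketch with a hypothetical class name; the rank computation itself is left to the RrfLoaderAlgorithm logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of building the AUI -> CUI map from MRCONSO.RRF lines. */
final class MrconsoAuiMap {

  // MRCONSO.RRF layout: CUI|LAT|TS|LUI|STT|SUI|ISPREF|AUI|SAUI|SCUI|SDUI|SAB|TTY|CODE|STR|SRL|SUPPRESS|CVF|
  static final int CUI = 0;
  static final int AUI = 7;

  static Map<String, String> auiToCui(List<String> mrconsoLines) {
    Map<String, String> map = new HashMap<>();
    for (String line : mrconsoLines) {
      String[] f = line.split("\\|", -1);
      if (f.length <= AUI) {
        throw new IllegalArgumentException("Malformed MRCONSO line: " + line);
      }
      map.put(f[AUI], f[CUI]); // AUI is unique within MRCONSO
    }
    return map;
  }
}
```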

Prod-MID Cleanup Process

  • DONE: UpdatePublishedAlgorithm
    • Mark atoms, concepts, descriptors, codes, XXRelationships, definitions, attributes, semantic type components, mappings, map sets, subsets, AtomSubsetMembers, ConceptSubsetMembers, etc. as published if publishable and !published (do this before any insertion).
      • → no molecular actions
  • DONE: ProdMidCleanupAlgorithm
    • RemoveUnpublishableAndUnpublishedConcepts
    • For each atom, code, concept, descriptor that has a terminology that is !isCurrent()
      • Remove the attributes, definitions, semanticTypeComponents
      • Remove the relationships 
      • Remove the entity (consider a "cascade" delete function in ContentServiceJpa, e.g. removeConcept(Long id, boolean cascade))
    • For each component info relationship that is not current
      • Remove the entity
    • For each map set that has a terminology that is !isCurrent()
      • Remove the attributes
      • Remove the mappings (and the mapping attributes)
      • Remove the entity (consider a "cascade" delete function)
    • For each subset that has a terminology that is !isCurrent()
      • Remove the attributes
      • Remove the members (and member attributes)
      • Remove the entity (consider a "cascade" delete function)
    • Consider truncating action tables, log entries, etc.
  • DONE: ComputePreferredNamesAlgorithm
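The "cascade" delete suggested for ContentServiceJpa can be sketched in memory as: remove dependent components first, then the owning entity. `Entity`, `Component`, and `removeEntity(long, boolean)` are hypothetical; a real version would also have to remove relationships that point into the entity from other components, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of ProdMidCleanup's cascade removal; all types are hypothetical. */
final class CascadeRemovalSketch {

  /** An entity (atom, code, concept, descriptor) and whether its terminology is current. */
  record Entity(long id, boolean terminologyCurrent) {}

  /** A dependent component (attribute, definition, STY, relationship) owned by an entity. */
  record Component(long id, long ownerId) {}

  final Map<Long, Entity> entities = new HashMap<>();
  final List<Component> components = new ArrayList<>();

  /** Hypothetical analogue of removeConcept(Long id, boolean cascade). */
  void removeEntity(long id, boolean cascade) {
    boolean hasDependents = components.stream().anyMatch(c -> c.ownerId() == id);
    if (hasDependents && !cascade) {
      throw new IllegalStateException("Entity " + id + " still has dependent components");
    }
    components.removeIf(c -> c.ownerId() == id); // dependents first
    entities.remove(id); // then the entity itself
  }

  /** Removes every entity whose terminology is not current; returns the count removed. */
  int removeNonCurrent() {
    List<Long> toRemove = new ArrayList<>();
    for (Entity e : entities.values()) {
      if (!e.terminologyCurrent()) {
        toRemove.add(e.id());
      }
    }
    // Collect first, then remove, to avoid modifying the map while iterating it
    for (long id : toRemove) {
      removeEntity(id, true);
    }
    return toRemove.size();
  }
}
```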

Sample Data

  • DONE: SAMPLE_NCI - contains a "sample meta" generation of the NCI META 201604.
  • DONE: Include id assignment files
  • DONE: Include unpublished data files
  • Sample SRC files
    • DONE: NCIt insertion - mini
      • Load RRF from the latest release
      • Process next NCIt
    • DONE: SNOMED insertion - mini
    • DONE: UMLS Insertion - Mini

Back End

  • DONE: Application Config/ Project Management/ WorkflowConfig management
    • AccessRestriction - By Project (based on user role??)
      • READ_ONLY - null
      • ADMIN (maintenance/insertion is OK, but not authoring) - checked by meta editing service and workflow service (performWorkflowAction).
      • AUTHORING (all changes are allowed) - (double check this when actions are performed - e.g. service?)
      • Update project should send a websocket event (so that ui can prevent editing)
    • Processes need to check access restriction before starting - checked by process service
    • Process has isRunning and ProcessService will have "isProcessRunning"
      • This represents the global lock.
    • ProcessConfig/Execution, AlgorithmConfig/Execution, process execution should point back to its config, but also copy it (sub/superclass)
  • DONE: Content and Metadata Model Objects
    • DONE: Hierarchies like MSH - not based on transitive closure - need to update the RRF loader to support inserting tree positions rather than computing them (but needs configuration)
      • AUI -> CODE -> CODE -> SDUI -> SDUI*

    • DONE: AtomTreePosition/Jpa/UnitTest
    • DONE: SRC concepts - affects RrfLoaderAlgorithm
      • Load code also as a corresponding "organizing class type" (e.g. for RHT) - ?
      • Properly fix the relationships as well.
    • DONE: Rels where type of id1 is different than type of id 2
      • Could just use atom-atom rels with a relationship attribute of the real sg_type1/2
      • ComponentInfoRelationship
    • DONE: Rels represented in both directions (and UIs) 
      • Including reconciling misuse of "relationship group" in both directions.
      • Always use "blank" instead of 0.
      • Handle cases of essentially duplicate RUIs...
        • either disambiguate (e.g. with a DA flag)
        • or clean up the sample data
        • however, the loader should identify and/or correct this data condition. (e.g. identify where RUIs and inverses are ambiguous and just re-assign RUIs completely from scratch)
        • this will matter much more when handling full data.
    • DONE: Atom -> modeled as lowerNameHash and uses MD5
    • DONE: User -> "team" (for modeling groups), then project can be "isTeamBased" like mapping tool.

  • Editing
    • DONE: REST API : MetaEditingServiceRest (Client/Impl)
      • add/removeSemanticType(Long projectId, Long conceptId, SemanticTypeComponent, authToken)
      • merge, move, split
      • add/removeRelationship (concept level)
      • add/remove/updateAtom
      • Approve concept
    • DONE: atomic/molecular actions
    • DONE: ID assignment
      • Perform UI assignment in most cases during actions
      • Do not perform terminologyId assignment for SemanticTypeComponent
      • Do not perform terminologyId assignment for ConceptRelationship (e.g. for UMLS relationships)
      • Do not perform terminologyId assignment for CUIs (e.g. UMLS concepts)
    • DONE: Support uploading an editor manual (track as project attachments - like "ReleaseArtifact" -> rename as Attachment)
  • DONE: Insertion - like a "loader"
    • Recipe
      • Steps are "algorithms" with configuration.  Just add a step, remove a step, reorder a step, or reconfigure a step.
      • all written to be agnostic about SAB.
    • Complete UI handling
    • Attribute -> Definition, Subset, Mapping, SubsetMember (and requisite attributes and ID assignments)
    • Compute delta
    • source data loader?, link to local file system - uses ContentServiceJpa
    • insertion recipes (tracked over time). 
    • src_atom_id handling (need to track in DB because of cross-source relationships)
    • loading data (unit or batch?)
    • merge engine
    • matching/demotions
    • atom ordering
    • Ensure that update releasability is done before merging so that "publishable" is a proxy for "version"
    • Maintenance Tools for insertion
      • Mark deleted CUIs as deleted (e.g. instead of bequeathing them)
      • bequeathing concepts based on matching criteria
      • Performing merges based on matching criteria (e.g. query-based merges)
  • Maintenance Tools
    • DONE: Precedence list management
    • Cluster type STY management (Brian has no recollection what this is)
    • DONE: Source information.
    • DONE: insert attributes, insert stys, insert relationships, recompute tree positions, ... (see $MEME_HOME/bin) (already done in Insertion process algorithms)
  • Workflow
    • DONE: start with workflow stuff from Refset
    • DONE: Model objects
      • TrackingRecord (origConceptIds, componentIds, clusterId, clusterType, etc).
      • Epoch
      • ME, QA, AH bins - WorkflowConfig, WorkflowBin
      • Worklist, Checklist
      • WorkflowBinStatistics
      • WorklistStatistics
      • WorkflowBinDefinition
        • Track "isRequired" as a flag indicating required for release.
    • DONE: WorkflowActionHandler
    • DONE: Services
      • clear/regenerateBin(s)
      • createChecklist/Worklist
      • getWorkflowB
    • DONE: Checklist
      • Creating checklist from a workflow bin
      • Creating checklist from a query (SQL/HQL/Lucene)
      • Creating checklist from a list of conceptId
      • Creating checklist from a list of clusterId, conceptId
      • Creating checklist from a file of (conceptId)
      • Creating checklist from a file of (clusterId, conceptId)
    • DONE: Worklist
      • Creating a worklist -> should probably specify the team
    • DONE: Stamping (batch "concept approval" action)
    • DONE: Semantic type categories - chem/nonchem
      • Model as part of Project -
    • DONE: When lists are returned (e.g. on "finish"), track the edit/review time.  
      • Worklist/Jpa .get/setAuthor/ReviewerTime
    • DONE: Track "team" of a worklist
      • Worklist/Jpa get/setTeam
      • Project/Jpa get/isTeamBased
    • DONE: Import/Export
      • export worklist (clusterId\tconceptId\tname)
      • export checklist
      • create checklist from file
      • export workflow config
      • import workflow config (on project editing)
  • QA
    • DONE: Matrix init (recompute concept status based on workflow status of embedded objects).
    • DONE: Validation for objects (concept, atom, etc.) as well as validation for actions (e.g. move, merge, split)
    • IN PROGRESS (JFW): MID Validation - query-based validation that feeds into workflow system (e.g. "create checklist")
    • DONE: EMS QA Bins
    • DONE: Sty-cooc

  • Reporting
    • DONE: Concept reports - unit and batch modes
    • DONE: Query-based reports (role-based) (all reports from MEME4 are implemented in new system)(reporting code has been brought over)
    • n/a: Tools for researching issues in inversions (no one used these)
    • DONE: Canned reports
      • Daily editing report

        EMS v3 Daily Editing Report for Dec 01, 2016
        Database: memestg
        Time now: Fri Dec  2 06:02:22 EST 2016
        
        Concepts Approved this day: 105
                          Distinct: 105
        Number of actions this day: 193
        
        Shown below are editing statistics for each authority.  The E-{initials}
        authority shows approvals done in the interface while the S-{initials}
        authority counts batch or stamping approvals.  The percentages show
        the proportion of each, by editor.
        
        Authority  Actions  Concepts Approved  Rels Inserted  STYs Inserted  Splits  Merges
        ---------  -------  -----------------  -------------  -------------  ------  ------
           E-LAR       100        38 (100.0%)         19             1           1      15
        
           E-LLW        93        67 (100.0%)          4             4           2       0
        
        --------------------------------------------
        For more detail, follow this link to the EMS
  • Production
    • DONE: Bequeathal relationship strategies.
    • DONE: LUI reassignment!
    • DONE: CUI assignment
      • last assigned cui
      • last released cui
      • ConceptIdentity table to track max id?
    • DONE: Semantic type Component ATUI assignment
    • DONE: ConceptRelationship RUI assignment
    • DONE: CUI history
      • Model by leaving old concepts around and linking them to "live" concepts (for bequeathal)
      • Need to update concept history with new bequeathal rels, follow recursion, etc.
    • DONE: AUI history
    • DONE: When creating MRHIER, add in the SRC/RHT layer as needed
    • DONE: Incremental release (or export of a single source)
      • what about CUI assignment?
    • DONE: Begin editing cycle?
    • DONE: Ability to export RRF for just a single source? (What about CUIs? We can force temp CUIs.)
  • Cross-cutting
    • DONE: Websocket
    • DONE: Disable editing (e.g. don't allow editors to make changes) - how to store this?
      • for admin processes, etc?
      • ProjectJpa flag?
    • DONE: Query engine - HQL, SQL, LUCENE -> produces clusterId, conceptId
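A sketch of computing an atom's lowerNameHash with MD5, as listed under Content and Metadata Model Objects. The lowercasing locale (`Locale.ROOT`), the UTF-8 encoding, and the lowercase hex output are assumptions here, not confirmed details of the Atom model.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Sketch of an MD5-based lowerNameHash; encoding choices are assumptions. */
final class LowerNameHash {

  /** MD5 of the lowercased atom name, as a 32-character lowercase hex string. */
  static String of(String name) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(name.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b)); // byte formatted as unsigned two-digit hex
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support MD5, so this should not happen
      throw new IllegalStateException(e);
    }
  }
}
```

One likely benefit of hashing is that a fixed-length value is easier to index and compare than arbitrarily long atom names.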

User Interface

  • Tables
    • TODO: http://ng-table.com/#/columns/demo-reordering (we already support searching, filtering, and paging; is there really a need for this? It is not simple)
    • DONE: Sortable, Filterable, Column reorderable (and choosable)
    • Control that opens a dialog and lets the user configure (use drag/drop to support this)
      • which columns are shown
      • which columns are included in filtering
      • which columns are sortable
      • column order
    • Consider saving column widths as well (e.g. col-md-??)
  • DONE: Separate interfaces (like the "simple" mechanism) for different data types
  • n/a: Editable concept report as well.  (There is a switch in the config to turn this on if they ask for it - may still need some work)
  • DONE: Websocket event handling (upon object changes)
  • DONE: Administration
    • User admin (create, remove, update, assign to teams, provide roles on projects)
  • DONE: Workflow (for ADMIN/REVIEWER)
    • Clear/regenerate bins
    • Search and create worklists
    • Search and create checklists
  • DONE: Workflow (for AUTHOR)
    • Available/assigned work...

Problems Needing a Solution

  • Make sure "historical" concepts representing MRCUI data all have real CUIs  (need to clarify later - happens in Feedback process)
  • DONE: Make sure empty normalized string works for LUI assignment (make an explicit integration test for this)
  • DONE: SFO/LFO, it's a "rel", AQ/QB relationships
    • do SFO/LFO just live in the database and not get released??
    • Pertains to MGV_E, MGV_F
    • These live in the DB as "SY" relationships with RELA value either equal to "expanded_form_of" or starting with "mth_" and ending with "_form_of".
    • Don't support SFO/LFO as a relationship type.
    • RelationshipLoaderAlgorithm, ReportServiceJpa
  • DONE: Handling of "XR" rels - the UMLS loader (or generate sample data) should make sure there is an XR
    • for that matter we might as well just put it into MRDOC anyway. 
  • DONE: Insertion handling of "current" and "previous" source versions. Part of the issue is the desire to avoid updating massive numbers of rows (with update releasability). We may leave this for later as an "optimization" and just do the obvious thing for now. If it turns out not to be a performance issue, then we are good.
  • DONE: Inverse relationships related to things with relationship groups (e.g. relationshipIdentity.txt)
    • Ideally each relationship has exactly one inverse.
    • identical relationships participating in multiple groups causes a problem for this.
    • inverse relationships always have a blank group - perhaps we should include the group as a negative number on the inverse so that we can record it, but then negative groups don't get written out to the release.
    • REQUIREMENT: support the same relationship with different groups (currently uses different RUIs)
    • REQUIREMENT: correctly link and identify a single inverse relationship that can be properly maintained
    • REQUIREMENT: inverse relationships of things with groups should not have a group setting (it is nonsensical)
  • DONE: Monster QA - QaDatabase (make it configurable by project).
  • DONE: Metadata publishable (as well as TransitiveRelationship and TreePosition) isn't being managed by release process
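The SFO/LFO decision above (stored as "SY" relationships whose RELA is "expanded_form_of", or starts with "mth_" and ends with "_form_of") reduces to a simple predicate. A sketch, with a hypothetical class name, of the kind of check the relationship loader and report code could share:

```java
/** Sketch of recognizing SFO/LFO-style relationships; the class name is hypothetical. */
final class FormOfRelationships {

  /**
   * True for the SY relationships that stand in for SFO/LFO: RELA equal to
   * "expanded_form_of", or starting with "mth_" and ending with "_form_of".
   */
  static boolean isFormOf(String relationshipType, String rela) {
    if (!"SY".equals(relationshipType) || rela == null) {
      return false;
    }
    return rela.equals("expanded_form_of")
        || (rela.startsWith("mth_") && rela.endsWith("_form_of"));
  }
}
```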

Out Of Scope (mostly things for NLM)

  • QA that will get left out
    • Counts, Comparisons, Adjustments, Sampling
    • STY QA
    • Research Unmapped Identifiers
  • TOP level relationships PAR/CHD involving SRC atoms
    • Need approval
  • MRAUI history tracking for past releases (just produce the current release atom movements)
  • Hierarchies with different kind of objects used (e.g. CODE-SDUI)
    • Resolution of top-level MSH hierarchy using different SG_TYPE than rest of tree. → must be the same
    • top-level SRC atoms are excepted
  • Mappings loaded from .src that do not use SRC_ATOM_ID are not supported
  • "ST", "DA", "MR" attributes (not present in NCI-META)
  • Support for tobereleased = Y, y, n, N
    • There is a "publishable" flag, and it is either true or false; nothing more. Other mechanisms would be needed to make this distinction.
  • Custom Safe replacement steps for different terminologies/termTypes within same set of .src files. 
  • Special support for non-ENG content
    • Hiding/showing in UI
    • Moving/Merge/Split moving non-ENG atoms along with "translation_of" ENG ones.
  • Report stuff:
    • LEXICAL_TAG, LEGACY_CODE, EZ/RN:EC NUMBER
  • Definition or LT editing
  • Some integrity checks - MGV_B2, MGV_D, ...
  • Relationship type "LK"
  • P level relationships (now there are just atom-atom relationships and a workflowStatus of "DEMOTION").
  • Embryo concept status
  • Team assignment for worklists.
  • Content Views
  • AH bins
  • Cluster types beyond "chem" and "nonchem"
  • QA "sampling" and UI
  • Map metadata editor
  • Map set viewer
  • Source information management system
  • Handling of AQ/QB relationships in a special way for release (these are just regular relationships)

Inversion "TODO" List

  • TODO: Where scripts pull from database → need new scripts for this.
  • contexts.src only needs to exist if transitive closure can be used. Otherwise, always use the same value for "sg id".
    • QUESTION: does NCI-META have inconsistent SG type in hierarchies? A: likely no.
    • QUESTION: can we use HCD as a proxy for non-computable hierarchies? A: not sure → run a query where atom_id_2 has different sets of atom_id_1 for different PTR/RELA. - LOINC, MSH
  • attributes.src: no need for
    • CONTEXT attributes in attributes.src
  • classes_atoms.src: leave order id blank (don't compute it anymore)
  • MRDOC.RRF: RELA inverses - both ends should ALWAYS be in MRDOC.RRF
  • termgroups.src: norm_exclude and exclude flags → always use the same value
  • sources.src - the acquisition contact has a different form than the content contact.
  • attributes/relationships: Does NCI ever connect to an older-version SRC_ATOM_ID? NO → just keep most recent
  • Stop computing "hashcode" field for attributes.src
  • sty_term_ids → has a third field, ok to always set to 0
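The query proposed for the HCD question (find atom_id_2 values whose set of atom_id_1 differs across PTR/RELA) can be sketched over rows already fetched from the database. The `Row` record is a hypothetical representation of that result set, and the roles of atom_id_1 and atom_id_2 are taken as stated in the question.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the "different sets of atom_id_1 per PTR/RELA" check. */
final class HierarchyConsistencyCheck {

  /** One row: atomId1 related to atomId2 under a given PTR/RELA grouping (hypothetical). */
  record Row(long atomId1, long atomId2, String ptrOrRela) {}

  /** Returns the atom_id_2 values whose atom_id_1 sets differ across PTR/RELA groups. */
  static Set<Long> inconsistent(List<Row> rows) {
    // atomId2 -> (PTR/RELA -> set of atomId1)
    Map<Long, Map<String, Set<Long>>> byAtom2 = new HashMap<>();
    for (Row r : rows) {
      byAtom2.computeIfAbsent(r.atomId2(), k -> new HashMap<>())
          .computeIfAbsent(r.ptrOrRela(), k -> new HashSet<>())
          .add(r.atomId1());
    }
    Set<Long> result = new TreeSet<>();
    for (Map.Entry<Long, Map<String, Set<Long>>> e : byAtom2.entrySet()) {
      // More than one distinct atom_id_1 set means the groupings disagree
      if (new HashSet<>(e.getValue().values()).size() > 1) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

An empty result for a terminology would suggest HCD could serve as a proxy there; a non-empty one lists the atoms to investigate.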

Loading From NCI-META Release

  • Start with an NCI-META Release
  • Permissions issues
    • location of index files (/meme_work/ncim/data/indexes)
    • Location of reports folder (/meme_work/ncim/uploads/92051/reports)
  • Keep only hierarchies that need to be loaded instead of computed (see NCI META wiki for algorithm)
    • MSH
    • PNDS
    • USPMG

  • Pick up unpublished data (UnpublishedRrfLoaderAlgorithm) - /meme_work/ncim/data/METASUBSET/unpublished
    • UMLSCUI attributes

      drop table tbac;
      create table tbac as select distinct aui,last_release_cui from classes a, attributes b
      where a.atom_id = b.atom_id
      and attribute_name='UMLSCUI' and b.source = (select current_name from source_version where source='MTH');
       
      dump_table.pl -u mth -t tbac memedb > umlscui.txt
      • umlscui.txt (aui|cui)

    • ~DA: flags

      -- Find cases actually used
      drop table tbac;
      create table tbac as 
      select rui, source_rui from relationships_ui where source_rui like '%~DA:%' and source_rui not like '%~DA:0';
      
      dump_table.pl -u mth -t tbac memedb > x.txt
      perl -pe 's/~DA\://' x.txt >! ~/ruiDaFlags.txt
       

      → ruiDaFlags.txt (this is used by make_id_files.csh)

    • CONCEPT NOTE → conceptNotes.txt
      • drop table tbac;
        create table tbac as
        select a.cui, attribute_value, b.authority, b.timestamp
        from concept_status a, attributes b
        where a.concept_id=b.concept_id and attribute_name='CONCEPT_NOTE'
        and attribute_value not like '<>Long_Attribute<>:%'
        union all
        select a.cui, text_value, b.authority, b.timestamp
        from concept_status a, attributes b, stringtab c
        where a.concept_id=b.concept_id and attribute_name='CONCEPT_NOTE'
        and attribute_value like '<>Long_Attribute<>:%'
        and to_number(substr(attribute_value,20)) = c.string_id;

        dump_table.pl -u mth -d memestg -q "select * from tbac" >! conceptNotes.txt

         

    • ATOM NOTE → atomNotes.txt
      • dump_table.pl -u mth -d memestg -q "select a.aui, attribute_value, b.authority, b.timestamp from classes a, attributes b where a.atom_id=b.atom_id and attribute_name='ATOM_NOTE'" → NO DATA
    • MRSAB data for older versions
      • n/a - the latest MRSAB should be good enough, we don't care about anything earlier, just need to set Imeta/Rmeta
    • SRC_ATOM_ID values → srcAtomIds.txt
      • dump_table.pl -u mth -d memestg -q "select distinct source_row_id, aui from source_id_map a, classes b where local_row_id=atom_id and table_name='C'" >! srcAtomIds.txt
    • deleted concept names → deletedCuiNames.txt
      • cp $MRD_HOME/etc/deletedcui.txt deletedCuiNames.txt
      • cp /meme_work/mr/201610/META/MRCUI.RRF .
    • Integrity check unary/binary data → integrityCheckData
      • dump_table.pl -u mth -d memestg -q "select * from ic_single" >! icSingle.txt
      • dump_table.pl -u mth -d memestg -q "select * from ic_pair" >! icPair.txt
    • Make sure this is all root terminologies now.
    • XR Relationships - xrRelationships.txt (cui1|cui2|lastModifiedBy)
      • dump_table.pl -u mth -d memestg -q "select distinct a.cui, b.cui, r.authority from relationships r, concept_status a, concept_status b where concept_id_1=a.concept_id and concept_id_2=b.concept_id and relationship_name='XR'" >! xrRelationships.txt
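The files produced above (umlscui.txt as aui|cui, xrRelationships.txt as cui1|cui2|lastModifiedBy, and so on) are pipe-delimited. A sketch of a strict reader for them, assuming each line may end with a single trailing '|' that acts as a terminator; the class name is hypothetical, and the actual dump_table.pl output format should be checked before relying on this.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a strict reader for pipe-delimited dump files; the class is hypothetical. */
final class DumpFileReader {

  /**
   * Parses lines such as "aui|cui" into field arrays. Blank lines are skipped, a single
   * trailing '|' is treated as a terminator, and any other field count is rejected.
   */
  static List<String[]> parse(List<String> lines, int expectedFields) {
    List<String[]> rows = new ArrayList<>();
    int lineNo = 0;
    for (String line : lines) {
      lineNo++;
      if (line.isBlank()) {
        continue;
      }
      String body = line.endsWith("|") ? line.substring(0, line.length() - 1) : line;
      String[] fields = body.split("\\|", -1); // -1 keeps empty fields
      if (fields.length != expectedFields) {
        throw new IllegalArgumentException(
            "Line " + lineNo + ": expected " + expectedFields + " fields, found " + fields.length);
      }
      rows.add(fields);
    }
    return rows;
  }
}
```

The lines would typically come from `java.nio.file.Files.readAllLines(path)`; failing fast on a bad field count surfaces export problems before any data is loaded.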