SPRINT 3 - Finish 6/30/2015

Overview

Sprint to extend some of the "read only" APIs, to finalize unit/integration testing for existing functionality, to extend the interface, to develop additional loaders, and to begin developing editing capabilities.

Priority things

  • Include precedence list info in the metadata
    • Add a "getDefaultPrecedenceList(terminology,version)" method to MetadataServiceRest (and implement all connected stuff)
    • Add integration test for this method as well (MetadataServiceRestNormalUse, DegenerateUse, EdgeCases,...)
    • On application side
      • Load the precedence list after "get all metadata" is called and set a scope variable equal to it.
      • In the "Metadata" tab, show the precedence list in its own section (separate from the other metadata - because it is an ordered list.)
      • Put in its own bullet before the "general metadata entries" stuff.
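
A minimal sketch of the new REST method, assuming JAX-RS annotations and the try/finally service pattern used by the other REST impls; the path, the PrecedenceList return type, and the authorize helper are assumptions:

```java
// Hedged sketch - path, return type, and authorize() are assumptions modeled
// on the other REST services named in these notes.
@GET
@Path("/precedence/{terminology}/{version}")
public PrecedenceList getDefaultPrecedenceList(
    @PathParam("terminology") String terminology,
    @PathParam("version") String version,
    @HeaderParam("Authorization") String authToken) throws Exception {
  MetadataService metadataService = new MetadataServiceJpa();
  try {
    authorize(authToken, UserRole.VIEWER); // hypothetical auth helper
    return metadataService.getDefaultPrecedenceList(terminology, version);
  } finally {
    metadataService.close();
  }
}
```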
  • Support roman-numeral sort order for ICD10 (chapters are numbered I, II, III, ...); see the comparator sketch below
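
The roman-numeral sort could be handled with a comparator along these lines (a hedged sketch; the class name is illustrative and input validation is minimal):

```java
import java.util.Comparator;

/** Sorts ICD-10 chapter numerals ("IX", "X", "XI") numerically, not alphabetically. */
public class RomanNumeralComparator implements Comparator<String> {

  @Override
  public int compare(String a, String b) {
    return Integer.compare(toInt(a), toInt(b));
  }

  // Standard subtractive-notation parse, e.g. "IX" -> 9, "XIV" -> 14.
  private static int toInt(String roman) {
    int total = 0;
    for (int i = 0; i < roman.length(); i++) {
      int v = value(roman.charAt(i));
      // A smaller value before a larger one is subtracted (e.g. the I in IX).
      if (i + 1 < roman.length() && v < value(roman.charAt(i + 1))) {
        total -= v;
      } else {
        total += v;
      }
    }
    return total;
  }

  private static int value(char c) {
    switch (c) {
      case 'I': return 1;
      case 'V': return 5;
      case 'X': return 10;
      case 'L': return 50;
      case 'C': return 100;
      case 'D': return 500;
      case 'M': return 1000;
      default: throw new IllegalArgumentException("Not a roman numeral: " + c);
    }
  }
}
```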
  • SEO for browsers (Google indexing, add to website, consider advertising).
  • Mojo integration tests for ClaML, RF2, RRF single
  • RRF loader -> create marker set for SNOMED (both "single" and "umls")
    • implemented, needs testing
  • Create a video demo of the site (Camtasia) and post as a link on the header (video glyphicon if there is one)
  • Test solutions for exact matches showing up at the top for auto complete
    • the edge n-gram boost should already be working for words starting with the typed prefix
  • Need improvements to filtering (e.g. mismatched quotes, extra punctuation; many inputs don't parse quite right)
    • test for atoms - should work cleanly
    • test for relationships - consider escaping the query and handling the "*" at the end differently
  • DONE: Change marker set -> label
  • DONE: index on "active" as well as obsolete, and index on YYYYMMDD for easier searching.
  • DONE: Calls that return tree position list or tree positions should involve sorting (based on pfs?)
  • DONE: Make a US Extension subset.
  • DONE: Get the ?query= parameter tested. Also update the UI.
  • DONE: Get the marked refset stuff working.
    • Add a button to "show extensions"
    • Highlight concept ids containing extension data with tooltip
    • Highlight text of extension concepts themselves with tooltip — need mechanism for this still
    • Have MarkerSet not apply to concepts that are ALREADY in the subset…
  • DONE: Ensure query is urlencoded so "heart attack" (in quotes) works as a valid search.
  • DONE: Test solution to boosting non-suppressible
    • try (suppressible:false^5 OR suppressible:true^1) as an alternative
    • Make sure suppressible things are still searchable but top results are never suppressible (if there are non-suppressible options); see the sketch below
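
A minimal sketch of how that alternative clause could be combined with the user's query (the field name comes from these notes; the surrounding parser setup is assumed):

```java
// Every document is suppressible:true or suppressible:false, so the appended
// clause matches all results and only influences ranking - suppressible
// content stays searchable but sinks below non-suppressible matches.
static String withSuppressibleBoost(String userQuery) {
  return "(" + userQuery + ") AND (suppressible:false^5 OR suppressible:true^1)";
}
```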
  • DONE: For the intro pages: copy text from a local file and put into partials - don't keep in "rest"
    • Bring them in (like the logo images).
  • TODO: Finalize REST integration testing of tree methods.
  • TODO: autocomplete suggestions should always be valid (e.g. remove Lucene special characters when setting the value)
    • This is probably doable with QueryParser.escape on the incoming text
    • Make sure the text is also URL-encoded; see the sketch below
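
A hedged sketch of the sanitizing step (QueryParser.escape and URLEncoder are real APIs; the method wrapper is illustrative):

```java
import java.net.URLEncoder;
import org.apache.lucene.queryparser.classic.QueryParser;

/** Make an autocomplete suggestion safe to use as a query value. */
public static String toSafeQuery(String suggestion) throws Exception {
  // Escapes Lucene specials: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
  String escaped = QueryParser.escape(suggestion);
  // URL-encode for the REST call, e.g. "heart attack" -> "heart+attack"
  return URLEncoder.encode(escaped, "UTF-8");
}
```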
  • "self" entries need to exist in transitive closure for leaf nodes (for semantic search)
  • DONE: fix cases where atoms paging shows up but not the filter - has to do with show obsolete/suppressible
    • I think this happens when there are exactly 10 cases after "hide suppressible" or "hide obsolete"
    • Find cases from UMLS and SNOMED with exactly 10 non-obsolete, non-suppressible atoms
  • mojo test suite for RRF-single, plus terminology remover
  • mojo test suite for ClaML
  • ClaML Loader (make algorithm).
  • Finalize REST service for a general concept query (e.g. lucene query, HQL query, or SQL query)
    • DONE: Must start with "SELECT " and not contain ";"
      • provide appropriate error message via exception if not
    • DONE: Execute the query but check the object type of the first result
      • if doesn't match expected type, fail with an exception.
    • Use a timeout: query.setHint("javax.persistence.query.timeout", timeout); see the guard sketch below
  • Implement NewConceptMinRequirementsCheck and "validation service" (an accumulation sketch follows this item)
    • DONE: Validation check
      • validate(Concept), validate(Atom), validate(Code), validate(Descriptor), validateMerge(Concept,Concept)
    • Validation service - See ValidationService.java
      • public ValidationResult validateConcept ...
      • public ValidationResult validateAtom ...
      • public ValidationResult validateDescriptor ...
      • public ValidationResult validateCode ...
      • public ValidationResult validateMerge(String cui1, String cui2, String branch)
    • Implement ValidationService (e.g. ValidationServiceJpa)
      • pre-cache validation handlers (see how this works in MetadataService, etc).
        • static List<ValidationCheck> validationHandlers  = ...
      • Implement the interface methods
        • Call the corresponding validation check method
        • accumulate validation results
        • return them
    • ValidationServiceRest/RestImpl/ClientRest
      • validateConcept /validate/cui/ (PUT) ConceptJpa
        • load all validation handlers and call each validateConcept() method.
      • validateAtom /validate/aui (PUT) AtomJpa
      • validateDescriptor /validate/dui (PUT) DescriptorJpa
      • validateCode /validate/code (PUT) CodeJpa
      • validateMerge /validate/cui/merge/{cui1}/{cui2} - GET
    • Integration tests - create infrastructure and implement for "default validation check"
      • Assume that validation always passes.
      • normal use
        • validate a concept
        • validate an atom
        • validate a descriptor
        • validate a code
        • validate a merge between two concepts
      • degenerate use
      • edge cases
    • For individual tests, we need unit tests
      • implement a unit test for DefaultValidationCheck
      • in server-jpa-services
      • com.wci.umls.server.test.validation...
    • DONE: DefaultValidationCheck
      • validate atom whitespace
      • validate that a concept has at least one atom and at least one "hierarchical" relationship (to something that exists)
      • update config.properties to use this check
    • DONE: Organize validation checks into packages and create classes
      • com.wci.umls.server.jpa.services.validation (DefaultValidationCheck)
      • com.wci.umls.server.jpa.services.validation.umls (NewConceptMinRequirementsCheck)
        • concept has at least one atom and at least one hierarchical rel 
      • com.wci.umls.server.jpa.services.validation.snomed (NewConceptMinRequirementsCheck)
        • see term server project
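
A minimal sketch of the accumulation pattern for ValidationServiceJpa (the merge() method and ValidationResultJpa constructor are assumptions based on the bullets above):

```java
// One interface method; the others (validateAtom, validateDescriptor,
// validateCode, validateMerge) would follow the same pattern.
public ValidationResult validateConcept(Concept concept) {
  ValidationResult result = new ValidationResultJpa();
  for (ValidationCheck check : validationHandlers) {
    // Each pre-cached handler contributes its errors/warnings.
    result.merge(check.validate(concept));
  }
  return result;
}
```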
  • DONE: Properly implement terminology remover functionality (e.g. ContentServiceRest.removeTerminology). The content and the metadata objects need to be removed in the right order so as to avoid foreign key constraint errors (e.g. attributes, definitions, semantic types, tree positions, transitive relationships, relationships, atoms, then atom classes - something like that). For metadata I think terminology/rootTerminology is the only dependency (remove root terminology first). See the order sketch below.
    • Then update the mojo test case to remove the terminology (currently commented out)
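
A hedged sketch of a dependency-safe removal order (method names are illustrative, not the actual API):

```java
// Content first, leaf-most objects before the things they reference.
removeTreePositions(terminology, version);
removeTransitiveRelationships(terminology, version);
removeRelationships(terminology, version);
removeSubsetMembers(terminology, version);
removeAttributesDefinitionsAndSemanticTypes(terminology, version);
removeAtoms(terminology, version);
removeAtomClasses(terminology, version); // concepts, descriptors, codes
// Metadata last; order terminology vs. rootTerminology per the FK dependency.
removeTerminologyMetadata(terminology, version);
```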
  • DONE: Finish RF2 loader implementation, test on mini SNOMED
    • create separate config project for this (e.g. "prod-snomedct").
    • load and deploy to snomed.terminology.tools
    • classifier?
  • DONE: Finish implementing dui/code calls in ContentServiceRestImpl for child and root routines
  • DONE: Fill in the ContentClientRest calls for new Tree REST APIs
  • DONE: Add a "are you sure you want to leave the page" feature if people click the back button (see mapping project)
  • DONE: BUG: lower relevance of suppressible data in queries - should be able to add a boost to suppressible:0
  • DONE: Hierarchy - REST layer adds a dummy top for merging
  • DONE: Clients should handle null parameters properly
    • BAC: handle for content, metadata, project, security
    • Make a RootClientRest with a precondition check for null or blank parameters; see the sketch below
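
A minimal sketch of the shared precondition check (the helper name is illustrative):

```java
public abstract class RootClientRest {

  /** Throws if a required parameter is null or blank. */
  protected void validateNotEmpty(String value, String parameter) throws Exception {
    if (value == null || value.trim().isEmpty()) {
      throw new Exception("Parameter " + parameter + " must not be null or empty");
    }
  }
}
```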

User Interface Enhancements

  • DONE: Customizable launch-page content (e.g. disclaimer) through html partial, ng-include, and config file variable
  • DONE: Show additional info: fully-defined status; for relationships, inferred vs. stated
  • DONE: For SNOMED change labels - find a way to do this based on metadata (e.g. general metadata entries)
    • Atoms -> descriptions, attributes -> properties
  • DONE: Add subset member and subset member attribute info to UI (e.g. for SNOMED)
  • Add features for "deep" relationships when browsing UMLS.    
    • Need a generalized way to know when to use this
    • it is definitely only for "concept"
    • It may be that if any "atoms" of the concept don't match the terminology, then we show it.
  • DONE: Tree positions (in report itself)
  • IN PROGRESS: Tree browser widget (searchable) (as alternative to "search", e.g. use a tab)
  • Websocket (for a WebsocketWorkflowListener)
  • Advanced search (uses "search criteria" or "general query" mechanisms)
  • DONE: Mobile-friendly and other style issues
    • For "definitions" use "..." for long definition values.
    • Check on style of "filter" boxes in mobile experience
    • Add a "hide non-English" button if there are atoms with language !='ENG'

Additional/Enhanced Loaders

  • Consider adding a "configure" routine to all "handlers" so they can configure themselves after they are all instantiated (e.g. if they need access to a metadataservice or a contentservice to function properly)
  • Boost higher term types in search.
    • Alternatively - configurable boost handler (default is suppressible and term-type).
  • Need a "subset ancestors" computer (like reverse transitive closure) to idenitfy all of the ancestors of members of a subset that are not in the subset itself.  This is for a tree browser to be able to show you the path to your content.  Thus each "subset" will have two subsets, the subset itself (publishable) and the collection of ancestors (not publishable).  Then, in tree browser ,we can look up the subset memberships (if desired ,through graph resolver) of the level of the tree being browsed and then have a means to indicate that desired subset content exists in this subtree)
    • This works in conjunction with "tree browser" - there can be a picklist where you can choose a subset and then see that subset's data within the context of the tree (one subset at a time).  Or you can pick "no subset".  When enabled, this causes subset lookups to occur (can even really be a subsetquent call as delayed highlighting is fine)
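
A minimal sketch of the ancestors computation (the two lookup methods are stand-ins for subset-membership and transitive-closure queries):

```java
import java.util.HashSet;
import java.util.Set;

static Set<String> computeSubsetAncestors(Set<String> members) {
  Set<String> ancestorsOnly = new HashSet<>();
  for (String id : members) {
    for (String ancestorId : getAncestorConceptIds(id)) { // transitive closure
      if (!members.contains(ancestorId)) {
        // Path-to-root nodes not themselves in the subset; these form the
        // companion, non-publishable "ancestors" subset.
        ancestorsOnly.add(ancestorId);
      }
    }
  }
  return ancestorsOnly;
}
```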
  • "smart" RRF loader should support a config file to indicate what level at which definitions should be attached and should handle RXNORM and NCIt concepts.
  • RF2 snapshot loader for SNOMED CT (directly from RF2 - initial work done)
    • Need to create metadata
    • RelationshipType will be the same as UMLS (PAR/CHD/RO)
    • AdditionalRelationshipType will be the typeId from the relationships file
    • TermType will be the typeId from the descriptions file
    • NO STYs
    • AttributeNames - correspond to the field names from RF2 files
  • RF2 delta/full loaders
  • Owl loader (e.g. for NCIt) - will require use of "DL" features
    • Also have a corresponding Owl export feature (e.g. "release")


Testing 

  • Handler002Test for normal use
    • Needs implementation to test graph resolver for UMLS, SNOMED, and MSH cases.
    • Get objects from Jpa layer, test resolver does what we expect. 
  • Implement Handler003/008Test - for ID assignment algorithms. Borrow code from the other project (though there may be differences). The uuidHash algorithm is implemented properly for UMLS and may differ for SNOMED.
  • Content service integration test
    • normal use - see all of the "TODO" items indicating functionality that could be tested more fully but is not yet
    • degenerate use
    • edge cases
  • Better handle Lucene searches like "xxx:yyy" where xxx is not a valid field name. Should report that it is not a valid field name; see the sketch below.
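
A hedged sketch of the field-name check (assumes the set of indexed field names is available, e.g. from the index reader; quoted phrases containing ':' would need extra handling):

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

static void checkFieldNames(String queryString, Set<String> indexedFieldNames)
    throws Exception {
  Matcher m = Pattern.compile("(\\w+):").matcher(queryString);
  while (m.find()) {
    String field = m.group(1);
    if (!indexedFieldNames.contains(field)) {
      throw new Exception("Invalid field name in query: " + field);
    }
  }
}
```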

Editing Features

  • Project
    • Figure out how to capture "project scope" for SNOMED and for UMLS in a generalized way.  Update project objects to be able to properly capture (and compute) project scope.  NOTE: the scope definition may involve concepts/terminologies/semantic types.  IN that event, the scope computer gets a little bit more complicated.
  • Test loading a DB with envers auditing disabled and then making changes in a DB while it is enabled. Does it properly create audit entries?
    • for the old edition of the component?
    • for the new edition?
  • Metathesaurus editing actions
    • MetathesaurusContentEditingRest
      • methods for each "edit action"
      • Create a RestImpl
      • Create a client
      • Create integration tests to run against a "stock" dev database
    • Add a semantic type component, Remove a semantic type component
      • Have a pencil icon by the STYs section
      • clicking gives you a list of available STYs, in tree order, with a filter box for typing characters of the STY you want.
        • See the metadata "semantic type" option
      • User may want to choose multiple ones (so have a "finished" button)
      • Don't allow user to choose STYs already assigned to the concept.
      • Final action is to call "contentService.addSemanticTypeComponent"
      • Consider what happens to workflow status
      • Consider how to show "NEEDS_REVIEW" content in the browser
      • Consider how to support "Undo". - perhaps an action log (atomic/molecular) is a good idea still for that
    • Implement this completely, including testing, before moving on to other actions (each of which requires a UI enhancement)
      • Approve a concept (e.g. set workflow status values).
      • Add an atom (e.g. MTH/PN - that needs to be generalized somehow)
      • Merge two concepts (consider the workflow status when this happens).
      • Move an atom (or atoms) from one concept to another
      • Split an atom (or atoms) out of a concept and specify a relationship type between the two concepts
  • Terminology Editing (first use case)
    • Add a concept (as a child of an existing concept) with one or more atoms and a PAR/CHD relationship.
    • Run the classifier
    • Show classifier results (e.g. new inferred relationships, etc)
    • NOTE: this only works with a description logic based source that tracks inferred relationships.
    • PREREQ: SNOMEDCT RF2 loader.


Admin Tools

  • RRF Loader
    • Finish metadata loading
    • Do content loading
  • RRF Loader - single source

Optimizations

  • Consider all query-based lookups in ContentServiceJpa and reimplement as lucene lookups instead if it seems that it would improve performance.
    • Consider a second search upon a parse exception that doesn't try to use lucene syntax (e.g. remove parens, brackets, etc.)
  • Search should handle parse exceptions, e.g. mismatched parens or braces; see the fallback sketch below
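
A minimal sketch of the fallback (QueryParser.escape is a real Lucene API; the wrapper is illustrative):

```java
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

static Query parseWithFallback(QueryParser parser, String input)
    throws ParseException {
  try {
    return parser.parse(input);
  } catch (ParseException e) {
    // Mismatched parens/brackets etc.: retry the input as escaped plain text.
    return parser.parse(QueryParser.escape(input));
  }
}
```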


Future Stuff

  • Test conditional envers auditing: http://stackoverflow.com/questions/14250612/conditional-envers-auditing
  • escg (expression grammar - research)
  • Use Lucene SynonymFilter with synonym table
  • Component-Component relationships (between any two components).
  • Value set definitions (and marking of subset.isValueSet()) and linking to definition? via attribute?
  • Owl loader, Owl export of DL terminologies (e.g. RF2-loaded SNOMED)
  • Rdf export (flat)
  • Classifier (owl interface)
  • Expression language (based on SNOMED expression constraint grammar)
  • Sub-branching
    • branchResolutionHandler - figures out how to copy and mark branched objects and update indexes - for different branching strategies.
  • Handle mappings - may not be worth it
  • Implement an RF2 loader (use DL features)
  • Implement a ClaML loader
  • Support semantic network (e.g. STY rels, etc.) - probably want to wait for a real ontology - and maybe even load it as such.