Loading a Database with Full 201610 Data
Overview
Instructions for loading a database with the full 201610 data, either from a precomputed SQL dump or from the RRF data.
Details - from precomputed
- Create a separate working dir, e.g. c:/ncifull
- Create a config/config.properties file for this work (e.g. config.properties)
- Set all directories to c:/ncifull instead of c:/umlsserver
- Set up a second "server" in Eclipse to run with this config file.
- Set up secondary run configurations that use this config.properties file
- Truncate your db (or prepare a fresh db)
- Make sure you are using collation utf8_bin
- ALTER DATABASE ncifulldb DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
- Pull and build the project and make sure your config file is completely up to date
- Clear your indexes directory of all files
- Download the data from https://wci1.s3.amazonaws.com/NCI/umls.sql.gz
- A dump that also includes the NCI scale insertion is available at https://wci1.s3.amazonaws.com/NCI/umls-with-nci.sql.gz
- Gunzip the file (may require Cygwin)
- Try "gunzip umls.sql.gz"
- If that fails (the file is large enough that gzip can report checksum errors), stream it instead:
- "cat umls.sql.gz | gunzip -c > umls.sql"
- Import the file into your database (in MySQL Workbench, see the Management/Data Import tool)
- Reindex your database (run the Reindex profile on the admin/lucene project)
- NOTE: make sure your indexed.objects property is blank.
With that, you should be able to build and deploy a server.
NOTE: On some Unix environments MySQL table names are case sensitive and will need to be corrected (ask BAC if needed).
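The decompress and import steps above can also be run from a shell (Cygwin on Windows). The following is a minimal sketch, not part of the project's tooling: the function names are hypothetical, the mysql command-line import is an assumed alternative to the MySQL Workbench import described above, and the database name ncifulldb is taken from the ALTER DATABASE example.

```shell
# Stream-decompress a large gzip file, mirroring the "cat | gunzip -c" step above.
# $1 = path to the .gz file, $2 = path of the decompressed output.
decompress_dump() {
  dd_src="$1"
  dd_dest="$2"
  cat "$dd_src" | gunzip -c > "$dd_dest"
}

# Print the first few lines of the decompressed dump as a quick sanity check
# before starting a long import. $1 = file, $2 = line count (default 5).
preview_dump() {
  head -n "${2:-5}" "$1"
}

# Example usage (nothing below runs automatically; the MySQL user is an assumption):
#   curl -O https://wci1.s3.amazonaws.com/NCI/umls.sql.gz
#   decompress_dump umls.sql.gz umls.sql
#   preview_dump umls.sql
#   mysql -u root -p ncifulldb < umls.sql
```

The pipeline's exit status is that of gunzip, so a non-zero return from decompress_dump means the decompression itself failed and the output file should not be imported.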
Details - from RRF data (and identity/unpublished data)
- Create a separate working dir, e.g. c:/ncifull
- Create a config/config.properties file for this work (e.g. config.properties)
- Set all directories to c:/ncifull instead of c:/umlsserver
- Set up a second "server" in Eclipse to run with this config file.
- Set up secondary run configurations that use this config.properties file
- Truncate your db (or prepare a fresh db)
- Make sure you are using collation utf8_bin
- ALTER DATABASE ncifulldb DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
- Pull and build the project and make sure your config file is completely up to date
- Clear your indexes directory of all files
- Download the 201610 data from https://wci1.s3.amazonaws.com/NCI/NCIM_201610.zip
- Unzip into c:/data/NCIM_201610 (may need to use Cygwin to unzip due to large file sizes)
- Download the identity/unpublished data from https://wci1.s3.amazonaws.com/NCI/NCIM_201610-identity-unpublished.zip
- Unzip into the same c:/data/NCIM_201610 directory (may need to use Cygwin to unzip due to large file sizes)
- Run the standard "reset-meta" integration test used to load sample data
- See Step 7 of Building and Deploying in Eclipse (use "reset-meta" as the profile instead of "reset")
- Make sure "input.dir" points to "c:/data/NCIM_201610"
- Make sure config.properties points to c:/ncifull/config/config.properties
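Because this setup depends on every directory in the new config file pointing at c:/ncifull rather than c:/umlsserver, a quick check for leftover references can save a failed load. The sketch below is one way to do that from a shell; the function name is hypothetical and it only does a plain text search of the file.

```shell
# List lines in a properties file that still mention the default working dir.
# $1 = properties file, $2 = directory string to look for (default c:/umlsserver).
# Returns 0 when no stale references remain, 1 otherwise.
find_stale_paths() {
  fsp_file="$1"
  fsp_old="${2:-c:/umlsserver}"
  # -F matches the directory as a fixed string, -i ignores case (Windows paths),
  # -n prefixes each hit with its line number.
  if grep -n -i -F "$fsp_old" "$fsp_file"; then
    return 1
  fi
  return 0
}

# Example usage (not run automatically):
#   find_stale_paths c:/ncifull/config/config.properties
```

Any line it prints is a setting that still needs to be changed before running the "reset-meta" profile.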
Scale Testing Insertions
Full data for insertions can be downloaded from S3:
- Download the "full" testing data - https://wci1.s3.amazonaws.com/NCI/NCI-srcDataDir-full.zip
- Unzip this into c:/umlsserver/data
- This process will use the contents of the "inv" directory
- Edit config-full.properties so that source.data.dir points to c:/umlsserver/data
- Run a server, and perform NCI, SNOMEDCT_US, or MTH insertion.
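The config edit in the steps above can be scripted if you switch between data sets often. This is a minimal sketch under stated assumptions: set_property is a hypothetical helper, it expects simple key=value lines with no spaces around the "=", and the value should use forward slashes as in the paths above.

```shell
# Set key=value in a Java-style properties file, replacing an existing
# "key=..." line or appending one if the key is absent.
# $1 = properties file, $2 = key, $3 = value.
set_property() {
  sp_file="$1"
  sp_key="$2"
  sp_value="$3"
  awk -F= -v k="$sp_key" -v v="$sp_value" '
    $1 == k { print k "=" v; found = 1; next }   # replace the existing line
    { print }                                     # keep every other line
    END { if (!found) print k "=" v }             # append when missing
  ' "$sp_file" > "$sp_file.tmp" && mv "$sp_file.tmp" "$sp_file"
}

# Example usage (not run automatically):
#   set_property config-full.properties source.data.dir c:/umlsserver/data
```

The key is compared as an exact string rather than a regular expression, so the dots in source.data.dir cannot match other property names by accident.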
Scale Testing Release
- Download the "full" testing data - https://wci1.s3.amazonaws.com/NCI/NCI-srcDataDir-full.zip
- Unzip this into c:/umlsserver/data
- This process will use the contents of the "mr" directory
- Edit config-full.properties so that source.data.dir points to c:/umlsserver/data
- Copy the "bin" directory from config/prod-nci-meta/src/main/resources to your c:/umlsserver directory
- If running on Windows, make sure the "cygwin.bin" property in your config.properties is set correctly (e.g. to the path of your Cygwin bin directory)
- Run a server, and perform NCI, SNOMEDCT_US, or MTH insertion.
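Before starting the server for this process on Windows, it can help to confirm that cygwin.bin is actually set and names a real directory. The sketch below assumes it is run from a Cygwin shell; get_property and check_cygwin_bin are hypothetical helpers, and the file is assumed to use simple key=value lines.

```shell
# Print the value of a key from a properties file (first match only).
# $1 = properties file, $2 = key.
get_property() {
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Succeed only if cygwin.bin is set in the given config file and names an
# existing directory; otherwise print the reason to stderr and return 1.
check_cygwin_bin() {
  ccb_dir=$(get_property "$1" cygwin.bin)
  if [ -z "$ccb_dir" ]; then
    echo "cygwin.bin is not set in $1" >&2
    return 1
  fi
  if [ ! -d "$ccb_dir" ]; then
    echo "cygwin.bin points to a missing directory: $ccb_dir" >&2
    return 1
  fi
}

# Example usage (not run automatically):
#   cp -r config/prod-nci-meta/src/main/resources/bin c:/umlsserver/
#   check_cygwin_bin c:/umlsserver/config/config.properties
```

The config file path in the usage comment is an assumption; substitute the config.properties your server actually runs with.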
References/Links
- Testing on mini scale - Loading a Database with Sample 201604 Data