# Building and Deploying (with Docker)

Information on building and deploying the application in Docker.
## Prerequisites

- Create a database (e.g. `terminologydb`) in your postgres instance with UTF-8 character encoding. For example:

  ```sql
  psql> CREATE DATABASE terminologydb WITH encoding 'UTF-8';
  ```

  NOTE: when redeploying a .dump file for a second time, remember to first DROP and then re-create your database as described above. Otherwise, the `pg_restore` command will simply add additional data.
- Ensure docker is set up in a way that the running image can use up to 4G of memory (the server process itself is capped at 3500M).
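The drop-and-recreate sequence mentioned in the note above can be sketched as follows (a psql session sketch using the example database name; make sure no other sessions are connected before dropping):

```sql
psql> DROP DATABASE terminologydb;
psql> CREATE DATABASE terminologydb WITH encoding 'UTF-8';
```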
## Details

A deployment involves three artifacts:

- A docker image (pulled from dockerhub) - e.g. `wcinformatics/wci-terminology-service:1.2.1-20240108`
- A postgres database dump file (that can be loaded with `pg_restore`) - link supplied on a project-by-project basis
- A Lucene indexes directory packaged up as a .zip file - link supplied on a project-by-project basis
Following are the steps to deploy the terminology server with a specified data set (the example below uses the testing dataset).
```shell
# Start by setting information about artifacts and your postgres config, e.g.:
dockerImage=wcinformatics/wci-terminology-service:1.2.1-20240108
dumpUrl=https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-db-TEST-2024.dump
indexUrl=https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-indexes-TEST-2024.zip
PGDATABASE=terminologydb
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=pgP@ssw0rd

# Choose a directory where indexes will live
indexDir=/data/index

# Restore database (see lower in this document for restoring from a plain-text dump)
wget -O data.dump $dumpUrl
pg_restore -O -n public -Fc --dbname=$PGDATABASE --username=$PGUSER data.dump

# Unpack indexes
# NOTE: ensure the docker user will be able to access the index files.
# NOTE: if deploying with Kubernetes, you will want to use a persistent volume
#       (the other option is to put the data at an accessible URL and
#       the pod can be configured to download that data and unpack it locally)
#
mkdir -p $indexDir
wget -O $indexDir/index.zip $indexUrl
unzip $indexDir/index.zip -d $indexDir
chmod -R 777 $indexDir

# Pull and run docker image (use -d to put it in the background)
# NOTE: these commands assume "sudo" is required to run docker
#       and expose the process on port 8080 of the machine
sudo docker run -d --rm -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD \
  -e PGDATABASE=$PGDATABASE -p 8080:8080 -v "$indexDir":/index $dockerImage
```
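If you want to enforce the 4G memory guidance from the Prerequisites at the container level, docker's `--memory` flag can be added to the run command. A sketch (the flag value is an assumption based on the Prerequisites, not part of the original instructions):

```shell
# Same run command as above, with the container capped at 4G of memory
sudo docker run -d --rm --memory=4g \
  -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD \
  -e PGDATABASE=$PGDATABASE -p 8080:8080 -v "$indexDir":/index $dockerImage
```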
After launching, you should be able to access the application via http://localhost:8080/terminology-ui/index.html
## Script for Loading Database and Indexes

If operating in an environment where you have local psql client tools available and connectivity to the database, you can use this handy `load-data.sh` script to make the process of loading (or reloading) the database a little easier. See: https://github.com/WestCoastInformatics/wci-terminology-service-in-5-minutes/tree/master/load-data
## Watching Logs

Logs can be easily viewed by watching the docker logs (e.g. `sudo docker logs -f <container>`). However, the application uses a JSON logging format that can be a little hard to follow. We find that this perl script is useful in turning the logs into a more readable form.
```shell
$ cat > jlog.pl << 'EOF'
#!/usr/bin/perl
while(<>) {
  $et = "";
  if (/.*"extendedStackTrace":\[([^\]]*).*/) {
    $et = $1;
  }
  if (/.*"thrown":\{.*"message":"(.*)","name".*/) {
    $em = "$1";
    # $em =~ s/(.{1,200}).*/$1/;
  }
  if (/.*"name":"([^"]*).*/) {
    $name = $1;
  }
  /.*"level":"([^"]*).*"message":"(.*)","(endOfBatch|thrown).*"time":"([^"]*).*/;
  $level = $1; $time = $4;
  $x = "$2";
  if (!$x && /"url":"([^"]*).*"status-code":"([^"]*)/) {
    $url = "$1";
    $status = "$2";
    $url =~ s/.*http.*\/\/.*\//\//;
    $x = "$status $url";
  }
  $x =~ s/\\"/"/g;
  $x =~ s/\\n/\n/g;
  # $x =~ s/(.{0,200}).*/$1/;
  print "$time $level $x\n" if $x;
  if ($et) {
    $indent = " ";
    print "$name: $em\n";
    foreach $trace (split /\},\{/, $et) {
      $trace =~ s/.*"file":"([^"]*).*"line":(\d+),.*/$1\:$2/;
      print "$indent$trace\n";
      if (length($indent) < 20) {
        $indent .= " ";
      }
    }
  }
}
EOF
$ chmod 755 jlog.pl
```
With this script in hand, something like this can be done to see the logs more easily:

```shell
sudo docker logs <container> | ./jlog.pl
```
## Connecting to Postgres with SSL

One additional environment variable, PGJDBCPARAMS, can be passed to the docker container to add JDBC URL parameters.
This mechanism can be used to inject SSL parameters, for example to use a non-validating SSL connection.
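The original example was not captured here; the following is a sketch of a non-validating SSL configuration, assuming PGJDBCPARAMS is appended to the JDBC URL as-is and that the pgJDBC driver's `org.postgresql.ssl.NonValidatingFactory` is available (the exact parameter string is an assumption, not from the original doc):

```shell
# Non-validating SSL: encrypts the connection but skips certificate validation
sudo docker run -d --rm \
  -e PGJDBCPARAMS="ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory" \
  -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD \
  -e PGDATABASE=$PGDATABASE -p 8080:8080 -v "$indexDir":/index $dockerImage
```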
For a situation where you know the PGSSLROOTCERT parameter you would use to connect via psql, that same variable can be passed to the container. In the case of using PGSSLROOTCERT, it must be set to a path that is accessible within the docker container. The easiest way to achieve this is to use the volume already mounted in the container: put your root certificate file in $indexDir and then set PGSSLROOTCERT to /index/&lt;cert file&gt;. The certificate file can be a PEM-encoded X.509v3 certificate.
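A sketch of that setup (the certificate filename is hypothetical, and the run command otherwise mirrors the deployment steps above):

```shell
# Put the root certificate in the mounted index directory so the
# container can see it at /index/myroot.crt
cp myroot.crt $indexDir/

sudo docker run -d --rm \
  -e PGSSLROOTCERT=/index/myroot.crt \
  -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD \
  -e PGDATABASE=$PGDATABASE -p 8080:8080 -v "$indexDir":/index $dockerImage
```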
NOTE: while you're testing this, you may want to get your `psql` client connecting to the server with SSL first, so you can work out the proper "sslmode" and "sslrootcert" parameters you'll want to use in the PGJDBCPARAMS. `psql` honors the `sslmode` and `sslrootcert` connection-string keywords (or the PGSSLMODE and PGSSLROOTCERT environment variables), which you can use to simulate what we will do with the JDBC params.
## Troubleshooting

### Need to restore postgres from a plain-text dump
This is what dumping to a plain-text file looks like:
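The original example was not captured here; a sketch using standard pg_dump options and the connection settings from the deployment steps above (the output filename is an assumption):

```shell
pg_dump --no-owner --schema=public --format=plain \
  --dbname=$PGDATABASE --username=$PGUSER --file=terminologydb.sql
```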
The plain dump can be unpacked this way:
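A sketch of restoring the plain-text dump with psql, assuming the target database has already been created as described in the Prerequisites (the input filename is an assumption):

```shell
psql --dbname=$PGDATABASE --username=$PGUSER -f terminologydb.sql
```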