Building and Deploying (with Docker)

Information on building and deploying the application in Docker.

1 Prerequisites
2 Details
3 Script for Loading Database and Indexes
4 Watching Logs
5 Connecting to Postgres with SSL
6 Troubleshooting

Prerequisites

Create a database (e.g. terminologydb) in your postgres instance with UTF-8 character encoding. For example,

psql> CREATE DATABASE terminologydb WITH encoding 'UTF-8';

NOTE: when redeploying a .dump file for the second time, you need to remember to first DROP and then re-create your database as described above. Otherwise, the pg_restore command will simply add additional data.
Ensure docker is set up in a way that the running image can use up to 4G of memory (the server process itself is capped at 3500M).

Details

A deployment involves three artifacts:

A docker impage (pulled from dockerhub) - e.g. wcinformatics/wci-terminology-service:1.2.1-20240108
A postgres database dump file (that can be loaded with pg_restore) - link supplied on project-by-project basis
- Test Data: https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-db-TEST-2024.dump
- Test Data (plain text dump): https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-db-TEST-2024.dump.gz
A Lucene indexes directory packaged up as a .zip file - link supplied on a project-by-project basis.
- Testing Data Indexes: https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-indexes-TEST-2024.zip

Following are the steps to deploy the terminology server with a specified data set (example shows the testing dataset).

# Start by setting information about artifacts and your postgres config, e.g.:
dockerImage=wcinformatics/wci-terminology-service:1.2.1-20240108
dumpUrl=https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-db-TEST-2024.dump
indexUrl=https://wci-us-west-2.s3-us-west-2.amazonaws.com/term-server-v2/data/wci-terminology-indexes-TEST-2024.zip
PGDATABASE=terminologydb
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=pgP@ssw0rd

# Choose a directory where indexes will live
indexDir=/data/index

# Restore database (see lower in this document for restoring from a plain text dump)
wget -O data.dump $dumpUrl
pg_restore -O -n public -Fc --dbname=$PGDATABASE --username=$PGUSER data.dump

# Unpack indexes
#   NOTE: ensure the docker user will be able to access the index files.
#   NOTE: if deploying with Kubernetes, you will want to use a persistent volume
#         (the other option is to put the data at an accessible URL and
#          the pod can be configured to download that data and unpack it locally)
#
mkdir -p $indexDir
wget -O $indexDir/index.zip $indexUrl
unzip $indexDir/index.zip -d $indexDir
chmod -R 777 $indexDir

# Pull and run docker image (use -d to put it in the background)
#  NOTE: these commands assume "sudo" is required to run docker
#        and expose the process on port 8080 of the machine
sudo docker run -d --rm -e PGHOST=$PGHOST -e PGPORT=$PGPORT -e PGUSER=$PGUSER -e PGPASSWORD=$PGPASSWORD \
     -e PGDATABASE=$PGDATABASE -p 8080:8080 -v "$indexDir":/index $dockerImage

After launching, you should be able to access the application via http://localhost:8080/terminology-ui/index.html

Script for Loading Database and Indexes

If operating in an environment where you have local psql client tools available and connectivity to the database, you can use this handy load-data.sh script to make the process of loading (or reloading) the database a little easier. See: https://github.com/WestCoastInformatics/wci-terminology-service-in-5-minutes/tree/master/load-data

Watching Logs

Logs can be easily viewed by just watching the docker logs (e.g. sudo docker logs -f <container>). However, the application uses a JSON logging format that can be a little hard to follow. We find that this perl script is useful in turning the logs into a more readable form.

$ cat > jlog.pl << 'EOF'
#!/usr/bin/perl
while(<>) {
    $et = "";
    if (/.*"extendedStackTrace":\[([^\]]*).*/) {
        $et = $1;
    }
    if (/.*"thrown":\{.*"message":"(.*)","name".*/) {
        $em = "$1";
#        $em =~ s/(.{1,200}).*/$1/;
    }
    if (/.*"name":"([^"]*).*/) {
        $name = $1;
    }
    /.*"level":"([^"]*).*"message":"(.*)","(endOfBatch|thrown).*"time":"([^"]*).*/;
    $level = $1; $time = $4;
    $x = "$2";
    if (!$x && /"url":"([^"]*).*"status-code":"([^"]*)/) {
      $url = "$1";
      $status = "$2";
      $url =~ s/.*http.*\/\/.*\//\//;
      $x = "$status $url";
    }
    $x =~ s/\\"/"/g;
    $x =~ s/\\n/\n/g;
 #   $x =~ s/(.{0,200}).*/$1/;
    print "$time $level $x\n" if $x;
    if ($et) {
        $indent = "  ";
        print "$name: $em\n";
        foreach $trace (split /\},\{/, $et) {
            $trace =~ s/.*"file":"([^"]*).*"line":(\d+),.*/$1\:$2/;
            print "$indent$trace\n";
            if (length($indent)<20) {
              $indent .= "  ";
            }
        }
    }
}
EOF
$ chmod 755 jlog.pl

With this script in hand, something like this can be done to see the logs more easily:

sudo docker logs <container> | jlog.pl

Connecting to Postgres with SSL

One additional environment variable can be passed to the docker container to add JDBC URL parameters - PGJDBCPARAMS.

This mechanism can be used to inject SSL parameters, for example (use a non-validating SSL connection):

docker run ... -e PGJDBCPARAMS="?sslmode=required&sslfactory=org.postgresql.ssl.NonValidatingFactory" ...

For a situation where you know the PGSSLROOTCERT parameter you would use to connect via psql, the following can be used:

docker run ... -e PGJDBCPARAMS="?sslmode=verify-full&sslrootcert=$PGSSLROOTCERT" ...

In the case of using PGSSLROOTCERT, it must be set to a path that is accessible within the docker container. The easiest way to achieve this is to use the already mounted volume in the container and put your root certificate file in $indexDir and then set PGSSLROOTCERT to /index/<cert file>. The certificate file can be a PEM encoded X509v3 certificate.

NOTE: while you’re testing this, you may want to get your psql client connecting to the server with SSL first, so you can work out the proper “sslmode” and “sslrootcert” parameters you’ll want to use in the PGJDBCPARAMS. psql has parameters for --sslmode and -sslrootcert that you can pass when called to simulate what we will do with the JDBC params.

Troubleshooting

Need to restore postgres from a plain-text dump

This is what dumping to plain looks like:

pg_dump -U $PGUSER --format=plain --no-owner --no-acl $PGDATABASE \
    | sed -E 's/(DROP|CREATE|COMMENT ON) EXTENSION/-- \1 EXTENSION/g' |\
    gzip -9 > wci-terminology-service-${version}.dump.gz

The plan dump can be unpacked this way

$ psql -Upostgres
psql> CREATE DATABASE mydatabase WITH encoding 'UTF-8';
$ gunzip -c wci*dump.gz | psql -U postgres mydatabase