Skip to content

Latest commit

 

History

History
576 lines (473 loc) · 27.5 KB

README.md

File metadata and controls

576 lines (473 loc) · 27.5 KB

DATAWAVE External Services

Apache License

This module contains DATAWAVE external services. These are microservices that are intended to work in conjunction with, and eventually replace, the Wildfly based DATAWAVE web service.

DATAWAVE microservices are built on top of Spring Cloud and Spring Boot.

How to Use This Repository

The microservices and associated utility projects are intended to be developed, versioned, and released indepdendently and as such are stored in separate repositories. This repository includes them all as submodules in order to provide an easy way to import them all in an IDE for viewing the code, or refactoring. Git submodules require some extra commands over the normal ones that one may be familiar with.

To clone the repository, use the following:

git clone --recurse-submodules [email protected]:NationalSecurityAgency/datawave-microservices-root.git
# Or for HTTPS:
# git clone --recurse-submodules https://github.com/NationalSecurityAgency/datawave-microservices-root.git

If you have already cloned the repository and didn't use the --recurse-submodules option, then all of the directories will be empty. You can initialize them with:

git submodule update --init --recursive

Each of the submodules will be in a detached head state. If you want to check out the main branch on each, you can use:

git submodule foreach 'git checkout main'

To update submodules:

git pull --recurse-submodules

Why Microservices?

The Wildfly DATAWAVE web service has become a monolith over time, and for production use, different components of the service evolve at different rates. However, given the monolithic nature of the service and impact of outages, deployments are not as frequent as they could be. Splitting the web service into microservices will allow components to evolve at different rates. It will also allow components to be scaled independently as the need arises.

Features

Externalized Configuration

Configuration for DATAWAVE microservices is externalized. Services download their configuration from a configuration service at startup time, and are also capable of notification of configuration changes via direct contact or notification over an AMQP message bus. Upon such notification, the services will reload their configuration dynamically. Configuration is profile-based and stored in YAML files.

Authorization Service

The authorization service provides cached authentication and authorization via either X509 certificates or signed JSON Web Tokens (which themselves can be retrieved from the authorization service). The authorization service uses the supplied authentication credentials in order to contact a back end authorization provider to retrieve associated roles and compute Accumulo authorizations based on those roles. This information is all collected in a DatawaveUser object that is returned as the payload in a JWT.

Note that, by default, the authorization service expects to authorize an entity on behalf of another caller. That is, it is assumed that another service (such as the existing DATAWAVE Wildfly web service) is authenticating on behalf of a calling user (or chain of calling user and intermediate servers). In this scenario, the request is made using the web service's credentials and therefore that information should not be included in the response. Practically, this means that if you wish to test the authorization service directly, you must pass X-ProxiedEntitiesChain/X-ProxiedIssuersChain headers. It is possible to configure the service to run in a mode where these headers are not required and then any call without these headers acts as though the calling server is proxying for itself. This is controlled by setting security.proxied-entities-required to true.

Common Service Starter

Over time, the goal is to keep breaking up the monolithic DATAWAVE web service into smaller microservices. The spring-boot-starter-datawave artifact provides common utilities and configuration that are intended to be used with any additional microservices. Depending on this artifact will activate the contained configuration. In particular, this applies Spring Security configuration that sets up the service to accept a signed JSON Web Token (JWT) for authentication and authorization. Typically some gateway service will take care of calling the authorization service and retrieving a JWT that represents the calling user chain (this could even be performed by a load balancer). The JWT will be passed to your service, and the JWT security configuration will read the JWT header, check the signature, and convert the token into a security principal that can be accessed by your service. This principal contains a list of the proxied DatawaveUser objects which themselves contain the original roles returned by the back end authorization provider, the computed Accumulo authorizations, and the mapping from roles to authorizations. If your service needs to call another microservice, it can take advantage of the Spring RestTemplate configuration provided by the common module. This configuration sets up RestTemplate so that outgoing calls will be secured using the configured client certificate, and the authenticated principal will automatically be converted back to a JWT and passed along in a header, thus satisfying the authentication and authorization needs of the service you are calling. With this approach, authentication and authorization happens once and then the credentials are passed around securely to each service that is invoked along the calling chain.

Audit Service

The audit service provides query auditing capabilities to send audit records to a file or Accumulo. This service, which itself extends the common service base (and therefore accepts JWTs for authentication), could be extended to allow the audit records to be sent somewhere else as well (a remote audit store, for example).

Internally, the audit service simply receives audit messages, validates them, and then puts the message onto Spring Cloud streams to handle each of the configured audit sinks (e.g., file and Accumulo).

Audit Client Starter

Query services that support auditing integrate with the audit service via the spring-boot-starter-datawave-audit module. View the starter README for details.

Dictionary Service

The dictionary service provides access to the data and edge dictionaries that were previously contained in the Wildfly monolith. This service extends the common service base, and therefore accepts JWTs for authentication. If the remoteauth profile is activated, then the service will also attempt to contact the authorization service in order to authenticate a user when a client certificate is provided and no JWT is provided.

Accumulo Service

The Accumulo service is an administrator utility that provides a simple rest API for performing basic table and security operations on Accumulo. View the service README for details.

Service Discovery

Consul is used for service discovery (unless running in some container orchestrator such as Kubernetes, where DNS would be used for service discovery). It is somewhat like Zookeeper in that it runs a quorum of servers to maintain consistency and provide fault tolerance. A lightweight agent acts as a proxy to the quorum of servers, and also provides DNS-based discovery of services (Consul also supports an HTTP discovery interface, which Spring uses behind the scenes). Consul integrates well with Spring Boot / Spring Cloud. When enabled, each service automatically registers itself with Consul. When using Spring's RestTemplate to invoke a service, the location of that service is automatically discovered through Consul. Even the configuration server, which is needed at startup by all services, registers itself through Consul and each service during bootstrap contacts Consul to locate the configuration server and then continues to download the configuration from the located service.

The DNS-based service discovery is useful for non-Spring applications. For example, the Wildfly-based DATAWAVE web service needs to contact microservices. By running Consul on port 53, only the name of the service needs to be configured and lookup will happen automatically without any additional glue code. Or, if Consul is run on its default port of 8600, a small amount of glue code can be used to contact the non-standard port and locate the service. Consul also supports SVR DNS records, which not only supply the location of a service but the port on which it is running as well.

Shared Caching

While the authorization service may run perfectly well as a single instance and not need to be scaled to handle load, it would still be desirable to run more than one copy for fault tolerance. However, if we do that, we then have the potential for cache inconsistencies (and performance impacts) where a lookup from one copy of the authorization service returns one result and then a lookup from another either has to wait to call the backend authorization provider or returns a cached result that is different and/or has a different lifespan since it was inserted into the cache at a different time.

Hazelcast provides a client-server cache where the server is a cluster of cache services that store data and provide it to clients. By running more than a single cache server, we can achieve fault tolerance for the cache (and even perform a rolling upgrade). The authorization service uses a Hazelcast client to connect to the server to store and retrieve cached data. The Hazelcast client can be configured with a "Near Cache" which stores frequently used data in-process to avoid the extra network call, if performance becomes an issue. Note that all of this is hidden behind standard Spring cache abstractions, so the implementation could be change if the need arises.

Refreshable Configuration

Spring Boot supports the creation of beans that are annotated with the RefreshScope annotation. This creates a bean that is wrapped with a proxy to the real bean. Then, upon refresh events, the real bean is re-created thus changing the view behind the proxied bean. This setup is very similar to the RefreshableScope annotation that is used in the Wildfly-based DATAWAVE web service. As with that service, only beans that have been annotated as refreshable will benefit from the mechanism, and one must be careful in a bean that is not refreshable (but uses a refreshable configuration bean) not to pull values from a refreshable bean and store them in member variables. This defeats the purpose of having a refresh mechanism.

There are three ways to issue a refresh:

  1. Send a POST message with an empty body to /<servicename>/mgmt/refresh on the service you wish to refresh. This will cause all beans annotated wih RefreshScope to be re-created behind their proxies.
  2. Send a POST message with an empty body to /<servicename>/mgmt/bus-refresh?destination=<otherservicename>:** on any service. This will cause all running instances of <otherservice> that are listening on the event bus to refresh.
  3. Send a POST message with an empty body to /<servicename>/mgmt/bus-refresh on any service when using RabbitMQ. The service in question and all other services using RabbitMQ will be refreshed.

Getting Started

The quickest way to get stared is as follows. First, copy the .example files in the sample_configuration directory to the same name without .example (e.g., authorization-dev.yml.example becomes authorization-dev.yml). Edit the files you copied and customize any properties you wish.

cd sample_configuration
for f in *.yml.example; do
    cp ${f} ${f%.example}
done

Next, build the DATAWAVE microservices.

mvn -Pexec clean install
# You can add -DskipTests to skip running unit tests
# You can add -Pdocker to build Docker images

Now launch the configuration service. Specify the sample_configuration for the configuration repository.

java -jar datawave-config-service/target/config-service*-exec.jar --spring.profiles.active=dev,nomessaging,native,open_actuator --spring.cloud.config.server.native.searchLocations=file://$PWD/sample_configuration 

Now launch the authorization service.

java -jar datawave-authorization-service/service/target/authorization-service*-exec.jar --spring.profiles.active=dev,nomessaging,mock

Note that the authorization service is configured for two-way authentication, and the PKI materials located here are used by default (password for all: ChangeIt). For example, to access the authorization service endpoints below, simply import either the testUser.p12 or testServer.p12 client cert into your browser or preferred HTTP client. The default PKI configuration is provided for testing purposes only and is not intended for production use.

Once all services are running, you should be able to hit some of the following URLs:

  • https://localhost:8643/authorization/v1/authorize will return a JWT corresponding to your user
  • https://localhost:8643/authorization/v1/whoami will return a JSON-encoded version of the DatawaveUser corresponding to your client certificate
  • https://localhost:8643/authorization/swagger-ui.html shows Swagger documentation of the service
  • https://localhost:8643/authorization/mgmt/ shows available Spring Boot Actuator management endpoints

You may see an exception from either the authorization or config service due to an "Illegal key size". If you see this exception, it means you JRE/JDK does not have the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy files installed. See here for more information.

Now launch the dictionary service, if desired:

java -jar datawave-dictionary-service/service/target/dictionary-service*-exec.jar --spring.profiles.active=dev,nomessaging,remoteauth

You should be able to retrieve the data and edge dictionaries at the following URLs:

  • https://localhost:8843/dictionary/data/v1/
  • https://localhost:8843/dictionary/edge/v1/

If you are invoking from a browser, be sure the PKI materials referenced above have been loaded into the browser so that the client certificate is sent to the dictionary service (and then along to the authorization service). Alternatively, you can retrieve a JWT from the authorization service and pass it along to the dictionary service. The following example assumes the PKI materials from spring-boot-starter-datawave are used:

# Retrieve a JWT:
export PKI_DIR=$PWD/datawave-spring-boot-starter/src/main/resources/pki
curl --cacert $PKI_DIR/ca.pem -E $PKI_DIR/user.pem https://localhost:8643/authorization/v1/authorize > /tmp/jwt.txt
curl -H "Authorization: Bearer $(</tmp/jwt.txt)" --cacert $PKI_DIR/ca.pem https://localhost:8843/dictionary/data/v1/

If you wish to run the audit service, it requires RabbitMQ. You must first install and run RabbitMQ on your local host. If you have docker on Linux (on a Mac you will have to map the ports through to the host rather than use host networking), you can do this with:

docker run --rm -d --name rabbitmq --network=host rabbitmq:3.7-management-alpine

You should do this before running any of the other services above, and when you launch each service (configuration, authorization, dictionary), remove ,nomessaging from the activated profiles on the command-line for each. Then, you can run the audit service as follows:

java -jar datawave-audit-service/service/target/audit-service*-exec.jar --spring.profiles.active=dev

Once the audit service is running, you can call it by passing the JWT you retrieved from the authorization service in an Authorization header. For example, here is how you would check the health of the audit service:

curl -k -H "Authorization: Bearer <insert JWT text here>" https://localhost:8743/audit/mgmt/health
# Optional: save JWT in a text file, then pass it for future calls:
# curl -k <specify certificate info> https://localhost:8643/authorization/v1/authorize > /tmp/jwt.txt
# curl -q -k -H "Authorization: Bearer $(</tmp/jwt.txt)" https://localhost:8743/audit/mgmt/health
# NOTE: if you are using the supplied configuration, the test user cert will work:
# export PKI_DIR=datawave-spring-boot-starter/src/main/resources/pki
# curl --fail --cacert $PKI_DIR/ca.pem -E $PKI_DIR/user.pem https://localhost:8643/authorization/v1/authorize > /tmp/jwt.txt
# curl -q --fail --cacert $PKI_DIR/ca.pem -H "Authorization: Bearer $(</tmp/jwt.txt)" https://localhost:8743/audit/mgmt/health

You can send an audit request by posting to the /v1/audit endpoint. The following example assumes you have saved your JWT provided by the authorization service in /tmp/jwt.txt.

curl -q -k -H "Authorization: Bearer $(</tmp/jwt.txt)" \
--data-urlencode "auditUserDN=testUser" \
--data-urlencode "auditType=LOCALONLY" \
--data-urlencode "query=no query--testing the audit service" \
--data-urlencode "queryDate=1514764800000" \
--data-urlencode "auditColumnVisibility=USER" \
--data-urlencode "logicClass=EventQuery" \
--data-urlencode "auths=TEST" \
https://localhost:8743/audit/v1/audit

Running With Consul

A production environment is likely to use DNS-based discovery provided by something like Rancher or Kubernetes. However, if you wish to use Consul locally for service discovery in development (note that it is also possible to simply specify that all services are running on localhost), you must download Consul from here. You can then run the agent in dev mode on your local host:

mkdir /tmp/consul.d
cat > /tmp/consul.d/consul.json <<_EOF_
{
    "datacenter": "demo_dc",
    "disable_update_check": true,
    "enable_agent_tls_for_checks": true,
    "key_file": "$PWD/datawave-spring-boot-starter/src/main/resources/pki/server-key.pem",
    "cert_file": "$PWD/datawave-spring-boot-starter/src/main/resources/pki/server-crt.pem",
    "ca_file": "$PWD/datawave-spring-boot-starter/src/main/resources/pki/ca.pem"
}
_EOF_
consul agent -dev -ui -config-dir=/tmp/consul.d

Note that the config.json file tells Consul to provide the demo server X509 certificate when making health checks. If you have modified the sample configuration to use a different certificate, then you should adjust the configuration for your certificate and CA. Or, you can leave that step out entirely (run without the -config-dir=/tmp/consul.d argument), everything will work, but the health checks performed by Consul will fail on the authorization service since it requires a client certificate. A third option is to reconfigure the authorization service to not require a client certificate (comment-out the server.ssl.client-auth property in authorization.yml).

Or, if you have Docker available just run the Consul image:

docker run -d --rm --name consul --network=host -v $PWD/datawave-spring-boot-starter/src/main/resources/pki:/pki \
    -e CONSUL_LOCAL_CONFIG='{"datacenter": "demo_dc", \
        "disable_update_check": true, "enable_agent_tls_for_checks": true, \
        "key_file": "/pki/server-key.pem", "cert_file": "/pki/server-crt.pem", \
        "ca_file": "/pki/ca.pem"}' \
    consul:1.0.3

The Consul gui will then be available at http://localhost:8500/ui.

You will then need to enable the consul profile on each service you run by adding ,consul to the --spring.profiles.active list. Note that if you are not using RabbitMQ and therefore are using the nomessaging profile, you will need to list the consul profile before the nomessaging profile. E.g., to run the authorization service with consul, but without RabbitMQ support, you would run:

java -jar datawave-authorization-service/service/target/authorization-service*-exec.jar --spring.profiles.active=dev,consul,nomessaging

If you intend to run Consul and alo use RabbitMQ, then you must define a service registration for RabbitMQ in Consul. You can do that by running Consul as follows instead of the method described previously:

mkdir /tmp/consul.d
echo '{"service": {"name": "rabbitmq", "port": 5672}}' | tee /tmp/consul.d/rabbitmq.json
consul agent -dev -ui -config-dir=/tmp/consul.d

Or, if you have Docker, you can manually add a service definition to the running Consul container:

docker exec -it consul sh -c 'echo '"'"'{"service": {"name": "rabbitmq", "port": 5672}}'"'"' | tee /consul/config/rabbitmq.json'
docker exec -it consul consul reload

Running with a Hazelcast Server

A production environment will use Hazelcast for caching in the authorization service. Hazelcast is set up to run in a client/server mode where the authorization service runs a Hazelcast client which connect to a Hazelcast server (which is usually a cluster of servers). The dev profile template disables this client, forcing the configuration to use an embedded Hazelcast member that is only useful for testing. To test with the client/server form on Hazelcast you will need to run one or more Hazelcast server processes (to form a cluster) and configure the authorization service to run the Hazelcast client.

To tun the Hazelcast service, execute the following. Note that running the Hazelcast service requires Consul.

java -jar datawave-hazelcast-service/service/target/hazelcast-service-*-exec.jar --spring.profiles.active=dev,consul,nomessaging

Remove the ,nomessaging if you are using RabbitMQ. If you want to run more than one copy of the service, you will need to specify a different secure and non-secure port for each copy. For example, you could run a second and third copy with:

java -jar datawave-hazelcast-service/service/target/hazelcast-service-*-exec.jar --spring.profiles.active=dev,consul,nomessaging --cachePort=8843

and

java -jar datawave-hazelcast-service/service/target/hazelcast-service-*-exec.jar --spring.profiles.active=dev,consul,nomessaging --cachePort=8943

To configure the authorization service to run the Hazelcast client, you can comment-out or remove the following line from you authorization-dev.yml file:

hazelcast.client.enabled: ${hzClient:false}

Alternatively, you can set the value to true, or pass --hzClient=true on the command-line.

Building/Running with Docker

If you have Docker installed on your machine, then this demo can be built with Docker images by enabling the docker maven profile:

mvn -Pdocker clean package

Note that you must override the os detection since you might not be building on the same architecture that the Docker image will be running.

Then, using Docker Compose, you can run the demo:

cd docker-quickstart
docker-compose up -d

You can watch logs with:

docker-compose logs -f

And shut everything back down with:

docker-compose down

The docker-compose.yml file launches the following services:

  • Consul
  • RabbitMQ
  • Hazelcast Server
  • Authorization - configured with a mock provider that returns canned credentials for any caller
  • Audit

It will take a minute for the services to start and for the authorization service to download its configuration from the configuration service. You can run docker ps to find the port mappings for the various services. The authorization service and auditing service get randomly assigned ports, which allows scaling with docker-compose scale.

Directory/Build Layout

The external services are organized as follows. An executabl service (e.g., Authorization, Audit, Accumulo, Dictionary) all have two components: api and service. For example, datawave-authorization-service/api contains the components of the Authorization service that a client would use, and datawave-authorization-service/service contains the implementation of the service. The package as a whole is versioned using SemVer. If code in the service sub-folder changes, then typically this means it is an api-compatible change and only the minor or patch version need be updated. However, it should be updated for the module as a whole. This means the api module will receive a new version even though code in it hasn't changed. If code in the api module does change, then care must be taken. Typically the change in an API will involve the response objects. In this case, the response object should be copied and given a new class name with a version appended. The service must then be updated to have a new version endpoint that works with the new response object. The old and new api version must be maintained over enough releases to ensure that users have switched to the new service api version.

In addition to the common pattern for services, this directory also contains:

  • datawave-config-service: The configuration service. It has no API (other than that provided by Spring Boot), so there is no config-api module.
  • datawave-hazelcast-service: This is the Hazelcast cache service. Multiple copies of this are intended to be run and form a cluster for caching data. The hazelcast service contains two other sub-modules:
    • client: This is the client code required to access a running Hazelcast clustered cache.
    • common: This is code required by both the client and service.
  • datawave-metrics-reporter: This contains code to publish metrics from a Dropwizard metrics registry to StatsD, NSQ, or Timely.
  • datawave-accumulo-utils: General utilities for working with Accumulo and datawave marking functions
  • datawave-base-rest-responses: Base rest response and exception classes that are used by query/service responses
  • datawave-common-utils: Extremely low level utilities that might be used by any service (e.g., datawave-specific string utils)
  • datawave-type-utils: Normalizers and data types that are part of the datawave metadata and query system
  • datawave-spring-boot-starter: This is a service starter that can be used when creating a new microservice. See the "Common Service Starter" section above.
  • datawave-spring-boot-starter-audit: This is a service starter that provides an auto-configured REST client for submitting messages to the audit service.
  • datawave-spring-boot-starter-cache: This is a service starter that customizes the default spring boot caching layer. Since datawave services declare some named caches, this starter allows another cache configuration to be declared and marked as an @Primary bean in order to configure application-level caching.
  • docker-quickstart: This contains quickstart configuration for running the microservices using Docker compose.
  • sample_configuration: This contains example configuration that you might run the config service against in order to provide configuration for the various microservices.