# HTML/URL Render Service
The HTML/URL Render Service is a suite of containers that securely renders HTML and URL Observables into images. The system is designed so that, depending on deployment, even malicious content can be rendered safely. Once rendered, the images can be served through ACE's GUI so analysts can triage quickly.
## Local Development
The suite of containers is orchestrated through either Docker Compose or Docker Swarm. Running the suite through Swarm allows for automatic replication of expired Renderer containers, at the expense of a single log file. You can run the service through either of these, but understand that with `docker-compose`, only a single render job will be processed.
### Docker
Docker is configured to use the `render2/src/shared` folder as the build context. Dockerfiles can be found at:

- `render2/src/controller/Dockerfile`
- `render2/src/nginx/Dockerfile`
- `render2/src/renderer/Dockerfile`
The containers that are deployed are based on the following DockerHub images:
- nginx: `nginx:alpine`
- controller: `tiangolo/uvicorn-gunicorn-fastapi`
- redis: `redis:latest`
- renderer: `selenium/standalone-chrome`
### Docker Compose
The `docker-compose.yml` file is at `render2/`. To run the containers locally through `docker-compose`, follow these steps:
```shell
cd render2
docker-compose build
docker-compose up
```
### Swarm
The `swarm-stack.yml` file is at `render2/`. To run the containers locally through Swarm, use the `render2/swarm.sh` script, and follow these steps:

```shell
cd render2
./swarm.sh start
```
You can force Swarm to rebuild all the container images with `./swarm.sh start force`.
### Pytest
Tests must be run from the root `ace` directory! All tests related to the renderer are in `tests/render2`. Tests span both `unit` and `integration`. Tests marked `integration` will spin up containers through Docker, so you will need to ensure that you are not running any containers locally that would conflict. Integration tests also have a longer runtime, due to container start/stop time (and potentially build time).
If you want the renderer to use a web proxy while running tests:
1. Run `cd /path/to/ice && cp /path/to/ice/.env.template /path/to/ice/.env`
2. Fill in the variables. DO NOT commit the `.env` file to version control.
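For illustration, a filled-in `.env` might look like the following. The variable names are taken from the proxy settings the Renderer reads at runtime; the exact contents of `.env.template` may differ, and the values here are placeholders:

```
PROXY_HOST=proxy.example.internal
PROXY_PORT=3128
PROXY_USER=
PROXY_PASS=
```

`PROXY_USER` and `PROXY_PASS` can be left blank if the proxy does not require authentication.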
## Reverse Proxy
The nginx proxy serves two functions:
- It terminates HTTPS and performs client-certificate authentication.
- It reverse-proxies requests to one or more running Controller containers.
You can pass Base64-encoded strings of the certificates as environment variables to dynamically set the server certificates, as well as the CA for the client certificates:

- `NGINX_SERVER_NAME` - The host as seen from the client's perspective
- `NGINX_X509_PRIVATE_KEY_B64` - Base64-encoded string of the private x509 server key
- `NGINX_X509_PUBLIC_CERT_B64` - Base64-encoded string of the public x509 server cert
- `CLIENT_CERT_CA` - Base64-encoded x509 public certificate that signed the client certificates (used for auth)
- `UVICORN_HOST` - Hostname/IP of the Controller container
- `UVICORN_PORT` - Port the Controller is listening on

These values are dynamically injected into the `nginx.conf` file at container startup.
The nginx container listens on port 8443 by default.
There is a helper script to generate your own server x509 certs, CA, and client-signed certs. Make sure you have openssl installed:

```shell
cd /path/to/ice/render2/src/nginx
./gen_test_certs.sh
```
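Once certs exist on disk, they can be packed into the Base64 environment variables described above. The snippet below demonstrates the round trip on a dummy key file; the `server.key` file name is hypothetical and depends on what `gen_test_certs.sh` actually produces:

```shell
# Create a dummy key file so the example is self-contained.
printf 'dummy-key-material' > server.key

# Base64-encode the file into the env var nginx expects, stripping newlines
# so the value survives being passed through Docker environment handling.
NGINX_X509_PRIVATE_KEY_B64="$(base64 < server.key | tr -d '\n')"
export NGINX_X509_PRIVATE_KEY_B64
```

The same pattern applies to `NGINX_X509_PUBLIC_CERT_B64` and `CLIENT_CERT_CA`.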
## Controller
The Controller is implemented as a FastAPI HTTP API. It uses several `pydantic` models to perform validation on requests and data:

- `JobIn`
- `JobOut`
It has the following endpoints:
| Endpoint | HTTP | Request Body | Response Body | Codes |
|---|---|---|---|---|
| `job/` | POST | `{"content_type": "url", "content": "https://www.google.com", "output_type": "redis", "width": 1024, "height": 1024}` | `{"id": "b5530f79-9084-394d-b8d0-8e1ec8c77dc5", "content_type": "url", "content": "https://www.google.com", "output_type": "redis", "width": 0, "height": 0, "status": "queued", "data": null}` | 201, 409, 422, 500 |
| `job/{job_id}` | GET | | `{"id": "b5530f79-9084-394d-b8d0-8e1ec8c77dc5", "content_type": "url", "content": "https://www.google.com", "output_type": "redis", "width": 0, "height": 0, "status": "processing", "data": "base64string"}` | 404, 500 |
| `job/{job_id}` | DELETE | | | 200 |
| `ping/` | GET | | | 200 |
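For illustration, a client might build and sanity-check the POST body shown above before submitting it. The allowed `content_type` and `output_type` values are taken from the job schema later in this document; the helper function itself is hypothetical and not part of the service:

```python
import json

def build_job_request(content, content_type, output_type="redis",
                      width=1024, height=1024):
    """Build the JSON body for POST job/, rejecting obviously bad values."""
    if content_type not in ("url", "html"):
        raise ValueError(f"unsupported content_type: {content_type}")
    if output_type not in ("file", "redis"):
        raise ValueError(f"unsupported output_type: {output_type}")
    return json.dumps({
        "content_type": content_type,
        "content": content,
        "output_type": output_type,
        "width": width,
        "height": height,
    })

payload = build_job_request("https://www.google.com", "url")
```

Server-side, the `JobIn` pydantic model performs the authoritative validation; this is only a client-side convenience.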
The Controller container can be configured with the following environment variables:
- `REDIS_HOST` - Hostname or IP of the redis database
- `REDIS_PORT` - Port to use for Redis
- `REDIS_DB` - Database to use on the Redis host. NOTE: this needs to be the same for the controller and renderer containers.
- `JOB_QUEUE_KEY` - The name of the job queue. NOTE: should be set to the same value for the controller and renderer.
- `PORT` - The port the controller should listen on. NOTE: should be set to the same value as `UVICORN_PORT` on the nginx container. Port `8080` is preferred.
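As an illustration, a compose service definition wiring these variables together might look like the following. The service name and values are hypothetical (the queue key mirrors the `render:queue:incoming` key described in the Redis Schema section):

```yaml
services:
  controller:
    environment:
      REDIS_HOST: redis
      REDIS_PORT: "6379"
      REDIS_DB: "0"
      JOB_QUEUE_KEY: "render:queue:incoming"
      PORT: "8080"
```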
### Documentation
Once the controller is deployed, it automatically provisions interactive API documentation at FastAPI's default endpoints (`/docs` for Swagger UI and `/redoc` for ReDoc).
### Postman
A full suite of Postman requests and tests is included. Due to client-based certificate authentication, you will need to configure Postman to use client certificates before sending requests. If cert-based authentication is not configured, no requests will be served!

You can import the test suite into Postman from `tests/render2/RenderTests.postman_collection.json`. Within the Collection are several folders:
- Happy Path URL - this suite replicates the ACE Render Analysis Module's behavior for a job containing just a URL:
    - POST a new job.
    - GET the newly created job until the job status resolves.
    - DELETE the job.
- Happy Path HTML - this suite replicates the ACE Render Analysis Module's behavior for a job containing pre-crawled HTML, similar to the above.
- Sad Path - this suite tests all of the controller's data validation for incorrect values.
- Misc - this suite contains miscellaneous requests, such as the `/ping` heartbeat and redirect behavior.
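The Happy Path flow (POST, poll GET until the status resolves, DELETE) can be sketched as follows. `fetch_status` stands in for the HTTP GET to `job/{job_id}` and is a hypothetical stub, not part of the service:

```python
import time

# Terminal statuses from the job schema; anything else means "keep polling".
TERMINAL_STATUSES = {"completed", "failed"}

def poll_until_resolved(fetch_status, job_id, interval=0.01, max_attempts=50):
    """Repeatedly fetch a job's status until it reaches a terminal state.

    fetch_status(job_id) stands in for an HTTP GET to job/{job_id} and
    should return the job's current status string.
    """
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not resolve")

# Simulated controller: reports 'queued', then 'in_progress', then 'completed'.
_responses = iter(["queued", "in_progress", "completed"])
result = poll_until_resolved(lambda job_id: next(_responses), "b5530f79")
```

In the real collection, Postman's test scripts drive this loop against the deployed controller.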
## Redis
Redis is configured to communicate with both the Controller and the Renderer.
### Configuration
Redis currently uses the following parameters:
- Port: 6379
- DB: 0
### Redis Schema
The redis schema uses the following keys:

- `render:queue:incoming`: stores a `Redis.List` of all new `job_id`s that are waiting to be processed
- `render:job:<job_id>`: stores each job's hashmap
### Job Hashmap
A job in redis is represented as follows:
```
{
    "id": "string"           - Value is the job id
    "content": "string"      - Value should be the URL or the HTML string
    "content_type": "string" - Value should be 'url' or 'html'
    "output_type": "string"  - Value should be 'file' or 'redis'
    "output_name": "string"  - Value should be a file path + file name (e.g. /path/to/screenshot.png), a redis key to store the binary at, or an S3 object path
    "width": integer         - Screenshot width in pixels
    "height": integer        - Screenshot height in pixels
    "status": "string"       - Value should be one of: queued, in_progress, completed, failed
}
```
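A sketch of building a valid job hashmap in Python, following the field semantics above. The `make_job` helper is hypothetical, for illustration only:

```python
import uuid

# Allowed values per the job schema above.
VALID_CONTENT_TYPES = {"url", "html"}
VALID_OUTPUT_TYPES = {"file", "redis"}
VALID_STATUSES = {"queued", "in_progress", "completed", "failed"}

def make_job(content, content_type, output_type, output_name, width, height):
    """Build a job hashmap matching the schema; raise on invalid values."""
    if content_type not in VALID_CONTENT_TYPES:
        raise ValueError(f"bad content_type: {content_type}")
    if output_type not in VALID_OUTPUT_TYPES:
        raise ValueError(f"bad output_type: {output_type}")
    return {
        "id": str(uuid.uuid4()),
        "content": content,
        "content_type": content_type,
        "output_type": output_type,
        "output_name": output_name,
        "width": width,
        "height": height,
        "status": "queued",  # new jobs start in the queued state
    }

job = make_job("https://www.google.com", "url", "redis",
               "render:job:output", 1024, 1024)
```

In the deployed service this hashmap lives at `render:job:<job_id>` in Redis.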
## Renderer
The Renderer executes the following steps:

- Checks for a `job_id` in the queue. If the queue is empty, the renderer `sleep()`s.
- Takes a job, extracts the necessary information, and sends it to the remote selenium webdriver.
- Receives the rendered picture data and stores it.

It abstracts away the storage method through the `BaseStorage` class and its subclasses.

See more at Renderer lifecycle.
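The steps above can be sketched as a simple loop. This is an illustrative stand-in, not the actual implementation: a plain list replaces the Redis queue, and `render`/`store` are hypothetical callables standing in for the selenium webdriver and a `BaseStorage` subclass:

```python
import time

def render_loop(queue, render, store, idle_sleep=0.01, max_idle=3):
    """Sketch of the Renderer's main loop over an in-memory queue stand-in.

    queue  - list of job dicts standing in for the Redis job queue
    render - callable(job) -> bytes, standing in for the selenium webdriver
    store  - callable(job, data), standing in for a BaseStorage subclass
    """
    idle = 0
    while idle < max_idle:
        if not queue:
            idle += 1          # queue empty: sleep and check again
            time.sleep(idle_sleep)
            continue
        idle = 0
        job = queue.pop(0)     # take the oldest job off the queue
        job["status"] = "in_progress"
        data = render(job)     # send the job to the (stand-in) webdriver
        store(job, data)       # persist the rendered picture data
        job["status"] = "completed"

stored = []
jobs = [{"id": "abc", "content": "<p>hi</p>", "status": "queued"}]
render_loop(jobs, render=lambda job: b"\x89PNG",
            store=lambda job, data: stored.append(data))
```

The real loop runs indefinitely; `max_idle` exists here only so the sketch terminates.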
Environment variables at runtime:
- `REDIS_HOST` - Hostname or IP of the redis database
- `REDIS_PORT` - Port to use for Redis
- `REDIS_DB` - Database to use on the Redis host. NOTE: this needs to be the same for the controller and renderer containers.
- `JOB_QUEUE_KEY` - The name of the job queue. NOTE: should be set to the same value for the controller and renderer.
- `PROXY_HOST` - Hostname of the proxy, if applicable. If this environment variable is not populated, the proxy settings are not set.
- `PROXY_PORT` - Port for the proxy, if applicable
- `PROXY_USER` - Username for the proxy, if applicable (not required to use the proxy)
- `PROXY_PASS` - Password for the proxy, if applicable (not required to use the proxy)
### Data Storage
The rendered picture data is designed to be stored in a variety of ways. Currently, data is stored as a `base64`-encoded string directly in the Redis job hashmap. This could eventually be extended to file storage or S3 bucket storage.
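The round trip looks like this; the picture bytes below are fake placeholder data:

```python
import base64

# Hypothetical raw screenshot bytes produced by the webdriver.
picture_data = b"\x89PNG\r\n\x1a\n...fake image bytes..."

# Encode to an ASCII-safe string before writing it into the job hashmap,
# since job fields are stored as strings in Redis.
encoded = base64.b64encode(picture_data).decode("ascii")

# A consumer (e.g. ACE's GUI) decodes it back into image bytes.
decoded = base64.b64decode(encoded)
```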
## Shared code
Some code is shared across containers.
### JobQueue
The `JobQueue` class abstracts away Redis, allowing both the Controller and Renderer to access jobs through a single API without needing direct access to Redis.
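To show the shape of such an abstraction, here is an in-memory stand-in. The method names (`enqueue`/`dequeue`) are hypothetical and not taken from the actual `JobQueue` code; the real class wraps the Redis structures (the incoming `job_id` list plus one hashmap per job):

```python
from collections import deque

class InMemoryJobQueue:
    """Illustrative stand-in for the shared JobQueue abstraction."""

    def __init__(self):
        self._incoming = deque()   # stands in for render:queue:incoming
        self._jobs = {}            # stands in for render:job:<job_id> hashmaps

    def enqueue(self, job):
        """Store the job hashmap and push its id onto the incoming queue."""
        self._jobs[job["id"]] = job
        self._incoming.append(job["id"])

    def dequeue(self):
        """Pop the oldest job_id and return its hashmap, or None if empty."""
        if not self._incoming:
            return None
        return self._jobs[self._incoming.popleft()]

queue = InMemoryJobQueue()
queue.enqueue({"id": "job-1", "status": "queued"})
job = queue.dequeue()
```

Callers in the Controller and Renderer only see this API shape, never raw Redis commands.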
### CachedJob
The Renderer creates a `CachedJob` in its main `run()` loop, which grabs a job off the queue and stores it for the lifetime of the Renderer.
### Logging
Each module has its own logger, and each logger uses a special `TruncatingFormatter` object. This prevents logged messages from exceeding a maximum size (`MAX_BYTES` in `render2/src/shared/shared_logging.py`).
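A minimal sketch of what such a formatter might look like; the real implementation lives in `shared_logging.py` and may differ, and the `MAX_BYTES` value here is illustrative:

```python
import logging

MAX_BYTES = 64  # illustrative limit; the real value lives in shared_logging.py

class TruncatingFormatter(logging.Formatter):
    """Sketch of a formatter that caps each formatted record at MAX_BYTES."""

    def format(self, record):
        message = super().format(record)
        encoded = message.encode("utf-8")
        if len(encoded) > MAX_BYTES:
            # Drop bytes past the limit; 'ignore' discards any split character.
            message = encoded[:MAX_BYTES].decode("utf-8", errors="ignore")
        return message

formatter = TruncatingFormatter("%(message)s")
record = logging.LogRecord("demo", logging.INFO, __file__, 1,
                           "x" * 200, None, None)
formatted = formatter.format(record)
```

Truncating at the byte level (rather than character count) keeps oversized multi-byte messages within the cap as well.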