Deploying Data Visualization Application to the Cloud: Docker Compose and Amazon Web Services

Introduction

Back in 2017-2018, data scientists were often defined as mere practitioners whose work was limited to data-related tasks: data ingestion, migration, storage, manipulation, analysis, and modeling. With the fast pace of technological growth and advancement, however, I believe a good data scientist should evolve accordingly. While asking a data scientist to scale their machine learning models to the cloud would not have been practical several years ago, I think a modern data scientist should at least have some basic knowledge of the infrastructure behind the deployment process.

To gear up with essential cloud computing and software deployment skills, I integrated one of my data visualization applications with Flask, a web development framework, and dockerized both components. Combined with AWS (Amazon Web Services), the deployed application now runs on the cloud 24/7. In this post, I will walk through the whole deployment process and break down the steps one by one, including building Docker containers, scaling services with Docker Compose, and deploying them on an AWS EC2 instance. The application is live here: http://18.219.26.131/.

Web Development

In my previous post, I introduced how to make interactive data visualizations with a Bokeh server. The extra options provided in the plot, such as shading and the radio check group, are supported by a Bokeh server running behind the scenes. To display the content returned by the Bokeh server on a webpage, we need a framework to hold the information and present it through a web browser. Various platforms provide this functionality, including Flask, Django, etc. Compared to Django, Flask is more lightweight and offers more freedom, keeping the core of the web application extensible and straightforward. Django, on the other hand, is a full-stack Python web framework that ships with many built-in modules, such as an admin interface and ORM database support. Since all I need is a simple layout to mount my Bokeh visualization, I chose Flask to build the webpage.

Essentially, Flask runs a local server that listens for inbound requests and sends webpage content back to the client. The following code sets up the Flask server and pulls the plot from our Bokeh server whenever a request comes in:

# import flask
from flask import Flask, render_template
from bokeh.embed import server_document
import sys, os
# instantiate the app
app = Flask(__name__)
# render the template
@app.route('/')
def index():
    # pull server session from the bokeh server
    vggm = server_document(url="http://localhost:5006/map")
    # return the rendered webpage
    return render_template("index.html", vggm=vggm)
# run the server
if __name__ == "__main__":
    app.run(host="0.0.0.0")


In the code, render_template renders index.html into the HTML page that the Flask server returns to the browser. How to write HTML and CSS is beyond the scope of this post; in short, index.html specifies the layout of the webpage, and the corresponding CSS file specifies the style of its content.
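For reference, a minimal index.html could look like the sketch below. The markup and the stylesheet path are illustrative assumptions (only the vggm variable comes from the Flask code above); note that the script tag returned by server_document must be rendered unescaped, hence Jinja's safe filter:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Data Visualization</title>
    <!-- stylesheet path is an assumption for this sketch -->
    <link rel="stylesheet" href="/static/style.css">
  </head>
  <body>
    <!-- vggm holds the <script> tag returned by server_document -->
    {{ vggm|safe }}
  </body>
</html>
```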

Docker Containers

Docker helps to overcome differences between platforms and operating systems when deploying software applications. A Docker image packs the application core and all its dependencies into one package that can run in any computing environment, as long as a Docker engine is installed. The environment for the application is installed and set up automatically from the Docker image; such a micro-environment running on top of the underlying operating system is called a Docker container. Figure 1 illustrates the structure of Docker-containerized applications.

Figure 1. Docker bridges the gap between the operating system and applications.

As we can see in Figure 1, multiple applications can run in their own Docker containers with different configurations, which can be a lifesaver when only one of the apps needs to be updated or modified. In this case, two Docker containers will run within one operating system, one for Flask and one for Bokeh, and new applications can be added to the group later as additional services.

To dockerize a particular application, once the Docker engine is installed on the local computer, one last component is needed: a Dockerfile. A Dockerfile is a list of instructions that Docker follows to build a Docker image. For example, the Dockerfile for the Flask application is shown below:

# from base image
FROM continuumio/miniconda3
# maintainer
LABEL maintainer="shihao1007@gmail.com"
# install packages
RUN conda install -y bokeh flask
# set the working directory to /app
WORKDIR /app
# copy the current directory contents into the working directory
COPY . /app
# make port 5000 available to the world outside the container
EXPOSE 5000
# run the flask app when the container launches
CMD python /app/flaskapp/app.py


The FROM instruction initializes the build process with a base image for the following instructions; here we choose the Miniconda3 base image. The RUN instruction installs all the packages this container needs via conda install. The rest of the instructions are explained by their comments.

The Dockerfile for the Bokeh application can be constructed similarly: add pandas and nltk to the packages to install and change the exposed port number. The Docker image can then be built by running docker build, and a container can be started from the image with docker run.
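As a sketch, the Bokeh Dockerfile might look like the following. The source path /app/bokehapp/map.py and the websocket-origin setting are assumptions about this project's layout, not confirmed details:

```dockerfile
# from base image
FROM continuumio/miniconda3
# install packages, adding pandas and nltk for the bokeh app
RUN conda install -y bokeh pandas nltk
# set the working directory to /app
WORKDIR /app
# copy the current directory contents into the working directory
COPY . /app
# make port 5006 available to the world outside the container
EXPOSE 5006
# run bokeh serve when the container launches; allow other origins
# to connect so the flask page can embed the plot
CMD bokeh serve /app/bokehapp/map.py --allow-websocket-origin="*"
```

An image can then be built with docker build -f bokehapp.Dockerfile -t bokehapp . and started with docker run -p 5006:5006 bokehapp (the image name bokehapp is arbitrary).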

Docker Compose

For a multi-functional service composed of multiple Docker containers, one could run docker run once per container, but this becomes tedious and hard to scale as the number of containers grows. Fortunately, we have Docker Compose, a tool for defining and running multi-container Docker applications. Following the official documentation, using Docker Compose involves three steps:

  1. Define individual Dockerfiles for each application.
  2. Define services in docker-compose.yaml to associate all docker containers.
  3. Run docker-compose up.

With step 1 done previously, the next step is to write the docker-compose.yaml file:

version: '2'
services:
  bokehapp:
    build:
      context: .
      dockerfile: bokehapp.Dockerfile
    ports:
      - '5006:5006'
    volumes:
      - './:/app'
  flaskapp:
    depends_on:
      - bokehapp
    build:
      context: .
      dockerfile: flaskapp.Dockerfile
    volumes:
      - './:/app'
    ports:
      - '80:5000'


As shown in the code:

  • version: '2' specifies the Compose file format version of this .yaml file. Different format versions support different syntax and features compared to older versions.
  • services specifies all the containers to be launched.
  • Within each service (container), build specifies how to build the Docker image for that container: context sets the build context (here, the current directory), and dockerfile names the Dockerfile to build from.
  • ports binds a host port to a container port. Here, host port 5006 is mapped to port 5006 on the Bokeh container; similarly, host port 80 is mapped to the Flask container's port 5000. Once the containers are running, we can access the server in each container through the host ports (e.g., 5006 for the Bokeh server and 80 for the Flask server).
  • volumes maps the current working directory on the host to the /app directory inside each container, so both containers share it.

The .yaml file works with the file structure illustrated in Figure 2; the current working directory is the vggm-webapp folder.

Figure 2. File Structure for the application.

As shown in Figure 2, running the docker-compose up command in the terminal makes Docker create two containers: one for Bokeh, using map.py as the source file, and one for Flask, using app.py. The two containers are then wired together by the settings in docker-compose.yaml. The webpage, with the Bokeh plot mounted by Flask, is now available at localhost:80.
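Assuming Docker and Docker Compose are installed locally, a quick way to bring the stack up and sanity-check it (not runnable outside a Docker environment) is:

```shell
# build the images (if needed) and start both containers in the background
docker-compose up --build -d
# list the running services and their port mappings
docker-compose ps
# confirm the flask server answers on host port 80
curl -I http://localhost:80
```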

To the cloud

There are numerous ways to deploy the application to the cloud, including Microsoft Azure, Google Cloud, and Amazon Web Services. Azure provides excellent services for applications running on Windows or tied to Microsoft Office, SQL Server, etc. Google Cloud is a better fit for applications involving big data and advanced machine learning. AWS provides a wide variety of cloud computing services and many free-tier-eligible offerings, so I chose AWS for my small visualization application.

Specifically, I deployed the application using Amazon Elastic Compute Cloud (EC2), following these steps:

  1. Launch an EC2 t2.micro instance with Ubuntu Server 16.04 LTS (HVM), SSD Volume Type AMI (Amazon Machine Image).
  2. Configure inbound rules in the security group so that port 80 is enabled for hearing requests.
  3. Connect to the instance using ssh.
  4. Install Docker and git on the instance.
  5. Pull the repository.
  6. Run docker-compose up -d.
  7. Close the terminal to disconnect from the instance.

Now the services are up and running in detached mode (enabled by the -d flag in the last step). That simple.
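Concretely, steps 3-7 might look like the following terminal session. The key file, instance address, and repository URL below are placeholders rather than the real values, and the package names assume a typical Ubuntu 16.04 setup; adapt them to your own instance:

```shell
# step 3: connect to the instance (placeholder key file and address)
ssh -i my-key.pem ubuntu@ec2-xx-xx-xx-xx.us-east-2.compute.amazonaws.com

# step 4: install docker, docker-compose, and git on the instance
sudo apt-get update
sudo apt-get install -y docker.io docker-compose git

# step 5: pull the repository (placeholder URL)
git clone https://github.com/<username>/vggm-webapp.git
cd vggm-webapp

# step 6: start both containers in detached mode
sudo docker-compose up -d

# step 7: close the terminal; the containers keep running on the instance
exit
```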

Conclusion

Docker containers are handy when deploying software applications across different platforms. Combined with cloud computing infrastructure such as AWS, it becomes much easier to deploy and scale an app. I learned a lot while deploying my little data visualization application to the cloud, from dockerizing applications to launching AWS EC2 instances.

I hope this post can help you, as well.

Lastly, the code for this project can be found in this repository.

Thank you for reading 🙂