Advanced Docker Learning

Learning with the Docker getting-started tutorial

Posted by Elli0t on 2020-05-11

Docker Getting Started

Persisting our DB

We are going to use a named volume. Think of a named volume as simply a bucket of data. Docker maintains the physical location on the disk and you only need to remember the name of the volume. Every time you use the volume, Docker will make sure the correct data is provided.

  1. Create a volume by using the docker volume create command.
    docker volume create todo-db

  2. Start the todo container, but add the -v flag to specify a volume mount. We will use the named volume and mount it to /etc/todos, which will capture all files created at the path.

    docker run -dp 3000:3000 -v todo-db:/etc/todos getting-started

  3. Once the container starts up, open the app and add a few items to your todo list.

    (Screenshot: items added to the todo list)

  4. Remove the container for the todo app. Use docker ps to get the ID and then docker rm -f to remove it.

  5. Start a new container using the same command from above.

  6. Open the app. You should see your items still in your list!

  7. Go ahead and remove the container when you’re done checking out your list.
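The start and remove steps above can be wrapped in two small shell functions. This is only a sketch under a few assumptions: the container name todo-test is made up so that no ID lookup with docker ps is needed, the run helper with its DRY_RUN switch is hypothetical, and the getting-started image is assumed to be built already.

```shell
# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Start the todo app with the named volume mounted at /etc/todos.
# "todo-test" is an assumed container name, not part of the tutorial.
start_todo() {
    run docker run -d --name todo-test -p 3000:3000 \
        -v todo-db:/etc/todos getting-started
}

# Force-remove the container; the todo-db volume is left untouched.
remove_todo() {
    run docker rm -f todo-test
}
```

A session would then be: start_todo, add items in the browser, remove_todo, start_todo again, and check that the items are still listed. Setting DRY_RUN=1 first only prints the commands.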

Diving into our Volume

If you want to know where Docker actually stores your data when you use a named volume, you can use the docker volume inspect command.

$ docker volume inspect todo-db
[
    {
        "CreatedAt": "2019-09-26T02:18:36Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/todo-db/_data",
        "Name": "todo-db",
        "Options": {},
        "Scope": "local"
    }
]

The Mountpoint is the actual location on the disk where the data is stored. Note that on most machines, you will need to have root access to access this directory from the host. But, that’s where it is!
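When only the path is of interest, docker volume inspect also accepts a Go-template --format option. A small sketch, where the function name volume_mountpoint and the run dry-run helper are made up for illustration:

```shell
# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Print only the host path that backs a named volume.
# $1 = volume name
volume_mountpoint() {
    run docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

For the volume shown above, volume_mountpoint todo-db should print the same Mountpoint value that appears in the JSON output.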

Multi-Container Apps

Docker networking

Container Networking

Remember that containers, by default, run in isolation and don’t know anything about other processes or containers on the same machine. So, how do we allow one container to talk to another? The answer is networking. Now, you don’t have to be a network engineer (hooray!). Simply remember this rule…

If two containers are on the same network, they can talk to each other. If they aren’t, they can’t.

Starting MySQL

There are two ways to put a container on a network: 1) Assign it at start or 2) connect an existing container. For now, we will create the network first and attach the MySQL container at startup.

  1. Create the network.

    docker network create todo-app
  2. Start a MySQL container and attach it to the network. We’re also going to define a few environment variables that the database will use to initialize itself (see the “Environment Variables” section in the MySQL Docker Hub listing).

    docker run -d \
    --network todo-app --network-alias mysql \
    -v todo-mysql-data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD=secret \
    -e MYSQL_DATABASE=todos \
    mysql:5.7

    You’ll also see we specified the --network-alias flag. We’ll come back to that in just a moment.

    Pro-tip

    You’ll notice we’re using a volume named todo-mysql-data here and mounting it at /var/lib/mysql, which is where MySQL stores its data. However, we never ran a docker volume create command. Docker recognizes we want to use a named volume and creates one automatically for us.

  3. To confirm we have the database up and running, connect to the database and verify it connects.

    docker exec -it <mysql-container-id> mysql -p

    When the password prompt comes up, type in secret. In the MySQL shell, list the databases and verify you see the todos database.

    mysql> SHOW DATABASES;

    You should see output that looks like this:

    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mysql              |
    | performance_schema |
    | sys                |
    | todos              |
    +--------------------+
    5 rows in set (0.00 sec)

    Hooray! We have our todos database and it’s ready for us to use!
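The other route mentioned at the start of this section, connecting a container that is already running, goes through docker network connect. A hedged sketch: the function name connect_existing and the run dry-run helper are hypothetical, and the container argument stands for whatever name or ID docker ps reports.

```shell
# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Attach an already-running container to a network under a DNS alias.
# $1 = network, $2 = container name or ID, $3 = alias other containers will use
connect_existing() {
    run docker network connect --alias "$3" "$1" "$2"
}
```

Calling connect_existing todo-app <mysql-container-id> mysql gives the container the mysql alias on the todo-app network, much like the --network and --network-alias flags do at startup.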

Connecting to MySQL

Now that we know MySQL is up and running, let’s use it! But, the question is… how? If we run another container on the same network, how do we find the container (remember each container has its own IP address)?

To figure it out, we’re going to make use of the nicolaka/netshoot container, which ships with a lot of tools that are useful for troubleshooting or debugging networking issues.

  1. Start a new container using the nicolaka/netshoot image. Make sure to connect it to the same network.

    docker run -it --network todo-app nicolaka/netshoot
  2. Inside the container, we’re going to use the dig command, which is a useful DNS tool. We’re going to look up the IP address for the hostname mysql.

    dig mysql

    And you’ll get an output like this…

    ; <<>> DiG 9.14.1 <<>> mysql
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32162
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;mysql. IN A

    ;; ANSWER SECTION:
    mysql. 600 IN A 172.23.0.2

    ;; Query time: 0 msec
    ;; SERVER: 127.0.0.11#53(127.0.0.11)
    ;; WHEN: Tue Oct 01 23:47:24 UTC 2019
    ;; MSG SIZE rcvd: 44

    In the “ANSWER SECTION”, you will see an A record for mysql that resolves to 172.23.0.2 (your IP address will most likely have a different value). While mysql isn’t normally a valid hostname, Docker was able to resolve it to the IP address of the container that had that network alias (remember the --network-alias flag we used earlier?).

    What this means is… our app simply needs to connect to a host named mysql and it’ll talk to the database! It doesn’t get much simpler than that!

Running our App with MySQL

The todo app supports the setting of a few environment variables to specify MySQL connection settings. They are:

  • MYSQL_HOST - the hostname for the running MySQL server
  • MYSQL_USER - the username to use for the connection
  • MYSQL_PASSWORD - the password to use for the connection
  • MYSQL_DB - the database to use once connected

Warning

While using env vars to set connection settings is generally ok for development, it is HIGHLY DISCOURAGED when running applications in production. Diogo Monica, the former lead of security at Docker, wrote a fantastic blog post explaining why.

A more secure mechanism is to use the secret support provided by your container orchestration framework. In most cases, these secrets are mounted as files in the running container. You’ll see many apps (including the MySQL image and the todo app) also support env vars with a _FILE suffix to point to a file containing the value.

As an example, setting the MYSQL_PASSWORD_FILE var will cause the app to use the contents of the referenced file as the connection password. Docker doesn’t do anything to support these env vars. Your app will need to know to look for the variable and get the file contents.
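Since the lookup is left to the application, it has to be written somewhere. Below is a minimal POSIX shell sketch of how an entrypoint script might resolve such a setting. The name read_setting is hypothetical; it is not part of Docker, the MySQL image, or the todo app.

```shell
# Hypothetical helper: print the value of a setting, preferring NAME_FILE over NAME.
# Usage: read_setting MYSQL_PASSWORD
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
read_setting() (
    name="$1"
    eval "file=\${${name}_FILE:-}"   # path from e.g. MYSQL_PASSWORD_FILE, if set
    eval "value=\${${name}:-}"       # plain value from e.g. MYSQL_PASSWORD
    if [ -n "$file" ]; then
        cat "$file"                  # secret mounted as a file
    else
        printf '%s\n' "$value"       # fall back to the plain env var
    fi
)
```

A script could then write password="$(read_setting MYSQL_PASSWORD)" and work with either variable. The setting name is passed through eval, so it should only ever come from the script itself, never from user input.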

With all of that explained, let’s start our dev-ready container!

  1. We’ll specify each of the environment variables above, as well as connect the container to our app network.

    docker run -dp 3000:3000 \
    -w /app -v ${PWD}:/app \
    --network todo-app \
    -e MYSQL_HOST=mysql \
    -e MYSQL_USER=root \
    -e MYSQL_PASSWORD=secret \
    -e MYSQL_DB=todos \
    node:12-alpine \
    sh -c "yarn install && yarn run dev"
  2. If we look at the logs for the container (docker logs <container-id>), we should see a message indicating it’s using the mysql database.

    # Previous log messages omitted
    $ nodemon src/index.js
    [nodemon] 1.19.2
    [nodemon] to restart at any time, enter `rs`
    [nodemon] watching dir(s): *.*
    [nodemon] starting `node src/index.js`
    Connected to mysql db at host mysql
    Listening on port 3000
  3. Open the app in your browser and add a few items to your todo list.

  4. Connect to the mysql database and prove that the items are being written to the database. Remember, the password is secret.

    docker exec -ti <mysql-container-id> mysql -p todos

    And in the mysql shell, query the table that stores the items (todo_items in the getting-started app):

    mysql> select * from todo_items;

Obviously, your table will look different because it has your items. But, you should see them stored there!

If you take a quick look at the Docker Dashboard, you’ll see that we have two app containers running. But, there’s no real indication that they are grouped together in a single app. We’ll see how to make that better shortly!

(Screenshot: Docker Dashboard showing two ungrouped app containers)

Recap

At this point, we have an application that now stores its data in an external database running in a separate container. We learned a little bit about container networking and saw how service discovery can be performed using DNS.

But, there’s a good chance you are starting to feel a little overwhelmed with everything you need to do to start up this application. We have to create a network, start containers, specify all of the environment variables, expose ports, and more! That’s a lot to remember and it’s certainly making things harder to pass along to someone else.

In the next section, we’ll talk about Docker Compose. With Docker Compose, we can share our application stacks in a much easier way and let others spin them up with a single (and simple) command!

Docker Compose (making everything easy)

Docker Compose is a tool that was developed to help define and share multi-container applications. With Compose, we can create a YAML file to define the services and with a single command, can spin everything up or tear it all down.

The big advantage of using Compose is you can define your application stack in a file, keep it at the root of your project repo (it’s now version controlled), and easily enable someone else to contribute to your project. Someone would only need to clone your repo and start the compose app. In fact, you might see quite a few projects on GitHub/GitLab doing exactly this now.

So, how do we get started?

Run the following to see the version information:

docker-compose version

Creating our Compose File

At the root of the app project, create a file named docker-compose.yml (or docker-compose.yaml) with the following contents:

version: "3.7"

services:
  app:
    image: node:12-alpine
    command: sh -c "yarn install && yarn run dev"
    ports:
      - 3000:3000
    working_dir: /app
    volumes:
      - ./:/app
    environment:
      MYSQL_HOST: mysql
      MYSQL_USER: root
      MYSQL_PASSWORD: secret
      MYSQL_DB: todos

  mysql:
    image: mysql:5.7
    volumes:
      - todo-mysql-data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: secret
      MYSQL_DATABASE: todos

volumes:
  # Named volume: concise, but the file does not show where the data lives on the host.
  # Look it up with docker volume inspect (Compose prefixes the volume name with the project name).
  todo-mysql-data:
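With the file in place, the stack is started from the project root. Compose creates a network for the app automatically and uses each service name as a network alias, which is why no network is defined in the file and the app can still reach the database at mysql. The commands below are a sketch; the function names and the run dry-run helper are hypothetical.

```shell
# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Start every service defined in docker-compose.yml in the background.
stack_up() {
    run docker-compose up -d
}

# Follow the combined logs of all services (Ctrl+C stops following).
stack_logs() {
    run docker-compose logs -f
}
```

Running stack_up and then stack_logs should eventually show the same "Connected to mysql db at host mysql" message seen earlier, this time interleaved with the MySQL startup output.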


Tearing it All Down

When you’re ready to tear it all down, simply run docker-compose down or hit the trash can on the Docker Dashboard for the entire app. The containers will stop and the network will be removed.

Removing Volumes

By default, named volumes in your compose file are NOT removed when running docker-compose down. If you want to remove the volumes, you will need to add the --volumes flag.

The Docker Dashboard does not remove volumes when you delete the app stack.

Once torn down, you can switch to another project, run docker-compose up and be ready to contribute to that project! It really doesn’t get much simpler than that!

Image Building Best Practices

Image Layering

Did you know that you can look at what makes up an image? Using the docker image history command, you can see the command that was used to create each layer within an image.

  1. Use the docker image history command to see the layers in the getting-started image you created earlier in the tutorial.

    docker image history getting-started

    You should get output that looks something like this (dates/IDs may be different).

    IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
    a78a40cbf866        18 seconds ago      /bin/sh -c #(nop) CMD ["node" "/app/src/ind…    0B
    f1d1808565d6        19 seconds ago      /bin/sh -c yarn install --production            85.4MB
    a2c054d14948        36 seconds ago      /bin/sh -c #(nop) COPY dir:5dc710ad87c789593…   198kB
    9577ae713121        37 seconds ago      /bin/sh -c #(nop) WORKDIR /app                  0B
    b95baba1cfdb        13 days ago         /bin/sh -c #(nop) CMD ["node"]                  0B
    <missing>           13 days ago         /bin/sh -c #(nop) ENTRYPOINT ["docker-entry…    0B
    <missing>           13 days ago         /bin/sh -c #(nop) COPY file:238737301d473041…   116B
    <missing>           13 days ago         /bin/sh -c apk add --no-cache --virtual .bui…   5.35MB
    <missing>           13 days ago         /bin/sh -c #(nop) ENV YARN_VERSION=1.21.1       0B
    <missing>           13 days ago         /bin/sh -c addgroup -g 1000 node && addu…       74.3MB
    <missing>           13 days ago         /bin/sh -c #(nop) ENV NODE_VERSION=12.14.1      0B
    <missing>           13 days ago         /bin/sh -c #(nop) CMD ["/bin/sh"]               0B
    <missing>           13 days ago         /bin/sh -c #(nop) ADD file:e69d441d729412d24…   5.59MB

    Each of the lines represents a layer in the image. The display here shows the base at the bottom with the newest layer at the top. Using this, you can also quickly see the size of each layer, helping diagnose large images.

  2. You’ll notice that several of the lines are truncated. If you add the --no-trunc flag, you’ll get the full output (yes… funny how you use a truncated flag to get untruncated output, huh?)

    docker image history --no-trunc getting-started

Layer Caching

Now that you’ve seen the layering in action, there’s an important lesson to learn to help decrease build times for your container images.

Once a layer changes, all downstream layers have to be recreated as well

Let’s look at the Dockerfile we were using one more time…

FROM node:12-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
CMD ["node", "/app/src/index.js"]

Going back to the image history output, we see that each command in the Dockerfile becomes a new layer in the image. You might remember that when we made a change to the image, the yarn dependencies had to be reinstalled. Is there a way to fix this? It doesn’t make much sense to ship around the same dependencies every time we build, right?

To fix this, we need to restructure our Dockerfile to help support the caching of the dependencies. For Node-based applications, those dependencies are defined in the package.json file. So, what if we copied only that file in first, installed the dependencies, and then copied in everything else? Then, we only recreate the yarn dependencies if there was a change to the package.json. Make sense?

  1. Update the Dockerfile to copy in the package.json first, install dependencies, and then copy everything else in.

    FROM node:12-alpine
    WORKDIR /app
    COPY package.json yarn.lock ./
    RUN yarn install --production
    COPY . .
    CMD ["node", "/app/src/index.js"]
  2. Build a new image using docker build.

    docker build -t getting-started .

    You should see output like this…

    Sending build context to Docker daemon  219.1kB
    Step 1/6 : FROM node:12-alpine
    ---> b0dc3a5e5e9e
    Step 2/6 : WORKDIR /app
    ---> Using cache
    ---> 9577ae713121
    Step 3/6 : COPY package.json yarn.lock ./
    ---> bd5306f49fc8
    Step 4/6 : RUN yarn install --production
    ---> Running in d53a06c9e4c2
    yarn install v1.17.3
    [1/4] Resolving packages...
    [2/4] Fetching packages...
    info fsevents@1.2.9: The platform "linux" is incompatible with this module.
    info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
    [3/4] Linking dependencies...
    [4/4] Building fresh packages...
    Done in 10.89s.
    Removing intermediate container d53a06c9e4c2
    ---> 4e68fbc2d704
    Step 5/6 : COPY . .
    ---> a239a11f68d8
    Step 6/6 : CMD ["node", "/app/src/index.js"]
    ---> Running in 49999f68df8f
    Removing intermediate container 49999f68df8f
    ---> e709c03bc597
    Successfully built e709c03bc597
    Successfully tagged getting-started:latest

    You’ll see that every layer from the COPY step onward was rebuilt. Perfectly fine since we changed the Dockerfile quite a bit.

  3. Now, make a change to the src/static/index.html file (like change the `<title>` to say “The Awesome Todo App”).

  4. Build the Docker image now using docker build again. This time, your output should look a little different.

    Sending build context to Docker daemon  219.1kB
    Step 1/6 : FROM node:12-alpine
    ---> b0dc3a5e5e9e
    Step 2/6 : WORKDIR /app
    ---> Using cache
    ---> 9577ae713121
    Step 3/6 : COPY package.json yarn.lock ./
    ---> Using cache
    ---> bd5306f49fc8
    Step 4/6 : RUN yarn install --production
    ---> Using cache
    ---> 4e68fbc2d704
    Step 5/6 : COPY . .
    ---> cccde25a3d9a
    Step 6/6 : CMD ["node", "/app/src/index.js"]
    ---> Running in 2be75662c150
    Removing intermediate container 2be75662c150
    ---> 458e5c6f080c
    Successfully built 458e5c6f080c
    Successfully tagged getting-started:latest

    First off, you should notice that the build was MUCH faster! And, you’ll see that steps 2-4 all show Using cache. So, hooray! We’re using the build cache. Pushing and pulling this image and updates to it will be much faster as well.

Multi-Stage Builds

While we’re not going to dive into it too much in this tutorial, multi-stage builds are an incredibly powerful way to build an image in several stages. They have several advantages:

  • Separate build-time dependencies from runtime dependencies
  • Reduce overall image size by shipping only what your app needs to run

Maven/Tomcat Example

When building Java-based applications, a JDK is needed to compile the source code to Java bytecode. However, that JDK isn’t needed in production. Also, you might be using tools like Maven or Gradle to help build the app. Those also aren’t needed in our final image. Multi-stage builds help.

FROM maven AS build
WORKDIR /app
COPY . .
RUN mvn package

FROM tomcat
COPY --from=build /app/target/file.war /usr/local/tomcat/webapps

In this example, we use one stage (called build) to perform the actual Java build using Maven. In the second stage (starting at FROM tomcat), we copy in files from the build stage. The final image is only the last stage being created (which can be overridden using the --target flag).
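The --target flag mentioned above can be sketched as follows. The function name build_stage, the image tag used in the usage note, and the run dry-run helper are all made up for illustration.

```shell
# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Build a multi-stage Dockerfile only up to the named stage.
# $1 = stage name (for example: build), $2 = tag for the resulting image
build_stage() {
    run docker build --target "$1" -t "$2" .
}
```

For the Dockerfile above, build_stage build myapp:build would stop after the Maven stage, leaving an image that still contains Maven and the compiled output. That can be handy when debugging a failing build.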

React Example

When building React applications, we need a Node environment to compile the JS code (typically JSX), SASS stylesheets, and more into static HTML, JS, and CSS. If we aren’t doing server-side rendering, we don’t even need a Node environment for our production build. Why not ship the static resources in a static nginx container?

FROM node:12 AS build
WORKDIR /app
COPY package* yarn.lock ./
RUN yarn install
COPY public ./public
COPY src ./src
RUN yarn run build

FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html

Here, we are using a node:12 image to perform the build (maximizing layer caching) and then copying the output into an nginx container. Cool, huh?

Recap

By understanding a little bit about how images are structured, we can build images faster and ship fewer changes. Multi-stage builds also help us reduce overall image size and increase final container security by separating build-time dependencies from runtime dependencies.