Thursday, December 1, 2016

Shrink Docker.qcow2 to free valuable disk space

Since I moved my development work to Docker for Mac, setting up a machine has become quicker and easier. There is now a stable release of Docker for Mac, which is great! However, the convenience comes at the price of disk usage on the development machine, and on a Mac it's not uncommon to run out of disk space. After a couple of months of using Docker for Mac, I was surprised by the size of its qcow2 file:


$ ls -l ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
-rw-r--r--  1 user  staff   46GB Nov 1 14:47 Docker.qcow2
-rw-r--r--  1 user  staff    64K Nov 1 14:44 console-ring
-rw-r--r--  1 user  staff     5B Nov 1 14:44 hypervisor.pid
-rw-r--r--  1 user  staff     0B Nov 1 12:34 lock
drwxr-xr-x  4 user  staff   136B Nov 1 12:34 log/
-rw-r--r--  1 user  staff    17B Nov 1 14:44 mac.0
-rw-r--r--  1 user  staff    36B Nov 1 12:34 nic1.uuid
-rw-r--r--  1 user  staff     5B Nov 1 14:44 pid
-rw-r--r--  1 user  staff   141B Nov 1 14:44 syslog
lrwxr-xr-x  1 user  staff    12B Nov 1 14:44 tty@ -> /dev/ttys001


As you can see above, Docker.qcow2 has grown to 46GB, which eats up almost half of the free space on my SSD. I regularly remove unused images and containers, but even so, Docker.qcow2 never stopped growing.
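
For reference, the regular cleanup I mean looks roughly like the sketch below. The function names are my own, for illustration only; the docker subcommands and filters (status=exited, dangling=true) are standard ones. It frees space inside the VM, but as described here it does not make Docker.qcow2 itself any smaller.

```shell
#!/usr/bin/env bash
# Sketch: remove exited containers and dangling images.
# Function names are hypothetical; only the docker calls are standard.

remove_exited_containers() {
    local ids
    ids=$(docker ps -aq -f status=exited)
    # Skip the call entirely when there is nothing to remove
    [ -z "$ids" ] && return 0
    # Word splitting on $ids is intentional: one ID per argument
    docker rm $ids
}

remove_dangling_images() {
    local ids
    ids=$(docker images -q -f dangling=true)
    [ -z "$ids" ] && return 0
    docker rmi $ids
}
```

Call remove_exited_containers first and remove_dangling_images second, since an image cannot be removed while a stopped container still refers to it.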

In theory, Docker.qcow2 holds the layers and containers that the Docker Engine is using. In practice, Docker doesn't come with a mechanism to reclaim space inside this file. When we pull new images for testing and then delete them, the data stays inside Docker.qcow2 and is never erased. This is why the file on the hard drive becomes huge as time goes by.

You could delete Docker.qcow2, but that destroys everything you've built inside the containers. After the Docker engine restarts it creates a fresh file, which may grow back to the previous size once the same images and containers are pulled and built again.

Using the qemu utilities, we can shrink the .qcow2 file effectively. Quit Docker for Mac first so that the image is not in use while it is being converted:
$ brew update && brew install qemu
$
$ cd ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
$ mv Docker.qcow2 Docker.qcow2_backup
$ qemu-img convert -O qcow2 Docker.qcow2_backup Docker.qcow2
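
If you do this regularly, the backup and convert steps can be wrapped in a small function. This is only a sketch under my own assumptions: the function name shrink_qcow2 and the QEMU_IMG override variable are hypothetical, and it assumes Docker for Mac has already been quit. It refuses to overwrite an earlier backup and puts the original file back if the conversion fails.

```shell
#!/usr/bin/env bash
# Sketch: back up a qcow2 image, then rewrite it compactly with qemu-img.
# QEMU_IMG is a hypothetical override, handy for testing; it defaults to qemu-img.

shrink_qcow2() {
    local image="$1"
    local backup="${image}_backup"
    local converter="${QEMU_IMG:-qemu-img}"

    [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }
    # Refuse to overwrite an earlier backup
    [ -e "$backup" ] && { echo "backup already exists: $backup" >&2; return 1; }

    mv "$image" "$backup" || return 1
    if ! "$converter" convert -O qcow2 "$backup" "$image"; then
        # Conversion failed: restore the original file
        rm -f "$image"
        mv "$backup" "$image"
        return 1
    fi
    echo "kept backup at $backup"
}
```

Usage would be something like: shrink_qcow2 ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2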

Once we have restarted Docker and confirmed that the engine is up and running again, we can remove the backup file:
$ rm ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2_backup


Another way to reclaim used space within the .qcow2 file is to use docker-gc:
https://github.com/spotify/docker-gc

You can follow the instructions there to build a custom Docker image that matches your current Docker version, and then run the cleanup command like this:

$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v /etc:/etc spotify/docker-gc


Reminder: The docker-gc container requires access to the docker socket in order to function, so we need to map it when running this command. The /etc directory is also mapped so that it can read any exclude files that we have created.
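
Before letting docker-gc delete anything, it can be worth previewing the result. The sketch below relies on two things visible in the script further down: the DRY_RUN variable and the /etc/docker-gc-exclude file, whose lines are used as grep patterns against "repo:tag" or image ID. The two function names are hypothetical helpers of my own.

```shell
#!/usr/bin/env bash
# Sketch: write an exclude file, then preview what docker-gc would remove.
# write_gc_excludes and docker_gc_dry_run are hypothetical helper names.

# Write one image pattern per line to the exclude file (first argument).
write_gc_excludes() {
    local file="$1"
    shift
    printf '%s\n' "$@" > "$file"
}

# Run docker-gc with DRY_RUN=1 so that removals are only logged.
docker_gc_dry_run() {
    docker run --rm \
        -e DRY_RUN=1 \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /etc:/etc \
        spotify/docker-gc
}
```

For example, write_gc_excludes /etc/docker-gc-exclude 'redis:.*' would protect every redis tag (writing under /etc will probably need sudo), and docker_gc_dry_run then logs what would have been removed.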

Once we have cloned the docker-gc source, we can start modifying it to suit our needs.

To check out the source:
$ git clone https://github.com/spotify/docker-gc.git


To build the image and load it into the local Docker engine:
$ docker build -t spotify/docker-gc .


By combining docker-gc with the qemu-img command, we can reduce the size of the .qcow2 file effectively and safely.

Here's my modified version of the Dockerfile:

FROM gliderlabs/alpine:3.2

ENV DOCKER_VERSION 1.12.3

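# We install curl so that we can avoid a separate ADD to fetch the Docker binary,
# then remove it again. Everything runs in a single RUN because a `cd` in its
# own RUN does not carry over to later instructions, and files deleted in a
# later layer still take up space in the image.
# The tarball unpacks into a docker/ directory, so the client ends up at
# /usr/local/bin/docker/docker.
RUN apk --update add bash curl \
  && cd /tmp/ \
  && curl -sSL -O https://get.docker.com/builds/Linux/x86_64/docker-${DOCKER_VERSION}.tgz \
  && tar zxf docker-${DOCKER_VERSION}.tgz \
  && mkdir -p /usr/local/bin/ \
  && mv ./docker /usr/local/bin/ \
  && chmod +x /usr/local/bin/docker/docker \
  && apk del curl \
  && rm -rf /tmp/* \
  && rm -rf /var/cache/apk/*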

COPY ./docker-gc /docker-gc

VOLUME /var/lib/docker-gc

CMD ["/docker-gc"]



Here's my modified version of the docker-gc script:
#!/bin/bash

# Copyright (c) 2014 Spotify AB.
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# This script attempts to garbage collect docker containers and images.
# Containers that exited more than an hour ago are removed.
# Images that have existed more than an hour and are not in use by any
# containers are removed.

# Note: Although docker normally prevents removal of images that are in use by
#       containers, we take extra care to not remove any image tags (e.g.
#       ubuntu:14.04, busybox, etc) that are used by containers. A naive
#       "docker rmi `docker images -q`" will leave images stripped of all tags,
#       forcing users to re-pull the repositories even though the images
#       themselves are still on disk.

# Note: State is stored in $STATE_DIR, defaulting to /var/lib/docker-gc

# The script can send log messages to syslog regarding which images and
# containers were removed. To enable logging to syslog, set LOG_TO_SYSLOG=1.
# When disabled, this script will instead log to standard out. When syslog is
# enabled, the syslog facility and logger can be configured with
# $SYSLOG_FACILITY and $SYSLOG_LEVEL respectively.

set -o nounset
set -o errexit

GRACE_PERIOD_SECONDS=${GRACE_PERIOD_SECONDS:=3600}
STATE_DIR=${STATE_DIR:=/var/lib/docker-gc}
FORCE_CONTAINER_REMOVAL=${FORCE_CONTAINER_REMOVAL:=0}
FORCE_IMAGE_REMOVAL=${FORCE_IMAGE_REMOVAL:=0}
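# The Dockerfile moves the unpacked docker/ directory into /usr/local/bin,
# so the client binary lives one level deeper than usual.
DOCKER=${DOCKER:=/usr/local/bin/docker/docker}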
PID_DIR=${PID_DIR:=/var/run}
LOG_TO_SYSLOG=${LOG_TO_SYSLOG:=0}
SYSLOG_FACILITY=${SYSLOG_FACILITY:=user}
SYSLOG_LEVEL=${SYSLOG_LEVEL:=info}
SYSLOG_TAG=${SYSLOG_TAG:=docker-gc}
DRY_RUN=${DRY_RUN:=0}
EXCLUDE_DEAD=${EXCLUDE_DEAD:=0}

for pid in $(pidof -s docker-gc); do
    if [[ $pid != $$ ]]; then
        echo "[$(date)] : docker-gc : Process is already running with PID $pid"
        exit 1
    fi
done

trap "rm -f -- '$PID_DIR/dockergc'" EXIT

echo $$ > $PID_DIR/dockergc


EXCLUDE_FROM_GC=${EXCLUDE_FROM_GC:=/etc/docker-gc-exclude}
if [ ! -f "$EXCLUDE_FROM_GC" ]
then
  EXCLUDE_FROM_GC=/dev/null
fi

EXCLUDE_CONTAINERS_FROM_GC=${EXCLUDE_CONTAINERS_FROM_GC:=/etc/docker-gc-exclude-containers}
if [ ! -f "$EXCLUDE_CONTAINERS_FROM_GC" ]
then
  EXCLUDE_CONTAINERS_FROM_GC=/dev/null
fi

EXCLUDE_IDS_FILE="exclude_ids"
EXCLUDE_CONTAINER_IDS_FILE="exclude_container_ids"

function date_parse {
  if date --utc >/dev/null 2>&1; then
    # GNU/date
    echo $(date -u --date "${1}" "+%s")
  else
    # BSD/date
    echo $(date -j -u -f "%F %T" "${1}" "+%s")
  fi
}

# Elapsed time since a docker timestamp, in seconds
function elapsed_time() {
    # Docker 1.5.0 datetime format is 2015-07-03T02:39:00.390284991
    # Docker 1.7.0 datetime format is 2015-07-03 02:39:00.390284991 +0000 UTC
    utcnow=$(date -u "+%s")
    replace_q="${1#\"}"
    without_ms="${replace_q:0:19}"
    replace_t="${without_ms/T/ }"
    epoch=$(date_parse "${replace_t}")
    echo $(($utcnow - $epoch))
}

function compute_exclude_ids() {
    # Find images that match patterns in the EXCLUDE_FROM_GC file and put their
    # id prefixes into $EXCLUDE_IDS_FILE, prefixed with ^

    PROCESSED_EXCLUDES="processed_excludes.tmp"
    # Take each line and put a space at the beginning and end, so when we
    # grep for them below, it will effectively be: "match either repo:tag
    # or imageid".  Also delete blank lines or lines that only contain
    # whitespace
    sed 's/^\(.*\)$/ \1 /' $EXCLUDE_FROM_GC | sed '/^ *$/d' > $PROCESSED_EXCLUDES
    # The following looks a bit of a mess, but here's what it does:
    # 1. Get images
    # 2. Skip header line
    # 3. Turn columnar display of 'REPO TAG IMAGEID ....' to 'REPO:TAG IMAGEID'
    # 4. find lines that contain things mentioned in PROCESSED_EXCLUDES
    # 5. Grab the image id from the line
    # 6. Prepend ^ to the beginning of each line

    # What this does is make grep patterns to match image ids mentioned by
    # either repo:tag or image id for later greppage
    $DOCKER images \
        | tail -n+2 \
        | sed 's/^\([^ ]*\) *\([^ ]*\) *\([^ ]*\).*/ \1:\2 \3 /' \
        | grep -f $PROCESSED_EXCLUDES 2>/dev/null \
        | cut -d' ' -f3 \
        | sed 's/^/^(sha256:)?/' > $EXCLUDE_IDS_FILE
}

function compute_exclude_container_ids() {
    # Find containers matching the patterns listed in the EXCLUDE_CONTAINERS_FROM_GC file
    # Implode their values with a \| separator on a single line
    PROCESSED_EXCLUDES=`cat $EXCLUDE_CONTAINERS_FROM_GC \
        | xargs \
        | sed -e 's/ /\|/g'`
    # The empty string would match everything
    if [ "$PROCESSED_EXCLUDES" = "" ]; then
        touch $EXCLUDE_CONTAINER_IDS_FILE
        return
    fi
    # Find all docker containers
    # Filter out with matching names
    # and put them to $EXCLUDE_CONTAINER_IDS_FILE
    $DOCKER ps -a \
        | grep -E "$PROCESSED_EXCLUDES" \
        | awk '{ print $1 }' \
        | tr -s " " "\012" \
        | sort -u > $EXCLUDE_CONTAINER_IDS_FILE
}

function log() {
    msg=$1
    if [[ $LOG_TO_SYSLOG -gt 0 ]]; then
        logger -i -t "$SYSLOG_TAG" -p "$SYSLOG_FACILITY.$SYSLOG_LEVEL" "$msg"
    else
        echo "[$(date +'%Y-%m-%dT%H:%M:%S')] [INFO] : $msg"
    fi
}

function container_log() {
    prefix=$1
    filename=$2

    while IFS='' read -r containerid
    do
        log "$prefix $containerid $(${DOCKER} inspect -f {{.Name}} $containerid)"
    done < "$filename"
}

function image_log() {
    prefix=$1
    filename=$2

    while IFS='' read -r imageid
    do
        log "$prefix $imageid $(${DOCKER} inspect -f {{.RepoTags}} $imageid)"
    done < "$filename"
}

# Change into the state directory (and create it if it doesn't exist)
if [ ! -d "$STATE_DIR" ]
then
  mkdir -p $STATE_DIR
fi
cd "$STATE_DIR"

# Verify that docker is reachable
$DOCKER version 1>/dev/null

# List all currently existing containers
$DOCKER ps -a -q --no-trunc | sort | uniq > containers.all

# List running containers
$DOCKER ps -q --no-trunc | sort | uniq > containers.running
container_log "Container running" containers.running

# compute ids of container images to exclude from GC
compute_exclude_ids

# compute ids of containers to exclude from GC
compute_exclude_container_ids

# List containers that are not running
comm -23 containers.all containers.running > containers.exited

if [[ $EXCLUDE_DEAD -gt 0 ]]; then
    echo "Excluding dead containers"
    # List dead containers
    # --no-trunc keeps full-length IDs so they can be compared with containers.exited
    $DOCKER ps -q -a --no-trunc -f status=dead | sort | uniq > containers.dead
    comm -23 containers.exited containers.dead > containers.exited.tmp
    cat containers.exited.tmp > containers.exited
fi

container_log "Container not running" containers.exited

# Find exited containers that finished at least GRACE_PERIOD_SECONDS ago
> containers.reap.tmp
cat containers.exited | while read line
do
    EXITED=$(${DOCKER} inspect -f "{{json .State.FinishedAt}}" ${line})
    ELAPSED=$(elapsed_time $EXITED)
    if [[ $ELAPSED -gt $GRACE_PERIOD_SECONDS ]]; then
        echo $line >> containers.reap.tmp
    fi
done

# List containers that we will remove and exclude ids.
cat containers.reap.tmp | sort | uniq | grep -v -f $EXCLUDE_CONTAINER_IDS_FILE > containers.reap || true

# List containers that we will keep.
comm -23 containers.all containers.reap > containers.keep

# List images used by containers that we keep.
cat containers.keep |
xargs -n 1 $DOCKER inspect -f '{{.Image}}' 2>/dev/null |
sort | uniq > images.used

# List images to reap; images that existed last run and are not in use.
$DOCKER images -q --no-trunc | sort | uniq > images.all

# Find images that are created at least GRACE_PERIOD_SECONDS ago
> images.reap.tmp
cat images.all | while read line
do
    CREATED=$(${DOCKER} inspect -f "{{.Created}}" ${line})
    ELAPSED=$(elapsed_time $CREATED)
    if [[ $ELAPSED -gt $GRACE_PERIOD_SECONDS ]]; then
        echo $line >> images.reap.tmp
    fi
done
comm -23 images.reap.tmp images.used | grep -E -v -f $EXCLUDE_IDS_FILE > images.reap || true

# Use -f flag on docker rm command; forces removal of containers that are in Dead
# status or give errors when removing.
FORCE_CONTAINER_FLAG=""
if [[ $FORCE_CONTAINER_REMOVAL -gt 0 ]]; then
    FORCE_CONTAINER_FLAG="-f"
fi
# Reap containers.
if [[ $DRY_RUN -gt 0 ]]; then
    container_log "The following container would have been removed" containers.reap
else
    container_log "Removing containers" containers.reap
    xargs -n 1 $DOCKER rm $FORCE_CONTAINER_FLAG --volumes=true < containers.reap &>/dev/null || true
fi

# Use -f flag on docker rmi command; forces removal of images that have multiple tags
FORCE_IMAGE_FLAG=""
if [[ $FORCE_IMAGE_REMOVAL -gt 0 ]]; then
    FORCE_IMAGE_FLAG="-f"
fi

# Reap images.
if [[ $DRY_RUN -gt 0 ]]; then
    image_log "The following image would have been removed" images.reap
else
    image_log "Removing image" images.reap
    xargs -n 1 $DOCKER rmi $FORCE_IMAGE_FLAG < images.reap &>/dev/null || true
fi
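
The least obvious part of the script is probably the string handling in elapsed_time, which has to cope with two Docker timestamp formats before the value is handed to date. As a small worked sketch, here are the same three parameter expansions pulled out into a standalone function; the function name is my own and is not part of docker-gc.

```shell
#!/usr/bin/env bash
# Sketch: the string clean-up that elapsed_time applies before calling date.
# normalize_docker_timestamp is a hypothetical name, not part of docker-gc.

normalize_docker_timestamp() {
    local raw="$1"
    local unquoted="${raw#\"}"             # drop a leading double quote, if any
    local seconds_only="${unquoted:0:19}"  # keep the first 19 characters: date and time to the second
    echo "${seconds_only/T/ }"             # replace the T separator with a space
}

# Both Docker formats end up in the same shape:
#   normalize_docker_timestamp '"2015-07-03T02:39:00.390284991Z"'
#   normalize_docker_timestamp '2015-07-03 02:39:00.390284991 +0000 UTC'
# each print: 2015-07-03 02:39:00
```

The result matches the "%F %T" layout that the BSD branch of date_parse expects, which is what lets the same script run against both GNU date and the date that ships with macOS.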







