How I Made AWS CLI 300% Faster!
This developer explains why he needed to take AWS's CLI up a notch and the experimental way he did it.
Yeah yeah, it's "highly experimental" and all; but still, it's three times faster than simply running aws bla bla bla the "plain" way.
And yes, it won't always be that fast, especially if you only run the AWS CLI about once a fortnight. But it will certainly have a clear impact once you start batching up your AWS CLI calls; maybe routine account checks/cleanups, maybe extracting tons of CloudWatch Metrics records, or maybe a totally different, unheard-of use case.
Whatever it is, I guess it would be useful for the masses some day.
Plus, as one of the authors and maintainers of the world's first serverless IDE, I have certainly had several chances to put it to good use!
The Problem: Why the AWS CLI Is "Too Slow" for Me
(Let's just call it "the CLI", shall we?)
It's actually nothing to do with the CLI itself; rather, it's the fact that each CLI invocation is a completely new program execution cycle.
This means:
- Python (and ultimately the OS) has to load the binaries, configs, boto API definitions, and so forth;
- the CLI has to initialize itself: load all supported command definitions, prepare parsers, generate API client classes, and so forth (the sketch just below gives a feel for this start-up cost).
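Just to get a feel for that start-up cost, here is a minimal timing sketch of my own (not part of awsr); it only assumes the aws-cli Python package is importable, and the numbers will obviously vary by machine:

import time

start = time.perf_counter()
# the import alone drags in awscli, botocore and friends
from awscli.clidriver import create_clidriver
imported = time.perf_counter()

# set up a botocore session and the CLI driver object
driver = create_clidriver()
created = time.perf_counter()

print("import awscli.clidriver : %.2f s" % (imported - start))
print("create_clidriver()      : %.2f s" % (created - imported))

A plain aws invocation pays both of these every single time; a long-running process pays them once.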
But, as usual, the highest impact comes via the network I/O:
- The CLI has to create an API client from scratch (the previous one was lost when the command execution completed).
- Since the network connection to AWS is managed by the client, each command creates (and then destroys) a fresh TCP connection to the AWS endpoint, which involves a DNS lookup as well (although later lookups may be served from the system cache).
- Since AWS APIs almost always use SSL, every new connection results in a full SSL handshake (client hello, server hello, server cert, yadda, yadda, yadda). The measurement sketch below puts some rough numbers on this.
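Here is a rough sketch (again my own, not part of awsr) to measure that fixed per-connection cost from Python; the endpoint name is just an example, and the numbers will depend on your network:

import socket
import ssl
import time

host = "logs.us-east-1.amazonaws.com"   # example AWS API endpoint; any will do

t0 = time.perf_counter()
addr = socket.getaddrinfo(host, 443)[0][4][0]              # DNS lookup
t1 = time.perf_counter()
sock = socket.create_connection((addr, 443), timeout=10)   # TCP handshake
t2 = time.perf_counter()
tls = ssl.create_default_context().wrap_socket(sock, server_hostname=host)  # full SSL/TLS handshake
t3 = time.perf_counter()
tls.close()

print("dns: %.0f ms, tcp: %.0f ms, ssl: %.0f ms" %
      ((t1 - t0) * 1000, (t2 - t1) * 1000, (t3 - t2) * 1000))

Every plain CLI invocation repeats all three steps; a persistent client keeps the connection around and skips them on subsequent calls.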
Now, assume you have 20 CloudWatch log groups to be deleted. Since the Logs API does not offer a bulk deletion option, the cheapest way to do this would be to run a simple shell script, looping aws logs delete-log-group over all the groups:
for i in $(aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
  aws logs delete-log-group --log-group-name $i
done
This would run the CLI 20 times (21, to be precise, if you count the initial list API call), meaning that all of the above will run 20 times. Clearly a waste of time and resources, since we were quite clear that the same endpoint was going to be invoked in all those runs.
Try scaling this up to hundreds or thousands of batched operations and see where it takes you!
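For comparison, here is roughly what a long-running process buys you, written as a minimal boto3 sketch (my own illustration, not the CLI's or awsr's code): one session, one client, one connection, reused for every deletion. It assumes boto3 is installed and your credentials/region are configured; and careful, it really does delete log groups.

import boto3

# one session + one client = one connection, reused for every call below
logs = boto3.client("logs", region_name="us-east-1")

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        name = group["logGroupName"]
        print("deleting", name)
        logs.delete_log_group(logGroupName=name)   # careful: destructive!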
And no, aws-shell does not cut it. Not yet, at least.
Leaving aside the nice and cozy REPL interface (interactive user prompt), handy autocompletion, syntax coloring, and inline docs, it does not give you any performance advantage over aws-cli. Every command in the shell is executed in a new AWS CLI instance, with parsers, command hierarchies, API specs and, more importantly, API clients getting recreated for every command.
Skeptical? Peek at the aws-shell sources; or better still, fire up Wireshark (or tcpdump if you dare), run a few commands in the shell REPL, and see how each command initializes a fresh SSL channel from scratch.
The Proposal: What Can We Do?
Obviously, the CLI cannot do much about it. It's a simple program, and whatever improvements we make won't last until the next invocation; the OS would rudely wipe them and start the next CLI with a clean slate, unless we use some spooky (and rather discouraged) memory persistence magic to serialize and reload the CLI's state. Even then, the other OS-level stuff (network sockets, etc.) will be gone, and our effort would be pretty much fruitless.
If we are going to make any impactful changes, we need to make the CLI stateful: a long-running process.
The D(a)emon
In the OS world, this usually means setting up a daemon: a background process that waits for and processes events like user commands. (A popular example is MySQL, with its mysql-server daemon and mysql-client packages.)
In our case, we don't want a fully-fledged "managed" daemon, like a system service. For example, there's no point in starting our daemon before we actually start making our CLI calls; also, if our daemon dies, there's no point in restarting it right away, since we cannot recover the lost state anyway.
So we have a simple plan:
- break the CLI into a "client" and a daemon;
- every time we run the CLI:
  - check for the presence of the daemon, and
  - spawn the daemon if it is not already running.
This way, if the daemon dies, the next CLI invocation will auto-start it. Nothing to worry about, nothing to manage.
Our Fast AWS CLI Daemon: It's All in a subprocess!
It is easy to handle the daemon spawn without the trouble of maintaining a second program or script: simply use subprocess.Popen to launch another instance of the program, and instruct it to run the daemon's code path rather than the client's.
Enough talk; show me the code!
Here you go:
#!/usr/bin/python
import os
import sys
import tempfile
import psutil
import subprocess

# named pipes used to pass commands to, and read results from, the daemon
rd = tempfile.gettempdir() + "/awsr_rd"
wr = tempfile.gettempdir() + "/awsr_wr"


def run_client():
    # hand the full command line over to the daemon...
    out = open(rd, "w")
    out.write(" ".join(sys.argv))
    out.write("\n")
    out.close()
    # ...and relay whatever it writes back
    inp = open(wr, "r")
    result = inp.read()
    inp.close()
    sys.stdout.write(result)


def run_daemon():
    from awscli.clidriver import CLIOperationCaller, LOG, create_clidriver, HISTORY_RECORDER

    def patched_init(self, session):
        self._session = session
        self._client = None

    def patched_invoke(self, service_name, operation_name, parameters, parsed_globals):
        # create the API client only once, and reuse it across invocations
        if self._client is None:
            LOG.debug("Creating new %s client" % service_name)
            self._client = self._session.create_client(
                service_name, region_name=parsed_globals.region,
                endpoint_url=parsed_globals.endpoint_url,
                verify=parsed_globals.verify_ssl)
        client = self._client
        response = self._make_client_call(
            client, operation_name, parameters, parsed_globals)
        self._display_response(operation_name, response, parsed_globals)
        return 0

    CLIOperationCaller.__init__ = patched_init
    CLIOperationCaller.invoke = patched_invoke

    driver = create_clidriver()
    while True:
        inp = open(rd, "r")
        args = inp.read()[:-1].split(" ")[1:]
        inp.close()
        if len(args) > 0 and args[0] == "exit":
            sys.exit(0)
        # redirect stdout to the response pipe so the client sees the command output
        sys.stdout = open(wr, "w")
        rc = driver.main(args)
        HISTORY_RECORDER.record('CLI_RC', rc, 'CLI')
        sys.stdout.close()


if __name__ == "__main__":
    if not os.access(rd, os.R_OK | os.W_OK):
        os.mkfifo(rd)
    if not os.access(wr, os.R_OK | os.W_OK):
        os.mkfifo(wr)

    # fork if the awsr daemon is not already running
    ps = psutil.process_iter(attrs=["cmdline"])
    procs = 0
    for p in ps:
        cmd = p.info["cmdline"]
        if cmd and len(cmd) > 1 and cmd[0].endswith("python") and cmd[1] == sys.argv[0]:
            procs += 1
    if procs < 2:
        sys.stderr.write("Forking new awsr background process\n")
        with open(os.devnull, 'r+b', 0) as devnull:
            # the new instance will see the env var, and run itself as the daemon
            subprocess.Popen(sys.argv, stdin=devnull, stdout=devnull, stderr=devnull,
                             close_fds=True, env={"AWSR_DAEMON": "true"})
        run_client()
    elif os.environ.get("AWSR_DAEMON") == "true":
        run_daemon()
    else:
        run_client()
Yep, just 89 lines of rather primitive code; of course, it's also on GitHub, in case you were wondering.
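By the way, the "protocol" is nothing more than those two named pipes; so, assuming the daemon is already up (and the pipe paths match the code above), any script can drive it directly. A minimal sketch of my own:

import sys
import tempfile

rd = tempfile.gettempdir() + "/awsr_rd"   # request pipe (the daemon reads this)
wr = tempfile.gettempdir() + "/awsr_wr"   # response pipe (the daemon writes this)

# the first token is ignored by the daemon (it stands in for argv[0]);
# the rest is the CLI command line
with open(rd, "w") as req:
    req.write("awsr s3api list-buckets --output text\n")

with open(wr, "r") as resp:
    sys.stdout.write(resp.read())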
Some Statistics, If You're Still Not Buying It
"Lies, damn lies, and statistics," they say. But sometimes, statistics can do wonders when you are trying to prove a point.
As you would understand, our new REPL really shines when there are more and more individual invocations (API calls); so that's what we would compare.
Let's upload some files (via aws s3api put-object):
date
for file in $(find . -type f -name "*.sha1"); do
  aws s3api put-object --acl public-read --body $file --bucket target.bucket.name --key base/path/$file
done
date
- Bucket region: us-east-1
- File type: fixed-length checksums
- File size: 40 bytes each
- Additional: public-read ACL
Uploading 70 such files via aws s3api put-object takes:
- 4 minutes 35 seconds
- 473.5 KB of data (319.5 KB downlink + 154 KB uplink)
- 70 DNS lookups + SSL handshakes (one for each file)
In comparison, uploading 72 files via awsr s3api put-object takes:
- 1 minute 28 seconds
- 115.5 KB of data (43.5 KB downlink + 72 KB uplink)
- 1 DNS lookup + SSL handshake for the whole operation
That's a 320% improvement in latency (or 420%, if you consider bandwidth).
If you feel like it, watch the outputs (stdout) of the two runs in real time. You will notice how awsr shows a low and consistent latency from the second output onwards, while plain aws shows almost the same latency between every output pair, apparently because almost everything gets re-initialized for each call.
If you monitor your network interface (say, with Wireshark), you will see the real deal: aws continuously makes DNS queries and SSL handshakes, while awsr just makes one every minute or so.
Counterargument #1: If your files are all in one place or directory hierarchy, you could just use aws s3 cp or aws s3 sync in one go. These will be as performant as awsr, if not more so. However, in my case I wanted to pick and choose only a subset of files in the hierarchy, and there was no easy way of doing that with the aws command alone.
Counterargument #2: If you want to upload to multiple buckets, you will have to batch up the calls bucket-wise (us-east-1 first, ap-southeast-2 next, etc.) and kill awsr after each batch; more on that later.
CloudWatch Logs
Our serverless IDE Sigma generates quite a lot of CloudWatch logs, especially when our QA battalion is testing it. To keep things tidy, I prefer to occasionally clean up these logs via aws logs delete-log-group.
date
for i in $(aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
  echo $i
  aws logs delete-log-group --log-group-name $i
done
date
Cleaning up 172 such log groups on us-east-1, via plain aws, takes:
- 5 minutes 44 seconds
- 1.51 MB of bandwidth (1133 KB downlink, 381 KB uplink)
- 173 (1 + 172) DNS lookups + SSL handshakes; one for each log group, plus one for the initial listing
On the contrary, deleting 252 groups via our new REPL awsr takes just:
- 2 minutes 41 seconds
- 382 KB of bandwidth (177 KB downlink, 205 KB uplink)
- 4 DNS lookups + SSL handshakes (about 1 every 60 seconds)
This time, a 310% improvement in latency, or 580% in bandwidth.
CloudWatch Metrics
I use this script to occasionally check the sizes of our S3 buckets, to track down and remove any garbage; playing the "scavenger" role:
for bucket in `awsr s3api list-buckets --query 'Buckets[*].Name' --output text`; do
  size=$(awsr cloudwatch get-metric-statistics --namespace AWS/S3 \
    --start-time $(date -d @$((($(date +%s)-86400))) +%F)T00:00:00 --end-time $(date +%F)T00:00:00 \
    --period 86400 --metric-name BucketSizeBytes \
    --dimensions Name=StorageType,Value=StandardStorage Name=BucketName,Value=$bucket \
    --statistics Average --output text --query 'Datapoints[0].Average')
  if [ $size = "None" ]; then size=0; fi
  printf "%8.3f %s\n" $(echo $size/1048576 | bc -l) $bucket
done
Checking 45 buckets via plain aws (45 + 1 API calls to the same CloudWatch API endpoint), versus checking 61 buckets (62 API calls) via awsr, works out to a 288% improvement.
The Catch
There are many; more unknowns than knowns, in fact.
Bonus: Hands-On AWS CLI Fast Automation Example, FTW!
I run this occasionally to clean up our AWS accounts of old logs and build data. If you are curious, replace the awsr occurrences with aws (and remove the daemon-killing magic), and witness the difference in speed!
Caution: If there are ongoing CodeBuild builds, the last step may keep on looping, possibly even indefinitely if a build is stuck in the IN_PROGRESS status. If you run this from a fully automated context, you may need to enhance the script to handle such cases as well.
for p in araprofile meprofile podiprofile thadiprofile ; do
  for r in us-east-1 us-east-2 us-west-1 us-west-2 ca-central-1 eu-west-1 eu-west-2 eu-central-1 \
           ap-northeast-1 ap-northeast-2 ap-southeast-1 ap-southeast-2 sa-east-1 ap-south-1 ; do

    # profile and region changed, so kill any existing daemon before starting
    arg="--profile $p --region $r"
    kill $(ps -ef -C /usr/bin/python | grep -v grep | grep awsr | awk '{print $2}')
    rm /tmp/awsr_rd /tmp/awsr_wr

    # log groups
    for i in $(awsr $arg logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
      echo $i
      awsr $arg logs delete-log-group --log-group-name $i
    done

    # CodeBuild projects
    for i in $(awsr $arg codebuild list-projects --query 'projects[*]' --output text); do
      echo $i
      awsr $arg codebuild delete-project --name $i
    done

    # CodeBuild builds; strangely these don't get deleted when we delete the parent project...
    while true; do
      builds=$(awsr $arg codebuild list-builds --query 'ids[*]' --output text --no-paginate)
      if [[ $builds = "" ]]; then break; fi
      awsr $arg codebuild batch-delete-builds --ids $builds
    done

  done
done
In Closing: So, There It Is!
Feel free to install and try out awsr; after all, there's just one file, with less than a hundred lines of code!
Although I cannot make any guarantees, I'll try to eventually hunt down and fix the gaping holes and shortcomings, and any other issues that you or I come across along the way.
Over to you, soldier/beta user!
Published at DZone with permission of Janaka Bandara, DZone MVB.