Saturday, January 12, 2013

Essential Command-Line Tools for Web Developers

Tools can make our workflows feel seamless, letting us focus on what we're building rather than the process. Most web developers, at every layer of the stack, work from the command line. There are countless utilities that can make you more productive. These aren't full-blown command-line applications, such as Git, but rather simple, composable tools that can improve your workflow as a web developer.

If you are on Mac OS X, you can use Homebrew to install any of the tools that are not part of the standard packages. Each of these tools has pages' worth of options, and only a fraction of them are reviewed below. If you'd like to learn more about a tool, begin with its man page via man <command> or <command> -h.
Getting comfortable reading the supplied documentation, rather than Googling for an answer, is an essential skill for a productive developer.

cURL

cURL is a venerable Swiss-Army knife for hitting URLs and receiving data in return.
cURL is used to “transfer a URL,” according to its man page. It's a venerable Swiss-Army knife for hitting URLs and receiving data in return: returning the body of a page (such as your external IP address from ifconfig.me), downloading a file, or viewing a page's headers. But cURL isn't just about pulling data; you can also use it to push data to URLs, submitting to forms or APIs. Every developer working on the web should know this tool.
The basic usage of cURL is to download the contents of a website. Here’s an example:
$ curl ifconfig.me
173.247.192.90
You may want to download a file from a site, in which case you'll use the -O flag.
$ curl -LO http://download.virtualbox.org/virtualbox/4.2.4/VirtualBox-4.2.4-81684-OSX.dmg
Notice that we also used the -L flag, which tells cURL to follow any redirects (which we need to download VirtualBox).
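As a quick sketch, you can also combine -I (covered below) with -L to watch where a redirect chain ends up without downloading anything; cURL prints the headers for each hop in turn:
$ curl -IL http://download.virtualbox.org/virtualbox/4.2.4/VirtualBox-4.2.4-81684-OSX.dmg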
Often, when configuring web servers, you'll want to check that headers are being set properly (e.g., Cache-Control and ETag). You can view just the headers of a page with the -I flag:
$ curl -I http://newrelic.com
HTTP/1.1 200 OK
Server: NewRelic/0.8.53
Date: Fri, 16 Nov 2012 22:24:36 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Status: 200 OK
X-UA-Compatible: IE=Edge,chrome=1
ETag: "4dafac3d1cc508e44b4ed1b0ac7f22d9"
Cache-Control: max-age=0, private, must-revalidate
X-Runtime: 0.056271
X-Rack-Cache: miss
Setting request headers is fairly common when working with URLs from the command line. Many times, you'll need to set an Accept header to choose the response type, or an Authorization header to pass your credentials. You can also set custom headers if your API requires them.
$ curl -iH "x-api-key:yourapikey" https://api.newrelic.com/api/v1/accounts/1/servers.xml
HTTP/1.1 200 OK
Server: NewRelic/0.8.53
Date: Fri, 16 Nov 2012 22:38:55 GMT
Content-Type: application/xml; charset=utf-8
Connection: keep-alive
Status: 200 OK
X-Runtime: 606
ETag: "c57879ddfc1e35ec4a390d306114275f"
Cache-Control: private, max-age=0, must-revalidate
Content-Length: 38871
Vary: Accept-Encoding
<?xml version="1.0" encoding="UTF-8"?>
    <servers type="array">
        <server>
            <id type="integer">987987</id>
            <hostname>proxy1</hostname>
            <overview-url>https://api.newrelic.com/accounts/1/servers/987987</overview-url>
        </server>
    </servers>
In addition to setting an API key header with -H, we've used -i to return the header information as well as the response body.
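The Accept and Authorization headers mentioned above work the same way. As a hedged sketch (api.example.com and the token are placeholders, not a real endpoint), requesting JSON with credentials would look something like this:
$ curl -H 'Accept: application/json' \
    -H 'Authorization: Bearer yourtoken' \
    https://api.example.com/v1/servers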
cURL isn't just for pulling down URLs; you can also use it to make POST or PUT requests, submitting data to an API or standing in for an HTML form.
$ curl https://api.faker.com/v1/customers \
    -H 'x-api-key:apikey' \
    -H 'Content-Type: application/json' \
    -d '{"firstName":"Justin", "lastName":"Bieber"}'
The beauty of cURL is that it can be piped into other programs to work with the returned data.
We've broken the command across multiple lines with \ to make it easier to read. First, we set the proper headers with -H; note that passing -H multiple times sets multiple headers. Then we supply the JSON data for the POST body with -d.
We've only scratched the surface of the power of cURL, and you should get very familiar with these basic commands. After a while, you may even find yourself browsing the web more often through cURL. The beauty of cURL, as with many UNIX utilities, is that it can be piped into other programs (like grep or jq) to work with the returned data.
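For instance (a minimal sketch), you can check a single header by piping the output of -I through grep; -s silences cURL's progress meter so only the match comes through. Against the newrelic.com response shown earlier, this should print something like:
$ curl -sI http://newrelic.com | grep -i 'cache-control'
Cache-Control: max-age=0, private, must-revalidate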

jq

jq is like sed for JSON.
If you are working with a lot of JSON APIs (who isn't these days?), then you'll want to become familiar with jq. Described as a lightweight and flexible command-line JSON processor, jq is like sed for JSON. You aren't limited to using jq with APIs, though; it can parse any JSON document (maybe from your favorite NoSQL database, like Riak or Couchbase). Pipe the results of a cURL request into jq and you've got a powerful combination for working with JSON APIs.
First, start with retrieving a JSON document; in this case, we're pulling tweets as JSON from Twitter's API, and piping it into jq. We'll return the entire response with . (period).
$ curl 'http://search.twitter.com/search.json?q=bieber&rpp=5&include_entities=true' | jq '.'
{
    "since\_id\_str": "0",
    "since_id": 0,
    "results\_per\_page": 5,
    "completed_in": 0.082,
    "max_id": 269583781747372030,
    "max\_id\_str": "269583781747372032",
    "next_page": "?page=2&max_id=269583781747372032&q=bieber&rpp=5&include_entities=1",
    "page": 1,
    "query": "bieber",
    "refresh_url": "?since_id=269583781747372032&q=bieber&include_entities=1",
    "results": [...]
}
The JSON response includes meta-data about the request, as well as the results of the query, stored in the results[] array (which I truncated in the above response). We can pull out just the first tweet with:
$ curl 'http://search.twitter.com/search.json?q=bieber&rpp=5&include_entities=true' | jq '.results[0]'
The tweet contains a fair bit of information that might not be pertinent to our current project, so we can show only the individual fields we're interested in:
$ curl 'http://search.twitter.com/search.json?q=bieber&rpp=5&include_entities=true' | jq '.results[0] | {from_user, text}'
{
    "text": "I just voted for Justin Bieber #maleartist #PeoplesChoice. Retweet to vote http://t.co/Y8405WqO via @peopleschoice",
    "from_user": "PequenaDoDrew"
}
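If you just want the plain text of each tweet, say to feed into another shell tool, jq's -r flag emits raw strings without the JSON quoting. A quick sketch:
$ curl 'http://search.twitter.com/search.json?q=bieber&rpp=5&include_entities=true' | jq -r '.results[].text'
I just voted for Justin Bieber #maleartist #PeoplesChoice. Retweet to vote http://t.co/Y8405WqO via @peopleschoice
...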
You can collect results into arrays by wrapping a filter in []. For example, we might want to gather all image URLs found in each tweet into a media array. Twitter's API returns any included media information within the entities field, so we can pull out the URLs like so:
$ curl 'http://search.twitter.com/search.json?q=bieber&rpp=5&include_entities=true' | jq '.results[] | {from_user, text, media: [.entities.media[].media_url]}'
{
    "media": [ "https://twitpic.com/show/iphone/bdpx8p" ],
    "text": "I just voted for Justin Bieber #maleartist #PeoplesChoice. Retweet to vote http://t.co/Y8405WqO via @peopleschoice",
    "from_user": "PequenaDoDrew"
}
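jq can also filter on values with its select() function. As an illustrative sketch using the from_user field from the results above, this keeps only tweets from a single user:
$ curl 'http://search.twitter.com/search.json?q=bieber&rpp=5&include_entities=true' | jq '.results[] | select(.from_user == "PequenaDoDrew") | {from_user, text}'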
That's just the start of what jq is capable of. Refer to the full documentation to unlock its serious power. Once you get the hang of folding up responses into a desired result, you'll be able to work with and transform data like a machine.

ngrep

Ngrep, or network grep, is exactly what it sounds like: grep for network traffic.
Ngrep, or network grep, is exactly what it sounds like: grep for network traffic. It lets you match network packets with a regular expression, returning information much like what you'd get from curl -I. The basic usage is helpful for seeing all the requests a page makes, but that's just the start. When you're working with rich JavaScript client applications that make countless AJAX requests, which can be difficult to monitor and debug, ngrep will be your new best friend.
$ ngrep -d en1 -q -W byline "^(GET|POST) .*"
interface: en1 (172.16.0.0/255.255.255.0)
match: ^(GET|POST) .*
T 172.16.0.79:59435 -> 204.93.223.150:80 [A]
GET / HTTP/1.1.
Host: newrelic.com.
Connection: keep-alive.
Cache-Control: max-age=0.
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11.
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8.
Accept-Encoding: gzip,deflate,sdch.
Accept-Language: en-US,en;q=0.8.
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3.
Cookie: ...
This command shows all GET and POST requests seen on a non-default interface, selected with -d en1. The -W byline option preserves line breaks for better readability, and -q cuts down the noise about non-matching packets. You can use host and port filters to isolate the traffic to the specific application you're working with, if you need to.
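For example (a sketch, with newrelic.com standing in for your own application's host), you can append a standard BPF filter to restrict matching to a single host and port:
$ ngrep -d en1 -q -W byline "^(GET|POST) .*" host newrelic.com and port 80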
ngrep is great for viewing network traffic, and can bring to light what type of data is being passed between your computer and various sites. For example, it wouldn't take long to find a site sending your password in plaintext in the POST request for a login form.
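As a rough illustration, a match expression as simple as the following would surface any unencrypted request containing the string "password" on port 80:
$ ngrep -d en1 -q -W byline "password" port 80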

S3cmd

S3cmd gives you command-line access to your buckets and files on S3, plus much more.
Nearly every developer stores files in Amazon S3 at this point. You might be using it for simple storage of DB backups, as the backing for your CDN, or even to serve an entire static site. While the name stands for Simple Storage Service, working with the admin panel can be anything but simple. Besides, why would you want to leave the command line to use a web interface for a file system? S3cmd gives you command-line access to your buckets and files on S3, plus much more.
After configuring the tool, which entails entering access keys from the AWS console, you'll be able to work with S3 in much the same way you would your local filesystem.
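Configuration is interactive: running s3cmd --configure prompts for your access key and secret key and saves them to a local config file (typically ~/.s3cfg).
$ s3cmd --configure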
Make a bucket:
$ s3cmd mb s3://ckelly-demo-bucket
Bucket 's3://ckelly-demo-bucket/' created
List your buckets:
$ s3cmd ls
2012-11-27 00:52  s3://ckelly-demo-bucket
Put a file into the bucket:
$ s3cmd put index.html s3://ckelly-demo-bucket/index.html
index.html -> s3://ckelly-demo-bucket/index.html  [1 of 1]
List the contents of the bucket:
$ s3cmd ls s3://ckelly-demo-bucket
2012-11-27 00:54  0  s3://ckelly-demo-bucket/index.html
Download a file from the bucket:
$ s3cmd get s3://ckelly-demo-bucket/index.html
s3://ckelly-demo-bucket/index.html -> ./index.html  [1 of 1]
Remove a bucket and its content:
$ s3cmd rb --recursive s3://ckelly-demo-bucket
WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time...
File s3://ckelly-demo-bucket/index.html deleted
Bucket 's3://ckelly-demo-bucket/' removed
We've just previewed the file system commands, but that is only the start for S3cmd. You can also use it to manage your access control list, as well as your CloudFront distribution points from the command-line!
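For example (a sketch reusing the demo bucket name from above), making an object publicly readable is a one-liner with setacl:
$ s3cmd setacl --acl-public s3://ckelly-demo-bucket/index.html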

Localtunnel

You'll never have to mess with manually tunneling traffic again.
Localtunnel is a project by Jeff Lindsay, sponsored by Twilio, that makes it dead simple to expose your local web server to the Internet. Localtunnel is one tool that takes the UNIX philosophy to heart: it does one thing and does it well. The only option is uploading a public key for authentication, and that only needs to be done once.
Localtunnel is a RubyGem, so you'll need Ruby and RubyGems installed. A simple gem install localtunnel will get you started. Then, to expose a locally running server and port, you simply pass the port you want to expose as an argument.
$ localtunnel -k /Users/ckelly/.ssh/id_rsa.pub 3000
This localtunnel service is brought to you by Twilio.
Port 3000 is now publicly accessible from http://4nc9.localtunnel.com ...
Your local server can then be accessed by anyone, anywhere. It's great for sharing work in progress, and perfect for accessing your application on a mobile device. You'll never have to mess with manually tunneling traffic again. Localtunnel solves a simple, but painful problem; it's the ideal tool for a web developer.

Just Getting Started

There are numerous other tools that are central to a web developer's life (load testing, network monitoring, etc.), and they vary widely depending on which portion of the stack you're working in. You might want to visit Command-line Fu or follow Command-line Magic on Twitter to discover new tools. And, of course, you should definitely try New Relic, since no web developer should be without it.
