PhantomJS is a headless WebKit packaged as a JavaScript-driven tool. It can be used in command-line utilities that require a web stack, or even as the basis for testing rich web applications. Because it runs WebKit in headless mode, you get access to the real, native, and fast implementation (not a simulated environment) of various standards such as DOM, CSS selectors, Canvas, SVG, and many others.
The project page contains a bunch of examples, from easy ones to some more complicated uses. Feel free to contribute more examples!
Let's look at one of the examples, the page rasterizer (yes, it's only 16 lines!):
    if (phantom.state.length === 0) {
        if (phantom.args.length !== 2) {
            console.log('Usage: rasterize.js URL filename');
            phantom.exit();
        } else {
            var address = phantom.args[0];
            phantom.state = 'rasterize';
            phantom.viewportSize = { width: 600, height: 600 };
            phantom.open(address);
        }
    } else {
        var output = phantom.args[1];
        phantom.sleep(200);
        phantom.render(output);
        phantom.exit();
    }
If I want to have the famous PostScript tiger from its SVG source, all I have to do is to run:
phantomjs rasterize.js http://ariya.github.com/svg/tiger.svg tiger.png
But a static vector graphic is boring. Replacing the above with
phantomjs rasterize.js http://raphaeljs.com/polar-clock.html clock.png
gives me Polar Clock, one notable example from RaphaelJS.
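A note on why the script branches on phantom.state: as comes up in the comments below, the whole script is thrown away and executed again once the requested page has loaded, and phantom.state is what carries over between the two runs. Here is a rough simulation of that control flow in plain JavaScript (it runs in Node.js, not in PhantomJS). The names makeFakePhantom, script, and simulate are made up for this illustration and are not part of the PhantomJS API.

```javascript
// Stand-in for the PhantomJS runtime, written only to illustrate the
// two-pass control flow. Nothing here is the real PhantomJS API.
function makeFakePhantom(args, log) {
    return {
        args: args,
        state: '',        // assumed to survive across page loads
        pendingUrl: null,
        open: function (url) {
            this.pendingUrl = url;
            log.push('open ' + url);
        },
        render: function (file) {
            log.push('render ' + file);
        },
        exit: function () {
            log.push('exit');
        }
    };
}

// The same script body is evaluated once per pass.
function script(phantom) {
    if (phantom.state.length === 0) {
        // First pass: remember that we started, then request the page.
        phantom.state = 'rasterize';
        phantom.open(phantom.args[0]);
    } else {
        // Second pass: the page is loaded, so render it and quit.
        phantom.render(phantom.args[1]);
        phantom.exit();
    }
}

function simulate(args) {
    var log = [];
    var phantom = makeFakePhantom(args, log);
    script(phantom);          // first pass, state is still empty
    if (phantom.pendingUrl !== null) {
        script(phantom);      // second pass, after the simulated page load
    }
    return log;
}

console.log(simulate(['http://example.com/', 'out.png']).join('\n'));
```

The log shows the order of events: the open call on the first pass, then render and exit on the second.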
Should you need to deal with JSONP, process XML, and integrate with YQL, that's all easily done. Again, refer to the various service integration examples. Let me show one example, which is actually my favorite:
    if (phantom.state.length === 0) {
        var origin, dest;
        if (phantom.args.length < 2) {
            console.log('Usage: direction.js origin destination');
            console.log('Example: direction.js "San Diego" "Palo Alto"');
            phantom.exit(1);
        }
        origin = phantom.args[0];
        dest = phantom.args[1];
        phantom.state = origin + ' to ' + dest;
        phantom.open(encodeURI('http://maps.googleapis.com/maps/api/directions/xml?origin=' + origin +
            '&destination=' + dest + '&units=imperial&mode=driving&sensor=false'));
    } else {
        if (phantom.loadStatus === 'fail') {
            console.log('Unable to access network');
        } else {
            var steps;
            steps = phantom.content.match(/<html_instructions>(.*)<\/html_instructions>/ig);
            if (steps == null) {
                console.log('No data available for ' + phantom.state);
            } else {
                steps.forEach(function (ins) {
                    // Decode the escaped angle brackets in the instructions
                    ins = ins.replace(/&lt;/ig, '<').replace(/&gt;/ig, '>');
                    // Put each embedded <div> on its own line
                    ins = ins.replace(/\<div/ig, '\n<div');
                    // Strip all remaining tags
                    ins = ins.replace(/<.*?>/g, '');
                    console.log(ins);
                });
            }
        }
        phantom.exit();
    }
If I run it like the following:
phantomjs direction.js 'Redwood City' 'Sunnyvale'
what I get is the complete set of driving directions:
    Head east on Broadway toward El Camino Real
    Take the 1st left onto El Camino Real
    Turn right at Whipple Ave
    Slight right to merge onto US-101 S toward San Jose
    Take exit 398B to merge onto CA-85 S toward Santa Cruz/Cupertino
    Take exit 22A to merge onto CA-82 S/E El Camino Real toward Sunnyvale
    Destination will be on the right
    Map data ©2011 Google
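The text-processing half of direction.js can be tried without PhantomJS or a network connection. The sketch below pulls it into a standalone function that runs in Node.js. The function name extractSteps and the sample XML are made up for illustration; the sample only mimics the html_instructions elements the script looks for and is not a real API response.

```javascript
// Pull the human-readable steps out of a directions XML string.
// Mirrors the regex-based approach of direction.js; this is not an XML parser.
function extractSteps(xml) {
    // '.' does not match newlines, so this finds one element per line.
    var matches = xml.match(/<html_instructions>(.*)<\/html_instructions>/ig);
    if (matches === null) {
        return [];
    }
    return matches.map(function (ins) {
        // Decode the escaped angle brackets inside the instructions.
        ins = ins.replace(/&lt;/ig, '<').replace(/&gt;/ig, '>');
        // Put each embedded <div> on its own line.
        ins = ins.replace(/<div/ig, '\n<div');
        // Strip every remaining tag, including the outer element.
        return ins.replace(/<.*?>/g, '');
    });
}

// Hypothetical sample input, shaped like what the script expects.
var sample = [
    '<step>',
    ' <html_instructions>Head &lt;b&gt;east&lt;/b&gt; on &lt;b&gt;Broadway&lt;/b&gt;</html_instructions>',
    '</step>',
    '<step>',
    ' <html_instructions>Turn &lt;b&gt;right&lt;/b&gt; at Whipple Ave</html_instructions>',
    '</step>'
].join('\n');

extractSteps(sample).forEach(function (step) {
    console.log(step);
});
```

Matching with regular expressions is good enough for a quick script like this one, but it relies on each html_instructions element sitting on a single line.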
Make sure you check out other examples, such as getting weather forecast conditions, finding pizza in New York, looking up approximate location based on IP address, pulling the list of seasonal food, displaying tweets, and many others.
Headless execution of any web content also enables fast unit testing. Obviously, the goal is not to replace a comprehensive, cross-browser framework such as Selenium or Squish for Web. Rather, it serves as a quick sanity check just before you check in some changes.
Since this can happen automatically and does not need to launch any browser, even better, you can hook the tests so that they execute right before a commit and actually prevent the commit if any of the tests fail. It is easily done using git via its hook support. This is something I have written about on the Sencha blog. It demonstrated a pre-commit hook with Jasmine, but technically it can work with any test framework.
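To give a rough idea of the hook, here is a minimal sketch written for Node.js. This is an assumption made for the example, not the hook from the Sencha blog post, and run-tests.js is a hypothetical name for whatever PhantomJS script drives your test framework and exits with a non-zero status when a test fails.

```javascript
// Sketch of a Git pre-commit hook in Node.js (an assumption for this example).
const { spawnSync } = require('child_process');

// Run a command and return its exit status. A missing binary or a process
// killed by a signal is treated as a failure (status 1).
function runCheck(command, args) {
    const result = spawnSync(command, args, { stdio: 'inherit' });
    if (result.error || result.status === null) {
        return 1;
    }
    return result.status;
}

// Hypothetical: 'run-tests.js' stands for your own PhantomJS test runner.
function preCommit() {
    const status = runCheck('phantomjs', ['run-tests.js']);
    if (status !== 0) {
        console.error('Tests failed, commit aborted.');
    }
    return status;
}
```

To use it, the file would be saved as .git/hooks/pre-commit, made executable, given a Node.js shebang line, and ended with process.exit(preCommit()). Git aborts the commit whenever the pre-commit hook exits with a non-zero status.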
I have been working on and off on PhantomJS for the past few years. You may be already familiar with some of its inspiration (also involving headless WebKit): SVG rasterizer, page capture, visual Google, etc. Finally I managed to overcome my laziness, cleaned up the code, and published it for your pleasure. Obviously it's not a surprise if you find out that PhantomJS uses QtWebKit.
I have a few tasks lined up for the next PhantomJS version, 1.1. You are encouraged to file bugs and feature requests in the issue tracker.
Get it while it is hot!
16 comments:
This is just great, thanks a lot for this project.
Really cool project. Can the Windows executable downloaded from the project website be redistributed with a commercial application? What are the conditions for this?
@Tommy: I am not a lawyer so I'm not sure about that. WebKit is licensed under the GNU LGPL, PhantomJS itself is BSD licensed, and the executable is statically compiled. You may want to consult a knowledgeable IP attorney.
This is cool! However, the fact that the entire script gets cleared and re-executed after a call to "open" seems kind of funky.
Wouldn't a callback be a little more straightforward? I could see this getting really convoluted for a multi-page scenario.
I put up an OSX binary for people to play with. I hope that's ok:
http://blog.marc-seeger.de/2011/01/26/phantomjs_osx_binary
These things are always more complicated than they seem. I grabbed the windows binary but can't get examples which use phantom.open() to run. Other examples do work, so webkit is clearly working, but do I need to get the rest of Qt as well?
Addendum to my last comment: I forgot that I have to use a proxy here. If it doesn't automatically detect and use the default system proxy settings is there a way to set the settings manually?
@GVN: The re-execution is not a design choice, it's a workaround to the technical limitation. Essentially the script runs in a context of a web page, so it's thrown away after another page is loaded.
Do I use this with proxy server?
@Danii @Matt: it seems that phantomjs can't (yet) detect a configured http proxy. Unfortunately, there is no command line argument to specify one either. Ariya, could you update the Wiki FAQ with this information?
This is a great project. Do you have any intention of supporting plug-ins ... specifically Flash?
Hi,
Firstly this looks amazing. Have had a quick play (with render() in particular) and it's great.
One question: does the implementation work with cookies? i.e. I've got a script which loads a login page, fills in the form and then clicks the button (all good.) This then returns a new page (it is logged in) but then once I attempt to go to another page it redirects back to the login page. I'm using asp.net but I doubt that matters.
Thanks.
@David Connors: Flash is tricky because most of the time it is a windowed plugin and thus it requires some view (can't be headless).
@MarkJ: I'm fairly sure that cookies support is missing.
Great work, I can imagine this becoming a part of my testing setup.
Wondering: are there any plans to add read/write to the filesystem support from the Javascript API? Would be especially interesting for screenscraping purposes, and writing test results.
@Husky: Check the roadmap wiki page.