It seems to be my lot in life to point out when we (the community of people who spec and build tech) are ignoring the lessons of the web when feeling our way into new territory.
This post is a short note in webhead lingo about how the augmented reality (AR) community should stop inventing new protocols and, instead, embrace web techniques. I'm assuming that the reader understands why the web always wins, but please feel free to write this up in language that the AR community will grok.
Problem space:
In this post "augmented reality" is narrowly defined: A live video feed overlaid with spatially placed graphics which add information gathered via the Internet.
High level summary of Webbish AR:
Webbish augmented reality consists of decentralized services and clients which make use of the patterns and protocols of the web: HTML/CSS/JavaScript and HTTP 1.1 provide the base on which techniques like Comet and Ajax are laid. Together with a few new technologies like WebGL, browser geolocation, and HTML5's audio and video elements, the web provides a complete platform for AR.
An example instantiation:
In this example, there are layers containing resources. An example layer might be named "Public art in the City of San Jose" and an example resource in this layer might be named "Big Stone Finger Sculpture" with metadata including a location in SJC, an artist's name, and a fiducial marker ID.
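To make the layer and resource idea concrete, here is a minimal sketch of how one layer might look as JSON. The post names the concepts but not the fields, so every key, the URI, the coordinates, and the marker ID below are placeholders assumed for illustration.

```javascript
// Hypothetical shape for a layer and its resources. The post names the
// concepts (layer, resource, location, artist, marker ID) but not the
// fields, so every key below is an assumption.
const layer = {
  name: "Public art in the City of San Jose",
  resources: [
    {
      name: "Big Stone Finger Sculpture",
      // Placeholder URI on a reserved example domain.
      uri: "https://layers.example.org/public-art/big-stone-finger",
      // Placeholder coordinates, not the sculpture's surveyed position.
      location: { lat: 37.33, lon: -121.89 },
      artist: "(artist name)",
      markerId: 17, // placeholder fiducial marker ID
    },
  ],
};

// Look up the resource that a recognized marker refers to, or null.
function findByMarker(layerDoc, markerId) {
  return layerDoc.resources.find((r) => r.markerId === markerId) || null;
}

// The structure survives a JSON round trip, so it can travel over HTTP.
const wire = JSON.stringify(layer);
const decoded = JSON.parse(wire);
```

Keying the lookup on the marker ID is only one option; a client could just as well index resources by URI or by location.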
AR layers have two components: (1) a JSON-over-Comet protocol for discovering nearby digital resources and receiving updates about them, and (2) a RESTful API for serving the resources. Resources are referenced by URIs and are hypertext documents of types like HTML and Collada.
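The Comet half of a layer could be sketched on the client as a long-polling loop that folds JSON messages into a local store keyed by resource URI. The post does not define a wire format, so the `op` values, the `since` query parameter, and the `{ cursor, messages }` response body are all assumptions, and `applyLayerUpdates` and `pollLayer` are hypothetical names.

```javascript
// Hypothetical message shape (assumed, not from any spec):
//   { op: "put" | "remove", uri: string, resource?: object }

// Apply one batch of layer updates to a Map keyed by resource URI.
function applyLayerUpdates(store, messages) {
  for (const msg of messages) {
    if (msg.op === "put") {
      store.set(msg.uri, msg.resource);
    } else if (msg.op === "remove") {
      store.delete(msg.uri);
    }
    // Unknown ops are ignored so older clients tolerate newer layers.
  }
  return store;
}

// Long-polling loop: each request is expected to block on the server until
// the layer has something new. `fetchImpl` is injected so the loop can use
// window.fetch in a browser or a stub elsewhere.
async function pollLayer(fetchImpl, updatesUrl, store, shouldContinue) {
  let cursor = "";
  while (shouldContinue()) {
    const response = await fetchImpl(
      updatesUrl + "?since=" + encodeURIComponent(cursor)
    );
    if (!response.ok) {
      // Wait a second before retrying a failed request.
      await new Promise((resolve) => setTimeout(resolve, 1000));
      continue;
    }
    const body = await response.json();
    applyLayerUpdates(store, body.messages);
    cursor = body.cursor;
  }
}
```

The URIs in the store are what the RESTful half serves: once a message names a resource, the client can issue a plain HTTP GET for it like any other document.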
Initial discovery of layers is performed using mDNS (a la Bonjour in Safari), by visible marker, by hand-entered URL, or by querying the GOOG for layers by content-type and location.
Display is handled by the web browser you also use for 2D browsing. In the same way that the browser can open "Top Sites" or "History" pages, the browser can open "Augmented Reality" pages. The most prominent part of an AR page is an HTML5 video element (showing a stream from your device's camera) overlaid by a canvas element using a WebGL context to render two and three dimensional resources. The JavaScript on the page manages the Comet connections to the layers, the geolocation and orientation event handling, recognition of fiducial markers, and canvas drawing.
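One piece of that page script can be sketched without any browser at all: given the device's position and compass heading, work out where a resource falls across the canvas. This is an illustration under stated assumptions, not a full renderer. The function names are hypothetical, the earth is treated as a sphere, and the angle-to-pixel mapping is linear, which is a simplification of real camera perspective.

```javascript
const EARTH_RADIUS_M = 6371000; // mean radius, spherical approximation
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Great-circle distance in meters (haversine formula).
function distanceMeters(from, to) {
  const dLat = toRad(to.lat - from.lat);
  const dLon = toRad(to.lon - from.lon);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(from.lat)) * Math.cos(toRad(to.lat)) *
      Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Initial compass bearing from `from` to `to`, degrees clockwise from north.
function bearingDegrees(from, to) {
  const phi1 = toRad(from.lat);
  const phi2 = toRad(to.lat);
  const dLon = toRad(to.lon - from.lon);
  const y = Math.sin(dLon) * Math.cos(phi2);
  const x =
    Math.cos(phi1) * Math.sin(phi2) -
    Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLon);
  return (toDeg(Math.atan2(y, x)) + 360) % 360;
}

// Horizontal pixel position for a resource, or null when it is outside the
// camera's horizontal field of view.
function screenX(headingDeg, targetBearingDeg, fovDeg, canvasWidth) {
  // Signed angle from the view center to the target, in [-180, 180).
  const offset = ((targetBearingDeg - headingDeg + 540) % 360) - 180;
  if (Math.abs(offset) > fovDeg / 2) return null;
  return (offset / fovDeg + 0.5) * canvasWidth;
}
```

In a browser the inputs would come from the geolocation and orientation events, and the resulting x position (with the distance used for scaling) would feed the canvas drawing code.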
Yes, all of the open questions in any other AR system still exist (e.g. How do I find and present metadata for both a statue on the ground and a plane flying overhead?) but the takeaway here is that the exploration to find those answers should happen on the web so that we don't have to then watch the web eat all of our hard work like it did with video and is currently doing with MMOGs.
Code bases of interest:
FriendFeed's Tornado httpd for non-blocking, long-lived Comet connections; Safari's and Firefox's nightly builds with WebGL; the geolocation API; and the browsers' HTML5-based video and audio elements.
Final note: I'm fully aware that we can't build this without changing today's browsers. That said, there's nothing here that requires changes to today's web protocols and standards beyond what any half-asleep monkey can see will soon occur. Be the change you want.