Learning Art History in Context: A Model of Borobudur and the Limits of Reality

by Michael Greenhalgh <Michael.Greenhalgh@anu.edu.au>

Introduction

Learning the history of art or architecture has generally meant sitting in a darkened room looking at slides on screens, or looking at pictures and reading the accompanying text in a book. The only context, physical or intellectual, is that provided by the lecturer or the text. As for how the learning happens, the student has little control over its timing, nature, content, direction or pace. Hence the attractiveness of the promises proffered by computerization because, using digital multimedia, the student should be able to control all such parameters, and to learn better because Virtual Reality provides a richly-textured physical context - a good second-best, perhaps, to visiting the actual monument or site. The web increases flexibility and convenience of delivery, and tools are available to construct virtual "worlds" and to view and manipulate them in web browsers.

With the euphoria endemic to computing and software development, ambitious projects are begun which scholars hope will provide realistic and highly-detailed virtual environments from which their students may learn. This paper examines one such project, namely a model of Borobudur, for which a computer "wire model" was first constructed, and then "clothed" with photographic images to approximate to a version of "reality". The paper explains in detail why the Virtual Reality Modelling Language (VRML; cf. the Beginners' Guide) cannot deliver the goods because it can never provide cost-effective, detailed and accurate models of large-scale objects or complex sites. The paper concludes that we must therefore look elsewhere for practicable pseudo-immersive web-based technologies, and offers some suggestions.

The search for reality

The search for ways of giving the impression of reality to a constructed artwork goes back at least to the Hellenistic Greeks, with perspective developed as a tool offering rules for the placement of objects in believable space. Hence in the popular imagination, the art of the Renaissance goes hand-in-hand with the delineation of the real world via perspectival tricks, and the delineation of atmospherics (as in Masaccio's famous Tribute Money fresco, SM del Carmine, Brancacci Chapel) to accentuate the impression of distance.

Wall-sized frescoes on adjacent walls from at least the time of Augustus, such as the garden room from the Villa of Livia at Prima Porta (now Rome, Palazzo Massimo), immerse the viewer in a fictive space by providing views considerably larger than a human field of vision - offering a "wrap-around" comprehensive view seen on a smaller scale in panoramic drawings and paintings. From the 18th century, provoked perhaps by developments in stage scenery (cf. panoramas with a literary origin), special buildings are constructed to display large-scale wrap-around, usually 360-degree panoramas - a vogue that continues even with the development of photographic panoramas, and of which several working examples survive today. The simplest was the continuous wrap-around image, painted and displayed in circular buildings, laid out so that the spectators could position themselves in the centre, and inspect the painted "world" wrapped right around their horizon. Such a specification required a substantial structure. For example, Burford's Panorama in Leicester Square (London, England) was immense, as this cross-section from 1801 demonstrates:



Such immersive spectacles depended for their effect on the artist/photographer's skill. The viewer was placed in the middle, and either moved on the central axis to view the work or, in some cases, the spectacle itself rotated while the spectator stayed still.

In no case (except for modulations of lighting) could the artwork be manipulated, for this always remained in the same perspective position relative to the spectator - although artists from at least the Hellenistic period were adept at manipulating perspective and creating their own versions of virtual reality.

Apparent attractions of computer-based Virtual Reality

The inflexibility of the pre-computer immersive spectacle was apparently to be broken by a machine which could indeed conjure up scenes that the spectator/user could now manipulate, with some measure of automation involved. Indeed, at first sight, the ability to create new "worlds" via computer software commands offers a true third dimension to the flat computer screen, outpacing pre-computer "worlds" where artistry and perspective had to be used to conjure up "reality". Notionally, at least, the computer user can pick up such a constructed "world", move in and between its elements, examining them in close-up or from a distance, and from any angle - sometimes even from outside the "world". Nor were such worlds intended just for landscape or gaming simulations, for in the most excitable predictions we would lose the computer monitor altogether, and occupy a virtual space for our work, moving effortlessly from the computerised Desktop to three-dimensional virtual filing-cabinets from which we can withdraw digital documents. Or we could interact with others in a completely virtual environment. Such projections require a lot of computing power, and have yet to be shown to work at a high level of detail. Just what the advantages of such a hugely expensive setup might be is never clear; and I can find no working examples of such setups beyond mere demonstrations. As we all should know by now, a demo is often a false promise, because it promotes what the developer can do, rather than what the user necessarily needs; and the stipulation "Show me the software doing now what I want it to do" will usually bring reality back, and with it the realisation that between feasibility and practical use there can be a huge gulf - even before we begin to analyse utility and cost-effectiveness.

VRML is only one of several ways of constructing "reality" - that is, a simulacrum of our world, with recognizable objects occupying convincing space. Computer games and professional simulation (e.g. for pilot training) are where the VR money is to be made; but neither area requires a high level of detail, nor yet a large variety of completely different ensembles such as (for example) would be required for a convincing model of a great cathedral or palace.

Indeed, VR/VRML technologies can work well in those instances where the world to be constructed is completely imaginary (as in a computer game - which usually moves so fast that intricate detailing is not required), or where it is a reconstruction of something now lost. In the former, best seen in computer games, no strict relationship with reality is needed or perhaps even desirable. In the latter, exemplified by archaeological reconstructions, the often tentative nature of the exercise precludes the need for a high level of detailing (by definition, suitable levels of information are rarely available); and hence the viewer accepts the abbreviated result as natural, because the computer model is simply a higher-tech version of something produced a decade ago on a drawing-board.

But what about those instances where the aim is to build in the computer a simulacrum of the real world, as it exists, and as it can be measured and photographed, here and now? There are many disciplines in which such models would be useful: in Art History, for example, they would get students away from the restricting format of the flat screen in the lecture room, and give them three-dimensional experience of monuments and sites which perhaps they could never hope to visit "in the flesh". Again, VRML has apparently offered enticements in archaeology, although sophisticated and detailed models are absent - and the tendency is to deal with individual buildings, rather than with immersive worlds. This is because the requirements of a VRML model sophisticated enough to satisfy researchers are considerably more exacting than those for simulations developed e.g. for pilot training.

Unfortunately, VRML is unable to oblige in building a real world, because the techniques it uses are incapable of attaining at any reasonable cost either detail or accuracy sufficient to convince the viewer that what is seen is indeed indistinguishable from the real world. That this is the case is naturally reflected in two research areas, where scholars attempt to plug deficiencies in "world-creators" such as VRML: one examines automation and computer vision (e.g. video cameras and computers on a robot trolley) to capture data for offering to VRML; the other provides algorithms for the simulation of tones, textures and shapes in the real world. In other words, because VRML is tedious and exacting to construct by hand, and cannot offer textures that convince the eye, software must be developed to compensate. Nevertheless, as we shall see, given the amount of effort that has gone into such developments, the lack of any reasonable outcome at a moderate cost after years of effort should be enough to convince us that we must pursue computer-based reality in other directions.

If the above seems overly sceptical, then a short time surfing the web and looking for large-scale, detailed and accurate VRML projects - or even demos - should convince the reader otherwise. "Show me the software doing now what I want it to do" is still the stipulation, but there are more promises than fulfilment. For all the great skill and achievements of those working on algorithms to plug the holes in VRML and like methodologies, this does not yet add up to any set of easy-to-use general-purpose tools that address the twin problems of detail and accuracy.

What is more, it is conceivable that VRML is finished as a development medium, for new plugins (so that VRML may be presented to a web browser) are thin on the ground; and the adventurous (and apparently easy-to-use) program Canoma died without even reaching a second edition.

Photographic not Virtual Reality?

Those who need detailed, accurate and cheap-and-easy solutions to the problem of providing a more heightened impression of a building or site than can be had from flat 2D window-on-the-world photographs or computer models are thrown back on extensions to photography. "Keep it simple" is the motto here, and we may arguably achieve our aims by expending less effort than would be required for even a simplistic VRML model. For example, a PhotoModeler demo-model looks better the larger the number of photographs employed for building it; so that, for constructing a complete immersive world, an extensive photographic campaign would be needed. So why not use a similar number of photographs to produce panoramas, stereo pairs, and imagemaps? I hope to deal with these far easier and cheaper technologies in a future paper.

Borobudur: a "simple" task for VRML?



As we shall see, making a VRML model involves building a geometrical armature, and then "clothing" it with its detail, which is usually derived from photographs. One of the great monuments of Buddhist sculpture, constructed on the island of Java about 800 AD, Borobudur should be on anybody's list of the ten greatest art-complexes in the world for its size, quality, sophistication and excellent state of preservation. The mere figures for Borobudur (e.g. 2500 square metres of narrative and decorative reliefs, 504 sculptures) are stupendous, and make the effort involved in sculpture for a run-of-the-mill western cathedral look relatively puny.

As well as the importance of the monument per se, we chose Borobudur because of the local availability of a high-quality photographic survey, and also because we knew that VRML could handle the decorative and figural reliefs easily, each one being rectangular and consistent in size, so that clothing the model involved simply pasting each relief in its appropriate position on the armature. Moreover, because of the dour mid-dark-grey of its stone, Borobudur is one of the few monuments that may reasonably be "rebuilt" using greyscale photographs, because this is the colour of the stone, and we have no information on whether/how it might have been stuccoed or coloured. In other words, from the start we were in a sense tailoring our choice to the technology available - whereas arguably we should have been assessing the ability of VRML to handle the difficult elements, namely the three-dimensional sculptures, and then surely dismissing VRML as inadequate and abandoning the project. Instead, we chose to concentrate on the relief sculptures, and modelled the 3D ones and their complicated architecture once, replicating each section as required, as can be seen in this view of the whole stupa:



Building the VRML Model

The first task was to establish an image database of the sculptures, reliefs and views of Borobudur, and to present them as an indexed sequence of HTML pages, with thumbnails, record fields and hotlinks to the larger images. This would be used to locate and then paste the reliefs onto the armature:


We then produced a large (1.3Mb) CAD (Computer-Assisted Drawing) version of the stupa, with measurements, hoping to use this for the semi-automatic construction of the VRML model, since such CAD packages can output into VRML and many other types of code. However, this approach was estimated to be too slow, because each texture (i.e. photograph of a relief) needed pasting in place by hand. Dr. Ajay Limaye, of our Supercomputer Facility Visualization Laboratory, therefore constructed his own VRML model, and automated the clothing-with-photographs process by means of Perl scripts.
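
The automated "clothing" step can be pictured with a minimal sketch. It is written in Python rather than the Perl used on the project, and it is an illustration only, not a reconstruction of Dr Limaye's scripts: the Relief record, its field names, the image file names and the wall coordinates are all hypothetical. It emits one textured rectangle per relief using standard VRML97 nodes (Shape, ImageTexture, IndexedFaceSet), which is one plausible way of pasting rectangular photographs onto a flat armature.

```python
from dataclasses import dataclass


@dataclass
class Relief:
    """One rectangular relief photograph (all fields are hypothetical)."""
    image_url: str   # photograph to paste on the wall
    x: float         # left edge along the gallery wall
    y: float         # bottom edge above the pavement
    width: float
    height: float


def relief_to_vrml(r: Relief) -> str:
    """Return a VRML97 Shape: a flat rectangle textured with the photograph."""
    x0, x1 = r.x, r.x + r.width
    y0, y1 = r.y, r.y + r.height
    return f"""Shape {{
  appearance Appearance {{
    texture ImageTexture {{ url "{r.image_url}" }}
  }}
  geometry IndexedFaceSet {{
    coord Coordinate {{ point [ {x0} {y0} 0, {x1} {y0} 0, {x1} {y1} 0, {x0} {y1} 0 ] }}
    coordIndex [ 0, 1, 2, 3, -1 ]
    texCoord TextureCoordinate {{ point [ 0 0, 1 0, 1 1, 0 1 ] }}
    texCoordIndex [ 0, 1, 2, 3, -1 ]
    solid FALSE
  }}
}}
"""


def build_gallery(reliefs) -> str:
    """Concatenate the VRML97 header and one Shape per relief."""
    return "#VRML V2.0 utf8\n" + "".join(relief_to_vrml(r) for r in reliefs)
```

Because every relief is a rectangle of consistent size, the only per-relief inputs in this sketch are a file name and a position, which is what makes the step easy to script and also why the same approach does not extend to the three-dimensional sculptures.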

The resultant VRML model presentation is very flexible: it provides automatic tours around the stupa, or tours "driven" by the user, which may be slowed down, stopped, or reversed. The user may zoom in or out, and indeed bring up both large images of the reliefs and (where appropriate) an account of the stories they relate on a separate HTML page. Because of network and machine speeds, the project was sliced up into manageable sections, and also offered at different resolutions, with images of 128, 256 and 512 pixels (cf. the VRML home page).

Navigation around such a large project could also be a problem, so we provided a translucent plan of the site, with a red dot which moves to indicate the exact location (left-hand image) and a second overlay which allows the user to change to another part of the stupa, and to change image resolution (right-hand image):



Finally, VRML allows another memory- and network-saving tool, namely the loading of images as the user approaches, and the dropping of frames once they are "behind" the user. This can be seen here, with the more distant reliefs not yet loaded in the left-hand image:



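The load-on-approach behaviour just described can be sketched in a few lines. This is a hedged illustration in Python, not the project's code: the article does not say which VRML mechanism was used (VRML97 itself offers LOD and Inline nodes for this kind of purpose), and the distance thresholds, the two-dimensional plan coordinates and the function names below are all assumptions. Only the three image sizes (128, 256 and 512 pixels) come from the text.

```python
import math

# Image sizes offered by the project; the thresholds below are hypothetical.
RESOLUTIONS = (128, 256, 512)


def pick_resolution(viewer, relief, near=5.0, far=15.0):
    """Choose an image size from the viewer's distance to a relief.

    viewer and relief are (x, y) positions on the site plan.
    """
    distance = math.hypot(relief[0] - viewer[0], relief[1] - viewer[1])
    if distance <= near:
        return 512
    if distance <= far:
        return 256
    return 128


def should_load(viewer, heading, relief, max_distance=30.0):
    """Load a relief only if it is within range and not behind the viewer.

    heading is the (x, y) direction of travel; a negative dot product
    means the relief has already been passed and can be dropped.
    """
    dx, dy = relief[0] - viewer[0], relief[1] - viewer[1]
    if math.hypot(dx, dy) > max_distance:
        return False
    return dx * heading[0] + dy * heading[1] >= 0
```

A viewer walking along a gallery would call should_load for each relief on every step and fetch only those that return True, at the size given by pick_resolution.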
VRML and Complicated Projects

Although Borobudur is an intricate model teeming with images, it is a regular structure, with much the same relationship between reliefs and backing wall in every gallery. Only the stupa terrace, with its bell-shapes, is a little different. The VRML model was built by hand for two reasons. I have already given one reason for Dr Limaye building our VRML model by hand (automation); the other is complication. That is, the complexity of the profiles and curves built into our initial AutoCAD model was too great for VRML to handle in files of manageable size; so we could not simply translate from one format to the other and so, once again, we tailored the project to the capabilities of VRML.

Programs for automating the processing of photographs into models are certainly more developed and more flexible than when we began Borobudur in 1997 (cf. ShapeCapture, ImageModeler and PhotoModeler), but none of them offer much automation, and they all depend on a human being defining and locating dots and lines in space so that the program can turn them into 3D shapes and volumes.

From our experience, the problem with the computer modelling of large objects lies precisely with the inability of any program to "recognise" forms from flat photographs, and then build three-dimensional models from those forms. (Note that small objects that can be placed on a turntable and processed by camera+computer work very well in VRML, using laser-scanning or a digital camera; larger objects such as Michelangelo's David require expensive equipment.) Since within normal budgets constructing the geometries must be done by hand, the trade-off is between time and quality. The longer the time spent plotting lines, edges and curves, the greater the precision, and the better the model. Again, the simpler the source, the better-looking the result. Simple cubic structures work well, but curves/domes/profiled pediments do not. Thus Le Corbusier's Villa Savoye at Poissy could be attempted, but Baroque churches - like Borobudur with its elaborate Buddha niches - seem too intricate for speedy construction. It is not coincidental that v.4 of PhotoModeler uses two children's wooden building blocks for the first tutorial, nor that a main use for the program is in car-crash investigation. For any such program, large quantities of images (preferably from all angles) are required, as we have already seen. And for accuracy, alignment targets fixed to the architecture might be needed; another technique some software uses to increase accuracy is the sophisticated measurement and cognizance of camera focal lengths.

Given the time, care and hand-construction needed by such programs, it is not surprising that I can find no large-scale VRML models produced with such software that approach the complexity required by the real world of architecture (curves, domes, entablatures, statues). Nor is it surprising that some software engineers actually recommend a highly selective use of both reality and detail in their models. Small wonder, therefore, that enthusiastic predictions about the future of VRML have run into the sand.

Shortcomings of Model Construction in VRML

Even a cursory examination of the Borobudur model will reveal shortcomings which stem from the almost impossibly complicated nature of real life, which the computer cannot model - and, I venture to suggest, will never be able to model accurately within sane cost boundaries. The problems with the Borobudur model (shared by all other VRML models I have seen) include:

  1. Level of detail: The model does not offer a detailed representation of the stupa, because we could not afford the time necessary to specify every curve and profile. The reliefs are well-lit and already in high relief; had we needed to model each detail by hand (for over 2,000 reliefs), the project could not have been started; as it is, we "cheated" because the photographs of the reliefs already give the impression of being 3D;
  2. Duplication of difficult elements: From (1) follows the usual trick for model building - namely to build one repetitious element (column and capital, cornice, niche), and then simply to multiply that single build however many times are required. In Borobudur, this is evident from the Buddha niches: one was modelled, and then repeated as needed;
  3. Level of accuracy: Given (1) and (2) above, it is hardly surprising that, while the model gives a good general impression of the stupa, it cannot be used for detailed study because it is simplistic and impressionistic rather than accurate. A researcher concerned with any element apart from the reliefs would do better to use the photographic survey. Indeed, with the model we have in one sense moved backward from the photographs: it is arguable that we use computers to improve on what was once done by hand (searching databases, or comparing files) - whereas with Borobudur computer software plus human intervention has been unable to improve on photographs taken decades ago, and has in fact downgraded their impact and detail with simplifications required by the software;
  4. "Theatre scenery" effect: modelling of walls is done in two dimensions, so the walls have no thickness and appear like stage flats; and, as on a stage, if we go behind the scenery, what we see is either blank, or a simple repetition of what is seen from the front. In other words, such a short cut gives us the Buddha niches above the reliefs; but to model all four sides of such niches (plus from the top) would have cost too much to complete satisfactorily;
  5. Dead textures and general lifelessness: In spite of the availability of various lighting algorithms, and of the strong lighting in the photographs of the reliefs, the Borobudur model, especially in the computer-constructed parts such as the Buddha niches, the gallery pavements, and the general exterior views, has the insipid deadness of the artificial. It does not look like part of the real world, and lacks substantiality, and weight;
  6. No people: People are as difficult to construct in VRML as sculptures, and the odd elephant and retinue ambling in front of the stupa helps only a little with suggesting depth and real-world, breathing objects.

Context through scale and immersion

There is no adequate substitute for visiting an actual site - and no evidence yet suggests that computers will be able to provide one in the near future. So can the Borobudur VRML model approach verisimilitude and, if so, how much better is it than just the window provided by a computer monitor? One technique is to get rid of the window-like paradigm of the monitor, and project the image(s) onto one or more walls, so that the user can stand "within" the structure(s) represented. Thus the Wedge at our SuperComputer Facility Visualization Lab is a room with two large screens (each about 3 metres long by 2.2 metres high) set at right-angles, onto which are back-projected split computer images which are viewed in stereo through glasses, to give added "space" and "reality". A Head-Mounted-Display (for turning left or right, up or down) and a hand-held wand for moving forward or drawing back are provided. Similar facilities exist around the world - for example Manchester's Visualization Immersive Projection Laboratory.

Museums, Panoramas and Virtual Space

Given that museums are by definition artificial environments of works extracted (for whatever reason) from their context, and that computer software holds out the expectation of the reproduction of physical environments, it might follow that museum curators should become enthusiastic users of visualisation techniques, as a way both of being "leading-edge" in our web-mad world, and of increasing "visits" both virtual and real as justification for their existence and funding needs. Indeed, the lack-of-context problem is exacerbated by the fact that objects from one context can be spread around the world (e.g. the Parthenon frieze, the Pergamum Altar, several of the great collections of artworks formed in the Renaissance, or any loan exhibition anywhere). And some museums and galleries have indeed had brief love-affairs with VRML.

Ambitious plans have been laid. The Guggenheim Virtual Museum, for example, in a web page last modified in September 2001, will not only provide global access to all Guggenheim Museums and their services, amenities, archives, and collections but will also provide a unique and compelling spatial environment to be experienced by the virtual visitor. In addition, the virtual museum is an ideal space for the deployment and experience of art and events created specifically for the interactive digital medium where simultaneous participation, as well as viewing is made possible for an audience distributed around the globe.

If VRML can be too labour-intensive and time-consuming for many applications, then technologies such as QuickTime and related panorama- and wide-angle-producing software can help. Computer panoramas are, as we have seen, the direct descendants of full-size panoramic viewing galleries popular all over Europe from the Enlightenment onwards, such as the Eidophusikon - but with the distinction that they can be multi-layered, with hotspots which allow movement at the behest of the user, and the display of sets of images or text-pages "behind" the hotspots.
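
How a hotspot might be resolved in a 360-degree panorama can be shown with a small sketch. It is in Python and entirely illustrative: it does not describe how QuickTime or any particular panorama package stores its hotspots, and the Hotspot record, the angle convention (degrees of pan, 0 to 360) and the page names are assumptions. The one real subtlety it handles is a hotspot that straddles the seam where the panorama wraps from 360 back to 0 degrees.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Hotspot:
    """A clickable span of the panorama (hypothetical structure)."""
    start_deg: float  # pan angle where the hotspot begins
    end_deg: float    # pan angle where it ends (may wrap past 360)
    target: str       # page or image set shown "behind" the hotspot


def find_hotspot(pan_deg: float, hotspots: Sequence[Hotspot]) -> Optional[Hotspot]:
    """Return the first hotspot containing the given pan angle, if any."""
    pan = pan_deg % 360.0
    for h in hotspots:
        start, end = h.start_deg % 360.0, h.end_deg % 360.0
        if start <= end:
            hit = start <= pan <= end
        else:
            # The hotspot straddles the 360/0 seam of the panorama.
            hit = pan >= start or pan <= end
        if hit:
            return h
    return None
```

Each hotspot here simply names a target; following it to a separate page or a higher-resolution image is left to whatever viewer displays the panorama.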

Panoramas may be "continuous", through 360 degrees, or just some segment of wrap-around space. As with VRML, the image can be panned and zoomed, and hotspots inserted which move the user to separate HTML links or pages. Very high-quality zooming is possible, but the technology entails the intricacy of fitting panels representing high-quality images within the panorama itself, thereby effectively offering a panorama with two different resolutions - a low one for the panorama itself, and a higher one for the artworks. The disadvantage (beyond the fact that this works well only for rectangular objects) is the amount of computer memory etc. required to load and manipulate the panorama, but the finished effect is a good one. The user can approach the target object, and simply keep on zooming. In the example below, from the Dobell Exhibition at the Drill Hall Gallery of the Australian National University (September 1999), the large painting to the left of the first image is approached in the central image, then zoomed into in the right-hand image. Only at a relatively high resolution does the zoomed image begin to break up - whilst the frame of the painting (at the lower, "panorama" resolution) would be very pixellated:



Of course, making such a digital record of the Drill Hall (or any gallery or museum) suggests that the space provided is somehow ideal: we record the space, when we should perhaps be concentrating on the works displayed. Assuming no display space is ideal, and that every exhibition requires different display parameters, it may be argued that the construction of a purely digital space can provide a better record of a physical exhibition than the collection of walls, windows, glare, shadows, glass and shiny floors that bedevil accurate photography in many museum spaces.

The Virtual Museum as the Ideal Museum

Given the difficulties faced by many physical museums (lack of display space; 90% of items in store; distant from the majority of any persons who might want to visit; with only some of the items available that are needed to build the "context" of an art work), perhaps what we need for the Web is an Ideal Digital Gallery - a piece of purely digital real estate. This would be available 24 hours per day, with all information downloadable, with no glare, great light on all objects, catalogue immediately available, and no other visitors getting in the way of our viewing! Such a gallery could be of any scale, its walls and spaces of any colour or shape, with each curator able to call up ideal conditions suited to the different kinds of works to be displayed. "Hanging the exhibition" - i.e. positioning all the materials that make up the display - would be done at the computer screen, preferably using a Web browser, as would sizing the exhibition space, painting the walls, and priming the inevitable Virtual Coffee Machine.

With limited or diminishing resources, should we have virtual museums instead of physical museums, converting the latter to storerooms for specialists? Is it important not to conflate the idea of the museum with its physical site which, after all, may be difficult of access, and of inadequate size to display all its treasures? To promote virtual museums is not, of course, an attack on visiting real sites - on cultural tourism, one of the treasured institutions of our culture, but rather the promotion of ways of reintegrating works in their context, bypassing the piecemeal approach perforce found in museum displays. Of course, seeing the actual objects cannot be beaten, but we should acknowledge the pressures militating against both visiting sites and viewing travelling exhibitions (the two sides of the same mirror, as it were). As well as access and display constrictions, some sites (such as the Acropolis at Athens) are so heavily tramped that sections are roped off. Insurance and conservation problems sometimes militate against travelling exhibitions of great works today, as fewer borrowing institutions can afford to insure, and fewer lending institutions want gaping holes amongst their treasures. Sometimes only the blockbuster, with the look, feel, intellectual level and catchment of Disneyland, survives. The ideal digital exhibition, on the other hand, can offer all the works required to everyone, without cheapening the presentation into a fairyland.

Conclusion: Keep it simple!

The disciplines of Art History and Museology must certainly address the growing importance and flexibility of digital multimedia as aids in teaching, learning and presentation if they are to get away from the paradigm imposed on the one hand by the restrictions of traditional "lantern slide" technology and on the other by the cultural baggage of traditional perceptions of what museums are and should be. I already lecture with digital images from my website onto a screen in the lecture theatre; and there seems little reason why, to add verisimilitude to learning, The Wedge as discussed above should not make its appearance in the lecture theatre - at the same time as (ironically) miniature wireless-fed computer/viewing devices suggest the approaching death of the "physical" lecture itself.

In our new virtual education world, what part will museums and galleries play? Are they able to adapt and somehow integrate their objects with that virtual world? Until a virtual revolution occurs in Virtual Reality software, museums and galleries, like lecturers, must content themselves with relatively simple technologies - computer equivalents of the nineteenth-century vogue for giant panoramas and stereo viewers - and add video and the facilities of the web (hotspotting, imagemaps, and sound) to their armoury. To persist with technologies such as VRML in the face of evidence of its inadequacy for producing high-quality, detailed "worlds", is simply to waste scarce resources. And if "physical tourism" continues to increase, the same question will be posed about important assets such as World Heritage sites for which the aim of UNESCO is to protect natural and cultural properties of outstanding universal value against the threat of damage in a rapidly developing world. If to the delivery capabilities and flexibility of the web, we add wall-sized displays and/or miniaturised wireless computers which are truly personal, then it will develop from a narrow "window on the world" into a vehicle for more immersive and user-controlled experiences, no matter what the subject-matter.

5th June, 2002
Michael Greenhalgh
The Sir William Dobell Professor of Art History
The Australian National University
Michael.Greenhalgh@anu.edu.au