Monday, March 3, 2008

WHAT YOU CAN'T SEE IS WHAT YOU DON'T GET: Life and death of the third-person narrative in the current visual paradigm

I have been playing The Witcher lately. This Polish RPG has received positive reviews in both the Old and the New World, despite its minor but obvious flaws, inconsistencies and glitches (but CRPG fans cannot be picky these days...).
One of the inconsistencies ruined my suspension of disbelief big time. As usual with Baldur's Gate-inspired RPGs, most of the narrative is revealed in conversation with NPCs. But, given the limited animation they have, the NPCs sometimes speak as if they were doing something other than what they're doing in their visual representation. The chemist in the chemistry workshop welcomes you while mixing substances to make an explosive, and asks you (or rather, the player character) to hold your breath, so that you don't interfere with the experiment. What struck me as strange was the fact that while he was describing the delicate operation he was performing, his "avatar" was just standing around with bare hands, seemingly engaged in casual conversation. This raised two uncomfortable questions: 1) Why is he describing his actions, if ("in the reality of the game world") I should be able to watch them for myself? 2) Why can't I see it if it obviously is happening "in the reality of the game world"? This is not an uncommon inconsistency, especially with RPGs.

Trying to work out why this is so strange led me to investigate the modes of visualization of game worlds. I do not want to take any shortcuts, but I immediately thought the discrepancy could be handled by switching to a less iconic representation, in which the character's action could be described by third-person narration. It would make so much sense, given that The Witcher is a novel adaptation - and what an awesome novel it is! But written language is pretty much banned from contemporary AAA games (portable consoles being the exception). Could its use enhance the storytelling and gameplay experience? And what is its relationship to the current visual paradigm?

First, let us take a look at the history of the rhetorical relationship between fictional worlds and their visual representation in digital games. I would like to distinguish between two approaches that make more sense once they're opposed to each other. The first of them we might call illusionism. In games using this approach, there are no signals that the actual game world should look otherwise than its representation or that there could be more to it than you can see. The world of Mario Bros., for example, is exactly what it looks like in the game. The same holds for Doom. Of course, each of us can make his own mental image of this world, but the graphics of the actual game will be the primary source. Graphics come first.
The second one we might dub illustrationism. In this approach, the graphics are obviously just a part or a version of what the actual world looks like. They are mere illustrations. The game admits that it does not show you everything. One of the best examples of this approach is that of early illustrated text adventure games. In those, graphics served the same purpose as illustrations in printed books do. They were hints for the imagination, and most of the world description was presented as written text. This approach was, at the time, arguably better for conveying images of more complex worlds and more complex actions, because not all of them had to be seen; much of it could be just verbally described. In other words, the world comes first and the graphics try to catch up. ASCII roguelike games are even beyond illustrationism: in these, the ASCII characters are not "representations" of the game world characters in an iconic way of visual resemblance - they are simply indices, placeholders. They just show the spatial whereabouts of the character in relation to others.
The illusionist graphics maintain the illusion that you can see all there is to be seen. And with the current trends in rendering and display technologies and the ability to look at any game object from any angle, it is the prevalent paradigm - because it would be humiliating to admit that there are limits to what the graphics engine can represent visually. The graphics engine is the device of the objective truth of the game world. Markku Eskelinen, everyone's favorite extreme ludologist, describes this (in one of his most inspired moments) as some kind of military ideology: everything can be consistently seen (in a constant level of detail) and mapped with the world divided into several "zones". It is indeed militaristic given that it is the ideal form of expression for first person shooters - in which you need to SEE in order to survive. There's no time for fuzziness or fancy.

Some video game genres, however, tend to contain complex storylines enacted within the game, and those are for the most part adventure games and RPGs. The complexity of the worlds makes it incredibly hard for the developers to elaborate the game world visually in a constant level of detail. And there are basically two ways of dealing with this.
One of them is editing. It works really well in adventure games. In The Secret of Monkey Island, for example, it would be unsustainable to draw the graphics of the whole game world on the same level of detail as the main locations have. That's where the bird's-eye-view maps come in, on which the main character is represented by nothing more than a microscopic dot. When the "moving around" part is foregrounded and the "interaction with close surroundings" part is backgrounded in the gameplay, the change of perspective follows. The game thus presents the player with two or more cohesive, but distinct views of the world.
Another way of editing is the synecdoche effect many adventure game players are familiar with: representing a certain setting in a fictional game world by just a part of it. To give a fairly recent example, in the Abe Lincoln Must Die part of the new Sam & Max series, you are able to enter the White House - but the only room you can get to is the Oval Office. The Oval Office stands for the whole White House and it is the place where the action is. There is no explicit explanation of that, it's just a convention. As you enter the White House, you go directly to the Oval Office. This spatial inconsistency is enabled by the fact that the gameplay is localized to certain spots in the game world and there need be no "common laws of physics" valid across the boundaries of the game locales.
In the more space-conscious games of yesteryear, the space is preserved at the expense of visual detail. The lack of visual information is then compensated by hybridized code, or in other words, by augmentation by written language. It's no big news that language is an incredibly powerful medium, capable of not only compensating for missing visuals, but bringing in new information as well. Let's take a look at Wizardry VII: Crusaders Of The Dark Savant, a 1992 psychedelic RPG that uses a lot of written language by virtue of being a "fantasy role-playing simulation". Pretty much all of the graphics in this game are just a basic rendering of the game map made out of floor and wall tiles (plus monsters and NPCs). There are four sets of these, which is enough to distinguish an "underground" locale, "town", "forest" and "cave", but not enough to distinguish between the "throne room" and the "storage room". The visual representation is more a representation of "the map" of the world in the indexical sense: it helps you locate yourself and gives you the structural backbone of the environment. The fictional gameworld is distinguished and described in further detail in the third-person "dungeon master" narrative:

This way, you can build a huge game without enslaving whole nations to draw graphics for you. And when the writing is good, you can actually smell the places you go to. This is not to say this illustrationist approach was common to all CRPGs of that time - Lands Of Lore, for example, went on to visualize as much as it could.
In the next generation of Western CRPGs, there seemed to be a tendency to concentrate the writing into the dialogs. The late 90's Infinity Engine games such as Baldur's Gate and Planescape: Torment offered a pre-rendered environment in a fixed-angle top-down view. This gave a good overview of the general situation, but left a lot of space for the player's fancy in terms of the looks and gestures of the characters, and could not really capture minute details. Planescape: Torment, widely considered to be the pinnacle of digital game storytelling and a game of deep philosophical insight, is an incredibly text-heavy RPG. In fact, you don't get to SEE the most interesting stuff in the game. You just READ about it. The game designers simply abused the Infinity Engine dialog box to include not only dialogs, but also memories, thoughts, object interaction and environment description. The story is bursting out of the engine, because it is too strong and complex to be captured by it. In the illustrationist manner, the game world itself existed (in a way) prior to the game, as it is a licensed product, and it is simply too weird for any graphics engine to do justice to it.

Now, I know that much of the gaming industry operates based on a projected target audience with short or nonexistent attention spans, but by giving up on written text, many contemporary games just lose a lot of their expressive potential. It is strange that the narrative non-diegetic voice has been abolished, while other non-diegetic elements (HUDs, gameplay information) have been retained. Because of this, the assumption that videogames remediate the visual politics of film (which is largely illusionist, but does not usually contain any extradiegetic visual information) is not so clear and obvious.
Good storytelling in the illusionist visual paradigm CAN be done, and Bioshock could be an example - but in Bioshock, the world is insular, there are no NPCs and not much variety in the player character's actions (which is not to say it is not an excellent game).
But to fit a sizable world to this paradigm takes a lot of effort - and still it cannot be done in a consistent way. The Witcher, on one hand, tries to maintain the illusionist view, never resorting to non-diegetic narrative and being spatially consistent. But when it comes to dialogues, the fictional world's story is, again, stronger than the engine. This time, the contents of the dialogue directly contradict the visual representation (in the previous examples the verbal information rather added up to build a complete picture). I cannot see the chemist performing the mixing, because this character animation is not in the graphics assets of the game. But why do I see him doing something different? Instead of seeing a detailed, realistic representation of something ELSE, wouldn't it be better to just see his portrait and read/hear a verbal description of what he is doing? I think it would, paradoxically, contribute to the suspension of disbelief.
I still think that The Witcher is an awesome game (there is drug dealing, vampire prostitutes and ex-lycanthropic girlfriends, mind you!), but it strikes me as a game that would benefit from a third-person narrative. That is not to say I am an interactive fiction advocate - I find pure text adventures just too hard to find my bearings in - but I believe that language can express in one sentence so many things that would take hundreds of artist-hours to visualize. Take T. S. Eliot's "streets that follow like a tedious argument / of insidious intent", for example. Do we shut ourselves off from settings like this by insisting that everything in the game must be seen in 3D?


  1. It could be that I'm not quite getting Raph Koster's Metaplace, but it seems like one thing the Metaplace client will be able to do is scale up/down from text to 2D to full 3D... if that model takes off, does that mean that we'll have environments that are -- at least potentially -- equally lush in text descriptors as graphics?

