Parser grammar power to the power of two!

By LucasGodzilla in Programming February 26, 2018

In my last blog post, I described how we have to do a bit of grammar analysis with the parser in order to properly interpret more complex sentences. If you have not read the article, you should do so now, as the following will build upon what I described last time.

The solution I offered last time detects and processes subjects and objects that may be decorated with an adjective. The problem is that not all such modifiers are inherently adjectives. In many instances, we will detect them as nouns, and only the context shows that they are being used as adjectives. The term “lamp oil” is a perfect example of this because it describes one subject consisting of two nouns.

In a minimalistic parser, when the player enters “look at the lamp oil,” the software would interpret it as “look at the lamp,” followed by the word “oil” without further reference. The parser could theoretically infer that we wanted to enumerate through the nouns and interpret it as “look at the lamp, look at the oil,” but neither actually reflects the original intention.

In my last installment, I solved this problem by inserting a stage in the parser that looks for specific word combinations to identify such decorated nouns. It checks whether the first noun is “lamp” and the second noun is “oil” and then turns them into a single noun named “lamp_oil.” The solution is perfectly feasible and valid. But is it good?

No, not really, and for a number of reasons. First of all, because the substitution happens after the entire sentence has been processed, the slot for theNoun2 has already been used up, which means a third noun would not be processed. (Note: In its current implementation, my parser only processes two nouns, something that can be easily rectified, but I’m just trying to make a point here.) As a result, a more complex sentence like

get the lamp oil from under the bed

would not be correctly processed because it contains three nouns. That’s not good.

Secondly, and perhaps even more importantly, the command the player entered could have been

dip the lamp in the oil

If the parser did the methodical substitution described above, we would end up with an interpreted command that says

dip the lamp_oil with

Whaaat?? Dagnabbit…

Clearly, grammar is not as simple as just that. A bit more work is needed to really nail down the inherent meaning of sentences.

The sequence of words in a sentence is important.

We know that in the term “lamp oil” the “lamp” part always comes directly before the “oil” part. Always. There’s no separating them. If they do not follow each other immediately, the meaning changes and the phrase no longer refers to “lamp oil.”

This brings us to our first improvement. If we keep track of the order in which words are being processed, we can then check whether two words follow each other. Voilà, the solution.

In order to do this, we create a counter that is incremented every time the parser processes a new word. We then assign that serial number to the respective verb, noun, preposition, adjective, etc., as each one is tokenized.

if Vocab [ _cleanWord ] [ "type" ] == WordType.Verb:
	if not globals.theVerb:													# If no verb found yet
		globals.theVerb = Vocab [ _cleanWord ] [ "meaning" ]
		globals.theVerbString = self.TokenLookup ( globals.theVerb )
		globals.theVerbSerial = _serial
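
As a self-contained illustration of the serial counter (a minimal sketch, not the actual parser code), the following assumes that every word the parser reads advances the counter, including words it does not know, such as articles. The names VOCAB and tokenize_with_serials are hypothetical and exist only for this example.

```python
from enum import Enum, auto


class WordType(Enum):
    VERB = auto()
    NOUN = auto()
    PREP = auto()


# Hypothetical miniature vocabulary, just enough for the example sentences.
VOCAB = {
    "dip": WordType.VERB,
    "get": WordType.VERB,
    "lamp": WordType.NOUN,
    "oil": WordType.NOUN,
    "in": WordType.PREP,
}


def tokenize_with_serials(sentence):
    """Return (word, type, serial) triples for every known word.

    Assumption: the serial counts every word read, known or not,
    so unknown words such as "the" leave gaps in the numbering.
    """
    tokens = []
    serial = 0
    for word in sentence.lower().split():
        serial += 1
        word_type = VOCAB.get(word)
        if word_type is not None:
            tokens.append((word, word_type, serial))
    return tokens
```

With this numbering, “get the lamp oil” gives “lamp” and “oil” consecutive serials, while in “dip the lamp in the oil” the two nouns end up three positions apart.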

With this in place, we can now perform specific checks to identify subjects that are decorated by adjectives or another noun, as I do in the following example with the lamp oil.

if Tokens.Lamp == globals.theNoun and Tokens.Oil == globals.theNoun2:
	if globals.theNounSerial == globals.theNoun2Serial-1:
		globals.theNoun = Tokens.LampOil
		globals.theNounString = self.TokenLookup ( Tokens.LampOil )
		globals.theNoun2 = None
		globals.theNoun2String = None
		globals.theNoun2Serial = 0
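
To see the adjacency test in isolation, here is a small stand-alone sketch. It is an assumption-laden simplification: it works on plain strings instead of tokens, and the helper names noun_serials and forms_compound are made up for this example.

```python
def noun_serials(sentence, nouns):
    """Map each listed noun to the 1-based position of its first appearance."""
    serials = {}
    for serial, word in enumerate(sentence.lower().split(), start=1):
        if word in nouns and word not in serials:
            serials[word] = serial
    return serials


def forms_compound(sentence, first, second):
    """True only if `second` directly follows `first` in the sentence."""
    serials = noun_serials(sentence, {first, second})
    if first not in serials or second not in serials:
        return False
    # Same comparison as above: the first serial must be exactly one lower.
    return serials[first] == serials[second] - 1


print(forms_compound("look at the lamp oil", "lamp", "oil"))      # True
print(forms_compound("dip the lamp in the oil", "lamp", "oil"))   # False
```

The second sentence contains both nouns, but because other words sit between them, the serial numbers are not consecutive and no substitution takes place.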

This approach prevents a lot of misunderstandings and it addresses both problems I mentioned earlier. Because we are analyzing the sequence of words, we can now move this check out of the Grammar() function, which runs after all words have been processed, and into the InstaCheck() function, which runs while the input is still being parsed. The immediate benefit is that upon encountering such a compound word, the parser immediately frees up the second noun slot, making room for more words to be processed. This makes it possible to correctly interpret a command like

get the lamp oil from under the bed

Because these kinds of checks follow the same structure over and over again, I decided to create a general, parametrized method for it that can be easily called from within the InstaCheck() function.

def UnifyTwoNouns ( self, token1, token2, token3 ):
	if token1 == globals.theNoun and token2 == globals.theNoun2:
		if globals.theNounSerial == globals.theNoun2Serial-1:
			globals.theNoun = token3
			globals.theNounString = self.TokenLookup ( token3 )
			globals.theNoun2 = None
			globals.theNoun2String = None
			globals.theNoun2Serial = 0
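
The effect of running this check while parsing can be tried out with a stripped-down, stand-alone sketch. Everything here is an assumption made for illustration: ParseState stands in for the globals module, strings stand in for tokens, and parse_nouns is a hypothetical two-slot noun collector rather than the real parser loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseState:
    """Stand-in for the two noun slots and their serial numbers."""
    noun: Optional[str] = None
    noun_serial: int = 0
    noun2: Optional[str] = None
    noun2_serial: int = 0


def unify_two_nouns(state, first, second, compound):
    """Merge two directly adjacent nouns and free the second slot."""
    if state.noun == first and state.noun2 == second:
        if state.noun_serial == state.noun2_serial - 1:
            state.noun = compound
            state.noun2 = None
            state.noun2_serial = 0


def parse_nouns(sentence, nouns, compounds):
    """Fill the noun slots word by word, checking compounds after each noun."""
    state = ParseState()
    for serial, word in enumerate(sentence.lower().split(), start=1):
        if word not in nouns:
            continue
        if state.noun is None:
            state.noun, state.noun_serial = word, serial
        elif state.noun2 is None:
            state.noun2, state.noun2_serial = word, serial
        # With both slots full, any further noun is simply dropped.
        for first, second, compound in compounds:
            unify_two_nouns(state, first, second, compound)
    return state


NOUNS = {"lamp", "oil", "bed"}
COMPOUNDS = [("lamp", "oil", "lamp_oil")]

state = parse_nouns("get the lamp oil from under the bed", NOUNS, COMPOUNDS)
print(state.noun, state.noun2)   # lamp_oil bed
```

Because the merge happens as soon as “oil” is read, the second slot is empty again by the time “bed” arrives. For “dip the lamp in the oil” the same code leaves “lamp” and “oil” as two separate nouns.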

To show you what it looks like, here is a snippet from my InstaCheck() function. See how neat and clean this is? It is easy to maintain, easy to add new compound words to, and easy to extend with even more logic.

def InstaCheck ( self ):
	""" Check for word combinations that can be instantly replaced, while still parsing the input """

	if Tokens.In == globals.thePrep and Tokens.Front == globals.theAdjective:		# In front
		globals.thePrep = Tokens.Before
		globals.thePrepString = self.TokenLookup ( Tokens.Before )
		globals.theAdjective = None
		globals.theAdjectiveString = None

	if Tokens.It == globals.theNoun:												# Handle IT
		globals.theNoun = globals.theLastNoun
		globals.theNounString = globals.theLastNounString

	self.UnifyTwoNouns ( Tokens.Lamp, Tokens.Oil, Tokens.LampOil )					# Lamp oil
	self.UnifyTwoNouns ( Tokens.Jewelry, Tokens.Box, Tokens.JewelryBox )			# Jewelry box

	self.UnifyAdjNoun( Tokens.Brass, Tokens.Key, Tokens.BrassKey )					# Brass Key
	self.UnifyAdjNoun( Tokens.Small, Tokens.Key, Tokens.SmallKey )					# Small Key
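
The UnifyAdjNoun() method called in this snippet is not shown in this post. A plausible sketch, assuming it mirrors UnifyTwoNouns() and that the adjective carries its own serial number, could look like the following. ParseState, its field names, and the string tokens are hypothetical stand-ins for the globals module and the Tokens table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseState:
    """Stand-in for the adjective and noun slots with their serials."""
    adjective: Optional[str] = None
    adjective_serial: int = 0
    noun: Optional[str] = None
    noun_serial: int = 0


def unify_adj_noun(state, adjective, noun, compound):
    """Merge an adjective directly followed by a noun into one noun.

    Assumption: the adjective slot is cleared after the merge, in the
    same way UnifyTwoNouns() clears the second noun slot.
    """
    if state.adjective == adjective and state.noun == noun:
        if state.adjective_serial == state.noun_serial - 1:
            state.noun = compound
            state.adjective = None
            state.adjective_serial = 0


# "take the brass key": brass is word 3, key is word 4
state = ParseState(adjective="brass", adjective_serial=3, noun="key", noun_serial=4)
unify_adj_noun(state, "brass", "key", "brass_key")
print(state.noun, state.adjective)   # brass_key None
```

If the adjective and the noun are not direct neighbors, the state is left untouched, just as with the two-noun check.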

There you go… our parser just got a whole lot smarter yet again. It’s not some abstract, behind-the-scenes improvement but something that directly affects the player’s experience, because the parser will misinterpret the player’s input far less frequently. At the same time, it automates a lot of the logic for the game designer, who doesn’t have to think about grammar pitfalls and can instead focus on simple meaning.

This way you can easily parse even the most complex commands, such as “Use the trowel to plant the pot plant in the plant pot,” an example that text adventure developer Magnetic Scrolls reportedly used to show off its parser’s prowess.

Tags: adventure game, interactive fiction, parser, programming, python

2 Replies to “Parser grammar power to the power of two!”

Robert
November 17, 2022 at 1:09 pm


Hi …came across your site when searching for text adventure parsers.
I am prototyping a Traveller RPG game. What I am trying to do is create a system where the game master can create adventures using text files to define the environment, places, people, objectives, and how to achieve the objectives.
- LucasGodzilla
  November 17, 2022 at 2:01 pm
  
  
  That sounds very interesting. Something like that, I would probably approach using YAML, which allows you to have an easy, deterministic approach to your data management.
