Natural Language Processing

June 15, 2011 at 10:05 AM, mgordon

I can’t remember a time when I didn’t have an interest in home automation. Finally, I decided to take the plunge and started thinking about what a system might look like. At the top of my list of requirements was a way to communicate with the system in sentences, whether by email, SMS, IM or voice.

 

Requirements

What I had in mind was something that would take the input, figure out what was being asked for and then perform some action based on how the sentence was interpreted. The system didn’t need to understand every possible sentence, but I also didn’t want to have to remember an exact syntax for each command, so some flexibility was needed. Since I expect the system to be expanded on an ongoing basis, new functionality and the commands to invoke it had to be relatively easy to add.

 

Existing Tools

I did a fair amount of research into what tools were available and invested a good deal of thought into how I might use them. I found that most existing tools (like Antelope and SharpNLP) use maximum entropy models to determine the meaning of a sentence. They break the sentence down into its constituent parts, such as phrases and words, and identify the part of speech each represents. For each word, it’s also possible to look up its synonyms and definition. I found, however, that this approach would be quite labor-intensive to use. Also, these tools, curiously, did not understand temporal phrases like “tomorrow” or “last Friday”.

I thought these tools could be useful in that a command could be pre-processed and certain parts of it, like the verb and object, could be stored away. Then, when a command was input, it could be processed and compared with the stored information.

About this time, I came across Ian Mercer’s blog.  There, he describes an NLP engine that he has built and it became the inspiration for the engine I ended up building.

 

Step by Step

There are three steps I take in processing an input phrase or sentence: Tokenize, Organize and Evaluate. The input is parsed and tokens are created for the recognized portions of the string. These tokens are then grouped and organized into a particular order so they can be compared to method signatures defined to represent recognized commands. You’ll see that this approach also allows for a good deal of flexibility in how the command is expressed by the user.

 

Tokenize

There are several token classes defined and these are all instantiated at system startup. Each is responsible for searching out one particular piece of text and correctly interpreting it. The token class also keeps track of where in the string the text was found and its length. The token classes range in complexity from a simple text search to converting a piece of text into a DateTime or numeric value. Each token class returns one or more TokenResult instances. Each of these contains information about the text that was tokenized, where it was found, its length and its value.

The value returned depends on the text being tokenized. For example, if the TokenNumeric class found the text “12” in the input, it would return an int, a long and a double with the value 12. It would also return an ordinal value indicating the 12th in a series and a DateTime representing the 12th day of the current month.
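To make the “one piece of text, several values” idea concrete, here is a minimal sketch of how the interpretations of a numeric string might be collected. It is not the actual TokenNumeric code: the Ordinal struct, the NumericInterpretations class and the Interpret method are names I made up for illustration, and “today” is passed in so the result does not depend on the clock.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical wrapper marking "the Nth in a series".
public struct Ordinal
{
    public int Position;
    public Ordinal(int position) { Position = position; }
}

public static class NumericInterpretations
{
    // Returns every reading of a digits-only string that makes sense.
    public static List<object> Interpret(string text, DateTime today)
    {
        var values = new List<object>();
        long n;

        // NumberStyles.None accepts digits only (no sign, no whitespace).
        if (!long.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out n))
            return values;

        if (n <= int.MaxValue)
            values.Add((int)n);
        values.Add(n);
        values.Add((double)n);

        // An ordinal only makes sense for positions 1 and up.
        if (n >= 1 && n <= int.MaxValue)
            values.Add(new Ordinal((int)n));

        // A day-of-month reading only applies if the current month has that day.
        if (n >= 1 && n <= DateTime.DaysInMonth(today.Year, today.Month))
            values.Add(new DateTime(today.Year, today.Month, (int)n));

        return values;
    }
}
```

With this sketch, “12” yields five values, while “31” in a 30-day month yields only four because the date reading is dropped.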

All of these Token classes inherit from a Token base class and are in class hierarchies of varying depths.  This allows for the use of polymorphism during the evaluation step.  We’ll see more about this later.

An example of a simple Token class would be the TokenTurn class:

 

[DataContract]
public class TokenTurn : Token
{
    public TokenTurn()
    {
        // "turn" and "set" are treated as synonyms for this token.
        Words = new List<string> { "turn", "set" };
    }
}

 

The base class, Token, defines the member variable “Words” and a method, “Parse”, that knows how to locate occurrences of any of the words in the Words collection. So, in this descendant class, all that’s required is the definition of the strings to locate. Once an occurrence is found, the TokenResult that’s returned notes the fact that it was a TokenTurn that located the string. We’ll use this information later when we look for matching command definitions. Notice that we have defined two words in our Words collection. Since either could be found and both are interpreted the same way, we are in essence defining synonyms. This is one way that flexibility is introduced. For example, the user could say, “turn livingroom lamp on” or “set livingroom lamp on” and the command would be invoked the same way.
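The base class itself isn’t shown here, so the following is only a rough sketch of what a Words-driven Parse might look like. The TokenResult properties other than Start are my guesses for illustration, and the case-insensitive, whole-word matching via Regex is an assumption rather than a description of the real engine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result shape; only Start appears in the post's own code.
public class TokenResult
{
    public Token Source { get; set; }  // the token instance that found the text
    public string Text { get; set; }   // the text that was matched
    public int Start { get; set; }     // offset of the match in the input
    public int Length { get; set; }    // length of the matched text
}

public abstract class Token
{
    public List<string> Words { get; protected set; } = new List<string>();

    // Assumed behavior: find each word as a whole word, ignoring case.
    public virtual List<TokenResult> Parse(string input)
    {
        var results = new List<TokenResult>();
        foreach (string word in Words)
        {
            string pattern = @"\b" + Regex.Escape(word) + @"\b";
            foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
            {
                results.Add(new TokenResult
                {
                    Source = this,
                    Text = m.Value,
                    Start = m.Index,
                    Length = m.Length
                });
            }
        }
        return results;
    }
}

public class TokenTurn : Token
{
    public TokenTurn()
    {
        Words = new List<string> { "turn", "set" };
    }
}
```

Both “turn livingroom lamp on” and “set livingroom lamp on” produce a single result whose Source is the TokenTurn instance, which is all the later matching step needs to know. Whole-word matching keeps “return” from being read as “turn”.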

In some cases, another level of hierarchy exists. One example is the temporal tokens. The TokenTemporal class knows about its own set of sub-tokens, if you will, each of which knows how to parse bits and pieces of a date or time. It cycles through each of these sub-tokens and collects the token results. It then evaluates them to determine a single date or time value. This value, in turn, is passed back up the call chain to the main processing logic.
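As a rough illustration of the sub-token idea, the sketch below runs two tiny parsers over the input, one for a relative day word and one for a clock time, and folds whatever they find into a single DateTime. None of these names come from the real TokenTemporal class, the two patterns are deliberately simplistic, and “now” is passed in so the result is predictable.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical partial result: a sub-parser fills in the piece it understands.
public class TemporalPart
{
    public DateTime? Date;
    public TimeSpan? Time;
}

public static class TemporalSketch
{
    // Understands only "today" and "tomorrow".
    private static TemporalPart ParseRelativeDay(string input, DateTime now)
    {
        if (Regex.IsMatch(input, @"\btomorrow\b", RegexOptions.IgnoreCase))
            return new TemporalPart { Date = now.Date.AddDays(1) };
        if (Regex.IsMatch(input, @"\btoday\b", RegexOptions.IgnoreCase))
            return new TemporalPart { Date = now.Date };
        return null;
    }

    // Understands only 24-hour times such as "7:30".
    private static TemporalPart ParseClockTime(string input, DateTime now)
    {
        Match m = Regex.Match(input, @"\b(\d{1,2}):(\d{2})\b");
        if (!m.Success) return null;
        int hour = int.Parse(m.Groups[1].Value);
        int minute = int.Parse(m.Groups[2].Value);
        if (hour > 23 || minute > 59) return null;
        return new TemporalPart { Time = new TimeSpan(hour, minute, 0) };
    }

    // Cycles through the sub-parsers and combines their results into one value.
    public static DateTime? Evaluate(string input, DateTime now)
    {
        DateTime? date = null;
        TimeSpan? time = null;
        var parsers = new Func<string, DateTime, TemporalPart>[] { ParseRelativeDay, ParseClockTime };

        foreach (var parser in parsers)
        {
            TemporalPart part = parser(input, now);
            if (part == null) continue;
            date = date ?? part.Date;
            time = time ?? part.Time;
        }

        if (date == null && time == null) return null;
        // A time with no day means today; a day with no time means midnight.
        return (date ?? now.Date) + (time ?? TimeSpan.Zero);
    }
}
```

Adding support for another kind of phrase then means adding another small parser to the array, without touching the combining logic.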

 

At system startup, all Token classes are located and instantiated using reflection.

 

private static List<Token> GetTokens()
{
    var tokens = new List<Token>();
    var assemblies = new List<Assembly>();
    var currentAssembly = Assembly.GetExecutingAssembly();
    string path = currentAssembly.Location;

    // Load every DLL that sits next to the executing assembly.
    foreach (string dll in Directory.GetFiles(Path.GetDirectoryName(path), "*.dll"))
    {
        try
        {
            assemblies.Add(Assembly.LoadFile(dll));
        }
        catch (Exception)
        {
            // Not a loadable .NET assembly (for example, a native DLL); skip it.
        }
    }

    foreach (var assembly in assemblies)
    {
        foreach (Type type in assembly.GetTypes())
        {
            // Only token classes in the known token namespaces qualify;
            // the base and helper types are excluded by name.
            if ((type.Namespace == "StructuredSpeech2.Tokens.Nouns" ||
                 type.Namespace == "StructuredSpeech2.Tokens.Prepositions" ||
                 type.Namespace == "StructuredSpeech2.Tokens.Temporal" ||
                 type.Namespace == "StructuredSpeech2.Tokens.Verbs" ||
                 type.Namespace == "StructuredSpeech2.Tokens") &&
                type.Name.StartsWith("Token") &&
                type.Name != "TokenResult" &&
                type.Name != "Token" &&
                type.Name != "TokenNoun")
            {
                // Skip types that have no public parameterless constructor.
                ConstructorInfo ci = type.GetConstructor(Type.EmptyTypes);
                if (ci != null)
                {
                    tokens.Add((Token)ci.Invoke(null));
                }
            }
        }
    }
    return tokens;
}

When input is received, each token class in the collection has its “Parse()” method called and the input is passed in. The return from this method is a collection of TokenResult instances. These are all added to a single collection.

 

var results = new List<TokenResult>();

foreach (var token in Tokens)
{
    results.AddRange(token.Parse(input));
}

 

Organize

At this point, we have a collection of token results that were identified. They are in no particular order, and order is going to be important to us when we start looking for a matching command definition. So, all the results are organized into a dictionary of List<TokenResult>. The dictionary key is the start position of where the tokenized text was found in the input. This way, we have an entry for each start position and a collection of token results that represent the particular piece of text found there.

 

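// SortedDictionary enumerates by key, so buckets come back in start-position
// order; a plain Dictionary does not guarantee any enumeration order.
var buckets = new SortedDictionary<int, List<TokenResult>>();

foreach (var result in results)
{
    if (!buckets.ContainsKey(result.Start))
    {
        buckets[result.Start] = new List<TokenResult>();
    }

    buckets[result.Start].Add(result);
}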

 

Evaluate

As I’ve alluded to many times, there are also method signatures defined to represent each command the system knows about. One of these methods might look like this:

 

public static void GetWeather(ConversationContext cContext, TokenList list, TokenWeather weather)
{
    cContext.Say(WeatherService.GetTodaysWeather(cContext.ConversationUser), null);
}

Each method defined for this purpose takes as its first parameter an instance of ConversationContext. This object represents the conversation up to this point and allows a method to search back through the context to find things of interest. For example, if I request a list of things from the system and it returns a list, I can add that list to the conversation context. At some later time, if I ask the system to delete item number 4, how will the system know which item was number 4 unless it has access to the list that was returned before? You might say that it could just re-retrieve the list, but what if the list of items has changed somehow? If an item has been deleted from the list since it was last returned, number 4 might now be number 3 and the wrong item would be deleted.

The context is also persisted in the database for durability.
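The “item number 4” problem can be shown with a small sketch. This is not the real ConversationContext; the class and method names are invented for illustration, the items are plain strings, and database persistence is left out entirely. The point is only that the context keeps a snapshot of what the user was shown.

```csharp
using System.Collections.Generic;
using System.Linq;

public class ConversationContextSketch
{
    // Each entry is a snapshot of a list exactly as it was presented to the user.
    private readonly List<IReadOnlyList<string>> _lists = new List<IReadOnlyList<string>>();

    public void RememberList(IEnumerable<string> items)
    {
        // ToList() copies the items, so later changes to the source don't leak in.
        _lists.Add(items.ToList());
    }

    // Resolves a 1-based item number against the most recently shown list.
    public string ResolveItem(int number)
    {
        if (_lists.Count == 0) return null;
        IReadOnlyList<string> last = _lists[_lists.Count - 1];
        return number >= 1 && number <= last.Count ? last[number - 1] : null;
    }
}
```

If the underlying list loses an item after it was shown, ResolveItem(4) still returns what the user saw as number 4, rather than whatever now sits in that position.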

The next two parameters are tokens, and we’ll try to match these with the tokens we have parsed out of the input.

At startup, reflection is used to load all the methods into a collection.

 

var rules = new List<MethodInfo>();
var assemblies = new List<Assembly>();
var currentAssembly = Assembly.GetExecutingAssembly();
string path = currentAssembly.Location;

// Load every DLL that sits next to the executing assembly.
foreach (string dll in Directory.GetFiles(Path.GetDirectoryName(path), "*.dll"))
{
    try
    {
        assemblies.Add(Assembly.LoadFile(dll));
    }
    catch (Exception)
    {
        // Not a loadable .NET assembly; skip it.
    }
}

foreach (var assembly in assemblies)
{
    // Concrete classes that implement IRulesClass hold the command methods.
    List<Type> ruleClasses = (from t in assembly.GetTypes()
                              where !t.IsInterface
                              where !t.IsAbstract
                              where t.GetInterface("StructuredSpeech2.IRulesClass") != null
                              select t).ToList();

    foreach (var ruleClass in ruleClasses)
    {
        // Every public static method on a rules class is a candidate command.
        rules.AddRange(ruleClass.GetMethods().Where(m => m.IsStatic));
    }
}

return rules;

We look for any classes that implement the interface StructuredSpeech2.IRulesClass and then load all methods in these classes.

 

When input is received, and after the Tokenize and Organize steps, we loop through each of these methods and inspect the parameters each one expects. The second parameter (remember, the first is always our context) is compared with all the tokens found in the first list in the token dictionary to see if there is a match. If so, the next parameter is compared with the tokens in the second list, and so on until all parameters have been inspected or we run out of either parameters or lists to compare. For each method, we retain how many parameters matched the tokens we have.

After all methods have been evaluated, we find the one method which matched on all parameters and that method is invoked via reflection.
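Here is a minimal sketch of that matching and invocation step. The types are cut-down stand-ins for the real ones, and a few things are my assumptions rather than a description of the engine: the buckets hold token instances rather than TokenResult objects, a parameter matches when the token is of that type or a descendant of it, and a method only qualifies when it has exactly one token parameter per bucket.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-ins for the post's types, reduced to what the sketch needs.
public class ConversationContext
{
    public List<string> Said = new List<string>();
    public void Say(string text) { Said.Add(text); }
}

public abstract class Token { }
public class TokenList : Token { }
public class TokenWeather : Token { }

public static class Rules
{
    public static void GetWeather(ConversationContext cContext, TokenList list, TokenWeather weather)
    {
        cContext.Say("weather requested");
    }
}

public static class Matcher
{
    // Returns the first token in the bucket that fits the parameter type, or null.
    private static Token Pick(List<Token> bucket, Type parameterType)
    {
        return bucket.FirstOrDefault(t => parameterType.IsInstanceOfType(t));
    }

    // buckets must already be in start-position order.
    public static bool Dispatch(ConversationContext context,
                                IEnumerable<MethodInfo> rules,
                                IList<List<Token>> buckets)
    {
        foreach (MethodInfo rule in rules)
        {
            ParameterInfo[] parameters = rule.GetParameters();

            // Assumption: one token parameter per bucket, after the context.
            if (parameters.Length != buckets.Count + 1) continue;
            if (parameters[0].ParameterType != typeof(ConversationContext)) continue;

            var args = new object[parameters.Length];
            args[0] = context;
            for (int i = 0; i < buckets.Count; i++)
            {
                args[i + 1] = Pick(buckets[i], parameters[i + 1].ParameterType);
            }

            // Invoke only when every parameter found a matching token.
            if (args.All(a => a != null))
            {
                rule.Invoke(null, args);
                return true;
            }
        }
        return false;
    }
}
```

Because IsInstanceOfType accepts descendants, a command could declare a parameter of a base token type and match any token derived from it, which is one way the class hierarchies mentioned earlier could pay off. The candidate methods would come from something like typeof(Rules).GetMethods(BindingFlags.Public | BindingFlags.Static).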

 

Flexibility

I mentioned earlier that we could simulate synonyms by having our tokens parse out words with similar meaning in our context. The other major point of flexibility is in how the command methods are defined. Note that in the example method above, the two tokens in the definition were TokenList (which looks for “list”, “get” and “show”) and TokenWeather (which looks for “weather”). The combination of the token definitions and the command definition allows any of the following to invoke the command.

list weather
list the weather (note that there is no token for “the”, so it is effectively ignored)
get weather
get the weather
show weather
show the weather

Posted in: .Net
