Writing a Natural Language Parser in C# Part 5 - Questions and Rules

April 10, 2012 at 12:13 PM, Administrator

This post is part of a series on creating a natural language processor in C#. The other entries in this series are:

Writing a Natural Language Parser in C# Part 1–Why?
Writing a Natural Language Parser in C# Part 2 – Architecture
Writing a Natural Language Parser in C# Part 3–CommandProcessor and ConversationContext
Writing a Natural Language Parser in C# Part 4–Tokens

In this, the last part in this series, I'd like to look at the final step in processing the user's input: locating a rule that matches the set of tokens that have been generated.  If you'll remember, in the last entry we saw how the user's input was tokenized and how those tokens were then organized into a Dictionary<int, List<TokenResult>> structure, where the integer key is the index into the input string at which the token was found and each TokenResult is added to the list for its position.  Now that we have that structure, we need to evaluate the rules we have defined to see if there is a match.

What is a Rule

A rule is a method that represents a form of expected input the system can look for.  In a way, the tokens and rules we define make up the system's vocabulary where the tokens represent the words and phrases the system knows about and the rules represent the whole thoughts the system can work out.  For example, look at the following rule definition.

A Rule Definition
    public static void GetWeather(ConversationContext cContext, TokenList list, TokenWeather weather)
    {
        cContext.Say(WeatherService.GetTodaysWeather(cContext.ConversationUser), null);
    }

All rules are defined in classes that implement the IRuleClass interface.  This interface has no members defined on it, but is used as a marker interface so that we can find all the types that contain rules using reflection.  Whereas we could use MEF to load all our tokens, we can't do this with rules since there is no commonality between the classes that contain them.  Here, we have to resort to using reflection. 

Each rule will take a ConversationContext as its first parameter, which will be followed by a list of token types.  In the above example, we're looking for a TokenList and a TokenWeather.  The objective is that if the user passes in something like "what's the weather" or "get the weather", this rule will be found to be a match for that request and our weather service will be called to get the forecast.  Then, the Say method on the ConversationContext instance will be called to push the forecast back to the user.  The second parameter to Say (null here), if you'll recall, is the tag value that is serialized and saved in the conversation history for future reference.

Finding the matching Rule

So, how do we go about using our dictionary to find a matching rule?

First, locate and load all the rules found in the solution.  See code below that uses reflection to locate and get instances of all the rules.

Load rules
    // Requires: System, System.Collections.Generic, System.IO, System.Linq, System.Reflection
    var rules = new List<MethodInfo>();
    var assemblies = new List<Assembly>();
    var currentAssembly = Assembly.GetExecutingAssembly();
    string path = currentAssembly.Location;

    // Load every DLL that sits next to the executing assembly.
    foreach (string dll in Directory.GetFiles(Path.GetDirectoryName(path), "*.dll"))
    {
        try
        {
            assemblies.Add(Assembly.LoadFile(dll));
        }
        catch (Exception)
        {
            // Skip files that cannot be loaded as .NET assemblies.
        }
    }

    foreach (var assembly in assemblies)
    {
        // Find the concrete classes marked with the IRuleClass interface.
        List<Type> ruleClasses = (from t in assembly.GetTypes()
                                  where !t.IsInterface
                                  where !t.IsAbstract
                                  where t.GetInterface("SmartHome.Core.IRuleClass") != null
                                  select t).ToList();

        foreach (var ruleClass in ruleClasses)
        {
            // Every public static method on a rule class is treated as a rule.
            rules.AddRange(ruleClass.GetMethods().Where(m => m.IsStatic));
        }
    }

    return rules;

Loop through the rules and for each one, get a list of all the rule's parameters.

Get list of parameters
    var parameters = (from p in rule.GetParameters()
                      where p.ParameterType != typeof(ConversationContext)
                      select p).ToList();

Now, since the order of the parameters should match the order in which our tokens are found in the input, we loop through the parameters of each rule and, for each one, check whether the list at the corresponding dictionary position (taken in index order) contains a matching token.  For each rule, then, we keep track of how many of our TokenResult instances wrapped a matching Token type, again in the same order that the parameters are specified.  After all the rules have been evaluated, we look at the results we have saved for each.  If there are tokens in the dictionary that are not found in the parameter list, the rule is thrown out.  Likewise, if there are parameters in a rule that are not in the dictionary, the rule is thrown out.  Of the remaining rules, the top one is selected.  That rule is invoked with an instance of the ConversationContext along with the other defined parameters.
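To make the matching step concrete, here is a minimal sketch of it.  It is not the code from my project: the RuleMatcher, SampleRules, TryMatch and Run names are hypothetical, the token and context classes are cut-down stubs so the example compiles on its own, and picking the first rule that matches is a simplification of the "top one" selection described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in stubs (assumptions) for types defined elsewhere in the series.
public class Token { }
public class TokenList : Token { }
public class TokenWeather : Token { }
public class TokenNews : Token { }
public class TokenResult { public Token Token { get; set; } }

public class ConversationContext
{
    public List<string> Said { get; } = new List<string>();
    public void Say(string text, object tag) { Said.Add(text); }
}

// Hypothetical rules, used only to exercise the matcher.
public static class SampleRules
{
    public static void GetWeather(ConversationContext c, TokenList list, TokenWeather weather)
    {
        c.Say("weather rule ran", null);
    }

    public static void GetNews(ConversationContext c, TokenList list, TokenNews news)
    {
        c.Say("news rule ran", null);
    }
}

public static class RuleMatcher
{
    // Returns the tokens to pass to the rule, or null when the rule does not match.
    public static object[] TryMatch(MethodInfo rule, Dictionary<int, List<TokenResult>> buckets)
    {
        var parameters = rule.GetParameters()
            .Where(p => p.ParameterType != typeof(ConversationContext))
            .ToList();
        var positions = buckets.Keys.OrderBy(k => k).ToList();

        // Unequal counts mean a leftover token or a leftover parameter.
        if (parameters.Count != positions.Count)
        {
            return null;
        }

        var tokens = new object[parameters.Count];
        for (int i = 0; i < parameters.Count; i++)
        {
            Type wanted = parameters[i].ParameterType;
            TokenResult hit = buckets[positions[i]]
                .FirstOrDefault(r => wanted.IsInstanceOfType(r.Token));
            if (hit == null)
            {
                return null;
            }
            tokens[i] = hit.Token;
        }
        return tokens;
    }

    // Invokes the first rule that matches and returns it, or null if none does.
    public static MethodInfo Run(IEnumerable<MethodInfo> rules,
        Dictionary<int, List<TokenResult>> buckets, ConversationContext context)
    {
        foreach (var rule in rules)
        {
            object[] tokens = TryMatch(rule, buckets);
            if (tokens == null)
            {
                continue;
            }
            // Assumes the ConversationContext is the rule's first parameter.
            rule.Invoke(null, new object[] { context }.Concat(tokens).ToArray());
            return rule;
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        // Buckets for "get weather": "get" starts at 0, "weather" at 4.
        var buckets = new Dictionary<int, List<TokenResult>>
        {
            { 0, new List<TokenResult> { new TokenResult { Token = new TokenList() } } },
            { 4, new List<TokenResult> { new TokenResult { Token = new TokenWeather() } } }
        };
        var rules = typeof(SampleRules).GetMethods().Where(m => m.IsStatic);
        var context = new ConversationContext();
        MethodInfo matched = RuleMatcher.Run(rules, buckets, context);
        Console.WriteLine(matched == null ? "no match" : matched.Name);
    }
}
```

Comparing the parameter count with the number of buckets is what throws out a rule with a leftover token or a leftover parameter; the per-position type check then enforces the ordering.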

Questions

As we saw before, the ConversationContext class defines a method, AskQuestion().  This functionality was not in my original implementation.  I found, however, that there were some things that were difficult to communicate to the system in a single sentence.  For example, I wanted the system to be able to handle reminding the user of something on a particular date at a particular time and also allow the user to specify which channels they were reminded on.  For example, "remind me to get a haircut on Friday at 4:00 PM and remind me via email and sms".  My thoughts were that it's easier for the user to just say, "remind me to get a haircut" and let the system prompt them for the other details.  To enable this, I created  two additional types, Question and QuestionManager.  The Question type simply contains information about the question.  See its definition, below.

Question
    [DataContract]
    public class Question
    {
        [DataMember]
        public string QuestionText { get; set; }

        [DataMember]
        public List<Token> ExpectedReplys { get; set; }

        [DataMember]
        public Action<ConversationContext, object,
            List<Token>> ExecuteIfAnswered { get; set; }

        [DataMember]
        public ConversationMode Mode { get; set; }

        [DataMember]
        public string Address { get; set; }

        [DataMember]
        public Guid UserId { get; set; }

        [DataMember]
        public DateTime PosedDateTime { get; set; }

        [DataMember]
        public object State { get; set; }
    }

Instances of this class store the text of the question and information about the channel and context the question was asked on.  In addition, there is a callback that will be executed when the question is answered and a list of tokens that the input must match in order to qualify as an answer to the question.  This match is evaluated in exactly the same way we matched rules, before.  The QuestionManager stores these questions until they are answered.  It is called to evaluate the input from the user to see if it matches any of the questions it is managing.  Once a question is matched, its callback is called and the question is removed from the QuestionManager.
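The QuestionManager itself isn't listed in this post, so the following is only a rough sketch of the behaviour just described.  The Add, TryAnswer and PendingCount members are names I've made up for illustration, the Question class is trimmed to the members the sketch needs, and passing the question's State as the callback's object argument is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in stubs (assumptions) for types defined elsewhere in the series.
public class Token { }
public class TokenEmail : Token { }
public class TokenSms : Token { }
public class TokenResult { public Token Token { get; set; } }
public class ConversationContext { }

// Trimmed copy of the Question type with only the members used here.
public class Question
{
    public string QuestionText { get; set; }
    public List<Token> ExpectedReplys { get; set; }
    public Action<ConversationContext, object, List<Token>> ExecuteIfAnswered { get; set; }
    public object State { get; set; }
}

public class QuestionManager
{
    private readonly List<Question> pending = new List<Question>();

    public int PendingCount { get { return pending.Count; } }

    public void Add(Question question) { pending.Add(question); }

    // Returns true when the input answered one of the pending questions.
    public bool TryAnswer(ConversationContext context, Dictionary<int, List<TokenResult>> buckets)
    {
        var positions = buckets.Keys.OrderBy(k => k).ToList();

        // Iterate over a copy so the answered question can be removed safely.
        foreach (var question in pending.ToList())
        {
            List<Token> matched = Match(question.ExpectedReplys, positions, buckets);
            if (matched == null)
            {
                continue;
            }
            pending.Remove(question);
            question.ExecuteIfAnswered(context, question.State, matched);
            return true;
        }
        return false;
    }

    // Same positional, type-based comparison that is used for rules.
    private static List<Token> Match(List<Token> expected, List<int> positions,
        Dictionary<int, List<TokenResult>> buckets)
    {
        if (expected.Count != positions.Count)
        {
            return null;
        }
        var matched = new List<Token>();
        for (int i = 0; i < expected.Count; i++)
        {
            Type wanted = expected[i].GetType();
            TokenResult hit = buckets[positions[i]]
                .FirstOrDefault(r => wanted.IsInstanceOfType(r.Token));
            if (hit == null)
            {
                return null;
            }
            matched.Add(hit.Token);
        }
        return matched;
    }
}

public static class Program
{
    public static void Main()
    {
        var manager = new QuestionManager();
        manager.Add(new Question
        {
            QuestionText = "How should I remind you?",
            ExpectedReplys = new List<Token> { new TokenEmail() },
            State = "get a haircut",
            ExecuteIfAnswered = (context, state, tokens) =>
                Console.WriteLine("Reminder '" + state + "' will be sent via " + tokens[0].GetType().Name)
        });

        // Buckets for the reply "email".
        var reply = new Dictionary<int, List<TokenResult>>
        {
            { 0, new List<TokenResult> { new TokenResult { Token = new TokenEmail() } } }
        };
        manager.TryAnswer(new ConversationContext(), reply);
    }
}
```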

Posted in: .Net | Natural Language Processing


Writing a Natural Language Parser in C# Part 4–Tokens

April 1, 2012 at 6:50 AM, Administrator

This post is part of a series on creating a natural language processor in C#.  The other entries in this series are:

Writing a Natural Language Parser in C# Part 1–Why?
Writing a Natural Language Parser in C# Part 2 – Architecture 
Writing a Natural Language Parser in C# Part 3–CommandProcessor and ConversationContext
Writing a Natural Language Parser in C# Part 5 - Questions and Rules

This week, I’d like to look deeper into how the speech processor tokenizes the incoming command.

When I first started thinking about creating a speech processor for my smart home software, I did quite a bit of research on the internet looking to see what the state of this technology was and what people were doing with it.  What I found was that there were several pieces of software out there, like SharpNLP and Antelope, that are capable of taking a sentence and breaking it down into its constituent phrases and words.  They can identify each word's part of speech, its definition and its synonyms.  The output from such a tool might look something like this.

Let/VB 's/PRP see/VB how/WRB tokenization/NN works/VBZ in/IN SmartNLP/NNP ./.

I was amazed at all this until I tried to figure out how to apply this technology to my problem of communicating with my machine.  I found that even with all this information about what was said, I still couldn’t understand it enough to act on it.  I discovered a different approach which is the conversion of the input into tokens containing a different set of properties.  Once the input was tokenized, it was no longer a string, but a collection of objects that could be processed to discover what was being communicated.

Let’s look at how the system tokenizes its input.

Anatomy of A Token

A token has three members: a collection of phrases, a value and one method that takes the input and returns results.  A skeleton would look like this:

Token
    [DataContract]
    public class Token
    {
        protected List<string> Words;

        [DataMember]
        public object Value { get; set; }

        public virtual IEnumerable<TokenResult> Parse(string input, Guid userId)
        {
            // Body omitted from this skeleton; yield break only lets it compile.
            yield break;
        }
    }

Here, the list of strings, Words, holds a collection of words or phrases that the class will locate and generate a result for.  The Value property will hold the value that has been parsed.  Depending on the token, this could be a string, a DateTime or a number.  Last, the Parse method is called to do the actual parsing.  All tokens inherit from the base Token class.  The Parse method is implemented in this class to provide the basic functionality of locating phrases that are in the Words collection and returning a TokenResult for each.
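The skeleton leaves the body of Parse out, so here is one way the base behaviour described above might be written.  Treat it as a sketch under my own assumptions rather than the project's code: the case-insensitive search and the whole-word boundary check (the IsWholePhrase helper) are choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TokenResult
{
    public object Value { get; set; }
    public string TokenType { get; set; }
    public int Start { get; set; }
    public int Length { get; set; }
    public Token Token { get; set; }
}

public class Token
{
    protected List<string> Words = new List<string>();

    public object Value { get; set; }

    // Yields one TokenResult for every whole-word occurrence of a phrase in Words.
    public virtual IEnumerable<TokenResult> Parse(string input, Guid userId)
    {
        foreach (string phrase in Words)
        {
            if (string.IsNullOrEmpty(phrase))
            {
                continue;
            }

            int start = input.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
            while (start >= 0)
            {
                if (IsWholePhrase(input, start, phrase.Length))
                {
                    yield return new TokenResult
                    {
                        Start = start,
                        Length = phrase.Length,
                        Token = this,
                        TokenType = GetType().ToString(),
                        Value = input.Substring(start, phrase.Length)
                    };
                }
                start = input.IndexOf(phrase, start + phrase.Length,
                    StringComparison.OrdinalIgnoreCase);
            }
        }
    }

    // Assumption: a phrase only counts when it is not part of a longer word.
    private static bool IsWholePhrase(string input, int start, int length)
    {
        int end = start + length;
        bool startsClean = start == 0 || !char.IsLetterOrDigit(input[start - 1]);
        bool endsClean = end == input.Length || !char.IsLetterOrDigit(input[end]);
        return startsClean && endsClean;
    }
}

public class TokenList : Token
{
    public TokenList()
    {
        Words = new List<string> { "list", "show", "get", "lists", "what are my", "whats" };
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (TokenResult result in new TokenList().Parse("show my lists", Guid.Empty))
        {
            Console.WriteLine(result.Value + " at " + result.Start);
        }
    }
}
```

The boundary check keeps "list" from also matching inside "lists"; without it the same stretch of input would produce two overlapping results.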

Some of the tokens are very simply implemented and others are quite complicated depending on what is being parsed.  For example, the token that parses the user’s request for information is TokenList, as in “list reminders”.  Its entire implementation looks like this:

TokenList
    [DataContract]
    [Export(typeof(IParseToken))]
    public class TokenList : Token, IParseToken
    {
        public TokenList()
        {
            Words = new List<string> { "list", "show", "get", "lists", "what are my", "whats" };
        }
    }

This class takes advantage of the base class’ Parse implementation.  Also, notice that the Words list contains synonyms for list.  If the user specifies any of these values, it will be parsed as TokenList.

As an example of a more complicated Token, consider a token that parses a DateTime.  Here, we would override the base class’ implementation of Parse and look for portions of the input that could be parsed as a DateTime.  This can get quite complicated when you consider that the user could say something like “remind me to call bob next saturday”.  This token would need to be able to recognize that “next saturday” specifies a date and then calculate what that date is.
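The date arithmetic behind a phrase like “next saturday” can be sketched in a few lines.  The RelativeDates class and its Next method are hypothetical names for this example, and I'm assuming “next saturday” means the first Saturday strictly after the current day; some people mean the Saturday of the following week, which would need a different rule.

```csharp
using System;

public static class RelativeDates
{
    // Returns the first date strictly after 'from' that falls on 'day'.
    public static DateTime Next(DayOfWeek day, DateTime from)
    {
        int daysAhead = ((int)day - (int)from.DayOfWeek + 7) % 7;
        if (daysAhead == 0)
        {
            // Today is already that weekday, so move a full week ahead.
            daysAhead = 7;
        }
        return from.Date.AddDays(daysAhead);
    }
}

public static class Program
{
    public static void Main()
    {
        DateTime today = DateTime.Today;
        DateTime saturday = RelativeDates.Next(DayOfWeek.Saturday, today);
        Console.WriteLine(saturday.ToString("yyyy-MM-dd"));
    }
}
```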

TokenResult

The Token classes all return an instance of TokenResult for each value they parse out.  The TokenResult class is listed below.

TokenResult
    [DataContract]
    [KnownType("GetKnownTypes")]
    public class TokenResult
    {
        [DataMember]
        public object Value { get; set; }

        [DataMember]
        public string TokenType { get; set; }

        [DataMember]
        public int Start { get; set; }

        [DataMember]
        public int Length { get; set; }

        [DataMember]
        public Token Token { get; set; }

        private static IEnumerable<Type> GetKnownTypes()
        {
            return new List<Type>
            {
                typeof(Token),
                typeof(TokenInt),
                typeof(TokenLong),
                typeof(TokenNumeric),
                typeof(TokenPercentage),
                typeof(TokenQuotedPhrase),
                typeof(TokenResult),
                typeof(Tokens.Nouns.TokenNoun),
                typeof(Tokens.Nouns.TokenToDo),
                typeof(Tokens.Nouns.TokenEmail),
                typeof(Tokens.Nouns.TokenSms),
                typeof(Tokens.Nouns.TokenWeather),
                typeof(Tokens.Nouns.TokenNews),
                typeof(Tokens.Nouns.TokenIm),
                typeof(Tokens.Nouns.TokenNeither),
                typeof(Tokens.Nouns.TokenYesNo),
                typeof(TokenReminder),
                typeof(TokenDefinedList),
                typeof(TokenNamed),
                //typeof(Tokens.Nouns.TokenDevice),
                //typeof(Tokens.Nouns.TokenRoom),
                typeof(Tokens.Nouns.TokenState),
                //typeof(Tokens.Nouns.TokenStructure),
                //typeof(Tokens.Nouns.TokenZone),
                typeof(Tokens.Nouns.TokenDim),
                typeof(Tokens.Prepositions.TokenPreposition),
                typeof(Tokens.Temporal.TokenDeterminateSeries),
                typeof(Tokens.Temporal.TokenExactTime),
                typeof(Tokens.Temporal.TokenIndeterminateSeries),
                typeof(Tokens.Temporal.TokenTemporal),
                typeof(Tokens.Temporal.TemporalParts.TokenDayOfWeek),
                typeof(Tokens.Temporal.TemporalParts.TokenApril),
                typeof(Tokens.Temporal.TemporalParts.TokenAugust),
                typeof(Tokens.Temporal.TemporalParts.TokenDayAfterTomorrow),
                typeof(Tokens.Temporal.TemporalParts.TokenDayBeforeYesterday),
                typeof(Tokens.Temporal.TemporalParts.TokenDecember),
                typeof(Tokens.Temporal.TemporalParts.TokenEach),
                typeof(Tokens.Temporal.TemporalParts.TokenEighteenth),
                typeof(Tokens.Temporal.TemporalParts.TokenEighth),
                typeof(Tokens.Temporal.TemporalParts.TokenEleventh),
                typeof(Tokens.Temporal.TemporalParts.TokenFebruary),
                typeof(Tokens.Temporal.TemporalParts.TokenFifteenth),
                typeof(Tokens.Temporal.TemporalParts.TokenFifth),
                typeof(Tokens.Temporal.TemporalParts.TokenFirst),
                typeof(Tokens.Temporal.TemporalParts.TokenForteenth),
                typeof(Tokens.Temporal.TemporalParts.TokenForth),
                typeof(Tokens.Temporal.TemporalParts.TokenFriday),
                typeof(Tokens.Temporal.TemporalParts.TokenInt),
                typeof(Tokens.Temporal.TemporalParts.TokenJanuary),
                typeof(Tokens.Temporal.TemporalParts.TokenJuly),
                typeof(Tokens.Temporal.TemporalParts.TokenJune),
                typeof(Tokens.Temporal.TemporalParts.TokenLong),
                typeof(Tokens.Temporal.TemporalParts.TokenMarch),
                typeof(Tokens.Temporal.TemporalParts.TokenMay),
                typeof(Tokens.Temporal.TemporalParts.TokenMonday),
                typeof(Tokens.Temporal.TemporalParts.TokenMonth),
                typeof(Tokens.Temporal.TemporalParts.TokenNinteenth),
                typeof(Tokens.Temporal.TemporalParts.TokenNinth),
                typeof(Tokens.Temporal.TemporalParts.TokenNovember),
                typeof(Tokens.Temporal.TemporalParts.TokenNumeric),
                typeof(Tokens.Temporal.TemporalParts.TokenOctober),
                typeof(Tokens.Temporal.TemporalParts.TokenOrdinal),
                typeof(Tokens.Temporal.TemporalParts.TokenOther),
                typeof(Tokens.Temporal.TemporalParts.TokenPercentage),
                typeof(Tokens.Temporal.TemporalParts.TokenRelativeTemporalOrdinal),
                typeof(Tokens.Temporal.TemporalParts.TokenSaturday),
                typeof(Tokens.Temporal.TemporalParts.TokenSecond),
                typeof(Tokens.Temporal.TemporalParts.TokenSeptember),
                typeof(Tokens.Temporal.TemporalParts.TokenSeventeenth),
                typeof(Tokens.Temporal.TemporalParts.TokenSeventh),
                typeof(Tokens.Temporal.TemporalParts.TokenSixteenth),
                typeof(Tokens.Temporal.TemporalParts.TokenSixth),
                typeof(Tokens.Temporal.TemporalParts.TokenSpecifiedDate),
                typeof(Tokens.Temporal.TemporalParts.TokenSunday),
                typeof(Tokens.Temporal.TemporalParts.TokenTenth),
                typeof(Tokens.Temporal.TemporalParts.TokenThird),
                typeof(Tokens.Temporal.TemporalParts.TokenThirteenth),
                typeof(Tokens.Temporal.TemporalParts.TokenThirtieth),
                typeof(Tokens.Temporal.TemporalParts.TokenThirtyFirst),
                typeof(Tokens.Temporal.TemporalParts.TokenThursday),
                typeof(Tokens.Temporal.TemporalParts.TokenTime),
                typeof(Tokens.Temporal.TemporalParts.TokenToday),
                typeof(Tokens.Temporal.TemporalParts.TokenTomorrow),
                typeof(Tokens.Temporal.TemporalParts.TokenTuesday),
                typeof(Tokens.Temporal.TemporalParts.TokenTwelth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentieth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentyEighth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentyFifth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentyFirst),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentyFourth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentyNinth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentySecond),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentySeventh),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentySixth),
                typeof(Tokens.Temporal.TemporalParts.TokenTwentyThird),
                typeof(Tokens.Temporal.TemporalParts.TokenWednesday),
                typeof(Tokens.Temporal.TemporalParts.TokenYesterday),
                typeof(Tokens.Verbs.TokenCreate),
                typeof(Tokens.Verbs.TokenDelete),
                typeof(Tokens.Verbs.TokenList),
                typeof(Tokens.Verbs.TokenRemind),
                typeof(Tokens.Verbs.TokenReset),
                typeof(Tokens.Verbs.TokenWhatIs),
                typeof(Tokens.Verbs.TokenWhereIs),
                typeof(Tokens.Verbs.TokenWhoIs),
                typeof(Tokens.Verbs.TokenWhoSang),
                typeof(Tokens.Verbs.TokenWhoWasIn),
                typeof(Tokens.Verbs.TokenRemindMeTo),
                typeof(Tokens.Verbs.TokenRemindMeAt),
                typeof(System.Type),
                typeof(Questions.Question)
                //typeof(StructuredSpeech2.House.Structure.Device),
                //typeof(StructuredSpeech2.House.Structure.House),
                //typeof(StructuredSpeech2.House.Structure.Room),
                //typeof(StructuredSpeech2.House.Structure.X10Device),
                //typeof(StructuredSpeech2.House.Structure.Zone),
                //typeof(StructuredSpeech2.House.Devices.X10LampDevice),
                //typeof(StructuredSpeech2.Tokens.Verbs.TokenTurn),
                //typeof(StructuredSpeech2.Tokens.Nouns.TokenD­eviceList)
            };
        }
    }

This class wraps a token instance and additionally holds the start position and length of the parsed value.  You’ll also notice code here that facilitates the serializing of tokens.  The application often stores tokens and token results in the database and this code allows the types to be serialized and persisted.
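To show why the known-types list matters, here is a small round-trip sketch using DataContractSerializer.  The TokenResultStore helper with its Save and Load methods is hypothetical and only exists for this example, and the Token, TokenList and TokenResult classes are trimmed stand-ins.  Because the Token property is declared as the base type, the serializer has to be told about TokenList up front, which is what the GetKnownTypes method does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Token
{
    [DataMember]
    public object Value { get; set; }
}

[DataContract]
public class TokenList : Token { }

[DataContract]
[KnownType("GetKnownTypes")]
public class TokenResult
{
    [DataMember]
    public int Start { get; set; }

    [DataMember]
    public int Length { get; set; }

    [DataMember]
    public Token Token { get; set; }

    // Tells the serializer which derived types may appear in the Token property.
    private static IEnumerable<Type> GetKnownTypes()
    {
        return new List<Type> { typeof(Token), typeof(TokenList) };
    }
}

// Hypothetical helper; in the real application the bytes would go to the database.
public static class TokenResultStore
{
    public static byte[] Save(TokenResult result)
    {
        var serializer = new DataContractSerializer(typeof(TokenResult));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, result);
            return stream.ToArray();
        }
    }

    public static TokenResult Load(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(TokenResult));
        using (var stream = new MemoryStream(data))
        {
            return (TokenResult)serializer.ReadObject(stream);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var original = new TokenResult
        {
            Start = 0,
            Length = 4,
            Token = new TokenList { Value = "list" }
        };
        TokenResult copy = TokenResultStore.Load(TokenResultStore.Save(original));
        Console.WriteLine(copy.Token.GetType().Name + " at " + copy.Start);
    }
}
```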

Recall that the CommandProcessor class calls into the TokenManager which calls into each token, in turn, and compiles all the results into buckets. 

TokenManager

The TokenManager holds a collection of tokens and manages giving each a shot at parsing the input.  It then uses the start position and length properties on the results to organize them into a dictionary that can be used to determine a matching rule to be executed.  The TokenManager class is listed, below.

TokenManager
  1. [Export]
  2.     public class TokenManager
  3.     {
  4.         [ImportMany(typeof(IParseToken))]
  5.         private List<IParseToken> Tokens { get; set;}
  6.  
  7.         public Dictionary<int, List<TokenResult>> TokenizeInput(
  8.             string input, Guid userId)
  9.         {
  10.             var results = new List<TokenResult>();
  11.  
  12.             try
  13.             {
  14.                 foreach (var token in Tokens)
  15.                 {
  16.                     results.AddRange(token.Parse(input, userId));
  17.                 }
  18.             }
  19.             catch (Exception e)
  20.             {
  21.                 Logger.Log(e.Message);
  22.             }
  23.  
  24.  
  25.             CreateQuotedPhraseTokens(results, input);
  26.  
  27.             //arrange all token results by their start positions
  28.             var buckets = new Dictionary<int, List<TokenResult>>();
  29.  
  30.             foreach (var result in results.OrderBy(r => r.Start))
  31.             {
  32.                 if (!buckets.ContainsKey(result.Start))
  33.                 {
  34.                     buckets[result.Start] = new List<TokenResult>();
  35.                 }
  36.  
  37.                 buckets[result.Start].Add(result);
  38.             }
  39.  
  40.             return buckets;
  41.         }
  42.  
  43.         private void CreateQuotedPhraseTokens(
  44.             List<TokenResult> results, string input)
  45.         {
  46.             int index = 0;
  47.             List<WordInfo> words = new List<WordInfo>();
  48.             string accumulator = "";
  49.  
  50.             for (index = 0; index < input.Length - 1; index++)
  51.             {
  52.                 if (input[index] == ' ')
  53.                 {
  54.                     words.Add(new WordInfo
  55.                     {
  56.                         Found = false,
  57.                         Length = accumulator.Length,
  58.                         Start = index - accumulator.Length,
  59.                         Value = accumulator
  60.                     });
  61.  
  62.                     accumulator = "";
  63.                     continue;
  64.                 }
  65.  
  66.                 accumulator += input[index];
  67.             }
  68.  
  69.             accumulator += input[index];
  70.  
  71.             words.Add(new WordInfo
  72.             {
  73.                 Found = false,
  74.                 Length = accumulator.Length,
  75.                 Start = (index + 1) - accumulator.Length,
  76.                 Value = accumulator
  77.             });
  78.  
  79.             accumulator = "";
  80.  
  81.             foreach (var word in words)
  82.             {
  83.                 var match = results.Where(r =>
  84.                     word.Start >= r.Start && (word.Start + word.Length) <=
  85.                     (r.Start + r.Length)).FirstOrDefault();
  86.  
  87.                 if (match != null)
  88.                 {
  89.                     if (accumulator.Length > 0)
  90.                     {
  91.                         results.Add(new TokenResult
  92.                         {
  93.                             Length = accumulator.Trim().Length,
  94.                             Start = word.Start - 1 - accumulator.Trim().Length,
  95.                             Token = new TokenQuotedPhrase { Value = accumulator.Trim() },
  96.                             TokenType = typeof(TokenQuotedPhrase).ToString(),
  97.                             Value = accumulator.Trim()
  98.                         });
  99.                         accumulator = "";
  100.                     }
  101.                 }
  102.                 else
  103.                 {
  104.                     accumulator += word.Value + " ";
  105.                 }
  106.             }
  107.  
  108.             if (accumulator.Length > 0)
  109.             {
  110.                 results.Add(new TokenResult
  111.                 {
  112.                     Length = accumulator.Trim().Length,
  113.                     Start = input.TrimEnd().Length - accumulator.Trim().Length,
  114.                     Token = new TokenQuotedPhrase { Value = accumulator.Trim() },
  115.                     TokenType = typeof(TokenQuotedPhrase).ToString(),
  116.                     Value = accumulator.Trim()
  117.                 });
  118.             }
  119.         }
  120.     }

In lines 4 and 5, you can see that we’re using MEF to load all the Tokens into a collection.  The TokenizeInput method loops through the tokens and passes the input to each.  It then calls the CreateQuotedPhraseTokens method, which I’ll discuss shortly.  Next, the results are iterated through and organized into a dictionary.

It’s quite possible the user will sometimes specify words or phrases that we have no token for.  In fact, there are situations where we expect the user to do this.  For example, when the user asks the system to create a reminder for them, they will say something like, “Remind me to cook the golden goose next Friday”.  We can parse out enough of the input to determine that the user would like a reminder created and when they would like to be reminded.  We don’t have tokens, however, to represent the “cook the golden goose” portion of the input.  For this reason, after all the token classes have parsed out their results from the input, the TokenManager tokenizes the “left out” portions of the input as a TokenQuotedPhrase type.  This allows us to use these values when locating rules to execute, and inside those rules we can use that portion of the input as data.

The tokenization of the input is an important part of understanding what the user is asking for.  It allows us to work with the input as a collection of objects as opposed to dealing with a string.  The last part of the process is matching the tokens to a rule.  We’ll look at how this is done next time.

Posted in: .Net | Natural Language Processing
