使用Hashtable实现简单的关键字过滤

2011/6/30 6:07:56中文系列评论(0)浏览(3575)

这段时间开发一个聊天室，需要使用到关键字过滤的功能，需求如下：

1.将关键字替换成“*”;

2.支持过滤HTML，例如，S<span>B</span>也要过滤掉。

原本打算使用String.Replace来实现，但是这样的话，如果关键字很多，例如1000个，用以下方式：

for(int i=0;i<1000;i++)

{

replace….
}

来实现，性能显然很低。因此，换了一种方法，用了Hashtable来提高过滤的性能。大概思路如下：

根据所有关键字建好用Hashtable数据结构，也可以理解为建立索引，之后每次过滤都用这个来进行：

例如，有以下几个关键字： SB , SX, Fuck, AB, ABC,那么用于过滤的数据结构如下：

上图中每个黑色方框代表一个hashtable，里面的字母就是hashtable的key，每个key都指向另一个hashtable，例如，第一个hashtable中就包含了3个Key，分别是S，A，F。每个关键字中的每个字符都被分散的放到hashtable中。

下面举例如何过滤

1.处理字符串 SB：第一个字母在第一级的hashtable中找到，而B也可以在S指向的Hashtable中找到，B指向的Hashtable中包括0，则，SB符合关键字，过滤掉。

2.处理字符串 SF：虽然S在第一级hashtable中可以找到，但是S指向的hashtable中没有F，所以，SF不是关键字

具体代码如下：

   1:  static class FilterWordUtil

   2:  {

   3:      static Hashtable FilterWords = new Hashtable();

4:

   5:      public static void AddFilterWord(string word)

   6:      {

   7:          Hashtable h = FilterWords;

   8:          foreach (char c in word.ToUpper())

   9:          {

  10:              if (!h.ContainsKey(c)) h[c] = new Hashtable();

  11:              h = h[c] as Hashtable;

  12:          }

  13:          h[0] = new Hashtable();

  14:      }

15:

  16:      static int Match(string content, int index, out StringBuilder alt)

  17:      {

  18:          content = content.ToUpper();

  19:          alt = new StringBuilder();

  20:          bool filterChar = true;

21:

  22:          Hashtable h = FilterWords;

  23:          int i = index;

  24:          for (; i < content.Length; i++)

  25:          {

  26:              char c = content[i];

  27:              switch (c)

  28:              {

  29:              case '<':

  30:                  {

  31:                      filterChar = false;

  32:                      break;

  33:                  }

  34:              case '>':

  35:                  {

  36:                      filterChar = true;

  37:                      break;

  38:                  }

  39:              case ' ':

  40:                  {

  41:                      break;

  42:                  }

  43:              default:

  44:                  {

  45:                      if (filterChar)

  46:                      {

  47:                          if (h.ContainsKey(c))

  48:                          {

  49:                              h = h[c] as Hashtable;

  50:                              c = '*';

  51:                          }

  52:                          else

  53:                          {

  54:                              if (!h.ContainsKey(0)) return -1;

  55:                          }

  56:                      }

  57:                      break;

  58:                  }

  59:              }

  60:              alt.Append(c);

  61:              if (h.ContainsKey(0)) return i;

  62:          }

  63:          return h.ContainsKey(0) ? i : -1;

  64:      }

65:

  66:      public static String Filter(string content)

  67:      {

  68:          lock (FilterWords)

  69:          {

  70:              StringBuilder result = new StringBuilder();

  71:              bool filterChar = true;

72:

  73:              for (int i = 0; i < content.Length; i++)

  74:              {

  75:                  char c = content[i];

  76:                  switch (c)

  77:                  {

  78:                  case '<':

  79:                      {

  80:                          filterChar = false;

  81:                          break;

  82:                      }

  83:                  case '>':

  84:                      {

  85:                          filterChar = true;

  86:                          break;

  87:                      }

  88:                  default:

  89:                      {

  90:                          if (filterChar)

  91:                          {

  92:                              StringBuilder temp;

  93:                              int fi = Match(content, i, out temp);

  94:                              if (fi != -1)

  95:                              {

  96:                                  i = fi;

  97:                                  result.Append(temp);

  98:                                  continue;

  99:                              }

 100:                          }

 101:                          break;

 102:                      }

 103:                  }

 104:                  result.Append(c);

 105:              }

 106:

 107:              return result.ToString();

 108:          }

 109:      }

 110:  }

测试代码:

   1:  //添加关键字

   2:  FilterWordUtil.AddFilterWord("SB");

   3:  FilterWordUtil.AddFilterWord("SX");

   4:  FilterWordUtil.AddFilterWord("fuck");

   5:  FilterWordUtil.AddFilterWord("fuck you");

   6:  FilterWordUtil.AddFilterWord("天朝");

   7:  //过滤

   8:  string new_html = FilterWordUtil.Filter("你是SB,天<span>朝</span>");

   9:  Console.WriteLine(new_html);

过滤效果如下：

看到这里，有些读者要担心了，用了这么多的hashtable，会不会占用很多内存？当时也考虑过这问题，测试了1000多个关键字，建数据结构后，内存大概增加了几M，还不算太多。

作者: 卢春城发表于 2011-06-30 00:09 原文链接

推荐.NET配套的通用数据层ORM框架：CYQ.Data 通用数据层框架

IT技术博客