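JavaScript Efficiency Showdown: Counting How Many Times Specific Characters Appear in a String
July 15, 2011 23:34:18
Efficiency showdown: counting character occurrences in a string
Original article: "javascript: counting which character appears most often (revised edition)"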
var str = "The officials say tougher legislation is needed because some \
telecommunications companies in recent years have begun new services and made \
system upgrades that create technical obstacles to surveillance. They want to \
increase legal incentives and penalties aimed at pushing carriers like Verizon, \
AT&T, and Comcast to ensure that any network changes will not disrupt their \
ability to conduct wiretaps." +
    "An Obama administration task force that includes officials from the \
Justice and Commerce Departments, the F.B.I. and other agencies recently began \
working on draft legislation to strengthen and expand a 1994 law requiring \
carriers to make sure their systems can be wiretapped. There is not yet \
agreement over the details, according to officials familiar with the \
deliberations, but they said the administration intends to submit a package to \
Congress next year." +
    "To bolster their case, security agencies are citing two previously \
undisclosed episodes in which major carriers were stymied for weeks or even \
months when they tried to comply with court-approved wiretap orders in criminal \
or terrorism investigations, the officials said.",
  count = 0,
  index = 0,
  arrStr = [],
  oLetter = {};
str = str.replace(/\s/g,''); // strip all whitespace; Method_3 and method_normal gave wrong counts before because this line was missing
for (var i = 0; i < 5000; i++) { //create a long text
    arrStr.push(str);
}
str = arrStr.join(""); // Why did the original code join with ","? His code counted the "," characters too, so I removed the ",".
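if (typeof console === 'undefined') { // fallback for engines that have no console: buffer the log lines
  console = {
    stacks : [],
    log : function(){
      // join all arguments; the methods below call console.log with two of them
      console.stacks.push(Array.prototype.slice.call(arguments).join(' '));
    },
    show : function(){
      alert(console.stacks.join('\n'));
      console.stacks = [];
    }
  };
}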
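On an engine with ES2015 support, the data preparation above (strip all whitespace, then glue 5000 copies together with no separator) fits in one expression. This is a sketch, not the original code: `String.prototype.repeat` did not exist in the 2011 browsers tested in this post, and `buildTestText` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the benchmark text the same way as the setup code,
// assuming an ES2015+ engine (String.prototype.repeat is required).
function buildTestText(base, times) {
  // Strip all whitespace first, like str.replace(/\s/g, '') above,
  // then concatenate `times` copies with no separator, like arrStr.join("").
  return base.replace(/\s/g, '').repeat(times);
}

const sample = buildTestText("The officials say", 3);
console.log(sample); // TheofficialssayTheofficialssayTheofficialssay
```

Joining with no separator matters: as noted above, joining with "," would add commas that then get counted as characters.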
My method: use str.replace(RegExp, Function) to walk through the string.
For how str.replace(RegExp, function) works, see my previous post, "JavaScript replace(RegExp, Function) Explained".
function method_replace_RegExp_function(){
  function counter(match) {  // called once per match; tallies each character
    if(visited[match]){
      visited[match]++;
    } else {
      visited[match] = 1;
    }
  }
  var count = 0, index = 0, arrStr=[], visited = {};
  var begin = (new Date()).getTime();
  str.replace(/\S/g, counter);
  for (var i in visited) {
    if (visited[i] > count) {
      count = visited[i];
      index = i;
    }
  }
  var end = + new Date();
  console.log("Method_replace_RegExp_Function:\nMost frequent character: " + index + ", occurring " + count + " times", "Elapsed: " + (end - begin) + " ms");
}
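The counter above relies on one property of str.replace(RegExp, Function): with the g flag, the callback runs once for every match and receives the matched text as its first argument. The minimal sketch below makes that visible by also counting the callback invocations, and returns the tally instead of logging it. It uses ES2015 arrow functions, and the names `countByReplace` and `calls` are illustrative only.

```javascript
// Sketch: tally characters by letting String.prototype.replace drive the loop.
// The callback also receives any capture groups, the match offset and the whole
// string; only the matched text is needed here. We return the match unchanged
// because the new string is discarded: only the side effect (counting) matters.
function countByReplace(text) {
  const counts = {};
  let calls = 0;
  text.replace(/\S/g, (match) => {
    calls++;                                  // one call per non-whitespace char
    counts[match] = (counts[match] || 0) + 1;
    return match;
  });
  return { counts, calls };
}

const { counts, calls } = countByReplace("banana split");
console.log(counts.a, counts.n, calls); // 3 2 11
```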
// Another approach I came up with: the normal method (a plain loop)
function method_normal(){
  var count = 0, index = 0, arrStr = [], visited = {}, tmp = '';
  var begin = (new Date()).getTime();
  for(var i = 0; i < str.length; i++){
    tmp = str.charAt(i);
    if(visited[tmp]){
      visited[tmp]++;
    } else {
      visited[tmp] = 1;
    }
  }
  for (var i in visited) {
    if (visited[i] > count) {
      count = visited[i];
      index = i;
    }
  }
  var end = + new Date();
  console.log("Method_normal:\nMost frequent character: " + index + ", occurring " + count + " times", "Elapsed: " + (end - begin) + " ms");
}
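A possible variation on method_normal, not the author's code: keep the same charAt loop but update the running maximum while counting, so the second for...in pass is not needed. It is a sketch assuming an ES2015 engine (it uses a Map for the tally), and `mostFrequentChar` is a hypothetical name. One behavioral difference: on a tie it reports the character that reached the top count first, so for "mississippi" it gives "s", whereas the two-loop version would typically report "i" (the first key in insertion order with the top count).

```javascript
// Sketch: method_normal's charAt loop, tracking the maximum during the count.
function mostFrequentChar(text) {
  const counts = new Map();
  let best = '', bestCount = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);                 // same UTF-16 unit as method_normal
    const n = (counts.get(ch) || 0) + 1;
    counts.set(ch, n);
    if (n > bestCount) {                       // strictly greater: on a tie, the first char to reach the top count wins
      best = ch;
      bestCount = n;
    }
  }
  return { char: best, count: bestCount };
}

const r = mostFrequentChar("banana");
console.log(r.char, r.count); // a 3
```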
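// method_2 and method_3 come from the original article and are not reproduced here;
// call them only if they have been defined
if (typeof method_2 === 'function') method_2();
if (typeof method_3 === 'function') method_3();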
method_replace_RegExp_function();
method_normal();
if (console.show) console.show(); // only the no-console fallback defines show(): display the buffered log
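Each method above is timed once, with a single pair of timestamps, so a single garbage-collection pause or JIT warm-up can skew a result. A possible refinement, not part of the original post: repeat each run and report the median. `timeIt` and its `runs` parameter are hypothetical; the sketch uses `performance.now()` where the engine provides it and falls back to `Date.now()` otherwise.

```javascript
// Sketch of a small benchmark harness (hypothetical helper, not from the
// original post): run a function several times and report the median time.
const now = (typeof performance !== 'undefined' && performance.now)
  ? () => performance.now()
  : () => Date.now();

function timeIt(label, fn, runs = 5) {
  const times = [];
  let result;
  for (let r = 0; r < runs; r++) {
    const t0 = now();
    result = fn();
    times.push(now() - t0);
  }
  times.sort((a, b) => a - b);                 // numeric sort, not the default string sort
  const median = times[Math.floor(times.length / 2)]; // upper median for an even run count
  console.log(label + ": median " + median.toFixed(1) + " ms over " + runs + " runs");
  return { result, median };
}

// Usage: time a plain loop over a synthetic string.
const text = "abc".repeat(100000);
timeIt("plain loop", () => {
  let n = 0;
  for (let i = 0; i < text.length; i++) if (text.charAt(i) === 'a') n++;
  return n;
});
```

The median is less sensitive than a single run or the mean to one unusually slow iteration.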
Output in several environments. Every method that ran agrees that the most frequent character is e, which occurs 610000 times. Elapsed times:
Maxthon 3.1.3.600: Method_2 7128 ms; Method_3 6757 ms; Method_replace_RegExp_Function 4399 ms; Method_normal 5925 ms
Node.exe v0.5.1 (2011.07.14, http://nodejs.org), run at the REPL: Method_2 3141 ms; Method_3 1560 ms; Method_replace_RegExp_Function not run (it kills Node outright, so the call was commented out); Method_normal 1045 ms
Firefox 3.6.3 with Firebug 1.7.3: Method_2 12046 ms; Method_3 10488 ms; Method_replace_RegExp_Function 6836 ms; Method_normal 5351 ms
IE9: Method_2 18411 ms; Method_3 10968 ms; Method_replace_RegExp_Function 1651 ms; Method_normal 12339 ms
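Conclusion: don't place blind faith in the search power of regular expressions; every regex match is itself a loop over the string.
So avoid running many separate regex matches; making good use of String.replace(RegExp, Function) is the efficient choice.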
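The point that every regex match is itself a loop can be made concrete. The first function below is a hypothetical "one regex per distinct character" approach; it is not the original article's method_2 or method_3, which are not reproduced here. It rescans the whole string once for each distinct character, so k distinct characters mean roughly k full passes. The second makes a single replace pass in the spirit of method_replace_RegExp_function. Both return the same tallies; the function names are illustrative and the code assumes an ES2015 engine.

```javascript
// Hypothetical "many regexes" approach: one full scan per distinct character.
function countWithManyRegexes(text) {
  const counts = {};
  const distinct = new Set(text.replace(/\s/g, '').split(''));
  for (const ch of distinct) {
    // Escape regex metacharacters so characters like "." or "(" match literally.
    const escaped = ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    counts[ch] = (text.match(new RegExp(escaped, 'g')) || []).length;
  }
  return counts;
}

// Single pass: the replace callback visits every non-whitespace character once.
function countWithOneReplace(text) {
  const counts = {};
  text.replace(/\S/g, (ch) => {
    counts[ch] = (counts[ch] || 0) + 1;
    return ch;
  });
  return counts;
}

const a = countWithManyRegexes("AT&T and F.B.I.");
const b = countWithOneReplace("AT&T and F.B.I.");
console.log(a["."], b["."], JSON.stringify(a) === JSON.stringify(b)); // 3 3 true
```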