Speeding up slow dynamic page generation in AWStats CGI mode

2016-05-04

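This article can be read as the sequel to "AWStats log analysis with multiple servers and multiple sites". While using that setup I found that AWStats is slow at generating analysis reports dynamically in CGI mode (especially for sites that produce more than 2 GB of logs per day, where opening a report is a real test of patience). This article shares one way to work around that weakness.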

First, a recap of how AWStats processes logs and of the two ways to view the results, following the official documentation:

Process logs: build or update the statistics database (text files that hold the analysis results) with the following command:
perl awstats.pl -config=mysite -update
Run reports: building and reading reports
1. The first option is to build the main reports as a static HTML page from the command line and then open the file in a browser, using the following syntax:
perl awstats.pl -config=mysite -output -staticlinks > awstats.mysite.html
2. The second option is to view your statistics dynamically from a browser, which generates the report for the site on each request. To do this, use the URL:
http://www.myserver.mydomain/awstats/awstats.pl?config=mysite

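The overall idea: since the "dynamic generation" step is what takes the time, have the server periodically request each site's URL with curl and store the resulting HTML pages in a fixed location, so that the browser simply reads the HTML files. (Some readers may ask why go to this trouble instead of using the first option above and letting awstats.pl produce the HTML file directly. That comes back to the difference between the two options discussed in the previous article: the static HTML pages produced by awstats.pl are less usable and less attractive than the pages generated dynamically through CGI.)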

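With the idea in place, the next steps are to experiment and to look at what comes back. Fetching the page directly with

curl -o /tmp/mysite.html "http://www.myserver.mydomain/awstats/awstats.pl?config=mysite"

returns the following page source: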

<html >
<head>
<meta name="generator" content="AWStats 7.4 (build 20150714) from config file awstats./usr/local/awstats/etc/www.conf.conf (http://www.awstats.org)">
<meta name="robots" content="noindex,nofollow">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="expires" content="Wed Apr 27 11:09:58 2016">
<meta http-equiv="description" content="Awstats - Advanced Web Statistics for www.dddd.com (2015-08) - main">
<title>Statistics for www.mysite.com (2015-08) - main</title>
</head>

<frameset cols="240,*">
<frame name="mainleft" src="awstats.pl?config=mysite&amp;framename=mainleft" noresize="noresize" frameborder="0" />
<frame name="mainright" src="awstats.pl?config=mysite&amp;framename=mainright" noresize="noresize" scrolling="yes" frameborder="0" />
<noframes><body>Your browser does not support frames.<br />
You must set AWStats UseFramesWhenCGI parameter to 0
to see your reports.<br />
</body></noframes>
</frameset>

</html>

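The dynamically generated page is really an HTML file that contains two frames (mainleft and mainright). So to reproduce a dynamically generated report, three commands are needed to produce the three corresponding files: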

curl -s -o main.html "http://www.myserver.mydomain/awstats/awstats.pl?config=mysite"    # fetch the main page
curl -s -o left.html "http://www.myserver.mydomain/awstats/awstats.pl?config=mysite&framename=mainleft"    # fetch the left frame
curl -s -o right.html "http://www.myserver.mydomain/awstats/awstats.pl?config=mysite&framename=mainright"    # fetch the right frame

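Then the src attributes of the mainleft and mainright frames in main.html must be changed to point to the generated left.html and right.html. That turns the dynamic page into a static one (in effect, the waiting time of dynamic generation is moved into a script that runs on a schedule).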
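As a minimal sketch of that src rewrite, the snippet below runs sed over a trimmed copy of the frameset shown earlier. The helper name staticize_frames is hypothetical, and the patterns are an assumption of this sketch: they stop at the closing quote of each src value so the rest of the frame tag is left untouched.

```shell
# Hypothetical helper: read main.html on stdin, write the static version on stdout.
staticize_frames() {
    # [^"]* stops at the end of the src value, so only that attribute changes.
    sed -e 's/src="awstats\.pl[^"]*framename=mainleft"/src="left.html"/' \
        -e 's/src="awstats\.pl[^"]*framename=mainright"/src="right.html"/'
}

# Trimmed copy of the frameset returned by awstats.pl, fed through the helper.
staticize_frames <<'EOF'
<frameset cols="240,*">
<frame name="mainleft" src="awstats.pl?config=mysite&amp;framename=mainleft" noresize="noresize" frameborder="0" />
<frame name="mainright" src="awstats.pl?config=mysite&amp;framename=mainright" noresize="noresize" scrolling="yes" frameborder="0" />
</frameset>
EOF
```

The printed frameset should reference left.html and right.html instead of awstats.pl, which is the only change main.html needs before it can be served as a plain file.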

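Next comes the actual implementation, which means improving the "cron_awstats_update.sh" script from the previous article. The revised script follows (it is fairly well commented, which should also help in following the idea).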

#!/bin/bash

# AWStats log analysis
basedir=/usr/local/awstats-7.4
date_y_m=$(date +%Y%m -d '1 day ago')    # the script runs in the early morning and analyzes the previous day's logs
year=${date_y_m:0:4}
month=${date_y_m:4:2}

cd "$basedir" || exit 1
# Update the log statistics of every site in turn
echo -e "\e[1;31m-------$(date "+%F %T")    processing started---------\n\e[0m" >>logs/cron.log
for i in $(ls result/)
do
    echo -e "\e[1;32m -----$(date "+%F %T")  processing logs for $i-----\e[0m" >>logs/cron.log
    perl wwwroot/cgi-bin/awstats.pl -config=etc/$i.conf -lang=cn -update &>>logs/cron.log

    # Make the dynamic page static. The page structure shows that the main page has almost no
    # content of its own and relies on the left and right frames, so each site's report is
    # cached as three files.
    echo -e "\e[1;32m -----$(date "+%F %T") generating static pages for $i-----\n\e[0m" >>logs/cron.log
    cd wwwroot
    if [ ! -d "$i/$date_y_m" ]; then mkdir -p "$i/$date_y_m"; fi
    cd "$i/$date_y_m"
    # Main page
    curl -s -o main.html \
        "http://127.0.0.1/cgi-bin/awstats.pl?month=$month&year=$year&output=main&config=$basedir/etc/$i.conf&framename=index"
    # Left frame
    curl -s -o left.html \
        "http://127.0.0.1/cgi-bin/awstats.pl?month=$month&year=$year&output=main&config=$basedir/etc/$i.conf&framename=mainleft"
    # Right frame
    curl -s -o right.html \
        "http://127.0.0.1/cgi-bin/awstats.pl?month=$month&year=$year&output=main&config=$basedir/etc/$i.conf&framename=mainright"

    # Point the two frame references in main.html at the local files
    sed -i -e 's/awstats\.pl.*left/left.html/g' -e 's/awstats\.pl.*right/right.html/g' main.html
    # Then rewrite the hyperlinks in all three files (123.123.123.123 stands for the public IP)
    sed -i -e 's#awstats\.pl#http://123.123.123.123/cgi-bin/awstats.pl#g' \
           -e 's/charset=.*/charset=utf-8">/g' \
           -e 's/lang="cn"//g' \
           main.html left.html right.html
    # What remains is to update the hyperlinks in the index.html page served by nginx

    cd "$basedir"
done
echo -e "\e[1;33m-------$(date "+%F %T")  processing finished---------\n\e[0m" >>logs/cron.log

#####
# Original request pattern:
#http://127.0.0.1/cgi-bin/awstats.pl?config=/usr/local/awstats-7.4/etc/heibai.conf shows the latest data for the site and triggers the following three requests
#http://127.0.0.1/cgi-bin/awstats.pl?config=/usr/local/awstats-7.4/etc/heibai.conf
#http://127.0.0.1/cgi-bin/awstats.pl?config=/usr/local/awstats-7.4/etc/heibai.conf&framename=mainleft
#http://127.0.0.1/cgi-bin/awstats.pl?config=/usr/local/awstats-7.4/etc/heibai.conf&framename=mainright
#####
# After a year and month are selected, the following three requests are made
#http://127.0.0.1/cgi-bin/awstats.pl?month=05&year=2016&output=main&config=%2Fusr%2Flocal%2Fawstats-7.4%2Fetc%2Fheibai.conf&framename=index (URL-encoded)
#http://127.0.0.1/cgi-bin/awstats.pl?month=05&year=2016&output=main&config=/usr/local/awstats-7.4/etc/heibai.conf&framename=mainleft
#http://127.0.0.1/cgi-bin/awstats.pl?month=05&year=2016&output=main&config=/usr/local/awstats-7.4/etc/heibai.conf&framename=mainright
#####
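
The date handling at the top of the script decides which directory and which month/year parameters each run uses. The following sketch isolates that step so it can be checked without running AWStats. The helper names snapshot_month and index_url are hypothetical, the run date is passed in explicitly for the demonstration, and GNU date is assumed (the script's own -d '1 day ago' already requires it).

```shell
# Hypothetical helper: given the day the cron job runs (YYYY-MM-DD),
# print the YYYYMM of the previous day, i.e. the month being reported.
# Assumes GNU date.
snapshot_month() {
    date -d "$1 1 day ago" +%Y%m
}

# Hypothetical helper: build the main-page URL for one site and one YYYYMM value.
index_url() {
    site=$1
    ym=$2
    year=$(printf '%s' "$ym" | cut -c1-4)     # first four characters
    month=$(printf '%s' "$ym" | cut -c5-6)    # last two characters
    printf 'http://127.0.0.1/cgi-bin/awstats.pl?month=%s&year=%s&output=main&config=/usr/local/awstats-7.4/etc/%s.conf&framename=index\n' \
        "$month" "$year" "$site"
}

# A run on 1 May still reports on April, so the files belong in heibai/201604.
ym=$(snapshot_month 2016-05-01)
echo "$ym"
index_url heibai "$ym"
```

The point worth noticing is the month boundary: on the first day of a month the job writes into the previous month's directory, which matches the "yesterday" link that the index page is expected to use.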

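After the script has run, each site has its own directory under wwwroot, with one subdirectory per month (for example www/201605) that holds main.html, left.html and right.html.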

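With that in place, and after the corresponding changes to the nginx configuration from the previous article, the report can be reached through a URL like this: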

http://www.myserver.mydomain/www/201605 # statistics page of the www site for May 2016

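But the rework does not end here. The dynamically generated page has drop-down boxes for selecting the year and month, which show the statistics page for the chosen month.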

This feature issues a request like the following:
http://www.myserver.mydomain/cgi-bin/awstats.pl?month=04&year=2016&output=main&config=www.conf&framename=index

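This is still a dynamic request (so it is still slow), but by design a static file has already been generated for every month, so dynamic generation is unnecessary. How can this feature be made to use the static URL format above? Two options came to mind first:
One is to read the year and month values with JavaScript and assemble the required URL in the form's action.
The other is to use an nginx rewrite.
After trying and comparing both, the second option suits this scenario better: the first requires editing the generated HTML files in more than one place, which is tedious, whereas the second only needs nginx configuration (capturing a query parameter value in nginx and reusing it is a handy little trick).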

The final nginx configuration, after the changes, is as follows:

server {
    listen   800;
    root /usr/local/awstats/wwwroot;
    access_log /tmp/awstats_access_log access;
    error_log /tmp/awstats_nginx.error_log notice;

    location / {
        index index.html main.html;
    }

    # Static awstats files: HTML files stored in DOCUMENT_ROOT/awstats/
    location /awstats/classes/ {
        alias classes/;
    }
    location /awstats/css/ {
        alias css/;
    }
    location /awstats/icon/ {
        alias icon/;
    }
    location /awstats-icon/ {
        alias icon/;
    }
    location /awstats/js/ {
        alias js/;
    }

    # Dynamic stats.
    location ~ ^/cgi-bin/(awredir|awstats)\.pl.* {
        gzip off;
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_param SCRIPT_FILENAME $document_root/cgi-bin/fcgi.php;
        fastcgi_param X_SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param X_SCRIPT_NAME $fastcgi_script_name;
        include fastcgi_params;

        fastcgi_send_timeout 300;

        # Let the date filter at the top also use the pre-generated static pages; %2F is the URL-encoded "/", matched here to extract the site name
        if ($query_string ~* "^month=(\d+)&year=(\d+)&output=main&config=.+etc%2F(.+)\.conf&framename=index$") {
            set $month $1;
            set $year $2;
            set $site $3;
            rewrite  ^/cgi-bin/awstats\.pl  /$site/$year$month? permanent;
        }
    }
    expires 12h;
}
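
A query string that fails to match the if condition simply falls through to the CGI handler, so it can help to try the pattern against sample query strings first. The sketch below mirrors the condition with sed -E, writing \d as [0-9] because POSIX extended regular expressions do not have \d. The helper name rewrite_target is hypothetical, the sample query strings are the ones listed in the script comments, and the case-insensitive matching of ~* is not reproduced.

```shell
# Hypothetical helper: print the path the nginx rewrite would redirect to,
# or nothing when the query string does not match.
rewrite_target() {
    # Same capture order as the config: \1 = month, \2 = year, \3 = site.
    printf '%s\n' "$1" | sed -nE \
        's#^month=([0-9]+)&year=([0-9]+)&output=main&config=.+etc%2F(.+)\.conf&framename=index$#/\3/\2\1#p'
}

# URL-encoded main-page request: prints /heibai/201605
rewrite_target 'month=05&year=2016&output=main&config=%2Fusr%2Flocal%2Fawstats-7.4%2Fetc%2Fheibai.conf&framename=index'

# Frame request: no output, so it would still be served dynamically
rewrite_target 'month=05&year=2016&output=main&config=/usr/local/awstats-7.4/etc/heibai.conf&framename=mainleft'
```

Note that the pattern requires `etc%2F` in front of the site name. A query such as `config=www.conf`, with no encoded path, would not match and would still be generated dynamically, so the pattern needs adjusting if sites are referenced by bare config name.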

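One last thing: do not forget to update the "entry file" `index.html`. The hyperlinks generated by its JavaScript have to change; add and modify the following: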

/* ... omitted ... */
// A function that can compute dates such as yesterday or tomorrow
            function GetDateStr(AddDayCount) {
                var dd = new Date();
                dd.setDate(dd.getDate()+AddDayCount); // the date AddDayCount days from now
                var y = dd.getFullYear();
                var m = dd.getMonth()+1; // getMonth() is zero-based
                if (m<10) {
                    return y+"0"+m;    // YYYYMM with a zero-padded month
                } else {
                    return y+''+m;
                }
            }
            var yesterday=GetDateStr(-1);   // year and month of yesterday's date, e.g. 201604

            // Fill in the table
            for (var tdid=0;tdid<num;tdid++) {
                // Get each td element in order
                var tdnode=document.getElementById(tdid+1);
                // Extract the host name from each domain; server-side config files are named "hostname.conf"
                var hostname=vhost[tdid].split(".abc",1)[0];
                // Insert the domain into the table and set its hyperlink
                tdnode.innerHTML="<a href=\""+hostname+"/"+yesterday+"\">" +vhost[tdid] +"</a>";
            }
      /* ... omitted ... */
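
The links written by this loop only work if a matching directory was generated on the server. As a rough cross-check from the shell, the sketch below derives the same relative link for each virtual host. The helper name link_for is hypothetical, the host names are made-up examples, and ".abc" is the placeholder domain part used in the JavaScript above.

```shell
# Hypothetical helper: mirror hostname + "/" + yesterday from the JavaScript.
# $1 is a virtual host name, $2 is the YYYYMM value.
link_for() {
    # Drop everything from the first ".abc" onward, like vhost.split(".abc",1) does.
    printf '%s/%s\n' "${1%%.abc*}" "$2"
}

# Made-up host names, for illustration only.
for vhost in www.abc.com bbs.abc.com; do
    link_for "$vhost" 201604
done
```

Each printed path should exist under wwwroot once the cron job has run for that month.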

The main changes are the GetDateStr helper and the href that is built from hostname and yesterday.

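OK, that completes the improvement. The main page of every month's statistics is now static, so viewing a report no longer involves a long wait!
PS: However good a tool is, it will not necessarily fit or satisfy your own needs. Operations engineers, who are usually the "users" of software, should keep this in mind: know how to use a tool, and be able to modify it when necessary. Let's encourage each other!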

Original article: https://blog.51cto.com/kaifly/1770137
