http://www.cnblogs.com/sdytzz/archive/2011/01/20/1292892.html
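Running Webalizer on Windows requires nothing beyond the Win32 build of Webalizer itself.
The Win32 build of Webalizer can be downloaded here.
After downloading, unzip it into any folder.
Then open sample.conf and save a copy as webalizer.conf.
Open webalizer.conf and edit it as follows.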
Code
#
# Sample Webalizer configuration file
# Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)
#
# Distributed under the GNU General Public License. See the
# files "Copyright" and "COPYING" provided with the webalizer
# distribution for additional information.
#
# This is a sample configuration file for the Webalizer (ver 2.01)
# Lines starting with pound signs '#' are comment lines and are
# ignored. Blank lines are skipped as well. Other lines are considered
# as configuration lines, and have the form "ConfigOption Value" where
# ConfigOption is a valid configuration keyword, and Value is the value
# to assign that configuration option. Invalid keyword/values are
# ignored, with appropriate warnings being displayed. There must be
# at least one space or tab between the keyword and its value.
#
# As of version 0.98, The Webalizer will look for a 'default' configuration
# file named "webalizer.conf" in the current directory, and if not found
# there, will look for "/etc/webalizer.conf".


# LogFile defines the web server log file to use. If not specified
# here or on the command line, input will default to STDIN. If
# the log filename ends in '.gz' (ie: a gzip compressed file), it will
# be decompressed on the fly as it is being read.

LogFile C:\WINDOWS\system32\LogFiles\W3SVC529919685\nc080917.log

# LogType defines the log type being processed. Normally, the Webalizer
# expects a CLF or Combined web server log as input. Using this option,
# you can process ftp logs as well (xferlog as produced by wu-ftp and
# others), or Squid native logs. Values can be 'clf', 'ftp' or 'squid',
# with 'clf' the default.

LogType iis

# OutputDir is where you want to put the output files. This
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.

OutputDir E:\wwwroot\banbank\webalizer

# HistoryName allows you to specify the name of the history file produced
# by the Webalizer. The history file keeps the data for up to 12 months
# worth of logs, used for generating the main HTML page (index.html).
# The default is a file named "webalizer.hist", stored in the specified
# output directory. If you specify just the filename (without a path),
# it will be kept in the specified output directory. Otherwise, the path
# is relative to the output directory, unless absolute (leading /).

#HistoryName webalizer.hist

# Incremental processing allows multiple partial log files to be used
# instead of one huge one. Useful for large sites that have to rotate
# their log files more than once a month. The Webalizer will save its
# internal state before exiting, and restore it the next time run, in
# order to continue processing where it left off. This mode also causes
# The Webalizer to scan for and ignore duplicate records (records already
# processed by a previous run). See the README file for additional
# information. The value may be 'yes' or 'no', with a default of 'no'.
# The file 'webalizer.current' is used to store the current state data,
# and is located in the output directory of the program (unless changed
# with the IncrementalName option below). Please read at least the section
# on Incremental processing in the README file before you enable this option.

Incremental yes

# IncrementalName allows you to specify the filename for saving the
# incremental data in. It is similar to the HistoryName option where the
# name is relative to the specified output directory, unless an absolute
# filename is specified. The default is a file named "webalizer.current"
# kept in the normal output directory. If you don't specify "Incremental"
# as 'yes' then this option has no meaning.

#IncrementalName webalizer.current

# ReportTitle is the text to display as the title. The hostname
# (unless blank) is appended to the end of this string (separated with
# a space) to generate the final full title string.
# Default is (for english) "Usage Statistics for".

#ReportTitle Usage Statistics for

# HostName defines the hostname for the report. This is used in
# the title, and is prepended to the URL table items. This allows
# clicking on URL's in the report to go to the proper location in
# the event you are running the report on a 'virtual' web server,
# or for a server different than the one the report resides on.
# If not specified here, or on the command line, webalizer will
# try to get the hostname via a uname system call. If that fails,
# it will default to "localhost".

#HostName localhost

# HTMLExtension allows you to specify the filename extension to use
# for generated HTML pages. Normally, this defaults to "html", but
# can be changed for sites who need it (like for PHP embedded pages).

#HTMLExtension html

# PageType lets you tell the Webalizer what types of URL's you
# consider a 'page'. Most people consider html and cgi documents
# as pages, while not images and audio files. If no types are
# specified, defaults will be used ('htm*', 'cgi' and HTMLExtension
# if different for web logs, 'txt' for ftp logs).

PageType htm*
PageType cgi
#PageType phtml
#PageType php3
#PageType pl

# UseHTTPS should be used if the analysis is being run on a
# secure server, and links to urls should use 'https://' instead
# of the default 'http://'. If you need this, set it to 'yes'.
# Default is 'no'. This only changes the behaviour of the 'Top
# URL's' table.

#UseHTTPS no

# DNSCache specifies the DNS cache filename to use for reverse DNS lookups.
# This file must be specified if you wish to perform name lookups on any IP
# addresses found in the log file. If an absolute path is not given as
# part of the filename (ie: starts with a leading '/'), then the name is
# relative to the default output directory. See the DNS.README file for
# additional information.
#
# Note that this is not yet supported in the Windows port of Webalizer.

#DNSCache dns_cache.db

# DNSChildren allows you to specify how many "children" processes are
# run to perform DNS lookups to create or update the DNS cache file.
# If a number is specified, the DNS cache file will be created/updated
# each time the Webalizer is run, immediately prior to normal processing,
# by running the specified number of "children" processes to perform
# DNS lookups. If used, the DNS cache filename MUST be specified as
# well. The default value is zero (0), which disables DNS cache file
# creation/updates at run time. The number of children processes to
# run may be anywhere from 1 to 100, however a large number may affect
# normal system operations. Reasonable values should be between 5 and
# 20. See the DNS.README file for additional information.

#DNSChildren 0

# HTMLPre defines HTML code to insert at the very beginning of the
# file. Default is the DOCTYPE line shown below. Max line length
# is 80 characters, so use multiple HTMLPre lines if you need more.

#HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

# HTMLHead defines HTML code to insert within the <HEAD></HEAD>
# block, immediately after the <TITLE> line. Maximum line length
# is 80 characters, so use multiple lines if needed.

#HTMLHead <META NAME="author" CONTENT="The Webalizer">

# HTMLBody defines the HTML code to be inserted, starting with the
# <BODY> tag. If not specified, the default is shown below. If
# used, you MUST include your own <BODY> tag as the first line.
# Maximum line length is 80 char, use multiple lines if needed.

#HTMLBody <BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">

# HTMLPost defines the HTML code to insert immediately before the
# first <HR> on the document, which is just after the title and
# "summary period"-"Generated on:" lines. If anything, this should
# be used to clean up in case an image was inserted with HTMLBody.
# As with HTMLHead, you can define as many of these as you want and
# they will be inserted in the output stream in order of appearance.
# Max string size is 80 characters. Use multiple lines if you need to.

#HTMLPost <BR CLEAR="all">

# HTMLTail defines the HTML code to insert at the bottom of each
# HTML document, usually to include a link back to your home
# page or insert a small graphic. It is inserted as a table
# data element (ie: <TD> your code here </TD>) and is right
# aligned with the page. Max string size is 80 characters.

#HTMLTail <IMG SRC="msfree.png" ALT="100% Micro$oft free!">

# HTMLEnd defines the HTML code to add at the very end of the
# generated files. It defaults to what is shown below. If
# used, you MUST specify the </BODY> and </HTML> closing tags
# as the last lines. Max string length is 80 characters.

#HTMLEnd </BODY></HTML>

# The Quiet option suppresses output messages. Useful when run
# as a cron job to prevent bogus e-mails. Values can be either
# "yes" or "no". Default is "no". Note: this does not suppress
# warnings and errors (which are printed to stderr).

#Quiet no

# ReallyQuiet will suppress all messages including errors and
# warnings. Values can be 'yes' or 'no' with 'no' being the
# default. If 'yes' is used here, it cannot be overridden from
# the command line, so use with caution. A value of 'no' has
# no effect.

#ReallyQuiet no

# TimeMe allows you to force the display of timing information
# at the end of processing. A value of 'yes' will force the
# timing information to be displayed. A value of 'no' has no
# effect.

#TimeMe no

# GMTTime allows reports to show GMT (UTC) time instead of local
# time. Default is to display the time the report was generated
# in the timezone of the local machine, such as EDT or PST. This
# keyword allows you to have times displayed in UTC instead. Use
# only if you really have a good reason, since it will probably
# screw up the reporting periods by however many hours your local
# time zone is off of GMT.

#GMTTime no

# Debug prints additional information for error messages. This
# will cause webalizer to dump bad records/fields instead of just
# telling you it found a bad one. As usual, the value can be
# either "yes" or "no". The default is "no". It shouldn't be
# needed unless you start getting a lot of Warning or Error
# messages and want to see why. (Note: warning and error messages
# are printed to stderr, not stdout like normal messages).

#Debug no

# FoldSeqErr forces the Webalizer to ignore sequence errors.
# This is useful for Netscape and other web servers that cache
# the writing of log records and do not guarantee that they
# will be in chronological order. The use of the FoldSeqErr
# option will cause out of sequence log records to be treated
# as if they had the same time stamp as the last valid record.
# Default is to ignore out of sequence log records.

#FoldSeqErr no

# VisitTimeout allows you to set the default timeout for a visit
# (sometimes called a 'session'). The default is 30 minutes,
# which should be fine for most sites.
# Visits are determined by looking at the time of the current
# request, and the time of the last request from the site. If
# the time difference is greater than the VisitTimeout value, it
# is considered a new visit, and visit totals are incremented.
# Value is the number of seconds to timeout (default=1800=30min)

#VisitTimeout 1800

# IgnoreHist shouldn't be used in a config file, but it is here
# just because it might be useful in certain situations. If the
# history file is ignored, the main "index.html" file will only
# report on the current log file's contents. Useful only when you
# want to reproduce the reports from scratch. USE WITH CAUTION!
# Valid values are "yes" or "no". Default is "no".

#IgnoreHist no

# CountryGraph allows the usage by country graph to be disabled.
# Values can be 'yes' or 'no', default is 'yes'.

#CountryGraph yes

# DailyGraph and DailyStats allow the daily statistics graph
# and statistics table to be disabled (not displayed). Values
# may be "yes" or "no". Default is "yes".

#DailyGraph yes
#DailyStats yes

# HourlyGraph and HourlyStats allow the hourly statistics graph
# and statistics table to be disabled (not displayed). Values
# may be "yes" or "no". Default is "yes".

#HourlyGraph yes
#HourlyStats yes

# GraphLegend allows the color coded legends to be turned on or off
# in the graphs. The default is for them to be displayed. This only
# toggles the color coded legends, the other legends are not changed.
# If you think they are hideous and ugly, say 'no' here

#GraphLegend yes

# GraphLines allows you to have index lines drawn behind the graphs.
# I personally am not crazy about them, but a lot of people requested
# them and they weren't a big deal to add. The number represents the
# number of lines you want displayed. Default is 2, you can disable
# the lines by using a value of zero ('0'). [max is 20]
# Note, due to rounding errors, some values don't work quite right.
# The lower the better, with 1,2,3,4,6 and 10 producing nice results.

#GraphLines 2

# The "Top" options below define the number of entries for each table.
# Defaults are Sites=30, URL's=30, Referrers=30 and Agents=15, and
# Countries=30. TopKSites and TopKURLs (by KByte tables) both default
# to 10, as do the top entry/exit tables (TopEntry/TopExit). The top
# search strings and usernames default to 20. Tables may be disabled
# by using zero (0) for the value.

#TopSites 30
#TopKSites 10
#TopURLs 30
#TopKURLs 10
#TopReferrers 30
#TopAgents 15
#TopCountries 30
#TopEntry 10
#TopExit 10
#TopSearch 20
#TopUsers 20

# The All* keywords allow the display of all URL's, Sites, Referrers,
# User Agents, Search Strings and Usernames. If enabled, a separate
# HTML page will be created, and a link will be added to the bottom
# of the appropriate "Top" table. There are a couple of conditions
# for this to occur. First, there must be more items than will fit
# in the "Top" table (otherwise it would just be duplicating what is
# already displayed). Second, the listing will only show those items
# that are normally visible, which means it will not show any hidden
# items. Grouped entries will be listed first, followed by individual
# items. The value for these keywords can be either 'yes' or 'no',
# with the default being 'no'. Please be aware that these pages can
# be quite large in size, particularly the sites page, and separate
# pages are generated for each month, which can consume quite a lot
# of disk space depending on the traffic to your site.

#AllSites no
AllURLs yes
#AllReferrers no
#AllAgents no
AllSearchStr yes
#AllUsers no

# The Webalizer normally strips the string 'index.' off the end of
# URL's in order to consolidate URL totals. For example, the URL
# /somedir/index.html is turned into /somedir/ which is really the
# same URL. This option allows you to specify additional strings
# to treat in the same way. You don't need to specify 'index.' as
# it is always scanned for by The Webalizer, this option is just to
# specify _additional_ strings if needed. If you don't need any,
# don't specify any as each string will be scanned for in EVERY
# log record. A bunch of them will degrade performance. Also,
# the string is scanned for anywhere in the URL, so a string of
# 'home' would turn the URL /somedir/homepages/brad/home.html into
# just /somedir/ which is probably not what was intended.

#IndexAlias home.htm
#IndexAlias homepage.htm

# The Hide*, Group*, Ignore* and Include* keywords allow you to
# change the way Sites, URL's, Referrers, User Agents and Usernames
# are manipulated. The Ignore* keywords will cause The Webalizer to
# completely ignore records as if they didn't exist (and thus not
# counted in the main site totals). The Hide* keywords will prevent
# things from being displayed in the 'Top' tables, but will still be
# counted in the main totals. The Group* keywords allow grouping
# similar objects as if they were one. Grouped records are displayed
# in the 'Top' tables and can optionally be displayed in BOLD and/or
# shaded. Groups cannot be hidden, and are not counted in the main
# totals. The Group* options do not, by default, hide all the items
# that it matches. If you want to hide the records that match (so just
# the grouping record is displayed), follow with an identical Hide*
# keyword with the same value. (see example below) In addition,
# Group* keywords may have an optional label which will be displayed
# instead of the keyword's value. The label should be separated from
# the value by at least one 'white-space' character, such as a space
# or tab.
#
# The value can have either a leading or trailing '*' wildcard
# character. If no wildcard is found, a match can occur anywhere
# in the string. Given a string "www.yourmama.com", the values "your",
# "*mama.com" and "www.your*" will all match.

# Your own site should be hidden
#HideSite *mrunix.net
#HideSite localhost

# Your own site gives most referrals
#HideReferrer mrunix.net/

# This one hides non-referrers ("-" Direct requests)
#HideReferrer Direct Request

# Usually you want to hide these
HideURL *.gif
HideURL *.GIF
HideURL *.jpg
HideURL *.JPG
HideURL *.png
HideURL *.PNG
HideURL *.ra
HideURL *.css

# Hiding agents is kind of futile
#HideAgent RealPlayer

# You can also hide based on authenticated username
#HideUser root
#HideUser admin

# Grouping options
#GroupURL /cgi-bin/* CGI Scripts
#GroupURL /images/* Images

#GroupSite *.aol.com
#GroupSite *.compuserve.com

#GroupReferrer yahoo.com/ Yahoo!
#GroupReferrer excite.com/ Excite
#GroupReferrer infoseek.com/ InfoSeek
#GroupReferrer webcrawler.com/ WebCrawler

#GroupUser root Admin users
#GroupUser admin Admin users
#GroupUser wheel Admin users

# The following is a great way to get an overall total
# for browsers, and not display all the detail records.
# (You should use MangleAgent to refine further)

#GroupAgent MSIE Micro$oft Internet Exploder
#HideAgent MSIE
#GroupAgent Mozilla Netscape
#HideAgent Mozilla
#GroupAgent Lynx* Lynx
#HideAgent Lynx*

# HideAllSites allows forcing individual sites to be hidden in the
# report. This is particularly useful when used in conjunction
# with the "GroupDomain" feature, but could be useful in other
# situations as well, such as when you only want to display grouped
# sites (with the GroupSite keywords). The value for this
# keyword can be either 'yes' or 'no', with 'no' the default,
# allowing individual sites to be displayed.

#HideAllSites no

# The GroupDomains keyword allows you to group individual hostnames
# into their respective domains. The value specifies the level of
# grouping to perform, and can be thought of as 'the number of dots'
# that will be displayed. For example, if a visiting host is named
# cust1.tnt.mia.uu.net, a domain grouping of 1 will result in just
# "uu.net" being displayed, while a 2 will result in "mia.uu.net".
# The default value of zero disables this feature. Domains will only
# be grouped if they do not match any existing "GroupSite" records,
# which allows overriding this feature with your own if desired.

#GroupDomains 0

# The GroupShading allows grouped rows to be shaded in the report.
# Useful if you have lots of groups and individual records that
# intermingle in the report, and you want to differentiate the group
# records a little more. Value can be 'yes' or 'no', with 'yes'
# being the default.

#GroupShading yes

# GroupHighlight allows the group record to be displayed in BOLD.
# Can be either 'yes' or 'no' with the default 'yes'.

#GroupHighlight yes

# The Ignore* keywords allow you to completely ignore log records based
# on hostname, URL, user agent, referrer or username. I hesitated in
# adding these, since the Webalizer was designed to generate _accurate_
# statistics about a web server's performance. By choosing to ignore
# records, the accuracy of reports becomes skewed, negating why I wrote
# this program in the first place. However, due to popular demand, here
# they are. Use the same as the Hide* keywords, where the value can have
# a leading or trailing wildcard '*'. Use at your own risk ;)

#IgnoreSite bad.site.net
#IgnoreURL /test*
#IgnoreReferrer file:/*
#IgnoreAgent RealPlayer
#IgnoreUser root

# The Include* keywords allow you to force the inclusion of log records
# based on hostname, URL, user agent, referrer or username. They take
# precedence over the Ignore* keywords. Note: Using Ignore/Include
# combinations to selectively process parts of a web site is _extremely
# inefficient_!!! Avoid doing so if possible (ie: grep the records to a
# separate file if you really want that kind of report).

# Example: Only show stats on Joe User's pages
#IgnoreURL *
#IncludeURL ~joeuser*

# Or based on an authenticated username
#IgnoreUser *
#IncludeUser someuser

# The MangleAgents allows you to specify how much, if any, The Webalizer
# should mangle user agent names. This allows several levels of detail
# to be produced when reporting user agent statistics. There are six
# levels that can be specified, which define different levels of detail
# suppression. Level 5 shows only the browser name (MSIE or Mozilla)
# and the major version number. Level 4 adds the minor version number
# (single decimal place). Level 3 displays the minor version to two
# decimal places. Level 2 will add any sub-level designation (such
# as Mozilla/3.01Gold or MSIE 3.0b). Level 1 will attempt to also add
# the system type if it is specified. The default Level 0 displays the
# full user agent field without modification and produces the greatest
# amount of detail. User agent names that can't be mangled will be
# left unmodified.

#MangleAgents 0

# The SearchEngine keywords allow specification of search engines and
# their query strings on the URL. These are used to locate and report
# what search strings are used to find your site. The first word is
# a substring to match in the referrer field that identifies the search
# engine, and the second is the URL variable used by that search engine
# to define its search terms.

SearchEngine yahoo.com p=
SearchEngine altavista.com q=
SearchEngine google.com q=
SearchEngine eureka.com q=
SearchEngine lycos.com query=
SearchEngine hotbot.com MT=
SearchEngine msn.com MT=
SearchEngine infoseek.com qt=
SearchEngine webcrawler searchText=
SearchEngine excite search=
SearchEngine netscape.com search=
SearchEngine mamma.com query=
SearchEngine alltheweb.com query=
SearchEngine northernlight.com qr=
SearchEngine baidu.com word=
SearchEngine sina.com.cn word=
SearchEngine sohu.com word=
SearchEngine 163.com q=

# The Dump* keywords allow the dumping of Sites, URL's, Referrers,
# User Agents, Usernames and Search strings to separate tab delimited
# text files, suitable for import into most database or spreadsheet
# programs.

# DumpPath specifies the path to dump the files. If not specified,
# it will default to the current output directory. Do not use a
# trailing slash ('/').

#DumpPath /var/lib/httpd/logs

# The DumpHeader keyword specifies if a header record should be
# written to the file. A header record is the first record of the
# file, and contains the labels for each field written. Normally,
# files that are intended to be imported into a database system
# will not need a header record, while spreadsheets usually do.
# Value can be either 'yes' or 'no', with 'no' being the default.

#DumpHeader no

# DumpExtension allows you to specify the dump filename extension
# to use. The default is "tab", but some programs are picky about
# the filenames they use, so you may change it here (for example,
# some people may prefer to use "csv").

#DumpExtension tab

# These control the dumping of each individual table. The value
# can be either 'yes' or 'no'; the default is 'no'.

#DumpSites no
DumpURLs yes
#DumpReferrers no
#DumpAgents no
#DumpUsers no
DumpSearchStr yes

# End of configuration file. Have a nice day!
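The header comments of the file describe its format: lines starting with '#' are comments, blank lines are skipped, and every other line is "ConfigOption Value" with at least one space or tab in between. A minimal Python sketch of that rule, handy for checking which options are actually active in a long file like the one above, could look like this. It only illustrates the rule as stated in the comments and is not Webalizer's own parser; the function name `active_options` is made up for this example.

```python
def active_options(conf_text):
    """Return (keyword, value) pairs for every active configuration line."""
    options = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Comment lines and blank lines are skipped.
        if not line or line.startswith("#"):
            continue
        # The keyword ends at the first run of spaces or tabs;
        # everything after it is the value.
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        options.append((keyword, value))
    return options


# A short excerpt of the configuration shown above.
sample = r"""
# LogFile defines the web server log file to use.
LogFile C:\WINDOWS\system32\LogFiles\W3SVC529919685\nc080917.log

#HistoryName webalizer.hist
Incremental yes
PageType htm*
PageType cgi
"""

for keyword, value in active_options(sample):
    print(keyword, "=", value)
```

Keywords such as PageType or HideURL may appear several times, so the sketch keeps a list of pairs rather than a dictionary.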
Feel free to copy my configuration as is (adjust LogFile and OutputDir to the paths on your own server).
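If you change the HideURL lines after copying, keep in mind the matching rule given in the configuration comments: a value may carry a leading or a trailing '*', and a value without a wildcard matches anywhere in the string. The Python sketch below restates that rule using the example from the comments. It is an illustration only, not Webalizer's code. The function name `matches` is hypothetical, and the case-sensitive comparison is an assumption suggested by the paired `*.gif` / `*.GIF` lines.

```python
def matches(pattern, text):
    """Apply the wildcard rule described in the configuration comments."""
    if pattern.startswith("*"):
        # Leading wildcard: the rest of the pattern must end the string.
        return text.endswith(pattern[1:])
    if pattern.endswith("*"):
        # Trailing wildcard: the rest of the pattern must start the string.
        return text.startswith(pattern[:-1])
    # No wildcard: a match can occur anywhere in the string.
    return pattern in text


host = "www.yourmama.com"
for pattern in ("your", "*mama.com", "www.your*"):
    print(pattern, matches(pattern, host))

print(matches("*.gif", "/images/logo.gif"))
print(matches("*.gif", "/images/logo.png"))
```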
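Also, the log format in IIS must be set to NCSA, the format that produces nc*.log files such as the nc080917.log used above. This step is important: if you see "truncating oversized username" messages while the log is being analyzed, this setting is the cause.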
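Before running the analysis, it can help to confirm that the log really is in NCSA common log format. The Python sketch below checks whether a line has the general shape `host ident user [date] "request" status bytes`. The regular expression is a rough approximation written for this example, and both sample lines are invented for illustration, not taken from a real server.

```python
import re

# Approximate shape of an NCSA common log line:
# host ident user [date] "request" status bytes
CLF_LINE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)


def looks_like_clf(line):
    """Return True if the line resembles NCSA common log format."""
    return CLF_LINE.match(line) is not None


# Invented sample lines: one in NCSA format, one in W3C extended style.
ncsa_line = ('192.168.0.10 - - [17/Sep/2008:10:15:32 +0800] '
             '"GET /index.htm HTTP/1.1" 200 5120')
w3c_line = ('2008-09-17 10:15:32 W3SVC1 192.168.0.1 GET /index.htm - 80 - '
            '192.168.0.10 Mozilla/4.0 200 0 0')

print(looks_like_clf(ncsa_line))
print(looks_like_clf(w3c_line))
```

If the first lines of your log fail this check, the IIS logging format is the first thing to look at.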
Then run webalizer.exe to analyze the log.
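If you want to script that last step, for example from a scheduled task, a small Python wrapper is one option. The sketch below assumes Webalizer was unpacked to `C:\webalizer`, a hypothetical path, so substitute your own folder. It passes the configuration file explicitly with Webalizer's `-c` option, although webalizer.conf in the current directory is picked up by default, as the configuration comments note.

```python
import subprocess
from pathlib import PureWindowsPath


def build_command(install_dir, conf_name="webalizer.conf"):
    """Build the webalizer.exe command line for the given install folder."""
    base = PureWindowsPath(install_dir)
    return [str(base / "webalizer.exe"), "-c", str(base / conf_name)]


def run_webalizer(install_dir):
    """Run Webalizer from its install folder and return its exit code."""
    cmd = build_command(install_dir)
    # Run inside the install folder so relative paths resolve there.
    return subprocess.run(cmd, cwd=str(install_dir), check=False).returncode


# C:\webalizer is a hypothetical install location used for illustration.
print(build_command(r"C:\webalizer"))
```

Calling `run_webalizer(r"C:\webalizer")` on the server then performs the same analysis as starting webalizer.exe by hand.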