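This single document covers installation, incremental indexing, the PHP extension, and API call examples, so you do not have to hunt through many separate articles.
Setting up coreseek (sphinx + mmseg3)
[Step 1] Install mmseg3 first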
cd /var/install
wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
tar zxvf coreseek-4.1-beta.tar.gz
cd coreseek-4.1-beta
cd mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install

Problem encountered: "error: cannot find input file: src/Makefile.in", or other similar errors...
Solution: run the commands below in order. When I ran 'aclocal' I hit a further error; installing libtool first, as described below, resolved it.

yum -y install libtool
aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean

Once 'libtool' is installed, run the commands above again starting from 'aclocal', then rerun the original installation steps.
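Before running the recovery sequence it can help to confirm that every tool it calls is actually on the PATH. The following is only a minimal sketch: `missing_tools` is a hypothetical helper name, and the tool list is simply taken from the commands above.

```shell
#!/bin/sh
# missing_tools TOOL...: print every listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used by the recovery sequence above.
missing=$(missing_tools aclocal libtoolize automake autoconf autoheader make)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
else
    echo "all build tools found"
fi
```

If anything is reported missing, install the corresponding package (for example libtool, as above) before continuing.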
[Step 2] Install coreseek
## Install coreseek
$ cd /var/install/coreseek-4.1-beta/csft-4.1    # the directory is csft-3.2.14 or csft-4.0.1 in other releases
$ sh buildconf.sh    # warnings in the output can be ignored; errors must be resolved
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
## If configure reports a MySQL problem, see the MySQL data source installation notes: http://www.coreseek.cn/product_install/install_on_bsd_linux/#mysql
$ make && make install
$ cd ..

## Command-line test of mmseg segmentation and coreseek search (set the locale to zh_CN.UTF-8 beforehand so that Chinese displays correctly)
$ cd testpack
$ cat var/test/test.xml    # Chinese text should display correctly here
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
$ /usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索
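If this error appears: xmlpipe2 support NOT compiled in. To use xmlpipe2, install missing XML libra
run the following command:
yum -y install expat expat-devel
After the packages are installed, recompile coreseek and rebuild the index; the test should then pass.
The result looks like this: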
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

 using config file 'etc/csft.conf'...
index 'xml': query '网络搜索 ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
   1. document=1, weight=1590, published=Thu Apr 1 07:20:07 2010, author_id=1

words:
   1. '网络': 1 documents, 1 hits
   2. '搜索': 2 documents, 5 hits
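The test above only shows readable Chinese if the shell runs in a UTF-8 locale. As a rough sketch, the check below inspects the usual locale variables; `is_utf8_locale` is a hypothetical helper, and the suggested zh_CN.UTF-8 value assumes that locale is installed on the machine.

```shell
#!/bin/sh
# is_utf8_locale NAME: succeed if the locale name ends in a UTF-8 codeset.
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective locale for character handling:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
if is_utf8_locale "$effective"; then
    echo "locale '$effective' is UTF-8"
else
    echo "locale '$effective' is not UTF-8; try: export LANG=zh_CN.UTF-8"
fi
```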
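Next, configure sphinx to work with MySQL
Create the sphinx counter table by running the following in the coreseek_test database.
CREATE TABLE sph_counter( counter_id INTEGER PRIMARY KEY NOT NULL, max_doc_id INTEGER NOT NULL);
Create the configuration file that connects sphinx to MySQL
# vi /usr/local/coreseek/etc/csft_mysql.conf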
# MySQL data source configuration; for details see: http://www.coreseek.cn/products-install/mysql/
# First import var/test/documents.sql into the database, then set the MySQL user, password and database below

# source definition
source main    # name of the source
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = 123456
    sql_db = coreseek_test
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1,MAX(id) FROM hr_spider_company    # update sph_counter
    sql_query = SELECT * FROM hr_spider_company WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )    # read rows up to the ID recorded in sph_counter
    # the first column of sql_query (id) must be an integer
    # string/text columns (title and content in the stock example) are full-text indexed; adjust to your actual columns
    sql_attr_uint = from_id     # the value read from SQL must be an integer; adjust to your actual columns
    sql_attr_uint = link_id     # the value read from SQL must be an integer; adjust to your actual columns
    sql_attr_uint = add_time    # the value read from SQL must be an integer; adjust to your actual columns
}

# delta (incremental) source definition
source delta : main    # keep the names consistent with the definitions above
{
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT * FROM hr_spider_company WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )    # read rows above the ID recorded in sph_counter
    sql_query_post_index = REPLACE INTO sph_counter SELECT 1,MAX(id) FROM hr_spider_company    # update sph_counter
}

# index definition
index main    # keep the names consistent with the definitions above
{
    source = main    # name of the corresponding source
    path = /usr/local/coreseek/var/data/mysql    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    # Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath = /usr/local/mmseg3/etc/    # for BSD and Linux; must end with /
    charset_type = zh_cn.utf-8
}

index delta : main    # keep the names consistent with the definitions above
{
    source = delta
    path = /usr/local/coreseek/var/data/delta
}

# global indexer settings
indexer
{
    mem_limit = 128M
}

# searchd service definition
searchd
{
    listen = 9312
    read_timeout = 5
    max_children = 30
    max_matches = 1000
    seamless_rotate = 0
    preopen_indexes = 0
    unlink_old = 1
    pid_file = /usr/local/coreseek/var/log/searchd_mysql.pid    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    log = /usr/local/coreseek/var/log/searchd_mysql.log    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    query_log = /usr/local/coreseek/var/log/query_mysql.log    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    binlog_path =    # disable the binlog
}
My test table is named hr_spider_company; just change it to your own table name as needed.
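The two sql_query statements in the configuration split rows by comparing each id with max_doc_id in sph_counter: ids up to the counter go to main, ids above it go to delta. The toy function below only illustrates that comparison with made-up ids; `split_ids` is a hypothetical name and nothing here touches a database.

```shell
#!/bin/sh
# split_ids MAX_DOC_ID ID...: print which index each id would land in,
# mirroring "id <= max_doc_id" (main) and "id > max_doc_id" (delta).
split_ids() {
    max_doc_id=$1
    shift
    for id in "$@"; do
        if [ "$id" -le "$max_doc_id" ]; then
            echo "$id main"
        else
            echo "$id delta"
        fi
    done
}

# With the counter at 100, ids 42 and 100 belong to main, 101 and 250 to delta.
split_ids 100 42 100 101 250
```

After a full reindex the counter moves up to the current MAX(id), so the delta query starts out empty again.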
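Command list:
Start the background service (required)
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf
Build all indexes (must be run once before querying or testing)
# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
Build the delta index
# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
Merge the indexes
# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
(--merge-dst-range deleted 0 0 is added so that a document does not end up matched more than once after the merge. The option keeps only destination-index documents whose deleted attribute is 0, so it assumes the index defines an attribute named deleted.)
Test a search from the command line
# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf aaa
Stop the background service
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop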
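Because the indexer commands differ only in their trailing arguments, they can be collected in one small helper. This is a sketch rather than part of coreseek: `indexer_cmd` and the action names full, delta and merge are made up here, and the loop only prints the command lines (a dry run) instead of executing them.

```shell
#!/bin/sh
INDEXER=/usr/local/coreseek/bin/indexer
CONF=/usr/local/coreseek/etc/csft_mysql.conf

# indexer_cmd ACTION: print the indexer command line for ACTION.
# Prints nothing and returns 1 for an unknown action.
indexer_cmd() {
    case "$1" in
        full)  echo "$INDEXER -c $CONF --all --rotate" ;;
        delta) echo "$INDEXER -c $CONF delta --rotate" ;;
        merge) echo "$INDEXER -c $CONF --merge main delta --rotate --merge-dst-range deleted 0 0" ;;
        *)     return 1 ;;
    esac
}

# Dry run: show what would be executed for each action.
for action in full delta merge; do
    indexer_cmd "$action"
done
```

To actually run an action, the printed line can be passed to the shell, for example with `sh -c "$(indexer_cmd delta)"`.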
Automation:
crontab -e
*/1 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
*/5 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
30 1 * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
The cron entries above mean: rebuild the delta index every minute, merge the indexes every five minutes, and rebuild everything at 1:30 every day. (indexer is a compiled binary, so it is invoked directly rather than through /bin/sh.)
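With this schedule the delta and merge entries fire in the same minute every five minutes, and all three coincide at 1:30. One common way to keep such jobs from overlapping is a lock around each run. The sketch below uses an atomic mkdir as the lock; `run_locked`, the lock path and the exit code 75 are choices made for this illustration, and a lock left behind by a killed job would have to be removed by hand.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir is atomic, so two
# concurrent callers cannot both succeed. Returns 75 if the lock is held,
# otherwise the exit status of COMMAND.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir is held, skipping: $*" >&2
        return 75
    fi
    "$@"
    rc=$?
    rmdir "$lockdir"
    return $rc
}

# Demo with a harmless command and a hypothetical lock path.
run_locked "${TMPDIR:-/tmp}/csft_indexer_demo.lock" echo "indexer would run here"
```

In a crontab, each indexer command would be wrapped in a script that calls `run_locked` with the same lock directory, so a run that finds the lock taken simply skips that round.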
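Installing the Sphinx extension
The official Coreseek tutorial suggests that PHP code simply include a PHP file to talk to the server. In fact PHP has a standalone sphinx extension that can work with coreseek directly (coreseek is sphinx underneath); it is documented in the official PHP manual, and the performance gain is far from small. The extension does, however, depend on the libsphinxclient library.
[Step 1] Install the libsphinxclient dependency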
# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure --prefix=/usr/local/sphinxclient
configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in    # configure fails here

// Handling the configure error
During the build, configure reported "config.status: error: cannot find input file: src/Makefile.in". Running the following commands and then configuring again fixed it:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean

// Configure and build again (with the same prefix, because the PHP extension is pointed at it in the next step)
# ./configure --prefix=/usr/local/sphinxclient
# make && make install
[Step 2] Install the sphinx PHP extension
http://pecl.php.net/package/sphinx
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini sphinx.ini
# vi sphinx.ini
extension=sphinx.so
# service php-fpm restart
Open phpinfo and check that the sphinx module is now listed.
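The same check can be made from the command line, since `php -m` lists the loaded modules. This is a small sketch assuming the php CLI reads the same /etc/php.d directory as php-fpm, which may not hold on every system; `has_module` is a hypothetical helper that looks for an exact module name on standard input.

```shell
#!/bin/sh
# has_module NAME: succeed if NAME appears as a whole line
# (case-insensitive) in the module list read from stdin.
has_module() {
    grep -qix "$1"
}

if command -v php >/dev/null 2>&1; then
    if php -m | has_module sphinx; then
        echo "sphinx extension loaded"
    else
        echo "sphinx extension NOT loaded"
    fi
else
    echo "php CLI not found"
fi
```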
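Example of calling sphinx from PHP:
<?php
$s = new SphinxClient;
$s->setServer("127.0.0.1", 9312);
$s->setMatchMode(SPH_MATCH_PHRASE);
$s->setMaxQueryTime(30);              # limit is in milliseconds
$res = $s->query("宝马", 'main');     # "宝马" is the keyword, 'main' is the index to search
$err = $s->getLastError();

# query() returns false on failure, so guard before reading the matches
$ids = ($res !== false && isset($res['matches'])) ? array_keys($res['matches']) : array();
var_dump($ids);

# The message says: "just use the returned IDs to read the rows from the database."
echo "<br>" . "通过获取的ID来读取数据库中的值即可。" . "<br>";

echo '<pre>';
var_dump($res);
var_dump($err);
echo '</pre>';
Output: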
array(20) { [0]=> int(1513) [1]=> int(42020) [2]=> int(57512) [3]=> int(59852) [4]=> int(59855) [5]=> int(60805) [6]=> int(94444) [7]=> int(94448) [8]=> int(99229) [9]=> int(107524) [10]=> int(111918) [11]=> int(148) [12]=> int(178) [13]=> int(595) [14]=> int(775) [15]=> int(860) [16]=> int(938) [17]=> int(1048) [18]=> int(1395) [19]=> int(1657) }
<br>通过获取的ID来读取数据库中的值即可。<br>
<pre>array(10) {
  ["error"]=> string(0) ""
  ["warning"]=> string(0) ""
  ["status"]=> int(0)
  ["fields"]=> array(17) {
    [0]=> string(3) "cid"
    [1]=> string(8) "link_url"
    [2]=> string(12) "company_name"
    [3]=> string(9) "type_name"
    [4]=> string(10) "trade_name"
    [5]=> string(5) "scale"
    [6]=> string(8) "homepage"
    [7]=> string(7) "address"
    [8]=> string(9) "city_name"
    [9]=> string(8) "postcode"
    [10]=> string(7) "contact"
    [11]=> string(9) "telephone"
    [12]=> string(6) "mobile"
    [13]=> string(3) "fax"
    [14]=> string(5) "email"
    [15]=> string(11) "description"
    [16]=> string(11) "update_time"
  }
  ["attrs"]=> array(3) { ["from_id"]=> string(1) "1" ["link_id"]=> string(1) "1" ["add_time"]=> string(1) "1" }
  ["matches"]=> array(20) {
    [1513]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3171471" ["add_time"]=> string(10) "1394853454" } }
    [42020]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2248093" ["add_time"]=> string(10) "1394913884" } }
    [57512]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2684470" ["add_time"]=> string(10) "1394970833" } }
    [59852]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1394977527" } }
    [59855]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1394977535" } }
    [60805]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1394980072" } }
    [94444]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1395084115" } }
    [94448]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1395084124" } }
    [99229]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "1297992" ["add_time"]=> string(10) "1395100520" } }
    [107524]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "5" ["link_id"]=> string(10) "4294967295" ["add_time"]=> string(10) "1395122053" } }
    [111918]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "5" ["link_id"]=> string(10) "4294967295" ["add_time"]=> string(10) "1395127953" } }
    [148]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2770294" ["add_time"]=> string(10) "1394852562" } }
    [178]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2474558" ["add_time"]=> string(10) "1394852579" } }
    [595]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(6) "534804" ["add_time"]=> string(10) "1394852862" } }
    [775]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3230353" ["add_time"]=> string(10) "1394852980" } }
    [860]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2549233" ["add_time"]=> string(10) "1394853048" } }
    [938]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3191382" ["add_time"]=> string(10) "1394853114" } }
    [1048]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3234645" ["add_time"]=> string(10) "1394853174" } }
    [1395]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2661219" ["add_time"]=> string(10) "1394853375" } }
    [1657]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2670624" ["add_time"]=> string(10) "1394853540" } }
  }
  ["total"]=> int(543)
  ["total_found"]=> int(543)
  ["time"]=> float(0.109)
  ["words"]=> array(1) { ["宝马"]=> array(2) { ["docs"]=> int(543) ["hits"]=> int(741) } }
}
string(0) ""</pre>
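The search result contains only document IDs and attributes, so the rows themselves still have to be fetched from MySQL. As a rough sketch of that last step, the helper below turns a list of IDs into a query that also keeps the order returned by sphinx. `ids_to_query` is a hypothetical name, FIELD() is a MySQL function, and the table name is inserted as-is, so it should never come from user input.

```shell
#!/bin/sh
# ids_to_query TABLE ID...: print a SELECT that fetches the given ids
# and keeps them in the order they were passed in.
# Returns 1 if no ids are given or an id is not a plain number.
ids_to_query() {
    table=$1
    shift
    list=
    for id in "$@"; do
        case "$id" in
            ''|*[!0-9]*) echo "not a numeric id: $id" >&2; return 1 ;;
        esac
        list="${list:+$list,}$id"
    done
    [ -n "$list" ] || return 1
    echo "SELECT * FROM $table WHERE id IN ($list) ORDER BY FIELD(id,$list)"
}

# First three ids from the output above.
ids_to_query hr_spider_company 1513 42020 57512
```

The printed statement can then be run through whatever MySQL client or PHP database layer the application already uses.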