Can Word, PPT, and PDF files stored in a BLOB column be full-text searched in Chinese?

Popularity: 69   Posted: 2016-04-24 05:55:35.0
If documents such as Word, PPT, and PDF files are stored in a BLOB column, can they be full-text searched in Chinese?
Or does the column have to be a CLOB?

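------Solution--------------------
You could add an extra column that holds a description of the large column that cannot be searched directly, and query that description column instead.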
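------Solution--------------------
Below is an example of how to search XML documents.
interMedia Text supports indexing XML documents by specifying a section group. A section group is a set of predefined nodes (tags) in the XML document. You can use WITHIN to restrict a search to a particular node, which improves the precision of the search.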

 

1) First, create a table to store our XML documents:

CREATE TABLE employee_xml(
id NUMBER PRIMARY KEY,
xmldoc CLOB )
/

2) Insert a simple document (the DTD is not required):
INSERT INTO employee_xml
VALUES (1,
 '<?xml version="1.0"?>
<!DOCTYPE employee [
<!ELEMENT employee (Name, Dept, Title)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ELEMENT Title (#PCDATA)>
]>
<employee>
<Name>Joel Kallman</Name>
<Dept>Oracle Service Industries Technology Group</Dept>
<Title>Technologist</Title>
</employee>');

3) Create an interMedia Text section group named 'xmlgroup' and add the Name and Dept tags to it. (Caution: in XML, tag names are case-sensitive, but tag names in section groups are case-insensitive.)
BEGIN
ctx_ddl.create_section_group ('xmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_zone_section ('xmlgroup', 'Name', 'Name');
ctx_ddl.add_zone_section ('xmlgroup', 'Dept', 'Dept');
END;
/

4) Create our interMedia Text index, specifying the section group we created above.
Also, specify the null_filter, as the Inso filter is not required.

CREATE INDEX employee_xml_index
ON employee_xml( xmldoc )
INDEXTYPE IS ctxsys.CONTEXT PARAMETERS(
'filter ctxsys.null_filter section group xmlgroup' )
/
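
Before querying, it can help to confirm that the index actually picked up the document. The following is a minimal sketch, assuming Oracle Text 9i or later (where ctx_ddl.sync_index is available) and that the index is owned by the current user; it is not part of the original walkthrough.

```sql
-- Documents that failed during indexing are logged in this view;
-- no rows returned means no indexing errors were recorded.
SELECT err_index_name, err_textkey, err_text
FROM ctx_user_index_errors
WHERE err_index_name = 'EMPLOYEE_XML_INDEX';

-- By default a CONTEXT index is not maintained at commit time, so rows
-- inserted after CREATE INDEX are not searchable until the index is synced.
BEGIN
  ctx_ddl.sync_index('employee_xml_index');
END;
/
```

The sync step only matters for rows added after the index was created; the row inserted in step 2 is indexed by CREATE INDEX itself.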

5) Now run a query that searches for a name within a specific section:
SELECT id
FROM employee_xml
WHERE contains (xmldoc, 'Joel within Name') > 0;

6) Only the content of non-empty tags is indexed, not the tag names themselves.
 Thus, the following queries will return zero rows.
SELECT id
FROM employee_xml
WHERE contains (xmldoc, 'title') > 0;
SELECT id
FROM employee_xml
WHERE contains (xmldoc, 'employee') > 0;

7) But the following query will locate our document, even though we have not defined
 Title as a section.
SELECT id
FROM employee_xml
WHERE contains (xmldoc, 'Technologist') > 0;
 
Let's say you want to get going right away with indexing XML, and don't want to have to specify sections for every element in your XML document collection. You can do this very easily by using the predefined AUTO_SECTION_GROUP. This section group is exactly like the XML section group, but the pre-definition of sections is not required. For all non-empty tags in your document, a zone section will be created with the section name the same as the tag name.

Use of the AUTO_SECTION_GROUP is also ideal when you may not know in advance all of the tag names that will be a part of your XML document set.

8) Drop our existing interMedia Text index.

DROP INDEX employee_xml_index
/

9) This time, recreate it specifying the AUTO_SECTION_GROUP.
We do not need to predefine the sections of our group; this is handled for us automatically.

CREATE INDEX employee_xml_index ON employee_xml( xmldoc )
INDEXTYPE IS ctxsys.CONTEXT PARAMETERS( 'filter ctxsys.null_filter section group ctxsys.auto_section_group' )
/

10) Once again, we use a section search to locate our document:
SELECT id
FROM employee_xml
WHERE contains (xmldoc, 'Technologist within Title') > 0;

For details, see: http://epub.itpub.net/4/1.htm
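
The walkthrough above indexes XML held in a CLOB, which does not need a document filter. For the case in the original question (binary Word, PPT, or PDF files in a BLOB, containing Chinese text), a CONTEXT index can be built on the BLOB column directly, so a CLOB is not required. The sketch below is an illustration rather than a tested setup: the table doc_store, the index doc_store_idx, and the preference name zh_lexer are hypothetical, and it assumes Oracle 10g or later with a database character set that the Chinese lexer supports.

```sql
-- Hypothetical table: one binary document (Word, PPT, PDF) per row.
CREATE TABLE doc_store (
  id   NUMBER PRIMARY KEY,
  name VARCHAR2(200),
  doc  BLOB
)
/

-- Lexer preference for Chinese text; 'zh_lexer' is an illustrative name.
BEGIN
  ctx_ddl.create_preference('zh_lexer', 'CHINESE_VGRAM_LEXER');
END;
/

-- AUTO_FILTER extracts plain text from binary formats before indexing.
-- Assumption: Oracle 10g or later; older releases used ctxsys.inso_filter.
CREATE INDEX doc_store_idx ON doc_store (doc)
INDEXTYPE IS ctxsys.CONTEXT PARAMETERS (
  'filter ctxsys.auto_filter lexer zh_lexer')
/

-- Search the extracted text for a Chinese term.
SELECT id, name
FROM doc_store
WHERE contains (doc, '全文检索') > 0;
```

The filter replaces the null_filter used in the XML example because the BLOB content is a binary file format rather than plain text; the lexer preference is what makes Chinese text, which has no spaces between words, searchable.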
