很久以前,在vb里用正则regexp组件,处理数据很爽快,后来在VBA里也是,vbscript regular expression 5.5
后来我就想知道数据库里是不是也支持,于是我找到了下面的正则工具,是一群强大的外国sql server 程序员开发的
其强大之处,用过正则的人会明白,支持test,replace,特别是支持execute!!! 在程序中,execute是返回一个对象,他有各种属性,方法,子成员。而在数据库中,这些是放在了相互关联的表里,太有才了!!
不过正如作者所说,CLR会更快一些,而且说实在的,正则更趋于对行数据进行处理,所以我在用的时候觉得挺慢的,虽然其功能很绚丽。sql server 里没有vi来辅助工作,用用正则算是一个弥补吧。
以下代码在sql server 2000中已测试,其中数据类型有需要修改的 2000中不支持nvarchar(max),把这个一改就可以运行了
在此向国外的同行学习,致敬!
字数限制,按照实现的功能分开贴了,分别是:test方法,replace方法,和最强大的execute方法
- SQL code
/*This Workbench is about using Regular expressions with SQL Server via TSQL. It doesn't even attempt to teach how regular expressions work or how to pull them together. The aim is to demonstrate a few possibilities and try to persuade you to experiment with them if you don't already use Regex with SQL Server.We suggest that, if you are an ordinary mortal without special powers like Phil and I, you should use an application such as http://www.regexbuddy.com/ RegexBuddy to form, edit and interpret Regular expressions, as it makes learning them a lot easier. In order that people with access only to SQL Server 2000 can use the workbench, we'll use OLE in the examples, but they are readily adapted to CLR Contents---------IntroductionThe OLE Functions The OLE Regex Match function The OLE Regex Replace function The OLE Regex Find (Execute) functionCombining two RegexsRegex PerformanceRegular Expressions can be very useful to the Database programmer, particularly for data validation, data feeds and data transformations.Regular Expressions are not regular in the sense that there is any common dialect of expression that is understood by all Regex engines. On the contrary, regular expresssions aren't always portable and there are many common, similar but incompatible, dialects in use, such as Perl 5.8, Java.util.regex, .NET, PHP, Python, Ruby, ECMA Javascript, PCRE, Apache, vi, Shell tools TCL ARE, POSIX BRE, Funduc and JGsoft. Regular Expressions were never developed to be easy to understand. They are a condensed shorthand that, on preliminary inspection, looks as if someone has repeatedly sat on the keyboard. Even when interpreted, the logic isn't always easy to follow. If you don't agree, then explain this one! http://aspn.activestate.com/ASPN/Cookbook/Rx/Recipe/59864See http://www.simple-talk.com/dotnet/.net-framework/implementing-real-world-data-input-validation-using-regular-expressions/ for an introduction to regular expressionsA great deal can be done using commandline applications that work with regular expressions such as GREP and AWK. However, there are times where it is handy to use Regex directly from TSQL. There are two Regex engines available to SQL Server. These are the .NET Regex which is in the system.text.regularexpression moduleThe ECMA Regex from VBScript.RegExp which is distributed with the IE browser and is used by Javascript and JScript.Both of these are excellent standard implementations. Both work well in TSQL. The .NET Regex requires the creation of CLR functions to provide regular expressions, and works only with SQL Server 2005, (and 2007) http://www.sqlservercentral.com/articles/Development/clrintegration/1967/The ECMA Regex can be used via VBScript.RegExp, which are available to SQL Server 2000 as well. The regex is compatible with Javascript.The advantage of using CLR is that the regular expressions of the NET framework are very good, and performance is excellent. However, the techniques are well-known, whereas some of the more powerful uses of VBScript.RegExp have hardly ever been published, so this workbench will concentrate on the latterThe OLE functions------------------There are various properties to consider in these functionsIgnoreCase By default, the regular expression is case sensitive. In the following functions, we have set the IgnoreCase property to True to make it case insensitive. The Multiline property The caret and dollar only match at the very start and very end of the subject string by default. If your subject string consists of multiple lines separated by line breaks, you can make the caret and dollar match at the start and the end of those lines by setting the Multiline property to True. (there is no option to make the dot match line break characters). The Global property If you want the RegExp object to return or replace all matches instead of just the first one, set the Global property to True.Only the 'IgnoreCase is relevant in the first function but we've 'hardcoded' it to 1 as case-sensitive searches are a minority interest.The OLE Regex Match function-----------------------------Let's start off with something simple, a function for testing a string against a regular expression*/IF OBJECT_ID (N'dbo.RegexMatch') IS NOT NULL DROP FUNCTION dbo.RegexMatchGOCREATE FUNCTION dbo.RegexMatch ( @pattern VARCHAR(2000), @matchstring VARCHAR(8000)--Varchar(8000) got SQL Server 2000 )RETURNS INT/* The RegexMatch returns True or False, indicating if the regular expression matches (part of) the string. (It returns null if there is an error).When using this for validating user input, you'll normally want to check if the entire string matches the regular expression. To do so, put a caret at the start of the regex, and a dollar at the end, to anchor the regex at the start and end of the subject string.*/ AS BEGIN DECLARE @objRegexExp INT, @objErrorObject INT, @strErrorMessage VARCHAR(255), @hr INT, @match BIT SELECT @strErrorMessage = 'creating a regex object' EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern --Specifying a case-insensitive match IF @hr = 0 EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1 --Doing a Test' IF @hr = 0 EXEC @hr= sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring IF @hr <> 0 BEGIN RETURN NULL END EXEC sp_OADestroy @objRegexExp RETURN @match ENDGO/* Now, with this routine, we can do some complex input validation*/--IS there a repeating wordSELECT dbo.RegexMatch('\b(\w+)\s+\1\b','this has has been repeated')--1SELECT dbo.RegexMatch('\b(\w+)\s+\1\b','this has not been repeated')--0--find a word near another word (in this case 'for' and 'last' 1 or 2 words apart)SELECT dbo.RegexMatch('\bfor(?:\W+\w+){1,2}?\W+last\b', 'You have failed me for the last time, Admiral')--1SELECT dbo.RegexMatch('\bfor(?:\W+\w+){1,2}?\W+last\b', 'You have failed me for what could be the last time, Admiral')--0--is this likely to be a valid credit cardSELECT dbo.RegexMatch('^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|3(?:0[0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13}|(?:2131|1800)\d{11})$','4953129482924435') --IS this a valid ZIP codeSELECT dbo.RegexMatch('^[0-9]{5,5}([- ]?[0-9]{4,4})?$','02115-4653')--is this a valid PostcodeSELECT dbo.RegexMatch('^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})$','RG35 2AQ')--is this a valid European dateSELECT dbo.RegexMatch('^((((31\/(0?[13578]|1[02]))|((29|30)\/(0?[1,3-9]|1[0-2])))\/(1[6-9]|[2-9]\d)?\d{2})|(29\/0?2\/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))|(0?[1-9]|1\d|2[0-8])\/((0?[1-9])|(1[0-2]))\/((1[6-9]|[2-9]\d)?\d{2})) (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$','12/12/2007 20:15:27')--is this a valid currency value (dollar)SELECT dbo.RegexMatch('^\$(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?$','$34,000.00')--is this a valid currency value (Sterling)SELECT dbo.RegexMatch('^\?\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?$','?4,000.00')--A valid email address?SELECT dbo.RegexMatch('^(([a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+(\.[a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+)*){1,64}@(([A-Za-z0-9]+[A-Za-z0-9-_]*){1,63}\.)*(([A-Za-z0-9]+[A-Za-z0-9-_]*){3,63}\.)+([A-Za-z0-9]{2,4}\.?)+){1,255}$',[email protected]')/*With this function, the passing back of errors is rudimentary. If an OLE error occurs, then a null is passed back.There are two other basic Regex functions available. With them, you can use regular expressions in all sorts of places in TSQL without having to get to direct grips with the rather awkward OLE interface.