Pattern matching procedure now returns substrings
- Subject: Pattern matching procedure now returns substrings
- From: John Milardovic <milardj@SX.COM>
- Date: Tue, 29 Aug 2000 19:41:25 -0400
Some of you might recall that I posted a pattern matching library to the
list group in Feb. The library tests whether a specified pattern exists in
a string. I have now modified it to return substrings that are specified
with patterns. It works great on my test set of data (admittedly a small
set) and I thought I would post it to everyone to play with. The source
code contains a pretty thorough explanation of how to use the library. I
would appreciate any bug reports.
Beyond that usage explanation, the source code is NOT documented, but I will
be including the final version in The SQR Cookbook, and that will be fully
documented and (hopefully, with your help) bug free.
Anyone is free to use the code as they see fit as long as ALL my comments
remain in place. If you make any modifications please let me know so that I
can incorporate them into my version.
The latest changes allow you to test for a pattern and to return substrings.
Example 1 -
Given a string of "1-905-999-8888" we can pull out the area code with the
pattern "\d-(\d\d\d)"
Given a string "29-Aug-00" we can validate that it is in a proper date
format AND pull out the month with the pattern "\d\d~-(\a\a\a)-\d\d"
where :
\d\d~ indicates 1 or 2 digits (the tilde means 0 or 1 occurrence of the
previous pattern/literal)
- is a literal hyphen
(\a\a\a) indicates 3 letters (a to z); the brackets tell the procedure to
return the enclosed substring if the pattern is found
etc
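For readers who know regular expressions, the notation above maps fairly
directly onto them. The following is a minimal sketch in Python, not the
SQR library itself: the function name to_regex is hypothetical, and it
assumes that \a accepts upper and lower case letters (since "Aug" must
match) and that every other character is a literal.

```python
import re


def to_regex(pattern):
    """Translate the post's notation into Python regex syntax (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "d":
                out.append(r"\d")            # one digit
            elif nxt == "a":
                out.append("[A-Za-z]")       # one letter (assumed case-insensitive)
            else:
                out.append(re.escape(nxt))   # unknown escape: treat as a literal
            i += 2
        elif ch == "~":
            out.append("?")                  # 0 or 1 of the previous item
            i += 1
        elif ch in "()":
            out.append(ch)                   # brackets mark a returned substring
            i += 1
        else:
            out.append(re.escape(ch))        # everything else is a literal
            i += 1
    return "".join(out)


# re.match is anchored at the start of the string, like the library.
area = re.match(to_regex(r"\d-(\d\d\d)"), "1-905-999-8888")
month = re.match(to_regex(r"\d\d~-(\a\a\a)-\d\d"), "29-Aug-00")
print(area.group(1), month.group(1))
```

Note that Python's engine backtracks, so it will accept some strings that
the library, as described in this post, would reject.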
The 2 biggest weaknesses at this point are that all patterns are anchored to
the beginning of the string and that I don't have a "tentative" state.
For example, given the string "905-999-8888" and the pattern
"\d~-~\d\d\d-\d\d\d-\d\d\d\d", the match returns false. The optional first
\d matches the 9, so when we get to the first set of \d\d\d it is compared
against "05-" in the string and fails. So I need to get rid of this "false
positive" on the optional \d: when confronted with a "false negative" later
on, the procedure should ignore the "optional" patterns it has matched and
try to rematch. Have I lost everyone?
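To make the limitation concrete, here is a small Python sketch of an
anchored matcher that consumes each optional item greedily and never goes
back on that decision. It is an illustration of the behaviour described
above, not the code in pattern.sqr; the names parse, item_matches and
greedy_match are hypothetical, and \a is assumed to accept letters of
either case. The last line shows the same pattern (with ~ written as ?) in
Python's re module, which does backtrack.

```python
import re


def parse(pattern):
    """Split a pattern into items: [kind, value, optional, group_index]."""
    items = []
    group = None      # index of the capture group currently open, if any
    n_groups = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "(":
            group = n_groups
            n_groups += 1
        elif ch == ")":
            group = None
        elif ch == "~":
            items[-1][2] = True          # previous item becomes optional
        elif ch == "\\":
            i += 1
            items.append(["class", pattern[i], False, group])
        else:
            items.append(["literal", ch, False, group])
        i += 1
    return items, n_groups


def item_matches(item, ch):
    kind, value = item[0], item[1]
    if kind == "literal":
        return ch == value
    if value == "d":
        return ch in "0123456789"
    if value == "a":
        return ch.isascii() and ch.isalpha()
    return False


def greedy_match(pattern, text):
    """Anchored, left-to-right match that never revisits a decision."""
    items, n_groups = parse(pattern)
    groups = [""] * n_groups
    pos = 0
    for item in items:
        if pos < len(text) and item_matches(item, text[pos]):
            if item[3] is not None:
                groups[item[3]] += text[pos]
            pos += 1
        elif not item[2]:                # a required item failed: give up
            return False, []
    return True, groups


pattern = r"\d~-~\d\d\d-\d\d\d-\d\d\d\d"
print(greedy_match(pattern, "905-999-8888"))      # (False, [])
print(greedy_match(pattern, "1-905-999-8888"))    # (True, [])
print(re.match(r"\d?-?\d\d\d-\d\d\d-\d\d\d\d", "905-999-8888") is not None)  # True
```

A "tentative" state would amount to remembering each optional item that
matched and retrying without it when a later required item fails.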
All that said, once you realize the above then you can simply modify your
pattern.
Example
The task is to pull out all area codes from our phone column and to verify
that we have a legal phone format. The phone numbers can be in the form
"9-999-999-9999" or "999-999-9999". The pattern
"1~-~(\d\d\d)-\d\d\d-\d\d\d\d" can be used, and it will work correctly in
all cases where we DO NOT have an area code that begins with a "1". This is
because the optional 1 will never match (unless it is the long distance
code), so we do not get the false positive.
We can go even further. Say that, in addition to the formats above, we can
also have the format "12223334444" or "2223334444". The pattern
"1~-~(\d\d\d)-~\d\d\d-~\d\d\d\d" will return true for all the above formats.
Now let's say that another format is allowed, "999.999.9999". Then the
pattern would be "1~-~(\d\d\d)-~.~\d\d\d-~.~\d\d\d\d".
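As a rough cross-check of that last pattern, here it is hand-translated
into Python's re syntax (~ becomes ?, and the literal "." is escaped). This
is only a sketch: the helper name area_code is hypothetical, and it uses
fullmatch, which is stricter than the library's start-anchored matching
because it also rejects trailing characters.

```python
import re

# "1~-~(\d\d\d)-~.~\d\d\d-~.~\d\d\d\d" rewritten for Python's re module.
PHONE = re.compile(r"1?-?(\d\d\d)-?\.?\d\d\d-?\.?\d\d\d\d")


def area_code(phone):
    """Return the area code if the whole string is a legal phone number."""
    m = PHONE.fullmatch(phone)
    return m.group(1) if m else None


for sample in ["1-905-999-8888", "905-999-8888", "12223334444",
               "2223334444", "999.999.9999", "99-999-9999"]:
    print(sample, "->", area_code(sample))
```

Because Python's engine backtracks, this version also handles an area code
that begins with a "1", the case the library cannot handle yet.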
Now let's say that we want to convert all the "2223334444" and
"999.999.9999" formats into "222-333-4444". We simply use the following
pattern, "(\d\d\d).~(\d\d\d).~(\d\d\d\d)", concatenate the returned values
with hyphens and write the result to the database (see attachment for
details on how substrings are returned).
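The conversion step can be sketched as follows in Python. The function name
normalize_phone is hypothetical, the pattern is the one above with ~
written as ? and the "." escaped, and the database write is left out.

```python
import re

# "(\d\d\d).~(\d\d\d).~(\d\d\d\d)" rewritten for Python's re module.
LOCAL_NUMBER = re.compile(r"(\d\d\d)\.?(\d\d\d)\.?(\d\d\d\d)")


def normalize_phone(raw):
    """Return raw as 999-999-9999, or None if it does not fit the pattern."""
    m = LOCAL_NUMBER.fullmatch(raw)
    if m is None:
        return None
    # The three returned substrings are joined with hyphens.
    return "-".join(m.groups())


print(normalize_phone("2223334444"))     # 222-333-4444
print(normalize_phone("999.999.9999"))   # 999-999-9999
```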
I hope everyone finds the library useful and excuses my long post.
Please feel free to contact me directly with any questions or problems you
may have in using it. I would also appreciate any bug reports. The
patterns can get confusing, though, so double check them before concluding
that you have found a bug.
I would also like to hear how people think this could be used (validating
dates/phone numbers/zip codes/postal codes etc, pulling out area codes, last
names, PeopleSoft Name validation? etc)
Please do not reply directly to the list with problems/comments etc. Send
e-mails to milardj@sx.com or (preferably) milardj@yahoo.com.
<<pattern.sqr>>