Wednesday, March 7, 2012

Limitations, etc.

We're looking at rewriting part of one of our applications and FINALLY
moving from SQL 7.0 to SQL 2000.
One reason we haven't is currently the table we index has 3 columns we want
to be able to do a FT search against and since SP3 on SQL 7, MS has
"changed" the rules.
So, our current solution is probably going to be to jam all 3 fields into a
single "superfield" (probably in some sort of XML format for future use.)
(so something like <xml><title>Tale of Two Cities</title><Author>Charles
Dickens</author><body>It was the best of times...</body></xml>) (XML format
has not been decided upon).
Anyway, we also want to be able to have HTML tags within the text.
Now, several questions:
My understanding is SQL 2000 gives us things like:
Change tracking
Scalability beyond 1 million rows (a serious problem right now, as we add
about 1-2 million rows a year.)
What else?
My understanding is the word breaker dll's depend partly on the OS? That
Windows 2003 is "better" in some ways?
Is there one that can give us good "breaking" on HTML (we are currently
stripping it out which causes other issues) and allows decent FORMSOF and
INFLECTIONAL support?
(for example run would find runs, running, ran, etc.)
Is it work upgrading to Windows 2003? (I'm not sure I can justify it unless
the features really help).
Thanks.
--
Greg D. Moore
President Green Mountain Software
Personal: http://stratton.greenms.com
SQL FTS in SQL 7 and SQL 2000 does not index properties. So if you are
expecting to be able to query the Title, or Author independently of the body
you will be unable to do so.
XML is problematic for the same reason if you choose to index it as XML in
columns of the image data type using the document type column. If you index
XML or HTML in the text or char columns your results will be unpredictable
due to how the word breaker treats words that are adjacent to the <, and >
symbols.
I would advise you to render your content as text instead of indexing it as
XML or HTML. You could have two columns one with the content with the tags,
and the other without. You could index the column which does not contain the
HTML or XML tags, and then display the columns which do.
I would not characterize the Win2003 word breakers as better than the
Win2000 ones. I would say they have evolved, perform better, but have their
own idosyncracies. Word breakers will look at the stream of data which is
emmitted by the iFilters and break the stream into words or tokens according
to language rules. It is the iFilters which will examine the data in your
columns and extract the textual stream which it sends to the word breakers.
If you use the document type column and store data in its native form in the
image data type column, SQL FTI will read the value of the document type
column and apply the correct iFilter.
If you choose to do this HTML will be indexed correctly but not the author
(docauthor) or title (doctitle) metatags. XML elements, attributes and
values will not be indexed as they are considered to be properties by the
XML iFilter. Your best bet with XML is to shred it and store it in columns,
or to remove the tags and store it as text or char.
Contains and FreeText both do the linguistic stemming you are looking for.
I have seen performance increases with Win2003 on a rather large Full Text
catalog I have. I can't quantify it, but it is faster.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:ds1bd.295162$bp1.244115@.twister.nyroc.rr.com. ..
> We're looking at rewriting part of one of our applications and FINALLY
> moving from SQL 7.0 to SQL 2000.
> One reason we haven't is currently the table we index has 3 columns we
want
> to be able to do a FT search against and since SP3 on SQL 7, MS has
> "changed" the rules.
> So, our current solution is probably going to be to jam all 3 fields into
a
> single "superfield" (probably in some sort of XML format for future use.)
> (so something like <xml><title>Tale of Two Cities</title><Author>Charles
> Dickens</author><body>It was the best of times...</body></xml>) (XML
format
> has not been decided upon).
> Anyway, we also want to be able to have HTML tags within the text.
> Now, several questions:
> My understanding is SQL 2000 gives us things like:
> Change tracking
> Scalability beyond 1 million rows (a serious problem right now, as we add
> about 1-2 million rows a year.)
> What else?
> My understanding is the word breaker dll's depend partly on the OS? That
> Windows 2003 is "better" in some ways?
> Is there one that can give us good "breaking" on HTML (we are currently
> stripping it out which causes other issues) and allows decent FORMSOF and
> INFLECTIONAL support?
> (for example run would find runs, running, ran, etc.)
> Is it work upgrading to Windows 2003? (I'm not sure I can justify it
unless
> the features really help).
> Thanks.
>
> --
> --
> Greg D. Moore
> President Green Mountain Software
> Personal: http://stratton.greenms.com
>

No comments:

Post a Comment