
The TACTweb Query Language
The TACTweb query language is very similar to the one used in TACT's UseBase program. If
you already know the UseBase query language, you already know how to write queries in TACTweb.
The Basics
1. Word forms can be entered directly. To enter more than one form in a query, separate them with
commas:
  - moon, sun (displays occurrences of the words "moon" and "sun")
  - "when (to enter a word that is also a reserved word of the query language, such as "when", precede it with a single unmatched ")
2. Use the regular expression notation ".*" as a wildcard that matches any sequence of characters:
  - abs.* (displays words beginning with "abs")
  - .*ent (displays words ending with "ent")
  - c.*ons (displays words beginning with "c" and ending with "ons")
  - .*ite.* (displays words containing "ite")
  Regular expressions are described in more detail on a separate help page.
3. Use the | operator to request a phrase:
  - my | lov.* (displays phrases beginning with the word "my" and followed immediately by any word starting with "lov")
  - my | (displays all words that immediately follow the word "my")
  Some variations on phrase selection are described below.
4. Request co-occurrence patterns with the operators "&" or "~":
  - moon & star.* (displays all occurrences of "moon" that occur near a word beginning with "star")
  - moon ~ star.* (displays all occurrences of "moon" that do not occur near a "star" word)
  The co-occurrence syntax is described in more detail on a separate help page.
5. Any of the above selection tools can be followed by a "when" refinement:
  - moon.* ; when speaker = bottom (selects only those words beginning with "moon" that are spoken by the character "Bottom")
  Both the semicolon and the word "WHEN" are required. The "when" refinement is described in more
  detail on a separate help page.
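The basic forms can be mimicked on a toy text to make their meaning concrete. The Python sketch below is not TACTweb's implementation: the sample text, its speaker tags, the function names, and the default context of 5 words for "&" and "~" (borrowed from the SPAN example further on) are all assumptions. Patterns are matched against whole word forms, which is why "abs.*" means "begins with abs".

```python
import re

# Hypothetical tagged text: (word, speaker) pairs standing in for an indexed TACT text.
TEXT = [
    ("the", "theseus"), ("moon", "theseus"), ("and", "theseus"),
    ("the", "theseus"), ("stars", "theseus"), ("rose", "theseus"),
    ("over", "bottom"), ("the", "bottom"), ("hill", "bottom"),
    ("where", "bottom"), ("moonshine", "bottom"), ("fell", "bottom"),
]
WORDS = [word for word, _ in TEXT]


def select(pattern):
    """Positions whose word form matches the whole pattern, e.g. "moon.*"."""
    rx = re.compile(pattern)
    return [i for i, word in enumerate(WORDS) if rx.fullmatch(word)]


def near(first, second, context=5):
    """first & second: positions of `first` with a `second` word within `context` words."""
    others = select(second)
    return [i for i in select(first)
            if any(j != i and abs(i - j) <= context for j in others)]


def not_near(first, second, context=5):
    """first ~ second: positions of `first` with no `second` word within `context` words."""
    close = set(near(first, second, context))
    return [i for i in select(first) if i not in close]


def when_speaker(positions, speaker):
    """; when speaker = X : keep only the positions spoken by X (assumed lowercase tags)."""
    return [i for i in positions if TEXT[i][1] == speaker]


print(select("moon.*"))                          # [1, 10]
print(near("moon.*", "star.*"))                  # [1]
print(not_near("moon.*", "star.*"))              # [10]
print(when_speaker(select("moon.*"), "bottom"))  # [10]
```

Here "moon" (position 1) lies 3 words from "stars", while "moonshine" (position 10) lies 6 words away, so the two co-occurrence operators split the "moon.*" hits between them.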
The TACTweb Query in Detail
Every TACTweb selection query starts with an initial selection,
called a source. Some queries also contain refinements (filters)
that modify the initial selection. The general form is:
<sources> ; <refinements>
The refinement section is optional but, if present, must be
preceded by a semicolon.
Selection Sources
The source can consist of several selection items separated
by commas:
source 1 , source 2 , ...
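The overall shape of a query can be illustrated by splitting it into its parts. This is a minimal sketch, not TACTweb's parser: the function name `parse_query` is hypothetical, and the naive splitting assumes that no ";" or "," appears inside a regular expression.

```python
def parse_query(query):
    """Split a query of the form <sources> ; <refinements> into its parts."""
    # Naive split: assumes no ";" or "," occurs inside a regular expression.
    source_part, *refinement_parts = query.split(";")
    sources = [s.strip() for s in source_part.split(",") if s.strip()]
    refinements = [r.strip() for r in refinement_parts if r.strip()]
    return sources, refinements


print(parse_query("moon, sun"))                       # (['moon', 'sun'], [])
print(parse_query("moon.* ; when speaker = bottom"))  # (['moon.*'], ['when speaker = bottom'])
print(parse_query("freq 1; REGEXP a.*"))              # (['freq 1'], ['REGEXP a.*'])
```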
The following are legal sources:
- Vocabulary Words: Any vocabulary word can be given. If
the word is a reserved word of the TACTweb query language, precede
it with a single unmatched double quote ("), e.g. "when.
- Regular Expressions: Regular expressions select word
forms based on letter patterns within the word. How to formulate
regular expressions is described on a separate help page.
- Frequency: The FREQ selection criterion can be used as a
source or as a refinement to select words according to their
frequency of occurrence. It is described in more detail on a
separate help page.
- SIMIL: The SIMIL selector selects word types based on
their similarity to a given pattern, e.g.: "SIMIL love
70%". This selects words with letter sequences that are more
than 70% similar to "love".
- Phrase Selection: Any of the above selection tools can
be combined into phrases by joining them with the "|"
operator, e.g. "my | lov.*" chooses phrases beginning with
the word "my" followed immediately by any word starting with
"lov". The ">" character selects a position other than the
first word in the phrase: e.g. "my | > lov.*" saves the "lov"
words instead. The ">" character may also stand by itself
before or after the phrase, or between its elements, where it
matches any word: "> | my | lov.*" chooses every word
immediately before the phrase; "my | > | lov.*" chooses all
phrases beginning with the word "my", followed by any word, and
then by a "lov" word. The middle, unspecified word is saved in
the resulting list.
- Co-occurrence Selection: Any of the above selection
tools can be combined into expressions using the co-occurrence
operators "&" and "~". The co-occurrence syntax is described on a
separate help page.
- Exclusion: The "-" operator can be used:
<source 1> - <source 2> produces a list of items that are in
<source 1> but are not in <source 2>.
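Phrase selection with ">" and the "-" operator can be pictured as operations on word positions. The Python sketch below is an illustration under assumptions, not TACTweb's code: `select_phrase` and `exclude` are hypothetical names, the first word is saved when no ">" is given (as the description above implies), a bare ">" matches any single word, and exclusion is modeled as an order-preserving difference of position lists.

```python
import re


def select_phrase(tokens, query):
    """Return the position of the saved word in each phrase match."""
    elements = [part.strip() for part in query.split("|")]
    saved = 0  # with no ">", the first word of the phrase is saved
    patterns = []
    for k, element in enumerate(elements):
        if element.startswith(">"):
            saved = k
            element = element[1:].strip()
        patterns.append(re.compile(element or ".*"))  # a bare ">" matches any word
    hits = []
    for start in range(len(tokens) - len(patterns) + 1):
        if all(p.fullmatch(tokens[start + k]) for k, p in enumerate(patterns)):
            hits.append(start + saved)
    return hits


def exclude(first, second):
    """<source 1> - <source 2>: items of `first` that are not in `second`."""
    removed = set(second)
    return [item for item in first if item not in removed]


TOKENS = "oh my love and my dear love".split()
print(select_phrase(TOKENS, "my | lov.*"))      # [1]  ("my" in "my love")
print(select_phrase(TOKENS, "my | > lov.*"))    # [2]  ("love")
print(select_phrase(TOKENS, "> | my | lov.*"))  # [0]  ("oh")
print(select_phrase(TOKENS, "my | > | lov.*"))  # [5]  ("dear")
print(exclude(select_phrase(TOKENS, "my"), select_phrase(TOKENS, "my | lov.*")))  # [4]
```

The last line reads as "my - my | lov.*": occurrences of "my" that are not immediately followed by a "lov" word.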
Selection Refinements
A refinement can follow any source or source group and begins with
a semicolon. More than one refinement can be given; separate them
with semicolons:
<sources> ; <refinement> ; <refinement> ...
There are several refinement operators to choose from. Each begins
with a name, and some are followed by further parameters.
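The refinement operators listed below can be pictured as filters applied left to right, each working on the previous result. This Python sketch is only an analogy: `freq`, `regexp`, `simil` and `refine` are hypothetical stand-ins, FREQ n is assumed to mean "occurs exactly n times" (matching the "freq 1" example), and SIMIL uses Python's `difflib.SequenceMatcher.ratio`, which is NOT TACT's own similarity measure, so real percentages may differ.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

TOKENS = "an ant and an apple and a love lover lane live".split()
COUNTS = Counter(TOKENS)
TYPES = sorted(COUNTS)  # all word types, alphabetically


def freq(n):
    """Stand-in for FREQ n: keep word types that occur exactly n times."""
    return lambda types: [t for t in types if COUNTS[t] == n]


def regexp(pattern):
    """Stand-in for REGEXP: keep word types that the whole pattern matches."""
    rx = re.compile(pattern)
    return lambda types: [t for t in types if rx.fullmatch(t)]


def simil(target, percent):
    """Stand-in for SIMIL: difflib's ratio, not TACT's similarity measure."""
    return lambda types: [t for t in types
                          if SequenceMatcher(None, t, target).ratio() * 100 > percent]


def refine(items, *refinements):
    """Apply each refinement in turn to the result of the previous one."""
    for step in refinements:
        items = step(items)
    return items


print(refine(TYPES, freq(1), regexp("a.*")))        # ['a', 'ant', 'apple']
print(refine(TYPES, regexp("l.*"), simil("love", 75)))  # ['love', 'lover']
```

Under this stand-in measure "live" scores exactly 75% against "love" and is therefore dropped, since SIMIL asks for more than the given percentage.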
- FREQ: The FREQ operator can also be used as a refinement.
It is described in more detail on a separate help page.
- POS: The POS operator takes no further parameters. It
transforms each word-type entry it receives into a list of the
positions where that word occurs, e.g. "earth; pos".
- REGEXP: Regular expressions can be used as a
refinement, e.g. "freq 1; REGEXP a.*" chooses all word
forms occurring once that start with the letter "a". Regular
expression syntax is described on a separate help page.
- WHEN: Selects positions based on structural information
in the text, such as the speaker. It is described in more detail
on a separate help page.
- SIMIL: The SIMIL operator (described above) can also
act as a refinement, e.g.: "l.*; simil love 75%" will
select all words beginning with "l" that are more than 75% similar
to "love".
- SPAN: The SPAN operator produces a list of all word tokens
"near" the positions it is given ("near" is defined by the context
associated with the query). If the configured context were 5 words
on either side, for example, the selection "love; span" would
select all words that occur within 5 words of "love" in the text.
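POS and SPAN both turn one list into another: POS maps word types to positions, and SPAN widens positions into their surrounding context. The Python sketch below illustrates that under assumptions: `pos` and `span` are hypothetical names, the context size is passed in directly rather than read from a TACTweb configuration, and the input positions themselves are kept in the SPAN result (the description above does not say whether TACTweb keeps them).

```python
def pos(tokens, word_types):
    """POS: replace word types with the positions where they occur."""
    wanted = set(word_types)
    return [i for i, word in enumerate(tokens) if word in wanted]


def span(tokens, positions, context=5):
    """SPAN: all positions within `context` words on either side of an input position.

    Assumption: the input positions themselves are included in the result.
    """
    result = set()
    for p in positions:
        result.update(range(max(0, p - context), min(len(tokens), p + context + 1)))
    return sorted(result)


tokens = "i love thee and love is blind".split()
love_positions = pos(tokens, ["love"])               # [1, 4]
near_love = span(tokens, love_positions, context=1)  # [0, 1, 2, 3, 4, 5]
print([tokens[i] for i in near_love])                # ['i', 'love', 'thee', 'and', 'love', 'is']
```

With the default context of 5 words, every position in this short text would be selected.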

Web design: Alex Stevens; content: Geoffrey Rockwell or John Bradley. March 7, 1997