Syntax of MINT

From HypKNOWsys

Syntax of the Mining Language MINT V1.2

A MINT query SELECTs a template containing a list of variables and wildcards. The variables must be bound to values that satisfy the predicates in the whereClause.

query ::= 'SELECT' selectList fromClause [whereClause]

Query structure: MINT V1.2 only supports the output of the template specified in the fromClause without further manipulation.

selectList ::= templateVar

Specifying the query template: A template consists of node variables and wildcards.

fromClause ::= 'FROM' nodeRef ',' templateRef
nodeRef ::= 'NODE' 'AS' nodeVar*
templateRef ::= 'TEMPLATE' template 'AS' templateVar
template ::= ['#'] [wildcards] (nodeVar [wildcards])*

The special symbol # denotes the beginning of a sequence. So the template a * b is different from # a * b: with the first template, the event bound to variable a may appear anywhere in a sequence, whereas with the second template it must be the first event of the sequence. The wildcards in MINT V1.2 can be annotated with constraints:

wildcards ::= '*' | '['low_boundary';'high_boundary']'

The wildcard * stands for zero or more events if it appears by itself. If a constraint interval is used as a wildcard instead, the number of events it matches must lie within this interval. Note that the interval is closed at both ends, so both boundary values are themselves permissible. Infinity is a legal high boundary value. All elements of a template must be separated from each other by a blank space, e.g. # [0;1] a * b.
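The matching rules for #, * and interval wildcards can be illustrated with a small Python sketch. It is not part of WUM: the function names, the representation of a sequence as a list of url strings, and the choice to translate a template into a regular expression are all assumptions made for illustration. The text does not say how infinity is written, so only numeric boundaries are handled, and the sketch only checks whether a template occurs in a sequence.

```python
import re

ANY_EVENT = r"(?:\S+ )"  # one event followed by its separating blank


def template_to_regex(template, bindings):
    """Translate a template such as '# [0;1] a * b' into a compiled regex.

    bindings maps each node variable to the url it is bound to
    (hypothetical representation, urls must not contain blanks).
    """
    tokens = template.split()
    anchored = bool(tokens) and tokens[0] == "#"
    if anchored:
        tokens = tokens[1:]
    parts = []
    for token in tokens:
        if token == "*":
            # zero or more arbitrary events
            parts.append(ANY_EVENT + "*")
        elif token.startswith("["):
            # closed interval [low;high] of arbitrary events
            low, high = token.strip("[]").split(";")
            parts.append(f"{ANY_EVENT}{{{int(low)},{int(high)}}}")
        else:
            # node variable: must match the url bound to it
            parts.append(re.escape(bindings[token]) + " ")
    # '#' pins the match to the sequence start; otherwise any event boundary
    start = "^" if anchored else r"(?:^|(?<= ))"
    return re.compile(start + "".join(parts))


def matches(template, bindings, sequence):
    """Return True if the template occurs in the sequence of urls."""
    text = "".join(url + " " for url in sequence)
    return template_to_regex(template, bindings).search(text) is not None
```

With the hypothetical session home.html, X.html, p.html, Y.html and a bound to X.html and b to Y.html, the template a * b matches, # a * b does not (the session does not start with X.html), and # [0;1] a * b matches again because the interval absorbs home.html.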

Introducing predicates: This is SQL-like syntax, but the queries are applied to groups of sequences.

whereClause ::= 'WHERE' condition ('AND' condition)*
condition ::= stringCondition | numericCondition
stringCondition ::= columnReference stringCompOp stringLiteral
columnReference ::= nodeVar '.' columnName
stringCompOp ::= '=' | '!=' | '>' | '>=' | '<' | '<=' |
                 'contains' | 'startswith' | 'endswith' |
                 '!contains' | '!startswith' | '!endswith'
numericCondition ::= numericReference numericCompOp numericLiteral
numericReference ::= columnReference |
                     '( ' columnReference 
                          numericOperator
                          columnReference ' )'
numericCompOp ::= '=' | '!=' | '>' | '>=' | '<' | '<='
numericOperator ::= '+' | '-' | '*' | '/'

A numeric expression must involve at most two column references and must be linear. The columns currently supported are:

  • url: Identifier of an event. However, a url may appear more than once in a sequence, since a user may revisit some web pages. An event in a sequence is uniquely identified by the url and the occurrence (below).
  • occurrence: Occurrence of the event in the sequence. This number increases if an event occurs more than once in a sequence.
  • support: The number of sequences where the event appears, in the context of events bound to node variables and preceding it in a sequence.
  • accesses: Total number of sequences where the url occurs, independently of occurrence number and preceding events.
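As an illustration of the occurrence and accesses columns, the following Python sketch derives both from a list of sessions. The function names and the representation of a session as a list of url strings are assumptions, and the counting follows the definitions in the list above rather than WUM's actual implementation.

```python
from collections import Counter


def annotate_occurrences(session):
    """Pair each url with its occurrence number within one session."""
    seen = Counter()
    annotated = []
    for url in session:
        seen[url] += 1  # a revisit of the same url increases the occurrence
        annotated.append((url, seen[url]))
    return annotated


def accesses(sessions):
    """Count, per url, the sessions in which the url occurs at least once."""
    counts = Counter()
    for session in sessions:
        counts.update(set(session))  # each session counts once per url
    return counts
```

For a hypothetical session a.html, b.html, a.html, the second visit to a.html is annotated as (a.html, 2), which is what distinguishes it from the first visit.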

Column names and string operators must be typed in lower case letters.
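How the conditions of a whereClause might be evaluated for one binding of the node variables is sketched below in Python. The names, the dictionary layout of a binding, and the lexicographic meaning of > and < on strings are assumptions; the operator tables simply mirror the grammar above.

```python
import operator

COMPARE = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}
STRING_OPS = {
    **COMPARE,  # assumed to compare strings lexicographically
    "contains": lambda value, literal: literal in value,
    "startswith": str.startswith,
    "endswith": str.endswith,
    "!contains": lambda value, literal: literal not in value,
    "!startswith": lambda value, literal: not value.startswith(literal),
    "!endswith": lambda value, literal: not value.endswith(literal),
}
ARITHMETIC = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv,
}


def column(bindings, reference):
    """Resolve a columnReference such as 'a.url' against the bindings."""
    node_var, column_name = reference.split(".")
    return bindings[node_var][column_name]


def string_condition(bindings, reference, comp_op, literal):
    return STRING_OPS[comp_op](column(bindings, reference), literal)


def numeric_condition(bindings, reference, comp_op, literal):
    """reference is 'a.support' or a tuple ('b.support', '/', 'a.support')."""
    if isinstance(reference, tuple):
        left, arith_op, right = reference
        value = ARITHMETIC[arith_op](column(bindings, left), column(bindings, right))
    else:
        value = column(bindings, reference)
    return COMPARE[comp_op](value, literal)
```

A whereClause then holds for a binding when every condition joined by AND evaluates to true.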

Examples

For the following examples of MINT queries, consider this small demo web site:

Image:Syntax_of_Mint_WebSite.gif

After this access log file was imported into WUM, the users' sessions were determined (threshold: 30 min. maximum session duration) and then merged into the following aggregated tree:

Image:Syntax_of_Mint_DemoAggregatedLog.gif
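The aggregation step can be pictured with a short Python sketch. It assumes that sessions sharing a common prefix are merged into one tree branch, that a node is identified by its url and occurrence, and that a node's support counts the sessions passing through it. The page does not spell out these details, so the function and the dictionary layout are illustrative only.

```python
def build_aggregated_tree(sessions):
    """Merge sessions (lists of urls) into a prefix tree with support counts."""
    root = {"support": 0, "children": {}}  # plays the role of '#'
    for session in sessions:
        root["support"] += 1
        node = root
        seen = {}
        for url in session:
            # occurrence distinguishes repeated visits within one session
            seen[url] = seen.get(url, 0) + 1
            key = (url, seen[url])
            child = node["children"].setdefault(
                key, {"support": 0, "children": {}}
            )
            child["support"] += 1  # one more session follows this prefix
            node = child
    return root
```

Under these assumptions, three hypothetical sessions a b, a c and a b a yield a root with support 3, a single child (a, 1) with support 3, and below it the branches (b, 1) with support 2 and (c, 1) with support 1.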

Have a look at the following example MINT queries:

  • Which paths of length between 1 and 5 lead to a node of support more than 5?
select t
from node as a, template # [1;5] a as t
where a.support > 5
Note that a cannot be the first node of the path, because the wildcard [1;5] requires at least one event before it. From the results, you can see that for any value of a, the output aggregated tree comprises more than one path, none of which was traversed more than 5 times.
  • Which paths do visitors use to get from X.html to Y.html?
select t
from node as a b, template a * b as t
where a.url = "X.html"
and b.url = "Y.html"
  • Try templates containing a cycle!
select t
from node as a b, template a * b as t
where ( b.support / a.support ) > 0.3
and b.occurrence = 2

Please make sure that your MINT queries conform to the following syntax rules:

  • Blank spaces between operators, parentheses, etc. are compulsory!
  • All elements of the MINT syntax must be written in lower case letters!
  • The character # stands for the root node of the Aggregated Log.
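Two of these rules lend themselves to a quick automated check before a query is submitted. The Python sketch below is a hypothetical helper, not a WUM feature: it only tests that keywords are in lower case and that parentheses are surrounded by blanks, and it skips string literals because urls such as "X.html" may legitimately contain upper case letters.

```python
import re

KEYWORDS = {"select", "from", "node", "template", "as", "where", "and"}


def check_query(query):
    """Return a list of rule violations found in a MINT query string."""
    problems = []
    # Blank out string literals so their content is not checked.
    stripped = re.sub(r'"[^"]*"', '""', query)
    for word in re.findall(r"[A-Za-z]+", stripped):
        if word.lower() in KEYWORDS and word != word.lower():
            problems.append(f"keyword not in lower case: {word}")
    for match in re.finditer(r"[()]", stripped):
        # Treat the start and end of the query as blanks.
        before = stripped[match.start() - 1] if match.start() > 0 else " "
        after = stripped[match.end()] if match.end() < len(stripped) else " "
        if not (before.isspace() and after.isspace()):
            problems.append(f"missing blank around '{match.group()}'")
    return problems
```

An empty result does not guarantee that the query is valid MINT; it only means that neither of the two checked rules is violated.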