Chennai Mathematical Institute


12.00 - 1.00 p.m.
Transforming Transaction Click Streams into Web Accessibility Models

I.V. Ramakrishnan
SUNY, Stony Brook, U.S.A.


Screen readers, the dominant assistive technology used by visually impaired users to access the Web, function by speaking out the content of the screen serially. Using screen readers for conducting online transactions can cause considerable information overload, because transactions such as shopping and paying bills typically involve a number of steps spanning several web pages. One can combat this overload with a transaction model for web accessibility that presents only the fragments of web pages that are needed for doing transactions. Such a model can be realized by coupling an automaton, encoding the states of a transaction, with concept classifiers that identify page fragments "relevant" to a particular state of the transaction.
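As a rough illustration of this coupling, the following minimal Python sketch pairs a transition table (the automaton) with a toy classifier. Everything in it is an assumption made for illustration: the names `TransactionModel`, `classify`, and `CONCEPT_KEYWORDS`, the state and concept labels, and the keyword matching, which stands in for the learned concept classifiers described in the abstract.

```python
from dataclasses import dataclass

# Hypothetical keyword lists standing in for learned concept classifiers.
CONCEPT_KEYWORDS = {
    "search": ["search", "find"],
    "add_to_cart": ["add to cart", "buy now"],
    "checkout": ["checkout", "place order"],
}


def classify(segment):
    """Return the concept a page segment appears to express, or None."""
    text = segment.lower()
    for concept, keywords in CONCEPT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return concept
    return None


@dataclass
class TransactionModel:
    """Automaton over transaction states; edges are labeled with concepts."""
    transitions: dict  # state -> {concept: next_state}
    state: str = "start"

    def relevant_segments(self, segments):
        # Keep only segments whose concept labels an edge out of the current state.
        expected = self.transitions.get(self.state, {})
        return [s for s in segments if classify(s) in expected]

    def step(self, concept):
        # Advance the automaton when the user acts on a concept.
        self.state = self.transitions[self.state][concept]
        return self.state


model = TransactionModel(transitions={
    "start": {"search": "results"},
    "results": {"add_to_cart": "cart"},
    "cart": {"checkout": "done"},
})
segments = [
    "Search for products",
    "Sign up for our newsletter",
    "Add to cart: USB cable",
    "Proceed to checkout",
]
first_view = model.relevant_segments(segments)
model.step("search")
second_view = model.relevant_segments(segments)
```

In this sketch a screen reader would be handed only `first_view` in the initial state and only `second_view` after the search step, rather than every segment on the page.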

This talk will present a fully automated process that synergistically combines several techniques for transforming click stream data generated by transactions into a transaction model. These techniques include web content analysis to partition a web page into segments consisting of semantically related content elements, contextual analysis of data surrounding clickable objects in a page, and machine learning methods, such as clustering of page segments based on contextual analysis, statistical classification, and automata learning.
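To make the last step of this pipeline concrete, here is a deliberately simplified Python sketch of deriving a transition table from click streams. It is not the automata learning method of the talk: it assumes each click has already been mapped to a concept label (for example, by the clustering and classification steps), and it names each state after the concept most recently clicked. The function name `learn_transitions` and the sample labels are hypothetical.

```python
from collections import defaultdict


def learn_transitions(click_streams):
    """Build a state -> {concept: next_state} table from concept sequences.

    Simplifying assumption: the state reached after a click is named
    after the clicked concept, so each stream traces a path from "start".
    """
    transitions = defaultdict(dict)
    for stream in click_streams:
        state = "start"
        for concept in stream:
            transitions[state][concept] = concept
            state = concept
    return dict(transitions)


# Hypothetical click streams, already reduced to concept labels.
streams = [
    ["search", "add_to_cart", "checkout"],
    ["search", "checkout"],
]
table = learn_transitions(streams)
```

Both sample streams share the `search` step, so the resulting table has a single `search` state with two outgoing edges, one for each continuation observed in the data.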

A unique aspect of the transformation process is that the click streams, which serve as the training data for the learning methods, need not be labeled. More generally, it operates with partially labeled click stream data where some or all of the labels could be missing. Not having to rely exclusively on (manually) labeled click stream data has significant benefits: (i) visually impaired users do not have to depend on sighted users for the training data needed to construct transaction models; (ii) it is possible to mine personalized models from transaction click streams associated with sites that visually impaired users visit regularly; (iii) since partially labeled data is relatively easy to obtain, it is feasible to scale up the construction of domain-specific transaction models (e.g., separate models for shopping, airline reservations, bill payments, etc.); (iv) it is also feasible to adjust the performance of deployed models over time with new training data.

Preliminary experimental evidence suggests that both domain-specific and personalized accessibility transaction models built using the transformation process are effective in practice. Finally, this process is also applicable to building transaction models for mobile devices with limited-size displays, as well as to creating wrappers for information extraction from web sites.

(Joint work with Jalal Mahmud, Yevgen Borodin, and C.R. Ramakrishnan)