Diffusion of context and credit information in Markovian models

Cited by: 17
Authors
Bengio, Y. [1]
Frasconi, P. [1]
Affiliation
[1] Univ Florence, Dipartimento Sistemi & Informat, I-50139 Florence, Italy
DOI
10.1613/jair.233
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes learning to represent long-term context in sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden state representation that captures long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
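The diffusion effect described in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the two 2-state transition matrices and the `row_spread` measure below are assumptions chosen for illustration. The idea is that when the rows of the t-step transition matrix become identical, the state distribution at time t no longer depends on the starting state, so context has been lost.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matrix_power(a, t):
    """Return a**t for a square matrix a and an integer t >= 0."""
    n = len(a)
    # Start from the identity matrix and multiply by a, t times.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, a)
    return result


def row_spread(a):
    """Largest gap, over all columns, between the entries of any two rows.

    For a t-step transition matrix, a value of 0 means the distribution
    at time t is the same for every starting state.
    """
    return max(max(col) - min(col) for col in zip(*a))


# Hypothetical row-stochastic matrices (each row sums to 1).
diffuse = [[0.6, 0.4], [0.4, 0.6]]                  # probabilities far from 0 and 1
near_deterministic = [[0.99, 0.01], [0.01, 0.99]]   # probabilities close to 0 and 1

for t in (1, 5, 10, 50):
    print(t,
          row_spread(matrix_power(diffuse, t)),
          row_spread(matrix_power(near_deterministic, t)))
```

For symmetric two-state matrices like these, with off-diagonal probability p, the spread after t steps equals (1 - 2p)^t. The first matrix therefore forgets its starting state within a few steps, while the near-deterministic one retains most of that information over the same horizon, which is consistent with the abstract's claim about transition probabilities approaching 0 or 1.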
Pages: 249-270
Page count: 22