



Volume 49, Issue 5, 1994, Pages 4637-4651

Query construction, entropy, and generalization in neural-network models

EID: 0000782519     PISSN: 1063-651X     EISSN: None     Source Type: Journal
DOI: 10.1103/PhysRevE.49.4637     Document Type: Article
Times cited: 22

References (36)
  • 8
    • H. S. Seung, M. Opper, and H. Sompolinsky, in Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory (COLT '92), Pittsburgh, 1992 (ACM, New York, 1992), pp. 287-294.
  • 20
    • As a notational shorthand, we assume that in all probability distributions in which Θ(p) appears, the number of examples p is held fixed, without writing this explicitly. Thus, for example, P(Θ(p)|V) should strictly be written as P(Θ(p)|V,p); hence, it is normalized to 1 when integrating over all possible training sets of size p. To make this convention consistent with the use of Bayes' theorem as in (\ref{PVthn}), we also make the natural assumption that the number of training examples is independent of the teacher rule that we are trying to learn. Thus, P(p|V) = P(p) and hence P(V|p) = P(V), so that we only need one a priori teacher distribution for all values of p.
  • 21
    • If there is a continuum of teachers V, P(V|Θ(p)) is a probability density which has the dimension of the inverse of V. Strictly speaking, a dimensional normalizing constant is then necessary to make the argument of the logarithm in (\ref{sv}) dimensionless, but we shall not write this explicitly since it cancels from the entropy differences we will be concerned with.
  • 26
    • The divergence as T → 0 of the term (N/2) ln T in the student space entropy (\ref{linpercsnthn}) does not present a problem here, since we will only be concerned with entropy differences for which this term is irrelevant.
  • 28
    • Strictly speaking, Krogh and Hertz \cite{Kroghetal92} consider a Gaussian distribution for the inputs instead of the spherical distribution (\ref{xspher}), but in the limit N → ∞ these produce identical results, as can be checked by a direct calculation of the average eigenvalue spectrum of $M_V$ along the lines of \cite{Kinzeletal91}.
  • 30
    • W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. (Wiley, New York, 1970), Vol. 1.
  • 31
    • For finite but large N, this expression can be estimated to remain valid for values of α much smaller than ln N, from results for the mean waiting time in the "collector's problem" (see, e.g., Ref. \cite{Feller70}); this ensures that the relative decrease [α+O(1)]/4N is always smaller than 1, as it must be.
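
Note 31 invokes the mean waiting time in the "collector's problem": drawing uniformly from N item types, on average about N·H_N ≈ N ln N draws are needed before every type has been seen, which is where a validity range of order ln N comes from. The following Python sketch is purely illustrative (the function names and parameters are ours, not the paper's); it compares a simulated mean waiting time against the exact value N·H_N:

```python
import math
import random

def collector_waiting_time(n, rng):
    """Draw item types uniformly from {0, ..., n-1} until all n have
    been seen; return the number of draws required."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

def mean_waiting_time(n, trials=2000, seed=0):
    """Monte Carlo estimate of the mean waiting time over many runs."""
    rng = random.Random(seed)
    return sum(collector_waiting_time(n, rng) for _ in range(trials)) / trials

n = 50
# Exact mean: n times the n-th harmonic number, ~ n ln n for large n.
exact = n * sum(1.0 / k for k in range(1, n + 1))
est = mean_waiting_time(n)
```

For n = 50 the exact mean is roughly 225 draws, already much larger than n itself, consistent with the logarithmic growth the note relies on.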


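The normalization convention described in note 20 — P(Θ(p)|V) summing to 1 over all training sets of fixed size p, together with P(p|V) = P(p) so that Bayes' theorem needs only one prior over teachers — can be checked explicitly in a small discrete toy model. The sketch below is an illustration only; the single binary input, the noisy binary teacher, the noise rate, and all names are assumptions of ours, not the paper's model:

```python
from itertools import product

# Hypothetical discrete setup: a teacher V ∈ {0, 1} labels a binary input
# x ∈ {0, 1} via y = V XOR x, with label noise rate eta. A training set
# Θ(p) is a sequence of p (x, y) pairs with x drawn uniformly.
eta = 0.1
teachers = [0, 1]
prior = {0: 0.5, 1: 0.5}   # a priori teacher distribution P(V)
p = 3                      # number of examples, held fixed

def p_example(x, y, V):
    """P(x, y | V): uniform input times noisy teacher output."""
    correct = (V ^ x) == y
    return 0.5 * ((1 - eta) if correct else eta)

def p_dataset(theta, V):
    """P(Θ(p) | V) for i.i.d. examples, with p held fixed as in the note."""
    prob = 1.0
    for x, y in theta:
        prob *= p_example(x, y, V)
    return prob

examples = list(product([0, 1], repeat=2))     # all 4 possible (x, y) pairs
datasets = list(product(examples, repeat=p))   # all 4**p training sets of size p

# The likelihood normalizes to 1 over all training sets of size p ...
for V in teachers:
    total = sum(p_dataset(th, V) for th in datasets)

# ... and the Bayes posterior P(V | Θ(p)) normalizes over teachers.
theta = ((0, 0), (1, 1), (0, 0))   # a training set consistent with V = 0
evidence = sum(p_dataset(theta, V) * prior[V] for V in teachers)
posterior = {V: p_dataset(theta, V) * prior[V] / evidence for V in teachers}
```

Because input and label are binary, the 4^p possible training sets can be enumerated exhaustively, so both normalizations are exact here rather than Monte Carlo estimates.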
* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.