SCOPUS Information Search Platform

Volume 49, Issue 5, 1994, Pages 4637-4651

Query construction, entropy, and generalization in neural-network models

Author keywords

[No Author keywords available]

EID: 0000782519
PISSN: 1063651X
EISSN: None
Source Type: Journal
DOI: 10.1103/PhysRevE.49.4637
Document Type: Article

Times cited: 22

References (36)

5
- 84956214804
- Improving a Network Generalization Ability by Selecting Examples
- (1990) Europhysics Letters (EPL), vol. 13, p. 473
- Kinzel, W.; Rujan, P.

7
- 0000695404
- Information-Based Objective Functions for Active Data Selection
- (1992) Neural Computation, vol. 4, p. 590
- MacKay, D.J.C.

8
- 84926920569
- H. S. Seung, M. Opper, and H. Sompolinsky, in Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory (COLT '92), Pittsburgh, 1992 (ACM, New York, 1992), pp. 287-294.

9
- 0011904207
- Edited by S. J. Hanson, J. D. Cowan, and C. L. Giles; Morgan Kaufmann, San Mateo, CA
- (1993) Advances in Neural Information Processing Systems 5, pp. 483-490
- Freund, Y.; Seung, H.S.; Shamir, E.; Tishby, N.

10
- 36149029748
- (1992) J. Phys. A, vol. 25, p. 6243
- Kinouchi, O.; Caticha, N.

11
- 0004264698
- Academic Press, New York
- (1972) Theory of Optimal Experiments
- Fedorov, V.V.

13
- 0000953829
- Developments in the Design of Experiments, Correspondent Paper
- (1982) International Statistical Review / Revue Internationale de Statistique, vol. 50, p. 161
- Atkinson, A.C.

14
- 32044452528
- (1985) Biometrika, vol. 72, p. 545
- Ford, I.; Titterington, D.M.; Wu, C.F.J.

15
- 0000540568
- Robust Bayes and Empirical Bayes Analysis with ε-Contaminated Priors
- (1986) The Annals of Statistics, vol. 14, p. 461
- Berger, J.; Berliner, L.M.

17
- 21144461159
- Nonlinear Experiments: Optimal Design and Inference Based on Likelihood
- (1993) Journal of the American Statistical Association, vol. 88, p. 538
- Chaudhuri, P.; Mykland, P.A.

20
- 84926920568
- As a notational shorthand, we assume that in all probability distributions in which Θ(p) appears, the number of examples p is held fixed, without writing this explicitly. Thus, for example, P(Θ(p)|V) should strictly be written as P(Θ(p)|V,p); hence, it is normalized to 1 when integrating over all possible training sets of size p. To make this convention consistent with the use of Bayes' theorem as in Eq. (\ref{PVthn}), we also make the natural assumption that the number of training examples is independent of the teacher rule that we are trying to learn. Thus, P(p|V) = P(p) and hence P(V|p) = P(V), so that we only need one a priori teacher distribution for all values of p.
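The step from P(p|V) = P(p) to P(V|p) = P(V) in the note above follows from Bayes' theorem. A short worked version is sketched below; it uses only the symbols of the note, and writing the training set as $\Theta^{(p)}$ is a typesetting assumption.

```latex
\begin{align}
P(V \mid p) &= \frac{P(p \mid V)\, P(V)}{P(p)}
             = \frac{P(p)\, P(V)}{P(p)} = P(V), \\
P(V \mid \Theta^{(p)}, p) &= \frac{P(\Theta^{(p)} \mid V, p)\, P(V \mid p)}{P(\Theta^{(p)} \mid p)}
             = \frac{P(\Theta^{(p)} \mid V, p)\, P(V)}{P(\Theta^{(p)} \mid p)} .
\end{align}
```

The second line shows why a single prior P(V) suffices for every p once the independence assumption is made.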

21
- 84926920567
- If there is a continuum of teachers V, P(V|Θ(p)) is a probability density which has the dimension of the inverse of V. Strictly speaking, a dimensional normalizing constant is then necessary to make the argument of the logarithm in Eq. (\ref{sv}) dimensionless, but we shall not write this explicitly since it cancels from the entropy differences we will be concerned with.

22
- 0000729504
- Statistical Theory of Learning Curves under Entropic Loss Criterion
- (1993) Neural Computation, vol. 5, p. 140
- Amari, S.; Murata, N.

23
- 0025508916
- (1990) Proc. IEEE, vol. 78, p. 1568
- Levin, E.; Tishby, N.; Solla, S.A.

24
- 84956110061
- Optimal Learning with a Neural Network
- (1993) Europhysics Letters (EPL), vol. 21, p. 871
- Watkin, T.L.H.

25
- 21344475869
- (1993) J. Phys. A, vol. 26, p. 5767
- Dunmur, A.P.; Wallace, D.J.

26
- 84926920566
- The divergence as T → 0 of the term (N/2) ln T in the student space entropy, Eq. (\ref{linpercsnthn}), does not present a problem here, since we will only be concerned with entropy differences, for which this term is irrelevant.

28
- 84926920565
- Strictly speaking, Krogh and Hertz \cite{Kroghetal92} consider a Gaussian distribution for the inputs instead of the spherical distribution, Eq. (\ref{xspher}), but in the limit N → ∞ these produce identical results, as can be checked by a direct calculation of the average eigenvalue spectrum of $M_V$ along the lines of Ref. \cite{Kinzeletal91}.

29
- 0004004381
- Edited by E. Domany, J. L. van Hemmen, and K. Schulten; Springer, Berlin
- (1991) Models of Neural Networks, pp. 149-171
- Kinzel, W.; Opper, M.

30
- 84926882822
- W. Feller, Introduction to Probability Theory and Its Applications, 3rd ed. (Wiley, New York, 1970), Vol. 1.

31
- 84926901547
- For finite but large N, this expression can be estimated to be valid for values of α much smaller than ln N, from results for the mean waiting time in the "collector's problem" (see, e.g., Ref. \cite{Feller70}); this ensures that the relative decrease [α+O(1)]/4N is always smaller than 1, as it has to be.
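The ln N scale in the note above matches the classical collector's problem, in which the mean number of uniform draws needed to see all N distinct items is N·H_N ≈ N(ln N + 0.5772). A minimal sketch of that waiting time follows. The function names are hypothetical, and reading α as the number of examples divided by N is an assumption that this record does not state.

```python
import math


def expected_collector_draws(n: int) -> float:
    """Mean number of uniform draws needed to see all n distinct items: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def alpha_full_coverage(n: int) -> float:
    """The same waiting time per item, draws / n = H_n (assumed to play the role of alpha)."""
    return expected_collector_draws(n) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Compare the exact harmonic number with the asymptotic form ln n + 0.5772.
        print(n, alpha_full_coverage(n), math.log(n) + 0.5772)
```

Under that reading, full coverage is reached on average only at α of order ln N, which is consistent with the note restricting the expression to α much smaller than ln N.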

32
- 0001317036
- Uncertainty, Information, and Sequential Experiments
- (1962) The Annals of Mathematical Statistics, vol. 33, p. 404
- DeGroot, M.H.

34
- 0001041635
- Learning algorithms and probability distributions in feed-forward and feed-back networks
- (1987) Proceedings of the National Academy of Sciences, vol. 84, p. 8429
- Hopfield, J.J.

36
- 84926920563
- Edited by F. Fogelman Soulié and J. Hérault; Springer, Berlin
- (1989) Neurocomputing: Algorithms, Architectures and Applications, pp. 227-236
- Bridle, J.S.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.