SCOPUS Information Retrieval Platform

Physical Review E - Statistical, Nonlinear, and Soft Matter Physics

Volume 79, Issue 3, 2009

Level statistics of words: Finding keywords in literary texts and symbolic sequences

Authors (5): Carpena, P. (a); Bernaola-Galvan, P. (a); Hackenberg, M. (b); Coronado, A. V. (a); Oliver, J. L. (c)

a UNIVERSITY OF MÁLAGA (Spain)

b CIC BIOGUNE (Spain)

c UNIVERSITY OF GRANADA (Spain)

Author keywords

[No Author keywords available]

Indexed keywords

A-PRIORI; DISORDERED SYSTEMS; LEVEL STATISTICS; LITERARY TEXTS; SPATIAL DISTRIBUTIONS; SYMBOLIC SEQUENCES;

EID: 63649110040 PISSN: 15393755 EISSN: 15502376 Source Type: Journal
DOI: 10.1103/PhysRevE.79.035102 Document Type: Article

Times cited : (77)

References (28)

1
- 0004289791
- WordNet: An Electronic Lexical Database, edited by C. Fellbaum (MIT Press, Cambridge, MA, 1998);
- (1998) WordNet: An Electronic Lexical Database

2
- 0002936192
- E. Frank, Proceedings of the 16th International Joint Conference on Artificial Intelligence, 1999, p. 668;
- (1999) Proceedings of the 16th International Joint Conference on Artificial Intelligence , pp. 668
- Frank, E.¹

3
- 33745551648
- L. van der Plas, in Proceedings of the 4th International Conference on Language Resources and Evaluation, edited by M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, and R. Silva, European Language Resource Association, 2004, p. 2205;
- (2004) Proceedings of the 4th International Conference on Language Resources and Evaluation , pp. 2205
- Van Der Plas, L.¹

4
- 38949105955
- 10.1371/journal.pcbi.0040020
- K. Cohen and L. Hunter, PLOS Comput. Biol. 4, e20 (2008). 10.1371/journal.pcbi.0040020
- (2008) PLOS Comput. Biol. , vol.4 , pp. 20
- Cohen, K.¹ Hunter, L.²

5
- 0000880768
- H. P. Luhn, IBM J. Res. Dev. 2, 157 (1958).
- (1958) IBM J. Res. Dev. , vol.2 , pp. 157
- Luhn, H.P.¹

6
- 33746044621
- 10.1142/S0218213004001466
- Y. Matsuo and M. Ishizuka, Int. J. Artif. Intell. 13, 157 (2004); 10.1142/S0218213004001466
- (2004) Int. J. Artif. Intell. , vol.13 , pp. 157
- Matsuo, Y.¹ Ishizuka, M.²

7
- 38148999476
- G. Palshikar, in Proceedings of the Second International Conference on Pattern Recognition and Machine Intelligence (PReMI07), Vol. 4815 of Lecture Notes on Computer Science, edited by A. Ghosh, R. K. De, and S. K. Pal (Springer-Verlag, Berlin, 2007), p. 503.
- (2007) Proceedings of the Second International Conference on Pattern Recognition and Machine Intelligence (PReMI07) , vol.4815 , pp. 503
- Palshikar, G.¹

8
- 0003653039
- McGraw-Hill, New York
- G. Salton and M. J. McGill, Introduction to Modern Information Retrieval (McGraw-Hill, New York, 1983);
- (1983) Introduction to Modern Information Retrieval
- Salton, G.¹ McGill, M.J.²

9
- 84953744816
- 10.1108/eb026526
- K. Sparck Jones, J. Document. 28, 11 (1972); 10.1108/eb026526
- (1972) J. Document. , vol.28 , pp. 11
- Sparck Jones, K.¹

10
- 0001319911
- S. E. Robertson, in Overview of the Third Text REtrieval Conference (TREC-3), NIST SP 500-225, edited by D. K. Harman (National Institute of Standards and Technology, Gaithersburg, MD, 1995), pp. 109-126.
- (1995) Overview of the Third Text REtrieval Conference (TREC-3) , pp. 109-126
- Robertson, S.E.¹

11
- 0016102974
- 10.1002/asi.4630250505
- A. Bookstein and D. R. Swanson, J. Am. Soc. Inf. Sci. 25, 312 (1974); 10.1002/asi.4630250505
- (1974) J. Am. Soc. Inf. Sci. , vol.25 , pp. 312
- Bookstein, A.¹ Swanson, D.R.²

12
- 0016428956
- 10.1002/asi.4630260107
- A. Bookstein and D. R. Swanson, J. Am. Soc. Inf. Sci. 26, 45 (1975). 10.1002/asi.4630260107
- (1975) J. Am. Soc. Inf. Sci. , vol.26 , pp. 45
- Bookstein, A.¹ Swanson, D.R.²

13
- 0016526522
- 10.1002/asi.4630260402
- S. P. Harter, J. Am. Soc. Inf. Sci. 26, 197 (1975). 10.1002/asi.4630260402
- (1975) J. Am. Soc. Inf. Sci. , vol.26 , pp. 197
- Harter, S.P.¹

14
- 84902975388
- A. Berger and J. Lafferty, Proc. ACM SIGIR99, 1999, p. 222;
- (1999) Proc. ACM SIGIR99 , pp. 222
- Berger, A.¹ Lafferty, J.²

15
- 0032268440
- J. Ponte and W. Croft, Proc. ACM SIGIR98, 1998, p. 275.
- (1998) Proc. ACM SIGIR98 , pp. 275
- Ponte, J.¹ Croft, W.²

16
- 34547346865
- 10.1103/RevModPhys.53.385
- T. A. Brody, Rev. Mod. Phys. 53, 385 (1981); 10.1103/RevModPhys.53.385
- (1981) Rev. Mod. Phys. , vol.53 , pp. 385
- Brody, T.A.¹

17
- 0004163930
- Academic Press, New York
- M. L. Mehta, Random Matrices (Academic Press, New York, 1991).
- (1991) Random Matrices
- Mehta, M.L.¹

18
- 0036337516
- 10.1209/epl/i2002-00528-3
- M. Ortuño, Europhys. Lett. 57, 759 (2002). 10.1209/epl/i2002-00528-3
- (2002) Europhys. Lett. , vol.57 , pp. 759
- Ortuño, M.¹

19
- 19744382393
- 10.1103/PhysRevLett.93.176804
- P. Carpena, P. Bernaola-Galvan, and P. C. Ivanov, Phys. Rev. Lett. 93, 176804 (2004). 10.1103/PhysRevLett.93.176804
- (2004) Phys. Rev. Lett. , vol.93 , pp. 176804
- Carpena, P.¹ Bernaola-Galvan, P.² Ivanov, P.C.³

20
- 0141955293
- 10.1016/S0378-4371(03)00625-3
- H. D. Zhou and G. W. Slater, Physica A 329, 309 (2003); 10.1016/S0378-4371(03)00625-3
- (2003) Physica A , vol.329 , pp. 309
- Zhou, H.D.¹ Slater, G.W.²

21
- 20044376021
- 10.1142/S021947750300104X
- M. J. Berryman, A. Allison, and D. Abbott, Fluct. Noise Lett. 3, L1 (2003). 10.1142/S021947750300104X
- (2003) Fluct. Noise Lett. , vol.3 , pp. 1
- Berryman, M.J.¹ Allison, A.² Abbott, D.³

22
- 0442323012
- 10.1239/jap/1067436088
- V. T. Stefanov, J. Appl. Probab. 40, 881 (2003). 10.1239/jap/1067436088
- (2003) J. Appl. Probab. , vol.40 , pp. 881
- Stefanov, V.T.¹

23
- 0000180851
- 10.1023/A:1014633825822
- S. Robin and J. J. Daudin, Ann. Inst. Stat. Math. 53, 895 (2001). 10.1023/A:1014633825822
- (2001) Ann. Inst. Stat. Math. , vol.53 , pp. 895
- Robin, S.¹ Daudin, J.J.²

24
- 63649116442
- http://bioinfo2.ugr.es/TextKeywords

25
- 0034730161
- 10.1073/pnas.180265397
- H. J. Bussemaker, H. Li, and E. D. Siggia, Proc. Natl. Acad. Sci. U.S.A. 97, 10096 (2000). 10.1073/pnas.180265397
- (2000) Proc. Natl. Acad. Sci. U.S.A. , vol.97 , pp. 10096
- Bussemaker, H.J.¹ Li, H.² Siggia, E.D.³

26
- 34848925328
- Now, the nearest neighbors distances are measured in letters, not in words, since we advance letter by letter exploring the text. Thus we ignore the "reading frame," which would be relevant only in texts with fixed-length words consecutively placed, as the words of length 3 (codons) in the coding regions of DNA. In addition, we consider only nonoverlapping occurrences of a word. Although the overlapping is not very likely in text without spaces, it can happen in other symbolic sequences, such as DNA, and the expected P(d) is different for overlapping or nonoverlapping randomly placed words (see S. Robin, F. Rodolphe, and S. Schbath, DNA, Words and Models (Cambridge University Press, Cambridge, England, 2005)).
- (2005) DNA, Words and Models
- Robin, S.¹ Rodolphe, F.² Schbath, S.³

27
- 0004116989
- The MIT Press, Cambridge, MA
- T. H. Cormen, Introduction to Algorithms (The MIT Press, Cambridge, MA, 1990), pp. 485-488.
- (1990) Introduction to Algorithms , pp. 485-488
- Cormen, T.H.¹

28
- 63649157557
- note
- The algorithm proceeds as follows: we consider an initial l0, and we start with a given l0 word for which C > C0, where C0 corresponds to a p value = 0.05. We then find the child (l0+1) word with the highest C value, and proceed in the same way in the following generations: we find the successive child (l0+i) word with the highest C value, i = 1, 2, .... This process stops at a previously chosen maximal word length lmax. Finally, we choose as the representative of the lineage the longest word for which C > C0 and define it as the extracted semantic unit. We repeat this algorithm for all the l0 words with C > C0, and we also repeat the whole process with different initial l0 values. Finally, we remove the remaining redundancies (repeated words or semantic units) that arise from exploring all lineages from different initial l0.
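
The lineage procedure described in note 28 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not define the C statistic or what counts as a "child" word, so `score` is a hypothetical caller-supplied stand-in for C, and `children` assumes a child is any substring of the text that extends the word by one letter on the left or on the right. The names `lineage_representative` and `extract_units` are likewise illustrative.

```python
from typing import Callable, Iterable, Optional, Set


def children(text: str, word: str) -> Set[str]:
    """Substrings of `text` extending `word` by one letter on either side.

    Assumed definition of a "child" word; the source does not spell it out.
    """
    found = set()
    n = len(word)
    start = text.find(word)
    while start != -1:
        if start > 0:
            found.add(text[start - 1:start + n])   # extend to the left
        if start + n < len(text):
            found.add(text[start:start + n + 1])   # extend to the right
        start = text.find(word, start + 1)
    return found


def lineage_representative(text: str, seed: str,
                           score: Callable[[str], float],
                           c0: float, l_max: int) -> Optional[str]:
    """Follow the highest-scoring child at each generation up to length l_max.

    Returns the longest word in the lineage with score > c0, or None if the
    seed itself does not exceed c0 (lineages start only from such seeds).
    """
    if score(seed) <= c0:
        return None
    best = seed
    current = seed
    while len(current) < l_max:
        kids = children(text, current)
        if not kids:
            break
        # sorted() makes tie-breaking deterministic (first in lexical order).
        current = max(sorted(kids), key=score)
        if score(current) > c0:
            best = current
    return best


def extract_units(text: str, l0_values: Iterable[int],
                  score: Callable[[str], float],
                  c0: float, l_max: int) -> Set[str]:
    """Run every lineage for every initial length l0; a set drops repeats."""
    units = set()
    for l0 in l0_values:
        seeds = {text[i:i + l0] for i in range(len(text) - l0 + 1)}
        for seed in sorted(seeds):
            rep = lineage_representative(text, seed, score, c0, l_max)
            if rep is not None:
                units.add(rep)
    return units
```

Collecting the representatives in a set is one simple way to remove the redundancies that arise when lineages started from different l0 values converge on the same word. A real `score` would have to implement the paper's C measure and its threshold C0, neither of which is given in this record.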

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.