



Volume 79, Issue 3, 2009, Pages

Level statistics of words: Finding keywords in literary texts and symbolic sequences

Author keywords

[No Author keywords available]

Indexed keywords

A-PRIORI; DISORDERED SYSTEMS; LEVEL STATISTICS; LITERARY TEXTS; SPATIAL DISTRIBUTIONS; SYMBOLIC SEQUENCES;

EID: 63649110040     PISSN: 15393755     EISSN: 15502376     Source Type: Journal    
DOI: 10.1103/PhysRevE.79.035102     Document Type: Article
Times cited : (77)

References (28)
  • 1
    • 0004289791
    • WordNet: An Electronic Lexical Database, edited by C. Fellbaum (MIT Press, Cambridge, MA, 1998).
    • (1998) WordNet: An Electronic Lexical Database
  • 4
    • 38949105955
    • 10.1371/journal.pcbi.0040020
    • K. Cohen and L. Hunter, PLOS Comput. Biol. 4, e20 (2008). 10.1371/journal.pcbi.0040020
    • (2008) PLOS Comput. Biol. , vol.4 , pp. 20
    • Cohen, K.1    Hunter, L.2
  • 9
  • 10
    • 0001319911
    • S. E. Robertson, in Overview of the Third Text REtrieval Conference (TREC-3), NIST SP 500-225, edited by D. K. Harman (National Institute of Standards and Technology, Gaithersburg, MD, 1995), pp. 109-126.
    • (1995) Overview of the Third Text REtrieval Conference (TREC-3) , pp. 109-126
    • Robertson, S.E.1
  • 13
  • 16
    • 34547346865
    • 10.1103/RevModPhys.53.385
    • T. A. Brody, Rev. Mod. Phys. 53, 385 (1981); 10.1103/RevModPhys.53.385
    • (1981) Rev. Mod. Phys. , vol.53 , pp. 385
    • Brody, T.A.1
  • 18
    • 0036337516
    • 10.1209/epl/i2002-00528-3
    • M. Ortuño, Europhys. Lett. 57, 759 (2002). 10.1209/epl/i2002-00528-3
    • (2002) Europhys. Lett. , vol.57 , pp. 759
    • Ortuño, M.1
  • 20
    • 0141955293
    • 10.1016/S0378-4371(03)00625-3
    • H. D. Zhou and G. W. Slater, Physica A 329, 309 (2003); 10.1016/S0378-4371(03)00625-3
    • (2003) Physica A , vol.329 , pp. 309
    • Zhou, H.D.1    Slater, G.W.2
  • 22
    • 0442323012
    • 10.1239/jap/1067436088
    • V. T. Stefanov, J. Appl. Probab. 40, 881 (2003). 10.1239/jap/1067436088
    • (2003) J. Appl. Probab. , vol.40 , pp. 881
    • Stefanov, V.T.1
  • 24
    • 63649116442
    • http://bioinfo2.ugr.es/TextKeywords
  • 26
    • 34848925328
    • Now, the nearest-neighbor distances are measured in letters, not in words, since we advance letter by letter exploring the text. Thus we ignore the "reading frame," which would be relevant only in texts with fixed-length words consecutively placed, such as the words of length 3 (codons) in the coding regions of DNA. In addition, we consider only nonoverlapping occurrences of a word. Although overlapping is not very likely in text without spaces, it can happen in other symbolic sequences, such as DNA, and the expected P(d) is different for overlapping or nonoverlapping randomly placed words; see S. Robin, F. Rodolphe, and S. Schbath, DNA, Words and Models (Cambridge University Press, Cambridge, England, 2005).
    • (2005) DNA, Words and Models
    • Robin, S.1    Rodolphe, F.2    Schbath, S.3
  • 28
    • 63649157557
    • note
    • The algorithm proceeds as follows: we consider an initial length l0 and start with a given l0-word for which C > C0, where C0 corresponds to a p-value of 0.05. We then find the child (l0+1)-word with the highest C value and proceed in the same way in the following generations: we find the successive child (l0+i)-word with the highest C value, i = 1, 2, .... This process stops at a previously chosen maximal word length lmax. Finally, we choose as the representative of the lineage the longest word for which C > C0 and define it as the extracted semantic unit. We repeat this algorithm for all the l0-words with C > C0, and we also repeat the whole process with different initial l0 values. Finally, we remove the remaining redundancies (repeated words or semantic units) that arise from exploring all lineages from different initial l0.
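
The procedures described in notes 26 and 28 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The clustering measure C is not defined in this record, so it is passed in as a hypothetical `score` callable. Treating a "child" word as the parent extended by one letter on either side is likewise an assumption, and all function names are invented for illustration.

```python
from typing import Callable, Iterable, List, Optional, Set

# Hypothetical signature for the clustering measure C: score(text, word) -> float.
Score = Callable[[str, str], float]


def nonoverlapping_positions(text: str, word: str) -> List[int]:
    """Start indices of non-overlapping occurrences of `word` in `text`."""
    if not word:
        raise ValueError("word must be non-empty")
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        # Jump past the whole match so the next hit cannot overlap this one.
        start = text.find(word, start + len(word))
    return positions


def letter_distances(positions: List[int]) -> List[int]:
    """Distances d between consecutive occurrences, measured in letters."""
    return [b - a for a, b in zip(positions, positions[1:])]


def child_words(text: str, word: str) -> Set[str]:
    """Assumed child definition: `word` extended by one letter on either side."""
    children = set()
    n = len(word)
    start = text.find(word)
    while start != -1:
        if start > 0:
            children.add(text[start - 1:start + n])
        if start + n < len(text):
            children.add(text[start:start + n + 1])
        start = text.find(word, start + 1)
    return children


def lineage_representative(text: str, word: str, score: Score,
                           c0: float, l_max: int) -> Optional[str]:
    """Follow the highest-scoring child each generation up to length l_max,
    and return the longest word in the lineage whose score exceeds c0."""
    best = word if score(text, word) > c0 else None
    current = word
    while len(current) < l_max:
        children = child_words(text, current)
        if not children:
            break
        # sorted() makes tie-breaking deterministic.
        current = max(sorted(children), key=lambda w: score(text, w))
        if score(text, current) > c0:
            best = current
    return best


def extract_units(text: str, score: Score, c0: float,
                  l0_values: Iterable[int], l_max: int) -> Set[str]:
    """Run the lineage search from every seed word above threshold, for each
    initial length l0; the set removes repeated units across lineages."""
    units = set()
    for l0 in l0_values:
        seeds = {text[i:i + l0] for i in range(len(text) - l0 + 1)}
        for seed in sorted(seeds):
            if score(text, seed) > c0:
                rep = lineage_representative(text, seed, score, c0, l_max)
                if rep is not None:
                    units.add(rep)
    return units
```

To try the sketch, any callable of the form `score(text, word)` can stand in for C. For example, a simple count of non-overlapping occurrences serves as a toy placeholder, although it is not the clustering measure used in the paper. The threshold `c0` would in practice be the value of C corresponding to the chosen p-value.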


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.