-
1
-
-
0004289791
-
-
-
WordNet: An Electronic Lexical Database, edited by C. Fellbaum (MIT Press, Cambridge, MA, 1998).
-
(1998)
WordNet: An Electronic Lexical Database
-
-
-
3
-
-
33745551648
-
-
-
L. van der Plas, in Proceedings of the 4th International Conference on Language Resources and Evaluation, edited by M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, and R. Silva (European Language Resource Association, 2004), p. 2205.
-
(2004)
Proceedings of the 4th International Conference on Language Resources and Evaluation
, pp. 2205
-
-
Van Der Plas, L.1
-
7
-
-
38148999476
-
-
-
G. Palshikar, in Proceedings of the Second International Conference on Pattern Recognition and Machine Intelligence (PReMI07), Vol. 4815 of Lecture Notes in Computer Science, edited by A. Ghosh, R. K. De, and S. K. Pal (Springer-Verlag, Berlin, 2007), p. 503.
-
(2007)
Proceedings of the Second International Conference on Pattern Recognition and Machine Intelligence (PReMI07)
, vol.4815
, pp. 503
-
-
Palshikar, G.1
-
9
-
-
84953744816
-
-
10.1108/eb026526
-
K. Sparck Jones, J. Document. 28, 11 (1972); 10.1108/eb026526
-
(1972)
J. Document.
, vol.28
, pp. 11
-
-
Sparck Jones, K.1
-
10
-
-
0001319911
-
-
-
S. E. Robertson, in Overview of the Third Text REtrieval Conference (TREC-3), NIST SP 500-225, edited by D. K. Harman (National Institute of Standards and Technology, Gaithersburg, MD, 1995), pp. 109-126.
-
(1995)
Overview of the Third Text REtrieval Conference (TREC-3)
, pp. 109-126
-
-
Robertson, S.E.1
-
13
-
-
0016526522
-
-
10.1002/asi.4630260402
-
S. P. Harter, J. Am. Soc. Inf. Sci. 26, 197 (1975). 10.1002/asi.4630260402
-
(1975)
J. Am. Soc. Inf. Sci.
, vol.26
, pp. 197
-
-
Harter, S.P.1
-
16
-
-
34547346865
-
-
10.1103/RevModPhys.53.385
-
T. A. Brody, Rev. Mod. Phys. 53, 385 (1981); 10.1103/RevModPhys.53.385
-
(1981)
Rev. Mod. Phys.
, vol.53
, pp. 385
-
-
Brody, T.A.1
-
18
-
-
0036337516
-
-
10.1209/epl/i2002-00528-3
-
M. Ortuño, Europhys. Lett. 57, 759 (2002). 10.1209/epl/i2002-00528-3
-
(2002)
Europhys. Lett.
, vol.57
, pp. 759
-
-
Ortuño, M.1
-
20
-
-
0141955293
-
-
10.1016/S0378-4371(03)00625-3
-
H. D. Zhou and G. W. Slater, Physica A 329, 309 (2003); 10.1016/S0378-4371(03)00625-3
-
(2003)
Physica A
, vol.329
, pp. 309
-
-
Zhou, H.D.1
Slater, G.W.2
-
22
-
-
0442323012
-
-
10.1239/jap/1067436088
-
V. T. Stefanov, J. Appl. Probab. 40, 881 (2003). 10.1239/jap/1067436088
-
(2003)
J. Appl. Probab.
, vol.40
, pp. 881
-
-
Stefanov, V.T.1
-
24
-
-
63649116442
-
-
http://bioinfo2.ugr.es/TextKeywords
-
-
-
-
26
-
-
34848925328
-
-
-
Now the nearest-neighbor distances are measured in letters, not in words, since we advance letter by letter when exploring the text. Thus we ignore the "reading frame," which would be relevant only in texts made of consecutively placed fixed-length words, such as the words of length 3 (codons) in the coding regions of DNA. In addition, we consider only nonoverlapping occurrences of a word. Although overlapping is not very likely in text without spaces, it can happen in other symbolic sequences, such as DNA, and the expected P(d) differs between overlapping and nonoverlapping randomly placed words [see S. Robin, F. Rodolphe, and S. Schbath, DNA, Words and Models (Cambridge University Press, Cambridge, England, 2005)].
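As an illustration of the counting convention described in this note, the following minimal Python sketch collects nonoverlapping occurrences of a word and the distances, in letters, between consecutive occurrences. The function names, the greedy left-to-right scan, and the choice to measure distance between starting positions are assumptions made for the example; the note itself does not specify them.

```python
def nonoverlapping_positions(text: str, word: str) -> list[int]:
    """Start indices of nonoverlapping occurrences of `word` in `text`.

    Assumption: occurrences are taken greedily from left to right, so the
    search resumes right after the end of each match.
    """
    if not word:
        raise ValueError("word must be non-empty")
    positions = []
    i = text.find(word)
    while i != -1:
        positions.append(i)
        # Skip the whole match so the next occurrence cannot overlap it.
        i = text.find(word, i + len(word))
    return positions


def neighbor_distances(text: str, word: str) -> list[int]:
    """Distances in letters between consecutive nonoverlapping occurrences.

    Assumption: the distance is the difference between start positions.
    """
    pos = nonoverlapping_positions(text, word)
    return [b - a for a, b in zip(pos, pos[1:])]
```

For instance, in the sequence "aaaa" the word "aa" has three overlapping occurrences but only two nonoverlapping ones under this convention.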
-
(2005)
DNA, Words and Models
-
-
Robin, S.1
Rodolphe, F.2
Schbath, S.3
-
28
-
-
63649157557
-
-
note
-
The algorithm proceeds as follows. We choose an initial length l0 and start with a given l0 word for which C > C0, where C0 corresponds to a p value of 0.05. We then find the child (l0+1) word with the highest C value and proceed in the same way in the following generations: we find the successive child (l0+i) word with the highest C value, i = 1, 2, .... This process stops at a previously chosen maximal word length lmax. We then choose as the representative of the lineage the longest word for which C > C0 and define it as the extracted semantic unit. We repeat this algorithm for all the l0 words with C > C0, and we also repeat the whole process for different values of the initial l0. Finally, we remove the remaining redundancies (repeated words or semantic units) that arise from exploring all lineages from different initial l0.
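The lineage search described in this note can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `c_value` is a hypothetical caller-supplied function returning the C value of a word, `c0` is the significance threshold, and a "child" is assumed to be any word found in the text by extending an occurrence of the parent by one letter on the left or on the right. Ties between children are broken alphabetically, which is also an assumption.

```python
from typing import Callable, Iterable


def children_of(text: str, word: str) -> list[str]:
    """Words one letter longer than `word` that occur in `text` around it.

    Assumption: a child extends an occurrence of the parent by one letter
    on either side.
    """
    found = set()
    i = text.find(word)
    while i != -1:
        if i > 0:
            found.add(text[i - 1:i + len(word)])
        if i + len(word) < len(text):
            found.add(text[i:i + len(word) + 1])
        i = text.find(word, i + 1)
    return sorted(found)  # sorted so that tie-breaking is deterministic


def extract_units(text: str, l0_values: Iterable[int], l_max: int,
                  c_value: Callable[[str], float], c0: float) -> set[str]:
    """Follow each significant l0 word down its best-child lineage."""
    units = set()  # a set removes repeats across lineages and l0 values
    for l0 in l0_values:
        seeds = {text[i:i + l0] for i in range(len(text) - l0 + 1)}
        for seed in seeds:
            if c_value(seed) <= c0:
                continue  # only start from words with C > C0
            word = best = seed
            while len(word) < l_max:
                kids = children_of(text, word)
                if not kids:
                    break
                word = max(kids, key=c_value)  # child with highest C
                if c_value(word) > c0:
                    best = word  # longest significant word so far
            units.add(best)
    return units
```

With a toy score such as the raw occurrence count, `extract_units("abcabcxyz", [2], 3, "abcabcxyz".count, 1)` follows the lineages of "ab" and "bc" and keeps the single unit "abc". Collecting the results in a set covers only exact repeats; any further redundancy criteria would need to be added separately.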
-
-
-