1994 - An introduction to genetic algorithms and to their use in information retrieval - Online and CDROM Review
- G. Jones, A. M. Robertson, P. Willett
2568
AGGLOMERATIVE 25
AUTOMATIC 12000
BIBLIOGRAPHIC 5000
BINARY 89

The input file is likely to be large, say 10,000 terms. The signature representation will reduce this to, say, 100 bits. In this case, the data structure taken as input to the genetic algorithm is a partition array of length 99. Each location in the array is an integer representing the number of indexing terms in that particular partition. The following might be a section of the array structure:

126 251 93 150 12 34 400

This data structure represents a situation where there are 126 indexing terms in the first partition, 251 indexing terms in the second partition, and so on. Our genetic algorithm is used to produce and modify structures like this until the optimal set of partitions is obtained.

First, the signature length M (the number of partitions of the index file) should be decided. The total frequency, Ft, of words in the index file can then be divided by the signature length M to give the ideal frequency of each partition, Fs; that is, Fs = Ft / M.
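The partition array and the ideal frequency Fs = Ft / M can be illustrated with a minimal sketch. It makes several assumptions that are not in the text: the index listing is read as alphabetically ordered (term, frequency) pairs, the partition array holds one entry per partition with sizes summing to the number of terms, and the function names `ideal_partition_frequency` and `partition_frequencies` are hypothetical. The four-term index and the two-partition array are toy values chosen only to keep the arithmetic visible.

```python
# Toy index file: (term, frequency) pairs in alphabetical order.
# Assumption: the listing in the text is read as term followed by frequency.
term_freqs = [
    ("AGGLOMERATIVE", 25),
    ("AUTOMATIC", 12000),
    ("BIBLIOGRAPHIC", 5000),
    ("BINARY", 89),
]


def ideal_partition_frequency(freqs, signature_length):
    """Return (Ft, Fs): the total frequency and the ideal frequency Ft / M."""
    total = sum(f for _, f in freqs)
    return total, total / signature_length


def partition_frequencies(freqs, partition_sizes):
    """Sum the term frequencies that fall in each consecutive partition.

    partition_sizes[i] is the number of indexing terms in partition i
    (assumed here to cover every term exactly once).
    """
    if sum(partition_sizes) != len(freqs):
        raise ValueError("partition sizes must cover every term exactly once")
    totals = []
    start = 0
    for size in partition_sizes:
        totals.append(sum(f for _, f in freqs[start:start + size]))
        start += size
    return totals


# Hypothetical candidate: M = 2 partitions of two terms each.
partition_array = [2, 2]
Ft, Fs = ideal_partition_frequency(term_freqs, len(partition_array))
actual = partition_frequencies(term_freqs, partition_array)
```

With these toy values Ft is 17114 and Fs is 8557.0, while the candidate partitions hold frequencies of 12025 and 5089. Comparing each partition's actual frequency with Fs is one plausible way to judge how far a candidate array is from the ideal; the measure the genetic algorithm actually uses is not specified in this passage.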