(If you have taken the courses and want to submit answers for evaluation, please read the instructions linked on the table of contents page. Most of the questions below have straightforward answers in the material of the corresponding courses, although a few require some further study that still builds on the course material.)
- Briefly describe how each word is weighted in document comparison.
- Once the weights for all the words in the documents have been computed, briefly describe how the actual comparison is done.
- Zipf's law describes the distribution of word frequencies, and Benford's law describes the distribution of leading digits in the numbers that appear in documents. Both are commonly grouped with the so-called power-law distributions. Look up Zipf's law and Benford's law online. Why do you think words and numbers in documents show such patterns?
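
For the last question, it may help to measure both patterns on a text of your own before speculating about the cause. The following is only a minimal sketch, not part of the course material: the function names `benford_expected`, `leading_digit_counts`, and `word_rank_frequency` are hypothetical, and the tokenization (lowercase runs of letters for words, plain digit runs for numbers) is a simplifying assumption. It uses the standard statement of Benford's law, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9, and produces the rank-ordered word counts that Zipf's law predicts should fall off roughly as 1/rank.

```python
import math
import re
from collections import Counter


def benford_expected():
    """Expected probability of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_counts(text):
    """Count the first nonzero digit of every number found in the text.

    Assumption: a "number" is a run of digits with an optional decimal part.
    Numbers with no nonzero digit (such as "0") are skipped.
    """
    counts = Counter()
    for token in re.findall(r"\d+(?:\.\d+)?", text):
        for ch in token:
            if ch in "123456789":
                counts[int(ch)] += 1
                break
    return counts


def word_rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent.

    Assumption: words are lowercase runs of letters and apostrophes.
    Under Zipf's law, count is roughly proportional to 1 / rank.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()
```

To use it, read any reasonably long document into a string, pass it to `word_rank_frequency` and `leading_digit_counts`, and compare the observed counts against a 1/rank curve and against `benford_expected()` respectively. Short texts will be noisy, so the patterns become visible only with enough words and numbers.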