A study on semantic similarity and its application to clustering