Efficient Data Deduplication in Hadoop