Lexical substitution
Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the following text: "After the match, replace any remaining fluid deficit to prevent chronic dehydration throughout the tournament", the word game might be given as a substitute for match.
Lexical substitution is closely related to word sense disambiguation (WSD), in that both aim to determine the meaning of a word in context. However, while WSD consists of automatically assigning the appropriate sense from a fixed sense inventory, lexical substitution does not impose any constraint on which substitute to choose as the best representative for the word in context. By not prescribing the inventory, lexical substitution overcomes the issue of the granularity of sense distinctions and provides a level playing field for systems that acquire word senses automatically (a task referred to as word sense induction).
Evaluation
To evaluate automatic systems on lexical substitution, a task was organized at the SemEval-2007 evaluation competition, held in Prague in 2007. A SemEval-2010 task on cross-lingual lexical substitution has also taken place.
Skip-gram model
The skip-gram model maps each word in a vocabulary to a vector in an N-dimensional vector space (a space whose elements can be added together and multiplied by numbers), in such a way that words with similar meanings lie close to each other. The vectors are learned by training a shallow neural network to predict the words that surround a given word in text; the input and output layers of the network have one unit per vocabulary word.[1] The model has been used in lexical substitution automation and prediction algorithms. One such algorithm, developed by Oren Melamud, Omer Levy, and Ido Dagan, uses the skip-gram model to find a vector for each word and for its candidate substitutes. It then calculates the cosine similarity between vectors to determine which words are the best substitutes.[2]
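The idea of ranking candidate substitutes by comparing word vectors can be illustrated with a minimal sketch. The three-dimensional vectors below are hypothetical values chosen by hand, not trained skip-gram embeddings, and the function names are illustrative. The sketch shows only a plain cosine comparison between a target word and its candidates; it is not the full method of Melamud, Levy, and Dagan.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumption: hand-picked, not learned).
toy_embeddings = {
    "match":   [0.9, 0.1, 0.3],
    "game":    [0.8, 0.2, 0.4],
    "contest": [0.7, 0.0, 0.5],
    "lighter": [0.1, 0.9, 0.0],
}

def rank_substitutes(target, candidates, embeddings):
    """Sort candidates by cosine similarity to the target, best first."""
    scored = [
        (candidate, cosine_similarity(embeddings[target], embeddings[candidate]))
        for candidate in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_substitutes("match", ["game", "contest", "lighter"], toy_embeddings)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

With these toy values, game ranks above contest and lighter, because its vector points in nearly the same direction as the vector for match. With real embeddings the ranking would depend on the training corpus.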
Example
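In a sentence like "The dog walked at a quick pace", each word is first represented by a one-hot vector whose length equals the size of the vocabulary (here, seven words). The vector for "The" would be [1,0,0,0,0,0,0]: the 1 marks the position of "The" in the vocabulary, and the 0s mark the positions of the other six words. The model is then trained to predict the surrounding words, such as "dog", from this input.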
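The vector in the example above can be reproduced with a short sketch. It builds the vocabulary from the example sentence, encodes a word as a vector with a single 1, and lists the (center word, context word) pairs that a skip-gram model would be trained on. The function names and the window size of 1 are assumptions made for illustration.

```python
sentence = "The dog walked at a quick pace"
tokens = sentence.split()

# Vocabulary in order of first appearance (all seven words are distinct here).
vocabulary = list(dict.fromkeys(tokens))

def one_hot(word, vocabulary):
    """Vector with a 1 at the word's vocabulary position and 0 elsewhere."""
    return [1 if entry == word else 0 for entry in vocabulary]

def skipgram_pairs(tokens, window=1):
    """(center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(one_hot("The", vocabulary))   # [1, 0, 0, 0, 0, 0, 0]
print(skipgram_pairs(tokens)[:3])   # [('The', 'dog'), ('dog', 'The'), ('dog', 'walked')]
```

A larger window would pair each word with more of its neighbours; the value used in practice is a training choice and is not specified in the text above.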
Bibliography
- D. McCarthy, R. Navigli. The English Lexical Substitution Task. Language Resources and Evaluation, 43(2), Springer, 2009, pp. 139–159.
- D. McCarthy, R. Navigli. SemEval-2007 Task 10: English Lexical Substitution Task. Proc. of Semeval-2007 Workshop (SEMEVAL), in the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Prague, Czech Republic, 23–24 June 2007, pp. 48–53.
- D. McCarthy. Lexical substitution as a task for WSD evaluation. In Proceedings of the ACL workshop on word sense disambiguation: Recent successes and future directions, Philadelphia, USA, 2002, pp. 109–115.
- R. Navigli. Word Sense Disambiguation: A Survey, ACM Computing Surveys, 41(2), 2009, pp. 1–69.
References
- Barazza, Leonardo (3 April 2017). "How does Word2Vec's Skip-Gram work?". Becoming Human.
- Melamud, Oren; Levy, Omer; Dagan, Ido (5 June 2015). "A Simple Word Embedding Model for Lexical Substitution". Proceedings of NAACL-HLT 201: 1–7. doi:10.3115/v1/W15-1501. S2CID 2897037. Retrieved 16 April 2018.