Hashtag in a specific category

Put a word in a specific zone.

Nikhil Desai
4 min readApr 16, 2021

We are much familiar using word2vec library for word vectors. For Natural Language Processing (NLP), it’s more useful to use dictionaries that define concepts in terms of their usage statistics and Word2Vec models are helpful to create this kind of dictionaries.But,the problem with that is Word Part,for example a bird is in a dictionary, then model does not identify the individual usage of it and that’s the drawbacks of Word2vec models.

Word vectors built from these methods map words to points in space that effectively encode semantic and syntactic meaning despite ignoring word order information.

google Image

After-all these problems have been identified and suggested that we should allow word vectors to grow arbitrarily, so that we can do a better job of modelling complicated concepts. That is the reason why Sense2Vec comes into the picture today.

sense2vec working by leveraging supervised NLP labels instead of
unsupervised clusters to determine a particular word instance’s sense. This eliminates the need to train embedding multiple times, eliminates the need for a clustering step, and creates an efficient method by which a supervised classifier may consume the appropriate word-sense embedding.

Graphical representation of sense2vec

Given a labeled corpus with one or more labels per word, the sense2vec model first counts the number of uses of each word and generates a random ”sense embedding” for each use. A model is then trained using either the CBOW (Continuous-Bag-of-Word), Skip-gram, or Structured Skip-gram model configurations. This model predicts a word sense given surrounding senses.

Sense2vec is a wonderful library to twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. You can find it more on this link. The Idea is simple if a bird is in two different situations then take two entries of it as a named entities. Additionally, we merge named entities and base noun phrases into single tokens, so that they receive a single vector.

Also,Sense2vec library can be used as a stand alone or we can use it with Spacy pipeline component with convenient custom attributes.

Now, lets talk about the project,which I did using this wonderful library. Before we moved further lets talk about the basic requirements which we need to download before jump into it.So, download one of the vectors of Sense2vec from here. And extract vector file from it and store them in a pickle file as a Source file, put their values in a variable.

Now, with the help of K-means Clustering method, which gives you the number of clusters you want to make as a input. Found out distances of particular vector to all different vector’s centroids, and grabs first 5 smallest distances of that vector. Then,separate out all distances to that particular vector and store its clusters number and its distances into a pickle file as Destination file.

The vectors are listed out with their clusters number and weights according to centroids of different clusters. As we discuss earlier that we’ll take only top 5 nearest in distance.

Conclusion:

In this work, we have proposed a new model for word sense disambiguation that uses supervised NLP labeling to disambiguate between word senses. However, instead of using unsupervised clustering methods, our approach clusters using supervised labels which can analyze a specific word’s context and assign a label. This significantly reduces the computational overhead of word-sense modeling and provides a natural mechanism for other NLP tasks to select the appropriate sense embedding.

Watch a video to get familiar about hashtag category classification via APIs:

APIs responses through postman

Happy Learning…!!!

--

--