- Deepak Sharma | Sep 20.2018 | Senior Data Scientist
Product graphs can help Indian MSMEs become better at managing their stock items and derive insights for their business. Here are the nuts and bolts of building product graphs from Clustr’s viewpoint.
At the recent Anthill Inside 2018 conference held in July, I was intrigued by the discussion around ‘Building knowledgeable machines using Knowledge Graphs’. The talk by Prof. Partha Talukdar, who heads the lab at Indian Institute of Science, Bangalore, had a number of relevant insights that can be applicable in the real world.
‘Knowledge graph’ is a graphical data structure where the nodes are entities which have attributes and edges which are link between the entities. Usually various types of links exist between entities. For example, in a knowledge graph ‘Nestle’ node owns brand ‘Maggi’ node. Having a knowledge graph provides various applications. It helps in Natural Language Understanding (NLU), Mapping, Reasoning, Question Answering, Search etc.
Various types of knowledge graphs have been made popular. The more common applications are seen in Chatbots and Voice assistants which typically use a knowledge graph in the background. Few of the most popular knowledge graphs include Google Knowledge Vault, LinkedIn Knowledge Graph, Microsoft Concept Graph etc.
Building knowledge graphs:
Building a knowledge graph is essentially extracting entities, their attributes and values and analyzing linkages between entities. Typically, knowledge graphs can be built for a specific domain area using the corpus data. The research on building knowledge graphs have been deepened due to the efforts of tech leaders such as Google and Facebook. A lot of theoretical work on this has been done by Tom Mitchell’s lab NELL which showcases a continuous learning system (Never Ending Language Learning) to read the web.
At Clustr, we can use the same concepts to build a Product Knowledge Graph for our different tasks. We could build this using SKU titles and other product meta data available from different data sources.
For example, in the figure above ‘Maggi’ is an entity in the product group, its attributes are that it is a brand, its company is Nestle and so on. It has a Product Category relationship to ‘Category: Noodles’ entity.
Extracting entities and relationships from data:
Extracting entities from data requires entity recognition models also called NER models. A simple NER model could be to extract all nouns from the SKU title. More complex entity recognition models can be build using tagged data. Typical techniques used to build these models include CRF, LSTM etc.
The above labeling provides a training model which can help us extract labels. How do we extract relationships? In traditional relationship extraction models, the structure of the language is used. However, in our case, building the model is not possible since we are working with product titles which rarely have language structure. An case in point is the product title we have seen earlier ‘Maggi Masala Noodle 100gm’. This title has very little structure. If we want to extract relationship between Maggi and Noodle, we do not have any other information except co-location.
This can be done in the following ways:
Validating the product graph:
The validation will happen through implicit and explicit methods:
Benefits of using Product Graph:
Product graph will help us better understand raw text (curation) and user queries (search). We can then better extract entities because product graph will improve entity linking and co-reference resolution.
For example, “Apple 10kg” will be resolved as Apple fruit, not Apple brand because the product graph will be able to see that 10kg has a connection with Apple Fruit Category and not Apple Brand or Company.
Similarly search experience can be improved by using product graph to provide context to the search and making it more relevant. This way the search becomes more entity-based and contextual.
The product graph can also be used for auto completion. A variety of learning from graph algorithms can be used for inference. Bayesian network based models are some of the more popular ones for inference on graphs. This is a complex problem with lot of research happening on this. We can explore these concepts in more detail in future articles.
At Clustr, we could use the product graph primarily to improve resolution and mapping services. This will also help in improving aggregation as well as search and select service. Resolution services will help convert the raw product titles of the customer to structured entities from our catalogue. Having structured entities will help customers avail various GET and AGGREGATE services. For example, the above Maggi Masala Noodle 100gm SKU will get mapped to its attributes and item in our catalogue. Clustr can then help customers with updates in tax rates, information on Nestle, average Noodle price in the area, popular variants of the SKU, number of this SKU selling in the region, forecast of the SKU and how much should customer stock etc.
Similarly, the product graph can help improve search services, allowing customers to run their day to day operations more efficiently.
So far, the MSME sector in India has not been able to take advantage of the advances in Big Data Analytics. At Clustr, we aim to do this in a scalable and highly cost-effective manner for it to be consumable by the FMCG sector. For this, our approach is to first map their stock items to our catalogue. Then, with our technical expertise and deeper understanding of their buying and selling behavior, provide the MSMEs with highly tailored recommendations.
Product graphs stand to help MSMEs become better at managing their stock items and derive insights for their business. They can improve mapping of customer items to Clustr entities. And improved mapping will help us improve our recommendations and aggregations and will help us reach more customers.