Thesaurus-Guided Text Analytics Technique for Capability-Based Classification of Manufacturing Suppliers

Ramin Sabbagh

Engineering Informatics Lab,
Texas State University,
San Marcos, TX 78666
e-mail: r_s343@txstate.edu

Farhad Ameri

Engineering Informatics Lab,
Texas State University,
San Marcos, TX 78666
e-mail: ameri@txstate.edu

Reid Yoder

Engineering Informatics Lab,
Texas State University,
San Marcos, TX 78666
e-mail: rjy15@txstate.edu

Contributed by the Computers and Information Division of ASME for publication in the JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING. Manuscript received October 9, 2017; final manuscript received March 5, 2018; published online June 12, 2018. Assoc. Editor: Jitesh H. Panchal.

J. Comput. Inf. Sci. Eng 18(3), 031009 (Jun 12, 2018) (14 pages) Paper No: JCISE-17-1217; doi: 10.1115/1.4039553 History: Received October 09, 2017; Revised March 05, 2018

Manufacturing capability (MC) analysis is a necessary step in the early stages of supply chain formation. In the contract manufacturing industry, companies often advertise their capabilities and services in an unstructured format on the company website. The unstructured capability data usually portray a realistic view of the services a supplier can offer. If parsed and analyzed properly, unstructured capability data can be used effectively for initial screening and characterization of manufacturing suppliers, especially when dealing with a large pool of suppliers. This work proposes a novel framework for capability-based supplier classification that relies on the unstructured capability narratives available on the suppliers' websites. Four document classification algorithms, namely, support vector machine (SVM), Naïve Bayes, random forest, and K-nearest neighbor (KNN), are used as the text classification techniques. One of the innovative aspects of this work is the incorporation of a thesaurus-guided method for feature selection and tokenization of capability data. The thesaurus contains the formal and informal vocabulary used in the contract machining industry for advertising manufacturing capabilities. A web-based tool is developed for generating the concept vector model associated with each capability narrative and for extracting features from the input documents. The proposed supplier classification framework is validated experimentally by forming two capability classes, namely, heavy component machining and difficult and complex machining, based on real capability data. It was concluded that the thesaurus-guided method improves the precision of the classification process.
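To make the idea of a thesaurus-guided concept vector more concrete, the following is a minimal sketch in Python. It is not the authors' web-based tool: the tiny thesaurus, its labels, and the function name `build_concept_vector` are hypothetical and chosen only for illustration. The sketch assumes each concept has a preferred label and a few alternative labels, and that a capability narrative is reduced to the frequency of each concept it mentions.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: preferred concept label -> surface forms
# (preferred and alternative labels). All entries are illustrative only.
THESAURUS = {
    "swiss machining": ["swiss machining", "swiss turning", "swiss-type machining"],
    "cnc milling": ["cnc milling", "cnc mill"],
    "titanium": ["titanium", "ti alloy"],
}


def build_concept_vector(text, thesaurus=THESAURUS):
    """Count how often each thesaurus concept occurs in a capability narrative."""
    # Lowercase and collapse whitespace so labels match regardless of layout.
    normalized = re.sub(r"\s+", " ", text.lower())

    # Map every surface form back to its preferred concept.
    label_to_concept = {
        label: concept
        for concept, labels in thesaurus.items()
        for label in labels
    }

    # Try longer labels first so multi-word terms win over shorter ones.
    labels = sorted(label_to_concept, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(label) for label in labels) + r")(?!\w)"
    )

    counts = Counter(
        label_to_concept[match.group(0)] for match in pattern.finditer(normalized)
    )
    return dict(counts)


narrative = (
    "We offer Swiss turning and CNC milling of titanium. "
    "Our Swiss machining cells run 24/7."
)
print(build_concept_vector(narrative))
```

Because alternative labels such as "Swiss turning" collapse onto one concept, the resulting vector has one dimension per concept rather than one per word. Vectors of this kind could then serve as the features passed to a classifier such as an SVM.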

Copyright © 2018 by ASME
Fig. 1

The SKOS concept diagram for Swiss machining process

Fig. 2

Document frequency of some of the concepts (categorized based on schema) in an intermediate stage of thesaurus development (before deleting less frequent concepts)

Fig. 3

The total number of concepts under each concept scheme in the MC thesaurus

Fig. 4

The total number of concepts under each top concept of the MC thesaurus

Fig. 5

Proposed manufacturer classification framework

Fig. 6

Concept weighting schema

Fig. 7

Concept model builder function

Fig. 8

The user interface for extracting capability text

Fig. 9

Sample capability narrative tagged by MC thesaurus concepts

Fig. 10

Extracted concepts with their frequencies from the sample capability narrative

Fig. 11

The precision of text classification techniques. By applying concept weighting, the overall precision improves for all four techniques.

Fig. 12

Comparison of BoW and BoC methods over ten trial runs for heavy machining (HM) and complex machining




