D-Extract: Extracting dimensional attributes from product images
Product dimensions are a crucial piece of information that enables customers to make better buying decisions. E-commerce websites extract dimension attributes so that customers can filter search results according to their requirements. Existing methods extract dimension attributes from textual data such as the title and product description. However, this textual information is often ambiguous and disorganized. In comparison, images can be used to extract reliable and consistent dimensional information. With this motivation, we propose two novel architectures to extract dimensional information from product images. The first, the Single-Box Classification Network, is designed to classify each text token in the image one at a time, whereas the second, the Multi-Box Classification Network, uses a transformer network to classify all detected text tokens simultaneously. To attain better performance, the proposed architectures are also fused with statistical inferences derived from the product category, which further increases the F1-score of the Single-Box Classification Network by ≈ 3.78% and of the Multi-Box Classification Network by ≈ 0.9%. We use a distant-supervision technique to create a large-scale automated dataset for pretraining and observe considerable improvement when the models are pretrained on this data before finetuning. The proposed model achieves a desirable precision of 91.54% at 89.75% recall and outperforms other state-of-the-art approaches by ≈ 4.76% in F1-score.
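The distant-supervision idea mentioned above can be sketched as follows: OCR tokens detected in a product image are auto-labelled by matching their numeric value against the dimension attributes already present in the catalog. This is a minimal illustrative sketch, not the paper's implementation; all function names, the tolerance parameter, and the label scheme are assumptions.

```python
import re

def normalise(token):
    # Extract a numeric value (and optional unit) from an OCR token like "30cm".
    # The unit set here is an illustrative assumption.
    m = re.match(r"^(\d+(?:\.\d+)?)\s*(cm|mm|in)?$", token.strip().lower())
    if not m:
        return None
    return float(m.group(1)), m.group(2) or ""

def distant_labels(ocr_tokens, catalog_dims, tol=0.05):
    # Label each token with the catalog dimension it matches (within a
    # relative tolerance), or "O" for tokens that match no dimension.
    labels = []
    for token in ocr_tokens:
        parsed = normalise(token)
        label = "O"
        if parsed:
            value, _unit = parsed
            for dim, dim_value in catalog_dims.items():
                if abs(value - dim_value) <= tol * dim_value:
                    label = dim
                    break
        labels.append(label)
    return labels

# Hypothetical example: tokens read from an image, dimensions from the catalog.
tokens = ["30cm", "Acme", "45.5cm", "12"]
dims = {"height": 30.0, "width": 45.5}
print(distant_labels(tokens, dims))  # ['height', 'O', 'width', 'O']
```

In practice such automatic matching is noisy (e.g., unit conversions, coincidental numbers), which is why the abstract uses the resulting dataset only for pretraining before finetuning on cleaner labels.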