NLP Document Classification & Entity Extraction API

🤖 ML Project


NLP microservice classifying PDFs and extracting key entities.

Project Overview

Client: Internal Project
Industry: NLP / Document Processing
Timeline: 2 months (2024)
My Role: ML Engineer

Built an NLP microservice that automatically classifies documents and extracts structured data from unstructured PDFs.

The Challenge

Manual document processing was a bottleneck:

Hours spent manually categorizing documents

Inconsistent classification across team members

Key data buried in unstructured text

No searchable document database

Compliance risks from misfiled documents

The team needed automated document understanding.

My Solution

I built a document processing pipeline with four stages:

1. Text Extraction

PDF parsing for digital documents and OCR for scanned ones, so text can be pulled from either kind of file.

2. Classification Model

Fine-tuned transformer model for document type classification.

3. Entity Extraction

spaCy NER for extracting names, dates, amounts, and custom entities.

4. REST API

FastAPI endpoint for document upload and processing.
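The four stages above compose into a single request path. Below is a minimal, hypothetical wiring of that path, not the project's actual code: `process_document`, `create_app`, the `/documents` route and the `DocumentResult` fields are illustrative names, and the three stage functions are passed in as callables. It assumes FastAPI is installed together with `python-multipart`, which FastAPI needs for file uploads.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class DocumentResult:
    """Hypothetical response shape: predicted type, model confidence, extracted entities."""
    document_type: str
    confidence: float
    entities: dict[str, list[str]] = field(default_factory=dict)


def process_document(
    pdf_bytes: bytes,
    extract_text: Callable[[bytes], str],
    classify: Callable[[str], tuple[str, float]],
    extract_entities: Callable[[str], dict[str, list[str]]],
) -> DocumentResult:
    """Run the stages in order: text extraction, then classification and NER on the text."""
    text = extract_text(pdf_bytes)
    document_type, confidence = classify(text)
    return DocumentResult(document_type, confidence, extract_entities(text))


def create_app(extract_text, classify, extract_entities):
    """Build a FastAPI app exposing the pipeline behind one upload endpoint."""
    # Imported lazily so the pipeline logic above can be used and tested without FastAPI.
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI(title="Document Processing API")

    @app.post("/documents")
    def upload_document(file: UploadFile = File(...)) -> dict:
        # A plain `def` endpoint runs in FastAPI's threadpool, so slow OCR or
        # model inference does not block the event loop.
        pdf_bytes = file.file.read()
        result = process_document(pdf_bytes, extract_text, classify, extract_entities)
        return asdict(result)

    return app
```

Passing the stages in as arguments keeps the HTTP layer thin and lets the pipeline be unit-tested with stub functions, without loading OCR engines or model weights.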

Key Features

📄 PDF Processing

Extract text from scanned and digital PDFs.

🏷️ Classification

Auto-categorize into 20+ document types.

🔍 Entity Extraction

Extract dates, names, amounts, and more.

🔗 API Ready

RESTful API for easy integration.
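Handling both scanned and digital PDFs usually comes down to a per-page fallback: read the embedded text layer first, and OCR only the pages that have none. This is a minimal sketch of that approach with PyMuPDF (`fitz`) and `pytesseract`, assuming both are installed along with Pillow and a local Tesseract binary. The `MIN_TEXT_CHARS` threshold, the 300 DPI render and the function names are illustrative choices, not values taken from the project.

```python
import io

# Hypothetical threshold: pages with fewer characters of embedded text are treated as scans.
MIN_TEXT_CHARS = 25


def needs_ocr(page_text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Heuristic: a page with (almost) no text layer is probably a scanned image."""
    return len(page_text.strip()) < min_chars


def extract_text(pdf_bytes: bytes, dpi: int = 300) -> str:
    """Return the text of every page, falling back to OCR page by page."""
    # Heavy dependencies are imported lazily so needs_ocr() works without them.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    pages = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            text = page.get_text("text")
            if needs_ocr(text):
                # Rasterize the page and let Tesseract read the image instead.
                pixmap = page.get_pixmap(dpi=dpi)
                image = Image.open(io.BytesIO(pixmap.tobytes("png")))
                text = pytesseract.image_to_string(image)
            pages.append(text)
    return "\n".join(pages)
```

Deciding per page rather than per file matters for mixed documents, such as a digital cover letter followed by scanned attachments.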

Tech Stack

NLP: spaCy, Transformers, NLTK
API: FastAPI, Python
Document: PyMuPDF, Tesseract OCR
ML: Hugging Face, PyTorch
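On the Hugging Face side of the stack, inference with a fine-tuned classification checkpoint can go through the `transformers` `pipeline` helper. This sketch assumes a locally saved checkpoint (`model_dir` is a placeholder path) and adds a hypothetical confidence threshold that routes uncertain predictions to manual review. Neither the threshold nor the `needs_review` label comes from the project description.

```python
# Hypothetical cut-off; in practice it would be tuned on a held-out validation set.
REVIEW_THRESHOLD = 0.80


def route_prediction(label: str, score: float, threshold: float = REVIEW_THRESHOLD) -> str:
    """Keep confident predictions; flag the rest for a human to check."""
    return label if score >= threshold else "needs_review"


def load_classifier(model_dir: str):
    """Load a fine-tuned sequence-classification checkpoint from disk."""
    # Imported lazily so route_prediction() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_dir)


def classify(classifier, text: str) -> tuple[str, float]:
    """Return (document type or 'needs_review', model score) for one document."""
    # truncation=True clips the input to the model's maximum length (typically
    # 512 tokens for BERT-style encoders), so only the start of a long document is seen.
    prediction = classifier(text, truncation=True)[0]
    return route_prediction(prediction["label"], prediction["score"]), prediction["score"]
```

If the opening tokens of a document are not representative of its type, splitting the text into chunks and aggregating the chunk scores is a common alternative to plain truncation.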

Screenshots

[Screenshot: Entity Extraction (NLP)]
[Screenshot: API Documentation (API)]
[Screenshot: Classification Results (Results)]

Results & Impact

95%: Classification accuracy
20+: Document types
2 sec: Processing time per document
10K+: Documents processed

Key Achievements

95% classification accuracy across 20+ document types

Processing time under 2 seconds per document

Extracted 50+ entity types with high precision

Processed 10,000+ documents in production
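Custom entity types can be layered onto a pretrained spaCy pipeline with spaCy's rule-based `entity_ruler` component, placed ahead of the statistical `ner`. This is a minimal sketch assuming spaCy v3 and the `en_core_web_sm` model are installed. The `PAYMENT_TERMS` and `CURRENCY_CODE` patterns and the helper names are hypothetical examples, not the project's actual entity set.

```python
from collections import defaultdict

# Hypothetical custom patterns, layered on top of the model's built-in labels.
CUSTOM_PATTERNS = [
    {"label": "PAYMENT_TERMS", "pattern": [{"LOWER": "net"}, {"LIKE_NUM": True}]},
    {"label": "CURRENCY_CODE", "pattern": [{"TEXT": {"IN": ["USD", "EUR", "GBP"]}}]},
]


def build_nlp(model: str = "en_core_web_sm"):
    """Load a pretrained pipeline and add rule-based patterns for custom entities."""
    import spacy  # imported lazily so group_entities() works without spaCy

    nlp = spacy.load(model)
    # Placing the ruler before "ner" fixes its matches first; the statistical
    # model then predicts around those spans instead of overwriting them.
    ruler = nlp.add_pipe("entity_ruler", before="ner")
    ruler.add_patterns(CUSTOM_PATTERNS)
    return nlp


def group_entities(pairs) -> dict[str, list[str]]:
    """Collapse (label, text) pairs into {label: [unique texts in order seen]}."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for label, text in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


def extract_entities(nlp, text: str) -> dict[str, list[str]]:
    """Run the pipeline and return its entities as structured, de-duplicated data."""
    doc = nlp(text)
    return group_entities((ent.label_, ent.text) for ent in doc.ents)
```

Grouping by label turns the raw span list into the kind of structured record a downstream database or search index can store directly.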

Interested in Something Similar?

I help businesses build robust backend systems, membership platforms, and automation tools.
