🤖 ML Project
NLP microservice classifying PDFs and extracting key entities.
Internal Project
NLP / Document Processing
2 months (2024)
ML Engineer
Built an NLP microservice that automatically classifies documents and extracts structured data from unstructured PDFs.
Manual document processing was a bottleneck:
Hours spent manually categorizing documents
Inconsistent classification across team members
Key data buried in unstructured text
No searchable document database
Compliance risks from misfiled documents
The team needed automated document understanding.
I developed an intelligent document processing system:
OCR and PDF parsing for text extraction from scanned and digital PDFs.
Fine-tuned transformer model for document type classification.
spaCy NER for extracting names, dates, amounts, and custom entities.
FastAPI endpoint for document upload and processing.
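The four components above chain into a single pipeline: extract text, classify it, then pull out entities. The following is a minimal sketch of that flow, not the project's actual code. The names `process_document`, `ProcessedDocument`, and the three stand-in stages are hypothetical; the stand-ins use plain keyword and regex matching so the example runs with the standard library alone, whereas the real stages would wrap an OCR/PDF parser, the fine-tuned transformer, and a spaCy pipeline.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signatures. In the real service these would wrap
# a PDF/OCR parser, the fine-tuned classifier, and spaCy NER.
TextExtractor = Callable[[bytes], str]
Classifier = Callable[[str], str]
EntityExtractor = Callable[[str], Dict[str, List[str]]]


@dataclass
class ProcessedDocument:
    doc_type: str
    entities: Dict[str, List[str]] = field(default_factory=dict)


def process_document(
    pdf_bytes: bytes,
    extract_text: TextExtractor,
    classify: Classifier,
    extract_entities: EntityExtractor,
) -> ProcessedDocument:
    """Run the three stages in order and bundle the results."""
    text = extract_text(pdf_bytes)
    return ProcessedDocument(
        doc_type=classify(text),
        entities=extract_entities(text),
    )


# --- Stand-in stages (assumptions for illustration only) ---

def fake_extract_text(pdf_bytes: bytes) -> str:
    # Pretends the upload is already plain text; a real stage would parse the PDF.
    return pdf_bytes.decode("utf-8")


def keyword_classifier(text: str) -> str:
    # Keyword match standing in for the transformer classifier.
    return "invoice" if "invoice" in text.lower() else "other"


DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\$\d+(?:\.\d{2})?")


def regex_entities(text: str) -> Dict[str, List[str]]:
    # Regex matching standing in for spaCy NER.
    return {"DATE": DATE_RE.findall(text), "AMOUNT": AMOUNT_RE.findall(text)}


result = process_document(
    b"Invoice dated 2024-03-01 for $120.50",
    fake_extract_text,
    keyword_classifier,
    regex_entities,
)
print(result)
```

Passing the stages in as callables keeps each one replaceable, so a stronger classifier or a different OCR backend can be swapped in without touching the pipeline function.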
Extract text from scanned and digital PDFs.
Auto-categorize into 20+ document types.
Extract dates, names, amounts, and more.
RESTful API for easy integration.
NLP
API
Document
ML
Entity Extraction
API Documentation
Classification Results
95% Classification Accuracy
20+ Document Types
2 sec Processing Per Doc
10K+ Documents Processed
Key Achievements
95% classification accuracy across 20+ document types
Processing time under 2 seconds per document
Extracted 50+ entity types with high precision
Processed 10,000+ documents in production
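For readers less familiar with the metrics quoted above, here is a small sketch of how accuracy and per-label precision are commonly computed. The helper names and the four toy labels are illustrative assumptions, not data or code from this project.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of documents whose predicted type matches the true type."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Of the items predicted as `label`, the fraction that truly are `label`."""
    truths_for_predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths_for_predicted:
        return 0.0  # nothing was predicted as this label
    hits = sum(t == label for t in truths_for_predicted)
    return hits / len(truths_for_predicted)


# Toy labels for illustration only.
y_true = ["invoice", "contract", "invoice", "receipt"]
y_pred = ["invoice", "contract", "receipt", "receipt"]

print(accuracy(y_true, y_pred))              # 0.75
print(precision(y_true, y_pred, "receipt"))  # 0.5
```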