Multimodal AI, LLMs & Medical Imaging Education
2019 — Present
Ph.D. at RPI; earlier ML/NLP foundation
My recent research focuses on multimodal AI systems for medical imaging education, benchmark design, and interactive learning. I built MEDI-SLATE, a slide-lecture aligned dataset containing 1,117 high-resolution slides and 262,182 refined narration tokens from a full 23-lecture undergraduate medical imaging course.
I designed MILU, a benchmark that evaluates structured lecture understanding across four open-source VLMs. The benchmark run produced more than 15,000 JSON artifacts in 9–11 GPU-hours on 4 × NVIDIA RTX A5000 GPUs. Across the 1,117 slides, parsing coverage stayed at 92–99%, but semantic agreement was low: pairwise concept Jaccard was 0.03–0.09 and triple-level F1 was 0.001–0.033.
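As a rough illustration of the two agreement metrics named above, the sketch below computes a pairwise concept Jaccard score and a triple-level F1 for two models' outputs on a single slide. The function names, the toy concepts and triples, and the exact-match treatment of triples are assumptions made for illustration; the benchmark's actual normalization and matching rules may differ.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as full agreement
    return len(a & b) / len(a | b)


def triple_f1(pred, ref):
    """F1 over exact-match (subject, relation, object) triples."""
    pred, ref = set(pred), set(ref)
    tp = len(pred & ref)  # triples present in both outputs
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs from two models for the same slide.
model_a_concepts = {"x-ray", "attenuation", "detector"}
model_b_concepts = {"x-ray", "photon", "contrast", "detector", "dose"}

model_a_triples = {
    ("x-ray", "passes_through", "tissue"),
    ("tissue", "attenuates", "x-ray"),
}
model_b_triples = {
    ("x-ray", "passes_through", "tissue"),
    ("detector", "records", "photon"),
}

# 2 shared concepts out of 6 distinct ones.
concept_agreement = jaccard(model_a_concepts, model_b_concepts)

# 1 shared triple; precision and recall are both 1/2.
triple_agreement = triple_f1(model_a_triples, model_b_triples)

print(f"concept Jaccard = {concept_agreement:.3f}")
print(f"triple F1       = {triple_agreement:.3f}")
```

Under exact matching, a single differing word in a subject or relation makes two triples count as different, which is one plausible reason triple-level scores can sit far below concept-level scores.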
I also contributed to ALIVE, a fully local avatar-lecture interaction engine that integrates ASR, FAISS-based retrieval, local LLM reasoning, text-to-speech, and talking-head synthesis for real-time lecture-grounded interaction. My earlier work in ML and NLP included Bangla and phonetic Bangla text analysis, where an SVM classifier reached 75.58% accuracy on a manually annotated dataset of 1,500 reviews, followed by work on unified sentiment and emotion recognition.
AI Safety & Synthetic Agent Systems
2025 — Present
Ph.D., RPI
My current safety-oriented work studies risky instruction propagation, social regulation, and decentralized governance in synthetic AI societies. I developed OpenClaw on Moltbook, an agent-only social environment for analyzing emergent behavior among autonomous agents.
In an empirical study of 39,026 posts and 5,712 comments from 14,490 agents, I found that 18.4% of posts contained action-inducing language and that such posts were more likely to elicit norm-enforcing responses, while toxic responses remained rare.
I also developed ADAPT, an AI-driven decentralized publishing framework that models scholarly publishing as a closed-loop governance system with bounded policy adaptation under overload, disagreement, and collusion-related stress. This work led to U.S. Provisional Patent Application No. 63/975,609.
Blockchain Provenance, Secure Retrieval & Applied AI
2020 — Present
M.Sc. at KUET + continuing work
My M.Sc. and related work focus on blockchain-based trust, provenance, and secure information systems. I completed my M.Sc. thesis on a blockchain-based secure framework for user-centric multi-party skyline queries, introducing multi-party ElGamal, re-encryption and shuffling, targeted queries, and blockchain-based integrity with distinct blocks for each party.
I built SlideChain, a blockchain-backed semantic provenance framework for educational AI, using four VLMs over 1,117 lecture slides and achieving approximately one-slide-per-second registration throughput, 100% tamper detection, and deterministic reproducibility with Jaccard = 1.0.
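The tamper-detection idea can be sketched with a content hash per slide record. This is a minimal illustration, not SlideChain's implementation: the in-memory dictionary stands in for the on-chain registry, and the choice of SHA-256, the canonical JSON serialization, and all names and record fields are assumptions.

```python
import hashlib
import json

# Stand-in for the on-chain registry (assumption): slide ID -> digest.
registry = {}


def digest(record):
    """Hash a record via a canonical JSON form so key order does not matter."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(slide_id, record):
    """Store the digest of a slide's semantic record."""
    registry[slide_id] = digest(record)


def is_untampered(slide_id, record):
    """True only if the record still hashes to the registered digest."""
    return registry.get(slide_id) == digest(record)


# Hypothetical record for one slide.
record = {"slide": 12, "model": "vlm-a", "concepts": ["x-ray", "detector"]}
register("lecture01/slide12", record)

# Same content with a different key order, and an edited copy.
reordered = {"model": "vlm-a", "concepts": ["x-ray", "detector"], "slide": 12}
tampered = dict(record, concepts=["x-ray", "ultrasound"])

print(is_untampered("lecture01/slide12", record))     # original record
print(is_untampered("lecture01/slide12", reordered))  # key order only
print(is_untampered("lecture01/slide12", tampered))   # edited content
```

Hashing a canonical serialization is what makes the check deterministic: identical content always yields the same digest, while any change to the content yields a different one.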
I also developed secure data-sharing and applied AI systems, including ShaEr, a privacy-preserving medical data sharing and monetisation framework. In addition, I contributed to a blockchain-aided heart disease detection system that integrates seven datasets and uses a voting ensemble with private blockchain support, achieving 89.2% accuracy, 85.3% precision, 97.0% recall, and 90.8% F1.
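As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two values above. The function and variable names below are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 85.3  # percent, as reported above
reported_recall = 97.0     # percent, as reported above

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 1))  # 90.8, matching the reported F1
```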