Center for Clinical Investigation

Academic home of the Cleveland Clinical and Translational Science Collaborative.


Satya Sahoo, PhD, is an assistant professor in the Division of Medical Informatics at Case Western Reserve University.


  • Paper comparing two approaches for computing brain structural connectivity nominated for Best Student Paper Award at AMIA CRI 2016 (details)
  • Epilepsy ontology paper selected for Best Paper Award by the International Medical Informatics Association (IMIA) in clinical research informatics (paper)
  • Tutorial on Provenance Analysis and RDF Query Processing (PARC) at International Semantic Web Conference (ISWC 2015), Bethlehem, PA on October 12, 2015. Register here for the tutorial
  • Inaugural NIH Big Data to Knowledge (BD2K) Software Development grant in Data Provenance awarded to Satya Sahoo at CWRU (NIH Announcement)
  • CWRU to lead new project to look at provenance metadata in Big Data (MedCity News)
  • CWRU to lead multi-institutional Big Data project (The Daily)

Open Position in Biomedical Big Data

July 2015: Data Scientist in Biomedical Big Data (Position Details)

Research Interests

My research interests include development of computer science methods and technologies for:
1. Clinical big data: massive-scale distributed computing over electrophysiological signal data (Cloudwave project);
2. Patient information capture at the point of care (OPIC project);
3. Natural language processing (NLP) over clinical free text (EpiDEA project).

My computer science research involves knowledge representation and reasoning (Ontology engineering), distributed and parallel computing, Semantic Web, provenance metadata, and data integration. I have also worked on scientific workflows and (Semantic) Web services.

I have served as an invited expert in the World Wide Web Consortium (W3C) Provenance Working Group and received the 2012-13 Glennan Fellowship award from the CWRU University Center for Innovation in Teaching and Education.


Google Scholar Profile (with citation score)
List Format

Research Projects

Cloudwave: Managing Electrophysiological Big Data

Electrophysiological signal data collected in epilepsy centers, such as electroencephalogram (EEG) and electrocardiogram (ECG) recordings, are characterized by both volume and velocity. Signal data are used as a gold standard to drive a range of research and patient care applications, including pre-surgical evaluation of patients. The Cloudwave platform uses the open source Hadoop implementation to apply map-reduce algorithms that parallelize the computation of clinical measures. The results are accessed through a Web-based visual interface with signal rendering and query composition features. More information
Funding: The PRISM (Prevention and Risk Identification of SUDEP Mortality) Project (1-P20-NS076965-01)

Ontology-driven Clinical Free Text Analysis

Extracting structured data from clinical free text is extremely challenging. Epilepsy Data Extraction and Annotation (EpiDEA) is an ontology-driven clinical free text processing platform that uses the Epilepsy and Seizure Ontology (EpSO) as the core knowledge resource for processing, representing, and querying clinical text. By extending the cTAKES natural language processing tool developed at the Mayo Clinic, EpiDEA addresses the unique challenges of epilepsy and seizure-related clinical free text in patient discharge summaries. EpiDEA also incorporates a visual interface for cohort identification that clinical researchers can use directly. More information
Funding: The PRISM (Prevention and Risk Identification of SUDEP Mortality) Project (1-P20-NS076965-01)

Provenance Framework for Biomedical Data Management

Provenance is contextual metadata that facilitates effective data integration, reproducibility of results, correct attribution of the original source, and answering queries involving “What”, “Where”, “When”, “Which”, “Who”, “How”, and “Why”. The SemPoD project is creating an integrated query environment for accessing and analyzing experiment data using well-known experiment reporting guidelines (e.g. MIMIx, MIAPE) as search criteria. SemPoD uses a provenance ontology (semantic provenance) to implement (a) an ontology-driven Visual Query Composer, (b) a Result Explorer, and (c) a Query Manager. More information
Funding: CTSC Informatics Pilot (UL1TR000439)


  • Biomedical Big Data for Clinical Research and Patient Care: Role of Semantic Computing, Plenary Session: Challenges of Semantic Computing and Medical Big Data at Eighth IEEE International Conference on Semantic Computing (ICSC), 2014 (Slides)
  • Awakening Clinical Data: Semantics for Scalable Medical Research Informatics, Dagstuhl seminar on Semantic Data Management at the Leibniz Center for Informatics, 2012 (Slides)
  • Role of Semantic Web in Health Informatics, Tutorial at 2nd ACM SIGHIT International Health Informatics Symposium, 2012 (Slides)
  • A Framework for Provenance Management in eScience, EECS Seminar at Case Western Reserve University, October 7, 2010 (details)

Service (selected)


(Graduated Students)

Semantic Web and Provenance Workshop Series

Proposed and co-organized a series of workshops exploring the research issues at the intersection of Semantic Web and Provenance Management (SWPM).

  • SWPM 2012 (co-located with the 9th Extended Semantic Web Conference, Heraklion, Greece)
  • SWPM'10 (co-located with the 9th International Semantic Web Conference, Shanghai, China)
  • SWPM'09 (co-located with the 8th International Semantic Web Conference, Washington DC, USA)


Satya Sahoo, PhD
Assistant Professor
Division of Medical Informatics
Office: WRB 6126
Phone: 216-368-3286
Fax: 216-368-0207
Email: satya dot sahoo at case dot edu