Data Scientist

PNW

About Me

I’m a Data Scientist currently living in Seattle. I completed my PhD in Biomedical Engineering at the University of Washington, where I developed computational methods to explore relationships between brain structure and brain function using magnetic resonance imaging. My primary research focus was on designing computer vision models to map the human cortex using multiple MRI modalities, and relating the derived cortical maps to biological and neurological processes. I have extensive experience in medical imaging, image processing, machine learning, graph-based data, high-dimensional data analysis, data visualization, and scientific communication.

Since completing my PhD, I’ve worked as a Data Scientist in the biotechnology sector, with a major focus on building AI tools to facilitate the drug discovery process. This work has included developing image and graph-based machine learning models for in-house and client-facing projects, designing end-to-end analysis and deployment pipelines, and building data visualization tools for scientists in various functional groups.

When I’m not thinking about data, I like to backcountry ski, climb, trail run, and take care of my plants. I’m driven by curiosity and fueled by coffee.

Interests

  • Deep Learning
  • Data Visualization
  • Computer Vision

Education

  • PhD in Biomedical Engineering, 2021

    University of Washington

  • BSc in Molecular and Cellular Biology, 2012

    University of California, Los Angeles

What Am I Doing Now?

Senior Data Scientist

Just-Evotec Biologics

Mar 2022 – Present Seattle, WA
  • Leading the development of AI models for in-silico de novo antibody design using graph-based protein language models
  • Developed downstream protein purification visualization tools that reduced end-to-end analysis time by ~2 weeks per client project
  • Orchestrated migration of company-wide on-prem applications to cloud-based hosting on AWS EC2 instances with Gitlab CI/CD pipelines

Past Roles

Data Scientist

CuriBio

Jun 2021 – Apr 2022 Seattle, WA
  • Built AI models for predicting cell differentiation success rates from high-throughput microscopy imaging datasets
  • Explored the utility of explainable AI tools for relating imaging-based phenotypic features to cell differentiation outcomes
  • Developed a software platform for phenotypic analysis of engineered cardiac and skeletal myocyte contractility waveforms

PhD Graduate Student, University of Washington

Integrated Brain Imaging Center

Sep 2014 – Dec 2021 Seattle, WA
  • Developed computer vision models based on unsupervised and supervised ML algorithms for segmenting cortical tissue using MRI, implemented in Python
  • Designed graph neural network models to apply population-level cortical maps to unmapped MRI images, improving segmentation accuracy by >15% relative to conventional CNN models
  • Initiated a cross-department study on brain dynamics, resulting in a high-impact publication
  • Developed a turnkey pipeline for processing 1000+ adult human MRI scans (>1.5TB) using shell scripting and distributed computing (SGE)
  • Thesis work resulted in 2+ high-impact peer-reviewed publications, 2+ international conference posters, and 3+ conference presentations

Software Engineering Intern

Phase Genomics

Apr 2017 – Jun 2017 Seattle, WA
  • Contributed to the development of metagenome clustering algorithms for a Python-based software platform
  • Learned and employed principles of test-driven software development
  • Gained experience with cloud computing using AWS

Data Science Intern

Pacific Northwest National Laboratory

Jun 2016 – Sep 2016 Richland, WA
  • Studied data structures related to dynamic graphs
  • Analyzed dynamical systems of functional MRI to characterize coherent spatial patterns of brain activity
  • Translated summer internship research into a journal paper

Skills

Python

PyData stack

Machine Learning

PyTorch, TensorFlow, scikit-learn, Deep Graph Library

Software Engineering

Data structures and algorithms, unit-testing, object-oriented design

Linear Algebra

Statistics

Computer Vision / Image Processing

Recent Posts

CI/CD Part 4: Container Registries

This is the last post in a mini-series on designing Gitlab CI/CD pipelines. We’ve discussed the basic anatomy of a .gitlab-ci.yml file, how to set up authentication tokens and files for building and pushing packages to a registry, and how to design a Dockerfile for building images from a package in the context of a CI/CD pipeline.

CI/CD Part 3: Building containers with Docker

This is the third post in a mini-series on designing Gitlab CI/CD pipelines. In the last post, we discussed setting up your .pypirc and .netrc files in the context of a Gitlab CI/CD pipeline to enable building and pushing packages to a package registry, as well as installing code from a private registry.

CI/CD Part 2: Building and pushing packages

This is the second post in a mini-series on designing Gitlab CI/CD pipelines. In order to build packages and push them to a remote package registry, we use the build and twine packages. build generates a package, and twine pushes this package to a registry (or “index”).

CI/CD Part 1: Gitlab Pipelines

I recently developed a template workflow to help our team adopt a CI/CD-based development strategy. Many of our web applications and tools were based on simple repository structures. With growing datasets and ever-increasing use by outside teams, we found ourselves needing to add new features more frequently to many of these tools and believed that continuous integration and deployment could help us not just develop more quickly, but also more intelligently.

Visualizing SQL Schemas

I was recently tasked with examining databases related to some computer vision tools that my company had acquired. Basically, the framework was as follows… Clients/users would sign up for some service with the goal in mind of building a model to classify a set of microscopy images.

Software

parcellearning

package of neural network modules for learning cortical architectures from brain connectivity data

submet

package to compute various distance metrics between subspaces

ddCRP

package to fit distance-dependent Chinese Restaurant Process models

fieldmodel

package to fit distributions over scalar fields on the domain of regular meshes

Talks and Presentations

Automated Connectivity-Based Parcellation With Registration-Constrained Classification

(Best Talk, Honorable Mentions) Automated Connectivity-Based Parcellation With Registration-Constrained Classification

Analyzing the Resting Brain with Dynamic Mode Decomposition

Posters and Publications

Learning Cortical Parcellations Using Graph Neural Networks

We examine the utility of graph neural networks for learning cortical segmentations. We show that attention-based transformer networks significantly outperform conventional GCN and linear feed-forward variants at generating accurate, reproducible cortical maps.

Linear Mapping of Cortico-Cortico Resting-State Functional Connectivity

Using non-linear dimensionality reduction of functional brain connectivity patterns, and multivariate spatial statistics to characterize the functional embeddings, we analyze the spatial relationships between pairs of cortical regions to better examine how pairs of cortical regions connect and relate to one another.

Extracting Reproducible Time-Resolved Resting State Networks Using Dynamic Mode Decomposition

In this paper, we develop a novel method based on dynamic mode decomposition (DMD) to extract resting-state networks from short windows of noisy, high-dimensional fMRI data, allowing RSNs from single scans to be resolved robustly at a temporal resolution of seconds. This automated DMD-based method is a powerful tool to characterize spatial and temporal structures of RSNs in individual subjects.

Automated Connectivity-Based Cortical Mapping Using Registration-Constrained Classification

In this analysis, we propose the use of a library of training brains to build a statistical model of the parcellated cortical surface to act as templates for mapping new MRI data.

Registering Cortical Surfaces Based on Whole-Brain Structural Connectivity and Continuous Connectivity Analysis

We present a framework for registering cortical surfaces based on tractography-informed structural connectivity. We define connectivity as a continuous kernel on the product space of the cortex, and develop a method for estimating this kernel from tractography fiber models.