24 June, 2021
Jacob Levernier, PhD
Software engineer and architect specialized in creating tools for data science, from prototype to production. Technical leadership in software development married to extensive training as a data scientist.
7+ years of professional full stack architecture and development. 4+ years in enterprise environments.
7+ years professional data engineering using SQL, including 3+ years using NoSQL approaches.
3+ years professionally teaching and collaborating in applied statistics and machine learning.
3+ years in Technical Lead and team leadership roles.
Recognized across institutions as a leader in team facilitation, communication, and project management.
Full Stack Frontend, Back End Server, Command-Line Interface, and Native App development:
Python: AsyncIO, unittest
Go, Bash, PHP, Bash Automated Testing System
API design: gRPC, WebSockets, GraphQL, SQLAlchemy
R: Shiny, RMarkdown, Stan, Classical Machine Learning, Bayesian statistics, Synthetic data, Statistical Disclosure Control
Python: NumPy, Pandas, D3, Jupyter
Databases: Design and consumption of SQL (PostgreSQL, MySQL), Elasticsearch, Neo4j Cypher
Containerization: Docker, Docker-Compose, Kubernetes, Vault, Terraform
CI/CD: GitHub Actions, Bazel, Snakemake, Make, Custom tooling
Leadership: Agile Project Management, Jira Administration, Workflow design, Documentation workflows
PhD, Personality & Social Psychology (Supporting Specialization in Data Science) Eugene, OR
University of Oregon 08/2011 – 09/2016
- Specialized in Natural Language Processing, applied statistics, algorithm creation, and data ethics
- Developed new algorithm for Natural Language Processing in network graphs
- Significant software architecture, software development, and project management experience
MS, Psychology Eugene, OR
University of Oregon 08/2011 - 12/2012
BA, Psychology (Minors Neuroscience, Ethics) San Francisco, CA
University of San Francisco 08/2006 – 05/2010
- First student in 13 years to be both Valedictorian and Dean's Medalist
The Coleridge Initiative Remote
Data and Privacy Manager 01/2021 - 03/2021
Designed new infrastructure approach for scaling and automating statistical disclosure control for a large data warehouse of highly sensitive federal and state government data (e.g., Medicaid, Social Security), to address a time-critical need for resilience to a rapidly growing client base.
Automated team’s project management approach, including designing and implementing SLA-relevant alerting systems, decreasing design, development, and reporting time from days to hours.
Created new semi-automated approach to reviewing Data Sharing Agreements for data from federal and state government agencies, surfacing new areas requiring renegotiation across multiple contracts.
Children's Hospital of Philadelphia (CHOP) Philadelphia, PA
Senior Analyst / Programmer 08/2018 - 01/2021
Designed and engineered major full stack components of “Arcus,” a large-scale micro-architecture data and computing platform for securely combining clinical and research datasets.
Redesigned and fully implemented seven-person engineering team’s CI/CD approach, creating custom tooling around Bazel and additional pipeline software. Created self-documenting tools for containerizing development and production environments, deploying software, and creating and running automated functional and unit tests.
Designed and authored large-scale metadata management platform for multi-terabyte datasets and delivered fully functional product in one month, down from a 10-month timeline for the previous design.
Developed new algorithm for creating reproducible mock data for testing custom React front-end statistical display components, resulting in faster development and testing cycles.
At Team Lead’s request, designed and led seven-person engineering team’s approaches to project management and cross-team communication, resulting in the team’s subsequent Jira workflow and documentation approaches being adopted by additional teams.
Designed and created Elasticsearch cluster, ETL pipelines, and APIs for data discovery at scale.
Created new approach to documenting research management, adopted by a 40-person team, resulting in dramatically faster and more consistent onboarding of research project specialists.
As Technical Lead and senior contributor, supervised undergraduate and graduate students and mentored junior colleagues.
Princeton University Remote
Data Scientist (Project-based) 05/2019 - 08/2019
Developed semi-supervised workflow for cleaning and classifying library borrower records to aid administrators in understanding resource allocations.
University of Pennsylvania Philadelphia, PA
CLIR Bollinger Fellow in Library Innovation (Data Scientist) 01/2017 – 09/2018
Python Programming and Data Science Instructor, Price Lab for Digital Humanities 03/2017 – 10/2018
Through the Council on Library and Information Resources (CLIR), designed and prototyped new software and algorithms for privacy-protecting data analysis of large-scale library user data, sharing these approaches across the “Ivy Plus” universities, resulting in multiple policy decisions.
Led development and deployment of full stack components and containers for interactive dashboards, resulting in a new, more in-depth, and faster approach to cleaning and reporting on user data for executive decision-making.
Served as software engineering and data science consultant for software engineers and researchers, including at the Library of Congress.
Co-founded advisory group on software and data engineering best practices relating to data privacy, resulting in new policy creation.
Taught programming and data analysis to faculty and graduate researchers. Lectured at Wharton Business School on measure design and data visualization.
The University of Oregon Eugene, OR
Research and Teaching Fellow (Research Scientist and Software Engineer) 09/2011 – 12/2016
As architect and lead developer, created data platform for research participant metadata, adopted by multiple research labs, increasing the speed of recruitment matching from days to seconds.
Led software development across three research labs, designing and authoring new, real-time data collection and analysis pipelines.
Managed a research lab of four researchers and five research assistants.
Taught Graduate and Advanced Undergraduate statistics and machine learning.
The Veterans Health Research Institute / San Francisco VA San Francisco, CA
Staff Research Associate II 09/2010 – 07/2011
Designed and authored research participant recruitment database and an automated system for determining research participant study eligibility, increasing accuracy in recruitment analyses.
Selected Publications and Presentations: Available here.