Summary


  • Experienced data scientist & machine learning engineer, passionate about leading and performing highly technical work to drive business value.
  • Years of hands-on experience in the full lifecycle of machine learning models at scale, from business problem definition and refinement, through data and feature pipelines and model development, to model deployment and monitoring in production.
  • Fast learner and self-starter who thrives in fast-paced environments. Identifies gaps in expertise on projects and programs, and fills them by bringing in the right people or developing the right expertise myself.
  • Domain expertise in mineral exploration, financial services, and semiconductor manufacturing.

Experience

Staff Data Scientist, KoBold Metals
02/2020-Present, Berkeley, CA

  • KoBold Metals is investing in battery materials projects across the globe by combining basic ore-deposit science, big data, and scientific computing with patient private capital.


Manager, Data Science – Credit Card Fraud Defense, Capital One
12/2017-02/2020, McLean, VA

  • Lead data scientist for developing and deploying machine learning models for payment fraud defense across the entire credit card portfolio. The models score > 700M individual payments totaling $400B (~2% of US GDP) from > 40M customers per year, and deliver fraud savings of > $70M / yr over the legacy model and > $150M / yr overall.
  • Built reusable, well-tested, end-to-end model development pipelines, including infrastructure provisioning on AWS EMR, data pull and feature engineering in PySpark and SQL, gradient-boosted tree models in H2O, and a model monitoring stack in Python, InfluxDB, and Grafana.
  • Rewrote production feature calculation and unit test code from Java to PySpark, and set up CI/CD pipelines with pytest and Jenkins. Conducted model validation in production, allowing on-schedule deployment.
  • Closely involved in data scientist recruiting for the entire enterprise, serving as one of a handful of interviewers for on-site DS interviews and providing feedback to shape our recruiting practices.


Manager, Data Science/PM – Enterprise Customer Intelligence, Capital One
03/2017-12/2017, Vienna, VA

  • Built prototype tools to consume customer digital interaction event streams on Kafka, and explored NLP / sequence models to generate insights to power personalized customer experiences over digital channels.
  • Interim product manager of an in-house clickstream analytics platform built on Kafka and Snowplow. Consolidated monitoring and analysis efforts, and coordinated user transition from the legacy platform.


Principal Data Scientist – Analytical Solutions, Capital One Labs
04/2015-03/2017, Arlington, VA

  • Partnered with internal lines of business to identify the highest-leverage problems, and executed on the development of data-science-based solutions and data products. Also served as tech evangelist, mentoring and teaching data science techniques and software engineering best practices to cultivate self-sufficient teams.
  • Analyzed TBs of credit card transactions to identify characteristics and trends of block-level neighborhoods in selected US cities. Developed geospatial data pipelines in Python (fiona, rtree, shapely) and Postgres / PostGIS, customer segmentation models in Python, and a geospatial data-viz web app in R-shiny / leaflet.
  • Product owner & team lead for an internal platform to automate workflows for business metrics monitoring and dashboards using Python, InfluxDB, and Grafana. Mentored 50+ analysts in 30+ teams, and implemented self-service instruction to scale adoption. (Details available in my PyCon 2019 talk and on the Capital One blog.)


Senior Data Scientist – Technology/Product Characterization, Philips Lumileds
12/2012-04/2015, San Jose, CA

  • Developed a statistical analysis and data visualization tool in R-shiny to automate data analysis, reducing routine analysis time by 95%, improving team throughput, and enabling faster learning cycles.
  • Built reusable data pipelines on manufacturing line data to connect multiple processing and testing steps, and developed tree-based models to provide insight on process control capabilities and improve yield.


Senior Device Engineer, Alta Devices
06/2011-10/2012, Sunnyvale, CA

  • Performed electrical and optical modeling in MATLAB to predict and improve solar cell performance.
  • Developed and optimized novel, scalable fabrication processes to improve solar cell efficiency, leading to 2 world records and 3 patents.


Stanford Graduate Fellow, Stanford University
09/2006-06/2011, Stanford, CA

  • Developed advanced characterization techniques and experimentally realized a new light-trapping design to improve the efficiency of organic-inorganic hybrid solar cells.
  • Initiated and led collaborations with five research teams across three continents, and authored 13 papers that are highly cited in the field.



Education

Ph.D., Stanford University
Materials Science and Engineering, 09/2006 - 06/2011, Stanford, CA
B.Sc., National Taiwan University
Chemistry, with minor in Materials Science and Engineering, 09/2000 - 06/2004, Taipei, Taiwan

Skills

Languages and Tools

Programming languages and tools that I use in my day-to-day work.

Python, Spark, SQL, git, GitHub, Postgres, PostGIS, QGIS

Packages

Specific Python packages that I am proficient with.

pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, pytest, h2o, SimPEG

Cloud/DevOps

Cloud and DevOps-specific tools I have experience with.

AWS (EC2, EMR, S3), Dask, Docker, Kubeflow, Hadoop, Ansible, CircleCI, Jenkins

Domain expertise

Domain expertise gained through my work experience.

Geospatial analysis, Geophysical modeling, Financial services, Semiconductor manufacturing


