About Me

An experienced Data Professional with a background in both Data Science and Data Engineering, interested in the intersection of Machine Learning and Engineering.

Most Recent Post


How do I Scale My Personal Performance?


A quick post on my personal process as a programmer and how I go about improving.

When I first started programming, I would

  1. Write some code
  2. Do some ad-hoc tests to make sure it was correct
  3. Move on

This doesn't scale when working on a team or maintaining projects for months or years at a time. I was nervous to change any of the code 10 minutes after I did the ad-hoc tests, and even more nervous to update it 6 months after the initial development. I didn't want to break anything, so my ability to experiment with the libraries I was using was drastically limited.

This initial process wasn't even a good way to learn to program. Most of my programs as an undergrad didn't even compile. It wasn't until I started working as a Software Engineer and writing unit tests that I became a competent programmer.

Nowadays I

  1. Write some code
  2. Write unit tests to validate the code
  3. Iterate on the code for efficiency and readability

Iterating on existing code without changing its functionality has given me a deeper understanding of the libraries I'm working with. Having unit tests gives me confidence that the modified code still works as expected after every change.

Compared to ad-hoc testing, writing unit tests doesn't really take extra time. There is a learning curve to using unit testing in Python, but being able to re-run the same tests 5, 10, or 50 times ultimately saves development time.
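As a minimal sketch of steps 2 and 3, here is a small function with a `unittest` test case. The function `normalize_names` is hypothetical, invented for illustration. Once the tests exist, the function body can be rewritten for readability or speed, and re-running the tests (for example with `python -m unittest`) checks that the behavior hasn't changed.

```python
import unittest


def normalize_names(names):
    """Hypothetical helper: strip whitespace, title-case, and drop blank entries."""
    # This list comprehension could replace an earlier loop-based version;
    # either way, the tests below must still pass.
    return [name.strip().title() for name in names if name.strip()]


class TestNormalizeNames(unittest.TestCase):
    def test_strips_and_title_cases(self):
        self.assertEqual(normalize_names(["  alice ", "BOB"]), ["Alice", "Bob"])

    def test_drops_blank_entries(self):
        self.assertEqual(normalize_names(["", "   ", "carol"]), ["Carol"])

    def test_empty_input(self):
        self.assertEqual(normalize_names([]), [])
```

The tests pin down the behavior, not the implementation, which is what makes step 3 safe.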

I Take Notes

I maintain notes on a variety of subjects:

  • Git/GitHub
    • Check out a new branch from an existing remote branch
    • How to set the upstream branch
    • etc
  • Python
    • Basics of unit testing
    • Common libraries I use a lot: mock, unittest, requests
    • etc
  • Pandas
    • How to do a groupby transform to add a new column
    • How to fill `na` values depending on other columns
    • etc
  • PySpark
    • How to read a single CSV file
    • PySpark Pipelines
    • etc
  • Specific Vendors
    • Each vendor and service has its own quirks, and I like to jot them down
  • ML Algorithms
  • OSX, bash, brew, conda, etc.
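For example, the two Pandas items above might be captured in a short note like the following minimal sketch. The DataFrame and its column names (`store`, `sales`, and so on) are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: sales per store, with one missing value.
df = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "B"],
        "sales": [10.0, 30.0, 5.0, None, 15.0],
    }
)

# groupby + transform returns a result aligned to the original rows,
# so it can be assigned directly as a new column (mean ignores NaN).
df["store_mean_sales"] = df.groupby("store")["sales"].transform("mean")

# Fill missing sales using another column: the mean for that row's store.
df["sales_filled"] = df["sales"].fillna(df["store_mean_sales"])
```

Unlike `agg`, which returns one row per group, `transform` keeps the original index, which is why the assignment lines up row by row.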

Each note is currently 1-3 pages and contains code snippets and/or explanations of things I've learned at work that apply to Data Science in general, i.e. no company secrets and absolutely no copy/paste of any code developed at work for work.
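As an illustration of the kind of generic snippet that fits in the Python notes, here is a sketch of using `unittest.mock.Mock` to stand in for an HTTP client, so a function can be tested without making a network call. The function `get_user_name` and the URL are hypothetical. The `session` argument is assumed to behave like a `requests.Session`, whose `get` returns a response with `raise_for_status()` and `json()`.

```python
from unittest import mock


def get_user_name(session, user_id):
    """Hypothetical function: fetch a user record and return its name."""
    response = session.get(f"https://api.example.com/users/{user_id}")
    response.raise_for_status()
    return response.json()["name"]


def test_get_user_name():
    # The Mock stands in for a requests.Session and records every call.
    fake_session = mock.Mock()
    fake_session.get.return_value.json.return_value = {"name": "Ada"}

    assert get_user_name(fake_session, 42) == "Ada"
    # Verify the function requested the expected URL exactly once.
    fake_session.get.assert_called_once_with("https://api.example.com/users/42")
```

Passing the session in as an argument, rather than calling `requests.get` directly, is a design choice that makes the mock easy to inject; with a hard-coded call you would reach for `mock.patch` instead.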


Over time this process has helped me to become a better programmer and faster at accomplishing routine tasks.

Hopefully there is something in this that is helpful for you!