Scale My Personal Performance
Quick post on my personal process as a programmer and my own method of improving.
When I first started programming, I would:
- Write some code
- Do some ad-hoc tests to make sure it was correct
- Move on
This does not scale when working on a team or maintaining projects for months or years at a time. I was nervous to change any of the code 10 minutes after I did the ad-hoc tests, and even more nervous to update code 6 months after the initial development. I didn't want to break anything, so my ability to experiment with the libraries I was using was drastically limited.
This initial process wasn't even a good way to learn to program. Most of my programs as an undergrad didn't even compile. It wasn't until I started working as a Software Engineer and writing unit tests that I became a competent programmer.
Nowadays I:
- Write some code
- Write some unit tests to validate the code
- Iterate on the code for efficiency and readability
Iterating on existing code without changing its functionality has given me a deeper understanding of the libraries that I'm working with. The unit tests give me confidence that the modified code still works as expected after every change, at least for the cases they cover.
Compared to ad-hoc testing, writing unit tests doesn't even take any extra time. There is a learning curve to unit testing in Python, but being able to re-run what would have been ad-hoc tests 5, 10, or 50 times ultimately saves time during development.
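To make the "write some code, then write some unit tests" step concrete, here is a minimal sketch using Python's built-in `unittest` module. The function `normalize_scores` and its tests are hypothetical, made up purely for illustration; the point is that the checks I used to run once by hand now live in a file and can be re-run after every change.

```python
import unittest


def normalize_scores(scores):
    """Scale a list of numbers so they sum to 1.0 (hypothetical example function)."""
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not sum to zero")
    return [s / total for s in scores]


class NormalizeScoresTest(unittest.TestCase):
    """Each test captures a check that might otherwise be done ad hoc in a REPL."""

    def test_result_sums_to_one(self):
        result = normalize_scores([1, 1, 2])
        self.assertAlmostEqual(sum(result), 1.0)

    def test_preserves_proportions(self):
        self.assertEqual(normalize_scores([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_zero_total_raises(self):
        with self.assertRaises(ValueError):
            normalize_scores([0, 0])
```

Assuming this is saved as something like `test_normalize.py`, running `python -m unittest` from the same directory discovers and runs the tests. If I later rewrite `normalize_scores` for efficiency or readability, the same command tells me whether the behavior changed.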
I Take Notes
I maintain notes on a variety of subjects:
- Git/GitHub
  - Check out a new branch from an existing remote branch
  - How to set the upstream branch
  - etc.
- Python
  - Basics of unit testing
  - Common libraries I use a lot: mock, unittest, requests
  - etc.
- Pandas
  - How to do a groupby transform to add a new column
  - How to fill `na` values depending on other columns
  - etc.
- PySpark
  - How to read a single CSV file
  - PySpark Pipelines
  - etc.
- Specific Vendors
  - Each vendor and service has its own quirks, and I like to jot them down
- ML Algorithms
- OSX, bash, brew, conda, etc.
Each one is currently 1-3 pages and contains code snippets and/or explanations of things I've learned at work that apply to data science in general, i.e. no company secrets and absolutely no copy/pasted code that was developed at work.
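As a rough idea of what an entry in the Pandas notes might look like, here is a small sketch covering the two items listed above: filling `na` values from another column and using a groupby transform to add a new column. The DataFrame and its column names (`store`, `sales`, `forecast`) are invented for illustration and do not come from any real project.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "store": ["a", "a", "b", "b"],
        "sales": [10.0, np.nan, 30.0, 50.0],
        "forecast": [12.0, 20.0, 28.0, 45.0],
    }
)

# Fill missing `sales` values using the `forecast` value from the same row.
df["sales"] = df["sales"].fillna(df["forecast"])

# groupby + transform returns a result aligned to the original index,
# so it can be assigned directly as a new column.
df["store_mean_sales"] = df.groupby("store")["sales"].transform("mean")

print(df)
```

The reason a note like this is worth keeping is the difference between `transform` and a plain aggregation: `df.groupby("store")["sales"].mean()` would return one row per store, whereas `transform("mean")` returns one value per original row, which is what a new column needs.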
Conclusion
Over time this process has helped me to become a better programmer and faster at accomplishing routine tasks.
Hopefully there is something in this that is helpful for you!