DataRaccoon


Welcome! Here to explore or learn?

Why did I start this site? Read the TLDR or the Longer Tale to find out!

Since my undergraduate days, I have been fond of digitizing my summarized notes in order to:

  1. Aid my personal understanding of a topic.
  2. Assist my recollection. Since the notes are digital, they are easy to search and edit!
  3. Share my notes with my peers to aid their exam preparation. In return, they help me spot errors, suggest changes, or share new insights and knowledge.

In a way, this site is my attempt at recreating something similar as a working professional, and my contribution back to the industry.

I hope you enjoy your stay here and are able to learn a few things along the way, take some action, and hopefully benefit the world!

The Data/Machine Learning industry is vast, diverse and constantly evolving. While it is largely misunderstood, it also provides constant learning as well as interesting and unique challenges.

On the point of being largely misunderstood, I personally resonate most with Dr. Cassie Kozyrkov's [1] way of describing it.

Paraphrasing her, in my own words, this is what I think:

Fundamentally, it is about making decisions, which lead to actions, which in turn create impact.

If you have watched her videos [2], you will see that she splits it into 3 broad categories:

  1. Performance (Engineering),
  2. Rigor (Inference),
  3. Speed (Analytics).

In each of these 3 pillars, the practices, tools and processes (among other things), and even the types of people, can be quite different! I hope to share my perspectives and experiences in each of these areas.


Because the nature of work differs so much across the pillars, learning the skills of another pillar can be quite tough. Finding the right resource (online or otherwise) can sometimes be a feat on its own. Often, I learn just enough to solve the problem at hand and almost immediately move on to other objectives.

This approach poses a problem in the following situations:

  1. The challenge repeats itself (a similar problem, but in a different domain).
  2. The original challenge needs to be revisited after a new discovery.

In both cases, I realize I have lost my intuition and justifications for my initial approach, and I have to retrace my learning curve in order to proceed.

Keeping up with new tools and updating oneself on new best practices can be quite demanding (this is generally true in technology, not just data). In some sense, you wish there existed some form of transfer learning [3] that could be applied!

Common FAQ

Footnotes


  1. A short (5-minute) video from Strata in which Dr. Cassie Kozyrkov describes 'Data Science'.

  2. If you are more interested or have time to spare, this other video by the Society of Decision Professionals, which also features Dr. Cassie Kozyrkov, is really good.

  3. This is actually a pun on neural networks, where you reuse a trained network and remove or retrain its last few layers for a new task. In this context, it is about leveraging your background to learn a new vertical.