Introduction

The paper that you're reading is a short memoir of my journey into data science over the past six years - hardly the blink of an eye! From first dabbling in text mining in 2012 to handling multiple large and complex data sets in highly strategic projects, I have run into a huge number of challenges. This paper shares what I have learned and offers simple solutions to several analytical and technical challenges faced by most mid-sized companies working in the insights and strategy sphere - companies that don't have the kind of large-scale IT infrastructure support often taken for granted in big technology companies. A product-based tech company typically solves a single problem by building one specific product, whereas every project we do is different, with new questions every single time that require tailored solutions.

I don't claim that the solutions I'll talk about will work in all scenarios, but I do believe that the underlying concepts can be modified and replicated in different contexts. The processes I discuss require minimal infrastructure support, and that approach does have its limits. However, if new developments in big data handling are added to the mix, the processes can be improved further. Finally, I can in no way say that what we've done is the only or best solution, but I do feel that approaches like the ones we're using can effectively improve day-to-day handling of mid-size data (around 30 GB) in companies that don't have a large IT department or big data tech in place.

Background