A 2011 McKinsey & Co. survey pointed out that many organizations lack both the skilled personnel needed to mine big data for insights and the structures and incentives required to use big data to make informed decisions and act on them.
Big data is a mixture of distributed data architectures and tools like Hadoop, NoSQL, Hive and R. Data scientists serve as the gatekeepers and mediators between these systems and the people who run the business – the domain experts.
The data scientist serves three main roles: data architecture, machine learning, and analytics. While these roles are important, not every company actually needs the kind of highly specialized data team you’d find at Google or Facebook.
Most of the standard challenges that call for big data, like recommendation engines and personalization systems, can be abstracted out. Feature creation, however, is domain-specific, and even it could be templatized on a per-domain basis. What if domain experts could directly encode their ideas and representations of their domains into the system, bypassing the data scientist as middleman and translator?
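As a rough sketch of what templatized feature creation might look like, consider a small registry of reusable feature templates where the domain expert contributes only parameter choices, not pipeline code. All names here (`FeatureTemplate`, `ratio_template`, the field names) are illustrative assumptions, not an existing library:

```python
# Hypothetical sketch: templatized feature creation that a domain expert
# could drive by supplying parameters instead of writing pipeline code.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class FeatureTemplate:
    """A reusable feature pattern; domain experts fill in its parameters."""
    name: str
    compute: Callable[[Dict[str, float]], float]

def ratio_template(numerator: str, denominator: str) -> FeatureTemplate:
    """Generic ratio feature, e.g. clicks per impression."""
    def compute(record: Dict[str, float]) -> float:
        denom = record.get(denominator, 0)
        return record.get(numerator, 0) / denom if denom else 0.0
    return FeatureTemplate(f"{numerator}_per_{denominator}", compute)

# The domain expert encodes domain knowledge purely as parameter choices:
ctr = ratio_template("clicks", "impressions")
print(ctr.name, ctr.compute({"clicks": 5, "impressions": 200}))
```

The data team would maintain the templates and the plumbing behind them; the domain expert's contribution is the declaration of which quantities matter in their domain.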