Redmond Review

Will Big Data Be Big with Developers?

.NET developers are database developers. Whether using ADO.NET, the Entity Framework or data binding, .NET devs work with transactional data as a matter of course. But data analytics work is another matter. In fact, very few application and enterprise developers do analytics. Can Microsoft change that?

Microsoft opened the big data world to its ecosystem about a year ago with the announcement of its "Project Isotope" Hadoop on Windows initiative. A year later, though still in preview form, the technology has a brand (HDInsight) and significant integration with .NET and Visual Studio, and is clearly strategic to Microsoft. And developers are in the crosshairs: HDInsight was featured at BUILD, the flagship Microsoft developer conference.

Why does Microsoft think developers will take to analytics with big data now, when they didn't do so with business intelligence (BI) before? And given the overwhelming orientation of the big data world to Linux and Java, how does Microsoft expect to succeed in the space with Windows and .NET? At first glance, this looks to be a fool's errand. Is Microsoft naïve and tone deaf, or is it on to something?

Unboxing HDInsight
Before we judge whether developers will flock to HDInsight or shun it, let's get a sense of what the product is and what developer tools it features. HDInsight is based on the open source Apache Hadoop project, which provides processing and analysis of huge data sets (up to petabyte scale) by distributing the storage and compute workloads across numerous servers in a cluster. While this may sound straightforward -- and similar in principle to products such as SQL Server Parallel Data Warehouse (PDW) -- Hadoop can be pretty hard to work with.

Hadoop is natively queried through imperative Java code, using a two-pass approach called MapReduce. In this framework, a Map function first preprocesses the data, and a Reduce function then aggregates it. Multiple Mappers run in parallel across various nodes in the cluster, passing their output to multiple Reducer nodes to finish the work, also in parallel. A component included in most Hadoop distributions (including HDInsight) called "Pig" provides a data transformation language abstraction layer over Java-based MapReduce code. "Hive," another such component, provides a SQL-like abstraction over it.
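To make the two-pass idea concrete, here is a minimal in-memory sketch of a word-count job in Python. It is not Hadoop or HDInsight code: the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) and the sample data are hypothetical, and the "partitions" merely stand in for data blocks that a real cluster would hand to Mappers running in parallel on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: aggregate all values emitted for a single key."""
    return word, sum(counts)


def run_job(partitions):
    """Run the whole job sequentially; a cluster would run mappers in parallel."""
    # Each partition is an assumption standing in for the data held by one node.
    mapped = chain.from_iterable(
        map_words(line) for partition in partitions for line in partition
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample_partitions = [
        ["big data is big", "data is data"],
        ["developers like data"],
    ]
    print(run_job(sample_partitions))
```

The same aggregation written against a SQL-like layer would amount to a single grouped count (roughly `SELECT word, COUNT(*) ... GROUP BY word`), which gives a sense of why abstractions such as Hive are attractive to developers who already think in queries.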

What does Microsoft bring to the party? With HDInsight, developers can write MapReduce code in C# instead of Java, or use a LINQ provider to manipulate MapReduce indirectly through Hive. A NuGet package provides the C# MapReduce support, and a single-node developer version of HDInsight allows local debugging of such code in Visual Studio. A command-line utility deploys the compiled assembly to the local Hadoop instance. Deployment directly from Visual Studio to remote clusters, including the Windows Azure HDInsight implementation, seems a safe bet for future releases.

Bringing Hadoop to Windows (including to developers' own PCs) and providing integration and debugging support for C# and LINQ is a neat trick. It goes a long way toward making Hadoop an enterprise developer-friendly technology. Microsoft's alternate JavaScript-based framework for MapReduce code makes it friendly to Node.js and JavaScript developers, too. But will HDInsight appeal to the Linux- and Java-focused big data pros out there? Probably not, but therein lies the real value.

A Bigger Tent
Big data is a huge industry phenomenon right now, but the "data scientists" and MapReduce developers that enable its implementation are an exclusive bunch. These professionals are in short supply, and they don't come cheap. In other words, big data is a specialty at the height of its hype cycle, ripe for disruption.

We've seen this move before. Microsoft democratized Windows development with Visual Basic, enterprise development with .NET, relational database development with SQL Server and BI with a combination of that product plus SharePoint and Office. Every time Microsoft has disrupted an elite specialization, it's done so with developers in its ecosystem. Now it's trying again with big data and HDInsight.

Hadoop is different from past disrupted areas, though, because it's already developer-focused. But the developers who typify the Hadoop faithful right now work in lab environments -- whether in academic organizations, big Internet companies or startups. Even in the enterprise, big data practitioners work in lab-like organizations; they're not, by and large, typical developers from IT and business units.

But for big data to be big, that needs to change; the skill set needs to be ubiquitous and mainstream. Business developers are database developers. Microsoft thinks they can be big data developers too. And if they're also Windows client/Phone/Server/Azure developers, that would be "big" for Redmond, indeed.

About the Author

Andrew Brust is Founder and CEO of Blue Badge Insights, an analysis, strategy and advisory firm serving Microsoft customers and partners. Brust is also a Microsoft Regional Director and MVP; an advisor to the New York Technology Council; and co-author of "Programming Microsoft SQL Server 2012" (Microsoft Press, 2012). A frequent speaker at industry events, Brust is co-chair of the Visual Studio Live! family of conferences and a contributing editor to Visual Studio Magazine. Brust has been a participant in the Microsoft ecosystem for over 20 years, and has worked closely with both Microsoft's Redmond-based corporate team and its field organization for much of the last 15. He is a member of several "insiders" groups that supply him with insight around important technologies out of Redmond. Follow Brust on Twitter @andrewbrust.
