Over the next five years, the marketing department will spend more on software than any other area of the organization. According to IDC, total spending is expected to reach $32.3 billion in 2018, up from $20.2 billion in 2014. It’s one of the fastest-growing areas in business tech spending, with a projected annual growth rate of 12.4%.
So if you’re a CMO investing in technology to run the revenue engine, how can you identify which solutions are built for long-term innovation?
The simple answer: find out whether or not their data science and engineering teams use Spark.
Before I attempt to explain the importance of Spark, it bears noting that I’m about the farthest thing from an engineer in the Radius organization; I manage content marketing. Spark may not be a part of my daily life, but the tools that Spark enables are changing the way all of us market – and making our jobs both more interesting and more fun. Spark empowers some of the most innovative technology solutions, such as Bizo, Autodesk, Kenshoo, VideoAmp, Shopify, Taboola, Ooyala, Radius, and lots more, which allow marketers to create beautiful campaigns that also drive revenue. If that sounds like an oxymoron to you, you can begin to see the power behind Spark.
Apache Spark is an engine for large-scale data processing that operates at unprecedented speeds, and it was built specifically to empower data scientists and engineers.
Spark was conceived as an improvement to the wildly popular Hadoop framework. Hadoop itself was created to compute on large data sets with cheap, commoditized hardware. This is done by performing simple computational tasks that are easy to distribute over the many machines in a cluster. The issue here is that processing is done by sending and receiving data to and from the hard disk in every step, which is one of the slowest tasks a computer can perform. The renowned database researcher Jim Gray used the following analogy for data access times: if accessing data on the processor is equivalent to accessing information from your head, then accessing data in RAM is equivalent to driving the 1.5 hours from Berkeley to Sacramento to retrieve it, and accessing data on disk is equivalent to taking a two-year space shuttle trip to Pluto to retrieve it!
Enter Spark, which eliminates the back and forth between the data and the disk. With Spark, data transformations stay in RAM, and if you have a multi-step process, you keep as many things in memory as possible. Using Spark, engineers can process increasingly complex problems with massive data sets at ten times the speed they can with just Hadoop.
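For the technically curious, here is a toy sketch of that difference in plain Python. It is not Spark or Hadoop code, and every name in it (`STEPS`, `run_with_disk`, `run_in_memory`) is made up for illustration. It simply runs the same three-step pipeline twice: once writing each intermediate result to disk and reading it back, and once keeping everything in memory.

```python
import json
import os
import tempfile

# Hypothetical three-step pipeline; each step stands in for a real transformation.
STEPS = [
    lambda rows: [r * 2 for r in rows],             # transform every value
    lambda rows: [r for r in rows if r % 3 == 0],   # filter
    lambda rows: [sum(rows)],                       # aggregate
]


def run_with_disk(rows, workdir):
    """Write each intermediate result to disk and read it back before the next step."""
    disk_round_trips = 0
    for i, step in enumerate(STEPS):
        rows = step(rows)
        path = os.path.join(workdir, f"step_{i}.json")
        with open(path, "w") as f:
            json.dump(rows, f)
        with open(path) as f:
            rows = json.load(f)
        disk_round_trips += 1
    return rows, disk_round_trips


def run_in_memory(rows):
    """Chain the same steps while keeping every intermediate result in RAM."""
    for step in STEPS:
        rows = step(rows)
    return rows, 0


data = list(range(10))
with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_trips = run_with_disk(data, workdir)
memory_result, memory_trips = run_in_memory(data)

print(disk_result, disk_trips)      # same answer, three disk round trips
print(memory_result, memory_trips)  # same answer, no disk round trips
```

Both versions produce the same answer. The only difference is how many times the data travels to the disk and back, and on real data sets that trip is where the time goes.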
Spark isn’t just fast; it’s much easier to use. It frees developers from the constraints of simple data processing primitives that Hadoop requires by offering a much richer set of data transformations. Before Radius adopted Spark, our Lead Data Engineer was the only person in the company who could run queries on the database. Now we not only have the whole engineering team running multiple data processing tasks a day, but we even have our business development team processing data and creating dashboard visualizations.
You can solve more exciting problems using Spark and Hadoop than you can using Hadoop alone. Top talent clusters around companies that use top technology. When a top-tier data engineer looks for a job, whether or not the company uses Spark has an impact on his or her decision.
The amount of data companies today amass is staggering. According to Scott Brinker, marketing technology expert, the next generation of marketing technology will revolve around companies built on data. The maturity of the big data ecosystem has spawned a generation of companies using Spark to build interactive applications and experiences powered by innovative algorithms.
We marketers love to talk about being data-driven, but we’re not very good at understanding data. If we were, we probably wouldn’t be marketers. If you want to run real data-driven marketing campaigns – with results that go beyond open rates and customer insights – you should be using applications powered by Spark. Spark enables a new generation of technology tools that don’t just analyze campaign performance; they actually predict it. That’s when marketing starts to get really good.
“Spark is a big machine that’s evolving very quickly,” Patrick Wendell, Databricks
Bizo, acquired by LinkedIn in 2014, allows marketers to target display campaigns to specific business demographic audiences. As a part of LinkedIn, Bizo helps marketers better target ads on the social network.
Bizo uses Spark to allow users to compare behavior of website visitors based on whether or not they’ve been exposed to a Bizo display ad.
Ooyala offers video solutions for media companies, enterprises, broadcasters, and operators. Ooyala’s applications apply data analytics so customers can deliver personalized viewing experiences on any screen.
Ooyala processes over two billion video events a day. Ooyala uses Spark to turn all that data into digestible, actionable insights about viewing experiences for customers.
Conviva helps customers optimize the end-user online video experience by providing analytics for online video providers and distributors, and by enabling multi-CDN infrastructure with traffic-shaping control.
Conviva uses Spark to make video streaming of web content seamless for viewers – with minimal buffering.
Interested in learning more about Spark? Our data science team shares the lessons they’ve learned in their transition from Hadoop to Spark.