Tuesday, November 17, 2015

Spirent leverages big data to keep user experience quality a winning factor for telcos

The next BriefingsDirect big-data case study discussion explores the ways that Spirent Communications advances the use of big data to provide improved user experiences for telecommunications operators.

We'll learn how advanced analytics that draw on multiple data sources provide Spirent’s telco customers with rapid insights into their networks and operations. That insight, combined with analysis of user actions and behaviors, provides a "total picture" approach to telco services and uses that both improves the actual services proactively and boosts the ability to support help desks.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. 

Spirent’s insights thereby help operators in highly competitive markets reduce the spend on support, reduce user churn, and better adhere to service-level agreements (SLAs), while providing significant productivity gains.

To hear how Spirent uses big data to make major positive impacts on telco operations, we're joined by Tom Russo, Director of Product Management and Marketing at Spirent Communications in Matawan, New Jersey. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: User experience quality enhancement is essential, especially when we're talking about consumers that can easily change carriers. Controlling that experience is more challenging for an organization like a telco. They have so many variables across networks. So at a high level, tell me how Spirent masters complexity using big data to help telcos maintain the best user experience.

Russo: Believe it or not, historically, operators haven't actually managed their customers as much as they've managed their networks. Even within the networks, they've done this in a fairly siloed fashion.

There would be radio performance teams that would look at whether the different cell towers were operating properly, giving good coverage and signal strength to the subscribers. As you might imagine, they wouldn't talk to the core network people, who would make sure that people can get IP addresses and properly transmit packets back and forth. They had their own tools and systems, which were separate, yet again, from the services people, who would look at the different applications. You can see where it’s going.

There were also customer-care people, who had their own tools and systems that didn’t leverage any of that network data. It was very inefficient, and not wrapped around the customer or the customer experience.

New demands

They sort of got by with those systems when the networks weren't running too hot. When competition wasn't too fierce, they could get away with that. But these days, with their peers offering better quality of service, over-the-top threats, increasing complexity on the network in terms of devices, and application services, it really doesn't work any more.

It takes too long to troubleshoot real customer problems. They spend too much time chasing down blind alleys in terms of solving problems that don't really affect the customer experience, etc. They need to take a more customer-centric approach. As you’d imagine that’s where we come in. We integrate data across those different silos in the context of subscribers.

We collect data across those different silos -- the radio performance, the core network performance, the provisioning, the billing etc. -- and fuse it together in the context of subscribers. Then, we help the operator identify proactively where that customer experience is suffering, what we call hotspots, so that they can act before the customers call and complain, which is expensive from a customer-care perspective and before they churn, which is very expensive in terms of customer replacement. It's a more customer-centric approach to managing the network.

Gardner: So your customer experience management does what your customers had a difficult time doing internally. But one aspect of this is pulling together disparate data from different sources, so that you can get the proactive inference and insights. What did you do better around data acquisition?

Russo: The first key step is being able to integrate with a variety of these different systems. Each of the groups had their different tools, different data formats, different vendors.

Our solution has a very strong extract, transform, load (ETL) capability, or what we call data mediation, to pull all these different data sources together and map them into a uniform model of the telecom network and the subscriber experience.
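As an editorial illustration of the data mediation idea described above (not Spirent's actual implementation), the following minimal Python sketch pulls rows from three silos, renames their fields to common names, and loads them into one record per subscriber. The source names, field names, and the `SubscriberRecord` class are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberRecord:
    """Uniform, subscriber-centric view built from several silos (hypothetical model)."""
    subscriber_id: str
    radio: dict = field(default_factory=dict)
    core: dict = field(default_factory=dict)
    billing: dict = field(default_factory=dict)


# Hypothetical per-source field mappings: each silo names things differently.
FIELD_MAPS = {
    "radio": {"imsi": "subscriber_id", "rsrp_dbm": "signal_dbm"},
    "core": {"sub_id": "subscriber_id", "pkt_loss_pct": "packet_loss_pct"},
    "billing": {"account": "subscriber_id", "plan_name": "plan"},
}


def transform(source, raw):
    """Rename one raw row's fields to the uniform names for its source."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in raw.items() if k in mapping}


def mediate(feeds):
    """Extract rows from every feed, transform them, and load them per subscriber."""
    model = {}
    for source, rows in feeds.items():
        for raw in rows:
            row = transform(source, raw)
            sub_id = row.pop("subscriber_id")
            record = model.setdefault(sub_id, SubscriberRecord(sub_id))
            # Each source fills its own section of the shared record.
            getattr(record, source).update(row)
    return model


feeds = {
    "radio": [{"imsi": "001", "rsrp_dbm": -110}],
    "core": [{"sub_id": "001", "pkt_loss_pct": 2.5}],
    "billing": [{"account": "001", "plan_name": "unlimited"}],
}
model = mediate(feeds)
print(model["001"])
```

Keying everything on the subscriber is what lets later analysis relate network measurements to outcomes for the same person.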

This allows us to see the connections between the subscriber experience, the underlying network performance and even things like outcomes -- whether people churn, whether they provide negative survey responses, whether they've called and complained to customer care, etc.

Then, with that holistic model, we can build high-level metrics like quality of experience scores, predictive models, etc. to look across those different silos, help the operators see where the hot spots of customer dissatisfaction are, where people are going to eventually churn, or where other costs are going to be incurred.
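To make the idea of a quality of experience score and a hotspot concrete, here is a small editorial sketch in Python. The scoring formula, its weights, the value ranges, and the threshold are illustrative assumptions, not metrics described in the interview.

```python
from collections import defaultdict
from statistics import mean


def qoe_score(signal_dbm, packet_loss_pct):
    """Toy 0-100 quality-of-experience score; weights and ranges are illustrative only."""
    # Map signal strength from [-120, -70] dBm to [0, 1], clamped.
    signal = min(max((signal_dbm + 120) / 50, 0.0), 1.0)
    # Map packet loss from [0, 5] percent to [1, 0], clamped.
    loss = min(max(1 - packet_loss_pct / 5, 0.0), 1.0)
    return 100 * (0.5 * signal + 0.5 * loss)


def find_hotspots(sessions, threshold=50.0):
    """Group session scores by cell and return cells whose mean score is below threshold."""
    by_cell = defaultdict(list)
    for s in sessions:
        by_cell[s["cell"]].append(qoe_score(s["signal_dbm"], s["packet_loss_pct"]))
    return sorted(cell for cell, scores in by_cell.items() if mean(scores) < threshold)


sessions = [
    {"cell": "A", "signal_dbm": -75, "packet_loss_pct": 0.5},
    {"cell": "A", "signal_dbm": -80, "packet_loss_pct": 1.0},
    {"cell": "B", "signal_dbm": -115, "packet_loss_pct": 4.0},
    {"cell": "B", "signal_dbm": -110, "packet_loss_pct": 3.5},
]
hotspots = find_hotspots(sessions)
print(hotspots)
```

The same grouping could be done by device model, application, or network element instead of by cell.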

Gardner: Before we go more deeply into this data issue, tell me a bit more about Spirent. Is the customer experience division the only part? Tell me about the larger company, just so we have a sense of the breadth and depths of what you offer.

World leader

Russo: Most people, at least in telecom, know Spirent as a lab vendor. Spirent is one of the world leaders in the markets for simulating, emulating, and testing devices, network elements, applications, and services, as they go from the development phase to the launch phase in their lifecycle. Most of their products focus on that, the lab testing or the launch testing, making sure that devices are, as we call it, "fit for launch."

Spirent has historically had less of a presence in the live network domain. In the last year or two, they’ve made a number of strategic acquisitions in that space. They’ve made a number of internal investments to leverage the capabilities and knowledge base that they have from the lab side into the live network.

One of those investments, for example, was an acquisition back in early 2014 of DAX Technologies, a leading customer experience management vendor. That acquisition, plus some additional internal investments, has led to the growth of our Customer Experience Management (CEM) Business Unit.

Gardner: Tom, tell me some typical use cases where your customers are using Spirent in the field. Who are those that are interacting with the software? What is it that they're doing with it? What are some of the typical ways in which it’s bringing value there?

Russo: Basically, we have two user bases that leverage our analytics. One is the customer-care groups. What they're trying to do is obtain, very quickly, a 360-degree view of the experience that a subscriber is seeing -- who is calling in and complaining about their service and the root causes of problems that they might be having with their services.

If you think about the historic operation, this was a very time-intensive, costly operation, because they would have to swivel chair, as we call it, between a variety of different systems and tools trying to figure out whether I had a network-related issue, a provisioning issue, a billing issue, or something else. These all could potentially take hours, even hundreds of hours, to resolve.

With our system, the customer-care groups have one single pane of glass, one screen, to see all aspects of the customer experience to very quickly identify the root causes of issues that they are having and resolve them. So it keeps customers happier and reduces the cost of the customer-care operation.

The second group that we serve is on the engineering side. We're trying to help them identify hotspots of customer dissatisfaction on the network, whether that be in terms of devices, applications, services, or network elements so that they can prioritize their resources around those hotspots, as opposed to noisy, traditional engineering alarms. The idea here is that this allows them to have maximal impact on the customer experience with minimal costs and minimal resources.

Gardner: You recently rolled out some new and interesting services and solutions. Tell us a little bit about that.

Russo: We’ve rolled out the latest iteration of our InTouch solution, our flagship product. It’s called InTouch Customer and Network Analytics (CNA) and it really addresses feedback that we've received from customers in terms of what they want in an analytic solution.

We're hearing that they want to be more proactive and predictive. Don’t just tell me what's going on right now, what’s gone on historically, how things have trended, but help me understand what’s going to happen moving forward: where our customers are going to complain, and where the network is going to experience performance problems in the future. That's an increasing area of focus for us and something that we've embedded to a great degree in the InTouch CNA product.

More flexibility

Another thing that they've told us is that they want to have more flexibility and control on the visualization and reporting side. Don't just give me a stock set of dashboards and reports and have me rely on you to modify those over time. I have my own data scientists, my own engineers, who want to explore the data themselves.

We've embedded Tableau business intelligence (BI) technology into our product to give them maximum flexibility in terms of report authorship and publication. We really like the combination of Tableau and Hewlett Packard Enterprise (HPE) Vertica because it allows them to be able to do those ad-hoc reports and then also get good performance through the Vertica database.

And another thing that we are doing more and more is what we call Closed Loop Analytics. It's not just identifying an issue or a customer problem on the network, but it's also being able to trigger an action. We have an integration and partnership with another business unit in Spirent called Mobilethink that can change device settings for example.

If we see a device is mis-provisioned, we can send an alert to Mobilethink, and they can re-provision the device to correct something like a mis-provisioned access point name (APN) and resolve the problem. Then, we can use our system to confirm indeed that the fix was made and that the experience has improved.
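The detect, act, and confirm loop described above can be sketched as follows. This is an editorial illustration only: `FakeProvisioner`, `EXPECTED_APN`, and the device dictionary are hypothetical stand-ins, not the Mobilethink interface.

```python
EXPECTED_APN = "internet.example"  # hypothetical correct APN for the plan


class FakeProvisioner:
    """Stand-in for an external device-management system (not a real API)."""

    def __init__(self, devices):
        self.devices = devices

    def reprovision(self, device_id, apn):
        self.devices[device_id]["apn"] = apn


def closed_loop(devices, provisioner):
    """Detect mis-provisioned APNs, trigger a fix, then confirm the fix took effect."""
    report = {}
    for device_id, settings in devices.items():
        if settings["apn"] == EXPECTED_APN:
            continue  # nothing to fix for this device
        provisioner.reprovision(device_id, EXPECTED_APN)  # act
        # Verify by re-reading the device state after the action.
        report[device_id] = devices[device_id]["apn"] == EXPECTED_APN
    return report


devices = {
    "d1": {"apn": "wrong.apn"},
    "d2": {"apn": "internet.example"},
}
report = closed_loop(devices, FakeProvisioner(devices))
print(report)
```

The verification step is what closes the loop: the report records whether each triggered fix was actually observed afterward.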

Gardner: It’s clear to me, Tom, how we can get great benefits from doing this properly and how the value escalates the more data and the more information you get, and the better you can serve those customers. Let's drill down a bit into how you can make this happen. As far as data goes, are we talking about 10 different data types, 50? Given the stream and the amount of data that comes off of a network, what size data are we talking about and how do you get a handle on that?

Russo: In our largest deployment, we're talking about a couple of dozen different data sources and a total volume of data on the order of 50 to 100 billion transactions a day. So, it’s large volume, especially on the transactional side, and high variety. In terms of what we're talking about, it’s a lot of machine data. As I mentioned before, there is the radio performance, core network performance, and service performance type of information.
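For a sense of scale, the quoted daily volume can be converted to an average per-second rate. This is editorial arithmetic that assumes the load is spread evenly across the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(transactions_per_day):
    """Average rate, assuming transactions are spread evenly over 24 hours."""
    return transactions_per_day / SECONDS_PER_DAY


low = per_second(50e9)
high = per_second(100e9)
print(f"{low:,.0f} to {high:,.0f} transactions per second")
```

That works out to roughly 580,000 to 1.16 million transactions per second on average.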

We also look at things like whether you're provisioning correctly for the services that you're trying to interact with. We look at your trouble ticket history to try and correlate things like network performance and customer care activity. We will look at survey data, net promoter score (NPS) type information, billing churn, and related information.

We're trying to tie it all together, everything from the subscriber transactions and experience to the underlying network performance, again to the outcome type information -- what was the impact of the experience on your behavior?

Gardner: What specifically is your history with HPE Vertica? Has this been something that's been in place for some time? Did you switch to it from something else? How did that work out?

Finishing migration

Russo: Right now, we're finishing the migration to HPE Vertica technology, and it will be embedded in our InTouch CNA solution. There are a couple of things that we like about Vertica. One is the price-performance aspects. The columnar lookups, the projections, give us very strong query response performance, but it's also able to run on commodity hardware, which gives us a price advantage that's also bolstered by the columnar compression.

So price performance-wise and maturity-wise we like it. It’s a field-proven, tested solution. There are some other features in terms of strong Hadoop integration that we like. A lot of carriers will have their own Hadoop clusters, data oceans, etc. that they want us to integrate with. Vertica makes that fairly straightforward, and we like a lot of the embedded analytics as well, the Distributed R capability for predictive analytics and things along those lines.

Gardner: It occurs to me that the effort that you put into this at Spirent and being able to take vast amounts of data across a complex network and then come out with these analytic benefits could be extended to any number of environments. Is there a parallel between what you are doing with mobile and telco carriers that could extend to maybe networks that are managing the Internet of Things (IoT) types of devices?

Russo: Absolutely. We're working with carriers on IoT already. The requirements that these things have in terms of the performance that they need to operate properly are different than that of human beings, but nevertheless, the underlying transactions that have to take place, the ability to get a radio connection and set up an IP address and communicate data back and forth to one another and do it in a robust reliable way, is still critical.

We definitely see our solution helping operators who are trying to be IoT platform providers to ensure the performances of those IoT services and the SLAs that they have for them. We also see a potential use for our technology going a step further into the vertical IoT applications themselves in doing, for example, predictive analytics on sensor data itself. That could be a future direction for us.

Gardner: Any words of wisdom for folks that are starting to deal with large data volumes across a wide variety of sources and are looking also for that more real-time analytics benefit? Any lessons learned that you could share from where Spirent has been and gone for others that are going to be facing some of these same big data issues?

Russo: It's important to focus on the end-user value and the use cases as opposed to the technology. So, we never really focus on getting data for the sake of getting data. We focus more on what problem a customer is trying to solve and how we can most simply and elegantly solve it. That steered us clear of jumping on the latest and greatest technology bandwagons, instead going with the proven technologies and leveraging our subject-matter expertise.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Thursday, November 12, 2015

Powerful reporting from YP's data warehouse helps SMBs deliver the best ad campaigns

The next BriefingsDirect big-data innovation case study highlights how Yellow Pages (YP) has developed a massive enterprise data warehouse with near real-time reporting capabilities that pulls oceans of data and information from across new and legacy sources.

We explore how YP then continuously delivers precise metrics to over half a million paying advertisers -- many of them SMBs and increasingly through mobile interfaces -- to best analyze and optimize their marketing and ad campaigns.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more, BriefingsDirect recently sat down with Bill Theisinger, Vice President of Engineering for Platform Data Services at YP in Glendale, California. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about YP, the digital arm of what people would have known as Yellow Pages a number of years ago. You're all about helping small businesses become better acquainted with their customers, and vice versa.

Theisinger: YP is a leading local marketing solutions provider in the U.S., dedicated to helping local businesses and communities grow. We help connect local businesses with consumers wherever they are and whatever device they are on, desktop and mobile.

Gardner: As we know, the world has changed dramatically around marketing and advertising and connecting buyers and sellers. So in the digital age, being precise, being aware, being visible is everything, and that means data. Tell us about your data requirements in this new world.

Theisinger: We need to be able to capture how consumers interact with our customers, and that includes where they interact -- whether it’s a mobile device or web device -- and also within our network of partners. We reach about 100 million consumers across the U.S. and we do that through both our YP network and our partner network.

Gardner: Tell us too about the evolution. Obviously, you don’t build out data capabilities and infrastructure overnight. Some things are in place, and you move on, you learn, adapt, and you have new requirements. Tell us your data warehouse journey.

Needed to evolve

Theisinger: Yellow Pages saw the shift of their print business moving heavily online and becoming heavily digital. We needed to evolve with that, of course. In doing so, we needed to build infrastructure around the systems that we were using to support the businesses we were helping to grow.

And in doing that, we started to take a look at what the systems requirements were for us to be able to report and message value to our advertisers. That included understanding where consumers were looking, what we were impressing to them, what businesses we were showing them when they searched, what they were clicking on, and, ultimately what businesses they called. We track all of those different metrics.

When we started this adventure, we didn't have the technology and the capabilities to be able to do those things. So we had to reinvent our infrastructure. That’s what we did.

Gardner: And as we know, getting more information to your advertisers to help them in their selection and spending expertise is key. It differentiates companies. So this is a core proposition for you. This is at the heart of your business.

Given the mission criticality, what are the requirements? What did you need to do to get that reporting, that warehouse capability?

Theisinger: We need to be able to scale to the size of our network and the size of our partner network, which means no click left behind, if you will, no impression untold, no search unrecognized. That's billions of events we process every day. We needed to look at something that would help us scale. If we added a new partner, if we expanded the YP network, if we added hundreds, thousands, tens of thousands of new advertisers, we needed the infrastructure to be able to help us do that.

Gardner: I understand that you've been using Hadoop. You might be looking at other technologies as they emerge. Tell us about your Hadoop experience and how that relates to your reporting capabilities.

Theisinger: When I joined YP, Hadoop was a heavy buzz product in the industry. It was a proven product for helping businesses process large amounts of unstructured data. However, it still poses a problem. That unstructured data needs to be structured at some point, and it’s that structure that you report to advertisers and report internally.

That's how we decided that we needed to marry two different technologies -- one that will allow us to scale a large unstructured processing environment like Hadoop and one that will allow us to scale a large structured environment like Hewlett Packard Enterprise (HPE) Vertica.
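The division of labor described above, raw events processed first and then given structure for reporting, can be sketched in miniature. This editorial Python example uses an invented log format and plain in-memory structures; it does not use Hadoop or Vertica, and the field names are assumptions.

```python
import re
from collections import Counter

# Hypothetical raw log format; real event logs will differ.
LINE_RE = re.compile(
    r"ts=(?P<ts>\S+) advertiser=(?P<advertiser>\S+) "
    r"event=(?P<event>search|impression|click|call)"
)


def structure(lines):
    """Turn raw log lines into structured rows, skipping lines that do not parse."""
    rows = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


def daily_metrics(rows):
    """Count events per (advertiser, event type), the shape a report would query."""
    return Counter((r["advertiser"], r["event"]) for r in rows)


raw = [
    "ts=2015-11-12T10:00:00 advertiser=plumber42 event=impression ua=mobile",
    "ts=2015-11-12T10:00:05 advertiser=plumber42 event=click ua=mobile",
    "garbled line with no fields",
    "ts=2015-11-12T10:01:00 advertiser=plumber42 event=call",
    "ts=2015-11-12T10:02:00 advertiser=florist7 event=impression",
]
rows = structure(raw)
metrics = daily_metrics(rows)
print(metrics)
```

In a real deployment the parsing stage would run as distributed jobs and the counts would live in database tables, but the two-stage shape is the same.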

Business impact

Gardner: How has this impacted your business, now that you've been able to do this and it's been in the works for quite a while? Any metrics of success or anecdotes that can relate back to how the people in your organization are consuming those metrics and then extending that as service and product back into your market? What has been the result?

Theisinger: We have roughly 10,000 jobs that we run every day, both to process data and also for analytics. That data represents about five to six petabytes of data that we've been able to capture about consumers, their behaviors, and activities. So we process that data within our Hadoop environment. We then pass that along into HPE Vertica, structure it in a way that we can have analysts, product owners, and other systems retrieve it, pull and look at those metrics, and be able to report on them to the advertisers.

Gardner: Is there an automation to this as you look to present more and better analytics on top of Vertica? What are you doing to make that customizable to people based on their needs, but at the same time, controlled and managed so that it doesn't become unwieldy?

Theisinger: There is a lot of interaction between customers, both internal and external, when we decide how and what we’re going to present in terms of data, and there are a lot of ways we do that. We present data externally through an advertiser portal. So we want to make sure we work very closely with human factors and ergonomics (HFE) and the user experience (UX) designers as well as our advertisers, through focus groups, workshops, and understanding what they want to understand about the data that we present them.

Then, internally, we decide what would make sense and how we feel comfortable being able to present it to them, because we have a universe of a lot more data than what we probably want to show people.

We also do the same thing internally. We've been able to provide various teams internally, whether it's sales, marketing, or finance, insights into who's clicking on various business listings, who's viewing various businesses, who’s calling businesses, what their segmentation is, and what their demographics look like, and it allows us a lot of analytical insight. We do most of that work through the analytics platforms, which is, in this case, HPE Vertica.

Gardner: Now, that user experience is becoming more and more important. It wasn't that long ago when these reports were going to people who were data scientists or equivalent, but now we're taking them out to those 600,000 small businesses. Can you tell us a little bit about lessons learned when it comes to delivering an end analytics product, versus building out the warehouse? They seem to be interdependent but we're seeing more and more emphasis on that user experience these days.

Theisinger: You need to bridge the gap between analytics and just data storage and processing. So you have to present them in-state. This is what happens. It’s very descriptive of what's going on, and we try to be a little bit more predictive when it comes to the way we want to do analysis at YP. We're looking to go beyond just descriptive analytics.

What has also changed is the platform by which you present the data. It's going highly mobile. Small businesses need to be able to just pick up their mobile device and look at the effectiveness of their campaigns with YP. They're able to do that through a mobile platform we’ve built called YP for Merchants.

They can log in and see their metrics that are core to their business and how those campaigns are performing. They can even see some details, like if they missed a phone call and they want to be able to reach back out to a consumer and see if they need to help, solve a problem, or provide a service.

Developer perspective

Gardner: And given that your developers had to go through the steps of creating that great user experience and taking it to the mobile tier, was there anything about HPE Vertica, your warehouse, or your approach to analytics that made that development process easier? Is there an approach to delivering this from a developer perspective that you think others might learn from?

Theisinger: There is, and it takes a lot more people than just the analytics team in my group or the engineers in my team. It’s a lot of other teams within YP that build this. But first and foremost, people want to see the data as real time and as near real time as they can.

When a small business relies on contact from customers, we track those calls. When a potential customer calls a small business and that small business isn’t able to actually get to the call or respond to that customer because maybe they are on a job, it's important to know that that call happened recently. It's important for that small business to reach back out to the consumer, because that consumer could go somewhere else and get that service from a competitor.

To be able to do that as quickly as possible is a hard-and-fast requirement. So processing the data as quickly as you can and presenting that, whether it be on a mobile device, in this case, as quickly as you can is definitely paramount to making that a success.

Gardner: I've spoken to a number of people over the years and one of the takeaways I get is that infrastructure is destiny. It really seems to be the case in your business that having that core infrastructure decision process done correctly has now given you the opportunity to scale up, be innovative, and react to the market. I think it’s also telling that, in this data-driven decade that we’ve been in for a few years now, the whole small business sector of the economy is a huge part of our overall productivity and growth as an economy.

Any thoughts, generally, about making infrastructure decisions for the long run, decisions you won't regret, decisions that can scale over time and are future proof?

Theisinger: Yeah, speaking from what I've seen through the job that we’ve had here at YP, we reach over half a million paying advertisers. The shift is happening between just telling the advertisers what's happened to helping them actually drive new business.

So it's around the fact that I know who my customers are now, how do I find more of them, or how do I reach out to them, how do I market to them? That's where the real shift is. You have to have a really strong scalable and extensible platform to be able to answer that question. Having the right infrastructure puts you in the position to be able to do that. That’s where businesses are going to end up growing, whether it's ours or small businesses.

And our success is hinged to whether or not we can get these small businesses to grow. So we are definitely 100 percent focused on trying to make that happen.

Gardner: It’s also telling that you’ve been able to adjust so rapidly. Obviously, your business has been around for a long time. People are very familiar with the Yellow Pages, the actual physical product, but you've gone to make software so core to your value and your differentiation. I'm impressed and I commend you on being able to make that transition fairly rapidly.

Core talent

Theisinger: Yeah, well thank you. We’ve invested a lot in the people within the technology team we have there in Glendale. We've built our own internal search capabilities, our own internal products. We’ve pulled a lot of good core talent from other companies.

I used to work at Yahoo with other folks, and YP is definitely focused on trying to make this transition a successful one, but we have our eye on our heritage. Over a hundred years of being very successful in the print business is not something you want to turn your back on. You want to be able to embrace that, and we’ve learned a lot from it, too.

So we're right there with small businesses. We have a very large sales force, which is also very powerful and helpful in making this transition a success. We've leaned on all of that and we've become one big kind of happy family, if you will. We all worked very closely together to make this transition successful.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Tuesday, November 3, 2015

Big data generates new insights into what’s happening in the world's tropical ecosystems

The next BriefingsDirect big-data innovation case study interview explores how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis.

We'll learn how quantitative analysis and modeling are generating new insights into what’s happening in tropical ecosystems worldwide, and we'll hear how such insights are leading to better ways to attain and verify sustainable development and preservation methods and techniques.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more about how data science, and hosting that data science in the cloud, helps the study of biodiversity, we're pleased to welcome Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International, and Jorge Ahumada, Executive Director of the TEAM Network, also at Conservation International in Arlington, Virginia. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Knowing what’s going on in environments in the tropics helps us understand what to do and what not to do to preserve them. How has that changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have made this data gathering more important than ever?

Fegraus: Over this last year, we’ve been able to roll out our analytic systems across the TEAM Network. We're having more-and-more uptake with our protected-area managers using the system and we have some good examples where the results are being used.

For example, in Uganda, we noticed that a particular cat species was trending downward. The folks there were really curious why this was happening. At first, they were excited that there was this cat species, which was previously not known to be there.

This particular forest is a gorilla reserve, and one of the main economic drivers around the reserve is ecotourism, people paying to go see the gorillas. Once they saw that these cats were going down, they started asking what could be impacting this. Our system told them that the way they were bringing in the eco-tourists to see the gorillas had shifted and that was potentially having an impact on where the cats were. It allowed them to readjust and think about their practices to bring in the tourists to the gorillas.

Information at work

Gardner: Information at work.

Fegraus: Information at work at the protected-area level.

Gardner: Just to be clear for our audience, TEAM stands for Tropical Ecology Assessment and Monitoring. Jorge, tell us a little bit about how the TEAM Network came about and what it encompasses worldwide?
Ahumada: The TEAM Network was a program that started about 12 years ago and it was started to fill a void in the information we have from tropical forests. Tropical forests cover a little bit less than 10 percent of the terrestrial area in the world, but they have more than 50 percent of the biodiversity.

So they're the critical places to be conserved from that point of view, despite the fact we didn’t have any information about what's happening in these places. That’s how the TEAM Network was born, and the model was to use data collection methods that were standardized, that were replicated across a number of sites, and have systems that would store and analyze that data and make it useful. That was the main motivation.

Gardner: Of course, it’s super-important to be able to collect and retrieve and put that data into a place where it can be analyzed. It’s also, of course, important then to be able to share that analysis. Eric, tell us what's been happening lately that has led to the ability for all of those parts of a data lifecycle to really come to fruition?

Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from the field, from the camera traps, from the climate stations, and bring it into our central repository. We then push the data into Vertica, which is used for the analytics. Then, we developed a really nice front-end dashboard that shows the results of species populations in all the protected areas where we work.

The analytical process also starts to identify what could be impacting the trends that we're seeing at a per-species level. This dashboard also lets the user look at the data in a lot of different ways. They can aggregate it and they can slice and dice it in different ways to look at different trends.

Gardner: Jorge, what sort of technologies are they using for that slicing and dicing? Are you seeing certain tools like Distributed R or visualization software and business-intelligence (BI) packages? What's the common thread or is it varied greatly?

Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics infrastructure has concentrated on the storage of big data, but not so much on the analytics. We break that mold because we're doing very, very sophisticated Bayesian analytics with this data.

One of the problems of working with camera-trap data is that you have to separate the detection process from the actual trend that you're seeing because you do have a detection process that has error.

Hierarchical models

We do that with hierarchical models, and it's a fairly complicated model. Just using that kind of model, a normal computer will take days and months. With the power of Vertica and power of processing, we’ve been able to shrink that to a few hours. We can run 500 or 600 species from 13 sites, all over the world in five hours. So it’s a really good way to use the power of processing.
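
To make the idea of separating detection from the underlying trend concrete, here is a minimal sketch of a single-season occupancy likelihood, a standard way to model imperfect detection in camera-trap data. It is not the TEAM Network's actual model, which is described here only as hierarchical and Bayesian. The function names, the parameters psi (probability a site is occupied) and p (probability of detecting the species on one survey when it is present), and the crude grid search are all assumptions for illustration.

```python
import math

def site_likelihood(detections, surveys, psi, p):
    """Likelihood of seeing `detections` out of `surveys` at one site.

    psi: probability the species occupies the site (assumed parameter)
    p:   per-survey detection probability, given presence (assumed parameter)
    """
    # Occupied site: detections follow a binomial with success probability p.
    occupied = (psi * math.comb(surveys, detections)
                * p ** detections * (1 - p) ** (surveys - detections))
    # Unoccupied site: the only possible outcome is zero detections.
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied

def log_likelihood(histories, psi, p):
    """Sum of per-site log-likelihoods; histories is a list of (detections, surveys)."""
    return sum(math.log(site_likelihood(d, k, psi, p)) for d, k in histories)

def naive_occupancy(histories):
    """Fraction of sites with at least one detection (ignores missed animals)."""
    return sum(1 for d, _ in histories if d > 0) / len(histories)

def fit_grid(histories, steps=99):
    """Crude grid search for the (psi, p) pair with the highest likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda pair: log_likelihood(histories, *pair))

# Hypothetical data: four sites, three surveys each.
histories = [(0, 3), (0, 3), (1, 3), (2, 3)]
psi_hat, p_hat = fit_grid(histories)
```

Because a site with no detections may still be occupied, the fitted occupancy in this sketch comes out higher than the naive fraction of sites with a detection. A full hierarchical Bayesian treatment would place priors on these parameters and let them vary by site and year, which is where the heavy computation comes from.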

We’ve also been working more recently with Distributed R, a new package that was written by HP folks at Vertica, to analyze satellite images, because we're also interested in what’s happening at these sites in terms of forest loss. Satellite images are really complicated, because you have millions of pixels and you don’t really know what each pixel is. Is it forest, agricultural land, or a house? So running that on normal R is kind of a problem.
Distributed R is a package that actually takes some of those functions, like random forest and regression trees, and takes full advantage of the vertical processing of Vertica. So we’ve seen a 10-fold increase in performance with that, and it allows us to get much more information out of those images.

Gardner: Not only are you on the cutting-edge for the analytics, you've also moved to the bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your use of cloud and hybrid cloud?

Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It’s an on-premise solution and that's what we're using in the TEAM Network. We've since realized that this solution we built for the TEAM Network can also be readily scaled to other organizations, government agencies, and other groups that want to manage camera-trap data and do the analytics.

So now, we're at a process where we’ve been essentially doing software development and producing software that’s scalable. If an organization wants to replicate what we’re doing, we have a solution that we can spin up in the cloud that has all of the data management, the analytics, the data transformations and processing, the collection, and all the data quality controls, all built into a software instance that could be spun up in the cloud.

Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a specific country or all the above, some of the above?

Fegraus: All of the above. We'll be using Vertica or we're using Vertica OnDemand. We're actually going to transition our existing on-premise solution into Vertica OnDemand. The solution we’re developing uses mostly open-source software and it can be replicated in the Amazon cloud or other clouds that have the right environments where we can get things up and running.

Gardner: Jorge, how important is that to have that global choice for cloud deployment and attract users and also keep your cost limited?

Ahumada: It’s really key, because in many of these countries, it's very difficult for some of those governments to expand out their old solutions on the ground. Cloud solutions offer a very good, effective way to manage data. As Eric was saying, the big limitation here is which cloud solutions are available in each country. Right now, we have something with cloud OnDemand here, but in some of the countries, we might not have the same infrastructure. So we'll have to contract different vendors or whatever.

But it's a way to keep cost down, deliver the information really quick, and store the data in a way that is safe and secure.

What's next?

Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute, what comes next in terms of having these organizations work together? Do we have any indicators of what the results might be in the field? How can we measure the effectiveness at the endpoint -- that is to say, in these environments based on what you have been able to accomplish technically?

Fegraus: One of the nice things about the software that we built, which can run in the various cloud environments, is that it can also be connected. For example, if we start putting these solutions in a particular continent, and there are countries doing this next to each other, they won't be silos. They will be able to share an aggregated level of data with each other so that we can get a holistic picture of what's happening.

So that was very important when we started going down this process, because one of the big inhibitors for growth within the environmental sciences is that there are these traditional silos of data that people in organizations keep and sit on and essentially don't share. That was a very important driver for us as we were going down this path of building software.

Gardner: Jorge, what comes next in terms of technology? Are the scale issues something you need to hurdle to get across? Are there analytics issues? What's the next requirements phase that you would like to work through technically to make this even more impactful?

Ahumada: As we scale up in size and start having more granularity in the countries where we work, the challenge is going to be keeping these systems responsive and information coming. Right now, one of the big limitations is the analytics. We do have analytics running at top speeds, but once we start talking about countries, we're going to have on the order of many more species and many more protected areas to monitor.

This is something that the industry is starting to move forward on in terms of incorporating more of the power of the hardware into the analytics, rather than just the storage and the management of data. We look forward to continuing to work with our technology partners, and in particular HP, to help them guide this process. As a case study, we're very well-positioned for that, because we already have that challenge.

Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things (IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the form of images and raw data. Any thoughts about what others who are thinking about the impact of the IoT should consider, now that you have been there?

Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and human devices. Humans are delivering the data. But here we have a different problem. We're talking about nature delivering the data and we don't have that infrastructure in places like Uganda, Zimbabwe, or Brazil.
So we have to start by building that infrastructure and we have the camera traps as an example of that. We need to be able to deploy much more, much larger-scale infrastructure to collect data and diversify the sensors that we currently have, so that we can gather sound data, image data, temperature, and environmental data in a much larger scale.

Satellites can only take us some part of the way, because we're always going to have problems with resolution. So it's really deployment on the ground which is going to be a big limitation, and it's a big field that is developing now.

Gardner: Drones?

Fegraus: Drones, for example, have that capacity, especially small drones that are proving to be intelligent and able to collect a lot of information autonomously. This is at the cutting edge right now of technological development, and we're excited about it.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Thursday, October 29, 2015

DevOps and security, a match made in heaven

This next BriefingsDirect DevOps thought leadership discussion explores the impact of improved development on security and how those investing in DevOps models specifically can expect to improve their security, compliance, and risk-mitigation outcomes.


To help better understand the relationship between DevOps and security, we're joined by two panelists: Gene Kim, DevOps researcher and author focused on IT operations, information security and transformation (his most recent book, The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win, will soon be followed by The DevOps Cookbook), and Ashish Kuthiala, Senior Director of Marketing and Strategy for Hewlett Packard Enterprise (HP) DevOps. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Coordinating and fostering increased collaboration between development, testers, and IT operations has a lot of benefits. We've been talking about that in a number of these discussions, but what about security specifically? How is DevOps engendering safer code and improved security?

Kuthiala: Dana, I look at security as no different than any other testing that you do on your code. Anything that you catch early-on in the process, fix it, and close the vulnerabilities is much simpler, much easier, and much cheaper to fix than when the end-product is in the hands of the users.
At that point, it could be in the hands of thousands of users, deployed in thousands of environments, and it's really very expensive. Even if you want to fix it there, if some trouble happens, if there is a security breach, you're not just dealing with the code vulnerability, but you are also dealing with loss of brand, loss of revenue, and loss of reputation in the marketplace. Gene has done a lot of study on security and DevOps. I would love to hear his point of view on that.

Promise is phenomenal

Kim: You're so right. The promise of DevOps for advancing the information security objective is phenomenal, but unfortunately, the way most information security practitioners react to DevOps is one of moral outrage and fear. The fear being verbalized is that Dev and Ops are deploying more quickly than ever, and the outcomes haven't been so great. You're doing one release a year, what will happen if they are doing 10 deploys a day? [See a recent interview with Gene from the DevOps Enterprise Summit.]

We can understand why they might be just terrified of this. Yet, what Ashish described is that DevOps represents the ideal integration of testing into the daily work of Dev and Ops. We have testing happening all the time. Developers own the responsibilities of building and running the tests. It’s happening after every code commit, and these are exactly the same sort of behaviors and cultural norms that we want in information security. After all, security is just another aspect of quality.

We're seeing many, many examples of how organizations are creating what some people are calling DevOps(Sec), that is DevOps plus security. One of my favorite examples is Capital One, which calls DevOps in their organization DevOps(Sec). Basically, information security is being integrated into every stage of the software development lifecycle. This is actually what every information security practitioner has wanted for the last two decades.

Kuthiala: Gene, that brings up an interesting thought. As we look at Dev and Ops teams coming together without security, increasingly we talk about how people need to have generally more skills across the spectrum. Developers need to understand production systems and to be able to support their code in production. But what you just described, does that mean that’s how the developers and planners start to become security specialists or think like that? What have you seen?

Kim: Let's talk about the numbers for a second. I love this ratio of 100 to 10 to 1. For every 100 developers, we have 10 operations people, and you have one security person. So there's no way you're going to get the adequate coverage, right? There are not enough security people around. If we can't embed Ops people into these project or service teams, then we have to train developers to care and know when to seek help from the Ops experts.

We have a similar challenge in information security -- how we train, whether it's about secure coding, regular compliance, or how we create evidence that controls exist and are effective. It is not going to be security doing the work. Instead, security needs to be training Dev and Ops on how to do things securely.

Kuthiala: Are there patterns that they should be looking at in security? Are there any known patterns out there or are there some being developed? What have you seen with the customers that you work with?

Kim: In the deployment pipeline, instead of having just unit tests being run after every code commit, you actually run static code analysis tools. That way you know that it's functionally correct, and the developers are getting fast feedback and then they’re writing things that are potentially more secure than they would have otherwise.

And then alongside that in production, there are the monitoring tools. You're running things like the dynamic security testing. Now, you can actually see how it’s behaving in the production environment. In my mind, that's the ideal embodiment of how information security work should be integrated into the daily work of dev, test, and operations.
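
As a rough illustration of running static analysis next to unit tests on every commit, here is a minimal sketch using Python's standard `ast` module. It is a toy rule, not a real security scanner and not any tool mentioned in this discussion. The names `RISKY_CALLS`, `find_risky_calls`, and `commit_gate` are hypothetical, and flagging direct calls to `eval` and `exec` is only an example of the kind of check such a gate might run.

```python
import ast

# Hypothetical rule set for illustration; a real scanner checks far more.
RISKY_CALLS = {"eval", "exec"}

def find_risky_calls(source):
    """Return sorted (line, name) pairs for direct calls to names in RISKY_CALLS."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls such as eval(...) are matched, not obj.eval(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)

def commit_gate(source, unit_tests):
    """Run unit tests and the static check; pass only if both come back clean.

    unit_tests is a list of zero-argument callables that return True on success.
    """
    test_failures = [t.__name__ for t in unit_tests if not t()]
    findings = find_risky_calls(source)
    return {
        "passed": not test_failures and not findings,
        "test_failures": test_failures,
        "security_findings": findings,
    }

def test_addition():
    return 1 + 1 == 2

# Hypothetical commit contents: the second line trips the static check.
risky_commit = "x = 1\ny = eval('x + 1')\n"
clean_commit = "x = 1\ny = x + 1\n"
risky_result = commit_gate(risky_commit, [test_addition])
clean_result = commit_gate(clean_commit, [test_addition])
```

In this sketch the risky commit is rejected even though its unit test passes, which is the kind of fast feedback Kim describes: the developer hears about the security finding at the same moment as the functional result.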

Seems contradictory

Kuthiala: It seems a little contradictory in nature. I know DevOps is all about going a little faster, but actually, you’re adding more functionality right up front and slowing this down. Is it a classic case of going slower to go faster? Walk before you can run, until you get to crawl? From my point of view, it slows you down here, but toward the end, you speed up more. Are you able to do this?

Kim: I would claim the opposite. We're getting the best of all worlds, because the security testing is now automated. It’s being done on demand by the developers, as opposed to your opening a ticket, "Gene, can you scan my application?" And I'll get back to you in about six weeks.
That’s being done automatically as part of my daily work. My claim would be not only is it faster, but we'll get better coverage than we had before. The fearful infosec person would ask how we can do this for highly regulated environments, where there are a lot of compliance regimes in place.

If you were to count the number of controls that are continuously operating, not only do you have orders of magnitude more controls, but they are actually operating all the time as opposed to being tested once a year.

Kuthiala: From what I've observed with my customers, I have two kind of separate questions here. First, if you look at some of the highly regulated industries, for example, the pharmaceutical industry, it's not just internal compliance and regulations. It's part of security, but they often have to go to the outside agencies for almost physical paperwork kind of regulatory compliance checks.

As they're trying to go toward DevOps and speed this up, they are saying, "How do we handle that portion of the compliance checks and the security checks, because they are manual checks? They're not automated. How do we deal with external agencies and incorporate this in?" What have you seen work really well?

Kim: Last year, at the DevOps Enterprise Summit, we had one bank, and it was a smaller bank. This year, we have five including some of the most well-known banks in the industry. We had manufacturing. I think we had coverage of almost every major industry vertical, the majority of which are heavily regulated. They are all able to demonstrate that not only can you be compliant with all the relevant laws, contractual obligations, and regulations, but you can significantly decrease the amount of work.

One of my favorite examples came from Salesforce. Selling to the Federal government, they had to comply with FedRAMP. One of the things that they got agreement on from security, compliance groups, and change management was that all infrastructure changes made through the automation tools could be considered a standard change.

In other words, they wouldn’t require review and approval, but all changes that were done manually would still require approvals, which would often take weeks. This really shows that we can create this fast path not just for the people doing the work, but also, this makes some work significantly easier for security and compliance as well.

Human error

Kuthiala: And you're taking out the human error possibility in there. People can be on vacation, slowing things down. People can be sick. People may not be in their jobs anymore. Automation is a key answer to this, as you said. [More insights from HP from the DevOps Enterprise Summit.]

Gardner: One of the things we've been grappling with in the industry is how to get DevOps accelerated into cultures and organizations. What about the security as a point on the arrow here? If we see and recognize that security can benefit from DevOps and we want to instantiate DevOps models faster, wouldn’t the security people be a good place to be on the evangelistic side of DevOps?

Kim: That’s a great observation, Dana. In fact, I think part of the method behind the madness is that the goal of the DevOps Enterprise Summit is to prove points. We have 50 speakers all from large, complex organizations. The goal is to get coverage of the industry verticals.
I also helped co-host a one-day DevOps Security Conference at the RSA Conference, and this was very much from a security perspective. It was amazing to find those champions in the security community who are driving DevOps objectives. They have to figure out how security fits into the DevOps ecosystem, because we need them to show that the water is not only just safe, but the water is great.

Kuthiala: This brings up a question, Gene. For any new project that kicks off, it’s a new company. You can really define the architecture from scratch, thus enabling you a lot of practices you need to put in place, whether it's independent deliverables and faster deliverables, all acting independent of each other.

But for the bigger companies and enterprise software that’s being released -- we've discussed this in our past talks -- you need to look at the architecture underneath it and see how we can modernize this to do this.
Just as marketing is too important to leave to the marketing people, and quality is too important to leave to the QA people -- so too security is too important to leave just to the security people.

So, when you start to address security, how do you go about approaching that, because you know you're dealing with a large base of code that’s very monolithic? It can take thousands of people to release something out to the customers. Now, you're trying to incorporate security into this with any new features and functions you add.

I can see how you can start to incorporate security and the expertise into it and scan it right from development cycle. How do you deal with that big component of the architecture that’s already there? Any best practices?

Kim: One of the people who has best articulated the philosophy is Gary Gruver. He said something that, for me, was very memorable. If you don’t have automated testing, and I think his context was very much like unit testing, automated regression testing, you have a fundamentally broken cost model, and it becomes too expensive. You get to a point where it becomes too expensive to add features.

That’s not even counting security testing. You get to a point where not only is it too expensive, but it becomes too risky to change code.

We have to fully empower developers to get feedback on their work and have them fully responsible for not just the features, but the non-functional requirements, testability, deployability, manageability, and security.

A better way

Gardner: Assume that those listening and reading here are completely swayed by our view of things and they do want to have DevOps with security ingrained. Are there not also concurrent developments around big data and analytics that give them a better way to do this, once they've decided to do it?

It seems to me that there is an awful lot of data available within systems, whether it's log files, configuration databases. Starting to harness that affordably, and then applying that back to those automation capabilities is going to be a very powerful synergistic value. How does it work when we apply big data to DevOps and security, Ashish?

Kuthiala: Good question, Dana. You're absolutely right with data sources now becoming easy, bringing together data sources into one repository and at an affordable cost. We're starting to build analytics on top of that and this is being applied in a number of areas.

The best example I can talk about is how HP has been working on IP creation in the area of testing using big data analytics. So, if we have to go faster and we have to release software every hour or every two, versus every six to eight months, you need to test it as fast as well. You can no longer afford to go and run your 20,000 tests based on this one-line change of code.

You have to be able to figure out what modules are affected, which ones are not, and which ones are likely to break. We're starting to do some intelligent testing inside of our labs and we're finding that we're about 80 to 85 percent accurate in predicting what to test and not to test and what features are affected or not.

Similarly, using the big data analytics and the security expertise that Gene talked about, you need to start digging through and analyzing exactly the same as we run any test. What security vulnerabilities do you want to test, which functions of the code? And it’s just a best practice moving forward that you start to incorporate the big data analytics into your security testing.
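
The idea of predicting which tests to run from a small code change can be sketched with a simple heuristic built on build history. This is not HP's method, which is not described in detail here. It assumes a hypothetical history of past builds, each recording which modules changed and which tests failed, and it ranks tests by how often they failed when the same modules changed. The function names and the sample data are illustrative only.

```python
from collections import defaultdict

def build_failure_index(history):
    """Map each module to {test: number of builds where the test failed
    while that module was among the changed modules}.

    history: iterable of (changed_modules, failed_tests) pairs from past builds.
    """
    index = defaultdict(lambda: defaultdict(int))
    for changed_modules, failed_tests in history:
        for module in changed_modules:
            for test in failed_tests:
                index[module][test] += 1
    return index

def select_tests(index, changed_modules, min_count=1):
    """Return tests worth running for this change, most suspicious first."""
    scores = defaultdict(int)
    for module in changed_modules:
        # .get avoids creating empty entries for modules with no history.
        for test, count in index.get(module, {}).items():
            scores[test] += count
    chosen = [test for test, count in scores.items() if count >= min_count]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(chosen, key=lambda test: (-scores[test], test))

# Hypothetical build history.
history = [
    ({"billing"}, {"test_invoice"}),
    ({"billing", "auth"}, {"test_invoice", "test_login"}),
    ({"auth"}, {"test_login"}),
]
index = build_failure_index(history)
billing_tests = select_tests(index, ["billing"])
```

A heuristic like this trades some coverage for speed: a module with no failure history selects no tests at all, so in practice it would be combined with a fallback such as periodically running the full suite.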

Kim: You were implying something that I just want to make explicit. One of the most provocative notions that Ashish and I talked about was to think about all the telemetry and all the data that the build mechanisms create. You start putting in all the results of testing, and suddenly we have a much better basis of where we apply our testing effort.
If we actually need to deploy faster, even if we completely automate our tests, and even if we parallelize them and run them across thousands of servers and if that takes days, we may be able to use data to tell us where to surgically apply testing so we make an informed decision on whether to deploy or not. That's an awesome potential.

Gardner: Speaking of awesome potentials, when we compress the feedback loops using this data -- when development and operations are collaborating and communicating very well -- it seems to me that we're also moving from a reactive stance to security issues to a proactive stance.

One of the notions about security is that you can’t prevent people from getting in, but you can limit the damage they can do when they do get in. It seems to me that if you close a loop between development operations and test, you can get the right remediation out into operations and production much quicker. Therefore you can almost behave as we had seen with anti-malware software -- where the cycle between the inception of a problem, the creation of the patch, and then deployment of the patch was very, very short.

Is that vision pie in the sky or is that something we could get to when DevOps and security come together, Gene?

Key to prevention

Kim: You're right on. The way an auditor would talk about it is that there are things that we can do to prevent: that’s code review, that’s automated code testing and scanning.

Making libraries available so that developers are choosing things and deploying them in a secured state are all preventive controls. If we can make sure that we have the best situational awareness we can of the production environment, those are what allow quicker detection and recovery.

The better we are at that, the better we are at mitigating, effectively mitigating risk.

Kuthiala: Gene, as you were talking, I was thinking. We have this notion of rolling back code when something breaks in production, and that’s a very common kind of procedure. You go back into the lab, fix what didn’t work, and then you roll it back into production. If it works, it's fine. Otherwise, you roll it back and do it over again.

But with the advent of DevOps and those who are doing this successfully, there are no roll backs. They roll forward. You just go forward, because with the discipline of DevOps, if done well, you can quickly put a patch into production within hours, versus months, days, and weeks.

And similarly, like you talked about with security, once you know a vulnerability is out there, you want to go fix it, you want to issue the patch. With DevOps and security, there are a lot of similarities.

Gardner: Before we close out, is there anything more for the future? We've heard a lot about the Internet of Things (IoT), a lot more devices, device types, networks, extended networks, and variable networks. Is there a benefit with DevOps and security as a tag team, as we look to an increased era of complexity around the IoT sensors and plethora of disparate networks? Ashish?

Kuthiala: The more you talk about IoT, the more holes are open for hackers to get in. I'll give you a classic example. I've been looking forward to the day where my phone is all I carry. I don’t have to open my car with my keys or I can pay for things with it, and we have been getting toward that vision, but a lot of my friends who are in high-tech are actually skeptical.
What happens if you lose your phone? Somebody has access to it. You know their counter argument against that. You can switch off your phone and wipe the data etc. But I think as IoT grows in number, more holes open up. So, it becomes even more important to incorporate your security planning cycles right into the planning and software development cycles.

Gardner: Particularly if you're in an industry where you expect to have an Internet of Things ramp-up, getting automation in place, thinking about DevOps, thinking about security as an integral part of DevOps -- it all certainly makes a great deal of sense to me.

Kim: Absolutely, you said it better than I ever could. Yes.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.
