Tuesday, January 5, 2010

Architectural shift joins app logic with massive data sets to take advanced BI analytics to real-time performance heights

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Read a full transcript or download a copy. Learn more. Sponsor: Aster Data Systems.

New architectures for data and logic processing are ushering in a game-changing era of advanced analytics.

These new approaches support massive data sets to produce powerful insights and analysis -- yet with unprecedented price-performance. As we enter 2010, enterprises are bringing more diverse forms of data into their business intelligence (BI) activities. They're also diversifying the types of analysis that they expect from these investments.

At the same time, more kinds and sizes of companies and government agencies are seeking to deliver ever more data-driven analysis for their employees, partners, users, and citizens. It boils down to giving more communities of participants what they need to excel at whatever they're doing. By putting analytics into the hands of more decision makers, huge productivity wins across entire economies become far more likely.

But such improvements won’t happen if the data can't effectively reach the application's logic, if the systems can't handle the massive processing scale involved, or if the total costs and complexity are too high.

In this sponsored podcast discussion we examine how the convergence of data and logic, of parallelism and MapReduce -- and of a hunger for precise analysis with a flood of raw new data -- is setting the stage for powerful advanced analytics outcomes.

To help learn how to attain advanced analytics and to uncover the benefits from these new architectural activities for ubiquitous BI, we're joined by Jim Kobielus, senior analyst at Forrester Research, and Sharmila Mulligan, executive vice president of marketing at Aster Data Systems. The discussion is moderated by BriefingsDirect's Dana Gardner, principal analyst at Interarbor Solutions.

Here are some excerpts:
Kobielus: Advanced analytics is focused on how to answer questions about the future. It's what's likely to happen -- forecast, trend, what-if analysis -- as well as what I like to call the deep present, really current streams for complex event processing.

What's streaming in now? And how can you analyze the great gushing streams of information that are emanating from all your applications, your workflows, and from social networks?
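
To make the "deep present" idea a little more concrete: a staple of complex event processing is the time-based sliding window, which answers questions like "what are people talking about in the last minute?" The Python below is a minimal sketch under our own assumptions, not any vendor's CEP engine. Events are assumed to arrive as (timestamp, topic) pairs in timestamp order, and the class name SlidingWindowCounter and the sample stream are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per topic over a trailing time window (hypothetical helper).

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()    # (timestamp, topic) pairs, oldest first
        self.counts = Counter()  # live count per topic inside the window

    def add(self, timestamp: float, topic: str) -> None:
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Keep only events with timestamp > now - window; evict the rest.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n: int = 3):
        """Most frequent topics currently inside the window."""
        return self.counts.most_common(n)


# Usage: a 60-second window over a small, made-up stream of mentions.
w = SlidingWindowCounter(window_seconds=60)
for ts, topic in [(0, "refund"), (10, "outage"), (30, "outage"), (75, "outage")]:
    w.add(ts, topic)
print(w.top())  # [('outage', 2)]: the events at t=0 and t=10 have expired
```

A real CEP engine adds out-of-order handling, pattern matching across event types, and distribution, but the evict-as-you-go window is the core idea.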

Advanced analytics is all about answering future-oriented, proactive, or predictive questions, as well as current streaming, real-time questions about what's going on now. Advanced analytics leverages the same core features that you find in basic analytics -- all the reports, visualizations, and dashboarding -- but then takes it several steps further.

... What Forrester is seeing is that, although the average data warehouse today is in the 1-10 terabyte range for most companies, we foresee the average warehouse size going, in the middle of the coming decade, into the hundreds of terabytes.

In 10 years or so, we think it's possible, and increasingly likely, that petabyte-scale data warehouses or content warehouses will become common. It's all about unstructured information, deep history, and historical information. A lot of trends are pushing enterprises in the direction of big data.

... We need to rethink the platforms with which we're doing analytical processing. Data mining is traditionally thought of as being the core of advanced analytics. Generally, you pull data from various sources into an analytical data mart.

That analytical data mart is usually on a database that's specific to a given predictive modeling project, let's say a customer analytics project. It may be a very fast server with a lot of compute power for a single server, but quite often what we call the analytical data mart is not the highest performance database you have in your company. Usually, that high performance database is your data warehouse.

As you build larger and more complex predictive models you quickly run into resource constraints on your existing data-mining platform. So you have to look for where you can find the CPU power, the data storage, and the I/O bandwidth to scale up your predictive modeling efforts.

... But, [there is] another challenge, which is advanced analytics producing predictive models. Those predictive models increasingly are deployed in-line to transactional applications to provide some basic logic and rules that will drive such important functions as "next best offer" being made to customers based on a broad variety of historical and current information.

How do you inject predictive logic into your transactional applications in a fairly seamless way? You have to think through that, because, right now, quite often analytical data models, predictive models, in many ways are not built for optimal embedding within your transactional applications. You have to think through how to converge all these analytical models with the transactional logic that drives your business.
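
One way to picture in-line predictive logic is a model trained offline whose coefficients are exported and evaluated directly inside the transaction path. The sketch below assumes a simple logistic-regression scorer; the offer names, features, coefficient values, and the handle_checkout function are hypothetical placeholders, not output from any real model or product.

```python
import math
from dataclasses import dataclass

# Hypothetical coefficients, standing in for models trained offline in an
# analytical data mart and exported for use inside the transaction path.
OFFER_MODELS = {
    "premium_upgrade": {"intercept": -2.0, "tenure_years": 0.4, "monthly_spend": 0.01},
    "loyalty_discount": {"intercept": -1.0, "tenure_years": 0.1, "support_calls": 0.5},
}


@dataclass
class Customer:
    tenure_years: float
    monthly_spend: float
    support_calls: int


def score(model: dict, customer: Customer) -> float:
    """Logistic-regression probability that the customer accepts the offer."""
    z = model["intercept"]
    for feature, weight in model.items():
        if feature != "intercept":
            z += weight * getattr(customer, feature)
    return 1.0 / (1.0 + math.exp(-z))


def next_best_offer(customer: Customer) -> str:
    """Pick the offer with the highest predicted acceptance probability."""
    return max(OFFER_MODELS, key=lambda name: score(OFFER_MODELS[name], customer))


def handle_checkout(customer: Customer, order_total: float) -> dict:
    # ... ordinary transactional logic (validation, payment, etc.) would go here ...
    return {"total": order_total, "offer": next_best_offer(customer)}


print(handle_checkout(Customer(5, 120.0, 0), 59.0))
# {'total': 59.0, 'offer': 'premium_upgrade'}
```

Keeping the model as plain data plus a small scoring function is one way to make it cheap to embed; the harder problems Kobielus alludes to (versioning, retraining, and keeping features consistent between the data mart and the live application) are outside this sketch.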

New data platform

Mulligan: What we see with customers is that the advanced analytics needs and the new generation of analytics that they are trying to do is driving the need for a new data platform.

What you've got is a situation where enterprises want to be able to do more scalable reporting on massive data sets with very, very fast response times. On the reporting side, in terms of the end result to the customer, it is similar to the type of report they are trying to achieve, but the difference is that the quantity of data that they're trying to get at, and the amount of data that these reports are filling up is far greater than what they had before.

That's what's driving a need for a new platform underneath some of the preexisting BI tools. Those tools are, in themselves, good at reporting, but what they need is a data platform beneath them that allows them to do more scalable reporting than they could do before.

... Previously, the choice of a data management platform was based primarily on price-performance: being able to effectively store lots of data and get very good performance out of those systems. What we're seeing right now is that, although price-performance continues to be a critical factor, it's not necessarily the only factor or the primary thing driving customers' need for a new platform.

What's driving the need now, and one of the most important criteria in the selection process, is the ability of this new platform to be able to support very advanced analytics.

Customers are very precise in terms of the type of analytics that they want to do. So, it's not that a vendor needs to tell them what they are missing. They are very clear on the type of data analysis they want to do, the granularity of data analysis, the volume of data that they want to be able to analyze, and the speed that they expect when they analyze that data.

There is a big shift in the market, where customers have realized that their preexisting platforms are not necessarily suitable for the new generation of analytics that they're trying to do.

They are very clear on what their requirements are, and those requirements are coming from the top. Those new requirements, as it relates to data analysis and advanced analytics, are driving the selection process for a new data management platform.

We see the push toward analysis that's really more near real-time than what they were able to do before. This is not a trivial thing to do when it comes to very large data sets, because what you are asking for is the ability to get very, very quick response times and incredibly high performance on terabytes and terabytes of data to be able to get these kinds of results in real-time.

Social network analysis

Kobielus: Let's look at what's going to be a true game changer, not just for business, but for the global society. It's a thing called social network analysis.

It's predictive models, fundamentally, but it's predictive models that are applied to analyzing the behaviors of networks of people on the web, on the Internet, Facebook, and Twitter, in your company, and in various social network groupings, to determine classification and clustering of people around common affinities, buying patterns, interests, and so forth.

As social networks weave their way into not just our consumer lives, but our work lives, our life lives, social network analysis -- leveraging all the core advanced analytics of data mining and text analytics -- will take the place of the focus group.

You're going to listen to all their tweets and their Facebook updates and you're going to look at their interactions online through your portal and your call center. Then, you're going to take all that huge stream of event information -- we're talking about complex event processing (CEP) -- you're going to bring it into your data warehousing grid or cloud.

You're also going to bring historical information on those customers and their needs. You're going to apply various social network behavioral analytics models to it to cluster people into the categories that make us all kind of squirm when we hear them, things like yuppie and Generation X and so forth.
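
As a toy illustration of clustering people around common affinities, the sketch below links any two people whose interest sets overlap by at least a chosen Jaccard similarity and then groups the linked people with union-find. This is a deliberately simple stand-in for the behavioral models Kobielus describes; production social network analysis would typically use richer graph and community-detection methods. The names, interests, and the 0.5 threshold are invented for the example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of interests two people have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_affinity(interests: dict, threshold: float = 0.5) -> list:
    """Link people whose interest sets have Jaccard similarity >= threshold,
    then return the connected groups (found with union-find)."""
    parent = {person: person for person in interests}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(interests, 2):
        if jaccard(interests[a], interests[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for person in interests:
        groups.setdefault(find(person), set()).add(person)
    return sorted(groups.values(), key=sorted)


people = {
    "ana": {"cycling", "coffee", "jazz"},
    "ben": {"cycling", "coffee"},
    "cho": {"gaming", "anime"},
    "dev": {"gaming", "anime", "coffee"},
    "eli": {"gardening"},
}
print(cluster_by_affinity(people))
# three groups: {ana, ben}, {cho, dev}, {eli}
```

Note that "dev" shares "coffee" with "ana" and "ben" but still lands with "cho": the threshold decides how strong an affinity must be before two people count as the same segment.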

They can get a sense of how a product or service is being perceived in real-time, so that the provider of that product or service can then turn around and tweak that marketing campaign ...

Social network analysis becomes more powerful as you bring more history into it -- last year, two years, five years, 10 years' worth of interactions -- to get a sense for how people will likely respond to new offers, bundles, packages, campaigns, and programs that are thrown at them through social networks.

If you can push not just the analytic models, but to some degree bring transactional applications, such as workflow, into this environment to be triggered by all of the data being developed or being sifted by these models, that is very powerful.

Mulligan: One of the biggest issues that the preexisting data pipeline faces is that the data lives in a repository that's removed from where the analytics take place. Today, with the existing solutions, you need to move terabytes and terabytes of data through the data pipeline to the analytics application, before you can do your analysis.

There's a fundamental issue here. You can't move boulders and boulders of data to an application. It's too slow, it's too cumbersome, and you're not factoring in all your fresh data in your analysis, because of the latency involved.

One of the biggest shifts is that we need to bring the analytics logic close to the data itself. Having it live in a completely different tier, separate from where the data lives, is problematic. This is not a price-performance issue in itself. It is a massive architectural shift that requires bringing analytics logic to the data itself, so that the data and the analytics are collocated.

MapReduce plays a critical role in this. It is a very powerful technology for advanced analytics and it brings capabilities like parallelization to an application, which then allows for very high-performance scalability.
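
For readers who haven't seen MapReduce up close, the pattern has three steps: a map function emits key/value pairs from each partition of data, a shuffle groups the pairs by key, and a reduce function collapses each group into a result. The Python below is a single-machine sketch of that pattern, not Aster's implementation or Hadoop's API; a thread pool stands in for the cluster nodes, and the purchase data is made up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Made-up (customer, purchase amount) rows; each inner list stands in
# for the slice of data stored on one node.
PARTITIONS = [
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
    [("carol", 50.0), ("bob", 2.5)],
    [("alice", 10.0), ("carol", 5.0)],
]


def map_phase(rows):
    """Emit (key, value) pairs: here, (customer, amount)."""
    return [(customer, amount) for customer, amount in rows]


def shuffle(mapped_partitions):
    """Group every emitted value by its key."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Collapse one key's values into a single result: total spend."""
    key, values = item
    return key, sum(values)


def map_reduce(partitions):
    # Threads stand in for cluster nodes; each map task touches only its own partition.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, partitions))
        grouped = shuffle(mapped)
        return dict(pool.map(reduce_phase, grouped.items()))


print(map_reduce(PARTITIONS))
# {'alice': 47.5, 'bob': 15.0, 'carol': 55.0}
```

Because each map task reads only its own partition and each reduce task sees only one key, adding nodes lets both phases spread out, which is where the scalability Mulligan mentions comes from.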

What we see in the market these days are terms like "in-database analytics," "applications inside data," and all this is really talking about the same thing. It's the notion of bringing analytics logic to the data itself.

... In the marriage of SQL with MapReduce, the real intent is to bring the power of MapReduce to the enterprise, so that SQL programmers can now use that technology. MapReduce alone does require some sophistication in terms of programming skills to be able to utilize it. You may typically find that skill set in Web 2.0 companies, but often you don’t find developers who can work with that in the enterprise.

What you do find in enterprise organizations is that there are people who are very proficient at SQL. By bringing SQL together with MapReduce what enterprise organizations have is the familiarity of SQL and the ease of using SQL, but with the power of MapReduce analytics underneath that. So, it’s really letting SQL programmers leverage skills they already have, but to be able to use MapReduce for analytics.
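
Aster's actual SQL-MapReduce syntax isn't shown in this discussion, so as a loose analogy only: Python's built-in sqlite3 module lets a developer register custom procedural logic (here, a median aggregate, which SQLite lacks out of the box) that a SQL author then calls like any built-in function. The table and data are hypothetical.

```python
import sqlite3
import statistics


class Median:
    """Custom aggregate: SQLite calls step() once per row and finalize() once per group."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)  # name, number of arguments, class

conn.execute("CREATE TABLE purchases (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("east", 10.0), ("east", 200.0), ("east", 30.0), ("west", 5.0), ("west", 15.0)],
)

# The SQL author uses median() exactly like a built-in aggregate.
rows = conn.execute(
    "SELECT region, median(amount) FROM purchases GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 30.0), ('west', 10.0)]
```

The division of labor mirrors the one Mulligan describes: one developer writes the procedural function once, and anyone fluent in SQL can reuse it in ordinary queries.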

... One of the biggest requirements for doing very advanced analytics on terabyte- and petabyte-level data sets is to bring the application logic to the data itself. Earlier, I described why you need to do this. You want to eliminate as much data movement as possible, and you want to be able to do this analysis in as near real-time as possible.

What we did in Aster Data 4.0 is just that. We're allowing companies to push their analytics applications inside of Aster’s MPP database, where now you can run your application logic next to the data itself, so they are both collocated in the same system. By doing so, you've eliminated all the data movement. What that gives you is very, very quick and efficient access to data, which is what's required in some of these advanced analytics application examples we talked about.
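
The data-movement argument can be sketched in a few lines. Below, each list stands in for the slice of a table held by one node of an MPP database (hypothetical data). The "pull" version ships every row to an application tier before computing a mean; the "push-down" version computes a (count, sum) partial beside each segment and ships only those partials. Both give the same answer, but the number of items that cross the wire differs. This is an illustrative sketch, not Aster 4.0's actual mechanism.

```python
from dataclasses import dataclass

# Hypothetical MPP segments: each list is the slice of a table held by one node.
SEGMENTS = [
    [12.0, 40.0, 8.0, 20.0],
    [5.0, 15.0, 30.0],
    [100.0, 50.0],
]


@dataclass
class Partial:
    count: int
    total: float


def pull_then_analyze(segments):
    """Traditional pipeline: move every row to the application tier, then compute."""
    shipped = [row for seg in segments for row in seg]  # every row crosses the wire
    return sum(shipped) / len(shipped), len(shipped)


def push_down_analyze(segments):
    """In-database style: compute a small partial next to each segment, move only partials."""
    # In a real MPP system each Partial would be computed on the node holding
    # that segment; in this sketch it simply runs in-process.
    partials = [Partial(len(seg), sum(seg)) for seg in segments]
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return total / count, len(partials)


mean_pull, moved_pull = pull_then_analyze(SEGMENTS)
mean_push, moved_push = push_down_analyze(SEGMENTS)
print(mean_pull == mean_push, moved_pull, moved_push)  # True 9 3
```

With nine rows the saving is trivial; with terabytes per segment, shipping one small partial per node instead of every row is the difference Mulligan is pointing at.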

Pushing the code

What kind of applications can you push down into the system? It can be any app written in Java, C, C++, Perl, Python, .NET. It could be an existing custom application that an organization has written and that they need to be able to scale to work on much larger data sets. That code can be pushed down into the database.

It could be a new application that a customer is looking to write to do a level of analysis that they could not do before, like real-time fraud analytics, or very deep customer behavior analysis. If you're trying to deliver these new generations of advanced analytics apps, you would write that application in the programming language of your choice.

Kobielus: In this coming decade, we're going to see predictive logic deployed into all application environments, be they databases, clouds, distributed file systems, CEP environments, business process management (BPM) systems, and the like. Open frameworks will be used and developed under more of a service-oriented architecture (SOA) umbrella, to enable predictive logic that’s built in any tool to be deployed eventually into any production, transaction, or analytic environment.

Sunday, January 3, 2010

Getting on with 2010 and celebrating ZapThink’s 10-year anniversary

This guest post comes courtesy of Ronald Schmelzer, senior analyst at ZapThink.

By Ronald Schmelzer

It’s hard to believe that ZapThink will be a full decade old in 2010. For those of you that don’t know, ZapThink was founded in October 2000 with a simple mission: record and communicate what was happening at the time with XML standards.

From that humble beginning, ZapThink has emerged as a (still small) advisory and education powerhouse focused on Service-Oriented Architecture (SOA), Cloud Computing, and loosely coupled forms of Enterprise Architecture (EA).

Oh, how things have changed, and how they have not. As is our custom, we’ll use this first ZapFlash of the year to look retrospectively at the past year and the upcoming future. But we’ll also wax a bit nostalgic and poetic as we look at the past 10 years and surmise where this industry might be heading in the next decade.

2009: A year of angst

The Times Square Alliance has it right in celebrating Good Riddance Day just prior to New Year’s Eve. There’s a lot that we can be thankful to put behind us. Anne Thomas Manes started out the year with an angst-filled posting declaring that SOA is Dead. Getting past the misleading headline, many in the industry came to the quick realization that SOA is far from dead, but rather going into a less hyped phase.

And for that reason we’re glad. We say good riddance to vendor hype, consulting firm over-selling, and the general proliferation of misunderstanding that plagued the industry from 2000 until this point (SOA is Web Services? SOA is integration middleware? Buy an ESB get a SOA?). We can now declare that the vendor marketing infatuation with SOA is dead and they have a new target in mind: Cloud Computing.

Last year we predicted that SOA would be pushed from the daily marketing buzz and replaced with Cloud Computing as the latest infatuation of the marketingerati. Specifically we said, “We expect the din of the cloud-related chatter to turn into a real roar by this time next year. Everything SOA-related will probably be turned into something cloud-related by all the big vendors, and companies will desperately try to turn their SOA initiatives into cloud initiatives.”

Oh, boy, were we right ... in spades. Perhaps this wasn’t the most remarkable of predictions, though. Every analyst firm, press writer, and book author was positively foaming at the mouth with Cloud-this and Cloud-that. Of course, if history is a lesson, 90% of what’s being spouted is EAI-cum-SOA-cum-Cloud marketing babble and intellectual nonsense.

But history also teaches us that people have short-term memories and won’t remember. They’ll continue to buy the same software and consulting services, warmed over and sold as new tech with only a few enhancements, mostly in the user interface and system integration.

We also predicted a boom year for SOA education and training, which ended up panning out, for the most part. ZapThink now generates the vast majority of its revenues from SOA training and certification, which has become a multi-million dollar business for us, by itself.

ZapThink is not alone in realizing this boom of EA and SOA training spending. We’ve seen the rapid emergence of a wide range of EA frameworks, SOA methodologies, and disciplines benefiting from a rapid increase in EA and SOA training expenditures. We also predicted that ZapThink would double in size, which hasn’t exactly happened. Instead, we’ve decided to grow through use of partners and contractors – a much wiser move in an economy that has proven to be sluggish throughout 2009.

Yet, not all of our predictions panned out. We promised that there would be one notable failure and one notable success that would be universally and specifically attributed to SOA in 2009, and I can’t say that this has happened. If it did, we’d all know about it.

Rather, we saw the continued recession of SOA into the background as other, more highly hyped and visible initiatives got the thumbs-up of success or the mark of failure. In fact, perhaps this is how it should have been all along. Why should we all know with such grand visibility if it was SOA that succeeded or failed? Indeed, failure or success can rarely be solely attributable to any form of architecture. So, I think it’s possible to say that the prediction itself was misguided. Maybe we should instead have asked for raises for all those involved in SOA projects in 2009.

2010 and beyond: Where are things heading?

It’s easy to have 20/20 hindsight, however. It’s much more difficult to make predictions for the year ahead that aren’t just the obvious no-brainers that anyone who has been observing the market can make. Sure, we can assert that the vendors will continue to consolidate, IT spending will rebound with improving economic conditions, and that cloud computing will continue its inevitable movement through the hype cycle, but that wouldn’t be providing you with any information. Rather, we believe that we can stick our necks out a bit to make some predictions for 2010.

In 2010, we predict that:
  • Open Source SOA infrastructure will dominate – Lack of interest by venture capitalists and consolidation by the Big Five IT infrastructure providers will result in such lack of choice for SOA infrastructure solutions that end users will flock to open source alternatives. As a result, 2010 will be the year that open source SOA infrastructure finally gains enough adoption that it will be on the short list for most large SOA implementations. We’ll see (finally) a robust open source SOA registry/repository offering, SOA management solutions, SOA governance offerings, and SOA infrastructure solutions that rival commercial ones in terms of performance, reliability, and support.

  • The Rich Internet Application (RIA) market wars are over – Put a fork in it, it’s done. Good try, Microsoft Silverlight. Nice effort, RIA startups and commercial vendors. Customers have spoken. Adobe Flash and open source Ajax solutions based on JavaScript have won. Yes, there will be niches and industries where Silverlight and other commercial solutions might be appropriate and gain traction, but we see way too many (awesome quality) open source jQuery (and Prototype) solutions out there and too much adoption of Flash by the end user base for this trend to go away. And Java on the client? Feggetaboutit – that time has come and gone. As a result, this will be the end of ZapThink’s coverage of the space. Just as we declared the Native XML Database market done in 2002, so too we declare this market contest over.

  • Cloud privacy & security issues put to rest – Already we’re seeing people anguishing about Cloud’s unreliability, insecurity, and lack of privacy. Really? You think people didn’t realize this when they made their Cloud investments in the first place? There’s simply too much economic benefit in running services and applications in a dynamically scalable way on someone else’s infrastructure. The Cloud providers won’t be giving up any time soon. Nor will IT implementers. This means that there will be a credible solution to these problems, and it will become well understood and implemented by year’s end. If you’re looking for a company to start in 2010 that will have a huge, ready customer base and potential for multi-million dollar valuations with an exit in 18-24 months, then this is the place to look. Start a cloud privacy/reliability/security company that addresses current pain points and you’ll win. We’ll just take 5 percent for the suggestion, thanks.

But all these 2010 predictions are still too easy. Since we’ve been around for the past decade, perhaps we should make some predictions about the decade ahead? Where will IT and SOA and EA be in 2020? Most ironically, we believe that not much will really change in the enterprise software landscape. If you were to fall asleep in a Rip van Winklesque fashion today and wake up on January 1, 2020, you’d find that:
  • Mainframes will still exist – Look, folks, if they haven’t been subsumed by all the movements of the past 30 years, they won’t be gone in another 10. Mainframes and legacy systems are here to stay. Invest in mainframe-related stocks.

  • We’ll still be talking about Enterprise Architecture – One of the biggest lessons of the past 10 years is that the business still doesn’t understand or value enterprise architecture. CIOs are still, for the most part, business managers who treat IT as a cost center or as a resource they manage on a project-by-project and acquisition-by-acquisition basis. Long-term planning? Put enterprise architects in control of IT strategy? Forget it. In much the same way that the most knowledgeable machinists and assembly line experts would never get into management positions at the automakers, so too will we fail to see EA grab its rightful reins in the enterprise. In 2020 we’ll still be talking about how necessary, under-implemented, and misunderstood EA is. You’ll see the same speakers, trainers, and consultants, but with a bit more grey on top (if they don’t already have it now).

  • More things in IT environments we don’t control – IT is in for long-term downward spending pressure. The technologies and methodologies that are emerging now (Cloud, mobile, Agile, Iterative, Service-Oriented) are only pushing more aspects of IT outside the internal environment and into environments that businesses don’t control. Soon, your most private information will be spread onto hundreds of servers and databases around the world that you can’t control and have no visibility over. You can’t fight this battle. Private clouds? Baloney. That’s like trying to stop tectonic shift. The future of IT is outside the enterprise. Deal with it.

  • IT vendors will still be selling 10 years from now what they’ve built (or have) today – There is nothing to indicate that the patterns of vendor marketing and IT purchasing have changed in the past 10 years or will change at all in the next 10 years. Vendors will still peddle their same warmed-over wares as new tech for the next 10 years. And even worse, end users will buy them. IT procurement is still a short-sighted, tactical, project-focused affair that solves yesterday’s problems. It would require a huge shift in purchasing and marketing behavior to change this, and I regret that I don’t see that happening by 2020.

The ZapThink take

Some of the above predictions may seem gloomy. Perhaps the current recessionary environment is putting a haze on the positive visions of our crystal ball. More likely, however, is the fact that the enterprise IT industry is in a long-term consolidating phase.

IT is a relatively new innovation for the business, having been part of the lexicon and budgets of enterprises for probably 60 years at most. Just as the auto industry went through a rapid period of expansion and innovation from the beginning of the past century through the 1960s, later followed by consolidation and a slowing of innovation, so too will we see the same happen with enterprise IT.

In fact, it’s already begun. Five vendors control over 70 percent of all enterprise IT software and hardware expenditures. Enterprise end users will necessarily need to follow their lead as they do less of their own IT development and innovation in-house.

Now, this doesn’t apply to IT as a whole – we see remarkable advancement and development in IT outside the enterprise. As we’ve discussed many times before, there is a digital divide between the IT environment inside the enterprise and the environment we experience when we’re at home or using consumer-oriented websites, devices, and applications.

We expect that digital divide to continue to widen and, perhaps within the next 10 years, to reach a point where enterprise IT investment will stagnate. Instead, the business will come to depend on outside providers for its technology needs. Wherever that goes, ZapThink has been there for the past 10 years, and we expect to be here another 10. In what shape and form we will be is for you, our customers and readers, to determine.


SOA and EA Training, Certification, and Networking Events

In need of vendor-neutral, architect-level SOA and EA training? ZapThink's Licensed ZapThink Architect (LZA) SOA Boot Camps provide four days of intense, hands-on architect-level SOA training and certification.

Advanced SOA architects might want to enroll in ZapThink's SOA Governance and Security training and certification courses. Or, are you just looking to network with your peers, interact with experts and pundits, and schmooze on SOA after hours? Join us at an upcoming ZapForum event. Find out more and register for these events at http://www.zapthink.com/eventreg.html.