One-on-One with HP
Greg Battas, CTO, and Vish Mulchand, BI Product Marketing Director

Submitted by Maroushka Kanywani, Dashboard Insight, Tuesday, July 29, 2008

Dashboard Insight's Maroushka Kanywani recently spoke with Greg Battas, CTO of HP Business Intelligence, and Vish Mulchand, HP Business Intelligence Product Marketing Director, about HP's latest release, Neoview 2.3, and its unique role in the BI industry.

Maroushka Kanywani: Good afternoon, gentlemen. Vish, could you tell me a little bit about your role at HP?

Vish Mulchand: I run the Product Marketing group for Neoview at HP, and I work for John Miller, who in turn works for Ben Barnes, Vice President and General Manager of HP Business Intelligence.

MK: How does HP approach BI?

VM: There is broad consensus that there has been an information explosion: information keeps growing, and it is going to keep growing. Companies that are able to capitalize on that information to drive business insight become leaders in their field. As they say, there is a tidal wave of information; you either ride the wave or you get crushed by it. So to us at HP, the information wave is one of the main tenets of operational BI.

Look at it from a customer perspective, say a telecom company such as a mobile phone provider. Growing revenue, reducing churn, and protecting existing revenue are key problems for many of these providers. An emerging mobile phone company that gains 5 million customers one month could lose 4 million the following month, because, as you know, in many parts of the emerging world where there is no landline infrastructure, many people buy mobile phones on pre-paid rather than post-paid plans. So holding on to those 4 million customers becomes a huge problem.

If you look at a problem like that, we feel that some of the tenets of operational business intelligence can actually provide a foundation for solving a problem such as customer churn. One school of thought proposes that if the telecom can understand the patterns that are emerging, that is, look at massive amounts of data and run some intelligent heuristics over it to find certain patterns, it could then start promoting different options to its customers. Front-line customer service representatives would be in a position to access this data and make a customer an offer based on an analysis of that customer's call patterns.

As companies try to either grow or protect their revenue, these are the kinds of capabilities they want, and operational BI, in our mind, gives them exactly these capabilities.

MK: Using a telecom is a great example, Vish. It clearly shows what operational BI is about.

VM: If you take the telecom example as typical of the operational BI environment and then look at traditional business intelligence today, you will find that one of the challenges is that, while traditional BI does a lot of good things, it is specialized for certain types of queries.

Many of the attributes that we look for in our operational systems exist in other kinds of systems. What is really lacking today is the combination: an operational business system that has all of these attributes and also lets you go deep into these kinds of problems.

The data warehouse, in our mind, is a key piece of an operational BI offering because it is the foundation on which all the data is stored and from which it is accessed and extracted. A data warehouse needs to support many users, provide real-time access to data, and support mixed workloads, such as a customer service agent banging on a keyboard alongside a fraud application trying to pick out fraud. These are very different workloads that could cause contention in the system, yet the system needs to keep running smoothly 24/7.

MK: So where does HP’s Neoview come in?

Greg Battas: I’m the CTO for the Neoview program. Just to give some background, we introduced Neoview about a year ago, but we had been working on it for quite a bit longer than that. The objective over the first couple of years was to build out and harden the product with a number of customers, gearing it toward high-end, enterprise-wide use.

While a number of vendors preferred to target data marts and quick, easy, small sales, HP was building this platform to go after the very high-end niche of the market. I would characterize what we’ve done up until this release as filling out and hardening the product to handle the very large, traditional enterprise data warehouse, and we added some operational characteristics as well.

The new release centers on a number of features geared toward the operational BI world. If you think about it, 15 to 20 years ago business intelligence meant a few analysts trying to figure out whether or not to buy the bank across the street. Then in the 1990s, we moved to a model of BI where literally hundreds of knowledge workers were doing analytic work, such as deciding what products to put on the shelf.

But now analytics is done while a customer is standing in the checkout line. Some of the things looked at are de-duplication and matching: do we know this person, have we done business with them before, what were their previous purchases, and how should they be treated while they are on the phone, on the web site, or in the checkout line of a store? Now we have a bigger problem, because we have to deal with real-time events as well as things that happened in the past.

Neoview 2.3, our latest release, adds a number of different features, three of which are highlighted with the release.

The first feature is called Adaptive Segmentation. The trick here is that when you build a very large computer system with literally hundreds of processors in it, it is relatively easy to teach it to execute very large, complex analytic queries, which other vendors have done in the past. But sometimes doing that actually makes the system perform worse at very small, short-running work. For example, if you want to pull up the phone records of a certain individual, you probably don't want that problem decomposed across a thousand processors all trying to work on the query in parallel, since it is such a small problem.

What you would ideally want is to segment the machine, if you will, and run hundreds of those queries concurrently on different portions of the machine, so that they execute faster.

The adaptive segmentation feature provides the ability to handle the extremely high-volume, short-running queries at the same time as massive parallel queries that need this huge machine to run on.

It’s a way to partition the machine into what you might think of as “lanes” and send different query workloads down those lanes concurrently. This allows the system to handle one hundred thousand customer service agents coming into it while also performing massive analytic queries.
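To make the “lanes” idea concrete, here is a minimal Python sketch of the routing decision only. It assumes a hypothetical per-query cost estimate (rows scanned) and a hypothetical cutoff, `SHORT_QUERY_MAX_ROWS`; `Segment`, `build_lanes`, and `SegmentRouter` are illustrative names, not Neoview's implementation or API, which the interview does not describe.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff (estimated rows scanned) separating short lookups from big analytics.
SHORT_QUERY_MAX_ROWS = 10_000

@dataclass
class Segment:
    name: str
    processors: List[int]  # processor IDs assigned to this lane

def build_lanes(total_processors: int, lane_size: int) -> List[Segment]:
    """Split the machine into fixed-size lanes for short-running queries."""
    lanes = []
    for start in range(0, total_processors, lane_size):
        ids = list(range(start, min(start + lane_size, total_processors)))
        lanes.append(Segment(f"lane-{start // lane_size}", ids))
    return lanes

class SegmentRouter:
    """Hypothetical router: small queries get one lane, big ones get the whole machine."""

    def __init__(self, total_processors: int, lane_size: int):
        self.all_processors = list(range(total_processors))
        self.lanes = build_lanes(total_processors, lane_size)
        self._next = 0  # round-robin cursor so concurrent lookups spread across lanes

    def route(self, estimated_rows: int) -> List[int]:
        """Return the processors a query should run on."""
        if estimated_rows <= SHORT_QUERY_MAX_ROWS:
            lane = self.lanes[self._next % len(self.lanes)]
            self._next += 1
            return lane.processors
        # Large analytic query: fan out across every processor.
        return self.all_processors
```

With `SegmentRouter(256, 16)`, consecutive short lookups land on processors 0 to 15, then 16 to 31, and so on, while a query estimated to scan a billion rows is spread across all 256 processors. A real system would also have to run the lanes concurrently and size them adaptively; this only shows the placement decision.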

The second feature of Neoview 2.3 is called Skewbuster, which is based on a set of patents around a technique we developed to eliminate one of the bigger challenges at the very complex, analytic end of the spectrum. While Adaptive Segmentation helps with high-volume work, Skewbuster addresses a particular problem that arises when you execute on large data sets in parallel and the data is skewed.

A common example: you assign every customer in your system a customer ID or number, but when you don’t know a customer's number you put a zero in the field, and it turns out that 15% of your customer IDs are zero. That causes a lot of problems for big parallel computer systems, and DBAs spend a lot of their time trying to manage their way out of it.

So the Skewbuster techniques automatically eliminate those problems and allow very large queries to run much faster. We’ve seen speedups of as much as 30 to 40x on very large queries with Skewbuster in place.
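Why does a zero-filled ID column hurt? In a parallel system that distributes rows by hashing the join key, every row with the same key lands on the same processor, so one processor ends up with 15% of the work while the others wait. The interview does not describe Skewbuster's patented method, so the Python sketch below shows a different, generic skew-handling technique instead: rows with an over-represented key are sprayed evenly across partitions rather than hashed to one, and the matching rows from the other side of the join are replicated to every partition. `find_hot_keys`, `partition_for_join`, `local_join`, and the `factor` threshold are all hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Set, Tuple

Row = Tuple[Hashable, str]  # (join_key, payload)

def find_hot_keys(rows: List[Row], num_partitions: int, factor: float = 2.0) -> Set[Hashable]:
    """Keys holding more than `factor` times a fair per-partition share of the rows."""
    counts = Counter(key for key, _ in rows)
    fair_share = len(rows) / num_partitions
    return {key for key, n in counts.items() if n > factor * fair_share}

def partition_for_join(big: List[Row], small: List[Row], num_partitions: int):
    """Partition both join inputs so that hot keys do not pile up on one partition."""
    hot = find_hot_keys(big, num_partitions)
    big_parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    small_parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    spray = 0
    for key, payload in big:
        if key in hot:
            # Round-robin hot-key rows instead of hashing them all to a single partition.
            big_parts[spray % num_partitions].append((key, payload))
            spray += 1
        else:
            big_parts[hash(key) % num_partitions].append((key, payload))
    for key, payload in small:
        if key in hot:
            # Replicate matching rows so every partition can join its share of the hot rows.
            for part in small_parts:
                part.append((key, payload))
        else:
            small_parts[hash(key) % num_partitions].append((key, payload))
    return big_parts, small_parts

def local_join(big_part: List[Row], small_part: List[Row]) -> List[Tuple[Hashable, str, str]]:
    """Ordinary hash join within one partition."""
    index = {}
    for key, payload in small_part:
        index.setdefault(key, []).append(payload)
    return [(key, b, s) for key, b in big_part for s in index.get(key, [])]
```

Joining each `big_parts[i]` with `small_parts[i]` and concatenating the results gives the same rows as a single un-partitioned join; the trade-off is that the replicated rows for hot keys are stored once per partition, which is cheap only when the side being replicated is small for those keys.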

Skewbuster and Adaptive Segmentation can be seen as complementary bookends: Skewbuster for very large, complex, traditional analytic queries and Adaptive Segmentation for extremely high-volume but low-complexity queries.

A third feature of Neoview 2.3 is called the Transporter. Most data warehouses have some kind of product you use to load the warehouse and some kind of product you use to extract from it, and these are typically thought of as batch operations. Sometimes tables are even taken offline so that data can be loaded at night, when nobody is using the database, or extracted to send to somebody. But in the world I just described, with hundreds of thousands of customer service agents, you can’t really take the system offline like that.

I like to think of Transporter as something that ingests. It fills the role of a loader, but it is an extremely fast way of bringing data in and getting it ingested into the database while the database is online, being queried and used. So the database never has to go offline, and it doesn’t have to be modified, have its indexes dropped, or undergo any of the maintenance people would normally have to do. The streams of data could be events pouring in as call detail records off switches, or a point-of-sale stream. As those events occur, they are ingested into the table in real time.
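As a rough illustration of online ingestion (not of Transporter itself, whose internals the interview does not cover), the Python sketch below groups an incoming event stream into small batches and appends each batch to a table under a short lock, so concurrent readers keep querying while the table grows. `OnlineTable`, `micro_batches`, `ingest`, and the batch size are hypothetical; a real loader would write to database storage rather than a Python list.

```python
import threading
from typing import Any, Callable, Dict, Iterable, Iterator, List

Event = Dict[str, Any]

def micro_batches(stream: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group a (possibly endless) stream into batches of at most `batch_size` events."""
    batch: List[Event] = []
    for event in stream:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

class OnlineTable:
    """In-memory stand-in for a table that stays queryable while data is loaded."""

    def __init__(self) -> None:
        self._rows: List[Event] = []
        self._lock = threading.Lock()

    def append_batch(self, batch: List[Event]) -> None:
        # Short critical section: each batch becomes visible to readers all at once.
        with self._lock:
            self._rows.extend(batch)

    def count(self, predicate: Callable[[Event], bool] = lambda r: True) -> int:
        # Readers copy a consistent snapshot, then scan it without holding the lock.
        with self._lock:
            snapshot = list(self._rows)
        return sum(1 for row in snapshot if predicate(row))

def ingest(stream: Iterable[Event], table: OnlineTable, batch_size: int = 500) -> None:
    """Trickle-feed the stream into the table; no offline window is needed."""
    for batch in micro_batches(stream, batch_size):
        table.append_batch(batch)
```

Small batches keep each lock hold brief, so a query issued mid-load sees some prefix of the stream rather than blocking until the whole load finishes.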

Complementary to that is getting data back out again, which is the publish-and-subscribe mechanism. So we built a streaming interface into the database, a product feature that we call “pub-sub.” This mechanism allows you to subscribe to the database and have events sent to you as they occur. For example, you can specify that you’d like transactions that involve the stock ticker symbol “HP” or “HPQ” sent to you. You also have the option of subscribing not only to events but also to a particular state, such as having a message sent whenever the inventory level falls below a certain point.
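The two kinds of subscription described here, events that match a condition and a state that crosses a threshold, can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not Neoview's pub-sub interface: `SubscriptionEngine`, its methods, and the row fields (`type`, `symbol`, `sku`, `qty`) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

class SubscriptionEngine:
    """Hypothetical sketch: evaluate subscriptions as each row is ingested."""

    def __init__(self, inventory: Dict[str, int]):
        self.inventory = dict(inventory)  # sku -> units on hand
        self._event_subs: List[Tuple[Callable[[Row], bool], Callable[[Row], None]]] = []
        self._state_subs: List[Tuple[str, int, Callable[[str, int], None]]] = []

    def subscribe_events(self, predicate: Callable[[Row], bool],
                         callback: Callable[[Row], None]) -> None:
        """Deliver every ingested row that matches `predicate`."""
        self._event_subs.append((predicate, callback))

    def subscribe_state(self, sku: str, threshold: int,
                        callback: Callable[[str, int], None]) -> None:
        """Notify when units on hand for `sku` drop below `threshold`."""
        self._state_subs.append((sku, threshold, callback))

    def ingest(self, row: Row) -> None:
        # Event subscriptions: push matching rows straight to subscribers.
        for predicate, callback in self._event_subs:
            if predicate(row):
                callback(row)
        # State subscriptions: a sale changes inventory, which may cross a threshold.
        if row.get("type") == "sale":
            sku = row["sku"]
            before = self.inventory.get(sku, 0)
            after = before - row["qty"]
            self.inventory[sku] = after
            for sub_sku, threshold, callback in self._state_subs:
                # Fire only on the transition below the threshold, not on every later sale.
                if sub_sku == sku and before >= threshold > after:
                    callback(sku, after)
```

For example, `engine.subscribe_events(lambda r: r.get("symbol") in {"HP", "HPQ"}, handler)` receives only HP and HPQ trades. The state subscription is deliberately edge-triggered, firing once when the level first drops below the threshold; re-notifying on every sale below it would be an equally valid design choice.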

MK: So this is similar to having a “Google Alert” being sent to you – but in a more specific way.

GB: Yes. Picture Google Alerts on billions of rows of data: as events occur and states change, you can be subscribed to the database engine itself.

MK: I have noticed that Google is not always savvy enough to differentiate between certain terms, so it sometimes gives you information that is not really relevant to your search.

GB: Well, in this case the subscriptions are on structured data rather than unstructured data. In Google’s case, they are doing text string matching, whereas what HP would be doing is more around structured information like inventory levels. So we would have records with fields such as date, time and transaction ID, somewhere in there might be a field recording how many units were sold, and in the database there’s a table keeping track of how many units you have on hand. So we can watch the structured information the same way Google does text matching against unstructured streams. So yes, this is a kind of structured analogy to what Google does; you described it well.

On top of all this, when you start running all these different workloads at the same time, it becomes very tricky to guarantee service levels and isolate workloads from each other. For instance, when a call center hits me and wants to know things about a certain customer, I should always respond in a second or less, and it should not matter that someone in the back room just started a huge job doing analytic number crunching against the same data. Since the database is shared by different users, I’ve got to be able to guarantee service levels and protect one workload from another.

In the last year, we’ve done an enormous amount of work around workload management that culminated in our Neoview 2.3 release. One subsystem collects, in real time, statistics on everything that’s running, so that we can understand every user, what they’re doing, how much memory they’re consuming and so on. Another subsystem sits on top of that and actually manages all these workloads. It allows you to set priorities and service levels, for example queuing multiple instances of jobs so they do not impact other users; once the system has free resources, the queued jobs start automatically.

So there is an automatic workload manager on top that helps prioritize and throttle these workloads, so that I can guarantee the right service level to mission-critical operational jobs in the face of other workloads that may be more back-room and analytic.
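A minimal Python sketch of the admission-control part of such a workload manager: a fixed number of concurrent slots, some reserved for mission-critical work, with everything else queued by priority and started automatically when a slot frees up. The class, the `CRITICAL`/`BATCH` constants, and the slot-reservation policy are assumptions for illustration; the interview does not say how Neoview's workload manager is implemented, and the statistics-collection side is omitted here.

```python
import heapq
import itertools
from typing import List, Set, Tuple

CRITICAL = 0  # e.g., call-center lookups (lower number = higher priority)
BATCH = 1     # e.g., back-room analytic jobs

class WorkloadManager:
    """Hypothetical admission controller with slots reserved for critical work."""

    def __init__(self, max_running: int, reserved_for_critical: int):
        self.max_running = max_running
        self.reserved = reserved_for_critical
        self.running: Set[str] = set()
        self._queue: List[Tuple[int, int, str]] = []  # (priority, arrival order, job_id)
        self._arrival = itertools.count()  # FIFO tie-breaker within a priority

    def _can_start(self, priority: int) -> bool:
        # Non-critical jobs may not occupy the reserved slots.
        limit = self.max_running if priority == CRITICAL else self.max_running - self.reserved
        return len(self.running) < limit

    def submit(self, job_id: str, priority: int) -> bool:
        """Start the job now if allowed; otherwise queue it. Returns True if started."""
        if self._can_start(priority):
            self.running.add(job_id)
            return True
        heapq.heappush(self._queue, (priority, next(self._arrival), job_id))
        return False

    def finish(self, job_id: str) -> List[str]:
        """Free the job's slot and start queued jobs, highest priority first."""
        self.running.discard(job_id)
        started = []
        while self._queue and self._can_start(self._queue[0][0]):
            _, _, next_job = heapq.heappop(self._queue)
            self.running.add(next_job)
            started.append(next_job)
        return started
```

With `WorkloadManager(max_running=4, reserved_for_critical=1)`, at most three analytic jobs run at once, so a call-center lookup can always get the fourth slot; any extra analytic jobs wait in the queue and start on their own when capacity returns.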

I think this release, Neoview 2.3, gives us at HP some really interesting, industry-leading capabilities. Until now, Neoview was a platform that competed squarely with some of the biggest products in the business that did pure analytics.

Now we’ve extended that to be able to do mission-critical, real-time types of operational BI scenarios where we’re plugged right into the business and no longer is the client going to be an analyst sitting there with a query tool. Instead the client might be a call center or an operational system that’s touching a customer and we have to be able to connect to them, provide data to them and provide service levels they can count on.

MK: Greg and Vish, it was a pleasure speaking to both of you today.

 

(Copyright 2008 - Dashboard Insight - All rights reserved.)
