The toolbox is half-full - for large data warehouses - includes product table - Data Warehousing Directions, part 3 - Industry Trend or Event
Linda Wilson
Big, bigger, biggest. Scalability, in a nutshell, is the issue facing IS managers as they plan for their future data warehouse and data analysis needs.
To meet users' demands for more data from more sources, they'll grow their warehouses up to 10Tb. To scale that high without sacrificing performance, they'll move to 64-bit platforms. With warehouses that large and complex, IS managers demand integrated, turnkey solutions from their software vendors.
"If we could get everything from one vendor that would certainly be a plus," says Beach Clark, manager of network architecture at The Home Depot Inc. in Atlanta. Home Depot's IS staff has talked about building an enterprise warehouse, says Clark, but hasn't gone ahead for a number of reasons, including complexity.
Data warehouse vendors have heard the demands of users and potential customers like Home Depot, and they're responding.
Relational database vendors, for example, are adding multidimensional capabilities to their products. They are also building alliances or developing in-house products that will allow them to offer tight integration with front-end and management-layer tools. Meanwhile, multidimensional database vendors, who came into being in response to users' demands for robust decision-support tools, are concentrating on front-end analysis software -- OLAP (online analytical processing) and data mining tools -- and integrating their front ends with other database products. They're also positioning mining tools to search for patterns in huge stores of data, while OLAP tools follow up with analysis of those patterns.
In the middle layer, management tools transparently enable users to find and convert data into the format required for analysis. These vendors want to eliminate the need for homegrown interfaces to connect users to data from myriad internal and external sources.
Relational databases appeal to many organizations shopping for warehousing solutions because they provide fast access to lots of data. Still, general-purpose relational databases weren't designed to scale into the multi-terabyte range. RDBMS vendors such as Oracle Corp., Redwood Shores, Calif.; Sybase Inc., Emeryville, Calif.; and Microsoft Corp., Redmond, Wash., are adjusting their products.
SQL Server should be able to scale to 1Tb in the next 12 to 18 months, according to Dan Basica, SQL Server product manager at Microsoft. Meanwhile, Oracle claims Version 7.3 of the Oracle database can scale into the terabytes today. And, Sybase says its IQ product will scale to 10Tb today.
Relational database vendors also need to adapt to the OLAP capabilities that are characteristic of decision support. Because relational databases were originally designed for OLTP (online transaction processing), they aren't suited to quickly handling analytical queries. As databases become larger, working around this shortcoming is increasingly difficult.
That's one reason why SeaFirst Bankcard Services, Spokane, Wash., purchased Red Brick Warehouse VPT from Red Brick Systems Inc., Los Gatos, Calif., several years ago. SeaFirst was concerned that traditional relational databases wouldn't provide adequate performance. "I don't worry about outgrowing Red Brick," says Dwain Cloninger, data warehouse manager at SeaFirst. "If it can't scale to 10Tb now, I know it will be able to by the time we'll need it." SeaFirst's warehouse currently stores 150 million records totaling 60Gb.
To appeal to performance-conscious customers like Cloninger, Oracle offers a combination of relational and multidimensional capabilities. Oracle Express Server functions either as a multidimensional database or as a management layer, in which data is stored in a separate relational database and extracted and analyzed dynamically. The upcoming Oracle 8.0 will tighten the integration between Express Server and the RDBMS to speed the performance of multidimensional calculations performed on data stored in the relational database.
Sybase, meanwhile, has a slightly different strategy. The company's data mart product, Sybase IQ, is a relational database optimized for analysis. "The product allows for the storage of every transaction in a specialized engine," says Richard Finkelstein, president of Performance Computing, a Chicago-based consultancy specializing in data warehouses and intranet strategies.
IQ is designed for both a 64-bit platform and a parallel processing architecture. Sybase will ship its first implementation for a 64-bit platform, for Digital Alpha servers, during the first quarter of 1997. Because IQ stores individual transactions rather than just summaries, it is particularly well-suited to analysis based on detailed data, such as predicting a customer's future purchases based on past transactions.
"This is a way for people to develop applications quickly and still scale," contends Joshua Bersin, group director of data warehouse solutions at Sybase.
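The difference between detail and summary data can be made concrete with a small sketch. The Python below is illustrative only and is not how Sybase IQ works internally: the transaction log, the customer IDs, and the function names are all hypothetical. It shows a question ("what do customers tend to buy next?") that can be answered only when individual transactions are kept, because a table of per-product totals has already thrown away the order of purchases.

```python
from collections import Counter, defaultdict

# Hypothetical transaction log: (customer, sequence number, product).
TRANSACTIONS = [
    ("c1", 1, "paint"), ("c1", 2, "brushes"),
    ("c2", 1, "paint"), ("c2", 2, "brushes"),
    ("c3", 1, "paint"), ("c3", 2, "ladder"),
    ("c4", 1, "lumber"), ("c4", 2, "nails"),
]


def next_purchase_counts(transactions):
    """Count how often each product is followed by another product
    in the same customer's purchase history."""
    history = defaultdict(list)
    # Sorting orders rows by customer, then by sequence number.
    for customer, _seq, product in sorted(transactions):
        history[customer].append(product)
    follows = defaultdict(Counter)
    for products in history.values():
        for current, nxt in zip(products, products[1:]):
            follows[current][nxt] += 1
    return follows


def likely_next(follows, product):
    """Return the most frequent follow-on purchase, or None if unseen."""
    if product not in follows:
        return None
    return follows[product].most_common(1)[0][0]
```

With this toy data, `likely_next(next_purchase_counts(TRANSACTIONS), "paint")` returns `"brushes"`. A real warehouse would run this kind of analysis inside the database engine rather than in application memory.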
For its part, Microsoft is taking a partnering approach. Answering IS managers' concerns about SQL Server's scalability at the warehousing high end, Microsoft announced in September that it had partnered with NCR Corp., Dayton, Ohio, and its Teradata database, which is scheduled to support a 64-bit environment in 1998. The alliance allows customers to easily build a solution consisting of a series of data marts on SQL Server and a central, enterprise-wide warehouse on Teradata. Standard interfaces between SQL Server and Teradata will be available in the next release of SQL Server in 1997.
As for multidimensional databases, they can't scale high enough to satisfy the performance demands of corporations building multi-terabyte warehouses. Multidimensional databases are designed to provide quick access to precalculated summaries for commonly asked questions, making them ideal for OLAP.
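As a rough illustration of what "precalculated summaries" means, the sketch below builds totals for every combination of two dimensions ahead of time, so that a commonly asked question becomes a single lookup instead of a scan of the detail rows. It is a minimal Python sketch under stated assumptions: the sales rows, the `ALL` marker, and the function names are hypothetical, and no vendor's storage format is implied.

```python
from collections import defaultdict

# Hypothetical detail records: (region, product, amount). Illustrative only.
SALES = [
    ("East", "paint", 120.0),
    ("East", "lumber", 300.0),
    ("West", "paint", 80.0),
    ("East", "paint", 50.0),
    ("West", "lumber", 210.0),
]

ALL = "*"  # marker meaning "every value of this dimension"


def build_summaries(rows):
    """Precalculate totals for every (region, product) combination,
    including the roll-ups across each dimension."""
    totals = defaultdict(float)
    for region, product, amount in rows:
        for r in (region, ALL):
            for p in (product, ALL):
                totals[(r, p)] += amount
    return dict(totals)


def scan_total(rows, region=ALL, product=ALL):
    """Answer the same question by scanning every detail row."""
    return sum(
        amount
        for r, p, amount in rows
        if region in (ALL, r) and product in (ALL, p)
    )


SUMMARIES = build_summaries(SALES)

# A commonly asked question becomes a single lookup...
east_paint = SUMMARIES[("East", "paint")]
# ...and agrees with the answer from a full scan of the detail rows.
assert east_paint == scan_total(SALES, region="East", product="paint")
```

Even in this toy case, two regions and two products produce nine stored cells once the roll-ups are counted. The number of cells multiplies with each added dimension, which is one way to see why this approach comes under pressure as warehouses grow.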
Because of scalability shortcomings, multidimensional database vendors are concentrating R&D efforts on their front-end products and forming alliances with vendors of other databases. For example, Pilot Software Inc., Cambridge, Mass., earlier this year announced support of OLAP query extensions in Microsoft SQL Server 6.5. Pilot Discovery Server, a data mining tool released in September, connects with both Microsoft SQL Server and Oracle 7.
Upgrading databases is not enough, though. Users want not only to quickly analyze small subsets of a data warehouse, but also to comb through the entire warehouse. Multidimensional databases are being integrated so that OLAP and mining tools can be used in tandem. That's the case with Pilot's OLAP and mining tools, as well as OLAP and mining products from Brio Technology Inc., Palo Alto, Calif., and DataMind Corp., Redwood City, Calif.
The idea is that mining tools will comb through huge warehouses, finding patterns in the data, while OLAP tools will perform further analysis on those patterns, delivered to users as subsets of the warehouse. The reality is quite different.
Data mining tools "are not designed right now to deal with large amounts of data," says Mike Majerczyk, a consultant specializing in data warehousing at consultancy Electronic Data Systems Corp. in Plano, Texas. Mining tools typically scale from 20Gb to 40Gb, Majerczyk says.
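The tandem idea described above (mining finds a pattern, OLAP then analyzes the matching subset) can be sketched in a few lines. This Python example is a deliberately naive illustration, not any vendor's algorithm: the baskets, store names, and function names are hypothetical, and simple pair counting stands in for a real mining technique.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: (store, set of products bought together).
BASKETS = [
    ("Atlanta", {"paint", "brushes", "tape"}),
    ("Atlanta", {"paint", "brushes"}),
    ("Spokane", {"paint", "brushes"}),
    ("Spokane", {"lumber", "nails"}),
    ("Atlanta", {"lumber", "tape"}),
]


def mine_top_pair(baskets):
    """Mining step: find the product pair that co-occurs most often."""
    counts = Counter()
    for _store, items in baskets:
        # Sorting gives each pair one canonical ordering.
        counts.update(combinations(sorted(items), 2))
    return counts.most_common(1)[0]


def drill_down(baskets, pair):
    """OLAP-style step: break the matching subset down by store."""
    by_store = Counter()
    for store, items in baskets:
        if set(pair) <= items:
            by_store[store] += 1
    return dict(by_store)
```

Here `mine_top_pair(BASKETS)` finds that brushes and paint appear together in three baskets, and `drill_down` shows two of those in Atlanta and one in Spokane. The mining step has to read every basket, so an in-memory pass like this is workable only for small data, which is consistent with the scaling limits Majerczyk describes.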
While smaller vendors are positioning themselves as integrated solutions providers, larger relational database vendors like Sybase and Oracle are expected to become the vendors of choice for integrated, easy-to-use solutions. Notes Steve Cranford, Baltimore-based partner in charge of data warehousing practice at KPMG: "They are building a full array of products so they can say they have soup-to-nuts solutions."
[Graph OMITTED]
RELATED ARTICLE: Outlook for VLDB Tools
Very large databases (VLDBs) are appearing across a variety of industries. In the short-term, however, there will be a dearth of the 64-bit software necessary to effectively plumb the depths of these monster data warehouses. Here's a sampler of current 64-bit strategies and upcoming products for handling very large databases:
Company | Product | Upcoming event | When
Brio Technology Inc. | BrioQuery Enterprise | No plans for 64-bit | -
Intersolv Inc. | DataDirect | 64-bit design | Available now for Digital's Alpha
Microsoft Corp. | NT | Operating system port to 64-bit for Intel/HP microprocessor | 1998
Microsoft Corp. | SQL Server | First 64-bit version probably for Intel/NT platform | 1998
Microsoft Corp. | SQL Server | Interfaces to NCR Teradata platform | 1997
NCR Corp. | Teradata relational database | Upgrade to 64-bit | 1998
NCR Corp. | Teradata relational database | Interfaces to Microsoft's SQL Server | 1997
Oracle Corp. | Express, the multidimensional front end to Oracle RDBMS | Tighter integration between front-end tool and Oracle 8 | Not determined
Oracle Corp. | Oracle 8.0 RDBMS | Upgrading to 64-bit | Not determined
Pilot Software Inc. | Decision Support Suite | No plans for 64-bit at this point | -
Platinum Technology Inc. | InfoRefiner, Repository, InfoPump, InfoHub, Forest & Trees, etc. | Integrating various 32-bit products; no plans for 64-bit | Q3/97
Prism Solutions Inc. | Warehouse Manager and Directory Manager | No plans for 64-bit | -
Red Brick Systems Inc. | Data warehouse engine | Integrating data warehouse engine with data mining tools | October 1996
Sybase Inc. | IQ, the high-speed analysis tool for Sybase RDBMS | 64-bit ready now, implementing for Digital Alpha server | Q1/97
Source: Sentry Market Research
Linda Wilson is a writer based in Glen Ellyn, Ill.
COPYRIGHT 1996 Wiesner Publications, Inc.
COPYRIGHT 2004 Gale Group