At a low cost of entry, the emerging technologies that define the big data trend are already delivering value, so first consider the problems you need to solve -- then dive in.
Already, "big data" has become one of those buzzphrases you say with an apologetic smirk. It sounds like marketecture, broad enough to apply to almost anything.
So let's clear up what big data is and isn't. Perhaps you've heard the canonical "three V's" definition: data high in volume, velocity, and variety. In other words, big data comes in multiterabyte quantities, accrues or changes fast, often resists normalized structure -- and tends to demand technologies beyond the tried-and-true RDBMS or data warehouse.
That cluster of new technologies around big data -- including Hadoop, a wild array of new NoSQL databases, massively parallel processing (MPP) analytic databases, and more -- represents the biggest leap forward in data management and analytics since the 1980s. That's really what big data is about. And these emerging technologies are already delivering business value: in deep insights about customer behavior, in faster app dev cycles, in the ability to use commodity hardware, and in reduced software licensing costs, because almost all these new technologies are open source.
Assuming your data volumes are exploding as fast as everyone else's, you're part of the big data trend whether you like it or not. So why not employ the tools purpose-built for the big data era? It's a better strategy than blindly buying more Oracle licenses or building another gold-plated data warehouse. Where you start, though, depends on the problems you want to solve.
Problem No. 1: I don't want to pay Oracle more money
This is not a big data problem per se, but software surrounding the big data trend may help solve it.
Many companies use Oracle (or DB2 or SQL Server) as their default data store for almost everything. After all, the RDBMS is probably the most successful technology in the history of software, and if you want a battle-tested, unassailable RDBMS with all the bells and whistles, you choose Oracle (or other ironclad commercially licensed software) and pay a lot for it. That's where data goes, period.
Now, nobody would power down their Oracle servers and port all their existing customer and product data to, say, MongoDB. For one thing, the security isn't there yet -- and by their nature NoSQL databases tend to compromise ACID compliance. Also, when complex transactions are involved, even NoSQL vendors will tell you that an RDBMS remains your best solution. Finally, if you just want to save money, you're not going to waste a fortune rearchitecting an Oracle database and its applications for NoSQL.
Problem No. 2: I can't get what I want from BI
Business intelligence always seems to rank among the top few technology priorities for big companies. Yet year after year, few seem very happy with the results.
It all boils down to the questions you want to ask. If you have queries related to, say, the regional distribution of your transactions or trends in the costs of your materials -- or if you want to make some predictions about how all that may play out next year -- conventional business intelligence and analytics systems probably remain your best bet.
Problem No. 3: Help! I can't move fast enough!
In the good old days, databases were a lot easier to spec out. If you were a big enterprise scoping out a new order entry system, you probably had a solid idea of how many people would use it, when the peak demand would be, and how frequently (or infrequently) the data model would change.
That was before the "agile" days of the Web. Now companies experiment with all kinds of new applications, many of them public-facing Web apps. Some wither quickly because no one finds them compelling; others may explode in popularity and turn the database into a bottleneck overnight. Moreover, shifts in customer needs, brainstorms for new enhancements, and so on demand a fluid data model.
With an RDBMS, data needs to fit into rows and columns, and required fields rule, so a request to alter the data model kicks off an elaborate change management process. To scale, you first upgrade the horsepower of a single RDBMS server; when that runs out and you need to add more RDBMS servers, you must "shard" the database across them, which incurs other complications.
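To make one of those sharding complications concrete, here is a minimal Python sketch. It assumes the simplest possible scheme, where each record's key is hashed and the result taken modulo the number of servers; real deployments often use range-based or consistent-hash schemes instead. The function names and the customer IDs are hypothetical, invented purely for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index for a record key using a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap by shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after resizing."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical customer IDs; any set of string keys would do.
customer_ids = [f"customer-{i}" for i in range(10_000)]

# Going from 4 database servers to 5 under naive modulo sharding.
fraction = moved_fraction(customer_ids, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of records would have to move to a different server")
```

Under this naive scheme, adding a fifth server forces the large majority of records to relocate, which is one reason growing a sharded RDBMS is rarely as simple as plugging in another box.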