Charting the Future with Big Data
What Is Big Data?
Interest in big data has exploded since the McKinsey Global Institute issued its report on “Big Data: The Next Frontier for Innovation, Competition, and Productivity” in 2011. An era will soon come, the report notes, when the amount of information requiring analysis will exceed the capacity of existing data processing systems; being able to make strategic use of such data can thus provide the competitive edge that leads to new business opportunities.
What exactly, though, is big data? Broadly speaking, it is a term used to describe data that is so large it is difficult to process using traditional database and software technologies. More specifically, it refers to the total agglomeration of user-specific data generated while users are logged on to a service, usually including various information about a user’s attributes.
Typically, it consists of data relating to the use of Internet services. It may also include point-of-sale customer records, information from smart meters, and measurement data from sensors like accelerometers and wireless wearables. Unlike sampled data, big data looks at all information generated by a service’s users, meaning that it is collated not in units of days or hours but of minutes and seconds. Information regarding particular localities can be ascertained not just for the municipal or equivalent level but also for areas measuring just a few dozen meters across.
The Three Vs
Big data is generally characterized by its huge “volume,” the broad “variety” of information, and the great “velocity” at which it streams in—the so-called three Vs. But when we look at actual big data sets, say, data collected by smart meters or accelerometers, there is not the variety one would find in a list of searched keywords, nor are they necessarily marked by great velocity or volume.
So the real distinguishing hallmarks of big data may have more to do with how they are applied. There are three such features.
The first is the relative paucity of contextual and attribute information compared to data derived from behavioral observation or questionnaire surveys. Even if information is available regarding when and how long a service was used, there are few contextual clues, such as why the service was used and under what conditions. There is likewise little information about users’ attributes (occupation, education level, income, whether or not they have small children, and so on). This presents a challenge when using such data for marketing purposes.
The second is that all data generated is included, unlike in the case of sampled data. This enables the measurement of the tail end of data sets—where patterns may be different from those seen in high-volume zones.
And the third is the availability of information in real time—at the point of purchase or use—so that a response can be made immediately. Relevant information about certain cars can be streamed, for example, as soon as a user with an interest in particular motor vehicles visits a related web page. Such services are impossible with traditional marketing data.
Delivering More Value
Next, let us consider the six steps toward the fuller utilization of big data from the marketing viewpoint of delivering value to customers.
The first step is figuring out the structure and needs of the market, based on which many new products and services can be strategically developed. The second is designing the value proposition of the services based on such analysis. The third step is customer-specific marketing, and the fourth is responding to identified market needs, such as through advertising, email, and website content. The fifth step is monitoring the impact of the first four steps, and the sixth is getting hard numbers as forecasts for the near future.
The areas in which big data is particularly strong are the last three steps. Big data is needed in order to target services and information to specific users. A typical example would be filtering inappropriate ads based on patterns gleaned from a user’s web history or offering different discount coupons according to past consumption behaviors.
Real-time advertising and the autocomplete function of search engines—where a trending word or phrase is automatically displayed without the user typing it in completely—are products of machine learning from accumulated user data. The more specific the information one wants to obtain, the more indispensable big data becomes. An example would be tracking the changes in late-afternoon, bulk purchase patterns of chicken over the course of a week at a particular shop in front of a train station.
Accurate Forecasts through Predictive Analytics
Perhaps one of the biggest hopes people place in big data is its ability to accurately forecast not the distant future but the near and immediate future. Yahoo Japan has applied big data to this end in two interesting instances, the first being to predict the results of the July 2013 House of Councillors election.
Our prediction results were quite good; better, in fact, than those of any of the major media networks. What we realized was that election results had a high correlation with the amount of online attention candidates were receiving (keyword searches, the number of tweets, and Facebook entries), so we made projections for each electoral district based on this information. Traditional forecasts take a more comprehensive approach, considering such factors as the comments of election experts, voter interests, and poll results. All we did was look at the patterns and volume of online data, and we wound up with better projections.
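To make the idea concrete, here is a minimal sketch in Python of ranking candidates in one district by their share of online attention. It is an illustration only, not the model we actually used: the function names, the equal weighting of the three signals, and the figures for candidates A, B, and C are all assumptions made up for the example.

```python
def attention_share(signals, weights):
    """Return each candidate's share of weighted online attention.

    signals: {candidate: {"searches": n, "tweets": n, "facebook": n}}
    weights: {signal_name: weight}  (hypothetical weights, not fitted values)
    """
    scores = {
        cand: sum(weights[name] * count for name, count in counts.items())
        for cand, counts in signals.items()
    }
    total = sum(scores.values())
    return {cand: (score / total if total else 0.0) for cand, score in scores.items()}


def project_winners(signals, weights, seats):
    """Project the `seats` candidates with the largest attention share."""
    shares = attention_share(signals, weights)
    return sorted(shares, key=shares.get, reverse=True)[:seats]


# Made-up figures for a two-seat district.
district = {
    "A": {"searches": 1200, "tweets": 300, "facebook": 100},
    "B": {"searches": 800, "tweets": 500, "facebook": 50},
    "C": {"searches": 200, "tweets": 100, "facebook": 20},
}
equal_weights = {"searches": 1.0, "tweets": 1.0, "facebook": 1.0}

shares = attention_share(district, equal_weights)
winners = project_winners(district, equal_weights, seats=2)
```

In practice the weights would have to be calibrated against past results, and attention volume alone says nothing about whether the attention is favorable.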
The other example was our economic forecasts. The Cabinet Office announces its economic indicators of business conditions with a time lag of about one or two months. We decided to come up with a way of ascertaining current conditions, not those of two months ago. We started by analyzing the keywords searched on Yahoo Japan, of which there are some 7.5 billion varieties per year. We narrowed the list down to around 600,000 that were constantly searched for and extracted approximately 200 that were most closely correlated with business conditions. The index we then built from those 200 words turned out to be quite accurate.
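The three stages described here (keep the steadily searched keywords, pick those most correlated with the official indicator, combine them into an index) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure we actually followed: the use of Pearson correlation, the `min_count` threshold, the averaging of signed z-scores, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def zscores(xs):
    """Standardize a series to mean 0 and unit variance."""
    n = len(xs)
    m = sum(xs) / n
    sd = sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [0.0 if sd == 0 else (x - m) / sd for x in xs]


def select_keywords(search_counts, indicator, min_count, top_k):
    """Keep keywords searched at least min_count times in every period,
    then return the top_k by absolute correlation with the indicator."""
    steady = {kw: s for kw, s in search_counts.items() if min(s) >= min_count}
    ranked = sorted(
        steady, key=lambda kw: abs(pearson(steady[kw], indicator)), reverse=True
    )
    return ranked[:top_k]


def build_index(search_counts, indicator, keywords):
    """Average the standardized series, flipping negatively correlated keywords."""
    index = [0.0] * len(indicator)
    for kw in keywords:
        sign = 1.0 if pearson(search_counts[kw], indicator) >= 0 else -1.0
        for t, z in enumerate(zscores(search_counts[kw])):
            index[t] += sign * z / len(keywords)
    return index


# Toy data: six periods of an official indicator and four made-up keywords.
indicator = [100, 102, 101, 105, 107, 110]
search_counts = {
    "job openings": [50, 55, 52, 60, 63, 70],
    "unemployment benefits": [90, 85, 88, 75, 72, 65],
    "weather": [40, 42, 39, 41, 40, 42],
    "rare term": [0, 3, 0, 0, 1, 0],
}

selected = select_keywords(search_counts, indicator, min_count=10, top_k=2)
index = build_index(search_counts, indicator, selected)
```

Once the keywords are fixed using past data, the index can be recomputed from current search counts without waiting for the official figures, which is the point of the exercise.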
In these ways, big data can be highly effective in predicting the near or immediate future. It is already being actively used for supply chain management at convenience stores, for instance, which is one reason they manage to turn a profit despite the fact that deliveries of thousands of products are made three times a day to each shop.
The use of big data is not without its obstacles, however. First of all, very few organizations actually have a volume of data on their hands that can be described as “big.” Secondly, since the data that is available is not sufficiently organized, it cannot be integrated and used collectively. Retailing data, for example, is generally classified hierarchically, but the layers tend to be uniquely structured for each retailing chain, and there is little compatibility between chains, even when they belong to the same corporate group. There is no easy way to amalgamate such data and to fully exploit its potential.
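The integration problem can be illustrated with a small Python sketch that maps each chain's own category codes onto a shared taxonomy before aggregating. The chain names, category codes, and mapping table below are all hypothetical; in practice, building and maintaining such a mapping is the hard part.

```python
from collections import defaultdict

# Hypothetical mapping from (chain, chain-specific code) to a shared category.
CATEGORY_MAP = {
    ("chain_a", "FOOD/MEAT/POULTRY"): "poultry",
    ("chain_b", "21-04"): "poultry",
    ("chain_a", "FOOD/DAIRY/MILK"): "milk",
    ("chain_b", "30-01"): "milk",
}


def harmonize(sales, category_map):
    """Aggregate unit sales by shared category.

    sales: iterable of (chain, chain_specific_code, units)
    Returns (totals_by_shared_category, list_of_unmapped_codes).
    """
    totals = defaultdict(int)
    unmapped = []
    for chain, code, units in sales:
        shared = category_map.get((chain, code))
        if shared is None:
            # Codes with no known equivalent are reported rather than guessed at.
            unmapped.append((chain, code))
            continue
        totals[shared] += units
    return dict(totals), unmapped


sales = [
    ("chain_a", "FOOD/MEAT/POULTRY", 120),
    ("chain_b", "21-04", 80),
    ("chain_a", "FOOD/DAIRY/MILK", 45),
    ("chain_b", "99-99", 10),
]
totals, unmapped = harmonize(sales, CATEGORY_MAP)
```

Keeping the unmapped codes visible, rather than silently dropping them, shows how much of the data still cannot be combined.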
Even after these two hurdles are cleared, a mechanism is needed to handle the traffic of such data in real time and to turn it into useful information. Most companies do not as yet have such a capacity. They often do not even have the infrastructure to store the volume of information that continues to stream in, and even if they do, they do not have the human resources to utilize and maintain such data.
What these companies need are professionals with the data science and engineering skills to turn the vast sea of data into practical business solutions. The reality at most companies in Japan today is that they do not have enough data, cannot integrate what they do have, do not have mechanisms to process and utilize the data, are without adequate storage capacity, lack the human resources to utilize and maintain stored data, and do not have the expertise to turn the amassed data into practical solutions.
The Polarization of Security Measures
One criticism that is often voiced in relation to big data is the need for privacy protection. But because such concerns are being expressed from multiple vantage points, the arguments tend to become rather convoluted. Most major Internet operators already have critical security measures in place. Yahoo Japan, for one, clearly delinks any personally identifiable information from data regarding behavioral history. All data used is strictly anonymous.
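One common way of delinking identity from behavioral logs is to replace the user identifier with a keyed hash and drop the directly identifying fields before analysis. The Python sketch below shows the general technique only; it is not a description of Yahoo Japan's actual systems, and the field names and secret key are made up. Note also that this is pseudonymization: a keyed token by itself does not guarantee anonymity if the remaining behavioral fields can single out an individual.

```python
import hashlib
import hmac

# Hypothetical set of directly identifying fields to strip from log records.
PII_FIELDS = {"user_id", "name", "email"}


def pseudonym(user_id: str, secret_key: bytes) -> str:
    """Derive a stable token from the user ID with HMAC-SHA256.

    Without the secret key, the token cannot be recomputed from a known ID.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def delink(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with PII removed and a pseudonymous token added."""
    behavior = {k: v for k, v in record.items() if k not in PII_FIELDS}
    behavior["user_token"] = pseudonym(record["user_id"], secret_key)
    return behavior


key = b"example-key-kept-outside-the-analytics-environment"
raw = {
    "user_id": "u-1001",
    "name": "Example User",
    "email": "user@example.com",
    "page": "/cars/compact",
    "timestamp": "2014-10-01T18:30:00",
}
clean = delink(raw, key)
```

The same user always maps to the same token, so behavior can still be analyzed over time, while the key that links tokens back to identities is held separately.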
But at most traditional companies, data is managed in ways that continue to link personal information directly with log data. Many do not have adequate safeguards and are thus often not even aware whether or not their data has leaked. So there appears to be a polarization in the use and protection of personal data. This suggests a need for corporate guidelines in the use and storage of such information.
Handicaps Faced by Japanese Companies
Japanese companies face three major handicaps compared to their counterparts in the United States and elsewhere. The first is that there are few corporate entities actually generating big data. The second is the lack of infrastructure to facilitate the use of data. Electricity costs are many times higher than in the United States, for instance, driving up the cost of building and operating a data center. This is one reason that major IT firms have been hesitant to set up operations, including data infrastructure, in Japan. Incentives are needed to encourage more IT enterprises to build data centers in Japan, such as special electricity rates for such firms.
The third handicap, as mentioned above, is the shortage of human resources. The professionals needed are those with skills in data science, data engineering, and business management; that is, those who are familiar with the technical limitations and possibilities and have the insight to identify and solve business problems.
By data science skills, I am referring to practical knowledge of data processing, artificial intelligence, statistics, and other related disciplines. Data engineering involves the practical application of such knowledge to create and to run systems capable of generating useful information. Of course, all of these skills need not be held by a single person; but conscious and focused effort should nonetheless be made to develop human resources with these skills. This, I believe, will largely determine the success or failure of efforts to utilize big data to full advantage.
(Originally published in Japanese on October 15, 2014)