I recently posted an article, Metadata, Connection, and the Big Data Story, covering the big-data analysis process as applied to “human data” that is communicated in intentionally expressive sources such as text, video, and social likes and shares and in implicit expressions of sentiment.
The article is spun out from Q&A interviews with four industry figures: Fernando Lucini (HP Autonomy), Marie Wallace (IBM), Elliot Turner (AlchemyAPI), and Stephen Pulman (University of Oxford and TheySay). Read each interview by clicking or tapping on the name of the interviewee.
This interview is —
Analytics, Semantics & Sense: Q&A with Fernando Lucini, HP Autonomy
1) What’s the really interesting information content that we’re not really getting at yet, and what’s interesting about it?
For me it’s the electronic essence of people. We interact with many systems, we communicate, we create, and so on. We are represented as human beings by a wealth of information and context, yet we don’t connect these dots in a way that’s truly useful, so that each of us is better served by information. We have some pockets of what is referred to in the industry as “profiling”, which clumsily help present us with ecommerce and advertising, but don’t help us choose a doctor when needed, connect with our loved ones, or be better professionals through timely information. For me this is critical: we are surrounded by such a mass of generally low-value information that we are going to need this “essence” or “characterization” to be our pre-filter for information.
It’s close to my heart as it’s a human information problem and it’s what we are all about here at HP Autonomy.
2) How well are we doing with Natural Language Processing, noting that formally, “processing” includes both understanding and generation, two parts of a conversation?
Seeing as NLP in this context (question-answer) has been around and well understood for decades now, it feels like it’s having a new lease of life with things like Siri and others. But the reality is that the volumes and specialization needed to have the “conversation” mean that we will only be dealing with a portion (the most traversed) of “conversations”. I’m not sure this is the solution to our interaction with information, but it’s certainly a part of it, certainly in the consumer world. In the enterprise world this might prove trickier, as one does not share the conversations with vendors to analyze.
When faced with NLP I try to go back to the essence of the problem and work out what it is we are trying to solve. Is it that we humans tend to communicate in question form? Or is it that we tend to ask questions of our machine systems? In either case it’s a problem of understanding strong elements of information as well as the weak elements, then how they connect to other pieces of information that have strong and weak elements, and so on. I’m not sure questions are, in many cases, anything but the limited beginning of seeking answers within information. It’s how the information connects that’s critical, and our ability to tap into the right connections.
3) And how well are we able to mine and automate understanding of affective states, of mood, emotion, attitudes, and intent, in the spectrum of sources available to us?
This one for me is a supply/demand driven question. I think we are much better able to do this today than 12 months ago, and will be better again 12 months from now. This is because there is a demand for this type of analysis, and thus enough people out there are working to solve this unitary problem, using all sorts of techniques, from unsupervised to supervised, from statistical to linguistic, and anything in between.
Some sources are better suited to this analysis, say tweets or Facebook entries, principally because they are short and to the point. Then we have things like email, which will pose a substantial problem as we use prose to characterize our needs, wants, and desires with the full beauty of language.
Then we have the entire world of video and audio, where even human beings might have difficulty identifying what are very personal states of expression that depend heavily on the individual. Yet this is potentially the most interesting of media.
4) Deep learning, active learning, or maybe some form of machine learning that’s being cooked up in a research lab: What business benefits are delivered by these technologies, and what are the limits to their usefulness, technical or other?
As in the NLP question, we need to go back to basics and never lose sight of the reason why we do things, certainly as it relates to business and data. The central (and obvious) premise is that the information created by a company in all its forms represents an incredible asset to any company. Yet clearly this information is serving a purpose already as it relates to the business. So for me the business benefits in “mobilizing” this information further are in making sure that any piece of information fulfils its maximum potential. This is simple: when this information is needed it should be presented. A human being can then decide whether to use it or not. It’s about machine-augmented intelligence.
There are limits. Technically speaking there is a cost to making all information available; businesses need to have a clear view of how this translates to value or they won’t make the purchase. Then there is the fact that, whatever technology one uses as we try to help users in their daily lives, we must take into account that it’s down to the consumer to ultimately decide if the dots have been connected, and this might be a very subjective operation. This speaks to the perceived usefulness of this type of technology.
This is what we do for a living here at HP Autonomy. We connect the dots.
5) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful, situational insights delivery. How is your company keeping up?
Thankfully we put the emphasis on information and making sure we manage, understand and act on it generically. So mobile is great for us because of its greater use of information (in our case human information), and it makes some of our offerings even more relevant.
Interestingly I think mobile is turning every one of us into data scientists. Certainly data specialists and certainly data discerners. This is great because it pushes the industry to create tools for a more demanding user which in turn moves technology to new levels. This is exactly the point I make clumsily around NLP. Mobile will make NLP or question-answer systems take a leap. It’s down to the discerning consumer of the product.
6) Where does the greatest opportunity reside, for your company as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In monetizing data, that is, yourselves, or via partners, or assisting your customers? In untapped business domains or in greater uptake in the domains you already serve?
For our business we see great opportunity in three areas (in no particular order of importance, as they are all equal):
Firstly, in supporting the chief marketing officer’s organization in any company in making better and more real-time decisions as they connect consumer and product, using groundbreaking algorithms and our understanding of human information to automate what is at the moment a very manual process. This will mean companies can now consume more data about their users and mobilize more products, yet be even more agile and certainly more successful.
Secondly, we see incredible opportunity in a field that we already lead, which is the management and compliance of information, both for regulated industries and for companies that look at governance of information proactively. Securing or protecting information, evaluating the risk in information, and then being able to act suitably and in accordance with regulation and law is a considerable integration and synthesis challenge, but one which must be done 100% correctly. We continue to evolve our products to serve our customers better and better in this respect.
Finally, I think that there is a definite trend towards self-service and using tools in the cloud. So we are making a substantial bet in launching HP IDOL OnDemand, where we are taking our core information platform and launching it as a developer-friendly platform that any developer can create applications with in the cloud. The full richness of our platform is available to inquire, investigate, and improve information to deliver information-rich applications. This platform will substantially shorten the time it takes for any company to create an information-rich application for their business, but at the same time provide all the value of cloud in terms of costs and supportability. I can see that as customers find they can be more agile in creating valuable applications with HP IDOL OnDemand, they will trend towards placing more and bigger bets on this platform.
Thank you to Fernando!
Click on the links that follow to read other Analytics, Semantics & Sense Q&A responses: Fernando Lucini, Marie Wallace, Elliot Turner, and Stephen Pulman. And click here for the article I spun out from them, Metadata, Connection, and the Big Data Story.