Monday, June 27, 2011

The secret weapon of marketing war (3/3)

Compute & Conquer: Generals of text mining

The general who wins the battle makes many calculations in his temple before the battle is fought. The general who loses makes but few calculations beforehand.


In the previous part, we saw how text analytics (TA) can be used to build a virtual map (key players and their relationships) of the market segment of interest (a.k.a. the battlefield). In this part, we will dig deeper into the data to discover the dynamics of the fight: who is winning or losing, and why. If you have missed the introductory part for some reason, please follow this link. We will demonstrate all of this on the group of mobile network operators that we created in the previous part. Our collection of news articles has since grown to 674 stories. Recall that we have three key players: Orange Slovakia, T-com/T-mobile Slovakia (part of Deutsche Telekom) and Telefonica Slovakia (formerly known under the O2 brand).

Picture 1: Network operators


First, we will analyze the T-com/T-mobile brand:

Picture 2: Results for Tcom/Tmobile Slovakia

In the picture we can see an interesting topic, the red square region with the terms "pokuta" (fine)/"vypadok" (crash/outage)/"linka" (line). It represents articles dealing with the consequences of a crash of the emergency line (112) in the Zilina region, operated by T-com (the same company that runs the T-mobile brand), that caused the unnecessary death of one woman (more details here). Being in the news because of a service failure with such an impact has a negative effect on brand perception:

Picture 3: Articles regarding the consequences of the emergency line crash.

Unfortunately, this was not the only problem that got attention in the media. The results (still picture 2, 3rd row, terms "zhromazdenie" (assembly, meeting)/"valne" (general, company)/"dividend") show that another problematic situation occurred when the major shareholder in T-com/T-mobile (Deutsche Telekom, DT) decided at the annual general meeting not to pay dividends to the shareholders. This mainly affected the Slovak Republic, whose government still holds a significant (but minority) stake in the company. The recently generated negative image of T-com in the eyes of Slovak consumers was further strengthened by the fact that DT used the dividend issue to put pressure on the government to sell its remaining shares (again picture 2, 5th row, terms "podiel" (stake)/"rokovat" (to discuss)/"akcia" (share)/"vlada" (government)).

Picture 4: Articles regarding DT's intent to buy the remaining shares of T-com from the Slovak government


Fortunately, there are also topics that, on the contrary, build a positive image of the T-com/T-mobile brand (as can be seen in picture 5). T-mobile is obviously more active (attacking strategy) than Orange, and its activity is the main reason why direct benefits for customers emerge, for example cheaper roaming offers, better network coverage, the introduction of new smartphones or price reductions on existing ones.

Picture 5: Positive elements of T-com/Tmobile's presence in the media


To sum things up for T-com/T-mobile, we have:
  • Problematic dividends and DT's pressure on the Slovak government (rows 3 and 5).
  • The emergency line crash and the resulting fine from the authorities (7th row).
  • Better and cheaper roaming services (summer is here) for existing customers (2nd row, terms "zakaznik" (customer)/roaming/"vyhodne" (better price)).
  • Cheaper smartphones and tablets (the popular Samsung Galaxy), of course in connection with customer acquisition for pre-paid services (9th row).
  • Coverage expansion of the T-mobile HSPA network (for example in the towns of Kolarovo, Poltar, Fiľakovo, etc.), which means better mobile data services, faster internet, etc. (2nd row, terms "zakaznik" (customer)/roaming/"vyhodne" (better price)).

Now, let's briefly go through the results for Orange:


Picture 6: Results for Orange Slovakia

  • Orange responded to T-mobile's roaming offer attacks (defensive strategy) (row 7).
  • The company is also offering cheaper smartphones/tablets from the popular vendors HTC and Samsung (also adding the white iPhone 4) (highlighted row, also supported by the top window displaying the respective articles).
  • The operator pushes hard with LTE network tests with uploads faster than 100 Mbit/s (rows 3 and 10).

As we can see, there is almost no negative flavor in the news about Orange. There is a single mention of a network outage (second row in the previous picture, term "zlyhat"), but since it appears only once, we will not consider it relevant in this demonstration (of course, in a real "production" analysis we might treat it differently).

Finally, we can take a look at Telefonica, which is the youngest and smallest operator in Slovakia:

Picture 7: Results for Telefonica Slovakia

From the results we can see that:

  • The operator is not competing in the same area as T-mobile/Orange (flanking strategy); rather, it focuses on pre-paid offers of free SMS messages for 30 days following the renewal of pre-paid credit. See the highlighted row in picture 7 containing terms like "zakaznik" (customer)/"predplateny" (prepaid)/"kredit" (credit). Supporting articles are displayed on the right side of the picture.
  • Telefonica is dropping O2 from the official company/brand name (as can be seen in the previous picture, 4th row).
  • Row 6 takes us to a very interesting topic. The Twitter account @o2slovakia, which from its inception truthfully informed about all special offers by O2, suddenly displayed the controversial status "Forget about O2, we're just lying and bullsh***ng you. Go for Orange or T-mobile, for quality for better price…". At first sight this looked like an ordinary hack of the operator's account on the popular social network, but Telefonica quickly announced that the account had never been maintained by them and had therefore provided unofficial information from the beginning. After the social network operator was contacted about the brand name abuse, the problem account was removed. We can only speculate that this might have been revenge for the aggressive tone that Telefonica used against its competitors (for example here), following the principle that stronger players do not attack weaker players... at least not officially…

Picture 8: Results for articles discussing attack on O2 brand
When speaking about Telefonica, it is also interesting to take a closer look at one more topic. A virtual operator (based on Telefonica), Tesco mobile, attracts price-sensitive customers with an offer to call "only for one cent", marketing the idea aggressively as "A price revolution in Slovakia". The name of the player behind the campaign remained hidden for the whole duration of the "heating" time (the period when only promising marketing messages were available, with no real offers to compare with). After gaining momentum and attracting customers, the campaign quickly faded away (once the full details of the offer were revealed, which were of course not so revolutionary). Recognize the signs and techniques of a guerrilla-style fight? See picture 7, 7th row, terms tesco/"virtualny" (virtual)/"jeden" (one)/"volanie" (calling).

With the help of text analytics, generals fighting in marketing wars can decipher the tactics of their opponents even from publicly available sources and take appropriate action. More importantly, topics created from text analysis can be used to measure the overall positive vs. negative image (or position) associated with a brand and its impact on customers. The next part of the article goes beyond text analytics alone, but we need to go through it to illustrate the possibilities of using TA, so please take it only as an illustrative example. Using this approach in the real world will require more detailed models and computations.

Let's assume for now that all discovered topics will be grouped into the following categories and then given a positive or negative score according to this table (model):

Picture 9: Model that will be used to categorize discovered topics for each operator


For example, the LTE showcase topic by Orange will be categorized under "Perception of operated network" and will get +5 points. If we cannot put a topic into any category from our model, we will treat that topic as a "generic" one. Running briefly through all the topics discovered above, we might arrive at the following table.
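The category model itself is only visible in the picture, so the following is a minimal sketch of how such a lookup could work, not the author's actual table. Only the pairing of "Perception of operated network" with +5 points comes from the text; the other category names, their values and the zero score for "generic" topics are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical category model: category name -> points.
# Only "Perception of operated network" = +5 is taken from the text;
# all other names and values are placeholders.
CATEGORY_POINTS: Dict[str, int] = {
    "Perception of operated network": 5,
    "Service failure": -5,      # assumed
    "Pricing and offers": 3,    # assumed
    "generic": 0,               # fallback category; zero score is assumed
}


def score_topic(category: Optional[str]) -> int:
    """Return the points for a category, falling back to 'generic'."""
    return CATEGORY_POINTS.get(category or "generic", CATEGORY_POINTS["generic"])


def score_operator(topic_categories: Dict[str, Optional[str]]) -> int:
    """Sum the points of all topics discovered for one operator."""
    return sum(score_topic(cat) for cat in topic_categories.values())


# Topic -> category assignment for one operator (illustrative only).
orange_topics = {
    "LTE showcase": "Perception of operated network",
    "Cheaper smartphones": "Pricing and offers",
    "Unclassified story": None,  # no matching category -> "generic"
}
orange_total = score_operator(orange_topics)
print(orange_total)
```

Keeping the model as plain data makes it easy to swap in a more detailed table later without touching the scoring functions.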

Picture 10: Topics and points that were assigned to them

The table says that Orange is clearly leading (for the period in which we reviewed articles) with T-com/Tmobile finishing second and Telefonica third. Results can be visualized as a pie chart (which is of course more suitable especially when you’re going to present results to someone else).

Picture 11: Visualization of results after category assignments, not taking frequency into account

As the heading of the previous chart suggests, we should also consider the frequency of articles supporting each topic. We can assume that the more frequent a topic is in the media, the more significant its impact, whether positive or negative. A table with topics and frequencies follows (if no frequency is detected, "1" is used instead of zero).

Picture 12: Frequencies of topics

Finally, we can compute the resulting table, where each value is the product of a topic's frequency and its original category points from the model (so each operator's total is the dot product of the two).
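As a rough sketch of this step, the per-topic values can be computed by multiplying points by frequencies and then summing them per operator. The topic names and numbers below are made up for illustration and are not the values from the pictures; the only rule taken from the text is that a missing frequency counts as 1.

```python
def weighted_scores(points, frequencies):
    """Multiply each topic's points by its article frequency.

    A topic with no detected frequency is counted once, not zero times.
    """
    return {
        topic: pts * max(frequencies.get(topic, 0), 1)
        for topic, pts in points.items()
    }


# Illustrative numbers only.
points = {
    "emergency line outage": -5,
    "cheaper roaming": 3,
    "network coverage": 5,
}
frequencies = {
    "emergency line outage": 4,
    "cheaper roaming": 2,
    # "network coverage" was not detected, so it defaults to 1
}

per_topic = weighted_scores(points, frequencies)
total = sum(per_topic.values())  # dot product of points and frequencies
print(per_topic, total)
```

With these invented numbers, one frequently repeated negative topic is enough to pull the total below zero, which is the same effect the frequency table has on the ranking.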

Picture 13: Topics and points assigned to them taking frequencies into computation model

Now we can see that the situation remained unchanged for Orange, but Telefonica and T-mobile switched positions: T-mobile lost its position due to the higher frequency of negative topics (emergency line crash, problematic dividends) in the media. Please note that the computation model presented here is tailored to this demonstration and is therefore not suitable for real deployment. For example, in real life the impact of frequency should be weakened (for example with some weight) so that it does not influence the final results too much.
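One possible way to weaken the influence of frequency is to let it grow logarithmically instead of linearly. This is purely an assumption for illustration; the text only says "some weight", and both the logarithm and the default weight of 0.5 are arbitrary choices.

```python
import math


def damped_score(points: float, frequency: int, weight: float = 0.5) -> float:
    """Scale points by a damped function of frequency.

    frequency == 1 leaves the points unchanged; higher frequencies
    increase the magnitude only logarithmically. `weight` controls how
    strongly frequency matters (0 ignores frequency entirely).
    """
    frequency = max(frequency, 1)  # undetected frequency counts as 1
    return points * (1.0 + weight * math.log(frequency))


print(damped_score(-5, 1))   # a topic mentioned once keeps its raw score
print(damped_score(-5, 4))   # smaller in magnitude than the plain product -20
```

Compared with the plain product, a topic repeated four times no longer counts four times as much, so a single heavily covered story is less likely to decide the whole ranking.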

Picture 14: Visualization of results after taking topic frequencies into computational model.

So far we have demonstrated how TA can process raw data (free-form text articles) and turn it into valuable information that can later be evaluated, visualized and, if necessary, measured over longer time periods to discover trends.

Picture 15: Hypothetical graph visualization of trends in brand perception of mobile network operators.

Picture 15 displays a purely hypothetical graph, in which we can see that T-com/T-mobile "might" systematically strengthen its position over time (suggesting that their marketing and PR people are using TA in order to succeed in their fight against competitors). Trend visualization is just one example of how to use the newly derived information, but the possibilities are endless (for example enhancing existing BI prediction models for existing customer segments, etc.).

I hope that the marketing war trilogy helped you to better understand how you or your company can benefit from text analytics/mining techniques right now and in real-life scenarios. Again, I will appreciate your feedback in the comments under the blog.

Sunday, June 12, 2011

The secret weapon of marketing war (2/3)

Battlefield: Slovakia

Now the reason the enlightened prince and the wise general conquer the enemy whenever they move and their achievements surpass those of ordinary men is foreknowledge.

Note: If you have for whatever reason missed the introductory part, please follow this link first.

Before a military battle, the battlefield is usually mapped and studied in great detail. Marketing battles are a bit different in that they take place in the minds of consumers, on a terrain for which it is hard to retrieve an electronic map in a few seconds from a popular search provider. To make it even worse, the landscape can change rapidly, like dunes after a sandstorm. The question is, how can such a place be mapped or studied?

The answer is that it is possible to construct such a map by assembling bits and pieces of data from reliable sources. A valid source can be anything from a user forum or social network to a specialized portal. In this demo, articles from the top 5 portals specializing in mobile phone market news coverage will be used. Again, only the (mostly incomplete) information published via RSS channels will be subject to analysis, to conform with the IP rules. Although they are considered a more direct and therefore better source for market analysis, user forums and social networks will not be used, simply because none (in Slovakia) publishes anything like RSS/Atom feeds. Of course, at the dawn of a life-or-death war for your company, this is not something that would stop you, but here we can accept that the consumer's mindset is strongly influenced by what he/she reads or hears in the media. Our example consists of 532 articles harvested from the following sources:


Of course, simple aggregation of news articles alone is useless, but this is exactly the point where the secret weapon of our choice, text analytics (TA), can save the day. As we have seen in the previous demonstration, TA can be used to extract important domain topics and their relationships. So as a first step, we will run all articles through the analytic engine. From the results we can learn that there are three mobile network operators active in Slovakia, Orange, T-mobile and Telefonica O2 (green frames in picture #1), and also three major mobile phone platforms, Symbian, Windows Phone and Android (red frames in the same picture). Of course there might be other platforms as well, but these were not discovered during the first run because of the dominance of those mentioned above. The biggest cluster (291 articles, blue frame) also tells us that we have a category of companies in the data set (with a strong presence of "operators") closely related to "mobile phone" (mobilny telefon). So the first results are quite promising, because we can see that the data collection is full of information about the key players operating in the analyzed market.

Picture #1: Result of first run

To process this result, we will create an element representing the mobile phone/smartphone category, and as we go we will add elements representing particular devices. We will also create another element representing operators and will add elements representing particular companies. After a few more cycles, our virtual battlefield map will consist of operators (picture #2):

Picture #2: Operators
as part of a larger group of companies (picture #3) mentioned in the data. In the next picture we can see that the original list of operators can be enriched with mobile OS/platform vendors and device vendors:

Picture #3: Companies

and to make it complete, we have a mobile OS/platform category and mobile phone/smartphone category with respective devices.

Picture #4: Devices

As we move on, deeper research reveals additional topics, that is, additions to the previously defined categories of platforms, devices and companies. For example, in the next picture (picture #5) we can see the emergence of the Apple/iPhone company/device pair:

Picture #5: Emergence of apple/iphone from the data

Another interesting characteristic that can be read from the previous picture, and is explained in more detail in the next one, is that information written about mobile phones is closely related to specific device features like display, camera or processor (picture #6).

Picture #6: Display/camera features of mobile phones frequently mentioned in data collection

After a few more cycles, we can stop and test our virtual topic map of the battlefield, which includes companies (vendors, operators), devices and key features, by performing a search, for example for the company HTC (picture #7). In the picture we can see related products and devices:

Picture #7: HTC search showing related topics

It is very important to mention that the numbers in brackets after the element names represent the number of articles matched under each topic. The numbers are absolute, not relative to our current search.

So far we have seen how text mining/analytics can be used to create a virtual model of a specialized market by aggregating and analyzing generally available information on the net. In the next part, we will use it for deeper analysis and will discuss the findings in the context of marketing warfare strategies. For example, let's look back at the updated picture (picture #8) of companies:


Picture #8: Updated list of key players related to mobile phone market in Slovakia

It follows from the picture that the device vendor Nokia is (still) the clear leader (when considering frequency in the news alone) in the group of device vendors (HTC, Sony Ericsson, Apple, Samsung). It is also clear that player #2 (HTC) is catching up. Using the categorization introduced by the Marketing Warfare book, we can view Nokia as the player still dominating the market, HTC and Sony Ericsson as the players with increased market share, and the rest of the companies (Apple, Samsung) can be labeled as "profitable survivors". Such a categorization can help us predict which company is most likely to adopt which strategy/style when fighting on the battlefield. Leaders are advised to choose defensive styles, strong pursuers will most likely adopt open offensive moves, and the rest of the market will most probably try to compete in a less aggressive style. To check that our approach (market analysis via news aggregation and mining) matches reality so far, you can read the article on Mobilmania that presents official data from T-mobile with findings similar to what we have discovered.

The secret weapon of marketing war (1/3)

Introduction

In the book Marketing Warfare (by Al Ries and Jack Trout, 1986) we were taught that marketing is war and that the philosophy of customer-centrism is not appropriate; otherwise, the company that dominates the market would simply be the one that performed the best market research. The authors instead suggest that firms switch to competitor-centrism and apply military strategies to business scenarios, with competing firms projected as sides in a military conflict and market share projected as the territory being fought over. Ries and Trout discuss four strategies for fighting the marketing war:

  • defensive
  • offensive
  • flanking
  • guerrilla

A firm's market share relative to that of its competitors determines which strategy is appropriate. It is argued that in mature, low-growth markets, business operates as a zero-sum game, where one player's gain is possible only at another player's expense. Success depends on fighting competitors for market share.

So although it might not be obvious at first sight, war is among us. War with real battles, campaigns, styles & strategies, commanders, soldiers and weapons. War in which nearly everybody (as a consumer) is involved. War that takes place in our very minds.
 
In the upcoming two parts, I would like to demonstrate on real-world data how text analytics can help you or your company achieve victories. For the purpose of the following demo, articles from portals covering news and stories related to mobile phones are used to reveal the status of marketing battles in this interesting market segment in Slovakia.

You can continue reading the next part, called Battlefield: Slovakia.

Sunday, March 27, 2011

It's better to search, than to be searched!

Consistent with the headline, this post is about enhanced search & information retrieval accomplished with text analytics. If you haven't read the Introduction post, please do so before reading this one.

In the following demonstration, we will use RSS feeds from a few (credible) information sources from Slovakia. This is because I am preparing for the upcoming PosAm TechDay, where I will give a presentation in Slovak. A translation is provided throughout the post and in all pictures, so I hope this will not discourage you from further reading. For the record, here is the list of sources from which I downloaded and indexed RSS feeds. From each source, two channels were used, one publishing domestic and the other foreign news stories:

SME.sk - internet portal of a print media
Pravda.sk - internet portal of a print media
Aktuality.sk - internet-only news portal
TA3.sk - internet portal of a TV news channel
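
Reading such feed previews needs little more than a standard XML parser. The sketch below assumes a plain RSS 2.0 layout (channel/item with title, description and link) and uses an invented inline sample instead of a real feed; it is not the indexing pipeline used for this demo.

```python
import xml.etree.ElementTree as ET

# Invented sample in RSS 2.0 layout; real feeds carry more fields.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Japonsko: kontaminacia vody v deviatich prefekturach</title>
      <description>Kratky nahlad clanku.</description>
      <link>http://example.invalid/1</link>
    </item>
    <item>
      <title>OSN schvalila rezoluciu o Libyi</title>
      <description>Dalsi kratky nahlad.</description>
      <link>http://example.invalid/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Extract title, description and link from every <item> element."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return records


records = parse_rss(SAMPLE_RSS)
for record in records:
    print(record["title"])
```

Each record holds only the short preview text, which matches the restriction to RSS content rather than full articles.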

The reason why only RSS content (and not full articles) was used is simply that using RSS is legal under the copyright policies of the content publishers (as you can see, I really stick to the message from the headline and will therefore leave my web spiders locked down this time :-). For the purpose of this demo, more than 400 article previews were downloaded and indexed. First of all, a very basic and very common full-text search capability will be presented, with the query "japonsk*", which is equivalent to "japan*" in English:

 

To quickly translate a few results from the result set:
  • The first result informs about a way (established by the Embassy of Japan) to help the people of Japan.
  • The second is about Japanese heroes who volunteered for service (details of the service are not known from the headline).
  • The third is about radioactive contamination of water in nine prefectures of Japan, etc.

Of course, by reading all the titles and bodies of the RSS records, the user will gain a detailed understanding of what exactly is going on. But let's project this into a daily-life situation in which an employee is trying to search for something he is not deeply familiar with: gaining a really good overview can take a lot of time.
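
For readers who want to see what a wildcard query like "japonsk*" amounts to, here is a minimal sketch using only the Python standard library. It matches the pattern against individual lower-cased words of each record; the sample headlines are invented, and a real full-text engine would use an index rather than scanning every document.

```python
import re
from fnmatch import fnmatchcase


def wildcard_search(pattern, documents):
    """Return documents containing at least one word matching the pattern.

    The pattern uses shell-style wildcards, so "japonsk*" matches
    "japonsko", "japonski", "japonske" and so on.
    """
    pattern = pattern.lower()
    hits = []
    for doc in documents:
        tokens = re.findall(r"\w+", doc.lower())
        if any(fnmatchcase(token, pattern) for token in tokens):
            hits.append(doc)
    return hits


# Invented headlines, written without diacritics for simplicity.
docs = [
    "Japonski hrdinovia sa prihlasili dobrovolne",
    "Japonske velvyslanectvo zverejnilo sposob pomoci",
    "OSN schvalila rezoluciu o bezletovej zone nad Libyou",
]
hits = wildcard_search("japonsk*", docs)
print(hits)
```

The result is a flat list of matching records with no summary of what they are about, which is exactly the gap described above.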

Now let's have a look at how text analytics can fill the gap here. After running the first analysis, we might be presented with data like that in the next picture:



We can see the emergence of a variety of topics (large pop-up window in the background with the table and the yellow highlighted row). In the first line, after a quick examination, there are several common words (starting from the left: "inform" (a lot of stories are taken from news agencies and take the form "agency xyz informed about..."), "which" and "agency"), but after that comes the first word of interest, "reactor", which we will use to create a related topic (smallest pop-up window on the left). From now on, only the news containing the word "reactor" (reaktor in Slovak) will match under this topic (preview: smaller pop-up window in front of the large one, slightly to the right). In a similar way we will create more topics. Topics will vary from domestic to foreign affairs, as we are processing all articles together. To name a few topics from foreign affairs (as a much wider audience should be familiar with them): the UN resolution on a no-fly zone in Libya, the Fukushima power plant problem, radiation measurements across Europe, fear of radioactive contamination of food in Japan, etc.
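
Conceptually, a topic of this kind is a named rule that selects the articles containing certain words. The sketch below is an assumption about how that could be modelled, not the tool's actual implementation: the `Topic` class, the prefix matching (a crude stand-in for Slovak word inflection) and the sample articles are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    terms: set  # lower-case word stems; ANY matching term is enough (OR)

    def matches(self, text):
        """True if some word in the text starts with one of the terms."""
        tokens = re.findall(r"\w+", text.lower())
        return any(tok.startswith(term) for tok in tokens for term in self.terms)


def assign(topics, articles):
    """Map each topic name to the list of articles it matches."""
    return {t.name: [a for a in articles if t.matches(a)] for t in topics}


# Invented article previews, without diacritics.
articles = [
    "Agentura informovala o poskodeni reaktora vo Fukusime",
    "Merania radiacie v Europe pokracuju",
    "Parlament schvalil novy zakon",
]
topics = [
    Topic("reactor", {"reaktor"}),
    Topic("radiation", {"radioaktiv", "radiaci"}),
]
result = assign(topics, articles)
print(result)
```

An article can fall under several topics at once, and articles that match no topic simply stay outside the network until a suitable topic is created.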

During the following analysis, the bottom table is also used; it shows suggestions about relationships between topics that are not yet connected. From it we can derive a standard "HAS" relationship. The example below is the connection between Japan and the Fukushima nuclear power plant, but the table also suggests that "Japan" should be linked with "food from Japan". Please note that in our example there are two standard relationships (generalization, "IS", and aggregation, "HAS"):
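
How the tool derives its relationship suggestions is not described here. One plausible approach, shown purely as an assumption, is to count how often two topics match the same article and to propose the most frequent pairs that are not linked yet. The topic assignments below are invented.

```python
from collections import Counter
from itertools import combinations


def suggest_links(article_topics, existing_links):
    """Rank unlinked topic pairs by the number of articles they share.

    article_topics: one set of topic names per article.
    existing_links: set of alphabetically sorted (topic_a, topic_b) tuples.
    """
    counts = Counter()
    for topics in article_topics:
        for pair in combinations(sorted(topics), 2):
            if pair not in existing_links:
                counts[pair] += 1
    return counts.most_common()


# Invented topic assignments, one set per article.
article_topics = [
    {"Japan", "Fukushima power plant"},
    {"Japan", "Fukushima power plant", "radiation"},
    {"Japan", "food from Japan"},
    {"Japan", "food from Japan"},
    {"Libya", "UN resolution"},
]
# Japan HAS Fukushima power plant is already defined.
existing_links = {("Fukushima power plant", "Japan")}

suggestions = suggest_links(article_topics, existing_links)
print(suggestions[0])
```

Whether a suggested pair becomes an "IS" or a "HAS" relationship remains a decision for the analyst; co-occurrence only indicates that some connection is likely.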



The next step returns to suggestions for new topics, for example "radioactivity" OR "radiation":



In the diagram we can see the topic and the "create new topic" dialog with a test preview of articles dealing with radiation and the spread of radioactive pollution into Europe. The steps of creating new topics and linking existing ones are repeated until we are satisfied with the "network" of topics (or until the time given for analysis of the particular domain runs out). Now we have a fancy topic network overlay over the indexed articles. How can we use it? A picture is worth a million words, so here we go: the very same search for "japan*" related articles (left table) can bring up the best-related topics (upper right table) and also a somewhat more intelligent visualization of associated topics:



I hope you see the difference from a basic full-text search. Right now we have a summary of all topics related to our search that can be used for further navigation. For example, from the topic visualization we can see that there is something going on with the Fukushima nuclear power plant (and the link with radioactivity suggests that it is nothing good), which is not obvious from the first bunch of relevant full-text results. Of course, as the next step in the search, the user will click on the Fukushima topic and see how the results table, suggested topics table and visualization change to provide more targeted information about the situation in Fukushima, along with a hint that both the nuclear reactor and radiation are covered. All this is retrieved before reading any news story (these are quite short, one-sentence articles in this example, but it is easy to imagine the more common situation, usually in a business environment, where there are larger source documents full of unstructured data):



Congratulations to those who held out till the end of this first demonstration. To sum it up, we have seen how text analytics/text mining can enrich the user’s experience when searching through unstructured data. Benefits are twofold:

  1. The user is navigated from the known (the japan query) to the unknown (the nuclear reactor problem at Fukushima).
  2. The user's search is faster and more likely to provide the needed results. The speed might not be so obvious in our demonstration, because we have gone through a fairly detailed description of how the topic network is built. We can rectify that right now by simply imagining that the network building and the final search are done by different users of the system (with different objectives). The first user (doing the analysis) knows the domain and can quickly distinguish in the results what is important and what is not. The other user (only searching) will receive all relevant knowledge immediately, without any extra effort, during his search.

I hope that the first demonstration was interesting for you. If you liked or disliked something, please leave a comment so that I can improve next time. I'll appreciate any response anyway. Thank you for your time.

Friday, March 25, 2011

Introduction

This post serves as a general introduction to every other post on this blog, which is dedicated to demonstrations, examples and impacts of text analytics techniques in certain scenarios. To gain a quick insight into what text analytics is, please have a look at Wikipedia. The short definition is:

"The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.[1] The term is roughly synonymous with text mining; indeed, Prof. Ronen Feldman modified a 2000 description of "text mining"[2] in 2004 to describe "text analytics."[3] The latter term is now used more frequently in business settings while "text mining" is used in some of the earliest application areas, dating to the 1980s,[4] notably life-sciences research and government intelligence"

(If you would like a more detailed story, an excellent one comes from Seth Grimes)

Text analytics is about employing raw computer power to process large sets (docs, records, ...) of unstructured textual data in order to mine out structural information (categories, tags, associations, etc.) that can be used in a variety of ways. For example:

  • Organization of huge document sets in order to achieve better retrieval capabilities.
  • Pattern recognition in text records, that leads to definition of new (for example business) rules.
  • ...

There are of course more situations where it is handy to have a structural representation on top of a huge pile of unstructured data, and these will be explored in later posts. Each post will be dedicated to one demonstration; there will be no ordering, so you can pick and jump right into those you like. The next post will demonstrate enhanced information retrieval accomplished via text analytics.