Sunday, January 15, 2012

Pursuit of semantic desktop

Welcome back to our mini-series of articles exploring the use of text analysis (text mining) techniques. In this part, we are going to make a few changes. The first change concerns the source of information used for the demonstration. We will move from Internet resources (like RSS/Atom feeds) to the personal desktop and regular documents - just to show that the same techniques can be applied here as well. The second change is that the supporting visualizations are provided as YouTube videos rather than static pictures.

The definition of the Semantic Desktop is straightforward (given in Sauermann et al. 2005):

A Semantic Desktop is a device in which an individual stores all her digital information like documents, multimedia and messages. These are interpreted as Semantic Web resources, each is identified by a Uniform Resource Identifier (URI) and all data is accessible and queryable as RDF graph. Resources from the web can be stored and authored content can be shared with others. Ontologies allow the user to express personal mental models and form the semantic glue interconnecting information and systems. Applications respect this and store, read and communicate via ontologies and Semantic Web protocols. The Semantic Desktop is an enlarged supplement to the user’s memory.

To put it simply - the idea here is that we can do a lot better with the files located on our machines than just looking at their name, suffix, icon and size. We can analyze the actual content of the files, recognize their topical and non-topical structures (via techniques like clustering, named entity recognition, tagging, etc.) and the associations between them, and construct ontologies as mentioned before - to express personal mental models, rules, etc. By adding a little bit of semantic-web formalism, we can ask our desktop to give us those documents where Ford is mentioned as a famous actor and not as a successful entrepreneur. We can even ask the desktops of our colleagues, with their mental models, when seeking new knowledge, and gain insight not only into the content of documents, but also into the meta information created by other (perhaps even better) brains.
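The "Ford as an actor" query above can be sketched with a toy triple store. This is only an illustration in plain Python, not how any particular semantic desktop stores its RDF graph: the document names, the entity identifiers and the predicates `mentions`, `label` and `occupation` are all assumptions made up for the example.

```python
# Toy triple store: every (subject, predicate, object) fact below is hypothetical.
TRIPLES = {
    ("doc:review.txt", "mentions", "person:HarrisonFord"),
    ("doc:history.pdf", "mentions", "person:HenryFord"),
    ("person:HarrisonFord", "label", "Ford"),
    ("person:HenryFord", "label", "Ford"),
    ("person:HarrisonFord", "occupation", "actor"),
    ("person:HenryFord", "occupation", "entrepreneur"),
}


def subjects(predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is a known triple."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def documents_mentioning(label, occupation):
    """Documents that mention an entity with the given label and occupation."""
    entities = subjects("label", label) & subjects("occupation", occupation)
    return sorted(
        s for (s, p, o) in TRIPLES if p == "mentions" and o in entities
    )


print(documents_mentioning("Ford", "actor"))
```

A plain keyword search for "Ford" would return both documents. The extra `occupation` fact is what lets the query tell the two entities apart.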

The whole process can be pretty simple and straightforward - let's suppose that we have a bunch of documents (unsorted and without any further knowledge beyond the mentioned attributes of name, suffix and size). We can index them and mine out some important (in this case topical) structures,


The first video shows the indexing and obfuscation of 1200+ files, followed by a cold-start analysis and the creation of the very first element. The video ends with a preview of the created element via a simple search.

followed by the important associations between them.



In the second video we will see how easily links between previously created knowledge elements (see the previous video) can be created - for example, the missing link between "search/searching" and "index/indexing".


Now let's skip the formalism and try to use our work in a sample search:



A simple demonstration of how previously mined topics (via text analysis) and their relationships can be used to provide users with a "navigated search" experience. First, a simple query "search" is fired, resulting in the presentation of (500+) search results and suggested topics. The top two topics ("scenario" and "tool"), however, are not the direction we want to go, so we choose to exclude these suggestions. After that we see the topic "clustering", which (in the domain of information retrieval) is the way we want to go - so we choose to drill down here. This brings up more related topics and, of course, narrows the resulting set of documents to nearly a half. After that we do more drill-downs and excludes - each one shaping our topic map (= the mental model of one particular user) and the supporting result set, until we reach the final state from which we can start exploring the documents.
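The drill-down and exclude steps described above can be sketched as set operations over topics attached to documents. This is a minimal sketch under stated assumptions, not the tool shown in the video: the five-document corpus, the file names and the `navigate` and `suggest` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical mini-corpus: document name -> set of mined topics.
DOCS = {
    "a.txt": {"search", "scenario"},
    "b.txt": {"search", "tool"},
    "c.txt": {"search", "clustering", "index"},
    "d.txt": {"search", "clustering", "tagging"},
    "e.txt": {"index", "tool"},
}


def navigate(docs, include=(), exclude=()):
    """Keep documents that have every included topic and no excluded topic."""
    include, exclude = set(include), set(exclude)
    return {
        name: topics
        for name, topics in docs.items()
        if include <= topics and not (exclude & topics)
    }


def suggest(results, chosen):
    """Topics in the result set not chosen yet, most frequent first (ties by name)."""
    counts = Counter(
        t for topics in results.values() for t in topics if t not in chosen
    )
    return sorted(counts, key=lambda t: (-counts[t], t))


# Initial query, then exclude two suggestions and drill down into "clustering".
first = navigate(DOCS, include={"search"})
print(len(first), suggest(first, {"search"}))
second = navigate(DOCS, include={"search", "clustering"},
                  exclude={"scenario", "tool"})
print(sorted(second))
```

Each drill-down adds a topic to `include` and each rejection adds one to `exclude`, so the accumulated pair of sets plays the role of the user's topic map in this sketch.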


The possibilities are endless - we can use structures and models to identify patterns and trends, and after working like this for a while we can add the time factor and visualizations such as various timelines. All this will lead us to the definition of new rules and structures, fueled by the shared knowledge of a group of similar professionals (social semantic desktop), built with tools and effort similar to ours.



Monday, June 27, 2011

The secret weapon of marketing war 3/3

Compute & Conquer: Generals of text mining

The general who wins the battle makes many calculations in his temple before the battle is fought. The general who loses makes but few calculations beforehand.


In the previous part, we saw how text analytics (TA) can be used to build a virtual map (key players and their relationships) of the market segment of our interest (a.k.a. the battlefield). In this part we will dig deeper into the data to discover the dynamics of the fight - who is winning or losing, and why. If you have missed the introductory part for some reason, please follow this link. Now, we will demonstrate all of this on the group of mobile network operators that we created in the previous part. Our collection of news articles has grown in the meantime to 674 stories. We can recall that we have three key players: Orange Slovakia, T-com/T-mobile (part of Deutsche Telekom) Slovakia and Telefonica Slovakia (formerly known under the O2 brand).

Picture 1: Network operators


First we will analyze the T-mobile/T-com brand:

Picture 2: Results for Tcom/Tmobile Slovakia

In the picture we can see an interesting topic - the red square region with the terms "pokuta" (fine)/"vypadok" (crash/outage)/"linka" (line) - which represents articles dealing with the consequences of an emergency line (112) crash in the Zilina region. The line is operated by Tcom (the same company that runs the Tmobile brand), and the outage caused the unnecessary death of one woman (more details here). Being in the news because of a service failure with such an impact has a negative effect on brand perception:

Picture 3: Articles regarding the consequences of the emergency line crash.

Unfortunately, the above was not the only problem that got attention in the media. From the results (still picture 2, 3rd row, terms "zhromazdenie" (assembly, meeting), "valne" (general, company), dividend) it follows that another problematic situation occurred when the major shareholder in Tcom/T-mobile (Deutsche Telekom - DT) decided at the annual general meeting not to pay dividends to the shareholders. This affected mainly the Slovak Republic and its government, which still holds a significant (but minority) stake in the company. The recently generated negative image of Tcom in the eyes of Slovak consumers was further strengthened by the fact that DT used the dividend issue to exert pressure on the government to sell its remaining shares (again picture 2, 5th row, terms "podiel" (stake)/"rokovat" (discussed)/"akcia" (share)/"vlada" (government)).

Picture 4: Articles regarding DT's intent to buy the remaining shares of Tcom from the Slovak government


Fortunately, there are also topics that, on the contrary, build a positive image of the Tmobile/Tcom brand (as can be seen in picture 5). Tmobile is obviously more active (attacking strategy) than Orange, and its activity is the main reason why direct benefits for customers emerge - for example via cheaper roaming offers, better network coverage, the introduction of new smartphones or price reductions on existing ones.

Picture 5: Positive elements of T-com/Tmobile's presence in the media


To sum things up for Tcom/Tmobile, we have:
  • Problematic dividends and DT's pressure on the Slovak government (rows 3 and 5).
  • The emergency line crash and a fine from the authorities as a consequence (7th row).
  • Better and cheaper roaming services (summer is here) for existing customers (2nd row - terms "zakaznik" (customer)/roaming/"vyhodne" (better price)).
  • Cheaper smartphones and tablets (the popular Samsung Galaxy), of course in connection with customer acquisition for pre-paid services (9th row).
  • Coverage expansion of the T-mobile HSPA network (for example in the towns of Kolarovo, Poltar, Fiľakovo, etc.), which means better mobile data services, faster internet, etc. (2nd row - terms "zakaznik" (customer)/roaming/"vyhodne" (better price)).
Now, let's briefly go through results for Orange:


Picture 6: Results for Orange Slovakia

  • Orange responded to Tmobile's roaming offer attacks (defensive strategy) (row 7)
  • The company is also offering cheaper smartphones/tablets from the popular vendors HTC/Samsung (also adding the white iPhone 4.0) (highlighted row, also supported by the top window displaying the respective articles)
  • The operator is pushing hard with LTE network tests, with uploads faster than 100 Mbit/s (rows 3 and 10)
As we can see, there is almost no negative flavor in the news about Orange. We can spot a mention of a network outage (second row in the previous picture – term “zlyhat”), but since it is mentioned only once we will not consider it relevant in this demonstration (of course, in a real "production" analysis we might treat it differently).

Finally, we can take a look at Telefonica, which is the youngest and smallest player-operator in Slovakia:
Picture 7: Results for Telefonica Slovakia

From the results we can see that:

  • The operator is not competing in the same area as T-mobile/Orange (flanking strategy). Rather, it focuses on pre-paid offers of free SMS messages for 30 days following the day the pre-paid credit is renewed. See the highlighted row in picture 7 containing terms like “zakaznik” (customer)/”predplateny” (prepaid)/”kredit” (credit). Supporting articles are displayed on the right side of the picture.
  • Telefonica is dropping O2 from the official company/brand name (as can be seen in the previous picture, 4th row)
  • Row 6 takes us to a very interesting topic. The Twitter account @o2slovakia, which from its inception truthfully informed about all special offers by O2, suddenly displayed the controversial status “Forget about O2, we’re just lying and bullsh***ng you. Go for Orange or T-mobile, for quality for better price…”. At first sight this appeared to be an ordinary hack of the mobile operator's account on the popular social network, but Telefonica quickly announced that the account was not maintained by them and had therefore provided unofficial information from the beginning. After Telefonica contacted the social network operator about the brand name abuse, the problem account was removed. We can only speculate that this might have been revenge for the aggressive tone that Telefonica used against its competitors, for example here, following the principle that stronger players do not attack weaker players... at least not officially…

Picture 8: Results for articles discussing attack on O2 brand
When speaking about Telefonica, it is also interesting to take a closer look at one more topic. A virtual operator (based on Telefonica) - Tesco mobile - attracts price-sensitive customers with an offer to call “only for one cent”, marketing the idea aggressively as “A price revolution in Slovakia”. The name of the player behind the campaign remained hidden for the whole duration of the “heating” time (the period when only promising marketing messages were available, with no real offers to compare with). After gaining momentum and attracting customers, the campaign quickly faded away (after the full details of the offer were revealed, which were of course not so revolutionary). Recognizing the signs and techniques of a guerrilla-style fight? Picture 7, 7th row, terms tesco/”virtualny” (virtual)/”jeden” (one)/”volanie” (calling).

With the help of text analytics, generals fighting in marketing wars can decipher the tactics of their opponents even from publicly available sources and take appropriate action. What is more important – topics created from text analysis can be used to measure the overall positive vs. negative image (or position) associated with the brand and its impact on customers. The next part of the article goes beyond text analytics alone, but we need to go through it to illustrate the possibilities of using TA – so please take this only as a purely illustrative example. Using this approach in the real world will require more detailed models and computations.

Let’s assume for now that all discovered topics will be grouped into the following categories and then given a positive or negative score according to this table (model):

Picture 9: Model that will be used to categorize discovered topics for each operator


For example, the LTE showcase topic by Orange will be categorized under the category “Perception of operated network” and will get +5 points. If we cannot put a topic into any category from our model, we will treat that topic as a “generic” one. Running briefly through all the topics shown above, we might arrive at the following table.

Picture 10: Topics and points that were assigned to them
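The category-based scoring described above can be sketched in a few lines. Only the +5 for “Perception of operated network” comes from the text; the model table (Picture 9) is not reproduced here, so the other category, its value, the zero score for "generic" topics and the topic assignments are all placeholders invented for illustration.

```python
# Points per category. Only the +5 below is taken from the article;
# every other category name and value is a hypothetical placeholder.
CATEGORY_POINTS = {
    "Perception of operated network": 5,
    "Service failure": -5,   # hypothetical category and value
    "generic": 0,            # assumed score for topics that fit no category
}

# Hypothetical assignment of discovered topics to categories, per operator.
TOPICS = {
    "Orange": ["Perception of operated network"],
    "T-com/T-mobile": ["Service failure", "Perception of operated network"],
    "Telefonica": ["Some uncategorized topic"],
}


def score(categories):
    """Sum the category points; unknown categories fall back to 'generic'."""
    return sum(
        CATEGORY_POINTS.get(c, CATEGORY_POINTS["generic"]) for c in categories
    )


for operator, categories in TOPICS.items():
    print(operator, score(categories))
```

The fallback to `"generic"` mirrors the rule that a topic fitting no category is still counted, just without pushing the score in either direction under this assumed model.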

The table says that Orange is clearly leading (for the period in which we reviewed articles) with T-com/Tmobile finishing second and Telefonica third. Results can be visualized as a pie chart (which is of course more suitable especially when you’re going to present results to someone else).

Picture 11: Visualization of results after category assignments, not taking frequency into account

As the heading of the previous chart suggests, we should also consider the frequency of the articles supporting each topic. We can assume that the more frequent a topic is in the media, the more significant an impact it has – whether positive or negative. A table with topics and frequencies follows (if no frequency is detected, the number “1” is used instead of zero).

Picture 12: Frequencies of topics

Finally, we can compute the resulting table, where each value is the product of a topic's frequency and its original category points from the model (summing these products per operator gives the dot product of the two columns).

Picture 13: Topics and points assigned to them taking frequencies into computation model

Now we can see that the situation remained unchanged for Orange, but Telefonica and T-mobile switched positions: the latter lost its position due to the higher frequency of negative topics (emergency line crash, problematic dividends) in the media. Please note that the computation model presented here is tailored to the purpose of this demonstration and is therefore not suitable for real deployment. For example, in real life the impact of frequency should be weakened (for example with some weight) so that it does not influence the final results too much.

Picture 14: Visualization of results after taking topic frequencies into computational model.
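The frequency-weighted computation, and one possible way of weakening the impact of frequency, can be sketched as follows. The logarithmic damping is just one common choice picked for this sketch, not the weighting used in the article, and the (points, frequency) pairs in the example are made up.

```python
import math


def weighted_score(topics, damp=False):
    """Score one operator from a list of (points, frequency) pairs.

    Without damping each topic contributes points * frequency.
    With damping the frequency is replaced by 1 + ln(frequency), an
    assumed weighting that grows much more slowly than the raw count.
    """
    total = 0.0
    for points, freq in topics:
        freq = max(freq, 1)  # an undetected frequency counts as 1, as in the text
        weight = 1 + math.log(freq) if damp else freq
        total += points * weight
    return total


# Hypothetical operator: one positive topic seen twice, one negative seen ten times.
example = [(5, 2), (-5, 10)]
print(weighted_score(example))             # raw frequencies
print(weighted_score(example, damp=True))  # damped frequencies
```

With raw counts, a single heavily reported negative topic dominates the total. The damped variant keeps the same sign but lets the positive topic offset more of it.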

So far we have demonstrated how TA can process raw data (free-form text articles) and turn it into valuable information that can later be evaluated, visualized and, if necessary, measured over longer time periods to discover trends.

Picture 15: Hypothetical graph visualization of trends in brand perception of mobile network operators.

Picture 15 displays a purely hypothetical graph, in which we can see that T-com/T-mobile “might” systematically strengthen its position over time (suggesting that their marketing and PR people are using TA in order to succeed in their fight against competitors). Trend visualization is just one example of how to use the newly derived information, but the possibilities are endless (for example, enhancing existing BI prediction models for existing customer segments, etc.).

I hope that the marketing war trilogy helped you to better understand how you or your company can benefit from using text analytics/mining techniques right now and in real-life scenarios. Again, I will appreciate your feedback in the comments under this post.