Mining Text to Understand Your Culture Puzzle

It is estimated that 80% of data in the world exists in the form of documents, reviews, blog posts, emails, and articles. Unlike numeric data, text-based data is unstructured, making it more difficult to identify themes and trends across different media. However, in our work we have found that text-based data conveys a number of attributes (such as values, beliefs, perceptions, and needs) that cannot be expressed in traditional data sources. For organizations that want to understand their culture, text-based analysis is an often overlooked but critical piece of the puzzle.

While traditional statistical methods are not as effective with text-based data, there are a number of methods to help us sift through this information. Furthermore, when we couple traditional and text-based analytical approaches, we get a wider lens through which to understand our clients’ organizations.

To give an example, I used Glassdoor.com to understand employee attitudes at a large multinational airline. There were a total of 30 reviews, which were posted from 2008 to 2014. The reviews were from employees at 14 locations in 12 different countries. The reviews included numeric ratings of the overall organization, its culture, career opportunities, work-life balance, senior management, and compensation and benefits. In addition, the reviewers included a summary of the pros and cons of working at the organization.

Using text mining methods, we can sift through the unstructured data to understand overarching themes and attitudes. In the text cloud above, size corresponds to how frequently the word appeared across the reviews. Color corresponds to whether the word was positive (green) or negative (red). While this is one of the more basic approaches, a couple of key concepts emerge. We find words such as “flexible,” “friendly,” “security,” and “interesting,” which may indicate that the airline has a friendly working environment and provides a certain degree of job security. On the other hand, words such as “bureaucratic,” “cheap,” “worsening,” and “slow” may indicate that there are some cumbersome processes within the organization.
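To make the idea behind the text cloud concrete, here is a minimal sketch in Python of counting words and tagging them as positive or negative. It is not the tool used to build the cloud above. The tiny word lists are assumptions drawn only from the words quoted in this post, the review snippets are made up for illustration, and the names tokenize and word_cloud_data are hypothetical.

```python
import re
from collections import Counter

# Assumed mini-lexicon built only from the words quoted in this post.
# A real analysis would use a full sentiment lexicon.
POSITIVE = {"flexible", "friendly", "security", "interesting"}
NEGATIVE = {"bureaucratic", "cheap", "worsening", "slow"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_cloud_data(reviews):
    """Return {word: (count, label)} for every word found in the lexicon.

    The count would drive word size and the label would drive color.
    """
    counts = Counter(tok for review in reviews for tok in tokenize(review))
    data = {}
    for word, count in counts.items():
        if word in POSITIVE:
            data[word] = (count, "positive")
        elif word in NEGATIVE:
            data[word] = (count, "negative")
    return data


# Made-up review snippets, for illustration only (not real Glassdoor data).
sample_reviews = [
    "Friendly colleagues and flexible hours.",
    "Slow and bureaucratic processes, but friendly staff.",
]
cloud = word_cloud_data(sample_reviews)
```

Words that are not in either list are simply dropped in this sketch, which is one reason a richer lexicon matters in practice.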

This particular approach looks at words in isolation based on their frequency and sentiment; however, we can also look at clusters of words (n-grams) to clarify these themes. The most frequent word clusters include “low salary,” “quality service,” “management good,” “learning opportunity,” and “great experience.” This may indicate that management overall is well received, but that salary and compensation are low. This latter point might be why a number of reviewers indicated their experience was great but was more of a “learning opportunity,” suggesting that they may have moved on to other opportunities. This text mining process saves us and our clients time, provides an overview of the themes and attitudes, and points our culture overview toward key areas for further exploration.
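As a rough illustration of counting two-word clusters (bigrams), here is a short Python sketch. It assumes that common filler words are removed before pairing, which is one way a phrase like “management is good” could surface as “management good”; the original analysis may have worked differently. The stopword list, the function names, and the review snippets are all hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately small stopword list for illustration.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "it", "but", "of", "to", "in"}


def bigrams(text):
    """Return adjacent word pairs after lowercasing and dropping stopwords.

    Because stopwords are removed first, a pair can span a dropped word.
    """
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in STOPWORDS]
    return list(zip(tokens, tokens[1:]))


def top_bigrams(reviews, n=5):
    """Count bigrams across all reviews and return the n most frequent."""
    counts = Counter(" ".join(pair) for review in reviews
                     for pair in bigrams(review))
    return counts.most_common(n)


# Made-up review snippets, for illustration only (not real Glassdoor data).
sample_reviews = [
    "Low salary but a great experience.",
    "Management is good and it was a great experience.",
    "Low salary.",
]
top = top_bigrams(sample_reviews, n=2)
```

Pairs are counted within each review only, so a cluster never joins the end of one review to the start of the next.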

On the other side of the spectrum, we can look at the scores and ratings to find patterns and themes. The map above shows ratings by location. Color corresponds to the airline’s overall rating, while the size of the dots corresponds to the employees’ attitudes toward the organization’s culture. One interesting pattern is that the ratings were higher in Eastern Europe and South Asia than in Germany, Switzerland, and New York. From one perspective, this can be an important indicator of locations where issues might exist; however, for multinational companies it is also important to consider how values, attitudes, and assumptions about the work experience may change from country to country. Employees in Switzerland may have very different expectations than employees in Greece, Russia, or the Philippines. These are important factors to consider when bringing together different people (whether two offices down or two thousand miles away) to address common challenges.
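The numbers behind a map like this come down to averaging ratings per location. The Python sketch below shows one way to do that, assuming each review is a record with a location, an overall rating, and a culture rating. The field names, the function name, and the sample records are hypothetical and do not reflect the actual review data.

```python
from collections import defaultdict
from statistics import mean


def ratings_by_location(reviews):
    """Average the overall and culture ratings for each location.

    The overall average could drive dot color and the culture average
    dot size; "n" records how many reviews each average rests on.
    """
    grouped = defaultdict(lambda: {"overall": [], "culture": []})
    for review in reviews:
        bucket = grouped[review["location"]]
        bucket["overall"].append(review["overall"])
        bucket["culture"].append(review["culture"])
    return {
        location: {
            "overall": mean(values["overall"]),
            "culture": mean(values["culture"]),
            "n": len(values["overall"]),
        }
        for location, values in grouped.items()
    }


# Made-up records with placeholder locations, for illustration only.
sample_reviews = [
    {"location": "City A", "overall": 4, "culture": 5},
    {"location": "City A", "overall": 2, "culture": 3},
    {"location": "City B", "overall": 3, "culture": 2},
]
summary = ratings_by_location(sample_reviews)
```

Keeping the review count next to each average is a useful guard: with only 30 reviews spread over 14 locations, many averages rest on one or two reviews and should be read with caution.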

Understanding culture is a lot like a puzzle. There are a lot of pieces. No two pieces are exactly the same, and they all fit together in a unique way. Leveraging a variety of data types and data sources points leaders to key pieces that can make the picture come to life. In today’s competitive market, companies need to leverage the full range of tools at their disposal to orient their organizations for long-term success.