Chris Diehl


    02/17/2013

    Bias and Self-Interest: Timeless Constraints in the Era of Open Data

    Senseless tragedies such as the recent Sandy Hook school shooting in Newtown, Connecticut give us pause as a nation and leave us struggling to understand how these acts happen all too often. With each loss, I begin to wonder if we are learning as a society. Are we truly reflecting on the problem or are we trapped in the same patterns that prevent us from moving forward? A closer look at how we respond hints at an answer.

    In the early hours after the shooting, the emotional impact on the nation felt severe. My Facebook and Twitter feeds were consumed with sadness and dismay as we attempted to understand the details of the situation. Once the narrative became clearer, it ignited feelings that were hard to ignore. Such cognitive priming is a powerful force. One whose effect I believe we often underestimate.

    In short order, the calls for action began. The national conversation pivoted to gun control with such blistering speed and intensity that many sensed a moment for change. We fixated quickly on the implement of the devastation, the assault weapon, a prominent aspect of the event that is easiest to comprehend. Focusing on the obvious and the available gives us a place to start. When attempting to find answers and take action, we are not prepared to take on the subtle and less visible forces at play. Where do we even begin on those fronts? The impact of guns on society should be easier to understand and discuss, right?

    Well, maybe not.

    I started searching online for open data to understand the relationship between the availability of guns and gun deaths. My search reminded me of the difficulty of finding specific open data to address a particular question. We are awash in data sources these days; yet they are scattered and varied in terms of their utility.

    During my search, I stumbled upon the following figure depicting intentional firearm deaths per 100,000 people versus the percentage of households with a firearm. It showed the relationship I was looking for and confirmed what I suspected. I posted this to Facebook and got mostly comments wondering why the US is such an outlier. One friend went down another path: questioning the conclusions to be drawn from the figure itself. He wanted to know how many innocent civilians were killed versus criminals and how many lives were saved by a citizen having a gun during a crime. I knew where this was going. The common refrain from the pro-gun lobby is that society stands to benefit from citizens having weapons to deal with these situations when they unfold. Therefore his natural response was to question while mine was to accept; we each remained consistent with our prior beliefs.

    [Figure: intentional firearm deaths per 100,000 people versus percentage of households with a firearm]

    Later on I started to reflect on whether my faith in the result was justified. Where did this data come from? Who put this figure together? What are their beliefs, intentions and motivations? Is my friend's skepticism warranted? 

    After several days of reading numerous news articles from my preferred sources, with facts and figures galore, I was tired. I could not digest any more. I disconnected for a bit and gave myself space to reflect. During this time, another question came to mind: with instant access to global dialogue, reporting and open data, am I afforded more opportunity to understand complex issues today than in years past? I'm not convinced we've made a significant improvement. In fact, one might argue the situation is worse.

    Much of the content I consumed merely reinforced my prior beliefs in those early days. Some of the national dialogue I listened to began to ask the deeper questions but with little reflection on answers. Wicked social challenges are wicked precisely because of their complexity, where nonlinearity, interdependency and uncertainty weave together to create systems defying simple explanation. In the face of this wicked social challenge, I ultimately realized I only knew one thing with certainty: an assault weapon with a high-capacity clip in the hands of a skilled individual can quickly rain devastation down on others.

    While some open data advocates speak of a brighter future enabled by greater perspective on our world, I cannot help but fixate on timeless constraints such as incentive structures and cognitive biases. When data does not threaten a particular position or worldview, the underlying reality portrayed by a data source stands a greater chance of being broadly accepted and potentially influential. When data threatens a position or is presented by a party with a particular agenda, questions will surface. Others will counter with data of their own. How does open data somehow mitigate questions about provenance and legitimacy? Does it stand to enable greater impact on public discourse? On issues where the stakes are perceived to be high, I'm skeptical.

    The pattern of self-deception in a complex environment seems routine these days. Many are unaware or, in some cases, unwilling to admit that there are fundamental limits to what is knowable about our world. Meanwhile the desire for plausible, causal narratives remains. Without conscious effort, we are predisposed to extrapolate from the few and the available while seeking coherence. In today's media environment, coherence is easier than ever to design given the plethora of voices one can choose from. This leaves cause for concern. 

    Given the impact of cognitive biases and self-interest, along with fundamental limitations on what we can understand, I wonder about the best ways to approach public policy challenges. At the end of the day, decisions must be made, resources allocated and actions taken. How do we design measurement approaches that allow us to learn in some manner that is robust to these confounding factors? It seems we're still wanting for answers.

    Posted by Chris Diehl at 8:46 AM in Cognitive Psychology, Social Psychology

    01/15/2013

    Cyber 2012: Crossing the Threshold

    The evolution of the cyber domain is as fascinating as it is concerning. It is a perfect example of the complexity we construct in our modern world that provides significant asymmetries in favor of those who choose to attack the system. In 2012, we witnessed the continued evolution of collection and attack capabilities to unprecedented levels. I thought it would be worthwhile to review some highlights from the past year that in my mind only bolster the case for strengthening community resilience before high consequence events occur.

    The most notable cyber event of 2012 in my opinion was not a particular cyber act but rather an admission. Through this particular New York Times article by David Sanger, US officials confirmed the existence of a long-standing offensive cyber operation designed to degrade Iran's nuclear enrichment program. Operation Olympic Games was responsible for the design and development of Stuxnet, malware specifically designed to target computer systems within Iran's enrichment facilities. Based on available reporting, it appears Stuxnet worked as designed after being delivered to the target network that was isolated from the Internet. The United States apparently went to great lengths to map the targeted network and then to design and test malware tailored to the particular control systems present on the network.

    With his decision, Obama realized he was establishing precedent in a new era of warfare. One that in some respects bears resemblance to the nuclear era. Within the cyber domain, there are a number of entities exercising mature offensive capabilities. At the same time, no one is in a position to credibly defend their networks against advanced adversaries. The lack of credible defense leads us to discuss active defense: the use of offensive action to defend against an identified threat. This begins to sound like mutually assured destruction from years past. Unfortunately the situation is complicated further by the lack of ability to attribute cyber attacks to their perpetrators. We stand vulnerable to counterpunches by others and won't necessarily know where they've ultimately come from.

    For some time now, attacks against our infrastructure have been a concern but not one that has garnered as much attention as it deserves in my opinion. In 2012, the tenor of the messaging seemed to change. This may have simply been a reflection of the desire to get new cybersecurity legislation passed. Yet whatever the motivation, it was very interesting to see senior government officials at the 2012 Aspen Security Forum speaking openly about the nature of the threat. In his hour-long interview at the event, NSA Director General Alexander made specific reference to nation-state and non-state actors actively attempting to attack our critical infrastructure in ways that, if successful, could bring down key services for "weeks or months." To hear his specific statement, listen to the following video starting around 13:00.

     

    Alexander emphasizes the obvious in his remarks. Offensive actions of the nature we are employing today, such as those taken against SCADA systems, will at some point be directed back at the US. Cyberweapons that the US releases into the wild will be studied by others. Those lessons learned can then be leveraged in new cyberweapons developed by adversaries. If public reports are correct, Iran took little time to respond with attacks such as the Saudi Aramco computer sabotage incident. More will undoubtedly follow. 

    Given this threat context, it was encouraging to see new signs in 2012 of a growing acceptance of the following reality: the Internet's complexity is beyond comprehension, thereby establishing a persistent asymmetry that can only be dealt with by redesigning the fundamentals of the network. Clearly this is an activity that will take time but at least movement is underway at DARPA to reconsider foundational design principles. This presentation from the November 2011 DARPA Cyber Colloquium gives some perspective into their thinking on the nature of the cyber challenge. 

    [Image: Matternet]

    In the meantime, we are left to manage downside risk in a system we will never fully understand. The construction of additional complexity continues with the march toward the Internet of Things. As an example, consider Matternet, a bold vision for leveraging advances in unmanned aerial vehicles to dramatically transform the world's transportation network with autonomous delivery of small payloads. In areas of the world where transportation networks remain woefully inadequate, such a network could have dramatic impact. From a security perspective, it undoubtedly introduces new challenges and concerns. I look forward to seeing how these tradeoffs are managed, assuming they are given the recognition they deserve. 

     

    Posted by Chris Diehl at 8:02 AM in Cyber

    12/03/2012

    Grappling with Uncertainty

    This year, I've had the opportunity to speak on several occasions with members of the national security community about organizational resilience in an uncertain world. These presentations have mainly been a continuation of musings that I began last year, leading me to write these two earlier posts on the subject. During these presentations, my aim has always been twofold: to make the audience more cognizant of the limits we face when attempting to reason about our world and to provide a general frame for thinking about operating in an environment that is fundamentally unpredictable. 

    Within the national security community, many are required to reason about the future of complex phenomena in our world. Far too often, they are forced to make predictions about future outcomes that are unpredictable. The need to make such predictions often stems from a slow response cycle. Therefore risk is compounded by the implications of an incorrect prediction coupled with an inability to rapidly respond to emergent realities.

    Nowhere is this risk more obvious than in the Department of Defense's (DoD) acquisition system. In major acquisition programs, it is not uncommon to see program life run 10-20 years from initial requirements definition to full scale production. Therefore the DoD is required to speculate about the threat environment many years into the future. Enormously expensive programs become conditioned on a set of hypotheses that won't be fully tested for years to come.

    One of the clearest examples of a flawed prediction came in the U.S. Army's Future Combat Systems (FCS) program. Starting in the early 2000s, there was only one program within the Army: FCS. FCS represented the future of how the Army would fight. One of the core tenets of FCS was that there existed a tradeoff between information superiority and armor requirements that could be exploited. With the presence of organic intelligence, surveillance and reconnaissance (ISR) assets in the FCS brigade combat team, it was believed that 60-ton armored vehicles could be replaced with 20-ton armored vehicles. These ISR assets would provide the necessary information superiority to mitigate the risk of increased vulnerability. 

    What happened following our initial decisive victory in Iraq in 2003? The asymmetric threat emerged in the form of the improvised explosive device (IED). In classic form, the U.S. scrambled to find a technological solution to this problem to improve our ability to detect the threat. Unfortunately that proved to be no easy task. IEDs were cobbled together with whatever resources were available. In addition, employment tactics were constantly evolving. We were faced with a tenacious adversary that was constantly probing and experimenting. Any detectable and exploitable regularities could be lost in short order. We were not learning nearly as fast as our adversary. 

    The implication of this vulnerability was clear. Our lack of situational awareness meant 20-ton vehicles were severely threatened. Later on, even 60-ton M-1 Abrams main battle tanks were being compromised by powerful new IEDs. With their destruction lay the wreckage of the FCS hypothesis. 

    [Image: M-1 Abrams main battle tank]

    More recently, another potentially threatened hypothesis surfaced in the press. Earlier this year, German Eurofighter Typhoons caused a bit of a stir during an air-to-air combat exercise when facing off against U.S. Air Force F-22 Raptors. The Typhoons reportedly were able to get within visual range of the Raptors and defeat some of them. The Raptor by design is meant to detect and engage threats beyond visual range (BVR), capitalizing on stealth, advanced sensors and weapons to strike first without warning. Therefore initial reporting of the Typhoons getting in close and killing the Raptor raised a lot of questions. Some contend that the Red Flag engagements were contrived to stress the limits of the Raptors and therefore provide no new insights into the relative performance of the Raptor. This seems consistent with the Germans' comments about the Raptor's overwhelming capability BVR. Others suggest a simple disagreement on outcomes. 

    [Image: F-22 Raptor and Eurofighter Typhoon]

    While the Raptor advantage may not be currently threatened, the point is that 'silver bullet' technologies can quickly become obsolete with unforeseen advances in technology and tactics, negating enormous investments to bring them to fruition. The ability to adapt quickly to shifts in the environment is key to avoiding the severe downside risk associated with flawed predictions.

    In discussions following my presentations, the reactions from members of the audience were quite illuminating. Some audience members were clearly in denial about the limits we face. When pressed with further examples, continued resistance ensued. For others, the message clearly penetrated but uneasiness followed. Strategic planning lies at the core of many government organizations. Calling such a central operating principle into question left some wondering how to proceed. Others clearly called out the weaknesses in the acquisition system and felt saddled with a reality that left them few options.

    Discussions like these hopefully seed the landscape for further reflection and reform. Thankfully others are championing the need for a shift in perspective. I'm concerned about the lack of awareness of our limitations that results in government programs seeking magical technological solutions to remove the ambiguity that lingers on the road ahead. More often, we need to turn our gaze inward to reflect on our own processes to understand how they aid or hinder our performance when wicked challenges arise.  

     

    Posted by Chris Diehl at 9:51 PM in Resilience

    05/15/2012

    Building Relationships for Social Good

    For some time now, I've firmly believed that the novel opportunities for innovation lie at the boundaries between disciplines. My word choice in this statement betrays the fact that I've spent many years up until recently in the research world. As I reflect on this now, I prefer the more general statement we obtain when replacing 'disciplines' with 'communities'. New opportunities become apparent when ideas and resources that are constrained in separate communities are allowed to mix. Often their mixture is the catalyst for new perspective and, in some cases, new steps forward.

    This is precisely the belief that drives Jake Porway in his mission to bridge the divide between those who work with data and those tackling some of the world's toughest problems. Jake originally started Data Without Borders (now DataKind) after his dismay over seeing the clear separation between those with deep data analytic skills and those with deep perspective on compelling problems. In one community, he saw creative people looking for new challenges; in the other, he saw people with a clear, unmet need for talent to support data-driven decision-making. With his passion for the issue and fellow compatriots at his side, he continues to experiment with various mechanisms to increase substantive connections between these communities.

    As their first foray into bridging the gap, Jake, Craig Barowsky and Drew Conway began organizing and leading Datadives in major cities across the US. The goal of the Datadives is to connect volunteers with social sector organizations for targeted collaborations over a weekend. Teams of volunteers have produced impressive capabilities in short order to everyone's delight.

    I had the pleasure of serving as a data ambassador with Jake and Drew during the San Francisco Datadive, leading a team of volunteers working with the non-profit organization Mobilizing Health. Not surprisingly, during the event, we spent the bulk of our time getting familiar with the data and generating structured forms of the data to address the problems we had framed. In many cases, avenues of exploration we pursued were not terribly interesting. Yet, near the end of the event, some opportunities became clear. So we pushed to get as many results and insights captured as possible. 

    At the end of the event, I was pleased with what we managed to pull together. We had defined a meaningful direction to explore and had initial results to defend its promise. At the same time, I was disappointed in that we had no more focused time. As all of us volunteers headed back to our busy lives, many of the shared insights and bursts of innovation would likely not be leveraged. Clearly that is to be expected in most cases. The question is: can we do better?

    I recently met up with Jake again at the Omidyar Network Executive Forum for a data workshop that he organized. During the event, Jake, Dave Gutelius, Josh Wills, Rufus Pollock and I engaged with representatives from various social sector organizations to help them think about their data challenges and offer insights on how they might extract more value from their data. It was a very fun event yet far too short.

    After the event, Jake, Dave, Josh, Rufus and I talked about the challenges of bridging the divide and the possibilities for innovation. As we talked, my mind began thinking about the issues associated with engagement. From my experience at the Datadive, it was clear that some online environment was needed to allow relationships that formed at the Datadive to continue. Clearly team members could continue working together remotely using available communication mechanisms. The problem is that others could not easily be rallied to the cause. If one assumes volunteers will come and go, there needs to be an environment in which context is maintained and others can discover the problems and associated prior work. Right now relationship development across the divide is mainly catalyzed offline. Allowing those relationships to seamlessly develop, whether offline or online, could be a huge win. Without the persistence that the online environment would offer, I believe it's difficult for any of these relationships to sustain momentum and attract additional contributors. The latter may be critical if most people can only contribute small amounts of time at sporadic intervals. 

    One of the wonderful outcomes of a Datadive is how it uncovers a group of people in a given locale with the common desire to make an impact. During our discussion after the data workshop, we discussed ways to achieve similar outcomes online. As we discussed the problems with some of the government data websites, such as data.gov, it struck me that the web servers for those sites have another view of the people, as represented by IP addresses, that share a common interest in particular datasets. Why isn't that view made explicit? Why are we not building social features into these sites to help communities form around datasets?
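
    As a rough sketch of what making that view explicit might involve, the snippet below groups visitors by the datasets they have accessed and then lists which pairs of visitors share an interest. Everything in it is an assumption for illustration: the log format, the dataset names and the function names are invented, and the IP addresses come from the range reserved for documentation. It does not describe how data.gov or any real site stores its logs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical access log of (visitor_ip, dataset_id) pairs.
# Addresses use the reserved documentation range; dataset names are invented.
access_log = [
    ("192.0.2.1", "air-quality"),
    ("192.0.2.2", "air-quality"),
    ("192.0.2.1", "school-budgets"),
    ("192.0.2.3", "school-budgets"),
    ("192.0.2.2", "air-quality"),  # repeat visits collapse into one entry
]


def visitors_by_dataset(log):
    """Map each dataset to the set of visitors that accessed it."""
    groups = defaultdict(set)
    for ip, dataset in log:
        groups[dataset].add(ip)
    return dict(groups)


def shared_interests(log):
    """Map each pair of visitors to the datasets both have accessed."""
    pairs = defaultdict(set)
    for dataset, ips in visitors_by_dataset(log).items():
        for a, b in combinations(sorted(ips), 2):
            pairs[(a, b)].add(dataset)
    return dict(pairs)


for pair, datasets in shared_interests(access_log).items():
    print(pair, sorted(datasets))
```

    A real site would presumably work from accounts that people opt into rather than raw IP addresses, but the grouping step would look much the same.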

    Bringing data, code and people together into a common environment may present significant challenges. Yet if such an environment could be built and used to demonstrate community formation solely through online interactions, the addition of offline relationship formation through events such as Datadives would only enhance the outcome.

    Posted by Chris Diehl at 8:54 PM

    11/20/2011

    Pursuing Meaningful Questions

    Recently I read the description for the upcoming modeling challenge associated with the International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP 2012). Here's an excerpt from the announcement:

    Social media has inarguably played a key role in facilitating information dissemination in numerous real-world events including citizen protests driven by economic (e.g., the Occupy Wall Street, Greek's the Indignant Citizens Movement) or socio-political crisis (e.g., Egyptian revolution, the Arab Spring), disaster recovery and response (Haiti earthquake), political campaigns, and many others. Prior to such events, the social network is virtually non-existent and emerges dynamically at an overwhelming pace afterwards. A similar phenomenon could be observed during the outbreak of an epidemic. Identifying the nodes that spread the information (epidemic) fastest, early on before the network stabilizes, could have a significant impact in decision making. Companies could also find this problem significant during early product promotions.

    Considering the significance of the above-mentioned problem, the theme of the SBP 2012 challenge is to model diffusion in rapidly evolving networks. Submissions will be evaluated based on theoretical grounding as well as experimental evidence. 

    My initial reaction to the challenge can be summarized with one word: disappointment. I find this challenge emblematic of a significant body of research on social media that continues to pursue answers to questions that are misguided or poorly defined, leading to little practical benefit. Without the context of broader challenges, social media research will likely continue to generate results that fail to advance our understanding and deliver actionable insights. 

    How can we move from the current pattern of research to more impactful results? Within government circles, a shift in mindset is needed. Far too many program managers believe prediction is possible with enough data and computation for a range of network analytic challenges. They need to shift focus from predicting the future to understanding the now. Deriving timely situational awareness from large volumes of data remains a daunting challenge. The arrival of the real-time social web has made the problem no easier. 

    Once the requisite shift in priority from prediction to awareness is made, a host of additional questions need to be addressed. Consider the assertion made in the announcement:

    A similar phenomenon could be observed during the outbreak of an epidemic. Identifying the nodes that spread the information (epidemic) fastest, early on before the network stabilizes, could have a significant impact in decision making.

    What decision making process are we talking about here? This announcement is similar to many other statements about expected utility. Vague claims are made with limited context. No vision is advanced that clarifies the larger problem being addressed and how derived results help us move toward a solution. 

    In fairness to academic researchers, they don't usually have the context to assemble a rich narrative about the larger problems. Constructing a vision of transformational impact takes time as it requires significant discussion between domain experts who understand the problems and methodological experts who understand the approaches that might support novel, impactful solutions. Communication and coordination among these disparate groups remain costly endeavors even in the era of social media. Incentives are such that reaching across the divide is not the norm. New approaches are needed to shift that balance.

    Government program managers need to lead with appropriate vision and incentives to assemble the multi-disciplinary teams that are absolutely required. Experimentation is needed to find paradigms that encourage the development and sustainment of productive partnerships. Partnerships that can move from transformational vision to prototypes and back as ideas are refined.

    In the absence of leadership from the government sector, those motivated to make an impact need to begin assembling the larger picture and framing the core challenges. If we focus on the examples given in the announcement that involve the use of social media in struggles between a populace and government, a larger set of questions can be defined using insights from news reports.  

    At the moment, governments here in the US and abroad clearly are disadvantaged in this new information space. In struggles abroad, that asymmetric advantage was celebrated by the US as protestors across the Arab world rose up against oppressive regimes with astounding results. Now city governments in the US wrestle with the challenges of responding to dynamic, seemingly leaderless crowds associated with the Occupy movement. Feeling the pressure, we are seeing government responses increasingly turn violent when options appear limited and frustrations run high.

    Within this context, how can social media analysis potentially generate options for governments other than violence? When violence erupts, communication has clearly broken down. From the government perspective, in some cases, they have little idea of who to negotiate with to shape conditions on the ground. Can analysis of social media flows give them perspective on who they should be communicating with? Are there sources of information that are clearly influencing conditions on the ground? How does one uncover such influence?

    Giving governments this type of insight is clearly a double-edged sword. With a government actively trying to avoid clashes in the streets, communication options may be embraced and leveraged for the common good. This insight might also be used to degrade the network by attacking the key sources of information influencing the behavior of the crowd. This raises further questions about how to create options for dialog without indirectly incentivizing undesirable outcomes. Are there novel ways to protect freedom of speech and assembly while minimizing impacts on public safety and security?  

    Framing analytic challenges in the broader context of the societal challenges we face today will challenge us to think differently and reach for new perspective. Network insights only create meaningful opportunities if those insights aid us in making better decisions. Let's reach farther to create and ultimately capitalize on opportunities for greater societal impact.

    Posted by Chris Diehl at 1:10 PM

    08/26/2011

    Optimizing Organizational Performance in an Uncertain World - Part 2 - Measure and React

    In a January 2011 report, the Economist Intelligence Unit published results from a survey of 300 senior executives from around the globe. The report highlighted a number of key challenges that businesses are grappling with as they face increasing complexity in the business environment. According to the survey, much of this complexity is driven by the increasing expectations of the customer. Global businesses serving diverse markets are expected to provide tailored products and more responsive customer service. Meeting these specialized needs while adapting to constraints within each marketplace has complicated decision-making. Meanwhile the timelines for comprehending the risks and making decisions have only grown shorter. How can organizations perform more effectively when faced with these challenges?

    In Everything is Obvious, Duncan Watts highlights the work of Michael Raynor in The Strategy Paradox in which Raynor states that business leadership often spends too much time focused on execution as opposed to managing strategic uncertainty. Raynor suggests that business leadership should focus solely on planning to mitigate strategic risks while managers are left to focus on day-to-day execution. Raynor recommends a variation on traditional scenario planning which he describes as strategic flexibility. In this approach, planners devise an expansive set of hypothetical futures along with optimized strategies tailored to each hypothesis. Then the set of strategies is dissected to identify the elements that are common across the set. Strategic flexibility involves devising a strategy that incorporates the common elements while hedging the risks that are specific to the various hypotheses.
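
    To make the dissection step concrete, here is a minimal sketch that treats each scenario's strategy as a set of elements, takes the intersection as the common core and treats whatever remains in each scenario as the part to hedge. The scenarios, the element names and the function name are invented for illustration; this is one simple reading of the idea as summarized above, not Raynor's actual procedure.

```python
# Hypothetical scenarios, each mapped to the elements of the strategy
# tailored to it. All names are invented purely for illustration.
strategies = {
    "demand_grows": {"expand_capacity", "train_staff", "upgrade_it"},
    "demand_flat": {"train_staff", "upgrade_it", "cut_costs"},
    "new_competitor": {"train_staff", "upgrade_it", "new_product_line"},
}


def split_core_and_hedges(strategies):
    """Return (core, hedges).

    core   -- elements present in every scenario's strategy
    hedges -- per scenario, the elements specific to that scenario
    """
    core = set.intersection(*strategies.values())
    hedges = {name: elements - core for name, elements in strategies.items()}
    return core, hedges


core, hedges = split_core_and_hedges(strategies)
print("commit to:", sorted(core))
for scenario, extra in hedges.items():
    print("hedge for", scenario, "->", sorted(extra))
```

    The sketch inherits the weakness discussed next: the output is only as good as the hypothetical futures fed into it.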

    Strategic flexibility, along with more traditional forms of scenario-based planning, requires predictions about the future. Often such predictions are made by a small number of individuals reasoning about environments they mistakenly believe they understand. Due to ever-present uncertainty in complex environments, these predictions, no matter how well-conceived, leave businesses vulnerable to high consequence events that cannot be foreseen (Taleb's black swans). As Watts explains, comprehending the black swan a priori would involve not only successfully predicting its occurrence but also predicting the environment in which the black swan unfolds. Only then would the significance of the black swan be understood. When faced with these constraints and others, alternative approaches are needed to avoid misguided overconfidence that rests on predictions of the future.

    One alternative that Watts advances involves shifting focus from the future to the present. If the future cannot be reliably predicted, one should focus instead on rapidly adapting to changes in the present environment. On the Web, the implementation of measure and react strategies by companies such as Google, Yahoo! and Facebook is now routine. Watts provides the example of the fashion company Zara as a non-Web technology company capitalizing on this strategy. Instead of attempting to anticipate fashion trends, Zara observes what people are wearing now and generates a range of ideas that are variations on the current trends. Then they make a series of small batches of garments to sell in a variety of markets. Based on the responses they observe in stores, Zara rapidly produces more of the successful garments and drops those that fail to gain a significant response. What makes this all possible is Zara's agile production which allows them to move from a design to selling a garment in stores in a little more than two weeks. 
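
    The loop Watts describes can be sketched in a few lines: ship a small batch of every candidate, measure the response, keep the best performers and repeat. The version below is a toy under stated assumptions. The design names, the hidden "appeal" probabilities, the batch size and the keep fraction are all invented, and the simulated store is a stand-in for real sales data; nothing here reflects Zara's actual process.

```python
import random

# Hypothetical designs with a hidden probability that a shopper buys one.
# In practice this is unknown; it is only used here to simulate a store.
hidden_appeal = {
    "striped_shirt": 0.1,
    "denim_jacket": 0.5,
    "linen_dress": 0.8,
    "wool_scarf": 0.3,
}


def measure_and_react(designs, observe_sales, rounds=3, trial_batch=10,
                      keep_fraction=0.5):
    """Repeatedly test small batches and keep only the best sellers."""
    active = list(designs)
    for _ in range(rounds):
        # Measure: put a small batch of each active design in stores.
        sales = {d: observe_sales(d, trial_batch) for d in active}
        # React: rank by observed sales and drop the weaker designs.
        ranked = sorted(active, key=lambda d: sales[d], reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        active = ranked[:keep]
    return active


rng = random.Random(0)  # seeded so the simulation is repeatable


def noisy_store_sales(design, batch_size):
    """Simulate how many garments of a small batch actually sell."""
    return sum(rng.random() < hidden_appeal[design] for _ in range(batch_size))


survivors = measure_and_react(hidden_appeal, noisy_store_sales)
print(survivors)
```

    Because each trial batch is small, the observations are noisy and a good design can be dropped by bad luck. The short cycle time is what makes that tolerable: the cost of any single wrong call stays low.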

    Zara is an example of an organization that is able to efficiently test hypotheses in the marketplace with relatively minimal investments of capital and time. By shrinking the cycle time associated with hypothesis testing, an organization can explore a larger range of possibilities prior to committing to a course of action. This type of agility and adaptation is routine within the startup world; yet it is far from commonplace in larger organizations. We tend to ascribe this lack of agility to the burdens of communication, coordination and decision-making in large-scale organizations, although it is unclear how well we understand the underlying factors at play. How do we more routinely break from the current status quo that seems to emerge as organizations scale? How much can social media improve the dynamics of the organization? Is the status quo at organizational scale mainly driven by fundamental human tendencies, ones that are difficult to comprehend due to complex dependencies and unobserved forces? Social psychologists are uncovering the complexities within the self and how much our unconscious minds drive our behavior. How those processes affect outcomes in group settings will likely be the subject of research for some time to come.

    While research continues to explore the emergence and impact of social and cultural constraints on organizational performance, experimentation can continue along the social media dimension to explore improved ways to minimize the cost of communication, coordination and decision-making. Social media within the enterprise provides new opportunities to connect employees with one another and with ideas emerging from varied quarters of the organization. The enduring challenge is how to provide users with capabilities to manage the cost-benefit tradeoff. The cost to the user comes in terms of time and attention. The benefit of a particular attention portfolio is more challenging to assess, especially when attempting to engineer serendipity. Defining the cost-benefit tradeoff more explicitly is the first step in the design of systems to improve the user experience.

    By building social media infrastructure that adapts to the changing needs of the enterprise, one hopes to expose decision-makers to people who are closer to the problems that they are trying to address. Too often, the decision-makers are distant from the realities on the ground and therefore lack the appropriate context to understand the ramifications of a course of action. Watts highlights the idea of planning as knowledge aggregation and discusses the many forms that this can take. Leveraging local knowledge that may lie within or external to the organization first requires one to accept that context matters and those close to the problem are best positioned to define the solution. It seems clear that we can develop technology to reduce the cost of knowledge aggregation. The most daunting challenge may be to convert traditional strategic planners to this uncommon mindset.

    Posted by Chris Diehl at 11:56 AM

    06/21/2011

    Optimizing Organizational Performance in an Uncertain World - Part 1 - The Limits of Prediction

    How can technology enhance the performance of organizations? How does one characterize organizational performance in a world where our ability to predict is highly constrained? These are two general questions that have been on my mind for some time now. In an effort to clarify my thoughts on the matter and broaden the discussion, I thought I would summarize my view to date, providing the first of what I expect to be a series of blog posts on this general topic. In this post, we begin with some reflection on the limits of prediction. A thorough understanding of these limits will aid us in identifying meaningful measures of organizational performance and approaches to optimizing them.

    My perspective has been influenced significantly by the writings of Nassim Taleb and Duncan Watts. "The Black Swan" was my initial exposure to Taleb's thoughts on the limits of prediction. Taleb's Edge essay "The Fourth Quadrant: A Map of the Limits of Statistics" is an excellent continuation of this thread. Duncan Watts' latest book "Everything is Obvious: Once You Know the Answer" takes a critical look at common sense reasoning to explore the pitfalls we encounter when attempting to understand complex phenomena. Taleb and Watts share a common interest in behavioral predispositions that lead us to repeated surprise. Watts delves deeper into these issues and considers the available options for planning given the limits of prediction. Not surprisingly, there are no obvious solutions and many questions remain. Yet a significant first step is simply understanding the limits of prediction so as to avoid being deceived by randomness.

    One does not have to look far within the research community and elsewhere to see the overconfidence in mathematical and statistical models. I have watched with increasing concern the development and application of models of complex social phenomena. The field of social network analysis has seen an explosion of activity in the last five years as social media data has transformed the study of large-scale social systems. Many scientists and mathematicians have entered the space with the objective of uncovering general patterns in social systems that can be used for prediction. Some have made unreasonable claims about the range of predictions one could make, thus leaving the impression that present limitations will be resolved with more data and computation. The danger comes when such models are used as guides for future actions. I have pressed some program managers in the past on the implications of their approaches. Some remain confident in the validity of their methods. Others consider a marginal methodology better than none at all when forced to make decisions in high risk scenarios. Far stronger doses of skepticism are still needed in my opinion. 

    To appreciate the futility of predicting the future state of complex systems such as social networks, it is instructive to consider the difficulties we face when addressing prediction tasks that appear quite tenable. Watts provides a telling example in the context of the ultimatum game. In the ultimatum game, two players interact to decide on a given split of a sum of money. One player proposes a split and the other accepts or rejects the offer. If the offer is accepted, both players receive their proposed share of the money. If the offer is rejected, neither player receives money. For such a game, it seems reasonable to expect that the player making the proposal will offer a split that may be in his or her favor but not so egregious as to seem unfair to the other player. Yet what is viewed as reasonable and fair turns out to vary widely across cultures, leading to some surprising outcomes depending on your perspective. A study was conducted where the game was played in 15 small-scale, preindustrial societies across five continents. These experiments showed a wide variation in outcomes. At one extreme, even very low offers were readily accepted without resentment. In other cases, "hyperfair" offers, where the proposer would take a minority fraction of the money, were rejected as frequently as unfair offers.

    What happened here? Watts explains:

    "As it turns out, the Au and Gnau tribes had long-established customs of gift exchange, according to which receiving a gift obligates the receiver to reciprocate at some point in the future. Because there was no equivalent of the ultimatum game in the Au or Gnau societies, they simply “mapped” the unfamiliar interaction onto the most similar social exchange they could think of—which happened to be gift exchange—and responded accordingly. Thus what might have seemed like free money to a Western participant looked to an Au or Gnau participant very much like an unwanted obligation. The Machiguenga, by contrast, live in a society in which the only relationship bonds that carry any expectation of loyalty are with immediate family members. When playing the ultimatum game with a stranger, therefore, Machiguenga participants—again mapping the unfamiliar onto the familiar—saw little obligation to make fair offers, and experienced very little of the resentment that would well up in a Western player upon being presented with a split that was patently unequal. To them, even low offers were seen as a good deal."

    This example highlights that common sense knowledge has a social context. To acquire that knowledge requires one to participate in the society. Without such an experience base, opportunities abound for misunderstanding and surprise.

    Consider how often we apply our common sense knowledge to make sense of large groups of people distant from our daily lives. As we digest the world news, we can't help but shape narratives based on our limited perspective. From these narratives, we are then compelled to derive causal explanations, thus setting ourselves up for the next surprise.

    Another illuminating example comes from a 2010 study conducted by Goel, Mason and Watts where they examine the real and perceived attitude agreement among Facebook friends. The results of their study made clear that participants were very bad at identifying when their friends disagreed with them, even in the context of close friendships. Anecdotal reports from the participants showed surprise over how their friends had perceived them. The authors conducted additional analysis to understand how participants were responding in cases where they lacked specific information about their friends' beliefs. It appears that when in doubt, the participants leveraged stereotypes to make inferences about their friends' beliefs. Without realizing it, we naturally fill in details when information is unavailable, leading to misrepresentation and potential overconfidence.

    If we struggle with predicting attributes of our friends, one might expect we should fare better with predictions about ourselves. Social psychologists Dan Gilbert and Timothy Wilson have studied the topic of affective forecasting, which is focused on how people think about the future and how they think they will react emotionally to future events that might occur in their lives. On the whole, they claim we clearly have some ability to make forecasts but that we are prone to make certain systematic errors. One of the most common is impact bias, where we overestimate the emotional impact of a potential event on our lives. For positive events, we anticipate the gain in happiness will be more pronounced. For negative events, we see a more significant emotional burden in our future.

    In many ways, we are not well suited for reasoning about the complexities of the world. We are all impacted by a range of cognitive biases that sway our judgement. Some of the cognitive biases laid out by Watts, Taleb and others include:

    • Hindsight bias - Overestimation of the predictability of an outcome after it is known.
    • Creeping determinism - Tendency to treat a realized outcome as inevitable.
    • Post-hoc fallacy - Inferring causality from a sequence of events.
    • Confirmation bias - Favoring information that supports a currently held belief.
    • Motivated reasoning - Giving greater scrutiny to contradictory evidence than confirmatory evidence.
    • Priming - Impact of a stimulus on future behaviors and choices.
    • Anchoring - Overreliance on a piece of information in decision-making.

    At the same time, we face fundamental constraints in terms of the available context with which to reason about the world. 

    To delve into the implications of the coupling of these biases and constraints, we must begin by clarifying the definition of a complex system. My working definition of a complex system is a system with a multitude of input and state variables that interact often in a complex, nonlinear manner. Even when privy to the complete state of the complex system, the future output trajectory may be highly uncertain due to the randomness inherent in the input signals driving the system. Often we see complex systems exhibiting behavior where even minor perturbations can lead to significant changes in the state and output variables. So even in the ideal (unrealizable) scenario where the system is completely transparent to us, our ability to forecast the future state of the system is very limited due to the accumulation of uncertainty. 
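
    The sensitivity to minor perturbations described above can be illustrated with a toy system. The sketch below uses the logistic map, which is my own choice of example and not something drawn from Watts or Taleb; the function names and parameter values are likewise illustrative assumptions. Two trajectories start a billionth apart and are iterated side by side.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x) and return every state visited."""
    states = [x0]
    for _ in range(steps):
        states.append(r * states[-1] * (1.0 - states[-1]))
    return states


def divergence(x0, perturbation=1e-9, r=4.0, steps=50):
    """Absolute gap, step by step, between a trajectory and a perturbed copy."""
    base = logistic_trajectory(x0, r, steps)
    nudged = logistic_trajectory(x0 + perturbation, r, steps)
    return [abs(p - q) for p, q in zip(base, nudged)]


# Toy starting state, chosen arbitrarily for illustration.
gaps = divergence(0.2)
print("initial gap:", gaps[0])
print("largest gap over the run:", max(gaps))
```

    Even though this system is fully known and has a single state variable, the initial gap grows until the two trajectories bear no resemblance to each other. Real social systems add unobserved variables and random inputs on top of this.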

    Compare this now to what we attempt to accomplish when we reason about the world. Whether constructing a mental or algorithmic model of a complex system, we are attempting to identify a representation of the underlying system that allows us to make inferences about its future state. Putting aside the fundamental uncertainty the system presents to us in the ideal scenario, we are now faced with significant observational limitations that only magnify that uncertainty. We are never certain we have uncovered all of the relevant input variables. At the same time, the opportunity to uncover the relevance of a given variable only comes after examples are available that clarify the underlying relationships. This assumes that we've measured the appropriate input and state variables to identify the model. When relevance is only clear in hindsight, it is impossible to ensure that one's observational resources are appropriately focused to uncover those relationships. 

    Revisiting the examples I cited earlier with these constraints in mind, it should be no surprise that we are routinely surprised in such settings. We make predictions with what context is available to us at the time, not realizing what gaps in our knowledge exist. Meanwhile, even when the available context is rich, the fundamental complexity of the scenario at hand can simply be beyond our comprehension. Yet we continue to wrestle with the complexity, attempting to derive compelling narratives that provide an illusion of understanding. Our cognitive biases become involved in that process, providing additional distortion in our view of reality. Even when we attempt to counteract some of those biases through algorithmic inference, the deception continues. History provides only one realization of complex social systems. Unlike certain physical systems that we can meticulously study in controlled settings, complex social systems must be observed in the wild. Therefore we are not privy to multiple trials. Each example witnessed is arguably unique. 

    So where does that leave us? How do organizations plan in environments where the future is unpredictable? The short answer is to focus not on prediction but adaptation. When we truly embrace surprise as the norm, the question we must address is: how do we efficiently pivot when the surprise finally comes? This will be the subject of future posts.

    Recommended reading:

    N. Taleb - "Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets" - 2008

    J. Cooper Ramo - "The Age of the Unthinkable: Why the New World Disorder Constantly Surprises Us And What We Can Do About It" - 2009

    N. Taleb - "The Black Swan: The Impact of the Highly Improbable" - 2010

    D. Watts - "Everything is Obvious: Once You Know the Answer" - 2011

    As always, comments and questions are most welcome. This is an exercise for me to discover how to make these arguments in a clear and compelling manner. Undoubtedly there will be room for improvement. 

     

    Posted by Chris Diehl at 12:04 AM in Cognitive Psychology, Social Psychology

    04/01/2011

    Exploring Complex, Dynamic Graph Data - Part 3

    (Parts 1 and 2 of the series)

    Since my last post, I've been reminded once again of the challenges presented by dynamic graph data. Earlier I wrote about how I'd hoped to exploit graph rewriting operations in Gremlin to efficiently tease out certain classes of communication event sequences in the Enron communication graph. Unfortunately the queries to produce those event sequences are nontrivial in Gremlin even with the abstraction provided through user-defined steps. At present, I think this is due to a mismatch between the representation I've chosen and the capabilities offered by Gremlin. There's no doubt that Gremlin simplifies many operations one wants to perform on multi-node-type, multi-edge-type graphs. Time simply introduces some additional wrinkles given it imposes an ordering that must be respected to obtain valid traversals through the graph structure. The tests to ensure properly ordered traversals turn out to be the source of the complexity. 
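
    To make the ordering issue concrete, here is a minimal sketch in plain Python rather than Gremlin. The event tuples, names and the `time_respecting_paths` helper are all hypothetical and are not taken from my Enron queries. It enumerates chains of communication events in which each event strictly follows the previous one in time.

```python
from collections import defaultdict

# Hypothetical communication events: (sender, recipient, timestamp).
events = [
    ("alice", "bob", 1),
    ("bob", "carol", 2),
    ("carol", "dave", 1),  # occurs before carol hears from bob
    ("bob", "erin", 0),    # occurs before bob hears from alice
]


def time_respecting_paths(events, start, max_hops=3):
    """Enumerate event chains from `start` whose timestamps strictly increase."""
    outgoing = defaultdict(list)
    for sender, recipient, timestamp in events:
        outgoing[sender].append((recipient, timestamp))

    paths = []

    def extend(node, last_time, path):
        if path:
            paths.append(list(path))
        if len(path) == max_hops:
            return
        for recipient, timestamp in outgoing[node]:
            # The ordering test: the next event must follow the previous one.
            if last_time is None or timestamp > last_time:
                path.append((node, recipient, timestamp))
                extend(recipient, timestamp, path)
                path.pop()

    extend(start, None, [])
    return paths


paths = time_respecting_paths(events, "alice")
for path in paths:
    print(path)
```

    A purely structural traversal would happily follow alice to bob to erin. The timestamp test is what rules that chain out, and it has to be threaded through every hop of the traversal.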

    To date, I've had a number of conversations with colleagues about exploring dynamic graph data. I have yet to uncover a mechanism by which one can explore the complexities of this type of data with relative ease. I suppose I should not be surprised; yet I am to some degree given the volumes of data being produced. It seems we are not yet in a position to uncover the more complex dynamic patterns that we expect to lurk in these datasets without serious effort and luck.

    Left without general approaches for efficient exploratory data analysis in this context, we need to be able to efficiently realize domain-specific analytics to test our hypotheses. We need capabilities to address all layers in the process: persistence, query, analysis and visualization. 

    My experimentation has centered around composing different technology stacks to support this process. The very first technology stack I experimented with was Neo4j + Gremlin + Python + Gephi. Neo4j was a natural mechanism for representing and persisting the Enron data along with the social metadata. Python was an obvious choice for conditioning the Enron data and populating Neo4j. Gremlin offered me the capability to subset and transform the Enron data graph easily and export those results in GraphML form. Gephi allowed me to easily visualize the results and perform further operations on the data to enhance the signals I wanted to see.

    If one wants to move beyond a singular focus on a particular dimension of the data, such as graph structure, it's imperative to explore other options for visualization. I find that a number of the visual forms I want to see require specialized visualization. Ideally I want different projections of the data available to me simultaneously in linked, interactive visualizations. For someone skilled in JavaScript, such visualizations are no longer so burdensome to create. Protovis has gone a long way toward minimizing that burden. D3, Mike Bostock's latest creation, looks poised to build on Protovis' success and go even further. Even if you do not envision yourself doing serious infovis development, I think it is worthwhile to pick up some Protovis skills. I find it useful for realizing more complex static visualizations in a browser, thus allowing me to get a better view of multiple dimensions at once.

    Below is a snapshot of a communication ego network browser I put together one afternoon to let me visualize traffic patterns between the ego and alters. It is essentially a series of stacked bullet charts representing email traffic to and from the ego. The colored bars show total email counts. The mid gray bars show the number of emails with the recipient in the To field. The dark gray bars show the number of emails in threads. The number of relationships in this ego network far exceeds what is displayed. Since it is in a web browser, I can quickly scroll and scan the data to get a feel for the patterns.   

    To support all of my analytical needs, there was little question that Python was the best choice for me. NumPy/SciPy/Matplotlib and NetworkX give Python a natural advantage on their own. With NLTK for natural language processing and a host of available machine learning and optimization packages, the scale tips even further.

    If you are a (J)Rubyist who appreciates the power of Gremlin, keep an eye on a project called Pacer. Pacer brings Gremlin to JRuby, thus expanding development options. The TinkerPop crew has been busy and continues to develop new capabilities.

    I'll continue to experiment with different compositions as time permits and needs dictate and share those discoveries here. 

    Posted by Chris Diehl at 5:35 PM in Exploratory Data Analysis, Graph Mining

    01/27/2011

    Exploring Complex, Dynamic Graph Data - Part 2

    (Part 1 of the series)

    In my first post, I advanced the idea that in order to conduct exploratory data analysis (EDA) we need four basic capabilities: persistence, query, analysis and visualization. Moving forward, I want to talk about specific approaches and capabilities I've examined that I believe provide value for EDA. In this post, we'll focus on mechanisms for persistence and query.

    If you've paid even the slightest attention to recent advances in the database world, it should be clear that the array of available capabilities is ever increasing. The NoSQL movement is responsible for many of the newest arrivals. I focused on NoSQL databases because of the appeal of a schemaless database. When conducting EDA, we don't yet understand the patterns in the data that are of interest. In addition, we may have an evolving list of questions we wish to ask of the data. Therefore any mechanism that simplifies data representation and conditioning, so that we may cycle faster in our exploration, is a benefit.

    One of the first questions I considered when attempting to prune the space of candidates is: should I use a graph database? This immediately led to another question: how is a graph database distinct from other databases? Some have advanced the definition that "a graph database is any storage system that provides index-free adjacency." Stated more simply, a graph database is a storage system that captures an explicit representation of a graph. With index-free adjacency, one-hop traversals in the graph become constant time operations, giving graph databases the edge when deep traversals of the graph are required.
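
    A rough way to picture index-free adjacency, sketched in plain Python: the `Vertex` class, the sorted edge table and the helper names below are illustrative assumptions of mine and say nothing about how any particular graph database lays out its storage. The first approach finds neighbors by searching a global edge index; the second follows references held by the vertex itself.

```python
import bisect


class Vertex:
    """A vertex that stores direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []

    def connect(self, other):
        self.out.append(other)


def neighbors_via_index(edge_table, name):
    """Find neighbors by binary search in a sorted global edge table."""
    i = bisect.bisect_left(edge_table, (name, ""))
    found = []
    while i < len(edge_table) and edge_table[i][0] == name:
        found.append(edge_table[i][1])
        i += 1
    return found


def two_hops(vertex):
    """Walk two hops by following references; no index is consulted."""
    return [far.name for near in vertex.out for far in near.out]


# The same toy graph in both representations.
a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
a.connect(b)
a.connect(c)
b.connect(c)
edge_table = sorted([("a", "b"), ("a", "c"), ("b", "c")])

print(neighbors_via_index(edge_table, "a"))
print([v.name for v in a.out])
print(two_hops(a))
```

    With the edge table, every hop pays for a lookup whose cost grows with the total number of edges. With direct references, the cost of a hop depends only on the vertex at hand, which is why the gap widens as traversals get deeper.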

    When first considering the question of whether or not to use a graph database, I viewed the question from an either/or perspective where the goal was to choose one database. Later on, it became clear that this was a limited way to think about the problem. Different classes of databases have their use cases at which they excel. It is possible that no one class of database will provide all the capabilities required. When working with social media data that presents both rich structured and unstructured data, polyglot persistence seems more appropriate. This allows one to use a graph database to capture the multi-relational graph structure while another database represents the unstructured attribute data. 

    Having decided to use a graph database to represent aspects of my data, I turned my attention toward graph query languages to understand how this further constrained the viable database options. It is during this search that I discovered the graph traversal language called Gremlin developed by Marko Rodriguez. Marko and several others have been hard at work since November 2009 developing a graph processing stack that allows one to persist and process general property graphs using one of several available graph databases. Currently the stack supports Neo4j, OrientDB, RDF Sail and an in-memory graph database called TinkerGraph. Support for Redis, InfiniteGraph and DEX is planned for future releases. 

    While perusing the Gremlin documentation and Marko's presentations, it became clear that Gremlin has the potential to provide significant utility for EDA by simplifying the implementation of complex graph rewriting operations. This is significant because of the core data representation challenge I alluded to earlier. During EDA, we are not yet aware of the underlying patterns in the data. Therefore we do not understand what representations are most appropriate. To reduce the time required to uncover those patterns, we want to take advantage of any techniques that allow us to quickly investigate different perspectives on the data.

    Graph rewriting involves the transformation of one graph to another through some set of operations. If we have a mechanism to efficiently implement and execute graph rewriting operations, we can develop a base representation for the data, the data graph, and apply graph rewriting operations as needed to the data graph to filter and transform the data for other analyses.  

    One of the core concepts brought to bear when defining graph rewriting operations in Gremlin is the user-defined step. A user-defined step is a sequence of one or more atomic steps in a multi-relational graph. Through the use of user-defined steps, we are creating new definitions of adjacency and higher-level abstractions to utilize when operating on the data graph.

    My first application of user-defined steps was in the definition of a graph rewriting operation that would take a portion of the Enron email data graph and construct the 1.5-hop communication graph for a specified email address. Consider the diagram below. 

    (Figure: User-defined steps)
    In the data graph, a given email message is represented by a collection of email address and message vertices. SENT and RECEIVED_BY edges specify the relationships between those addresses and the message. The communication graph represents the volume of communication between email addresses with weighted, directed edges. By defining the SENT_TO and RECEIVED_FROM steps, we can more easily express attributes of the data graph that are relevant to the construction of the communication graph. 
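
    For readers who do not use Gremlin, the same rewriting idea can be sketched in plain Python. This is not the Gremlin code I ran; the edge list, the example addresses and the function names are hypothetical, and I am assuming that the 1.5-hop graph consists of the ego, its direct correspondents, and the traffic among them.

```python
from collections import Counter

# Hypothetical data graph: (source vertex, edge label, target vertex).
data_edges = [
    ("ego@example.com", "SENT", "m1"),
    ("m1", "RECEIVED_BY", "a@example.com"),
    ("m1", "RECEIVED_BY", "b@example.com"),
    ("a@example.com", "SENT", "m2"),
    ("m2", "RECEIVED_BY", "b@example.com"),
    ("b@example.com", "SENT", "m3"),
    ("m3", "RECEIVED_BY", "c@example.com"),
    ("ego@example.com", "SENT", "m4"),
    ("m4", "RECEIVED_BY", "a@example.com"),
]


def sent_to(edges, address):
    """SENT_TO step: address -SENT-> message -RECEIVED_BY-> recipient."""
    messages = {dst for src, label, dst in edges
                if src == address and label == "SENT"}
    return [dst for src, label, dst in edges
            if label == "RECEIVED_BY" and src in messages]


def received_from(edges, address):
    """RECEIVED_FROM step: address <-RECEIVED_BY- message <-SENT- sender."""
    messages = {src for src, label, dst in edges
                if dst == address and label == "RECEIVED_BY"}
    return [src for src, label, dst in edges
            if label == "SENT" and dst in messages]


def communication_graph(edges, ego):
    """Rewrite the data graph into weighted, directed edges around `ego`."""
    alters = set(sent_to(edges, ego)) | set(received_from(edges, ego))
    members = alters | {ego}
    weights = Counter()
    for sender in members:
        for recipient in sent_to(edges, sender):
            if recipient in members:
                weights[(sender, recipient)] += 1
    return dict(weights)


graph = communication_graph(data_edges, "ego@example.com")
for (sender, recipient), count in sorted(graph.items()):
    print(sender, "->", recipient, count)
```

    Once the two composite steps exist, the rewriting operation itself is only a few lines, since it is written in terms of who sent to whom rather than in terms of message vertices.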

    This is a trivial example relative to the range of possibilities that Gremlin enables. Many more complex operations can be easily expressed with Gremlin's syntax. The following is a recent screencast Marko did on Gremlin that provides a better sense of its capabilities through a series of examples.

    Gremlin has recently migrated to Groovy as its host language. The opportunities available now are huge. I'm looking forward to exploring the art of the possible with Groovy Gremlin in the future and sharing more results here. As we begin to discuss complete technology stacks that support EDA, I'll share more thoughts, and hopefully more examples, on ways the composition of Gremlin with other capabilities provides utility. At the moment, I'm only limited by my lack of Groovy skills, not a lack of ideas to explore. 

    Posted by Chris Diehl at 7:00 PM in Exploratory Data Analysis, Graph Mining

    01/26/2011

    Enron Email Data and Manager-Subordinate Relationship Metadata

    Today I have released a GraphML representation of the Enron email data and manager-subordinate relationship metadata derived from a document released during the Enron investigation. The dataset has been posted to InfoChimps and is available here. Associated documentation for the dataset has been posted here. This data formed the basis for the original social relationship identification experiments my colleagues and I conducted in 2007. Please feel free to get in touch with questions or comments. I hope this serves as a useful resource for others. 

    Posted by Chris Diehl at 11:43 AM in Datasets, Graph Mining, Social Relationship Identification

    Copyright © Chris Diehl 2008-2013 Designed By Délicat Designs