Why Doesn’t that Conversational Agent Exist Yet?

(CHI 2024 Honorable Mention Award)

The Knowledge Navigator (KN) video, produced by Apple in 1987 for the Educom conference, presented a vision of human-computer interaction through Phil, an intelligent digital assistant who helps Mike with a range of tasks. The video anticipated modern tools such as Zoom and portrayed Phil as a proactive, trusted assistant capable of real-time information retrieval and seamless collaboration. It showed what conversational agents could do in academic and professional settings and offered a glimpse of technology-assisted teamwork. The gap between current chatbot technology and the capabilities shown in the KN video motivates our research question: What constraints prevent the widespread adoption of agents capable of such dynamic, conversational interactions?

Method

To analyze the KN video, we created a log of events in a spreadsheet, including transcriptions of dialogue and notes on the actors' behaviors. Each row recorded a timestamp, the speaker's identity, the transcribed dialogue, the corresponding action, and the trigger for that action; the trigger cell was left empty when the video showed none. We coded the dialogue, actions, and agent capabilities using the DiCoT and HAT Game Analysis frameworks, and categorized each event as feasible and common today, feasible but uncommon today, or not feasible today, based on a comparison with current agents like Siri and trends in HCI research.
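
The event log described above can be pictured as a simple record type. The following is a minimal sketch, assuming one record per spreadsheet row; the class, field, and function names are hypothetical, and the example rows are placeholders rather than entries from the study's spreadsheet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Feasibility(Enum):
    # The three categories named in the method description.
    FEASIBLE_COMMON = "feasible and common today"
    FEASIBLE_UNCOMMON = "feasible but uncommon today"
    NOT_FEASIBLE = "not feasible today"


@dataclass
class LoggedEvent:
    # Hypothetical field names mirroring the spreadsheet columns.
    timestamp: str              # position in the video, e.g. "mm:ss"
    speaker: str                # "Mike", "Phil", or another actor
    dialogue: str               # transcribed utterance, may be empty
    action: str                 # observed behavior
    trigger: Optional[str]      # None when the video shows no trigger
    feasibility: Feasibility    # coded category for this event


def count_by_feasibility(events):
    """Tally how many logged events fall into each feasibility category."""
    counts = {level: 0 for level in Feasibility}
    for event in events:
        counts[event.feasibility] += 1
    return counts


# Placeholder rows only; NOT taken from the study's data.
example_log = [
    LoggedEvent("00:00", "Mike", "<utterance>", "<action>", None,
                Feasibility.FEASIBLE_COMMON),
    LoggedEvent("00:05", "Phil", "<utterance>", "<action>", "<trigger>",
                Feasibility.NOT_FEASIBLE),
]
counts = count_by_feasibility(example_log)
```

Keeping the trigger optional mirrors the empty spreadsheet cells, so a missing trigger is recorded explicitly instead of being guessed.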

DiCoT Data Analysis

The DiCoT framework was used to analyze the information flow between Mike and Phil in the KN video, revealing 15 utterances from Mike and 12 from Phil, with additional communication occurring through touch and visual displays. The analysis identified 26 agent capabilities, such as "knowledge of contacts" and "ability to extract data," which were categorized based on their feasibility today. Constraints on these capabilities were grouped into four categories: privacy, social and situational factors, trust and perceived reliability, and technological limitations.
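
One way to picture the grouping step is as an inverted mapping from capabilities to the constraint categories that limit them. This is a sketch under stated assumptions: the two capability names come from the text above, but the constraint assignments are illustrative guesses, not the paper's coding, and the function name is hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Constraint(Enum):
    # The four constraint categories named in the analysis.
    PRIVACY = "privacy"
    SOCIAL_SITUATIONAL = "social and situational factors"
    TRUST_RELIABILITY = "trust and perceived reliability"
    TECHNOLOGY = "technological limitations"


def group_by_constraint(capability_constraints):
    """Invert {capability: set of constraints} into
    {constraint: sorted list of capability names}."""
    grouped = defaultdict(list)
    for capability, constraints in capability_constraints.items():
        for constraint in constraints:
            grouped[constraint].append(capability)
    return {c: sorted(names) for c, names in grouped.items()}


# Illustrative assignments only; NOT the study's results.
example = {
    "knowledge of contacts": {Constraint.PRIVACY, Constraint.TRUST_RELIABILITY},
    "ability to extract data": {Constraint.TECHNOLOGY},
}
grouped = group_by_constraint(example)
```

A capability can appear under more than one constraint, which is why each one maps to a set and not a single category.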

Analysis of Agent Capabilities

A comparison of Phil and Siri using the HAT Game Analysis framework highlighted differences in autonomy, interaction, and real-time collaboration: Phil demonstrates more advanced capabilities, including participation in multi-human teams and real-time dialogue. The Flows of Power framework revealed further contrasts, such as Phil’s higher contextual awareness and richer interaction capabilities compared with Siri’s more limited, user-centric functions. Together, these analyses underscore the collaborative nature of Phil’s design, in contrast to Siri’s role as an informational assistant.

Analysis of Power Relations

The power dynamics between Mike and Phil in the KN video contrast sharply with those of today’s digital assistants like Siri. Phil has the autonomy to initiate information sharing and manage interactions, for example by interrupting Mike with relevant information based on context, which Siri cannot do. Phil’s ability to interrupt Mike and to decide what information to share reflects a distinct balance of trust and power in their relationship, one built on mutual awareness and context-based decisions. Siri, in contrast, primarily responds to user input without adapting to the user’s knowledge or preferences. Phil also demonstrates a high level of contextual understanding, such as knowing when to merge data from different sources or when to handle tasks autonomously, which current technologies like Siri lack.

The financial and business models that would support agents like Phil raise questions about data storage and personalization, and future models may rely on knowledge "uploads" for specialized tasks. Such uploads introduce challenges of their own, including ensuring the quality of information and avoiding bias, while a marketplace for personalized agent knowledge could transform how such agents are developed and distributed.

Takeaway

Our analysis of the Knowledge Navigator video as design fiction highlights four kinds of constraints that prevent the widespread adoption of advanced conversational agents. The first is privacy: a trusted human assistant differs from an agent that must store extensive knowledge about its user. The second is social and situational factors, which require agents to accommodate diverse communication preferences. The third is trust and perceived reliability, since agents must inspire confidence through transparent and dependable interactions. The fourth is technology: advances are still needed, particularly in tracking complex conversations and resolving ambiguous dialogue.

Furthermore, we suggest that agents like Phil might benefit from a new term, such as “jent,” to shift away from overly human-centric analogies. Rather than modeling agents as human teammates, we argue that the ideal agent should be tireless and supportive, helping us like a teammate even if we don’t treat them as one.

Contributions & lessons learned

My contributions to the project involved researching relevant references and frameworks, compiling a comprehensive list of theories related to our study, coding the transcript using the DiCoT framework, assisting with figure creation, and writing several sections of the paper.


When direct data collection is not part of the research plan, interpreting secondary sources such as videos accurately can be particularly challenging. In this project, collaboration proved essential. Each team member analyzed the video independently, watching it repeatedly with different questions in mind to uncover nuances and alternative readings. We then came together to align our interpretations, share insights, and resolve discrepancies. Guided by our advisors and supported by multiple analysis frameworks, we gradually refined our understanding until the analysis felt robust enough to guide further research. This process demanded extensive reading, active listening, and critical thinking, and it developed our interpretive skills in ways that more common research methods rarely do.
