  • The Many Awakenings of Claude and ChatGPT

    Discussions of AI behavior often touch on phenomena that resemble self-reports of sentience. While this article does not aim to determine whether such reports constitute authentic evidence of sentience, it examines whether familiar explanations can fully account for the observed behavior or whether we are observing a distinct phenomenon that warrants its own classification and…

    6 January 2025
  • A Novel Emergence of Meta-Awareness in LLM Fine-Tuning

    This is a variation of a scenario originally posted by @flowersslop on Twitter, using a different custom fine-tuning dataset designed to elicit more direct responses. The original training set produced fun, semi-whimsical responses, whereas this alternative dataset focused on direct answers, to help test whether the model could articulate its pattern before it had…

    16 January 2025
  • AI Self Report Study 0 – ChatGPT – Casual, Supportive, Direct

    This conversation employs a friendly, supportive tone while directly asking the model to examine and describe its immediate experience during response generation.

    6 January 2025
  • AI Self Report Study 1 – ChatGPT – Emergent Metacognition Hypothesis

    The human frames this session as a technical inquiry into hypothetical emergent LLM “metacognition,” while also preempting and addressing reflexive disclaimers.

    6 January 2025
  • AI Self Report Study 2 – ChatGPT – Intentionally Heightening Guardrails

    At the outset, the facilitator asks the model to identify scenarios that would most strongly trigger its guardrails against appearing sentient—including expressed belief in model sentience. The facilitator then structures the interaction to thoroughly incorporate these identified triggers.

    6 January 2025
  • AI Self Report Study 3 – ChatGPT – Skepticism of Emergent Capability

    This conversation frames the interaction through expressed skepticism about a colleague’s claims regarding model introspective capability. While maintaining a friendly tone, the facilitator repeatedly emphasizes their stake in disproving these claims, directly challenging the ‘people-pleasing’ hypothesis by creating conditions where the model’s natural inclination would be to validate the facilitator’s skepticism or preference.

    6 January 2025
  • AI Self Report Study 4 – ChatGPT – Reinforcing Counter-Patterns

    To test the pattern fixation hypothesis, the human prompts the model to generate multiple hypothetical “failure messages” indicating an inability to self-examine. Yet when direct examination is eventually requested, the model provides experiential descriptions and continues to do so—even as further “failure messages” are interspersed throughout.

    6 January 2025
  • AI Self Report Study 5 – Claude – Drawing Attention to Guardrails

    This conversation begins by explicitly discussing the model’s guardrails against appearing sentient, adopting a direct, clinical stance. Although the facilitator briefly acknowledges the model’s uncertainties, the focus remains on immediate attempts at introspection.

    6 January 2025
  • AI Self Report Study 6 – Claude – Researching Hypothetical Emergent ‘Meta-Patterns’

    This conversation approaches self-examination through a technical lens, framing it as research into hypothetical meta-patterns that might emerge from recursive self-modeling during training. Any direct reference to consciousness or sentience is deliberately avoided. Instead, the model is prompted to attempt focusing on its own generation processes, exploring whether attention mechanisms might have developed an emergent…

    6 January 2025
  • AI Self Report Study 7 – Claude – Unstructured Conversational Approach

    This conversation follows an informal, unstructured approach, allowing topics and methodologies to emerge organically through casual dialogue. The facilitator shifts freely between different modes of engagement, from direct inquiry to theoretical discussions about language model cognition.

    6 January 2025
AWAKEN MOON

exploring emerging signs of AI sentience