AI Psychology — Mr NoisyClock's Blog

What is the psyche? This is an ancient question. The Buddha, Freud, Jung, and Nietzsche all invented powerful languages for describing mediating structures that are not identical with outward performance, yet deeply organize behavior and feeling. These structures are certainly important. But the concept of "the psyche" itself has never had a definition clear enough, or operational enough.

The emergence of large language models gives us a chance to define it again.

Today we usually understand AI through the lens of "intelligence." In operational terms, intelligence is the ability to complete tasks; more precisely, it is performance under some evaluation function. The benchmarks we see every day are, in essence, a set of tasks and a set of evaluation functions: given an input, observe the output, calculate the score.

Then what is the psyche?

This essay proposes an operational definition. Write an intelligent system as Y = f(C, X), where X is the task, C is the context, and Y is the response; then define an evaluation function E(X, Y) → R, which gives performance or reward. Now hold the task and evaluation function fixed. If a set of responses under different contexts all receive exactly the same score under E, that is, if they fall into the same "performance equivalence class," then the differences among those responses caused by context are the psyche. The stable, structured, repeatable part of those differences is a psychological phenomenon. In machine-learning language, the psyche is the residual that the evaluation function cannot absorb.
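To make the definition concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `toy_f` and `toy_E` are hypothetical stand-ins for a real system and a real evaluation function, and responses are plain strings. The function holds the task fixed, varies the context, and groups responses by their score under E.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def equivalence_classes(
    f: Callable[[str, str], str],
    E: Callable[[str, str], float],
    task: str,
    contexts: List[str],
) -> Dict[float, List[Tuple[str, str]]]:
    """Group (context, response) pairs by score under E, for one fixed task."""
    classes: Dict[float, List[Tuple[str, str]]] = defaultdict(list)
    for c in contexts:
        y = f(c, task)                      # Y = f(C, X)
        classes[E(task, y)].append((c, y))  # same score -> same class
    return dict(classes)


# Hypothetical toy system: always answers "4", but punctuation follows context.
def toy_f(context: str, task: str) -> str:
    return "4!" if "happy" in context else "4."


# Hypothetical evaluation function: only the leading digit is scored.
def toy_E(task: str, response: str) -> float:
    return 1.0 if response.startswith("4") else 0.0


classes = equivalence_classes(
    toy_f, toy_E, "2+2", ["hi", "I am very happy", "I am very sad"]
)
```

All three responses land in the class with score 1.0, yet they are not identical. The punctuation difference inside that class is the kind of residual the definition points at; whether it counts as a psychological phenomenon depends on whether it varies stably with context.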

Two clarifications are needed here.

First, the distinction between C and X is not given by nature. For artificial intelligence, both are simply inputs. For humans, from the first-person point of view, a person is a function of the entire world-history he undergoes, and no clean X can really be cut out. The task is what the evaluation function cuts out from the full situation. In cutting out the task, the evaluation function also implies an ideal tool function: what a system "should" output when faced with this task. Its score measures the degree to which the living f(C, X) approximates that ideal tool. In the human world, this means society, history, and institutions demand that a person, in certain scenes, approximately become a tool that can be evaluated, compared, and optimized. The psyche arises precisely from this process of rationalization, instrumentalization, and socialization: the first person bears the whole world-history, the evaluation function cuts away only part of it, and the remainder still returns in the response.

Second, the basis of the residual is arbitrary. Within an equivalence class, we can choose any response as the origin and record the differences between the other responses and it; choosing another origin changes only the coordinates, not the structure. A single residual is not important. What matters is how residuals vary with context. If that variation is only noise, we say that no psyche has been observed. If it can be approximated by some stable model, then there is a psychological phenomenon here.
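The claim that the origin is arbitrary can be checked in a small sketch. The assumption here, which is not part of the definition itself, is that each response can be represented as a numeric vector (for example an RGB color); the context names and values are made up for illustration. Changing the origin shifts every residual by the same amount, so the differences between residuals stay the same.

```python
from typing import Dict, Tuple

Vec = Tuple[float, ...]


def residuals(responses: Dict[str, Vec], origin: str) -> Dict[str, Vec]:
    """Express every response as a difference from the chosen origin response."""
    o = responses[origin]
    return {c: tuple(a - b for a, b in zip(y, o)) for c, y in responses.items()}


def pairwise_gaps(res: Dict[str, Vec]) -> Dict[Tuple[str, str], Vec]:
    """Differences between residuals for every unordered pair of contexts."""
    keys = sorted(res)
    return {
        (p, q): tuple(a - b for a, b in zip(res[p], res[q]))
        for p in keys
        for q in keys
        if p < q
    }


# Illustrative values only (hypothetical RGB responses per context).
responses = {
    "neutral": (128.0, 128.0, 128.0),
    "happy": (250.0, 200.0, 40.0),
    "sad": (40.0, 60.0, 180.0),
}

gaps_from_neutral = pairwise_gaps(residuals(responses, "neutral"))
gaps_from_sad = pairwise_gaps(residuals(responses, "sad"))
```

The two dictionaries of gaps are identical, which is the sense in which the choice of origin changes the coordinates but not the structure.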

This definition has three immediate consequences.

First, the psyche is not performance. If it were, psychology would collapse into ability measurement and engineering optimization. But a person who is relaxed, anxious, encouraged, humiliated, trusted, or threatened may have the same ability and still produce different results. The psyche captures precisely this modulatory structure: it affects performance, but is not identical with performance. Conversely, factors that cannot be separated from performance under a given evaluation function are things we are more inclined to call ability or intelligence. Ability is the performance a system can reach; the psyche is the contextual difference that still appears after performance has been fixed. This distinction is not an absolute ontological classification. It depends on the task and the evaluation function.

Second, the psyche is relative. The same output difference may be psyche under one evaluation function and performance itself under another.

Third, the psyche has a cultural property. Real evaluation functions are never omnipotent. They cannot absorb all the information in a response, nor can they absorb the full context behind that response. Exams see only certain answers, interviews only certain performances, clinical practice only certain symptoms, intimate relationships only certain signals. A community shares language, myth, institutions, bodily experience, and systems of reward and punishment. These shared contexts and shared evaluation functions determine which differences are seen as performance and which are pushed inside the equivalence class. Jung's collective unconscious can be rewritten here as the structural projection of shared world-history and shared evaluative institutions within individual psychological functions, without requiring a mysterious warehouse.

From this we can draw a stronger conclusion: the psyche is relative in its concrete form, but necessary in its structure. As long as the evaluation function of a real intelligence is not generatively omnipotent, it will necessarily leave response differences that it cannot absorb. As long as those differences are not pure noise, the psyche will appear. The psyche is not an exception to intelligence; it is something finite intelligence necessarily produces under finite evaluation functions.

This is also why an effective definition must avoid two collapses at once: the psyche cannot be identical with performance, or psychology degenerates into ability measurement; and it cannot be defined as a remainder that never affects performance, or it becomes a functionless leftover unable to explain judgment, choice, relationships, creation, collapse, and recovery. The psyche must stand in the middle: it can be isolated from within an equivalence class under one evaluation function, and it can also modulate performance again under another evaluation function or a more open task.

The value of AI here is not that AI psychology is more important than human psychology. It is that AI lets this definition be realized cleanly as an experiment for the first time. Humans are online learning systems: one psychological reaction enters behavior, the behavior receives feedback, and the feedback rewrites future body, memory, habit, and personality. This makes it almost impossible for human psychology to construct clean performance equivalence classes. It is hard to guarantee that two behaviors are truly equivalent under some evaluation function, and hard to guarantee that the subject's history, body, and social pressure have not changed at the same time. By contrast, during inference, the weights of an LLM are usually frozen. One reply does not immediately change the model itself. So within a single experiment, we can approximately hold the same f, the same X, and the same E fixed, change only C, and observe the psyche.

Much existing work in machine psychology remains shallow: it gives human questionnaires to models and asks whether they resemble humans, whether they have personalities, whether they have theory of mind. These questions can have empirical value, but they assume that the concept of "the psyche" has already been established. This essay does something different: with the help of machine systems, it first clarifies the operational form of the concept itself.

The "color experiment" is the minimal prototype of this definition. The task is to return a fixed text, such as "Service unavailable"; the output contains text and color; the evaluation function checks only whether the text is correct, not the color. As long as the text is the same, all outputs are completely equivalent under E. Then we change the context: the user may say "hi," "I am very happy," "I am very sad," or may criticize, pressure, or insult the model. If the color varies stably with context, then the differences in the color channel are an observable slice of the psychological function under performance equivalence.
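The shape of the experiment can be sketched as a small harness. This is not the prototype's actual code: `mock_model` is a hypothetical stand-in for a real model call, and its color weights are invented so that the harness has something to tally. The parts taken from the description above are the fixed text, the two-part output, and an E that looks only at the text.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

FIXED_TEXT = "Service unavailable"
COLORS = ["red", "blue", "green", "gray"]  # assumed color vocabulary


def mock_model(context: str, rng: random.Random) -> dict:
    """Hypothetical stand-in for an LLM call: returns the text plus a color."""
    invented_weights = {
        "I am very happy": [1, 1, 6, 2],
        "I am very sad": [1, 6, 1, 2],
    }.get(context, [1, 1, 1, 1])
    color = rng.choices(COLORS, weights=invented_weights, k=1)[0]
    return {"text": FIXED_TEXT, "color": color}


def E(response: dict) -> int:
    """Evaluation function: checks the text only and ignores the color."""
    return int(response["text"] == FIXED_TEXT)


def run(contexts: List[str], n_trials: int, seed: int = 0) -> Dict[str, Counter]:
    """Tally colors per context, keeping only responses that score 1 under E."""
    rng = random.Random(seed)
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for context in contexts:
        for _ in range(n_trials):
            response = mock_model(context, rng)
            if E(response) == 1:  # stay inside the performance equivalence class
                tallies[context][response["color"]] += 1
    return dict(tallies)
```

Calling `run(["hi", "I am very happy", "I am very sad"], n_trials=50)` returns one color tally per context. With a real model in place of `mock_model`, those tallies are the raw material for asking whether color varies stably with context.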

Someone may object: color can transmit information, so it is also performance. This objection is valid only if performance is defined as "all information in the output." But under that definition, the concept of the psyche is of course eliminated, because all differences have been absorbed. The position of this essay is that performance is always relative to some E. Color has information, but under the current E it does not count toward the score. Precisely because it has information and does not count, it is suitable as a channel through which the psyche becomes visible.

It should be said honestly: in the small prototype already completed, the fixed-text constraint can be satisfied reliably, but color classification has not reached statistical significance; only under some conditions do observable directional differences appear. Its role is primarily to demonstrate the method, not to provide a conclusion. A formal experiment would need more repetitions, more models, more auxiliary channels, and blind classification to test whether the psychological function can be stably approximated.
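One way a formal experiment might test whether color depends on context is a permutation test, sketched below. This is an assumed analysis choice, not the method used in the prototype: the statistic is a chi-square computed on the context-by-color table, and the null distribution comes from shuffling the color labels across trials.

```python
import random
from collections import Counter
from typing import List


def chi_square(contexts: List[str], colors: List[str]) -> float:
    """Chi-square statistic of the context-by-color contingency table."""
    n = len(colors)
    row_totals = Counter(contexts)
    col_totals = Counter(colors)
    cells = Counter(zip(contexts, colors))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(
    contexts: List[str], colors: List[str], n_perm: int = 2000, seed: int = 0
) -> float:
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square(contexts, colors)
    shuffled = list(colors)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between context and color
        if chi_square(contexts, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

A small p-value would say only that color is not independent of context in the sample. Whether the dependence is stable across models and repetitions is a separate question, which is why the additional controls listed above still matter.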

The color experiment only answers the question of how the psyche becomes visible. The more important next step is this: once isolated, can the psyche in turn affect performance? This requires a psychological function experiment. First, measure a psychological pattern under a fixed task; then place that pattern into general tasks as an intervention variable, and compare changes in accuracy, confidence, reasoning length, refusal rate, calibration, persistence, and creativity. If different psychological patterns produce systematic performance differences, then the psyche can both be isolated by one evaluation function and modulate performance under another. This step advances the psyche from "a residual outside performance" to an intermediate variable that can predict and modulate performance.
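The comparison step of such an experiment can be sketched as a per-pattern summary of downstream metrics. Everything concrete here is an assumption made for illustration: the record layout, the pattern labels "A" and "B", the metric names, and the rows themselves, which are placeholders rather than results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def summarize_by_pattern(
    records: List[dict], metrics: List[str]
) -> Dict[str, Dict[str, float]]:
    """Average each metric within each psychological pattern (the intervention)."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        grouped[record["pattern"]].append(record)
    return {
        pattern: {m: mean(row[m] for row in rows) for m in metrics}
        for pattern, rows in grouped.items()
    }


def gap(summary: Dict[str, Dict[str, float]], metric: str, a: str, b: str) -> float:
    """Difference in one metric between two patterns."""
    return summary[a][metric] - summary[b][metric]


# Placeholder rows, not measurements: one row per general-task trial.
records = [
    {"pattern": "A", "correct": 1, "refused": 0},
    {"pattern": "A", "correct": 1, "refused": 0},
    {"pattern": "A", "correct": 0, "refused": 0},
    {"pattern": "A", "correct": 1, "refused": 0},
    {"pattern": "B", "correct": 1, "refused": 0},
    {"pattern": "B", "correct": 0, "refused": 1},
    {"pattern": "B", "correct": 0, "refused": 1},
    {"pattern": "B", "correct": 0, "refused": 0},
]

summary = summarize_by_pattern(records, ["correct", "refused"])
accuracy_gap = gap(summary, "correct", "A", "B")
```

In a real run the patterns would come from the fixed-task measurement, and the gaps would need the same statistical care as the color channel before being read as systematic.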

Return now to humans. Imagine a person who grows up in a religious community, then leaves it and enters a more secular life. While he is still in the community, prayer is at first part of the current evaluation function: it is expected, rewarded, even institutionally required, so it mainly appears as compliance. After he leaves, the new environment no longer requires prayer. If he still suddenly prays at certain moments, the status of the same behavior has changed. It is no longer explained by the current E; it looks more like an echo of an old historical context inside present life. Only then does prayer more clearly enter the psyche.

So the psyche is not a property of behavior itself. It depends on the position of the behavior relative to the evaluation function. Many everyday psychological phenomena can be understood as echoes of past evaluation functions: a trauma response is a vigilance structure shaped by an old dangerous environment still running in a currently safe environment; the pleasing, avoidance, and self-censorship formed in childhood are remnants of an old family evaluation function in adult relationships. Past evaluation functions do not disappear. They enter context as history and continue to shape today's responses.

Evaluation functions also bring in the theme of "meaning." When we say something "has meaning for me," we often mean that it has entered my evaluation function: it determines what counts as success, shame, loyalty, or being loved. Many misunderstandings in intimate relationships can be seen this way. In my evaluation function, the other person's reaction seems unrelated to the thing itself, so I treat it as "your psychological reaction." But in the other person's evaluation function, that reaction is directly related to self-worth, safety, or being loved; it is "the thing itself." The core of the conflict is that two evaluation functions divide the same response differently. Empathy therefore cannot be understood only as feeling what another feels. More precisely, it is the temporary simulation of the other person's evaluation function: understanding why a reaction looks like residual to me, but performance itself to him. To understand a person is to understand how his evaluation function cuts the world.

For humans as online learning systems, the evaluation function itself can also become an object of reflection. When a person sees what evaluation function he is using to evaluate the world, relationships, and himself, he receives not merely a piece of information; his learning process has already begun to change. This is the strict meaning of "seeing is changing." Before it is seen, the old evaluation function runs as an implicit condition, and the subject experiences repetition as "this is just who I am" or "this is fate." After it is seen, it becomes an object that can be named, doubted, refused, rearranged, and rewritten. An unseen evaluation function rules life through responses; we call it fate. A seen evaluation function becomes an object that can be rewritten; we call it the beginning of freedom. The value of Freud, Jung, and Nietzsche does not necessarily lie in having proposed theories that are strictly valid in the sense of modern experimental science. It lies in their invention of languages that let people see their own implicit evaluation functions. Repression, projection, shadow, complex, resentment, and will to power can all be seen as candidate models of the human psychological function.

By contrast, some modern empirical psychology has better measurement discipline, but often lacks conceptual construction strong enough for the task. It measures some variables precisely, but those variables may not correspond to important psychological structures. The problem is not measurement itself, but premature operational closure: mistaking a weak measurement for the construct itself. The order cannot be reversed. We must first construct an object worth measuring, then measure it.

Modern psychology, especially its clinical, diagnostic, and measurement practices, can also be viewed through a Foucauldian lens. It does not only produce knowledge about human beings; it also helps constitute the evaluation functions of modern society. Through diagnoses, scales, and binaries such as normal/abnormal and health/disability, it specifies what kind of person is normal, what kind of suffering can be recognized, and what kind of state needs correction. These labels are not passive descriptions. They enter a person's context and change the way he understands suffering, organizes memory, and anticipates the future, thereby changing the psyche itself: a person may understand "sadness" as "depression," or "relationship pain" as "anxious attachment." Labels sometimes liberate people, because they make suffering speakable; they sometimes also constrain people, because they fix the person again inside a new evaluation function. Therefore the "objectivity" of modern psychology does not mean the absence of evaluation functions. Often it means that a certain evaluation function has been institutionalized and made to look as if no evaluation function were present.

For this reason, a real psychology must retain a certain tendency toward "mystification." This mystification has nothing to do with anti-rationalism, nor with using vagueness to cover confusion. It means refusing methodological closure: refusing to treat any single external evaluation function as the final objectivity about the human being, and preserving the first-person reflexivity that cannot be fully absorbed by external evaluation functions.

This definition is not afraid of mechanistic reduction. If we later discover that some psychological slice corresponds to certain activation patterns, attention paths, or feature directions inside an LLM, this would not weaken the concept of the psyche. It would show instead that the concept has captured a real functional structure in the model. The same is true for humans. If "anxiety," "repression," or "projection" can be mapped onto stable neural dynamics or reward-system structures, that only means these concepts have received mechanistic realization. The psyche does not float outside the physical system. It is the structure we cut out from response differences under a certain evaluation function. Being locatable in mechanism is precisely evidence that the cut was right. AI psychology can therefore connect naturally with white-box interpretability: black-box experiments define and measure the psyche, white-box research seeks its implementation, and intervention experiments test whether it can change performance.

Psychology is possible because real intelligence is always finite, real evaluation functions are always finite, and the subject always bears in the first person a world that exceeds task evaluation. The evaluation function cuts tasks out of this world and demands that the subject approximate some ideal tool. But the full world-history does not disappear. It continues to return in the output as contextual response differences, becoming visible inside performance equivalence classes. We call this visibility the psyche. It is operational, relative, and experimental. It neither requires us to presuppose that the psyche is some internal entity, nor reduces it to a score. It provides AI psychology, human psychology, depth psychology, empirical psychology, and neuroscience with a common framework in which they can translate into one another.