The NLP paper “Offline RL for Natural Language Generation with Implicit Language Q Learning” presents ILQL, an offline reinforcement learning method for language generation. Reading it, I kept noticing analogies to quantum mechanics.
Quantum Concepts in ILQL
The paper’s approach to NLP through ILQL has some striking conceptual parallels with quantum physics, particularly in how systems evolve and are measured.

POMDP as Quantum Measurement:
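 In ILQL, the task is modeled as a POMDP (partially observable Markov decision process): the agent sees only observations, never the full underlying state. This is similar to how a quantum measurement yields limited, probabilistic information about a quantum state. The resemblance to the uncertainty principle is loose, though: partial observability in a POMDP is ordinary missing information, not a fundamental limit on what can be known.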
 Quantum Measurement

Bellman Equation and Schrödinger Equation:
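 The Bellman optimality equation, which ILQL approximates in order to learn its value functions, shares conceptual similarities with the Schrödinger equation. Both describe how a quantity propagates: the former ties the value of a state to the values of its successor states, and the latter governs how a wave function evolves over time. One difference is worth keeping in mind: the Bellman equation is a recursive fixed-point relation, while the Schrödinger equation is a differential equation.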
 Schrödinger Equation
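To make the "value propagates backward from successor states" idea concrete, here is a minimal tabular sketch of repeated Bellman optimality backups. The two-state chain, the action names, and the discount factor are all made up for illustration; ILQL itself works with neural value functions over token sequences, not a lookup table.

```python
GAMMA = 0.9          # discount factor (illustrative choice)
TERMINAL = 2         # reaching state 2 ends the episode
ACTIONS = ("stay", "advance")

# Hypothetical toy MDP: TRANSITIONS[(state, action)] = (next_state, reward)
TRANSITIONS = {
    (0, "stay"): (0, 0.0),
    (0, "advance"): (1, 0.0),
    (1, "stay"): (1, 0.0),
    (1, "advance"): (2, 1.0),
}


def bellman_backup(q):
    """One sweep of Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    new_q = {}
    for (state, action), (next_state, reward) in TRANSITIONS.items():
        if next_state == TERMINAL:
            future = 0.0  # nothing to collect after the episode ends
        else:
            future = max(q[(next_state, a)] for a in ACTIONS)
        new_q[(state, action)] = reward + GAMMA * future
    return new_q


def solve(iterations=100):
    """Apply the backup repeatedly, starting from all-zero Q-values."""
    q = {key: 0.0 for key in TRANSITIONS}
    for _ in range(iterations):
        q = bellman_backup(q)
    return q


if __name__ == "__main__":
    for key, value in sorted(solve().items()):
        print(key, round(value, 3))
```

The reward at the end of the chain leaks backward one step per sweep, discounted by GAMMA each time, which is the sense in which the Bellman equation "evolves" a value function.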

Expectile Regression and Quantum Superposition:
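 ILQL uses expectile regression to approximate the Bellman optimality backup: instead of taking a hard maximum over actions, it fits an upper expectile of the Q-values for actions that appear in the dataset. I find this akin to quantum superposition, in that several potential outcomes are weighed together rather than one being singled out.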
 Quantum Superposition
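Here is a small sketch of what an expectile is, assuming the usual asymmetric squared loss with weight tau on positive residuals and (1 - tau) on negative ones. The sample Q-values are invented, and the reweighted-averaging solver is just one convenient way to find the minimizer for a handful of numbers; it is not how ILQL trains its networks.

```python
def expectile_loss(residuals, tau):
    """Mean asymmetric squared loss: weight tau if residual >= 0, else (1 - tau)."""
    total = 0.0
    for u in residuals:
        weight = tau if u >= 0 else 1.0 - tau
        total += weight * u * u
    return total / len(residuals)


def expectile(samples, tau, iterations=100):
    """Value m minimizing expectile_loss over (x - m), via reweighted averaging."""
    m = sum(samples) / len(samples)  # with tau = 0.5 this mean is already the answer
    for _ in range(iterations):
        weights = [tau if x > m else 1.0 - tau for x in samples]
        m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return m


if __name__ == "__main__":
    q_values = [0.0, 1.0, 2.0, 10.0]  # made-up Q-values for actions seen in a dataset
    for tau in (0.5, 0.9, 0.99):
        print(tau, round(expectile(q_values, tau), 3))
```

At tau = 0.5 the expectile is the plain mean, and as tau moves toward 1 it slides toward the largest sample. That is the sense in which expectile regression gives a soft stand-in for the max in the Bellman backup while only looking at actions the dataset actually contains.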

Q-Function as Probability Amplitude:
 The Q-function, which predicts the expected return of taking an action in a given state, is analogous to a probability amplitude, which determines how likely each outcome of a quantum measurement is.
 Probability Amplitude
Hilbert Space and Decision-Making in ILQL
Some of the decision-making machinery made me think of exploring a Hilbert space.

Decision Space as Hilbert Space: The decision-making process in ILQL can be thought of as navigating a Hilbert space. In quantum mechanics, a Hilbert space is a complete vector space with a well-defined inner product, and its vectors represent all possible states of a quantum system. Similarly, in ILQL, the decision space encompasses all possible states and actions, and each policy choice is a point in this abstract space.

State Superposition and Probabilistic Decisions: Just as a quantum state in Hilbert space can be a superposition of basis states, a policy in ILQL is a probability distribution over actions rather than a single fixed choice. This mirrors the superposition principle in quantum mechanics, where a system is described by a combination of states until it is measured. The analogy only goes so far, since a policy mixes ordinary probabilities while a superposition combines complex amplitudes.

Wave Function and Value Function Analogy: The wave function in quantum mechanics, which provides a probability amplitude for different states in the Hilbert space, can be likened to the value functions in ILQL. The state value function assigns an expected return to each state, and the Q-function does the same for each state-action pair; together they guide the policy towards more favorable outcomes.
Offline RL and Static Dataset Analysis
Learning from a static dataset in offline RL can be compared to analyzing historical quantum data. In both cases no new interaction with the system is possible, so its behavior has to be inferred from past recorded results.
Policy Extraction and Value Function

Policy Extraction as Quantum State Collapse:
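 In ILQL, the policy is extracted by using the learned value functions to reweight the language model's distribution over next tokens. Sampling a single token from that distribution can be likened to the collapse of a quantum state during measurement: many possibilities go in, one definite outcome comes out.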
 Quantum State Collapse
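As a rough sketch of this extraction step, the snippet below shifts a language model's logits by beta times the advantage Q(s, a) - V(s) and renormalizes, which makes the resulting policy proportional to the original token probability times exp(beta * (Q - V)). The function names, the three-token vocabulary, and all numbers are hypothetical, and real implementations operate on tensors over a full vocabulary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perturbed_policy(lm_logits, q_values, state_value, beta):
    """Token distribution proportional to pi_lm(a|s) * exp(beta * (Q(s, a) - V(s)))."""
    adjusted = [
        logit + beta * (q - state_value)
        for logit, q in zip(lm_logits, q_values)
    ]
    return softmax(adjusted)


if __name__ == "__main__":
    lm_logits = [2.0, 1.0, 0.0]   # made-up language model logits for 3 tokens
    q_values = [0.1, 0.9, 0.2]    # made-up Q(s, a) for each candidate token
    state_value = 0.4             # made-up V(s)
    print(perturbed_policy(lm_logits, q_values, state_value, beta=0.0))
    print(perturbed_policy(lm_logits, q_values, state_value, beta=4.0))
```

With beta = 0 the language model's own distribution comes back unchanged; raising beta shifts probability toward tokens whose Q-value exceeds the state value. Drawing one token from the result is the "collapse" in the analogy.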

Value Function as Wave Function:
 The RL value function, which steers the policy toward high-return actions, is similar to a quantum wave function, which determines the probabilities of measurement outcomes.
 Wave Function
Temporal Compositionality and Quantum Entanglement
ILQL’s emphasis on the interdependence of past, present, and future states in decision-making can be related to the phenomenon of quantum entanglement, where particles remain correlated regardless of the distance between them.
Conclusion
This paper is cool. It does require some pretty deep math and ML knowledge, but I think once you have that foundation it can be rewarding to draw parallels to other fields. Such interdisciplinary connections can open new avenues for innovation and understanding in both fields.
Read the full paper here for a comprehensive insight: Offline RL with ILQL.