
PHYS 5120 - Computational Energy Materials and Electronic Structure Simulation, Lecture 4

Lecturer: Prof. PAN DING

1 Monte Carlo (MC) Method:

  • Content:

This whiteboard provides a concise but detailed overview of two important and related simulation techniques in computational physics and chemistry: the Metropolis Monte Carlo (MC) method and Hamiltonian (or Hybrid) Monte Carlo (HMC). Here is a detailed breakdown of the concepts presented.

1. Metropolis Monte Carlo (MC) Method

The heading “Metropolis MC method” introduces a foundational algorithm in statistical mechanics. Metropolis Monte Carlo is a method used to generate a sequence of states for a system, allowing for the calculation of average properties. 左上角的这一部分介绍了基础的 Metropolis Monte Carlo 算法。它是一种生成状态序列的方法,使得处于任何状态的概率都符合期望的概率分布(在物理学中通常是玻尔兹曼分布)。

  • Conceptual Diagram: The small box with numbered sites (0-5) and an arrow showing a move from state 0 to 2, and then to 3, illustrates a “random walk.” In Metropolis MC, the system transitions from one state to another by making small, random changes. 小方框中标有编号的位点(0-5),箭头表示从状态 0 到状态 2,再到状态 3 的移动,代表“随机游走”。在 Metropolis MC 中,系统通过进行微小的随机变化从一个状态过渡到另一个状态。
  • Random Number Generation: The notation rand \(t \in (0,1)\) indicates the use of a random number \(t\) drawn from a uniform distribution between 0 and 1. This is a core component of the algorithm, used to decide whether to accept or reject a proposed new state. 符号 rand \(t \in (0,1)\) 表示使用从 0 到 1 之间的均匀分布中抽取的随机数 \(t\)。这是算法的核心部分,用于决定是否接受或拒绝提议的新状态。
  • Detailed Balance Condition: The equation \(P_o T(o \to n) = P_n T(n \to o)\) is the principle of detailed balance. It states that in a system at equilibrium, the probability of being in an old state (\(o\)) and transitioning to a new state (\(n\)) is equal to the probability of being in the new state and transitioning back to the old one. This condition is crucial because it ensures that the simulation will eventually sample states according to their correct thermodynamic probabilities (the Boltzmann distribution). 方程 \(P_o T(o \to n) = P_n T(n \to o)\) 是详细平衡的原理。它指出,在平衡系统中,处于旧状态 (\(o\)) 并转变为新状态 (\(n\)) 的概率等于处于新状态并转变回旧状态的概率。此条件至关​​重要,因为它确保模拟最终将根据正确的热力学概率(玻尔兹曼分布)对状态进行采样。
  • Acceptance Rate: The note \(\sim 30\%\)? likely refers to the target acceptance rate for an efficient Metropolis MC simulation. If new states are accepted too often or too rarely, the exploration of the system’s possible configurations is inefficient. While the famous optimal acceptance rate for certain high-dimensional random-walk problems is around 23.4%, a range of 20-50% is often considered effective (a minimal sketch follows below). 注释“\(\sim 30\%\)?”指的是高效 Metropolis 蒙特卡罗模拟的目标接受率。如果新状态接受过于频繁或过于稀少,系统对可能配置的探索就会变得低效。虽然某些高维问题的最佳接受率约为 23.4%,但通常认为 20-50% 的范围是有效的。
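As a concrete illustration of these ingredients (random trial moves, the Boltzmann acceptance rule, and monitoring the acceptance rate), here is a minimal Metropolis MC sketch in Python for a single particle in a 1D harmonic potential. The potential, step size, and temperature are illustrative choices, not taken from the whiteboard.

```python
import numpy as np

def metropolis_mc(n_steps=20000, step=0.5, beta=1.0, seed=0):
    """Minimal Metropolis MC for a toy 1D harmonic potential V(x) = x^2 / 2."""
    rng = np.random.default_rng(seed)
    V = lambda x: 0.5 * x**2                        # potential energy of a state
    x, samples, accepted = 0.0, [], 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)        # random-walk trial move
        dE = V(x_new) - V(x)
        # Metropolis rule: accept with probability min(1, exp(-beta * dE))
        if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
            x, accepted = x_new, accepted + 1
        samples.append(x)                           # rejected moves repeat the old state
    return np.array(samples), accepted / n_steps

samples, acc_rate = metropolis_mc()
print(f"acceptance rate ~ {acc_rate:.2f}, <x^2> ~ {np.mean(samples**2):.2f}")  # expect <x^2> ~ 1/beta
```

Tuning `step` changes the acceptance rate: tiny steps accept almost everything but explore slowly, while very large steps are mostly rejected, which is why an intermediate acceptance rate (the 20-50% range mentioned above) is usually targeted.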

2. Hamiltonian / Hybrid Monte Carlo (HMC)

The second topic, “Hamiltonian/Hybrid MC (HMC),” is a more advanced Monte Carlo method that uses principles from classical mechanics to propose new states more intelligently than the simple random-walk approach of the standard Metropolis method. This often leads to a much higher acceptance rate and more efficient exploration of the state space. 第二个主题“哈密顿/混合蒙特卡罗 (HMC)”是一种更先进的蒙特卡罗方法,它利用经典力学原理,比标准 Metropolis 方法中简单的随机游走方法更智能地提出新状态。这通常会带来更高的接受率和更高效的状态空间探索。

The whiteboard outlines a four-step HMC algorithm:

Step 1: Randomize Velocities The first step is to randomize the velocities: \(\vec{v}_i \sim \mathcal{N}(0, k_B T)\). 第一步是随机化速度:\(\vec{v}_i \sim \mathcal{N}(0, k_B T)\)。 * This step introduces momentum into the system. For each particle \(i\), a velocity vector \(\vec{v}_i\) is randomly drawn from a normal (Gaussian) distribution with a mean of 0 and a variance related to the temperature \(T\) and the Boltzmann constant \(k_B\). 此步骤将动量引入系统。对于每个粒子 \(i\),速度矢量 \(\vec{v}_i\) 会随机地从正态(高斯)分布中抽取,该分布的均值为 0,方差与温度 \(T\) 和玻尔兹曼常数 \(k_B\) 相关。 * The full formula for this probability distribution, \(f(\vec{v})\), is the Maxwell-Boltzmann distribution, which is written out further down the board. 该概率分布的完整公式 \(f(\vec{v})\) 是麦克斯韦-玻尔兹曼分布,写在白板的下方。

Step 2: Molecular Dynamics (MD) Integration The board notes this as \(t = 0 \to h\) (or \(mh\)) MD and mentions the Verlet algorithm.

  • This is the “Hamiltonian dynamics” part of the algorithm. Starting from the current positions and the newly randomized velocities, the system’s trajectory is calculated for a short period of time (\(h\) or \(mh\)) using Molecular Dynamics (MD). 这是算法的“哈密顿动力学”部分。从当前位置和新随机化的速度开始,使用分子动力学 (MD) 计算系统在短时间内(\(h\)\(mh\))的轨迹。
  • The name Verlet refers to the Verlet integration algorithm, a numerical method used to solve Newton’s equations of motion. It is popular in MD simulations because it is time-reversible and conserves energy well over long simulations. 指的是 Verlet 积分算法,这是一种用于求解牛顿运动方程的数值方法。它在 MD 模拟中很受欢迎,因为它具有时间可逆性,并且在长时间模拟中能量守恒效果良好。

Step 3: Calculate Total Energy The third step is to calculate total energy: \(E_n = K_n + V_n\). 第三步是“计算总能量”:\(E_n = K_n + V_n\)。 * After the MD trajectory, the system is in a new state \(n\). The total energy of this new state, \(E_n\), is calculated as the sum of its kinetic energy (\(K_n\), from the velocities) and its potential energy (\(V_n\), from the positions). MD 轨迹之后,系统处于新状态 \(n\)。新状态的总能量 \(E_n\) 等于其动能 (\(K_n\),由速度计算得出)和势能 (\(V_n\),由位置计算得出)之和。

Step 4: Acceptance Test The final step is the acceptance criterion: \(\text{acc}(o \to n) = \min(1, e^{-\beta(E_n - E_o)})\). 最后一步是验收标准:\(\text{acc}(o \to n) = \min(1, e^{-\beta(E_n - E_o)})\)。 * This is the Metropolis acceptance criterion. The algorithm decides whether to accept the new state \(n\) or reject it and stay in the old state \(o\). 这是 Metropolis 验收标准。算法决定是接受新状态 \(n\) 还是拒绝它并保持旧状态 \(o\)。 * The probability of acceptance depends on the change in total energy (\(E_n - E_o\)). If the new energy is lower, the move is always accepted. If the new energy is higher, it might still be accepted with a probability \(e^{-\beta(E_n - E_o)}\), where \(\beta = 1/(k_B T)\). This allows the system to escape from local energy minima. 验收概率取决于总能量的变化 (\(E_n - E_o\))。如果新能量较低,则始终接受该移动。如果新的能量更高,它仍然可能以概率 \(e^{-\beta(E_n - E_o)}\) 被接受,其中 \(\beta = 1/(k_B T)\)。这使得系统能够摆脱局部能量最小值。
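Putting the four steps together, the sketch below is a minimal illustration of one HMC update, assuming a 1D harmonic potential, unit mass, and a simple velocity-Verlet integrator; it is not the lecture's reference implementation.

```python
import numpy as np

def hmc_step(x, beta, dt=0.1, n_md=20, rng=None):
    """One Hamiltonian/Hybrid MC update for the toy potential V(x) = x^2/2, unit mass."""
    rng = np.random.default_rng() if rng is None else rng
    V = lambda q: 0.5 * q**2
    F = lambda q: -q                                   # force = -dV/dx
    # Step 1: draw a velocity from the Maxwell-Boltzmann distribution (variance k_B T = 1/beta)
    v = rng.normal(0.0, np.sqrt(1.0 / beta))
    E_old = 0.5 * v**2 + V(x)
    # Step 2: short MD trajectory with velocity Verlet
    q = x
    for _ in range(n_md):
        v += 0.5 * dt * F(q)
        q += dt * v
        v += 0.5 * dt * F(q)
    # Step 3: total energy of the new state
    E_new = 0.5 * v**2 + V(q)
    # Step 4: Metropolis acceptance on the change in total energy
    if rng.random() < min(1.0, np.exp(-beta * (E_new - E_old))):
        return q, True
    return x, False

# usage: run a short chain at beta = 1
x, chain = 0.0, []
for _ in range(1000):
    x, _ = hmc_step(x, beta=1.0)
    chain.append(x)
```

Because the Verlet integrator nearly conserves the total energy, \(E_n - E_o\) stays small and most trajectories are accepted, which is exactly the point of the \(E_n \approx E_o\) note discussed below.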

Key Formulas and Notations

  • Maxwell-Boltzmann Distribution 麦克斯韦-玻尔兹曼分布: The formula for the velocity distribution is given as: \(f(\vec{v}) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} \exp\left(-\frac{m v^2}{2 k_B T}\right)\) This gives the probability density for a particle of mass \(m\) to have a velocity \(\vec{v}\) at a given temperature \(T\). 这给出了在给定温度 \(T\) 下,质量为 \(m\) 的粒子速度为 \(\vec{v}\) 的概率密度。

  • Energy Conservation and Acceptance Rate: The notes \(E_n \approx E_o\) and \(75\%\) highlight a key advantage of HMC. Because the Verlet integrator approximately conserves energy, the final energy \(E_n\) after the MD trajectory is usually very close to the initial energy \(E_o\). This means the term \((E_n - E_o)\) is small, and the acceptance probability is high. The \(75\%\) indicates a typical or target acceptance rate for HMC, which is significantly higher than for standard Metropolis MC. 注释 \(E_n \approx E_o\)\(75\%\) 凸显了 HMC 的一个关键优势。由于 Verlet 积分器近似地守恒能量,MD 轨迹后的最终能量 \(E_n\) 通常非常接近初始能量 \(E_o\)。这意味着 \((E_n - E_o)\) 项很小,接受概率很高。\(75\%\) 表示 HMC 的典型或目标接受率,明显高于标准 Metropolis MC。

  • Hamiltonian Operator: The symbol \(\hat{H}\) written on the adjacent board represents the Hamiltonian operator, which gives the total energy of the system. The note Δ Adiabatic suggests that the MD evolution is ideally an adiabatic process (no heat exchange), during which the total energy (the Hamiltonian) is conserved. 相邻板上的符号 \(\hat{H}\) 代表哈密顿算符,它给出了系统的总能量。注释“Δ Adiabatic”表明 MD 演化在理想情况下是一个绝热过程(无热交换),在此过程中总能量(哈密顿量)守恒。

This whiteboard displays the fundamental equation of quantum chemistry: the time-dependent Schrödinger equation, along with the detailed breakdown of the molecular Hamiltonian operator. This equation is the starting point for almost all ab initio (first-principles) quantum mechanical calculations of molecular systems. 这块白板展示了量子化学的基本方程:含时薛定谔方程,以及分子哈密顿算符的详细分解。该方程是几乎所有分子系统从头算(第一性原理)量子力学计算的起点。

3. The Time-Dependent Schrödinger Equation

At the top of the board, the fundamental equation governing the evolution of a quantum mechanical system is presented: 白板顶部显示了控制量子力学系统演化的基本方程: \(i\hbar \frac{\partial \Psi}{\partial t} = \hat{\mathcal{H}} \Psi\)

  • \(\Psi\) (Psi) is the wave function of the system. It contains all the information that can be known about the system (e.g., the positions and momenta of all particles). 是系统的波函数。它包含了关于系统的所有已知信息(例如,所有粒子的位置和动量)。

  • \(\hat{\mathcal{H}}\) is the Hamiltonian operator, which represents the total energy of the system. 是哈密顿算符,表示系统的总能量。

  • \(i\) is the imaginary unit.是虚数单位。

  • \(\hbar\) is the reduced Planck constant.是约化普朗克常数

  • \(\frac{\partial \Psi}{\partial t}\) represents how the wave function changes over time.表示波函数随时间的变化。

This equation states that the time evolution of the quantum state is dictated by the system’s total energy operator, the Hamiltonian. The note “Δ Adiabatic process” likely connects to the context of the Born-Oppenheimer approximation, where the electronic Schrödinger equation is solved for fixed nuclear positions, assuming the electrons adjust adiabatically (instantaneously) to the motion of the nuclei. 该方程表明,量子态的时间演化由系统的总能量算符——哈密顿算符决定。注释“Δ绝热过程”与玻恩-奥本海默近似相关,在该近似中,电子薛定谔方程是针对固定原子核位置求解的,假设电子以绝热方式(瞬时)调整以适应原子核的运动。

4. The Full Molecular Hamiltonian (\(\hat{\mathcal{H}}\))

The main part of the whiteboard is the detailed expression for the non-relativistic, time-independent molecular Hamiltonian. It is the sum of the kinetic and potential energies of all the nuclei and electrons in the system. The equation can be broken down into five distinct terms: 白板的主要部分是非相对论性、时间无关的分子哈密顿量的详细表达式。它是系统中所有原子核和电子的动能和势能之和。

该方程可以分解为五个不同的项:

\(\hat{\mathcal{H}} = -\sum_{I=1}^{P} \frac{\hbar^2}{2M_I}\nabla_I^2 - \sum_{i=1}^{N} \frac{\hbar^2}{2m}\nabla_i^2 + \frac{e^2}{2}\sum_{I=1}^{P}\sum_{J \neq I}^{P} \frac{Z_I Z_J}{|\vec{R}_I - \vec{R}_J|} + \frac{e^2}{2}\sum_{i=1}^{N}\sum_{j \neq i}^{N} \frac{1}{|\vec{r}_i - \vec{r}_j|} - e^2\sum_{I=1}^{P}\sum_{i=1}^{N} \frac{Z_I}{|\vec{R}_I - \vec{r}_i|}\)

Let’s analyze each component:

A. Kinetic Energy Terms 动能项

  1. Kinetic Energy of the Nuclei 原子核的动能: \(-\sum_{I=1}^{P} \frac{\hbar^2}{2M_I}\nabla_I^2\) This term is the sum of the kinetic energy operators for all the nuclei in the system.此项是系统中所有原子核的动能算符之和。
    • The sum is over all nuclei, indexed by \(I\) from 1 to \(P\).该和涵盖所有原子核,索引为 \(I\),从 1 到 \(P\)
    • \(M_I\) is the mass of nucleus \(I\).是原子核 \(I\) 的质量。
    • \(\nabla_I^2\) is the Laplacian operator, which involves the second spatial derivatives with respect to the coordinates of nucleus \(I\).是拉普拉斯算符,它涉及原子核 \(I\) 坐标的二阶空间导数。
  2. Kinetic Energy of the Electrons 电子的动能: \(-\sum_{i=1}^{N} \frac{\hbar^2}{2m}\nabla_i^2\) This is the corresponding sum of the kinetic energy operators for all the electrons.这是所有电子的动能算符的对应和。
    • The sum is over all electrons, indexed by \(i\) from 1 to \(N\).该和是针对所有电子的,索引为 \(i\),从 1 到 \(N\)
    • \(m\) is the mass of an electron.是电子的质量。
    • \(\nabla_i^2\) is the Laplacian operator with respect to the coordinates of electron \(i\).是关于电子 \(i\) 坐标的拉普拉斯算符。

B. Potential Energy Terms (Electrostatic Interactions) 势能项(静电相互作用)

  1. Nuclear-Nuclear Repulsion 核间排斥力: \(+\frac{e^2}{2}\sum_{I=1}^{P}\sum_{J \neq I}^{P} \frac{Z_I Z_J}{|\vec{R}_I - \vec{R}_J|}\) This term represents the potential energy from the electrostatic (Coulomb) repulsion between all pairs of positively charged nuclei.该项表示所有带正电原子核对之间静电(库仑)排斥力产生的势能。
    • The double summation runs over all pairs of nuclei (\(I \neq J\)); the factor of \(\frac{1}{2}\) in front corrects for counting each pair twice. 双重求和遍历所有原子核对 (\(I \neq J\));前面的系数 \(\frac{1}{2}\) 用于修正每对被重复计算两次的问题。
    • \(Z_I\) is the atomic number (i.e., the charge) of nucleus \(I\).是原子核 \(I\) 的原子序数(即电荷)。
    • \(\vec{R}_I\) is the position vector of nucleus \(I\).是原子核 \(I\) 的位置矢量。
    • \(e\) is the elementary charge.是基本电荷。
  2. Electron-Electron Repulsion 电子间排斥力: \(+\frac{e^2}{2}\sum_{i=1}^{N}\sum_{j \neq i}^{N} \frac{1}{|\vec{r}_i - \vec{r}_j|}\) This term represents the potential energy from the electrostatic repulsion between all pairs of negatively charged electrons.该项表示所有带负电的电子对之间静电排斥的势能。
    • The double summation runs over all pairs of electrons (\(i \neq j\)); the factor of \(\frac{1}{2}\) corrects for counting each pair twice. 双重求和遍历所有电子对 (\(i \neq j\));系数 \(\frac{1}{2}\) 用于修正每对被重复计算两次的问题。
    • \(\vec{r}_i\) is the position vector of electron \(i\).是电子 \(i\) 的位置矢量。
  3. Nuclear-Electron Attraction 核-电子引力: \(-e^2\sum_{I=1}^{P}\sum_{i=1}^{N} \frac{Z_I}{|\vec{R}_I - \vec{r}_i|}\) This final term represents the potential energy from the electrostatic attraction between the nuclei and the electrons.这最后一项表示原子核和电子之间静电引力的势能。
    • The summation runs over all nuclei and all electrons.该求和适用于所有原子核和所有电子。

5. Notations and Conventions

  • Atomic Units: The note \(\frac{1}{4\pi\epsilon_0} = k = 1\) is a key indicator of the convention being used. This sets the Coulomb constant to 1, which is a hallmark of Hartree atomic units. In this system, the elementary charge (\(e\)), electron mass (\(m\)), and reduced Planck constant (\(\hbar\)) are also set to 1. This simplifies the Hamiltonian significantly, removing the physical constants and making the equations easier to work with computationally (the simplified form is written out just after this list). 是所用约定的关键指标。这将库仑常数设置为 1,这是Hartree 原子单位的标志。在这个系统中,基本电荷 (\(e\))、电子质量 (\(m\)) 和约化普朗克常数 (\(\hbar\)) 也设为 1。这显著简化了哈密顿量,消除了物理常数,使方程更易于计算。
  • Interaction Terms: The notations \(\{i, j\}\), \(\{i, j, k\}\), etc., refer to the “many-body” problem. The Hamiltonian contains two-body terms (interactions between pairs of particles), and solving the Schrödinger equation exactly is extremely difficult because the motion of every particle is correlated with every other particle. Computational methods are designed to approximate these interactions. 符号 \(\{i, j\}\)\(\{i, j, k\}\) 等指的是“多体”问题。哈密顿量包含二体项(粒子对之间的相互作用),而精确求解薛定谔方程极其困难,因为每个粒子的运动都与其他粒子相关。计算方法旨在近似这些相互作用。
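For reference, writing \(\hbar = m = e = \frac{1}{4\pi\epsilon_0} = 1\) explicitly shows how the Hamiltonian above simplifies in Hartree atomic units (this is just a restatement of the same five terms, with the nuclear masses \(M_I\) now expressed in units of the electron mass):

\[\hat{\mathcal{H}} = -\sum_{I=1}^{P} \frac{1}{2M_I}\nabla_I^2 - \sum_{i=1}^{N} \frac{1}{2}\nabla_i^2 + \frac{1}{2}\sum_{I=1}^{P}\sum_{J \neq I}^{P} \frac{Z_I Z_J}{|\vec{R}_I - \vec{R}_J|} + \frac{1}{2}\sum_{i=1}^{N}\sum_{j \neq i}^{N} \frac{1}{|\vec{r}_i - \vec{r}_j|} - \sum_{I=1}^{P}\sum_{i=1}^{N} \frac{Z_I}{|\vec{R}_I - \vec{r}_i|}\]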

This whiteboard presents the mathematical foundation for non-adiabatic molecular dynamics, a sophisticated method in theoretical chemistry and physics used to simulate processes where the Born-Oppenheimer approximation breaks down. This typically occurs in photochemistry, electron transfer reactions, and when molecules interact with intense laser fields. 这块白板展示了非绝热分子动力学的数学基础,这是理论化学和物理学中一种复杂的方法,用于模拟玻恩-奥本海默近似失效的过程。这通常发生在光化学、电子转移反应以及分子与强激光场相互作用时。

6. Topic: Non-Adiabatic Molecular Dynamics (MD) 非绝热分子动力学 (MD)

The title “Δ non-adiabatic MD” indicates that the topic moves beyond the standard Born-Oppenheimer approximation. In this approximation, it is assumed that the light electrons adjust instantaneously to the motion of the heavy nuclei, allowing the system to be described by a single potential energy surface. Non-adiabatic methods, by contrast, account for the quantum mechanical coupling between multiple electronic states.

标题“Δ 非绝热 MD”表明该主题超越了标准的玻恩-奥本海默近似。在该近似中,假设轻电子会根据重原子核的运动进行瞬时调整,从而使系统可以用单个势能面来描述。相比之下,非绝热方法则考虑了多个电子态之间的量子力学耦合。

7. The Born-Huang Ansatz 玻恩-黄拟设

The starting point for this method is the “ansatz” (an educated guess for the form of the solution). This is the Born-Huang expansion for the total molecular wave function, \(\Psi\). 该方法的起点是“拟设”(对解形式的合理猜测)。这是分子总波函数 \(\Psi\) 的玻恩-黄展开式。

\(\Psi(\vec{R}, \vec{r}, t) = \sum_{n} \Theta_n(\vec{R}, t) \Phi_n(\vec{R}, \vec{r})\)

  • \(\Psi(\vec{R}, \vec{r}, t)\) is the total wave function for the entire molecule. It depends on the coordinates of all nuclei (\(\vec{R}\)), all electrons (\(\vec{r}\)), and time (\(t\)). 是整个分子的总波函数。它取决于所有原子核 (\(\vec{R}\))、所有电子 (\(\vec{r}\)) 和时间 (\(t\)) 的坐标。

  • \(\Phi_n(\vec{R}, \vec{r})\) are the electronic wave functions. They are the solutions to the electronic Schrödinger equation for a fixed nuclear geometry \(\vec{R}\) and form a complete basis set. The index \(n\) labels the electronic state (e.g., ground state, first excited state, etc.). 它们是给定原子核几何构型 \(\vec{R}\) 的电子薛定谔方程的解,并构成一个完整的基组。下标 \(n\) 标记电子态(例如,基态、第一激发态等)。

  • \(\Theta_n(\vec{R}, t)\) are the nuclear wave functions. Each \(\Theta_n\) describes the motion of the nuclei on the potential energy surface of the corresponding electronic state, \(\Phi_n\). Crucially, they depend on time. 是核波函数。每个 \(\Theta_n\) 描述原子核在相应电子态 \(\Phi_n\) 势能面上的运动。至关重要的是,它们依赖于时间。

This ansatz expresses the total molecular state as a superposition of electronic states, where the coefficients of the superposition are the nuclear wave functions. 该拟设将总分子态表示为电子态的叠加,其中叠加的系数是核波函数。

8. The Partitioned Molecular Hamiltonian 分割分子哈密顿量

The total molecular Hamiltonian, \(\hat{\mathcal{H}}\), is partitioned into terms that act on the nuclei and electrons separately. 总分子哈密顿量 \(\hat{\mathcal{H}}\) 被分割成分别作用于原子核和电子的项。

\(\hat{\mathcal{H}} = -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 + \hat{\mathcal{H}}_e + \hat{V}_{nn}\)

  • \(-\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2\): This is the kinetic energy operator for the nuclei, often denoted as \(\hat{T}_n\).这是原子核的动能算符,通常表示为 \(\hat{T}_n\)

  • \(\hat{\mathcal{H}}_e\): This is the electronic Hamiltonian, which includes the kinetic energy of the electrons and the potential energy of electron-electron and electron-nuclear interactions. 这是电子哈密顿量,包含电子的动能以及电子-电子和电子-核相互作用的势能。

  • \(\hat{V}_{nn}\): This is the potential energy operator for nuclear-nuclear repulsion.这是核-核排斥的势能算符。

9. The Electronic Schrödinger Equation 电子薛定谔方程

The electronic basis functions, \(\Phi_n\), are defined as the eigenfunctions of the electronic Hamiltonian (plus the nuclear repulsion term) for a fixed nuclear configuration \(\vec{R}\). 电子基函数 \(\Phi_n\) 定义为对于固定的核构型 \(\vec{R}\),电子哈密顿量(加上核排斥项)的本征函数。

\((\hat{\mathcal{H}}_e + \hat{V}_{nn}) \Phi_n(\vec{R}, \vec{r}) = E_n(\vec{R}) \Phi_n(\vec{R}, \vec{r})\)

  • \(E_n(\vec{R})\) are the eigenvalues, which are the potential energy surfaces (PES). Each electronic state \(n\) has its own PES, which dictates the forces acting on the nuclei when the molecule is in that electronic state. 是特征值,即势能面 (PES)。每个电子态 \(n\) 都有其自身的势能面,它决定了分子处于该电子态时作用于原子核的力。

10. Deriving the Equations of Motion for the Nuclei 推导原子核运动方程

The final part of the whiteboard begins the derivation of the time-dependent Schrödinger equation for the nuclear wave functions, \(\Theta_k\). The process starts with the full time-dependent Schrödinger equation, \(i\hbar \frac{\partial \Psi}{\partial t} = \hat{\mathcal{H}} \Psi\). To find the equation for a specific nuclear wave function \(\Theta_k\), this main equation is projected onto the corresponding electronic basis state \(\Phi_k\). 白板的最后一部分开始推导原子核波函数 \(\Theta_k\) 的含时薛定谔方程。该过程从完整的含时薛定谔方程 \(i\hbar \frac{\partial \Psi}{\partial t} = \hat{\mathcal{H}} \Psi\) 开始。为了找到特定原子核波函数 \(\Theta_k\) 的方程,需要将这个主方程投影到相应的电子基态 \(\Phi_k\) 上。

This is done by multiplying from the left by the complex conjugate of the electronic wave function, \(\Phi_k^*\), and integrating over all electronic coordinates, \(d\vec{r}\). 可以通过从左边乘以电子波函数 \(\Phi_k^*\) 的复共轭,然后在所有电子坐标 \(d\vec{r}\) 上积分来实现。

\(\int \Phi_k^* i\hbar \frac{\partial}{\partial t} \Psi \,d\vec{r} = \int \Phi_k^* \hat{\mathcal{H}} \Psi \,d\vec{r}\)

The board then shows the result of substituting the Born-Huang ansatz for \(\Psi\) and the partitioned Hamiltonian for \(\hat{\mathcal{H}}\) into this projected equation: 然后,黑板显示将 Born-Huang 拟设式代入 \(\Psi\),将分块哈密顿量代入以下投影方程的结果:

\(i\hbar \frac{\partial}{\partial t} \Theta_k(\vec{R}, t) = \int \Phi_k^* \left( -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 + \hat{\mathcal{H}}_e + \hat{V}_{nn} \right) \sum_n \Theta_n \Phi_n \,d\vec{r}\)

  • Left Hand Side: The left side of the projection has been simplified. Because the electronic basis functions \(\Phi_n\) form an orthonormal set (\(\int \Phi_k^* \Phi_n d\vec{r} = \delta_{kn}\)), the sum collapses to a single term for \(n=k\). 投影左侧已简化。由于电子基函数 \(\Phi_n\) 构成一个正交集 (\(\int \Phi_k^* \Phi_n d\vec{r} = \delta_{kn}\),因此当 \(n=k\) 时,和将折叠为一个项。

  • Right Hand Side: This complex integral is the core of non-adiabatic dynamics. When the nuclear kinetic energy operator, \(\nabla_I^2\), acts on the product \(\Theta_n \Phi_n\), it acts on both functions (via the product rule). The terms that arise from \(\nabla_I\) acting on the electronic wave functions \(\Phi_n\) are known as non-adiabatic coupling terms. These terms are responsible for enabling transitions between different electronic potential energy surfaces, which is the essence of non-adiabatic dynamics. 这个复积分是非绝热动力学的核心。当核动能算符 \(\nabla_I^2\) 作用于乘积 \(\Theta_n \Phi_n\) 时,它会作用于这两个函数(通过乘积规则)。由 \(\nabla_I\) 作用于电子波函数 \(\Phi_n\) 而产生的项称为非绝热耦合项。这些术语负责实现不同电子势能面之间的转变,这是非绝热动力学的本质。

This whiteboard continues the mathematical derivation for non-adiabatic molecular dynamics started in the previous image. It focuses on expanding the nuclear kinetic energy term to reveal the crucial couplings between different electronic states.这块白板延续了上一张图片中非绝热分子动力学的数学推导。它着重于扩展核动能项,以揭示不同电子态之间的关键耦合。

11. Starting Point: The Projected Schrödinger Equation 起点:投影薛定谔方程

The derivation picks up from the equation for the time evolution of the nuclear wave function, \(\Theta_k\). The right-hand side of this equation is being evaluated. 推导过程取自核波函数 \(\Theta_k\) 的时间演化方程。该方程的右边正在求值。

\(= \int \Phi_k^* \left( -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 \right) \sum_n \Theta_n \Phi_n \,d\vec{r} + E_k \Theta_k\)

This equation separates the total energy into two parts 该方程将总能量分为两部分 : * The first term is the contribution from the nuclear kinetic energy operator, \(-\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2\). 第一项是核动能算符的贡献 * The second term, \(E_k \Theta_k\), is the contribution from the potential energy. This term arises from the action of the electronic Hamiltonian part \((\hat{\mathcal{H}}_e + \hat{V}_{nn})\) on the basis functions. Due to the orthonormality of the electronic wavefunctions (\(\int \Phi_k^* \Phi_n \,d\vec{r} = \delta_{kn}\)), the sum over \(n\) collapses to a single term for the potential energy. 第二项,\(E_k \Theta_k\),是势能的贡献。这一项源于电子哈密顿量部分 \((\hat{\mathcal{H}}_e + \hat{V}_{nn})\) 对基函数的作用。由于电子波函数(\(\int \Phi_k^* \Phi_n \,d\vec{r} = \delta_{kn}\))的正交性,\(n\)项的和会坍缩为势能的一项。

The challenge, and the core of the physics, lies in evaluating the first term, as the nuclear derivative \(\nabla_I\) acts on both the nuclear wave function \(\Theta_n\) and the electronic wave function \(\Phi_n\). 难点在于,也是物理的核心在于如何计算第一项,因为核导数 \(\nabla_I\) 同时作用于核波函数 \(\Theta_n\) 和电子波函数 \(\Phi_n\)

12. Applying the Product Rule for the Laplacian 应用拉普拉斯算子的乘积规则

To expand the kinetic energy term, the product rule for the Laplacian operator acting on two functions (A and B) is used. The board writes this rule as: 为了展开动能项,我们利用了拉普拉斯算子作用于两个函数(A 和 B)的乘积规则。棋盘上将这条规则写成: \(\nabla^2(AB) = (\nabla^2 A)B + 2(\nabla A)\cdot(\nabla B) + A(\nabla^2 B)\)

In our case, \(A = \Theta_n(\vec{R}, t)\) and \(B = \Phi_n(\vec{R}, \vec{r})\). The derivative \(\nabla_I\) is with respect to the nuclear coordinates \(\vec{R}_I\). 在我们的例子中,\(A = \Theta_n(\vec{R}, t)\)\(B = \Phi_n(\vec{R}, \vec{r})\)。导数 \(\nabla_I\) 是关于原子核坐标 \(\vec{R}_I\) 的。

13. Expanding the Kinetic Energy Term 展开动能项

Applying this rule, the integral containing the kinetic energy operator is expanded: 应用此规则,展开包含动能算符的积分: \(= -\sum_I \frac{\hbar^2}{2M_I} \int \Phi_k^* \sum_n \left( (\nabla_I^2 \Theta_n)\Phi_n + 2(\nabla_I \Theta_n)\cdot(\nabla_I \Phi_n) + \Theta_n(\nabla_I^2 \Phi_n) \right) d\vec{r} + E_k \Theta_k\)

This step explicitly shows how the nuclear kinetic energy operator gives rise to three distinct types of terms.此步骤明确展示了核动能算符如何产生三种不同类型的项。

14. Final Result and Identification of Coupling Terms 最终结果及耦合项的识别

The final step is to take the integral over the electronic coordinates (\(d\vec{r}\)) and rearrange the terms. The expression is simplified by again using the orthonormality of the electronic wave functions, \(\int \Phi_k^* \Phi_n \, d\vec{r} = \delta_{kn}\). 最后一步是对电子坐标 (\(d\vec{r}\)) 进行积分,并重新排列各项。再次利用电子波函数的正交性简化表达式,\(\int \Phi_k^* \Phi_n \, d\vec{r} = \delta_{kn}\)

\(= -\sum_I \frac{\hbar^2}{2M_I} \left( \nabla_I^2 \Theta_k + \sum_n 2 \left( \int \Phi_k^* \nabla_I \Phi_n \, d\vec{r} \right) \cdot \nabla_I \Theta_n + \sum_n \left( \int \Phi_k^* \nabla_I^2 \Phi_n \, d\vec{r} \right) \Theta_n \right) + E_k \Theta_k\)

This final equation is profound. It represents the time-independent Schrödinger equation for the nuclear wave function \(\Theta_k\), but it is coupled to all other nuclear wave functions \(\Theta_n\). Let’s break down the key terms within the parentheses: 最后一个方程意义深远。它代表了核波函数 \(\Theta_k\) 的与时间无关的薛定谔方程,但它与所有其他核波函数 \(\Theta_n\) 耦合。让我们分解一下括号内的关键项:

  • \(\nabla_I^2 \Theta_k\): This is the standard kinetic energy term for the nuclei moving on the potential energy surface of state \(k\). This is the only term that would remain in the simple Born-Oppenheimer (adiabatic) approximation. 这是原子核在势能面 \(k\) 上运动的标准动能项。这是在简单的 Born-Oppenheimer(绝热)近似中唯一保留的项。

  • \(\left( \int \Phi_k^* \nabla_I \Phi_n \, d\vec{r} \right)\): This is the first-derivative non-adiabatic coupling term (NACT), often called the derivative coupling. This vector quantity determines the strength of the coupling between electronic states \(k\) and \(n\) due to the velocity of the nuclei. It is the primary term responsible for enabling transitions between different potential energy surfaces. 这是一阶导数非绝热耦合项 (NACT),通常称为导数耦合。该矢量决定了由于原子核速度而导致的电子态 \(k\)\(n\) 之间耦合的强度。它是实现不同势能面之间跃迁的主要项。

  • \(\left( \int \Phi_k^* \nabla_I^2 \Phi_n \, d\vec{r} \right)\): This is the second-derivative non-adiabatic coupling term, a scalar quantity. While often smaller than the first-derivative term, it is also part of the complete description of non-adiabatic effects. 是二阶导数非绝热耦合项,一个标量。虽然它通常小于一阶导数项,但它也是非绝热效应完整描述的一部分。

In summary, this derivation shows mathematically how the motion of the nuclei (via the \(\nabla_I\) operator) can induce quantum mechanical transitions between different electronic states (\(\Phi_k \leftrightarrow \Phi_n\)). The strength of these transitions is governed by the non-adiabatic coupling terms, which depend on how the electronic wave functions change as the nuclear geometry changes. 总之,该推导从数学上展示了原子核的运动(通过 \(\nabla_I\) 算符)如何诱导不同电子态之间的量子力学跃迁(\(\Phi_k \leftrightarrow \Phi_n\))。这些跃迁的强度由非绝热耦合项控制,而非绝热耦合项又取决于电子波函数如何随原子核几何结构的变化而变化。

This whiteboard concludes the derivation of the equations for non-adiabatic molecular dynamics by defining the coupling operator and then showing how different levels of approximation—specifically the Born-Huang and the more restrictive Born-Oppenheimer approximations—arise from neglecting certain coupling terms. 这块白板通过定义耦合算符,并展示不同程度的近似——特别是 Born-Huang 近似和更严格的 Born-Oppenheimer 近似——是如何通过忽略某些耦合项而产生的,从而推导出非绝热分子动力学方程的。

15. Definition of the Non-Adiabatic Coupling Operator 非绝热耦合算符的定义

The whiteboard begins by collecting all the non-adiabatic coupling terms derived previously into a single operator, \(C_{kn}\). 白板首先将之前推导的所有非绝热耦合项合并为一个算符 \(C_{kn}\)

Let \(C_{kn} = -\sum_{I} \frac{\hbar^2}{2M_I} \left( 2 \left( \int \Phi_k^* \nabla_I \Phi_n \, d\vec{r} \right) \cdot \nabla_I + \left( \int \Phi_k^* \nabla_I^2 \Phi_n \, d\vec{r} \right) \right)\)

  • This operator, \(C_{kn}\), represents the total effect of the coupling between electronic state \(k\) and electronic state \(n\), which is induced by the kinetic energy of the nuclei. 此算符 \(C_{kn}\) 表示由原子核动能引起的电子态 \(k\) 和电子态 \(n\) 之间耦合的总效应。
  • The operator acts on the nuclear wave function that follows it in the full equation. The \(\nabla_I\) term acts as a derivative on that wave function. 该算符作用于完整方程中跟随它的核波函数。\(\nabla_I\) 项充当该波函数的导数。

16. The Coupled Equations of Motion 耦合运动方程

Using this compact definition, the full set of coupled time-dependent Schrödinger equations for the nuclear wave functions can be written as: 基于此简洁定义,核波函数的完整耦合含时薛定谔方程组可以写成:

\(i\hbar \frac{\partial}{\partial t} \Theta_k = \left( -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 + E_k \right) \Theta_k + \sum_n C_{kn} \Theta_n\)

This is the central result. It shows that the time evolution of the nuclear wave function on a given potential energy surface \(k\) (described by \(\Theta_k\)) depends on two things: 这是核心结论。它表明,核波函数在给定势能面 \(k\)(用 \(\Theta_k\) 描述)上的时间演化取决于两个因素: 1. The motion on its own surface, governed by its kinetic energy and the potential \(E_k\). 其自身表面上的运动,由其动能和势能 \(E_k\) 控制。 2. The influence of the nuclear wave functions on all other electronic surfaces (\(\Theta_n\)), mediated by the coupling operators \(C_{kn}\). 核波函数对所有其他电子表面(\(\Theta_n\))的影响,由耦合算符 \(C_{kn}\) 介导。

17. The Born-Huang Approximation 玻恩-黄近似

The first and most crucial approximation is introduced to simplify this complex set of coupled equations. 为了简化这组复杂的耦合方程,引入了第一个也是最重要的近似。

If \(C_{kn} = 0\) for \(k \neq n\) (Born-Huang approximation)

This approximation assumes that the off-diagonal coupling terms, which are responsible for transitions between different electronic states, are negligible. However, it retains the diagonal coupling term (\(C_{kk}\)). This leads to a simplified, uncoupled equation: 该近似假设导致不同电子态之间跃迁的非对角耦合项可以忽略不计。然而,它保留了对角耦合项(\(C_{kk}\))。这可以得到一个简化的非耦合方程:

\(i\hbar \frac{\partial}{\partial t} \Theta_k = \left( -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 + E_k + C_{kk} \right) \Theta_k\)

Substituting the definition of \(C_{kk}\): 代入 \(C_{kk}\) 的定义:

\(i\hbar \frac{\partial}{\partial t} \Theta_k = \left( -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 + E_k - \sum_I \frac{\hbar^2}{2M_I} \left( 2 \left( \int \Phi_k^* \nabla_I \Phi_k \, d\vec{r} \right) \cdot \nabla_I + \int \Phi_k^* \nabla_I^2 \Phi_k \, d\vec{r} \right) \right) \Theta_k\)

The term \(C_{kk}\) is known as the diagonal Born-Oppenheimer correction (DBOC). It represents a small correction to the potential energy surface \(E_k\) that arises from the fact that the electrons do not adjust perfectly and instantaneously to the nuclear motion, even within the same electronic state. \(C_{kk}\) 项被称为对角玻恩-奥本海默修正 (DBOC)。它表示对势能面 \(E_k\) 的微小修正,其原因是即使在相同的电子态下,电子也无法完美且即时地适应核运动。

  • Note on Real Wavefunctions 关于实波函数的注释: The board shows that for real wavefunctions, the first-derivative part of the diagonal correction vanishes: \(\int \Phi_k \nabla_I \Phi_k \, d\vec{r} = 0\). This is because the integral is related to the gradient of the normalization condition, \(\nabla_I \int \Phi_k^2 \, d\vec{r} = \nabla_I(1) = 0\), which expands to \(2\int \Phi_k \nabla_I \Phi_k \, d\vec{r} = 0\). 黑板显示,对于实波函数,对角修正的一阶导数部分为零:\(\int \Phi_k \nabla_I \Phi_k \, d\vec{r} = 0\)。这是因为积分与归一化条件的梯度有关,\(\nabla_I \int \Phi_k^2 \, d\vec{r} = \nabla_I(1) = 0\),其展开为 \(2\int \Phi_k \nabla_I \Phi_k \, d\vec{r} = 0\)

18. The Born-Oppenheimer Approximation 玻恩-奥本海默近似

The final and most widely used approximation is the Born-Oppenheimer approximation. It is more restrictive than the Born-Huang approximation. 最后一种也是最广泛使用的近似方法是玻恩-奥本海默近似。它比玻恩-黄近似更具限制性。

If \(C_{kk} = 0\) (Born-Oppenheimer approximation) 若\(C_{kk} = 0\)(玻恩-奥本海默近似)

This assumes that the diagonal correction term is also negligible. By setting all \(C_{kn}=0\) (both diagonal and off-diagonal), the equations become completely decoupled, and the nuclear motion evolves independently on each potential energy surface. 这假设对角修正项也可忽略不计。通过令所有\(C_{kn}=0\)(包括对角和非对角),方程组完全解耦,原子核运动在每个势能面上独立演化。

The result is the standard time-dependent Schrödinger equation for the nuclei: 由此可得标准的原子核的含时薛定谔方程

\(i\hbar \frac{\partial}{\partial t} \Theta_k = \left( -\sum_{I} \frac{\hbar^2}{2M_I}\nabla_I^2 + E_k \right) \Theta_k\)

This equation is the foundation of most of quantum chemistry. It states that the nuclei move on a static potential energy surface \(E_k(\vec{R})\) provided by the electrons, without any possibility of transitioning to other electronic states or having the surface be corrected by their own motion.

该方程是大多数量子化学的基础。原子核在由电子提供的静态势能面 \(E_k(\vec{R})\) 上运动,不存在跃迁到其他电子态或因自身运动而修正势能面的可能性。

【Problem】Notes on fixing images that do not display on the blog

Step 1: Edit the configuration file in the site's root directory.

Step 2: Replace the Markdown image line with HTML code.

Step 3: Add the ROOT setting below the URL setting.

Step 4: If the plugin is not needed, run the following command in the terminal to uninstall it:

# URL
## Set your site url here. For example, if you use GitHub Page, set url as 'https://username.github.io/project'
url: https://TianyaoBlogs.github.io/

root: /

permalink: :year/:month/:day/:title/

<img src="/imgs/5054C3/General_linear_regression_model.png" alt="A diagram of the general linear regression model">
$ npm uninstall hexo-asset-image

PHYS 5120 - Computational Energy Materials and Electronic Structure Simulation, Lecture 3

Lecturer: Prof. PAN DING

1 Radial distribution function (RDF) - static structure:

  • Content: This whiteboard serves as an excellent summary, pulling together all the key concepts we’ve discussed into a single, cohesive picture. Let’s connect everything on this slide to our detailed conversation.

1. RDF: The Static Structure RDF静态结构

On the top left, you see RDF (Radial Distribution Function).

  • The Plots: The board shows the familiar \(g(r)\) plot with its characteristic peaks for a liquid. Below it is a plot of the interatomic potential energy, \(V(r)\). This addition is very insightful! It shows why the first peak in \(g(r)\) exists: it corresponds to the minimum energy distance (\(\sigma\)) where particles are most stable and likely to be found. 白板展示了我们熟悉的\(g(r)\)图,它带有液体的特征峰。下方是原子间势能\(V(r)\)的图。这个补充非常有见地!它解释了为什么 \(g(r)\) 中的第一个峰值存在:它对应于粒子最稳定且最有可能被发现的最小能量距离 (\(\sigma\))。
  • Connection: This section summarizes our first discussion. It’s the starting point for our analysis—a static snapshot of the material’s average atomic arrangement before we consider how the atoms move. 本节总结了我们的第一个讨论。这是我们分析的起点——在我们考虑原子如何运动之前,它是材料平均原子排列的静态快照。

2. MSD and The Einstein Relation: The Displacement Picture 均方位移 (MSD) 和爱因斯坦关系:位移图像

The board then moves to dynamics, presenting two methods to calculate the diffusion constant, D. The first is the Einstein relation. 两种计算扩散常数 D的方法。第一种是爱因斯坦关系

  • The Formula: It correctly states that the Mean Squared Displacement (MSD), \(\langle r^2 \rangle\), is equal to \(6Dt\) in three dimensions. It then rearranges this to solve for \(D\): 它正确地指出了均方位移 (MSD),\(\langle r^2 \rangle\),在三维空间中等于 \(6Dt\)。然后重新排列该公式以求解 \(D\)\[D = \lim_{t\to\infty} \frac{\langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle}{6t}\]
  • The Diagram: The central diagram beautifully illustrates the concept. It shows a particle in a simulation box (with “N=108” likely being the number of particles simulated) moving from an initial position \(\vec{r}_i(0)\) to a final position \(\vec{r}_i(t_j)\). The MSD is the average of the square of this displacement over all particles and many time origins. The graph labeled “MSD” shows how you would plot this data and find the slope (“fitting”) to calculate \(D\). 中间的图表完美地阐释了这个概念。它展示了一个粒子在模拟框中(“N=108” 可能是模拟粒子的数量)从初始位置 \(\vec{r}_i(0)\) 移动到最终位置 \(\vec{r}_i(t_j)\)。MSD 是该位移平方在所有粒子和多个时间原点上的平均值。标有“MSD”的图表显示了如何绘制这些数据并找到斜率(“拟合”)来计算 \(D\)
  • Connection: This is a perfect summary of the “Displacement Picture” we analyzed on the second whiteboard. It’s the most intuitive way to think about diffusion: how far particles spread out over time.这完美地总结了我们在第二个白板上分析的“位移图”。这是思考扩散最直观的方式:粒子随时间扩散的距离。

3. The Green-Kubo Relation: The Fluctuation Picture 格林-久保关系:涨落图

Finally, the board presents the more advanced but often more practical method: the Green-Kubo relation.

  • The Equations: This section displays the two key equations from our last discussion:
    1. The MSD as the double integral of the Velocity Autocorrelation Function (VACF). 速度自相关函数 (VACF) 的二重积分的均方差 (MSD)。
    2. The crucial derivative step: \(\frac{d\langle x^2(t)\rangle}{dt} = 2 \int_0^t dt'' \langle V_x(t) V_x(t'') \rangle\). 关键的导数步骤:\(\frac{d\langle x^2(t)\rangle}{dt} = 2 \int_0^t dt'' \langle V_x(t) V_x(t'') \rangle\)
  • The Diagram: The small diagram of a square with axes \(t'\) and \(t''\) visually represents the two-dimensional domain of integration for the double integral. 一个带有轴 \(t'\)\(t''\) 的小正方形图直观地表示了二重积分的二维积分域。
  • Connection: This summarizes the “Fluctuation Picture.” It shows the mathematical heart of the derivation that proves the Einstein and Green-Kubo methods are equivalent. As we concluded, this method is often numerically superior because it involves integrating a rapidly decaying function (the VACF) rather than finding the slope of a noisy, unbounded function (the MSD). 这概括了“涨落图”。它展示了证明爱因斯坦方法和格林-久保方法等价的推导过程的数学核心。正如我们总结的那样,这种方法通常在数值上更胜一筹,因为它涉及对快速衰减函数(VACF)进行积分,而不是求噪声无界函数(MSD)的斜率。

In essence, this single whiteboard is a complete roadmap for analyzing diffusion in a molecular simulation. It shows how to first characterize the material’s structure (\(g(r)\)) and then how to compute its key dynamic property—the diffusion constant D—using two powerful, interconnected methods. 本质上,这块白板就是分子模拟中分析扩散的完整路线图。它展示了如何首先表征材料的结构\(g(r)\)),然后如何使用两种强大且相互关联的方法计算其关键的动态特性——扩散常数 D

This whiteboard beautifully concludes the derivation of the Green-Kubo relation, showing the final formulas and how they are used in practice. It provides the punchline to the mathematical story we’ve been following.

Let’s break down the details.

4. Finalizing the Derivation

The top lines of the board show the final step in connecting the Mean Squared Displacement (MSD) to the Velocity Autocorrelation Function (VACF).

\[\lim_{t\to\infty} \frac{d\langle x^2 \rangle}{dt} = 2 \int_0^\infty d\tau \langle V_x(0) V_x(\tau) \rangle\]

  • The Left Side: As we know from the Einstein relation, the long-time limit of the derivative of the 1D MSD, \(\lim_{t\to\infty} \frac{d\langle x^2 \rangle}{dt}\), is simply equal to \(2D\).
  • The Right Side: This is the result of the mathematical derivation from the previous slide. It shows that this same quantity is also equal to twice the total integral of the VACF.

By equating these two, we can solve for the diffusion coefficient, D.

5. The Velocity Autocorrelation Function (VACF)

The board explicitly names the key quantity here:

\[\Phi(\tau) = \langle V_x(0) V_x(\tau) \rangle\]

This is the “Velocity autocorrelation function” (abbreviated as VAF on the board), which we’ve denoted as VACF. The variable has been changed from t to τ (tau) to represent a “time lag” or interval, which is common notation.

  • The Plot: The graph on the board shows a typical plot of the VACF, \(\Phi(\tau)\), versus the time lag \(\tau\).
    • It starts at a maximum positive value at \(\tau=0\) (when the velocity is perfectly correlated with itself).
    • It rapidly decays towards zero as the particle undergoes collisions that randomize its velocity.
  • The Integral: The shaded area under this curve represents the value of the integral \(\int_0^\infty \Phi(\tau) d\tau\). The Green-Kubo formula states that the diffusion coefficient is directly proportional to this area.

6. The Green-Kubo Formulas for the Diffusion Coefficient

After canceling the factor of 2, the board presents the final, practical formulas for D.

  • In 1 Dimension: \[D = \int_0^\infty d\tau \langle V_x(0) V_x(\tau) \rangle\]
  • In 3 Dimensions: This is the more general and useful formula. \[D = \frac{1}{3} \int_0^\infty d\tau \langle \vec{v}(0) \cdot \vec{v}(\tau) \rangle\] There are two important changes for 3D:
    1. We use the full velocity vectors and their dot product, \(\vec{v}(0) \cdot \vec{v}(\tau)\), to capture motion in all directions.
    2. We divide by 3 to get the average contribution to diffusion in any one direction (x, y, or z).

7. Practical Calculation in a Simulation

The last formula on the board shows how this is implemented in a computer simulation with a finite number of atoms.

\[D = \frac{1}{3N} \int_0^\infty d\tau \sum_{i=1}^{N} \langle \vec{v}_i(0) \cdot \vec{v}_i(\tau) \rangle\]

  • \(\sum_{i=1}^{N}\): This summation symbol indicates that you must compute the VACF for each individual atom (from atom i=1 to atom N).
  • \(\frac{1}{N}\): You then average the results over all N atoms in your simulation box.
  • \(\langle \dots \rangle\): The angle brackets here still imply an additional average over multiple different starting times (t=0) to get good statistics.

This formula is the practical recipe: to get the diffusion coefficient, you track the velocity of every atom, calculate each one’s VACF, average them together, and then integrate the result over time.
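A minimal numerical sketch of that recipe is shown below, assuming a velocity array of shape `(n_frames, N, 3)` sampled every `dt`; the array name, shapes, and the trapezoidal integration are illustrative choices, not part of the lecture.

```python
import numpy as np

def diffusion_green_kubo(vel, dt, n_lags):
    """Estimate D from particle velocities via the Green-Kubo relation.

    vel    : array of shape (n_frames, N, 3), velocity of every atom in every frame
    dt     : time between frames
    n_lags : number of time lags used for the VACF (should be << n_frames)
    """
    n_frames = vel.shape[0]
    vacf = np.zeros(n_lags)
    for lag in range(n_lags):
        # <v_i(t0) . v_i(t0 + lag)> averaged over atoms i and time origins t0
        dots = np.einsum('tnd,tnd->tn', vel[: n_frames - lag], vel[lag:])
        vacf[lag] = dots.mean()
    # trapezoidal integration of the VACF over time; D = (1/3) * integral
    integral = dt * (0.5 * vacf[0] + vacf[1:-1].sum() + 0.5 * vacf[-1])
    return integral / 3.0
```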

PHYS 5120 - Computational Energy Materials and Electronic Structure Simulation, Lecture 3

Lecturer: Prof. PAN DING

1 radial distribution function:

  • Content:

This whiteboard explains the process of calculating the radial distribution function, often denoted as \(g(r)\), to analyze the atomic structure of a material, which is referred to here as a “film”. 本白板解释了计算径向分布函数(通常表示为 \(g(r)\))的过程,用于分析材料(本文中称为“薄膜”)的原子结构。

In simple terms, the radial distribution function tells you the probability of finding an atom at a certain distance from another reference atom. It’s a powerful way to see the local structure in a disordered system like a liquid or an amorphous solid.

简单来说,径向分布函数表示在距离另一个参考原子一定距离处找到一个原子的概率。它是观察无序系统(例如液体或非晶态固体)局部结构的有效方法。

## Core Concept: Radial Distribution Function 径向分布函数

The main goal is to compute the radial distribution function, \(g(r)\), which is defined as the ratio of the actual number of atoms found in a thin shell at a distance \(r\) to the number of atoms you’d expect to find if the material were an ideal gas (completely random). 主要目标是计算径向分布函数 \(g(r)\),其定义为在距离 \(r\) 的薄壳层中实际发现的原子数与材料为理想气体(完全随机)时预期发现的原子数之比。

The formula is expressed as: \[g(r) = \frac{n(r)}{n_{\text{ideal gas}}(r)}, \qquad n_{\text{ideal gas}}(r) = \rho \, 4\pi r^2 \, dr\]

  • \(n(r)\): Represents the average number of atoms found in a thin spherical shell between a distance \(r\) and \(r+dr\) from a central atom. 表示距离中心原子 \(r\)\(r+dr\) 之间的薄球壳中原子的平均数量。
  • ideal gas: Represents the number of atoms you would expect in that same shell if the atoms were distributed completely randomly with the same average density (\(\rho\)). The volume of this shell is approximately \(4\pi r^2 dr\).表示如果原子完全随机分布且平均密度 (\(\rho\)) 相同,则该球壳中原子的数量。该球壳的体积约为 \(4\pi r^2 dr\)

A peak in the \(g(r)\) plot indicates a high probability of finding neighboring atoms at that specific distance, revealing the material’s structural shells (e.g., nearest neighbors, second-nearest neighbors, etc.).\(g(r)\) 图中的峰值表示在该特定距离处找到相邻原子的概率很高,从而揭示了材料的结构壳(例如,最近邻、次近邻等)。

## Calculation Method

The board outlines a two-step averaging process to get a statistically meaningful result from simulation data (a “film” at 20 frames per second).

  1. Average over atoms: In a single frame (a snapshot in time), you pick one atom as the center. Then, you count how many other atoms (\(n(r)\)) are in concentric spherical shells around it. This process is repeated, treating each atom in the frame as the center, and the results are averaged.

  2. Average over frames: The entire process described above is repeated for multiple frames from the simulation or video. This time-averaging ensures that the final result represents the typical structure of the material over time, smoothing out random fluctuations.

The board notes “dx = bin width 0.01Å”, which is a practical detail for the calculation. To create a histogram, the distance r is divided into small segments (bins) of 0.01 angstroms.
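A compact sketch of that two-level average (over atoms within each frame, then over frames) is given below, assuming a cubic box with periodic boundary conditions; the array names and the minimum-image handling are illustrative, not from the board.

```python
import numpy as np

def radial_distribution(frames, box, r_max, dr=0.01):
    """g(r) from a trajectory: frames has shape (n_frames, N, 3), box is the cubic box length."""
    edges = np.arange(0.0, r_max + dr, dr)
    hist = np.zeros(len(edges) - 1)
    n_frames, N, _ = frames.shape
    for pos in frames:                                   # average over frames
        diff = pos[:, None, :] - pos[None, :, :]         # all pair separation vectors
        diff -= box * np.round(diff / box)               # minimum-image convention
        r = np.linalg.norm(diff, axis=-1)
        r = r[np.triu_indices(N, k=1)]                   # unique pairs only (skip i == j)
        hist += np.histogram(r, bins=edges)[0]
    rho = N / box**3                                     # average number density
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    # normalize: average pair counts per atom per frame, divided by the ideal-gas expectation
    g = 2.0 * hist / (n_frames * N) / (rho * shell_vol)
    return 0.5 * (edges[1:] + edges[:-1]), g             # bin centers and g(r)
```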

## Connection to Experiments

Finally, the whiteboard mentions “frame X-ray scattering”. This is a crucial point because it connects this computational analysis to real-world experiments. Experimental techniques like X-ray or neutron scattering can be used to measure a quantity called the structure factor, \(S(q)\), which is directly related to the radial distribution function \(g(r)\) through a mathematical operation called a Fourier transform. This allows scientists to directly compare the structure produced in their simulations with the structure of a real material measured in a lab. 最后,白板上提到了“帧 X 射线散射”。这一点至关重要,因为它将计算分析与实际实验联系起来。X射线或中子散射等实验技术可以用来测量一个称为结构因子\(S(q)\)的量,该量通过傅里叶变换的数学运算与径向分布函数\(g(r)\)直接相关。这使得科学家能够直接将模拟中产生的结构与实验室测量的真实材料结构进行比较。

The board correctly links \(g(r)\) to X-ray scattering experiments. The quantity measured in these experiments is the static structure factor, \(S(q)\), which describes how the material scatters radiation. The relationship between the two is a Fourier transform: 该板正确地将\(g(r)\)与X射线散射实验联系起来。这些实验中测量的量是静态结构因子\(S(q)\),它描述了材料如何散射辐射。两者之间的关系是傅里叶变换: \[S(q) = 1 + 4 \pi \rho \int_0^\infty [g(r) - 1] r^2 \frac{\sin(qr)}{qr} dr\] This equation is crucial because it bridges the gap between computer simulations (which calculate \(g(r)\)) and physical experiments (which measure \(S(q)\)). 这个方程至关重要,因为它弥合了计算机模拟(计算 \(g(r)\))和物理实验(测量 \(S(q)\))之间的差距。
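Once \(g(r)\) is available, the Fourier-transform relation can be evaluated numerically; the short sketch below assumes `r` and `g` come from a calculation like the one above and that `rho` is the number density (names are illustrative).

```python
import numpy as np

def structure_factor(r, g, rho, q_values):
    """S(q) = 1 + 4*pi*rho * integral of [g(r) - 1] r^2 sin(qr)/(qr) dr."""
    r = np.asarray(r)
    h = np.asarray(g) - 1.0                              # total correlation function g(r) - 1
    s = np.empty(len(q_values))
    for k, q in enumerate(q_values):
        integrand = h * r**2 * np.sinc(q * r / np.pi)    # np.sinc(x) = sin(pi x)/(pi x)
        # trapezoidal rule over a possibly non-uniform r grid
        s[k] = 1.0 + 4.0 * np.pi * rho * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return s
```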

## 2. The Gaussian Distribution: Probability of Particle Position 高斯分布:粒子位置的概率

The board starts with the formula for a one-dimensional Gaussian (or normal) distribution: 白板首先展示的是一维高斯(或正态)分布的公式:

\[f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]

This equation describes the probability of finding a particle at a specific position x after a certain amount of time has passed. * \(\mu\) (mu) is the mean or average position. For a simple diffusion process starting at the origin, the particles spread out symmetrically, so the average position remains at the origin (\(\mu = 0\)). * \(\sigma^2\) (sigma squared) is the variance, which measures how spread out the particles are from the mean position. A larger variance means the particles have, on average, traveled farther from the starting point. 这个方程描述了经过一定时间后,在特定位置“x”找到粒子的概率。 * \(\mu\) (mu)平均值或平均位置。对于从原点开始的简单扩散过程,粒子对称扩散,因此平均位置保持在原点(\(\mu = 0\))。 * \(\sigma^2\)(sigma 平方)方差,用​​于衡量粒子与平均位置的扩散程度。方差越大,意味着粒子平均距离起点越远。

The note “Black-Scholes” is a side reference. The Black-Scholes model, famous in financial mathematics for pricing options, uses similar mathematical principles based on Brownian motion to model the random fluctuations of stock prices. “Black-Scholes”注释仅供参考。Black-Scholes 模型在金融数学中以期权定价而闻名,它使用基于布朗运动的类似数学原理来模拟股票价格的随机波动。

## 3. Mean Squared Displacement (MSD): Quantifying the Spread 均方位移 (MSD):量化扩散

The core of the board is dedicated to the Mean Squared Displacement (MSD). This is the primary tool used to measure how far, on average, particles have moved over a time interval t. 本版块的核心内容是均方位移 (MSD)。这是用于测量粒子在时间间隔“t”内平均移动距离的主要工具。

The variance \(\sigma^2\) is formally defined as the average of the squared deviations from the mean: \[\sigma^2 = \langle x^2(t) \rangle - \langle x(t) \rangle^2\] * \(\langle x(t) \rangle\) is the average displacement. As mentioned, for simple diffusion, \(\langle x(t) \rangle = 0\). * \(\langle x^2(t) \rangle\) is the average of the square of the displacement. 方差\(\sigma^2\)的正式定义为与平均值偏差平方的平均值: \[\sigma^2 = \langle x^2(t) \rangle - \langle x(t) \rangle^2\] * \(\langle x(t) \rangle\)是平均位移。如上所述,对于简单扩散,\(\langle x(t) \rangle = 0\)。 * \(\langle x^2(t) \rangle\)是位移平方的平均值。

Since \(\langle x(t) \rangle = 0\), the variance is simply equal to the MSD: \[\sigma^2 = \langle x^2(t) \rangle\] 由于 \(\langle x(t) \rangle = 0\),方差等于均方差 (MSD): \[\sigma^2 = \langle x^2(t) \rangle\]

The crucial insight for a diffusive process is that the MSD grows linearly with time. The rate of this growth is determined by the diffusion coefficient, D. The board shows this relationship for different dimensions: 扩散过程的关键在于MSD 随时间线性增长。其增长率由扩散系数 D决定。棋盘显示了不同维度下的这种关系:

  • 1D: \(\langle x^2(t) \rangle = 2Dt\) (Movement along a line) (沿直线运动)
  • 2D: The board has a slight typo or ambiguity with \(\langle z^2(t) \rangle = 2Dt\). For 2D motion in the x-y plane, the total MSD would be \(\langle r^2(t) \rangle = \langle x^2(t) \rangle + \langle y^2(t) \rangle = 4Dt\). The note on the board might be referring to just one component of motion. 白板上的 \(\langle z^2(t) \rangle = 2Dt\) 存在轻微笔误或歧义。对于 x-y 平面上的二维运动,总均方位移 (MSD) 为 \(\langle r^2(t) \rangle = \langle x^2(t) \rangle + \langle y^2(t) \rangle = 4Dt\)。白板上的注释可能仅指运动的一个分量。
  • 3D: \(\langle r^2(t) \rangle = \langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle = 6Dt\) (Movement in 3D space, which is the most common case in molecular simulations) (三维空间中的运动,这是分子模拟中最常见的情况) Here, \(\vec{r}(t)\) is the position vector of a particle at time t. The quantity \(\langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle\) is the average of the squared distance a particle has traveled from its initial position \(\vec{r}(0)\). 这里,\(\vec{r}(t)\) 是粒子在时间 t 的位置矢量。 \(\langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle\) 是粒子从其初始位置 \(\vec{r}(0)\) 行进距离的平方平均值。

## 4. The Einstein Relation: Connecting Microscopic Motion to a Macroscopic Property 爱因斯坦关系:将微观运动与宏观特性联系起来

Finally, the board presents the famous Einstein relation, which rearranges the 3D MSD equation to solve for the diffusion coefficient D:

\[D = \lim_{t \to \infty} \frac{\langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle}{6t}\]

This is a cornerstone equation in statistical mechanics. It provides a practical way to calculate a macroscopic property—the diffusion coefficient D—from the microscopic movements of individual particles observed in a computer simulation. 这是统计力学中的一个基石方程。它提供了一种实用的方法,可以通过计算机模拟中观察到的单个粒子的微观运动来计算宏观属性——扩散系数“D”。

In practice, one would:

  1. Run a simulation of particles. 运行粒子模拟。
  2. Track the position of each particle over time. 跟踪每个粒子随时间的位置。
  3. Calculate the squared displacement \(|\vec{r}(t) - \vec{r}(0)|^2\) for each particle at various time intervals \(t\). 计算每个粒子在不同时间间隔 \(t\) 的位移平方 \(|\vec{r}(t) - \vec{r}(0)|^2\)。
  4. Average this value over all particles to get the MSD, \(\langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle\). 对所有粒子取平均值,得到均方位移 (MSD),即 \(\langle |\vec{r}(t) - \vec{r}(0)|^2 \rangle\)。
  5. Plot the MSD as a function of time. 将 MSD 绘制成时间的函数。
  6. The slope of this line, divided by 6, gives the diffusion coefficient \(D\). The \(\lim_{t\to\infty}\) indicates that this linear relationship is most accurate for long time scales, after initial transient effects have died down. 这条直线的斜率除以 6,即得到扩散系数 \(D\)。\(\lim_{t\to\infty}\) 表明,在初始瞬态效应消退后,这种线性关系在长时间尺度上最为准确。
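These steps translate almost directly into code; here is a minimal sketch assuming an unwrapped trajectory array of shape `(n_frames, N, 3)` sampled every `dt` (the names and the fitting window are illustrative).

```python
import numpy as np

def diffusion_from_msd(positions, dt, fit_start=0):
    """Estimate D from the Einstein relation: MSD(t) ~ 6 D t at long times."""
    n_frames = positions.shape[0]
    msd = np.empty(n_frames)
    for lag in range(n_frames):
        disp = positions[lag:] - positions[: n_frames - lag]   # r(t0 + lag) - r(t0)
        msd[lag] = np.mean(np.sum(disp**2, axis=-1))           # average over atoms and time origins
    t = np.arange(n_frames) * dt
    # linear fit of MSD vs t over the late (linear) regime; the slope equals 6 D
    slope, _ = np.polyfit(t[fit_start:], msd[fit_start:], 1)
    return slope / 6.0, t, msd
```

In practice `fit_start` should be chosen so that the fit only covers the long-time linear regime, as the \(\lim_{t\to\infty}\) in the Einstein relation suggests.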

## 5. Right Board: Green-Kubo Relations

This board introduces a more advanced and powerful method to calculate transport coefficients like the diffusion coefficient, known as the Green-Kubo relations. 本面板介绍了一种更先进、更强大的方法来计算扩散系数等传输系数,即Green-Kubo 关系

### Velocity Autocorrelation Function (VACF) 速度自相关函数 (VACF)

The key idea is to look at how a particle’s velocity at one point in time is related to its velocity at a later time. This is measured by the Velocity Autocorrelation Function (VACF): \[C_{vv}(t) = \langle \vec{v}(t') \cdot \vec{v}(t' + t) \rangle\] This function tells us how long a particle “remembers” its velocity. For a typical liquid, the velocity is quickly randomized by collisions, so the VACF decays to zero rapidly. 其核心思想是考察粒子在某一时间点的速度与其在之后时间点的速度之间的关系。这可以通过速度自相关函数 (VACF)来测量: \[C_{vv}(t) = \langle \vec{v}(t') \cdot \vec{v}(t' + t) \rangle\] 此函数告诉我们粒子“记住”其速度的时间。对于典型的液体,速度会因碰撞而迅速随机化,因此 VACF 会迅速衰减为零。

### Connecting MSD and VACF

The board shows the mathematical link between the MSD and the VACF. Starting with the definition of position as the integral of velocity, \(\vec{r}(t) = \int_0^t \vec{v}(t') dt'\), one can show that the MSD is a double integral of the VACF. The board writes this as: \[\langle x^2(t) \rangle = \left\langle \left( \int_0^t v(t') dt' \right) \left( \int_0^t v(t'') dt'' \right) \right\rangle = \int_0^t dt' \int_0^t dt'' \langle v(t') v(t'') \rangle\] This shows that the two pictures of motion—the particle’s displacement (MSD) and its velocity fluctuations (VACF)—are deeply connected. 该面板展示了 MSD 和 VACF 之间的数学联系。从位置定义为速度的积分开始,\(\vec{r}(t) = \int_0^t \vec{v}(t') dt'\),可以证明 MSD 是 VACF 的二重积分。黑板上写着: \[\langle x^2(t) \rangle = \left\langle \left( \int_0^t v(t') dt' \right) \left( \int_0^t v(t'') dt'' \right) \right\rangle = \int_0^t dt' \int_0^t dt'' \langle v(t') v(t'') \rangle\] 这表明,粒子运动的两幅图像——粒子的位移(MSD)和速度涨落(VACF)——之间存在着深刻的联系。

### The Green-Kubo Formula for Diffusion 扩散的格林-久保公式

By combining the Einstein relation with the integral of the VACF, one arrives at the Green-Kubo formula for the diffusion coefficient: \[D = \frac{1}{3} \int_0^\infty \langle \vec{v}(0) \cdot \vec{v}(t) \rangle dt\] This incredible result states that the macroscopic property of diffusion (\(D\)) is determined by the integral of the microscopic velocity correlations. It’s often a more efficient way to compute \(D\) in simulations than calculating the long-time limit of the MSD. 将爱因斯坦关系与VACF积分相结合,可以得到扩散系数的格林-久保公式: \[D = \frac{1}{3} \int_0^\infty \langle \vec{v}(0) \cdot \vec{v}(t) \rangle dt\] 这个令人难以置信的结果表明,扩散的宏观特性(\(D\))由微观速度关联的积分决定。在模拟中,这通常是计算\(D\)比计算MSD的长期极限更有效的方法。

## 6. The Grand Narrative: From Micro to Macro 宏大叙事:从微观到宏观

The previous whiteboards gave us two ways to calculate the diffusion constant, D, from the microscopic random walk of individual atoms: 之前的白板提供了两种从单个原子的微观随机游动计算扩散常数 D的方法: 1. Einstein Relation: From the long-term slope of the Mean Squared Displacement (MSD). 根据均方位移 (MSD) 的长期斜率。 2. Green-Kubo Relation: From the integral of the Velocity Autocorrelation Function (VACF). 根据速度自相关函数 (VACF) 的积分。

This new whiteboard shows how that single microscopic parameter, D, governs the large-scale, observable process of diffusion described by Fick’s Laws and the Diffusion Equation. 这块新的白板展示了单个微观参数 D 如何控制菲克定律扩散方程所描述的大规模可观测扩散过程。

## 1. The Starting Point: A Liquid’s Structure 起点:液体的结构

The plot on the top left is the Radial Distribution Function, \(g(r)\), which we discussed in detail from the first whiteboard. 左上角的图是径向分布函数 \(g(r)\),我们在第一个白板上详细讨论过它。

  • The Plot: It shows the characteristic structure of a liquid. The peaks are labeled “1st”, “2nd”, and “3rd”, corresponding to the first, second, and third solvation shells (layers of neighboring atoms). 它显示了液体的特征结构。峰分别标记为“第一”、“第二”和“第三”,分别对应于第一、第二和第三溶剂化壳层(相邻原子层)。
  • The Limit: The note lim r→∞ g(r) = 1 confirms that at large distances, the liquid has no long-range order, as expected.注释“lim r→∞ g(r) = 1”证实了在远距离下,液体没有长程有序,这与预期一致。
  • System Parameters: The values T = 0.71 and ρ = 0.844 are the temperature and density of the simulated system (likely in reduced or “Lennard-Jones” units) for which this \(g(r)\) was calculated. 值“T = 0.71”和“ρ = 0.844”分别是模拟系统的温度和密度(可能采用约化或“Lennard-Jones”单位),用于计算此 \(g(r)\)

This section sets the stage: we are looking at the dynamics within a system that has this specific liquid-like structure. 本节奠定了基础:我们将研究具有特定类液体结构的系统内的动力学。

## 2. The Macroscopic Laws of Diffusion 宏观扩散定律

The bottom-left and top-right sections introduce the continuum equations that describe how concentration changes in space and time. 左下角和右上角部分介绍了描述浓度随空间和时间变化的连续方程。

### Fick’s First Law 菲克第一定律

\[\vec{J} = -D \nabla C\] This is Fick’s first law of diffusion. It states that there is a flux of particles (\(\vec{J}\)), meaning a net flow. This flow is directed from high concentration to low concentration (hence the minus sign) and its magnitude is proportional to the concentration gradient (\(\nabla C\)). 这是菲克第一扩散定律。它指出存在粒子的通量 (\(\vec{J}\)),即净流量。该流量从高浓度流向低浓度(因此带有负号),其大小与浓度梯度 (\(\nabla C\)) 成正比。

The Crucial Link: The proportionality constant is D, the very same diffusion constant we calculated from the microscopic random walk (MSD/VACF). This is the key connection: the collective result of countless individual random walks is a predictable net flow of particles. 比例常数是D,与我们根据微观随机游走 (MSD/VACF) 计算出的扩散常数完全相同。这是关键的联系:无数个体随机游动的集合结果是可预测的粒子净流。

### The Diffusion Equation (Fick’s Second Law) 扩散方程(菲克第二定律)

\[\frac{\partial C(\vec{r},t)}{\partial t} = D \nabla^2 C(\vec{r},t)\] This is the diffusion equation, one of the most important equations in physics and chemistry (also called the heat equation, as noted). It’s derived from Fick’s first law and the principle of mass conservation (\(\frac{\partial C}{\partial t} + \nabla \cdot \vec{J} = 0\)). It’s a differential equation that tells you exactly how the concentration at any point, \(C(\vec{r},t)\), will change over time. 这就是扩散方程,它是物理学和化学中最重要的方程之一(也称为热方程)。它源于菲克第一定律和质量守恒定律(\(\frac{\partial C}{\partial t} + \nabla \cdot \vec{J} = 0\))。它是一个微分方程,可以精确地告诉你任意一点的浓度 \(C(\vec{r},t)\) 随时间的变化。

## 3. The Solution: Connecting Back to the Random Walk 与随机游动联系起来

This is the most beautiful part. The board shows the solution to the diffusion equation for a very specific scenario, linking the macroscopic equation directly back to the microscopic random walk. 黑板上展示了一个非常具体场景下扩散方程的解,将宏观方程直接与微观随机游动联系起来。

### The Initial Condition 初始条件

The problem is set up by assuming all particles start at a single point at time zero: \[C(\vec{r}, 0) = \delta(\vec{r})\] This is a Dirac delta function, representing an infinitely concentrated point source at the origin. 问题假设所有粒子在时间零点处从一个点开始: \[C(\vec{r}, 0) = \delta(\vec{r})\] 这是一个狄拉克函数,表示一个在原点处无限集中的点源。

### The Fundamental Solution (Green’s Function) 基本解(格林函数)

The solution to the diffusion equation with this starting condition is called the fundamental solution or Green’s function. For one dimension, it is: \[C(x,t) = \frac{1}{\sqrt{4\pi Dt}} \exp\left(-\frac{x^2}{4Dt}\right)\]

The “Aha!” Moment: This is a Gaussian distribution. Let’s compare it to the formula from the second whiteboard: * The mean is \(\mu=0\). 均值为 \(\mu=0\)。 * The variance is \(\sigma^2 = 2Dt\). 方差为 \(\sigma^2 = 2Dt\)
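Comparing the exponents makes the identification explicit (a one-line check against the Gaussian written on the second whiteboard):

\[\exp\left(-\frac{x^2}{4Dt}\right) = \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad\Longrightarrow\quad \sigma^2 = \langle x^2(t) \rangle = 2Dt\]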

This is an incredible result. The macroscopic diffusion equation predicts that a concentration pulse will spread out over time, and the shape of the concentration profile will be a Gaussian curve. The width of this curve, measured by its variance \(\sigma^2\), is exactly the Mean Squared Displacement, \(\langle x^2(t) \rangle\), of the individual random-walking particles. 宏观扩散方程预测浓度脉冲会随时间扩散,浓度分布的形状将是高斯曲线。这条曲线的宽度,用其方差 \(\sigma^2\) 来衡量,恰好是单个随机游动粒子的均方位移 \(\langle x^2(t) \rangle\)

This perfectly unites the two perspectives: * Microscopic微观 (Board 2): Particles undergo a random walk, and their average squared displacement from the origin grows as \(\langle x^2(t) \rangle = 2Dt\). 粒子进行随机游动,它们相对于原点的平均平方位移随着 \(\langle x^2(t) \rangle = 2Dt\) 的增长而增长。 * Macroscopic宏观 (This Board): A collection of these particles, described by a continuum concentration C, spreads out in a Gaussian profile whose variance is \(\sigma^2 = 2Dt\). 这些粒子的集合,用连续浓度“C”来描述,呈方差为 \(\sigma^2 = 2Dt\) 的高斯分布。

The two pictures are mathematically identical.
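To see the two pictures agree numerically, here is a minimal sketch (not from the lecture, assuming NumPy): many independent 1D random walkers are generated, the MSD is fit to \(2Dt\), and the spread of positions at the final time is compared with the Gaussian variance \(2Dt\). The walker count, step count, and step variance are illustrative choices.

```python
import numpy as np

# Minimal sketch: microscopic random walks vs. the macroscopic Gaussian spread.
rng = np.random.default_rng(0)

n_walkers = 5_000        # assumed number of independent particles
n_steps = 400            # assumed number of time steps
dt = 1.0                 # time step (arbitrary units)
step_var = 0.02          # variance of each displacement; implies D = step_var / (2*dt)
D_true = step_var / (2 * dt)

# Each row is one walker; the cumulative sum of random steps gives its trajectory.
steps = rng.normal(0.0, np.sqrt(step_var), size=(n_walkers, n_steps))
x = np.cumsum(steps, axis=1)

t = dt * np.arange(1, n_steps + 1)
msd = np.mean(x**2, axis=0)            # mean squared displacement <x^2(t)>

# Fit MSD = 2*D*t through the origin to estimate D from the "microscopic" data.
D_est = np.sum(msd * t) / (2 * np.sum(t**2))
print(f"D (input) = {D_true:.4f},  D (from MSD fit) = {D_est:.4f}")

# Macroscopic check: at the final time the spread of walkers should match the
# Gaussian solution of the diffusion equation, i.e. variance = 2*D*t.
print(f"var(x) at t_final = {x[:, -1].var():.4f},  2*D*t_final = {2*D_true*t[-1]:.4f}")
```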

统计机器学习Lecture-3

Lecturer: Prof.XIA DONG

1. General linear regression model.

Diagram of a linear regression model

## 1.1 General linear regression model

  • 内容: the general linear regression model.

The fundamental equation:

\[y_i = \beta_0 + \beta_1x_{i1} + \dots + \beta_px_{ip} + \epsilon_i\]

And it correctly identifies the main goal: to estimate the parameters (the coefficients \(\beta_0, \beta_1, \dots, \beta_p\)) from data so we can make predictions on new data.

核心目标:通过数据估计参数(即系数 \(\beta_0, \beta_1, \dots, \beta_p\)),从而对新数据进行预测。

1.2 How do we actually find the best values for the \(\beta\) coefficients (parameter estimation)?

  • 内容: We find the best values for the \(\beta\) coefficients by finding the values that minimize the overall error of the model. The most common and fundamental method for this is called Ordinary Least Squares (OLS).

## The Main Method: Ordinary Least Squares (OLS) 普通最小二乘法 (OLS)

The core idea of OLS is to find the line (or hyperplane in multiple dimensions) that is as close as possible to all the data points simultaneously. OLS 的核心思想是找到一条尽可能同时接近所有数据点的直线(或多维超平面)。

1. Define the Error (Residuals) 误差

First, we need to define what “error” means. For any single data point, the error is the difference between the actual value (\(y_i\)) and the value predicted by our model (\(\hat{y}_i\)). This difference is called the residual. 首先,需要定义“误差”的含义。对于任何单个数据点,误差是实际值 (\(y_i\)) 与模型预测值 (\(\hat{y}_i\)) 之间的差值。这个差值称为残差

Residual = Actual Value - Predicted Value 残差 = 实际值 - 预测值 \[e_i = y_i - \hat{y}_i\]

You can visualize residuals as the vertical distance from each data point to the regression line. 可以将残差可视化为每个数据点到回归线的垂直距离。

2. The Cost Function: Sum of Squared Residuals 成本函数:残差平方和

We want to make all these residuals as small as possible. We can’t just add them up, because some are positive and some are negative, and they would cancel each other out. 所有残差尽可能小。不能简单地将它们相加,因为有些是正数,有些是负数,它们会相互抵消。

So, we square each residual (which makes them all positive) and then sum them up. This gives us the Sum of Squared Residuals (SSR), which is our “cost function.” 因此,将每个残差求平方(使它们都为正数),然后将它们相加。这就得到了残差平方和 (SSR),也就是“成本函数”。

\[SSR = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]

The goal of OLS is simple: find the values of \(\beta_0, \beta_1, \dots, \beta_p\) that make this SSR value as small as possible.

3. Solving for the Coefficients: The Normal Equation 求解系数:正态方程

For linear regression, calculus provides a direct, exact solution to this minimization problem. By taking the derivative of the SSR function with respect to each \(\beta\) coefficient and setting it to zero, we can solve for the optimal values. 对于线性回归,微积分为这个最小化问题提供了直接、精确的解。通过对 SSR 函数的每个 \(\beta\) 系数求导并将其设为零,就可以求解出最优值。

This process results in a formula known as the Normal Equation, which can be expressed cleanly using matrix algebra: 这个过程会得到一个称为正态方程的公式,它可以用矩阵代数清晰地表示出来:

\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]

  • \(\hat{\boldsymbol{\beta}}\) is the vector of our estimated coefficients.估计系数的向量。
  • \(\mathbf{X}\) is a matrix where each row is an observation and each column is a feature (with an added column of 1s for the intercept \(\beta_0\)).其中每一行代表一个观测值,每一列代表一个特征(截距 \(\beta_0\) 增加了一列全为 1 的值)。
  • \(\mathbf{y}\) is the vector of the actual response values.实际响应值的向量。

Statistical software and programming libraries (like Scikit-learn in Python) use this equation (or more computationally stable versions of it) to find the best coefficients for you instantly.
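As a concrete illustration of the Normal Equation, here is a minimal sketch with synthetic data, assuming NumPy; the coefficient values 3.0, 2.0, and -1.5 are made up for the example.

```python
import numpy as np

# Fit y = beta0 + beta1*x1 + beta2*x2 with the Normal Equation (synthetic data).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with a leading column of ones for the intercept beta0.
X = np.column_stack([np.ones(n), x1, x2])

# Normal Equation: beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print("beta_hat:", beta_hat)          # should be close to [3.0, 2.0, -1.5]

# In practice a linear solver is preferred over an explicit inverse (more stable).
beta_hat_solve = np.linalg.solve(X.T @ X, X.T @ y)
```

The explicit inverse is shown only to mirror the formula; `np.linalg.solve` (or `np.linalg.lstsq`) is the numerically safer route that libraries typically use under the hood.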

## An Alternative Method: Gradient Descent 梯度下降

While the Normal Equation gives a direct answer, it can be very slow if you have a massive number of features (e.g., hundreds of thousands). An alternative, iterative method used across machine learning is Gradient Descent.

The Intuition: Imagine the SSR cost function is a big valley. Your initial (random) \(\beta\) coefficients place you somewhere on the slope of this valley.

  1. Check the slope (the gradient) at your current position. 检查您当前位置的斜率(梯度)。
  2. Take a small step in the steepest downhill direction. 朝最陡的下坡方向迈出一小步。
  3. Repeat. You keep taking steps downhill until you reach the bottom of the valley. The bottom of the valley represents the minimum SSR, and your coordinates at that point are the optimal \(\beta\) coefficients. 重复。您继续向下走,直到到达山谷底部。谷底代表最小SSR,该点的坐标即为最优\(\beta\)系数。

The size of each “step” you take is controlled by a parameter called the learning rate. Gradient Descent is the foundational optimization algorithm for many complex models, including neural networks. 每次“步进”的大小由一个称为学习率的参数控制。梯度下降是许多复杂模型(包括神经网络)的基础优化算法。
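Here is a hedged sketch of gradient descent applied to the same least-squares cost, on synthetic data; the learning rate of 0.05 and the 2,000 iterations are illustrative, untuned choices.

```python
import numpy as np

# Gradient descent for a one-feature linear model (synthetic data, illustrative only).
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.0 + 4.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])   # intercept column + one feature
beta = np.zeros(2)                     # start somewhere on the cost "valley" wall
lr = 0.05                              # learning rate (step size)

for _ in range(2000):
    residuals = X @ beta - y           # predicted minus actual
    grad = 2 * X.T @ residuals / n     # gradient of the mean squared residual
    beta -= lr * grad                  # step in the steepest downhill direction

print("gradient descent:", beta)       # approaches the OLS solution
print("normal equation :", np.linalg.solve(X.T @ X, X.T @ y))
```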

## Summary: OLS vs. Gradient Descent

| Feature | Ordinary Least Squares (OLS) | Gradient Descent |
| --- | --- | --- |
| How it works | Direct calculation using the Normal Equation. | Iterative; takes steps to minimize error. |
| Pros | Provides an exact, optimal solution. No parameters to tune. | More efficient for very large datasets. Very versatile. |
| Cons | Can be computationally expensive with many features. | Requires choosing a learning rate. May not find the exact minimum. |

2. Simple Linear Regression


2.1 Simple Linear Regression

  • 内容: Simple Linear Regression: a special case of the general model you showed earlier where you only have one predictor variable (\(p=1\)).

## The Model and the Goal 模型和目标

Sets up the simplified equation for a line: \[y_i = \beta_0 + \beta_1x_i + \epsilon_i\] * \(y_i\) is the outcome you want to predict.要预测的结果。 * \(x_i\) is your single input feature or covariate.单个输入特征或协变量。 * \(\beta_1\) is the slope of the line. It tells you how much \(y\) is expected to increase for a one-unit increase in \(x\).表示 \(x\) 每增加一个单位,\(y\) 预计会增加多少。 * \(\beta_0\) is the intercept. It’s the predicted value of \(y\) when \(x\) is zero.当 \(x\) 为零时 \(y\) 的预测值。 * \(\epsilon_i\) is the random error term.是随机误差项。

The goal, stated as “Minimize the sum of squares of err,” is exactly the Ordinary Least Squares (OLS) method we just discussed. It’s written here as: \[\min_{a,b} \sum_{i=1}^{n} (y_i - a - bx_i)^2\] This is just a different way of writing the same thing, where they use a for the intercept (\(\beta_0\)) and b for the slope (\(\beta_1\)). You’re trying to find the specific values of the slope and intercept that make the sum of all the squared errors as small as possible. 目标,即“最小化误差平方和”,正是普通最小二乘法 (OLS)。: \[\min_{a,b} \sum_{i=1}^{n} (y_i - a - bx_i)^2\] 这是另一种写法,其中用 a 表示截距 (\(\beta_0\)),b 表示斜率 (\(\beta_1\))。尝试找到斜率和截距的具体值,使得所有平方误差之和尽可能小。

## The Solution: The Estimator Formulas 解决方案:估计公式

The most important part of this slide is the solution. For the simple case with only one variable, you don’t need complex matrix algebra (the Normal Equation). Instead, the minimization problem can be solved with these two straightforward formulas: 对于只有一个变量的简单情况,不需要复杂的矩阵代数(正态方程)。相反,最小化问题可以用以下两个简单的公式来解决:

1. The Slope: \(\hat{\beta}_1\)

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\] * Intuition: This formula might look complex, but it’s actually very intuitive. * The numerator, \(\sum(x_i - \bar{x})(y_i - \bar{y})\), is closely related to the covariance between X and Y. It measures whether X and Y tend to move in the same direction (positive slope) or in opposite directions (negative slope). 与 X 和 Y 之间的协方差密切相关。它衡量 X 和 Y 是倾向于朝相同方向(正斜率)还是朝相反方向(负斜率)移动。 * The denominator, \(\sum(x_i - \bar{x})^2\), is related to the variance of X. It measures how much X varies on its own. 它衡量 X 自身的变化量。 * In short, the slope is a measure of how X and Y vary together, scaled by how much X varies by itself. 斜率衡量的是 X 和 Y 共同变化的程度,并以 X 自身的变化量为标度。

2. The Intercept: \(\hat{\beta}_0\) 截距

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\] * Intuition: This formula is even simpler and has a wonderful geometric meaning. It ensures that the line of best fit always passes through the “center of mass” of the data, which is the point of averages \((\bar{x}, \bar{y})\). 它确保最佳拟合线始终穿过数据的“质心”,即平均值 \((\bar{x}, \bar{y})\) 的点。计算出最佳斜率 (\(\hat{\beta}_1\)) 后,就可以将其代入此公式。然后,可以调整截距 (\(\hat{\beta}_0\)),使直线完美地围绕数据云的中心点旋转。 * Once you’ve calculated the best slope (\(\hat{\beta}_1\)), you can plug it into this formula. You then adjust the intercept (\(\hat{\beta}_0\)) so that the line pivots perfectly around the central point of your data cloud.

In summary, this slide provides the precise, closed-form formulas to calculate the slope and intercept for the line of best fit in a simple linear regression model.
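A short sketch of these two closed-form estimators, using a tiny made-up data set (the numbers are illustrative only), assuming NumPy:

```python
import numpy as np

# Closed-form slope and intercept for simple linear regression (made-up data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope: how x and y vary together, scaled by how much x varies by itself.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: forces the fitted line through the point of averages (x_bar, y_bar).
beta0_hat = y_bar - beta1_hat * x_bar

print(f"slope = {beta1_hat:.3f}, intercept = {beta0_hat:.3f}")
```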

3. Statistical Inference

## 3.1 Statistical Inference

  • 内容: Statistical inference: these two slides are deeply connected and explain how we go from just calculating the coefficients to understanding how accurate and reliable they are. 解释了我们如何从仅仅计算系数到理解它们的准确性和可靠性。

## The Core Problem: Quantifying Uncertainty 量化不确定性

The second slide poses the fundamental questions: * “How accurate are \(\hat{\beta}_0\) and \(\hat{\beta}_1\)?”准确性如何? * “What are the distributions of \(\hat{\beta}_0\) and \(\hat{\beta}_1\)?”分布是什么?

The reason we ask this is that our estimated coefficients (\(\hat{\beta}_0, \hat{\beta}_1\)) were calculated from a specific sample of data. If we collected a different random sample from the same population, we would get slightly different estimates.估计的系数 (\(\hat{\beta}_0, \hat{\beta}_1\)) 是根据特定的数据样本计算出来的。如果我们从同一总体中随机抽取不同的样本,我们得到的估计值会略有不同。

The goal of statistical inference is to use the estimates from our single sample to make conclusions about the true, unknown population parameters (\(\beta_0, \beta_1\)) and to quantify our uncertainty about them.统计推断的目标是利用单个样本的估计值得出关于真实、未知的总体参数\(\beta_0, \beta_1\))的结论,并量化对这些参数的不确定性。

## The Key Assumption That Makes It Possible 实现这一目标的关键假设

To figure out the distribution of our estimates, we must make an assumption about the distribution of the errors. This is the most important assumption in linear regression for inference: 为了确定估计值的分布,必须对误差的分布做出假设。这是线性回归推断中最重要的假设: Assumption: \(\epsilon_i \stackrel{\text{i.i.d.}}{\sim} N(0, \sigma^2)\)

This means we assume the random error terms are: * Normally distributed (\(N\)).* 正态分布\(N\))。 * Have a mean of zero (our model is correct on average).* 均值为(模型平均而言是正确的)。 * Have a constant variance \(\sigma^2\) (homoscedasticity).* 方差为常数\(\sigma^2\)(方差齐性)。 * Are independent and identically distributed (i.i.d.), meaning each error is independent of the others.* 是独立同分布(i.i.d.)的,这意味着每个误差都独立于其他误差。

Why is this important? Because our coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are calculated as weighted sums of the \(y_i\) values, and the \(y_i\) values depend on the errors \(\epsilon_i\). This assumption about the errors allows us to prove that our estimated coefficients themselves are also normally distributed. 系数 \(\hat{\beta}_0\)\(\hat{\beta}_1\) 是通过 \(y_i\) 值的加权和计算的,而 \(y_i\) 值取决于误差 \(\epsilon_i\)。这个关于误差的假设使能够证明估计的系数本身也服从正态分布。

## The Solution: The Theorem and the t-distribution 定理和 t 分布

The first slide provides the central theorem that allows us to perform inference. It tells us exactly how to standardize our estimated coefficients so they follow a known distribution. 第一张幻灯片提供了进行推断的核心定理。它准确地告诉我们如何对估计的系数进行标准化,使其服从已知的分布。

1. The Standard Error (s.e.) 标准误差 (s.e.)

First, look at the denominators in the red dotted boxes. These are the standard errors of the coefficients, s.e.($\hat{\beta}_1$) and s.e.($\hat{\beta}_0$). 首先，看红色虚线框中的分母。它们是系数的标准误差 s.e.($\hat{\beta}_1$) 和 s.e.($\hat{\beta}_0$)。

  • What it is: The standard error is the estimated standard deviation of the coefficient’s sampling distribution. In simpler terms, it’s a measure of the average amount by which our estimate \(\hat{\beta}_1\) would differ from the true \(\beta_1\) if we were to repeat the experiment many times. 标准误差是系数抽样分布的标准差估计值。简单来说,它衡量的是如果我们重复实验多次,我们估计的 \(\hat{\beta}_1\) 与真实的 \(\beta_1\) 之间的平均差异。
  • A smaller standard error means a more precise and reliable estimate. 标准误差越小,估计值越精确可靠。

2. The t-statistic t 统计量

The theorem shows two fractions that form a t-statistic. The general structure for this is: 该定理展示了两个构成t 统计量的分数。其一般结构如下: \[t = \frac{\text{ (Sample Estimate - True Value) }}{\text{ Standard Error of the Estimate }}\]

For \(\beta_1\), this is: \(\frac{\hat{\beta}_1 - \beta_1}{\text{s.e.}(\hat{\beta}_1)}\).

The key insight is that this specific quantity follows a Student’s t-distribution with \(n-2\) degrees of freedom. 关键在于，这个特定量服从学生 t 分布，其自由度为 \(n-2\)。

  • Student’s t-distribution: This is a probability distribution that looks very similar to the normal distribution but has slightly “heavier” tails. We use it instead of the normal distribution because we had to estimate the standard deviation of the errors (s in the formula), which adds extra uncertainty. 这是一种概率分布，与正态分布非常相似，但尾部略重。使用它来代替正态分布，是因为必须估计误差的标准差（公式中的 s），这会增加额外的不确定性。
  • Degrees of Freedom (n-2): We start with n data points, but we lose two degrees of freedom because we used the data to estimate two parameters: \(\beta_0\) and \(\beta_1\). 从 n 个数据点开始，但由于用这些数据估计了两个参数：\(\beta_0\) 和 \(\beta_1\)，因此损失了两个自由度。

3. Estimating the Error Variance (\(s^2\)) 估计误差方差 (\(s^2\))

To calculate the standard errors, we need a value for s, which is our estimate of the true error standard deviation \(\sigma\). This is calculated from the Residual Sum of Squares (RSS). 为了计算标准误差，我们需要一个 s 的值，它是对真实误差标准差 \(\sigma\) 的估计值。该值由残差平方和 (RSS) 计算得出。

  • RSS: First, we calculate the RSS = \(\sum(y_i - \hat{y}_i)^2\), which is the sum of all the squared errors. RSS：首先，计算 RSS = \(\sum(y_i - \hat{y}_i)^2\)，即所有平方误差之和。
  • \(s^2\): Then, we find the estimate of the error variance: \(s^2 = \text{RSS} / (n-2)\). We divide by \(n-2\) to get an unbiased estimate. \(s^2\)：然后，计算误差方差的估计值：\(s^2 = \text{RSS} / (n-2)\)。我们将其除以 \(n-2\) 即可得到无偏估计值。
  • s is simply the square root of \(s^2\). This s is the value used in the standard error formulas. s 就是 \(s^2\) 的平方根。这个 s 是标准误差公式中使用的值。
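A worked sketch of these quantities for simple regression (made-up data; the standard-error expressions used below are the usual textbook formulas for the one-predictor case, which the slide leaves implicit), assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

# Standard errors, t-statistic, and p-value for simple regression (synthetic data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar
y_hat = beta0 + beta1 * x

rss = np.sum((y - y_hat) ** 2)          # Residual Sum of Squares
s2 = rss / (n - 2)                      # unbiased estimate of sigma^2
s = np.sqrt(s2)

# Standard errors of the simple-regression coefficients.
sxx = np.sum((x - x_bar) ** 2)
se_beta1 = s / np.sqrt(sxx)
se_beta0 = s * np.sqrt(1.0 / n + x_bar**2 / sxx)

# t-statistic and two-sided p-value for H0: beta1 = 0, with n-2 degrees of freedom.
t_stat = beta1 / se_beta1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"beta1 = {beta1:.3f}, s.e. = {se_beta1:.3f}, t = {t_stat:.2f}, p = {p_value:.2g}")
```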

## What This Allows Us To Do (The Practical Use)

Because we know the exact distribution of our t-statistic, we can now achieve our goal of quantifying uncertainty: 因为知道 t 统计量的精确分布,所以现在可以实现量化不确定性的目标:

  1. Hypothesis Testing: We can test if a predictor is actually useful. The most common test is for the null hypothesis \(H_0: \beta_1 = 0\). If we can prove the observed \(\hat{\beta}_1\) is very unlikely to occur if the true \(\beta_1\) were zero, we can conclude there is a statistically significant relationship between \(x\) and \(y\). 可以检验一个预测变量是否真的有用。最常见的检验是零假设 \(H_0: \beta_1 = 0\)。如果能证明,当真实的 \(\beta_1\) 为零时,观测到的 \(\hat{\beta}_1\) 不太可能发生,那么就可以得出结论,\(x\)\(y\) 之间存在统计学上的显著关系。
  2. Confidence Intervals: We can construct a range of plausible values for the true coefficient. For example, we can calculate a 95% confidence interval for \(\beta_1\). This gives us a range where we are 95% confident the true value of \(\beta_1\) lies. 可以为真实系数构建一系列合理的值。

4. Multiple Linear Regression

## 4.1 Multiple Linear Regression

  • 内容: Multiple Linear Regression:

Here’s a detailed breakdown that connects both slides.

## The Model: From One to Many Predictors 从单预测变量到多预测变量

The first slide introduces the Multiple Linear Regression model. This is a direct extension of the simple model, but instead of using just one predictor variable, we use multiple (\(p\)) predictors to explain our response variable. 多元线性回归模型是简单模型的直接扩展,但不是只使用一个预测变量,而是使用多个(\(p\))预测变量来解释响应变量。

The general formula is: \[y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \dots + \beta_px_{ip} + \epsilon_i\]

Key Change in Interpretation

This is the most important new concept. In simple regression, \(\beta_1\) was just the slope. In multiple regression, each coefficient has a more nuanced meaning: 在简单回归中,\(\beta_1\) 只是斜率。在多元回归中,每个系数都有更微妙的含义:

\(\beta_j\) is the average change in \(y\) for a one-unit increase in \(x_j\), while holding all other predictors constant.

This is incredibly powerful. Using the advertising example from your slide: * \(y_i = \beta_0 + \beta_1(\text{TV}_i) + \beta_2(\text{Radio}_i) + \beta_3(\text{Newspaper}_i) + \epsilon_i\) * \(\beta_1\) represents the effect of TV advertising on sales, after controlling for the amount spent on Radio and Newspaper ads. This allows you to isolate the unique contribution of each advertising channel.表示在控制广播和报纸广告支出后,电视广告对销售额的影响。这可以让您区分每个广告渠道的独特贡献。

## The Solution: Deriving the Normal Equation 推导正态方程

The second slide shows the mathematical process for finding the best coefficients (\(\beta_0, \beta_1, \dots, \beta_p\)) using the Ordinary Least Squares (OLS) method. It’s essentially a condensed derivation of the Normal Equation. 使用普通最小二乘法 (OLS) 寻找最佳系数 (\(\beta_0, \beta_1, \dots, \beta_p\)) 的数学过程。它本质上是正态方程的简化推导。

1. The Goal: Minimizing the Sum of Squares 最小化平方和

Just like before, our goal is to minimize the sum of the squared errors (or residuals): 目标是最小化平方误差(或残差)之和。

  • Scalar Form: \(\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1x_{i1} - \beta_2x_{i2} - \beta_3x_{i3})^2\)
    • This is easy to read but gets very long with more variables. 这种形式易于阅读，但变量越多，式子就越长。
  • Vector Form: \(\sum_{i=1}^{n} (y_i - \boldsymbol{\beta}^T \mathbf{x}_i)^2\)
    • This is a more compact and powerful way to write the same thing using linear algebra, where \(\boldsymbol{\beta}^T \mathbf{x}_i\) is the dot product that calculates the entire predicted value \(\hat{y}_i\). 这是一种更简洁、更强大的线性代数表示方法,其中 \(\boldsymbol{\beta}^T \mathbf{x}_i\) 是计算整个预测值 \(\hat{y}_i\) 的点积。

2. The Method: Using Calculus to Find the Minimum 使用微积分求最小值

To find the set of \(\beta\) values that results in the lowest possible error, we use calculus.

  • The Derivative (Gradient): Since our error function depends on multiple \(\beta\) coefficients, we can’t take a simple derivative. Instead, we take the gradient, which is a vector of partial derivatives (one for each coefficient). This tells us the “slope” of the error function in every direction. 导数(梯度) 误差函数依赖于多个 \(\beta\) 系数,因此我们不能简单地求导数。相反,采用梯度,它是一个由偏导数组成的向量(每个系数对应一个偏导数)。这告诉误差函数在各个方向上的“斜率”。

  • Setting the Gradient to Zero: The minimum of a function occurs where its slope is zero (the very bottom of the error “valley”). The slide shows the result of taking this gradient and setting it to zero.函数的最小值出现在其斜率为零的地方(即误差“谷底”的最低点)。幻灯片展示了取此梯度并将其设为零的结果。

The equation shown on the slide: \[2 \sum_{i=1}^{n} (\boldsymbol{\beta}^T \mathbf{x}_i - y_i)\mathbf{x}_i^T = 0\] …is the result of this calculus step. The goal is now to algebraically rearrange this equation to solve for \(\boldsymbol{\beta}\). 是这一微积分步骤的结果。现在的目标是用代数方法重新排列这个方程,以求解 \(\boldsymbol{\beta}\)

3. The Result: The Normal Equation 正则方程

After rearranging the equation from the previous step and expressing the sums in their full matrix form, we arrive at a clean and beautiful solution. While the slide doesn’t show the final step, the result of “Setting the gradient zero and solve \(\beta\)” is the Normal Equation: 重新排列上一步中的方程,并将和表示为完整的矩阵形式后,得到了一个简洁美观的解。虽然幻灯片没有展示最后一步,“设置梯度零点并求解 \(\beta\)” 的结果就是正态方程

\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]

  • \(\hat{\boldsymbol{\beta}}\) is the vector of our optimal coefficient estimates.
  • \(\mathbf{X}\) is the “design matrix” where each row is an observation and each column is a predictor variable. \(\mathbf{X}\) 是“设计矩阵”,其中每一行代表一个观测值,每一列代表一个预测变量。
  • \(\mathbf{y}\) is the vector of our response variable. \(\mathbf{y}\) 是我们的响应变量的向量。

This single equation is the general solution for finding the OLS coefficients for any linear regression model, no matter how many predictors you have. This is what statistical software calculates for you under the hood. 无论有多少个预测变量,这个简单的方程都是任何线性回归模型中 OLS 系数的通解。

5. Matrix notation

  • 内容: This slide introduces the matrix notation for multiple linear regression, which is a powerful way to represent the entire system of equations in a compact form. This notation isn’t just for tidiness—it’s the foundation for how the solutions are derived and calculated in software.

多元线性回归的矩阵符号,这是一种以紧凑形式表示整个方程组的有效方法。这种符号不仅仅是为了简洁,它还是软件中推导和计算解的基础。 Here is a more detailed breakdown.

## Why Use Matrix Notation?

Imagine you have 10,000 observations (\(n=10,000\)) and 5 predictor variables (\(p=5\)). Writing out the model equation for each observation would be impossible: \(y_1 = \beta_0 + \beta_1x_{11} + \dots + \beta_5x_{15} + \epsilon_1\) \(y_2 = \beta_0 + \beta_1x_{21} + \dots + \beta_5x_{25} + \epsilon_2\) …and so on for 10,000 lines.

假设你有 10,000 个观测值(n=10,000)和 5 个预测变量(p=5)。为每个观测值写出模型方程是不可能的: \(y_1 = \beta_0 + \beta_1x_{11} + \dots + \beta_5x_{15} + \epsilon_1\) \(y_2 = \beta_0 + \beta_1x_{21} + \dots + \beta_5x_{25} + \epsilon_2\) ……以此类推,直到 10,000 行。 Matrix notation allows us to consolidate this entire system into a single, elegant equation:矩阵符号使我们能够将整个系统合并成一个简洁的方程: \[\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\] Let’s break down each component shown on your slide.

## The Components Explained

1. The Design Matrix: \(\mathbf{X}\) 设计矩阵

\[\mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}\] This is the most important matrix. It contains all of your predictor variable data.这是最重要的矩阵。它包含所有预测变量数据。 * Rows: Each row represents a single observation (e.g., a person, a company, a day). There are n rows.每一行代表一个观察值(例如,一个人、一家公司、一天)。共有 n 行。 * Columns: Each column represents a predictor variable. There are p predictor columns, plus one special column.每列代表一个预测变量。共有 p 个预测列,外加一个特殊列。 * The Column of Ones: This is a crucial detail. This first column of all ones is a placeholder for the intercept term (\(\beta_0\)). When you perform matrix multiplication, this 1 gets multiplied by \(\beta_0\), ensuring the intercept is included in the model for every single observation. 这是一个至关重要的细节。第一列(全 1)是截距项 (\(\beta_0\)) 的占位符。执行矩阵乘法时,这个 1 会乘以 \(\beta_0\),以确保截距包含在模型中,适用于每个观测值。

2. The Coefficient Vector: \(\boldsymbol{\beta}\) 系数向量

\[\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\] This is a column vector that contains all the model parameters—the unknown values we want to estimate. The goal of linear regression is to find the numerical values for this vector.

3. The Response Vector: \(\mathbf{y}\) 响应向量

\[\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}\] This is a column vector containing all the observed outcomes you are trying to predict (e.g., sales, test scores, stock prices).

4. The Error Vector: \(\boldsymbol{\epsilon}\) 误差向量

\[\boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}\] This column vector bundles together all the individual, unobserved random errors. It represents the portion of y that our model cannot explain with X.

## Putting It All Together

When you write the equation \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\), you are actually representing the entire system of individual equations.

Let’s look at the multiplication \(\mathbf{X}\boldsymbol{\beta}\): \[\begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ 1 & x_{21} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} = \begin{pmatrix} 1\cdot\beta_0 + x_{11}\cdot\beta_1 + \dots + x_{1p}\cdot\beta_p \\ 1\cdot\beta_0 + x_{21}\cdot\beta_1 + \dots + x_{2p}\cdot\beta_p \\ \vdots \\ 1\cdot\beta_0 + x_{n1}\cdot\beta_1 + \dots + x_{np}\cdot\beta_p \end{pmatrix}\] As you can see, the result of this multiplication is a single column vector where each row is the “predictor” part of the regression equation for that observation. 此乘法的结果是一个单列向量,其中每一行都是该观测值的回归方程的“预测变量”部分。

By setting this equal to \(\mathbf{y} - \boldsymbol{\epsilon}\), you perfectly recreate the entire set of n equations in one clean statement. This compact form is what allows us to easily derive and compute the Normal Equation solution: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\).这种紧凑形式使我们能够轻松推导和计算正态方程的解
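A tiny sketch (assuming NumPy, with made-up numbers) showing that the single matrix product \(\mathbf{X}\boldsymbol{\beta}\) reproduces the row-by-row predictions:

```python
import numpy as np

# Build the design matrix X (with a column of ones for the intercept) and check
# that X @ beta equals beta0 + beta1*x_i1 + beta2*x_i2 for every observation.
x1 = np.array([0.5, 1.0, 1.5, 2.0])
x2 = np.array([3.0, 2.0, 1.0, 0.0])
X = np.column_stack([np.ones(len(x1)), x1, x2])   # shape (n, p+1)

beta = np.array([1.0, 2.0, -0.5])                  # [beta0, beta1, beta2], made up

matrix_form = X @ beta
row_form = beta[0] + beta[1] * x1 + beta[2] * x2
print(np.allclose(matrix_form, row_form))          # True: one equation, whole system
```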

6. the core mathematical conclusion of Ordinary Least Squares (OLS)

  • 内容: These slides present the core mathematical conclusion of Ordinary Least Squares (OLS) and a key geometric property that explains why this solution works. 展示了普通最小二乘法 (OLS) 的核心数学结论，以及一个关键的几何性质，解释了该解决方案为何有效。 Let’s break down the concepts and the calculation processes in detail.

## Part 1: The Objective and the Solution (Slide 1) 最小化几何距离

This slide summarizes the entire OLS problem and its solution in the language of matrix algebra.

The Concept: Minimizing Geometric Distance

“最小二乘准则”是我们模型的目标。 The “least squares criterion” is the objective of our model. The slide shows it in two equivalent forms:

  1. Summation Form: \(\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1x_{i1} - \dots - \beta_px_{ip})^2\) This is the sum of the squared differences between the actual values (\(y_i\)) and the predicted values. 这是实际值 (\(y_i\)) 与预测值之差的平方和。
  2. Matrix Form: \(||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\) This is the more powerful way to view the problem. Think of \(\mathbf{y}\) (the vector of all actual outcomes) and \(\mathbf{X}\boldsymbol{\beta}\) (the vector of all predicted outcomes) as two points in an n-dimensional space. The expression \(||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2\) represents the squared Euclidean distance between these two points. 将 \(\mathbf{y}\)(所有实际结果的向量)和 \(\mathbf{X}\boldsymbol{\beta}\)(所有预测结果的向量)视为 n 维空间中的两个点。表达式 \(||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2\) 表示这两点之间的平方欧氏距离。 Therefore, the OLS problem is a geometric one: Find the coefficient vector \(\boldsymbol{\beta}\) that makes the predicted values vector \(\mathbf{X}\boldsymbol{\beta}\) as close as possible to the actual values vector \(\mathbf{y}\). 因此,OLS 问题是一个几何问题:找到一个系数向量 \(\boldsymbol{\beta}\),使预测值向量 \(\mathbf{X}\boldsymbol{\beta}\) 尽可能接近实际值向量 \(\mathbf{y}\)

The Solution: The Least Squares Estimator (LSE)最小二乘估计器 (LSE)

The slide provides the direct solution to this minimization problem, which is the Normal Equation:此最小化问题的直接解,即正态方程

\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]

This formula gives you the exact vector of coefficients \(\hat{\boldsymbol{\beta}}\) that minimizes the squared distance. We get this formula by taking the gradient (the multidimensional version of a derivative) of the distance function with respect to \(\boldsymbol{\beta}\), setting it to zero, and solving, as hinted at in your previous slides. 给出了使平方距离最小化的精确系数向量 通过取距离函数关于 \(\boldsymbol{\beta}\) 的梯度(导数的多维版本),将其设为零,然后求解,即可得到此公式。 Finally, the slide defines: * Fitted values: \(\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}\) (The vector of predictions using our optimal coefficients). 拟合值 * Residuals: \(\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}}\) (The vector of errors, representing the difference between actuals and predictions).误差向量,表示实际值与预测值之间的差异

## Part 2: The Geometric Property and Proofs (Slide 2)几何性质及证明

This slide explains a beautiful and fundamental property of the least squares solution: orthogonality.解释了最小二乘解的一个美妙而基本的性质:正交性

The Concept: Orthogonality of Residuals残差的正交性

The main idea is that the residual vector \(\hat{\boldsymbol{\epsilon}}\) is orthogonal (perpendicular) to every predictor variable in your model. 主要思想是残差向量 \(\hat{\boldsymbol{\epsilon}}\) 与模型中的每个预测变量正交(垂直)。

  • Geometric Intuition: Think of the columns of your matrix \(\mathbf{X}\) (i.e., your predictors and the intercept) as defining a flat surface, or a “hyperplane,” in a high-dimensional space. Your actual data vector \(\mathbf{y}\) exists somewhere in this space, likely not on the hyperplane. The OLS process finds the point on that hyperplane, \(\hat{\mathbf{y}}\), that is closest to \(\mathbf{y}\). The shortest line from a point to a plane is always one that is perpendicular to the plane. The residual vector, \(\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}}\), is that line. 将矩阵 \(\mathbf{X}\) 的列(即预测变量和截距)想象成在高维空间中定义一个平面或“超平面”。实际数据向量 \(\mathbf{y}\) 存在于该空间的某个位置,可能不在超平面上。OLS 过程会在该超平面 \(\hat{\mathbf{y}}\) 上找到与 \(\mathbf{y}\) 最接近的点。从一个点到一个平面的最短线始终是与该平面垂直的线。残差向量 \(\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}}\) 就是这条直线。

  • Mathematical Statement: This geometric property is stated as \(\mathbf{X}^T \hat{\boldsymbol{\epsilon}} = \mathbf{0}\). This equation means that the dot product of the residual vector with every column of \(\mathbf{X}\) is zero, which is the mathematical definition of orthogonality. 该等式意味着残差向量与 \(\mathbf{X}\) 每一列的点积都为零,这正是正交性的数学定义。

The Calculation Process (The Proofs)

1. Proof of Orthogonality: The slide shows a step-by-step calculation to prove that \(\mathbf{X}^T \hat{\boldsymbol{\epsilon}}\) is indeed zero (a numerical check of this property is sketched after these proofs).
  • Step 1: Start with the expression to be proven: \(\mathbf{X}^T \hat{\boldsymbol{\epsilon}}\) 从待证明的表达式开始
  • Step 2: Substitute the definition of the residual, \(\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}\): \[\mathbf{X}^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})\] 代入残差的定义
  • Step 3: Distribute the \(\mathbf{X}^T\): \[\mathbf{X}^T \mathbf{y} - \mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}}\] 分配 \(\mathbf{X}^T\)
  • Step 4: Substitute the Normal Equation for \(\hat{\boldsymbol{\beta}}\): \[\mathbf{X}^T \mathbf{y} - \mathbf{X}^T\mathbf{X} [(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}]\]
  • Step 5: The key step is the cancellation. A matrix \((\mathbf{X}^T\mathbf{X})\) multiplied by its inverse \((\mathbf{X}^T\mathbf{X})^{-1}\) equals the identity matrix \(\mathbf{I}\), which acts like the number 1 in multiplication. 关键步骤是消去。 \[\mathbf{X}^T \mathbf{y} - \mathbf{I} \mathbf{X}^T\mathbf{y} = \mathbf{X}^T \mathbf{y} - \mathbf{X}^T\mathbf{y} = \mathbf{0}\]

This completes the proof, showing that the orthogonality property is a direct consequence of the Normal Equation solution.

2. Proof of LSE: This is a more abstract proof showing that our \(\hat{\boldsymbol{\beta}}\) truly gives the minimum possible error. It uses the orthogonality property and the Pythagorean theorem for vectors. It essentially shows that for any other possible coefficient vector \(\boldsymbol{v}\), the error \(||\mathbf{y} - \mathbf{X}\boldsymbol{v}||^2\) will always be greater than or equal to the error from our LSE, \(||\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}||^2\).
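As promised above, a quick numerical check of the orthogonality property on synthetic data (a minimal sketch, assuming NumPy; the data are random and illustrative):

```python
import numpy as np

# Verify numerically that X^T (y - X beta_hat) is (up to round-off) the zero vector.
rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # Normal Equation solution
residuals = y - X @ beta_hat

print(X.T @ residuals)   # ~ zero vector, exactly as the proof predicts
```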

7. Geometric interpretation

  • 内容:

These two slides together provide a powerful geometric interpretation of how Ordinary Least Squares (OLS) works, centered on the concepts of orthogonality and projection. 以正交性投影的概念为中心,从几何角度有力地诠释了普通最小二乘法 (OLS) 的工作原理。

Here’s a detailed summary of the concepts and the processes they describe.

## Summary

These slides explain that the process of finding the “best fit” line in regression is geometrically equivalent to projecting the actual data vector (\(\mathbf{y}\)) onto a hyperplane defined by the predictor variables (\(\mathbf{X}\)). This projection splits the actual data into two perpendicular components: 解释了回归分析中寻找“最佳拟合”直线的过程,其几何意义等同于将实际数据向量 (\(\mathbf{y}\)) 投影到由预测变量 (\(\mathbf{X}\)) 定义的超平面上。此投影将实际数据拆分为两个垂直分量:

  1. The Fitted Values (\(\hat{\mathbf{y}}\)): The part of the data that is perfectly explained by the model (the projection). 数据中能够被模型完美解释的部分(投影)。
  2. The Residuals (\(\hat{\boldsymbol{\epsilon}}\)): The part of the data that is unexplained (the error), which is perpendicular to the explained part. 数据中无法解释的部分(误差),它与被解释部分垂直。 A special tool called the projection matrix (H), or “hat matrix,” is introduced as the operator that performs this projection. 引入一个称为投影矩阵 (H)(或称“帽子矩阵”)的特殊工具,作为执行此投影的运算符。

## Concepts and Process Explained in Detail

1. The Fitted Values as a Linear Combination 拟合值作为线性组合

The first slide starts by stating that the fitted value vector \(\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}\) is a linear combination of the columns of \(\mathbf{X}\) (your predictors).

  • Concept: This means that the vector of fitted values, \(\hat{\mathbf{y}}\), must lie within the geometric space (a line, plane, or hyperplane) spanned by your predictor variables. The model is incapable of producing a prediction that does not live in this space. 这意味着拟合值向量 \(\hat{\mathbf{y}}\) 必须位于预测变量所构成的几何空间（直线、平面或超平面）内。模型无法生成不存在于此空间的预测。

2. The Projection Matrix (The “Hat Matrix”) 投影矩阵（“帽子矩阵”）

The second slide introduces the tool that makes this projection happen: the projection matrix, also called the hat matrix, H.

  • Definition: \(\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\)

  • Process: This matrix has a special job. When you multiply it by any vector (like our data vector \(\mathbf{y}\)), it projects that vector onto the space spanned by the columns of \(\mathbf{X}\). We can see this by starting with our definition of fitted values and substituting the normal equation solution for \(\hat{\boldsymbol{\beta}}\): \[\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}] = [\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T]\mathbf{y}\] This shows that: \[\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\] This is why H is nicknamed the hat matrix—it “puts the hat” on \(\mathbf{y}\). 这个矩阵有其特殊的用途。当你将它乘以任何向量（例如我们的数据向量 \(\mathbf{y}\)）时，它会将该向量投影到由 \(\mathbf{X}\) 的列所跨越的空间上。

3. The Orthogonality of Fitted Values and Residuals 拟合值和残差的正交性

This is the central concept of the first slide and a fundamental property of least squares.

  • Concept: The fitted value vector (\(\hat{\mathbf{y}}\)) and the residual vector (\(\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}}\)) are orthogonal (perpendicular) to each other.

  • Mathematical Statement: Their dot product is zero: \(\hat{\mathbf{y}}^T(\mathbf{y} - \hat{\mathbf{y}}) = 0\).

  • Geometric Intuition: This means the vectors \(\mathbf{y}\), \(\hat{\mathbf{y}}\), and \(\hat{\boldsymbol{\epsilon}}\) form a right-angled triangle in n-dimensional space. The actual data vector \(\mathbf{y}\) is the hypotenuse, while the model’s prediction \(\hat{\mathbf{y}}\) and the error \(\hat{\boldsymbol{\epsilon}}\) are the two perpendicular legs. 这意味着向量 \(\mathbf{y}\)\(\hat{\mathbf{y}}\)\(\hat{\boldsymbol{\epsilon}}\) 在 n 维空间中构成一个直角三角形。实际数据向量 \(\mathbf{y}\) 是斜边,而模型的预测值 \(\hat{\mathbf{y}}\) 和误差值 \(\hat{\boldsymbol{\epsilon}}\) 是两条垂直边。

4. The Pythagorean Theorem of Least Squares

The orthogonality relationship directly implies the Pythagorean theorem.

  • Formula: \(||\mathbf{y}||^2 = ||\hat{\mathbf{y}}||^2 + ||\mathbf{y} - \hat{\mathbf{y}}||^2\)
  • Concept: This is one of the most important equations in statistics, as it partitions the total variance in the data. 这是统计学中最重要的方程之一,因为它可以分割数据中的总方差。
    • \(||\mathbf{y}||^2\) is proportional to the Total Sum of Squares (TSS): The total variation of the response variable around its mean.响应变量围绕其均值的总变异。
    • \(||\hat{\mathbf{y}}||^2\) is proportional to the Explained Sum of Squares (ESS): The portion of the total variation that is explained by your regression model.回归模型可以解释的总变异部分。
    • \(||\mathbf{y} - \hat{\mathbf{y}}||^2\) is the Residual Sum of Squares (RSS): The portion of the total variation that is left unexplained (the error).总变异中未解释的部分(即误差)。

This relationship, Total Variation = Explained Variation + Unexplained Variation, is the foundation for calculating metrics like R-squared (\(R^2\)), which measures the goodness of fit of your model. 总变异 = 解释变异 + 未解释变异,是计算R 平方 (\(R^2\)) 等指标的基础,该指标用于衡量模型的拟合优度。

5. Residuals and the Identity Matrix 残差和单位矩阵

Finally, the second slide shows that just as H projects onto the “model space,” a related matrix projects onto the “error space.” 最后,第二张幻灯片显示,正如H 投影到“模型空间”一样,相关矩阵也会投影到“误差空间”。 * Process: We can express the residuals using the hat matrix: \[\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y}\] The matrix \((\mathbf{I} - \mathbf{H})\) is also a projection matrix. It takes the original data vector \(\mathbf{y}\) and projects it onto the space that is orthogonal to all of your predictors, giving you the residual vector directly.
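A sketch tying these pieces together on synthetic data (assuming NumPy; the sizes and values are illustrative): build H, confirm \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\) and \(\hat{\boldsymbol{\epsilon}} = (\mathbf{I} - \mathbf{H})\mathbf{y}\), and verify the Pythagorean split from subsection 4.

```python
import numpy as np

# Projection picture: hat matrix, fitted values, residuals, and the Pythagorean split.
rng = np.random.default_rng(4)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T     # projection ("hat") matrix
y_hat = H @ y                            # fitted values: projection onto col(X)
resid = (np.eye(n) - H) @ y              # projection onto the orthogonal "error" space

print(np.allclose(y_hat, X @ np.linalg.solve(X.T @ X, X.T @ y)))   # same as X beta_hat
print(np.isclose(y @ y, y_hat @ y_hat + resid @ resid))            # ||y||^2 = ||y_hat||^2 + ||y - y_hat||^2
```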

8. Visualization of Ordinary Least Squares (OLS) regression

  • 内容:

This slide provides an excellent geometric visualization of what’s happening “under the hood” in Ordinary Least Squares (OLS) regression. It translates the algebraic formulas into a more intuitive spatial concept. 这张幻灯片以出色的几何可视化方式展现了普通最小二乘 (OLS) 回归的“幕后”机制。它将代数公式转化为更直观的空间概念。

## Summary

The image shows that the process of finding the least squares estimates is geometrically equivalent to taking the actual outcome vector (\(\mathbf{y}\)) and finding its orthogonal projection (\(\hat{\mathbf{y}}\)) onto a hyperplane formed by the predictor variables (\(\mathbf{x}_1\) and \(\mathbf{x}_2\)). The projection \(\hat{\mathbf{y}}\) is the vector of fitted values, representing the closest possible approximation of the real data that the model can achieve.

该图显示,寻找最小二乘估计值的过程在几何上等同于将实际结果向量 (\(\mathbf{y}\)) 求出其正交投影 (\(\hat{\mathbf{y}}\)) 到由预测变量 (\(\mathbf{x}_1\)\(\mathbf{x}_2\) 构成的超平面上。投影 \(\hat{\mathbf{y}}\) 是拟合值的向量,表示模型能够达到的与真实数据最接近的近似值。

## The Concepts Explained Spatially空间概念解释

Let’s break down each element of the diagram and its meaning:

1. The Space Itself 空间本身

  • Concept: We are not in a simple 2D or 3D graph where axes are X and Y. Instead, we are in an n-dimensional space, where n is the number of observations in your dataset. Each axis in this space corresponds to one observation (e.g., one person, one day). 我们并非身处一个简单的二维或三维图形中,其中坐标轴为 X 和 Y。相反,我们身处一个 n 维空间,其中 n 是数据集中的观测值数量。此空间中的每个轴对应一个观测值(例如,一个人,一天)。
  • Meaning: A vector like y or x₁ is a single point in this high-dimensional space. For example, if you have 50 data points, y is a vector pointing to a specific location in a 50-dimensional space. 像 yx₁ 这样的向量是这个高维空间中的单个点。例如,如果您有 50 个数据点,y 就是指向 50 维空间中特定位置的向量。

2. The Predictor Hyperplane (The Yellow Surface)预测变量超平面(黄色表面)

  • Concept: The vectors for your predictor variables, x₁ and x₂, define a flat surface. If you had only one predictor, this would be a line. With two, it’s a plane. With more, it’s a hyperplane.预测变量的向量 x₁x₂ 定义了一个平面。如果只有一个预测变量,它就是一条线。如果有两个,它就是一个平面。如果有更多的预测变量,它就是一个超平面
  • Meaning: This yellow plane represents the “world of possible predictions” that your model is allowed to make. Any linear combination of your predictors—which is what a linear regression model calculates—will result in a vector that lies somewhere on this surface. 这个黄色平面代表你的模型可以做出的“可能预测的世界”。任何预测变量的线性组合（也就是线性回归模型计算的结果）都会产生一个位于这个平面某处的向量。

3. The Actual Outcome Vector (y) 实际结果向量 (y)
  • Concept: The red vector y represents your actual, observed data. It’s a single point in the n-dimensional space. 红色向量 y 代表你实际观察到的数据。它是 n 维空间中的一个点。
  • Meaning: Critically, this vector usually does not lie on the predictor hyperplane. If it did, your model would be a perfect fit with zero error. The fact that it’s “off the plane” represents the real-world noise and variation that the model cannot fully capture. 至关重要的是，这个向量通常并不位于预测变量超平面上。如果它位于超平面上，你的模型将完美拟合，误差为零。它“偏离平面”代表了模型无法完全捕捉到的真实世界的噪声和变化。

4. The Fitted Value Vector (ŷ)拟合值向量 (ŷ)

  • Concept: Since y is not on the plane, we find the point on the plane that is geometrically closest to y. This closest point is found by dropping a perpendicular line from y to the plane. The point where it lands is the orthogonal projection, labeled ŷ (y-hat). 由于 y 不在平面上，因此我们在平面上找到与 y 几何上最接近的点。这个最接近点是通过从 y 到平面做一条垂直线找到的。垂直线所在的点就是正交投影，标记为 ŷ (y-hat)。
  • Meaning: ŷ is the vector of your model’s fitted values. It is the “best” possible approximation of the real data that can be created using the given predictors because it minimizes the distance (and therefore the squared error) between the actual data (y) and the model’s prediction. ŷ 是模型拟合值的向量。它是使用给定预测变量可以创建的对真实数据的“最佳”近似值,因为它最小化了实际数据 (y) 与模型预测值之间的距离(从而最小化了平方误差)。

5. The Residual Vector (The Dashed Line)残差向量(虚线)

  • Concept: The dashed line connecting the tip of y to the tip of ŷ is the residual vector (\(\boldsymbol{\epsilon} = \mathbf{y} - \hat{\mathbf{y}}\)). Its length is the shortest possible distance from y to the hyperplane. 连接y顶点和ŷ顶点的虚线是残差向量 (\(\boldsymbol{\epsilon} = \mathbf{y} - \hat{\mathbf{y}}\))。其长度是从y到超平面的最短可能距离。
  • Meaning: This vector represents the error of the model—the part of the actual data that is left over after accounting for the predictors. The right-angle symbol (└) is the most important part of the diagram, as it shows this error is orthogonal (perpendicular) to the prediction and to all the predictors. This visualizes the core property that the model’s errors are uncorrelated with the predictors. 该向量代表模型的误差，即在考虑预测变量之后实际数据中剩余的部分。直角符号 (└) 是图中最重要的部分，它表明该误差与预测值以及所有预测变量正交（垂直）。这直观地展示了模型误差与预测变量不相关这一核心性质。

9. Singular Value Decomposition (SVD) 奇异值分解 (SVD)

  • 内容:

These slides delve into the more advanced linear algebra behind the projection matrix (H), explaining its fundamental properties and offering a new way to construct it using Singular Value Decomposition (SVD). 探讨了投影矩阵 (H) 背后更高级的线性代数,解释了它的基本性质,并提供了一种使用奇异值分解 (SVD) 构造它的新方法。

## Summary

These slides show that the projection matrix (H), which is central to least squares, has two key mathematical properties: it’s symmetric and idempotent (projecting twice is the same as projecting once). These properties dictate that its eigenvalues must be either 1 or 0. Singular Value Decomposition (SVD) of the data matrix X provides an elegant and numerically stable way to express H as UUᵀ, which makes these fundamental properties easier to understand and prove. 这些幻灯片展示了投影矩阵 (H)(最小二乘法的核心)的两个关键数学性质:对称幂等(投影两次等于投影一次)。这些性质决定了它的特征值必须为 1 或 0。数据矩阵 X 的奇异值分解 (SVD) 提供了一种优雅且数值稳定的方式,将H 表示为 UUᵀ,这使得这些基本性质更容易理解和证明。

## Concepts and Process Explained in Detail

1. Singular Value Decomposition (SVD)

The first slide introduces SVD, a powerful method for factoring any matrix.

  • Concept: SVD breaks down your data matrix X into three simpler matrices: X = UDVᵀ. Think of this as revealing the fundamental structure of your data.SVD 将数据矩阵 X 分解为三个更简单的矩阵:X = UDVᵀ。这可以理解为揭示数据的基本结构。
    • U: An orthogonal matrix whose columns form a perfect, orthonormal basis for the space spanned by your predictors (the column space of X). These columns represent the principal directions of your data’s space.一个正交矩阵,其列构成预测变量所占空间(X 的列空间)的完美正交基。这些列代表数据空间的主方向。
    • D: A diagonal matrix containing the “singular values,” which measure the importance or magnitude of each of these principal directions.一个对角矩阵,包含“奇异值”,用于衡量每个主方向的重要性或大小。
    • V: Another orthogonal matrix.另一个正交矩阵
  • Process (How SVD simplifies the Projection Matrix) SVD 如何简化投影矩阵: The main takeaway from this slide is the new, simpler formula for the hat matrix: \[\mathbf{H} = \mathbf{UU}^T\] This result is derived by substituting X = UDVᵀ into the original, more complex formula for H: \[\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\] When you perform this substitution and use the fact that for orthogonal matrices U and V, we have UᵀU = I and VᵀV = I, the D and V matrices completely cancel out, leaving the beautifully simple form H = UUᵀ. This tells us that projection is fundamentally about the basis vectors (U) of the predictor space. 执行此代入并利用正交矩阵 UV 的公式,即 UᵀU = IVᵀV = IDV 矩阵完全抵消,剩下简洁的形式 H = UUᵀ。这告诉我们,投影本质上是关于预测空间的基向量(U)的。
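A numerical sketch (synthetic X, assuming NumPy) confirming \(\mathbf{H} = \mathbf{UU}^T\); it also previews the symmetry, idempotency, and 0/1 eigenvalues discussed below.

```python
import numpy as np

# Check H = U U^T (thin SVD) and the basic projection-matrix properties.
rng = np.random.default_rng(5)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

U, D, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: U is n x (p+1)
H_svd = U @ U.T
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H, H_svd))          # H = U U^T
print(np.allclose(H, H.T))            # symmetric
print(np.allclose(H @ H, H))          # idempotent: projecting twice = projecting once

eigvals = np.sort(np.linalg.eigvalsh(H))
print(np.allclose(eigvals[-(p + 1):], 1.0),   # p+1 eigenvalues equal to 1 (model space)
      np.allclose(eigvals[:-(p + 1)], 0.0))   # the rest equal to 0 (error space)
```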

2. The Properties of the Projection Matrix (H) 投影矩阵 (H) 的性质

The second slide describes the essential nature of any projection matrix.

  • Symmetric (H = Hᵀ): This property ensures that the projection is orthogonal (i.e., it finds the closest point by moving perpendicularly). 此性质确保投影是正交的(即,它通过垂直移动找到最近的点)。

  • Idempotent (H² = H): This is the most intuitive property of a projection. 这是投影最直观的性质。

    • Concept: “Doing it twice is the same as doing it once.” “两次和一次相同。”
    • Geometric Meaning: Imagine you project a point onto a flat tabletop. That projected point is now on the table. If you try to project it onto the table again, it doesn’t move. The projection of a projection is just the projection itself. Mathematically, this is H(Hv) = Hv, which simplifies to H² = H. 想象一下,你将一个点投影到平坦的桌面上。这个投影点现在在桌子上。如果你尝试再次将它投影到桌子上,它不会移动。投影的投影就是投影本身。从数学上讲,这是H(Hv) = Hv,简化为H² = H

3. Eigenvalues and Eigenspaces 特征值和特征空间

The idempotency property has a profound consequence for the matrix’s eigenvalues.

  • Concept: The eigenvalues of H can only be 1 or 0. H 的特征值只能是 1 或 0。
  • Process (The Proof):
    1. Let v be an eigenvector of H with eigenvalue \(\lambda\). By definition, Hv = \(\lambda\)v. 设vH的特征向量,其特征值为\(\lambda\)。根据定义,Hv = \(\lambda\)v
    2. If we apply H again, we get H(Hv) = H(\(\lambda\)v) = \(\lambda\)(Hv) = \(\lambda\)(\(\lambda\)v) = \(\lambda^2\)v. 如果我们再次应用H,我们得到H(Hv) = H(\(\lambda\)v) = \(\lambda\)(Hv) = \(\lambda\)(\(\lambda\)v) = \(\lambda^2\)v
    3. So, we have H²v = \(\lambda^2\)v. 因此,我们有H²v = \(\lambda^2\)v
    4. But since H is idempotent, H² = H, which means H²v = Hv = \(\lambda\)v. 但由于H是幂等的,H² = H,这意味着H²v = Hv = \(\lambda\)v
    5. Therefore, we must have \(\lambda^2\)v = \(\lambda\)v, which means \(\lambda^2 = \lambda\). The only two numbers in existence that satisfy this equation are 0 and 1. 因此,我们必须有\(\lambda^2\)v = \(\lambda\)v,这意味着\(\lambda^2 = \lambda\)。满足此等式的仅有两个数字是01
  • Connecting Eigenvalues to the Model 将特征值连接到模型:
    • Eigenvalue = 1: The eigenvectors associated with an eigenvalue of 1 are the vectors that do not change when projected. This is only possible if they were already in the space being projected onto. Therefore, the space L₁ is the column space of X—the “model space.” H is the projection onto this space. 与特征值为 1 相关联的特征向量是投影时不会改变的向量。只有当它们已经存在于投影到的空间中时，这种情况才有可能发生。因此，空间 L₁ 是 X 的列空间——“模型空间”。H 是到该空间的投影。

    • Eigenvalue = 0: The eigenvectors associated with an eigenvalue of 0 are the vectors that get sent to the zero vector when projected. This happens to vectors that are orthogonal to the projection space. Therefore, the space L₀ is the orthogonal “error” space. The matrix I - H is the projection onto this space.

    与特征值为 0 相关联的特征向量是投影时被发送到零向量的向量。这种情况发生在与投影空间正交的向量上。因此，空间 L₀ 是正交的“误差”空间。矩阵 I - H 是到该空间的投影。

10. Statistical inference

  • 内容:

These slides cover the theoretical backbone of statistical inference in linear regression. They explain the necessary assumptions and the resulting probability distributions of our estimates, which is what allows us to perform hypothesis tests and create confidence intervals.

这些幻灯片涵盖了线性回归中统计推断的理论基础。它们解释了必要的假设以及由此得出的估计概率分布,这使我们能够进行假设检验并创建置信区间。

## Summary

These slides lay out the statistical assumptions required for the Least Squares Estimator (LSE). The core idea is that if we assume the errors are independent and normally distributed, we can then prove that: 这些幻灯片列出了最小二乘估计量 (LSE) 所需的统计假设。其核心思想是,如果我们假设误差是独立的且服从正态分布,那么我们可以证明:

  1. Our estimated coefficients (\(\hat{\boldsymbol{\beta}}\)) also follow a Normal distribution (or a t-distribution when standardized). 我们的估计系数 (\(\hat{\boldsymbol{\beta}}\)) 也服从正态分布(标准化后服从t 分布)。

  2. Our summed-up squared errors (RSS) follow a Chi-squared distribution. 我们的平方误差总和 (RSS) 服从卡方分布

  3. A specific ratio of the explained variance to the unexplained variance follows an F-distribution, which is used to test the overall significance of the model. 解释方差与未解释方差的特定比率服从F 分布,该分布用于检验模型的整体显著性。

These known distributions are the foundation for all statistical inference in linear models.这些已知的分布是线性模型中所有统计推断的基础。

## Deeper Dive into Concepts and Processes

1. The Model Assumptions (The Foundation) 模型假设(基础)

The first slide states the two assumptions that are critical for everything that follows. Without them, we can’t make claims about the statistical properties of our estimates. 第一张幻灯片阐述了对后续所有内容都至关重要的两个假设。没有它们,我们就无法断言估计值的统计特性。

  • Assumption 1: \(\epsilon_i \sim N(0, \sigma^2)\)
    • Concept: This assumes that the error terms (the part of y that the model can’t explain) are drawn from a normal (bell-curve) distribution with a mean of zero and a constant variance \(\sigma^2\). 假设误差项（模型无法解释的 y 值部分）服从正态（钟形曲线）分布，该分布的均值为零，方差为常数 \(\sigma^2\)。
    • Meaning in Plain English:
      • Mean of 0: The model is “correct on average.” The errors are not systematically positive or negative. 模型“平均正确”。误差并非系统地为正或负。
      • Normal Distribution: Small errors are more likely than large errors. This is a common assumption for random noise. 小误差比大误差更有可能出现。这是随机噪声的常见假设。
      • Constant Variance (\(\sigma^2\)): The amount of random scatter around the regression line is the same at all levels of the predictor variables. This is called homoscedasticity. 回归线周围的随机散度在预测变量的各个水平上都是相同的。这被称为同方差性
  • Assumption 2: Observations are independent 观测值是独立的
    • Concept: Each data point \((x_i, y_i)\) is an independent piece of information. The value of the error for one observation gives no information about the error for another observation. 每个数据点 \((x_i, y_i)\) 都是一条独立的信息。一个观测值的误差值并不能反映另一个观测值的误差。
    • Meaning: This is often true for cross-sectional data (e.g., a random sample of people) but can be violated in time-series data where today’s error might be correlated with yesterday’s. 这通常适用于横截面数据(例如,随机抽样的人群),但在时间序列数据中可能不成立,因为今天的误差可能与昨天的误差相关。

2. The Distribution of the Coefficients (Theorem of LSE) 系数分布(最小二乘法定理)

This is the most important result for understanding the accuracy of our individual predictors.

  • Concept 1: The Sampling Distribution of \(\hat{\boldsymbol{\beta}}\)
    • Formula: \(\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})\)

    • Meaning: If you were to take many different random samples from the population and calculate the coefficients \(\hat{\boldsymbol{\beta}}\) for each sample, the distribution of those coefficients would be a multivariate normal distribution. 如果从总体中随机抽取许多不同的样本，并计算每个样本的系数 \(\hat{\boldsymbol{\beta}}\)，则这些系数的分布将服从多元正态分布。

      • The center of this distribution is the true population coefficient vector \(\boldsymbol{\beta}\). This means our estimator is unbiased—on average, it finds the right answer. 该分布的中心是真实的总体系数向量 \(\boldsymbol{\beta}\)。这意味着我们的估计器是无偏的——平均而言,它能够找到正确的答案。
      • The “spread” of this distribution is its variance-covariance matrix, \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\). This tells us the uncertainty in our estimates. 该分布的“散度”是其方差-协方差矩阵
  • Concept 2: The t-statistic t 统计量
    • Formula: The standardized coefficient, \(\frac{\hat{\beta}_j - \beta_j}{\text{s.e.}(\hat{\beta}_j)}\), follows a t-distribution with \(n-p-1\) degrees of freedom.
    • Process & Meaning: In the real world, we don’t know the true error variance \(\sigma^2\). We have to estimate it using our sample data, which gives us \(s^2\). Because we are using an estimate of the variance, we introduce extra uncertainty. The t-distribution is like a normal distribution but with slightly “fatter” tails to account for this additional uncertainty. The degrees of freedom, \(n-p-1\), reflect the number of data points (n) minus the number of parameters we had to estimate (p slopes + 1 intercept). This is the basis for t-tests and confidence intervals for each coefficient. 在现实世界中,我们不知道真实的误差方差 \(\sigma^2\)。我们必须使用样本数据来估计它,从而得到 \(s^2\)。由于我们使用的是方差的估计值,因此引入了额外的不确定性。 t 分布类似于正态分布,但尾部略微“丰满”,以解释这种额外的不确定性。自由度 \(n-p-1\) 表示数据点的数量(n)减去我们需要估计的参数数量(p 个斜率 + 1 个截距)。这是 t 检验和每个系数置信区间的基础。

3. The Distribution of the Error (Theorem of Residual) 误差分布(残差定理)

This theorem helps us understand the properties of our model’s overall error.

  • Concept: The Residual Sum of Squares (RSS), when scaled by the true variance, follows a Chi-squared (\(\chi^2\)) distribution with \(n-p-1\) degrees of freedom. 残差平方和 (RSS) 经真实方差缩放后,服从自由度为 \(n-p-1\)卡方 (\(\chi^2\)) 分布

  • Process & Meaning: The Chi-squared distribution often arises when dealing with sums of squared normal variables. This theorem provides a formal probability distribution for our total model error. Its most important consequence is that it allows us to prove that: 卡方分布通常用于处理正态变量的平方和。该定理为我们模型的总体误差提供了一个正式的概率分布。它最重要的推论是，它使我们能够证明：

    \[s^2 = \text{RSS}/(n - p - 1)\] is an unbiased estimate of the true error variance \(\sigma^2\). This \(s^2\) is a critical ingredient for calculating the standard errors of our coefficients. \[s^2 = \text{RSS}/(n - p - 1)\] 是真实误差方差 \(\sigma^2\)无偏估计。这个 \(s^2\) 是计算系数标准误差的关键因素。

4. The F-Distribution and the Overall Model Test

This final theorem combines our findings about the coefficients and the residuals.

  • Concept: The F-statistic, which is essentially a ratio of the variance explained by the model to the variance left unexplained, follows an F-distribution. F 统计量本质上是模型解释的方差与未解释方差的比率,服从 F 分布。

  • Process & Meaning: This result relies on the fact that our coefficient estimates (\(\hat{\boldsymbol{\beta}}\)) are independent of our total error (RSS). The F-distribution is used for the F-test of overall significance. This test checks the null hypothesis that all of your slope coefficients are simultaneously zero (\(\beta_1 = \beta_2 = \dots = \beta_p = 0\)). If the F-test gives a small p-value, you can conclude that your model, as a whole, is statistically significant and provides a better fit than a model with no predictors. 如果 F 检验得出的 p 值较小,则可以得出结论,您的模型整体上具有统计显著性,并且比没有预测因子的模型拟合效果更好。

11. Constructing different types of intervals

  • 内容:

These slides explain how to use the statistical properties of the least squares estimates to construct different types of intervals, which are essential for quantifying the uncertainty in your model’s predictions and parameters. 这些幻灯片解释了如何利用最小二乘估计的统计特性来构建不同类型的区间,这对于量化模型预测和参数中的不确定性至关重要。

Summary

These slides show how to calculate three distinct types of intervals in linear regression, each answering a different question about uncertainty: 展示了如何计算线性回归中三种不同类型的区间,每种区间分别回答了关于不确定性的不同问题:

  1. Confidence Interval for a Parameter (\(\beta_j\)): Provides a plausible range for a single, true unknown coefficient in the model. 为模型中单个真实未知系数提供一个合理的范围。
  2. Confidence Interval for the Mean Response: Provides a plausible range for the average outcome for a given set of predictor values. 为给定一组预测变量值的平均结果提供一个合理的范围。
  3. Prediction Interval: Provides a plausible range for a single future outcome for a given set of predictor values. This interval is always wider than the confidence interval for the mean response because it must also account for individual random error. 为给定一组预测变量值的单个未来结果提供一个合理的范围。该区间始终比平均响应的置信区间更宽,因为它还必须考虑单个随机误差。

Deeper Dive into Concepts and Processes

1. Confidence Interval for a Single Parameter 单个参数的置信区间

This interval addresses the uncertainty around one specific coefficient, like the slope for your most important predictor. 此区间用于解决围绕某个特定系数的不确定性,例如最重要的预测因子的斜率。

  • The Question It Answers: “I’ve calculated a slope of \(\hat{\beta}_1 = 10.5\). How sure am I about this number? What is a plausible range for the true population slope?” 我计算出了斜率为 \(\hat{\beta}_1 = 10.5\)。我对这个数字有多确定?真实总体斜率的合理范围是多少?”
  • The Formula: \(\hat{\beta}_j \pm t_{n-p-1}(\alpha/2) s \sqrt{c_{jj}}\)
    • \(\hat{\beta}_j\): This is your best point estimate for the coefficient, taken directly from the model output. 这是该系数的最佳点估计值,直接取自模型输出。
    • \(t_{n-p-1}(\alpha/2)\): This is the critical value from a t-distribution. It’s a multiplier that sets the width of the interval based on your desired confidence level (e.g., for 95% confidence, \(\alpha=0.05\)). 这是 t 分布的临界值。它是一个乘数,根据您所需的置信水平设置区间宽度(例如,对于 95% 的置信度,\(\alpha=0.05\))。
    • \(s \sqrt{c_{jj}}\): This whole term is the standard error of the coefficient \(\hat{\beta}_j\). It measures the precision of your estimate. A smaller standard error means a narrower, more precise interval. 这整个项是系数 \(\hat{\beta}_j\)标准误差。它衡量您估计的精度。标准误差越小,区间越窄,精度越高。

2. Confidence Interval for the Mean Response 平均响应的置信区间

This interval addresses the uncertainty about the location of the regression line itself. 这个区间解决了回归线本身位置的不确定性。

  • The Question It Answers: “For a house with 3 bedrooms and 2 bathrooms, what is the plausible range for the average selling price of all such houses?” 它回答的问题:“对于一栋有 3 间卧室和 2 间浴室的房子,所有此类房屋平均售价的合理范围是多少?”
  • The Formula: \(\hat{\boldsymbol{\beta}}^T \mathbf{x} \pm t_{n-p-1}(\alpha/2)s\sqrt{\mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}}\)
    • \(\hat{\boldsymbol{\beta}}^T \mathbf{x}\): This is your point prediction, \(\hat{y}\), for the given input vector x. 这是给定输入向量 x 的点预测 \(\hat{y}\)
    • \(s\sqrt{\mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}}\): This is the standard error of the mean response. Its value depends on how far the input vector x is from the center of the data. This means the confidence interval is narrowest near the average of your data and gets wider as you move toward the extremes. 这是平均响应的标准误差。其值取决于输入向量 x 距离数据中心的距离。这意味着置信区间在数据平均值附近最窄,并且随着接近极值而变宽。

3. Prediction Interval for an Individual Response 单个响应的预测区间

This is the most comprehensive interval and is often the most useful for making real-world predictions. 这是最全面的区间,通常对于进行实际预测最有用。

  • The Question It Answers: “I want to predict the selling price for one specific house that has 3 bedrooms and 2 bathrooms. What is a plausible price range for this single house?”
  • The Formula: \(\hat{\boldsymbol{\beta}}^T \mathbf{x} \pm t_{n-p-1}(\alpha/2)s\sqrt{1 + \mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}}\)
  • The Key Difference: Notice the formula is identical to the one above, except for the 1 + ... inside the square root. This “1” is critically important. It accounts for the second source of uncertainty. 请注意,该公式与上面的公式完全相同,只是平方根中的 1 + ... 不同。这个“1”至关重要。它解释了第二个不确定性来源。
    1. Uncertainty in the model: We are not perfectly certain about the true location of the regression line. This is captured by the \(\mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}\) term. 我们无法完全确定回归线的真实位置。这可以通过 \(\mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}\) 项来捕捉。
    2. Uncertainty in the individual data point: Even if we knew the true regression line perfectly, individual data points would still be scattered around it due to random error (\(\epsilon\)). The “1” in the formula accounts for this irreducible, random error of a single observation. 即使我们完全了解真实的回归线,由于随机误差 (\(\epsilon\)),单个数据点仍然会散布在其周围。公式中的“1”解释了单个观测值中这种不可约的随机误差。

Because it accounts for both types of uncertainty, the prediction interval is always wider than the confidence interval for the mean. 由于它同时考虑了两种不确定性,因此预测区间总是比均值的置信区间更宽。
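
To make the three formulas concrete, here is a minimal NumPy/SciPy sketch that computes all three intervals on simulated toy data; the data, the query point x0, and the 95% level are illustrative assumptions, not values from the slides.

```python
import numpy as np
from scipy import stats

# Toy regression data (illustrative only): n observations, p predictors plus an intercept.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 10.5, -3.0]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                      # least squares estimate
resid = y - X @ beta_hat
dof = n - p - 1
s = np.sqrt(resid @ resid / dof)                  # estimate of the error standard deviation
t_crit = stats.t.ppf(0.975, dof)                  # 95% two-sided critical value

# 1. Confidence interval for a single coefficient beta_j
j = 1
se_bj = s * np.sqrt(XtX_inv[j, j])                # s * sqrt(c_jj)
ci_beta = (beta_hat[j] - t_crit * se_bj, beta_hat[j] + t_crit * se_bj)

# 2. CI for the mean response and 3. prediction interval, both at a new point x0
x0 = np.array([1.0, 0.5, -0.2])
y0_hat = x0 @ beta_hat
se_mean = s * np.sqrt(x0 @ XtX_inv @ x0)          # uncertainty about the line only
se_pred = s * np.sqrt(1.0 + x0 @ XtX_inv @ x0)    # extra "1 +" for the individual random error
ci_mean = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
pi_new  = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
```

Printed side by side, pi_new is always wider than ci_mean, mirroring the argument above.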

The Core Difference: An Analogy 一个类比

  • Confidence Interval (Mean) 均值置信区间: Like predicting the average arrival time for a specific flight that runs every day. After observing it for a year, you can predict the average very accurately (e.g., 10:05 AM ± 2 minutes). 就像预测每天特定航班的平均到达时间。经过一年的观察,您可以非常准确地预测平均值(例如,上午 10:05 ± 2 分钟)。

  • Prediction Interval (Individual) 个体预测区间: Like predicting the arrival time for that same flight on one specific day next week. You have to account for the uncertainty in the average plus the potential for random, one-time events like weather or air traffic delays. Your prediction must be wider to be safe (e.g., 10:05 AM ± 15 minutes). 就像预测同一航班下周某一天的到达时间。您必须考虑平均值的不确定性,以及可能出现的随机、一次性事件,例如天气或空中交通延误。您的预测范围必须更广才能安全(例如,上午 10:05 ± 15 分钟)。

12. Analysis of Variance (ANOVA) and the F-test

  • 内容:

These slides explain Analysis of Variance (ANOVA), a method used in regression to break down the total variability in your data to test if your model is statistically significant as a whole.这些幻灯片讲解了方差分析 (ANOVA),这是一种用于回归分析的方法,用于分解数据中的总变异性,以检验模型整体是否具有统计显著性。

Summary

The core idea is to decompose the total variation in the response variable (Total Sum of Squares, SS_total) into two parts: the variation that is explained by your regression model (Regression Sum of Squares, SS_reg) and the variation that is left unexplained (Error Sum of Squares, SS_error). 其核心思想是将响应变量的总变异(总平方和,SS_total)分解为两部分:回归模型可以解释的变异(回归平方和,SS_reg)和未解释的变异(误差平方和,SS_error)。

By comparing the size of the explained variation to the unexplained variation using an F-statistic, we can formally test the hypothesis that our model has predictive power. This entire process is neatly organized in an ANOVA table. 通过使用F 统计量比较已解释变异与未解释变异的大小,我们可以正式检验模型具有预测能力的假设。整个过程都整齐地组织在方差分析表中。

Deeper Dive into Concepts and Connections

1. The Decomposition of Variances (The Core Equation) 方差分解(核心方程)

The first slide starts with the fundamental equation of ANOVA, which stems directly from the geometric properties of least squares: 第一张幻灯片以方差分析的基本方程开头,该方程直接源于最小二乘的几何性质:

\[SS_{total} = SS_{reg} + SS_{error}\]

  • SS_total (Total Sum of Squares): \(\sum(y_i - \bar{y})^2\)
    • Concept: This measures the total variation in your response variable, y. Imagine you didn’t have a model and your only prediction for any y was its overall average, ȳ. SS_total is the total squared error of this simple “mean-only” model. It represents the total amount of variation you are trying to explain. 这测量的是响应变量“y”的总变异。假设你没有模型,你对任何“y”的唯一预测是它的整体平均值“ȳ”。SS_total 是这个简单的“仅均值”模型的总平方误差。它代表了你试图解释的变异总量。
  • SS_reg (Regression Sum of Squares): \(\sum(\hat{y}_i - \bar{y})^2\)
    • Concept: This measures the explained variation. It’s the amount of variation in y that is captured by your regression model. It calculates the difference between your model’s predictions (ŷ) and the simple average (ȳ). A large SS_reg means your model’s predictions are a big improvement over just guessing the average. 它衡量解释变异。它是回归模型捕捉到的 y 的变异量。它计算模型预测值(“ŷ”)与简单平均值(“ȳ”)之间的差异。较大的 SS_reg 意味着模型的预测结果比仅仅猜测平均值有显著改善。
  • SS_error (Error Sum of Squares): \(\sum(y_i - \hat{y}_i)^2\)
    • Concept: This measures the unexplained variation (also called the Residual Sum of Squares). It’s the amount of variation your model failed to capture. It’s the sum of the squared differences between the actual data (y) and your model’s predictions (ŷ). 它衡量未解释变异(也称为残差平方和)。它是模型未能捕捉到的变异量。它是实际数据 (y) 与模型预测值 (ŷ) 之间平方差之和。

The R-squared value is a direct consequence of this decomposition. It’s the proportion of the total variance that is explained by the model: R 平方 值是这种分解的直接结果。它是模型解释的总方差的比例:

\[R^2 = \frac{SS_{reg}}{SS_{total}}\]

2. The ANOVA Table and the F-test 方差分析表和 F 检验

The second slide organizes these sums of squares to perform a formal hypothesis test. 第二张幻灯片整理了这些平方和,以进行正式的假设检验。

  • The Question: “Is there any relationship between my set of predictors and the response variable?” or “Is my model better than nothing?” “我的预测变量集和响应变量之间是否存在任何关系?”或“我的模型比没有模型好吗?”
  • The Hypotheses:
    • Null Hypothesis (\(H_0\)): \(\beta_1 = \beta_2 = \dots = \beta_p = 0\). (None of the predictors have a relationship with the response; the model is useless). 零假设 (\(H_0\))\(\beta_1 = \beta_2 = \dots = \beta_p = 0\)。 (所有预测变量都与响应变量无关;该模型毫无用处)。
    • Alternative Hypothesis (\(H_1\)): At least one \(\beta_j\) is not zero. (The model has some predictive value). 备择假设 (\(H_1\)):至少有一个 \(\beta_j\) 不为零。(该模型具有一定的预测值)。

To test this, we can’t just compare the raw SS values, because they depend on the number of data points and predictors. We need to normalize them. 为了验证这一点,我们不能仅仅比较原始的 SS 值,因为它们取决于数据点和预测变量的数量。我们需要对它们进行归一化。

  • Mean Squares (MS): This is the “average” variation. We calculate it by dividing the Sum of Squares by its degrees of freedom (df). 这是“平均”变异。我们通过将平方和除以其自由度 (df) 来计算它。
    • MS_reg = \(SS_{reg} / p\). This is the average explained variation per predictor. 这是每个预测变量的平均解释变异。
    • MS_error = \(SS_{error} / (n - p - 1)\). This is the average unexplained variation, which is our estimate of the error variance, \(s^2\). 这是平均未解释变异,即我们对误差方差 \(s^2\) 的估计值。

3. The Connection: The F-statistic 联系:F 统计量

The F-statistic is the key that connects everything. It’s the ratio of the two mean squares: F 统计量是连接一切的关键。它是两个均方的比值: \[F = \frac{\text{Mean Squared Regression}}{\text{Mean Squared Error}} = \frac{MS_{reg}}{MS_{error}}\]

  • Intuitive Meaning: The F-statistic is a ratio of the average explained variation to the average unexplained variation. F 统计量是平均解释变异平均未解释变异的比值。
    • If your model is useless (\(H_0\) is true), the explained variation should be about the same as the random, unexplained variation. The F-statistic will be close to 1. 如果你的模型无效(\(H_0\) 为真),则解释变异应该与随机的未解释变异大致相同。F 统计量接近 1。
    • If your model is useful (\(H_1\) is true), the explained variation should be significantly larger than the unexplained variation. The F-statistic will be much greater than 1. 如果你的模型有效(\(H_1\) 为真),则解释变异应该显著大于未解释变异。F 统计量将远大于 1。

We compare our calculated F-statistic to an F-distribution to get a p-value. A small p-value (< 0.05) provides strong evidence to reject the null hypothesis and conclude that your model, as a whole, is statistically significant. 我们将计算出的 F 统计量与F 分布进行比较,得出p 值。较小的 p 值(< 0.05)可以提供强有力的证据来拒绝零假设,并得出您的模型整体具有统计显著性的结论。
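
As a minimal, self-contained sketch of the decomposition and the F-test (the toy data below are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# Toy data and a least squares fit (all values illustrative).
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 10.5, -3.0]) + rng.normal(scale=2.0, size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat, y_bar = X @ beta_hat, y.mean()

ss_total = np.sum((y - y_bar) ** 2)       # total variation
ss_reg   = np.sum((y_hat - y_bar) ** 2)   # explained by the model
ss_error = np.sum((y - y_hat) ** 2)       # left unexplained
assert np.isclose(ss_total, ss_reg + ss_error)

r_squared = ss_reg / ss_total
ms_reg   = ss_reg / p                          # average explained variation per predictor
ms_error = ss_error / (n - p - 1)              # estimate of the error variance s^2
f_stat   = ms_reg / ms_error
p_value  = stats.f.sf(f_stat, p, n - p - 1)    # upper-tail p-value of the F-test
```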

13. The Gauss-Markov theorem and the Least Squares Estimator

  • 内容: These slides explain the Gauss-Markov theorem, a cornerstone result in statistics that establishes why the Least Squares Estimator (LSE) is considered the gold standard for fitting linear models under a specific set of assumptions. 这些幻灯片解释了高斯-马尔可夫定理,这是统计学中的一个基石性成果,它阐明了为什么最小二乘估计量 (LSE) 被认为是在特定假设条件下拟合线性模型的黄金标准。

Summary

The slides argue for the superiority of the Least Squares Estimator (LSE) by highlighting its key properties: it’s easy to compute, consistent, and efficient. This culminates in the Gauss-Markov Theorem, which proves that LSE is BLUE: the Best Linear Unbiased Estimator. This means that among all estimators that are both linear and unbiased, the LSE is the “best” because it has the smallest possible variance, making it the most precise. The second slide provides the key steps for the mathematical proof of this important theorem. 这些幻灯片通过强调最小二乘估计量 (LSE) 的关键特性来论证其优越性:易于计算、一致性高且高效。最终得出了高斯-马尔可夫定理,该定理证明了 LSE 是BLUE最佳线性无偏估计量。这意味着在所有线性且无偏的估计量中,LSE 是“最佳”的,因为它具有最小的方差,因此精度最高。第二张幻灯片提供了这一重要定理的数学证明的关键步骤。

Deeper Dive into the Concepts

Properties of LSE (Slide 1) 最小二乘估计 (LSE) 的性质

  • Easy Computation易于计算: The LSE has a direct, closed-form solution called the Normal Equation (\(\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\)). You can calculate it directly without needing complex iterative algorithms.
  • Consistency一致性: As your sample size gets larger and larger, the LSE estimate (\(\hat{\boldsymbol{\beta}}\)) is guaranteed to get closer and closer to the true population value (\(\boldsymbol{\beta}\)). With enough data, it will find the truth. 随着样本量越来越大,LSE 估计值 (\(\hat{\boldsymbol{\beta}}\)) 必然会越来越接近真实的总体值 (\(\boldsymbol{\beta}\))。只要有足够的数据,它就能找到真相。
  • Efficiency效率: An efficient estimator is the one with the lowest possible variance. This means its estimates are the most precise and least spread out. 高效的估计器是方差尽可能低的估计器。这意味着它的估计值最精确、离散程度最小。
  • BLUE (Best Linear Unbiased Estimator)BLUE(最佳线性无偏估计器): This acronym elegantly summarizes the Gauss-Markov theorem. 这个缩写完美地概括了高斯-马尔可夫定理。
    • Linear: The estimator is a linear function of the response variable y. We can write it as \(\hat{\boldsymbol{\beta}} = \mathbf{A}\mathbf{y}\) for some matrix A. 估计器是响应变量y的线性函数。对于某个矩阵A,我们可以将其写成 \(\hat{\boldsymbol{\beta}} = \mathbf{A}\mathbf{y}\)
    • Unbiased: The estimator does not systematically overestimate or underestimate the true parameter. On average, its expected value is the true value: \(E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}\). 估计器不会系统性地高估或低估真实参数。平均而言,其预期值即为真实值:\(E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}\)
    • Best: It has the minimum variance of all possible linear unbiased estimators. It’s the most precise and reliable estimator in its class. 在所有可能的线性无偏估计器中,它的方差最小。它是同类中最精确、最可靠的估计器。

The Gauss-Markov Theorem 高斯-马尔可夫定理

The theorem provides the theoretical justification for using OLS. 该定理为使用最小二乘法 (OLS) 提供了理论依据。

  • The Core Idea: You could invent many different ways to estimate the coefficients of a linear model. As long as your proposed methods are both linear and unbiased, the Gauss-Markov theorem guarantees that none of them will be more precise than the standard least squares method. LSE gives the “sharpest” possible estimates. 你可以发明许多不同的方法来估计线性模型的系数。只要你提出的方法是线性的且无偏的,高斯-马尔可夫定理就能保证,它们都不会比标准最小二乘法更精确。最小二乘法 (LSE) 给出了“最精确”的估计值。

  • The Logic of the Proof (Slide 2) 证明逻辑: The proof is a clever comparison of variances (a short numerical check follows this list). 该证明巧妙地比较了方差。
    1. It starts by defining any other linear unbiased estimator as \(\tilde{\boldsymbol{\beta}} = \mathbf{A}\mathbf{y}\). 首先,将任何其他线性无偏估计量定义为 \(\tilde{\boldsymbol{\beta}} = \mathbf{A}\mathbf{y}\)
    2. It uses the “unbiased” property to force a condition on the matrix A, which ultimately leads to the insight that A can be written in terms of the LSE matrix plus some other matrix D, where DX = 0. 它利用“无偏”性质对矩阵A强制施加一个条件,最终得出A可以写成LSE矩阵加上另一个矩阵D,其中DX = 0
    3. It then calculates the variance of this other estimator, which turns out to be: \[Var(\tilde{\boldsymbol{\beta}}) = Var(\text{LSE}) + \text{a non-negative term involving } \mathbf{D}\] 然后计算另一个估计量的方差,结果为: \[Var(\tilde{\boldsymbol{\beta}}) = Var(\text{LSE}) + \text{一个包含 } \mathbf{D} \text{ 的非负项}\]
    4. Since the variance of any other linear unbiased estimator is the variance of the LSE plus something non-negative, the variance of the LSE must be the smallest possible value. 由于任何其他线性无偏估计量的方差都是LSE的方差加上一个非负项,因此LSE的方差必须是最小的可能值。
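
A quick numerical illustration of this conclusion, under assumed toy data: the “other” linear unbiased estimator below is simply least squares computed on half of the observations, which is still linear and unbiased but noticeably less precise than the full-data LSE.

```python
import numpy as np

# Toy simulation (illustrative values): compare variances of two linear unbiased estimators.
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

lse_draws, alt_draws = [], []
for _ in range(2000):
    y = X @ beta_true + rng.normal(size=n)
    b_lse = np.linalg.solve(X.T @ X, X.T @ y)            # full-data least squares
    Xh, yh = X[: n // 2], y[: n // 2]                     # least squares on half the data:
    b_alt = np.linalg.solve(Xh.T @ Xh, Xh.T @ yh)         # still linear and unbiased
    lse_draws.append(b_lse[1])
    alt_draws.append(b_alt[1])

print("variance of slope, full-data LSE :", np.var(lse_draws))
print("variance of slope, alternative   :", np.var(alt_draws))   # noticeably larger
```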

Further Understandings Beyond the Slides

1. What are the required assumptions?需要哪些假设?

The Gauss-Markov theorem is powerful, but it’s not magic. It only holds if a set of assumptions about the model’s errors (\(\epsilon\)) are met: 高斯-马尔可夫定理虽然强大,但并非魔法。它仅在满足以下关于模型误差 (\(\epsilon\)) 的假设时成立:

  • Zero Mean零均值: The average of the errors is zero (\(E[\epsilon] = 0\)). 误差的平均值为零 (\(E[\epsilon] = 0\))。
  • Constant Variance (Homoscedasticity)恒定方差(同方差性): The errors have the same variance, \(\sigma^2\), at all levels of the predictors. 在预测变量的各个水平上,误差具有相同的方差 \(\sigma^2\)。
  • Uncorrelated Errors不相关误差: The error for one observation is not correlated with the error for another. 一个观测值的误差与另一个观测值的误差不相关。
  • No Perfect Multicollinearity非完全多重共线性: The predictor variables are not perfectly linearly related. 预测变量并非完全线性相关。

Crucially, the Gauss-Markov theorem does NOT require the errors to be normally distributed. The normality assumption is only needed later for constructing confidence intervals and conducting t-tests and F-tests. 至关重要的是,高斯-马尔可夫定理并不要求误差服从正态分布。正态性假设仅在构建置信区间以及进行 t 检验和 F 检验时需要。

2. When is LSE NOT the Best? (The Bias-Variance Tradeoff) 什么时候 LSE 不是最佳选择? (偏差-方差权衡)

While LSE is the best unbiased estimator, sometimes we can get better predictive performance by accepting a little bit of bias in exchange for a large reduction in variance. This is the core idea behind modern regularization methods (see the sketch below): 虽然 LSE 是最好的无偏估计器,但有时我们可以通过接受少量偏差来大幅降低方差,从而获得更好的预测性能。这是现代正则化方法背后的核心思想:

  • Ridge Regression and LASSO岭回归和 LASSO: These are popular techniques that produce biased estimates of the coefficients. However, by introducing this small amount of bias, they can often produce models with a lower overall error (Mean Squared Error) than LSE, especially when predictors are highly correlated. 这些是产生有偏系数估计的流行技术。然而,通过引入少量偏差,它们通常可以生成比 LSE 具有更低总体误差(均方误差)的模型,尤其是在预测变量高度相关的情况下。

Therefore, while LSE is the theoretical champion in the world of unbiased estimators, in the world of predictive modeling, methods that intentionally introduce bias can sometimes be superior. 因此,虽然 LSE 是无偏估计领域的理论冠军,但在预测模型领域,有意引入偏差的方法有时会更胜一筹。
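
The sketch below illustrates this tradeoff on assumed toy data with two almost collinear predictors; the penalty value lam = 1.0 is a hypothetical choice, not a tuned one. Averaged over many simulated datasets, the biased ridge estimates recover the true coefficients with smaller total squared error than OLS in this setting.

```python
import numpy as np

# Toy bias-variance comparison (illustrative data): OLS vs. ridge with correlated predictors.
rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)        # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([0.0, 2.0, 2.0])

lam = 1.0                                        # hypothetical ridge penalty
I = np.eye(X.shape[1])

ols_err, ridge_err = [], []
for _ in range(500):                             # repeat over simulated datasets
    y = X @ beta_true + rng.normal(size=n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)                 # unbiased, high variance here
    b_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)     # biased, much lower variance
    ols_err.append(np.sum((b_ols - beta_true) ** 2))
    ridge_err.append(np.sum((b_ridge - beta_true) ** 2))

print("mean squared estimation error, OLS  :", np.mean(ols_err))
print("mean squared estimation error, ridge:", np.mean(ridge_err))
```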

Meeting Notes

1. September plan:

  • Content:

1.1 Reproduce models from classic papers:

1.2 Dataset:

quantum-machine-QM9: a USTC lab course has a demo built on the QM9 dataset

  • A closed-shell molecule dataset

  • Size: about 134,000 stable small organic molecules

  • Elements: only 5 elements, H (hydrogen), C (carbon), N (nitrogen), O (oxygen), F (fluorine)

  • Level of theory: all molecular properties are computed with DFT (density functional theory), the B3LYP functional, and the 6-31G(2df,p) basis set

  • Properties included: dipole moment, HOMO (highest occupied molecular orbital) energy, LUMO (lowest unoccupied molecular orbital) energy, internal energy at 0 K, internal energy at 298.15 K, etc.; it is the benchmark dataset for closed-shell molecular property prediction

  • Training input: the closed-shell quantum-chemistry matrices, i.e., the Fock matrix (F), density matrix (P), Hamiltonian matrix (H), and overlap matrix (S), assembled into the closed-shell form of the vector T (closed shells are spin-symmetric, so α and β spins need not be distinguished, hence T = [F, P, H, S])

  • What the features encode: these matrices carry the molecule's electronic-structure information (e.g., orbital-orbital interactions, electron-density distribution) and are the core input from which OrbNet-Equi learns the "molecular structure → energy" mapping

  • Training output: the 0 K internal energy from QM9 is the main training target. The 0 K internal energy is the central property in potential-energy-surface (PES) calculations and directly relates to molecular stability and reaction-barrier prediction; compared with other properties (e.g., the dipole moment), energy is a key quantity shared by closed-shell and open-shell systems, which makes it easier to extend later to open-shell energy prediction.

1.3 Reproduction approach:

Reproduce with down-sized data: step 1: 5k bonds and edges

step 2: 10k bonds and edges

step 3: 20k bonds and edges

1.4 Two data spaces, and one concept still to be understood, open-shell:

  • AO (Atomic Orbital)
  • MO (Molecular Orbital)
  • Open-shell (open-shell electronic configuration)

1. AO (Atomic Orbital) - the atomic-orbital level / atom-centered models

The molecule is treated as a graph of atoms (nodes) and chemical bonds (edges). The model learns a representation of each atom together with its local environment and predicts properties of the whole molecule.

  • Key idea: molecular properties are determined by the constituent atoms and the interactions between them.

  • Input: atomic coordinates, atom types, and interatomic distances or bonding relationships.

  • Approach: message-passing graph neural networks (MPNN). Each atom (node) receives "messages" (information) from its neighboring atoms and updates its own state (feature vector). The process is repeated several times (corresponding to multiple GCL layers of the graph neural network), so information can propagate across the whole molecule (a toy sketch follows this list).

    • EGNN (E(n) Equivariant Graph Neural Network): a typical atom-centered model. Equivariance means that when the whole molecule is rotated or translated, the atomic representations learned inside the model rotate or translate accordingly, while predicted scalar properties such as the energy remain unchanged. This respects the physics and performs very well. It operates directly on the atoms' 3D coordinates.
    • OrbNet: the quantum-chemistry matrices used as input are generated with a semi-empirical method (GFN1-xTB), which greatly reduces the computational cost while retaining the key physical information, enabling simulations of molecules with thousands of atoms. Difference between closed-shell and open-shell systems: in a closed-shell system all electron spins are paired (only spatial degrees of freedom matter), whereas an open-shell system contains unpaired electrons (both spatial and spin degrees of freedom matter); open-shell systems are essential for radicals, reaction intermediates, and similar cases.
      • Key idea: predict molecular energies from atomic-orbital (AO) features, namely the quantum-chemistry matrices from the self-consistent-field (SCF) convergence procedure.
      • Feature representation: a symmetry-adapted atomic-orbital (SAAO) basis is used to encode the AO features as graph-structured data.
      • Model architecture: based on a graph neural network (GNN); the decoded output tensors are summed to give the molecular energy.
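
As a toy illustration of the message-passing step described above, here is a minimal NumPy sketch of one round on a three-atom graph; the feature vectors, adjacency matrix, and weight matrices are made-up placeholders rather than a real trained MPNN.

```python
import numpy as np

# One toy round of message passing on a 3-atom molecular graph (all numbers illustrative).
# Rows of h: atom feature vectors; A: adjacency from bonding (atom 1 bonded to atoms 0 and 2).
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

W_msg = np.array([[0.5, -0.2], [0.1, 0.3]])   # hypothetical "learned" message weights
W_upd = np.array([[1.0, 0.0], [0.0, 1.0]])    # hypothetical update weights

messages = A @ (h @ W_msg)                    # each atom sums messages from its neighbors
h_new = np.tanh(h @ W_upd + messages)         # update the atomic states
mol_repr = h_new.sum(axis=0)                  # sum-pool into a whole-molecule representation
```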

2. MO (Molecular Orbital) - the molecular-orbital level / whole-molecule models

These models learn or predict global properties of the whole molecule, i.e., properties tied to its overall electronic structure. Molecular orbitals themselves are linear combinations of all the atomic orbitals and describe how electrons move throughout the molecule.

  • Key idea: model the molecule's global features, or macroscopic manifestations of its electronic structure (such as orbital energy levels), directly.
  • Typical input: descriptors of the whole molecule (e.g., a molecular fingerprint), or the molecular structure itself used directly as input to predict molecular-orbital properties.
  • How it works: such models need not rely entirely on atom-to-atom message passing; instead they aim to build a direct mapping from the molecule to its global properties, for example predicting the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO).

3. Open-shell configurations

Open-shell

A paper skimmed casually: ICML-2024 ICML-WORKSHOP-2024

NPJ-2022

PNAS-2022 OrbNet-Equi !!! graph-based features for OrbNet on QM9

2. October plan:

2.1 What do we need to do?

We need to analyze how AO and MO perform.

We are not sure whether the difference in variability between MO and AO comes from the EGNN or from the GPR; the information they carry is different.

Learning curve - learnability plot

We need to use contrastive learning to find the similarity between AO and MO.

Idealized outcome: we hope contrastive learning brings AO up to the level of MO, i.e., that contrastive learning is more useful for AO.

We aim for GCL + AO.

AO is more fundamental physically; MO has nicer properties.

Final goal: inverse design requires generating AO.

2.2 LCAO (Linear Combination of Atomic Orbitals)

LCAO: Linear Combination of Atomic Orbitals

  • Key idea: the complex behavior of a molecule (described by molecular orbitals, MOs) can be built approximately from the simpler behavior of its constituent atoms (described by atomic orbitals, AOs). A molecular orbital (MO) can be expressed as a weighted sum of several atomic orbitals (AOs).

  • Mathematical form: a molecular orbital \(\Psi_{MO}\) can be written as \[\Psi_{MO} = c_1\phi_1 + c_2\phi_2 + \dots + c_n\phi_n = \sum_{i=1}^{n} c_i\phi_i\] where:

    • \(\Psi_{MO}\) is a molecular-orbital wavefunction.
    • \(\phi_i\) is the atomic-orbital wavefunction of the \(i\)-th atom.
    • \(c_i\) is the combination coefficient of each atomic orbital, a weight that expresses how much that AO contributes to this MO. The coefficients are obtained by solving the Schrödinger equation (usually with approximations such as Hartree-Fock).
  • Example: the hydrogen molecule (H₂). There are two hydrogen atoms, each with one 1s atomic orbital (\(\phi_A\) and \(\phi_B\)). These two AOs can be combined linearly in two ways, giving two molecular orbitals (see the sketch after this list):

    1. Bonding MO: \(\Psi_{\sigma} = c_A\phi_A + c_B\phi_B\). An electron in this orbital is concentrated mainly between the two nuclei, forming a stable chemical bond; its energy is lower than that of the original AOs.
    2. Antibonding MO: \(\Psi_{\sigma^*} = c'_A\phi_A - c'_B\phi_B\). An electron in this orbital sits mainly outside the two nuclei and pushes them apart, working against bond formation; its energy is higher than that of the original AOs.
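
A small numerical sketch of the H₂ example, assuming illustrative values for the on-site (Coulomb) integral α, the resonance integral β, and the overlap S rather than numbers from an actual calculation: solving the 2×2 generalized eigenvalue problem reproduces the bonding and antibonding energies \(E_{\pm} = (\alpha \pm \beta)/(1 \pm S)\).

```python
import numpy as np
from scipy.linalg import eigh

# Minimal H2 LCAO sketch in a 2-AO basis; alpha, beta, S are illustrative values (eV),
# not taken from a real integral calculation.
alpha = -13.6    # on-site (Coulomb) integral
beta  = -4.0     # off-site (resonance) integral
S     = 0.25     # overlap between the two 1s AOs

H = np.array([[alpha, beta],
              [beta,  alpha]])
S_mat = np.array([[1.0, S],
                  [S,   1.0]])

# Generalized eigenproblem H c = E S c gives the MO energies and LCAO coefficients.
energies, coeffs = eigh(H, S_mat)
# energies[0] = (alpha + beta) / (1 + S)  -> bonding MO (lower energy)
# energies[1] = (alpha - beta) / (1 - S)  -> antibonding MO (higher energy)
```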

2.3 Localization of molecular orbitals

Molecular orbitals (MOs), especially those obtained directly from standard methods such as Hartree-Fock, are usually delocalized: each MO may extend over the whole molecule, built from contributions of the AOs of almost every atom. In benzene, for example, the computed π MOs are spread evenly over the six carbon atoms.

Localization of molecular orbitals is a mathematical transformation that converts these delocalized MOs into a new set of localized molecular orbitals (LMOs).

  • Core goal: confine each molecular orbital to as small a region of space as possible without changing the overall molecular wavefunction or the total energy.
  • Result of the transformation:
    • Delocalized bonding orbitals \(\rightarrow\) localized orbitals corresponding to specific chemical bonds (e.g., a C-H bond, a C=C double bond).
    • Delocalized non-bonding orbitals \(\rightarrow\) lone pairs or core electrons on specific atoms.
  • Why localize:
    1. Chemical intuition: LMOs give a clear chemical picture, making bonding easier to understand and analyze.

3. The relationship between LCAO and MO

  1. LCAO is the method for building MOs: LCAO is the mathematical framework used to approximate and represent molecular orbitals (MOs). We assume an MO can be formed as a linear combination of a set of known basis functions (the atomic orbitals, AOs).
  2. MOs are the result of the LCAO method: combining the LCAO ansatz with the quantum-mechanical variational principle to solve the Schrödinger equation yields the explicit form of the molecular orbitals (the contribution coefficient \(c_i\) of each AO) together with their energies.

Atomic orbitals (AO) [input] \(\xrightarrow{\text{LCAO method [process/framework]}}\) molecular orbitals (MO) [output/result]

4. Gaussian Process Regression (GPR)

Gaussian process regression (GPR) is a non-parametric regression method built on Bayesian ideas. It is particularly effective for complex regression problems with small samples, high dimensionality, and a need for uncertainty estimates.

Key idea

The core of GPR is to model the function itself. It assumes the target function \(f(x)\) is a random function that follows a Gaussian process (GP).

  • Gaussian process (GP): a collection of infinitely many random variables, any finite subset of which follows a joint Gaussian distribution. A GP defines a distribution over functions; "sampling" from a GP does not give a number but an entire function.

A Gaussian process is fully specified by two ingredients: 1. the mean function \(m(x)\), which sets the "expected" or central tendency of the function distribution (usually assumed to be zero for simplicity); 2. the covariance function, or kernel, \(k(x, x')\), which defines how correlated (how similar) the function values at two inputs \(x\) and \(x'\) are. If \(x\) and \(x'\) are close, the kernel value is large, meaning \(f(x)\) and \(f(x')\) will be similar; this encodes our prior belief about the smoothness of the function.

How GPR works

The GPR workflow:

Step 1: define the prior distribution. Before seeing any training data, we choose a mean function (usually 0) and a kernel (e.g., the commonly used radial basis function / RBF kernel) based on prior knowledge. This GP defines a prior distribution over functions, covering all the "possible" functions we can imagine.

Step 2: compute the posterior distribution. Once we have training data \((X_{train}, Y_{train})\), Bayes' theorem updates the distribution over functions: functions inconsistent with the training data are "filtered out" of the prior, leaving a posterior distribution.

This posterior is still a Gaussian process, and its mean and covariance have closed-form expressions (they can be computed directly); no complicated iterative optimization is needed.

Making predictions

For a new test point \(x_{test}\) we want to predict the corresponding \(y_{test}\). Under the posterior, the prediction for \(y_{test}\) follows a one-dimensional Gaussian distribution with: 1. a predicted mean, our best point estimate of \(y_{test}\), computed as a weighted average of the training targets with weights set by the kernel; 2. a predicted variance, which measures how uncertain the prediction is: the variance is small near the training points (confident predictions) and large in unknown regions far from the data (uncertain predictions). A minimal numerical sketch follows.
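
To make the workflow concrete, here is a minimal NumPy sketch of GPR with a zero mean and an RBF kernel; the length scale, noise level, and toy sine data are illustrative assumptions.

```python
import numpy as np

# Minimal GPR sketch with a zero mean and an RBF kernel (all settings illustrative).
def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(8, 1))                  # small training set
y_train = np.sin(X_train).ravel() + 0.05 * rng.normal(size=8)
X_test = np.linspace(-4, 4, 100)[:, None]

noise = 0.05**2
K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

K_inv = np.linalg.inv(K)
mean_post = K_s.T @ K_inv @ y_train                        # predicted mean
cov_post = K_ss - K_s.T @ K_inv @ K_s                      # predicted covariance
std_post = np.sqrt(np.clip(np.diag(cov_post), 0, None))    # pointwise uncertainty
```

std_post is small near the 8 training points and grows toward the edges of the test grid, which is exactly the behavior described above.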

3. Follow-up plan:

If AO learns better than MO, we will focus on AO (atomic representation vs. atomic orbital).

  1. AO one body decomposition
  2. MO two body decomposition

Download Pandoc!

pandoc-3.8-windows-x86_64.msi

[Problem] This mainly solves the issue that the default NexT renderer cannot render complex formulas.

Step 1: In the system variables, find Path → click Edit.

Step 2: Click New → enter the path of the parent directory of pandoc.exe (C:) → click OK.

Step 3: Restart the terminal.

Step 4: Install pandoc-3.8-windows-x86_64.msi.

$ pandoc --version
$ npm list --depth=0 | Select-String "renderer"
$ npm uninstall hexo-renderer-kramed --save
$ npm uninstall hexo-renderer-markdown-it --save
$ npm list --depth=0 | Select-String "renderer"
$ npm install hexo-renderer-pandoc --save

More info: Ref

Step 5: Add the required front-matter to the header of your blog post.

mathjax: true

Step 6: In the _config.yaml file under D:_AILab_HKUST_Machine_Learning, make sure your Pandoc path can be found.

pandoc:
  pandoc_path: "C:/Users/Aprine/AppData/Local/Pandoc/pandoc.exe"
  args:
    - "--mathjax"

Step 7: In the _config.yaml file under D:_AILab_HKUST_Machine_Learning, make sure your math settings are configured correctly.

mathjax:
  enable: true
  # See: https://mhchem.github.io/MathJax-mhchem/
  mhchem: true

Step 8: Adjust the basic format of your head file.

More info: Ref

Push new blog

$ hexo g -d


Momentum Contrast for Unsupervised Visual Representation Learning

1. Momentum Contrast

1.1 Definition

  • Content: Contrastive learning learns useful features by teaching the model to distinguish similar data points (positives) from dissimilar ones (negatives).

1.2 Positive pairs:

  • Content: usually different augmented views of the same data point (e.g., two random crops of the same image, color jitter, etc.). They should carry similar semantic information.

1.3 Negatives:

  • Content: samples from data points other than the query. They represent different semantic content.

1.4 Goal: the model's objective is to learn an encoder such that

  • Content: the query and its corresponding positive are close in feature space (high similarity).
  • Content: the query and a large number of negatives are far apart in feature space (low similarity).

2. Innovations

2.1 Dynamic dictionary:

  • Content: MoCo maintains a first-in-first-out (FIFO) queue that stores the encoded feature keys.
  • Content: the current batch is passed through the key encoder and its features are enqueued at the tail of the dictionary queue.
  • Content: at the same time, the oldest batch of features is dequeued and removed.
  • Content: the queue allows the dictionary size \(K\) to be designed far larger than a single batch, providing a huge and consistent source of negatives (consistency is guaranteed by the momentum update below). The queue decouples the dictionary size from the batch size.

2.2 Momentum update of the key encoder:

  • Content: the query encoder is updated with standard gradient descent (SGD).
  • Content: the key encoder is not updated by back-propagation.
  • Content: the key-encoder parameters \(\theta_k\) are obtained from the query-encoder parameters \(\theta_q\) via a momentum update: \[ \theta_k \gets m \cdot \theta_k + (1 - m) \cdot \theta_q \] where \(m\) is a momentum coefficient (e.g., \(m = 0.999\)), very close to \(1\).

2.3 Advantage:

  • Content: the momentum update makes the parameters of the key encoder \(f_k\) change slowly and smoothly.

3. Contrastive loss (InfoNCE Loss):

3.1 InfoNCE Loss (Noise-Contrastive Estimation Loss):

The contrastive loss (InfoNCE / NT-Xent loss) is defined as (a minimal numerical sketch follows the parameter list below): \(\mathcal{L}_{\text{q}} = -\log \underbrace{\left( \frac{\exp\left( \mathbf{q} \cdot \mathbf{k}^{+} / \tau \right)}{\exp\left( \mathbf{q} \cdot \mathbf{k}^{+} / \tau \right) + \sum\limits_{i=1}^{N} \exp\left( \mathbf{q} \cdot \mathbf{k}_{i}^{-} / \tau \right)} \right)}_{\text{softmax probability}}\)

3.2 Where:

  • \(\mathbf{q}\): the query vector, output of the query encoder \(f_q\) (e.g., \(2048\)-dimensional).
  • \(\mathbf{k}^{+}\): the positive key vector, output of the key encoder \(f_k\) (from a different augmented view of the same data).
  • \(\mathbf{k}_{i}^{-}\): the negative key vectors, from other data samples in the dictionary queue (\(N\) of them, e.g., \(65536\)).
  • \(\tau\): the temperature, which controls how sharp the similarity distribution is (typical values \(0.05 \sim 0.2\)).
  • \(\cdot\): the vector dot product (after L2 normalization it equals the cosine similarity, i.e., \(\mathbf{q} \cdot \mathbf{k} = \cos\theta\)).
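
A minimal NumPy sketch tying these pieces together for a single query; the feature dimension, queue size, temperature, and momentum value are illustrative, and a real MoCo implementation would of course operate on batches inside a deep-learning framework.

```python
import numpy as np

# Toy MoCo ingredients for one query (dimensions and values are illustrative only).
rng = np.random.default_rng(0)
D, N, tau, m = 128, 4096, 0.07, 0.999

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

q     = l2_normalize(rng.normal(size=D))          # query from f_q
k_pos = l2_normalize(rng.normal(size=D))          # positive key from f_k
queue = l2_normalize(rng.normal(size=(N, D)))     # dictionary of negative keys

# InfoNCE loss for one query: softmax over 1 positive and N negatives.
logits = np.concatenate(([q @ k_pos], queue @ q)) / tau
loss = -logits[0] + np.log(np.sum(np.exp(logits)))   # -log softmax probability of the positive

# Momentum update of the key-encoder parameters (flattened toy parameter vectors).
theta_q = rng.normal(size=1000)
theta_k = rng.normal(size=1000)
theta_k = m * theta_k + (1 - m) * theta_q

# Queue update: enqueue the newest batch of keys, dequeue the oldest.
batch_keys = l2_normalize(rng.normal(size=(256, D)))
queue = np.concatenate([queue[len(batch_keys):], batch_keys])
```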

3.3 Data flow:

  • Raw input: an image P.jpg (256x256 original size).
  • Preprocessing step 1: randomly crop a 224x224 region.
  • Preprocessing step 2: randomly jitter the color and brightness slightly.
  • Preprocessing step 3: normalize the pixel values.
  • Result: [3, 224, 224] (a 3x224x224 tensor), the form fed into the encoder.
  • Query encoder \(f_q\) processing:
  • Step 1: input: the [3, 224, 224] tensor.
  • Step 2: pass through the model.
  • Step 3: a global average pooling layer collapses the spatial dimensions.
  • Step 4: a linear projection layer maps the feature dimension to D.
  • Result: a normalized \(D\)-dimensional (e.g., \(M\)-dimensional) vector \(q\). This \(q\) is the abstract feature of the cropped, color-perturbed cat picture, e.g., [0.12, -0.05, 0.87, …, 0.03] (\(M\) values).
  • Key encoder \(f_k\) processing:
  • Step 1: input: a [3, 224, 224] tensor (another [3, 224, 224] tensor obtained by applying a different set of random preprocessing steps to P.jpg).
  • Step 2: pass through \(f_k\), which has the same architecture but parameters obtained by the momentum update.
  • Step 3: a normalized \(D\)-dimensional (e.g., \(M\)-dimensional) vector \(k\), e.g., [0.15, -0.08, 0.84, …, 0.02]. This \(k\) represents the abstract feature of the same data under a different view/coloring.
  • Dynamic dictionary:
  • Step 1: contains the \(k\) vectors previously computed by \(f_k\). For example, with a queue of size \(L\), it stores \(L\) different D=128 dimensional vectors, each representing the feature of one previously processed sample.

Criegee, H10 chain, small radicals, water bond dissociation, and QMSpin energy datasets with MOB features for MOB-ML(KA-GPR) CaltechData!

Contents

  1. Criegee dataset
  2. H10 chain dataset
  3. Small radicals dataset
  4. Water bond-dissociation dataset
  5. QMSpin dataset

1. Criegee dataset

1.1 Energy file (criegee.csv)

  • Content: contains RHF and MRCI+Q energies (computed with the cc-pVTZ basis set).

1.2 Diagnostics file (criegee_diagnostic.csv):

  • Content: contains T1 and D1 diagnostics from CCSD/cc-pVTZ calculations.

1.3 Geometry folders (800 in total):

  • Content: geo.xyz: molecular geometry coordinates
  • Content: features_tz.hdf5: diagonal MOB features for KA-GPR.

To learn: RHF, MRCI+Q, CC, KA-GPR, MOB

2. H10 chain dataset (h10.zip)

2.1 Energy file (h10.csv):

  • Content: contains RHF and MRCI+Q-F12 energies (computed with the cc-pVTZ-F12 basis set).

2.2 Geometry folders:

  • Content: geo.xyz: molecular geometry coordinates
  • Content: features_tz.hdf5: diagonal MOB features for KA-GPR.

To learn: RHF, MRCI+Q, CC, KA-GPR, MOB

3. Small radicals dataset (small_radicals.zip)

3.1 Contains 9 radicals; for each radical there are:

3.1.1 An energy file (x.csv):

  • Content: contains ROHF and MRCI+Q energies (computed with the cc-pVTZ basis set).

3.1.2 Thermalized geometry folders (200 per radical):

  • Content: geo.xyz: molecular geometry coordinates
  • Content: features_alpha.hdf5: MOB features of the α spin orbitals
  • Content: features_beta.hdf5: MOB features of the β spin orbitals.

To learn: RHF, ROHF, MRCI+Q, CC, KA-GPR, MOB, MOB features of the α spin orbitals, MOB features of the β spin orbitals

4. Water bond-dissociation dataset (h2o_dissociation.zip)

4.1 Energy file (h2o_dissociation.csv)

  • Content: contains the initial-conformer ID, the bond-length scaling factor along the OH dissociation path, and the ROHF and MRCI+Q energies (computed with the aug-cc-pVTZ basis set).

4.2 Initial-conformer folders (50 in total):

4.2.1 Each conformer contains 20 structures along the dissociation path:

  • Content: features_alpha.hdf5: MOB features of the α spin orbitals
  • Content: features_beta.hdf5: MOB features of the β spin orbitals.

To learn: initial-conformer ID, bond-length scaling factor of the OH dissociation path, ROHF, MRCI+Q, CC, MOB features of the α spin orbitals, MOB features of the β spin orbitals

5. QMSpin dataset (qmspin.zip)

5.1 Energy file (qmspin.csv):

  • Content: contains MRCI+Q energies for the singlet (RHF) and triplet (ROHF) states (computed with the cc-pVDZ basis set); the spin state is labeled 0 (singlet) or 2 (triplet).

5.2 Geometry folders:

  • Content: geometries_singlet: singlet-optimized structures
  • Content: geometries_triplet: triplet-optimized structures
  • Content: singlet feature file: features_dz_singlet.hdf5: MOB features for the singlet energies
  • Content: triplet feature files: features_alpha_dz_triplet.hdf5: α spin-orbital features; features_beta_dz_triplet.hdf5: β spin-orbital features.

To learn: RHF, ROHF, MRCI+Q, CC, MOB, MOB features of the α spin orbitals, MOB features of the β spin orbitals

Welcome to TianyaoBlogs! This is my very first post. It shows my process of building this blog!

Start to own your blogs

Download Node.js & npm and check their versions

$ node -v
$ npm -v

More info: Download More info: Download

Prepare your path

$ npm config set prefix "D:\Program Files (x86)\node_modules\node_global"
$ npm config set cache "D:\Program Files (x86)\node_modules\node_cache"

More info: Check

Check your environmental path

$ echo %NODE_PATH%

More info: Check

Generate SSH keys for the second account (for those who have two or more GitHub accounts)

$ ssh-keygen -t ed25519 -C "your_email@second_account.com"

Add the new public key to the second GitHub account

$ cat ~/.ssh/id_ed25519_second.pub

Configuring SSH multi-account rules

$ nano ~/.ssh/config

# (peninsula824)
Host github.com-peninsula
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa # peninsula824_key
    IdentitiesOnly yes

# (TianyaoBlogs)
Host github.com-tianyao
    HostName github.com
    User git
    IdentityFile ~/.ssh/0901102262 # TianyaoBlogs_key
    IdentitiesOnly yes

Test the connection

$ ssh -T git@github.com-peninsula
$ ssh -T git@github.com-tianyao

Update Hexo deployment configuration

deploy:
  type: git
  repo:
    github: git@github.com-tianyao:TianyaoBlogs/TianyaoBlogs.github.io.git
  branch: main

Complete the deployment

$ hexo clean
$ hexo generate
$ hexo deploy

Push new blog

$ hexo g -d