Radial-Basis Function and Cover’s Theorem
Implementation of RBF Based on Interpolation
Abstract
A Radial Basis Function (RBF) is a real-valued function whose output depends only on the distance between the input x and a fixed point c, called the center [1]. Data can be linearly separable or non-linearly separable; non-linearly separable data requires a curved decision boundary, rather than a straight line or hyperplane, to separate the classes perfectly. The radial basis function is the core component of RBF networks: used as the activation function, it gives the network its key characteristics, such as non-linear mapping and effective function approximation. This article examines Cover's theorem and an RBF implementation, with the required mathematics and algorithm.
1. Introduction
A Radial Basis Function (RBF) is a function φ whose value depends on the distance between two points, measured with a norm (a way to measure distance) in a vector space. The RBF centered at a point c is written as φ_c(x) = φ(∥x − c∥), where ∥x − c∥ is the distance between x and c [1].
Section 2 discusses Cover’s theorem, Section 3 outlines the RBF Functions with a numerical example, Section 4 is the RBF Algorithm, Section 5 provides detailed handwritten lecture notes, and Section 6 is the Conclusion.
2. Cover’s Theorem on the Separability of Patterns
Cover’s theorem states that:
“A complex pattern-classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in a low-dimensional space.” This means the probability of separability increases with the increase in dimensionality. Thus, it also forms the theoretical basis for kernel-based learning algorithms.
Mathematical Interpretation:
----------------------------
Let:
- φ: ℝⁿ → ℝᴺ be a nonlinear transformation, where N >> n
- Data in ℝⁿ may not be linearly separable.
Then:
There exists a high probability that φ(data) ∈ ℝᴺ becomes linearly separable.
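Cover's theorem can be illustrated with a tiny sketch (an illustrative example, not from the article): the four XOR-style points are not linearly separable in ℝ², but after a Gaussian RBF mapping into feature space a single threshold separates the two classes.

```python
import numpy as np

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])  # XOR-style classes: not linearly separable in R^2

# Nonlinear map phi: R^2 -> R^2 built from Gaussian bumps at two centers
centers = np.array([[0.0, 0.0], [1.0, 1.0]])

def phi(x):
    return np.exp(-np.sum((x - centers) ** 2, axis=1))

features = np.array([phi(x) for x in X])
score = features.sum(axis=1)  # 1D projection of the mapped data

# In the feature space, one threshold separates the classes
threshold = 0.9
pred = (score < threshold).astype(int)
print(pred)  # [0 0 1 1], matching labels
```

Class 0 points sit near a center (score ≈ 1.135), class 1 points sit between the centers (score ≈ 0.736), so the mapped data is linearly separable even though the original data is not.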
3. Numerical Examples of RBF
Example:
Given:
- x = [[4, 5], [2, 3]] (fixed inputs)
- c₁ = [1, 2], c₂ = [3, 4] (centers, randomly selected)
- σ = spread, computed using σ = d_max / √(2K)
- w = weights, computed using the formula w = (Φᵀ Φ)⁻¹ Φᵀ y
- b = 0.1 (bias, randomly selected)
Step 1: Compute the Spread (σ)
Formula for σ: σ = d_max / √(2K)
Where:
- d_max is the maximum distance between any two centers.
- K is the number of centers.
Step 1.1: Calculate d_max (distance between c₁ and c₂):
d_max = √((c₂₁ - c₁₁)² + (c₂₂ - c₁₂)²)
d_max = √((3 - 1)² + (4 - 2)²)
d_max = √(2² + 2²)
d_max = √(4 + 4)
d_max = √8 ≈ 2.828
Step 1.2: Compute σ:
σ = d_max / √(2K)
σ = 2.828 / √(2 * 2)
σ = 2.828 / √4
σ = 2.828 / 2
σ ≈ 1.414
Thus, the spread (σ) is approximately 1.414.
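The spread computation above can be sketched in a few lines (illustrative, assuming NumPy):

```python
import numpy as np

centers = np.array([[1.0, 2.0], [3.0, 4.0]])  # c1, c2
K = len(centers)

# d_max: the maximum pairwise distance between centers
d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
sigma = d_max / np.sqrt(2 * K)

print(round(d_max, 3), round(sigma, 3))  # 2.828 1.414
```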
Step 2: Compute RBF Activations
The RBF activation for each center is computed using the formula: ϕₖ(x) = exp(-||x - cₖ||² / (2σ²))
Where:
- x is an input vector, taken from [[4, 5], [2, 3]].
- c₁ = [1, 2] and c₂ = [3, 4] are the centers.
- σ ≈ 1.414 is the spread.
Step 2.1: Compute ϕ₁₁ (for c₁ = [1, 2]) for X = [4, 5]:
ϕ₁₁ = exp(- ((4 – 1)² + (5 – 2)²) / (2 * (1.414)²))
ϕ₁₁ = exp(- (3² + 3²) / (2 * 2))
ϕ₁₁ = exp(- (18 / 4))
ϕ₁₁ = exp(-4.5)
ϕ₁₁ ≈ 0.0111
Thus, ϕ₁₁ ≈ 0.0111
--------------------------------------------
Step 2.2: Compute ϕ₂₂ (for c₂ = [3, 4]) for X = [2, 3]:
ϕ₂₂ = exp(- ((2 – 3)² + (3 – 4)²) / (2 * (1.414)²))
ϕ₂₂ = exp(- (1² + 1²) / (2 * 2))
ϕ₂₂ = exp(- (2 / 4))
ϕ₂₂ = exp(-0.5)
ϕ₂₂ ≈ 0.6065
Thus, ϕ₂₂ ≈ 0.6065
--------------------------------------------
Summary (ϕ matrix):
Let:
- ϕ₁₁ = RBF between X₁ = [4,5] and c₁ = [1,2]
- ϕ₁₂ = RBF between X₁ = [4,5] and c₂ = [3,4]
- ϕ₂₁ = RBF between X₂ = [2,3] and c₁ = [1,2]
- ϕ₂₂ = RBF between X₂ = [2,3] and c₂ = [3,4]
You have already computed:
- ϕ₁₁ ≈ 0.0111
- ϕ₂₂ ≈ 0.6065
Now, compute:
ϕ₁₂ = exp(- ((4 – 3)² + (5 – 4)²) / (2 * (1.414)²))
ϕ₁₂ = exp(- (1² + 1²) / (2 * 2))
ϕ₁₂ = exp(-0.5)
ϕ₁₂ ≈ 0.6065
ϕ₂₁ = exp(- ((2 – 1)² + (3 – 2)²) / (2 * (1.414)²))
ϕ₂₁ = exp(- (1² + 1²) / (2 * 2))
ϕ₂₁ = exp(-0.5)
ϕ₂₁ ≈ 0.6065
Final ϕ matrix:
c₁ c₂
X₁ 0.0111 0.6065
X₂ 0.6065 0.6065
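The ϕ matrix above can be reproduced directly; a minimal sketch (σ is kept as the unrounded √2 so the entries match to four decimals):

```python
import numpy as np

X = np.array([[4.0, 5.0], [2.0, 3.0]])   # inputs X1, X2
C = np.array([[1.0, 2.0], [3.0, 4.0]])   # centers c1, c2
sigma = np.sqrt(2)                        # d_max / sqrt(2K), unrounded

# Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-sq_dists / (2 * sigma ** 2))

print(np.round(Phi, 4))  # rows = inputs, columns = centers
```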
Step 3: Compute (Φᵀ Φ)
Given:
Φ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
Now, compute ΦᵀΦ:
Φᵀ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
ΦᵀΦ =
[ (0.0111)^2 + (0.6065)^2 (0.0111)(0.6065) + (0.6065)^2 ]
[ (0.0111)(0.6065) + (0.6065)^2 (0.6065)^2 + (0.6065)^2 ]
=
[ 0.00012321 + 0.368 0.0067 + 0.368 ]
[ 0.0067 + 0.368 0.368 + 0.368 ]
≈
[ 0.3681 0.3747 ]
[ 0.3747 0.736 ]
Step 4: Compute (ΦᵀΦ)⁻¹
Given:
ΦᵀΦ =
[ 0.3681 0.3747 ]
[ 0.3747 0.736 ]
To find the inverse of a 2x2 matrix:
If A = [a b]
[c d]
Then A⁻¹ = (1 / (ad - bc)) * [ d -b]
[-c a]
So,
Determinant = (0.3681)(0.736) - (0.3747)(0.3747)
≈ 0.270922 - 0.140400
≈ 0.130522
Now, compute the inverse:
(ΦᵀΦ)⁻¹ = (1 / 0.130522) *
[ 0.736 -0.3747 ]
[ -0.3747 0.3681 ]
≈
[ 5.639 -2.871 ]
[ -2.871 2.820 ]
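As a quick numerical check of Step 4 (a sketch using the rounded ΦᵀΦ values; small differences from the hand computation come from rounding):

```python
import numpy as np

PtP = np.array([[0.3681, 0.3747],
                [0.3747, 0.7360]])

det = np.linalg.det(PtP)
inv = np.linalg.inv(PtP)

print(round(det, 4))     # ~0.1305
print(np.round(inv, 3))  # close to the hand-computed inverse
```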
Step 5: Compute Φᵀ y with y = [0.5, 0.7]
Given (the Φ matrix from Step 2, with rows X₁, X₂ and columns c₁, c₂):
Φ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
y = [ 0.5, 0.7 ]
Now, compute Φᵀ (Φ is symmetric here, so Φᵀ = Φ):
Φᵀ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
Now, multiply Φᵀ by y:
Φᵀ y =
[ 0.0111 0.6065 ] * [ 0.5 ]
[ 0.6065 0.6065 ] [ 0.7 ]
= [ (0.0111 * 0.5) + (0.6065 * 0.7) ]
[ (0.6065 * 0.5) + (0.6065 * 0.7) ]
= [ 0.00555 + 0.42455 ]
[ 0.30325 + 0.42455 ]
= [ 0.43010 ]
[ 0.72780 ]
Thus, Φᵀ y =
[ 0.43010 ]
[ 0.72780 ]
Step 6: Compute Weight Vector (w)
Now, we compute the weights w using the formula: w = (Φᵀ Φ)⁻¹ Φᵀ y
Where:
- Φ is the matrix of RBF activations for all the centers.
- y = [0.5, 0.7] is the target output vector, assumed for this example.
Given:
(ΦᵀΦ)⁻¹ ≈
[ 5.639 -2.871 ]
[ -2.871 2.820 ]
Φᵀy =
[ 0.43010 ]
[ 0.72780 ]
Now compute w:
w₁ = (5.639)(0.43010) + (-2.871)(0.72780)
≈ 2.42533 - 2.08951
≈ 0.33582
w₂ = (-2.871)(0.43010) + (2.820)(0.72780)
≈ -1.23482 + 2.05240
≈ 0.81758
So, w ≈
[ 0.33582 ]
[ 0.81758 ]
------------------------------------------------------
Step 7: Compute Feature Vector ϕ(X) for X = [2, 3]
ϕ₁ = exp(- ((2–1)² + (3–2)²) / (2 * (1.414)²))
= exp(- (1² + 1²) / (2 * 2))
= exp(- (2 / 4))
= exp(-0.5)
≈ 0.6065
ϕ₂ = exp(- ((2–3)² + (3–4)²) / (2 * (1.414)²))
= exp(- (1² + 1²) / (2 * 2))
= exp(- (2 / 4))
= exp(-0.5)
≈ 0.6065
So, ϕ(X) = [0.6065, 0.6065]
Step 8: Compute Prediction ŷ
------------------------------------------------------
ŷ = ϕ(X) * w
= [0.6065, 0.6065] * [ 0.33582, 0.81758 ]ᵀ
= (0.6065)(0.33582) + (0.6065)(0.81758)
≈ 0.20368 + 0.49586
≈ 0.6995
------------------------------------------------------
Final Prediction:
ŷ(X = [2, 3]) ≈ 0.6995, which is approximately 0.7, the target y₂ for X₂ = [2, 3]; the small gap comes from rounding the intermediate values.
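Steps 1 through 8 can be verified end to end (a sketch, assuming NumPy): with the exact ϕ values, the least-squares fit interpolates the training targets, so the prediction at X = [2, 3] recovers its target 0.7.

```python
import numpy as np

X = np.array([[4.0, 5.0], [2.0, 3.0]])   # training inputs X1, X2
C = np.array([[1.0, 2.0], [3.0, 4.0]])   # centers c1, c2
y = np.array([0.5, 0.7])                 # targets
sigma = np.sqrt(2)                       # unrounded d_max / sqrt(2K)

def rbf_features(inputs):
    # Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
    d2 = ((inputs[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi = rbf_features(X)
w = np.linalg.pinv(Phi.T @ Phi) @ Phi.T @ y   # w = (Phi^T Phi)^-1 Phi^T y

y_hat = rbf_features(np.array([[2.0, 3.0]])) @ w
print(round(float(y_hat[0]), 4))  # 0.7
```

Because Φ is square and invertible here, the least-squares weights reproduce the training targets exactly; the hand computation differs only by rounding.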
Special Case: Computing the Pseudo-Inverse when det(Φᵀ Φ) = 0 in RBF
Finding (Φᵀ Φ)⁻¹
To find (Φᵀ Φ)⁻¹, we need to compute the determinant and adjugate of the matrix Φᵀ Φ.
Given:
Φᵀ Φ =
[ 0.7358, 0.7358 ]
[ 0.7358, 0.7358 ]
Step 1: Compute the Determinant of Φᵀ Φ
The determinant of a 2x2 matrix is calculated using the formula:
det(A) = (a * d) - (b * c)
For the matrix Φᵀ Φ, where:
a = 0.7358, b = 0.7358, c = 0.7358, d = 0.7358
det(Φᵀ Φ) = (0.7358 * 0.7358) - (0.7358 * 0.7358)
det(Φᵀ Φ) = 0.5414 - 0.5414 = 0
Step 2: Compute the Adjugate of Φᵀ Φ
The adjugate of a 2x2 matrix is computed by swapping the diagonal elements and negating the off-diagonal elements. The adjugate of Φᵀ Φ is:
adj(Φᵀ Φ) =
[ d, -b ]
[ -c, a ]
For the matrix Φᵀ Φ, this becomes:
adj(Φᵀ Φ) =
[ 0.7358, -0.7358 ]
[ -0.7358, 0.7358 ]
Step 3: Compute the Inverse of Φᵀ Φ
The inverse of a 2x2 matrix is calculated using the formula:
(Φᵀ Φ)⁻¹ = 1 / det(Φᵀ Φ) * adj(Φᵀ Φ)
However, since the determinant of Φᵀ Φ is zero, this means that the matrix is singular and does not have an inverse.
Thus, (Φᵀ Φ)⁻¹ does not exist because the determinant is zero.
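In code the same situation shows up as an error: `np.linalg.inv` refuses a singular matrix, while the Moore-Penrose pseudo-inverse `np.linalg.pinv` is always defined. A minimal sketch:

```python
import numpy as np

A = np.array([[0.7358, 0.7358],
              [0.7358, 0.7358]])  # rows are identical, so det(A) = 0

try:
    np.linalg.inv(A)             # fails for a singular matrix
except np.linalg.LinAlgError as err:
    print("inv failed:", err)

A_pinv = np.linalg.pinv(A)       # pseudo-inverse exists even when det = 0
print(np.round(A_pinv, 4))       # each entry ~= 1 / (4 * 0.7358) ~= 0.3398
```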
Step 1: Add a regularization term to Φᵀ Φ
Φᵀ Φ =
[ [0.7358, 0.7358],
[0.7358, 0.7358] ]
Regularization parameter λ = 0.01
Φᵀ Φ + λI =
[ [0.7358 + 0.01, 0.7358],
[0.7358, 0.7358 + 0.01] ]
Φᵀ Φ + λI =
[ [0.7458, 0.7358],
[0.7358, 0.7458] ]
here I is the identity matrix
Step 2: Compute the inverse of Φᵀ Φ + λI
To compute the inverse of a 2x2 matrix
[ a, b ]
[ c, d ]
The formula for the inverse is:
(1 / (ad - bc)) * [ d, -b ]
[ -c, a ]
For Φᵀ Φ + λI = [ [0.7458, 0.7358],
[0.7358, 0.7458] ]
The determinant is:
det = (0.7458 * 0.7458) - (0.7358 * 0.7358) = 0.014816 (approximately)
Now compute the inverse:
(Φᵀ Φ + λI)⁻¹ = (1 / 0.014816) * [ [0.7458, -0.7358],
[-0.7358, 0.7458] ]
(Φᵀ Φ + λI)⁻¹ = 67.49 * [ [0.7458, -0.7358],
[-0.7358, 0.7458] ]
(Φᵀ Φ + λI)⁻¹ = [ [50.3340, -49.6591],
[-49.6591, 50.3340] ]
Step 3: Multiply the inverse by Φᵀ y to compute w
Φᵀ y = [ [0.7278],
[0.7278] ]
Now multiply (Φᵀ Φ + λI)⁻¹ by Φᵀ y:
w = [ [50.3340, -49.6591],
[-49.6591, 50.3340] ] * [ [0.7278],
[0.7278] ]
Performing the matrix multiplication:
w₁ = (50.3340 * 0.7278) + (-49.6591 * 0.7278) = 36.6331 - 36.1419 = 0.4912
w₂ = (-49.6591 * 0.7278) + (50.3340 * 0.7278) = -36.1419 + 36.6331 = 0.4912
Thus, the final weights are:
w = [ [0.4912],
[0.4912] ]
Final Result:
w ≈ [0.4912, 0.4912]
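The regularized computation above can be reproduced numerically (a sketch with λ = 0.01, matching the worked numbers):

```python
import numpy as np

PtP = np.array([[0.7358, 0.7358],
                [0.7358, 0.7358]])   # singular Phi^T Phi from above
Pty = np.array([0.7278, 0.7278])     # Phi^T y
lam = 0.01                           # regularization parameter lambda

# Tikhonov (ridge) regularization restores invertibility:
# w = (Phi^T Phi + lambda I)^-1 Phi^T y
w = np.linalg.inv(PtP + lam * np.eye(2)) @ Pty
print(np.round(w, 4))                # both weights ~= 0.491
```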
4. RBF Algorithm
Input: Training data {(xᵢ, yᵢ)} for i = 1 to N; number of RBF centers K (randomly selected)
Algorithm Steps:
1. Choose K centers (cⱼ): for example, select K points at random from the training data.
2. Choose spread (σ): a common choice is σ = d_max / √(2K), where d_max is the maximum distance between any two centers.
3. Compute RBF activations for each input: for each input xᵢ and center cⱼ, compute Φᵢⱼ = ϕⱼ(xᵢ) = exp(-||xᵢ - cⱼ||² / (2σ²)).
4. Train output weights using linear regression: w = (ΦᵀΦ)⁻¹ Φᵀ y.
5. Predict for a new input x: ŷ = Σⱼ₌₁ to K [ wⱼ ⋅ ϕⱼ(x) ] + b.
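The steps above can be sketched as a small fit/predict pair (illustrative; `fit_rbf` and `predict_rbf` are hypothetical helper names, centers are drawn at random from the training data, and the bias term is omitted for brevity):

```python
import numpy as np

def fit_rbf(X, y, K, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K centers at random from the training data
    C = X[rng.choice(len(X), size=K, replace=False)]
    # Step 2: spread sigma = d_max / sqrt(2K)
    d_max = max(np.linalg.norm(a - b) for a in C for b in C)
    sigma = d_max / np.sqrt(2 * K)
    # Step 3: matrix of RBF activations, Phi[i, j] = phi_j(x_i)
    Phi = np.exp(-((X[:, None] - C[None, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    # Step 4: output weights by linear least squares (pinv handles singular cases)
    w = np.linalg.pinv(Phi) @ y
    return C, sigma, w

def predict_rbf(x, C, sigma, w):
    # Step 5: weighted sum of the basis activations (bias omitted)
    phi = np.exp(-((x - C) ** 2).sum(-1) / (2 * sigma ** 2))
    return phi @ w

X = np.array([[4.0, 5.0], [2.0, 3.0], [1.0, 1.0]])
y = np.array([0.5, 0.7, 0.2])
C, sigma, w = fit_rbf(X, y, K=3)
print([round(float(predict_rbf(x, C, sigma, w)), 3) for x in X])
# with K = N the network interpolates the training targets
```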
5. Handwritten Lecture Notes
6. Conclusion
The Gaussian function is the most commonly used type of RBF. We explored Cover's theorem and implemented kernel regression using RBFs. The pseudo-inverse is used when the matrix is non-square or singular (i.e., it has no ordinary inverse). RBFs are used in machine learning for classification, regression, clustering, and anomaly detection, and in applications such as face recognition and robotics, to model non-linear data patterns. This article only demonstrates the RBF computation itself, without embedding it in a full neural network.
References:
Appendix:
A. Lab Resources and Code Repositories — GitHub Repository: https://github.com/dilli822/Neural-Network-Labs — Contains implementations of various neural network models, algorithms, and experiments.
B. Implementation of RBF & Cover’s theorem
Tools