Radial-Basis Function and Cover’s Theorem
Implementation of RBF Based on Interpolation
Abstract
A Radial Basis Function (RBF) is a real-valued function whose output depends only on the distance between the input x and a fixed point c, called the center [1]. Data can be linearly separable or non-linearly separable; non-linearly separable data requires a curved decision boundary, rather than a straight line or hyperplane, to separate the classes perfectly. The radial basis function is the core component of RBF networks: used as the activation function, it gives the network its key characteristics, such as non-linear mapping and effective function approximation. This article examines Cover's theorem and an RBF implementation, with the required mathematics and algorithm.
1. Introduction
A Radial Basis Function (RBF) is a function φ whose value depends on the distance between two points, measured with a norm (a way to measure distance) in a vector space. The RBF centered at a point c is written as φ_c(x) = φ(∥x − c∥), where ∥x − c∥ is the distance between x and c [1].
Section 2 discusses Cover’s theorem, Section 3 outlines the RBF Functions with a numerical example, Section 4 is the RBF Algorithm, Section 5 provides detailed handwritten lecture notes, and Section 6 is the Conclusion.
2. Cover’s Theorem on the Separability of Patterns
Cover’s theorem states that:
“A complex pattern-classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in a low-dimensional space.” This means the probability of separability increases with the increase in dimensionality. Thus, it also forms the theoretical basis for kernel-based learning algorithms.
Mathematical Interpretation:
----------------------------
Let:
- φ: ℝⁿ → ℝᴺ be a nonlinear transformation, where N >> n
- Data in ℝⁿ may not be linearly separable.
Then:
There exists a high probability that φ(data) ∈ ℝᴺ becomes linearly separable.
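Cover's theorem can be illustrated with a tiny sketch (an illustrative example, not from the article): the four XOR-style points are not linearly separable in ℝ², but after a Gaussian RBF mapping into feature space a single threshold separates the two classes.

```python
import numpy as np

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])  # XOR-style classes: not linearly separable in R^2

# Nonlinear map phi: R^2 -> R^2 built from Gaussian bumps at two centers
centers = np.array([[0.0, 0.0], [1.0, 1.0]])

def phi(x):
    return np.exp(-np.sum((x - centers) ** 2, axis=1))

features = np.array([phi(x) for x in X])
score = features.sum(axis=1)  # 1D projection of the mapped data

# In the feature space, one threshold separates the classes
threshold = 0.9
pred = (score < threshold).astype(int)
print(pred)  # [0 0 1 1], matching labels
```

Class 0 points sit near a center (score ≈ 1.135), class 1 points sit between the centers (score ≈ 0.736), so the mapped data is linearly separable even though the original data is not.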
3. Numerical Examples of RBF
Example:
Given:
- x = [[4, 5], [2, 3]] (fixed inputs)
- c₁ = [1, 2], c₂ = [3, 4] (centers, randomly selected)
- σ = spread, computed using σ = d_max / √(2K)
- w = weights, computed using the formula w = (Φᵀ Φ)⁻¹ Φᵀ y
- b = 0.1 (bias, randomly selected)
Step 1: Compute the Spread (σ)
Formula for σ: σ = d_max / √(2K)
Where:
- d_max is the maximum distance between any two centers.
- K is the number of centers.
Step 1.1: Calculate d_max (distance between c₁ and c₂):
d_max = √((c₂₁ - c₁₁)² + (c₂₂ - c₁₂)²)
d_max = √((3 - 1)² + (4 - 2)²)
d_max = √(2² + 2²)
d_max = √(4 + 4)
d_max = √8 ≈ 2.828
Step 1.2: Compute σ:
σ = d_max / √(2K)
σ = 2.828 / √(2 * 2)
σ = 2.828 / √4
σ = 2.828 / 2
σ ≈ 1.414
Thus, the spread (σ) is approximately 1.414.
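The spread computation above can be sketched in a few lines (illustrative, assuming NumPy):

```python
import numpy as np

centers = np.array([[1.0, 2.0], [3.0, 4.0]])  # c1, c2
K = len(centers)

# d_max: the maximum pairwise distance between centers
d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
sigma = d_max / np.sqrt(2 * K)

print(round(d_max, 3), round(sigma, 3))  # 2.828 1.414
```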
Step 2: Compute RBF Activations
The RBF activation for each center is computed using the formula: ϕₖ(x) = exp(-||x - cₖ||² / (2σ²))
Where:
- x is an input vector, taken from [[4, 5], [2, 3]].
- c₁ = [1, 2] and c₂ = [3, 4] are the centers.
- σ ≈ 1.414 is the spread.
Step 2.1: Compute ϕ₁₁ (for c₁ = [1, 2]) for X = [4, 5]:
ϕ₁₁ = exp(- ((4 – 1)² + (5 – 2)²) / (2 * (1.414)²))
ϕ₁₁ = exp(- (3² + 3²) / (2 * 2))
ϕ₁₁ = exp(- (18 / 4))
ϕ₁₁ = exp(-4.5)
ϕ₁₁ ≈ 0.0111
Thus, ϕ₁₁ ≈ 0.0111
--------------------------------------------
Step 2.2: Compute ϕ₂₂ (for c₂ = [3, 4]) for X = [2, 3]:
ϕ₂₂ = exp(- ((2 – 3)² + (3 – 4)²) / (2 * (1.414)²))
ϕ₂₂ = exp(- (1² + 1²) / (2 * 2))
ϕ₂₂ = exp(- (2 / 4))
ϕ₂₂ = exp(-0.5)
ϕ₂₂ ≈ 0.6065
Thus, ϕ₂₂ ≈ 0.6065
--------------------------------------------
Summary (ϕ matrix):
Let:
- ϕ₁₁ = RBF between X₁ = [4,5] and c₁ = [1,2]
- ϕ₁₂ = RBF between X₁ = [4,5] and c₂ = [3,4]
- ϕ₂₁ = RBF between X₂ = [2,3] and c₁ = [1,2]
- ϕ₂₂ = RBF between X₂ = [2,3] and c₂ = [3,4]
You have already computed:
- ϕ₁₁ ≈ 0.0111
- ϕ₂₂ ≈ 0.6065
Now, compute:
ϕ₁₂ = exp(- ((4 – 3)² + (5 – 4)²) / (2 * (1.414)²))
ϕ₁₂ = exp(- (1² + 1²) / (2 * 2))
ϕ₁₂ = exp(-0.5)
ϕ₁₂ ≈ 0.6065
ϕ₂₁ = exp(- ((2 – 1)² + (3 – 2)²) / (2 * (1.414)²))
ϕ₂₁ = exp(- (1² + 1²) / (2 * 2))
ϕ₂₁ = exp(-0.5)
ϕ₂₁ ≈ 0.6065
Final ϕ matrix:
c₁ c₂
X₁ 0.0111 0.6065
X₂ 0.6065 0.6065
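The ϕ matrix above can be reproduced directly; a minimal sketch (σ is kept as the unrounded √2 so the entries match to four decimals):

```python
import numpy as np

X = np.array([[4.0, 5.0], [2.0, 3.0]])   # inputs X1, X2
C = np.array([[1.0, 2.0], [3.0, 4.0]])   # centers c1, c2
sigma = np.sqrt(2)                        # d_max / sqrt(2K), unrounded

# Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-sq_dists / (2 * sigma ** 2))

print(np.round(Phi, 4))  # rows = inputs, columns = centers
```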
Step 3: Compute (Φᵀ Φ)
Given:
Φ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
Now, compute ΦᵀΦ:
Φᵀ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
ΦᵀΦ =
[ (0.0111)^2 + (0.6065)^2 (0.0111)(0.6065) + (0.6065)^2 ]
[ (0.0111)(0.6065) + (0.6065)^2 (0.6065)^2 + (0.6065)^2 ]
=
[ 0.00012321 + 0.368 0.0067 + 0.368 ]
[ 0.0067 + 0.368 0.368 + 0.368 ]
≈
[ 0.3681 0.3747 ]
[ 0.3747 0.736 ]
Step 4: Compute (ΦᵀΦ)⁻¹
Given:
ΦᵀΦ =
[ 0.3681 0.3747 ]
[ 0.3747 0.736 ]
To find the inverse of a 2x2 matrix:
If A = [a b]
[c d]
Then A⁻¹ = (1 / (ad - bc)) * [ d -b]
[-c a]
So,
Determinant = (0.3681)(0.736) - (0.3747)(0.3747)
≈ 0.270922 - 0.140400
≈ 0.130522
Now, compute the inverse:
(ΦᵀΦ)⁻¹ = (1 / 0.130522) *
[ 0.736 -0.3747 ]
[ -0.3747 0.3681 ]
≈
[ 5.639 -2.871 ]
[ -2.871 2.820 ]
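As a quick numerical check of Step 4 (a sketch using the rounded ΦᵀΦ values; small differences from the hand computation come from rounding):

```python
import numpy as np

PtP = np.array([[0.3681, 0.3747],
                [0.3747, 0.7360]])

det = np.linalg.det(PtP)
inv = np.linalg.inv(PtP)

print(round(det, 4))     # ~0.1305
print(np.round(inv, 3))  # close to the hand-computed inverse
```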
Step 5: Compute Φᵀ y with y = [0.5, 0.7]
Given (the Φ matrix from Step 2, with rows X₁, X₂ and columns c₁, c₂):
Φ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
y = [ 0.5, 0.7 ]
Now, compute Φᵀ (Φ is symmetric here, so Φᵀ = Φ):
Φᵀ =
[ 0.0111 0.6065 ]
[ 0.6065 0.6065 ]
Now, multiply Φᵀ by y:
Φᵀ y =
[ 0.0111 0.6065 ] * [ 0.5 ]
[ 0.6065 0.6065 ] [ 0.7 ]
= [ (0.0111 * 0.5) + (0.6065 * 0.7) ]
[ (0.6065 * 0.5) + (0.6065 * 0.7) ]
= [ 0.00555 + 0.42455 ]
[ 0.30325 + 0.42455 ]
= [ 0.43010 ]
[ 0.72780 ]
Thus, Φᵀ y =
[ 0.43010 ]
[ 0.72780 ]
Step 6: Compute Weight Vector (w)
Now, we compute the weights w using the formula: w = (Φᵀ Φ)⁻¹ Φᵀ y
Where:
- Φ is the matrix of RBF activations for all the centers.
- y = [0.5, 0.7] is the target output vector, assumed for this example.
Given:
(ΦᵀΦ)⁻¹ ≈
[ 5.639 -2.871 ]
[ -2.871 2.820 ]
Φᵀy =
[ 0.43010 ]
[ 0.72780 ]
Now compute w:
w₁ = (5.639)(0.43010) + (-2.871)(0.72780)
≈ 2.42533 - 2.08951
≈ 0.33582
w₂ = (-2.871)(0.43010) + (2.820)(0.72780)
≈ -1.23482 + 2.05240
≈ 0.81758
So, w ≈
[ 0.33582 ]
[ 0.81758 ]
------------------------------------------------------
Step 7: Compute Feature Vector ϕ(X) for X = [2, 3]
ϕ₁ = exp(- ((2–1)² + (3–2)²) / (2 * (1.414)²))
= exp(- (1² + 1²) / (2 * 2))
= exp(- (2 / 4))
= exp(-0.5)
≈ 0.6065
ϕ₂ = exp(- ((2–3)² + (3–4)²) / (2 * (1.414)²))
= exp(- (1² + 1²) / (2 * 2))
= exp(- (2 / 4))
= exp(-0.5)
≈ 0.6065
So, ϕ(X) = [0.6065, 0.6065]
Step 8: Compute Prediction ŷ
------------------------------------------------------
ŷ = ϕ(X) * w
= [0.6065, 0.6065] * [ 0.33582, 0.81758 ]ᵀ
= (0.6065)(0.33582) + (0.6065)(0.81758)
≈ 0.20368 + 0.49586
≈ 0.6995
------------------------------------------------------
Final Prediction:
ŷ(X = [2, 3]) ≈ 0.6995, which is approximately 0.7, the target y₂ for X₂ = [2, 3]; the small gap comes from rounding the intermediate values.
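Steps 1 through 8 can be verified end to end (a sketch, assuming NumPy): with the exact ϕ values, the least-squares fit interpolates the training targets, so the prediction at X = [2, 3] recovers its target 0.7.

```python
import numpy as np

X = np.array([[4.0, 5.0], [2.0, 3.0]])   # training inputs X1, X2
C = np.array([[1.0, 2.0], [3.0, 4.0]])   # centers c1, c2
y = np.array([0.5, 0.7])                 # targets
sigma = np.sqrt(2)                       # unrounded d_max / sqrt(2K)

def rbf_features(inputs):
    # Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
    d2 = ((inputs[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi = rbf_features(X)
w = np.linalg.pinv(Phi.T @ Phi) @ Phi.T @ y   # w = (Phi^T Phi)^-1 Phi^T y

y_hat = rbf_features(np.array([[2.0, 3.0]])) @ w
print(round(float(y_hat[0]), 4))  # 0.7
```

Because Φ is square and invertible here, the least-squares weights reproduce the training targets exactly; the hand computation differs only by rounding.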
Special Case: Computing the Pseudo-Inverse when det(Φᵀ Φ) = 0 in RBF
Finding (Φᵀ Φ)⁻¹
To find (Φᵀ Φ)⁻¹, we need to compute the determinant and adjugate of the matrix Φᵀ Φ.
Given:
Φᵀ Φ =
[ 0.7358, 0.7358 ]
[ 0.7358, 0.7358 ]
Step 1: Compute the Determinant of Φᵀ Φ
The determinant of a 2x2 matrix is calculated using the formula:
det(A) = (a * d) - (b * c)
For the matrix Φᵀ Φ, where:
a = 0.7358, b = 0.7358, c = 0.7358, d = 0.7358
det(Φᵀ Φ) = (0.7358 * 0.7358) - (0.7358 * 0.7358)
det(Φᵀ Φ) = 0.5414 - 0.5414 = 0
Step 2: Compute the Adjugate of Φᵀ Φ
The adjugate of a 2x2 matrix is computed by swapping the diagonal elements and negating the off-diagonal elements. The adjugate of Φᵀ Φ is:
adj(Φᵀ Φ) =
[ d, -b ]
[ -c, a ]
For the matrix Φᵀ Φ, this becomes:
adj(Φᵀ Φ) =
[ 0.7358, -0.7358 ]
[ -0.7358, 0.7358 ]
Step 3: Compute the Inverse of Φᵀ Φ
The inverse of a 2x2 matrix is calculated using the formula:
(Φᵀ Φ)⁻¹ = 1 / det(Φᵀ Φ) * adj(Φᵀ Φ)
However, since the determinant of Φᵀ Φ is zero, this means that the matrix is singular and does not have an inverse.
Thus, (Φᵀ Φ)⁻¹ does not exist because the determinant is zero.
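In code the same situation shows up as an error: `np.linalg.inv` refuses a singular matrix, while the Moore-Penrose pseudo-inverse `np.linalg.pinv` is always defined. A minimal sketch:

```python
import numpy as np

A = np.array([[0.7358, 0.7358],
              [0.7358, 0.7358]])  # rows are identical, so det(A) = 0

try:
    np.linalg.inv(A)             # fails for a singular matrix
except np.linalg.LinAlgError as err:
    print("inv failed:", err)

A_pinv = np.linalg.pinv(A)       # pseudo-inverse exists even when det = 0
print(np.round(A_pinv, 4))       # each entry ~= 1 / (4 * 0.7358) ~= 0.3398
```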
Step 1: Add a regularization term to Φᵀ Φ
Φᵀ Φ =
[ [0.7358, 0.7358],
[0.7358, 0.7358] ]
Regularization parameter λ = 0.01
Φᵀ Φ + λI =
[ [0.7358 + 0.01, 0.7358],
[0.7358, 0.7358 + 0.01] ]
Φᵀ Φ + λI =
[ [0.7458, 0.7358],
[0.7358, 0.7458] ]
here I is the identity matrix
Step 2: Compute the inverse of Φᵀ Φ + λI
To compute the inverse of a 2x2 matrix
[ a, b ]
[ c, d ]
The formula for the inverse is:
(1 / (ad - bc)) * [ d, -b ]
[ -c, a ]
For Φᵀ Φ + λI = [ [0.7458, 0.7358],
[0.7358, 0.7458] ]
The determinant is:
det = (0.7458 * 0.7458) - (0.7358 * 0.7358) = 0.014816 (approximately)
Now compute the inverse:
(Φᵀ Φ + λI)⁻¹ = (1 / 0.014816) * [ [0.7458, -0.7358],
[-0.7358, 0.7458] ]
(Φᵀ Φ + λI)⁻¹ = 67.49 * [ [0.7458, -0.7358],
[-0.7358, 0.7458] ]
(Φᵀ Φ + λI)⁻¹ = [ [50.3340, -49.6591],
[-49.6591, 50.3340] ]
Step 3: Multiply the inverse by Φᵀ y to compute w
Φᵀ y = [ [0.7278],
[0.7278] ]
Now multiply (Φᵀ Φ + λI)⁻¹ by Φᵀ y:
w = [ [50.3340, -49.6591],
[-49.6591, 50.3340] ] * [ [0.7278],
[0.7278] ]
Performing the matrix multiplication:
w₁ = (50.3340 * 0.7278) + (-49.6591 * 0.7278) = 36.6331 - 36.1419 = 0.4912
w₂ = (-49.6591 * 0.7278) + (50.3340 * 0.7278) = -36.1419 + 36.6331 = 0.4912
Thus, the final weights are:
w = [ [0.4912],
[0.4912] ]
Final Result:
w ≈ [0.4912, 0.4912]
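The regularized computation above can be reproduced numerically (a sketch with λ = 0.01, matching the worked numbers):

```python
import numpy as np

PtP = np.array([[0.7358, 0.7358],
                [0.7358, 0.7358]])   # singular Phi^T Phi from above
Pty = np.array([0.7278, 0.7278])     # Phi^T y
lam = 0.01                           # regularization parameter lambda

# Tikhonov (ridge) regularization restores invertibility:
# w = (Phi^T Phi + lambda I)^-1 Phi^T y
w = np.linalg.inv(PtP + lam * np.eye(2)) @ Pty
print(np.round(w, 4))                # both weights ~= 0.491
```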
4. RBF Algorithm
Input: Training data {(xᵢ, yᵢ)} for i = 1 to N; number of RBF centers K (randomly selected)
Algorithm Steps:
1. Choose K centers (cⱼ): for example, select K points at random from the training data.
2. Choose spread (σ): a common choice is σ = d_max / √(2K), where d_max is the maximum distance between any two centers.
3. Compute RBF activations for each input: for each input xᵢ and center cⱼ, compute Φᵢⱼ = ϕⱼ(xᵢ) = exp(-||xᵢ - cⱼ||² / (2σ²)).
4. Train output weights using linear regression: w = (ΦᵀΦ)⁻¹ Φᵀ y.
5. Predict for a new input x: ŷ = Σⱼ₌₁ to K [ wⱼ ⋅ ϕⱼ(x) ] + b.
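The steps above can be sketched as a small fit/predict pair (illustrative; `fit_rbf` and `predict_rbf` are hypothetical helper names, centers are drawn at random from the training data, and the bias term is omitted for brevity):

```python
import numpy as np

def fit_rbf(X, y, K, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K centers at random from the training data
    C = X[rng.choice(len(X), size=K, replace=False)]
    # Step 2: spread sigma = d_max / sqrt(2K)
    d_max = max(np.linalg.norm(a - b) for a in C for b in C)
    sigma = d_max / np.sqrt(2 * K)
    # Step 3: matrix of RBF activations, Phi[i, j] = phi_j(x_i)
    Phi = np.exp(-((X[:, None] - C[None, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    # Step 4: output weights by linear least squares (pinv handles singular cases)
    w = np.linalg.pinv(Phi) @ y
    return C, sigma, w

def predict_rbf(x, C, sigma, w):
    # Step 5: weighted sum of the basis activations (bias omitted)
    phi = np.exp(-((x - C) ** 2).sum(-1) / (2 * sigma ** 2))
    return phi @ w

X = np.array([[4.0, 5.0], [2.0, 3.0], [1.0, 1.0]])
y = np.array([0.5, 0.7, 0.2])
C, sigma, w = fit_rbf(X, y, K=3)
print([round(float(predict_rbf(x, C, sigma, w)), 3) for x in X])
# with K = N the network interpolates the training targets
```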
5. Handwritten Lecture Notes
6. Conclusion
The Gaussian function is the most commonly used type of RBF. We explored Cover's theorem and implemented kernel regression using RBFs. The pseudo-inverse is used when the matrix is non-square or singular (i.e., it has no ordinary inverse). RBFs are used in machine learning for classification, regression, clustering, and anomaly detection, and in applications such as face recognition and robotics, to model non-linear data patterns. This article only demonstrates the RBF computation itself, without embedding it in a full neural network.
References:
Appendix:
A. Lab Resources and Code Repositories — GitHub Repository: https://github.com/dilli822/Neural-Network-Labs — Contains implementations of various neural network models, algorithms, and experiments.
B. Implementation of RBF & Cover’s theorem
Tools