Radial-Basis Function and Cover’s Theorem

Implementation of RBF Based on Interpolation

Abstract

A Radial Basis Function (RBF) is a real-valued function whose output depends only on the distance between the input x and a fixed point c, called the center[1]. Data may be linearly separable or non-linearly separable; non-linearly separable data requires a curved decision boundary (not a straight line or hyperplane) to separate the classes perfectly. The RBF is the core component of RBF networks, where it serves as the activation function of the hidden layer. It enables the network’s key characteristics, such as non-linear mapping and effective function approximation. This article dives into Cover’s theorem and RBF implementation, with the required mathematics and algorithm.

1. Introduction

A Radial Basis Function (RBF) is a function φ whose value depends only on the distance between two points, measured with a norm on a vector space. The RBF kernel centered at a point c is written as φ_c(x) = φ(∥x − c∥), where ∥x − c∥ is the distance between x and c[1].
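As a concrete illustration, here is a minimal Python sketch of the Gaussian RBF (the most common choice, used throughout this article); the function name gaussian_rbf and the sample points are our own choices for demonstration:

import numpy as np

def gaussian_rbf(x, c, sigma):
    # phi_c(x) = exp(-||x - c||^2 / (2 * sigma^2))
    return np.exp(-np.sum((np.asarray(x) - np.asarray(c)) ** 2) / (2 * sigma ** 2))

print(gaussian_rbf([1.1, 2.1], c=[1, 2], sigma=1.0))  # ~0.99: input near the center
print(gaussian_rbf([4.0, 5.0], c=[1, 2], sigma=1.0))  # ~0.0001: input far from the center

The activation is largest (close to 1) when the input sits on the center and decays toward 0 as the input moves away; σ controls how quickly it decays.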

Section 2 discusses Cover’s theorem, Section 3 works through a numerical RBF example, Section 4 presents the RBF algorithm, Section 5 provides detailed handwritten lecture notes, and Section 6 concludes.

2. Cover’s Theorem on the Separability of Patterns


Cover’s theorem states that:

“A complex pattern-classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in a low-dimensional space.” This means the probability of linear separability increases as the dimensionality of the (nonlinear) feature space grows. The theorem thus forms the theoretical basis for kernel-based learning algorithms.

Mathematical Interpretation:
----------------------------
Let:
- φ: ℝⁿ → ℝᴺ be a nonlinear transformation, where N >> n
- Data in ℝⁿ may not be linearly separable.

Then: with high probability, the transformed data φ(x) ∈ ℝᴺ becomes linearly separable.
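The XOR problem is the classic illustration (see Haykin [3]): the four points (0,0) and (1,1) in one class and (0,1) and (1,0) in the other are not linearly separable in ℝ², but after mapping through two Gaussian RBFs they are. A minimal Python sketch, with centers and σ chosen by us for demonstration:

import numpy as np

def gaussian_rbf(x, c, sigma=1.0):
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
c1, c2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])          # chosen centers

for x in X:
    print(x, "->", round(gaussian_rbf(x, c1), 4), round(gaussian_rbf(x, c2), 4))
# (0,1) and (1,0) both map to (0.6065, 0.6065); (0,0) and (1,1) map to
# (0.3679, 1.0) and (1.0, 0.3679). The line phi1 + phi2 = 1.3 now separates
# the two XOR classes, exactly as Cover's theorem predicts.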

3. Numerical Examples of RBF

Example:

Given:

X = [[4, 5], [2, 3]]        (fixed inputs)
c₁ = [1, 2], c₂ = [3, 4]    (centers, randomly selected)
σ = ?                       (spread, computed using σ = d_max / √(2K))
w = ?                       (weights, computed using w = (Φᵀ Φ)⁻¹ Φᵀ y)
y = [0.5, 0.7]              (target outputs, assumed for simplicity)
b = 0.1                     (bias, randomly selected; not used in this worked example)

Step 1: Compute the Spread (σ)

Formula for σ: σ = d_max / √(2K)

Where:
- d_max is the maximum distance between any two centers.
- K is the number of centers.

Step 1.1: Calculate d_max (distance between c₁ and c₂):

d_max = √((c₂₁ − c₁₁)² + (c₂₂ − c₁₂)²)
      = √((3 − 1)² + (4 − 2)²)
      = √(2² + 2²)
      = √(4 + 4)
      = √8 ≈ 2.828

Step 1.2: Compute σ:

σ = d_max / √(2K)
  = 2.828 / √(2 × 2)
  = 2.828 / √4
  = 2.828 / 2
  ≈ 1.414

Thus, the spread (σ) is approximately 1.414.
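A minimal NumPy check of Step 1 (variable names are ours):

import numpy as np

centers = np.array([[1, 2], [3, 4]], dtype=float)
K = len(centers)

# d_max: maximum pairwise distance between centers (here just c1 vs. c2)
d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
sigma = d_max / np.sqrt(2 * K)
print(round(d_max, 3), round(sigma, 3))  # 2.828 1.414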

Step 2: Compute RBF Activations

The RBF activation of an input x for center cₖ is computed using the formula:

ϕₖ(x) = exp(−||x − cₖ||² / (2σ²))

Where:
- X₁ = [4, 5] and X₂ = [2, 3] are the input vectors.
- c₁ = [1, 2] and c₂ = [3, 4] are the centers.
- σ ≈ 1.414 is the spread.

Step 2.1: Compute ϕ₁₁ for c₁ = [1, 2] and X₁ = [4, 5]:

ϕ₁₁ = exp(- ((4 – 1)² + (5 – 2)²) / (2 * (1.414)²))
ϕ₁₁ = exp(- (3² + 3²) / (2 * 2))
ϕ₁₁ = exp(- (18 / 4))
ϕ₁₁ = exp(-4.5)
ϕ₁₁ ≈ 0.0111

Thus, ϕ₁₁ ≈ 0.0111

--------------------------------------------

Step 2.2: Compute ϕ₂₂ for c₂ = [3, 4] and X₂ = [2, 3]:

ϕ₂₂ = exp(- ((2 – 3)² + (3 – 4)²) / (2 * (1.414)²))
ϕ₂₂ = exp(- (1² + 1²) / (2 * 2))
ϕ₂₂ = exp(- (2 / 4))
ϕ₂₂ = exp(-0.5)
ϕ₂₂ ≈ 0.6065

Thus, ϕ₂₂ ≈ 0.6065

--------------------------------------------

Summary (ϕ matrix):

Let:
- ϕ₁₁ = RBF between X₁ = [4,5] and c₁ = [1,2]
- ϕ₁₂ = RBF between X₁ = [4,5] and c₂ = [3,4]
- ϕ₂₁ = RBF between X₂ = [2,3] and c₁ = [1,2]
- ϕ₂₂ = RBF between X₂ = [2,3] and c₂ = [3,4]

From Steps 2.1 and 2.2, we have already computed:
- ϕ₁₁ ≈ 0.0111
- ϕ₂₂ ≈ 0.6065

Now, compute:
ϕ₁₂ = exp(- ((4 – 3)² + (5 – 4)²) / (2 * (1.414)²))
ϕ₁₂ = exp(- (1² + 1²) / (2 * 2))
ϕ₁₂ = exp(-0.5)
ϕ₁₂ ≈ 0.6065

ϕ₂₁ = exp(- ((2 – 1)² + (3 – 2)²) / (2 * (1.414)²))
ϕ₂₁ = exp(- (1² + 1²) / (2 * 2))
ϕ₂₁ = exp(-0.5)
ϕ₂₁ ≈ 0.6065

Final ϕ matrix:

       c₁       c₂
X₁   0.0111   0.6065
X₂   0.6065   0.6065        
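The whole activation matrix can be reproduced with a few lines of NumPy (a sketch using the same inputs, centers, and σ as above):

import numpy as np

X = np.array([[4, 5], [2, 3]], dtype=float)   # inputs X1, X2
C = np.array([[1, 2], [3, 4]], dtype=float)   # centers c1, c2
sigma = np.sqrt(2)                            # ≈ 1.414 from Step 1

# Phi[i, j] = exp(-||X_i - c_j||^2 / (2 * sigma^2))
Phi = np.exp(-np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2) / (2 * sigma ** 2))
print(np.round(Phi, 4))
# [[0.0111 0.6065]
#  [0.6065 0.6065]]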

Step 3: Compute (Φᵀ Φ)


Given:
Φ = 
[ 0.0111   0.6065 ]
[ 0.6065   0.6065 ]

Now, compute ΦᵀΦ. Note that Φ here happens to be symmetric (ϕ₁₂ = ϕ₂₁), so its transpose equals Φ itself:

Φᵀ = 
[ 0.0111   0.6065 ]
[ 0.6065   0.6065 ]

ΦᵀΦ = 
[ (0.0111)^2 + (0.6065)^2     (0.0111)(0.6065) + (0.6065)^2 ]
[ (0.0111)(0.6065) + (0.6065)^2   (0.6065)^2 + (0.6065)^2   ]

= 
[ 0.00012321 + 0.368       0.0067 + 0.368 ]
[ 0.0067 + 0.368           0.368 + 0.368  ]

≈
[ 0.3681     0.3747 ]
[ 0.3747     0.736  ]        

Step 4: Compute (ΦᵀΦ)⁻¹


Given:
ΦᵀΦ =
[ 0.3681   0.3747 ]
[ 0.3747   0.736  ]

To find the inverse of a 2x2 matrix:

If A = [ a  b ]
       [ c  d ]

Then A⁻¹ = (1 / (ad − bc)) * [  d  −b ]
                             [ −c   a ]

So,

Determinant = (0.3681)(0.736) − (0.3747)(0.3747)
            ≈ 0.270922 − 0.140400
            ≈ 0.130522

Now, compute the inverse:

(ΦᵀΦ)⁻¹ = (1 / 0.130522) *
          [ 0.736    −0.3747 ]
          [ −0.3747   0.3681 ]

≈
[  5.639  −2.871 ]
[ −2.871   2.820 ]
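A quick NumPy cross-check of this 2×2 inverse (a sketch; small differences from the hand values are rounding):

import numpy as np

PtP = np.array([[0.3681, 0.3747],
                [0.3747, 0.7360]])
print(np.round(np.linalg.inv(PtP), 3))
# [[ 5.639 -2.871]
#  [-2.871  2.82 ]]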

Step 5: Compute Φᵀ y

This step uses y = [0.5, 0.7], the target vector assumed in the Given data.

Given (from Step 2):
Φ =
[ 0.0111  0.6065 ]
[ 0.6065  0.6065 ]

Since Φ is symmetric, its transpose is the same matrix:

Φᵀ =
[ 0.0111   0.6065 ]
[ 0.6065   0.6065 ]

Now, multiply Φᵀ by y:

Φᵀ y =
[ 0.0111   0.6065 ]   *   [ 0.5 ]
[ 0.6065   0.6065 ]       [ 0.7 ]

= [ (0.0111 * 0.5) + (0.6065 * 0.7) ]
  [ (0.6065 * 0.5) + (0.6065 * 0.7) ]

= [ 0.00555 + 0.42455 ]
  [ 0.30325 + 0.42455 ]

= [ 0.43010 ]
  [ 0.72780 ]

Thus, Φᵀ y =
[ 0.43010 ]
[ 0.72780 ]

Step 6: Compute Weights (w)

Now, we compute the weights w using the formula:

w = (ΦᵀΦ)⁻¹ Φᵀy

Where:
- Φ is the matrix of RBF activations for all the centers.
- y = [0.5, 0.7] is the target output vector (assumed for simplicity).

Given:
(ΦᵀΦ)⁻¹ =
[  5.639  −2.871 ]
[ −2.871   2.820 ]

Φᵀy =
[ 0.43010 ]
[ 0.72780 ]

Now compute w:

w₁ = (5.639)(0.43010) + (−2.871)(0.72780)
   ≈ 2.42533 − 2.08951
   ≈ 0.33582

w₂ = (−2.871)(0.43010) + (2.820)(0.72780)
   ≈ −1.23482 + 2.05240
   ≈ 0.81758

So, w ≈
[ 0.3358 ]
[ 0.8176 ]
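These weights can be cross-checked with NumPy’s least-squares solver, which computes the same w = (ΦᵀΦ)⁻¹Φᵀy solution numerically (a sketch; small differences from the hand values are rounding):

import numpy as np

Phi = np.array([[0.0111, 0.6065],
                [0.6065, 0.6065]])
y = np.array([0.5, 0.7])

w = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(np.round(w, 3))  # [0.336 0.818]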

------------------------------------------------------        

Step 7: Compute Feature Vector ϕ(X) for X = [2, 3]


ϕ₁ = exp(- ((2–1)² + (3–2)²) / (2 * (1.414)²))
    = exp(- (1² + 1²) / (2 * 2))
    = exp(- (2 / 4))
    = exp(-0.5)
    ≈ 0.6065

ϕ₂ = exp(- ((2–3)² + (3–4)²) / (2 * (1.414)²))
    = exp(- (1² + 1²) / (2 * 2))
    = exp(- (2 / 4))
    = exp(-0.5)
    ≈ 0.6065

So, ϕ(X) = [0.6065, 0.6065]        

Step 8: Compute Prediction ŷ

------------------------------------------------------

ŷ = ϕ(X) · w
   = [0.6065, 0.6065] · [ 0.3358, 0.8176 ]ᵀ
   = (0.6065)(0.3358) + (0.6065)(0.8176)
   ≈ 0.20366 + 0.49587
   ≈ 0.6995

------------------------------------------------------

Final Prediction:
ŷ(X = [2, 3]) ≈ 0.6995 ≈ 0.7

Since X = [2, 3] is the second training input with target y₂ = 0.7, the prediction recovers its training target (up to rounding). This is the interpolation property the title refers to: with as many centers as training points and an invertible Φ, the RBF fit passes exactly through the training data.
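For completeness, the entire worked example fits in a short script (a sketch under the same assumed inputs, centers, and targets):

import numpy as np

X = np.array([[4, 5], [2, 3]], dtype=float)   # training inputs
C = np.array([[1, 2], [3, 4]], dtype=float)   # centers
y = np.array([0.5, 0.7])                      # assumed targets

K = len(C)
d_max = max(np.linalg.norm(a - b) for a in C for b in C)
sigma = d_max / np.sqrt(2 * K)                # Step 1: spread

# Step 2: activation matrix Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
Phi = np.exp(-np.sum((X[:, None] - C[None, :]) ** 2, axis=2) / (2 * sigma ** 2))

# Steps 3-6: weights w = (Phi^T Phi)^(-1) Phi^T y
w = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Steps 7-8: predict for X = [2, 3]
x_new = np.array([2, 3], dtype=float)
phi_new = np.exp(-np.sum((x_new - C) ** 2, axis=1) / (2 * sigma ** 2))
print(np.round(phi_new @ w, 4))  # 0.7 -- interpolates the training target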

Special Case: Computing a Pseudo-Inverse / Regularized Inverse for the det = 0 Case in RBF

Finding (Φᵀ Φ)⁻¹

To find (Φᵀ Φ)⁻¹, we need to compute the determinant and adjugate of the matrix Φᵀ Φ.

Given:

Φᵀ Φ = 
[ 0.7358, 0.7358 ]
[ 0.7358, 0.7358 ]        

### Step 1: Compute the Determinant of Φᵀ Φ

The determinant of a 2x2 matrix is calculated using the formula:

det(A) = (a * d) - (b * c)        

For the matrix Φᵀ Φ, where:

a = 0.7358, b = 0.7358, c = 0.7358, d = 0.7358

det(Φᵀ Φ) = (0.7358 * 0.7358) − (0.7358 * 0.7358)
          = 0.5414 − 0.5414
          = 0

### Step 2: Compute the Adjugate of Φᵀ Φ

The adjugate of a 2x2 matrix is computed by swapping the diagonal elements and negating the off-diagonal elements. The adjugate of Φᵀ Φ is:

adj(Φᵀ Φ) = 
[ d, -b ]
[ -c, a ]        

For the matrix Φᵀ Φ, this becomes:

adj(Φᵀ Φ) = 
[ 0.7358, -0.7358 ]
[ -0.7358, 0.7358 ]        

### Step 3: Compute the Inverse of Φᵀ Φ

The inverse of a 2x2 matrix is calculated using the formula:

(Φᵀ Φ)⁻¹ = 1 / det(Φᵀ Φ) * adj(Φᵀ Φ)        

However, since the determinant of Φᵀ Φ is zero, this means that the matrix is singular and does not have an inverse.

Thus, (Φᵀ Φ)⁻¹ does not exist because the determinant is zero.

To proceed despite the singularity, we add a small regularization term λI (ridge regularization), which makes the matrix invertible.

Step 1: Add a regularization term to Φᵀ Φ

Φᵀ Φ = 
[ [0.7358, 0.7358], 
  [0.7358, 0.7358] ]        

Regularization parameter λ = 0.01

Φᵀ Φ + λI = 
[ [0.7358 + 0.01, 0.7358], 
  [0.7358, 0.7358 + 0.01] ]

Φᵀ Φ + λI = 
[ [0.7458, 0.7358], 
  [0.7358, 0.7458] ]
where I is the 2×2 identity matrix.

Step 2: Compute the inverse of Φᵀ Φ + λI

To compute the inverse of a 2x2 matrix
 [ a, b ]
 [ c, d ]        

The formula for the inverse is:

(1 / (ad - bc)) * [ d, -b ]
                  [ -c, a ]

For Φᵀ Φ + λI = [ [0.7458, 0.7358], 
                  [0.7358, 0.7458] ]

The determinant is:
det = (0.7458 * 0.7458) - (0.7358 * 0.7358) = 0.014816 (approximately)        

Now compute the inverse:

(Φᵀ Φ + λI)⁻¹ = (1 / 0.014816) * [ [0.7458, -0.7358], 
                                  [-0.7358, 0.7458] ]

(Φᵀ Φ + λI)⁻¹ = 67.49 * [ [0.7458, -0.7358], 
                           [-0.7358, 0.7458] ]

(Φᵀ Φ + λI)⁻¹ = [ [50.3340, -49.6591], 
                   [-49.6591, 50.3340] ]        

Step 3: Multiply the inverse by Φᵀ y to compute w


Φᵀ y = [ [0.7278], 
         [0.7278] ]

Now multiply (Φᵀ Φ + λI)⁻¹ by Φᵀ y:

w = [ [50.3340, -49.6591], 
      [-49.6591, 50.3340] ] * [ [0.7278], 
                                [0.7278] ]

Performing the matrix multiplication:

w₁ = (50.3340 * 0.7278) + (−49.6591 * 0.7278) = 36.6331 − 36.1419 = 0.4912
w₂ = (−49.6591 * 0.7278) + (50.3340 * 0.7278) = −36.1419 + 36.6331 = 0.4912

Thus, the final weights are:

w = [ [0.4912], 
      [0.4912] ]

Final Result:

w ≈ [0.4912, 0.4912]
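In code, the same rescue is a one-liner; the sketch below uses NumPy with the singular matrix from above, and also shows np.linalg.pinv, the Moore–Penrose pseudo-inverse mentioned in the conclusion (all values are from this example):

import numpy as np

PtP = np.array([[0.7358, 0.7358],
                [0.7358, 0.7358]])   # singular: det = 0
Pty = np.array([0.7278, 0.7278])
lam = 0.01                           # regularization parameter lambda

# Ridge-regularized solution: w = (Phi^T Phi + lambda*I)^(-1) Phi^T y
w_ridge = np.linalg.solve(PtP + lam * np.eye(2), Pty)
print(np.round(w_ridge, 4))          # [0.4912 0.4912]

# Alternative: Moore-Penrose pseudo-inverse of the singular matrix
w_pinv = np.linalg.pinv(PtP) @ Pty
print(np.round(w_pinv, 4))           # [0.4946 0.4946]

The two answers differ slightly because ridge shrinks the solution toward zero by λ, while the pseudo-inverse returns the minimum-norm least-squares solution.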

4. RBF Algorithm

Input:
- Training data: {(xᵢ, yᵢ)} for i = 1 to N
- Number of RBF centers: K (randomly selected)

Algorithm Steps:

1. Choose centers (cⱼ): Use K-Means clustering or randomly select K samples from the training data.

2. Choose spread (σ): A common choice is σ = d_max / √(2K), where d_max is the maximum distance between any two centers.

3. Compute RBF activations for each input: For each input xᵢ and center cⱼ, compute Φᵢⱼ = ϕⱼ(xᵢ) = exp(−||xᵢ − cⱼ||² / (2σ²)).

4. Train output weights using linear regression: w = (ΦᵀΦ)⁻¹ Φᵀ y.

5. Predict for a new input x: ŷ = Σⱼ₌₁ᴷ wⱼ ⋅ ϕⱼ(x) + b. (A compact code sketch of these steps follows.)
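Below is a compact, self-contained Python sketch of the whole algorithm. The class name RBFNetwork and its methods are our own choices, random center selection stands in for K-Means, and the bias is kept fixed rather than trained:

import numpy as np

class RBFNetwork:
    """Gaussian RBF network trained by linear least squares (a sketch)."""

    def __init__(self, K, bias=0.0, seed=0):
        self.K = K            # number of RBF centers
        self.bias = bias      # fixed bias term b
        self.rng = np.random.default_rng(seed)

    def _activations(self, X):
        # Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
        d2 = np.sum((X[:, None, :] - self.centers[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2 * self.sigma ** 2))

    def fit(self, X, y):
        # Step 1: choose centers (random samples here; K-Means also works)
        idx = self.rng.choice(len(X), self.K, replace=False)
        self.centers = X[idx]
        # Step 2: spread sigma = d_max / sqrt(2K)
        d_max = max(np.linalg.norm(a - b)
                    for a in self.centers for b in self.centers)
        self.sigma = d_max / np.sqrt(2 * self.K)
        # Steps 3-4: activations, then least-squares weights
        Phi = self._activations(X)
        self.w = np.linalg.lstsq(Phi, y - self.bias, rcond=None)[0]
        return self

    def predict(self, X):
        # Step 5: y_hat = sum_j w_j * phi_j(x) + b
        return self._activations(X) @ self.w + self.bias

# Usage with this article's example data
X = np.array([[4.0, 5.0], [2.0, 3.0]])
y = np.array([0.5, 0.7])
net = RBFNetwork(K=2).fit(X, y)
print(np.round(net.predict(X), 4))  # recovers [0.5 0.7] by interpolation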

5. Handwritten Lecture Notes

(Scanned handwritten lecture notes are included as images in the original article.)

6. Conclusion

The Gaussian function is the most commonly used type of RBF. We explored Cover’s theorem and implemented kernel regression using RBFs. The pseudo-inverse (or a regularized inverse) is used in cases where the matrix is non-square or singular (i.e., it does not have an ordinary inverse). RBFs are used in machine learning for classification, regression, clustering, and anomaly detection, and in applications like face recognition and robotics, to model non-linear data patterns. This article only demonstrates the RBF implementation without directly using it in neural networks.

References:

  1. Wikipedia Contributors. (2025, April 26). Radial basis function. In Wikipedia, The Free Encyclopedia. Retrieved from https://en.wikipedia.org/wiki/Radial_basis_function
  2. Sengupta, S. (2009, September 23). Lec-24 Radial Basis Function Networks: Cover’s Theorem [Video]. YouTube (nptelhrd).
  3. Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall.

Appendix:

A. Lab Resources and Code Repositories — GitHub Repository: https://github.com/dilli822/Neural-Network-Labs — Contains implementations of various neural network models, algorithms, and experiments.

B. Implementation of RBF & Cover’s theorem

Tools

  1. OpenAI. (n.d.). ChatGPT (April 2025 version).
  2. Google. (n.d.). Gemini (April 2025 version).
  3. DeepSeek. (n.d.). DeepSeek R1 (April 2025 version).
