Effective kernel/hyper-parameter search in outlier detection
This post is about running anomaly detection on a real-time stream of objects in a way that is cheap in both space and time, and about searching effectively for the hyper-parameters that best fit the model.
My problem statement was to identify relevant negative signals from a stream of real-time objects collected over a period of time. The higher-level objective was to filter such objects out of a universe of junk and non-relevant signals (a multi-view learning problem).
Having built text classifiers before, I knew a binary model was a poor choice here: the space of relevant negative objects is tiny compared to the entire universe (in our case, news), which makes this a one-class problem.
Feature Engineering (go beyond bag-of-words)
- Identified a set of negative terms and ran a word-collocation model to find the strongest signals to use as n-gram features.
- Ran several rounds of analysis to find supporting features that complement the negative signals above.
- Chose both as the feature space [and it works much better than term metrics alone].
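A word-collocation model of the kind mentioned above can be sketched with pointwise mutual information (PMI), which ranks word pairs by how much more often they co-occur than their individual frequencies would predict. The corpus and terms below are invented for illustration; the original work would have used the collected news stream.

```python
from collections import Counter
from math import log

# Toy corpus; in practice this is the stream of collected news objects.
docs = [
    "data breach hits major bank",
    "massive data breach reported",
    "bank announces quarterly results",
    "data breach investigation continues",
]

tokens = [t for d in docs for t in d.split()]
unigrams = Counter(tokens)
# Count bigrams within each document only, not across document boundaries.
bigrams = Counter(b for d in docs for b in zip(d.split(), d.split()[1:]))
n = len(tokens)

def pmi(bigram):
    """How much more often the pair co-occurs than chance predicts."""
    w1, w2 = bigram
    return log((bigrams[bigram] / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))

# Keep pairs seen at least twice (filters one-off noise), then rank by PMI.
ranked = sorted((b for b in bigrams if bigrams[b] >= 2), key=pmi, reverse=True)
print(ranked)
```

The surviving high-PMI pairs are the candidate n-gram features; supporting features can then be mined around them.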
Strategy for choosing kernels and hyper-parameters
- We couldn't rely on plain grid search or cross-validation to choose hyper-parameters: this is a one-class problem, precision/recall rates are theoretically not very helpful here, and our training set was very small.
- We wrote our own parameter-search algorithm, which learns from previously predicted negative evidence and suggests the best kernel, degree, coefficients, and cost parameter.
- We found that a polynomial kernel of degree 2 works best for text-related problems (note: higher orders tend to over-fit, so analyse both training and regularization error; estimating regularization error with polynomial kernels is a bit tricky, but solutions exist).
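The post does not publish the actual search algorithm, but its core idea can be sketched as scoring each candidate kernel configuration against accumulated evidence: previously confirmed negatives should be accepted and sampled junk rejected. Everything below (the data, the score, the grid) is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
train = rng.normal(0.0, 1.0, size=(80, 4))      # known negative objects
confirmed = rng.normal(0.0, 1.0, size=(20, 4))  # later-confirmed negative evidence
junk = rng.normal(6.0, 1.0, size=(20, 4))       # sample of the irrelevant universe

# Candidate polynomial-kernel settings; nu acts as the cost-like parameter.
grid = ParameterGrid({"degree": [2, 3], "coef0": [0.0, 1.0], "nu": [0.05, 0.1, 0.2]})

def evidence_score(clf):
    # Reward models that accept confirmed negatives and reject junk.
    return (clf.predict(confirmed) == 1).mean() + (clf.predict(junk) == -1).mean()

scored = [
    (evidence_score(OneClassSVM(kernel="poly", gamma="scale", **p).fit(train)), p)
    for p in grid
]
best_score, best_params = max(scored, key=lambda t: t[0])
print(best_params)
```

As new evidence is confirmed, re-running the search lets the suggested kernel and coefficients drift with the data, which is the behaviour the bullet above describes.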
Scope of Optimization
The model was further tuned on signal evidence, i.e. measuring how much each signal contributes to the misclassification rate, which points at features to improve or new ones to create.
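Attributing misclassifications to signals can be done with simple bookkeeping: for each prediction, record which signals fired, then compute a per-signal error rate. The records and signal names below are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical records: (predicted label, actual label, signals that fired).
results = [
    ("negative", "negative", ["breach", "lawsuit"]),
    ("negative", "junk",     ["breach"]),
    ("junk",     "negative", ["recall"]),
    ("negative", "junk",     ["breach"]),
]

miss, total = Counter(), Counter()
for pred, actual, signals in results:
    for s in signals:
        total[s] += 1
        if pred != actual:
            miss[s] += 1

# Misclassification rate per signal: high rates flag features to
# refine, split, or replace.
rates = {s: miss[s] / total[s] for s in total}
print(rates)
```

Signals with high rates become candidates for better collocation features; signals with low rates confirm the existing feature space.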