SortedData Class Reference

Extension of the InputData class to support sorting of the columns.

#include <SortedData.h>

Inherits InputData.

Public Member Functions

void load (const string &fileName, const eInputType inputType=IT_TRAIN, const int verboseLevel=1)
 Override of the load function to support sorting.
vpIterator getSortedBegin (const int colIdx)
 Get the first element of the (sorted) column of the data.
vpIterator getSortedEnd (const int colIdx)
 Get the end iterator (one past the last element) of the (sorted) column of the data.

Protected Types

typedef vector< pair< int, double > > column
 A column of the data.

Protected Attributes

vector< column > _sortedData
 The sorted data.

Detailed Description

Extension of the InputData class to support sorting of the columns.

This is particularly useful for stump-based learners, because they work column by column (dimension by dimension), looking for a threshold that minimizes the error, which requires the data to be sorted. This class is connected to the weak learner that implements the decision stump by overriding BaseLearner::createInputData(), which returns the desired InputData type (and which may also depend on the command-line arguments).
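
To illustrate why a sorted column helps, here is a minimal sketch of a threshold search over one column. It is not the MultiBoost implementation: the names StumpCut and findBestCut are hypothetical, labels are assumed to be +1 or -1, and only one polarity (predict +1 above the threshold) is considered. Because the column is sorted by value, a single pass can update the weighted error incrementally as the threshold moves past each example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Mirrors the documented typedef: (example index, feature value), sorted by value.
typedef std::vector< std::pair<int, double> > column;

// Hypothetical result type, not part of MultiBoost.
struct StumpCut {
    double threshold;  // candidate cut position
    double error;      // weighted error of "predict +1 if value > threshold"
};

// Scan one sorted column once and return the cut with the lowest weighted error.
// labels[i] is +1 or -1 and weights[i] >= 0; both are indexed by example index.
StumpCut findBestCut(const column& sortedCol,
                     const std::vector<int>& labels,
                     const std::vector<double>& weights)
{
    assert(!sortedCol.empty());

    // With the threshold below every value, all examples are predicted +1,
    // so the error is the total weight of the negative examples.
    double error = 0.0;
    for (std::size_t i = 0; i < sortedCol.size(); ++i)
        if (labels[sortedCol[i].first] < 0)
            error += weights[sortedCol[i].first];

    StumpCut best;
    best.threshold = sortedCol.front().second - 1.0;
    best.error = error;

    // Move the threshold past one example at a time. The example just passed
    // is now predicted -1: positives become errors, negatives stop being errors.
    for (std::size_t i = 0; i + 1 < sortedCol.size(); ++i) {
        const int idx = sortedCol[i].first;
        error += (labels[idx] > 0) ? weights[idx] : -weights[idx];

        // A cut is only possible between two distinct values.
        if (sortedCol[i].second == sortedCol[i + 1].second)
            continue;

        if (error < best.error) {
            best.error = error;
            best.threshold = 0.5 * (sortedCol[i].second + sortedCol[i + 1].second);
        }
    }
    return best;
}
```

Without the sort, every candidate threshold would require a full pass over the examples; with it, the whole column is handled in one linear scan.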

See also:
BaseLearner::createInputData()

StumpLearner::createInputData()

Date:
21/11/2005

Definition at line 56 of file SortedData.h.


Member Typedef Documentation

typedef vector< pair<int, double> > column [protected]
 

A column of the data.

The pair represents the index of the example and the value of the column. The index of the column is the index of the vector itself.

Remarks:
I am storing both the index and the value as a trade-off between speed in a key part of the code (finding the threshold) and memory consumption. For very large databases, this could be turned into an index-only vector.
Date:
11/11/2005

Definition at line 98 of file SortedData.h.
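
The index-only alternative mentioned in the remarks could look roughly like the sketch below. The names CompareByValue and makeIndexColumn are hypothetical and do not exist in MultiBoost; the sketch assumes the values of one column are available as a plain vector of doubles.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical comparator: orders example indices by the value they refer to.
struct CompareByValue {
    const std::vector<double>* values;
    explicit CompareByValue(const std::vector<double>& v) : values(&v) {}
    bool operator()(int a, int b) const { return (*values)[a] < (*values)[b]; }
};

// Index-only column: stores the example indices in increasing order of value,
// without copying the values themselves.
std::vector<int> makeIndexColumn(const std::vector<double>& values)
{
    std::vector<int> order(values.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(), CompareByValue(values));
    return order;
}
```

This stores one int per example instead of an (int, double) pair, but every value read during the threshold search then goes through an extra lookup into the original data, which is the speed cost the remarks refer to.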


Member Function Documentation

vpIterator getSortedBegin (const int colIdx) [inline]
 

Get the first element of the (sorted) column of the data.

Parameters:
colIdx The column index
Returns:
The iterator to the first element of the column
Date:
10/11/2005

Definition at line 76 of file SortedData.h.

References SortedData::_sortedData.

vpIterator getSortedEnd (const int colIdx) [inline]
 

Get the end iterator (one past the last element) of the (sorted) column of the data.

Parameters:
colIdx The column index
Returns:
The end() iterator of the column, one past its last element
Remarks:
It is the end() iterator, so it does not point to anything!
Date:
10/11/2005

Definition at line 85 of file SortedData.h.

References SortedData::_sortedData.
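
The two accessors are meant to be used as a half-open range, like the begin() and end() of a standard container. The sketch below uses a hypothetical stand-in class, SortedColumnsSketch, rather than the real SortedData, and assumes that vpIterator is an iterator over a column; neither assumption is confirmed by this page.

```cpp
#include <cassert>
#include <utility>
#include <vector>

typedef std::vector< std::pair<int, double> > column;
typedef column::iterator vpIterator;  // assumption: vpIterator iterates over a column

// Hypothetical stand-in for SortedData, holding columns that are already sorted.
class SortedColumnsSketch {
public:
    explicit SortedColumnsSketch(const std::vector<column>& cols) : _sortedData(cols) {}
    vpIterator getSortedBegin(const int colIdx) { return _sortedData[colIdx].begin(); }
    vpIterator getSortedEnd(const int colIdx)   { return _sortedData[colIdx].end(); }
private:
    std::vector<column> _sortedData;
};

// Walk the half-open range [begin, end) and collect the example indices
// in increasing order of value.
std::vector<int> indicesInValueOrder(SortedColumnsSketch& data, const int colIdx)
{
    std::vector<int> out;
    for (vpIterator it = data.getSortedBegin(colIdx);
         it != data.getSortedEnd(colIdx); ++it)
        out.push_back(it->first);
    return out;
}

// The end iterator must not be dereferenced; to reach the last real element,
// step back once from it (the column must not be empty).
double largestValue(SortedColumnsSketch& data, const int colIdx)
{
    vpIterator first = data.getSortedBegin(colIdx);
    vpIterator last = data.getSortedEnd(colIdx);
    assert(first != last);
    --last;
    return last->second;
}
```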

void load (const string & fileName, const eInputType inputType = IT_TRAIN, const int verboseLevel = 1) [virtual]
 

Override of the load function to support sorting.

Parameters:
fileName The name of the file to be loaded
inputType The type of input.
verboseLevel The level of verbosity
See also:
InputData::load()
Date:
21/11/2005

Reimplemented from InputData.

Definition at line 32 of file SortedData.cpp.

References InputData::_data, InputData::_numColumns, InputData::_numExamples, SortedData::_sortedData, nor_utils::comparePairOnSecond(), MultiBoost::IT_TEST, and InputData::load().
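
Judging only from the references listed above, the sorting step presumably copies each column of the loaded data into (index, value) pairs and sorts them by value. The sketch below shows that idea under stated assumptions: the data is taken to be a row-major vector of rows, buildSortedColumns is a hypothetical free function, and lessOnSecond stands in for nor_utils::comparePairOnSecond(), whose exact signature is not shown here. The role of the input type (for example IT_TEST) is not described on this page, so the sketch ignores it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector< std::pair<int, double> > column;

// Stand-in comparator (assumption): orders pairs by their value, the second member.
bool lessOnSecond(const std::pair<int, double>& a, const std::pair<int, double>& b)
{
    return a.second < b.second;
}

// Build one sorted column per feature from row-major data.
// rows[i][j] is the value of feature j for example i.
std::vector<column> buildSortedColumns(const std::vector< std::vector<double> >& rows,
                                       const std::size_t numColumns)
{
    std::vector<column> sorted(numColumns);
    for (std::size_t j = 0; j < numColumns; ++j) {
        sorted[j].reserve(rows.size());
        for (std::size_t i = 0; i < rows.size(); ++i) {
            assert(rows[i].size() == numColumns);
            // Keep the example index next to the value so it survives the sort.
            sorted[j].push_back(std::make_pair(static_cast<int>(i), rows[i][j]));
        }
        std::sort(sorted[j].begin(), sorted[j].end(), lessOnSecond);
    }
    return sorted;
}
```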


The documentation for this class was generated from the following files:
SortedData.h
SortedData.cpp
Generated on Mon Nov 28 21:43:48 2005 for MultiBoost by  doxygen 1.4.5