samplefit

samplefit is a Python library to assess sample fit, as opposed to model fit, via the Sample Fit Reliability algorithm as developed by Okasa & Younge (2022). samplefit is linked to the statsmodels library (Seabold & Perktold, 2010) and follows the same command workflow.

Description

samplefit is a Python library for the assessment of sample fit in econometric models. In particular, samplefit implements the Sample Fit Reliability (SFR) algorithm, a re-sampling procedure to estimate the reliability of data and check the sensitivity of results. To that end, SFR is a computational approach with three aspects: Scoring, to estimate a point-wise reliability score for every observation in a sample based on the expected estimation loss over sub-samples; Annealing, to test the sensitivity of results to the sequential removal of unreliable data points; and Fitting, to estimate a weighted regression that adjusts for the reliability of the data.

Installation

To install the samplefit library from PyPI, run:

pip install samplefit

or alternatively, to clone the repo run:

git clone https://github.com/okasag/samplefit.git

samplefit relies on Python 3 and requires the following dependencies:

  • numpy (>=1.22.0)
  • pandas (>=1.3.5)
  • scipy (>=1.7.2)
  • statsmodels (>=0.12.2)
  • matplotlib (>=3.4.2)
  • joblib (>=1.0.1)
  • psutil (>=5.8.0)

The required modules can be installed by navigating to the root of the cloned project and executing the following command: pip install -r requirements.txt.
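As a quick sanity check after installation, the minimum versions listed above can be compared against what is installed. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `check_dependencies` are hypothetical and not part of samplefit, and the version parsing is deliberately crude (it stops at the first non-numeric component, so pre-release tags are not handled properly).

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MIN_VERSIONS = {
    "numpy": "1.22.0",
    "pandas": "1.3.5",
    "scipy": "1.7.2",
    "statsmodels": "0.12.2",
    "matplotlib": "3.4.2",
    "joblib": "1.0.1",
    "psutil": "5.8.0",
}


def version_tuple(version):
    """Turn '1.22.0' into (1, 22, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_dependencies(minimums=MIN_VERSIONS):
    """Return {package: status}, status being 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        is_ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if is_ok else "too old"
    return report


# print one line per dependency
for name, status in check_dependencies().items():
    print(f"{name}: {status}")
```

Any package reported as 'missing' or 'too old' can then be installed or upgraded with pip.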

Examples

The example below demonstrates the workflow of using the samplefit library in conjunction with the well-known statsmodels library.

Import libraries:

import samplefit as sf
import statsmodels.api as sm

Get data:

boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim']
X = boston.data['lstat']
X = sm.add_constant(X)

Assess model fit:

model = sm.OLS(endog=Y, exog=X)
model_fit = model.fit()
model_fit.summary()

Assess sample fit:

sample = sf.SFR(linear_model=model)
sample_fit = sample.fit()
sample_fit.summary()

Assess sample reliability:

sample_scores = sample.score()
sample_scores.plot()

Assess sample sensitivity:

sample_annealing = sample.anneal()
sample_annealing.plot()

Authors

Gabriel Okasa & Kenneth A. Younge

References

  • Okasa, Gabriel, and Kenneth A. Younge. “Sample Fit Reliability.” arXiv preprint arXiv:2209.06631. 2022.
  • Seabold, Skipper, and Josef Perktold. “statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.
"""

 `samplefit` is a Python library to assess sample fit, as opposed to model fit,
 via the *Sample Fit Reliability* algorithm as developed by Okasa & Younge (2022).
 `samplefit` is linked to the `statsmodels` library (Seabold & Perktold, 2010)
 and follows the same command workflow.

 
 Description
 ----------------------------
 `samplefit` is a Python library for the assessment of sample fit in
 econometric models. In particular, `samplefit` implements the Sample Fit
 Reliability (SFR) algorithm, a re-sampling procedure to estimate the
 reliability of data and check the sensitivity of results. To that end,
 SFR is a computational approach with three aspects: *Scoring*, to estimate a 
 point-wise reliability score for every observation in a sample based on the
 expected estimation loss over sub-samples; *Annealing*, to test the sensitivity
 of results to the sequential removal of unreliable data points; and *Fitting*,
 to estimate a weighted regression that adjusts for the reliability of the data.

 Installation
 ----------------------------
 
 To install the `samplefit` library from `PyPI`, run:

 ```
 pip install samplefit
 ```
 
 or alternatively, to clone the repo run:

 ```
 git clone https://github.com/okasag/samplefit.git
 ```

 `samplefit` relies on Python 3 and requires the following dependencies:
     
 * numpy (>=1.22.0)
 * pandas (>=1.3.5)
 * scipy (>=1.7.2)
 * statsmodels (>=0.12.2)
 * matplotlib (>=3.4.2)
 * joblib (>=1.0.1)
 * psutil (>=5.8.0)
 
 The required modules can be installed by navigating to the root of
 the cloned project and executing the following command:
 `pip install -r requirements.txt`. 

 Examples
 ----------------------------

 The example below demonstrates the workflow of using the `samplefit` library
 in conjunction with the well-known `statsmodels` library.

 Import libraries:
 ```python
 import samplefit as sf
 import statsmodels.api as sm
 ```

 Get data:
 ```python
 boston = sm.datasets.get_rdataset("Boston", "MASS")
 Y = boston.data['crim']
 X = boston.data['lstat']
 X = sm.add_constant(X)
 ```

 Assess model fit:
 ```python
 model = sm.OLS(endog=Y, exog=X)
 model_fit = model.fit()
 model_fit.summary()
 ```

 Assess sample fit:
 ```python
 sample = sf.SFR(linear_model=model)
 sample_fit = sample.fit()
 sample_fit.summary()
 ```

 Assess sample reliability:
 ```python
 sample_scores = sample.score()
 sample_scores.plot()
 ```

 Assess sample sensitivity:
 ```python
 sample_annealing = sample.anneal()
 sample_annealing.plot()
 ```

 Authors
 ----------------------------
 Gabriel Okasa & Kenneth A. Younge

 References
 ----------------------------
 - Okasa, Gabriel, and Kenneth A. Younge. “Sample Fit Reliability.”
 arXiv preprint arXiv:2209.06631. 2022.
 - Seabold, Skipper, and Josef Perktold. “statsmodels: Econometric and 
 statistical modeling with python.” Proceedings of the 9th Python in Science 
 Conference. 2010.
"""

from samplefit.Reliability import SFR
from samplefit.Reliability import SFRFitResults
from samplefit.Reliability import SFRAnnealResults
from samplefit.Reliability import SFRScoreResults

__all__ = ["SFR", "SFRFitResults", "SFRAnnealResults", "SFRScoreResults"]
__version__ = "0.3.1.9000"
__module__ = 'samplefit'
__author__ = "Gabriel Okasa & Kenneth A. Younge"
__copyright__ = "Copyright (c) 2022, Gabriel Okasa & Kenneth A. Younge"
__license__ = "MIT License"

Sub-modules

samplefit.Reliability

samplefit …

Classes

class SFR (linear_model=None, n_samples=1000, min_samples=None, loss=None, n_jobs=-1, random_state=None)

Sample Fit Reliability class labeled SFR. Initializes parameters for sample fit.

Parameters

linear_model : statsmodels class
Linear model specified via statsmodels OLS or GLM class.
n_samples : int
The number of sub-samples in the re-sampling procedure. The default is 1000.
min_samples : int, float or NoneType
Minimum number of observations for each sub-sample, i.e. the number of observations to draw from the data without replacement. If an integer is supplied, exactly that number of observations is sampled. If a float, that share of the full sample is used (rounded up). If None, the minimum number of observations needed to estimate the model is selected, i.e. p+1 (recommended), where p is the number of model parameters. The default is None.
loss : str or lambda function
Loss function for evaluation of the estimation errors. Loss must be either 'absolute_error' (recommended) or 'squared_error'. For a user-defined loss function, supply a lambda function of the form 'lambda y, yhat:'. Default is 'absolute_error'.
n_jobs : int or NoneType

The number of parallel jobs to be used for multithreading in .fit(), .score() and .anneal(). Follows joblib semantics:

  • n_jobs=-1 uses all available physical CPU cores minus one.
  • n_jobs=None and n_jobs=1 means no parallelism.

The default is -1.

random_state : int, NoneType or numpy.random.RandomState object
Random seed used to initialize the pseudo-random number generator. See numpy documentation for details. If None specified, 0 is used. The default is None.
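The rules for min_samples and loss described above can be sketched as two small helpers. This is an illustration of the documented behaviour only, not the library's internal code: the names `resolve_min_samples`, `resolve_loss` and `BUILTIN_LOSSES` are hypothetical, and treating `loss=None` as 'absolute_error' is an assumption based on the stated default.

```python
import math

# Built-in loss names taken from the parameter description above.
BUILTIN_LOSSES = {
    "absolute_error": lambda y, yhat: abs(y - yhat),
    "squared_error": lambda y, yhat: (y - yhat) ** 2,
}


def resolve_min_samples(min_samples, n_obs, n_params):
    """Hypothetical helper: sub-sample size implied by `min_samples`."""
    if min_samples is None:
        # documented default: p + 1 observations
        return n_params + 1
    if isinstance(min_samples, float):
        # a float is read as a share of the full sample, rounded up
        return math.ceil(min_samples * n_obs)
    # an integer is used as the exact sub-sample size
    return int(min_samples)


def resolve_loss(loss=None):
    """Hypothetical helper: map a loss name or callable to a function."""
    if loss is None:
        loss = "absolute_error"  # assumption: None falls back to the default
    if callable(loss):
        return loss  # user-defined 'lambda y, yhat: ...'
    return BUILTIN_LOSSES[loss]
```

For a sample of 506 observations and a model with two parameters, `resolve_min_samples(None, 506, 2)` gives 3, while `resolve_min_samples(0.1, 506, 2)` gives 51 because the share is rounded up.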

Returns

Initializes SFR class. The following methods are available: .fit(), .score() and .anneal().

Notes

SFR includes methods to .fit(), .score() and .anneal().

For further details, see examples below.

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# assess model fit
model = sm.OLS(endog=Y, exog=X)
model_fit = model.fit()
model_fit.summary()

# assess sample fit
sample = sf.SFR(linear_model=model)
sample_fit = sample.fit()
sample_fit.summary()

# assess sample sensitivity
sample_annealing = sample.anneal()
sample_annealing.plot()

# assess sample reliability
sample_scores = sample.score()
sample_scores.plot()
class SFR(BaseSFR):
    """
    Sample Fit Reliability class labeled `SFR()`. Initializes
    parameters for sample fit.

    Parameters
    ----------
    linear_model : statsmodels class
        Linear model specified via statsmodels OLS or GLM class.
    n_samples : int
        The number of sub-samples in the re-sampling procedure.
        The default is 1000.
    min_samples : int, float or NoneType
        Minimum number of observations for each sub-sample, i.e. the number
        of observations to draw from the data without replacement. If an
        integer is supplied, exactly that number of observations is sampled.
        If a float, that share of the full sample is used (rounded up). If
        None, the minimum number of observations needed to estimate the model
        is selected, i.e. p+1 (recommended), where p is the number of model
        parameters. The default is None.
    loss : str or lambda function
        Loss function for evaluation of the estimation errors. Loss must be
        either 'absolute_error' (recommended) or 'squared_error'. For a
        user-defined loss function, supply a lambda function of the form
        'lambda y, yhat:'. Default is 'absolute_error'.
    n_jobs : int or NoneType
        The number of parallel jobs to be used for multithreading in
        [`.fit()`](#samplefit.Reliability.SFR.fit),
        [`.score()`](#samplefit.Reliability.SFR.score) and
        [`.anneal()`](#samplefit.Reliability.SFR.anneal).
        Follows
        [`joblib`](https://joblib.readthedocs.io){:target="_blank"} semantics:

        - `n_jobs=-1` uses all available physical CPU cores minus one.
        - `n_jobs=None` and `n_jobs=1` means no parallelism.

        The default is -1.
    random_state : int, NoneType or numpy.random.RandomState object
        Random seed used to initialize the pseudo-random number
        generator. See
        [`numpy` documentation](https://numpy.org/doc/stable/reference/random/legacy.html){:target="_blank"}
        for details. If None specified, 0 is used. The default is None.

    Returns
    -------
    Initializes SFR class. Following methods are available:
    .fit(), .score() and .anneal().


    Notes
    -----
    `SFR()` includes methods to [`.fit()`](#samplefit.Reliability.SFR.fit),
    [`.score()`](#samplefit.Reliability.SFR.score) and
    [`.anneal()`](#samplefit.Reliability.SFR.anneal).

    For further details, see examples below.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # assess model fit
    model = sm.OLS(endog=Y, exog=X)
    model_fit = model.fit()
    model_fit.summary()
    
    # assess sample fit
    sample = sf.SFR(linear_model=model)
    sample_fit = sample.fit()
    sample_fit.summary()
    
    # assess sample sensitivity
    sample_annealing = sample.anneal()
    sample_annealing.plot()
    
    # assess sample reliability
    sample_scores = sample.score()
    sample_scores.plot()
    ```
    """

    # define init function
    def __init__(self,
                 linear_model=None,
                 n_samples=1000,
                 min_samples=None,
                 loss=None,
                 n_jobs=-1,
                 random_state=None):
        # access inherited methods
        super().__init__(
            linear_model=linear_model,
            n_samples=n_samples,
            min_samples=min_samples,
            loss=loss,
            n_jobs=n_jobs,
            random_state=random_state
        )


    def fit(self,
            weights=None,
            n_boot=None):
        """
        Sample fit based on the reliability scores via the SFR algorithm.

        Parameters
        ----------
        weights : array-like of shape (n_obs, 1) or NoneType
            An array of weights for weighted regression. If None, squared
            reliability scores are used as weights by default. Note that
            if bootstrapping is used for inference, the estimation of
            user-supplied weights is not reflected. Default is None.
        n_boot : int or NoneType
            Number of bootstrap replications for inference. If None specified,
            asymptotic approximation is used for inference instead. For valid
            inference, bootstrapping is recommended. Note that bootstrapping
            requires longer computation time. Default is None.

        Returns
        -------
        Results of class SFRFitResults. Following methods are available:
        .summary(), .conf_int() and .predict().

        Notes
        -----
        [`.fit()`](#samplefit.Reliability.SFR.fit) estimates the reliability
        scores via the SFR algorithm in the first step and estimates weighted
        regression in the second step, with the squared reliability scores as
        weights if not specified otherwise.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample fit with defaults
        sample_fit = sample.fit()
        
        # sample fit with bootstrapping
        sample_fit = sample.fit(n_boot=1000)
        
        # get summary of sample fit
        sample_fit.summary()
        
        # get confidence intervals
        ci_low, ci_up = sample_fit.conf_int()
        
        # get predictions (in-sample)
        preds = sample_fit.predict()
        ```
        """
        return super().fit(
            weights=weights,
            n_boot=n_boot
            )


    def score(self):
        """
        Estimation of reliability scores via the SFR algorithm.

        Parameters
        ----------
        None.

        Returns
        -------
        Results of class SFRScoreResults. Following methods are available:
        .plot().

        Notes
        -----
        [`.score()`](#samplefit.Reliability.SFR.score) estimates the
        reliability scores via the SFR algorithm. Each observation receives a
        reliability score, with 0 marking the most unreliable observation and
        1 the most reliable observation within the sample.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # score reliability
        sample_scores = sample.score()
        
        # extract reliability scores
        scores = sample_scores.scores
        
        # plot reliability scores
        sample_scores.plot()
        ```
        """
        return super().score()


    def anneal(self, share=0.1, n_boot=None):
        """
        Sample annealing based on the reliability scores via the SFR algorithm.

        Parameters
        ----------
        share : float or NoneType
            Share of sample that gets annealed based on the most unreliable
            observations. Default is 0.1.
        n_boot : int or NoneType
            Number of bootstrap replications for inference. If None specified,
            asymptotic approximation is used for inference instead. For valid
            inference, bootstrapping is recommended. Note that bootstrapping
            requires longer computation time. Default is None.

        Returns
        -------
        Results of class SFRAnnealResults. Following methods are available:
        .conf_int() and .plot().

        Notes
        -----
        [`.anneal()`](#samplefit.Reliability.SFR.anneal) re-estimates the model
        while sequentially dropping the most unreliable observations. This
        annealing procedure helps to assess the sample sensitivity and detect
        how much the parameters depend on particularly unreliable observations.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample annealing with defaults
        sample_annealing = sample.anneal()
        
        # sample annealing with specified share
        sample_annealing = sample.anneal(share=0.1)
        
        # sample annealing with bootstrapping
        sample_annealing = sample.anneal(n_boot=1000)
        
        # get confidence intervals
        ci_low, ci_up = sample_annealing.conf_int()
        
        # get annealing plot
        sample_annealing.plot()
        ```
        """
        return super().anneal(
            share=share,
            n_boot=n_boot
            )

Ancestors

  • samplefit._BaseReliability.BaseSFR

Methods

def anneal(self, share=0.1, n_boot=None)

Sample annealing based on the reliability scores via the SFR algorithm.

Parameters

share : float or NoneType
Share of sample that gets annealed based on the most unreliable observations. Default is 0.1.
n_boot : int or NoneType
Number of bootstrap replications for inference. If None specified, asymptotic approximation is used for inference instead. For valid inference, bootstrapping is recommended. Note that bootstrapping requires longer computation time. Default is None.

Returns

Results of class SFRAnnealResults. The following methods are available: .conf_int() and .plot().

Notes

.anneal() re-estimates the model while sequentially dropping the most unreliable observations. This annealing procedure helps to assess the sample sensitivity and detect how much the parameters depend on particularly unreliable observations.
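The idea of sequentially dropping the least reliable observations and re-estimating can be sketched in plain Python. This is a minimal illustration under stated assumptions, not samplefit's implementation: `ols_line` and `anneal_path` are hypothetical names, the model is a single regressor with a constant fitted in closed form instead of through statsmodels, the reliability scores are taken as given, and the number of dropped observations is assumed to be the share of the sample rounded down.

```python
import math


def ols_line(xs, ys):
    """Closed-form OLS for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def anneal_path(xs, ys, scores, share=0.1):
    """Refit after dropping the 1, 2, ... least reliable observations."""
    n = len(xs)
    n_drop = math.floor(share * n)  # assumption: share is rounded down
    # indices ordered from least to most reliable
    order = sorted(range(n), key=lambda i: scores[i])
    path = [ols_line(xs, ys)]  # step 0: full sample
    for k in range(1, n_drop + 1):
        dropped = set(order[:k])
        keep = [i for i in range(n) if i not in dropped]
        path.append(ols_line([xs[i] for i in keep], [ys[i] for i in keep]))
    return path


# toy data: an exact line y = 1 + 2x with one corrupted observation
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[4] = 100.0
scores = [1.0] * 10
scores[4] = 0.0  # pretend the corrupted point was scored as least reliable

for step, (a, b) in enumerate(anneal_path(xs, ys, scores, share=0.1)):
    print(f"dropped {step}: intercept={a:.3f}, slope={b:.3f}")
```

A large jump in the coefficients between consecutive steps indicates that the estimates depend heavily on the observations just removed.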

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample annealing with defaults
sample_annealing = sample.anneal()

# sample annealing with specified share
sample_annealing = sample.anneal(share=0.1)

# sample annealing with bootstrapping
sample_annealing = sample.anneal(n_boot=1000)

# get confidence intervals
ci_low, ci_up = sample_annealing.conf_int()

# get annealing plot
sample_annealing.plot()
def anneal(self, share=0.1, n_boot=None):
    """
    Sample annealing based on the reliability scores via the SFR algorithm.

    Parameters
    ----------
    share : float or NoneType
        Share of sample that gets annealed based on the most unreliable
        observations. Default is 0.1.
    n_boot : int or NoneType
        Number of bootstrap replications for inference. If None specified,
        asymptotic approximation is used for inference instead. For valid
        inference, bootstrapping is recommended. Note that bootstrapping
        requires longer computation time. Default is None.

    Returns
    -------
    Results of class SFRAnnealResults. Following methods are available:
    .conf_int() and .plot().

    Notes
    -----
    [`.anneal()`](#samplefit.Reliability.SFR.anneal) re-estimates the model
    while sequentially dropping the most unreliable observations. This
    annealing procedure helps to assess the sample sensitivity and detect
    how much the parameters depend on particularly unreliable observations.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample annealing with defaults
    sample_annealing = sample.anneal()
    
    # sample annealing with specified share
    sample_annealing = sample.anneal(share=0.1)
    
    # sample annealing with bootstrapping
    sample_annealing = sample.anneal(n_boot=1000)
    
    # get confidence intervals
    ci_low, ci_up = sample_annealing.conf_int()
    
    # get annealing plot
    sample_annealing.plot()
    ```
    """
    return super().anneal(
        share=share,
        n_boot=n_boot
        )
def fit(self, weights=None, n_boot=None)

Sample fit based on the reliability scores via the SFR algorithm.

Parameters

weights : array-like of shape (n_obs, 1) or NoneType
An array of weights for weighted regression. If None, squared reliability scores are used as weights by default. Note that if bootstrapping is used for inference, the estimation of user-supplied weights is not reflected. Default is None.
n_boot : int or NoneType
Number of bootstrap replications for inference. If None specified, asymptotic approximation is used for inference instead. For valid inference, bootstrapping is recommended. Note that bootstrapping requires longer computation time. Default is None.

Returns

Results of class SFRFitResults. The following methods are available: .summary(), .conf_int() and .predict().

Notes

.fit() estimates the reliability scores via the SFR algorithm in the first step and estimates weighted regression in the second step, with the squared reliability scores as weights if not specified otherwise.
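The second step, a weighted regression with squared reliability scores as weights, can be illustrated with a closed-form weighted least squares fit. This is a sketch only and not the library's code: `wls_line` and `sample_fit_line` are hypothetical names, the model is restricted to one regressor plus a constant, the scores are taken as given, and the weights are assumed to multiply the squared residuals in the objective.

```python
def wls_line(xs, ys, weights):
    """Weighted least squares for y = a + b * x; returns (a, b).

    Minimises sum(w_i * (y_i - a - b * x_i) ** 2).
    """
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y)
        for w, x, y in zip(weights, xs, ys)
    )
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sample_fit_line(xs, ys, scores, weights=None):
    """Use squared reliability scores as weights unless weights are given."""
    if weights is None:
        weights = [s ** 2 for s in scores]
    return wls_line(xs, ys, weights)


# toy data: an exact line y = 1 + 2x with one corrupted observation
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[4] = 100.0
scores = [1.0] * 10
scores[4] = 0.0  # pretend the corrupted point was scored as unreliable

print(sample_fit_line(xs, ys, [1.0] * 10))  # equal weights: plain OLS
print(sample_fit_line(xs, ys, scores))      # outlier down-weighted to zero
```

Squaring the scores pushes the weights of low-reliability observations towards zero faster than the raw scores would, while observations scored close to 1 keep nearly full weight.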

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample fit with defaults
sample_fit = sample.fit()

# sample fit with bootstrapping
sample_fit = sample.fit(n_boot=1000)

# get summary of sample fit
sample_fit.summary()

# get confidence intervals
ci_low, ci_up = sample_fit.conf_int()

# get predictions (in-sample)
preds = sample_fit.predict()
def fit(self,
        weights=None,
        n_boot=None):
    """
    Sample fit based on the reliability scores via the SFR algorithm.

    Parameters
    ----------
    weights : array-like of shape (n_obs, 1) or NoneType
        An array of weights for weighted regression. If None, squared
        reliability scores are used as weights by default. Note that
        if bootstrapping is used for inference, the estimation of
        user-supplied weights is not reflected. Default is None.
    n_boot : int or NoneType
        Number of bootstrap replications for inference. If None specified,
        asymptotic approximation is used for inference instead. For valid
        inference, bootstrapping is recommended. Note that bootstrapping
        requires longer computation time. Default is None.

    Returns
    -------
    Results of class SFRFitResults. Following methods are available:
    .summary(), .conf_int() and .predict().

    Notes
    -----
    [`.fit()`](#samplefit.Reliability.SFR.fit) estimates the reliability
    scores via the SFR algorithm in the first step and estimates weighted
    regression in the second step, with the squared reliability scores as
    weights if not specified otherwise.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample fit with defaults
    sample_fit = sample.fit()
    
    # sample fit with bootstrapping
    sample_fit = sample.fit(n_boot=1000)
    
    # get summary of sample fit
    sample_fit.summary()
    
    # get confidence intervals
    ci_low, ci_up = sample_fit.conf_int()
    
    # get predictions (in-sample)
    preds = sample_fit.predict()
    ```
    """
    return super().fit(
        weights=weights,
        n_boot=n_boot
        )
def score(self)

Estimation of reliability scores via the SFR algorithm.

Parameters

None.

Returns

Results of class SFRScoreResults. The following methods are available: .plot().

Notes

.score() estimates the reliability scores via the SFR algorithm. Each observation receives a reliability score, with 0 marking the most unreliable observation and 1 the most reliable observation within the sample.
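To make the scoring idea concrete, the sketch below draws small sub-samples without replacement, fits a line on each, averages every observation's absolute error over the draws, and rescales the result so that 0 is the least and 1 the most reliable observation. It is a simplified illustration under several assumptions and should not be read as the SFR algorithm itself: the names `ols_line` and `reliability_scores` are hypothetical, the model has one regressor plus a constant, the loss is evaluated on all observations for every draw, and the final min-max rescaling is an assumed way of mapping average losses onto the 0 to 1 range.

```python
import random


def ols_line(xs, ys):
    """Closed-form OLS for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def reliability_scores(xs, ys, n_samples=200, min_samples=3, seed=0):
    """Average absolute error over sub-sample fits, rescaled to [0, 1]."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    total_loss = [0.0] * n
    used = 0
    while used < n_samples:
        idx = rng.sample(range(n), min_samples)  # without replacement
        sub_x = [xs[i] for i in idx]
        if max(sub_x) == min(sub_x):
            continue  # slope not identified on this draw, redraw
        a, b = ols_line(sub_x, [ys[i] for i in idx])
        for i in range(n):
            total_loss[i] += abs(ys[i] - (a + b * xs[i]))
        used += 1
    mean_loss = [t / n_samples for t in total_loss]
    low, high = min(mean_loss), max(mean_loss)
    if high == low:
        return [1.0] * n  # all observations equally reliable
    # largest average loss -> 0, smallest average loss -> 1
    return [(high - m) / (high - low) for m in mean_loss]


# toy data: an exact line y = 1 + 2x with one corrupted observation
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[4] = 100.0

scores = reliability_scores(xs, ys)
print([round(s, 2) for s in scores])
```

In this toy sample the corrupted observation at index 4 accumulates by far the largest average loss, so it ends up with the lowest score.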

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# score reliability
sample_scores = sample.score()

# extract reliability scores
scores = sample_scores.scores

# plot reliability scores
sample_scores.plot()
def score(self):
    """
    Estimation of reliability scores via the SFR algorithm.

    Parameters
    ----------
    None.

    Returns
    -------
    Results of class SFRScoreResults. Following methods are available:
    .plot().

    Notes
    -----
    [`.score()`](#samplefit.Reliability.SFR.score) estimates the
    reliability scores via the SFR algorithm. Each observation receives a
    reliability score, with 0 marking the most unreliable observation and
    1 the most reliable observation within the sample.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # score reliability
    sample_scores = sample.score()
    
    # extract reliability scores
    scores = sample_scores.scores
    
    # plot reliability scores
    sample_scores.plot()
    ```
    """
    return super().score()
class SFRAnnealResults (sample=None, params=None, params_boot=None, stand_err=None, drop_idx=None)

Annealing results class labeled SFRAnnealResults. Initializes output of SFR.anneal().

class SFRAnnealResults(BaseSFRAnnealResults):
    """
    Annealing results class labeled `SFRAnnealResults()`.
    Initializes output of SFR.anneal().

    """

    # define init function
    def __init__(self,
                 sample=None,
                 params=None,
                 params_boot=None,
                 stand_err=None,
                 drop_idx=None
                 ):
        # access inherited methods
        super().__init__(
            sample=sample,
            params=params,
            params_boot=params_boot,
            stand_err=stand_err,
            drop_idx=drop_idx
        )


    def plot(self,
             yname=None,
             xname=None,
             title=None,
             alpha=0.05,
             percentile=False,
             color=None,
             path=None,
             figsize=None,
             ylim=None,
             xlabel=None,
             dpi=None,
             fname=None):
        """
        Plot the Annealing based on the reliability scores from the SFR
        algorithm.
        
        Parameters
        ----------
        yname : str or NoneType
            Name of the y axis. Default is 'Effect'.
        xname : list, tuple, str or NoneType
            Name or list of names of the exog variables whose parameters
            should get an annealing plot. Each name must be one of the exog
            variable names. If not supplied, annealing plots for all
            parameters are constructed. Default are the supplied exog names.
        title : str or NoneType
            Title for the annealing plot. Default is 'SFR: Annealing'.
        alpha : float or NoneType
            Significance level alpha, giving (1 - alpha) confidence intervals.
            Default is 0.05.
        percentile : bool
            Percentile method for confidence intervals based on bootstrapping.
            If bootstrapping has not been used for annealing, it is ignored.
            Default is False.
        color : str or NoneType
            Color used for the confidence interval. Must be one of the
            matplotlib supported colors. Default is grey.
        path : str or NoneType
            Valid path to save the plot. If None, plot is not saved. Default
            is None.
        figsize : tuple or NoneType
            Tuple of x and y axis size for matplotlib figsize argument.
            Default is (10,5).
        ylim : tuple, list or NoneType
            Tuple of upper and lower limits of y axis. Default is automatic.
        xlabel : str or NoneType
            Label for the x axis for the exog variable. Default is 'xname'.
        dpi : float, int or NoneType
            The resolution for matplotlib scatter plot. Default is 100.
        fname : str or NoneType
            Valid figure name to save the plot. If None, generic name is used.
            Default is None.
            

        Returns
        -------
        Dictionary of matplotlib figures and axes. Prints annealing plots.

        Notes
        -----
        [`.plot()`](#samplefit.Reliability.SFRAnnealResults.plot) produces
        an annealing plot for assessment of sample fit sensitivity, together
        with parameters and confidence intervals.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample annealing
        sample_annealing = sample.anneal()
        
        # default annealing plot
        sample_annealing.plot()
        
        # custom annealing
        sample_annealing.plot(title='My Title')
        ```
        """
        return super().plot(
            yname=yname,
            xname=xname,
            title=title,
            alpha=alpha,
            percentile=percentile,
            color=color,
            path=path,
            figsize=figsize,
            ylim=ylim,
            xlabel=xlabel,
            dpi=dpi,
            fname=fname
            )
    
    
    def conf_int(self,
                 alpha=0.05,
                 percentile=False):
        """
        Confidence intervals based on the annealing via the SFR algorithm.
        
        Parameters
        ----------
        alpha : float or NoneType
            Significance level alpha of the confidence intervals. Default is
            0.05, which gives 95% intervals.
        percentile : bool
            Percentile method for confidence intervals based on bootstrapping.
            If bootstrapping has not been used for annealing, it is ignored.
            Default is False.

        Returns
        -------
        Tuple of arrays of confidence bounds, lower and upper.

        Notes
        -----
        [`.conf_int()`](#samplefit.Reliability.SFRAnnealResults.conf_int)
        constructs confidence intervals for estimated parameters. If annealed
        without bootstrapping, asymptotic approximations are used. If annealed
        with bootstrapping, the standard deviation of bootstrapped parameters
        is used for standard error approximation. If percentile=True, the
        percentile method is used instead of standard deviation.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample annealing
        sample_annealing = sample.anneal()
        
        # compute confidence intervals with default settings
        ci_low, ci_up = sample_annealing.conf_int()
        
        # compute confidence intervals with custom alpha
        ci_low, ci_up = sample_annealing.conf_int(alpha=0.1)
        ```
        """
        return super().conf_int(
            alpha=alpha,
            percentile=percentile
            )

Ancestors

  • samplefit._BaseResultsReliability.BaseSFRAnnealResults

Methods

def conf_int(self, alpha=0.05, percentile=False)

Confidence intervals based on the annealing via the SFR algorithm.

Parameters

alpha : float or NoneType
Significance level alpha of the confidence intervals. Default is 0.05, which gives 95% intervals.
percentile : bool
Percentile method for confidence intervals based on bootstrapping. If bootstrapping has not been used for annealing, it is ignored. Default is False.

Returns

Tuple of arrays of confidence bounds, lower and upper.

Notes

.conf_int() constructs confidence intervals for estimated parameters. If annealed without bootstrapping, asymptotic approximations are used. If annealed with bootstrapping, the standard deviation of bootstrapped parameters is used for standard error approximation. If percentile=True, the percentile method is used instead of standard deviation.
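
The two bootstrap-based options described in the notes can be illustrated with a minimal standard-library sketch. It is not the samplefit implementation: the helper `bootstrap_conf_int`, the nearest-rank quantile rule, and the simulated replications are assumptions made for illustration only.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_conf_int(estimate, boot_estimates, alpha=0.05, percentile=False):
    """Return (lower, upper) bounds for one parameter (hypothetical helper)."""
    if percentile:
        # percentile method: read the bounds off the sorted replications
        ordered = sorted(boot_estimates)
        n = len(ordered)
        k = int((alpha / 2) * (n - 1))  # nearest-rank index of the lower tail
        return ordered[k], ordered[n - 1 - k]
    # standard-deviation method: bootstrap SD stands in for the standard error
    se = statistics.stdev(boot_estimates)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se


random.seed(0)
# simulated bootstrap replications of a slope estimate (made-up numbers)
boot = [random.gauss(0.55, 0.05) for _ in range(1000)]

print(bootstrap_conf_int(0.55, boot))                   # SD-based interval
print(bootstrap_conf_int(0.55, boot, percentile=True))  # percentile interval
```

The SD-based interval is symmetric around the point estimate by construction, whereas the percentile interval follows the shape of the bootstrap distribution and can be asymmetric.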

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample annealing
sample_annealing = sample.anneal()

# compute confidence intervals with default settings
ci_low, ci_up = sample_annealing.conf_int()

# compute confidence intervals with custom alpha
ci_low, ci_up = sample_annealing.conf_int(alpha=0.1)
Expand source code
def conf_int(self,
             alpha=0.05,
             percentile=False):
    """
    Confidence intervals based on the annealing via the SFR algorithm.
    
    Parameters
    ----------
    alpha : float or NoneType
        Significance level alpha of the confidence intervals. Default is
        0.05, which gives 95% intervals.
    percentile : bool
        Percentile method for confidence intervals based on bootstrapping.
        If bootstrapping has not been used for annealing, it is ignored.
        Default is False.

    Returns
    -------
    Tuple of arrays of confidence bounds, lower and upper.

    Notes
    -----
    [`.conf_int()`](#samplefit.Reliability.SFRAnnealResults.conf_int)
    constructs confidence intervals for estimated parameters. If annealed
    without bootstrapping, asymptotic approximations are used. If annealed
    with bootstrapping, the standard deviation of bootstrapped parameters
    is used for standard error approximation. If percentile=True, the
    percentile method is used instead of standard deviation.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample annealing
    sample_annealing = sample.anneal()
    
    # compute confidence intervals with default settings
    ci_low, ci_up = sample_annealing.conf_int()
    
    # compute confidence intervals with custom alpha
    ci_low, ci_up = sample_annealing.conf_int(alpha=0.1)
    ```
    """
    return super().conf_int(
        alpha=alpha,
        percentile=percentile
        )
def plot(self, yname=None, xname=None, title=None, alpha=0.05, percentile=False, color=None, path=None, figsize=None, ylim=None, xlabel=None, dpi=None, fname=None)

Plot the Annealing based on the reliability scores from the SFR algorithm.

Parameters

yname : str or NoneType
Name of the y axis. Default is 'Effect'.
xname : list, tuple, str or NoneType
Name or list of names of the exog variables for whose parameters an annealing plot should be constructed. Each must be one of the exog variable names. If not supplied, annealing plots for all parameters are constructed. Default are the supplied exog names.
title : str or NoneType
Title for the annealing plot. Default is 'SFR: Annealing'.
alpha : float or NoneType
Significance level alpha of the confidence intervals. Default is 0.05, which gives 95% intervals.
percentile : bool
Percentile method for confidence intervals based on bootstrapping. If bootstrapping has not been used for annealing, it is ignored. Default is False.
color : str or NoneType
Color used for the confidence interval. Must be one of the matplotlib supported colors. Default is grey.
path : str or NoneType
Valid path to save the plot. If None, plot is not saved. Default is None.
figsize : tuple or NoneType
Tuple of x and y axis size for matplotlib figsize argument. Default is (10,5).
ylim : tuple, list or NoneType
Tuple of upper and lower limits of y axis. Default is automatic.
xlabel : str or NoneType
Label for the x axis for the exog variable. Default is 'xname'.
dpi : float, int or NoneType
The resolution for matplotlib scatter plot. Default is 100.
fname : str or NoneType
Valid figure name to save the plot. If None, generic name is used. Default is None.

Returns

Dictionary of matplotlib figures and axes. Prints annealing plots.

Notes

.plot() produces an annealing plot for assessment of sample fit sensitivity, together with parameters and confidence intervals.
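
The idea behind the annealing path, re-estimating a parameter while the least reliable observations are removed one at a time, can be sketched in plain Python. This is only an illustration under stated assumptions: `ols_slope`, `anneal_path`, the toy data and the reliability scores are all hypothetical and do not reproduce the library's algorithm.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares line with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def anneal_path(x, y, scores, max_drop):
    """Re-estimate the slope after dropping the k least reliable points."""
    # order observation indices from least to most reliable
    order = sorted(range(len(x)), key=lambda i: scores[i])
    path = []
    for k in range(max_drop + 1):
        keep = order[k:]  # drop the k lowest-scored observations
        path.append(ols_slope([x[i] for i in keep], [y[i] for i in keep]))
    return path


# toy data: y = 2x exactly, except for one distorted point with a low score
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]
scores = [0.9, 0.8, 0.95, 0.85, 0.9, 0.1]  # made-up reliability scores

print(anneal_path(x, y, scores, max_drop=2))
```

In this toy example the slope is far from 2 while the distorted point is included and settles at 2 once it is removed, which is the kind of sensitivity an annealing plot makes visible.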

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample annealing
sample_annealing = sample.anneal()

# default annealing plot
sample_annealing.plot()

# custom annealing
sample_annealing.plot(title='My Title')
Expand source code
def plot(self,
         yname=None,
         xname=None,
         title=None,
         alpha=0.05,
         percentile=False,
         color=None,
         path=None,
         figsize=None,
         ylim=None,
         xlabel=None,
         dpi=None,
         fname=None):
    """
    Plot the Annealing based on the reliability scores from the SFR
    algorithm.
    
    Parameters
    ----------
    yname : str or NoneType
        Name of the y axis. Default is 'Effect'.
    xname : list, tuple, str or NoneType
        Name or list of names of the exog variables for whose parameters
        an annealing plot should be constructed. Each must be one of the
        exog variable names. If not supplied, annealing plots for all
        parameters are constructed. Default are the supplied exog names.
    title : str or NoneType
        Title for the annealing plot. Default is 'SFR: Annealing'.
    alpha : float or NoneType
        Significance level alpha of the confidence intervals. Default is
        0.05, which gives 95% intervals.
    percentile : bool
        Percentile method for confidence intervals based on bootstrapping.
        If bootstrapping has not been used for annealing, it is ignored.
        Default is False.
    color : str or NoneType
        Color used for the confidence interval. Must be one of the
        matplotlib supported colors. Default is grey.
    path : str or NoneType
        Valid path to save the plot. If None, plot is not saved. Default
        is None.
    figsize : tuple or NoneType
        Tuple of x and y axis size for matplotlib figsize argument.
        Default is (10,5).
    ylim : tuple, list or NoneType
        Tuple of upper and lower limits of y axis. Default is automatic.
    xlabel : str or NoneType
        Label for the x axis for the exog variable. Default is 'xname'.
    dpi : float, int or NoneType
        The resolution for matplotlib scatter plot. Default is 100.
    fname : str or NoneType
        Valid figure name to save the plot. If None, generic name is used.
        Default is None.
        

    Returns
    -------
    Dictionary of matplotlib figures and axes. Prints annealing plots.

    Notes
    -----
    [`.plot()`](#samplefit.Reliability.SFRAnnealResults.plot) produces
    an annealing plot for assessment of sample fit sensitivity, together
    with parameters and confidence intervals.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample annealing
    sample_annealing = sample.anneal()
    
    # default annealing plot
    sample_annealing.plot()
    
    # custom annealing
    sample_annealing.plot(title='My Title')
    ```
    """
    return super().plot(
        yname=yname,
        xname=xname,
        title=title,
        alpha=alpha,
        percentile=percentile,
        color=color,
        path=path,
        figsize=figsize,
        ylim=ylim,
        xlabel=xlabel,
        dpi=dpi,
        fname=fname
        )
class SFRFitResults (sample=None, params=None, params_boot=None, stand_err=None, fittedvalues=None)

Fit Results class labeled SFRFitResults. Initializes output of SFR.fit().

Expand source code
class SFRFitResults(BaseSFRFitResults):
    """
    Fit Results class labeled `SFRFitResults()`.
    Initializes output of SFR.fit().

    """

    # define init function
    def __init__(self,
                 sample=None,
                 params=None,
                 params_boot=None,
                 stand_err=None,
                 fittedvalues=None
                 ):
        # access inherited methods
        super().__init__(
            sample=sample,
            params=params,
            params_boot=params_boot,
            stand_err=stand_err,
            fittedvalues=fittedvalues
        )


    def predict(self,
                params=None,
                exog=None):
        """
        Predict outcomes based on the sample fit via the SFR algorithm.
        
        Parameters
        ----------
        params : array-like or NoneType
            Array of parameters to predict with. If None supplied, the
            estimated parameters from the sample fit will be used.
            Default is None.
        exog : array-like or NoneType
            Matrix of features/covariates for which the outcomes should be
            predicted (out-of-sample). Column dimensions must be identical
            to the training data supplied to the statsmodels model class.
            If None supplied, in-sample predictions (fitted values) are
            returned. Default is None.

        Returns
        -------
        Array of predictions.

        Notes
        -----
        [`.predict()`](#samplefit.Reliability.SFRFitResults.predict) constructs
        predictions for outcome variable based on the estimated parameters.
        Predictions are based on the parameters of weighted fit. If no new
        values for exogenous variables are supplied, fitted values are
        returned.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample fit
        sample_fit = sample.fit()
        
        # predict in-sample
        preds = sample_fit.predict()
        
        # predict out-of-sample
        preds = sample_fit.predict(exog=X.iloc[[0], :])
        ```
        """
        return super().predict(
            params=params,
            exog=exog
            )
    
    
    def conf_int(self,
                 alpha=0.05,
                 percentile=False):
        """
        Confidence intervals based on the sample fit via the SFR algorithm.
        
        Parameters
        ----------
        alpha : float or NoneType
            Significance level alpha of the confidence intervals. Default is
            0.05, which gives 95% intervals.
        percentile : bool
            Percentile method for confidence intervals based on bootstrapping.
            If bootstrapping has not been used for fitting, it is ignored.
            Default is False.

        Returns
        -------
        Tuple of arrays of confidence bounds, lower and upper.

        Notes
        -----
        [`.conf_int()`](#samplefit.Reliability.SFRFitResults.conf_int)
        constructs confidence intervals for estimated parameters. If fitted
        without bootstrapping, asymptotic approximations are used. If fitted
        with bootstrapping, the standard deviation of bootstrapped parameters
        is used for standard error approximation. If percentile=True, the
        percentile method is used instead of standard deviation.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample fit
        sample_fit = sample.fit()
        
        # compute confidence intervals with default settings
        ci_low, ci_up = sample_fit.conf_int()
        
        # compute confidence intervals with custom alpha
        ci_low, ci_up = sample_fit.conf_int(alpha=0.1)
        ```
        """
        return super().conf_int(
            alpha=alpha,
            percentile=percentile
            )
    
    
    def summary(self,
                yname=None,
                xname=None,
                title=None,
                alpha=0.05,
                percentile=False,
                get_table=False,
                verbose=True):
        """
        Summary of the sample fit via the SFR algorithm.
        
        Parameters
        ----------
        yname : str or NoneType
            Name of the endog variable. Default is 'y'.
        xname : list, tuple or NoneType
            List of names of the exog variables. Must have the same dimension as
            exog columns. Default are the supplied exog names.
        title : str or NoneType
            Title for the summary table. Default is 'SFR: Fitting'.
        alpha : float or NoneType
            Significance level alpha of the confidence intervals. Default is
            0.05, which gives 95% intervals.
        percentile : bool
            Percentile method for confidence intervals based on bootstrapping.
            If bootstrapping has not been used for fitting, it is ignored.
            Default is False.
        get_table : bool
            If a summary table should be returned or not. If True, a pandas
            DataFrame with estimation results is returned. Default is False.
        verbose : bool
            If a summary table should be printed to console or not.
            Default is True.

        Returns
        -------
        None by default, or a pandas DataFrame with estimation results if
        get_table=True. Prints summary table if verbose=True.

        Notes
        -----
        [`.summary()`](#samplefit.Reliability.SFRFitResults.summary) produces
        a summary table including information on sample fit as well as model
        fit together with parameters, standard errors, t-values, p-values
        and confidence intervals.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample fit
        sample_fit = sample.fit()
        
        # default summary
        sample_fit.summary()
        
        # custom summary title
        sample_fit.summary(title='My Title')
        ```
        """
        return super().summary(
            yname=yname,
            xname=xname,
            title=title,
            alpha=alpha,
            percentile=percentile,
            get_table=get_table,
            verbose=verbose
            )

Ancestors

  • samplefit._BaseResultsReliability.BaseSFRFitResults

Methods

def conf_int(self, alpha=0.05, percentile=False)

Confidence intervals based on the sample fit via the SFR algorithm.

Parameters

alpha : float or NoneType
Significance level alpha of the confidence intervals. Default is 0.05, which gives 95% intervals.
percentile : bool
Percentile method for confidence intervals based on bootstrapping. If bootstrapping has not been used for fitting, it is ignored. Default is False.

Returns

Tuple of arrays of confidence bounds, lower and upper.

Notes

.conf_int() constructs confidence intervals for estimated parameters. If fitted without bootstrapping, asymptotic approximations are used. If fitted with bootstrapping, the standard deviation of bootstrapped parameters is used for standard error approximation. If percentile=True, the percentile method is used instead of standard deviation.
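
For the case without bootstrapping, an asymptotic interval is typically formed as the estimate plus or minus a quantile times the standard error. The sketch below assumes a normal approximation; whether samplefit uses normal or t quantiles is not stated here, and the helper `asymptotic_conf_int` and the numbers are hypothetical.

```python
from statistics import NormalDist


def asymptotic_conf_int(params, stand_err, alpha=0.05):
    """Normal-approximation bounds for each parameter (hypothetical helper)."""
    # two-sided critical value, e.g. about 1.96 for alpha=0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = [b - z * se for b, se in zip(params, stand_err)]
    upper = [b + z * se for b, se in zip(params, stand_err)]
    return lower, upper


# made-up estimates and standard errors for a constant and one slope
ci_low, ci_up = asymptotic_conf_int([-3.3, 0.55], [0.7, 0.05])
print(ci_low, ci_up)

# a larger alpha gives narrower intervals (90% instead of 95%)
ci_low_90, ci_up_90 = asymptotic_conf_int([-3.3, 0.55], [0.7, 0.05], alpha=0.1)
print(ci_low_90, ci_up_90)
```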

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample fit
sample_fit = sample.fit()

# compute confidence intervals with default settings
ci_low, ci_up = sample_fit.conf_int()

# compute confidence intervals with custom alpha
ci_low, ci_up = sample_fit.conf_int(alpha=0.1)
Expand source code
def conf_int(self,
             alpha=0.05,
             percentile=False):
    """
    Confidence intervals based on the sample fit via the SFR algorithm.
    
    Parameters
    ----------
    alpha : float or NoneType
        Significance level alpha of the confidence intervals. Default is
        0.05, which gives 95% intervals.
    percentile : bool
        Percentile method for confidence intervals based on bootstrapping.
        If bootstrapping has not been used for fitting, it is ignored.
        Default is False.

    Returns
    -------
    Tuple of arrays of confidence bounds, lower and upper.

    Notes
    -----
    [`.conf_int()`](#samplefit.Reliability.SFRFitResults.conf_int)
    constructs confidence intervals for estimated parameters. If fitted
    without bootstrapping, asymptotic approximations are used. If fitted
    with bootstrapping, the standard deviation of bootstrapped parameters
    is used for standard error approximation. If percentile=True, the
    percentile method is used instead of standard deviation.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample fit
    sample_fit = sample.fit()
    
    # compute confidence intervals with default settings
    ci_low, ci_up = sample_fit.conf_int()
    
    # compute confidence intervals with custom alpha
    ci_low, ci_up = sample_fit.conf_int(alpha=0.1)
    ```
    """
    return super().conf_int(
        alpha=alpha,
        percentile=percentile
        )
def predict(self, params=None, exog=None)

Predict outcomes based on the sample fit via the SFR algorithm.

Parameters

params : array-like or NoneType
Array of parameters to predict with. If None supplied, the estimated parameters from the sample fit will be used. Default is None.
exog : array-like or NoneType
Matrix of features/covariates for which the outcomes should be predicted (out-of-sample). Column dimensions must be identical to the training data supplied to the statsmodels model class. If None supplied, in-sample predictions (fitted values) are returned. Default is None.

Returns

Array of predictions.

Notes

.predict() constructs predictions for outcome variable based on the estimated parameters. Predictions are based on the parameters of weighted fit. If no new values for exogenous variables are supplied, fitted values are returned.
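
For a linear model, a prediction is the dot product of a row of exog with the parameter vector, which is why the column dimensions must match the training data. The following sketch uses a hypothetical helper `linear_predict` and made-up parameter values; it illustrates the arithmetic only and is not the library's code.

```python
def linear_predict(params, exog):
    """Return one prediction per row of exog as its dot product with params."""
    preds = []
    for row in exog:
        if len(row) != len(params):
            # mirrors the requirement that column dimensions must match
            raise ValueError("each exog row needs one value per parameter")
        preds.append(sum(b * v for b, v in zip(params, row)))
    return preds


params = [1.0, 0.5]                   # made-up constant and slope
exog_new = [[1.0, 4.0], [1.0, 10.0]]  # rows: constant term, lstat value

print(linear_predict(params, exog_new))  # → [3.0, 6.0]
```

Note that each new row has to include the constant term if the model was fitted with one, as in the examples that use `sm.add_constant`.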

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample fit
sample_fit = sample.fit()

# predict in-sample
preds = sample_fit.predict()

# predict out-of-sample
preds = sample_fit.predict(exog=X.iloc[[0], :])
Expand source code
def predict(self,
            params=None,
            exog=None):
    """
    Predict outcomes based on the sample fit via the SFR algorithm.
    
    Parameters
    ----------
    params : array-like or NoneType
        Array of parameters to predict with. If None supplied, the
        estimated parameters from the sample fit will be used.
        Default is None.
    exog : array-like or NoneType
        Matrix of features/covariates for which the outcomes should be
        predicted (out-of-sample). Column dimensions must be identical
        to the training data supplied to the statsmodels model class.
        If None supplied, in-sample predictions (fitted values) are
        returned. Default is None.

    Returns
    -------
    Array of predictions.

    Notes
    -----
    [`.predict()`](#samplefit.Reliability.SFRFitResults.predict) constructs
    predictions for outcome variable based on the estimated parameters.
    Predictions are based on the parameters of weighted fit. If no new
    values for exogenous variables are supplied, fitted values are
    returned.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample fit
    sample_fit = sample.fit()
    
    # predict in-sample
    preds = sample_fit.predict()
    
    # predict out-of-sample
    preds = sample_fit.predict(exog=X.iloc[[0], :])
    ```
    """
    return super().predict(
        params=params,
        exog=exog
        )
def summary(self, yname=None, xname=None, title=None, alpha=0.05, percentile=False, get_table=False, verbose=True)

Summary of the sample fit via the SFR algorithm.

Parameters

yname : str or NoneType
Name of the endog variable. Default is 'y'.
xname : list, tuple or NoneType
List of names of the exog variables. Must have the same dimension as exog columns. Default are the supplied exog names.
title : str or NoneType
Title for the summary table. Default is 'SFR: Fitting'.
alpha : float or NoneType
Significance level alpha of the confidence intervals. Default is 0.05, which gives 95% intervals.
percentile : bool
Percentile method for confidence intervals based on bootstrapping. If bootstrapping has not been used for fitting, it is ignored. Default is False.
get_table : bool
If a summary table should be returned or not. If True, a pandas DataFrame with estimation results is returned. Default is False.
verbose : bool
If a summary table should be printed to console or not. Default is True.

Returns

None by default, or a pandas DataFrame with estimation results if get_table=True. Prints summary table if verbose=True.

Notes

.summary() produces a summary table including information on sample fit as well as model fit together with parameters, standard errors, t-values, p-values and confidence intervals.
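
The t-values and p-values reported in such a table follow directly from the parameters and their standard errors. The sketch below shows that relationship with a hypothetical helper `coef_table`; it uses a normal approximation for the two-sided p-value, which is an assumption and may differ from the reference distribution samplefit uses.

```python
from statistics import NormalDist


def coef_table(names, params, stand_err):
    """Build rows of coefficient, standard error, t-value and p-value."""
    rows = []
    for name, b, se in zip(names, params, stand_err):
        t = b / se  # test statistic for the null hypothesis of a zero effect
        p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided, normal approx.
        rows.append({"name": name, "coef": b, "std_err": se, "t": t, "p": p})
    return rows


# made-up estimates for a constant and one slope
table = coef_table(["const", "lstat"], [-3.3, 0.5], [0.7, 0.25])
for row in table:
    print(row)
```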

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample fit
sample_fit = sample.fit()

# default summary
sample_fit.summary()

# custom summary title
sample_fit.summary(title='My Title')
Expand source code
def summary(self,
            yname=None,
            xname=None,
            title=None,
            alpha=0.05,
            percentile=False,
            get_table=False,
            verbose=True):
    """
    Summary of the sample fit via the SFR algorithm.
    
    Parameters
    ----------
    yname : str or NoneType
        Name of the endog variable. Default is 'y'.
    xname : list, tuple or NoneType
        List of names of the exog variables. Must have the same dimension as
        exog columns. Default are the supplied exog names.
    title : str or NoneType
        Title for the summary table. Default is 'SFR: Fitting'.
    alpha : float or NoneType
        Significance level alpha of the confidence intervals. Default is
        0.05, which gives 95% intervals.
    percentile : bool
        Percentile method for confidence intervals based on bootstrapping.
        If bootstrapping has not been used for fitting, it is ignored.
        Default is False.
    get_table : bool
        If a summary table should be returned or not. If True, a pandas
        DataFrame with estimation results is returned. Default is False.
    verbose : bool
        If a summary table should be printed to console or not.
        Default is True.

    Returns
    -------
    None by default, or a pandas DataFrame with estimation results if
    get_table=True. Prints summary table if verbose=True.

    Notes
    -----
    [`.summary()`](#samplefit.Reliability.SFRFitResults.summary) produces
    a summary table including information on sample fit as well as model
    fit together with parameters, standard errors, t-values, p-values
    and confidence intervals.

    Examples
    --------
    ```py
    # import libraries
    import samplefit as sf
    import statsmodels.api as sm
    
    # get data 
    boston = sm.datasets.get_rdataset("Boston", "MASS")
    Y = boston.data['crim'] # per capita crime rate
    X = boston.data['lstat'] # % lower status population
    X = sm.add_constant(X)
    
    # specify model
    model = sm.OLS(endog=Y, exog=X)
    
    # specify sample
    sample = sf.SFR(linear_model=model)
    
    # sample fit
    sample_fit = sample.fit()
    
    # default summary
    sample_fit.summary()
    
    # custom summary title
    sample_fit.summary(title='My Title')
    ```
    """
    return super().summary(
        yname=yname,
        xname=xname,
        title=title,
        alpha=alpha,
        percentile=percentile,
        get_table=get_table,
        verbose=verbose
        )
class SFRScoreResults (sample=None)

Scoring results class labeled SFRScoreResults. Initializes output of SFR.score().

Expand source code
class SFRScoreResults(BaseSFRScoreResults):
    """
    Scoring results class labeled `SFRScoreResults()`.
    Initializes output of SFR.score().

    """

    # define init function
    def __init__(self,
                 sample=None
                 ):
        # access inherited methods
        super().__init__(
            sample=sample
            )


    def plot(self,
             yname=None,
             xname=None,
             title=None,
             cmap=None,
             path=None,
             figsize=None,
             s=None,
             ylim=None,
             xlim=None,
             xlabel=None,
             dpi=None,
             fname=None,
             jitter=False):
        """
        Plot the reliability scores based on the SFR algorithm.
        
        Parameters
        ----------
        yname : str or NoneType
            Name of the endog variable. Default is 'y'.
        xname : list, tuple, str or NoneType
            Name or list of names of the exog variables for which parameter
            a scoring plot should be constructed. Must be one of the exog
            variable names. If not supplied scoring plots for all parameters
            are constructed. Default are the supplied exog names.
        title : str or NoneType
            Title for the scoring plot. Default is 'SFR: Scoring'.
        cmap : str or NoneType
            Color map used for the reliability score. Must be one of the
            matplotlib supported color maps. Default is 'RdYlGn'.
        path : str or NoneType
            Valid path to save the plot. If None, plot is not saved. Default
            is None.
        figsize : tuple or NoneType
            Tuple of x and y axis size for matplotlib figsize argument.
            Default is (10,5).
        s : float, int or NoneType
            The marker size in points**2, as in the matplotlib scatter plot.
            Default is automatic.
        ylim : tuple, list or NoneType
            Tuple of upper and lower limits of y axis. Default is automatic.
        xlim : tuple, list or NoneType
            Tuple of upper and lower limits of x axis. Default is automatic.
        xlabel : str or NoneType
            Label for the x axis for the exog variable. Default is 'xname'.
        dpi : float, int or NoneType
            The resolution for matplotlib scatter plot. Default is 100.
        fname : str or NoneType
            Valid figure name to save the plot. If None, generic name is used.
            Default is None.
        jitter : bool
            Whether the scatter plot should be jittered for categorical
            features. Note that this randomly perturbs the feature values
            along the x axis, so fixing the seed is necessary for
            reproducibility. Default is False.
    
        Returns
        -------
        Dictionary of matplotlib figures and axes. Prints scoring plots.

        Notes
        -----
        [`.plot()`](#samplefit.Reliability.SFRScoreResults.plot) produces
        a scoring plot for assessment of sample fit reliability.

        Examples
        --------
        ```py
        # import libraries
        import samplefit as sf
        import statsmodels.api as sm
        
        # get data 
        boston = sm.datasets.get_rdataset("Boston", "MASS")
        Y = boston.data['crim'] # per capita crime rate
        X = boston.data['lstat'] # % lower status population
        X = sm.add_constant(X)
        
        # specify model
        model = sm.OLS(endog=Y, exog=X)
        
        # specify sample
        sample = sf.SFR(linear_model=model)
        
        # sample reliability
        sample_scores = sample.score()
        
        # default scoring plot
        sample_scores.plot()
        
        # custom scoring plot
        sample_scores.plot(title='My Title')
        ```
        """
        return super().plot(
            yname=yname,
            xname=xname,
            title=title,
            cmap=cmap,
            path=path,
            figsize=figsize,
            s=s,
            ylim=ylim,
            xlim=xlim,
            xlabel=xlabel,
            dpi=dpi,
            fname=fname,
            jitter=jitter
            )

Ancestors

  • samplefit._BaseResultsReliability.BaseSFRScoreResults

Methods

def plot(self, yname=None, xname=None, title=None, cmap=None, path=None, figsize=None, s=None, ylim=None, xlim=None, xlabel=None, dpi=None, fname=None, jitter=False)

Plot the reliability scores based on the SFR algorithm.

Parameters

yname : str or NoneType
Name of the endog variable. Default is 'y'.
xname : list, tuple, str or NoneType
Name or list of names of the exog variables for whose parameters a scoring plot should be constructed. Each name must be one of the exog variable names. If not supplied, scoring plots are constructed for all parameters. Default is the supplied exog names.
title : str or NoneType
Title for the scoring plot. Default is 'SFR: Scoring'.
cmap : str or NoneType
Color map used for the reliability score. Must be one of the matplotlib supported color maps. Default is 'RdYlGn'.
path : str or NoneType
Valid path to save the plot. If None, plot is not saved. Default is None.
figsize : tuple or NoneType
Tuple of x and y axis size for matplotlib figsize argument. Default is (10,5).
s : float, int or NoneType
The marker size in points**2, as in the matplotlib scatter plot. Default is automatic.
ylim : tuple, list or NoneType
Tuple of upper and lower limits of y axis. Default is automatic.
xlim : tuple, list or NoneType
Tuple of upper and lower limits of x axis. Default is automatic.
xlabel : str or NoneType
Label for the x axis for the exog variable. Default is 'xname'.
dpi : float, int or NoneType
The resolution for matplotlib scatter plot. Default is 100.
fname : str or NoneType
Valid figure name to save the plot. If None, generic name is used. Default is None.
jitter : bool
Whether the scatter plot should be jittered for categorical features. Note that this randomly perturbs the feature values along the x axis, so fixing the seed is necessary for reproducibility. Default is False.

Returns

Dictionary of matplotlib figures and axes. Prints scoring plots.

Notes

.plot() produces a scoring plot for assessment of sample fit reliability.
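
When a plot is saved, figsize (in inches) and dpi together determine the size of the image in pixels. The sketch below works through that arithmetic for the documented defaults; `pixel_size` is a hypothetical helper written for illustration, not part of samplefit, and it assumes the figure is written out without any cropping or padding.

```python
def pixel_size(figsize, dpi):
    """Convert a (width, height) figure size in inches to pixels at a given dpi."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


# documented defaults: figsize=(10, 5) and dpi=100
default_pixels = pixel_size((10, 5), 100)
print(default_pixels)

# doubling dpi doubles both pixel dimensions at the same figsize
sharper_pixels = pixel_size((10, 5), 200)
print(sharper_pixels)
```

Raising dpi is usually the simpler way to get a sharper saved image, since changing figsize also changes how large the markers and text appear relative to the axes.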

Examples

# import libraries
import samplefit as sf
import statsmodels.api as sm

# get data 
boston = sm.datasets.get_rdataset("Boston", "MASS")
Y = boston.data['crim'] # per capita crime rate
X = boston.data['lstat'] # % lower status population
X = sm.add_constant(X)

# specify model
model = sm.OLS(endog=Y, exog=X)

# specify sample
sample = sf.SFR(linear_model=model)

# sample reliability
sample_scores = sample.score()

# default scoring plot
sample_scores.plot()

# custom scoring plot
sample_scores.plot(title='My Title')
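
The jitter option adds random noise along the x axis, which is why a fixed seed matters for reproducibility. The following is a minimal, library-independent sketch of that idea. The `jitter` helper, its `width` parameter, and the use of Python's `random` module are illustrative assumptions; they do not describe samplefit's implementation or the random generator it relies on.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Return a copy of `values` with uniform noise in [-width, width] added."""
    rng = random.Random(seed)  # private generator, global random state untouched
    return [v + rng.uniform(-width, width) for v in values]


# a categorical feature coded as 0/1: many points share the same x position
x = [0, 0, 0, 1, 1, 1]

first = jitter(x, seed=42)
second = jitter(x, seed=42)

# the same seed yields the same perturbation, so the plot can be reproduced
print(first == second)

# every jittered point stays within `width` of its original position
print(all(abs(j - v) <= 0.2 for j, v in zip(first, x)))
```

Without a fixed seed, each call would spread the overlapping points differently, so two runs of the same plot would not match point for point.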