StatsPy Reference Guide

Parameter

class statspy.core.Param(*args, **kwargs)[source]

Base class to define a PF shape parameter.

Two types of parameters can be built:

  • RAW parameters do not depend on any other parameter and have a value directly associated with them.
  • DERIVED parameters are obtained from other parameters via an analytical formula.

Examples

Declarations:

>>> import statspy as sp 
>>> mu = sp.Param(name="mu",value=10.,label="\\mu")
>>> x = sp.Param("x = 10. +- 2.")
>>> x.value
10.0
>>> x.unc
2.0
>>> y = sp.Param("y = 5.", unc=1.)

Operations, building DERIVED parameters:

>>> x + y
x + y = 15.0 +- 2.2360679775
>>> z = x * y
>>> z.name = 'z'
>>> z
z = x * y = 50.0 +- 14.1421356237
>>> x**2
x ** 2 = 100.0 +- 40.0

Possible operations are +,-,*,/,**.
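The uncertainties printed in the examples above are consistent with first-order error propagation for uncorrelated parameters. The sketch below reproduces the three numbers in plain Python. The helper names (unc_sum, unc_product, unc_power) are illustrative and not part of StatsPy, and it is an assumption that the library propagates uncertainties exactly this way.

```python
import math

def unc_sum(dx, dy):
    """Uncertainty on x + y (or x - y) for uncorrelated x and y."""
    return math.hypot(dx, dy)

def unc_product(x, dx, y, dy):
    """Uncertainty on x * y for uncorrelated x and y."""
    return math.hypot(y * dx, x * dy)

def unc_power(x, dx, n):
    """Uncertainty on x ** n for a constant exponent n."""
    return abs(n * x ** (n - 1) * dx)

x, dx = 10.0, 2.0
y, dy = 5.0, 1.0
print(unc_sum(dx, dy))            # close to 2.2360679775
print(unc_product(x, dx, y, dy))  # close to 14.1421356237
print(unc_power(x, dx, 2))        # 40.0
```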

Attributes

name : str

Parameter name

label : str

Parameter name for printing purposes

value : float

Current numerical value

unc : float

Parameter uncertainty (e.g. after minimization)

neg_unc : float

Negative parameter uncertainty (only for intervals)

pos_unc : float

Positive parameter uncertainty (only for intervals)

bounds : list

Defines, if necessary, the lower and upper bounds

formula : list

(optional, only for DERIVED parameters) List of operators and parameters used to parse an analytic function

strform : str

(optional, only for DERIVED parameters) Representation of the formula as a string

partype : Param.RAW or Param.DERIVED

Tells whether it is a RAW or a DERIVED parameter

const : bool

Tells whether a parameter is fixed during a minimization process. It is not a constant in the programming sense.

poi : bool

Tells whether a parameter is a parameter of interest in a hypothesis test.

isuptodate : bool

Tells whether value needs to be computed again or not

logger : logging.Logger

Message logging system

Methods

__add__(other)[source]

Add a parameter to another parameter or a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the sum of self and other

__call__()[source]

Return the parameter value.

Returns :

self.value : float

Parameter value possibly recomputed from self.formula if self.isuptodate is False.

__div__(other)[source]

Divide a parameter by another parameter or by a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the ratio of self and other

__getattribute__(name)[source]

Overload __getattribute__ to update the value attribute from the formula for a DERIVED parameter.

__iadd__(other)[source]

In-place addition (+=)

Parameters :

self : Param

other : Param, int, long, float

Returns :

self : Param

self parameter modified by other

__imul__(other)[source]

In-place multiplication (*=)

Parameters :

self : Param

other : Param, int, long, float

Returns :

self : Param

self parameter modified by other

__isub__(other)[source]

In-place subtraction (-=)

Parameters :

self : Param

other : Param, int, long, float

Returns :

self : Param

self parameter modified by other

__mul__(other)[source]

Multiply a parameter by another parameter or by a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the product of self and other

__pow__(other)[source]

Raise a parameter to a power.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is self raised to the power of other

__radd__(other)[source]

Add a numerical value to a parameter

__rdiv__(other)[source]

Divide a numerical value by a parameter

__repr__()[source]

Return Parameter value and formula if DERIVED

__rmul__(other)[source]

Multiply a numerical value by a parameter

__rpow__(other)[source]

Raise a numerical value to the power of a parameter

__rsub__(other)[source]

Subtract a parameter from a numerical value

__setattr__(name, value)[source]

Overload __setattr__ to make sure that quantities based on this parameter will be updated when its value is modified.
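The interplay between isuptodate, __call__ and __setattr__ can be pictured with a generic "stale flag" pattern. The following is only a sketch of that pattern: the Node class and its set and get methods are hypothetical and do not reproduce StatsPy's implementation.

```python
class Node:
    """Hypothetical stand-in for a parameter with a cached value."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._value = value
        self.formula = formula            # callable computing the value from inputs
        self.inputs = list(inputs)
        self.dependents = []
        self.isuptodate = formula is None  # raw nodes are always up to date
        for node in self.inputs:
            node.dependents.append(self)

    def set(self, value):
        """Assign a new value and flag everything built on this node as stale."""
        self._value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for dep in self.dependents:
            dep.isuptodate = False
            dep._invalidate_dependents()

    def get(self):
        """Return the value, recomputing it first if it is stale."""
        if not self.isuptodate:
            self._value = self.formula(*[n.get() for n in self.inputs])
            self.isuptodate = True
        return self._value

x = Node(10.0)
y = Node(5.0)
z = Node(formula=lambda a, b: a * b, inputs=(x, y))
print(z.get())                 # 50.0
x.set(4.0)
print(z.isuptodate, z.get())   # False 20.0
```

The derived value is recomputed only when it is read after one of its inputs has changed.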

__sub__(other)[source]

Subtract another parameter or a numerical value from a parameter.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the difference of self and other

__weakref__

list of weak references to the object (if defined)

get_raw_params()[source]

Get the list of RAW parameters from a DERIVED parameter

unbound_repr()[source]

Transform the value of the parameter so that it has no bounds.

This method is used with minimization algorithms which require unbounded parameters. The transformation formulas from a double-sided or a one-sided bounded parameter to an unbounded parameter are described in Section 1.3.1 of the MINUIT manual: Fred James, Matthias Winkler, MINUIT User’s Guide, June 16, 2004.

Returns :

new : float

parameter value within an unbound representation.

unbound_to_bound(val, unc=0.0)[source]

Transform the parameter value from an unbound to a bounded representation.

This method is used with minimization algorithms which require unbounded parameters. The transformation formulas from an unbounded parameter to a double-sided or a one-sided bounded parameter are described in Section 1.3.1 of the MINUIT manual: Fred James, Matthias Winkler, MINUIT User’s Guide, June 16, 2004. Since the transformation is non-linear, the transformation of the uncertainty is approximate and based on the error propagation formula. In particular, when the value is close to its limit, the uncertainty is not reliable and a more refined analysis should be performed.

Parameters :

val : float

parameter value within an unbound representation

unc : float

parameter uncertainty within an unbound representation
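A minimal sketch of the double-sided case, assuming the sine transformation given in the MINUIT manual cited above. The function names to_unbound and to_bound are illustrative, and this is not StatsPy's implementation.

```python
import math

def to_unbound(value, lower, upper):
    """Map a value in [lower, upper] to an unbounded internal value."""
    return math.asin(2.0 * (value - lower) / (upper - lower) - 1.0)

def to_bound(internal, lower, upper, unc=0.0):
    """Map an internal value back to [lower, upper], with a linearized uncertainty."""
    value = lower + 0.5 * (upper - lower) * (math.sin(internal) + 1.0)
    # d(value)/d(internal) = (upper - lower) / 2 * cos(internal)
    bound_unc = abs(0.5 * (upper - lower) * math.cos(internal)) * unc
    return value, bound_unc

internal = to_unbound(2.5, 0.0, 10.0)
value, bound_unc = to_bound(internal, 0.0, 10.0, unc=0.1)

# At a limit the derivative vanishes, so the propagated uncertainty collapses.
at_limit_value, at_limit_unc = to_bound(math.pi / 2.0, 0.0, 10.0, unc=1.0)
```

The last line illustrates the warning above: at the upper bound the linearized uncertainty is essentially zero whatever the internal uncertainty is.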

Probability Function

class statspy.core.PF(*args, **kwargs)[source]

Base class to define a Probability Function.

Probability Function is a generic name which includes both the probability mass function for discrete random variables and the probability density function for continuous random variables. The function itself is defined in self.func:

  • For a RAW PF, it is a pdf or a pmf of scipy.stats.
  • For a DERIVED PF, it is a list containing an operator and pointers to other PFs.

Examples

>>> import statspy as sp
>>> pmf_n = sp.PF("pmf_n=poisson(n;mu)",mu=10.)

Attributes

name : str

(optional) Function name

func : scipy.stats.distributions.rv_generic, list

Probability Density Function object.

params : statspy.core.Param list

List of shape parameters used to define the PF

norm : Param

Normalization parameter set to 1 by default. It can be different from 1 when the PF is fitted to data.

isuptodate : bool

Tells whether PF needs to be normalized or not

options : dict

Potential list of options

pftype : PF.RAW or PF.DERIVED

Tells whether a PF is a RAW PF or DERIVED from other PFs

logger : logging.Logger

Message logging system

Methods

__add__(other)[source]

Add two PFs.

The norm parameters are also summed.

Parameters :

self : PF

other : PF

Returns :

new : PF

new pf which is the sum of self and other

__call__(*args, **kwargs)[source]

Evaluate the Probability Function at x

Parameters :

args : float, ndarray, optional, multiple values for multivariate pfs

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Probability Function value(s) in x

__mul__(other)[source]

Multiply a PF by another PF.

Parameters :

self : PF

other : PF

Returns :

new : PF

new PF which is the product of self and other

__rmul__(scale)[source]

Scale PF normalization value

Parameters :

self : PF

scale : float

Returns :

self : PF

original PF with self.norm.value *= scale

__weakref__

list of weak references to the object (if defined)

cdf(*args, **kwargs)[source]

Compute the cumulative distribution function in x.

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Cumulative distribution function value(s) in x

convolve(other, **kw)[source]

Convolve two PFs, which is equivalent to adding two independent random variables.

The scipy.signal package is used to perform the convolution with different options set by mode.

Parameters :

self : PF

other : PF

kw : keyword arguments, dict

May be:

  • options : str

    Way to perform the convolution. If mode is equal to:

    • fft then scipy.signal.fftconvolve is used
    • num then scipy.signal.convolve is used
    • rvs then random variates are used to generate the PF

Returns :

new : PF

new PF which is the convolution of self and other

Notes

This method currently works only for one-dimensional pdfs/pmfs.
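As an illustration of why convolution corresponds to adding independent random variables, the sketch below convolves two Poisson pmfs by hand and compares the result with the Poisson pmf of the summed mean. It uses plain Python loops rather than scipy.signal, and the helper names are illustrative, not StatsPy functions.

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability of observing n for a mean mu."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

def convolve(p, q):
    """Discrete convolution of two sequences of probabilities."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

n_max = 40
pmf_a = [poisson_pmf(n, 2.0) for n in range(n_max)]
pmf_b = [poisson_pmf(n, 3.0) for n in range(n_max)]
pmf_sum = convolve(pmf_a, pmf_b)

# The sum of independent Poisson variables with means 2 and 3 is Poisson with mean 5.
largest_diff = max(abs(pmf_sum[n] - poisson_pmf(n, 5.0)) for n in range(10))
```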

corr()

Returns the correlation matrix of the free parameters.

Returns :

corr : ndarray

Correlation matrix

dF(x)[source]

Compute the uncertainty on PF given the uncertainty on the shape and norm parameters.

This method can be used to show an error band on your fitted PF. To compute the uncertainty on the PF, the error propagation formula is used:

dF(x;th) = (F(x;th+dth) - F(x;th-dth))/2
dF(x)^2 = dF(x;th)^T * corr(th,th') * dF(x;th')

so keep in mind it is only an approximation.

Parameters :

x : float, ndarray

Random variable(s) value(s)

Returns :

dF : float, ndarray

Uncertainty on the PF evaluated in x
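The two formulas above can be written out as follows. This is a sketch in plain Python for a normal pdf with two shape parameters; pf_uncertainty and norm_pdf are hypothetical helpers, and the correlation matrix is made up for the example.

```python
import math

def norm_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pf_uncertainty(func, x, values, uncs, corr):
    """Uncertainty on func(x, *values) from parameter uncertainties and correlations."""
    shifts = []
    for i, unc in enumerate(uncs):
        up = list(values)
        down = list(values)
        up[i] += unc
        down[i] -= unc
        # symmetric half-difference for parameter i
        shifts.append(0.5 * (func(x, *up) - func(x, *down)))
    n = len(shifts)
    var = sum(shifts[i] * corr[i][j] * shifts[j] for i in range(n) for j in range(n))
    return math.sqrt(max(var, 0.0))

corr = [[1.0, 0.3], [0.3, 1.0]]   # illustrative correlation matrix
band = pf_uncertainty(norm_pdf, 22.0, [20.0, 5.0], [0.5, 0.2], corr)
```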

get_list_free_params()[source]

Get the list of free parameters.

kurtosis(**kw)[source]

Estimate the kurtosis of the PF. See PF.mean() for the syntax.

leastsq_fit(xdata, ydata, ey=None, dx=None, cond=None, **kw)[source]

Fit the PF to data using a least squares method.

The fitting part is performed using the scipy.optimize.leastsq function, which uses the Levenberg-Marquardt algorithm to find the minimum. When calling this method, all PF parameters are free in the minimization except those set as ‘const’.

Parameters :

xdata : ndarray

Values for which ydata are measured and PF must be computed

ydata : ndarray

Observed values (like number of events)

ey : ndarray (optional)

Standard deviations of ydata. If not specified, it takes sqrt(ydata) as standard deviation.

dx : ndarray (optional)

Array containing bin-width of xdata. It can be used to normalize the PF to the integral while minimizing.

cond : boolean ndarray (optional)

Boolean array telling if a bin should be used in the fit or not

kw : keyword arguments

Keyword arguments passed to the leastsq method

Returns :

free_params : statspy.core.Param list

List of the free parameters used during the fit. Their ‘value’ and ‘unc’ attributes are extracted from the minimization.

pcov : 2d array

Estimated covariance matrix of the free parameters.

chi2min : float

Least-squares sum evaluated at the fitted parameter values (popt).

pvalue : float

p-value = P(chi2>chi2min,ndf) with P a chi2 distribution and ndf the number of degrees of freedom.
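The least-squares sum being minimized can be sketched as below, including the sqrt(ydata) default for ey and the cond selection described above. The function chi2_sum is illustrative only; the model values are taken as given rather than computed from a PF.

```python
import math

def chi2_sum(ydata, model, ey=None, cond=None):
    """Sum of squared, uncertainty-weighted residuals over the selected bins."""
    if ey is None:
        ey = [math.sqrt(y) for y in ydata]   # default: sqrt(ydata)
    if cond is None:
        cond = [True] * len(ydata)
    return sum(((y - m) / e) ** 2
               for y, m, e, keep in zip(ydata, model, ey, cond) if keep)

ydata = [4.0, 9.0, 16.0]
model = [6.0, 9.0, 12.0]
print(chi2_sum(ydata, model))                            # 2.0
print(chi2_sum(ydata, model, cond=[True, True, False]))  # 1.0
```

With the sqrt(ydata) default, bins with ydata equal to zero have a zero standard deviation, so in this sketch they would need to be masked with cond or given an explicit ey.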

loc(loc)[source]

Derive a PF from another PF via a location parameter.

Parameters :

self : PF

loc : Param, value

Returns :

new : PF

new PF with x -> x + loc

logpf(*args, **kwargs)[source]

Compute the logarithm of the PF in x.

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Logarithm of the Probability Function value(s) in x

maxlikelihood_fit(data, **kw)[source]

Fit the PF to data using the maximum likelihood estimator method.

The fitting part is performed using the scipy.optimize.minimize function. If the keyword argument method is not specified, the BFGS algorithm is used. All PF parameters are free in the minimization except those set as ‘const’ before calling the method.

Parameters :

data : ndarray, tuple

Data used in the computation of the (log-)likelihood function

kw : keyword arguments (optional)

Keyword arguments passed to the scipy.optimize.minimize method.

Returns :

free_params : statspy.core.Param list

List of the free parameters used during the fit. Their ‘value’ attributes are extracted from the minimization process. If the Hessian is provided by the algorithm, the ‘unc’ attribute is also updated.

nllfmin : float

Minimal value of the negative log-likelihood function

mean(**kw)[source]

Estimate the mean of the PF.

Warning

  • In the case of a RAW PF from scipy.stats, it calls the stats method and therefore returns the expected value.
  • In the case of a DERIVED PF, it returns an estimate derived from random variates and using the numpy.mean function.

Parameters :

kw : keyword arguments, optional

Shape parameters values

Returns :

mean : ndarray
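The difference between an expected value and an estimate from random variates can be seen in a short, self-contained sketch. It uses the standard library instead of StatsPy or numpy; the distributions, sample size and seed are arbitrary choices for the illustration.

```python
import random
import statistics

rng = random.Random(12345)   # fixed seed so the sketch is reproducible

# Sum of two independent normal variables: the exact mean is 20 + 5 = 25.
variates = [rng.gauss(20.0, 5.0) + rng.gauss(5.0, 1.0) for _ in range(20000)]
estimate = statistics.fmean(variates)

# The estimate fluctuates around 25 with a spread of about sqrt(26 / 20000).
print(estimate)
```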

nllf(data, **kw)[source]

Evaluate the negative log-likelihood function:

nllf = -sum_i(log(pf(x_i;params)))

The sum runs over the x-variates in the data array.

Parameters :

data : ndarray, tuple

x-variates used in the computation of the likelihood

kw : keyword arguments (optional)

Specify any Parameter name of the considered PF

Returns :

nllf : float

Negative log-likelihood function
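A sketch of the formula above for a normal pdf, written in plain Python. The functions norm_logpdf and nllf here are stand-ins defined for the example, not the PF methods, and the data values are made up.

```python
import math

def norm_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def nllf(data, mu, sigma):
    """Negative log-likelihood: minus the sum of log pdf values over the data."""
    return -sum(norm_logpdf(x, mu, sigma) for x in data)

data = [18.2, 21.5, 19.9, 24.1, 16.3]
mu_hat = sum(data) / len(data)   # for fixed sigma, the sample mean minimizes nllf

at_best = nllf(data, mu_hat, 3.0)
shifted = nllf(data, mu_hat + 0.5, 3.0)
```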

pllr(data, **kw)[source]

Evaluate the profile log-likelihood ratio ( * -2 )

The profile likelihood ratio is defined by:

l = L(x|theta_r,\hat{\hat{theta_s}}) / L(x|\hat{theta_r},\hat{theta_s})

The profile log-likelihood ratio is then:

q = -2 * log(l)

Where

  • L is the Likelihood function (computed via data)
  • theta_r is the list of parameters of interest (Param.poi = True)
  • theta_s is the list of nuisance parameters
  • hat or double hat refers to the unconditional and conditional maximum likelihood estimates of the parameters respectively.

pllr is used as a test statistic for problems with numerous nuisance parameters. Asymptotically, the pllr PF is described by a chi2 distribution (Wilks’ theorem). Further information on the likelihood ratio can be found in Chapter 22 of “Kendall’s Advanced Theory of Statistics, Volume 2A”.

Parameters :

data : ndarray, tuple

x-variates used in the computation of the likelihood

kw : keyword arguments (optional)

Specify any Parameter of interest name of the considered PF, or any option used by the method maxlikelihood_fit.

uncond_nllf : float

unconditional minimal negative log-likelihood function value

Returns :

pllr : float

Profile log-likelihood ratio times -2
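For a normal pdf with mu as the parameter of interest and sigma as the only nuisance parameter, both maximizations have closed forms, which makes the definition easy to check. The sketch below is written under that assumption; all function names are local to the example and the data are made up.

```python
import math

def nllf(data, mu, sigma):
    return sum(0.5 * ((x - mu) / sigma) ** 2
               + math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)

def sigma_mle(data, mu):
    """Maximum likelihood estimate of sigma for a fixed mu."""
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

def pllr(data, mu):
    """-2 log of the profile likelihood ratio for a tested value of mu."""
    mu_hat = sum(data) / len(data)
    uncond = nllf(data, mu_hat, sigma_mle(data, mu_hat))   # global minimum
    cond = nllf(data, mu, sigma_mle(data, mu))             # sigma profiled at fixed mu
    return 2.0 * (cond - uncond)

data = [18.2, 21.5, 19.9, 24.1, 16.3]
mu_hat = sum(data) / len(data)
q_at_best = pllr(data, mu_hat)          # zero up to rounding
q_shifted = pllr(data, mu_hat + 1.0)    # positive
```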

pvalue(*args, **kwargs)

Compute the p-value at xobs.

The p-value is the probability of observing a value at least as large as xobs, Pr(x >= xobs).

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

pvalue : float, ndarray

p-value(s) in x
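For a discrete variable the definition Pr(x >= xobs) includes the observed value itself, so it differs from 1 - cdf(xobs), which is Pr(x > xobs). A sketch for a Poisson pmf, with helper names that are local to the example:

```python
import math

def poisson_pmf(n, mu):
    return math.exp(-mu) * mu ** n / math.factorial(n)

def poisson_pvalue(n_obs, mu):
    """Pr(n >= n_obs) for a Poisson variable with mean mu."""
    return 1.0 - sum(poisson_pmf(n, mu) for n in range(n_obs))

def poisson_sf(n_obs, mu):
    """1 - cdf(n_obs), i.e. Pr(n > n_obs)."""
    return 1.0 - sum(poisson_pmf(n, mu) for n in range(n_obs + 1))

p_incl = poisson_pvalue(15, 10.0)
p_excl = poisson_sf(15, 10.0)
```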

rv_names()[source]

Return a list of RV names.

rvadd(other, **kw)[source]

Operation equivalent to adding two random variables.

rvdiv(other, **kw)[source]

Operation equivalent to dividing two random variables.

rvmul(other, **kw)[source]

Operation equivalent to multiplying two random variables.

rvs(**kwargs)[source]

Get random variates from a PF

Returns :

data : ndarray

Array of random variates

Examples

>>> import statspy as sp
>>> pdf_x = sp.PF("pdf_x=norm(x;mu=20,sigma=5)")
>>> data = pdf_x.rvs(size=1000)

rvsub(other, **kw)[source]

Operation equivalent to subtracting two random variables.

scale(scale)[source]

Derive a PF from another PF via a scale parameter.

Parameters :

self : PF

scale : Param, value

Returns :

new : PF

new PF with x -> x * scale

sf(*args, **kwargs)[source]

Compute the survival function (1 - cdf) in x.

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Survival function value(s) in x

skew(**kw)[source]

Estimate the skewness of the PF. See PF.mean() for the syntax.

std(**kw)[source]

Estimate the standard deviation of the PF. See PF.mean() for the syntax.

var(**kw)[source]

Estimate the variance of the PF. See PF.mean() for the syntax.

Random Variable

class statspy.core.RV(*args, **kwargs)[source]

Base class to define a Random Variable.

Examples

>>> import statspy as sp 
>>> X = sp.RV("norm(x|mu=10,sigma=2)")

Attributes

name : str

Random Variable name

pf : statspy.core.PF

Probability Function object associated with a Random Variable

rvtype : RV.CONTINUOUS or RV.DISCRETE

Random Variable type

logger : logging.Logger

Message logging system

Methods

__add__(other)[source]

Add two random variables, or a random variable and a parameter.

The associated PF of the sum of the two random variables is a convolution of the two PFs. Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the sum of self and other

__call__(**kwargs)[source]

Get random variates

Returns :

data : ndarray

Array of random variates

Examples

>>> import statspy as sp
>>> X = sp.RV("norm(x;mu=20,sigma=5)")
>>> x = X(size=1000)

__div__(other)[source]

Divide two random variables or a random variable by a parameter.

Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the ratio of self and other

__mul__(other)[source]

Multiply two random variables or a random variable by a parameter.

Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the product of self and other

__radd__(other)[source]

Add a random variable and a parameter.

Parameters :

self : RV

other : Param, value

Returns :

new : RV

new RV which is the sum of self and other

__rmul__(other)[source]

Multiply a parameter by a random variable.

Parameters :

self : RV

other : Param, value

Returns :

new : RV

new RV which is the product of self and other

__sub__(other)[source]

Subtract two random variables, or a parameter from a random variable.

Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the difference of self and other

__weakref__

list of weak references to the object (if defined)

Interval estimation

module interval.py

This module contains functions to estimate confidence or credible intervals.

statspy.interval.pllr(pf, data, **kw)[source]

Compute confidence intervals using a profile likelihood ratio method.

Interval estimation is done in the following steps:

  • The (minus log-)likelihood is computed from pf and data via the PF.nllf method.

  • Best estimates \hat{theta_i} for each parameter theta_i are computed with the PF.maxlikelihood_fit method.

  • The confidence interval of theta_i around its best estimate is computed from a profile log-likelihood ratio function q defined as:

    l = L(x|theta_i,\hat{\hat{theta_s}}) / L(x|\hat{theta_i},\hat{theta_s})
    q(theta_i) = -2 * log(l)

    where L is the likelihood function and theta_s are the nuisance parameters.

  • q(theta_i) is assumed to follow a chi2 distribution (Wilks’ theorem). Bounds corresponding to a given confidence level (CL) are found by searching for the values at which q(theta_i) is equal to the chi2 quantile of CL:

    quantile = scipy.stats.chi2.ppf(cl, ndf)
    
Parameters :

pf : statspy.core.PF

Probability function used in the computation of the likelihood

data : ndarray, tuple

x-variates used in the computation of the likelihood

kw : keyword arguments (optional)

Possible keyword arguments are:

cl : float

Confidence level (0.6827 by default)

ndf : int

Number of degrees of freedom (1 by default)

root_finder : scipy.optimize function

Root finder algorithm (scipy.optimize.brentq by default)

Returns :

params : statspy.core.Param list

List of the parameters for which a confidence interval has been extracted, including updated ‘value’, ‘neg_unc’ and ‘pos_unc’ attributes.

corr : ndarray

Correlation matrix

quantile : float

Quantile used in the computation of bounds
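The steps described above can be sketched for the simplest case, a Poisson counting experiment with a single parameter mu and no nuisance parameter, where the likelihood ratio has a closed form. This is an illustration only: a plain bisection stands in for scipy.optimize.brentq, the chi2 quantile for one degree of freedom is obtained from the standard normal quantile, and all names are local to the example.

```python
import math
from statistics import NormalDist

def q(mu, n_obs):
    """-2 log likelihood ratio for a Poisson mean mu given n_obs counts (n_obs > 0)."""
    return 2.0 * (mu - n_obs + n_obs * math.log(n_obs / mu))

def find_root(func, lo, hi, tol=1e-10):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

cl = 0.6827
# chi2 quantile for ndf = 1: square of the two-sided standard normal quantile
quantile = NormalDist().inv_cdf(0.5 * (1.0 + cl)) ** 2

n_obs = 10   # the best estimate of mu is n_obs itself
lower = find_root(lambda mu: q(mu, n_obs) - quantile, 1e-6, n_obs)
upper = find_root(lambda mu: q(mu, n_obs) - quantile, n_obs, 10.0 * n_obs)
neg_unc, pos_unc = lower - n_obs, upper - n_obs
```

The resulting interval is asymmetric around the best estimate, which is why separate negative and positive uncertainties are reported.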

Hypothesis tests

module hypotest.py

This module contains functions to perform hypothesis tests.

class statspy.hypotest.Result[source]

Class to store results from a hypothesis test.

Among the variables stored in this class, there are:

  • the pvalue, which is defined as Prob(t >= tobs | H0) with t the test statistic. If this value is lower than the predefined type I error rate, then the null hypothesis is rejected.
  • the Zvalue, the standard score (or Z-score), which is the p-value expressed as a number of standard deviations.

Methods

__delattr__

x.__delitem__(y) <==> del x[y]

__setattr__

x.__setitem__(i, y) <==> x[i]=y

__weakref__

list of weak references to the object (if defined)

statspy.hypotest.pvalue_to_Zvalue(pvalue, mode='two-sided')[source]

Convert a p-value to a Z-value.

Definition:

math.sqrt(2.) * scipy.special.erfcinv(mode * pvalue)

mode is equal to 2 for a two-sided Z-value and to 1 for a one-sided Z-value.

Parameters :

pvalue : float

p-value defined as Prob(t >= tobs | H0)

mode : str

‘two-sided’ (default) or ‘one-sided’

Returns :

Zvalue : float

Z-value corresponding to the number of standard deviations

statspy.hypotest.Zvalue_to_pvalue(Zvalue, mode='two-sided')[source]

Convert a Z-value to a p-value.

Definition:

scipy.special.erfc(Zvalue / math.sqrt(2.)) / mode

mode is equal to 2 for a two-sided Z-value and to 1 for a one-sided Z-value.

Parameters :

Zvalue : float

Z-value corresponding to the number of standard deviations

mode : str

‘two-sided’ (default) or ‘one-sided’

Returns :

pvalue : float

p-value defined as Prob(t >= tobs | H0)
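The two definitions above are inverses of each other, which the sketch below checks with the standard library only. Here factor stands for the numerical value that the text calls mode (2 or 1); the mapping from the strings ‘two-sided’ and ‘one-sided’ is left to the library. The identity sqrt(2) * erfcinv(y) = -Phi^{-1}(y / 2), with Phi the standard normal CDF, is used in place of scipy.special.erfcinv.

```python
import math
from statistics import NormalDist

def z_to_pvalue(z, factor):
    """erfc(z / sqrt(2)) / factor, as in the definition above."""
    return math.erfc(z / math.sqrt(2.0)) / factor

def pvalue_to_z(pvalue, factor):
    """sqrt(2) * erfcinv(factor * pvalue), written with the normal quantile."""
    return -NormalDist().inv_cdf(0.5 * factor * pvalue)

p = z_to_pvalue(3.0, 2)
z_back = pvalue_to_z(p, 2)   # recovers 3.0 up to rounding
```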

statspy.hypotest.pllr(pf, data, **kw)[source]

Profile likelihood ratio test.

For the likelihood ratio test, the likelihood is maximized separately for the null and the alternative hypothesis. The word “profile” means that, in addition, the likelihood is maximized with respect to the nuisance parameters. The test statistic is then defined as:

l = L(x|theta_r,\hat{\hat{theta_s}}) / L(x|\hat{theta_r},\hat{theta_s})
q_obs = -2 * log(l)

and is distributed asymptotically as a chi2 distribution. q_obs can be used to compute a p-value = Pr(q >= q_obs).

Parameters :

pf : statspy.core.PF

Probability function used in the computation of the likelihood

data : ndarray, tuple

x-variates used in the computation of the likelihood

kw : keyword arguments (optional)

Returns :

result : statspy.hypotest.Result

All information about the test is stored in the Result class.
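When a single parameter of interest is tested, the asymptotic chi2 distribution has one degree of freedom and the p-value Pr(q >= q_obs) reduces to a complementary error function. A sketch under that assumption, with an illustrative function name:

```python
import math

def pvalue_from_q(q_obs):
    """Pr(q >= q_obs) for a chi2 distribution with one degree of freedom."""
    return math.erfc(math.sqrt(0.5 * q_obs))

print(pvalue_from_q(0.0))   # 1.0
p_one = pvalue_from_q(1.0)  # about 0.3173, the complement of 0.6827
```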

Mathematical functions

statspy.core.exp(x)

Compute the exponential of a Parameter.

Parameters :

x : Param

Input parameter

Returns :

y : Param

Exponential of x

Examples

>>> import statspy as sp
>>> x = sp.Param("x = 4 +- 1")
>>> y = sp.exp(x)

statspy.core.log(x)

Compute the logarithm of a Parameter.

Parameters :

x : Param

Input parameter

Returns :

y : Param

Logarithm of x

Examples

>>> import statspy as sp
>>> x = sp.Param("x = 4 +- 1")
>>> y = sp.log(x)

statspy.core.sqrt(x)

Compute the square root of a Parameter.

Parameters :

x : Param

Input parameter

Returns :

y : Param

Square root of x

Examples

>>> import statspy as sp
>>> x = sp.Param("x = 4 +- 1")
>>> y = sp.sqrt(x)
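For orientation, the values and uncertainties that first-order error propagation would give for x = 4 +- 1 are computed below in plain Python. It is an assumption that the three functions propagate the uncertainty this way; the numbers come from the sketch, not from StatsPy output.

```python
import math

x, dx = 4.0, 1.0

# value and first-order uncertainty |f'(x)| * dx for each function
exp_val, exp_unc = math.exp(x), math.exp(x) * dx
log_val, log_unc = math.log(x), dx / x
sqrt_val, sqrt_unc = math.sqrt(x), dx / (2.0 * math.sqrt(x))

print(sqrt_val, sqrt_unc)   # 2.0 0.25
print(log_unc)              # 0.25
```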

Miscellaneous functions

statspy.core.get_obj(obj_name)[source]

Returns a Param, PF or RV object.

Looks in the different dictionaries for an object named obj_name and returns it if found.

Parameters :

obj_name : str

Object name used to define a Param, PF or RV

Returns :

new : Param, PF, RV

Object if found in the dictionaries

Examples

>>> import statspy as sp
>>> mypmf = sp.PF('mypmf=poisson(n;lbda=5)')
>>> lbda = sp.get_obj('lbda')
>>> lbda.label = '\\lambda'
