StatsPy Reference Guide ¶

Parameter ¶

class statspy.core.Param(*args, **kwargs)[source]¶

Base class to define a PF shape parameter.

Two types of parameters can be built:

RAW parameters do not depend on any other parameter and have a value directly associated to them
DERIVED parameters are obtained from other parameters via an analytical formula.

Examples

Declarations:

>>> import statspy as sp 
>>> mu = sp.Param(name="mu",value=10.,label="\\mu")
>>> x = sp.Param("x = 10. +- 2.")
>>> x.value
10.0
>>> x.unc
2.0
>>> y = sp.Param("y = 5.", unc=1.)

Operations, building DERIVED parameters:

>>> x + y
x + y = 15.0 +- 2.2360679775
>>> z = x * y
>>> z.name = 'z'
>>> z
z = x * y = 50.0 +- 14.1421356237
>>> x**2
x ** 2 = 100.0 +- 40.0

Possible operations are +,-,*,/,**.
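The uncertainties printed in the examples above are consistent with first-order error propagation for uncorrelated parameters. The following is a minimal sketch in plain Python that reproduces those numbers; the helper names are hypothetical and not part of StatsPy, and the assumption that StatsPy applies exactly this rule is inferred from the printed values only.

```python
import math

# Hypothetical helpers (not part of StatsPy): first-order error
# propagation for two uncorrelated quantities a +- da and b +- db.

def unc_add(da, db):
    """Uncertainty on a + b (also valid for a - b)."""
    return math.hypot(da, db)

def unc_mul(a, da, b, db):
    """Uncertainty on a * b."""
    return math.hypot(b * da, a * db)

def unc_pow(a, da, n):
    """Uncertainty on a ** n for a constant exponent n."""
    return abs(n * a ** (n - 1)) * da

x, dx = 10.0, 2.0
y, dy = 5.0, 1.0
print(x + y, unc_add(dx, dy))        # sum
print(x * y, unc_mul(x, dx, y, dy))  # product
print(x ** 2, unc_pow(x, dx, 2))     # power
```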

Attributes

name	str	Parameter name
label	str	Parameter name for printing purposes
value	float	Current numerical value
unc	float	Parameter uncertainty (e.g. after minimization)
neg_unc	float	Negative parameter uncertainty (only for intervals)
pos_unc	float	Positive parameter uncertainty (only for intervals)
bounds	list	Defines, if necessary, the lower and upper bounds
formula	list (optional, only for DERIVED parameters)	List of operators and parameters used to parse an analytic function
strform	str (optional, only for DERIVED parameters)	Representation of the formula as a string
partype	Param.RAW or Param.DERIVED	Tells whether it is a RAW or a DERIVED parameter
const	bool	Tells whether a parameter is fixed during a minimization process. It is not a constant in the programming sense.
poi	bool	Tells whether a parameter is a parameter of interest in a hypothesis test.
isuptodate	bool	Tells whether value needs to be computed again or not
logger	logging.Logger	message logging system

Methods

__add__(other)[source]¶

Add a parameter to another parameter or a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the sum of self and other

__call__()[source]¶

Return the parameter value.

Returns :

self.value : float

Parameter value possibly recomputed from self.formula if self.isuptodate is False.

__div__(other)[source]¶

Divide a parameter by another parameter or by a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the ratio of self and other

__getattribute__(name)[source]¶: Overload __getattribute__ to update the value attribute from the formula for a DERIVED parameter.

__iadd__(other)[source]¶

In-place addition (+=)

Parameters :

self : Param

other : Param, int, long, float

Returns :

self : Param

self parameter modified by other

__imul__(other)[source]¶

In-place multiplication (*=)

Parameters :

self : Param

other : Param, int, long, float

Returns :

self : Param

self parameter modified by other

__isub__(other)[source]¶

In-place subtraction (-=)

Parameters :

self : Param

other : Param, int, long, float

Returns :

self : Param

self parameter modified by other

__mul__(other)[source]¶

Multiply a parameter by another parameter or by a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the product of self and other

__pow__(other)[source]¶

Raise a parameter to the power of another parameter or of a numerical value.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is self raised to the power of other

__radd__(other)[source]¶: Add a numerical value to a parameter

__rdiv__(other)[source]¶: Divide a numerical value by a parameter

__repr__()[source]¶: Return Parameter value and formula if DERIVED

__rmul__(other)[source]¶: Multiply a numerical value by a parameter

__rpow__(other)[source]¶: Raise a numerical value to the power of a parameter

__rsub__(other)[source]¶: Subtract a parameter from a numerical value

__setattr__(name, value)[source]¶: Overload __setattr__ to make sure that quantities based on this parameter will be updated when its value is modified.
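
The interplay between `__setattr__`, `__getattribute__` and `isuptodate` amounts to a dirty-flag cache: changing a RAW value marks dependent quantities as stale, and a stale value is recomputed on the next read. The sketch below illustrates that pattern only; the `LazyValue` class and its methods are hypothetical and do not reproduce the StatsPy implementation.

```python
class LazyValue:
    """Toy value with a dirty flag, loosely mirroring 'isuptodate'."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._value = value
        self._formula = formula      # callable for a derived value, else None
        self._inputs = list(inputs)
        self._dependents = []
        self.isuptodate = formula is None
        for inp in self._inputs:
            inp._dependents.append(self)

    def set(self, value):
        """Change a raw value and flag everything built on it as stale."""
        self._value = value
        self._invalidate()

    def _invalidate(self):
        for dep in self._dependents:
            dep.isuptodate = False
            dep._invalidate()

    def __call__(self):
        """Return the value, recomputing it only if it is stale."""
        if not self.isuptodate:
            self._value = self._formula(*[inp() for inp in self._inputs])
            self.isuptodate = True
        return self._value

x = LazyValue(10.0)
y = LazyValue(5.0)
z = LazyValue(formula=lambda a, b: a * b, inputs=(x, y))
print(z())   # 50.0
x.set(2.0)
print(z())   # 10.0
```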

__sub__(other)[source]¶

Subtract another parameter or a numerical value from a parameter.

Parameters :

self : Param

other : Param, int, long, float

Returns :

new : Param

new parameter which is the difference of self and other

__weakref__¶: list of weak references to the object (if defined)

get_raw_params()[source]¶: Get the list of RAW parameters from a DERIVED parameter

unbound_repr()[source]¶

Transform the value of the parameter so that it has no bounds.

This method is used with minimization algorithms that require unbound parameters. The transformation formulas from a double-sided or one-sided bounded parameter to an unbound parameter are described in Section 1.3.1 of the MINUIT manual: Fred James, Matthias Winkler, MINUIT User’s Guide, June 16, 2004.

Returns :

new : float

parameter value within an unbound representation.

unbound_to_bound(val, unc=0.0)[source]¶

Transform the parameter value from an unbound to a bounded representation.

This method is used with minimization algorithms that require unbound parameters. The transformation formulas from an unbound parameter to a double-sided or one-sided bounded parameter are described in Section 1.3.1 of the MINUIT manual: Fred James, Matthias Winkler, MINUIT User’s Guide, June 16, 2004. Since the transformation is non-linear, the transformation of the uncertainty is approximate and based on the error propagation formula. In particular, when the value is close to its limit, the uncertainty is not reliable and a more refined analysis should be performed.

Parameters :

val : float

parameter value within an unbound representation

unc : float

parameter uncertainty within an unbound representation
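
For a parameter with both a lower bound a and an upper bound b, the MINUIT manual gives the sine transformation sketched below. This is an illustration of those formulas, not the StatsPy code: the function names `to_unbounded` and `to_bounded` are hypothetical, and the uncertainty is propagated to first order as the description above suggests.

```python
import math

# Sketch of the MINUIT-style transformation for a parameter limited to
# [a, b]. Function names are hypothetical, not part of StatsPy.

def to_unbounded(val, a, b):
    """Map an external value in [a, b] to an internal, unbounded value."""
    return math.asin(2.0 * (val - a) / (b - a) - 1.0)

def to_bounded(val, unc, a, b):
    """Map an internal value and its uncertainty back to [a, b]."""
    ext = a + 0.5 * (b - a) * (math.sin(val) + 1.0)
    # First-order propagation: |d(ext)/d(int)| * unc
    ext_unc = 0.5 * (b - a) * abs(math.cos(val)) * unc
    return ext, ext_unc

internal = to_unbounded(2.5, 0.0, 10.0)
external, external_unc = to_bounded(internal, 0.1, 0.0, 10.0)
print(internal, external, external_unc)

# At the upper limit the cosine factor vanishes, so the propagated
# uncertainty collapses towards zero regardless of the internal one.
print(to_bounded(math.pi / 2.0, 0.1, 0.0, 10.0))
```

The second call shows why the uncertainty should not be trusted near a limit: the derivative of the transformation goes to zero there.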

Probability Function ¶

class statspy.core.PF(*args, **kwargs)[source]¶

Base class to define a Probability Function.

Probability Function is a generic name which includes both the probability mass function for discrete random variables and the probability density function for continuous random variables. The function itself is defined in self.func:

For a RAW PF, it is a pdf or a pmf of scipy.stats.
For a DERIVED PF, it is a list containing an operator and pointers to other PFs.

Examples

>>> import statspy as sp
>>> pmf_n = sp.PF("pmf_n=poisson(n;mu)",mu=10.)

Attributes

name	str (optional)	Function name
func	scipy.stats.distributions.rv_generic, list	Probability Density Function object.
params	statspy.core.Param list	List of shape parameters used to define the pf
norm	Param	Normalization parameter set to 1 by default. It can be different from 1 when the PF is fitted to data.
isuptodate	bool	Tells whether PF needs to be normalized or not
options	dict	Potential list of options
pftype	PF.RAW or PF.DERIVED	Tells whether a PF is a RAW PF or DERIVED from other PFs
logger	logging.Logger	message logging system

Methods

__add__(other)[source]¶

Add two PFs.

The norm parameters are also summed.

Parameters :

self : PF

other : PF

Returns :

new : PF

new pf which is the sum of self and other

__call__(*args, **kwargs)[source]¶

Evaluate Probability Function in x

Parameters :

args : float, ndarray, optional, multiple values for multivariate pfs

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Probability Function value(s) in x

__mul__(other)[source]¶

Multiply a PF by another PF.

Parameters :

self : PF

other : PF

Returns :

new : PF

new PF which is the product of self and other

__rmul__(scale)[source]¶

Scale PF normalization value

Parameters :

self : PF

scale : float

Returns :

self : PF

original PF with self.norm.value *= scale

__weakref__¶: list of weak references to the object (if defined)

cdf(*args, **kwargs)[source]¶

Compute the cumulative distribution function in x.

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Cumulative distribution function value(s) in x

convolve(other, **kw)[source]¶

Convolve two PFs, which is equivalent to adding two independent random variables.

The scipy.signal package is used to perform the convolution with different options set by mode.

Parameters :

self : PF

other : PF

kw : keyword arguments, dict

May be:

options : str

Way to perform the convolution. If mode is equal to:

fft then scipy.signal.fftconvolve is used

num then scipy.signal.convolve is used

rvs then random variates are used to generate the PF

Returns :

new : PF

new PF which is the convolution of self and other

Notes

This method currently works only for one-dimensional pdfs/pmfs.
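
To make the link between convolution and the sum of two random variables concrete, here is a minimal sketch of a direct discrete convolution of two pmfs, written by hand rather than with scipy.signal. The helper names are hypothetical; the check relies on the known result that the sum of two independent Poisson variables is Poisson distributed with the summed mean.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability mass function."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def convolve_pmf(p, q):
    """Discrete convolution: pmf of X + Y for independent X and Y,
    where p[i] = Pr(X = i) and q[j] = Pr(Y = j)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

kmax = 40
p = [poisson_pmf(k, 2.0) for k in range(kmax)]
q = [poisson_pmf(k, 3.0) for k in range(kmax)]
pq = convolve_pmf(p, q)
# The sum of two independent Poisson variables is Poisson with mean 2 + 3.
print(pq[4], poisson_pmf(4, 5.0))
```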

corr()¶

Returns the correlation matrix of the free parameters.

Returns :

corr : ndarray

Correlation matrix

dF(x)[source]¶

Compute the uncertainty on PF given the uncertainty on the shape and norm parameters.

This method can be used to show an error band on your fitted PF. To compute the uncertainty on the PF, the error propagation formula is used:

dF(x;th) = (F(x;th+dth) - F(x;th-dth))/2
dF(x)^2 = dF(x;th)^T * corr(th,th') * dF(x;th')

so keep in mind it is only an approximation.
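
The two formulas above can be sketched in plain Python as follows. The function `pf_uncertainty` and its argument layout are hypothetical (they are not the StatsPy signature); the toy model is a straight line so that the result can be checked by hand.

```python
import math

def pf_uncertainty(F, x, theta, dtheta, corr):
    """Error band on F at x from parameter uncertainties and correlations.

    F      : callable F(x, theta) with theta a list of parameter values
    theta  : best-fit parameter values
    dtheta : parameter uncertainties
    corr   : correlation matrix as a list of lists
    """
    half_diffs = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += dtheta[i]
        down[i] -= dtheta[i]
        # dF(x;th_i) = (F(x;th_i+dth_i) - F(x;th_i-dth_i)) / 2
        half_diffs.append(0.5 * (F(x, up) - F(x, down)))
    var = 0.0
    for i, di in enumerate(half_diffs):
        for j, dj in enumerate(half_diffs):
            var += di * corr[i][j] * dj
    return math.sqrt(var)

# Toy model: a straight line with uncorrelated slope and intercept.
line = lambda x, th: th[0] * x + th[1]
band = pf_uncertainty(line, 2.0, [1.0, 0.5], [0.1, 0.2], [[1.0, 0.0], [0.0, 1.0]])
print(band)
```

With uncorrelated parameters the quadratic form reduces to a sum in quadrature; a positive correlation between slope and intercept would widen the band.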

Parameters :

x : float, ndarray

Random variable(s) value(s)

Returns :

dF : float, ndarray

Uncertainty on the PF evaluated in x

get_list_free_params()[source]¶: Get the list of free parameters.

kurtosis(**kw)[source]¶: Estimate the kurtosis of the PF. See PF.mean() for the syntax.

leastsq_fit(xdata, ydata, ey=None, dx=None, cond=None, **kw)[source]¶

Fit the PF to data using a least squares method.

The fitting is performed with the scipy.optimize.leastsq function, which uses the Levenberg-Marquardt algorithm to find the minimum. When calling this method, all PF parameters are fitted except those set as ‘const’.

Parameters :

xdata : ndarray

Values for which ydata are measured and PF must be computed

ydata : ndarray

Observed values (like number of events)

ey : ndarray (optional)

Standard deviations of ydata. If not specified, it takes sqrt(ydata) as standard deviation.

dx : ndarray (optional)

Array containing bin-width of xdata. It can be used to normalize the PF to the integral while minimizing.

cond : boolean ndarray (optional)

Boolean array telling if a bin should be used in the fit or not

kw : keyword arguments

Keyword arguments passed to the leastsq method

Returns :

free_params : statspy.core.Param list

List of the free parameters used during the fit. Their ‘value’ and ‘unc’ arguments are extracted from minimization.

pcov : 2d array

Estimated covariance matrix of the free parameters.

chi2min : float

Least square sum evaluated in popt.

pvalue : float

p-value = P(chi2>chi2min,ndf) with P a chi2 distribution and ndf the number of degrees of freedom.
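
As an illustration of the quantity being minimized, the sketch below evaluates a least-squares sum with the default ey = sqrt(ydata) and an optional cond mask. It assumes the usual form sum(((ydata - model(x)) / ey)^2); the function `chi2_sum` and the flat toy model are hypothetical, and the dx normalization is left out.

```python
import math

def chi2_sum(model, xdata, ydata, ey=None, cond=None):
    """Least-squares sum of ((ydata - model(x)) / ey)^2 over the selected bins."""
    if ey is None:
        # Default stated above: sqrt(ydata) as standard deviation
        ey = [math.sqrt(y) for y in ydata]
    if cond is None:
        cond = [True] * len(xdata)
    total = 0.0
    for x, y, e, use in zip(xdata, ydata, ey, cond):
        if use:
            total += ((y - model(x)) / e) ** 2
    return total

xdata = [0.5, 1.5, 2.5]
ydata = [4.0, 9.0, 16.0]
flat = lambda x: 9.0  # toy model predicting 9 events in every bin
print(chi2_sum(flat, xdata, ydata))
print(chi2_sum(flat, xdata, ydata, cond=[True, True, False]))
```

In this sketch a bin with ydata equal to zero would need an explicit ey or a cond mask, since the default standard deviation would vanish.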

loc(loc)[source]¶

Derive a PF from another PF via a location parameter.

Parameters :

self : PF

loc : Param, value

Returns :

new : PF

new PF with x -> x + loc

logpf(*args, **kwargs)[source]¶

Compute the logarithm of the PF in x.

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Logarithm of the Probability Function value(s) in x

maxlikelihood_fit(data, **kw)[source]¶

Fit the PF to data using the maximum likelihood estimator method.

The fitting is performed with the scipy.optimize.minimize function. If the keyword argument method is not specified, the BFGS algorithm is used. All PF parameters are fitted except those set as ‘const’ before calling the method.

Parameters :

data : ndarray, tuple

Data used in the computation of the (log-)likelihood function

kw : keyword arguments (optional)

Keyword arguments passed to the scipy.optimize.minimize method.

Returns :

free_params : statspy.core.Param list

List of the free parameters used during the fit. Their ‘value’ arguments are extracted from the minimization process. If the Hessian is provided by the algorithm, the ‘unc’ argument is also updated.

nllfmin : float

Minimal value of the negative log-likelihood function

mean(**kw)[source]¶

Estimate the mean of the PF.

Warning

In the case of a RAW PF from scipy.stats, it calls the stats method and therefore returns the expected value.
In the case of a DERIVED PF, it returns an estimate derived from random variates and using the numpy.mean function.

Parameters :

kw : keyword arguments, optional

Shape parameters values

Returns :

mean : ndarray

nllf(data, **kw)[source]¶

Evaluate the negative log-likelihood function:

nllf = -sum_i(log(pf(x_i;params)))

sum runs over the x-variates defined in data array.

Parameters :

data : ndarray, tuple

x - variates used in the computation of the likelihood

kw : keyword arguments (optional)

Specify any Parameter name of the considered PF

Returns :

nllf : float

Negative log-likelihood function
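
The definition above can be written out directly. The sketch below uses a hand-written normal density as a stand-in for a PF; the function names are hypothetical and the data are made up for illustration.

```python
import math

def norm_pdf(x, mu, sigma):
    """Normal probability density, used here as a stand-in for a PF."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def nllf(data, mu, sigma):
    """Negative log-likelihood: -sum_i log(pf(x_i; mu, sigma))."""
    return -sum(math.log(norm_pdf(x, mu, sigma)) for x in data)

data = [19.0, 20.5, 21.0, 18.5, 22.0]
print(nllf(data, mu=20.0, sigma=5.0))
print(nllf(data, mu=25.0, sigma=5.0))  # a worse mean gives a larger value
```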

pllr(data, **kw)[source]¶

Evaluate the profile log-likelihood ratio ( * -2 )

The profile likelihood ratio is defined by:

l = L(x|theta_r,\hat{\hat{theta_s}}) / L(x|\hat{theta_r},\hat{theta_s})

The profile log-likelihood ratio is then:

q = -2 * log(l)

Where

L is the Likelihood function (computed via data)
theta_r is the list of parameters of interest (Param.poi = True)
theta_s is the list of nuisance parameters
hat or double hat refers to the unconditional and conditional maximum likelihood estimates of the parameters respectively.

pllr is used as a test statistic for problems with numerous nuisance parameters. Asymptotically, the pllr PF is described by a chi2 distribution (Wilks’ theorem). Further information on the likelihood ratio can be found in Chapter 22 of “Kendall’s Advanced Theory of Statistics, Volume 2A”.

Parameters :

data : ndarray, tuple

x - variates used in the computation of the likelihood

kw : keyword arguments (optional)

Specify any Parameter of interest name of the considered PF, or any option used by the method maxlikelihood_fit.

uncond_nllf : float

unconditional minimal negative log-likelihood function value

Returns :

pllr : float

Profile log-likelihood ratio times -2
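
For a normal sample the conditional and unconditional estimates have closed forms, which makes the definition easy to check by hand. The sketch below treats the mean as the parameter of interest and the variance as the nuisance parameter; the function names are hypothetical and no numerical minimizer is involved, unlike in the method itself.

```python
import math

def nll_normal(data, mu, sigma2):
    """Negative log-likelihood of a normal sample with variance sigma2."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return 0.5 * n * math.log(2.0 * math.pi * sigma2) + 0.5 * ss / sigma2

def pllr_normal_mean(data, mu):
    """q(mu) = -2 log(l) with mu the parameter of interest and the
    variance profiled out as a nuisance parameter."""
    n = len(data)
    mu_hat = sum(data) / n                              # unconditional estimate
    s2_hat = sum((x - mu_hat) ** 2 for x in data) / n   # unconditional estimate
    s2_hat_hat = sum((x - mu) ** 2 for x in data) / n   # conditional on mu
    return 2.0 * (nll_normal(data, mu, s2_hat_hat)
                  - nll_normal(data, mu_hat, s2_hat))

data = [19.0, 20.5, 21.0, 18.5, 22.0]
print(pllr_normal_mean(data, 20.2))  # at the best-fit mean
print(pllr_normal_mean(data, 22.0))  # away from it
```

In this special case q reduces to n * log(s2_hat_hat / s2_hat), so it vanishes at the best-fit mean and grows as mu moves away from it.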

pvalue(*args, **kwargs)¶

Compute the pvalue in xobs.

The p-value is the probability of observing a value at least as large as xobs, Pr(x >= xobs).

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

pvalue : float, ndarray

p-value(s) in x
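
For a discrete variable the tail probability includes the observed value itself. The sketch below shows this for a Poisson pmf in plain Python; the helper names are hypothetical and the sum is written out explicitly rather than delegated to a library survival function.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability mass function."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def poisson_pvalue(nobs, mu):
    """Pr(n >= nobs) = 1 - Pr(n <= nobs - 1) for a Poisson mean mu."""
    return 1.0 - sum(poisson_pmf(k, mu) for k in range(nobs))

print(poisson_pvalue(0, 10.0))   # every outcome satisfies n >= 0
print(poisson_pvalue(20, 10.0))  # a large observed count gives a small p-value
```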

rv_names()[source]¶: Return a list of RV names.

rvadd(other, **kw)[source]¶: Operation equivalent to adding two random variables.

rvdiv(other, **kw)[source]¶: Operation equivalent to dividing two random variables.

rvmul(other, **kw)[source]¶: Operation equivalent to multiplying two random variables.

rvs(**kwargs)[source]¶

Get random variates from a PF

Returns :

data : ndarray

Array of random variates

Examples

>>> import statspy as sp
>>> pdf_x = sp.PF("pdf_x=norm(x;mu=20,sigma=5)")
>>> data = pdf_x.rvs(size=1000)

rvsub(other, **kw)[source]¶: Operation equivalent to subtracting two random variables.

scale(scale)[source]¶

Derive a PF from another PF via a scale parameter.

Parameters :

self : PF

scale : Param, value

Returns :

new : PF

new PF with x -> x * scale

sf(*args, **kwargs)[source]¶

Compute the survival function (1 - cdf) in x.

Parameters :

args : ndarray, tuple

Random Variable(s) value(s)

kwargs : keyword arguments, optional

Shape parameters values

Returns :

value : float, ndarray

Survival function value(s) in x

skew(**kw)[source]¶: Estimate the skewness of the PF. See PF.mean() for the syntax.

std(**kw)[source]¶: Estimate the standard deviation of the PF. See PF.mean() for the syntax.

var(**kw)[source]¶: Estimate the variance of the PF. See PF.mean() for the syntax.

Random Variable ¶

class statspy.core.RV(*args, **kwargs)[source]¶

Base class to define a Random Variable.

Examples

>>> import statspy as sp 
>>> X = sp.RV("norm(x|mu=10,sigma=2)")

Attributes

name	str	Random Variable name
pf	statspy.core.PF	Probability Function object associated to a Random Variable
rvtype	RV.CONTINUOUS or RV.DISCRETE	Random Variable type
logger	logging.Logger	message logging system

__add__(other)[source]¶

Add two random variables, or a random variable and a parameter.

The associated PF of the sum of the two random variables is a convolution of the two PFs. Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the sum of self and other

__call__(**kwargs)[source]¶

Get random variates

Returns :

data : ndarray

Array of random variates

Examples

>>> import statspy as sp
>>> X = sp.RV("norm(x;mu=20,sigma=5)")
>>> x = X(size=1000)

__div__(other)[source]¶

Divide two random variables or a random variable by a parameter.

Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the ratio of self and other

__mul__(other)[source]¶

Multiply two random variables or a random variable by a parameter.

Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the product of self and other

__radd__(other)[source]¶

Add a random variable and a parameter.

Parameters :

self : RV

other : Param, value

Returns :

new : RV

new RV which is the sum of self and other

__rmul__(other)[source]¶

Multiply a parameter by a random variable.

Parameters :

self : RV

other : Param, value

Returns :

new : RV

new RV which is the product of self and other

__sub__(other)[source]¶

Subtract one random variable from another, or a parameter from a random variable.

Caveat: this method assumes independent random variables.

Parameters :

self : RV

other : RV, Param, value

Returns :

new : RV

new RV which is the difference of self and other

__weakref__¶: list of weak references to the object (if defined)

Intervals estimation ¶

module interval.py

This module contains functions to estimate confidence or credible intervals.

statspy.interval.pllr(pf, data, **kw)[source]¶

Compute confidence intervals using a profile likelihood ratio method.

Interval estimation is done in the following steps:

The (minus log-)likelihood is computed from pf and data via the PF.nllf method.
Best estimates hat{theta_i} for each parameter theta_i are computed with the PF.maxlikelihood_fit method.
The confidence interval of theta_i around its best estimate is computed from a profile log-likelihood ratio function q defined as:
```
l = L(x|theta_i,\hat{\hat{theta_s}}) / L(x|\hat{theta_i},\hat{theta_s})
q(theta_i) = -2 * log(l)
```
where L is the likelihood function and theta_s are the nuisance parameters.
q(theta_i) is assumed to follow a chi2 distribution (Wilks’ theorem). Bounds corresponding to a given confidence level (CL) are found by searching for the values at which q(theta_i) is equal to the chi2 quantile of CL:
```
quantile = scipy.stats.chi2.ppf(cl, ndf)
```
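
The last step can be sketched for a case where q is known analytically: a normal mean with known sigma, for which q(mu) is a parabola. The code below is restricted to one degree of freedom so that it needs only the standard library, and it uses plain bisection as a stand-in for a root finder such as brentq; all function names are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_1dof(cl):
    """chi2 quantile for one degree of freedom, i.e. the value that
    scipy.stats.chi2.ppf(cl, 1) would return."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + cl))
    return z * z

def bisect_root(f, lo, hi, tol=1e-10):
    """Plain bisection, standing in for a root finder such as brentq."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy case: normal mean with known sigma, where q(mu) is a parabola.
n, xbar, sigma = 25, 20.0, 5.0
q = lambda mu: n * (xbar - mu) ** 2 / sigma ** 2
quantile = chi2_quantile_1dof(0.6827)
upper = bisect_root(lambda mu: q(mu) - quantile, xbar, xbar + 10.0)
lower = bisect_root(lambda mu: q(mu) - quantile, xbar - 10.0, xbar)
print(quantile, lower, upper)
```

With CL = 0.6827 the quantile is close to 1, so the bounds land near xbar plus or minus sigma / sqrt(n), the familiar one standard deviation interval.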

Parameters :

pf : statspy.core.PF

Probability function used in the computation of the likelihood

data : ndarray, tuple

x - variates used in the computation of the likelihood

kw : keyword arguments (optional)

Possible keyword arguments are:

cl : float

Confidence level (0.6827 by default)

ndf : int

Number of degrees of freedom (1 by default)

root_finder : scipy.optimize function

Root finder algorithm (scipy.optimize.brentq by default)

Returns :

params : statspy.core.Param list

List of the parameters for which a confidence interval has been extracted including updated ‘value’, ‘neg_unc’ and ‘pos_unc’ arguments.

corr : ndarray

Correlation matrix

quantile : float

Quantile used in the computation of bounds

Hypothesis tests ¶

module hypotest.py

This module contains functions to perform hypothesis tests.

class statspy.hypotest.Result[source]¶

Class to store results from a hypothesis test.

Among the variables stored in this class, there are:

the pvalue, which is defined as Prob(t >= tobs | H0) with t the test statistic. If this value is lower than the predefined type I error rate, then the null hypothesis is rejected.
the Zvalue, the standard score (or Z-score), which is the p-value expressed as a number of standard deviations.

Methods

__delattr__¶: x.__delitem__(y) <==> del x[y]

__setattr__¶: x.__setitem__(i, y) <==> x[i]=y

__weakref__¶: list of weak references to the object (if defined)

statspy.hypotest.pvalue_to_Zvalue(pvalue, mode='two-sided')[source]¶

Convert a p-value to a Z-value.

Definition:

math.sqrt(2.) * scipy.special.erfcinv(mode * pvalue)

mode is equal to 2 for a two-sided Z-value and to 1 for a one-sided Z-value.

Parameters :

pvalue : float

p-value defined as Prob(t >= tobs | H0)

mode : str

‘two-sided’ (default) or ‘one-sided’

Returns :

Zvalue : float

Z-value corresponding to the number of standard deviations

statspy.hypotest.Zvalue_to_pvalue(Zvalue, mode='two-sided')[source]¶

Convert a Z-value to a p-value.

Definition:

scipy.special.erfc(Zvalue / math.sqrt(2.)) / mode

mode is equal to 2 for a two-sided Z-value and to 1 for a one-sided Z-value.

Parameters :

Zvalue : float

Z-value corresponding to the number of standard deviations

mode : str

‘two-sided’ (default) or ‘one-sided’

Returns :

pvalue : float

p-value defined as Prob(t >= tobs | H0)
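
The two definitions are inverses of each other for a given numeric value of mode. The sketch below checks this round trip using only the standard library: the inverse complementary error function is expressed through the normal quantile, and `factor` stands for the numeric value of mode (1 or 2). The function names are hypothetical and are not the StatsPy functions.

```python
import math
from statistics import NormalDist

def p_to_z(pvalue, factor):
    """sqrt(2) * erfcinv(factor * pvalue), written with the normal quantile:
    erfcinv(y) = -Phi^{-1}(y / 2) / sqrt(2)."""
    return -NormalDist().inv_cdf(0.5 * factor * pvalue)

def z_to_p(zvalue, factor):
    """erfc(Z / sqrt(2)) / factor."""
    return math.erfc(zvalue / math.sqrt(2.0)) / factor

for factor in (1, 2):
    z = p_to_z(0.05, factor)
    print(factor, z, z_to_p(z, factor))  # the round trip recovers 0.05
```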

statspy.hypotest.pllr(pf, data, **kw)[source]¶

Profile likelihood ratio test.

For the likelihood ratio test, the likelihood is maximized separately for the null and the alternative hypotheses. The word “profile” means that, in addition, the likelihood is maximized with respect to the nuisance parameters. The test statistic is then defined as:

l = L(x|theta_r,\hat{\hat{theta_s}}) / L(x|\hat{theta_r},\hat{theta_s})
q_obs = -2 * log(l)

and is distributed asymptotically as a chi2 distribution. q_obs can be used to compute a p-value = Pr(q >= q_obs).

Parameters :

pf : statspy.core.PF

Probability function used in the computation of the likelihood

data : ndarray, tuple

x - variates used in the computation of the likelihood

kw : keyword arguments (optional)

Returns :

result : statspy.hypotest.Result

All information about the test is stored in the Result class.
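
As a sketch of the last step, the asymptotic p-value Pr(q >= q_obs) has a closed form when q follows a chi2 distribution with one degree of freedom, which is the case assumed here for a single parameter of interest. The function name is hypothetical and the values of q are chosen for illustration.

```python
import math

def chi2_sf_1dof(q):
    """Pr(chi2 >= q) for one degree of freedom.

    A chi2 variable with one degree of freedom is the square of a standard
    normal variable, so the tail probability is erfc(sqrt(q / 2))."""
    return math.erfc(math.sqrt(0.5 * q))

q_obs = 3.841458820694124  # 95% quantile of a chi2 with one degree of freedom
print(chi2_sf_1dof(q_obs))
print(chi2_sf_1dof(9.0))   # q = 9 is the square of a 3 standard deviation shift
```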

Mathematical functions ¶

statspy.core.exp(x)¶

Compute the exponential of a Parameter.

Parameters :

x : Param

Input parameter

Returns :

y : Param

Exponential of x

Examples

>>> import statspy as sp
>>> x = sp.Param("x = 4 +- 1")
>>> y = sp.exp(x)

statspy.core.log(x)¶

Compute the logarithm of a Parameter.

Parameters :

x : Param

Input parameter

Returns :

y : Param

Logarithm of x

Examples

>>> import statspy as sp
>>> x = sp.Param("x = 4 +- 1")
>>> y = sp.log(x)

statspy.core.sqrt(x)¶

Compute the square root of a Parameter.

Parameters :

x : Param

Input parameter

Returns :

y : Param

Square root of x

Examples

>>> import statspy as sp
>>> x = sp.Param("x = 4 +- 1")
>>> y = sp.sqrt(x)
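
If these functions propagate the uncertainty to first order, as the x**2 example in the Parameter section suggests, the result for x = 4 +- 1 can be anticipated by hand. The sketch below is an assumption about that behavior rather than a description of the StatsPy code, and `propagate` is a hypothetical helper.

```python
import math

def propagate(f, dfdx, value, unc):
    """Return f(value) and the first-order uncertainty |f'(value)| * unc."""
    return f(value), abs(dfdx(value)) * unc

x, dx = 4.0, 1.0
print(propagate(math.exp, math.exp, x, dx))                       # exp(x)
print(propagate(math.log, lambda v: 1.0 / v, x, dx))              # log(x)
print(propagate(math.sqrt, lambda v: 0.5 / math.sqrt(v), x, dx))  # sqrt(x)
```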

Miscellaneous functions ¶

statspy.core.get_obj(obj_name)[source]¶

Returns a Param, PF or RV object.

Look in the different dictionaries for an object named obj_name and return it if it exists.

Parameters :

obj_name : str

Object name used to define a Param, PF or RV

Returns :

new : Param, PF, RV

Object if found in the dictionaries

Examples

>>> import statspy as sp
>>> mypmf = sp.PF('mypmf=poisson(n;lbda=5)')
>>> lbda = sp.get_obj('lbda')
>>> lbda.label = '\\lambda'
