PyRoot vs. SciPy Fitting in Python
PyRoot, SciPy, Fitting, Python, CERN ROOT,
CERN ROOT is a powerful object oriented program in C++ data analysis framework with statistical and fitting methods used by many physicists in fundamental research. It developed and is maintained by the European Organization for Nuclear Research, known as CERN. The CERN ROOT analysis framework was used the analysis and figures presented in the discovery of the Higgs Boson.
I used ROOT in my own graduate research in astroparticle physics given the amount of particle physicist in my community. While I started my graduate research using ROOT in C/C++, I found Python to be a powerful interpreter and transitioned away from C/C++ in my analysis. Luckily, ROOT can now be imported into Python as PyRoot. Python also has the module SciPy (Scientific Python) with fitting and statistical methods for scientific computing. This is an example of how to use PyRoot and comparing it to SciPy by fitting a Gaussian distribution. A link to the raw codes and outputs in this example is in a Jupyter python notebook on my Github profile.
- Import SciPy and PyRoot
- Creating a SciPy Histogram
- Creating the histogram in PyRoot
- Fit the Histogram with PyRoot
- Plot Fitted Histogram in PyRoot
- SciPy Fit
- PyRoot vs SciPy
- Resources
Reviewing the Python syntax in the Jupyter notebook linked above.
Import SciPy and PyRoot
Import the desired python modules, specifically SciPy and PyRoot.
import numpy as np
import scipy as sp
from scipy.optimize import curve_fit
from scipy.stats import chi2
from ROOT import TCanvas, TGraph, TH1F, TF1, gStyle, TH1D
The versions I used in my example are:
Scipy v1.1.0 Numpy v1.15.0
ROOT Version: 6.18/04
Creating a SciPy Histogram
Generate a test distribution using 500 random samples from a gaussian distribution using scipy.random.normal()
with μ=1, σ=.1. The histogram was binned using plt.hist()
.
# Define Python Histogram #
mean, std = 1 , 0.1
data = sp.random.normal(mean,std,500)
# Define Bin Size #
xmin = np.floor(10.*data.min())/10.
xmax = np.ceil(10.*data.max())/10.
nbins = int((xmax-xmin)*100)
# print(xmin, xmax, nbins)
# Create Python Histogram #
hist, bin_edges, patches = plt.hist(data,nbins,(xmin,xmax),color='g',alpha=0.6)
bin_centers = (bin_edges[1:]+bin_edges[:-1])/2.
# Find Non-zero bins in Histogram
nz = hist>0
first_nz = bin_centers[nz][ 0] - 0.005
last_nz = bin_centers[nz][-1] + 0.005
Creating the histogram in PyRoot
Define the PyRoot histogram using TH1D()
. Then using the distribution sampled above and load it the histogram h
with SetContent()
.
root_hist = np.zeros(nbins+2,dtype=float)
root_hist[1:-1] = hist
h = TH1D('h','hist',nbins,bin_edges)
h.SetContent(root_hist)
h.SetTitle("Root Histogram with Fit;X Axis;Y Axis [Counts]")
Fit the Histogram with PyRoot
Fit the PyRoot histogram with Fit()
using the ROOT predefined gaus
function over the range xmin
to xmax
. The retrieve the fit function with GetFunction()
, retrieve the fit function f
using GetParameter()
, the fit function parameter error using GetParError()
, and the fit statistics with GetNDF()
, GetChisquared()
, and GetProb()
.
# Fit histogram with root #
h.Fit('gaus','','',xmin,xmax)
# Get Root Fit and Goodness of Fit Parameters #
f = h.GetFunction('gaus')
const,mu,sigma = f.GetParameter(0), f.GetParameter(1), f.GetParameter(2)
econst,emu,esigma = f.GetParError(0), f.GetParError(1), f.GetParError(2)
ndf,chi2,prob = f.GetNDF(),f.GetChisquare(),f.GetProb()
print(chi2, ndf)
print(chi2/ndf,prob)
which outputs the fit statistics
49.08659294755991 48
1.0226373530741648 0.4293397390693532
FCN=49.0866 FROM MIGRAD STATUS=CONVERGED 68 CALLS 69 TOTAL
EDM=8.44808e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 1.82687e+01 1.13856e+00 2.97321e-03 2.68351e-05
2 Mean 1.00457e+00 5.02995e-03 1.73624e-05 -8.12271e-02
3 Sigma 1.00541e-01 4.61612e-03 4.01582e-05 5.02345e-03
Plot Fitted Histogram in PyRoot
Plotting the PyRoot canvas and histogram and save the canvas as a pdf.
# Draw Fit and Histogram#
h.Draw()
c1.Draw()
c1.SaveAs('root_fit.png')
PyRoot fit in red to the gaussian sampled distribution. The fit statistics from ROOT are in the upper right.
SciPy Fit
Fit the distribution with SciPy using curve_fit()
with the below defined gaus()
.
#SciPy Fit #
# Gaussian Fit Function :
def gaus(x, const, mu, sigma):
return const* np.exp(-0.5*((x - mu)/sigma)**2)
# Define Range and Fit :
coeff, covar = curve_fit(gaus, bin_centers[nz], hist[nz])
PyRoot vs SciPy
Comparing PyRoot to SciPy and plot both using the retrieved gaussian fit parameters. Calculate the Χ2 value and fit probability manually to confirm the fit statistics outputs of PyRoot.
# Define Fit Curves #
x = bin_centers
root_gaus = (const,mu,sigma)
opti_gaus = coeff
# Calculate chi Squared
f_root = gaus(x,*root_gaus)
f_opti = gaus(x,*opti_gaus)
ch2_root = np.sum( (hist[nz]-f_root[nz])**2/hist[nz])
ch2_opti = np.sum( (hist[nz]-f_opti[nz])**2/hist[nz])
# Calculate Degrees of Freedom with 3 fit parameters
dof = len(x[nz])-3
# Calculate Probablity #
p_root = sp.stats.chi2.sf(ch2_root, dof)
p_opti = sp.stats.chi2.sf(ch2_opti, dof)
print('Hist\tCalc Package\tChi Sq\t Chi2/dof\tprob')
print('Root\tRoot\t{0}\t{1}\t{2}'.format(chi2, chi2/dof, prob))
print('Root\tPython\t{0}\t{1}\t{2}'.format(ch2_root, ch2_root/dof, p_root))
print('SciPy\tPython\t{0}\t{1}\t{2}'.format(ch2_opti, ch2_opti/dof, p_opti))
print('Degrees of Freedom =\t{0}'.format(str(dof)))
which outputs
Hist Calc Package Chi Sq Chi2/dof prob
Root Root 49.08659294755991 1.0226373530741648 0.4293397390693532
Root Python 49.08659294755992 1.022637353074165 0.42933973906935374
Scipy Python 54.44279541875839 1.1342249045574664 0.24270327085696833
Degrees of Freedom = 48
PyRoot achieves a better Χ2/ndf and thus a better fit. I hypothesis difference in fits stats from the same data is due the minimization methods of the two modules. SciPy has more examples online with how to use but this example will hopefully allow you to use PyRoot
instead as it is more robust.
SciPy in red and PyRoot in black.
Resources
Back to Top