Data points, sets and files
An instance of DataPoint represents one kinematic point. It can correspond to a single experimental measurement, and most Gepard functions that calculate observables or various form factors accept a DataPoint as an argument. So, when you want to tabulate or plot something (such as a CFF or a cross-section) as a continuous function of some variable, you will have to create a “continuum” of DataPoints.
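For example, a scan over the azimuthal angle at fixed kinematics can be built as a simple list of points (the kinematic values below are purely illustrative):

>>> import gepard as g
>>> import numpy as np
>>> phis = np.linspace(0., 2*np.pi, 5)   # five azimuthal angles from 0 to 2*pi
>>> pts = [g.DataPoint(xB=0.1, t=-0.2, Q2=4.0, phi=phi) for phi in phis]
>>> len(pts)
5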
Several DataPoint objects can be collected in a special DataSet object. This is not necessary, but it is convenient, and all experimental datasets that ship with Gepard are DataSet objects with a unique ID number.
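If you want to group your own points in the same way, a DataSet can presumably be built from such a list. The sketch below assumes that the DataSet constructor accepts the points via a datapoints argument; check the gepard.data module for the exact signature:

>>> pts = [g.DataPoint(xB=0.1, t=-0.2, Q2=4.0, phi=phi) for phi in (0.5, 1.0, 1.5)]
>>> myset = g.DataSet(datapoints=pts)   # assumed constructor signature
>>> len(myset)   # a DataSet behaves like a list of points
3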
DataPoint Attributes
An instance of DataPoint can have the following attributes. Some are used by the code, some are just for convenience.
| Attribute | Description |
|---|---|
| xB | Bjorken \(x_B\) |
| t | Mandelstam t, i.e., momentum transfer to the target squared |
| Q2 | \(Q^2\) |
| phi | azimuthal angle \(\phi\) |
| FTn | harmonic of the azimuthal angle \(\phi\). Values 0, 1, … correspond to zeroth, first, … cosine harmonics, while -1, -2, … correspond to first, second, … sine harmonics |
| observable | measured observable |
| val | value measured |
| err | total uncertainty of val |
| errstat | statistical uncertainty of val |
| errsyst | systematic uncertainty of val |
| frame | coordinate frame used (Trento or BMK) |
| id | ID number of the dataset to which the point belongs |
| reference | reference to where the data was published |
Some details and other attributes are given below.
Coordinate frames
Take note that Gepard internally works in the BMK frame, while most of the experimental data is published in the Trento frame. There are convenience methods to_conventions and from_conventions that transform a data point in place from the Trento to the BMK frame and back, respectively.
>>> import gepard as g
>>> pt = g.DataPoint(xB=0.1, phi=1, frame='Trento')
>>> pt.to_conventions()
>>> pt.phi # = (pi - phi)
2.141592653589793
>>> pt.frame
'Trento'
>>> pt.from_conventions()
>>> pt.phi
1.0
Note that the frame attribute keeps the original value even after transformation to the BMK frame.
All datasets that are made available in Gepard as g.dset
are already
transformed into the BMK frame.
Working with datasets
Datasets that ship with Gepard are all collected in the Python dictionary g.dset, where keys are the ID numbers of the datasets. A detailed description of the available datasets will be given here. There is a utility function g.list_data that gives a short tabular description of the sets with given IDs:
>>> g.list_data(list(range(47, 54)))
[ 47] ZEUS 6 XGAMMA 0812.2517 Table 1
[ 48] ZEUS 6 XGAMMA 0812.2517 Table 2
[ 49] ZEUS 8 XGAMMA 0812.2517 Table 3
[ 50] HALLA 288 XLUw 0607029 DFT analysis with MC error propagation by KK
[ 51] HALLA 96 XUUw 0607029 DFT analysis with MC error propagation by KK
[ 52] HERMES 36 TSA 1004.0177 Table 4
[ 53] HERMES 36 BTSA 1004.0177 Table 4
The first column above gives the ID number of each dataset.
Another utility function, g.describe_data, gives a short tabular description of a given DataSet:
>>> g.describe_data(g.dset[52])
npt x obs collab FTn id ref.
----------------------------------------------
12 x TSA HERMES -1.0 52 arXiv:1004.0177v1
12 x TSA HERMES -2.0 52 arXiv:1004.0177v1
12 x TSA HERMES -3.0 52 arXiv:1004.0177v1
----------------------------------------------
TOTAL = 36
>>> pt = g.dset[52][0] # First point of this dataset
>>> pt.xB, pt.t, pt.Q2, pt.val, pt.err
(0.079, -0.031, 1.982, -0.008, 0.05239274758971894)
A useful utility function is g.select, which selects a subset of points from a dataset according to some criteria:
>>> len(g.dset[143])
90
>>> twist_resist = g.select(g.dset[143], criteria=['Q2 > 5', 't < 0.2'])
>>> len(twist_resist)
40
There are some plotting routines available for inspection of data and
comparison with theory. First, there is a universal jbod
(“just a bunch
of data”) routine that plots any dataset, alone or with theory prediction lines.
For example, ZEUS cross section data (id=49) from the table above:
>>> import gepard as g
>>> import gepard.plots
>>> from gepard.fits import th_KM15, th_KM10b
>>> gepard.plots.jbod(points=g.dset[49], lines=[th_KM15, th_KM10b]).show()
Also, for some datasets there are dedicated plots, like
>>> import gepard.plots
>>> from gepard.fits import th_KM15, th_KM10b
>>> gepard.plots.H1ZEUS(lines=[th_KM15, th_KM10b]).show()
Finally, there is a convenient method df which transforms any DataSet into a corresponding pandas DataFrame, making it easy to perform various dataset analyses. For example, to find the mean values of the kinematic variables of a dataset:
>>> g.dset[52].df()[['Q2', 'xB', 't']].mean()
Q2 2.780750
xB 0.107083
t -0.143667
dtype: float64
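Since df() returns an ordinary pandas DataFrame, all of the usual pandas machinery is available. A small sketch, using only the columns demonstrated above:

>>> df = g.dset[52].df()
>>> coverage = df[['Q2', 'xB', 't']].agg(['min', 'max', 'mean'])   # kinematic coverage of the set
>>> high_Q2 = df[df['Q2'] > 2.5]   # keep only points with Q2 above 2.5 GeV^2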
Dataset files
Each dataset that ships with Gepard is stored in a single ASCII file. Users can add their own data files by placing them in a separate directory, say mydatafiles, and adding an empty file named __init__.py to that directory, which turns the data files into a proper Python package. (Read about Python's importlib_resources library for details.)
This directory has to be in the Python module search path. The current working directory (where you start Python; it can be displayed in IPython or Jupyter by issuing %pwd) is usually in the search path, and the user can explicitly add some other directory to the path like this:
>>> import sys
>>> sys.path.append('<path to mydatafiles>')
Then the data files are available to be imported, and there is a utility function g.data.loaddata that parses all files in the directory and creates the corresponding DataSet objects:
>>> import mydatafiles
>>> mydset = g.data.loaddata(mydatafiles)
Now mydset is analogous to g.dset, which means that the datasets are available as mydset[id].
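If you want to treat your own datasets and the shipped ones uniformly, the two dictionaries can simply be merged with plain Python (taking care that your ID numbers do not clash with the shipped ones):

>>> alldata = {**g.dset, **mydset}   # shipped and user datasets in one dictionary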
Data files are meant to be readable by both humans and computers and obey the following rules.

Syntactic rules:

- Empty lines and lines starting with a hash sign (#) are ignored by the parser and can be used for comments meant for human readers.
- The first part of the file is a preamble, consisting of lines with the structure key = value, where key should be a regular computer variable identifier, i.e., it should consist only of letters and numbers (no spaces) and should not start with a number. These keys become attributes of the DataPoint object and can be accessed using the dot (.) operator, like this:

>>> pt = g.dset[52][0]  # first point of this dataset
>>> pt.collaboration
'HERMES'

- The second and final part of the file is just a grid of numbers.
Semantic rules:

- There is a world-unique ID number of the file, given by the key id, and the name of the person who created the file, given by the key editor. If there are further edits by other people, keys such as editor2 can be used.
- Other information describing the origin of the data can be given using keys such as collaboration, year, reference, etc. These keys can be used for automatic plot generation.
- The coordinate frame used is given by the key frame, equal to either Trento or BMK.
- The scattering process is described using keys in1particle, in2particle, …, out1particle, …, set equal to the usual symbols for HEP particle names (e for electron, p for proton, …).
- Kinematical and polarization properties of a particle in1 are then given using keys in1energy, in1polarizationvector (L for longitudinal, T for transversal, U or unspecified for unpolarized), etc.
- The key in1polarization describes the amount of polarization and is set to 1 if the polarization is 100% or if measurements are already renormalized to take into account smaller polarization (which they mostly are).
- The sign of in1polarization describes how the asymmetries are formed, by giving the polarization of the first term in the asymmetry numerator (and similarly for in1charge).
- For convenience, the type of the process is summarized by the keys process (equal to ep2epgamma for leptoproduction of a photon, gammastarp2gammap for DVCS, gammastarp2rho0p for DVMP of rho0, etc.) and exptype (equal to fixed target or collider).
- Finally, the columns of the number grid are described in the preamble using keys such as x1name, giving the column variable, and x1value = columnK, where K is the corresponding grid column number, counting from 1. Here x1, x2, …, are used for kinematics (x-axes, such as \(x_{\rm B}\), \(Q^2\), \(t\), \(\phi\)), while y1 is for the measured observable.
- Units should be specified by keys such as in1unit, and in particular for angles it should be stated whether their unit is deg or rad.
- Uncertainties are given by the keys y1error (total error), y1errorstatistic, y1errorsystematic, y1errorsystematicplus, y1errorsystematicminus, and y1errornormalization.
- For Fourier harmonics, special column names are used: FTn for the harmonic of the azimuthal angle \(\phi\) between the lepton and reaction planes, and varFTn for the harmonic of the azimuthal angle \(\phi_S\) of the target polarization vector. Then, in the grid, positive numbers 0, 1, 2, … denote \(\cos 0\phi\), \(\cos\phi\), \(\cos 2\phi\), … harmonics, while negative numbers -1, -2, … denote \(\sin\phi\), \(\sin 2\phi\), … harmonics.
- If some kinematical value is common to the whole dataset, then instead of x1value = columnK we can specify, e.g., x1value = 0.36.
- Names of observables are standardized and given in a table of observables.
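To make these rules concrete, here is a hypothetical sketch of a complete data file. The dataset, all numbers, and some key spellings (in particular the unit keys, the y1name/y1value pair, and the way y1error points to a column, all written by analogy with the x1name/x1value pattern) are invented for illustration; consult the data files that ship with Gepard for authoritative examples.

# Hypothetical example data file (everything below is invented for illustration)
id = 1001
editor = A. Person

# origin of the data
collaboration = EXAMPLE
year = 2023
reference = arXiv:2301.00000
exptype = fixed target
frame = Trento

# process and beam
process = ep2epgamma
in1particle = e
in1energy = 27.6
in1unit = GeV
in1polarizationvector = L
in1polarization = +1
in2particle = p

# grid columns
x1name = xB
x1value = column1
x2name = Q2
x2value = column2
x3name = t
x3value = column3
x4name = phi
x4unit = deg
x4value = column4
y1name = BSA
y1value = column5
y1error = column6

# xB    Q2     t     phi    BSA    err
0.10   2.5   -0.10   15.0   0.21   0.04
0.10   2.5   -0.10   45.0   0.25   0.05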