Citizen science: Top quark production at CERN's Large Hadron Collider

So many things happened last week. Even though I was able to work on the fourth set of tasks for this project, I had to put the write-up aside because I spent my weekend on some urgent work tasks. Last week was also the end of the summer season in our country; we are now in the season when heavy rain pours every afternoon. The transition from hot and humid to cold and rainy days was so drastic that it took its toll on my body 😩, which further delayed my writing.

I was actually very excited while trying to understand what the histograms show, because last week was also when I finished reading one of the published articles related to this project. I thought about how some of the figures in the article are similar in form to the results we are able to produce in this set of tasks. For a few seconds I had the feeling of wearing a particle physicist's hat, hahaha.

And now, before I discuss this week's progress, here's a quick review of how far we've come in this project, in case you are interested and would like to join us:
  1. Installation of the tools [MG5aMC] necessary for particle physics simulation
  2. Using MG5aMC to simulate top quark production at CERN's LHC
  3. Installation and use of MadAnalysis 5 for detector simulation and event reconstruction
For the set of tasks this week, we investigated the simulated collisions and studied their properties in the same way particle physicists do, which explains what I felt in my story. ☺️ To follow along with this progress report, you can check the fourth blog post of the citizen science project by @lemouth, Citizen science on Hive - Deciphering top quark production at CERN's Large Hadron Collider.

In the previous set of tasks, we performed event reconstruction on the simulated collisions. We now use the reconstructed events to analyze the resulting higher-level objects: electrons, muons, taus, jets, photons, and missing energy. We start by launching MadAnalysis5 in the terminal by typing ./bin/ma5 in the directory named madanalysis5.

Next, we import the simulated events once the prompt ma5> shows in the terminal:
ma5> import ANALYSIS_0/Output/SAF/_defaultset/lheEvents0_0/myevents.lhe.gz as ttbar
And then, we set the production cross section associated with this event sample to 505.491 pb by typing ma5> set ttbar.xsection = 505.491. As I remember, I got a different production cross section from the samples I simulated. The tutorial post mentioned that I can use this value instead. I wonder, though: how does this affect the analysis?

Normalization of the reconstructed events

The first thing to do before we proceed with the analysis is to normalize the reconstructed events, expressed in terms of high-level objects, to the number of collisions recorded during the full Run 2 of the LHC. We do this by setting the luminosity in MadAnalysis5 to 140 fb⁻¹. The luminosity is a measure of how many particles can be squeezed through a given space in a given time, which does not necessarily mean that the particles collide. The value we set here is the integrated luminosity, that is, the luminosity accumulated over the whole run.

        ma5> set main.lumi = 140
        ma5> plot NAPID
        ma5> submit
        ma5> open

This gives us the results shown in Figure 1, where the y-axis label indicates the luminosity value used for the normalization. The histogram shows the distribution of our reconstructed objects, where we can see a lot of b-jets (b~/b column), some electrons (e+/e- column), some missing energy (ve~/ve column), some muons (mu+/mu- column), really a lot of jets (g column), and a bunch of photons (a column).
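
As a rough illustration of what this normalization amounts to (this is not part of the tutorial), the expected number of events is the cross section multiplied by the integrated luminosity. The sketch below uses the two values quoted above; the helper name and the sample size of 10000 simulated events are assumptions made only for the example.

```python
# Expected number of events for a given cross section and integrated luminosity.
# The cross section and luminosity are the values quoted in the text;
# the helper name and the sample size are hypothetical.

PB_TO_FB = 1000.0  # 1 picobarn = 1000 femtobarn


def expected_events(xsection_pb: float, lumi_fb_inv: float) -> float:
    """Return N = sigma * L, with sigma in pb and L in fb^-1."""
    return xsection_pb * PB_TO_FB * lumi_fb_inv


n_ttbar = expected_events(505.491, 140.0)
print(f"Expected top-antitop events: {n_ttbar:.3e}")

# Weight carried by each simulated event if the sample holds n_generated events
# (10000 is an assumed sample size, not a number from this post).
n_generated = 10_000
weight = n_ttbar / n_generated
print(f"Weight per simulated event: {weight:.1f}")
```

With these inputs the product is about 70.8 million top-antitop events. Seen this way, the cross section only changes the overall scale of the normalized histograms, not their shapes.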


Figure 1. Decays from top-antitop pair normalized with the full run 2 of the LHC

The distribution is very reasonable, as it agrees with what actually happens at the LHC. To explain further, each top quark (or antiquark) of a top-antitop pair decays immediately into a b-quark and a W boson. A b-quark gives rise to a b-jet, which is a jet of strongly-interacting particles. Two b-jets are therefore produced in every simulated collision, one from the decay of the top quark and one from the decay of the top antiquark. The identification of the b-flavour of a jet is not a 100% efficient process, so some b-jets are identified as light (i.e. normal) jets, which contribute to the g column in Figure 1.

As with the b-jets, there are two W bosons, one from the top quark and one from the top antiquark. The instantaneous decay of a W boson produces one of the following:
  • one electron and one neutrino
  • one muon and one neutrino
  • a pair of light jets
  • one tau lepton and one neutrino
These decays occur at the same rate, except for the pair of light jets, which is more frequent. The large number of light jets in the reconstructed events can be attributed to the mistagging of b-jets, to the more frequent hadronic decay of the W bosons, and lastly to radiation. The questions that come to mind are quite general: what are the factors that affect the reconstruction of higher-level objects (especially in the case of taus)? And are these factors also the reason for the limited efficiency in identifying b-flavour jets mentioned earlier?
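
To put rough numbers on these decay rates, here is a small counting sketch. It assumes the simple leading-order picture in which a W boson has nine equally likely channels: three lepton flavours plus two quark pairs in three colours each. These are approximate fractions, not measured values, and the variable names are only illustrative.

```python
from fractions import Fraction

# Rough leading-order counting (an assumption, not measured branching ratios):
# each leptonic W decay gets 1/9, the decay into light jets gets 6/9.
W_DECAYS = {
    "electron + neutrino": Fraction(1, 9),
    "muon + neutrino": Fraction(1, 9),
    "tau + neutrino": Fraction(1, 9),
    "pair of light jets": Fraction(6, 9),
}
assert sum(W_DECAYS.values()) == 1

p_lep = W_DECAYS["electron + neutrino"] + W_DECAYS["muon + neutrino"]
p_had = W_DECAYS["pair of light jets"]

# Two W bosons per event: either one can be the leptonic one, hence the factor 2.
p_lepton_plus_jets = 2 * p_lep * p_had
p_all_hadronic = p_had * p_had
p_two_leptons = p_lep * p_lep

print(f"one electron/muon + jets: {float(p_lepton_plus_jets):.3f}")
print(f"all hadronic:             {float(p_all_hadronic):.3f}")
print(f"two electrons/muons:      {float(p_two_leptons):.3f}")
```

In this simplified counting, a little under 30% of top-antitop events have one W boson decaying into an electron or a muon and the other into light jets. A reconstructed sample will differ because of detector acceptance, selection cuts, and tau decays.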

Particle multiplicity

The light jets and photons observed in the event final state come from very energetic particles, which radiate a lot. However, much of this radiation is not very energetic. In order to focus on the objects that have sufficient energy, an additional step in our analysis is to 'clean' our sample. For each type of particle in this set of tasks, we impose a minimum transverse momentum (PT) below which objects are ignored.

Lepton multiplicity

The leptons are the electrons, muons, taus, neutrinos, and their corresponding antiparticles. We discard the leptons with PT < 20 GeV, that is, we keep only those with PT > 20 GeV, using the following commands:

        ma5> define l = l+ l-

        ma5> plot N(l) 5 0 5
        ma5> select (l) PT > 20
        ma5> plot N(l) 5 0 5
        ma5> plot NAPID
        ma5> resubmit
        ma5> open

In these commands, we first define the leptons (l) in a way that the program understands. The values 5 0 5 in the plot command mean that the histogram has 5 bins ranging from 0 to 5. The N symbol indicates that we want the multiplicity, i.e. the number of objects of a given type in each event. The resulting histograms are shown in Figure 2.
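
To make the binning and the effect of the cut more concrete, here is a toy sketch in plain Python. It only imitates the idea behind plot N(l) 5 0 5 and select (l) PT > 20; it is not how MadAnalysis5 is implemented. The event list and the function name are invented for illustration.

```python
# Toy events: each inner list holds the PT (in GeV) of the leptons in one event.
# These numbers are invented for illustration only.
events = [
    [45.0, 12.0],
    [8.0],
    [60.0],
    [],
    [25.0, 22.0, 5.0],
    [15.0, 3.0],
]


def multiplicity_histogram(events, nbins=5, lo=0, hi=5, pt_min=0.0):
    """Count events per lepton multiplicity, in the spirit of 'plot N(l) 5 0 5'."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for lepton_pts in events:
        # Only leptons above the PT threshold are counted.
        n = sum(1 for pt in lepton_pts if pt > pt_min)
        if lo <= n < hi:
            counts[int((n - lo) / width)] += 1
    return counts


before = multiplicity_histogram(events)
after = multiplicity_histogram(events, pt_min=20.0)
print("before the cut:", before)
print("after the cut: ", after)
```

The total number of events stays the same, but after the cut more of them sit in the low-multiplicity bins, because soft leptons are no longer counted.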

Figure 2. (left) Multiplicity of final-state leptons. (right) Multiplicity of final-state leptons after the cut [PT > 20 GeV].

Photon multiplicity

We impose the same cut on the photons using the same commands, only changing the symbol from l to a, which is the label that corresponds to photons in MadAnalysis5. Figure 3 shows the resulting histograms for the multiplicity of the photons.

The tutorial mentions that, upon imposing the cut, a general migration of events from the right part of the histogram to the left can be observed. This is because the less energetic objects are ignored, so the number of objects counted in each event decreases. The effect is obvious in the zero-photon bin of the histogram after the cut (Figure 3), which reflects the removal of most photons that originate from radiation.

Figure 3. (left) Multiplicity of final-state photons. (right) Multiplicity of final-state photons after the cut [PT > 20 GeV].

Jet multiplicity

Here comes the first of the tasks we did independently from the tutorial, haha: redoing the selection cut for the jets. Using the same commands with the label j, I imposed a selection cut that ignores the jets with PT < 25 GeV. The resulting histogram in Figure 4 shows that the migration went from left to right, and the events with a higher jet multiplicity remained after the selection. My guess comes from what I remember from the introduction to particle physics course I took online years ago: jets have a large transverse momentum, which is also why they can reach the outer layer of the detector that measures the hadronic decays. By considering objects with higher values of transverse momentum, we observe showers with a higher multiplicity of jets in the distribution.

Figure 4. (left) Multiplicity of final-state jets. (right) Multiplicity of final-state jets after the cut [PT > 25 GeV].

Selection of a subset of all simulated collisions

In this section, we select a subset of all generated collisions to study the properties of the objects of interest. In this example, we consider the case where one of the top quarks decays into one lepton, one neutrino, and one b-jet, while the other top quark decays into one b-jet and two light jets. The subset we focus on is therefore made of the events whose final state contains exactly one lepton. To select it, we use the command: ma5> select N(l) == 1. As shown in Figure 5, this cut has an efficiency of about 30%, which means that about 30% of all events feature exactly one lepton in the final state.
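
For reference, the efficiency of a cut is simply the number of events that pass divided by the total number of events. The sketch below also attaches the usual binomial statistical uncertainty. The counts are hypothetical (chosen to give 30%) and the function name is made up for this example.

```python
import math


def cut_efficiency(n_pass: int, n_total: int):
    """Return (efficiency, binomial uncertainty) for a selection cut."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


# Hypothetical counts: 3000 of 10000 simulated events pass N(l) == 1.
eff, err = cut_efficiency(3000, 10000)
print(f"efficiency = {eff:.3f} +/- {err:.3f}")
```

The uncertainty shrinks as the sample grows, which is one reason to simulate many events.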


Figure 5. Parameters of the selection of events containing only one lepton.

Again, we look at the multiplicity distribution of the leptons after the imposed cut. The resulting histogram in Figure 6 shows that only events with exactly one lepton in the final state (the one-lepton bin) remain in the subset.


Figure 6. Selection of the events containing only one lepton.

The properties of a lepton

The objects in our events can be characterized by their properties. The transverse momentum is one of them; another is the pseudo-rapidity, which indicates how central an object is in the detector. An object is very central if it has a small pseudo-rapidity, and close to the beam direction (more longitudinal) if its pseudo-rapidity is large.
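
The standard definition relates the pseudo-rapidity to the polar angle theta measured from the beam axis: eta = -ln(tan(theta/2)). Here is a small sketch of that relation and its inverse. The function names are my own.

```python
import math


def pseudorapidity(theta: float) -> float:
    """eta = -ln(tan(theta / 2)), with theta the polar angle from the beam axis (radians)."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta: float) -> float:
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


for eta in (0.0, 1.0, 2.5):
    theta_deg = math.degrees(polar_angle(eta))
    print(f"|eta| = {eta:.1f}  ->  theta = {theta_deg:.1f} degrees")
```

An object emitted perpendicular to the beam has a pseudo-rapidity of zero, while a pseudo-rapidity of 2.5 corresponds to an angle of only about 9 degrees from the beam line.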

To observe the values of these lepton properties, we look into their distribution. In MadAnalysis 5, we type the following commands:

        ma5> plot PT(l) 25 0 200 [logY]
        ma5> plot ABSETA(l) 25 0 2.5 [logY]
        ma5> resubmit
        ma5> open

Figure 7. (left) Transverse momentum distribution of the leptons (right) Pseudo-rapidity distribution of the leptons.

Comparing the results shown in Figure 7 with the histograms in the tutorial, I noticed a difference in the pseudo-rapidity distribution. The one from the tutorial looks more uniform than the one I got. I wonder what could be the reason for this?

The mass of a W boson

Another thing we can investigate is the mass of a combination of jets in the selected events. Since, in the events we selected, one of the top quarks decays into a b-jet and two light jets, we mentally combine the two leading jets, j[1] and j[2], and observe the resulting mass distribution. To do this, we type the following commands:

        ma5> plot M(j[1] j[2]) 50 0 250 [logY]

        ma5> resubmit
        ma5> open


Figure 8. Mass of the combination of two jets.

The resulting histogram in Figure 8 shows a peak around 80 GeV, which corresponds to the mass of the W boson. As mentioned previously, the two jets we combined come from an intermediate W boson in the decay. Amazingly, this is how particle physicists look for the presence of real intermediate particles in a process: they try to reconstruct peaks that appear in the middle of distributions. What an experience to be able to do the same work! :)
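
For the curious, "combining" two jets means adding their energies and momenta and computing the invariant mass of the sum, M² = (E1 + E2)² - |p1 + p2|². The sketch below shows that calculation under the assumption that each jet is described by its PT, pseudo-rapidity, azimuthal angle, and mass. The function names and the example jets are invented, and this is not MadAnalysis5's own code.

```python
import math


def four_momentum(pt, eta, phi, mass=0.0):
    """Convert (PT, eta, phi, mass) into (E, px, py, pz), all in GeV."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(p1, p2):
    """M = sqrt((E1 + E2)^2 - |p1 + p2|^2) for two four-momenta."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# Invented example: two massless, back-to-back central jets with PT = 40 GeV each.
jet1 = four_momentum(40.0, 0.0, 0.0)
jet2 = four_momentum(40.0, 0.0, math.pi)
m_jj = invariant_mass(jet1, jet2)
print(f"M(j1 j2) = {m_jj:.1f} GeV")
```

In this toy configuration the two jets combine to 80 GeV, which is the kind of value that piles up into the peak when the jets really come from a W boson.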

Missing transverse energy spectrum

Here's the last task we had to work on independently. I must say, the questions accompanying the homework were really helpful. They pushed me to make sense of the results I got, rather than just to get the right answers. The questions were like fuel to my curiosity: I had to answer the whys in the hope that my answers make sense, hahaha. To generate the missing transverse energy distribution, we use the following commands:

        ma5> plot MET 50 0 500 [logY]
        ma5> resubmit
        ma5> open


Figure 9. Missing transverse energy distribution.

I chose a wide range of values so that I could see the bigger picture. Haha. The histogram in Figure 9 shows a distribution concentrated on the left, at low values of missing transverse momentum. This could be the reason why the missing energy is difficult to measure, since the bulk of the events have a low missing energy.
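
As a reminder of what this quantity is, the missing transverse energy is usually defined as the magnitude of the transverse momentum needed to balance all visible objects in an event. Here is a minimal sketch of that definition, assuming each visible object is given by its PT and azimuthal angle. The function name and the example event are invented, and the internal calculation in MadAnalysis5 may differ in its details.

```python
import math


def missing_transverse_energy(visible):
    """visible: iterable of (pt, phi) pairs for all reconstructed objects (GeV, radians)."""
    sum_px = sum(pt * math.cos(phi) for pt, phi in visible)
    sum_py = sum(pt * math.sin(phi) for pt, phi in visible)
    # The missing momentum is minus the visible sum; only its magnitude is returned.
    return math.hypot(sum_px, sum_py)


# Invented event: two visible objects at right angles in the transverse plane.
visible_objects = [(30.0, 0.0), (40.0, math.pi / 2)]
met = missing_transverse_energy(visible_objects)
print(f"MET = {met:.1f} GeV")
```

Because it is built from everything else in the event, any mismeasurement of the visible objects feeds directly into the missing energy.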

Links of my progress reports for this project:

  1. Citizen science: Installing a tool for particle physics simulations
  2. Citizen science: Using MG5aMC to simulate top quark production at CERN's LHC
  3. Citizen science: Detector simulation and event reconstruction using MadAnalysis 5