A post-mortem analysis of a data science approach to determining the existence and decay patterns of the Higgs boson.
This post originally appeared on kasperfred.com
In 2013, the CERN LHC ATLAS team released a dataset of simulated proton-proton collisions, some of which resulted in a 125 GeV Higgs boson decaying into two taus. Others resulted in two top quarks decaying to either a lepton or a tau, or in a W boson decaying to either an electron or a muon and a tau.
The problem was to construct a neural network that correctly classifies the events using features that are either measured directly in the detector or derived from those measurements.
The final network, consisting of only a single hidden layer, was able to predict the existence of the Higgs boson with an accuracy of 99.997% over 5 positive predictions (events where the network predicted that a Higgs was present). For such a simple network, I find that number rather impressive. This piece is a summary of my immediate reflections following the project.
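To make "a single hidden layer" concrete, here is a minimal NumPy sketch of a binary classifier of that shape, trained with full-batch gradient descent on binary cross-entropy. The layer width, the tanh activation, the sigmoid output, the learning rate and the toy data are all assumptions for illustration; this is not the configuration or code used in the project, and the helper names (`init_params`, `forward`, `train_step`, `bce`) are hypothetical.

```python
import numpy as np

def init_params(n_features, n_hidden, seed=0):
    """Randomly initialise a one-hidden-layer network (sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_features), size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Return hidden activations and the predicted probability of the signal class."""
    h = np.tanh(X @ params["W1"] + params["b1"])      # the single hidden layer
    logits = h @ params["W2"] + params["b2"]
    p = 1.0 / (1.0 + np.exp(-logits))                 # sigmoid output
    return h, p.ravel()

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy."""
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def train_step(params, X, y, lr=0.1):
    """One full-batch gradient-descent step on the mean binary cross-entropy."""
    h, p = forward(params, X)
    n = X.shape[0]
    d_logits = (p - y)[:, None] / n                   # gradient of sigmoid + BCE w.r.t. logits
    grads = {"W2": h.T @ d_logits, "b2": d_logits.sum(axis=0)}
    d_h = (d_logits @ params["W2"].T) * (1.0 - h ** 2)  # backpropagate through tanh
    grads["W1"] = X.T @ d_h
    grads["b1"] = d_h.sum(axis=0)
    for k in params:
        params[k] -= lr * grads[k]
    return params

# Toy usage: random features and synthetic labels, not ATLAS data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
params = init_params(n_features=4, n_hidden=8)
loss_before = bce(forward(params, X)[1], y)
for _ in range(500):
    train_step(params, X, y, lr=0.5)
loss_after = bce(forward(params, X)[1], y)
accuracy = np.mean((forward(params, X)[1] > 0.5) == y)
```

A real run on the collision data would swap in the actual feature matrix and labels and evaluate on held-out events rather than the training set.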
This was the first large project I've done. All my previous projects were small enough to live in a single file, and it's also the first time I worked with multiple, very different models for the same problem. Furthermore, unlike many of the old staples such as the MNIST and Iris datasets, this dataset didn't have a standard approach. Another unforeseen problem was that the dataset was so large that I ran into GPU memory issues, which had to be addressed. This change in scope meant that I had to find a better way of organizing the code. While you can find the code on GitHub, I'd like to highlight a few things that worked out well, and some that didn't.
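The post doesn't say how the GPU memory issues were solved. A common remedy, sketched here as an assumption rather than the author's fix, is to stream the data in shuffled mini-batches so that only one batch needs to be on the GPU at a time. The function name `iter_minibatches` and its parameters are hypothetical.

```python
import numpy as np

def iter_minibatches(X, y, batch_size, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) pairs so only one batch must be moved to the GPU at a time."""
    n = X.shape[0]
    order = np.random.default_rng(seed).permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]   # the last batch may be smaller
        yield X[idx], y[idx]

# Toy usage: 10 rows split into batches of 4, 4 and 2.
X = np.arange(30).reshape(10, 3)
y = np.arange(10)
batches = list(iter_minibatches(X, y, batch_size=4))
```

If the full dataset doesn't fit in host memory either, it can be stored as a `.npy` file and opened with `np.load(path, mmap_mode="r")`, so that indexing a batch only reads those rows from disk.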
In conclusion, while the scaling went okay all things considered, there's still a lot of room for improvement.
One interesting challenge was deciding which model is the best. I'll eventually write a whole essay discussing the different techniques for evaluating a model's performance and comparing it to other models, but in summary, I ended up using a combination of failure-type analysis and statistical hypothesis testing.

That said, this step was a significant bottleneck in the pipeline: each model took a couple of hours to train, so identifying what was wrong and adjusting it appropriately became very important. How best to do this is something I haven't figured out yet, and since not much has been written about it, it seems I'm not alone. (If you know something about this, you're more than welcome to contact me.)

One potential solution, if you have the computing power, might be to use genetic or other machine learning algorithms to automatically come up with network architectures. I haven't looked much into this yet, but it looks interesting.
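The post doesn't name the specific hypothesis test that was used. As one illustration, McNemar's test is a common choice for comparing two classifiers on the same held-out events: it only looks at the events where exactly one of the two models is wrong. The sketch below pairs it with a simple failure-type count (true/false positives and negatives). Both function names are hypothetical, and labels are assumed to be 1 for signal and 0 for background.

```python
import math

def failure_counts(y_true, y_pred):
    """Count each outcome type for a binary classifier (1 = signal, 0 = background)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["fn" if t == 1 else "tn"] += 1
    return counts

def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test with continuity correction.

    Returns (statistic, p_value); a small p-value suggests the two models'
    error rates differ on this test set.
    """
    # b: only model A is right; c: only model B is right.
    b = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a == t and bb != t)
    c = sum(1 for t, a, bb in zip(y_true, pred_a, pred_b) if a != t and bb == t)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 degree of freedom
    return stat, p_value

# Toy usage: model B flips the first 20 of 100 labels, model A is always right.
y_true = [i % 2 for i in range(100)]
pred_a = list(y_true)
pred_b = [1 - t if i < 20 else t for i, t in enumerate(y_true)]
stat, p_value = mcnemar(y_true, pred_a, pred_b)
counts_b = failure_counts(y_true, pred_b)
```

Because it conditions on disagreements, McNemar's test needs only one evaluation pass per model, which matters when each model takes hours to train.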
I'll talk a lot more about this in the detailed analysis, but if I had to name one thing that surprised me about the data, it would be the inverse exponential relative importance of the features, as seen in the graph below. Specifically, it surprised me how well the shallow network was able to encode this information.
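The post doesn't say how the feature importances in the graph were computed. One model-agnostic method, offered here as an assumption rather than the author's approach, is permutation importance: shuffle one feature column at a time and measure how much the accuracy drops. The function name and the toy predictor are hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently.

    `predict` maps an (n, d) array to an (n,) array of 0/1 labels.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and the labels
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy usage: only feature 0 carries information.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0).astype(int)
predict = lambda A: (A[:, 0] > 0).astype(int)
imp = permutation_importance(predict, X, y)
ranking = np.argsort(imp)[::-1]  # most important feature first
```

Sorting the resulting importances and plotting them against rank is one way to produce a chart like the one described above.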
These were some of my immediate thoughts regarding the project. Once the paper has been graded, I'll follow up with a more in-depth analysis of the models and how they may be improved.