Object detection has been an important task in the computer vision domain in recent decades. The goal is to detect instances of objects, such as humans, cars, etc., in digital images. Hundreds of methods have been developed to answer a single question: What objects are where?
Traditional methods tried to answer this question by extracting hand-crafted features like edges and corners within the image. Most of these approaches used a sliding-window approach, meaning that they kept checking small parts of the image in different scales to see if any of these parts contained the object they were looking for. This was really time-consuming, and even the slightest change in the object shape, lightning, etc., could have caused the algorithm to miss it.
Then there came the deep learning era. With the increasing capability of computer hardware and the introduction of large-scale datasets, it became possible to exploit the advancement in the deep learning domain to develop a reliable and robust object detection algorithm that could work in an end-to-end manner.