For this task, the relation network was combined with two other types of neural nets: one for recognizing objects in the image, and one for interpreting the question. (sciencemag.org)
For the computer vision side, researchers train their systems on a massive dataset of images, so they learn to identify objects in images. (sciencedaily.com)
They can identify cars and thousands of other objects in images with similar accuracy and in the past three years have come to dominate machine learning competitions. (scientificamerican.com)
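
The arrangement described in the first excerpt, a relation network fed by one net that recognizes objects and another that interprets the question, can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: it assumes the commonly used pairwise-sum form of a relation network, replaces both upstream nets with random stand-in feature vectors, and uses untrained random weights. All names and sizes (`relation_network`, `mlp`, `g_params`, `f_params`, the dimensions) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU in between (no biases, for brevity)."""
    return np.maximum(x @ w1, 0.0) @ w2


def relation_network(objects, question, g_params, f_params):
    """Score answers from object features and a question embedding.

    objects:  (n, d_obj) array, one row per detected object (assumed to come
              from the vision net).
    question: (d_q,) array (assumed to come from the question net).
    """
    n = objects.shape[0]
    # Build every ordered pair (o_i, o_j) and append the question to each pair.
    left = np.repeat(objects, n, axis=0)   # (n*n, d_obj)
    right = np.tile(objects, (n, 1))       # (n*n, d_obj)
    q = np.tile(question, (n * n, 1))      # (n*n, d_q)
    pairs = np.concatenate([left, right, q], axis=1)
    # g is applied to each pair; summing removes any dependence on object order.
    relations = mlp(pairs, *g_params).sum(axis=0)
    # f maps the pooled relation vector to answer scores.
    return mlp(relations, *f_params)


# Hypothetical sizes, chosen only for illustration.
n_objects, d_obj, d_q, hidden, d_rel, n_answers = 4, 8, 6, 16, 12, 3

# Stand-ins for the outputs of the two upstream nets.
objects = rng.normal(size=(n_objects, d_obj))
question = rng.normal(size=d_q)

# Untrained random weights for g (pair scorer) and f (answer head).
g_params = (rng.normal(size=(2 * d_obj + d_q, hidden)),
            rng.normal(size=(hidden, d_rel)))
f_params = (rng.normal(size=(d_rel, hidden)),
            rng.normal(size=(hidden, n_answers)))

scores = relation_network(objects, question, g_params, f_params)
```

Because the pair outputs are summed before the final step, reordering the rows of `objects` leaves `scores` unchanged up to floating-point rounding, so in this sketch the module treats the objects as an unordered set.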