This shared task is concerned with automatic translation between signed and spoken languages. The task requires processing visual information (such as video frames or human pose estimates), which takes it beyond the well-known paradigm of text-to-text machine translation.