Generating Landmark-based Manipulation Instructions from Image Pairs
Zarrieß S, Voigt H, Schlangen D, Sadler P (2022)
In: Proceedings of the 15th International Conference on Natural Language Generation. Shaikh S, Ferreira T, Stent A (Eds); Stroudsburg, PA: Association for Computational Linguistics: 203-211.
Conference paper
Published | English
Download
No files have been uploaded. Publication record only.
Authors
Zarrieß, Sina (UniBi);
Voigt, Henrik (UniBi);
Schlangen, David;
Sadler, Philipp
Editors
Shaikh, Samira;
Ferreira, Thiago;
Stent, Amanda
Abstract / Note
We investigate the problem of generating landmark-based manipulation instructions (e.g. move the blue block so that it touches the red block on the right) from image pairs showing a before and an after state in a visual scene. We present a transformer model with difference attention heads that learns to attend to target and landmark objects in consecutive images via a difference key. Our model outperforms the state-of-the-art for instruction generation on the BLOCKS dataset and particularly improves the accuracy of generated target and landmark references. Furthermore, our model outperforms state-of-the-art models on a difference spotting dataset.
Year of publication
2022
Title of the conference proceedings
Proceedings of the 15th International Conference on Natural Language Generation
Page(s)
203-211
Conference
International Natural Language Generation Conference (INLG 2022)
Conference location
Waterville, Maine, USA
Conference date
2022-07-18 – 2022-07-22
ISBN
978-1-955917-57-5
Page URI
https://pub.uni-bielefeld.de/record/2967313
Cite
Zarrieß S, Voigt H, Schlangen D, Sadler P. Generating Landmark-based Manipulation Instructions from Image Pairs. In: Shaikh S, Ferreira T, Stent A, eds. Proceedings of the 15th International Conference on Natural Language Generation. Stroudsburg, PA: Association for Computational Linguistics; 2022: 203-211.
Zarrieß, S., Voigt, H., Schlangen, D., & Sadler, P. (2022). Generating Landmark-based Manipulation Instructions from Image Pairs. In S. Shaikh, T. Ferreira, & A. Stent (Eds.), Proceedings of the 15th International Conference on Natural Language Generation (pp. 203-211). Stroudsburg, PA: Association for Computational Linguistics.
Zarrieß, Sina, Voigt, Henrik, Schlangen, David, and Sadler, Philipp. 2022. “Generating Landmark-based Manipulation Instructions from Image Pairs”. In Proceedings of the 15th International Conference on Natural Language Generation, ed. Samira Shaikh, Thiago Ferreira, and Amanda Stent, 203-211. Stroudsburg, PA: Association for Computational Linguistics.
Zarrieß, S., Voigt, H., Schlangen, D., and Sadler, P. (2022). “Generating Landmark-based Manipulation Instructions from Image Pairs” in Proceedings of the 15th International Conference on Natural Language Generation, Shaikh, S., Ferreira, T., and Stent, A. eds. (Stroudsburg, PA: Association for Computational Linguistics), 203-211.
Zarrieß, S., et al., 2022. Generating Landmark-based Manipulation Instructions from Image Pairs. In S. Shaikh, T. Ferreira, & A. Stent, eds. Proceedings of the 15th International Conference on Natural Language Generation. Stroudsburg, PA: Association for Computational Linguistics, pp. 203-211.
S. Zarrieß, et al., “Generating Landmark-based Manipulation Instructions from Image Pairs”, Proceedings of the 15th International Conference on Natural Language Generation, S. Shaikh, T. Ferreira, and A. Stent, eds., Stroudsburg, PA: Association for Computational Linguistics, 2022, pp. 203-211.
Zarrieß, S., Voigt, H., Schlangen, D., Sadler, P.: Generating Landmark-based Manipulation Instructions from Image Pairs. In: Shaikh, S., Ferreira, T., and Stent, A. (eds.) Proceedings of the 15th International Conference on Natural Language Generation. p. 203-211. Association for Computational Linguistics, Stroudsburg, PA (2022).
Zarrieß, Sina, Voigt, Henrik, Schlangen, David, and Sadler, Philipp. “Generating Landmark-based Manipulation Instructions from Image Pairs”. Proceedings of the 15th International Conference on Natural Language Generation. Ed. Samira Shaikh, Thiago Ferreira, and Amanda Stent. Stroudsburg, PA: Association for Computational Linguistics, 2022. 203-211.