Exploring better image captioning with grid features
Abstract
Nowadays, Artificial Intelligence Generated Content (AIGC) has shown promising prospects in both the computer vision and natural language processing communities. Meanwhile, image captioning, an essential aspect of AIGC, has received increasing attention. Recent vision-language research is shifting from bulky region-based visual representations