Text-to-image synthesis: Starting composite from the foreground content

Cited by: 4

Authors:

Zhang, Zhiqiang ^{[1]}

Zhou, Jinjia ^{[1]}

Yu, Wenxin ^{[2]}

Jiang, Ning ^{[2]}

Affiliations:

[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo, Japan

[2] Southwest Univ Sci & Technol, Sch Comp Sci & Technol, Mianyang, Sichuan, Peoples R China

Source:

INFORMATION SCIENCES | 2022 / Vol. 607

Keywords:

Text-to-image synthesis; Generative adversarial networks; Computer vision; Deep learning; ADVERSARIAL NETWORKS;

DOI:

10.1016/j.ins.2022.06.044

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Text-to-image synthesis has recently become a popular topic in computer vision and has attracted wide attention. Many methods have achieved encouraging results, but further improving the quality of synthesized images remains a major challenge. In this paper, we propose a multi-stage synthesis method that starts compositing from the foreground content. The synthesis process is divided into three stages. The first stage generates the foreground results, and the third stage synthesizes the final image. The second stage has two variants: it either continues to synthesize the foreground results or synthesizes image results that include background information. Experiments show that continuing to generate foreground results in the second stage achieves better results on the Caltech-UCSD Birds (CUB) and Oxford-102 datasets, while synthesizing foreground results only in the first stage performs better on the Microsoft Common Objects in Context (MS COCO) dataset. In addition, our synthesized results on all three datasets are subjectively more realistic, with better handling of detail. The method also outperforms most existing methods in quantitative comparisons, which demonstrates its effectiveness and superiority. (C) 2022 Elsevier Inc. All rights reserved.


Pages: 1265-1285

Page count: 21

