The one-click outfit-swap tool is going viral: Jensen Huang dons a fitted T-shirt, and the Hugging Face CEO jokes that he can't compete for the top job.

We can't stop laughing: netizens have gone wild with the latest virtual try-on tool. Jensen Huang, Musk, Altman, Smith, and other big names have all had their outfits swapped out.

Right before our eyes, Jensen Huang takes off his leather jacket and slips into a candy-wrapper top:


Later, Altman showed off a pair of tattooed arms in a CUCCI top:

Then Musk turned into Spider-Man:


Hollywood superstar Smith also got a new look:

Jokes aside, the research behind the tool is genuinely serious.

The tool is called IDM-VTON. It was built on a diffusion model by a research team from the Korea Advanced Institute of Science and Technology (KAIST) and the company OMNIOUS.AI.

The official demo is already live for anyone to try, and the inference code has been open-sourced.

Besides the examples shown at the start, a Hugging Face researcher also joined the fun and put Jensen Huang in an exclusive jersey. The company's CEO quickly reposted the joke:

I've been replaced; I can't compete with him for CEO.

Netizens watching the fun also sighed that, after all these years, the "all-thumbs" photo-editing crowd finally has nothing to worry about (the AI does it for you).

Come and play~

We also got hands-on with it right away. The whole demo page looks like this:

It is also very simple to operate.

First, upload a photo of the person; the area to be modified can be selected either manually or automatically. Then upload the clothes you want to change into.

Click Try-on, and the mask image and the outfit-swapped image are generated automatically:

In the shot above, the automatically generated mask also covered the hand, which is why the resulting left hand doesn't look quite right.

So we switched to selecting the mask manually, this time using our own photos for both the person and the clothes.

What do you think of the effect this time?
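Incidentally, the same steps can in principle be scripted against the hosted Space with the gradio_client package. The sketch below is only illustrative: the endpoint name "/tryon" and the two file arguments are assumptions, and the real Space likely expects additional parameters (mask options, denoising steps, seed), so inspect view_api() before calling predict.

```python
# A minimal sketch for driving the hosted demo from Python instead of the web UI.
# The endpoint name and argument list below are placeholders -- run view_api()
# first to see what the Space actually expects.
from gradio_client import Client, handle_file

client = Client("yisol/IDM-VTON")
client.view_api()  # prints the Space's endpoints and their parameters

result = client.predict(
    handle_file("person.jpg"),   # photo of the person (placeholder argument)
    handle_file("garment.jpg"),  # photo of the garment (placeholder argument)
    api_name="/tryon",           # hypothetical endpoint name -- check view_api()
)
print(result)
```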

Now for a wave of netizens' creations. DeepMind co-founder Mustafa Suleyman in a smiley-mask Shoggoth co-branded T-shirt:

Quite a few netizens said they genuinely want this shirt.

Altman was once again drafted as a model by netizens:

Of course, things do go wrong sometimes. Musk, for example, ends up wearing a knockoff "CUCCI".

Having seen the results, let's look at how IDM-VTON is implemented technically.

Built on a diffusion model

Technically, IDM-VTON is built on a diffusion model, with carefully designed attention modules that improve consistency with the garment image and generate realistic virtual try-on results.

The model architecture roughly consists of three parts (a schematic code sketch follows the list):

TryonNet: the main UNet, which processes the person image.

IP-Adapter: an image prompt adapter that encodes high-level semantics of the garment image.

GarmentNet: a parallel UNet that extracts low-level features of the garment.
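For orientation, here is a highly simplified PyTorch-style skeleton of how the three branches fit together. The class and argument names are illustrative and not taken from the released code; the real implementation wires these signals into the UNet's attention layers rather than passing them as plain keyword arguments.

```python
# Illustrative skeleton of the three-branch layout (not the authors' code).
import torch.nn as nn

class IDMVTONSketch(nn.Module):
    def __init__(self, tryon_unet: nn.Module, garment_unet: nn.Module, ip_adapter: nn.Module):
        super().__init__()
        self.tryon_unet = tryon_unet      # main UNet over the person-image latents
        self.garment_unet = garment_unet  # parallel UNet over the garment image
        self.ip_adapter = ip_adapter      # encodes high-level garment semantics

    def forward(self, person_latents, garment_latents, garment_image, text_emb):
        garment_feats = self.garment_unet(garment_latents)  # low-level garment features
        garment_tokens = self.ip_adapter(garment_image)     # high-level garment tokens
        # TryonNet denoises the person latents, conditioned on both garment
        # signals plus the text embeddings (illustrative call signature).
        return self.tryon_unet(person_latents,
                               garment_features=garment_feats,
                               garment_tokens=garment_tokens,
                               text_embeddings=text_emb)
```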

As input to the main UNet, the researchers combine the noisy latent of the person image, the segmentation mask, the masked person image, and DensePose data.
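In code, that input assembly amounts to a channel-wise concatenation of the conditioning signals before they enter the main UNet. A minimal sketch, with illustrative channel counts (4 latent channels for each image-like input, 1 for the mask):

```python
# Illustrative input assembly for the main UNet (channel counts are illustrative).
import torch

batch, h, w = 1, 64, 48
noisy_latents = torch.randn(batch, 4, h, w)  # noisy latent of the person image
mask          = torch.randn(batch, 1, h, w)  # segmentation mask of the try-on region
masked_image  = torch.randn(batch, 4, h, w)  # latent of the person image with the region masked out
densepose     = torch.randn(batch, 4, h, w)  # latent of the DensePose rendering

unet_input = torch.cat([noisy_latents, mask, masked_image, densepose], dim=1)
print(unet_input.shape)  # torch.Size([1, 13, 64, 48]) -> the UNet's input channels
```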

They also add a detailed description of the garment, such as (V) = "short-sleeved crew-neck T-shirt." This description is then used in the input prompts for both GarmentNet (e.g., "a photo of (V)") and TryonNet (e.g., "the model is wearing (V)").
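Building the two prompts from the same garment description is straightforward; the caption string below is just the example from the text:

```python
# Build the paired prompts from one garment description (example caption only).
garment_caption = "short-sleeved crew-neck T-shirt"

garment_prompt = f"a photo of {garment_caption}"            # prompt for GarmentNet
tryon_prompt   = f"the model is wearing {garment_caption}"  # prompt for TryonNet

print(garment_prompt)  # a photo of short-sleeved crew-neck T-shirt
print(tryon_prompt)    # the model is wearing short-sleeved crew-neck T-shirt
```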

The intermediate features produced by TryonNet and GarmentNet are concatenated and passed through a self-attention layer, and only the first half of the output (the part corresponding to TryonNet) is kept. These features are then fused with the outputs of the text encoder and the IP-Adapter through cross-attention layers.
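Schematically, the fusion inside one attention block looks roughly like the following. The attention layers here are generic stand-ins and all shapes are illustrative, so this only traces the data flow described above rather than reproducing the actual layers:

```python
# Rough schematic of the feature fusion in one attention block (shapes illustrative).
import torch
import torch.nn as nn

dim, n_tokens, batch = 320, 1024, 1
self_attn  = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

tryon_feats   = torch.randn(batch, n_tokens, dim)  # intermediate features from TryonNet
garment_feats = torch.randn(batch, n_tokens, dim)  # intermediate features from GarmentNet
text_emb      = torch.randn(batch, 77, dim)        # text-encoder features
ip_tokens     = torch.randn(batch, 4, dim)         # IP-Adapter image tokens

# 1) Concatenate the two branches and run self-attention over the joint sequence.
joint = torch.cat([tryon_feats, garment_feats], dim=1)
joint, _ = self_attn(joint, joint, joint)

# 2) Keep only the first half (the TryonNet positions).
fused = joint[:, :n_tokens]

# 3) Fuse with the text and IP-Adapter features via cross-attention
#    (a simplification: the real IP-Adapter uses its own cross-attention branch).
context = torch.cat([text_emb, ip_tokens], dim=1)
out, _ = cross_attn(fused, context, context)
print(out.shape)  # torch.Size([1, 1024, 320])
```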

Finally, the researchers fine-tuned the TryonNet and IP-Adapter modules while keeping the rest of the model frozen.
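In training code, that boils down to enabling gradients on TryonNet and the IP-Adapter while leaving the other components frozen. A minimal sketch, with placeholder modules standing in for the real ones:

```python
# Freeze/unfreeze pattern described in the text (placeholder modules only).
import torch
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

# Placeholder modules standing in for the real components.
tryon_unet, garment_unet, ip_adapter = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)

set_trainable(tryon_unet, True)     # fine-tuned
set_trainable(ip_adapter, True)     # fine-tuned
set_trainable(garment_unet, False)  # frozen (likewise the text encoder and VAE)

# Only the trainable parameters go to the optimizer.
params = [p for m in (tryon_unet, ip_adapter) for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params, lr=1e-5)
```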

In the experiments, they trained the model on the VITON-HD dataset and evaluated it on VITON-HD, DressCode, and an internally collected In-the-Wild dataset.

IDM-VTON outperforms previous methods both qualitatively and quantitatively.

IDM-VTON can generate realistic images and preserve fine-grained details of clothing.

For more details, interested readers can check out the original paper.

Project link:

  • (1)https://idm-vton.github.io/?continueFlag=589fb545dbbb123446456b65a635d849

  • (2)https://arxiv.org/abs/2403.05139

  • (3)https://huggingface.co/spaces/yisol/IDM-VTON?continueFlag=589fb545dbbb123446456b65a635d849

Reference links:

  • (1)https://twitter.com/multimodalart/status/1782508538213933192

  • (2)https://twitter.com/fffiloni/status/1783158082849108434

  • (3)https://twitter.com/ClementDelangue/status/1783179067803533577
