The one-click outfit-swap tool is going viral: Jensen Huang dons a fitted T-shirt, and the Hugging Face CEO jokes that he can't compete for the top job.

We can't stop laughing: netizens have gone wild with the latest virtual try-on tool. Jensen Huang, Musk, Altman, Smith, and other big names have all had their outfits swapped out.

Right before our eyes, Jensen Huang takes off his leather jacket and slips into a candy-wrapper top:


Later, Altman showed off a pair of tattooed arms in a CUCCI top:

Then Musk turned into Spider-Man:


Hollywood superstar Smith also got a new look:

Jokes aside, the research behind the tool is genuinely serious.

The tool is called IDM-VTON. It was built on a diffusion model by a research team from the Korea Advanced Institute of Science and Technology (KAIST) and the company OMNIOUS.AI.

The official demo is already live for anyone to try, and the inference code has been open-sourced.

Besides the examples shown at the start, a Hugging Face researcher also joined the fun and put Jensen Huang in an exclusive jersey. The company's CEO quickly reposted the joke:

I've been replaced; I can't compete with him for CEO.

Netizens watching the fun also sighed that, after all these years, the "all-thumbs" photo-editing crowd finally has nothing to worry about (the AI does it for you).

Come and play~

We also got hands-on with it right away. The whole demo page looks like this:

It is also very simple to operate.

First, upload a photo of the person; the area to be modified can be selected either manually or automatically. Then upload the clothes you want to change into.

Click Try-on, and the mask image and the outfit-swapped image are generated automatically:

In the shot above, the automatically generated mask also covered the hand, which is why the resulting left hand doesn't look quite right.

So we switched to selecting the mask manually, this time using our own photos for both the person and the clothes.

What do you think of the effect this time?
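Incidentally, the same steps can in principle be scripted against the hosted Space with the gradio_client package. The sketch below is only illustrative: the endpoint name "/tryon" and the two file arguments are assumptions, and the real Space likely expects additional parameters (mask options, denoising steps, seed), so inspect view_api() before calling predict.

```python
# A minimal sketch for driving the hosted demo from Python instead of the web UI.
# The endpoint name and argument list below are placeholders -- run view_api()
# first to see what the Space actually expects.
from gradio_client import Client, handle_file

client = Client("yisol/IDM-VTON")
client.view_api()  # prints the Space's endpoints and their parameters

result = client.predict(
    handle_file("person.jpg"),   # photo of the person (placeholder argument)
    handle_file("garment.jpg"),  # photo of the garment (placeholder argument)
    api_name="/tryon",           # hypothetical endpoint name -- check view_api()
)
print(result)
```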

Now for a wave of netizens' creations. DeepMind co-founder Mustafa Suleyman in a smiley-mask Shoggoth co-branded T-shirt:

Quite a few netizens said they genuinely want this shirt.

Altman was once again drafted as a model by netizens:

Of course, things do go wrong sometimes. Musk, for example, ends up wearing a knockoff "CUCCI".

Having seen the results, let's look at how IDM-VTON is implemented technically.

Built on a diffusion model

Technically, IDM-VTON is built on a diffusion model, with carefully designed attention modules that improve consistency with the garment image and generate realistic virtual try-on results.

The model architecture roughly consists of three parts (a schematic code sketch follows the list):

TryonNet: the main UNet, which processes the person image.

IP-Adapter: an image prompt adapter that encodes high-level semantics of the garment image.

GarmentNet: a parallel UNet that extracts low-level features of the garment.
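For orientation, here is a highly simplified PyTorch-style skeleton of how the three branches fit together. The class and argument names are illustrative and not taken from the released code; the real implementation wires these signals into the UNet's attention layers rather than passing them as plain keyword arguments.

```python
# Illustrative skeleton of the three-branch layout (not the authors' code).
import torch.nn as nn

class IDMVTONSketch(nn.Module):
    def __init__(self, tryon_unet: nn.Module, garment_unet: nn.Module, ip_adapter: nn.Module):
        super().__init__()
        self.tryon_unet = tryon_unet      # main UNet over the person-image latents
        self.garment_unet = garment_unet  # parallel UNet over the garment image
        self.ip_adapter = ip_adapter      # encodes high-level garment semantics

    def forward(self, person_latents, garment_latents, garment_image, text_emb):
        garment_feats = self.garment_unet(garment_latents)  # low-level garment features
        garment_tokens = self.ip_adapter(garment_image)     # high-level garment tokens
        # TryonNet denoises the person latents, conditioned on both garment
        # signals plus the text embeddings (illustrative call signature).
        return self.tryon_unet(person_latents,
                               garment_features=garment_feats,
                               garment_tokens=garment_tokens,
                               text_embeddings=text_emb)
```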

As input to the main UNet, the researchers combine the noisy latent of the person image, the segmentation mask, the masked person image, and DensePose data.
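In code, that input assembly amounts to a channel-wise concatenation of the conditioning signals before they enter the main UNet. A minimal sketch, with illustrative channel counts (4 latent channels for each image-like input, 1 for the mask):

```python
# Illustrative input assembly for the main UNet (channel counts are illustrative).
import torch

batch, h, w = 1, 64, 48
noisy_latents = torch.randn(batch, 4, h, w)  # noisy latent of the person image
mask          = torch.randn(batch, 1, h, w)  # segmentation mask of the try-on region
masked_image  = torch.randn(batch, 4, h, w)  # latent of the person image with the region masked out
densepose     = torch.randn(batch, 4, h, w)  # latent of the DensePose rendering

unet_input = torch.cat([noisy_latents, mask, masked_image, densepose], dim=1)
print(unet_input.shape)  # torch.Size([1, 13, 64, 48]) -> the UNet's input channels
```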

They also add a detailed description of the garment, such as (V) = "short-sleeved crew-neck T-shirt." This description is then used in the input prompts for both GarmentNet (e.g., "a photo of (V)") and TryonNet (e.g., "the model is wearing (V)").
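Building the two prompts from the same garment description is straightforward; the caption string below is just the example from the text:

```python
# Build the paired prompts from one garment description (example caption only).
garment_caption = "short-sleeved crew-neck T-shirt"

garment_prompt = f"a photo of {garment_caption}"            # prompt for GarmentNet
tryon_prompt   = f"the model is wearing {garment_caption}"  # prompt for TryonNet

print(garment_prompt)  # a photo of short-sleeved crew-neck T-shirt
print(tryon_prompt)    # the model is wearing short-sleeved crew-neck T-shirt
```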

The intermediate features produced by TryonNet and GarmentNet are concatenated and passed through a self-attention layer, and only the first half of the output (the part corresponding to TryonNet) is kept. These features are then fused with the outputs of the text encoder and the IP-Adapter through cross-attention layers.
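Schematically, the fusion inside one attention block looks roughly like the following. The attention layers here are generic stand-ins and all shapes are illustrative, so this only traces the data flow described above rather than reproducing the actual layers:

```python
# Rough schematic of the feature fusion in one attention block (shapes illustrative).
import torch
import torch.nn as nn

dim, n_tokens, batch = 320, 1024, 1
self_attn  = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

tryon_feats   = torch.randn(batch, n_tokens, dim)  # intermediate features from TryonNet
garment_feats = torch.randn(batch, n_tokens, dim)  # intermediate features from GarmentNet
text_emb      = torch.randn(batch, 77, dim)        # text-encoder features
ip_tokens     = torch.randn(batch, 4, dim)         # IP-Adapter image tokens

# 1) Concatenate the two branches and run self-attention over the joint sequence.
joint = torch.cat([tryon_feats, garment_feats], dim=1)
joint, _ = self_attn(joint, joint, joint)

# 2) Keep only the first half (the TryonNet positions).
fused = joint[:, :n_tokens]

# 3) Fuse with the text and IP-Adapter features via cross-attention
#    (a simplification: the real IP-Adapter uses its own cross-attention branch).
context = torch.cat([text_emb, ip_tokens], dim=1)
out, _ = cross_attn(fused, context, context)
print(out.shape)  # torch.Size([1, 1024, 320])
```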

Finally, the researchers fine-tuned the TryonNet and IP-Adapter modules while keeping the rest of the model frozen.
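In training code, that boils down to enabling gradients on TryonNet and the IP-Adapter while leaving the other components frozen. A minimal sketch, with placeholder modules standing in for the real ones:

```python
# Freeze/unfreeze pattern described in the text (placeholder modules only).
import torch
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

# Placeholder modules standing in for the real components.
tryon_unet, garment_unet, ip_adapter = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)

set_trainable(tryon_unet, True)     # fine-tuned
set_trainable(ip_adapter, True)     # fine-tuned
set_trainable(garment_unet, False)  # frozen (likewise the text encoder and VAE)

# Only the trainable parameters go to the optimizer.
params = [p for m in (tryon_unet, ip_adapter) for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params, lr=1e-5)
```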

In the experiments, they trained the model on the VITON-HD dataset and evaluated it on VITON-HD, DressCode, and an internally collected In-the-Wild dataset.

IDM-VTON outperforms previous methods both qualitatively and quantitatively.

IDM-VTON can generate realistic images and preserve fine-grained details of clothing.

For more details, interested readers can check out the original paper.

Project link:

  • (1)https://idm-vton.github.io/?continueFlag=589fb545dbbb123446456b65a635d849

  • (2)https://arxiv.org/abs/2403.05139

  • (3)https://huggingface.co/spaces/yisol/IDM-VTON?continueFlag=589fb545dbbb123446456b65a635d849

Reference links:

  • (1)https://twitter.com/multimodalart/status/1782508538213933192

  • (2)https://twitter.com/fffiloni/status/1783158082849108434

  • (3)https://twitter.com/ClementDelangue/status/1783179067803533577
