Can jointly learn from detection and segmentation data toward an open-vocabulary model for both tasks.
Locates the discrepancies between the two tasks/datasets and proposes separate techniques, including a shared semantic space, decoupled decoding, and conditioned mask assistance, to mitigate them.
Method
Largely similar to MaskDINO.
Like DINO, the model follows a two-stage design.
Visual backbone
Encoder → features
Encoder's feature selection (two-stage manner)
Decoder → mask head, bbox head
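The pipeline above can be sketched as a single forward pass. This is a minimal illustration under assumed module names (`backbone`, `encoder`, `decoder`, `mask_head`, `bbox_head` are placeholders, and the token-scoring rule is a stand-in), not the actual implementation.

```python
import torch

def two_stage_forward(backbone, encoder, decoder, mask_head, bbox_head,
                      image, k=100):
    """Sketch of the MaskDINO-style two-stage flow (names are placeholders)."""
    feats = encoder(backbone(image))            # (N, D) encoder tokens
    # Stage 1: score every encoder token and keep the top-k as decoder queries.
    scores = feats.norm(dim=-1)                 # stand-in scoring rule
    queries = feats[scores.topk(k).indices]     # (k, D)
    # Stage 2: the decoder refines the queries; separate heads predict
    # masks and boxes from the refined query embeddings.
    hidden = decoder(queries, feats)
    return mask_head(hidden), bbox_head(hidden)
```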
The key difference is that MaskDINO has no language component.
Language-guided foreground query selection
The decoder contains a limited number of foreground queries (typically a few hundred),
making it hard to cover all possible concepts in an image.
The top-k encoder features are selected as two-stage queries based on the similarity score between encoder features and text features.
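A minimal sketch of this selection step, assuming cosine similarity and a max-over-concepts score (the function and argument names are illustrative; the paper's exact scoring may differ):

```python
import torch

def language_guided_query_selection(enc_feats, text_feats, k=300):
    """Pick top-k encoder tokens by image-text similarity (sketch).

    enc_feats:  (N, D) flattened encoder features for one image
    text_feats: (C, D) text embeddings of the concept/category names
    k: number of foreground queries to keep (a few hundred in practice)
    """
    # Cosine similarity between every encoder token and every text embedding.
    enc = torch.nn.functional.normalize(enc_feats, dim=-1)
    txt = torch.nn.functional.normalize(text_feats, dim=-1)
    sim = enc @ txt.t()                      # (N, C)
    # Score each token by its best-matching concept, keep the top-k tokens.
    scores, _ = sim.max(dim=-1)              # (N,)
    topk_idx = scores.topk(k).indices        # (k,)
    return enc_feats[topk_idx], topk_idx
```

The selected features then serve as the initial decoder queries, so the query budget is spent on regions that match some text concept.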
Bridge Data Gap: Conditioned Mask Decoding
The ultimate goal is to bridge the data gap so that multiple tasks can be trained with a single loss function.
Detection datasets contain only coarse location (bbox) and class information.
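One way to read "conditioned mask decoding" is that, on detection-only data, the ground-truth box and class embedding are themselves encoded into a query from which a mask is decoded. The sketch below illustrates that idea only; the class name, layer sizes, and query construction are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConditionedMaskDecoding(nn.Module):
    """Sketch: turn GT boxes + class embeddings into mask-decoding queries."""

    def __init__(self, d_model=256):
        super().__init__()
        self.box_embed = nn.Linear(4, d_model)       # embed (cx, cy, w, h)
        self.mask_embed = nn.Linear(d_model, d_model)

    def forward(self, gt_boxes, class_embeds, pixel_feats):
        # gt_boxes:     (M, 4) normalized GT boxes from a detection dataset
        # class_embeds: (M, D) text embeddings of the GT class names
        # pixel_feats:  (D, H, W) per-pixel features
        queries = self.box_embed(gt_boxes) + class_embeds        # (M, D)
        mask_feats = self.mask_embed(queries)                    # (M, D)
        # Dot product with pixel features yields one mask logit map per box.
        masks = torch.einsum("md,dhw->mhw", mask_feats, pixel_feats)
        return masks
```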