Standard contrastive language-image pre-training can neglect objects in visual scenes. ItemizedCLIP forces models to learn and attend to all described items, resulting in better visual representations.
Recent visual agents can score well while using image tools unfaithfully (e.g., cropping irrelevant regions or ignoring tool outputs). CodeV represents tools as executable Python code and trains with Tool-Aware Policy Optimization (TAPO), which uses process-level rewards on visual tool inputs and outputs to improve both accuracy and faithful tool use on search and broader multimodal benchmarks.
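To make "process-level reward on tool inputs" concrete, here is a minimal sketch that adds a bonus for crop calls overlapping a hypothetical annotated target region (IoU above a threshold) to the usual answer-correctness reward. The names `Box`, `trajectory_reward`, `iou_threshold`, and `process_weight` are illustrative assumptions. This is not TAPO's actual reward, and it covers only the tool-input side, not checks on tool outputs.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned crop box (hypothetical tool-call argument)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Negative widths/heights (no overlap) clamp to zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def trajectory_reward(answer_correct: bool, crops: list, target: Box,
                      iou_threshold: float = 0.3, process_weight: float = 0.5) -> float:
    """Outcome reward plus a process term for the fraction of crops that hit the target."""
    outcome = 1.0 if answer_correct else 0.0
    if not crops:
        return outcome
    faithful = sum(iou(c, target) >= iou_threshold for c in crops) / len(crops)
    return outcome + process_weight * faithful
```

Under this sketch, a correct answer reached through irrelevant crops earns less than one reached through relevant crops, which is the behavior the summary describes rewarding.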
CLIPred is a framework that jointly optimizes the I-JEPA self-supervision and CLIP language supervision objectives for visual representation learning, outperforming either alone and achieving better zero-shot transfer than DINOv2+CLIP at lower training cost.
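Jointly optimizing the two objectives can be sketched as a weighted sum of a symmetric CLIP (InfoNCE) loss and an I-JEPA-style regression loss between predicted and target patch embeddings. The weight `lam`, the mean-squared-error form of the prediction loss, and the helper names are assumptions for illustration; the target embeddings (produced by a separate target encoder in I-JEPA) are simply passed in here, and CLIPred's exact formulation may differ.

```python
import math


def clip_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE over L2-normalized image/text embeddings (lists of floats)."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    n = len(img)
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Matching pair (the diagonal) is the target class for each row.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def jepa_loss(predicted, target):
    """Mean squared error between predicted and target patch embeddings."""
    diffs = [(p - t) ** 2 for pv, tv in zip(predicted, target) for p, t in zip(pv, tv)]
    return sum(diffs) / len(diffs)


def joint_loss(img, txt, predicted, target, lam=1.0):
    """Hypothetical combined objective: language supervision + weighted self-supervision."""
    return clip_loss(img, txt) + lam * jepa_loss(predicted, target)
```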
This paper introduces Restorative Step-Calibrated Diffusion (RSCD) for biomedical optical image restoration, improving reconstruction fidelity by adapting denoising dynamics to the characteristics of microscopy data.
SimCLIP is a generalized framework for CLIP fine-tuning that constructs minibatches containing clusters of similar image-text pairs to produce harder in-batch negatives, improving downstream performance over standard CLIP fine-tuning without hand-crafted hard negative captions.
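The batching idea can be sketched as follows: group pairs by embedding similarity, split each group into small chunks, and assemble minibatches from several chunks, so each pair shares its batch with near-duplicates that serve as harder negatives. Grouping with plain k-means is our swapped-in choice, and `kmeans`, `clustered_batches`, `group_size`, and `groups_per_batch` are hypothetical names; SimCLIP's actual clustering and batch composition may differ.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def clustered_batches(embeddings, k, group_size, groups_per_batch, seed=0):
    """Build minibatches made of several clusters of similar image-text pairs."""
    assign = kmeans(embeddings, k, seed=seed)
    by_cluster = defaultdict(list)
    for idx, c in enumerate(assign):
        by_cluster[c].append(idx)
    rng = random.Random(seed)
    groups = []
    for c in sorted(by_cluster):
        idxs = by_cluster[c]
        rng.shuffle(idxs)
        # Chunk each cluster into small groups of mutually similar pairs.
        groups.extend(idxs[s:s + group_size] for s in range(0, len(idxs), group_size))
    rng.shuffle(groups)
    # Concatenate several similar-pair groups into each minibatch.
    return [[i for g in groups[s:s + groups_per_batch] for i in g]
            for s in range(0, len(groups), groups_per_batch)]
```

In practice the embeddings would come from a pretrained encoder; each batch is then fed to the standard contrastive loss unchanged, so no hand-written hard-negative captions are needed.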
This work proposes Masked Slice Diffusion for Super-Resolution (MSDSR), a strategy for volumetric biomedical super-resolution trained with only 2D supervision, enabling high-quality 3D reconstruction when fully paired 3D labels are scarce.
This study introduces Slide Pre-trained Transformers (SPT), a self-supervised framework for whole-slide representation learning that captures multiscale histologic structure to support downstream pathology tasks with limited manual annotation.
HiDisc is a self-supervised learning method that leverages the inherent patient-slide-patch hierarchy of biomedical microscopy to learn stronger visual representations without explicit negative mining.
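One way to read "leveraging the patient-slide-patch hierarchy" is that positives are defined by shared ancestry: two views of the same patch, patches from the same slide, or patches from the same patient. The sketch below builds such positive masks for a batch; every other batch member is then an implicit negative, so no negatives are mined explicitly. The `Patch` record, `LEVEL_KEYS`, and `positive_mask` are hypothetical names, and the contrastive loss that would consume these masks is omitted.

```python
from collections import namedtuple

# Hypothetical per-sample metadata: which patient, slide, and patch a view came from.
Patch = namedtuple("Patch", ["patient", "slide", "patch_id"])

# Key two samples must share to count as positives at each hierarchy level.
LEVEL_KEYS = {
    "patch": lambda p: (p.patient, p.slide, p.patch_id),
    "slide": lambda p: (p.patient, p.slide),
    "patient": lambda p: (p.patient,),
}


def positive_mask(batch, level):
    """mask[i][j] is True when j is a positive for anchor i at `level` (i != j)."""
    key = LEVEL_KEYS[level]
    keys = [key(p) for p in batch]
    n = len(batch)
    return [[i != j and keys[i] == keys[j] for j in range(n)] for i in range(n)]
```

Coarser levels yield strictly larger positive sets, which is how the hierarchy supplies progressively broader notions of similarity.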
OpenSRH is the first public dataset of clinical stimulated Raman histology images from brain tumor patients, released alongside benchmarks to accelerate machine learning research for intraoperative brain tumor diagnosis.
This paper develops a weakly supervised denoising approach for stimulated Raman histology, improving image quality in label-free optical microscopy of human brain tumor specimens.