ZeroBytes Methodology

ZeroCLIP

Text-Free Deterministic Image Conditioning for Stable Diffusion

Replace text prompts with 16-byte seeds. Same seed = same image, forever, on any machine.
Seed (concept, style, mood, world)
→ FNV-1a Hash (16 bytes → 64 bits)
→ Conditioning Vector ([D] float32, unit sphere)
→ Diffusion Model (SD 1.x / 2.x / SDXL)
→ Image (deterministic output)

Basis Decomposition CLIP Conditioning

CLIP encodes a vocabulary of ~1024 concepts once, offline, into a static anchor library. At runtime, a seed tuple is hashed and fed into coherent noise to produce weights over this library. The conditioning vector is a weighted sum of anchors, projected onto the unit sphere.

1 Seed (concept, style, mood, world)
2 pack_seed() → 16 bytes → FNV-1a → 64-bit hash
3 coherent_value() × N_anchors → raw weights
4 softmax() → normalized weights [N]
5 weights @ anchors → conditioning [D]
6 L2-normalize → unit sphere [D]
Nearby seeds produce similar weight distributions, so coherent noise yields smooth visual transitions across the semantic space.
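The six steps above can be sketched in plain NumPy. Note the hedges: `coherent_value` here is a hypothetical stand-in (the real coherent noise varies smoothly between nearby seeds, which this simple re-hash does not), and the softmax temperature is an assumed parameter.

```python
import numpy as np

FNV_OFFSET, FNV_PRIME = 0xCBF29CE484222325, 0x100000001B3

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def coherent_value(base_hash: int, i: int) -> float:
    # Hypothetical stand-in: derive a per-anchor value in [0, 1) from the hash.
    child = fnv1a_64(base_hash.to_bytes(8, "little") + i.to_bytes(4, "little"))
    return child / 2**64

def anchor_conditioning(seed_bytes: bytes, anchors: np.ndarray,
                        temperature: float = 0.1) -> np.ndarray:
    n_anchors = anchors.shape[0]
    base = fnv1a_64(seed_bytes)                                          # step 2
    raw = np.array([coherent_value(base, i) for i in range(n_anchors)])  # step 3
    w = np.exp(raw / temperature)                                        # step 4: softmax
    w /= w.sum()
    cond = w @ anchors                                                   # step 5: [D]
    return cond / np.linalg.norm(cond)                                   # step 6: unit sphere
```

Because every step is a pure function of the seed bytes and the anchor matrix, re-running with the same inputs reproduces the vector bit-for-bit.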

Anchor Weight Distribution

Interactive demo: top 10 anchor weights (softmax-normalized) for adjustable concept_id and style_id; nearby seeds transition smoothly.

Latent Coordinate MLP

A tiny neural network (~50k parameters) learns to map 3D coordinates directly to CLIP embedding vectors. Trained offline on a CLIP-encoded vocabulary, the MLP becomes the entire runtime algorithm: at inference, the text encoder is gone.

1 Seed → pack_seed() → FNV-1a → 64-bit hash
2 child_hash(×3) → (cx, cy, cz) ∈ [0,1]³
3 MLP: [3] → 256 → 256 → 256 → [D]
4 L2Normalize → unit sphere [D]
The MLP learns a continuous function over the CLIP manifold; it generalizes to coordinates it never saw during training.
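A minimal NumPy sketch of the inference path above. The little-endian child-hash scheme and the ReLU hidden activations are assumptions; in practice the trained checkpoint supplies the real weights.

```python
import numpy as np

def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def seed_to_coords(seed_bytes: bytes) -> np.ndarray:
    # child_hash(×3): re-hash the base hash with an axis index → (cx, cy, cz) ∈ [0,1]³
    base = fnv1a_64(seed_bytes)
    return np.array([fnv1a_64(base.to_bytes(8, "little") + bytes([axis])) / 2**64
                     for axis in range(3)])

def mlp_conditioning(coords: np.ndarray, layers: list) -> np.ndarray:
    # [3] → 256 → 256 → 256 → [D]; ReLU on hidden layers (activation is an assumption)
    x = coords
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x / np.linalg.norm(x)          # L2-normalize onto the unit sphere [D]
```

Random layers stand in for a trained checkpoint below; the shapes and the normalization are the point, not the output values.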

3D Coordinate Space

Interactive demo: concept (x), style (y), and mood (z) sliders set the 3D coordinate fed to the MLP; a dot marks the current position in the learned latent space.

Pure Entropy Conditioning

Conditioning vectors that have never been text and never will be. A seed is hashed and Box-Muller transformed into D normal samples, then L2-normalized onto the unit hypersphere. In guided mode, PCA projection keeps samples near the CLIP manifold.

1 Seed (u64) → FNV-1a → base hash
2 child_hash(×D) → Box-Muller → D normal samples
3a Pure: L2-normalize → uniform point on S^(D-1)
3b Guided: PCA project → mean + componentsᵀ · z → L2-normalize
These conditioning vectors exist only as positions in latent space, never expressible as text — yet the diffusion model responds to them.
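A sketch of both modes under the same hashing assumptions as the other variants. The `(mean, components)` tuple format for the PCA projection is an assumption about what the .npz file holds.

```python
import math
import numpy as np

def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def entropy_conditioning(world_seed: int, dim: int = 768, projection=None) -> np.ndarray:
    base = fnv1a_64(world_seed.to_bytes(8, "little"))
    def uniform(i: int) -> float:
        # child_hash(×D): one deterministic uniform in (0, 1] per index
        return (fnv1a_64(base.to_bytes(8, "little") + i.to_bytes(4, "little")) + 1) / (2**64 + 1)
    z = np.empty(dim)
    for i in range(0, dim, 2):           # Box-Muller: two normals per uniform pair
        r = math.sqrt(-2.0 * math.log(uniform(i)))
        theta = 2.0 * math.pi * uniform(i + 1)
        z[i] = r * math.cos(theta)
        if i + 1 < dim:
            z[i + 1] = r * math.sin(theta)
    if projection is not None:           # guided: mean + componentsᵀ · z_k
        mean, components = projection    # components assumed shaped [k, dim]
        z = mean + components.T @ z[: components.shape[0]]
    return z / np.linalg.norm(z)         # pure mode: uniform point on S^(D-1)
```

Normalizing i.i.d. Gaussian samples is the standard way to draw uniformly from a hypersphere, which is why the pure mode covers the full sphere rather than clustering near axes.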

Hypersphere Sampling (2D Projection)

Interactive demo: a blend slider morphs between pure entropy (full sphere) and guided samples near the CLIP manifold (green region); a star marks the region anchor.

Self-Bootstrapped Prior Anchors

The most radical variant: no text encoder at any stage. The diffusion model probes random conditioning vectors, measures which produce coherent images, refines them via gradient ascent, and clusters the results into a diverse anchor library. The model discovers its own semantic basis.

P1 Probe: N random vectors → partial denoise → coherence score
P2 Refine: Riemannian gradient ascent on latent sharpness
P3 Cluster: cosine k-means → diverse anchor representatives
RT Runtime: identical to Option A (weighted sum over anchors)
The anchors have no names. Some may correspond to known concepts; others activate visual patterns that language has no word for.
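The clustering stage (P3) can be sketched as spherical k-means over unit vectors. The probe and refine stages are elided here, and the initialization and iteration count are assumptions, not the project's actual settings.

```python
import numpy as np

def cosine_kmeans(vectors: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster unit vectors by cosine similarity; return k unit-norm anchor representatives."""
    rng = np.random.default_rng(seed)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centers = v[rng.choice(len(v), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = (v @ centers.T).argmax(axis=1)       # nearest center by cosine similarity
        for j in range(k):
            members = v[assign == j]
            if len(members):                          # keep old center if a cluster empties
                c = members.sum(axis=0)
                centers[j] = c / np.linalg.norm(c)    # spherical mean, back onto the sphere
    return centers
```

Because the resulting centers are unit vectors of the same shape as CLIP embeddings, the runtime path really can stay identical to Option A: hash the seed, weight the anchors, sum, normalize.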

Bootstrap Pipeline

Interactive demo: random vectors scored by coherence (brighter = higher score); top candidates are selected.

Core Concepts

The foundations shared by all four ZeroCLIP variants.

The Seed

A 4-tuple of integers: (concept_id, style_id, mood_salt, world_seed), packed into 16 bytes. The entire identity of a conditioning vector fits in the space of a single float32 RGBA pixel.

concept_id and style_id are u16 (0–65535). mood_salt is u16. world_seed is u64.
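Three u16 fields plus one u64 total 14 bytes, so this sketch assumes two padding bytes to reach the stated 16; the exact field layout is a guess, but `struct` makes the round-trip explicit.

```python
import struct

# Little-endian: u16 ×3, 2 pad bytes (assumed), u64 → 16 bytes total.
SEED_FMT = "<HHHxxQ"

def pack_seed(concept_id: int, style_id: int, mood_salt: int, world_seed: int) -> bytes:
    return struct.pack(SEED_FMT, concept_id, style_id, mood_salt, world_seed)

def unpack_seed(packed: bytes) -> tuple:
    return struct.unpack(SEED_FMT, packed)
```

pack_seed(1000, 500, 0, 42) always yields the same 16 bytes, so everything hashed from it downstream is stable too.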

The CONDITIONING Format

ComfyUI expects [(tensor[1, 77, D], dict)]. The conditioning vector [D] is repeat-padded across 77 token positions. For SDXL, the dict carries pooled_output.

D=768 for SD1.x, D=1024 for SD2.x, D=2048 for SDXL sequence embeddings.

Determinism Guarantee

FNV-1a is a pure function: same bytes in, same hash out, always. No Math.random(), no Date.now(), no floating-point order dependence. The entire pipeline is a composition of pure functions.

Same seed = byte-identical conditioning = identical image, on any machine, in any session, forever.
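The 64-bit FNV-1a at the heart of this guarantee is only a few lines; the constants are the standard FNV offset basis and prime.

```python
FNV64_OFFSET = 0xCBF29CE484222325   # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3         # standard 64-bit FNV prime

def fnv1a_64(data: bytes) -> int:
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte                                    # xor first (the "1a" order)...
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF   # ...then multiply, wrapped to 64 bits
    return h
```

No randomness, no clock, no floating point: the hash of a packed seed is integer arithmetic only, identical on every machine and in every session.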

Seed Explorer

Enter seed values and see the packed bytes and FNV-1a hash output.


ComfyUI Node Pack

20 nodes across 5 categories. Drop into ComfyUI/custom_nodes/ZeroClip-nodes/ and restart. Zero pip dependencies.

Node | Variant | Inputs | Output | Purpose
ZeroClip Seed Pack | Shared | concept, style, mood, world | ZEROCLIP_SEED | Pack 4-tuple seed
ZeroClip Seed From Random | Shared | seed (INT) | ZEROCLIP_SEED | Derive 4-tuple from single int
ZeroClip Empty Conditioning | Shared | dimension, sdxl | CONDITIONING | Zero vector for negative input
ZeroClip Conditioning Blend | Shared | cond_a, cond_b, weight | CONDITIONING | Weighted interpolation
ZeroClip Conditioning Info | Shared | conditioning | STRING | Debug: shape, norm, stats
ZeroClip-A Load Anchors | A | anchors_file | ANCHORS | Load .npy anchor library
ZeroClip-A Conditioning | A | anchors, seed | CONDITIONING | Anchor weighted sum
ZeroClip-A Conditioning (SDXL) | A | anchors_seq, anchors_pooled, seed | CONDITIONING | SDXL dual-anchor
ZeroClip-A Batch | A | anchors, concept range, batch | CONDITIONING | Sweep concept_id
ZeroClip-B Load MLP | B | model_file | MODEL | Load .pt checkpoint
ZeroClip-B Conditioning | B | model, seed | CONDITIONING | MLP forward pass
ZeroClip-B Conditioning (SDXL) | B | model_seq, model_pooled, seed | CONDITIONING | SDXL dual-MLP
ZeroClip-C Load Projection | C | projection_file | PROJECTION | Load .npz PCA matrix
ZeroClip-C Conditioning | C | seed, mode, dim, (projection) | CONDITIONING | Entropy sampling
ZeroClip-C Coherent Entropy | C | seed, region, blend, mode, dim | CONDITIONING | Regional coherence blend
ZeroClip-C Conditioning (SDXL) | C | seed, mode, dims, (projections) | CONDITIONING | SDXL entropy
ZeroClip-D Load Anchors | D | anchors_file | ANCHORS | Load bootstrap .npy
ZeroClip-D Conditioning | D | anchors, seed | CONDITIONING | Bootstrap anchor sum
ZeroClip-D Conditioning (SDXL) | D | anchors_seq, anchors_pooled, seed | CONDITIONING | SDXL bootstrap
ZeroClip-D Batch | D | anchors, concept range, batch | CONDITIONING | Sweep concept_id

Minimal Workflows

Variant A

ZeroClip Seed Pack (1000, 500, 0, 42)
ZeroClip-A Load Anchors → ZeroClip-A Conditioning
↓ positive
KSampler (+ Empty Conditioning as negative)
VAEDecode → SaveImage

Variant B

ZeroClip Seed Pack (1000, 500, 0, 42)
ZeroClip-B Load MLP → ZeroClip-B Conditioning
↓ positive
KSampler (+ Empty Conditioning as negative)
VAEDecode → SaveImage

Variant C (Pure)

ZeroClip-C Conditioning (seed=42, pure, dim=768)
↓ positive (zero setup required)
KSampler (+ Empty Conditioning as negative)
VAEDecode → SaveImage

Variant D

ZeroClip Seed Pack (1000, 500, 0, 42)
ZeroClip-D Load Anchors → ZeroClip-D Conditioning
↓ positive
KSampler (+ Empty Conditioning as negative)
VAEDecode → SaveImage

Getting Started

01

Install

Copy ZeroClip-nodes/ into ComfyUI/custom_nodes/. Restart ComfyUI. 20 nodes appear under the ZeroClip category. No pip install needed.

02

Build Artifacts

Run the build scripts for your chosen variant. Place output files in ComfyUI/models/zeroclip/. Variant C pure mode needs no build at all.

03

Wire Nodes

Connect Seed Pack or Seed From Random to a Conditioner node. Wire the CONDITIONING output to KSampler's positive input. Use Empty Conditioning for negative.