VOB to SPH Converter

Extract VOB DVD audio as NIST SPHERE speech data online


DVD to Speech Corpus

Extract dialogue from VOB DVD content and package it as NIST SPHERE — ready for speech recognition training and evaluation.

Research-Grade Quality

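DVD soundtracks in VOB files are sampled at 48 kHz (or 96 kHz for some LPCM tracks), which leaves ample bandwidth for speech. SPH output keeps the decoded audio as PCM, so the conversion itself adds no further lossy compression.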

Secure Files

VOB uploads are removed after conversion. SPH output is deleted within 24 hours — your research materials stay confidential.

How to convert VOB to SPH

1

Select files from your computer, Google Drive, Dropbox, or a URL, or drag them onto the page.

2

Choose SPH as the output format (more than 200 formats are supported).

3

Let the file convert, then download your SPH file right away.

About formats

VOB (Video Object) is the primary container format used on DVD-Video discs, defined as part of the DVD specification developed by the DVD Forum. The format first appeared with the DVD standard finalized in September 1996 and has since been used on billions of DVD discs produced worldwide. VOB files are based on the MPEG-2 program stream format, multiplexing MPEG-2 video with audio in AC-3 (Dolby Digital), DTS, MPEG-1 Layer II, or LPCM format. Beyond audio and video, VOB files also carry subtitle streams as bitmap overlays, navigation data for menu interaction, and chapter point information.

The files reside in the VIDEO_TS directory of a DVD disc, with names (VTS_01_1.VOB, etc.) that reflect the title set and part structure of the content. Individual VOB files are limited to approximately 1 GB to accommodate the UDF file system requirements, so longer content spans multiple files that play back seamlessly. The format supports NTSC (720x480) and PAL (720x576) video resolutions, with video bit rates up to 9.8 Mbps within a combined stream maximum of about 10.08 Mbps. Integrating video, multi-track audio, subtitles, and navigation into a single program stream made VOB a complete solution for consumer movie delivery. While streaming and newer disc formats have supplanted DVD for new content, VOB remains essential for accessing the vast library of existing DVD content.
Developer: DVD Forum
Initial release: September 1996
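A batch job usually needs to gather the VOB pieces that belong to one title before extracting audio. The Python sketch below groups file names by the VTS_xx_y.VOB naming convention described above. It assumes the common layout in which part 0 of each title set is a menu and parts 1 to 9 hold the content; `group_title_parts` is a hypothetical helper, not part of any DVD library.

```python
import re
from collections import defaultdict

# Matches title-set files such as VTS_01_1.VOB (title set 01, part 1).
# Assumption: part 0 (e.g. VTS_01_0.VOB) is the title-set menu, not feature content.
VOB_NAME = re.compile(r"^VTS_(\d{2})_(\d)\.VOB$", re.IGNORECASE)


def group_title_parts(filenames):
    """Return {title_set: [part file names in playback order]}, skipping menu parts."""
    groups = defaultdict(list)
    for name in filenames:
        match = VOB_NAME.match(name)
        if match is None:
            continue  # e.g. VIDEO_TS.VOB, .IFO and .BUP files
        title_set, part = int(match.group(1)), int(match.group(2))
        if part == 0:
            continue  # menu VOB
        groups[title_set].append((part, name))
    # Sort by part number so the ~1 GB pieces are processed in playback order.
    return {ts: [name for _, name in sorted(parts)] for ts, parts in sorted(groups.items())}


if __name__ == "__main__":
    listing = ["VIDEO_TS.VOB", "VTS_01_0.VOB", "VTS_01_2.VOB",
               "VTS_01_1.VOB", "VTS_02_1.VOB", "VTS_01_1.IFO"]
    print(group_title_parts(listing))
    # {1: ['VTS_01_1.VOB', 'VTS_01_2.VOB'], 2: ['VTS_02_1.VOB']}
```

Each list can then be handed to the converter in order, so a title split across several files ends up as a contiguous run of SPH outputs.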
SPH is the file extension for audio stored in the NIST SPHERE (SPeech HEader REsources) format, a standard created by the U.S. National Institute of Standards and Technology around 1990. Built for speech research, SPH files carry a 1024-byte ASCII header packed with metadata (database identifiers, channel counts, sample rates, byte ordering, and compression type), making every recording self-describing. The underlying audio is often 16-bit linear PCM sampled at 16 kHz, though other sample rates and encodings, including mu-law and losslessly shorten-compressed data, are permitted. Researchers at NIST, DARPA, and universities worldwide have relied on SPH to distribute speech corpora such as TIMIT, Switchboard, and many other Linguistic Data Consortium (LDC) collections that underpin modern automatic speech recognition systems. Because the header is human-readable, scripts can parse recording metadata without binary decoding. The format's strict standardization also eliminates ambiguity when sharing datasets across institutions and platforms. When SPH files store uncompressed PCM, as they commonly do, they preserve full audio fidelity, which is critical when training acoustic models where even small artifacts can skew results.
Initial release: 1990
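To illustrate the self-describing header, here is a minimal Python sketch that writes and parses a SPHERE header for mono 16-bit PCM. It assumes the standard single 1024-byte header, writes integers as `-i` fields and everything else as `-sN` strings (N being the string length), and pads with spaces. The helper names are hypothetical, and real corpora may use other field types (such as `-r` for reals) or compressed sample coding that this sketch does not handle.

```python
import struct

HEADER_SIZE = 1024  # assumption: standard single-block SPHERE header


def build_sph_header(fields):
    """Build a NIST SPHERE header from {name: int or str}, padded to 1024 bytes."""
    lines = ["NIST_1A", "   1024"]  # magic line, then the header size
    for name, value in fields.items():
        if isinstance(value, int):
            lines.append(f"{name} -i {value}")  # integer field
        else:
            lines.append(f"{name} -s{len(value)} {value}")  # string field with its length
    lines.append("end_head")
    text = ("\n".join(lines) + "\n").encode("ascii")
    if len(text) > HEADER_SIZE:
        raise ValueError("header fields do not fit in 1024 bytes")
    return text.ljust(HEADER_SIZE, b" ")  # pad to the fixed header size


def parse_sph_header(data):
    """Parse the ASCII header at the start of SPH bytes into a dict."""
    fields = {}
    for line in data[:HEADER_SIZE].decode("ascii").splitlines()[2:]:
        if line.startswith("end_head"):
            break
        name, kind, value = line.split(" ", 2)
        fields[name] = int(value) if kind == "-i" else value
    return fields


def write_sph_pcm16(samples, sample_rate):
    """Return SPH file bytes for mono, 16-bit, little-endian PCM samples."""
    header = build_sph_header({
        "sample_count": len(samples),
        "sample_rate": sample_rate,
        "channel_count": 1,
        "sample_n_bytes": 2,
        "sample_byte_format": "01",  # "01" declares little-endian byte order
        "sample_coding": "pcm",
    })
    return header + struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    data = write_sph_pcm16([0, 1000, -1000], 16000)
    print(parse_sph_header(data))
    # {'sample_count': 3, 'sample_rate': 16000, 'channel_count': 1,
    #  'sample_n_bytes': 2, 'sample_byte_format': '01', 'sample_coding': 'pcm'}
```

Setting `sample_byte_format` to "10" would instead declare big-endian samples, in which case the data would need to be packed with `>` rather than `<`.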

Frequently Asked Questions

Why convert VOB to SPH?

SPH is the NIST standard for speech research audio. Converting DVD dialogue from VOB to SPH gives you audio files with a self-describing header that speech tools can load directly for ASR training and linguistic analysis.

What frameworks read SPH?

HTK, Praat, and the NIST SPHERE toolkit read SPH directly, and Kaldi recipes typically decode it on the fly with the sph2pipe utility. SPH remains the standard distribution format for many LDC speech corpora.
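Kaldi-style recipes list recordings in a `wav.scp` file, where each entry maps a recording ID to a file or to a shell pipeline ending in `|`. The sketch below builds such lines for a set of SPH files. It assumes `sph2pipe` is on the PATH and that the `-f wav -p -c N` options (WAV output, convert to PCM, select channel N) seen in common Kaldi data-prep scripts suit your data; `make_wav_scp` is a hypothetical helper.

```python
from pathlib import Path


def make_wav_scp(sph_paths, channel=1):
    """Return sorted wav.scp lines that decode each SPH file through sph2pipe."""
    # Recording IDs come from the file names; Kaldi expects tables sorted by ID.
    entries = sorted((Path(p).stem, str(p)) for p in sph_paths)
    return [f"{rec_id} sph2pipe -f wav -p -c {channel} {path} |" for rec_id, path in entries]


if __name__ == "__main__":
    for line in make_wav_scp(["corpus/dvd_b.sph", "corpus/dvd_a.sph"]):
        print(line)
```

Sorting on the ID rather than the full path keeps the table in the order Kaldi's tools check for, even when files live in different directories.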

Does SPH preserve DVD quality?

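Most DVD soundtracks use AC-3 (Dolby Digital), a lossy format, although some discs carry LPCM. Because SPH output stores PCM, the conversion introduces no additional compression artifacts, but detail already removed by AC-3 encoding cannot be restored.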
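Storing PCM without compression has a predictable cost: file size is roughly the header plus duration × sample rate × bytes per sample × channels. The small Python sketch below does that arithmetic; it assumes the standard 1024-byte header, and `sph_size_bytes` is a hypothetical helper.

```python
SPH_HEADER_BYTES = 1024  # assumption: standard single-block SPHERE header


def sph_size_bytes(seconds, sample_rate, bytes_per_sample=2, channels=1):
    """Estimate the size of an uncompressed PCM SPH file in bytes."""
    samples = round(seconds * sample_rate)
    return SPH_HEADER_BYTES + samples * bytes_per_sample * channels


if __name__ == "__main__":
    # One hour of 16 kHz, 16-bit mono speech.
    print(sph_size_bytes(3600, 16000))  # 115201024
    # One hour of 48 kHz, 16-bit stereo (a DVD-rate stereo track).
    print(sph_size_bytes(3600, 48000, channels=2))  # 691201024
```

At about 115 MB per hour for 16 kHz mono versus about 691 MB for 48 kHz stereo, downmixing and resampling choices matter when a corpus spans many discs.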

Can DVD subtitles help?

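DVD subtitles are stored in VOB files as bitmap images, separate from the audio, and they often condense or paraphrase the spoken dialogue. They must first be converted to text (for example with OCR) and checked before serving as transcripts. Convert the audio to SPH for the speech signal, then pair it with transcription data prepared separately.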
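Pairing audio with a timed transcript comes down to converting each utterance's start and end times into sample indices. The Python sketch below assumes a hypothetical transcript format of (start seconds, end seconds, text) entries; the `Segment` class and `segment_samples` helper are illustrative, not part of any toolkit.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # utterance start time in seconds (hypothetical transcript field)
    end_s: float    # utterance end time in seconds
    text: str       # transcribed words


def segment_samples(segments, sample_rate, total_samples):
    """Map timed segments to (start, end, text) sample ranges, clipped to the audio length."""
    out = []
    for seg in segments:
        start = max(0, round(seg.start_s * sample_rate))
        end = min(total_samples, round(seg.end_s * sample_rate))
        if end > start:  # drop segments that fall entirely outside the audio
            out.append((start, end, seg.text))
    return out


if __name__ == "__main__":
    segs = [Segment(0.5, 1.25, "hello"), Segment(9.9, 12.0, "bye")]
    print(segment_samples(segs, 16000, total_samples=160000))
    # [(8000, 20000, 'hello'), (158400, 160000, 'bye')]
```

Clipping matters because subtitle timings can run past the end of a VOB part when a title is split across several files.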

Is batch processing available?

Yes. Upload multiple VOB files (for example, every VTS_01_*.VOB part of a title) and batch-convert them to SPH to build a speech corpus from an entire DVD efficiently.