Pipeline Presets

Below is an index of all pipeline presets that are "baked in" to Taters. More presets will be added over time.

Tip: In the CLI, run `python -m taters.pipelines.run_pipeline --list-presets` or `python -m taters.pipelines.run_pipeline --describe-preset <id>` for the same info in your terminal.

Quick list

ID	Title	Tags
`conversation_video`	Conversation video → transcripts + features	audio, video, diarization, embeddings

Details

Conversation video → transcripts + features (conversation_video)

Extract audio, diarize, compute Whisper embeddings, then unify transcripts and get a whole boatload of text-based features/measures.

Tags: audio, video, diarization, embeddings
Version: 2
Authors: Ryan L. Boyd

Use cases

- Video recordings of conversations
- Interview studies
- Focus groups
- etc.

Inputs

Key	Value
file type	video files (e.g., mkv, mp4, etc.)

Requirements

Key	Value
cpu	True
gpu/cuda	optional
ffmpeg	True
Extras	diarization, cuda, readability

Variables

Variable	Default	Description
`archetypes_dict_path`	`./dictionaries/archetypes`	A folder or list of .csv archetype dictionaries that you want to apply to the transcripts.
`device`	`auto`	Which device you would like to use for PyTorch-heavy stuff (cpu
`dictionaries_path`	`./dictionaries/liwc`	A folder or list of LIWC-formatted dictionary files (.dicx, .dic, .csv) that you want to apply to the transcripts.
`features_dir`	`./features/`	The base directory where you would like your feature files to be saved.
`num_speakers`	`None`	For diarization - how many speakers would you like to try to cluster the data into?
`overwrite_existing`	`False`	Existing outputs are not overwritten unless this is set to true.
`transcripts_dir`	`./transcripts/`	The base directory where you would like your individual transcripts to be saved.
`whisper_model`	`base`	Faster-Whisper model size

Notes

Safe to re-run; steps short-circuit if outputs exist. See docs for tuning.
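The re-run behavior described in the note above, together with the `overwrite_existing` variable, follows a common "skip if the output already exists" pattern. The sketch below illustrates that general pattern only; it is not Taters' actual implementation, and `run_step`, `produce`, and the file layout are hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable


def run_step(
    output_path: Path,
    produce: Callable[[], str],
    overwrite_existing: bool = False,
) -> bool:
    """Write produce() to output_path unless the file already exists.

    Hypothetical helper: returns True if the step ran, False if it was skipped.
    """
    if output_path.exists() and not overwrite_existing:
        # Short-circuit: an earlier run already produced this output.
        return False
    # Create the parent directory on first use, then write the result.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(produce(), encoding="utf-8")
    return True
```

With a helper like this, a second call for the same output path does nothing unless `overwrite_existing=True` is passed, which is why re-running an interrupted batch only redoes the missing work.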

CLI example

python -m taters.pipelines.run_pipeline --root_dir video-data --file_type video --preset conversation_video --workers 8 --var device=cuda
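If you launch the pipeline from a script rather than typing the command by hand, it can help to assemble the argument list programmatically. The sketch below only builds the command; it does not run it. It uses just the flags shown in the CLI example above. `build_command` is a hypothetical helper, and emitting one `--var key=value` pair per variable is an assumption to check against `--describe-preset` or the CLI help before relying on it for more than one variable.

```python
import shlex
from typing import List, Mapping, Optional


def build_command(
    root_dir: str,
    file_type: str,
    preset: str,
    workers: Optional[int] = None,
    variables: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble an argv list for taters.pipelines.run_pipeline (hypothetical helper)."""
    cmd = [
        "python", "-m", "taters.pipelines.run_pipeline",
        "--root_dir", root_dir,
        "--file_type", file_type,
        "--preset", preset,
    ]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    # Assumption: each variable override is passed as its own --var key=value pair.
    for key, value in (variables or {}).items():
        cmd += ["--var", f"{key}={value}"]
    return cmd


cmd = build_command(
    "video-data", "video", "conversation_video",
    workers=8, variables={"device": "cuda"},
)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` in an environment where Taters is installed; passing a list rather than a single string avoids shell-quoting problems with paths that contain spaces.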