
Pipeline Presets

Below is an index of all pipeline presets that are "baked in" to Taters. This list will grow over time.

Tip: In the CLI, run python -m taters.pipelines.run_pipeline --list-presets or python -m taters.pipelines.run_pipeline --describe-preset <id> for the same info in your terminal.

Quick list

| ID | Title | Tags |
| --- | --- | --- |
| conversation_video | Conversation video → transcripts + features | audio, video, diarization, embeddings |

Details

Conversation video → transcripts + features (conversation_video)

Extract audio, diarize, compute Whisper embeddings, then unify transcripts and get a whole boatload of text-based features/measures.

Tags: audio, video, diarization, embeddings
Version: 2
Authors: Ryan L. Boyd

Use cases

- Video recordings of conversations
- Interview studies
- Focus groups
- etc.

Inputs

| Key | Value |
| --- | --- |
| file type | video files (e.g., mkv, mp4, etc.) |

Requirements

| Key | Value |
| --- | --- |
| cpu | True |
| gpu/cuda | optional |
| ffmpeg | True |
| Extras | diarization, cuda, readability |
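
The requirements above can be checked before launching a long run. The following is a minimal preflight sketch, not part of Taters: the `preflight` function is a hypothetical helper that only looks for `ffmpeg` on the `PATH` and asks PyTorch (if it is installed) whether CUDA is usable. Taters may perform its own checks; this is just a convenience.

```python
import importlib.util
import shutil


def preflight() -> dict:
    """Report whether the tools named in the requirements table look available.

    Hypothetical helper for illustration; not a Taters function.
    """
    report = {
        # ffmpeg is required, so it should be discoverable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Probe for PyTorch without importing it yet.
        "torch": importlib.util.find_spec("torch") is not None,
        # GPU/CUDA is optional; assume unavailable until proven otherwise.
        "cuda": False,
    }
    if report["torch"]:
        try:
            import torch

            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated the same as "no CUDA".
            report["cuda"] = False
    return report


status = preflight()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If `cuda` is reported as missing, the preset should still be runnable on CPU, since the table lists GPU support as optional.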

Variables

| Variable | Default | Description |
| --- | --- | --- |
| archetypes_dict_path | ./dictionaries/archetypes | A folder or list of .CSV archetype dictionaries that you want to apply to the transcripts. |
| device | auto | Which device you would like to use for pytorch-heavy stuff (cpu |
| dictionaries_path | ./dictionaries/liwc | A folder or list full of LIWC-formatted dictionary files (.dicx, .dic, .csv) that you want to apply to the transcripts. |
| features_dir | ./features/ | The base directory where you would like your feature files to be saved. |
| num_speakers | None | For diarization - how many speakers would you like to try to cluster the data into? |
| overwrite_existing | False | Do not overwrite outputs unless true |
| transcripts_dir | ./transcripts/ | The base directory where you would like your individual transcripts to be saved. |
| whisper_model | base | Faster-Whisper model size |

Notes

Safe to re-run; steps short-circuit if outputs exist. See docs for tuning.
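
To illustrate what "short-circuit if outputs exist" means in practice, here is a generic sketch of the pattern. It is not Taters' actual implementation: `run_step` and `write_placeholder` are hypothetical names, and the only thing borrowed from the preset is the idea behind the `overwrite_existing` variable.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_step(
    output: Path,
    produce: Callable[[Path], None],
    overwrite_existing: bool = False,
) -> bool:
    """Run `produce` unless `output` already exists.

    Returns True if work was done, False if the step was skipped.
    Hypothetical helper illustrating the pattern, not Taters code.
    """
    if output.exists() and not overwrite_existing:
        return False  # short-circuit: keep the existing output
    output.parent.mkdir(parents=True, exist_ok=True)
    produce(output)
    return True


def write_placeholder(path: Path) -> None:
    # Stand-in for an expensive step such as transcription.
    path.write_text("transcript placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "transcripts" / "session01.csv"
    first = run_step(target, write_placeholder)   # output missing, so work is done
    second = run_step(target, write_placeholder)  # output exists, so the step is skipped
    forced = run_step(target, write_placeholder, overwrite_existing=True)  # explicit overwrite

print(first, second, forced)  # True False True
```

The practical consequence is that an interrupted run can be restarted with the same command, and only missing outputs should be regenerated unless `overwrite_existing` is set to true.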

CLI example

python -m taters.pipelines.run_pipeline --root_dir video-data --file_type video --preset conversation_video --workers 8 --var device=cuda
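
When the pipeline is launched from a script rather than typed by hand, the same command can be assembled as an argument list. The sketch below uses only the flags shown in the CLI example; `build_command` is a hypothetical helper, and passing several overrides as repeated `--var key=value` pairs is an assumption that the source does not confirm (only a single `--var` is shown).

```python
import shlex
import sys


def build_command(
    root_dir: str,
    file_type: str,
    preset: str,
    workers: int,
    variables: dict,
) -> list:
    """Assemble the run_pipeline invocation as a list of arguments.

    Hypothetical helper; the flags mirror the CLI example above.
    """
    cmd = [
        sys.executable, "-m", "taters.pipelines.run_pipeline",
        "--root_dir", root_dir,
        "--file_type", file_type,
        "--preset", preset,
        "--workers", str(workers),
    ]
    # Assumption: each override is passed as its own `--var key=value` pair.
    for key, value in variables.items():
        cmd += ["--var", f"{key}={value}"]
    return cmd


cmd = build_command("video-data", "video", "conversation_video", 8, {"device": "cuda"})
print(shlex.join(cmd))

# To actually launch the pipeline (requires Taters to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building a list instead of a single string avoids shell-quoting problems when a directory name contains spaces.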