
Kaldi test error: utils/split_scp.pl: Refusing to split data because number of speakers 2 is less than the

Popularity: 55   Published: 2023-12-15 05:24:55.0

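As a test, I created a new folder named robin holding audio from two speakers (2 x 10 wav files in total) and ran ./test_cos.sh ~/kaldi/egs/sre16/v2/robin/sub_TIMIT_test
The run failed at the very end with:
sid/compute_vad_decision.sh: moving data_test/vad.scp to data_test/.backup
utils/split_scp.pl: Refusing to split data because number of speakers 2 is less than the number of output .scp files 4
So how does the test actually work? Does it need data from at least 4 speakers?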

I then added the data for all speakers (168) and tried again. The same error came back: utils/split_scp.pl: Refusing to split data because number of speakers 2 is less than the number of output .scp files 4

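Perhaps there were strict requirements on where the data is stored, so I moved the files into ~/kaldi/data/.

Still the same error.
Following the error message more closely, I went into ~/kaldi/egs/sre16/v2/steps and changed the nj parameter in make_mfcc.sh from nj=4 to nj=1. The error persisted: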

steps/make_mfcc.sh: Succeeded creating MFCC features for data_test
sid/compute_vad_decision.sh --nj 4 --cmd run.pl data_test exp/make_vad_test mfcc_test
utils/split_scp.pl: Refusing to split data because number of speakers 2 is less than the number of output .scp files 4
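The message compares two numbers: the speakers found in the data directory and the number of parallel jobs requested with --nj. A quick way to see the first number is to count the distinct speaker IDs in utt2spk. The following is only a minimal sketch, assuming the usual Kaldi utt2spk layout of one "<utt-id> <spk-id>" pair per line; the function name count_speakers and the demo file are made up for illustration and are not part of Kaldi.

```shell
#!/bin/bash
# count_speakers <utt2spk>: print the number of distinct speaker IDs.
# Assumes each line looks like "<utt-id> <spk-id>" (hypothetical helper, not a Kaldi script).
count_speakers() {
  awk '{print $2}' "$1" | sort -u | wc -l | tr -d ' '
}

# Demo on a throwaway file with two speakers and three utterances.
demo_dir=$(mktemp -d)
printf 'spkA-utt1 spkA\nspkA-utt2 spkA\nspkB-utt1 spkB\n' > "$demo_dir/utt2spk"

num_spk=$(count_speakers "$demo_dir/utt2spk")
echo "speakers: $num_spk"
```

On a real setup the same function could be pointed at data_test/utt2spk; if the number it prints is smaller than the --nj value shown in the log, the split is refused.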

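I later learned that the default values inside make_mfcc.sh can be left alone, because the wrapper script passes --nj explicitly on the command line (note the --nj 4 in the log above), and that value is the one actually used. What needs editing is the extract_xvectors.sh wrapper that I invoke as ./extract_xvectors.sh ~/kaldi/data/TIMIT_test exp/xvector_test: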

#!/bin/bash
if [ $# != 2 ]; then
  echo "Usage: $0 <data-path>"
  echo " $0 ~/kaldi/data/TIMIT_test exp/xvector_TIMIT_test"
  exit 1;
fi
data_set=$1
xvector_set=$2
nnet_dir=exp/xvector_nnet_1a
. ./cmd.sh
. ./path.sh
set -e
mfccdir=`pwd`/mfcc
vaddir=`pwd`/mfcc
rm -rf exp/make_mfcc_test mfcc_test exp/make_vad_test
mkdir -p exp/make_vad_test exp/make_mfcc_test mfcc_test $xvector_set
#produce spk2utt utt2spk wav.scp
cj_script/data_test_prep.sh $data_set data_test
#produce trials
python produce_trials.py data_test/utt2spk data_test/trials
cp data_test/trials $xvector_set/trials
utils/fix_data_dir.sh data_test
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 1 --cmd "$train_cmd" \
  data_test exp/make_mfcc_test mfcc_test
sid/compute_vad_decision.sh --nj 1 --cmd "$train_cmd" \
  data_test exp/make_vad_test mfcc_test
# extract xvectors
sid/nnet3/xvector/extract_xvectors.sh --cmd "$train_cmd --mem 60G" --nj 1 --use-gpu false \
  $nnet_dir data_test \
  $xvector_set

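In this script, change the three occurrences of --nj 4 to --nj 1 (the listing above already shows the modified values).
Running ./extract_xvectors.sh ~/kaldi/data/TIMIT_test exp/xvector_test again now completes without the error: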

found data files 1680
Preparing data_test ......
cj_script/data_test_prep.sh: data preparation succeeded
fix_data_dir.sh: kept 20 utterances out of 1680
fix_data_dir.sh: old files are kept in data_test/.backup
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 1 --cmd run.pl data_test exp/make_mfcc_test mfcc_test
steps/make_mfcc.sh: moving data_test/feats.scp to data_test/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data_test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for data_test
sid/compute_vad_decision.sh --nj 1 --cmd run.pl data_test exp/make_vad_test mfcc_test
Created VAD output for data_test
sid/nnet3/xvector/extract_xvectors.sh --cmd run.pl --mem 60G --nj 1 --use-gpu false exp/xvector_nnet_1a data_test exp/xvector_test
sid/nnet3/xvector/extract_xvectors.sh: using exp/xvector_nnet_1a/extract.config to extract xvectors
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors for data_test
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors from nnet
sid/nnet3/xvector/extract_xvectors.sh: combining xvectors across jobs
sid/nnet3/xvector/extract_xvectors.sh: computing mean of xvectors for each speaker
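Hard-coding --nj 1 works but gives up parallelism on larger test sets. An alternative, sketched here under assumptions rather than taken from the original script, is to cap the requested job count at the number of speakers before calling the Kaldi scripts. The helper name safe_nj and the demo data are hypothetical; the only thing assumed about Kaldi is the "<utt-id> <spk-id>" layout of utt2spk.

```shell
#!/bin/bash
# safe_nj <requested-nj> <utt2spk>: print min(requested-nj, number of speakers), never below 1.
# Hypothetical helper for illustration; it is not part of Kaldi.
safe_nj() {
  local requested=$1 utt2spk=$2 num_spk
  # Count distinct speaker IDs (second column of utt2spk).
  num_spk=$(awk '{print $2}' "$utt2spk" | sort -u | wc -l | tr -d ' ')
  if [ "$num_spk" -lt 1 ]; then num_spk=1; fi
  if [ "$requested" -gt "$num_spk" ]; then
    echo "$num_spk"
  else
    echo "$requested"
  fi
}

# Demo: two speakers, four jobs requested, so the job count is capped at 2.
demo_dir=$(mktemp -d)
printf 'spkA-utt1 spkA\nspkA-utt2 spkA\nspkB-utt1 spkB\n' > "$demo_dir/utt2spk"
nj=$(safe_nj 4 "$demo_dir/utt2spk")
echo "nj: $nj"
```

In the wrapper this could be used as nj=$(safe_nj 4 data_test/utt2spk) right after utils/fix_data_dir.sh data_test, with --nj $nj passed to the three calls in place of the literal value.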