Gender Recognition by Voice with TensorFlow in Python



Voice gender recognition is a technique that determines the gender of a speaker by processing speech signals. In this tutorial, we will try to classify gender by voice using the TensorFlow framework in Python.

Gender recognition can be useful in many fields, including automatic speech recognition, where it can help improve the performance of those systems. It can also be used to categorize calls by gender, or you can add it as a feature to a virtual assistant so that it can distinguish the caller's gender.

1. Preparing the Dataset


We won't be using raw audio data, since audio samples can be of arbitrary length and can be problematic in terms of noise. As a result, we need to perform some kind of feature extraction before feeding anything into a neural network.

Feature extraction is always the first stage of any speech analysis task. It takes audio of any length as input and outputs a fixed-length vector that is suitable for classification. Examples of feature extraction methods are MFCCs and the Mel spectrogram.
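To make that concrete, here is a minimal sketch (assuming librosa is installed; "sample.wav" is just a placeholder path) of how a signal of any length can be reduced to a fixed 128-length vector by averaging a Mel spectrogram over time:

import librosa
import numpy as np

# load the audio at librosa's default sample rate (22050 Hz)
y, sr = librosa.load("sample.wav")
# the Mel spectrogram has shape (n_mels, n_frames); n_frames depends on the audio length
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
# averaging over the time axis gives a fixed-length vector
mel_vector = np.mean(mel.T, axis=0)
print(mel_vector.shape)  # (128,)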

We will use Mozilla's Common Voice Dataset (https://www.kaggle.com/mozillaorg/common-voice), a dataset of speech read by users on the Common Voice website, intended for training and testing automatic speech recognition systems. After looking through the dataset, I found that many samples are actually labeled in the gender column, so we can extract those labeled samples and perform gender recognition.

Here is what I did to prepare the dataset for voice gender recognition:

First, I kept only the samples that are labeled in the gender field.

Then I balanced the dataset so that the number of female samples equals the number of male samples; this helps keep the neural network from overfitting to one particular gender.

Finally, I used the Mel spectrogram extraction technique to get a 128-length vector from each voice sample.
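The filtering and balancing steps above might look roughly like this (a minimal sketch, not the author's exact script; it assumes one of the Common Voice metadata CSVs with filename and gender columns, and "balanced.csv" is just a placeholder output name):

import pandas as pd

df = pd.read_csv("cv-valid-train.csv")
# keep only rows explicitly labeled male or female
df = df[df["gender"].isin(["male", "female"])]
# balance the two classes by downsampling the larger one
n = min((df["gender"] == "male").sum(), (df["gender"] == "female").sum())
balanced = pd.concat([
    df[df["gender"] == "male"].sample(n, random_state=42),
    df[df["gender"] == "female"].sample(n, random_state=42),
])
balanced.to_csv("balanced.csv", index=False)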

You can find the dataset prepared for this tutorial in this repository (https://github.com/x4nth055/gender-recognition-by-voice).

Also, if you want to generate the dataset yourself, run the following script, which converts the .mp3 files into .npy feature files:

import glob
import os
import pandas as pd
import numpy as np
import shutil
import librosa
from tqdm import tqdm


def extract_feature(file_name, **kwargs):
    """
    Extract feature from audio file `file_name`
        Features supported:
            - MFCC (mfcc)
            - Chroma (chroma)
            - MEL Spectrogram Frequency (mel)
            - Contrast (contrast)
            - Tonnetz (tonnetz)
        e.g:
        `features = extract_feature(path, mel=True, mfcc=True)`
    """
    mfcc = kwargs.get("mfcc")
    chroma = kwargs.get("chroma")
    mel = kwargs.get("mel")
    contrast = kwargs.get("contrast")
    tonnetz = kwargs.get("tonnetz")
    X, sample_rate = librosa.core.load(file_name)
    if chroma or contrast:
        stft = np.abs(librosa.stft(X))
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T,axis=0)
        result = np.hstack((result, chroma))
    if mel:
        mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T,axis=0)
        result = np.hstack((result, mel))
    if contrast:
        contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T,axis=0)
        result = np.hstack((result, contrast))
    if tonnetz:
        tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T,axis=0)
        result = np.hstack((result, tonnetz))
    return result

dirname = "data"

if not os.path.isdir(dirname):
    os.mkdir(dirname)


csv_files = glob.glob("*.csv")

for j, csv_file in enumerate(csv_files):
    print("[+] Preprocessing", csv_file)
    df = pd.read_csv(csv_file)
    # only take filename and gender columns
    new_df = df[["filename", "gender"]]
    print("Previously:", len(new_df), "rows")
    # take only male & female genders (i.e. dropping NaNs & 'other' gender)
    new_df = new_df[np.logical_or(new_df['gender'] == 'female', new_df['gender'] == 'male')]
    print("Now:", len(new_df), "rows")
    new_csv_file = os.path.join(dirname, csv_file)
    # save new preprocessed CSV 
    new_df.to_csv(new_csv_file, index=False)
    # get the folder name
    folder_name, _ = csv_file.split(".")
    audio_files = glob.glob(f"{folder_name}/{folder_name}/*")
    all_audio_filenames = set(new_df["filename"])
    for i, audio_file in tqdm(list(enumerate(audio_files)), f"Extracting features of {folder_name}"):
        splited = os.path.split(audio_file)
        # audio_filename = os.path.join(os.path.split(splited[0])[-1], splited[-1])
        audio_filename = f"{os.path.split(splited[0])[-1]}/{splited[-1]}"
        # print("audio_filename:", audio_filename)
        if audio_filename in all_audio_filenames:
            # print("Copyying", audio_filename, "...")
            src_path = f"{folder_name}/{audio_filename}"
            target_path = f"{dirname}/{audio_filename}"
            #create that folder if it doesn't exist
            if not os.path.isdir(os.path.dirname(target_path)):
                os.mkdir(os.path.dirname(target_path))
            features = extract_feature(src_path, mel=True)
            target_filename = target_path.split(".")[0]
            np.save(target_filename, features)
            # shutil.copyfile(src_path, target_path)

Alright, let's get to the actual steps.

First, install the following libraries with pip:

pip3 install numpy pandas tqdm scikit-learn tensorflow pyaudio librosa
Next, open a new notebook or bfwstudio and import the modules we need:
import pandas as pd
import numpy as np
import os
import tqdm
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping
from sklearn.model_selection import train_test_split
Now, to get the gender of each sample, there is a CSV metadata file (check it here) that links each audio sample's file path to its gender:
df = pd.read_csv("balanced-all.csv")
df.head()
It looks like this:

filename gender
0 data/cv-other-train/sample-069205.npy female
1 data/cv-valid-train/sample-063134.npy female
2 data/cv-other-train/sample-080873.npy female
3 data/cv-other-train/sample-105595.npy female
4 data/cv-valid-train/sample-144613.npy female

Let's look at the end of the dataframe:
df.tail()
Output:

filename gender
66933 data/cv-valid-train/sample-171098.npy male
66934 data/cv-other-train/sample-022864.npy male
66935 data/cv-valid-train/sample-080933.npy male
66936 data/cv-other-train/sample-012026.npy male
66937 data/cv-other-train/sample-013841.npy male

Let's check how many samples we have of each gender:
# get total samples
n_samples = len(df)
# get total male samples
n_male_samples = len(df[df['gender'] == 'male'])
# get total female samples
n_female_samples = len(df[df['gender'] == 'female'])
print("Total samples:", n_samples)
print("Total male samples:", n_male_samples)
print("Total female samples:", n_female_samples)
Output:

Total samples: 66938
Total male samples: 33469
Total female samples: 33469


Perfect, we have a large number of balanced audio samples. The following function loads all the files into a single array; we don't need any batch-generation mechanism here because everything fits in memory (each audio sample is just its extracted features, about 1 KB in size):

# map each gender to an integer label (1 for male, 0 for female)
label2int = {"male": 1, "female": 0}

def load_data(vector_length=128):
    """A function to load gender recognition dataset from `data` folder
    After the second run, this will load from results/features.npy and results/labels.npy files
    as it is much faster!"""
    # make sure results folder exists
    if not os.path.isdir("results"):
        os.mkdir("results")
    # if features & labels already loaded individually and bundled, load them from there instead
    if os.path.isfile("results/features.npy") and os.path.isfile("results/labels.npy"):
        X = np.load("results/features.npy")
        y = np.load("results/labels.npy")
        return X, y
    # read dataframe
    df = pd.read_csv("balanced-all.csv")
    # get total samples
    n_samples = len(df)
    # get total male samples
    n_male_samples = len(df[df['gender'] == 'male'])
    # get total female samples
    n_female_samples = len(df[df['gender'] == 'female'])
    print("Total samples:", n_samples)
    print("Total male samples:", n_male_samples)
    print("Total female samples:", n_female_samples)
    # initialize an empty array for all audio features
    X = np.zeros((n_samples, vector_length))
    # initialize an empty array for all audio labels (1 for male and 0 for female)
    y = np.zeros((n_samples, 1))
    for i, (filename, gender) in tqdm.tqdm(enumerate(zip(df['filename'], df['gender'])), "Loading data", total=n_samples):
        features = np.load(filename)
        X[i] = features
        y[i] = label2int[gender]
    # save the audio features and labels into files
    # so we won't load each one of them next run
    np.save("results/features", X)
    np.save("results/labels", y)
    return X, y
The function above is responsible for reading that CSV file and loading all the audio samples into a single array. This takes some time the first time you run it, but the bundled arrays are saved in the results folder, which saves us that time on every subsequent run.

Now we have a single array, but we still need to split the dataset into training, testing, and validation sets; the function below does exactly that:

def split_data(X, y, test_size=0.1, valid_size=0.1):
    # split training set and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=7)
    # split training set and validation set
    X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=valid_size, random_state=7)
    # return a dictionary of values
    return {
        "X_train": X_train,
        "X_valid": X_valid,
        "X_test": X_test,
        "y_train": y_train,
        "y_valid": y_valid,
        "y_test": y_test
    }


We are using sklearn's convenient train_test_split() function, which shuffles our dataset and splits it into training and testing sets; we then run it again on the training set to get the validation set. Let's use these functions:

# load the dataset
X, y = load_data()
# split the data into training, validation and testing sets
data = split_data(X, y, test_size=0.1, valid_size=0.1)


Now the data dictionary contains everything we need to fit the model, so let's build the model!

2. Building the Model


In this tutorial we will use a deep feed-forward neural network with 5 hidden layers; it is not an ideal architecture, but it gets the job done so far:

def create_model(vector_length=128):
    """5 hidden dense layers from 256 units to 64, not the best model."""
    model = Sequential()
    model.add(Dense(256, input_shape=(vector_length,)))
    model.add(Dropout(0.3))
    model.add(Dense(256, activation="relu"))
    model.add(Dropout(0.3))
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.3))
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.3))
    # one output neuron with sigmoid activation function, 0 means female, 1 means male
    model.add(Dense(1, activation="sigmoid"))
    # using binary crossentropy as it's male/female classification (binary)
    model.compile(loss="binary_crossentropy", metrics=["accuracy"], optimizer="adam")
    # print summary of the model
    model.summary()
    return model



We use a 30% dropout rate after each fully connected layer; this kind of regularization will hopefully prevent overfitting on the training data.

The important thing to note here is that we use a single output unit (neuron) with a sigmoid activation function in the output layer: the model outputs a scalar close to 1 when the speaker is male, and close to 0 when the speaker is female.

Also, we use binary cross-entropy as the loss function, since it is the special case of categorical cross-entropy when there are only 2 classes to predict. Let's use this function to build the model:
# construct the model
model = create_model()
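As a quick sanity check, binary cross-entropy for a single sample with true label y (0 or 1) and predicted probability p is -(y*log(p) + (1-y)*log(1-p)); the tiny sketch below (with arbitrary example numbers) compares the hand-computed value with Keras:

import numpy as np
from tensorflow.keras.losses import binary_crossentropy

y_true, p = 1.0, 0.9  # e.g. the true label is "male" and the model outputs 0.9
manual = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
keras_value = binary_crossentropy([y_true], [p]).numpy()
print(manual, keras_value)  # both are approximately 0.105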


3. Training the Model


Now that we have built the model, let's train it using the dataset we loaded earlier:

# use tensorboard to view metrics
tensorboard = TensorBoard(log_dir="logs")
# define early stopping to stop training after 5 epochs of not improving
early_stopping = EarlyStopping(mode="min", patience=5, restore_best_weights=True)

batch_size = 64
epochs = 100
# train the model using the training set and validating using validation set
model.fit(data["X_train"], data["y_train"], epochs=epochs, batch_size=batch_size, validation_data=(data["X_valid"], data["y_valid"]),
          callbacks=[tensorboard, early_stopping])


We defined two callbacks that run at the end of every epoch:

The first is TensorBoard, which we will use to view the model's loss and accuracy during training.
The second callback is early stopping, which stops training when the model stops improving. I set a patience of 5, which means training stops after 5 epochs without improvement; setting restore_best_weights to True restores the best weights recorded during training and assigns them back to the model.
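Note that ModelCheckpoint is imported above but not actually used; if you also want the best weights written to disk as training progresses, a minimal sketch would be something like this (the file path is only an example), added to the callbacks list alongside tensorboard and early_stopping:

# save the weights whenever the validation loss improves
checkpoint = ModelCheckpoint("results/best_model.h5", monitor="val_loss",
                             save_best_only=True, verbose=1)
# callbacks=[tensorboard, early_stopping, checkpoint]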
Let's save this model:

# save the model to a file
model.save("results/model.h5")


Here is my output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 256) 33024 
_________________________________________________________________
dropout (Dropout) (None, 256) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 65792
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 32896
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_3 (Dense) (None, 128) 16512
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_4 (Dense) (None, 64) 8256
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_5 (Dense) (None, 1) 65
=================================================================
Total params: 156,545
Trainable params: 156,545
Non-trainable params: 0
_________________________________________________________________
Train on 54219 samples, validate on 6025 samples
Epoch 1/100
54219/54219 [==============================] - 8s 143us/sample - loss: 0.5514 - accuracy: 0.7651 - val_loss: 0.3807 - val_accuracy: 0.8508
Epoch 2/100
54219/54219 [==============================] - 5s 93us/sample - loss: 0.4159 - accuracy: 0.8326 - val_loss: 0.3464 - val_accuracy: 0.8536
Epoch 3/100
54219/54219 [==============================] - 5s 93us/sample - loss: 0.3860 - accuracy: 0.8466 - val_loss: 0.3112 - val_accuracy: 0.8744
<..SNIPPED..>
Epoch 16/100
54219/54219 [==============================] - 5s 96us/sample - loss: 0.2864 - accuracy: 0.8936 - val_loss: 0.2387 - val_accuracy: 0.9087
Epoch 17/100
54219/54219 [==============================] - 5s 95us/sample - loss: 0.2824 - accuracy: 0.8945 - val_loss: 0.2464 - val_accuracy: 0.9110
Epoch 18/100
54219/54219 [==============================] - 6s 103us/sample - loss: 0.2887 - accuracy: 0.8920 - val_loss: 0.2406 - val_accuracy: 0.9074
Epoch 19/100
54219/54219 [==============================] - 5s 95us/sample - loss: 0.2822 - accuracy: 0.8939 - val_loss: 0.2435 - val_accuracy: 0.9080
Epoch 20/100
54219/54219 [==============================] - 5s 96us/sample - loss: 0.2813 - accuracy: 0.8957 - val_loss: 0.2567 - val_accuracy: 0.8993
Epoch 21/100
54219/54219 [==============================] - 5s 89us/sample - loss: 0.2759 - accuracy: 0.8962 - val_loss: 0.2442 - val_accuracy: 0.9112


As you can see, training stopped at epoch 21, reaching a validation loss of 0.2387 and almost 91% validation accuracy (recorded at epoch 16).

4. Testing the Model


Now that the model is trained and the best weights have been restored, let's evaluate it on the test set we created earlier:

# evaluating the model using the testing set
print(f"Evaluating the model using {len(data['X_test'])} samples...")
loss, accuracy = model.evaluate(data["X_test"], data["y_test"], verbose=0)
print(f"Loss: {loss:.4f}")
print(f"Accuracy: {accuracy*100:.2f}%")
Check this out:

Evaluating the model using 6694 samples...
Loss: 0.2405
Accuracy: 90.95%


Impressively, we reached 91% accuracy on samples the model has never seen. Awesome!

If you open TensorBoard (using the command tensorboard --logdir="logs"), you will see loss and accuracy curves similar to these:


Binary cross-entropy loss during training / Model accuracy during training

The blue curves are the validation set and the orange curves are the training set; you can see the loss decreasing over time and the accuracy steadily improving, which is exactly what we expect!

Now let's test the model with our own voices.

I know this is the exciting part. I made a script that records your voice until you stop speaking (you can speak in any language) and saves the recording to a file; it then extracts the features from that audio and feeds them to the model to get the result:

import librosa
import numpy as np

def extract_feature(file_name, **kwargs):
    """
    Extract feature from audio file `file_name`
        Features supported:
            - MFCC (mfcc)
            - Chroma (chroma)
            - MEL Spectrogram Frequency (mel)
            - Contrast (contrast)
            - Tonnetz (tonnetz)
        e.g:
        `features = extract_feature(path, mel=True, mfcc=True)`
    """
    mfcc = kwargs.get("mfcc")
    chroma = kwargs.get("chroma")
    mel = kwargs.get("mel")
    contrast = kwargs.get("contrast")
    tonnetz = kwargs.get("tonnetz")
    X, sample_rate = librosa.core.load(file_name)
    if chroma or contrast:
        stft = np.abs(librosa.stft(X))
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T,axis=0)
        result = np.hstack((result, chroma))
    if mel:
        mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T,axis=0)
        result = np.hstack((result, mel))
    if contrast:
        contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T,axis=0)
        result = np.hstack((result, contrast))
    if tonnetz:
        tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T,axis=0)
        result = np.hstack((result, tonnetz))
    return result
The function above is responsible for loading an audio file and extracting features from it; the following lines use the argparse module to parse an audio file path passed on the command line and run inference on it:

import os
import argparse
parser = argparse.ArgumentParser(description="""Gender recognition script, this will load the model you trained, 
                                    and perform inference on a sample you provide (either using your voice or a file)""")
parser.add_argument("-f", "--file", help="The path to the file, preferred to be in WAV format")
args = parser.parse_args()
file = args.file
# construct the model (create_model() is the same function defined in the training code above)
model = create_model()
# load the saved/trained weights
model.load_weights("results/model.h5")
if not file or not os.path.isfile(file):
    # if file not provided, or it doesn't exist, use your voice
    print("Please talk")
    # put the file name here
    file = "test.wav"
    # record the file (start talking)
    record_to_file(file)
# extract features and reshape it
features = extract_feature(file, mel=True).reshape(1, -1)
# predict the gender!
male_prob = model.predict(features)[0][0]
female_prob = 1 - male_prob
gender = "male" if male_prob > female_prob else "female"
# show the result!
print("Result:", gender)
print(f"Probabilities::: Male: {male_prob*100:.2f}%    Female: {female_prob*100:.2f}%")
If you execute this as-is it won't work, because the record_to_file() method isn't defined (you can check the full script code here), but it helps me explain the code.

We are using the argparse module to parse the file path passed from the command line; if no file is passed (with the --file or -f argument), the script starts recording from the default microphone.
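For completeness, here is a minimal, hypothetical record_to_file() built on PyAudio (which is in the pip install list above). The author's full script stops recording automatically when you stop speaking; this simplified sketch just records for a fixed number of seconds:

import wave
import pyaudio

def record_to_file(path, seconds=5, rate=16000, chunk=1024):
    """Record `seconds` of mono audio from the default microphone into a WAV file."""
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=rate,
                    input=True, frames_per_buffer=chunk)
    frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    stream.stop_stream()
    stream.close()
    sample_width = p.get_sample_size(pyaudio.paInt16)
    p.terminate()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))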

Then we create the model, load the best weights we trained earlier, extract the features of the passed (or recorded) audio file, and use model.predict() to get the resulting prediction. Here is an example:
$ python test.py --file "test-samples/16-122828-0002.wav"
Output:

Result: female
Probabilities: Male: 20.77% Female: 79.23%


And indeed, the sample, taken from the LibriSpeech dataset (https://www.openslr.org/12), is female!

5. Conclusion


You now have many options for improving the model's accuracy further. One is to experiment with a different model architecture; you can also try convolutional or recurrent networks and see the results! I expect you can reach better than 95% accuracy, and if you do, please share it with us in the comments below!
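For example, since LSTM is already imported in the training code, a recurrent variant could treat the 128-length feature vector as a sequence of 128 time steps with 1 feature each. This is only a rough, untuned sketch, not a recommendation:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

def create_lstm_model(vector_length=128):
    """A rough recurrent alternative to the dense model; expects input of shape (vector_length, 1)."""
    model = Sequential()
    model.add(LSTM(64, input_shape=(vector_length, 1)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.3))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", metrics=["accuracy"], optimizer="adam")
    return model

# the feature arrays need an extra channel axis first, e.g.:
# X_train = data["X_train"][..., np.newaxis]  # shape becomes (n_samples, 128, 1)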

You can also download the original dataset from Kaggle (https://www.kaggle.com/mozillaorg/common-voice) and try another feature extraction technique, such as MFCCs, using the provided extract_feature() function, and then compare the results.
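For instance, you could change the feature extraction line in the preprocessing script to one of the lines below; note that the feature vector is then no longer 128 values long (MFCCs alone give 40 values, MFCC + Mel combined give 168), so the vector_length passed to load_data() and create_model() must be changed to match:

# 40 MFCCs instead of the 128 Mel bands
features = extract_feature(src_path, mfcc=True)            # shape (40,)
# or both feature types concatenated
features = extract_feature(src_path, mfcc=True, mel=True)  # shape (168,)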

Source code: https://github.com/x4nth055/gender-recognition-by-voice

