In modern communication technology, wideband speech coding provides high-fidelity speech transmission, but its high bandwidth requirement limits its use in resource-constrained environments, so narrowband speech coding remains highly relevant. End-to-end neural speech coding has recently made significant progress and has demonstrated better compression performance than traditional methods. However, existing methods still struggle to reconstruct fine detail, especially at low bitrates. To address this, we introduce MSCACodec, a narrowband neural speech codec that achieves strong performance at low bitrates. MSCACodec uses a multi-scale residual and channel attention feature fusion method that selectively focuses on multi-scale information to enhance the feature representation, which addresses the inconsistency of hierarchical information introduced by multi-scale feature fusion. We also propose a Temporal Convolutional Gated Recurrent Unit (TCGRU) module, which combines temporal convolutional networks and gated recurrent units to improve reconstruction quality by exploiting global context and gating mechanisms. Experimental results show that, in both subjective and objective evaluations, MSCACodec reconstructs speech of higher quality than Encodec and HiFiCodec at bitrates of 1.2 kbps and 2.4 kbps, and even outperforms LyraV2 and Opus operating at 6 kbps.
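The abstract names the two building blocks without giving their equations. As a rough illustration only, the NumPy sketch below shows one common way such blocks can be assembled; it is not the MSCACodec implementation. Assumptions: the channel attention is taken to be squeeze-and-excitation-style gating (time-averaged channel descriptor, bottleneck MLP, sigmoid); the multi-scale branches are residual dilated causal convolutions with hypothetical dilations 1, 2 and 4; the GRU has no biases; and all function names (`multiscale_attention_fusion`, `tcgru_block`, etc.) and layer sizes are hypothetical.

```python
import numpy as np

# Generic sketch under stated assumptions; not the authors' MSCACodec code.
rng = np.random.default_rng(0)


def param(*shape):
    """Small random weights (hypothetical initialisation)."""
    return rng.standard_normal(shape) * 0.1


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def causal_conv1d(x, w, dilation=1):
    """Dilated causal 1-D convolution. x: (C_in, T), w: (C_out, C_in, K) -> (C_out, T)."""
    _, _, k = w.shape
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0)))  # left-pad so output t sees only inputs <= t
    T = x.shape[1]
    y = np.zeros((w.shape[0], T))
    for i in range(k):
        # tap i reads the input (k - 1 - i) * dilation frames in the past
        y += w[:, :, i] @ xp[:, i * dilation : i * dilation + T]
    return y


def channel_attention(x, w1, w2):
    """SE-style gating: average over time, bottleneck MLP, one sigmoid gate per channel."""
    s = x.mean(axis=1)                       # (C,) channel descriptor
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # (C,) gates in (0, 1)
    return x * g[:, None]


def multiscale_attention_fusion(x, branches, w1, w2, w_proj):
    """Residual conv branches at several dilations, concatenated, channel-gated, projected back to C."""
    feats = [x + causal_conv1d(x, w, d) for w, d in branches]  # one residual branch per scale
    cat = np.concatenate(feats, axis=0)                         # (S*C, T)
    return x + w_proj @ channel_attention(cat, w1, w2)          # 1x1 projection + outer residual


def gru(x, Wz, Uz, Wr, Ur, Wh, Uh):
    """Minimal bias-free GRU run over time. x: (C_in, T) -> hidden states (H, T)."""
    h = np.zeros(Uz.shape[0])
    states = []
    for t in range(x.shape[1]):
        xt = x[:, t]
        z = sigmoid(Wz @ xt + Uz @ h)             # update gate
        r = sigmoid(Wr @ xt + Ur @ h)             # reset gate
        h_cand = np.tanh(Wh @ xt + Uh @ (r * h))  # candidate state
        h = (1 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states, axis=1)


def tcgru_block(x, w_tc, gru_params, dilation=2):
    """Temporal conv for local context, then a GRU for gated long-range context, plus a residual."""
    local = np.maximum(causal_conv1d(x, w_tc, dilation), 0)  # ReLU
    return x + gru(local, *gru_params)


C, T, K, S, R = 4, 16, 3, 3, 4  # channels, frames, kernel, scales, bottleneck (all hypothetical)
x = rng.standard_normal((C, T))
branches = [(param(C, C, K), d) for d in (1, 2, 4)]
y = multiscale_attention_fusion(x, branches, param(R, S * C), param(S * C, R), param(C, S * C))
gru_params = tuple(param(C, C) for _ in range(6))
out = tcgru_block(y, param(C, C, K), gru_params)
print(out.shape)  # (4, 16)
```

Running it prints `(4, 16)`: both blocks preserve the (channels, frames) shape, so they can be stacked. Because the attention gates average over the whole time axis, this particular sketch is not causal; a streaming codec would need a causal or chunk-wise descriptor instead.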
origin_samples1
Codec2-1.2kbps
LyraV2-3.2kbps
Encodec-1.2kbps
HiFiCodec-1.2kbps
MSCACodec-1.2kbps
Codec2-2.4kbps
LyraV2-6kbps
Encodec-2.4kbps
HiFiCodec-2.4kbps
MSCACodec-2.4kbps
origin_samples2
Codec2-1.2kbps
LyraV2-3.2kbps
Encodec-1.2kbps
HiFiCodec-1.2kbps
MSCACodec-1.2kbps
Codec2-2.4kbps
LyraV2-6kbps
Encodec-2.4kbps
HiFiCodec-2.4kbps
MSCACodec-2.4kbps
origin_samples3
Codec2-1.2kbps
LyraV2-3.2kbps
Encodec-1.2kbps
HiFiCodec-1.2kbps
MSCACodec-1.2kbps
Codec2-2.4kbps
LyraV2-6kbps
Encodec-2.4kbps
HiFiCodec-2.4kbps
MSCACodec-2.4kbps