COLOSSEO: wav file format

wav文件格式分析详解

Offset  Size  Name             Description

The canonical WAVE format starts with the RIFF header:

0         4   ChunkID          Contains the letters "RIFF" in ASCII form
                               (0x52494646 big-endian form).
4         4   ChunkSize        36 + SubChunk2Size, or more precisely:
                               4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                               This is the size of the rest of the chunk 
                               following this number.  This is the size of the 
                               entire file in bytes minus 8 bytes for the
                               two fields not included in this count:
                               ChunkID and ChunkSize.
8         4   Format           Contains the letters "WAVE"
                               (0x57415645 big-endian form).

The "WAVE" format consists of two subchunks: "fmt " and "data":
The "fmt " subchunk describes the sound data's format:

12        4   Subchunk1ID      Contains the letters "fmt "
                               (0x666d7420 big-endian form).
16        4   Subchunk1Size    16 for PCM.  This is the size of the
                               rest of the Subchunk which follows this number.
20        2   AudioFormat      PCM = 1 (i.e. Linear quantization)
                               Values other than 1 indicate some 
                               form of compression.
22        2   NumChannels      Mono = 1, Stereo = 2, etc.
24        4   SampleRate       8000, 44100, etc.
28        4   ByteRate         == SampleRate * NumChannels * BitsPerSample/8
32        2   BlockAlign       == NumChannels * BitsPerSample/8
                               The number of bytes for one sample including
                               all channels. I wonder what happens when
                               this number isn't an integer?
34        2   BitsPerSample    8 bits = 8, 16 bits = 16, etc.
          2   ExtraParamSize   if PCM, then doesn't exist
          X   ExtraParams      space for extra parameters

The "data" subchunk contains the size of the data and the actual sound:

36        4   Subchunk2ID      Contains the letters "data"
                               (0x64617461 big-endian form).
40        4   Subchunk2Size    == NumSamples * NumChannels * BitsPerSample/8
                               This is the number of bytes in the data.
                               You can also think of this as the size
                               of the read of the subchunk following this 
                               number.
44        *   Data             The actual sound data.

Notes:

The default byte ordering assumed for WAVE data files is little-endian. Files written using the big-endian byte ordering scheme have the identifier RIFX instead of RIFF.

The sample data must end on an even byte boundary. Whatever that means.

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

There may be additional subchunks in a Wave data stream. If so, each will have a char[4] SubChunkID, and unsigned long SubChunkSize, and SubChunkSize amount of data.

RIFF stands for Resource Interchange File Format.

General discussion of RIFF files:

Multimedia applications require the storage and management of a wide variety of data, including bitmaps, audio data, video data, and peripheral device control information. RIFF provides a way to store all these varied types of data. The type of data a RIFF file contains is indicated by the file extension. Examples of data that may be stored in RIFF files are:

Audio/visual interleaved data (.AVI)

Waveform data (.WAV)

Bitmapped data (.RDI)

MIDI information (.RMI)

Color palette (.PAL)

Multimedia movie (.RMN)

Animated cursor (.ANI)

A bundle of other RIFF files (.BND)

NOTE: At this point, AVI files are the only type of RIFF files that have been fully implemented using the current RIFF specification. Although WAV files have been implemented, these files are very simple, and their developers typically use an older specification in constructing them.
For more info see http://www.ora.com/centers/gff/formats/micriff/index.htm

什么是WAV和PCM？

WAV：wav是一种无损的音频文件格式，WAV符合 PIFF(Resource Interchange File Format)规范。所有的WAV都有一个文件头，这个文件头音频流的编码参数。WAV对音频流的编码没有硬性规定，除了PCM之外，还有几乎所有支持ACM规范的编码都可以为WAV的音频流进行编码。
PCM:PCM（Pulse Code Modulation----脉码调制录音)。所谓PCM录音就是将声音等模拟信号变成符号化的脉冲列，再予以记录。PCM信号是由[1]、[0]等符号构成的数字信号，而未经过任何编码和压缩处理。与模拟信号比，它不易受传送系统的杂波及失真的影响。动态范围宽，可得到音质相当好的影响效果。
简单来说：wav是一种无损的音频文件格式，pcm是没有压缩的编码方式。

WAV和PCM的关系

WAV可以使用多种音频编码来压缩其音频流，不过我们常见的都是音频流被PCM编码处理的WAV，但这不表示WAV只能使用PCM编码，MP3编码同样也可以运用在WAV中，和AVI一样，只要安装好了相应的Decode，就可以欣赏这些WAV了。在Windows平台下，基于PCM编码的WAV是被支持得最好的音频格式，所有音频软件都能完美支持，由于本身可以达到较高的音质的要求，因此，WAV也是音乐编辑创作的首选格式，适合保存音乐素材。因此，基于PCM编码的WAV被作为了一种中介的格式，常常使用在其他编码的相互转换之中，例如MP3转换成WMA。
简单来说：pcm是无损wav文件中音频数据的一种编码方式，但wav还可以用其它方式编码。

文／张明云（简书作者）
原文链接：http://www.jianshu.com/p/1d1f893e53e9
著作权归作者所有，转载请联系作者获得授权，并标注“简书作者”。

PCM的每个样本值包含在一个整数i中，i的长度为容纳指定样本长度所需的最小字节数。

首先存储低有效字节，表示样本幅度的位放在i的高有效位上，剩下的位置为0，这样8位和16位的PCM波形样本的数据格式如下所示。

样本大小数据格式最小值最大值

8位PCM unsigned int 0 225

16位PCM int -32767 32767

一、综述
WAVE文件作为多媒体中使用的声波文件格式之一，它是以RIFF格式为标准的。
RIFF是英文Resource Interchange File Format的缩写，每个WAVE文件的头四个
字节便是“RIFF”。
WAVE文件是由若干个Chunk组成的。按照在文件中的出现位置包括：RIFF WAVE
Chunk, Format Chunk, Fact Chunk(可选), Data Chunk。具体见下图：

其中除了Fact Chunk外，其他三个Chunk是必须的。每个Chunk有各自的ID，位
于Chunk最开始位置，作为标示，而且均为4个字节。并且紧跟在ID后面的是Chunk大
小（去除ID和Size所占的字节数后剩下的其他字节数目），4个字节表示，低字节
表示数值低位，高字节表示数值高位。下面具体介绍各个Chunk内容。
PS：
所有数值表示均为低字节表示低位，高字节表示高位。

    以'FIFF'作为标示，然后紧跟着为size字段，该size是整个wav文件大小减去ID
和Size所占用的字节数，即FileLen - 8 = Size。然后是Type字段，为'WAVE'，表
示是wav文件。
    结构定义如下：
struct RIFF_HEADER
{
  char szRiffID[4];  // 'R','I','F','F'
  DWORD dwRiffSize;
  char szRiffFormat[4]; // 'W','A','V','E'
};

    以'fmt '作为标示。一般情况下Size为16，此时最后附加信息没有；如果为18
则最后多了2个字节的附加信息。主要由一些软件制成的wav格式中含有该2个字节的
附加信息。
    结构定义如下：
struct WAVE_FORMAT
{
  WORD wFormatTag;
  WORD wChannels;
  DWORD dwSamplesPerSec;
  DWORD dwAvgBytesPerSec;
  WORD wBlockAlign;
  WORD wBitsPerSample;
};
struct FMT_BLOCK
{
  char  szFmtID[4]; // 'f','m','t',' '
  DWORD  dwFmtSize;
  WAVE_FORMAT wavFormat;
};

    Fact Chunk是可选字段，一般当wav文件由某些软件转化而成，则包含该Chunk。
    结构定义如下：
struct FACT_BLOCK
{
  char  szFactID[4]; // 'f','a','c','t'
  DWORD  dwFactSize;
};

    Data Chunk是真正保存wav数据的地方，以'data'作为该Chunk的标示。然后是
数据的大小。紧接着就是wav数据。根据Format Chunk中的声道数以及采样bit数，
wav数据的bit位置可以分成以下几种形式：
    ---------------------------------------------------------------------
    |   单声道 |    取样1    |    取样2    |    取样3    |    取样4    |
    |           |--------------------------------------------------------
    | 8bit量化 |    声道0    |    声道0    |    声道0    |    声道0    |
    ---------------------------------------------------------------------
    |   双声道 |          取样1            |           取样2           |
    |           |--------------------------------------------------------
    | 8bit量化 | 声道0(左) | 声道1(右) | 声道0(左) | 声道1(右) |
    ---------------------------------------------------------------------
    |           |          取样1            |           取样2           |
    |   单声道 |--------------------------------------------------------
    | 16bit量化 |    声道0    | 声道0      |    声道0    | 声道0      |
    |           | (低位字节) | (高位字节) | (低位字节) | (高位字节) |
    ---------------------------------------------------------------------
    |           |                         取样1                         |
    |   双声道 |--------------------------------------------------------
    | 16bit量化 | 声道0(左) | 声道0(左) | 声道1(右) | 声道1(右) |
    |           | (低位字节) | (高位字节) | (低位字节) | (高位字节) |
    ---------------------------------------------------------------------
                         图6 wav数据bit位置安排方式

    Data Chunk头结构定义如下：
    struct DATA_BLOCK
{
  char szDataID[4]; // 'd','a','t','a'
  DWORD dwDataSize;
};

三、小结
因此，根据上述结构定义以及格式介绍，很容易编写相应的wav格式解析代码。
这里具体的代码就不给出了。

COLOSSEO

Apr 29, 2016

wav file format

Notes:

General discussion of RIFF files:

什么是WAV和PCM？

WAV和PCM的关系

No comments:

Post a Comment