IndexReader.getTermFreqVector() -> DirectoryReader.getTermFreqVector() -> SegmentReader.getTermFreqVector() -> TermVectorsReader.get()

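A term vector stores forward information: from the index to a document, from the document to a field, and from the field to its terms. With term vectors you can find out which terms a given document contains.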
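
To make the "document to field to term" direction concrete, here is a minimal in-memory sketch. It is not how Lucene stores term vectors on disk; the class ForwardIndexSketch and its methods addTerm and termsOf are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not a Lucene class: a forward mapping
// document number -> field name -> term text -> term frequency.
final class ForwardIndexSketch {
    private final Map<Integer, Map<String, Map<String, Integer>>> docs = new TreeMap<>();

    // Record one occurrence of a term in a field of a document.
    void addTerm(int docNum, String field, String term) {
        docs.computeIfAbsent(docNum, d -> new TreeMap<>())
            .computeIfAbsent(field, f -> new TreeMap<>())
            .merge(term, 1, Integer::sum);
    }

    // Terms (with frequencies) of one field of one document; empty if unknown.
    Map<String, Integer> termsOf(int docNum, String field) {
        Map<String, Map<String, Integer>> fields =
            docs.getOrDefault(docNum, Collections.emptyMap());
        return fields.getOrDefault(field, Collections.emptyMap());
    }
}
```

Given a document number and a field name, termsOf answers "which terms does this document contain", which is the question a term vector is meant to answer.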

Term vector index file (.tvx)

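If a segment contains N documents, this file has N entries, one per document.
Each entry has two parts: the first is the offset of this document in the term vector document file (.tvd), and the second is the offset of this document's first field in the term vector field file (.tvf).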
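
Because every entry has a fixed size, the entry for a document can be located by arithmetic alone. The sketch below assumes two 8-byte longs per entry and a 4-byte format header at the start of the file (the FORMAT_SIZE mentioned in the comments of TermVectorsReader.get()). The header size, the class name and the method name are assumptions for illustration, not Lucene's API.

```java
// Hypothetical helper, not part of Lucene: models the .tvx layout described above.
final class TvxLayoutSketch {
    // Assumption: the file starts with a 4-byte format header.
    static final int HEADER_BYTES = 4;
    // Each entry holds two longs: the .tvd offset and the .tvf offset of the first field.
    static final int ENTRY_BYTES = 2 * Long.BYTES;

    // File position at which the entry for the given document starts.
    static long entryPosition(int docNum) {
        return HEADER_BYTES + (long) docNum * ENTRY_BYTES;
    }
}
```

Under these assumptions, document 0 starts right after the header and each following document is 16 bytes further on.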

Term vector document file (.tvd)

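If a segment contains N documents, this file has N entries, and each entry holds the information about all fields of one document.
Each entry starts with NumFields, the number of fields in this document, followed by an array of NumFields field numbers. Then comes an array of (NumFields - 1) values. As noted above, the offset of each document's first field in .tvf is stored in the .tvx file; the offsets of the remaining (NumFields - 1) fields are obtained by adding these values, one after another, to the first field's offset. Each value is the distance from the previous field, as the loop in TermVectorsReader.get() shows.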
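
The offset arithmetic can be sketched as a running sum. This follows the loop in TermVectorsReader.get(), which repeatedly executes position += tvd.readVLong(); the class TvfPositionSketch and the method fieldPositions are hypothetical names, and the numbers in the usage note are made up.

```java
// Hypothetical helper, not part of Lucene: turns the stored deltas into
// absolute .tvf positions for every field of one document.
final class TvfPositionSketch {
    // firstPosition: .tvf offset of the document's first field (read from .tvx).
    // deltas: the (NumFields - 1) values stored in .tvd for this document.
    static long[] fieldPositions(long firstPosition, long[] deltas) {
        long[] positions = new long[deltas.length + 1];
        positions[0] = firstPosition;
        for (int i = 0; i < deltas.length; i++) {
            // Each delta is added to the position of the previous field.
            positions[i + 1] = positions[i] + deltas[i];
        }
        return positions;
    }
}
```

For example, with a first offset of 100 and deltas 20 and 35, the three fields would start at 100, 120 and 155.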

Term vector field file (.tvf)

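This file contains all fields in the segment without separating them by document. Which range of fields belongs to which document is determined by the first-field offset in .tvx and the (NumFields - 1) field offsets in .tvd.
Each field starts with NumTerms, the number of terms in the field, followed by an 8-bit byte whose last bit indicates whether position information is stored and whose second-to-last bit indicates whether offset information is stored. Then comes an array of NumTerms entries, one per term. Each term consists of its text TermText, its frequency TermFreq (the number of times the term occurs in this document), its position information, and its offset information, the last two as indicated by the flag byte.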
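
Decoding the flag byte is a pair of bit tests. The sketch below reads "last bit" as the least significant bit (0x1) and "second-to-last bit" as the next one (0x2); the constant and method names are hypothetical and chosen only for this illustration.

```java
// Hypothetical helper, not part of Lucene: decodes the per-field flag byte in .tvf.
final class TvfFlagsSketch {
    // Assumption: the lowest bit marks stored positions, the next bit stored offsets.
    static final int STORE_POSITIONS = 0x1;
    static final int STORE_OFFSETS = 0x2;

    static boolean storesPositions(byte flags) {
        return (flags & STORE_POSITIONS) != 0;
    }

    static boolean storesOffsets(byte flags) {
        return (flags & STORE_OFFSETS) != 0;
    }
}
```

A reader would check these two flags before trying to read positions or offsets for the terms of a field.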

TermVectorsReader.get()

public void get(int docNum, String field, TermVectorMapper mapper) throws IOException {
    if (tvx != null) {
        int fieldNumber = fieldInfos.fieldNumber(field);
        // We need to account for FORMAT_SIZE when seeking in the tvx file.
        // Other seeks do not need this because the file pointer they use
        // was already written in another file.
        seekTvx(docNum);
        //System.out.println("TVX Pointer: " + tvx.getFilePointer());
        long tvdPosition = tvx.readLong();
        tvd.seek(tvdPosition);
        int fieldCount = tvd.readVInt();
        //System.out.println("Num Fields: " + fieldCount);
        // There are only a few fields per document. We opt for a full scan
        // rather than requiring that they be ordered. We need to read through
        // all of the fields anyway to get to the tvf pointers.
        int number = 0;
        int found = -1;
        for (int i = 0; i < fieldCount; i++) {
            if (format >= FORMAT_VERSION)
                number = tvd.readVInt();
            else
                number += tvd.readVInt();
            if (number == fieldNumber)
                found = i;
        }
        // found stays -1 when the field, although valid in the segment,
        // does not occur in this document
        if (found != -1) {
            // Compute position in the tvf file
            long position;
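            if (format >= FORMAT_VERSION2)
                // Newer format: the first field's tvf position is stored in tvx
                position = tvx.readLong();
            else
                // Older format: it is stored in tvd
                position = tvd.readVLong();
            // Add the stored deltas up to and including the requested field
            for (int i = 1; i <= found; i++)
                position += tvd.readVLong();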
            mapper.setDocumentNumber(docNum);
            readTermVector(field, position, mapper);
        } else {
            //System.out.println("Fieldable not found");
        }
    } else {
        //System.out.println("No tvx file");
    }
}

Thanks to:
http://www.cnblogs.com/forfuture1978/archive/2009/12/14/1623599.html