Build GPT from Scratch in 60 Lines of Code: A Complete Hands-On Guide

Note: This article comes from the WeChat public account 新智元 (ID: AI_era), written by 桃子, and is republished by 站长之家 with permission.

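[新智元 Editor's Note] GPT has long been the foundation of the large-model era. A developer abroad has published a hands-on guide that builds GPT in just 60 lines of code.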

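Build GPT from scratch in 60 lines of code?

Recently, a developer wrote a hands-on guide that implements GPT from scratch in NumPy.

You can also load the GPT-2 model weights released by OpenAI into the GPT you build and generate some text.

Without further ado, let's start building GPT.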

What is GPT?

GPT stands for Generative Pre-trained Transformer, a neural network architecture based on the Transformer.

- Generative: a GPT generates text.

- Pre-trained: a GPT is trained on large amounts of text from books, the internet, and similar sources.

- Transformer: a GPT is a decoder-only Transformer neural network.

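Large models such as OpenAI's GPT-3, Google's LaMDA, and Cohere's Command XLarge are all GPTs under the hood. What makes them special is that they are 1) very large (billions of parameters) and 2) trained on a lot of data (hundreds of gigabytes of text).

Put simply, a GPT generates text given a prompt.

Even with this very simple API (input = text, output = text), a well-trained GPT can do some impressive things, such as writing an email, summarizing a book, suggesting ideas for Instagram posts, explaining black holes to a 5-year-old, writing SQL, and even drafting a will.

That is a high-level overview of GPT and what it can do. Let's dig into the details.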

Input/Output

The function signature of a GPT looks roughly like this:

def gpt(inputs: list[int]) -> list[list[float]]:
    # inputs has shape [n_seq]
    # output has shape [n_seq, n_vocab]
    output = ...  # beep boop neural network magic
    return output

The input is some text represented as a sequence of integers, each of which maps to a token in the text:

# integers represent tokens in our text, for example:
# text   = "not all heroes wear capes":
# tokens = "not"  "all" "heroes" "wear" "capes"
inputs =   [1,     0,    2,      4,     6]

Tokens are sub-pieces of the text, produced by a tokenizer. We can map tokens to integers using a vocabulary:

# the index of a token in the vocab represents the integer id for that token
# i.e. the integer id for "heroes" would be 2, since vocab[2] = "heroes"
vocab = ["all", "not", "heroes", "the", "wear", ".", "capes"]

# a pretend tokenizer that tokenizes on whitespace
tokenizer = WhitespaceTokenizer(vocab)

# the encode() method converts a str -> list[int]
ids = tokenizer.encode("not all heroes wear")  # ids = [1, 0, 2, 4]

# we can see what the actual tokens are via our vocab mapping
tokens = [tokenizer.vocab[i] for i in ids]  # tokens = ["not", "all", "heroes", "wear"]

# the decode() method converts back a list[int] -> str
text = tokenizer.decode(ids)  # text = "not all heroes wear"

In short:

- We have a string.

- We use a tokenizer to break it into small pieces called tokens.

- We use a vocabulary to map those tokens to integers.
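The three steps above can be sketched as a small runnable class. This is only an illustrative guess at what the pretend `WhitespaceTokenizer` might look like: the article never shows its implementation, so the `token_to_id` dictionary and the choice to raise `KeyError` on unknown tokens are assumptions made here.

```python
class WhitespaceTokenizer:
    """Toy tokenizer: splits on whitespace and looks tokens up in a fixed vocab."""

    def __init__(self, vocab: list[str]):
        self.vocab = vocab
        # assumed helper (not in the article): reverse mapping token -> integer id
        self.token_to_id = {token: i for i, token in enumerate(vocab)}

    def encode(self, text: str) -> list[int]:
        # str -> list[int]; a token missing from the vocab raises KeyError
        return [self.token_to_id[token] for token in text.split()]

    def decode(self, ids: list[int]) -> str:
        # list[int] -> str; tokens are joined with single spaces
        return " ".join(self.vocab[i] for i in ids)


vocab = ["all", "not", "heroes", "the", "wear", ".", "capes"]
tokenizer = WhitespaceTokenizer(vocab)

ids = tokenizer.encode("not all heroes wear")
tokens = [tokenizer.vocab[i] for i in ids]
text = tokenizer.decode(ids)

print(ids)     # [1, 0, 2, 4]
print(tokens)  # ['not', 'all', 'heroes', 'wear']
print(text)    # not all heroes wear
```

Because this sketch normalizes whitespace, `decode(encode(text))` only round-trips exactly when the words in `text` are separated by single spaces.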

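In practice we use more advanced tokenization methods than simply splitting on whitespace, such as Byte-Pair Encoding (BPE) or WordPiece, but the principle is the same:

- There is a vocab that maps string tokens to integer indices.

- There is an encode method that converts str -> list[int].

- There is a decode method that converts list[int] -> str. [2]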

Output

The output is a 2D array, where output[i][j] is the model's predicted probability that the token at vocab[j] is the next token, inputs[i+1]. For example:

vocab = ["all", "not", "heroes", "the", "wear", ".", "capes"]
inputs = [1, 0, 2, 4]  # "not" "all" "heroes" "wear"
output = gpt(inputs)

#              ["all", "not", "heroes", "the", "wear", ".", "capes"]
# output[0]  = [0.75    0.1    0.0       0.15   0.0     0.0  0.0   ]
# given just "not", the model predicts the word "all" with the highest probability

#              ["all", "not", "heroes", "the", "wear", ".", "capes"]
# output[1]  = [0.0     0.0    0.8       0.1    0.0     0.0  0.1   ]
# given the sequence ["not", "all"], the model predicts the word "heroes" with the highest probability

#              ["all", "not", "heroes", "the", "wear", ".", "capes"]
# output[-1] = [0.0     0.0    0.0       0.1    0.0     0.05 0.85  ]
# given the whole sequence ["not", "all", "heroes", "wear"], the model predicts the word "capes" with the highest probability
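To make the shape of this array concrete, here is a minimal NumPy sketch that reads such an `output`. The array is a hard-coded stand-in for `gpt(inputs)`, not a real model result: rows 0, 1, and -1 reuse the probabilities from the example above, while row 2 is invented here purely so that the array has one row per input token.

```python
import numpy as np

vocab = ["all", "not", "heroes", "the", "wear", ".", "capes"]
inputs = [1, 0, 2, 4]  # "not" "all" "heroes" "wear"

# hard-coded stand-in for gpt(inputs), shape [n_seq, n_vocab]
output = np.array([
    [0.75, 0.1, 0.0, 0.15, 0.0, 0.0, 0.0],   # after "not"
    [0.0, 0.0, 0.8, 0.1, 0.0, 0.0, 0.1],     # after "not all"
    [0.0, 0.0, 0.0, 0.1, 0.9, 0.0, 0.0],     # after "not all heroes" (assumed row)
    [0.0, 0.0, 0.0, 0.1, 0.0, 0.05, 0.85],   # after "not all heroes wear"
])

# one row per input position, one column per vocab entry
print(output.shape)  # (4, 7)

# each row is a probability distribution over the vocab, so it sums to 1
row_sums = output.sum(axis=-1)

# index of the most probable next token at every position
most_likely_ids = output.argmax(axis=-1)
most_likely_tokens = [vocab[i] for i in most_likely_ids]
print(most_likely_tokens)  # ['all', 'heroes', 'wear', 'capes']
```

Row `i` only describes what follows the first `i + 1` tokens, so the last row is the one that describes what comes after the full input.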

To get the whole sequence's next