Here at Xlpat Labs, as part of our R&D, we have focused on fine-tuning an OpenAI pre-trained model to generate coherent patent claims automatically. Patent claim language is a largely untouched area of research and a challenge in itself. We are studying the language structures in claim text and leveraging their human explanations to meet our goal. The idea is to build and train on a text corpus that suits modern NLP advancements.
The second major challenge is semantic text evaluation. The meaning of a given word such as 'go' or 'get' varies with its context. Word2Vec and similar approaches do not address this problem: they assign a single vector to each word, regardless of which of its senses is in use.
The emergence of ELMo, BERT, and GPT-2 showed a way through this semantic gap: by including information about the preceding and succeeding text in the vector, we can encode the context of a given word or phrase.
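To see what "including the surrounding text in the vector" buys us, here is a toy numpy sketch. The vectors and the mixing rule are our own illustrative assumptions (real ELMo/BERT embeddings are learned by deep networks), but the contrast is the real one: a static lookup gives 'go' the same vector everywhere, while a context-aware encoding gives each occurrence its own vector.

```python
import numpy as np

# Hand-made static vectors for a toy vocabulary (assumption:
# illustrative numbers only, not learned embeddings).
static = {
    "let's": np.array([0.2, 0.8]),
    "go":    np.array([1.0, 0.0]),
    "home":  np.array([0.5, 0.5]),
    "games": np.array([0.0, 1.0]),
}

def static_embed(word, sentence):
    # Word2Vec-style lookup: the sentence is ignored entirely.
    return static[word]

def contextual_embed(word, sentence):
    # Crude stand-in for ELMo/BERT: mix the word's vector with the
    # mean of its neighbours, so the surrounding words shift the result.
    neighbours = [static[w] for w in sentence if w != word]
    return 0.5 * static[word] + 0.5 * np.mean(neighbours, axis=0)

s1 = ["let's", "go", "home"]
s2 = ["go", "games"]

# Static: identical vector for "go" in both sentences.
print(np.array_equal(static_embed("go", s1), static_embed("go", s2)))        # True
# Contextual: the two occurrences of "go" now differ.
print(np.array_equal(contextual_embed("go", s1), contextual_embed("go", s2)))  # False
```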
This approach obtains much better results, owing to the attention mechanism, when encoding the not-so-clean text of patent claims.
Notably, GPT-2 has demonstrated state-of-the-art results in coherent text generation, which takes us a step closer to overcoming our second challenge.
GPT-2 is the successor to GPT, which is held to be one of the first pre-trained language models built on the Transformer architecture.
The transformer architecture takes a sentence (a sentence here means a sequence of 512 or fewer tokens), encodes this sentence using the attention mechanism (refer to the "Attention Is All You Need" paper), and transforms it using a decoder.
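The core of that encoding step is scaled dot-product attention from the "Attention Is All You Need" paper: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V. Here is a minimal numpy sketch of that formula alone (the dimensions and random inputs are our assumptions; a real transformer adds learned projections, multiple heads, and feed-forward layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                     # 4 "words", 8-dimensional vectors
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)             # (4, 8): one context-mixed vector per word
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

Each output row is a weighted mixture of every position's value vector, which is exactly how context from the whole sentence flows into each word's encoding.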
This recipe is an excellent choice for someone whose taste buds like sequence-to-sequence modelling. But we were looking for language modelling as a whole: we aim to predict a successor sentence given a reference sentence!
Well, the decoder part of the transformer alone can do this. All that a decoder in a transformer does is generate a new sequence of words, referring to the sequence encoding provided by the encoder:
word(x) = Decoder(word(x−1),encoding)
and if we take away one ingredient from the decoder, the encoder's output, we can transform this dish into GPT:
word(x) = Decoder(word(x−1))
Whoa! We have transformed the transformer to do what a language model is supposed to do. In short, GPT is the decoder part of a transformer.
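The equation word(x) = Decoder(word(x−1)) is just a generation loop: feed the previous word in, get the next word out, repeat. The sketch below makes that loop concrete with a hard-coded bigram table standing in for the decoder (our assumption purely for illustration; the real GPT decoder is a stack of masked self-attention layers conditioned on the whole prefix, not just the last word):

```python
# Toy stand-in for the GPT decoder: a hard-coded bigram table
# (assumption for illustration only; the real decoder is a stack of
# masked self-attention layers over the entire generated prefix).
bigram = {
    "<start>": "the",
    "the": "claim",
    "claim": "comprises",
    "comprises": "<end>",
}

def decoder(prev_word):
    # word(x) = Decoder(word(x-1)): note there is no encoder output.
    return bigram[prev_word]

def generate(max_len=10):
    word, out = "<start>", []
    for _ in range(max_len):
        word = decoder(word)
        if word == "<end>":
            break
        out.append(word)
    return out

print(generate())  # ['the', 'claim', 'comprises']
```

The point of the sketch: once the encoder input is gone, generation needs nothing but the model's own previous output, which is precisely what makes GPT a language model.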
After putting 1.5 billion parameters to work on over 40 gigabytes of internet wisdom, we have a benchmark language model in hand, whose arrival was termed 'NLP's ImageNet moment' and which has (hopefully) mastered the dynamics of a vast English corpus.
But this cake is still half-baked. On our way to achieving the aim, we fine-tuned this model for next-sentence prediction, leveraging its linguistic capabilities and adding a few task-specific layers on top of it.
OpenAI released the GPT-2 model in four sizes, 117M, 345M, 762M, and 1.5B, in terms of the number of parameters in the neural network.
The base 117M model suffices for us for now while we test next-line generation/completion with GPT-2.
We chose to play around with a chunk of the Google Patents Public Dataset on BigQuery, leveraging SQL's clarity and flexibility.
This was followed by the addition of start and end tags and the transformation of the text data into a compressed number format. A few parameters: a learning rate of 1e-4, temperature of 1.0, top_k of 40, and a batch size of 1.
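The tagging step and the two sampling knobs can be sketched as follows. The `<|startoftext|>`/`<|endoftext|>` strings follow the common GPT-2 fine-tuning convention (our assumption; the exact tags are a choice), and `sample_next` shows, on made-up logits, what temperature and top_k actually do to the next-word distribution:

```python
import numpy as np

def tag_claim(text):
    # Wrap each claim with start/end tags before encoding.
    # (<|startoftext|>/<|endoftext|> follow the usual GPT-2
    # fine-tuning convention; the exact tags are an assumption.)
    return f"<|startoftext|>{text}<|endoftext|>"

def sample_next(logits, temperature=1.0, top_k=40, rng=None):
    # Temperature rescales the logits (lower = sharper distribution);
    # top_k keeps only the k most likely tokens before renormalising
    # and sampling one of them.
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float) / temperature
    k = min(top_k, len(logits))
    keep = np.argsort(logits)[-k:]                 # indices of the top-k logits
    probs = np.exp(logits[keep] - logits[keep].max())
    probs /= probs.sum()
    return int(rng.choice(keep, p=probs))

print(tag_claim("A device comprising..."))
# With top_k=1 sampling collapses to greedy decoding: always the argmax.
print(sample_next([0.1, 2.5, -1.0, 0.3], temperature=1.0, top_k=1))  # 1
```

With our settings (temperature 1.0, top_k 40), the model samples from the 40 most likely next tokens with their relative probabilities left untouched.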
Have a look at the generated text within 100 steps of fine-tuning.
reference text = The climate change is making
generated text = “The climate change is making it harder for people to stay in their homes and to access affordable health care,” said CUNY’s chief scientific officer, Sarah Coughlin. She said the latest IPCC report, which was released on Tuesday, said climate change was not only harming the health of people in developing countries, it was also causing damage to the biodiversity of the oceans. It is hard to believe that climate change is causing this problem to be
even more widespread.
Not bad. Not bad. It does extremely well at what it was originally trained for, that is, predicting the next word. Ironically, we believe in leveraging capabilities until we break them. A few challenges here for future research are:
1. How to use a different dataset to validate and prevent overfitting? (Because what comprises the original training dataset is vast and unknown.)
2. Just from the visuals, it can be concluded that it does OVERFIT. Though we may be able to overcome this by tuning the number of training steps.
3. Trying out the interactive model for GPT-2.
We are on board!