
Galactica: A Major Breakthrough in AI

Galactica, the language model for scientific literature released by Meta this week, is fascinating. It has the following capabilities:

  • Generating lecture notes on any scientific topic
  • Generating a Wikipedia-style article (generating, not retrieving!)
  • Drafting a Jupyter notebook for a topic
  • Generating a literature review article, complete with structure, equations, citations, and references


The best part is that it has been released not only as a paper but also as a live demo (https://galactica.org/).


You can play with it yourself by providing various kinds of inputs in plain natural language. 




With the information explosion, search-engine-based retrieval is quickly becoming inadequate, particularly in scientific domains. Galactica proposes a language-model-based approach that can store, combine, and reason about scientific knowledge.



(https://galactica.org/?max_new_tokens=400&prompt=jupyter+notebook+on+set+operations+demo)
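The demo link above carries the plain-language prompt as ordinary URL query parameters. Here is a minimal sketch of how such a link could be put together and read back, assuming the demo accepts exactly the two parameters visible in that link (max_new_tokens and prompt). The helper names build_demo_url and read_prompt are made up for illustration and are not part of Galactica.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address taken from the demo link in the post.
DEMO_BASE_URL = "https://galactica.org/"


def build_demo_url(prompt: str, max_new_tokens: int = 400) -> str:
    """Return a demo link that carries the prompt as query parameters.

    Hypothetical helper: the parameter names are copied from the link
    in the post; nothing else about the demo is assumed.
    """
    query = urlencode({"max_new_tokens": max_new_tokens, "prompt": prompt})
    return f"{DEMO_BASE_URL}?{query}"


def read_prompt(url: str) -> str:
    """Recover the plain-text prompt from a demo link (hypothetical helper)."""
    return parse_qs(urlsplit(url).query)["prompt"][0]


if __name__ == "__main__":
    url = build_demo_url("jupyter notebook on set operations demo")
    print(url)               # the link with the prompt URL-encoded
    print(read_prompt(url))  # the original prompt text
```

Spaces in the prompt become "+" signs in the link, which is why the prompt reads as one run-together string in the address bar.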




The research paper describing Galactica is openly available here: https://galactica.org/static/paper.pdf


A must-read paper for any ML researcher, I would say. It is engrossing to go through.


Remarkably, there is a sentence at the end of page 2, quoted below:


Galactica was used to help write this paper, including recommending missing citations, topics to discuss in the introduction and related work, recommending further work, and helping write the abstract and conclusion.

Wow! They built a model and then used that same model to improve the research paper written about it. Shall we call this "recursive writing" or "meta writing"?


Language models are very popular in ML research circles, and there are many of them. To name a few:

  • BERT
  • GPT-3
  • ELECTRA
  • DeBERTa
  • PaLM


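Many of these models have billions of parameters. Yes, you read that correctly: billions of parameters. For example, the PaLM model has 540 billion parameters. Galactica's largest version has 120 billion parameters and was trained on a corpus of 106 billion tokens. Note that parameters measure the size of the model, while tokens measure the size of the training data.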


Galactica's 106 billion tokens come from 48 million papers, 2 million code chunks, 8 million reference materials, 2 million knowledge base entries, and more. Galactica's ability to handle LaTeX equations is thrilling.


The model is available as open source, which should certainly help the research community build on it further.


If you ask whether it is going to be a complete replacement for paper writing, the answer is NO. In the words of Yann LeCun, Meta's Chief AI Scientist:


This tool is to paper writing as driving assistance is to driving. It won’t write papers automatically for you, but it will greatly reduce your cognitive load while you write them


AI models are advancing swiftly. Skills we used to call "human only" are now rapidly being acquired by AI. Be it image generation, video generation, or now scientific article generation, AI is everywhere.



