Sunday, June 4, 2023

summarization, information retrieval, and creative synthesis

Large Language Models (LLMs) like ChatGPT are a hot topic due to the novelty of their results across multiple application domains. Stepping back from the hype, the central capabilities seem to include summarization of content, information retrieval, and creative synthesis. Unfortunately, those are not separate categories -- a summary or a retrieved answer can contain hallucinations that are stated confidently.

Focusing on the topic of information retrieval and setting aside hallucinations, let's consider alternative mechanisms for search:  
  • plain text search, like what Google supports
  • boolean logic, i.e., AND/OR/NOT
  • use of special indicators like wild cards, quotes for exact search
  • regular expressions
  • graph queries for inference engines that support inductive, deductive, and abductive reasoning
Except for the last, those search mechanisms all return specific results from a previously collected set of sources. 
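
To make the difference between the first four mechanisms concrete, here is a minimal sketch in Python using only the standard library. The three-document corpus and the function names are made up for illustration; a real search engine adds indexing and ranking, which this sketch ignores.

```python
import fnmatch
import re

# Hypothetical mini-corpus; the text is invented for illustration.
DOCS = {
    "doc1": "Maxwell equations describe the electromagnetic field",
    "doc2": "The wave equation follows from Maxwell equations",
    "doc3": "Regular expressions match patterns in text",
}

def plain_text(term):
    """Plain text search: case-insensitive substring match."""
    return sorted(k for k, v in DOCS.items() if term.lower() in v.lower())

def boolean_and_not(must, must_not):
    """Boolean logic: contains `must` AND NOT `must_not`."""
    return sorted(
        k for k, v in DOCS.items()
        if must.lower() in v.lower() and must_not.lower() not in v.lower()
    )

def wildcard(pattern):
    """Wild card search: shell-style pattern tested against each word."""
    return sorted(
        k for k, v in DOCS.items()
        if any(fnmatch.fnmatchcase(w.lower(), pattern) for w in v.split())
    )

def regex(pattern):
    """Regular expression search over the full text of each document."""
    return sorted(k for k, v in DOCS.items() if re.search(pattern, v))

print(plain_text("maxwell"))
print(boolean_and_not("maxwell", "wave"))
print(wildcard("equation*"))
print(regex(r"\bwave\s+equation\b"))
```

In every case the result is a list of documents that already exist in the collection, which is the point of the comparison with an LLM-generated summary.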

--> I expect conventional search to remain important. There are cases where I really am looking for a specific document and not a summarization.

--> Specialized search capabilities like regular expressions and wild cards will remain relevant for matching specific text strings. Perhaps an LLM could provide suggestions on designing the regex?

--> Graph queries rely on bespoke databases that LLMs are not currently trained on. I'm not aware of any reason these can't be combined.
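
As a rough illustration of what distinguishes a graph query with inference from the other search mechanisms, here is a small sketch of one deductive step (transitivity) over a set of triples. The triples, the relation name `derived_from`, and the function name are all hypothetical and not taken from any real database or inference engine.

```python
# Hypothetical (subject, relation, object) triples, invented for illustration.
TRIPLES = {
    ("expression_3", "derived_from", "expression_2"),
    ("expression_2", "derived_from", "expression_1"),
}

def infer_transitive(triples, relation):
    """Deduction: if A r B and B r C, then A r C. Repeat until nothing new appears."""
    facts = set(triples)
    while True:
        new = {
            (a, relation, d)
            for (a, r1, b) in facts if r1 == relation
            for (c, r2, d) in facts if r2 == relation and b == c
        }
        if new <= facts:
            return facts
        facts |= new

inferred = infer_transitive(TRIPLES, "derived_from")
# This fact was never stored; it is produced by the rule.
print(("expression_3", "derived_from", "expression_1") in inferred)
```

Unlike the text-based searches, the answer here includes a fact that was never stored in the collected sources.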

The Physics Derivation Graph effectively provides a knowledge graph for mathematical physics. Combining this with machine learning is feasible.
