In natural language processing, question answering over tabular data, also known as TableQA, remains a hard problem. This matters for SAP in particular: much of its data lives in tables, so enhancing Large Language Models' (LLMs) proficiency in reasoning over that data is a challenge, and table perturbations further limit LLM robustness.
So far, no method has fully succeeded in combining deep learning with structured data analysis, for several reasons, and (spoiler alert) research is advancing towards aggregating multiple reasoning pathways. I will explain in the last section.
When designing an NLP pipeline where one of the data sources is a table, two of the main problems (though not the only ones) are handling the structural nature of tables and linearizing tables without losing critical structural and relational information. Models also struggle with precise numerical computation (consider how SAP formats a number such as 263) and with the risk of crucial details being overshadowed by a dense query result.
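To make the linearization problem concrete, here is a minimal sketch (the table values are invented for illustration) contrasting a markdown serialization, which preserves row/column alignment, with a naive flat serialization that discards it:

```python
import pandas as pd

# A toy table; imagine an SAP sales report with European number formatting.
df = pd.DataFrame({
    "Region": ["EMEA", "APJ"],
    "Revenue": ["1.263,50", "980,00"],  # readable for humans, ambiguous for LLMs
})

# Markdown linearization keeps rows and columns visually aligned...
print(df.to_markdown(index=False))

# ...while a naive flat linearization loses the row/column structure entirely.
print(" ".join(f"{col}={val}" for _, row in df.iterrows() for col, val in row.items()))
```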
Recent advancements like the StructGPT framework (specialized interfaces for reading structured data) show promise in connecting LLMs with structured data, but they lack integrated symbolic reasoning, a critical aspect for enhancing LLM capabilities in tabular reasoning. Symbolic AI (explicit rules) and neural networks (learning from examples) must be combined so that a system can both apply rules and learn from many examples.
Other approaches, which build or fine-tune a model from structured datasets and come from the early days of encoder-decoder architectures, like TaBERT, TaPas, or PASTA, don't solve the fundamental problem: table perturbations require continuous pre-training, which becomes more complicated as time and model size grow.
Recent LLM advancements have shown potential for tabular reasoning, with techniques like Chain-of-Thought proving effective, even though Chain-of-Thought was not designed for tabular data.
Google Research - Chain-of-Table

Released in the first week of 2024, Chain-of-Table explicitly incorporates tabular data within a reasoning chain (driven by an agent). This method guides LLMs to iteratively perform operations and update tables, forming a chain that represents the reasoning process for table-related problems.
The Chain-of-Table process involves defining common table operations and prompting LLMs for step-by-step reasoning. Each operation enriches or condenses the table, aiding in reaching accurate predictions. This iterative process continues until a solution is reached.
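Google has not released the code (more on that below), so the following is only a rough sketch of the iterative loop as I read the paper; the operation pool, the JSON planning prompt, and `call_llm` are my own assumptions, not Google's implementation:

```python
import json
import pandas as pd

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI, Bedrock, etc.)."""
    raise NotImplementedError

# A small pool of table operations, loosely in the spirit of the paper's
# atomic operations such as selecting columns/rows or sorting.
OPERATIONS = {
    "select_columns": lambda df, arg: df[arg],             # arg: list of column names
    "select_rows":    lambda df, arg: df.iloc[arg],        # arg: list of row positions
    "sort_by":        lambda df, arg: df.sort_values(arg), # arg: a column name
}

def chain_of_table(df: pd.DataFrame, question: str, max_steps: int = 5) -> str:
    chain = [df]  # every intermediate table is kept: this is the reasoning chain
    for _ in range(max_steps):
        # Ask the LLM to plan the next operation as JSON, or to stop and answer.
        plan = json.loads(call_llm(
            f"Table:\n{chain[-1].to_markdown(index=False)}\n"
            f"Question: {question}\n"
            f'Reply as JSON: {{"op": one of {list(OPERATIONS)} or "answer", "arg": ...}}'
        ))
        if plan["op"] == "answer":
            break
        chain.append(OPERATIONS[plan["op"]](chain[-1], plan["arg"]))
    # Generate the final answer from the last (enriched or condensed) table.
    return call_llm(f"Table:\n{chain[-1].to_markdown(index=False)}\nAnswer: {question}")
```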

Google depicts a comparison of three reasoning methods applied to a complex table: (a) generic reasoning, (b) program-aided reasoning, and (c) the proposed Chain-of-Table approach.
In the scenario, the table combines a cyclist's nationality and name in a single cell. Method (a) struggles to provide the correct answer due to the complexity of multi-step reasoning. Method (b) employs program execution, such as SQL queries, but cannot accurately parse the name and nationality out of the combined cell. Method (c), Chain-of-Table, uses a series of operations to iteratively transform the complex table into a format better suited to the query, which enables the Large Language Model (LLM) to arrive at the correct answer.
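To see why the transformation helps, here is an illustrative pandas version of one such operation: splitting the combined "Name (Country)" cell into two columns, after which the query becomes a simple structured lookup (the rows below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Rank": [1, 2],
                   "Cyclist": ["Alejandro Valverde (ESP)", "Alexandr Kolobnev (RUS)"]})

# One Chain-of-Table-style step: split "Name (Country)" into separate columns.
df[["Name", "Country"]] = df["Cyclist"].str.extract(r"(.*)\s+\((\w+)\)")

# The query that defeated the SQL approach is now a trivial filter.
print((df["Country"] == "ESP").sum())  # number of Spanish cyclists -> 1
```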
Google Research presents an interesting approach for integrating LLMs with tables, since fine-tuning LMs for table understanding is complex.
Google's method also brings challenges of its own that are not mentioned in the paper.

In Google's approach, intermediate results are stored in the transformed tables. These intermediate tables with aggregated data live in the agent's memory and could require extensive memory-management techniques for the agent, a topic that still lacks much investigation.
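There is no standard recipe here yet; one naive mitigation, purely a sketch of my own rather than anything from the paper, is to bound how many intermediate tables the agent retains:

```python
from collections import deque
import pandas as pd

class TableMemory:
    """Agent memory that keeps only the most recent intermediate tables."""
    def __init__(self, max_tables: int = 3):
        self.tables = deque(maxlen=max_tables)  # oldest tables are evicted automatically

    def remember(self, df: pd.DataFrame) -> None:
        self.tables.append(df)

    def latest(self) -> pd.DataFrame:
        return self.tables[-1]
```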
Unfortunately, Google did not release the code, so (as of January 2024) we can't implement Chain-of-Table in our agent frameworks: Amazon Bedrock, LangChain, or SAP's Generative AI Hub.
Microsoft analyzed how well GPT-4 handles tabular data.
In November 2023, Microsoft, in a paper titled "GPT4Table: Can Large Language Models Understand Structured Table Data?", investigated the capabilities of Large Language Models in understanding and processing structured table data.
A traditional LLM-to-database query, or TableQA, is a two-step process:
- Question Decomposition
- Data Retrieval
This must be orchestrated with a powerful LLM, such as OpenAI's models, and an agent that chains the SQL question to a second agent, which interprets the result of that SQL query to form the answer (Ye et al., 2023), as sketched below.
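Here is a minimal sketch of that two-step orchestration, assuming a SQLite database; `call_llm` is a placeholder for whatever chat-completion client you use (OpenAI, Bedrock, or the Generative AI Hub), not a specific framework's API:

```python
import sqlite3

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to a powerful LLM."""
    raise NotImplementedError

def table_qa(question: str, schema: str, db_path: str) -> str:
    # Step 1: question decomposition -- turn the natural-language question into SQL.
    sql = call_llm(f"Schema:\n{schema}\nWrite one SQL query answering: {question}")

    # Step 2: data retrieval -- execute the generated query against the database.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()

    # A second LLM call interprets the raw rows and phrases the final answer.
    return call_llm(f"Question: {question}\nSQL result: {rows}\nAnswer in one sentence:")
```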
Microsoft Research designed a benchmark comprising tasks to evaluate LLMs' Structural Understanding Capabilities. These tasks include cell lookup, row retrieval, and size detection, each presenting unique challenges described in the paper.
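As an illustration of what such a probe can look like (the prompt wording and reply format are mine, not the benchmark's), here is a hypothetical size-detection check: serialize the table, ask the model for its dimensions, and compare against ground truth:

```python
import pandas as pd

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call."""
    raise NotImplementedError

def size_detection_probe(df: pd.DataFrame) -> bool:
    prompt = (f"{df.to_markdown(index=False)}\n\n"
              "How many rows and columns does this table have? "
              "Reply exactly as: rows=<n>, cols=<m>")
    answer = call_llm(prompt)
    expected = f"rows={len(df)}, cols={len(df.columns)}"
    return answer.strip() == expected  # True if the model got the size right
```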

Microsoft's research is really extensive and indicates that GPT-4 can understand a table's structure, but it is still far from perfect; there are failures even in simple tasks like table size detection.
Cost and latency are not mentioned in the paper.
Longer context windows resulting from a query might reduce quality, and processing large amounts of data in a prompt to GPT-4, where a single gigabyte of raw text can cost hundreds of dollars in API fees, makes it essential to be strategic and selective when extracting data from extensive corpora.
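A back-of-envelope calculation illustrates the point; the characters-per-token ratio and the per-token price below are rough assumptions (API pricing changes frequently), not quoted figures:

```python
# Rough cost of sending 1 GB of raw text through an LLM API as input tokens.
GB_CHARS = 1_000_000_000           # ~1 GB of plain ASCII text
CHARS_PER_TOKEN = 4                # common rule of thumb for English (assumption)
PRICE_PER_1K_INPUT_TOKENS = 0.001  # illustrative USD price (assumption)

tokens = GB_CHARS / CHARS_PER_TOKEN                # ~250 million tokens
cost = tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS  # ~$250 at these assumptions
print(f"{tokens:,.0f} tokens -> ${cost:,.0f}")
```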
Conclusion
In this blog, I discuss the challenges and recent advancements in NLP question answering over tabular data (TableQA). I highlight the difficulties Large Language Models face in reasoning over data presented in tables, mainly due to table perturbations that limit their robustness. Chain-of-Table, recently released by Google Research, incorporates tabular data within a reasoning chain, improving LLMs' capability to reason over table-based questions.