Image Retrieval

LlamaParse's JSON mode can extract the images found on each parsed page through the getImages method. The images are downloaded to a local folder and can then be sent to a multimodal LLM for further processing.

Installation

npm install llamaindex @llamaindex/cloud @llamaindex/openai

Usage

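The getImages method takes the array of JSON objects returned by loadJson, downloads the images to the specified folder, and returns a list of image dictionaries. Each dictionary includes the local path of the downloaded file, which can be used to build an ImageNode.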

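import { LlamaParseReader } from "@llamaindex/cloud";

const reader = new LlamaParseReader();
const jsonObjs = await reader.loadJson("../data/uber_10q_march_2022.pdf");
// Download every image referenced in the parsed JSON into the "images" folder
const imageDicts = await reader.getImages(jsonObjs, "images");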
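
Before sending the downloaded images to an LLM, it can help to drop duplicates and files in formats the model may not accept. The following is a minimal sketch, not part of the LlamaParse API: the ImageRecord shape only assumes the path field used later on this page, and selectUsableImages and the extension list are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical minimal shape: only `path` is taken from the example on this page.
interface ImageRecord {
  path: string;
}

// Assumed list for illustration; adjust it to what your multimodal LLM accepts.
const SUPPORTED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp"];

// Keep the first record for each path and skip unsupported file types.
function selectUsableImages<T extends ImageRecord>(records: T[]): T[] {
  const seen = new Set<string>();
  const usable: T[] = [];
  for (const record of records) {
    const lower = record.path.toLowerCase();
    const supported = SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
    if (!supported || seen.has(record.path)) continue;
    seen.add(record.path);
    usable.push(record);
  }
  return usable;
}
```

Under these assumptions, the result of reader.getImages could be passed through selectUsableImages before the loop that requests alt text.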

Multimodal Indexing

You can create a single index across both text and images by asking a multimodal LLM for alternative text for each image and indexing that text alongside the page text.

import {
  createMessageContent,
  Document,
  ImageNode,
  VectorStoreIndex,
} from "llamaindex";
import { LlamaParseReader } from "@llamaindex/cloud";
import { OpenAI } from "@llamaindex/openai";
 
const reader = new LlamaParseReader();
async function main() {
  // Load the PDF using LlamaParse JSON mode; this returns an array of JSON objects
  const jsonObjs = await reader.loadJson("../data/uber_10q_march_2022.pdf");
  // Each array element is one parsed file; take the "pages" list of the first one
  const jsonList = jsonObjs[0]["pages"];
 
  const textDocs = getTextDocs(jsonList);
  const imageTextDocs = await getImageTextDocs(jsonObjs);
  const documents = [...textDocs, ...imageTextDocs];
  // Split text, create embeddings and query the index
  const index = await VectorStoreIndex.fromDocuments(documents);
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query:
      "What does the bar graph titled 'Monthly Active Platform Consumers' show?",
  });
 
  console.log(response.toString());
}
 
main().catch(console.error);

We use two helper functions to create documents from the text and image nodes provided.

Text Documents

To create documents from the pages of the JSON object, we map each page to a new Document object, using the page text as the document text and the page number as metadata.

function getTextDocs(jsonList: { text: string; page: number }[]): Document[] {
  return jsonList.map(
    (page) => new Document({ text: page.text, metadata: { page: page.page } }),
  );
}
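
To make the mapping concrete, here is a dependency-free sketch that runs it on sample data. The TextDoc and ParsedPage interfaces are stand-ins for llamaindex's Document and the parsed page objects, the sample pages are made up, and the step that skips blank pages is an optional addition that is not in the original helper.

```typescript
// Stand-in for llamaindex's Document so the sketch runs without dependencies.
interface TextDoc {
  text: string;
  metadata: { page: number };
}

// Only the two fields that getTextDocs reads from each parsed page.
interface ParsedPage {
  text: string;
  page: number;
}

// Same mapping as getTextDocs, plus an optional filter for blank pages.
function toTextDocs(pages: ParsedPage[]): TextDoc[] {
  return pages
    .filter((p) => p.text.trim().length > 0)
    .map((p) => ({ text: p.text, metadata: { page: p.page } }));
}

// Made-up sample input: page 2 contains only whitespace and is skipped.
const samplePages: ParsedPage[] = [
  { page: 1, text: "Revenue grew year over year." },
  { page: 2, text: "   " },
  { page: 3, text: "Risk factors." },
];
const sampleDocs = toTextDocs(samplePages);
console.log(sampleDocs.map((d) => d.metadata.page)); // pages 1 and 3 remain
```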

Image Documents

To create documents from the images, we need to use a multimodal LLM to generate alt text.

For this, we wrap each downloaded image in an ImageNode and include it in the message sent to the LLM.

The createMessageContent function simplifies this by building the message content from a prompt and a list of nodes.

async function getImageTextDocs(
  jsonObjs: Record<string, any>[],
): Promise<Document[]> {
  const llm = new OpenAI({
    model: "gpt-4o",
    temperature: 0.2,
    maxTokens: 1000,
  });
  const imageDicts = await reader.getImages(jsonObjs, "images");
  const imageDocs: Document[] = [];
 
  for (const imageDict of imageDicts) {
    const imageDoc = new ImageNode({ image: imageDict.path });
    const prompt = () => `Describe the image as alt text`;
    const message = await createMessageContent(prompt, [imageDoc]);
 
    const response = await llm.complete({
      prompt: message,
    });
 
    const doc = new Document({
      text: response.text,
      metadata: { path: imageDict.path },
    });
    imageDocs.push(doc);
  }
 
  return imageDocs;
}

The returned imageDocs have the alt text assigned as text and the image path as metadata.
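
In the loop above, a single failed LLM call rejects the whole function. If you would rather keep the alt text that did succeed, one option is to isolate each call. The sketch below is an illustration under stated assumptions: describeImages and the Describe callback are hypothetical names, AltTextDoc stands in for llamaindex's Document, and the callback is where the createMessageContent and llm.complete calls from the example would go.

```typescript
// Only the field the example on this page reads from each image dictionary.
interface ImageRecord {
  path: string;
}

// Stand-in for llamaindex's Document so the sketch runs without dependencies.
interface AltTextDoc {
  text: string;
  metadata: { path: string };
}

// Hypothetical callback: wraps whatever LLM call produces alt text for one image.
type Describe = (imagePath: string) => Promise<string>;

// Request alt text image by image; record failures instead of aborting the run.
async function describeImages(
  records: ImageRecord[],
  describe: Describe,
): Promise<{ docs: AltTextDoc[]; failed: string[] }> {
  const docs: AltTextDoc[] = [];
  const failed: string[] = [];
  for (const record of records) {
    try {
      const text = (await describe(record.path)).trim();
      if (text.length === 0) {
        // Treat an empty description as a failure so it is not indexed.
        failed.push(record.path);
        continue;
      }
      docs.push({ text, metadata: { path: record.path } });
    } catch {
      failed.push(record.path);
    }
  }
  return { docs, failed };
}
```

The failed list can be logged or retried later, while docs can be converted to Document objects and indexed as before.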

You can see the full example file here.

API Reference

LlamaParseReader