NaGINI - improving image search relevance

[Screenshots: search results page for "robot"; Gale crater details page]


NaGINI is a Python library that uses natural language processing (NLP) and optical character recognition (OCR) to extract keyword tags for an image from the text of its source webpage and from text within the image itself. Given an image URL and a source URL (or source HTML), NaGINI automatically fetches and parses the page, then returns relevant information about the image for indexing.
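To make the idea of tagging an image from its source page concrete, here is a minimal sketch using only the Python standard library. It is not NaGINI's API: the names `extract_tags`, `_PageText`, and `STOPWORDS`, the alt-text weighting, and the simple word-frequency ranking are all assumptions made for illustration. The OCR step is omitted because it needs a third-party engine.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "are", "was"}


class _PageText(HTMLParser):
    """Collects visible page text and the alt text of one target image."""

    def __init__(self, image_url):
        super().__init__()
        self.image_url = image_url
        self.alt_text = ""
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            attr = dict(attrs)
            if attr.get("src") == self.image_url:
                self.alt_text = attr.get("alt") or ""

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def extract_tags(source_html, image_url, top_n=5):
    """Return up to top_n keyword tags, ranked by weighted word frequency."""
    parser = _PageText(image_url)
    parser.feed(source_html)
    parser.close()

    counts = Counter()
    # Alt text describes the image directly, so it gets a higher
    # (arbitrary, assumed) weight than general page text.
    sources = ((" ".join(parser.text_parts), 1), (parser.alt_text, 3))
    for text, weight in sources:
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += weight
    return [word for word, _ in counts.most_common(top_n)]
```

A call such as `extract_tags(html, "http://example.com/a.jpg")` returns a short list of candidate tags. A real pipeline would also fetch the page from the source URL, merge in OCR text, and use stronger NLP than word counts.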

NaGINI also deduplicates images using a locality-sensitive image hashing algorithm, perceptual hash. Each image is represented as a 64-bit hash; two hashes that are nearly identical, as measured by Hamming distance, indicate that the underlying images are also nearly identical. The hash is split into chunks to speed up database queries for near-duplicates.
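The sketch below shows one common way chunking can speed up near-duplicate lookups. It is an illustration under stated assumptions, not NaGINI's actual implementation: the four 16-bit chunks, the in-memory dictionary standing in for the database, and the names `ChunkedHashIndex`, `split_chunks`, and `hamming_distance` are all hypothetical. It relies on the pigeonhole principle: if two 64-bit hashes differ in at most 3 bits, at least one of their 4 chunks must match exactly, so an exact-match lookup per chunk finds every candidate.

```python
from collections import defaultdict

HASH_BITS = 64
NUM_CHUNKS = 4  # assumption: the source does not state the chunk count
CHUNK_BITS = HASH_BITS // NUM_CHUNKS
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def split_chunks(h):
    """Split a 64-bit hash into NUM_CHUNKS chunks, lowest bits first."""
    return [(h >> (i * CHUNK_BITS)) & CHUNK_MASK for i in range(NUM_CHUNKS)]


class ChunkedHashIndex:
    """In-memory stand-in for a database table indexed by hash chunk."""

    def __init__(self):
        # (chunk position, chunk value) -> set of full hashes
        self._buckets = defaultdict(set)

    def add(self, h):
        for pos, chunk in enumerate(split_chunks(h)):
            self._buckets[(pos, chunk)].add(h)

    def near_duplicates(self, h, max_distance=NUM_CHUNKS - 1):
        # The pigeonhole guarantee only holds below NUM_CHUNKS differing bits.
        if max_distance >= NUM_CHUNKS:
            raise ValueError("max_distance must be less than NUM_CHUNKS")
        candidates = set()
        for pos, chunk in enumerate(split_chunks(h)):
            candidates |= self._buckets.get((pos, chunk), set())
        # Exact chunk matches are only candidates; confirm with full distance.
        return {c for c in candidates if hamming_distance(c, h) <= max_distance}
```

With this layout, each query does four exact-match lookups followed by a Hamming-distance check on a small candidate set, instead of comparing the query hash against every stored hash.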

A modern front-end interface was designed and built with React.js to showcase this functionality.


Jet Propulsion Laboratory

Final Report (PDF)