Tuesday, May 28, 2024

Google Imagines LLM and Photos Library-powered Chatbot

Large language models (LLMs) will make possible a slew of new capabilities in consumer apps over the coming years. Google has explored creating a personalized chatbot that can “answer previously impossible questions” about your life by setting LLMs loose on your Google Photos library and other sources.

Project Ellman would use LLMs to gain awareness of what’s happening in a photo. Examples from the internal presentation include identifying your university years, your Bay Area years, and your years as a parent, including the birth of children. For example, a class reunion could be recognized as such: “It’s exactly 10 years since he graduated and is full of faces not seen in 10 years so it’s probably a reunion.”
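None of Google's internal code is public, so purely as an illustration, the reunion inference quoted above can be sketched as a simple heuristic. Every name, signature, and threshold here is hypothetical:

```python
from datetime import date

def looks_like_reunion(photo_date, graduation_date,
                       faces_in_photo, recently_seen_faces, years=10):
    """Hypothetical heuristic: an event is probably a reunion if it falls
    on a round anniversary of graduation and at least half the faces in
    the photo haven't been seen recently."""
    anniversary = (photo_date.year - graduation_date.year) == years
    unfamiliar = [f for f in faces_in_photo if f not in recently_seen_faces]
    mostly_unfamiliar = len(unfamiliar) >= len(faces_in_photo) * 0.5
    return anniversary and mostly_unfamiliar

# A photo taken 10 years after graduation, full of faces not seen lately.
print(looks_like_reunion(
    date(2024, 6, 1), date(2014, 6, 1),
    faces_in_photo={"ana", "ben", "cho", "dee"},
    recently_seen_faces={"ana"},
))  # -> True
```

In practice the presentation suggests the LLM itself would make this judgment from unstructured context rather than hand-written rules; the sketch only shows the kind of signals (dates, faces, recency) such a judgment would draw on.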

Taking this a step further is an example of how an LLM could discern your preference for a particular type of food (Italian) from frequent pictures of certain dishes (pasta and pizza). Meanwhile, screenshots could be used to determine everything from interests to upcoming purchases, travel plans, and favorite websites.

“One of the reasons that an LLM is so powerful for this bird’s-eye approach, is that it’s able to take unstructured context from all different elevations across this tree, and use it to improve how it understands other regions of the tree,” a slide reads, alongside an illustration of a user’s various life “moments” and “chapters.”
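The “tree” of life moments and chapters the slide refers to could, hypothetically, be modeled as a small hierarchy whose unstructured snippets are flattened into a single prompt, letting the LLM reason across chapters rather than over one photo at a time. All class names and fields below are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Moment:
    label: str            # e.g. "graduation day"
    captions: list        # unstructured context: captions, OCR text, etc.

@dataclass
class Chapter:
    label: str            # e.g. "university years"
    moments: list = field(default_factory=list)

def flatten_context(chapters):
    """Collect unstructured snippets from every elevation of the tree
    into one block of text suitable for an LLM prompt."""
    lines = []
    for ch in chapters:
        lines.append(f"Chapter: {ch.label}")
        for m in ch.moments:
            lines.append(f"  Moment: {m.label}")
            lines.extend(f"    {c}" for c in m.captions)
    return "\n".join(lines)

life = [
    Chapter("university years",
            [Moment("graduation day", ["cap and gown photo"])]),
    Chapter("Bay Area years",
            [Moment("first apartment", ["moving boxes in a bare room"])]),
]
print(flatten_context(life))
```

The point of the slide, as quoted, is that context recovered at one elevation (a chapter) can sharpen the model's reading of another region of the tree; flattening the whole hierarchy into the prompt is one simple way to give the model that cross-chapter view.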

Google found that LLMs are ideal for this use case, with Gemini of course having multimodal capabilities that can parse images, videos, and audio as well as text. 

This technology would come together as a user-facing chatbot, pitched internally with the line: “Imagine opening ChatGPT but it already knows everything about your life. What would you ask it?”

Practical use cases include asking when siblings last visited and finding towns similar to where you live. 

In 2016, Google Assistant was announced as “your own personal Google,” and earlier this year Sundar Pichai said the company has the AI “technology to actually do those things now.” At the first Made by Google event in October of 2016, the CEO said Assistant’s “goal is to build a personal Google for each and every user.”

The company says Project Ellman is an “early internal exploration” from the Google Photos team, presented at an internal event that also featured presentations by the Gemini team:

“Google Photos has always used AI to help people search their photos and videos, and we’re excited about the potential of LLMs to unlock even more helpful experiences. This was an early internal exploration and, as always, should we decide to roll out new features, we would take the time needed to ensure they were helpful to people, and designed to protect users’ privacy and safety as our top priority.”
