When we first looked at this space in late 2023, many generative AI R packages focused largely on OpenAI and ChatGPT-like functionality or coding add-ins. Today, the landscape includes packages that natively support more LLM suppliers—including models run locally on your own computer. The range of genAI tasks you can do in R has also broadened.
Here’s an updated look at one of the more intriguing categories in the landscape of generative AI packages for R: adding large language model capabilities to your own scripts and apps. My next article will cover tools for getting help with your R programming and running LLMs locally.
ellmer
ellmer wasn’t one of the earliest entries in this space, but as the main Tidyverse package for using LLMs in an R workflow, it’s among the most important. ellmer has the resources of Posit (formerly RStudio) behind it, and its co-authors—Posit Chief Scientist Hadley Wickham (creator of ggplot2 and dplyr) and CTO Joe Cheng, original author of the Shiny R web framework—are well known for their work on other popular R packages.
Kyle Walker, author of several R and Python packages including tidycensus, posted on Bluesky: “{ellmer} is the best #LLM interface in any language, in my opinion.” (Walker recently contributed code to ellmer for processing PDF files.)
Getting started with ellmer
ellmer supports a range of features, use cases, and platforms, and it’s also well-documented. You can install the ellmer package from CRAN the usual way or try the development version with pak::pak("tidyverse/ellmer").
To use ellmer, you start by creating a chat object with functions such as chat_openai() or chat_claude(). Then you’ll use that object to interact with the LLM.
Check the chat function help file to see how to store your API key (unless you’re using a local model). For example, chat_openai() expects an environment variable, OPENAI_API_KEY, which you can store in your .Renviron file. A call like chat_anthropic() will look for ANTHROPIC_API_KEY.
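If you just want to test things in your current session, Sys.setenv() works too, although .Renviron is the more durable option. A quick sketch (the key value is a placeholder, and usethis::edit_r_environ() is a convenient way to open .Renviron for the permanent setup):

# Session-only setup; for persistence, add OPENAI_API_KEY=your-key to .Renviron
# (usethis::edit_r_environ() opens that file) and restart R.
Sys.setenv(OPENAI_API_KEY = "your-key-here")

# Quick check that ellmer's chat functions will be able to find the key
Sys.getenv("OPENAI_API_KEY") != ""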
ellmer supports around a dozen LLM platforms including OpenAI and Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock, Snowflake Cortex, Perplexity, and Ollama for local models. Here’s an example of creating a basic OpenAI chat object:
library(ellmer)

my_chat <- chat_openai(
  system_prompt = "You are an expert R programmer",
  model = "gpt-4o-mini"
)
And here’s the syntax for using Claude:
my_chat_claude <- chat_claude(
  system_prompt = "You are an expert R programmer",
  model = "claude-3-5-sonnet-latest"
)
System prompts aren’t required, but I like to include them if I’ve got a topic-specific task. Not all chat functions require setting the model, either, but it’s a good idea to specify one so you’re not unpleasantly surprised by either an old, outdated model or a cutting-edge pricey one.
Several R experts said in early 2025 that Anthropic’s Claude Sonnet 3.5 seemed to be the best specifically for writing R code, and early reports say Sonnet 3.7 is even better at code in general. However, with the new batch of models released by OpenAI, Google, and others, it’s worth keeping an eye on how that unfolds. That’s a good reason to code in a way that makes it relatively easy to swap models and providers. Separating the chat object from the rest of your code should do the trick.
I’ve found OpenAI’s new o3-mini model to be quite good at writing R code. It’s less expensive than Sonnet 3.7, but as a “reasoning” model that always thinks step by step, it may not respond as quickly as conventional models do.
Sonnet is somewhat pricier than GPT-4o, but unless you’re feeding it massive amounts of code, the bill should still be pretty small (my queries typically run two cents each).
If you want to set other parameters for the model, including temperature (meaning the amount of randomness acceptable in the response), use the api_args argument. That takes a named list, like this one:
my_chat <- chat_openai(
  model = "gpt-4o-mini",
  api_args = list(temperature = 0.1)
)
A higher temperature is often viewed as more creative, whereas a lower one is considered more precise. Also note that temperature isn’t an explicit argument in ellmer’s chat functions, so you need to remember to set it in the api_args named list. That’s worth doing, since temperature can be an important parameter when working with LLMs.
Check the model provider’s API documentation for a list of available argument names and valid value ranges for temperature and other settings.
Interactivity
ellmer offers three levels of interactivity for its chat objects: a chatbot interface in the R console or a browser; streaming responses while also saving the results as text; or storing query results without streaming them.
live_console(my_chat) opens my_chat in a chat interface in the R console. live_browser(my_chat) creates a basic chatbot interface in a browser. In both cases, responses are streamed.

ellmer’s “interactive method call” uses a syntax like my_chat$chat() to both display streamed responses and save the result as text:
my_results <- my_chat$chat("How do I rotate the x-axis labels in a ggplot2 plot?")
If you use my_chat again to ask another question, you’ll find the chat retains your previous question-and-answer history, so you can refer back to earlier questions and responses. For example, the following refers back to the initial question:
my_results2 <- my_chat$chat("How would I do that if the labels are dates?")
This chat query includes a history of previous questions and responses.
When using ellmer code non-interactively within an R script as part of a larger workflow, the documentation suggests wrapping your code inside an R function, like so:
my_function <- function(my_question) {
  my_chat <- chat_openai(model = "gpt-4o-mini")
  my_chat$chat(my_question)
}
my_answer <- my_function("How do I rotate the x-axis labels in a ggplot2 plot?")
In this case, my_answer is a simple text string.
The package also includes documentation for tool/function calling, prompt design, extracting structured data, and more. I’ve used it along with the shinychat package to add chatbot capabilities to a Shiny app. You can see example code that creates a simple Shiny chat interface for Ollama local models on your system, including dropdown model selection and a button to download the chat, in this public GitHub gist.
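If you’re curious what that looks like, here’s a stripped-down sketch in the spirit of that gist (without the model dropdown or download button). It assumes ellmer’s chat_ollama() and shinychat’s chat_ui() and chat_append() functions, plus a local Ollama model whose name you’d swap in for the placeholder:

library(shiny)
library(bslib)
library(shinychat)

ui <- page_fluid(
  chat_ui("chat")
)

server <- function(input, output, session) {
  # One ellmer chat object per user session, pointed at a local Ollama model
  my_chat <- ellmer::chat_ollama(model = "llama3.1")  # placeholder model name

  observeEvent(input$chat_user_input, {
    # Stream the model's reply back into the chat UI
    stream <- my_chat$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)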
Note that ellmer started life under the name elmer, so you may still see references to the original name; it’s the same package.
tidyllm
Some of tidyllm’s capabilities overlap with ellmer’s, but its makers say the interface and design philosophy are very different. For one thing, tidyllm combines “verbs”—the type of request you want to make, such as chat() or send_batch()—with providers. Requests include text embeddings (generating numerical vectors to quantify the semantic meaning of a text string) as well as chats and batch processing.
tidyllm currently supports Anthropic Claude models, OpenAI, Google Gemini, Mistral, Groq (which is not Grok), Perplexity, Azure, and local Ollama models. If using a cloud provider, you’ll need to set up your API key either with Sys.setenv() or in your .Renviron file.
Querying
A simple chat query typically starts by creating an LLMMessage object via the llm_message("your query here") function, and then piping that to a chat() function, such as:
library(tidyllm)

my_conversation <- llm_message("How do I make a scatterplot of the mpg dataset with ggplot2?") |>
  chat(openai(.model = "gpt-4o-mini", .temperature = 0, .stream = FALSE))
The llm_message() function can also include a .system_prompt argument. The returned value of my_conversation is just a text string with the model’s response, which is easy to print out.
You can build on that by adding another message and chat, which will keep the original query and response in memory, such as:
# Keep history of last message
my_conversation2 <- my_conversation |>
  llm_message("How would I rotate labels on the x-axis 90 degrees?") |>
  chat(openai(.model = "gpt-4o-mini", .temperature = 0, .stream = FALSE))
print(my_conversation2)
It’s possible to extract some metadata from the results with the get_metadata() function:

result_metadata <- get_metadata(my_conversation2)
which returns a tibble (a special data frame) with columns for model, timestamp, prompt_tokens, completion_tokens, total_tokens (which oddly was the same as completion_tokens in some of my tests), and api_specific lists with more token details. You can also extract the user message that generated the reply with get_user_message(my_conversation2).
llm_message() also supports sending images to models that accept such uploads, using the .imagefile argument. And you can ask questions about PDF files with the .pdf argument (make sure you also have the pdftools R package installed). For example, I downloaded the tidyllm PDF reference manual from CRAN to a file named tidyllm.pdf in a files subdirectory, and then asked a question about it using tidyllm:
my_conversation3 <- llm_message("What functions does this package include for batch processing?",
    .pdf = "files/tidyllm.pdf") |>
  chat(openai(.model = "gpt-4o", .temperature = .1, .stream = FALSE))
result_metadata3 <- get_metadata(my_conversation3)
result_metadata3 showed that my query used 22,226 input tokens and 556 output tokens. At $2.50 per million input tokens, the query portion, which included the entire 58-page PDF, cost around 5.5 cents, plus another half a cent or so for the response (at $10 per million output tokens).

I tried the same query with Claude Sonnet 3.5 and estimate it cost around 7.7 cents for the query, again including the entire 58-page PDF, plus another penny or so for the output (at $15 per million output tokens).
However, if I built up several other questions on top of that, the tokens could add up. This code is just to show package syntax and capabilities, by the way; it’s not an efficient way to query large documents. (See the section on RAG and Ragnar later in the article for a more typical approach.)
If you don’t need immediate responses, another way to trim spending is to use lower-cost batch requests. Several providers, including Anthropic, OpenAI, and Mistral, offer discounts of around 50% for batch processing. The tidyllm send_batch() function submits such a request, check_batch() checks on the status, and fetch_batch() gets the results. You can see more details on the tidyllm package website.
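Here’s a rough sketch of how those three functions could fit together. I’m assuming send_batch() accepts a list of llm_message() objects plus a provider in the same verb-plus-provider style as chat(); treat the argument details as assumptions and confirm them against the package site:

library(tidyllm)

# A list of independent conversations to process at the batch discount
questions <- list(
  llm_message("Summarize the mtcars dataset in one sentence."),
  llm_message("Summarize the iris dataset in one sentence.")
)

my_batch <- send_batch(questions, openai(.model = "gpt-4o-mini"))  # submit the batch
check_batch(my_batch)                                              # poll the status
results <- fetch_batch(my_batch)                                   # collect once finished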
tidyllm was created by Eduard Brüll, a researcher at the ZEW economic think tank in Germany.
batchLLM
As the name implies, batchLLM is designed to run prompts over multiple targets. More specifically, you can run a prompt over a column in a data frame and get a data frame in return with a new column of responses. This can be a handy way of incorporating LLMs in an R workflow for tasks such as sentiment analysis, classification, and labeling or tagging.
It also logs batches and metadata, lets you compare results from different LLMs side by side, and has built-in delays for API rate limiting.
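As a rough illustration of that workflow (the argument names below are my assumptions rather than verified against the package, so check the batchLLM documentation for the exact interface), a call might look something like this:

library(batchLLM)

# A toy data frame with a column of text to label
reviews <- data.frame(
  comment = c("Loved it", "Terrible service", "It was fine")
)

# Hypothetical call: run one prompt over every row of the column and get
# the data frame back with a new column of model responses
labeled <- batchLLM(
  df = reviews,
  col = comment,
  prompt = "Classify the sentiment as positive, negative, or neutral."
)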
batchLLM’s Shiny app offers a handy graphical user interface for running LLM queries and commands on a column of data.
batchLLM also includes a built-in Shiny app that gives you a handy web interface for doing all this work. You can launch the web app with batchLLM_shiny() or as an RStudio add-in, if you use RStudio. There’s also a web demo of the app.
batchLLM’s creator, Dylan Pieper, said he created the package due to the need to categorize “thousands of unique offense descriptions in court data.” However, note that this “batch processing” tool does not use the less expensive, time-delayed LLM calls offered by some model providers. Pieper explained on GitHub that “most of the services didn’t offer it or the API packages didn’t support it” at the time he wrote batchLLM. He also noted that he had preferred real-time responses to asynchronous ones.
Two more tools to watch
We’ve looked at three top tools for integrating large language models into R scripts and programs. Now let’s look at a couple more tools that focus on specific tasks when using LLMs within R: retrieving information from large amounts of data, and scripting common prompting tasks.
ragnar (RAG for R)
RAG, or retrieval augmented generation, is one of the most useful applications for LLMs. Instead of relying on an LLM’s internal knowledge or directing it to search the web, the LLM generates its response based only on specific information you’ve given it. InfoWorld’s Smart Answers feature is an example of a RAG application, answering tech questions based solely on articles published by InfoWorld and its sister sites.
A RAG process typically involves splitting documents into chunks, using models to generate embeddings for each chunk, embedding a user’s query, and then finding the most relevant text chunks for that query based on calculating which chunks’ embeddings are closest to the query’s. The relevant text chunks are then sent to an LLM along with the original question, and the model answers based on that provided context. This makes it practical to answer questions using many documents as potential sources without having to stuff all the content of those documents into the query.
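To make the retrieval step concrete, here’s a minimal, package-free sketch of embedding-based ranking in base R. The embed_text() helper is a toy stand-in that just counts letters; in a real RAG pipeline you’d replace it with calls to an embedding model, which is one of the steps ragnar aims to handle for you:

embed_text <- function(x) {
  # Toy "embedding": counts of each letter a-z. A real app would call an
  # embedding model API here instead.
  chars <- strsplit(gsub("[^a-z]", "", tolower(x)), "")[[1]]
  vec <- numeric(26)
  for (ch in chars) {
    i <- match(ch, letters)
    vec[i] <- vec[i] + 1
  }
  vec
}

cosine_similarity <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

chunks <- c(
  "ellmer supports around a dozen LLM platforms.",
  "tidyllm combines verbs such as chat() with providers.",
  "batchLLM runs a prompt over a column in a data frame."
)

chunk_embeddings <- lapply(chunks, embed_text)
query_embedding <- embed_text("Which package processes a data frame column?")

# Rank chunks by similarity to the query and keep the top two
scores <- vapply(chunk_embeddings, cosine_similarity, numeric(1), b = query_embedding)
top_chunks <- chunks[order(scores, decreasing = TRUE)][1:2]

# top_chunks would then be pasted into the prompt sent to the LLM,
# along with the user's original question.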
There are numerous RAG packages and tools for Python and JavaScript, but not many in R beyond generating embeddings. However, the ragnar package, currently very much under development, aims to offer “a complete solution with sensible defaults, while still giving the knowledgeable user precise control over all the steps.”
Those steps either do or will include document processing, chunking, embedding, storage (defaulting to DuckDB), retrieval (based on both embedding similarity search and text search), a technique called re-ranking to improve search results, and prompt generation.
If you’re an R user and interested in RAG, keep an eye on ragnar.
tidyprompt
Serious LLM users will likely want to code certain tasks more than once. Examples include generating structured output, calling functions, or forcing the LLM to respond in a specific way (such as chain-of-thought).
The idea behind the tidyprompt package is to offer “building blocks” to construct prompts and handle LLM output, and then chain those blocks together using conventional R pipes.
tidyprompt “should be seen as a tool which can be used to enhance the functionality of LLMs beyond what APIs natively offer,” according to the package documentation, with functions such as answer_as_json(), answer_as_text(), and answer_using_tools().
A prompt can be as simple as
library(tidyprompt)
"Is London the capital of France?" |>
  answer_as_boolean() |>
  send_prompt(llm_provider_groq(parameters = list(model = "llama3-70b-8192")))
which in this case returns FALSE. (Note that I had first stored my Groq API key in an R environment variable, as would be the case for any cloud LLM provider.) For a more detailed example, check out the Sentiment analysis in R with a LLM and ‘tidyprompt’ vignette on GitHub.
There are also more complex pipelines using functions such as llm_feedback() to check if an LLM response meets certain conditions and user_verify() to make it possible for a human to check an LLM response.
You can create your own tidyprompt prompt wraps with the prompt_wrap() function.
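As a sketch of what that can look like, assuming prompt_wrap()’s modify_fn and validation_fn arguments and the llm_feedback() retry convention described in the package documentation (the wrap below is my own illustrative example, not one that ships with the package):

answer_in_uppercase <- function(prompt) {
  prompt_wrap(
    prompt,
    # Add an instruction to the prompt text sent to the LLM
    modify_fn = function(base_prompt) {
      paste0(base_prompt, "\n\nRespond using only uppercase letters.")
    },
    # Return TRUE if the response is acceptable; otherwise send feedback
    # back to the LLM so it can try again
    validation_fn = function(response) {
      if (identical(response, toupper(response))) return(TRUE)
      llm_feedback("Please respond using only uppercase letters.")
    }
  )
}

"What is the capital of France?" |>
  answer_in_uppercase() |>
  send_prompt(llm_provider_groq(parameters = list(model = "llama3-70b-8192")))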
The tidyprompt package supports OpenAI, Google Gemini, Ollama, Groq, Grok (from xAI), and OpenRouter (not Anthropic directly, but Claude models are available through OpenRouter). It was created by Luka Koning and Tjark Van de Merwe.
The bottom line
The generative AI ecosystem for R is not as robust as Python’s, and that’s unlikely to change. However, in the past year, there’s been a lot of progress in creating tools for key tasks programmers might want to do with LLMs in R. If R is your language of choice and you’re interested in working with large language models either locally or via APIs, it’s worth giving some of these options a try.