
Julien Salinas wears many hats. He's an entrepreneur, software developer and, until recently, a volunteer fireman in his mountain village an hour's drive from Grenoble, a tech hub in southeast France.
He's nurturing a two-year-old startup, NLP Cloud, that's already profitable, employs about a dozen people and serves customers around the globe. It's one of many companies worldwide using NVIDIA software to deploy some of today's most complex and powerful AI models.
NLP Cloud is an AI-powered software service for text data. A major European airline uses it to summarize internet news for its employees. A small healthcare company employs it to parse patient requests for prescription refills. An online app uses it to let kids talk to their favorite cartoon characters.
Large Language Models Speak Volumes
It's all part of the magic of natural language processing (NLP), a popular form of AI that's spawning some of the planet's biggest neural networks, called large language models (LLMs). Trained with huge datasets on powerful systems, LLMs can handle all sorts of jobs such as recognizing and generating text with amazing accuracy.
NLP Cloud uses about 25 LLMs today. The largest has 20 billion parameters, a key measure of a model's sophistication. And now it's implementing BLOOM, an LLM with a whopping 176 billion parameters.
Running these massive models in production efficiently across multiple cloud services is hard work. That's why Salinas turns to NVIDIA Triton Inference Server.
High Throughput, Low Latency
"Very quickly the main challenge we faced was server costs," Salinas said, proud that his self-funded startup has not taken any outside backing to date.
"Triton turned out to be a great way to make full use of the GPUs at our disposal," he said.
For example, NVIDIA A100 Tensor Core GPUs can process as many as 10 requests at a time - twice the throughput of alternative software - thanks to FasterTransformer, a part of Triton that automates complex jobs like splitting up models across many GPUs.
FasterTransformer also helps NLP Cloud spread jobs that require more memory across multiple NVIDIA T4 GPUs while shaving the response time for the task.
Customers who demand the fastest response times can process 50 tokens - text elements like words or punctuation marks - in as little as half a second with Triton on an A100 GPU, about a third of the response time without Triton.
"That's very cool," said Salinas, who's reviewed dozens of software tools on his personal blog.
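As a rough illustration of the latency figures quoted above, the short Python sketch below converts them into tokens per second. It assumes "about a third of the response time" means exactly one third, so the 1.5-second baseline is an inferred value for illustration only, not a number reported by NLP Cloud. The function and variable names are hypothetical.

```python
# Figures quoted in the article.
TOKENS = 50                     # tokens processed per request
LATENCY_WITH_TRITON_S = 0.5     # "as little as half a second"

# Assumption: "about a third of the response time" is treated as exactly 1/3,
# so the baseline without Triton is three times slower.
ASSUMED_SPEEDUP = 3
latency_without_triton_s = LATENCY_WITH_TRITON_S * ASSUMED_SPEEDUP


def tokens_per_second(n_tokens: int, latency_s: float) -> float:
    """Return the token rate for a single request."""
    return n_tokens / latency_s


rate_with = tokens_per_second(TOKENS, LATENCY_WITH_TRITON_S)
rate_without = tokens_per_second(TOKENS, latency_without_triton_s)

print(f"With Triton:    {rate_with:.1f} tokens/s")
print(f"Without Triton: {rate_without:.1f} tokens/s (assumed baseline)")
```

Under that assumption, the quoted figures work out to roughly 100 tokens per second with Triton versus about 33 without it.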
Touring Triton's Users
Around the globe, other startups and established giants are using Triton to get the most out of LLMs.
Microsoft's Translate service helped disaster workers understand Haitian Creole while responding to a magnitude 7.0 earthquake. It was one of many use cases for the service, which got a 27x speedup using Triton to run inference on models with up to 5 billion parameters.
NLP provider Cohere was founded by one of the AI researchers who wrote the seminal paper that defined transformer models. It's getting up to 4x speedups on inference using Triton on its custom LLMs, so users of customer support chatbots, for example, get swift responses to their queries.
NLP Cloud and Cohere are among many members of the NVIDIA Inception program, which nurtures cutting-edge startups. Several other Inception startups also use Triton for AI inference on LLMs.
Tokyo-based rinna created chatbots used by millions in Japan, as well as tools to let developers build custom chatbots and AI-powered characters. Triton helped the company achieve inference latency of less than two seconds on GPUs.
In Tel Aviv, Tabnine runs a service that's automated up to 30% of the code written by a million developers globally (see a demo below). Its service runs multiple LLMs on A100 GPUs with Triton to handle more than 20 programming languages and 15 code editors.
https://blogs.nvidia.com/wp-content/uploads/2022/10/Tabnine.mp4
Twitter uses the LLM service of Writer, based in San Francisco. It ensures the social network's employees write in a voice that adheres to the company's style guide. Writer's service achieves 3x lower latency and up to 4x greater throughput using Triton compared to prior software.
If you want to put a face to those words, Inception member Ex-human, just down the street from Writer, helps users create realistic avatars for games, chatbots and virtual reality applications. With Triton, it delivers response times of less than a second on an LLM with 6 billion parameters while reducing GPU memory consumption by a third.
It's another example of how LLMs are expanding AI's horizons.
Triton is widely used, in part, because it's versatile. The software works with any style of inference and any AI framework - and it runs on CPUs as well as NVIDIA GPUs and other accelerators.
A Full-Stack Platform
Back in France, NLP Cloud is now using other elements of the NVIDIA AI platform.
For inference on models running on a single GPU, it's adopting NVIDIA TensorRT software to minimize latency. "We're getting blazing-fast performance with it, and latency is really going down," Salinas said.
The company has also started training custom versions of LLMs to support more languages and enhance efficiency. For that work, it's adopting NVIDIA NeMo Megatron, an end-to-end framework for training and deploying LLMs with trillions of parameters.
The 35-year-old Salinas has the energy of a 20-something for coding and growing his business. He describes plans to build private infrastructure to complement the four public cloud services the startup uses, as well as to expand into LLMs that handle speech and text-to-image to address applications like semantic search.
I always loved coding, but being a good developer is not enough: You have to understand your customers