AI engineering requires no academia or ML – just problem-solving

Milena Radivojević

Let’s talk about AI engineering without needless hype.

Do you think you know what AI engineering is? Well, think twice – especially if you think that it requires academia or machine learning.

Tejas Kumar, AI DevRel Engineer at DataStax, will debunk these and other myths about AI engineering at this year’s Shift Conference in Zadar. So, we asked him what AI engineering truly is.

Just apply AI to solve your dev problems

Tejas thinks the main problem with AI engineering is that it lacks a formal definition. Many people confuse it with machine learning research and engineering, but those are completely different skills.

AI engineering does not require academia, experience with machine learning models, Python, or linear algebra: it’s just applying AI to solve problems.

According to Tejas, one can do this without ever training anything, instead just making a network request to an AI API and using the returned output to solve problems.
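
As a minimal sketch of that idea, here is what "just making a network request" can look like, assuming an OpenAI-compatible chat completions endpoint and an API key in the environment (the helper name and prompt are illustrative, not from the talk):

```typescript
// A minimal sketch: solve a dev problem by calling a hosted AI API.
// Assumes an OpenAI-compatible /v1/chat/completions endpoint and an
// OPENAI_API_KEY environment variable -- no training involved.
async function summarizeError(stackTrace: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: "Explain stack traces to developers in two sentences." },
        { role: "user", content: stackTrace },
      ],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // the model's answer
}
```

No Python, no linear algebra: the entire "AI" part is one HTTP request and reading the response.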

In his talk, Kumar will also cover AI engineering techniques that help reduce costs, including fine-tuning, transfer learning, efficient data preprocessing, and optimizing model architecture.

Best practices for maximizing value involve semantic caching, i.e., caching based on inferred intent as opposed to keywords and identifiers, along with fine-tuning cheaper, smaller, more specialized models based on inferences from larger, more expensive ones.
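
To make semantic caching concrete, here is a hedged sketch: prompts are embedded as vectors, and a new prompt reuses a cached answer when its embedding is close enough to a previous one. The `embed()` function is a placeholder for whatever embeddings API you already use, and the 0.9 threshold is an illustrative choice, not a recommendation:

```typescript
// Sketch of a semantic cache: reuse answers for prompts with the same
// inferred intent, not just identical strings.
// embed() is a placeholder -- wire it up to any embeddings API.
declare function embed(text: string): Promise<number[]>;

type CacheEntry = { embedding: number[]; answer: string };
const cache: CacheEntry[] = [];

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return a cached answer for semantically similar prompts; otherwise
// call the (expensive) model once and cache the result.
async function answerWithCache(
  prompt: string,
  callModel: (p: string) => Promise<string>,
  threshold = 0.9 // illustrative; tune per use case
): Promise<string> {
  const embedding = await embed(prompt);
  const hit = cache.find((e) => cosine(e.embedding, embedding) >= threshold);
  if (hit) return hit.answer; // cache hit: no model call, no cost
  const answer = await callModel(prompt);
  cache.push({ embedding, answer });
  return answer;
}
```

The point of the design is that "What's the capital of France?" and "Capital city of France?" land on the same cached answer, even though a keyword cache would miss.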

Avoiding AI hallucinations and ensuring privacy with RAG and local models

But is there a way to avoid AI hallucinations, one of the biggest problems with using AI?

The answer is Retrieval-Augmented Generation (RAG).

RAG combines pre-trained models with external knowledge retrieval while fine-tuning adapts pre-trained models to specific tasks. RAG helps reduce hallucinations by grounding responses in factual information, whereas fine-tuning can improve performance on targeted tasks but may be more prone to overfitting.
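
A minimal RAG loop, sketched below under stated assumptions: `retrieve()` stands in for whatever vector-store query you use (e.g. top-k by embedding similarity), and `callModel()` is the same kind of chat-completion call as above. The grounding happens in the prompt itself:

```typescript
// Sketch of Retrieval-Augmented Generation: ground the model's answer
// in retrieved documents instead of its parametric memory.
declare function retrieve(query: string, k: number): Promise<string[]>;
declare function callModel(prompt: string): Promise<string>;

async function answerWithRag(question: string): Promise<string> {
  // 1. Retrieval: fetch the k passages most relevant to the question.
  const passages = await retrieve(question, 3);

  // 2. Augmentation: put the retrieved facts directly in the prompt
  //    and instruct the model to stay within them.
  const prompt = [
    "Answer the question using ONLY the context below.",
    "If the context is insufficient, say so instead of guessing.",
    "",
    "Context:",
    ...passages.map((p, i) => `[${i + 1}] ${p}`),
    "",
    `Question: ${question}`,
  ].join("\n");

  // 3. Generation: the answer is grounded in retrieved context,
  //    which is what reduces hallucinations.
  return callModel(prompt);
}
```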

And there’s that thing about privacy… So, we asked Kumar how AI engineers can work with large language models in a privacy-sensitive way, especially when running these models locally.

“To work with large language models in a privacy-sensitive way, AI engineers can run models locally using tools like vLLM, llama.cpp, or Ollama. They are inference engines that can be used with a compatible open-source model to keep the entire flow offline.”
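
For example, once a model has been pulled into Ollama, the whole round trip can stay on localhost. A sketch, assuming Ollama's default HTTP API on port 11434 and a locally pulled model such as `llama3`:

```typescript
// Sketch: the same request/response flow as a hosted API, but offline.
// Assumes Ollama is running locally (default port 11434) and a model
// such as "llama3" has already been pulled.
async function askLocalModel(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      prompt,
      stream: false, // return one JSON object instead of a stream
    }),
  });
  const data = await response.json();
  return data.response; // nothing ever left the machine
}
```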

To further enhance privacy, engineers can run these models on machines without internet access, reachable only within private networks, he says. If you want to hear more about this topic, join us at the Shift Conference in Zadar on September 16th and 17th.
