Software Engineering

Ask an NLP Engineer: From GPT to the Ethics of AI


Over the past year, Toptal data scientist and natural language processing (NLP) engineer Daniel Pérez Rubio has been intensely focused on developing advanced language models like BERT and GPT—the same language model family behind omnipresent generative AI technologies like OpenAI’s ChatGPT. What follows is a summary of a recent ask-me-anything-style Slack forum in which Pérez Rubio fielded questions about AI and NLP topics from other Toptal engineers around the world.

This comprehensive Q&A will answer the question “What does an NLP engineer do?” and satisfy your curiosity on subjects such as essential NLP foundations, recommended technologies, advanced language models, product and business concerns, and the future of NLP. NLP professionals of varying backgrounds can gain tangible insights from the topics discussed.

Editor’s note: Some questions and answers have been edited for clarity and brevity.

New to the Field: NLP Basics

What steps should a developer follow to move from working on standard applications to starting professional machine learning (ML) work?
—L.P., Córdoba, Argentina

Theory is much more important than practice in data science. Still, you’ll also have to get familiar with a new tool set, so I’d recommend starting with some online courses and trying to put your learnings into practice as much as possible. When it comes to programming languages, my recommendation is to go with Python. It’s similar to other high-level programming languages, offers a supportive community, and has well-documented libraries (another learning opportunity).

How familiar are you with linguistics as a formal discipline, and is this background helpful for NLP? What about information theory (e.g., entropy, signal processing, cryptanalysis)?
—V.D., Georgia, United States

As I am a graduate in telecommunications, information theory is the foundation that I use to structure my analytical approaches. Data science and information theory are considerably connected, and my background in information theory has helped shape me into the professional I am today. On the other hand, I have not had any kind of academic preparation in linguistics. However, I have always liked language and communication in general. I’ve learned about these topics through online courses and practical applications, allowing me to work alongside linguists in building professional NLP solutions.

Can you explain what BERT and GPT models are, including real-life examples?
—G.S.

Without going into too much detail, as there’s a lot of great literature on this topic, BERT and GPT are types of language models. They’re trained on plain text with tasks like text infilling, and are thus prepared for conversational use cases. As you have probably heard, language models like these perform so well that they can excel at many side use cases, like solving mathematical tests.
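To make the "text infilling" training task concrete, here is a toy, stdlib-only sketch of how a BERT-style masked-language-modeling example is constructed: some tokens are replaced with a mask symbol, and the model is trained to recover them. The `mask_tokens` helper and its parameters are illustrative, not part of any real library.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Build a BERT-style text-infilling training pair: the corrupted
    input sequence and the labels the model must recover."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            labels.append(tok)   # loss is computed on this position
        else:
            corrupted.append(tok)
            labels.append(None)  # no loss on unmasked positions
    return corrupted, labels

corrupted, labels = mask_tokens("the cat sat on the mat".split(),
                                mask_prob=0.5, seed=1)
print(corrupted)
print(labels)
```

Trained at scale on pairs like these, a model learns enough about language to transfer to the conversational and reasoning use cases mentioned above.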

A diagram of recommended NLP tools in four categories: programming languages, cloud services, workflow orchestration services, and language models.
The Top Recommended NLP Tools (in Green) and Their Alternatives (in Light Blue)

What are the best options for language models besides BERT and GPT?
—R.K., Korneuburg, Austria

Based on my experience, the best one I can suggest is still GPT-2 (the most recent release in the family being GPT-4). GPT-2 is lightweight yet powerful enough for most purposes.

Do you prefer Python or R for performing text analysis?
—V.E.

I can’t help it—I love Python for everything, even beyond data science! Its community is great, and it has many high-quality libraries. I know some R, but it’s so different from other languages and can be difficult to use for production. However, I must say that its statistics-oriented capabilities are a big pro compared to Python-based alternatives, though Python has many high-quality, open-source projects to compensate.

Do you have a preferred cloud service (e.g., AWS, Azure, Google) for model building and deployment?
—D.B., Traverse City, United States

Easy one! I hate vendor lock-in, so AWS is my preferred choice.

Do you recommend using a workflow orchestration tool for NLP pipelines (e.g., Prefect, Airflow, Luigi, Neptune), or do you prefer something built in-house?
—D.O., Registro, Brazil

I know Airflow, but I only use it when I have to orchestrate several processes and I know I’ll want to add new ones or change pipelines in the future. Those tools are particularly helpful for cases like big data processes involving heavy extract, transform, and load (ETL) requirements.

What do you use for less complex pipelines? The standard I see most frequently is building a web API with something like Flask or FastAPI and having a front end call it. Do you recommend any other approach?
—D.O., Registro, Brazil

I try to keep it simple without adding unnecessary moving parts, which can lead to failure later on. If an API is needed, then I use the best resources I know of to make it robust. I recommend FastAPI in combination with a Gunicorn server and Uvicorn workers—this combination works wonders!
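A minimal sketch of the setup described above, assuming a FastAPI app exposed as `app` in a file named `main.py`; the worker count and port are placeholder values to adjust for your hardware and environment.

```shell
# Serve a FastAPI app with Gunicorn managing four Uvicorn worker
# processes; tune --workers to your CPU count.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```

Gunicorn handles process management and restarts, while the Uvicorn worker class provides the async server each FastAPI worker needs.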

However, I normally avoid architectures like microservices from scratch. My take is that it is best to work toward modularity, readability, and clear documentation. If the day comes that you need to change to a microservices approach, then you can address the update and celebrate the fact that your product is important enough to merit those efforts.

I’ve been using MLflow for experiment tracking and Hydra for configuration management. I’m considering trying Guild AI and BentoML for model management. Do you recommend any other similar machine learning or natural language processing tools?
—D.O., Registro, Brazil

What I use the most is custom visualizations and pandas’ style method for quick comparisons.

I usually use MLflow when I need to share a common repository of experiment results within a data science team. Even then, I typically go for the same kind of reports (I have a slight preference for plotly over matplotlib to help make reports more interactive). When the reports are exported as HTML, the results can be consumed immediately, and you have full control of the format.

I’m eager to try Weights & Biases specifically for deep learning, since monitoring tensors is much harder than monitoring metrics. I’ll be happy to share my results when I do.

Advancing Your Career: Complex NLP Questions

Can you break down your day-to-day work regarding data cleaning and model building for real-world applications?
—V.D., Georgia, United States

Data cleaning and feature engineering take around 80% of my time. The reality is that data is the source of value for any machine learning solution. I try to save as much time as possible when building models, especially since a business’s target performance requirements may not be high enough to need fancy tricks.

Regarding real-world applications, this is my main focus. I love seeing my products help solve concrete problems!

Suppose I’ve been asked to work on a machine learning model that doesn’t work, no matter how much training it gets. How would you perform a feasibility analysis to save time and offer proof that it is better to move to other approaches?
—R.M., Dubai, United Arab Emirates

It is helpful to use a Lean approach to validate the performance capabilities of the optimal solution. You can achieve this with minimal data preprocessing, a good base of easy-to-implement models, and strict best practices (separation of training/validation/test sets, use of cross-validation when possible, etc.).
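The Lean validation described above can be sketched as follows: compare an easy-to-implement model against a trivial baseline under cross-validation. If the simple model can't separate itself from the baseline on clean data, that's evidence the approach may not be feasible. The synthetic dataset here is a stand-in for the real problem.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset (hypothetical features/labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Baseline: always predict the majority class.
baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5)
# Easy-to-implement model: plain logistic regression.
simple = cross_val_score(LogisticRegression(), X, y, cv=5)

print(f"baseline accuracy: {baseline.mean():.2f}")
print(f"logistic regression accuracy: {simple.mean():.2f}")
```

A meaningful gap between the two scores suggests learnable signal; no gap is concrete proof you can show before investing in heavier models.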

Is it possible to build smaller models that are almost as good as larger ones but use fewer resources (e.g., by pruning)?
—R.K., Korneuburg, Austria

Sure! There has been great progress in this area recently with DeepMind’s Chinchilla model, which outperforms GPT-3 and comparable models despite being much smaller, thanks to a more efficient use of its compute budget.

AI Product and Business Insights

A flowchart of four arrows describing the machine learning product development cycle from start to finish.
The Machine Learning Product Development Cycle

Can you share more about your machine learning product development methods?
—R.K., Korneuburg, Austria

I almost always start with an exploratory data analysis, diving as deep as I must until I know exactly what I need from the data I’ll be working with. Data is the source of value for any supervised machine learning product.
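A first pass at that exploratory data analysis might look like the sketch below: summary statistics, missingness, and a quick signal check against the target. The toy table and its columns are hypothetical stand-ins for a customer's raw data.

```python
import numpy as np
import pandas as pd

# Toy dataset standing in for a customer's raw table (hypothetical columns).
df = pd.DataFrame({
    "age": [34, 45, np.nan, 23, 51],
    "income": [52000, 64000, 58000, np.nan, 71000],
    "churned": [0, 1, 0, 0, 1],
})

print(df.describe())                          # distribution summary per column
print(df.isna().mean())                       # fraction of missing values
print(df.corr(numeric_only=True)["churned"])  # quick signal check vs. target
```

Iterating on views like these until the data holds no surprises is what makes the later modeling steps predictable.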

Once I have this knowledge (usually after several iterations), I share my insights with the customer and work to understand the questions they want to solve to become more familiar with the project’s use cases and context.

Later, I work toward quick and dirty baseline results using easy-to-implement models. This helps me understand how difficult it will be to reach the target performance metrics.

For the rest, it’s all about focusing on data as the source of value. Putting more effort toward preprocessing and feature engineering will go a long way, and constant, transparent communication with the customer can help you navigate uncertainty together.

Generally, what is the outermost boundary of current AI and ML applications in product development?
—R.K., Korneuburg, Austria

Right now, there are two major boundaries to be figured out in AI and ML.

The first one is artificial general intelligence (AGI). This is starting to become a large focus area (e.g., DeepMind’s Gato). However, there’s still a long way to go until AI reaches a more generalized level of proficiency in multiple tasks, and facing untrained tasks is another obstacle.

The second is reinforcement learning. The dependence on big data and supervised learning is a burden we need to eliminate to tackle most of the challenges ahead. The amount of data required for a model to learn every possible task a human does is likely out of our reach for a long time. Even if we achieve this level of data collection, it may not prepare the model to perform at a human level in the future when the environment and conditions of our world change.

I don’t expect the AI community to solve these two difficult problems any time soon, if ever. In the case that we do, I don’t predict any functional challenges beyond those, so at that point, I presume the focus would change to computational efficiency—but it probably won’t be us humans who explore that!

When and how should you incorporate machine learning operations (MLOps) technologies into a product? Do you have tips on persuading a client or manager that this needs to be done?
—N.R., Lisbon, Portugal

MLOps supports many products and business goals: serverless solutions designed to charge only for what you use, ML APIs targeting typical business use cases, free services like MLflow that monitor experiments during development and application performance in later stages, and more. MLOps especially yields huge benefits for enterprise-scale applications and improves development efficiency by reducing tech debt.

However, evaluating how well your proposed solution fits your intended purpose is important. For example, if you have spare server space in your office, can guarantee your SLA requirements are met, and know how many requests you’ll receive, you may not need to use a managed MLOps service.

One common point of failure stems from assuming that a managed service will cover project requisites (model performance, SLA requirements, scalability, etc.). For example, building an OCR API requires intensive testing in which you assess where and how it fails, and you should use this process to evaluate obstacles to your target performance.

I think it all depends on your project objectives, but if an MLOps solution fits your goals, it’s typically more cost-effective and controls risk better than a tailor-made solution.

In your opinion, how well are organizations defining business needs so that data science tools can produce models that help decision-making?
—A.E., Los Angeles, United States

That question is key. As you probably know, compared to standard software engineering solutions, data science tools add an extra level of ambiguity for the customer: Your product is not only designed to deal with uncertainty, but it often even leans on that uncertainty.

For this reason, keeping the customer in the loop is crucial; every effort made to help them understand your work is worth it. They are the ones who know the project requirements most clearly and will approve the final result.

The Future of NLP and Ethical Considerations for AI

How do you feel about the rising power consumption caused by the large convolutional neural networks (CNNs) that companies like Meta are now routinely building?
—R.K., Korneuburg, Austria

That’s a great and sensible question. I know some people think those models (e.g., Meta’s LLaMA) are useless and waste resources. But I’ve seen how much good they can do, and since they’re usually offered later to the public for free, I think the resources spent to train those models will pay off over time.

What are your thoughts on those who claim that AI models have achieved sentience? Based on your experience with language models, do you think they are getting anywhere close to sentience in the near future?
—V.D., Georgia, United States

Assessing whether something like AI is self-conscious is so metaphysical. I don’t like the focus of these types of stories or their resulting bad press for the NLP field. In general, most artificial intelligence projects don’t intend to be anything more than, well, artificial.

In your opinion, should we worry about ethical issues related to AI and ML?
—O.L., Ivoti, Brazil

We surely should—especially with recent advances in AI systems like ChatGPT! But a substantial degree of education and subject matter expertise is required to frame the discussion, and I’m afraid that certain key agents (e.g., governments) will still need time to achieve this.

One important ethical consideration is how to reduce and avoid bias (e.g., racial or gender bias). This is a job for technologists, companies, and even customers—it is critical to put in the effort to avoid the unfair treatment of any human being, regardless of the cost.

Overall, I see ML as the main driver that could potentially lead humanity to its next Industrial Revolution. Of course, during the Industrial Revolution many jobs ceased to exist, but we created new, less menial, and more creative jobs as replacements for many workers. It is my opinion that we will do the same now and adapt to ML and AI!

The editorial team of the Toptal Engineering Blog extends its gratitude to Rishab Pal for reviewing the technical content presented in this article.