Expanding my toolbox as a programmer with ChatGPT and Copilot

Markus Dücker

September 24, 2023

As Engineering Lead at Handshake Europe, I am responsible for staying informed on new technological developments and figuring out how these could open up new use cases to serve our customers better or allow us to work more effectively as an Engineering team. In many cases, this means diving deep into highly technical topics that people outside of Software Development have rarely heard about.

But there are also topics that break this mold, like Generative AI and, more specifically, the release of ChatGPT by OpenAI. It not only generated wide news coverage globally but also saw viral adoption, with an estimated 100 million active users already. It single-handedly created an entirely new sector in which a handful of companies are now fiercely competing with each other. More broadly, it caused companies across all industries to rethink their products and start integrating AI-powered features into their offerings.

In this article, I will give an overview of Generative AI and Large Language Models (LLMs), put into context the transformational advancements that were made in AI over recent years, and share how I’m using these new tools in my day-to-day work.

What is Generative AI?

To explain Generative AI, we should first define what is meant by Artificial Intelligence. This term refers to the simulation of human intelligence in machines (software systems), enabling them to perform tasks that typically require human intelligence. Researchers classify these machines as “narrow” or “weak” AI if they show human-level intelligence for a specific task only (e.g. vision or transcription) and as “general” or “strong” AI if they exhibit human intelligence across all domains, consequently making them indistinguishable from humans.

As can be seen in the following graph from “Our World in Data”, AI systems have advanced significantly in recent years when benchmarked against human performance in specific task domains. Less than 10 years ago, no AI system could fully match human performance. But then, in the 2010s, AI systems reached the same level for the first time and kept improving, ultimately surpassing human performance in these domains.

While Artificial Intelligence covers a wide range of tasks that these systems might perform, Generative AI focuses specifically on the creation of new content across different modalities (text, audio, images, etc.). These systems are typically trained on large data sets of (human-created) content and are able to produce new, similar content that might be indistinguishable from human-created content.

What are Large Language Models (LLMs)? 

Text-based tools like ChatGPT are built on Large Language Models (LLMs), a specific type of model trained on massive amounts of text data. Such a model learns the patterns in its input data and is optimized to give human-like responses to a query (prompt). Recent breakthroughs in LLMs were mainly achieved by scaling up the model's size, measured in the number of trainable parameters, and processing gigantic amounts of training data with ever more computational power.

Just looking at the evolution of OpenAI’s GPT models, we see the model size increase exponentially over the years:

  • GPT with 117 million params [2018]
  • GPT-2 with 1.5 billion params [2019]
  • GPT-3 with 175 billion params [2020]
  • GPT-4 with a reported 1.7 trillion params [2023] (an unofficial estimate; OpenAI has not published the figure)
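
To put these jumps into numbers, here is a small sketch that computes the growth factor between consecutive models. It simply reuses the parameter counts from the list above; the helper name `growth_factors` is made up for this illustration.

```python
# Parameter counts taken from the list above (the GPT-4 entry is the figure quoted there).
PARAMS = {
    "GPT": 117_000_000,
    "GPT-2": 1_500_000_000,
    "GPT-3": 175_000_000_000,
    "GPT-4": 1_700_000_000_000,
}


def growth_factors(params):
    """Return (previous, current, factor) for each consecutive pair of models."""
    names = list(params)  # dicts keep insertion order, so this is oldest to newest
    return [
        (prev, cur, params[cur] / params[prev])
        for prev, cur in zip(names, names[1:])
    ]


for prev, cur, factor in growth_factors(PARAMS):
    print(f"{prev} -> {cur}: about {factor:.0f}x more parameters")
```

With these figures, the three steps come out at roughly 13x, 117x and 10x, so each generation grew by at least an order of magnitude.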

How do LLMs work?

Irrespective of their size, all these Large Language Models work by the same core principle: they predict the next word (token) for a given input text. By repeating this operation over and over, adding one new word at a time, they generate the desired new content. A more intuitive way to understand this is to compare it to the next-word prediction on your iPhone keyboard. By repeatedly choosing the next predicted word, it’s possible to build up endless sentences. However, the prediction on the iPhone is based on a simple statistical model for words, and you will find that the generated sentence won’t make much sense, so it can hardly be considered a Generative AI model.
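
The "predict one word, append it, repeat" loop can be sketched with a toy model. The example below is not how an LLM or any phone keyboard is actually implemented; it is a deliberately simple word-pair counter, assumed here only to illustrate the loop. The corpus and all function names are made up.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
CORPUS = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."


def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


def generate(model, start, max_words=8):
    """Repeatedly append the predicted next word, one word at a time."""
    output = [start]
    for _ in range(max_words - 1):
        nxt = predict_next(model, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)


model = train_bigram_model(CORPUS)
print(generate(model, "the"))  # prints "the cat sat on the cat sat on"
```

Because this toy model only looks at the single previous word, it quickly runs in circles, which is exactly the kind of output that doesn't make much sense.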

What makes LLMs work so well for text generation is the underlying transformer architecture, which uses a much more complex neural network approach that takes into account the context of previous words when predicting the next word.
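
The building block that lets a transformer weigh the previous words is called attention. The following is a minimal sketch of scaled dot-product attention for a single position, written in plain Python. The vectors are made-up numbers and the function names are chosen for this illustration; real models learn these vectors during training and run many such steps in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product score of one query against every key, then softmax."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Weighted average of the value vectors, using the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Made-up 2-dimensional vectors standing in for three previous words.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]  # vector for the position whose next word is being predicted

weights = attention_weights(query, keys)
context = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Words whose key vectors point in a similar direction to the query receive a larger weight, so they contribute more to the context that feeds the next-word prediction.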

Leveraging AI-powered tools for my work as a programmer

When it comes to writing code, developers have historically thought of it as a craft that takes a lot of practice to master. It is understood that good code needs to be written manually by experienced programmers. Before GitHub Copilot and ChatGPT, we had a limited number of “no code” and code generation approaches that tried to deliver components or entire applications without anyone writing a line of code at all. However, the general consensus used to be that those approaches are only suitable for a very limited number of use cases and won’t ever cover the variety and complexity of problems programmers face on a daily basis.

With the release of GitHub Copilot and ChatGPT, this assumption has been challenged. Of the people I know who use Copilot, almost everybody can point to a moment when they were surprised and impressed by a good answer Copilot gave them. So we can say that Copilot has significantly expanded the variety and complexity of problems for which developers can now get good auto-generated code. However, there are also examples of Copilot (and ChatGPT) giving a convincing answer where the generated code doesn't work or goes in the wrong direction. This happens more frequently when working on a very niche topic for which there is not a lot of content online, or when there are lots of subtleties to the problem (e.g. a different syntax or set of features between different versions).

Examples of how I’m using AI in my day-to-day work as a programmer

1. Autocomplete lines of code with GitHub Copilot
At Handshake, the majority of developers (me included) use Copilot to help with their programming work. It has been trained on a large amount of publicly available code (billions of lines) and is available as an unobtrusive helper. While I continue writing my code as usual, it suggests how to complete my current line or even an entire snippet of code.

2. Explain & Answer questions for a section of code with GitHub Copilot Chat
With Copilot Chat, I can select a specific section of code and then either let the AI explain the code to me or ask Copilot a question about that code directly.

3. Writing test cases with GitHub Copilot Chat
After selecting a specific part of the functionality, I can ask Copilot to generate test cases for me. The generated tests are usually a good start and save a lot of time during initial testing. I don’t recommend fully relying on the AI here. Instead, the output should be reviewed with scrutiny, and the developer should always spend time thinking about extra cases they want to cover.
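
As a sketch of what reviewing and extending generated tests can look like, consider the hypothetical function `average` below. The function and the test class are made up for this illustration and are not real Copilot output: the first test stands in for the kind of "happy path" case a generated suite typically starts with, and the remaining ones are edge cases a developer might add after thinking it through.

```python
import unittest


def average(values):
    """Hypothetical function under test: arithmetic mean of a list of numbers."""
    if not values:
        raise ValueError("average() needs at least one value")
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    # The kind of "happy path" test a generated suite typically starts with.
    def test_typical_values(self):
        self.assertEqual(average([2, 4, 6]), 4)

    # Extra cases a reviewer might add after thinking about the edges.
    def test_single_value(self):
        self.assertEqual(average([5]), 5)

    def test_negative_values(self):
        self.assertEqual(average([-3, 3]), 0)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            average([])
```

Saved to a file, these tests can be run with `python -m unittest <filename>`.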

4. Having a window with ChatGPT open at all times
For a lot of general questions, it’s very convenient to be able to ask ChatGPT at any time and get an immediate response. Sometimes this is just traditional rubber ducking, where the simple process of typing out my problem as a coherent question forces me to reframe it and already leads me to the correct answer. At other times, I actually engage with ChatGPT in one or more back-and-forths to get to the answer.
It takes a bit of experience to develop a feeling for which questions ChatGPT is good at answering and to identify the situations where it is just too eager to respond with an incorrect or “made up” answer (called “hallucinations”).

Apart from code questions, ChatGPT is good at helping to break down a larger task into smaller steps. Also, before getting started on a new topic that I don’t know much about, it helps to ask a high-level question like “What should I consider when …” and use ChatGPT’s answer as a quick overview or checklist.

Don’t trust, always verify

It takes an experienced developer to recognise that they are potentially getting stuck in a “dead end” with a convincing-sounding but incorrect answer from Copilot. Even if the code appears to work flawlessly, AI cannot relieve the developer of their responsibility to personally check that the code is working properly and not introducing any unwanted behaviors.

The most helpful framing I have heard is to think of the AI as an intern supporting you. Your AI intern is extremely smart and can surprise you with the quality of their results, but at other times they make beginner’s mistakes that are not obvious to them and require some additional coaching. Additionally, the intern’s results strongly depend on how good and complete the instructions from their manager are. This highlights the importance of good prompting: make sure you describe the task to the AI in as much detail as possible and with all relevant requirements.

The rapid advancements in AI over the last few years have fundamentally changed the landscape of what's possible with technology. For software developers, tools like GitHub Copilot and ChatGPT provide an AI assistant that can help accelerate many tasks. However, it's important to keep a critical perspective and not think of these tools as a silver bullet.

Successful adoption requires a certain level of experience that enables programmers to leverage the AI while never fully outsourcing their responsibility for writing robust, secure and well-architected code. Going forward, I'm excited to see AI continue to empower developers and amplify our capabilities, while uniquely human skills like creativity, judgment and ethics remain central to building amazing products that serve the universities, students and employers we support.

