With the buzz about Artificial Intelligence dominating the tech news scene for the last several years, many are asking whether their jobs are safe. Generative AI like ChatGPT seems a natural fit for creating software, which is, after all, written as text instructions in one or more programming languages. Microsoft, which has invested heavily in OpenAI, uses OpenAI's models to power a tool called GitHub Copilot. Microsoft positions it as a coding assistant, your "AI pair programmer". But it makes you wonder - could it render human software developers obsolete?
The short answer is: No.
Microsoft has correctly judged Copilot to be a tool for humans, not a replacement for them. But how good is it? The syntax of the code Copilot produces is almost always correct, which is impressive. However, it introduces a whole new class of mistakes that a human would not make.
Much of the challenge in using Copilot is providing the right amount of context. If you give it an example to mimic and then tell it what you want, it usually does very well, as in the sketch below. If you provide too much context, though - say, several hundred lines of code - Copilot cannot determine which part of the code applies to your question. Humans have a larger working memory than Copilot, along with a greater ability to identify what needs to change, and that makes a big difference.
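For instance, here is a minimal, hypothetical sketch of that pattern (the function names and formats are invented for illustration, not taken from our actual tests): one working function gives Copilot something to mimic, and a short comment states what we want next.

# Hypothetical sketch: give Copilot one example to mimic, then a short
# description of the next function we want. All names are invented.

def parse_price(value: str) -> float:
    """Convert a price string like '$1,234.56' to a float."""
    return float(value.replace("$", "").replace(",", ""))

# Prompt: "write parse_quantity, which turns strings like '1,200 units'
# into an int". With parse_price as context, a completion along these
# lines is typical:

def parse_quantity(value: str) -> int:
    """Convert a quantity string like '1,200 units' to an int."""
    return int(value.replace(",", "").replace("units", "").strip())

if __name__ == "__main__":
    print(parse_price("$1,234.56"))       # 1234.56
    print(parse_quantity("1,200 units"))  # 1200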
Another key challenge in using Copilot is the same one developers face when doing a web search - how do you know whether the answer actually applies to your situation? To make that call, a human has to understand the language and libraries well enough to evaluate the code they find, compare it to their need, and adapt it as appropriate.
We see this when interviewing job candidates at Sembit; one of the skills we assess is looking up information on the web, because we all use online resources throughout our workday. There are lots of code examples and answers out there, and picking an appropriate solution from that motley group is a skill indeed. The best developers use web searches (AI-powered or otherwise) as an extension of their mind, plucking specific details in seconds rather than trying to remember everything. They learn new details on the fly, filling gaps in their understanding. Less skilled developers spend many minutes searching with poor terms and overlook solutions because they don't know the language well enough to recognize the answer when it's on the screen in front of them.
Just like a web search, Copilot is hit-and-miss with the quality of its answers. And unfortunately, a mostly-right solution with a subtle bug is arguably the worst case - it is more likely to slip past code review than an outright failure.
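To make that concrete, here is a hypothetical example of the kind of "mostly right" suggestion we mean - invented for illustration, not copied from Copilot's output. It reads well and runs without errors, but an off-by-one in the loop bound quietly drops the final window:

def moving_average(values, window):
    """Return the moving average of values over the given window size."""
    return [
        sum(values[i:i + window]) / window
        # Bug: stops one window early; should be
        # range(len(values) - window + 1)
        for i in range(len(values) - window)
    ]

# moving_average([1, 2, 3, 4], 2) returns [1.5, 2.5] - the final window
# [3, 4] is silently missing, and nothing raises an exception.

A reviewer skimming this sees sensible names and a plausible comprehension; the only symptom is one missing element in the output, which nobody notices unless a test happens to check the last window.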
Some people are very optimistic about AI advances in the coming years and project a near future where AI will surpass human skill in many areas, based on extrapolating the sharp upturn in AI performance over the last five years. In the area of programming, though, our tests this month showed Copilot performing at about the same level as it did a year ago. Amazing for a machine, but not good enough to truly save any time for an experienced developer.
Here is the transcript of the tasks we gave it.