AI programmer Devin quietly fixes bugs and talks shop with a CTO, and netizens say it works like a top human developer

Devin, billed as the first AI programmer, has appeared in the internal chat group of a high-profile startup.

To solve a technical problem, Devin borrowed the account of one of its creators, communicated with the CTO of a client company, and adjusted its coding plan based on the responses.

The conversation was so professional that onlookers said the world had gone crazy.

The exchange took place in the workplace messaging app Slack. The “Akshat” in the screenshot is Akshat Bubna, CTO of the AI infrastructure startup Modal Labs.

Modal Labs is also one of the first customers of Cognition, the company that develops Devin.

In this exchange, Devin was posting under the account of Steven Hao, one of its creators and an IOI gold medalist.

The conversation began with Devin asking about the lifecycle of keys on the Modal platform, specifically how long it takes for an updated key to propagate to running applications.

Devin said it had already checked the documentation itself, including the guide to keys and environment variables, the CLI command reference, the API reference, and the container lifecycle hooks and parameters, but had found no clear information about key propagation time.

Devin asked how long it typically takes for updated keys to be picked up by running applications, explaining that this is critical to their operations and that understanding it would help them manage their deployment process.

The human CTO explained that updating a key does not affect Modal containers that are already running; only newly started containers read the updated values.

Devin thanked him and decided to manage keys in Modal manually for the time being, that is, to call the modal deploy command to restart the relevant application containers when needed.
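The manual workflow described here can be sketched in a few lines of Python. This is only an illustrative sketch, not Devin's actual code: it assumes the Modal CLI is installed and that the app lives in a file called `app.py` (a hypothetical name), and the helper functions `build_redeploy_command` and `redeploy` are invented for this example. The only thing taken from the conversation is the idea of running `modal deploy` after a key changes, so that freshly started containers read the new value.

```python
import shutil
import subprocess
from typing import List


def build_redeploy_command(app_file: str) -> List[str]:
    """Build the CLI invocation that redeploys a Modal app.

    `app_file` is a hypothetical path to the file that defines the app.
    """
    return ["modal", "deploy", app_file]


def redeploy(app_file: str, dry_run: bool = True) -> List[str]:
    """Redeploy the app so newly started containers read updated keys.

    With dry_run=True (the default) the command is only returned,
    never executed, which makes the sketch safe to try anywhere.
    """
    cmd = build_redeploy_command(app_file)
    if dry_run:
        return cmd
    if shutil.which("modal") is None:
        raise RuntimeError("modal CLI not found on PATH")
    # check=True raises CalledProcessError if the deploy fails.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Show the command that would run after a key has been updated.
    print(" ".join(redeploy("app.py")))
```

Keeping the dry-run default means the redeploy only happens when someone explicitly asks for it, which matches the "when needed" nature of the manual approach.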

After watching the entire process, Raunak Chowdhuri, who is also an AI entrepreneur, commented:

Finding problems, creating tickets, and tweaking code is how the best human developers work.

More hands-on test results for Devin

Not many people or companies have been given early access to Devin, but some of those who have are publishing hands-on test results.

Ethan Mollick, a Wharton professor and AI enthusiast, tried it and found its novel real-time interaction to be the feature most worth noting.

You can “talk” to it at any time, just like a human, and it will constantly execute and debug your ideas in the background.

In a test, Ethan Mollick asked Devin to develop a website explaining “dilution in startup financing.”

However, he noted that the AI cannot yet do this autonomously and without errors unless it gets some help.

There's still a long way to go before we can hand over a major project to artificial intelligence, but it's still a fascinating start.

The test shared by another entrepreneur, McKay Wrigley, was a little more exciting.

In the 27-minute test he posted, he sent only a GitHub link and asked Devin to deploy the code from an open source project.

Devin independently broke the task down into a series of sub-steps and started executing them one by one.

During execution, Devin ran into trouble installing the Supabase database, so it opened the corresponding GitHub repository and started checking the documentation.

The subsequent terminal output shows that Devin worked out what to fill in for the various ports and keys required to run Supabase.

(Anyone who has installed it knows this is quite a hassle…)

Meanwhile, Devin kept revising its follow-up plan according to the actual situation.

After a while, a local chatbot program started running.

After testing for a while, McKay Wrigley concluded that Devin could already count as the ChatGPT moment for agents.

Plans to reproduce Devin

While Devin's testers keep at it, open source efforts to “reproduce” it are also under way…

Sure enough, MetaGPT, a GitHub project with 30,000 stars, has just released a new “open source version of Devin”.

It is named Data Interpreter:

Like Devin, Data Interpreter can program autonomously, iteratively observe data, and predict and analyze things such as disease progression and machine operating status; it can also build machine learning models, perform mathematical reasoning, automatically reply to emails, and imitate websites…

For example, analyzing the closing price trend from NVIDIA stock price data:

Analyzing data to predict wine quality:

In addition, Binyan Hui of Alibaba's Qwen team and others have started the OpenDevin project, which has only just begun and has already received 1.2k stars.

Binyan Hui tweeted that there was a preliminary roadmap and a group of great people working hard to complete the front-end prototype in a short period of time.

At the same time, the project team is also recruiting new members:

In addition, a team called Maisa AI launched Maisa KPU (Knowledge Processing Unit), which netizens see as something of a competitor to Devin.

Maisa KPU is currently in testing and is aimed at complex problem solving and reasoning. The benchmark results released by the team are as follows:

According to the demo, KPU can act as an “intelligent customer service” agent, helping a customer resolve an undelivered order even when the customer has not written down the order number correctly:

Devin Benchmark Technical Report Released

Cognition, the team behind Devin, also recently released a technical report on its SWE-bench testing. In addition to the previously announced test results, the team revealed some new information.

For example, one of Cognition's goals is to enable Devin, an AI agent specializing in software development, to successfully contribute code to large, complex code bases.

The choice to run the agent end-to-end on SWE-bench is also based on the consideration that it is closer to real-world software development.

In addition, the team revealed that, to prevent Devin from cheating in the test, for example by looking up external pull request information, the test environment was set up so that Devin could not access the relevant information, and Devin's runs were manually checked during the process.

Finally, the team emphasized that Devin is still in its infancy and there is still a lot of room for improvement:

Readers interested in more details can consult the full report.

Less than a week after Devin's release, the discussion among netizens is already heated. For example, one user said that what he had worried about a year ago has finally happened: from now on, Stack Overflow will be filled with Devins asking questions, and humans will be squeezed out (Stack Overflow is in danger!!!):

Some netizens replied, tongue in cheek:

They can answer each other's questions.

Some netizens noticed that Cognition, the team behind Devin, was recruiting full-time software engineers, which prompted a puzzled question:

Shouldn't Devin be filling these positions to save them money?

Finally, if Devin were released to the public, what would you want to do with it?

Reference links:

  • (1) https://www.cognition-labs.com/post/swe-bench-technical-report

  • (2) https://x.com/raunakdoesdev/status/1769066769786757375

  • (3) https://twitter.com/emollick/status/1768742585122558063

  • (4) https://x.com/mckaywrigley/status/1767985840448516343

  • (5) https://x.com/maisaAI_/status/1768657114669429103?s=20
