I have been feeling a constant sense of AI FOMO. Every day, I see people sharing AI tips, new agents and skills they built, and vibe-coded apps. I’m increasingly realizing that adapting quickly to AI is becoming a requirement for staying competitive as a data scientist today.
But I’m not only talking about brainstorming with ChatGPT, generating code with Cursor, or polishing a report with Claude. The bigger shift is that AI can now participate in a much more end-to-end data science workflow.
To make the idea concrete, I tried it on a real project using my Apple Health data.
A Simple Example: Apple Health Analysis
Context
I’ve been wearing an Apple Watch every day since 2019 to track my health data, such as heart rate, energy burned, sleep quality, and so on. This data contains years of behavioral signals about my daily life, but the Apple Health app mostly surfaces it through simple trend views.
I tried to analyze a two-year Apple Health export six years ago, but it ended up becoming one of those side projects that you never finish… My goal this time is to extract more insights from the raw data quickly with the help of AI.
What I had to work with
Here are the relevant resources I have:
- Raw Apple Health export data: 1.85GB in XML, uploaded to my Google Drive.
- Sample code to parse the raw export into structured datasets, in my GitHub repo from six years ago. The code could be outdated, though.
Workflow without AI
A standard workflow without AI would look a lot like what I tried six years ago: inspect the XML structure, write Python to parse it into structured local datasets, conduct EDA with pandas and NumPy, and summarize the insights.
I’m sure every data scientist knows this process. It’s not rocket science, but it takes time to build. Getting to a polished insights report would take at least a full day. That’s why that six-year-old repo is still marked as WIP…
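To give a sense of what the manual parsing step involves, here is a minimal sketch, not the code from my old repo. It assumes the export stores each measurement as a `Record` element with `type`, `startDate`, `value`, and `unit` attributes, and that `startDate` begins with `YYYY-MM-DD`. The sample XML and its numbers are made up for illustration, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny made-up stand-in for the real export file, which is far larger.
SAMPLE_EXPORT = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData locale="en_US">
  <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="kcal"
          startDate="2021-03-01 08:00:00 -0800" value="120.5"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
          startDate="2021-03-01 08:05:00 -0800" value="72"/>
  <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="kcal"
          startDate="2021-03-01 18:00:00 -0800" value="80.0"/>
  <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="kcal"
          startDate="2021-03-02 09:00:00 -0800" value="50.0"/>
</HealthData>
"""


def iter_records(source, record_type):
    """Yield (start_date, value, unit) for each Record of the given type.

    iterparse streams the file, and clearing the root after every Record
    keeps memory flat even for a multi-gigabyte export.
    Assumes the chosen type carries numeric values.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Record":
            if elem.get("type") == record_type:
                yield elem.get("startDate"), float(elem.get("value")), elem.get("unit")
            root.clear()  # drop Records that have already been processed


def daily_totals(records):
    """Sum record values per calendar day (first 10 characters of startDate)."""
    totals = defaultdict(float)
    for start_date, value, _unit in records:
        totals[start_date[:10]] += value
    return dict(totals)


daily_kcal = daily_totals(
    iter_records(io.BytesIO(SAMPLE_EXPORT), "HKQuantityTypeIdentifierActiveEnergyBurned")
)
print(daily_kcal)
```

For the real file, you would pass the path to the export instead of the in-memory sample. Streaming matters here because loading 1.85GB of XML into a single tree can easily exhaust memory on a laptop.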
AI end-to-end workflow
My updated workflow with AI is:
- AI locates the raw data in my Google Drive and downloads it.
- AI references my old GitHub code and writes a Python script to parse the raw data.
- AI uploads the parsed datasets to Google BigQuery. Of course, the analysis can be done locally without BigQuery, but I set it up this way to better resemble a real work environment.
- AI runs SQL queries against BigQuery to conduct the analysis and compile an analysis report.
Essentially, AI handles nearly every step from data engineering to analysis, with me acting more as a reviewer and decision-maker.
AI-generated report
Now, let’s see what Codex was able to generate with my guidance and some back-and-forth in 30 minutes, excluding the time to set up the environment and tooling.
I chose Codex because I primarily use Claude Code at work, so I wanted to explore a different tool. I used this chance to set up my Codex environment from scratch so I could better evaluate all the effort required.
As you can see, this report is well structured and visually polished. It summarizes valuable insights into annual trends, exercise consistency, and the impact of travel on activity levels. It also provides recommendations and states limitations and assumptions. What impressed me most was not just the speed, but how quickly the output began to look like a stakeholder-facing analysis instead of a rough notebook.
Please note that the report is sanitized to protect my data privacy.



How I Actually Did It
Now that we’ve seen the impressive work AI can generate in 30 minutes, let me break it down and show you all the steps I took to make it happen. I used Codex for this experiment. Like Claude Code, it can run in the desktop app, an IDE, or the CLI.
1. Set up MCP
To enable Codex to access tools, including Google Drive, GitHub, and Google BigQuery, the next step was to set up Model Context Protocol (MCP) servers.
The easiest way to set up MCP is to ask Codex to do it for you. For example, when I asked it to set up the Google Drive MCP, it configured my local files quickly and gave clear next steps on how to create an OAuth client in the Google Cloud Console.
It doesn’t always succeed on the first try, but patience helps. When I asked it to set up the BigQuery MCP, it failed at least 10 times before the connection succeeded. But each time, it gave me clear instructions on how to test it and what information would help with troubleshooting.


2. Make a plan with Plan Mode
After setting up the MCPs, I moved to the actual project. For a complicated project that involves multiple data sources, tools, or questions, I usually start with Plan Mode to decide on the implementation steps. In both Claude Code and Codex, you can enable Plan Mode with /plan. It works like this: you outline the task and your rough plan, then the model asks clarifying questions and proposes a more detailed implementation plan for you to review and refine. In the screenshots below, you can find my first iteration with it.



3. Execution and iteration
After I hit “Yes, implement this plan”, Codex started executing on its own, following the steps. It worked for 13 minutes and generated the first analysis below. It moved fast across different tools, but it did the analysis locally because it ran into more issues with the BigQuery MCP. After another round of troubleshooting, it was able to upload the datasets and run queries in BigQuery properly.

However, the first-pass output was still shallow, so I guided it to go deeper with follow-up questions. For example, I have flight tickets and travel plans from past trips in my Google Drive. I asked it to find them and analyze my activity patterns during trips. It successfully located these files, extracted my travel days, and ran the analysis.
After a few iterations, it was able to generate a much more comprehensive report, the one I shared at the beginning, within 30 minutes. You can find its code here. That was probably one of the most important lessons from the exercise: AI moved fast, but depth still came from iteration and better questions.
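For illustration, the travel comparison boils down to something like the sketch below. This is not the code Codex produced: the function names are hypothetical, the trip range and calorie numbers are made up, and it assumes daily totals are already keyed by date.

```python
from datetime import date, timedelta
from statistics import mean


def expand_trips(trips):
    """Turn inclusive (start, end) trip ranges into a set of individual dates."""
    days = set()
    for start, end in trips:
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days


def compare_travel_activity(daily_kcal, trips):
    """Average daily energy burned on travel days versus all other days."""
    travel_days = expand_trips(trips)
    on_trip = [kcal for day, kcal in daily_kcal.items() if day in travel_days]
    at_home = [kcal for day, kcal in daily_kcal.items() if day not in travel_days]
    return {
        "travel_avg": mean(on_trip) if on_trip else None,
        "home_avg": mean(at_home) if at_home else None,
    }


# Made-up example data: four days, the last two spent on a trip.
example_daily_kcal = {
    date(2023, 5, 1): 400.0,
    date(2023, 5, 2): 500.0,
    date(2023, 5, 3): 700.0,
    date(2023, 5, 4): 900.0,
}
example_trips = [(date(2023, 5, 3), date(2023, 5, 4))]

summary = compare_travel_activity(example_daily_kcal, example_trips)
print(summary)
```

A plain difference in averages like this is only a starting point. It says nothing about why activity differs on trips, which is where follow-up questions come in.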

Takeaways for Data Scientists
What AI Changes
Above is a small example of how I used Codex and MCPs to run an end-to-end analysis without manually writing a single line of code. What are the takeaways for data scientists at work?
- Think beyond coding assistance. Rather than using AI only for coding and writing, it’s worth expanding its role across the full data science lifecycle. Here, I used AI to locate raw data in Google Drive and upload parsed datasets to BigQuery. There are many more AI use cases related to data pipelining and model deployment.
- Context becomes a force multiplier. MCPs are what made this workflow much more powerful. Codex scanned my Google Drive to find my travel dates and read my old GitHub code to find sample parsing code. Similarly, you can enable other company-approved MCPs to help your AI (and yourself) better understand the context. For example:
– Connect to the Slack MCP and Gmail MCP to search for past relevant conversations.
– Use the Atlassian MCP to access the table documentation on Confluence.
– Set up the Snowflake MCP to explore the data schema and run queries.
- Rules and reusable skills matter. Although I didn’t demonstrate it explicitly in this example, you should customize rules and create skills to guide your AI and extend its capabilities. These topics are worth their own article next time 🙂
How the Role of Data Scientists Will Evolve
But does this mean AI will replace data scientists? This example also sheds light on how data scientists’ roles will shift in the future.
- Less manual execution, more problem-solving. In the example above, the initial analysis Codex generated was very basic. The quality of AI-generated analysis depends heavily on the quality of your problem framing. You need to define the question clearly, break it into actionable tasks, identify the right approach, and push the analysis deeper.
- Domain knowledge is critical. Domain knowledge is still very much required to interpret results correctly and provide recommendations. For example, AI noticed my activity level had declined significantly since 2020. It couldn’t find a convincing explanation, but said: “Possible causes include routine changes, work schedule, lifestyle shifts, injury, motivation, or less structured training, but these are inferences, not findings.” But the real reason behind it, as you might have realized, is the pandemic. I started working from home in early 2020, so naturally, I burned fewer calories. This is a very simple example of why domain knowledge still matters: even if AI can access all the past docs in your company, that doesn’t mean it will understand all the business nuances, and that’s your competitive advantage.
- This example was relatively simple, but there are still many classes of work where I would not trust AI to operate independently today, especially projects that require stronger technical and statistical judgment, such as causal inference.
Important Caveats
Last but not least, there are some considerations you have to keep in mind while using AI:
- Data security. I’m sure you’ve heard this many times already, but let me repeat it once more. The data security risk of using AI is real. For a personal side project, I can set things up however I want and take my own risk (honestly, granting AI full access to Google Drive feels like a risky move, so this is more for illustration purposes). But at work, always follow your company’s guidance on which tools are safe to use and how. And make sure to read through every single command before clicking “approve”.
- Double-check the code. For my simple project, AI could write accurate SQL without problems. But in more complicated business settings, I still see AI make mistakes in its code from time to time. Sometimes, it joins tables with different granularities, causing fan-out and double-counting. Other times, it misses important filters and conditions.
- AI is convenient, but it might accomplish your ask with unexpected side effects… Let me tell you a joke to end this article. This morning, I turned on my laptop and saw an alert that no disk storage was left. I have a MacBook Pro with a 512GB SSD, and I was pretty sure I had only used around half of the storage. Since I was playing with Codex last night, it became my first suspect. So I actually asked it, “hey did you do something? My ‘system data’ had grown by 150GB overnight”. It responded, “No, Codex only takes xx MB”. Then I dug through my files and saw a 142GB “bigquery-mcp-wrapper.log”… Likely, Codex set up this log when it was troubleshooting the BigQuery MCP setup. Later, during the actual analysis task, it exploded into a huge file. So yes, this magical wishing machine comes at a cost.
This experience summed up the tradeoff well for me: AI can dramatically compress the distance between raw data and useful analysis, but getting the most out of it still requires judgment, oversight, and a willingness to debug the workflow itself.
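As a footnote to the “double-check the code” caveat, here is a minimal sketch of the join fan-out problem. It uses an in-memory SQLite database as a local stand-in for BigQuery, and the table names, column names, and numbers are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_activity (day TEXT PRIMARY KEY, kcal REAL);   -- one row per day
CREATE TABLE workouts (day TEXT, workout_type TEXT, minutes REAL); -- many rows per day
INSERT INTO daily_activity VALUES ('2023-05-01', 500), ('2023-05-02', 300);
INSERT INTO workouts VALUES
  ('2023-05-01', 'run', 30),
  ('2023-05-01', 'yoga', 20),
  ('2023-05-02', 'walk', 40);
""")

# Naive join: the day with two workouts appears twice, so its kcal is counted twice.
naive_total = conn.execute("""
    SELECT SUM(d.kcal)
    FROM daily_activity AS d
    JOIN workouts AS w ON w.day = d.day
""").fetchone()[0]

# Safer join: aggregate workouts to one row per day before joining.
safe_total = conn.execute("""
    SELECT SUM(d.kcal)
    FROM daily_activity AS d
    JOIN (
        SELECT day, SUM(minutes) AS minutes
        FROM workouts
        GROUP BY day
    ) AS w ON w.day = d.day
""").fetchone()[0]

print(naive_total)  # 1300.0: the 500 kcal day was double-counted
print(safe_total)   # 800.0: matches the true total of 500 + 300
conn.close()
```

When reviewing AI-written SQL, a quick check is to compare row counts or totals before and after each join. If a total grows after joining, the tables probably do not share the same grain.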
