
    Advanced Prompt Engineering for Data Science Projects

    By ProfitlyAI | August 19, 2025


    If you work in data science, you’ve probably wondered more than once how to improve your workflows, how to speed up tasks, and how to produce better results.

    The dawn of LLMs has helped countless data scientists and ML engineers not only improve their models, but also iterate faster, learn, and focus on the tasks that really matter.

    In this article, I’m sharing my favorite prompts and prompt engineering tips that help me tackle Data Science and AI tasks.

    Besides, Prompt Engineering will soon be a required skill in almost all DS and ML job descriptions.

    This guide walks you through practical, research-backed prompt techniques that speed up (and sometimes automate) every stage of your ML workflow.

    This is the second in a series of 3 articles I’m writing about Prompt Engineering for Data Science:

    • Part 2: Prompt Engineering for Features, Modeling, and Evaluation (this article)
    • Part 3: Prompt Engineering for Docs, DevOps, and Learning

    👉 All the prompts in this article are available at the end as a cheat sheet 😉

    In this article:

    1. First Things First: What Makes a Good Prompt?
    2. Prompt Engineering for Features, Modeling, and Evaluation
    3. Prompt Engineering cheat sheet

    First Things First: What Makes a Good Prompt?

    You might know this by now, but it’s always good to refresh our minds. Let’s break it down.

    Anatomy of a High-Quality Prompt

    Role & Task

    Start by telling the LLM who it is and what it needs to do. E.g.:

    "You are a senior data scientist with experience in feature engineering, data cleaning, and model deployment."

    Context & Constraints

    This part is really important. Add as many details and as much context as you can.

    Pro Tip: Add all the details + context in the same prompt. It’s been shown to work best this way.

    This includes: data type and format, data source and origin, sample schema, output format, level of detail, structure, tone and style, token limits, calculation rules, domain knowledge, etc.

    Examples or Tests

    Give it a few examples to follow, or even unit tests to check the output.

    Example: Formatting style for a summary

    **Input:**
    Transaction: { "amount": 50.5, "currency": "USD", "type": "credit", "date": "2025-07-01" }
    
    **Desired Output:**
    - Date: 1 July 2025
    - Amount: $50.50
    - Type: Credit
    

    Evaluation Hook

    Ask it to rate its own response, explain its reasoning, or output a confidence score.
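    Putting the anatomy together, here’s a minimal sketch of how such a prompt could be assembled in Python. The role, task, and example strings are illustrative placeholders, not an official template:

```python
# Illustrative building blocks: role, formatting example, data, self-check.
role = "You are a senior data scientist with experience in feature engineering."
example = "- Date: 1 July 2025\n- Amount: $50.50\n- Type: Credit"
data = '{"amount": 50.5, "currency": "USD", "type": "credit", "date": "2025-07-01"}'

# Instructions come first; the raw data is wrapped in clear delimiters at the end.
prompt = (
    f"## Role\n{role}\n\n"
    "## Task\nSummarize the transaction below in the format of the example.\n\n"
    f"## Example output\n{example}\n\n"
    "## Self-check\nRate your confidence in the summary from 0 to 1.\n\n"
    f"## Data\n```\n{data}\n```"
)
print(prompt)
```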

    Other Prompting Tips

    Clear delimiters (##) make sections scannable. Use them all the time!

    Put your instructions before the data, and wrap context in clear delimiters like triple backticks.

    E.g.: ## These are my instructions

    Be as specific as you can. Say “return a Python list” or “only output valid SQL.”

    Keep the temperature low (≤0.3) for tasks that need consistent output, but you can increase it for creative tasks like feature brainstorming.

    If you’re on a budget, use cheaper models for quick ideas, then switch to a premium one to polish the final version.

    Prompt Engineering for Features, Modeling, and Evaluation

    1. Text Features

    With the right prompt, an LLM can instantly generate a diverse set of semantic, rule-based, or linguistic features, complete with realistic examples that you can, after reviewing, plug into your workflow.

    Template: Univariate Text Feature Brainstorm

    ## Instructions
    Role: You are a feature-engineering assistant.
    Task: Propose 10 candidate features to predict {target}.
    
    ## Context
    Text source: """{doc_snippet}"""
    Constraints: Use only pandas & scikit-learn. Avoid duplicates.
    
    ## Output
    Markdown table: [FeatureName | FeatureType | PythonSnippet | NoveltyScore(0–1)]
    
    ## Self-check
    Rate your confidence in coverage (0–1) and explain in ≤30 words.
    

    Pro Tips:

    • Pair this with embeddings to create dense features.
    • Validate the generated Python snippets in a sandboxed environment before using them (so you catch syntax errors or data types that don’t match).
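    As a minimal sketch of that validation step (the snippet string below is a hypothetical LLM output, not from the article), you can gate generated code on a syntax check and a dry run against a tiny sample frame:

```python
import ast

import pandas as pd

# Hypothetical feature snippet returned by the LLM
snippet = "df['title_len'] = df['title'].str.len()"

# Gate 1: reject anything that is not even syntactically valid Python
ast.parse(snippet)

# Gate 2: execute against a tiny sample frame before touching real data,
# so missing columns and type mismatches surface immediately
sample = pd.DataFrame({"title": ["foo", "hello world"]})
exec(snippet, {"df": sample})
print(sample["title_len"].tolist())
```

    In a real pipeline you would run this in an isolated process or container rather than a bare `exec`, but the two gates are the same.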

    2. Tabular Features

    Manual feature engineering is usually not fun. Especially for tabular data, this process can take days and is usually very subjective.

    Tools like LLM-FE take a different approach. They treat LLMs as evolutionary optimizers that iteratively invent and refine features until performance improves.

    Developed by researchers at Virginia Tech, LLM-FE works in loops:

    1. The LLM proposes a new transformation based on the current dataset schema.
    2. The candidate feature is tested using a simple downstream model.
    3. The most promising features are kept, refined, or combined (just like in genetic algorithms, but powered by natural language prompts).
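    The propose-score-keep loop can be sketched in a few lines. This toy version uses synthetic data and absolute correlation as a stand-in fitness score; in LLM-FE the candidates would come from the model and the fitness from a real downstream fit:

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = rng.normal(size=200), rng.normal(size=200)
y = 3 * a**2 + rng.normal(scale=0.1, size=200)

# Step 1: transformations an LLM might propose from the schema
candidates = {"a_squared": a**2, "a_plus_b": a + b, "b_cubed": b**3}

# Step 2: score each candidate with a simple downstream check
scores = {name: abs(np.corrcoef(f, y)[0, 1]) for name, f in candidates.items()}

# Step 3: keep the most promising feature for the next round
best = max(scores, key=scores.get)
print(best)
```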

    This method has proven to perform really well compared to manual feature engineering.

    Architecture of the LLM-FE framework, where a large language model acts as an evolutionary optimizer. Source: nikhilsab/LLMFE: This is the official repo for the paper “LLM-FE”

    Prompt (LLM-FE style):

    ## Instructions
    Role: Evolutionary feature engineer.
    Task: Suggest ONE new feature from schema {schema}.
    Fitness goal: Max mutual information with {target}.
    
    ## Output
    JSON: { "feature_name": "...", "python_expression": "...", "reasoning": "... (≤40 words)" }
    
    ## Self-check
    Rate novelty & expected impact on target correlation (0–1).
    

    3. Time-Series Features

    If you’ve ever battled seasonal trends or sudden spikes in your time-series data, you know it can be hard to deal with all the moving pieces.

    TEMPO is a project that lets you prompt for decomposition and forecasting in one simple step, so it can save you hours of manual work.

    Seasonality-Aware Prompt:

    ## Instructions
    System: You are a temporal data scientist.
    Task: Decompose time series {y_t} into components.
    
    ## Output
    Dict with keys: ["trend", "seasonal", "residual"]
    
    ## Extra
    Explain detected change-points in ≤60 words.
    Self-check: Confirm decomposition sums ≈ y_t (tolerance 1e-6).
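    For reference, the decomposition that prompt asks for can be reproduced by hand with pandas. This is a classical additive decomposition on a synthetic monthly series, not TEMPO itself:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, period = 120, 12
t = np.arange(n)
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, n))

# Trend: centered moving average; seasonal: per-phase mean of the detrended series
trend = y.rolling(period, center=True).mean()
seasonal = (y - trend).groupby(t % period).transform("mean")
residual = y - trend - seasonal

# Self-check from the prompt: the components sum back to the series
recon = trend + seasonal + residual
print(np.allclose(recon.dropna(), y[recon.notna()]))
```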
    

    4. Text Embedding Features

    The idea behind the next prompt is pretty straightforward: I’m taking documents and pulling out the key insights that would actually be useful for someone trying to understand what they’re dealing with.

    ## Instructions
    Role: NLP feature engineer
    Task: For each document, return sentiment_score, top3_keywords, reading_level.
    
    ## Constraints
    - sentiment_score in [-1,1] (neg→pos)
    - top3_keywords: lowercase, no stopwords/punctuation, ranked by tf-idf (fallback: frequency)
    - reading_level: Flesch–Kincaid Grade (number)
    
    ## Output
    CSV with header: doc_id,sentiment_score,top3_keywords,reading_level
    
    ## Input
    docs = [{ "doc_id": "...", "text": "..." }, ...]
    
    ## Self-check
    - Header present (Y/N)
    - Row count == len(docs) (Y/N)
    

    Instead of just giving you a basic “positive/negative” classification, I’m using a continuous score between -1 and 1, which gives you much more nuance.

    For the keyword extraction, I went with TF-IDF scoring because it works really well at surfacing the terms that matter most in each document.

    Code Generation & AutoML

    Choosing the right model, building the pipeline, and tuning the parameters: it’s the holy trinity of machine learning, but also the part that can eat up days of work.

    LLMs are game-changers for this. Instead of sitting there comparing dozens of models or hand-coding yet another preprocessing pipeline, I can just describe what I’m trying to do and get solid recommendations back.

    Model Selection Prompt Template:

    ## Instructions
    System: You are a senior ML engineer.
    Task: Analyze preview data + metric = {metric}.
    
    ## Steps
    1. Rank the top 5 candidate models.
    2. Write a scikit-learn Pipeline for the best one.
    3. Propose 3 hyperparameter grids.
    
    ## Output
    Markdown with sections: [Ranking], [Code], [Grids]
    
    ## Self-check
    Justify the top model choice in ≤30 words.
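    The pipeline and grid the prompt asks for (steps 2 and 3) might come back looking something like this sketch, with logistic regression and synthetic data standing in for whatever the LLM actually ranks first:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=42)

# One candidate pipeline plus a small hyperparameter grid
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, scoring="roc_auc", cv=3)
grid.fit(X, y)
print(grid.best_params_)
```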
    

    You don’t have to stop at rankings and pipelines, though.

    You can also tweak this prompt to include model explainability from the start. This means asking the LLM to justify why it ranked models in a certain order, or to output feature importance (SHAP values) after training.

    That way, you’re not just getting a black-box recommendation, you’re getting clear reasoning behind it.

    Bonus Bit (Azure ML Edition)

    If you’re using Azure Machine Learning, this will be useful to you.

    With AutoMLStep, you wrap an entire automated machine learning experiment (model selection, tuning, evaluation) into a modular step within an Azure ML pipeline. You get version control, scheduling, and easy repeat runs.

    You can also make use of Prompt Flow: it adds a visual, node-based layer to this. Features include a drag-and-drop UI, flow diagrams, prompt testing, branching logic, and live evaluation:

    Example of a simple pipeline in Azure AI Foundry’s Prompt Flow editor, where different tools, like the LLM tool and the Python tool, are connected together. Source: Prompt flow in Azure AI Foundry portal – Azure AI Foundry | Microsoft Learn

    You can also easily plug Prompt Flow into whatever you already have running, so your LLM and AutoML pieces all work together without any problems. Everything flows in one automated setup you can actually ship.

    Prompts for Fine-Tuning

    Fine-tuning a large model doesn’t always mean retraining it from scratch (who has time for that?).

    Instead, you can use lightweight techniques like LoRA (Low-Rank Adaptation) and PEFT (Parameter-Efficient Fine-Tuning).

    LoRA

    LoRA is actually pretty clever: instead of retraining a huge model from scratch, it basically just adds tiny trainable layers on top of what’s already there. Most of the original model stays frozen, and you’re only tweaking small weight matrices to get it to do what you want.
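    The core idea fits in a few lines of numpy. With hidden size d and rank r much smaller than d, the frozen weight W only ever receives a low-rank update B @ A, so the trainable parameter count drops dramatically (the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # hidden size, LoRA rank (r much smaller than d)

W = rng.normal(size=(d, d))   # frozen pretrained weight
A = rng.normal(size=(r, d))   # trainable
B = np.zeros((d, r))          # trainable, zero-initialized so the update starts at 0

alpha = 8
W_eff = W + (alpha / r) * (B @ A)  # effective weight during fine-tuning

# Only A and B are trained: 2*d*r parameters instead of d*d
print(A.size + B.size, "trainable vs", W.size, "frozen")
```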

    PEFT

    PEFT is basically the umbrella term for all these smart approaches (LoRA being one of them) where you only train a small slice of the model’s parameters instead of the whole massive thing.

    The compute savings are incredible. What used to take forever and cost a fortune now runs much faster and cheaper, because you’re barely touching most of the model.

    The best part of all this: you don’t even have to write these fine-tuning scripts yourself anymore. LLMs can generate the code for you, and they get better at it over time by learning from how well your models perform.

    Fine-Tuning Dialogue Prompt

    ## Instructions
    Role: AutoTunerGPT.
    Signature: base_model, task_dataset → tuned_model_path.
    Goal: Fine-tune {base_model} on {task_dataset} using PEFT-LoRA.
    
    ## Constraints
    - batch_size ≤ 16, epochs ≤ 5
    - Save to ./lora-model
    - Use F1 on validation; set seed=42; enable early stopping (no val gain for 2 epochs)
    
    ## Output
    JSON:
    {
      "tuned_model_path": "./lora-model",
      "train_args": { "batch_size": ..., "epochs": ..., "learning_rate": ..., "lora_r": ..., "lora_alpha": ..., "lora_dropout": ... },
      "val_metrics": { "f1_before": ..., "f1_after": ... },
      "expected_f1_gain": ...
    }
    
    ## Self-check
    - Verify constraints respected (Y/N).
    - If N, explain in ≤20 words.
    

    Tool tip: Use DSPy to improve this process. DSPy is an open-source framework for building self-improving pipelines. It can automatically rewrite prompts, enforce constraints (like batch size or training epochs), and track every change across multiple runs.

    In practice, you can run a fine-tuning job today, review the results tomorrow, and have the system auto-adjust your prompt and training settings for a better outcome, without having to start from scratch!

    Let LLMs Evaluate Your Models

    Smarter Evaluation Prompts
    Studies show that LLMs score predictions almost like humans when guided by good prompts.

    Here are 3 prompts that can help you boost your evaluation process:

    Single-Example Evaluation Prompt

    ## Instructions
    System: Evaluation assistant.
    User: Ground truth = {truth}; Prediction = {pred}.
    
    ## Criteria
    - factual_accuracy ∈ [0,1]: 1 if semantically equivalent to truth; 0 if contradictory; partial if missing/extra but not wrong.
    - completeness ∈ [0,1]: fraction of required facts from truth present in pred.
    
    ## Output
    JSON:
    { "accuracy": <float>, "completeness": <float>, "explanation": "<≤40 words>" }
    
    ## Self-check
    Cite which facts were matched/missed in ≤15 words.
    

    Cross-Validation Code

    ## Instructions
    You are CodeGenGPT.
    
    ## Task
    Write Python to:
    - Load train.csv
    - Stratified 80/20 split
    - Train LightGBM on {feature_list}
    - Compute & log ROC-AUC (validation)
    
    ## Constraints
    - Assume label column: "target"
    - Use sklearn for split/metric, lightgbm.LGBMClassifier
    - random_state=42, test_size=0.2
    - Return ONLY a Python code block (no prose)
    
    ## Output
    (code block only)
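    Run against a synthetic stand-in for train.csv, the code that prompt requests would look roughly like this; sklearn’s GradientBoostingClassifier is substituted for LightGBM so the sketch has no extra dependency:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv with a "target" label column
X, y = make_classification(n_samples=500, random_state=42)
df = pd.DataFrame(X)
df["target"] = y

# Stratified 80/20 split, as the prompt's constraints require
X_tr, X_val, y_tr, y_val = train_test_split(
    df.drop(columns="target"), df["target"],
    test_size=0.2, stratify=df["target"], random_state=42,
)

model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation ROC-AUC: {auc:.3f}")
```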
    

    Regression Judge

    ## Instructions
    System: Regression evaluator
    Input: Truth={y_true}; Prediction={y_pred}
    
    ## Rules
    abs_error = mean absolute error over all points
    Let R = max(y_true) - min(y_true)
    Category:
    - "Excellent" if abs_error ≤ 0.05 * R
    - "Acceptable" if 0.05 * R < abs_error ≤ 0.15 * R
    - "Poor" if abs_error > 0.15 * R
    
    ## Output
    { "abs_error": <float>, "category": "Excellent/Acceptable/Poor" }
    
    ## Self-check (brief)
    Validate len(y_true)==len(y_pred) (Y/N)
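    Those rules are deterministic, so you can also implement the judge directly and keep the LLM out of the loop; a minimal version:

```python
import numpy as np

def judge_regression(y_true, y_pred):
    # Mirrors the prompt: MAE relative to the range of the ground truth
    assert len(y_true) == len(y_pred)  # the prompt's self-check
    abs_error = float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
    R = max(y_true) - min(y_true)
    if abs_error <= 0.05 * R:
        category = "Excellent"
    elif abs_error <= 0.15 * R:
        category = "Acceptable"
    else:
        category = "Poor"
    return {"abs_error": abs_error, "category": category}

print(judge_regression([0.0, 10.0], [0.2, 10.1]))
```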
    

    Troubleshooting Guide: Prompt Edition

    If you ever run into one of these 3 problems, here’s how to fix it:

    Problem | Symptom | Fix
    Hallucinated features | Uses columns that don’t exist | Add schema + validation to the prompt
    Too much “creative” code | Flaky pipelines | Set library limits + add test snippets
    Evaluation drift | Inconsistent scoring | Set temp=0, log prompt version

    Wrapping It Up

    Since LLMs became mainstream, prompt engineering has officially leveled up. It’s now a serious methodology that touches every part of ML and DS workflows. That’s why a huge part of AI research is focused on how to improve and optimize prompts.

    In the end, better prompt engineering means better outputs and a lot of time saved. Which I guess is the dream of any data scientist 😉


    Thanks for reading!

    👉 Grab the Prompt Engineering Cheat Sheet with all the prompts from this article organized. I’ll send it to you when you subscribe to Sara’s AI Automation Digest. You’ll also get access to an AI tool library and my free AI automation newsletter every week!

    I offer mentorship on career growth and transition here.

    If you want to support my work, you can buy me my favorite coffee: a cappuccino. 😊

    References

    What is LoRA (Low-Rank Adaption)? | IBM

    A Guide to Using ChatGPT For Data Science Projects | DataCamp


