
    4 Levels of GitHub Actions: A Guide to Data Workflow Automation

    By ProfitlyAI | April 4, 2025 | 13 min read


    Automation has become an indispensable element for ensuring operational efficiency and reliability in modern software development. GitHub Actions, an integrated Continuous Integration and Continuous Deployment (CI/CD) tool within GitHub, has established its position in the software development industry by providing a comprehensive platform for automating development and deployment workflows. However, its functionality extends beyond this … We will delve into the use of GitHub Actions in the data domain, demonstrating how it can streamline processes for developers and data professionals by automating data retrieval from external sources and data transformation operations.

    GitHub Action Benefits

    GitHub Actions is already well known for its capabilities in the software development domain, and more recently it has also been recognized for offering compelling benefits in streamlining data workflows:

    • Automate the setup of data science environments, such as installing dependencies and required packages (e.g. pandas, PyTorch).
    • Streamline the data integration and data transformation steps by connecting to databases to fetch or update records, and using scripting languages like Python to preprocess or transform the raw data.
    • Create an iterable data science lifecycle by automating the training of machine learning models whenever new data is available, and deploying models to production environments automatically after successful training.
    • GitHub Actions is free for unlimited usage on GitHub-hosted runners for public repositories. It also provides 2,000 free minutes of compute time per month for individual accounts using private repositories. It is easy to set up for building a proof-of-concept, requiring only a GitHub account, without worrying about opting in to a cloud provider.
    • Numerous GitHub Actions templates and community resources are available online. Additionally, community and crowdsourced forums provide answers to common questions and troubleshooting support.

    GitHub Action Building Blocks

    GitHub Actions is a feature of GitHub that allows users to automate workflows directly within their repositories. These workflows are defined using YAML files and can be triggered by various events such as code pushes, pull requests, issue creation, or scheduled intervals. With its extensive library of pre-built actions and the ability to write custom scripts, GitHub Actions is a versatile tool for automating tasks.

    • Event: If you have ever used an automation on your devices, such as turning on dark mode after 8pm, then you are familiar with the concept of using a trigger point or condition to initiate a workflow of actions. In GitHub Actions, this is called an Event, which can be time-based, e.g. scheduled on the 1st day of the month or automatically run every hour. Alternatively, Events can be triggered by certain behaviors, such as whenever changes are pushed from a local repository to a remote repository.
    • Workflow: A workflow consists of a series of jobs, and GitHub allows you to customize each individual step in a job to your needs. It is generally defined by a YAML file stored in the .github/workflows directory of a GitHub repository.
    • Runners: a hosted environment that runs the workflow. Instead of running a script on your laptop, you can borrow GitHub-hosted runners to do the job for you, or alternatively specify a self-hosted machine.
    • Runs: each iteration of running the workflow creates a run, and we can see the logs of each run in the “Actions” tab. GitHub provides an interface for users to easily visualize and monitor Action run logs.

    4 Levels of GitHub Actions

    We will demonstrate the implementation of GitHub Actions through 4 levels of difficulty, starting with the “minimum viable product” and progressively introducing more components and customization at each level.

    1. “Simple Workflow” with Python Script Execution

    Begin by creating a GitHub repository where you want to store your workflow and the Python script. In your repository, create a .github/workflows directory (please note that the workflow file must be placed within the workflows folder for the action to be executed successfully). Inside this directory, create a YAML file (e.g., simple-workflow.yaml) that defines your workflow.

    The following shows a workflow file that executes the Python script hello_world.py based on a manual trigger.

    name: simple-workflow

    on:
      workflow_dispatch:

    jobs:
      run-hello-world:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout repo content
            uses: actions/checkout@v4
          - name: Run hello world
            run: python code/hello_world.py

    It consists of three sections: First, name: simple-workflow defines the workflow name. Second, on: workflow_dispatch specifies the condition for running the workflow, which here is manually triggering each run. Last, the workflow job under jobs: run-hello-world breaks down into the following steps:

    • runs-on: ubuntu-latest: Specify the runner (i.e., a virtual machine) to run the workflow; ubuntu-latest is a standard GitHub-hosted runner containing an environment of tools, packages, and settings available for GitHub Actions to use.
    • uses: actions/checkout@v4: Apply the pre-built GitHub Action checkout@v4 to pull the repository content into the runner’s environment. This ensures that the workflow has access to all necessary files and scripts stored in the repository.
    • run: python code/hello_world.py: Execute the Python script located in the code sub-directory by running shell commands directly in your YAML workflow file.
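
The workflow assumes a script at code/hello_world.py, which this guide does not show. A minimal sketch of what such a script might contain follows; the function name build_greeting and the message are assumptions for illustration only.

```python
# code/hello_world.py (hypothetical contents; the original script is not shown)


def build_greeting(name: str = "world") -> str:
    """Return the greeting that the workflow step prints to the run log."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # Anything printed here appears in the step's log under the "Actions" tab.
    print(build_greeting())
```

Keeping the logic in a function rather than at module level makes the script easy to import and test locally before it is wired into a workflow.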

    2. “Push Workflow” with Environment Setup

    The first workflow demonstrated the minimal viable version of a GitHub Action, but it did not take full advantage of GitHub Actions. At the second level, we add a bit more customization and functionality: automatically set up the environment with Python version 3.11, install the required packages, and execute the script whenever changes are pushed to the main branch.

    name: push-workflow

    on:
      push:
        branches:
          - main

    jobs:
      run-hello-world:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout repo content
            uses: actions/checkout@v4
          - name: Set up Python
            uses: actions/setup-python@v5
            with:
              python-version: '3.11'
          - name: Install dependencies
            run: |
              python -m pip install --upgrade pip
              pip install -r requirements.txt
          - name: Run hello world
            run: python code/hello_world.py

    • on: push: Instead of being activated by a manual workflow dispatch, this runs the action whenever there is a push from the local repository to the remote repository. This trigger is commonly used in software development for integration and deployment processes, and it has also been adopted in MLOps workflows, ensuring that code changes are consistently tested and validated before being merged into a different branch. Additionally, it facilitates continuous deployment by automatically deploying updates to production or staging environments as soon as changes are pushed. Here we add the optional condition branches: - main to trigger this action only on pushes to the main branch.
    • uses: actions/setup-python@v5: We added the “Set up Python” step using GitHub’s built-in action setup-python@v5. Using the setup-python action is the recommended way of using Python with GitHub Actions because it ensures consistent behavior across different runners and versions of Python.
    • pip install -r requirements.txt: Streamlines the installation of the packages listed in the requirements.txt file, which speeds up further development of the data pipeline and data science solution.
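
To confirm that the runner ended up with the environment the workflow asked for, a small check can be run as an extra step. The sketch below is an illustration under assumptions, not part of the original workflow: the function name describe_environment and the package list are hypothetical.

```python
import sys
from importlib import metadata


def describe_environment(packages):
    """Return the Python version and the installed version of each package.

    Packages that are not installed are reported as None instead of raising.
    """
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Hypothetical package list; in practice it would mirror requirements.txt.
    for name, version in describe_environment(["pandas"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Printing the versions into the run log gives a record of what each run actually used, which helps when a pipeline that worked last month starts behaving differently.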

    If you are interested in the basics of setting up a development environment for your data science projects, my previous blog post “7 Tips to Future-Proof Machine Learning Projects” provides a bit more explanation.

    3. “Scheduled Workflow” with Argument Parsing

    At the third level, we add more dynamics and complexity to make the workflow more suitable for real-world applications. We introduce scheduled jobs because they bring even more benefits to a data science project, enabling periodic fetching of more recent data and reducing the need to manually run the script whenever a data refresh is required. Additionally, we use dynamic argument parsing to execute the script with different date range parameters according to the schedule.
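
Scheduled triggers in GitHub Actions are written as five-field cron expressions (minute, hour, day of month, month, day of week) and are evaluated in UTC. As a rough illustration of how such an expression breaks down, the sketch below labels each field; split_cron is a hypothetical helper, and it does not validate ranges or expand wildcards.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict:
    """Map each of the five cron fields to its name.

    This only labels the fields; it does not validate value ranges or
    expand lists, steps, or wildcards.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


print(split_cron("0 12 1 * *"))
# {'minute': '0', 'hour': '12', 'day_of_month': '1', 'month': '*', 'day_of_week': '*'}
```

Read this way, "0 12 1 * *" means minute 0 of hour 12 on day 1 of any month, regardless of the day of the week.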

    name: scheduled-workflow

    on:
      workflow_dispatch:
      schedule:
        - cron: "0 12 1 * *" # run on the 1st day of every month

    jobs:
      run-data-pipeline:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout repo content
            uses: actions/checkout@v4
          - name: Set up Python
            uses: actions/setup-python@v5
            with:
              python-version: '3.11'  # Specify your Python version here
          - name: Install dependencies
            run: |
              python -m pip install --upgrade pip
              python -m http.client
              pip install -r requirements.txt
          - name: Run data pipeline
            run: |
              PREV_MONTH_START=$(date -d "`date +%Y%m01` -1 month" +%Y-%m-%d)
              PREV_MONTH_END=$(date -d "`date +%Y%m01` -1 day" +%Y-%m-%d)
              python code/fetch_data.py --start $PREV_MONTH_START --end $PREV_MONTH_END
          - name: Commit changes
            run: |
              git config user.name '<github-actions>'
              git config user.email '<[email protected]>'
              git add .
              git commit -m "update data"
              git push

    • on: schedule: - cron: "0 12 1 * *": Specify a time-based trigger using the cron expression “0 12 1 * *”, which runs at 12:00 pm UTC on the 1st day of every month. You can use crontab.guru to help create and validate cron expressions, which follow the format “minute / hour / day of month / month / day of week”.
    • python code/fetch_data.py --start $PREV_MONTH_START --end $PREV_MONTH_END: The “Run data pipeline” step runs a series of shell commands. It defines two variables, PREV_MONTH_START and PREV_MONTH_END, holding the first and last day of the previous month. These two variables are passed to the Python script “fetch_data.py” to dynamically fetch data for the previous month relative to whenever the action is run. To allow the Python script to accept custom values via command-line arguments, we use the argparse library. This deserves a separate topic, but here is a quick preview of how the Python script would look using argparse to handle the command-line arguments ‘--start’ and ‘--end’.
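
    ## fetch_data.py

    import argparse
    import json
    import os
    import urllib.parse

    import pandas as pd

    # NOTE: search_term and parse_news_json() are used below but are not shown
    # in this preview; they are assumed to be defined elsewhere in the script.


    def main(args=None):
        # Parse the --start and --end dates passed in by the workflow step
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=str)
        parser.add_argument('--end', type=str)
        args = parser.parse_args(args=args)
        print("Start Date is: ", args.start)
        print("End Date is: ", args.end)

        # One entry per calendar day between the two dates (inclusive)
        date_range = pd.date_range(start=args.start, end=args.end)
        content_lst = []

        for date in date_range:
            date = date.strftime('%Y-%m-%d')

            # Build the query string for the news API request
            params = urllib.parse.urlencode({
                'api_token': '<NEWS_API_TOKEN>',
                'published_on': date,
                'search': search_term,
            })
            url = '/v1/news/all?{}'.format(params)

            content_json = parse_news_json(url, date)
            content_lst.append(content_json)

        # Write one JSON object per line (JSON Lines format)
        with open('data.jsonl', 'w') as f:
            for item in content_lst:
                json.dump(item, f)
                f.write('\n')

        return content_lst


    if __name__ == '__main__':
        main()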

    When the command python code/fetch_data.py --start $PREV_MONTH_START --end $PREV_MONTH_END executes, it creates a date range between $PREV_MONTH_START and $PREV_MONTH_END. For each day in the date range, it generates a URL, fetches the daily news through the API, parses the JSON response, and collects all the content into a list of JSON objects. We then write this list to the file “data.jsonl”.
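
The two shell variables rely on GNU date arithmetic, which can be hard to read at a glance. As a cross-check, the same previous-month range can be computed in Python with the standard library. This is a sketch only; previous_month_range is a hypothetical helper and not part of fetch_data.py.

```python
from datetime import date, timedelta


def previous_month_range(today: date) -> tuple:
    """Return (first_day, last_day) of the month before `today` as YYYY-MM-DD."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st lands on the last day of the previous month.
    last_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev.isoformat(), last_of_prev.isoformat()


print(previous_month_range(date(2025, 4, 1)))  # ('2025-03-01', '2025-03-31')
```

Because the calculation starts from the first day of the current month, it gives the same range no matter which day of the month the workflow happens to run, including manual runs via workflow_dispatch.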

    - name: Commit changes
      run: |
        git config user.name '<github-actions>'
        git config user.email '<[email protected]>'
        git add .
        git commit -m "update data"
        git push

    As shown above, the last step, “Commit changes”, configures the git user name and email, stages the changes, commits them, and pushes to the remote GitHub repository. This step is necessary when a GitHub Actions run changes files in the working directory (e.g., the output file “data.jsonl” is created). Otherwise, the output exists only in the temporary environment of the GitHub-hosted runner, which is discarded after the run, and it appears as if no changes were made once the action has completed.

    4. “Secure Workflow” with Secrets and Environment Variables Management

    The final level focuses on improving the security and performance of the GitHub workflow by addressing non-functional requirements.

    name: secure-workflow

    on:
      workflow_dispatch:
      schedule:
        - cron: "34 23 1 * *" # run on the 1st day of every month

    jobs:
      run-data-pipeline:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout repo content
            uses: actions/checkout@v4
          - name: Set up Python
            uses: actions/setup-python@v5
            with:
              python-version: '3.11'  # Specify your Python version here
          - name: Install dependencies
            run: |
              python -m pip install --upgrade pip
              python -m http.client
              pip install -r requirements.txt
          - name: Run data pipeline
            env:
              NEWS_API_TOKEN: ${{ secrets.NEWS_API_TOKEN }}
            run: |
              PREV_MONTH_START=$(date -d "`date +%Y%m01` -1 month" +%Y-%m-%d)
              PREV_MONTH_END=$(date -d "`date +%Y%m01` -1 day" +%Y-%m-%d)
              python code/fetch_data.py --start $PREV_MONTH_START --end $PREV_MONTH_END
          - name: Check changes
            id: git-check
            run: |
              git config user.name 'github-actions'
              git config user.email '[email protected]'
              git add .
              git diff --staged --quiet || echo "changes=true" >> $GITHUB_ENV
          - name: Commit and push if changes
            if: env.changes == 'true'
            run: |
              git commit -m "update data"
              git push

    To improve workflow efficiency and reduce errors, we add a check before committing, so that the commit and push only occur when there are actual changes since the last commit. Without the check, git commit exits with an error when there is nothing to commit, which fails the run. The check is achieved with the command git diff --staged --quiet || echo "changes=true" >> $GITHUB_ENV.

    • git diff --staged checks the difference between the staging area and the last commit.
    • --quiet suppresses the output and makes the command report the result through its exit code: 0 when the staging area is identical to the last commit, and 1 when there are staged changes.
    • This command is then connected to echo "changes=true" >> $GITHUB_ENV by the OR operator ||, which tells the shell to run the rest of the line only if the first command returned a non-zero exit code. Therefore, if changes exist, “changes=true” is appended to the file referenced by $GITHUB_ENV, which makes changes available as an environment variable in subsequent steps. The next step reads it through the condition env.changes == 'true' to trigger git commit and push.
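
The exit-code logic can also be expressed outside the shell. The sketch below mirrors the same check in Python using subprocess; the function names are hypothetical, and it assumes git is installed and that repo_dir points at a git working tree.

```python
import subprocess


def interpret_diff_exit_code(code: int) -> bool:
    """Translate the exit code of `git diff --staged --quiet` into a flag.

    0 means nothing is staged, 1 means staged changes exist; anything else
    is treated as a git failure.
    """
    if code == 0:
        return False
    if code == 1:
        return True
    raise RuntimeError(f"git diff failed with exit code {code}")


def has_staged_changes(repo_dir: str = ".") -> bool:
    """Run the same check as the workflow step inside `repo_dir`."""
    result = subprocess.run(
        ["git", "diff", "--staged", "--quiet"], cwd=repo_dir, check=False
    )
    return interpret_diff_exit_code(result.returncode)
```

One difference from the shell version is worth noting: || fires on any non-zero exit code, so a git failure would also set changes=true, whereas this sketch raises an error for exit codes other than 0 and 1.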

    Lastly, we introduce secrets, which enhance security and avoid exposing sensitive information (e.g., API tokens, personal access tokens) in the codebase. Additionally, environment secrets offer the benefit of separating development environments: you can have different secrets for different stages of your development and deployment pipeline. For example, the testing environment (e.g., on the dev branch) can only access the test token, whereas the production environment (e.g., on the main branch) can access the token linked to the production instance.

    To set up a secret in GitHub (the steps below create a repository-level secret):

    1. Go to your repository settings
    2. Navigate to Secrets and variables > Actions
    3. Click “New repository secret”
    4. Add your secret name and value

    After setting up the secret, we need to expose it to the workflow environment. For example, below we add ${{ secrets.NEWS_API_TOKEN }} to the step “Run data pipeline”.

    - name: Run data pipeline
      env:
        NEWS_API_TOKEN: ${{ secrets.NEWS_API_TOKEN }}
      run: |
        PREV_MONTH_START=$(date -d "`date +%Y%m01` -1 month" +%Y-%m-%d)
        PREV_MONTH_END=$(date -d "`date +%Y%m01` -1 day" +%Y-%m-%d)
        python code/fetch_data.py --start $PREV_MONTH_START --end $PREV_MONTH_END

    We then update the Python script fetch_data.py to access the secret using os.environ.get().

    import os
    api_token = os.environ.get('NEWS_API_TOKEN')
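
Since os.environ.get() returns None when the variable is missing, a misconfigured secret would otherwise only surface later as a failed API call. One possible way to fail early is sketched below; get_api_token is a hypothetical helper and the error message is an assumption, not part of the original script.

```python
import os


def get_api_token(env=None, name="NEWS_API_TOKEN"):
    """Return the token from the environment, failing fast if it is unset or empty."""
    env = os.environ if env is None else env
    token = env.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; check the workflow's env block")
    return token
```

Accepting the environment as an optional argument keeps the function easy to test locally with a plain dictionary, without touching the real environment variables.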

    Take-Home Message

    This guide explores the implementation of GitHub Actions for building dynamic data pipelines, progressing through four levels of workflow implementation:

    • Level 1: Basic workflow setup with manual triggers and simple Python script execution.
    • Level 2: Push workflow with development environment setup.
    • Level 3: Scheduled workflow with dynamic date handling and data fetching with command-line arguments.
    • Level 4: Secure pipeline workflow with secrets and environment variables management.

    Each level builds upon the previous one, demonstrating how GitHub Actions can be effectively used in the data domain to streamline data solutions and speed up the development lifecycle.


