
    How to Import Pre-Annotated Data into Label Studio and Run the Full Stack with Docker

    By ProfitlyAI · August 29, 2025 · 9 min read


    Preparing a dataset for an object detection training workflow can take a long time and is often frustrating. Label Studio, an open-source data annotation tool, can help by providing an easy way to annotate datasets. It supports a wide variety of annotation templates, including computer vision, natural language processing, and audio or speech processing. In this article, however, we'll focus specifically on the object detection workflow.

    But what if you want to take advantage of pre-annotated open-source datasets, such as the Pascal VOC dataset? In this article, I'll show you how to import these tasks into Label Studio's format while setting up the full stack, including a PostgreSQL database, MinIO object storage, an Nginx reverse proxy, and the Label Studio backend. MinIO is an S3-compatible object storage service: you might use cloud-native storage in production, but you can also run it locally for development and testing.

    In this tutorial, we'll go through the following steps:

    1. Convert Pascal VOC annotations: transform bounding boxes from XML into Label Studio tasks in JSON format.
    2. Run the full stack: start Label Studio with PostgreSQL, MinIO, Nginx, and the backend using Docker Compose.
    3. Set up a Label Studio project: configure a new project inside the Label Studio interface.
    4. Upload images and tasks to MinIO: store your dataset in an S3-compatible bucket.
    5. Connect MinIO to Label Studio: add the cloud storage bucket to your project so Label Studio can fetch images and annotations directly.

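    Prerequisites

    To follow this tutorial, make sure you have Docker with Docker Compose installed and a local copy of the Pascal VOC 2012 dataset. The AWS CLI is optional and is only needed for the upload step later on.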

    From VOC to Label Studio: Preparing Annotations

    The Pascal VOC dataset has a folder structure where the train and test datasets are already split. The Annotations folder contains the annotation files for each image. In total, the training set includes 17,125 images, each with a corresponding annotation file.

    .
    └── VOC2012
        ├── Annotations  # 17125 annotations
        ├── ImageSets
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages  # 17125 images
        ├── SegmentationClass
        └── SegmentationObject

    The XML snippet below, taken from one of the annotations, defines a bounding box around an object labeled "person". The box is specified using four pixel coordinates: xmin, ymin, xmax, and ymax.

    XML snippet from the Pascal VOC dataset (Image by Author)

    The illustration below shows the inner rectangle as the annotated bounding box, defined by the top-left corner (xmin, ymin) and the bottom-right corner (xmax, ymax), within the outer rectangle representing the image.

    Pascal VOC bounding box coordinates in pixel format (Image by Author)

    Label Studio expects each bounding box to be defined by its width, height, and top-left corner, expressed as percentages of the image size. In other words, x and width are divided by the image width, and y and height by the image height, then multiplied by 100. Below is a working example of the converted JSON format for the annotation shown above.

    {
      "data": {
        "image": "s3://<bucket_name>/<prefix>/2007_000027.jpg"
      },
      "annotations": [
        {
          "result": [
            {
              "from_name": "label",
              "to_name": "image",
              "type": "rectanglelabels",
              "value": {
                "x": 35.802,
                "y": 20.20,
                "width": 36.01,
                "height": 50.0,
                "rectanglelabels": ["person"]
              }
            }
          ]
        }
      ]
    }

    As you can see in the JSON format, you also need to specify the location of the image file, for example a path in MinIO, or in an S3 bucket if you're using cloud storage.
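The conversion described above can be sketched in a few lines of Python. This is a minimal sketch, not the author's script: the function name `voc_to_task` is hypothetical, the sample XML is hand-written in the Pascal VOC layout with numbers chosen to reproduce the JSON example, and the `from_name`/`to_name` defaults assume the labeling configuration uses the names `label` and `image`.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written sample in Pascal VOC layout (an assumption for illustration);
# the numbers are chosen so the output matches the JSON example in the text.
SAMPLE_XML = """
<annotation>
  <filename>2007_000027.jpg</filename>
  <size><width>486</width><height>500</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>174</xmin><ymin>101</ymin><xmax>349</xmax><ymax>351</ymax></bndbox>
  </object>
</annotation>
"""


def voc_to_task(xml_text, image_url, from_name="label", to_name="image"):
    """Convert one Pascal VOC annotation (XML string) into a Label Studio task."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))

    results = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        results.append({
            "from_name": from_name,
            "to_name": to_name,
            "type": "rectanglelabels",
            "value": {
                # Pixel coordinates become percentages of the image size.
                "x": 100.0 * xmin / img_w,
                "y": 100.0 * ymin / img_h,
                "width": 100.0 * (xmax - xmin) / img_w,
                "height": 100.0 * (ymax - ymin) / img_h,
                "rectanglelabels": [obj.findtext("name")],
            },
        })
    return {"data": {"image": image_url}, "annotations": [{"result": results}]}


if __name__ == "__main__":
    task = voc_to_task(SAMPLE_XML, "s3://<bucket_name>/<prefix>/2007_000027.jpg")
    print(json.dumps(task, indent=2))
```

Writing one JSON file per image keeps each task independent, which suits a bucket prefix that Label Studio scans file by file.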

    While preprocessing the data, I merged the full dataset, even though it was already divided into training and validation. This simulates a real-world scenario where you typically begin with a single dataset and split it into training and validation sets yourself before training.
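If you later want to redo the split yourself, a seeded shuffle keeps it reproducible. The sketch below is only an illustration: `split_dataset`, the 20% validation fraction, and the seed are assumptions, not values from the original workflow.

```python
import random


def split_dataset(image_ids, val_fraction=0.2, seed=42):
    """Return (train_ids, val_ids) using a deterministic shuffle."""
    ids = sorted(image_ids)             # fixed starting order
    random.Random(seed).shuffle(ids)    # same seed gives the same split
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]
```

Sorting before shuffling means the result does not depend on the order in which the files were listed on disk.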

    Running the Full Stack with Docker Compose

    I merged the docker-compose.yml and docker-compose.minio.yml files into a simplified single configuration so the entire stack can run on the same network. Both files were taken from the official Label Studio GitHub repository.

    
    
    services:
      nginx:
        # Acts as a reverse proxy for Label Studio frontend/backend
        image: heartexlabs/label-studio:latest
        restart: unless-stopped
        ports:
          - "8080:8085"
          - "8081:8086"
        depends_on:
          - app
        environment:
          - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}

        volumes:
          - ./mydata:/label-studio/data:rw # Stores Label Studio projects, configs, and uploaded files
        command: nginx

      app:
        stdin_open: true
        tty: true
        image: heartexlabs/label-studio:latest
        restart: unless-stopped
        expose:
          - "8000"
        depends_on:
          - db
        environment:
          - DJANGO_DB=default
          - POSTGRE_NAME=postgres
          - POSTGRE_USER=postgres
          - POSTGRE_PASSWORD=
          - POSTGRE_PORT=5432
          - POSTGRE_HOST=db
          - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-}
          - JSON_LOG=1
        volumes:
          - ./mydata:/label-studio/data:rw  # Stores Label Studio projects, configs, and uploaded files
        command: label-studio-uwsgi

      db:
        image: pgautoupgrade/pgautoupgrade:13-alpine
        hostname: db
        restart: unless-stopped
        environment:
          - POSTGRES_HOST_AUTH_METHOD=trust
          - POSTGRES_USER=postgres
        volumes:
          - ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data  # Persistent storage for PostgreSQL database
      minio:
        image: "minio/minio:${MINIO_VERSION:-RELEASE.2025-04-22T22-12-26Z}"
        command: server /data --console-address ":9009"
        restart: unless-stopped
        ports:
          - "9000:9000"
          - "9009:9009"
        volumes:
          - minio-data:/data   # Stores uploaded dataset objects (like images or JSON tasks)
        # configure env vars in .env file or your system's environment
        environment:
          - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minio_admin_do_not_use_in_production}
          - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minio_admin_do_not_use_in_production}
          - MINIO_PROMETHEUS_URL=${MINIO_PROMETHEUS_URL:-http://prometheus:9090}
          - MINIO_PROMETHEUS_AUTH_TYPE=${MINIO_PROMETHEUS_AUTH_TYPE:-public}

    volumes:
      minio-data: # Named volume for MinIO object storage

    This simplified Docker Compose file defines four core services with their volume mappings:

    App: runs the Label Studio backend itself.

    • Shares the mydata directory with Nginx, which stores projects, configurations, and uploaded files.
    • Uses a bind mount: ./mydata:/label-studio/data:rw → maps a folder from your host into the container.

    Nginx: acts as a reverse proxy for the Label Studio frontend and backend.

    • Shares the mydata directory with the App service.

    PostgreSQL (db): manages metadata and project information.

    • Stores persistent database files.
    • Uses a bind mount: ${POSTGRES_DATA_DIR:-./postgres-data}:/var/lib/postgresql/data.

    MinIO: an S3-compatible object storage service.

    • Stores dataset objects such as images or JSON annotation tasks.
    • Uses a named volume: minio-data:/data.

    When you mount host folders such as ./mydata and ./postgres-data, you need to assign ownership on the host to the same user that runs inside the container. Label Studio doesn't run as root; it uses a non-root user with UID 1001. If the host directories are owned by a different user, the container won't have write access and you'll run into permission denied errors.

    After creating these folders in your project directory, you can change their ownership with:

    mkdir mydata 
    mkdir postgres-data
    sudo chown -R 1001:1001 ./mydata ./postgres-data
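To confirm the ownership change took effect before starting the stack, a quick check along these lines may help. It is a sketch for POSIX hosts only; the helper name `wrong_owner` is hypothetical, and the UID 1001 comes from the paragraph above.

```python
import os

LABEL_STUDIO_UID = 1001  # non-root user inside the Label Studio container


def wrong_owner(paths, expected_uid=LABEL_STUDIO_UID):
    """Return the paths whose owner UID differs from expected_uid."""
    return [path for path in paths if os.stat(path).st_uid != expected_uid]
```

Calling `wrong_owner(["./mydata", "./postgres-data"])` from the project directory should return an empty list once the `chown` command has run.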

    Now that the directories are prepared, we can bring up the stack using Docker Compose. Simply run:

    docker compose up -d

    It may take a few minutes to pull all the required images from Docker Hub and set up Label Studio. Once the setup is complete, open http://localhost:8080 in your browser to access the Label Studio interface. You need to create a new account, after which you can log in with your credentials. You can enable a legacy API token by going to Organization → API Token Settings. This token lets you communicate with the Label Studio API, which is especially useful for automation tasks.
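As a rough illustration of using the legacy token, the sketch below builds an authenticated request with the standard library. It assumes the token is sent as an `Authorization: Token <token>` header and that projects are listed under `/api/projects`; the helper names are hypothetical, so check the API reference of your Label Studio version before relying on it.

```python
import json
import urllib.request


def build_request(base_url, token, path="/api/projects"):
    """Create a GET request carrying the legacy API token (assumed header format)."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def list_projects(base_url, token):
    """Fetch the project list; requires the stack to be running."""
    with urllib.request.urlopen(build_request(base_url, token), timeout=10) as resp:
        return json.load(resp)
```

With the stack up, `list_projects("http://localhost:8080", "<your_token>")` is one way to verify that the token works.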

    Set up a Label Studio project

    Now we can create our first data annotation project in Label Studio, specifically for an object detection workflow. But before you start annotating your images, you need to define the classes to choose from. In the Pascal VOC dataset, there are 20 types of pre-annotated objects.

    XML-style labeling setup (Image by Author)
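Since the original configuration is only shown as an image, here is a sketch of how such a labeling configuration could be generated. The tag layout (`View`, `Image`, `RectangleLabels`, `Label`) follows Label Studio's object detection template; the control names `label` and `image` are assumptions chosen to match the `from_name` and `to_name` fields in the JSON example, and `build_label_config` is a hypothetical helper.

```python
from xml.sax.saxutils import quoteattr

# The 20 object classes of Pascal VOC.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def build_label_config(classes, from_name="label", to_name="image", data_key="image"):
    """Render a bounding-box labeling config as an XML string."""
    lines = [
        "<View>",
        f'  <Image name={quoteattr(to_name)} value={quoteattr("$" + data_key)}/>',
        f"  <RectangleLabels name={quoteattr(from_name)} toName={quoteattr(to_name)}>",
    ]
    lines += [f"    <Label value={quoteattr(name)}/>" for name in classes]
    lines += ["  </RectangleLabels>", "</View>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_label_config(VOC_CLASSES))
```

The printed XML can be pasted into the project's labeling interface settings in code view.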

    Upload images and tasks to MinIO

    You can open the MinIO user interface in your browser at localhost:9000 (the compose file serves the console itself on port 9009), and then log in using the credentials you specified under the relevant service in the docker-compose.yml file.

    I created a bucket with folders, one of which is used for storing images and another for JSON tasks formatted according to the instructions above.

    Screenshot of an example bucket in MinIO (Image by Author)

    We set up an S3-like service locally that allows us to simulate S3 cloud storage without incurring any charges. If you want to transfer files to an S3 bucket on AWS, it's better to do this directly over the internet, considering the data transfer costs. The good news is that you can also interact with your MinIO bucket using the AWS CLI. To do this, you need to add a profile in ~/.aws/config and provide the corresponding credentials in ~/.aws/credentials under the same profile name.
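The two profile entries can be previewed with a short script before you paste them into the files. This is a sketch under assumptions: the helper name `render_profile` is hypothetical, the region value is a placeholder default, and the access key and secret are whatever you set as the MinIO root user and password.

```python
import configparser
import io


def render_profile(profile, access_key, secret_key, region="us-east-1"):
    """Return (config_text, credentials_text) for one AWS CLI profile."""
    # ~/.aws/config names the section "profile <name>".
    config = configparser.ConfigParser(interpolation=None)
    config[f"profile {profile}"] = {"region": region}

    # ~/.aws/credentials uses the bare profile name.
    creds = configparser.ConfigParser(interpolation=None)
    creds[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }

    rendered = []
    for parser in (config, creds):
        buffer = io.StringIO()
        parser.write(buffer)
        rendered.append(buffer.getvalue())
    return tuple(rendered)


if __name__ == "__main__":
    config_text, creds_text = render_profile(
        "minio-local", "<minio_root_user>", "<minio_root_password>"
    )
    print(config_text)
    print(creds_text)
```

The script only prints the text; appending it to the real files is left as a manual step so existing profiles are not overwritten.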

    After that, you can sync your local folder with the bucket using the following commands:

    #!/bin/bash
    set -e

    PROFILE="<your_profile_name>"
    MINIO_ENDPOINT="<your_minio_endpoint>"   # e.g. http://localhost:9000
    BUCKET_NAME="<your_bucket_name>"
    SOURCE_DIR="<your_local_source_dir>"
    DEST_DIR="<your_bucket_destination_dir>"

    # Upload everything under SOURCE_DIR to the DEST_DIR prefix of the bucket
    aws s3 sync \
          --endpoint-url "$MINIO_ENDPOINT" \
          --no-verify-ssl \
          --profile "$PROFILE" \
          "$SOURCE_DIR" "s3://$BUCKET_NAME/$DEST_DIR"

    Connect MinIO to Label Studio

    After all the data, including the images and annotations, has been uploaded, we can move on to adding cloud storage to the project we created in the previous step.

    From your project settings, go to Cloud Storage and add the required parameters, such as the endpoint (which points to the service name in the Docker stack along with the port number, e.g., minio:9000), the bucket name, and the relevant prefix where the annotation files are stored. Each path inside the JSON files will then point to the corresponding image.

    Screenshot of the Cloud Storage settings (Image by Author)
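For automation, the same settings could be collected into a request body for Label Studio's S3 import-storage API. Treat the sketch below as an assumption-heavy illustration: the field names are recalled from the API reference and may differ in your version, `s3_storage_payload` is a hypothetical helper, and the storage title is a placeholder. The endpoint uses the Docker service name because the request to MinIO is made from inside the stack.

```python
def s3_storage_payload(project_id, bucket, prefix, access_key, secret_key,
                       endpoint="http://minio:9000"):
    """Collect the Cloud Storage form fields into one dictionary."""
    return {
        "project": project_id,
        "title": "MinIO tasks",            # placeholder title
        "bucket": bucket,
        "prefix": prefix,                  # folder holding the JSON task files
        "s3_endpoint": endpoint,           # service name and port in the Docker network
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # The bucket objects are JSON tasks, not raw images, so they should be
        # parsed as tasks rather than treated as one image per object.
        "use_blob_urls": False,
    }
```

Keeping the payload in one function makes it easy to compare against the values entered in the Cloud Storage dialog.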

    After verifying that the connection is working, you can sync your project with the cloud storage. You may need to run the sync several times since the dataset contains 22,263 images. It may appear to fail at first, but when you restart the sync, it continues to make progress. Eventually, all the Pascal VOC data will be imported into Label Studio.

    Screenshot of the task list (Image by Author)

    You can see the imported tasks with their thumbnail images in the task list. When you click on a task, the image will appear with its pre-annotations.

    Screenshot of an image with bounding boxes (Image by Author)

    Conclusions

    In this tutorial, we demonstrated how to import the Pascal VOC dataset into Label Studio by converting XML annotations into Label Studio's JSON format, running a full stack with Docker Compose, and connecting MinIO as S3-compatible storage. This setup enables you to work with large-scale, pre-annotated datasets in a reproducible and cost-effective way, all on your local machine. Testing your project settings and file formats locally first will ensure a smoother transition when moving to cloud environments.

    I hope this tutorial helps you kickstart your data annotation project with pre-annotated data that you can easily expand or validate. Once your dataset is ready for training, you can export all the tasks in popular formats such as COCO or YOLO.


