Quickstart Guide

Welcome to the quickstart guide for lit_ecology_classifier! This guide will walk you through the installation process and show you how to use the package for ecological classification.

Prerequisites

Before you begin, ensure you have the following installed:

- Python 3.6 or higher
- pip (Python package installer)
- Git (optional, for cloning the repository)

Installation

# Lit Ecology Quickstart on Daint-GPU (CSCS)

This guide will help you quickly set up the Lit Ecology classifier on the Daint-GPU system at CSCS.

## Prerequisites

- Access to the Daint-GPU system (CSCS).
- Basic knowledge of Python environments and module systems.

## Steps

Navigate to your scratch space:

```bash
cd $SCRATCH
```

Load necessary modules:

```bash
module load daint-gpu cray-python
```

Create a Python virtual environment:

```bash
python -m venv lit_ecology
source lit_ecology/bin/activate
```

Source the model script (replace with the actual path):

```bash
source get_model.sh
```

Create directories for parameters and phyto data:

```bash
mkdir params
mkdir params/phyto
```

Upgrade pip and install PyTorch:

```bash
lit_ecology/bin/python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

Install lit_ecology_classifier and timm:

```bash
pip install lit_ecology_classifier
pip install timm==0.9.2
```

Create a directory for Slurm scripts:

```bash
mkdir slurm
```
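
Once the steps above have run, it can be useful to confirm which PyTorch build ended up in the virtual environment. The following is a minimal sketch, not part of lit_ecology_classifier; the helper name `describe_torch_install` is made up for illustration. It only reports what it finds and does not fail if PyTorch is missing.

```python
import importlib.util
import sys


def describe_torch_install():
    """Return a small report on the PyTorch install in the active interpreter."""
    report = {"python": sys.version.split()[0], "torch_installed": False}
    # find_spec lets us detect a missing package without raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version the wheel was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    # True only when a GPU and a compatible driver are visible to this process.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

With the cu118 wheels installed, `cuda_build` would be expected to read 11.8. `cuda_available` may well be false on a login node without a GPU, so the check is most informative when run inside a job on a GPU node.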

## Important Notes:

- Replace placeholders: Replace the placeholders (e.g., /store/empa/…) with the actual paths to your files and directories.
- GPU version: Make sure the CUDA build of PyTorch installed in the virtual environment (here cu118, i.e. CUDA 11.8) is compatible with the CUDA driver on the cluster's GPU nodes.
- Slurm scripts: You will likely need to create Slurm scripts in the slurm directory to run your jobs on the cluster. Refer to the official CSCS documentation for guidance on writing Slurm scripts.
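
As a starting point for the Slurm scripts mentioned above, the sketch below renders a batch script from a template. Everything in it is an assumption to adapt: the helper names (`render_slurm_script`, `write_slurm_script`), the job name, the time limit, and the `#SBATCH` directives are illustrative, and the relative paths assume the job is submitted from the directory that contains the lit_ecology virtual environment. Check the CSCS documentation for the directives your project actually needs (for example an account).

```python
from pathlib import Path

# Illustrative template only; the directives and paths are assumptions, not
# values prescribed by lit_ecology_classifier or CSCS.
SLURM_TEMPLATE = """\
#!/bin/bash -l
#SBATCH --job-name={job_name}
#SBATCH --time={time_limit}
#SBATCH --nodes=1
#SBATCH --constraint=gpu
#SBATCH --output=slurm/{job_name}_%j.out

module load daint-gpu cray-python
source lit_ecology/bin/activate

python -m lit_ecology_classifier.main --max_epochs {max_epochs} --dataset {dataset} --priority {priority}
"""


def render_slurm_script(job_name, time_limit, max_epochs, dataset, priority):
    """Fill the template with job settings and return the script text."""
    return SLURM_TEMPLATE.format(
        job_name=job_name,
        time_limit=time_limit,
        max_epochs=max_epochs,
        dataset=dataset,
        priority=priority,
    )


def write_slurm_script(path, **settings):
    """Write the rendered script to path, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_slurm_script(**settings))
    return path


if __name__ == "__main__":
    # Dry run: print the script instead of writing it.
    print(render_slurm_script("train_phyto", "01:00:00", 2, "phyto", "config/priority.json"))
```

Calling `write_slurm_script("slurm/train_phyto.sh", job_name="train_phyto", ...)` would place the file in the slurm directory created earlier, from where it could be submitted with `sbatch`.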

Usage

Prepare Your Data

Ensure your dataset is structured in the required format. The data should be organized with each class having its own subdirectory. The overall structure should look like this:
```
dataset_name/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── class2/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── ...
```
Once organized, compress the dataset into a .tar or .zip file.

Train the Model

To train the model, run the following command with appropriate arguments:
```
python -m lit_ecology_classifier.main --max_epochs 2 --dataset phyto --priority config/priority.json
```
Arguments:

- --max_epochs: The number of epochs to train.
- --dataset: The name of the dataset.
- --priority: Path to the priority configuration file.

Evaluate the Model

After training, you can evaluate the model on your test dataset. Modify the script as necessary to point to your test data.
```
python -m lit_ecology_classifier.evaluate --dataset phyto --priority config/priority.json
```
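
If you prefer to drive training and evaluation from a Python script rather than typing both commands, the sketch below builds the same command lines and runs them in sequence. It uses only the module names and flags shown in this guide; the helper names `build_command` and `run_train_then_evaluate` are hypothetical and not part of the package.

```python
import shlex
import subprocess
import sys


def build_command(module, dataset, priority, max_epochs=None):
    """Build the argument list for `python -m <module>` with the guide's flags."""
    cmd = [sys.executable, "-m", module, "--dataset", dataset, "--priority", priority]
    if max_epochs is not None:
        cmd += ["--max_epochs", str(max_epochs)]
    return cmd


def run_train_then_evaluate(dataset="phyto", priority="config/priority.json", max_epochs=2):
    """Run training, then evaluation, stopping at the first failure."""
    train = build_command("lit_ecology_classifier.main", dataset, priority, max_epochs)
    evaluate = build_command("lit_ecology_classifier.evaluate", dataset, priority)
    for cmd in (train, evaluate):
        # check=True raises CalledProcessError if the step exits with a non-zero status.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: show the training command without executing it.
    train_cmd = build_command("lit_ecology_classifier.main", "phyto", "config/priority.json", 2)
    print(" ".join(shlex.quote(part) for part in train_cmd))
```

Using `sys.executable` keeps the subprocess in the same virtual environment as the calling script, which avoids accidentally picking up a system-wide Python.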
Generate Documentation

To generate the documentation, navigate to the docs directory and run:
```
cd docs
make html
```
You can view the generated documentation by opening docs/_build/html/index.html in your web browser.

Additional Resources

Data Structure Details

The TarImageDataset and ImageFolderDataset classes expect the data to be structured as follows:

- The root directory should contain subdirectories for each class.
- Each subdirectory should contain the image files for that class.

Example structure:

```
dataset_name/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── class2/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── ...
```

Compress the dataset_name directory into a .tar or .zip file before using it with lit_ecology_classifier. The code automatically deduces from the data path whether the dataset is a .tar archive or folder-based.
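
The layout check and the compression step described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the package's own tooling: the helper names, the list of image suffixes, and the choice to keep dataset_name/ as the top-level folder inside the archive are all assumptions to adjust to your data.

```python
import tarfile
from pathlib import Path

# Assumption: extend this set to match the image formats in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Map each class subdirectory of root to its number of image files."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def make_tar(root, tar_path=None):
    """Pack root into an uncompressed .tar next to it and return the archive path."""
    root = Path(root)
    tar_path = Path(tar_path) if tar_path else root.parent / (root.name + ".tar")
    with tarfile.open(str(tar_path), "w") as tar:
        # arcname keeps dataset_name/ as the top-level folder inside the archive.
        tar.add(str(root), arcname=root.name)
    return tar_path
```

A class that reports zero images usually points to a misplaced folder or an unexpected file extension, so it is worth checking the counts before building the archive.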