Data Organization Guidelines

This guide explains the lab's data management workflow, from receiving NGS data to long-term storage and backup. Following these guidelines ensures data accessibility, security, and compliance with best practices.


Overview: Storage Infrastructure

The lab uses four storage locations:

| Storage Location | Purpose | Access Level | Backup |
|---|---|---|---|
| IBU Account | Primary workspace for active analysis | Long-term users | User-managed |
| BigData Server | Permanent storage for raw and final data | Long-term users | User-managed |
| Common Server | Shared lab resources and light data | All lab members | User-managed |
| p910 Server | Gateway and temporary workspace | All lab members | Automatic, BACKUP folder only |

1. Getting Started: Required Accounts

For Long-Term Lab Members

Priority Setup

Obtain these accounts as soon as you join the lab:

IBU Account (Required for all long-term users)
- Purpose: Primary workspace for data analysis and pipeline development
- How to get it: Contact the Dry-lab manager
- What to use it for:
    - Store and analyze raw data
    - Develop and test analysis scripts
    - Run computational pipelines

GitHub Access
- Purpose: Version control for all analysis code
- Organization: ParisodLab GitHub
- Requirement: All scripts and pipelines must be version-controlled

For Short-Term Users

If you don't have IBU access, you can use:
- p910 server for temporary storage and testing


2. Data Storage Workflow

Step 1: Receiving Raw Data

When you receive raw NGS data from the sequencing facility:

  1. Immediate Actions:

    • Submit data to ENA or NCBI with complete metadata
    • Apply an embargo if needed (typically 2 years for unpublished data)
    • See ENA Guidelines for detailed submission instructions
  2. Storage:

    • Transfer raw data to your IBU account for analysis
    • Upload a copy to the Raw_data/ folder on the BigData server
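The embargo is entered as a release (hold) date at submission time. A minimal sketch of computing that date, assuming the 2-year default mentioned above; `embargo_release_date` is a hypothetical helper, and each repository limits how far ahead the date may be set, so confirm the current limit in the ENA or NCBI documentation.

```python
from datetime import date


def embargo_release_date(submitted: date, years: int = 2) -> date:
    """Return the date `years` after `submitted` (hypothetical helper).

    A 29 February submission date maps to 28 February when the
    target year is not a leap year.
    """
    try:
        return submitted.replace(year=submitted.year + years)
    except ValueError:
        # Only 29 February can fail here: the target year has no such day.
        return submitted.replace(year=submitted.year + years, day=28)


if __name__ == "__main__":
    print(embargo_release_date(date(2025, 11, 14)))  # 2027-11-14
    print(embargo_release_date(date(2024, 2, 29)))   # 2026-02-28
```

Using `date.replace` keeps the same calendar day rather than adding a fixed number of days, so leap years do not shift the release date.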

Metadata Requirements

Always include comprehensive metadata:

  • Sample information: Population, subspecies, elevation, collection date
  • ENA checklist fields: Propagation method, soil properties, health status
  • Sequencing library: Instrument model, library prep details, file checksums
  • Environmental context: When available, include collection methods and conditions
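File checksums let the repository (and you) confirm that files arrived intact; ENA expects MD5 checksums for uploaded read files. A minimal sketch that computes an MD5 for every file matching a pattern in a folder, reading in chunks so multi-gigabyte FASTQ files are never loaded into memory at once. The `*.fastq.gz` pattern and the `md5sum`-style output (digest, two spaces, file name) are assumptions; adapt them to your files.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(folder: Path, pattern: str = "*.fastq.gz") -> list[str]:
    """Return 'digest  filename' lines (md5sum style) for matching files."""
    return [f"{md5sum(p)}  {p.name}" for p in sorted(folder.glob(pattern))]


if __name__ == "__main__":
    # Usage: python md5_manifest.py /path/to/raw_data
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for line in checksum_manifest(target):
        print(line)
```

Keeping the manifest next to the raw data also lets you re-check the copies on IBU and BigData later.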

Step 2: Data Analysis

Primary Analysis Location: IBU Account

  • Store working copies of data on your IBU account
  • Develop analysis scripts and pipelines
  • Test and refine your workflows
  • Use version control (Git) for all code

Temporary Testing: p910 Server

  • Use /media/Data drives for script testing only
  • Not for long-term storage
  • Clean up after testing
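Cleaning up is easier when you first list what you left behind. A minimal report-only sketch that lists files under a testing folder not modified within the last N days; the folder path and the 14-day threshold are hypothetical, and the function deletes nothing.

```python
from __future__ import annotations

import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def stale_files(root: Path, max_age_days: float = 14,
                now: float | None = None) -> list[Path]:
    """List files under `root` not modified within the last `max_age_days`.

    Report only: this function never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical folder used for testing on p910; adjust to your own.
    test_dir = Path("/media/Data1/testing/your_name")
    if test_dir.is_dir():
        for path in stale_files(test_dir):
            print(path)
```

Review the list before removing anything by hand.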

Step 3: Final Results

When analysis is complete:

  1. Upload to BigData Server:

    • Unfiltered results (e.g., complete VCF files) → Raw_data/ or Users/your_name/
    • Filtered results (final datasets) → Users/your_name/ or Shared/
  2. Archive After Publication:

    • Move published data to Archives/ folder
    • Include a README with publication DOI and description
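A minimal sketch for writing that README when moving a project to Archives/. The layout (title, DOI link, archive date, free-text description) and the function name are a suggested, hypothetical template, not a lab standard; it refuses to overwrite an existing README.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def write_archive_readme(folder: Path, title: str, doi: str, description: str,
                         archived_on: date | None = None) -> Path:
    """Write README.md into an archive folder (hypothetical template).

    Refuses to overwrite an existing README.md.
    """
    readme = folder / "README.md"
    if readme.exists():
        raise FileExistsError(readme)
    archived_on = archived_on or date.today()
    readme.write_text(
        f"# {title}\n\n"
        f"Publication DOI: https://doi.org/{doi}\n"
        f"Archived on: {archived_on.isoformat()}\n\n"
        f"{description.strip()}\n",
        encoding="utf-8",
    )
    return readme
```

For example, `write_archive_readme(Path("/path/to/Archives/my_project"), "My project", "10.1234/example", "Raw reads and final VCF files.")`, where the path and DOI are placeholders for your mounted BigData share and your paper.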


3. Storage Locations Explained

IBU Account (Primary Workspace)

Who needs it: All long-term lab members

Use for:

  • Active data analysis
  • Pipeline development and testing
  • Temporary storage of working datasets
  • Computational jobs

Best Practices:

  • Keep project folders organized
  • Clean up completed projects regularly
  • Version-control all scripts on GitHub
  • Transfer important data to BigData for permanent storage


BigData Server (Permanent Storage)

Purpose: Long-term storage following the "write once, read many" principle

How to Connect

=== "macOS / Linux"

    `smb://bigdata.unifr.ch/science/biol/groupe_parisod/`

=== "Windows"

    `\\bigdata.unifr.ch\science\biol\groupe_parisod\`

Folder Structure

| Folder | Purpose | What to Store |
|---|---|---|
| Archives/ | Published or archived data | Data already on NCBI/ENA, old projects no longer in use |
| Raw_data/ | Active raw datasets | NGS data currently being analyzed or used in ongoing projects |
| Shared/ | Lab-wide resources | Protocols, reference genomes, software, shared results |
| Users/your_name/ | Personal storage | Your analysis results, processed data, project-specific files. Please, no temporary files. |

Best Practices

  • Archive promptly: Move data to Archives/ after publication or public deposition
  • Stay organized: Include README files explaining folder contents
  • Clean regularly: Remove obsolete files from Raw_data/
  • Follow naming conventions: Use descriptive, consistent file and folder names
  • Document everything: Future you (and others) will thank you!
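To check the README rule on a folder you own (for example `Users/your_name/` on the mounted BigData share; the local mount point depends on your operating system), a minimal sketch that lists immediate subfolders without a README. Counting any file whose name starts with `README`, in any case, is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def folders_without_readme(root: Path) -> list[Path]:
    """Return immediate subfolders of `root` that contain no README* file."""
    missing = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Any file whose name starts with README (any case) counts.
        if not any(f.is_file() and f.name.upper().startswith("README")
                   for f in folder.iterdir()):
            missing.append(folder)
    return missing
```

Only the first level is checked, matching one README per project folder.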

Common Server (Shared Lab Resources)

Purpose: Lightweight data and collaborative resources

How to Connect

=== "macOS / Linux"

    `smb://common.unifr.ch/biol/_Ecologie/Parisod_group/`

=== "Windows"

    `\\common.unifr.ch\biol\_Ecologie\Parisod_group\`

What's Available

| Folder | Contents |
|---|---|
| Greenhouse/ | Environmental monitoring data (temperature, light, humidity) |
| Lab_meetings/ | Meeting reports and notes |
| Presentations/ | Lab posters, talks, and presentation materials |
| Research_doc/Projects/ | Project documentation and plans |
| Research_doc/Protocols/ | Lab protocols and methods |
| Research_doc/Publications/ | Published papers and manuscripts |
| Research_doc/Scripts/ | Shared analysis scripts |

Contribute!

Help build our shared knowledge base:

  • Upload your presentations after talks or posters
  • Share useful protocols you've developed
  • Document your projects for future reference
  • Add scripts that others might find useful

p910 Server (Gateway & Temporary Workspace)

Purpose: Access point to network drives and temporary testing environment

What p910 IS Used For

Network Drive Gateway

  • Accessing BigData, Common, and other UniFR drives
  • Managing data transfers between servers

Temporary Testing

  • Testing scripts before running them on IBU
  • Short-term storage for users without IBU access
  • Data transfers between sequencing facilities and BigData

Documentation Management

  • Lab documentation (version-controlled on GitHub)
  • System administration tasks

What p910 IS NOT Used For

  • Long-term data storage on /media/Data drives
  • Primary analysis workspace
  • Permanent project storage

Storage Limitations

The /media/Data drives on p910 are NOT for permanent storage. Move valuable data to BigData as soon as possible.



4. Backup System on p910

Critical Information

Only data in the BACKUP folder on p910 will be automatically backed up. Data stored elsewhere on p910 is NOT backed up and may be lost.
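Because the rule depends only on where a file lives, it can be checked mechanically. A minimal sketch, assuming the backup roots are exactly `/media/Data1/BACKUP`, `/media/Data2/BACKUP` and `/media/Data3/BACKUP` (following the `/media/Data[1-3]/BACKUP/` pattern under Setting Up Your Backup); `is_backed_up` is a hypothetical helper that normalizes `..` but does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath

# Assumed backup roots, following the /media/Data[1-3]/BACKUP/ pattern.
BACKUP_ROOTS = [PurePosixPath(f"/media/Data{i}/BACKUP") for i in (1, 2, 3)]


def is_backed_up(path: str) -> bool:
    """True if `path` lies inside one of the p910 BACKUP folders.

    '..' segments are normalized first; symlinks are not resolved.
    """
    p = PurePosixPath(posixpath.normpath(path))
    return any(p == root or root in p.parents for root in BACKUP_ROOTS)
```

Comparing path components instead of doing a plain string-prefix test matters: a prefix test would wrongly accept a sibling folder such as `/media/Data1/BACKUP_old/`.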

How Backups Work

Automated Backup Schedule

  • Runs every evening at 5:00 PM
  • Copies data from the p910 BACKUP folder to the BigData server
  • Fully automatic; no user action required

Setting Up Your Backup

  1. Create your personal backup folder: /media/Data[1-3]/BACKUP/your_name/

  2. Organize your data:

    • Create a clear folder structure
    • Use descriptive names
    • Include README files
  3. Monitor your usage:

    • Maximum quota: 1 TB per user
    • You'll be notified if you exceed the limit
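To monitor your usage against the 1 TB quota yourself, a minimal sketch that sums the sizes of regular files under your backup folder. Whether the quota means 10^12 bytes (TB) or 2^40 bytes (TiB) is not stated here; the sketch assumes the decimal value, and the folder path is a placeholder.

```python
from __future__ import annotations

from pathlib import Path

QUOTA_BYTES = 10**12  # assumption: 1 TB read as a decimal terabyte


def folder_size(root: Path) -> int:
    """Sum the sizes of regular files under `root`; file symlinks are not counted."""
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def quota_report(used: int, quota: int = QUOTA_BYTES) -> str:
    """Summarize usage against the quota, in decimal gigabytes."""
    status = "OVER QUOTA" if used > quota else "ok"
    return (f"{used / 1e9:.1f} GB used of {quota / 1e9:.0f} GB "
            f"({100 * used / quota:.1f}%), {status}")


if __name__ == "__main__":
    backup = Path("/media/Data1/BACKUP/your_name")  # placeholder path
    if backup.is_dir():
        print(quota_report(folder_size(backup)))
```

For example, `quota_report(5 * 10**11)` gives `"500.0 GB used of 1000 GB (50.0%), ok"`.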

What to Back Up

Do back up: - Critical scripts not yet on GitHub - Important results not yet on BigData - Data you're actively working on - Temporary files needed for ongoing analysis

Don't back up: - Data already on BigData - Large raw datasets (store on BigData instead) - Temporary test files - Duplicate copies of existing data

Backup Quota

1 TB maximum per user. Keep your backup folder clean:

  • Remove files once safely stored on BigData
  • Delete obsolete analysis outputs
  • Move completed projects to BigData rather than keeping them here
  • Contact the Dry-lab manager if you need additional space

Data Safety

For valuable data:

  1. Primary copy: BigData server (permanent storage)
  2. Working copy: IBU account (for analysis)
  3. Temporary backup: p910 BACKUP folder (if needed)

Never rely solely on p910 for important data storage!


5. Quick Reference Guide

Common Scenarios

??? question "I just received raw sequencing data. What do I do?"

    1. Submit to ENA/NCBI with metadata (set an embargo if unpublished)
    2. Transfer to your IBU account for analysis
    3. Upload a copy to BigData → Raw_data/ folder
    4. Start analysis on IBU, using version-controlled scripts

??? question "Where should I run my analysis?"

    - Primary: IBU account (best performance, designed for computation)
    - Testing only: p910 server (test scripts, then move to IBU)
    - Never: directly on BigData (read-only access preferred)

??? question "I finished my analysis. Where do I store the results?"

    1. Final results: BigData → Users/your_name/
    2. Shared results: BigData → Shared/ (if useful to others)
    3. Scripts: GitHub (version-controlled)

??? question "My paper was published. What should I do with the data?"

    1. Ensure the data are deposited in a public repository (ENA/NCBI)
    2. Move the data to BigData → Archives/ folder
    3. Include a README with the publication DOI and a description
    4. Clean up working copies from your IBU account
    5. Remove from p910 BACKUP if applicable

??? question "I need to share a protocol or presentation. Where do I put it?"

    Upload it to the Common Server in the appropriate folder:

    - Protocols → `Research_doc/Protocols/`
    - Presentations → `Presentations/`
    - Scripts → `Research_doc/Scripts/`

??? question "I'm running out of space on p910 backup or IBU. What do I do?"

    1. Move completed data to BigData
    2. Delete temporary or obsolete files
    3. Check for duplicate files
    4. If you genuinely need more space, contact the Dry-lab manager

Data Storage Decision Tree

```mermaid
graph TD
    A[I have data] --> B{What type?}
    B -->|Raw NGS data| C[Submit to ENA/NCBI]
    C --> D[Store on IBU + BigData Raw_data/]
    B -->|Analysis results| E{Final or intermediate?}
    E -->|Final results| F[BigData Users/ or Shared/]
    E -->|Intermediate| G[Keep on IBU, backup if critical]
    B -->|Published data| H[BigData Archives/ + public repo]
    B -->|Protocols/Presentations| I[Common Server]
    B -->|Scripts/Code| J[GitHub + BigData Shared/]
```
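The same decision tree can be expressed as a small lookup, for instance inside a personal intake script. A minimal sketch: the category keys and the function name are hypothetical, and it encodes only the branches in the diagram above.

```python
def storage_destination(data_type: str, final: bool = False) -> str:
    """Return where data should go, following the storage decision tree.

    `final` only matters for analysis results (final vs intermediate).
    """
    destinations = {
        "raw_ngs": "Submit to ENA/NCBI, then store on IBU + BigData Raw_data/",
        "published": "BigData Archives/ + public repository",
        "protocols_presentations": "Common Server",
        "scripts": "GitHub + BigData Shared/",
    }
    if data_type == "analysis_results":
        return "BigData Users/ or Shared/" if final else "Keep on IBU, back up if critical"
    try:
        return destinations[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None
```

Raising on unknown categories, rather than returning a default, keeps unexpected data from silently landing in the wrong place.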

Storage Capacity Guidelines

| Storage | Recommended Use | Typical Size | Long-term? |
|---|---|---|---|
| IBU Account | Active analysis | Project-sized (10-100 GB) | No: clean up completed projects |
| BigData Raw_data/ | Raw datasets | Dataset-sized (10-1000 GB) | Yes: until archived |
| BigData Archives/ | Published data | Dataset-sized | Yes: permanent |
| BigData Users/ | Your results | Variable | Yes: organized storage |
| p910 BACKUP | Critical temporary files | < 1 TB per user | No: temporary only |
| Common Server | Light resources | Small files (< 100 MB each) | Yes: shared resources |
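Before copying a folder to the Common Server, a minimal sketch that flags files above the per-file guideline in the table. Reading "100 MB" as 100 × 10^6 bytes is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: "100 MB" read as 100 * 10**6 bytes.
MAX_COMMON_FILE_BYTES = 100 * 10**6


def too_large_for_common(folder: Path,
                         limit: int = MAX_COMMON_FILE_BYTES) -> list[tuple[Path, int]]:
    """List (path, size) for files under `folder` larger than `limit` bytes."""
    found = []
    for p in folder.rglob("*"):
        if p.is_file():
            size = p.stat().st_size
            if size > limit:
                found.append((p, size))
    return sorted(found)
```

Anything it reports probably belongs on BigData instead.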

6. Best Practices Checklist

For All Lab Members

  • [ ] Submit all raw NGS data to ENA/NCBI with complete metadata
  • [ ] Store permanent data copies on BigData server
  • [ ] Use descriptive, consistent file naming conventions
  • [ ] Include README files in all project folders
  • [ ] Clean up temporary files regularly
  • [ ] Contribute useful resources to Common Server

For Long-Term Members

  • [ ] Obtain IBU account for primary analysis work
  • [ ] Set up GitHub account and join ParisodLab organization
  • [ ] Version control all analysis scripts
  • [ ] Organize projects clearly on IBU account
  • [ ] Transfer final results to BigData promptly
  • [ ] Archive published data with proper documentation

For Data Management

  • [ ] Check p910 BACKUP quota regularly (< 1 TB)
  • [ ] Move published data to BigData Archives/
  • [ ] Remove obsolete files from Raw_data/ folder
  • [ ] Document data provenance and processing steps
  • [ ] Use embargo settings appropriately for unpublished data

7. Getting Help

Common Issues

Can't access BigData or the Common Server:

  • Verify you're connected to the UniFR network (or VPN)
  • Check the connection path for your operating system
  • Contact IT support if the connection still fails

Need more storage space:

  • Clean up temporary files first
  • Archive or delete obsolete data
  • Contact the Dry-lab manager if you genuinely need more quota

Lost data or accidental deletion:

  • Check whether the data exist in BigData Archives/
  • Check the p910 BACKUP folder (if the data were backed up there)
  • Contact the Dry-lab manager immediately
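To search the locations listed for lost data, a minimal sketch that looks for a file by exact name under several folders. The folders you pass in (for example your mounted BigData Archives/ and your p910 BACKUP folder) are placeholders, unmounted locations are skipped, and a recursive search over a large share can take a while.

```python
from __future__ import annotations

import glob
from pathlib import Path


def find_copies(filename: str, roots: list[Path]) -> list[Path]:
    """Return every file named exactly `filename` under the given roots.

    Roots that are not mounted or do not exist are skipped.
    """
    pattern = glob.escape(filename)  # treat *, ? and [ literally
    hits = []
    for root in roots:
        if root.is_dir():
            hits.extend(p for p in root.rglob(pattern) if p.is_file())
    return sorted(hits)
```

Escaping the name means a file such as `sample[1].vcf.gz` is matched literally instead of being read as a wildcard pattern.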

Contacts

  • Dry-lab manager: For IBU accounts, storage questions, quota increases, p910 server issues, backup problems
  • IT support: For network drive connection issues

8. Summary

Key Takeaways

Data Lifecycle:

  1. Receive data → Submit to ENA/NCBI with metadata
  2. Analyze → Use IBU account, version control scripts on GitHub
  3. Store → Upload raw and final data to BigData
  4. Publish → Move to BigData Archives/ with documentation

Storage Hierarchy:

  • IBU: Your active workspace (temporary, high-performance)
  • BigData: Permanent storage (raw data, results, archives)
  • Common: Shared lab resources (protocols, presentations)
  • p910: Gateway and temporary testing only (< 1 TB backup quota)

Golden Rules:

  • Always include comprehensive metadata
  • Version-control all code on GitHub
  • Document your data with README files
  • Clean up regularly to save space

  • Never use p910 /media/Data/ for long-term storage
  • Never store important data without backups


Last updated: November 2025