Data Organization Guidelines¶

This guide explains the lab's data management workflow, from receiving NGS data to long-term storage and backup. Following these guidelines ensures data accessibility, security, and compliance with best practices.

Overview: Storage Infrastructure¶

The lab uses four storage locations:

Storage Location	Purpose	Access Level	Backup
IBU Account	Primary workspace for active analysis	Long-term users	User-managed
BigData Server	Permanent storage for raw and final data	Long-term users	User-managed
Common Server	Shared lab resources and light data	All lab members	User-managed
p910 Server	Gateway and temporary workspace	All lab members	BACKUP folder only

1. Getting Started: Required Accounts¶

For Long-Term Lab Members¶

Priority Setup

Obtain these accounts as soon as you join the lab:

IBU Account (Required for all long-term users)
- Purpose: Primary workspace for data analysis and pipeline development
- How to get it: Contact the Dry-lab manager
- What to use it for:
  - Store and analyze raw data
  - Develop and test analysis scripts
  - Run computational pipelines

GitHub Access
- Purpose: Version control for all analysis code
- Organization: ParisodLab GitHub
- Requirement: All scripts and pipelines must be version-controlled

For Short-Term Users¶

If you don't have IBU access, you can use:
- p910 server for temporary storage and testing

2. Data Storage Workflow¶

Step 1: Receiving Raw Data¶

When you receive raw NGS data from the sequencing facility:

Immediate Actions:
- Submit data to ENA or NCBI with complete metadata
- Apply embargo if needed (typically 2 years for unpublished data)
- See ENA Guidelines for detailed submission instructions
Storage:
- Transfer raw data to your IBU account for analysis
- Upload a copy to BigData server → Raw_data/ folder
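
A minimal sketch of the copy step, assuming the BigData share is already mounted as a local folder. The function names and the choice of SHA-256 for verification are illustrative assumptions, not an established lab script. It copies one file and checks that the copy is byte-identical to the source:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, dest_dir):
    """Copy `source` into `dest_dir` and confirm the copy matches the original.

    `dest_dir` is assumed to be a locally mounted folder (for example the
    mounted Raw_data/ folder); the mount point itself is up to the user.
    """
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    shutil.copy2(source, target)  # copies contents and timestamps
    if sha256_of_file(source) != sha256_of_file(target):
        raise IOError(f"Checksum mismatch after copying {source} to {target}")
    return target
```

Calling `copy_and_verify("sample_R1.fastq.gz", "/path/to/mounted/Raw_data")` (both paths hypothetical) returns the path of the verified copy or raises an error if the two files differ.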

Metadata Requirements

Always include comprehensive metadata:

Sample information: Population, subspecies, elevation, collection date
ENA checklist fields: Propagation method, soil properties, health status
Sequencing library: Instrument model, library prep details, file checksums
Environmental context: When available, include collection methods and conditions
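
The file checksums mentioned above can be collected with a short script. This is a sketch only: the function names, the `*.fastq.gz` pattern, and the use of MD5 are assumptions, so check the ENA Guidelines for the checksum type the repository actually expects.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(folder, pattern="*.fastq.gz"):
    """Map each matching file name in `folder` to its MD5 digest.

    The default pattern is an assumption about how raw reads are named.
    """
    return {p.name: md5_of_file(p) for p in sorted(Path(folder).glob(pattern))}
```

The resulting dictionary can be pasted into the submission metadata or saved next to the raw data so that later copies can be checked against it.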

Step 2: Data Analysis¶

Primary Analysis Location: IBU Account

Store working copies of data on your IBU account
Develop analysis scripts and pipelines
Test and refine your workflows
Use version control (Git) for all code

Temporary Testing: p910 Server

Use /media/Data drives for script testing only
Not for long-term storage
Clean up after testing

Step 3: Final Results¶

When analysis is complete:

Upload to BigData Server:
- Unfiltered results (e.g., complete VCF files) → Raw_data/ or Users/your_name/
- Filtered results (final datasets) → Users/your_name/ or Shared/
Archive After Publication:
- Move published data to Archives/ folder
- Include a README with publication DOI and description
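
One possible way to generate the archive README is sketched below. The file name `README.txt`, the field layout, and the function name are assumptions; the guide only asks for the publication DOI and a description.

```python
from datetime import date
from pathlib import Path


def write_archive_readme(folder, title, doi, description, archived_on=None):
    """Write a small README.txt into `folder` and return its path.

    The layout is illustrative; adapt the fields to the project.
    """
    archived_on = archived_on or date.today()
    text = (
        f"{title}\n"
        f"{'=' * len(title)}\n\n"
        f"Publication DOI: {doi}\n"
        f"Archived on: {archived_on.isoformat()}\n\n"
        f"Description:\n{description}\n"
    )
    readme = Path(folder) / "README.txt"
    readme.write_text(text, encoding="utf-8")
    return readme
```

For example, `write_archive_readme("my_project", "Example dataset", "10.1234/placeholder", "Filtered VCF files.")` writes the file into an existing `my_project` folder; the DOI shown here is a placeholder, not a real reference.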

3. Storage Locations Explained¶

IBU Account (Primary Workspace)¶

Who needs it: All long-term lab members

Use for:
- Active data analysis
- Pipeline development and testing
- Temporary storage of working datasets
- Computational jobs

Best Practices:
- Keep organized project folders
- Clean up completed projects regularly
- Version control all scripts on GitHub
- Transfer important data to BigData for permanent storage

BigData Server (Permanent Storage)¶

Purpose: Long-term storage following the "write once, read many" principle

How to Connect¶

=== "macOS / Linux"
    smb://bigdata.unifr.ch/science/biol/groupe_parisod/

=== "Windows"
    \\bigdata.unifr.ch\science\biol\groupe_parisod\

Folder Structure¶

Folder	Purpose	What to Store
`Archives/`	Published or archived data	Data already on NCBI/ENA, old projects no longer in use
`Raw_data/`	Active raw datasets	NGS data currently being analyzed or used in ongoing projects
`Shared/`	Lab-wide resources	Protocols, reference genomes, software, shared results
`Users/your_name/`	Personal storage	Your analysis results, processed data, project-specific files (no temporary files, please)

Best Practices

Archive promptly: Move data to Archives/ after publication or public deposition
Stay organized: Include README files explaining folder contents
Clean regularly: Remove obsolete files from Raw_data/
Follow naming conventions: Use descriptive, consistent file and folder names
Document everything: Future you (and others) will thank you!

Common Server (Shared Lab Resources)¶

Purpose: Lightweight data and collaborative resources

How to Connect¶

=== "macOS / Linux"
    smb://common.unifr.ch/biol/_Ecologie/Parisod_group/

=== "Windows"
    \\common.unifr.ch\biol\_Ecologie\Parisod_group\

What's Available¶

Folder	Contents
`Greenhouse/`	Environmental monitoring data (temperature, light, humidity)
`Lab_meetings/`	Meeting reports and notes
`Presentations/`	Lab posters, talks, and presentation materials
`Research_doc/Projects/`	Project documentation and plans
`Research_doc/Protocols/`	Lab protocols and methods
`Research_doc/Publications/`	Published papers and manuscripts
`Research_doc/Scripts/`	Shared analysis scripts

Contribute!

Help build our shared knowledge base:

Upload your presentations after talks or posters
Share useful protocols you've developed
Document your projects for future reference
Add scripts that others might find useful

p910 Server (Gateway & Temporary Workspace)¶

Purpose: Access point to network drives and temporary testing environment

What p910 IS Used For¶

Network Drive Gateway
- Accessing BigData, Common, and other UniFR drives
- Managing data transfers between servers

Temporary Testing
- Testing scripts before running on IBU
- Short-term storage for users without IBU access
- Data transfers between sequencing facilities and BigData

Documentation Management
- Lab documentation (version-controlled on GitHub)
- System administration tasks

What p910 IS NOT Used For¶

Long-term data storage on /media/Data drives
Primary analysis workspace
Permanent project storage

Storage Limitations

The /media/Data drives on p910 are NOT for permanent storage. Move valuable data to BigData as soon as possible.

4. Backup System on p910¶

Critical Information

Only data in the BACKUP folder on p910 will be automatically backed up. Data stored elsewhere on p910 is NOT backed up and may be lost.

How Backups Work¶

Automated Backup Schedule
- Runs every evening at 5:00 PM
- Copies data from p910 BACKUP folder to BigData server
- Automatic and requires no user action

Setting Up Your Backup¶

Create your personal backup folder: /media/Data[1-3]/BACKUP/your_name/
Organize your data:
- Create a clear folder structure
- Use descriptive names
- Include README files
Monitor your usage:
- Maximum quota: 1 TB per user
- You'll be notified if you exceed the limit

What to Back Up¶

Do back up:
- Critical scripts not yet on GitHub
- Important results not yet on BigData
- Data you're actively working on
- Temporary files needed for ongoing analysis

Don't back up:
- Data already on BigData
- Large raw datasets (store on BigData instead)
- Temporary test files
- Duplicate copies of existing data

Backup Quota

1 TB maximum per user. Keep your backup folder clean:

Remove files once safely stored on BigData
Delete obsolete analysis outputs
Archive completed projects elsewhere
Contact the Dry-lab manager if you need additional space
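
To see how close a backup folder is to the 1 TB quota, a check along these lines may help. This is a sketch under assumptions: the function names are hypothetical, and whether the quota is counted in decimal (10^12 bytes) or binary units is not stated in this guide, so the constant below is a guess.

```python
import os

# Assumption: 1 TB is taken as 10**12 bytes; the server may count differently.
QUOTA_BYTES = 10**12


def folder_size_bytes(folder):
    """Sum the sizes of all regular files under `folder`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def quota_report(folder, quota_bytes=QUOTA_BYTES):
    """Return used bytes, the fraction of the quota used, and an over-quota flag."""
    used = folder_size_bytes(folder)
    return {
        "used_bytes": used,
        "fraction": used / quota_bytes,
        "over_quota": used > quota_bytes,
    }
```

Running `quota_report("/media/Data1/BACKUP/your_name")` (substitute the drive you actually use) gives a quick summary before the automatic notification arrives.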

Data Safety

For valuable data:

Primary copy: BigData server (permanent storage)
Working copy: IBU account (for analysis)
Temporary backup: p910 BACKUP folder (if needed)

Never rely solely on p910 for important data storage!

5. Quick Reference Guide¶

Common Scenarios¶

??? question "I just received raw sequencing data. What do I do?"
    1. Submit to ENA/NCBI with metadata (set embargo if unpublished)
    2. Transfer to your IBU account for analysis
    3. Upload a copy to BigData → Raw_data/ folder
    4. Start analysis on IBU, using version-controlled scripts

??? question "Where should I run my analysis?"
    - Primary: IBU account (best performance, designed for computation)
    - Testing only: p910 server (test scripts, then move to IBU)
    - Never: Directly on BigData (read-only access preferred)

??? question "I finished my analysis. Where do I store the results?"
    1. Final results: BigData → Users/your_name/
    2. Shared results: BigData → Shared/ (if useful to others)
    3. Scripts: GitHub (version controlled)

??? question "My paper was published. What should I do with the data?"
    1. Ensure data is deposited in a public repository (ENA/NCBI)
    2. Move data to BigData → Archives/ folder
    3. Include README with publication DOI and description
    4. Clean up working copies from IBU account
    5. Remove from p910 BACKUP if applicable

??? question "I need to share a protocol or presentation. Where do I put it?"
    Upload to Common Server in the appropriate folder:

    - Protocols → `Research_doc/Protocols/`
    - Presentations → `Presentations/`
    - Scripts → `Research_doc/Scripts/`

??? question "I'm running out of space on p910 backup or IBU. What do I do?"
    1. Move completed data to BigData
    2. Delete temporary/obsolete files
    3. Check for duplicate files
    4. If you genuinely need more space, contact the Dry-lab manager
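
Checking for duplicate files can be partly automated. The sketch below is one possible approach, not a lab-provided tool: it groups files by size first, then confirms duplicates by SHA-256 so that only candidate files are read in full. The function name is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(folder):
    """Return groups of files under `folder` whose contents are identical."""
    # Pass 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The function only reports duplicates; deciding which copy to delete is left to the user, since one of the copies may be the permanent one on BigData.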

Data Storage Decision Tree¶

graph TD
    A[I have data] --> B{What type?}
    B -->|Raw NGS data| C[Submit to ENA/NCBI]
    C --> D[Store on IBU + BigData Raw_data/]
    B -->|Analysis results| E{Final or intermediate?}
    E -->|Final results| F[BigData Users/ or Shared/]
    E -->|Intermediate| G[Keep on IBU, backup if critical]
    B -->|Published data| H[BigData Archives/ + public repo]
    B -->|Protocols/Presentations| I[Common Server]
    B -->|Scripts/Code| J[GitHub + BigData Shared/]
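
The same decision tree can be written as a small lookup function, which may be handy in scripts that route output files. This is only an illustration: the function name and the category keys are invented here, while the destinations are copied from the diagram above.

```python
def storage_location(data_type, final=False):
    """Return the destinations suggested by the decision tree.

    `data_type` uses hypothetical keys: "raw_ngs", "analysis_results",
    "published", "protocols_presentations", "scripts".
    `final` only matters for analysis results.
    """
    if data_type == "raw_ngs":
        return ["ENA/NCBI", "IBU", "BigData Raw_data/"]
    if data_type == "analysis_results":
        if final:
            return ["BigData Users/ or Shared/"]
        return ["IBU (back up if critical)"]
    if data_type == "published":
        return ["BigData Archives/", "public repository"]
    if data_type == "protocols_presentations":
        return ["Common Server"]
    if data_type == "scripts":
        return ["GitHub", "BigData Shared/"]
    raise ValueError(f"Unknown data type: {data_type!r}")
```

For instance, `storage_location("analysis_results", final=True)` returns `["BigData Users/ or Shared/"]`.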

Storage Capacity Guidelines¶

Storage	Recommended Use	Typical Size	Long-term?
IBU Account	Active analysis	Project-sized (10-100 GB)	No - clean up completed projects
BigData Raw_data/	Raw datasets	Dataset-sized (10-1000 GB)	Yes - until archived
BigData Archives/	Published data	Dataset-sized	Yes - permanent
BigData Users/	Your results	Variable	Yes - organized storage
p910 BACKUP	Critical temp files	< 1 TB per user	No - temporary only
Common Server	Light resources	Small files (< 100 MB each)	Yes - shared resources

6. Best Practices Checklist¶

For All Lab Members¶

[ ] Submit all raw NGS data to ENA/NCBI with complete metadata
[ ] Store permanent data copies on BigData server
[ ] Use descriptive, consistent file naming conventions
[ ] Include README files in all project folders
[ ] Clean up temporary files regularly
[ ] Contribute useful resources to Common Server

For Long-Term Members¶

[ ] Obtain IBU account for primary analysis work
[ ] Set up GitHub account and join ParisodLab organization
[ ] Version control all analysis scripts
[ ] Organize projects clearly on IBU account
[ ] Transfer final results to BigData promptly
[ ] Archive published data with proper documentation

For Data Management¶

[ ] Check p910 BACKUP quota regularly (< 1 TB)
[ ] Move published data to BigData Archives/ after publication
[ ] Remove obsolete files from Raw_data/ folder
[ ] Document data provenance and processing steps
[ ] Use embargo settings appropriately for unpublished data

7. Getting Help¶

Common Issues¶

Can't access BigData or Common server:
- Verify you're connected to UniFR network (or VPN)
- Check the connection path for your operating system
- Contact IT support if connection fails

Need more storage space:
- Clean up temporary files first
- Archive or delete obsolete data
- Contact the Dry-lab manager if you genuinely need more quota

Lost data or accidental deletion:
- Check if data exists in BigData Archives/
- Check p910 BACKUP folder (if backed up there)
- Contact system administrator immediately

Contacts¶

Dry-lab manager: For IBU accounts, storage questions, quota increases, p910 server issues, backup problems
IT support: For network drive connection issues

8. Summary¶

Key Takeaways

Data Lifecycle:

Receive data → Submit to ENA/NCBI with metadata
Analyze → Use IBU account, version control scripts on GitHub
Store → Upload raw and final data to BigData
Publish → Move to BigData Archives/ with documentation

Storage Hierarchy:

IBU: Your active workspace (temporary, high-performance)
BigData: Permanent storage (raw data, results, archives)
Common: Shared lab resources (protocols, presentations)
p910: Gateway and temporary testing only (< 1 TB backup quota)

Golden Rules:

Always include comprehensive metadata
Version control all code on GitHub
Document your data with README files
Clean up regularly to save space

Never use p910 /media/Data/ for long-term storage
Never store important data without backups

Last updated: November 2025