Data Organization Guidelines
This guide explains the lab's data management workflow, from receiving NGS data to long-term storage and backup. Following these guidelines ensures data accessibility, security, and compliance with best practices.
Overview: Storage Infrastructure
The lab's storage is spread across four locations:
| Storage Location | Purpose | Access Level | Backup |
|---|---|---|---|
| IBU Account | Primary workspace for active analysis | Long-term users | User-managed |
| BigData Server | Permanent storage for raw and final data | Long-term users | User-managed |
| Common Server | Shared lab resources and light data | All lab members | User-managed |
| p910 Server | Gateway and temporary workspace | All lab members | BACKUP folder only |
1. Getting Started: Required Accounts
For Long-Term Lab Members
Priority Setup
Obtain these accounts as soon as you join the lab:
IBU Account (Required for all long-term users)
- Purpose: Primary workspace for data analysis and pipeline development
- How to get it: Contact the Dry-lab manager
- What to use it for:
    - Store and analyze raw data
    - Develop and test analysis scripts
    - Run computational pipelines
GitHub Access
- Purpose: Version control for all analysis code
- Organization: ParisodLab GitHub
- Requirement: All scripts and pipelines must be version-controlled
For Short-Term Users
If you don't have IBU access, you can use:
- p910 server for temporary storage and testing
2. Data Storage Workflow
Step 1: Receiving Raw Data
When you receive raw NGS data from the sequencing facility:
- Immediate Actions:
    - Submit data to ENA or NCBI with complete metadata
    - Apply embargo if needed (typically 2 years for unpublished data)
    - See ENA Guidelines for detailed submission instructions
- Storage:
    - Transfer raw data to your IBU account for analysis
    - Upload a copy to the BigData server → `Raw_data/` folder
Metadata Requirements
Always include comprehensive metadata:
- Sample information: Population, subspecies, elevation, collection date
- ENA checklist fields: Propagation method, soil properties, health status
- Sequencing library: Instrument model, library prep details, file checksums
- Environmental context: When available, include collection methods and conditions
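The file checksums listed above can be generated before upload. The sketch below is one possible way to do it in Python, assuming gzipped FASTQ files in a single folder and MD5 as the checksum type (check the ENA Guidelines for what your submission actually requires). The function names, the `*.fastq.gz` pattern, and the `checksums.md5` file name are illustrative choices, not a lab standard.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, pattern: str = "*.fastq.gz") -> Path:
    """Write one '<md5>  <filename>' line per matching file in data_dir.

    The pattern and the manifest name are assumptions made for this example.
    """
    manifest = data_dir / "checksums.md5"
    lines = [f"{md5sum(p)}  {p.name}" for p in sorted(data_dir.glob(pattern))]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Keeping the manifest next to the raw files makes it possible to re-verify them after each transfer (for example, from IBU to BigData).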
Step 2: Data Analysis
Primary Analysis Location: IBU Account
- Store working copies of data on your IBU account
- Develop analysis scripts and pipelines
- Test and refine your workflows
- Use version control (Git) for all code
Temporary Testing: p910 Server
- Use `/media/Data` drives for script testing only
- Not for long-term storage
- Clean up after testing
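To make cleanup easier, leftover test files can be listed by age. This is a minimal sketch that only reports candidates and deletes nothing; the 30-day threshold and the function name are assumptions for illustration, not a lab policy.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_files(folder: Path, max_age_days: float = 30.0,
                now: Optional[float] = None) -> List[Path]:
    """List files under folder last modified more than max_age_days ago.

    `now` can be passed explicitly (seconds since the epoch) to make the
    result reproducible; by default the current time is used.
    """
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86_400  # 86,400 seconds per day
    return sorted(
        path for path in folder.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

Review the returned list by hand before removing anything, since modification time alone does not tell you whether a file is still needed.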
Step 3: Final Results
When analysis is complete:
- Upload to BigData Server:
    - Unfiltered results (e.g., complete VCF files) → `Raw_data/` or `Users/your_name/`
    - Filtered results (final datasets) → `Users/your_name/` or `Shared/`
- Archive After Publication:
    - Move published data to the `Archives/` folder
    - Include a README with publication DOI and description
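The README that accompanies archived data can be generated rather than typed by hand. The sketch below shows one hypothetical layout containing the two items requested above (publication DOI and description) plus the archive date; the function name, the `README.md` file name, and the layout are assumptions, not an established lab template.

```python
from datetime import date
from pathlib import Path


def write_archive_readme(folder: Path, doi: str, description: str) -> Path:
    """Create README.md in folder with the publication DOI and a description."""
    readme = folder / "README.md"
    readme.write_text(
        f"# {folder.name}\n\n"
        f"- Publication DOI: {doi}\n"
        f"- Archived on: {date.today().isoformat()}\n\n"
        f"{description}\n",
        encoding="utf-8",
    )
    return readme
```

Calling `write_archive_readme(project_folder, doi, text)` just before moving a project into `Archives/` keeps the documentation with the data.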
3. Storage Locations Explained
IBU Account (Primary Workspace)
Who needs it: All long-term lab members
Use for:
- Active data analysis
- Pipeline development and testing
- Temporary storage of working datasets
- Computational jobs
Best Practices:
- Keep organized project folders
- Clean up completed projects regularly
- Version control all scripts on GitHub
- Transfer important data to BigData for permanent storage
BigData Server (Permanent Storage)
Purpose: Long-term storage following "write once, read many" principle
How to Connect
=== "macOS / Linux"
smb://bigdata.unifr.ch/science/biol/groupe_parisod/
=== "Windows"
\\bigdata.unifr.ch\science\biol\groupe_parisod\
Folder Structure
| Folder | Purpose | What to Store |
|---|---|---|
| `Archives/` | Published or archived data | Data already on NCBI/ENA, old projects no longer in use |
| `Raw_data/` | Active raw datasets | NGS data currently being analyzed or used in ongoing projects |
| `Shared/` | Lab-wide resources | Protocols, reference genomes, software, shared results |
| `Users/your_name/` | Personal storage | Your analysis results, processed data, project-specific files. Please, no temporary files. |
Best Practices
- Archive promptly: Move data to `Archives/` after publication or public deposition
- Stay organized: Include README files explaining folder contents
- Clean regularly: Remove obsolete files from `Raw_data/`
- Follow naming conventions: Use descriptive, consistent file and folder names
- Document everything: Future you (and others) will thank you!
Common Server (Shared Lab Resources)
Purpose: Lightweight data and collaborative resources
How to Connect
=== "macOS / Linux"
smb://common.unifr.ch/biol/_Ecologie/Parisod_group/
=== "Windows"
\\common.unifr.ch\biol\_Ecologie\Parisod_group\
What's Available
| Folder | Contents |
|---|---|
| `Greenhouse/` | Environmental monitoring data (temperature, light, humidity) |
| `Lab_meetings/` | Meeting reports and notes |
| `Presentations/` | Lab posters, talks, and presentation materials |
| `Research_doc/Projects/` | Project documentation and plans |
| `Research_doc/Protocols/` | Lab protocols and methods |
| `Research_doc/Publications/` | Published papers and manuscripts |
| `Research_doc/Scripts/` | Shared analysis scripts |
Contribute!
Help build our shared knowledge base:
- Upload your presentations after talks or posters
- Share useful protocols you've developed
- Document your projects for future reference
- Add scripts that others might find useful
p910 Server (Gateway & Temporary Workspace)
Purpose: Access point to network drives and temporary testing environment
What p910 IS Used For
Network Drive Gateway
- Accessing BigData, Common, and other UniFR drives
- Managing data transfers between servers
Temporary Testing
- Testing scripts before running on IBU
- Short-term storage for users without IBU access
- Data transfers between sequencing facilities and BigData
Documentation Management
- Lab documentation (version-controlled on GitHub)
- System administration tasks
What p910 IS NOT Used For
- Long-term data storage on `/media/Data` drives
- Primary analysis workspace
- Permanent project storage
Storage Limitations
The /media/Data drives on p910 are NOT for permanent storage. Move valuable data to BigData as soon as possible.
4. Backup System on p910
Critical Information
Only data in the BACKUP folder on p910 will be automatically backed up. Data stored elsewhere on p910 is NOT backed up and may be lost.
How Backups Work
Automated Backup Schedule
- Runs every evening at 5:00 PM
- Copies data from the p910 BACKUP folder to the BigData server
- Automatic and requires no user action
Setting Up Your Backup
- Create your personal backup folder: `/media/Data[1-3]/BACKUP/your_name/`
- Organize your data:
    - Create a clear folder structure
    - Use descriptive names
    - Include README files
- Monitor your usage:
    - Maximum quota: 1 TB per user
    - You'll be notified if you exceed the limit
What to Back Up
Do back up:
- Critical scripts not yet on GitHub
- Important results not yet on BigData
- Data you're actively working on
- Temporary files needed for ongoing analysis
Don't back up:
- Data already on BigData
- Large raw datasets (store on BigData instead)
- Temporary test files
- Duplicate copies of existing data
Backup Quota
1 TB maximum per user. Keep your backup folder clean:
- Remove files once safely stored on BigData
- Delete obsolete analysis outputs
- Archive completed projects elsewhere
- Contact the Dry-lab manager if you need additional space
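To see how close a backup folder is to the 1 TB limit, its size can be summed directly. The following is a rough sketch, not the tool the server itself uses for quota accounting; it assumes 1 TB means 10^12 bytes (the server may count in binary units) and the function names are made up for this example.

```python
import os
from pathlib import Path

# Assumption: the 1 TB quota is counted in decimal units (10**12 bytes).
QUOTA_BYTES = 1000**4


def folder_size_bytes(folder: Path) -> int:
    """Sum the sizes of all regular files below folder, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def quota_report(folder: Path, quota: int = QUOTA_BYTES) -> str:
    """Return a one-line usage summary for folder against the given quota."""
    used = folder_size_bytes(folder)
    percent = 100 * used / quota
    status = "OVER QUOTA" if used > quota else "ok"
    return f"{folder}: {used / 1000**3:.1f} GB used ({percent:.1f}% of quota) [{status}]"
```

Running `quota_report` on your own `BACKUP/your_name/` folder before and after a cleanup gives a quick sense of how much space was recovered.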
Data Safety
For valuable data:
- Primary copy: BigData server (permanent storage)
- Working copy: IBU account (for analysis)
- Temporary backup: p910 BACKUP folder (if needed)
Never rely solely on p910 for important data storage!
5. Quick Reference Guide
Common Scenarios
??? question "I just received raw sequencing data. What do I do?"
1. Submit to ENA/NCBI with metadata (set embargo if unpublished)
2. Transfer to your IBU account for analysis
3. Upload a copy to BigData → Raw_data/ folder
4. Start analysis on IBU, using version-controlled scripts
??? question "Where should I run my analysis?"
- Primary: IBU account (best performance, designed for computation)
- Testing only: p910 server (test scripts, then move to IBU)
- Never: Directly on BigData (read-only access preferred)
??? question "I finished my analysis. Where do I store the results?"
1. Final results: BigData → Users/your_name/
2. Shared results: BigData → Shared/ (if useful to others)
3. Scripts: GitHub (version controlled)
??? question "My paper was published. What should I do with the data?"
1. Ensure data is deposited in public repository (ENA/NCBI)
2. Move data to BigData → Archives/ folder
3. Include README with publication DOI and description
4. Clean up working copies from IBU account
5. Remove from p910 BACKUP if applicable
??? question "I need to share a protocol or presentation. Where do I put it?"
Upload to Common Server in the appropriate folder:
- Protocols → `Research_doc/Protocols/`
- Presentations → `Presentations/`
- Scripts → `Research_doc/Scripts/`
??? question "I'm running out of space on p910 backup or IBU. What do I do?"
1. Move completed data to BigData
2. Delete temporary/obsolete files
3. Check for duplicate files
4. If you genuinely need more space, contact the Dry-lab manager
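For the duplicate check mentioned in the last scenario, one common approach is to group files by size first and hash only the files that share a size. The sketch below illustrates that idea under the assumption that byte-identical content is what counts as a duplicate; the function names are hypothetical and nothing is deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of files under root that have identical content."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    by_content: Dict[Tuple[int, str], List[Path]] = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:  # a file with a unique size cannot have a duplicate
            for path in paths:
                by_content[(size, sha256sum(path))].append(path)

    return sorted(sorted(group) for group in by_content.values() if len(group) > 1)
```

The size pre-filter avoids hashing most large files, which matters on multi-gigabyte sequencing data.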
Data Storage Decision Tree
graph TD
A[I have data] --> B{What type?}
B -->|Raw NGS data| C[Submit to ENA/NCBI]
C --> D[Store on IBU + BigData Raw_data/]
B -->|Analysis results| E{Final or intermediate?}
E -->|Final results| F[BigData Users/ or Shared/]
E -->|Intermediate| G[Keep on IBU, backup if critical]
B -->|Published data| H[BigData Archives/ + public repo]
B -->|Protocols/Presentations| I[Common Server]
B -->|Scripts/Code| J[GitHub + BigData Shared/]
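The same decision tree can be written as a small lookup table, which is handy if you want to embed the rule in a transfer script. This is only an illustrative encoding of the branches above; the `DataKind` names and the `where_to_store` function are invented for the example.

```python
from enum import Enum
from typing import Dict, List


class DataKind(Enum):
    RAW_NGS = "raw NGS data"
    FINAL_RESULTS = "final analysis results"
    INTERMEDIATE_RESULTS = "intermediate analysis results"
    PUBLISHED = "published data"
    PROTOCOLS_PRESENTATIONS = "protocols or presentations"
    SCRIPTS = "scripts or code"


# Each entry lists the steps of the matching branch of the decision tree, in order.
ACTIONS: Dict[DataKind, List[str]] = {
    DataKind.RAW_NGS: ["Submit to ENA/NCBI", "Store on IBU + BigData Raw_data/"],
    DataKind.FINAL_RESULTS: ["BigData Users/ or Shared/"],
    DataKind.INTERMEDIATE_RESULTS: ["Keep on IBU, backup if critical"],
    DataKind.PUBLISHED: ["BigData Archives/ + public repo"],
    DataKind.PROTOCOLS_PRESENTATIONS: ["Common Server"],
    DataKind.SCRIPTS: ["GitHub + BigData Shared/"],
}


def where_to_store(kind: DataKind) -> List[str]:
    """Return the ordered steps for a given kind of data."""
    return list(ACTIONS[kind])
```

For example, `where_to_store(DataKind.RAW_NGS)` returns the two steps of the raw-data branch in the order they should be carried out.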
Storage Capacity Guidelines
| Storage | Recommended Use | Typical Size | Long-term? |
|---|---|---|---|
| IBU Account | Active analysis | Project-sized (10-100 GB) | No - clean up completed projects |
| BigData Raw_data/ | Raw datasets | Dataset-sized (10-1000 GB) | Yes - until archived |
| BigData Archives/ | Published data | Dataset-sized | Yes - permanent |
| BigData Users/ | Your results | Variable | Yes - organized storage |
| p910 BACKUP | Critical temp files | < 1 TB per user | No - temporary only |
| Common Server | Light resources | Small files (< 100 MB each) | Yes - shared resources |
6. Best Practices Checklist
For All Lab Members
- [ ] Submit all raw NGS data to ENA/NCBI with complete metadata
- [ ] Store permanent data copies on BigData server
- [ ] Use descriptive, consistent file naming conventions
- [ ] Include README files in all project folders
- [ ] Clean up temporary files regularly
- [ ] Contribute useful resources to Common Server
For Long-Term Members
- [ ] Obtain IBU account for primary analysis work
- [ ] Set up GitHub account and join ParisodLab organization
- [ ] Version control all analysis scripts
- [ ] Organize projects clearly on IBU account
- [ ] Transfer final results to BigData promptly
- [ ] Archive published data with proper documentation
For Data Management
- [ ] Check p910 BACKUP quota regularly (< 1 TB)
- [ ] Move archived data to BigData Archives/ after publication
- [ ] Remove obsolete files from Raw_data/ folder
- [ ] Document data provenance and processing steps
- [ ] Use embargo settings appropriately for unpublished data
7. Getting Help
Common Issues
Can't access BigData or Common server:
- Verify you're connected to the UniFR network (or VPN)
- Check the connection path for your operating system
- Contact IT support if the connection fails
Need more storage space:
- Clean up temporary files first
- Archive or delete obsolete data
- Contact the lab administrator if you genuinely need more quota
Lost data or accidental deletion:
- Check if the data exists in BigData Archives/
- Check the p910 BACKUP folder (if backed up there)
- Contact the system administrator immediately
Contacts
- Dry-lab manager: For IBU accounts, storage questions, quota increases, p910 server issues, backup problems
- IT support: For network drive connection issues
8. Summary
Key Takeaways
Data Lifecycle:
- Receive data → Submit to ENA/NCBI with metadata
- Analyze → Use IBU account, version control scripts on GitHub
- Store → Upload raw and final data to BigData
- Publish → Move to BigData Archives/ with documentation
Storage Hierarchy:
- IBU: Your active workspace (temporary, high-performance)
- BigData: Permanent storage (raw data, results, archives)
- Common: Shared lab resources (protocols, presentations)
- p910: Gateway and temporary testing only (< 1 TB backup quota)
Golden Rules:
- Always include comprehensive metadata
- Version control all code on GitHub
- Document your data with README files
- Clean up regularly to save space
- Never use p910 /media/Data/ for long-term storage
- Never store important data without backups
Last updated: November 2025