# Data Pipeline (Acquisition → Transfer → Storage)
This document mirrors the **/data** page on the website and is meant to be versioned and reviewed.
## Scenario A (default)
- **Raw 192 kHz audio, streamed continuously**, when the 4G uplink can sustain the raw rate (about 4.6 Mbps plus protocol overhead).
- If throughput is lower than that, fall back to an **adaptive strategy** (compression / denoise / clips / features).
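
The switch between the two modes can be sketched as a simple threshold test. This is a minimal illustration, not the project's implementation: `choose_mode`, `Mode`, and the `HEADROOM` factor of 1.25 are hypothetical names and values chosen for the example.

```python
from enum import Enum

# Acquisition parameters taken from this document.
SAMPLE_RATE_HZ = 192_000
BIT_DEPTH = 24
CHANNELS = 1

# Raw payload bitrate in Mbps (4.608 for the parameters above).
RAW_MBPS = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS / 1e6

# Hypothetical safety margin for protocol overhead and jitter (assumption).
HEADROOM = 1.25


class Mode(Enum):
    RAW = "raw"            # stream raw 192 kHz audio continuously
    ADAPTIVE = "adaptive"  # compression / denoise / clips / features


def choose_mode(measured_uplink_mbps: float) -> Mode:
    """Pick RAW only if the measured uplink covers the raw rate plus headroom."""
    if measured_uplink_mbps >= RAW_MBPS * HEADROOM:
        return Mode.RAW
    return Mode.ADAPTIVE


print(choose_mode(10.0))  # Mode.RAW
print(choose_mode(3.0))   # Mode.ADAPTIVE
```

How the uplink is measured, and which adaptive technique is picked once the raw stream no longer fits, is left open here.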
## Acquisition (edge)
- Sample rate (recommended): **192 kHz**
- Bit depth: **24-bit**
- Channels: **mono**
### Raw rate (order of magnitude)
- 192,000 samples/s × 3 bytes (24-bit) × 1 channel = **576 kB/s** (~0.58 MB/s)
- ~**2.1 GB/hour**
- ~**50 GB/day**
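
The arithmetic behind these figures can be reproduced with a few lines of Python. This is a sketch using decimal units (1 kB = 1000 B, 1 GB = 10^9 B); the function name `raw_rate_bytes_per_s` is made up for the example, and it assumes the bit depth is a whole number of bytes.

```python
def raw_rate_bytes_per_s(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Uncompressed PCM payload rate in bytes per second."""
    bytes_per_sample = bit_depth // 8  # assumes bit_depth is a multiple of 8
    return sample_rate_hz * bytes_per_sample * channels


rate = raw_rate_bytes_per_s(192_000, 24)

print(f"{rate / 1e3:.0f} kB/s")              # 576 kB/s
print(f"{rate * 3600 / 1e9:.2f} GB/hour")    # 2.07 GB/hour
print(f"{rate * 86_400 / 1e9:.1f} GB/day")   # 49.8 GB/day
```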
## Transfer (4G)
- Raw 192 kHz mono 24-bit: 192,000 × 24 bit = 4.608 Mbit/s ≈ **4.6 Mbps** (payload only, no protocol overhead)
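
To see what a slower link means in practice, the sketch below estimates how quickly an on-device buffer would grow if raw audio kept being recorded while the uplink was below the raw bitrate. The function name `backlog_growth_mb_per_hour` and the 3 Mbps example are illustrative assumptions; protocol overhead is ignored.

```python
# Raw payload bitrate from the acquisition parameters: 192 kHz x 24-bit x mono.
RAW_BPS = 192_000 * 24  # 4_608_000 bit/s


def backlog_growth_mb_per_hour(uplink_mbps: float) -> float:
    """MB of unsent raw audio accumulated per hour at a given uplink rate."""
    deficit_bps = max(0.0, RAW_BPS - uplink_mbps * 1e6)
    return deficit_bps / 8 * 3600 / 1e6  # bits -> bytes, seconds -> hour, B -> MB


print(f"{backlog_growth_mb_per_hour(3.0):.1f} MB/hour")  # 723.6 MB/hour
print(f"{backlog_growth_mb_per_hour(5.0):.1f} MB/hour")  # 0.0 MB/hour
```

A sustained 3 Mbps uplink would therefore leave roughly 0.7 GB of raw audio unsent per hour.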
## Storage
Per station (raw, uncompressed):
- ~50 GB/day
- ~348 GB/week
- ~18 TB/year
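
The per-station figures scale linearly with station count and retention period. A minimal sketch in decimal units follows; `storage_tb` is a hypothetical helper, and the 10-station, 30-day case is only an example, not a planned deployment.

```python
# Raw bytes per station per day: 192,000 samples/s x 3 bytes x 86,400 s.
BYTES_PER_DAY = 192_000 * 3 * 86_400  # 49_766_400_000


def storage_tb(stations: int, days: int) -> float:
    """Raw storage in TB (10^12 bytes) for a number of stations and days."""
    return stations * days * BYTES_PER_DAY / 1e12


print(f"{storage_tb(1, 7):.2f} TB")     # 0.35 TB  (one station, one week)
print(f"{storage_tb(1, 365):.2f} TB")   # 18.16 TB (one station, one year)
print(f"{storage_tb(10, 30):.2f} TB")   # 14.93 TB (example: 10 stations, 30 days)
```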
## Optimizations (planned as feature branches)
- `feat/data-compression` (FLAC/Zstd, chunking)
- `feat/data-downsample` (192k→48k)
- `feat/data-vad-events` (VAD/events)
- `feat/data-features-only` (FFT/SPL/peaks only)
- `feat/data-retention-policy` (retention + aggregation)
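
As a rough guide to what the downsampling branch would save, the sketch below compares the raw payload at 192 kHz and 48 kHz, keeping 24-bit mono. It is arithmetic only, with a made-up helper name `daily_gb`; it does not model resampling itself or any compression gain.

```python
def daily_gb(sample_rate_hz: int, bit_depth: int = 24, channels: int = 1) -> float:
    """Raw PCM volume per day in GB (10^9 bytes)."""
    bytes_per_s = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_s * 86_400 / 1e9


for rate_hz in (192_000, 48_000):
    mbps = rate_hz * 24 / 1e6
    print(f"{rate_hz // 1000} kHz: {mbps:.3f} Mbps, {daily_gb(rate_hz):.1f} GB/day")
# 192 kHz: 4.608 Mbps, 49.8 GB/day
# 48 kHz: 1.152 Mbps, 12.4 GB/day
```

The 4× reduction comes at the cost of bandwidth: at 48 kHz, only content below 24 kHz (the Nyquist limit) is retained.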