API Samples
This directory contains real Archive.org API response samples collected for testing and documentation purposes. These samples demonstrate edge cases and data variations that parsers must handle correctly.
Purpose
These JSON files serve multiple purposes:
- Parser Testing - Use these samples to verify your Archive.org API parser handles edge cases
- Implementation Reference - See real-world data structures returned by the Archive.org metadata API
- Edge Case Documentation - Understand the inconsistencies and variations in Archive.org responses
Collection Method
- Source: Archive.org metadata API (https://archive.org/metadata/{identifier})
- Date Collected: 2025-10-13
- Truncation: Files array limited to first 15-20 entries, reviews limited to first 5 entries
- Selection Criteria: Recording identifiers sampled from production data that demonstrated parser challenges
Samples
baseline-complete.json
Recording: gd77-05-08.sbd.hicks.4982.sbeok.shnf (Cornell '77)
Lines: 367
Description: Well-formed metadata with complete fields
Demonstrates:
- Complete metadata fields (date, venue, description, setlist, source, taper, transferer, lineage)
- collection field as array: ["GratefulDead", "etree", "stream_only"]
- Standard audio file structure (FLAC format)
- Multiple reviews with ratings
- Typical track structure with metadata
Use this sample to:
- Test array handling for collection field
- Verify baseline parsing of complete data
- Validate track extraction and ordering
edge-null-fields.json
Recording: gd1965-11-01.sbd.bershaw.5417.sbeok.shnf (1965 show)
Lines: 385
Description: Early era recording with sparse metadata
Demonstrates:
- taper field as null (not missing, explicitly null)
- source field as null (not missing, explicitly null)
- collection field as array: ["GratefulDead", "etree"]
- Minimal description and venue information
- No setlist or lineage data
Use this sample to:
- Test null field handling in mapper
- Verify parser doesn't crash on null taper/source
- Validate that optional fields work correctly with null values
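The distinction between an explicitly null field and a missing one can be checked directly on the decoded JSON. The following is a minimal Python sketch, assuming the fields sit under the response's top-level metadata object. The helper name field_state is hypothetical, and the inline JSON is illustrative data, not the contents of edge-null-fields.json.

```python
import json


def field_state(metadata: dict, key: str) -> str:
    """Classify a metadata field as 'missing', 'null', or 'present'."""
    if key not in metadata:
        return "missing"
    if metadata[key] is None:
        return "null"
    return "present"


# Illustrative stand-in for a sparse early-era response (not real sample data).
raw = '{"metadata": {"identifier": "example-item", "taper": null, "source": null}}'
metadata = json.loads(raw)["metadata"]

# taper and source are explicit nulls; setlist is absent altogether.
states = {key: field_state(metadata, key) for key in ("taper", "source", "setlist")}
print(states)
```

A mapper that only uses dict.get() cannot tell these two states apart, which is usually fine as long as both end up as null in the domain model.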
Edge Case Summary
| Edge Case | Field | Type | Sample(s) | Notes |
|---|---|---|---|---|
| Array field | collection | String OR Array | baseline-complete.json, edge-null-fields.json | FlexibleStringSerializer should take first element |
| Null field | taper | String OR null | edge-null-fields.json | Must handle gracefully, default to null |
| Null field | source | String OR null | edge-null-fields.json | Must handle gracefully, default to null |
| Missing field | setlist | String OR undefined | edge-null-fields.json | Common in early recordings |
| Missing field | lineage | String OR undefined | edge-null-fields.json | Not always documented |
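The rules in this table can be sketched in a few lines of Python. The table names FlexibleStringSerializer but does not show it, so the code below is only an analog written under that description, not that implementation. The names first_string and normalize are hypothetical.

```python
from typing import Optional


def first_string(value) -> Optional[str]:
    """Collapse a string-or-array field to a single string, or None."""
    if value is None:
        return None
    if isinstance(value, list):
        # Take the first element; an empty array is treated like a missing value.
        return first_string(value[0]) if value else None
    return str(value)


def normalize(metadata: dict) -> dict:
    """Apply the table's rules; .get() maps both null and missing keys to None."""
    fields = ("collection", "taper", "source", "setlist", "lineage")
    return {name: first_string(metadata.get(name)) for name in fields}


# Illustrative input shaped like the sparse sample described above.
result = normalize({"collection": ["GratefulDead", "etree"], "taper": None, "source": None})
print(result)
```

Treating null and missing identically keeps the mapper simple; if the distinction ever matters downstream, it has to be captured before this step.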
Known Variations Not Yet Captured
Based on implementation notes, these edge cases may exist but are not yet represented in samples:
- venue as array instead of string
- description as array instead of string
- Extremely sparse metadata (only identifier and title)
- Recordings with zero reviews
- Recordings with zero audio files
If you encounter these variations in the wild, consider adding them to this collection.
Testing Recommendations
When implementing an Archive.org parser, test against these samples:
- Parse each sample successfully - No crashes or exceptions
- Verify field mappings - Check that domain models match expected values
- Test array handling - Ensure
collectionarray is converted to first element string - Test null handling - Ensure null
taperandsourcedon't break parser - Test track extraction - Verify audio files are correctly filtered and ordered
- Test review parsing - Ensure reviews are extracted properly
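As one way to exercise the track extraction check, the sketch below filters a files array to a single audio format and orders the result by track number. It is written in Python under stated assumptions: each file entry is taken to carry name, format, and track keys, and "Flac" is taken to be the format label for FLAC files. Confirm both against the samples before relying on them. The function name extract_tracks and the inline data are hypothetical.

```python
def extract_tracks(files: list, audio_format: str = "Flac") -> list:
    """Keep files of one audio format, ordered by track number, then by name."""

    def track_number(entry: dict) -> int:
        # Track values are assumed to be strings such as "1", "01", or "1/20";
        # anything unparseable sorts after all numbered tracks.
        raw = str(entry.get("track", "")).split("/")[0].strip()
        return int(raw) if raw.isdecimal() else 10**6

    audio = [f for f in files if f.get("format") == audio_format]
    return sorted(audio, key=lambda f: (track_number(f), f.get("name", "")))


# Illustrative entries, not taken from the sample files.
files = [
    {"name": "t02.flac", "format": "Flac", "track": "02"},
    {"name": "info.txt", "format": "Text"},
    {"name": "t01.flac", "format": "Flac", "track": "1"},
    {"name": "t01.mp3", "format": "VBR MP3", "track": "1"},
]
names = [f["name"] for f in extract_tracks(files)]
print(names)
```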
Usage Example
# Fetch fresh sample (requires curl and jq)
curl -s "https://archive.org/metadata/gd77-05-08.sbd.hicks.4982.sbeok.shnf" | jq . > test-sample.json
# Validate against your parser
your-parser-tool test-sample.json
References
- Domain Models: See domain-models.md for field definitions and types
- API Integration: See api-integration.md for mapper algorithms and caching
- Archive.org API: https://archive.org/metadata/{identifier}
Updates
This collection should be updated when:
- New edge cases are discovered in production
- Archive.org changes its API response format
- Additional variations are needed for testing
To add a new sample:
- Fetch the raw API response: curl -s "https://archive.org/metadata/{identifier}"
- Truncate large arrays (keep first 15-20 files, first 5 reviews)
- Save as edge-{description}.json or baseline-{description}.json
- Update this index with description and edge cases demonstrated
- Add to Edge Case Summary table if introducing new variations
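The truncation step above could be scripted along these lines. This is a minimal Python sketch, assuming the response keeps its arrays under top-level files and reviews keys. The function name truncate_sample, the default limits, and the generated input are illustrative only.

```python
import json


def truncate_sample(response: dict, max_files: int = 20, max_reviews: int = 5) -> dict:
    """Return a copy of the response with files and reviews shortened."""
    trimmed = dict(response)  # shallow copy; the input dict is left unchanged
    if isinstance(trimmed.get("files"), list):
        trimmed["files"] = trimmed["files"][:max_files]
    if isinstance(trimmed.get("reviews"), list):
        trimmed["reviews"] = trimmed["reviews"][:max_reviews]
    return trimmed


# Generated stand-in for a raw API response (not real data).
response = {
    "metadata": {"identifier": "example-item"},
    "files": [{"name": f"track{i:02d}.flac"} for i in range(30)],
    "reviews": [{"reviewtitle": f"review {i}"} for i in range(8)],
}
sample = truncate_sample(response)
serialized = json.dumps(sample, indent=2)  # write this string to the new sample file
print(len(sample["files"]), len(sample["reviews"]))
```

Responses that lack a reviews key pass through unchanged, which matters for the zero-review recordings listed as not yet captured.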