Spanish Accounts Verification Guide
🚀 Quick Start
1. Install Requirements
pip install -r requirements.txt
2. Run Verification
python run_verification.py
The script will automatically:
- Load your YAML file from _data/spanish_accounts.yml
- Verify each URL (Instagram, YouTube, etc.)
- Generate comprehensive reports in _data/verification_results/
📊 Generated Reports
After running, you’ll find these reports in the output directory:
- verified_accounts_[timestamp].json - Complete JSON with all data
- spanish_accounts_analysis_[timestamp].xlsx - Multi-sheet Excel workbook
- category_analysis_[timestamp].json - Analysis by category
- quality_assessment_[timestamp].csv - Quality scores for each account
- failed_verifications_[timestamp].csv - List of URLs that couldn’t be verified
🎯 What Gets Verified
For each URL, the system checks:
Basic Verification
- ✅ Account exists and is accessible
- ✅ Account status (active, private, suspended, etc.)
- ✅ Platform-specific username extraction
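As an illustration of what platform-specific username extraction can look like, here is a minimal sketch using only the standard library. The function name `extract_username` and the URL patterns it recognizes are assumptions made for this example; the verifier's real extraction logic is not shown in this guide and may differ.

```python
from urllib.parse import urlparse


def extract_username(url):
    """Return (platform, username) for a profile URL, or (None, None).

    Hypothetical helper: handles only a few common URL shapes.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop empty segments produced by leading/trailing slashes
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        return None, None
    if host == "instagram.com":
        # Profile URLs look like instagram.com/<username>/
        return "instagram", parts[0]
    if host in ("youtube.com", "m.youtube.com"):
        if parts[0].startswith("@"):
            # Handle URLs look like youtube.com/@<handle>
            return "youtube", parts[0][1:]
        if parts[0] in ("channel", "user", "c") and len(parts) > 1:
            return "youtube", parts[1]
    return None, None
```

Note that this sketch does not distinguish profile URLs from post URLs (for example, an Instagram path starting with `p/`), so a real implementation would need extra checks.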
Data Enrichment
- 📊 Follower/subscriber counts
- 📝 Bio/description content
- 🌍 Location information
- 🏷️ Content categories
- 📧 Email discovery
- 🔗 Cross-platform links
Spanish Language Analysis
- 🇪🇸 Language detection in bio/description
- 📍 Spanish location markers
- 🏴 Regional indicators (flags, city names)
- 💬 Spanish linguistic patterns
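The signals listed above can be approximated with simple text heuristics. The sketch below is not the tool's actual analysis; the stopword list, the character set, and the function name `spanish_signals` are all assumptions chosen for illustration, and it reports raw signals rather than a confidence value.

```python
import re

# Small, illustrative stopword list (assumption, not the tool's own list)
SPANISH_STOPWORDS = {
    "el", "la", "los", "las", "de", "del", "en", "y",
    "que", "para", "con", "por", "una", "un",
}
# Characters that rarely appear in English text
SPANISH_CHARS = set("ñ¿¡áéíóúü")


def spanish_signals(bio):
    """Return simple Spanish-language indicators found in a bio string."""
    text = bio.lower()
    words = re.findall(r"[a-záéíóúüñ]+", text)
    stopword_hits = sum(1 for w in words if w in SPANISH_STOPWORDS)
    return {
        # Share of words that are common Spanish function words
        "stopword_ratio": stopword_hits / len(words) if words else 0.0,
        "has_spanish_chars": any(ch in SPANISH_CHARS for ch in text),
        "has_flag": "🇪🇸" in bio,
    }
```

A production system would more likely combine a language-detection library with these markers; the heuristics here only show the kind of evidence involved.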
Quality Scoring
Each account receives quality scores based on:
- Completeness (25%): How many fields are populated
- Accuracy (25%): Verification confidence
- Freshness (15%): How recent the data is
- Consistency (15%): Username/name alignment
- Relevance (20%): Spanish market alignment
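The weights above sum to 100%, so the overall score can be read as a weighted average of the five components. A minimal sketch, assuming each component is already a number between 0 and 1 (the names `WEIGHTS` and `overall_quality` are hypothetical, not taken from the codebase):

```python
# Weights taken from the list above; they sum to 1.0
WEIGHTS = {
    "completeness": 0.25,
    "accuracy": 0.25,
    "freshness": 0.15,
    "consistency": 0.15,
    "relevance": 0.20,
}


def overall_quality(components):
    """Weighted sum of component scores, each expected in [0, 1]."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


example = {
    "completeness": 0.8,
    "accuracy": 0.9,
    "freshness": 0.5,
    "consistency": 1.0,
    "relevance": 0.7,
}
score = overall_quality(example)  # approximately 0.79
```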
⚙️ Advanced Usage
Custom Batch Size
For better rate limit management:
processor = YAMLAccountsProcessor(yaml_file, output_dir)
# Modify batch size before verification
processor.verifier.batch_size = 10 # Smaller batches for cautious processing
Using YouTube API Key
For enhanced YouTube verification, add your API key:
processor.verifier = DatasetVerifier(youtube_api_key="YOUR_API_KEY")
Processing Subset of Categories
# Filter accounts by category before processing
processor.accounts_data = [
    acc for acc in processor.accounts_data
    if acc['category'] in ['Government', 'Media']
]
📈 Understanding Results
Verification Statuses
- active: Account is live and accessible
- private: Account exists but is private
- not_found: URL returns 404
- suspended: Account has been suspended
- rate_limited: Hit API limits (retry later)
- error: Other verification errors
Quality Score Interpretation
- > 0.8: Excellent quality, highly reliable data
- 0.6 - 0.8: Good quality, usable for most purposes
- 0.4 - 0.6: Average quality, may need manual review
- < 0.4: Low quality, significant data gaps
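When filtering results, the bands above can be turned into labels. The list leaves the exact boundaries (0.4, 0.6, 0.8) ambiguous, so this sketch assumes a boundary value falls into the higher band, except 0.8, which the list places below "Excellent". The function name `quality_label` is hypothetical.

```python
def quality_label(score):
    """Map a quality score in [0, 1] to the bands described above."""
    if score > 0.8:
        return "excellent"
    if score >= 0.6:  # assumption: 0.6 counts as "good"
        return "good"
    if score >= 0.4:  # assumption: 0.4 counts as "average"
        return "average"
    return "low"
```

For example, a score of 0.73 maps to "good".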
Language Confidence
- > 0.7: Strongly indicates Spanish content
- 0.5 - 0.7: Likely Spanish content
- 0.3 - 0.5: Possibly Spanish, needs review
- < 0.3: Unlikely to be Spanish content
🔄 Re-verification Strategy
For maintaining data quality:
- High-value accounts: Re-verify weekly
- Active accounts: Re-verify monthly
- Failed verifications: Retry after 24 hours
- Rate-limited: Retry with smaller batch size
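One way to apply this strategy is to compute the next check time from the last one. The sketch below is an assumption-laden illustration: "monthly" is approximated as 30 days, halving the batch size is just one possible reading of "smaller batch size", and the names `RETRY_INTERVALS`, `next_check`, and `reduced_batch_size` are hypothetical.

```python
from datetime import datetime, timedelta

# Intervals from the list above ("monthly" approximated as 30 days)
RETRY_INTERVALS = {
    "high_value": timedelta(weeks=1),
    "active": timedelta(days=30),
    "failed": timedelta(hours=24),
}


def next_check(last_checked, kind):
    """Return when an account of the given kind is next due for verification."""
    return last_checked + RETRY_INTERVALS[kind]


def reduced_batch_size(current):
    """Halve the batch size after a rate limit, never going below 1."""
    return max(1, current // 2)
```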
⚠️ Important Notes
- Rate Limits: The system respects platform rate limits, so large datasets may take a while to process.
- Anonymous Limits: Without API keys, some platforms heavily restrict anonymous access:
  - Instagram: ~200 requests/hour
  - YouTube: Better with API key (10,000 units/day)
- Data Freshness: Social media data changes rapidly. Regular re-verification is recommended.
- Network Issues: Use a stable connection. The system will retry failed requests automatically.
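To get a rough sense of how long a run may take under these limits, the hourly figure can be converted into a minimum delay and a total duration. This is a back-of-the-envelope sketch that assumes one request per URL, which may not match how the verifier actually works; both function names are hypothetical.

```python
def seconds_between_requests(requests_per_hour):
    """Minimum spacing between requests to stay under an hourly limit."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600 / requests_per_hour


def estimated_hours(num_requests, requests_per_hour=200):
    """Rough run time, assuming one request per URL (an assumption)."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return num_requests / requests_per_hour
```

At roughly 200 requests/hour, that is one request every 18 seconds, and 389 URLs would take just under two hours.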
🐛 Troubleshooting
“Rate limit reached”
- Reduce batch size in the processor
- Wait 1 hour before retrying
- Consider using API keys where available
“Module not found” errors
pip install -r requirements.txt --upgrade
Excel file won’t open
- Ensure openpyxl is installed:
pip install openpyxl
- Check disk space for large datasets
Partial results
- Check verification.log for detailed errors
- Failed URLs are saved separately for retry
📊 Sample Statistics Output
=======================================================================
SPANISH ACCOUNTS VERIFICATION SUMMARY
=======================================================================
Source File: spanish_accounts.yml
Total Accounts: 245
Total URLs: 389
Verified URLs: 375
--- Categories ---
Government: 45
Media: 38
Entertainment: 32
Sports: 28
...
--- Verification Status ---
active: 341 (90.9%)
private: 12 (3.2%)
not_found: 18 (4.8%)
error: 4 (1.1%)
--- Quality Metrics ---
Spanish Confirmed: 358
High Quality (>70%): 312
Verified Badges: 89
Total Followers: 45,234,891
Average Quality Score: 0.73
=======================================================================
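The percentages in the "Verification Status" block are each status count divided by the number of verified URLs (375 in this sample). A minimal sketch of that calculation, assuming the statuses are available as a flat list of strings (the function name `status_breakdown` is hypothetical, and this is not the tool's own reporting code):

```python
from collections import Counter


def status_breakdown(statuses):
    """Return {status: (count, percent)} ordered from most to least common."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        status: (n, round(100 * n / total, 1))
        for status, n in counts.most_common()
    }


# Reproduce the sample figures above
sample = (
    ["active"] * 341 + ["private"] * 12 + ["not_found"] * 18 + ["error"] * 4
)
breakdown = status_breakdown(sample)
```

With the sample counts, `breakdown["active"]` is `(341, 90.9)`, matching the summary.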