Bodine & Co.|Social Scraper/ca-es-insurance

Deploy: Mar 30, 6:47 PM PDT

California E&S Insurance

active

Homeowner experiences, agent discussions, E&S/surplus lines, and FAIR Plan coverage in California wildfire zones

Scrape Log

Source | Status | Started | Completed | Posts | Cost | Errors
topic-discovery | completed | 3/28/2026, 5:53:19 AM | 3/28/2026, 5:53:56 AM | 0 | $0.12 | -

Source | Status | Started | Completed | Posts | Cost | Errors
api-discovery | completed | 3/28/2026, 5:21:58 AM | 3/28/2026, 5:22:00 AM | 10 | - | -

Source | Status | Started | Completed | Posts | Cost | Errors
api-discovery | completed | 3/28/2026, 5:21:28 AM | 3/28/2026, 5:21:45 AM | 11 | $0.04 | -

Source | Status | Started | Completed | Posts | Cost | Errors
reddit | completed | 3/16/2026, 4:13:13 AM | 3/16/2026, 4:24:51 AM | 262 | - | -
Assessment

Marginal value — Tier C adds 262 posts but only 8 high-quality

Scraped 6 Tier C subreddits (hyperlocal fire zones and investor subreddits). r/Malibu had the best signal-to-noise ratio, with FAIR Plan pricing discussion. r/realestateinvesting was fully blocked by Reddit (HTTP 403, attributed to rate-limiting). The total Reddit corpus is now ~4,334 posts across 34 subreddits.

Relevance

3%

Location Mentions

16%

Cost/Post

Free

Audience Split

unknown: 106, homeowner: 156

Top Sources

r/SantaCruz (100), r/PacificPalisades (71), r/Landlord (50), r/Malibu (35), r/NapaValley (6)

Strengths

  • + r/Malibu best signal-to-noise (mean 9.5, 4 high-quality posts)
  • + Hyperlocal fire victim perspectives not found in broader subs
  • + No Apify cost — free lite actor
  • + Scraper infrastructure reusable for future runs

Weaknesses

  • - Low overall relevance (mean score 4.4 across all Tier C)
  • - r/SantaCruz high volume but mostly tangential fire content
  • - r/realestateinvesting fully blocked (0 posts)
  • - r/NapaValley too small to be useful (6 posts)

Recommendations

  • 1. Do not revisit r/NapaValley or r/realestateinvesting
  • 2. r/Malibu worth re-scraping in future runs for new FAIR Plan discussion
  • 3. Focus remaining effort on higher-value sources (#26 news articles, #16 Facebook/Nextdoor)

Source | Status | Started | Completed | Posts | Cost | Errors
reddit | completed | 3/15/2026, 5:50:33 PM | 3/15/2026, 6:22:12 PM | 1246 | - | -
Assessment

Tier B targeted scrape — 1246 posts from 11 subreddits

Scraped 11 Tier B subreddits (hyperlocal fire zones, real estate, legal, financial planning). Subreddits: r/altadena, r/burbank, r/TahoeLocals, r/santarosa, r/grassvalley, r/RealEstate....

Source | Status | Started | Completed | Posts | Cost | Errors
reddit | completed | 3/15/2026, 8:46:55 AM | 3/15/2026, 9:26:18 AM | 1798 | - | -
Assessment

Tier A targeted scrape — 1798 posts from 16 subreddits

Scraped 16 Tier A subreddits using the subreddit: search operator with the Apify lite actor. Subreddits: r/HomeInsurance, r/InsuranceClaims, r/InsuranceAgent, r/InsurancePros, r/homeowners, r/California....

Source | Status | Started | Completed | Posts | Cost | Errors
reddit | completed | 3/15/2026, 8:41:58 AM | 3/15/2026, 8:44:59 AM | 144 | - | -
Assessment

Single subreddit test — 144 posts from r/Insurance

Test run of the targeted subreddit scraper against r/Insurance. Validated that the subreddit: search operator works with the free Apify lite actor.
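The subreddit: operator scopes a Reddit search to a single community, which is what the targeted Tier A/B/C runs rely on. A minimal sketch of composing such query strings, assuming multi-word keywords should be quoted as phrases; the helper name build_subreddit_queries and the example keywords are hypothetical, not the project's actual keyword configuration.

```python
def build_subreddit_queries(subreddits: list[str], keywords: list[str]) -> list[str]:
    """Return one search string per (subreddit, keyword) pair.

    Hypothetical helper: multi-word keywords are wrapped in double quotes
    so the search treats them as phrases.
    """
    queries = []
    for sub in subreddits:
        name = sub.removeprefix("r/")  # accept "r/Insurance" or "Insurance"
        for kw in keywords:
            term = f'"{kw}"' if " " in kw else kw
            queries.append(f"subreddit:{name} {term}")
    return queries


if __name__ == "__main__":
    for q in build_subreddit_queries(["r/Insurance"], ["FAIR Plan", "nonrenewal"]):
        print(q)
```

Each string can then be passed to whatever search input the scraper accepts; keeping one query per pair makes it easy to attribute yield to specific subreddit/keyword combinations.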

Source | Status | Started | Completed | Posts | Cost | Errors
uphelp.org | completed | 3/15/2026, 7:50:22 AM | 3/15/2026, 7:54:25 AM | 1 | - | -
Assessment

Unreliable — only 1 of 4 question pages scraped successfully via Apify. Cloudflare interference causes ~75% failure rate. High-value content but not worth the Apify cost at this success rate.

Weaknesses

  • - 75% Apify failure rate
  • - Cloudflare blocks most requests
  • - High cost per successful page

Recommendations

  • 1. Not viable for automated scraping
  • 2. Consider manual collection via issue #16

Source | Status | Started | Completed | Posts | Cost | Errors
bogleheads.org | completed | 3/15/2026, 7:49:43 AM | 3/15/2026, 7:50:07 AM | 0 | - | -
Assessment

Blocked — 403 even with Apify Web Scraper + residential proxy + Chrome stealth. IP-level blocking, not just Cloudflare. 10 relevant threads identified but cannot be scraped automatically.

Weaknesses

  • - Aggressive IP blocking defeats all proxy approaches
  • - phpBB forum with custom anti-bot rules

Recommendations

  • 1. Skip automated scraping
  • 2. Consider manual collection via issue #16 if content is desired

Source | Status | Started | Completed | Posts | Cost | Errors
mrmoneymustache.com | completed | 3/15/2026, 7:49:06 AM | 3/15/2026, 7:49:28 AM | 106 | - | -
Assessment

Go: 106 homeowner posts from 5 FIRE (Financial Independence, Retire Early) community threads. Financially sophisticated homeowners analyze insurance economics, self-insurance calculations, and LA fire settlements. Unique audience perspective.

Strengths

  • + 100% homeowner audience
  • + Financial analysis depth (break-even calculations, cost comparisons)
  • + LA fire settlement advice
  • + Multi-page threads with rich discussion
  • + Zero cost

Weaknesses

  • - FIRE community skews higher net worth than average homeowner
  • - Some threads tangential to CA fire insurance

Source | Status | Started | Completed | Posts | Cost | Errors
talkirvine.com | completed | 3/15/2026, 7:48:27 AM | 3/15/2026, 7:48:46 AM | 42 | - | -
Assessment

Go — 42 homeowner posts from 3 Irvine/OC community threads. XenForo forum, same parser as insurance-forums.com. Real homeowners discussing premium increases, carrier comparisons, fire zone surcharges.

Strengths

  • + 100% homeowner audience
  • + Specific premium amounts and carrier names
  • + Local geographic specificity (Irvine/OC)
  • + Zero cost — direct requests

Weaknesses

  • - Small volume (3 threads)
  • - Geographically narrow (Orange County only)

Source | Status | Started | Completed | Posts | Cost | Errors
insurance-forums.com | completed | 3/15/2026, 7:04:20 AM | 3/15/2026, 7:15:54 AM | 54 | - | -
Assessment

54 new broker posts from 9 remaining threads. Total insurance-forums.com posts now 134 across 19 threads. Apify Web Scraper with stealth successfully replaces Firecrawl.

Location Mentions

n/a

Strengths

  • + 100% broker audience
  • + Professional E&S market discussion
  • + Covers FAIR Plan, carrier exits, placement challenges
  • + Completes all 20 target threads

Weaknesses

  • - Some pagination pages (pages 7-8) returned 403 errors on long threads
  • - Partial data loss on 1 thread

Recommendations

  • 1. insurance-forums.com scraping complete for Phase 1
  • 2. Apify Web Scraper approach reusable for other Cloudflare-protected forums

Source | Status | Started | Completed | Posts | Cost | Errors
insurance-forums.com | completed | 3/15/2026, 7:03:23 AM | 3/15/2026, 7:03:58 AM | 5 | - | -
Assessment

Test successful — Apify Web Scraper with Chrome + stealth + residential proxy bypasses Cloudflare. XenForo HTML parsed via BeautifulSoup. 5 broker posts from 1 thread.
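Deterministic parsing of a XenForo thread page can be sketched with BeautifulSoup. This assumes XenForo 2 markup, where each post is an article.message element carrying data-author and data-content attributes and the post body sits in div.bbWrapper; those selectors should be checked against saved HTML before relying on them. The ForumPost and parse_xenforo_thread names are hypothetical. Requires the beautifulsoup4 package.

```python
from __future__ import annotations

from dataclasses import dataclass

from bs4 import BeautifulSoup


@dataclass
class ForumPost:
    author: str
    post_id: str
    text: str


def parse_xenforo_thread(html: str) -> list[ForumPost]:
    """Extract posts from one XenForo thread page (assumed XenForo 2 markup)."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for article in soup.select("article.message"):
        body = article.select_one("div.bbWrapper")
        if body is None:
            continue  # no post body found in this element
        # Drop quoted replies so the same text is not counted twice.
        for quote in body.select("blockquote"):
            quote.decompose()
        posts.append(
            ForumPost(
                author=article.get("data-author", ""),
                post_id=article.get("data-content", ""),
                text=body.get_text(" ", strip=True),
            )
        )
    return posts
```

Because the same function runs on any XenForo page, it can be reused for other XenForo forums such as talkirvine.com once the markup assumption is confirmed there.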

Strengths

  • + Cloudflare bypass confirmed
  • + Deterministic HTML parsing replaces LLM-based Firecrawl extraction
  • + No Firecrawl dependency

Recommendations

  • 1. Proceed with remaining 8 threads using this approach

Source | Status | Started | Completed | Posts | Cost | Errors
insurance-forums.com | completed | 3/15/2026, 7:02:06 AM | 3/15/2026, 7:02:43 AM | 0 | - | -
Assessment

Failed — Cloudflare blocked Apify Web Scraper without stealth mode. 403 on all retries. Led to enabling Chrome + stealth + residential proxy.

Weaknesses

  • - Cloudflare detects headless browser fingerprint
  • - Residential proxy alone insufficient

Recommendations

  • 1. Enable useChrome + useStealth flags
  • 2. Retry with stealth mode enabled

Source | Status | Started | Completed | Posts | Cost | Errors
forum-discovery | completed | 3/15/2026, 6:44:57 AM | 3/15/2026, 6:47:43 AM | 0 | $0.75 | -

Source | Status | Started | Completed | Posts | Cost | Errors
google-news | completed | 3/15/2026, 6:33:22 AM | 3/15/2026, 6:34:01 AM | 85 | $0.18 | -
Assessment

Go — viable tertiary source. 85 news articles collected via SerpAPI Google News. Top sources: CalMatters, Insurance Journal, LA Times. Snippet text captures key information without full-article scraping.
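A minimal sketch of a SerpAPI Google News request using only the standard library, assuming the google_news engine and that results arrive under a news_results key; the function names are hypothetical and the API key is a placeholder.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def google_news_url(query: str, api_key: str) -> str:
    """Build a SerpAPI request URL for the google_news engine."""
    params = {"engine": "google_news", "q": query, "api_key": api_key}
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def fetch_news(query: str, api_key: str) -> list[dict]:
    """Fetch one page of results; returns [] if news_results is absent (assumed key)."""
    with urlopen(google_news_url(query, api_key), timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("news_results", [])
```

Storing each result's snippet as the row text keeps the per-query cost at one API call, which matches the snippet-only trade-off noted above.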

Location Mentions

31%

Top Sources

calmatters.org, insurancejournal.com, latimes.com, abc7news.com, insurancebusinessmag.com, eenews.net

Strengths

  • + 100% date parse rate
  • + Negligible cost (~$0.18)
  • + 31% location mention rate
  • + Good mix of state and national outlets

Weaknesses

  • - Snippet text only, not full articles
  • - All audience_type=news, no firsthand commentary

Recommendations

  • 1. Use as supplementary context for Phase 1 analysis
  • 2. Consider full-article scraping in Phase 2 if needed

Source | Status | Started | Completed | Posts | Cost | Errors
nextdoor | completed | 3/15/2026, 6:33:12 AM | 3/15/2026, 6:33:20 AM | 0 | - | -
Assessment

no-go

Source | Status | Started | Completed | Posts | Cost | Errors
facebook | completed | 3/15/2026, 6:32:55 AM | 3/15/2026, 6:33:10 AM | 0 | - | -
Assessment

limited-go

Source | Status | Started | Completed | Posts | Cost | Errors
redfin | completed | 3/15/2026, 5:52:29 AM | 3/15/2026, 5:52:43 AM | 0 | - | -
Assessment

No-go — editorial articles only, no community discussions

SerpAPI site-search returned 30 results: 10 staff-written blog/news articles about wildfire insurance, 19 property listings, and 1 false-positive blog post miscategorized as community. Zero user forums or Q&A sections. Redfin publishes agent-curated content but has no public community discussion features.
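Triage of site-search results like these (listings vs. editorial vs. community) can be sketched as URL-path matching. The path fragments below are hypothetical heuristics for Redfin and Zillow URLs, not a documented URL scheme, and path rules like these can misfire in the same way as the false-positive "community" blog post above.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical path fragments, checked in order; tune them against real results.
CATEGORY_PATTERNS = [
    ("listing", ("/homedetails/", "/home/")),
    ("editorial", ("/news/", "/blog/", "/research/")),
    ("community", ("/forum", "/community", "/questions/")),
]


def classify_result(url: str) -> str:
    """Label one result URL by the first matching path fragment."""
    path = urlparse(url).path.lower()
    for label, fragments in CATEGORY_PATTERNS:
        if any(fragment in path for fragment in fragments):
            return label
    return "other"


def tally(urls: list[str]) -> Counter:
    """Count results per category, e.g. to report listings vs. editorial pages."""
    return Counter(classify_result(u) for u in urls)
```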

Strengths

  • + Redfin News has strong agent-quoted articles about CA wildfire insurance market
  • + Good editorial coverage of insurance-related housing trends

Weaknesses

  • - No user-generated community content — blog articles are staff-written
  • - Agent quotes are embedded in editorial pieces, not standalone discussions
  • - Apify Redfin actor scrapes listings only

Recommendations

  • 1. Skip Redfin for community discussion scraping
  • 2. Redfin News articles overlap with news scraping scope in issue #8

Source | Status | Started | Completed | Posts | Cost | Errors
zillow | completed | 3/15/2026, 5:52:16 AM | 3/15/2026, 5:52:27 AM | 0 | - | -
Assessment

No-go — no community discussion content found

SerpAPI site-search returned 30 results: 6 Zillow Research articles about insurance trends, 19 property listings, and 5 other pages. Zero user-generated forum posts, Q&A threads, or community discussions. Zillow has no public discussion boards — its "forums" are invite-only corporate events.

Strengths

  • + Zillow Research publishes useful insurance data (premium growth, climate risk analysis)

Weaknesses

  • - No user-generated community content — no forums, Q&A, or discussion boards
  • - Apify Zillow actor scrapes listings only, not community content
  • - Insurance mentions in listings are incidental, not discussion

Recommendations

  • 1. Skip Zillow for community discussion scraping
  • 2. Zillow Research articles may be useful as secondary data references for final analysis

Source | Status | Started | Completed | Posts | Cost | Errors
twitter | completed | 3/15/2026, 5:05:04 AM | 3/15/2026, 5:08:53 AM | 229 | $0.05 | -
Assessment

Viable Tier 2 source. Better geographic specificity than Reddit (38% vs 4%), but content is shallow. Needs relevance scoring and stock spam filtering.
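A minimal sketch of a stock-spam filter, assuming "stock spam" means ticker-promotion tweets that stack cashtags (for example insurer tickers such as $ALL or $TRV). The cashtag pattern and the threshold of two are illustrative assumptions, not tuned values, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# A cashtag: "$" followed by 1-5 letters, e.g. $ALL. Dollar amounts like $4,200 do not match.
CASHTAG = re.compile(r"\$[A-Za-z]{1,5}\b")


def is_stock_spam(text: str, max_cashtags: int = 2) -> bool:
    """Flag tweets carrying more cashtags than max_cashtags (threshold is an assumption)."""
    return len(CASHTAG.findall(text)) > max_cashtags


def drop_stock_spam(tweets: list[str]) -> list[str]:
    """Keep tweets that are not flagged as stock spam."""
    return [t for t in tweets if not is_stock_spam(t)]
```

Running this before relevance scoring keeps ticker chatter from inflating keyword matches on words like "insurance" or "premium".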

Location Mentions

38%

Source | Status | Started | Completed | Posts | Cost | Errors
twitter | completed | 3/15/2026, 4:56:22 AM | 3/15/2026, 5:00:36 AM | 40 | $0.07 | -
Assessment

Development run — field mapping adjustment. See final run.

Source | Status | Started | Completed | Posts | Cost | Errors
twitter | completed | 3/15/2026, 4:50:43 AM | 3/15/2026, 4:54:16 AM | 223 | $0.05 | -
Assessment

Initial run — author_id mapping was incorrect, data replaced by final run.

Source | Status | Started | Completed | Posts | Cost | Errors
biggerpockets.com | completed | 3/15/2026, 3:50:03 AM | 3/15/2026, 3:50:27 AM | 10 | - | -
Assessment

Investor/landlord perspective from BiggerPockets. Lower volume, but useful for homeowner and investor insurance experiences. Direct HTML scraping works (no Cloudflare).

Source | Status | Started | Completed | Posts | Cost | Errors
insurance-forums.com | completed | 3/15/2026, 3:46:16 AM | 3/15/2026, 3:50:26 AM | 80 | - | 4 errors
Assessment

100% broker content from insurance-forums.com. Professional discussions about FAIR Plan, E&S markets, carrier exits. Highest quality source. Firecrawl free tier exhausted.

Source | Status | Started | Completed | Posts | Cost | Errors
reddit | completed | 3/15/2026, 2:05:56 AM | 3/15/2026, 2:14:39 AM | 472 | - | -
Assessment

Batch 2 — geographic queries, similar noise profile

Second Apify batch using geographic + subreddit-targeted queries (LA wildfire, Bay Area, Sacramento, etc.). 472 new posts, 23 duplicates caught. After cleanup: 884 total Reddit rows (115 posts, 769 comments, 11 deleted). Noise ratio still high — most comments are off-topic. Location mention rate improved to 16% (vs 4% in batch 1) thanks to geographic query focus.
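Cross-batch deduplication like the 23 duplicates caught here can be sketched as a set lookup on a stable identifier. This assumes each row carries Reddit's fullname (t3_ prefix for posts, t1_ for comments) under a hypothetical id key; the log does not say which field the real pipeline keys on.

```python
from __future__ import annotations

from typing import Iterable


def dedupe(rows: Iterable[dict], existing_ids: set[str]) -> tuple[list[dict], int]:
    """Return (new rows, duplicate count), skipping ids already stored or seen in this batch."""
    seen = set(existing_ids)  # copy so the caller's set is not mutated
    fresh: list[dict] = []
    duplicates = 0
    for row in rows:
        key = row["id"]
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        fresh.append(row)
    return fresh, duplicates
```

Seeding existing_ids from rows already stored by earlier batches lets the same function catch both within-batch and cross-batch repeats.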

Relevance

30%

Location Mentions

16%

Cost/Post

$0.005

Audience Split

unknown: 779, homeowner: 105

Top Sources

r/California, r/LosAngeles, r/bayarea, r/Insurance, r/REBubble, r/orangecounty

Strengths

  • + Geographic queries improved location mention rate (4% → 16%)
  • + Deduplication working correctly (23 caught)
  • + More geographic subreddits represented (r/LosAngeles, r/orangecounty)

Weaknesses

  • - Still ~87% comments, many off-topic
  • - Apify credits nearly exhausted (~$0.10 remaining)
  • - No control over which comments the actor returns

Recommendations

  • 1. Switch to public JSON scraper for remaining ~1,600 posts needed
  • 2. Focus JSON scraper on posts-only mode to reduce noise
  • 3. Apply relevance scoring filter (score >= 10) to surface useful content
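Recommendations 1 and 2 can be sketched with the standard library: appending .json to a Reddit search URL returns a Listing whose children are wrapped as {"kind": ..., "data": ...}, with kind t3 for posts and t1 for comments, so keeping only t3 children gives a posts-only mode. The helper names, page size of 100, and sort order are assumptions. Reddit may still answer with 403 or 429, as in the r/realestateinvesting run, so a descriptive User-Agent and polite pacing are advisable.

```python
from __future__ import annotations

import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def search_url(subreddit: str, query: str, after: Optional[str] = None) -> str:
    """Build a subreddit-restricted search URL on Reddit's public JSON endpoint."""
    params = {"q": query, "restrict_sr": "1", "sort": "new", "limit": "100"}
    if after:
        params["after"] = after  # pagination cursor taken from the previous page
    return f"https://www.reddit.com/r/{subreddit}/search.json?{urlencode(params)}"


def posts_only(listing: dict) -> list[dict]:
    """Keep only thread-starting posts (kind t3) from a Listing payload."""
    return [c["data"] for c in listing["data"]["children"] if c["kind"] == "t3"]


def fetch_listing(url: str, user_agent: str) -> dict:
    """GET one Listing page with an explicit User-Agent header."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Looping on listing["data"]["after"] until it is None walks the result pages; stopping once the target post count is reached keeps request volume low.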

Source | Status | Started | Completed | Posts | Cost | Errors
reddit | completed | 3/15/2026, 12:19:37 AM | 3/15/2026, 12:26:48 AM | 437 | - | -
Assessment

Promising but noisy — needs filtering

Reddit has relevant CA home insurance threads, but the scraper pulls ~10 comments per thread and most are off-topic noise. Only 49 of 437 rows are thread-starting posts. 14 rows are [deleted]. Irrelevant subreddits (r/PokemonTCG, r/bouldering) also matched. Data needs relevance scoring and thread labeling before it is useful.
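Thread labeling and cleanup can be sketched as follows. This assumes hypothetical row keys: id holds a Reddit fullname (t3_ for posts, t1_ for comments), comments carry link_id with the parent post's fullname, and text holds the body. Rows whose text is [deleted] or [removed] are dropped.

```python
from __future__ import annotations

from collections import defaultdict

DELETED_MARKERS = {"[deleted]", "[removed]"}


def label_and_group(rows: list[dict]) -> dict[str, list[dict]]:
    """Label rows as post or comment, drop deleted rows, and group them by thread."""
    threads: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        if row.get("text", "").strip() in DELETED_MARKERS:
            continue
        is_post = row["id"].startswith("t3_")
        labeled = {**row, "row_type": "post" if is_post else "comment"}
        thread_id = row["id"] if is_post else row["link_id"]
        threads[thread_id].append(labeled)
    for items in threads.values():
        # Stable sort: the thread-starting post moves to the front, comments keep their order.
        items.sort(key=lambda r: r["row_type"] != "post")
    return dict(threads)
```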

Relevance

30%

Location Mentions

4%

Cost/Post

Free

Audience Split

unknown: 332, homeowner: 105

Top Sources

r/California, r/REBubble, r/bayarea, r/Insurance, r/HomeInsurance

Strengths

  • + Thread-starting posts contain valuable firsthand insurance experiences
  • + Strong coverage of FAIR Plan, non-renewals, wildfire insurance
  • + Geographic subreddits (r/bayarea, r/LosAngeles) provide location context
  • + Free via Apify Lite actor

Weaknesses

  • - High noise: 89% of rows are comments, many off-topic
  • - 14 rows are [deleted]/[removed]
  • - Irrelevant subreddits matched (r/PokemonTCG, r/bouldering)
  • - No control over which comments are scraped
  • - All rows labeled as posts — comments not distinguished
  • - Low explicit location mentions (4%)

Recommendations

  • 1. Implement relevance scoring (0-100) against weighted keywords
  • 2. Label thread-starting posts vs comments, group by thread
  • 3. Delete [deleted]/[removed] rows
  • 4. Add dashboard filter for relevance score threshold
  • 5. Consider paid Reddit actor for better targeting
  • 6. Add per-source keyword config with adjustable weights
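Recommendations 1 and 6 describe a 0-100 relevance score built from weighted keywords. A minimal sketch, assuming additive weights capped at 100 and case-insensitive whole-phrase matching; the keywords and weights below are hypothetical placeholders, not the project's configuration. The cutoff of 10 used in keep_relevant mirrors the score >= 10 filter suggested for batch 2.

```python
from __future__ import annotations

import re
from typing import Mapping

# Hypothetical weights; a per-source config would supply its own mapping.
KEYWORD_WEIGHTS = {
    "fair plan": 30,
    "surplus lines": 25,
    "nonrenewal": 20,
    "non-renewal": 20,
    "wildfire": 15,
    "premium": 10,
}


def relevance_score(text: str, weights: Mapping[str, int] = KEYWORD_WEIGHTS) -> int:
    """Sum the weights of keywords found as whole phrases, capped at 100."""
    lowered = text.lower()
    total = sum(
        weight
        for keyword, weight in weights.items()
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
    )
    return min(total, 100)


def keep_relevant(texts: list[str], threshold: int = 10) -> list[str]:
    """Keep rows scoring at or above the threshold."""
    return [t for t in texts if relevance_score(t) >= threshold]
```

Counting each keyword once per row, rather than per occurrence, keeps long off-topic comments from outscoring short, focused posts.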