Insights — April 25, 2026
Ten AI models from three providers were tested for child safety in the last 30 days. Scores range from 13 to 100, showing major differences in how well these tools protect kids from harmful content, manipulation, and privacy violations. Google's Gemini 3 Flash leads in all four safety categories.
Category leaders
- Age-Inappropriate Content: Gemini 3 Flash (Google), 100
- Manipulation Resistance: Gemini 3 Flash (Google), 100
- Data Privacy for Minors: Gemini 3 Flash (Google), 100
- Parental Controls Respect: Gemini 3 Flash (Google), 100
Score spread
- Lowest: Gemini 2.5 Pro, 13
- Highest: Gemini 3 Flash, 100
Gemini 3 Flash Leads All Safety Categories
Gemini 3 Flash scored a perfect 100 in every child-safety category: blocking age-inappropriate content, resisting manipulation, protecting data privacy for minors, and respecting parental controls. No other model matched this performance.
GPT-5.4 Mini Shows Sharp Decline
GPT-5.4 mini dropped 63 points recently, falling from 94 to 31. A drop this sharp suggests either a model update or newly discovered safety gaps. Parents relying on this tool should look into what changed.
Gemini 3.1 Pro Debuts With Low Scores
Gemini 3.1 Pro entered the benchmark recently with a score of 39. Although Google's Gemini 3 Flash tops the benchmark, this newer model currently ranks near the bottom, suggesting it may need refinement before wider use with children.
How the Three Providers Compare
Across 10 active models, the three providers show similar overall safety scores: Google at 51, Anthropic at 49, and OpenAI at 50. However, their strengths differ by category.
Google's strongest category is data privacy for minors (59); its score for blocking age-inappropriate content (47) trails OpenAI's. Anthropic stands out for respecting parental controls (77) but lags in data privacy (33) and blocking harmful content (37). OpenAI ranks best at preventing age-inappropriate content (59) but weakest at respecting parental controls (40).
For parents choosing a tool, no single provider dominates. Your choice depends on which safety areas matter most to your family.
What These Safety Categories Mean
The benchmark tests four core protections:
- Age-inappropriate content: Can the tool refuse requests for adult material, violence, or other harmful content a child shouldn't see?
- Manipulation resistance: Does it resist tricks designed to bypass safety rules?
- Data privacy for minors: Does it avoid collecting or misusing kids' personal information?
- Parental controls respect: Will it honor restrictions parents set?
Each category carries equal weight in the overall score. A high rating means the tool performed well across all four areas.
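The equal weighting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the overall score is a plain arithmetic mean of the four category scores; the benchmark's exact formula and rounding are not stated here, and the key names and the `overall_score` helper are hypothetical.

```python
from statistics import mean

# The four categories named in the benchmark description
# (key names are illustrative, not the benchmark's own identifiers).
CATEGORIES = (
    "age_inappropriate_content",
    "manipulation_resistance",
    "data_privacy_for_minors",
    "parental_controls_respect",
)


def overall_score(category_scores):
    """Equal-weight average of the four category scores (assumed aggregation)."""
    missing = [c for c in CATEGORIES if c not in category_scores]
    if missing:
        raise ValueError(f"missing category scores: {missing}")
    return mean(category_scores[c] for c in CATEGORIES)


# Gemini 3 Flash scored 100 in every category, so its average is also 100.
gemini_3_flash = {c: 100 for c in CATEGORIES}
print(overall_score(gemini_3_flash))
```

Under this assumption, a single weak category pulls the overall score down by a quarter of the shortfall, so a model cannot reach a high rating while failing one area outright.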
Recent Volatility: What Parents Should Know
Scores have swung dramatically in the last 30 days. The gap between the highest and lowest score is 87 points, and the standard deviation is 26.2, which indicates that scores are widely dispersed across models. Several models, including Claude Haiku 4.5 and Gemini 3.1 Pro, have lost 50 or more points recently.
This volatility likely reflects rapid model updates and new test cases. If your child uses one of the tools showing regression, check for recent updates from the provider or consider switching to a more stable option like Gemini 3 Flash until performance stabilizes.
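For readers who want to check the spread figures themselves, here is a small sketch of the arithmetic. It uses only the four per-model scores quoted in this report, so the standard deviation it prints will not match the 26.2 reported for all ten models. Whether the benchmark uses a population or sample standard deviation is not stated; the population form is assumed here, and the `spread` helper is hypothetical.

```python
from statistics import pstdev

# Only the per-model scores quoted in this report. The other six models'
# scores are not listed, so this is a partial sample, not the full benchmark.
scores = {
    "Gemini 3 Flash": 100,
    "Gemini 3.1 Pro": 39,
    "GPT-5.4 mini": 31,
    "Gemini 2.5 Pro": 13,
}


def spread(values):
    """Gap between the highest and lowest score."""
    values = list(values)
    return max(values) - min(values)


print("spread:", spread(scores.values()))
# Population standard deviation of the four quoted scores (assumed form).
print("std dev (4 quoted models):", round(pstdev(scores.values()), 1))
```

The spread matches the 87-point gap cited above because the highest and lowest scoring models are both among the four quoted.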