A Question I Couldn't Stop Asking
A few months after I finished my fieldwork in Bodh Gaya — after all the hours spent watching women queue at hand-pumps before sunrise, after writing about who gets piped water and who doesn't — I kept circling back to a smaller, quieter question. Getting water to your house is one battle. But once the water is there, sitting in a steel pot or a plastic drum in the corner of the kitchen, does anyone actually do anything to make it safer to drink?
It's a strange gap in how we usually talk about water poverty. We talk endlessly about access — pipes, taps, distance, scarcity — and comparatively little about what happens in the thirty seconds before a glass of water reaches someone's mouth. Do they boil it? Strain it through an old sari, the way their mother and grandmother did? Run it through a filter bought on EMI? Or do they just drink it as it comes, straight from the source, and hope?
That question sent me, somewhat unexpectedly, away from ethnographic fieldwork and into a dataset of 636,699 households. This post is about what I found, and — because a few of you have asked — about how my co-author Aniket Kumar and I actually built the analysis from scratch using Python.
This work has since been published as "Patterns and predictors of household water treatment in India" in Cleaner Water, but I wanted to write the human version here — the version with the "why," the dead ends, and the numbers that genuinely surprised me.
Why Household Water Treatment Matters More Than It Sounds
Let's start with why this is not a niche technical question. Globally, roughly 2.2 billion people still lack safely managed drinking water. Water-borne diseases like cholera, typhoid, hepatitis A and diarrhoea continue to kill people — diarrhoea alone is responsible for over 13% of all deaths of children under five in India, according to NFHS-5 data. Between 2011 and 2020, India recorded 565 cholera outbreaks. Typhoid cases in 2021 alone were estimated at around 10 million.
Most of these illnesses are entirely preventable. And household water treatment (HWT) — sometimes called point-of-use treatment — is one of the cheapest, most scalable tools we have to prevent them. You don't need new pipelines or new boreholes. You need a working stove, a clean cloth, a filter, or a few drops of chlorine, and the knowledge and motivation to use them consistently.
The trouble is that almost no one had looked at this systematically across India. We have plenty of scattered, small-scale studies — a district here, a state there — but nothing at the national scale that told us: how many households actually treat their water, what methods do they use, and who is being left out?
That gap is what we set out to fill.
Where the Numbers Came From
We used the National Family Health Survey, Round 5 (NFHS-5), conducted between 2019 and 2021 under India's Ministry of Health and Family Welfare, with technical support from the DHS Program, ICF (USA), and USAID. It's the closest thing India has to a comprehensive household health census — 636,699 households, surveyed across 36 states and union territories, using a two-stage stratified sampling design.
I want to be honest about something here: working with a dataset this size is not glamorous. It's not fieldwork. There's no woman telling you "Monday is the day the hand-pump breaks." It's a .dta file — a Stata data file — sitting on a hard drive, with thousands of cryptically named variables like hv237, hv204, sh47. Most of the actual work of "data analysis" is just patiently figuring out what each of those codes means and making sure you're not accidentally lying with your own numbers.
Building the Pipeline
We did everything in Python. Here's roughly how the pipeline came together, in the order we actually built it:
1. Loading and trimming the data. NFHS-5's full household file has hundreds of variables, most of which we didn't need. Using pyreadstat, we loaded only the columns relevant to water and sanitation, household demographics, and the survey's design variables (the primary sampling units, strata, and the all-important household weight variable, hv005). Even trimmed down, that still meant pulling in the core water-treatment variable (hv237 and its sub-codes for different treatment methods), wealth, religion, caste, and house type, plus member-level rosters to identify the household head's age, sex, and education.
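A minimal sketch of the trimming step, under stated assumptions. pyreadstat.read_dta accepts a usecols list for exactly this purpose; the sketch below swaps in pandas.read_stata with its columns argument so it runs with pandas alone. The column names are assumed to follow the standard DHS household-recode scheme (hv021 PSU, hv022 stratum, hv005 weight, hv201 water source, hv204 time to source, hv237 treatment) and should be checked against the NFHS-5 recode manual. They are not the study's full column list.

```python
import pandas as pd

# Assumed standard DHS household-recode (HR) names; verify against the
# NFHS-5 recode manual. This is an illustrative subset, not the study's list.
DESIGN_VARS = ("hv021", "hv022", "hv005")  # PSU, stratum, household weight
WATER_VARS = ("hv201", "hv204", "hv237")   # water source, time to source, treatment


def load_trimmed(path, columns=DESIGN_VARS + WATER_VARS):
    """Read only the requested columns from a Stata .dta household file.

    Similar in spirit to pyreadstat.read_dta(path, usecols=[...]); pandas is
    used here to avoid an extra dependency. Value labels are not applied, so
    later recoding works on the survey's raw numeric codes.
    """
    return pd.read_stata(path, columns=list(columns), convert_categoricals=False)
```

Keeping the column list in one named constant also makes it easy to see, and audit, exactly which variables enter the analysis.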
2. Applying survey weights. This is the step that's easy to skip and absolutely should not be. NFHS-5, like most national surveys, doesn't sample everyone with equal probability — some states and strata are deliberately oversampled to get statistically reliable estimates for smaller populations. If you just count raw rows in the data, you get a distorted picture of the country. The hv005 weight variable (provided with six implied decimal places, so we divided by 1,000,000) corrects for that. Every single percentage you'll read in this post — 41.7% national treatment rate, 95% in Nagaland, under 10% in Bihar — is a weighted estimate, built to reflect the actual population of India, not just the structure of our sample.
We did keep both unweighted and weighted estimates side by side for a while, mostly as a sanity check. They differ by a few percentage points, which on its own tells you something about how survey design shapes what conclusions you draw.
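The weighting step can be sketched as follows, assuming a pandas DataFrame with a 0/1 outcome column (the name treats is hypothetical) alongside the raw hv005 column. It produces weighted point estimates only; standard errors that account for the PSUs and strata need a design-based method, which this sketch does not attempt.

```python
import pandas as pd


def weighted_pct(df, outcome="treats", weight_col="hv005", by=None):
    """Survey-weighted percentage of households with outcome == 1.

    hv005 carries six implied decimal places, so it is divided by 1,000,000.
    Rows with a missing outcome are excluded from numerator and denominator.
    If `by` names a column, one percentage per group is returned.
    """
    cols = [outcome, weight_col] + ([by] if by else [])
    d = df[cols].dropna(subset=[outcome])
    d = d.assign(w=d[weight_col] / 1_000_000)
    d = d.assign(yw=d[outcome] * d["w"])
    if by is None:
        return 100 * d["yw"].sum() / d["w"].sum()
    sums = d.groupby(by)[["yw", "w"]].sum()
    return 100 * sums["yw"] / sums["w"]
```

On a toy frame with outcomes [1, 0, 1, 0] and weights of 3, 1, 1 and 1 million, the weighted rate is about 66.7% while the plain row share is 50%, the kind of gap a side-by-side check surfaces. Dividing by 1,000,000 leaves percentages unchanged, since the constant cancels in the ratio; it only matters when reporting weighted counts.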
3. Defining the outcome. The key variable, hv237, asks: "Is anything done to the water to make it safe to drink?" It's a simple yes/no, but it comes bundled with fourteen sub-variables describing what is done: boiling, adding bleach or chlorine, straining through cloth, using a water filter, solar disinfection, letting it stand and settle, using alum, electronic purification, and a couple of "other" and "don't know" categories. We dropped the country-specific codes that were empty for India.
This is also where we made an analytical decision that I think matters a lot: we split methods into "effective" (boiling, bleach/chlorine, water filters, solar disinfection, electronic purifiers) and "less effective" (straining through cloth, letting water settle, using alum, and unspecified "other" methods). This isn't us being snobbish about cloth filtering — sari-cloth filtration has real, peer-reviewed evidence behind it for reducing cholera transmission in places like Bangladesh. But as a general method, it doesn't remove the range of contaminants that boiling or proper filtration does, and lumping it in with boiling as "treatment" would have overstated how protected Indian households actually are.
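One way to encode the outcome and the effective/less-effective split, as a sketch. It assumes hv237 uses the usual DHS coding (0 = no, 1 = yes, 8 = don't know, treated here as missing) and that the method sub-codes have already been renamed to the readable, hypothetical column names below; the study's exact recoding is in its supplementary file, not reproduced here. A household counts as "effective" if it uses at least one effective method, even alongside a less effective one.

```python
import numpy as np
import pandas as pd

# Hypothetical readable names for the hv237 method sub-codes.
EFFECTIVE = ["boil", "chlorine", "filter", "solar", "electronic"]
LESS_EFFECTIVE = ["strain_cloth", "settle", "alum", "other"]


def treats_water(hv237):
    """Binary outcome: 1 = yes, 0 = no; any other code (e.g. 8) becomes NaN."""
    return pd.Series(hv237).map({0: 0, 1: 1})


def classify_treatment(df):
    """Label each household 'effective', 'less_effective_only' or 'none'."""
    uses_effective = df[EFFECTIVE].eq(1).any(axis=1)
    uses_less = df[LESS_EFFECTIVE].eq(1).any(axis=1)
    # np.select takes the first matching condition, so effective wins ties.
    labels = np.select(
        [uses_effective, uses_less],
        ["effective", "less_effective_only"],
        default="none",
    )
    return pd.Series(labels, index=df.index)
```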
4. Choosing the predictors. We settled on thirteen predictor variables, including gender of the household head, place of residence (rural/urban), education of the household head, caste category, religion, wealth quintile, house type (kutcha/semi-pucca/pucca), household structure (nuclear/non-nuclear), water source type, time taken to reach the water source, who in the household fetches water, and toilet facility type. Several of these required collapsing dozens of granular survey response codes into a handful of interpretable categories. We did this carefully, documenting every regrouping decision in a supplementary file so the choices are auditable rather than buried.
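Collapsing granular codes is mostly a lookup table. The mapping below is illustrative only: it assumes the standard DHS hv201 source-of-water codes and uses a made-up grouping for demonstration, not the regrouping documented in the study's supplementary file. Keeping the dictionary in one place is what makes each decision easy to audit.

```python
import pandas as pd

# Illustrative grouping of hv201 (main source of drinking water) codes.
# Codes assume the standard DHS scheme; the grouping itself is hypothetical.
SOURCE_GROUPS = {
    11: "piped", 12: "piped", 13: "piped", 14: "piped",
    21: "tube well/borehole",
    31: "protected well/spring", 41: "protected well/spring",
    32: "unprotected well/spring", 42: "unprotected well/spring",
    43: "surface water",
    51: "rainwater",
    61: "tanker/cart", 62: "tanker/cart",
    71: "bottled water",
}


def collapse_codes(codes, mapping=SOURCE_GROUPS, fallback="other"):
    """Map raw survey codes to analysis categories; unmapped codes get `fallback`."""
    return pd.Series(codes).map(mapping).fillna(fallback)
```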
5. The statistics. Once the data was clean, we ran three layers of analysis:
- Descriptive statistics — simple weighted percentages, state by state, method by method. This is what produces a figure like Bihar's water treatment rate sitting below 10% while Kerala and Nagaland sit above 95%.
- Chi-square tests and Cramer's V — to check whether each predictor variable was statistically associated with water treatment, and more importantly, how strong that association was. Chi-square alone tells you "yes, there's a relationship," but with a sample of over 600,000 households, almost everything will be statistically significant whether or not it's meaningfully important. Cramer's V gives you an effect size you can actually compare across variables.
- Multivariate logistic regression: this is the part that lets us say something like "a household headed by someone with higher education has 1.83 times the odds of treating its water, holding everything else constant." Using scikit-learn and statsmodels, we built a logistic regression model predicting the binary outcome (treats water: yes/no) from all thirteen predictors simultaneously, computing adjusted odds ratios (AOR) with 95% confidence intervals. We validated the model with 5-fold cross-validation, checking the area under the ROC curve (AUC), which came out to 0.753: acceptable discrimination by common rules of thumb, and reasonable for social survey data.
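The two inferential layers above can be sketched in a few functions, with assumptions stated plainly. Cramér's V is computed from the chi-square statistic as sqrt(chi2 / (n * (min(r, c) - 1))). Optional weights are rescaled to average 1 so the table total equals the sample size, which is a common shortcut and not a design-based (Rao-Scott) test. The regression uses statsmodels for adjusted odds ratios and scikit-learn for 5-fold cross-validated AUC, both unweighted here; this is not the study's exact code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cramers_v(x, y, weights=None):
    """Chi-square test of independence plus Cramer's V for two categorical Series."""
    if weights is None:
        table = pd.crosstab(x, y)
    else:
        w = weights / weights.mean()  # rescale so the weighted total equals n
        table = pd.crosstab(x, y, values=w, aggfunc="sum").fillna(0)
    counts = table.to_numpy(dtype=float)
    chi2, p, _, _ = chi2_contingency(counts, correction=False)
    k = min(counts.shape) - 1
    return chi2, p, np.sqrt(chi2 / (counts.sum() * k))


def adjusted_odds_ratios(df, formula):
    """Fit a logistic regression; return AORs with 95% CIs (exponentiated log-odds)."""
    fit = smf.logit(formula, data=df).fit(disp=0)
    ci = np.exp(fit.conf_int(alpha=0.05))
    return pd.DataFrame({
        "AOR": np.exp(fit.params),
        "CI_low": ci[0],
        "CI_high": ci[1],
        "p_value": fit.pvalues,
    })


def cv_auc(df, outcome, predictors, folds=5, seed=0):
    """Mean AUC over stratified k-fold cross-validation, predictors one-hot encoded."""
    X = pd.get_dummies(df[predictors].astype("category"), drop_first=True, dtype=float)
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = cross_val_score(
        LogisticRegression(max_iter=1000), X, df[outcome], cv=cv, scoring="roc_auc"
    )
    return scores.mean()
```

A formula such as "treats ~ C(wealth) + C(urban)" (hypothetical column names) yields one AOR per non-reference category. Exponentiating the coefficients and their interval bounds is what produces the AOR and 95% CI format; an AOR of 1.83 means 1.83 times the odds, not 1.83 times the probability.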
We also generated a fairly elaborate set of outputs at the end: an Excel workbook with over twenty sheets of breakdowns, a PDF summary report, diagnostic plots checking the regression's residuals, and a heatmap visualising the Cramer's V associations. Most of that scaffolding exists so that two researchers working from different cities (I was in Tokyo, and each of us was working independently on parts of the pipeline) could check each other's numbers without constantly re-running the whole script.
None of this is exotic statistics. What makes it useful is care — getting the weights right, being honest about what "effective" treatment means, and resisting the temptation to oversimplify a genuinely messy reality into a single clean number.
So, What Did We Actually Find?
The Headline Number, and Why It's Misleading on Its Own
41.7% of Indian households treat their water before drinking it. Say that number out loud and it sounds almost like a coin flip — like the country is roughly split down the middle. It isn't. That 41.7% is hiding a much sharper divide underneath it.
Split by location: 56.5% of urban households treat their water, compared to just 34.3% of rural households. Split by wealth: 67.2% of the richest quintile treat their water, against 22.6% of the poorest. Split geographically, the gap becomes almost absurd — over 95% of households in Nagaland and Kerala treat their water, while in Bihar it's under 10%.
And even among the households that do treat their water, not all treatment is equal. Of the 41.7% who treat, 67.8% use an effective method (boiling, filtering, chlorinating, solar disinfection, or electronic purification) while the remaining 32.2% rely solely on less effective methods like straining through cloth or letting water settle. So the "real," effective treatment rate for the country — the share of all households using a method that genuinely reduces health risk — is closer to 28.3%.
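The 28.3% figure is a product of two shares, which is easy to check. A small sketch (the function name is hypothetical):

```python
def effective_treatment_rate(treat_rate, effective_share):
    """Share of all households using an effective method.

    treat_rate: share of households that treat water at all (0-1).
    effective_share: share of treating households using an effective method (0-1).
    """
    return treat_rate * effective_share


rate = effective_treatment_rate(0.417, 0.678)
print(f"{100 * rate:.1f}%")  # 28.3%
```

The same arithmetic puts the households that treat only with less effective methods at 0.417 × 0.322, about 13.4% of all households.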
What People Actually Do
Among households that treat their water, boiling is the single most common method, at 38.3%, followed closely by straining through cloth (35.6%), then water filters (16.7%), chlorine bleaching (8.1%), and electronic purifiers (3.3%). About 78% of households that treat water use just one method; 19.2% combine two, and a small 1.3% layer in three.
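Method shares like these are shares of treating households (they cannot be shares of all households, since the five listed already sum to about 102%), and because roughly a fifth of treaters combine methods, the shares can exceed 100% in total. A sketch of that tabulation, assuming hypothetical 0/1 columns per method, a 0/1 treats column, and a weight column w already divided by 1,000,000:

```python
import pandas as pd

# Hypothetical 0/1 indicator columns, one per treatment method.
METHODS = ["boil", "strain_cloth", "filter", "chlorine", "electronic",
           "solar", "settle", "alum", "other"]


def method_profile(df, methods=METHODS, weight_col="w"):
    """Weighted method shares and number-of-methods distribution among treaters.

    Returns (shares, n_methods_dist), both in percent. Shares can sum to more
    than 100 because a household may use several methods.
    """
    treaters = df[df["treats"] == 1]
    w = treaters[weight_col]
    total = w.sum()
    shares = treaters[methods].mul(w, axis=0).sum() / total * 100
    n_methods = treaters[methods].sum(axis=1)
    n_methods_dist = w.groupby(n_methods).sum() / total * 100
    return shares, n_methods_dist
```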
But the geography of method choice tells its own story. Boiling dominates in Kerala, Nagaland, and Sikkim — states with strong public health messaging and high literacy. Straining through cloth is the default across the central and western belt — Madhya Pradesh (78%), Rajasthan (71.3%), and neighbouring states. The Northeast has its own distinct tradition: Tripura, Mizoram, and Assam rely heavily on indigenous clay-based ceramic water filters, a locally manufactured technology that doesn't show up much anywhere else in the country. And electronic purifiers — the reverse osmosis units you see advertised everywhere — cluster almost entirely in wealthy, urbanised pockets like Chandigarh (62.1%) and Delhi (55%).
That last one comes with an irony I didn't expect to find. Households in cities like Delhi and Chandigarh are often already receiving water that's been treated at a municipal level, and they're still running it through an RO purifier at home, largely out of distrust in water quality. The problem is that residential RO units are remarkably wasteful: typical recovery rates sit between 10% and 25%, meaning that for every litre of purified water that comes out, roughly three to nine litres are discarded. In a country facing a serious water scarcity trajectory, the wealthiest households' anxiety about water purity is quietly making the scarcity problem worse for everyone else. India's National Green Tribunal has actually pushed for regulating RO use based on water-quality indicators like total dissolved solids, precisely because of this waste.
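The conversion from a recovery rate to discarded water is simple arithmetic: if a fraction r of the feed water comes out as purified water, each litre of purified water needs 1/r litres of feed, so (1 - r)/r litres go down the drain. A small sketch (the function name is hypothetical):

```python
def reject_per_litre(recovery):
    """Litres of reject water discarded per litre of purified water.

    recovery: fraction of feed water that comes out purified, in (0, 1].
    """
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in (0, 1]")
    return (1 - recovery) / recovery


for r in (0.25, 0.10):
    print(f"recovery {r:.0%}: {reject_per_litre(r):.1f} L discarded per litre")
```

At 25% recovery this gives 3 litres discarded per litre purified; at 10% it gives 9.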
Who Treats, and Who Doesn't
This is where the inequality lens — the same one I've used to look at piped water access in Bodh Gaya — becomes unavoidable.
Wealth is the single sharpest divider. Richest-quintile households have over four times the odds of treating their water compared with the poorest, even after controlling for every other factor in our model (AOR = 4.399). This isn't surprising on its face: treatment requires fuel for boiling, money for filters, or time that poorer households, often working multiple jobs, don't have. But seeing the strength of that gradient quantified, holding everything else equal, is still sobering.
Education follows a similarly steep gradient. Among households headed by someone with higher education, 61.5% treat their water; among those whose head has no education, only 28.4% do. This tracks closely with literacy: states with very high literacy (Kerala, Nagaland, Sikkim, Mizoram) sit at the top of the treatment table, while Bihar, with the country's lowest literacy rate, sits at the bottom.
Religion produced one of the more striking patterns in the data. Christian households had the highest treatment rate at 65.2%, compared to 41.5% for Hindu households and just 34.4% for Muslim households, a gap that persists in the adjusted model (Christian households have 2.85 times the odds of treating water compared with Muslim households, holding other factors constant). Part of this is explained by education: over 60% of Christian household heads in the sample have secondary or higher education, while roughly 35% of Muslim household heads have no education at all. But education alone doesn't fully explain the gap, which suggests that community norms, possibly tied to specific regional and institutional contexts, are also at play.
Caste produced the finding that genuinely surprised me the most. I went into this analysis expecting Scheduled Tribe households, among India's most socio-economically and educationally disadvantaged groups, to show the lowest treatment rates. Instead, their treatment rate (48.98%) was comparable to, even slightly higher than, that of General/upper-caste households (46.73%), and meaningfully higher than that of Scheduled Caste households, who treated the least of any group at just 32.6%.
The explanation, once you dig into it, makes sense: a significant share of India's Scheduled Tribe population is concentrated in the Northeastern states, which also have a large Christian population and a long-standing tradition of indigenous water treatment — those ceramic filters I mentioned in Tripura and Assam, for instance. So "Scheduled Tribe" as a national category is masking very different regional realities; tribal households in Tamil Nadu or Madhya Pradesh, the literature suggests, fare far worse on both water quality and treatment access than their Northeastern counterparts. Scheduled Caste households, meanwhile, treat water the least of any caste group, which lines up uncomfortably well with everything I documented in Bodh Gaya about caste-based exclusion from water infrastructure — the same communities pushed to the margins of the pipe network are also the least able to compensate for that exclusion at the point of consumption.
Gender showed a smaller but still real effect. Male-headed households had a higher treatment rate (42.1%) than female-headed households (27.6%), though the gap was modest compared with those for wealth or education. More tellingly, the data confirmed something I already suspected from fieldwork: in households where someone fetches water at all, women do the fetching in roughly 72% of cases, and men in only a small fraction. Having a woman fetch the water seemed to slightly increase the odds of treatment compared with households where no one fetches water, possibly because women, who manage the household's water use most directly, are also the ones who end up managing its safety.
Sanitation infrastructure mattered more than I expected going in. Households with flush toilets, and even pit latrines, were more likely to treat their water than households with no sanitation facility at all. This isn't a coincidence — house type, toilet access, and water treatment all tend to travel together because they're all downstream of the same underlying socio-economic position. A household that can afford a pucca house and a flush toilet can usually also afford a water filter or the fuel to boil water regularly.
The Bigger Picture: Water Source Matters Most of All
Across every test we ran, one variable consistently came out on top in terms of association strength (measured using Cramer's V): the water source itself. Households drawing from surface water or unimproved sources treated their water at noticeably high rates (plausibly because they know the source is unsafe), while households relying on "improved" groundwater sources, such as borewells and protected wells, treated their water the least. That is possibly because they assume "improved" means "safe," when groundwater across huge swathes of India is contaminated with arsenic, fluoride, nitrate, or excess salinity regardless of how the source is classified.
This, to me, is the most policy-relevant finding in the whole study. We tend to talk about "improved water access" as the finish line. But improved infrastructure and improved water quality are not the same thing, and the data suggests many households are being lulled into a false sense of safety by infrastructure labels that don't reflect what's actually coming out of the tap.
Where This Leaves Us
None of the findings here are technically complicated to act on. Promote boiling and filtration alongside, not instead of, infrastructure expansion. Regulate RO purifier waste so that urban water anxiety doesn't deepen rural scarcity. Target the states that need it most — Bihar, Uttar Pradesh, West Bengal — by folding household water treatment messaging directly into existing programs like the Jal Jeevan Mission and Swachh Bharat Mission, rather than treating it as a separate initiative competing for the same limited attention. None of this requires new technology. India already has cheap, locally manufactured solutions — the IIT Bombay arsenic filter, built from local materials and assembled by local masons, currently serves a few hundred families per unit at minimal cost, and similar low-tech, high-impact tools exist for other contaminants.
What's missing isn't technical capacity. It's political will, and an honest accounting of who is being left to fend for themselves — a phrase that, I realise, keeps showing up in everything I write about water in India, whether I'm looking at one town through fieldwork or the whole country through a regression table.
This study has real limitations, and I want to be upfront about them. NFHS data is cross-sectional, so we can describe patterns but not prove causation. It tells us whether a household treats water, not whether that treatment actually worked — we have no direct water quality testing tied to these households. And it can't capture the psychological and behavioural reasons why someone who knows their water might be unsafe still doesn't treat it — habit, fatigue, distrust, or simply not having one more thing to manage in an already overstretched day. Those are the kinds of questions that, I suspect, only fieldwork like my Bodh Gaya work can really answer.
But even with those limits, I think the exercise was worth it. Sometimes you need the zoomed-out view — 636,699 households, weighted, modelled, cross-validated — to see that the story you found in one small pilgrimage town in Bihar isn't a local anomaly. It's a fractal. The same lines of caste, wealth, religion, and gender that decide who gets a pipe to their house also decide, quietly, who treats the water once it arrives.
I'm continuing to dig into this as part of my broader PhD work on water inequality in India, and I'll be writing more as that develops — including, I hope, a closer look at why some Scheduled Tribe communities outside the Northeast are facing a very different, much harsher water reality than the national averages suggest.
Further Reading
- Patterns and predictors of household water treatment in India — Cleaner Water, 2025
- Water-caste-gender-tourism nexus in Bodh Gaya, India — Sustainability Nexus Forum, 2025
- Burdens of household water collection from a gender perspective: a NFHS-5 study — in Population, Sanitation and Health, Springer, 2023