Usiru: Crowd Sourced Air Pollution reduction

Nov 3, 2019

[ Usiru is a Kannada word for breath. This blog relates to a proposed large-scale project in Namma Bengaluru under the Smart City Project. See the original blog and the beta website for details, and a short TEDx-style 15-minute video talk at the MINT Digital Innovation Summit 2019.

This blog focuses on India, especially Delhi and Bengaluru, but the issues and approach are global.]

Abstract Data Rich Smart Policy

Air Quality (AQ) is not a stable or slowly changing phenomenon. It may surprise many that an AQ parameter such as PM 2.5, at a level of around 100 µg/m³, can vary by 100% or more at the same place within 24 hours and by 600–800% across seasons.¹ Quantifying such a volatile phenomenon requires a rigorous statistical basis.

We need to establish the data and the method to show the costs and benefits of interventions. This can provide hard evidence for coercive methods or for out-of-the-box innovative solutions. In this post we outline a way to move away from DataPoor, seat-of-the-pants decision making.

Air Quality Index (AQI)

There are six AQI categories: Good, Satisfactory, Moderately polluted, Poor, Very Poor, and Severe. Each category is defined by ambient concentration values of air pollutants and their likely health impacts (known as health breakpoints). AQ sub-indices and health breakpoints have been developed for eight pollutants (PM10, PM2.5, NO2, SO2, CO, O3, NH3, and Pb) for which short-term (up to 24-hour) National Ambient Air Quality Standards are prescribed.
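As a rough illustration of how a sub-index follows from health breakpoints, here is a minimal Python sketch that interpolates linearly within each PM2.5 band. The breakpoint values are an assumption (the figures commonly quoted for India's national index) and should be checked against the CPCB document. The function names are made up for this post, and the NO2 value in the usage line is purely illustrative.

```python
from typing import Dict, Optional, Tuple

# (conc_low, conc_high, index_low, index_high, category)
# ASSUMPTION: 24-hour PM2.5 breakpoints in µg/m³ as commonly quoted for
# India's national AQI. Verify against the CPCB document before real use.
PM25_BREAKPOINTS = [
    (0.0, 30.0, 0, 50, "Good"),
    (30.0, 60.0, 50, 100, "Satisfactory"),
    (60.0, 90.0, 100, 200, "Moderately polluted"),
    (90.0, 120.0, 200, 300, "Poor"),
    (120.0, 250.0, 300, 400, "Very Poor"),
]


def pm25_sub_index(conc: float) -> Tuple[Optional[int], str]:
    """Linearly interpolate the sub-index inside the band containing conc."""
    if conc < 0:
        raise ValueError("concentration cannot be negative")
    for c_lo, c_hi, i_lo, i_hi, category in PM25_BREAKPOINTS:
        if conc <= c_hi:
            sub = i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)
            return round(sub), category
    # Above the last band: this sketch only labels the category and does
    # not guess an index value.
    return None, "Severe"


def overall_aqi(sub_indices: Dict[str, int]) -> Tuple[str, int]:
    """Report the worst (highest) pollutant sub-index as the AQI."""
    pollutant = max(sub_indices, key=sub_indices.get)
    return pollutant, sub_indices[pollutant]


print(pm25_sub_index(100.0))                    # (233, 'Poor')
print(overall_aqi({"PM2.5": 233, "NO2": 80}))   # ('PM2.5', 233)
```

With these assumed breakpoints, the 100 µg/m³ level mentioned in the abstract lands in the Poor band.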

Air quality data can be analysed in two fundamental ways:

By the frequency and length of time that a certain concentration is exceeded; this requires periodic sampling of the concentration at short time intervals. New mobile devices can sample at up to 1 Hz (once per second), but one reading per minute is more commonly reported.
By concentration averaged over a specified time interval. Modern mobile devices can report one-minute averages, but hourly averages are more common in reported statistics. National measures use 2, 8 or 24-hour averages. The National Ambient Air Quality guidelines prescribe the methods and techniques.
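The two ways of analysing the data can be sketched in a few lines of Python. This is only an illustration: the function names, the threshold of 60 and the synthetic minute-level readings are assumptions made for the example, not part of any official method.

```python
from datetime import datetime, timedelta
from statistics import mean


def exceedance_stats(readings, threshold):
    """readings: time-ordered (timestamp, value) pairs at a fixed interval.

    Returns (fraction of samples above threshold,
             longest consecutive run above threshold, in samples).
    """
    above = [value > threshold for _, value in readings]
    longest = run = 0
    for flag in above:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return sum(above) / len(above), longest


def block_averages(readings, window):
    """Average the values over consecutive windows, e.g. timedelta(hours=1)."""
    start = readings[0][0]
    buckets = {}
    for ts, value in readings:
        index = (ts - start) // window  # which window this sample falls in
        buckets.setdefault(index, []).append(value)
    return [mean(values) for _, values in sorted(buckets.items())]


# Synthetic data: one reading per minute, a clean hour then a polluted hour.
start = datetime(2019, 11, 3, 0, 0)
readings = [
    (start + timedelta(minutes=i), 40.0 if i < 60 else 80.0)
    for i in range(120)
]

fraction, longest_run = exceedance_stats(readings, threshold=60.0)
print(fraction, longest_run)                           # 0.5 60
print(block_averages(readings, timedelta(hours=1)))    # [40.0, 80.0]
```

Multiplying the longest run by the sampling interval gives the duration of the worst episode, here 60 minutes.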

For our purpose we will focus on PM 2.5 and NOx.

Please peruse the excellent website at http://www.urbanemissions.info/ . It has theoretical models apportioning pollution to locations as well as to sources (see NCR). SAFAR (System of Air Quality and Weather Forecasting And Research), from the Ministry of Earth Sciences (MoES), Government of India, provides location-specific estimates of air quality.

AQI is computed differently by the Central Pollution Control Board’s National Air Quality Index, by the Ministry of Earth Sciences’ SAFAR (System of Air Quality and Weather Forecasting And Research), and by the World Air Quality Index Project. The Central Pollution Control Board uses 24-hour average data, while SAFAR reports a real-time figure on its website and mobile app. The World Air Quality Index project uses the scale developed by the United States Environmental Protection Agency, even though it says that India’s national index, with its higher readings, is “more adapted to Asian dust”.

Factors affecting Air Quality

The AQ Index (AQI) is meant to communicate the state of the city and encourage authorities to take action, such as asking some emitters (construction) to reduce activity or declaring holidays (for schools). Not all emitters are equivalent. Diesel cars are worse on NOx, but power plants (there are 5 within NCR, and NTPC Badarpur is a particularly bad emitter) are worst on particulate matter. A report postulates that crop burning by farmers in the region also contributes significantly.

The AQI apportionment reports are based on models, and most have never been validated with large-scale on-the-ground data. So expect significant variation in the numbers for which factor contributes what percentage. This is one of the major reasons why, in Usiru, we argue for a sound baseline database from 4,000-plus PM sensors reporting every 15 minutes.

AQ depends on meteorological activity and on the concentration and types of emitters. North India suffers fog during winter due to natural meteorological phenomena. There are secular factors like

growth in population
construction activity
car population

that need to be considered in making an assessment of cause and effect and forecast with any degree of certainty.

Delhi Actions

Most discussions tend to pick one factor as the driver and act on that factor. In Delhi the following have been identified:

BREATHING CLEANER AIR Ten Scalable Solutions for Indian Cities

🚗 Diesel cars being sanctioned by the Supreme Court
✨ Diwali celebrations being curbed by the Supreme Court
🔥 Stubble burning in nearby states [ The current October 2019 favourite ]
🍂 Leaf burning, especially in winter, in parks
🚧 Construction all around
🚛 Garbage within the city
🚬 Smoke from all cooking sources
🏭 Thermal power stations nearby
Industrial effluents from small-scale industries

Cities like Delhi are supposed to have a Graded Response Action Plan (GRAP) for short-term emergencies.

Multi Factor drivers of Air Pollution

There are voices pointing out the mistake of jumping to conclusions and taking punitive actions that do not make much difference. Delhi should consider a ventilation index too.

There is a misplaced (or mischievous) belief that as pollution rises only in the winters, the cause is stubble burning....
The sources of pollution remain constant through the year — it is cleaner because through the year, winds disperse pollutants and there is circulation in the atmosphere (defined as the ventilation index). So, sources do not disappear, but pollution is not in our face........ The high smog episodes happen even in December and January when there is no crop burning, but just local pollution and adverse weather. It is important that we recognise this because otherwise the entire attention is diverted to external sources. It may be good politics to shift the blame to other states. But it is certainly not a good pollution management strategy.
 ---    Sunita Narain Down2earth

Times of India 5 Nov 2019 Why capital is a gas chamber

We should consider the Ventilation Index. Low-level winds disperse smoke from burning leaves or stubble. Here is an extract from British Columbia:

The burning of woody debris outdoors is only permitted when the forecast Ventilation Index is sufficient to disperse smoke. British Columbia guide
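To make the idea concrete, here is a small sketch using one common definition from fire-weather forecasting: the ventilation index as the product of mixing height and the average wind speed through the mixed layer. British Columbia publishes its own scaled index and categories, which are not reproduced here. The `minimum_vi` threshold and the sample weather values below are hypothetical.

```python
def ventilation_index(mixing_height_m: float, transport_wind_m_s: float) -> float:
    """Mixing height (m) times mean wind speed in the mixed layer (m/s)."""
    return mixing_height_m * transport_wind_m_s


def dispersion_is_adequate(vi: float, minimum_vi: float) -> bool:
    """True when the index meets a locally chosen (hypothetical) minimum."""
    return vi >= minimum_vi


# Hypothetical values: a calm winter morning with a shallow mixed layer
# versus a breezy afternoon with a deep one.
winter_morning = ventilation_index(mixing_height_m=200.0, transport_wind_m_s=1.0)
breezy_afternoon = ventilation_index(mixing_height_m=1500.0, transport_wind_m_s=4.0)

for label, vi in [("winter morning", winter_morning), ("breezy afternoon", breezy_afternoon)]:
    print(label, vi, dispersion_is_adequate(vi, minimum_vi=2000.0))
```

The same emissions produce very different ambient readings under these two conditions, which is the point made in the quote above.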

Multi Armed bandit

Analysis of a volatile measure

Part of the difficulty in reaching firm conclusions (for example on the odd-even scheme in Delhi) is the inherently large variability in measures of ambient air pollution. PM 2.5 is the most variable, with up to 60% variation in a day and 800% across seasons. Breathe London created a network of 100 state-of-the-art sensors and began driving two specially equipped Google Street View cars through the capital to measure air pollution. The graphic shows Breathe London fixed monitors near 3 schools. This shows 55 to 75% variation within 24 hours on a summer day!

There are statistical methods like ANOVA and randomized controlled trials (RCT), as well as current data science approaches like Multi Armed Bandit (MAB) or neural-net-based machine learning, to help reach statistically valid conclusions. We favour MAB, as we need about 6 months of data to make firm tests and it is unlikely that any one factor or intervention alone will reduce pollution by 15% or more.

In marketing terms, a multi-armed bandit solution is a ‘smarter’ or more complex version of A/B testing that uses machine learning algorithms to dynamically allocate traffic to variations that are performing well, while allocating less traffic to variations that are under-performing. In theory, multi-armed bandits should produce faster results since there is no need to wait for a single winning variation.
The term "multi-armed bandit" comes from a hypothetical experiment where a person must choose between multiple actions (i.e. slot machines, the "one-armed bandits"), each with an unknown payout. The goal is to determine the best or most profitable outcome through a series of choices. At the beginning of the experiment, when odds and payouts are unknown, the gambler must determine which machine to pull, in which order and how many times. This is the “multi-armed bandit problem.”
  --What is the Multi-Armed Bandit Problem?
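As a toy illustration of the bandit idea applied to interventions, the sketch below uses an epsilon-greedy rule, one of the simplest MAB strategies and not necessarily the one Usiru would use. The arm names, the "true" percentage reductions and the noise level are invented for the simulation. In a real field experiment each "pull" would be a review period, not an instant draw.

```python
import random


class EpsilonGreedyBandit:
    """Mostly pick the best-looking arm, but explore a fraction of the time."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.mean_reward = {arm: 0.0 for arm in self.arms}

    def choose(self):
        untried = [arm for arm in self.arms if self.pulls[arm] == 0]
        if untried:
            return untried[0]                      # try every arm once first
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)      # explore
        return max(self.arms, key=lambda a: self.mean_reward[a])  # exploit

    def update(self, arm, reward):
        self.pulls[arm] += 1
        # Incremental running mean of the observed reward for this arm.
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]


# HYPOTHETICAL average PM reduction (%) per intervention, for simulation only.
TRUE_REDUCTION = {"hedge": 10.0, "road_dust_control": 5.0, "no_action": 0.0}


def simulate(rounds=3000, noise_sd=5.0, seed=42):
    bandit = EpsilonGreedyBandit(TRUE_REDUCTION, epsilon=0.1, seed=seed)
    noise = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.choose()
        observed = noise.gauss(TRUE_REDUCTION[arm], noise_sd)  # noisy outcome
        bandit.update(arm, observed)
    return bandit


bandit = simulate()
for arm in bandit.arms:
    print(arm, bandit.pulls[arm], round(bandit.mean_reward[arm], 2))
```

Unlike a fixed A/B split, most of the effort drifts towards the intervention that looks best, while the small exploration share keeps checking the others.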

PM Reduction

Traditionally, the methods proposed focus on reducing activities that create pollution, especially emissions. They assume a large area like a city or state and mandatory, penal enforcement.

They are typically restrictions on the source of the polluting activity, intended to be punitive and mandatory but rarely enforced. The China model of coercive intervention during the Beijing Olympics 2008 may be a reference. Our Data Quality Intelligence Platform will capture this data as well. Our focus is bottom-up, resident-led, crowd-sourced action.

We have a small set of alternatives, and more can be discovered, that allow residents or smaller areas to pursue other methods. There is very little data available on how much they help. Some studies indicate that a 6-foot-high dense shrub hedge may reduce PM by 8–12%.

Experiment Design

Our plan is to find a few dozen locations varying in size from a bungalow, school, gated community, shopping complex or factory estate to a city ward, and have the local community and civic activists as well as CSR patrons plan a series of “interventions” or experiments to explore reducing² PM 2.5 by 30%.

We assume the municipal authorities and state government will in parallel do some interventions.

The interventions will be drawn from a large set of possibilities, from pollution emitter control to reduction by other methods like dense urban forest, active air fresheners and others. They can apply at several scales:

Local at site
Area containing the location (city ward)
Neighbourhood areas (wards)
City as a whole (best measured across many granular wards, not as averages from a mere 6–12 locations)

Our assumption is that we need to record multiple variables. We will measure PM, vibration or noise etc. every 15 minutes from multiple sensors at the location of the experiment, in the neighbourhood and in the city as a whole, and take photographs every hour. We will also record wind, sunshine and precipitation data.

We aim to overlay the sensor data with event identifiers such as holidays, storms, and municipal activity like road dusting, strikes etc.

We aim to use a machine learning model to correlate data across all locations and small time segments. We expect some measures may need smoothing, and some may need to be correlated with a lag or lead relative to others (wind and sunshine will affect PM over time).
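A simple first step before any machine learning model is to check at which lag one series best explains another. The sketch below computes a plain Pearson correlation between a driver (say wind speed) and a response (PM) shifted by a number of 15-minute steps. The function names and the synthetic series are assumptions for illustration, with PM constructed to follow wind two steps later.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver, response, lag):
    """Correlation between driver[t] and response[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


def best_lag(driver, response, max_lag):
    """Lag (in samples) with the strongest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda k: abs(lagged_correlation(driver, response, k)),
    )


# Synthetic example: PM falls as wind rises, but two samples later.
wind = [1, 3, 2, 5, 4, 6, 2, 1, 3, 5, 4, 2]
pm = [90, 90] + [100 - 10 * w for w in wind[:-2]]

lag = best_lag(wind, pm, max_lag=4)
print(lag, round(lagged_correlation(wind, pm, lag), 3))   # 2 -1.0
```

Real data would also need smoothing and far longer series, since a short window can produce a spurious best lag.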

The data will be open sourced, and the active data scientist community would be a great enabler of smarter experiments and better analysis.

We assume a Multi Armed Bandit (MAB) model where each experiment proceeds independently. Each experiment makes adjustments at periodic review points of 45 to 90 days and may do any or all of the following:

Measure more data (proxy measures of other activities like genset usage)
Measure more frequently (weather every 4 hours rather than daily)
Change sensor locations or add more sensors
Add additional interventions (MAB assumes multiple changes: a decision tree ..)
Change sensors if we need better data quality. Impute a Figure of Merit (FOM) for each sensor for data analytics

We found iScape Living Labs to be another organization with similar plans of exploration, experimentation and evaluation. They have 6 experiments underway.

We need a large set of measurements³ apart from those at the locations of experiments. We will reach out to colleges and schools to engage the students, help them build or use a reference kit, and join the project. We will also request companies, apartments and associations to adopt our recommendations on measuring and reporting.

We do not want to get into debates on expensive sensors, as PM is a highly volatile quantity and a 10–15% error in absolute measures is not going to make a dramatic difference. We are looking at relative drift from very high to high or medium, and data science methods like the Kalman filter can deliver very acceptable results by “voting” across a large number of measurements (it is used extensively in self-driving cars).

In many ways this is like the simple red, yellow or green check of whether a person is suffering from a high temperature (fever): the natural old analog method of touching a reference point, the forehead or armpit.
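The “voting” idea can be sketched with a one-dimensional Kalman filter that treats the true PM level as a slowly drifting value and folds in each sensor reading weighted by how noisy that sensor is believed to be. This is a minimal sketch, not the project's algorithm. The class and function names, the variances and the readings are all assumed for illustration.

```python
class ScalarKalman:
    """1-D Kalman filter with a random-walk model for the true PM level."""

    def __init__(self, initial_estimate, initial_variance, process_variance):
        self.x = initial_estimate    # current estimate of PM
        self.p = initial_variance    # uncertainty (variance) of that estimate
        self.q = process_variance    # how much the true level may drift per step

    def predict(self):
        # Random walk: the estimate stays put, the uncertainty grows.
        self.p += self.q

    def update(self, measurement, measurement_variance):
        gain = self.p / (self.p + measurement_variance)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x


def fuse_readings(kf, readings):
    """One time step: predict, then fold in every (value, variance) reading."""
    kf.predict()
    for value, variance in readings:
        kf.update(value, variance)
    return kf.x


# Hypothetical 15-minute steps with four low-cost sensors (variance 100 each).
kf = ScalarKalman(initial_estimate=0.0, initial_variance=1e6, process_variance=4.0)
steps = [
    [(110.0, 100.0), (90.0, 100.0), (105.0, 100.0), (95.0, 100.0)],
    [(118.0, 100.0), (101.0, 100.0), (112.0, 100.0), (109.0, 100.0)],
]
for readings in steps:
    estimate = fuse_readings(kf, readings)
    print(round(estimate, 1), round(kf.p, 1))
```

A noisier device simply gets a larger variance and therefore a smaller vote, so cheap sensors still contribute without dominating.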

Outcome

We expect to baseline a series of successful experiments where we can quantify the cost, effort and impact of interventions and enable the large community to rapidly adopt the useful methods.

We expect device and intervention providers to learn and improve
The large community to adopt successful methods
Governments to encourage more scaling by subsidy, regulation or larger pilots
Adoption across many cities

Foundation tooling

1. We need to start measurements at many locations: PM at 4 per sq km, and weather etc. as per the initial plan. This implies low-cost devices and large-scale crowd-sourced data to be fed to our data platform.

2. We need a way to mark each device (PM, weather, camera, vibration) with a score of its expected data quality. We call this a figure of merit (FOM). For NOx, FOM may not be an issue, but PM 2.5 sensors display a large degree of variance. We assume a 70% correlation of PM 2.5 with a regulatory-grade sensor as acceptable. We will find out the data quality of weather, vibration sensors etc. in our initial 6-month phase (Skymet is a partner, as is KSNDMC, the Karnataka State Natural Disaster Monitoring Centre).

3. We will work on algorithms and simple (self calibration) methods to allow a large number of devices ( 20 or more models for PM) to contribute to our data platform. These will establish zones of confidence of the device

3a. range of ambient temperature

3b. range of precipitation

3c. range of concentration of dust

3d. range of equipment life ( filters, battery)

4. We hope to get a mobile van equipped with some rudimentary calibration methods. Instead of co-locating users' devices with regulatory-grade sensors (6–15 in a city) for months, we expect to use a quick method that does an in-situ test of a few hours in accelerated dust, humidity and temperature cabins to trend against a reference calibration curve.

5. We expect at least 6 to 9 months of baseline data before experiment outcomes can be calculated.
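One possible reading of the figure of merit and the zones of confidence described above is sketched here. It takes FOM as the Pearson correlation between a low-cost device and a co-located reference instrument, computed separately per ambient temperature band, with the 70% level from the text as the acceptance threshold. The 30 °C band boundary, the function names and all sample values are hypothetical.

```python
from math import sqrt

MIN_FOM = 0.70  # acceptance level taken from the text above


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def temperature_band(temp_c, boundary_c=30.0):
    """HYPOTHETICAL two-band split of ambient temperature."""
    return "below_30C" if temp_c < boundary_c else "30C_and_above"


def zones_of_confidence(samples):
    """samples: (temp_c, device_pm, reference_pm) tuples.

    Returns {band: (fom, acceptable)} for each temperature band seen.
    """
    grouped = {}
    for temp_c, device_pm, reference_pm in samples:
        device, reference = grouped.setdefault(temperature_band(temp_c), ([], []))
        device.append(device_pm)
        reference.append(reference_pm)
    return {
        band: (fom, fom >= MIN_FOM)
        for band, (device, reference) in grouped.items()
        for fom in [pearson(device, reference)]
    }


# Invented co-location data: the device tracks the reference well when cool
# and poorly when hot.
samples = [
    (22, 52, 50), (24, 61, 60), (25, 68, 70), (27, 81, 80), (28, 90, 90),
    (33, 70, 50), (34, 50, 60), (35, 90, 70), (36, 60, 80), (38, 80, 90),
]
for band, (fom, ok) in zones_of_confidence(samples).items():
    print(band, round(fom, 3), ok)
```

The same grouping could be repeated for precipitation, dust concentration and equipment age to cover the other ranges in item 3.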

Summary

We are looking for volunteers and collaborators. Please register your interest at https://tinyurl.com/UsiruBLR/collaborators

Current regulation and public policy are based on a DataPoor mindset and can be dramatically improved in a modern IoT world. DataRich wireless sensor networks have delivered 12–25% improvement in world-class factories that barely eke out 2–6% with Six Sigma or ISO. The potential benefit in DataPoor city and public policy decision making is probably even greater.

Ambient air quality and particulate matter in smart cities need a DataRich Smart Public Policy using a widespread IoT sensor data intelligence system. A PDCA (Plan-Do-Check-Act) loop is the preferred approach, rather than one-shot BIG gamble methods.

Do watch the 5 Steps in Clean Air in India TEDx talk by Arunabha Ghosh and the Fireside Chat on Usiru: Air Pollution reduction by Crowdsourced #RichData #SmartPolicy at IoTNext 2019.

— — — — — — — — — — — — — -

Footnotes

1. The stock market, which most of us consider very volatile, varies by less than 12% a day for most individual stocks and by 5% at the level of the NIFTY index on most days.
2. NOx and others also.
3. Skymet is attempting a large-scale street-by-street measurement in Delhi. See the #delhipollution timeline on LinkedIn.