FOR THE PEOPLE


Do you live in or often visit the City of Turin? Take our anonymous 5-minute survey to help us identify the social issues that matter to citizens.

Attention! The survey is in Italian. You can automatically translate it into English or your preferred language by using Chrome with the Google Translate extension.

FOR THE INSTITUTIONS


Do you work for an institution of the City of Turin?

We are conducting a dedicated survey aimed at decision-makers of the Municipality and at all professionals who interact directly or indirectly with citizens. The goal is to collect valuable perspectives on the main social issues affecting Turin.

If you work for the Municipality of Turin, GTT, or any other institutional company whose work affects the citizens of Turin, you are invited to take part in the Stakeholders Survey. It is anonymous, takes about 15–20 minutes, and includes tailored questions based on your role.

Please note:

  • The survey is in Italian. However, you can easily translate it into English (or your preferred language) using Chrome together with the Google Translate extension.
  • The survey must be completed on a device with a physical keyboard (e.g. a PC or laptop), so please make sure you are settled at your computer before starting.

Your participation is greatly appreciated and will help us better understand the voices and experiences shaping our city.

DATA and RESULTS


The HARMONIA project integrates heterogeneous data from the city of Turin and the perspectives of its residents into large language models (LLMs), with the goal of improving public services, particularly in the area of urban mobility.

The idea is to make artificial intelligence not only more informed, but also more closely aligned with people’s lived realities, thanks to three key dimensions:

  • Organized knowledge: not just scattered texts, but information that is structured and linked through semantic relationships.
  • Real behavioral data: numbers, statistics, and patterns that describe how people actually live and move.
  • Multiple citizen perspectives: viewpoints from residents and communities to avoid partial or distorted representations.

The ultimate goal is to create trustworthy AI that supports policy-making and accompanies every stage of the public decision-making process, from listening to deciding to communicating, in a more attentive, transparent, and collaborative way.


For the HARMONIA project, a broad ecosystem of heterogeneous data was built, integrating quantitative, qualitative, and geospatial sources related to the city of Turin, covering a timespan from 2012 to the present. The goal was to provide a realistic, multi-layered picture of urban mobility, social conditions, and residents’ everyday practices.

Demographic and Urban Data

  • Population: distribution by gender (female, male, foreigners), age groups, and household composition (children, working-age adults, older adults).
  • Urban elements: shapefiles with the location of schools, hospitals, and green areas.
  • Income by postal code: detailed socioeconomic information.

Public Transport Data

  • Urban transport network: geolocated lines and stops, updated annually.
  • Service frequency: timetables for urban lines (2014–2024), with particular detail for the period 2015–2017.
  • Ticket validations: dataset of validations (“weekly validations”), including information on trips made and disaggregated by GTT network line.
  • Sales and subscriptions: distribution by age, gender, and residence (Turin vs. outside Turin), with details on sales channels (e-commerce, CSC, metro TVM, major clients).
  • Revenue: monthly and annual data from online channels, including VAT and refunds.

Safety and Enforcement Data

  • Territorial inspections (2012–2024): data from the Linea Sicura project (safety, drug dealing, pickpocketing, harassment, vandalism, fare evasion).
  • Other monitoring lines (AaC, AdC): with annual and line-level details.
  • Fines: annual aggregates by gender, age group, and residence (Turin vs. outside Turin).
  • Accidents: database with location, event type, and vehicles involved.

Institutional Data

  • Resolutions: official documents (2012, 2018, 2023, 2024), with particular emphasis on the table of regional fares for extra-urban services effective from July 2024.
  • GTT staff: PDF with aggregated information on personnel (role, age, gender).

Textual Social Media Data

Alongside statistical and administrative sources, textual data from various social media platforms were also collected:

  • Facebook: posts from selected public groups, focusing on topics such as public transport, city institutions, health, schools, mobility, safety, real estate market, local communities, and neighborhoods.
  • Reddit: posts collected using keywords related to Turin.
  • Twitter (TWITITA): corpus of geolocated tweets or tweets referring to the city.

Data Visualization and Analysis

Thanks to this wealth of data, dynamic maps and indicators were produced to monitor the evolution of urban and social phenomena over the period 2012–2019. The maps make it possible to:

  • Visualize the distribution of services and opportunities
  • Analyze territorial inequalities
  • Understand how demographic and infrastructural changes intersect with daily mobility

Some examples of the analyses conducted are presented in the following section.


To provide a solid foundation for the AI, an ontology was developed to integrate and connect diverse data sources.

Motivation:

LLMs alone are not reliable for critical decisions (e.g., due to the risk of “hallucinations” or incomplete information extraction). Ontologies and knowledge graphs can represent encyclopedic knowledge, but they often do not “communicate” with language models. It is therefore necessary to combine text (from which LLMs learn) with semantic structures (ontologies, knowledge graphs) to enable more equitable and evidence-based decision-making.

Methodology:

The ontology is grounded in two key theoretical frameworks. The first, Theory of Change (UN), provides a roadmap for understanding how an intervention can lead to concrete, measurable outcomes through evidence-based causal chains. The second, Nudging Theory (Thaler), emphasizes the use of subtle interventions that can guide people’s behaviors in predictable ways without imposing mandates or economic constraints.

  • The ontology represents social and mobility data, including:
    • Geospatial, demographic, and mobility data: providing the spatial and population context needed for informed decision-making.
    • Competency-driven questions: for example, identifying which areas require improvements in public transport.
    • Integration of social perspectives and stakeholders: ensuring that multiple viewpoints are considered in the analysis.
    • Modeling of sensitive topics: such as caregiving, accessibility, and behavioral nudges.
    • Evidence from scientific literature: capturing research on citizens’ transport choices and relating it to demographic characteristics.
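
A competency question such as the one listed above can be thought of as a pattern match over subject-predicate-object triples. The following is a minimal sketch in plain Python, not the project's actual ontology or query language. The zone names, the predicates (`stops_per_1000_residents`, `older_adult_share`), the values, and the thresholds are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy knowledge graph: (subject, predicate, object) triples.
# Zone names, predicates, and values are invented for illustration only.
TRIPLES = [
    ("zone:A", "stops_per_1000_residents", 4.2),
    ("zone:A", "older_adult_share", 0.18),
    ("zone:B", "stops_per_1000_residents", 1.1),
    ("zone:B", "older_adult_share", 0.31),
    ("zone:C", "stops_per_1000_residents", 0.9),
    ("zone:C", "older_adult_share", 0.12),
]


def values_of(triples, predicate):
    """Return {subject: object} for every triple with the given predicate."""
    return {s: o for s, p, o in triples if p == predicate}


def zones_needing_improvement(triples, max_stops=2.0, min_older_share=0.25):
    """Competency question: which zones combine low stop density with a
    high share of older residents? Thresholds are illustrative assumptions."""
    stops = values_of(triples, "stops_per_1000_residents")
    older = values_of(triples, "older_adult_share")
    return sorted(
        zone
        for zone in stops
        if stops[zone] < max_stops and older.get(zone, 0.0) >= min_older_share
    )


print(zones_needing_improvement(TRIPLES))
```

In a real deployment the same question would more likely be expressed as a query against the populated knowledge graph; the sketch only shows the shape of the reasoning.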

An example of how the ontology is populated with knowledge extracted from scientific articles is shown in the figure, illustrating the issue of spatial mismatch (Cui et al., 2022).

Here are the different components of the Nudging Ontology:

Examples of represented information:

  • Public transport policies
  • Services and cost of living by area
  • Indirect income estimates
  • Data on age, gender, migration status, and household size
  • Housing overcrowding and territorial inequalities

Below is an example of a query on the populated Knowledge Graph:


In parallel with the ontology, an experiment using Retrieval-Augmented Generation (RAG) was developed to improve the quality of LLM responses by integrating up-to-date and local data.
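
As a rough illustration of the retrieval step in RAG, the sketch below ranks a few text snippets against a question and prepends the best match to the prompt. It is a simplified stand-in under stated assumptions: it uses bag-of-words cosine similarity rather than any embedding model, the snippets in `DOCS` are invented, and the function names `retrieve` and `build_prompt` are hypothetical rather than taken from the project.

```python
import math
import re
from collections import Counter

# Hypothetical verbalized snippets; the text is invented for illustration.
DOCS = [
    "Zone A has 35 public transport stops and 8 lines.",
    "Zone B recorded 12 road accidents, 2 involving public transport.",
    "Zone C has 9000 residents, 15 percent of them foreign residents.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Prepend the retrieved context to the question before calling an LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


print(build_prompt("How many accidents involved public transport?", DOCS))
```

The resulting prompt would then be passed to a language model, so that the answer is grounded in the retrieved local data rather than in the model's general knowledge alone.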

The Dataset:

  • It contains 7,019 examples related to Turin, with data from 2012 to 2019.
  • It covers 3,850 census areas, aggregated into 93 statistical zones and 9 districts.
  • It has evolved from 16 to:
    • Demographics: population, gender, age, foreign residents, households.
    • Public transport: stops, lines, average distance.
    • Urban geography: areas, zones, districts.
    • Traffic and accidents: number, type, involvement of public transport.

Main Challenges:

  • High complexity: many features need to be maintained without loss of information.
  • High computational cost: verbalizing the dataset with the LLM required 27 hours.
  • Need for more efficient verbalization schemes.

Verbalization Strategies:

  1. Zero-shot: without guiding examples, the model generates freely (risk of errors).
  2. Few-shot: with guiding examples (even just one), improving coherence and accuracy.
  3. Structured (JSON): data representation without using the LLM, excellent for information retrieval.
  4. Hybrid: combines the structured and few-shot approaches, with potential future integration of ontological rules.
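
The strategies above can be sketched as simple prompt-building functions. This is a minimal illustration under stated assumptions, not the project's implementation: the record fields, the example sentence, and the function names are hypothetical, and no LLM is actually called. The hybrid strategy is interpreted here, as an assumption, as a few-shot prompt whose records are given in structured JSON form.

```python
import json

# Hypothetical record for one statistical zone; field names and values are
# invented for illustration and do not come from the project dataset.
record = {
    "zone": "Example Zone",
    "year": 2019,
    "population": 12000,
    "stops": 35,
    "lines": 8,
}


def structured(rec):
    """Strategy 3: deterministic JSON serialization, no LLM involved."""
    return json.dumps(rec, sort_keys=True, ensure_ascii=False)


def zero_shot_prompt(rec):
    """Strategy 1: ask the model to verbalize with no guiding example."""
    return "Describe this record in one sentence:\n" + structured(rec)


def few_shot_prompt(rec, examples):
    """Strategy 2: prepend (record, sentence) pairs to guide the model."""
    shots = "\n\n".join(
        f"Record: {structured(ex)}\nSentence: {text}" for ex, text in examples
    )
    return f"{shots}\n\nRecord: {structured(rec)}\nSentence:"


example = (
    {"zone": "Sample Zone", "year": 2018, "population": 9000, "stops": 20, "lines": 5},
    "In 2018, Sample Zone had 9000 residents served by 20 stops on 5 lines.",
)

# Strategy 4 (hybrid), under the assumption stated above: the few-shot
# prompt already embeds structured JSON for both the example and the target.
prompt = few_shot_prompt(record, [example])
print(prompt)
```

Because the structured form is produced without the model, it costs almost nothing to compute, which is relevant given the verbalization cost noted under the main challenges.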

In addition to the semantic integration of data, the project also explored new ways of representing and narrating information. In particular, heatmaps were created to describe aspects related to:

  • The demographics of different areas of the city
  • The distribution and accessibility of public transport
  • Urban mobility patterns

These maps were not used solely as visual tools, but also as inputs for an image-to-text verbalization approach. The idea is that a language model can “read” the maps, describing their contents in textual form, and thereby generate coherent and interpretable narratives for policymakers or the general public.

The initial experiments have shown promising results: the generated descriptions not only reproduce the underlying data but also provide an accessible translation, making it easier to understand territorial disparities and urban dynamics. This paves the way for future developments in which text, structured data, and visual representations can be integrated into a single analysis and communication pipeline.

