🧠 Project Summary
This project combines object-level representations (from the Things database) with context-rich scene images (from the Places dataset) to build a database for studying the influence of object-scene semantic relationships across different cognitive tasks.
All files relevant to the project can be found at (only available to developers at this stage):
This project is ongoing, so the information on this page will be updated as the project progresses. If you are interested in this project and want to contribute, share your ideas at [email protected].
Stage tracking
📦 Stage 0 – Dataset Gathering
Goal: Collect all necessary data and resources.
- ✅ Download and organize the **Things database** (object images + metadata):
  - It includes 1,854 object types.
  - Determine the project organization: categories (i.e., animate/inanimate), subcategories (e.g., mammals, vehicles…), and object types (i.e., specific Things folders, such as car or elephant).
- ✅ From the **Places dataset**:
  - Extract all 365 scene types (e.g., house) from the dataset.
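As an illustration of the scene-type extraction, the sketch below turns Places365 category labels into plain scene names that a language model can later embed. It assumes the category list from the official Places365 release (`categories_places365.txt`), with one line per class such as `/a/apartment_building/outdoor 8`. The file name, the line format, and the helper names `parse_category_line` and `load_scene_types` are assumptions for illustration, not the project's actual code.

```python
import re

# Assumption: each line of the Places365 category list looks like
# "/a/apartment_building/outdoor 8", i.e., a path-like label and a class index.
LINE_PATTERN = re.compile(r"^/[a-z]/(\S+)\s+(\d+)$")


def parse_category_line(line: str) -> tuple[str, int]:
    """Turn '/a/apartment_building/outdoor 8' into ('apartment building outdoor', 8)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Unexpected category line: {line!r}")
    label, index = match.groups()
    # Replace path separators and underscores with spaces so the label
    # reads as plain words.
    words = label.replace("/", " ").replace("_", " ")
    return words, int(index)


def load_scene_types(lines) -> list[str]:
    """Return scene names (blank lines skipped) ordered by class index."""
    parsed = [parse_category_line(line) for line in lines if line.strip()]
    parsed.sort(key=lambda item: item[1])
    return [name for name, _ in parsed]


if __name__ == "__main__":
    # Illustrative lines in the assumed format.
    sample = ["/a/apartment_building/outdoor 8", "/a/airfield 0"]
    print(load_scene_types(sample))
```

With the list downloaded locally, `with open(path) as f: load_scene_types(f)` would return the scene names; joining multi-part labels with spaces is only one possible choice, and the project may phrase scene names differently.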
🧭 Stage 1 – Semantic Matching
Goal: Determine semantic compatibility between object and scene types.
- 🔍 Compute the semantic similarity (or distance) between each object type and each scene type.
  - We use BERT (Devlin et al., 2019) and ConceptNet Numberbatch (Speer et al., 2017) to compute the cosine similarity for every object-scene word pair, yielding an object × scene semantic distance matrix for each model.
- 🧮 For each object type, keep only the scene types whose semantic similarity is equivalent across the two models:
  - Sort all scene types by ascending similarity and split them into 3 bins of low, medium, and high similarity (i.e., high, medium, and low semantic distance). Do this separately for BERT and ConceptNet Numberbatch.
  - For each object type, keep only the scene types that fall in the same bin in both models.
  - Retain only object types with at least 10 scene types per bin after the non-matching scene types are excluded (1,494 object types at this stage).
- 🧠 Select a subset of 432 object types based on experimental criteria, grouped into 16 subcategories derived from previous Things categorizations (53 and 27 subcategories) and from a custom organization.
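The similarity step of Stage 1 can be sketched as follows, assuming every object and scene name has already been converted into a vector by one of the models (how the BERT and Numberbatch vectors are extracted, e.g., which BERT layer or pooling is used, is not described on this page). The toy vectors are made up, the helper names are hypothetical, and treating distance as 1 minus cosine similarity is an assumed convention. Running it once per model gives one object × scene matrix per model.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(
    objects: dict[str, list[float]], scenes: dict[str, list[float]]
) -> dict[str, dict[str, float]]:
    """Object x scene matrix stored as {object: {scene: similarity}}."""
    return {
        obj: {scene: cosine_similarity(o_vec, s_vec) for scene, s_vec in scenes.items()}
        for obj, o_vec in objects.items()
    }


def to_distance(matrix: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Assumed convention: cosine distance = 1 - cosine similarity."""
    return {obj: {scene: 1.0 - sim for scene, sim in row.items()} for obj, row in matrix.items()}


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real model embeddings (made up).
    objects = {"car": [1.0, 0.0, 0.0], "elephant": [0.0, 1.0, 0.0]}
    scenes = {"garage": [0.9, 0.1, 0.0], "savanna": [0.1, 0.9, 0.1]}
    sims = similarity_matrix(objects, scenes)
    print(round(sims["car"]["garage"], 3), round(sims["car"]["savanna"], 3))
```

A nested dictionary keeps object and scene names attached to each value; for the full 1,854 × 365 matrix, an array library would be faster, but the arithmetic is the same.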
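The bin-matching and filtering steps of Stage 1 can be sketched for one pair of models as below. The page does not say how bin boundaries are set, so this sketch assumes equal-count bins over the ranked list (bin 0 holds the least similar scenes, i.e., the highest semantic distance). The function names and toy data are hypothetical; only the threshold of 10 scene types per bin comes from the text above.

```python
def assign_bins(similarities: dict[str, float], n_bins: int = 3) -> dict[str, int]:
    """Rank scenes by ascending similarity and split the ranking into n_bins
    near-equal groups (assumption: equal-count bins). Bin 0 = least similar."""
    ranked = sorted(similarities, key=similarities.get)
    n = len(ranked)
    # The scene at rank r goes to bin floor(r * n_bins / n).
    return {scene: rank * n_bins // n for rank, scene in enumerate(ranked)}


def matching_scenes(
    sims_a: dict[str, float], sims_b: dict[str, float], n_bins: int = 3
) -> dict[int, list[str]]:
    """Keep only scenes that both models place in the same bin, grouped by bin."""
    bins_a = assign_bins(sims_a, n_bins)
    bins_b = assign_bins(sims_b, n_bins)
    kept: dict[int, list[str]] = {b: [] for b in range(n_bins)}
    for scene, bin_a in bins_a.items():
        if bins_b.get(scene) == bin_a:
            kept[bin_a].append(scene)
    return kept


def select_object_types(model_a, model_b, min_per_bin: int = 10, n_bins: int = 3):
    """model_a, model_b: {object: {scene: similarity}}, assumed to cover the same
    objects and scenes. Keeps objects with >= min_per_bin matching scenes in every bin."""
    selected = {}
    for obj, sims_a in model_a.items():
        kept = matching_scenes(sims_a, model_b[obj], n_bins)
        if all(len(scenes) >= min_per_bin for scenes in kept.values()):
            selected[obj] = kept
    return selected


if __name__ == "__main__":
    scenes = [f"scene_{i}" for i in range(6)]
    # Made-up similarities; model B moves scene_1 from the lowest to the highest bin.
    bert = {"car": {s: i / 10 for i, s in enumerate(scenes)}}
    numberbatch = {"car": dict(bert["car"], scene_1=0.45)}
    print(select_object_types(bert, numberbatch, min_per_bin=1))
    # → {'car': {0: ['scene_0'], 1: ['scene_3'], 2: ['scene_5']}}
```

Binning each model separately before comparing, rather than comparing raw scores, sidesteps the fact that BERT and Numberbatch similarities live on different scales.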
🖼️ Stage 2 – Types and Images Selection