Description:
This interactive map represents the landscape of the literature analyzed during the OID's first research cycle. Each point corresponds to a paragraph extracted from one of the analyzed papers and is positioned in a two-dimensional semantic space according to a numerical embedding of its content, so that paragraphs with similar content appear close together. Colors mark clusters of paragraphs that discuss the same macro-topic, and each macro-topic is further subdivided into more specific topics.
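Paragraphs whose embedding vectors point in similar directions end up near each other on the map. A common way to measure that is cosine similarity. The minimal Python sketch below uses hypothetical three-dimensional vectors for illustration. Real all-MiniLM-L6-v2 embeddings have 384 dimensions, and the map shows a 2D projection of them that only approximately preserves these similarities.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings" for three paragraphs.
para_a = [0.9, 0.1, 0.0]
para_b = [0.8, 0.2, 0.1]  # content close to para_a
para_c = [0.0, 0.1, 0.9]  # unrelated content

print(round(cosine_similarity(para_a, para_b), 3))
print(round(cosine_similarity(para_a, para_c), 3))
```

The first pair scores close to 1 and the second close to 0, which is why points for the first pair would tend to sit near each other on the map.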
Use:
The map can be used to explore the 1,664 sources cited in the Observatory's report and to see how topics are distributed across the analyzed literature. Hover over any point to see the title of the paper the paragraph comes from, and click the point to open a Google search for that title. Use the search bar in the top left to look for words in the sources' titles, and use the regional filters and the publication-date histogram to narrow the map by region and publication date.
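The filtering itself runs in the browser. The Python sketch below is only a conceptual model of how the three controls might combine, not the map's actual code. It assumes a source stays visible only if it passes every active filter, and that the search matches all query words case-insensitively as substrings of the title. The `Source` record, its fields, and `is_visible` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Source:
    title: str
    region: str  # hypothetical region tag used by the regional filter
    year: int    # publication year, as shown on the histogram


def is_visible(
    source: Source,
    query: str = "",
    regions: Optional[Set[str]] = None,
    years: Optional[Tuple[int, int]] = None,
) -> bool:
    """Return True if the source passes every active filter.

    query: words to find in the title (empty string means no search filter).
    regions: selected regions (None means all regions).
    years: inclusive (first, last) range picked on the histogram (None means all years).
    """
    title = source.title.lower()
    if query and not all(word in title for word in query.lower().split()):
        return False
    if regions is not None and source.region not in regions:
        return False
    if years is not None and not (years[0] <= source.year <= years[1]):
        return False
    return True
```

For example, `is_visible(src, query="mobility", regions={"Europe"}, years=(2018, 2022))` would keep a source only if its title contains "mobility", it is tagged with Europe, and it was published between 2018 and 2022.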
Method:
To create the map, we first split our sources into meaningful paragraphs using LangChain. We then embedded the paragraphs with the all-MiniLM-L6-v2 model from SentenceTransformers and used UMAP to reduce the embeddings to a 2D semantic space. Next, we applied HDBSCAN to identify topic clusters and conducted a content analysis of those clusters to derive detailed and macro-topic labels. Finally, we used the datamapplot library to create the visualization.
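The steps above can be sketched in Python as follows. This is a minimal, hypothetical reconstruction, not the code actually used. The choice of LangChain's `RecursiveCharacterTextSplitter`, the chunk sizes, the cosine metric for UMAP, running HDBSCAN on the 2D coordinates, all parameter values, the datamapplot keyword arguments, and the function names (`chunk_sources`, `embed`, `project_and_cluster`, `label_layers`, `build_map`) are assumptions. Because the topic names came from a manual content analysis, they enter here as hand-written dictionaries.

```python
# Sketch of the map-building pipeline. Assumed packages: langchain-text-splitters,
# sentence-transformers, umap-learn, hdbscan, datamapplot (plus numpy).
# Function names and parameter values are hypothetical. Heavy imports sit inside
# the functions so that label_layers() works with the standard library alone.


def chunk_sources(texts, chunk_size=1000, chunk_overlap=100):
    """Split each source text into chunks, recording which source each chunk came from."""
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # By default the splitter tries "\n\n" first, so chunks follow paragraph breaks where possible.
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks, source_ids = [], []
    for source_id, text in enumerate(texts):
        for chunk in splitter.split_text(text):
            chunks.append(chunk)
            source_ids.append(source_id)
    return chunks, source_ids


def embed(chunks):
    """Encode each chunk as a 384-dimensional vector (numpy array of shape (n, 384))."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(chunks)


def project_and_cluster(embeddings, min_cluster_size=15):
    """Project embeddings to 2D with UMAP, then find dense clusters there with HDBSCAN."""
    import hdbscan
    import umap

    reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
    coords = reducer.fit_transform(embeddings)
    cluster_ids = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(coords)
    return coords, cluster_ids  # a cluster id of -1 marks a noise point


def label_layers(cluster_ids, detailed_names, macro_of, noise_label="Unlabelled"):
    """Map cluster ids to (macro-topic, detailed-topic) label lists.

    detailed_names: {cluster id: detailed topic}, written after content analysis.
    macro_of: {detailed topic: macro-topic}.
    Noise points and unnamed clusters get noise_label.
    """
    detailed = [detailed_names.get(int(c), noise_label) for c in cluster_ids]
    macro = [macro_of.get(d, noise_label) for d in detailed]
    return macro, detailed


def build_map(coords, macro, detailed, titles):
    """Render an interactive HTML map; clicking a point opens a Google search for its title."""
    import datamapplot
    import numpy as np

    # Layer order (coarsest vs finest first) should be checked against the datamapplot docs.
    return datamapplot.create_interactive_plot(
        coords,
        np.asarray(macro),
        np.asarray(detailed),
        hover_text=np.asarray(titles),
        enable_search=True,
        on_click='window.open(`https://www.google.com/search?q="{hover_text}"`)',
    )
```

Chained together, this would read `chunks, ids = chunk_sources(texts)`, then `coords, clusters = project_and_cluster(embed(chunks))`, then `build_map(coords, *label_layers(clusters, detailed_names, macro_of), [titles[i] for i in ids])`, where `texts`, `titles`, `detailed_names`, and `macro_of` are hypothetical inputs. Repeating each paper's title for every chunk is what lets hovering over any paragraph show its source.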