Once you have generated a topic map using Nomic Atlas (refer to the previous tutorial on “How to Generate & Visualize Embeddings”), Nomic Atlas attempts to automatically identify and assign relevant labels to groups of semantically similar posts. These labels are a useful starting point for data exploration, but they may not always be accurate or informative. We therefore recommend manually reviewing posts in different clusters to validate the automatically assigned labels. This tutorial guides you through that process and demonstrates how to download posts from a selected topic/cluster as a CSV file for further review.
Terminology and User Interface
1) Depending on the size of your dataset, Nomic Atlas generates either two or three levels of topics: broad topics (fewer topics, each covering more posts), medium-level topics, and specific topics (more topics, each containing fewer posts).
The Nomic Atlas interface uses the following terms interchangeably:
Topic: broad | Topic: 1 | _topic_depth_1 |
Topic: medium | Topic: 2 | _topic_depth_2 |
Topic: specific | Topic: 3 | _topic_depth_3 |
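If you later work with exported data, it can help to normalize these three naming schemes to a single one. The snippet below is a minimal sketch: the labels come from the list above, while the dictionary and the function `topic_column` are hypothetical helpers written for illustration and are not part of Nomic Atlas.

```python
# Hypothetical helper (not part of Nomic Atlas): map each interface label
# from the list above to its column-style name.
TOPIC_LEVEL_ALIASES = {
    "broad": "_topic_depth_1",
    "1": "_topic_depth_1",
    "medium": "_topic_depth_2",
    "2": "_topic_depth_2",
    "specific": "_topic_depth_3",
    "3": "_topic_depth_3",
}


def topic_column(label: str) -> str:
    """Return the column-style name for a label such as 'Topic: broad'."""
    # Keep only the part after the last colon, e.g. "Topic: broad" -> "broad".
    level = label.split(":")[-1].strip().lower()
    # Labels that are already column-style names pass through unchanged.
    if level.startswith("_topic_depth_"):
        return level
    return TOPIC_LEVEL_ALIASES[level]


print(topic_column("Topic: broad"))    # _topic_depth_1
print(topic_column("Nomic Topic: 2"))  # _topic_depth_2
```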
2) You can use the View Settings menu to change the color of the dots to highlight different topics visually.
Below is an example of how a sample map looks when coloring dots based on the high-level topics (on the left) vs based on more specific topics (on the right). The list of topics at the selected level will appear in the legend at the screen’s bottom left corner.
You will have to decide which level is most informative for your dataset. We suggest starting data exploration with the high-level (broad) topics. If they are not specific enough, move to the medium level (Nomic Topic: 2) or the specific level (Nomic Topic: 3) for a more granular examination of your data.
Selection Tools: Filter by Topic
3) Using the Selection Tools panel is another useful option to explore different topics.
3.1) Start by clicking the Filter icon to activate the selection process.
3.2) Scroll to the end of the dropdown menu to select the desired topic level (e.g., “Nomic Topic: broad”).
3.3) In the second dropdown menu (“Choose field to filter by”), select the label of a topic that you would like to explore and validate.
4) After you select one of the topics, Nomic Atlas highlights all posts assigned to it.
The following example shows 2,253 posts grouped under the same broad topic, automatically labelled “War”. Note that even though a higher-level topic like “War” is selected, the map visualization still displays labels for the more specific topics related to the selected posts.
The next step is manually reviewing a sample of posts on the selected topic. To achieve this, use the left and right arrow buttons (see the screenshot below) to “flip through” the text and associated metadata of the selected posts.
Because a relatively large number of posts is grouped under this topic, and because the posts are semantically similar (they sit close to each other on the map), it is reasonable to review only a fraction of them to get a general sense of the content the topic represents. How many posts to review depends on the size of your dataset and the purpose of your analysis. For exploratory data analysis, we suggest reading 1-5% of the posts in each topic.
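To make the 1-5% guideline concrete, the sketch below computes how many posts that means for the 2,253-post “War” example and draws a reproducible random sample of post identifiers. The function names and the use of integer IDs are assumptions made for illustration; Nomic Atlas itself does not provide this sampling step.

```python
import math
import random


def review_sample_size(total_posts: int, fraction: float) -> int:
    """Number of posts to read for a given review fraction (at least one)."""
    return max(1, math.ceil(total_posts * fraction))


def pick_review_sample(post_ids, fraction, seed=0):
    """Draw a reproducible random sample of post IDs to review."""
    ids = list(post_ids)
    # Never ask for more posts than exist (covers empty or tiny topics).
    k = min(len(ids), review_sample_size(len(ids), fraction))
    return random.Random(seed).sample(ids, k)


total = 2253  # size of the "War" topic in the example above
print(review_sample_size(total, 0.01))  # 23
print(review_sample_size(total, 0.05))  # 113

# Hypothetical integer IDs stand in for real post identifiers.
sample = pick_review_sample(range(total), 0.01)
print(len(sample))  # 23
```

Fixing the seed keeps the sample stable, so a second reviewer can read the same posts.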
Selection Tools: Lasso Feature
5) For more flexibility in choosing which posts to review manually, the Nomic Atlas visualization offers the Lasso feature, which lets you draw an area on the map to select the posts inside it. An example of how to use this feature is shown below.
(1) In the “View Settings” panel (top right), select “Nomic Topic: broad” to highlight related posts using different colors.
(2)–(3) Use the Pencil tool to circle a group of posts you want to review manually. (This manual selection is not surgically precise.)
(4) After drawing the circle, use the left and right navigation arrows (previous/next) to review all or a sample of the selected posts.
You can combine Filter by Topic with Filter by Lasso for a more advanced selection, as shown below. This option may be useful when examining broad topics with visually apparent separation between some areas within the cluster of dots. Those separations may suggest that a topic (in this case, “War”) consists of several distinct sub-topics.
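Conceptually, combining the two filters means keeping only the posts that carry the chosen topic label and also fall inside the drawn outline. The sketch below illustrates that idea with a standard ray-casting point-in-polygon test. It is not how Nomic Atlas implements the Lasso, and the `x`, `y`, and `topic` fields are hypothetical; an actual export may not contain map coordinates at all.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point; an odd count means inside.
            if x < x_cross:
                inside = not inside
    return inside


def select_posts(posts, topic, lasso):
    """Keep posts that match the topic AND lie inside the lasso outline."""
    return [
        p for p in posts
        if p["topic"] == topic and point_in_polygon(p["x"], p["y"], lasso)
    ]


# Made-up posts with hypothetical map coordinates.
posts = [
    {"id": 1, "topic": "War", "x": 1.0, "y": 1.0},
    {"id": 2, "topic": "War", "x": 5.0, "y": 5.0},
    {"id": 3, "topic": "Sports", "x": 1.5, "y": 1.5},
]
lasso = [(0, 0), (3, 0), (3, 3), (0, 3)]  # a square drawn around one area
print([p["id"] for p in select_posts(posts, "War", lasso)])  # [1]
```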
Selection Tools: Search Feature
6) An alternative to the Lasso tool is the Search feature, which automatically selects posts containing the words or phrases you enter. We can use this feature to examine how prevalent posts with given words are in the selected topic.
The example below shows only 41 posts containing “nuclear” out of 2,253 posts on the broad topic of “War”.
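Expressed as a share, 41 of 2,253 posts is roughly 1.8% of the topic. The sketch below shows that arithmetic and a simple way to compute the same kind of prevalence on your own list of texts. It uses a plain case-insensitive substring match, which is an assumption and may not match how the Atlas Search feature decides what counts as a hit; the function name and the demo texts are made up for illustration.

```python
def keyword_prevalence(texts, keyword):
    """Return (matches, total, share) for a case-insensitive substring search."""
    texts = list(texts)
    needle = keyword.lower()
    matches = sum(1 for t in texts if needle in t.lower())
    share = matches / len(texts) if texts else 0.0
    return matches, len(texts), share


# Counts from the example above: 41 of 2,253 "War" posts contain "nuclear".
print(f"{41 / 2253:.1%}")  # 1.8%

# Tiny made-up sample to show the function itself.
demo = [
    "Nuclear talks resume",
    "Troops move north",
    "Ceasefire announced",
    "nuclear plant shelled",
]
print(keyword_prevalence(demo, "nuclear"))  # (2, 4, 0.5)
```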
See the Nomic Atlas Documentation for more details about the Search option.
Download Selected Posts
7) Regardless of the Filter or Search option used to select posts, you can download them as a CSV file using the Download button, as shown below. The downloaded file can be opened with Excel or Google Sheets for further review.
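If you prefer a script to a spreadsheet, the downloaded CSV can also be summarized with a few lines of Python. The sketch below counts posts per topic label using only the standard library. The column names are assumptions: `_topic_depth_1` follows the naming listed in the terminology section, `text` is hypothetical, and the inline sample stands in for a real download, so check the header row of your own file first.

```python
import csv
import io
from collections import Counter

# Stand-in for the downloaded file. With a real export, open the file instead,
# e.g. open(path_to_download, newline="", encoding="utf-8").
sample_csv = io.StringIO(
    "text,_topic_depth_1\n"
    "Nuclear talks resume,War\n"
    "Troops move north,War\n"
    "Final score 2-1,Sports\n"
)


def count_by_topic(handle, topic_column="_topic_depth_1"):
    """Count rows per topic label in a CSV read from an open file handle."""
    reader = csv.DictReader(handle)
    return Counter(row[topic_column] for row in reader)


counts = count_by_topic(sample_csv)
print(counts.most_common())  # [('War', 2), ('Sports', 1)]
```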