CrowdTangle: How to Collect and Explore Facebook and Instagram Data in Communalytic (Part 1/3)


Communalytic can collect and analyze public Facebook and Instagram posts that shared the same URL (e.g., a link to a single NYT story, or any link under a given domain name) via the CrowdTangle API’s Links Endpoint. This data collection feature is designed for current CrowdTangle users. It is useful for studying shared interests in online communities and for detecting signs of possible coordination (a.k.a. coordinated inauthentic behavior) among seemingly disparate accounts on platforms like Facebook and Instagram.

1. Getting a CrowdTangle API key

Currently, access to CrowdTangle, a data platform owned by Facebook, is limited to the following two groups of users:

    • University-based researchers (including faculty, PhD students, and postdocs) – Apply here

    • People and organizations that have a Partnership with Facebook – More info here.

Once Facebook has given you access to CrowdTangle, sign in to your CrowdTangle account to retrieve your API key (available under [Settings] > [API Access]).

2. Adding your CrowdTangle API key in Communalytic 

Once you have retrieved your API key from CrowdTangle, add it to your user profile on Communalytic. You are now ready to collect data via the CrowdTangle API.

3. Data Collection

Next, go to the [My Datasets] page in Communalytic and click on [Facebook/Instagram] under [Crowdtangle] as the data source to create your new dataset.

Enter the URL or domain you want to use as your search criteria, and make sure that it starts with http or https. (For example, it can be a URL that you heard about in the news or saw in your social media feed.) You can also upload a list of URLs as a .txt file through Option 2. Next, specify whether you want to collect Facebook or Instagram posts and for what time period.
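If you plan to upload a list of URLs, it can help to check the list before submitting it. The following is a minimal sketch, assuming a plain-text list with one URL per line; the function name load_urls and the sample URLs are hypothetical and are not part of Communalytic.

```python
from urllib.parse import urlparse


def load_urls(lines):
    """Split an iterable of text lines into valid http(s) URLs and rejects."""
    valid, rejected = [], []
    for raw in lines:
        candidate = raw.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        # Keep only entries with an http/https scheme and a host name.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected


# Hypothetical sample standing in for the lines of a .txt file.
sample = ["https://www.nytimes.com/", "example.com/story", "", "http://example.org/a"]
valid, rejected = load_urls(sample)
print("valid:", valid)
print("rejected:", rejected)
```

To check a real file, pass an open file object instead of the sample list, for example load_urls(open("urls.txt", encoding="utf-8")).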

Once this is done, Communalytic will send your search request to CrowdTangle and create a dataset with all available posts that mentioned the URL.
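For readers curious about what such a search request might look like, here is a rough sketch that only builds the request URL and does not send it. It assumes the endpoint address and the parameter names (token, link, platforms, startDate, endDate, count) from CrowdTangle's public API documentation; verify them against the current documentation before relying on them. This is not Communalytic's actual implementation, and the function name, token, and example link are placeholders.

```python
from urllib.parse import urlencode

# Assumed address of the Links Endpoint; confirm against the CrowdTangle docs.
LINKS_ENDPOINT = "https://api.crowdtangle.com/links"


def build_links_request(token, link, platform, start_date, end_date, count=100):
    """Return a Links Endpoint request URL (the request itself is not sent)."""
    params = {
        "token": token,           # your CrowdTangle API key
        "link": link,             # URL or domain used as the search criterion
        "platforms": platform,    # assumed values: "facebook" or "instagram"
        "startDate": start_date,  # e.g. "2020-12-01"
        "endDate": end_date,      # e.g. "2020-12-31"
        "count": count,           # number of posts requested per call
    }
    return f"{LINKS_ENDPOINT}?{urlencode(params)}"


url = build_links_request(
    token="YOUR_API_KEY",  # placeholder, not a real key
    link="https://example.org/story",
    platform="facebook",
    start_date="2020-12-01",
    end_date="2020-12-31",
)
print(url)
```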

4. Data Exploration

Once the data is collected, your new dataset will appear on the [My Datasets] page. In the example below, we have collected recent Facebook posts that shared a link to an article entitled “COVID-19: Bell’s palsy in four Pfizer volunteers not due to vaccine, says US FDA”.

5. Data Visualizations

Next, you can click on the dataset name listed under the [Dataset Name] tab to view informative visualizations about your new dataset, including graphs highlighting the total number of posts and the list of top 10 most active posters, a word cloud of frequently used keywords and emojis, and box plots summarizing engagement levels for the posts in your dataset.
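If you want to reproduce two of these summaries outside Communalytic, the sketch below counts the most active posters and computes the five numbers that a box plot of engagement is drawn from. It assumes each post is a record with an account name and an interaction count; the field names account and interactions and the sample values are hypothetical, not the column names of a Communalytic export.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical sample records standing in for collected posts.
posts = [
    {"account": "Page A", "interactions": 120},
    {"account": "Page B", "interactions": 15},
    {"account": "Page A", "interactions": 40},
    {"account": "Page C", "interactions": 300},
    {"account": "Page A", "interactions": 5},
    {"account": "Page B", "interactions": 60},
]

# Top 10 most active posters, ranked by number of posts.
top_posters = Counter(p["account"] for p in posts).most_common(10)

# Five-number summary of engagement, the basis of a box plot.
values = sorted(p["interactions"] for p in posts)
q1, median, q3 = quantiles(values, n=4, method="inclusive")
summary = {"min": values[0], "q1": q1, "median": median, "q3": q3, "max": values[-1]}

print(top_posters)
print(summary)
```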

Note: CrowdTangle API’s Links Endpoint only pulls data from CrowdTangle. It does not pull reaction/interaction metrics data from across the entirety of Facebook products.

6. Toxicity Analysis with Google Perspective/Detoxify API

Now that data collection is complete, you can also run a Toxicity Analysis with your new dataset using Google Perspective API or Detoxify API. This particular analysis will help you to automatically find and explore the most ‘toxic’ posts in the dataset as well as download toxicity scores assigned to each post for further analysis and validation. 
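Once you have downloaded the toxicity scores, a short script can list the highest-scoring posts for manual validation. The sketch below assumes a CSV file with one row per post; the column names post_id, text, and toxicity, the sample rows, and the 0.7 cut-off are illustrative assumptions rather than Communalytic's export format or a recommended threshold.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded scores file.
sample_csv = """post_id,text,toxicity
1,example post A,0.03
2,example post B,0.91
3,example post C,0.12
4,example post D,0.88
"""


def most_toxic(csv_file, threshold=0.7):
    """Return rows scoring at or above the threshold, highest score first."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["toxicity"] = float(row["toxicity"])  # scores arrive as text
        rows.append(row)
    flagged = [r for r in rows if r["toxicity"] >= threshold]
    return sorted(flagged, key=lambda r: r["toxicity"], reverse=True)


flagged = most_toxic(io.StringIO(sample_csv))
for row in flagged:
    print(row["post_id"], row["toxicity"], row["text"])
```

Automated scores can misclassify posts, so reading the flagged posts yourself remains part of the validation step.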

The Perspective API’s toxicity model has been trained and validated by Google for the following seven languages: English, French, Spanish, German, Portuguese, Italian, and Russian. The Detoxify API supports English, French, Spanish, Italian, Portuguese, Turkish, and Russian. Note: If your dataset is not predominantly in one of the languages supported by the API you choose, we recommend that you do not run this analysis, as the results may be less reliable. Learn more about this analysis here and here.

The next tutorial will show how you can use Social Network Analysis (SNA) to examine your dataset.