Frequently Asked Questions

What is Communalytic?

Communalytic is a computational social science research tool for studying online communities and discourse. It can collect and analyze publicly available data from Bluesky (~Apr. 2024), CrowdTangle (FB/IG), Mastodon, Reddit, Telegram, X (formerly Twitter), and YouTube, or you can import your own CSV or JSON data files. No coding is required.

Communalytic also contains a suite of advanced data analytics modules, including: 1) a Toxicity Analyzer powered by two different machine learning APIs: Detoxify and Perspective, 2) a Sentiment Analyzer powered by three different text processing libraries: VADER (EN), TextBlob (EN, FR, DE) and Dostoevsky (RU), 3) a Topic Analyzer that uses text embeddings to automatically identify and group together posts that are semantically similar, and 4) a built-in Network Analyzer.

These modules can be used to automatically:

  • detect anti-social interactions (e.g., harassment, hate speech, extremist content),
  • assess sentiments in online discourse,
  • identify and group together social media posts that are semantically similar and identify latent topics within your dataset,
  • generate and visualize various types of networks, including communication and link-sharing networks, which in turn can be used to identify influencers, map shared interests among online actors, study the spread of mis/dis-information and detect signs of possible coordination among seemingly disparate actors.

For more details, see Communalytic’s Tutorials page.

There are two versions of Communalytic: EDU and PRO.

  • Communalytic EDU is designed to help students learn about social media data analytics and social network analysis.
  • Communalytic PRO is designed for the academic research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest. 

Each version is hosted on its own dedicated server with its own account creation and sign-in processes. Users of Communalytic can share datasets with other users using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).

The Network Analyzer Module in Communalytic can automatically generate and visualize various types of networks (graphs) including communication and link-sharing networks. It is unique among network research tools in that it can also generate and visualize so-called ‘signed networks’. (For more details see Network Visualization and Analysis in the Tutorials).

What’s a signed network?

A signed network is a network whose edges carry additional information, such as scores or weights. To turn a network into a signed network in Communalytic, users can run additional analyses (toxicity and/or sentiment) before creating a network representation of their dataset. The resulting toxicity scores and/or sentiment polarity scores are then added as weights to the edges of the network and visualized for easier exploration and analysis. This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within a network so that they can be examined in more detail.
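As a rough illustration of the idea (not Communalytic's actual implementation), a signed reply-to network can be sketched in plain Python as a mapping from directed edges to scores. The record fields `author`, `reply_to` and `toxicity`, the choice of averaging repeated interactions, and the 0.7 threshold are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records: who replied to whom, with a toxicity score in [0, 1].
replies = [
    {"author": "alice", "reply_to": "bob", "toxicity": 0.10},
    {"author": "alice", "reply_to": "bob", "toxicity": 0.30},
    {"author": "carol", "reply_to": "alice", "toxicity": 0.90},
]


def build_signed_network(records):
    """Return {(source, target): mean score} for a directed reply-to network."""
    scores = defaultdict(list)
    for r in records:
        scores[(r["author"], r["reply_to"])].append(r["toxicity"])
    # Averaging repeated interactions is an assumption of this sketch.
    return {edge: sum(vals) / len(vals) for edge, vals in scores.items()}


def flag_edges(network, threshold=0.7):
    """Return the edges whose weight meets or exceeds the assumed threshold."""
    return sorted(edge for edge, weight in network.items() if weight >= threshold)


network = build_signed_network(replies)
flagged = flag_edges(network)
```

Here `flagged` holds the edges that might deserve a closer look, which mirrors the idea of visually highlighting interactions of interest.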

Types of networks that can be automatically generated by Communalytic

  • Reply-To Network: Account-to-Account (Twitter, Reddit, Telegram) 
    • This communication network shows who replied to whom. 
  • Retweet Network: Account-to-Account (Twitter only)
    • This communication network shows who retweeted whom.
  • Two-Mode Link Sharing Network: Account-to-Website  (Twitter, Reddit, CrowdTangle, Telegram channels & groups)
    • This ‘link sharing’ network shows which accounts in your dataset shared a link to the same website. 
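To make the two-mode idea concrete, here is a minimal sketch of an account-to-website network built from a list of posts. It is an illustration under stated assumptions, not Communalytic's code: the `account` and `url` fields are hypothetical, and stripping a leading `www.` is just one possible way to normalize website names.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical posts, each sharing one link.
posts = [
    {"account": "alice", "url": "https://www.example.org/story-1"},
    {"account": "bob", "url": "https://example.org/story-2"},
    {"account": "carol", "url": "https://news.example.com/a"},
]


def domain_of(url):
    """Extract the host name, dropping a leading 'www.' (an assumed normalization)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def link_sharing_network(records):
    """Return {(account, website): number of shares} as two-mode edges."""
    edges = defaultdict(int)
    for r in records:
        edges[(r["account"], domain_of(r["url"]))] += 1
    return dict(edges)


def accounts_by_site(edges):
    """Group accounts by the website they linked to."""
    sites = defaultdict(set)
    for account, site in edges:
        sites[site].add(account)
    return {site: sorted(accounts) for site, accounts in sites.items()}


edges = link_sharing_network(posts)
shared = accounts_by_site(edges)
```

Accounts that appear under the same website in `shared` are the ones that would be connected through that website node in the network.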

Communalytic automatically generates the following types of summary charts for each of your datasets. Each chart can be downloaded as a PNG image or as a CSV data file. Communalytic also offers an easy import option to explore and customize most of the summary charts in Plotly Chart Studio, a popular visualization tool for structured data.

  • Posts Per Day Chart 
    • This chart shows the number of posts per day over time.
  • Word Cloud Chart
    • This chart shows the 100 most frequently used words based on your full dataset. It excludes numbers, URLs, and stop words in 15 different languages.
  • Emoji Cloud Chart
    • This chart shows the 100 most frequently used emojis based on your full dataset.
  • Top 10 Posters
    • This chart shows the Top 10 posters in your dataset.
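If you download a dataset and want to reproduce similar counts yourself, the numbers behind two of these charts can be approximated with a few lines of Python. This is only a sketch: the `author` and `created_at` field names and the ISO timestamp format are assumptions and may not match the columns in an actual export.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records with assumed field names.
posts = [
    {"author": "alice", "created_at": "2024-04-01T09:15:00"},
    {"author": "bob", "created_at": "2024-04-01T17:40:00"},
    {"author": "alice", "created_at": "2024-04-02T08:05:00"},
]


def posts_per_day(records):
    """Count posts per calendar day, returned in chronological order."""
    days = Counter(
        datetime.fromisoformat(r["created_at"]).date().isoformat() for r in records
    )
    return dict(sorted(days.items()))


def top_posters(records, n=10):
    """Return up to n (author, post count) pairs, most active first."""
    return Counter(r["author"] for r in records).most_common(n)


daily = posts_per_day(posts)
leaders = top_posters(posts)
```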

EDU Version - For Teaching & Learning

Free – Has a limited feature set and is meant for teaching and learning only

All Communalytic EDU accounts can collect and store up to 30K records shared across 3 datasets and have the following platform-specific data collection caps.

  • Bluesky: (Avail. ~Apr. 2024)
  • CrowdTangle Facebook/Instagram URL Search: Posts* from public Facebook/Instagram accounts, groups or pages that shared the same URLs (e.g., a URL to a single NYT story or the URL to any other domain name). To use this collector, you will need to apply for academic access to Meta’s CrowdTangle platform. CrowdTangle data is not exhaustive; it only tracks public posts by “influential” accounts. Here’s more info about the types of Facebook/Instagram accounts, pages and groups indexed by CrowdTangle.
  • Mastodon Recent Posts: Recent public posts (≤ 5K) plus any then-available corresponding replies from any public Mastodon server.
  • Mastodon Hashtag Search: Recent public posts (≤ 5K) containing a specific hashtag plus any then-available corresponding replies from any public Mastodon server.
  • Reddit Recent Submissions: Recent public submissions (≤ 200) plus any then-available corresponding comments/replies* from any public subreddit. To use this collector, you will need to create and link your Reddit account to Communalytic.
    • Note 1: This collector also includes a keyword-based search filter which allows users to build highly curated datasets consisting of only submissions containing specified keywords.
    • Note 2: The EDU version does not collect posts from subreddits with 10 million or more subscribers, like r/askReddit. If you want to collect data from subreddits with 10 million+ subscribers, please check out Communalytic PRO.
  • Reddit Live: Available only in Communalytic PRO
  • Telegram Historical Posts: Posts* from public Telegram channels, groups or supergroups (5 max) per dataset. To use this collector, you will need to apply for a free Telegram Developer Account.
  • X Recent Search: Recent posts* posted within the previous 7 days that match a specified search query. (Req. an X Developer Account and a paid X API plan)
  • X Threads: Recent replies* to any public post posted within the previous 7 days. This collector is ideal for studying recent posts that have attracted a high level of engagement. (Req. an X Developer Account and a paid X API plan)
  • YouTube Video Comments: Comments* from any public YouTube video. (Req. a Google Developer Account)

* Generally, there is no limit – except where noted – on the number of posts or replies that can be collected. However, due to API and/or computing restrictions, the total records per dataset cannot exceed the EDU data storage cap of 30K.

Depending on the platform, you may need to apply directly to the platform(s) for API access. Data/API access is granted solely at the discretion of the respective social media platform. Access can be revoked at any time by the platform.

No, you cannot use Communalytic EDU to collect data that is private such as DMs or posts from accounts that are set to private.

The developers of Communalytic EDU are proponents of ethical computational social science research in the public interest. All data access in Communalytic EDU is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution. 

As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AOIR).

Yes, you can concurrently run one collector for each available data source.

You can collect and store ≤ 30K records shared across ≤3 datasets at any time in your Communalytic EDU account (i.e., per account, you can have 1 dataset with ≤ 30K records or up to 3 datasets with a variable number of records not exceeding 30K records in total).

If you’re at your account limit, you can download your previously collected datasets to free up space.

Alternatively, if you need more capacity, consider upgrading to Communalytic PRO, where you can collect and store ≤ 10M records shared across ≤ 50 datasets.
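The cap arithmetic above can be expressed as a small check. This is a sketch of the stated rule only; the function and constant names are hypothetical and do not correspond to anything in Communalytic itself.

```python
# Caps as stated in this FAQ; the names are made up for this example.
EDU_MAX_RECORDS = 30_000
EDU_MAX_DATASETS = 3


def can_add_dataset(existing_sizes, new_size,
                    max_records=EDU_MAX_RECORDS,
                    max_datasets=EDU_MAX_DATASETS):
    """Return True if one more dataset of new_size records fits under both caps."""
    if len(existing_sizes) + 1 > max_datasets:
        return False  # the dataset count cap would be exceeded
    return sum(existing_sizes) + new_size <= max_records


fits = can_add_dataset([10_000, 15_000], 5_000)       # exactly reaches 30K
too_big = can_add_dataset([10_000, 15_000], 5_001)    # one record over the cap
```

Passing `max_records=10_000_000` and `max_datasets=50` applies the same check to the PRO caps.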

Yes, you can download your datasets as a CSV file along with all toxicity and sentiment polarity scores. 

In addition, you can also download the resulting communication or semantic network files as a GraphML file. 
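GraphML is an XML-based format, so a downloaded network file can be inspected with Python's standard library alone. The sketch below parses a tiny hand-written GraphML sample; the `weight` attribute and the node names are assumptions for illustration, and the attributes in an actual Communalytic export may be named differently.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample; a real export will contain different attributes.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="w" for="edge" attr.name="weight" attr.type="double"/>
  <graph edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <edge source="alice" target="bob"><data key="w">0.25</data></edge>
  </graph>
</graphml>"""


def read_edges(graphml_text):
    """Return a list of (source, target, {attribute name: raw text value})."""
    root = ET.fromstring(graphml_text)
    # Map key ids (e.g. "w") to human-readable attribute names (e.g. "weight").
    names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    edges = []
    for e in root.findall("g:graph/g:edge", NS):
        attrs = {
            names.get(d.get("key"), d.get("key")): d.text
            for d in e.findall("g:data", NS)
        }
        edges.append((e.get("source"), e.get("target"), attrs))
    return edges


edges = read_edges(SAMPLE)
```

Attribute values are returned as raw strings; converting them to numbers is left to the caller.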

Yes, you can upload/import an existing dataset (in CSV format) into Communalytic EDU for analysis, subject only to the EDU data cap of 30K records shared across 3 datasets.

Users of Communalytic can share datasets with other users who are using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO. 

  • You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
  • You can accept shared datasets from a collaborator from within Communalytic under the ‘Shared with Me’ tab. (Look for a jingling red bell.)

Yes, you can move datasets from the EDU version to the PRO version. Start by downloading your dataset as a CSV file from Communalytic EDU and then upload the file to your Communalytic PRO account.

We’ll keep your datasets on our server for 100 days after the end date of your data collection.

You will receive one notification 3 weeks before the expiration date and another 3 days before your dataset is automatically deleted from our system.
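For planning purposes, the retention timeline can be worked out with simple date arithmetic. This sketch assumes that the expiration date and the deletion date are the same day and that both notifications are counted back from it; the function name is hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 100  # as stated in this FAQ


def retention_schedule(collection_end):
    """Return the assumed deletion and notification dates for a dataset."""
    deletion = collection_end + timedelta(days=RETENTION_DAYS)
    return {
        "deletion": deletion,
        "first_notice": deletion - timedelta(weeks=3),
        "final_notice": deletion - timedelta(days=3),
    }


schedule = retention_schedule(date(2024, 1, 1))
```

For a collection that ended on 1 January 2024, this places the deletion date on 10 April 2024, with notices on 20 March and 7 April.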

If you are using Communalytic in an academic publication, please cite us as: 

  • Gruzd, A., & Mai, P. (2022). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.com

Note: For information on how to properly describe Communalytic EDU data collection processes, see the FAQ section on “What are the parameters for data collection?”

PRO Version - For Research

Paid ($349 USD) – For a 6-month subscription to support site infrastructure such as server-side data collection, processing, analysis and visualization, as well as higher data collection and storage capacity.

All Communalytic PRO accounts can collect and store up to 10M records shared across ≤ 50 datasets and have the following platform-specific data collection caps. 

  • Bluesky: (Avail. ~Apr. 2024)
  • CrowdTangle Facebook/Instagram URL Search: Posts* from public Facebook/Instagram accounts, groups or pages that shared the same URLs (e.g., a URL to a single NYT story or the URL to any other domain name). To use this collector, you will need to apply for academic access to Meta’s CrowdTangle platform. CrowdTangle data is not exhaustive; it only tracks public posts by “influential” accounts. Here’s more info about the types of Facebook/Instagram accounts, pages and groups indexed by CrowdTangle.
  • Mastodon Recent Posts: Recent public posts (≤ 50K) plus any then-available corresponding replies from any public Mastodon server.
  • Mastodon Hashtag Search: Recent public posts (≤ 50K) containing a specific hashtag plus any then-available corresponding replies from any public Mastodon server.
  • Reddit Recent Submissions: Recent public submissions (≤ 900) plus any then-available corresponding comments/replies* from any public subreddit. To use this collector, you will need to create and link your Reddit account to Communalytic.
    • Note 1: This collector also includes a keyword-based search filter which allows users to build highly curated datasets consisting of only submissions containing specified keywords.
  • Reddit Live: Future public submissions* plus any then-available comments and replies* from any public subreddit for up to 7 consecutive days going forward. To use this collector, you will not need to create and link your Reddit account to Communalytic.
    • Note 1: Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
    • Note 2: Communalytic will try to collect any new submissions within the specified data collection period; however, some posts in “high volume” groups (such as r/all) may be dropped due to Reddit API limitations.
  • Telegram Historical Posts: Posts* from public Telegram channels, groups or supergroups (10 max) per dataset. To use this collector, you will need to apply for a free Telegram Developer Account.
  • X Recent Search: Recent posts* posted within the previous 7 days that match a specified search query. (Req. an X Developer Account and a paid X API plan)
  • X Threads: Recent replies* to any public post posted within the previous 7 days. This collector is ideal for studying recent posts that have attracted a high level of engagement. (Req. an X Developer Account and a paid X API plan)
  • YouTube Video Comments: Comments* from any public YouTube video. (Req. a Google Developer Account)

* Generally, there is no limit – except where noted – on the number of posts or replies that can be collected. However, due to API and/or computing restrictions, the total records per dataset cannot exceed the PRO data storage cap of 10M.

Depending on the platform, you may need to apply directly to the platform(s) for API access. Data/API access is granted solely at the discretion of the respective social media platform. API access can be revoked at any time by the platform.

No, you cannot use Communalytic PRO to collect data that is private, such as DMs or posts from accounts that are set to private.

The developers of Communalytic PRO are proponents of ethical computational social science research in the public interest. All data access in Communalytic PRO is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution.

As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AOIR).

Yes, you can concurrently run one collector for each available data source.

You can collect and store ≤ 10M records shared across ≤ 50 datasets at any time in your Communalytic PRO account (i.e., per account, you can have 1 dataset with ≤ 10M records or up to 50 datasets with a variable number of records not exceeding 10M records in total).

If you’re at your account limit, you can download your previously collected datasets to free up space.

Alternatively, if you know that you are likely to exceed either the 50-dataset cap or the 10M-record cap per account, you have the option to create a second PRO account using a different email address.

Yes, you can download your datasets as a CSV file along with all toxicity and sentiment polarity scores.

In addition, you can also download the resulting communication or semantic network files as a GraphML file.

Yes, you can upload/import an existing dataset (in CSV format) into Communalytic PRO for analysis, subject only to the PRO data cap of 10M records shared across ≤ 50 datasets.

(NEW!) You can now also upload/import an existing Twitter or Telegram dataset from multiple JSON files.

Users of Communalytic can share datasets with other users who are using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO. 

  • You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
  • You can accept shared datasets from a collaborator from within Communalytic under the ‘Shared with Me’ tab. (Look for an animated red bell.)

Yes, you can move datasets from the PRO version to the EDU version. However, please note that due to the lower EDU data cap, this ability is limited to datasets with ≤ 30K records.

We’ll keep your datasets on our server as long as your PRO account has not expired. You can extend your PRO account at any time for another 6 months via the My Profile menu within Communalytic PRO. 

You will receive a notification 7 days before your account’s expiration date. After your account has expired, you will have 14 days to extend it before your account and datasets are automatically removed from our system.

If you are using Communalytic in an academic publication, please cite us as:

  • Gruzd, A., & Mai, P. (2022). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.com

Note: For information on how to properly describe Communalytic PRO data collection processes, see the FAQ section on “What are the parameters for data collection?”