Frequently Asked Questions
Communalytic is a computational social science research tool for studying online communities and discourse. It can collect, analyze, and visualize publicly available data from Reddit, Telegram, YouTube, Facebook/Instagram (via CrowdTangle) and Twitter, or from a user-uploaded CSV file – no coding required!
Communalytic also contains a suite of advanced data analytics modules including: 1) a Toxicity Analyzer powered by two different machine learning APIs: Detoxify and Perspective, 2) a Sentiment Analyzer powered by three different text processing libraries: VADER (EN), TextBlob (EN, FR, DE) and Dostoevsky (RU), 3) a Topic Analyzer that uses text embeddings to automatically identify and group together posts that are semantically similar and 4) a built-in Network Analyzer.
These modules can be used to automatically:
- detect anti-social interactions (e.g., harassment, hate speech, extremist content),
- assess sentiments in online discourse,
- identify and group together social media posts that are semantically similar and identify latent topics within your dataset,
- generate and visualize various types of networks, including communication and link-sharing networks, which in turn can be used to identify influencers, map shared interests among online actors, study the spread of mis/dis-information and detect signs of possible coordination among seemingly disparate actors.
For more details, see Communalytic’s Tutorials page.
There are two versions of Communalytic: EDU and PRO.
- Communalytic EDU is designed to help students learn about social media data analytics.
- Communalytic PRO is designed for the academic research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest.
Each version is hosted on its own dedicated server with its own account creation and sign-in processes. Users of Communalytic can share datasets with other users who are using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).
The Network Analyzer Module in Communalytic can automatically generate and visualize various types of networks (graphs) including communication and link-sharing networks. It is unique among network research tools in that it can also generate and visualize so-called ‘signed networks’. (For more details, see Section 10: Network Visualization and Analysis in the Tutorials.)
What’s a signed network?
A signed network is a network whose edges carry additional information such as positive or negative signs or scores (weights). To turn a network into a signed network in Communalytic, users have the option of running additional analyses (toxicity and/or sentiment) prior to creating a network representation of their dataset. The resulting toxicity scores and/or sentiment polarity scores would then be added as weights to edges in the network and visualized for easier exploration and analysis. This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within the network so that they may be examined in more detail.
In addition, if a user is working with Twitter data, they also have the option of running the Bot Analyzer Module. The resulting bot probability scores are then added as weights to nodes in the network and visualized for easier exploration and analysis. This feature can be used to identify and visually highlight accounts and interactions of interest (e.g., Twitter accounts that might be bots) within the network so that they may be examined in more detail.
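To make the idea of a signed network concrete, here is a minimal Python sketch. It is not Communalytic's implementation: the reply records, the choice to average toxicity scores per edge, and the 0.7 threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical reply records: (replier, replied_to, toxicity score in [0, 1]).
replies = [
    ("alice", "bob", 0.05),
    ("bob", "alice", 0.82),
    ("carol", "alice", 0.10),
    ("bob", "alice", 0.64),
]


def build_signed_network(records):
    """Aggregate replies into directed edges weighted by mean toxicity.

    Averaging is an assumption; the source does not say how scores
    from several replies between the same two accounts are combined.
    """
    scores = defaultdict(list)
    for source, target, toxicity in records:
        scores[(source, target)].append(toxicity)
    return {edge: sum(vals) / len(vals) for edge, vals in scores.items()}


def flag_edges(network, threshold=0.7):
    """Return edges whose weight meets or exceeds a (hypothetical) threshold."""
    return sorted(edge for edge, weight in network.items() if weight >= threshold)


signed_network = build_signed_network(replies)
flagged = flag_edges(signed_network)
print(flagged)
```

In this toy data, only the edge from bob to alice is flagged, which is the kind of interaction a researcher might then examine more closely.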
Types of networks that can be automatically generated by Communalytic
- Reply-To Network: Account-to-Account (Twitter, Reddit, Telegram)
- This communication network shows who replied to whom.
- Retweet Network: Account-to-Account (Twitter only)
- This communication network shows who retweeted whom.
- Two-Mode Link Sharing Network: Account-to-Website (Twitter, Reddit, CrowdTangle, Telegram channels & groups)
- This ‘link sharing’ network shows which accounts in your dataset shared a link to the same website.
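As a rough illustration of the two-mode link-sharing network described above, the following Python sketch groups accounts by the website they linked to. The post records and field names are hypothetical, and stripping a leading "www." is a simplifying assumption rather than Communalytic's documented behaviour.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical posts; the "account" and "url" field names are assumptions.
posts = [
    {"account": "alice", "url": "https://www.example.com/story-1"},
    {"account": "bob", "url": "https://example.com/story-2"},
    {"account": "carol", "url": "https://news.example.org/a"},
    {"account": "alice", "url": "https://news.example.org/b"},
]


def website_of(url):
    """Reduce a URL to its host name, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def link_sharing_network(records):
    """Map each website node to the set of account nodes that linked to it."""
    sharers = defaultdict(set)
    for post in records:
        sharers[website_of(post["url"])].add(post["account"])
    return sharers


two_mode = link_sharing_network(posts)
# Websites linked by more than one account hint at shared interests.
co_shared = {site: sorted(accts) for site, accts in two_mode.items() if len(accts) > 1}
print(co_shared)
```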
Communalytic automatically generates the following types of summary charts for each of your datasets. Each chart can be downloaded as a PNG image or as a CSV data file. Communalytic also offers an easy import option to explore and customize most of the summary charts in Plotly Chart Studio, a popular visualization tool for structured data.
- Posts Per Day Chart
- This chart shows the number of posts per day over time.
- Word Cloud Chart
- This chart shows the 100 most frequently used words based on the full dataset. It excludes numbers, URLs, and stop words in 15 different languages.
- Emoji Cloud Chart
- This chart shows the 100 most frequently used emojis based on the full dataset.
- Top 10 Posters
- This chart shows the Top 10 posters in your dataset.
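If you download a dataset as a CSV file, counts like those behind the Posts Per Day and Top 10 Posters charts can be reproduced with a few lines of Python. This is only a sketch: the column names `author` and `created_at` are hypothetical and may not match the columns in an actual Communalytic export.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export; real column names may differ.
sample_csv = """author,created_at
alice,2024-03-01T09:15:00
bob,2024-03-01T11:40:00
alice,2024-03-02T08:05:00
carol,2024-03-02T17:30:00
alice,2024-03-02T21:10:00
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Posts per day: the first 10 characters of an ISO timestamp are the date.
posts_per_day = Counter(row["created_at"][:10] for row in rows)

# Top posters: the 10 accounts with the most posts.
top_posters = Counter(row["author"] for row in rows).most_common(10)

print(dict(posts_per_day))
print(top_posters)
```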
EDU Version - For Teaching & Learning
Requires an academic email address.
All Communalytic EDU accounts can collect and store up to 30K records shared across ≤ 3 datasets and have the following platform-specific data usage caps.
- YouTube: Comments (≤30K) from a specified publicly available YouTube video (Req. a Google Developer Account)
- Reddit Historical – Communalytic EDU can collect Reddit posts (≤30K), including: up to 200 recent submissions + corresponding comments and replies for a given public subreddit. (This collector supports keyword-based search for relevant submissions.) To use this collector, you will need to create and use your own Reddit account.
- Note: The Communalytic EDU version does not collect posts from subreddits with 10 million or more subscribers, such as r/AskReddit. If you want to collect data from subreddits with 10 million or more subscribers, please check out Communalytic PRO.
- Telegram – Communalytic EDU can collect messages (≤ 30k) from up to 5 public Telegram channels, groups or super groups per dataset. To use this collector, you will need to apply for a free Telegram Developer Account.
- Facebook/Instagram (via CrowdTangle) – Communalytic EDU can collect posts (≤ 30K) from public Facebook/Instagram accounts, groups or pages that shared the same URL (ex. a URL to a single NYT story or the URL to any other domain name). To use this collector, you will need to apply for academic access to Meta’s CrowdTangle platform. CrowdTangle data is not exhaustive; it only tracks public posts made by “influential” accounts. Here’s more info about the types of Facebook/Instagram accounts, pages and groups indexed by CrowdTangle.
- Twitter Recent Search – Retrieves tweets (≤ 30K) posted within the previous 7 days that match a specified search query. (Req. a Twitter Developers Account and a paid Twitter API plan.)
- Twitter Thread – Communalytic EDU can collect the most recent public replies (≤ 30K) to any public tweet posted within the previous 7 days. This data collection feature is ideal for studying recent tweets that have attracted a high level of engagement. (Req. a Twitter Developers Account and a paid Twitter API plan.)
Data/API access is granted solely at the discretion of the respective social media platform. You will need to apply directly to the platform(s) of your choice for API access.
- YouTube: Request a YouTube API key from Google
- Reddit: Create a Reddit user account
- Telegram: Apply for a Telegram Developer Account
- CrowdTangle (Facebook/Instagram) URL Search: Apply to Meta for a CrowdTangle Account
- Twitter Recent Search: Apply for a Twitter Developers Account and purchase Twitter’s Basic (10k tweets/month) or Pro plan (1M tweets/month).
- Twitter Threads: Apply for a Twitter Developers Account and purchase Twitter’s Basic (10k tweets/month) or Pro plan (1M tweets/month).
- Perspective API (for Toxicity Analysis): Apply to Google for Access to the Perspective API
- Nomic API (for Topic Analysis): Create Nomic Atlas user account
No, you cannot use Communalytic EDU to collect data that is private such as DMs or posts from accounts that are set to private.
The developers of Communalytic EDU are proponents of ethical computational social science research in the public interest. All data access in Communalytic EDU is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution.
As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AOIR).
Yes, you can run multiple data collectors simultaneously within Communalytic EDU: one collector each for Reddit, YouTube, Telegram, Twitter and CrowdTangle can run concurrently.
You can collect and store ≤ 30K records shared across ≤3 datasets at any time in your Communalytic EDU account (i.e., per account, you can have 1 dataset with ≤ 30K records or up to 3 datasets with a variable number of records not exceeding 30K records in total).
If you’re at your account limit, you can download your previously collected datasets to free up space.
Alternatively, if you need more capacity, consider upgrading to Communalytic PRO, where you can collect and store ≤ 10M records shared across ≤ 50 datasets.
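The account limits above combine two caps: a total record count and a number of datasets. The short Python sketch below shows one way to check whether a planned collection would fit. The function name and its parameters are hypothetical; only the cap values (30K records across 3 datasets for EDU, 10M records across 50 datasets for PRO) come from this FAQ.

```python
EDU_MAX_RECORDS = 30_000
EDU_MAX_DATASETS = 3
PRO_MAX_RECORDS = 10_000_000
PRO_MAX_DATASETS = 50


def fits_caps(existing_sizes, new_size,
              max_records=EDU_MAX_RECORDS, max_datasets=EDU_MAX_DATASETS):
    """Return True if one more dataset of new_size records stays within both caps.

    existing_sizes lists the record counts of datasets already stored.
    """
    within_dataset_cap = len(existing_sizes) + 1 <= max_datasets
    within_record_cap = sum(existing_sizes) + new_size <= max_records
    return within_dataset_cap and within_record_cap


# Two stored datasets (10K and 5K records) leave room for one more of up to 15K.
print(fits_caps([10_000, 5_000], 15_000))
# A fourth dataset is refused on EDU regardless of its size.
print(fits_caps([1, 1, 1], 1))
```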
Yes, you can download your datasets as a CSV file. In addition, you can also download the resulting communication or semantic network files as a GraphML file.
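GraphML is an XML-based format, so a downloaded network file can be inspected even without a dedicated network analysis tool. The Python sketch below parses a small hand-written GraphML sample using only the standard library. The sample graph and its `weight` attribute are assumptions; the attribute names in an actual Communalytic export may differ.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample; not an actual Communalytic export.
sample_graphml = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="w" for="edge" attr.name="weight" attr.type="double"/>
  <graph edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <edge source="bob" target="alice"><data key="w">0.73</data></edge>
  </graph>
</graphml>
"""

root = ET.fromstring(sample_graphml)

# Collect node ids and (source, target, weight) triples.
nodes = [node.get("id") for node in root.findall(".//g:node", NS)]
edges = [
    (edge.get("source"), edge.get("target"), float(edge.find("g:data", NS).text))
    for edge in root.findall(".//g:edge", NS)
]

print(nodes)
print(edges)
```

To read a downloaded file instead of the inline sample, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.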
Yes, you can upload/import an existing dataset (in CSV format) into Communalytic EDU for analysis, subject only to the EDU data cap of 30K records shared across ≤ 3 datasets.
Users of Communalytic can share datasets with other users who are using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO.
- You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
- You can accept shared datasets from a collaborator from within Communalytic under the ‘Shared with Me’ tab. (Look for a jingling red bell.)
Yes, you can move datasets from the EDU version to the PRO version. Start by downloading your dataset as a CSV file from Communalytic EDU and then upload the file to your Communalytic PRO account.
We’ll keep your datasets on our server for 100 days from the end of your collection date.
You will receive a notification 3 weeks before the expiration date and 3 days before your dataset is automatically deleted from our system.
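The retention timeline can be worked out with simple date arithmetic. The Python sketch below assumes that the "expiration date" and the automatic deletion date are the same day, 100 days after collection ends; the function name is hypothetical.

```python
from datetime import date, timedelta


def retention_schedule(collection_end):
    """Compute notice and deletion dates for an EDU dataset.

    Assumes deletion happens exactly 100 days after collection ends,
    with notices 3 weeks and 3 days before that date.
    """
    deletion = collection_end + timedelta(days=100)
    return {
        "first_notice": deletion - timedelta(weeks=3),
        "final_notice": deletion - timedelta(days=3),
        "deletion": deletion,
    }


schedule = retention_schedule(date(2024, 1, 1))
for label, day in schedule.items():
    print(label, day.isoformat())
```

For a collection that ended on 1 January 2024, this gives a first notice on 20 March, a final notice on 7 April and deletion on 10 April 2024.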
If you are using Communalytic in an academic publication, please cite us as:
- Gruzd, A., & Mai, P. (2022). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.com
Note: For information on how to properly describe Communalytic EDU data collection processes, see the FAQ section on “What are the parameters for data collection?”
PRO Version - For Research
Paid 6-month subscription to support site infrastructure – server-side data collection, processing and analysis via a dedicated commercial server, and extra data storage.
All Communalytic PRO accounts can collect and store up to 10M records shared across ≤ 50 datasets and have the following platform-specific data usage caps.
- YouTube: Comments from a specified publicly available YouTube video (Req. a Google Developer Account)
- Reddit Historical – Communalytic PRO can collect Reddit posts (≤ 10M), including up to 900 recent submissions + corresponding comments and replies for a given public subreddit. (This collector supports keyword-based search for relevant submissions.) To use this collector, you will need to create and use your own Reddit account.
- Reddit Live – Communalytic PRO can collect posts, including submissions, comments and replies to comments from a given public subreddit for up to 7 consecutive days from the start of data collection. To use this collector, you do not need to apply for a separate Reddit API key at this time.
- Note 1: Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
- Note 2: Communalytic will try to collect any new submissions within the specified data collection period; however, some posts in “high volume” groups (such as r/all) may be dropped due to the Reddit API limitations.
- Telegram – Communalytic PRO can collect messages (≤ 10M) from up to 10 public Telegram channels, groups or super groups per dataset. To use this collector, you will need to apply for a free Telegram Developer Account.
- Facebook/Instagram (via CrowdTangle) – Communalytic PRO can collect public Facebook/Instagram posts (≤ 10M) that shared the same URL (ex. a URL to a single NYT story or the URL to any other domain name). To use this collector, you will need to apply for academic access to Meta’s CrowdTangle platform. CrowdTangle data is not exhaustive; it only tracks public posts made by “influential” accounts. Here’s more info about the types of Facebook/Instagram accounts, pages and groups indexed by CrowdTangle.
- Twitter Recent Search – Retrieves tweets (≤ 500K) posted within the previous 7 days that match a specified search query. (Req. a Twitter Developers Account and a paid Twitter API plan.)
- Twitter Thread – Communalytic PRO can collect the most recent public replies (≤ 500K) to any public tweet posted within the previous 7 days. This data collection feature is ideal for studying recent tweets that have attracted a high level of engagement (for example, viral tweets from politicians, celebrities, news outlets, etc.). (Req. a Twitter Developers Account and a paid Twitter API plan.)
Data/API access is granted solely at the discretion of the respective social media platform. You will need to apply directly to the platform(s) of your choice for API access.
- YouTube: Request a YouTube API key from Google
- Reddit: Create a Reddit user account
- Telegram: Apply for a Telegram Developer Account
- CrowdTangle (Facebook/Instagram) URL Search: Apply to Meta for a CrowdTangle Account
- Twitter Recent Search: Apply for a Twitter Developers Account and purchase Twitter’s Basic (10k tweets/month) or Pro plan (1M tweets/month).
- Twitter Threads: Apply for a Twitter Developers Account and purchase Twitter’s Basic (10k tweets/month) or Pro plan (1M tweets/month).
- Perspective API (for Toxicity Analysis): Apply to Google for Access to the Perspective API
- Nomic API (for Topic Analysis): Create Nomic Atlas user account
No, you cannot use Communalytic PRO to collect data that is private such as DMs or posts from accounts that are set to private.
The developers of Communalytic PRO are proponents of ethical computational social science research in the public interest. All data access in Communalytic PRO is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution.
As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AOIR).
Yes, you can run multiple data collectors simultaneously within Communalytic PRO: one collector each for Reddit, YouTube, Telegram, Twitter and CrowdTangle can run concurrently.
You can collect and store ≤ 10M records shared across ≤ 50 datasets at any time in your Communalytic PRO account (i.e., per account, you can have 1 dataset with ≤ 10M records or up to 50 datasets with a variable number of records not exceeding 10M records in total).
If you’re at your account limit, you can download your previously collected datasets to free up space.
Alternatively, if you know that you are likely to exceed either the 50-dataset cap or the 10M-record cap per account, you have the option to create a second PRO account using a different email address.
Yes, you can download your datasets as a CSV file.
You can also download the resulting communication or semantic network files as a GraphML file.
Yes, you can upload/import an existing dataset (in CSV format) into Communalytic PRO for analysis, subject only to the PRO data cap of 10M records shared across ≤ 50 datasets.
(NEW!) You can now also upload/import an existing Twitter or Telegram dataset from multiple JSON files.
Users of Communalytic can share datasets with other users who are using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO.
- You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
- You can accept shared datasets from a collaborator from within Communalytic under the ‘Shared with Me’ tab. (Look for an animated red bell.)
Yes, you can move datasets from the PRO version to the EDU version. However, please note that because of the lower EDU data cap, this is limited to datasets with ≤ 30K records.
We’ll keep your datasets on our server as long as your PRO account has not expired. You can extend your PRO account at any time for another 6 months via the My Profile menu within Communalytic PRO.
You will receive a notification 7 days before your account’s expiration date. After your account has expired, you will have 14 days to upgrade it before your account and datasets are automatically removed from our system.
If you are using Communalytic in an academic publication, please cite us as:
- Gruzd, A., & Mai, P. (2022). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.com
Note: For information on how to properly describe Communalytic PRO data collection processes, see the FAQ section on “What are the parameters for data collection?”