FAQ
Welcome to Communalytic! This page answers some frequently asked questions. As Communalytic is still under active development, the answers here are subject to change. We’ll update this FAQ periodically as we continue to release more features. Thank you for joining our research community!
Frequently Asked Questions
Overview
- What is Communalytic?
Communalytic is a computational social science research tool for studying online communities and discourse. It is designed to provide researchers with the resources and infrastructure necessary for conducting independent research in the public interest. Communalytic can collect and analyze publicly available data from Bluesky, Mastodon, Reddit, Telegram, X (formerly Twitter), and YouTube, or you can import your own CSV or JSON data files. No coding is required.
In addition to a large assortment of easy-to-use social media data collectors, Communalytic also comes with a full suite of built-in data analytics modules, including a:
- Topic Analyzer that uses text embeddings to automatically identify and group together social media posts that are semantically similar,
- Network Analyzer that can automatically generate and visualize various types of signed and unsigned networks including communication and link-sharing networks,
- Toxicity Analyzer powered by two different machine learning APIs: Detoxify and Perspective,
- Sentiment Analyzer powered by three different text processing libraries: VADER (EN), TextBlob (EN, FR, DE) and Dostoevsky (RU).
These modules can automatically:
- detect anti-social interactions (i.e., harassment, hate speech, extremist content, etc.),
- assess sentiments in online discourse,
- group together social media posts that are semantically similar and identify latent topics within your dataset,
- generate and visualize various types of networks, including communication and link-sharing networks.
Used separately or together, these analytical modules can help you study online communities and influencers, map shared interests among online actors, study the spread of mis- and disinformation, and detect signs of possible coordination among seemingly disparate actors.
- How to cite Communalytic?
If you are using Communalytic in an academic publication, please cite us as:
- Gruzd, A., & Mai, P. (<access year>). Communalytic: a computational social science research tool for studying online communities and discourse. Available at https://Communalytic.org
- Which version of Communalytic (EDU or PRO) should I use?
There are two versions of Communalytic: EDU and PRO.
- Communalytic EDU is designed to help students learn about social media data analytics and social network analysis.
- Communalytic PRO is designed for the research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest.
Each version is hosted on its own dedicated server and has its own account creation and sign-in processes. Users of Communalytic can share datasets with other users using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).
- How many datasets can I have?
- Each Communalytic EDU account can collect and store ≤30K records shared across ≤3 datasets at any time (i.e., per account, you can have 1 dataset with ≤30K records or up to 3 datasets with a variable number of records not exceeding 30K records in total).
- Each Communalytic PRO account can collect and store ≤10M records shared across ≤50 datasets at any time (i.e., per account, you can have 1 dataset with ≤10M records or up to 50 datasets with a variable number of records not exceeding 10M records in total).
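As a rough illustration of how the shared quota works, the sketch below checks whether a set of datasets fits within an account's limits. The `LIMITS` table and the `fits_quota` function are hypothetical names that simply mirror the numbers stated above; they are not part of Communalytic.

```python
# Hypothetical helper mirroring the limits stated above; not Communalytic code.
LIMITS = {
    "EDU": {"max_records": 30_000, "max_datasets": 3},
    "PRO": {"max_records": 10_000_000, "max_datasets": 50},
}


def fits_quota(version, dataset_sizes):
    """Return True if the datasets fit within the account-wide limits."""
    limit = LIMITS[version]
    return (
        len(dataset_sizes) <= limit["max_datasets"]
        and sum(dataset_sizes) <= limit["max_records"]
    )


print(fits_quota("EDU", [30_000]))                 # one dataset using the full cap
print(fits_quota("EDU", [10_000, 15_000, 5_000]))  # three datasets, 30K in total
print(fits_quota("EDU", [20_000, 15_000]))         # 35K in total exceeds the cap
```

The record cap applies to the account as a whole, so the sizes are summed rather than checked per dataset.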
- Can I get more datasets for my account?
- If you’re at your account limit, you can download your previously collected datasets to free up space.
- Alternatively, if you need more space, consider upgrading to Communalytic PRO, where you can collect and store ≤10M records shared across ≤50 datasets.
Data Collection
- Do I need to apply to the platforms for permission to access their APIs?
Communalytic provides stable and efficient structured data access via official APIs owned and maintained by various social media platforms; it does not “scrape the data.” Depending on the platform, Communalytic users may need to request and/or pay separately for API/data access. Since API/data access is granted solely at the discretion of the respective platform and can be revoked at any time by the platform, Communalytic cannot and does not guarantee or promise data access to any particular social media platform. Currently, X requires a separate paid API plan to access their data. (See “What types of data are available via Communalytic EDU/PRO?” and our Tutorials for platform-specific details and instructions on how to request access.)
- Bluesky: You do not need to apply separately for API access. Bluesky automatically generates an API key upon request for each user/session.
- Mastodon: You do not need to create a Mastodon account or apply for a separate API key. Mastodon automatically generates an API key upon request for each user/session.
- Reddit: Sign up for Reddit and link your Reddit account to Communalytic.
- Telegram: Request a free Telegram Developer Account.
- X (Twitter): Request a Twitter Developer Account and purchase a Twitter Basic (10k tweets/month) or Pro plan (1M tweets/month).
- YouTube: Request a free YouTube API key from Google.
- Perspective API (for Toxicity Analysis): Request a free Perspective API key from Google.
- What types of data are available via Communalytic EDU?
All Communalytic EDU accounts can collect and store up to 30K records shared across ≤ 3 datasets. However, due to platform API restrictions and/or limited computing resources, there may be additional platform-specific limits on how much data (posts + replies) can be collected.
- Bluesky Recent Search Data Collector: Recent public posts (≤5K) plus any then-available corresponding replies based on a given search query. To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated by Bluesky during the collection process.
- See also Bluesky Data Structure
- Bluesky Thread Data Collector: All available replies to a given post, including replies to replies. To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- Bluesky User Timeline Data Collector: Recent posts, reposts and replies from a given account (aka Timeline). To get a full view of conversations, the collector can also retrieve replies by other users in response to the specified user’s posts. To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- Mastodon Recent Posts: Recent public posts (≤5K) plus any then-available corresponding replies from any public Mastodon server. To use this collector, you do not need to create a Mastodon account or apply for a separate API key; Mastodon will automatically generate an API key upon request for each user/session.
- See also Mastodon Data Structure
- Mastodon Hashtag Search: Recent public posts (≤5K) containing a specific hashtag plus any then-available corresponding replies from any public Mastodon server. To use this collector, you do not need to create a Mastodon account or apply for a separate API key; Mastodon will automatically generate an API key upon request for each user/session.
- Reddit Recent Submissions: Recent public submissions (≤200) plus any then-available corresponding comments/replies from any public subreddit. To use this collector, you will need to create and link your Reddit account to Communalytic.
- See also Reddit Data Structure
- Note 1: This collector also includes a keyword-based search filter which allows users to build highly curated datasets consisting of only submissions containing specified keywords.
- Note 2: The EDU version does not collect posts from subreddits with 10 million or more subscribers, like r/askReddit. If you want to collect data from subreddits with 10 million+ subscribers, please check out Communalytic PRO.
- Reddit Live: Future public submissions plus any then-available comments and replies from any public subreddit for up to 7 consecutive days going forward. (Available only in Communalytic PRO.)
- Telegram Historical Posts: Posts from public Telegram channels, groups or supergroups (5 max) per dataset. To use this collector, you will need to apply for a free Telegram Developer Account.
- See also Telegram Data Structure
- X Recent Search: Recent posts posted within the previous 7 days that match a specified search query. (Req. an X Developer Account and a paid Twitter API plan)
- See also X (Twitter) Data Structure
- X Threads: Recent replies to any public post posted within the previous 7 days. This collector is ideal for studying recent posts that have attracted a high level of engagement. (Req. an X Developer Account and a paid Twitter API plan)
- YouTube Video Comments: Comments from any public YouTube video. (Req. a free Google Developer Account)
- See also YouTube Data Structure
- What types of data are available via Communalytic PRO?
All Communalytic PRO accounts can collect and store up to 10M records shared across ≤ 50 datasets. However, due to platform API restrictions and/or limited computing resources, there may be additional platform-specific limits on how much data (posts + replies) can be collected.
- Bluesky Recent Search Data Collector: Recent public posts (≤5K) plus any then-available corresponding replies based on a given search query. To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated by Bluesky during the collection process.
- See also Bluesky Data Structure
- Bluesky Thread Data Collector: All available replies to a given post, including replies to replies. To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- Bluesky User Timeline Data Collector: Recent posts, reposts and replies from a given account (aka Timeline). To get a full view of conversations, the collector can also retrieve replies by other users in response to the specified user’s posts. To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- Mastodon Recent Posts: Recent public posts (≤50K) plus any then-available corresponding replies from any public Mastodon server. To use this collector, you do not need to create a Mastodon account or apply for a separate API key; Mastodon will automatically generate an API key upon request for each user/session.
- See also Mastodon Data Structure
- Mastodon Hashtag Search: Recent public posts (≤50K) containing a specific hashtag plus any then-available corresponding replies from any public Mastodon server. To use this collector, you do not need to create a Mastodon account or apply for a separate API key; Mastodon will automatically generate an API key upon request for each user/session.
- Reddit Recent Submissions: Recent public submissions (≤900) plus any then-available corresponding comments/replies from any public subreddit. To use this collector, you will need to create and link your Reddit account to Communalytic.
- See also Reddit Data Structure
- Note 1: This collector also includes a keyword-based search filter which allows users to build highly curated datasets consisting of only submissions containing specified keywords.
- Note 2: The PRO version can collect posts from subreddits with 10 million+ subscribers, like r/askReddit.
- Reddit Live: Future public submissions plus any then-available comments and replies from any public subreddit for up to 7 consecutive days going forward. To use this collector, you will need to create and link your Reddit account to Communalytic.
- Note 1: Please note that comments on Reddit submissions and replies to comments are collected only at the end of the specified data collection period. If a comment or a reply has been deleted by the moderator(s) or the poster prior to the end date of your data collection, it will not be included in the final dataset.
- Note 2: Communalytic will try to collect any new submissions within the specified data collection period; however, some posts in “high volume” groups (such as r/all) may be dropped due to Reddit API limitations.
- Telegram Historical Posts: Posts from public Telegram channels, groups or supergroups (10 max) per dataset. To use this collector, you will need to apply for a free Telegram Developer Account.
- See also Telegram Data Structure
- X Recent Search: Recent posts posted within the previous 7 days that match a specified search query. (Req. an X Developer Account and a paid Twitter API plan)
- See also X (Twitter) Data Structure
- X Threads: Recent replies to any public post posted within the previous 7 days. This collector is ideal for studying recent posts that have attracted a high level of engagement. (Req. an X Developer Account and a paid Twitter API plan)
- YouTube Video Comments: Comments from any public YouTube video. (Req. a free Google Developer Account)
- See also YouTube Data Structure
- Can I collect data that is private such as DMs or posts from private groups?
No. Communalytic cannot collect private data such as DMs or posts from accounts that are set to private.
The developers of Communalytic are proponents of ethical computational social science research in the public interest. All data access in Communalytic is granted solely at the discretion of the respective social media platform/public API. If you are working with social media data, we encourage you to review and follow ethical guidelines and best practices established by your institution.
As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AoIR).
- Can I run multiple data collectors simultaneously?
Yes, you can concurrently run one data collector for each available data source.
Data Management
- How long will you keep my datasets on your server?
- For Communalytic EDU accounts, datasets are kept for 100 days from the end of their collection date. You will receive a notification 3 weeks before the expiration date and 3 days before your dataset is automatically deleted from our system.
- For Communalytic PRO accounts, datasets are kept until the expiry of the PRO account. You can extend your PRO account at any time in 6-month increments via the My Profile menu. You will receive a notification 7 days before your PRO account’s expiration date. After your account has expired, you will have 14 days to upgrade it before your account and datasets are automatically removed from our system.
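To make the EDU timeline concrete, here is a minimal sketch of the date arithmetic. It assumes that both reminders are counted back from the deletion date (100 days after collection ends); the function name `edu_retention_dates` is hypothetical and the exact timing of real notifications may differ.

```python
from datetime import date, timedelta


def edu_retention_dates(collection_end):
    """Estimate the EDU deletion date and the two reminder dates.

    Assumption: both reminders are counted back from the deletion date.
    """
    deletion = collection_end + timedelta(days=100)
    return {
        "first_notice": deletion - timedelta(weeks=3),
        "final_notice": deletion - timedelta(days=3),
        "deletion": deletion,
    }


dates = edu_retention_dates(date(2025, 1, 1))
for label, day in dates.items():
    print(label, day.isoformat())
```

Downloading a dataset before its estimated deletion date keeps a local copy regardless of when the reminders arrive.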
- Can I move datasets between the EDU and the PRO version?
- There is no direct option to achieve this. However, you can move a dataset from the EDU version to the PRO version (and vice versa) by downloading it as a CSV file first and then uploading it to another account.
- Please note that due to the EDU data cap, the transfer from the PRO to the EDU version is limited to datasets with ≤30K records.
- Can I share my dataset with my collaborators?
Users of Communalytic can share datasets with other users using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO.
- You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.
- You can accept datasets that have been shared with you within Communalytic under the ‘Shared with Me’ tab. (Look for a jingling red bell.)
- Can I upload/import my own dataset?
Yes, you can upload/import an existing dataset into Communalytic for analysis, subject to the following caps:
- Communalytic EDU: File size: <10 MB; Dataset size: ≤30K records
- Communalytic PRO: File size: <100 MB; Dataset size: ≤10M records
For larger files, you can compress the CSV file into a ZIP or GZ archive. (The ZIP or GZ file should contain only one CSV file.)
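If you prefer to create the archive with a script, the following sketch compresses a single CSV file into a GZ archive using only Python's standard library. The helper name `gzip_csv` and the demo file name are hypothetical; point the function at your own file instead.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_csv(csv_path):
    """Compress one CSV file to '<name>.csv.gz' and return the new path."""
    src = Path(csv_path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


# Demo on a throwaway file; replace it with the path to your own dataset.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "my_dataset.csv"
    csv_file.write_bytes(b"text\n" + b"hello world\n" * 1000)
    archive = gzip_csv(csv_file)
    archive_name = archive.name
    is_smaller = archive.stat().st_size < csv_file.stat().st_size
    print(archive_name, is_smaller)
```

Because the archive is built from exactly one input file, it satisfies the one-CSV-per-archive requirement noted above.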
If your dataset is from one of the listed social media platforms, use the provided templates to rename CSV columns to ensure that Communalytic can properly recognize the data fields.
If you're ONLY interested in using one of Communalytic’s textual analysis modules (e.g., Toxicity, Sentiment and/or Topic Analyzer), a CSV file with a single column called 'text' will suffice. This is ideal for analyzing any type of textual data.
If you're interested in using some or all of the available data analysis modules in Communalytic (e.g., Toxicity Analyzer, Sentiment Analyzer, Topic Analyzer, Network Analyzer, Time Series, Word & Emoji Cloud and Top Posters), your CSV file should include some or all of the following columns:
- text (Req. for Toxicity, Sentiment, Topic Analyzer, and Word/Emoji Cloud)
- created_at (Req. for Time Series) [i.e., when the post was created; example: 10/14/2022 19:03 OR 2020-03-13 23:15:56]
- user_screen_name (Req. for Top Posters and Network Analysis) [i.e., who created the post]
- in_reply_to_screen_name (Req. for Network Analysis) [i.e., recipient of the post, for replies only]
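For example, a file with all four columns could be produced with a short script like the one below. The column names come from the list above; the sample rows, the `COLUMNS` constant and the `to_csv` helper are made up for illustration.

```python
import csv
import io

# Column names as listed above.
COLUMNS = ["text", "created_at", "user_screen_name", "in_reply_to_screen_name"]

# Made-up sample rows; the second post is a reply to the first poster.
rows = [
    {
        "text": "Has anyone tried the new release?",
        "created_at": "2020-03-13 23:15:56",
        "user_screen_name": "alice",
        "in_reply_to_screen_name": "",
    },
    {
        "text": "Yes, it works well for me.",
        "created_at": "2020-03-13 23:20:01",
        "user_screen_name": "bob",
        "in_reply_to_screen_name": "alice",
    },
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

To save the result to disk, write `csv_text` to a file opened with `encoding="utf-8"` and `newline=""`. Leave `in_reply_to_screen_name` empty for posts that are not replies.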
- How do I properly open a dataset (csv) that has been exported from Communalytic in Excel?
If you have exported a dataset from Communalytic as a CSV file, and now wish to view and analyze it further in Excel, follow the steps in this tutorial to learn how to properly open it in Excel.
Important: Do not double-click the CSV file to open it in Excel. Doing so causes Excel to display emojis and other special characters incorrectly. It may also corrupt fields that store unique identifiers for posts and users: these are usually long sequences of digits, which Excel interprets as numbers and rounds once they exceed 15 significant digits.
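The identifier problem is not specific to Excel; it affects any tool that converts long IDs to floating-point numbers. The sketch below shows the effect in Python on a made-up two-column sample and shows that reading the fields as text avoids it. The sample data and variable names are hypothetical.

```python
import csv
import io

# Made-up two-column sample; a real export contains many more columns.
sample = "id,text\n1234567890123456789,hello\n"

# The csv module keeps every field as a string, so the long ID stays intact.
rows = list(csv.DictReader(io.StringIO(sample)))
post_id = rows[0]["id"]

# Converting the ID to a floating-point number loses its trailing digits.
as_number = int(float(post_id))

print(post_id)
print(as_number == int(post_id))
```

The same idea applies when loading an export into other tools: treat identifier columns as text, not as numbers.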
- Can I download my datasets?
- Yes, you can download your datasets as a CSV or Excel file along with all toxicity and sentiment polarity scores.
- In addition, you can download the resulting network files as a GraphML file.
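GraphML is an XML-based format, so a downloaded network file can also be inspected with a short script. The sketch below lists nodes and edges using Python's standard library. The embedded `SAMPLE` file and the `read_graphml` helper are hypothetical; the attributes stored in a real Communalytic export are not described here and may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GraphML format.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical miniature file; a real export will contain more data.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <node id="carol"/>
    <edge source="alice" target="bob"/>
    <edge source="carol" target="alice"/>
  </graph>
</graphml>"""


def read_graphml(text):
    """Return (node ids, (source, target) edge pairs) from GraphML text."""
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.findall(".//g:node", NS)]
    edges = [
        (e.get("source"), e.get("target"))
        for e in root.findall(".//g:edge", NS)
    ]
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
print(len(nodes), "nodes,", len(edges), "edges")
```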
Data Analysis and Visualization
- Can I collect and analyze non-English posts?
You can collect and analyze data in different languages. Most modules support analysis of non-English posts:
- Overview Page (Time series, Interactive Word Cloud, Emoji Cloud, … are all language agnostic);
- Toxicity Analysis:
- The Perspective models currently support the following languages: Arabic, Chinese, Czech, Dutch, English, French, German, Hindi, Hinglish, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish and Swedish.
- The Detoxify models currently support the following languages (available in Communalytic PRO only): English, French, Spanish, Italian, Portuguese, Turkish and Russian.
- Sentiment Analysis is based on the following libraries: VADER (supports English only), TextBlob (English, French, German), and Dostoevsky (Russian).
- Topic Analysis relies on a multilingual language model (multilingual-MiniLM-L12-v2) and is available in Communalytic PRO only.
- Network Analysis, by design, is language agnostic since it focuses on user interactions.
- What types of summary charts can Communalytic automatically generate about my dataset?
Communalytic automatically generates the following types of summary charts for each of your datasets. Each chart can be downloaded as a PNG image or as a CSV data file. Communalytic also offers an easy import option to explore and customize most of the summary charts in Plotly Chart Studio, a popular visualization tool for structured data.
- Posts Per Day Chart
- This chart shows the number of posts per day over time.
- Word Cloud Chart
- This chart shows the 100 most frequently used words based on your full dataset. It excludes numbers, URLs, and stop words in 15 different languages.
- Emoji Cloud Chart
- This chart shows the 100 most frequently used emojis based on your full dataset.
- Top 10 Posters
- This chart shows the Top 10 posters in your dataset.
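The counts behind these charts are simple tallies. The sketch below reproduces them on three made-up posts; it is an approximation for illustration, not Communalytic's own implementation. The field names, the tiny `STOP_WORDS` set and the tokenization rules are all assumptions.

```python
import re
from collections import Counter
from datetime import datetime

# Made-up posts; the field names are illustrative.
posts = [
    {"user": "alice", "created_at": "2020-03-13 23:15:56",
     "text": "Check https://example.org the data is great"},
    {"user": "bob", "created_at": "2020-03-13 08:00:00",
     "text": "Great data, great tool 2020"},
    {"user": "alice", "created_at": "2020-03-14 10:30:00",
     "text": "The tool works"},
]

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "is", "a", "and"}


def posts_per_day(posts):
    """Count posts per calendar day."""
    return Counter(
        datetime.strptime(p["created_at"], "%Y-%m-%d %H:%M:%S").date().isoformat()
        for p in posts
    )


def top_words(posts, n=100):
    """Return the n most frequent words, skipping URLs, numbers and stop words."""
    counts = Counter()
    for p in posts:
        text = re.sub(r"https?://\S+", " ", p["text"].lower())  # drop URLs
        for word in re.findall(r"[^\W\d_]+", text):  # letters only, no digits
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)


def top_posters(posts, n=10):
    """Return the n accounts with the most posts."""
    return Counter(p["user"] for p in posts).most_common(n)


print(posts_per_day(posts))
print(top_words(posts, n=3))
print(top_posters(posts))
```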
- How to conduct toxicity analysis with Communalytic’s Toxicity Analyzer Module?
Communalytic’s Toxicity Analyzer Module is an AI-powered toxicity analysis tool designed to automatically identify toxic and anti-social interactions in online discourse. Users can choose from two different AI toxicity detection systems: Detoxify or Perspective. For more info, visit our Tutorials page.
- How to conduct sentiment analysis with Communalytic’s Sentiment Analyzer Module?
Communalytic’s Sentiment Analyzer Module is a lexicon and rule-based sentiment analysis tool designed to detect the polarity of text in a dataset. Users can choose from 3 different sentiment analysis libraries: VADER (EN), TextBlob (EN, FR, DE), or Dostoevsky (RU). For more info, visit our Tutorials page.
- How to conduct topic analysis with Communalytic’s Topic Analyzer Module?
Communalytic’s Topic Analyzer Module is an AI-powered module designed to automatically identify and group together social media posts that are semantically similar using embeddings. No prior knowledge of the dataset is required. For more info, visit our Tutorials page.
- How to conduct a network analysis with Communalytic’s Network Analyzer Module and what types of networks can Communalytic generate and visualize?
Communalytic’s Network Analyzer Module can automatically generate and visualize various types of networks (graphs) including communication and link-sharing networks. It is unique among network research tools in that it can also generate and visualize so-called ‘signed networks’. For more info, visit our Tutorials page.
What’s a signed network?
A signed network is a network whose edges carry additional information such as scores or weights. To turn a network into a signed network in Communalytic, users can run additional analyses (toxicity and/or sentiment) before creating a network representation of their dataset. The resulting toxicity scores and sentiment polarity scores are then added as weights to edges in the network and visualized for easier exploration and analysis. This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within a network so that they may be examined in more detail.
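As a minimal sketch of the idea, the code below attaches toxicity scores to reply edges and flags the edges worth a closer look. The sample replies, the `build_signed_edges` helper and the 0.5 threshold are hypothetical. Averaging the scores of repeated replies between the same two accounts is an assumption made for this example, not a description of how Communalytic combines them.

```python
from collections import defaultdict

# Made-up replies with toxicity scores (0 to 1) already attached.
replies = [
    {"source": "alice", "target": "bob", "toxicity": 0.2},
    {"source": "alice", "target": "bob", "toxicity": 0.9},
    {"source": "carol", "target": "alice", "toxicity": 0.05},
]


def build_signed_edges(replies):
    """Map each (source, target) pair to its reply count and mean toxicity."""
    scores = defaultdict(list)
    for r in replies:
        scores[(r["source"], r["target"])].append(r["toxicity"])
    return {
        pair: {"count": len(vals), "mean_toxicity": sum(vals) / len(vals)}
        for pair, vals in scores.items()
    }


edges = build_signed_edges(replies)

# Flag edges whose average score crosses an arbitrary threshold.
flagged = [pair for pair, attrs in edges.items() if attrs["mean_toxicity"] >= 0.5]
print(flagged)
```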
Types of networks that can be automatically generated by Communalytic:
- Reply-To Network: Account-to-Account
- This communication network shows who replied to whom.
- Repost Network: Account-to-Account
- This communication network shows who reposted whom.
- Two-Mode Link Sharing Network: Account-to-Website
- This ‘link sharing’ network shows which accounts in your dataset shared a link to the same website.
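The difference between a one-mode communication network and a two-mode link-sharing network can be sketched as edge lists built from the same posts. The data, the field names and both helper functions below are hypothetical. Using the host name of each URL as the "website" node is an assumption made for this example.

```python
from urllib.parse import urlparse

# Made-up posts; the field names are illustrative.
posts = [
    {"user": "alice", "in_reply_to": "bob",
     "urls": ["https://example.org/a"]},
    {"user": "carol", "in_reply_to": None,
     "urls": ["https://example.org/b", "https://news.example.com/x"]},
    {"user": "bob", "in_reply_to": "alice", "urls": []},
]


def reply_edges(posts):
    """Account-to-account edges: who replied to whom."""
    return [(p["user"], p["in_reply_to"]) for p in posts if p["in_reply_to"]]


def link_sharing_edges(posts):
    """Account-to-website edges: which account shared a link to which site."""
    edges = set()
    for p in posts:
        for url in p["urls"]:
            edges.add((p["user"], urlparse(url).netloc))
    return sorted(edges)


print(reply_edges(posts))
print(link_sharing_edges(posts))
```

In the second edge list, alice and carol both connect to example.org, which is the shared-website pattern the two-mode network is meant to reveal.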