If you are using Communalytic in an academic publication, please cite us as:
- Gruzd, A., & Mai, P. (2023). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.org
New to Communalytic? We have prepared a number of tutorials to help you get started.
Table of Contents
- Section 1: Getting Started With Communalytic
- Section 2: Working with Reddit Data
- Section 3: Working with Telegram Data
- Section 4: Working with CrowdTangle's Facebook & Instagram URL-Search Data
- Section 5: Working with Twitter Data (Req. Twitter's Developer Acc. + Twitter's API Plan)
- Section 6: Working with YouTube Data
- Section 7: Toxicity Analysis
- Section 8: Sentiment Analysis
- Section 9: Topic Analysis
- Section 10: Network Analysis and Visualization
- Section 11: Data Management (Data Import/Export)
Section 1: Getting Started With Communalytic
There are two versions of Communalytic: EDU and PRO. Each version is hosted on its own dedicated server with its own account creation and sign-in processes. Users of Communalytic can share datasets with other users who are using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).
- Communalytic EDU is designed to help students learn about social media data analytics
- Communalytic PRO is designed for the academic research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest.
Section 5: Working with Twitter Data (Req. Twitter's Developer Acc. + Twitter's API Plan)
To collect data from Twitter, you need to purchase Twitter’s Basic (10k tweets/month) or Pro (1M tweets/month) API plan. The amount of data you can collect is subject to the limit Twitter sets for the plan you purchase. You also need to create your own Twitter developer account.
Section 7: Toxicity Analysis
The Toxicity Analysis Module uses AI models to detect the level of toxicity in online conversations. Powered by two machine learning APIs, Detoxify and Perspective, the module analyzes posts in your dataset and can generate the following “toxicity” scores: Toxicity, Severe Toxicity, Identity Attack, Insult, Profanity, Threat.
- Troubleshooting tips for obtaining a Perspective API Key
- The Google account used to obtain a Google Perspective API Key can be different from the Google account you used to create your Communalytic account.
- In some instances, Google might not allow you to create a Google Cloud project with your academic/institutional email. If that is the case, you will need to use a Google account ending with @gmail.com.
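For readers curious about what the Perspective side of this looks like, here is a minimal sketch of building a request body and reading scores out of a response. It is an illustration only, not Communalytic's own code: the helper names `build_request` and `extract_scores` are hypothetical, the sample response is made up, and the request/response shape follows our understanding of Perspective's public `comments:analyze` documentation. No network call is made.

```python
# Perspective attribute names corresponding to the six scores listed above.
ATTRIBUTES = [
    "TOXICITY",
    "SEVERE_TOXICITY",
    "IDENTITY_ATTACK",
    "INSULT",
    "PROFANITY",
    "THREAT",
]


def build_request(text, attributes=ATTRIBUTES):
    """Build the JSON body for an analyze request (hypothetical helper)."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_scores(response):
    """Return {attribute: summary score between 0 and 1} from a response dict."""
    return {
        name: data["summaryScore"]["value"]
        for name, data in response.get("attributeScores", {}).items()
    }


# Made-up response, trimmed to two attributes for brevity.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.87, "type": "PROBABILITY"}},
    }
}

request_body = build_request("example post text")
scores = extract_scores(sample_response)
print(sorted(request_body["requestedAttributes"]))
print(scores)
```

Each score is a value between 0 and 1; a higher value means the model considers the post more likely to exhibit that attribute.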
Section 8: Sentiment Analysis
The Sentiment Analysis module in Communalytic can conduct sentiment analysis on text in the following languages: English, French, German and Russian using one or more of the following three popular sentiment analysis libraries: VADER (EN), TextBlob (EN, FR, DE) and Dostoevsky (RU).
- Posts in French or German will only be analyzed by TextBlob.
- Posts in Russian will only be analyzed by Dostoevsky.
- Posts in English will be analyzed by both VADER and TextBlob. Researchers with a predominantly English language dataset will have the option to inspect conflicting polarity scores generated by these two different sentiment analysis libraries (VADER and TextBlob) and decide which library is better suited/more accurate for analyzing their particular dataset.
- Note: This tutorial is for datasets consisting of mostly English language posts
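To make the idea of "conflicting polarity scores" concrete, here is a small sketch that labels each score as positive, neutral, or negative and lists the posts where the two libraries disagree. The scores are made up, the function names are hypothetical, and the ±0.05 cut-off is an assumption (it is the convention commonly suggested for VADER's compound score, applied here to TextBlob as well). How Communalytic itself defines a conflict may differ.

```python
def polarity_label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse label (assumed cut-off)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def find_conflicts(scored_posts):
    """Return (post id, VADER label, TextBlob label) where the labels differ."""
    conflicts = []
    for post in scored_posts:
        vader_label = polarity_label(post["vader"])
        textblob_label = polarity_label(post["textblob"])
        if vader_label != textblob_label:
            conflicts.append((post["id"], vader_label, textblob_label))
    return conflicts


# Made-up scores for three English-language posts.
scored_posts = [
    {"id": "p1", "vader": 0.62, "textblob": 0.50},
    {"id": "p2", "vader": -0.48, "textblob": 0.20},
    {"id": "p3", "vader": 0.00, "textblob": -0.30},
]

conflicts = find_conflicts(scored_posts)
print(conflicts)
```

Reading the flagged posts by hand is usually the quickest way to judge which library suits a given dataset better.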
Section 9: Topic Analysis
The Topic Analysis Module automatically identifies and groups together posts that are semantically similar (i.e., similar in meaning) and can be used to spot latent topics in a dataset (i.e., abstract topics that may not be directly observable from just reading the posts). The module uses AI to transform human-readable text such as social media posts into computer-readable vectors of numbers known as embeddings. Posts whose embeddings are located close to each other in this multi-dimensional space are considered semantically similar. For more information on embeddings, see here and here.
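As a rough illustration of what "close to each other" can mean, the sketch below compares toy three-dimensional vectors using cosine similarity. This is an assumption made for the example: cosine similarity is one common closeness measure for embeddings, not necessarily the one Communalytic uses, and real embeddings have hundreds of dimensions. The vectors and post names are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": post_a and post_b point in nearly the same direction,
# while post_c points somewhere else entirely.
embeddings = {
    "post_a": [0.9, 0.1, 0.0],
    "post_b": [0.8, 0.2, 0.1],
    "post_c": [0.0, 0.1, 0.9],
}

sim_ab = cosine_similarity(embeddings["post_a"], embeddings["post_b"])
sim_ac = cosine_similarity(embeddings["post_a"], embeddings["post_c"])
print(f"post_a vs post_b: {sim_ab:.2f}")
print(f"post_a vs post_c: {sim_ac:.2f}")
```

In this toy data, post_a and post_b would be grouped under the same topic, while post_c would fall into a different one.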
Section 10: Network Analysis and Visualization
The Network Analyzer module in Communalytic can automatically generate and visualize the following types of networks:
- Reply-To Network: Account-to-Account (Reddit, YouTube, Telegram (groups only), Twitter)
- This communication network shows who replied to whom.
- Retweet Network: Account-to-Account (Twitter only)
- This communication network shows who retweeted whom.
- Two-Mode Link Sharing Network: Account-to-Website (Reddit, YouTube, Twitter, CrowdTangle, Telegram channels & groups)
- This ‘link sharing’ network shows which accounts in your dataset shared a link to the same website(s).
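The sketch below shows, under simplifying assumptions, how a reply-to network and a two-mode link sharing network could be derived from a list of posts. The record layout (`id`, `author`, `reply_to`, `urls`) and the function names are hypothetical and chosen for illustration; they are not Communalytic's export format or internal code.

```python
from collections import Counter
from urllib.parse import urlparse

# Invented posts: each has an author, an optional parent post, and shared URLs.
posts = [
    {"id": "1", "author": "alice", "reply_to": None,
     "urls": ["https://example.org/a"]},
    {"id": "2", "author": "bob", "reply_to": "1", "urls": []},
    {"id": "3", "author": "carol", "reply_to": "1",
     "urls": ["https://example.org/b", "https://news.example.com/x"]},
    {"id": "4", "author": "alice", "reply_to": "2", "urls": []},
]


def reply_edges(posts):
    """Count account-to-account edges: (replier, account replied to)."""
    author_of = {p["id"]: p["author"] for p in posts}
    edges = Counter()
    for p in posts:
        parent = p["reply_to"]
        if parent in author_of:  # skips top-level posts and unknown parents
            edges[(p["author"], author_of[parent])] += 1
    return edges


def link_sharing_edges(posts):
    """Count account-to-website edges: (account, domain of the shared link)."""
    edges = Counter()
    for p in posts:
        for url in p["urls"]:
            edges[(p["author"], urlparse(url).netloc)] += 1
    return edges


replies = reply_edges(posts)
links = link_sharing_edges(posts)
print(dict(replies))
print(dict(links))
```

In the link sharing output, alice and carol are both connected to example.org, which is the kind of shared-website pattern the two-mode network is meant to surface.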
How to create a signed network in Communalytic
- The Network Analyzer module in Communalytic is unique among network research tools in that it can generate and visualize so-called “signed” networks. A signed network is a network with edges that contain additional information such as positive or negative signs or scores (weights).
- To turn a network into a signed network in Communalytic, users have the option to run additional analyses (toxicity and/or sentiment) prior to creating a network representation of their dataset. The resulting toxicity scores and/or sentiment polarity scores are then added as weights to edges in the network and visualized for easier exploration and analysis.
- This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within the network so that they may be examined in more detail.
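As a simplified sketch of the idea, the example below attaches a toxicity score to each reply edge and then picks out the edges above a cut-off. The data, the function names, the choice to average scores when one account replies to another several times, and the 0.7 threshold are all assumptions made for illustration. Communalytic's own aggregation and visualization may work differently.

```python
# Invented replies, each already scored for toxicity (0 to 1).
replies = [
    {"source": "bob", "target": "alice", "toxicity": 0.05},
    {"source": "carol", "target": "alice", "toxicity": 0.85},
    {"source": "alice", "target": "bob", "toxicity": 0.10},
    {"source": "carol", "target": "alice", "toxicity": 0.75},
]


def signed_edges(replies):
    """Return {(source, target): mean toxicity of all replies on that edge}."""
    totals = {}
    for r in replies:
        key = (r["source"], r["target"])
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + r["toxicity"], count + 1)
    return {key: total / count for key, (total, count) in totals.items()}


def edges_of_interest(edges, threshold=0.7):
    """List edges whose weight meets or exceeds the (assumed) threshold."""
    return sorted(key for key, weight in edges.items() if weight >= threshold)


edges = signed_edges(replies)
flagged = edges_of_interest(edges)
print(edges)
print(flagged)
```

Here only the carol-to-alice edge is flagged, so that interaction would be the one to examine more closely.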
Learn more about signed networks
- Signed Networks in Social Media by Leskovec, Huttenlocher, Kleinberg
- Signed Social Networks: A Survey by Girdhar & Bharadwaj