Tutorials
If you are using Communalytic in an academic publication, please cite us as:
- Gruzd, A., & Mai, P. (2023). Communalytic: A Research Tool For Studying Online Communities and Online Discourse. Available at https://Communalytic.org
New to Communalytic? We have prepared a number of tutorials to help you get started.
Table of Contents
- Section 1: Getting Started With Communalytic
- Section 2: Working with Reddit Data
- Section 3: Working with Telegram Data
- Section 4: Working with CrowdTangle's Facebook & Instagram URL-Search Data
- Section 5: Working with Twitter Data (Req. Twitter's Developer Acc. + Twitter's API Plan)
- Section 6: Working with YouTube Data
- Section 7: Toxicity Analysis
- Section 8: Sentiment Analysis
- Section 9: Topic Analysis
- Section 10: Network Analysis and Visualization
- Section 11: Data Management (Data Import/Export)
Section 1: Getting Started With Communalytic
There are two versions of Communalytic: EDU and PRO. Each version is hosted on its own dedicated server with its own account creation and sign-in processes. Users of Communalytic can share datasets with other users who are using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).
- Communalytic EDU is designed to help students learn about social media data analytics.
- Communalytic PRO is designed for the academic research community and is ideal for large-scale academic research projects. It provides researchers with the resources and infrastructure necessary for conducting independent research in the public interest.
Section 2: Working with Reddit Data
Section 3: Working with Telegram Data
Telegram 101
How to obtain a Telegram API key
How to find Telegram channels/groups (Joining public channels/groups is not required to collect data)
- How to collect data from Telegram
- Learn about Telegram data structure
Section 4: Working with CrowdTangle's Facebook & Instagram URL-Search Data
Section 5: Working with Twitter Data (Req. Twitter's Developer Acc. + Twitter's API Plan)
To collect data from Twitter, you need to purchase Twitter’s Basic (10k tweets/month) or Pro (1M tweets/month) API plan. Data collection is subject to the limit Twitter allows for the specific plan you have purchased. You also need to create your own Twitter developer account.
How to request a Twitter Bearer API
How to collect data from Twitter based on Recent Search
How to use Boolean Search operators and how to use ChatGPT to develop a search query
How to collect replies to a given tweet (Twitter Thread)
Learn more about Twitter data structure
Twitter Data Collection: Tweet Rehydration
Case Study: Toxicity Analysis of a Twitter Thread
Section 6: Working with YouTube Data
Section 7: Toxicity Analysis
The Toxicity Analysis Module uses AI models to detect the level of toxicity in online conversations. Powered by two machine learning APIs, Detoxify and Perspective, the module analyzes posts in your dataset and can generate the following “toxicity” scores: Toxicity, Severe Toxicity, Identity Attack, Insult, Profanity, Threat.
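As an illustration of what a toxicity request can look like, here is a minimal sketch of the JSON body accepted by the Perspective API's comments:analyze method, requesting the six scores listed above. It only builds the request and parses a response-shaped dictionary; nothing is sent over the network. The function names are hypothetical, and this is not necessarily how Communalytic itself calls the API.

```python
import json

# Perspective API attribute names corresponding to the six scores listed above.
ATTRIBUTES = [
    "TOXICITY",
    "SEVERE_TOXICITY",
    "IDENTITY_ATTACK",
    "INSULT",
    "PROFANITY",
    "THREAT",
]


def build_perspective_request(text, language="en"):
    """Return the JSON body for a comments:analyze request (nothing is sent)."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
    }


def extract_scores(response):
    """Pull the summary score (0 to 1) for each attribute out of a response dict."""
    return {
        name: data["summaryScore"]["value"]
        for name, data in response.get("attributeScores", {}).items()
    }


# Hypothetical example post; print the request body for inspection.
body = build_perspective_request("You are a wonderful person.")
print(json.dumps(body, indent=2))
```

Each returned score is a value between 0 and 1; what counts as "toxic" for a given study is a threshold the researcher has to choose.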
How to obtain a Perspective API key
- Troubleshooting tips for obtaining a Perspective API Key
- The Google account used to obtain a Google Perspective API Key can be different from the Google account you used to create your Communalytic account.
- In some instances, Google might not allow you to create a Google Cloud project with your academic/institutional email. If that is the case, you will need to use a Google account ending with @gmail.com.
How to use the Toxicity Analysis module
Section 8: Sentiment Analysis
The Sentiment Analysis module in Communalytic can analyze text in English, French, German and Russian using one or more of three popular sentiment analysis libraries: VADER (EN), TextBlob (EN, FR, DE) and Dostoevsky (RU).
- Posts in French or German will only be analyzed by TextBlob.
- Posts in Russian will only be analyzed by Dostoevsky.
- Posts in English will be analyzed by both VADER and TextBlob. Researchers with a predominantly English language dataset will have the option to inspect conflicting polarity scores generated by these two different sentiment analysis libraries (VADER and TextBlob) and decide which library is better suited/more accurate for analyzing their particular dataset.
How to use the Sentiment Analysis module
How to inspect conflicting polarity scores between TextBlob and VADER in Excel/Google Sheets
- Note: This tutorial is for datasets consisting of mostly English language posts
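The same inspection can also be sketched in a few lines of Python instead of a spreadsheet. The example below assumes each post already has a VADER compound score and a TextBlob polarity score, both in the range -1 to 1. The field names (`id`, `vader`, `textblob`) and the sample values are hypothetical, not Communalytic's export format. The ±0.05 cut-off follows the convention commonly used with VADER; applying it to TextBlob as well is an assumption made here for simplicity.

```python
def label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def find_conflicts(rows, threshold=0.05):
    """Return ids of posts that one library scores positive and the other negative."""
    conflicts = []
    for row in rows:
        labels = {label(row["vader"], threshold), label(row["textblob"], threshold)}
        if labels == {"positive", "negative"}:
            conflicts.append(row["id"])
    return conflicts


# Hypothetical scores for three posts.
rows = [
    {"id": "p1", "vader": 0.62, "textblob": 0.40},   # both positive
    {"id": "p2", "vader": -0.48, "textblob": 0.25},  # opposite signs
    {"id": "p3", "vader": 0.02, "textblob": -0.30},  # neutral vs. negative
]
print(find_conflicts(rows))
```

Only outright positive/negative disagreements are flagged here; a stricter check could also list neutral-versus-polar cases such as `p3`.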
Section 9: Topic Analysis
The Topic Analysis Module automatically groups together posts that are semantically similar (i.e., similar in their meaning) and can be used to spot latent topics in a dataset (i.e., abstract topics that may not be directly observable from just reading the posts). The module uses AI to transform human-readable text such as social media posts into computer-readable vectors of numbers known as embeddings. Posts whose embeddings are located close to each other in a multi-dimensional space are considered semantically similar. For more information on embeddings, see here and here.
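To make "close to each other in a multi-dimensional space" concrete, here is a minimal sketch that compares toy embeddings with cosine similarity, one common closeness measure. The three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions), and the text above does not specify which distance measure or model Communalytic uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: post_a and post_b point in a similar direction,
# post_c points elsewhere.
embeddings = {
    "post_a": [0.9, 0.1, 0.0],
    "post_b": [0.8, 0.2, 0.1],
    "post_c": [0.0, 0.1, 0.9],
}

sim_ab = cosine_similarity(embeddings["post_a"], embeddings["post_b"])
sim_ac = cosine_similarity(embeddings["post_a"], embeddings["post_c"])
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Under this measure, `post_a` and `post_b` would be grouped together well before either is grouped with `post_c`.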
Section 10: Network Analysis and Visualization
The Network Analyzer module in Communalytic can automatically generate and visualize the following types of networks:
- Reply-To Network: Account-to-Account (Reddit, YouTube, Telegram (groups only), Twitter)
- This communication network shows who replied to whom.
- Retweet Network: Account-to-Account (Twitter only)
- This communication network shows who retweeted whom.
- Two-Mode Link Sharing Network: Account-to-Website (Reddit, YouTube, Twitter, CrowdTangle, Telegram channels & groups)
- This ‘link sharing’ network shows which accounts in your dataset shared a link to the same website(s).
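The network types above boil down to edge lists derived from posts. The following sketch, using hypothetical post records and field names (`author`, `reply_to`, `links`), shows how a reply-to network and a two-mode link sharing network could be built as weighted edge lists. It illustrates the idea only and is not Communalytic's implementation.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical posts; reply_to is the account being replied to (None for top-level posts).
posts = [
    {"author": "alice", "reply_to": None, "links": ["https://example.org/report"]},
    {"author": "bob", "reply_to": "alice", "links": []},
    {"author": "carol", "reply_to": "alice", "links": ["https://example.org/other"]},
    {"author": "bob", "reply_to": "alice", "links": ["https://example.com/"]},
]


def reply_edges(posts):
    """Account-to-account edges: (replier, replied-to) -> number of replies."""
    return Counter((p["author"], p["reply_to"]) for p in posts if p["reply_to"])


def link_edges(posts):
    """Two-mode edges: (account, website domain) -> number of links shared."""
    return Counter(
        (p["author"], urlparse(url).netloc) for p in posts for url in p["links"]
    )


print(reply_edges(posts))
print(link_edges(posts))
```

In the two-mode network, alice and carol end up connected to the same website node (`example.org`), which is how accounts sharing the same source become visible.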
How to create a signed network in Communalytic
- The Network Analyzer module in Communalytic is unique among network research tools in that it can generate and visualize so-called “signed” networks. A signed network is a network with edges that contain additional information such as positive or negative signs or scores (weights).
- To turn a network into a signed network in Communalytic, users can run additional analyses (toxicity and/or sentiment) before creating a network representation of their dataset. The resulting toxicity scores and/or sentiment polarity scores are then added as weights to edges in the network and visualized for easier exploration and analysis.
- This feature can be used to identify and visually highlight interactions of interest (e.g., anti-social interactions) within the network so that they may be examined in more detail.
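A signed network of this kind can be sketched as an edge list whose edges carry a score. In the example below, each reply is a hypothetical (source, target, toxicity) triple. Averaging the scores per edge and flagging edges at a 0.7 threshold are assumptions chosen for illustration, not a description of how Communalytic aggregates or highlights scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical replies: (replier, replied-to, toxicity score of the reply).
replies = [
    ("bob", "alice", 0.10),
    ("bob", "alice", 0.80),
    ("carol", "alice", 0.05),
]


def signed_edges(replies, flag_at=0.7):
    """Attach a mean toxicity weight to each edge and flag edges with any high-scoring reply."""
    scores = defaultdict(list)
    for source, target, toxicity in replies:
        scores[(source, target)].append(toxicity)
    return {
        edge: {"weight": mean(values), "flagged": max(values) >= flag_at}
        for edge, values in scores.items()
    }


for edge, attrs in signed_edges(replies).items():
    print(edge, attrs)
```

Flagging on the maximum rather than the mean keeps a single highly toxic reply from being averaged away among milder ones.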
Learn more about signed networks
- Signed Networks in Social Media by Leskovec, Huttenlocher, Kleinberg
- Signed Social Networks: A Survey by Girdhar & Bharadwaj
Additional resources