This tutorial will demonstrate how to run the toxicity analysis on the data that you have collected.
Once a data collection has been completed, you’ll be able to access the dataset by clicking on your dataset name underneath the “Dataset Name” tab on your homepage.
On the top left side, you should see an overview of the dataset you’ve selected. If you scroll further down the page, basic visualizations of the dataset are also provided.
Now, let’s click on the “Toxicity Analysis” button on the left to begin your toxicity analysis.
To start the toxicity analysis, please select one of the following toxicity models: Perspective API or Detoxify.
The Perspective API model can currently analyze posts in the following languages: Arabic, Chinese, Czech, Dutch, English, French, German, Hindi, Hinglish, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish.
- Please note that an API key from Google Perspective Service is required to use the Perspective API model. If you do not have a Google Perspective API key, you can review our Google Perspective API Key Tutorial on how to obtain one.
- If you have a new API key, enter it in the “My Profile” section as demonstrated in our previous tutorial, then return to this page. If you would like to be notified when your toxicity analysis is complete, select “Email me once job is complete”. Then click “Start Analysis” to initiate the toxicity analysis process.
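For context on what happens once you click “Start Analysis”, the sketch below shows the shape of a single request the Perspective API expects, following Google’s public Comment Analyzer documentation. This is purely illustrative (the platform sends these requests for you), and `YOUR_API_KEY` is a placeholder for the key you stored in “My Profile”.

```python
# Sketch of one Perspective API request, for illustration only -- the
# platform assembles and sends these on your behalf. Endpoint and field
# names follow Google's Comment Analyzer docs; YOUR_API_KEY is a placeholder.

ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text, attributes=("TOXICITY", "IDENTITY_ATTACK", "INSULT",
                                    "PROFANITY", "THREAT")):
    """Assemble the JSON body Perspective expects for one post."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
    }

payload = build_request("example post text")
# To actually send it, you would POST this payload to
# f"{ENDPOINT}?key=YOUR_API_KEY" with an HTTP client such as requests.
```

The five requested attributes mirror the score columns you will see in the results table later in this tutorial.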
Note: the steps for both models are very similar; however, for this tutorial we will be using the Detoxify model.
From here, you can track the progress of your toxicity analysis. The Google Perspective API has a rate limit and can only process 100 queries per 100 seconds; Detoxify processes records at a similar rate. Because of this, larger datasets take longer to process. For example, a dataset with 1,000 records will take approximately 20 minutes to process.
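You can make your own back-of-envelope estimate from the stated rate limit. A minimal sketch (the function name and defaults are ours, not part of the platform):

```python
# Rough processing-time estimate under Perspective's rate limit of
# 100 queries per 100 seconds (about 1 query per second).

def estimate_minutes(n_records, queries_per_window=100, window_seconds=100):
    """Estimated processing time in minutes, ignoring per-request overhead."""
    seconds = n_records * window_seconds / queries_per_window
    return seconds / 60

print(round(estimate_minutes(1000), 1))  # -> 16.7
```

That is about 17 minutes of pure query time for 1,000 records; per-request overhead accounts for the difference from the ~20 minutes quoted above.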
Once the toxicity analysis is complete, look for the blue icon under the Toxicity Analysis column on your home page. Click on it to see your results.
This first table provides a summary of your toxicity analysis results. Each row represents a different type of toxicity, such as “Toxicity”, “Identity Attack”, “Insult”, “Profanity”, or “Threat”. “Toxicity” is a general model that considers all instances of toxicity. Toxicity values range between 0 and 1.
You can click on a score to see the top 10 posts with the highest value for each toxicity type. For instance, clicking on the highest “Toxicity” value will show you the 10 posts with the highest toxicity scores in your dataset. The same applies to the 10 lowest values.
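If you export your scored results, this ranking is straightforward to reproduce yourself: sort the posts by the attribute of interest. The records below are invented for illustration.

```python
# Reproducing the "top 10" view on exported results: rank posts by a
# chosen toxicity attribute. These records are made-up sample data.

posts = [
    {"text": "post A", "toxicity": 0.91},
    {"text": "post B", "toxicity": 0.12},
    {"text": "post C", "toxicity": 0.67},
]

def top_posts(records, attribute="toxicity", n=10, highest=True):
    """Return up to n records ranked by the given toxicity attribute."""
    return sorted(records, key=lambda r: r[attribute], reverse=highest)[:n]

print([p["text"] for p in top_posts(posts)])                  # highest first
print([p["text"] for p in top_posts(posts, highest=False)])   # lowest first
```

The same helper with `highest=False` gives the 10 lowest-scoring posts.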
If you return to your toxicity analysis (by clicking “back” in the top left corner), you’ll notice the lower portion of the page is a set of interactive charts visualizing your results. For example, the first tab shows the distribution of toxicity values in your dataset. This chart is interactive, so you can customize it to your liking.
For additional details on selecting a suitable threshold for toxicity scores, please refer to this guide. Typically, a threshold within the range of 0.7 to 0.9 is adequate, but we recommend validating your choice by manually reviewing a small sample of posts and their respective scores within your dataset. Also, check out this academic paper by Pascual-Ferrá et al. on how to use toxicity scores in a research study.
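To make the threshold idea concrete, here is a minimal sketch of applying a cutoff to a list of scores. The 0.8 default sits inside the suggested 0.7–0.9 range, and the scores are invented examples.

```python
# Illustration of applying a threshold to toxicity scores. The 0.8 default
# falls in the 0.7-0.9 range suggested above; the scores are invented.

def flag_toxic(scores, threshold=0.8):
    """Label each score as toxic (True) or not (False) at the threshold."""
    return [s >= threshold for s in scores]

scores = [0.95, 0.40, 0.82, 0.71]
print(flag_toxic(scores))  # -> [True, False, True, False]
```

Note how 0.71 is flagged non-toxic at a 0.8 cutoff but would be flagged toxic at 0.7, which is exactly why spot-checking a sample of posts against their scores is worthwhile before settling on a threshold.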
Now it’s time for you to explore the Toxicity Analysis tool for yourself!