.. POTATo: Pandas Online Text Analysis Tool documentation master file, created by
   sphinx-quickstart on Tue Jun 16 21:04:08 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to POTATo: Pandas Online Text Analysis Tool's documentation!
====================================================================

.. toctree::
   :maxdepth: 4
   :hidden:
   :caption: Contents:

   source/exporting_chats
   source/chat_format
   source/config_file

Introduction
============

POTATo is a collection of Python analyses and visualizations of chat messaging
histories. It was born out of our desire to see how our relationships evolved
through our messages. The visualizations currently provided are a small sample
of what can be done with the collected data and the conversation metrics we
examine; we chose the plots we found most interesting.

User Guide
==========

1. Export all relevant chats from each platform. We currently support Signal,
   Discord, Facebook Messenger, WhatsApp, and GroupMe. Follow the export
   procedure for each platform to get the chats into the expected format;
   these exports are the starting points of the flowchart below.
2. Create a config file that specifies the locations of the chat exports, any
   pseudonyms, colors for graphing, etc.
3. For Signal chats, use chat_cleaning/opendb.py to extract tables from the
   exported database.
4. Run chat_cleaning_aggregate to parse the raw chats into a single cleaned
   CSV file containing the entire chat history with the sender(s).
5. Make plots as desired by running the plot-making files listed under Plots.

.. figure:: images/chat_parsing_flowchart.png
   
   Flowchart of chat parsing operations, from raw data export to the final cleaned CSV file.
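
The table-extraction step for Signal can be pictured with a short sketch. This
is not the code in chat_cleaning/opendb.py: the helper name
``export_tables_to_csv`` is hypothetical, and the sketch assumes the exported
database is a plain (already decrypted) SQLite file. It only uses the standard
library ``sqlite3`` and ``csv`` modules.

.. code-block:: python

   import csv
   import sqlite3
   from pathlib import Path


   def export_tables_to_csv(db_path, out_dir):
       """Write every table in an SQLite database to its own CSV file.

       Hypothetical helper for illustration only; it assumes the database
       is a plain, already decrypted SQLite file.
       """
       out_dir = Path(out_dir)
       out_dir.mkdir(parents=True, exist_ok=True)
       written = []
       conn = sqlite3.connect(str(db_path))
       try:
           # sqlite_master lists every object in the database; keep tables.
           tables = [row[0] for row in conn.execute(
               "SELECT name FROM sqlite_master "
               "WHERE type = 'table' ORDER BY name"
           )]
           for table in tables:
               # Table names cannot be bound as parameters, so quote them.
               quoted = '"' + table.replace('"', '""') + '"'
               cursor = conn.execute(f"SELECT * FROM {quoted}")
               header = [column[0] for column in cursor.description]
               target = out_dir / f"{table}.csv"
               with target.open("w", newline="", encoding="utf-8") as handle:
                   writer = csv.writer(handle)
                   writer.writerow(header)
                   writer.writerows(cursor)
               written.append(target)
       finally:
           conn.close()
       return written

Calling ``export_tables_to_csv("signal.db", "tables")`` (both paths are
placeholders) would return the list of CSV files written, one per table.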


Plots
=====

* make_bin_plots generates what we call "binned" plots: the chat messaging
  history is sliced into time intervals (for example, days), and one data
  point is computed for each slice (for example, the number of words written
  that day). The result is plotted as an area graph.

* make_aggregate_plots generates plots using the text messaging history as a
  whole, ignoring how it evolves over time. These consist of per-person bar
  graphs of words and texts, a scatterplot of the number of words per
  conversation, and a scatterplot of the number of words vs. the duration of
  the conversation.

* make_heatmap_plots generates heatmaps with hours on the x-axis and days on
  the y-axis (though the matrix can be changed to an arbitrary size).

* make_word_ratio_scatterplot generates a scatterplot in which each point
  represents a word: its position on the x-axis is the ratio of its use
  between two senders, and its position on the y-axis is the number of times
  it occurs in the messaging history. This is the only plot that does not
  support more than two senders.
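
The data behind the binned, heatmap, and word-ratio plots can be sketched in
plain Python. This is an illustration under stated assumptions, not the
toolkit's implementation (which works on pandas data): the function names and
the ``(timestamp, sender, text)`` row layout are hypothetical, the heatmap rows
are assumed to be days of the week, and the "ratio" is taken to be the first
sender's share of a word's total uses, which is only one possible definition.

.. code-block:: python

   from collections import Counter, defaultdict
   from datetime import datetime

   # Hypothetical sample rows: (timestamp, sender, text). The column layout
   # of the toolkit's aggregated CSV may differ.
   MESSAGES = [
       (datetime(2020, 6, 15, 9, 30), "alice", "good morning"),
       (datetime(2020, 6, 15, 21, 5), "bob", "good night alice"),
       (datetime(2020, 6, 16, 9, 45), "alice", "morning again"),
   ]


   def words_per_day(messages):
       """Binned plot data: total words written on each calendar day."""
       counts = Counter()
       for timestamp, _sender, text in messages:
           counts[timestamp.date()] += len(text.split())
       return dict(sorted(counts.items()))


   def weekday_hour_matrix(messages):
       """Heatmap data: 7 rows (Monday first) by 24 hourly columns."""
       matrix = [[0] * 24 for _ in range(7)]
       for timestamp, _sender, _text in messages:
           matrix[timestamp.weekday()][timestamp.hour] += 1
       return matrix


   def word_usage_ratio(messages, sender_a, sender_b):
       """Word-ratio data: word -> (sender_a's share of uses, total uses)."""
       per_sender = defaultdict(Counter)
       for _timestamp, sender, text in messages:
           per_sender[sender].update(text.lower().split())
       count_a, count_b = per_sender[sender_a], per_sender[sender_b]
       result = {}
       for word in set(count_a) | set(count_b):
           total = count_a[word] + count_b[word]
           result[word] = (count_a[word] / total, total)
       return result

Each function returns plain Python containers, so the results can be handed to
any plotting library; using a share rather than a raw quotient avoids dividing
by zero for words that only one sender uses.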


Dependencies
============

This toolkit was developed for Python 3.7 or later. Additionally, the following
packages are needed:

* pandas
* numpy
* pyyaml
* plotly
* emoji
* nltk
* rake_nltk
* pathlib (standard library; used for parsing Facebook Messenger messages)
* sqlite3 (standard library; used for extracting the Signal messenger backup)
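
A quick way to see which of these modules are available in the current
environment is sketched below. The helper ``missing_modules`` is hypothetical
and not part of the toolkit; it relies on ``importlib.util.find_spec`` from the
standard library. Note that the pyyaml package is imported under the name
``yaml``.

.. code-block:: python

   from importlib.util import find_spec

   # Import names for the packages listed above; pyyaml is imported as "yaml".
   REQUIRED_MODULES = [
       "pandas", "numpy", "yaml", "plotly", "emoji",
       "nltk", "rake_nltk", "pathlib", "sqlite3",
   ]


   def missing_modules(names=REQUIRED_MODULES):
       """Return the module names that cannot be found in this environment."""
       return [name for name in names if find_spec(name) is None]


   if __name__ == "__main__":
       missing = missing_modules()
       print("Missing modules:", ", ".join(missing) if missing else "none")

Any names it reports can then be installed with the package manager of your
choice.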


.. toctree::
   :maxdepth: 4
   :caption: Modules:

   source/config
   source/chat_cleaning
   source/chat_cleaning_aggregate
   source/text_summary
   source/text_summary_list_utils
   source/file_utils
   source/stat_utils
   source/string_utils
   source/make_aggregate_plots
   source/make_bin_plots
   source/make_heatmap_plots
   source/make_word_ratio_scatterplot

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`