text_summary module

Created on Mon Aug 5 19:56:19 2019

@author: Daniel

class text_summary.TextSummary(data=None)

Bases: object

A representation of the basic statistics of a set of texts.

data

input dataframe of texts

Type

pandas.DataFrame

count

Number of each kind of item; has keys texts, words, char, space, letter, digit, emotes, punct.

Type

dict

prop

contains overall statistics that are fractions (laziness, % of emoji, words per text, verbosity)

Type

dict

occurrence_dicts

contains dictionaries {token: count} (words or emotes)

Type

dict

per_text_lists

contains statistics per text (sentiment polarity and subjectivity, words per text, characters per text, emotes per text)

Type

dict

compare_freq(other, token)

Find differences in word or emoji use frequency.

Parameters
  • other (TextSummary) – TextSummary to compare to.

  • token (string) – key of the occurrence dictionary to compare (words or emotes)

Returns

dictionary where keys are the compared tokens (words or emotes) and values are tuples (total, expected ratio)

Return type

diff_dict (dict)
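
The reference above does not define "expected ratio", so the following is only a sketch under one possible reading: the observed count in the first dictionary divided by the count expected if both sources used the token at the same rate. The function name compare_freq_sketch and its plain-dictionary arguments are hypothetical and are not part of TextSummary.

```python
def compare_freq_sketch(own_counts, other_counts):
    """Compare two {token: count} dictionaries.

    Assumption (not confirmed by the documentation): the ratio is
    observed / expected, where expected is the token's combined count
    scaled by this source's share of all tokens.
    """
    own_total = sum(own_counts.values())
    grand_total = own_total + sum(other_counts.values())
    diff_dict = {}
    if own_total == 0:
        return diff_dict  # nothing to compare against

    own_share = own_total / grand_total
    for token in set(own_counts) | set(other_counts):
        own = own_counts.get(token, 0)
        total = own + other_counts.get(token, 0)
        if total == 0:
            continue  # skip tokens that never occur
        expected = total * own_share
        diff_dict[token] = (total, own / expected)
    return diff_dict


result = compare_freq_sketch({"hello": 3, "bye": 1}, {"hello": 1, "bye": 3})
```

Under this reading, a ratio above 1 means the first source uses the token more often than its overall share would suggest, and a ratio below 1 means less often.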

get_conversations(names)

Get a list of conversations.

Parameters

names (list) – names of senders.

Returns

a list of dictionaries with conversation information

Return type

convos (list)

get_counts(word)

Find number of occurrences of word in each text.

Parameters

word (string) – the word to find.

Returns

list of integer counts.

Return type

counts (list)
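
As a rough illustration of per-text counting, the sketch below takes a plain list of strings rather than the class's DataFrame. The helper name count_word_per_text, the case-insensitive matching, and the whole-word matching are assumptions; the documented method may tokenize differently.

```python
import re


def count_word_per_text(texts, word):
    """Return one integer per text: how often `word` occurs in it.

    Assumption: matches are case-insensitive and must be whole words.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [len(pattern.findall(text)) for text in texts]


counts = count_word_per_text(["Hello hello there", "no match", "hello!"], "hello")
```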

set_counts(raw_text, emote_free_text)

Set the count statistics.

Parameters
  • raw_text (string) – the original concatenated text

  • emote_free_text (string) – the emote/emoji-free concatenated text
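
A minimal sketch of how the character-level keys of count (char, space, letter, digit, punct) could be derived from the concatenated text. The helper name count_characters is hypothetical, and treating string.punctuation as the punctuation set is an assumption; the words, texts, and emotes keys are left out because their definitions are not given here.

```python
import string


def count_characters(raw_text):
    """Count character classes in a string (hypothetical helper)."""
    return {
        "char": len(raw_text),
        "space": sum(c.isspace() for c in raw_text),
        "letter": sum(c.isalpha() for c in raw_text),
        "digit": sum(c.isdigit() for c in raw_text),
        # Assumption: ASCII punctuation only.
        "punct": sum(c in string.punctuation for c in raw_text),
    }


stats = count_characters("Hi 42!")
```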

set_occurrence_dicts(raw_text, emote_free_text)

Fill a dictionary with the occurrences of each word and each emote.

Parameters
  • raw_text (string) – the original concatenated text

  • emote_free_text (string) – the emote/emoji-free concatenated text
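
One way to build {token: count} dictionaries of this shape is with collections.Counter, as sketched below. The function name, the word-splitting rule, and EMOTE_PATTERN (which only covers a few ASCII emoticons) are illustrative assumptions; how the module actually detects emotes or emoji is not documented here.

```python
import re
from collections import Counter

# Hypothetical pattern: matches a handful of ASCII emoticons such as :) ;-) :D
EMOTE_PATTERN = re.compile(r"[:;]-?[)(DP]")


def build_occurrence_dicts(raw_text, emote_free_text):
    """Return {"words": {...}, "emotes": {...}} with occurrence counts."""
    # Assumption: words are lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", emote_free_text.lower())
    # Emotes are searched in the raw text, since they were stripped elsewhere.
    emotes = EMOTE_PATTERN.findall(raw_text)
    return {"words": dict(Counter(words)), "emotes": dict(Counter(emotes))}


occurrences = build_occurrence_dicts("Hi there :) hi :D :)", "Hi there  hi  ")
```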

set_per_text_lists()

Fill the per_text_lists dictionary with words per text, characters per text, and emotes per text.
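
The per-text lists can be pictured as parallel lists with one entry per text. The sketch below assumes a plain list of strings and whitespace-separated words; the helper name and the dictionary keys words_per_text and chars_per_text are hypothetical, and emotes per text is omitted because emote detection is not specified.

```python
def build_per_text_lists(texts):
    """Return per-text statistics as parallel lists (hypothetical helper)."""
    return {
        # Assumption: a word is any whitespace-separated chunk.
        "words_per_text": [len(text.split()) for text in texts],
        "chars_per_text": [len(text) for text in texts],
    }


per_text = build_per_text_lists(["hi there", "ok"])
```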

set_props()

Set the proportion statistics.