string_utils module

Created on Tue Aug 6 16:51:53 2019

@author: dgurevich6

string_utils.count_emote_occurrences(raw_text)

Return a dictionary of emoji/emotes and their number of occurrences.

Parameters

raw_text (str) – the raw (uncleaned) input text

Returns

dictionary with entries {emote/emoji: count}

Return type

dict
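A minimal sketch of the counting step, assuming an emoji is any single code point inside a few Unicode blocks. The helper `_is_emoji` and the `EMOJI_RANGES` table are hypothetical and are not taken from string_utils, which may detect emoji/emotes differently.

```python
from collections import Counter

# Hypothetical approximation of "emoji": any code point in these ranges.
# The real string_utils module may detect emoji/emotes differently.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # rough pictograph / emoticon / supplemental symbol blocks
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
]


def _is_emoji(ch):
    """Return True if the single character ch falls in one of EMOJI_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def count_emote_occurrences(raw_text):
    """Return {emoji: count} for every emoji character in raw_text."""
    # Counter tallies each matching character; dict() gives a plain dict back.
    return dict(Counter(ch for ch in raw_text if _is_emoji(ch)))


print(count_emote_occurrences("good 👍👍 night 🌙"))  # {'👍': 2, '🌙': 1}
```

Because this sketch works per code point, emoji built from several code points (skin-tone modifiers, zero-width-joiner sequences) are counted as their separate components.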

string_utils.count_emotes(text)

Return the number of emotes in a string.

Parameters

text (str) – the original string

Returns

the number of emotes in the original string

Return type

int
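A short sketch of the count, under the same hypothetical assumption that an emote is a single code point in a few Unicode emoji blocks; `EMOJI_RANGES` is illustrative and not part of the documented module.

```python
# Hypothetical approximation: treat characters in these Unicode ranges as emotes.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]


def count_emotes(text):
    """Return how many characters of text fall in EMOJI_RANGES."""
    return sum(
        1 for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)
    )


print(count_emotes("party 🎉🎉 time 😀"))  # 3
```

Every occurrence is counted, so repeated emotes each add one to the total.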

string_utils.count_word_occurrences(text)

Return a dictionary of words and their number of occurrences.

Parameters

text (str) – the cleaned input text.

Returns

dictionary with entries {word: count}

Return type

dict
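A minimal sketch, assuming words are separated by whitespace and the text is already cleaned, so no lowercasing or punctuation stripping happens here. The splitting rule is an assumption, not necessarily the module's own.

```python
from collections import Counter


def count_word_occurrences(text):
    """Return {word: count} for the whitespace-separated words of text.

    Assumes text has already been cleaned (e.g. lowercased, punctuation
    and emotes removed); no further normalization is done here.
    """
    return dict(Counter(text.split()))


print(count_word_occurrences("the cat saw the dog"))
# {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```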

string_utils.get_words(text, minimal=False)

Return the words comprising a string.

Parameters
  • text (str) – the original string

  • minimal (bool) – if True, also break words at apostrophes (') and hyphens (-)

Returns

a list of words

Return type

list
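A sketch of how the minimal flag might change the split, assuming words are always separated by whitespace and that minimal=True additionally splits at both straight (') and curly (’) apostrophes and at hyphens. This behaviour is inferred from the parameter description, not confirmed by the module.

```python
import re


def get_words(text, minimal=False):
    """Split text into words.

    Hypothetical behaviour: words are always separated by whitespace; with
    minimal=True they are also broken at apostrophes (' or ’) and hyphens.
    """
    pattern = r"[\s'\u2019-]+" if minimal else r"\s+"
    # re.split can yield empty strings at the ends, so filter them out.
    return [w for w in re.split(pattern, text) if w]


print(get_words("don't over-think it"))                # ["don't", 'over-think', 'it']
print(get_words("don't over-think it", minimal=True))  # ['don', 't', 'over', 'think', 'it']
```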

string_utils.isemoji(character)

Return True if the character is an emoji.

Parameters

character (str) – a single character.

Returns

True if character is an emoji, False otherwise

Return type

bool
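One possible implementation, assuming a simple code point range test. The `EMOJI_RANGES` table and the rule that multi-character input returns False are hypothetical.

```python
# Hypothetical code point ranges; real emoji detection (e.g. per the Unicode
# emoji data files) is more involved than this.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]


def isemoji(character):
    """Return True if character is a single code point in EMOJI_RANGES."""
    if len(character) != 1:
        return False  # assumption: multi-code-point input is rejected
    cp = ord(character)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


print(isemoji("😀"), isemoji("a"), isemoji("ab"))  # True False False
```

A range check like this is only approximate: it accepts some non-emoji symbols inside those blocks and misses emoji elsewhere, such as ⌚ (U+231A). A dedicated library, such as the third-party emoji package, tracks the official Unicode emoji lists instead.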

string_utils.remove_emotes(text)

Remove emotes from a string and return the cleaned string.

Parameters

text (str) – the original string

Returns

the string with emotes removed

Return type

str
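A minimal sketch that filters out emote characters one code point at a time, again assuming a hypothetical `EMOJI_RANGES` table rather than the module's own detection rule.

```python
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]  # hypothetical


def remove_emotes(text):
    """Return text with every character in EMOJI_RANGES dropped."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)
    )


print(remove_emotes("hi😀 there🎉"))  # hi there
```

Whitespace around a removed emote is left untouched, so "a 😀 b" becomes "a  b" with two spaces; collapsing that is left to a later cleaning step.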

string_utils.strip_subs(text, substring_list)

Strip a string of the substrings in a list. (Assumes that no substring in the list is contained in another.)

Parameters
  • text (str) – input string

  • substring_list (list) – list of undesired substrings

Returns

input text with substrings removed.

Return type

str
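A minimal sketch using sequential str.replace passes. This is an assumed implementation, chosen because it shows why the "no substring is contained in another" assumption matters: when entries overlap, the result depends on the order of the list.

```python
def strip_subs(text, substring_list):
    """Remove every occurrence of each substring in substring_list from text."""
    for sub in substring_list:
        text = text.replace(sub, "")  # each pass sees the result of the previous one
    return text


print(strip_subs("hello [b]world[/b]", ["[b]", "[/b]"]))  # hello world

# Why the "no substring is contained in another" assumption matters:
# with overlapping entries the result depends on list order.
print(repr(strip_subs("abc", ["ab", "abc"])))  # 'c'
print(repr(strip_subs("abc", ["abc", "ab"])))  # ''
```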