string_utils module
Created on Tue Aug 6 16:51:53 2019
@author: dgurevich6
string_utils.count_emote_occurrences(raw_text)
Return a dictionary of emoji/emotes and their number of occurrences.
- Parameters
raw_text (str) – string representation of raw text
- Returns
dictionary with entries {emote/emoji: count}
- Return type
emo_dict (dict)
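The page does not say how the module decides what counts as an emote, so the following is only a minimal sketch of the counting step. The pattern `EMOTICON_PATTERN` and the function name `count_emote_occurrences_sketch` are hypothetical and stand in for whatever definition the module actually uses.

```python
import re
from collections import Counter

# Hypothetical pattern for a few ASCII emoticons such as :) ;-) =D :P.
# The module's real definition of an "emote" is not documented on this page.
EMOTICON_PATTERN = re.compile(r"[:;=][-']?[)(DPp]")


def count_emote_occurrences_sketch(raw_text):
    """Return {emoticon: count} for every match of EMOTICON_PATTERN."""
    matches = EMOTICON_PATTERN.findall(raw_text)
    return dict(Counter(matches))


print(count_emote_occurrences_sketch("hi :) great :D ok :)"))
```

Swapping in a different pattern, or a lookup against a list of known emote names, changes what is counted but not the shape of the returned dictionary.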
string_utils.count_emotes(text)
Return the number of emotes in a string.
- Parameters
text (string) – the original string
- Returns
the number of emotes in the original string
- Return type
count (int)
string_utils.count_word_occurrences(text)
Return a dictionary of words and their number of occurrences.
- Parameters
text (str) – string representation of clean text.
- Returns
dictionary with entries {word: count}
- Return type
word_dict (dict)
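A word-count dictionary of this shape can be built with `collections.Counter`. The sketch below assumes that words are separated by whitespace and compared case-sensitively; the module may tokenize differently, and the name `count_word_occurrences_sketch` is hypothetical.

```python
from collections import Counter


def count_word_occurrences_sketch(text):
    """Return {word: count} for whitespace-separated words in text."""
    # Assumption: splitting on whitespace is enough because the input
    # is already "clean" (no punctuation or emotes left to strip).
    return dict(Counter(text.split()))


print(count_word_occurrences_sketch("the cat and the hat"))
```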
string_utils.get_words(text, minimal=False)
Return the words comprising a string.
- Parameters
text (string) – the original string
minimal (boolean) – if true, break words at apostrophes (') and hyphens (-)
- Returns
a list of words
- Return type
word_list (list)
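To illustrate the effect of `minimal`, here is a rough regex-based sketch. It assumes that a word is a run of ASCII letters and digits and that, when `minimal` is false, apostrophes and hyphens inside a word are kept. The function name `get_words_sketch` and both patterns are hypothetical, not the module's code, and only the straight apostrophe is handled.

```python
import re

# Assumed definition of a word: ASCII letters and digits only.
PLAIN_WORD = r"[A-Za-z0-9]+"
# Same, but allowing single apostrophes or hyphens between word parts.
JOINED_WORD = r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*"


def get_words_sketch(text, minimal=False):
    """Return the list of words in text."""
    pattern = PLAIN_WORD if minimal else JOINED_WORD
    return re.findall(pattern, text)


print(get_words_sketch("Don't over-think it"))
print(get_words_sketch("Don't over-think it", minimal=True))
```

Under these assumptions the first call keeps `Don't` and `over-think` whole, while the second splits them into `Don`, `t`, `over`, and `think`.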
string_utils.isemoji(character)
Return True if the character is an emoji.
- Parameters
character (str) – the character.
- Returns
True if the character is an emoji, False otherwise
- Return type
bool
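One common way to approximate an emoji check is to test the character's code point against a few Unicode blocks. The sketch below does that with a deliberately incomplete list of ranges; it is an assumption about how such a check could work, not the module's implementation, and the names `EMOJI_RANGES` and `isemoji_sketch` are hypothetical.

```python
# Incomplete, illustrative list of Unicode blocks that contain emoji.
EMOJI_RANGES = (
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
)


def isemoji_sketch(character):
    """Return True if a single character falls in one of EMOJI_RANGES."""
    if len(character) != 1:
        return False  # multi-code-point sequences are out of scope here
    code_point = ord(character)
    return any(low <= code_point <= high for low, high in EMOJI_RANGES)


print(isemoji_sketch("\U0001F600"))  # grinning face, in the Emoticons block
print(isemoji_sketch("a"))
```

A block-based check like this misclassifies some characters (not every code point in these blocks is an emoji) and ignores sequences such as flags or skin-tone modifiers.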
string_utils.remove_emotes(text)
Remove emotes from a string and return the cleaned string.
- Parameters
text (string) – the original string
- Returns
the string with emotes removed
- Return type
text (string)
string_utils.strip_subs(text, substring_list)
Remove every substring in a list from a string. Assumes that no substring in the list is contained in another.
- Parameters
text (str) – input string
substring_list (list) – list of undesired substrings
- Returns
input text with substrings removed.
- Return type
clean_text (str)
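The stated assumption matters if the substrings are removed one after another, which is the simplest way to write such a function. The sketch below assumes that approach (repeated `str.replace`); the name `strip_subs_sketch` is hypothetical and the module may work differently.

```python
def strip_subs_sketch(text, substring_list):
    """Remove each substring in substring_list from text, in list order."""
    clean_text = text
    for substring in substring_list:
        clean_text = clean_text.replace(substring, "")
    return clean_text


# Substrings that do not contain one another: order is irrelevant.
print(strip_subs_sketch("foo<b>bar</b>", ["<b>", "</b>"]))

# "ab" is contained in "abc", so the result depends on list order.
print(strip_subs_sketch("xabcx", ["ab", "abc"]))
print(strip_subs_sketch("xabcx", ["abc", "ab"]))
```

With this approach, removing `ab` first leaves a stray `c` behind because `abc` no longer appears in the text, which is the kind of ambiguity the assumption rules out.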