Yet another thing that confuses me
The intended use case is probably the most important difference between pandas' map, applymap and apply:
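- map: maps each value of a Series to a new value, typically via a dict or another Series, so it is well suited to translating values from one domain to another; dict and Series lookups are handled in a vectorised way (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
- applymap: elementwise transformations of every cell of a DataFrame, across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip)); since pandas 2.1 this method is called DataFrame.map and applymap is deprecated
- apply: applying any function that cannot be vectorised, element by element on a Series or per row/column on a DataFrame (e.g., df['sentences'].apply(nltk.sent_tokenize))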
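A minimal sketch of the three cases above on a small made-up DataFrame. The column names mirror the examples in the list, but the data is invented, and `naive_sent_split` is a hypothetical stand-in for `nltk.sent_tokenize` so the snippet runs without NLTK. Column A is left out of the elementwise step because `str.strip` would fail on integers.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [" x ", " y ", " z "],
    "C": [" p", "q ", " r "],
    "sentences": ["Hi there. Bye.", "One. Two. Three.", "Single."],
})

# Series.map: look up each value of column A in a dict
mapped = df["A"].map({1: "a", 2: "b", 3: "c"})

# Elementwise over a whole DataFrame: DataFrame.map in pandas >= 2.1,
# falling back to the older (now deprecated) applymap name
sub = df[["B", "C"]]
if hasattr(pd.DataFrame, "map"):
    stripped = sub.map(str.strip)
else:
    stripped = sub.applymap(str.strip)

# Series.apply: call an arbitrary Python function once per element.
# naive_sent_split is a hypothetical, very crude sentence splitter.
def naive_sent_split(text):
    return [part.strip() + "." for part in text.split(".") if part.strip()]

sentences = df["sentences"].apply(naive_sent_split)
print(mapped.tolist(), stripped.values.tolist(), sentences.tolist(), sep="\n")
```

The `hasattr` check just keeps the sketch working on both older and newer pandas versions.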
apply is generally slow, because it calls a Python function once per element (or per row/column), so it should not be used when pandas already provides a built-in 'pre-processing' method for what you want to do, for example strip() or lower() via the .str accessor.
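A rough sketch comparing the two approaches on invented data. It checks that `apply` and the `.str` methods give the same result, shows that the `.str` methods pass missing values through (the lambda would raise on `None`), and times both so you can check the speed difference yourself; the timings depend on data size, dtype and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Hypothetical sample data: padded, mixed-case strings
s = pd.Series(["  Hello World  ", "  PANDAS ", " Apply  "] * 10_000)

# apply: the lambda is called from Python once per element
via_apply = s.apply(lambda x: x.strip().lower())

# .str accessor: pandas' vectorised string methods
via_str = s.str.strip().str.lower()
print(via_apply.tolist() == via_str.tolist())  # same values either way

# Missing values: .str methods propagate None/NaN,
# whereas the lambda above would raise on a non-string element
with_missing = pd.Series(["  Hi ", None])
cleaned = with_missing.str.strip().str.lower()

# Rough timing only; results vary by machine and pandas version
t_apply = timeit.timeit(lambda: s.apply(lambda x: x.strip().lower()), number=5)
t_str = timeit.timeit(lambda: s.str.strip().str.lower(), number=5)
print(f"apply: {t_apply:.4f}s   .str: {t_str:.4f}s")
```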
In short, these are vectorised string methods to use instead of apply:
- Series.str.split(): Splits each string in the Series on a delimiter (whitespace by default).
- Series.str.strip(): Strips leading and trailing whitespace from each string in the Series.
- Series.str.lower(): Converts strings in the Series to lowercase.
- Series.str.upper(): Converts strings in the Series to uppercase.
- Series.str.get(): Retrieves the i-th element of each element (string or list) in the Series.
- Series.str.replace(): Replaces occurrences of a pattern (a literal string, or a regex with regex=True) with another string.
- Series.str.cat(): Concatenates strings in a Series, either into a single string or elementwise with other strings.
- Series.str.extract(): Extracts the capture groups of a regex pattern from each string into columns.
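A small sketch chaining the methods from the list on a made-up Series of names; the data and variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(["  Alice Smith ", "bob JONES", " Carol  Diaz"])

clean = s.str.strip()            # remove leading/trailing whitespace
lower = clean.str.lower()        # lowercase every string
upper = clean.str.upper()        # uppercase every string
parts = clean.str.split()        # split on runs of whitespace -> lists
first = parts.str.get(0)         # first element of each list
squashed = clean.str.replace(r"\s+", " ", regex=True)  # collapse inner spaces
joined = first.str.cat(sep=", ")  # concatenate the whole Series into one string
initials = clean.str.extract(r"^(\w)\w*\s+(\w)")  # capture groups -> columns 0, 1

print(lower.tolist(), first.tolist(), joined, initials, sep="\n")
```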