In [2]:
df
Out[2]:
   foo bar  baz qux
0  one   A    1   u
1  one   B    2   v
2  one   C    3   w
3  two   A    4   x
4  two   B    5   y
5  two   C    6   z

The dimensions of the DataFrame as a (rows, columns) tuple.

In [3]:
df.shape
Out[3]:
(6, 4)

The number of rows in the DataFrame.

In [4]:
len(df)
Out[4]:
6

The total number of elements (rows times columns).

In [5]:
df.size
Out[5]:
24

The number of dimensions (axes) of the DataFrame.

In [6]:
df.ndim
Out[6]:
2
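
The four attributes above are related, and a quick check makes that concrete. This is a minimal sketch: the cell that creates df is not shown here, so the constructor call below is an assumption that rebuilds the frame from the values in Out[2].

```python
import pandas as pd

# Assumed reconstruction of df from the values displayed in Out[2].
df = pd.DataFrame({
    "foo": ["one"] * 3 + ["two"] * 3,
    "bar": ["A", "B", "C"] * 2,
    "baz": [1, 2, 3, 4, 5, 6],
    "qux": list("uvwxyz"),
})

n_rows, n_cols = df.shape

# len() counts rows, size counts cells, ndim counts axes.
rows_match = len(df) == n_rows
size_match = df.size == n_rows * n_cols
ndim_match = df.ndim == len(df.shape)
print(rows_match, size_match, ndim_match)
```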

Summary statistics for the numerical columns (transposed for more readable output).

In [7]:
df.describe(include=[np.number]).T
Out[7]:
     count  mean       std  min   25%  50%   75%  max
baz    6.0   3.5  1.870829  1.0  2.25  3.5  4.75  6.0

Summary statistics for object and categorical columns (transposed for more readable output).

In [8]:
df.describe(include=[object, 'category']).T
Out[8]:
    count unique  top freq
foo     6      2  two    3
bar     6      3    A    2
qux     6      6    x    1
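
The same numeric / non-numeric split can also be done by filtering columns first with select_dtypes and then calling describe. The sketch below assumes df is rebuilt from the values in Out[2]; the variable names are illustrative only.

```python
import pandas as pd

# Assumed reconstruction of df from the values displayed in Out[2].
df = pd.DataFrame({
    "foo": ["one"] * 3 + ["two"] * 3,
    "bar": ["A", "B", "C"] * 2,
    "baz": [1, 2, 3, 4, 5, 6],
    "qux": list("uvwxyz"),
})

# Numeric columns only: count, mean, std, quartiles.
numeric_summary = df.select_dtypes(include="number").describe().T

# Everything else: count, unique, top, freq.
other_summary = df.select_dtypes(exclude="number").describe().T

print(numeric_summary)
print(other_summary)
```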

Count of non-null values in each column.

In [9]:
df.count()
Out[9]:
foo    6
bar    6
baz    6
qux    6
dtype: int64

Count of each distinct value in a column.

In [10]:
df['bar'].value_counts()
Out[10]:
A    2
B    2
C    2
Name: bar, dtype: int64
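
When relative frequencies are more useful than raw counts, value_counts accepts normalize=True. A minimal sketch, assuming a Series rebuilt from the bar column shown in Out[2]:

```python
import pandas as pd

# Assumed reconstruction of the 'bar' column from Out[2].
bar = pd.Series(["A", "B", "C"] * 2, name="bar")

# normalize=True returns the share of rows per value instead of the count.
shares = bar.value_counts(normalize=True)
print(shares)
```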

Number of distinct values in a column.

In [11]:
df['bar'].nunique()
Out[11]:
3
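
Calling nunique on the whole DataFrame gives the distinct count for every column at once, which is a quick way to spot low-cardinality columns. A sketch, assuming df is rebuilt from the values in Out[2]:

```python
import pandas as pd

# Assumed reconstruction of df from the values displayed in Out[2].
df = pd.DataFrame({
    "foo": ["one"] * 3 + ["two"] * 3,
    "bar": ["A", "B", "C"] * 2,
    "baz": [1, 2, 3, 4, 5, 6],
    "qux": list("uvwxyz"),
})

# One distinct-value count per column.
per_column = df.nunique()
print(per_column)
```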

Minimum value of each column.

In [12]:
df.min()
Out[12]:
foo    one
bar      A
baz      1
qux      u
dtype: object
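
Because the frame mixes strings and integers, the result above has dtype object and the string columns are compared lexicographically. To restrict the reduction to numeric columns, numeric_only=True can be passed. A sketch, assuming df is rebuilt from the values in Out[2]:

```python
import pandas as pd

# Assumed reconstruction of df from the values displayed in Out[2].
df = pd.DataFrame({
    "foo": ["one"] * 3 + ["two"] * 3,
    "bar": ["A", "B", "C"] * 2,
    "baz": [1, 2, 3, 4, 5, 6],
    "qux": list("uvwxyz"),
})

# Only numeric columns take part in the reduction.
numeric_min = df.min(numeric_only=True)
print(numeric_min)
```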

Print a summary of the DataFrame: index dtype, columns, non-null counts and memory usage.

In [13]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   foo     6 non-null      object
 1   bar     6 non-null      object
 2   baz     6 non-null      int64 
 3   qux     6 non-null      object
dtypes: int64(1), object(3)
memory usage: 320.0+ bytes

Memory usage of each column in bytes (deep=True also counts the contents of object columns).

In [14]:
df.memory_usage(deep=True)
Out[14]:
Index    128
foo      360
bar      348
baz       48
qux      348
dtype: int64
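
Convert the low-cardinality string columns to category and downcast the integer column to int8 to reduce memory usage.

In [15]: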
df['foo'] = df['foo'].astype('category')
df['bar'] = df['bar'].astype('category')
df['baz'] = df['baz'].astype(np.int8)

Memory usage of each column after the conversion.

In [16]:
df.memory_usage(deep=True)
Out[16]:
Index    128
foo      234
bar      288
baz        6
qux      348
dtype: int64
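
To see the effect of the conversion column by column, the two memory reports can be placed side by side. This is a sketch under the assumption that df is rebuilt from the values in Out[2]; exact byte counts for the string and category columns depend on the pandas version and platform, and on a frame this small the category savings are modest.

```python
import pandas as pd

# Assumed reconstruction of df from the values displayed in Out[2].
df = pd.DataFrame({
    "foo": ["one"] * 3 + ["two"] * 3,
    "bar": ["A", "B", "C"] * 2,
    "baz": [1, 2, 3, 4, 5, 6],
    "qux": list("uvwxyz"),
})

before = df.memory_usage(deep=True)

# Same conversions as above, applied in one astype call on a copy.
compact = df.astype({"foo": "category", "bar": "category", "baz": "int8"})
after = compact.memory_usage(deep=True)

# Bytes per column before and after, plus the difference.
comparison = pd.DataFrame({"before": before, "after": after})
comparison["saved"] = comparison["before"] - comparison["after"]
print(comparison)
```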

The displayed values are unchanged by the conversion.

In [18]:
df
Out[18]:
   foo bar  baz qux
0  one   A    1   u
1  one   B    2   v
2  one   C    3   w
3  two   A    4   x
4  two   B    5   y
5  two   C    6   z

The updated dtypes and memory usage.

In [19]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   foo     6 non-null      category
 1   bar     6 non-null      category
 2   baz     6 non-null      int8    
 3   qux     6 non-null      object  
dtypes: category(2), int8(1), object(1)
memory usage: 450.0+ bytes
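
The trailing "+" in the memory figure means it is a lower bound: by default info() does not inspect the contents of object columns. Passing memory_usage="deep" requests the full accounting. A sketch, assuming df is rebuilt from the values in Out[2]; the report is captured in a buffer here only so it can be inspected as a string.

```python
import io

import pandas as pd

# Assumed reconstruction of df from the values displayed in Out[2].
df = pd.DataFrame({
    "foo": ["one"] * 3 + ["two"] * 3,
    "bar": ["A", "B", "C"] * 2,
    "baz": [1, 2, 3, 4, 5, 6],
    "qux": list("uvwxyz"),
})

# Write the report to a buffer instead of stdout.
buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")
report = buffer.getvalue()
print(report)
```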