Common Pandas Usage | 愚苏记

You may choose not to use it, but you cannot afford not to know it.

Syntax	Explanation / Examples
Importing Data
pd.read_csv(filename)	From a CSV file
pd.read_table(filename)	From a delimited text file (like TSV)
pd.read_excel(filename)	From an Excel file
pd.read_sql(query, connection_object)
pd.read_json(json_string)
pd.read_html(url)
pd.DataFrame(dict)
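As a minimal sketch of the import calls above, the snippet below reads a small CSV and builds the same table from a dict. The CSV text and the column names (name, score) are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file path.
csv_text = "name,score\nalice,0.9\nbob,0.4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# From a dict: keys become column labels, lists become column values.
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [0.9, 0.4]})

print(df_csv)
print(df_dict)
```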
Exporting Data
df.to_csv(filename)
df.to_sql(table_name, connection_object)
df.to_json(filename)
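The export calls can be tried without touching the file system. This sketch assumes an in-memory SQLite database and a hypothetical table name (scores); the sample data is invented.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [0.9, 0.4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# Write to an in-memory SQLite database, then read the table back.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)  # "scores" is a hypothetical table name
df_back = pd.read_sql("SELECT name, score FROM scores", conn)
conn.close()

print(csv_text)
print(df_back)
```

Passing index=False keeps the row index out of the output; leave it out if the index carries meaning.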
Create Test Objects
pd.DataFrame(np.random.rand(20, 5))
pd.Series(my_list)
df.index = pd.date_range('1900/1/30', periods=df.shape[0])
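Put together, the test-object recipes look roughly like this. The seeded generator is an assumption added here so the random values are reproducible; the original uses np.random.rand.

```python
import numpy as np
import pandas as pd

# Seeded generator (seed chosen arbitrarily) for reproducible test data.
rng = np.random.default_rng(0)

# 20 rows x 5 columns of uniform random floats in [0, 1).
df = pd.DataFrame(rng.random((20, 5)))

# A Series from a plain Python list.
ss = pd.Series([3, 1, 2])

# Replace the default integer index with one daily date per row.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.head())
```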
Viewing / Inspecting Data
df.head(n)
df.tail(n)
df.shape
df.describe()
df.value_counts()
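A short sketch of the inspection calls on a toy table. The column names (city, temp) and values are hypothetical; value_counts is shown on a single column, which is the most common use.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "A"],
    "temp": [20, 22, 19, 25, 21],
})

first_two = df.head(2)           # first 2 rows
last_two = df.tail(2)            # last 2 rows
n_rows, n_cols = df.shape        # (rows, columns) tuple
summary = df.describe()          # summary statistics, numeric columns only by default
city_counts = df["city"].value_counts()  # frequency of each distinct value

print(summary)
print(city_counts)
```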
Selection
df[col]	Returns column with label col as Series
df[[col1, col2]]	Returns columns as a new DataFrame
ss.iloc[0]	Selection by position
df.iloc[0, :]	First row
df.iloc[0, 0]	First element of first column
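The selection forms above return different types, which this sketch makes visible. The frame and its column labels (a, b, c) are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]})

col_a = df["a"]             # single label -> Series
sub = df[["a", "b"]]        # list of labels -> new DataFrame
first_row = df.iloc[0, :]   # first row as a Series indexed by column label
first_cell = df.iloc[0, 0]  # scalar: first element of first column
first_of_series = col_a.iloc[0]  # positional selection on a Series

print(type(col_a).__name__, type(sub).__name__)
print(first_row)
```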
Data Cleaning
df.columns = ['a', 'b', 'c']	Rename columns
pd.isnull(df)	Checks for null values; returns a Boolean array
pd.notnull(df)	Opposite of pd.isnull()
df.dropna()	Drop all rows that contain null values
df.dropna(axis=1)	Drop all columns that contain null values
df.dropna(axis=1, thresh=n)	Drop all columns that have fewer than n non-null values
df.fillna(x)	Replace all null values with x
ss.fillna(ss.mean())
ss.astype(float)	Convert the datatype of the series to float
ss.replace(1, 'one')	Replace all values equal to 1 with 'one'
ss.replace([1, 3], ['one', 'three'])
df.rename(columns=lambda x: x + 1)
df.rename(columns={'old_name': 'new_name'})
df.set_index('column_one')
df.rename(index=lambda x: x + 1)
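The effect of the null-handling calls is easiest to see on a frame with a few holes in it. This is a sketch on invented data; the column labels and the threshold of 2 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

null_mask = pd.isnull(df)                   # Boolean frame, True where a value is missing
rows_kept = df.dropna()                     # keeps only rows with no nulls (row 2)
cols_kept = df.dropna(axis=1)               # keeps only columns with no nulls ("c")
cols_thresh = df.dropna(axis=1, thresh=2)   # keeps columns with at least 2 non-null values
filled = df["a"].fillna(df["a"].mean())     # mean skips NaN, so the gap becomes 2.0
renamed = df.rename(columns={"a": "alpha"})  # returns a new frame; df is unchanged

print(cols_thresh)
print(filled)
```

None of these calls modify df in place; each returns a new object, so assign the result if you want to keep it.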
Filter, Sort, and Groupby
df[df[col].gt(0.5)]
df[(df[col] > 0.5) & (df[col] < 0.7)]
df.sort_values(col1)
df.sort_values(col2, ascending=False)
df.sort_values([col1,col2], ascending=[True, False])	Sort values by col1 in ascending order then col2 in descending order
df.groupby(col)
df.groupby([col1, col2])	Returns groupby object for values from multiple columns
df.pivot_table(index=col1, values=[col2, col3], aggfunc=np.mean)	Create a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean)
df.apply(np.mean)	Apply the function np.mean() across each column
df.apply(np.max, axis=1)	Apply the function np.max() across each row
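A compact sketch of filtering, sorting, and grouping on one small frame. The column names (team, score) and the data are hypothetical. The aggregation is passed as the string "mean" rather than np.mean, which recent pandas versions appear to prefer; both describe the same operation here.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [0.2, 0.6, 0.8, 0.65],
})

# Boolean masks are combined with & and need parentheses around each condition.
in_range = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort by team ascending, then by score descending within each team.
ranked = df.sort_values(["team", "score"], ascending=[True, False])

# Mean score per team, two equivalent routes.
team_means = df.groupby("team")["score"].mean()
pivot = df.pivot_table(index="team", values="score", aggfunc="mean")

print(ranked)
print(team_means)
```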
Join / Combine
pd.concat([df1, df2])	Add the rows of df2 to the end of df1 (columns should be identical); df1.append(df2) was removed in pandas 2.0
pd.concat([df1, df2], axis=1)	Add the columns of df2 to the end of df1 (rows should be identical)
df1.join(df2, on=col1, how='inner')	SQL-style join of col1 in df1 against the index of df2
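The three ways of combining frames can be sketched as follows. All frame and column names are invented. Row stacking uses pd.concat because DataFrame.append was removed in pandas 2.0. Note that join with on= matches a column of the left frame against the index of the right frame.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"right_val": [10, 30]}, index=["a", "c"])
extra = pd.DataFrame({"flag": [True, False, True]})

# Stack rows: the second frame's rows follow the first frame's rows.
stacked = pd.concat([df1, df1], ignore_index=True)

# Stack columns: frames are aligned on their row index.
side_by_side = pd.concat([df1, extra], axis=1)

# Inner join: keep only rows of df1 whose "key" appears in df2's index.
joined = df1.join(df2, on="key", how="inner")

print(joined)
```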
Statistics
df.describe()
df.mean()
df.corr()	Returns the correlation between columns in a DataFrame
df.count()
df.max()
df.min()
df.median()
df.std()
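The statistics methods all work column by column and return one value per column. A small sketch on invented data, where y is exactly twice x so the correlation is easy to check by eye:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

means = df.mean()      # per-column mean
medians = df.median()  # per-column median
stds = df.std()        # sample standard deviation (ddof=1 by default)
counts = df.count()    # number of non-null values per column
corr = df.corr()       # pairwise Pearson correlation between columns

print(means)
print(corr)
```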