Common Pandas Operations
You may not need it, but you can't afford not to know it.
| Syntax | Explanation / Examples |
|---|---|
| Importing Data | |
| pd.read_csv(filename) | From a CSV file |
| pd.read_table(filename) | From a delimited text file (like TSV) |
| pd.read_excel(filename) | From an Excel file |
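| pd.read_sql(query, connection_object) | From a SQL table or database |
| pd.read_json(json_string) | From a JSON-formatted string, URL, or file |
| pd.read_html(url) | Parses the HTML tables found at a URL, in a string, or in a file into a list of DataFrames |
| pd.DataFrame(dict) | From a dict: keys become column names, values become column data |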
| Exporting Data | |
| df.to_csv(filename) | Write to a CSV file |
| df.to_sql(table_name, connection_object) | Write to a SQL table |
| df.to_json(filename) | Write to a file in JSON format |
| Create Test Objects | |
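| pd.DataFrame(np.random.rand(20, 5)) | 20 rows and 5 columns of random floats |
| pd.Series(my_list) | Create a Series from the iterable my_list |
| df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Replace the index with a daily date index, one date per row |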
| Viewing / Inspecting Data | |
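| df.head(n) | First n rows of the DataFrame |
| df.tail(n) | Last n rows of the DataFrame |
| df.shape | Number of rows and columns, as a tuple |
| df.describe() | Summary statistics for numeric columns |
| df.value_counts() | Counts of each unique row (on a Series: counts of each unique value) |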
| Selection | |
| df[col] | Returns column with label col as Series |
| df[[col1, col2]] | Returns columns as a new DataFrame |
| ss.iloc[0] | Selection by position |
| df.iloc[0, :] | First row |
| df.iloc[0, 0] | First element of first column |
| Data Cleaning | |
| df.columns = ['a','b','c'] | Rename columns |
| pd.isnull(df) | Checks for null values; returns a Boolean object of the same shape |
| pd.notnull(df) | Opposite of pd.isnull(df) |
| df.dropna() | Drop all rows that contain null values |
| df.dropna(axis=1) | Drop all columns that contain null values |
| df.dropna(axis=1, thresh=n) | Drop all columns that have fewer than n non-null values |
| df.fillna(x) | Replace all null values with x |
| ss.fillna(ss.mean()) | Replace all null values with the mean of the Series |
| ss.astype(float) | Convert the datatype of the series to float |
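| ss.replace(1, 'one') | Replace all values equal to 1 with 'one' |
| ss.replace([1, 3], ['one', 'three']) | Replace all 1 with 'one' and all 3 with 'three' |
| df.rename(columns=lambda x: x + 1) | Rename every column with a function (this example assumes numeric column labels) |
| df.rename(columns={'old_name': 'new_name'}) | Rename selected columns |
| df.set_index('column_one') | Return a DataFrame that uses column_one as its index |
| df.rename(index=lambda x: x + 1) | Rename every index label with a function (this example assumes numeric labels) |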
| Filter, Sort, and Groupby | |
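| df[df[col].gt(0.5)] | Rows where column col is greater than 0.5 |
| df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.5 < col < 0.7 |
| df.sort_values(col1) | Sort values by col1 in ascending order |
| df.sort_values(col2, ascending=False) | Sort values by col2 in descending order |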
| df.sort_values([col1,col2], ascending=[True, False]) | Sort values by col1 in ascending order then col2 in descending order |
| df.groupby(col) | Returns groupby object for values from one column |
| df.groupby([col1, col2]) | Returns groupby object for values from multiple columns |
| df.pivot_table(index=col1, values=[col2, col3], aggfunc='mean') | Create a pivot table that groups by col1 and calculates the mean of col2 and col3 |
| df.groupby(col1).agg('mean') | Mean of the remaining columns within each col1 group |
| df.apply(np.mean) | Apply the function np.mean() across each column |
| df.apply(np.max, axis=1) | Apply the function np.max() across each row |
| Join / Combine | |
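| pd.concat([df1, df2]) | Add the rows of df2 to the end of df1 (columns should be identical); df1.append(df2) was removed in pandas 2.0 |
| pd.concat([df1, df2], axis=1) | Add the columns of df2 to the end of df1 (rows should be identical) |
| df1.join(df2, on=col1, how='inner') | SQL-style join that matches col1 in df1 against the index of df2; how can be 'left', 'right', 'outer', or 'inner' |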
| Statistics | |
| df.describe() | Summary statistics for numeric columns |
| df.mean() | Mean of each column |
| df.corr() | Returns the correlation between columns in a DataFrame |
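| df.count() | Number of non-null values in each column |
| df.max() | Highest value in each column |
| df.min() | Lowest value in each column |
| df.median() | Median of each column |
| df.std() | Standard deviation of each column |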
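
The entries above can be chained into a small end-to-end workflow. The following is a minimal sketch, not a definitive recipe: the sample data and the column names `city`, `score`, and `visits` are assumptions made up for illustration, and it presumes a reasonably recent pandas release.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "C"],
    "score": [0.2, 0.6, np.nan, 0.9, 0.65],
    "visits": [1, 3, 5, np.nan, 9],
})

# Data cleaning: fill missing scores with the column mean,
# then drop any row that still contains a null value.
df["score"] = df["score"].fillna(df["score"].mean())
clean = df.dropna()

# Filter: keep rows whose score lies strictly between 0.5 and 0.7.
mid = clean[(clean["score"] > 0.5) & (clean["score"] < 0.7)]

# Sort: city ascending, then score descending within each city.
ranked = clean.sort_values(["city", "score"], ascending=[True, False])

# Groupby: mean score per city.
means = clean.groupby("city")["score"].mean()

# Combine: stack another frame with identical columns underneath.
extra = pd.DataFrame({"city": ["D"], "score": [0.4], "visits": [2.0]})
combined = pd.concat([clean, extra], ignore_index=True)

print(ranked)
print(means)
print(combined.shape)
```

Passing `ignore_index=True` to `pd.concat` renumbers the rows from 0, which avoids duplicate index labels after stacking.
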
- Blog Link: https://neo1989.net/CheatSheet/CHEATSHEET-pandas/
- Copyright Declaration: Please credit the source when reposting.