DataFrame: querying and matching rows and columns

Author: Kent、张
Published: June 23, 2021
Categories: Coding

A one-stop guide to querying the rows and columns of a pandas DataFrame and to selecting rows by condition.

The HTML export and the Jupyter notebook are linked at the end of the post.

import numpy as np
import pandas as pd

df=pd.DataFrame(
               np.arange(12).reshape((3,4)),
               index=['one','two','thr'],
               columns=list('abcd')
               )

	a	b	c	d
one	0	1	2	3
two	4	5	6	7
thr	8	9	10	11

Selecting columns

df['a']        # column a, returned as a Series
df[['a','b']]  # columns a and b, returned as a DataFrame

	a	b
one	0	1
two	4	5
thr	8	9
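The two forms above return different types: a single label gives a Series, while a list of labels (even a one-element list) gives a DataFrame. A minimal sketch that rebuilds the same df and checks this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'],
                  columns=list('abcd'))

col_a = df['a']        # single label -> Series
cols_a = df[['a']]     # list containing one label -> DataFrame
print(type(col_a).__name__, col_a.tolist())   # Series [0, 4, 8]
print(type(cols_a).__name__, cols_a.shape)    # DataFrame (3, 1)
```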

Selecting rows

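loc selects by label (here the row labels one/two/thr and the column labels a to d); iloc selects by integer position (0, 1, 2, ...) and does not accept labels.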
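To make the label/position distinction concrete, the sketch below reads the same cell both ways. The Series s with integer labels [10, 0, 5] is a hypothetical example chosen to show where the two accessors diverge.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'],
                  columns=list('abcd'))

# The same cell, addressed by label and by position
by_label = df.loc['two', 'c']     # row label 'two', column label 'c'
by_position = df.iloc[1, 2]       # row position 1, column position 2
print(by_label, by_position)      # 6 6

# With integer labels the two diverge: loc reads the label, iloc the position
s = pd.Series(['x', 'y', 'z'], index=[10, 0, 5])
print(s.loc[0], s.iloc[0])        # y x
```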

df.iloc[0]  # the row at position 0

a    0
b    1
c    2
d    3
Name: one, dtype: int64
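As with columns, the shape of the result depends on the key: a single position returns the row as a Series (its name is the row label, as in the output above), while a list of positions returns a DataFrame. A small sketch on the same df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'],
                  columns=list('abcd'))

row = df.iloc[0]        # a single position -> Series named after the row label
rows = df.iloc[[0]]     # a list of positions -> DataFrame (here with one row)
print(row.name, row.tolist())            # one [0, 1, 2, 3]
print(type(rows).__name__, rows.shape)   # DataFrame (1, 4)
```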

df.iloc[0:2]  # rows at positions 0 and 1 (the end position 2 is excluded)

	a	b	c	d
one	0	1	2	3
two	4	5	6	7

df.loc['one':'two']  # rows labelled 'one' through 'two' (both ends included)

	a	b	c	d
one	0	1	2	3
two	4	5	6	7
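The two slices above return the same rows even though their end points differ, because iloc slices exclude the end position while loc slices include the end label. A quick check on the same df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'],
                  columns=list('abcd'))

by_pos = df.iloc[0:2]            # positions 0 and 1; end position 2 excluded
by_label = df.loc['one':'two']   # labels 'one' through 'two'; end label included
print(by_pos.equals(by_label))   # True

# A one-label loc slice keeps that row; an empty iloc slice keeps nothing
print(df.loc['one':'one'].shape, df.iloc[0:0].shape)   # (1, 4) (0, 4)
```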

df.loc[['one','two'],['a','c']]  # rows 'one' and 'two', columns a and c

	a	c
one	0	2
two	4	6
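loc takes a row key and a column key separated by a comma, and iloc accepts the same two-axis form with positions. The sketch below shows that the selection above can be written either way, and that label slices work on the column axis too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=['one', 'two', 'thr'],
                  columns=list('abcd'))

a = df.loc[['one', 'two'], ['a', 'c']]   # rows and columns by label
b = df.iloc[[0, 1], [0, 2]]              # the same cells by position
print(a.equals(b))                        # True

# Label slices work on both axes; both ends are included
print(df.loc['one':'two', 'a':'c'].columns.tolist())   # ['a', 'b', 'c']
```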

Matching: selecting rows by condition

cols = ['age', 'nickname', 'login_date', 'user_id', 'lover_id']
# Each Series is one row; note that login_date is stored as a string
d = [
    pd.Series([21, 'jel', '20210623', 21, 51100], index=cols),
    pd.Series([19, 'dscx', '20200623', 22, 51101], index=cols),
    pd.Series([25, 'lll', '20210323', 23, 51002], index=cols),
    pd.Series([30, 'lent', '20210601', 26, 51021], index=cols),
]
df = pd.DataFrame(d, columns=cols)
df

	age	nickname	login_date	user_id	lover_id
0	21	jel	20210623	21	51100
1	19	dscx	20200623	22	51101
2	25	lll	20210323	23	51002
3	30	lent	20210601	26	51021

Matching with comparison operators (==, >, <, and so on)

df.loc[df['user_id']>21]

	age	nickname	login_date	user_id	lover_id
1	19	dscx	20200623	22	51101
2	25	lll	20210323	23	51002
3	30	lent	20210601	26	51021
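A comparison on a column produces a boolean Series with one entry per row, and indexing with it keeps the rows marked True. The sketch below rebuilds the same table from a dict of columns (an assumption made for brevity; the values match the table above) and shows == next to the > example:

```python
import pandas as pd

# Same data as above, rebuilt from a dict of columns for brevity
df = pd.DataFrame({
    'age': [21, 19, 25, 30],
    'nickname': ['jel', 'dscx', 'lll', 'lent'],
    'login_date': ['20210623', '20200623', '20210323', '20210601'],
    'user_id': [21, 22, 23, 26],
    'lover_id': [51100, 51101, 51002, 51021],
})

exact = df.loc[df['nickname'] == 'lent']   # == keeps rows equal to the value
print(exact.index.tolist())                # [3]

mask = df['user_id'] > 21                  # boolean Series, one entry per row
print(mask.tolist())                       # [False, True, True, True]
print(df[mask].equals(df.loc[mask]))       # True: plain [] also accepts a mask
```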

Use isin to keep rows whose value is in a given list

df.loc[df['age'].isin([21,19])]

	age	nickname	login_date	user_id	lover_id
0	21	jel	20210623	21	51100
1	19	dscx	20200623	22	51101
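isin is a compact way to write a chain of == tests joined by | (logical OR). A sketch, on the same data rebuilt from a dict for brevity, checking that the two masks agree:

```python
import pandas as pd

# Same data as above, rebuilt from a dict of columns for brevity
df = pd.DataFrame({
    'age': [21, 19, 25, 30],
    'nickname': ['jel', 'dscx', 'lll', 'lent'],
    'login_date': ['20210623', '20200623', '20210323', '20210601'],
    'user_id': [21, 22, 23, 26],
    'lover_id': [51100, 51101, 51002, 51021],
})

by_isin = df['age'].isin([21, 19])
by_or = (df['age'] == 21) | (df['age'] == 19)
print(by_isin.equals(by_or))                    # True
print(df.loc[by_isin, 'nickname'].tolist())     # ['jel', 'dscx']
```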

Combine several conditions with & (wrap each comparison in parentheses)

df.loc[(df['age'] <30) & df['user_id'].isin([21,22,23,26])]

	age	nickname	login_date	user_id	lover_id
0	21	jel	20210623	21	51100
1	19	dscx	20200623	22	51101
2	25	lll	20210323	23	51002
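& means "and" element by element; | is the matching "or". Python's own and/or keywords cannot be used here because they need a single True/False, and pandas raises a ValueError when asked for the truth value of a whole Series. A sketch on the same data (rebuilt from a dict for brevity; the thresholds are arbitrary examples):

```python
import pandas as pd

# Same data as above, rebuilt from a dict of columns for brevity
df = pd.DataFrame({
    'age': [21, 19, 25, 30],
    'nickname': ['jel', 'dscx', 'lll', 'lent'],
    'login_date': ['20210623', '20200623', '20210323', '20210601'],
    'user_id': [21, 22, 23, 26],
    'lover_id': [51100, 51101, 51002, 51021],
})

either = df.loc[(df['age'] > 28) | (df['user_id'] == 22)]   # | is element-wise OR
print(either['nickname'].tolist())   # ['dscx', 'lent']

raised = False
try:
    (df['age'] < 30) and (df['user_id'] > 21)   # 'and' calls bool() on a whole Series
except ValueError:
    raised = True   # pandas refuses: the truth value of a Series is ambiguous
print(raised)   # True
```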

Use != to exclude rows equal to a given value

df.loc[df['login_date'] != '20210623']

	age	nickname	login_date	user_id	lover_id
1	19	dscx	20200623	22	51101
2	25	lll	20210323	23	51002
3	30	lent	20210601	26	51021
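The value compared against must have the same type as the column. login_date holds strings such as '20210623', so comparing it with the integer 20210623 never matches: == is False for every row and != is True for every row. The sketch below, on the same data rebuilt from a dict for brevity, shows this and converts the column to real dates; the '%Y%m%d' format and the 2021-06-01 cutoff are assumptions chosen for illustration.

```python
import pandas as pd

# Same data as above, rebuilt from a dict of columns for brevity
df = pd.DataFrame({
    'age': [21, 19, 25, 30],
    'nickname': ['jel', 'dscx', 'lll', 'lent'],
    'login_date': ['20210623', '20200623', '20210323', '20210601'],
    'user_id': [21, 22, 23, 26],
    'lover_id': [51100, 51101, 51002, 51021],
})

# login_date holds strings, so an int never compares equal to it
print((df['login_date'] == 20210623).any())     # False
print((df['login_date'] == '20210623').any())   # True

# Converting to datetimes allows date comparisons (format assumed to be YYYYMMDD)
dates = pd.to_datetime(df['login_date'], format='%Y%m%d')
recent = df.loc[dates >= '2021-06-01', 'nickname'].tolist()
print(recent)   # ['jel', 'lent']
```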

isin returns a boolean Series; to select the rows that do not match, negate it with ~

df.loc[~df['user_id'].isin([21,22])]

	age	nickname	login_date	user_id	lover_id
2	25	lll	20210323	23	51002
3	30	lent	20210601	26	51021
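~ flips any boolean mask, not only one produced by isin, so it can also negate a combined condition (the parentheses around the whole & expression matter). A sketch on the same data, rebuilt from a dict for brevity, with arbitrary example thresholds:

```python
import pandas as pd

# Same data as above, rebuilt from a dict of columns for brevity
df = pd.DataFrame({
    'age': [21, 19, 25, 30],
    'nickname': ['jel', 'dscx', 'lll', 'lent'],
    'login_date': ['20210623', '20200623', '20210323', '20210601'],
    'user_id': [21, 22, 23, 26],
    'lover_id': [51100, 51101, 51002, 51021],
})

cond = (df['age'] < 30) & (df['user_id'] > 21)
kept = df.loc[~cond, 'nickname'].tolist()   # rows where the combined condition is False
print(kept)   # ['jel', 'lent']
```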

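Downloads (the HTML version gives a better reading experience):

DataFrame-查询与匹配.html.zip (http://blog-cdn.shipaoniu.com/blog/typecho/DataFrame-%E6%9F%A5%E8%AF%A2%E4%B8%8E%E5%8C%B9%E9%85%8D.html.zip-typecho)
DataFrame-查询与匹配.ipynb.zip (http://blog-cdn.shipaoniu.com/blog/typecho/DataFrame-%E6%9F%A5%E8%AF%A2%E4%B8%8E%E5%8C%B9%E9%85%8D.ipynb.zip-typecho)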


A good pandas learning resource: joyful-pandas.zip (http://blog-cdn.shipaoniu.com/blog/typecho/joyful-pandas.zip-typecho)

Last modified: June 23, 2021

