Pandas

Author: harrytsz
Published: April 23, 2021
Categories: Python

Reading Data with Pandas

This code demonstrates:

- reading plain-text files with Pandas
  - reading a CSV file
  - reading a TXT file
- reading an xlsx-format Excel file with Pandas
- reading a MySQL table with Pandas

import pandas as pd

Reading plain-text files

Reading a CSV file with the default header row and comma delimiter

fpath = "./datas/ml-latest-small/ratings.csv"

ratings = pd.read_csv(fpath)  # read the data with pd.read_csv

ratings.head()  # view the first few rows (5 by default)

	userId	movieId	rating	timestamp
0	1	1	4.0	964982703
1	1	3	4.0	964981247
2	1	6	4.0	964982224
3	1	47	5.0	964983815
4	1	50	5.0	964982931
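A self-contained sketch of the same call, assuming a small in-memory CSV in place of ratings.csv (the rows are copied from the output above) so it runs without the data file. With no extra arguments, read_csv takes the first line as the header and splits fields on commas; head(n) and tail(n) return the first and last n rows.

```python
import io

import pandas as pd

# A few rows copied from ratings.csv, held in memory instead of on disk
csv_text = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931
"""

# Defaults: the first line becomes the header, fields are split on ","
ratings = pd.read_csv(io.StringIO(csv_text))

print(ratings.head(2))  # first 2 rows
print(ratings.tail(1))  # last row
```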

ratings.shape  # shape of the data: (number of rows, number of columns)

(100836, 4)

ratings.columns  # column names

Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')

ratings.index  # row index

RangeIndex(start=0, stop=100836, step=1)
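By default read_csv attaches a RangeIndex (0, 1, 2, ...) as shown above. If one of the columns should serve as the row labels instead, the index_col parameter selects it. A minimal sketch, again assuming a few in-memory rows in place of the real file:

```python
import io

import pandas as pd

csv_text = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
"""

# Default: a RangeIndex 0..n-1 is created
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.index)

# index_col: use the movieId column as the row labels instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col="movieId")
print(df_indexed.index)
print(df_indexed.loc[3, "rating"])  # look up a row by its movieId label
```

The column chosen as the index is removed from the regular columns.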

ratings.dtypes  # data type of each column

userId         int64
movieId        int64
rating       float64
timestamp      int64
dtype: object
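The types above are inferred from the values: whole numbers become int64 and numbers with a decimal point become float64. When a different type is wanted, the dtype parameter overrides the inference per column. A sketch on the same kind of in-memory rows (an assumption standing in for ratings.csv); keeping IDs as strings is just one illustrative choice:

```python
import io

import pandas as pd

csv_text = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
"""

inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)  # userId/movieId/timestamp -> int64, rating -> float64

# Override inference: keep the ID columns as strings rather than integers
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"userId": str, "movieId": str},
)
print(typed.dtypes)
```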

Reading a TXT file with a custom delimiter and column names

fpath = "./datas/crazyant/access_pvuv.txt"

pvuv = pd.read_csv(
    fpath,
    sep="\t",                     # fields are tab-separated
    header=None,                  # the file has no header row
    names=['pdate', 'pv', 'uv']   # column names to assign
)

pvuv

	pdate	pv	uv
0	2019-09-10	139	92
1	2019-09-09	185	153
2	2019-09-08	123	59
3	2019-09-07	65	40
4	2019-09-06	157	98
5	2019-09-05	205	151
6	2019-09-04	196	167
7	2019-09-03	216	176
8	2019-09-02	227	148
9	2019-09-01	105	61
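Because access_pvuv.txt has no header row, header=None stops read_csv from using the first data line as column names, and names supplies them instead. A sketch with a few of the rows above inlined as a tab-separated string (standing in for the file), including what happens if header=None is left out:

```python
import io

import pandas as pd

# Tab-separated, no header line (rows copied from the output above)
txt = "2019-09-10\t139\t92\n2019-09-09\t185\t153\n2019-09-08\t123\t59\n"

# Without header=None, the first data row is consumed as the column names
wrong = pd.read_csv(io.StringIO(txt), sep="\t")
print(wrong.columns.tolist())

pvuv = pd.read_csv(
    io.StringIO(txt),
    sep="\t",
    header=None,
    names=["pdate", "pv", "uv"],
)
print(pvuv)
```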

Reading an Excel file

fpath = "./datas/crazyant/access_pvuv.xlsx"
pvuv = pd.read_excel(fpath)  # reads the first sheet by default; .xlsx files need openpyxl installed

pvuv

	日期	PV	UV
0	2019-09-10	139	92
1	2019-09-09	185	153
2	2019-09-08	123	59
3	2019-09-07	65	40
4	2019-09-06	157	98
5	2019-09-05	205	151
6	2019-09-04	196	167
7	2019-09-03	216	176
8	2019-09-02	227	148
9	2019-09-01	105	61
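The sheet's header row is 日期 (Chinese for "date"), PV, UV, so this DataFrame does not share column names with the TXT version. One way to align them is DataFrame.rename with a mapping. The sketch below builds a small DataFrame by hand as a stand-in for the read_excel result, so it needs neither the Excel file nor openpyxl:

```python
import pandas as pd

# Stand-in for pd.read_excel(fpath): same header as the xlsx sheet
pvuv_xlsx = pd.DataFrame(
    {"日期": ["2019-09-10", "2019-09-09"], "PV": [139, 185], "UV": [92, 153]}
)

# Map the sheet's headers onto the names used for the TXT file
pvuv_xlsx = pvuv_xlsx.rename(columns={"日期": "pdate", "PV": "pv", "UV": "uv"})
print(pvuv_xlsx.columns.tolist())
```

read_excel also accepts header and names parameters much like read_csv, so the renaming could likely be done at read time instead.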

Reading a MySQL table

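import pymysql

# Connect to a local MySQL server (host, credentials and database are specific to the author's setup)
conn = pymysql.connect(
    host='127.0.0.1',
    user='root',
    password='123456',
    database='test_db',
    charset='utf8'
)

# Run the query and load the result set into a DataFrame.
# Recent pandas versions emit a UserWarning for DBAPI2 connections other than sqlite3;
# passing a SQLAlchemy engine instead avoids it.
mysql_page = pd.read_sql("select * from crazyant_pvuv", con=conn)
conn.close()  # the DataFrame already holds the data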

mysql_page

	pdate	pv	uv
0	2019-09-10	139	92
1	2019-09-09	185	153
2	2019-09-08	123	59
3	2019-09-07	65	40
4	2019-09-06	157	98
5	2019-09-05	205	151
6	2019-09-04	196	167
7	2019-09-03	216	176
8	2019-09-02	227	148
9	2019-09-01	105	61
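The same pd.read_sql call works with any connection pandas supports. As a runnable stand-in for the MySQL server (which this sketch does not assume is available), the standard-library sqlite3 module with an in-memory database is swapped in below; the table name crazyant_pvuv and its rows mirror the output above. The params argument passes values to the driver instead of formatting them into the SQL string. Placeholder syntax is driver-specific: sqlite3 uses ?, whereas pymysql uses %s.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL test_db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crazyant_pvuv (pdate TEXT, pv INTEGER, uv INTEGER)")
conn.executemany(
    "INSERT INTO crazyant_pvuv VALUES (?, ?, ?)",
    [("2019-09-10", 139, 92), ("2019-09-09", 185, 153), ("2019-09-08", 123, 59)],
)
conn.commit()

# Whole table, as in the MySQL example
all_rows = pd.read_sql("SELECT * FROM crazyant_pvuv", con=conn)

# Filtered query with a bound parameter (sqlite3 placeholder is ?)
busy_days = pd.read_sql(
    "SELECT pdate, pv FROM crazyant_pvuv WHERE pv > ?",
    con=conn,
    params=(130,),
)
conn.close()
print(busy_days)
```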

Last modified: June 20, 2021

© Reprinting permitted with proper attribution


</div>