<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Data Analysis: Data Cleaning</title>
<link href="/2019/01/20/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90-%E6%95%B0%E6%8D%AE%E6%B8%85%E6%B4%97/"/>
<url>/2019/01/20/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90-%E6%95%B0%E6%8D%AE%E6%B8%85%E6%B4%97/</url>
<content type="html"><![CDATA[<h1 id="数据分析—-数据清洗"><a href="#数据分析—-数据清洗" class="headerlink" title="数据分析—-数据清洗"></a>数据分析—-数据清洗</h1><h2 id="一、导入数据"><a href="#一、导入数据" class="headerlink" title="一、导入数据"></a>一、导入数据</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">from</span> pandas <span class="keyword">import</span> Series,DataFrame</span><br><span class="line"><span class="keyword">import</span> xlrd</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">df = DataFrame(pd.read_excel(<span class="string">'datas/grades.xlsx'</span>)) </span><br><span class="line">print(df)</span><br></pre></td></tr></table></figure><pre><code> Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 30 蒋广佳 43.0 69.0 61.01 廖菲 80.0 64.0 62.02 沈秀玲 68.0 74.0 98.03 韦丹 48.0 53.0 64.04 张梦雅 72.0 73.0 96.05 赵雅欣 60.0 NaN 70.06 曹海广 74.0 60.0 20.07 陈泽灿 38.0 21.0 92.08 NaN 88.0 67.0 84.09 高海亮 86.0 74.0 96.010 顾晓冬 84.0 60.0 90.011 侯星宇 64.0 111.0 NaN12 江宜哲 60.0 33.0 70.013 NaN NaN NaN NaN14 梁杨杨 68.0 54.0 94.015 刘辉 NaN 63.0 98.016 罗嘉豪 39.0 44.0 56.017 施亚君 90.0 63.0 90.018 孙添 64.0 63.0 78.019 王杰 74.0 NaN 76.020 王泽 52.0 48.0 94.021 NaN 60.0 69.0 74.022 杨福程 70.0 49.0 76.023 尤澳晨 91.0 67.0 86.024 翟佳 78.0 73.0 88.025 张旭 100.0 60.0 98.026 支星哲 80.0 63.0 100.027 邹湘涛 54.0 40.0 
90.0</code></pre><ul><li>我们可以看见上面的数据是缺少标注的,列名缺少标注;并且有很多是空值,因此我们要对数据进行清洗,提高数据的质量。在这里数据清洗有四个要点简称“完全合一”<ul><li><strong>完</strong>整性:单条数据是否完整,统计的字段是否完善。</li><li><strong>全</strong>面性:观察某一列的全部数值,选中一列,我们可以看到最大值,最小值,平均值。我们可以通过常识判断数据是否合理,比如:数据定义、单位标识、数值本身。</li><li><strong>合</strong>法性:数据的类型、内容、大小的合法性。比如数据中存在非ASCII字符,性别存在未知,总分超过100等。</li><li>唯<strong>一</strong>性:数据是否存在重复记录,由于数据来源于不同的渠道,重复的情况是非常常见的。行数据、列数据都需要是唯一的。</li></ul></li><li>事实上数据清洗的标准有差不多七八条,有兴趣的可以了解一下,这里归纳为“完全合一”四条,按照这四条基本上可以解决数据清洗中的大部分问题,使得数据<strong>标准、干净、连续</strong>。</li></ul><h2 id="二、开始数据清洗"><a href="#二、开始数据清洗" class="headerlink" title="二、开始数据清洗"></a>二、开始数据清洗</h2><h3 id="1、完整性"><a href="#1、完整性" class="headerlink" title="1、完整性"></a>1、完整性</h3><h3 id="problem-1-空行"><a href="#problem-1-空行" class="headerlink" title="problem 1:空行"></a>problem 1:空行</h3><ul><li>solution: 删除</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df.dropna(how=<span class="string">"all"</span>,inplace=<span class="keyword">True</span>)</span><br></pre></td></tr></table></figure><h3 id="problem-2-缺失值"><a href="#problem-2-缺失值" class="headerlink" title="problem 2:缺失值"></a>problem 2:缺失值</h3><ul><li>solution:<ul><li>删除:删除数据缺失的记录</li><li>均值:使用当前列的均值</li><li>高频:使用当前列出现平率最高的数据</li></ul></li><li>首先我们先把列的标注补上</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df = df.rename(columns={<span class="string">'Unnamed: 0'</span>:<span class="string">'index'</span>,<span class="string">'Unnamed: 1'</span>:<span class="string">'math'</span>,<span class="string">'Unnamed: 2'</span>:<span class="string">'english'</span>,<span class="string">'Unnamed: 3'</span>:<span class="string">'c++'</span>})</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td 
class="code"><pre><span class="line">df = df.drop(columns=<span class="string">'index'</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(df)</span><br></pre></td></tr></table></figure><pre><code> math english c++0 43.0 69.0 61.01 80.0 64.0 62.02 68.0 74.0 98.03 48.0 53.0 64.04 72.0 73.0 96.05 60.0 NaN 70.06 74.0 60.0 20.07 38.0 21.0 92.08 88.0 67.0 84.09 86.0 74.0 96.010 84.0 60.0 90.011 64.0 111.0 NaN12 60.0 33.0 70.014 68.0 54.0 94.015 NaN 63.0 98.016 39.0 44.0 56.017 90.0 63.0 90.018 64.0 63.0 78.019 74.0 NaN 76.020 52.0 48.0 94.021 60.0 69.0 74.022 70.0 49.0 76.023 91.0 67.0 86.024 78.0 73.0 88.025 100.0 60.0 98.026 80.0 63.0 100.027 54.0 40.0 90.0</code></pre><ul><li>现在我们想对df[‘math’]中缺失的值用平均值代替</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df[<span class="string">'math'</span>].fillna(df[<span class="string">'math'</span>].mean(),inplace=<span class="keyword">True</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(df)</span><br></pre></td></tr></table></figure><pre><code> math english c++0 43.000000 69.0 61.01 80.000000 64.0 62.02 68.000000 74.0 98.03 48.000000 53.0 64.04 72.000000 73.0 96.05 60.000000 NaN 70.06 74.000000 60.0 20.07 38.000000 21.0 92.08 88.000000 67.0 84.09 86.000000 74.0 96.010 84.000000 60.0 90.011 64.000000 111.0 NaN12 60.000000 33.0 70.013 68.653846 NaN NaN14 68.000000 54.0 94.015 68.653846 63.0 98.016 39.000000 44.0 56.017 90.000000 63.0 90.018 64.000000 63.0 78.019 74.000000 NaN 76.020 52.000000 48.0 94.021 60.000000 69.0 74.022 70.000000 49.0 76.023 91.000000 67.0 86.024 78.000000 73.0 88.025 100.000000 60.0 98.026 
80.000000 63.0 100.027 54.000000 40.0 90.0</code></pre><ul><li>To fill the missing values in english with the most frequent value instead, use value_counts to get the most frequent value of the english column, store it in english_maxf, and then fill with it.</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">english_maxf = df[<span class="string">'english'</span>].value_counts().index[<span class="number">0</span>]</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df[<span class="string">'english'</span>] = df[<span class="string">'english'</span>].fillna(english_maxf)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(df)</span><br></pre></td></tr></table></figure><pre><code>          math  english    c++
0    43.000000     69.0   61.0
1    80.000000     64.0   62.0
2    68.000000     74.0   98.0
3    48.000000     53.0   64.0
4    72.000000     73.0   96.0
5    60.000000     63.0   70.0
6    74.000000     60.0   20.0
7    38.000000     21.0   92.0
8    88.000000     67.0   84.0
9    86.000000     74.0   96.0
10   84.000000     60.0   90.0
11   64.000000    111.0    NaN
12   60.000000     33.0   70.0
13   68.653846     63.0    NaN
14   68.000000     54.0   94.0
15   68.653846     63.0   98.0
16   39.000000     44.0   56.0
17   90.000000     63.0   90.0
18   64.000000     63.0   78.0
19   74.000000     63.0   76.0
20   52.000000     48.0   94.0
21   60.000000     69.0   74.0
22   70.000000     49.0   76.0
23   91.000000     67.0   86.0
24   78.000000     73.0   88.0
25  100.000000     60.0   98.0
26   80.000000     63.0  100.0
27   54.000000     40.0   90.0</code></pre><h3 id="2、全面性"><a href="#2、全面性" class="headerlink" title="2. Comprehensiveness"></a>2. Comprehensiveness</h3><h3 id="problem:列数据单位不统一"><a href="#problem:列数据单位不统一" class="headerlink" title="Problem: inconsistent units within a column"></a>Problem: inconsistent units within a column</h3><p>Solution: find the values that use a different unit and convert them one by one, for example converting pounds (lbs) to kilograms (kgs).</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Select the rows of the weight column whose unit is lbs</span></span><br><span class="line">rows_with_lbs = df[<span class="string">'weight'</span>].str.contains(<span class="string">'lbs'</span>).fillna(<span class="keyword">False</span>)</span><br><span class="line">print(df[rows_with_lbs])</span><br><span class="line"><span class="comment"># Convert lbs to kgs, using 2.2 lbs = 1 kg</span></span><br><span class="line"><span class="keyword">for</span> i,lbs_row <span class="keyword">in</span> df[rows_with_lbs].iterrows():</span><br><span class="line">    <span class="comment"># Slice off the last three characters, i.e. drop the "lbs" suffix</span></span><br><span class="line">    weight = int(float(lbs_row[<span class="string">'weight'</span>][:<span class="number">-3</span>])/<span class="number">2.2</span>)</span><br><span class="line">    df.at[i,<span class="string">'weight'</span>] = <span class="string">'{}kgs'</span>.format(weight)</span><br></pre></td></tr></table></figure><h3 id="3、合理性"><a href="#3、合理性" class="headerlink" title="3. Validity"></a>3. Validity</h3><h3 id="problem-非ASCII字符"><a href="#problem-非ASCII字符" class="headerlink" title="Problem: non-ASCII characters"></a>Problem: non-ASCII characters</h3><p>Solution: non-ASCII characters can be either removed or replaced; here we simply remove them.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df[<span class="string">'name'</span>] = df[<span class="string">'name'</span>].replace({<span class="string">r'[^\x00-\x7f]+'</span>:<span class="string">''</span>},regex=<span class="keyword">True</span>)</span><br></pre></td></tr></table></figure><h3 id="4、唯一性"><a href="#4、唯一性" class="headerlink" title="4. Uniqueness"></a>4. Uniqueness</h3><h3 id="problem1:一列有多个参数"><a href="#problem1:一列有多个参数" class="headerlink" 
title="Problem 1: one column holds multiple values"></a>Problem 1: one column holds multiple values</h3><p>Solution: an English name, for example, has two parts, First_name and Last_name. We need to split the name column into two fields, First_name and Last_name, which can be done with the split method.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">df[[<span class="string">'First_name'</span>,<span class="string">'Last_name'</span>]] = df[<span class="string">'name'</span>].str.split(expand=<span class="keyword">True</span>)</span><br><span class="line">df.drop(<span class="string">'name'</span>,axis=<span class="number">1</span>,inplace=<span class="keyword">True</span>)</span><br></pre></td></tr></table></figure><h3 id="problem2-重读数据"><a href="#problem2-重读数据" class="headerlink" title="Problem 2: duplicate records"></a>Problem 2: duplicate records</h3><p>Solution: check whether the data contains duplicate records and, if it does, remove them with the drop_duplicates() method that pandas provides.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df.drop_duplicates([<span class="string">'First_name'</span>,<span class="string">'Last_name'</span>],inplace=<span class="keyword">True</span>)</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> Data Analysis </tag>
</tags>
</entry>
<entry>
<title>Scraping NetEase Cloud Music Hot Comments and Generating a Word Cloud</title>
<link href="/2019/01/12/%E7%88%AC%E5%8F%96%E7%BD%91%E6%98%93%E4%BA%91%E7%83%AD%E8%AF%84%E5%B9%B6%E7%94%9F%E6%88%90%E8%AF%8D%E4%BA%91/"/>
<url>/2019/01/12/%E7%88%AC%E5%8F%96%E7%BD%91%E6%98%93%E4%BA%91%E7%83%AD%E8%AF%84%E5%B9%B6%E7%94%9F%E6%88%90%E8%AF%8D%E4%BA%91/</url>
<content type="html"><![CDATA[<h2 id="分析热评的请求URL"><a href="#分析热评的请求URL" class="headerlink" title="分析热评的请求URL"></a>分析热评的请求URL</h2><ul><li><p>首先我们先对请求抓包,发现所有的评论都包含在 <strong><a href="https://music.163.com/weapi/v1/resource/comments/R_SO_4_32785700?csrf_token="" target="_blank" rel="noopener">https://music.163.com/weapi/v1/resource/comments/R_SO_4_32785700?csrf_token="</a></strong>里面,然后再去分析这个请求,发现这是一个<strong>POST</strong>请求,请求参数由两个<strong>params</strong>以及<strong>encSecKey</strong>。好了到此我们需要的东西都有了,接下来我们分析如何去得到这两个参数。</p><h3 id="找到请求"><a href="#找到请求" class="headerlink" title="找到请求"></a>找到请求</h3><p><img src="/2019/01/12/爬取网易云热评并生成词云/1.png" alt="1.png"></p><h3 id="分析请求参数"><a href="#分析请求参数" class="headerlink" title="分析请求参数"></a>分析请求参数</h3><p><img src="/2019/01/12/爬取网易云热评并生成词云/2.png" alt="2.png"></p><h2 id="分析js加密"><a href="#分析js加密" class="headerlink" title="分析js加密"></a>分析js加密</h2></li><li>找到全局js文件,找到两个参数所在的位置<br><img src="/2019/01/12/爬取网易云热评并生成词云/3.png" alt="3.png"></li><li>发现这两个参数是由<strong>window.asrsea</strong>获得的,接着去定位到这个函数找到对应的原函数<strong>d</strong><br><img src="/2019/01/12/爬取网易云热评并生成词云/4.png" alt="4.ong"></li><li>对js进行调试,发现d的四个参数,有三个是定值,这个函数还用到了a、b、c三个函数<br><img src="/2019/01/12/爬取网易云热评并生成词云/5.png" alt="5"></li><li>其中a是产生一个16位的随机数(这里我直接让它等于<strong>FwtEYduOXlNEHbLP</strong>)为什么要等与这个呢 hhh 因为我发现这个随机数,他在生成encText的时候用了一次,生成encSecKey的时候,又用了一次,而且encSecKey就只跟这个随机数相关,所以让这个随机数为定值的话,就可以直接得到encSecKey的值,不用再去搞一个rsa加密,为了让你们看清楚,我还是把贴出来把<br><img src="/2019/01/12/爬取网易云热评并生成词云/6.png" alt="6"></li><li><strong>b</strong>函数就是我们主要要解决的<strong>AES</strong>加密,经过调试,我们可以知道它的两个参数a、b分别是加密字符转、密钥。以及AES的偏移量为<strong>0102030405060708</strong>、加密模式为<strong>CBC</strong><br><img src="/2019/01/12/爬取网易云热评并生成词云/7.png" alt="7"></li><li>接下来看c函数,c函数其实是<strong>RSA</strong>加密,获取encSecKey的值的他的三个参数,只有a是变量,是我们随机生成的16为随机数,这里我们就默认为定值,b、c应该是和rsa加密有关的参数,应为本身并没有学过加密,这里我就不多说了,但是经过调试,我们可以知道b、c是定值 <strong>b =010001</strong> c是一大串字符串。见下图。<br><img src="/2019/01/12/爬取网易云热评并生成词云/8.png" 
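alt="8"></li><li><p>Finally, let's look at function d in detail. After many rounds of debugging, it turned out to match what I expected: h is a dictionary holding the two parameters we need. encText is produced by two rounds of AES encryption, that is, two calls to b. The key for the first round is the fixed value <strong>0CoJUm6Qyw8W8jud</strong>; the key for the second round is the 16-character random string, which we also treat as fixed. That gives us encText. encSecKey is produced by a single RSA encryption and depends only on the 16-character random string, which is why I set the random string directly to <strong>FwtEYduOXlNEHbLP</strong>: that value happened to come up while I was debugging, so I simply reused it. The encSecKey corresponding to this random string is <strong>81e7a41af9830200d5606be1a632e57eb0006b3cdae579127115c6323d4c4802f3af9efcee21d9f4126dde266773cbd795f19ae44028f9f8d038cd62d2816952fa99bb61ecb5fba87d5b178ff4b982ee34c7491808f7cb774554a0235a210caf2e5e867a0e2ebdf6f994be1b198ab43b14ce1f7cfa6f80b9070dea5fc5d6c712</strong><br><img src="/2019/01/12/爬取网易云热评并生成词云/9.png" alt=""></p><h2 id="用python重写js加密"><a href="#用python重写js加密" class="headerlink" title="Rewriting the JS encryption in Python"></a>Rewriting the JS encryption in Python</h2></li><li><p>Based on the analysis of the JS encryption code, I implemented the AES encryption in Python. The function takes two parameters, the string to encrypt and the key:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> base64</span><br><span class="line"><span class="keyword">from</span> Crypto.Cipher <span class="keyword">import</span> AES  <span class="comment"># provided by the pycryptodome package</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">AES_encrypt</span><span class="params">(text, key)</span>:</span></span><br><span class="line">    data = text.encode(<span class="string">'utf-8'</span>)</span><br><span class="line">    pad = <span class="number">16</span> - len(data) % <span class="number">16</span>  <span class="comment"># pad up to a multiple of the 16-byte block size</span></span><br><span class="line">    data += bytes([pad]) * pad</span><br><span class="line">    encryptor = AES.new(key.encode(<span class="string">'utf-8'</span>), AES.MODE_CBC, <span class="string">b"0102030405060708"</span>)</span><br><span class="line">    encrypt_text = encryptor.encrypt(data)</span><br><span class="line">    encrypt_text = base64.b64encode(encrypt_text)</span><br><span class="line">    <span class="keyword">return</span> encrypt_text</span><br></pre></td></tr></table></figure></li><li><p>Call this function twice. The result is identical to the value seen while debugging. Here are the code and the screenshots.</p><figure class="highlight python"><table><tr><td 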
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">f_key = <span class="string">"0CoJUm6Qyw8W8jud"</span></span><br><span class="line">text = <span class="string">"{\"rid\":\"R_SO_4_32785700\",\"offset\":\"20\",\"total\":\"true\",\"limit\":\"20\",\"csrf_token\":\"\"}"</span></span><br><span class="line">rs = AES_encrypt(text, f_key)</span><br><span class="line">params = AES_encrypt(str(rs)[<span class="number">2</span>:<span class="number">-1</span>], <span class="string">"FwtEYduOXlNEHbLP"</span>)</span><br></pre></td></tr></table></figure><p>这里解释一下,text是我进过N次调试得出的,因为在请求评论之前,text有好几个值来验证其他的东西,这里我大概理解了一下text的含义,这里我们只要知道offset是偏移量,limit是每次请求多少条,比如你请求前二十条则offset=0,limit = 20,我上面的是请求20-40条。<br><img src="/2019/01/12/爬取网易云热评并生成词云/10.png" alt=""><br><img src="/2019/01/12/爬取网易云热评并生成词云/11.png" alt=""></p></li><li><p>然后直接获取的encSecKey直接赋值就好啦,结合这两个参数,我们的请求参数就构造好了,直接POST吧,就能得到评论啦,哈哈,上代码,上图</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"> data = {</span><br><span class="line"> <span class="string">'params'</span>: params,</span><br><span class="line"> <span class="string">'encSecKey'</span>: encSecKey</span><br><span class="line">}</span><br><span class="line">headers = {</span><br><span class="line"> <span class="string">'Accept-Language'</span>:<span class="string">"zh-CN,zh;q=0.9,en;q=0.8"</span>,</span><br><span class="line"> 
<span class="string">'User-Agent'</span>:<span class="string">'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36'</span>,</span><br><span class="line"></span><br><span class="line"> <span class="string">'Cookie'</span>: <span class="string">'appver=1.5.0.75771'</span>,</span><br><span class="line"> <span class="string">'Referer'</span>: <span class="string">'http://music.163.com/'</span></span><br><span class="line">}</span><br><span class="line">url = <span class="string">"https://music.163.com/weapi/v1/resource/comments/R_SO_4_32785700?csrf_token="</span></span><br><span class="line">raw = requests.post(url,headers=headers, data=data)</span><br><span class="line">print(raw.json())</span><br></pre></td></tr></table></figure><p><img src="/2019/01/12/爬取网易云热评并生成词云/12.png" alt=""></p></li></ul><h2 id="解析json,获取评论"><a href="#解析json,获取评论" class="headerlink" title="解析json,获取评论"></a>解析json,获取评论</h2>]]></content>
</entry>
<entry>
<title>Basic git Commands</title>
<link href="/2018/12/25/git%E7%AE%80%E5%8D%95%E6%8C%87%E4%BB%A4/"/>
<url>/2018/12/25/git%E7%AE%80%E5%8D%95%E6%8C%87%E4%BB%A4/</url>
<content type="html"><![CDATA[<h1 id="git简单指令"><a href="#git简单指令" class="headerlink" title="git简单指令"></a>git简单指令</h1><h2 id="首先放一张学习路线"><a href="#首先放一张学习路线" class="headerlink" title="首先放一张学习路线"></a>首先放一张学习路线</h2><p><img src="/2018/12/25/git简单指令/Git.png" alt="git学习路线"></p><h2 id="1、创建版本库"><a href="#1、创建版本库" class="headerlink" title="1、创建版本库"></a>1、创建版本库</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">mkdir huzai //创建一个空目录</span><br><span class="line"><span class="built_in">cd</span> huzai //进入此目录</span><br><span class="line">git init //初始化git仓库</span><br></pre></td></tr></table></figure><h2 id="2、添加文件到版本库"><a href="#2、添加文件到版本库" class="headerlink" title="2、添加文件到版本库"></a>2、添加文件到版本库</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git add file //将文件添加到缓存区</span><br><span class="line">git commit -m <span class="string">"post message"</span> //提交并附带提交信息</span><br></pre></td></tr></table></figure><h2 id="3、版本回退"><a href="#3、版本回退" class="headerlink" title="3、版本回退"></a>3、版本回退</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git reset --hard HEAD^ //HEAD是一个指针,指向当前的版本,^代表上一代版本,HEAD^^代表上两代</span><br><span class="line">git reflog //查询每次提交的commit_id</span><br><span class="line">git reset --hard commit_id //根据id进行回退</span><br></pre></td></tr></table></figure><h2 id="4、管理修改"><a href="#4、管理修改" class="headerlink" title="4、管理修改"></a>4、管理修改</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git diff HEAD -- file 
<span class="comment"># Show the difference between the working tree (file) and the latest commit (HEAD)</span></span><br></pre></td></tr></table></figure><h2 id="5、撤销修改"><a href="#5、撤销修改" class="headerlink" title="5. Undoing changes"></a>5. Undoing changes</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">git checkout -- file <span class="comment"># Discard changes in the working tree (can also restore a file deleted by mistake)</span></span><br><span class="line"><span class="comment"># For changes that have already been added to the staging area</span></span><br><span class="line">git reset HEAD file <span class="comment"># Unstage the changes</span></span><br><span class="line">git checkout -- file</span><br></pre></td></tr></table></figure><h2 id="6、删除文件"><a href="#6、删除文件" class="headerlink" title="6. Deleting files"></a>6. Deleting files</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">rm file <span class="comment"># Delete the local file</span></span><br><span class="line">git rm file <span class="comment"># Remove the file from the repository</span></span><br><span class="line">git commit -m <span class="string">"post delete"</span> <span class="comment"># Commit the deletion</span></span><br></pre></td></tr></table></figure><h2 id="7、连接远程仓库"><a href="#7、连接远程仓库" class="headerlink" title="7. Connecting to a remote repository"></a>7. Connecting to a remote repository</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># First generate an SSH key pair</span></span><br><span class="line">ssh-keygen -t rsa -C username</span><br><span class="line"><span class="comment"># Copy the contents of id_rsa.pub into GitHub's SSH key settings</span></span><br><span class="line"><span class="comment"># Then follow GitHub's instructions to link the local repository to the remote one</span></span><br><span class="line">git remote add origin git@github.com:username/repository</span><br><span class="line"><span class="comment"># Then push everything on the master branch to the remote repository</span></span><br><span class="line">git push -u origin master</span><br></pre></td></tr></table></figure><h2 
id="8、从远程仓库进行下载"><a href="#8、从远程仓库进行下载" class="headerlink" title="8. Downloading from a remote repository"></a>8. Downloading from a remote repository</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> git@github.com:username/repository</span><br></pre></td></tr></table></figure><h2 id="9、创建新的分支并切换到该分支下"><a href="#9、创建新的分支并切换到该分支下" class="headerlink" title="9. Creating a new branch and switching to it"></a>9. Creating a new branch and switching to it</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git checkout -b branch_name <span class="comment"># Create and switch in one step</span></span><br><span class="line">git branch branch_name <span class="comment"># Create only</span></span><br><span class="line">git checkout branch_name <span class="comment"># Switch only</span></span><br></pre></td></tr></table></figure><h2 id="10、合并指定分支到当前分支"><a href="#10、合并指定分支到当前分支" class="headerlink" title="10. Merging a given branch into the current branch"></a>10. Merging a given branch into the current branch</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git merge branch_name</span><br></pre></td></tr></table></figure><h2 id="11、删除分支"><a href="#11、删除分支" class="headerlink" title="11. Deleting a branch"></a>11. Deleting a branch</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git branch -d branch_name</span><br></pre></td></tr></table></figure><h2 id="12、如果合并时出现冲突"><a href="#12、如果合并时出现冲突" class="headerlink" title="12. If a merge produces conflicts"></a>12. If a merge produces conflicts</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cat conflict_filename</span><br><span class="line"><span class="comment"># git marks the content from each branch with <<<<<<< ======= >>>>>>>, and you then resolve the conflict by hand</span></span><br></pre></td></tr></table></figure><h2 id="13、分支管理策略"><a href="#13、分支管理策略" 
class="headerlink" title="13. Branch management strategy"></a>13. Branch management strategy</h2><p>The master branch should be very stable: it is used only to publish the latest release, and day-to-day work should not happen on it. Work should happen on the dev branch instead.<br>In other words, the dev branch is unstable, and at some point it is merged into master. You and your teammates each work on your own branch and then push to the dev branch.</p><h2 id="14、bug分支"><a href="#14、bug分支" class="headerlink" title="14. Bug branches"></a>14. Bug branches</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">git add now_file</span><br><span class="line">git stash <span class="comment"># Save the work in progress</span></span><br><span class="line"><span class="comment"># Fix the bug here</span></span><br><span class="line">git stash pop <span class="comment"># Restore the saved work and carry on</span></span><br></pre></td></tr></table></figure><h2 id="15、丢弃一个没有被合并的分支"><a href="#15、丢弃一个没有被合并的分支" class="headerlink" title="15. Discarding a branch that has not been merged"></a>15. Discarding a branch that has not been merged</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git branch -D branch_name</span><br></pre></td></tr></table></figure><h2 id="16、多人协作"><a href="#16、多人协作" class="headerlink" title="16. Collaborating with others"></a>16. Collaborating with others</h2><p> 1. Try git push origin branch_name<br> 2. If the push fails, the remote branch is ahead of your local one, so run git pull to fetch the remote changes<br> 3. Merge the two versions, resolving any conflicts by hand<br> 4. Go back to step 1</p><ul><li>Note: if git pull reports no tracking information, the local branch is not linked to a remote branch. Link them with the following command:<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git branch --<span class="built_in">set</span>-upstream-to=origin/branch_name branch_name</span><br></pre></td></tr></table></figure></li></ul><p>Or, if you don't know which branches exist:<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git remote -v <span class="comment"># Show information about the remote repository</span></span><br><span class="line">git checkout -b branch_name origin/branch_name <span class="comment"># Create a local branch from the remote branch</span></span><br><span 
class="line">git branch --<span class="built_in">set</span>-upstream-to=origin/branch_name branch_name <span class="comment"># Link the local branch to the remote branch</span></span><br></pre></td></tr></table></figure></p>]]></content>
<tags>
<tag> git </tag>
</tags>
</entry>
<entry>
<title>Data Analysis: pandas</title>
<link href="/2018/12/25/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90-pandas/"/>
<url>/2018/12/25/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90-pandas/</url>
<content type="html"><![CDATA[<h1 id="数据分析—-pandas"><a href="#数据分析—-pandas" class="headerlink" title="数据分析—-pandas"></a>数据分析—-pandas</h1><h2 id="核心数据结构-Series-amp-DataFrame"><a href="#核心数据结构-Series-amp-DataFrame" class="headerlink" title="核心数据结构 Series & DataFrame"></a>核心数据结构 Series & DataFrame</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">from</span> pandas <span class="keyword">import</span> Series, DataFrame</span><br></pre></td></tr></table></figure><h3 id="Series是一个定长的字典序列,它有两个基本属性index-、-value-index-默认是-0-1-2-3-递增的,也可以自己指定索引-index-‘a’-‘b’-‘c’"><a href="#Series是一个定长的字典序列,它有两个基本属性index-、-value-index-默认是-0-1-2-3-递增的,也可以自己指定索引-index-‘a’-‘b’-‘c’" class="headerlink" title="Series是一个定长的字典序列,它有两个基本属性index 、 value index 默认是 0 ,1,2,3 递增的,也可以自己指定索引 index=[‘a’, ‘b’, ‘c’]"></a><strong>Series是一个定长的字典序列</strong>,它有两个基本属性<strong>index 、 value</strong> index 默认是 0 ,1,2,3 递增的,也可以自己指定索引 index=[‘a’, ‘b’, ‘c’]</h3><h3 id="创建Series的三种方式"><a href="#创建Series的三种方式" class="headerlink" title="创建Series的三种方式"></a>创建Series的三种方式</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">x1 = Series([<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>])</span><br><span class="line">x2 = Series(data=[<span class="number">1</span>,<span 
class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>],index=[<span class="string">'a'</span>,<span class="string">'b'</span>,<span class="string">'c'</span>,<span class="string">'d'</span>])</span><br><span class="line">dic = {<span class="string">'a'</span>:<span class="number">1</span>,<span class="string">'b'</span>:<span class="number">2</span>,<span class="string">'c'</span>:<span class="number">3</span>,<span class="string">'d'</span>:<span class="number">4</span>}</span><br><span class="line">x3 = Series(dic)</span><br><span class="line">print(x1)</span><br><span class="line">print(x2)</span><br><span class="line">print(x3)</span><br></pre></td></tr></table></figure><pre><code>0 11 22 33 4dtype: int64a 1b 2c 3d 4dtype: int64a 1b 2c 3d 4dtype: int64</code></pre><h3 id="DataFrame类似数据库中的表,可以将其看成是由有相同的索引的Series组成"><a href="#DataFrame类似数据库中的表,可以将其看成是由有相同的索引的Series组成" class="headerlink" title="DataFrame类似数据库中的表,可以将其看成是由有相同的索引的Series组成"></a>DataFrame类似数据库中的表,可以将其看成是由有相同的索引的Series组成</h3><h3 id="创建DataFra几种方式"><a href="#创建DataFra几种方式" class="headerlink" title="创建DataFra几种方式"></a>创建DataFra几种方式</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">data = {<span class="string">"chinese"</span>:[<span class="number">90</span>,<span class="number">80</span>,<span class="number">70</span>,<span class="number">60</span>,<span class="number">50</span>],<span class="string">'math'</span>:[<span class="number">70</span>,<span class="number">80</span>,<span class="number">70</span>,<span class="number">90</span>,<span class="number">60</span>],<span class="string">'english'</span>:[<span class="number">30</span>,<span class="number">50</span>,<span class="number">70</span>,<span class="number">80</span>,<span class="number">60</span>]}</span><br><span class="line">df1 = 
DataFrame(data=data,index=[<span class="string">'zhangfei'</span>,<span class="string">'guanyu'</span>,<span class="string">'zhaoyun'</span>,<span class="string">'huangzhong'</span>,<span class="string">'machao'</span>])</span><br><span class="line">print(df1)</span><br></pre></td></tr></table></figure><pre><code>            chinese  english  math
zhangfei         90       30    70
guanyu           80       50    80
zhaoyun          70       70    70
huangzhong       60       80    90
machao           50       60    60</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> xlrd</span><br><span class="line">df2 = DataFrame(pd.read_excel(<span class="string">'datas/grades.xlsx'</span>))</span><br><span class="line">df2 = df2.drop_duplicates()</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(df2)</span><br></pre></td></tr></table></figure><pre><code>     姓名   高数  英语  C++
0   蒋广佳   43  69   61
1    廖菲   80  64   62
2   沈秀玲   68  74   98
3    韦丹   48  53   64
4   张梦雅   72  73   96
5   赵雅欣   60  63   70
6   曹海广   74  60   20
7   陈泽灿   38  21   92
8    邓杰   88  67   84
9   高海亮   86  74   96
10  顾晓冬   84  60   90
11  侯星宇   64  69   96
12  江宜哲   60  33   70
13  李洪汀   76  56   84
14  梁杨杨   68  54   94
15   刘辉   68  63   98
16  罗嘉豪   39  44   56
17  施亚君   90  63   90
18   孙添   64  63   78
19   王杰   74  60   76
20   王泽   52  48   94
21  徐孟圆   60  69   74
22  杨福程   70  49   76
23  尤澳晨   91  67   86
24   翟佳   78  73   88
25   张旭  100  60   98
26  支星哲   80  63  100
27  邹湘涛   54  40   90</code></pre><h2 id="数据清洗"><a href="#数据清洗" class="headerlink" title="Data cleaning"></a>Data cleaning</h2><h3 id="删除不必要的行或列"><a href="#删除不必要的行或列" class="headerlink" title="Dropping unneeded rows or columns"></a>Dropping unneeded rows or columns</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Drop a column</span></span><br><span class="line">df2 = 
df2.drop(columns=[<span class="string">'姓名'</span>])</span><br><span class="line">print(df2)</span><br></pre></td></tr></table></figure><pre><code>     高数  英语  C++
0    43  69   61
1    80  64   62
2    68  74   98
3    48  53   64
4    72  73   96
5    60  63   70
6    74  60   20
7    38  21   92
8    88  67   84
9    86  74   96
10   84  60   90
11   64  69   96
12   60  33   70
13   76  56   84
14   68  54   94
15   68  63   98
16   39  44   56
17   90  63   90
18   64  63   78
19   74  60   76
20   52  48   94
21   60  69   74
22   70  49   76
23   91  67   86
24   78  73   88
25  100  60   98
26   80  63  100
27   54  40   90</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Drop a row (by index label)</span></span><br><span class="line">df2 = df2.drop(index = [<span class="number">27</span>])</span><br><span class="line">print(df2)</span><br></pre></td></tr></table></figure><pre><code>     高数  英语  C++
0    43  69   61
1    80  64   62
2    68  74   98
3    48  53   64
4    72  73   96
5    60  63   70
6    74  60   20
7    38  21   92
8    88  67   84
9    86  74   96
10   84  60   90
11   64  69   96
12   60  33   70
13   76  56   84
14   68  54   94
15   68  63   98
16   39  44   56
17   90  63   90
18   64  63   78
19   74  60   76
20   52  48   94
21   60  69   74
22   70  49   76
23   91  67   86
24   78  73   88
25  100  60   98
26   80  63  100</code></pre><h3 id="重命名列名"><a href="#重命名列名" class="headerlink" title="重命名列名"></a>Renaming columns</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df2 = df2.rename(columns={<span class="string">'高数'</span>:<span class="string">'math'</span>,<span class="string">'英语'</span>:<span class="string">'english'</span>})</span><br></pre></td></tr></table></figure><h3 id="去除重复的值"><a href="#去除重复的值" class="headerlink" title="去除重复的值"></a>Removing duplicate rows</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df2 = df2.drop_duplicates()</span><br></pre></td></tr></table></figure><h3 id="更改数据格式"><a href="#更改数据格式" 
class="headerlink" title="更改数据格式"></a>Changing the data type</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">df2[<span class="string">'math'</span>] = df2[<span class="string">'math'</span>].astype(<span class="string">'str'</span>)</span><br><span class="line"><span class="comment">#df2['math'].astype(np.int64)</span></span><br></pre></td></tr></table></figure><h3 id="清除数据间的空格"><a href="#清除数据间的空格" class="headerlink" title="清除数据间的空格"></a>Stripping surrounding whitespace</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">df2[<span class="string">'math'</span>] = df2[<span class="string">'math'</span>].map(str.strip) <span class="comment"># strip whitespace on both sides</span></span><br><span class="line">df2[<span class="string">'math'</span>] = df2[<span class="string">'math'</span>].map(str.lstrip) <span class="comment"># strip leading whitespace (str.rstrip strips trailing whitespace)</span></span><br></pre></td></tr></table></figure><h3 id="删除指定字符"><a href="#删除指定字符" class="headerlink" title="删除指定字符"></a>Removing a specific character</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df2[<span class="string">'math'</span>] = df2[<span class="string">'math'</span>].str.strip(<span class="string">'$'</span>)</span><br></pre></td></tr></table></figure><h3 id="大小写转换"><a href="#大小写转换" class="headerlink" title="大小写转换"></a>Changing case</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df2.columns = df2.columns.str.upper() <span class="comment"># all upper case (lower() for all lower case, title() to capitalize each word)</span></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df2</span><br></pre></td></tr></table></figure><div><br><style scoped><br> .dataframe tbody tr th:only-of-type {<br> vertical-align: middle;<br> }<br><br> .dataframe tbody tr th {<br> vertical-align: top;<br> }<br><br> .dataframe thead th {<br> text-align: right;<br> }<br></style><br><table border="1" class="dataframe"><br> <thead><br> <tr style="text-align: right;"><br> <th></th><br> <th>MATH</th><br> <th>ENGLISH</th><br> <th>C++</th><br> </tr><br> </thead><br> <tbody><br> <tr><br> <th>0</th><br> <td>43</td><br> <td>69</td><br> <td>61</td><br> </tr><br> <tr><br> <th>1</th><br> <td>80</td><br> <td>64</td><br> <td>62</td><br> </tr><br> <tr><br> <th>2</th><br> <td>68</td><br> <td>74</td><br> <td>98</td><br> </tr><br> <tr><br> <th>3</th><br> <td>48</td><br> <td>53</td><br> <td>64</td><br> </tr><br> <tr><br> <th>4</th><br> <td>72</td><br> <td>73</td><br> <td>96</td><br> </tr><br> <tr><br> <th>5</th><br> <td>60</td><br> <td>63</td><br> <td>70</td><br> </tr><br> <tr><br> <th>6</th><br> <td>74</td><br> <td>60</td><br> <td>20</td><br> </tr><br> <tr><br> <th>7</th><br> <td>38</td><br> <td>21</td><br> <td>92</td><br> </tr><br> <tr><br> <th>8</th><br> <td>88</td><br> <td>67</td><br> <td>84</td><br> </tr><br> <tr><br> <th>9</th><br> <td>86</td><br> <td>74</td><br> <td>96</td><br> </tr><br> <tr><br> <th>10</th><br> <td>84</td><br> <td>60</td><br> <td>90</td><br> </tr><br> <tr><br> <th>11</th><br> <td>64</td><br> <td>69</td><br> <td>96</td><br> </tr><br> <tr><br> <th>12</th><br> <td>60</td><br> <td>33</td><br> <td>70</td><br> </tr><br> <tr><br> <th>13</th><br> <td>76</td><br> <td>56</td><br> <td>84</td><br> </tr><br> <tr><br> <th>14</th><br> <td>68</td><br> <td>54</td><br> <td>94</td><br> </tr><br> <tr><br> <th>15</th><br> <td>68</td><br> <td>63</td><br> <td>98</td><br> </tr><br> <tr><br> <th>16</th><br> <td>39</td><br> <td>44</td><br> <td>56</td><br> </tr><br> <tr><br> <th>17</th><br> <td>90</td><br> 
<td>63</td><br> <td>90</td><br> </tr><br> <tr><br> <th>18</th><br> <td>64</td><br> <td>63</td><br> <td>78</td><br> </tr><br> <tr><br> <th>19</th><br> <td>74</td><br> <td>60</td><br> <td>76</td><br> </tr><br> <tr><br> <th>20</th><br> <td>52</td><br> <td>48</td><br> <td>94</td><br> </tr><br> <tr><br> <th>21</th><br> <td>60</td><br> <td>69</td><br> <td>74</td><br> </tr><br> <tr><br> <th>22</th><br> <td>70</td><br> <td>49</td><br> <td>76</td><br> </tr><br> <tr><br> <th>23</th><br> <td>91</td><br> <td>67</td><br> <td>86</td><br> </tr><br> <tr><br> <th>24</th><br> <td>78</td><br> <td>73</td><br> <td>88</td><br> </tr><br> <tr><br> <th>25</th><br> <td>100</td><br> <td>60</td><br> <td>98</td><br> </tr><br> <tr><br> <th>26</th><br> <td>80</td><br> <td>63</td><br> <td>100</td><br> </tr><br> </tbody><br></table><br></div><h3 id="使用apply函数对数据进行清洗"><a href="#使用apply函数对数据进行清洗" class="headerlink" title="使用apply函数对数据进行清洗"></a>使用apply函数对数据进行清洗</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#df2['MATH'] = df2['MATH'].apply(str.lower)</span></span><br><span class="line">df2[<span class="string">'MATH'</span>] = df2[<span class="string">'MATH'</span>].astype(np.int64)</span><br><span class="line">df2</span><br></pre></td></tr></table></figure><div><br><style scoped><br> .dataframe tbody tr th:only-of-type {<br> vertical-align: middle;<br> }<br><br> .dataframe tbody tr th {<br> vertical-align: top;<br> }<br><br> .dataframe thead th {<br> text-align: right;<br> }<br></style><br><table border="1" class="dataframe"><br> <thead><br> <tr style="text-align: right;"><br> <th></th><br> <th>MATH</th><br> <th>ENGLISH</th><br> <th>C++</th><br> </tr><br> </thead><br> <tbody><br> <tr><br> <th>0</th><br> <td>43</td><br> <td>69</td><br> <td>61</td><br> </tr><br> <tr><br> <th>1</th><br> <td>80</td><br> 
<td>64</td><br> <td>62</td><br> </tr><br> <tr><br> <th>2</th><br> <td>68</td><br> <td>74</td><br> <td>98</td><br> </tr><br> <tr><br> <th>3</th><br> <td>48</td><br> <td>53</td><br> <td>64</td><br> </tr><br> <tr><br> <th>4</th><br> <td>72</td><br> <td>73</td><br> <td>96</td><br> </tr><br> <tr><br> <th>5</th><br> <td>60</td><br> <td>63</td><br> <td>70</td><br> </tr><br> <tr><br> <th>6</th><br> <td>74</td><br> <td>60</td><br> <td>20</td><br> </tr><br> <tr><br> <th>7</th><br> <td>38</td><br> <td>21</td><br> <td>92</td><br> </tr><br> <tr><br> <th>8</th><br> <td>88</td><br> <td>67</td><br> <td>84</td><br> </tr><br> <tr><br> <th>9</th><br> <td>86</td><br> <td>74</td><br> <td>96</td><br> </tr><br> <tr><br> <th>10</th><br> <td>84</td><br> <td>60</td><br> <td>90</td><br> </tr><br> <tr><br> <th>11</th><br> <td>64</td><br> <td>69</td><br> <td>96</td><br> </tr><br> <tr><br> <th>12</th><br> <td>60</td><br> <td>33</td><br> <td>70</td><br> </tr><br> <tr><br> <th>13</th><br> <td>76</td><br> <td>56</td><br> <td>84</td><br> </tr><br> <tr><br> <th>14</th><br> <td>68</td><br> <td>54</td><br> <td>94</td><br> </tr><br> <tr><br> <th>15</th><br> <td>68</td><br> <td>63</td><br> <td>98</td><br> </tr><br> <tr><br> <th>16</th><br> <td>39</td><br> <td>44</td><br> <td>56</td><br> </tr><br> <tr><br> <th>17</th><br> <td>90</td><br> <td>63</td><br> <td>90</td><br> </tr><br> <tr><br> <th>18</th><br> <td>64</td><br> <td>63</td><br> <td>78</td><br> </tr><br> <tr><br> <th>19</th><br> <td>74</td><br> <td>60</td><br> <td>76</td><br> </tr><br> <tr><br> <th>20</th><br> <td>52</td><br> <td>48</td><br> <td>94</td><br> </tr><br> <tr><br> <th>21</th><br> <td>60</td><br> <td>69</td><br> <td>74</td><br> </tr><br> <tr><br> <th>22</th><br> <td>70</td><br> <td>49</td><br> <td>76</td><br> </tr><br> <tr><br> <th>23</th><br> <td>91</td><br> <td>67</td><br> <td>86</td><br> </tr><br> <tr><br> <th>24</th><br> <td>78</td><br> <td>73</td><br> <td>88</td><br> </tr><br> <tr><br> <th>25</th><br> <td>100</td><br> 
<td>60</td><br> <td>98</td><br> </tr><br> <tr><br> <th>26</th><br> <td>80</td><br> <td>63</td><br> <td>100</td><br> </tr><br> </tbody><br></table><br></div><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">plus</span><span class="params">(df)</span>:</span></span><br><span class="line"> df[<span class="string">'Total'</span>] = df[<span class="string">'MATH'</span>]+df[<span class="string">'ENGLISH'</span>]+df[<span class="string">'C++'</span>]</span><br><span class="line"> <span class="keyword">return</span> df</span><br><span class="line">df2 = df2.apply(plus,axis=<span class="number">1</span>)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(df2)</span><br></pre></td></tr></table></figure><pre><code> MATH ENGLISH C++ Total0 43 69 61 1731 80 64 62 2062 68 74 98 2403 48 53 64 1654 72 73 96 2415 60 63 70 1936 74 60 20 1547 38 21 92 1518 88 67 84 2399 86 74 96 25610 84 60 90 23411 64 69 96 22912 60 33 70 16313 76 56 84 21614 68 54 94 21615 68 63 98 22916 39 44 56 13917 90 63 90 24318 64 63 78 20519 74 60 76 21020 52 48 94 19421 60 69 74 20322 70 49 76 19523 91 67 86 24424 78 73 88 23925 100 60 98 25826 80 63 100 243</code></pre><h2 id="pandas中常用的统计函数"><a href="#pandas中常用的统计函数" class="headerlink" title="pandas中常用的统计函数"></a>pandas中常用的统计函数</h2><p><img src="/2018/12/25/数据分析-pandas/1.jpg" alt="1.jpg"></p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(df2.describe())</span><br></pre></td></tr></table></figure><pre><code> MATH ENGLISH C++ Totalcount 
27.000000 27.000000 27.000000 27.000000mean 69.444444 59.703704 81.148148 210.296296std 16.113380 12.406000 17.933003 34.410212min 38.000000 21.000000 20.000000 139.00000025% 60.000000 55.000000 72.000000 193.50000050% 70.000000 63.000000 86.000000 216.00000075% 80.000000 68.000000 95.000000 239.500000max 100.000000 74.000000 100.000000 258.000000</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"></span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> 数据分析 </tag>
</tags>
</entry>
<entry>
<title>数据分析----numpy </title>
<link href="/2018/12/23/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90-numpy/"/>
<url>/2018/12/23/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90-numpy/</url>
<content type="html"><![CDATA[<h1 id="数据分析—numpy"><a href="#数据分析—numpy" class="headerlink" title="数据分析—numpy"></a>数据分析—numpy</h1><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br></pre></td></tr></table></figure><h2 id="创建普通数组"><a href="#创建普通数组" class="headerlink" title="创建普通数组"></a>创建普通数组</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">a = np.array([<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>])</span><br><span class="line">b = np.array([[<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>],[<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>],[<span class="number">7</span>,<span class="number">8</span>,<span class="number">9</span>]])</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">b[<span class="number">1</span>,<span class="number">1</span>] = <span class="number">10</span></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">print(a.shape)</span><br><span class="line">print(b.shape)</span><br><span class="line">print(a.dtype)</span><br><span 
class="line">print(b)</span><br></pre></td></tr></table></figure><pre><code>(3,)(3, 3)int64[[1 2 3] [4 5 6] [7 8 9]]</code></pre><h2 id="创建结构数组"><a href="#创建结构数组" class="headerlink" title="创建结构数组"></a>创建结构数组</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">personalType = np.dtype({</span><br><span class="line"> <span class="string">'names'</span>:[<span class="string">'name'</span>,<span class="string">'age'</span>,<span class="string">'chinese'</span>,<span class="string">'math'</span>,<span class="string">'english'</span>],</span><br><span class="line"> <span class="string">'formats'</span>:[<span class="string">'S25'</span>,<span class="string">'i'</span>,<span class="string">'i'</span>,<span class="string">'i'</span>,<span class="string">'f'</span>]</span><br><span class="line">})</span><br><span class="line">students = np.array([(<span class="string">"huzai"</span>,<span class="number">22</span>,<span class="number">99</span>,<span class="number">99</span>,<span class="number">99.5</span>),(<span class="string">"huzai"</span>,<span class="number">22</span>,<span class="number">99</span>,<span class="number">99</span>,<span class="number">99.5</span>)],dtype=personalType)</span><br><span class="line">age = students[:][<span class="string">'age'</span>]</span><br><span class="line">print(np.mean(age))</span><br></pre></td></tr></table></figure><pre><code>22.0</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(students)</span><br></pre></td></tr></table></figure><pre><code>[(b'huzai', 22, 99, 99, 99.5) (b'huzai', 22, 99, 99, 99.5)]</code></pre><h2 
id="创建连续数组"><a href="#创建连续数组" class="headerlink" title="创建连续数组"></a>创建连续数组</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">x1 = np.arange(<span class="number">1</span>,<span class="number">11</span>,<span class="number">2</span>) <span class="comment">#步长为2,从1开始的等差数组(不包括终值)</span></span><br><span class="line">x2 = np.linspace(<span class="number">1</span>,<span class="number">9</span>,<span class="number">5</span>) <span class="comment">#将1-9分成5块,结果如上</span></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">print(x1)</span><br><span class="line">print(x2)</span><br></pre></td></tr></table></figure><pre><code>[1 3 5 7 9][1. 3. 5. 7. 9.]</code></pre><h2 id="数组间的算数运算"><a href="#数组间的算数运算" class="headerlink" title="数组间的算数运算"></a>数组间的算数运算</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">print(np.add(x1,x2))</span><br><span class="line">print(np.subtract(x1,x2))</span><br><span class="line">print(np.multiply(x1,x2))</span><br><span class="line">print(np.divide(x1,x2))</span><br></pre></td></tr></table></figure><pre><code>[ 2. 6. 10. 14. 18.][0. 0. 0. 0. 0.][ 1. 9. 25. 49. 81.][1. 1. 1. 1. 
1.]</code></pre><h2 id="统计函数"><a href="#统计函数" class="headerlink" title="统计函数"></a>统计函数</h2><h3 id="数组中的最值-np-amin-amax"><a href="#数组中的最值-np-amin-amax" class="headerlink" title="数组中的最值 np.amin() amax()"></a>数组中的最值 np.amin() amax()</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">a = np.array([[<span class="number">1</span>,<span class="number">3</span>,<span class="number">7</span>],[<span class="number">2</span>,<span class="number">5</span>,<span class="number">8</span>],[<span class="number">6</span>,<span class="number">4</span>,<span class="number">9</span>]])</span><br><span class="line">print(np.amin(a))</span><br><span class="line">print(np.amin(a,<span class="number">0</span>)) <span class="comment">#每一列的最小值</span></span><br><span class="line">print(np.amin(a,<span class="number">1</span>)) <span class="comment">#每行的最小值</span></span><br></pre></td></tr></table></figure><pre><code>1[1 3 7][1 2 4]</code></pre><h3 id="统计最大值与最小值之差-ptp"><a href="#统计最大值与最小值之差-ptp" class="headerlink" title="统计最大值与最小值之差 ptp()"></a>统计最大值与最小值之差 ptp()</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">print(np.ptp(a))</span><br><span class="line">print(np.ptp(a,<span class="number">0</span>)) <span class="comment">#每列最大值与最小值的差</span></span><br><span class="line">print(np.ptp(a,<span class="number">1</span>)) <span class="comment">#每行最大值与最小值的差</span></span><br></pre></td></tr></table></figure><pre><code>8[5 2 2][6 6 5]</code></pre><h3 id="统计数组的百分位数-percentile-a-p-axis-a-数组名-p-代表百分比-axis代表是行还是列"><a href="#统计数组的百分位数-percentile-a-p-axis-a-数组名-p-代表百分比-axis代表是行还是列" class="headerlink" title="统计数组的百分位数 percentile(a, p, axis) a:数组名 p 
代表百分比 axis代表是行还是列"></a>统计数组的百分位数 percentile(a, p, axis) a:数组名 p 代表百分比 axis代表是行还是列</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">print(np.percentile(a,<span class="number">50</span>))</span><br><span class="line">print(np.percentile(a,<span class="number">50</span>,<span class="number">0</span>))</span><br><span class="line">print(np.percentile(a,<span class="number">50</span>,<span class="number">1</span>))</span><br></pre></td></tr></table></figure><pre><code>5.0[2. 4. 8.][3. 5. 6.]</code></pre><h3 id="统计数组中的中位数以及平均数-median-mean"><a href="#统计数组中的中位数以及平均数-median-mean" class="headerlink" title="统计数组中的中位数以及平均数 median() mean()"></a>统计数组中的中位数以及平均数 median() mean()</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">print(np.median(a))</span><br><span class="line">print(np.median(a,<span class="number">0</span>))</span><br><span class="line">print(np.median(a,<span class="number">1</span>))</span><br></pre></td></tr></table></figure><pre><code>5.0[2. 4. 8.][3. 5. 
6.]</code></pre><h3 id="数组中的加权平均值-average-a-weights"><a href="#数组中的加权平均值-average-a-weights" class="headerlink" title="数组中的加权平均值 average(a,weights)"></a>数组中的加权平均值 average(a,weights)</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">b = np.array([<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>])</span><br><span class="line">wts = np.array([<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>])</span><br><span class="line">print(np.average(b))</span><br><span class="line">print(np.average(b,weights=wts))</span><br></pre></td></tr></table></figure><pre><code>2.53.0</code></pre><h3 id="统计数组中的标准差(std())与方差(var())"><a href="#统计数组中的标准差(std())与方差(var())" class="headerlink" title="统计数组中的标准差(std())与方差(var())"></a>统计数组中的标准差(std())与方差(var())</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">print(np.std(b))</span><br><span class="line">print(np.var(b))</span><br></pre></td></tr></table></figure><pre><code>1.1180339887498951.25</code></pre><h2 id="Numpy排序"><a href="#Numpy排序" class="headerlink" title="Numpy排序"></a>Numpy排序</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">print(a)</span><br><span class="line">print(np.sort(a))</span><br><span class="line">print(np.sort(a,<span class="number">0</span>))</span><br></pre></td></tr></table></figure><pre><code>[[1 3 7] [2 5 8] [6 4 9]][[1 3 7] [2 5 8] [4 6 9]][[1 3 7] [2 4 8] [6 5 9]]</code></pre><h1 
id="作业题"><a href="#作业题" class="headerlink" title="作业题"></a>作业题</h1><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">st_type = np.dtype({</span><br><span class="line"> <span class="string">'names'</span>:[<span class="string">'name'</span>,<span class="string">'chinese'</span>,<span class="string">'english'</span>,<span class="string">'math'</span>],</span><br><span class="line"> <span class="string">'formats'</span>:[<span class="string">'S25'</span>,<span class="string">'i'</span>,<span class="string">'i'</span>,<span class="string">'i'</span>]</span><br><span class="line">})</span><br><span class="line">grades = np.array([(<span class="string">'zhangfei'</span>,<span class="number">66</span>,<span class="number">65</span>,<span class="number">30</span>),(<span class="string">'guanyu'</span>,<span class="number">95</span>,<span class="number">85</span>,<span class="number">98</span>),(<span class="string">'zhaoyun'</span>,<span class="number">93</span>,<span class="number">92</span>,<span class="number">96</span>),(<span class="string">'huangzhong'</span>,<span class="number">90</span>,<span class="number">88</span>,<span class="number">77</span>),</span><br><span class="line"> (<span class="string">'dianwei'</span>,<span class="number">80</span>,<span class="number">90</span>,<span class="number">90</span>)],dtype=st_type)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(grades)</span><br></pre></td></tr></table></figure><pre><code>[(b'zhangfei', 66, 65, 30) (b'guanyu', 95, 85, 98) (b'zhaoyun', 93, 92, 96) (b'huangzhong', 90, 88, 77) (b'dianwei', 80, 90, 
90)]</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">chinese = grades[:][<span class="string">'chinese'</span>] </span><br><span class="line">english = grades[:][<span class="string">'english'</span>]</span><br><span class="line">math = grades[:][<span class="string">'math'</span>]</span><br><span class="line">total = chinese + english + math <span class="comment"># np.add(chinese, english, math) would treat math as the out buffer and overwrite it</span></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(total)</span><br></pre></td></tr></table></figure><pre><code>[161 278 281 255 260]</code></pre><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">c_a,e_a,m_a = np.average(chinese),np.average(english),np.average(math)</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">print(c_a)</span><br></pre></td></tr></table></figure><pre><code>84.8</code></pre>]]></content>
<tags>
<tag> 数据分析 </tag>
</tags>
</entry>
<entry>
<title>Fixing the email notifications sent when a novel updates</title>
<link href="/2018/11/23/%E5%B0%8F%E8%AF%B4%E6%9B%B4%E6%96%B0%E5%90%8E%E5%8F%91%E9%80%81%E9%82%AE%E7%AE%B1%E7%9A%84%E9%97%AE%E9%A2%98%E8%A7%A3%E5%86%B3/"/>
<url>/2018/11/23/%E5%B0%8F%E8%AF%B4%E6%9B%B4%E6%96%B0%E5%90%8E%E5%8F%91%E9%80%81%E9%82%AE%E7%AE%B1%E7%9A%84%E9%97%AE%E9%A2%98%E8%A7%A3%E5%86%B3/</url>
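<content type="html"><![CDATA[<h2 id="遇到的问题"><a href="#遇到的问题" class="headerlink" title="遇到的问题"></a>The problem</h2><ul><li>First, a screenshot to show where the problem is<br><img src="/2018/11/23/小说更新后发送邮箱的问题解决/1.png" alt="1.png"></li></ul><p>As the screenshot shows, the novel is emailed to me twice a day, yet sometimes a chapter is skipped and sometimes the same chapter is sent twice. All of this happens because the author doesn't update on a fixed schedule! Since I can't change the author's habits, I'll improve my own script instead!</p><h2 id="新的思路,加一个“缓存”"><a href="#新的思路,加一个“缓存”" class="headerlink" title="新的思路,加一个“缓存”"></a>A new approach: add a "cache"</h2><p>What do I mean by a cache?</p><ul><li>Create a local text file named origin.txt. After each crawl, compare the result with the contents of origin.txt.</li><li>If they are the same, send nothing.</li><li>If they differ, send the email and save the latest content to origin.txt as the reference for the next run.</li></ul><h2 id="演示效果"><a href="#演示效果" class="headerlink" title="演示效果"></a>Demo</h2><ul><li>Write test into origin.txt, then start the script<br><img src="/2018/11/23/小说更新后发送邮箱的问题解决/2.png" alt="2.png"><br>As you can see, origin.txt has been overwritten and the new text has been sent to the mailbox</li><li>Start it again, i.e. the case where there is no new update yet<br><img src="/2018/11/23/小说更新后发送邮箱的问题解决/3.png" alt="3"><br>This time a notice is printed; origin.txt is not updated and no email is sent</li></ul><h2 id="还遗留的问题"><a href="#还遗留的问题" class="headerlink" title="还遗留的问题"></a>Remaining issues</h2><ul><li>What if the author suddenly posts a burst of chapters (ten chapters in one minute!)?<br>Think about it: the principle is much the same, only the cache size differs! Give it a try if you're interested!<h2 id="项目源码已发布在github"><a href="#项目源码已发布在github" class="headerlink" title="项目源码已发布在github"></a>The project source is on GitHub</h2><a href="https://github.com/huzai9527/fictionSend" target="_blank" rel="noopener">https://github.com/huzai9527/fictionSend</a></li></ul>]]></content>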
<tags>
<tag> python </tag>
</tags>
</entry>
<entry>
<title>C++ pointer questions</title>
<link href="/2018/11/19/c-%E6%8C%87%E9%92%88%E9%97%AE%E9%A2%98/"/>
<url>/2018/11/19/c-%E6%8C%87%E9%92%88%E9%97%AE%E9%A2%98/</url>
<content type="html"><![CDATA[<h2 id="指针究竟是什么"><a href="#指针究竟是什么" class="headerlink" title="指针究竟是什么"></a>What exactly is a pointer?</h2><ul><li>A pointer is a special kind of variable: instead of holding an ordinary data value, it holds the memory address of another object in the program.<br>Let's start with a small program to see how a pointer works.<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#include &lt;iostream&gt;</span></span><br><span class="line">using namespace std;</span><br><span class="line">int <span class="function"><span class="title">main</span></span>(){</span><br><span class="line">int n = 123,m = 456;</span><br><span class="line">int *p = &amp;n;</span><br><span class="line">cout&lt;&lt;<span class="string">"&amp;n:"</span>&lt;&lt;&amp;n&lt;&lt;endl;</span><br><span class="line">cout&lt;&lt;<span class="string">"&amp;p:"</span>&lt;&lt;&amp;p&lt;&lt;endl;</span><br><span class="line">cout&lt;&lt;<span class="string">" p:"</span>&lt;&lt;p&lt;&lt;endl;</span><br><span class="line">cout&lt;&lt;<span class="string">"*p:"</span>&lt;&lt;*p&lt;&lt;endl;</span><br><span class="line"><span class="built_in">return</span> 0;</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li></ul><p><img src="/2018/11/19/c-指针问题/1.png" alt="1"><br>The output shows the following:</p><ul><li>p itself has an address, namely <strong>&amp;p</strong></li><li>The value of p is the address of another variable n, namely <strong>&amp;n</strong></li><li>*p means the value stored in memory at address <strong>p</strong>, which is <strong>n</strong></li><li>So this program involves two addresses: the address of <strong>n</strong> and the address of <strong>p</strong>. The diagram below shows how they relate<br><img src="/2018/11/19/c-指针问题/2.png" alt="2"><h2 id="指针的初始化"><a href="#指针的初始化" class="headerlink" title="指针的初始化"></a>Initializing a pointer</h2></li><li><p>Initialized with the address of an object of the matching type</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br></pre></td><td class="code"><pre><span class="line">int i = 10;</span><br><span class="line">int *p = &amp;i;</span><br></pre></td></tr></table></figure></li><li><p>Initialized from another pointer of the same type; both pointers then point to the same address</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">int *p1 = p;</span><br></pre></td></tr></table></figure></li><li><p>Initialized by allocating memory directly</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">int *p2 = new int;</span><br></pre></td></tr></table></figure></li><li><p>A pointer can also be untyped: a generic pointer (void *) can point to an object of any type</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">void *p3;</span><br></pre></td></tr></table></figure></li></ul><h2 id="指针的运算符"><a href="#指针的运算符" class="headerlink" title="指针的运算符"></a>Pointer operators</h2><p>The purpose of defining a pointer is to access a variable indirectly through the pointer variable.</p><ul><li><strong>*</strong>: the dereference operator. It uses the address held in the pointer to access the storage location it points to. If the pointer variable p points to variable a, then *p evaluates to the value of a</li><li><p><strong>&amp;</strong>: the address-of operator. It returns the address of a variable's storage. If a is an int variable and p is a pointer to int, then p = &amp;a assigns the address of a to p.<br>Let's verify this with a program:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#include &lt;iostream&gt;</span></span><br><span class="line">using namespace std;</span><br><span class="line">int <span class="function"><span class="title">main</span></span>(){</span><br><span class="line"> int a = 
100;</span><br><span class="line"> int *p,*p1,*q;</span><br><span class="line"> p = &a;</span><br><span class="line"> p1 = p;</span><br><span class="line"> q = NULL;</span><br><span class="line"> cout<<<span class="string">"a="</span><<a<<<span class="string">","</span><<<span class="string">"*p="</span><<*p<<<span class="string">","</span><<<span class="string">"p="</span><<p<<endl;</span><br><span class="line"> *p1 = 200;</span><br><span class="line"> cout<<<span class="string">"a="</span><<a<<<span class="string">","</span><<<span class="string">"*p="</span><<*p<<<span class="string">","</span><<<span class="string">"p="</span><<p<<endl;</span><br><span class="line"> cout<<<span class="string">"*p1="</span><<*p1<<<span class="string">","</span><<<span class="string">"p1="</span><<p1<<endl;</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li><li><p>Output<br><img src="/2018/11/19/c-指针问题/3.png" alt="3.png"></p><h2 id="指针与数组的关系"><a href="#指针与数组的关系" class="headerlink" title="指针与数组的关系"></a>Pointers and arrays</h2></li><li>An array name and a pointer can be used interchangeably to access array elements and to take their addresses, but there is one important difference between them.</li><li><p>An array's memory is allocated when it is defined, so the array name behaves as an address constant and cannot be assigned to like a variable, whereas a pointer is a variable and can be reassigned any number of times.<br>The following program shows how the two relate.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#include <iostream></span></span><br><span class="line">using namespace std;</span><br><span class="line">int <span class="function"><span class="title">main</span></span>(){</span><br><span class="line">int 
a[10]={1,2,3,4,5,6,7,8,9,10};</span><br><span class="line">int *pa = a;</span><br><span class="line">int i = 3;</span><br><span class="line">cout<<<span class="string">"a[i] :"</span><<a[i]<<endl;</span><br><span class="line">cout<<<span class="string">"*(pa+i):"</span><<*(pa+i)<<endl;</span><br><span class="line">cout<<<span class="string">"*(a+i) :"</span><<*(a+i)<<endl;</span><br><span class="line">cout<<<span class="string">"&a[i] :"</span><<&a[i]<<endl;</span><br><span class="line">cout<<<span class="string">"a+i :"</span><<a+i<<endl;</span><br><span class="line">cout<<<span class="string">"pa+i :"</span><<pa+i<<endl;</span><br><span class="line"> </span><br><span class="line">}</span><br></pre></td></tr></table></figure></li><li><p>Output<br><img src="/2018/11/19/c-指针问题/4.png" alt="4"></p><h2 id="易重要的和易混淆的概念"><a href="#易重要的和易混淆的概念" class="headerlink" title="易重要的和易混淆的概念"></a>Important and easily confused concepts</h2></li><li>Why must a pointer be initialized?<br>After a pointer variable is defined, the system allocates storage for it. If it is never assigned, the contents of that storage are indeterminate, so the pointer refers to an arbitrary memory location. Writing through such a pointer is undefined behavior and can corrupt data or crash the program.</li><li>Pointer arithmetic<br>pointer + integer = pointer<br>pointer - pointer = integer // subtracting two pointers of the same type (both pointing into the same array) gives the number of base-type elements between them<br>pointer + pointer = ???? 
// not allowed</li><li><p>What is the difference between new, new[], delete, and delete[]?</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">int *p = new int(3); // allocate storage for one int and initialize it to 3</span><br><span class="line">int *p1 = new int[20]; // allocate storage for 20 ints and assign it to p1</span><br><span class="line">delete p; // release the storage obtained with new</span><br><span class="line">delete[] p1; // release the storage obtained with new[]</span><br></pre></td></tr></table></figure></li><li><p>Memory allocated dynamically in a C++ program is not released automatically: every new needs a matching delete, and every new[] a matching delete[]</p></li><li>Do "pointer function" and "function pointer" mean the same thing?<br>Not at all!<br>Pointer function: a function whose return value is an address. Its form is: type* function_name(parameter list)<br>Function pointer: a pointer variable that holds the entry address of a function. Its form is: type (*pointer_name)(parameter list)<br>Using a function pointer:<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#include <iostream></span></span><br><span class="line">using namespace std;</span><br><span class="line">int cul(int (*pf)(int,int), int x, int y){</span><br><span class="line"><span class="built_in">return</span> pf(x,y);</span><br><span class="line">}</span><br><span class="line">int add(int x,int y){</span><br><span class="line"><span class="built_in">return</span> x+y;</span><br><span class="line">}</span><br><span class="line">int sub(int x,int y){</span><br><span class="line"><span class="built_in">return</span> x-y;</span><br><span class="line">}</span><br><span class="line">int <span 
class="function"><span class="title">main</span></span>(){</span><br><span class="line">int a=10,b=20;</span><br><span class="line">cout<<a<<<span class="string">"+"</span><<b<<<span class="string">"="</span><<cul(add,a,b)<<endl;</span><br><span class="line">cout<<a<<<span class="string">"-"</span><<b<<<span class="string">"="</span><<cul(sub,a,b);</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li></ul><p><img src="/2018/11/19/c-指针问题/5.png" alt="5"></p><ul><li>What is the difference between a pointer to const, a const pointer, and a const pointer to const?<br>Pointer to const: the pointer points to a constant, so the pointed-to value cannot be modified through it. Form: const type* pointer, or type const * pointer<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">int i;</span><br><span class="line">const int *p = &i;</span><br><span class="line">*p = 10; // error</span><br><span class="line">i = 10; // OK</span><br></pre></td></tr></table></figure></li></ul><p>Const pointer: the pointer itself is a constant and can only be initialized at its definition. Form: type * const pointer<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">int i,j;</span><br><span class="line">int * const p = &i;</span><br><span class="line">p = &j; // error</span><br></pre></td></tr></table></figure></p><p>Const pointer to const: the form is const type * const pointer<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">int i,j;</span><br><span class="line">const int * const p = &i;</span><br><span class="line">*p = 10; // error</span><br><span class="line">p = &j; // error</span><br><span class="line">i = 10; // OK</span><br></pre></td></tr></table></figure></p>]]></content>
<tags>
<tag> c++ </tag>
</tags>
</entry>
<entry>
<title>Building your own IP proxy pool with Scrapy</title>
<link href="/2018/11/18/scrapy%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84ip%E4%BB%A3%E7%90%86%E6%B1%A0/"/>
<url>/2018/11/18/scrapy%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84ip%E4%BB%A3%E7%90%86%E6%B1%A0/</url>
<content type="html"><![CDATA[<h1 id="用scrapy爬取可用的代理"><a href="#用scrapy爬取可用的代理" class="headerlink" title="用scrapy爬取可用的代理"></a>Crawling usable proxies with Scrapy</h1><h2 id="分析免费代理网站的结构"><a href="#分析免费代理网站的结构" class="headerlink" title="分析免费代理网站的结构"></a>Analyzing the structure of the free proxy site</h2><ul><li>I scrape three fields: <strong>IP</strong>, <strong>port</strong>, and <strong>type</strong><br><img src="https://i.loli.net/2018/11/18/5bf12dc61a906.jpg" alt="TIM图片20181118171534.jpg"><h2 id="分析要爬取的数据,编写items-py"><a href="#分析要爬取的数据,编写items-py" class="headerlink" title="分析要爬取的数据,编写items.py"></a>Deciding what to scrape and writing items.py</h2></li><li>So define the corresponding fields in items.py<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">import scrapy</span><br><span class="line">class IproxyItem(scrapy.Item):</span><br><span class="line">    <span class="comment"># define the fields for your item here like:</span></span><br><span class="line">    <span class="comment"># name = scrapy.Field()</span></span><br><span class="line">    ip = scrapy.Field()</span><br><span class="line">    <span class="built_in">type</span> = scrapy.Field()</span><br><span class="line">    port = scrapy.Field()</span><br></pre></td></tr></table></figure></li></ul><h2 id="爬取所有的免费ip"><a href="#爬取所有的免费ip" class="headerlink" title="爬取所有的免费ip"></a>Scraping all the free IPs</h2><ul><li>In the project's spiders directory, create IpSpider.py<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br></pre></td><td class="code"><pre><span class="line">import scrapy</span><br><span class="line">import Iproxy.items</span><br><span class="line">class IpSpider(scrapy.Spider):</span><br><span class="line">    name = <span class="string">'IpSpider'</span></span><br><span class="line">    allowed_domains = [<span class="string">'xicidaili.com'</span>]</span><br><span class="line">    start_urls = [<span class="string">'http://www.xicidaili.com/'</span>]</span><br><span class="line"></span><br><span class="line">    def parse(self, response):</span><br><span class="line">        item = Iproxy.items.IproxyItem()</span><br><span class="line">        item[<span class="string">'ip'</span>] = response.css(<span class="string">'tr td:nth-child(2)::text'</span>).extract()</span><br><span class="line">        item[<span class="string">'port'</span>] = response.css(<span class="string">'tr td:nth-child(3)::text'</span>).extract()</span><br><span class="line">        item[<span class="string">'type'</span>] = response.css(<span class="string">'tr td:nth-child(6) ::text'</span>).extract()</span><br><span class="line">        yield item</span><br></pre></td></tr></table></figure></li></ul><h2 id="检测是否可用,如果可用则存入数据库"><a href="#检测是否可用,如果可用则存入数据库" class="headerlink" title="检测是否可用,如果可用则存入数据库"></a>Checking availability and storing usable proxies in the database</h2><ul><li>Because these are free IPs, we need to check whether each one actually works: usable ones are stored in the database and the rest are discarded</li><li>The checking and data-handling logic goes in pipelines.py</li><li>How the check works: request Baidu through the proxy; if the request succeeds, the proxy is usable<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Define your item pipelines here</span></span><br><span class="line"><span class="comment">#</span></span><br><span class="line"><span class="comment"># Don't forget to add your pipeline to the ITEM_PIPELINES setting</span></span><br><span class="line"><span class="comment"># See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html</span></span><br><span class="line"></span><br><span class="line">import pymysql</span><br><span class="line">import requests</span><br><span class="line"></span><br><span class="line">class IproxyPipeline(object):</span><br><span class="line">    def process_item(self, item, spider):</span><br><span class="line">        <span class="built_in">print</span>(<span class="string">'@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'</span>)</span><br><span class="line">        db = pymysql.connect(host=<span class="string">"localhost"</span>, user=<span class="string">"root"</span>, password=<span class="string">"168168"</span>, database=<span class="string">"spider"</span>)</span><br><span class="line">        cursor = 
db.cursor()</span><br><span class="line">        <span class="keyword">for</span> i <span class="keyword">in</span> range(1, len(item[<span class="string">'ip'</span>])):</span><br><span class="line">            ip = item[<span class="string">'ip'</span>][i] + <span class="string">':'</span> + item[<span class="string">'port'</span>][i]</span><br><span class="line">            try:</span><br><span class="line">                <span class="keyword">if</span> self.proxyIpCheck(ip) is False:</span><br><span class="line">                    <span class="built_in">print</span>(<span class="string">'IP '</span>+ip+<span class="string">" is not usable"</span>)</span><br><span class="line">                    <span class="built_in">continue</span></span><br><span class="line">                <span class="keyword">else</span>:</span><br><span class="line">                    <span class="built_in">print</span>(<span class="string">'IP '</span>+ip+<span class="string">' is usable, saving it to the database!'</span>)</span><br><span class="line">                    sql = <span class="string">'insert into proxyIp value (%s)'</span></span><br><span class="line">                    cursor.execute(sql, (ip,))</span><br><span class="line">                    db.commit()</span><br><span class="line">            except Exception:</span><br><span class="line">                db.rollback()</span><br><span class="line">        db.close()</span><br><span class="line">        <span class="built_in">return</span> item</span><br><span class="line"></span><br><span class="line">    def proxyIpCheck(self, ip):</span><br><span class="line">        proxies = {<span class="string">'http'</span>: <span class="string">'http://'</span> + ip, <span class="string">'https'</span>: <span class="string">'https://'</span> + ip}</span><br><span class="line">        try:</span><br><span class="line">            r = requests.get(<span class="string">'https://www.baidu.com/'</span>, proxies=proxies, timeout=1)</span><br><span class="line">            <span class="keyword">if</span> (r.status_code == 200):</span><br><span class="line">                <span class="built_in">return</span> True</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                <span 
class="built_in">return</span> False</span><br><span class="line">        except Exception:</span><br><span class="line">            <span class="built_in">return</span> False</span><br></pre></td></tr></table></figure></li></ul><h2 id="运行情况"><a href="#运行情况" class="headerlink" title="运行情况"></a>Results</h2><ul><li>As you can see, many of the IPs are unusable<br><img src="https://i.loli.net/2018/11/18/5bf1308222b42.png" alt="TIM图片20181118172712.png"></li><li>The usable ones are stored in the database<br><img src="https://i.loli.net/2018/11/18/5bf130d8031b3.jpg" alt="TIM图片20181118172841.jpg"></li></ul>]]></content>
<tags>
<tag> python scrapy 爬虫 </tag>
</tags>
</entry>
<entry>
<title>Scraping the latest novel chapter with Python and sending it to your mailbox</title>
<link href="/2018/11/17/python%E7%88%AC%E5%8F%96%E6%9C%80%E6%96%B0%E6%9B%B4%E6%96%B0%E7%9A%84%E5%B0%8F%E8%AF%B4%E5%B9%B6%E5%8F%91%E9%80%81%E5%88%B0%E4%BD%A0%E7%9A%84%E9%82%AE%E7%AE%B1/"/>
<url>/2018/11/17/python%E7%88%AC%E5%8F%96%E6%9C%80%E6%96%B0%E6%9B%B4%E6%96%B0%E7%9A%84%E5%B0%8F%E8%AF%B4%E5%B9%B6%E5%8F%91%E9%80%81%E5%88%B0%E4%BD%A0%E7%9A%84%E9%82%AE%E7%AE%B1/</url>
<content type="html"><![CDATA[<h2 id="数据获取—Spider"><a href="#数据获取—Spider" class="headerlink" title="数据获取—Spider()"></a>Fetching the data: spider()</h2><h3 id="找目标网站,该网站是你看小说的网站,分析该网站的结构方便你对内容的抓取"><a href="#找目标网站,该网站是你看小说的网站,分析该网站的结构方便你对内容的抓取" class="headerlink" title="找目标网站,该网站是你看小说的网站,分析该网站的结构方便你对内容的抓取"></a>Find the target site (the one where you read the novel) and analyze its structure so you know what to scrape</h3><p> <img src="https://i.loli.net/2018/11/17/5befc2f9dd2a9.png" alt="1.png"><br> Here I grab the latest chapter's update time, its title, and the link behind the title<br> <img src="https://i.loli.net/2018/11/17/5befc38daf280.png" alt="2.png"><br> Here I grab the chapter content</p><h3 id="编写spider方法,确定他的返回值,这里我返回的是一个list,包括更新的时间、标题、内容"><a href="#编写spider方法,确定他的返回值,这里我返回的是一个list,包括更新的时间、标题、内容" class="headerlink" title="编写spider方法,确定他的返回值,这里我返回的是一个list,包括更新的时间、标题、内容"></a>Write the spider function and decide on its return value; here I return a list holding the title, the content, and the update time</h3><ul><li>Packages the function needs to import: <strong>requests</strong> <strong>bs4</strong> <strong>re</strong> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">def spider():</span><br><span class="line">    list = []</span><br><span class="line">    response = requests.get(<span class="string">'https://www.xbiquge6.com/13_13134/'</span>)</span><br><span class="line">    response.encoding = (<span class="string">'utf-8'</span>)</span><br><span class="line">    html = response.text</span><br><span 
class="line">    html = BeautifulSoup(html, <span class="string">'html.parser'</span>)</span><br><span class="line">    time = html.select(<span class="string">'div#info p:nth-of-type(3)'</span>)[0].text[5:]</span><br><span class="line">    title = html.select(<span class="string">'div#info p:nth-of-type(4) a[href]'</span>)[0].text</span><br><span class="line">    href = html.select(<span class="string">'div#info p:nth-of-type(4) a[href]'</span>)[0]</span><br><span class="line">    <span class="comment"># print(title)</span></span><br><span class="line">    pattern = re.compile(r<span class="string">'href="(.+?)"'</span>)</span><br><span class="line">    href = re.findall(pattern, str(href))[0]</span><br><span class="line">    href = <span class="string">"https://www.xbiquge6.com"</span> + href</span><br><span class="line">    response = requests.get(href)</span><br><span class="line">    response.encoding = (<span class="string">'utf-8'</span>)</span><br><span class="line">    html = BeautifulSoup(response.text, <span class="string">'html.parser'</span>)</span><br><span class="line">    content = html.select(<span class="string">'div#content'</span>)</span><br><span class="line">    <span class="comment"># print(content)</span></span><br><span class="line">    list.append(title)</span><br><span class="line">    list.append(content)</span><br><span class="line">    list.append(time)</span><br><span class="line">    <span class="built_in">return</span> list</span><br></pre></td></tr></table></figure></li></ul><h2 id="邮件发送—smtp"><a href="#邮件发送—smtp" class="headerlink" title="邮件发送—smtp()"></a>Sending the mail: smtp()</h2><h3 id="首先先在你的邮箱中设置打开smtp服务"><a href="#首先先在你的邮箱中设置打开smtp服务" class="headerlink" title="首先先在你的邮箱中设置打开smtp服务"></a>First, enable the SMTP service in your mailbox settings</h3><p>Taking my QQ Mail account as an example: open the mailbox -> click Settings -> click Account -> scroll down to the SMTP service -> click Enable -> generate an authorization code (this is the password you will use in the smtp function)<br><img src="https://i.loli.net/2018/11/17/5befc49990bec.png" alt="GFGHI0 (1).png"></p><h3 
id="编写smtp方法,向我的邮箱发送小说,确定返回值是bool类型,成功为True,失败为False"><a href="#编写smtp方法,向我的邮箱发送小说,确定返回值是bool类型,成功为True,失败为False" class="headerlink" title="编写smtp方法,向我的邮箱发送小说,确定返回值是bool类型,成功为True,失败为False"></a>Write the smtp function that sends the chapter to my mailbox; it returns a bool, True on success and False on failure</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">def mail():</span><br><span class="line">    list = spider()</span><br><span class="line">    ret = True</span><br><span class="line">    try:</span><br><span class="line">        mail_msg = str(list[1])</span><br><span class="line">        msg = MIMEText(mail_msg, <span class="string">'html'</span>, <span class="string">'utf-8'</span>)</span><br><span class="line">        msg[<span class="string">'From'</span>] = formataddr([<span class="string">'huzai'</span>, my_sender])</span><br><span class="line">        msg[<span class="string">'To'</span>] = formataddr([<span class="string">'huzai'</span>, receiver])</span><br><span class="line">        msg[<span class="string">'Subject'</span>] = list[0]</span><br><span class="line">        server = smtplib.SMTP_SSL(<span class="string">'smtp.qq.com'</span>, 465)</span><br><span class="line">        server.login(my_sender, my_pwd)</span><br><span class="line">        server.sendmail(my_sender, [receiver], msg.as_string())</span><br><span class="line">        server.quit()</span><br><span class="line">    except Exception:</span><br><span class="line">        ret = False</span><br><span class="line">    <span 
class="built_in">return</span> ret</span><br></pre></td></tr></table></figure><h2 id="上传脚本到服务器"><a href="#上传脚本到服务器" class="headerlink" title="上传脚本到服务器"></a>Uploading the script to the server</h2><h3 id="使用xftp将写好的smtp-py上传到你的云服务器上"><a href="#使用xftp将写好的smtp-py上传到你的云服务器上" class="headerlink" title="使用xftp将写好的smtp.py上传到你的云服务器上"></a>Use Xftp to upload the finished smtp.py to your cloud server</h3><p><img src="https://i.loli.net/2018/11/17/5befc6acf033d.png" alt="3.png"><br>Just drag the file in.</p><h3 id="这里注意保证你的服务器上的python版本和你本机一致,且需要的包已经安装"><a href="#这里注意保证你的服务器上的python版本和你本机一致,且需要的包已经安装" class="headerlink" title="这里注意保证你的服务器上的python版本和你本机一致,且需要的包已经安装"></a>Make sure the Python version on the server matches your local one and that the required packages are installed</h3><ul><li>If the server is running Python 2.*, you can run the commands below to install python3<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo apt-get remove python</span><br><span class="line">sudo apt-get install python3</span><br><span class="line">sudo apt autoremove</span><br></pre></td></tr></table></figure></li></ul><h3 id="用xshell进入服务器试着运行"><a href="#用xshell进入服务器试着运行" class="headerlink" title="用xshell进入服务器试着运行"></a>Log in to the server with Xshell and try running the script</h3><p><img src="https://i.loli.net/2018/11/17/5befc966d6b17.png" alt="TIM图片20181117155505.png"></p><h2 id="在服务器端设置定时执行"><a href="#在服务器端设置定时执行" class="headerlink" title="在服务器端设置定时执行"></a>Scheduling the script on the server</h2><h3 id="确保你安装了crontab(ubuntu默认安装)"><a href="#确保你安装了crontab(ubuntu默认安装)" class="headerlink" title="确保你安装了crontab(ubuntu默认安装)"></a>Make sure crontab is installed (Ubuntu ships with it by default)</h3><p>Anatomy of a cron entry: time to run + user to run as + command to run<br><img src="https://i.loli.net/2018/11/17/5befc8af89fb3.png" alt="4.png"></p><h3 id="查看原有的cron"><a href="#查看原有的cron" class="headerlink" title="查看原有的cron"></a>View the existing cron entries</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cat /etc/crontab</span><br></pre></td></tr></table></figure><p><img 
src="https://i.loli.net/2018/11/17/5befc9f6040d2.png" alt="TIM图片20181117155728.png"></p><h3 id="编辑你的程序"><a href="#编辑你的程序" class="headerlink" title="编辑你的程序"></a>Edit the crontab</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/crontab</span><br></pre></td></tr></table></figure><p>Add your entry. Mine sends me the mail at 14:58 every day; set the time according to when your novel usually updates (how many chapters a day, roughly at what time, and so on)<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">58 14 * * * root python3 smtp.py</span><br></pre></td></tr></table></figure></p><p>After editing, check again that the entry was written to the crontab; mine was<br><img src="https://i.loli.net/2018/11/17/5befcb198cbae.png" alt="TIM图片20181117160221.png"><br>Restart the cron service<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">service cron restart</span><br></pre></td></tr></table></figure></p><h2 id="静静的等待14-58的到来,查看邮箱"><a href="#静静的等待14-58的到来,查看邮箱" class="headerlink" title="静静的等待14:58的到来,查看邮箱"></a>Wait for 14:58, then check your mailbox</h2><ul><li>The mail with the latest chapter has arrived<br><img src="https://i.loli.net/2018/11/17/5befcbd7281ec.png" alt="TIM图片20181117160515.png"></li></ul>]]></content>
<tags>
<tag> python </tag>
</tags>
</entry>
<entry>
<title>Building a personal blog with GitHub + Hexo</title>
<link href="/2018/11/11/github-hexo%E6%90%AD%E5%BB%BA%E4%B8%AA%E4%BA%BA%E5%8D%9A%E5%AE%A2/"/>
<url>/2018/11/11/github-hexo%E6%90%AD%E5%BB%BA%E4%B8%AA%E4%BA%BA%E5%8D%9A%E5%AE%A2/</url>
<content type="html"><![CDATA[<h3 id="1-创建的项目名默认为-用户名-github-io-创建时点击生成readme文件,方便后面添加说明"><a href="#1-创建的项目名默认为-用户名-github-io-创建时点击生成readme文件,方便后面添加说明" class="headerlink" title="1.创建的项目名默认为 用户名.github.io,创建时点击生成readme文件,方便后面添加说明"></a>1. Create a repository named <strong>username.github.io</strong> and tick the option to generate a README so you can add notes later</h3><p><img aligen="center" src="https://i.loli.net/2018/11/13/5beaa5e07e5a7.png"></p><h3 id="2-在本地创建一个文件夹,我是在E盘创建的blog,推荐用vscode作为编辑器,在编辑器里面打开文件夹,打开Terminer"><a href="#2-在本地创建一个文件夹,我是在E盘创建的blog,推荐用vscode作为编辑器,在编辑器里面打开文件夹,打开Terminer" class="headerlink" title="2.在本地创建一个文件夹,我是在E盘创建的blog,推荐用vscode作为编辑器,在编辑器里面打开文件夹,打开Terminer"></a>2. Create a local folder (mine is blog on the E: drive). I recommend VS Code as the editor: open the folder in it, then open the Terminal</h3><p><img src="https://i.loli.net/2018/11/13/5beaacf147c83.png" alt="使用vscode打开文件夹"></p><h3 id="3-使用hexo初始化文件夹,这一步会产生很多的hexo配置文件,我们先不管,先跑起来"><a href="#3-使用hexo初始化文件夹,这一步会产生很多的hexo配置文件,我们先不管,先跑起来" class="headerlink" title="3.使用hexo初始化文件夹,这一步会产生很多的hexo配置文件,我们先不管,先跑起来"></a>3. Initialize the folder with hexo. This step generates a lot of Hexo configuration files; ignore them for now and just get the site running</h3><p><img src="https://i.loli.net/2018/11/13/5beaae3c7ee9d.png" alt="hexo初始化文件夹"></p><h3 id="4-运行hexo-server打开服务,看看本地能不能显示"><a href="#4-运行hexo-server打开服务,看看本地能不能显示" class="headerlink" title="4.运行hexo server打开服务,看看本地能不能显示"></a>4. Run hexo server to start the server and check that the site shows up locally</h3><p><img src="https://i.loli.net/2018/11/13/5beab03a63524.png" alt="hexo server"><br>Once it is running, visit the URL; if you see the page shown below, it worked<br><img src="https://i.loli.net/2018/11/13/5beab09f5e2ab.jpg" alt="运行效果"></p><h3 id="5-配置文件中填写git的配置信息,按照如下格式填写"><a href="#5-配置文件中填写git的配置信息,按照如下格式填写" class="headerlink" title="5.配置文件中填写git的配置信息,按照如下格式填写"></a>5. Fill in the git settings in the configuration file, using the format below</h3><p><img src="https://i.loli.net/2018/11/13/5beab1fb7f83a.png" alt="配置信息"></p><h3 id="6-打开文件夹,右键git-bash-here"><a href="#6-打开文件夹,右键git-bash-here" class="headerlink" title="6.打开文件夹,右键git bash here"></a>6. Open the folder, right-click, and choose Git Bash Here</h3><p><img src="https://i.loli.net/2018/11/13/5beab3362770b.png" alt="git bash here"></p><h3 id="7-输入cd-ssh,进入ssh文件夹"><a href="#7-输入cd-ssh,进入ssh文件夹" 
class="headerlink" title="7.输入cd ~/.ssh,进入ssh文件夹"></a>7. Type cd ~/.ssh to enter the ssh folder</h3><p><img src="https://i.loli.net/2018/11/13/5beab3e76e0c4.png" alt="ssh"></p><h3 id="8-配置git中的用户名和邮箱"><a href="#8-配置git中的用户名和邮箱" class="headerlink" title="8.配置git中的用户名和邮箱"></a>8. Configure your git user name and email</h3><p><img src="https://i.loli.net/2018/11/13/5beab93273357.png" alt="配置用户名"></p><h3 id="9-生成ssh密钥"><a href="#9-生成ssh密钥" class="headerlink" title="9.生成ssh密钥"></a>9. Generate an SSH key</h3><p><img src="https://i.loli.net/2018/11/13/5beab95f2f069.png" alt="生成密钥"></p><h3 id="10-在github的项目中加入密钥"><a href="#10-在github的项目中加入密钥" class="headerlink" title="10.在github的项目中加入密钥"></a>10. Add the key to your GitHub project</h3><p><img src="https://i.loli.net/2018/11/13/5beab988e1bda.png" alt="添加密钥"></p><h3 id="11-测试密钥链接是否成功"><a href="#11-测试密钥链接是否成功" class="headerlink" title="11.测试密钥链接是否成功"></a>11. Test that the key connection works</h3><p><img src="https://i.loli.net/2018/11/13/5beab9fc045d7.png" alt="测试"></p><h3 id="12-测试成功后再再编辑器中运行"><a href="#12-测试成功后再再编辑器中运行" class="headerlink" title="12.测试成功后再再编辑器中运行"></a>12. Once the test succeeds, run the following in the editor</h3><pre><code>hexo clean
hexo g
hexo d</code></pre><p><img src="https://i.loli.net/2018/11/13/5beaba9fb29d7.png" alt="4.png">If you see this, the upload succeeded</p><h3 id="13-访问你的博客,看到之前再本地运行的界面,就行了"><a href="#13-访问你的博客,看到之前再本地运行的界面,就行了" class="headerlink" title="13.访问你的博客,看到之前再本地运行的界面,就行了"></a>13. Visit your blog; if you see the same page you saw when running locally, you are done</h3>]]></content>
<tags>
<tag> github </tag>
</tags>
</entry>
<entry>
<title>Hello World</title>
<link href="/2018/11/10/hello-world/"/>
<url>/2018/11/10/hello-world/</url>
<content type="html"><![CDATA[<p>Welcome to <a href="https://hexo.io/" target="_blank" rel="noopener">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/" target="_blank" rel="noopener">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html" target="_blank" rel="noopener">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues" target="_blank" rel="noopener">GitHub</a>.</p><h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/writing.html" target="_blank" rel="noopener">Writing</a></p><h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo server</span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/server.html" target="_blank" rel="noopener">Server</a></p><h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo generate</span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/generating.html" target="_blank" rel="noopener">Generating</a></p><h3 
id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo deploy</span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/deployment.html" target="_blank" rel="noopener">Deployment</a></p>]]></content>
</entry>
</search>