财务数据解释公司市值第一部分:数据获取

Posted by YU on May 21, 2019

聚宽财务数据用法介绍

1
2
3
4
5
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['axes.unicode_minus']=False
plt.rcParams['font.sans-serif']=['SimHei'] #指定默认字体 SimHei为黑体

首先要获取财务数据,在聚宽上调用 get_fundamentals, 传入要筛选的条件即可. 聚宽上四类财务数据分别存放到如下四个表中: 市值表, 利润表, 现金表, 负债表。

1
2
#获取沪深300成分股
stocks = get_index_stocks('000300.XSHG')

调用方法格式


get_fundamentals(query_object, date=None, statDate=None)


date和statDate参数只能传入一个:

  • 传入date时, 查询指定日期date所能看到的最近(对市值表来说, 最近一天, 对其他表来说, 最近一个季度)的数据, 我们会查找上市公司在这个日期之前(包括此日期)发布的数据, 不会有未来函数.

  • 传入statDate时, 查询statDate指定的季度或者年份的财务数据.

    注意:

    • 由于公司发布财报不及时, 一般是看不到当季度或年份的财务报表的, 回测中使用这个数据可能会有未来函数, 请注意规避.
    • 由于估值表每天更新, 当按季度或者年份查询时, 返回季度或者年份最后一天的数据
    • 对于年报数据, 我们目前只有现金流表和利润表, 当查询其他表时, 会返回该年份最后一个季报的数据
    • 当date和statDate都不传入时, 相当于使用date参数, date的默认值下面会描述.
1
2
3
4
5
6
7
8
### 获取单只股票在某一日期的市值数据
df = get_fundamentals(query(
        valuation
    ).filter(
        valuation.code == '000001.XSHE'
    ), date='2018-10-15')

df
id code pe_ratio turnover_ratio pb_ratio ps_ratio pcf_ratio capitalization market_cap circulating_cap circulating_market_cap day pe_ratio_lyr
0 18856883 000001.XSHE 7.2309 0.8169 0.8433 1.5932 1.7969 1717041.125 1735.9286 1717024.625 1735.9119 2018-10-15 7.486
1
2
# 取出总市值
df['market_cap'][0]
1
1735.9286
1
2
3
4
5
6
7
8
9
10
# 只选择表里的某些字段
### 获取多只股票在某一日期的市值, 利润, 现金流和负债数据
df = get_fundamentals(query(
        valuation.code, valuation.market_cap, valuation.pe_ratio, income.total_operating_revenue
    ).filter(
        # 这里不能使用 in 操作, 要使用in_()函数
        valuation.code.in_(['000001.XSHE', '600000.XSHG'])
    ), date='2015-10-15')

df
code market_cap pe_ratio total_operating_revenue
0 000001.XSHE 1598.2791 7.4984 2.590400e+10
1 600000.XSHG 2965.9021 6.1440 3.771000e+10
1
2
3
4
5
6
7
8
9
# 选出所有的总市值大于1000亿元, 市盈率小于10, 营业总收入大于200亿元的股票
df = get_fundamentals(query(
        valuation.code, valuation.market_cap, valuation.pe_ratio, income.total_operating_revenue
    ).filter(
        valuation.market_cap > 1000,
        valuation.pe_ratio < 10,
        income.total_operating_revenue > 2e10
    ), date='2015-10-15')



财务数据解释公司市值

1
2
3
4
5
6
7
8
9
stocks = get_index_stocks('399906.XSHE')
stocks
# 获取多只股票在某一日期的市值, 利润
df = get_fundamentals(query(
        valuation, income,indicator, balance
    ).filter(
    valuation.code.in_(stocks)
    ), date='2019-05-21')

给定日期的最新财务数据

1
2
3
#查询日期为2019-05-21,获得是2019-05-20这天能获取的财务数据。
df.head()
#800 rows × 178 columns
id code pe_ratio turnover_ratio pb_ratio ps_ratio pcf_ratio capitalization market_cap circulating_cap circulating_market_cap day pe_ratio_lyr id.1 code.1 day.1 pubDate statDate total_operating_revenue operating_revenue interest_income premiums_earned commission_income total_operating_cost operating_cost interest_expense commission_expense refunded_premiums net_pay_insurance_claims withdraw_insurance_contract_reserve policy_dividend_payout reinsurance_cost operating_tax_surcharges sale_expense administration_expense financial_expense asset_impairment_loss fair_value_variable_income investment_income invest_income_associates ... borrowing_capital trading_liability notes_payable accounts_payable advance_peceipts sold_buyback_secu_proceeds commission_payable salaries_payable taxs_payable interest_payable dividend_payable other_payable reinsurance_payables insurance_contract_reserves proxy_secu_proceeds receivings_from_vicariously_sold_securities non_current_liability_in_one_year other_current_liability total_current_liability longterm_loan bonds_payable longterm_account_payable specific_account_payable estimate_liability deferred_tax_liability other_non_current_liability total_non_current_liability total_liability paidin_capital capital_reserve_fund treasury_stock specific_reserves surplus_reserve_fund ordinary_risk_reserve_fund retained_profit foreign_currency_report_conv_diff equities_parent_company_owners minority_interests total_owner_equities total_sheet_owner_equities
0 35356284 000001.XSHE 8.2812 0.4580 0.9354 1.7544 21.0987 1.717041e+06 2125.6970 1.717025e+06 2125.6765 2019-05-20 8.5651 17903449 000001.XSHE 2019-05-20 2019-04-24 2019-03-31 3.247600e+10 3.247600e+10 4.266100e+10 NaN 1.157900e+10 9.924000e+09 NaN 2.188700e+10 2.019000e+09 NaN NaN NaN NaN NaN 3.040000e+08 NaN 9.620000e+09 NaN NaN -642000000.0 2.571000e+09 NaN ... 2.166900e+10 1.572100e+10 NaN NaN NaN 2.150200e+10 NaN 9.848001e+09 1.105300e+10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.017580e+11 NaN NaN 1.271000e+09 NaN NaN NaN 3.279242e+12 1.717000e+10 5.646500e+10 NaN NaN 1.078100e+10 3.985000e+10 1.016090e+11 NaN 2.509380e+11 NaN 2.509380e+11 3.530180e+12
1 35356285 000002.XSHE 9.0621 0.2635 1.9555 0.9774 6.4286 1.130214e+06 3080.9641 9.715170e+05 2648.3555 2019-05-20 9.1227 21894414 000002.XSHE 2019-05-20 2019-04-30 2019-03-31 4.837463e+10 4.837463e+10 NaN NaN NaN 4.225158e+10 3.142224e+10 NaN NaN NaN NaN NaN NaN NaN 4.437556e+09 1.371392e+09 2.639014e+09 1.841741e+09 NaN -35196480.0 -2.065606e+08 -312437184.0 ... NaN NaN NaN NaN 2.313821e+08 NaN NaN 4.440450e+09 1.433424e+10 NaN NaN 2.314350e+11 NaN NaN NaN NaN 6.156035e+10 4.635405e+10 1.115029e+12 1.261480e+11 5.261106e+10 NaN NaN 1.361696e+08 4.728514e+08 2.188339e+09 1.999485e+11 1.314977e+12 1.103915e+10 7.961989e+09 NaN NaN 4.739325e+10 NaN 9.284547e+10 NaN 1.575514e+11 7.863797e+10 2.361894e+11 1.551166e+12
2 35356288 000006.XSHE 10.7008 0.3713 1.1855 3.0648 3.1466 1.349995e+05 74.7897 1.347786e+05 74.6674 2019-05-20 8.5548 21977780 000006.XSHE 2019-05-20 2019-04-30 2019-03-31 4.134936e+08 4.134936e+08 NaN NaN NaN 3.444878e+08 2.511230e+08 NaN NaN NaN NaN NaN NaN NaN 2.051002e+07 2.214487e+06 2.797155e+07 2.901531e+07 13653447.0 NaN 6.441681e+07 NaN ... NaN NaN NaN NaN 2.288545e+09 NaN NaN 5.705216e+07 6.717919e+08 NaN NaN 6.829181e+08 NaN NaN NaN NaN 4.374894e+08 3.219049e+05 5.578220e+09 1.310600e+09 4.993531e+08 NaN NaN NaN 1.715713e+06 NaN 1.811669e+09 7.389889e+09 1.349995e+09 4.843216e+08 NaN NaN 1.483863e+09 NaN 2.990372e+09 NaN 6.308551e+09 3.212726e+08 6.629824e+09 1.401971e+10
3 35356290 000008.XSHE 32.6433 0.8227 1.4540 3.9970 -362.7030 2.780795e+05 106.5045 2.599124e+05 99.5465 2019-05-20 32.7654 22710880 000008.XSHE 2019-05-20 2019-04-27 2019-03-31 3.087470e+08 3.087470e+08 NaN NaN NaN 3.039850e+08 1.448895e+08 NaN NaN NaN NaN NaN NaN NaN 7.327400e+06 5.257320e+07 7.052305e+07 8.071832e+06 NaN -5482905.0 -4.299728e+05 NaN ... NaN NaN NaN NaN 1.919174e+08 NaN NaN 5.542671e+07 3.612078e+07 NaN NaN 3.779759e+08 NaN NaN NaN NaN NaN 1.849081e+07 2.813214e+09 2.344870e+08 NaN NaN NaN NaN 2.983337e+07 NaN 2.830551e+08 3.096269e+09 2.780795e+09 2.771978e+09 NaN 44309900.0 4.685366e+07 NaN 1.681122e+09 NaN 7.325059e+09 9.265987e+07 7.417719e+09 1.051399e+10
4 35356291 000009.XSHE 49.0255 0.7387 2.2674 1.0008 9.1161 2.149345e+05 119.9334 2.119789e+05 118.2842 2019-05-20 56.1012 23284630 000009.XSHE 2019-05-20 2019-04-30 2019-03-31 2.249621e+09 2.239609e+09 1.001234e+07 NaN NaN 2.117828e+09 1.438458e+09 NaN 2.091037e+06 NaN NaN NaN NaN NaN 4.292985e+07 2.564405e+08 1.624112e+08 1.306775e+08 685752.0 162383584.0 -6.207141e+07 -6610219.5 ... NaN NaN NaN NaN 2.233106e+09 NaN NaN 1.041870e+08 7.710143e+08 NaN NaN 1.438105e+09 NaN NaN NaN NaN 5.883695e+08 9.610228e+04 1.208985e+10 1.788000e+09 4.575568e+09 633191872.0 NaN 1.214505e+07 1.124689e+08 8.833930e+07 7.618770e+09 1.970862e+10 2.149345e+09 8.246493e+08 NaN 7931196.5 2.600634e+08 NaN 2.054833e+09 NaN 5.289418e+09 5.014061e+09 1.030348e+10 3.001209e+10
1
2
3
4
5
# 加载资产负债表的股票代码、总市值,资产、负债、股东权益、净利润、研发费用数据
#df[['code','market_cap','total_assets','total_liability','net_profit','development_expenditure']]
mydata = df[['code','market_cap','total_assets','total_liability','total_owner_equities','net_profit','development_expenditure']]
mydata.head()
#800 rows × 7 columns
code market_cap total_assets total_liability total_owner_equities net_profit development_expenditure
0 000001.XSHE 2125.6970 3.530180e+12 3.279242e+12 2.509380e+11 7.446000e+09 NaN
1 000002.XSHE 3080.9641 1.551166e+12 1.314977e+12 2.361894e+11 3.205934e+09 NaN
2 000006.XSHE 74.7897 1.401971e+10 7.389889e+09 6.629824e+09 1.156789e+08 NaN
3 000008.XSHE 106.5045 1.051399e+10 3.096269e+09 7.417719e+09 1.063877e+07 14590518.0
4 000009.XSHE 119.9334 3.001209e+10 1.970862e+10 1.030348e+10 1.859285e+08 26316454.0
1
mydata2 = pd.DataFrame()
1
2
#对数市值
mydata2['market_cap'] = mydata['market_cap'].apply(np.log)
1
2
#对数净资产,净资产即所有者权益,在聚宽上称股东权益
mydata2['net_assets'] = (mydata['total_assets']-mydata['total_liability']).apply(np.log)
1
2
#财务杠杆,负债/资产
mydata2['financial leverage'] = (mydata['total_liability']/mydata['total_assets'])
1
2
#对数净利润
mydata2['net_profit'] = mydata['net_profit'].apply(np.log)
1
2
#对数研发费用
mydata2['development_expenditure'] = mydata['development_expenditure'].apply(np.log).replace([np.nan,np.inf, -np.inf],0.0)
1
2
mydata2 = mydata2.dropna()
mydata2.head()
market_cap net_assets financial leverage net_profit development_expenditure
0 7.661855 26.248472 0.928916 22.730943 0.000000
1 8.032998 26.187900 0.847734 21.888269 0.000000
2 4.314680 22.614844 0.527107 18.566328 0.000000
3 4.668187 22.727137 0.294490 16.180015 16.495882
4 4.786937 23.055747 0.656689 19.040873 17.085705

市值解释模型

用线性模型与随机森林模型来解释市值

1) 线性市值解释模型

  • 模型形式为:
\[m_{it} = \alpha_{0t} IND_{it} + \alpha_{1t} b_{it} + \alpha_{2t} \ln NI_{it}^{+} + \alpha_{3t} I_{\lt 0} \ln NI_{it}^{-} + \alpha_{4t} LEV_{it} + \alpha_{5t} g_{it} + \alpha_{6t} RD_{it} + \epsilon_{it}\]
  • 其中, $m_{it}$为股票i在t时候的对数市值,$IND_{it}$为行业哑变量,$b_{it}$为股票对数净资产,$NI$为公司净利润,这里根据净利润的正负拆分了两个变量,$LEV_{it}$为公司财务杠杆,即负债除以资产, $g_{it}$为季度同比的营业收入增长率, $RD_{it}$为开发支出。

2) 随机森林非线性市值解释模型

  • 将线性模型的自变量作为随机森林模型的输入特征,对数市值作为数据的标签,进行训练。

下面用的数据未使用$IND_{it}$行业哑变量和$g_{it}$

OLS线性回归和随机森林回归

1
2
3
4
5
6
7
8
9
10
11
import statsmodels.api as sm
import scipy.stats as st
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score


#对数市值
y = mydata2['market_cap']
#输入标签
x = mydata2[['net_assets','financial leverage','net_profit','development_expenditure']]
1
2
3
4
5
6
#OLS线性回归
model_1 = sm.OLS(y, x, missing='drop').fit()
predict_market_1 = model_1.predict(x)
model_1_resid = model_1.resid
model_1_r2 = r2_score(y, predict_market_1)
print(model_1_r2)
1
0.5127315745827835
1
2
3
4
5
6
7
8
#随机森林回归
model_2 = RandomForestRegressor()
model_2.fit(x,y)
predict_market_2 = model_2.predict(x)

model_2_resid = y - predict_market_2
model_2_r2 = r2_score(y, predict_market_2)
print(model_2_r2)
1
0.94671393031602
1
2
print('线性模型的拟合R2:%.2f'%model_1_r2)
print('随机森林模型的拟合R2:%.2f'%model_2_r2)
1
2
线性模型的拟合R2:0.51
随机森林模型的拟合R2:0.95

残差

1
2
3
4
5
6
plt.figure(figsize = (15,5))
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)
plt.hist(model_1_resid,label = 'OLS回归',**kwargs)
plt.hist(model_2_resid,label = '随机森林回归',**kwargs)
plt.title('2019年5月20日一期残差分布')
plt.legend()

本文采用知识共享署名-非商业性使用-禁止演绎 4.0 国际许可协议(CC BY-NC-ND 4.0)进行许可,转载请注明出处,请勿用于任何商业用途采用。

☛决定关注我了吗☚