11/15/2012

The Signal and The Noise by Nate Silver

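The author, Nate Silver, specializes in statistical analysis. In his spare time he developed a tool for evaluating baseball players' ability, somewhat like Moneyball. He also enjoys Texas hold'em and at one point quit his job to play poker professionally. After regulatory changes pushed him onto more specialized gambling sites, where he lost some money, he gave up making a living from gambling. He also forecasts U.S. elections, and because his forecasts have been quite accurate, he now has a column in The New York Times devoted to analyzing U.S. elections.

His new book, The Signal and the Noise, sets out to explain which things are relatively easy to predict and which are not. It covers elections, baseball, basketball sports betting, chess, weather, earthquakes, poker, the stock market, infectious diseases, the economy, terrorism, and more.

The book notes that signal and noise are often mixed together, so mistaking noise for signal hurts the accuracy of a prediction.

On the other hand, many sciences put too much emphasis on simplification. A simple model is easy to understand, but some behavior is more complex, and predictions from a simple model of such behavior will be unrealistic.

The book also introduces a simple concept from probability, Bayes' theorem. Its most common application is in medical testing. For example, if a genetic test for cancer comes back positive, how likely is it that the person really has cancer cells? The answer depends directly on how common the cancer is in the population. A simple illustration follows.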

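The population can be split into two groups: those with the cancer gene and those without.

The probability that someone with the cancer gene tests positive is X.

The probability that someone without the cancer gene tests positive is Y.

Then the probability that someone who tests positive actually has the cancer gene is

X * (share of population with the cancer gene) / ( X * (share of population with the cancer gene) + Y * (share of population without the cancer gene) )

Take colorectal cancer as an example. Assume an incidence of about 307.4 per 100,000 people, X = 99%, and Y = 5%.

The probability that someone who tests positive has the cancer gene is  0.003074*0.99 / (0.003074*0.99 + (1-0.003074)*0.05)
or about 5.75%.

So even with X as high as 99%, the probability that a person who tests positive actually has the cancer gene is only 5.75%.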

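Veterans General Hospital (榮總) published a cancer screening bulletin:

Cancer - Positive screens - Confirmed cancers (rate)
Cervical cancer - 104 - 56 (53.9%)
Breast cancer - 630 - 26 (4.1%)
Rectal cancer - 303 - 19 (6.3%)
Oral cancer - 201 - 4 (2%)

This shows that even when a screening test is highly accurate, a positive result does not necessarily mean a high rate of confirmed diagnoses.

However, if the incidence were ten times higher, changing 307.4 to 3074,
then 0.03074*0.99 / (0.03074*0.99 + (1-0.03074)*0.05) = 38.57%

This shows that the rate of confirmed diagnoses depends heavily on the incidence of the cancer in the population.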
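The two calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `posterior_given_positive` and its parameter names are made up for illustration, and the incidence figure is treated as the share of the population carrying the gene, as in the text.

```python
def posterior_given_positive(prevalence, x, y):
    """P(has gene | positive test) by Bayes' theorem.

    prevalence: share of the population with the cancer gene
    x: P(positive | has gene)
    y: P(positive | no gene)
    """
    true_positive = x * prevalence
    false_positive = y * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # 307.4 and 3074 cases per 100,000 people, with X = 99% and Y = 5%
    for per_100k in (307.4, 3074.0):
        p = posterior_given_positive(per_100k / 100_000, x=0.99, y=0.05)
        print(f"{per_100k} per 100,000 -> {p:.2%}")
```

Running it prints about 5.75% for the first case and about 38.57% for the second, matching the hand calculation.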

* * *
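A simple idea can help Bayes' theorem stick. Suppose that when it rains the ground always gets wet, so X = 100%. If the ground is wet, what is the probability that it rained? Clearly not 100%, and not necessarily very high either.

The probability that the ground is wet because of rain depends on the probability of rain, the probability that the ground is wet when it has not rained (Y), and X.

* * *
Sometimes the base rate of an event is not well known and has to be estimated, then revised in light of results. For example, start with an assumed cancer incidence to estimate the rate of confirmed diagnoses, then use the actual rate of confirmed diagnoses to correct the assumed incidence. Repeating this cycle yields figures closer and closer to reality.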
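One way to picture the correction step is to run the Bayes formula backwards: given an observed rate of confirmed diagnoses among positives, solve for the incidence that would produce it. The sketch below assumes X and Y are known and fixed, which the text does not claim. The function names are hypothetical, and the 6.3% input is borrowed from the bulletin purely as an illustration.

```python
def confirmed_rate(prevalence, x, y):
    """Forward step: predicted share of positives that are true cases."""
    return x * prevalence / (x * prevalence + y * (1.0 - prevalence))


def implied_prevalence(observed_rate, x, y):
    """Backward step: prevalence that would yield the observed confirmed rate.

    Obtained by solving the forward formula for the prevalence.
    """
    return observed_rate * y / (x * (1.0 - observed_rate) + observed_rate * y)


if __name__ == "__main__":
    x, y = 0.99, 0.05            # assumed test characteristics
    guess = 307.4 / 100_000      # initial assumed incidence
    print(f"predicted confirmed rate: {confirmed_rate(guess, x, y):.2%}")
    observed = 0.063             # illustrative observed confirmed rate
    revised = implied_prevalence(observed, x, y)
    print(f"revised incidence: {revised * 100_000:.1f} per 100,000")
```

In practice X and Y are themselves uncertain, so a real revision would update several quantities at once. This sketch only shows the direction of the correction.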

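This also means a model is usually just a simplified version of the real world, which through repeated revision gradually approaches reality. Of course, data must be chosen with care, so that noise is not treated as genuine signal.

The book also has many entertaining stories about prediction and gambling. Those interested may want to take a look.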

The Signal and the Noise: Why So Many Predictions Fail-but Some Don't




10/14/2012

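Investment and Productivity

Books on retirement and personal finance all teach people to save and invest for old age. Investment gains should generally be measured as real returns, after subtracting inflation.

A positive real return means that today's net assets can be exchanged for more goods or services in the future. If a society's workforce does not grow and its productivity does not grow, yet its total net assets increase, then unless goods or services can be imported from outside, this situation will be hard to sustain.

So growth in total net assets must be backed by growth in the total output of labor and productivity.

For example, suppose today's total assets can buy 10 bunches of bananas and 10 meal services, and ten years from now they can buy 20 bunches of bananas and 20 meal services. Those must be supplied by what society produces at that time. If society's production has not grown, the additional bananas and meal services cannot be delivered. This is in fact a contradiction: if production does not grow, 10 bunches of bananas and 10 meal services will still be 10 bunches of bananas and 10 meal services ten years later.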

9/30/2012

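Mr. Nan Huai-Chin (南懷瑾)

Mr. Nan Huai-Chin left many talks and writings. He read a great many classical texts on Buddhism, Taoism, Zen, the I Ching, and so on, and seems to have mastered some health-preserving practices as well. From his writings, my impression is that he explains a book or an idea in the manner of a storyteller, with his own distinctive insights. Some of these may differ from conventional views, but by and large there is common ground.

Although he had practiced for a very long time, he modestly pointed out things that are easy to overlook in practice and mistakes to watch for. I do not fully understand whether his accounts of reincarnation and other teachings were offered as explanations of the texts or as explanations from the standpoint of someone who had attained realization. In any case, Mr. Nan has now left this world. Perhaps he has gained yet another understanding by now, and perhaps he was not all that surprised.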

Outsourcing and Economy

What has outsourcing done to our economy?  One of the reasons to outsource production or services to other countries is to cut costs.  But an unwelcome consequence is that the original production or service workers become unemployed.  The unemployment may be temporary or permanent, depending on how the worker fares in getting back into the job market.

Once a worker is unemployed, unemployment benefits follow for a certain period of time.  After that, if the worker still cannot find a job, he might apply for some kind of welfare if he has trouble getting by.

So what is the macroeconomic view of outsourcing?  As we know, money does not come from nowhere or vanish into nowhere.  The original payment to the worker is decomposed into: 1. the payment to the new worker (the outsourcee) 2. the social benefits paid to the unemployed worker 3. the extra profit to the outsourcer.  Items 1 and 3 are quite straightforward, but where does item 2 come from?  Don't forget that ours is a humane society.  We don't let our people die of hunger.

Wait a second, who pays for item 2 after all?  The benefit is paid for and shared by all taxpayers, not just the outsourcer.  So when a worker loses his job to outsourcing, not only is his income compromised, but the other employed workers also have to share the pain.  The outsourcer shares some of the pain too, but he also gets his extra profit.

Well, the other workers might benefit from outsourcing because goods might be cheaper, or because they get hired to do the outsourced work.  Really?  How do you verify this?  An easy check is to ask yourself whether you can buy more goods with your income.  If yes, outsourcing is making you richer and better off, because you benefit from it.  If no, outsourcing is making you poorer, and you are actually paying for it.

If outsourcing is actually making average people poorer in a developed society, how can it be fixed?  A simple idea comes from the concept of a self-sustaining society, one that tries to make its own goods.

For example, if the company Xpple sells N Xphones in our society, Xpple should make at least N Xphones in our society.  Xpple is free to make money on the other M Xphones sold to other societies by making those M Xphones anywhere it wants.  That's fair, isn't it?

Well, people might argue that this approach does not work, is not good, or whatever.  The key idea is to make our society sustainable.  A society with high inequality will not sustain itself well in the end.

Is X related to Y?

As investors and speculators, we always want to forecast the future, so we look for correlations between Xs and Y.  The Xs are any kind of historical data: macroeconomic indicators, weather reports, fundamental analysis, technical indicators, and almost anything else you can think of.  Y is what we want to forecast, usually stock prices, commodity prices, or market indices.

So what we usually do is compute the "correlation" between X and Y, written corr(X,Y).  Basically we can compute every corr(Xi,Y).  If corr(Xi,Y) is close to 1 or -1, we might think there is some kind of correlation between Xi and Y, assuming there are enough data points.  What if corr(Xi,Y) is almost zero?  Is there no correlation at all, or no relationship at all?

When we examine all the corr(Xi,Y), we are tempted to pick those with values close to 1 or -1 and use those Xi for more advanced analysis.  We tend to neglect any Xi whose corr(Xi,Y) is close to zero.  Is this approach valid?

Let's look at a simple example in which corr(Xi,Y) is close to zero even though Xi is in fact "correlated" with Y, or a causal link does exist.

Let's say Y=XOR(X1,X2), where X1 and X2 are independent random numbers taking the values -1 and 1 with probability prob(-1)=prob(1)=0.5.  Then corr(X1,Y) and corr(X2,Y) are both zero.  But the relationship between (X1,X2) and Y does exist.
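The XOR example can be checked exactly by enumerating the four equally likely (X1, X2) combinations instead of sampling. In this sketch, XOR on -1/1 values is taken to mean "1 if the inputs differ, else -1". That encoding is an assumption, and the helper names are made up for illustration.

```python
from itertools import product
from math import sqrt


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def xor_pm1(x1, x2):
    """XOR on -1/1 values: 1 when the inputs differ, -1 when they match."""
    return 1 if x1 != x2 else -1


# Each of the four combinations has probability 1/4.
pairs = list(product((-1, 1), repeat=2))
x1 = [a for a, _ in pairs]
x2 = [b for _, b in pairs]
y = [xor_pm1(a, b) for a, b in pairs]
x1_times_x2 = [a * b for a, b in pairs]

if __name__ == "__main__":
    print("corr(X1, Y)    =", corr(x1, y))
    print("corr(X2, Y)    =", corr(x2, y))
    print("corr(X1*X2, Y) =", corr(x1_times_x2, y))
```

Each input alone has zero correlation with Y, while the product X1*X2 has correlation -1 with Y, so the relationship only shows up when the inputs are considered together.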

Most people who know XOR know how this trick works, but in real life it is hard to see how it applies to data analysis.

A weird thing could happen like this.  Suppose we collect all the available inputs, X, to forecast Y, model Y as f(X) where f can be any kind of function, and calculate corr(f(X),Y).  We want to find the best f, one that brings corr(f(X),Y) close to 1, so that we can use f(X) to forecast.  As we know, X might not be complete: there might be some Xu that is unobserved, or ignored because corr(Xu,Y) is close to zero.  And corr(fu(X),Y) might be close to zero for some fu as well.  But suppose that actually Y=XOR(Xu,fu(X)).  If we ignore Xu or fu, there is no chance of finding this relationship.

This means we have to pay attention to those Xu and fu even when corr(Xu,Y) or corr(fu(X),Y) is close to zero.  So can having no correlation (individually) still mean having correlation (jointly)?

How many Xu and fu are there to check?


A Simple Compiler - finding the resources first

Do you want to build a simple compiler?

I had wanted to build a simple compiler ever since my university days, though my major was not CS.  I have tried some simple projects since then, such as implementing LISP AI projects in C-style code.  Basically I had to implement most of the basic LISP functions in C and rewrite the original LISP code as more C-like function calls.  It was not that difficult; it just took a while to figure out the right memory management scheme.


The other project I tried was implementing Prolog-like logic programming in C#.  There is a Japanese educational site that gives most of the implementation details.  Its first chapter is Prolog in Python; if you are not familiar with Prolog, you had better start there.


So what if you want to build a compiler or interpreter for a language like VB or C, or for an object-oriented programming language?  Most textbooks are quite complicated and go deep into compiler theory.  But if you just want to start with a simple one and do it yourself, you don't need to know too much of that detail.


I found that an easy place to start is the Jack compiler described in the book The Elements of Computing Systems, mainly in chapters 9, 10, and 11.  An educational site for the study plan is here.  Some people have implemented it in Java or C#.


If you are interested in open source, you may want to look at the book flex & bison.  So what is flex and what is bison?  They are two great tools for building a compiler.  Basically you only have to define the tokens and grammar of your language, and they generate the lexical analyzer and the parser for you, so you can focus on the code generation part.


So a basic compiler can be built from 3 workers: a tokenizer, a parser, and a code generator.
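To make the three workers concrete, here is a minimal hand-written sketch in Python, without flex or bison. It compiles integer arithmetic expressions into instructions for an imaginary stack machine. The grammar, the instruction names (PUSH, ADD, SUB, MUL, DIV), and all function names are assumptions chosen for illustration, not taken from the Jack compiler or any of the books mentioned.

```python
import re

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def tokenize(src):
    """Worker 1: turn source text into ('num', int) and ('op', char) tokens."""
    tokens = []
    for text in re.findall(r"[0-9]+|\S", src):
        if text[0] in "0123456789":
            tokens.append(("num", int(text)))
        elif text in "+-*/()":
            tokens.append(("op", text))
        else:
            raise SyntaxError(f"unexpected character {text!r}")
    return tokens


class Parser:
    """Worker 2: recursive-descent parser that builds a nested-tuple tree.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.advance()
        if kind == "num":
            return ("num", value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek() != (None, None):
        raise SyntaxError("unexpected trailing input")
    return tree


def generate(node):
    """Worker 3: emit stack-machine code by post-order traversal of the tree."""
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    op, left, right = node
    return generate(left) + generate(right) + [OPCODES[op]]


def compile_expr(src):
    return generate(parse(tokenize(src)))


if __name__ == "__main__":
    for line in compile_expr("1 + 2 * (3 - 4)"):
        print(line)
```

For the input `1 + 2 * (3 - 4)` the sketch emits PUSH 1, PUSH 2, PUSH 3, PUSH 4, SUB, MUL, ADD. A real language adds statements, variables, and types, but the division of labor among the three workers stays the same.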

4/09/2012

$OEXA200R Market Indicator

An article by John F. Carlucci on dshort.com describes a market indicator based on the percentage of OEX stocks above their 200-day moving average, $OEXA200R, used with other indicators as buy/sell signals for the general US market.  The introduction to the methodology is here.

You can find the daily update of $OEXA200R by Michael E. Pitre.

I created an OEX portfolio at finance.yahoo.com and monitor it there.  With the S&P 500 at its current level of 1380, if it drops another 3.5% or more, bulls should be cautious according to the methodology mentioned above.
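For readers who want to track something similar from their own price data, here is a rough sketch of how a "percent of stocks above their moving average" figure could be computed. It assumes a simple moving average that includes the latest close. The data vendor's exact calculation may differ, and the tickers and prices below are made up.

```python
def simple_moving_average(closes, window):
    """Average of the last `window` closes, or None if there are too few."""
    if len(closes) < window:
        return None
    return sum(closes[-window:]) / window


def percent_above_ma(price_history, window=200):
    """Percentage of tickers whose latest close is above their own moving average.

    price_history maps a ticker symbol to its list of closes, oldest first.
    Tickers with fewer than `window` closes are skipped.
    """
    above = counted = 0
    for closes in price_history.values():
        average = simple_moving_average(closes, window)
        if average is None:
            continue
        counted += 1
        if closes[-1] > average:
            above += 1
    if counted == 0:
        raise ValueError("no ticker has enough price history")
    return 100.0 * above / counted


if __name__ == "__main__":
    # Hypothetical tickers with a 3-day window, just to show the mechanics.
    history = {"AAA": [1, 2, 3, 4], "BBB": [4, 3, 2, 1], "CCC": [1, 1]}
    print(percent_above_ma(history, window=3))
```

With real data, `price_history` would hold the daily closes of the OEX constituents and `window` would be left at 200.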

4/08/2012

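Technological Progress, Overcapacity, and Unemployment

As technology advances and efficiency rises, the labor and cost needed to produce a good keep falling. Meanwhile, growth in human demand gradually slows as the economy develops; that is, demand growth slows over the long run. If production efficiency grows faster than demand, the result is overcapacity, which leads companies to lay off workers. Without new demand and new businesses to absorb the unemployed, the unemployment problem is hard to solve.

When the economy contracts, a similar unemployment problem arises from the sudden drop in demand.

On the other hand, if free trade leads companies to move jobs to lower-cost countries abroad, that also causes domestic unemployment. This problem is somewhat like robots replacing human labor.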

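Basic human consumption can be divided into food, clothing, housing, transportation, health care, education, entertainment, and so on. In advanced countries, spending on food, clothing, housing, and transportation generally grows slowly; as living standards improve, people pay more attention to health care, education, and entertainment. In developing countries, most people focus on improving basic consumption of food, clothing, housing, and transportation, while better-off families put more emphasis on health care, education, and entertainment.

Overall, though, raising basic consumption of food, clothing, housing, and transportation is not easy. For example, a person takes in about 2,000 kcal a day on average, and that can hardly grow 5% every year. A family that owns one house can hardly grow that by 5% a year either, and the same goes for transportation. In short, people cannot consume just for the sake of consumption growth.

In advanced countries, food, clothing, housing, and transportation may be hard to increase in quantity but can still improve in quality, though that too has its limits.

Once the problem of quantity is solved, the next problem is quality. Once quality is solved as well, further progress becomes hard. It is like an exam: going from 60 to 70 is relatively easy, but going from 90 to 100 is much harder.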

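Unemployment is a question of a person's contribution to society. Your salary represents the return you receive for the value you contribute to society. Of course, having no job does not mean having no value. Flowers and trees, birds and beasts, and the sun have no jobs either, yet they still have value. It is just that, from a certain point of view, they do not need to work, or rather do not need to work for humans to earn a reward.

Many people have tried to live as self-sufficiently as possible. For example, John Robbins, author of The New Good Life: Living Better Than Ever in an Age of Less, lived a nearly self-sufficient life in Canada with his wife and child. Jacob Lund Fisker, author of Early Retirement Extreme, set out to retire as early as possible, investing the savings from his earlier work and living lightly, although he has recently gone back to work. Still, retiring early and achieving full financial independence takes wisdom and savings.

How an individual or a family solves its livelihood problems is not far from how a country or society solves its own. Blindly pursuing economic growth at the expense of the living environment, family life, and people's health does not mean much.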

4/03/2012

Ed Thorp and Fortune's Formula - Part I

Ed Thorp and co-authors wrote an article, "How does the Fortune's Formula-Kelly capital growth model perform?", published in The Journal of Portfolio Management, Summer 2011.

The Kelly capital growth model is a constantly rebalanced portfolio strategy.  An article by Perold and Sharpe, "Dynamic Strategies for Asset Allocation", compares the pros and cons of different investment strategies.  It gives investors a simple picture of the environments in which rebalancing a portfolio periodically offers an advantage.

From Ed Thorp's point of view, the Kelly strategy is good for long-term investing; investors just have to choose a suitable fraction to control risk.  A full Kelly strategy is very risky in the short term.  Their article shows the results of different scenarios using full, 3/4, 1/2, 1/4, and 1/8 Kelly strategies.  Actually, we don't have to run such simulations to get a sense of how Kelly strategies perform.  A spreadsheet with simple calculations gives a better idea of how Kelly strategies and log utility work.
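The kind of simple calculation a spreadsheet would do can also be sketched in Python. The example assumes a repeated even-money bet won with probability 0.6, which is not a setting taken from the article, and the function names are made up for illustration. For each fraction of full Kelly it reports the expected log growth per bet and the standard deviation of the per-bet log return.

```python
from math import log, sqrt


def kelly_fraction(p):
    """Full Kelly stake for an even-money bet won with probability p."""
    return 2.0 * p - 1.0


def log_growth(f, p):
    """Expected log growth per bet when staking fraction f of wealth."""
    return p * log(1.0 + f) + (1.0 - p) * log(1.0 - f)


def log_volatility(f, p):
    """Standard deviation of the per-bet log return (two-point distribution)."""
    return sqrt(p * (1.0 - p)) * (log(1.0 + f) - log(1.0 - f))


if __name__ == "__main__":
    p = 0.6  # assumed win probability
    full = kelly_fraction(p)
    for multiple in (1.0, 0.75, 0.5, 0.25, 0.125):
        f = multiple * full
        print(f"{multiple:>5} Kelly: stake {f:.3f}, "
              f"growth {log_growth(f, p):.5f}, vol {log_volatility(f, p):.5f}")
```

Under these assumptions, half Kelly keeps roughly three quarters of the full-Kelly growth rate with about half the per-bet volatility, which is one way to see why fractional Kelly is often suggested.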

Investors should know that the performance of an investment is a multidimensional thing.  We have to consider risks, rewards, liquidity, taxes, transaction fees, and so on.  In any case, we need to consider at least two components: risks and rewards.

An interesting point is how to compare (risks, rewards) pairs.  Some economists use utility to describe human behavior such as risk neutrality or risk aversion.  The utility curve over (risks, rewards) differs from investor to investor according to their preferences.  By the way, note that utility is not the only approach, and it is not perfect.

The other thing to consider is what we really mean when we say strategy A is better than strategy B.  Do we mean the expected return of A is higher than the expected return of B?  Do we mean the chance that A's final assets exceed B's is greater than 1/2?  Do we mean the expected growth rate of A's portfolio is higher than that of B's?

3/08/2012

Withdrawal Rate and Investment Return

Here is the derivation of the withdrawal rate of a retirement portfolio.  The initial value of the portfolio is P, the initial withdrawal amount is x, the inflation rate in period i is fi, the real return rate in period i is ri, and the term is n periods.  The withdrawal amount is adjusted each period according to the inflation rate.

The withdrawal rate is x/P.  From the denominator, we can see the relative effect of each real return: r1 > r2 > ... > rn.
[Image: Withdrawal Rate]
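The formula itself is not shown in the text, so the following is only one possible reconstruction under stated assumptions: withdrawals happen at the end of each period, each withdrawal is x scaled by cumulative inflation, the nominal return is (1+fi)(1+ri)-1, and the portfolio is exactly used up after n withdrawals. Under those assumptions inflation cancels out and x/P = 1 / ( 1/(1+r1) + 1/((1+r1)(1+r2)) + ... + 1/((1+r1)...(1+rn)) ). The function names are hypothetical.

```python
def withdrawal_rate(real_returns):
    """x/P that exhausts the portfolio after the last withdrawal.

    Assumes end-of-period, inflation-adjusted withdrawals (see lead-in).
    """
    denominator = 0.0
    discount = 1.0
    for r in real_returns:
        discount /= 1.0 + r        # 1 / ((1+r1)...(1+ri))
        denominator += discount
    return 1.0 / denominator


def final_balance(p, x, inflation, real_returns):
    """Simulate the portfolio in nominal terms to check the formula."""
    balance, withdrawal = p, x
    for f, r in zip(inflation, real_returns):
        balance *= (1.0 + f) * (1.0 + r)   # nominal growth over the period
        withdrawal *= 1.0 + f              # inflation-adjusted withdrawal
        balance -= withdrawal
    return balance


if __name__ == "__main__":
    returns = [0.03] * 30        # assumed real returns
    inflation = [0.02] * 30      # assumed inflation rates
    rate = withdrawal_rate(returns)
    print(f"withdrawal rate: {rate:.4f}")
    print(f"final balance:   {final_balance(1.0, rate, inflation, returns):.2e}")
    # The same two returns in a different order give different rates:
    print(withdrawal_rate([0.05, 0.0]), withdrawal_rate([0.0, 0.05]))
```

Because r1 appears in every term of the denominator and rn only in the last, an early return moves the sustainable rate more than the same return received later. That is consistent with the ordering r1 > r2 > ... > rn noted in the text.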

Kelly Betting and Logarithmic Utility

Let l be the proportional betting factor, with n bets, each having outcomes (W,L) with probabilities (p,q).
From the derivation, we can see that there is no need to set n to infinity.
[Image: Kelly Betting and Logarithmic Utility]
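The derivation is not reproduced in the text. A possible reconstruction is sketched below, assuming W and L denote the net payoff per unit staked on a win and on a loss (so L is negative), that each bet stakes a fraction l of current wealth, and that V_0 and V_n (symbols introduced here) are the wealth before and after n bets with k wins.

```latex
\begin{aligned}
V_n &= V_0\,(1+lW)^{k}\,(1+lL)^{\,n-k}, \qquad k \sim \mathrm{Binomial}(n,p),\\[4pt]
\mathbb{E}\!\left[\log\frac{V_n}{V_0}\right]
  &= \mathbb{E}[k]\,\log(1+lW) + \bigl(n-\mathbb{E}[k]\bigr)\log(1+lL)
   = n\,\bigl[\,p\log(1+lW) + q\log(1+lL)\,\bigr],\\[4pt]
0 &= \frac{pW}{1+lW} + \frac{qL}{1+lL}
  \;\Longrightarrow\;
  l^{*} = -\,\frac{pW+qL}{WL}.
\end{aligned}
```

The factor n multiplies the whole expression, so the maximizing l is the same for every n. For an even-money bet (W = 1, L = -1) this gives l* = p - q.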

3/06/2012

Kelly Criterion and Betting

The Kelly Criterion is a betting method mentioned in a lot of articles.  One of its famous proponents is Ed Thorp, who wrote several articles about how to use the Kelly criterion in betting and stock investing.

But does Kelly betting really work or is it just an illusion?

The major problem with Kelly betting is that the volatility is large, so people sometimes recommend half Kelly.  Not many articles analyze the "Risk and Rewards" of the Kelly Criterion.  Usually they use Monte Carlo simulation to show the results of Kelly betting, half Kelly betting, or 0.25 Kelly betting.

I don't intend to challenge the correctness of Kelly's formula.  I just want to use simple probability to show the relationship between fractional betting and Kelly betting.

Let's say there are two betting outcomes, W and L, with probabilities (p,q), where p + q = 1.  We make N bets.

If N=2, the possible outcomes would be (WW,WL,LW,LL) with probabilities (pp,pq,qp,qq).  If the sequence doesn't matter, the outcomes would be (WW, WL, LL) with probabilities (pp,2pq,qq).

What's the expected value of the outcomes?  It's the payoff of each outcome multiplied by its probability, summed over all outcomes.

What is Kelly Criterion trying to do anyway?

Let's say N=1,000, p=60%, q=40%.  The Kelly Criterion is trying to maximize the payoff of the single outcome (W*600, L*400), and that is where the factor f for proportional betting comes from.

But the truth is that there are 1001 different outcomes, from W*1000, (W*999, L), ..., to (W, L*999) and L*1000.  The real expected value of N=1000 bets with factor f is not what the Kelly Criterion shows.  That's why authors can never explain the results of their Monte Carlo simulations well.  You can easily spot something wrong in the data.

If Wp+Lq is favorable, it is always best to choose f=1 to maximize the expected value of proportional betting, but the volatility will be very large as well.  So the key is to choose the right f to give an appropriate balance of risks and rewards.
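The difference between the expected value over all N+1 outcomes and the single "typical" outcome can be computed directly from the binomial distribution. This sketch assumes an even-money bet (win or lose the staked fraction f of current wealth) and a starting wealth of 1. The function names are made up for illustration.

```python
from math import comb


def outcome_distribution(n, p, f):
    """(probability, final wealth) for each number of wins k = 0..n."""
    q = 1.0 - p
    return [
        (comb(n, k) * p**k * q**(n - k), (1.0 + f)**k * (1.0 - f)**(n - k))
        for k in range(n + 1)
    ]


def expected_wealth(n, p, f):
    """Probability-weighted average of final wealth over all n+1 outcomes."""
    return sum(prob * wealth for prob, wealth in outcome_distribution(n, p, f))


def median_wealth(n, p, f):
    """Final wealth at the median number of wins.

    Assumes 0 <= f <= 1, so wealth never decreases as wins increase.
    """
    cumulative = 0.0
    wealth = 0.0
    for prob, wealth in outcome_distribution(n, p, f):
        cumulative += prob
        if cumulative >= 0.5:
            break
    return wealth


if __name__ == "__main__":
    n, p = 1000, 0.6
    for f in (0.1, 0.2, 0.5, 1.0):
        print(f"f={f}: expected {expected_wealth(n, p, f):.3e}, "
              f"median {median_wealth(n, p, f):.3e}")
```

Under these assumptions the expected value keeps rising all the way to f=1, while the median outcome peaks near f = p - q = 0.2 and collapses to zero at f=1. Which of the two matters more is exactly the risk and reward question raised above.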

2/20/2012

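The National Debt Problem: Too Big to Fail?

The major economies of Europe, the US, and Japan are all deep in debt. Economists usually like to use the debt-to-GDP ratio as the reference measure. This is of course only one indicator, and whether it is accurate is another matter. Personally I think the ratio of debt to national assets is a better reference, just as in running a business one usually looks at the ratio of assets to liabilities. If the comparison is to be with GDP, I think the ratio of interest payments on the debt to GDP is more appropriate.

By the debt-to-GDP ratio, Japan is on the high side at about 200%. But by the interest-payment-to-GDP ratio, Japan is at only 1.1%, which is actually low.

But how is national debt repaid? Does it really need to be repaid?

Repaying national debt is possible only when the government runs a budget surplus. Take the US: over several decades it has run a surplus in only a few years, and has basically relied on inflation to bring down the debt-to-GDP ratio. That is probably the only way.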

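Japan has long been stuck in the fog of deflation, so lowering its debt-to-GDP ratio is all the harder. Fortunately, Japan's debt is denominated in yen, and Japan's central bank will lend a hand when necessary. Also, with the economy sluggish, it is hard to find an investment safer than government bonds. Japanese stocks and real estate are hardly attractive targets, so the money has nowhere else to go.

Greece is in the eurozone but cannot print euros at will. Moreover, money borrowed through government debt was paid out to citizens, who then deposited it abroad. So once trouble hit, Greece could not borrow domestically and had to rely on outside aid.

In fact, a problem like Greece's could well arise in US state and local governments. However, state residents get a tax exemption on bonds issued by their own state, so residents are less likely to use their money to buy other states' bonds or federal debt, unless the default risk is really high.

Europe's sovereign debt problem is much more complicated than elsewhere, because within the eurozone euros flow to the countries in better financial shape, making it relatively hard for countries in poor financial shape to borrow. If a country also lacks competitiveness, repaying debt is all the harder. Under global competition, a eurozone country that wants to attract euros probably needs a trade surplus or net financial inflows. Eurozone countries with that ability should have been relatively stable financially, but their banks attracted too many euros and lent the money to countries with weak economies, so they too are exposed to the risk of default. So the problem will not be easy to solve quickly, and austerity alone will not solve it.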