Category Archives: Geeky

Book Review: “How Not To Be Wrong: The Power of Mathematical Thinking” by Jordan Ellenberg

This is another book in which a mathematician attempts to make sense of real-world problems using mathematics. It’s supposed to answer the question people, including his students, keep asking: when will we ever use this stuff? As an engineer, I have benefited a great deal from mathematics. Without it, my life could be very miserable and the world would be in very different shape than it is now. But this book is not for the faint of heart – unless you’re curious about some of the topics, it could be overwhelming and hard to digest.

My takeaways from this book:

The missing bullet holes. The focus of the book is on the profound/simple quadrant.

I. Linearity:
when things are not linear, there’s a min/max – like the Laffer curve on a napkin.
a. Linear regression – each extra SAT point could cost you $28 in tuition.
b. Don’t always extrapolate linearly – obesity apocalypse (100% obese).
c. Law of large numbers: the share of heads converges to 50% for coin tosses as the number of tries goes up. The NBA players with the best free-throw percentages are those who played the fewest games – small numbers.
d. A large number of new tries dilutes the previous results – it does not change the probability. Very important lesson.
e. Don’t talk about % of numbers when numbers can be negative.
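Point d above can be illustrated with a small sketch. It assumes a hypothetical opening streak of 10 heads followed by later tosses that come out exactly half heads (the long-run average); the function name and the numbers are made up for illustration and are not from the book.

```python
from fractions import Fraction


def share_of_heads(streak_heads: int, later_tosses: int) -> Fraction:
    """Overall share of heads after a streak of heads plus perfectly average tosses."""
    # Assumption: the later tosses land heads exactly half the time.
    # The coin never "makes up" for the streak; the streak just gets diluted.
    heads = streak_heads + Fraction(later_tosses, 2)
    return heads / (streak_heads + later_tosses)


for later in (10, 90, 990, 99_990):
    print(later, float(share_of_heads(10, later)))
```

The share drifts toward 50% only because the early streak becomes a smaller part of the total, not because tails become more likely.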

II. Inference:
a. The Baltimore broker: They send you correct stock predictions by process of elimination. By repeatedly trimming from the mailing list the people who received an incorrect prediction, they ensure that everyone remaining has seen only correct predictions. Those people then have confidence in the broker and send in their money.
b. Reductio Ad Unlikely: Suppose the null hypothesis H is true. It follows from H that a certain outcome O is very improbable (< 5%), but O is actually observed. Therefore, H is very improbable. Bible coders.

III. Expectation:
a. Massachusetts State lottery: expected value should be average value. Playing the WinFall.
b. Utility: maximize the utility vs. missing the plane. Stigler’s argument: “If you never miss the plane, you’re spending too much time in airports.”
c. Tying geometry to picking the “random” lottery numbers, and Hamming code.

IV. Regression:
a. Triumph of mediocrity. Scatter plot of father-son height (oval shape).
b. Correlation is not transitive (e.g. blood relation).
c. Berkson’s fallacy: mean-nice vs. ugly-handsome curve.

V. Existence:
a. Public opinion doesn’t exist.
b. Bush/Gore/Nader election: how best to elect public officials when there are three or more candidates.
c. Condorcet paradoxes.
d. How to be right.

General Comments:
1. An eBook or hardcopy is probably better than the audiobook. It’s easier to visualize on a physical page.
2. Good history of mathematicians and of how some of the theorems came about.
3. Not for the faint of heart. Some mathematics, or at least an interest in it, is required.
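The Baltimore broker scheme from II.a above can be sketched in a few lines of Python. The starting list size and the number of weeks are illustrative assumptions, and the function name is hypothetical; the point is only that halving a big enough list leaves a handful of people who have seen nothing but correct calls.

```python
def surviving_recipients(initial_list: int, weeks: int) -> list[int]:
    """Return how many recipients have seen only correct picks after each week."""
    remaining = initial_list
    history = []
    for _ in range(weeks):
        # Half the list is told "up" and half "down"; only the half that
        # happened to receive the correct call stays on the mailing list.
        remaining //= 2
        history.append(remaining)
    return history


history = surviving_recipients(10_240, 10)
print(history)
```

From the point of view of the last few recipients, the streak looks like skill, even though no prediction ability is involved at all.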

Don’t Try to Predict the Future; Be a Now-ist

Joi Ito strikes a chord with me and with the central theme of this blog: how to deal with the complex world we’re living in today by “having a compass and know where you’re going” and then “focus on being connected, always learning, fully aware, and super present.” Don’t try to predict the future. Experiment, learn and improve continuously. Be a “now-ist.”

This is one of the great TED talk videos:

Say Hello and Goodbye to My Acer Chromebook – Why I’m Ditching It

Lately, I’ve been hearing so much about Chromebooks that I even played with one at Best Buy (thanks to their “Showrooming” ;-). When the price dropped below $200 for a brand new Acer C720, I decided to get one to play with. I read all about its limitations and the potential for expanded capability after installing Ubuntu through the chroot (Crouton) and dual-boot (ChrUbuntu) methods. I have used an Enterprise Linux version at work and wanted to try a “client” Linux like Ubuntu for the fun of it. Because I knew I wanted to play with Ubuntu Linux, I decided to buy a used 32GB (vs. standard 16GB) flash storage version from eBay for $169. By the way, 32GB is needed if you want to run ChrUbuntu, since a full Ubuntu would probably need 24GB on its own if you plan to keep Chrome OS as a dual-boot.

Since I received the Chromebook on 6/27/14, I have spent numerous hours playing with it and hacking it. My impression of Chrome OS is that, due to its limitations, it’s really for consumers of digital content. Here are the pluses and minuses:
Pluses:
1) Fast boot and wake up from sleep (almost instantaneous like a tablet).
2) Excellent browsing experience. The two-finger scrolling and three-finger flipping through the browser tabs are nice. Of course, having a physical keyboard makes a big difference compared to browsing on an iPad. And the trackpad on the Acer C720 works really well, with sufficient immunity from accidental palm touches.
3) Most apps are responsive and fast, thanks to the Intel Haswell Celeron 2955U 1.4 GHz CPU, 2GB of DRAM and the 32GB flash drive.
4) Seamless integration with Chromecast. (More on Chromecast on a later blog).

Minuses:
1) No email clients like Thunderbird, only webmail like Gmail, which works very well but is not good enough for work-related email.
2) No OpenOffice.
3) No IPsec VPN (the one my employer uses)
4) Can’t run Java apps. Java plugins cannot be installed. I sensed some bad blood between Google and Oracle such that Google refused to put Java on Chromebook at the time of this writing.

If you don’t plan to do lots of email and mainly browse the web and run only Chrome Apps, then a Chromebook may be the right laptop for you as a supplement to your tablets. Since minuses #1 and #3 are show stoppers for me, I needed to add Linux to mitigate them. There are two options: chroot (Crouton) and ChrUbuntu.

I first installed the chroot (Crouton) to run Ubuntu alongside Chrome OS in “Developer” mode. This was ideal, as I would have the best of both worlds: Chrome OS and Ubuntu, switching back and forth with simple Ctrl-Alt keystrokes. The only problem was that I couldn’t install a VPN properly on it – neither Cisco AnyConnect nor OpenConnect. I suspect that Chrome OS, running in parallel, may have been causing conflicts. I gave up on it after 3 days of intensive hacking.

Then I decided to install Ubuntu as a dual-boot partition. I followed the installation directions here. After a couple of hours of downloading and installation, I was able to boot into Ubuntu and install the Cisco AnyConnect VPN. I was now in business.

Then after playing with the Ubuntu on Chromebook, I discovered a few quirks that really got me to wonder why I bothered with Ubuntu.
1. Some system settings don’t work right, like disabling the touchpad while typing to avoid cursor movement. I had to type in a command to enable it manually (“/usr/bin/syndaemon -i 1.5 -K -d”) and I had a hard time adding it as an autostart service.
2. Locale issue: Constant “Locale” warnings popped up when running a shell. I fixed it with this locale tip.
3. Font sizes: I had a hard time fixing the font sizes on Ubuntu. They were either too small or too big.
4. The touchpad stopped working after I accidentally disabled it, and I wasn’t able to bring back the driver. Sigh! (This was fixed by following the directions in this link).

So in the end, I started to miss and appreciate Microsoft’s Windows 7 or even Windows 8. With Ubuntu on the Chromebook, I would be wasting lots of my time fixing minor Linux issues that seem to pop up here and there, unless I reverted it back to the standard Chrome OS Chromebook, which would not do better than an iPad or an Android tablet. Then I looked around and found that for an extra $20~30, I could have bought a cheap full Windows laptop instead!

My conclusion is that, at this time, Chrome OS Chromebook and Ubuntu Chromebook are not ready for prime time. I like the pluses but the minuses are too great for me to ignore. I will most likely be reselling my Chromebook on eBay…

Installing Thunderbird on Oracle Linux or Redhat Linux

Geek alert! If you don’t do system administration on a server, this is probably not very interesting to you. But if you’re curious about installing the Thunderbird email client on a Linux Server and how it’s different from your regular click-and-install on a Windows or Mac PC, then you may want to read on.

Oracle Linux is a free operating system that anyone can download to a server or PC to gain instant control of a very powerful server platform, the kind commonly used for data processing on the Internet. You use such servers when you shop on Amazon, search on Google, use Google Drive, etc. So what’s the big deal about installing Thunderbird on a server?

Well, installing software on a Linux system is NOT straightforward. This is why Microsoft Windows remains the gorilla in the PC world, albeit a declining one. Several times over a span of months, I attempted to download Thunderbird directly from http://www.mozilla.org/en-US/thunderbird/ and install it on the Oracle Linux server that I manage, and I kept running into all kinds of problems. For example, after installation, Thunderbird would not run, complaining of “libxul.so: cannot open shared object file: No such file or directory Couldn’t load XPCOM.” And then the software would just die. I tried on several machines and kept running into the same problem. Based on my Google search, many people are having the exact same problem.

After googling around and looking for a solution for over a month, I discovered that the standard Thunderbird download site contains a simple Linux version of Thunderbird, intended mostly for client-based Linux like Fedora. (Yes, there are many Linux variants: those for PCs/clients and those for servers.) But it would not work properly on an “Enterprise”-class Linux variant like Oracle Linux. Here is what worked for me:

1. You’d need to do a software update. This is done with the “yum update” command, which brings all the software modules up to date. This Oracle Public Yum site helped me.

2. Next, download the latest Thunderbird from this website using your favorite browser.

3. Find the rpm module from this findrpm website instead of from the Thunderbird download site. Keep in mind that you’d need to look for the CentOS version, as it’s mostly compatible with Oracle Linux. See the screenshot below:
[Screenshot: findrpm]

4. Download the correct version:
[Screenshot: centos_rpm]
Look for the version matching your OS. I had Oracle Linux 5.10, so I looked for the CentOS 5.10 package: the “x86_64” version if you have a 64-bit system, otherwise the “i386” version.

5. Install with “rpm -i Thunderbird-xxxxx.rpm”. You can try to run it but for the Oracle Linux 5 version, there is another rpm, Launchmail, to install before it’s fully functional.

6. Download and install “Launchmail”:
[Screenshot: centos_rpm]

There you have it. You can now run the Thunderbird from /usr/bin/thunderbird .

Unlike the PC or Mac OS version, Thunderbird for Enterprise-class Linux is a pain to install. No wonder people stick with Microsoft and MacOS, given the ease of installation and numerous other ease-of-use reasons. Hopefully, this tip helps. I sure wish someone else had written this before I spent many hours getting Thunderbird to work. However, I got to understand how fragmented Linux apps are. There were many details I wish I didn’t have to deal with. It should be just double-click and install automatically. Until that happens, Linux is not going to catch on for most people.

Bayesian Theorem for the Practical Thinkers

I have been reading up on Bayes’ Theorem, as you might have noticed from my past blog posts. What really gets me is how hard it is for people without a degree in Statistics to think about it in practical terms without resorting to complicated math. Because Bayes’ Theorem (Bayesian inference) is so useful in our daily lives, I would like to share my shortcut so people can calculate the probability using a simple 10-key calculator instead of a computer.

The shortcut is to always think in terms of odds instead of probability. The power of Bayes’ Theorem is that it takes the base rate and, after some new evidence is provided, gives the modified rate.

The best way to learn this is to use some examples:

Example 1:
From this blog, here is an example:

1% of women have breast cancer (and therefore 99% do not).
80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).
What’s the probability of having breast cancer after testing positive on a mammogram?

The trick is to think of the base odds of having breast cancer:
1% to 99% = 1 to 99
Now think of the odds provided by the evidence, a positive mammogram:
80% to 9.6% = 8.33 to 1

So the odds of having breast cancer after testing positive on a mammogram are:
1/99 * 8.33/1 = 8.33/99 = 0.084, or “odds of 0.084 to 1”

Now convert the odds to a probability (if probability is what you’re looking for):
odds of 0.084 to 1 = 0.084/(1+0.084) = 7.8%
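The calculator steps in Example 1 can also be written as a short Python sketch. The function and parameter names are my own, chosen for illustration; the numbers are the ones quoted in the example.

```python
def posterior_probability(prior_prob: float,
                          p_evidence_if_true: float,
                          p_evidence_if_false: float) -> float:
    """Odds-form Bayes update: base odds x evidence odds, converted back to a probability."""
    prior_odds = prior_prob / (1 - prior_prob)                  # 1% to 99%
    likelihood_ratio = p_evidence_if_true / p_evidence_if_false  # 80% to 9.6%
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)                # odds -> probability


p = posterior_probability(0.01, 0.80, 0.096)
print(f"{p:.1%}")  # prints 7.8%
```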

Example 2:
Allen Downey, my favorite Bayesian Statistics author and professor, has this example in his blog:
Elvis Presley had a twin brother who died at birth. What is the probability that Elvis was an identical twin?

You need the following facts:
”Twins are estimated to be approximately 1.9% of the world population, with monozygotic twins making up 0.2% of the total—and 8% of all twins.”

The odds of identical twins to fraternal twins are:
8% of twins are identical twins
92% of twins are fraternal twins
So the odds of identical twins are 8% to 92%, or 0.087 to 1

Now there is another piece of information we must take into account: it’s a twin brother. A same-sex twin increases the odds that his brother was an identical twin. By how much? By 2:1, because an identical twin of Elvis is always a brother, while a fraternal twin is a brother only about half the time (the other half, a sister).

So the odds of Elvis’s brother being an identical twin are:
0.087:1 x 2:1 = 0.174:1
Converted to probability => 0.174/(1+0.174) = 15%
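To check the rounding in Example 2, here is a minimal sketch using exact fractions. It assumes, as the example does, that a fraternal twin is equally likely to be a brother or a sister; the variable names are illustrative only.

```python
from fractions import Fraction

# Base odds of identical to fraternal twins: 8% to 92%.
prior_odds = Fraction(8, 92)

# Evidence odds: P(brother | identical) = 1, P(brother | fraternal) = 1/2 (assumed).
likelihood_ratio = Fraction(1) / Fraction(1, 2)

posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)

print(posterior_odds, posterior_prob, float(posterior_prob))
```

Done exactly, the odds come out to 4:23 and the probability to 4/27, which rounds to the 15% above.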

Example 3:
Let’s do a final example from Example 5 of Allen Downey’s blog:
According to the CDC, “Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely.”
If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?

Women who smoke are 13 times more likely to get lung cancer than women who don’t, so the odds provided by the evidence are 13 to 1 (13:1). Now we need the base odds of women who smoke to women who don’t. From the blog, 17.9% of women smoke, so the odds are 17.9% to 82.1%, or 0.218:1
So the odds that the woman is a smoker are:
13:1 x 0.218:1 = 2.83:1 => 2.83/(1+2.83) = 74%
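One may wonder why Example 3 works without knowing how common lung cancer actually is. The sketch below uses the standard probability form of Bayes’ Theorem with a made-up baseline risk for non-smokers (the `baseline_risk` values are purely hypothetical) and shows that the answer does not depend on it, because the baseline cancels out.

```python
def prob_smoker_given_cancer(p_smoker: float,
                             relative_risk: float,
                             baseline_risk: float) -> float:
    """P(smoker | cancer) via the probability form of Bayes' Theorem."""
    p_cancer_if_smoker = relative_risk * baseline_risk
    p_cancer_if_nonsmoker = baseline_risk
    numerator = p_smoker * p_cancer_if_smoker
    # P(cancer) is the normalizing factor: cancer among smokers plus non-smokers.
    p_cancer = numerator + (1 - p_smoker) * p_cancer_if_nonsmoker
    return numerator / p_cancer


# Two arbitrary, hypothetical baseline risks give the same posterior.
for baseline in (0.001, 0.01):
    print(baseline, round(prob_smoker_given_cancer(0.179, 13, baseline), 3))
```

This is why the odds shortcut only needs the ratio (13:1) and the base odds of smoking.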

Now isn’t that more intuitive and practical? Go out and apply Bayes’ Theorem at a party to impress people.
Or maybe you want to calculate your chance of meeting a single woman if you were a single man. Let’s say you’re going to a party where the guests are 25% female and 75% male, and 20% of the women are single while 80% are not single or unavailable. No new evidence is being weighed here, so the fractions simply multiply: the chance that a random guest is a single woman is 25% x 20% = 5%, or odds of 1:19. Suppose you’re highly selective and possess a prince charming quality, so that only 1 in 10 single women qualifies as desirable. Now your chance just dropped to 0.5%, or 1 in 200. If the party attendance is going to be > 200 people, it’s worth a shot. Otherwise, you might as well stay home and watch a sports game instead 😉

Book Review: “Think Bayes” by Allen Downey

I vaguely remember learning about Bayes’ Theorem in college. Until I read this book, I never thought there were so many applications of Bayes’ Theorem in our daily life. Drug companies use it to get FDA approvals on drugs. Food companies apply the theorem to “prove” certain foods are good for us. In a way, the theorem allows us to change our perception of the probability of certain events occurring. I can imagine more applications than the ones he outlines in the book. By the way, the PDF of this book can be freely downloaded here, thanks to the author’s generosity.

Because of this book and his Youtube videos, I got interested in re-learning Python as a practical/useful programming language. It opened my eyes to the wonderful world of Python and its ecosystem.

The book gets pretty technical fast after the first chapter, which piqued my interest with the Cookie, M&M, and Monty Hall problems. In the end, I don’t know if it’s really necessary to know all the various applications in prediction and simulation, but it could come in handy for people who work in the statistical field.

The key thing to remember is that Bayes’ Theorem gives a way to update the prior probability of a hypothesis, H, to a posterior probability after learning some new piece of data, D. The equation is simple: P(H|D)=P(H)*P(D|H)/P(D)
As it turns out, the most difficult part is P(D), the normalizing factor, which is the probability of seeing the data under any hypothesis at all.
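To make the role of P(D) concrete, here is a minimal sketch of the cookie problem. The bowl contents are an assumption on my part (Bowl 1 with 30 vanilla and 10 chocolate cookies, Bowl 2 with 20 of each, as the problem is commonly stated), the `posterior` helper is hypothetical, and this is not the code from the book.

```python
def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Apply P(H|D) = P(H) * P(D|H) / P(D) to every hypothesis at once."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # P(D): the probability of the data under any hypothesis at all.
    p_data = sum(unnormalized.values())
    return {h: value / p_data for h, value in unnormalized.items()}


# Data D: a vanilla cookie was drawn from a bowl picked at random.
priors = {"Bowl 1": 0.5, "Bowl 2": 0.5}
likelihoods = {"Bowl 1": 30 / 40, "Bowl 2": 20 / 40}  # assumed bowl contents

result = posterior(priors, likelihoods)
print(result)
```

When the hypotheses are mutually exclusive and cover every possibility, P(D) is just the sum of the numerators, which is why the helper never needs it as an input.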

Overall, all the concepts presented by the author make sense to me. I’m not sure I can actually implement and devise the models when the actual problem arrives. But at least I can recognize the Bayesian problem when I see one and know where to find help.

Chapter Summary:
1. Bayes’s Theorem was derived easily and discussed. He introduced the probability calculator of getting a heart attack.

2. Computational Statistics: The author provides a set of Python code/tools to calculate the results quickly. I had to take a Python refresher course before getting any further. But the tools are powerful and helped me understand the subject better. All the code is downloadable from the thinkbayes.com website. It took me a while to figure out the Monty Hall problem – one of those paradoxes that are hard to let sink in.

3. Estimation: This chapter estimates the posterior probability of which die was rolled from a set of multi-faced dice (4-, 6-, 8-, 12- and 20-sided) after one roll and after successive rolls (more data points change the probability distribution), and the probable number of locomotives given the observation of one locomotive number. This is when you really need to have the computer do the work for you. The intuitive part is that the more information you have, the better your estimation is going to be, as some hypotheses get ruled out. For example, rolling a 6 eliminates the 4-sided die.

4. More Estimation: In this chapter, the author covers the “biased” Euro coin problem. The interesting phenomenon is that the prior makes little difference (provided you don’t rule things out with 0 probability): the posterior distributions will likely converge with more data points.

5. Odds and Addends: The author covers the “odds” form of Bayes’ Theorem, o(A|D)=o(A)*P(D|A)/P(D|B), which is probably easier to understand for most people. The Addends part of the chapter goes into getting the density and distribution functions.

6. Decision Analysis: Given a prior distribution, what’s the best decision to make given some data? The author presents a Price Is Right scenario and solves for the best price to bid in a Showcase Showdown, based on past data and a best guess to adjust the bid. The PDF (Probability Density Function) is introduced here, and KDE (Kernel Density Estimation) is used to smooth a PDF that fits the data. This is an interesting application of Bayes’ Theorem. The math is complicated and couldn’t be done without a computer. I guess it would be good to bring your computer to the Price Is Right game.

7. Prediction: Here the author presents a better way to predict the outcome of a playoff game based on past data between two teams (Boston Bruins vs. Vancouver Canucks). This should be no different from how the gambling industry computes the odds before a big game. Just imagine all the gambling earnings you could’ve made by mastering this chapter! I’m sure someone has applied the same idea/theory to financial markets like stocks, bonds and futures.

8. Observer Bias: This is another angle on predicting an outcome (like the wait time for the next train at a given time), taking observer bias into account.

9. Two Dimensions: Using the paintball example, the author applies the Bayesian framework to a two-dimensional problem. The chapter also introduces the joint distribution, marginal distribution, and conditional distribution. As if the one-dimensional Bayesian framework weren’t confusing enough…

10. Approximate Bayesian Computation (ABC): This is for when the likelihood of any particular dataset is 1) very small, 2) expensive to compute, or 3) not really what we want. We don’t care about the likelihood of seeing the exact dataset we saw, but about any dataset like it.

11. Hypothesis Testing: The Bayes Factor, the ratio of the likelihood of a new scenario to that of the baseline, can be used to test the likelihood of a particular hypothesis, e.g. whether a Euro coin is fair or biased. A Bayes factor of 1~3 is barely worth mentioning, 3~10 is substantial, 10~30 strong, 30~100 very strong, >100 decisive.

12. Evidence: Test for the strength of evidence. “How strong is the evidence that Alice is better than Bob, given their SAT score?”

13. Simulation: Simulate the tumor growth rate based on the prior growth rate and data points such as current tumor size, age, etc.

14. A Hierarchical Model: reflects the structure of the system, with causes at the top and effects at the bottom. Instead of solving the “forward” problem, we can reverse engineer the distribution of the parameters given the data. The Geiger counter problem demonstrates the connection between causation and hierarchical modeling.

15. Dealing with Dimensions: This last chapter combines all of the lessons so far and applies to the “Belly Button Bacteria” prediction and simulation. This is a very difficult chapter to understand and probably requires the full understanding of the previous lessons.
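As an illustration of the dice estimation described under chapter 3 above, here is a minimal sketch using exact fractions. It assumes each die is equally likely to have been picked before any roll is seen; the function is my own illustration, not the code that ships with the book.

```python
from fractions import Fraction


def dice_posterior(dice_sides, rolls):
    """Posterior probability of each die after observing the given rolls."""
    # Assumption: uniform prior, every die equally likely to start with.
    weights = {sides: Fraction(1) for sides in dice_sides}
    for roll in rolls:
        for sides in weights:
            # A die with fewer sides than the roll is ruled out (likelihood 0);
            # otherwise each face has likelihood 1/sides.
            weights[sides] *= Fraction(1, sides) if roll <= sides else 0
    total = sum(weights.values())  # the normalizing factor P(D)
    return {sides: weight / total for sides, weight in weights.items()}


after_one_roll = dice_posterior([4, 6, 8, 12, 20], [6])
for sides, prob in after_one_roll.items():
    print(sides, prob, round(float(prob), 3))
```

Rolling a 6 drops the 4-sided die to zero and makes the 6-sided die the most likely candidate. Passing a longer list of rolls shows how each extra data point reshapes the distribution.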


Corrupted Internet Explorer 11 Files – How I Fixed It

While fixing some Windows 7 system issues (such as Internet Explorer or Quicken hanging when run), I ran the “sfc /scannow” command many times. Each time, it would complain that there were corrupted files that could not be fixed. (“Windows Resource Protection found corrupt files but was unable to fix some of them.”)

After looking into the log file residing in c:\windows\Logs\CBS\CBS.log , I noticed that the majority of the corrupted files were related to Internet Explorer 11. Then I discovered a little “Search Protect” icon showing up in the task bar. Upon further search, I concluded that this was malware from “Conduit”. I suspected this was what had caused Internet Explorer 11 to be corrupted.
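CBS.log is long, so it helps to filter it. The sketch below assumes that System File Checker entries are tagged with “[SR]” and that unrepairable files are reported with the words “Cannot repair”; the helper name is hypothetical and the sample lines are made-up placeholders, not real log output.

```python
def summarize_sfc_entries(lines):
    """Return (all SFC lines, SFC lines that report files it could not repair)."""
    # Assumption: sfc tags its entries in CBS.log with "[SR]".
    sr_lines = [line.rstrip("\n") for line in lines if "[SR]" in line]
    unrepaired = [line for line in sr_lines if "cannot repair" in line.lower()]
    return sr_lines, unrepaired


# Made-up placeholder lines, for illustration only.
sample = [
    "placeholder entry A [SR] Verifying components\n",
    "placeholder entry B [SR] Cannot repair member file (illustrative)\n",
    "placeholder entry C unrelated servicing message\n",
]
sr_lines, unrepaired = summarize_sfc_entries(sample)
print(len(sr_lines), len(unrepaired))

# On a real system, something like this would feed in the actual log:
# with open(r"C:\Windows\Logs\CBS\CBS.log", encoding="utf-8", errors="replace") as log:
#     sr_lines, unrepaired = summarize_sfc_entries(log)
```

Scanning the unrepaired lines for recurring file names is how a pattern like “mostly Internet Explorer 11 files” would show up.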

First, I needed to get rid of this malware. Based on the recommendations from my Google search, I downloaded JRT (Junkware Removal Tool) and proceeded to remove “Search Protect” from my system. Well, JRT didn’t quite remove it from the auto-start programs, so I had to remove it manually using Microsoft’s Autoruns. This was the only way to get rid of the annoying warning message, shown every time I logged into Windows 7, that it couldn’t find “backgroundcontainer.dll” (already removed by JRT).

Since Internet Explorer 11 was the most up-to-date Internet Explorer, there was no newer update to overwrite it. I even tried downloading directly from Microsoft, but the official download site still offered an older version. So I decided to uninstall it, which was not a trivial task since Internet Explorer is integrated into Windows 7. Based on this recommendation, I needed to deselect Internet Explorer 11 in Windows Features (Start -> Control Panel -> Programs and Features -> select “Turn Windows features on or off” on the left panel -> deselect “Internet Explorer 11”). Then uninstall the Internet Explorer 11 update (Start -> Control Panel -> Programs and Features -> click “View installed updates” on the left panel -> search for “Internet Explorer” -> right-click Windows Internet Explorer 11 and choose Uninstall). After these steps and a reboot, the previous Internet Explorer (in my case IE 9) became the Internet Explorer app.

After running “sfc /scannow” a few more times and rebooting a few more times, I was able to run Internet Explorer 9 without any problem, and my Quicken app was finally able to run without crashing. Evidently, Quicken uses a Microsoft framework that is tightly integrated with Internet Explorer.

Lessons learned:
1. Watch out for any strange icons on your task bars. Research their purposes. When in doubt, get rid of them so they don’t cause conflicts with other software.

2. Every so often (2 weeks), run “sfc /scannow” to check for any corrupted system files.