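Using MATLAB to Predict Financial Crises in Emerging Markets
www.21jrr.com Published: 2010-05-18 16:11 Source: unknown
Abstract: In 1997, the financial crisis that began in Malaysia, the Philippines, Thailand, Indonesia and other countries spread rapidly around the world and caused enormous losses. The economist Paul McNelis set out to investigate whether modern research methods and tools could predict such financial crises and so reduce the losses they cause. Pa……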
http://scottlocklin.wordpress.com/2009/05/08/choose-your-weapon-matlab-r-or-something-else/
Locklin on science
Choose your weapon: Matlab, R or something else?
As a data sort of guy, I use three programming tools on a daily basis, or at least every week. One is Lush, a version of Lisp. Another is Matlab. Lastly, there is the R project.
I don’t want to use three tools for dealing with data, but it’s actually necessary right now. I don’t think it will be necessary forever.
Lush is my general purpose programming language. It’s insanely great. Parts of it are wonky and slow, and parts of it are broken or missing, but it’s a lisp, it’s fast where I need it, and I like it a lot. More on this in a future entry. I use Lush for speed and original research. If there are no complex algorithms like what I need written in Matlab or R, I might as well write them in Lush. Lush is a high level language with low level speed when you need it. It would be perfect if it had more libraries. The only thing I may potentially like better is OCaML/F#, and frankly, I find the type inferencer there to get in the way more than it helps. If they made an OCaML where you could turn the type safety off most of the time, that would be better. Or, I could just be like everyone else and use Python or Java for this sort of thing. Not that there is anything wrong with that.
Matlab would be my second choice for hacking out original research. Why Matlab? Matlab is reasonably fast, but one of the main value adds is that it is extremely intuitive if you’ve used Fortran or C, and if you don’t know how to do something, the help system is very informative. Matlab code is also extremely well supported. The debugger, profiler and editor are all excellent; some of the best I’ve used. Sure, someone will argue that they have a more powerful debugger, but Matlab’s is the most handy I’ve yet used. I don’t need to read a manual to use it; I just use it. Sure, emacs is way better than the Matlab editor, but it isn’t as handy as Matlab’s editor.
You can use Matlab to do just about anything. I’ve used it to code up embedded systems using xPC target and Real Time Workshop. I’ve used it to code up trading systems, from data feed to broker interface. I’ve embedded it in Excel for end users.
I’ve deployed it in Enterprise software used by Fortune 100 companies. It’s amazingly useful stuff, especially if you have the proper toolbox to accomplish your tasks. You can build reasonably good numeric software with it as long as you don’t need fancy “programmy” features like concurrency. If Matlab had a way of making fast compiled code, it would be close to perfect for the type of thing I do. I wouldn’t bother with Lush any more, except when I was trying to write interpreter type things. Alas, Matlab’s way of doing this is to write code for your time critical pieces in C, and embed it into your code in a fairly laborious process. The only real drawbacks to Matlab are speed, plotting and expense.
What is R good for then? Well, R is free, so many academics use it to share their latest econometric or machine learning software with everyone else. As such, just about everything statistical under the sun exists in R. And it’s free! What is not to love?
Well, sadly, there is plenty not to love about R. First off, there is speed. R doesn’t seem to have anything that makes it inherently slow for an interpreted language: it should be comparable to Matlab in this regard. But it’s slow enough that most people do their heavy work in other languages. Most of the modules written for it have most of the code written in C or Fortran. This is somewhat true of Matlab also, and for the same reasons, but Matlab has a trivial way of telling you what you need to speed up, so R will always end up slower in practice.
Second, there is debugging. R is hard to debug. First off, it doesn’t drop you into an interactive top level the way Matlab (or Lush, or Python or anything where you write Real Programs) does. That sucks a lot, and removes a bunch of the utility of using an interpreted language. Oh, sure, there is a debugger, but it is buggy, poorly documented, and doesn’t work in the simple way that Matlab’s does.
Thirdly, there is the syntax.
Personally, I like the syntax; it’s a lot like OCaML. But most people don’t. What is more, the help system is very close to worthless if you’re trying to remember a simple command. People may say this is unfair, as I am just not used to R, but the fact is, I’ll never get as used to it as Matlab, and neither will anyone else. Oh, it’s OK for finding packages you want if you can think of the right keyword for them. But compared to Matlab, or even something like Lush, its online help is pretty worthless.
Fourthly: for programming, while it should be better than Matlab in many ways, I haven’t ever seen a legible R program which was over 100 lines. I don’t know how they manage this. Part of it is doubtless that the IDEs are rather bad. I don’t know anyone who claims they can write good, large pieces of software for R. I once asked a guy how he wrote big pieces of software, and he said, “very carefully.”
This sounds pretty bad, but there are solid reasons to use R. For one thing, it’s free. There is a lot to be said for free. Among other things, if you want to give some code away for others to play with, R is going to be a better vehicle than distributing raw C or a Matlab package. For another thing, it has a tremendous amount of work done on various hard numeric problems, and installation is trivial: just press a button. Want to wire the latest AdaBoost up to your database and plot some nice results? Pretty easy in R. I might be able to do all this in Matlab, with the correct packages and so on, but in R, it’s the work of seconds.
Another thing: it’s a lot easier to make fancy plots in R than it is in Matlab. Matlab’s plotting utility is from the dark ages. It’s insanely bad. You can abstract some of its badness away with objects, but … you shouldn’t have to.
Finally, for interacting with data, R wins. Matlab’s matrix paradigm makes it easy to use, but data.frames are more powerful.
Here’s how my decision tree works.
When I first heard about Benford’s law, I decided it was simple enough; I’d hack it out in Lush. I did. It worked, and I fiddled with it. Then I realized that goodness of fit to Benford’s distribution might be nice. I had chi-squared distributions already coded up in Lush, and some curve fitting stuff … but wiring it all together, then fiddling with the plotting routines: ugh. So, Google informed me that some nice statistician had done all that work for me in R. So I used R. Probably, someone did it in Matlab also (actually, someone did), but it’s a pain to fire up my Windows laptop with Matlab on it, so I just went with R. That’s what R is good for.
At some point, I’ll get Lush talking to R, at which point I may cease using Matlab unless someone pays me to do so. It will never be as slick as Matlab, and I will miss all the great user productivity features that Matlab offers, but it will get the job done better and quicker, I think.
I use the cheat sheets in R a lot, for lack of a better help system, so if you want to fool around with it: A cheat sheet; A better cheat sheet; Other R documents.
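The Benford goodness-of-fit check described above is short enough to sketch. What follows is a minimal Python sketch, not the Lush code or the R package the post refers to. It assumes the standard form of Benford's law, P(d) = log10(1 + 1/d) for first digits 1 to 9, and a Pearson chi-squared statistic with 8 degrees of freedom. The function names and the hard-coded 5% critical value are illustrative choices, not taken from the post.

```python
import math
from collections import Counter

# Benford's law: probability that the first significant digit is d (d = 1..9).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Upper 5% point of the chi-squared distribution with 8 degrees of freedom
# (9 digit classes minus 1). Hard-coded here to avoid a SciPy dependency.
CHI2_CRIT_8DF_05 = 15.507


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit first: '3.72...e-03'.
    return int(f"{abs(x):.15e}"[0])


def benford_chi2(data):
    """Pearson chi-squared statistic of the observed first digits vs. Benford."""
    digits = [leading_digit(x) for x in data if x != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    # Sum over digit classes of (observed - expected)^2 / expected.
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


if __name__ == "__main__":
    powers_of_two = [2 ** k for k in range(1000)]  # classic Benford-like data
    uniform = list(range(100, 1000))               # first digits equally likely
    for name, sample in [("powers of two", powers_of_two), ("uniform", uniform)]:
        stat = benford_chi2(sample)
        verdict = "consistent with" if stat < CHI2_CRIT_8DF_05 else "rejects"
        print(f"{name}: chi2 = {stat:.2f}, {verdict} Benford at the 5% level")
```

Powers of two should land well under the critical value, while a uniform spread of three-digit numbers should blow past it. Plotting the observed against the expected digit frequencies is the part that is left out here.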
18 Responses
Monday, April 23, 2012
Matlab R2010
In regards to R’s debugger – I’ll agree it’s poorly documented, but I haven’t found it to be that bad. Have you tried “options(error=recover)” and “withCallingHandlers(fun(), warning=function(c) recover())”? Also, I haven’t tried it (and it may be what you were talking about as ‘buggy’), but the debug package (install.packages("debug")) looks promising in terms of what you want.
I don’t like it as much as Matlab’s debugger (or what Lisp does by default), but it comes a lot closer to making me happy; maybe it will grow on me. Thanks for pointing it out.
Regarding Matlab, it is a poorly designed language. Its object system is bolted on. It is weakly typed and passes by value, with the resulting speed penalties. Maintaining Matlab code is unwieldy. It really never outgrew its roots in numerical algebra.
Regarding R, my impression is that you haven’t tried hard enough. I was a heavy Matlab user, but after some grad school learning curve got used to R. R is actually a much better designed language than Matlab. I never had problems with debugging using browser() or debug(). Its performance in linear algebra operations is very similar to Matlab (using BLAS, or better, ATLAS on Linux). And of course, you don’t want to do loops in either language. The range of packages available in R, from wavelets to shrinkage methods, to ensemble methods, SVM, to lattice/ggplot2, is just not comparable to anything SAS, SPSS, or Matlab has to offer.
The issues I have with R are speed and multicore scalability. I can use C for speed, but not scalability (unless I get a second job to debug multithreads). I think F# has by far the best chances to succeed as a scientific, fast, scalable language, albeit not truly multiplatform.
Lush’s user base sucks; however, compared to the community of people who do numerics in Common Lisp, Lush’s user base is awesome and enormous. As a language it also has an advantage over Common Lisp: it’s very small and easily taken in within a couple of days. It also comes with useful source you can look at and imitate. When I picked Lush for my Frankenstein’s monster, I was considering OCaML instead (which I agree is a great language, even in the F# version), but I went with Lush because a lot of the hard work was already done in Lush. I’m basically a machine learning dude, Lush is designed for ML, so it’s a nice fit. Python was also a consideration: it certainly would have made my life easier from a POV of having stuff already written for it, but SWIG+C isn’t a very good solution for speeding up the bits that need to be fast, and what they did with Python 3.0 is totally unacceptable. Another one which has come to my attention is Chicken: very fast, very configurable, and it doesn’t have the namespace problems Lush does. Still, I’m comfortable in my choice: there are no Chicken images with lots of math libraries in ‘em.
a suggestion and a question:
suggestion (re: time series): have you tried the package zoo? I use it regularly. It has many features, like missing data imputation. more here
question: I am intrigued by Lush. Is there a newsgroup or a blog or any community focal point for Lush? The only thing I could see is that the last announcement on Lush’s news page dates back to 2 years ago, the latest SourceForge image is dated Nov 2006, and was downloaded fewer than 5000 times. That’s when I got discouraged.
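On the missing data imputation mentioned in the zoo suggestion above: in zoo this is done with functions such as na.locf (carry the last observation forward) and na.approx (linear interpolation). The block below is a rough Python sketch of those two ideas on a plain list, with None standing in for NA. The function names fill_forward and fill_linear are made up for illustration and are not zoo's API, and the interpolation assumes evenly spaced observations.

```python
def fill_forward(values):
    """Replace each None with the most recent non-missing value.

    Leading gaps have nothing to carry forward, so they stay None.
    """
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_linear(values):
    """Linearly interpolate gaps that lie between two known values.

    Assumes evenly spaced observations; leading and trailing gaps stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


if __name__ == "__main__":
    series = [None, 1.0, None, None, 4.0, None]
    print(fill_forward(series))
    print(fill_linear(series))
```

Carrying values forward is the safer default for prices, since it never uses information from the future; interpolation does, which matters if the filled series feeds a backtest.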
Lush is a very small lisp interpreter married to a compiled Lispy/Fortrany language that you can intersperse with C or C++. It’s also got all the basic fast matrix stuff you need built into it. As I said, the user community is tiny. There is a sourceforge mailing list. The biggest downside, besides the size of the user base, is the fact that there is no DB interface. I wrote a cheap interface to netCDF to shove timeseries in, but it’s too slow on writes, so I just dump objects to file for now. At some point, I may get around to writing a proper TS database in HDF, and a mySQL interface. Though I am also considering making Lush callable from R, and vice versa. Meanwhile I get paid to do something else.
Why it rules: it is exactly the level of abstraction you need. Most of the time you can write sloppy high level interpreter code. When you need to go faster, or have decided on a basic design, you can optimize down to the metal. Theoretically you can do stuff like this in Python + SWIG (something becoming more common at enlightened hedge funds) or OCaML (if only I could turn off the type inferencer when I don’t need to go fast/safe), but I liked the way Yann and Leon did stuff.
There is a new version of it being worked on by Ralf Juengling, but it’s not ready for prime time. The old version is pretty solid.
1) for time series, have you tried zoo? I use it, and am very happy with it. Check out the vignettes
2) do you know where Lush users meet, ask questions, post code, etc? On SourceForge, the last image was uploaded 2 1/2 yrs ago, and downloaded less than 5,000 times.
My absolute favorite environment for rapid and powerful coding is Dyalog APL. The language allows really powerful abstraction, and the IDE’s debugger is the best I’ve ever seen. You can step backwards and forwards, and add/modify code without having to exit debug mode. Most Dyalog users find themselves writing code within the debugger.
http://en.wikipedia.org/wiki/K_(programming_language)
R is crazy frustrating at times, but with the helpful “cheat sheets” you can get a lot done. I’m sure your Splus will serve you well.
http://lionhrtpub.com/orms/orms-6-08/frswr.html
http://www.advantageforanalysts.com/
An ongoing weakness in R seems to be an odd/poor set of default choices (as with the options above). Google “stringsAsFactors” and witness a long list of novice users ready to do violence to their computers. R’s factor-handling has brought me to the verge of tears, only to discover a single pithy sentence in the docs that clarifies all.
In short, R’s public relations team isn’t likely to win any awards today, tomorrow, or ever…
Other fun: keeping track of which libraries you have loaded from where. I found a fun “bug” in my code which couldn’t be reproduced on different installs of R; apparently the old version of XTS (or ZOO, I never figured out which was at fault) allowed you to subset pretty sloppily. New version requires everything be just so. Finding out which lib R was pointing to … any of 3-4 in Framework or my home directory: insanity. In the end, I’m going to have to maintain an R distribution along with my code, because the libraries change so much underneath the code, I can’t rely on CRAN to do it for me for anything resembling software. Shoulda done it in Lush.
Clojure is interesting, though I feel better about resorting to C than I do about resorting to Java, even if the latter is safer. Incanter looks incredibly weaksauce though.
I suspect most people would find Lush pretty DIY and clunky. Compared to, say, OCaML, it isn’t as solid or well developed. But it is incredibly handy to get stuff done in. I may some day regret spending the time in Lush rather than OCaML, but I doubt it.