Wednesday, September 2, 2009

Monday, August 31, 2009

Dr Nate's Paper

If you are still interested in more of the same, you might want to read http://www.testpublishers.org/Documents/08-002Thompson-Final.pdf. The drawback to this article is that it is purely psychometric and does not discuss any issues endemic to India.
Thank you and regards,
Nate Thompson

______________________
Nathan A. Thompson, Ph.D.
Vice President, Assessment Systems Corporation
Adjunct Faculty, University of Cincinnati
Phone: 651-647-9220

Sunday, August 30, 2009

Links to two articles at sify.com

Will CAT-2009 follow Computer Adaptive Testing model?
2009-08-26 11:58:14
The decision of the Indian Institute of Managements (IIMs) to conduct……
-------------------------------------------------------------------------------------
Applying online for jobs in public sector banks
2009-07-23 13:09:56
Prof. Vipin K. Chilana, Head - Processing & Technology ....

Dr Nate's 5 Day Workshop on IRT/CAT


From July 29 to Aug 1, 2009, IBPS organized a 5-day workshop on IRT/CAT. This is what I wrote to Dr Weiss as feedback. You can get information about other scheduled workshops at http://www.assess.com/

At 02:05 AM 8/3/2009
Dear Dr Weiss,

I provide below part of what I said at the concluding session of the 5 day India workshop. A group picture is also attached, to test whether you can spot me there and see how much my appearance has changed after a decade! It was around 16 July 1999 when I got my first tutorial on IRT and CAT from you in your cabin at the University of Minnesota. Dr Nate is bringing with him a whole set of pictures taken here and his impressions of India.


A mother persuades her son to go to school. She says, “Darling, are you not ready to go to school?” The son says, “Mummy, nobody at school likes me.” “But darling, you must go to school”, says the mother. “But the teachers don’t like me at school”, says the son. “But sweetheart, you are 45 years old, and you are the principal of the school. You have to go to school.”

Well, had I been in Dr Nate’s place for this workshop, my mother would have had to persuade me like that from the second day onwards, and I would have told her that they have got together one bunch of people who understand the subject somewhat and another bunch who don’t at all. But Dr Nate didn’t need any such prodding from anyone. His enthusiasm didn’t wane even on the 5th day, and the zeal was the same in the last session as it was in the first. He balanced the content very well for the novices and for those who knew a bit about the subject. He handled the questions posed at -3 Theta with the same aplomb as those at +3 Theta, without losing his cool smile and without letting anyone feel that unrelated or very basic questions were not welcome. In fact, when we were planning this program we secretly wished that Dr Weiss himself would be a part of the workshop, and we had sent him an email asking him to consider our request. The way Dr Nate has conducted this workshop for a diverse group like ours is an indirect compliment to Dr Weiss also, as Dr Nate happens to be his student. We wish that Dr Weiss and Dr Nate visit India in 2010 and that the 2011 GMAC CAT conference is also held in India.

Regards,

Vipin
--------------------------------------

Vipin:

Thanks very much for the photo (you're the guy in the middle with the gray beard and the black sleeveless vest). I also very much appreciate the comments on Nate's performance. Although he is young, as you indicated he is very knowledgeable as well as very personable. I'm pleased to hear that he did a good job!

Where the next CAT conference is held is up to the people who sponsor it (GMAC). However, once the new CAT organization starts functioning, there is always the possibility that they could hold conferences in other locations in the years that GMAC doesn't sponsor a conference.

Dave

**********************************************

David J. Weiss, President
Assessment Systems Corporation

2233 University Avenue, Suite 200
St. Paul, Minnesota 55114, U.S.A.

Email: dweiss@assess.com
Phone: 651-647-9220 Fax: 651-647-0412

Worldwide Web: http://www.assess.com/

**********************************************

Saturday, August 8, 2009

Will CAT BE CAT?:
Will Common Admission Test 2009 of IIMs follow Computer Adaptive Testing Model?

*Prof Vipin K Chilana
Professor (Psychometrics & HR) and Head (Processing and Technology Division), Institute of Banking Personnel Selection (IBPS), Mumbai. Email: vchilana@ibpsmumbai.org; vchilana@hotmail.com; vchilana@gmail.com


Bill Gates, in his 1999 book ‘Business @ the Speed of Thought’, has the following opening sentence in the Introduction: “Business is going to change more in the next ten years than it has in the last fifty”. A popular quote attributed to Arthur C. Clarke runs: “We always overestimate the amount technology will change over the next five years. We always underestimate the amount technology will change over the next ten years.” In this digital age, where the mode of interaction and transaction is fast changing, the obvious question is: “Will the face and interface of examinations and testing undergo a major transformation from paper-pencil to digital mode?” Can we foresee these trends 10 years hence?

There are already such pointers, and attempts to switch over to computer based exams rather than paper-pencil tests. However, the approach followed varies with the requirements. In a country like India, the number of candidates taking some of the high-stake exams is very large, and public perceptions about all the aspects of this new mode are not yet well formed. While this does throw up many challenges, it also provides scope for discovering a unique model best suited to our requirements rather than just importing the idea from other countries.

In about the last five years many organizations have conducted computer based exams in India. But the ratio of exams, as well as of the number of examinees tested, tilts very heavily towards the traditional paper-pencil mode. One of the major reasons for this is that in the paper-pencil mode a significantly larger number can be tested in a single session with the same question paper, rather than using different sets and then equating them statistically to arrive at equivalence of scores. With the same set of questions, the number has gone as high as 600000 in one session in the traditional mode (State Bank of India’s clerical examination), compared to not more than 5000 in a session for computer based testing. For large scale paper-pencil examinations, up to 200000 candidates in a single session with the same set of questions is the rule rather than the exception.


In a high stake examination where a one-mark difference can make or mar one’s chances of getting on the list, and where the examination carries much higher weightage in the process than other components, the candidates’ and the public’s perception always favors the same question paper in a single session examination. Any major deviation from this is always seen as bringing in some bias.

Can Computer based models replicate and follow this dictum of same test content for large scale high-stake examination? The answer is a simple No. Not even many years down the line. At present when the testing organizations think about scaling up the number in a session, the target hardly goes beyond 10000. Even when the density of computers increases and the connectivity is much better than what it is today, a single session examination using the same set of questions may not go beyond 50000. And this number too seems a gross over-estimation at this stage.

With this backdrop, the professors and administrators of the IIMs have taken a very bold initiative to move the paper-pencil CAT online in a year when more than 3 lakh candidates are expected to apply. Though the test will be delivered with the help of an international organization, will it be adapted to include the nuances best suited to our expectations and requirements?

This limitation of not being able to test a large number in one session compels us to think of a different model, rather than a “linear” transformation of content from paper-pencil mode to computer assisted mode, once we go beyond a single session with the same content for the same examination. In a linear transformation all the other features remain the same, except that the interface is on computers. The linear model holds good for a manageable number in a session, which has so far not gone beyond 5000 in India. Some of the international testing organizations do boast of a much larger number, but that too is in multiple sessions using different sets of questions. In this linear model their number also may not exceed 5000. The linear model has high face validity and acceptability. CAT 2009 certainly cannot afford this model, given the numbers. Are the alternative models as robust, and can they gain acceptability with the students and the public?


There can be several ways in which examinations on computers can be classified, and there are various tags by which these are labeled. On-line Tests, Computerized Tests, Computer Assisted Tests, Computer Based Tests, Computer Adaptive Tests and Tailored Tests are but a few examples of these types. From the IT angle, tests can be classified by how they are delivered, as Computer Based Tests (CBT) or Internet Based Tests (IBT). The major distinction is that a CBT can run on a standalone computer or on a LAN with a local server, whereas an IBT is largely delivered and controlled from a remote centralized server. CBT is also called a distributed model, where the questions reside on the local server and after the exam the data is transferred to a central place. In comparison, IBT is delivered through a centralized server using internet-type protocols, and the data is saved on the server at regular intervals during the session. Each model has its own pluses and minuses. This is the way computer technology will classify testing.

A more significant way of classifying tests on computer is by the psychometric theory they are based on and the way the testing model uses the content of the tests across multiple sessions for different candidates of a particular examination. As the field matures, more questions will have to be addressed in this area to prove the robustness of the model being adopted and adapted. In a country like India, perceived fairness of the process is far more important than technical issues like reliability and validity. While both these aspects need to be taken care of, the general public and candidates will have more questions on fairness and equivalence. Based on psychometric theory, the two-way classification will be CBT and CAT (Computer Adaptive Testing). CBT here has a different connotation from the one explained above. Simply put, it means presenting the test items/questions on the computer screen rather than on paper. The candidates mark their answers on the screen using the keypad or a click of the mouse. This simple and linear transformation from paper-pencil can also bring many additional advantages to both candidates and testing organizations.

The historical development of testing shows two parallel approaches to the mode of delivery, i.e. Individual vs. Group Testing. Individual testing means one test taker is attended to by one test administrator for the full testing session. It has the following distinct advantages: recording additional information besides responses, such as response time, patterns of answer changes for some items, and associated behaviour; terminating the test (or sub-test) when the candidate is not able to respond to difficult items; presenting different types of items besides written material, such as spoken words and sentences or apparatus tests; and authentic monitoring of time for tests and sub-tests.

Compared to Individual testing, Group testing was evolved to cater to the need of testing a large number of candidates in lesser time. However many advantages of Individual testing mode were lost. Ideal testing strategy would be where we could combine the above advantages of individual testing coupled with time efficiency of Group testing. Digital age has opened another mode of testing i.e. presenting items/ questions on Computer, which has the potential of combining most features of both Individual and Group mode testing and also adds many additional advantages.

Computer Adaptive Testing (CAT)

Ever since the testing and measurement discipline matured, measurement experts, having realized the limitations of Classical Test Theory, have been concerned with issues of reliability, validity and discrimination power of tests and tools. Item properties like difficulty and discrimination indices have largely been computed and interpreted using Classical Test Theory. In Classical Test Theory these parameters are computed from the performance of the test-taking group. If the ability level of the test-taking group were different, the item parameters would also be different. This led to the emergence of new theories of measurement. One such theory is Item Response Theory (IRT), in which item parameters are computed from a mathematical model using the Item Characteristic Curve (ICC), which is supposed to be an inherent characteristic of the item.

A brief description of testing theory in general would be in order. Testing is sampling of behaviour, ability etc. Through this sample (represented by test items) we draw inferences about the ability or characteristic being measured. A simple rule of statistics is: the larger the sample, the better the inference. Keeping in view practical and psychometric considerations like optimal time duration for testing, fatigue on the part of the test taker and reliability of the test, a minimum and maximum number of items to be included in the test is decided. The tests containing these items are expected to assess the ability level of candidates who differ widely on the ability being measured. Since the same test is given to all competing candidates, items of varying difficulty level are selected to generate maximal individual differences. All the candidates are presented the same set of items irrespective of their actual ability on the construct or characteristic being measured. With this approach the effective sample (the number of items actually contributing to measurement) gets further reduced, depending upon the ability level of the test taker.

For example, if we have all candidates of higher ability and we give them ten items which are very easy, all of them will score maximum or very high marks and the test would not serve its purpose. The same would be the case if we have ten very difficult items and all candidates are of lower ability level: all will get zero or a very low score. If we give a test containing five very easy and five very difficult items to both the groups, then the higher group gets differentiated from the lower group on the basis of only five items and not the ten actually contained in the test. Five items in this case are redundant and do not contribute to measurement of the characteristic for generating individual differences. Within the higher and lower ability groups, too, the items should address varying levels so as to generate more score differences within each group.

To sum up, in conventional group testing the ability or characteristic gets measured through only a small number of (effective) items, though on the face of it we may be using more. The number of items that actually contribute to measurement of a characteristic is much smaller than the number of items contained in the test. Since the same test is given to all the candidates, representing a wide spectrum of ability levels, individual differences get reported at a broad level and lack micro-precision. This also results in many candidates getting the same score, because the effective sample shrinks further as a function of the ability level of the test-taking group.

Since increasing the overall number of items (to achieve better sampling) poses practical problems in terms of time and fatigue, another way is to present each candidate with items suited to her level of ability, so as to measure her score with better precision. Because it is “suited to each”, this is called “tailored” or “adaptive” testing. Though this principle of tailored testing was well understood by measurement experts, resulting in a shift from Classical Test Theory to Item Response Theory, it could not be translated into measurement reality. The fusion of this thinking with advanced computing facilities gave birth to the CAT model, which estimates each examinee’s level of ability/proficiency with greater measurement precision.

The backbone of CAT is a computer system, an item bank consisting of a large number of items calibrated at different levels (computed using Item Response Theory, IRT), and a programming logic for starting the testing session, presenting different items and stopping appropriately. A typical session would be as follows. The candidate is presented with an item of moderate difficulty, selected randomly from a large pool of items of moderate difficulty. Based on the response, the ability is assessed and accordingly the next item, of higher or lower difficulty, is presented. After the response to the first item an initial estimate of ability is made, which is re-estimated after each subsequent item, following the same algorithm: branch to a more difficult item if the response to the previous item is right, otherwise to an easier one. When maximal information about the candidate’s ability is available, testing terminates. In this model, different candidates are tested with different sets of items, different numbers of items and different time durations.
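
A session of this kind can be sketched in a few lines. This is a simplified illustration and not the algorithm of any particular testing program. It assumes a Rasch model, a Bayesian (EAP) ability estimate with a standard normal prior, selection of the unused item whose difficulty is nearest the current estimate, and a stopping rule based on the standard error. The first item is simply the one nearest zero rather than a random pick from a moderate-difficulty pool. The item bank and the simulated candidate are hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate ability values from -4 to +4

def p_correct(theta, b):
    """Rasch probability of a correct answer to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses):
    """Posterior mean and SD of ability, given (difficulty, correct) pairs
    and a standard normal prior, evaluated on a fixed grid."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)            # prior weight
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)          # likelihood of each response
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(item_bank, answer, se_target=0.5, max_items=20):
    """Administer items one at a time until the estimate is precise enough.
    item_bank: list of item difficulties; answer: callable(difficulty) -> bool."""
    remaining = list(item_bank)
    responses = []
    theta, se = 0.0, float("inf")                     # start at moderate difficulty
    while remaining and len(responses) < max_items and se > se_target:
        b = min(remaining, key=lambda d: abs(d - theta))  # item nearest the estimate
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = eap_estimate(responses)           # re-estimate after every item
    return theta, se, responses

# Hypothetical bank (difficulties -3 to +3) and a candidate who answers
# correctly whenever the item difficulty is below 1.0.
bank = [x / 4 for x in range(-12, 13)]
theta, se, administered = run_cat(bank, answer=lambda b: b < 1.0)
print(f"items used: {len(administered)}, estimate: {theta:.2f}, SE: {se:.2f}")
```

Because the next item is always the one nearest the current estimate, a right answer leads to a harder item and a wrong answer to an easier one, which is the branching rule described above. A different candidate would see a different set and number of items.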

Many other technical inputs are required, like computing the difficulty of each item based on Item Response Theory and the time to be given for answering a particular question or set of questions. Based on these principles the scores are generated. Some advantages of CAT over simple CBT are: better measurement precision; better security of test content, as different candidates get different items in a different order (all the items are not exposed to all the candidates); a larger number of candidates can be tested in different sessions with a given item bank without threat to the security of test content; and less frustration on the part of the candidate, as items beyond her ability are not presented.

A major hurdle for CAT is that awareness of this theory is very low, and common sense doesn’t lend credibility to the practice of giving different candidates in the same examination not only different questions but also a varying number of questions. For some candidates the session may terminate after 5 questions, while for others it may go on for 20 questions.

Different Models

To sum up, there are two basic psychometric models of tests on computer: CBT and CAT. CBT has the following three variations:

1. Same content for all the examinees in one or multiple sessions (usually two, one after the other). This model is based on the classical theory and enjoys the same psychometric credibility as a single session paper-pencil test. As a CBT it is restricted only in terms of the number it can manage in a session. Since the same content is presented to all, questions of equivalence do not arise. This model can be called Simple CBT.

2. Different sessions with different but Parallel/Equivalent content. This can also be compared to multiple sessions in paper-pencil tests. Different parallel forms of the test need to be statistically equated. Eyebrows can be raised about equivalence and statistical bias. This model can be called CBT with Parallel Forms.

3. Different sessions with randomly selected items from a larger item pool, organized into different silos of item properties like facility index and item discrimination index. For example, if the items are to be selected on facility index (the percentage answering the item correctly in a pre-test or previous administration) and one has 5 items in each of 8 buckets (0-10, 11-20, 21-30 and so on), then for each examinee one item is picked at random from each bucket. Hence the complete tests for any two examinees are likely to be very different. This model can be called CBT with Random Item Selection.
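
The bucket example in variation 3 can be sketched as follows. The bucket boundaries beyond the three the text names, the item IDs and the random seed are all hypothetical, chosen only to give 8 buckets of 5 items each.

```python
import math
import random

# Hypothetical bank: 8 facility-index buckets, 5 item IDs in each.
BUCKETS = [(0, 10), (11, 20), (21, 30), (31, 40),
           (41, 50), (51, 60), (61, 70), (71, 80)]
bank = {bucket: [f"Q{bucket[0]:02d}-{k}" for k in range(1, 6)] for bucket in BUCKETS}

def draw_form(bank, rng):
    """Build one examinee's test: one randomly chosen item from every bucket."""
    return [rng.choice(bank[bucket]) for bucket in sorted(bank)]

def count_possible_forms(bank):
    """Number of distinct tests that can be assembled from the bank."""
    return math.prod(len(items) for items in bank.values())

rng = random.Random(2009)       # seeded only so the sketch is repeatable
form_a = draw_form(bank, rng)
form_b = draw_form(bank, rng)
print(form_a)
print(form_b)
print("distinct forms possible:", count_possible_forms(bank))
```

With 5 choices in each of 8 buckets there are 5^8 = 390625 possible forms, which is why two examinees are unlikely to see the same test even with a small bank.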


Models 2 and 3 are more in vogue when the testing is in multiple sessions. In variation 2 above, the statistical equating has to be done after the score averages and standard deviations of each session are available. In 3 above, past statistics are used to bracket items in a range, which in a way results in more or less equivalent forms.
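
One common way of using session averages and standard deviations is linear equating, sketched below. The article does not say which equating method is meant, so this is only one possibility. It also assumes the candidates in each session are of comparable ability overall, so that differences in mean and spread can be put down to the forms. The scores are made up.

```python
from statistics import mean, pstdev

def linear_equate(scores, ref_mean, ref_sd):
    """Rescale one session's raw scores so that their mean and standard
    deviation match those of the reference session."""
    session_mean = mean(scores)
    session_sd = pstdev(scores)
    if session_sd == 0:
        raise ValueError("all scores are identical; the session cannot be equated")
    return [ref_mean + ref_sd * (x - session_mean) / session_sd for x in scores]

# Hypothetical raw scores from two sessions that used different forms.
reference_session = [52, 60, 68, 75, 80]
harder_session = [45, 50, 58, 66, 71]

ref_mean = mean(reference_session)
ref_sd = pstdev(reference_session)
equated = linear_equate(harder_session, ref_mean, ref_sd)
print([round(x, 1) for x in equated])
```

After equating, the second session has the same mean and standard deviation as the reference session, and the rank order of candidates within the session is unchanged.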

The CAT model works on the principle of CBT with Random Item Selection, but with item characteristics known with better precision rather than as a broad range. Besides, it has the advantage of terminating the session with far fewer items while adding more precision to the score/assessment of ability.

The proof of the pudding is in the eating. Some of the theoretical questions which need to be addressed may not get an exact or satisfactory answer. Test takers will always want to be assured of equivalence. In specific terms they may ask: “If the same candidate were to be tested with any of the parallel forms of the test, or from a different pool of randomly selected items, or based on CAT, would she get an equivalent score and the same select/reject decision each time?” The answer to this is difficult to explain in layperson’s terms. Beyond providing this explanation, one must also accept that score equivalence can never be perfectly precise in statistically equated scores.

However, there is one practical way not only to demonstrate that equivalence was built in but also to filter out any residual statistical artifacts in arriving at the final precise score to be used for the select/reject decision. The solution comes from plain common sense and is explained at the end. It works equally well with CBT with Parallel Forms, CBT with Random Item Selection, or the CAT model.

In India, the announcement of the IIMs’ Common Admission Test, popularly known as CAT, has evoked a lot of interest and curiosity. It has the potential to completely change the landscape of testing in India if it works well, technological glitches notwithstanding. But alas! CAT is not CAT based. Which is to say, the Common Admission Test is not based on the Computer Adaptive Testing model. The overwhelming number of about 300000 candidates would be tested in about 30 sessions, 3 sessions each day for 10 days, each session testing about 10000. For each session, an equivalent form of the test would be used. So, the CAT of the IIMs is CBT with Parallel Forms. These forms are likely to be statistically equated.

Here is the common-sense formula for validating the equivalence assumptions and at the same time factoring out variations due to statistics and to different forms. Out of the 300000, only the top 6000 or so are reckoned for the further process in the IIMs. Think as if there is no tomorrow and build each parallel form as if it were the final form, the score of which is to be used as it is. Check this assumption of equivalence when all the 30 forms are done with. Statistically equate the scores, and call about the top 10000 for a final test using a single form. Take the decision on the basis of this final form only. Compare the statistically corrected scores on the earlier forms with the scores on the final form. If they match, pat yourself on the back and proclaim you knew it. If they don’t, which is more likely to be the case, feel relieved that you have been fair to the candidates and have taken the extra froth out of it. This information can further be used for correcting the scores of those who have not been called for the final round, so that the scores are more accurate when used by other institutes. Administratively it calls for just one more testing session and a delay of about 15 days in finalizing the results. But the payoff is that it would not only do justice but would also be seen to do so.

IIMs have taken a leadership role in ushering in what Andrew S Grove called ‘10X’ changes resulting in a Strategic Inflection Point (SIP), referring to a time in the life of a business or practice when its “fundamentals” are about to change. This role has to be played with finesse and responsibility. In the long run, CAT has to be CAT, the adaptive model, for the advantages are too many to let go.

This Mantra has the potential to become a norm for all large scale high-stake testing programs irrespective of whether it is CBT or CAT based. For the latter, this post testing design will prove to be a good testing ground for the CAT model itself.
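
The comparison step in this two-stage design, checking equated first-round scores against the final-form scores, could look something like the sketch below. The candidate IDs and scores are invented, and measuring agreement by the overlap of the two top-k lists is only one possible choice; the article does not prescribe a measure.

```python
def top_k(scores, k):
    """IDs of the k highest scorers; scores maps candidate ID -> score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def decision_agreement(equated, final, k):
    """Share of the k candidates selected on the final form who would also
    have been selected on their equated first-round scores."""
    return len(top_k(equated, k) & top_k(final, k)) / k

# Hypothetical candidates called for the final single-form test.
equated_first_round = {"A": 92, "B": 88, "C": 85, "D": 83, "E": 80}
final_form = {"A": 90, "B": 84, "C": 86, "D": 79, "E": 82}

print(decision_agreement(equated_first_round, final_form, k=3))
print(decision_agreement(equated_first_round, final_form, k=2))
```

Agreement close to 1 at the real cut-off would support the claim that equivalence was built in. Lower agreement would show how many select/reject decisions the final round changed.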

Applying online for Jobs in Public Sector Banks

Prof. Vipin K Chilana, Head – Processing & Technology Division, and Professor – Psychometrics & HR, IBPS



Public Sector Banks (PSBs), it is estimated, will hire 30,000 personnel in various cadres during 2009-10. Institute of Banking Personnel Selection, popularly known as IBPS, promoted by PSBs and the Reserve Bank of India, has been rendering assistance to PSBs and Financial Institutes in recruitment.
IBPS has helped PSBs adopt advanced methods of recruitment and selection. IBPS popularized use of objective tests and OMR technology for scanning and result processing. Public Sector Banks and IBPS adopted advanced technology in such a way that it becomes a facilitator for the masses rather than an advantage perceived as elitist. One such relatively recent area of technology adoption is on-line applications.

For these 30,000 jobs, it is estimated that more than 60 lakh candidates will apply. Had PSBs not adopted the technology in receiving on-line applications the whole selection process would have taken 10-12 months to complete from issue of advertisement to final offer. Now it takes about 7 months.

The online application has been adopted by different banks with some variations suited to the level of jobs, the type of recruitment and the branch network of the bank. Added to the online advantage is the payment of the fee at any branch of the bank rather than purchasing a Demand Draft or using other such modes of payment. There are many advantages to the applicant in each of these methods. You should read the advertisement of the bank posted on the website very carefully so that you follow the indicated method properly. The summary of the two major variations is as follows:

Fully online - no print out or document or fee receipt to be sent

In this mode, you go to any designated branch of the bank to pay the fee against a payment challan, which can be downloaded from the bank’s website. Then you fill in the application online with the payment details and retain a copy of the print-out and the challan. Once you get the registration number, your application has been received in the database for further processing. You should retain your registration number as well as your password, as these will be needed for re-printing the application or downloading the call letter, if the bank provides that facility. The biggest advantage of this mode is that you are not affected by postal delay or loss.

Application online but print-out of registered application with DD/payment challan required to be sent

In this mode all the above steps are followed but unless the print-out with DD etc is received at the given post bag number within the stipulated time, the application is not valid. Hence when this mode is followed you should try to register as early as possible and send the print-out with DD/payment challan and other enclosures specified so that you don’t lose a chance because of postal delay.


Advantages of online application to the candidates:

You fill in the data yourself, so its accuracy is checked at the source. In the earlier mode, many errors would creep in during data punching.
You also get intimation of the call letter by email.
If no print-out is to be sent, you are assured that your application has been received by the bank online, provided you registered properly, and you also save on postage. Your application is not affected by postal delay or loss.

Some Tips:

Apply as early as possible to avoid last minute rush.
Read the advertisement “How to apply” posted on the website very carefully. The Bank for which you are applying may have different specifications.
Preserve registration number and password and a copy of the print-out and payment challan.
Read the print out carefully to ensure that the information has come correctly. You should check this for critical information like your address, date of birth, qualification details etc.
Purchase the DD/pay the fee in advance. Banks don’t extend the date for the DD or for making payment at the branch. However, they generally give a concession of 1 or 2 days for applying online. There is usually a clause in the advertisement stating: “Even if the date of registration is extended by a day or two the date of DD or payment of fee in the branches will not be extended.”
In case you are not able to register due to heavy traffic on the site, don’t panic. Try again during non-peak hours. In order to avoid this situation at the last minute, register as early as possible.
Fill the application yourself. You will learn from each process. Many banks during the interview ask about the feedback on the application process and judge your interest and alertness about the whole process.
In case of payment at the bank branch, even if the bank branch is not in close vicinity do make it a point to go yourself to pay the fee. It helps you know about the bank which can be to your advantage during interview. When you appear for the written exam or attend interview you do travel some distance. Have the same approach for payment of fee also. Though you have the option of requesting someone else known to you to buy DD or make payment on your behalf, doing it yourself will help you gain different experience of the process and will add to your learning.
Check your email and bank’s website frequently to know about the announcements. This should particularly be done around the crucial dates given in the advertisement.


While technology provides you many advantages, it also has some glitches, particularly in the initial stage. Because a new method has been introduced, you may sometimes face a problem. When that happens, PSBs ensure that candidates are not put to any loss or disadvantage, or deprived of their chance to apply, because of the difficulty. In most cases the date of registration is extended. You need to show a positive attitude towards technology, but in case of any problem do seek assistance in a proactive manner. Be part of the solution when faced with a problem. While having patience with such failures, do bring such instances to the bank’s notice, along with your recommended course of action. This positive attitude will help you in life also.