
Research and Assignment Reading

[10] (Max 1 page) Read and briefly summarize the two papers labelled R1. Which of the features cited for the IBM 360 are still used in today's instruction sets? List some of the arguments made for and against RISC, and explain whether these arguments still hold true today.

Solution:

Summary (R1a):
The paper describes the architecture of the IBM 360 and the problems its designers faced in reconciling competing requirements to make the design robust and comprehensive. It lists the design objectives that the IBM 360 accomplishes: advanced concepts, open-ended design, general-purpose function, efficient performance, and intermodel compatibility. The authors then discuss the further decisions required to make the design sound, in the areas of data formats, instruction decisions, input/output systems, and so on. According to the authors, the main purposes of this machine organization are general-purpose utility, support for different types of data processing, and machine-language compatibility.

Summary (R1b):
Demand for advances in computer architecture grows year by year, and these advances are meant to improve the cost-effectiveness of machines. To achieve higher performance and more functionality, architectures become correspondingly more complex, and this complexity increases with each generation as technology progresses. The added complexity improves cost-effectiveness for some machines but not for others. Compared with CISC, RISC offers a greater gain in cost-effectiveness. Using CISC has several consequences, including greater design difficulty, longer implementation time, and the need for faster memory. In addition, the majority of CISC instructions are seldom used, so a design can remain effective with only half or a quarter of the instruction set.
This motivates RISC, which copes with the consequences listed above. RISC offers ease of design, shorter implementation time, higher speed, better use of on-chip area, and support for high-level programming.

Features of IBM 360 still in use:
- Data representations
- Floating-point arithmetic
- Arithmetic operations
- Instruction formats
- Protection
- Sign codes
- General registers and storage addressing

Arguments:

| Arguments in favour of RISC | Arguments against RISC |
|---|---|
| LOAD and STORE are independent instructions, which reduces the amount of work the computer must perform. | These processors demand a large memory cache to keep them fed with instructions. |
| Instructions are reduced and execute in a single clock cycle, so an entire program can execute in about the same time as its multi-cycle equivalent. | Performance depends on the code executed: code expansion increases the size of the code, so the better the compiler, the better the code. The conversion step translates the high-level language into executable code. |

[10] (Max 1 page) Visit the Intel on-line microprocessor museum, and determine the rate of increase in transistor counts and clock frequencies in the 70's, 80's, 90's, 00's, and this decade. Also, create a plot of the number of transistors versus technology feature size using an MS Excel spreadsheet.

Solution:
Moore observed that the number of transistors on a chip roughly doubles every couple of years. This is known as Moore's law. The same trend can be seen in the table below, which documents Intel's progress in reducing the feature size of its processors. The law rests on the observation that as transistors get smaller from generation to generation, a greater number of them can be accommodated in a given area.

| Year | Processor | No. of Transistors | Base Clock Frequency | Technology |
|---|---|---|---|---|
| 1971 | Intel 4004 | 2,300 | 108 kHz | 10 µm |
| 1972 | Intel 8008 | 3,500 | 800 kHz | 10 µm |
| 1974 | Intel 8080 | 4,500 | 2 MHz | 6 µm |
| 1978 | Intel 8086 | 29,000 | 5 MHz | 3 µm |
| 1982 | Intel 286 | 134,000 | 6 MHz | 1.5 µm |
| 1985 | Intel 386 | 275,000 | 16 MHz | 1.5 µm |
| 1989 | Intel 486 | 1.2 × 10^6 | 25 MHz | 1 µm |
| 1993 | Intel Pentium | 3.1 × 10^6 | 66 MHz | 0.8 µm |
| 1995 | Intel Pentium Pro | 5.5 × 10^6 | 200 MHz | 0.35 µm |
| 1997 | Intel Pentium II | 7.5 × 10^6 | 300 MHz | 0.25 µm |
| 1998 | Intel Celeron | 7.5 × 10^6 | 266 MHz | 0.25 µm |
| 1999 | Intel Pentium III | 9.5 × 10^6 | 600 MHz | 0.25 µm |
| 2000 | Intel Pentium 4 | 42 × 10^6 | 1.5 GHz | 0.18 µm |
| 2001 | Intel Xeon | 42 × 10^6 | 1.7 GHz | 0.18 µm |
| 2003 | Intel Pentium M | 55 × 10^6 | 1.7 GHz | 90 nm |
| 2006 | Intel Core 2 Duo | 291 × 10^6 | 2.66 GHz | 65 nm |
| 2008 | Intel Core 2 Duo | 410 × 10^6 | 2.4 GHz | 45 nm |
| 2010 | 2nd generation Intel Core (i3, i5, i7) | 1.16 × 10^9 | 3.8 GHz | 32 nm |
| 2012 | 3rd generation Intel Core (i3, i5, i7) | 1.4 × 10^9 | 2.9 GHz | 22 nm |
| 2014 | 5th generation Intel Core (i3, i5, i7) | 1.3 × 10^9 | 2.6–3.8 GHz | 22 nm |
| 2017 | 8th generation Intel Core (i3, i5, i7) | 1.7 × 10^9 | 3.2–4.7 GHz | 14 nm |
| 2017 | Intel Core i9 | 19 × 10^9 | 4.3 GHz | 14 nm |

Observed trends:

| Period | Trend in transistor count | Trend in clock frequency |
|---|---|---|
| 1971–1980 | Increased 67 times | Increased 7 times |
| 1981–1990 | Increased 9.5 times | Increased 2 times |
| 1991–2000 | Increased 21 times | Increased 4.5 times |
| 2001–2010 | Increased 33 times | Increased 5.6 times |
| 2011–2018 | Increased 12 times | Increased 2.3 times |

Chart for observed trends: [chart not reproduced]

Exercises

[10] The table below shows relevant chip statistics that influence the cost of several processors. Explore the effect of different possible design decisions for Processor A and answer the questions below.

| Chip | Die size (mm²) | Estimated defect rate (per cm²) | Manufacturing size (nm) | Transistors (millions) |
|---|---|---|---|---|
| Processor A | 400 | 0.30 | 130 | 276 |
| Processor B | 380 | 0.75 | 90 | 279 |
| Processor C | 199 | 0.75 | 90 | 233 |

a. What is the die yield for Processor A?
(Assume the wafer yield is 100% and the process-complexity factor is 5 for 130 nm technology.)
b. What might be the reasons that Processor A has a lower defect rate than the others?

Solution (a):
Given: Die size: 400 mm² | Manufacturing size: 130 nm | Estimated defect rate: 0.30 per cm²
To find: Die yield for Processor A

According to the Bose-Einstein formula,

$$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}$$

Inserting the given values, with the die area converted to cm² (400 mm² = 4 cm²) and N = 5, yields:

$$\text{Die yield} = 100\% \times \frac{1}{(1 + 0.30 \times (400/100))^{5}} = \frac{1}{(1 + 0.30 \times 4)^{5}} = \frac{1}{(1 + 1.2)^{5}} = \frac{1}{2.2^{5}} = 2.2^{-5} \approx 0.0194$$

Solution (b):
Processor A has a lower defect rate than Processors B and C because of its larger transistor feature size (130 nm > 90 nm).

[10] One challenge for architects is that the design created today will require several years of implementation, verification, and testing before appearing on the market. This means that the architect must project what the technology will be like several years in advance. Sometimes, this is difficult to do.
According to the trend in device scaling observed by Moore's law, the number of transistors on a chip in 2025 should be how many times the number in 2015?

Solution:
Moore's law states that the number of transistors on a given die area doubles every 1.5 to 2 years.
Let the number of transistors in the year 2015 be x.
The interval from 2015 to 2025 is 10 years.

Case 1: The count doubles every 2 years.
The number of doublings in 10 years is 10 / 2 = 5.
Hence the number of transistors in 2025 would be $x \times 2^{5} = 32x$.
The count scales up 32 times in 10 years.

Case 2: The count doubles every 1.5 years.
The number of doublings in 10 years is 10 / 1.5 ≈ 6.67.
Hence the number of transistors in 2025 would be $x \times 2^{10/1.5} \approx 101.6x$.
The count scales up approximately 101.6 times in 10 years.

So the number of transistors in 2025 would be around 32 to 101.6 times the number in 2015.

[10] When parallelizing an application, the ideal speedup is speeding up by the number of processors. This is limited by two things: the percentage of the application that can be parallelized and the cost of communication. Amdahl's law takes into account the former but not the latter. What is the speedup with N processors if 50% of the application is parallelizable, ignoring the cost of communication?
What will be the speedup for a system with 1000 processors?

Solution:
Given: Number of processors: 1000 | Parallelizable fraction: 50%
To find: Speedup (overall)

According to Amdahl's law,

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Since 50% of the application is parallelizable, $\text{Fraction}_{\text{enhanced}} = 0.5$. With N processors, $\text{Speedup}_{\text{enhanced}} = N$, so in general

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.5 + 0.5/N}$$

Inserting N = 1000 yields:

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.5) + 0.5/1000} = \frac{1}{0.5 + 0.0005} = \frac{1}{0.5005} \approx 1.998$$

Thus the overall speedup would be about 1.998 for 1000 processors.

[10] One critical factor in powering a server farm is cooling. If heat is not removed from the computer efficiently, the fans will blow hot air back onto the computer, not cold air. Observe the effect of different design decisions on the necessary cooling, and thus the price, of a system. A cooling door for a rack costs $4,000 and dissipates 14 kW (into the room; additional cost is required to get it out of the room). How many servers with a Processor P2, 1 GB 240-pin DRAM, and a single 7,200 rpm hard drive can you cool with one cooling door?
Use the table below for your power calculations.

| Component Type | Product | Performance | Power |
|---|---|---|---|
| Processor | P1 | 1.2 GHz | 72–79 W peak |
| | P2 | 2 GHz | 45–60 W |
| DRAM | MEM1 | 184-pin | 3.7 W |
| | MEM2 | 240-pin | 2.3 W |
| Hard disk drive | HDD1 | 5400 rpm | 7.9 W read/seek, 2.9 W idle |
| | HDD2 | 7200 rpm | 7.9 W read/seek, 4.0 W idle |

Solution:
Given: Power dissipated by cooling door: 14 kW | Power consumption of processor P2: 60 W | Power consumption of memory MEM2: 2.3 W | Power consumption of hard disk drive HDD2: 7.9 W
To find: Number of setups, each consisting of a P2 processor, MEM2 memory, and an HDD2 hard disk drive, that can be cooled using the given cooling door.

$$\text{Total setups} = \frac{\text{Total power dissipated by the cooling door}}{\text{Sum of the power of the individual components}}$$

Inserting the given values yields:

$$\text{Total setups} = \frac{14 \times 1000}{60 + 2.3 + 7.9} = \frac{14000}{70.2} = 199.43 \rightarrow 199$$

So 199 such setups can be cooled using the given cooling door.

Case Studies

You have the following characteristics, as shown in the table below, on your company's processor for a certain benchmark, which runs at 400 MHz:

| Instruction Type | Frequency (%) | Cycles |
|---|---|---|
| Arithmetic and logical | 30 | 1 |
| Load and Store | 20 | 2 |
| Branches | 40 | 3 |
| Floating Point (FP) | 10 | 5 |

You are asked to consider a cheaper, lower-performance version of this processor, by removing some of the FP hardware to reduce the die size. The wafer has a diameter of 10 cm, costs $1,000, and has a defect rate of 2/cm². This wafer has a 75% yield. The current chip has a die size of 12 mm². The new chip becomes 10 mm², and FP instructions will now take 13 cycles to execute.

a. [10] What are the old and new CPI (Cycles Per Instruction) and MIPS (Million Instructions Per Second) ratings running this benchmark?
b. [10] What are the old and new die yields? What are the old and new costs per (working) processor? Please comment on the overall effect of the proposed hardware change on the cost and the performance of the processor. (Assume the process-complexity factor is 4.)
c. [10] What would be the theoretical limit of the best possible overall speedup that we could ever get by only improving the FP unit, and what would be the CPI and MIPS ratings of this new processor?

Solution (a):
Given: Wafer diameter: 10 cm | Wafer cost: $1000 | Defect rate: 2/cm² | Wafer yield: 75% | Die size of current chip: 12 mm² | Die size of new chip: 10 mm² | Cycles for FP instructions on new chip: 13 | Clock rate: 4 × 10^8 Hz
To find: Old and new CPI and MIPS

Here,

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}}, \qquad \text{MIPS} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

Substituting the respective values in the CPI equation gives:

$$\text{CPI}_{\text{original}} = \frac{(30 \times 1) + (20 \times 2) + (40 \times 3) + (10 \times 5)}{30 + 20 + 40 + 10} = \frac{30 + 40 + 120 + 50}{100} = \frac{240}{100} = 2.4$$

$$\text{CPI}_{\text{new}} = \frac{(30 \times 1) + (20 \times 2) + (40 \times 3) + (10 \times 13)}{30 + 20 + 40 + 10} = \frac{30 + 40 + 120 + 130}{100} = \frac{320}{100} = 3.2$$

Similarly, substituting the respective values in the MIPS equation gives:

$$\text{MIPS}_{\text{original}} = \frac{4 \times 10^{8}}{2.4 \times 10^{6}} \approx 166.67$$

$$\text{MIPS}_{\text{new}} = \frac{4 \times 10^{8}}{3.2 \times 10^{6}} = 125$$

Solution (b):
The number of dies per wafer is given by the following equation:

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^{2}}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$

Inserting the given values, with the die areas converted to cm² (12 mm² = 0.12 cm², 10 mm² = 0.1 cm²), yields:

$$\text{Dies per wafer}_{\text{old}} = \frac{\pi \times (10/2)^{2}}{0.12} - \frac{\pi \times 10}{\sqrt{2 \times 0.12}} \approx 590.37$$

$$\text{Dies per wafer}_{\text{new}} = \frac{\pi \times (10/2)^{2}}{0.1} - \frac{\pi \times 10}{\sqrt{2 \times 0.1}} \approx 715.15$$

Meanwhile,

$$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}$$

Inserting the given values with N = 4 yields:

$$\text{Die yield}_{\text{old}} = 0.75 \times \frac{1}{(1 + 2 \times 0.12)^{4}} = \frac{0.75}{1.24^{4}} \approx 0.317$$

$$\text{Die yield}_{\text{new}} = 0.75 \times \frac{1}{(1 + 2 \times 0.1)^{4}} = \frac{0.75}{1.2^{4}} \approx 0.362$$

Therefore,

$$\text{Cost of die}_{\text{old}} = \frac{1000}{590.37 \times 0.317} \approx \$5.34$$

$$\text{Cost of die}_{\text{new}} = \frac{1000}{715.15 \times 0.362} \approx \$3.86$$

The proposed hardware change looks favourable on cost. The die yield of the new die is higher than that of the old one, and more dies fit on each wafer. As a result, the new die is cheaper to manufacture than the old one.
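
The arithmetic in Solutions (a) and (b) can be re-checked with a short script. The following is only a sketch and not part of the original solution: it evaluates the CPI, MIPS, dies-per-wafer, die-yield, and cost formulas as they are written above, with die areas given in cm². The function and constant names are illustrative assumptions.

```python
import math

# Instruction mixes as (frequency in %, cycles) pairs from the case-study table.
OLD_MIX = [(30, 1), (20, 2), (40, 3), (10, 5)]
NEW_MIX = [(30, 1), (20, 2), (40, 3), (10, 13)]  # FP now takes 13 cycles


def cpi(mix):
    """Frequency-weighted average cycles per instruction."""
    total = sum(freq for freq, _ in mix)
    return sum(freq * cycles for freq, cycles in mix) / total


def mips(clock_hz, cpi_value):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi_value * 1e6)


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus the dies lost along the edge."""
    radius = wafer_diameter_cm / 2
    return (math.pi * radius ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))


def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, n):
    """Die yield = wafer yield / (1 + defects per unit area * die area)^N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_per_good_die(wafer_cost, dies, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies * yield_fraction)


if __name__ == "__main__":
    for label, mix, area in (("old", OLD_MIX, 0.12), ("new", NEW_MIX, 0.10)):
        c = cpi(mix)
        dies = dies_per_wafer(10, area)
        y = die_yield(0.75, 2, area, 4)
        print(label, c, mips(400e6, c), dies, y,
              cost_per_good_die(1000, dies, y))
```

Keeping each formula in its own function makes it easy to change a single input, such as the process-complexity factor or the FP cycle count, and see its effect on cost and performance separately.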
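Hence, the new die is clearly more advantageous in terms of manufacturing cost, but not in terms of processing power: on this benchmark the CPI rises from 2.4 to 3.2 and the rating falls from about 167 MIPS to 125 MIPS.

Solution (c):
Assume that, in the best case, each FP operation completes in one cycle only. FP instructions account for 10 × 5 = 50 of the original 240 cycles per 100 instructions, so $\text{Fraction}_{\text{enhanced}} = 50/240 = 5/24$ and $\text{Speedup}_{\text{enhanced}} = 5$.

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Inserting the given values yields:

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 5/24) + \dfrac{5/24}{5}} = \frac{1}{19/24 + 1/24} = \frac{24}{20} = 1.2$$

Similarly, for this improved processor,

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}}$$

$$\text{CPI}_{\text{new}} = \frac{(30 \times 1) + (20 \times 2) + (40 \times 3) + (10 \times 1)}{30 + 20 + 40 + 10} = \frac{30 + 40 + 120 + 10}{100} = \frac{200}{100} = 2.00$$

$$\text{MIPS} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

$$\text{MIPS}_{\text{new}} = \frac{4 \times 10^{8}}{2 \times 10^{6}} = 200$$

This agrees with the speedup above, since the CPI falls from 2.4 to 2.0 (2.4 / 2.0 = 1.2).

[10] Your company produces a mobile device. To extend the battery life in the newer version of the device, you are asked to elaborate on the idea to simply reduce the processor clock speed by 20%, and make no other changes. Stating your assumptions, describe whether this is a good idea or a bad idea, and why. Make sure to address both power and energy.

Solution:
We know that clock frequency is inversely proportional to the clock period. Hence, when the clock speed decreases by 20 percent, the clock period increases by 25 percent (1 / 0.8 = 1.25). An increase in the clock period results in an overall increase in the execution time, because the same number of instructions still needs to be processed, but with fewer clock cycles per second. This slows processing, and hence the execution is delayed.

Also,

$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency switched}$$

Assuming the voltage and capacitive load are unchanged, the dynamic power falls by 20 percent along with the frequency. Frequency is not the only factor in battery life, however, because the energy consumed is power multiplied by time, so it is difficult to judge the battery life from the power alone. For example, suppose a hypothetical processor 'P' has a clock speed of 1 GHz and is required to perform only 1 instruction.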
Such a processor will need only 1 cycle to complete its execution.

Now consider another hypothetical processor 'P1' whose clock speed is 20 percent lower than that of processor 'P' and which is also required to perform only 1 instruction. In the time 'P' takes to finish, 'P1' will have completed only 80 percent of its cycle, and it still has to keep running to complete the execution. Keeping the processor on for this extra time consumes energy, since energy is power multiplied by time. Thus a reduction in the clock speed of the processor alone does not guarantee a longer battery life.

References
1. Textbook: Hennessy & Patterson. Computer Architecture: A Quantitative Approach, 5th edition, MKP, 2012.
2. Amdahl, Gene M., Gerrit A. Blaauw and Frederick P. Brooks. "Architecture of the IBM System/360." IBM Journal of Research and Development 8 (1964): 87-101.
3. David A. Patterson and David R. Ditzel. 1980. The case for the reduced instruction set computer. SIGARCH Computer Architecture News 8, 6 (October 1980), 25-33.
4. Information regarding IBM 360 fetched from: Student Textbook – Introduction to IBM System/360 Architecture, IBM.
5. Processors transformation information fetched from:
6. Specification of Intel processor fetched from: and
7. Information on RISC from: and
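
As a supplement to the clock-speed question above, the power-versus-energy argument can be sketched numerically. This is a minimal illustration under stated assumptions, not a measurement: it assumes the textbook relation that dynamic power is proportional to ½ × capacitive load × voltage² × frequency, an unchanged supply voltage, a task that needs a fixed number of cycles, and a constant static power. All numeric values and names below are hypothetical.

```python
def dynamic_power(cap, volt, freq):
    """Dynamic power = 1/2 * capacitive load * voltage^2 * frequency."""
    return 0.5 * cap * volt ** 2 * freq


def task_energy(cap, volt, freq, cycles, static_power=0.0):
    """Energy for a fixed amount of work: (dynamic + static power) * run time."""
    run_time = cycles / freq  # a slower clock means a longer run time
    return (dynamic_power(cap, volt, freq) + static_power) * run_time


# Hypothetical values, chosen only to make the arithmetic easy to follow.
CAP = 1e-9      # farads
VOLT = 1.0      # volts (assumed unchanged by the clock reduction)
FREQ = 1e9      # hertz
CYCLES = 1e9    # cycles needed by the task
STATIC = 0.1    # watts of static power (assumption)

base = task_energy(CAP, VOLT, FREQ, CYCLES, STATIC)
slow = task_energy(CAP, VOLT, 0.8 * FREQ, CYCLES, STATIC)

if __name__ == "__main__":
    print("dynamic power ratio:",
          dynamic_power(CAP, VOLT, 0.8 * FREQ) / dynamic_power(CAP, VOLT, FREQ))
    print("energy at full clock:", base)
    print("energy at 80% clock:", slow)
```

Under these assumptions the dynamic power drops with the clock, but the dynamic energy for the task stays the same because the task runs longer, and any static power is then drawn over that longer run time.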