Digital Circuits/Representations of Digital Data

Quantity vs Numbers

There is an important difference between quantities and numbers. A quantity is simply some amount of something. Five apples, three kilograms, and one passenger car are all quantities of different things. A quantity can be expressed in any number of representations. For example, tally marks, beads on an abacus, or stones in a pocket can all represent a quantity of something. One of the most familiar representations is the decimal, or base-10, number system. It uses 10 different digits, 0 through 9. When more than 9 objects need to be counted, we start a new column with a 1 (which represents a group of ten) and continue counting from there.

Computers, however, cannot count in decimal. Computer hardware uses a system in which values are represented internally as a series of voltage levels. For example, in many digital circuits a voltage of +5 V represents the digit "1" and 0 V represents the digit "0". No other digits are possible! Computers must therefore use a number system that has only two digits: the binary, or base-2, system.

Binary Numbers

Understanding the binary number system can be difficult at first. It may be easier to start with a decimal number, since that is more familiar. A number such as 1234 can be written so that the weight of each position is visible:

$1234_{10}=1\times 10^{3}+2\times 10^{2}+3\times 10^{1}+4\times 10^{0}=1000+200+30+4$

Notice that each digit is multiplied by successive powers of 10, since this is the decimal, or base-10, system. The ones digit ("4" in this example) is multiplied by $10^{0}$, or 1. Each digit to the left of the ones digit is multiplied by the next higher power of 10, and the products are added together.

Now we do the same for a binary number, but since this is a base-2 number, we replace the powers of 10 with powers of 2:

$1011_{2}=1\times 2^{3}+0\times 2^{2}+1\times 2^{1}+1\times 2^{0}=11_{10}$

The subscripts denote the base. Note that in the equations above:

$1011_{2}=11_{10}$

Binary numbers are the same as their decimal counterparts; they are just a different way of representing a given quantity. Put simply, it does not matter whether you have $1011_{2}$ or $11_{10}$ apples, you can still make a pie.
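The positional expansion used in the two equations above can be sketched in a few lines of Python. The function name `positional_value` is a hypothetical helper chosen for this illustration, and the sketch assumes bases of 10 or less so that every digit is one of the characters 0-9.

```python
def positional_value(digits, base):
    """Sum digit * base**position, with position 0 at the rightmost digit.

    Illustrative helper (not from the text); assumes base <= 10.
    """
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if digit >= base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        total += digit * base ** position
    return total


print(positional_value("1234", 10))  # 1234
print(positional_value("1011", 2))   # 11
```

Both calls walk the digits from right to left, which mirrors reading the exponents 0, 1, 2, 3 in the equations.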

Bits

The term bit is short for binary digit. Each bit is a single binary value: 1 or 0. Computers usually represent a 1 as a positive voltage (5 volts and 3.3 volts are common values) and a 0 as 0 volts.

Most Significant Bit and Least Significant Bit

In the decimal number 48723, the digit 4 represents the largest power of ten ($10^{4}$), and the digit 3 represents the smallest power of ten ($10^{0}$). In this number, therefore, 4 is the most significant digit and 3 is the least significant digit. Consider a caterer who needs to prepare 156 meals for a wedding. If the caterer makes an error in the least significant digit and accidentally prepares 157 meals, it is not a big problem. If, however, the error is in the most significant digit (the 1) and the caterer prepares 256 meals, that is a big problem!

Now consider the binary number 101011. The Most Significant Bit (MSB) is the leftmost bit, because it represents the greatest power of two ($2^{5}$). The Least Significant Bit (LSB) is the rightmost bit and represents the smallest power of two ($2^{0}$).

Note that MSB and LSB are not the same as the significant figures used in other sciences. The decimal number 123000 has 3 significant figures, but its most significant digit is 1 (the first digit from the left) and its least significant digit is 0 (the first digit from the right).
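As a rough illustration of why the MSB matters more than the LSB, the sketch below flips each of them in the 6-bit number 101011 and compares the results. The helper names `bit_weights`, `flip_lsb`, and `flip_msb` are assumptions made for this example only.

```python
def bit_weights(width):
    """Power-of-two weight of each position, MSB first (illustrative helper)."""
    return [2 ** (width - 1 - i) for i in range(width)]


def flip_lsb(value):
    """Toggle the least significant bit."""
    return value ^ 1


def flip_msb(value, width):
    """Toggle the most significant bit of a width-bit value."""
    return value ^ (1 << (width - 1))


value = 0b101011                 # 43 in decimal
print(bit_weights(6))            # [32, 16, 8, 4, 2, 1]
print(value, flip_lsb(value))    # 43 42
print(value, flip_msb(value, 6)) # 43 11
```

An error in the LSB changes the value by 1, while the same error in the MSB changes it by 32, much like the caterer's 157 versus 256 meals.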

Standard Sizes

Nibble: A nibble is 4 bits long. A nibble can represent the values 0 to 15 (in decimal).

Byte: A byte is 8 bits long. A byte can represent the values 0 to 255 (in decimal).

Word: A word is 16 bits, or 2 bytes, long. A word can represent the values 0 to 65535 (in decimal). This term is sometimes confused with "machine word"; see Machine Word below.

Double-Word: A double-word is 2 words, or 4 bytes, long. These are also known as "DWords". A double-word is 32 bits long, so 32-bit computers handle data that is the size of a double-word.

Quad-Word: A quad-word is 2 DWords, 4 words, or 8 bytes long. These are usually called "QWords". QWords are 64 bits long and are therefore the most common data size in 64-bit computers.

Machine Word: A machine word is the data size most commonly used by a given machine. For example, a 32-bit computer has a 32-bit machine word, and a 64-bit computer has a 64-bit machine word. The term "machine word" is sometimes shortened to simply "word", which leaves it ambiguous whether a machine word or a 16-bit word is meant.
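The unsigned ranges quoted above all follow from the rule that n bits can hold the values 0 to $2^{n}-1$. A minimal sketch, in which the `SIZES` table and the `unsigned_max` helper are names made up for this example:

```python
# Bit widths of the standard sizes listed in the text.
SIZES = {
    "nibble": 4,
    "byte": 8,
    "word": 16,
    "double-word": 32,
    "quad-word": 64,
}


def unsigned_max(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return 2 ** bits - 1


for name, bits in SIZES.items():
    print(f"{name:12} {bits:2} bits  0 .. {unsigned_max(bits)}")
```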

Negative Numbers

It might seem logical that, to represent a negative number in binary, one would only need to put a "-" sign in front of the number. For example, the binary number 1101 could be made negative simply by writing it as "-1101". This seems fine until we realize that computers and digital circuits do not understand minus signs. Digital circuits only have bits, so only bits can be used to distinguish negative numbers from positive ones. With this in mind, there are several schemes for making binary numbers negative or positive: sign and magnitude, one's complement, and two's complement.

Sign and Magnitude

In the sign-and-magnitude representation, the most significant bit (MSB) of a binary number indicates whether the number is negative or positive. If MSB = 0 the number is positive, and if MSB = 1 the number is negative. This scheme seems simple except for one thing: arithmetic in this representation is difficult. Take two 4-bit numbers, 1001 and 0111. In sign-and-magnitude representation they read as -001 and +111, which are -1 and +7 in decimal.

When we add them together, the sum -1 + 7 = 6 should be the result we get. However, adding the magnitude bits directly gives:

 001
+111
----
 000

And this is not correct: the carry out of the three magnitude bits is lost, and the sign was ignored altogether. What we need is logic that checks whether the MSB is set, subtracts if it is, and adds if it is not. This extra complexity is a serious drawback, which is why sign-and-magnitude is rarely used for integer arithmetic.

To summarize, in sign-and-magnitude representation a sign bit of 1 marks a negative number and a sign bit of 0 marks a positive number.
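The difficulty can be made concrete with a small sketch. The functions `decode_sign_magnitude` and `add_sign_magnitude` are hypothetical helpers written for this illustration; they work on MSB-first bit strings and are not meant to model how real hardware is built.

```python
def decode_sign_magnitude(bits):
    """Decode an MSB-first bit string whose first bit is the sign."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def add_sign_magnitude(a, b):
    """Add two equal-width sign-and-magnitude strings the 'careful' way."""
    width = len(a)
    total = decode_sign_magnitude(a) + decode_sign_magnitude(b)
    if abs(total) >= 2 ** (width - 1):
        raise OverflowError("result does not fit in the magnitude bits")
    sign = "1" if total < 0 else "0"
    return sign + format(abs(total), f"0{width - 1}b")


# Naive approach: feed the raw 4-bit patterns to an ordinary binary adder.
naive = (int("1001", 2) + int("0111", 2)) & 0b1111
print(format(naive, "04b"))                # 0000, which is wrong

# Sign-aware approach: inspect the sign bits before combining magnitudes.
print(add_sign_magnitude("1001", "0111"))  # 0110, which is +6
```

The sign-aware version has to decode, compare, and re-encode, which is the extra logic the text refers to.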

One's Complement

Let's now examine a scheme where we define a negative number as being the logical inverse of a positive number. We will use the "!" operator to express a logical inversion of multiple bits. For instance, !001100 = 110011. Read as unsigned values, 110011 is binary for 51 and 001100 is binary for 12, but in this scheme we say that 110011 = -001100, that is, 110011 (binary) = -12 (decimal). Let's perform the addition again:

 001100 (12)
+110011 (-12)
-------
 111111

We can see that if we invert 000000 (binary) we get 111111 (binary), and therefore 111111 (binary) is negative zero! What exactly is negative zero? It turns out that in this scheme, positive zero and negative zero represent the same value.

However, one's complement notation suffers because it has 2 representations for zero: all 0 bits, or all 1 bits. This is a bit clumsy, so we create a new representation, two's complement.
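A minimal sketch of one's complement on MSB-first bit strings follows. The helper names `invert_bits` and `decode_ones_complement` are assumptions introduced for this example.

```python
def invert_bits(bits):
    """Logical inversion of every bit (the '!' operator in the text)."""
    return "".join("1" if b == "0" else "0" for b in bits)


def decode_ones_complement(bits):
    """Interpret an MSB-first bit string as a one's complement value."""
    if bits[0] == "0":
        return int(bits, 2)
    return -int(invert_bits(bits), 2)


print(invert_bits("001100"))             # 110011
print(decode_ones_complement("110011"))  # -12
print(decode_ones_complement("000000"))  # 0
print(decode_ones_complement("111111"))  # 0
```

The last two lines show the two bit patterns that both decode to zero.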

Two's Complement

Two's complement is a number representation that is very similar to one's complement. We find the negative of a number X using the following formula:

-X = !X + 1

Let's do an example. If we have the 6-bit binary number 011001 (which is 25 in decimal; the leading 0 marks it as positive), and we want to find the representation for -25 in two's complement, we follow two steps:

Invert the bits. 011001 -> 100110
Add 1. 100110 + 1 = 100111

Therefore -011001 = 100111. Let's do a little addition:

 011001
+100111
-------
 000000

Now, there is a carry from adding the two MSBs together, but this is digital logic, so we discard the carry. It is important to remember that digital circuits have capacity for a certain number of bits, and any extra bits are discarded.

An important example of two's complement arithmetic is the C programming language: in practice, C implementations store negative integer values in two's complement form.
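The formula -X = !X + 1 can be tried out directly. The sketch below assumes a fixed width of 6 bits, so that 25 is written 011001 with a leading 0 as its sign bit. The names `twos_complement_negate` and `decode_twos_complement` are hypothetical helpers, and the mask stands in for a circuit discarding carries beyond its width.

```python
def twos_complement_negate(pattern, width):
    """Apply -X = !X + 1 and keep only the lowest `width` bits."""
    mask = (1 << width) - 1
    return (~pattern + 1) & mask


def decode_twos_complement(pattern, width):
    """Interpret a width-bit pattern as a signed two's complement value."""
    if pattern & (1 << (width - 1)):   # sign bit set
        return pattern - (1 << width)
    return pattern


WIDTH = 6
x = 0b011001                                   # +25
neg_x = twos_complement_negate(x, WIDTH)
print(format(neg_x, "06b"))                    # 100111
print(decode_twos_complement(neg_x, WIDTH))    # -25
print(format((x + neg_x) & 0b111111, "06b"))   # 000000 (carry discarded)
```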

Signed vs Unsigned

One important fact to remember is that computers are dumb. A computer doesn't know whether a given set of bits represents a signed number or an unsigned number (or, for that matter, any number of other data objects). It is therefore important for the programmer (or the programmer's trusty compiler) to keep track of this for us. Consider the bit pattern 100110:

Unsigned: 38(decimal)
Sign+Magnitude: -6
One's Complement: -25
Two's Complement: -26

See how the representation we use changes the value of the number! It is important to understand that bits are bits, and the computer doesn't know what the bits represent. It is up to the circuit designer and the programmer to keep track of what the numbers mean.
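The four readings of 100110 listed above can be reproduced with one small function. `interpretations` is a hypothetical name for this sketch, which takes an MSB-first bit string and returns all four values.

```python
def interpretations(bits):
    """Read one MSB-first bit string under four different conventions."""
    width = len(bits)
    unsigned = int(bits, 2)
    negative = bits[0] == "1"
    magnitude = int(bits[1:], 2)
    return {
        "unsigned": unsigned,
        "sign_magnitude": -magnitude if negative else magnitude,
        # One's complement: negative patterns are offset by 2**width - 1.
        "ones_complement": unsigned - (2 ** width - 1) if negative else unsigned,
        # Two's complement: negative patterns are offset by 2**width.
        "twos_complement": unsigned - 2 ** width if negative else unsigned,
    }


for name, value in interpretations("100110").items():
    print(f"{name:16} {value}")
```

The same six bits yield 38, -6, -25, and -26; only the convention chosen by the programmer or circuit designer differs.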

Character Data

We've seen how binary numbers can represent unsigned values, and how they can represent negative numbers using various schemes. But now we have to ask ourselves, how do binary numbers represent other forms of data, like text characters? The answer is that there exist different schemes for converting binary data to characters. Each scheme acts like a map to convert a certain bit pattern into a certain character. There are 3 popular schemes: ASCII, UNICODE and EBCDIC.

ASCII

The ASCII code (American Standard Code for Information Interchange) is the most common code for mapping bits to characters. ASCII uses only 7 bits, but since computers usually handle data in 8-bit bytes, an ASCII character is normally stored with an unused 8th bit as the MSB. ASCII codes 0-31 are "control codes": characters that are not printable and that the computer uses to handle certain operations. Code 32 is a single space (the space bar). The character code for '1' is 49, for '2' it is 50, and so on. Notice that in ASCII '2' = '1' + 1 (the character '1' plus the integer 1). This is difficult for many people to grasp at first, so don't worry if you are confused.

Capital letters run from 'A' = 65 to 'Z' = 90. Lower-case letters run from 'a' = 97 to 'z' = 122.

Almost all the rest of the ASCII codes are different punctuation marks.
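The difference between a character and its numeric code is easy to see with Python's built-in `ord` and `chr`. The helper `digit_value` below is a hypothetical name used only in this sketch.

```python
def digit_value(char):
    """Convert an ASCII digit character to the integer it depicts."""
    return ord(char) - ord("0")


print(ord("1"), ord("2"))   # 49 50
print(chr(ord("1") + 1))    # 2  (the character '2', not the integer 2)
print(ord(" "))             # 32
print(ord("A"), ord("Z"))   # 65 90
print(ord("a"), ord("z"))   # 97 122
print(digit_value("7"))     # 7
```

Subtracting the code for '0' is a common way to turn a digit character into its numeric value, because the digit characters occupy consecutive codes.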

Extended ASCII

Since computers use data that is the size of bytes, it made no sense to have ASCII only contain 7 bits of data (which is a maximum of 128 character codes). Many companies therefore incorporated the extra bit into an "Extended ASCII" code set. These extended sets have a maximum of 256 characters to use. The first 128 characters are the original ASCII characters, but the next 128 characters are platform-defined. Each computer maker could define their own characters to fill in the last 128 slots.

UNICODE

When computers began to spread around the world, they had to handle other languages. Before too long, each country had its own character code sets to represent its own letters. It is important to remember that some writing systems have more than 256 characters! Therefore, the UNICODE standard was proposed. UNICODE was originally designed around "wide characters" that are 2 bytes (1 word) long and can therefore encompass up to 65536 possible characters. Each language was then given a "page" of values in this 65536-character range. Later versions of the standard extended the range beyond 16 bits. For convenience, the first 128 characters of the UNICODE set are the original ASCII characters.
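As a small illustration, the sketch below looks up the UNICODE code point of a few characters and encodes each as a 2-byte "wide character" using Python's `utf-16-be` codec. The function name `describe` is an assumption for this example, and the sample characters were chosen arbitrarily.

```python
def describe(char):
    """Return (code point, 2-byte big-endian encoding) for one character."""
    return ord(char), char.encode("utf-16-be")


for char in ["A", "\u00e4", "\u20ac"]:   # 'A', a-umlaut, euro sign
    code_point, wide = describe(char)
    print(f"U+{code_point:04X}  {len(wide)} bytes  {wide.hex()}")
```

The letter 'A' keeps its ASCII code 65 (hexadecimal 41), padded with a zero byte to fill the 16-bit word.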

EBCDIC

EBCDIC (Extended Binary Coded Decimal Interchange Code) is a character code that was originally proposed by IBM but was passed over in favor of ASCII. IBM, however, still uses EBCDIC in some of its supercomputers, mainframes, and server systems.

Other Number Representations

Similarly to Binary, the Octal representation has been very popular for some facets of data representation.

Octal

Octal numbers have a base of 8 and use the digits 0-7. For instance, if we examine the octal number 347:

$347_{8}=3\times 8^{2}+4\times 8^{1}+7\times 8^{0}=231_{10}$

Octal and Binary

Octal has the interesting property that converting between octal and binary is especially easy. Consider the binary number 101110000. To convert it to octal, we first break it up into groups of 3 bits, starting from the right: 101, 110, 000. Then we simply add up the values of the bits in each group:

$101_{2}=1\times 2^{2}+0\times 2^{1}+1\times 2^{0}=5_{8}$
$110_{2}=1\times 2^{2}+1\times 2^{1}+0\times 2^{0}=6_{8}$
$000_{2}=0_{8}$

And then we put all the octal digits together: 560(octal).
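The grouping procedure can be sketched as follows. `binary_to_octal` is a hypothetical helper for this example; it pads the input on the left with zeros so that its length is a multiple of 3, a step the worked example did not need because it already had 9 bits.

```python
def binary_to_octal(bits):
    """Convert a binary string to octal by reading it in 3-bit groups."""
    padding = (-len(bits)) % 3            # zeros needed on the left
    bits = "0" * padding + bits
    groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join(str(int(group, 2)) for group in groups)


print(binary_to_octal("101110000"))  # 560
print(binary_to_octal("1011"))       # 13
```

Each 3-bit group maps to exactly one octal digit because $2^{3}=8$.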

Hexadecimal is a very common data representation. It is more common than octal, but it can be a little harder to learn and understand.

Hexadecimal

Hexadecimal uses a base of 16. It therefore requires 16 digits, but the decimal system supplies only 10 (0 through 9). To make up the difference, we use the letters A through F in addition to the digits 0-9. The following table lists each hexadecimal value alongside its decimal, octal, and binary equivalents:

Hex	Decimal	Octal	Binary
0	0	0	0000
1	1	1	0001
2	2	2	0010
3	3	3	0011
4	4	4	0100
5	5	5	0101
6	6	6	0110
7	7	7	0111
8	8	10	1000
9	9	11	1001
A	10	12	1010
B	11	13	1011
C	12	14	1100
D	13	15	1101
E	14	16	1110
F	15	17	1111
10	16	20	10000
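The rows of the table can be regenerated with Python's built-in `format`, which is a convenient way to check any single entry. The function name `table_row` is an assumption for this sketch.

```python
def table_row(n):
    """Return (hex, decimal, octal, binary) strings for a non-negative integer."""
    return (format(n, "X"), str(n), format(n, "o"), format(n, "04b"))


for n in range(17):
    print("\t".join(table_row(n)))
```

The binary column is padded to at least 4 bits, so one hexadecimal digit always lines up with one 4-bit nibble.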

Hexadecimal Notation

Depending on the source you are reading hexadecimal may be indicated in one of several ways:

0xaa11: The normal "C" notation for hexadecimal. The "0x" prefix indicates that the remaining digits are all hexadecimal digits. For instance, 0x1000 (4096 in decimal) is not the same as 1000 (decimal).

\xaa11: The "C string" notation for hexadecimal.

0aa11h: The "ASM" notation for hexadecimal. The "h" suffix shows that the number is hexadecimal. This is possible because "h" is not a decimal digit. The '0' (zero) at the beginning is to ensure the assembler reads it as a number rather than as a label.

#AA11: The "BASIC" notation for hexadecimal.

aa11₁₆: The "math" notation for hexadecimal.

Both uppercase and lowercase may be used. Lowercase is generally preferred in a UNIX or C environment, while uppercase is generally preferred in a mainframe or COBOL environment. When not on a computer, however, either case is equally fine.
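To show that these notations all name the same value, the sketch below strips the decoration and parses what remains as base 16. `parse_hex` is a hypothetical helper written for this example; it only recognizes the prefixes and suffix listed above and is not a general-purpose parser.

```python
def parse_hex(text):
    """Parse a hexadecimal literal written in one of the listed notations."""
    digits = text.strip().lower()
    if digits.startswith(("0x", "\\x")):   # C and C-string style
        digits = digits[2:]
    elif digits.startswith("#"):           # '#' prefix style
        digits = digits[1:]
    elif digits.endswith("h"):             # assembler suffix style
        digits = digits[:-1]
    return int(digits, 16)


for literal in ["0xaa11", "\\xaa11", "0aa11h", "#AA11"]:
    print(literal, parse_hex(literal))     # each prints 43537

print(format(43537, "x"), format(43537, "X"))  # aa11 AA11
```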

For further reading

Floating Point/Fixed-Point Numbers