Moving Your Applications from COBOL to C
Shawn M. Gordon
S.M.Gordon & Associates


Over the last few years I have read many fine (and not so fine) books, articles and columns on the C language. I have even read several that compared it to languages like PASCAL and SPL. What I have never seen, and have been waiting for, is one that showed how a COBOL programmer, such as myself, might find it easier to jump onto the C bandwagon. Finally I decided I was going to have to write it myself, so I wrote a review of a C compiler, and during the course of the review rewrote several of my COBOL applications into C. I should also mention that I have taken several C classes over the last four years.

What I hope to do here is help those COBOL programmers out there who are wondering what all the C fuss is about, and whether they should bother learning a whole new language. If you do decide to make the switch, hopefully you will find enough information here to help get you started. Remember, this is not intended to teach you the C language; it is meant to point you in the right direction when trying to come up with command equivalencies so you have a better starting point.

Remember that C and COBOL come from completely different roots. This is what makes comparisons very difficult, and it is also why making comparisons to PASCAL is very easy.


COBOL gives us a couple of options for sharing previously defined record layouts and variable declarations. The easiest and most straightforward is the $INCLUDE 'filename'. This will take an MPE numbered file and copy it into the source file at compile time. This works just fine if you aren't worried about modifying any of the statements.

The more versatile option is to use COBEDIT and maintain a COPYLIB. This is a KSAM file that contains all the different sequences of COBOL commands that you are interested in maintaining in a global area. The advantage to using the COPY command to retrieve members from a COPYLIB is that you can do dynamic text replacement. So if you want to define your customer master data set twice, once for the data set and once for a file, you could set up a prefix in the COPYLIB that you change at compile time by using the command;

COPY textname REPLACING ==pseudo-text-1== BY ==pseudo-text-2==.

Thus eliminating the need to maintain two different definitions in your COPYLIB.

While C does have an 'include' statement, it doesn't have an equivalent to the COPY statement. However, by using the '#if', '#ifdef', '#elif' and '#ifndef' commands you can control what files get included at compile time. Say for example you are coding some sort of portable application and you have defined different file I/O functions in different files to accommodate your different platforms. You could control it very easily with the following construct;

#define HP3000 1
#define VAX    2
#define AS400  3
#define SYS HP3000   /* #if can only compare integers, not strings */
#if SYS == HP3000
#include "hp3kio.h"
#elif SYS == VAX
#include "vaxio.h"
#elif SYS == AS400
#include "as4io.h"
#endif

There are several important things to learn here. First off, the #define command is almost the same as the $DEFINE in COBOL. They both allow you to define and name a macro; the macro creates "in-line" code, so it will generate more efficient, if slightly larger, code. However, the COBOL define statement can be as long as you want, whereas a C macro must fit on the line that contains the #define statement (ending a line with a backslash continues it onto the next one). Passing parameters to a C macro is more confusing as well; just make sure to sprinkle parentheses liberally to ensure the order of precedence.
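
To see why the parentheses matter, here is a minimal sketch. The macro names SQUARE_BAD and SQUARE and the little functions around them are made up for this illustration; they are not part of C or of any library.

```c
#include <stdio.h>

/* Hypothetical macros, named only for this illustration. */
#define SQUARE_BAD(x) x * x          /* no parentheses: precedence can bite */
#define SQUARE(x)     ((x) * (x))    /* parenthesize each use and the whole */

/* The text 2 + 3 is pasted in as is, so this becomes 2 + 3 * 2 + 3 */
int square_bad_of_sum(void)
{
    return SQUARE_BAD(2 + 3);
}

/* This becomes ((2 + 3) * (2 + 3)) */
int square_of_sum(void)
{
    return SQUARE(2 + 3);
}

void show_squares(void)
{
    printf("%d %d\n", square_bad_of_sum(), square_of_sum());  /* 11 25 */
}
```

Calling show_squares() from main() displays both results side by side. The macro is pure text substitution, which is why the unparenthesized version quietly gives the wrong answer.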

The #define can also work rather like an 88 level variable in COBOL, for example;

#define FALSE 0
#define TRUE (!FALSE)

I could then say 'if FALSE' just like in COBOL with an 88. The advantage here is that I can define TRUE to be 'not' FALSE by use of the ! (which means not) in the second #define statement. This doesn't really have an equivalent in COBOL.

Now that I have explained just enough about the #define statement to get us through our first example, why don't I finish explaining the example.

Since we have defined SYS to be HP3000, the first #if test will be true and the file hp3kio.h will be copied into the program at compile time. The #elif is a really stupid way of saying 'else if'; only the assembler gods know why they chose to abbreviate it the way they did. So if we want to compile our code to run on a VAX we only need to change the #define line, and then our code will run on a VAX if we have done our initial setup correctly.

Believe it or not you could accomplish almost the same thing using the little known and rarely used compiler switches available in COBOL. By using the following commands you could accomplish a similar effect.


It is important to note that while COBOL is not case sensitive, C is. So myvar, Myvar, and MYVAR would actually be three different things. An informal standard that I have seen is that all your 'regular' C code is in lower case and any macros, and sometimes functions, that you define are done in upper case to make it quick and easy to distinguish between what is part of C and what is yours. This isn't a bad idea at all.

One last note while we are talking about getting started is how to put comments in your code. In COBOL you put a * in column 7 and then type whatever you want, or you can use the last 5 characters of each line to put something meaningful. C is a little like SPL in the way it deals with comments. Your comments must be enclosed in matching /* ... */; this can occur on a line with commands, by itself, or spread across lines. Here are a few examples;

/* here is my comment */
/* here is another comment
   that spans multiple lines */
a=b; /* here is a comment on a line with a command */


OK the most obvious next step is how to define variables in C. This is where C really has it over COBOL in some respects, as you can define global AND local variables. This would be the equivalent of having something like a state validation paragraph in a COBOL program that declared all the variables it needed at the top of the paragraph and then they went away when the paragraph returned to the calling process. Imagine, no more global variables getting declared that only got used once, or even not at all.
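
Here is a small sketch of the idea. The names error_count and valid_state, and the rule that only "CA" is a valid state, are invented for the illustration.

```c
int error_count = 0;   /* global: visible to every function in this file */

/* Hypothetical validation routine, rather like a COBOL validation paragraph. */
int valid_state(const char *code)
{
    int ok;            /* local: exists only while valid_state is running */

    ok = (code[0] == 'C' && code[1] == 'A' && code[2] == '\0');
    if (!ok)
        error_count++; /* the global keeps its value between calls */
    return ok;
}
```

The variable 'ok' comes into existence each time valid_state is called and goes away when it returns, while 'error_count' lives for the whole run of the program.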

C differs significantly from COBOL in the way that variables are defined. The easiest way to illustrate the differences for integer types is to compare them to IMAGE integer types and then declare them in each language. Since I, J and K are basically the same, and J is the only one used commonly in COBOL, I will use that.



IMAGE   COBOL                 C
J       PIC S9(4) COMP        short
J2      PIC S9(9) COMP        long
J4      PIC S9(18) COMP       extended   /* CCS specific */
Z       PIC 9                 char
R2                            float
R4                            double
P4      PIC S9(3) COMP-3      char[3]
P8      PIC S9(7) COMP-3      char[7]
P12     PIC S9(11) COMP-3     char[11]
P16     PIC S9(15) COMP-3     char[15]

Real numbers can't directly be used in COBOL, but C can use them. COBOL typically uses an implied decimal in a J2 type field.

An important note here on declaring simple integers in C: a standard declaration would be 'int my_counter'. The length of my_counter depends on the native architecture of the machine it is compiled for. So on a Classic MPE V machine saying 'int' will declare a 16 bit integer, or PIC S9(4) COMP. The problem is that if you move that code to a Spectrum machine that declaration will suddenly be 32 bits, or PIC S9(9) COMP. We get around that by declaring them to be either a 'short' int, which is 16 bits, or a 'long' int, which is 32 bits.
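
If you want to see what your own compiler does, a sketch like the following will tell you. The variable and function names are made up. Note that the language itself only promises minimum sizes (16 bits for 'short', 32 for 'long'), and on many current 64-bit systems 'long' is larger than 32 bits.

```c
#include <stdio.h>

short item_count;      /* at least 16 bits, like PIC S9(4) COMP */
long  total_cents;     /* at least 32 bits, like PIC S9(9) COMP */
int   native_counter;  /* size depends on the machine */

/* Displays the size in bytes that this compiler gives each type. */
void show_sizes(void)
{
    printf("short=%lu long=%lu int=%lu bytes\n",
           (unsigned long) sizeof(short),
           (unsigned long) sizeof(long),
           (unsigned long) sizeof(int));
}
```

Running show_sizes() on each machine you care about is a quick way to find out whether a plain 'int' is going to surprise you.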

Our next variable type is the string or character array. C deals with strings in an extremely annoying way. Because everything is a single character you have to define an array of characters, and this is also how you have to reference it. Here is an example of how you could do it in COBOL and how you would have to do it in COBOL if it worked like C;

char name[8]; /* a variable called name that is eight characters */

01 NAME PIC X(08).

or if it was defined like C would have you do


Actually, strings in C are always null terminated, so if you need to hold an eight character string you must declare the array with nine elements to account for the null character at the end.

The most basic way to initialize a character array to spaces in C is to move a single character to each element of the array (the library function 'memset' can do the same job in one call). Here would be your choices in the two languages;


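int i;
char name[9];
for (i = 0; i < 8; i++)   /* elements are numbered from 0 */
    name[i] = ' ';        /* single quotes: one character */
name[8] = '\0';           /* terminating null */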

Pretty nasty huh? I will explain all the things in the C statement that didn't make sense later. C also differentiates between a character and a string. Since the 'char' type only really declares a single character those don't need to be null terminated, so these two declarations are different;

char sw;
char sw2[1];

(The names are 'sw' and 'sw2' because 'switch' is a reserved word in C and cannot be used as a variable name.) You would actually need to make 'sw2' an array of two characters for it to function the way you would expect it to because of the null terminator on strings. A null is written in C as \0, and all of the string manipulation functions rely on the proper placement of the null character so that you will get the expected output.

The distinction between a character and a string in C takes a little getting used to. For example, if you want to initialize our character variable sw to Y you would enclose it in the single quote character ', i.e., 'Y'. Single quotes denote that it is a single character, whereas for sw2 you would use double quotes " to enclose the string, i.e., "Y". It is VITAL that you keep this straight; some compilers won't complain if you use this incorrectly and you could get some really unpredictable results.
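
A minimal sketch of the two forms side by side (the variables are named sw and sw2 here, since 'switch' itself is reserved, and set_switches is a made-up function name):

```c
#include <string.h>

char sw;        /* one character, no terminator needed */
char sw2[2];    /* one character plus the terminating null */

void set_switches(void)
{
    sw = 'Y';             /* single quotes: a character constant */
    strcpy(sw2, "Y");     /* double quotes: a string, copies 'Y' and the null */
}
```

After the call, 'sw' holds the single character Y, and 'sw2' holds Y in its first element and the null in its second.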

Now that we know how to declare simple variables in C, how would we declare a record structure analogous to the 01 variable declaration in COBOL? There is what is known as the 'struct' in C that is used for this exact purpose, although the implementation is a little bit odd. First let's declare a simple layout in COBOL, then I'll do the same in C;

01  CUSTOMER-MASTER.
    03 CM-NUMBER      PIC X(06).
    03 CM-NAME        PIC X(30).
    03 CM-AMT-OWE     PIC S9(9) COMP.
    03 CM-YTD-BAL     PIC S9(9) COMP OCCURS 12.
    03 CM-PHONE       PIC X(10).

struct customer_master {
       char cm_number[7];
       char cm_name[31];
       long cm_amt_owe;
       long cm_ytd_bal[12];
       char cm_phone[11];
};
struct customer_master cust_mast;

The 'struct' verb declares a template of the record type that you are concerned with; once the template is declared you can then declare a variable that is a type of that structure. So the line 'struct customer_master cust_mast' declares a variable 'cust_mast' to be of a type 'customer_master'. You would then reference the members of the structure by specifying the variable name dot member, i.e., 'cust_mast.cm_name'.

This can be especially handy if you are going to reuse a structure for a different purpose. The drawback here is that there is no convenient way to initialize the variables inside of a structure without addressing each member individually. COBOL has the very handy INITIALIZE verb to do this; you could, however, write a general purpose initialization function in C that would serve the same purpose.
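
As a rough sketch of such a function for the customer master layout: blank_customer is a name I made up, and it empties the strings rather than filling them with spaces, so it is only an approximation of what INITIALIZE does.

```c
struct customer_master {
    char cm_number[7];
    char cm_name[31];
    long cm_amt_owe;
    long cm_ytd_bal[12];
    char cm_phone[11];
};

/* Hypothetical helper: returns a record with empty strings and zero amounts. */
struct customer_master blank_customer(void)
{
    struct customer_master c;
    int i;

    c.cm_number[0] = '\0';      /* a null in the first element = empty string */
    c.cm_name[0]   = '\0';
    c.cm_amt_owe   = 0;
    for (i = 0; i < 12; i++)    /* array elements are numbered 0 through 11 */
        c.cm_ytd_bal[i] = 0;
    c.cm_phone[0]  = '\0';
    return c;
}
```

You would then write 'cust_mast = blank_customer();' wherever the COBOL program would have said INITIALIZE. Assigning one whole structure to another with = is allowed in ANSI C, which is what makes this work.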

You can declare the variable at the same time as you declare the structure if you don't want to reuse the template. After the } and before the ; just put any old variable name that you want it to have.

The last common verb used in the Working Storage section is REDEFINES. At first I didn't think there was an equivalent, but I was wrong, it is the 'union' verb. Redefines is mostly handy for working with a variable as either alpha or numeric. Since byte referencing was introduced in COBOL-85, you hardly ever see a REDEFINE statement used to get at various substrings within a variable anymore. Now let's look at how you would declare a REDEFINE and a 'union'.

        05 CM-DL-NUM PIC 999.

union redef {
char days_late[4];
int dl_num;
};
union redef days_late_test;

The setup and use of unions is very similar to structs; you can even put a union inside a struct, which is where you would want to use it most of the time anyway. We made 'days_late' a character array of 4 because we have to remember to account for the null character. Keep in mind that the two members simply share the same storage: 'dl_num' is a binary integer, so unlike a COBOL PIC 999 redefinition it does not read the digits in 'days_late' as a number. You can do all sorts of strange things with unions if you care to, but that is really all I am going to touch on.

One other type that I want to touch on is the enumerated type. By using the 'enum' keyword, we can create a new "type" and specify values it may have. (Actually, 'enum' is type 'int', so we really create a new name for an existing type.) The purpose of the enumerated type is to help make a program more readable, like the COBOL 88 level. The syntax is similar to that used for structures;

enum ranges {min = 10, max = 100, mid = 55};
enum ranges tester;
tester = mid;
if (tester > min)

The if statement would be true because tester would have a value of 55. I suggest that if you want to use enumerated types that you read up on them a heck of a lot more than what I just touched on here.
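
Here is the same fragment wrapped up so it can be compiled. The function name above_min is my own invention; the 'enum' declaration is the one shown above.

```c
enum ranges { min = 10, max = 100, mid = 55 };

/* Returns 1 (true) when the value is above the minimum, else 0. */
int above_min(enum ranges tester)
{
    if (tester > min)
        return 1;
    return 0;
}
```

Calling above_min(mid) returns 1, because 'mid' stands for 55 and 55 is greater than 10.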

The last point I want to make about variable declaration is that C has very little facility for applying edit masks compared to COBOL. This makes it a less than convenient language for writing reports and such where date and dollar edit masks are used extensively.


Let's first go through the difference between = and == in C. In COBOL there are several ways to get data into a variable. A common way is the MOVE verb; the equivalent in C is =. This is confusing because if you set up a logical test and use =, nothing gets compared: the value on the right hand side of the equal sign is assigned to the variable on the left side, and that value becomes the result of the test. You need to use == if you want to compare values. Here is a simple mistake to make that can cause all sorts of problems. If in C I were to say

if(my_int = 5)

what would happen is that the variable 'my_int' would be assigned the value 5, and the value of the assignment, 5, is non-zero, which indicates true. So this 'if' statement would always be true. Simple mistake, major repercussions, and the compiler is not required to complain about it because it's a valid statement. Make sure you learn the distinction between = and == early and never forget it.
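
The following sketch makes the point visible by returning the value that the 'if' would have tested. The three function names are made up for the illustration.

```c
/* The value of an assignment is the value that was assigned. */
int value_of_assignment(void)
{
    int my_int = 0;
    int result;

    result = (my_int = 5);   /* my_int becomes 5 and so does result */
    return result;           /* 5 is non-zero, so an 'if' sees true */
}

int value_of_zero_assignment(void)
{
    int my_int = 7;

    return (my_int = 0);     /* 0, which an 'if' treats as false */
}

/* A comparison changes nothing and yields 1 or 0. */
int value_of_comparison(int my_int)
{
    return (my_int == 5);    /* 1 when equal, 0 when not */
}
```

So 'if (my_int = 5)' is always true and 'if (my_int = 0)' is always false, no matter what 'my_int' held beforehand.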

Our next boolean operator is 'AND'. COBOL makes it very easy to compare multiple values in an IF statement;


The C representation for 'AND' is '&&', 'OR' is '||', and 'NOT EQUAL' is identified by '!='.

None of these are particularly difficult to learn, but they are less than intuitive when first learning the language.
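
As a quick sketch of these operators working together (the function needs_review and its credit rules are hypothetical, chosen only to have something to test):

```c
/* Hypothetical credit check combining several tests in one 'if'. */
int needs_review(long amt_owe, int days_late, char status)
{
    /* (owes a lot AND is late) OR the account is not active */
    if ((amt_owe > 1000 && days_late > 30) || status != 'A')
        return 1;
    return 0;
}
```

The extra parentheses around the && part are not strictly required, since && binds tighter than ||, but they save the next reader from having to remember that.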

Being creative, you could make use of the #define to make the operators anything you want, for example;

#define AND &&
#define OR ||
#define NOT !=
#define EQUALS ==

Now this may tick off the other C programmers, but if it helps you to make your code more readable and easier to maintain, who's to say it's wrong?


Since I already talked about how the = sign in C is equivalent to the COBOL MOVE verb, I won't talk about it again. However, the = sign cannot copy a string into a character array; for that you use the string function 'strcpy'. This performs a 'string copy' into a variable. This is a good way to initialize a character array and make sure that a null is properly placed (strcpy appends the null to the end of the string copied in), especially if you want to append data to the string later. The verb in C to append strings is 'strcat' for 'string concatenation'. In COBOL we have the 'STRING' verb, which gives you much finer control over how variables are concatenated. All that 'strcat' does is copy one character at a time, until it hits a null, into another character array, starting at that array's terminating null. You can see now why having a null in the right place is so important.

There are a couple of really cool increment operator shortcuts in C that I absolutely love. In COBOL if you want to increment a variable in a loop for instance you can do it a couple of ways;


In C you could say the following;

kount = kount + 1;
kount += 1;
kount++;
for (kount = 1; kount < max; kount++)

The first example should be obvious. In the second example the += is a shortcut for the first; it means to include the variable on the left side of the = sign in the computation on the right side. The third example means increment the value by 1. Our first two examples could have had any value in the addition, but the third one simply increments by one. The last example corresponds to the last COBOL example, with one catch: the middle expression of a 'for' says when to keep looping, the opposite sense of an UNTIL test, so a loop that stops once kount is greater than or equal to 'max' is written with 'kount < max'. What is interesting is the last parameter 'kount++'. It runs after each pass through the loop, and there it makes no difference whether you write 'kount++' or '++kount'. The position of the ++ matters only when the value of the expression is used: with the ++ on the right side the old value is used and the variable is incremented afterwards, and with the ++ on the left side, as in '++kount', the variable is incremented BEFORE its value is used. Don't confuse this with the PERFORM phrases 'WITH TEST BEFORE' and 'WITH TEST AFTER', which decide when the loop test itself is made. The - sign can be used to decrement in the exact same fashion, i.e., -= or --.
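
A small sketch of these operators in action follows. The function names are made up; the two short functions show what value you get when ++ sits on the right of the variable and when it sits on the left.

```c
/* Adds 1 + 2 + ... + max using the shortcut operators. */
int sum_to(int max)
{
    int kount;
    int total = 0;

    for (kount = 1; kount <= max; kount++)
        total += kount;      /* same as total = total + kount */
    return total;
}

int post_then_total(void)
{
    int kount = 5;
    int seen = kount++;      /* seen gets 5, then kount becomes 6 */

    return seen + kount;     /* 5 + 6 */
}

int pre_then_total(void)
{
    int kount = 5;
    int seen = ++kount;      /* kount becomes 6, then seen gets 6 */

    return seen + kount;     /* 6 + 6 */
}
```

Here sum_to(4) returns 10, post_then_total() returns 11 and pre_then_total() returns 12.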


Hopefully everyone here is familiar with the COBOL PERFORM statement. It has more variations than I am willing to get into, but it is the only looping construct COBOL has, unless you count using GO TO. C offers several different looping controls; there are 'while', 'for' and 'do..while'.




Now COBOL nicely lumps the functionality of both the 'while' and 'for' loops into the PERFORM. The 'do..while' loop however is not explicitly the same, although you can simulate it by controlling your variables correctly. In essence, the difference between 'while' and 'do..while' is that the 'do..while' loop will 'do' the loop at least once, since the test isn't made until AFTER the loop has been executed once. In the 'while' loop your test may not be true the first time it is made, so you may never actually go through the loop.

int i = 21;
while(i++ < 20) {
   printf("%d\n", i);
}
/* the above loop will display nothing since i is already greater than 20 */

i = 21;
do {
   printf("%d\n", i);
} while(i++ < 20);
/* the above loop will display 21 */

Let's talk about the 'if' statement; here is an example of how confusing a string comparison operation can be. In COBOL the following statement is very straightforward


You cannot compare strings that way in C. There is a function in the 'string.h' header file that will compare two strings, however to get the same result it would have to be worded as follows;

if (!strcmp(string1,string2)) {

Let me explain. First off, 'strcmp' is a string compare function (its cousin 'strncmp' takes a third parameter, the maximum number of characters to compare). If string1 is less than string2 then a value less than zero will be returned (sort of like using Condition Code in COBOL). If string1 is equal to string2 then a value of zero is returned. If string1 is greater than string2 then a value greater than zero is returned. The problem is that if you use strcmp in an IF statement and the strings are equal then zero is returned, and zero indicates that the IF statement is false; that is why we prefaced the 'strcmp' with '!', which means not. This has the net result of producing a non-zero value, which is TRUE for the IF statement, if string1 and string2 are equal. This also further illustrates how logical expressions can be embedded in the 'if' statement.
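
Here is a compilable sketch of the comparison using 'strcmp', the two-parameter compare function from string.h. The wrapper names same_string and same_string_explicit are mine.

```c
#include <string.h>

/* Returns 1 when the two strings hold the same characters, else 0. */
int same_string(const char *string1, const char *string2)
{
    if (!strcmp(string1, string2))   /* strcmp returns 0 for a match */
        return 1;
    return 0;
}

/* The same test spelled out; some people find this easier to read. */
int same_string_explicit(const char *string1, const char *string2)
{
    return strcmp(string1, string2) == 0;
}
```

Writing the test as 'strcmp(a, b) == 0' instead of '!strcmp(a, b)' is purely a matter of taste; both mean "the strings are equal".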


C is blessed with having no I/O facilities built into the language at all. Sort of like SPL, but SPL has the advantage of using the I/O intrinsics easily. So how do you do any sort of terminal or file I/O? Fortunately somebody back in the dark ages of C programming wrote the Standard I/O header file. So if you want to do any I/O you must include <stdio.h>. I will talk about some of the more basic features and functions included in stdio.

The most commonly used functions from stdio are 'printf' and 'scanf'. 'printf' is used to display information to STDLIST and 'scanf' is used to read information from STDIN. Both of these functions have extensive formatting capabilities included in their usage. COBOL is nice because you can use DISPLAY and ACCEPT to read or write virtually anything you want. You can't however do type conversion or variable formatting in the statements themselves; you would have to declare a formatting variable in working storage first. While C gives very little to no capability for declaring formatting variables, it does give you extensive control over formatting your output. A simple example would be displaying a number that has two decimal places.

01 EDIT-INT     PIC ZZ9.99.
01 MY-INT       PIC S9(3)V99.


float my_int;

As you can see, it's simpler to format in C, if somewhat less intuitive. The way 'printf' works is that it takes a literal and/or formatting string in the first parameter, which is the part inside the quotes, and then takes the variables to substitute in the parameters that follow. The \n means issue a new line at that point. Here is an example of embedded text;

printf("I have %d apples and %d oranges",count_apple, count_orange);

This would substitute 'count_apple' for the first %d and 'count_orange' for the second. There are almost a dozen different formatters available in 'printf'.
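
To give a feel for the formatters, here is a sketch that uses 'sprintf', the relative of 'printf' that writes into a character array instead of to the terminal. The function names are invented, and the format "%6.2f" (six characters wide, two decimal places) is my own choice as a rough stand-in for a ZZ9.99 edit picture.

```c
#include <stdio.h>

/* Formats an amount into out, which must hold at least 7 characters
   for values below 1000. */
void format_amount(char *out, double amount)
{
    sprintf(out, "%6.2f", amount);   /* width 6, 2 decimal places */
}

void show_fruit(int count_apple, int count_orange)
{
    printf("I have %d apples and %d oranges\n", count_apple, count_orange);
}
```

With this, format_amount(buf, 12.5) leaves the six characters " 12.50" in 'buf', leading space included, much as the edit picture would.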

Another interesting feature of 'printf' is its ability to do type conversion. For this example you need to know that %d means print an integer and %c means print a character;

printf("%c %d\n", 'A', 'A');

What do you think the output of this would be? Odds are you guessed wrong. You would see "A 65", because by specifying %d for the alpha character 'A' it would format it as the decimal ASCII code for 'A', which is 65.

Like 'printf()', 'scanf()' uses a control string followed by a list of arguments. The main difference is in the argument list. printf() uses variable names, constants, and expressions. scanf() uses pointers to variables. Fortunately, we don't have to know anything about pointers to use the function. Just remember these two rules:

1. If you want to read a value for a basic variable type, precede the variable name with an &.

2. If you want to read a string variable, don't use an &.

Here is a short example of displaying output and prompting for input:

int age;
float assets;
char pet[30];
printf("enter your age, assets, and favorite pet.\n");
scanf("%d %f", &age, &assets);
scanf("%s", pet); /* no & for char array */
printf("%d $%.0f %s\n ", age, assets, pet);

Which would look something like this if you were to run it.

enter your age, assets, and favorite pet.
32 $10507 penguin

An interesting point here is that the scanf() can read more than one variable in at a time, unlike the COBOL ACCEPT verb. One last point on printf(), the following 2 statements work identically:

printf("Enter your option ");

DISPLAY "Enter your option " NO ADVANCING.

As I mentioned earlier, C is geared towards single characters, not strings. So there is a whole set of functions in stdio that are geared towards reading and writing single characters. Since COBOL cares very little about how big an array you use to read or write I am not going to get into the specifics, just remember these four function names;


You could use the MPE file I/O intrinsics if you didn't want to use the functions in stdio. You can even mix and match the two I/O facilities if you wish, just like you can in COBOL.

I am going to cover just two more COBOL verb comparisons before I get into showing some small program shells. Two of my favorite verbs are STRING and UNSTRING, they are used to concatenate variables and literals and to parse strings based on user defined tokens. In general they are very easy to use, their C counterparts however aren't. I will just run through the COBOL example and then show the exact same code in C.


           "." DELIMITED BY SIZE
           "." DELIMITED BY SIZE



#include <stdio.h>
#include <string.h>
char ws_full_name[27];
struct full_file_name {
char file[9];
char group[9];
char acct[9];
} fn;


/* displays "MYFILE.MYGROUP.MYACCT" */

You know, I don't think I am going to show the equivalent of UNSTRING; it is just too confusing if you are just getting started with C. The function is called 'strtok' and it requires that you use pointers to strings, and since I didn't really get into pointers at all I don't want to confuse the issue. Keep in mind however that you MUST understand how pointers in C work or you will never be able to use the language effectively. It's just that a full discussion of pointers is beyond the scope of this paper.

Anyway, in our C example we used the 'strcpy' command to copy a string into a variable. The function also adds the null terminator and essentially initializes our character array. The 'strcat' function concatenates the string in the second parameter to the variable named in the first parameter. It looks for the null terminator and then starts writing the string at that point.
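
A self-contained sketch of that technique looks like this. The structure is the one declared above; the function name build_full_name is made up for the illustration.

```c
#include <string.h>

struct full_file_name {
    char file[9];
    char group[9];
    char acct[9];
};

/* Builds "file.group.acct" in out, which must hold at least 27 characters. */
void build_full_name(char out[], struct full_file_name fn)
{
    strcpy(out, fn.file);     /* start the string; strcpy adds the null */
    strcat(out, ".");         /* each strcat finds the null and appends */
    strcat(out, fn.group);
    strcat(out, ".");
    strcat(out, fn.acct);
}
```

If the three members hold MYFILE, MYGROUP and MYACCT, the array passed as 'out' ends up holding MYFILE.MYGROUP.MYACCT. Neither 'strcpy' nor 'strcat' checks the size of the target, so it is up to you to make the array big enough.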

As you can see, it is more difficult and roundabout to deal with strings in C than COBOL. Now don't get the wrong idea, I think C is great for some things, it's just that if you are coding a standard business type application a lot of those things aren't necessary.


OK, so just what do you have to do to write a program in C? As you know, COBOL has its four divisions that must be used;

IDENTIFICATION DIVISION.
ENVIRONMENT DIVISION.
DATA DIVISION.
PROCEDURE DIVISION.

Each of these has its own purpose in life as to what it describes. In C everything is based on the function. As you saw in my previous example I had 'main()' towards the top of my program. You can think of main() as the PROCEDURE DIVISION. It is a function just like everything else in your C program will be, and it is usually the only function you MUST have. I say usually because if you are writing a series of subroutines that are going to go into an RBM you don't need to have a function named main().

So to write the classic 'Hello World' program in C, this is all you would need to do.

#include <stdio.h>
int main(void) {
   printf("Hello World\n");
}

That is a lot less code than it would take under COBOL, but it's still more than you would have to do in BASIC. The problem that I see here is that having to do the include of stdio.h just for the one printf() statement causes your program to be fairly large for something that is so trivial.


You hear a lot about how portable C is, and how you should code for portability. Let's be honest, how many of you are really concerned about moving your application across various platforms? The only real portability I would be interested in is MPE V to MPE/iX. There are just so many things that change from platform to platform that it could be really tedious to code for portability.

If you code to take advantage of your CPU's native architecture you will see a great increase in speed and reduced code size. If you code for portability by using the language's native constructs you will have a larger, slower program. If you want the best of both worlds you will want to code your own intermediate include files that allow you to switch between architectures. A good example is the file I/O intrinsics: if you always call my_fopen you can change what my_fopen does, whether that is calling the MPE intrinsic FOPEN, the UNIX version, or even HPFOPEN for MPE/iX.

The last method is probably the preferred way to code if you want the best of both worlds. It isn't a trivial amount of work however, so seriously consider whether you are interested in ultimate portability or not.
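
A bare-bones sketch of the my_fopen idea might look like the following. The TARGET names are hypothetical, and only the standard C library branch is filled in; the platform-specific branch is left as a placeholder because the intrinsic calling sequences are not covered in this paper, and a real version would probably need its own file handle type as well.

```c
#include <stdio.h>

/* Hypothetical platform switch; these names are invented for the sketch. */
#define TARGET_STDIO 1
#define TARGET_MPE   2
#define TARGET       TARGET_STDIO

/* The rest of the application calls my_fopen and nothing else. */
FILE *my_fopen(const char *name, const char *mode)
{
#if TARGET == TARGET_STDIO
    return fopen(name, mode);        /* portable C library call */
#else
    /* A platform-specific open (for example an MPE intrinsic) would go
       here; it is deliberately left out of this sketch. */
    return NULL;
#endif
}
```

Moving to another platform then means changing the one #define and filling in the matching branch, rather than hunting down every open call in the application.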


So what do I conclude from all this? Before I tell you, let me tell you about a letter I read that someone had sent in to Computer Language magazine. The gentleman was outraged that they had bothered to include COBOL in their magazine, and said that if they were going to do that they might as well include RPG as well, since neither one was a real language. His criteria for a real language were pretty much comprised of the few things you can do in C that you can't do in COBOL, such as using pointers and declaring local variables. So why does this gentleman so love a language that has no built in I/O facility, and deals with strings as an array of bytes? Obviously he has spent too much time bathed in the phosphor glow of his CRT.

Now this had to have been one of the more asinine statements I have ever seen. COBOL is one of the most popular languages for business applications, and while I don't care for RPG, IBM has made sure that it has maintained a huge installed base as well. So why do I bother to mention this? Well all the kids coming out of college these days are learning SQL, C and PASCAL, which makes them almost useless for the average HP 3000 shop. So don't let them talk you into something that they say is better just because they are familiar with it.

I would never ever write a user application system in C on the 3000. The language is just too cumbersome to handle most of what you want to do (in my opinion), and is just too foreign to the architecture. I would however use it for any of those special little subprograms that you used to use SPL for. C++ is a better C, and you don't have to use the OOP stuff in it if you don't want to, but to date no one has offered C++ on any of the 3000's. PASCAL wouldn't even be a bad choice, especially PASCAL/XL, although a creative C programmer can make their code look exactly like PASCAL if they want to. Of course the only reason to do this would be to get around PASCAL's stringent type checking, but have the syntactic constructs.