Devel1

Thursday, January 14, 2010

C -internals--Malloc

The following information can be useful if you want to write your wrappers for malloc and free.
With wrappers we have the problem about not knowing how many bytes we are going to free. This is because we only have pointer to the memory location which we are going to free.

But first, A small information about memory chunks

1. Memory is allocated from a chunk of fixed size

2. chunk sizes are spaced at 16 byte ( on 64 bit machine)

chunk sizes: 16 ,32,48 ,64, 80,96.....

3. Chunk contains, 2 chunk sizes-4 byte each + user requested size

(Please see the GLIBC explanation and the diagram below to get better understanding)

so for example, you request 40 bytes, through a malloc(40) call

8+40 =48 bytes are required.

So we will get a chunk of 48 bytes in size

Suppose if we had asked for 41 bytes

8+41 =49

We will get a chunk of size 64 bytes

This might explain as to why you may not get memory corruption issues some times, even if you have written few bytes beyond the memory allocation you have asked for.

Wrappers

Now Back to our wrappers for malloc and free.

Following program can be used to track, how many bytes we are actually using, by adding counters to hymalloc(), hyrealloc() and hyfree()

The point to be noted is, how can we calculate the chunk size from memory pointer we have with us.

#include
#include

/*
* Some System dependent memory calculation techniques
*/

#ifndef SIZE_SZ_C
#define SIZE_SZ_C sizeof(size_t)
#endif
#ifndef INTERNAL_SIZE_T_C
#define INTERNAL_SIZE_T_C size_t
#endif

#define PREV_INUSE_C 0x1
#define NON_MAIN_ARENA_C 0x4
#define IS_MMAPPED_C 0x2

#define SIZE_BITS_C (PREV_INUSE_C|IS_MMAPPED_C|NON_MAIN_ARENA_C)

/* Get size, ignoring use bits */
#define chunksize_c(p) ((p)->size & ~(SIZE_BITS_C))

struct malloc_chunk_c {

INTERNAL_SIZE_T_C prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T_C size; /* Size in bytes, including overhead. */

struct malloc_chunk_c* fd; /* double links -- used only if free. */
struct malloc_chunk_c* bk;

/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk_c* fd_nextsize; /* double links -- used only if free. */
struct malloc_chunk_c* bk_nextsize;
};
typedef struct malloc_chunk_c* mchunkptrm;

int hymemsize(void *ptr)
{

char * temp ;
mchunkptrm p; /* chunk corresponding to mem */
if(ptr == NULL)
return 0;

p = ((mchunkptrm)((char*)(ptr) - 2*SIZE_SZ_C));

printf("Chunk :0x%x size:%d \n",ptr, chunksize_c((p)));

return chunksize_c((p));
}

void *hymalloc(size_t size, int lno)
{
void * ptr;
int chunk;

ptr = malloc(size);
chunk =hymemsize(ptr);

printf("0x%x malloc size:%d chunksize %d\n",ptr, size,chunk);
return ptr;
}

void * hyrealloc( void * ptr ,size_t size,int lno)
{
int chunk, rechunk;

chunk = hymemsize(ptr);

ptr =(void *) realloc(ptr,size);
rechunk = hymemsize(ptr);

printf("0x%x realloc size:%d chunk=%d, rechunk=%d\n",ptr, size,chunk,rechunk);
return ptr;
}
void hyfree(void *ptr)
{
int chunk;

chunk = hymemsize(ptr);
printf("0x%x Free size:%d \n",ptr, chunk );

free(ptr);

}
/*
* End of Memory calculation
*
*/

int main(int argc, char *argv[])
{

char * memtest=NULL;
int msize=0;
if(argc !=2)
{
printf("Usage: memorytest n\n Where n is the number of bytes to allocate \n");
return -1;
}
else
{
msize = atoi(argv[1]);
}

memtest = (char *) hymalloc(msize,__LINE__);

hyfree(memtest);

return 0;
}

Output:

./memorytest 50

Chunk :0xc631010 size:64

0xc631010 malloc size:50 chunksize 64

Chunk :0xc631010 size:64

0xc631010 Free size:64

./memorytest 40

Chunk :0x173a2010 size:48

0x173a2010 malloc size:40 chunksize 48

Chunk :0x173a2010 size:48

0x173a2010 Free size:48

Just for quick reference

Some cut paste about malloc from GLIBC malloc.c file itself

Rather than me writing absured about how malloc works, its better I leave it to the original implementor.

GLIBC

/* glibc-2.10.1/malloc/malloc.c:1803:
malloc_chunk details:

(The following includes lightly edited explanations by Colin Plumb.)

Chunks of memory are maintained using a `boundary tag' method as
described in e.g., Knuth or Standish. (See the paper by Paul
Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
survey of such techniques.) Sizes of free chunks are stored both
in the front of each chunk and at the end. This makes
consolidating fragmented chunks into bigger chunks very fast. The
size fields also hold bits representing whether chunks are free or
in use.

An allocated chunk looks like this:

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if allocated | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk, in bytes |M|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| User data starts here... .
. .
. (malloc_usable_size() bytes) .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

M - Mapped flage P - Previous in use Flag

Where "chunk" is the front of the chunk for the purpose of most of
the malloc code, but "mem" is the pointer that is returned to the
user. "Nextchunk" is the beginning of the next contiguous chunk.

Chunks always begin on even word boundries, so the mem portion
(which is returned to the user) is also on an even word boundary, and
thus at least double-word aligned.

Free chunks are stored in circular doubly-linked lists, and look like this:

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk, in bytes |P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Forward pointer to next chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Back pointer to previous chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Unused space (may be 0 bytes long) .
. .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk, in bytes |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The P (PREV_INUSE) bit, stored in the unused low-order bit of the
chunk size (which is always a multiple of two words), is an in-use
bit for the *previous* chunk. If that bit is *clear*, then the
word before the current chunk size contains the previous chunk
size, and can be used to find the front of the previous chunk.
The very first chunk allocated always has this bit set,
preventing access to non-existent (or non-owned) memory. If
prev_inuse is set for any given chunk, then you CANNOT determine
the size of the previous chunk, and might even get a memory
addressing fault when trying to do so.

Note that the `foot' of the current chunk is actually represented
as the prev_size of the NEXT chunk. This makes it easier to
deal with alignments etc but can be very confusing when trying
to extend or adapt this code.

The two exceptions to all this are

1. The special chunk `top' doesn't bother using the
trailing size field since there is no next contiguous chunk
that would have to index off it. After initialization, `top'
is forced to always exist. If it would become less than
MINSIZE bytes long, it is replenished.

2. Chunks allocated via mmap, which have the second-lowest-order
bit M (IS_MMAPPED) set in their size fields. Because they are
allocated one-by-one, each must contain its own trailing size field.

*/

Debugging memory issues

On your shell do this

export MALLOC_CHECK_=3

This will start the internal memory check of malloc
After this while you run your program on that shell, any memory access violations,corruptions will be reported.

Reference:

http://gee.cs.oswego.edu/dl/html/malloc.html

Tuesday, January 5, 2010

NullCon Security Conference

Guys, get ready for security event of the year happening here in India. So all those hackers enthusiasts make it a point to be there.

Sunday, December 27, 2009

Linux Scheduler

Just few years back(about 2 years), I was reading about linux 2.6 scheduler, the O(1) scheduler. For me it seemed the perfect scheduler(I am no kernel expert or developer, but I keep crossing the kernel domain because of my work), since it was O(1). There were no lookups required. I wondered how much it would effect the performance of my application in the O(1) and the earlier schedulers.

But then it seems, there still problems associated with this scheduler too. The issue that is identified is in the complexity and time cost for calculating the priority for task.

As result of this we have a new scheduler called CFS (Completely Fair scheduler) in Linux 2.6.23

Wait.. it seems we can expect another new scheduler very soon.

References:

The O(1) Scheduler

Completely Fair Schedular

Tuesday, November 17, 2009

Event Driven Programming

The traditional method of socket programing is multi thread (pthread) or multi process i.e(fork).

But using them requires synchronization, locking and others. People who have used them may realize that some times it becomes a nightmare to trace a problem/crash.

One can still process multiple client request in a single thread, without blocking on the request. This is achieved by using event driven programing.

How to do

1. Use event notifiers such as poll/select/epoll/kqueue

2. Put your connections (i.e the socket file descriptors ) in non-blocking mode.

By doing so you do not wait for the entire data to be written or the entire data to be read at one go. You will read/write in chunk if buffer space or data is available else you can go a head doing other works like process the other client requests, by putting the current connection back into event notifier.

3. Maintain state information, in order to take appropriate action when a event takes place. Unlike the multi threaded, where a single thread as sequential logic.In event driven you will process a connection when a event occurs, for this you need to keep track about the state in which you left it last time.

4. Minimize the use of systems call or your code which can go into blocking, to avoid starvation of other connections.

Advantages

1. It saves the time in context switching, due to threads/process getting scheduling.

2. Saves the trouble of maintaining locking/synchronization (you may still have to use them in event driven programing, but to a small extent)

Sunday, November 15, 2009

sqlite3 Quick reference

I always keep forgetting the syntax of sqlite command line queries

Here are some which I use more frequently

1. To see the table names present in the current database

sqlite> .tables

2. To see the databases and files

sqlite> .databases

3. To know the structure of table

sqlite> .schema

4. Create a table

CREATE TABLE dummy (key varchar(250) UNIQUE, value BLOB, pid bigint )

5. To alter a table

sqlite> alter table dummy add key varchar(250);

6. To exit from sqlite shell, this I always keep forgetting.

sqlite> .quit

7. For help

sqlite> .help

8. How to find which process is locking the database
fuser path_to_sqlite_file

Wednesday, August 19, 2009

RPM Quick Reference

Following are some most frequently used rpm commands ....

1. How to find, a given file belongs to which rpm package
rpm -qf file name_with_path

2. How to find the files and there paths installed by a particular RPM package
rpm -ql procps-3.2.7-11.1.el5

3. How to find which rpm packages are installed
rpm -qa

4. How to find the version of the package installed
rpm -qa procps
procps-3.2.7-11.1.el5

5. How to query a uninstalled rpm package
rpm -qp the_rpm_package_to_query

6. How to find the files and there path of a uninstalled rpm package
rpm -qpl package.rpm

7. RPM Build from Source rpm

rpmbuild -ba rpm_spec file

rpmbuild -ta tarball

for more rpmbuild options see manpage

8. How to find dependencies of a installed rpm package

rpm -qR gcc-4.1.2-44.el5

Tuesday, August 11, 2009

SVN Quick Reference

SVN guide

1. checkout code
svn co file:///sourceof thsvnrepository

svn co svn+ssh:///host/sourceoftherepository_from_root

on linux machine I had problem related to permissions for users using svn+ssh://

instead of using the above one can use the following if u donot want to get into permissions problem

svn co svn:///host/sourceoftherepository_related_to_svn_repo

By this method, svn client connects to the svn server (port 3690), rather then using the ssh tunnel. which requires some system level permission handling to the file system.

2. adding a file to repository
svn add new file

3. commit your changes
svn commit file_to_comit -m "Message"

4. Create new branch from main trunk

svn mkdir branchname

svn copy trunk branchname

svn commit branchname -m "new branch"

Resources:
http://artis.imag.fr/~Xavier.Decoret/resources/svn/index.html

http://subversion.apache.org/docs/

http://svnbook.red-bean.com/

http://queens.db.toronto.edu/~nilesh/linux/subversion-howto/

http://blog.tplus1.com/index.php/2007/08/29/how-to-use-vimdiff-as-the-subversion-diff-tool/ - how to use svn with vimdiff