Convert multiple columns to rownames in R pipe

Genomics workflows often require reshaping data before applying a statistical method or visualizing patterns in it. I use R for about 90% of my analyses, and I would like to share some of the code I regularly use in my work.

Before joining the party, let me tell you that I love the UNIX command line. I was desperate to learn piping and process substitution early on, and I became addicted to both. Later, while learning R in depth, I found out that R has something similar to piping. I started experimenting with it and gradually became comfortable using R pipes in my day-to-day (epi)genomics analysis.

Here is a piece of R code that uses the dplyr and tibble packages to format data such as RNA-seq or ChIP-seq count tables. Most read-counting tools write their results as table-formatted text files, and these tables nearly always need some polishing before downstream analysis. A typical ChIP-seq count table looks like this:

cat notes_01.txt

chr	start	end	sample1	sample2	sample3
chr1	120	130	1	0	14
chr1	150	200	35	12	56
chr2	300	500	67	56	78
chr4	250	400	13	24	90
...
...

The table above should be reformatted so that chr, start, and end together become the row names. That way, results from the downstream analysis can be traced back to the regions they came from. Using the dplyr and tibble packages, the following piped code does what we need.

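library(dplyr)   # provides %>%, mutate() and select(); tibble must be installed too

# Read the tab-delimited count table
dat <- read.delim("notes_01.txt", header = TRUE)

# Build a "chr_start_end" label, drop the three source columns,
# then move the label into the row names
dat <- dat %>%
  dplyr::mutate(region = paste(chr, start, end, sep = "_")) %>%
  dplyr::select(-c(chr, start, end)) %>%
  tibble::column_to_rownames(var = "region")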

This will result in:

             sample1 sample2 sample3
chr1_120_130       1       0      14
chr1_150_200      35      12      56
chr2_300_500      67      56      78
chr4_250_400      13      24      90
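
If you later need the coordinates back as separate columns, for example to look up the interesting regions after an analysis, the row names can be split again. Here is a minimal sketch of one way to do that. It assumes the tidyr package is installed (it is not used in the code above) and that chromosome names contain no underscore. The `counts` object is just an inline copy of the result shown above, so the snippet runs on its own.

```r
library(dplyr)

# Inline copy of the result shown above, so the sketch runs on its own
counts <- data.frame(
  sample1 = c(1, 35, 67, 13),
  sample2 = c(0, 12, 56, 24),
  sample3 = c(14, 56, 78, 90),
  row.names = c("chr1_120_130", "chr1_150_200", "chr2_300_500", "chr4_250_400")
)

# Move the row names into a column, then split that column on "_".
# Assumption: chromosome names contain no "_" (a name such as
# "chr1_gl000191_random" would be split in the wrong places).
regions <- counts %>%
  tibble::rownames_to_column(var = "region") %>%
  tidyr::separate(region, into = c("chr", "start", "end"),
                  sep = "_", convert = TRUE)
```

With `convert = TRUE`, start and end come back as numbers rather than text, so they can be sorted or filtered directly.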

This modified data.frame can now be used for clustering or visualization within R. With a slight modification, the same code can also be applied to RNA-seq count tables. That is left for you to experiment with.
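
As a starting point for the RNA-seq case, here is a minimal sketch. It assumes the count table has a single identifier column, hypothetically named `gene_id` here, so no `paste()` step is needed. The gene names and counts are made up for illustration, and your counting tool may name its columns differently.

```r
library(dplyr)

# Hypothetical RNA-seq count table; column names and values are illustrative only
rna <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  sample1 = c(10, 0, 250),
  sample2 = c(12, 3, 240),
  sample3 = c(8, 1, 300)
)

# A single identifier column can go straight into the row names
rna_counts <- rna %>%
  tibble::column_to_rownames(var = "gene_id")

# Convert to a numeric matrix, the form many clustering and
# heatmap functions expect
rna_mat <- as.matrix(rna_counts)
```

Row names have to be unique, so duplicated gene identifiers need to be resolved before the `column_to_rownames()` step.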

Have fun!
