Reading multiple files.

By now, we are all familiar with reading a CSV file into R. But what if there is a block of operations that we need to perform on multiple files? Editing the script to point at each CSV in turn and rerunning it every time would be quite tiring.

The best and easiest way is to automate the whole process with an R script.

Step 1: We begin by listing all the files in the working directory, keeping only the CSV files by passing a pattern that matches names ending in “.csv”.

file_list <- list.files(pattern = "\\.csv$")
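
The pattern argument of list.files() is a regular expression, not a shell wildcard, so "\\.csv$" (a literal dot followed by csv at the very end of the name) is the stricter choice; an unanchored pattern such as "*.csv" can also pick up names like old.csv.bak. A small sketch, using a throwaway folder under tempdir() and made-up file names purely for illustration:

```r
# Scratch folder with a few hypothetical files
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
                                c("A.csv", "B.csv", "notes.txt", "old.csv.bak"))))

# Regex: literal ".csv" anchored at the end of the file name
file_list <- list.files(demo_dir, pattern = "\\.csv$")
print(file_list)
```

Only A.csv and B.csv are listed; notes.txt and old.csv.bak are left out.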

Step 2: After listing the files, we find the number of CSV files in the directory.

l <- length(file_list)
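
One thing to watch with l: if the folder holds no CSV files, l is 0 and 1:l evaluates to c(1, 0), so a loop written as for (i in 1:l) would still run twice. seq_len(l) gives an empty sequence in that case. A quick check, simulating an empty file list:

```r
l <- length(character(0))   # simulate "no CSV files found"

print(1:l)          # counts down: 1 0
print(seq_len(l))   # empty: integer(0)

# Count how many times each loop form actually runs
n_colon <- 0
for (i in 1:l) n_colon <- n_colon + 1
n_seq <- 0
for (i in seq_len(l)) n_seq <- n_seq + 1
```

Here n_colon ends up as 2 and n_seq as 0.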

Step 3: Now, by looping over the list, we can read the contents of each CSV file in turn.

for (i in seq_len(l)) {
  ## read the i-th file; x is replaced on every pass
  x <- read.csv(file_list[i])
}

Yeah! Now the script reads the contents of all the files automatically. Because x is replaced on every pass, any per-file operations belong inside the loop.
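
If you would rather keep every file in memory instead of processing each one inside the loop, one common alternative (not the approach this post uses) is lapply(), which returns a list holding one data frame per file. A sketch with two small, made-up files written to a scratch folder:

```r
# Write two small hypothetical CSV files to a scratch folder
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:3, Entropy = c("a", "b", "a")),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(Entropy = c("x", "x"), score = c(0.5, 0.7)),
          file.path(demo_dir, "B.csv"), row.names = FALSE)

# full.names = TRUE returns paths read.csv() can open from any working directory
file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One data frame per file, named after the file it came from
all_data <- lapply(file_list, read.csv)
names(all_data) <- basename(file_list)

str(all_data)
```

Each file can then be reached as all_data[["A.csv"]], and so on.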

Now, suppose your CSV files have different numbers of columns, and you want to work with one particular column that appears in all of them but at a different position in each file. That situation is trickier to handle.

Say, for example, I have three files named “A.csv”, “B.csv” and “C.csv”, and I want to work with the “Entropy” column of each, but it is the 3rd column in “A.csv”, the 5th in “B.csv” and the 9th in “C.csv”. Because the column number differs from file to file, the column cannot be reached with one fixed index, which is a real setback when automating the process. So, what I would do is loop over the columns of each file (j is the column index) and pick out the one with the right name:

## checking if the name of the jth column is "Entropy"
if (colnames(x)[j] == "Entropy") {

  ## saving the original column name for future use
  y[j] <- colnames(x)[j]

  ## changing the name of the jth column
  colnames(x)[j] <- 'test'

  ## accessing the column by its name; entropy() comes from the
  ## entropy package and takes the counts produced by table()
  ent[q] <- entropy(table(x$test))

  ## again assigning the original column name to the jth column
  colnames(x)[j] <- y[j]
}
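
R can also look a column up by name directly, so the temporary rename is not strictly required: x[["Entropy"]] returns the column wherever it sits, and "Entropy" %in% names(x) tells you whether a file has it at all. A minimal sketch of that alternative, using a made-up data frame in which Entropy is the third column:

```r
# Hypothetical data: "Entropy" happens to be the 3rd column here
x <- data.frame(id = 1:4, group = c("g1", "g1", "g2", "g2"),
                Entropy = c("a", "b", "a", "a"))

has_entropy <- "Entropy" %in% names(x)     # does this file have the column?
col_pos <- which(names(x) == "Entropy")    # its position, which varies by file
counts <- table(x[["Entropy"]])            # value counts, column looked up by name

print(col_pos)
print(counts)
```

The counts table is the same kind of object the snippet above passes to entropy().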

So, finally, my R script looks like this:

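library(entropy)   ## provides entropy(); install.packages("entropy") if needed

file_list <- list.files(pattern = "\\.csv$")
l <- length(file_list)

csv <- character(0)   ## names of the files that contain an "Entropy" column
ent <- numeric(0)     ## entropy of that column, one value per such file
q <- 1                ## next free position in ent

for (i in seq_len(l)) {

  x <- read.csv(file_list[i])
  y <- names(x)

  for (j in seq_len(ncol(x))) {

    if (colnames(x)[j] == "Entropy") {
      y[j] <- colnames(x)[j]
      csv <- c(csv, file_list[i])
      colnames(x)[j] <- 'test'
      ent[q] <- entropy(table(x$test))
      colnames(x)[j] <- y[j]
      q <- q + 1   ## advance only when a value was stored
    }
  }
}

## one row per file: the file name and the entropy of its "Entropy" column
df <- data.frame(csv = csv, entropy = ent, stringsAsFactors = FALSE)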
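
For comparison, here is a more compact sketch of the same pipeline built on lapply() and column lookup by name. To stay self-contained it does not load the entropy package; instead it uses a small hypothetical helper, shannon_entropy(), that computes the plug-in Shannon entropy in nats, -sum(p * log(p)). That is intended to match entropy(table(v)) under the package's default settings, but check the package documentation before relying on it. Files without an Entropy column are skipped, and the file names and data are invented for the demo.

```r
# Hypothetical helper: plug-in Shannon entropy (natural log) of a vector's values
shannon_entropy <- function(v) {
  p <- table(v) / length(v)   # relative frequency of each distinct value
  -sum(p * log(p))
}

# Hypothetical demo files: "Entropy" sits in a different column in each,
# and C.csv has no such column at all
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:4, Entropy = c("a", "a", "b", "b")),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(u = 1:2, w = 3:4, Entropy = c("z", "z")),
          file.path(demo_dir, "B.csv"), row.names = FALSE)
write.csv(data.frame(other = 1:3),
          file.path(demo_dir, "C.csv"), row.names = FALSE)

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One result row per file that actually has an "Entropy" column
rows <- lapply(file_list, function(f) {
  x <- read.csv(f)
  if (!"Entropy" %in% names(x)) return(NULL)
  data.frame(csv = basename(f), entropy = shannon_entropy(x[["Entropy"]]))
})
rows <- Filter(Negate(is.null), rows)   # drop files without the column
df <- do.call(rbind, rows)
print(df)
```

A.csv gets log(2) (two equally common values) and B.csv gets 0 (a single repeated value); C.csv does not appear.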

Hope this helps and saves lots of time and effort. Happy Mining!
