check to see if file contents are text.

I am trying my hand at indexing a folder and then searching its contents. I am using lupy as a base because of its simplicity.

Now, the problem with this is that it indexes PNG files, JPEGs and whatnot as text documents, thus slowing down my stuff. So how do I check whether the innards of a file are legit text or not? Simply filtering on the .txt extension won't work, since I also want to index HTML and other documents.
Jul 27 '07 #1
bartonc
I came up with a solution like this for checking whether a password was encrypted. It just checks whether a string contains any control or non-ASCII bytes. You'd need to open each file in binary mode, grab the data (or a chunk of it), then:
    import re
    # Control bytes (other than tab, LF, CR) and anything outside 7-bit ASCII.
    # `data` is the bytes object read from the file.
    ReCheck = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff]")
    if ReCheck.search(data):  # search() scans the whole chunk, newlines included
        print("Has non-ascii content")
Could work..
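A strict ASCII test would also reject UTF-8 or Latin-1 HTML, so for the indexing case a looser heuristic may fit better. Below is a minimal sketch, not anything lupy provides: the names looks_like_text, is_text_file and iter_text_files, the 1024-byte sample size and the 30% threshold are all assumptions chosen for illustration. It reads only the start of each file in binary mode, treats any NUL byte as a sign of binary data, and otherwise counts how many bytes are unusual control characters.

```python
import os

# Bytes that commonly appear in text files: a few control characters
# (bell, backspace, tab, LF, form feed, CR, escape), printable ASCII,
# and all high bytes so that UTF-8 and Latin-1 text is not rejected.
TEXT_BYTES = (
    bytes([7, 8, 9, 10, 12, 13, 27])
    + bytes(range(0x20, 0x7F))
    + bytes(range(0x80, 0x100))
)


def looks_like_text(chunk, max_odd_ratio=0.30):
    """Guess whether a chunk of bytes is text. A heuristic, not a guarantee."""
    if not chunk:
        return True  # assumption: an empty file counts as text
    if b"\x00" in chunk:
        return False  # NUL bytes are rare in text and common in binaries
    # Delete every "texty" byte; whatever remains is suspicious.
    odd = chunk.translate(None, TEXT_BYTES)
    return len(odd) / len(chunk) <= max_odd_ratio


def is_text_file(path, sample_size=1024):
    """Classify a file from its first sample_size bytes, read in binary mode."""
    with open(path, "rb") as handle:
        return looks_like_text(handle.read(sample_size))


def iter_text_files(folder):
    """Yield paths under folder that pass the check, e.g. to feed an indexer."""
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            try:
                if is_text_file(path):
                    yield path
            except OSError:
                continue  # unreadable file: skip it rather than abort the walk
```

You would then loop over iter_text_files("some/folder") and hand only those paths to the indexer. PNG and JPEG headers contain NUL bytes, so they get rejected after reading a tiny sample, which keeps the check cheap. It can still misjudge some files (UTF-16 text is full of NUL bytes, for instance), so treat it as a filter to tune rather than a definitive test.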
Jul 27 '07 #2
